[00:04:53] <grrrit-wm>	 (PS2) Yuvipanda: dynamicproxy: Migrate to python3 [puppet] - https://gerrit.wikimedia.org/r/291565 (owner: Ladsgroup)
[00:05:09] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] "All of these packages seem available, so let me give this a shot." [puppet] - https://gerrit.wikimedia.org/r/291565 (owner: Ladsgroup)
[00:14:54] <grrrit-wm>	 (CR) Yuvipanda: "and fix all the service {} stanzas as well." [puppet] - https://gerrit.wikimedia.org/r/291751 (owner: Alexandros Kosiaris)
[00:15:24] <grrrit-wm>	 (PS2) Yuvipanda: Attempt to fix dynamicproxy-api service [puppet] - https://gerrit.wikimedia.org/r/289870 (owner: Alex Monk)
[00:19:42] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] Attempt to fix dynamicproxy-api service [puppet] - https://gerrit.wikimedia.org/r/289870 (owner: Alex Monk)
[00:29:51] <grrrit-wm>	 (PS1) Yuvipanda: dynamicproxy: Followup to I7c13506b1a38f03815481651fd13411f7cf7c0c9 [puppet] - https://gerrit.wikimedia.org/r/291853
[00:31:01] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] dynamicproxy: Followup to I7c13506b1a38f03815481651fd13411f7cf7c0c9 [puppet] - https://gerrit.wikimedia.org/r/291853 (owner: Yuvipanda)
[02:25:43] <logmsgbot>	 !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 09m 30s)
[02:25:51] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:31:37] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Tue May 31 02:31:36 UTC 2016 (duration 5m 54s)
[02:31:42] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:05:09] <icinga-wm>	 RECOVERY - cassandra-c CQL 10.64.32.196:9042 on restbase1008 is OK: TCP OK - 0.042 second response time on port 9042
[05:36:42] <grrrit-wm>	 (PS4) Giuseppe Lavagetto: base::grub: fix the ioscheduler setting [puppet] - https://gerrit.wikimedia.org/r/291706
[05:37:59] <_joe_>	 how the hell can jenkins be slow at this time of the day?
[05:40:51] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2] base::grub: fix the ioscheduler setting [puppet] - https://gerrit.wikimedia.org/r/291706 (owner: Giuseppe Lavagetto)
[05:55:06] <grrrit-wm>	 (PS1) Mobrovac: service::node: Provide MW API and RESTBase request templates [puppet] - https://gerrit.wikimedia.org/r/291857
[06:02:25] <grrrit-wm>	 (PS4) Giuseppe Lavagetto: base::grub: actually use augeas on jessie [puppet] - https://gerrit.wikimedia.org/r/291707
[06:04:58] <grrrit-wm>	 (CR) Mobrovac: "OKed by PCC - https://puppet-compiler.wmflabs.org/2994/" [puppet] - https://gerrit.wikimedia.org/r/291857 (owner: Mobrovac)
[06:05:08] <grrrit-wm>	 (CR) Mobrovac: [C: 1] Change-Prop: White-list user-agent header header in http filter [puppet] - https://gerrit.wikimedia.org/r/291784 (owner: Ppchelko)
[06:06:46] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2] "Let's go." [puppet] - https://gerrit.wikimedia.org/r/291707 (owner: Giuseppe Lavagetto)
[06:13:56] <grrrit-wm>	 (PS2) Giuseppe Lavagetto: base::grub: allow enabling the memory cgroup controller [puppet] - https://gerrit.wikimedia.org/r/291772
[06:15:23] <YuviPanda>	 _joe_: I think your aug changes might be causing some puppet failure in labs
[06:15:36] <YuviPanda>	 > Error: /Stage[main]/Base::Grub/Augeas[grub2]: Could not evaluate: Save failed with return code false, see debug
[06:15:38] <YuviPanda>	 Notice: /Stage[main]/Base::Grub/Exec[update-grub]: Dependency Augeas[grub2] has failures: true
[06:15:40] <YuviPanda>	 Warning: /Stage[main]/Base::Grub/Exec[update-grub]: Skipping because of failed dependencies
[06:16:15] <_joe_>	 YuviPanda: wat?
[06:16:21] <_joe_>	 in prod it's working well
[06:16:29] <_joe_>	 you mean we don't install augeas in labs?
[06:16:45] <YuviPanda>	 _joe_: not sure. am running puppet again to see
[06:16:46] <_joe_>	 YuviPanda: can you name one machine where I can see that?
[06:16:56] <_joe_>	 ot
[06:17:02] <YuviPanda>	 _joe_: I'm looking at tools-exec-1409
[06:17:04] <_joe_>	 *is it a jessie machine btw?
[06:17:08] <YuviPanda>	 but waiting for another run
[06:17:10] <YuviPanda>	 to see if it works fine
[06:17:12] <YuviPanda>	 _joe_: no trusty
[06:17:17] <YuviPanda>	 nope it's still failing
[06:17:23] <_joe_>	 trusty doesn't have augeas 1.2.0 AFAIR
[06:17:31] <YuviPanda>	 let me check a jessie / precise machine
[06:17:49] <_joe_>	 oh no, it has augeas
[06:17:50] <_joe_>	 cool
[06:18:10] <_joe_>	 dpkg -l libaugeas0
[06:18:35] <YuviPanda>	 is fine on jessie
[06:19:12] <_joe_>	 YuviPanda: uhm ok
[06:19:43] <_joe_>	 YuviPanda: that's strange, in prod on a trusty machine it works like a charm
[06:19:45] <YuviPanda>	 (is still running in precise)
[06:19:57] <YuviPanda>	 _joe_: I wonder if it is a problem with us having trusty-backports?
[06:20:12] <YuviPanda>	 _joe_: ok, fine on precise too.
[06:20:15] <moritzm>	 !log upgrading hhvm in eqiad (also picking up updated versions of icu and lcms)
[06:20:21] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:20:36] <_joe_>	 YuviPanda: I think it's an issue with that specific machine
[06:20:51] <YuviPanda>	 _joe_: yeah, am trying a different one right now
[06:21:02] <_joe_>	 I am running puppet with --debug there
[06:21:18] <YuviPanda>	 (am testing on tools-bastion-03 now)
[06:22:44] <_joe_>	 oblivian@tools-exec-1409:~$ sudo cat /etc/default/grub
[06:22:44] <_joe_>	 cat: /etc/default/grub: No such file or directory
[06:22:56] <_joe_>	 YuviPanda: no shit it fails :P
[06:23:13] <YuviPanda>	 hmm
[06:23:21] <YuviPanda>	 it works fine on -bastion-03 :D
[06:23:26] <YuviPanda>	 _joe_: I've no idea what that implies tho :D
[06:23:47] <YuviPanda>	 _joe_: my suspicion now is that these were all from a particular base image that did something to grub maybe
[06:23:54] <_joe_>	 YuviPanda: it implies that if you upgrade the kernel there, you won't be able to rebuild the bootloader
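The failure above comes down to the Augeas[grub2] resource trying to edit /etc/default/grub on a host where that file does not exist. The following is a minimal Python sketch of the same idea, not the base::grub implementation (which uses Augeas): append a kernel parameter to GRUB_CMDLINE_LINUX idempotently, and report a missing defaults file explicitly instead of failing with "Save failed". The parameter `cgroup_enable=memory` and the helper names are illustrative assumptions.

```python
from pathlib import Path

# The file that turned out to be missing on tools-exec-1409.
GRUB_DEFAULTS = Path("/etc/default/grub")


def add_kernel_param(text: str, param: str) -> str:
    """Append `param` to GRUB_CMDLINE_LINUX unless it is already there (idempotent)."""
    out = []
    found = False
    for line in text.splitlines():
        # "GRUB_CMDLINE_LINUX=" deliberately does not match GRUB_CMDLINE_LINUX_DEFAULT.
        if line.startswith("GRUB_CMDLINE_LINUX="):
            found = True
            params = line.split("=", 1)[1].strip().strip('"').split()
            if param not in params:
                params.append(param)
            line = 'GRUB_CMDLINE_LINUX="%s"' % " ".join(params)
        out.append(line)
    if not found:
        out.append('GRUB_CMDLINE_LINUX="%s"' % param)
    return "\n".join(out) + "\n"


def update_grub_defaults(param: str, path: Path = GRUB_DEFAULTS) -> bool:
    """Return False instead of failing obscurely when the defaults file is absent."""
    if not path.exists():
        return False
    path.write_text(add_kernel_param(path.read_text(), param))
    return True
```

After a successful edit, update-grub still has to run for the change to reach the bootloader, which is the step Puppet skipped here.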
[06:30:10] <icinga-wm>	 PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:19] <icinga-wm>	 PROBLEM - puppet last run on mc2007 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:48] <icinga-wm>	 PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:49] <icinga-wm>	 PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail
[06:31:09] <icinga-wm>	 PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:29] <icinga-wm>	 PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:09] <icinga-wm>	 PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:10] <icinga-wm>	 PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:40] <icinga-wm>	 PROBLEM - puppet last run on ms-be2021 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:37:37] <grrrit-wm>	 (PS3) Giuseppe Lavagetto: base::grub: allow enabling the memory cgroup controller [puppet] - https://gerrit.wikimedia.org/r/291772
[06:39:11] <grrrit-wm>	 (PS4) Giuseppe Lavagetto: base::grub: allow enabling the memory cgroup controller [puppet] - https://gerrit.wikimedia.org/r/291772
[06:43:08] <icinga-wm>	 PROBLEM - puppet last run on snapshot1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:46:18] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[06:47:01] <grrrit-wm>	 (PS5) Giuseppe Lavagetto: base::grub: allow enabling the memory cgroup controller [puppet] - https://gerrit.wikimedia.org/r/291772
[06:47:07] <_joe_>	 can someone look into the 5xxs?
[06:47:08] <icinga-wm>	 PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:47:58] <_joe_>	 moritzm: these failures ^^ seem related to hhvm
[06:48:23] <jynus>	 let me see oxygen
[06:49:18] <jynus>	 I think current status is ok, normally indicative of a past spike
[06:49:25] <jynus>	 let me confirm that
[06:49:35] <_joe_>	 it could be moritzm upgrading hhvm
[06:50:14] <moritzm>	 having a look
[06:50:19] <jynus>	 if that is true, maybe it should be done slower, but let me confirm first the status
[06:50:46] <jynus>	 yes, it was a spike, a relatively large one, however
[06:50:47] <moritzm>	 haven't started the restarts yet, though, upgrading all the mw* systems in eqiad and pulling their 0.5 GB dbg package took a while
[06:50:49] <icinga-wm>	 RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[06:50:50] <_joe_>	 jynus: even when done very slowly some more 503s are expected, not enough to create a spike though
[06:51:06] <_joe_>	 moritzm: well upgrade does the hhvm restart already
[06:51:14] <jynus>	 large is a bad word
[06:51:20] <jynus>	 wide, but not tall
[06:51:42] <jynus>	 let me now see why
[06:51:46] <moritzm>	 _joe_: ah, ofc. silly me
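jynus's suggestion to do the upgrade more slowly amounts to a rolling upgrade in small batches with a pause between them. A minimal Python sketch of that pattern follows; `upgrade_host`, the batch size, the pause length, and the host names are hypothetical, and this is not the tooling used in this log.

```python
import time
from typing import Callable, List, Sequence


def batches(hosts: Sequence[str], size: int) -> List[List[str]]:
    """Split hosts into consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [list(hosts[i:i + size]) for i in range(0, len(hosts), size)]


def rolling_upgrade(
    hosts: Sequence[str],
    size: int,
    upgrade_host: Callable[[str], None],
    pause_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Upgrade one batch at a time, pausing between batches (not after the last one)."""
    groups = batches(hosts, size)
    for index, group in enumerate(groups):
        for host in group:
            upgrade_host(host)  # hypothetical: install the package, which also restarts the service
        if index < len(groups) - 1:
            sleep(pause_s)  # let the rest of the pool absorb traffic before the next batch
```

Passing `sleep` in as a parameter keeps the pacing logic testable without actually waiting.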
[06:52:57] <moritzm>	 the puppet failure on mw1161 resolved itself with a puppet run which executed /usr/local/sbin/install-pkg-src from hhvm::debug
[06:53:40] <moritzm>	 sorry about the noise, need coffee
[06:53:57] <_joe_>	 eheh
[06:54:17] <jynus>	 it seems mostly api calls, was the one(s) affected an api node?
[06:55:06] <jynus>	 let's go to mediawiki to find out
[06:55:42] <jynus>	 mmm, high db errors
[06:55:59] <icinga-wm>	 RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:56:18] <icinga-wm>	 RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:56:31] <jynus>	 of course, mediawiki will not have that if HHVM was the culprit
[06:57:09] <icinga-wm>	 RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:20] <icinga-wm>	 RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:29] <icinga-wm>	 RECOVERY - puppet last run on mc2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:29] <jynus>	 db1049 having issues, though
[06:57:49] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
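The 5xx alert reports the share of recent datapoints above a threshold (44.44% above 1000.0 when it fired, under 1% above 250.0 on recovery). A minimal Python sketch of that kind of check is below; the 20% trigger, treating 250.0 as the warning level, and the sample series are assumptions, and the real check may aggregate differently.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above `threshold`."""
    values = [v for v in datapoints if v is not None]  # Graphite returns null for missing points
    if not values:
        return 0.0
    return 100.0 * sum(1 for v in values if v > threshold) / len(values)


def classify(datapoints, critical=1000.0, warning=250.0, trigger_pct=20.0) -> str:
    """CRITICAL or WARNING when at least `trigger_pct` percent of points exceed that level."""
    if percent_above(datapoints, critical) >= trigger_pct:
        return "CRITICAL"
    if percent_above(datapoints, warning) >= trigger_pct:
        return "WARNING"
    return "OK"


# Hypothetical 5xx reqs/min series: 4 of 9 points exceed 1000.
spike = [300, 800, 1200, 1500, 900, 1100, 1300, 600, 200]
print(round(percent_above(spike, 1000.0), 2))  # 44.44
print(classify(spike))  # CRITICAL
```

Counting a fraction of points rather than a single sample is what makes a "wide but not tall" spike trip the alert.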
[06:58:00] <icinga-wm>	 RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[06:58:38] <icinga-wm>	 RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:48] <icinga-wm>	 RECOVERY - puppet last run on ms-be2021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:06] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2] base::grub: allow enabling the memory cgroup controller [puppet] - https://gerrit.wikimedia.org/r/291772 (owner: Giuseppe Lavagetto)
[06:59:09] <icinga-wm>	 RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:30] <jynus>	 most of the issues are rpc-related
[07:00:58] <jynus>	 the issues started all of a sudden at 14h yesterday
[07:01:55] <grrrit-wm>	 (PS2) Muehlenhoff: Stop using package->latest in gerrit module [puppet] - https://gerrit.wikimedia.org/r/291762 (https://phabricator.wikimedia.org/T115348)
[07:02:24] <jynus>	 there is no lag, but there are wikidata jobs locking for >10 seconds
[07:02:52] <jynus>	 some wikiadmin jobs >500 seconds
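Spotting jobs like these usually means filtering the server's process list by user and elapsed time (the TIME column of SHOW PROCESSLIST, in seconds). A minimal Python sketch over rows already fetched from the server follows; the `Thread` shape, the sample rows, and the `wikiuser` name are hypothetical, and only `wikiadmin` and the 10 s / 500 s figures come from the log.

```python
from typing import Iterable, List, NamedTuple, Optional


class Thread(NamedTuple):
    user: str    # USER column
    time_s: int  # TIME column: seconds the thread has spent in its current state
    info: str    # INFO column: statement text, possibly truncated


def long_running(rows: Iterable[Thread], min_seconds: int, user: Optional[str] = None) -> List[Thread]:
    """Threads running longer than `min_seconds`, optionally for one user, longest first."""
    hits = [r for r in rows if r.time_s > min_seconds and (user is None or r.user == user)]
    return sorted(hits, key=lambda r: r.time_s, reverse=True)


rows = [
    Thread("wikiuser", 12, "SELECT ... FOR UPDATE"),
    Thread("wikiadmin", 640, "SELECT ..."),
    Thread("wikiuser", 2, "SELECT ..."),
]
print([r.time_s for r in long_running(rows, 10)])  # [640, 12]
print([r.user for r in long_running(rows, 500, user="wikiadmin")])  # ['wikiadmin']
```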
[07:04:13] <jynus>	 nothing interesting happening at that time, no deploys
[07:04:58] <icinga-wm>	 RECOVERY - puppet last run on snapshot1003 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[07:08:26] <jynus>	 I do not see pure OS issues
[07:09:25] <jynus>	 one critical disk, though
[07:10:42] <jynus>	 but it is only 2 media errors
[07:14:02] <jynus>	 I am going to put that drive offline to rule out disk issues
[07:15:54] <jynus>	 !log db1049> megacli -PDOffline -PhysDrv '[32:4]' -a0
[07:16:00] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
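The !log entry above takes physical drive 4 in enclosure 32 on adapter 0 offline; the later entry brings it back with -PDOnline. A minimal Python sketch that builds the same argument lists, so the drive address can be passed to subprocess without shell quoting (the quotes around '[32:4]' in the log only stop the shell from treating the brackets as a glob). The helper name is hypothetical; the flags are exactly those used in the log, and the binary name may differ per install.

```python
import shlex
from typing import List


def megacli_pd_state(action: str, enclosure: int, slot: int, adapter: int = 0) -> List[str]:
    """Build argv for taking a physical drive offline or bringing it back online."""
    flags = {"offline": "-PDOffline", "online": "-PDOnline"}
    if action not in flags:
        raise ValueError("action must be 'offline' or 'online'")
    return ["megacli", flags[action], "-PhysDrv", f"[{enclosure}:{slot}]", f"-a{adapter}"]


argv = megacli_pd_state("offline", 32, 4)
print(shlex.join(argv))  # megacli -PDOffline -PhysDrv '[32:4]' -a0
# On the host itself this would be run as: subprocess.run(argv, check=True)
```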
[07:16:17] <jynus>	 let's see if that helps
[07:19:19] <jynus>	 if it does, we may want to tune raid heuristics for doing that automatically, if that is a thing
[07:20:38] <grrrit-wm>	 (CR) Alexandros Kosiaris: "@halfak: I am not following. You mean that somehow "ores" is not clearly distinguishable from "celery-ores-worker" ?" [puppet] - https://gerrit.wikimedia.org/r/291751 (owner: Alexandros Kosiaris)
[07:21:18] <icinga-wm>	 PROBLEM - MegaRAID on db1049 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[07:25:00] <jynus>	 ^that is expected and self-created
[07:28:19] <grrrit-wm>	 (CR) Aklapper: "The DBs phabricator_maniphest.edge and phabricator_maniphest.maniphest_task were not used beforehand in this script, so this might require" [puppet] - https://gerrit.wikimedia.org/r/291781 (https://phabricator.wikimedia.org/T133649) (owner: Aklapper)
[07:28:22] <grrrit-wm>	 (CR) Jcrespo: "https://puppet-compiler.wmflabs.org/2995/" [puppet] - https://gerrit.wikimedia.org/r/291784 (owner: Ppchelko)
[07:31:09] <wikibugs>	 Operations, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, Services, and 2 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2340394 (KartikMistry) There has been slow update, but I have built most of packages locally, seems OK on Jess...
[07:32:50] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 1] "Apart from the confusing "forward_headers" part which makes me think changeprop will be receiving requests from something and is requested" [puppet] - https://gerrit.wikimedia.org/r/291784 (owner: Ppchelko)
[07:32:57] <wikibugs>	 Operations, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, Services, and 2 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2340398 (KartikMistry)
[07:33:17] <wikibugs>	 Operations, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, Services, and 2 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#1491973 (KartikMistry)
[07:34:18] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 1] service::node: Provide MW API and RESTBase request templates [puppet] - https://gerrit.wikimedia.org/r/291857 (owner: Mobrovac)
[07:34:45] <jynus>	 as far as I can see, offlining the disk did not improve the issues
[07:35:19] <grrrit-wm>	 (PS2) Jcrespo: Change-Prop: White-list user-agent header header in http filter [puppet] - https://gerrit.wikimedia.org/r/291784 (owner: Ppchelko)
[07:36:36] <grrrit-wm>	 (CR) Jcrespo: [C: 2] Change-Prop: White-list user-agent header header in http filter [puppet] - https://gerrit.wikimedia.org/r/291784 (owner: Ppchelko)
[07:46:10] <mobrovac>	 !log change-prop deploying 980f65c
[07:46:15] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:56:03] <grrrit-wm>	 (PS1) Elukey: Clarify the use of TAG_F_NOVARMATCH within the context of %{<strftime-format>}t. [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/291870 (https://phabricator.wikimedia.org/T136314)
[08:01:14] <grrrit-wm>	 (PS2) Elukey: Clarify the use of TAG_F_NOVARMATCH within the context of %{<strftime-format>}t. [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/291870 (https://phabricator.wikimedia.org/T136314)
[08:03:17] <jynus>	 let's revive that disk, that didn't work
[08:04:49] <jynus>	 !log db1049> megacli -PDOnline -PhysDrv '[32:4]' -a0
[08:04:55] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:06:47] <icinga-wm>	 RECOVERY - MegaRAID on db1049 is OK: OK: optimal, 1 logical, 2 physical
[08:07:36] <wikibugs>	 Operations, Performance-Team, Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2340433 (Gilles) I've started working on https://wiki.debian.org/Python/Thumbor as Marcelo requested and I'm almost done filing ITPs. I'll add a column for jessie and...
[08:18:08] <icinga-wm>	 PROBLEM - MegaRAID on db1049 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[08:24:03] <wikibugs>	 Operations: Linking a bn.wikipedia.org button to G+ page. - https://phabricator.wikimedia.org/T109810#2340449 (Aklapper) >>! In T109810#2197918, @Jalexander wrote: > will look into it in the morning.  @Jalexander: Any news to share here? Thanks!
[08:30:32] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on db1049 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Jcrespo rebuilding completely just in case
[08:31:37] <wikibugs>	 Operations, Analytics-Kanban, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2340456 (elukey) I discussed with @ema the inconsistency that we are seeing and we came to the conclusion that this change could be...
[08:31:40] <jynus>	 given that cause #1 is ruled out, let's go to #2: bots
[08:32:53] <jynus>	 BotNinja is there, but it is behaving
[08:34:46] <mobrovac>	 !log restbase deploy start of fcd62e1
[08:34:51] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:50:04] <mobrovac>	 !log restbase deploy end of fcd62e1
[08:50:09] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:52:42] <grrrit-wm>	 (CR) Mobrovac: [C: 1] "Cherry-picked in Beta as well, works." [puppet] - https://gerrit.wikimedia.org/r/291857 (owner: Mobrovac)
[09:08:23] <wikibugs>	 Operations, Performance-Team, Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2340504 (Gilles)
[09:19:46] <grrrit-wm>	 (PS4) Mobrovac: Partially port RESTBaseUpdateJobs to change propagation. [puppet] - https://gerrit.wikimedia.org/r/291201 (owner: Ppchelko)
[09:39:27] <grrrit-wm>	 (PS1) Gehel: Change expired file zoom level from 16 to 15. [puppet] - https://gerrit.wikimedia.org/r/291885 (https://phabricator.wikimedia.org/T136483)
[09:43:13] <grrrit-wm>	 (CR) Gehel: "Puppet compiler output: https://puppet-compiler.wmflabs.org/2997/" [puppet] - https://gerrit.wikimedia.org/r/291885 (https://phabricator.wikimedia.org/T136483) (owner: Gehel)
[09:49:28] <wikibugs>	 Operations, MediaWiki-General-or-Unknown, Performance-Team: Update limit.sh to support systemd-based cgroup management - https://phabricator.wikimedia.org/T136603#2340617 (Joe)
[09:59:50] <grrrit-wm>	 (CR) Jcrespo: "I can confirm UID is correct and key is not shared with labs." [puppet] - https://gerrit.wikimedia.org/r/291255 (https://phabricator.wikimedia.org/T136417) (owner: Ladsgroup)
[10:06:15] <wikibugs>	 Operations, Traffic, Browser-Support-Firefox, HTTPS: Secure connection failed when attempting to send POST request (if connection has been idle for a while; disabling HTTP/2 helps) - https://phabricator.wikimedia.org/T134869#2340659 (Aklapper)
[10:11:25] <wikibugs>	 Operations, Traffic, Browser-Support-Firefox, HTTPS: Secure connection failed when attempting to send POST request using HTTP/2 (if connection has been idle for a certain time) - https://phabricator.wikimedia.org/T134869#2340670 (Danny_B)
[10:19:13] <grrrit-wm>	 (PS54) Alexandros Kosiaris: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403 (owner: Ladsgroup)
[10:19:41] <grrrit-wm>	 (PS3) Muehlenhoff: Stop using package->latest in gerrit module [puppet] - https://gerrit.wikimedia.org/r/291762 (https://phabricator.wikimedia.org/T115348)
[10:19:52] <grrrit-wm>	 (CR) Muehlenhoff: [C: 2 V: 2] Stop using package->latest in gerrit module [puppet] - https://gerrit.wikimedia.org/r/291762 (https://phabricator.wikimedia.org/T115348) (owner: Muehlenhoff)
[10:22:09] <grrrit-wm>	 (CR) Alexandros Kosiaris: "amended per 20after4's recommendation" [puppet] - https://gerrit.wikimedia.org/r/280403 (owner: Ladsgroup)
[10:23:01] <grrrit-wm>	 (PS2) Jcrespo: Add ladsgroup user key and data and production cluster access [puppet] - https://gerrit.wikimedia.org/r/291255 (https://phabricator.wikimedia.org/T136417) (owner: Ladsgroup)
[10:23:06] <grrrit-wm>	 (PS2) Alexandros Kosiaris: service::node: Provide MW API and RESTBase request templates [puppet] - https://gerrit.wikimedia.org/r/291857 (owner: Mobrovac)
[10:23:11] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2 V: 2] service::node: Provide MW API and RESTBase request templates [puppet] - https://gerrit.wikimedia.org/r/291857 (owner: Mobrovac)
[10:23:56] <grrrit-wm>	 (PS3) Jcrespo: Add ladsgroup user key and data and production cluster access [puppet] - https://gerrit.wikimedia.org/r/291255 (https://phabricator.wikimedia.org/T136417) (owner: Ladsgroup)
[10:25:44] <grrrit-wm>	 (CR) Jcrespo: [C: 2] Add ladsgroup user key and data and production cluster access [puppet] - https://gerrit.wikimedia.org/r/291255 (https://phabricator.wikimedia.org/T136417) (owner: Ladsgroup)
[10:26:36] <wikibugs>	 Operations, MediaWiki-General-or-Unknown, Performance-Team: Update limit.sh to support systemd-based cgroup management - https://phabricator.wikimedia.org/T136603#2340617 (akosiaris) Installing libpam-systemd seems to solve the above for logged in users, need to check if it solves it as well for serv...
[10:33:46] <wikibugs>	 Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stats for Ladsgroup - https://phabricator.wikimedia.org/T136417#2340736 (jcrespo) With your description, I see you want statistics-privatedata-users which will give you access to the web request logs.   However, while I see you h...
[10:39:15] <wikibugs>	 Operations, Ops-Access-Requests, Patch-For-Review: Requesting deployment access (for deploying to scb) for Ladsgroup - https://phabricator.wikimedia.org/T136406#2340741 (jcrespo) @Ladsgroup I have granted you access to the cluster (although not yet to scb or other hosts). While I work on the rest of...
[10:51:18] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 2 failures
[11:01:29] <icinga-wm>	 PROBLEM - puppet last run on mc2015 is CRITICAL: CRITICAL: puppet fail
[11:05:15] <icinga-wm>	 RECOVERY - MegaRAID on db1049 is OK: OK: optimal, 1 logical, 2 physical
[11:09:24] <grrrit-wm>	 (CR) Elukey: "LGTM, maybe we could discuss Filippo's suggestion and then decide if merge or add the change?" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/291752 (https://phabricator.wikimedia.org/T132835) (owner: Ema)
[11:14:01] <wikibugs>	 Operations, Analytics-Kanban: Jmxtrans failures on Kafka hosts caused metric holes in grafana - https://phabricator.wikimedia.org/T136405#2340779 (elukey) @mforns: Sure! I think that we should follow up on the items in the task's description, namely:  1) follow up with upstream and package/test/deploy th...
[11:20:49] <grrrit-wm>	 (CR) Hashar: "recheck" [puppet] - https://gerrit.wikimedia.org/r/291116 (https://phabricator.wikimedia.org/T136188) (owner: Hashar)
[11:24:39] <grrrit-wm>	 (PS2) Hashar: (DO NOT SUBMIT) chromium on hold, drop ensure => latest [puppet] - https://gerrit.wikimedia.org/r/291116 (https://phabricator.wikimedia.org/T136188)
[11:26:45] <icinga-wm>	 RECOVERY - puppet last run on mc2015 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[11:34:16] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 645 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6160116 keys - replication_delay is 645
[11:47:15] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:48:08] <grrrit-wm>	 (PS1) KartikMistry: Beta: Enable Compact Language Links for new users [mediawiki-config] - https://gerrit.wikimedia.org/r/291908 (https://phabricator.wikimedia.org/T136161)
[11:51:18] <grrrit-wm>	 (PS1) Muehlenhoff: Stop installing PHP on jessie app servers [puppet] - https://gerrit.wikimedia.org/r/291909
[11:59:16] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6068894 keys - replication_delay is 0
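The Redis alert fired when replication_delay reached 645 against what appears to be a 600-second critical threshold, and cleared once the delay was back to 0. A minimal Python sketch of such a check: it parses the `key:value` lines of Redis INFO output and classifies a delay. The log does not show which INFO field the real check reads, so using `master_last_io_seconds_ago` as the delay, and the 300-second warning level, are assumptions.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse Redis INFO output ('# Section' headers and 'key:value' lines)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def replication_status(delay_s: int, warning_s: int = 300, critical_s: int = 600) -> str:
    """Classify a replication delay in seconds against warning/critical thresholds."""
    if delay_s >= critical_s:
        return "CRITICAL"
    if delay_s >= warning_s:
        return "WARNING"
    return "OK"


info = parse_info("# Replication\r\nrole:slave\r\nmaster_last_io_seconds_ago:645\r\n")
print(replication_status(int(info["master_last_io_seconds_ago"])))  # CRITICAL
```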
[12:07:56] <wikibugs>	 Operations, MediaWiki-General-or-Unknown, Performance-Team: Update limit.sh to support systemd-based cgroup management - https://phabricator.wikimedia.org/T136603#2340856 (Joe) @akosiaris libpam-systemd solves the problem for users with a login session, as it registers a user slice etc.   It doesn't...
[12:09:58] <grrrit-wm>	 (PS6) Muehlenhoff: Add a new backup set to backup openldap databases and enable on serpens [puppet] - https://gerrit.wikimedia.org/r/289824 (https://phabricator.wikimedia.org/T120919)
[12:10:17] <grrrit-wm>	 (CR) Muehlenhoff: [C: 2 V: 2] Add a new backup set to backup openldap databases and enable on serpens [puppet] - https://gerrit.wikimedia.org/r/289824 (https://phabricator.wikimedia.org/T120919) (owner: Muehlenhoff)
[12:20:17] <Amir1>	 jynus: hey, around?
[12:21:50] <moritzm>	 !log continue rolling reboot of mc2* systems for Linux 4.4 upgrade
[12:21:55] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:23:08] <wikibugs>	 Operations, ops-eqiad: HP Warning on boot [Firmware Bug]: the BIOS has corrupted hw-PMU resources - https://phabricator.wikimedia.org/T136345#2340874 (jcrespo)
[12:24:11] <Amir1>	 jynus: I was able to access for the first time but I saw some warnings that I want to share with you. 
[12:24:22] <Amir1>	 maybe there is a MITM
[12:24:23] <jynus>	 Amir1, please do
[12:41:48] <wikibugs>	 Operations, Ops-Access-Requests, Patch-For-Review: Requesting deployment access (for deploying to scb) for Ladsgroup - https://phabricator.wikimedia.org/T136406#2340892 (Ladsgroup) @jcrespo [[https://people.wikimedia.org/~ladsgroup/|This]] might answer your question \o/
[12:51:43] <wikibugs>	 Operations, ops-eqiad, fundraising-tech-ops: investigate RAID failure on beryllium.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T135178#2340902 (Aklapper) >>! In T135178#2292645, @Cmjohnson wrote: > @jgreen,  We will need to scheduled down time to replace the disk.  @Cmjohnson / @jgreen: Has...
[12:58:47] <wikibugs>	 Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stats for Ladsgroup - https://phabricator.wikimedia.org/T136417#2340906 (Ladsgroup) I only need access to stats1002 without having access to private data
[13:00:33] <grrrit-wm>	 (PS1) Hashar: zuul: log 'connection' bucket [puppet] - https://gerrit.wikimedia.org/r/291913
[13:06:08] <wikibugs>	 Operations, Analytics-Kanban: Jmxtrans failures on Kafka hosts caused metric holes in grafana - https://phabricator.wikimedia.org/T136405#2340918 (mforns) Thanks @elukey!
[13:07:38] <grrrit-wm>	 (PS2) BBlack: raise fe mem size to 37% on text and upload [puppet] - https://gerrit.wikimedia.org/r/291593 (https://phabricator.wikimedia.org/T135384)
[13:07:40] <grrrit-wm>	 (PS2) BBlack: varnish: jemalloc tuning for frontend caches [puppet] - https://gerrit.wikimedia.org/r/291592 (https://phabricator.wikimedia.org/T135384)
[13:07:53] <grrrit-wm>	 (PS1) BBlack: tlsproxy: double ssl session cache size [puppet] - https://gerrit.wikimedia.org/r/291914
[13:09:10] <grrrit-wm>	 (CR) BBlack: varnish: jemalloc tuning for frontend caches (1 comment) [puppet] - https://gerrit.wikimedia.org/r/291592 (https://phabricator.wikimedia.org/T135384) (owner: BBlack)
[13:16:34] <grrrit-wm>	 (CR) Lokal Profil: "My apologies.I didn't know I was the one who could add it to a deploy window. I'll figure out which works for me and add it." [mediawiki-config] - https://gerrit.wikimedia.org/r/288582 (https://phabricator.wikimedia.org/T135212) (owner: Lokal Profil)
[13:20:11] <wikibugs>	 Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stats for Ladsgroup - https://phabricator.wikimedia.org/T136417#2340940 (jcrespo) From what I undestand after talking on IRC to Ladsgroup, then the right groups is analytics-users: access to stats1002 and stats1004 with no direct...
[13:27:41] <grrrit-wm>	 (PS1) Elukey: Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963)
[13:28:13] <icinga-wm>	 PROBLEM - Disk space on ms-be2012 is CRITICAL: DISK CRITICAL - free space: / 2124 MB (3% inode=96%)
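The disk alert reports free space on / in absolute and percentage terms (2124 MB, 3%). A minimal Python sketch of a similar check using `shutil.disk_usage`; the 10% / 5% thresholds are assumptions, and the Nagios-style plugin behind the real alert may account for reserved blocks differently.

```python
import shutil
from typing import Tuple


def classify_free_pct(free_pct: float, warning: float = 10.0, critical: float = 5.0) -> str:
    """Map a free-space percentage to a Nagios-style status."""
    if free_pct < critical:
        return "CRITICAL"
    if free_pct < warning:
        return "WARNING"
    return "OK"


def check_disk(path: str = "/") -> Tuple[str, int, float]:
    """Return (status, free MB, free %) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_pct = 100.0 * usage.free / usage.total
    return classify_free_pct(free_pct), usage.free // (1024 * 1024), free_pct
```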
[13:32:26] <grrrit-wm>	 (PS2) Elukey: Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963)
[13:33:32] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963) (owner: Elukey)
[13:33:50] <wikibugs>	 Puppet, ORES, Revision-Scoring-As-A-Service: ORES-staging is broken due to service::uwsgi mandatory scap::target invoke - https://phabricator.wikimedia.org/T136488#2340955 (Ladsgroup) Got a workaround with adding deployment-tin.deployment-prep.eqiad.wmflabs' IP as tin.eqiad.wmnet in /etc/hosts
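The ORES-staging workaround maps tin.eqiad.wmnet to the deployment-prep deployment host's IP in /etc/hosts. A minimal, idempotent Python sketch of that edit, operating on the file's text; the IP below is a documentation placeholder (192.0.2.10), not the real deployment-tin address, and the helper name is hypothetical.

```python
def add_hosts_override(hosts_text: str, ip: str, hostname: str) -> str:
    """Append an 'ip<TAB>hostname' line unless that exact mapping already exists."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        if fields and fields[0] == ip and hostname in fields[1:]:
            return hosts_text  # mapping already present: leave the file unchanged
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip}\t{hostname}\n"


PLACEHOLDER_IP = "192.0.2.10"  # hypothetical; substitute deployment-tin's actual address
print(add_hosts_override("127.0.0.1\tlocalhost\n", PLACEHOLDER_IP, "tin.eqiad.wmnet"), end="")
```

Being idempotent matters if the edit is applied on every Puppet run rather than once by hand.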
[13:34:05] <grrrit-wm>	 (PS1) Jcrespo: Add Ladsgroup to analytics-users [puppet] - https://gerrit.wikimedia.org/r/291918 (https://phabricator.wikimedia.org/T136417)
[13:34:13] <grrrit-wm>	 (Abandoned) Ladsgroup: service: Let other methods of deployment work in uwsgi [puppet] - https://gerrit.wikimedia.org/r/291527 (owner: Ladsgroup)
[13:34:35] <grrrit-wm>	 (CR) Jcrespo: [C: -2] "Not yet sponsored." [puppet] - https://gerrit.wikimedia.org/r/291918 (https://phabricator.wikimedia.org/T136417) (owner: Jcrespo)
[13:36:35] <grrrit-wm>	 (PS3) Elukey: Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963)
[13:41:55] <grrrit-wm>	 (PS4) Elukey: Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963)
[13:56:09] <grrrit-wm>	 (CR) Elukey: "Puppet compiler looks good: https://puppet-compiler.wmflabs.org/3003/" [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963) (owner: Elukey)
[14:00:55] <icinga-wm>	 RECOVERY - Disk space on ms-be2012 is OK: DISK OK
[14:14:38] <wikibugs>	 Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stats for Ladsgroup - https://phabricator.wikimedia.org/T136417#2340987 (Halfak) +1  I'm working with @Ladsgroup and @TJones on some modeling work that will require access to private data on the stats machines.
[14:15:57] <grrrit-wm>	 (PS1) Muehlenhoff: Add firejail profile and wrapper for ghostscript [puppet] - https://gerrit.wikimedia.org/r/291924 (https://phabricator.wikimedia.org/T135111)
[14:16:08] <Amir1>	 jynus: ^
[14:17:40] <wikibugs>	 Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2340989 (Halfak)
[14:18:15] <grrrit-wm>	 (CR) Jcrespo: [C: 1] "confirmed by User:Halfak_(WMF) on https://phabricator.wikimedia.org/T136417#2340987" [puppet] - https://gerrit.wikimedia.org/r/291918 (https://phabricator.wikimedia.org/T136417) (owner: Jcrespo)
[14:18:15] <wikibugs>	 Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2341003 (Halfak) See @Ladsgroup's access request here: T136406  We'll need similar rights to be able to update and deploy #ORES.
[14:19:19] <wikibugs>	 Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2341005 (jcrespo) a:jcrespo
[14:19:23] <wikibugs>	 Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2341006 (Halfak) Notes that @Ladsgroup's permissions were added in this patch: https://gerrit.wikimedia.org/r/291255
[14:20:00] <wikibugs>	 Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2341007 (Halfak) Actually, it looks like https://gerrit.wikimedia.org/r/#/c/291716/ might be more relevant.
[14:20:36] <andrewbogott>	 subbu: I'd like to migrate wikitextexp to a different virt host, is it ok if there's an hour or so of downtime with that instance?
[14:20:54] <grrrit-wm>	 (CR) Faidon Liambotis: Increase time before alter for elasticsearch disk space issues (5 comments) [puppet] - https://gerrit.wikimedia.org/r/290487 (owner: Gehel)
[14:21:42] <wikibugs>	 Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2341008 (Ladsgroup) https://gerrit.wikimedia.org/r/291716 is actually adding @halfak's permission as well
[14:23:07] <grrrit-wm>	 (CR) Jcrespo: [C: 2] Add Ladsgroup to analytics-users [puppet] - https://gerrit.wikimedia.org/r/291918 (https://phabricator.wikimedia.org/T136417) (owner: Jcrespo)
[14:24:01] <Amir1>	 thanks :)
[14:24:26] <Amir1>	 I probably need to wait until puppetmasters catch up
[14:27:55] <jynus>	 mmm, I do not see puppet creating your user
[14:34:11] <grrrit-wm>	 (CR) Gehel: Increase time before alter for elasticsearch disk space issues (5 comments) [puppet] - https://gerrit.wikimedia.org/r/290487 (owner: Gehel)
[14:36:39] <icinga-wm>	 PROBLEM - Host beryllium is DOWN: PING CRITICAL - Packet loss = 100%
[14:37:23] <grrrit-wm>	 (PS3) BBlack: varnish: jemalloc tuning for frontend caches [puppet] - https://gerrit.wikimedia.org/r/291592 (https://phabricator.wikimedia.org/T135384)
[14:37:36] <grrrit-wm>	 (CR) BBlack: [C: 2 V: 2] varnish: jemalloc tuning for frontend caches [puppet] - https://gerrit.wikimedia.org/r/291592 (https://phabricator.wikimedia.org/T135384) (owner: BBlack)
[14:37:50] <grrrit-wm>	 (PS3) BBlack: raise fe mem size to 37% on text and upload [puppet] - https://gerrit.wikimedia.org/r/291593 (https://phabricator.wikimedia.org/T135384)
[14:38:07] <grrrit-wm>	 (CR) BBlack: [C: 2 V: 2] raise fe mem size to 37% on text and upload [puppet] - https://gerrit.wikimedia.org/r/291593 (https://phabricator.wikimedia.org/T135384) (owner: BBlack)
[14:38:17] <jynus>	 icinga says 1/2
[14:38:43] <jynus>	 oh, a wild downtime appears!
[14:39:14] <wikibugs>	 Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2340989 (DarTar) Supporting this request as @halfak's manager. This will substantially improve ORES release workflow.
[14:39:45] <jynus>	 author marvin-bot ?
[14:41:27] <grrrit-wm>	 (PS6) Gehel: Increase time before alter for elasticsearch disk space issues [puppet] - https://gerrit.wikimedia.org/r/290487
[14:41:43] <grrrit-wm>	 (CR) Gehel: Increase time before alter for elasticsearch disk space issues (2 comments) [puppet] - https://gerrit.wikimedia.org/r/290487 (owner: Gehel)
[14:41:52] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Increase time before alter for elasticsearch disk space issues [puppet] - https://gerrit.wikimedia.org/r/290487 (owner: Gehel)
[14:46:50] <grrrit-wm>	 (PS2) Thcipriani: Scap3 config for tilerator [puppet] - https://gerrit.wikimedia.org/r/291268 (https://phabricator.wikimedia.org/T129146)
[14:49:06] <grrrit-wm>	 (PS7) Gehel: Increase time before alter for elasticsearch disk space issues [puppet] - https://gerrit.wikimedia.org/r/290487
[14:52:57] <wikibugs>	 Operations, ops-codfw: Faulty RAM on mc2001 - https://phabricator.wikimedia.org/T136558#2341072 (RobH) mc2001 is out of warranty for about a year now.  I'll check with @mark and see if we're planning on replacing these in Q1 of next fiscal, or later.  (Later and we'll want to replace the faulty memory.)...
[14:53:55] <grrrit-wm>	 (PS1) Thcipriani: Scap3 config for Kartotherian [puppet] - https://gerrit.wikimedia.org/r/291930 (https://phabricator.wikimedia.org/T129150)
[14:54:15] <moritzm>	 !log rolling reboot of scb systems in codfw for Linux 4.4 upgrade
[14:54:16] <wikibugs>	 Operations, ops-eqiad: HP Warning on boot [Firmware Bug]: the BIOS has corrupted hw-PMU resources - https://phabricator.wikimedia.org/T136345#2341077 (RobH) a:RobH This isn't really ops-eqiad, but an issue for all HP setups onsite.  I'll keep this assigned to me until the documentation is updated on...
[14:54:21] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:55:02] <_joe_>	  /win 23
[14:55:39] <subbu>	 andrewbogott, sounds good. i have 2 vms .. mw-expt and mw-base .. do you mean both of those or just one? but, either is okay.
[14:56:22] <andrewbogott>	 subbu: oh sorry, I was reading the wrong line :)  Instance name 'mw-base '
[14:56:29] <andrewbogott>	 I'll start moving it right now
[14:56:38] <subbu>	 k
[15:00:00] <grrrit-wm>	 (PS1) Jcrespo: Add analytics-users accounts to stats1002 [puppet] - https://gerrit.wikimedia.org/r/291932
[15:00:04] <jouncebot>	 anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160531T1500). Please do the needful.
[15:00:04] <jouncebot>	 mobrovac Pchelolo: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[15:00:42] <thcipriani>	 I can SWAT today: mobrovac Pchelolo ping me when you're around
[15:00:51] <mobrovac>	 here Th
[15:00:54] <mobrovac>	 here thcipriani
[15:00:59] <mobrovac>	 let's go :)
[15:01:04] <grrrit-wm>	 (CR) Jcrespo: "I do not know if this is right or documentation should be updated." [puppet] - https://gerrit.wikimedia.org/r/291932 (owner: Jcrespo)
[15:01:27] <grrrit-wm>	 (PS2) Thcipriani: Math: Enable MathML everywhere but private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/291766 (https://phabricator.wikimedia.org/T131177) (owner: Mobrovac)
[15:01:29] <Pchelolo>	 thcipriani: I'm here too
[15:02:15] <grrrit-wm>	 (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/291766 (https://phabricator.wikimedia.org/T131177) (owner: Mobrovac)
[15:03:02] <grrrit-wm>	 (Merged) jenkins-bot: Math: Enable MathML everywhere but private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/291766 (https://phabricator.wikimedia.org/T131177) (owner: Mobrovac)
[15:03:42] <bblack>	 !log restarting all frontend caches for new memory params (randomized order, ~1-min spacing, ~2h to completion) - T135384
[15:03:43] <stashbot>	 T135384: Raise cache frontend memory sizes significantly - https://phabricator.wikimedia.org/T135384
[15:03:48] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:05:43] <wikibugs>	 Operations, Labs, Wikimedia-Video, Need-volunteer: Upload the Wikimania 2014 videos to Commons - https://phabricator.wikimedia.org/T106038#2341123 (chasemp) p:Triage>Low
[15:05:59] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:291766|Math: Enable MathML everywhere but private wikis]] (duration: 00m 34s)
[15:06:02] <thcipriani>	 ^ mobrovac check please
[15:06:05] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:06:38] <mobrovac>	 checking
[15:07:36] <moritzm>	 !log rebooting mendelevium (ticket.wikimedia.org) for update to Linux 4.4
[15:07:41] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:08:19] <icinga-wm>	 RECOVERY - Host beryllium is UP: PING OK - Packet loss = 0%, RTA = 2.31 ms
[15:08:21] <mobrovac>	 thcipriani: works!
[15:08:22] <mobrovac>	 thnx!
[15:08:30] <thcipriani>	 mobrovac: cool, thanks for checking!
[15:08:47] <grrrit-wm>	 (PS4) Andrew Bogott: Allow horizon to query the labs puppetmaster for a list of classes [puppet] - https://gerrit.wikimedia.org/r/284103
[15:08:59] <icinga-wm>	 PROBLEM - HHVM rendering on mw1180 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:09:20] <icinga-wm>	 PROBLEM - Apache HTTP on mw1180 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:10:19] <wikibugs>	 Operations, ops-codfw, DBA: es2017 and es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2341138 (Papaul) @Volans it is not a problem to update the RAID controller firmware .
[15:10:27] <logmsgbot>	 !log thcipriani@tin Synchronized php-1.28.0-wmf.3/extensions/EventBus/EventBus.hooks.php: SWAT: [[gerrit:291904|Replace wfUrlEncode with rawurlencode]] (duration: 00m 27s)
[15:10:33] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:10:35] <thcipriani>	 ^ Pchelolo check please
[15:12:28] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] network: Move into module [puppet] - https://gerrit.wikimedia.org/r/291234 (owner: Alexandros Kosiaris)
[15:12:33] <grrrit-wm>	 (PS7) Alexandros Kosiaris: network: Move into module [puppet] - https://gerrit.wikimedia.org/r/291234
[15:12:41] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2 V: 2] network: Move into module [puppet] - https://gerrit.wikimedia.org/r/291234 (owner: Alexandros Kosiaris)
[15:13:29] <Pchelolo>	 thcipriani: it's not easy to check, but from what I can tell it's ok
[15:13:31] <Pchelolo>	 thank you
[15:13:39] <grrrit-wm>	 (PS1) Jcrespo: Move ladsgroup from analytics-users to statistics-users [puppet] - https://gerrit.wikimedia.org/r/291933 (https://phabricator.wikimedia.org/T136417)
[15:13:39] <thcipriani>	 Pchelolo: ack. Thanks.
[15:14:56] <grrrit-wm>	 (PS2) Jcrespo: Move ladsgroup from analytics-users to statistics-users [puppet] - https://gerrit.wikimedia.org/r/291933 (https://phabricator.wikimedia.org/T136417)
[15:15:09] <icinga-wm>	 PROBLEM - check_apache2 on payments2001 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2
[15:15:31] <wikibugs>	 Operations, ops-codfw: lvs2006 degraded RAID - https://phabricator.wikimedia.org/T136584#2341163 (Papaul) p:Triage>High
[15:15:39] <grrrit-wm>	 (CR) Lokal Profil: "Added to the Wednesday, June 01 Morning SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/288582 (https://phabricator.wikimedia.org/T135212) (owner: Lokal Profil)
[15:16:39] <grrrit-wm>	 (PS2) Alexandros Kosiaris: sca: remove cxserver-admin [puppet] - https://gerrit.wikimedia.org/r/291785
[15:16:49] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2 V: 2] sca: remove cxserver-admin [puppet] - https://gerrit.wikimedia.org/r/291785 (owner: Alexandros Kosiaris)
[15:16:52] <wikibugs>	 Operations, DBA, Labs, Tool-Labs: Replicate wikimania2017wiki to labs - https://phabricator.wikimedia.org/T126096#2341169 (chasemp) p:Triage>Normal
[15:17:19] <grrrit-wm>	 (PS5) Andrew Bogott: Allow horizon to query the labs puppetmaster for a list of classes [puppet] - https://gerrit.wikimedia.org/r/284103
[15:17:35] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] Introduce ores.svc.eqiad.wmnet [dns] - https://gerrit.wikimedia.org/r/277725 (https://phabricator.wikimedia.org/T124202) (owner: Alexandros Kosiaris)
[15:18:19] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Allow horizon to query the labs puppetmaster for a list of classes [puppet] - https://gerrit.wikimedia.org/r/284103 (owner: Andrew Bogott)
[15:19:16] <wikibugs>	 Operations, Discovery, Discovery-Search-Backlog, Labs, hardware-requests: eqiad: (2) Relevance forge servers - https://phabricator.wikimedia.org/T131184#2341198 (chasemp) p:Triage>Normal
[15:19:19] <icinga-wm>	 PROBLEM - puppet last run on db1022 is CRITICAL: CRITICAL: puppet fail
[15:19:20] <grrrit-wm>	 (PS6) Andrew Bogott: Allow horizon to query the labs puppetmaster for a list of classes [puppet] - https://gerrit.wikimedia.org/r/284103
[15:19:26] <wikibugs>	 Puppet, Labs, Phabricator: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2341199 (chasemp) p:Triage>Normal
[15:20:09] <icinga-wm>	 PROBLEM - check_apache2 on payments2002 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2
[15:20:10] <icinga-wm>	 PROBLEM - check_apache2 on payments2001 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2
[15:20:10] <icinga-wm>	 PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:21:35] <grrrit-wm>	 (PS1) Eevans: Upgrade eqiad rack 'a' to Cassandra 2.2 [puppet] - https://gerrit.wikimedia.org/r/291935 (https://phabricator.wikimedia.org/T126629)
[15:24:03] <grrrit-wm>	 (PS3) Jcrespo: Move ladsgroup from analytics-users to statistics-users [puppet] - https://gerrit.wikimedia.org/r/291933 (https://phabricator.wikimedia.org/T136417)
[15:25:05] <wikibugs>	 Operations, ops-codfw, DBA: es2017 and es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2341216 (jcrespo) Do you need downtime for that? If yes, let's program a time.
[15:25:09] <icinga-wm>	 RECOVERY - check_apache2 on payments2001 is OK: PROCS OK: 6 processes with command name apache2
[15:25:10] <icinga-wm>	 PROBLEM - check_apache2 on payments2002 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2
[15:25:11] <icinga-wm>	 PROBLEM - check_apache2 on payments2003 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2
[15:25:11] <icinga-wm>	 PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:25:39] <grrrit-wm>	 (CR) Jcrespo: [C: 2] Move ladsgroup from analytics-users to statistics-users [puppet] - https://gerrit.wikimedia.org/r/291933 (https://phabricator.wikimedia.org/T136417) (owner: Jcrespo)
[15:28:17] <andrewbogott>	 subbu: all done
[15:28:18] <grrrit-wm>	 (PS55) Alexandros Kosiaris: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403 (owner: Ladsgroup)
[15:29:01] <subbu>	 andrewbogott, ok. thanks.
[15:29:40] <wikibugs>	 Operations, Labs, Labs-Infrastructure, Traffic: Move californium to an internal host? - https://phabricator.wikimedia.org/T133149#2341254 (chasemp) p:Triage>Normal
[15:30:10] <icinga-wm>	 RECOVERY - check_apache2 on payments2002 is OK: PROCS OK: 6 processes with command name apache2
[15:30:10] <icinga-wm>	 RECOVERY - check_apache2 on payments2003 is OK: PROCS OK: 6 processes with command name apache2
[15:30:10] <icinga-wm>	 RECOVERY - check_puppetrun on payments2002 is OK: OK: Puppet is currently enabled, last run 227 seconds ago with 0 failures
[15:31:38] <wikibugs>	 Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stats for Ladsgroup - https://phabricator.wikimedia.org/T136417#2341265 (jcrespo) Open>Resolved User confirmed on IRC the access.
[15:32:10] <wikibugs>	 Operations, Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure, Monitoring: Have a paging check for Nova API accessible - https://phabricator.wikimedia.org/T133656#2341273 (chasemp) p:Triage>High I believe this is still happening on infrequently
[15:32:39] <grrrit-wm>	 (Abandoned) Jcrespo: Add analytics-users accounts to stats1002 [puppet] - https://gerrit.wikimedia.org/r/291932 (owner: Jcrespo)
[15:36:49] <grrrit-wm>	 (CR) Ottomata: [C: 1] Set Kafka default cleanup policy to 'delete' to avoid any compaction with 0.9 [puppet] - https://gerrit.wikimedia.org/r/291697 (owner: Elukey)
[15:37:58] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] "merging" [puppet] - https://gerrit.wikimedia.org/r/280403 (owner: Ladsgroup)
[15:38:04] <grrrit-wm>	 (PS56) Alexandros Kosiaris: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403 (owner: Ladsgroup)
[15:38:08] <grrrit-wm>	 (CR) Alexandros Kosiaris: [V: 2] ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403 (owner: Ladsgroup)
[15:42:50] <icinga-wm>	 PROBLEM - Restbase root url on restbase1012 is CRITICAL: Connection refused
[15:43:00] <icinga-wm>	 PROBLEM - Host cp3016 is DOWN: PING CRITICAL - Packet loss = 100%
[15:43:03] <urandom>	 ^^ looking
[15:43:10] <icinga-wm>	 PROBLEM - Host cp3006 is DOWN: PING CRITICAL - Packet loss = 100%
[15:43:10] <icinga-wm>	 PROBLEM - Host cp3037 is DOWN: PING CRITICAL - Packet loss = 100%
[15:43:10] <icinga-wm>	 PROBLEM - Host cp3045 is DOWN: PING CRITICAL - Packet loss = 100%
[15:43:11] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.32.79, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[15:43:20] <icinga-wm>	 RECOVERY - Host cp3006 is UP: PING OK - Packet loss = 0%, RTA = 84.00 ms
[15:43:20] <icinga-wm>	 RECOVERY - Host cp3045 is UP: PING OK - Packet loss = 0%, RTA = 83.60 ms
[15:43:20] <icinga-wm>	 RECOVERY - Host cp3037 is UP: PING OK - Packet loss = 0%, RTA = 83.51 ms
[15:43:29] <icinga-wm>	 RECOVERY - Host cp3016 is UP: PING OK - Packet loss = 0%, RTA = 83.26 ms
[15:44:14] <yurik>	 thcipriani, hi, want to hold my hand for scap3 in 1.25 hours?
[15:44:16] <urandom>	 mobrovac: restbase on 1012 is down again, it logged a shutdown
[15:44:45] <thcipriani>	 yurik: if we can find an opsen that's willing to hold both our hands :)
[15:44:54] <yurik>	 gehel, ^?
[15:45:00] <icinga-wm>	 RECOVERY - puppet last run on db1022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:45:38] <mobrovac>	 uf
[15:45:41] <mobrovac>	 kk thnx urandom
[15:45:55] <urandom>	 mobrovac: gremlins?
[15:51:14] <urandom>	 !log Disabling puppet in preparation for upgrade on restbase1007, 1010, and 1011 : T126629
[15:51:15] <stashbot>	 T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629
[15:51:20] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:53:21] * gehel reading back...
[15:54:00] <icinga-wm>	 PROBLEM - Host beryllium is DOWN: PING CRITICAL - Packet loss = 100%
[15:54:25] <gehel>	 yurik, thcipriani: I know mostly nothing about scap3, so I can hold your virtual hands if you want, but that's probably the extent of my contribution
[15:54:40] <icinga-wm>	 RECOVERY - Restbase root url on restbase1012 is OK: HTTP OK: HTTP/1.1 200 - 15273 bytes in 0.009 second response time
[15:54:45] <yurik>	 thcipriani, what would we need from opsen?
[15:55:10] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy
[15:55:23] <thcipriani>	 yurik: we just need to merge the two puppet patches + run puppet in a couple places.
[15:55:47] <yurik>	 thcipriani, who would be the most ideal opsen for the task?
[15:55:53] <yurik>	 (and the 2nd idea)
[15:55:55] <thcipriani>	 _joe_: and godog have been helping out with the scap3 migrations
[15:55:56] <yurik>	 ideal)
[15:56:34] <thcipriani>	 yurik: maybe we can schedule the move for your services with them soonish. They may be gone already :\
[15:56:51] <yurik>	 thcipriani, i will be traveling in the next few days, will not be very stable for me
[15:57:00] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[15:57:19] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[15:57:28] <_joe_>	 thcipriani: both me and godog are on vacation for the rest of the week btw
[15:57:51] <thcipriani>	 _joe_: oh, was unaware, sorry for the ping, thanks for the update.
[15:58:26] <thcipriani>	 yurik: maybe it'd be best to schedule for next week then, from the sounds of it.
[15:58:29] <_joe_>	 thcipriani: well I am here tomorrow, but it seems yurik won't :P
[15:58:52] <yurik>	 sounds like next week it is :)
[15:59:00] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[15:59:07] <grrrit-wm>	 (PS1) Alexandros Kosiaris: Enable ores.svc.eqiad.wmnet IP on scb cluster [puppet] - https://gerrit.wikimedia.org/r/291942
[15:59:09] <grrrit-wm>	 (PS1) Alexandros Kosiaris: conftool: Add ores.svc.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/291943
[15:59:11] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[15:59:11] <grrrit-wm>	 (PS1) Alexandros Kosiaris: Add ores-admin to scb cluster [puppet] - https://gerrit.wikimedia.org/r/291944
[15:59:13] <grrrit-wm>	 (PS1) Alexandros Kosiaris: lvs: add ores [puppet] - https://gerrit.wikimedia.org/r/291945
[15:59:18] <yurik>	 i could even do it during the day if needed (i'm on UTC+3 time)
[16:00:04] <jouncebot>	 godog moritzm coreyfloyd: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160531T1600).
[16:00:04] <jouncebot>	 urandom: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[16:00:12] * urandom is present
[16:00:13] <urandom>	 o/
[16:00:23] <_joe_>	 I can puppetswat if no one's around
[16:00:58] <grrrit-wm>	 (PS2) Alexandros Kosiaris: Enable ores.svc.eqiad.wmnet IP on scb cluster [puppet] - https://gerrit.wikimedia.org/r/291942
[16:01:04] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2 V: 2] Enable ores.svc.eqiad.wmnet IP on scb cluster [puppet] - https://gerrit.wikimedia.org/r/291942 (owner: Alexandros Kosiaris)
[16:01:07] <_joe_>	 urandom: how should we proceed?
[16:01:20] <_joe_>	 disable puppet on the hosts before upgrading?
[16:01:25] <urandom>	 _joe_: already done
[16:01:27] <_joe_>	 and then run sequentially?
[16:01:39] <mobrovac>	 who restarted rb on restbase1012?
[16:01:40] <grrrit-wm>	 (PS2) Giuseppe Lavagetto: Upgrade eqiad rack 'a' to Cassandra 2.2 [puppet] - https://gerrit.wikimedia.org/r/291935 (https://phabricator.wikimedia.org/T126629) (owner: Eevans)
[16:01:41] <thcipriani>	 yurik: kk, next week wfm. I want to make scap migrations a more regular calendar event just to make scheduling less ad-hoc.
[16:01:57] <urandom>	 _joe_: thank you sir!
[16:01:58] <_joe_>	 mobrovac: need me to take a look?
[16:02:07] <yurik>	 thcipriani, scap migrations or scap depls? :)
[16:02:11] <urandom>	 _joe_: also, you have a strange idea of how to vacation!
[16:02:15] <_joe_>	 urandom: so it's safe to merge if I understood correctly
[16:02:21] <_joe_>	 urandom: actually I am working tomorrow
[16:02:23] <urandom>	 _joe_: yup
[16:02:28] <mobrovac>	 no _joe_, i'm there, but it seems rb went down and came back up
[16:02:30] <urandom>	 _joe_: safe to merge
[16:02:34] <mobrovac>	 and i'm not sure what's up with that
[16:02:47] <thcipriani>	 yurik: just the initial migrations. the deploys should already mostly be on the calendar :)
[16:02:57] <_joe_>	 mobrovac: check /var/log/auth.log if someone did it by hand
[16:03:06] <mobrovac>	 kk thnx
[16:03:22] <_joe_>	 jenkins makes me sad
[16:03:31] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2 V: 2] Upgrade eqiad rack 'a' to Cassandra 2.2 [puppet] - https://gerrit.wikimedia.org/r/291935 (https://phabricator.wikimedia.org/T126629) (owner: Eevans)
[16:03:36] <urandom>	 \o/
[16:04:02] <_joe_>	 urandom: merged
[16:04:12] <_joe_>	 should I run puppet or are you going to do that?
[16:04:24] <urandom>	 _joe_: no, i will do  that
[16:04:30] <urandom>	 incrementally :)
[16:04:41] <urandom>	 _joe_: thank you!
[16:04:47] <_joe_>	 cool
[16:04:51] <_joe_>	 that was easy :P
[16:04:56] <urandom>	 yeah :)
[16:05:46] <_joe_>	 yurik: and btw, I guess gehel will need to learn about scap3 sooner or later ;)
[16:05:57] <_joe_>	 he's your guy, right?
[16:06:14] <gehel>	 _joe_: yep, I'm happy to follow that one to learn a bit about it...
[16:06:14] <yurik>	 _joe_, agreed, but i don't want him to get overloaded and run screaming for the mountains
[16:06:16] <urandom>	 !log Stopping restbase1007-{a,b,c} in preparation for upgrade : T126629
[16:06:17] <stashbot>	 T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629
[16:06:22] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:07:11] <_joe_>	 yurik: and me/filippo would be the alternatives not overloaded? ;)
[16:07:45] <bblack>	 !log depooling cp3032 to investigate T126062
[16:07:46] <stashbot>	 T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[16:07:49] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:07:51] <_joe_>	 my point is, it's a good thing for a few project your team has if gehel gets his ears wet with scap3
[16:08:00] <_joe_>	 *projects
[16:08:14] <yurik>	 hehe, totally, its in the pipeline :)
[16:08:17] * gehel fully agree with _joe_
[16:08:19] <_joe_>	 I'm sure me/alex/filippo will help 
[16:08:52] <yurik>	 but lets launch production maps first :)
[16:09:25] <mobrovac>	 so that you can forget about it?
[16:09:27] <mobrovac>	 tsc tsc
[16:09:40] <gehel>	 Concretely, what do I need to do for this scap3 deployment?
[16:10:09] <urandom>	 !log Upgrading Cassandra to 2.2.6 on restbase1007.eqiad.wmnet : T126629
[16:10:10] <stashbot>	 T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629
[16:10:14] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:10:59] <wikibugs>	 Operations, ops-codfw, DBA: db2034 degraded RAID - https://phabricator.wikimedia.org/T136583#2341475 (Papaul) p:Triage>Normal
[16:11:17] <wikibugs>	 Operations, ops-codfw, DBA: db2034 degraded RAID - https://phabricator.wikimedia.org/T136583#2339694 (Papaul) Will have the disk on site tomorrow  Dear Mr Papaul Tshibamba,  Thank you for contacting Hewlett Packard Enterprise for your service request. This email confirms your request for service and...
[16:11:51] <icinga-wm>	 PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures
[16:12:02] <icinga-wm>	 PROBLEM - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is CRITICAL: Connection refused
[16:12:22] <urandom>	 ^^^ expected
[16:12:22] <icinga-wm>	 PROBLEM - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is CRITICAL: Connection refused
[16:12:27] <thcipriani>	 gehel: for tilerator, you'd merge: https://gerrit.wikimedia.org/r/#/c/291268/ then run puppet on tin, yurik would run: `scap deploy` for tilerator, it will fail, then you'll run puppet on the tilerator service nodes, it should succeed, then yurik can run `scap deploy` for tilerator again from tin and this time it should succeed and new code should be deployed.
[16:12:42] <wikibugs>	 Operations, ops-codfw: lvs2006 degraded RAID - https://phabricator.wikimedia.org/T136584#2341482 (Papaul) Will receive the disk tomorrow.  Dear Mr Papaul Tshibamba,  Thank you for contacting Hewlett Packard Enterprise for your service request. This email confirms your request for service and the details...
[16:13:00] <thcipriani>	 gehel: same process for kartotherian with https://gerrit.wikimedia.org/r/#/c/291930/
[16:13:01] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.64.0.231:9042 on restbase1007 is CRITICAL: Connection refused
[16:13:27] <gehel>	 thcipriani: ok, so if everything goes as planned, it's easy, and if not I'll scream for help. I can do that.
[16:13:34] <thcipriani>	 :D
[16:13:52] <icinga-wm>	 RECOVERY - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is OK: TCP OK - 0.000 second response time on port 9042
[16:14:06] <thcipriani>	 sounds like the tent-posts of every deployment :)
[16:14:22] <icinga-wm>	 RECOVERY - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is OK: TCP OK - 0.001 second response time on port 9042
[16:14:52] <icinga-wm>	 RECOVERY - cassandra-b CQL 10.64.0.231:9042 on restbase1007 is OK: TCP OK - 0.003 second response time on port 9042
[16:15:29] <gehel>	 yurik, thcipriani: when does this need to happen?
[16:15:41] <urandom>	 !log Upgrade of restbase1007-{a,b,c} complete : T126629
[16:15:42] <yurik>	 no rush i guess
[16:15:42] <stashbot>	 T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629
[16:15:46] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:17:58] <thcipriani>	 yurik: your call. I'm around during your deployment window today. These two will have to merge pre-scap-deploy: https://gerrit.wikimedia.org/r/#/c/285979/ https://gerrit.wikimedia.org/r/#/c/285980/
[16:18:36] <wikibugs>	 Operations, ops-esams, DC-Ops, Traffic: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062#2341497 (BBlack) Supporting the theory that these need firmware updates....  cp2001 racadm getversion: ```  Bios Version                     = 1.2.10  iDRAC Version...
[16:18:41] <yurik>	 thcipriani, sure, i have exactly one hour during the depl :)
[16:18:50] <yurik>	 afterwards is a meeting, and than i'm running for the train )
[16:19:06] <grrrit-wm>	 (PS3) Jcrespo: Add ores-admins group and provide permissions for scb [puppet] - https://gerrit.wikimedia.org/r/291716 (https://phabricator.wikimedia.org/T136406) (owner: Ladsgroup)
[16:19:28] <yurik>	 gehel, btw, unrelated, seems like something was missed in the maps-test today - https://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&m=cpu_report&c=Maps+Cluster+codfw&h=maps-test2001.codfw.wmnet&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=NOGROUPS
[16:19:37] <yurik>	 could it be that db update broke?
[16:20:07] <grrrit-wm>	 (PS3) Elukey: Set Kafka default cleanup policy to 'delete' to avoid any compaction with 0.9 [puppet] - https://gerrit.wikimedia.org/r/291697
[16:20:27] <yurik>	 gehel, don't worry about it if it takes too long to check, but just in case its something obvious
[16:20:29] <thcipriani>	 yurik: gehel let's schedule something for next week if that works for you both. I don't anticipate the migration taking an hour but an absolute timebox is probably not a good thing.
[16:20:48] <yurik>	 thcipriani, agree, lets do it next week
[16:20:49] <yurik>	 thx!
[16:20:51] <gehel>	 sounds good to me.
[16:20:53] <gehel>	 thanks!
[16:21:06] <thcipriani>	 yurik: gehel awesome. Thanks both :)
[16:25:02] <icinga-wm>	 PROBLEM - puppet last run on mira is CRITICAL: CRITICAL: Puppet has 1 failures
[16:27:55] <wikibugs>	 Operations, Analytics-Kanban, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2341533 (Ottomata) > This is a problem in the way we check data integrity rather than in vk itself, so we should fix our calculation...
[16:28:42] <grrrit-wm>	 (CR) Jcrespo: "This is ready, could be optionally blocked on 278990" [puppet] - https://gerrit.wikimedia.org/r/291716 (https://phabricator.wikimedia.org/T136406) (owner: Ladsgroup)
[16:29:29] <wikibugs>	 Operations, Analytics-Kanban, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2341538 (elukey) >>! In T136314#2340456, @elukey wrote: > 1) vk is correctly adding the start timestamp to our logs but this trigger...
[16:31:06] <grrrit-wm>	 (PS4) Jcrespo: Add ores-admins group and provide permissions for scb [puppet] - https://gerrit.wikimedia.org/r/291716 (https://phabricator.wikimedia.org/T136406) (owner: Ladsgroup)
[16:31:59] <grrrit-wm>	 (CR) Elukey: [C: 2] Set Kafka default cleanup policy to 'delete' to avoid any compaction with 0.9 [puppet] - https://gerrit.wikimedia.org/r/291697 (owner: Elukey)
[16:32:05] <wikibugs>	 Operations, Ops-Access-Requests, Patch-For-Review: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2341545 (jcrespo) Already ready to go: https://gerrit.wikimedia.org/r/291716  This could technically go at any time, but there is nothing to de...
[16:33:38] <wikibugs>	 Operations, ops-esams, DC-Ops, Traffic: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062#2341551 (BBlack) Latest on Dell's site seems to be http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=5GCHC - going to reconfirm we still have issues, then t...
[16:39:51] <grrrit-wm>	 (CR) Jcrespo: [C: 1] Add ores-admins group and provide permissions for scb [puppet] - https://gerrit.wikimedia.org/r/291716 (https://phabricator.wikimedia.org/T136406) (owner: Ladsgroup)
[16:40:51] <wikibugs>	 Operations, ops-esams, DC-Ops, Traffic: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062#2341578 (BBlack) So... cp3032 rebooted fine via software, after I had done a preemptive `racadm racreset`.  Will move on to a few others that were known-problems in the past and se...
[16:44:12] <grrrit-wm>	 (PS1) Jcrespo: Update autoinstall to include db1079-94 [puppet] - https://gerrit.wikimedia.org/r/291948 (https://phabricator.wikimedia.org/T135253)
[16:45:15] <grrrit-wm>	 (PS1) Giuseppe Lavagetto: base: add service_sidekick [puppet] - https://gerrit.wikimedia.org/r/291949
[16:46:22] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] base: add service_sidekick [puppet] - https://gerrit.wikimedia.org/r/291949 (owner: Giuseppe Lavagetto)
[16:48:18] <grrrit-wm>	 (CR) Jcrespo: [C: 2] Update autoinstall to include db1079-94 [puppet] - https://gerrit.wikimedia.org/r/291948 (https://phabricator.wikimedia.org/T135253) (owner: Jcrespo)
[16:48:26] <grrrit-wm>	 (CR) Ottomata: [C: 2 V: 2] "Comments yay!" [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/291870 (https://phabricator.wikimedia.org/T136314) (owner: Elukey)
[16:50:01] <grrrit-wm>	 (CR) Ottomata: [C: 1] access request for joe sutherland [puppet] - https://gerrit.wikimedia.org/r/290599 (https://phabricator.wikimedia.org/T136137) (owner: RobH)
[16:51:13] <grrrit-wm>	 (PS5) Elukey: Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963)
[16:51:33] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: Rack and set up 16 db's db1079-1094 - https://phabricator.wikimedia.org/T135253#2341606 (jcrespo)
[16:51:57] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: Rack and set up 16 db's db1079-1094 - https://phabricator.wikimedia.org/T135253#2293311 (jcrespo) a:Cmjohnson>jcrespo
[16:52:52] <elukey>	 !log disabling puppet on mc10* hosts as prep step for https://gerrit.wikimedia.org/r/#/c/291916. Memcached 1.4.25 will be deployed to mc1010 as part of a perf. test (T129963)
[16:52:53] <stashbot>	 T129963: Update memcached package and configuration options - https://phabricator.wikimedia.org/T129963
[16:52:57] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:54:08] <grrrit-wm>	 (CR) Elukey: [C: 2] Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963) (owner: Elukey)
[16:54:28] <grrrit-wm>	 (PS1) Madhuvishy: ifttt: Specify the right uwsgi plugin for python2 [puppet] - https://gerrit.wikimedia.org/r/291952
[16:54:32] <jynus>	 oh, one host is actually installed already
[17:00:04] <jouncebot>	 yurik gwicke cscott arlolra subbu: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160531T1700). Please do the needful.
[17:03:34] <thcipriani>	 !log starting branch-cut for mediawiki and extensions for version 1.28.0-wmf.4
[17:03:39] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:03:48] <yurik>	 nope, skipping this one
[17:04:23] <grrrit-wm>	 (PS1) Jcrespo: Add bot_password as a private table [puppet] - https://gerrit.wikimedia.org/r/291954 (https://phabricator.wikimedia.org/T135074)
[17:06:45] <thcipriani>	 could I get an opsent to remove /tmp/make-wmf-branch ?
[17:06:52] <thcipriani>	 er opsen rather :P
[17:07:03] <thcipriani>	 on tin
[17:07:06] <jynus>	 where, tin?
[17:07:07] <jynus>	 ok
[17:07:09] <jynus>	 doing
[17:07:13] <thcipriani>	 thanks
[17:08:03] <jynus>	 thcipriani, done
[17:08:18] <thcipriani>	 jynus: thank you!
[17:08:19] <jynus>	 I will delete it when you confirm tin is not horribly broken
[17:08:30] <jynus>	 (it is not there anymore)
[17:08:52] <jynus>	 (rm redirects to move for ops)
[17:12:23] <grrrit-wm>	 (CR) Chad: [C: 1] Add bot_password as a private table [puppet] - https://gerrit.wikimedia.org/r/291954 (https://phabricator.wikimedia.org/T135074) (owner: Jcrespo)
[17:17:55] <bblack>	 !log depooled reboot of cp3040 - T126062
[17:17:56] <stashbot>	 T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[17:18:00] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:19:31] <grrrit-wm>	 (PS2) Jcrespo: Add bot_password as a private table [puppet] - https://gerrit.wikimedia.org/r/291954 (https://phabricator.wikimedia.org/T135074)
[17:21:08] <grrrit-wm>	 (CR) Jcrespo: [C: 2] Add bot_password as a private table [puppet] - https://gerrit.wikimedia.org/r/291954 (https://phabricator.wikimedia.org/T135074) (owner: Jcrespo)
[17:26:56] <jynus>	 !log restarting mysqls at sanitarium, some transitional lag on labs
[17:27:02] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:31:23] <jynus>	 I saw a kernel update, opted for a full reboot
[17:32:20] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[17:41:16] <grrrit-wm>	 (PS7) Andrew Bogott: Allow horizon to query the labs puppetmaster for a list of classes [puppet] - https://gerrit.wikimedia.org/r/284103
[17:41:29] <icinga-wm>	 RECOVERY - Host beryllium is UP: PING OK - Packet loss = 0%, RTA = 2.90 ms
[17:45:10] <grrrit-wm>	 (PS1) Jcrespo: Change filter to the actual real name: bot_passwords, plural [puppet] - https://gerrit.wikimedia.org/r/291956 (https://phabricator.wikimedia.org/T135074)
[17:45:12] <grrrit-wm>	 (PS1) Urbanecm: Enable VE in NS_PROJECT in cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/291957 (https://phabricator.wikimedia.org/T136628)
[17:46:14] <grrrit-wm>	 (CR) Jcrespo: [C: 2] Change filter to the actual real name: bot_passwords, plural [puppet] - https://gerrit.wikimedia.org/r/291956 (https://phabricator.wikimedia.org/T135074) (owner: Jcrespo)
[17:50:36] <grrrit-wm>	 (CR) Ladsgroup: "I think this can be merged now :)" [puppet] - https://gerrit.wikimedia.org/r/278989 (https://phabricator.wikimedia.org/T124201) (owner: Alexandros Kosiaris)
[17:52:26] <grrrit-wm>	 (CR) Dereckson: "@Nemo: please schedule this during a SWAT window at https://wikitech.wikimedia.org/wiki/Deployments." [mediawiki-config] - https://gerrit.wikimedia.org/r/250384 (https://phabricator.wikimedia.org/T130442) (owner: Nemo bis)
[17:54:00] <icinga-wm>	 PROBLEM - Router interfaces on pfw-eqiad is CRITICAL: CRITICAL: host 208.80.154.218, interfaces up: 106, down: 1, dormant: 0, excluded: 1, unused: 0; ge-11/0/10: down - beryllium
[17:56:37] <wikibugs>	 Operations, ops-codfw, media-storage: rack ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2341867 (RobH)
[17:57:20] <wikibugs>	 Operations, ops-codfw, media-storage: rack ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2341884 (RobH) a:RobH>fgiunchedi I'm going to assign this to @fgiunchedi for his recommendation on how we need to space out the 6 new systems in the 4 rows.  Once we have his feedback, @Papaul...
[17:57:56] <wikibugs>	 Operations, ops-codfw, media-storage: rack ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2341888 (RobH)
[17:57:58] <wikibugs>	 Operations, media-storage, Tracking: refresh swift hardware in codfw/eqiad (tracking) - https://phabricator.wikimedia.org/T130012#2341887 (RobH)
[17:58:11] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:00:29] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:01:05] <grrrit-wm>	 (CR) Nemo bis: "I don't use SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/250384 (https://phabricator.wikimedia.org/T130442) (owner: Nemo bis)
[18:01:13] <grrrit-wm>	 (CR) Dereckson: "You can use Depends-On: I66b437795a376223b02a0c8a87bddc197470a3b9 to park the dependency." [mediawiki-config] - https://gerrit.wikimedia.org/r/287936 (https://phabricator.wikimedia.org/T134770) (owner: Addshore)
[18:04:12] <wikibugs>	 Operations, ops-codfw, media-storage: rack ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2341901 (RobH) It seems @fgiunchedi is out from now until the 10th.  These servers may arrive on site before he returns.
[18:04:13] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1002 is OK: All endpoints are healthy
[18:05:03] <wikibugs>	 Operations, DBA: Install, configure and provision recently arrived db core machines - https://phabricator.wikimedia.org/T133398#2230646 (jcrespo)
[18:05:05] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: Rack and set up 16 db's db1079-1094 - https://phabricator.wikimedia.org/T135253#2341902 (jcrespo) Open>stalled Stalling until  T133398 is completed as I've been warned there could be some network issues.
[18:05:21] <Dereckson>	 Nemo_bis: the change isn't going to deploy itself by magic, I've asked on the task if a wight or n uria is willing to deploy it, but if they aren't interested either, we don't have magic processes to do it. What you could do to help is note on https://phabricator.wikimedia.org/T130442 the procedure to test your change, so the one willing to deploy it for you will know what to do.
[18:05:34] <wikibugs>	 Operations, DBA: Install, configure and provision recently arrived db core machines - https://phabricator.wikimedia.org/T133398#2230646 (jcrespo) a:jcrespo
[18:06:42] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1003 is OK: All endpoints are healthy
[18:07:09] <wikibugs>	 Operations, ops-eqiad, media-storage: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2341913 (RobH)
[18:07:24] <wikibugs>	 Operations, ops-codfw, media-storage: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2341929 (RobH)
[18:12:43] <grrrit-wm>	 (CR) Dereckson: [C: -1] "Blocked by https://phabricator.wikimedia.org/T124841" [mediawiki-config] - https://gerrit.wikimedia.org/r/285009 (https://phabricator.wikimedia.org/T104163) (owner: Urbanecm)
[18:13:23] <Krinkle>	 !log mwscript deleteEqualMessages.php --wiki hrwikibooks (T45917)
[18:13:23] <stashbot>	 T45917: Delete all redundant "MediaWiki" pages for system messages - https://phabricator.wikimedia.org/T45917
[18:13:27] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:15:02] <grrrit-wm>	 (Abandoned) Urbanecm: Enable DynamicPageList extension on te.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/285009 (https://phabricator.wikimedia.org/T104163) (owner: Urbanecm)
[18:15:42] <grrrit-wm>	 (CR) Dereckson: [C: 1] "This patch is ready to deploy." [mediawiki-config] - https://gerrit.wikimedia.org/r/284087 (https://phabricator.wikimedia.org/T132972) (owner: Eranroz)
[18:17:09] <grrrit-wm>	 (CR) Dereckson: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/291218 (https://phabricator.wikimedia.org/T58037) (owner: Matěj Suchánek)
[18:20:13] <icinga-wm>	 RECOVERY - Router interfaces on pfw-eqiad is OK: OK: host 208.80.154.218, interfaces up: 108, down: 0, dormant: 0, excluded: 1, unused: 0
[18:20:15] <grrrit-wm>	 (CR) Dereckson: [C: 1] "This looks good to me, and depending change has been merged. So this is ready for deployment." [mediawiki-config] - https://gerrit.wikimedia.org/r/291218 (https://phabricator.wikimedia.org/T58037) (owner: Matěj Suchánek)
[18:20:20] <gehel>	 thcipriani, yurik: did we decide on a date (more precise than "next week") for this scap3 thing?
[18:21:12] <yurik>	 gehel, nope, up to thcipriani, i'm very flex :)
[18:21:18] <thcipriani>	 gehel: no we didn't. Would be nice to get it on the deployment calendar as well. Monday and Wednesday are usually the most open days there.
[18:21:52] <thcipriani>	 Monday pre-service deployment window?
[18:22:15] <yurik>	 thcipriani, what time is it?
[18:22:18] <thcipriani>	 oh wait, meeting time at that time.
[18:22:33] <icinga-wm>	 PROBLEM - Host cp3043 is DOWN: PING CRITICAL - Packet loss = 100%
[18:22:35] <yurik>	 hehe, i think the depl calendar should be managed in gcal :)
[18:22:52] <icinga-wm>	 PROBLEM - Host cp3012 is DOWN: PING CRITICAL - Packet loss = 100%
[18:23:02] <yurik>	 should we create a bot that copies all the events from wiki to gcal?
[18:23:03] <icinga-wm>	 PROBLEM - Host lvs3003 is DOWN: PING CRITICAL - Packet loss = 100%
[18:23:23] <icinga-wm>	 PROBLEM - Host cp3008 is DOWN: PING CRITICAL - Packet loss = 100%
[18:23:29] <gehel>	 yurik: that bot would be great!
[18:23:42] <yurik>	 i will let greg-g do that :)
[18:23:50] <yurik>	 or who manages rel eng nowadays?
[18:24:02] <icinga-wm>	 RECOVERY - Host cp3008 is UP: PING WARNING - Packet loss = 54%, RTA = 83.61 ms
[18:24:02] <gehel>	 yurik: too easy, you give the idea, you take care of the implementation!
[18:24:02] <icinga-wm>	 RECOVERY - Host cp3043 is UP: PING WARNING - Packet loss = 54%, RTA = 86.26 ms
[18:24:02] <greg-g>	 `?
[18:24:03] <icinga-wm>	 RECOVERY - Host cp3012 is UP: PING OK - Packet loss = 0%, RTA = 83.30 ms
[18:24:11] <thcipriani>	 Wednesday 2016-06-08 18:00 UTC would work for me. Right after SoS.
[18:24:13] <icinga-wm>	 RECOVERY - Host lvs3003 is UP: PING OK - Packet loss = 0%, RTA = 87.73 ms
[18:25:25] <thcipriani>	 gehel: I have been known to be around an hour before morning SWAT.
[18:25:36] <thcipriani>	 so 7am SF time.
[18:25:51] <gehel>	 thcipriani: damn! I'm not a morning person ... that's impressive!
[18:26:05] <thcipriani>	 I'm not in SF's timezone :P
[18:26:12] <thcipriani>	 that's 8am for me.
[18:26:42] <gehel>	 Wednesday 18UTC sounds good to me. yurik ?
[18:27:28] <yurik>	 gehel, thcipriani, 18UTC is good
[18:27:54] <icinga-wm>	 PROBLEM - puppet last run on ms-fe3001 is CRITICAL: CRITICAL: Puppet has 2 failures
[18:28:33] <icinga-wm>	 PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures
[18:29:12] <icinga-wm>	 PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: Puppet has 1 failures
[18:30:35] <icinga-wm>	 PROBLEM - puppet last run on mc2007 is CRITICAL: CRITICAL: puppet fail
[18:35:23] <thcipriani>	 yurik: gehel greg-g https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160608T1800
[18:36:32] <grrrit-wm>	 (PS2) Ottomata: Make hive-metastore service depend on libmysql-jar in classpath [puppet/cdh] - https://gerrit.wikimedia.org/r/284506 (https://phabricator.wikimedia.org/T133198)
[18:42:52] <grrrit-wm>	 (CR) Matěj Suchánek: "Ok, thanks. Sheduled for tomorrow's Morning SWAT." [mediawiki-config] - https://gerrit.wikimedia.org/r/291218 (https://phabricator.wikimedia.org/T58037) (owner: Matěj Suchánek)
[18:45:35] <greg-g>	 thcipriani: weee
[18:47:10] <thcipriani>	 :)
[18:47:15] <grrrit-wm>	 (PS3) Ottomata: Make hive-metastore service depend on libmysql-jar in classpath [puppet/cdh] - https://gerrit.wikimedia.org/r/284506 (https://phabricator.wikimedia.org/T133198)
[18:48:14] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 649 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6081860 keys - replication_delay is 649
[18:50:13] <icinga-wm>	 PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:50:20] <grrrit-wm>	 (CR) Ottomata: [C: 2] Make hive-metastore service depend on libmysql-jar in classpath [puppet/cdh] - https://gerrit.wikimedia.org/r/284506 (https://phabricator.wikimedia.org/T133198) (owner: Ottomata)
[18:52:22] <wikibugs>	 Puppet, Labs, Phabricator: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2342059 (mmodell) @luke081515 I'll work on it a bit and see if I can get it to be more automated.
[18:52:29] <wikibugs>	 Puppet, Labs, Phabricator: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2342060 (mmodell) a:mmodell
[18:53:50] <grrrit-wm>	 (PS1) Ottomata: Update cdh submodule wth libmysql-java dependency fix [puppet] - https://gerrit.wikimedia.org/r/291964 (https://phabricator.wikimedia.org/T133198)
[18:54:03] <icinga-wm>	 RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 2.008 second response time
[18:54:42] <icinga-wm>	 RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[18:55:23] <icinga-wm>	 RECOVERY - puppet last run on ms-fe3001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:55:53] <icinga-wm>	 RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:56:54] <grrrit-wm>	 (CR) Ottomata: [C: 2] Update cdh submodule wth libmysql-java dependency fix [puppet] - https://gerrit.wikimedia.org/r/291964 (https://phabricator.wikimedia.org/T133198) (owner: Ottomata)
[18:57:52] <icinga-wm>	 RECOVERY - puppet last run on mc2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:00:04] <jouncebot>	 thcipriani: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160531T1900).
[19:02:44] * thcipriani does
[19:02:49] <icinga-wm>	 PROBLEM - HHVM rendering on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:02:51] <wikibugs>	 Operations, ops-eqiad, Labs, Labs-Infrastructure: connect usb external disk to labmon1001 - https://phabricator.wikimedia.org/T136242#2342154 (Cmjohnson) Connected a 3TB disk with the usb drive toaster. Did not mount.
[19:03:38] <wikibugs>	 Puppet, ORES, Revision-Scoring-As-A-Service: ORES-staging is broken due to service::uwsgi mandatory scap::target invoke - https://phabricator.wikimedia.org/T136488#2336803 (yuvipanda) List of reasons why this is a problem:  1. Setting up a standalone scap3 master in a project that is not deployment-p...
[19:04:10] <icinga-wm>	 RECOVERY - HHVM rendering on mw1020 is OK: HTTP OK: HTTP/1.1 200 OK - 68586 bytes in 0.157 second response time
[19:04:30] <wikibugs>	 Operations, ops-eqiad, Labs, Labs-Infrastructure: connect usb external disk to labmon1001 - https://phabricator.wikimedia.org/T136242#2342163 (Cmjohnson) a:Cmjohnson>RobH
[19:06:54] <wikibugs>	 Puppet, ORES, Revision-Scoring-As-A-Service: ORES-staging is broken due to service::uwsgi mandatory scap::target invoke - https://phabricator.wikimedia.org/T136488#2342181 (Ladsgroup) {meme, src="southparkfan-approves", below=Great!}
[19:07:04] <grrrit-wm>	 (PS2) Dereckson: Enable Visual Editor in NS_PROJECT in cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/291957 (https://phabricator.wikimedia.org/T136628) (owner: Urbanecm)
[19:08:19] <grrrit-wm>	 (CR) Dereckson: [C: -1] Enable Visual Editor in NS_PROJECT in cs.wikipedia (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/291957 (https://phabricator.wikimedia.org/T136628) (owner: Urbanecm)
[19:10:40] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6031337 keys - replication_delay is 0
[19:11:59] <grrrit-wm>	 (PS1) Yuvipanda: service: Allow not requiring scap3 for service::uwsgi [puppet] - https://gerrit.wikimedia.org/r/291967 (https://phabricator.wikimedia.org/T136488)
[19:12:17] <wikibugs>	 Puppet, ORES, Revision-Scoring-As-A-Service, Patch-For-Review: ORES-staging is broken due to service::uwsgi mandatory scap::target invoke - https://phabricator.wikimedia.org/T136488#2342223 (yuvipanda) Patch takes slightly different approach, but same thing.
[19:14:23] <urandom>	 !log Restart restbase1007-c.eqiad.wmnet because reasons
[19:14:28] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:15:45] <grrrit-wm>	 (CR) Awight: Use full URL in $wgNoticeHideUrls (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/250384 (https://phabricator.wikimedia.org/T130442) (owner: Nemo bis)
[19:16:54] <grrrit-wm>	 (PS8) Andrew Bogott: Allow horizon to query the labs puppetmaster for a list of classes [puppet] - https://gerrit.wikimedia.org/r/284103
[19:16:56] <grrrit-wm>	 (PS1) Andrew Bogott: Allow horizon hosts to contact the labs puppetmaster. [puppet] - https://gerrit.wikimedia.org/r/291969 (https://phabricator.wikimedia.org/T91990)
[19:17:45] <grrrit-wm>	 (PS1) Thcipriani: Group0 to 1.28.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/291970
[19:18:41] <grrrit-wm>	 (CR) Andrew Bogott: [C: 2] Allow horizon hosts to contact the labs puppetmaster. [puppet] - https://gerrit.wikimedia.org/r/291969 (https://phabricator.wikimedia.org/T91990) (owner: Andrew Bogott)
[19:22:09] <wikibugs>	 Operations, ops-eqiad: mw1070-89 and mw1121-30 are shut down and should be physically decommissioned - https://phabricator.wikimedia.org/T133770#2342276 (Cmjohnson)
[19:22:16] <wikibugs>	 Operations, ops-eqiad: mw1070-89 and mw1121-30 are shut down and should be physically decommissioned - https://phabricator.wikimedia.org/T133770#2242410 (Cmjohnson) Open>Resolved a:Cmjohnson
[19:22:18] <wikibugs>	 Operations, Patch-For-Review, codfw-rollout, codfw-rollout-Jan-Mar-2016: Reduce the number of appservers we're using in eqiad - https://phabricator.wikimedia.org/T126242#2342280 (Cmjohnson)
[19:23:26] <logmsgbot>	 !log thcipriani@tin Started scap: testwiki to php-1.28.0-wmf.4 and rebuild l10n cache
[19:23:30] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:28:02] <grrrit-wm>	 (PS1) Mobrovac: MobileApps: use the provided request templates for API calls [puppet] - https://gerrit.wikimedia.org/r/291972
[19:29:35] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy 3 eqiad druid nodes - https://phabricator.wikimedia.org/T134275#2342320 (Ottomata)
[19:29:46] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy 3 eqiad druid nodes - https://phabricator.wikimedia.org/T134275#2260510 (Ottomata) Open>Resolved
[19:34:11] <wikibugs>	 Operations, ops-eqiad, media-storage: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2342330 (Southparkfan)
[19:35:02] <wikibugs>	 Operations, ops-ulsfo: cp4016: bad power supply - https://phabricator.wikimedia.org/T134526#2342331 (RobH) Open>Resolved Dropping off the return to USPS today.  {F4095382}  {F4095386}
[19:35:05] <grrrit-wm>	 (PS2) BBlack: tlsproxy: double ssl session cache size [puppet] - https://gerrit.wikimedia.org/r/291914
[19:35:34] <grrrit-wm>	 (CR) BBlack: [C: 2 V: 2] tlsproxy: double ssl session cache size [puppet] - https://gerrit.wikimedia.org/r/291914 (owner: BBlack)
[19:35:53] <grrrit-wm>	 (PS2) BryanDavis: Make the builder script less simple [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291525
[19:37:42] <grrrit-wm>	 (CR) BryanDavis: "check experimental" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291525 (owner: BryanDavis)
[19:37:53] <grrrit-wm>	 (CR) Mobrovac: "OKed by PCC - https://puppet-compiler.wmflabs.org/3008/" [puppet] - https://gerrit.wikimedia.org/r/291972 (owner: Mobrovac)
[19:39:37] <grrrit-wm>	 (CR) Mobrovac: [C: -1] "This needs to be deployed in sync with I945d21e341b5d6d7d3a9848ea166bc68f281878d, so -1'ing until the time comes." [puppet] - https://gerrit.wikimedia.org/r/291972 (owner: Mobrovac)
[19:45:55] <bblack>	 !log restarting cp* nginxes for config update
[19:46:00] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:49:26] <grrrit-wm>	 (CR) BryanDavis: [C: 2] Add base PHP container & php web container [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/290607 (owner: Yuvipanda)
[19:49:38] <grrrit-wm>	 (CR) BryanDavis: [C: 2] Switch to using wikimedia-jessie as base container [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/290795 (owner: Yuvipanda)
[19:50:04] <grrrit-wm>	 (CR) BryanDavis: [C: 2] Add a simple builder script [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/290793 (owner: Yuvipanda)
[19:50:21] <grrrit-wm>	 (CR) BryanDavis: [C: 2] Add a java base + web container [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291222 (https://phabricator.wikimedia.org/T124903) (owner: Yuvipanda)
[19:50:31] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] Make the builder script less simple [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291525 (owner: BryanDavis)
[19:50:45] <bblack>	 !log depooled reboot of cp3030 - T126062
[19:50:46] <stashbot>	 T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[19:50:50] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:51:10] <bd808>	 YuviPanda: is that all set up for zuul to do the actual merges?
[19:51:25] <YuviPanda>	 bd808: oh, good point, I've no idea
[19:51:42] <paladox>	 Dereckson: Hi i left a comment at https://gerrit.wikimedia.org/r/#/c/291671/
[19:52:02] <YuviPanda>	 bd808: shall I just merge by hand? or is there a thing I need to do for zuul?
[19:53:22] <icinga-wm>	 PROBLEM - Host cp3030 is DOWN: PING CRITICAL - Packet loss = 100%
[19:53:42] <bd808>	 YuviPanda: probably just merge yourself for now. I think https://gerrit.wikimedia.org/r/#/c/291685/ will get zuul and jenkins setup properly
[19:54:11] <icinga-wm>	 RECOVERY - Host cp3030 is UP: PING OK - Packet loss = 0%, RTA = 83.33 ms
[19:54:19] <bd808>	 Or I can go hit v+2 on the start of the chain if you want
[19:54:39] <YuviPanda>	 bd808: nah, I feel ok doing the V+2
[19:54:50] <bd808>	 cool beans
[19:54:50] <grrrit-wm>	 (CR) Yuvipanda: [V: 2] Add base PHP container & php web container [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/290607 (owner: Yuvipanda)
[19:55:01] <grrrit-wm>	 (CR) Yuvipanda: [V: 2] Switch to using wikimedia-jessie as base container [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/290795 (owner: Yuvipanda)
[19:55:29] <grrrit-wm>	 (CR) Yuvipanda: [V: 2] Add a simple builder script (3 comments) [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/290793 (owner: Yuvipanda)
[19:55:43] <grrrit-wm>	 (CR) Yuvipanda: [V: 2] Add a java base + web container [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291222 (https://phabricator.wikimedia.org/T124903) (owner: Yuvipanda)
[19:55:54] <grrrit-wm>	 (CR) Yuvipanda: [V: 2] Make the builder script less simple [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291525 (owner: BryanDavis)
[19:57:04] <bblack>	 !log depooled reboot of cp3031 - T126062
[19:57:04] <stashbot>	 T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[19:57:11] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:59:12] <icinga-wm>	 PROBLEM - Host cp3031 is DOWN: PING CRITICAL - Packet loss = 100%
[20:00:33] <icinga-wm>	 RECOVERY - Host cp3031 is UP: PING OK - Packet loss = 0%, RTA = 94.20 ms
[20:02:02] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 660 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6036004 keys - replication_delay is 660
[20:02:34] <bblack>	 !log depooled reboot of cp3032 - T126062
[20:02:35] <stashbot>	 T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[20:02:40] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:02:54] <bblack>	 !log depooled reboot of cp3033 (not 3032) - T126062
[20:02:55] <stashbot>	 T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[20:02:59] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:08:40] <grrrit-wm>	 (PS1) Yuvipanda: build: Allow disregarding cache when building docker images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291980
[20:09:00] <eranroz>	 Per Dereckson's suggestion, I added a request for deploying a config change for hewiki (gerrit:284087) in the next SWAT slot. I added the request to the Deployments page.
[20:09:20] <grrrit-wm>	 (CR) Yuvipanda: "Tested" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291980 (owner: Yuvipanda)
[20:09:55] <bblack>	 !log depooled reboot of cp3041 - T126062
[20:09:56] <stashbot>	 T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[20:10:01] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:10:03] <urandom>	 !log disabling cql binary transport on restbase1007-c
[20:10:09] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:11:42] <Dereckson>	 eranroz: nice, thanks
[20:12:21] <icinga-wm>	 PROBLEM - Host cp3041 is DOWN: PING CRITICAL - Packet loss = 100%
[20:12:46] <grrrit-wm>	 (PS2) BryanDavis: build: Allow disregarding cache when building docker images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291980 (owner: Yuvipanda)
[20:13:17] <grrrit-wm>	 (CR) BryanDavis: [C: 2] build: Allow disregarding cache when building docker images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291980 (owner: Yuvipanda)
[20:13:22] <icinga-wm>	 RECOVERY - Host cp3041 is UP: PING OK - Packet loss = 0%, RTA = 91.72 ms
[20:13:35] <logmsgbot>	 !log thcipriani@tin Finished scap: testwiki to php-1.28.0-wmf.4 and rebuild l10n cache (duration: 50m 09s)
[20:13:41] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:13:42] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6023645 keys - replication_delay is 0
[20:14:13] <jzerebecki>	 thcipriani: can I sneak in a backport?
[20:14:44] <thcipriani>	 jzerebecki: sure, hopefully not one I need a full scap on?
[20:14:55] <grrrit-wm>	 (03CR) 10BryanDavis: [V: 032] build: Allow disregarding cache when building docker images [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/291980 (owner: 10Yuvipanda)
[20:15:21] <icinga-wm>	 PROBLEM - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is CRITICAL: Connection refused
[20:15:35] <YuviPanda>	 bd808: thanks :D I'm working on small no-op refactors to toollabs-webservice too, so should be up soon
[20:16:08] <jzerebecki>	 thcipriani: no only one file https://gerrit.wikimedia.org/r/#/c/291974/
[20:18:57] <thcipriani>	 jzerebecki: sure. I'll get it out (if zuul/wikibugs notices)
[20:23:01] <icinga-wm>	 RECOVERY - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is OK: TCP OK - 0.000 second response time on port 9042
[20:23:55] <bblack>	 !log depooled reboot of cp3042 - T126062
[20:23:56] <stashbot>	 T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[20:24:01] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:26:11] <icinga-wm>	 PROBLEM - Host cp3042 is DOWN: PING CRITICAL - Packet loss = 100%
[20:27:02] <icinga-wm>	 RECOVERY - Host cp3042 is UP: PING OK - Packet loss = 0%, RTA = 83.92 ms
[20:28:34] <bblack>	 !log depooled reboot of cp3043 - T126062
[20:28:34] <stashbot>	 T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[20:28:39] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:28:59] <grrrit-wm>	 (03PS1) 10BryanDavis: [DO NOT MERGE] Verify that tox job notices flake8 failure [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/291988 
[20:29:41] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] [DO NOT MERGE] Verify that tox job notices flake8 failure [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/291988 (owner: 10BryanDavis)
[20:31:13] <icinga-wm>	 PROBLEM - Host cp3043 is DOWN: PING CRITICAL - Packet loss = 100%
[20:31:42] <icinga-wm>	 RECOVERY - Host cp3043 is UP: PING OK - Packet loss = 0%, RTA = 85.89 ms
[20:32:18] <grrrit-wm>	 (03Abandoned) 10BryanDavis: [DO NOT MERGE] Verify that tox job notices flake8 failure [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/291988 (owner: 10BryanDavis)
[20:32:33] <icinga-wm>	 PROBLEM - traffic-pool service on cp3043 is CRITICAL: CRITICAL - Expecting active but unit traffic-pool is activating
[20:33:50] <logmsgbot>	 !log thcipriani@tin Synchronized php-1.28.0-wmf.4/extensions/Wikidata/extensions/Wikibase/view/resources/jquery/ui/jquery.ui.tagadata.js: [[gerit:291974|Update Wikibase]] (duration: 00m 30s)
[20:33:53] <icinga-wm>	 RECOVERY - traffic-pool service on cp3043 is OK: OK - traffic-pool is active
[20:33:57] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:34:08] <thcipriani>	 ^ jzerebecki wmf.3 coming
[20:34:13] <icinga-wm>	 PROBLEM - NTP on cp3043 is CRITICAL: NTP CRITICAL: Offset unknown
[20:34:30] <wikibugs>	 06Operations, 10ops-esams, 06DC-Ops, 10Traffic: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062#2342590 (10BBlack) 05Open>03Resolved a:03BBlack All of cache_text in esams (8/12 of the nodes considered affected) have rebooted into 4.4.2-3+wmf1 today without issue.  It coul...
[20:35:04] <logmsgbot>	 !log thcipriani@tin Synchronized php-1.28.0-wmf.3/extensions/Wikidata/extensions/Wikibase/view/resources/jquery/ui/jquery.ui.tagadata.js: [[gerit:291974|Update Wikibase]] (duration: 00m 23s)
[20:35:09] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:35:09] <thcipriani>	 ^ jzerebecki check please
[20:36:04] <icinga-wm>	 RECOVERY - NTP on cp3043 is OK: NTP OK: Offset -0.05275964737 secs
[20:36:49] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] Group0 to 1.28.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/291970 (owner: 10Thcipriani)
[20:37:01] <jzerebecki>	 thcipriani: it seems i still get old js
[20:37:29] <jzerebecki>	 thcipriani: ok works now
[20:37:29] <jzerebecki>	 thx
[20:37:41] <thcipriani>	 jzerebecki: np :)
[20:37:55] <grrrit-wm>	 (03Merged) 10jenkins-bot: Group0 to 1.28.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/291970 (owner: 10Thcipriani)
[20:38:04] <grrrit-wm>	 (03PS1) 10Yuvipanda: java: Inherit from correct base image name [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/291990 
[20:38:13] <YuviPanda>	 bd808: ^ we can test on this
[20:38:21] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] java: Inherit from correct base image name [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/291990 (owner: 10Yuvipanda)
[20:38:33] <YuviPanda>	 hah
[20:38:45] <grrrit-wm>	 (03PS2) 10Yuvipanda: java: Inherit from correct base image name [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/291990 
[20:39:46] <logmsgbot>	 !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.28.0-wmf.4
[20:39:50] <bd808>	 YuviPanda: I think the php template has the same problem
[20:39:52] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:41:14] <grrrit-wm>	 (03PS3) 10Yuvipanda: Make web templates inherit from correct base templates [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/291990 
[20:41:25] <YuviPanda>	 bd808: yeah, I think it worked for php because there might've been a hand built toollabs-php before
[20:41:33] <grrrit-wm>	 (03PS4) 10Yuvipanda: Make web templates inherit from correct base templates [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/291990 
[20:42:17] <thcipriani>	 well that's weird. Seeing lots of notices for Notice: Undefined variable: wgAbuseFilterAvailableActions in /srv/mediawiki/wmf-config/abusefilter.php on line 23
[20:42:23] <grrrit-wm>	 (03CR) 10BryanDavis: [C: 032] Make web templates inherit from correct base templates [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/291990 (owner: 10Yuvipanda)
[20:42:29] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Make web templates inherit from correct base templates [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/291990 (owner: 10Yuvipanda)
[20:42:31] <grrrit-wm>	 (03PS3) 10Dereckson: Add namespace translation 'Portal' for diq [mediawiki-config] - 10https://gerrit.wikimedia.org/r/284866 (https://phabricator.wikimedia.org/T133702) (owner: 10Raimond Spekking)
[20:42:55] <bd808>	 hrrm
[20:43:51] <grrrit-wm>	 (03CR) 10Dereckson: [C: 031] "So, finally, we've got feedback from the community: they agree with this change." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/284866 (https://phabricator.wikimedia.org/T133702) (owner: 10Raimond Spekking)
[20:44:05] <grrrit-wm>	 (03Merged) 10jenkins-bot: Make web templates inherit from correct base templates [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/291990 (owner: 10Yuvipanda)
[20:44:07] <bd808>	 ah, there it goes
[20:55:24] <thcipriani>	 !log rolling back group0 wmf.4 for T136644 too much log spam
[20:55:25] <stashbot>	 T136644: Notice: Undefined variable: wgAbuseFilterAvailableActions in /srv/mediawiki/wmf-config/abusefilter.php on line 23 - https://phabricator.wikimedia.org/T136644
[20:55:30] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:55:53] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[20:57:08] <grrrit-wm>	 (03PS1) 10Thcipriani: Revert "Group0 to 1.28.0-wmf.4" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292027 
[20:58:10] <logmsgbot>	 !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 back to 1.28.0-wmf.3
[20:58:16] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:58:58] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] Revert "Group0 to 1.28.0-wmf.4" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292027 (owner: 10Thcipriani)
[20:59:35] <grrrit-wm>	 (03Merged) 10jenkins-bot: Revert "Group0 to 1.28.0-wmf.4" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292027 (owner: 10Thcipriani)
[21:06:05] <grrrit-wm>	 (03PS1) 10Yuvipanda: [WIP] Introduce 'Backends' [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/292028 
[21:10:15] <grrrit-wm>	 (03PS2) 10Yuvipanda: [WIP] Introduce 'Backends' [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/292028 
[21:12:51] <wikibugs>	 06Operations, 10DBA, 06Labs: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2342722 (10russblau) Is there any update on the status of this? On 23 May, the revision table was in progress and was expected to take ~12 hours. The pagelinks table is about 3X larger and so might be expected...
[21:12:52] <paladox>	 thcipriani: I think this https://github.com/wikimedia/mediawiki-extensions-AbuseFilter/commit/e71808f4c4deca416ecd39160d12f2584bfb9d65 caused the problem
[21:13:23] <paladox>	 thcipriani: What is on line 23 of /srv/mediawiki/wmf-config/abusefilter.php
[21:14:17] <thcipriani>	 paladox: I saw that on the ticket. Added tgr to the ticket. https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/abusefilter.php#L21-L23
[21:14:25] <grrrit-wm>	 (03PS1) 10Madhuvishy: uwsgi: Allow specifying plugins optionally as a uwsgi command line option [puppet] - 10https://gerrit.wikimedia.org/r/292030 
[21:14:30] <paladox>	 Ok thanks
[21:14:39] <madhuvishy>	 YuviPanda: ^^
[21:14:55] <madhuvishy>	 not tested yet but about to
[21:16:17] <grrrit-wm>	 (03CR) 10Yuvipanda: [C: 04-1] uwsgi: Allow specifying plugins optionally as a uwsgi command line option (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/292030 (owner: 10Madhuvishy)
[21:16:23] <YuviPanda>	 madhuvishy: kk, did a nit
[21:17:02] <madhuvishy>	 YuviPanda: yup cool. can you merge https://gerrit.wikimedia.org/r/#/c/291952/
[21:19:43] <tgr>	 thcipriani: it's best to use @username when adding someone to a ticket
[21:19:43] <icinga-wm>	 PROBLEM - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is CRITICAL: Connection refused
[21:19:47] <grrrit-wm>	 (03PS2) 10Yuvipanda: ifttt: Specify the right uwsgi plugin for python2 [puppet] - 10https://gerrit.wikimedia.org/r/291952 (owner: 10Madhuvishy)
[21:19:54] <grrrit-wm>	 (03CR) 10Yuvipanda: [C: 032 V: 032] ifttt: Specify the right uwsgi plugin for python2 [puppet] - 10https://gerrit.wikimedia.org/r/291952 (owner: 10Madhuvishy)
[21:20:10] <tgr>	 if you just update the subscribers, chances are they won't see it until someone adds a new comment
[21:20:54] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is CRITICAL: Connection refused eevans t-shooting - The acknowledgement expires at: 2016-06-01 21:20:33.
[21:21:02] <thcipriani>	 tgr: hmm, hadn't noticed that, will keep it in mind in future.
[21:22:03] <thcipriani>	 tgr: ticket I was talking about (if you missed backscroll): https://phabricator.wikimedia.org/T136644
[21:22:28] <grrrit-wm>	 (03PS3) 10Yuvipanda: [WIP] Introduce 'Backends' [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/292028 
[21:22:51] <tgr>	 thcipriani: yeah, saw it, extension registration must cause that variable to be defined too late
[21:23:00] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] [WIP] Introduce 'Backends' [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/292028 (owner: 10Yuvipanda)
[21:23:12] <thcipriani>	 that's seemingly the case in this instance.
[21:23:16] <tgr>	 maybe legoktm has an idea how to solve that nicely
[21:24:16] <wikibugs>	 06Operations, 10ops-codfw, 13Patch-For-Review: codfw old mw app server decomission - https://phabricator.wikimedia.org/T135468#2342748 (10Papaul) disk wipe complete on mw2001-mw2016 and mw2018-mw2040. Those servers are unracked and stored in the storage area.  Disk wipe in progress on mw2014-mw2060
[21:24:31] <legoktm>	 I commented
[21:24:31] <grrrit-wm>	 (03PS4) 10Yuvipanda: Introduce 'Backends' [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/292028 
[21:24:38] <paladox>	 tgr: would reverting it let us go on with wmf.4, and try again next week with wmf.5 if it is fixed?
[21:26:21] <paladox>	 thcipriani legoktm ^^
[21:26:23] <icinga-wm>	 PROBLEM - cxserver endpoints health on scb1002 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) is CRITICAL: Test Fetch enwiki Oxygen page returned the unexpected status 404 (expecting: 200)
[21:26:33] <icinga-wm>	 PROBLEM - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is CRITICAL: Connection refused
[21:26:34] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Test Ge
[21:26:43] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200): /page/title/{title} (Get rev by title from storage) is CRITICAL
[21:26:44] <wikibugs>	 06Operations, 10ops-codfw: rack/setup/deploy new codfw mw app servers - https://phabricator.wikimedia.org/T135466#2342763 (10Papaul)
[21:26:52] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp San Francisco page via mobile-sections-lead) is CRITICAL: Test retrieve lead section of en.wp San Francisco page via mobile-sections-lead returned the unexpected status 500 (expecting: 200): /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp Barack Obama page via mobil
[21:26:53] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Test Ge
[21:26:53] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: /page/html/{title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200): /page/mobile-sections/{title} (Get MobileApps Foobar page) is CRITICAL: Test Get MobileApps Foobar page returned the unexpected status 500 (expecting: 200)
[21:26:54] <icinga-wm>	 PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/mobile-summary/{title} (retrieve page preview of Dog page) is CRITICAL: Test retrieve page preview of Dog page returned the unexpected status 500 (expecting: 200): /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp San Francisco page via mobile-sections-lead) is CRITICAL: Test retrieve lead section of en.wp San Franc
[21:27:00] <wikibugs>	 06Operations, 06Commons, 10MediaWiki-File-management, 06Multimedia, and 2 others: Image cache issue when 'over-writing' an image on commons - https://phabricator.wikimedia.org/T119038#2342765 (10BBlack)
[21:27:02] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.64.0.231:9042 on restbase1007 is CRITICAL: Connection refused
[21:27:04] <grrrit-wm>	 (03CR) 10Yuvipanda: "I've tested this and it seems to work." [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/292028 (owner: 10Yuvipanda)
[21:27:13] <icinga-wm>	 PROBLEM - cxserver endpoints health on scb1001 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) is CRITICAL: Test Fetch enwiki Oxygen page returned the unexpected status 404 (expecting: 200)
[21:27:13] <icinga-wm>	 PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Te
[21:27:33] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200): /page/title/{title} (Get rev by title from storage) is CRITICAL
[21:27:33] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 500 (expecting: 200): /page/revision/{revision} (Get rev by ID) is CRITICAL: Test Get rev by ID r
[21:27:43] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Test Ge
[21:27:43] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200)
[21:27:51] <urandom>	 ummm
[21:27:53] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) is CRITICAL: Test retrieve images and videos of en.wp Cat page via media route returned the unexpected status 500 (expecting: 200): /{domain}/v1/page/mobile-summary/{title} (retrieve page preview of Dog page) is CRITICAL: Test retrieve page preview of Dog page returned the unexpec
[21:28:13] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[21:28:14] <mdholloway>	 :|
[21:28:38] <urandom>	 !log Bouncing restbase on restbase1010.eqiad
[21:28:42] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy
[21:28:43] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:29:06] <tgr>	 thcipriani: I think I would rather just duplicate variable initialization in the config file for now and take a little more time writing the proper patch
[21:29:13] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [1000.0]
[21:29:13] <icinga-wm>	 RECOVERY - cxserver endpoints health on scb1001 is OK: All endpoints are healthy
[21:29:43] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy
[21:29:53] <urandom>	 !log Bouncing restbase on restbase1008.eqiad.wmnet
[21:29:53] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[21:29:53] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[21:29:58] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:30:04] <wikibugs>	 06Operations, 10DNS, 10Traffic, 10Wikimedia-Apache-configuration: Create moon.wikimedia.org and redirect it to https://meta.wikimedia.org/wiki/Wikipedia_to_the_Moon - https://phabricator.wikimedia.org/T136557#2342785 (10BBlack) Unclear from the description: Is it intended that moon always redirects to this...
[21:30:23] <wikibugs>	 06Operations, 10Traffic: Upgrade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2342795 (10BBlack)
[21:30:25] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review: Sort out vcl_deliver vs vcl_synth mess with v4 VCL - https://phabricator.wikimedia.org/T135696#2342793 (10BBlack) 05Open>03Resolved a:03BBlack
[21:30:45] <urandom>	 !log Bouncing restbase on restbase1009.eqiad.wmnet
[21:30:50] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:30:52] <thcipriani>	 tgr: that works for me.
[21:30:53] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy
[21:31:59] <urandom>	 !log Bouncing restbase on restbase1012.eqiad.wmnet
[21:32:04] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[21:32:04] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:32:25] <wikibugs>	 06Operations, 06Commons, 10Traffic, 10media-storage, and 2 others: upload-lb.ulsfo.wikimedia.org still allow access to some deleted files - https://phabricator.wikimedia.org/T133819#2342800 (10BBlack)
[21:32:26] <icinga-wm>	 RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy
[21:32:29] <wikibugs>	 06Operations, 10Traffic: Content purges are unreliable - https://phabricator.wikimedia.org/T133821#2342802 (10BBlack)
[21:32:44] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy
[21:32:56] <urandom>	 !log Bouncing restbase on restbase1013.eqiad.wmnet
[21:33:01] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:33:26] <wikibugs>	 06Operations, 06Commons, 10Traffic, 10media-storage, and 2 others: Deleted files sometimes remain visible to non-privileged users if permanently linked - https://phabricator.wikimedia.org/T109331#2342807 (10BBlack)
[21:33:31] <wikibugs>	 06Operations, 10Traffic: Content purges are unreliable - https://phabricator.wikimedia.org/T133821#2245593 (10BBlack)
[21:33:50] <urandom>	 !log Bouncing restbase on restbase1015.eqiad.wmnet
[21:33:56] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:34:25] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy
[21:34:35] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy
[21:34:56] <icinga-wm>	 RECOVERY - cxserver endpoints health on scb1002 is OK: All endpoints are healthy
[21:34:56] <icinga-wm>	 RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy
[21:34:57] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-b CQL 10.64.0.231:9042 on restbase1007 is CRITICAL: Connection refused eevans t-shooting - The acknowledgement expires at: 2016-06-01 21:34:45.
[21:35:54] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy
[21:36:05] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy
[21:36:46] <grrrit-wm>	 (03PS5) 10Yuvipanda: Introduce 'Backends' [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/292028 
[21:37:01] <wikibugs>	 06Operations, 10Traffic: Varnish: the lower the Age value, the slower the request - https://phabricator.wikimedia.org/T84980#2342830 (10BBlack) 05Open>03Resolved No movement in over a year, and is more an observation than a question.
[21:37:55] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy
[21:38:33] <grrrit-wm>	 (03PS1) 10Gergő Tisza: Workaround for T136644 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292036 (https://phabricator.wikimedia.org/T136644) 
[21:38:54] <grrrit-wm>	 (03CR) 10Yuvipanda: "There's also the question of if 'Backend' is the right terminology to use here." [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/292028 (owner: 10Yuvipanda)
[21:39:21] <tgr>	 thcipriani: legoktm: https://gerrit.wikimedia.org/r/#/c/292036/
[21:39:36] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is CRITICAL: Connection refused eevans administratively shutdown while t-shooting - The acknowledgement expires at: 2016-06-01 21:39:13.
[21:40:24] <grrrit-wm>	 (03CR) 10Legoktm: [C: 031] Workaround for T136644 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292036 (https://phabricator.wikimedia.org/T136644) (owner: 10Gergő Tisza)
[21:40:26] <grrrit-wm>	 (03CR) 10Paladox: [C: 031] Workaround for T136644 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292036 (https://phabricator.wikimedia.org/T136644) (owner: 10Gergő Tisza)
[21:40:53] <wikibugs>	 06Operations, 10Traffic, 10Wikimedia-Apache-configuration, 07HTTPS: HTTP->HTTPS redirects need to unconditional send Vary header - https://phabricator.wikimedia.org/T98990#2342859 (10BBlack) 05Open>03declined Varnish is now doing all the redirects directly rather than the applayer.
[21:40:53] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] Workaround for T136644 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292036 (https://phabricator.wikimedia.org/T136644) (owner: 10Gergő Tisza)
[21:41:06] <thcipriani>	 tgr: thank you :)
[21:41:50] <grrrit-wm>	 (03Merged) 10jenkins-bot: Workaround for T136644 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292036 (https://phabricator.wikimedia.org/T136644) (owner: 10Gergő Tisza)
[21:43:51] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config/abusefilter.php: [[gerrit:292036|Workaround for T136644]] (duration: 00m 30s)
[21:43:52] <stashbot>	 T136644: Notice: Undefined variable: wgAbuseFilterAvailableActions in /srv/mediawiki/wmf-config/abusefilter.php on line 23 - https://phabricator.wikimedia.org/T136644
[21:43:57] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:44:04] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:45:34] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[21:45:55] <grrrit-wm>	 (03PS1) 10Thcipriani: Revert "Revert "Group0 to 1.28.0-wmf.4"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292037 
[21:46:17] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] Revert "Revert "Group0 to 1.28.0-wmf.4"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292037 (owner: 10Thcipriani)
[21:46:44] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:46:44] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:46:46] <wikibugs>	 06Operations, 10Traffic, 07HTTPS: Getting ssl_error_inappropriate_fallback_alert very rarely - https://phabricator.wikimedia.org/T108579#2342885 (10BBlack) 05Open>03Resolved a:03BBlack Assuming not, re-open if so.
[21:46:53] <grrrit-wm>	 (03Merged) 10jenkins-bot: Revert "Revert "Group0 to 1.28.0-wmf.4"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292037 (owner: 10Thcipriani)
[21:47:29] <logmsgbot>	 !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.28.0-wmf.4
[21:48:20] <thcipriani>	 no log explosion
[21:48:45] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:50:05] <wikibugs>	 06Operations, 10Traffic, 07HTTPS: When user is logging out via HTTPS, insecure HTTP cookies keeping logged in state should be cleared as well - https://phabricator.wikimedia.org/T34144#2342891 (10BBlack) 05Open>03Resolved a:03BBlack Assuming this is no longer an issue, since login via HTTP is impossible.
[21:50:50] <grrrit-wm>	 (03PS2) 10Madhuvishy: uwsgi: Allow specifying plugins optionally as a uwsgi command line option [puppet] - 10https://gerrit.wikimedia.org/r/292030 
[21:51:24] <wikibugs>	 06Operations, 10Traffic, 07Beta-Cluster-reproducible: PHP fatal errors causing Varnish to return 503 - "Junk after gzip data" - https://phabricator.wikimedia.org/T125938#2342896 (10BBlack) Is this still reproducible?  Did we decide whether varnish or hhvm was at fault?
[21:51:57] <wikibugs>	 06Operations, 10Traffic: 3 Varnish cache_upload servers crashed in a short time window - https://phabricator.wikimedia.org/T125401#2342897 (10BBlack) 05Open>03Resolved a:03BBlack Haven't seen much of this since, and 4.4.x upgrades are in-progress this week.
[21:53:25] <wikibugs>	 06Operations, 10Traffic: Varnish leaks memory - https://phabricator.wikimedia.org/T122455#2342900 (10BBlack) 05Open>03Resolved a:03BBlack We've kept TBF reverted ever since.  At this point the VCL wouldn't un-revert easily anyways, so we'll look again at TBF or similar post-Varnish4, and we don't have an...
[21:54:23] <wikibugs>	 06Operations, 07Puppet, 10Traffic: Clean up  nginx / nginx::ssl classes and usage - https://phabricator.wikimedia.org/T118078#2342904 (10BBlack) 05Open>03Resolved a:03BBlack eh, this is a "refactor things better" ticket.  We're always doing that and we're never done.
[21:55:11] <wikibugs>	 06Operations, 10Traffic: Reintroduce rejection for requests with null user agents - https://phabricator.wikimedia.org/T111140#2342912 (10BBlack) 05Open>03declined
[21:56:58] <wikibugs>	 06Operations, 10MediaWiki-extensions-CentralNotice, 10Traffic, 10Wikimedia-Fundraising: Provide location, logged-in status and device information in ResourceLoaderContext - https://phabricator.wikimedia.org/T103695#2342913 (10BBlack) This ticket is getting stale, is it still relevant and up-to-date with cu...
[21:58:03] <wikibugs>	 06Operations, 10Traffic: Varnish Assert error in VGZ_Ibuf() - https://phabricator.wikimedia.org/T122462#2342915 (10BBlack) 05Open>03Resolved a:03BBlack It hasn't been a huge issue over the past several months, and everything about this will change with Varnish4 which is in the process of being deployed.
[22:03:24] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[22:07:26] <Josve05a>	 I'm getting "VM417:400 Uncaught TypeError: Cannot read property 'toLowerCase' of null" on Commons
[22:07:35] <Josve05a>	 some gadgets aren't loading...
[22:09:04] <Josve05a>	 http://dpaste.com/10FTJY3.txt
[22:09:39] <Josve05a>	 something new happened to mediawiki? new apis or something?
[22:12:04] <p858snake>	 thcipriani: did you just roll out to commons? ^
[22:12:17] <p858snake>	 legoktm: ^
[22:12:48] <legoktm>	 uh, commons shouldn't have gotten it today?
[22:12:49] <thcipriani>	 p858snake: no, commons did not just get an update. mediawiki.org did get a new version as well as testwiki and test2wiki
[22:14:55] <Josve05a>	 The gadgets (Google and TinEye) worked just minutes before. Now I'm getting this instead
[22:15:07] <Josve05a>	 (plus a few more gadgets)
[22:15:24] <Josve05a>	 (I'll file a ticket)
[22:17:34] <Josve05a>	 nvm...seems to be https://phabricator.wikimedia.org/T134860
[22:23:02] <urandom>	 anyone know why this https://graphite.wikimedia.org/render?target=servers.restbase1007.iostat.md2.read_byte_per_second&from=-12h&width=1024 ... would disagree with the output of iostat on the machine?
[22:23:18] <urandom>	 disagree by a lot
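On the graphite/iostat disagreement above: iostat computes rates from deltas of the counters in /proc/diskstats, whose sector counts are always in 512-byte units; by default it reports kB/s rather than bytes, and its first report is an average since boot. A mismatch in units, sampling interval, or averaging window is one common explanation for such gaps, though nothing in the log confirms which applies here. A minimal sketch of the delta calculation, using made-up diskstats lines for md2 sampled 10 seconds apart:

```python
SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def sectors_read(diskstats_line: str) -> int:
    """Return the 'sectors read' counter (6th field) from one /proc/diskstats line."""
    fields = diskstats_line.split()
    # fields: major, minor, device name, reads completed, reads merged, sectors read, ...
    return int(fields[5])


def read_bytes_per_second(before: str, after: str, interval_s: float) -> float:
    """Average read throughput between two samples taken interval_s seconds apart."""
    delta = sectors_read(after) - sectors_read(before)
    return delta * SECTOR_BYTES / interval_s


# Hypothetical samples for md2 (values invented for illustration), 10 seconds apart.
t0 = "   9       2 md2 1000 0 200000 0 500 0 4000 0 0 0 0"
t1 = "   9       2 md2 1100 0 220480 0 520 0 4200 0 0 0 0"

bps = read_bytes_per_second(t0, t1, 10.0)
# iostat's kB is 1024 bytes, so divide by 1024 to compare with its kB_read/s column.
print(f"{bps:.0f} B/s = {bps / 1024:.1f} kB/s")  # 1048576 B/s = 1024.0 kB/s
```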
[22:25:27] <grrrit-wm>	 (PS2) Dzahn: Stop using package->latest in ganglia monitor [puppet] - https://gerrit.wikimedia.org/r/291764 (https://phabricator.wikimedia.org/T115384) (owner: Muehlenhoff)
[22:30:31] <grrrit-wm>	 (CR) Yuvipanda: [C: -1] uwsgi: Allow specifying plugins optionally as a uwsgi command line option (1 comment) [puppet] - https://gerrit.wikimedia.org/r/292030 (owner: Madhuvishy)
[22:35:52] <icinga-wm>	 PROBLEM - puppet last run on cp2009 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:36:10] <grrrit-wm>	 (PS1) Dereckson: Set collation to uca-it for it.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/292050
[22:38:03] <tgr>	 !log running "mwscript sql.php --wiki=zerowiki /srv/mediawiki/php-1.28.0-wmf.4/maintenance/archives/patch-bot_passwords.sql" for T135074
[22:38:04] <stashbot>	 T135074: Update JsonConfig for AuthManager - https://phabricator.wikimedia.org/T135074
[22:38:08] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:42:02] <icinga-wm>	 PROBLEM - HP RAID on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 20 seconds.
[22:43:53] <grrrit-wm>	 (CR) Dzahn: [C: 2] Stop using package->latest in ganglia monitor [puppet] - https://gerrit.wikimedia.org/r/291764 (https://phabricator.wikimedia.org/T115384) (owner: Muehlenhoff)
[22:44:01] <icinga-wm>	 RECOVERY - HP RAID on ms-be1016 is OK: OK: Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2, Controller, Battery/Capacitor
[22:49:43] <grrrit-wm>	 (PS1) Gergő Tisza: Enable bot passwords on zerowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/292053 (https://phabricator.wikimedia.org/T135074)
[23:00:05] <jouncebot>	 RoanKattouw ostriches Krenair MaxSem Dereckson: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160531T2300). Please do the needful.
[23:00:05] <jouncebot>	 RoanKattouw eranroz ebernhardson James_F: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:13] <Dereckson>	 Hi.
[23:00:17] * RoanKattouw waves
[23:00:20] <ebernhardson>	 \o
[23:00:22] * James_F waves.
[23:01:00] <MaxSem>	 \m/
[23:02:16] <icinga-wm>	 RECOVERY - puppet last run on cp2009 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[23:02:27] <Dereckson>	 I would like to add https://gerrit.wikimedia.org/r/292050 to the SWAT.
[23:03:27] <Dereckson>	 Eranroz isn't here?
[23:05:09] <Dereckson>	 Okay I can SWAT. We'll see later for Eranroz.
[23:06:15] <icinga-wm>	 PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[23:06:57] <grrrit-wm>	 (PS1) Yuvipanda: Add LICENSE [software/tools-webservice] - https://gerrit.wikimedia.org/r/292056
[23:07:04] <wikibugs>	 Operations, Traffic: Content purges are unreliable - https://phabricator.wikimedia.org/T133821#2343168 (MZMcBride) Related:  * {T56902} * {T130901} * {T135964}
[23:07:25] <grrrit-wm>	 (PS1) Yuvipanda: Add LICENSE [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/292057
[23:07:26] <YuviPanda>	 bd808: ^^
[23:07:35] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[23:08:05] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
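The 11.11% and 22.22% figures in these alerts are consistent with a check that measures what fraction of datapoints in a window exceed the threshold: with nine datapoints, one or two points above 1000 give 1/9 ≈ 11.11% and 2/9 ≈ 22.22%. The nine-point window is an inference from those numbers, not something the log states, and the real Icinga check's logic may differ. A minimal sketch of that calculation with hypothetical values:

```python
def percent_above(datapoints, threshold):
    """Percentage of non-null datapoints strictly above threshold."""
    # Graphite can return null (None) for missing points; skip them.
    valid = [v for v in datapoints if v is not None]
    if not valid:
        return 0.0
    above = sum(1 for v in valid if v > threshold)
    return 100.0 * above / len(valid)


# Hypothetical 9-point window of 5xx reqs/min values (invented for illustration).
window = [120.0, 180.0, 95.0, 1450.0, 300.0, 210.0, 160.0, 1210.0, 140.0]

critical = percent_above(window, 1000.0)
warning = percent_above(window, 250.0)
print(f"{critical:.2f}% above 1000.0, {warning:.2f}% above 250.0")
# 22.22% above 1000.0, 33.33% above 250.0
```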
[23:08:08] <grrrit-wm>	 (PS2) BryanDavis: Add LICENSE [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/292057 (owner: Yuvipanda)
[23:08:13] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Add LICENSE [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/292057 (owner: Yuvipanda)
[23:08:39] <YuviPanda>	 bd808: I think for the base images you need to agree
[23:08:48] <grrrit-wm>	 (CR) BryanDavis: [C: 2] Add LICENSE [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/292057 (owner: Yuvipanda)
[23:08:55] * bd808 agrees
[23:09:03] <YuviPanda>	 bd808: for the other one, only other committer is valhallasw`cloud and he also only added a short comment. I'm ok with us merging it or waiting for his +1
[23:09:13] <James_F>	 Dereckson: Note that you'll need to make the pull-through commits for the MW-VE production branches into MW manually, as always.
[23:12:11] <grrrit-wm>	 (CR) BryanDavis: [C: 1] Add LICENSE (1 comment) [software/tools-webservice] - https://gerrit.wikimedia.org/r/292056 (owner: Yuvipanda)
[23:13:43] <Dereckson>	 James_F: I generally rebase the wmf branch against origin/wmf
[23:14:04] <icinga-wm>	 RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:14:22] <James_F>	 Dereckson: OK? But my point remains, gerrit won't auto-make the commits for you for the VE-MW repo.
[23:14:37] <Dereckson>	 oh okay, yes yes for VE I rebase the extension branch.
[23:15:15] * James_F nods.
[23:15:19] <James_F>	 It's irritating.
[23:15:25] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:15:29] <Dereckson>	 So we're waiting for Zuul now, https://integration.wikimedia.org/zuul/
[23:15:50] <grrrit-wm>	 (PS3) Madhuvishy: uwsgi: Allow specifying plugins optionally as a uwsgi command line option [puppet] - https://gerrit.wikimedia.org/r/292030
[23:15:55] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:17:08] <grrrit-wm>	 (PS2) Dereckson: Set collation to uca-it for it.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/292050 (https://phabricator.wikimedia.org/T136647)
[23:18:30] <ostriches>	 Dereckson: I'm going afk for just a bit and you should be done with swat before I'm back, but plz ping me or something when swat is done, I'm going to sync something afterwards.
[23:18:38] <grrrit-wm>	 (Merged) jenkins-bot: Add LICENSE [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/292057 (owner: Yuvipanda)
[23:18:43] <Dereckson>	 ostriches: k
[23:18:47] <ostriches>	 Thx
[23:19:48] <grrrit-wm>	 (CR) Luke081515: [C: 1] Set collation to uca-it for it.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/292050 (https://phabricator.wikimedia.org/T136647) (owner: Dereckson)
[23:22:54] <Dereckson>	 Ah, Zuul merged stuff.
[23:24:47] <ebernhardson>	 cool
[23:28:32] <logmsgbot>	 !log dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: Turn off textcat subtest of search satisfaction (T134319) (duration: 00m 30s)
[23:28:32] <stashbot>	 T134319: Turn off TextCat A/B test on the English Wikipedia on or after May 23 - https://phabricator.wikimedia.org/T134319
[23:28:36] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:28:39] <Dereckson>	 ebernhardson: please test ^
[23:29:15] <ebernhardson>	 Dereckson: it will be a couple minutes before the cache clears, but will do
[23:29:34] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[23:33:12] <logmsgbot>	 !log dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/Echo/modules/styles/: Adjust styling for Special:Notification items (T136572, 1/2) (duration: 00m 30s)
[23:33:12] <stashbot>	 T136572: Make notification styling on the Notifications Page closer to the ones in the panel - https://phabricator.wikimedia.org/T136572
[23:33:17] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:33:49] <logmsgbot>	 !log dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/Echo/modules/ui/mw.echo.ui.NotificationItemWidget.js: Adjust styling for Special:Notification items (T136572, 2/2) (duration: 00m 24s)
[23:33:50] <stashbot>	 T136572: Make notification styling on the Notifications Page closer to the ones in the panel - https://phabricator.wikimedia.org/T136572
[23:33:52] <Dereckson>	 RoanKattouw: please test ^
[23:33:54] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:36:27] <ebernhardson>	 Dereckson: testwiki looks reasonable for my patch
[23:36:33] <Dereckson>	 ok let's go for wmf3
[23:36:53] <RoanKattouw>	 Dereckson: Looks good
[23:37:02] <Dereckson>	 ack
[23:37:44] <logmsgbot>	 !log dereckson@tin Synchronized php-1.28.0-wmf.3/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: Turn off textcat subtest of search satisfaction (T134319) (duration: 00m 23s)
[23:37:44] <stashbot>	 T134319: Turn off TextCat A/B test on the English Wikipedia on or after May 23 - https://phabricator.wikimedia.org/T134319
[23:37:48] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:38:09] <Dereckson>	 James_F: you're next
[23:39:07] <James_F>	 Kk.
[23:41:28] <grrrit-wm>	 (PS2) Dzahn: remove pardus table and orain remnants [debs/wikistats] - https://gerrit.wikimedia.org/r/291481 (https://phabricator.wikimedia.org/T136460)
[23:43:53] <logmsgbot>	 !log dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/VisualEditor/modules/ve-mw/init/ve.init.MWWelcomeDialog.js: ve.init.MWWelcomeDialog: Fix keyboard focus on dialog actions (T135808) (duration: 00m 23s)
[23:43:54] <stashbot>	 T135808: "Start editing" popup can't be dismissed without clicking "Start editing" - https://phabricator.wikimedia.org/T135808
[23:43:58] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:44:08] <Dereckson>	 James_F: please test on wmf4 ^
[23:44:46] <James_F>	 Dereckson: Yup, works well.
[23:45:26] <logmsgbot>	 !log dereckson@tin Synchronized php-1.28.0-wmf.3/extensions/VisualEditor/modules/ve-mw/init/ve.init.MWWelcomeDialog.js: ve.init.MWWelcomeDialog: Fix keyboard focus on dialog actions (T135808) (duration: 00m 22s)
[23:45:27] <stashbot>	 T135808: "Start editing" popup can't be dismissed without clicking "Start editing" - https://phabricator.wikimedia.org/T135808
[23:45:31] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:47:19] <logmsgbot>	 !log dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/GeoData/includes/Hooks.php: Don't index non-Earth coordinates (T136559) (duration: 00m 24s)
[23:47:20] <stashbot>	 T136559: Elasticsearch: illegal longitude value [219.38] for coordinates.coord - https://phabricator.wikimedia.org/T136559
[23:47:24] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:47:56] <grrrit-wm>	 (CR) Dzahn: [C: 2 V: 2] remove pardus table and orain remnants [debs/wikistats] - https://gerrit.wikimedia.org/r/291481 (https://phabricator.wikimedia.org/T136460) (owner: Dzahn)
[23:47:56] <Dereckson>	 MaxSem: please test ^
[23:48:00] <Dereckson>	 (wmf4)
[23:50:02] <ebernhardson>	 Dereckson: i have to run and catch a train, maxsem will be double checking that my patch works (he sits next to me)
[23:52:22] <Dereckson>	 okay, good train
[23:53:05] <Dereckson>	 MaxSem: ebernhardson has already tested it for wmf4, only wmf3 still to test for Turn off textcat search test
[23:53:47] <grrrit-wm>	 (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/292050 (https://phabricator.wikimedia.org/T136647) (owner: Dereckson)
[23:53:59] <MaxSem>	 Dereckson, "testing" requires monitoring over a prolonged period, so just go ahead
[23:54:05] <Dereckson>	 k
[23:54:26] <grrrit-wm>	 (Merged) jenkins-bot: Set collation to uca-it for it.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/292050 (https://phabricator.wikimedia.org/T136647) (owner: Dereckson)
[23:54:54] <logmsgbot>	 !log dereckson@tin Synchronized php-1.28.0-wmf.3/extensions/GeoData/includes/Hooks.php: Don't index non-Earth coordinates (T136559) (duration: 00m 23s)
[23:54:55] <stashbot>	 T136559: Elasticsearch: illegal longitude value [219.38] for coordinates.coord - https://phabricator.wikimedia.org/T136559
[23:54:59] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:54:59] <Dereckson>	 MaxSem: here you are ^ all is synced for you and ebernhardson 
[23:55:17] <MaxSem>	 thanks
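T136559 is Elasticsearch rejecting a longitude of 219.38, which is outside the -180 to 180 range valid on Earth; the GeoData fix synced above stops indexing non-Earth coordinates. A hypothetical Python sketch of such a filter (the real change is PHP in GeoData's includes/Hooks.php; the Coord type, its globe field, and the extra range check are assumptions for illustration, not GeoData's code):

```python
from dataclasses import dataclass


@dataclass
class Coord:
    lat: float
    lon: float
    globe: str = "earth"  # hypothetical field; GeoData's real schema may differ


def is_indexable(c: Coord) -> bool:
    """Keep only Earth coordinates within the lat/lon ranges valid on Earth."""
    # The range check is an extra guard, not something stated in the log.
    return c.globe == "earth" and -90.0 <= c.lat <= 90.0 and -180.0 <= c.lon <= 180.0


coords = [
    Coord(41.9, 12.5),                   # Earth, in range: indexed
    Coord(-14.6, 219.38, globe="mars"),  # non-Earth: skipped
]
indexable = [c for c in coords if is_indexable(c)]
print(len(indexable))  # 1
```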
[23:57:27] <logmsgbot>	 !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Set collation to uca-it for it.wikipedia (T136647) (duration: 00m 25s)
[23:57:28] <stashbot>	 T136647: Set UCA-IT as it.wiki's collation - https://phabricator.wikimedia.org/T136647
[23:57:29] <Dereckson>	 Will test that later, after running collation update script on Terbium.
[23:57:31] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:57:55] <Dereckson>	 Let's see if we can find Eranroz. If not, the SWAT is done.
[23:58:53] <Dereckson>	 22:36:22 -!- eranroz [~Thunderbi@37.46.39.199] has quit [Quit: eranroz]