[00:00:04] <jouncebot>	 twentyafterfour: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160616T0000).
[00:13:00] <matt_flaschen>	 sync-dir hung for me at 99%.  I'll give it a couple minutes, then retry.
[00:14:20] <icinga-wm>	 PROBLEM - Apache HTTP on mw1147 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50408 bytes in 9.487 second response time
[00:15:00] <icinga-wm>	 PROBLEM - HHVM rendering on mw1147 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:15:40] <icinga-wm>	 PROBLEM - puppet last run on mw1147 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:15:52] <icinga-wm>	 PROBLEM - Disk space on mw1147 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:15:52] <icinga-wm>	 PROBLEM - salt-minion processes on mw1147 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:00] <icinga-wm>	 PROBLEM - nutcracker port on mw1147 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:42] <logmsgbot>	 !log mattflaschen@tin Synchronized php-1.28.0-wmf.6/extensions/Kartographer: Search for maplinks inside and outside of content. (duration: 01m 08s)
[00:16:48] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:16:50] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1147 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:51] <icinga-wm>	 PROBLEM - SSH on mw1147 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:17:04] <matt_flaschen>	 yurik, okay, it's correct on 387 servers, 1 failed, so it should be fine, unless you're doing an important demo.
[00:17:22] <matt_flaschen>	 00:16:42 ['/usr/bin/scap', 'pull', '--no-update-l10n', '--include', 'php-1.28.0-wmf.6', '--include', 'php-1.28.0-wmf.6/extensions', '--include', 'php-1.28.0-wmf.6/extensions/Kartographer', '--include', 'php-1.28.0-wmf.6/extensions/Kartographer/***', 'mw1097.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw1010.eqiad.wmnet', 'mw2119.codfw.wmnet', 'mw2215.codfw.wmnet', 'mw2080.codfw.wmnet', 'mw1201.eqiad.wmnet', 'mw2187.codfw.wmnet', '
[00:17:24] <matt_flaschen>	 mw1216.eqiad.wmnet'] on mw1147.eqiad.wmnet returned [255]: Connection to 10.64.16.127 timed out while waiting to read
[00:17:30] <icinga-wm>	 PROBLEM - DPKG on mw1147 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:17:31] <matt_flaschen>	 Which probably-not-coincidentally is the same as:
[00:17:31] <icinga-wm>	 PROBLEM - configured eth on mw1147 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:17:40] <matt_flaschen>	 <icinga-wm> PROBLEM - SSH on mw1147 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:17:42] <icinga-wm>	 PROBLEM - dhclient process on mw1147 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:18:17] <matt_flaschen>	 ^ robh , sync problems.
[00:18:45] <matt_flaschen>	 yurik, the second time I ran it, it did complete, one apache failed though.
[00:18:53] <robh>	 mw1147 died out?
[00:19:00] <yurik>	 rip mw1147
[00:19:05] <yurik>	 thx matt_flaschen !
[00:19:06] <robh>	 lemme take a peek at it
[00:19:38] <matt_flaschen>	 robh, if that's what that scap pull error above means.  Thanks for checking.
[00:20:03] <robh>	 well, it has that plus then icinga shows it falling over
[00:20:03] <matt_flaschen>	 Scap complete
[00:20:11] <robh>	 it was likely taxed and scap killed it, it happens.
[00:21:09] <robh>	 trying to login is stalling out.
[00:21:14] <robh>	 (from serial)
[00:21:52] <robh>	 !log mw1147 seems to have died during scap, unresponsive from serial console, powercycled
[00:21:59] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:22:23] <robh>	 matt_flaschen: what is the scap command i need to run to bring it back up to snuff once its online?
[00:22:31] <icinga-wm>	 PROBLEM - nutcracker process on mw1147 is CRITICAL: Timeout while attempting connection
[00:22:31] <icinga-wm>	 PROBLEM - HHVM processes on mw1147 is CRITICAL: Timeout while attempting connection
[00:22:37] <robh>	 other than scapping everything again which seems excessive
[00:22:51] <robh>	 (if you know that is ;)
[00:22:58] <legoktm>	 robh: sync-common
[00:23:09] <robh>	 cool, things havent changed that much then yay
[00:23:20] <robh>	 i'll babysit its reboot and run that once its os is back
[00:23:48] <legoktm>	 oh, it's called "scap pull" now
[00:24:02] <robh>	 see, i wouldnt have known that, thank you =]
[00:24:11] <icinga-wm>	 RECOVERY - Disk space on mw1147 is OK: DISK OK
[00:24:11] <icinga-wm>	 RECOVERY - salt-minion processes on mw1147 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[00:24:12] <icinga-wm>	 RECOVERY - nutcracker port on mw1147 is OK: TCP OK - 0.000 second response time on port 11212
[00:24:32] <icinga-wm>	 RECOVERY - nutcracker process on mw1147 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[00:24:32] <icinga-wm>	 RECOVERY - HHVM processes on mw1147 is OK: PROCS OK: 6 processes with command name hhvm
[00:24:38] <robh>	 !log mw1147 rebooted and manually running scap pull
[00:24:44] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:25:00] <icinga-wm>	 RECOVERY - Apache HTTP on mw1147 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 8.234 second response time
[00:25:02] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1147 is OK: OK: nf_conntrack is 0 % full
[00:25:03] <robh>	 hrmm, maybe i should have screened that, here is hoping it doesnt take too long
[00:25:11] <icinga-wm>	 RECOVERY - SSH on mw1147 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[00:25:21] * robh has to leave his place in less than 35 minutes
[00:25:50] <icinga-wm>	 RECOVERY - DPKG on mw1147 is OK: All packages OK
[00:25:51] <icinga-wm>	 RECOVERY - configured eth on mw1147 is OK: OK - interfaces up
[00:25:55] <robh>	 00:24:25 Copying to mw1147.eqiad.wmnet from deployment.eqiad.wmnet
[00:25:55] <robh>	 00:24:25 Started rsync common
[00:26:00] <robh>	 and waiting, heh.
[00:26:01] <icinga-wm>	 RECOVERY - dhclient process on mw1147 is OK: PROCS OK: 0 processes with command name dhclient
[00:26:08] <legoktm>	 it should take a minute or two iirc
[00:26:10] <icinga-wm>	 RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 43 minutes ago with 0 failures
[00:26:30] <robh>	 00:26:23 Finished rsync common (duration: 01m 57s) 
[00:26:33] <robh>	 legoktm: you are correct
[00:27:01] <legoktm>	 :)
[00:27:04] <legoktm>	 https://wikitech.wikimedia.org/w/index.php?title=Wikimedia_binaries&type=revision&diff=657357&oldid=539850 
[00:27:31] <icinga-wm>	 RECOVERY - HHVM rendering on mw1147 is OK: HTTP OK: HTTP/1.1 200 OK - 66410 bytes in 0.179 second response time
[00:28:11] <Krenair>	 <robh> see, i wouldnt have known that, thank you =]
[00:28:26] <Krenair>	 yeah but neither does anyone else, I think it tells you about the new command
[00:29:02] <greg-g>	 change is hard, so is reading
[00:29:26] <twentyafterfour>	 jouncebot: doing the needful.
[00:29:55] <twentyafterfour>	 updating phabricator, downtime will be minimal
[00:33:20] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:38:58] <grrrit-wm>	 (PS1) Luke081515: Two permission changes at urwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/294652 (https://phabricator.wikimedia.org/T137888)
[00:41:08] <twentyafterfour>	 !log taking phabricator offline momentarily for scheduled maintenance.
[00:41:14] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:43:17] <twentyafterfour>	 !log phabricator upgrade/maintenance complete. Everything appears to be back up and running normally.
[00:43:23] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:51:32] <grrrit-wm>	 (PS1) 20after4: force HTTPS when x-forwarded-for header is set [puppet] - https://gerrit.wikimedia.org/r/294653
[00:52:07] <twentyafterfour>	 can I get an opsen to merge https://gerrit.wikimedia.org/r/#/c/294653/ so that I can re-enable puppet on iridium?
[00:52:43] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] force HTTPS when x-forwarded-for header is set [puppet] - https://gerrit.wikimedia.org/r/294653 (owner: 20after4)
[00:52:52] <twentyafterfour>	 hmm
[00:54:16] <twentyafterfour>	 wtf ...why is pep8 voting on puppet repo? that failure can't be related to my change anyway.
[00:56:29] <grrrit-wm>	 (CR) 20after4: [C: 1] "jenkins-bot is a liar. nothing wrong with this commit" [puppet] - https://gerrit.wikimedia.org/r/294653 (owner: 20after4)
[00:56:42] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[00:57:22] <twentyafterfour>	 !log puppet disabled on iridium because https://gerrit.wikimedia.org/r/#/c/294653/ needs to merge (hotfix in preamble.php which puppet will undo if it's allowed to run)
[00:57:28] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:03:31] <grrrit-wm>	 (CR) Legoktm: "recheck" [puppet] - https://gerrit.wikimedia.org/r/294653 (owner: 20after4)
[01:06:52] <grrrit-wm>	 (CR) 20after4: "legoktm: the commit doesn't even touch python code at all. this should not be a voting test if the repository state is already failing by " [puppet] - https://gerrit.wikimedia.org/r/294653 (owner: 20after4)
[01:23:06] <grrrit-wm>	 (PS11) MaxSem: Script to do the initial data load from OSM for Maps project [puppet] - https://gerrit.wikimedia.org/r/293105 (owner: Gehel)
[01:24:15] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Script to do the initial data load from OSM for Maps project [puppet] - https://gerrit.wikimedia.org/r/293105 (owner: Gehel)
[01:27:36] <wikibugs>	 Operations, Maps-Sprint: Increase frequency of OSM replication - https://phabricator.wikimedia.org/T137939#2384405 (MaxSem)
[01:31:08] <grrrit-wm>	 (CR) 20after4: "This is needed before re-enabling puppet on iridium" [puppet] - https://gerrit.wikimedia.org/r/294653 (owner: 20after4)
[02:01:27] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:13:36] <icinga-wm>	 PROBLEM - puppet last run on db2043 is CRITICAL: CRITICAL: puppet fail
[02:27:38] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:34:38] <logmsgbot>	 !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.5) (duration: 15m 49s)
[02:34:47] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:34:58] <icinga-wm>	 PROBLEM - HHVM rendering on mw1137 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:35:57] <icinga-wm>	 PROBLEM - Apache HTTP on mw1137 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:35:57] <icinga-wm>	 PROBLEM - puppet last run on mw1137 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:36:17] <icinga-wm>	 PROBLEM - DPKG on mw1137 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:36:18] <icinga-wm>	 PROBLEM - nutcracker port on mw1137 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:36:36] <icinga-wm>	 PROBLEM - Disk space on mw1137 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:36:37] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1137 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:36:56] <icinga-wm>	 PROBLEM - HHVM processes on mw1137 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:36:58] <icinga-wm>	 PROBLEM - nutcracker process on mw1137 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:37:26] <icinga-wm>	 PROBLEM - SSH on mw1137 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:37:28] <icinga-wm>	 PROBLEM - dhclient process on mw1137 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:37:46] <icinga-wm>	 PROBLEM - configured eth on mw1137 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:38:08] <icinga-wm>	 PROBLEM - salt-minion processes on mw1137 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:38:58] <icinga-wm>	 RECOVERY - puppet last run on db2043 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[02:44:18] <icinga-wm>	 RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 25 minutes ago with 0 failures
[02:44:18] <icinga-wm>	 RECOVERY - salt-minion processes on mw1137 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:44:36] <icinga-wm>	 RECOVERY - DPKG on mw1137 is OK: All packages OK
[02:44:37] <icinga-wm>	 RECOVERY - nutcracker port on mw1137 is OK: TCP OK - 0.000 second response time on port 11212
[02:44:48] <icinga-wm>	 RECOVERY - Disk space on mw1137 is OK: DISK OK
[02:44:48] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1137 is OK: OK: nf_conntrack is 0 % full
[02:45:07] <icinga-wm>	 RECOVERY - HHVM processes on mw1137 is OK: PROCS OK: 12 processes with command name hhvm
[02:45:17] <icinga-wm>	 RECOVERY - nutcracker process on mw1137 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[02:45:37] <icinga-wm>	 RECOVERY - SSH on mw1137 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[02:45:46] <icinga-wm>	 RECOVERY - dhclient process on mw1137 is OK: PROCS OK: 0 processes with command name dhclient
[02:45:57] <icinga-wm>	 RECOVERY - configured eth on mw1137 is OK: OK - interfaces up
[02:51:37] <icinga-wm>	 PROBLEM - puppet last run on mira is CRITICAL: CRITICAL: puppet fail
[02:52:37] <icinga-wm>	 PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: Puppet has 81 failures
[03:16:53] <grrrit-wm>	 (PS1) KartikMistry: apertium-arg: Initial Debian packaging [debs/contenttranslation/apertium-arg] - https://gerrit.wikimedia.org/r/294657 (https://phabricator.wikimedia.org/T124369)
[03:20:49] <grrrit-wm>	 (PS1) KartikMistry: apertium-spa: Initial Debian packaging [debs/contenttranslation/apertium-spa] - https://gerrit.wikimedia.org/r/294658 (https://phabricator.wikimedia.org/T124370)
[03:21:08] <icinga-wm>	 RECOVERY - puppet last run on mira is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[03:22:56] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[03:23:08] <icinga-wm>	 PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [1000.0]
[04:32:17] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:35:36] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[05:37:37] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5215616 keys - replication_delay is 0
[05:56:16] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[06:17:12] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[06:29:32] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:33:53] <icinga-wm>	 PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:03] <icinga-wm>	 PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:34:14] <icinga-wm>	 PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:02] <icinga-wm>	 PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:56:33] <icinga-wm>	 RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:57:24] <icinga-wm>	 RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:03] <icinga-wm>	 RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:14] <icinga-wm>	 RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:04:37] <_joe_>	 hi, puppetmaster
[07:05:02] <_joe_>	 you know you're not even the lamest piece of software I ever had to manage?
[07:10:02] <wikibugs>	 Blocked-on-Operations, Datasets-Archiving, Dumps-Generation, Flow, Collab-Team-2016-Apr-Jun-Q4: Publish recurring Flow dumps at http://dumps.wikimedia.org/ - https://phabricator.wikimedia.org/T119511#2384613 (Nemo_bis) >>! In T119511#2379060, @ArielGlenn wrote: > Uh, this is done, insofar as...
[08:03:48] <grrrit-wm>	 (PS1) Mobrovac: RESTBase: Make sendind resource_change events optional [puppet] - https://gerrit.wikimedia.org/r/294669
[08:05:08] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] RESTBase: Make sendind resource_change events optional [puppet] - https://gerrit.wikimedia.org/r/294669 (owner: Mobrovac)
[08:06:26] <icinga-wm>	 PROBLEM - configured eth on mw2247 is CRITICAL: Timeout while attempting connection
[08:06:47] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2247 is CRITICAL: Host mw2247 is not in mediawiki-installation dsh group
[08:06:47] <icinga-wm>	 PROBLEM - dhclient process on mw2247 is CRITICAL: Timeout while attempting connection
[08:07:10] <grrrit-wm>	 (Abandoned) Hashar: contint: cleanup gallium / use contint1001 [puppet] - https://gerrit.wikimedia.org/r/293283 (https://phabricator.wikimedia.org/T137358) (owner: Hashar)
[08:07:16] <icinga-wm>	 PROBLEM - nutcracker port on mw2247 is CRITICAL: Timeout while attempting connection
[08:07:17] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw2247 is CRITICAL: Connection timed out
[08:07:46] <icinga-wm>	 PROBLEM - nutcracker process on mw2247 is CRITICAL: Timeout while attempting connection
[08:07:57] <icinga-wm>	 PROBLEM - puppet last run on mw2247 is CRITICAL: Timeout while attempting connection
[08:08:16] <icinga-wm>	 PROBLEM - salt-minion processes on mw2247 is CRITICAL: Timeout while attempting connection
[08:08:27] <grrrit-wm>	 (CR) Mobrovac: "The tox failure has nothing to do with this patch ... This is becoming a bit annoying, honestly." [puppet] - https://gerrit.wikimedia.org/r/294669 (owner: Mobrovac)
[08:08:36] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw2247 is CRITICAL: Timeout while attempting connection
[08:08:56] <icinga-wm>	 PROBLEM - DPKG on mw2247 is CRITICAL: Timeout while attempting connection
[08:09:06] <icinga-wm>	 PROBLEM - Disk space on mw2247 is CRITICAL: Timeout while attempting connection
[08:09:07] <_joe_>	 I am imaging a few servers
[08:09:37] <icinga-wm>	 PROBLEM - MD RAID on mw2247 is CRITICAL: Timeout while attempting connection
[08:15:12] <grrrit-wm>	 (PS1) KartikMistry: apertium-es-ca: Rebuild for Jessie and other fixes [debs/contenttranslation/apertium-es-ca] - https://gerrit.wikimedia.org/r/294671 (https://phabricator.wikimedia.org/T107306)
[08:15:25] <jynus>	 !log rebooting db1085 before putting it back into production
[08:15:29] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:18:13] <wikibugs>	 Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2384669 (KartikMistry)
[08:19:41] <grrrit-wm>	 (CR) Mobrovac: "OK'ed by the PCC - https://puppet-compiler.wmflabs.org/3131/" [puppet] - https://gerrit.wikimedia.org/r/294669 (owner: Mobrovac)
[08:20:52] <wikibugs>	 Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2384675 (KartikMistry)
[08:24:08] <icinga-wm>	 RECOVERY - MD RAID on mw2247 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[08:24:17] <icinga-wm>	 RECOVERY - nutcracker process on mw2247 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[08:24:56] <icinga-wm>	 RECOVERY - salt-minion processes on mw2247 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:25:07] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw2247 is OK: OK: nf_conntrack is 0 % full
[08:25:08] <icinga-wm>	 RECOVERY - configured eth on mw2247 is OK: OK - interfaces up
[08:25:28] <icinga-wm>	 RECOVERY - dhclient process on mw2247 is OK: PROCS OK: 0 processes with command name dhclient
[08:25:28] <icinga-wm>	 RECOVERY - DPKG on mw2247 is OK: All packages OK
[08:25:46] <icinga-wm>	 RECOVERY - Disk space on mw2247 is OK: DISK OK
[08:25:56] <icinga-wm>	 RECOVERY - nutcracker port on mw2247 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[08:30:26] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw2247 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.083 second response time
[08:33:16] <icinga-wm>	 PROBLEM - puppet last run on mw2247 is CRITICAL: CRITICAL: Puppet has 5 failures
[08:33:34] <wikibugs>	 Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2384707 (KartikMistry)
[08:35:49] <grrrit-wm>	 (PS1) Jcrespo: Pool db1085, increase weight of all new db servers [mediawiki-config] - https://gerrit.wikimedia.org/r/294672 (https://phabricator.wikimedia.org/T133398)
[08:37:18] <grrrit-wm>	 (PS2) Muehlenhoff: services firejail: make fs blacklist more obvious [puppet] - https://gerrit.wikimedia.org/r/293515 (owner: JanZerebecki)
[08:38:46] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] services firejail: make fs blacklist more obvious [puppet] - https://gerrit.wikimedia.org/r/293515 (owner: JanZerebecki)
[08:39:09] <grrrit-wm>	 (CR) Jcrespo: [C: 2] Pool db1085, increase weight of all new db servers [mediawiki-config] - https://gerrit.wikimedia.org/r/294672 (https://phabricator.wikimedia.org/T133398) (owner: Jcrespo)
[08:40:41] <moritzm>	 is the operations-puppet-tox-jessie check now active? I'm wondering why https://gerrit.wikimedia.org/r/293515 failed jenkins?
[08:41:02] <_joe_>	 it's active, yes
[08:41:08] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Pool db1085, increase weight of all new db servers (duration: 00m 29s)
[08:41:12] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:41:26] <jynus>	 ERROR: InvocationError: '/home/jenkins/workspace/operations-puppet-tox-jessie/.tox/pep8/bin/flake8'
[08:41:27] <_joe_>	 when it's not active you see "non-voting" on the side
[08:42:37] <grrrit-wm>	 (PS1) KartikMistry: apertium-eus: Rebuild for Jessie and other fixes [debs/contenttranslation/apertium-eus] - https://gerrit.wikimedia.org/r/294673 (https://phabricator.wikimedia.org/T107306)
[08:42:42] <moritzm>	 ah, ok
[08:45:38] <hashar>	 so yeah yesterday
[08:45:54] <hashar>	 I have phased out the legacy job that was running pep8 1.4.6 in each directory containing python scripts
[08:46:02] <hashar>	 and switched to a job that runs  'tox' from the root of the repo
[08:46:13] <hashar>	 made possible thanks to Bryan and all reviewers that fixed all the python linting issues we had
[08:46:23] <hashar>	 so now CI ends up doing something like:
[08:46:26] <hashar>	 pip install flake8
[08:46:27] <hashar>	 flake8
[08:46:59] <wikibugs>	 Operations, DBA: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2384720 (jcrespo)
[08:47:01] <wikibugs>	 Operations, DBA, Patch-For-Review: reimage or decom db servers on precise - https://phabricator.wikimedia.org/T125028#2384721 (jcrespo)
[08:47:07] <wikibugs>	 Operations, DBA: Physical location SPOF because of database server distribution on a single rack (D1) - https://phabricator.wikimedia.org/T111992#2384723 (jcrespo)
[08:47:10] <wikibugs>	 Operations, DBA, Patch-For-Review: Install, configure and provision recently arrived db core machines - https://phabricator.wikimedia.org/T133398#2384717 (jcrespo) Open>Resolved All 16 new servers (21 in total, 3 per shard) are pooled into production- we will do some adjustments over the foll...
[08:47:16] <hashar>	 bonus point, you can  run 'tox' on your local machine to reproduce what CI is doing 
[08:48:07] <icinga-wm>	 RECOVERY - puppet last run on mw2247 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[08:48:23] <hashar>	 oh
[08:48:25] <hashar>	 moritzm: got it
[08:48:33] <hashar>	 moritzm: we have defined the dependency as  "flake8"
[08:48:49] <hashar>	 so that downloads whatever new version is on pypi, and one got released yesterday :(
[08:50:48] <wikibugs>	 Operations, DBA, Epic: Eliminate SPOF at the main database infrastructure - https://phabricator.wikimedia.org/T119626#2384731 (jcrespo)
[08:50:49] <wikibugs>	 Operations, DBA: Physical location SPOF because of database server distribution on a single rack (D1) - https://phabricator.wikimedia.org/T111992#2384728 (jcrespo) Open>Resolved a:jcrespo This is now fixed, D1 is no longer a SPOF. Although somehow heavy, if D1 or the whole D row went down, we...
[08:50:55] <grrrit-wm>	 (PS1) Hashar: Explicitly pin flake8 to 2.5.5 [puppet] - https://gerrit.wikimedia.org/r/294674
[08:51:13] <hashar>	 _joe_ moritzm : I guess we want to explicitly pin the flake8 version being used  https://gerrit.wikimedia.org/r/294674     
[08:51:20] <hashar>	 since upstream tends to add new checks from time to time
[08:51:29] <hashar>	 (especially on a new minor version)
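A minimal sketch of the pinning point hashar makes above, assuming the CI dependencies are written as plain pip requirement strings. The helper name `unpinned` is hypothetical and is not part of tox, pip, or flake8. It only flags specs that lack an exact `==` pin, which is the situation that let a fresh flake8 release change CI results overnight.

```python
import re

# An exact pin looks like "flake8==2.5.5"; anything else is treated as unpinned.
# (Assumption: simple "name==version" specs, no extras or environment markers.)
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+==[A-Za-z0-9_.\-+!]+$")


def unpinned(deps):
    """Return the dependency specs that do not pin an exact version."""
    return [d for d in deps if not PINNED.match(d.strip())]


if __name__ == "__main__":
    # "flake8" alone installs whatever is newest on PyPI at build time.
    print(unpinned(["flake8"]))
    # "flake8==2.5.5" is the pin proposed in the change linked above.
    print(unpinned(["flake8==2.5.5"]))
```

A check like this could run before the lint job, so that an unpinned linter is reported as such rather than surfacing later as an unrelated-looking failure.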
[08:52:43] <moritzm>	 looks good, I'll merge
[08:53:09] <grrrit-wm>	 (CR) Muehlenhoff: [C: 2 V: 2] Explicitly pin flake8 to 2.5.5 [puppet] - https://gerrit.wikimedia.org/r/294674 (owner: Hashar)
[08:53:26] <hashar>	 then 'recheck' your patch  and it shall pass
[08:53:36] <moritzm>	 k
[08:53:39] <hashar>	 (since your open patch is going to be tested as a merge on tip of production branch)
[08:53:52] <hashar>	 sorry should have thought about pinning the version
[08:54:03] <hashar>	 there is nothing more annoying than a Jenkins job failing for unrelated reasons
[08:56:08] <moritzm>	 np, the joys of npm/pip etc. pp :-)
[08:56:12] <hashar>	 mobile and I are going to push a fix for MobileFrontend Special:Nearby .  It has some javascript error due to a missing dependency in the RL definition
[08:56:26] <hashar>	 https://phabricator.wikimedia.org/T137919  for the bug and wmf.6 patch is https://gerrit.wikimedia.org/r/#/c/294649/
[08:56:35] <grrrit-wm>	 (PS3) Muehlenhoff: services firejail: make fs blacklist more obvious [puppet] - https://gerrit.wikimedia.org/r/293515 (owner: JanZerebecki)
[09:00:18] <wikibugs>	 Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2384745 (KartikMistry)
[09:02:17] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:04:34] <icinga-wm>	 PROBLEM - Apache HTTP on mw1278 is CRITICAL: Connection timed out
[09:05:26] <grrrit-wm>	 (PS1) KartikMistry: apertium-hbs: Rebuild for Jessie and other fixes [debs/contenttranslation/apertium-hbs] - https://gerrit.wikimedia.org/r/294675 (https://phabricator.wikimedia.org/T107306)
[09:05:44] <icinga-wm>	 PROBLEM - puppet last run on mw1278 is CRITICAL: Timeout while attempting connection
[09:06:14] <icinga-wm>	 PROBLEM - salt-minion processes on mw1278 is CRITICAL: Timeout while attempting connection
[09:07:04] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1278 is CRITICAL: Timeout while attempting connection
[09:07:05] <icinga-wm>	 PROBLEM - DPKG on mw1278 is CRITICAL: Timeout while attempting connection
[09:07:25] <icinga-wm>	 PROBLEM - Disk space on mw1278 is CRITICAL: Timeout while attempting connection
[09:07:54] <icinga-wm>	 PROBLEM - MD RAID on mw1278 is CRITICAL: Timeout while attempting connection
[09:08:24] <_joe_>	 it's me, I am installing that system
[09:08:44] <icinga-wm>	 PROBLEM - configured eth on mw1278 is CRITICAL: Timeout while attempting connection
[09:08:58] <wikibugs>	 Operations: ffmpeg/libav on jessie video scalers - https://phabricator.wikimedia.org/T137886#2384763 (MoritzMuehlenhoff) Sounds good, I'll rebuild libtheora as used on trusty for jessie-wikimedia and make a backport of ffmpeg2theora 0.30 for jessie.
[09:09:04] <icinga-wm>	 PROBLEM - dhclient process on mw1278 is CRITICAL: Timeout while attempting connection
[09:09:05] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1278 is CRITICAL: Host mw1278 is not in mediawiki-installation dsh group
[09:09:35] <icinga-wm>	 PROBLEM - nutcracker port on mw1278 is CRITICAL: Timeout while attempting connection
[09:09:55] <icinga-wm>	 PROBLEM - nutcracker process on mw1278 is CRITICAL: Timeout while attempting connection
[09:12:13] <wikibugs>	 Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2384764 (KartikMistry)
[09:13:37] <grrrit-wm>	 (PS1) Gehel: Interactive team would like to be notified of issues with Maps. [puppet] - https://gerrit.wikimedia.org/r/294676 (https://phabricator.wikimedia.org/T137869)
[09:14:45] <grrrit-wm>	 (PS1) Filippo Giunchedi: install_server: rename ms-be partman config to reflect reality [puppet] - https://gerrit.wikimedia.org/r/294677
[09:15:23] <grrrit-wm>	 (PS1) Filippo Giunchedi: swift: redirect syslog from all daemons to separate file [puppet] - https://gerrit.wikimedia.org/r/294678 (https://phabricator.wikimedia.org/T137397)
[09:15:53] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 2 V: 2] install_server: rename ms-be partman config to reflect reality [puppet] - https://gerrit.wikimedia.org/r/294677 (owner: Filippo Giunchedi)
[09:16:50] <wikibugs>	 Operations, ops-codfw, media-storage: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2384770 (fgiunchedi) @papaul partman recipe would be the same as other `ms-be` systems from HP, namely `ms-be-hp.cfg`, thanks!  also just to confirm, the 2x200GB SAS is SSD not spinning d...
[09:17:09] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 1] Interactive team would like to be notified of issues with Maps. [puppet] - https://gerrit.wikimedia.org/r/294676 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[09:18:08] <logmsgbot>	 !log hashar@tin Synchronized php-1.28.0-wmf.6/extensions/MobileFrontend: MobileFrontend RL registration issue preventing Special:Nearby from working properly T137919 (duration: 00m 36s)
[09:18:09] <stashbot>	 T137919: Uncaught Error: Module "mediawiki.router" is not loaded (on Special:Nearby) - https://phabricator.wikimedia.org/T137919
[09:18:12] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:21:33] <grrrit-wm>	 (03PS3) 10Filippo Giunchedi: DNS: Add mgmt DNS entries for ms-be2022 to ms-be2027 [dns] - 10https://gerrit.wikimedia.org/r/294543 (https://phabricator.wikimedia.org/T136630) (owner: 10Papaul)
[09:21:55] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, I've reworded the commit message" [dns] - 10https://gerrit.wikimedia.org/r/294543 (https://phabricator.wikimedia.org/T136630) (owner: 10Papaul)
[09:23:04] <grrrit-wm>	 (03PS1) 10Gehel: Fixed typo (cron instead of from). [puppet] - 10https://gerrit.wikimedia.org/r/294679 
[09:24:17] <grrrit-wm>	 (03CR) 10Gehel: "I have not seen much use of resource default in our code base. I wonder if there is a reason for that (apart from the awful and non obvio" [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[09:24:23] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Fixed typo (cron instead of from). [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[09:25:32] <grrrit-wm>	 (03PS2) 10Gehel: Fixed typo (cron instead of from). [puppet] - 10https://gerrit.wikimedia.org/r/294679 
[09:26:37] <grrrit-wm>	 (03PS2) 10Gehel: Interactive team would like to be notified of issues with Maps. [puppet] - 10https://gerrit.wikimedia.org/r/294676 (https://phabricator.wikimedia.org/T137869) 
[09:26:55] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[09:28:28] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] Interactive team would like to be notified of issues with Maps. [puppet] - 10https://gerrit.wikimedia.org/r/294676 (https://phabricator.wikimedia.org/T137869) (owner: 10Gehel)
[09:41:05] <icinga-wm>	 RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 11378 bytes in 0.010 second response time
[09:44:05] <wikibugs>	 06Operations: "puppet fail" flapping on restbase1007 - https://phabricator.wikimedia.org/T137952#2384810 (10fgiunchedi)
[09:44:26] <icinga-wm>	 RECOVERY - MD RAID on mw1278 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[09:44:45] <icinga-wm>	 PROBLEM - NTP on mw1278 is CRITICAL: NTP CRITICAL: Offset unknown
[09:44:54] <icinga-wm>	 RECOVERY - salt-minion processes on mw1278 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:45:25] <icinga-wm>	 RECOVERY - configured eth on mw1278 is OK: OK - interfaces up
[09:45:33] <grrrit-wm>	 (03CR) 10Muehlenhoff: [C: 04-1] "I'm not really fond of that, the --output option in firejail uses its own homegrown log rotation, let's rather redirect stdout/stderr in " [puppet] - 10https://gerrit.wikimedia.org/r/294499 (owner: 10Mobrovac)
[09:45:35] <icinga-wm>	 RECOVERY - dhclient process on mw1278 is OK: PROCS OK: 0 processes with command name dhclient
[09:45:44] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1278 is OK: OK: nf_conntrack is 0 % full
[09:46:05] <icinga-wm>	 RECOVERY - Disk space on mw1278 is OK: DISK OK
[09:46:15] <icinga-wm>	 RECOVERY - nutcracker port on mw1278 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[09:46:26] <icinga-wm>	 RECOVERY - nutcracker process on mw1278 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[09:47:55] <icinga-wm>	 RECOVERY - DPKG on mw1278 is OK: All packages OK
[09:48:21] <mobrovac>	 !log restbase deploy start of ebeaa46
[09:48:25] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:49:14] <icinga-wm>	 PROBLEM - HHVM rendering on mw1143 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:49:15] <icinga-wm>	 PROBLEM - Apache HTTP on mw1143 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:51:35] <icinga-wm>	 PROBLEM - SSH on mw1143 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:51:45] <icinga-wm>	 PROBLEM - configured eth on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:52:04] <icinga-wm>	 PROBLEM - nutcracker port on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:52:15] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:52:25] <icinga-wm>	 PROBLEM - puppet last run on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:52:35] <icinga-wm>	 PROBLEM - DPKG on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:54:18] <moritzm>	 I can't even connect to the serial console of mw1143, I'm only getting "Disconnected from UNKNOWN port 0", can someone please doublecheck whether it also fails?
[09:54:29] <_joe_>	 moritzm: I'll try
[09:55:38] <_joe_>	 I got in
[09:55:55] <icinga-wm>	 PROBLEM - dhclient process on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:56:17] <_joe_>	 !log powercycling mw1143, unresponsive on ssh, console
[09:56:21] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:56:25] <icinga-wm>	 PROBLEM - nutcracker process on mw1143 is CRITICAL: Timeout while attempting connection
[09:57:04] <icinga-wm>	 PROBLEM - Disk space on mw1143 is CRITICAL: Timeout while attempting connection
[09:57:24] <icinga-wm>	 PROBLEM - salt-minion processes on mw1143 is CRITICAL: Timeout while attempting connection
[09:57:25] <icinga-wm>	 PROBLEM - HHVM processes on mw1143 is CRITICAL: Timeout while attempting connection
[09:57:25] <icinga-wm>	 PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Puppet has 8 failures
[09:58:35] <icinga-wm>	 RECOVERY - nutcracker process on mw1143 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[09:58:35] <grrrit-wm>	 (03CR) 10Mobrovac: "I put up this patch mostly because I wasn't sure (and couldn't confirm) that firejail is actually letting stdout and stderr through to sys" [puppet] - 10https://gerrit.wikimedia.org/r/294499 (owner: 10Mobrovac)
[09:58:44] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1143 is OK: OK: nf_conntrack is 0 % full
[09:58:54] <icinga-wm>	 RECOVERY - puppet last run on mw1143 is OK: OK: Puppet is currently enabled, last run 42 minutes ago with 0 failures
[09:58:56] <icinga-wm>	 RECOVERY - DPKG on mw1143 is OK: All packages OK
[09:59:14] <icinga-wm>	 RECOVERY - Disk space on mw1143 is OK: DISK OK
[09:59:23] <mobrovac>	 !log restbase deploy end of ebeaa46
[09:59:25] <icinga-wm>	 RECOVERY - salt-minion processes on mw1143 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:59:25] <icinga-wm>	 RECOVERY - HHVM processes on mw1143 is OK: PROCS OK: 6 processes with command name hhvm
[09:59:27] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:59:34] <moritzm>	 ahh, ssh -vv shows that my SSH client apparently negotiated a DH key exchange which is too modern for whatever they have installed there, so fails :-/
[09:59:45] <icinga-wm>	 RECOVERY - NTP on mw1278 is OK: NTP OK: Offset -0.004133582115 secs
[10:00:04] <icinga-wm>	 RECOVERY - HHVM rendering on mw1143 is OK: HTTP OK: HTTP/1.1 200 OK - 66413 bytes in 7.482 second response time
[10:00:05] <icinga-wm>	 RECOVERY - Apache HTTP on mw1143 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.461 second response time
[10:00:15] <icinga-wm>	 RECOVERY - SSH on mw1143 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[10:00:15] <icinga-wm>	 RECOVERY - dhclient process on mw1143 is OK: PROCS OK: 0 processes with command name dhclient
[10:00:24] <icinga-wm>	 RECOVERY - configured eth on mw1143 is OK: OK - interfaces up
[10:00:45] <icinga-wm>	 RECOVERY - nutcracker port on mw1143 is OK: TCP OK - 0.000 second response time on port 11212
[10:03:01] <_joe_>	 moritzm: I'm on osx right now
[10:05:49] <grrrit-wm>	 (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/294669 (owner: 10Mobrovac)
[10:07:05] <grrrit-wm>	 (03PS1) 10Alexandros Kosiaris: servermon: Remove old urls.py file [puppet] - 10https://gerrit.wikimedia.org/r/294685 
[10:07:30] <grrrit-wm>	 (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/294653 (owner: 1020after4)
[10:07:38] <grrrit-wm>	 (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/293105 (owner: 10Gehel)
[10:08:13] <grrrit-wm>	 (03CR) 10Hashar: "Issue was due to a new version of the python linter flake8 that got released yesterday. Unrelated to this patch and now fixed (by pinning" [puppet] - 10https://gerrit.wikimedia.org/r/293105 (owner: 10Gehel)
[10:08:14] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[10:08:25] <icinga-wm>	 RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[10:08:28] <grrrit-wm>	 (03CR) 10Hashar: "Issue was due to a new version of the python linter flake8 that got released yesterday. Unrelated to this patch and now fixed (by pinning " [puppet] - 10https://gerrit.wikimedia.org/r/294653 (owner: 1020after4)
[10:08:30] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 032] servermon: Remove old urls.py file [puppet] - 10https://gerrit.wikimedia.org/r/294685 (owner: 10Alexandros Kosiaris)
[10:09:26] <icinga-wm>	 PROBLEM - Apache HTTP on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:09:54] <icinga-wm>	 PROBLEM - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors
[10:10:05] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5220557 keys - replication_delay is 0
[10:11:25] <icinga-wm>	 RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.057 second response time
[10:14:49] <mobrovac>	 !log scb1001 disabling puppet for a while to manually test changeprop with transclusion rules
[10:14:52] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:24:16] <wikibugs>	 06Operations, 13Patch-For-Review: revisit swift (sys)logging - https://phabricator.wikimedia.org/T137397#2384891 (10fgiunchedi)
[10:28:23] <wikibugs>	 06Operations, 10media-storage: swift backend machines load spike: cause and remediation - https://phabricator.wikimedia.org/T84385#2384895 (10fgiunchedi)
[10:28:46] <wikibugs>	 06Operations, 10media-storage, 13Patch-For-Review: swift upgrade plans: jessie and swift 2.x - https://phabricator.wikimedia.org/T117972#2384899 (10fgiunchedi)
[10:28:48] <wikibugs>	 06Operations, 10media-storage: swift backend machines load spike: cause and remediation - https://phabricator.wikimedia.org/T84385#926618 (10fgiunchedi)
[10:29:20] <grrrit-wm>	 (03CR) 10Muehlenhoff: "firejail logs to stdout/stderr, but the systemd file still needs to be updated to use StandardError/StandardOutput" [puppet] - 10https://gerrit.wikimedia.org/r/294499 (owner: 10Mobrovac)
[10:29:45] <wikibugs>	 06Operations, 10media-storage: swift backend machines load spike: cause and remediation - https://phabricator.wikimedia.org/T84385#926618 (10fgiunchedi) what's left here is xfs bug(s) sending load average through the roof, blocking with {T117972} to be checked again once we're running on linux 4.x
[10:31:49] <wikibugs>	 06Operations, 06Discovery, 06Maps, 03Maps-Sprint, 13Patch-For-Review: Configure monitoring / alerting of Postgresql / redis / ... cluster for maps - https://phabricator.wikimedia.org/T135647#2384904 (10Gehel)
[10:38:03] <grrrit-wm>	 (03PS1) 10Filippo Giunchedi: swift: enable statsd for all daemons [puppet] - 10https://gerrit.wikimedia.org/r/294691 
[10:40:25] <wikibugs>	 06Operations: investigate why swift container server takes so much cpu - https://phabricator.wikimedia.org/T82850#2384912 (10fgiunchedi)
[10:41:27] <wikibugs>	 06Operations: investigate why swift container server takes so much cpu - https://phabricator.wikimedia.org/T82850#906177 (10fgiunchedi) 05Open>03Invalid not sure there's anything we can do on this old bug, afaik we haven't experienced cpu problems with container server on the current swift fleet, resolving
[10:41:54] <icinga-wm>	 PROBLEM - changeprop endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[10:41:55] <wikibugs>	 06Operations, 10ops-codfw, 10media-storage: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T137439#2384919 (10fgiunchedi) 05Open>03Invalid duplicate of {T136630}
[10:43:11] <wikibugs>	 06Operations, 10hardware-requests: additional graphite machines request, 1x per DC - https://phabricator.wikimedia.org/T126253#2384926 (10fgiunchedi)
[10:43:13] <wikibugs>	 06Operations, 10ops-codfw, 13Patch-For-Review: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2384924 (10fgiunchedi) 05Open>03Resolved machine is in service, resolving
[10:43:31] <grrrit-wm>	 (03PS2) 10BBlack: force HTTPS when x-forwarded-for header is set [puppet] - 10https://gerrit.wikimedia.org/r/294653 (owner: 1020after4)
[10:44:50] <moritzm>	 !log depooling mw1154 for kernel update/reboot
[10:44:53] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:44:54] <grrrit-wm>	 (03CR) 10BBlack: [C: 032] force HTTPS when x-forwarded-for header is set [puppet] - 10https://gerrit.wikimedia.org/r/294653 (owner: 1020after4)
[10:51:34] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:54:46] <icinga-wm>	 RECOVERY - changeprop endpoints health on scb1001 is OK: All endpoints are healthy
[10:56:05] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[10:59:05] <icinga-wm>	 PROBLEM - puppet last run on db2067 is CRITICAL: CRITICAL: puppet fail
[11:01:28] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: "The reason is that apart from the hideous syntax, it also can mess up defaults across multiple scopes. https://docs.puppet.com/puppet/3.5/" [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[11:01:33] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] Fixed typo (cron instead of from). [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[11:06:11] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 031] "+1 from me. We should start looking at integrating systemd's security directives (http://0pointer.de/blog/projects/security.html) to compl" [puppet] - 10https://gerrit.wikimedia.org/r/294483 (https://phabricator.wikimedia.org/T121756) (owner: 10Muehlenhoff)
[11:06:15] <icinga-wm>	 PROBLEM - puppet last run on mw2175 is CRITICAL: CRITICAL: puppet fail
[11:17:25] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:18:37] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: salt: add wmfpuppet module [puppet] - 10https://gerrit.wikimedia.org/r/294694 
[11:19:54] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] salt: add wmfpuppet module [puppet] - 10https://gerrit.wikimedia.org/r/294694 (owner: 10Giuseppe Lavagetto)
[11:22:46] <icinga-wm>	 PROBLEM - changeprop endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[11:24:55] <icinga-wm>	 RECOVERY - puppet last run on db2067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:27:05] <icinga-wm>	 RECOVERY - changeprop endpoints health on scb1001 is OK: All endpoints are healthy
[11:34:15] <icinga-wm>	 RECOVERY - puppet last run on mw2175 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[11:37:52] <wikibugs>	 07Blocked-on-Operations, 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure: No replica for adywiki - https://phabricator.wikimedia.org/T135029#2286195 (10jcrespo)
[11:46:14] <grrrit-wm>	 (03CR) 10Gehel: "I'm not a big fan of 'create_resources()'. It breaks what little compile time checks there are in Puppet. Let's wait for Puppet 4 and "Per" [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[11:48:31] <grrrit-wm>	 (03PS3) 10Gehel: Fixed typo (cron instead of from). [puppet] - 10https://gerrit.wikimedia.org/r/294679 
[11:51:46] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:53:44] <Amir1>	 ores.wikimedia.org is down
[11:53:50] <Amir1>	 it seems it got overloaded
[11:53:57] <Amir1>	 https://ores.wikimedia.org/v2/scores/enwiki/?models=damaging&revids=724030089
[11:54:08] <Amir1>	 1- I haven't done anything
[11:54:12] <Amir1>	 2- let's fix it
[11:55:01] <Amir1>	 https://grafana.wikimedia.org/dashboard/db/ores
[11:55:12] <Amir1>	 it has been done for about 8 hours
[11:55:16] <Amir1>	 *down
[11:56:02] <Amir1>	 akosiaris: ^
[11:58:54] <Amir1>	 wikidatawiki and fawiki are depending on this, https://logstash.wikimedia.org/#/dashboard/elasticsearch/default
[12:03:06] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[12:06:18] <akosiaris>	 Amir1: wow, why on earth nothing alerted us of that ...
[12:06:35] <Amir1>	 :(((
[12:06:56] <Amir1>	 let's fix it and then fix the icinga
[12:07:38] <akosiaris>	 WARNING:ores.score_processors.celery -- Queue size is too full 229
[12:07:39] <akosiaris>	 ?
[12:07:46] <icinga-wm>	 RECOVERY - Apache HTTP on mw1137 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 3.253 second response time
[12:07:52] <akosiaris>	 all that precaching ?
[12:08:08] <Amir1>	 It can be 
[12:08:21] <Amir1>	 one of my friends was running some stats on it too
[12:08:43] <akosiaris>	 hmm, so icinga says ores is returning an OK
[12:08:55] <Amir1>	 he doesn't know about performance throttling :( I think it was a combination of them
[12:08:56] <akosiaris>	 but that is because it is only checking one url
[12:09:52] <Amir1>	 akosiaris: I think a restart would bring everything back up, probably, it will get huge requests from extensions and precaching 
[12:10:11] <Amir1>	 but overall it should handle that much
[12:10:25] <icinga-wm>	 RECOVERY - HHVM rendering on mw1137 is OK: HTTP OK: HTTP/1.1 200 OK - 66520 bytes in 0.170 second response time
[12:10:42] <moritzm>	 !log restarted hhvm on mw1137, got stuck
[12:10:46] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:11:36] <akosiaris>	 Amir1: https://phabricator.wikimedia.org/T137804 should fix the icinga issue
[12:11:43] <Amir1>	 probably we need to disable precaching for wikidata
[12:12:12] <akosiaris>	 Amir1: yes
[12:12:19] <akosiaris>	 at least that much is obvious from the logs
[12:12:48] <wikibugs>	 06Operations, 10ORES, 06Revision-Scoring-As-A-Service: ORES should advertise swagger specs under /?spec - https://phabricator.wikimedia.org/T137804#2385011 (10Ladsgroup)
[12:13:35] <akosiaris>	 https://phabricator.wikimedia.org/tag/revision-scoring-as-a-service/ is different from https://phabricator.wikimedia.org/tag/ores/ ?
[12:14:00] <akosiaris>	 well, duh, but I thought just adding ORES to the task would be enough
[12:14:17] <Amir1>	 revision... is an umbrella project for all of the products, the extension, wikilabels
[12:14:44] <akosiaris>	 ah my choice makes sense then
[12:14:47] <Amir1>	 we do it to keep track of what we did (we treat the board like the Trello board)
[12:16:12] <akosiaris>	 Amir1: definitely stop the wikidatawiki precaching 
[12:16:19] <akosiaris>	 lemme know when it's done
[12:16:38] <Amir1>	 doing it is rather easy
[12:16:39] <icinga-wm>	 ACKNOWLEDGEMENT - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors Gehel Probably related to a commit from gehel, checking right now
[12:17:55] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[12:18:47] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: "not sure I follow what you mean by "It breaks what little compile time checks there are in Puppet"" [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[12:18:56] <icinga-wm>	 RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:20:26] <grrrit-wm>	 (03CR) 10Gehel: "For example, if I have a typo in a param name, puppet will complain at compile time and tell me I'm trying to set a param that does not ex" [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[12:20:37] <Amir1>	 akosiaris: https://gerrit.wikimedia.org/r/#/c/294699/
[12:20:57] <Amir1>	 please review and then we try to deploy
[12:21:18] <Amir1>	 in the wmflabs setup it didn't hit the precaching issue
[12:23:33] <grrrit-wm>	 (03PS1) 10Gehel: Fix team name for icinga alert [puppet] - 10https://gerrit.wikimedia.org/r/294700 
[12:26:13] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] Fix team name for icinga alert [puppet] - 10https://gerrit.wikimedia.org/r/294700 (owner: 10Gehel)
[12:26:55] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[12:29:04] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: "I don't think that is true. For example" [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[12:30:26] <akosiaris>	 Amir1: I am having problems understanding that change
[12:30:44] <akosiaris>	 what does that do ? disable the ?precache=true parameter ?
[12:31:18] <akosiaris>	 or something else ? 
[12:31:36] <Amir1>	 no, it disables it in the daemon halfak is running
[12:31:47] <Amir1>	 but that needs to be restarted manually
[12:31:52] <Amir1>	 facepalm
[12:32:07] <akosiaris>	 so, there is a precaching daemon running somewhere
[12:32:12] <Amir1>	 !log manually restarted celery-ores-worker in scb1002
[12:32:16] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:32:20] <moritzm>	 !log installing apache2 trusty update on graphite1001
[12:32:24] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:32:25] <Amir1>	 akosiaris: yup 
[12:32:43] <akosiaris>	 and just shares the same config as ores
[12:32:49] <akosiaris>	 as the rest of ores, anyway
[12:33:10] <Amir1>	 akosiaris: yeah
[12:33:27] <Amir1>	 !log manually restarted celery-ores-worker in scb1001
[12:33:30] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:33:44] <Amir1>	 ores is back online for now: https://ores.wikimedia.org/v2/scores/enwiki/?models=damaging&revids=724030089
[12:34:24] <icinga-wm>	 RECOVERY - Check correctness of the icinga configuration on neon is OK: Icinga configuration is correct
[12:34:26] <mobrovac>	 Amir1: akosiaris: fyi, i've got puppet disabled on scb1001
[12:34:29] <grrrit-wm>	 (03CR) 10Gehel: "I stand corrected. I remember having that issue multiple times, but that might have been older Puppet version (or my memory going bad). So" [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[12:34:51] <Amir1>	 akosiaris: is there a way to refuse requests when too many are coming from an outside source?
[12:35:12] <Amir1>	 especially I'm talking about my friends who tried to run some stats in a very bad way
[12:37:08] <akosiaris>	 Amir1: yes, there exists a possibility for rate limiting
[12:38:20] <Amir1>	 are there docs on how to implement it, or can only Ops do it?
[12:38:45] <grrrit-wm>	 (03CR) 10Zfilipin: "What is the status of this patch? Are you working on it? Should it be abandoned?" [puppet] - 10https://gerrit.wikimedia.org/r/178810 (https://phabricator.wikimedia.org/T78342) (owner: 10Hashar)
[12:40:17] <akosiaris>	 Amir1: no, it exists on the varnish level and is based on token bucket filters. IIRC it was disabled due to some problems it created. I'll ping bblack to see if it makes sense to re-enable it
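Note on the token bucket filter mentioned above: a minimal sketch of the general idea, not the Varnish configuration discussed in the channel. Each client gets a bucket that refills at a fixed rate and holds a bounded burst; a request is refused when its bucket is empty. The names `TokenBucket`, `RateLimiter`, `rate`, `capacity` and the client keys are all hypothetical and chosen only for illustration.

```python
import time


class TokenBucket:
    """Bucket that refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """One bucket per client key (hypothetical: an IP or a user agent)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client):
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client] = bucket
        return bucket.allow()
```

A caller would check `limiter.allow(client)` before doing any work and answer with an error (for example HTTP 429) when it returns False. Keeping one bucket per client means a single heavy client runs out of tokens without affecting the others.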
[12:40:25] <grrrit-wm>	 (03CR) 10Zfilipin: "What is the status of this patch? Are you working on it? Should it be abandoned?" [puppet] - 10https://gerrit.wikimedia.org/r/276733 (owner: 10Hashar)
[12:43:01] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: "I 'll grant you that, but in practice, whenever you go for create_resources, you 've already put the data in a data structure (a hash). Th" [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[12:43:05] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[12:43:05] <icinga-wm>	 RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[12:45:03] <Amir1>	 jynus: if you're around. I finally got around the performance issue. https://gerrit.wikimedia.org/r/#/c/294693/2
[12:45:43] <Amir1>	 fortunately no need for schema changes for now (it would be great if we can have one later)
[12:45:51] <jynus>	 nice, did you test it on labs/production
[12:46:07] <jynus>	 e.g. running EXPLAIN on the resulting query
[12:46:09] <Amir1>	 yup, in labs
[12:46:20] <Amir1>	 in beta cluster
[12:46:22] <grrrit-wm>	 (03CR) 10Gehel: "Interesting conversation! I think that's another weak point of Puppet. The strict boundary between code and config is puppet code vs hiera" [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[12:46:26] <Amir1>	 got much faster
[12:46:38] <jynus>	 give it a try on production too, to be sure
[12:46:47] <Amir1>	 of course 
[12:46:59] <Amir1>	 I want to get it through in SWAT
[12:47:07] <jynus>	 query plans tend to be very different when there is a lot of data
[12:47:15] <jynus>	 I can do it if you give me the resulting query
[12:47:20] <jynus>	 just update the ticket
[12:48:08] <grrrit-wm>	 (03PS1) 10BBlack: r::c::ssl::unified: set explicit server name www.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/294703 (https://phabricator.wikimedia.org/T107236) 
[12:48:10] <grrrit-wm>	 (03PS1) 10BBlack: r::c::ssl: use 3127 for upstream_port [puppet] - 10https://gerrit.wikimedia.org/r/294704 (https://phabricator.wikimedia.org/T107236) 
[12:48:12] <grrrit-wm>	 (03PS1) 10BBlack: vhtcpd: use port 3127 for fe [puppet] - 10https://gerrit.wikimedia.org/r/294705 (https://phabricator.wikimedia.org/T107236) 
[12:48:14] <grrrit-wm>	 (03PS1) 10BBlack: tlsproxy: redirect-only service on 8080 [puppet] - 10https://gerrit.wikimedia.org/r/294706 (https://phabricator.wikimedia.org/T107236) 
[12:49:25] <jynus>	 apergos, I am a bit lost on T29112
[12:49:26] <stashbot>	 T29112: Select of revisions for stub history files does not explicitly order revisions - https://phabricator.wikimedia.org/T29112
[12:49:45] <jynus>	 too much info
[12:49:47] <apergos>	 jynus: there's a select without an order-by
[12:49:55] <Amir1>	 jynus: it would be great, just instead of LEFT JOIN on ores_classification, run INNER JOIN
[12:49:58] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 031] Add the ability to configure contact group for check of services. [puppet] - 10https://gerrit.wikimedia.org/r/294507 (https://phabricator.wikimedia.org/T137869) (owner: 10Gehel)
[12:50:04] <grrrit-wm>	 (03PS4) 10Gehel: Fixed typo (cron instead of from). [puppet] - 10https://gerrit.wikimedia.org/r/294679 
[12:50:13] <jynus>	 adding the order by creates a filesort?
[12:50:17] <apergos>	 this used to not be an issue because sort of by luck most entries came back in prev id order within pages
[12:50:21] <apergos>	 *rev id order
[12:50:23] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] r::c::ssl: use 3127 for upstream_port [puppet] - 10https://gerrit.wikimedia.org/r/294704 (https://phabricator.wikimedia.org/T107236) (owner: 10BBlack)
[12:50:25] <apergos>	 now that's not true at all
[12:50:38] <jynus>	 sometimes that can be avoided with the right index, but only sometimes
[12:50:41] <apergos>	 so I need to add that explicit ordering...
[12:50:51] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Fixed typo (cron instead of from). [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[12:51:08] <apergos>	 but if we do it on 500k pages that's liable to be waaaaay too many revs to do in memory
[12:51:19] <apergos>	 that could be some millions of revs
[12:51:25] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] tlsproxy: redirect-only service on 8080 [puppet] - 10https://gerrit.wikimedia.org/r/294706 (https://phabricator.wikimedia.org/T107236) (owner: 10BBlack)
[12:51:27] <jynus>	 Is "SELECT * FROM page INNER JOIN revision ON ((page_id=rev_page)) WHERE page_id >= 1157 AND page_id < 1158 ORDER BY page_id ASC, revision.rev_id ASC;" the canonical example?
[12:51:34] <jynus>	 so I can play?
[12:51:39] <apergos>	 no, it's not at all
[12:51:42] <apergos>	 and that's the problem
[12:51:46] <jynus>	 oh
[12:52:11] <jynus>	 is the one in the description the original one
[12:52:12] <jynus>	 ?
[12:52:31] <jynus>	 you can send me code also, if that is easier
[12:52:36] <apergos>	 page id ranges from say 1 to 5000 
[12:52:49] <grrrit-wm>	 (03PS5) 10Gehel: Fixed typo (cron instead of from). [puppet] - 10https://gerrit.wikimedia.org/r/294679 
[12:52:57] <apergos>	 but really
[12:53:00] <apergos>	 (that's the output)
[12:53:14] <apergos>	 for the stubs it ranges from page 1 to probably 500k for the first query
[12:53:18] <jynus>	 I just need somewhere where to start, and then I can follow with options
[12:53:24] <apergos>	 then 500001 to 100k for the next and so on
[12:53:29] <apergos>	 start with those
[12:53:32] <jynus>	 ok, that is a good idea
[12:53:34] <apergos>	 you'll see what I mean right away
[12:53:44] <jynus>	 probably the range + the orderby breaks the performance
[12:53:50] <jynus>	 which is the typical case
[12:54:06] <apergos>	 indeed
[12:54:07] <jynus>	 let me do some tests and I will go back with what I find
[12:54:08] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Fixed typo (cron instead of from). [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[12:54:15] <apergos>	 ok, thanks, if I can help please let me know
[12:54:53] <jynus>	 what I usually do is give the results, and send 1 or a couple of recommendations, and then you can take it from there
[12:55:24] <icinga-wm>	 PROBLEM - Start and verify pages via webservices on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 187 bytes in 10.966 second response time
[12:55:45] <jynus>	 one last question, I see you doing tests with elwiktionary and elwiki
[12:55:46] <grrrit-wm>	 (03PS6) 10Gehel: Fixed typo (cron instead of from). [puppet] - 10https://gerrit.wikimedia.org/r/294679 
[12:55:58] <jynus>	 but I suppose it will apply to all wikis, right?
[12:55:58] <wikibugs>	 06Operations, 10ORES, 06Revision-Scoring-As-A-Service: ORES should advertise swagger specs under /?spec - https://phabricator.wikimedia.org/T137804#2385088 (10Ladsgroup) p:05Triage>03Unbreak!
[12:57:47] <jynus>	 I see now springle's comment, which points me in the right direction
[12:57:51] <gehel>	 !log rebalancing shards on elasticsearch eqiad cluster
[12:57:55] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:01:59] <Amir1>	 jynus: please tell me once you tested it, so I can pursue this direction
[13:02:15] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:02:57] <jynus>	 Amir1, I will have a look when I have the time, I am with aperg*s issue right now
[13:03:09] <jynus>	 if it is on the ticket, it will not be forgotten
[13:03:24] <grrrit-wm>	 (03PS2) 10Gehel: Add the ability to configure contact group for check of services. [puppet] - 10https://gerrit.wikimedia.org/r/294507 (https://phabricator.wikimedia.org/T137869) 
[13:03:40] <Amir1>	 okay sure
[13:04:42] <mobrovac>	 !log scb1001 enabled puppet back
[13:04:47] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:10:26] <icinga-wm>	 PROBLEM - changeprop endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[13:10:40] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] Add the ability to configure contact group for check of services. [puppet] - 10https://gerrit.wikimedia.org/r/294507 (https://phabricator.wikimedia.org/T137869) (owner: 10Gehel)
[13:10:54] <icinga-wm>	 RECOVERY - Start and verify pages via webservices on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 9.624 second response time
[13:11:12] <jynus>	 apergos, one last question, what are some "acceptable" and "unacceptable" times per X rows, to compare with the result I get?
[13:11:38] <apergos>	 I don't know those numbers
[13:11:52] <apergos>	 if you do it without order by (as the code now is) that's "acceptable" I guess
[13:11:55] <jynus>	 (I do not need an exact time, just "X takes half an hour when before it took a few minutes")
[13:12:22] <apergos>	 what we have now in the dumps is that instead of getting 95% of revision content from the old dumps on disk we ask the db
[13:12:35] <apergos>	 that's a side effect of this missing explicit ordering
[13:12:46] <apergos>	 so somehow I need to add this explicit ordering on there without doing the server harm
[13:12:50] <apergos>	 that's where I need your help
[13:13:00] <jynus>	 ok, I will give you the number I get, and you can decide if that is ok
[13:13:06] <jynus>	 on the ticket
[13:13:09] <apergos>	 ok
[13:15:35] <grrrit-wm>	 (PS2) BBlack: r::c::ssl::unified: set explicit server name www.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/294703 (https://phabricator.wikimedia.org/T107236)
[13:15:37] <grrrit-wm>	 (PS2) BBlack: tlsproxy: redirect-only service on 8080 [puppet] - https://gerrit.wikimedia.org/r/294706 (https://phabricator.wikimedia.org/T107236)
[13:15:39] <grrrit-wm>	 (PS2) BBlack: r::c::ssl: use 3127 for upstream_port [puppet] - https://gerrit.wikimedia.org/r/294704 (https://phabricator.wikimedia.org/T107236)
[13:15:41] <grrrit-wm>	 (PS2) BBlack: vhtcpd: use port 3127 for fe [puppet] - https://gerrit.wikimedia.org/r/294705 (https://phabricator.wikimedia.org/T107236)
[13:16:24] <grrrit-wm>	 (PS2) Giuseppe Lavagetto: salt: add wmfpuppet module [puppet] - https://gerrit.wikimedia.org/r/294694
[13:16:55] <icinga-wm>	 RECOVERY - changeprop endpoints health on scb1001 is OK: All endpoints are healthy
[13:17:36] <grrrit-wm>	 (CR) BBlack: "It's a little bit stalled while we try to figure out the long-term stuff on how to integrate CI with VCL tests better, but probably should" [puppet] - https://gerrit.wikimedia.org/r/276733 (owner: Hashar)
[13:18:15] <icinga-wm>	 PROBLEM - puppet last run on ganeti2001 is CRITICAL: CRITICAL: puppet fail
[13:19:41] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] salt: add wmfpuppet module [puppet] - https://gerrit.wikimedia.org/r/294694 (owner: Giuseppe Lavagetto)
[13:23:10] <_joe_>	 I hate you pep8
[13:24:07] <gehel>	 _joe_: don't worry, pep8 hates you too
[13:26:05] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[13:27:31] <grrrit-wm>	 (PS3) Giuseppe Lavagetto: salt: add wmfpuppet module [puppet] - https://gerrit.wikimedia.org/r/294694
[13:34:30] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:39:03] <wikibugs>	 Operations, Discovery, Services, Maps-Sprint, Patch-For-Review: Allow configuration of contact groups for monitoring of services - https://phabricator.wikimedia.org/T137891#2385214 (Gehel) Open>Resolved
[13:43:54] <grrrit-wm>	 (CR) BBlack: [C: 1] "+1 for usefulness of enable/disable. Can we push this as an upstream patch to salt too?" [puppet] - https://gerrit.wikimedia.org/r/294694 (owner: Giuseppe Lavagetto)
[13:44:02] <icinga-wm>	 RECOVERY - puppet last run on ganeti2001 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[13:50:52] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 2 failures
[13:51:18] <grrrit-wm>	 (CR) BBlack: [C: 2] r::c::ssl::unified: set explicit server name www.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/294703 (https://phabricator.wikimedia.org/T107236) (owner: BBlack)
[13:51:45] <grrrit-wm>	 (PS1) Alex Monk: Simplify the VE RB URL config some more, now that we no longer use wgServerName [mediawiki-config] - https://gerrit.wikimedia.org/r/294713
[13:57:40] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:01:00] <icinga-wm>	 PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: puppet fail
[14:04:04] <wikibugs>	 Operations, ops-codfw, media-storage, Patch-For-Review: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2385267 (Papaul) @fgiunchedi Yes there are SSD
[14:07:50] <icinga-wm>	 PROBLEM - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors
[14:10:36] <grrrit-wm>	 (CR) JanZerebecki: "If my memory serves me right building on testing is fine in this case. We could upload the debs required for building from testing to jess" [debs/geckodriver] - https://gerrit.wikimedia.org/r/294293 (https://phabricator.wikimedia.org/T137797) (owner: Hashar)
[14:11:43] <bblack>	 Krenair: do you have a ref about the RB and wgServerName stuff somewhere? I don't get the "now that we no longer use" part, but sounds related to https://phabricator.wikimedia.org/T127370#2042629 ?
[14:13:01] <Krenair>	 https://gerrit.wikimedia.org/r/#/c/291349/
[14:16:41] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:17:15] <wikibugs>	 Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2385298 (fgiunchedi) thanks for the context @mmodell ! Since we're using the same package names as Debian we should ensur...
[14:23:50] <grrrit-wm>	 (PS1) Andrew Bogott: Add the instance tld (e.g. 'wmflabs') to designate and horizon config. [puppet] - https://gerrit.wikimedia.org/r/294716 (https://phabricator.wikimedia.org/T91990)
[14:25:11] <wikibugs>	 Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2385312 (mmodell) I'm not picky about the versioning and right now I can't think of anything that would be different depe...
[14:26:05] <grrrit-wm>	 (CR) BBlack: [C: 2] r::c::ssl: use 3127 for upstream_port [puppet] - https://gerrit.wikimedia.org/r/294704 (https://phabricator.wikimedia.org/T107236) (owner: BBlack)
[14:26:28] <grrrit-wm>	 (CR) BBlack: [C: 2] vhtcpd: use port 3127 for fe [puppet] - https://gerrit.wikimedia.org/r/294705 (https://phabricator.wikimedia.org/T107236) (owner: BBlack)
[14:26:51] <icinga-wm>	 RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[14:28:34] <grrrit-wm>	 (PS2) Andrew Bogott: Add the instance tld (e.g. 'wmflabs') to designate and horizon config. [puppet] - https://gerrit.wikimedia.org/r/294716 (https://phabricator.wikimedia.org/T91990)
[14:29:20] <wikibugs>	 Operations, OCG-General: ocg alarm ocg_job_status_queue 'flapping' - https://phabricator.wikimedia.org/T97524#2385375 (Aklapper) One year later: Still happening? Or obsolete / declined?
[14:29:59] <wikibugs>	 Operations, Traffic, Patch-For-Review: Switch port 80 to nginx on primary clusters - https://phabricator.wikimedia.org/T107236#2385376 (BBlack)
[14:31:24] <twentyafterfour>	 !log re-enabled and ran puppet agent --test on iridium. Everything appears to be normal.
[14:31:27] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:32:41] <grrrit-wm>	 (PS3) Andrew Bogott: Add the instance tld (e.g. 'wmflabs') to designate and horizon config. [puppet] - https://gerrit.wikimedia.org/r/294716 (https://phabricator.wikimedia.org/T91990)
[14:33:54] <wikibugs>	 Operations, Traffic, Wikimedia-Stream, HTTPS: stream.wikimedia.org doesn't redirect to HTTPS - https://phabricator.wikimedia.org/T137915#2385381 (BBlack) Note: based on simple test python and javascript clients, websocket client libraries tend to not support 301 redirects to HTTPS.  So we'll prob...
[14:34:32] <grrrit-wm>	 (CR) Andrew Bogott: [C: 2] Add the instance tld (e.g. 'wmflabs') to designate and horizon config. [puppet] - https://gerrit.wikimedia.org/r/294716 (https://phabricator.wikimedia.org/T91990) (owner: Andrew Bogott)
[14:35:42] <wikibugs>	 Operations, Mail, MediaWiki-Watchlist: Mails from MediaWiki seem to get (partially) lost - https://phabricator.wikimedia.org/T121105#2385389 (Aklapper) >>! In T121105#2046743, @Dzahn wrote: > Could i have another pair of eyes here please? I don't really see a pattern here  Wondering if it's another c...
[14:35:56] <grrrit-wm>	 (CR) Jforrester: [C: 1] Simplify the VE RB URL config some more, now that we no longer use wgServerName [mediawiki-config] - https://gerrit.wikimedia.org/r/294713 (owner: Alex Monk)
[14:36:28] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2244 is CRITICAL: Host mw2244 is not in mediawiki-installation dsh group
[14:36:28] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2245 is CRITICAL: Host mw2245 is not in mediawiki-installation dsh group
[14:36:28] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2241 is CRITICAL: Host mw2241 is not in mediawiki-installation dsh group
[14:36:28] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2242 is CRITICAL: Host mw2242 is not in mediawiki-installation dsh group
[14:36:59] <wikibugs>	 Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2385401 (mmodell) @fgiunchedi can you handle tagging a version according to your 0~git<date>-0wmf1 scheme then? Or should...
[14:37:18] <icinga-wm>	 PROBLEM - puppet last run on mw2241 is CRITICAL: CRITICAL: Puppet has 9 failures
[14:37:18] <icinga-wm>	 PROBLEM - puppet last run on mw2242 is CRITICAL: CRITICAL: Puppet has 9 failures
[14:39:28] <icinga-wm>	 PROBLEM - puppet last run on mw2244 is CRITICAL: CRITICAL: Puppet has 9 failures
[14:41:14] <grrrit-wm>	 (PS1) Andrew Bogott: Include eqiad/codfw in INSTANCE_TLD [puppet] - https://gerrit.wikimedia.org/r/294718 (https://phabricator.wikimedia.org/T91990)
[14:41:38] <icinga-wm>	 RECOVERY - Check correctness of the icinga configuration on neon is OK: Icinga configuration is correct
[14:42:43] <grrrit-wm>	 (CR) Andrew Bogott: [C: 2] Include eqiad/codfw in INSTANCE_TLD [puppet] - https://gerrit.wikimedia.org/r/294718 (https://phabricator.wikimedia.org/T91990) (owner: Andrew Bogott)
[14:43:18] <godog>	 twentyafterfour: yup I can change the version and tag, do you prefer a gerrit review or differential?
[14:43:48] <godog>	 twentyafterfour: I have a couple of changes to debian/control to review too
[14:43:49] <icinga-wm>	 PROBLEM - puppet last run on mw2245 is CRITICAL: CRITICAL: Puppet has 9 failures
[14:44:29] <icinga-wm>	 PROBLEM - Apache HTTP on mw2241 is CRITICAL: Connection refused
[14:47:41] <wikibugs>	 Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2385440 (BBlack) As discussed in email, now that we're past the first deadline date and we've been posting username lists on public wikis...
[14:47:41] <twentyafterfour>	 godog: either way, it's got an arcconfig but gerrit is fine if you prefer
[14:49:39] <wikibugs>	 Operations, Traffic, HTTPS, MW-1.27-release-notes, Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#2385447 (BBlack) Latest list of accounts still making insecure requests over the past ~24H: T136674#2385440
[14:49:54] <wikibugs>	 Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2385451 (hashar) For Zuul package I am using `2.1.0-151-g30a433b-wmf2precise1` where:  | 2.1.0 | Upstream tag | 151-g30a4...
[14:50:01] <wikibugs>	 Operations, Discovery, Maps, Maps-Sprint, Patch-For-Review: Review alerting scheme for Maps - https://phabricator.wikimedia.org/T137869#2385452 (Gehel) >>! In T137869#2382475, @Joe wrote: > We should add a service check for karthoterian using service_checker on the lvs IP, pretty much as we d...
[14:51:00] <icinga-wm>	 PROBLEM - Apache HTTP on mw2244 is CRITICAL: Connection refused
[14:51:44] <godog>	 twentyafterfour: ok! thanks, https://phabricator.wikimedia.org/D268
[14:52:31] <grrrit-wm>	 (PS1) Andrew Bogott: Hm, I don't know what designateconfig['dhcp_domain'] is but what I want here is $::site [puppet] - https://gerrit.wikimedia.org/r/294719
[14:55:19] <icinga-wm>	 PROBLEM - Apache HTTP on mw2242 is CRITICAL: Connection refused
[14:55:36] <grrrit-wm>	 (CR) Andrew Bogott: [C: 2] "This time I actually tested with the puppet compiler, and this now does what I want." [puppet] - https://gerrit.wikimedia.org/r/294719 (owner: Andrew Bogott)
[15:00:04] <jouncebot>	 anomie, ostriches, thcipriani, marktraceur, and aude: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160616T1500).
[15:00:04] <jouncebot>	 Amir1: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[15:00:21] <Amir1>	 o/
[15:01:19] <Amir1>	 "the time has come" lol
[15:02:28] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:02:48] <thcipriani>	 I can SWAT today
[15:03:10] <thcipriani>	 Amir1: I am reviewing changes now, give me a moment :)
[15:03:26] <Amir1>	 sure, you're awesome
[15:03:37] <icinga-wm>	 PROBLEM - Host mw2246 is DOWN: PING CRITICAL - Packet loss = 100%
[15:04:52] <logmsgbot>	 !log root@palladium conftool action : set/pooled=yes; selector: name=mw1262.eqiad.wmnet
[15:04:56] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:05:26] <wikibugs>	 Operations, Discovery, Maps, Patch-For-Review: "Is maps service alive?" check - https://phabricator.wikimedia.org/T137851#2385489 (Gehel) >>! In T137869#2382475, @Joe wrote: > We should add a service check for karthoterian using service_checker on the lvs IP, pretty much as we do for other servic...
[15:05:58] <icinga-wm>	 PROBLEM - DPKG on mw1291 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:06:30] <grrrit-wm>	 (CR) Eevans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/294669 (owner: Mobrovac)
[15:06:34] <_joe_>	 moritzm: ^^ that you?
[15:07:04] <moritzm>	 yeah, fix is currently building
[15:07:50] <_joe_>	 ok :P
[15:11:28] <icinga-wm>	 PROBLEM - Apache HTTP on mw2245 is CRITICAL: Connection refused
[15:15:34] <icinga-wm>	 PROBLEM - MariaDB disk space on labsdb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:17:08] <icinga-wm>	 RECOVERY - DPKG on mw1291 is OK: All packages OK
[15:20:05] <icinga-wm>	 PROBLEM - MariaDB disk space on labsdb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:20:37] <grrrit-wm>	 (PS1) Gehel: Team-interactive receives maps alerts [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869)
[15:20:47] <wikibugs>	 Operations: ffmpeg/libav on jessie video scalers - https://phabricator.wikimedia.org/T137886#2385556 (MoritzMuehlenhoff) Open>Resolved The following packages have been built for jessie-wikimedia and uploaded to apt.wikimedia.org: libtheora 1.2.0~git+20150816-1+wmf1  ffmpeg2theora 0.30-1+wmf1  chromap...
[15:21:04] <logmsgbot>	 !log thcipriani@tin Synchronized php-1.28.0-wmf.6/extensions/ORES: SWAT: [[gerrit:294711|Skip when an edit is errored in PopulateDatabase.php]] (duration: 00m 30s)
[15:21:07] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:21:08] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:21:25] <thcipriani>	 ^ Amir1 patch1 sync'd, check if possible please
[15:21:37] <Amir1>	 not possible :)
[15:21:45] <Amir1>	 maintenance script 
[15:22:16] <Amir1>	 we actually passed it through SWAT before but that was for wmf.5 
[15:22:35] <thcipriani>	 ack, that's what I figured :)
[15:23:44] <moritzm>	 !log rolling reboot of restbase1008 - restbase1011 for upgrade to Linux 4.4
[15:23:48] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:26:09] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[15:27:16] <logmsgbot>	 !log thcipriani@tin Synchronized php-1.28.0-wmf.6/extensions/ORES/includes/Hooks.php: SWAT: [[gerrit:294712|Performance boost on hidenondamaging]] (duration: 00m 35s)
[15:27:19] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:27:20] <thcipriani>	 ^ Amir1 check please
[15:27:24] <Amir1>	 sure
[15:28:05] <Amir1>	 request responded in 1.68 sec instead of 22
[15:28:12] <Amir1>	 jynus: ^
[15:29:20] <Amir1>	 (that is for the whole page, not the db query, which definitely took very little time)
[15:30:18] <Amir1>	 thcipriani: i.e. it's working like a charm
[15:30:19] <Amir1>	 thanks
[15:30:32] <thcipriani>	 Amir1: glad to hear, thanks for checking :)
[15:30:50] <jynus>	 Amir1, you may want to involve performance team for page loading tips
[15:31:15] <jynus>	 but it is ok for now
[15:31:33] <Amir1>	 yeah
[15:31:48] * Amir1 goes afk for dancing in WMDE office :D
[15:35:55] <icinga-wm>	 RECOVERY - MariaDB disk space on labsdb1003 is OK: DISK OK
[15:35:58] <wikibugs>	 Operations, Discovery, Maps, Maps-Sprint, Patch-For-Review: Review alerting scheme for Maps - https://phabricator.wikimedia.org/T137869#2385605 (Gehel) In term of production support, we seem to be good to go once https://gerrit.wikimedia.org/r/#/c/294723/ is merged. LVS will be paging.  We ca...
[15:37:22] <jynus>	 !log deleted sqldata.s6 from labsdb1008 - space issues caused by queries creating temporary tables
[15:37:25] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:42:15] <grrrit-wm>	 (PS1) Mobrovac: Change Prop: increase concurrency to 50 [puppet] - https://gerrit.wikimedia.org/r/294726
[15:45:28] <icinga-wm>	 RECOVERY - Host mw2246 is UP: PING OK - Packet loss = 0%, RTA = 37.18 ms
[15:45:58] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[15:47:10] <wikibugs>	 Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2385618 (mmodell) Unfortunately phabricator doesn't have any upstream version tags.
[15:49:54] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: -1] Team-interactive receives maps alerts (3 comments) [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[15:49:58] <icinga-wm>	 PROBLEM - configured eth on mw2246 is CRITICAL: Connection refused by host
[15:50:19] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw2246 is CRITICAL: Connection refused by host
[15:50:28] <icinga-wm>	 PROBLEM - dhclient process on mw2246 is CRITICAL: Connection refused by host
[15:50:38] <icinga-wm>	 PROBLEM - DPKG on mw2246 is CRITICAL: Connection refused by host
[15:50:57] <icinga-wm>	 PROBLEM - Disk space on mw2246 is CRITICAL: Connection refused by host
[15:51:17] <icinga-wm>	 PROBLEM - nutcracker port on mw2246 is CRITICAL: Connection refused by host
[15:51:18] <icinga-wm>	 PROBLEM - MD RAID on mw2246 is CRITICAL: Connection refused by host
[15:51:28] <icinga-wm>	 PROBLEM - nutcracker process on mw2246 is CRITICAL: Connection refused by host
[15:51:47] <icinga-wm>	 PROBLEM - puppet last run on mw2246 is CRITICAL: Connection refused by host
[15:51:59] <icinga-wm>	 PROBLEM - salt-minion processes on mw2246 is CRITICAL: Connection refused by host
[15:52:09] <grrrit-wm>	 (CR) Gehel: Team-interactive receives maps alerts (1 comment) [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[15:53:51] <grrrit-wm>	 (PS2) Gehel: Team-interactive receives maps alerts [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869)
[15:54:18] <icinga-wm>	 PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:59:05] <grrrit-wm>	 (PS2) Mobrovac: Change Prop: increase concurrency to 50 [puppet] - https://gerrit.wikimedia.org/r/294726 (https://phabricator.wikimedia.org/T137902)
[15:59:33] <bblack>	 pretty big uptick in text 500-errors just recently....
[16:00:02] <bblack>	 starts around 15:37 but doesn't hit its full stride until a few minutes ago
[16:00:04] <jouncebot>	 godog, moritzm, and _joe_: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160616T1600). Please do the needful.
[16:00:04] <jouncebot>	 tgr and mobrovac: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[16:00:24] <grrrit-wm>	 (PS2) RobH: adding user joewalsh to cluster access [puppet] - https://gerrit.wikimedia.org/r/294093 (https://phabricator.wikimedia.org/T137110)
[16:00:35] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Change Prop: increase concurrency to 50 [puppet] - https://gerrit.wikimedia.org/r/294726 (https://phabricator.wikimedia.org/T137902) (owner: Mobrovac)
[16:00:58] <mobrovac>	 i changed nothing in the patch, only the commit msg, how is that possible?
[16:01:36] <godog>	 jenkins likes to scold people
[16:01:37] <grrrit-wm>	 (CR) Mobrovac: "recheck" [puppet] - https://gerrit.wikimedia.org/r/294726 (https://phabricator.wikimedia.org/T137902) (owner: Mobrovac)
[16:01:53] <mobrovac>	 apparently
[16:01:53] <godog>	 _joe_ moritzm I can SWAT
[16:02:02] <bblack>	 a lot of the 500s are coming from RB apparently
[16:02:03] <moritzm>	 mobrovac: "Gem::RemoteFetcher::UnknownHostError: no such name (https://rubygems.org/gems/rspec-mocks-3.4.1.gem)"
[16:02:14] <moritzm>	 the joy of pulling unversioned stuff from the internet!
[16:02:25] <mobrovac>	 sigh
[16:02:40] <moritzm>	 hashar: seems jake-jessie also broke
[16:02:45] <moritzm>	 hashar: seems rake-jessie also broke
[16:02:55] <moritzm>	 godog: ok
[16:03:54] <mobrovac>	 bblack: https://grafana-admin.wikimedia.org/dashboard/db/restbase?panelId=17&fullscreen shows 7 reqs/sec of 5xx
[16:04:02] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2246 is CRITICAL: Host mw2246 is not in mediawiki-installation dsh group
[16:04:09] <mobrovac>	 hmm mobile-sections
[16:04:40] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: -1] Handle invalid DB name in 'sql' shell script (1 comment) [puppet] - https://gerrit.wikimedia.org/r/294496 (owner: Gergő Tisza)
[16:04:43] <godog>	 tgr: you're up, ^
[16:04:56] <tgr>	 o/
[16:05:03] <mobrovac>	 moritzm: are you restarting cassandra?
[16:05:21] <bblack>	 mobrovac: time pattern fits what I see on cache_text
[16:05:40] <_joe_>	 godog: thanks, my bandwidth is not getting any better
[16:06:36] <grrrit-wm>	 (PS2) Gergő Tisza: Handle invalid DB name in 'sql' shell script [puppet] - https://gerrit.wikimedia.org/r/294496
[16:06:44] <tgr>	 godog: ^
[16:06:45] <godog>	 _joe_: np, I was tempted to make a joke about wind the operator heh
[16:06:51] <mobrovac>	 moritzm: urandom: Error: Cannot achieve consistency level LOCAL_QUORUM
[16:07:12] <mobrovac>	 lots and lots of those in the logs
[16:07:16] <mobrovac>	 bblack: probably ^^^
[16:07:49] <grrrit-wm>	 (PS3) Filippo Giunchedi: Handle invalid DB name in 'sql' shell script [puppet] - https://gerrit.wikimedia.org/r/294496 (owner: Gergő Tisza)
[16:07:50] <paravoid>	 mobrovac: should I just merge the changeprop change?
[16:07:56] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 2 V: 2] Handle invalid DB name in 'sql' shell script [puppet] - https://gerrit.wikimedia.org/r/294496 (owner: Gergő Tisza)
[16:08:03] <mobrovac>	 paravoid: sure
[16:08:07] <grrrit-wm>	 (PS3) Faidon Liambotis: Change Prop: increase concurrency to 50 [puppet] - https://gerrit.wikimedia.org/r/294726 (https://phabricator.wikimedia.org/T137902) (owner: Mobrovac)
[16:08:20] <grrrit-wm>	 (CR) Faidon Liambotis: [C: 2 V: 2] Change Prop: increase concurrency to 50 [puppet] - https://gerrit.wikimedia.org/r/294726 (https://phabricator.wikimedia.org/T137902) (owner: Mobrovac)
[16:08:33] <paravoid>	 done
[16:08:37] <mobrovac>	 thnx!
[16:08:37] <godog>	 tgr: {{done}}
[16:08:40] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: -1] "2 minor comments, otherwise LGTM. Feel free to merge after fixing comments" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[16:08:44] <tgr>	 godog: thanks!
[16:10:01] <moritzm>	 mobrovac: indirectly, by means of the restbase1008-1011 reboots, but I have waited between individual reboots (and only one at a time)
[16:14:15] <godog>	 moritzm mobrovac urandom mhh only restbase1011-b reported down now though
[16:16:23] <wikibugs>	 Blocked-on-Operations, Operations, Monitoring, Services: Update restbase catchpoint metric - https://phabricator.wikimedia.org/T137181#2385714 (mark) a:mark
[16:16:44] <moritzm>	 yeah, all others are in UN
[16:17:40] <mobrovac>	 these are still ongoing (cannot achieve quorum)
[16:18:47] <mobrovac>	 godog: moritzm: urandom: somebody trying to revive it?
[16:18:51] <godog>	 odd, not from restbase1011's perspective
[16:19:15] <godog>	 yeah I'll try to drain cassandra instances on 1011
[16:19:29] <moritzm>	 1011 is still depooled BTW
[16:19:48] <mobrovac>	 rb1009-b 1014-a and 1014-b are DN
[16:19:59] <mobrovac>	 godog: ^
[16:20:12] <icinga-wm>	 PROBLEM - check_puppetrun on heka is CRITICAL: CRITICAL: Puppet has 1 failures
[16:20:21] <godog>	 mobrovac: from where?
[16:20:28] <mobrovac>	 only from 1011
[16:21:31] <mobrovac>	 ok godog rb1011 is definitely the problem, it sees these as down, but all others think everybody is UN
[16:21:52] <godog>	 mobrovac: yup, but looks like it has converged just now?!
[16:22:51] <mobrovac>	 hm interesting
[16:22:53] <mobrovac>	 indeed
[16:24:54] <godog>	 mobrovac: also looks like quorum messages from RB are not there anymore?
[16:25:12] <icinga-wm>	 RECOVERY - check_puppetrun on heka is OK: OK: Puppet is currently enabled, last run 96 seconds ago with 0 failures
[16:25:36] <grrrit-wm>	 (PS3) Gehel: Team-interactive receives maps alerts [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869)
[16:26:03] <mobrovac>	 godog: can't load logstash the last 5 mins, so no idea
[16:26:05] <grrrit-wm>	 (PS1) Giuseppe Lavagetto: scap: add new appservers [puppet] - https://gerrit.wikimedia.org/r/294735
[16:26:40] <mobrovac>	 godog: ok, loaded, it looks stabilised now
[16:26:54] <mobrovac>	 bblack: confirm the 5xx rate is down now?
[16:27:11] <grrrit-wm>	 (Abandoned) Gehel: Add interactive-team to default Icinga notification group for maps servers [puppet] - https://gerrit.wikimedia.org/r/294503 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[16:28:02] <bblack>	 mobrovac: seems to be so far
[16:28:16] <moritzm>	 mobrovac, godog: I'll repool 1011, then?
[16:28:25] <grrrit-wm>	 (PS1) Jcrespo: Set all new slaves to medium weight (300) after warm up [mediawiki-config] - https://gerrit.wikimedia.org/r/294736
[16:28:29] <mobrovac>	 moritzm: let's give it 5 mins to be sure
[16:28:40] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2] scap: add new appservers [puppet] - https://gerrit.wikimedia.org/r/294735 (owner: Giuseppe Lavagetto)
[16:29:21] <grrrit-wm>	 (CR) Jcrespo: [C: 2] Set all new slaves to medium weight (300) after warm up [mediawiki-config] - https://gerrit.wikimedia.org/r/294736 (owner: Jcrespo)
[16:30:04] <moritzm>	 ok
[16:30:26] <grrrit-wm>	 (PS3) RobH: adding user joewalsh to cluster access [puppet] - https://gerrit.wikimedia.org/r/294093 (https://phabricator.wikimedia.org/T137110)
[16:31:12] <jynus>	 _joe_, should I wait 1 minute for scap?
[16:31:38] <mobrovac>	 moritzm: kk, feel free to repool it now
[16:31:52] <icinga-wm>	 PROBLEM - NTP on mw2246 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:32:29] <grrrit-wm>	 (CR) RobH: [C: 2] "3 day wait has passed with no objections." [puppet] - https://gerrit.wikimedia.org/r/294093 (https://phabricator.wikimedia.org/T137110) (owner: RobH)
[16:32:37] <grrrit-wm>	 (PS1) Giuseppe Lavagetto: conftool: add new jessie api appservers [puppet] - https://gerrit.wikimedia.org/r/294737
[16:32:46] <godog>	 moritzm: also I'd say more time between reboots, when "total hints" hits zero should be safe to proceed with the next one, e.g. in https://grafana.wikimedia.org/dashboard/db/restbase-cassandra-storage
[16:33:05] <_joe_>	 jynus: yes please
[16:33:11] <_joe_>	 sorry I was preparing the other change
[16:33:16] <jynus>	 _joe_, ping me when done
[16:33:29] <mobrovac>	 godog: +1
[16:34:30] <jynus>	 we can test it with my change- if some fail it is not a huge deal
[16:34:48] <wikibugs>	 Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stat1003, stat1002 and bast1001 for joewalsh - https://phabricator.wikimedia.org/T137110#2385766 (RobH) stalled>Resolved a:RobH>None @JoeWalsh: Your access received no objections, so I've merged it live.  While it...
[16:34:58] <_joe_>	 jynus: green light
[16:35:07] <godog>	 I've updated Service_restarts on wikitech to point to the dashboards
[16:35:31] <jynus>	 ok, let's do this- a simple change, we are just adding 15 new application servers and 15 new databases :-)
[16:35:51] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[16:36:34] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures Filippo Giunchedi check is flapping, see also https://phabricator.wikimedia.org/T137952
[16:36:55] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Set all new slaves to medium weight (300) after warm up (duration: 00m 25s)
[16:36:58] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:37:04] <jynus>	 he, he, _joe_ : Could not resolve hostname nw2241.codfw.wmnet: Name or service not known
[16:37:39] <mobrovac>	 godog: i have one RB patch for puppetswat, i think we're safe now to go with it
[16:37:53] <grrrit-wm>	 (CR) Faidon Liambotis: "The kernel limits are per IP, so the number of connections LVS is handling isn't a (big) factor here. I'm assuming that for destunreach yo" [puppet] - https://gerrit.wikimedia.org/r/294467 (https://phabricator.wikimedia.org/T136939) (owner: Faidon Liambotis)
[16:37:58] <jynus>	 you fix it and pool while I check the dbs?
[16:38:01] <jynus>	 *pull
[16:38:19] <_joe_>	 jynus: yes, grrrr
[16:38:23] <_joe_>	 damn mac fonts
[16:38:59] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2245 is OK: OK
[16:38:59] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2242 is OK: OK
[16:39:00] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2244 is OK: OK
[16:39:07] <godog>	 mobrovac: ack, LGTM
[16:39:15] <grrrit-wm>	 (PS2) Filippo Giunchedi: RESTBase: Make sendind resource_change events optional [puppet] - https://gerrit.wikimedia.org/r/294669 (owner: Mobrovac)
[16:39:30] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 2 V: 2] RESTBase: Make sendind resource_change events optional [puppet] - https://gerrit.wikimedia.org/r/294669 (owner: Mobrovac)
[16:39:57] <mobrovac>	 godog: moritzm: hmm, the local_quorum problem seems to be back
[16:40:30] <grrrit-wm>	 (PS1) Giuseppe Lavagetto: scap: s/nw2241/mw2241/ [puppet] - https://gerrit.wikimedia.org/r/294738
[16:41:00] <grrrit-wm>	 (PS2) Giuseppe Lavagetto: scap: s/nw2241/mw2241/ [puppet] - https://gerrit.wikimedia.org/r/294738
[16:41:11] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2 V: 2] scap: s/nw2241/mw2241/ [puppet] - https://gerrit.wikimedia.org/r/294738 (owner: Giuseppe Lavagetto)
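The nw2241/mw2241 typo above is the kind of slip a simple pattern check on a host list could catch before a sync. A minimal Python sketch follows. The naming pattern is an assumption inferred from hostnames visible in this log, and `suspicious_hosts` is a hypothetical helper, not part of scap:

```python
import re

# Assumed naming scheme, inferred from hosts seen in this log
# (e.g. mw1097.eqiad.wmnet, mw2241.codfw.wmnet). Not an official rule.
HOST_RE = re.compile(r"mw[12]\d{3}\.(eqiad|codfw)\.wmnet")


def suspicious_hosts(hosts):
    """Return the entries that do not match the assumed mw host pattern."""
    return [h for h in hosts if not HOST_RE.fullmatch(h)]


if __name__ == "__main__":
    # The typo'd name from 16:37:04 is flagged; the corrected one passes.
    print(suspicious_hosts(["nw2241.codfw.wmnet", "mw2241.codfw.wmnet"]))
```

A check like this only catches names that break the pattern; a well-formed name for a host that does not exist would still need a DNS lookup.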
[16:41:24] <grrrit-wm>	 (PS1) EBernhardson: Dependent config for textcat AB test. [mediawiki-config] - https://gerrit.wikimedia.org/r/294739
[16:41:24] <mobrovac>	 godog: did you run puppet for rb perhaps?
[16:41:35] <grrrit-wm>	 (PS2) EBernhardson: search: Dependent config for textcat AB test. [mediawiki-config] - https://gerrit.wikimedia.org/r/294739
[16:42:00] <godog>	 mobrovac: no, but it shouldn't affect anything even if run, I think
[16:42:26] <mobrovac>	 godog: sure, sure, was asking to know whether i should do so :)
[16:42:39] <mobrovac>	 k, i'll run it
[16:42:56] <moritzm>	 haven't repooled 1011 yet, shall I withhold?
[16:42:56] <godog>	 mobrovac: ack, thanks, I'm looking at logstash btw but don't see the quorum messages so far
[16:43:21] <mobrovac>	 godog: there seems to have been just a burst of them @ :38
[16:43:31] <mobrovac>	 calmed down again
[16:43:43] <mobrovac>	 moritzm: i think you're good to go
[16:44:13] <_joe_>	 jynus: fixed :)
[16:44:19] <jynus>	 yay
[16:44:19] <_joe_>	 I am going off now
[16:45:14] <moritzm>	 k, repooled 1011
[16:45:17] <mobrovac>	 thnx
[16:50:45] <wikibugs>	 Operations, Discovery-Search-Sprint: Followup on elastic1026 blowing up May 9, 21:43-22:14 UTC - https://phabricator.wikimedia.org/T134829#2385821 (EBernhardson) Are we going to do anything else with this ticket? Should move it to done?
[16:53:00] <icinga-wm>	 PROBLEM - Apache HTTP on mw1136 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:53:39] <icinga-wm>	 PROBLEM - HHVM rendering on mw1136 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:54:11] <icinga-wm>	 PROBLEM - nutcracker process on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:54:19] <icinga-wm>	 PROBLEM - SSH on mw1136 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:54:30] <icinga-wm>	 PROBLEM - HHVM processes on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:54:30] <icinga-wm>	 PROBLEM - configured eth on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:55:20] <wikibugs>	 Blocked-on-Operations, Operations, Wikidata, Wikimedia-Language-setup, and 2 others: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2253774 (RobH) It sounds like a database script, and therefore falls to @jcrespo?  (I don't want to leave this sitting with no attention, so j...
[16:55:49] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:56:10] <godog>	 twentyafterfour: I'm failing repeatedly to 'arc land' the patch with an error about libext/Sprint submodule not found, any way you can land it too?
[16:56:33] <wikibugs>	 Operations, OCG-General: ocg alarm ocg_job_status_queue 'flapping' - https://phabricator.wikimedia.org/T97524#2385830 (cscott) The threshold is pretty arbitrary, it just warns us maybe to have a look and see if anything is obviously wrong.  We can bump the threshold higher if it seems that the warning is...
[16:56:36] <twentyafterfour>	 godog: sure thing
[16:57:12] <twentyafterfour>	 godog: btw: you can usually just merge and git push, phabricator figures it out
[16:58:09] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:59:21] <icinga-wm>	 PROBLEM - dhclient process on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:59:40] <icinga-wm>	 PROBLEM - nutcracker port on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:00:04] <jouncebot>	 yurik, gwicke, cscott, arlolra, and subbu: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160616T1700). Please do the needful.
[17:00:09] <icinga-wm>	 PROBLEM - DPKG on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:00:10] <icinga-wm>	 PROBLEM - salt-minion processes on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:00:17] <godog>	 twentyafterfour: oh ok, thanks! didn't know that
[17:00:22] <subbu>	 no parsoid deploy
[17:00:31] <icinga-wm>	 PROBLEM - Disk space on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:00:41] <wikibugs>	 Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2385855 (mmodell)
[17:07:08] <twentyafterfour>	 godog: https://phabricator.wikimedia.org/rPHDEP9101e9e9e520170215c9c2260f1ce0667773c5c1
[17:09:01] <icinga-wm>	 PROBLEM - Apache HTTP on mw1117 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50408 bytes in 3.134 second response time
[17:09:10] <icinga-wm>	 PROBLEM - HHVM rendering on mw1117 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.003 second response time
[17:09:10] <twentyafterfour>	 godog: autoclose didn't work but the patch is landed.
[17:09:23] <twentyafterfour>	 (I didn't have autoclose enabled on the debian branch)
[17:11:19] <icinga-wm>	 RECOVERY - Apache HTTP on mw1117 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.188 second response time
[17:11:29] <icinga-wm>	 RECOVERY - HHVM rendering on mw1117 is OK: HTTP OK: HTTP/1.1 200 OK - 67286 bytes in 0.300 second response time
[17:11:52] <grrrit-wm>	 (CR) Tjones: [C: 1] "Everything looks good!" [mediawiki-config] - https://gerrit.wikimedia.org/r/294739 (owner: EBernhardson)
[17:13:03] <wikibugs>	 Operations, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2385879 (BBlack)
[17:13:15] <wikibugs>	 Operations, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2385892 (BBlack) p:Triage>Normal
[17:13:26] <wikibugs>	 Operations, Performance-Team, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2385879 (BBlack)
[17:13:59] <wikibugs>	 Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2385894 (mmodell)
[17:15:31] <wikibugs>	 Operations, Performance-Team, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2385900 (BBlack)
[17:15:50] <icinga-wm>	 PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: Puppet has 8 failures
[17:16:28] <wikibugs>	 Operations, Performance-Team, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2385879 (BBlack)
[17:16:55] <wikibugs>	 Blocked-on-Operations, Operations, Monitoring, Services: Update restbase catchpoint metric - https://phabricator.wikimedia.org/T137181#2385904 (mark) I've added a copy of the old test for these changes, suffixed "UNCACHED". I'll leave the old cached (but now fixed up for article content) test in...
[17:17:12] <wikibugs>	 Blocked-on-Operations, Operations, Monitoring, Services: Update restbase catchpoint metric - https://phabricator.wikimedia.org/T137181#2385908 (mark) p:High>Normal
[17:17:19] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2247 is OK: OK
[17:18:21] <icinga-wm>	 RECOVERY - Disk space on mw1136 is OK: DISK OK
[17:18:40] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw1278 is OK: OK
[17:19:00] <icinga-wm>	 RECOVERY - nutcracker process on mw1136 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[17:19:00] <icinga-wm>	 RECOVERY - SSH on mw1136 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[17:19:20] <icinga-wm>	 RECOVERY - HHVM processes on mw1136 is OK: PROCS OK: 6 processes with command name hhvm
[17:19:21] <icinga-wm>	 RECOVERY - configured eth on mw1136 is OK: OK - interfaces up
[17:19:39] <icinga-wm>	 RECOVERY - dhclient process on mw1136 is OK: PROCS OK: 0 processes with command name dhclient
[17:19:50] <icinga-wm>	 RECOVERY - nutcracker port on mw1136 is OK: TCP OK - 0.000 second response time on port 11212
[17:20:19] <icinga-wm>	 RECOVERY - DPKG on mw1136 is OK: All packages OK
[17:20:20] <icinga-wm>	 RECOVERY - salt-minion processes on mw1136 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[17:21:20] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: puppet fail
[17:24:18] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] Introduce 'Backends' [software/tools-webservice] - https://gerrit.wikimedia.org/r/292028 (owner: Yuvipanda)
[17:24:41] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] Add LICENSE [software/tools-webservice] - https://gerrit.wikimedia.org/r/292056 (owner: Yuvipanda)
[17:25:21] <wikibugs>	 Operations, Release-Engineering-Team, Gitblit-Deprecate, Patch-For-Review: write Apache rewrite rules for  gitblit -> diffusion migration - https://phabricator.wikimedia.org/T137224#2363735 (greg) >>! In T137224#2382844, @mmodell wrote: >>>! In T137224#2381927, @Joe wrote: >> @20after4 do you thi...
[17:25:49] <icinga-wm>	 PROBLEM - nutcracker process on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:25:49] <icinga-wm>	 PROBLEM - SSH on mw1136 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:26:09] <icinga-wm>	 PROBLEM - HHVM processes on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:26:09] <icinga-wm>	 PROBLEM - configured eth on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:26:21] <icinga-wm>	 PROBLEM - dhclient process on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:26:21] <grrrit-wm>	 (PS30) Yuvipanda: Add a Kubernetes backend [software/tools-webservice] - https://gerrit.wikimedia.org/r/293063
[17:26:40] <icinga-wm>	 PROBLEM - nutcracker port on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:27:00] <icinga-wm>	 PROBLEM - DPKG on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:27:10] <icinga-wm>	 PROBLEM - salt-minion processes on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:27:26] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] Add a Kubernetes backend [software/tools-webservice] - https://gerrit.wikimedia.org/r/293063 (owner: Yuvipanda)
[17:27:29] <icinga-wm>	 PROBLEM - Disk space on mw1136 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:31:51] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[17:34:09] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1136 is OK: OK: nf_conntrack is 0 % full
[17:34:09] <icinga-wm>	 RECOVERY - Disk space on mw1136 is OK: DISK OK
[17:34:40] <icinga-wm>	 RECOVERY - nutcracker process on mw1136 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[17:34:40] <icinga-wm>	 RECOVERY - SSH on mw1136 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[17:34:59] <icinga-wm>	 RECOVERY - HHVM processes on mw1136 is OK: PROCS OK: 6 processes with command name hhvm
[17:35:00] <icinga-wm>	 RECOVERY - configured eth on mw1136 is OK: OK - interfaces up
[17:35:20] <icinga-wm>	 RECOVERY - dhclient process on mw1136 is OK: PROCS OK: 0 processes with command name dhclient
[17:35:31] <icinga-wm>	 RECOVERY - nutcracker port on mw1136 is OK: TCP OK - 0.000 second response time on port 11212
[17:35:40] <icinga-wm>	 RECOVERY - Apache HTTP on mw1136 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.528 second response time
[17:35:51] <icinga-wm>	 RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[17:35:59] <icinga-wm>	 RECOVERY - DPKG on mw1136 is OK: All packages OK
[17:36:00] <icinga-wm>	 RECOVERY - salt-minion processes on mw1136 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[17:36:11] <icinga-wm>	 RECOVERY - HHVM rendering on mw1136 is OK: HTTP OK: HTTP/1.1 200 OK - 67293 bytes in 0.443 second response time
[17:38:09] <icinga-wm>	 RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:40:01] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2241 is OK: OK
[17:42:49] <grrrit-wm>	 (PS1) Yuvipanda: Bump deb version [software/tools-webservice] - https://gerrit.wikimedia.org/r/294741
[17:44:13] <wikibugs>	 Blocked-on-Operations, Datasets-Archiving, Dumps-Generation, Flow, Collab-Team-2016-Apr-Jun-Q4: Publish recurring Flow dumps at http://dumps.wikimedia.org/ - https://phabricator.wikimedia.org/T119511#2386030 (Mattflaschen-WMF) Open>Resolved >>! In T119511#2384613, @Nemo_bis wrote: > h...
[17:47:40] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:52:50] <wikibugs>	 Blocked-on-Operations, Operations, Monitoring, Services: Update restbase catchpoint metric - https://phabricator.wikimedia.org/T137181#2386062 (GWicke) Thanks, @mark!
[17:53:02] <grrrit-wm>	 (PS1) Thcipriani: scap: make deployment aware of canary machines [puppet] - https://gerrit.wikimedia.org/r/294742 (https://phabricator.wikimedia.org/T110068)
[17:54:31] <wikibugs>	 Operations, Performance-Team, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2386066 (BBlack) Other interesting references:  https://datatracker.ietf.org/doc/draft-alakuijala-brotli/ (IETF standard, seems pretty far along in the approval process) https://blog.cloudfla...
[17:59:20] <grrrit-wm>	 (CR) Thcipriani: "I would like to add a target object in scap that uses etcd to get a list of targets; however, looking at what's currently available via co" [puppet] - https://gerrit.wikimedia.org/r/294742 (https://phabricator.wikimedia.org/T110068) (owner: Thcipriani)
[18:11:33] <wikibugs>	 Operations, Performance-Team, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2386165 (BBlack) p:Normal>Low A very quick check (just a couple of minutes on one cache_text machine) shows about 7% of requests indicate brotli support in Accept-Encoding.  Not big e...
[18:20:12] <wikibugs>	 Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2386183 (Whatamidoing-WMF) I've posted notes for the newest four.
[18:21:29] <wikibugs>	 Operations, Traffic, Wikipedia-Android-App-Backlog, iOS-app-Bugs: Zero: Investigate removing the limit on carrier tagging to m-dot and zero-dot requests - https://phabricator.wikimedia.org/T137990#2386185 (Mholloway)
[18:21:44] <wikibugs>	 Operations, Traffic, Wikipedia-Android-App-Backlog, Zero, iOS-app-Bugs: Zero: Investigate removing the limit on carrier tagging to m-dot and zero-dot requests - https://phabricator.wikimedia.org/T137990#2386188 (Mholloway)
[18:22:29] <mobrovac>	 !log change-prop deploying bc87a1fecfa
[18:22:33] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:22:56] * robh is running out for lunch (just mentioning it since he is on ops clinic duty)
[18:24:48] <grrrit-wm>	 (CR) Urbanecm: [C: 1] "Looks good for me." [mediawiki-config] - https://gerrit.wikimedia.org/r/294652 (https://phabricator.wikimedia.org/T137888) (owner: Luke081515)
[18:26:20] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[18:35:38] <wikibugs>	 Operations, Performance-Team, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2385879 (ori) >>! In T137979#2386165, @BBlack wrote: > A very quick check (just a couple of minutes on one cache_text machine) shows about 7% of requests indicate brotli support in Accept-Enc...
[18:37:02] <tgr>	 !log running invalidateUserSessions.php for T137799
[18:37:06] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:45:09] <icinga-wm>	 PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: puppet fail
[18:49:42] <wikibugs>	 Operations, Performance-Team, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2385879 (Krinkle) >>! In T137979#2386210, @ori wrote: >>>! In T137979#2386165, @BBlack wrote: >> A very quick check (just a couple of minutes on one cache_text machine) shows about 7% of requ...
[18:50:09] <icinga-wm>	 PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: puppet fail
[18:51:39] <wikibugs>	 Operations, Performance-Team, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2386232 (BBlack) Yeah ori's right, I didn't filter properly.  Interesting!
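The measurement discussed in T137979 (what share of requests advertise brotli in Accept-Encoding) could be sketched roughly as follows. This is an illustrative Python sketch, not the command that was run on the cache host. The function names and sample headers are made up, and the filtering step that the thread says was initially missed is not modelled here.

```python
def accepts_brotli(accept_encoding):
    """Return True if the header lists the 'br' coding with a non-zero q-value."""
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "br":
            continue
        q = 1.0  # a coding listed without a q parameter counts as fully acceptable
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False


def brotli_share(headers):
    """Fraction of sampled Accept-Encoding headers that advertise brotli."""
    if not headers:
        return 0.0
    return sum(accepts_brotli(h) for h in headers) / len(headers)


if __name__ == "__main__":
    # Made-up sample; real input would come from request logs.
    sample = ["gzip, deflate, br", "gzip, deflate", "gzip", "br;q=0"]
    print(brotli_share(sample))
```

Which requests go into the sample matters as much as the header parsing, which is the point ori raises in the thread.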
[18:55:09] <icinga-wm>	 PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: puppet fail
[19:00:04] <jouncebot>	 hashar: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160616T1900).
[19:00:09] <icinga-wm>	 RECOVERY - check_puppetrun on db1025 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[19:00:29] <hashar>	 jouncebot: o/
[19:01:01] <grrrit-wm>	 (PS1) Papaul: DHCP: Add mw2243 MAC address Bug:T135466 [puppet] - https://gerrit.wikimedia.org/r/294745 (https://phabricator.wikimedia.org/T135466)
[19:02:29] <grrrit-wm>	 (PS1) Hashar: all wikis to 1.28.0-wmf.6 [mediawiki-config] - https://gerrit.wikimedia.org/r/294746
[19:02:40] <hashar>	 896 migrated  !
[19:03:02] <grrrit-wm>	 (PS2) Hashar: all wikis to 1.28.0-wmf.6 [mediawiki-config] - https://gerrit.wikimedia.org/r/294746 (https://phabricator.wikimedia.org/T136971)
[19:04:37] <grrrit-wm>	 (CR) Hashar: [C: 2] all wikis to 1.28.0-wmf.6 [mediawiki-config] - https://gerrit.wikimedia.org/r/294746 (https://phabricator.wikimedia.org/T136971) (owner: Hashar)
[19:05:13] <grrrit-wm>	 (Merged) jenkins-bot: all wikis to 1.28.0-wmf.6 [mediawiki-config] - https://gerrit.wikimedia.org/r/294746 (https://phabricator.wikimedia.org/T136971) (owner: Hashar)
[19:05:32] <logmsgbot>	 !log hashar@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.28.0-wmf.6
[19:05:36] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:05:42] <hashar>	 doh
[19:05:45] <hashar>	 doesn't sound right
[19:06:19] <icinga-wm>	 PROBLEM - changeprop endpoints health on scb1002 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.16.21, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[19:07:28] <hashar>	 Jun 16 19:06:44 mw1138:  #012Warning: parseAndStash() expects exactly 4 parameters, 3 given in /srv/mediawiki/php-1.28.0-wmf.6/includes/api/ApiStashEdit.php on line 182
[19:07:28] <hashar>	 Jun 16 19:06:45 mw1138:  #012Notice: Undefined variable: summary in /srv/mediawiki/php-1.28.0-wmf.6/includes/api/ApiStashEdit.php on line 157
[19:07:54] <MatmaRex>	 hashar: AaronSchulz / ori
[19:08:14] * aude_ waves
[19:08:30] <icinga-wm>	 RECOVERY - changeprop endpoints health on scb1002 is OK: All endpoints are healthy
[19:08:37] <hashar>	 I am wondering whether that has an impact on actual edits
[19:08:50] <icinga-wm>	 PROBLEM - puppet last run on mw2154 is CRITICAL: CRITICAL: puppet fail
[19:10:13] <hashar>	 at least https://grafana.wikimedia.org/dashboard/db/edit-count  does not show any drop
[19:11:04] <aude>	 edit summaries seem to still work
[19:11:21] <MatmaRex>	 hmm, possibly just stashing
[19:11:28] <aude>	 though sure i could be missing something
[19:11:37] <aude>	 doesn't stashing always happen
[19:11:39] <aude>	 ?
[19:11:42] <MatmaRex>	 ori or aaron will probably soon notice that their stash rate metrics just dropped by 100% :P
[19:11:46] <aude>	 with wikitext editing?
[19:13:06] <AaronSchulz>	 MatmaRex: meeting now, I can look soon
[19:14:44] <hashar>	 filed as https://phabricator.wikimedia.org/T137995
[19:17:25] <grrrit-wm>	 (CR) JanZerebecki: "I did a quick check of how cargo verifies downloads: https://phabricator.wikimedia.org/T137996 please reopen if more is needed." [debs/geckodriver] - https://gerrit.wikimedia.org/r/294293 (https://phabricator.wikimedia.org/T137797) (owner: Hashar)
[19:21:49] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 4 failures
[19:25:39] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[19:35:54] <icinga-wm>	 RECOVERY - puppet last run on mw2154 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:47:33] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:48:38] <wikibugs>	 Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2386535 (mmodell)
[19:51:06] <wikibugs>	 Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2386536 (mmodell)
[19:51:24] <wikibugs>	 Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2378301 (mmodell)
[19:52:18] <wikibugs>	 Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2378301 (mmodell)
[19:53:33] <icinga-wm>	 PROBLEM - Start and verify pages via webservices on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 187 bytes in 10.840 second response time
[19:53:56] <grrrit-wm>	 (PS2) Yuvipanda: Bump deb version [software/tools-webservice] - https://gerrit.wikimedia.org/r/294741
[19:55:44] <icinga-wm>	 RECOVERY - Start and verify pages via webservices on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 14.663 second response time
[19:55:53] <grrrit-wm>	 (CR) Gehel: Team-interactive receives maps alerts (1 comment) [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[19:56:10] <grrrit-wm>	 (PS4) Gehel: Team-interactive receives maps alerts [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869)
[19:57:31] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Team-interactive receives maps alerts [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[19:58:08] <grrrit-wm>	 (CR) Gehel: "recheck" [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[20:03:10] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[20:10:52] <grrrit-wm>	 (PS3) EBernhardson: search: Dependent config for textcat AB test. [mediawiki-config] - https://gerrit.wikimedia.org/r/294739
[20:10:57] <grrrit-wm>	 (CR) Hashar: "recheck" [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[20:19:33] <grrrit-wm>	 (CR) EBernhardson: "minor quibble, but if adding a parameters documentation section might as well document the existing parameter as well." [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[20:20:32] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[20:22:53] <grrrit-wm>	 (PS5) Gehel: Team-interactive receives maps alerts [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869)
[20:23:43] <grrrit-wm>	 (CR) Gehel: "@EBernhardson: documentation added, but I'm not really sure of what that does. I'll ask @yurik to check it..." [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[20:24:05] <grrrit-wm>	 (PS1) JanZerebecki: Add gitblit compatibility apache vhost to phabricator [puppet] - https://gerrit.wikimedia.org/r/294784 (https://phabricator.wikimedia.org/T137224)
[20:25:00] <grrrit-wm>	 (CR) Paladox: [C: 1] ":)" [puppet] - https://gerrit.wikimedia.org/r/294784 (https://phabricator.wikimedia.org/T137224) (owner: JanZerebecki)
[20:25:21] <grrrit-wm>	 (PS6) Gehel: Team-interactive receives maps alerts [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869)
[20:28:01] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[20:29:36] <grrrit-wm>	 (PS7) Gehel: Team-interactive receives maps alerts [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869)
[20:31:52] <wikibugs>	 Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2343854 (Danmichaelo) Fixed CatWatchBot and hopefully the remaining tasks for DanmicholoBot
[20:34:45] <grrrit-wm>	 (PS8) Gehel: Team-interactive receives maps alerts [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869)
[20:37:03] <grrrit-wm>	 (CR) Yurik: [C: 1] Team-interactive receives maps alerts [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[20:38:36] <grrrit-wm>	 (CR) Gehel: [C: 2] Team-interactive receives maps alerts [puppet] - https://gerrit.wikimedia.org/r/294723 (https://phabricator.wikimedia.org/T137869) (owner: Gehel)
[20:43:50] <grrrit-wm>	 (PS1) Yuvipanda: labspuppetbackend: Make sure to propogate errors to uwsgi log [puppet] - https://gerrit.wikimedia.org/r/294834
[20:43:53] <wikibugs>	 Operations, Discovery, Maps, Epic: Epic: switch Maps to production status - https://phabricator.wikimedia.org/T133744#2386810 (Gehel)
[20:43:55] <wikibugs>	 Operations, Discovery, Maps, Maps-Sprint, Patch-For-Review: Review alerting scheme for Maps - https://phabricator.wikimedia.org/T137869#2386809 (Gehel) Open>Resolved
[20:44:45] <wikibugs>	 Operations, Discovery, Maps, Patch-For-Review: "Is maps service alive?" check - https://phabricator.wikimedia.org/T137851#2386811 (Gehel) Check implemented, also alerting team-interactive.
[20:44:54] <wikibugs>	 Operations, Discovery, Maps, Epic: Epic: switch Maps to production status - https://phabricator.wikimedia.org/T133744#2241419 (Gehel)
[20:44:56] <wikibugs>	 Operations, Discovery, Maps, Patch-For-Review: "Is maps service alive?" check - https://phabricator.wikimedia.org/T137851#2386812 (Gehel) Open>Resolved
[20:45:09] <grrrit-wm>	 (PS2) JanZerebecki: Add gitblit compatibility apache vhost to phabricator [puppet] - https://gerrit.wikimedia.org/r/294784 (https://phabricator.wikimedia.org/T137224)
[20:45:15] <yuvipanda>	 andrewbogott (IRC): if you merge ^ patch you can re-enable puppet on labstestcontrol2001
[20:45:56] <grrrit-wm>	 (PS2) Andrew Bogott: labspuppetbackend: Make sure to propogate errors to uwsgi log [puppet] - https://gerrit.wikimedia.org/r/294834 (owner: Yuvipanda)
[20:46:40] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:49:44] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] labspuppetbackend: Make sure to propogate errors to uwsgi log [puppet] - https://gerrit.wikimedia.org/r/294834 (owner: Yuvipanda)
[20:50:22] <wikibugs>	 Operations, Discovery, Maps: Improve automation around Maps servers - https://phabricator.wikimedia.org/T138017#2386884 (Gehel)
[20:51:19] <wikibugs>	 Operations, Discovery, Maps: Ensure that maps server can be automatically installed (fully puppetized) - https://phabricator.wikimedia.org/T135750#2386903 (Gehel) We are good enough at the moment. Some notes about things we still need to improve are in T138017.
[20:51:21] <grrrit-wm>	 (PS3) Yuvipanda: labspuppetbackend: Make sure to propogate errors to uwsgi log [puppet] - https://gerrit.wikimedia.org/r/294834
[20:51:25] <wikibugs>	 Operations, Discovery, Maps: Ensure that maps server can be automatically installed (fully puppetized) - https://phabricator.wikimedia.org/T135750#2386905 (Gehel) Open>Resolved
[20:51:26] <grrrit-wm>	 (PS4) Andrew Bogott: labspuppetbackend: Make sure to propogate errors to uwsgi log [puppet] - https://gerrit.wikimedia.org/r/294834 (owner: Yuvipanda)
[20:51:27] <wikibugs>	 Operations, Discovery, Maps, Epic: Epic: switch Maps to production status - https://phabricator.wikimedia.org/T133744#2386906 (Gehel)
[20:53:34] <grrrit-wm>	 (CR) Andrew Bogott: [C: 2] labspuppetbackend: Make sure to propogate errors to uwsgi log [puppet] - https://gerrit.wikimedia.org/r/294834 (owner: Yuvipanda)
[20:53:51] <wikibugs>	 Operations, Android-app-feature-Feeds, Mobile-Content-Service, RESTBase, and 2 others: Investigate Android app API request latency regression - https://phabricator.wikimedia.org/T138010#2386912 (GWicke)
[20:59:21] <grrrit-wm>	 (PS2) Gehel: maps caches: remove referrer checks [puppet] - https://gerrit.wikimedia.org/r/294390 (https://phabricator.wikimedia.org/T137848) (owner: MaxSem)
[21:01:03] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[21:01:52] <grrrit-wm>	 (CR) Gehel: [C: 2] "Alerting is good, so are all blockers to https://phabricator.wikimedia.org/T133744" [puppet] - https://gerrit.wikimedia.org/r/294390 (https://phabricator.wikimedia.org/T137848) (owner: MaxSem)
[21:03:42] <grrrit-wm>	 (PS3) Yuvipanda: Bump deb version [software/tools-webservice] - https://gerrit.wikimedia.org/r/294741
[21:03:44] <grrrit-wm>	 (PS1) Yuvipanda: Add appropriate dependencies to package [software/tools-webservice] - https://gerrit.wikimedia.org/r/294837
[21:15:28] <logmsgbot>	 !log hashar@tin Synchronized php-1.28.0-wmf.6/extensions/VisualEditor/ApiVisualEditor.php: Pass empty summary to parseAndStash() to avoid warnings  T137995 (duration: 00m 39s)
[21:15:29] <stashbot>	 T137995: ApiStashEdit warning and notices - https://phabricator.wikimedia.org/T137995
[21:15:32] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:16:30] <hashar>	 (train is done)
[21:16:52] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[21:18:07] <wikibugs>	 Operations, Android-app-feature-Feeds, Mobile-Content-Service, RESTBase, and 2 others: Investigate Android app API request latency regression - https://phabricator.wikimedia.org/T138010#2387040 (GWicke) Another possibility is that there are issues with the eventlogging instrumentation. The number...
[21:20:37] <wikibugs>	 Operations, Discovery, Kartotherian, Maps, and 3 others: Remove referrer check from varnish for maps cluster - https://phabricator.wikimedia.org/T137848#2387056 (Gehel) Open>Resolved
[21:27:33] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:33:32] <grrrit-wm>	 (PS2) Jhobs: Prepare Wikidata descriptions on mobile for production rollout [mediawiki-config] - https://gerrit.wikimedia.org/r/293883 (https://phabricator.wikimedia.org/T127250)
[21:36:36] <grrrit-wm>	 (CR) Bmansurov: [C: 1] Prepare Wikidata descriptions on mobile for production rollout [mediawiki-config] - https://gerrit.wikimedia.org/r/293883 (https://phabricator.wikimedia.org/T127250) (owner: Jhobs)
[21:43:18] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] Add appropriate dependencies to package [software/tools-webservice] - https://gerrit.wikimedia.org/r/294837 (owner: Yuvipanda)
[21:46:18] <wikibugs>	 Operations, Android-app-feature-Feeds, Mobile-Content-Service, RESTBase, and 2 others: Investigate Android app API request latency regression - https://phabricator.wikimedia.org/T138010#2387115 (Mholloway) Sure, one or both of us can look into this.
[21:46:21] <grrrit-wm>	 (PS4) Yuvipanda: Bump deb version [software/tools-webservice] - https://gerrit.wikimedia.org/r/294741
[21:46:24] <grrrit-wm>	 (PS1) Yuvipanda: Exit when given unsupported parameters [software/tools-webservice] - https://gerrit.wikimedia.org/r/294843
[21:46:26] <grrrit-wm>	 (PS1) Yuvipanda: Set explicit default for args.type [software/tools-webservice] - https://gerrit.wikimedia.org/r/294844
[22:00:04] <jouncebot>	 yurik and maxsem: Dear anthropoid, the time has come. Please deploy Enable Maps Wikidata & Commons (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160616T2200).
[22:04:48] <wikibugs>	 Operations, DBA, Labs, Tool-Labs, Traffic: Antigng-bot improper non-api http requests - https://phabricator.wikimedia.org/T137707#2376097 (Legoktm) >>! In T137707#2382924, @Antigng_ wrote: > Lack of hard and fast limit on read requests can be a problem, since your definition of request limit...
[22:05:17] <wikibugs>	 Operations, Discovery-Search-Sprint: Enable GC (garbage collection) logs on Elasticsearch JVM - https://phabricator.wikimedia.org/T134853#2387206 (debt)
[22:05:41] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:13:55] <grrrit-wm>	 (PS1) MaxSem: Enable Kartographer on Commons and Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/294854 (https://phabricator.wikimedia.org/T138029)
[22:16:31] <grrrit-wm>	 (CR) Yurik: [C: 1] Enable Kartographer on Commons and Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/294854 (https://phabricator.wikimedia.org/T138029) (owner: MaxSem)
[22:18:44] <yurik>	 MaxSem, there are two patches i cherrypicked
[22:19:22] <yurik>	 https://gerrit.wikimedia.org/r/#/c/294856/
[22:19:25] <yurik>	 https://gerrit.wikimedia.org/r/#/c/294855/
[22:20:51] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:21:27] <yurik>	 MaxSem, ^^
[22:21:50] <wikibugs>	 Operations, Parsoid, Services: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176#2387274 (ssastry) Addendum to my earlier performance numbers: On a bunch of pages, looks like DOM post processing is about 2x faster on v4.3 vs v0.10 on my laptop.
[22:22:45] <grrrit-wm>	 (CR) MaxSem: [C: 2] Enable Kartographer on Commons and Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/294854 (https://phabricator.wikimedia.org/T138029) (owner: MaxSem)
[22:22:51] * yurik hides
[22:23:18] <grrrit-wm>	 (Merged) jenkins-bot: Enable Kartographer on Commons and Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/294854 (https://phabricator.wikimedia.org/T138029) (owner: MaxSem)
[22:24:13] <grrrit-wm>	 (CR) Niedzielski: "@hashar, I was afraid I had said something!" [puppet] - https://gerrit.wikimedia.org/r/264303 (owner: Niedzielski)
[22:24:27] <logmsgbot>	 !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/294854/ (duration: 00m 26s)
[22:24:31] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:27:01] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:32:57] <yurik>	 MaxSem, maps show up on both, all's good
[22:33:37] <logmsgbot>	 !log maxsem@tin Synchronized php-1.28.0-wmf.6/extensions/Kartographer: https://gerrit.wikimedia.org/r/294856 https://gerrit.wikimedia.org/r/294855 (duration: 00m 30s)
[22:33:40] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:46:51] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:59:01] <wikibugs>	 Operations, Collaboration-Team-Interested, Flow, WorkType-Maintenance: Setup separate logical External Store for Flow - https://phabricator.wikimedia.org/T107610#1499219 (Mattflaschen-WMF)
[23:00:04] <jouncebot>	 RoanKattouw, ostriches, Krenair, MaxSem, awight, and Dereckson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160616T2300).
[23:00:04] <jouncebot>	 dr0ptp4kt, Luke081515, and EBernhardson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:16] * Luke081515 is here
[23:00:35] * dr0ptp4kt here
[23:02:04] <icinga-wm>	 PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:02:44] <ebernhardson>	 here
[23:03:36] <Luke081515>	 who will SWAT?
[23:04:01] <ebernhardson>	 well, since no one is jumping i suppose i can do it
[23:04:41] <ebernhardson>	 first in the list, dr0ptp4kt 
[23:04:51] <Dereckson>	 Hello.
[23:04:52] <ebernhardson>	 labs only change, seems safe enough
[23:04:53] <dr0ptp4kt>	 wooooooo! i'm going to disneyland!
[23:05:01] <grrrit-wm>	 (PS3) EBernhardson: Prepare Wikidata descriptions on mobile for production rollout [mediawiki-config] - https://gerrit.wikimedia.org/r/293883 (https://phabricator.wikimedia.org/T127250) (owner: Jhobs)
[23:05:08] <MaxSem>	 dr0ptp4kt, beware of gators
[23:05:15] <grrrit-wm>	 (CR) EBernhardson: [C: 2] Prepare Wikidata descriptions on mobile for production rollout [mediawiki-config] - https://gerrit.wikimedia.org/r/293883 (https://phabricator.wikimedia.org/T127250) (owner: Jhobs)
[23:05:19] <Dereckson>	 Thanks ebernhardson for taking care of this SWAT :)
[23:05:22] <ebernhardson>	 Dereckson: np
[23:05:35] <dr0ptp4kt>	 MaxSem!
[23:05:45] <grrrit-wm>	 (PS2) EBernhardson: Two permission changes at urwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/294652 (https://phabricator.wikimedia.org/T137888) (owner: Luke081515)
[23:05:56] <grrrit-wm>	 (Merged) jenkins-bot: Prepare Wikidata descriptions on mobile for production rollout [mediawiki-config] - https://gerrit.wikimedia.org/r/293883 (https://phabricator.wikimedia.org/T127250) (owner: Jhobs)
[23:07:40] <logmsgbot>	 !log ebernhardson@tin Synchronized wmf-config/InitialiseSettings-labs.php: T127250: Prepare Wikidata descriptions on mobile for production rollout (duration: 00m 27s)
[23:07:41] <stashbot>	 T127250: Prepare Wikidata descriptions to roll out to stable - https://phabricator.wikimedia.org/T127250
[23:07:41] <grrrit-wm>	 (CR) EBernhardson: [C: 2] "patch matches ticket. SWAT'ing" [mediawiki-config] - https://gerrit.wikimedia.org/r/294652 (https://phabricator.wikimedia.org/T137888) (owner: Luke081515)
[23:07:44] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:08:49] <wikibugs>	 Operations, Collaboration-Team-Interested, DBA, Flow, WorkType-Maintenance: Setup separate logical External Store for Flow - https://phabricator.wikimedia.org/T107610#2387618 (Mattflaschen-WMF) a:jcrespo @jcrespo, I think this is the next concrete step ({T119568} will get QA-ed, but that'...
[23:08:51] <ebernhardson>	 dr0ptp4kt: your patch is synced
[23:09:00] <dr0ptp4kt>	 ebernhardson: thx, will check
[23:09:04] <grrrit-wm>	 (PS3) Luke081515: Two permission changes at urwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/294652 (https://phabricator.wikimedia.org/T137888)
[23:09:14] <Luke081515>	 ebernhardson: another rebase was needed :-/
[23:09:26] <Luke081515>	 only ff is sometimes annoying
[23:09:50] <ebernhardson>	 indeed :)
[23:10:08] <grrrit-wm>	 (CR) EBernhardson: [C: 2] "one more time!" [mediawiki-config] - https://gerrit.wikimedia.org/r/294652 (https://phabricator.wikimedia.org/T137888) (owner: Luke081515)
[23:10:37] <dr0ptp4kt>	 ebernhardson: looks good. no fatals? if so, good.
[23:10:42] <grrrit-wm>	 (Merged) jenkins-bot: Two permission changes at urwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/294652 (https://phabricator.wikimedia.org/T137888) (owner: Luke081515)
[23:11:36] <logmsgbot>	 !log ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: T137888: Two permission changes at urwiki (duration: 00m 27s)
[23:11:37] <stashbot>	 T137888: Enable  Accountcreator and Filemover groups on Urdu Wikipedia - https://phabricator.wikimedia.org/T137888
[23:11:39] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:12:06] <ebernhardson>	 Luke081515: you're synced out. I imagine you can't directly test though?
[23:12:21] <Luke081515>	 ebernhardson: I checked Special:ListGroupRights, it works :)
[23:12:35] <Luke081515>	 thank you for swat :)
[23:13:30] <ebernhardson>	 sweet
[23:13:58] <grrrit-wm>	 (PS4) EBernhardson: search: Dependent config for textcat AB test. [mediawiki-config] - https://gerrit.wikimedia.org/r/294739
[23:14:06] <grrrit-wm>	 (CR) EBernhardson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/294739 (owner: EBernhardson)
[23:14:46] <grrrit-wm>	 (Merged) jenkins-bot: search: Dependent config for textcat AB test. [mediawiki-config] - https://gerrit.wikimedia.org/r/294739 (owner: EBernhardson)
[23:16:01] <logmsgbot>	 !log ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: T137167: search: Dependent config for textcat AB test. (duration: 00m 26s)
[23:16:02] <stashbot>	 T137167: Part Deux: TextCat A/B test for Language Identification - create and deploy - https://phabricator.wikimedia.org/T137167
[23:16:04] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:17:01] <wikibugs>	 Operations, Collaboration-Team-Interested, DBA, Flow, WorkType-Maintenance: Setup separate logical External Store for Flow - https://phabricator.wikimedia.org/T107610#2387654 (Mattflaschen-WMF)
[23:18:47] <wikibugs>	 Operations, Collaboration-Team-Interested, DBA, Flow, WorkType-Maintenance: Setup separate logical External Store for Flow in production - https://phabricator.wikimedia.org/T107610#1499219 (Mattflaschen-WMF)
[23:19:42] <logmsgbot>	 !log ebernhardson@tin Synchronized php-1.28.0-wmf.6/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: T137167: TextCat A/B test for Language Identification (duration: 00m 24s)
[23:19:43] <stashbot>	 T137167: Part Deux: TextCat A/B test for Language Identification - create and deploy - https://phabricator.wikimedia.org/T137167
[23:19:45] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:24:10] <logmsgbot>	 !log ebernhardson@tin Synchronized php-1.28.0-wmf.6/extensions/WikimediaEvents/extension.json: T137167: TextCat A/B test for Language Identification (duration: 00m 24s)
[23:24:10] <stashbot>	 T137167: Part Deux: TextCat A/B test for Language Identification - create and deploy - https://phabricator.wikimedia.org/T137167
[23:24:13] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:26:53] <icinga-wm>	 RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[23:40:19] <wikibugs>	 Operations, ops-codfw, media-storage, Patch-For-Review: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2387684 (Papaul) a:RobH>Papaul
[23:43:43] <wikibugs>	 Operations, Release-Engineering-Team, Gitblit-Deprecate, Patch-For-Review: write Apache rewrite rules for  gitblit -> diffusion migration - https://phabricator.wikimedia.org/T137224#2387689 (Danny_B) URLs marked with {icon check-square-o color=green} are redirected to their appropriate or similar...
[23:44:42] <logmsgbot>	 !log ebernhardson@tin Synchronized php-1.28.0-wmf.6/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: T137167: TextCat A/B test for Language Identification (duration: 00m 25s)
[23:44:43] <stashbot>	 T137167: Part Deux: TextCat A/B test for Language Identification - create and deploy - https://phabricator.wikimedia.org/T137167
[23:44:49] <wikibugs>	 Operations, Release-Engineering-Team, Gitblit-Deprecate, Patch-For-Review: write Apache rewrite rules for  gitblit -> diffusion migration - https://phabricator.wikimedia.org/T137224#2387692 (Paladox) @Danny_B thankyou :).
[23:49:35] <wikibugs>	 Operations, ops-codfw, media-storage: codfw: rack/setup/deploy ms-be202[2-7] switch configuration - https://phabricator.wikimedia.org/T138052#2387694 (Papaul)
[23:54:56] <grrrit-wm>	 (PS8) Mattflaschen: Change login cookies (for 'Remember me') to a one year expiry. [mediawiki-config] - https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699)