[00:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170309T0000). Please do the needful.
[00:00:04] <jouncebot>	 RoanKattouw: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[00:00:19] <RoanKattouw>	 I'll do the SWAT since I'm the only customer
[00:00:35] <mutante>	 i see now, i need to read more wmfall
[00:01:06] <mutante>	 "This March, we are celebrating our first WMF March Holiday "  nice!
[00:01:33] <mutante>	 well, jouncebot, you are great
[00:02:46] <paladox>	 lol
[00:03:45] <wikibugs_>	 Operations, Discovery, Traffic, Maps (Tilerator): Tilerator should purge Varnish cache - https://phabricator.wikimedia.org/T109776#1558879 (Pnorman) I'd rather see `max-age` significantly reduced and `stale-while-revalidate` set to the current `max-age` value. This avoids the need to invalidate t...
[00:04:03] <mutante>	 when does the phab window begin?
[00:04:50] <mutante>	 twentyafterfour: 
[00:06:10] <RoanKattouw>	 Hopefully not during SWAT? :D
[00:09:54] <bd808>	 mutante: the 'holiday' is the made-up WMF day off in March. there is an email about it on wmfall
[00:10:31] <twentyafterfour>	 mutante: in an hour
[00:10:39] <mutante>	 bd808: i just found out about that... thanks to the bot :)
[00:10:58] <bd808>	 jouncebot: refresh
[00:11:06] <jouncebot>	 I refreshed my knowledge about deployments.
[00:11:10] <mutante>	 it's nice to have a free Monday that is NOT a holiday for everybody
[00:11:17] <mutante>	 good for hotel/flights, heh
[00:11:47] <mutante>	 twentyafterfour: gotcha, thanks. i was thinking about the "phd to systemd" thing
[00:13:45] <wikibugs_>	 (PS1) Dzahn: remove fluorine from DHCP config [puppet] - https://gerrit.wikimedia.org/r/341939 (https://phabricator.wikimedia.org/T159996)
[00:15:45] <wikibugs_>	 Operations, Patch-For-Review: replace fluorine with mwlog servers (was: Upgrade fluorine to trusty/jessie) - https://phabricator.wikimedia.org/T123728#3086271 (Dzahn)
[00:15:47] <wikibugs_>	 (PS1) Dzahn: mediawiki::logging: remove fluorine from firewall rules [puppet] - https://gerrit.wikimedia.org/r/341940 (https://phabricator.wikimedia.org/T159996)
[00:17:47] <wikibugs_>	 (PS1) Dzahn: remove fluorine prod IP, keep mgmt [dns] - https://gerrit.wikimedia.org/r/341941 (https://phabricator.wikimedia.org/T159996)
[00:18:02] <wikibugs_>	 (PS2) Dzahn: remove fluorine prod IP, keep mgmt [dns] - https://gerrit.wikimedia.org/r/341941 (https://phabricator.wikimedia.org/T159996)
[00:18:32] <logmsgbot>	 !log catrope@tin Synchronized php-1.29.0-wmf.15/extensions/Echo/modules/styles/mw.echo.ui.NotificationBadgeWidget.less: Fix RTL popup alignment (T159999) (duration: 00m 42s)
[00:18:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:41] <stashbot>	 T159999: 1.29.0-wmf.15 regression: notification popup misaligned, partially off-screen in RTL - https://phabricator.wikimedia.org/T159999
[00:22:29] <wikibugs_>	 (CR) Dzahn: "@Krinkle" [puppet] - https://gerrit.wikimedia.org/r/341789 (https://phabricator.wikimedia.org/T123728) (owner: Filippo Giunchedi)
[00:23:08] <wikibugs_>	 (CR) Krinkle: [C: 1] "Thanks" [puppet] - https://gerrit.wikimedia.org/r/341940 (https://phabricator.wikimedia.org/T159996) (owner: Dzahn)
[00:23:19] <wikibugs_>	 (CR) Krinkle: [C: 1] site: use spare::system on fluorine [puppet] - https://gerrit.wikimedia.org/r/341789 (https://phabricator.wikimedia.org/T123728) (owner: Filippo Giunchedi)
[00:34:14] <wikibugs_>	 (CR) Dzahn: [C: 2] Phabricator: Migrate to base::service_unit for phd [puppet] - https://gerrit.wikimedia.org/r/340158 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[00:35:01] <mutante>	 RoanKattouw: are you still swatting?
[00:35:10] <RoanKattouw>	 Sorry, no I'm done
[00:35:27] <mutante>	 ok, good timing then. i'm doing a phab change, right before the phab window starts
[00:36:37] <mutante>	 !log iridium - temp. disable puppet | phab1001 - converting service to base::service_unit (T137928)
[00:36:43] <mutante>	 2001, 
[00:36:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:36:45] <stashbot>	 T137928: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928
[00:36:57] <mutante>	 paladox: you here?
[00:37:00] <paladox>	 yep
[00:37:30] <mutante>	 i was gonna say there is a problem with 2001, but there is not
[00:37:53] <paladox>	 Oh
[00:38:29] <mutante>	 it said: Service[phd]/enable: change from false to true failed: Could not enable phd:
[00:38:34] <mutante>	 but:
[00:38:42] <mutante>	 a) we disabled it there on purpose
[00:38:51] <mutante>	 b) for some reason that does not repeat itself on each puppet run
[00:39:11] <paladox>	 oh
[00:39:30] <mutante>	 it's because we used "mask" to disable it
[00:39:41] <mutante>	 i think
[00:40:09] <paladox>	 yep
[00:40:18] <mutante>	 unfortunately this shows as "failed" in the overview
[00:40:26] <mutante>	 which makes it look bad in Icinga
[00:40:29] <paladox>	 oh
[00:40:36] <mutante>	 but that isn't a change due to this 
[00:40:38] <mutante>	 it was like it before
[00:40:47] <mutante>	 it's just been ACKed
[00:41:08] <mutante>	 now we get to iridium ...
[00:41:54] <mutante>	 !log iridium - re-enable puppet, convert to base::service_unit, phd restarting
[00:42:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:42:02] <paladox>	 oh
[00:42:04] <paladox>	 :)
[00:42:28] <mutante>	 Notice: /Stage[main]/Phabricator/Base::Service_unit[phd]/Service[phd]/ensure: ensure changed 'stopped' to 'running'
[00:42:31] <mutante>	 looks all good
[00:42:50] <mutante>	 thanks for the conversion 
[00:47:46] <paladox>	 You're welcome :0
[00:47:47] <paladox>	 :)
[00:47:48] <paladox>	 mutante test phd
[00:47:50] <paladox>	 since it is using upstart
[00:47:52] <paladox>	 on iridium
[00:49:18] <paladox>	 root@gerrit-test3:/var/lib/gerrit2/review_site# groups gerrit2
[00:49:18] <paladox>	 gerrit2 : nda labsadminbots
[00:49:21] <paladox>	 woops
[00:50:54] <mutante>	 paladox: 
[00:50:54] <mutante>	 iridium:/etc/init] $ status phd
[00:50:55] <mutante>	 phd start/running
[00:50:57] <mutante>	 i did
[00:51:01] <paladox>	 Yep
[00:51:04] <paladox>	 :)
[00:51:04] <mutante>	 did you mean anything more specific?
[00:51:07] <paladox>	 Nope
[00:51:17] <mutante>	 delete that old init script though, right
[00:51:46] <paladox>	 Yep, that's a to do once we move from trusty to jessie
[00:51:53] <mutante>	 heh, it can't be stopped
[00:52:07] <paladox>	 Oh
[00:52:12] <paladox>	 sudo service phd stop?
[00:53:25] <mutante>	 paladox: ok, tested more, nevermind :)
[00:53:35] <mutante>	 it does work and i removed the old symlink
[00:53:45] <paladox>	 ok :)
[00:54:11] <mutante>	 !log iridium - tested stop/start of phd service with upstart, unlink /etc/init.d/phd which was the formerly used symlink to a phab php script
[00:54:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:54:23] <paladox>	 :)
[00:55:36] <icinga-wm>	 PROBLEM - puppet last run on dbproxy1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:05:57] <twentyafterfour>	 hmm no jouncebot?
[01:06:23] <twentyafterfour>	 !log updating phabricator to tag release/2017-03-08/1
[01:06:26] <RainbowSprinkles>	 jouncebot: ping
[01:06:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:08:56] <twentyafterfour>	 !log phabricator update complete.
[01:09:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:11:23] <bd808>	 jouncebot: now
[01:11:23] <jouncebot>	 For the next 0 hour(s) and 48 minute(s): Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170309T0100)
[01:11:47] <bd808>	 twentyafterfour: ^ not sure why the dumb bot forgot to yell at you
[01:11:53] <bd808>	 jouncebot: next
[01:11:53] <jouncebot>	 In 12 hour(s) and 48 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170309T1400)
[01:15:46] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[01:20:46] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 15 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[01:21:06] <icinga-wm>	 PROBLEM - puppet last run on druid1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:24:36] <icinga-wm>	 RECOVERY - puppet last run on dbproxy1008 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[01:49:06] <icinga-wm>	 RECOVERY - puppet last run on druid1001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[01:55:16] <icinga-wm>	 PROBLEM - puppet last run on aqs1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:04:26] <icinga-wm>	 PROBLEM - PHD should be supervising processes on iridium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 997 (phd)
[02:05:16] <twentyafterfour>	 fixing phd
[02:05:26] <icinga-wm>	 RECOVERY - PHD should be supervising processes on iridium is OK: PROCS OK: 13 processes with UID = 997 (phd)
[02:05:56] <mutante>	 thanks. btw this check is gone from phab2001 since today
[02:06:35] <mutante>	 (it will move with the active server)
[02:08:10] <RainbowSprinkles>	 Yay :D
[02:19:41] <wikibugs_>	 (PS1) Dzahn: planet: get rid of $realm-case, use Hiera for domain name [puppet] - https://gerrit.wikimedia.org/r/341959
[02:23:16] <icinga-wm>	 RECOVERY - puppet last run on aqs1009 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[02:23:34] <wikibugs_>	 Operations, MediaWiki-JobRunner, Patch-For-Review, User-Addshore: jobrunner should send statsd in batches - https://phabricator.wikimedia.org/T132327#3086682 (aaron) I put it up for SWAT tomorrow.
[02:24:09] <wikibugs_>	 (PS2) Dzahn: planet: get rid of $realm-case, use Hiera for domain name [puppet] - https://gerrit.wikimedia.org/r/341959
[02:25:01] <wikibugs_>	 (CR) jerkins-bot: [V: -1] planet: get rid of $realm-case, use Hiera for domain name [puppet] - https://gerrit.wikimedia.org/r/341959 (owner: Dzahn)
[02:26:07] <wikibugs_>	 (PS3) Dzahn: planet: get rid of $realm-case, use Hiera for domain name [puppet] - https://gerrit.wikimedia.org/r/341959
[02:30:55] <wikibugs_>	 (PS1) Dzahn: racktables: get rid of $realm-case, use Hiera for host name [puppet] - https://gerrit.wikimedia.org/r/341960
[02:32:39] <wikibugs_>	 (CR) jerkins-bot: [V: -1] racktables: get rid of $realm-case, use Hiera for host name [puppet] - https://gerrit.wikimedia.org/r/341960 (owner: Dzahn)
[02:34:57] <wikibugs_>	 (PS2) Dzahn: racktables: get rid of $realm-case, use Hiera for host name [puppet] - https://gerrit.wikimedia.org/r/341960
[02:36:34] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.14) (duration: 14m 34s)
[02:36:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:09:46] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.15) (duration: 14m 35s)
[03:09:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:15:39] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Mar  9 03:15:39 UTC 2017 (duration 5m 53s)
[03:15:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:23:06] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 791.70 seconds
[03:31:06] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 251.79 seconds
[03:32:40] <ebernhardson>	 AaronSchulz: i added a pull request for your class_alias problem. I don't think we have any particular solution to backporting stuff into phan though, which means waiting for a release, upgrading CI to php 7.1 (latest phan is 7.1), and using the new version
[03:33:13] <ebernhardson>	 (Assuming it works, i also didn't have a valid 7.1 installation to test with so fixed against 0.7 branch, then cherry picked to master and letting travis figure it out ...)
[03:33:27] <ebernhardson>	 doh wrong channel ... oh well this will work too
[03:42:34] <AaronSchulz>	 fortunately it's nothing urgent 
[03:49:16] <icinga-wm>	 PROBLEM - puppet last run on maps1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:04:12] <wikibugs_>	 (PS1) Dzahn: udp2log: fix lint warning [puppet] - https://gerrit.wikimedia.org/r/341964
[04:06:52] <wikibugs_>	 (PS1) Dzahn: prometheus: fix lint warning [puppet] - https://gerrit.wikimedia.org/r/341965
[04:07:55] <wikibugs_>	 (CR) jerkins-bot: [V: -1] prometheus: fix lint warning [puppet] - https://gerrit.wikimedia.org/r/341965 (owner: Dzahn)
[04:09:45] <wikibugs_>	 (PS1) Dzahn: authdns: fix lint warning [puppet] - https://gerrit.wikimedia.org/r/341966
[04:12:03] <wikibugs_>	 (CR) Dzahn: "this and 2 other patches are the only warnings there are across the whole repo, after that it's warning-free again" [puppet] - https://gerrit.wikimedia.org/r/341964 (owner: Dzahn)
[04:14:04] <wikibugs_>	 (PS2) Dzahn: authdns: fix lint warning [puppet] - https://gerrit.wikimedia.org/r/341966
[04:17:16] <icinga-wm>	 RECOVERY - puppet last run on maps1004 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[04:18:14] <wikibugs_>	 (CR) Dzahn: "well, this one is annoying, because while we see here "WARNING indentation of => is not properly aligned (arrow_alignment)" it is actually" [puppet] - https://gerrit.wikimedia.org/r/341965 (owner: Dzahn)
[04:18:52] <wikibugs_>	 (PS2) Dzahn: prometheus: fix lint warning [puppet] - https://gerrit.wikimedia.org/r/341965
[04:19:23] <wikibugs_>	 (CR) Dzahn: [C: 1] site: use spare::system on fluorine [puppet] - https://gerrit.wikimedia.org/r/341789 (https://phabricator.wikimedia.org/T123728) (owner: Filippo Giunchedi)
[04:20:06] <icinga-wm>	 PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=345.25 Read Requests/Sec=2228.60 Write Requests/Sec=1.10 KBytes Read/Sec=14889.60 KBytes_Written/Sec=44.40
[04:20:22] <wikibugs_>	 (CR) jerkins-bot: [V: -1] prometheus: fix lint warning [puppet] - https://gerrit.wikimedia.org/r/341965 (owner: Dzahn)
[04:22:05] <wikibugs_>	 (CR) Dzahn: "jenkins-bot said -1 for line 3 - 7: modules/mediawiki/manifests/maintenance/uploads.pp  WARNING indentation of => is not properly aligned " [puppet] - https://gerrit.wikimedia.org/r/341264 (https://phabricator.wikimedia.org/T159661) (owner: Dereckson)
[04:23:43] <wikibugs_>	 (CR) Dzahn: maintenance: provision /etc/wgetrc (1 comment) [puppet] - https://gerrit.wikimedia.org/r/341264 (https://phabricator.wikimedia.org/T159661) (owner: Dereckson)
[04:27:26] <icinga-wm>	 PROBLEM - salt-minion processes on lvs1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[04:27:46] <icinga-wm>	 PROBLEM - Check systemd state on lvs1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:28:41] <wikibugs_>	 (PS2) Krinkle: udp2log: fix lint warning [puppet] - https://gerrit.wikimedia.org/r/341964 (owner: Dzahn)
[04:28:54] <wikibugs_>	 (CR) Krinkle: "(moved warning from footer-meta to commit-msg body)" [puppet] - https://gerrit.wikimedia.org/r/341964 (owner: Dzahn)
[04:30:06] <icinga-wm>	 RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=3.00 Read Requests/Sec=0.90 Write Requests/Sec=6.00 KBytes Read/Sec=14.80 KBytes_Written/Sec=126.80
[04:55:56] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:56:56] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:56:57] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:57:26] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1003 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)
[04:57:27] <icinga-wm>	 PROBLEM - DPKG on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:57:28] <icinga-wm>	 PROBLEM - LVS HTTP IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:57:30] <icinga-wm>	 PROBLEM - LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:57:36] <icinga-wm>	 PROBLEM - Graphoid LVS eqiad on graphoid.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)
[04:57:39] <icinga-wm>	 PROBLEM - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:57:39] <icinga-wm>	 PROBLEM - Check rp_filter disabled on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:57:39] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1004 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)
[04:57:39] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - rendering-https_443 - Could not depool server mw1298.eqiad.wmnet because of too many down!: rendering_80 - Could not depool server mw1298.eqiad.wmnet because of too many down!
[04:57:40] <icinga-wm>	 PROBLEM - wiki content on commons on commons.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:57:40] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - rendering_80 - Could not depool server mw1297.eqiad.wmnet because of too many down!
[04:58:06] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:58:56] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[05:00:56] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[05:00:56] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[05:00:56] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[05:00:56] <icinga-wm>	 PROBLEM - puppet last run on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:01:06] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:01:19] <icinga-wm>	 RECOVERY - LVS HTTP IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 523 bytes in 0.011 second response time
[05:01:20] <icinga-wm>	 RECOVERY - LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 16132 bytes in 0.053 second response time
[05:01:46] <icinga-wm>	 PROBLEM - dhclient process on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:01:46] <icinga-wm>	 PROBLEM - configured eth on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:01:46] <icinga-wm>	 PROBLEM - Disk space on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:02:06] <icinga-wm>	 PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 23 probes of 416 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map
[05:03:47] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on wikidata is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:04:16] <icinga-wm>	 PROBLEM - SSH on lvs1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:04:17] <icinga-wm>	 RECOVERY - DPKG on lvs1001 is OK: All packages OK
[05:04:28] <icinga-wm>	 RECOVERY - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 16134 bytes in 0.075 second response time
[05:04:28] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1003 is OK: All endpoints are healthy
[05:04:37] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy
[05:05:46] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1002 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)
[05:07:07] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:07:07] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:07:07] <icinga-wm>	 RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 15 probes of 416 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map
[05:07:36] <icinga-wm>	 RECOVERY - Check rp_filter disabled on lvs1001 is OK: OK: kernel parameters are set to expected value.
[05:07:39] <icinga-wm>	 RECOVERY - wiki content on commons on commons.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 153114 bytes in 0.013 second response time
[05:07:39] <icinga-wm>	 RECOVERY - dhclient process on lvs1001 is OK: PROCS OK: 0 processes with command name dhclient
[05:07:39] <icinga-wm>	 RECOVERY - configured eth on lvs1001 is OK: OK - interfaces up
[05:07:39] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1004 is OK: All endpoints are healthy
[05:07:39] <icinga-wm>	 RECOVERY - Graphoid LVS eqiad on graphoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[05:07:46] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy
[05:08:36] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1863 bytes in 0.085 second response time
[05:09:06] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:09:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:09:58] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1001 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)
[05:09:58] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[05:09:58] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[05:09:58] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[05:10:17] <icinga-wm>	 PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:11:06] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy
[05:11:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:11:46] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1002 is OK: All endpoints are healthy
[05:12:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:12:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:12:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:13:08] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 on text-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:13:46] <icinga-wm>	 RECOVERY - Disk space on lvs1001 is OK: DISK OK
[05:13:47] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1001 is OK: All endpoints are healthy
[05:14:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:14:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:14:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:14:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:14:06] <icinga-wm>	 PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:14:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:14:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:14:07] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:14:07] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:14:08] <icinga-wm>	 PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:14:08] <icinga-wm>	 PROBLEM - restbase endpoints health on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:14:09] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:14:58] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on text-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 512 bytes in 0.007 second response time
[05:15:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1003 is OK: All endpoints are healthy
[05:15:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1002 is OK: All endpoints are healthy
[05:15:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy
[05:15:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy
[05:15:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1001 is OK: All endpoints are healthy
[05:15:57] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy
[05:15:57] <icinga-wm>	 RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy
[05:15:58] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy
[05:15:58] <icinga-wm>	 RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy
[05:15:59] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy
[05:15:59] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy
[05:16:00] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy
[05:16:00] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy
[05:16:01] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy
[05:16:17] <icinga-wm>	 PROBLEM - pybal on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:16:26] <icinga-wm>	 PROBLEM - DPKG on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:17:46] <icinga-wm>	 PROBLEM - Check rp_filter disabled on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:18:01] <yuvipanda>	 hmm
[05:18:06] <yuvipanda>	 not sure if this is real
[05:18:29] <yuvipanda>	 or if it's something with icinga
[05:18:38] <icinga-wm>	 PROBLEM - LVS HTTP IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:18:46] <icinga-wm>	 PROBLEM - dhclient process on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:18:46] <icinga-wm>	 PROBLEM - configured eth on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:18:56] <yuvipanda>	 ok
[05:19:00] <yuvipanda>	 I can't get onto lvs1001
[05:19:06] <icinga-wm>	 RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 7 probes of 416 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map
[05:19:38] <icinga-wm>	 PROBLEM - LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:20:36] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - rendering-https_443 - Could not depool server mw1293.eqiad.wmnet because of too many down!: rendering_80 - Could not depool server mw1296.eqiad.wmnet because of too many down!
[05:20:56] <icinga-wm>	 PROBLEM - Disk space on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:21:06] <icinga-wm>	 RECOVERY - pybal on lvs1001 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal
[05:21:38] <icinga-wm>	 PROBLEM - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:22:04] <yuvipanda>	 I'm trying to get into mgmt for it
[05:22:05] <yuvipanda>	 hi bblack--
[05:22:09] <yuvipanda>	 or bblack
[05:22:11] <bblack>	 hi
[05:22:23] <yuvipanda>	 bblack: I can ping lvs1001 but can't ssh in
[05:22:27] <yuvipanda>	 ping was also intermittent early on
[05:22:32] <bblack>	 did the lvs1001 issue predate all the scb, etc spam?
[05:22:36] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1003 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)
[05:22:38] <icinga-wm>	 PROBLEM - Graphoid LVS eqiad on graphoid.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)
[05:22:48] <icinga-wm>	 PROBLEM - wiki content on commons on commons.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:22:53] <bblack>	 seems close enough to the start, anyways
[05:23:18] <yuvipanda>	 bblack: yeah, not sure.
[05:23:27] <icinga-wm>	 RECOVERY - LVS HTTP IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 524 bytes in 0.009 second response time
[05:23:29] <icinga-wm>	 RECOVERY - LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 16133 bytes in 0.086 second response time
[05:23:33] <yuvipanda>	 bblack: I see salt on lvs1001 complaining before everything
[05:23:46] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1004 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)
[05:23:46] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - rendering-https_443 - Could not depool server mw1295.eqiad.wmnet because of too many down!: rendering_80 - Could not depool server mw1297.eqiad.wmnet because of too many down!
[05:24:18] <bblack>	 ok
[05:24:25] <bblack>	 I'm going to halt it from console
[05:24:28] <icinga-wm>	 RECOVERY - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 16134 bytes in 0.072 second response time
[05:24:43] <yuvipanda>	 bblack: ok!
[05:24:50] <yuvipanda>	 bblack: am on the console already, want me to reboot it?
[05:24:54] <yuvipanda>	 or shall I leave it to you?
[05:24:56] <bblack>	 !log poweroff lvs1001 from idrac
[05:25:01] <bblack>	 did
[05:25:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:25:12] <yuvipanda>	 ok!
[05:25:16] <bblack>	 lvs1004 should automatically take over for it on death, but if lvs1001 is in some half-dead state
[05:25:16] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.09 seconds
[05:25:39] * yuvipanda nods
[05:25:46] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on wikidata is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:25:56] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1002 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)
[05:26:16] <icinga-wm>	 PROBLEM - pybal on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:26:29] <yuvipanda>	 ganglia says 'down' for lvs1001
[05:26:58] <yuvipanda>	 lots of spiking
[05:27:17] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:27:17] <bblack>	 yeah
[05:27:35] <bblack>	 I don't see traffic moving to 1004, though
[05:28:18] <icinga-wm>	 PROBLEM - Host text-lb.eqiad.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100%
[05:28:26] <icinga-wm>	 PROBLEM - Host en.m.wikipedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[05:28:26] <icinga-wm>	 PROBLEM - Host en.wikibooks.org is DOWN: PING CRITICAL - Packet loss = 100%
[05:28:26] <yuvipanda>	 hmm
[05:28:48] <icinga-wm>	 PROBLEM - Host text-lb.eqiad.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[05:29:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:07] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:07] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:08] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:08] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:09] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:36] <icinga-wm>	 PROBLEM - Host en.wikipedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[05:29:36] <icinga-wm>	 PROBLEM - Host commons.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[05:29:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1003 is OK: All endpoints are healthy
[05:29:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1002 is OK: All endpoints are healthy
[05:29:57] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy
[05:29:57] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy
[05:29:57] <icinga-wm>	 RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy
[05:29:57] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy
[05:29:57] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy
[05:29:58] <icinga-wm>	 RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy
[05:29:58] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1001 is OK: All endpoints are healthy
[05:29:59] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy
[05:29:59] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy
[05:30:00] <icinga-wm>	 RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy
[05:30:00] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy
[05:30:01] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy
[05:30:36] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1852 bytes in 0.098 second response time
[05:30:37] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1003 is OK: All endpoints are healthy
[05:30:58] <icinga-wm>	 RECOVERY - Host text-lb.eqiad.wikimedia.org_ipv6 is UP: PING WARNING - Packet loss = 86%, RTA = 0.34 ms
[05:30:59] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1002 is OK: All endpoints are healthy
[05:31:36] <icinga-wm>	 PROBLEM - Host lvs1001 is DOWN: PING CRITICAL - Packet loss = 100%
[05:31:48] <icinga-wm>	 RECOVERY - wiki content on commons on commons.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 153114 bytes in 0.013 second response time
[05:31:48] <icinga-wm>	 RECOVERY - Graphoid LVS eqiad on graphoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[05:31:48] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1004 is OK: All endpoints are healthy
[05:31:56] <icinga-wm>	 RECOVERY - Host commons.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[05:32:38] <icinga-wm>	 PROBLEM - LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:33:27] <icinga-wm>	 RECOVERY - LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 16132 bytes in 0.056 second response time
[05:33:36] <icinga-wm>	 RECOVERY - Host en.m.wikipedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.54 ms
[05:33:36] <icinga-wm>	 RECOVERY - Host en.wikibooks.org is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[05:33:36] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy
[05:33:46] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy
[05:34:56] <icinga-wm>	 RECOVERY - Host en.wikipedia.org is UP: PING WARNING - Packet loss = 54%, RTA = 1.03 ms
[05:36:27] <wikibugs_>	 Operations, MediaWiki-General-or-Unknown: Investigate spike in 500s during asw-c2-eqiad replacement - https://phabricator.wikimedia.org/T156475#3086770 (aaron) I wonder why would the restart cause MASTER_GTID_WAIT() to fail in a non-timeout way, e.g. 'Failed to query MASTER_POS_WAIT()'.
[05:38:48] <icinga-wm>	 PROBLEM - Host text-lb.eqiad.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100%
[05:39:16] <icinga-wm>	 RECOVERY - puppet last run on mw1182 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[05:39:38] <icinga-wm>	 RECOVERY - Host text-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[05:40:59] <musikanimal>	 anyone getting packet loss with all things WMF? I'm in NYC, but I tried on my home ISP and my mobile carrier, things are really slow or may not load at all
[05:42:16] <icinga-wm>	 PROBLEM - Host commons.wikimedia.org is DOWN: CRITICAL - Destination Unreachable (commons.wikimedia.org)
[05:42:26] <musikanimal>	 okay so not just me, heh
[05:42:41] <bblack>	 yes
[05:42:56] <icinga-wm>	 RECOVERY - Host commons.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 2.70 ms
[05:43:17] <icinga-wm>	 PROBLEM - puppet last run on db1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:44:08] <wikibugs_>	 (PS1) BBlack: depool eqiad front edge traffic [dns] - https://gerrit.wikimedia.org/r/341971
[05:44:30] <wikibugs_>	 (CR) BBlack: [C: 2] depool eqiad front edge traffic [dns] - https://gerrit.wikimedia.org/r/341971 (owner: BBlack)
[05:45:16] <icinga-wm>	 PROBLEM - Host commons.wikimedia.org is DOWN: CRITICAL - Destination Unreachable (commons.wikimedia.org)
[05:46:06] <icinga-wm>	 RECOVERY - Host commons.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[05:46:46] <icinga-wm>	 PROBLEM - Host en.wikibooks.org is DOWN: CRITICAL - Destination Unreachable (en.wikibooks.org)
[05:46:48] <icinga-wm>	 PROBLEM - Host text-lb.eqiad.wikimedia.org_ipv6 is DOWN: CRITICAL - Destination Unreachable (2620:0:861:ed1a::1)
[05:47:38] <icinga-wm>	 RECOVERY - Host text-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[05:50:26] <icinga-wm>	 PROBLEM - Host commons.wikimedia.org is DOWN: CRITICAL - Destination Unreachable (commons.wikimedia.org)
[05:51:06] <icinga-wm>	 RECOVERY - Host commons.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[05:51:48] <icinga-wm>	 PROBLEM - Host text-lb.eqiad.wikimedia.org_ipv6 is DOWN: CRITICAL - Destination Unreachable (2620:0:861:ed1a::1)
[05:51:56] <icinga-wm>	 RECOVERY - Host en.wikibooks.org is UP: PING OK - Packet loss = 0%, RTA = 36.12 ms
[05:52:38] <icinga-wm>	 RECOVERY - Host text-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[05:53:16] <icinga-wm>	 PROBLEM - Host commons.wikimedia.org is DOWN: CRITICAL - Destination Unreachable (commons.wikimedia.org)
[05:54:06] <icinga-wm>	 RECOVERY - Host commons.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 36.10 ms
[05:54:48] <icinga-wm>	 PROBLEM - Host text-lb.eqiad.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100%
[05:55:38] <icinga-wm>	 RECOVERY - Host text-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[05:59:47] <icinga-wm>	 PROBLEM - Host text-lb.eqiad.wikimedia.org_ipv6 is DOWN: CRITICAL - Destination Unreachable (2620:0:861:ed1a::1)
[06:00:38] <icinga-wm>	 RECOVERY - Host text-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[06:01:26] <icinga-wm>	 PROBLEM - DPKG on lvs1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:02:16] <icinga-wm>	 RECOVERY - DPKG on lvs1004 is OK: All packages OK
[06:04:48] <icinga-wm>	 PROBLEM - Host text-lb.eqiad.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100%
[06:07:38] <icinga-wm>	 RECOVERY - Host text-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms
[06:09:16] <icinga-wm>	 PROBLEM - puppet last run on scb1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:09:49] <icinga-wm>	 PROBLEM - Host text-lb.eqiad.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100%
[06:12:26] <icinga-wm>	 RECOVERY - puppet last run on db1001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[06:12:38] <icinga-wm>	 RECOVERY - Host text-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
[06:13:06] <icinga-wm>	 RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[06:13:16] <icinga-wm>	 RECOVERY - Host lvs1001 is UP: PING OK - Packet loss = 0%, RTA = 0.18 ms
[06:13:26] <icinga-wm>	 RECOVERY - DPKG on lvs1001 is OK: All packages OK
[06:13:56] <icinga-wm>	 RECOVERY - salt-minion processes on lvs1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:13:57] <icinga-wm>	 RECOVERY - Check systemd state on lvs1001 is OK: OK - running: The system is fully operational
[06:13:57] <icinga-wm>	 RECOVERY - dhclient process on lvs1001 is OK: PROCS OK: 0 processes with command name dhclient
[06:13:57] <icinga-wm>	 RECOVERY - configured eth on lvs1001 is OK: OK - interfaces up
[06:13:57] <icinga-wm>	 RECOVERY - Check rp_filter disabled on lvs1001 is OK: OK: kernel parameters are set to expected value.
[06:13:57] <icinga-wm>	 RECOVERY - puppet last run on lvs1001 is OK: OK: Puppet is currently enabled, last run 1 hour ago with 0 failures
[06:13:57] <icinga-wm>	 RECOVERY - Disk space on lvs1001 is OK: DISK OK
[06:18:56] <icinga-wm>	 PROBLEM - Check systemd state on lvs1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:19:16] <icinga-wm>	 PROBLEM - salt-minion processes on lvs1004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:39:51] <wikibugs_>	 Operations, DBA, Labs, Labs-Infrastructure, Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3086795 (Marostegui) It has actually done some recovering as the file it is scanning now has changed since last night: ``` postgres  7189  0.0  0....
[06:41:56] <icinga-wm>	 PROBLEM - salt-minion processes on lvs1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:41:56] <icinga-wm>	 PROBLEM - Check systemd state on lvs1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:42:16] <icinga-wm>	 RECOVERY - puppet last run on scb1002 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:43:16] <icinga-wm>	 PROBLEM - puppet last run on mw1221 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:45:56] <icinga-wm>	 PROBLEM - Host lvs1001 is DOWN: PING CRITICAL - Packet loss = 100%
[06:46:56] <icinga-wm>	 RECOVERY - salt-minion processes on lvs1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:46:56] <icinga-wm>	 RECOVERY - Check systemd state on lvs1001 is OK: OK - running: The system is fully operational
[06:47:06] <icinga-wm>	 RECOVERY - Host lvs1001 is UP: PING OK - Packet loss = 0%, RTA = 0.19 ms
[06:49:06] <icinga-wm>	 PROBLEM - pybal on lvs1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal
[06:50:07] <icinga-wm>	 RECOVERY - pybal on lvs1001 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal
[06:50:56] <icinga-wm>	 RECOVERY - Check systemd state on lvs1004 is OK: OK - running: The system is fully operational
[06:51:16] <icinga-wm>	 RECOVERY - salt-minion processes on lvs1004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:52:36] <icinga-wm>	 PROBLEM - Host lvs1001 is DOWN: PING CRITICAL - Packet loss = 100%
[06:53:56] <icinga-wm>	 PROBLEM - Check systemd state on lvs1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:54:16] <icinga-wm>	 PROBLEM - salt-minion processes on lvs1004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:55:16] <icinga-wm>	 PROBLEM - puppet last run on dbstore1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:08:50] <icinga-wm>	 PROBLEM - Host text-lb.eqiad.wikimedia.org_ipv6 is DOWN: CRITICAL - Destination Unreachable (2620:0:861:ed1a::1)
[07:09:40] <icinga-wm>	 RECOVERY - Host text-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms
[07:11:16] <icinga-wm>	 RECOVERY - puppet last run on mw1221 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[07:15:49] <icinga-wm>	 PROBLEM - Host text-lb.eqiad.wikimedia.org_ipv6 is DOWN: CRITICAL - Destination Unreachable (2620:0:861:ed1a::1)
[07:15:50] <icinga-wm>	 PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.009 second response time
[07:16:21] <_joe_>	 uhm I was sure I scheduled downtime there
[07:16:39] <icinga-wm>	 RECOVERY - Host text-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[07:17:46] <icinga-wm>	 RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.010 second response time
[07:21:06] <icinga-wm>	 RECOVERY - Check systemd state on lvs1004 is OK: OK - running: The system is fully operational
[07:21:16] <icinga-wm>	 RECOVERY - salt-minion processes on lvs1004 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[07:23:16] <icinga-wm>	 RECOVERY - puppet last run on dbstore1002 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[07:45:56] <icinga-wm>	 PROBLEM - puppet last run on relforge1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:48:20] <wikibugs_>	 Operations, MediaWiki-General-or-Unknown: Investigate spike in 500s during asw-c2-eqiad replacement - https://phabricator.wikimedia.org/T156475#3086871 (jcrespo) There are other options- loadbalancing creating new connections timing out when there is no immediate error, or external storage doing that (th...
[07:52:08] <wikibugs_>	 (PS1) Marostegui: db-eqiad.php: Depool db1093 [mediawiki-config] - https://gerrit.wikimedia.org/r/341981 (https://phabricator.wikimedia.org/T159414)
[07:53:13] <wikibugs_>	 (PS1) Jcrespo: mariadb: Repool db1051 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/341982 (https://phabricator.wikimedia.org/T159319)
[07:53:55] <marostegui>	 jynus: you go first, I will rebase
[07:54:05] <wikibugs_>	 (PS2) Jcrespo: mariadb: Repool db1051 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/341982 (https://phabricator.wikimedia.org/T159319)
[07:54:50] <jynus>	 I have to wait for jenkins
[07:55:19] <marostegui>	 no worries :)
[07:56:12] <wikibugs_>	 (PS3) Jcrespo: mariadb: Repool db1051 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/341982 (https://phabricator.wikimedia.org/T159319)
[08:01:27] <wikibugs_>	 (CR) Jcrespo: [C: -2] "Krinke- I am not going to amend this change, because as I said, I do not plan to deploy it (hence the -2) this is just a template for help" [mediawiki-config] - https://gerrit.wikimedia.org/r/338996 (https://phabricator.wikimedia.org/T158580) (owner: Jcrespo)
[08:05:27] <wikibugs_>	 (CR) Jcrespo: [C: 2] mariadb: Repool db1051 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/341982 (https://phabricator.wikimedia.org/T159319) (owner: Jcrespo)
[08:07:19] <wikibugs_>	 (Merged) jenkins-bot: mariadb: Repool db1051 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/341982 (https://phabricator.wikimedia.org/T159319) (owner: Jcrespo)
[08:07:29] <wikibugs_>	 (CR) jenkins-bot: mariadb: Repool db1051 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/341982 (https://phabricator.wikimedia.org/T159319) (owner: Jcrespo)
[08:08:14] <wikibugs_>	 (PS2) Marostegui: db-eqiad.php: Depool db1093 [mediawiki-config] - https://gerrit.wikimedia.org/r/341981 (https://phabricator.wikimedia.org/T159414)
[08:10:18] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1051 after maintenance with low weight (duration: 00m 43s)
[08:10:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:01] <wikibugs_>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1093 [mediawiki-config] - https://gerrit.wikimedia.org/r/341981 (https://phabricator.wikimedia.org/T159414) (owner: Marostegui)
[08:11:06] <icinga-wm>	 RECOVERY - Host lvs1001 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[08:12:16] <wikibugs_>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1093 [mediawiki-config] - https://gerrit.wikimedia.org/r/341981 (https://phabricator.wikimedia.org/T159414) (owner: Marostegui)
[08:12:24] <jynus>	 buffer pool efficiency dropped to 98% on db1051
[08:12:24] <wikibugs_>	 (CR) jenkins-bot: db-eqiad.php: Depool db1093 [mediawiki-config] - https://gerrit.wikimedia.org/r/341981 (https://phabricator.wikimedia.org/T159414) (owner: Marostegui)
[08:12:56] <icinga-wm>	 RECOVERY - puppet last run on relforge1002 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[08:13:41] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1093 - T159414 (duration: 00m 49s)
[08:13:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:13:49] <stashbot>	 T159414: Rampant differences in indexes and PK on s6 (frwiki, jawiki, ruwiki) for revision table - https://phabricator.wikimedia.org/T159414
[08:14:06] <icinga-wm>	 PROBLEM - puppet last run on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:15:56] <icinga-wm>	 PROBLEM - puppet last run on lvs1001 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:16:36] <icinga-wm>	 PROBLEM - puppet last run on thumbor1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:17:56] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1004 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090
[08:18:06] <icinga-wm>	 PROBLEM - salt-minion processes on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:18:06] <icinga-wm>	 PROBLEM - dhclient process on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:18:06] <icinga-wm>	 PROBLEM - Disk space on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:18:06] <icinga-wm>	 PROBLEM - configured eth on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:18:06] <icinga-wm>	 PROBLEM - Check rp_filter disabled on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:18:07] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:18:07] <icinga-wm>	 PROBLEM - Check systemd state on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:18:16] <icinga-wm>	 PROBLEM - pybal on lvs1004 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal
[08:18:16] <icinga-wm>	 PROBLEM - SSH on lvs1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:18:16] <icinga-wm>	 PROBLEM - pybal on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:18:36] <icinga-wm>	 PROBLEM - DPKG on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:21:42] <marostegui>	 !log  Deploy alter table s6 revision table on db1093 - T159414
[08:21:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:48] <stashbot>	 T159414: Rampant differences in indexes and PK on s6 (frwiki, jawiki, ruwiki) for revision table - https://phabricator.wikimedia.org/T159414
[08:24:16] <icinga-wm>	 PROBLEM - Host lvs1001 is DOWN: PING CRITICAL - Packet loss = 100%
[08:25:16] <icinga-wm>	 RECOVERY - pybal on lvs1004 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal
[08:26:06] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1004 is OK: PYBAL OK - All pools are healthy
[08:27:26] <icinga-wm>	 RECOVERY - DPKG on lvs1001 is OK: All packages OK
[08:27:36] <icinga-wm>	 RECOVERY - Host lvs1001 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[08:27:56] <icinga-wm>	 RECOVERY - Disk space on lvs1001 is OK: DISK OK
[08:27:56] <icinga-wm>	 RECOVERY - salt-minion processes on lvs1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:27:57] <icinga-wm>	 RECOVERY - configured eth on lvs1001 is OK: OK - interfaces up
[08:27:57] <icinga-wm>	 RECOVERY - dhclient process on lvs1001 is OK: PROCS OK: 0 processes with command name dhclient
[08:27:57] <icinga-wm>	 RECOVERY - Check systemd state on lvs1001 is OK: OK - running: The system is fully operational
[08:27:57] <icinga-wm>	 RECOVERY - Check rp_filter disabled on lvs1001 is OK: OK: kernel parameters are set to expected value.
[08:28:06] <icinga-wm>	 RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[08:32:16] <icinga-wm>	 PROBLEM - pybal on lvs1004 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal
[08:32:56] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1004 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090
[08:34:56] <wikibugs_>	 Operations, DBA, Labs, Labs-Infrastructure, Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3086995 (Marostegui) Not sure from which time this is: ``` FATAL:  the database system is starting up FATAL:  terminating walreceiver process due...
[08:44:26] <icinga-wm>	 RECOVERY - puppet last run on thumbor1002 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[08:44:36] <icinga-wm>	 PROBLEM - NTP on lvs1001 is CRITICAL: NTP CRITICAL: Offset unknown
[08:45:18] <wikibugs_>	 Operations, DBA, Labs, Labs-Infrastructure, Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3086996 (jcrespo) That is me killing the replication, which will not work anyway. @akosiaris can you point us to the osm load process, do you have...
[08:53:16] <icinga-wm>	 PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:12] <wikibugs_>	 (CR) Hashar: [C: 1] "Definitely \O/" [puppet] - https://gerrit.wikimedia.org/r/341569 (https://phabricator.wikimedia.org/T150850) (owner: Jcrespo)
[09:14:36] <icinga-wm>	 RECOVERY - NTP on lvs1001 is OK: NTP OK: Offset -0.0002154111862 secs
[09:17:32] <wikibugs_>	 (CR) Hashar: [C: 1] "Since that is not used :-}" [puppet] - https://gerrit.wikimedia.org/r/341593 (owner: Chad)
[09:21:10] <wikibugs_>	 (CR) Jcrespo: [C: 2] mariadb: Decouple mariadb::misc role to a separate file [puppet] - https://gerrit.wikimedia.org/r/341825 (https://phabricator.wikimedia.org/T150850) (owner: Jcrespo)
[09:21:18] <wikibugs_>	 (PS3) Jcrespo: mariadb: Decouple mariadb::misc role to a separate file [puppet] - https://gerrit.wikimedia.org/r/341825 (https://phabricator.wikimedia.org/T150850)
[09:22:16] <icinga-wm>	 RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[09:22:22] <wikibugs_>	 (CR) Hashar: [C: -1] "That would probably do it, then it feels like a hack and I would rather have puppet do the proper thing." [puppet] - https://gerrit.wikimedia.org/r/340496 (https://phabricator.wikimedia.org/T157785) (owner: Paladox)
[09:25:51] <wikibugs_>	 (PS3) Jcrespo: mariadb: Decouple beta role to a separate file [puppet] - https://gerrit.wikimedia.org/r/341569 (https://phabricator.wikimedia.org/T150850)
[09:26:34] <wikibugs_>	 (CR) Hashar: [C: 1] labstore: fix typo in snapshot-manager [puppet] - https://gerrit.wikimedia.org/r/341427 (owner: Hashar)
[09:27:26] <wikibugs_>	 (CR) Hashar: "Yup apparently that only happens on the first provisioning of a fresh machine. Ordering issue :-}" [puppet] - https://gerrit.wikimedia.org/r/341700 (owner: Hashar)
[09:29:18] <wikibugs_>	 (CR) Jcrespo: [C: 2] mariadb: Decouple beta role to a separate file [puppet] - https://gerrit.wikimedia.org/r/341569 (https://phabricator.wikimedia.org/T150850) (owner: Jcrespo)
[09:29:26] <icinga-wm>	 PROBLEM - puppet last run on ms-be1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:36:06] <icinga-wm>	 PROBLEM - puppet last run on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:37:36] <icinga-wm>	 PROBLEM - DPKG on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:38:06] <icinga-wm>	 PROBLEM - Check systemd state on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:38:06] <icinga-wm>	 PROBLEM - Check rp_filter disabled on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:38:06] <icinga-wm>	 PROBLEM - dhclient process on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:38:06] <icinga-wm>	 PROBLEM - configured eth on lvs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:38:16] <icinga-wm>	 PROBLEM - SSH on lvs1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:39:56] <icinga-wm>	 RECOVERY - dhclient process on lvs1001 is OK: PROCS OK: 0 processes with command name dhclient
[09:39:56] <icinga-wm>	 RECOVERY - Check rp_filter disabled on lvs1001 is OK: OK: kernel parameters are set to expected value.
[09:39:56] <icinga-wm>	 RECOVERY - Check systemd state on lvs1001 is OK: OK - running: The system is fully operational
[09:39:57] <icinga-wm>	 RECOVERY - configured eth on lvs1001 is OK: OK - interfaces up
[09:40:06] <icinga-wm>	 RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[09:40:27] <icinga-wm>	 RECOVERY - DPKG on lvs1001 is OK: All packages OK
[09:41:30] <wikibugs_>	 (CR) Hashar: "I have updated the beta puppetmaster and ran puppet on deployment-db03 and deployment-db04.  Only thing that happened is:" [puppet] - https://gerrit.wikimedia.org/r/341569 (https://phabricator.wikimedia.org/T150850) (owner: Jcrespo)
[09:43:57] <icinga-wm>	 RECOVERY - puppet last run on lvs1001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[09:44:16] <icinga-wm>	 RECOVERY - pybal on lvs1001 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal
[09:44:56] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1001 is OK: PYBAL OK - All pools are healthy
[09:48:56] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1004 is OK: PYBAL OK - All pools are healthy
[09:49:16] <icinga-wm>	 RECOVERY - pybal on lvs1004 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal
[09:57:26] <icinga-wm>	 RECOVERY - puppet last run on ms-be1014 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[10:03:18] <wikibugs_>	 (CR) Jcrespo: "> I have updated the beta puppetmaster and ran puppet on" [puppet] - https://gerrit.wikimedia.org/r/341569 (https://phabricator.wikimedia.org/T150850) (owner: Jcrespo)
[10:20:22] <wikibugs_>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool db1093" [mediawiki-config] - https://gerrit.wikimedia.org/r/341987
[10:23:10] <wikibugs_>	 (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1093" [mediawiki-config] - https://gerrit.wikimedia.org/r/341987 (owner: Marostegui)
[10:23:35] <wikibugs_>	 (CR) Jcrespo: [C: 1] Revert "db-eqiad.php: Depool db1093" [mediawiki-config] - https://gerrit.wikimedia.org/r/341987 (owner: Marostegui)
[10:25:08] <wikibugs_>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1093" [mediawiki-config] - https://gerrit.wikimedia.org/r/341987 (owner: Marostegui)
[10:25:08] <ema>	 !log service systemd-sysctl restart on lvs hosts 
[10:25:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:10] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1093 - T159414 (duration: 00m 42s)
[10:26:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:16] <stashbot>	 T159414: Rampant differences in indexes and PK on s6 (frwiki, jawiki, ruwiki) for revision table - https://phabricator.wikimedia.org/T159414
[10:27:05] <wikibugs_>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341988 (https://phabricator.wikimedia.org/T159414)
[10:27:11] <wikibugs_>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1093" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341987 (owner: 10Marostegui)
[10:35:02] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 04-1] PDFRender: Delay service shut-down to work around xpra race (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/341833 (https://phabricator.wikimedia.org/T159922) (owner: 10GWicke)
[10:40:26] <icinga-wm>	 PROBLEM - puppet last run on labvirt1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:44:37] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341988 (https://phabricator.wikimedia.org/T159414) (owner: 10Marostegui)
[10:46:14] <wikibugs_>	 06Operations, 10ops-codfw, 15User-Elukey: codfw: mw2251-mw2260 rack/setup - https://phabricator.wikimedia.org/T155180#3087217 (10elukey) So hosts rebooted, verified that puppet ran correctly and executed apt-get dist-upgrade. Verified also ROW allocation:  ``` {'mw2251.codfw.wmnet': '    SysName:      asw-a-...
[10:46:20] <wikibugs_>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341988 (https://phabricator.wikimedia.org/T159414) (owner: 10Marostegui)
[10:47:21] <wikibugs_>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341988 (https://phabricator.wikimedia.org/T159414) (owner: 10Marostegui)
[10:47:25] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1088 - T159414 (duration: 00m 41s)
[10:47:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:47:32] <stashbot>	 T159414: Rampant differences in indexes and PK on s6 (frwiki, jawiki, ruwiki) for revision table - https://phabricator.wikimedia.org/T159414
[10:49:01] <marostegui>	 !log Deploy alter table s6 revision table on db1088 - T159414
[10:49:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:56] <wikibugs_>	 (03PS2) 10Giuseppe Lavagetto: profile::etcd::replication: add --strip option [puppet] - 10https://gerrit.wikimedia.org/r/341805
[10:51:58] <wikibugs_>	 (03PS1) 10Giuseppe Lavagetto: conftool: switch prefix to /eqiad.wmnet/conftool [puppet] - 10https://gerrit.wikimedia.org/r/341989 (https://phabricator.wikimedia.org/T159687)
[10:59:11] <wikibugs_>	 06Operations, 06Labs: Remove linux kernel 3.16 from the jessie image on labs - https://phabricator.wikimedia.org/T159990#3087240 (10MoritzMuehlenhoff) @Paladox : That's entirely unrelated, the Launchpad entry refers to a bug in Upstart, which jessie doesn't use at all. @yuvipanda : I don't think there's a gene...
[11:06:41] <wikibugs_>	 06Operations, 10netops: Audit and cleanup border-in ACL on core routers - https://phabricator.wikimedia.org/T160055#3087244 (10mark)
[11:07:56] <wikibugs_>	 (03PS1) 10Alexandros Kosiaris: Promote labsdb1007 to osm::master. [puppet] - 10https://gerrit.wikimedia.org/r/341991 (https://phabricator.wikimedia.org/T157359)
[11:14:26] <icinga-wm>	 RECOVERY - puppet last run on labvirt1001 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[11:15:37] <wikibugs_>	 (03CR) 10Jcrespo: [C: 031] Promote labsdb1007 to osm::master. [puppet] - 10https://gerrit.wikimedia.org/r/341991 (https://phabricator.wikimedia.org/T157359) (owner: 10Alexandros Kosiaris)
[11:24:32] <marostegui>	 !log Stop replication db2033 - T159707
[11:24:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:24:39] <stashbot>	 T159707: Import x1 on dbstore2001 - https://phabricator.wikimedia.org/T159707
[11:28:40] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 032] Promote labsdb1007 to osm::master. [puppet] - 10https://gerrit.wikimedia.org/r/341991 (https://phabricator.wikimedia.org/T157359) (owner: 10Alexandros Kosiaris)
[11:29:50] <akosiaris>	 jynus: merging yours (5897b00) as well
[11:30:00] <jynus>	 yes, thanks
[11:30:22] <jynus>	 I thought I had already done that
[11:30:41] <jynus>	 ah, I know, because it is a beta-only change, it was tested there
[11:30:51] <jynus>	 but it is a noop for production, so I forgot
[11:33:38] <wikibugs_>	 (03PS1) 10Andrew-WMDE: Don't show rdf2latex table hint with ElectronPdfService enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341992 (https://phabricator.wikimedia.org/T157432)
[11:36:39] <wikibugs_>	 (03CR) 10WMDE-Fisch: [C: 031] Don't show rdf2latex table hint with ElectronPdfService enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341992 (https://phabricator.wikimedia.org/T157432) (owner: 10Andrew-WMDE)
[11:36:48] <wikibugs_>	 (03CR) 10Addshore: [C: 031] Don't show rdf2latex table hint with ElectronPdfService enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341992 (https://phabricator.wikimedia.org/T157432) (owner: 10Andrew-WMDE)
[11:45:26] <icinga-wm>	 PROBLEM - puppet last run on mw1300 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:49:37] <wikibugs_>	 (03PS1) 10Urbanecm: Add HD logos for several projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341993 (https://phabricator.wikimedia.org/T150618)
[11:52:34] <wikibugs_>	 06Operations, 06Performance-Team, 10Thumbor: Thumbor resource consumption is spiky - https://phabricator.wikimedia.org/T151851#3087379 (10Gilles) 05Open>03Resolved I'm closing this, as the load spikes have considerably lowered and now look reasonable compared to the amount of cores. Memory consumption ha...
[11:56:26] <icinga-wm>	 PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:59:47] <wikibugs_>	 06Operations, 10Icinga: Icinga check for sysctl settings - https://phabricator.wikimedia.org/T160060#3087392 (10MoritzMuehlenhoff)
[12:14:26] <icinga-wm>	 RECOVERY - puppet last run on mw1300 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[12:23:51] <jynus>	 !log purging old rc rows from non-production database replicas
[12:23:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:26] <icinga-wm>	 RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[12:34:39] <moritzm>	 !log rebooting multatuli to Linux 4.9
[12:34:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:43:39] <wikibugs_>	 06Operations, 13Patch-For-Review: Package the next LTS kernel (4.9) - https://phabricator.wikimedia.org/T154934#3087501 (10MoritzMuehlenhoff) Linux 4.9.13 is now available in jessie-wikimedia/experimental along with updated firmware-nonfree. I have extended linux-meta with a new meta package linux-meta-4.9 whi...
[12:43:51] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Repool db1051 with normal weight after warmup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341999 (https://phabricator.wikimedia.org/T159319)
[12:45:09] <wikibugs_>	 (03CR) 10Jcrespo: [C: 04-1] "We may want to wait a bit for the server cache to stabilize: https://grafana.wikimedia.org/dashboard/db/mysql?panelId=13&fullscreen&var-dc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341999 (https://phabricator.wikimedia.org/T159319) (owner: 10Jcrespo)
[12:45:35] <wikibugs_>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342001
[12:45:52] <wikibugs_>	 (03PS2) 10Tarrow: remove elasticsearch plugin_dir setting [puppet] - 10https://gerrit.wikimedia.org/r/341831
[12:50:20] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342001 (owner: 10Marostegui)
[12:52:12] <wikibugs_>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342001 (owner: 10Marostegui)
[12:52:27] <wikibugs_>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342001 (owner: 10Marostegui)
[13:05:40] <wikibugs_>	 06Operations, 06Performance-Team, 10Thumbor, 13Patch-For-Review: Implement DC-local cache failure limiter in Thumbor - https://phabricator.wikimedia.org/T151065#3087549 (10Gilles)
[13:07:56] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1088 - T159414 (duration: 00m 43s)
[13:08:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:03] <stashbot>	 T159414: Rampant differences in indexes and PK on s6 (frwiki, jawiki, ruwiki) for revision table - https://phabricator.wikimedia.org/T159414
[13:09:41] <wikibugs_>	 06Operations, 06Performance-Team, 10Thumbor: Thumbor original file download limit should be 4GB - https://phabricator.wikimedia.org/T151456#3087555 (10Gilles) I'm kind of ambivalent about that now. We could raise the limit, but that would make Thumbor potentially consume a lot more disk when things go wrong....
[13:13:04] <wikibugs_>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1085 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342005 (https://phabricator.wikimedia.org/T159414)
[13:13:19] <wikibugs_>	 06Operations, 06Performance-Team, 10Thumbor: Add request URL to thumbor errors - https://phabricator.wikimedia.org/T151553#3087558 (10Gilles) p:05High>03Low
[13:14:57] <wikibugs_>	 06Operations, 06Performance-Team, 10Thumbor, 15User-Joe: Thumbor instances exit with exit code 0 even when crashing/failing - https://phabricator.wikimedia.org/T149560#3087562 (10Gilles) 05Open>03Resolved I'm going to close this as I think it's not actionable. Thumbor is running behind firejail and in...
[13:16:01] <wikibugs_>	 (03PS1) 10Elukey: Remove Piwik backend probe from Varnish Misc backends [puppet] - 10https://gerrit.wikimedia.org/r/342007 (https://phabricator.wikimedia.org/T159136)
[13:17:18] <wikibugs_>	 (03PS2) 10Elukey: Remove Piwik backend probe from Varnish Misc backends [puppet] - 10https://gerrit.wikimedia.org/r/342007 (https://phabricator.wikimedia.org/T159136)
[13:18:29] <wikibugs_>	 (03PS1) 10Muehlenhoff: Harmomise group type for LDAP admin access [puppet] - 10https://gerrit.wikimedia.org/r/342008 (https://phabricator.wikimedia.org/T157131)
[13:18:31] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1085 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342005 (https://phabricator.wikimedia.org/T159414) (owner: 10Marostegui)
[13:20:21] <wikibugs_>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1085 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342005 (https://phabricator.wikimedia.org/T159414) (owner: 10Marostegui)
[13:20:32] <wikibugs_>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1085 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342005 (https://phabricator.wikimedia.org/T159414) (owner: 10Marostegui)
[13:21:21] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1085 - T159414 (duration: 00m 41s)
[13:21:25] <marostegui>	 !log Deploy alter table s6 revision table on db1085 - T159414
[13:21:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:21:28] <stashbot>	 T159414: Rampant differences in indexes and PK on s6 (frwiki, jawiki, ruwiki) for revision table - https://phabricator.wikimedia.org/T159414
[13:21:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:50] <wikibugs_>	 (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/5698/ looks good" [puppet] - 10https://gerrit.wikimedia.org/r/342007 (https://phabricator.wikimedia.org/T159136) (owner: 10Elukey)
[13:31:01] <doctaxon>	 argh, I just noticed that I had been able to do bot edits on test2wiki without an edit token all along, because since around lunchtime today (UTC) it's no longer possible ...
[13:33:57] <wikibugs_>	 (03CR) 10Elukey: cache_misc: set timeout_idle to 120s (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/341576 (https://phabricator.wikimedia.org/T159429) (owner: 10Ema)
[13:43:23] <gehel>	 !log invalidating Tasmania zoom level 10 tiles in varnish - T159631
[13:43:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:43:32] <stashbot>	 T159631: Tasmania is covered with water at z10+ - https://phabricator.wikimedia.org/T159631
[13:44:48] <Zppix>	 jouncebot:  next
[13:44:48] <jouncebot>	 In 0 hour(s) and 15 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170309T1400)
[13:46:11] <moritzm>	 !log removed cn=trebuchet group from LDAP directory (Bug: T129788)
[13:46:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:18] <stashbot>	 T129788: Review list of LDAP groups and document exactly what kind of access they can be allowed to provide - https://phabricator.wikimedia.org/T129788
[13:48:35] <wikibugs_>	 (03PS2) 10Jcrespo: mariadb: Repool db1051 with normal weight after warmup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341999 (https://phabricator.wikimedia.org/T159319)
[13:48:55] <marostegui>	 \o/
[13:49:06] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] "Thinking it better, db1055 has a worse cache hit ratio, so this can be pooled now." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341999 (https://phabricator.wikimedia.org/T159319) (owner: 10Jcrespo)
[13:51:27] <wikibugs_>	 (03Merged) 10jenkins-bot: mariadb: Repool db1051 with normal weight after warmup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341999 (https://phabricator.wikimedia.org/T159319) (owner: 10Jcrespo)
[13:51:39] <wikibugs_>	 (03CR) 10jenkins-bot: mariadb: Repool db1051 with normal weight after warmup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341999 (https://phabricator.wikimedia.org/T159319) (owner: 10Jcrespo)
[13:52:46] <moritzm>	 !log removed cn=svnadm group from LDAP directory (Bug: T129788)
[13:52:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:52:52] <stashbot>	 T129788: Review list of LDAP groups and document exactly what kind of access they can be allowed to provide - https://phabricator.wikimedia.org/T129788
[13:53:19] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1051 with normal weight after warmup (duration: 00m 40s)
[13:53:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:38] <zeljkof>	 jouncebot: next
[13:54:38] <jouncebot>	 In 0 hour(s) and 5 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170309T1400)
[13:55:53] <wikibugs_>	 (03PS2) 10Filippo Giunchedi: site: use spare::system on fluorine [puppet] - 10https://gerrit.wikimedia.org/r/341789 (https://phabricator.wikimedia.org/T123728)
[13:58:32] <wikibugs_>	 (03PS1) 10Filippo Giunchedi: role: add ipvs prometheus metrics for lvs nodes [puppet] - 10https://gerrit.wikimedia.org/r/342010
[13:58:56] <wikibugs_>	 (03CR) 10Filippo Giunchedi: [V: 032 C: 032] site: use spare::system on fluorine [puppet] - 10https://gerrit.wikimedia.org/r/341789 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi)
[14:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170309T1400). Please do the needful.
[14:00:04] <jouncebot>	 Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[14:00:14] <zeljkof>	 o/
[14:00:20] <addshore>	 o/
[14:00:43] <zeljkof>	 addshore: want to do swat? I can do it if there are no takers :)
[14:00:56] <addshore>	 zeljkof: I'll let you :) although I may have a patch to add!
[14:01:18] <zeljkof>	 addshore: if you have a patch... you do the swat! (that should be the rule) ;)
[14:01:28] * addshore hides his patch until later
[14:01:37] <zeljkof>	 in that case...
[14:01:43] <zeljkof>	 I can SWAT today!
[14:02:14] <zeljkof>	 Urbanecm: around for swat?
[14:02:26] <icinga-wm>	 PROBLEM - puppet last run on mw1244 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:03:11] <wikibugs_>	 (03PS3) 10Zfilipin: [throttle] Add new throttle rule+remove expired rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341812 (https://phabricator.wikimedia.org/T159957) (owner: 10Urbanecm)
[14:05:51] <zeljkof>	 addshore, hashar: looks like Urbanecm is not around, should I wait with his patches until he is around?
[14:06:21] <zeljkof>	 the patches are pretty simple, throttle and logos...
[14:06:30] <srdjan_m>	 you should
[14:06:36] <srdjan_m>	 for 341993 anyway
[14:07:26] <zeljkof>	 srdjan_m: I should wait? or should deploy?
[14:07:36] <srdjan_m>	 you should wait
[14:08:11] <wikibugs_>	 (03PS2) 10Filippo Giunchedi: role: add ipvs prometheus metrics for lvs at ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/342010
[14:08:19] <addshore>	 I have added mine to the calendar now!
[14:08:22] <zeljkof>	 in that case, since there are no other patches... we are done with swat until Urbanecm is back
[14:08:32] <zeljkof>	 addshore: want to deploy it yourself?
[14:08:35] <addshore>	 Will do!
[14:08:42] <wikibugs_>	 (03PS3) 10Zppix: role: add ipvs prometheus metrics for lvs at ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/342010 (owner: 10Filippo Giunchedi)
[14:09:33] <zeljkof>	 addshore: great
[14:09:57] <wikibugs_>	 (03CR) 10Addshore: [C: 032] Don't show rdf2latex table hint with ElectronPdfService enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341992 (https://phabricator.wikimedia.org/T157432) (owner: 10Andrew-WMDE)
[14:10:01] <addshore>	 {{doing}}
[14:11:55] <wikibugs_>	 (03Merged) 10jenkins-bot: Don't show rdf2latex table hint with ElectronPdfService enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341992 (https://phabricator.wikimedia.org/T157432) (owner: 10Andrew-WMDE)
[14:12:26] <icinga-wm>	 PROBLEM - puppet last run on mwdebug1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:12:38] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Decouple misc role classes into separate files [puppet] - 10https://gerrit.wikimedia.org/r/342014 (https://phabricator.wikimedia.org/T150850)
[14:13:26] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: Decouple misc role classes into separate files [puppet] - 10https://gerrit.wikimedia.org/r/342014 (https://phabricator.wikimedia.org/T150850) (owner: 10Jcrespo)
[14:14:26] <icinga-wm>	 RECOVERY - puppet last run on mwdebug1001 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[14:15:00] <addshore>	 syncing
[14:15:40] <logmsgbot>	 !log addshore@tin Synchronized wmf-config/CommonSettings.php: [[gerrit:341992|Don't show rdf2latex table hint with ElectronPdfService enabled]] T157432 (duration: 00m 49s)
[14:15:42] <Urbanecm>	 zeljkof, I'm here
[14:15:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:46] <stashbot>	 T157432: Change text "Rendering finished"  > suggest use of "single column" for PDFs with tables - https://phabricator.wikimedia.org/T157432
[14:15:55] <zeljkof>	 Urbanecm: ok
[14:16:09] <zeljkof>	 addshore: let me know when you are done, I will deploy Urbanecm's patches
[14:16:24] <addshore>	 zeljkof: all done here!
[14:16:34] <addshore>	 it's all yours!
[14:16:41] <zeljkof>	 ok, taking over
[14:17:24] <wikibugs_>	 (03CR) 10Filippo Giunchedi: [C: 032] mediawiki::logging: remove fluorine from firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/341940 (https://phabricator.wikimedia.org/T159996) (owner: 10Dzahn)
[14:17:59] <wikibugs_>	 (03CR) 10jenkins-bot: Don't show rdf2latex table hint with ElectronPdfService enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341992 (https://phabricator.wikimedia.org/T157432) (owner: 10Andrew-WMDE)
[14:18:11] <wikibugs_>	 (03PS4) 10Zfilipin: [throttle] Add new throttle rule+remove expired rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341812 (https://phabricator.wikimedia.org/T159957) (owner: 10Urbanecm)
[14:18:29] <zeljkof>	 Urbanecm: rebasing 341812, will +2 and deploy
[14:18:50] <wikibugs_>	 (03PS2) 10Jcrespo: mariadb: Decouple misc role classes into separate files [puppet] - 10https://gerrit.wikimedia.org/r/342014 (https://phabricator.wikimedia.org/T150850)
[14:19:16] <Urbanecm>	 zeljkof, ack
[14:19:26] <icinga-wm>	 PROBLEM - puppet last run on analytics1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:20:41] <wikibugs_>	 (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/5702/" [puppet] - 10https://gerrit.wikimedia.org/r/342010 (owner: 10Filippo Giunchedi)
[14:20:47] <wikibugs_>	 (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341812 (https://phabricator.wikimedia.org/T159957) (owner: 10Urbanecm)
[14:20:55] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Decouple misc role classes into separate files [puppet] - 10https://gerrit.wikimedia.org/r/342014 (https://phabricator.wikimedia.org/T150850) (owner: 10Jcrespo)
[14:20:59] <wikibugs_>	 (03PS1) 10Filippo Giunchedi: hieradata: remove access to fluorine [puppet] - 10https://gerrit.wikimedia.org/r/342017 (https://phabricator.wikimedia.org/T123728)
[14:22:34] <wikibugs_>	 (03Merged) 10jenkins-bot: [throttle] Add new throttle rule+remove expired rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341812 (https://phabricator.wikimedia.org/T159957) (owner: 10Urbanecm)
[14:22:36] <wikibugs_>	 (03PS3) 10Jcrespo: mariadb: Decouple misc role classes into separate files [puppet] - 10https://gerrit.wikimedia.org/r/342014 (https://phabricator.wikimedia.org/T150850)
[14:22:59] <wikibugs_>	 (03PS2) 10Zfilipin: Add HD logos for several projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341993 (https://phabricator.wikimedia.org/T150618) (owner: 10Urbanecm)
[14:23:46] <wikibugs_>	 (03CR) 10jenkins-bot: [throttle] Add new throttle rule+remove expired rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341812 (https://phabricator.wikimedia.org/T159957) (owner: 10Urbanecm)
[14:23:48] <wikibugs_>	 (03CR) 10Filippo Giunchedi: [C: 032] hieradata: remove access to fluorine [puppet] - 10https://gerrit.wikimedia.org/r/342017 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi)
[14:25:03] <wikibugs_>	 (03PS4) 10Jcrespo: mariadb: Decouple misc role classes into separate files [puppet] - 10https://gerrit.wikimedia.org/r/342014 (https://phabricator.wikimedia.org/T150850)
[14:25:24] <logmsgbot>	 !log zfilipin@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:341812|throttle] Add new throttle rule+remove expired rules (T159957)]] (duration: 00m 45s)
[14:25:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:31] <stashbot>	 T159957: Remove IP cap for account creation for MoMA NYC - Saturday March 11 - https://phabricator.wikimedia.org/T159957
[14:25:34] <zeljkof>	 Urbanecm: 341812 deployed
[14:25:49] <zeljkof>	 working on 341993
[14:25:51] <Urbanecm>	 zeljkof, ack
[14:26:12] <srdjan_m>	 srwiki-1.5x.png and srwiki-2x.png don't match srwiki.png, just fyi
[14:26:38] <wikibugs_>	 (03PS1) 10Alexandros Kosiaris: Temporarily set labsdb1007 hiera data [puppet] - 10https://gerrit.wikimedia.org/r/342018
[14:26:48] <Urbanecm>	 srdjan_m, I must have overlooked it, thank you for the notification
[14:27:04] <zeljkof>	 Urbanecm: will you amend the patch?
[14:27:15] <Urbanecm>	 zeljkof, I'll just delete them and solve them later. 
[14:27:26] <zeljkof>	 so, I can proceed with the deploy?
[14:27:29] <zeljkof>	 as is?
[14:27:39] <Urbanecm>	 zeljkof, no, please wait.
[14:27:46] <zeljkof>	 ok, waiting
[14:28:24] <wikibugs_>	 (03PS3) 10Urbanecm: Add HD logos for several projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341993 (https://phabricator.wikimedia.org/T150618)
[14:28:46] <Urbanecm>	 zeljkof, please deploy PS3
[14:30:23] <zeljkof>	 ok, reviewing
[14:31:24] <icinga-wm>	 RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[14:32:52] <wikibugs_>	 (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341993 (https://phabricator.wikimedia.org/T150618) (owner: 10Urbanecm)
[14:34:08] <wikibugs_>	 (03Merged) 10jenkins-bot: Add HD logos for several projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341993 (https://phabricator.wikimedia.org/T150618) (owner: 10Urbanecm)
[14:34:24] <wikibugs_>	 (03CR) 10jenkins-bot: Add HD logos for several projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341993 (https://phabricator.wikimedia.org/T150618) (owner: 10Urbanecm)
[14:35:44] <moritzm>	 !log removed cn=svn group from LDAP directory (Bug: T129788)
[14:35:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:50] <stashbot>	 T129788: Review list of LDAP groups and document exactly what kind of access they can be allowed to provide - https://phabricator.wikimedia.org/T129788
[14:36:33] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 032] Temporarily set labsdb1007 hiera data [puppet] - 10https://gerrit.wikimedia.org/r/342018 (owner: 10Alexandros Kosiaris)
[14:36:38] <wikibugs_>	 (03PS2) 10Alexandros Kosiaris: Temporarily set labsdb1007 hiera data [puppet] - 10https://gerrit.wikimedia.org/r/342018
[14:36:40] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Temporarily set labsdb1007 hiera data [puppet] - 10https://gerrit.wikimedia.org/r/342018 (owner: 10Alexandros Kosiaris)
[14:36:56] <jynus>	 I was about to do the same
[14:38:38] <jynus>	 I think the reason those were there was labsdb1006/7 themselves :-)
[14:38:52] <logmsgbot>	 !log zfilipin@tin Synchronized static/images/project-logos/: SWAT: [[gerrit:341993|Add HD logos for several projects (T150618)]] (duration: 00m 42s)
[14:38:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:58] <stashbot>	 T150618: Provide HD logos for all projects - https://phabricator.wikimedia.org/T150618
[14:39:44] <logmsgbot>	 !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:341993|Add HD logos for several projects (T150618)]] (duration: 00m 41s)
[14:39:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:05] <zeljkof>	 Urbanecm: 341993 deployed, please check the logos at wikis
[14:40:10] <Urbanecm>	 zeljkof, checking
[14:41:35] <Urbanecm>	 zeljkof, working
[14:41:58] <zeljkof>	 Urbanecm: all good?
[14:42:01] <wikibugs_>	 (03CR) 10Ema: [C: 031] role: add ipvs prometheus metrics for lvs at ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/342010 (owner: 10Filippo Giunchedi)
[14:42:02] <Urbanecm>	 Yep
[14:42:08] <zeljkof>	 great, in that case...
[14:42:14] <zeljkof>	 !log EU SWAT finished
[14:42:14] <wikibugs_>	 (03PS1) 10Alexandros Kosiaris: role::osm::common: Conditionalize tuning.conf inclusion [puppet] - 10https://gerrit.wikimedia.org/r/342019
[14:42:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:49] <wikibugs_>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342020
[14:42:52] <wikibugs_>	 (03PS4) 10Filippo Giunchedi: role: add ipvs prometheus metrics for lvs at ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/342010
[14:45:13] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 032] role::osm::common: Conditionalize tuning.conf inclusion [puppet] - 10https://gerrit.wikimedia.org/r/342019 (owner: 10Alexandros Kosiaris)
[14:45:17] <wikibugs_>	 (03PS2) 10Alexandros Kosiaris: role::osm::common: Conditionalize tuning.conf inclusion [puppet] - 10https://gerrit.wikimedia.org/r/342019
[14:45:20] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] role::osm::common: Conditionalize tuning.conf inclusion [puppet] - 10https://gerrit.wikimedia.org/r/342019 (owner: 10Alexandros Kosiaris)
[14:45:40] <wikibugs_>	 (03CR) 10Filippo Giunchedi: [C: 032] role: add ipvs prometheus metrics for lvs at ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/342010 (owner: 10Filippo Giunchedi)
[14:45:48] <wikibugs_>	 (03PS5) 10Filippo Giunchedi: role: add ipvs prometheus metrics for lvs at ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/342010
[14:46:07] <wikibugs_>	 (03CR) 10Filippo Giunchedi: [V: 032 C: 032] role: add ipvs prometheus metrics for lvs at ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/342010 (owner: 10Filippo Giunchedi)
[14:49:24] <icinga-wm>	 RECOVERY - puppet last run on analytics1029 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[14:55:05] <wikibugs_>	 06Operations, 06Analytics-Kanban, 10ChangeProp, 10Reading-Web-Trending-Service, 06Services (watching): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3087865 (10Ottomata) I think we are ready to proceed with this when yall are.  Should we schedule a day next wee...
[14:55:21] <wikibugs_>	 (03PS1) 10Alexandros Kosiaris: osm::planet_import: conditionalize load of 900913.sql [puppet] - 10https://gerrit.wikimedia.org/r/342022
[14:55:26] <Reedy>	 jouncebot: now
[14:55:26] <jouncebot>	 For the next 0 hour(s) and 4 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170309T1400)
[14:57:41] <wikibugs_>	 (03CR) 10Filippo Giunchedi: "Parsing /proc/net/ip_vs_stats works:" [puppet] - 10https://gerrit.wikimedia.org/r/342010 (owner: 10Filippo Giunchedi)
[14:57:43] <wikibugs_>	 (03CR) 10Gehel: [C: 04-1] remove elasticsearch plugin_dir setting (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/341831 (owner: 10Tarrow)
[14:57:47] <wikibugs_>	 (03PS3) 10Gehel: remove elasticsearch plugin_dir setting [puppet] - 10https://gerrit.wikimedia.org/r/341831 (owner: 10Tarrow)
[14:57:58] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 032] osm::planet_import: conditionalize load of 900913.sql [puppet] - 10https://gerrit.wikimedia.org/r/342022 (owner: 10Alexandros Kosiaris)
[14:58:03] <wikibugs_>	 (03PS2) 10Alexandros Kosiaris: osm::planet_import: conditionalize load of 900913.sql [puppet] - 10https://gerrit.wikimedia.org/r/342022
[14:58:07] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] osm::planet_import: conditionalize load of 900913.sql [puppet] - 10https://gerrit.wikimedia.org/r/342022 (owner: 10Alexandros Kosiaris)
[14:59:06] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] remove elasticsearch plugin_dir setting [puppet] - 10https://gerrit.wikimedia.org/r/341831 (owner: 10Tarrow)
[15:01:21] <wikibugs_>	 (03CR) 10Gehel: [C: 04-1] remove elasticsearch plugin_dir setting (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/341831 (owner: 10Tarrow)
[15:02:34] <moritzm>	 !log installing nettle security updates
[15:02:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:26] <tarrow>	 gehel: I'm not quite sure I explained clearly what is needed in https://gerrit.wikimedia.org/r/341831. Having $plugins_dir set in the base ::elasticsearch means that it fails unless you have made the link like in "common" and "logstash"
[15:07:36] <gehel>	 tarrow: do you have the exact failure message?
[15:07:39] <logmsgbot>	 !log reedy@tin Synchronized php-1.29.0-wmf.15/extensions/ConfirmEdit: Fixup maintenance script (duration: 00m 43s)
[15:07:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:59] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Decouple misc role classes into separate files [puppet] - 10https://gerrit.wikimedia.org/r/342014 (https://phabricator.wikimedia.org/T150850) (owner: 10Jcrespo)
[15:08:04] <wikibugs_>	 (03PS5) 10Jcrespo: mariadb: Decouple misc role classes into separate files [puppet] - 10https://gerrit.wikimedia.org/r/342014 (https://phabricator.wikimedia.org/T150850)
[15:08:36] <wikibugs_>	 (03PS3) 10Elukey: Remove Piwik backend probe from Varnish Misc backends [puppet] - 10https://gerrit.wikimedia.org/r/342007 (https://phabricator.wikimedia.org/T154558)
[15:08:38] <tarrow>	 well; in the ES log you get "java.lang.IllegalStateException: Unable to access 'path.plugins' (/srv/deployment/elasticsearch/plugins)" because we've never made it
[15:09:24] <gehel>	 Ok, so in your case, you probably want to set it to /usr/share/elasticsearch/plugins
[15:09:28] <wikibugs_>	 06Operations, 10Analytics-Cluster, 06Analytics-Kanban, 13Patch-For-Review, 15User-Elukey: Reimage a Trusty Hadoop worker to Debian jessie - https://phabricator.wikimedia.org/T159530#3070663 (10elukey) Just completed https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Administration#Worker_Nodes_...
[15:10:08] <tarrow>	 gehel: yep, that also works. I assumed the right thing to do was to set it to /usr/share/elasticsearch/plugins in ::elasticsearch (by removing it)
[15:10:17] <wikibugs_>	 (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342020
[15:10:33] <gehel>	 tarrow: yeah, that would be nicer, but it will break our production cluster...
[15:10:48] <tarrow>	 and then set it to "/srv/deployment/elasticsearch/plugins" in the production roles
[15:11:14] <gehel>	 there are some assumptions in that module that only hold true for our specific setup
[15:11:38] <tarrow>	 which is what I thought my second patch did; but I obviously couldn't test 
[15:12:06] <gehel>	 Oh, I see what you mean. That would work as well, but then you need to keep the param, not remove it. And change the default value. I'll send a patch
[15:13:46] <tarrow>	 ah, sure. I didn't realise removing it did anything other than set it to the default value. It seemed to work that you could still override it fine
[15:13:49] <tarrow>	 thanks!
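The approach gehel describes above (keep the parameter, change its default, override it in the production roles) can be sketched roughly as below. This is a Python analogy only, not the Puppet code from the actual change; the function names `elasticsearch_config`, `production_role` and `other_role` are hypothetical, and only the two paths and the `path.plugins` key come from the conversation.

```python
# Python analogy of "keep the param, change the default value".
# The real change is Puppet code; all function names here are made up.

PACKAGE_DEFAULT = "/usr/share/elasticsearch/plugins"          # path quoted by gehel
PRODUCTION_PLUGINS = "/srv/deployment/elasticsearch/plugins"  # path from the ES error


def elasticsearch_config(plugins_dir: str = PACKAGE_DEFAULT) -> dict:
    """Stand-in for the base class: the parameter stays, only its default moves."""
    return {"path.plugins": plugins_dir}


def production_role() -> dict:
    """Stand-in for a production role that overrides the default explicitly."""
    return elasticsearch_config(plugins_dir=PRODUCTION_PLUGINS)


def other_role() -> dict:
    """Stand-in for any other consumer that just takes the default."""
    return elasticsearch_config()


if __name__ == "__main__":
    print(production_role())
    print(other_role())
```

Because the parameter still exists, production keeps its current value (consistent with the later "noop on our current cluster" review comment), while other users get the package's plugin directory without creating any symlink.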
[15:14:27] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342020 (owner: 10Marostegui)
[15:14:41] <wikibugs_>	 06Operations, 06Analytics-Kanban, 10ChangeProp, 10Reading-Web-Trending-Service, 06Services (watching): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3087960 (10mobrovac) Wednesday 15th?
[15:16:12] <wikibugs_>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342020 (owner: 10Marostegui)
[15:17:07] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1085 - T159414 (duration: 00m 41s)
[15:17:11] <wikibugs_>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342020 (owner: 10Marostegui)
[15:17:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:17:14] <stashbot>	 T159414: Rampant differences in indexes and PK on s6 (frwiki, jawiki, ruwiki) for revision table - https://phabricator.wikimedia.org/T159414
[15:17:37] <wikibugs_>	 (03PS4) 10Gehel: elasticsearch - use default plugins directory in the elasticsearch class [puppet] - 10https://gerrit.wikimedia.org/r/341831 (owner: 10Tarrow)
[15:18:42] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] elasticsearch - use default plugins directory in the elasticsearch class [puppet] - 10https://gerrit.wikimedia.org/r/341831 (owner: 10Tarrow)
[15:20:06] <wikibugs_>	 (03PS5) 10Gehel: elasticsearch - use default plugins directory in the elasticsearch class [puppet] - 10https://gerrit.wikimedia.org/r/341831 (owner: 10Tarrow)
[15:20:52] <wikibugs_>	 (03CR) 10Gehel: [C: 031] "This is a noop on our current cluster (as it should): https://puppet-compiler.wmflabs.org/5705/" [puppet] - 10https://gerrit.wikimedia.org/r/341831 (owner: 10Tarrow)
[15:25:41] <wikibugs_>	 (03PS6) 10Gehel: elasticsearch - use default plugins directory in the elasticsearch class [puppet] - 10https://gerrit.wikimedia.org/r/341831 (owner: 10Tarrow)
[15:27:01] <wikibugs_>	 (03CR) 10Gehel: [C: 032] elasticsearch - use default plugins directory in the elasticsearch class [puppet] - 10https://gerrit.wikimedia.org/r/341831 (owner: 10Tarrow)
[15:27:34] <gehel>	 tarrow: ^ you should be good (let me know otherwise)
[15:27:34] <tarrow>	 gehel: thanks! that's awesome! :)
[15:27:40] <tarrow>	 I'll just have a test
[15:35:30] <wikibugs_>	 (03PS1) 10Ema: lvs: load ip_vs before systemd-sysctl.service starts [puppet] - 10https://gerrit.wikimedia.org/r/342026
[15:36:53] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 04-1] "LGTM but small error" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342026 (owner: 10Ema)
[15:37:06] <_joe_>	 ema: good idea to preload the module
[15:37:23] <_joe_>	 but you stumbled upon one of puppet's delicacies I think
[15:37:54] <ema>	 ha!
[15:37:58] <ema>	 thanks _joe_ 
[15:38:02] <wikibugs_>	 (03CR) 10Mobrovac: [C: 031] PDFRender: Delay service shut-down to work around xpra race (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/341833 (https://phabricator.wikimedia.org/T159922) (owner: 10GWicke)
[15:38:16] <_joe_>	 ema: puppet apply -e "notice('ciao\n')" vs puppet apply -e 'notice("ciao\n")'
[15:38:36] <ema>	 and of course no linter will complain that there's no variable in that double quoted string
[15:39:40] <_joe_>	 lol tell me that's really happening :P
[15:39:48] <wikibugs_>	 (03PS2) 10Ema: lvs: load ip_vs before systemd-sysctl.service starts [puppet] - 10https://gerrit.wikimedia.org/r/342026
[15:39:54] <ema>	 _joe_: let's see!
[15:40:34] <ema>	 _joe_: nope, puppet-lint is happy
[15:40:44] <_joe_>	 it's not /that/ stupid
[15:41:25] <ema>	 low expectations, key to happiness
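The quoting exchange above turns on the fact that Puppet only interprets `\n` inside double quotes, so a lint rule about "no variable in a double quoted string" has to leave such strings alone. A minimal sketch of that kind of rule, assuming a simplified set of escapes; this is not puppet-lint's actual implementation, and `needs_double_quotes` and `lint` are hypothetical names:

```python
import re
from typing import List

# Hypothetical rule: a double-quoted literal is justified only if it
# interpolates a variable or uses a backslash escape such as \n.
INTERPOLATION = re.compile(r"\$\{?\w")
ESCAPE = re.compile(r"\\[nrtsu$\\]")  # simplified escape list (assumption)


def needs_double_quotes(body: str) -> bool:
    """True if the string body relies on double-quote-only behaviour."""
    return bool(INTERPOLATION.search(body) or ESCAPE.search(body))


def lint(literal: str) -> List[str]:
    """Return warnings for one quoted literal, given with its quotes."""
    if len(literal) >= 2 and literal[0] == literal[-1] == '"':
        if not needs_double_quotes(literal[1:-1]):
            return ["double quoted string containing no variables"]
    return []


if __name__ == "__main__":
    print(lint(r'"ciao\n"'))  # escape present, so no warning expected
    print(lint('"ciao"'))     # nothing needs double quotes, so a warning
    print(lint(r"'ciao\n'"))  # single-quoted literals are out of scope here
```

Under this sketch `"ciao\n"` passes because of the escape, which matches ema's report that the linter did not complain.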
[15:41:34] <icinga-wm>	 PROBLEM - puppet last run on elastic1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:42:37] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 031] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/342026 (owner: 10Ema)
[15:45:04] <icinga-wm>	 PROBLEM - puppet last run on lvs1001 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago
[15:45:34] <icinga-wm>	 RECOVERY - puppet last run on elastic1037 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[15:45:40] <wikibugs_>	 (03PS2) 10Muehlenhoff: Remove Aaron from deployment group [puppet] - 10https://gerrit.wikimedia.org/r/340101
[15:48:38] <wikibugs_>	 (03PS3) 10Ema: lvs: load ip_vs before systemd-sysctl.service starts [puppet] - 10https://gerrit.wikimedia.org/r/342026
[15:48:48] <wikibugs_>	 (03CR) 10Ema: [V: 032 C: 032] lvs: load ip_vs before systemd-sysctl.service starts [puppet] - 10https://gerrit.wikimedia.org/r/342026 (owner: 10Ema)
[15:50:08] <wikibugs_>	 06Operations, 10ops-codfw: codfw: oresrdb2002 rack/setup - https://phabricator.wikimedia.org/T160082#3088068 (10Papaul)
[15:50:14] <wikibugs_>	 (03PS3) 10Muehlenhoff: Remove Aaron from deployment group [puppet] - 10https://gerrit.wikimedia.org/r/340101
[15:53:28] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 032] Remove Aaron from deployment group [puppet] - 10https://gerrit.wikimedia.org/r/340101 (owner: 10Muehlenhoff)
[15:57:16] <wikibugs_>	 06Operations: Enhance account handling (meta bug) - https://phabricator.wikimedia.org/T142815#3088144 (10Ottomata)
[15:57:20] <wikibugs_>	 06Operations: Rethink/clarify/document use of 'analytics' vs. 'statistics' in group names - https://phabricator.wikimedia.org/T149225#3088142 (10Ottomata) 05Open>03declined I've added a sentence or two to help explain the difference here: https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Group...
[15:59:30] <wikibugs_>	 06Operations, 10ops-codfw, 10hardware-requests: Decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#3088150 (10Papaul) Disk wipe in progress
[15:59:46] <wikibugs_>	 (03CR) 10Ema: [C: 031] Remove Piwik backend probe from Varnish Misc backends [puppet] - 10https://gerrit.wikimedia.org/r/342007 (https://phabricator.wikimedia.org/T154558) (owner: 10Elukey)
[16:00:09] <wikibugs_>	 06Operations, 10ops-codfw: codfw: oresrdb2002 rack/setup - https://phabricator.wikimedia.org/T160082#3088151 (10Papaul) p:05Triage>03Normal
[16:02:37] <wikibugs_>	 (03PS4) 10Elukey: Remove Piwik backend probe from Varnish Misc backends [puppet] - 10https://gerrit.wikimedia.org/r/342007 (https://phabricator.wikimedia.org/T154558)
[16:05:28] <wikibugs_>	 06Operations: Enhance account handling (meta bug) - https://phabricator.wikimedia.org/T142815#3088160 (10Ottomata)
[16:05:31] <wikibugs_>	 06Operations: Reconsider/check naming of 'privatedata' shell groups compared to their theoretically non-sensitive counterparts - https://phabricator.wikimedia.org/T149222#3088158 (10Ottomata) 05Open>03declined The '*private*' user groups here grant access to stat1002.  Historically, stat1002 was used to host...
[16:07:45] <wikibugs_>	 06Operations, 06Analytics-Kanban, 10ChangeProp, 10Reading-Web-Trending-Service, 06Services (watching): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3088162 (10Ottomata) Ya.  @elukey should we do varnishes before or after this?  I can add librdkafka 0.9.4 to ou...
[16:09:10] <wikibugs_>	 (03CR) 10Elukey: [C: 032] Remove Piwik backend probe from Varnish Misc backends [puppet] - 10https://gerrit.wikimedia.org/r/342007 (https://phabricator.wikimedia.org/T154558) (owner: 10Elukey)
[16:10:25] <elukey>	 !log remove Piwik/bohrium health check from Varnish cache misc (https://gerrit.wikimedia.org/r/#/c/342007/)
[16:10:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:38] <wikibugs_>	 (03PS1) 10DCausse: [es5 upgrade] step 1: depool codfw for writes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342031 (https://phabricator.wikimedia.org/T157479)
[16:15:39] <wikibugs_>	 (03PS1) 10DCausse: [es5 upgrade] step 2: repool codfw and send wmf16 to codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342032 (https://phabricator.wikimedia.org/T157479)
[16:15:42] <wikibugs_>	 (03PS1) 10DCausse: [es5 upgrade] step 3: depool eqiad for writes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342033 (https://phabricator.wikimedia.org/T157479)
[16:15:44] <wikibugs_>	 (03PS1) 10DCausse: [es5 upgrade] step 4: repool eqiad and restore normal operations [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342034 (https://phabricator.wikimedia.org/T157479)
[16:22:39] <wikibugs_>	 (03PS4) 10Gehel: deployment-prep: Use elasticsearch 5.x [puppet] - 10https://gerrit.wikimedia.org/r/341372 (owner: 10EBernhardson)
[16:27:07] <wikibugs_>	 (03CR) 10Gehel: [C: 032] deployment-prep: Use elasticsearch 5.x [puppet] - 10https://gerrit.wikimedia.org/r/341372 (owner: 10EBernhardson)
[16:27:44] <wikibugs_>	 (03PS1) 10Papaul: DNS: Add mgmt dns for oresrdb2002 [dns] - 10https://gerrit.wikimedia.org/r/342036
[16:32:54] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: x1 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1032, Errmsg: Could not execute Update_rows_v1 event on table heartbeat.heartbeat: Cant find record in heartbeat, Error_code: 1032: handler error HA_ERR_KEY_NOT_FOUND: the events master log db1031-bin.002061, end_log_pos 445907880
[16:33:55] <wikibugs_>	 (03CR) 10Madhuvishy: [C: 032] Remove non-existing group from jupyterhub LDAP config [puppet] - 10https://gerrit.wikimedia.org/r/341336 (https://phabricator.wikimedia.org/T129788) (owner: 10Muehlenhoff)
[16:34:02] <wikibugs_>	 (03PS2) 10Madhuvishy: Remove non-existing group from jupyterhub LDAP config [puppet] - 10https://gerrit.wikimedia.org/r/341336 (https://phabricator.wikimedia.org/T129788) (owner: 10Muehlenhoff)
[16:34:51] <wikibugs_>	 (03CR) 10Madhuvishy: [V: 032 C: 032] Remove non-existing group from jupyterhub LDAP config [puppet] - 10https://gerrit.wikimedia.org/r/341336 (https://phabricator.wikimedia.org/T129788) (owner: 10Muehlenhoff)
[16:36:55] <jynus>	 ^manuel and me are on the alert
[16:37:45] <wikibugs_>	 06Operations, 10ops-codfw, 13Patch-For-Review: codfw: oresrdb2002 rack/setup - https://phabricator.wikimedia.org/T160082#3088249 (10Papaul)
[16:44:35] <wikibugs_>	 06Operations, 10ops-codfw, 13Patch-For-Review: codfw: oresrdb2002 rack/setup - https://phabricator.wikimedia.org/T160082#3088255 (10fgiunchedi) Row C works (oresrdb2001 is row B)  partman scheme: `raid1-lvm-ext4-srv.cfg` (same as oresrdb1001)
[16:44:41] <godog>	 papaul: ^
[16:51:16] <wikibugs_>	 (03PS1) 10EBernhardson: [cirrus] Config update for elasticsearch 5.x in beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342040
[16:54:17] <wikibugs_>	 (03PS2) 10RobH: DNS: Add mgmt dns for oresrdb2002 [dns] - 10https://gerrit.wikimedia.org/r/342036 (owner: 10Papaul)
[16:58:50] <logmsgbot>	 !log filippo@puppetmaster1001 conftool action : set/pooled=no; selector: name=ms-fe1001.eqiad.wmnet
[16:58:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:08] <wikibugs_>	 (03PS1) 10MarcoAurelio: Allow 'autoreviewrestore' to be managed from Meta-Wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342042
[17:00:04] <jouncebot>	 godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170309T1700).
[17:00:55] <wikibugs_>	 06Operations, 10ops-codfw, 13Patch-For-Review: codfw: oresrdb2002 rack/setup - https://phabricator.wikimedia.org/T160082#3088284 (10Papaul)
[17:01:56] <godog>	 no puppet swat patches https://i.imgur.com/m5lwP.gif
[17:02:08] <bblack>	 !log reboot lvs1004 (post-incident cleanup reboot)
[17:02:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:53] <wikibugs_>	 06Operations, 10ops-codfw, 10netops: codfw: oresrdb2002 switch port configuration - https://phabricator.wikimedia.org/T160087#3088288 (10Papaul)
[17:03:21] <wikibugs_>	 06Operations, 10ops-codfw, 13Patch-For-Review: codfw: oresrdb2002 rack/setup - https://phabricator.wikimedia.org/T160082#3088304 (10Papaul)
[17:04:16] <papaul>	 godog: thanks 
[17:04:55] <wikibugs_>	 (03CR) 10DCausse: [C: 031] [cirrus] Config update for elasticsearch 5.x in beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342040 (owner: 10EBernhardson)
[17:05:44] <icinga-wm>	 PROBLEM - Host lvs1004 is DOWN: PING CRITICAL - Packet loss = 100%
[17:07:04] <icinga-wm>	 RECOVERY - Host lvs1004 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[17:08:54] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: x1 on dbstore2001 is OK: OK slave_sql_state not a slave
[17:09:12] <wikibugs_>	 06Operations, 13Patch-For-Review: replace fluorine with mwlog servers (was: Upgrade fluorine to trusty/jessie) - https://phabricator.wikimedia.org/T123728#3088327 (10fgiunchedi)
[17:10:04] <icinga-wm>	 RECOVERY - puppet last run on lvs1001 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[17:10:16] <wikibugs_>	 06Operations, 06Release-Engineering-Team, 05DC-Switchover-Prep-Q3-2016-17: Understand the preparedness of misc services for datacenter switchover - https://phabricator.wikimedia.org/T156937#3088335 (10fgiunchedi)
[17:10:18] <wikibugs_>	 06Operations, 05Goal: reduce amount of remaining Ubuntu 12.04 (precise) systems in production - https://phabricator.wikimedia.org/T123525#3088336 (10fgiunchedi)
[17:10:21] <wikibugs_>	 06Operations, 13Patch-For-Review: replace fluorine with mwlog servers (was: Upgrade fluorine to trusty/jessie) - https://phabricator.wikimedia.org/T123728#2243032 (10fgiunchedi) 05Open>03Resolved a:03fgiunchedi This is completed, I've left out the part about sending logs to both datacenters as out of sco...
[17:11:02] <bblack>	 !log reboot lvs1001 (post-incident cleanup reboot)
[17:11:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:12] <logmsgbot>	 !log filippo@puppetmaster1001 conftool action : set/pooled=no; selector: name=ms-fe1002.eqiad.wmnet
[17:11:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:21] <logmsgbot>	 !log filippo@puppetmaster1001 conftool action : set/pooled=no; selector: name=ms-fe1003.eqiad.wmnet
[17:11:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:13:14] <icinga-wm>	 PROBLEM - Host lvs1001 is DOWN: PING CRITICAL - Packet loss = 100%
[17:14:44] <icinga-wm>	 RECOVERY - Host lvs1001 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[17:15:50] <wikibugs_>	 (03PS1) 10Gehel: elasticsearch - make size of bulk executor configureable [puppet] - 10https://gerrit.wikimedia.org/r/342043
[17:16:49] <wikibugs_>	 (03CR) 10EBernhardson: [C: 031] elasticsearch - make size of bulk executor configureable [puppet] - 10https://gerrit.wikimedia.org/r/342043 (owner: 10Gehel)
[17:19:33] <wikibugs_>	 (03PS2) 10DCausse: Elastic 5.1.2 plugins [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/341826
[17:21:29] <wikibugs_>	 (03PS2) 10Gehel: elasticsearch - make size of bulk executor configureable [puppet] - 10https://gerrit.wikimedia.org/r/342043
[17:24:11] <wikibugs_>	 (03CR) 10Gehel: [V: 032 C: 032] elasticsearch - make size of bulk executor configureable [puppet] - 10https://gerrit.wikimedia.org/r/342043 (owner: 10Gehel)
[17:28:30] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Decouple tendril mariadb role to a separate file [puppet] - 10https://gerrit.wikimedia.org/r/342050 (https://phabricator.wikimedia.org/T150850)
[17:29:34] <wikibugs_>	 (03PS2) 10Jcrespo: mariadb: Decouple tendril mariadb role to a separate file [puppet] - 10https://gerrit.wikimedia.org/r/342050 (https://phabricator.wikimedia.org/T150850)
[17:31:17] <wikibugs_>	 (03CR) 10Hashar: [C: 04-1] "We had to enable StatCache or the performance just crawled down.  Ori made it very early you can see for the details T75706" [puppet] - 10https://gerrit.wikimedia.org/r/341916 (https://phabricator.wikimedia.org/T158176) (owner: 10Reedy)
[17:31:43] <wikibugs_>	 06Operations, 07HHVM, 13Patch-For-Review: Build / migrate to HHVM 3.18 - https://phabricator.wikimedia.org/T158176#3028990 (10hashar) Unless things have changed, we had to enable StatCache or the performance just crawled down.  Ori made it very early you can see for the details on T75706
[17:31:53] <wikibugs_>	 (03PS1) 10Gehel: elasticsearch - statsd plugin isn't used anymore [puppet] - 10https://gerrit.wikimedia.org/r/342052
[17:34:03] <wikibugs_>	 06Operations, 07HHVM, 13Patch-For-Review: Build / migrate to HHVM 3.18 - https://phabricator.wikimedia.org/T158176#3088422 (10Reedy) >>! In T158176#3088412, @hashar wrote: > Unless things have changed, we had to enable StatCache or the performance just crawled down.  Ori made it very early you can see for th...
[17:34:26] <wikibugs_>	 (03PS3) 10Jcrespo: mariadb: Decouple tendril mariadb role to a separate file [puppet] - 10https://gerrit.wikimedia.org/r/342050 (https://phabricator.wikimedia.org/T150850)
[17:34:28] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Decouple core (mediawiki) role on a separate file [puppet] - 10https://gerrit.wikimedia.org/r/342054 (https://phabricator.wikimedia.org/T150850)
[17:36:43] <wikibugs_>	 (03PS2) 10Gehel: elasticsearch - statsd plugin isn't used anymore [puppet] - 10https://gerrit.wikimedia.org/r/342052
[17:39:39] <wikibugs_>	 06Operations, 07HHVM, 13Patch-For-Review: Build / migrate to HHVM 3.18 - https://phabricator.wikimedia.org/T158176#3088436 (10hashar) The assertion failure definitely happens on the beta cluster. I haven't found a good way to reproduce.  Looking at the wfDebugLog I managed to find some URL that probably trig...
[17:40:22] <wikibugs_>	 06Operations, 10ops-codfw, 10hardware-requests, 13Patch-For-Review: Decomission ms-fe2001-4 - https://phabricator.wikimedia.org/T159413#3088437 (10Papaul) @Robh yes please do. Thanks
[17:40:25] <wikibugs_>	 (03PS18) 10Nuria: Changes to perf consumer of event logging events [puppet] - 10https://gerrit.wikimedia.org/r/337158 (https://phabricator.wikimedia.org/T156760)
[17:41:18] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Decouple tendril mariadb role to a separate file [puppet] - 10https://gerrit.wikimedia.org/r/342050 (https://phabricator.wikimedia.org/T150850) (owner: 10Jcrespo)
[17:41:49] <wikibugs_>	 (03PS3) 10Gehel: elasticsearch - statsd plugin isn't used anymore [puppet] - 10https://gerrit.wikimedia.org/r/342052
[17:43:11] <wikibugs_>	 06Operations, 06Labs: Remove linux kernel 3.16 from the jessie image on labs - https://phabricator.wikimedia.org/T159990#3088459 (10Andrew) I just built four different jessie instances, ran 'apt get update && apt-get upgrade' on them and rebooted.  All four came up, no problems.
[17:43:42] <wikibugs_>	 (03CR) 10Chad: "That one is actually being used" [puppet] - 10https://gerrit.wikimedia.org/r/341593 (owner: 10Chad)
[17:43:51] <wikibugs_>	 06Operations, 07HHVM, 13Patch-For-Review: Build / migrate to HHVM 3.18 - https://phabricator.wikimedia.org/T158176#3088461 (10hashar) The beta log spam stopped because StatCache has been disabled via cherry pick of https://gerrit.wikimedia.org/r/#/c/341916/ .  That hardly made any change to the instances CPU...
[17:45:32] <logmsgbot>	 !log filippo@puppetmaster1001 conftool action : set/pooled=no; selector: name=ms-fe1004.eqiad.wmnet
[17:45:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:51] <wikibugs_>	 06Operations, 07HHVM, 13Patch-For-Review, 07Upstream: Build / migrate to HHVM 3.18 - https://phabricator.wikimedia.org/T158176#3088468 (10hashar)
[17:47:30] <wikibugs_>	 06Operations, 06Labs: Remove linux kernel 3.16 from the jessie image on labs - https://phabricator.wikimedia.org/T159990#3088472 (10Paladox) Oh, i wonder why mine failed.
[17:50:01] <wikibugs_>	 (03PS2) 10Jcrespo: mariadb: Decouple core (mediawiki) role on a separate file [puppet] - 10https://gerrit.wikimedia.org/r/342054 (https://phabricator.wikimedia.org/T150850)
[17:50:03] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Decouple labsdb mariadb role (deprecated) to a separate file [puppet] - 10https://gerrit.wikimedia.org/r/342060 (https://phabricator.wikimedia.org/T150850)
[17:50:06] <wikibugs_>	 (03PS1) 10BryanDavis: Remove support for Precise [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/342061 (https://phabricator.wikimedia.org/T94792)
[17:50:37] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=wdqs1003.eqiad.wmnet
[17:50:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:50:52] <wikibugs_>	 (03CR) 10RobH: [C: 032] DNS: Add mgmt dns for oresrdb2002 [dns] - 10https://gerrit.wikimedia.org/r/342036 (owner: 10Papaul)
[17:51:38] <wikibugs_>	 (03PS2) 10BryanDavis: Remove support for Precise [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/342061 (https://phabricator.wikimedia.org/T94792)
[17:51:44] <icinga-wm>	 PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:52:42] <wikibugs_>	 (03PS3) 10BryanDavis: Remove support for Precise [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/342061 (https://phabricator.wikimedia.org/T94792)
[17:59:44] <wikibugs_>	 (03PS1) 10Muehlenhoff: Enable experimental on cp1008 [puppet] - 10https://gerrit.wikimedia.org/r/342063
[18:00:04] <jouncebot>	 gwicke, cscott, arlolra, subbu, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170309T1800).
[18:00:15] <halfak>	 Nothing for ORES today
[18:00:17] <subbu>	 no parsoid deploy today
[18:00:53] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Decouple mariadb wikitech role to a separate file [puppet] - 10https://gerrit.wikimedia.org/r/342064 (https://phabricator.wikimedia.org/T150850)
[18:05:59] <wikibugs_>	 (03CR) 10Muehlenhoff: [V: 032 C: 032] Enable experimental on cp1008 [puppet] - 10https://gerrit.wikimedia.org/r/342063 (owner: 10Muehlenhoff)
[18:06:30] <wikibugs_>	 (03PS4) 10BryanDavis: Remove support for Precise [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/342061 (https://phabricator.wikimedia.org/T94792)
[18:12:16] <wikibugs_>	 06Operations, 10ops-codfw, 13Patch-For-Review: codfw: oresrdb2002 rack/setup - https://phabricator.wikimedia.org/T160082#3088551 (10RobH)
[18:12:19] <wikibugs_>	 06Operations, 10ops-codfw, 10netops: codfw: oresrdb2002 switch port configuration - https://phabricator.wikimedia.org/T160087#3088549 (10RobH) 05Open>03Resolved switch port updated and committed, resolving task  ```  robh@asw-c-codfw# show | compare  [edit interfaces interface-range vlan-private1-c-codfw...
[18:19:44] <icinga-wm>	 RECOVERY - puppet last run on db1024 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[18:21:12] <moritzm>	 !log rebooting cp1008 for upgrade to Linux 4.9
[18:21:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:39] <wikibugs_>	 06Operations, 06DC-Ops: audit spare disk levels for codfw & eqiad utlized storage in servers - https://phabricator.wikimedia.org/T160097#3088618 (10RobH)
[18:22:42] <wikibugs_>	 06Operations, 06DC-Ops: audit spare disk levels for codfw & eqiad against shelf spares - https://phabricator.wikimedia.org/T160097#3088631 (10RobH)
[18:22:43] <wikibugs_>	 06Operations, 06DC-Ops: audit spare disk levels for codfw & eqiad utlized storage in servers - https://phabricator.wikimedia.org/T160097#3088618 (10RobH)
[18:23:02] <robh>	 too many open tabs im reverting my own task edits =P
[18:31:23] <wikibugs_>	 (03CR) 10Filippo Giunchedi: "LGTM on the idea, comments on the implementation" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/341791 (https://phabricator.wikimedia.org/T159352) (owner: 10Gilles)
[18:34:34] <icinga-wm>	 PROBLEM - puppet last run on ganeti1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:36:05] <wikibugs_>	 06Operations, 10media-storage: Sanity check global-multiwrite logs for ConfirmEdit usage - https://phabricator.wikimedia.org/T159830#3088694 (10fgiunchedi) I took a quick look at the swift logs on lithium, all DELETEs seem to be successful (i.e. HTTP 200s) with the exception of some for which swift replied 404...
[18:44:28] <Reedy>	 jouncebot: next
[18:44:28] <jouncebot>	 In 0 hour(s) and 15 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170309T1900)
[19:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170309T1900).
[19:01:02] <wikibugs_>	 (03PS1) 10BryanDavis: Full PEP8/Flake8 compliance [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/342069
[19:02:14] <wikibugs_>	 (03Abandoned) 10DCausse: [cirrus] Add $wgCirrusSearchElasticQuirks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/339409 (owner: 10DCausse)
[19:02:34] <icinga-wm>	 RECOVERY - puppet last run on ganeti1001 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[19:02:44] <wikibugs_>	 (03PS2) 10Dzahn: mediawiki::logging: remove fluorine from firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/341940 (https://phabricator.wikimedia.org/T159996)
[19:04:25] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] "fluorine is a "role spare" now and about to be decom'ed." [puppet] - 10https://gerrit.wikimedia.org/r/341940 (https://phabricator.wikimedia.org/T159996) (owner: 10Dzahn)
[19:06:40] <wikibugs_>	 (03CR) 10Yuvipanda: [C: 031] "This should work." [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/342061 (https://phabricator.wikimedia.org/T94792) (owner: 10BryanDavis)
[19:07:34] <icinga-wm>	 PROBLEM - puppet last run on prometheus1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:14:46] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 032] mariadb: Decouple mariadb wikitech role to a separate file [puppet] - 10https://gerrit.wikimedia.org/r/342064 (https://phabricator.wikimedia.org/T150850) (owner: 10Jcrespo)
[19:18:47] <wikibugs_>	 (03PS3) 10Dzahn: mediawiki::logging: remove fluorine from firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/341940 (https://phabricator.wikimedia.org/T159996)
[19:19:15] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 031] Harmomise group type for LDAP admin access [puppet] - 10https://gerrit.wikimedia.org/r/342008 (https://phabricator.wikimedia.org/T157131) (owner: 10Muehlenhoff)
[19:19:45] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 032] labstore: fix typo in snapshot-manager [puppet] - 10https://gerrit.wikimedia.org/r/341427 (owner: 10Hashar)
[19:19:49] <wikibugs_>	 (03PS2) 10Andrew Bogott: labstore: fix typo in snapshot-manager [puppet] - 10https://gerrit.wikimedia.org/r/341427 (owner: 10Hashar)
[19:20:23] <wikibugs_>	 06Operations, 10hardware-requests, 13Patch-For-Review: decom fluorine - https://phabricator.wikimedia.org/T159996#3088814 (10Dzahn) a:03Dzahn
[19:20:56] <wikibugs_>	 (03PS1) 10Filippo Giunchedi: Provision new ms-be machines in codfw [puppet] - 10https://gerrit.wikimedia.org/r/342074 (https://phabricator.wikimedia.org/T158337)
[19:21:42] <wikibugs_>	 (03CR) 10Dzahn: [V: 032 C: 032] mediawiki::logging: remove fluorine from firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/341940 (https://phabricator.wikimedia.org/T159996) (owner: 10Dzahn)
[19:22:18] <wikibugs_>	 (03CR) 10Dzahn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/341434 (owner: 10Dzahn)
[19:22:43] <legoktm>	 !log foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php linter
[19:22:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:22:49] <wikibugs_>	 (03PS1) 10Eevans: Optional Cassandra client encryption; Enabled on RESTBase Staging [puppet] - 10https://gerrit.wikimedia.org/r/342075 (https://phabricator.wikimedia.org/T111113)
[19:22:51] <wikibugs_>	 (03CR) 10Dzahn: "https://gerrit.wikimedia.org/r/#/c/341427/ has been merged (thx AndrewBogott) so now this should be Verified" [puppet] - 10https://gerrit.wikimedia.org/r/341434 (owner: 10Dzahn)
[19:24:10] <wikibugs_>	 (03CR) 10Dzahn: "on eventlog1001:" [puppet] - 10https://gerrit.wikimedia.org/r/341940 (https://phabricator.wikimedia.org/T159996) (owner: 10Dzahn)
[19:24:18] <wikibugs_>	 (03PS3) 10Andrew Bogott: labstore: fix typo in snapshot-manager [puppet] - 10https://gerrit.wikimedia.org/r/341427 (owner: 10Hashar)
[19:25:39] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] typos: add 'criticial' [puppet] - 10https://gerrit.wikimedia.org/r/341434 (owner: 10Dzahn)
[19:26:12] <wikibugs_>	 06Operations, 10ops-codfw, 10hardware-requests: decommission ms2001 & ms2002 - https://phabricator.wikimedia.org/T157991#3088824 (10RobH) 05Open>03Resolved port info is no longer on switches, resolving task
[19:26:47] <wikibugs_>	 06Operations, 10ops-codfw, 10hardware-requests, 13Patch-For-Review: Decomission ms-fe2001-4 - https://phabricator.wikimedia.org/T159413#3088826 (10RobH) a:05Papaul>03RobH
[19:29:18] <wikibugs_>	 06Operations, 10hardware-requests, 13Patch-For-Review: decom fluorine - https://phabricator.wikimedia.org/T159996#3088834 (10Dzahn)
[19:31:16] <wikibugs_>	 (03PS2) 10Eevans: Optional Cassandra client encryption; Enabled on RESTBase Staging [puppet] - 10https://gerrit.wikimedia.org/r/342075 (https://phabricator.wikimedia.org/T111113)
[19:31:18] <wikibugs_>	 06Operations, 10Packaging: Upgrade php5-json .deb to at least 1.3.8 - https://phabricator.wikimedia.org/T160101#3088838 (10Legoktm)
[19:33:34] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db1047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 453.41 seconds
[19:35:30] <wikibugs_>	 (03CR) 10Eevans: "PC output: http://puppet-compiler.wmflabs.org/5711/" [puppet] - 10https://gerrit.wikimedia.org/r/342075 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans)
[19:35:34] <icinga-wm>	 RECOVERY - puppet last run on prometheus1002 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[19:35:51] <wikibugs_>	 06Operations, 10hardware-requests, 13Patch-For-Review: decom fluorine - https://phabricator.wikimedia.org/T159996#3088868 (10Dzahn)
[19:36:21] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] remove fluorine from DHCP config [puppet] - 10https://gerrit.wikimedia.org/r/341939 (https://phabricator.wikimedia.org/T159996) (owner: 10Dzahn)
[19:36:31] <wikibugs_>	 (03PS2) 10Dzahn: remove fluorine from DHCP config [puppet] - 10https://gerrit.wikimedia.org/r/341939 (https://phabricator.wikimedia.org/T159996)
[19:43:10] <wikibugs_>	 (03CR) 10Dzahn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/341434 (owner: 10Dzahn)
[19:44:34] <icinga-wm>	 PROBLEM - puppet last run on wtp1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:47:37] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] typos: add 'criticial' [puppet] - 10https://gerrit.wikimedia.org/r/341434 (owner: 10Dzahn)
[19:47:42] <wikibugs_>	 (03PS3) 10Dzahn: typos: add 'criticial' [puppet] - 10https://gerrit.wikimedia.org/r/341434
[19:47:51] <wikibugs_>	 (03CR) 10Dzahn: [V: 032 C: 032] typos: add 'criticial' [puppet] - 10https://gerrit.wikimedia.org/r/341434 (owner: 10Dzahn)
[19:49:01] <wikibugs_>	 (03PS3) 10Dzahn: udp2log: fix lint warning [puppet] - 10https://gerrit.wikimedia.org/r/341964
[19:52:58] <wikibugs_>	 (03PS4) 10Dzahn: udp2log: remove "lint-ignore" that has been fixed [puppet] - 10https://gerrit.wikimedia.org/r/341964
[19:53:32] <logmsgbot>	 !log reedy@tin Synchronized php-1.29.0-wmf.15/extensions/ConfirmEdit: Fixup maintenance script (duration: 00m 43s)
[19:53:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:25] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] udp2log: remove "lint-ignore" that has been fixed [puppet] - 10https://gerrit.wikimedia.org/r/341964 (owner: 10Dzahn)
[19:55:51] <wikibugs_>	 (03PS2) 10Dzahn: Gerrit: Remove reviewer counts cron, nobody is using it [puppet] - 10https://gerrit.wikimedia.org/r/341593 (owner: 10Chad)
[19:56:34] <icinga-wm>	 PROBLEM - puppet last run on ms-be1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:58:22] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] Gerrit: Remove reviewer counts cron, nobody is using it [puppet] - 10https://gerrit.wikimedia.org/r/341593 (owner: 10Chad)
[20:00:04] <jouncebot>	 twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170309T2000). Please do the needful.
[20:02:57] <mutante>	 !log cobalt: remove crontab entry of user gerrit2 that created reviewer counts, gzip /var/www/reviewer-counts.json and moved to /root/ for backup (re: gerrit:341592) T54329
[20:03:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:03:04] <stashbot>	 T54329: Provide reviewer counts per Gerrit changeset in batch form - https://phabricator.wikimedia.org/T54329
[20:03:43] <wikibugs_>	 (03CR) 10Dzahn: "!log cobalt: remove crontab entry of user gerrit2 that created reviewer counts, gzip /var/www/reviewer-counts.json and moved to /root/ for" [puppet] - 10https://gerrit.wikimedia.org/r/341593 (owner: 10Chad)
[20:04:37] <wikibugs_>	 (03PS4) 10Dzahn: planet: get rid of $realm-case, use Hiera for domain name [puppet] - 10https://gerrit.wikimedia.org/r/341959
[20:06:29] <wikibugs_>	 (03PS1) 1020after4: all wikis to 1.29.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342078
[20:06:31] <wikibugs_>	 (03CR) 1020after4: [C: 032] all wikis to 1.29.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342078 (owner: 1020after4)
[20:08:43] <wikibugs_>	 (03Merged) 10jenkins-bot: all wikis to 1.29.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342078 (owner: 1020after4)
[20:08:56] <wikibugs_>	 (03CR) 10jenkins-bot: all wikis to 1.29.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342078 (owner: 1020after4)
[20:09:13] <logmsgbot>	 !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.29.0-wmf.15
[20:09:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:12:34] <icinga-wm>	 RECOVERY - puppet last run on wtp1007 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[20:18:32] <wikibugs_>	 (03CR) 10Krinkle: "No worries, I appreciate it." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338996 (https://phabricator.wikimedia.org/T158580) (owner: 10Jcrespo)
[20:22:47] <wikibugs_>	 (03PS1) 10Papaul: DNS:Add production dns for oresrdb2002 [dns] - 10https://gerrit.wikimedia.org/r/342081
[20:23:02] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] DNS:Add production dns for oresrdb2002 [dns] - 10https://gerrit.wikimedia.org/r/342081 (owner: 10Papaul)
[20:24:34] <icinga-wm>	 RECOVERY - puppet last run on ms-be1002 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[20:24:46] <wikibugs_>	 (03PS19) 10Nuria: Changes to perf consumer of event logging events [puppet] - 10https://gerrit.wikimedia.org/r/337158 (https://phabricator.wikimedia.org/T156760)
[20:25:23] <wikibugs_>	 (03Draft1) 10Paladox: Gerrit: Make sure any services under the gerrit2 user are stopped [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082
[20:25:25] <wikibugs_>	 (03PS2) 10Paladox: Gerrit: Make sure any services under the gerrit2 user are stopped [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082
[20:25:33] <paladox>	 RainbowSprinkles ^^ :)
[20:26:02] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Changes to perf consumer of event logging events [puppet] - 10https://gerrit.wikimedia.org/r/337158 (https://phabricator.wikimedia.org/T156760) (owner: 10Nuria)
[20:27:40] <wikibugs_>	 (03CR) 10Krinkle: [C: 04-1] "LGTM (assuming tests pass) - two minor nits left." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/337158 (https://phabricator.wikimedia.org/T156760) (owner: 10Nuria)
[20:28:52] <wikibugs_>	 (03PS3) 10Paladox: Gerrit: Make sure any services under the gerrit2 user are stopped [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082
[20:29:50] <wikibugs_>	 (03CR) 10Chad: Gerrit: Make sure any services under the gerrit2 user are stopped (031 comment) [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082 (owner: 10Paladox)
[20:30:37] <wikibugs_>	 (03CR) 10Paladox: Gerrit: Make sure any services under the gerrit2 user are stopped (031 comment) [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082 (owner: 10Paladox)
[20:32:27] <wikibugs_>	 (03CR) 10Paladox: "killall is dangerous as it fails lintian." [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082 (owner: 10Paladox)
[20:32:30] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/5712/" [puppet] - 10https://gerrit.wikimedia.org/r/341959 (owner: 10Dzahn)
[20:36:18] <wikibugs_>	 (03PS20) 10Nuria: Changes to perf consumer of event logging events [puppet] - 10https://gerrit.wikimedia.org/r/337158 (https://phabricator.wikimedia.org/T156760)
[20:37:04] <wikibugs_>	 (03CR) 10Chad: Gerrit: Make sure any services under the gerrit2 user are stopped (031 comment) [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082 (owner: 10Paladox)
[20:38:56] <urandom>	 What would be the most elegant way to specify an associative array in Hiera, and then template that into a YAML file?  Do we have an example of such a thing in operations/puppet?
[20:43:44] <bblack>	 urandom: we have lots of associative arrays in our hieradata, all over the hieradata/ subdirectories in the ops/puppet repo
[20:44:11] <bblack>	 foo::bar:
[20:44:15] <wikibugs_>	 (03PS4) 10Paladox: Gerrit: Make sure any services under the gerrit2 user are stopped [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082
[20:44:16] <bblack>	   key1: value1
[20:44:18] <wikibugs_>	 (03CR) 10Krinkle: [C: 031] "'python -m unittest navtiming' passes." [puppet] - 10https://gerrit.wikimedia.org/r/337158 (https://phabricator.wikimedia.org/T156760) (owner: 10Nuria)
[20:44:19] <bblack>	   key2: value2
[20:44:20] <bblack>	 etc...
[20:45:00] <bblack>	 oh I misunderstood your question I think
[20:45:13] <bblack>	 you want the (yaml) hieradata to then be ERB-templated into a yaml output file on the host
[20:45:26] <bblack>	 I think that depends on whether it's just a single-depth array of simple keys or not
[20:45:34] <bblack>	 (if it is, just iterate it in ruby and output yaml-looking strings)
[20:45:40] <bblack>	 (if not... ?)
[20:45:48] <wikibugs_>	 (03PS3) 10Dzahn: racktables: get rid of $realm-case, use Hiera for host name [puppet] - 10https://gerrit.wikimedia.org/r/341960
[20:48:00] <wikibugs_>	 (03CR) 10Dzahn: "yea, i recommended killall first because it can kill by process name to avoid the scripting to find the UID, then lintian said that is dan" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082 (owner: 10Paladox)
[20:49:01] <urandom>	 bblack: yeah, it's single-depth (for now)
[20:49:15] <urandom>	 or single-depth would be Good Enough I think
[20:50:01] <wikibugs_>	 (03CR) 10Ottomata: [C: 031] "A follow up here would be to use Kafka instead of ZMQ.  We can do that after we get webperf eventlogging off of trebuchet and onto scap." [puppet] - 10https://gerrit.wikimedia.org/r/341724 (owner: 10Krinkle)
[20:50:10] <wikibugs_>	 (03CR) 10Dzahn: "generally about kill i always had this quote: "Generally, send 15, and wait a second or two, and if that doesn't work, send 2, and if that" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082 (owner: 10Paladox)
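The escalation in the quote above (send 15, wait a second or two, then send 2) can be sketched in Ruby roughly as follows. This is only an illustration under assumptions: the helper name `stop_gracefully`, the two-second grace period, and the `sleep 30` child process are all hypothetical, and the part of the quote that is cut off in the log is not reproduced.

```ruby
require 'timeout'

# Hypothetical helper: try each signal in order, giving the process
# `grace` seconds to exit before moving on to the next signal.
# Signal 15 is TERM and signal 2 is INT.
def stop_gracefully(pid, signals: %w[TERM INT], grace: 2)
  signals.each do |sig|
    Process.kill(sig, pid)
    begin
      # Process.wait only works for children of this process.
      Timeout.timeout(grace) { Process.wait(pid) }
      return sig # the process exited after this signal
    rescue Timeout::Error
      next # still running, escalate to the next signal
    end
  end
  nil # nothing worked; the caller decides what to do next
end

# Stand-in child process for the demo.
pid = Process.spawn('sleep', '30')
used = stop_gracefully(pid)
puts "process exited after SIG#{used}"
```

Because the helper relies on `Process.wait`, it only applies to child processes; stopping an unrelated service by UID or name, as discussed in the review, would need a different liveness check.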
[20:50:54] <wikibugs_>	 (03PS1) 10Papaul: Add MAC address and partman entries for oresrdb2002 [puppet] - 10https://gerrit.wikimedia.org/r/342086
[20:50:56] <urandom>	 bblack: i was hoping for something more elegant, but in this case, didn't have strong expectations :)
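A minimal Ruby sketch of the single-depth approach bblack describes: iterate the hash in an ERB template and emit YAML-looking lines. The `key1`/`key2` values come from the example above; the `settings` variable name and the sorted iteration are assumptions for illustration, not an existing template in operations/puppet. In a real Puppet template the hash would be reached through an instance variable rather than passed in with `result_with_hash`.

```ruby
require 'erb'
require 'yaml'

# Hypothetical stand-in for the hash Hiera would return for foo::bar.
settings = { 'key2' => 'value2', 'key1' => 'value1' }

# With trim_mode '-', a closing "-%>" swallows the newline after control
# tags, so only the "key: value" lines reach the output.
template = <<~TEMPLATE
  <% settings.sort.each do |key, value| -%>
  <%= key %>: <%= value %>
  <% end -%>
TEMPLATE

rendered = ERB.new(template, trim_mode: '-').result_with_hash(settings: settings)
puts rendered

# Round-trip check: the rendered text should parse back to the same hash.
parsed = YAML.safe_load(rendered)
```

Sorting the keys keeps the rendered file stable between runs. Plain interpolation does not quote values, so this only suits simple scalar values; for nested data, one option might be to call `to_yaml` on the hash instead of hand-writing the lines.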
[20:52:05] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/5713/" [puppet] - 10https://gerrit.wikimedia.org/r/341960 (owner: 10Dzahn)
[20:53:19] <wikibugs_>	 06Operations, 10ops-codfw, 13Patch-For-Review: codfw: oresrdb2002 rack/setup - https://phabricator.wikimedia.org/T160082#3089206 (10Papaul)
[20:54:18] <wikibugs_>	 06Operations, 10RESTBase, 10service-runner, 06Services (doing), 15User-mobrovac: enable restbase syslog/file logging - https://phabricator.wikimedia.org/T112648#3089212 (10mobrovac) Ok, after losing half of a day on this, I realised that using `/var/log` is not going to fly with firejail. It explicitly [...
[20:55:34] <wikibugs_>	 (03PS2) 10Dzahn: Add MAC address and partman entries for oresrdb2002 [puppet] - 10https://gerrit.wikimedia.org/r/342086 (owner: 10Papaul)
[20:58:03] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] Add MAC address and partman entries for oresrdb2002 [puppet] - 10https://gerrit.wikimedia.org/r/342086 (owner: 10Papaul)
[20:59:56] <wikibugs_>	 (03CR) 10Dzahn: [C: 04-1] "in "wmnet" the IP is not complete" (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/342081 (owner: 10Papaul)
[21:03:41] <wikibugs_>	 06Operations, 10Gerrit, 06Labs, 06Release-Engineering-Team, 07LDAP: Remove user gerrit2 from ldap - https://phabricator.wikimedia.org/T160122#3089287 (10Paladox)
[21:03:45] <wikibugs_>	 06Operations, 10Gerrit, 06Labs, 06Release-Engineering-Team, 07LDAP: Remove user gerrit2 from ldap - https://phabricator.wikimedia.org/T160122#3089299 (10Paladox) p:05Triage>03High
[21:04:44] <wikibugs_>	 (03PS2) 10Dzahn: maintenance: provision /etc/wgetrc [puppet] - 10https://gerrit.wikimedia.org/r/341264 (https://phabricator.wikimedia.org/T159661) (owner: 10Dereckson)
[21:12:18] <wikibugs_>	 06Operations, 07Documentation, 07LDAP, 13Patch-For-Review: Review list of LDAP groups and document exactly what kind of access they can be allowed to provide - https://phabricator.wikimedia.org/T129788#3089322 (10Dzahn) also see: T160122
[21:13:33] <wikibugs_>	 (03PS1) 10Eevans: WIP: TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088
[21:15:05] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] WIP: TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088 (owner: 10Eevans)
[21:16:31] <wikibugs_>	 06Operations, 10RESTBase, 10service-runner, 06Services (doing), 15User-mobrovac: enable restbase syslog/file logging - https://phabricator.wikimedia.org/T112648#3089340 (10GWicke) How about going back to syslog over UDP?
[21:16:45] <wikibugs_>	 (03CR) 10Dzahn: "could you make wgetrc a template instead of a file and use "<%= @site %>" instead of "eqiad" in there to make it flexible?" [puppet] - 10https://gerrit.wikimedia.org/r/341264 (https://phabricator.wikimedia.org/T159661) (owner: 10Dereckson)
[21:17:33] <wikibugs_>	 06Operations, 10Gerrit, 06Labs, 06Release-Engineering-Team, 07LDAP: Remove user gerrit2 from ldap - https://phabricator.wikimedia.org/T160122#3089347 (10Paladox)
[21:17:39] <wikibugs_>	 (03PS2) 10Eevans: WIP: TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088
[21:23:42] <wikibugs_>	 06Operations, 10hardware-requests, 13Patch-For-Review: decom fluorine - https://phabricator.wikimedia.org/T159996#3089366 (10Dzahn)
[21:29:37] <wikibugs_>	 (03PS1) 10Dzahn: site.pp: remove fluorine (decom) [puppet] - 10https://gerrit.wikimedia.org/r/342089 (https://phabricator.wikimedia.org/T159996)
[21:29:44] <wikibugs_>	 (03PS3) 10Eevans: WIP: TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088
[21:30:19] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/5716/" [puppet] - 10https://gerrit.wikimedia.org/r/341966 (owner: 10Dzahn)
[21:30:24] <wikibugs_>	 (03PS3) 10Dzahn: authdns: fix lint warning [puppet] - 10https://gerrit.wikimedia.org/r/341966
[21:33:31] <wikibugs_>	 (03CR) 10Mobrovac: WIP: TLS configuration for RESTBase (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/342088 (owner: 10Eevans)
[21:33:39] <wikibugs_>	 (03Draft1) 10Paladox: Gerrit: Ensure review_site is owned by gerrit2:gerrit [puppet] - 10https://gerrit.wikimedia.org/r/342091
[21:33:41] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] site.pp: remove fluorine (decom) [puppet] - 10https://gerrit.wikimedia.org/r/342089 (https://phabricator.wikimedia.org/T159996) (owner: 10Dzahn)
[21:33:49] <wikibugs_>	 (03PS2) 10Paladox: Gerrit: Ensure review_site is owned by gerrit2:gerrit [puppet] - 10https://gerrit.wikimedia.org/r/342091
[21:35:51] <mutante>	 !log fluorine - shutdown -h now (decom) T159996
[21:35:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:35:58] <stashbot>	 T159996: decom fluorine  - https://phabricator.wikimedia.org/T159996
[21:36:34] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db1047 is OK: OK slave_sql_lag Replication lag: 55.39 seconds
[21:37:45] <mutante>	 !log fluorine - puppet node clean, puppet node deactivate, salt-key -d, remove from Icinga..  (T159996)
[21:37:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:38:58] <wikibugs_>	 (03PS4) 10Dzahn: authdns: fix lint warning [puppet] - 10https://gerrit.wikimedia.org/r/341966
[21:39:38] <wikibugs_>	 06Operations, 10hardware-requests, 13Patch-For-Review: decom fluorine - https://phabricator.wikimedia.org/T159996#3089445 (10Dzahn)
[21:41:39] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] remove fluorine prod IP, keep mgmt [dns] - 10https://gerrit.wikimedia.org/r/341941 (https://phabricator.wikimedia.org/T159996) (owner: 10Dzahn)
[21:42:18] <wikibugs_>	 (03PS3) 10Dzahn: remove fluorine prod IP, keep mgmt [dns] - 10https://gerrit.wikimedia.org/r/341941 (https://phabricator.wikimedia.org/T159996)
[21:43:33] <wikibugs_>	 06Operations, 10hardware-requests, 13Patch-For-Review: decom fluorine - https://phabricator.wikimedia.org/T159996#3089473 (10Dzahn)
[21:44:07] <wikibugs_>	 (03PS4) 10Dzahn: remove fluorine prod IP, keep mgmt [dns] - 10https://gerrit.wikimedia.org/r/341941 (https://phabricator.wikimedia.org/T159996)
[21:46:17] <wikibugs_>	 06Operations, 05Goal: reduce amount of remaining Ubuntu 12.04 (precise) systems in production - https://phabricator.wikimedia.org/T123525#3089498 (10Dzahn)
[21:46:32] <wikibugs_>	 06Operations, 05Goal: reduce amount of remaining Ubuntu 12.04 (precise) systems in production - https://phabricator.wikimedia.org/T123525#2199946 (10Dzahn) fluorine has been shutdown: count:  3
[21:48:30] <wikibugs_>	 (03PS3) 10Paladox: Gerrit: Ensure /var/lib/gerrit2 is owned by gerrit2:gerrit [puppet] - 10https://gerrit.wikimedia.org/r/342091
[21:48:37] <wikibugs_>	 (03PS4) 10Paladox: Gerrit: Ensure /var/lib/gerrit2 is owned by gerrit2:gerrit [puppet] - 10https://gerrit.wikimedia.org/r/342091
[21:48:50] <paladox>	 mutante RainbowSprinkles ^^
[21:48:53] <paladox>	 :)
[21:48:53] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 032] WIP   Apt:  Remove an ensure->absent stanza [puppet] - 10https://gerrit.wikimedia.org/r/336954 (owner: 10Andrew Bogott)
[21:51:36] <wikibugs_>	 (03PS3) 10Andrew Bogott: Apt:  Remove an ensure->absent stanza [puppet] - 10https://gerrit.wikimedia.org/r/336954
[21:51:39] <wikibugs_>	 (03PS2) 10Papaul: DNS:Add production dns for oresrdb2002 [dns] - 10https://gerrit.wikimedia.org/r/342081
[21:52:07] <wikibugs_>	 (03PS4) 10Eevans: WIP: TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088
[21:52:12] <wikibugs_>	 (03CR) 10Dzahn: "i can confirm /var/lib/gerrit2/review_site/lib/mysql-connector-java.jar is owned by gerrit2:gerrit2 on cobalt.  not so sure yet about runn" [puppet] - 10https://gerrit.wikimedia.org/r/342091 (owner: 10Paladox)
[21:52:26] <wikibugs_>	 (03CR) 10Chad: [C: 04-1] "I disagree with this approach. Labs should be fixed to have the correct group, production does this just fine already" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342091 (owner: 10Paladox)
[21:53:04] <wikibugs_>	 (03Abandoned) 10Paladox: Gerrit: Ensure /var/lib/gerrit2 is owned by gerrit2:gerrit [puppet] - 10https://gerrit.wikimedia.org/r/342091 (owner: 10Paladox)
[21:54:12] <wikibugs_>	 (03Abandoned) 10Mholloway: [Android] Create symlink to repo licenses dir in the SDK on CI [puppet] - 10https://gerrit.wikimedia.org/r/341583 (https://phabricator.wikimedia.org/T147099) (owner: 10Mholloway)
[21:54:23] <wikibugs_>	 06Operations, 10hardware-requests, 13Patch-For-Review: decom fluorine - https://phabricator.wikimedia.org/T159996#3089520 (10Dzahn)
[21:54:48] <logmsgbot>	 !log mobrovac@tin Started deploy [trending-edits/deploy@57a654e]: Bump max_pages for T156411
[21:54:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:54] <stashbot>	 T156411: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411
[21:55:41] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 032] Apt:  Remove an ensure->absent stanza [puppet] - 10https://gerrit.wikimedia.org/r/336954 (owner: 10Andrew Bogott)
[21:56:57] <wikibugs_>	 06Operations, 10hardware-requests, 13Patch-For-Review: decom fluorine - https://phabricator.wikimedia.org/T159996#3089559 (10Dzahn) @Robh per IRC talk, all steps above done (and added a little) up to disabling switch ports.  please do that as i just shut the server down a couple minutes ago.   please **do NO...
[21:57:07] <wikibugs_>	 06Operations, 10hardware-requests, 13Patch-For-Review: decom fluorine - https://phabricator.wikimedia.org/T159996#3089560 (10Dzahn) a:05Dzahn>03RobH
[21:59:24] <wikibugs_>	 06Operations, 06Performance-Team: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837#3089575 (10Krinkle) p:05Triage>03Low
[22:00:32] <wikibugs_>	 (03CR) 10Mobrovac: [C: 04-1] WIP: TLS configuration for RESTBase (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342088 (owner: 10Eevans)
[22:00:54] <logmsgbot>	 !log mobrovac@tin Finished deploy [trending-edits/deploy@57a654e]: Bump max_pages for T156411 (duration: 06m 07s)
[22:01:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:01:01] <stashbot>	 T156411: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411
[22:05:44] <wikibugs_>	 06Operations, 10RESTBase, 10service-runner, 06Services (doing), 15User-mobrovac: enable restbase syslog/file logging - https://phabricator.wikimedia.org/T112648#3089589 (10mobrovac) Touche. I vote for the latter.
[22:07:01] <wikibugs_>	 (03PS3) 10Dzahn: DNS:Add production dns for oresrdb2002 [dns] - 10https://gerrit.wikimedia.org/r/342081 (owner: 10Papaul)
[22:09:04] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] DNS:Add production dns for oresrdb2002 [dns] - 10https://gerrit.wikimedia.org/r/342081 (owner: 10Papaul)
[22:10:02] <wikibugs_>	 (03PS1) 10Mobrovac: RESTBase: Send the logs locally to stdout/syslog [puppet] - 10https://gerrit.wikimedia.org/r/342103 (https://phabricator.wikimedia.org/T112648)
[22:11:29] <icinga-wm>	 PROBLEM - Disk space on prometheus1004 is CRITICAL: DISK CRITICAL - free space: / 1045 MB (3% inode=51%)
[22:11:59] <icinga-wm>	 PROBLEM - Disk space on prometheus1003 is CRITICAL: DISK CRITICAL - free space: / 40 MB (0% inode=51%)
[22:13:49] <wikibugs_>	 (03CR) 10Ppchelko: RESTBase: Send the logs locally to stdout/syslog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342103 (https://phabricator.wikimedia.org/T112648) (owner: 10Mobrovac)
[22:14:01] <wikibugs_>	 (03PS5) 10Eevans: WIP: TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088
[22:14:54] <mutante>	 uh oh @ prometheus disk space
[22:15:30] <wikibugs_>	 (03PS2) 10Mobrovac: RESTBase: Send the logs locally to stdout/syslog [puppet] - 10https://gerrit.wikimedia.org/r/342103 (https://phabricator.wikimedia.org/T112648)
[22:16:17] <wikibugs_>	 (03CR) 10Mobrovac: RESTBase: Send the logs locally to stdout/syslog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342103 (https://phabricator.wikimedia.org/T112648) (owner: 10Mobrovac)
[22:17:42] <wikibugs_>	 06Operations, 06Performance-Team, 10Traffic, 13Patch-For-Review: Segment Navigation Timing data by continent - https://phabricator.wikimedia.org/T128709#3089607 (10Krinkle)
[22:18:32] <logmsgbot>	 !log maxsem@tin Started deploy [tilerator/deploy@367df80]: no-op
[22:18:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:18:53] <logmsgbot>	 !log maxsem@tin Finished deploy [tilerator/deploy@367df80]: no-op (duration: 00m 22s)
[22:18:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:20:11] <wikibugs_>	 (03PS6) 10Eevans: WIP: TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088
[22:24:05] <wikibugs_>	 (03PS7) 10Eevans: WIP: TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088
[22:28:29] <wikibugs_>	 06Operations, 06Performance-Team, 10Wikimedia-General-or-Unknown: Run EventLogging test to determine best DC for each country - https://phabricator.wikimedia.org/T55497#3089642 (10Krinkle)
[22:28:59] <wikibugs_>	 (03PS1) 10Hashar: (WIP) contint: migrate git-daemon to systemd [puppet] - 10https://gerrit.wikimedia.org/r/342128 (https://phabricator.wikimedia.org/T157785)
[22:30:00] <wikibugs_>	 (03CR) 10Mobrovac: "PCC OK - https://puppet-compiler.wmflabs.org/5724/restbase1009.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/342103 (https://phabricator.wikimedia.org/T112648) (owner: 10Mobrovac)
[22:30:05] <wikibugs_>	 (03PS8) 10Eevans: WIP: TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088
[22:30:27] <wikibugs_>	 (03CR) 10Mobrovac: "Also confirmed in BC that the output ends up in the syslog.log file." [puppet] - 10https://gerrit.wikimedia.org/r/342103 (https://phabricator.wikimedia.org/T112648) (owner: 10Mobrovac)
[22:33:26] <wikibugs_>	 (03CR) 10GWicke: "See inline question about log levels for stdout." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342103 (https://phabricator.wikimedia.org/T112648) (owner: 10Mobrovac)
[22:35:53] <wikibugs_>	 (03PS3) 10Mobrovac: RESTBase: Send the logs locally to stdout/syslog [puppet] - 10https://gerrit.wikimedia.org/r/342103 (https://phabricator.wikimedia.org/T112648)
[22:37:15] <wikibugs_>	 (03CR) 10Mobrovac: RESTBase: Send the logs locally to stdout/syslog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342103 (https://phabricator.wikimedia.org/T112648) (owner: 10Mobrovac)
[22:40:36] <wikibugs_>	 (03PS9) 10Eevans: Cassanra TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088 (https://phabricator.wikimedia.org/T111113)
[22:43:13] <wikibugs_>	 (03CR) 10Eevans: "PC output: http://puppet-compiler.wmflabs.org/5726/" [puppet] - 10https://gerrit.wikimedia.org/r/342088 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans)
[22:44:00] <wikibugs_>	 (03CR) 10Eevans: [C: 04-1] "This isn't ready to be merged; Depend on https://gerrit.wikimedia.org/r/342075" [puppet] - 10https://gerrit.wikimedia.org/r/342088 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans)
[22:45:46] <logmsgbot>	 !log maxsem@tin Started deploy [tilerator/deploy@fb06c99]: https://gerrit.wikimedia.org/r/#/c/342140/
[22:45:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:46:07] <logmsgbot>	 !log maxsem@tin Finished deploy [tilerator/deploy@fb06c99]: https://gerrit.wikimedia.org/r/#/c/342140/ (duration: 00m 21s)
[22:46:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:46:39] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:46:41] <mutante>	 !log prometheus1003 - stopping service: [....] Stopping monitoring system and time series database: prometheus Invalid --pidfile argument: '/var/run/prometheus/prometheus.pid' (Parent directory does not exist)
[22:46:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:06] <wikibugs_>	 (03CR) 10GWicke: RESTBase: Send the logs locally to stdout/syslog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342103 (https://phabricator.wikimedia.org/T112648) (owner: 10Mobrovac)
[22:48:57] <logmsgbot>	 !log maxsem@tin Started deploy [tilerator/deploy@fb06c99]: https://gerrit.wikimedia.org/r/#/c/342140/
[22:49:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:49:03] <logmsgbot>	 !log maxsem@tin Finished deploy [tilerator/deploy@fb06c99]: https://gerrit.wikimedia.org/r/#/c/342140/ (duration: 00m 05s)
[22:49:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:50:27] <mutante>	 !log prometheus1003/1004 - systemctl stop prometheus (as opposed to /etc/init.d/prometheus), as they are low on disk but are not in production yet
[22:50:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:54:58] <wikibugs_>	 (03CR) 10Krinkle: "Is there a task for the reaper and/or the issue it solves? Would be good to have a short write-up about the data we got from labs (what's " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/339245 (owner: 10Aaron Schulz)
[22:55:23] <wikibugs_>	 (03PS6) 10Krinkle: Include DB shard in production SPI log entries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331808 (owner: 10Aaron Schulz)
[23:04:59] <icinga-wm>	 PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[23:08:39] <wikibugs_>	 06Operations, 10hardware-requests, 13Patch-For-Review: decom fluorine - https://phabricator.wikimedia.org/T159996#3089733 (10RobH)
[23:09:30] <wikibugs_>	 06Operations, 10hardware-requests, 13Patch-For-Review: decom fluorine - https://phabricator.wikimedia.org/T159996#3085886 (10RobH) a:05RobH>03Dzahn The switch port is disabled.  Once you have confirmed this wipe can occur, please comment and  assign this to @cmjohnson.
[23:14:59] <icinga-wm>	 RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:15:39] <icinga-wm>	 RECOVERY - puppet last run on ms-fe1002 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[23:37:14] <wikibugs_>	 (03Abandoned) 10Dzahn: prometheus: fix lint warning [puppet] - 10https://gerrit.wikimedia.org/r/341965 (owner: 10Dzahn)
[23:42:23] <wikibugs_>	 (03PS5) 10Dzahn: change MX records for wikimedia.ee from elkdata.ee to Google [dns] - 10https://gerrit.wikimedia.org/r/341359 (https://phabricator.wikimedia.org/T158638)
[23:52:01] <wikibugs_>	 (03PS1) 10Krinkle: (no-op) Move NavigationTiming config to EventLogging section [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342147
[23:52:03] <wikibugs_>	 (03PS1) 10Krinkle: Disable WikimediaEvents extension on closed wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342148 (https://phabricator.wikimedia.org/T158721)
[23:53:09] <wikibugs_>	 (03PS1) 10Krinkle: (no-op) Remove setting of unused $wgPercentHHVM (no longer exists) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342149
[23:53:52] <wikibugs_>	 (03PS2) 10Krinkle: [noop] Move NavigationTiming config to EventLogging section [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342147
[23:54:01] <wikibugs_>	 (03PS2) 10Krinkle: (no-op) Remove setting of unused $wgPercentHHVM (no longer exists) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342149
[23:54:39] <icinga-wm>	 PROBLEM - tileratorui on maps-test2001 is CRITICAL: connect to address 10.192.0.128 and port 6535: Connection refused
[23:54:39] <icinga-wm>	 PROBLEM - tilerator on maps-test2001 is CRITICAL: connect to address 10.192.0.128 and port 6534: Connection refused
[23:55:03] <greg-g>	 MaxSem: ^ ?
[23:58:53] <wikibugs_>	 (03PS3) 10Krinkle: [noop] Move NavigationTiming config to EventLogging section [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342147
[23:59:04] <wikibugs_>	 (03PS3) 10Krinkle: (no-op) Remove setting of unused $wgPercentHHVM (no longer exists) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342149
[23:59:05] <wikibugs_>	 06Operations, 10media-storage: Sanity check global-multiwrite logs for ConfirmEdit usage - https://phabricator.wikimedia.org/T159830#3089950 (10Reedy) Well, the deletions should've been happening, but it was weird that one file was left. It seems there were some issues getting the captchas stored for whatever...