[00:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T0000).
[00:00:04] <jouncebot>	 ebernhardson: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:59] <ebernhardson>	 just me, i can deploy
[00:01:47] <jdlrobson>	 ebernhardson: hmm
[00:01:48] <wikibugs>	 (03PS2) 10EBernhardson: [cirrus] reduce master timeout to 30s [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491231 (https://phabricator.wikimedia.org/T215969) (owner: 10DCausse)
[00:01:53] <jdlrobson>	 i think i must have put mine in the wrong place..
[00:02:12] <jdlrobson>	 indeed.. i put it down for thursday. @ebernhardson mind if I move it to now?
[00:02:14] <wikibugs>	 (03CR) 10EBernhardson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491231 (https://phabricator.wikimedia.org/T215969) (owner: 10DCausse)
[00:02:20] <ebernhardson>	 jdlrobson: sure, we can ship it too
[00:02:29] <jdlrobson>	 I sincerely hate the deploy table wikitext..
[00:03:10] <ebernhardson>	 jdlrobson: i always ctrl-f for '19 16' to find the 16:00 entry for the 19th (today) 
[00:03:11] <jdlrobson>	 @ebernhardson okay im in... https://wikitech.wikimedia.org/wiki/Deployments#Wednesday,_February_20
[00:03:14] <wikibugs>	 (03Merged) 10jenkins-bot: [cirrus] reduce master timeout to 30s [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491231 (https://phabricator.wikimedia.org/T215969) (owner: 10DCausse)
[00:03:23] <jdlrobson>	 Yeh I got confused though and grepped for utc time :/
[00:03:34] <ebernhardson>	 i suppose it displays in utc, makes sense :)
[00:05:33] <logmsgbot>	 !log ebernhardson@deploy1001 Synchronized wmf-config/CirrusSearch-production.php: SWAT T215969 Return cirrussearch master timeout back to the default value (duration: 00m 57s)
[00:05:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:05:35] <stashbot>	 T215969: Measure mutation latency across the newly split elasticsearch clusters - https://phabricator.wikimedia.org/T215969
[00:12:51] <wikibugs>	 (03CR) 10jenkins-bot: [cirrus] reduce master timeout to 30s [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491231 (https://phabricator.wikimedia.org/T215969) (owner: 10DCausse)
[00:13:12] <jdlrobson>	 ebernhardson: ready when you are
[00:17:38] <ebernhardson>	 they should be on mwdebug1001 now
[00:18:26] <jdlrobson>	 (on it)
[00:18:46] <jdlrobson>	 ebernhardson: both changes or just 1 of them?
[00:21:02] <jdlrobson>	 ebernhardson: you can sync now.. assuming that's both, as it looks like it's on both :)
[00:21:31] <ebernhardson>	 jdlrobson: right, it's on both branches
[00:21:36] <ebernhardson>	 shipping it
[00:22:59] <jdlrobson>	 thanks <3
[00:23:24] <logmsgbot>	 !log ebernhardson@deploy1001 Synchronized php-1.33.0-wmf.18/skins/MinervaNeue/resources/skins.minerva.content.styles/lists.less: Revert switch to outside list style from ordered lists (duration: 00m 59s)
[00:23:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:24:45] <logmsgbot>	 !log ebernhardson@deploy1001 Synchronized php-1.33.0-wmf.17/skins/MinervaNeue/resources/skins.minerva.content.styles/lists.less: Revert switch to outside list style from ordered lists (duration: 00m 52s)
[00:24:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:26:10] <ebernhardson>	 jdlrobson: all synced out
[00:44:35] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga2001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:44:47] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga2001 is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:44:49] <quiddity>	 Wikidata issues? I'm getting "Request from [my IP] via cp1085 cp1085, Varnish XID 1061781934
[00:44:49] <quiddity>	 Error: 503, Backend fetch failed at Wed, 20 Feb 2019 00:42:58 GMT
[00:44:51] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga2001 is CRITICAL: job=varnish-text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:44:53] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga2001 is CRITICAL: cluster=cache_text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:45:02] <chaomodus>	 what's this
[00:45:27] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqiad on icinga2001 is CRITICAL: job=varnish-text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:45:28] <paladox>	 wikidata works for me
[00:45:31] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga2001 is CRITICAL: job=varnish-text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:45:43] <chaomodus>	 yah wikidata is broken for me also
[00:45:53] <paladox>	 though
[00:46:01] <paladox>	 wikipedia is down for me
[00:46:03] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga2001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:46:21] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on icinga2001 is CRITICAL: cluster=cache_text site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:46:39] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqiad on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:46:41] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[00:46:46] <chaomodus>	 funnily wikipedia is up for me
[00:46:53] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at codfw on icinga2001 is CRITICAL: job=varnish-text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:47:04] <paladox>	 heh chaomodus wikidata works for me
[00:47:05] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqsin on icinga2001 is CRITICAL: job=varnish-text site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:47:14] <paladox>	 so im guessing it depends on the data center?
[00:47:29] <chaomodus>	 well the alerts imply eqiad
[00:47:38] <chaomodus>	 oh i guess that
[00:47:42] * paladox would be going through esams
[00:47:44] <chaomodus>	 there are some from other pops too
[00:47:46] <paladox>	 but https://en.wikipedia.org/w/index.php?title=Main_Page&action=edit is showing an error
[00:47:53] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqiad on icinga2001 is CRITICAL: job=varnish-text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:47:58] <paladox>	 "Request from 2a00:23c4:ad14:9700:4d40:6f2a:d965:22d8 via cp1085 cp1085, Varnish XID 1067418549
[00:47:58] <paladox>	 Error: 503, Backend fetch failed at Wed, 20 Feb 2019 00:46:28 GMT"
[00:48:14] <chaomodus>	 so varnish done exploded what do
[00:49:39] <paladox>	 wikipedia is back for me.
[00:50:06] <chaomodus>	 wikidata is back for me just now
[00:52:34] <paladox>	 wikipedia is now slow
[00:52:47] <paladox>	 ie not loading now
[00:52:51] <paladox>	 chaomodus ^^
[00:53:05] <chaomodus>	 wikidata seems snappy for me, i'll check out wikipedia
[00:53:05] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at codfw on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:53:17] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqsin on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:53:20] <chaomodus>	 seems fluid
[00:53:26] <chaomodus>	 maybe it was a network burp
[00:53:42] <paladox>	 oh, stupid safari, it works in chrome.
[00:54:07] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:54:34] <chaomodus>	 that graph looks ok again
[00:55:01] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:57:05] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:57:09] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:57:11] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:57:47] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqiad on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:58:07] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:58:25] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:03:07] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10EBernhardson) Sorry for making everything confusing here, lets run with the assumption for now that the job runners can talk to clo...
[01:05:05] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[01:32:39] <wikibugs>	 (03CR) 10BryanDavis: "> LGTM. Shall I build + deploy this?" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/491397 (https://phabricator.wikimedia.org/T193646) (owner: 10BryanDavis)
[01:52:30] <logmsgbot>	 !log mobrovac@deploy1001 Started deploy [restbase/deploy@751dc5c]: Temporarily collect VE request logs for T215956
[01:52:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:52:33] <stashbot>	 T215956: Consider stashing data-parsoid for VE  - https://phabricator.wikimedia.org/T215956
[02:15:07] <logmsgbot>	 !log mobrovac@deploy1001 Finished deploy [restbase/deploy@751dc5c]: Temporarily collect VE request logs for T215956 (duration: 22m 37s)
[02:15:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:15:10] <stashbot>	 T215956: Consider stashing data-parsoid for VE  - https://phabricator.wikimedia.org/T215956
[02:24:04] <wikibugs>	 (03PS2) 10Andrew Bogott: cloudvirt1012: enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/490786 (https://phabricator.wikimedia.org/T216190)
[02:25:11] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] cloudvirt1012: enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/490786 (https://phabricator.wikimedia.org/T216190) (owner: 10Andrew Bogott)
[02:32:42] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudvirt1009: evaluate upgrading to 10G - https://phabricator.wikimedia.org/T216324 (10Andrew) Steps:  [] Move host to a rack with 10G -- B2, B4 or B7 I believe [] Enable the 10G nic in the bios (note that we can not do this via mgmt; it...
[03:57:34] <wikibugs>	 10Operations, 10Gerrit, 10serviceops: Gerrit loads very slowly - https://phabricator.wikimedia.org/T215855 (10Paladox) Thanks @hashar
[04:45:27] <XioNoX>	 !log add avoid-paths WIRESTAR-OPTICALTEL to cr2-eqdfw
[04:45:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:58:03] <icinga-wm>	 PROBLEM - Debian mirror in sync with upstream on sodium is CRITICAL: /srv/mirrors/debian is over 14 hours old.
[05:10:25] <wikibugs>	 10Operations, 10Cloud-VPS, 10Traffic, 10netops, 10cloud-services-team (Kanban): Evaluate the possibility to add Juniper images to Openstack - https://phabricator.wikimedia.org/T180179 (10ayounsi) @aborrero Being able to create different L2 links between VMs would be ideal, but having them all in the same...
[05:12:14] <wikibugs>	 10Operations, 10Operations-Software-Development: Netbox: cable termination names report - https://phabricator.wikimedia.org/T216469 (10ayounsi) No strong preferences, but indeed "Device X has Y miss-labelled" is an option.
[05:55:13] <wikibugs>	 (03CR) 10Elukey: Add analytics purge job for xmldumps on HDFS (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[06:08:18] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1119 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491682 (https://phabricator.wikimedia.org/T210713)
[06:10:00] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1119 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491682 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[06:10:47] <wikibugs>	 (03CR) 10Elukey: "> I understand your concern Luca. I also think it is likely to fail" [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[06:11:00] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1119 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491682 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[06:11:17] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1119 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491682 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[06:12:20] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1119 T210713 (duration: 01m 05s)
[06:12:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:12:23] <stashbot>	 T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[06:12:31] <wikibugs>	 (03CR) 10Elukey: Add analytics purge job for xmldumps on HDFS (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[06:14:30] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491683
[06:16:14] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491683 (owner: 10Marostegui)
[06:17:15] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491683 (owner: 10Marostegui)
[06:18:24] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1109 for kernel and mysql upgrade (duration: 00m 52s)
[06:18:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:18:51] <marostegui>	 !log Stop MySQL on db1109 for kernel and mysql upgrade
[06:18:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:19:19] <wikibugs>	 (03PS9) 10Elukey: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[06:21:59] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491683 (owner: 10Marostegui)
[06:22:48] <wikibugs>	 (03PS10) 10Elukey: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[06:23:12] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491684
[06:26:21] <wikibugs>	 (03PS11) 10Elukey: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[06:26:37] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Slowly repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491684 (owner: 10Marostegui)
[06:27:41] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491684 (owner: 10Marostegui)
[06:28:41] <icinga-wm>	 PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.190 second response time
[06:28:47] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1109 after kernel upgrade (duration: 00m 52s)
[06:28:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:29:08] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491687
[06:29:21] <icinga-wm>	 PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:31:13] <icinga-wm>	 RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.662 second response time
[06:31:51] <wikibugs>	 (03PS12) 10Elukey: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[06:31:53] <icinga-wm>	 RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational
[06:33:00] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491684 (owner: 10Marostegui)
[06:36:15] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491687 (owner: 10Marostegui)
[06:37:11] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491687 (owner: 10Marostegui)
[06:38:13] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1109 after kernel upgrade (duration: 00m 52s)
[06:38:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:39:09] <wikibugs>	 (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/14745/an-coord1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[06:39:23] <wikibugs>	 (03CR) 10Elukey: "Joal: let me know if I should merge or not :)" [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[06:40:41] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491688
[06:41:25] <wikibugs>	 (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491688
[06:42:25] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491688 (owner: 10Marostegui)
[06:43:29] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491688 (owner: 10Marostegui)
[06:43:52] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] profile::mediawiki::php: install tideways-xhprof, remove tideways [puppet] - 10https://gerrit.wikimedia.org/r/491533 (https://phabricator.wikimedia.org/T176916) (owner: 10Giuseppe Lavagetto)
[06:44:02] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: profile::mediawiki::php: install tideways-xhprof, remove tideways [puppet] - 10https://gerrit.wikimedia.org/r/491533 (https://phabricator.wikimedia.org/T176916)
[06:44:10] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491687 (owner: 10Marostegui)
[06:44:12] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491688 (owner: 10Marostegui)
[06:44:30] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491689 (https://phabricator.wikimedia.org/T210713)
[06:44:32] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1119 T210713 (duration: 00m 51s)
[06:44:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:35] <stashbot>	 T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[06:45:31] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491689 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[06:46:34] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491689 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[06:47:38] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491690
[06:47:46] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1080 T210713 (duration: 00m 51s)
[06:47:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:43] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491690 (owner: 10Marostegui)
[06:49:45] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491690 (owner: 10Marostegui)
[06:49:51] <wikibugs>	 (03PS1) 10Elukey: superset: use cn for LDAP search (not uid) [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524)
[06:50:52] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic for db1109 after kernel upgrade (duration: 00m 52s)
[06:50:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:51:12] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] profile: use register_shutdown_function [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491518 (https://phabricator.wikimedia.org/T176916) (owner: 10Giuseppe Lavagetto)
[06:52:14] <wikibugs>	 (03Merged) 10jenkins-bot: profile: use register_shutdown_function [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491518 (https://phabricator.wikimedia.org/T176916) (owner: 10Giuseppe Lavagetto)
[06:54:51] <logmsgbot>	 !log oblivian@deploy1001 Synchronized wmf-config/profiler.php: Fix the tideways setup (duration: 00m 52s)
[06:54:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:55:37] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491689 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[06:55:39] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491690 (owner: 10Marostegui)
[06:55:41] <wikibugs>	 (03CR) 10jenkins-bot: profile: use register_shutdown_function [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491518 (https://phabricator.wikimedia.org/T176916) (owner: 10Giuseppe Lavagetto)
[07:04:49] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491692
[07:07:38] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491692 (owner: 10Marostegui)
[07:08:36] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491692 (owner: 10Marostegui)
[07:09:44] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic for db1109 after kernel upgrade (duration: 00m 52s)
[07:09:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:58] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491693
[07:10:47] <wikibugs>	 (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491693
[07:11:49] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491693 (owner: 10Marostegui)
[07:12:49] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491693 (owner: 10Marostegui)
[07:13:51] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1080 T210713 (duration: 00m 52s)
[07:13:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:54] <stashbot>	 T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[07:14:05] <marostegui>	 !log Deploy schema change on s1 primary master (db1067) - T210713
[07:14:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:16:06] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491694
[07:17:58] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491692 (owner: 10Marostegui)
[07:18:00] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491693 (owner: 10Marostegui)
[07:44:37] <wikibugs>	 (03Abandoned) 10Elukey: Introduce profile::analytics::cluster::limits::statistics [puppet] - 10https://gerrit.wikimedia.org/r/488078 (https://phabricator.wikimedia.org/T212824) (owner: 10Elukey)
[07:45:07] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[07:45:14] <wikibugs>	 (03PS13) 10Elukey: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[07:45:46] <moritzm>	 !log installing gnupg2 updates on stretch
[07:45:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:29] <wikibugs>	 (03PS1) 10Joal: Correct analytics-drop-xmldumps systemd-timer name [puppet] - 10https://gerrit.wikimedia.org/r/491696 (https://phabricator.wikimedia.org/T216414)
[08:06:30] <joal>	 elukey: --^
[08:22:29] <wikibugs>	 (03PS1) 10Zoranzoki21: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491697
[08:22:31] <wikibugs>	 (03PS1) 10Zoranzoki21: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491698
[08:22:33] <wikibugs>	 (03PS1) 10Zoranzoki21: Disable mobile main page special casing on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491699 (https://phabricator.wikimedia.org/T216563)
[08:22:38] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Correct analytics-drop-xmldumps systemd-timer name [puppet] - 10https://gerrit.wikimedia.org/r/491696 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[08:22:52] <wikibugs>	 (03Abandoned) 10Zoranzoki21: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491698 (owner: 10Zoranzoki21)
[08:22:58] <wikibugs>	 (03Abandoned) 10Zoranzoki21: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491697 (owner: 10Zoranzoki21)
[08:23:24] <wikibugs>	 (03PS2) 10Zoranzoki21: Disable mobile main page special casing on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491699 (https://phabricator.wikimedia.org/T216563)
[08:24:56] <wikibugs>	 (03PS3) 10Zoranzoki21: Disable mobile main page special casing on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491699 (https://phabricator.wikimedia.org/T216563)
[08:24:58] <wikibugs>	 (03PS1) 10Zoranzoki21: Test change for problems with git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491700
[08:25:28] <wikibugs>	 (03CR) 10Zoranzoki21: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491699 (https://phabricator.wikimedia.org/T216563) (owner: 10Zoranzoki21)
[08:25:36] <wikibugs>	 (03PS2) 10Zoranzoki21: Test change for problems with git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491700
[08:25:39] <wikibugs>	 (03Abandoned) 10Zoranzoki21: Test change for problems with git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491700 (owner: 10Zoranzoki21)
[08:34:03] <icinga-wm>	 PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[08:34:34] <elukey>	 this is me --^
[08:40:20] <wikibugs>	 (03PS1) 10Zoranzoki21: Disabled mobile main page special casing on Serbian projects because it is unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491701
[08:40:52] <wikibugs>	 (03PS4) 10Mathew.onipe: cloudelastic: Add cloudelastic configs [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921)
[08:41:06] <wikibugs>	 (03CR) 10Mathew.onipe: cloudelastic: Add cloudelastic configs (039 comments) [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe)
[08:44:26] <wikibugs>	 (03Abandoned) 10Zoranzoki21: Disabled mobile main page special casing on Serbian projects because it is unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491701 (owner: 10Zoranzoki21)
[08:44:50] <wikibugs>	 10Operations, 10Analytics, 10Product-Analytics, 10Patch-For-Review, 10User-Elukey: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (10elukey) Today I checked notebook1003 using the command `systemd-cgls memory`, that should show how the cgroups for memory setting...
[08:48:47] <moritzm>	 !log powercycling rdb1001 for a test
[08:48:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:00:04] <wikibugs>	 (03PS1) 10Muehlenhoff: Update point of contact for one researcher [puppet] - 10https://gerrit.wikimedia.org/r/491710
[09:01:23] <wikibugs>	 (03PS2) 10Muehlenhoff: profile::prometheus::nutcracker_exporter: Remove trusty support [puppet] - 10https://gerrit.wikimedia.org/r/490881
[09:01:32] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Update point of contact for one researcher [puppet] - 10https://gerrit.wikimedia.org/r/491710 (owner: 10Muehlenhoff)
[09:03:48] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Fully repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491694 (owner: 10Marostegui)
[09:04:51] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491694 (owner: 10Marostegui)
[09:05:35] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491694 (owner: 10Marostegui)
[09:06:03] <wikibugs>	 (03PS4) 10Gehel: Restore privileges to admin table after script [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos)
[09:06:36] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1109 (duration: 00m 52s)
[09:06:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:48] <wikibugs>	 (03CR) 10Gehel: "puppet compiler looks good: https://puppet-compiler.wmflabs.org/compiler1001/14748/" [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos)
[09:13:32] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[09:13:39] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Make staging not read-only [puppet] - 10https://gerrit.wikimedia.org/r/491713
[09:14:02] <marostegui>	 moritzm: ^ is that your +2 ?
[09:15:15] <moritzm>	 damn, forgot to press ENTER, just merged
[09:15:58] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge.
[09:16:35] <marostegui>	 :)
[09:16:46] <wikibugs>	 (03CR) 10Mathew.onipe: [C: 03+1] Restore privileges to admin table after script [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos)
[09:19:18] <wikibugs>	 (03PS2) 10Marostegui: mariadb: Make staging not read-only [puppet] - 10https://gerrit.wikimedia.org/r/491713
[09:21:01] <wikibugs>	 (03PS5) 10Gehel: Restore privileges to admin table after script [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos)
[09:22:31] <wikibugs>	 (03CR) 10Marostegui: "The compiler looks good: https://puppet-compiler.wmflabs.org/compiler1001/14750/" [puppet] - 10https://gerrit.wikimedia.org/r/491713 (owner: 10Marostegui)
[09:23:09] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] Restore privileges to admin table after script [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos)
[09:25:37] <wikibugs>	 (03PS3) 10Marostegui: mariadb: Make staging not read-only [puppet] - 10https://gerrit.wikimedia.org/r/491713 (https://phabricator.wikimedia.org/T210478)
[09:29:35] <wikibugs>	 (03PS1) 10Joal: Fix analytics-drop-xmldumps service [puppet] - 10https://gerrit.wikimedia.org/r/491717 (https://phabricator.wikimedia.org/T216414)
[09:29:46] <joal>	 elukey: --^ hopefully the last one
[09:31:13] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] Add remove_on_error parameter to icinga.hosts_downtimed() [software/spicerack] - 10https://gerrit.wikimedia.org/r/491526 (owner: 10Gehel)
[09:32:02] <wikibugs>	 (03CR) 10jenkins-bot: Add remove_on_error parameter to icinga.hosts_downtimed() [software/spicerack] - 10https://gerrit.wikimedia.org/r/491526 (owner: 10Gehel)
[09:33:28] <marostegui>	 !log Deploy schema change on db2043 (s3 codfw master), lag will be generated on s3 codfw - T210713
[09:33:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:31] <stashbot>	 T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[09:33:33] <elukey>	 joal: let's remove /usr/bin/python from there
[09:33:50] <joal>	 ack elukey - pushing
[09:35:21] <wikibugs>	 (03PS2) 10Joal: Fix analytics-drop-xmldumps service [puppet] - 10https://gerrit.wikimedia.org/r/491717 (https://phabricator.wikimedia.org/T216414)
[09:37:02] <wikibugs>	 (03PS3) 10Elukey: Fix the mediawiki-drop-xmldumps-pages_meta_history timer [puppet] - 10https://gerrit.wikimedia.org/r/491717 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[09:38:03] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Fix the mediawiki-drop-xmldumps-pages_meta_history timer [puppet] - 10https://gerrit.wikimedia.org/r/491717 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[09:39:09] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+1] mariadb: Make staging not read-only [puppet] - 10https://gerrit.wikimedia.org/r/491713 (https://phabricator.wikimedia.org/T210478) (owner: 10Marostegui)
[09:39:37] <wikibugs>	 (03PS4) 10Marostegui: mariadb: Make staging not read-only [puppet] - 10https://gerrit.wikimedia.org/r/491713 (https://phabricator.wikimedia.org/T210478)
[09:40:42] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] mariadb: Make staging not read-only [puppet] - 10https://gerrit.wikimedia.org/r/491713 (https://phabricator.wikimedia.org/T210478) (owner: 10Marostegui)
[09:41:23] <wikibugs>	 (03Abandoned) 10Marostegui: dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 (https://phabricator.wikimedia.org/T210478) (owner: 10Marostegui)
[09:41:42] <icinga-wm>	 RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational
[09:42:31] <wikibugs>	 10Operations, 10Elasticsearch, 10Discovery-Search (Current work): Test spicerack elasticsearch module - https://phabricator.wikimedia.org/T207920 (10Gehel) Upgrade to elasticsearch 5.6.x was performed on relforge with spicerack, so testing is complete. There are always things to improve, but this is working...
[09:44:34] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] profile::prometheus::nutcracker_exporter: Remove trusty support [puppet] - 10https://gerrit.wikimedia.org/r/490881 (owner: 10Muehlenhoff)
[09:52:23] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719
[09:52:28] <wikibugs>	 10Operations, 10Wikimedia-Logstash: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10fgiunchedi) >>! In T205856#4959426, @bd808 wrote: >> The plan has syslog + json as formatting, since that's what we use for logstash already and preserv...
[09:52:38] <wikibugs>	 (03PS1) 10Marostegui: instance.pp: Make read-only check use the variable [puppet] - 10https://gerrit.wikimedia.org/r/491720
[09:53:10] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "Probably legit? ;)" [puppet] - 10https://gerrit.wikimedia.org/r/491276 (https://phabricator.wikimedia.org/T210706) (owner: 10Elukey)
[09:53:31] <wikibugs>	 (03PS3) 10Elukey: Deployment-prep: add cassandra/twcs scap repository [puppet] - 10https://gerrit.wikimedia.org/r/491276 (https://phabricator.wikimedia.org/T210706)
[09:54:37] <wikibugs>	 (03CR) 10Volans: elasticsearch: retry on all urllib3 exceptions (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 (owner: 10Gehel)
[09:54:43] <wikibugs>	 (03CR) 10Marostegui: "https://puppet-compiler.wmflabs.org/compiler1001/14751/" [puppet] - 10https://gerrit.wikimedia.org/r/491720 (owner: 10Marostegui)
[09:56:04] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] instance.pp: Make read-only check use the variable [puppet] - 10https://gerrit.wikimedia.org/r/491720 (owner: 10Marostegui)
[09:57:54] <moritzm>	 !log installing systemd security updates on jessie hosts
[09:57:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:02] <wikibugs>	 10Operations, 10Wikimedia-Logstash: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10fgiunchedi) >>! In T205856#4965359, @Ottomata wrote: > Qs: >  > Are the logs sent using Monolog? >  > Is there just one topic 'mwlog', or multiple, one...
[10:01:18] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] Add ganeti read-only user deployment (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov)
[10:02:14] <wikibugs>	 (03PS2) 10Gehel: elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719
[10:02:37] <wikibugs>	 (03CR) 10Gehel: elasticsearch: retry on all urllib3 exceptions (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 (owner: 10Gehel)
[10:04:06] <marostegui>	 !log Deploy schema change on dbstore1004:3313 - T210713
[10:04:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:09] <stashbot>	 T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[10:04:29] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Deployment-prep: add cassandra/twcs scap repository [puppet] - 10https://gerrit.wikimedia.org/r/491276 (https://phabricator.wikimedia.org/T210706) (owner: 10Elukey)
[10:04:36] <wikibugs>	 (03PS4) 10Elukey: Deployment-prep: add cassandra/twcs scap repository [puppet] - 10https://gerrit.wikimedia.org/r/491276 (https://phabricator.wikimedia.org/T210706)
[10:04:38] <wikibugs>	 (03CR) 10Elukey: [V: 03+2 C: 03+2] Deployment-prep: add cassandra/twcs scap repository [puppet] - 10https://gerrit.wikimedia.org/r/491276 (https://phabricator.wikimedia.org/T210706) (owner: 10Elukey)
[10:05:42] <wikibugs>	 10Operations: Integrate Stretch 9.8 point update - https://phabricator.wikimedia.org/T216384 (10MoritzMuehlenhoff)
[10:07:57] <wikibugs>	 (03PS5) 10Mathew.onipe: cloudelastic: Add cloudelastic configs [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921)
[10:32:12] <icinga-wm>	 PROBLEM - Disk space on prometheus2003 is CRITICAL: DISK CRITICAL - free space: /srv/prometheus/services 4996 MB (2% inode=99%)
[10:35:09] <godog>	 ugghh that's me
[10:35:52] <icinga-wm>	 RECOVERY - Disk space on prometheus2003 is OK: DISK OK
[10:36:56] <marostegui>	 !log Deploy schema change on db1095:3313 - T210713
[10:36:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:00] <stashbot>	 T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[10:40:32] <wikibugs>	 (03PS1) 10Elukey: Add pginer (WMF staff) to admin data.yml [puppet] - 10https://gerrit.wikimedia.org/r/491723 (https://phabricator.wikimedia.org/T211036)
[10:41:10] <wikibugs>	 (03PS3) 10Gehel: elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719
[10:42:49] <wikibugs>	 (03CR) 10DCausse: "settings look good to me but I think you now need to add a new role in modules/role/manifests/elasticsearch/cloudelastic.pp because none t" [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe)
[10:52:17] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: raise logging level to ERROR for elasticsearch [software/spicerack] - 10https://gerrit.wikimedia.org/r/491725
[10:53:12] <godog>	 mmhh for some reason prometheus on bast3002 has restarted, currently recovering its storage
[10:53:22] <godog>	 hence the UNKNOWNs on icinga
[10:54:54] <wikibugs>	 (03PS42) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011)
[10:55:49] <moritzm>	 godog: hmmh, the prometheus process is from 10:27 and at 10:25 I restarted systemd-journald for a sec update, I'm wondering if that's related
[10:56:48] <godog>	 moritzm: could be, I'm not sure yet, though prometheus got started back up by puppet afaics
[10:57:03] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: db1069 (x1 master) memory errors - https://phabricator.wikimedia.org/T201133 (10Marostegui) Just for the record ` db1069  Memory correctable errors -EDAC- WARNING 2019-02-20 10:45:24 2d 19h 28m 54s 3/3 2 ge 2 `
[10:57:46] <wikibugs>	 10Operations, 10monitoring, 10Goal, 10Patch-For-Review: Upgrade production prometheus-node-exporter to >= 0.16 - https://phabricator.wikimedia.org/T213708 (10fgiunchedi) Noticed this today on `bast3002`, probably harmless but needs investigation:  ` Feb 20 10:26:50 bast3002 systemd[1]: Starting Collect ipm...
[10:57:50] <wikibugs>	 (03CR) 10Jbond: "ready for review" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond)
[10:59:50] <wikibugs>	 10Operations, 10Gerrit, 10serviceops: Gerrit loads very slowly - https://phabricator.wikimedia.org/T215855 (10hashar) I have pasted P8073 content to [[ https://fastthread.io/ | fastthread.io ]]. It is an analyzer for Java thread dumps.  https://fastthread.io/ft-thread-report.jsp?dumpId=1&oTxnId_value=c865888...
[11:06:41] <godog>	 moritzm: looks like the last datapoints were around 10:24 so that would line up, however other processes seem fine and didn't restart
[11:09:54] <wikibugs>	 (03PS4) 10Volans: elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 (owner: 10Gehel)
[11:09:56] <moritzm>	 the jessie-based prometheus servers are still TBD, we can see whether it reproes when systemd is upgraded there
[11:10:22] <icinga-wm>	 PROBLEM - puppet last run on mc2022 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools]
[11:11:20] <godog>	 moritzm: kk, please let me know before so I can take a look too
[11:14:27] <wikibugs>	 (03CR) 10Alexandros Kosiaris: Introduce citoid helm chart (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/491523 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris)
[11:15:54] <wikibugs>	 (03PS6) 10Mathew.onipe: cloudelastic: Add cloudelastic configs [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921)
[11:16:09] <wikibugs>	 (03CR) 10DCausse: "sorry ignore my previous comment" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe)
[11:17:11] <wikibugs>	 (03CR) 10Volans: [C: 03+2] elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 (owner: 10Gehel)
[11:18:08] <wikibugs>	 10Operations, 10Gerrit, 10serviceops: Gerrit loads very slowly - https://phabricator.wikimedia.org/T215855 (10hashar) ` zcat /var/log/apache2/gerrit.wikimedia.org.https.access.log.9.gz|cut -b-13|sort|uniq -c   17821 2019-02-11T06   55594 2019-02-11T07   52925 2019-02-11T08   54292 2019-02-11T09   74124 2019-...
[11:19:27] <wikibugs>	 (03CR) 10Fsero: [C: 03+1] Introduce citoid helm chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/491523 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris)
[11:21:29] <wikibugs>	 (03Merged) 10jenkins-bot: elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 (owner: 10Gehel)
[11:22:16] <wikibugs>	 (03CR) 10jenkins-bot: elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 (owner: 10Gehel)
[11:25:16] <wikibugs>	 (03PS2) 10Volans: elasticsearch: raise logging level to ERROR for elasticsearch [software/spicerack] - 10https://gerrit.wikimedia.org/r/491725 (owner: 10Gehel)
[11:28:50] <akosiaris>	 !log rebuild and re-upload rsyslog_8.38.0-1~bpo9+1wmf1_amd64.changes to apt.wikimedia.org/stretch-wikimedia to have mmkubernetes package
[11:28:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:33:18] <wikibugs>	 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10Jhernandez) We still haven't created the herald rule to tag all proton...
[11:34:17] <wikibugs>	 (03CR) 10Volans: [C: 03+2] elasticsearch: raise logging level to ERROR for elasticsearch [software/spicerack] - 10https://gerrit.wikimedia.org/r/491725 (owner: 10Gehel)
[11:36:28] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on dbstore2001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 407.65 seconds
[11:36:30] <icinga-wm>	 RECOVERY - puppet last run on mc2022 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[11:38:27] <wikibugs>	 (03Merged) 10jenkins-bot: elasticsearch: raise logging level to ERROR for elasticsearch [software/spicerack] - 10https://gerrit.wikimedia.org/r/491725 (owner: 10Gehel)
[11:39:12] <wikibugs>	 (03CR) 10jenkins-bot: elasticsearch: raise logging level to ERROR for elasticsearch [software/spicerack] - 10https://gerrit.wikimedia.org/r/491725 (owner: 10Gehel)
[11:41:26] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s5 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 40.47 seconds
[11:41:34] <wikibugs>	 (03PS3) 10Zoranzoki21: IS.php: Add wgProofreadPagePageJoiner, set it per default on '-' and at zhwikisource on __PAGEJOIN__ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482502 (https://phabricator.wikimedia.org/T205826)
[11:41:53] <wikibugs>	 (03PS4) 10Zoranzoki21: Add category at wgGettingStartedExcludedCategories for srwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482534
[11:41:58] <wikibugs>	 (03PS5) 10Zoranzoki21: Add categories for all Croatian projects at wmgBabelMainCategory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482548
[11:46:18] <jbond42>	 !log rolling restarts for hhvm in codfw
[11:46:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:47:25] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: aptrepo: pull openstack mitaka packages into reprepro [puppet] - 10https://gerrit.wikimedia.org/r/491558 (https://phabricator.wikimedia.org/T216497)
[11:48:28] <wikibugs>	 (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.0.17 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491733
[11:49:16] <wikibugs>	 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MSantos) a:03MSantos
[11:54:54] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] Add pginer (WMF staff) to admin data.yml [puppet] - 10https://gerrit.wikimedia.org/r/491723 (https://phabricator.wikimedia.org/T211036) (owner: 10Elukey)
[11:55:01] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/491723 (https://phabricator.wikimedia.org/T211036) (owner: 10Elukey)
[11:56:10] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Add pginer (WMF staff) to admin data.yml [puppet] - 10https://gerrit.wikimedia.org/r/491723 (https://phabricator.wikimedia.org/T211036) (owner: 10Elukey)
[11:56:16] <elukey>	 thanks!
[11:57:08] <wikibugs>	 10Operations, 10Scap, 10Release-Engineering-Team (Kanban): Remove trusty-specific hacks from logstash_checker.py - https://phabricator.wikimedia.org/T216380 (10MoritzMuehlenhoff) p:05Triage→03Low
[11:57:29] <wikibugs>	 (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.0.17 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491733 (owner: 10Volans)
[12:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: My dear minions, it's time we take the moon! Just kidding. Time for European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T1200).
[12:00:04] <jouncebot>	 Zoranzoki21: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[12:00:15] <Zoranzoki21>	 Here \o/
[12:00:45] <zeljkof>	 Zoranzoki21: I can swat today, but in 5 minutes, and I have to go in 25 minutes, so I'll do my best today
[12:00:52] <zeljkof>	 2-3 patches probably
[12:00:59] <sDrewth>	 is it known that the RC IRC feeds have stopped?
[12:01:07] <Zoranzoki21>	 ?
[12:01:39] <sDrewth>	 wikimedia IRC recent changes have stopped
[12:01:44] <wikibugs>	 (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.17 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491733 (owner: 10Volans)
[12:01:57] <Zoranzoki21>	 OK, if you have so little time, do 491699 and mwscript namespaceDupes.php on shwiki
[12:02:13] <wikibugs>	 10Operations, 10IRCecho: irc.wikimedia.org RC feed has stopped - https://phabricator.wikimedia.org/T216607 (10Bawolff)
[12:02:30] <wikibugs>	 (03CR) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.17 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491733 (owner: 10Volans)
[12:02:33] <bawolff>	 I think this probably warrants UBN
[12:03:04] <wikibugs>	 10Operations, 10IRCecho: irc.wikimedia.org RC feed has stopped - https://phabricator.wikimedia.org/T216607 (10Bawolff) p:05Triage→03Unbreak!
[12:04:06] <wikibugs>	 (03PS1) 10Volans: Upstream release v0.0.17 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/491734
[12:04:57] <sDrewth>	 fsero ^^^
[12:05:40] <Zoranzoki21>	 What's happening with SWAT?
[12:08:18] <Bsadowski1>	 It's up now, bawolff
[12:08:20] <moritzm>	 !log restarted ircecho on kraz.wikimedia.org
[12:08:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:08:50] <bawolff>	 So I guess there's no Icinga monitoring of ircecho ;)
[12:08:54] <cdanis>	 there is
[12:09:12] <cdanis>	 but it looks like it only checks if the service is running (it was), not if the service is doing anything (it wasn't)
[12:09:21] <fsero>	 I guess moritzm restarted it; I was looking into it and I saw the restart
[12:09:34] <wikibugs>	 (03CR) 10Addshore: [C: 03+1] Change Special:ItemDisambiguation from blank special page to disabled page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491237 (https://phabricator.wikimedia.org/T216397) (owner: 10Ladsgroup)
[12:09:41] <bawolff>	 Anyways, thanks all :)
[12:09:51] <zeljkof>	 Zoranzoki21: sorry, I'm a bit late today
[12:09:53] <sDrewth>	 meta and enwikisource are back, awaiting wikimaniawiki
[12:09:59] <zeljkof>	 I guess there will be time for 1-2 patches today
[12:10:01] <Bsadowski1>	 Wed [06:07:56 AM]<-- rc-pmtpa has quit (Remote host closed the connection)
[12:10:01] <Bsadowski1>	 Wed [06:07:56 AM]--> rc-pmtpa (~rc-pmtpa@special.user) has joined #en.wikipedia
[12:10:06] <zeljkof>	 which are the most urgent ones?
[12:10:12] <zeljkof>	 in the order of calendar?
[12:10:15] <Bsadowski1>	 en's up
[12:10:19] <wikibugs>	 (03CR) 10Addshore: [C: 03+1] "Wikibase.php should be synced first" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491506 (https://phabricator.wikimedia.org/T213713) (owner: 10Ladsgroup)
[12:10:26] <Zoranzoki21>	 mwscript and 491699
[12:10:30] <sDrewth>	 thks all
[12:10:30] <wikibugs>	 (03PS2) 10Addshore: Drop obsolete Wikibase configs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491506 (https://phabricator.wikimedia.org/T213713) (owner: 10Ladsgroup)
[12:10:39] <bawolff>	 sDrewth: It will probably rejoin the channel upon the first edit to that wiki (If i remember how it works correctly)
[12:10:51] <sDrewth>	 k
[12:10:58] * sDrewth goes to prod
[12:11:12] <moritzm>	 sDrewth: thanks for the report, I'll follow up on the Phab task
[12:11:46] <zeljkof>	 Zoranzoki21: ah, so namespaceDupes is a separate thing, and should be done first?
[12:11:56] <Zoranzoki21>	 Ok
[12:12:00] <wikibugs>	 (03CR) 10Volans: [C: 03+2] Upstream release v0.0.17 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/491734 (owner: 10Volans)
[12:12:36] <sDrewth>	 bawolff, totally correct, memory prize awarded
[12:13:22] <sDrewth>	 thks moritzm, really appreciate the quick resolution
[12:16:09] <wikibugs>	 10Operations, 10IRCecho: irc.wikimedia.org RC feed has stopped - https://phabricator.wikimedia.org/T216607 (10fsero) ircd was restarted and it seems to be working again. We should investigate why it stopped; it might be related to the systemd upgrade?
[12:16:15] <wikibugs>	 (03Merged) 10jenkins-bot: Upstream release v0.0.17 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/491734 (owner: 10Volans)
[12:16:33] <wikibugs>	 10Operations, 10IRCecho: irc.wikimedia.org RC feed has stopped - https://phabricator.wikimedia.org/T216607 (10fsero) p:05Unbreak!→03Normal
[12:18:03] <wikibugs>	 10Operations, 10IRCecho: irc.wikimedia.org RC feed has stopped - https://phabricator.wikimedia.org/T216607 (10MoritzMuehlenhoff) Thanks for the report. This was caused by a restart of systemd-journald which was necessary to deploy a security update for systemd. The immediate error has been fixed by a restart o...
[12:18:30] <zeljkof>	 Zoranzoki21: ran the script https://phabricator.wikimedia.org/T216524#4968321 
[12:18:47] <zeljkof>	 that's pretty much it for today, I have to go 
[12:19:00] <Zoranzoki21>	 Ok...
[12:19:02] <zeljkof>	 if anybody can take over the swat, please do
[12:19:21] <zeljkof>	 Zoranzoki21: move the patches to another swat window if nobody can deploy today
[12:19:33] <Zoranzoki21>	 zeljkof: Ok
[12:20:05] * zeljkof is gone
[12:21:20] <Zoranzoki21>	 addshore, Maxsem, dereckson: anyone? Or do I have to move the patches to the next SWAT?
[12:22:12] <wikibugs>	 10Operations, 10IRCecho: irc.wikimedia.org RC feed has stopped - https://phabricator.wikimedia.org/T216607 (10MoritzMuehlenhoff)
[12:22:30] <wikibugs>	 10Operations, 10IRCecho: Restarting systemd-journald breaks ircecho service - https://phabricator.wikimedia.org/T216607 (10MoritzMuehlenhoff)
[12:24:35] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: use archive.debian.org as jessie-backports repo [puppet] - 10https://gerrit.wikimedia.org/r/491736 (https://phabricator.wikimedia.org/T216497)
[12:25:32] <icinga-wm>	 PROBLEM - puppet last run on mw2257 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[apache2]
[12:25:40] <volans>	 !log uploaded spicerack_0.0.17-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
[12:25:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:28:40] <volans>	 !log upgraded spicerack to 0.0.17 on cumin[12]001
[12:28:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:28:47] <volans>	 cc gehel, onimisionipe ^^^
[12:29:04] <onimisionipe>	 volans: Thanks!
[12:29:18] <onimisionipe>	 codfw upgrade is set
[12:33:42] <wikibugs>	 10Operations, 10IRCecho, 10Icinga, 10monitoring: Icnga check for ircecho should check for actual activity - https://phabricator.wikimedia.org/T216611 (10MoritzMuehlenhoff)
[12:33:53] <wikibugs>	 10Operations, 10IRCecho, 10Icinga, 10monitoring: Icinga check for ircecho should check for actual activity - https://phabricator.wikimedia.org/T216611 (10MoritzMuehlenhoff) p:05Triage→03Normal
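Editor's note on T216611: the existing ircecho check reported OK because the process was running, even though it was no longer relaying anything. A minimal sketch of an activity-aware check follows. It is not the actual Icinga plugin; the function name, parameters, and thresholds are all hypothetical, and how the "seconds since last relayed message" value would be obtained is left open. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL) follow the usual Nagios/Icinga plugin convention.

```python
from typing import Tuple

# Conventional Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_activity(
    process_running: bool,
    seconds_since_last_message: float,
    warn_after: float = 300.0,   # hypothetical threshold, in seconds
    crit_after: float = 900.0,   # hypothetical threshold, in seconds
) -> Tuple[int, str]:
    """Return (exit_code, status_line) for a relay-style service.

    A running process is not enough: the service must also have
    relayed something recently to be considered healthy.
    """
    if not process_running:
        return CRITICAL, "CRITICAL - process is not running"
    if seconds_since_last_message >= crit_after:
        return CRITICAL, (
            f"CRITICAL - running but silent for {seconds_since_last_message:.0f}s"
        )
    if seconds_since_last_message >= warn_after:
        return WARNING, (
            f"WARNING - running but silent for {seconds_since_last_message:.0f}s"
        )
    return OK, f"OK - last message {seconds_since_last_message:.0f}s ago"


if __name__ == "__main__":
    # The situation described above: process up, nothing relayed for 20 minutes.
    code, line = check_activity(process_running=True, seconds_since_last_message=1200)
    print(code, line)
```

A process-only check would have returned OK in the case shown under `__main__`; the silence thresholds are what turn it into an alert. Sensible threshold values would depend on how quiet the feed can legitimately be.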
[12:40:06] <icinga-wm>	 PROBLEM - HHVM rendering on mw2176 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:41:12] <icinga-wm>	 RECOVERY - HHVM rendering on mw2176 is OK: HTTP OK: HTTP/1.1 200 OK - 75146 bytes in 0.210 second response time
[12:43:51] <gehel>	 volans: thanks !
[12:47:29] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: openstack: use archive.debian.org as jessie-backports repo [puppet] - 10https://gerrit.wikimedia.org/r/491736 (https://phabricator.wikimedia.org/T216497)
[12:52:12] <wikibugs>	 10Operations, 10IRCecho, 10Icinga, 10monitoring: Icinga check for ircecho should check for actual activity - https://phabricator.wikimedia.org/T216611 (10CDanis) This would be an incredibly silly way to do it, but it would be very easy to write a `check_prometheus` invocation for outgoing network traffic f...
[12:54:45] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: cumin: add cloud-eqiad1 alias [puppet] - 10https://gerrit.wikimedia.org/r/491740
[12:55:52] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cumin: add cloud-eqiad1 alias [puppet] - 10https://gerrit.wikimedia.org/r/491740 (owner: 10Arturo Borrero Gonzalez)
[12:56:56] <icinga-wm>	 RECOVERY - puppet last run on mw2257 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[13:00:04] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T1300)
[13:00:59] <jbond42>	 !log rolling restarts for hhvm in eqiad
[13:01:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:33] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14 [puppet] - 10https://gerrit.wikimedia.org/r/491746 (https://phabricator.wikimedia.org/T215931)
[13:36:07] <wikibugs>	 (03PS3) 10Muehlenhoff: profile::prometheus::nutcracker_exporter: Remove trusty support [puppet] - 10https://gerrit.wikimedia.org/r/490881
[13:37:51] <wikibugs>	 (03CR) 10Gehel: "PCC looks happy: https://puppet-compiler.wmflabs.org/compiler1001/14752/" [puppet] - 10https://gerrit.wikimedia.org/r/491746 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel)
[13:37:55] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] profile::prometheus::nutcracker_exporter: Remove trusty support [puppet] - 10https://gerrit.wikimedia.org/r/490881 (owner: 10Muehlenhoff)
[13:41:20] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14 [puppet] - 10https://gerrit.wikimedia.org/r/491746 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel)
[13:43:41] <wikibugs>	 (03CR) 10Mathew.onipe: [C: 03+1] elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14 [puppet] - 10https://gerrit.wikimedia.org/r/491746 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel)
[13:44:58] <wikibugs>	 (03PS2) 10Gehel: elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14 [puppet] - 10https://gerrit.wikimedia.org/r/491746 (https://phabricator.wikimedia.org/T215931)
[13:45:40] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14 [puppet] - 10https://gerrit.wikimedia.org/r/491746 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel)
[13:48:16] <icinga-wm>	 PROBLEM - HHVM rendering on mw1257 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:49:22] <icinga-wm>	 RECOVERY - HHVM rendering on mw1257 is OK: HTTP OK: HTTP/1.1 200 OK - 75202 bytes in 0.760 second response time
[13:51:37] <godog>	 !log prometheus on prometheus2004 crashed/exited after journald upgrade -- starting up again now
[13:51:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:26] <gehel>	 !log rolling upgrade of elasticsearch / cirrus / codfw to 5.6.14 - T215931
[13:59:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:29] <stashbot>	 T215931: Upgrade elasticsearch to 5.6.14 - https://phabricator.wikimedia.org/T215931
[14:00:05] <jouncebot>	 Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T1400)
[14:00:19] <logmsgbot>	 !log gehel@cumin2001 START - Cookbook sre.elasticsearch.rolling-upgrade
[14:00:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:41] <logmsgbot>	 !log gehel@cumin2001 END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
[14:00:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:45] <dcausse>	 :)
[14:00:59] <onimisionipe>	 here we go
[14:01:36] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on logstash1006 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=logstash1006&var-datasource=eqiad+prometheus/ops
[14:04:34] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on icinga2001 is CRITICAL: 8.112 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[14:05:48] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on icinga2001 is OK: (C)60 le (W)70 le 107.6 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[14:06:31] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10Ottomata) > mw job runners -> cloudelastic : closed  We have the same problem with updating data in a Presto cluster in the public...
[14:07:01] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: Introduce citoid helm chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/491523 (https://phabricator.wikimedia.org/T213194)
[14:08:15] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: access production clusters over HTTPS [software/spicerack] - 10https://gerrit.wikimedia.org/r/491750 (https://phabricator.wikimedia.org/T207920)
[14:09:29] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] elasticsearch: access production clusters over HTTPS [software/spicerack] - 10https://gerrit.wikimedia.org/r/491750 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[14:11:33] <wikibugs>	 10Operations, 10Analytics, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10akosiaris) 05Open→03Stalled Per comment above.
[14:15:14] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] elasticsearch: access production clusters over HTTPS [software/spicerack] - 10https://gerrit.wikimedia.org/r/491750 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[14:16:08] <wikibugs>	 (03CR) 10jenkins-bot: elasticsearch: access production clusters over HTTPS [software/spicerack] - 10https://gerrit.wikimedia.org/r/491750 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[14:19:07] <wikibugs>	 (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.0.18 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491752
[14:21:42] <wikibugs>	 10Operations, 10Wikimedia-Logstash: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10Ottomata) > re: the open question itself I'm leaning towards having json on kafka  Yes please!  > There will be one topic per syslog severity [...]  Ok...
[14:22:08] <wikibugs>	 10Operations, 10Patch-For-Review: Redundant bootloaders for software RAID - https://phabricator.wikimedia.org/T215183 (10CDanis) @Joe made me aware of the existence of partman configs present on `install1002` that are not in Puppet.  The good news is that almost all such files are either editor backup files (e...
[14:24:14] <wikibugs>	 (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.0.18 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491752 (owner: 10Volans)
[14:25:27] <wikibugs>	 (03PS1) 10Volans: Upstream release v0.0.18 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/491753
[14:25:30] <wikibugs>	 (03CR) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.18 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491752 (owner: 10Volans)
[14:30:37] <wikibugs>	 10Operations, 10Continuous-Integration-Infrastructure: jenkins / zuul backing up due to jenkins slaves down - https://phabricator.wikimedia.org/T216039 (10hashar) 05Open→03Resolved a:03thcipriani We can resolve this task since Tyler did the emergency action.  The `castor-save` job could not be triggered...
[14:30:55] <wikibugs>	 (03CR) 10Volans: [C: 03+2] Upstream release v0.0.18 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/491753 (owner: 10Volans)
[14:33:13] <wikibugs>	 (03CR) 10Ottomata: "The reason we were using uid in some places (like Hue), is that we need the Hue account to  match the shell account so it can do proper us" [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) (owner: 10Elukey)
[14:34:17] <volans>	 !log uploaded spicerack_0.0.18-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
[14:34:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:14] <volans>	 !log upgraded spicerack to 0.0.18 on cumin[12]001
[14:35:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:21] <logmsgbot>	 !log gehel@cumin2001 START - Cookbook sre.elasticsearch.rolling-upgrade
[14:35:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:49] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for kharlan - https://phabricator.wikimedia.org/T216258 (10fsero) Hi,  If you are not sure probably analytics-users is the one. analytics-privatedata-users will give you access to IPs or other PII information which unless you are completely sur...
[14:42:16] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on logstash1006 is CRITICAL: cluster=logstash device=sde instance=logstash1006:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=logstash1006&var-datasource=eqiad+prometheus/ops
[14:43:28] <logmsgbot>	 !log gehel@cumin2001 END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
[14:43:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:50] <wikibugs>	 10Operations, 10Analytics, 10Discovery, 10Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10Ottomata) Alright, I'm not familiar with Swift, but if we were to do this, here is what I think we'd need:  - Netwo...
[14:48:08] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for kharlan - https://phabricator.wikimedia.org/T216258 (10Ottomata) We don't actually have a lot of users in `analytics-users`, but I believe if all you need access to are EventLogging and Mediawiki History data in Hadoop, `analytics-users` is...
[14:49:10] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for kharlan - https://phabricator.wikimedia.org/T216258 (10kostajh) > We don't actually have a lot of users in analytics-users, but I believe if all you need access to are EventLogging and Mediawiki History data in Hadoop, analytics-users is th...
[14:49:33] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for kharlan - https://phabricator.wikimedia.org/T216258 (10kostajh)
[14:51:58] <ottomata>	 That eventbus alert looks like it was caused by a message size too large for a mediawiki.job.cirrusSearchElasticaWrite job
[14:52:57] <wikibugs>	 (03CR) 10Elukey: "> The reason we were using uid in some places (like Hue), is that we" [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) (owner: 10Elukey)
[14:53:30] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: don't fail if cluster is not yet green after 5 minutes [cookbooks] - 10https://gerrit.wikimedia.org/r/491755
[14:53:53] <wikibugs>	 (03PS1) 10CDanis: install_server: purge old files from /srv/autoinstall [puppet] - 10https://gerrit.wikimedia.org/r/491756 (https://phabricator.wikimedia.org/T215183)
[14:55:17] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: don't fail if cluster is not yet green after 5 minutes [cookbooks] - 10https://gerrit.wikimedia.org/r/491755 (owner: 10Gehel)
[15:05:06] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: scaffolding: Add single quotes around metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/491758
[15:05:08] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: scaffolding: Don't chomp ending whitespace in monitoring [deployment-charts] - 10https://gerrit.wikimedia.org/r/491759
[15:05:10] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: eventgate: Correctly checksum config template [deployment-charts] - 10https://gerrit.wikimedia.org/r/491760
[15:05:12] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: mathoid: Correctly checksum config template [deployment-charts] - 10https://gerrit.wikimedia.org/r/491761
[15:05:21] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans)
[15:05:35] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Removed run-tests.sh script [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans)
[15:05:48] <icinga-wm>	 PROBLEM - Check systemd state on elastic2049 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:06:01] <volans>	 gehel, onimisionipe: ^^^
[15:06:15] <gehel>	 checking
[15:06:24] <icinga-wm>	 PROBLEM - Check systemd state on elastic2047 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:07:31] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: mathoid: Bump to version 0.0.17 [deployment-charts] - 10https://gerrit.wikimedia.org/r/491763
[15:08:00] <wikibugs>	 (03PS1) 10Andrew Bogott: nova: update scheduler pools [puppet] - 10https://gerrit.wikimedia.org/r/491764
[15:08:16] <icinga-wm>	 RECOVERY - Check systemd state on elastic2049 is OK: OK - running: The system is fully operational
[15:09:50] <wikibugs>	 (03PS2) 10Andrew Bogott: nova: update scheduler pools [puppet] - 10https://gerrit.wikimedia.org/r/491764
[15:10:06] <icinga-wm>	 RECOVERY - Check systemd state on elastic2047 is OK: OK - running: The system is fully operational
[15:11:05] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] nova: update scheduler pools [puppet] - 10https://gerrit.wikimedia.org/r/491764 (owner: 10Andrew Bogott)
[15:11:17] <wikibugs>	 (03PS2) 10BBlack: CI check [dns] - 10https://gerrit.wikimedia.org/r/483198
[15:11:27] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] scaffolding: Add single quotes around metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/491758 (owner: 10Alexandros Kosiaris)
[15:11:39] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] scaffolding: Don't chomp ending whitespace in monitoring [deployment-charts] - 10https://gerrit.wikimedia.org/r/491759 (owner: 10Alexandros Kosiaris)
[15:12:00] <wikibugs>	 (03CR) 10Volans: "recheck" [cookbooks] - 10https://gerrit.wikimedia.org/r/491755 (owner: 10Gehel)
[15:14:03] <wikibugs>	 (03CR) 10Ottomata: "Ah, yes, I think we do need to use uid here.  It is (or at least it should be) used for auto superset account creation, which should match" [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) (owner: 10Elukey)
[15:15:05] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] install_server: purge old files from /srv/autoinstall [puppet] - 10https://gerrit.wikimedia.org/r/491756 (https://phabricator.wikimedia.org/T215183) (owner: 10CDanis)
[15:15:56] <wikibugs>	 (03CR) 10Elukey: "> Ah, yes, I think we do need to use uid here.  It is (or at least it" [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) (owner: 10Elukey)
[15:16:14] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] "Comments addressed and a +1 already, I am gonna merge this and deploy a test in staging." [deployment-charts] - 10https://gerrit.wikimedia.org/r/491523 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris)
[15:17:34] <wikibugs>	 (03PS2) 10CDanis: install_server: purge old files from /srv/autoinstall [puppet] - 10https://gerrit.wikimedia.org/r/491756 (https://phabricator.wikimedia.org/T215183)
[15:17:52] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM, with my limited notion of ES orchestration" [cookbooks] - 10https://gerrit.wikimedia.org/r/491755 (owner: 10Gehel)
[15:18:02] <wikibugs>	 (03CR) 10CDanis: "I'm making a backup of /srv/autoinstall as it currently exists on install1002 just to be sure, then merging this" [puppet] - 10https://gerrit.wikimedia.org/r/491756 (https://phabricator.wikimedia.org/T215183) (owner: 10CDanis)
[15:18:05] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] elasticsearch: don't fail if cluster is not yet green after 5 minutes [cookbooks] - 10https://gerrit.wikimedia.org/r/491755 (owner: 10Gehel)
[15:19:12] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] install_server: purge old files from /srv/autoinstall [puppet] - 10https://gerrit.wikimedia.org/r/491756 (https://phabricator.wikimedia.org/T215183) (owner: 10CDanis)
[15:20:06] <wikibugs>	 (03PS1) 10Ottomata: Remove usages of ::cdh::spark, we use ::spark2 now only [puppet] - 10https://gerrit.wikimedia.org/r/491767 (https://phabricator.wikimedia.org/T212134)
[15:20:08] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] eventgate: Correctly checksum config template [deployment-charts] - 10https://gerrit.wikimedia.org/r/491760 (owner: 10Alexandros Kosiaris)
[15:20:40] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans)
[15:20:50] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Removed run-tests.sh script [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans)
[15:21:52] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Add setup.py and tox.ini [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491768 (https://phabricator.wikimedia.org/T216253)
[15:21:54] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Reformat with black + isort [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491769
[15:21:56] <wikibugs>	 (03PS1) 10Filippo Giunchedi: debian: add dh-python/pybuild [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491770
[15:21:58] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Add missing metrics help text [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491771 (https://phabricator.wikimedia.org/T216253)
[15:22:00] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Add missing metrics help text, required for prometheus 2.0 [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491772
[15:23:42] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] "thanks!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/491760 (owner: 10Alexandros Kosiaris)
[15:24:15] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] "Same thing as the parent change for eventgate-analytics, merging per that +1" [deployment-charts] - 10https://gerrit.wikimedia.org/r/491761 (owner: 10Alexandros Kosiaris)
[15:24:19] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Revert incorrectly overwritten transfer.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/491775
[15:24:22] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] mathoid: Bump to version 0.0.17 [deployment-charts] - 10https://gerrit.wikimedia.org/r/491763 (owner: 10Alexandros Kosiaris)
[15:24:45] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: Revert incorrectly overwritten transfer.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/491775 (owner: 10Jcrespo)
[15:25:46] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Package citoid version 0.0.1 chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/491776
[15:25:50] <wikibugs>	 (03PS1) 10Elukey: camus: make webrequest_text config more similar to prod [puppet] - 10https://gerrit.wikimedia.org/r/491777 (https://phabricator.wikimedia.org/T212259)
[15:26:46] <icinga-wm>	 PROBLEM - grafana.wikimedia.org on krypton is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.073 second response time
[15:27:07] <wikibugs>	 (03CR) 10Jcrespo: [V: 03+2 C: 03+2] mariadb: Revert incorrectly overwritten transfer.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/491775 (owner: 10Jcrespo)
[15:27:20] <volans>	 godog: working on grafana by any chance?
[15:27:42] <godog>	 volans: no
[15:27:48] <godog>	 loads for me now though
[15:27:57] <cdanis>	 krypton is the old host
[15:28:16] <cdanis>	 there is still a not-upgraded instance there that we should tear down
[15:28:45] <cdanis>	 (for a while it was still in use because of the FR firewall, but that's been fixed for some time now)
[15:29:08] <godog>	 funnily enough I was thinking about it the other day, have grafana-beta run grafana 6.0 beta
[15:29:13] <volans>	 died: gmetric failed: sh: 1: /usr/bin/gmetric: not found
[15:29:17] <volans>	 not sure if related
[15:29:26] <cdanis>	 godog: clearly we need grafana1002 and for it to run buster ;)
[15:29:47] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] camus: make webrequest_text config more similar to prod [puppet] - 10https://gerrit.wikimedia.org/r/491777 (https://phabricator.wikimedia.org/T212259) (owner: 10Elukey)
[15:30:18] <godog>	 cdanis: hehe that'd be proper yeah
[15:30:25] <volans>	 the exim-to-gmetric error is a red herring, it has been there since earlier
[15:30:38] <volans>	 cc herron
[15:31:02] <icinga-wm>	 PROBLEM - docker-registry service on darmstadtium is CRITICAL: CRITICAL - Expecting active but unit docker-registry is inactive
[15:31:43] <herron>	 hey volans, which exim-to-gmetric error?
[15:31:52] <icinga-wm>	 PROBLEM - Docker registry HTTPS interface on darmstadtium is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string schemaVersion not found on https://darmstadtium.eqiad.wmnet:443/v2/wikimedia-jessie/manifests/latest - 372 bytes in 0.153 second response time
[15:32:12] <godog>	 if krypton isn't in service anyways we should decom/silence it though, seems like a spurious grafana alert
[15:32:25] <volans>	 herron: hey, I was looking at krypton for other reasons and it's spamming syslog with that error I pasted 3 minutes ago
[15:32:33] <volans>	 if it's only on this one, it's not a problem
[15:32:36] <cdanis>	 godog: yeah I'll take a look at decomming grafana from it
[15:32:41] <volans>	 but wanted to cc you in case it might be more spread out
[15:32:53] <herron>	 ah I see
[15:35:34] <moritzm>	 !log temporarily stop prometheus instances on prometheus1004 for systemd upgrade/journald restart
[15:35:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:10] <fsero>	 is anyone looking into the docker registry error?
[15:36:14] <fsero>	 if not I'm looking into it
[15:36:33] <moritzm>	 looking
[15:37:08] <fsero>	 !log restarting docker-registry service on systemd
[15:37:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:02] <fsero>	 it was stopped moritzm 
[15:38:06] <icinga-wm>	 RECOVERY - Docker registry HTTPS interface on darmstadtium is OK: HTTP OK: HTTP/1.1 200 OK - 2483 bytes in 0.624 second response time
[15:38:13] <fsero>	 similar to the issue with ircecho before
[15:38:18] <icinga-wm>	 RECOVERY - docker-registry service on darmstadtium is OK: OK - docker-registry is active
[15:38:22] <fsero>	 i guess you just updated systemd there
[15:39:03] <moritzm>	 yeah, it also seems related to the journald restart, although I don't know yet why
[15:39:49] <wikibugs>	 (03CR) 10BBlack: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/483198 (owner: 10BBlack)
[15:40:28] <wikibugs>	 (03PS1) 10Elukey: Rename kafka webrequest test topic [puppet] - 10https://gerrit.wikimedia.org/r/491779 (https://phabricator.wikimedia.org/T212259)
[15:40:52] <wikibugs>	 (03PS1) 10CDanis: webserver_misc_apps: unbundle grafana [puppet] - 10https://gerrit.wikimedia.org/r/491780
[15:41:06] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1293 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.074 second response time
[15:41:20] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1296 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.073 second response time
[15:41:20] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1294 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.073 second response time
[15:41:55] <wikibugs>	 (03CR) 10Filippo Giunchedi: logstash: force use elasticsearch-curator 5 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/490809 (https://phabricator.wikimedia.org/T213898) (owner: 10Filippo Giunchedi)
[15:42:01] <wikibugs>	 (03PS3) 10Filippo Giunchedi: logstash: force use elasticsearch-curator 5 [puppet] - 10https://gerrit.wikimedia.org/r/490809 (https://phabricator.wikimedia.org/T213898)
[15:42:18] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1293 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.080 second response time
[15:42:36] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1296 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.080 second response time
[15:42:36] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1294 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.079 second response time
[15:42:56] <icinga-wm>	 RECOVERY - grafana.wikimedia.org on krypton is OK: HTTP OK: HTTP/1.1 200 OK - 31353 bytes in 0.123 second response time
[15:44:21] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] Rename kafka webrequest test topic [puppet] - 10https://gerrit.wikimedia.org/r/491779 (https://phabricator.wikimedia.org/T212259) (owner: 10Elukey)
[15:44:25] <wikibugs>	 10Operations, 10Discovery-Search, 10Elasticsearch: fix broken visualizations in Elasticsearch Node comparison dashboard - https://phabricator.wikimedia.org/T212831 (10Mathew.onipe) 05Open→03Resolved
[15:44:28] <wikibugs>	 10Operations, 10Elasticsearch, 10Maps, 10Discovery-Search (Current work): Review Elastic/maps Grafana dashboards - https://phabricator.wikimedia.org/T209812 (10Mathew.onipe)
[15:44:37] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Rename kafka webrequest test topic [puppet] - 10https://gerrit.wikimedia.org/r/491779 (https://phabricator.wikimedia.org/T212259) (owner: 10Elukey)
[15:45:55] <wikibugs>	 (03PS4) 10Filippo Giunchedi: logstash: force use elasticsearch-curator 5 [puppet] - 10https://gerrit.wikimedia.org/r/490809 (https://phabricator.wikimedia.org/T213898)
[15:46:12] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] "Thanks for the reviews!" [puppet] - 10https://gerrit.wikimedia.org/r/490809 (https://phabricator.wikimedia.org/T213898) (owner: 10Filippo Giunchedi)
[15:47:30] <godog>	 elukey: merging your change too
[15:47:42] <elukey>	 thanks!
[15:48:41] <wikibugs>	 (03CR) 10CDanis: "krypton is the only machine with role(webserver_misc_apps) I could find in the depot." [puppet] - 10https://gerrit.wikimedia.org/r/491780 (owner: 10CDanis)
[15:49:13] <elukey>	 godog: lemme know once done
[15:49:32] <godog>	 elukey: yup I'm done!
[15:49:37] <elukey>	 thanks
[15:52:30] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans)
[15:55:59] <bblack>	 !log authdns2001: upgrade gdnsd to 3.0.0-1~wmf1
[15:56:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:14] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1303 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.074 second response time
[15:58:28] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1300 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.077 second response time
[15:58:34] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1301 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.074 second response time
[15:59:30] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1303 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.078 second response time
[15:59:42] <wikibugs>	 (03CR) 10Mforns: "> > I understand your concern Luca. I also think it is likely to fail" [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal)
[15:59:44] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1300 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.081 second response time
[15:59:50] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1301 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.079 second response time
[16:00:45] <wikibugs>	 (03PS2) 10Ottomata: Remove usages of ::cdh::spark, we use ::spark2 now only [puppet] - 10https://gerrit.wikimedia.org/r/491767 (https://phabricator.wikimedia.org/T212134)
[16:00:57] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] Remove usages of ::cdh::spark, we use ::spark2 now only [puppet] - 10https://gerrit.wikimedia.org/r/491767 (https://phabricator.wikimedia.org/T212134) (owner: 10Ottomata)
[16:03:44] <ottomata>	 !log removing spark 1 from Analytics cluster - T212134
[16:03:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:47] <stashbot>	 T212134: Deprecate Spark 1.6 in favor of Spark 2.x only - https://phabricator.wikimedia.org/T212134
[16:06:09] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Removed run-tests.sh script [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans)
[16:06:13] <wikibugs>	 (03PS4) 10BBlack: Removed run-tests.sh script [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans)
[16:11:54] <wikibugs>	 (03PS2) 10Elukey: superset: fix httpd LDAP auth message [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524)
[16:12:23] <icinga-wm>	 PROBLEM - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.300 second response time
[16:12:34] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m3 on db2078 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.18 seconds
[16:12:36] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m3 on db2042 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.32 seconds
[16:14:56] <marostegui>	 uh?
[16:14:58] <marostegui>	 checking
[16:16:02] <marostegui>	    Cache Status: Permanently Disabled
[16:16:22] <marostegui>	 https://phabricator.wikimedia.org/T202051
[16:17:25] <marostegui>	 spike of inserts too
[16:17:27] <marostegui>	 it must be that
[16:19:32] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] superset: fix httpd LDAP auth message [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) (owner: 10Elukey)
[16:19:56] <twentyafterfour>	 !log stopped phd on phab1002 
[16:19:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:44] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] superset: fix httpd LDAP auth message [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) (owner: 10Elukey)
[16:24:49] <bblack>	 !log authdns1001: upgrade gdnsd to 3.0.0-1~wmf1
[16:24:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:17] <twentyafterfour>	 !log stopped phd on phab1001 and scheduled downtime in icinga 
[16:26:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:19] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Revert "scap: use logstash1008 for logstash_host" [puppet] - 10https://gerrit.wikimedia.org/r/491794 (https://phabricator.wikimedia.org/T213898)
[16:26:43] <elukey>	 Krinkle: thanks for the merge!
[16:28:00] <wikibugs>	 (03PS2) 10Cwhite: hiera: upgrade prometheus-node-exporter to 0.17 in codfw [puppet] - 10https://gerrit.wikimedia.org/r/490689 (https://phabricator.wikimedia.org/T213708)
[16:28:02] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] "Reimaging logstash1008 now, move back to logstash1009" [puppet] - 10https://gerrit.wikimedia.org/r/491794 (https://phabricator.wikimedia.org/T213898) (owner: 10Filippo Giunchedi)
[16:28:16] <wikibugs>	 (03PS2) 10Kosta Harlan: [WIP] GrowthExperiments: Soft launch of help panel on viwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489729 (https://phabricator.wikimedia.org/T215666)
[16:29:29] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920)
[16:30:41] <Krinkle>	 elukey: yw, and thanks for pinging me so I remember to backport (was about to forget)
[16:35:33] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[16:36:10] <godog>	 !log depool and reimage logstash1008 with stretch - T213898
[16:36:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:36:13] <stashbot>	 T213898: Replace and expand Elasticsearch storage in eqiad and upgrade the cluster from Debian jessie to stretch - https://phabricator.wikimedia.org/T213898
[16:36:58] <wikibugs>	 (03PS2) 10Gehel: elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920)
[16:37:42] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "see inline" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[16:38:11] <bblack>	 !log multatuli: upgrade gdnsd to 3.0.0-1~wmf1
[16:38:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:38:35] <wikibugs>	 (03PS3) 10Gehel: elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920)
[16:38:36] <icinga-wm>	 PROBLEM - Host logstash1008 is DOWN: PING CRITICAL - Packet loss = 100%
[16:38:46] <wikibugs>	 (03CR) 10Gehel: elasticsearch: support cluster names which have '-' in them (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[16:39:28] <icinga-wm>	 RECOVERY - Host logstash1008 is UP: PING OK - Packet loss = 0%, RTA = 37.81 ms
[16:39:50] <volans>	 godog: missing downtime?
[16:40:03] <godog>	 volans: indeed, fixed now
[16:40:45] <twentyafterfour>	 !log started phd again, seems to be working now without killing the db 
[16:40:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:18] <volans>	 godog: I keep mixing up which logstash is physical and which is ganeti, and I thought it was a bug in the reimage script ;)
[16:41:56] <wikibugs>	 (03CR) 10Mathew.onipe: "> Patch Set 2: Code-Review-1" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[16:45:30] <wikibugs>	 (03CR) 10Volans: elasticsearch: support cluster names which have '-' in them (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[16:46:07] <godog>	 volans: hehe, I'll bug you for sure if I come across a bug in wmf-reimage
[16:46:26] <wikibugs>	 (03PS4) 10Gehel: elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920)
[16:46:43] <wikibugs>	 (03CR) 10Gehel: elasticsearch: support cluster names which have '-' in them (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[16:49:24] <herron>	 !log migrating es shards away from logstash100[56] with "cluster.routing.allocation.exclude._name" : "logstash1005-production-logstash-eqiad,logstash1006-production-logstash-eqiad" T214608
[16:49:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:49:27] <stashbot>	 T214608: rack/setup/install logstash101[012].eqiad.wmnet - https://phabricator.wikimedia.org/T214608
[16:50:34] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[16:53:18] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m3 on db2078 is OK: OK slave_sql_lag Replication lag: 0.03 seconds
[16:53:20] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m3 on db2042 is OK: OK slave_sql_lag Replication lag: 0.13 seconds
[16:54:50] <godog>	 uugghh looks like squid got a bogus copy of rsyslog_8.38.0-1%7ebpo9%2b1wmf1_amd64.deb and thus a stretch reinstall is failing with hash mismatch
[16:56:31] <wikibugs>	 (03CR) 10CRusnov: "> Patch Set 5: Code-Review-1" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov)
[16:58:15] <godog>	 (purged)
[16:58:32] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[16:59:30] <wikibugs>	 (03CR) 10jenkins-bot: elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[17:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Your horoscope predicts another unfortunate Morning SWAT (Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T1700).
[17:00:05] <jouncebot>	 Zoranzoki21 and stephanebisson: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[17:00:16] <stephanebisson>	 hey
[17:00:37] <stephanebisson>	 I can SWAT
[17:01:10] <stephanebisson>	 Zoranzoki21... are you here under another name?
[17:01:37] <hauskatze>	 I don't think he is
[17:01:45] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: get_next_clusters_nodes raises ElasticsearchClusterError [software/spicerack] - 10https://gerrit.wikimedia.org/r/491803 (https://phabricator.wikimedia.org/T207920)
[17:02:01] <stephanebisson>	 I'll start with my patch. That'll give them some time to show up.
[17:02:19] <wikibugs>	 (03CR) 10Gehel: elasticsearch: get_next_clusters_nodes raises ElasticsearchClusterError (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491803 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[17:02:36] <wikibugs>	 (03PS1) 10BBlack: authdns: listen for local PROXY, min v6 threads [puppet] - 10https://gerrit.wikimedia.org/r/491804
[17:02:38] <wikibugs>	 (03PS1) 10BBlack: Lock memory for gdnsd [puppet] - 10https://gerrit.wikimedia.org/r/491805
[17:04:37] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] authdns: listen for local PROXY, min v6 threads [puppet] - 10https://gerrit.wikimedia.org/r/491804 (owner: 10BBlack)
[17:04:41] <wikibugs>	 10Operations, 10Traffic, 10Wikidata, 10serviceops, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10Addshore) Okay, so the question that I have now been asked is "why we can't simply do a DNS re-route without changing the owner". So, why can'...
[17:04:51] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Lock memory for gdnsd [puppet] - 10https://gerrit.wikimedia.org/r/491805 (owner: 10BBlack)
[17:04:51] <wikibugs>	 10Operations, 10Traffic, 10Wikidata, 10serviceops, and 3 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10Addshore)
[17:08:57] <hashar>	 !log contint1001: fix broken root ownership on zuul git deploy repo: sudo find /etc/zuul/wikimedia/.git -not -user zuul -exec chown zuul:zuul {} +
[17:08:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:49] <icinga-wm>	 RECOVERY - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.340 second response time
[17:14:48] <wikibugs>	 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10elukey) Updates from https://github.com/RadeonOpenCompute/ROCm/issues/714#issuecomment-465666946 are not encouraging, gfx701 is a dead end so w...
[17:15:01] <wikibugs>	 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10Jhernandez) >>! In T211881#4954470, @akosiaris wrote: >>>! In T211881#4954092, @Jhernandez wrote: >> The...
[17:17:26] <icinga-wm>	 PROBLEM - puppet last run on cloudvirtan1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[ip addr add 2620:0:861:118:10:64:20:46/64 dev eth0]
[17:18:53] <wikibugs>	 (03CR) 10Mathew.onipe: elasticsearch: get_next_clusters_nodes raises ElasticsearchClusterError (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491803 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[17:21:14] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (10elukey) In https://github.com/RadeonOpenCompute/ROCm/issues/714#issuecomment-465666946 the upstream developers of the AMD drivers told me that our GPU on stat1005 is b...
[17:22:36] <logmsgbot>	 !log sbisson@deploy1001 Synchronized php-1.33.0-wmf.18/extensions/Flow/modules/mw.flow.Initializer.js: SWAT: [[gerrit:491744|Unbreak reply clicks with existing widget]] (duration: 00m 58s)
[17:22:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:23:51] <stephanebisson>	 I'm done SWATing my patch.
[17:25:42] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: systemctl iterates explicitly on elasticsearch instances [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920)
[17:26:22] <wikibugs>	 10Operations, 10Traffic, 10Wikidata, 10serviceops, and 3 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10BBlack) There are different layers of "handing off" DNS management which are being conflated, but to run through them in order:  1) ** "Point...
[17:30:56] <wikibugs>	 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MSantos) I tried to run proton (which I could before this update) and c...
[17:34:42] <wikibugs>	 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MoritzMuehlenhoff) Where did you get the 72.0.3618.0 chromium build fro...
[17:42:02] <wikibugs>	 (03PS1) 10Filippo Giunchedi: logstash: remove cycle for apt::pin in collector [puppet] - 10https://gerrit.wikimedia.org/r/491811
[17:43:12] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1020 - https://phabricator.wikimedia.org/T194855 (10GTirloni) @Cmjohnson thank you!  RAID reconfigured with spares.  ` => ctrl slot=1 create type=ld drives=1I:1:5,1I:1:6,1I:1:7,1I:1:8,2I:1:1,2I:1:2,2I:1...
[17:43:24] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1020 - https://phabricator.wikimedia.org/T194855 (10GTirloni) 05Open→03Resolved
[17:43:32] <icinga-wm>	 RECOVERY - puppet last run on cloudvirtan1003 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[17:47:16] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "I think the bash command is missing a part." (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[17:49:01] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/491803 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[17:52:00] <wikibugs>	 (03PS2) 10CRusnov: Add dummy password for ganeti readonly user. [labs/private] - 10https://gerrit.wikimedia.org/r/491552
[17:52:28] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [labs/private] - 10https://gerrit.wikimedia.org/r/491552 (owner: 10CRusnov)
[17:53:33] <wikibugs>	 (03PS2) 10Gehel: elasticsearch: systemctl iterates explicitly on elasticsearch instances [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920)
[17:53:39] <wikibugs>	 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MSantos) @MoritzMuehlenhoff get it from google's chromium-browser-snaps...
[17:53:50] <wikibugs>	 (03PS6) 10CRusnov: Add ganeti read-only user deployment [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229)
[17:54:53] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] "PCC https://puppet-compiler.wmflabs.org/compiler1001/14753/" [puppet] - 10https://gerrit.wikimedia.org/r/491811 (owner: 10Filippo Giunchedi)
[17:56:08] <wikibugs>	 (03CR) 10CRusnov: [V: 03+2 C: 03+2] Add dummy password for ganeti readonly user. [labs/private] - 10https://gerrit.wikimedia.org/r/491552 (owner: 10CRusnov)
[17:57:58] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: Package citoid version 0.0.1 chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/491776
[18:00:49] <wikibugs>	 (03CR) 10Gehel: elasticsearch: systemctl iterates explicitly on elasticsearch instances (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[18:01:38] <wikibugs>	 (03PS1) 10Gehel: WIP: experimentation with type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/491812
[18:04:09] <wikibugs>	 (03PS7) 10CRusnov: Add ganeti read-only user deployment [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229)
[18:04:25] <logmsgbot>	 !log mobrovac@deploy1001 Started deploy [restbase/deploy@80f518c]: Remove VE request logging - T215956
[18:04:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:04:28] <stashbot>	 T215956: Consider stashing data-parsoid for VE  - https://phabricator.wikimedia.org/T215956
[18:04:56] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] WIP: experimentation with type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/491812 (owner: 10Gehel)
[18:05:16] <wikibugs>	 (03CR) 10Andrew Bogott: "This breaks puppet runs on cloud Trusty VMs:" [puppet] - 10https://gerrit.wikimedia.org/r/487888 (owner: 10Muehlenhoff)
[18:07:47] <wikibugs>	 (03PS1) 10Andrew Bogott: Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491816
[18:08:17] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491816 (owner: 10Andrew Bogott)
[18:08:31] <wikibugs>	 (03PS2) 10Andrew Bogott: Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491816
[18:09:02] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491816 (owner: 10Andrew Bogott)
[18:09:11] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "On second thought is not that simple" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[18:09:43] <wikibugs>	 (03CR) 10Framawiki: [C: 03+1] Disable mobile main page special casing on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491699 (https://phabricator.wikimedia.org/T216563) (owner: 10Zoranzoki21)
[18:09:51] <wikibugs>	 (03PS3) 10Andrew Bogott: Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491816
[18:10:50] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491816 (owner: 10Andrew Bogott)
[18:12:01] <wikibugs>	 (03CR) 10Gehel: elasticsearch: systemctl iterates explicitly on elasticsearch instances (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[18:16:48] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Add the option of postprocessing backups [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/491818 (https://phabricator.wikimedia.org/T210292)
[18:17:10] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: Add the option of postprocessing backups [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/491818 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo)
[18:19:40] <logmsgbot>	 !log fdans@deploy1001 Started deploy [analytics/refinery@ccf837e]: deploying refinery for new wikis and changes in scripts
[18:19:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:20:15] <wikibugs>	 (03CR) 10Framawiki: "It is common practice to respect when possible and not forgotten the same numbers in the configurations between wikis." (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491054 (https://phabricator.wikimedia.org/T216322) (owner: 10Ammarpad)
[18:24:44] <logmsgbot>	 !log mobrovac@deploy1001 Finished deploy [restbase/deploy@80f518c]: Remove VE request logging - T215956 (duration: 20m 19s)
[18:24:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:24:49] <stashbot>	 T215956: Consider stashing data-parsoid for VE  - https://phabricator.wikimedia.org/T215956
[18:26:37] <wikibugs>	 (03PS2) 10Elukey: profile::analytics::refinery: add a wrapper for analytics-mysql [puppet] - 10https://gerrit.wikimedia.org/r/491528 (https://phabricator.wikimedia.org/T212386)
[18:30:02] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] profile::analytics::refinery: add a wrapper for analytics-mysql [puppet] - 10https://gerrit.wikimedia.org/r/491528 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey)
[18:30:53] <logmsgbot>	 !log fdans@deploy1001 Finished deploy [analytics/refinery@ccf837e]: deploying refinery for new wikis and changes in scripts (duration: 11m 13s)
[18:30:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:34:13] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: WIP: Fix iteration of secret values in all deployments [deployment-charts] - 10https://gerrit.wikimedia.org/r/491821
[18:38:05] <wikibugs>	 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MoritzMuehlenhoff) >>! In T216493#4969195, @MSantos wrote: > @MoritzMue...
[18:39:13] <wikibugs>	 (03PS1) 10Zoranzoki21: Add img.raremaps.com at wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491823 (https://phabricator.wikimedia.org/T216638)
[18:39:44] <wikibugs>	 (03PS1) 10Elukey: Add profile::analytics::refinery to notebook100[3,4] and stat1006 [puppet] - 10https://gerrit.wikimedia.org/r/491824 (https://phabricator.wikimedia.org/T212386)
[18:40:46] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] "Also need the change in scap hosts" [puppet] - 10https://gerrit.wikimedia.org/r/491824 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey)
[18:41:34] <wikibugs>	 (03PS1) 10GTirloni: cloudvirt1020: Network config [puppet] - 10https://gerrit.wikimedia.org/r/491825 (https://phabricator.wikimedia.org/T193264)
[18:41:57] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] debian: add dh-python/pybuild [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491770 (owner: 10Filippo Giunchedi)
[18:42:15] <wikibugs>	 (03CR) 10GTirloni: [C: 03+2] cloudvirt1020: Network config [puppet] - 10https://gerrit.wikimedia.org/r/491825 (https://phabricator.wikimedia.org/T193264) (owner: 10GTirloni)
[18:42:26] <wikibugs>	 (03CR) 10Jcrespo: "Not finished, just FYI. This will allow to retry the statistics gathering and be the second execution to finish postprocessing after trans" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/491818 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo)
[18:43:02] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] Add missing metrics help text, required for prometheus 2.0 [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491772 (owner: 10Filippo Giunchedi)
[18:45:38] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] Add missing metrics help text [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491771 (https://phabricator.wikimedia.org/T216253) (owner: 10Filippo Giunchedi)
[18:45:59] <wikibugs>	 (03PS2) 10Elukey: Add profile::analytics::refinery to notebook100[3,4] and stat1006 [puppet] - 10https://gerrit.wikimedia.org/r/491824 (https://phabricator.wikimedia.org/T212386)
[18:47:25] <Krenair>	 fyi, report of a network problem in OTRS #2019022010008102
[18:47:27] <wikibugs>	 (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/14758/" [puppet] - 10https://gerrit.wikimedia.org/r/491824 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey)
[18:47:43] <Krenair>	 (just a minor report, might be nothing)
[18:47:50] <wikibugs>	 (03PS1) 10Zoranzoki21: Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491826 (https://phabricator.wikimedia.org/T216642)
[18:48:28] <wikibugs>	 (03PS2) 10Zoranzoki21: Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491826 (https://phabricator.wikimedia.org/T216642)
[18:48:49] <wikibugs>	 (03CR) 10Muehlenhoff: Add setup.py and tox.ini (031 comment) [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491768 (https://phabricator.wikimedia.org/T216253) (owner: 10Filippo Giunchedi)
[18:49:12] <wikibugs>	 (03CR) 10Elukey: "scap change in https://gerrit.wikimedia.org/r/#/c/analytics/refinery/scap/+/491827/" [puppet] - 10https://gerrit.wikimedia.org/r/491824 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey)
[18:51:55] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1009 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64
[18:52:02] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good (that print statement was some debugging leftover)" [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491769 (owner: 10Filippo Giunchedi)
[18:58:17] <wikibugs>	 (03PS1) 10CRusnov: (hopefully) get the dummy hiera key in the right place. [labs/private] - 10https://gerrit.wikimedia.org/r/491830
[19:00:04] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T1900)
[19:01:41] <wikibugs>	 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1018 - https://phabricator.wikimedia.org/T216004 (10GTirloni) All good, thank you!
[19:01:47] <wikibugs>	 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1018 - https://phabricator.wikimedia.org/T216004 (10GTirloni) 05Open→03Resolved
[19:10:31] <wikibugs>	 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MSantos)
[19:14:06] <wikibugs>	 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MSantos) >>! In T216493#4969393, @MoritzMuehlenhoff wrote: > [...] I th...
[19:14:35] <wikibugs>	 10Puppet: hiera_lookup: Allow query against checkout of labs/private in addition to checkout of operations/puppet - https://phabricator.wikimedia.org/T216647 (10crusnov)
[19:17:49] <wikibugs>	 (03PS3) 10Zoranzoki21: Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491826 (https://phabricator.wikimedia.org/T216642)
[19:23:30] <wikibugs>	 (03PS1) 10Andrew Bogott: imagemagick: Resolve version conflicts between toolforge and prod [puppet] - 10https://gerrit.wikimedia.org/r/491837 (https://phabricator.wikimedia.org/T216506)
[19:27:49] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] imagemagick: Resolve version conflicts between toolforge and prod [puppet] - 10https://gerrit.wikimedia.org/r/491837 (https://phabricator.wikimedia.org/T216506) (owner: 10Andrew Bogott)
[19:33:22] <icinga-wm>	 ACKNOWLEDGEMENT - HP RAID on cloudvirt1020 is CRITICAL: CRITICAL: Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:1:1, 2I:1:2, 2I:1:3, 2I:1:4, 2I:2:1, 2I:2:2 - Controller: OK - Battery count: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T216649
[19:33:25] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1020 - https://phabricator.wikimedia.org/T216649 (10ops-monitoring-bot)
[19:42:26] <wikibugs>	 (03PS2) 10Andrew Bogott: imagemagick: Resolve version conflicts between toolforge and prod [puppet] - 10https://gerrit.wikimedia.org/r/491837 (https://phabricator.wikimedia.org/T216506)
[19:56:50] <wikibugs>	 (03PS2) 10Gehel: WIP: experimentation with type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/491812
[20:00:04] <jouncebot>	 thcipriani: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train - Americas version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T2000).
[20:00:21] * thcipriani train
[20:00:44] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] WIP: experimentation with type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/491812 (owner: 10Gehel)
[20:14:01] <logmsgbot>	 !log thcipriani@deploy1001 Synchronized php-1.33.0-wmf.18/extensions/EventBus/includes/EventBusRCFeedEngine.php: [[gerrit:491845|Check for eventServiceName in config before accessing]] T216561 (duration: 00m 55s)
[20:14:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:14:04] <stashbot>	 T216561: extensions/EventBus/includes/EventBusRCFeedEngine.php:45 PHP Notice: Undefined index: eventServiceName - https://phabricator.wikimedia.org/T216561
[20:23:15] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (10EBernhardson) Another thing to take away from the upstream response is that debian is unsupported. I can't imagine deploying ubuntu to a single machine will be an acce...
[20:29:14] <wikibugs>	 (03PS1) 10Thcipriani: group1 wikis to 1.33.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491849
[20:29:16] <wikibugs>	 (03CR) 10Thcipriani: [C: 03+2] group1 wikis to 1.33.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491849 (owner: 10Thcipriani)
[20:30:55] <wikibugs>	 (03Merged) 10jenkins-bot: group1 wikis to 1.33.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491849 (owner: 10Thcipriani)
[20:31:12] <wikibugs>	 (03CR) 10jenkins-bot: group1 wikis to 1.33.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491849 (owner: 10Thcipriani)
[20:32:45] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: add script to execute systemctl on each elasticsearch instance [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920)
[20:33:38] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: add script to execute systemctl on each elasticsearch instance [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[20:33:49] <logmsgbot>	 !log thcipriani@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.18
[20:34:43] <logmsgbot>	 !log thcipriani@deploy1001 Synchronized php: group1 wikis to 1.33.0-wmf.18 (duration: 00m 53s)
[20:35:50] <stashbot>	 thcipriani@deploy1001: Failed to log message to wiki. Somebody should check the error logs.
[20:35:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:38:37] <wikibugs>	 (03PS2) 10Gehel: elasticsearch: add script to execute systemctl on each elasticsearch instance [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920)
[20:39:12] <wikibugs>	 (03PS3) 10Gehel: elasticsearch: add script to execute systemctl on each elasticsearch instance [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920)
[20:39:21] <icinga-wm>	 PROBLEM - Apache HTTP on mw1240 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:39:37] <wikibugs>	 (03PS4) 10Gehel: elasticsearch: add script to execute systemctl on each elasticsearch instance [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920)
[20:39:53] <wikibugs>	 (03CR) 10Framawiki: "I've tested that conf on live instance and it works as expected, no error found." [puppet] - 10https://gerrit.wikimedia.org/r/491377 (https://phabricator.wikimedia.org/T214637) (owner: 10Framawiki)
[20:40:27] <icinga-wm>	 RECOVERY - Apache HTTP on mw1240 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.101 second response time
[20:41:45] <wikibugs>	 (03PS4) 10Framawiki: quarry: Setup CSP http header [puppet] - 10https://gerrit.wikimedia.org/r/491377 (https://phabricator.wikimedia.org/T214637)
[20:42:51] <wikibugs>	 (03PS3) 10Gehel: elasticsearch: systemctl iterates explicitly on elasticsearch instances [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920)
[20:43:31] <wikibugs>	 (03CR) 10DCausse: "I'm a bit late on this but thanks for shipping it!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491231 (https://phabricator.wikimedia.org/T215969) (owner: 10DCausse)
[20:45:21] <wikibugs>	 (03PS5) 10Gehel: elasticsearch: add script to execute systemctl on each elasticsearch instance [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920)
[20:46:22] <wikibugs>	 (03CR) 10Gehel: elasticsearch: add script to execute systemctl on each elasticsearch instance (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[20:47:05] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: systemctl iterates explicitly on elasticsearch instances [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel)
[20:47:19] <wikibugs>	 (03PS9) 10Eevans: Initial configuration for session storage service [puppet] - 10https://gerrit.wikimedia.org/r/487885 (https://phabricator.wikimedia.org/T215883)
[20:48:44] <wikibugs>	 (03PS4) 10Gehel: elasticsearch: systemctl iterates explicitly on elasticsearch instances [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920)
[21:00:04] <jouncebot>	 cscott, arlolra, subbu, bearND, halfak, and Amir1: My dear minions, it's time we take the moon! Just kidding. Time for Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T2100).
[21:02:47] <wikibugs>	 (03PS10) 10Eevans: Initial configuration for session storage service [puppet] - 10https://gerrit.wikimedia.org/r/487885 (https://phabricator.wikimedia.org/T215883)
[21:08:36] <_joe_>	 !log rolling restart of php-fpm to catch up with the tideways change
[21:08:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:09:29] <wikibugs>	 (03CR) 10Herron: [C: 03+1] (hopefully) get the dummy hiera key in the right place. [labs/private] - 10https://gerrit.wikimedia.org/r/491830 (owner: 10CRusnov)
[21:09:46] <wikibugs>	 (03CR) 10CRusnov: [V: 03+2 C: 03+2] (hopefully) get the dummy hiera key in the right place. [labs/private] - 10https://gerrit.wikimedia.org/r/491830 (owner: 10CRusnov)
[21:12:49] <wikibugs>	 (03CR) 10Eevans: "[PC output](http://puppet-compiler.wmflabs.org/14763/) here.  The compile fails because of missing secrets (which seems...right).  Aside f" [puppet] - 10https://gerrit.wikimedia.org/r/487885 (https://phabricator.wikimedia.org/T215883) (owner: 10Eevans)
[21:19:12] <logmsgbot>	 !log arlolra@deploy1001 Started deploy [parsoid/deploy@c4574d1]: Updating Parsoid to 9b204a0
[21:19:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:19:19] <wikibugs>	 (03PS1) 10Ottomata: Set cors to false for eventgate-analytics node service chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/491857 (https://phabricator.wikimedia.org/T208251)
[21:19:46] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] Set cors to false for eventgate-analytics node service chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/491857 (https://phabricator.wikimedia.org/T208251) (owner: 10Ottomata)
[21:27:32] <wikibugs>	 (03CR) 10CRusnov: [V: 03+1] "Successfully built with dummy password, produces the file expected." [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov)
[21:28:46] <logmsgbot>	 !log arlolra@deploy1001 Finished deploy [parsoid/deploy@c4574d1]: Updating Parsoid to 9b204a0 (duration: 09m 33s)
[21:28:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:31:33] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): cloudvirt1009: upgrade to 10G - https://phabricator.wikimedia.org/T216324 (Andrew)
[21:34:11] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): cloudvirt1009: upgrade to 10G - https://phabricator.wikimedia.org/T216324 (RobH) a:RobH→Cmjohnson So yeah earlier we tried to remotely enter bios and enable the 10G nic and failed (it requires crash cart.)  So this is ready for...
[21:40:54] <icinga-wm>	 PROBLEM - ensure kvm processes are running on labvirt1008 is CRITICAL: PROCS CRITICAL: 0 processes with regex args /usr/bin/kvm
[21:43:22] <icinga-wm>	 RECOVERY - ensure kvm processes are running on labvirt1008 is OK: PROCS OK: 1 process with regex args /usr/bin/kvm
[21:46:16] <arlolra>	 !log Updated Parsoid to 9b204a0 (T153080, T169975, T215824)
[21:46:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:46:23] <stashbot>	 T153080: Parse images synchronously without making imageinfo requests and use a final postprocessing pass to fixup image HTML - https://phabricator.wikimedia.org/T153080
[21:46:23] <stashbot>	 T215824: AddMediaInfo pass isn't robust to link-in-link - https://phabricator.wikimedia.org/T215824
[21:46:23] <stashbot>	 T169975: Missing images render as broken img tags, not redlinks - https://phabricator.wikimedia.org/T169975
[21:54:59] <wikibugs>	 (PS1) Ottomata: [WIP] Set up DNS for eventgate-analytics [dns] - https://gerrit.wikimedia.org/r/491860 (https://phabricator.wikimedia.org/T211247)
[21:55:20] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Set up DNS for eventgate-analytics [dns] - https://gerrit.wikimedia.org/r/491860 (https://phabricator.wikimedia.org/T211247) (owner: Ottomata)
[21:55:28] <wikibugs>	 (PS1) Ottomata: [WIP] Set up eventgate-analytics.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/491861 (https://phabricator.wikimedia.org/T211247)
[21:56:01] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Set up eventgate-analytics.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/491861 (https://phabricator.wikimedia.org/T211247) (owner: Ottomata)
[21:56:15] <wikibugs>	 (CR) Volans: [C: -1] "Mostly ok, few nitpicks/typos inline" (6 comments) [puppet] - https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) (owner: Gehel)
[21:56:28] <wikibugs>	 (PS2) Ottomata: [WIP] Set up DNS for eventgate-analytics [dns] - https://gerrit.wikimedia.org/r/491860 (https://phabricator.wikimedia.org/T211247)
[21:56:47] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Set up DNS for eventgate-analytics [dns] - https://gerrit.wikimedia.org/r/491860 (https://phabricator.wikimedia.org/T211247) (owner: Ottomata)
[21:57:18] <wikibugs>	 (PS3) Ottomata: [WIP] Set up DNS for eventgate-analytics [dns] - https://gerrit.wikimedia.org/r/491860 (https://phabricator.wikimedia.org/T211247)
[21:57:58] <wikibugs>	 (PS2) Ottomata: [WIP] Set up eventgate-analytics.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/491861 (https://phabricator.wikimedia.org/T211247)
[21:58:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Set up eventgate-analytics.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/491861 (https://phabricator.wikimedia.org/T211247) (owner: Ottomata)
[22:08:18] <wikibugs>	 (PS3) Ottomata: [WIP] Set up eventgate-analytics.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/491861 (https://phabricator.wikimedia.org/T211247)
[22:44:04] <Krenair>	 Telia is a network Wikimedia peers with, right?
[22:44:23] <wikibugs>	 Operations, Mail, WMF-Legal: Tracking down gary@ and redirecting it to tsops@ - https://phabricator.wikimedia.org/T210464 (bcampbell) @Dzahn I just added pat@, gary@, and box6699@ as aliases to Google Group tsops@wikimedia.org. You should be able to delete on your side now.
[22:47:46] <wikibugs>	 Operations, Wikimedia-Logstash, User-fgiunchedi, User-herron: Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6) - https://phabricator.wikimedia.org/T213157 (RobH)
[22:55:10] <wikibugs>	 (CR) DCausse: elasticsearch: add script to execute systemctl on each elasticsearch instance (1 comment) [puppet] - https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) (owner: Gehel)
[23:08:51] <wikibugs>	 (CR) Zhuyifei1999: [C: +2] Mount /mnt/nfs into Kuberntes pods [software/tools-webservice] - https://gerrit.wikimedia.org/r/491397 (https://phabricator.wikimedia.org/T193646) (owner: BryanDavis)
[23:08:53] <wikibugs>	 (CR) Zhuyifei1999: [C: +2] Set custom mime-types [software/tools-webservice] - https://gerrit.wikimedia.org/r/489409 (https://phabricator.wikimedia.org/T178601) (owner: BryanDavis)
[23:09:29] <wikibugs>	 (Merged) jenkins-bot: Mount /mnt/nfs into Kuberntes pods [software/tools-webservice] - https://gerrit.wikimedia.org/r/491397 (https://phabricator.wikimedia.org/T193646) (owner: BryanDavis)
[23:09:46] <wikibugs>	 (Merged) jenkins-bot: Set custom mime-types [software/tools-webservice] - https://gerrit.wikimedia.org/r/489409 (https://phabricator.wikimedia.org/T178601) (owner: BryanDavis)
[23:28:26] <wikibugs>	 (PS1) Smalyshev: Turn off proxy_intercept_errors for nginx [puppet] - https://gerrit.wikimedia.org/r/491870 (https://phabricator.wikimedia.org/T214032)
[23:31:46] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1020 - https://phabricator.wikimedia.org/T216649 (Andrew) This is still complaining about the battery :(
[23:43:01] <wikibugs>	 (PS1) BryanDavis: Always create /var/run/lighttpd/ before chmod [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/491873
[23:43:55] <wikibugs>	 (CR) Zhuyifei1999: [C: +2] Always create /var/run/lighttpd/ before chmod [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/491873 (owner: BryanDavis)
[23:44:11] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service, MW-1.33-notes (1.33.0-wmf.2; 2018-10-30), User-Addshore: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (Smalyshev) Open→Resolved a:Smalyshev Doesn't seem to happen anymore,...
[23:44:17] <wikibugs>	 (Merged) jenkins-bot: Always create /var/run/lighttpd/ before chmod [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/491873 (owner: BryanDavis)
[23:57:52] <logmsgbot>	 !log ppchelko@deploy1001 Started deploy [changeprop/deploy@5e4486a]: Purge varnish on revision restrictions
[23:57:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:59:07] <wikibugs>	 Operations, Traffic: Content purges are unreliable - https://phabricator.wikimedia.org/T133821 (mobrovac)
[23:59:15] <logmsgbot>	 !log ppchelko@deploy1001 Finished deploy [changeprop/deploy@5e4486a]: Purge varnish on revision restrictions (duration: 01m 23s)
[23:59:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log