[00:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T0000). [00:00:04] ebernhardson: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [00:00:59] just me, i can deploy [00:01:47] ebernhardson: hmm [00:01:48] (03PS2) 10EBernhardson: [cirrus] reduce master timeout to 30s [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491231 (https://phabricator.wikimedia.org/T215969) (owner: 10DCausse) [00:01:53] i think i must have put mine in the wrong place.. [00:02:12] indeed.. i put it down for thursday. @ebernhardson mind if I move it to now? [00:02:14] (03CR) 10EBernhardson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491231 (https://phabricator.wikimedia.org/T215969) (owner: 10DCausse) [00:02:20] jdlrobson: sure, we can ship it too [00:02:29] I sincerely hate the deploy table wikitext.. [00:03:10] jdlrobson: i always ctrl-f for '19 16' to find the 16:00 entry for the 19th (today) [00:03:11] @ebernhardson okay im in... 
https://wikitech.wikimedia.org/wiki/Deployments#Wednesday,_February_20 [00:03:14] (03Merged) 10jenkins-bot: [cirrus] reduce master timeout to 30s [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491231 (https://phabricator.wikimedia.org/T215969) (owner: 10DCausse) [00:03:23] Yeh I got confused though and grepped for utc time :/ [00:03:34] i suppose it displays in utc, makes sense :)_ [00:05:33] !log ebernhardson@deploy1001 Synchronized wmf-config/CirrusSearch-production.php: SWAT T215969 Return cirrussearch master timeout back to the default value (duration: 00m 57s) [00:05:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:05:35] T215969: Measure mutation latency across the newly split elasticsearch clusters - https://phabricator.wikimedia.org/T215969 [00:12:51] (03CR) 10jenkins-bot: [cirrus] reduce master timeout to 30s [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491231 (https://phabricator.wikimedia.org/T215969) (owner: 10DCausse) [00:13:12] ebernhardson: ready when you are [00:17:38] they should be on mwdebug1001 now [00:18:26] (on it) [00:18:46] ebernhardson: both changes or just 1 of them? [00:21:02] ebernhardson: you can sync now.. 
assuming that's both, as it looks like it's on both :) [00:21:31] jdlrobson: right, it's on both branches [00:21:36] shipping it [00:22:59] thanks <3 [00:23:24] !log ebernhardson@deploy1001 Synchronized php-1.33.0-wmf.18/skins/MinervaNeue/resources/skins.minerva.content.styles/lists.less: Revert switch to outside list style from ordered lists (duration: 00m 59s) [00:23:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:24:45] !log ebernhardson@deploy1001 Synchronized php-1.33.0-wmf.17/skins/MinervaNeue/resources/skins.minerva.content.styles/lists.less: Revert switch to outside list style from ordered lists (duration: 00m 52s) [00:24:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:26:10] jdlrobson: all synced out [00:44:35] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga2001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [00:44:47] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga2001 is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [00:44:49] Wikidata issues?
I'm getting "Request from [my IP] via cp1085 cp1085, Varnish XID 1061781934 [00:44:49] Error: 503, Backend fetch failed at Wed, 20 Feb 2019 00:42:58 GMT [00:44:51] PROBLEM - HTTP availability for Varnish at ulsfo on icinga2001 is CRITICAL: job=varnish-text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [00:44:53] PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga2001 is CRITICAL: cluster=cache_text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [00:45:02] what's this [00:45:27] PROBLEM - HTTP availability for Varnish at eqiad on icinga2001 is CRITICAL: job=varnish-text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [00:45:28] wikidata works for me [00:45:31] PROBLEM - HTTP availability for Varnish at esams on icinga2001 is CRITICAL: job=varnish-text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [00:45:43] yah wikidata is broken for me also [00:45:53] though [00:46:01] wikipedia is down for me [00:46:03] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga2001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [00:46:21] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on icinga2001 is CRITICAL: cluster=cache_text site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [00:46:39] RECOVERY - HTTP availability for Varnish at eqiad on icinga2001 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [00:46:41] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5 [00:46:46] funnily wikipedia is up for me [00:46:53] PROBLEM - HTTP availability for Varnish at codfw on icinga2001 is CRITICAL: job=varnish-text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [00:47:04] heh chaomodus wikidata works for me [00:47:05] PROBLEM - HTTP availability for Varnish at eqsin on icinga2001 is CRITICAL: job=varnish-text site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [00:47:14] so I'm guessing it depends on the data center? [00:47:29] well the alerts imply eqiad [00:47:38] oh i guess that [00:47:42] * paladox would be going through esams [00:47:44] there are some from other pops too [00:47:46] but https://en.wikipedia.org/w/index.php?title=Main_Page&action=edit is showing an error [00:47:53] PROBLEM - HTTP availability for Varnish at eqiad on icinga2001 is CRITICAL: job=varnish-text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [00:47:58] "Request from 2a00:23c4:ad14:9700:4d40:6f2a:d965:22d8 via cp1085 cp1085, Varnish XID 1067418549 [00:47:58] Error: 503, Backend fetch failed at Wed, 20 Feb 2019 00:46:28 GMT" [00:48:14] so varnish done exploded what do [00:49:39] wikipedia is back for me.
[00:50:06] wikidata is back for me just now [00:52:34] wikipedia is now slow [00:52:47] ie not loading now [00:52:51] chaomodus ^^ [00:53:05] wikidata seems snappy for me, i'll check out wikipedia [00:53:05] RECOVERY - HTTP availability for Varnish at codfw on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [00:53:17] RECOVERY - HTTP availability for Varnish at eqsin on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [00:53:20] seems fluid [00:53:26] maybe it was a network burp [00:53:42] oh, stupid safari, it works in chrome. [00:54:07] RECOVERY - HTTP availability for Varnish at esams on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [00:54:34] that graph looks ok again [00:55:01] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [00:57:05] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [00:57:09] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [00:57:11] RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga2001 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [00:57:47] RECOVERY - HTTP availability for Varnish at eqiad on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [00:58:07] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [00:58:25] RECOVERY - HTTP availability for Varnish at ulsfo on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [01:03:07] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10EBernhardson) Sorry for making everything confusing here, lets run with the assumption for now that the job runners can talk to clo... [01:05:05] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5 [01:32:39] (03CR) 10BryanDavis: "> LGTM. Shall I build + deploy this?" 
[software/tools-webservice] - 10https://gerrit.wikimedia.org/r/491397 (https://phabricator.wikimedia.org/T193646) (owner: 10BryanDavis) [01:52:30] !log mobrovac@deploy1001 Started deploy [restbase/deploy@751dc5c]: Temporarily collect VE request logs for T215956 [01:52:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:52:33] T215956: Consider stashing data-parsoid for VE - https://phabricator.wikimedia.org/T215956 [02:15:07] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@751dc5c]: Temporarily collect VE request logs for T215956 (duration: 22m 37s) [02:15:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:15:10] T215956: Consider stashing data-parsoid for VE - https://phabricator.wikimedia.org/T215956 [02:24:04] (03PS2) 10Andrew Bogott: cloudvirt1012: enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/490786 (https://phabricator.wikimedia.org/T216190) [02:25:11] (03CR) 10Andrew Bogott: [C: 03+2] cloudvirt1012: enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/490786 (https://phabricator.wikimedia.org/T216190) (owner: 10Andrew Bogott) [02:32:42] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudvirt1009: evaluate upgrading to 10G - https://phabricator.wikimedia.org/T216324 (10Andrew) Steps: [] Move host to a rack with 10G -- B2, B4 or B7 I believe [] Enable the 10G nic in the bios (note that we can not do this via mgmt; it... [03:57:34] 10Operations, 10Gerrit, 10serviceops: Gerrit loads very slowly - https://phabricator.wikimedia.org/T215855 (10Paladox) Thanks @hashar [04:45:27] !log add avoid-paths WIRESTAR-OPTICALTEL to cr2-eqdfw [04:45:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:58:03] PROBLEM - Debian mirror in sync with upstream on sodium is CRITICAL: /srv/mirrors/debian is over 14 hours old.
[05:10:25] 10Operations, 10Cloud-VPS, 10Traffic, 10netops, 10cloud-services-team (Kanban): Evaluate the possibility to add Juniper images to Openstack - https://phabricator.wikimedia.org/T180179 (10ayounsi) @aborrero Being able to create different L2 links between VMs would be ideal, but having them all in the same... [05:12:14] 10Operations, 10Operations-Software-Development: Netbox: cable termination names report - https://phabricator.wikimedia.org/T216469 (10ayounsi) No strong preferences, but indeed "Device X has Y miss-labelled" is an option. [05:55:13] (03CR) 10Elukey: Add analytics purge job for xmldumps on HDFS (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [06:08:18] (03PS1) 10Marostegui: db-eqiad.php: Depool db1119 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491682 (https://phabricator.wikimedia.org/T210713) [06:10:00] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1119 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491682 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [06:10:47] (03CR) 10Elukey: "> I understand your concern Luca. 
I also think it is likely to fail" [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [06:11:00] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1119 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491682 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [06:11:17] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1119 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491682 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [06:12:20] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1119 T210713 (duration: 01m 05s) [06:12:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:12:23] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [06:12:31] (03CR) 10Elukey: Add analytics purge job for xmldumps on HDFS (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [06:14:30] (03PS1) 10Marostegui: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491683 [06:16:14] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491683 (owner: 10Marostegui) [06:17:15] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491683 (owner: 10Marostegui) [06:18:24] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1109 for kernel and mysql upgrade (duration: 00m 52s) [06:18:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:18:51] !log Stop MySQL on db1109 for kernel and mysql upgrade [06:18:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:19:19] (03PS9) 10Elukey: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 
(https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [06:21:59] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491683 (owner: 10Marostegui) [06:22:48] (03PS10) 10Elukey: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [06:23:12] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491684 [06:26:21] (03PS11) 10Elukey: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [06:26:37] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Slowly repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491684 (owner: 10Marostegui) [06:27:41] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491684 (owner: 10Marostegui) [06:28:41] PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.190 second response time [06:28:47] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1109 after kernel upgrade (duration: 00m 52s) [06:28:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:29:08] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491687 [06:29:21] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[06:31:13] RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.662 second response time [06:31:51] (03PS12) 10Elukey: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [06:31:53] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational [06:33:00] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491684 (owner: 10Marostegui) [06:36:15] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491687 (owner: 10Marostegui) [06:37:11] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491687 (owner: 10Marostegui) [06:38:13] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1109 after kernel upgrade (duration: 00m 52s) [06:38:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:39:09] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/14745/an-coord1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [06:39:23] (03CR) 10Elukey: "Joal: let me know if I should merge or not :)" [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [06:40:41] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491688 [06:41:25] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491688 [06:42:25] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491688 (owner: 10Marostegui) [06:43:29] (03Merged) 10jenkins-bot: 
Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491688 (owner: 10Marostegui) [06:43:52] (03CR) 10Giuseppe Lavagetto: [C: 03+2] profile::mediawiki::php: install tideways-xhprof, remove tideways [puppet] - 10https://gerrit.wikimedia.org/r/491533 (https://phabricator.wikimedia.org/T176916) (owner: 10Giuseppe Lavagetto) [06:44:02] (03PS2) 10Giuseppe Lavagetto: profile::mediawiki::php: install tideways-xhprof, remove tideways [puppet] - 10https://gerrit.wikimedia.org/r/491533 (https://phabricator.wikimedia.org/T176916) [06:44:10] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491687 (owner: 10Marostegui) [06:44:12] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491688 (owner: 10Marostegui) [06:44:30] (03PS1) 10Marostegui: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491689 (https://phabricator.wikimedia.org/T210713) [06:44:32] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1119 T210713 (duration: 00m 51s) [06:44:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:44:35] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [06:45:31] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491689 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [06:46:34] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491689 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [06:47:38] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491690 [06:47:46] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1080 T210713 (duration: 00m 51s) 
[06:47:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:48:43] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491690 (owner: 10Marostegui) [06:49:45] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491690 (owner: 10Marostegui) [06:49:51] (03PS1) 10Elukey: superset: use cn for LDAP search (not uid) [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) [06:50:52] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic for db1109 after kernel upgrade (duration: 00m 52s) [06:50:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:51:12] (03CR) 10Giuseppe Lavagetto: [C: 03+2] profile: use register_shutdown_function [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491518 (https://phabricator.wikimedia.org/T176916) (owner: 10Giuseppe Lavagetto) [06:52:14] (03Merged) 10jenkins-bot: profile: use register_shutdown_function [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491518 (https://phabricator.wikimedia.org/T176916) (owner: 10Giuseppe Lavagetto) [06:54:51] !log oblivian@deploy1001 Synchronized wmf-config/profiler.php: Fix the tideways setup (duration: 00m 52s) [06:54:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:55:37] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491689 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [06:55:39] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491690 (owner: 10Marostegui) [06:55:41] (03CR) 10jenkins-bot: profile: use register_shutdown_function [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491518 (https://phabricator.wikimedia.org/T176916) (owner: 10Giuseppe 
Lavagetto) [07:04:49] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491692 [07:07:38] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491692 (owner: 10Marostegui) [07:08:36] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491692 (owner: 10Marostegui) [07:09:44] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic for db1109 after kernel upgrade (duration: 00m 52s) [07:09:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:09:58] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491693 [07:10:47] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491693 [07:11:49] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491693 (owner: 10Marostegui) [07:12:49] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491693 (owner: 10Marostegui) [07:13:51] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1080 T210713 (duration: 00m 52s) [07:13:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:13:54] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [07:14:05] !log Deploy schema change on s1 primary master (db1067) - T210713 [07:14:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:16:06] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491694 [07:17:58] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1109 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/491692 (owner: 10Marostegui) [07:18:00] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491693 (owner: 10Marostegui) [07:44:37] (03Abandoned) 10Elukey: Introduce profile::analytics::cluster::limits::statistics [puppet] - 10https://gerrit.wikimedia.org/r/488078 (https://phabricator.wikimedia.org/T212824) (owner: 10Elukey) [07:45:07] (03CR) 10Elukey: [C: 03+2] Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [07:45:14] (03PS13) 10Elukey: Add analytics purge job for xmldumps on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [07:45:46] !log installing gnupg2 updates on stretch [07:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:05:29] (03PS1) 10Joal: Correct analytics-drop-xmldumps systemd-timer name [puppet] - 10https://gerrit.wikimedia.org/r/491696 (https://phabricator.wikimedia.org/T216414) [08:06:30] elukey: --^ [08:22:29] (03PS1) 10Zoranzoki21: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491697 [08:22:31] (03PS1) 10Zoranzoki21: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491698 [08:22:33] (03PS1) 10Zoranzoki21: Disable mobile main page special casing on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491699 (https://phabricator.wikimedia.org/T216563) [08:22:38] (03CR) 10Elukey: [C: 03+2] Correct analytics-drop-xmldumps systemd-timer name [puppet] - 10https://gerrit.wikimedia.org/r/491696 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [08:22:52] (03Abandoned) 10Zoranzoki21: Merge branch 'master' of 
ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491698 (owner: 10Zoranzoki21) [08:22:58] (03Abandoned) 10Zoranzoki21: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491697 (owner: 10Zoranzoki21) [08:23:24] (03PS2) 10Zoranzoki21: Disable mobile main page special casing on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491699 (https://phabricator.wikimedia.org/T216563) [08:24:56] (03PS3) 10Zoranzoki21: Disable mobile main page special casing on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491699 (https://phabricator.wikimedia.org/T216563) [08:24:58] (03PS1) 10Zoranzoki21: Test change for problems with git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491700 [08:25:28] (03CR) 10Zoranzoki21: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491699 (https://phabricator.wikimedia.org/T216563) (owner: 10Zoranzoki21) [08:25:36] (03PS2) 10Zoranzoki21: Test change for problems with git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491700 [08:25:39] (03Abandoned) 10Zoranzoki21: Test change for problems with git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491700 (owner: 10Zoranzoki21) [08:34:03] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[08:34:34] this is me --^ [08:40:20] (03PS1) 10Zoranzoki21: Disabled mobile main page special casing on Serbian projects because it is unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491701 [08:40:52] (03PS4) 10Mathew.onipe: cloudelastic: Add cloudelastic configs [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) [08:41:06] (03CR) 10Mathew.onipe: cloudelastic: Add cloudelastic configs (039 comments) [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [08:44:26] (03Abandoned) 10Zoranzoki21: Disabled mobile main page special casing on Serbian projects because it is unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491701 (owner: 10Zoranzoki21) [08:44:50] 10Operations, 10Analytics, 10Product-Analytics, 10Patch-For-Review, 10User-Elukey: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (10elukey) Today I checked notebook1003 using the command `systemd-cgls memory`, that should show how the cgroups for memory setting... 
[08:48:47] !log powercycling rdb1001 for a test [08:48:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:00:04] (03PS1) 10Muehlenhoff: Update point of contact for one researcher [puppet] - 10https://gerrit.wikimedia.org/r/491710 [09:01:23] (03PS2) 10Muehlenhoff: profile::prometheus::nutcracker_exporter: Remove trusty support [puppet] - 10https://gerrit.wikimedia.org/r/490881 [09:01:32] (03CR) 10Muehlenhoff: [C: 03+2] Update point of contact for one researcher [puppet] - 10https://gerrit.wikimedia.org/r/491710 (owner: 10Muehlenhoff) [09:03:48] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Fully repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491694 (owner: 10Marostegui) [09:04:51] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491694 (owner: 10Marostegui) [09:05:35] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491694 (owner: 10Marostegui) [09:06:03] (03PS4) 10Gehel: Restore privileges to admin table after script [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos) [09:06:36] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1109 (duration: 00m 52s) [09:06:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:07:48] (03CR) 10Gehel: "puppet compiler looks good: https://puppet-compiler.wmflabs.org/compiler1001/14748/" [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos) [09:13:32] PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). 
[09:13:39] (03PS1) 10Marostegui: mariadb: Make staging not read-only [puppet] - 10https://gerrit.wikimedia.org/r/491713 [09:14:02] moritzm: ^ is that your +2 ? [09:15:15] damn, forgot to press ENTER, just merged [09:15:58] RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. [09:16:35] :) [09:16:46] (03CR) 10Mathew.onipe: [C: 03+1] Restore privileges to admin table after script [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos) [09:19:18] (03PS2) 10Marostegui: mariadb: Make staging not read-only [puppet] - 10https://gerrit.wikimedia.org/r/491713 [09:21:01] (03PS5) 10Gehel: Restore privileges to admin table after script [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos) [09:22:31] (03CR) 10Marostegui: "The compiler looks good: https://puppet-compiler.wmflabs.org/compiler1001/14750/" [puppet] - 10https://gerrit.wikimedia.org/r/491713 (owner: 10Marostegui) [09:23:09] (03CR) 10Gehel: [C: 03+2] Restore privileges to admin table after script [puppet] - 10https://gerrit.wikimedia.org/r/491399 (https://phabricator.wikimedia.org/T216466) (owner: 10MSantos) [09:25:37] (03PS3) 10Marostegui: mariadb: Make staging not read-only [puppet] - 10https://gerrit.wikimedia.org/r/491713 (https://phabricator.wikimedia.org/T210478) [09:29:35] (03PS1) 10Joal: Fix analytics-drop-xmldumps service [puppet] - 10https://gerrit.wikimedia.org/r/491717 (https://phabricator.wikimedia.org/T216414) [09:29:46] elukey: --^ hopefully the last one [09:31:13] (03CR) 10Gehel: [C: 03+2] Add remove_on_error parameter to icinga.hosts_downtimed() [software/spicerack] - 10https://gerrit.wikimedia.org/r/491526 (owner: 10Gehel) [09:32:02] (03CR) 10jenkins-bot: Add remove_on_error parameter to icinga.hosts_downtimed() [software/spicerack] - 10https://gerrit.wikimedia.org/r/491526 (owner: 10Gehel) [09:33:28] !log Deploy schema change on db2043 (s3 
codfw master), lag will be generated on s3 codfw - T210713 [09:33:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:33:31] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [09:33:33] joal: let's remove /usr/bin/python from there [09:33:50] ack elukey - pushing [09:35:21] (03PS2) 10Joal: Fix analytics-drop-xmldumps service [puppet] - 10https://gerrit.wikimedia.org/r/491717 (https://phabricator.wikimedia.org/T216414) [09:37:02] (03PS3) 10Elukey: Fix the mediawiki-drop-xmldumps-pages_meta_history timer [puppet] - 10https://gerrit.wikimedia.org/r/491717 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [09:38:03] (03CR) 10Elukey: [C: 03+2] Fix the mediawiki-drop-xmldumps-pages_meta_history timer [puppet] - 10https://gerrit.wikimedia.org/r/491717 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [09:39:09] (03CR) 10Jcrespo: [C: 03+1] mariadb: Make staging not read-only [puppet] - 10https://gerrit.wikimedia.org/r/491713 (https://phabricator.wikimedia.org/T210478) (owner: 10Marostegui) [09:39:37] (03PS4) 10Marostegui: mariadb: Make staging not read-only [puppet] - 10https://gerrit.wikimedia.org/r/491713 (https://phabricator.wikimedia.org/T210478) [09:40:42] (03CR) 10Marostegui: [C: 03+2] mariadb: Make staging not read-only [puppet] - 10https://gerrit.wikimedia.org/r/491713 (https://phabricator.wikimedia.org/T210478) (owner: 10Marostegui) [09:41:23] (03Abandoned) 10Marostegui: dbstore_multiinstance.pp: Specify read-only for staging [puppet] - 10https://gerrit.wikimedia.org/r/491408 (https://phabricator.wikimedia.org/T210478) (owner: 10Marostegui) [09:41:42] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational [09:42:31] 10Operations, 10Elasticsearch, 10Discovery-Search (Current work): Test spicerack elasticsearch module - https://phabricator.wikimedia.org/T207920 (10Gehel) Upgrade to elasticsearch 5.6.x was performed on 
relforge with spicerack, so testing is complete. There are always things to improve, but this is working... [09:44:34] (03CR) 10Elukey: [C: 03+1] profile::prometheus::nutcracker_exporter: Remove trusty support [puppet] - 10https://gerrit.wikimedia.org/r/490881 (owner: 10Muehlenhoff) [09:52:23] (03PS1) 10Gehel: elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 [09:52:28] 10Operations, 10Wikimedia-Logstash: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10fgiunchedi) >>! In T205856#4959426, @bd808 wrote: >> The plan has syslog + json as formatting, since that's what we use for logstash already and preserv... [09:52:38] (03PS1) 10Marostegui: instance.pp: Make read-only check use the variable [puppet] - 10https://gerrit.wikimedia.org/r/491720 [09:53:10] (03CR) 10Hashar: [C: 03+1] "Probably legit? ;)" [puppet] - 10https://gerrit.wikimedia.org/r/491276 (https://phabricator.wikimedia.org/T210706) (owner: 10Elukey) [09:53:31] (03PS3) 10Elukey: Deployment-prep: add cassandra/twcs scap repository [puppet] - 10https://gerrit.wikimedia.org/r/491276 (https://phabricator.wikimedia.org/T210706) [09:54:37] (03CR) 10Volans: elasticsearch: retry on all urllib3 exceptions (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 (owner: 10Gehel) [09:54:43] (03CR) 10Marostegui: "https://puppet-compiler.wmflabs.org/compiler1001/14751/" [puppet] - 10https://gerrit.wikimedia.org/r/491720 (owner: 10Marostegui) [09:56:04] (03CR) 10Marostegui: [C: 03+2] instance.pp: Make read-only check use the variable [puppet] - 10https://gerrit.wikimedia.org/r/491720 (owner: 10Marostegui) [09:57:54] !log installing systemd security updates on jessie hosts [09:57:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:02] 10Operations, 10Wikimedia-Logstash: Retire udp2log: onboard its producers and consumers to the logging 
pipeline - https://phabricator.wikimedia.org/T205856 (10fgiunchedi) >>! In T205856#4965359, @Ottomata wrote: > Qs: > > Are the logs sent using Monolog? > > Is there just one topic 'mwlog', or multiple, one... [10:01:18] (03CR) 10Alexandros Kosiaris: [C: 04-1] Add ganeti read-only user deployment (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [10:02:14] (03PS2) 10Gehel: elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 [10:02:37] (03CR) 10Gehel: elasticsearch: retry on all urllib3 exceptions (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 (owner: 10Gehel) [10:04:06] !log Deploy schema change on dbstore1004:3313 - T210713 [10:04:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:04:09] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [10:04:29] (03CR) 10Elukey: [C: 03+2] Deployment-prep: add cassandra/twcs scap repository [puppet] - 10https://gerrit.wikimedia.org/r/491276 (https://phabricator.wikimedia.org/T210706) (owner: 10Elukey) [10:04:36] (03PS4) 10Elukey: Deployment-prep: add cassandra/twcs scap repository [puppet] - 10https://gerrit.wikimedia.org/r/491276 (https://phabricator.wikimedia.org/T210706) [10:04:38] (03CR) 10Elukey: [V: 03+2 C: 03+2] Deployment-prep: add cassandra/twcs scap repository [puppet] - 10https://gerrit.wikimedia.org/r/491276 (https://phabricator.wikimedia.org/T210706) (owner: 10Elukey) [10:05:42] 10Operations: Integrate Stretch 9.8 point update - https://phabricator.wikimedia.org/T216384 (10MoritzMuehlenhoff) [10:07:57] (03PS5) 10Mathew.onipe: cloudelastic: Add cloudelastic configs [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) [10:32:12] PROBLEM - Disk space on prometheus2003 is CRITICAL: DISK CRITICAL - free space: /srv/prometheus/services 4996 
MB (2% inode=99%) [10:35:09] ugghh that's me [10:35:52] RECOVERY - Disk space on prometheus2003 is OK: DISK OK [10:36:56] !log Deploy schema change on db1095:3313 - T210713 [10:36:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:00] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [10:40:32] (03PS1) 10Elukey: Add pginer (WMF staff) to admin data.yml [puppet] - 10https://gerrit.wikimedia.org/r/491723 (https://phabricator.wikimedia.org/T211036) [10:41:10] (03PS3) 10Gehel: elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 [10:42:49] (03CR) 10DCausse: "settings look good to me but I think you now need to add a new role in modules/role/manifests/elasticsearch/cloudelastic.pp because none t" [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [10:52:17] (03PS1) 10Gehel: elasticsearch: raise logging level to ERROR for elasticsearch [software/spicerack] - 10https://gerrit.wikimedia.org/r/491725 [10:53:12] mmhh for some reason prometheus on bast3002 has restarted, currently recovering its storage [10:53:22] hence the UNKNOWNs on icinga [10:54:54] (03PS42) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) [10:55:49] godog: hmmh, the prometheus process is from 10:27 and at 10:25 I restarted systemd-journald for a sec update, I'm wondering if that's related [10:56:48] moritzm: could be, I'm not sure yet, though prometheus got started back up by puppet afaics [10:57:03] 10Operations, 10ops-eqiad, 10DBA: db1069 (x1 master) memory errors - https://phabricator.wikimedia.org/T201133 (10Marostegui) Just for the record ` db1069 Memory correctable errors -EDAC- WARNING 2019-02-20 10:45:24 2d 19h 28m 54s 3/3 2 ge 2 ` [10:57:46] 10Operations, 10monitoring, 10Goal, 10Patch-For-Review: Upgrade 
production prometheus-node-exporter to >= 0.16 - https://phabricator.wikimedia.org/T213708 (10fgiunchedi) Noticed this today on `bast3002`, probably harmless but needs investigation: ` Feb 20 10:26:50 bast3002 systemd[1]: Starting Collect ipm... [10:57:50] (03CR) 10Jbond: "ready for review" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [10:59:50] 10Operations, 10Gerrit, 10serviceops: Gerrit loads very slowly - https://phabricator.wikimedia.org/T215855 (10hashar) I have pasted P8073 content to [[ https://fastthread.io/ | fastthread.io ]]. It is an analyzer for Java thread dumps. https://fastthread.io/ft-thread-report.jsp?dumpId=1&oTxnId_value=c865888... [11:06:41] moritzm: looks like the last datapoints were around 10:24 so that would line up, however other processes seem fine and didn't restart [11:09:54] (03PS4) 10Volans: elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 (owner: 10Gehel) [11:09:56] the jessie-based prometheus servers are still TBD, we can see whether it reproes when systemd is upgraded there [11:10:22] PROBLEM - puppet last run on mc2022 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[initramfs-tools] [11:11:20] moritzm: kk, please let me know before so I can take a look too [11:14:27] (03CR) 10Alexandros Kosiaris: Introduce citoid helm chart (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/491523 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [11:15:54] (03PS6) 10Mathew.onipe: cloudelastic: Add cloudelastic configs [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) [11:16:09] (03CR) 10DCausse: "sorry ignore my previous comment" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [11:17:11] (03CR) 10Volans: [C: 03+2] elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 (owner: 10Gehel) [11:18:08] 10Operations, 10Gerrit, 10serviceops: Gerrit loads very slowly - https://phabricator.wikimedia.org/T215855 (10hashar) ` zcat /var/log/apache2/gerrit.wikimedia.org.https.access.log.9.gz|cut -b-13|sort|uniq -c 17821 2019-02-11T06 55594 2019-02-11T07 52925 2019-02-11T08 54292 2019-02-11T09 74124 2019-... 
[11:19:27] (03CR) 10Fsero: [C: 03+1] Introduce citoid helm chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/491523 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [11:21:29] (03Merged) 10jenkins-bot: elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 (owner: 10Gehel) [11:22:16] (03CR) 10jenkins-bot: elasticsearch: retry on all urllib3 exceptions [software/spicerack] - 10https://gerrit.wikimedia.org/r/491719 (owner: 10Gehel) [11:25:16] (03PS2) 10Volans: elasticsearch: raise logging level to ERROR for elasticsearch [software/spicerack] - 10https://gerrit.wikimedia.org/r/491725 (owner: 10Gehel) [11:28:50] !log rebuild and re-upload rsyslog_8.38.0-1~bpo9+1wmf1_amd64.changes to apt.wikimedia.org/stretch-wikimedia to have mmkubernetes package [11:28:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:33:18] 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10Jhernandez) We still haven't created the herald rule to tag all proton... 
[11:34:17] (03CR) 10Volans: [C: 03+2] elasticsearch: raise logging level to ERROR for elasticsearch [software/spicerack] - 10https://gerrit.wikimedia.org/r/491725 (owner: 10Gehel) [11:36:28] PROBLEM - MariaDB Slave Lag: s5 on dbstore2001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 407.65 seconds [11:36:30] RECOVERY - puppet last run on mc2022 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:38:27] (03Merged) 10jenkins-bot: elasticsearch: raise logging level to ERROR for elasticsearch [software/spicerack] - 10https://gerrit.wikimedia.org/r/491725 (owner: 10Gehel) [11:39:12] (03CR) 10jenkins-bot: elasticsearch: raise logging level to ERROR for elasticsearch [software/spicerack] - 10https://gerrit.wikimedia.org/r/491725 (owner: 10Gehel) [11:41:26] RECOVERY - MariaDB Slave Lag: s5 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 40.47 seconds [11:41:34] (03PS3) 10Zoranzoki21: IS.php: Add wgProofreadPagePageJoiner, set it per default on '-' and at zhwikisource on __PAGEJOIN__ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482502 (https://phabricator.wikimedia.org/T205826) [11:41:53] (03PS4) 10Zoranzoki21: Add category at wgGettingStartedExcludedCategories for srwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482534 [11:41:58] (03PS5) 10Zoranzoki21: Add categories for all Croatian projects at wmgBabelMainCategory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482548 [11:46:18] !log rolling restarts for hhvm in codfw [11:46:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:25] (03PS2) 10Arturo Borrero Gonzalez: aptrepo: pull openstack mitaka packages into reprepro [puppet] - 10https://gerrit.wikimedia.org/r/491558 (https://phabricator.wikimedia.org/T216497) [11:48:28] (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.0.17 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491733 [11:49:16] 10Operations, 10Proton, 10Core Platform Team Backlog 
(Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MSantos) a:03MSantos [11:54:54] (03CR) 10Muehlenhoff: [C: 03+1] Add pginer (WMF staff) to admin data.yml [puppet] - 10https://gerrit.wikimedia.org/r/491723 (https://phabricator.wikimedia.org/T211036) (owner: 10Elukey) [11:55:01] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/491723 (https://phabricator.wikimedia.org/T211036) (owner: 10Elukey) [11:56:10] (03CR) 10Elukey: [C: 03+2] Add pginer (WMF staff) to admin data.yml [puppet] - 10https://gerrit.wikimedia.org/r/491723 (https://phabricator.wikimedia.org/T211036) (owner: 10Elukey) [11:56:16] thanks! [11:57:08] 10Operations, 10Scap, 10Release-Engineering-Team (Kanban): Remove trusty-specific hacks from logstash_checker.py - https://phabricator.wikimedia.org/T216380 (10MoritzMuehlenhoff) p:05Triage→03Low [11:57:29] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.0.17 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491733 (owner: 10Volans) [12:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: My dear minions, it's time we take the moon! Just kidding. Time for European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T1200). [12:00:04] Zoranzoki21: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:15] Here \o/ [12:00:45] Zoranzoki21: I can swat today, but in 5 minutes, and I have to go in 25 minutes, so I'll do my best today [12:00:52] 2-3 patches probably [12:00:59] is it known that the RC IRC feeds have stopped? [12:01:07] ? 
[12:01:39] wikimedia IRC recent changes have stopped [12:01:44] (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.17 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491733 (owner: 10Volans) [12:01:57] Ok, if you have so little time, do 491699 and mwscript namespaceDupes.php on shwiki [12:02:13] 10Operations, 10IRCecho: irc.wikimedia.org RC feed has stopped - https://phabricator.wikimedia.org/T216607 (10Bawolff) [12:02:30] (03CR) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.17 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491733 (owner: 10Volans) [12:02:33] I think this probably warrants UBN [12:03:04] 10Operations, 10IRCecho: irc.wikimedia.org RC feed has stopped - https://phabricator.wikimedia.org/T216607 (10Bawolff) p:05Triage→03Unbreak! [12:04:06] (03PS1) 10Volans: Upstream release v0.0.17 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/491734 [12:04:57] fsero ^^^ [12:05:40] What's happening with SWAT? [12:08:18] It's up now, bawolff [12:08:20] !log restarted ircecho on kraz.wikimedia.org [12:08:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:50] So i guess there's no Icinga monitoring of ircecho ;) [12:08:54] there is [12:09:12] but it looks like it only checks if the service is running (it was), not if the service is doing anything (it wasn't) [12:09:21] i guess moritzm restarted it; i was looking into it and i saw the restart [12:09:34] (03CR) 10Addshore: [C: 03+1] Change Special:ItemDisambiguation from blank special page to disabled page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491237 (https://phabricator.wikimedia.org/T216397) (owner: 10Ladsgroup) [12:09:41] Anyways, thanks all :) [12:09:51] Zoranzoki21: sorry, I'm a bit late today [12:09:53] meta and enwikisource are back, awaiting wikimaniawiki [12:09:59] I guess there will be time for 1-2 patches today [12:10:01] Wed [06:07:56 AM]<-- rc-pmtpa has quit (Remote host closed the connection)
[12:10:01] Wed [06:07:56 AM]--> rc-pmtpa (~rc-pmtpa@special.user) has joined #en.wikipedia [12:10:06] what's the most urgent ones? [12:10:12] in the order of calendar? [12:10:15] en's up [12:10:19] (03CR) 10Addshore: [C: 03+1] "Wikibase.php should be synced first" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491506 (https://phabricator.wikimedia.org/T213713) (owner: 10Ladsgroup) [12:10:26] mwscript and 491699 [12:10:30] thks all [12:10:30] (03PS2) 10Addshore: Drop obsolete Wikibase configs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491506 (https://phabricator.wikimedia.org/T213713) (owner: 10Ladsgroup) [12:10:39] sDrewth: It will probably rejoin the channel upon the first edit to that wiki (If i remember how it works correctly) [12:10:51] k [12:10:58] * sDrewth goes to prod [12:11:12] sDrewth: thanks for the report, I'll follow up on the Phab task [12:11:46] Zoranzoki21: ah, so namespaceDupes is a separate thing, and should be done first? [12:11:56] Ok [12:12:00] (03CR) 10Volans: [C: 03+2] Upstream release v0.0.17 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/491734 (owner: 10Volans) [12:12:36] bawolff, totally correct, memory prize awarded [12:13:22] thks moritzm, really appreciate the quick resolution [12:16:09] 10Operations, 10IRCecho: irc.wikimedia.org RC feed has stopped - https://phabricator.wikimedia.org/T216607 (10fsero) ircd was restarted and it seems to be working again. We should investigate why it stopped; it might be related to the systemd upgrade? [12:16:15] (03Merged) 10jenkins-bot: Upstream release v0.0.17 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/491734 (owner: 10Volans) [12:16:33] 10Operations, 10IRCecho: irc.wikimedia.org RC feed has stopped - https://phabricator.wikimedia.org/T216607 (10fsero) p:05Unbreak!→03Normal [12:18:03] 10Operations, 10IRCecho: irc.wikimedia.org RC feed has stopped - https://phabricator.wikimedia.org/T216607 (10MoritzMuehlenhoff) Thanks for the report.
This was caused by a restart of systemd-journald which was necessary to deploy a security update for systemd. The immediate error has been fixed by a restart o... [12:18:30] Zoranzoki21: ran the script https://phabricator.wikimedia.org/T216524#4968321 [12:18:47] that's pretty much it for today, I have to go [12:19:00] Ok... [12:19:02] if anybody can take over the swat, please do [12:19:21] Zoranzoki21: move the patches to another swat window if nobody can deploy today [12:19:33] zeljkof: Ok [12:20:05] * zeljkof is gone [12:21:20] addshore, Maxsem, dereckson: anyone? Or I have to move patches at next SWAT? [12:22:12] 10Operations, 10IRCecho: irc.wikimedia.org RC feed has stopped - https://phabricator.wikimedia.org/T216607 (10MoritzMuehlenhoff) [12:22:30] 10Operations, 10IRCecho: Restarting systemd-journald breaks ircecho service - https://phabricator.wikimedia.org/T216607 (10MoritzMuehlenhoff) [12:24:35] (03PS1) 10Arturo Borrero Gonzalez: openstack: use archive.debian.org as jessie-backports repo [puppet] - 10https://gerrit.wikimedia.org/r/491736 (https://phabricator.wikimedia.org/T216497) [12:25:32] PROBLEM - puppet last run on mw2257 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[apache2] [12:25:40] !log uploaded spicerack_0.0.17-1_amd64.deb to apt.wikimedia.org stretch-wikimedia [12:25:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:28:40] !log upgraded spicerack to 0.0.17 on cumin[12]001 [12:28:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:28:47] cc gehel, onimisionipe ^^^ [12:29:04] volans: Thanks! 
[12:29:18] codfw upgrade is set [12:33:42] 10Operations, 10IRCecho, 10Icinga, 10monitoring: Icnga check for ircecho should check for actual activity - https://phabricator.wikimedia.org/T216611 (10MoritzMuehlenhoff) [12:33:53] 10Operations, 10IRCecho, 10Icinga, 10monitoring: Icinga check for ircecho should check for actual activity - https://phabricator.wikimedia.org/T216611 (10MoritzMuehlenhoff) p:05Triage→03Normal [12:40:06] PROBLEM - HHVM rendering on mw2176 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:41:12] RECOVERY - HHVM rendering on mw2176 is OK: HTTP OK: HTTP/1.1 200 OK - 75146 bytes in 0.210 second response time [12:43:51] volans: thanks ! [12:47:29] (03PS2) 10Arturo Borrero Gonzalez: openstack: use archive.debian.org as jessie-backports repo [puppet] - 10https://gerrit.wikimedia.org/r/491736 (https://phabricator.wikimedia.org/T216497) [12:52:12] 10Operations, 10IRCecho, 10Icinga, 10monitoring: Icinga check for ircecho should check for actual activity - https://phabricator.wikimedia.org/T216611 (10CDanis) This would be an incredibly silly way to do it, but it would be very easy to write a `check_prometheus` invocation for outgoing network traffic f... 
[12:54:45] (03PS1) 10Arturo Borrero Gonzalez: cumin: add cloud-eqiad1 alias [puppet] - 10https://gerrit.wikimedia.org/r/491740 [12:55:52] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cumin: add cloud-eqiad1 alias [puppet] - 10https://gerrit.wikimedia.org/r/491740 (owner: 10Arturo Borrero Gonzalez) [12:56:56] RECOVERY - puppet last run on mw2257 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [13:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T1300) [13:00:59] !log rolling restarts for hhvm in eqiad [13:01:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:33] (03PS1) 10Gehel: elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14 [puppet] - 10https://gerrit.wikimedia.org/r/491746 (https://phabricator.wikimedia.org/T215931) [13:36:07] (03PS3) 10Muehlenhoff: profile::prometheus::nutcracker_exporter: Remove trusty support [puppet] - 10https://gerrit.wikimedia.org/r/490881 [13:37:51] (03CR) 10Gehel: "PCC looks happy: https://puppet-compiler.wmflabs.org/compiler1001/14752/" [puppet] - 10https://gerrit.wikimedia.org/r/491746 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel) [13:37:55] (03CR) 10Muehlenhoff: [C: 03+2] profile::prometheus::nutcracker_exporter: Remove trusty support [puppet] - 10https://gerrit.wikimedia.org/r/490881 (owner: 10Muehlenhoff) [13:41:20] (03CR) 10DCausse: [C: 03+1] elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14 [puppet] - 10https://gerrit.wikimedia.org/r/491746 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel) [13:43:41] (03CR) 10Mathew.onipe: [C: 03+1] elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14 [puppet] - 10https://gerrit.wikimedia.org/r/491746 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel) [13:44:58] (03PS2) 10Gehel: elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14 [puppet] - 
10https://gerrit.wikimedia.org/r/491746 (https://phabricator.wikimedia.org/T215931) [13:45:40] (03CR) 10Gehel: [C: 03+2] elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14 [puppet] - 10https://gerrit.wikimedia.org/r/491746 (https://phabricator.wikimedia.org/T215931) (owner: 10Gehel) [13:48:16] PROBLEM - HHVM rendering on mw1257 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:49:22] RECOVERY - HHVM rendering on mw1257 is OK: HTTP OK: HTTP/1.1 200 OK - 75202 bytes in 0.760 second response time [13:51:37] !log prometheus on prometheus2004 crashed/exited after journald upgrade -- starting up again now [13:51:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:59:26] !log rolling upgrade of elasticsearch / cirrus / codfw to 5.6.14 - T215931 [13:59:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:59:29] T215931: Upgrade elasticsearch to 5.6.14 - https://phabricator.wikimedia.org/T215931 [14:00:05] Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T1400) [14:00:19] !log gehel@cumin2001 START - Cookbook sre.elasticsearch.rolling-upgrade [14:00:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:41] !log gehel@cumin2001 END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97) [14:00:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:45] :) [14:00:59] here we go [14:01:36] RECOVERY - Device not healthy -SMART- on logstash1006 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=logstash1006&var-datasource=eqiad+prometheus/ops [14:04:34] PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on icinga2001 is CRITICAL: 8.112 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [14:05:48] RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on icinga2001 is OK: (C)60 le (W)70 le 107.6 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [14:06:31] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10Discovery-Search (Current work): Setup elasticsearch on cloudelastic100[1-4] - https://phabricator.wikimedia.org/T214921 (10Ottomata) > mw job runners -> cloudelastic : closed We have the same problem with updating data in a Presto cluster in the public... [14:07:01] (03PS2) 10Alexandros Kosiaris: Introduce citoid helm chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/491523 (https://phabricator.wikimedia.org/T213194) [14:08:15] (03PS1) 10Gehel: elasticsearch: access production clusters over HTTPS [software/spicerack] - 10https://gerrit.wikimedia.org/r/491750 (https://phabricator.wikimedia.org/T207920) [14:09:29] (03CR) 10DCausse: [C: 03+1] elasticsearch: access production clusters over HTTPS [software/spicerack] - 10https://gerrit.wikimedia.org/r/491750 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [14:11:33] 10Operations, 10Analytics, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10akosiaris) 05Open→03Stalled Per comment above. 
[14:15:14] (03CR) 10Gehel: [C: 03+2] elasticsearch: access production clusters over HTTPS [software/spicerack] - 10https://gerrit.wikimedia.org/r/491750 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [14:16:08] (03CR) 10jenkins-bot: elasticsearch: access production clusters over HTTPS [software/spicerack] - 10https://gerrit.wikimedia.org/r/491750 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [14:19:07] (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.0.18 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491752 [14:21:42] 10Operations, 10Wikimedia-Logstash: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10Ottomata) > re: the open question itself I'm leaning towards having json on kafka Yes please! > There will be one topic per syslog severity [...] Ok... [14:22:08] 10Operations, 10Patch-For-Review: Redundant bootloaders for software RAID - https://phabricator.wikimedia.org/T215183 (10CDanis) @Joe made me aware of the existence of partman configs present on `install1002` that are not in Puppet. The good news is that almost all such files are either editor backup files (e... [14:24:14] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.0.18 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491752 (owner: 10Volans) [14:25:27] (03PS1) 10Volans: Upstream release v0.0.18 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/491753 [14:25:30] (03CR) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.18 [software/spicerack] - 10https://gerrit.wikimedia.org/r/491752 (owner: 10Volans) [14:30:37] 10Operations, 10Continuous-Integration-Infrastructure: jenkins / zuul backing up due to jenkins slaves down - https://phabricator.wikimedia.org/T216039 (10hashar) 05Open→03Resolved a:03thcipriani We can resolve this task since Tyler did the emergency action. The `castor-save` job could not be triggered... 
[14:30:55] (03CR) 10Volans: [C: 03+2] Upstream release v0.0.18 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/491753 (owner: 10Volans) [14:33:13] (03CR) 10Ottomata: "The reason we were using uid in some places (like Hue), is that we need the Hue account to match the shell account so it can do proper us" [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) (owner: 10Elukey) [14:34:17] !log uploaded spicerack_0.0.18-1_amd64.deb to apt.wikimedia.org stretch-wikimedia [14:34:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:14] !log upgraded spicerack to 0.0.18 on cumin[12]001 [14:35:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:21] !log gehel@cumin2001 START - Cookbook sre.elasticsearch.rolling-upgrade [14:35:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:49] 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for kharlan - https://phabricator.wikimedia.org/T216258 (10fsero) Hi, If you are not sure probably analytics-users is the one. analytics-privatedata-users will give you access to IPs or other PII information which unless you are completely sur... 
[14:42:16] PROBLEM - Device not healthy -SMART- on logstash1006 is CRITICAL: cluster=logstash device=sde instance=logstash1006:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=logstash1006&var-datasource=eqiad+prometheus/ops [14:43:28] !log gehel@cumin2001 END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99) [14:43:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:44:50] 10Operations, 10Analytics, 10Discovery, 10Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10Ottomata) Alright, I'm not familiar with Swift, but if we were to do this, here is what I think we'd need: - Netwo... [14:48:08] 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for kharlan - https://phabricator.wikimedia.org/T216258 (10Ottomata) We don't actually have a lot of users in `analytics-users`, but I believe if all you need access to are EventLogging and Mediawiki History data in Hadoop, `analytics-users` is... [14:49:10] 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for kharlan - https://phabricator.wikimedia.org/T216258 (10kostajh) > We don't actually have a lot of users in analytics-users, but I believe if all you need access to are EventLogging and Mediawiki History data in Hadoop, analytics-users is th... 
[14:49:33] 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for kharlan - https://phabricator.wikimedia.org/T216258 (10kostajh) [14:51:58] That eventbus alert looks like it was caused by a message size too large for a mediawiki.job.cirrusSearchElasticaWrite job [14:52:57] (03CR) 10Elukey: "> The reason we were using uid in some places (like Hue), is that we" [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) (owner: 10Elukey) [14:53:30] (03PS1) 10Gehel: elasticsearch: don't fail if cluster is not yet green after 5 minutes [cookbooks] - 10https://gerrit.wikimedia.org/r/491755 [14:53:53] (03PS1) 10CDanis: install_server: purge old files from /srv/autoinstall [puppet] - 10https://gerrit.wikimedia.org/r/491756 (https://phabricator.wikimedia.org/T215183) [14:55:17] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: don't fail if cluster is not yet green after 5 minutes [cookbooks] - 10https://gerrit.wikimedia.org/r/491755 (owner: 10Gehel) [15:05:06] (03PS1) 10Alexandros Kosiaris: scaffolding: Add single quotes around metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/491758 [15:05:08] (03PS1) 10Alexandros Kosiaris: scaffolding: Don't chomp ending whitespace in monitoring [deployment-charts] - 10https://gerrit.wikimedia.org/r/491759 [15:05:10] (03PS1) 10Alexandros Kosiaris: eventgate: Correctly checksum config template [deployment-charts] - 10https://gerrit.wikimedia.org/r/491760 [15:05:12] (03PS1) 10Alexandros Kosiaris: mathoid: Correctly checksum config template [deployment-charts] - 10https://gerrit.wikimedia.org/r/491761 [15:05:21] (03CR) 10Giuseppe Lavagetto: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans) [15:05:35] (03CR) 10jerkins-bot: [V: 04-1] Removed run-tests.sh script [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans) [15:05:48] PROBLEM - Check systemd state on elastic2049 is CRITICAL: CRITICAL - degraded: The system is operational but one or more
units failed. [15:06:01] gehel, onimisionipe: ^^^ [15:06:15] checking [15:06:24] PROBLEM - Check systemd state on elastic2047 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [15:07:31] (03PS1) 10Alexandros Kosiaris: mathoid: Bump to version 0.0.17 [deployment-charts] - 10https://gerrit.wikimedia.org/r/491763 [15:08:00] (03PS1) 10Andrew Bogott: nova: update scheduler pools [puppet] - 10https://gerrit.wikimedia.org/r/491764 [15:08:16] RECOVERY - Check systemd state on elastic2049 is OK: OK - running: The system is fully operational [15:09:50] (03PS2) 10Andrew Bogott: nova: update scheduler pools [puppet] - 10https://gerrit.wikimedia.org/r/491764 [15:10:06] RECOVERY - Check systemd state on elastic2047 is OK: OK - running: The system is fully operational [15:11:05] (03CR) 10Andrew Bogott: [C: 03+2] nova: update scheduler pools [puppet] - 10https://gerrit.wikimedia.org/r/491764 (owner: 10Andrew Bogott) [15:11:17] (03PS2) 10BBlack: CI check [dns] - 10https://gerrit.wikimedia.org/r/483198 [15:11:27] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] scaffolding: Add single quotes around metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/491758 (owner: 10Alexandros Kosiaris) [15:11:39] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] scaffolding: Don't chomp ending whitespace in monitoring [deployment-charts] - 10https://gerrit.wikimedia.org/r/491759 (owner: 10Alexandros Kosiaris) [15:12:00] (03CR) 10Volans: "recheck" [cookbooks] - 10https://gerrit.wikimedia.org/r/491755 (owner: 10Gehel) [15:14:03] (03CR) 10Ottomata: "Ah, yes, I think we do need to use uid here. 
It is (or at least it should be) used for auto superset account creation, which should match" [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) (owner: 10Elukey) [15:15:05] (03CR) 10Giuseppe Lavagetto: [C: 03+1] install_server: purge old files from /srv/autoinstall [puppet] - 10https://gerrit.wikimedia.org/r/491756 (https://phabricator.wikimedia.org/T215183) (owner: 10CDanis) [15:15:56] (03CR) 10Elukey: "> Ah, yes, I think we do need to use uid here. It is (or at least it" [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) (owner: 10Elukey) [15:16:14] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] "Comments addressed and a +1 already, I am gonna merge this and deploy a test in staging." [deployment-charts] - 10https://gerrit.wikimedia.org/r/491523 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [15:17:34] (03PS2) 10CDanis: install_server: purge old files from /srv/autoinstall [puppet] - 10https://gerrit.wikimedia.org/r/491756 (https://phabricator.wikimedia.org/T215183) [15:17:52] (03CR) 10Volans: [C: 03+1] "LGTM, with my limited notion of ES orchestration" [cookbooks] - 10https://gerrit.wikimedia.org/r/491755 (owner: 10Gehel) [15:18:02] (03CR) 10CDanis: "I'm making a backup of /srv/autoinstall as it currently exists on install1002 just to be sure, then merging this" [puppet] - 10https://gerrit.wikimedia.org/r/491756 (https://phabricator.wikimedia.org/T215183) (owner: 10CDanis) [15:18:05] (03CR) 10Gehel: [C: 03+2] elasticsearch: don't fail if cluster is not yet green after 5 minutes [cookbooks] - 10https://gerrit.wikimedia.org/r/491755 (owner: 10Gehel) [15:19:12] (03CR) 10CDanis: [C: 03+2] install_server: purge old files from /srv/autoinstall [puppet] - 10https://gerrit.wikimedia.org/r/491756 (https://phabricator.wikimedia.org/T215183) (owner: 10CDanis) [15:20:06] (03PS1) 10Ottomata: Remove usages of ::cdh::spark, we use ::spark2 now only [puppet] - 
10https://gerrit.wikimedia.org/r/491767 (https://phabricator.wikimedia.org/T212134) [15:20:08] (03CR) 10Ottomata: [C: 03+1] eventgate: Correctly checksum config template [deployment-charts] - 10https://gerrit.wikimedia.org/r/491760 (owner: 10Alexandros Kosiaris) [15:20:40] (03CR) 10Giuseppe Lavagetto: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans) [15:20:50] (03CR) 10jerkins-bot: [V: 04-1] Removed run-tests.sh script [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans) [15:21:52] (03PS1) 10Filippo Giunchedi: Add setup.py and tox.ini [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491768 (https://phabricator.wikimedia.org/T216253) [15:21:54] (03PS1) 10Filippo Giunchedi: Reformat with black + isort [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491769 [15:21:56] (03PS1) 10Filippo Giunchedi: debian: add dh-python/pybuild [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491770 [15:21:58] (03PS1) 10Filippo Giunchedi: Add missing metrics help text [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491771 (https://phabricator.wikimedia.org/T216253) [15:22:00] (03PS1) 10Filippo Giunchedi: Add missing metrics help text, required for prometheus 2.0 [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491772 [15:23:42] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] "thanks!" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/491760 (owner: 10Alexandros Kosiaris) [15:24:15] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] "Same thing as the parent change for eventgate-analytics, merging per that +1" [deployment-charts] - 10https://gerrit.wikimedia.org/r/491761 (owner: 10Alexandros Kosiaris) [15:24:19] (03PS1) 10Jcrespo: mariadb: Revert incorrectly overwritten transfer.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/491775 [15:24:22] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] mathoid: Bump to version 0.0.17 [deployment-charts] - 10https://gerrit.wikimedia.org/r/491763 (owner: 10Alexandros Kosiaris) [15:24:45] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Revert incorrectly overwritten transfer.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/491775 (owner: 10Jcrespo) [15:25:46] (03PS1) 10Alexandros Kosiaris: Package citoid version 0.0.1 chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/491776 [15:25:50] (03PS1) 10Elukey: camus: make webrequest_text config more similar to prod [puppet] - 10https://gerrit.wikimedia.org/r/491777 (https://phabricator.wikimedia.org/T212259) [15:26:46] PROBLEM - grafana.wikimedia.org on krypton is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.073 second response time [15:27:07] (03CR) 10Jcrespo: [V: 03+2 C: 03+2] mariadb: Revert incorrectly overwritten transfer.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/491775 (owner: 10Jcrespo) [15:27:20] godog: working on grafana by any chance? 
[15:27:42] volans: no [15:27:48] loads for me now though [15:27:57] krypton is the old host [15:28:16] there is still a not-upgraded instance there that we should tear down [15:28:45] (for a while it was still in use because FR firewall, but that's fixed for some time now) [15:29:08] funnily enough I was thinking about it the other day, have grafana-beta run grafana 6.0 beta [15:29:13] died: gmetric failed: sh: 1: /usr/bin/gmetric: not found [15:29:17] not sure if related [15:29:26] godog: clearly we need grafana1002 and for it to run buster ;) [15:29:47] (03CR) 10Elukey: [C: 03+2] camus: make webrequest_text config more similar to prod [puppet] - 10https://gerrit.wikimedia.org/r/491777 (https://phabricator.wikimedia.org/T212259) (owner: 10Elukey) [15:30:18] cdanis: hehe that'd be proper yeah [15:30:25] the exim-to-gmetric error is a red herring, is there since earlier [15:30:38] cc herron [15:31:02] PROBLEM - docker-registry service on darmstadtium is CRITICAL: CRITICAL - Expecting active but unit docker-registry is inactive [15:31:43] hey volans, which what exim-to-gmetric error? 
[15:31:52] PROBLEM - Docker registry HTTPS interface on darmstadtium is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string schemaVersion not found on https://darmstadtium.eqiad.wmnet:443/v2/wikimedia-jessie/manifests/latest - 372 bytes in 0.153 second response time [15:32:12] if krypton isn't in service anyways we should decom/silence it though, seems like a spurious grafana alert [15:32:25] herron: hey, I was looking at krypton for other reasons and it's spamming syslog with that error I pasted 3 minutes ago [15:32:33] if it's only on this one not a problem [15:32:36] godog: yeah I'll take a look at decomming grafana from it [15:32:41] but wanted to cc you in case it might be more spread out [15:32:53] ah I see [15:35:34] !log temporarily stop prometheus instances on prometheus1004 for systemd upgrade/journald restart [15:35:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:10] is anyone looking to the docker registry error? [15:36:14] if not im looking into it [15:36:33] looking [15:37:08] !log restarting docker-registry service on systemd [15:37:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:02] it was stopped moritzm [15:38:06] RECOVERY - Docker registry HTTPS interface on darmstadtium is OK: HTTP OK: HTTP/1.1 200 OK - 2483 bytes in 0.624 second response time [15:38:13] similar to the issue with ircecho before [15:38:18] RECOVERY - docker-registry service on darmstadtium is OK: OK - docker-registry is active [15:38:22] i guess you just updated systemd there [15:39:03] yeah, it also seems related to the journald restart, although I don't know yet why [15:39:49] (03CR) 10BBlack: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/483198 (owner: 10BBlack) [15:40:28] (03PS1) 10Elukey: Rename kafka webrequest test topic [puppet] - 10https://gerrit.wikimedia.org/r/491779 (https://phabricator.wikimedia.org/T212259) [15:40:52] (03PS1) 10CDanis: webserver_misc_apps: unbundle grafana 
[puppet] - 10https://gerrit.wikimedia.org/r/491780 [15:41:06] PROBLEM - HHVM jobrunner on mw1293 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.074 second response time [15:41:20] PROBLEM - HHVM jobrunner on mw1296 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.073 second response time [15:41:20] PROBLEM - HHVM jobrunner on mw1294 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.073 second response time [15:41:55] (03CR) 10Filippo Giunchedi: logstash: force use elasticsearch-curator 5 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/490809 (https://phabricator.wikimedia.org/T213898) (owner: 10Filippo Giunchedi) [15:42:01] (03PS3) 10Filippo Giunchedi: logstash: force use elasticsearch-curator 5 [puppet] - 10https://gerrit.wikimedia.org/r/490809 (https://phabricator.wikimedia.org/T213898) [15:42:18] RECOVERY - HHVM jobrunner on mw1293 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.080 second response time [15:42:36] RECOVERY - HHVM jobrunner on mw1296 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.080 second response time [15:42:36] RECOVERY - HHVM jobrunner on mw1294 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.079 second response time [15:42:56] RECOVERY - grafana.wikimedia.org on krypton is OK: HTTP OK: HTTP/1.1 200 OK - 31353 bytes in 0.123 second response time [15:44:21] (03CR) 10Ottomata: [C: 03+1] Rename kafka webrequest test topic [puppet] - 10https://gerrit.wikimedia.org/r/491779 (https://phabricator.wikimedia.org/T212259) (owner: 10Elukey) [15:44:25] 10Operations, 10Discovery-Search, 10Elasticsearch: fix broken visualizations in Elasticsearch Node comparison dashboard - https://phabricator.wikimedia.org/T212831 (10Mathew.onipe) 05Open→03Resolved [15:44:28] 10Operations, 10Elasticsearch, 10Maps, 10Discovery-Search (Current work): Review Elastic/maps Grafana dashboards - https://phabricator.wikimedia.org/T209812 (10Mathew.onipe) [15:44:37] (03CR) 
10Elukey: [C: 03+2] Rename kafka webrequest test topic [puppet] - 10https://gerrit.wikimedia.org/r/491779 (https://phabricator.wikimedia.org/T212259) (owner: 10Elukey) [15:45:55] (03PS4) 10Filippo Giunchedi: logstash: force use elasticsearch-curator 5 [puppet] - 10https://gerrit.wikimedia.org/r/490809 (https://phabricator.wikimedia.org/T213898) [15:46:12] (03CR) 10Filippo Giunchedi: [C: 03+2] "Thanks for the reviews!" [puppet] - 10https://gerrit.wikimedia.org/r/490809 (https://phabricator.wikimedia.org/T213898) (owner: 10Filippo Giunchedi) [15:47:30] elukey: merging your change too [15:47:42] thanks! [15:48:41] (03CR) 10CDanis: "krypton is the only machine with role(webserver_misc_apps) I could find in the depot." [puppet] - 10https://gerrit.wikimedia.org/r/491780 (owner: 10CDanis) [15:49:13] godog: lemme know once done [15:49:32] elukey: yup I'm done! [15:49:37] thankss [15:52:30] (03CR) 10Giuseppe Lavagetto: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans) [15:55:59] !log authdns2001: upgrade gdnsd to 3.0.0-1~wmf1 [15:56:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:58:14] PROBLEM - HHVM jobrunner on mw1303 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.074 second response time [15:58:28] PROBLEM - HHVM jobrunner on mw1300 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.077 second response time [15:58:34] PROBLEM - HHVM jobrunner on mw1301 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.074 second response time [15:59:30] RECOVERY - HHVM jobrunner on mw1303 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.078 second response time [15:59:42] (03CR) 10Mforns: "> > I understand your concern Luca. 
I also think it is likely to fail" [puppet] - 10https://gerrit.wikimedia.org/r/491415 (https://phabricator.wikimedia.org/T216414) (owner: 10Joal) [15:59:44] RECOVERY - HHVM jobrunner on mw1300 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.081 second response time [15:59:50] RECOVERY - HHVM jobrunner on mw1301 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.079 second response time [16:00:45] (03PS2) 10Ottomata: Remove usages of ::cdh::spark, we use ::spark2 now only [puppet] - 10https://gerrit.wikimedia.org/r/491767 (https://phabricator.wikimedia.org/T212134) [16:00:57] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Remove usages of ::cdh::spark, we use ::spark2 now only [puppet] - 10https://gerrit.wikimedia.org/r/491767 (https://phabricator.wikimedia.org/T212134) (owner: 10Ottomata) [16:03:44] !log removing spark 1 from Analytics cluster - T212134 [16:03:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:03:47] T212134: Deprecate Spark 1.6 in favor of Spark 2.x only - https://phabricator.wikimedia.org/T212134 [16:06:09] (03CR) 10BBlack: [C: 03+2] Removed run-tests.sh script [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans) [16:06:13] (03PS4) 10BBlack: Removed run-tests.sh script [dns] - 10https://gerrit.wikimedia.org/r/491286 (owner: 10Volans) [16:11:54] (03PS2) 10Elukey: superset: fix httpd LDAP auth message [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) [16:12:23] PROBLEM - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.300 second response time [16:12:34] PROBLEM - MariaDB Slave Lag: m3 on db2078 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.18 seconds [16:12:36] PROBLEM - MariaDB Slave Lag: m3 on db2042 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.32 seconds [16:14:56] 
uh? [16:14:58] checking [16:16:02] Cache Status: Permanently Disabled [16:16:22] https://phabricator.wikimedia.org/T202051 [16:17:25] spike of inserts too [16:17:27] it must be that [16:19:32] (03CR) 10Ottomata: [C: 03+1] superset: fix httpd LDAP auth message [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) (owner: 10Elukey) [16:19:56] !log stopped phd on phab1002 [16:19:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:20:44] (03CR) 10Elukey: [C: 03+2] superset: fix httpd LDAP auth message [puppet] - 10https://gerrit.wikimedia.org/r/491691 (https://phabricator.wikimedia.org/T214524) (owner: 10Elukey) [16:24:49] !log authdns1001: upgrade gdnsd to 3.0.0-1~wmf1 [16:24:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:26:17] !log stopped phd on phab1001 and scheduled downtime in icinga [16:26:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:26:19] (03PS1) 10Filippo Giunchedi: Revert "scap: use logstash1008 for logstash_host" [puppet] - 10https://gerrit.wikimedia.org/r/491794 (https://phabricator.wikimedia.org/T213898) [16:26:43] Krinkle: thanks for the merge! 
[16:28:00] (03PS2) 10Cwhite: hiera: upgrade prometheus-node-exporter to 0.17 in codfw [puppet] - 10https://gerrit.wikimedia.org/r/490689 (https://phabricator.wikimedia.org/T213708) [16:28:02] (03CR) 10Filippo Giunchedi: [C: 03+2] "Reimaging logstash1008 now, move back to logstash1009" [puppet] - 10https://gerrit.wikimedia.org/r/491794 (https://phabricator.wikimedia.org/T213898) (owner: 10Filippo Giunchedi) [16:28:16] (03PS2) 10Kosta Harlan: [WIP] GrowthExperiments: Soft launch of help panel on viwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489729 (https://phabricator.wikimedia.org/T215666) [16:29:29] (03PS1) 10Gehel: elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) [16:30:41] elukey: yw, and thanks for pinging me so I remember to backport (was about to forget) [16:35:33] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [16:36:10] !log depool and reimage logstash1008 with stretch - T213898 [16:36:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:36:13] T213898: Replace and expand Elasticsearch storage in eqiad and upgrade the cluster from Debian jessie to stretch - https://phabricator.wikimedia.org/T213898 [16:36:58] (03PS2) 10Gehel: elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) [16:37:42] (03CR) 10Volans: [C: 04-1] "see inline" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [16:38:11] !log multatuli: upgrade gdnsd to 3.0.0-1~wmf1 [16:38:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:35] 
(03PS3) 10Gehel: elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) [16:38:36] PROBLEM - Host logstash1008 is DOWN: PING CRITICAL - Packet loss = 100% [16:38:46] (03CR) 10Gehel: elasticsearch: support cluster names which have '-' in them (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [16:39:28] RECOVERY - Host logstash1008 is UP: PING OK - Packet loss = 0%, RTA = 37.81 ms [16:39:50] godog: missing downtime? [16:40:03] volans: indeed, fixed now [16:40:45] !log started phd again, seems to be working now without killing the db [16:40:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:41:18] godog: I keep mixing which logstash is physical and which ganeti and I thought was a bug in the reimage script ;) [16:41:56] (03CR) 10Mathew.onipe: "> Patch Set 2: Code-Review-1" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [16:45:30] (03CR) 10Volans: elasticsearch: support cluster names which have '-' in them (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [16:46:07] volans: hhehe I'll bug you for sure if I come across a bug in wmf-reimage [16:46:26] (03PS4) 10Gehel: elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) [16:46:43] (03CR) 10Gehel: elasticsearch: support cluster names which have '-' in them (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [16:49:24] !log migrating es shards away from logstash100[56] with "cluster.routing.allocation.exclude._name" : 
"logstash1005-production-logstash-eqiad,logstash1006-production-logstash-eqiad” T214608 [16:49:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:49:27] T214608: rack/setup/install logstash101[012].eqiad.wmnet - https://phabricator.wikimedia.org/T214608 [16:50:34] (03CR) 10Volans: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [16:53:18] RECOVERY - MariaDB Slave Lag: m3 on db2078 is OK: OK slave_sql_lag Replication lag: 0.03 seconds [16:53:20] RECOVERY - MariaDB Slave Lag: m3 on db2042 is OK: OK slave_sql_lag Replication lag: 0.13 seconds [16:54:50] uugghh looks like squid got a bogus copy of rsyslog_8.38.0-1%7ebpo9%2b1wmf1_amd64.deb and thus a stretch reinstall is failing with hash mismatch [16:56:31] (03CR) 10CRusnov: "> Patch Set 5: Code-Review-1" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [16:58:15] (purged) [16:58:32] (03CR) 10Gehel: [C: 03+2] elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [16:59:30] (03CR) 10jenkins-bot: elasticsearch: support cluster names which have '-' in them [software/spicerack] - 10https://gerrit.wikimedia.org/r/491797 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [17:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Your horoscope predicts another unfortunate Morning SWAT (Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T1700). [17:00:05] Zoranzoki21 and stephanebisson: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. 
Note: If you break AND fix the wikis, you will be rewarded with a sticker. [17:00:16] hey [17:00:37] I can SWAT [17:01:10] Zoranzoki21... are you here under another name? [17:01:37] I don't think he is [17:01:45] (03PS1) 10Gehel: elasticsearch: get_next_clusters_nodes raises ElasticsearchClusterError [software/spicerack] - 10https://gerrit.wikimedia.org/r/491803 (https://phabricator.wikimedia.org/T207920) [17:02:01] I'll start with my patch. That'll give them some time to show up. [17:02:19] (03CR) 10Gehel: elasticsearch: get_next_clusters_nodes raises ElasticsearchClusterError (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491803 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [17:02:36] (03PS1) 10BBlack: authdns: listen for local PROXY, min v6 threads [puppet] - 10https://gerrit.wikimedia.org/r/491804 [17:02:38] (03PS1) 10BBlack: Lock memory for gdnsd [puppet] - 10https://gerrit.wikimedia.org/r/491805 [17:04:37] (03CR) 10BBlack: [C: 03+2] authdns: listen for local PROXY, min v6 threads [puppet] - 10https://gerrit.wikimedia.org/r/491804 (owner: 10BBlack) [17:04:41] 10Operations, 10Traffic, 10Wikidata, 10serviceops, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10Addshore) Okay, so the question that I have now been asked is "why we can't simply do a DNS re-route without changing the owner". So, why can'... 
[17:04:51] (03CR) 10BBlack: [C: 03+2] Lock memory for gdnsd [puppet] - 10https://gerrit.wikimedia.org/r/491805 (owner: 10BBlack) [17:04:51] 10Operations, 10Traffic, 10Wikidata, 10serviceops, and 3 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10Addshore) [17:08:57] !log contint1001: fix broken root ownership on zuul git deploy repo: sudo find /etc/zuul/wikimedia/.git -not -user zuul -exec chown zuul:zuul {} + [17:08:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:49] RECOVERY - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.340 second response time [17:14:48] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10elukey) Updates from https://github.com/RadeonOpenCompute/ROCm/issues/714#issuecomment-465666946 are not encouraging, gfx701 is a dead end so w... [17:15:01] 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10Jhernandez) >>! In T211881#4954470, @akosiaris wrote: >>>! In T211881#4954092, @Jhernandez wrote: >> The... [17:17:26] PROBLEM - puppet last run on cloudvirtan1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[ip addr add 2620:0:861:118:10:64:20:46/64 dev eth0] [17:18:53] (03CR) 10Mathew.onipe: elasticsearch: get_next_clusters_nodes raises ElasticsearchClusterError (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491803 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [17:21:14] 10Operations, 10Analytics, 10Analytics-Kanban, 10hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (10elukey) In https://github.com/RadeonOpenCompute/ROCm/issues/714#issuecomment-465666946 the upstream developers of the AMD drivers told me that our GPU on stat1005 is b... [17:22:36] !log sbisson@deploy1001 Synchronized php-1.33.0-wmf.18/extensions/Flow/modules/mw.flow.Initializer.js: SWAT: [[gerrit:491744|Unbreak reply clicks with existing widget]] (duration: 00m 58s) [17:22:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:51] I'm done SWATing my patch. [17:25:42] (03PS1) 10Gehel: elasticsearch: systemctl iterates explicitly on elasticsearch instances [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) [17:26:22] 10Operations, 10Traffic, 10Wikidata, 10serviceops, and 3 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10BBlack) There are different layers of "handing off" DNS management which are being conflated, but to run through them in order: 1) ** "Point... [17:30:56] 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MSantos) I tried to run proton (which I could before this update) and c... 
[17:34:42] 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MoritzMuehlenhoff) Where did you get the 72.0.3618.0 chromium build fro... [17:42:02] (03PS1) 10Filippo Giunchedi: logstash: remove cycle for apt::pin in collector [puppet] - 10https://gerrit.wikimedia.org/r/491811 [17:43:12] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1020 - https://phabricator.wikimedia.org/T194855 (10GTirloni) @Cmjohnson thank you! RAID reconfigured with spares. ` => ctrl slot=1 create type=ld drives=1I:1:5,1I:1:6,1I:1:7,1I:1:8,2I:1:1,2I:1:2,2I:1... [17:43:24] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1020 - https://phabricator.wikimedia.org/T194855 (10GTirloni) 05Open→03Resolved [17:43:32] RECOVERY - puppet last run on cloudvirtan1003 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:47:16] (03CR) 10Volans: [C: 04-1] "I think the bash command is missing a part." (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [17:49:01] (03CR) 10Volans: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/491803 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [17:52:00] (03PS2) 10CRusnov: Add dummy password for ganeti readonly user. 
[labs/private] - 10https://gerrit.wikimedia.org/r/491552 [17:52:28] (03CR) 10Volans: [C: 03+1] "LGTM" [labs/private] - 10https://gerrit.wikimedia.org/r/491552 (owner: 10CRusnov) [17:53:33] (03PS2) 10Gehel: elasticsearch: systemctl iterates explicitly on elasticsearch instances [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) [17:53:39] 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MSantos) @MoritzMuehlenhoff get it from google's chromium-browser-snaps... [17:53:50] (03PS6) 10CRusnov: Add ganeti read-only user deployment [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) [17:54:53] (03CR) 10Filippo Giunchedi: [C: 03+2] "PCC https://puppet-compiler.wmflabs.org/compiler1001/14753/" [puppet] - 10https://gerrit.wikimedia.org/r/491811 (owner: 10Filippo Giunchedi) [17:56:08] (03CR) 10CRusnov: [V: 03+2 C: 03+2] Add dummy password for ganeti readonly user. 
[labs/private] - 10https://gerrit.wikimedia.org/r/491552 (owner: 10CRusnov) [17:57:58] (03PS2) 10Alexandros Kosiaris: Package citoid version 0.0.1 chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/491776 [18:00:49] (03CR) 10Gehel: elasticsearch: systemctl iterates explicitly on elasticsearch instances (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [18:01:38] (03PS1) 10Gehel: WIP: experimentation with type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/491812 [18:04:09] (03PS7) 10CRusnov: Add ganeti read-only user deployment [puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) [18:04:25] !log mobrovac@deploy1001 Started deploy [restbase/deploy@80f518c]: Remove VE request logging - T215956 [18:04:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:04:28] T215956: Consider stashing data-parsoid for VE - https://phabricator.wikimedia.org/T215956 [18:04:56] (03CR) 10jerkins-bot: [V: 04-1] WIP: experimentation with type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/491812 (owner: 10Gehel) [18:05:16] (03CR) 10Andrew Bogott: "This breaks puppet runs on cloud Trusty VMs:" [puppet] - 10https://gerrit.wikimedia.org/r/487888 (owner: 10Muehlenhoff) [18:07:47] (03PS1) 10Andrew Bogott: Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491816 [18:08:17] (03CR) 10jerkins-bot: [V: 04-1] Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491816 (owner: 10Andrew Bogott) [18:08:31] (03PS2) 10Andrew Bogott: Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491816 [18:09:02] (03CR) 10jerkins-bot: [V: 04-1] Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491816 
(owner: 10Andrew Bogott) [18:09:11] (03CR) 10Volans: [C: 04-1] "On second thought is not that simple" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [18:09:43] (03CR) 10Framawiki: [C: 03+1] Disable mobile main page special casing on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491699 (https://phabricator.wikimedia.org/T216563) (owner: 10Zoranzoki21) [18:09:51] (03PS3) 10Andrew Bogott: Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491816 [18:10:50] (03CR) 10Andrew Bogott: [C: 03+2] Revert "imagemagick: Unconditionally use /etc/ImageMagick-6/" [puppet] - 10https://gerrit.wikimedia.org/r/491816 (owner: 10Andrew Bogott) [18:12:01] (03CR) 10Gehel: elasticsearch: systemctl iterates explicitly on elasticsearch instances (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [18:16:48] (03PS1) 10Jcrespo: mariadb: Add the option of postprocessing backups [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/491818 (https://phabricator.wikimedia.org/T210292) [18:17:10] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Add the option of postprocessing backups [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/491818 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [18:19:40] !log fdans@deploy1001 Started deploy [analytics/refinery@ccf837e]: deploying refinery for new wikis and changes in scripts [18:19:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:15] (03CR) 10Framawiki: "It is common practice to respect when possible and not forgotten the same numbers in the configurations between wikis." 
(032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491054 (https://phabricator.wikimedia.org/T216322) (owner: 10Ammarpad) [18:24:44] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@80f518c]: Remove VE request logging - T215956 (duration: 20m 19s) [18:24:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:24:49] T215956: Consider stashing data-parsoid for VE - https://phabricator.wikimedia.org/T215956 [18:26:37] (03PS2) 10Elukey: profile::analytics::refinery: add a wrapper for analytics-mysql [puppet] - 10https://gerrit.wikimedia.org/r/491528 (https://phabricator.wikimedia.org/T212386) [18:30:02] (03CR) 10Elukey: [C: 03+2] profile::analytics::refinery: add a wrapper for analytics-mysql [puppet] - 10https://gerrit.wikimedia.org/r/491528 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey) [18:30:53] !log fdans@deploy1001 Finished deploy [analytics/refinery@ccf837e]: deploying refinery for new wikis and changes in scripts (duration: 11m 13s) [18:30:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:34:13] (03PS1) 10Alexandros Kosiaris: WIP: Fix iteration of secret values in all deployments [deployment-charts] - 10https://gerrit.wikimedia.org/r/491821 [18:38:05] 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MoritzMuehlenhoff) >>! In T216493#4969195, @MSantos wrote: > @MoritzMue... 
[18:39:13] (03PS1) 10Zoranzoki21: Add img.raremaps.com at wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491823 (https://phabricator.wikimedia.org/T216638) [18:39:44] (03PS1) 10Elukey: Add profile::analytics::refinery to notebook100[3,4] and stat1006 [puppet] - 10https://gerrit.wikimedia.org/r/491824 (https://phabricator.wikimedia.org/T212386) [18:40:46] (03CR) 10Ottomata: [C: 03+1] "Also need the change in scap hosts" [puppet] - 10https://gerrit.wikimedia.org/r/491824 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey) [18:41:34] (03PS1) 10GTirloni: cloudvirt1020: Network config [puppet] - 10https://gerrit.wikimedia.org/r/491825 (https://phabricator.wikimedia.org/T193264) [18:41:57] (03CR) 10Muehlenhoff: [C: 03+1] debian: add dh-python/pybuild [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491770 (owner: 10Filippo Giunchedi) [18:42:15] (03CR) 10GTirloni: [C: 03+2] cloudvirt1020: Network config [puppet] - 10https://gerrit.wikimedia.org/r/491825 (https://phabricator.wikimedia.org/T193264) (owner: 10GTirloni) [18:42:26] (03CR) 10Jcrespo: "Not finished, just FYI. 
This will allow to retry the statistics gathering and be the second execution to finish postprocessing after trans" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/491818 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [18:43:02] (03CR) 10Muehlenhoff: [C: 03+1] Add missing metrics help text, required for prometheus 2.0 [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491772 (owner: 10Filippo Giunchedi) [18:45:38] (03CR) 10Muehlenhoff: [C: 03+1] Add missing metrics help text [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491771 (https://phabricator.wikimedia.org/T216253) (owner: 10Filippo Giunchedi) [18:45:59] (03PS2) 10Elukey: Add profile::analytics::refinery to notebook100[3,4] and stat1006 [puppet] - 10https://gerrit.wikimedia.org/r/491824 (https://phabricator.wikimedia.org/T212386) [18:47:25] fyi, report of a network problem in OTRS #2019022010008102 [18:47:27] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/14758/" [puppet] - 10https://gerrit.wikimedia.org/r/491824 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey) [18:47:43] (just a minor report, might be nothing) [18:47:50] (03PS1) 10Zoranzoki21: Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491826 (https://phabricator.wikimedia.org/T216642) [18:48:28] (03PS2) 10Zoranzoki21: Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491826 (https://phabricator.wikimedia.org/T216642) [18:48:49] (03CR) 10Muehlenhoff: Add setup.py and tox.ini (031 comment) [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491768 (https://phabricator.wikimedia.org/T216253) (owner: 10Filippo Giunchedi) [18:49:12] (03CR) 10Elukey: "scap change in https://gerrit.wikimedia.org/r/#/c/analytics/refinery/scap/+/491827/" [puppet] - 10https://gerrit.wikimedia.org/r/491824 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey) [18:51:55] PROBLEM - ensure kvm 
processes are running on cloudvirt1009 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 [18:52:02] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good (that print statement was some debugging leftover)" [debs/prometheus-pdns-rec-exporter] - 10https://gerrit.wikimedia.org/r/491769 (owner: 10Filippo Giunchedi) [18:58:17] (03PS1) 10CRusnov: (hopefully) get the dummy hiera key in the right place. [labs/private] - 10https://gerrit.wikimedia.org/r/491830 [19:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T1900) [19:01:41] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1018 - https://phabricator.wikimedia.org/T216004 (10GTirloni) All good, thank you! [19:01:47] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1018 - https://phabricator.wikimedia.org/T216004 (10GTirloni) 05Open→03Resolved [19:10:31] 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MSantos) [19:14:06] 10Operations, 10Proton, 10Core Platform Team Backlog (Watching / External), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (10MSantos) >>! In T216493#4969393, @MoritzMuehlenhoff wrote: > [...] I th... 
[19:14:35] 10Puppet: hiera_lookup: Allow query against checkout of labs/private in addition to checkout of operations/puppet - https://phabricator.wikimedia.org/T216647 (10crusnov) [19:17:49] (03PS3) 10Zoranzoki21: Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491826 (https://phabricator.wikimedia.org/T216642) [19:23:30] (03PS1) 10Andrew Bogott: imagemagick: Resolve version conflicts between toolforge and prod [puppet] - 10https://gerrit.wikimedia.org/r/491837 (https://phabricator.wikimedia.org/T216506) [19:27:49] (03CR) 10jerkins-bot: [V: 04-1] imagemagick: Resolve version conflicts between toolforge and prod [puppet] - 10https://gerrit.wikimedia.org/r/491837 (https://phabricator.wikimedia.org/T216506) (owner: 10Andrew Bogott) [19:33:22] ACKNOWLEDGEMENT - HP RAID on cloudvirt1020 is CRITICAL: CRITICAL: Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:1:1, 2I:1:2, 2I:1:3, 2I:1:4, 2I:2:1, 2I:2:2 - Controller: OK - Battery count: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T216649 [19:33:25] 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1020 - https://phabricator.wikimedia.org/T216649 (10ops-monitoring-bot) [19:42:26] (03PS2) 10Andrew Bogott: imagemagick: Resolve version conflicts between toolforge and prod [puppet] - 10https://gerrit.wikimedia.org/r/491837 (https://phabricator.wikimedia.org/T216506) [19:56:50] (03PS2) 10Gehel: WIP: experimentation with type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/491812 [20:00:04] thcipriani: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train - Americas version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T2000). 
[20:00:21] * thcipriani train [20:00:44] (03CR) 10jerkins-bot: [V: 04-1] WIP: experimentation with type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/491812 (owner: 10Gehel) [20:14:01] !log thcipriani@deploy1001 Synchronized php-1.33.0-wmf.18/extensions/EventBus/includes/EventBusRCFeedEngine.php: [[gerrit:491845|Check for eventServiceName in config before accessing]] T216561 (duration: 00m 55s) [20:14:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:04] T216561: extensions/EventBus/includes/EventBusRCFeedEngine.php:45 PHP Notice: Undefined index: eventServiceName - https://phabricator.wikimedia.org/T216561 [20:23:15] 10Operations, 10Analytics, 10Analytics-Kanban, 10hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (10EBernhardson) Another thing to take away from the upstream response is that debian is unsupported. I can't imagine deploying ubuntu to a single machine will be an acce... [20:29:14] (03PS1) 10Thcipriani: group1 wikis to 1.33.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491849 [20:29:16] (03CR) 10Thcipriani: [C: 03+2] group1 wikis to 1.33.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491849 (owner: 10Thcipriani) [20:30:55] (03Merged) 10jenkins-bot: group1 wikis to 1.33.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491849 (owner: 10Thcipriani) [20:31:12] (03CR) 10jenkins-bot: group1 wikis to 1.33.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491849 (owner: 10Thcipriani) [20:32:45] (03PS1) 10Gehel: elasticsearch: add script to execute systemctl on each elasticsearch instance [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) [20:33:38] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: add script to execute systemctl on each elasticsearch instance [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) 
[20:33:49] !log thcipriani@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.18 [20:34:43] !log thcipriani@deploy1001 Synchronized php: group1 wikis to 1.33.0-wmf.18 (duration: 00m 53s) [20:35:50] thcipriani@deploy1001: Failed to log message to wiki. Somebody should check the error logs. [20:35:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:38:37] (03PS2) 10Gehel: elasticsearch: add script to execute systemctl on each elasticsearch instance [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) [20:39:12] (03PS3) 10Gehel: elasticsearch: add script to execute systemctl on each elasticsearch instance [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) [20:39:21] PROBLEM - Apache HTTP on mw1240 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:39:37] (03PS4) 10Gehel: elasticsearch: add script to execute systemctl on each elasticsearch instance [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) [20:39:53] (03CR) 10Framawiki: "I've tested that conf on live instance and it works as expected, no error found." [puppet] - 10https://gerrit.wikimedia.org/r/491377 (https://phabricator.wikimedia.org/T214637) (owner: 10Framawiki) [20:40:27] RECOVERY - Apache HTTP on mw1240 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.101 second response time [20:41:45] (03PS4) 10Framawiki: quarry: Setup CSP http header [puppet] - 10https://gerrit.wikimedia.org/r/491377 (https://phabricator.wikimedia.org/T214637) [20:42:51] (03PS3) 10Gehel: elasticsearch: systemctl iterates explicitly on elasticsearch instances [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) [20:43:31] (03CR) 10DCausse: "I'm a bit late on this but thanks for shipping it!" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/491231 (https://phabricator.wikimedia.org/T215969) (owner: 10DCausse) [20:45:21] (03PS5) 10Gehel: elasticsearch: add script to execute systemctl on each elasticsearch instance [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) [20:46:22] (03CR) 10Gehel: elasticsearch: add script to execute systemctl on each elasticsearch instance (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [20:47:05] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: systemctl iterates explicitly on elasticsearch instances [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [20:47:19] (03PS9) 10Eevans: Initial configuration for session storage service [puppet] - 10https://gerrit.wikimedia.org/r/487885 (https://phabricator.wikimedia.org/T215883) [20:48:44] (03PS4) 10Gehel: elasticsearch: systemctl iterates explicitly on elasticsearch instances [software/spicerack] - 10https://gerrit.wikimedia.org/r/491808 (https://phabricator.wikimedia.org/T207920) [21:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: My dear minions, it's time we take the moon! Just kidding. Time for Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190220T2100). [21:02:47] (03PS10) 10Eevans: Initial configuration for session storage service [puppet] - 10https://gerrit.wikimedia.org/r/487885 (https://phabricator.wikimedia.org/T215883) [21:08:36] <_joe_> !log rolling restart of php-fpm to catch up with the tideways change [21:08:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:09:29] (03CR) 10Herron: [C: 03+1] (hopefully) get the dummy hiera key in the right place. 
[labs/private] - 10https://gerrit.wikimedia.org/r/491830 (owner: 10CRusnov) [21:09:46] (03CR) 10CRusnov: [V: 03+2 C: 03+2] (hopefully) get the dummy hiera key in the right place. [labs/private] - 10https://gerrit.wikimedia.org/r/491830 (owner: 10CRusnov) [21:12:49] (03CR) 10Eevans: "[PC output](http://puppet-compiler.wmflabs.org/14763/) here. The compile fails because of missing secrets (which seems...right). Aside f" [puppet] - 10https://gerrit.wikimedia.org/r/487885 (https://phabricator.wikimedia.org/T215883) (owner: 10Eevans) [21:19:12] !log arlolra@deploy1001 Started deploy [parsoid/deploy@c4574d1]: Updating Parsoid to 9b204a0 [21:19:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:19:19] (03PS1) 10Ottomata: Set cors to false for eventgate-analytics node service chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/491857 (https://phabricator.wikimedia.org/T208251) [21:19:46] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Set cors to false for eventgate-analytics node service chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/491857 (https://phabricator.wikimedia.org/T208251) (owner: 10Ottomata) [21:27:32] (03CR) 10CRusnov: [V: 03+1] "Successfully built with dummy password, produces the file expected." 
[puppet] - 10https://gerrit.wikimedia.org/r/490397 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [21:28:46] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@c4574d1]: Updating Parsoid to 9b204a0 (duration: 09m 33s) [21:28:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:31:33] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudvirt1009: upgrade to 10G - https://phabricator.wikimedia.org/T216324 (10Andrew) [21:34:11] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudvirt1009: upgrade to 10G - https://phabricator.wikimedia.org/T216324 (10RobH) a:05RobH→03Cmjohnson So yeah earlier we tried to remotely enter bios and enable the 10G nic and failed (it requires crash cart.) So this is ready for... [21:40:54] PROBLEM - ensure kvm processes are running on labvirt1008 is CRITICAL: PROCS CRITICAL: 0 processes with regex args /usr/bin/kvm [21:43:22] RECOVERY - ensure kvm processes are running on labvirt1008 is OK: PROCS OK: 1 process with regex args /usr/bin/kvm [21:46:16] !log Updated Parsoid to 9b204a0 (T153080, T169975, T215824) [21:46:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:46:23] T153080: Parse images synchronously without making imageinfo requests and use a final postprocessing pass to fixup image HTML - https://phabricator.wikimedia.org/T153080 [21:46:23] T215824: AddMediaInfo pass isn't robust to link-in-link - https://phabricator.wikimedia.org/T215824 [21:46:23] T169975: Missing images render as broken img tags, not redlinks - https://phabricator.wikimedia.org/T169975 [21:54:59] (03PS1) 10Ottomata: [WIP] Set up DNS for eventgate-analytics [dns] - 10https://gerrit.wikimedia.org/r/491860 (https://phabricator.wikimedia.org/T211247) [21:55:20] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Set up DNS for eventgate-analytics [dns] - 10https://gerrit.wikimedia.org/r/491860 (https://phabricator.wikimedia.org/T211247) (owner: 10Ottomata) 
[21:55:28] (03PS1) 10Ottomata: [WIP] Set up eventgate-analytics.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/491861 (https://phabricator.wikimedia.org/T211247) [21:56:01] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Set up eventgate-analytics.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/491861 (https://phabricator.wikimedia.org/T211247) (owner: 10Ottomata) [21:56:15] (03CR) 10Volans: [C: 04-1] "Mostly ok, few nitpicks/typos inline" (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [21:56:28] (03PS2) 10Ottomata: [WIP] Set up DNS for eventgate-analytics [dns] - 10https://gerrit.wikimedia.org/r/491860 (https://phabricator.wikimedia.org/T211247) [21:56:47] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Set up DNS for eventgate-analytics [dns] - 10https://gerrit.wikimedia.org/r/491860 (https://phabricator.wikimedia.org/T211247) (owner: 10Ottomata) [21:57:18] (03PS3) 10Ottomata: [WIP] Set up DNS for eventgate-analytics [dns] - 10https://gerrit.wikimedia.org/r/491860 (https://phabricator.wikimedia.org/T211247) [21:57:58] (03PS2) 10Ottomata: [WIP] Set up eventgate-analytics.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/491861 (https://phabricator.wikimedia.org/T211247) [21:58:48] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Set up eventgate-analytics.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/491861 (https://phabricator.wikimedia.org/T211247) (owner: 10Ottomata) [22:08:18] (03PS3) 10Ottomata: [WIP] Set up eventgate-analytics.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/491861 (https://phabricator.wikimedia.org/T211247) [22:44:04] Telia is a network Wikimedia peers with right? [22:44:23] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to tsops@ - https://phabricator.wikimedia.org/T210464 (10bcampbell) @Dzahn I just added pat@, gary@, and box6699@ as aliases to Google Group tsops@wikimedia.org. 
You should be able to delete on your side now. [22:47:46] 10Operations, 10Wikimedia-Logstash, 10User-fgiunchedi, 10User-herron: Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6) - https://phabricator.wikimedia.org/T213157 (10RobH) [22:55:10] (03CR) 10DCausse: elasticsearch: add script to execute systemctl on each elasticsearch instance (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/491850 (https://phabricator.wikimedia.org/T207920) (owner: 10Gehel) [23:08:51] (03CR) 10Zhuyifei1999: [C: 03+2] Mount /mnt/nfs into Kuberntes pods [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/491397 (https://phabricator.wikimedia.org/T193646) (owner: 10BryanDavis) [23:08:53] (03CR) 10Zhuyifei1999: [C: 03+2] Set custom mime-types [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/489409 (https://phabricator.wikimedia.org/T178601) (owner: 10BryanDavis) [23:09:29] (03Merged) 10jenkins-bot: Mount /mnt/nfs into Kuberntes pods [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/491397 (https://phabricator.wikimedia.org/T193646) (owner: 10BryanDavis) [23:09:46] (03Merged) 10jenkins-bot: Set custom mime-types [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/489409 (https://phabricator.wikimedia.org/T178601) (owner: 10BryanDavis) [23:28:26] (03PS1) 10Smalyshev: Turn off proxy_intercept_errors for nginx [puppet] - 10https://gerrit.wikimedia.org/r/491870 (https://phabricator.wikimedia.org/T214032) [23:31:46] 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1020 - https://phabricator.wikimedia.org/T216649 (10Andrew) This is still complaining about the battery :( [23:43:01] (03PS1) 10BryanDavis: Always create /var/run/lighttpd/ before chmod [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/491873 [23:43:55] (03CR) 10Zhuyifei1999: [C: 03+2] Always create /var/run/lighttpd/ before chmod [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/491873 (owner: 10BryanDavis) [23:44:11] 
10Operations, 10Wikidata, 10Wikidata-Query-Service, 10MW-1.33-notes (1.33.0-wmf.2; 2018-10-30), 10User-Addshore: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (10Smalyshev) 05Open→03Resolved a:03Smalyshev Doesn't seem to happen anymore,... [23:44:17] (03Merged) 10jenkins-bot: Always create /var/run/lighttpd/ before chmod [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/491873 (owner: 10BryanDavis) [23:57:52] !log ppchelko@deploy1001 Started deploy [changeprop/deploy@5e4486a]: Purge varnish on revision restrictions [23:57:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:59:07] 10Operations, 10Traffic: Content purges are unreliable - https://phabricator.wikimedia.org/T133821 (10mobrovac) [23:59:15] !log ppchelko@deploy1001 Finished deploy [changeprop/deploy@5e4486a]: Purge varnish on revision restrictions (duration: 01m 23s) [23:59:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log