[00:03:34] (PS1) Bstorm: postgres: correct an issue in the wmcs profile around replication user [puppet] - https://gerrit.wikimedia.org/r/511792
[00:07:09] (CR) Bstorm: [C: +1] "This cannot be used with anything that actually properly implements systemd at all (cgred conflicts with systemd's uses of cgroups and doe" [puppet] - https://gerrit.wikimedia.org/r/511791 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[00:07:51] (CR) Bstorm: [C: +2] postgres: correct an issue in the wmcs profile around replication user [puppet] - https://gerrit.wikimedia.org/r/511792 (owner: Bstorm)
[00:25:08] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:25:44] PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[00:25:50] PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[00:26:00] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[00:26:20] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:26:36] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[00:26:50] PROBLEM - Eqsin HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5
[00:27:43] another short spike
[00:28:34] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:28:38] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:32:48] RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[00:32:52] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[00:33:46] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[00:34:04] RECOVERY - Eqsin HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5
[00:34:24] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[00:57:08] * AaronSchulz should push https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/511783/
[01:15:06] PROBLEM - BFD status on cr2-eqdfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:16:32] RECOVERY - BFD status on cr2-eqdfw is OK: OK: UP: 10 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:22:24] !log aaron@deploy1001 Synchronized php-1.34.0-wmf.6/includes/specials/SpecialWatchlist.php: 447bf504e498e2c18f29b90f7760514102236e4e (duration: 00m 57s)
[01:22:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:46:14] !log aaron@deploy1001 Synchronized php-1.34.0-wmf.5/includes/specials/SpecialWatchlist.php: 68eeaa5b76738a6a07d148391220cdb6c8fd1d23 (duration: 00m 57s)
[01:46:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:59:30] Operations, Maps, Operations-Software-Development, User-Joe, User-jijiki: Create cookbook to restart Maps - https://phabricator.wikimedia.org/T224072 (Mathew.onipe)
[01:59:40] Operations, Maps, Operations-Software-Development, User-Joe, User-jijiki: Create cookbook to restart Maps - https://phabricator.wikimedia.org/T224072 (Mathew.onipe) p:Triage→Normal
[02:08:19] (PS3) Andrew Bogott: neutron: support primary/secondary rabbitmq hosts [puppet] - https://gerrit.wikimedia.org/r/511744 (https://phabricator.wikimedia.org/T223906)
[02:11:00] (CR) Andrew Bogott: [C: +2] neutron: support primary/secondary rabbitmq hosts [puppet] - https://gerrit.wikimedia.org/r/511744 (https://phabricator.wikimedia.org/T223906) (owner: Andrew Bogott)
[02:15:27] Operations, Discovery-Search, Elasticsearch, Icinga: Create Icinga check that alerts whenever elasticsearch master is down - https://phabricator.wikimedia.org/T224073 (Mathew.onipe)
[02:15:38] Operations, Discovery-Search, Elasticsearch, Icinga: Create Icinga check that alerts whenever elasticsearch master is down - https://phabricator.wikimedia.org/T224073 (Mathew.onipe) p:Triage→Normal
[02:15:46] PROBLEM - nova instance creation test on cloudcontrol1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[02:16:50] PROBLEM - Check systemd state on cloudcontrol1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:17:05] ACKNOWLEDGEMENT - Check systemd state on cloudcontrol1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. andrew bogott I dont know what this is yet but I almost certainly caused it and am looking
[02:17:05] ACKNOWLEDGEMENT - nova instance creation test on cloudcontrol1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack andrew bogott I dont know what this is yet but I almost certainly caused it and am looking https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[02:18:14] RECOVERY - Check systemd state on cloudcontrol1003 is OK: OK - running: The system is fully operational
[02:18:36] RECOVERY - nova instance creation test on cloudcontrol1003 is OK: PROCS OK: 1 process with command name python, args nova-fullstack https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[02:28:56] Operations, Discovery-Search (Current work): Cleanup puppet hieradata for logstash - https://phabricator.wikimedia.org/T224074 (Mathew.onipe)
[02:29:07] Operations, Discovery-Search (Current work): Cleanup puppet hieradata for logstash - https://phabricator.wikimedia.org/T224074 (Mathew.onipe) p:Triage→Normal
[02:55:22] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[03:22:06] !log removed 2fa for T224075
[03:22:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:27:42] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:18:24] PROBLEM - HHVM rendering on mw1345 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[04:19:38] RECOVERY - HHVM rendering on mw1345 is OK: HTTP OK: HTTP/1.1 200 OK - 77914 bytes in 0.390 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[04:30:53] (PS1) Gilles: Renew origin trial tokens [mediawiki-config] - https://gerrit.wikimedia.org/r/511803
[04:36:15] (CR) Gilles: [C: +2] Renew origin trial tokens [mediawiki-config] - https://gerrit.wikimedia.org/r/511803 (owner: Gilles)
[04:37:15] (Merged) jenkins-bot: Renew origin trial tokens [mediawiki-config] - https://gerrit.wikimedia.org/r/511803 (owner: Gilles)
[04:37:29] (CR) jenkins-bot: Renew origin trial tokens [mediawiki-config] - https://gerrit.wikimedia.org/r/511803 (owner: Gilles)
[04:39:02] !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Renew origin trial tokens (duration: 00m 57s)
[04:39:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:41:20] !log purging ruwiki and eswiki to make them get the new origin trial tokens
[04:41:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:05:00] (PS1) Marostegui: db-eqiad.php: Depool db1118 from API, pool db1134 [mediawiki-config] - https://gerrit.wikimedia.org/r/511804 (https://phabricator.wikimedia.org/T224017)
[05:06:30] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1118 from API, pool db1134 [mediawiki-config] - https://gerrit.wikimedia.org/r/511804 (https://phabricator.wikimedia.org/T224017) (owner: Marostegui)
[05:07:28] (Merged) jenkins-bot: db-eqiad.php: Depool db1118 from API, pool db1134 [mediawiki-config] - https://gerrit.wikimedia.org/r/511804 (https://phabricator.wikimedia.org/T224017) (owner: Marostegui)
[05:07:42] (CR) jenkins-bot: db-eqiad.php: Depool db1118 from API, pool db1134 [mediawiki-config] - https://gerrit.wikimedia.org/r/511804 (https://phabricator.wikimedia.org/T224017) (owner: Marostegui)
[05:09:05] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db1118 from s1 api and pool db1134 instead T224017 (duration: 00m 57s)
[05:09:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:09:10] T224017: Slow query ApiQueryRevisions on enwiki - https://phabricator.wikimedia.org/T224017
[05:09:41] (PS1) Marostegui: db2118,db2120: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/511805 (https://phabricator.wikimedia.org/T222772)
[05:13:02] (CR) Marostegui: [C: +2] db2118,db2120: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/511805 (https://phabricator.wikimedia.org/T222772) (owner: Marostegui)
[05:13:06] (PS1) Marostegui: db-eqiad,db-codfw.php: Pool db2118,db2120 into s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/511806 (https://phabricator.wikimedia.org/T222772)
[05:15:57] Operations, Growth-Team, Performance-Team, Wikidata, and 4 others: Investigate increase in tx bandwidth usage for mc1033 - https://phabricator.wikimedia.org/T223310 (elukey) Thanks a lot for the deploy! I checked metrics and nothing seems changed :(
[05:18:53] Operations, ops-codfw, decommission: Decommission db2041 - https://phabricator.wikimedia.org/T223950 (Marostegui) a:RobH This host is ready for DCOPs to take over.
[05:22:21] (CR) ArielGlenn: [C: +1] db-eqiad,db-codfw.php: Pool db2118,db2120 into s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/511806 (https://phabricator.wikimedia.org/T222772) (owner: Marostegui)
[05:22:42] (CR) Marostegui: [C: +2] db-eqiad,db-codfw.php: Pool db2118,db2120 into s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/511806 (https://phabricator.wikimedia.org/T222772) (owner: Marostegui)
[05:23:41] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Pool db2118,db2120 into s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/511806 (https://phabricator.wikimedia.org/T222772) (owner: Marostegui)
[05:23:55] (CR) jenkins-bot: db-eqiad,db-codfw.php: Pool db2118,db2120 into s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/511806 (https://phabricator.wikimedia.org/T222772) (owner: Marostegui)
[05:24:47] Operations, ops-eqiad, DBA, Goal, and 2 others: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (Marostegui)
[05:25:05] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Pool db2118 and db2120 into s7 T222772 (duration: 00m 55s)
[05:25:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:25:10] T222772: Productionize db2[103-120] - https://phabricator.wikimedia.org/T222772
[05:26:06] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Pool db2118 and db2120 into s7 T222772 (duration: 00m 55s)
[05:26:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:26:59] Operations, ops-codfw, DBA, Goal, Patch-For-Review: rack/setup/install db2[103-120].codfw.wmnet (18 hosts) - https://phabricator.wikimedia.org/T221532 (Marostegui)
[05:33:31] (PS1) Marostegui: mariadb: Provision db1136 into s7 [puppet] - https://gerrit.wikimedia.org/r/511807 (https://phabricator.wikimedia.org/T222682)
[05:34:21] (PS1) Marostegui: db-eqiad.php: Depool db1086 [mediawiki-config] - https://gerrit.wikimedia.org/r/511808
[05:37:25] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1086 [mediawiki-config] - https://gerrit.wikimedia.org/r/511808 (owner: Marostegui)
[05:37:36] (CR) Marostegui: [C: +2] mariadb: Provision db1136 into s7 [puppet] - https://gerrit.wikimedia.org/r/511807 (https://phabricator.wikimedia.org/T222682) (owner: Marostegui)
[05:38:22] (Merged) jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - https://gerrit.wikimedia.org/r/511808 (owner: Marostegui)
[05:38:36] (CR) jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - https://gerrit.wikimedia.org/r/511808 (owner: Marostegui)
[05:39:53] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 55s)
[05:39:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:42:09] !log Stop MySQL on db1086 to clone db1136
[05:42:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:26] (CR) Elukey: "Update: planning to start working on this the first week of June if nobody else has free time before!" [puppet] - https://gerrit.wikimedia.org/r/492948 (owner: Aaron Schulz)
[05:51:37] Operations, DBA, Performance-Team, serviceops, and 2 others: Increased instability in MediaWiki backends (according to load balancers) - https://phabricator.wikimedia.org/T223952 (Marostegui) Update from the subtask related to the slow query detected (which might or might not be the cause for thi...
[06:01:36] !log Stop MySQL on db2040 - T224079
[06:01:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:41] T224079: Decommission db2040 - https://phabricator.wikimedia.org/T224079
[06:04:48] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:05:40] Operations, PHP 7.2 support, Performance-Team (Radar): Monitoring PHP 7 APC usage - https://phabricator.wikimedia.org/T223180 (Joe) A couple general graphs were added to https://grafana.wikimedia.org/d/GuHySj3mz/php7-transition?refresh=30s&orgId=1&from=now-12h&to=now to monitor APC status. I woul...
[06:08:30] (PS3) Giuseppe Lavagetto: profile::keyholder::server: profile for keyholder installation [puppet] - https://gerrit.wikimedia.org/r/508577
[06:10:54] Operations, PHP 7.2 support, Wikimedia-production-error: PHP7 opcache sometimes corrupts when cleared (was: Fatal ConfigException, undefined InitialiseSettings variable) - https://phabricator.wikimedia.org/T221347 (Joe) Open→Resolved a:Joe FWIW, we had no more occurrences of this problem...
[06:11:07] Operations, Core Platform Team (PHP7 (TEC4)), Core Platform Team Kanban (Doing), HHVM, and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Joe)
[06:11:28] Operations, DBA, Performance-Team, serviceops, HHVM: Increased instability in MediaWiki backends (according to load balancers) - https://phabricator.wikimedia.org/T223952 (Joe)
[06:12:25] Operations, PHP 7.2 support, Performance-Team (Radar): Monitoring PHP 7 APC usage - https://phabricator.wikimedia.org/T223180 (Joe) a:Joe
[06:12:31] (PS1) Marostegui: db-eqiad,db-codfw.php: Remove db2040 [mediawiki-config] - https://gerrit.wikimedia.org/r/511809 (https://phabricator.wikimedia.org/T224079)
[06:13:06] !log Remove db2040 from zarcillo and tendril - T224079
[06:13:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:13:11] T224079: Decommission db2040 - https://phabricator.wikimedia.org/T224079
[06:13:50] (CR) Marostegui: [C: +2] db-eqiad,db-codfw.php: Remove db2040 [mediawiki-config] - https://gerrit.wikimedia.org/r/511809 (https://phabricator.wikimedia.org/T224079) (owner: Marostegui)
[06:14:48] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Remove db2040 [mediawiki-config] - https://gerrit.wikimedia.org/r/511809 (https://phabricator.wikimedia.org/T224079) (owner: Marostegui)
[06:16:12] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db2040 from config T224079 (duration: 00m 56s)
[06:16:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:16:31] (CR) jenkins-bot: db-eqiad,db-codfw.php: Remove db2040 [mediawiki-config] - https://gerrit.wikimedia.org/r/511809 (https://phabricator.wikimedia.org/T224079) (owner: Marostegui)
[06:17:21] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Remove db2040 from config T224079 (duration: 00m 55s)
[06:17:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:05] (PS1) Elukey: profile::hadoop::balancer: run timer as hdfs user [puppet] - https://gerrit.wikimedia.org/r/511811
[06:43:22] Operations, Maps, Operations-Software-Development, User-Joe, User-jijiki: Create cookbook to reboot Maps - https://phabricator.wikimedia.org/T224072 (Mathew.onipe)
[06:44:28] (CR) Elukey: [C: +2] profile::hadoop::balancer: run timer as hdfs user [puppet] - https://gerrit.wikimedia.org/r/511811 (owner: Elukey)
[06:48:18] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational
[07:17:23] (PS1) Marostegui: db-codfw.php: Tackle s4 weights [mediawiki-config] - https://gerrit.wikimedia.org/r/511814 (https://phabricator.wikimedia.org/T220170)
[07:18:48] (PS2) Marostegui: db-codfw.php: Tackle s4 weights [mediawiki-config] - https://gerrit.wikimedia.org/r/511814 (https://phabricator.wikimedia.org/T220170)
[07:20:16] (CR) Marostegui: [C: +2] db-codfw.php: Tackle s4 weights [mediawiki-config] - https://gerrit.wikimedia.org/r/511814 (https://phabricator.wikimedia.org/T220170) (owner: Marostegui)
[07:20:19] (PS1) Marostegui: db2090: Make it candidate master for s4 [puppet] - https://gerrit.wikimedia.org/r/511816 (https://phabricator.wikimedia.org/T220170)
[07:22:03] (CR) Marostegui: [C: +2] db2090: Make it candidate master for s4 [puppet] - https://gerrit.wikimedia.org/r/511816 (https://phabricator.wikimedia.org/T220170) (owner: Marostegui)
[07:23:23] (Merged) jenkins-bot: db-codfw.php: Tackle s4 weights [mediawiki-config] - https://gerrit.wikimedia.org/r/511814 (https://phabricator.wikimedia.org/T220170) (owner: Marostegui)
[07:23:37] (CR) jenkins-bot: db-codfw.php: Tackle s4 weights [mediawiki-config] - https://gerrit.wikimedia.org/r/511814 (https://phabricator.wikimedia.org/T220170) (owner: Marostegui)
[07:23:38] !log Restart MySQL on db2090 to change binlog format T220170
[07:23:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:23:44] T220170: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment - https://phabricator.wikimedia.org/T220170
[07:24:42] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Tackle s4 codfw weights T220170 (duration: 01m 06s)
[07:24:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:34:34] (PS1) Marostegui: db-codfw.php: Tackle s8 weights [mediawiki-config] - https://gerrit.wikimedia.org/r/511818 (https://phabricator.wikimedia.org/T220170)
[07:36:51] !log decommission restbase1007-c - T223976
[07:36:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:56] T223976: Decommission restbase10(0[7-9]|1[0-5]) - https://phabricator.wikimedia.org/T223976
[07:38:25] Operations, Patch-For-Review: Standardizing our partman recipes - https://phabricator.wikimedia.org/T156955 (MoritzMuehlenhoff) One note on db.cfg; it's used by DBAs specifically for the initial setup of a host, if no current server has it configured in Puppet, this doesn't mean that it's unused/up for r...
[07:39:08] (03CR) 10Marostegui: [C: 03+2] db-codfw.php: Tackle s8 weights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511818 (https://phabricator.wikimedia.org/T220170) (owner: 10Marostegui) [07:40:10] (03Merged) 10jenkins-bot: db-codfw.php: Tackle s8 weights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511818 (https://phabricator.wikimedia.org/T220170) (owner: 10Marostegui) [07:40:24] (03CR) 10jenkins-bot: db-codfw.php: Tackle s8 weights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511818 (https://phabricator.wikimedia.org/T220170) (owner: 10Marostegui) [07:40:25] 10Operations, 10Dumps-Generation: Reboot dumps/snapshot hosts - https://phabricator.wikimedia.org/T223962 (10ArielGlenn) [07:40:38] (03PS1) 10Mathew.onipe: Add maps reboot cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/511819 (https://phabricator.wikimedia.org/T224072) [07:41:18] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Tackle s8 codfw weights T220170 (duration: 00m 55s) [07:41:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:41:22] T220170: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment - https://phabricator.wikimedia.org/T220170 [07:44:17] (03CR) 10Hashar: "> does the logger qualname have to match the loggers listed above? if so.. 
there is zuul.stack_dump vs zuul_stack_dump" [puppet] - 10https://gerrit.wikimedia.org/r/505253 (owner: 10Hashar) [07:46:09] 10Operations, 10Dumps-Generation: Reboot dumps/snapshot hosts - https://phabricator.wikimedia.org/T223962 (10ArielGlenn) [07:50:23] (03CR) 10Muehlenhoff: Add maps reboot cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/511819 (https://phabricator.wikimedia.org/T224072) (owner: 10Mathew.onipe) [07:55:54] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511821 [07:56:42] (03PS2) 10Mathew.onipe: Add maps reboot cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/511819 (https://phabricator.wikimedia.org/T224072) [07:58:01] (03CR) 10Muehlenhoff: [C: 03+1] debdeploy client: update defaults [puppet] - 10https://gerrit.wikimedia.org/r/511760 (owner: 10Jbond) [07:58:09] (03CR) 10jerkins-bot: [V: 04-1] Add maps reboot cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/511819 (https://phabricator.wikimedia.org/T224072) (owner: 10Mathew.onipe) [08:01:02] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Slowly repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511821 (owner: 10Marostegui) [08:02:07] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511821 (owner: 10Marostegui) [08:02:21] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511821 (owner: 10Marostegui) [08:03:25] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 (duration: 00m 55s) [08:03:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:07:55] (03CR) 10Gehel: [C: 04-1] Add maps reboot cookbook (034 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/511819 (https://phabricator.wikimedia.org/T224072) (owner: 10Mathew.onipe) [08:07:58] (03PS1) 10Marostegui: db-eqiad.php. 
Repool db1086 into API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511822 [08:08:28] (03PS1) 10Elukey: profile::analytics::refinery: change ownership of files in logrotate [puppet] - 10https://gerrit.wikimedia.org/r/511823 [08:09:23] (03CR) 10Elukey: [C: 03+2] profile::analytics::refinery: change ownership of files in logrotate [puppet] - 10https://gerrit.wikimedia.org/r/511823 (owner: 10Elukey) [08:15:06] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php. Repool db1086 into API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511822 (owner: 10Marostegui) [08:16:07] (03Merged) 10jenkins-bot: db-eqiad.php. Repool db1086 into API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511822 (owner: 10Marostegui) [08:16:40] (03CR) 10jenkins-bot: db-eqiad.php. Repool db1086 into API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511822 (owner: 10Marostegui) [08:17:15] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1086 into API (duration: 00m 56s) [08:17:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:20:06] (03CR) 10Mathew.onipe: "I should find a way to make sure node" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/511819 (https://phabricator.wikimedia.org/T224072) (owner: 10Mathew.onipe) [08:23:34] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511827 [08:30:14] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Fully repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511827 (owner: 10Marostegui) [08:31:13] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511827 (owner: 10Marostegui) [08:31:27] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511827 (owner: 10Marostegui) [08:31:35] (03PS11) 10Gehel: elasticsearch: add new attribute [puppet] - 10https://gerrit.wikimedia.org/r/507950 
(https://phabricator.wikimedia.org/T218932) (owner: 10Mathew.onipe) [08:32:23] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 56s) [08:32:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:32:37] (03CR) 10Gehel: [C: 03+2] elasticsearch: add new attribute [puppet] - 10https://gerrit.wikimedia.org/r/507950 (https://phabricator.wikimedia.org/T218932) (owner: 10Mathew.onipe) [08:39:27] (03CR) 10Gehel: [C: 04-1] "PPC is failing: https://puppet-compiler.wmflabs.org/compiler1001/16681/" [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: 10Mathew.onipe) [08:42:20] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [08:42:21] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [08:42:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:42:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:44:20] ACKNOWLEDGEMENT - MD RAID on sulfur is CRITICAL: connect to address 208.80.154.87 port 5666: No route to host nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T224087 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [08:44:24] 10Operations, 10ops-eqiad: Degraded RAID on sulfur - https://phabricator.wikimedia.org/T224087 (10ops-monitoring-bot) [08:46:59] looking, I rebooted that one [08:47:16] (03PS1) 10Marostegui: db-codfw.php: Move db2070 to m5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511831 (https://phabricator.wikimedia.org/T221533) [08:50:26] (03CR) 10Marostegui: [C: 03+2] db-codfw.php: Move db2070 to m5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511831 (https://phabricator.wikimedia.org/T221533) (owner: 10Marostegui) [08:51:04] (03PS1) 10Marostegui: mariadb: Move db2070 from s1 to m5 [puppet] - 10https://gerrit.wikimedia.org/r/511833 
(https://phabricator.wikimedia.org/T221533) [08:51:26] (03Merged) 10jenkins-bot: db-codfw.php: Move db2070 to m5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511831 (https://phabricator.wikimedia.org/T221533) (owner: 10Marostegui) [08:51:40] !log marostegui@deploy1001 sync-file aborted: Move db2070 from s1 to m5 (duration: 00m 03s) [08:51:40] (03CR) 10jenkins-bot: db-codfw.php: Move db2070 to m5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511831 (https://phabricator.wikimedia.org/T221533) (owner: 10Marostegui) [08:51:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:51:48] 10Operations, 10ops-eqiad: Degraded RAID on sulfur - https://phabricator.wikimedia.org/T224087 (10MoritzMuehlenhoff) 05Open→03Invalid The host is fine, the host was rebooted and the NRPE check happened during the reboot (despite being downtimed) [08:51:53] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Move db2070 from s1 to m5 [puppet] - 10https://gerrit.wikimedia.org/r/511833 (https://phabricator.wikimedia.org/T221533) (owner: 10Marostegui) [08:52:41] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Move db2070 from s1 to m5 (duration: 00m 55s) [08:52:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:54:21] 10Operations, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (10hashar) a:05hashar→03None [08:55:38] (03PS2) 10Marostegui: mariadb: Move db2070 from s1 to m5 [puppet] - 10https://gerrit.wikimedia.org/r/511833 (https://phabricator.wikimedia.org/T221533) [08:55:41] (03CR) 10Arturo Borrero Gonzalez: "I don't understand." 
[puppet] - 10https://gerrit.wikimedia.org/r/511769 (owner: 10Jbond) [08:56:26] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Move db2070 from s1 to m5 [puppet] - 10https://gerrit.wikimedia.org/r/511833 (https://phabricator.wikimedia.org/T221533) (owner: 10Marostegui) [08:56:28] 10Operations, 10WMDE-QWERTY-Team, 10serviceops, 10wikidiff2, 10WMDE-QWERTY-Sprint-2019-05-15: Deploy Wikidiff2 version 1.8.2 with the timeout issue fixed - https://phabricator.wikimedia.org/T223391 (10thiemowmde) @Tobi_WMDE_SW, can you take care of this and possibly assign people who are able to deploy t... [08:58:50] (03PS3) 10Marostegui: mariadb: Move db2070 from s1 to m5 [puppet] - 10https://gerrit.wikimedia.org/r/511833 (https://phabricator.wikimedia.org/T221533) [08:59:55] (03CR) 10Arturo Borrero Gonzalez: "would you please include a PCC run?" [puppet] - 10https://gerrit.wikimedia.org/r/511701 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [09:01:26] (03CR) 10Marostegui: "PCC looks happy: https://puppet-compiler.wmflabs.org/compiler1001/16683/" [puppet] - 10https://gerrit.wikimedia.org/r/511833 (https://phabricator.wikimedia.org/T221533) (owner: 10Marostegui) [09:01:30] (03CR) 10Marostegui: [C: 03+2] mariadb: Move db2070 from s1 to m5 [puppet] - 10https://gerrit.wikimedia.org/r/511833 (https://phabricator.wikimedia.org/T221533) (owner: 10Marostegui) [09:02:25] (03CR) 10Vgutierrez: [C: 03+2] "NOOP on existing instances https://puppet-compiler.wmflabs.org/compiler1002/16682/ and works as expected on the traffic-upload labs instan" [puppet] - 10https://gerrit.wikimedia.org/r/511716 (https://phabricator.wikimedia.org/T221594) (owner: 10Vgutierrez) [09:02:39] (03PS2) 10Vgutierrez: ATS: Toggle use of elevated privileges to load TLS material [puppet] - 10https://gerrit.wikimedia.org/r/511716 (https://phabricator.wikimedia.org/T221594) [09:03:16] 10Operations, 10ops-eqiad, 10cloud-services-team: cloudvirt1028 - no PS redundancy - https://phabricator.wikimedia.org/T224065 (10Volans) 
p:Triage→Normal [09:05:46] (CR) Muehlenhoff: "I agree with Andrew, this is a transitional/temporary and simplicity is a certain win here." [puppet] - https://gerrit.wikimedia.org/r/508311 (https://phabricator.wikimedia.org/T221225) (owner: Arturo Borrero Gonzalez) [09:06:17] (PS2) Michael Große: Add a list of IDs to skip in production [mediawiki-config] - https://gerrit.wikimedia.org/r/511753 [09:06:21] (CR) Vgutierrez: [C: +2] ATS: Fix service definition for non default instances [puppet] - https://gerrit.wikimedia.org/r/511711 (https://phabricator.wikimedia.org/T221217) (owner: Vgutierrez) [09:06:29] (PS3) Vgutierrez: ATS: Fix service definition for non default instances [puppet] - https://gerrit.wikimedia.org/r/511711 (https://phabricator.wikimedia.org/T221217) [09:06:42] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/16678/deploy1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/508577 (owner: Giuseppe Lavagetto) [09:06:52] (PS4) Giuseppe Lavagetto: profile::keyholder::server: profile for keyholder installation [puppet] - https://gerrit.wikimedia.org/r/508577 [09:08:14] (PS5) Giuseppe Lavagetto: profile::keyholder::server: profile for keyholder installation [puppet] - https://gerrit.wikimedia.org/r/508577 [09:15:14] (PS3) Giuseppe Lavagetto: profile::mediawiki::scap_client: rationalize scap2 installation [puppet] - https://gerrit.wikimedia.org/r/508578 [09:16:50] !log mobrovac@deploy1001 Started deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - T215956 [09:16:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:16:55] T215956: Consider stashing data-parsoid for VE - https://phabricator.wikimedia.org/T215956 [09:17:12] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/16684/deploy1001.eqiad.wmnet/" [puppet]
- https://gerrit.wikimedia.org/r/508578 (owner: Giuseppe Lavagetto) [09:26:02] (PS1) Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - https://gerrit.wikimedia.org/r/511838 [09:27:08] Operations, DNS, Matrix, Traffic, Wikimedia-Apache-configuration: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (Volans) p:Triage→Normal [09:29:39] (CR) Mathew.onipe: "PCC is noop: https://puppet-compiler.wmflabs.org/compiler1001/16685/" [puppet] - https://gerrit.wikimedia.org/r/511838 (owner: Mathew.onipe) [09:34:34] (CR) Giuseppe Lavagetto: [C: +2] profile::mediawiki::deployment::server: rationalize puppetization. [puppet] - https://gerrit.wikimedia.org/r/508579 (owner: Giuseppe Lavagetto) [09:36:35] Operations, ops-eqiad, cloud-services-team (Kanban): cloudvirt1028 - no PS redundancy - https://phabricator.wikimedia.org/T224065 (aborrero) This server is in the `B5` rack https://netbox.wikimedia.org/dcim/racks/13/ Is probably affected by the operations done in {T223126} [09:36:45] (PS3) Giuseppe Lavagetto: profile::mediawiki::deployment::server: rationalize puppetization. [puppet] - https://gerrit.wikimedia.org/r/508579 [09:37:28] (CR) 20after4: [C: +1] "This is really awesome and it would be so helpful in tracking down the cause of the leaked workers on phabricator."
[puppet] - https://gerrit.wikimedia.org/r/511751 (owner: Ori.livneh) [09:39:19] (PS2) Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - https://gerrit.wikimedia.org/r/511838 [09:39:35] (PS1) Marostegui: install_server: Do not reimage db2070 [puppet] - https://gerrit.wikimedia.org/r/511840 [09:40:15] (CR) Marostegui: [C: +2] install_server: Do not reimage db2070 [puppet] - https://gerrit.wikimedia.org/r/511840 (owner: Marostegui) [09:42:36] !log Stop MySQL on db2078:m5 to clone db2070 - T221533 [09:42:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:42:41] T221533: Decommission old coredb machines (<=db2042) - https://phabricator.wikimedia.org/T221533 [09:43:57] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - T215956 (duration: 27m 07s) [09:44:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:44:05] T215956: Consider stashing data-parsoid for VE - https://phabricator.wikimedia.org/T215956 [09:44:07] (PS46) Mathew.onipe: icinga: create and apply cirrus config check [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [09:46:23] PROBLEM - Check Varnish expiry mailbox lag on cp3035 is CRITICAL: CRITICAL: expiry mailbox lag is 2033429 https://wikitech.wikimedia.org/wiki/Varnish [09:46:49] vgutierrez: for your joy ;) ^^^ [09:47:33] (CR) Mathew.onipe: "PCC is ok now: https://puppet-compiler.wmflabs.org/compiler1001/16688/" [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: Mathew.onipe) [09:47:43] (PS1) Volans: Matrix wikimedia.org IDs domain authorization [dns] - https://gerrit.wikimedia.org/r/511842 (https://phabricator.wikimedia.org/T223835) [09:47:53] PROBLEM - Check systemd state on ms-be2017 is CRITICAL: CRITICAL - degraded:
The system is operational but one or more units failed. [09:48:13] volans: <3 [09:51:03] (PS3) Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - https://gerrit.wikimedia.org/r/511838 [09:52:18] !log start the en, fr and de wiki dumps again to populate the new parsoid table - T215956 [09:52:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:23] T215956: Consider stashing data-parsoid for VE - https://phabricator.wikimedia.org/T215956 [09:52:56] (PS1) Ladsgroup: deploy WikibaseSchema to test [mediawiki-config] - https://gerrit.wikimedia.org/r/511844 (https://phabricator.wikimedia.org/T216956) [09:56:10] PROBLEM - Check Varnish expiry mailbox lag on cp3039 is CRITICAL: CRITICAL: expiry mailbox lag is 2101315 https://wikitech.wikimedia.org/wiki/Varnish [09:57:11] (PS4) Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - https://gerrit.wikimedia.org/r/511838 [09:59:59] PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [10:01:33] PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [10:01:38] !log restarting varnish-backend on cp3039 [10:01:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:02:36] (PS5) Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - https://gerrit.wikimedia.org/r/511838 [10:05:08] (PS4) Giuseppe Lavagetto: profile::mediawiki::deployment::server: rationalize puppetization.
[puppet] - https://gerrit.wikimedia.org/r/508579 [10:05:24] (CR) Mathew.onipe: "PCC is noop: https://puppet-compiler.wmflabs.org/compiler1001/16693/" [puppet] - https://gerrit.wikimedia.org/r/511838 (owner: Mathew.onipe) [10:06:35] RECOVERY - Check Varnish expiry mailbox lag on cp3039 is OK: OK: expiry mailbox lag is 0 https://wikitech.wikimedia.org/wiki/Varnish [10:07:01] RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [10:08:09] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [10:20:17] (PS1) Muehlenhoff: Revoke bawolff's access due to lost laptop [puppet] - https://gerrit.wikimedia.org/r/511845 [10:21:13] (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/511845 (owner: Muehlenhoff) [10:24:15] (CR) Muehlenhoff: [C: +2] Revoke bawolff's access due to lost laptop [puppet] - https://gerrit.wikimedia.org/r/511845 (owner: Muehlenhoff) [10:25:15] RECOVERY - Check systemd state on ms-be2017 is OK: OK - running: The system is fully operational [10:25:34] (CR) Jbond: [C: +1] "LGTM just one minor comment" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/511689 (owner: Muehlenhoff) [10:25:44] (PS1) Mobrovac: RESTBase: Replace soon-to-be-removed nodes with new ones [puppet] - https://gerrit.wikimedia.org/r/511846 (https://phabricator.wikimedia.org/T223976) [10:28:45] PROBLEM - puppet last run on mw1323 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures.
Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:29:41] <_joe_> uhm [10:30:03] PROBLEM - puppet last run on mw2285 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:30:07] PROBLEM - puppet last run on mw2165 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:30:14] <_joe_> yeah something's wrong with the patch [10:30:19] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:30:25] PROBLEM - puppet last run on mw2258 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:30:47] PROBLEM - puppet last run on mw1283 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:31:23] PROBLEM - puppet last run on mw2147 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:31:27] PROBLEM - puppet last run on mw1303 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:31:29] <_joe_> gpasswd: user 'bawolff' does not exist [10:31:29] PROBLEM - puppet last run on mw1311 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:31:45] PROBLEM - puppet last run on mw1314 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:32:13] PROBLEM - puppet last run on mw1319 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:32:15] PROBLEM - puppet last run on mw1265 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:32:15] PROBLEM - puppet last run on mw2281 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:32:23] PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:32:25] PROBLEM - puppet last run on mw1252 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:32:28] (PS1) Urbanecm: Set wgLocaltimezone for euwiki to Europe/Berlin [mediawiki-config] - https://gerrit.wikimedia.org/r/511849 (https://phabricator.wikimedia.org/T224091) [10:33:01] PROBLEM - puppet last run on mw1312 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:33:03] PROBLEM - puppet last run on deploy2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:33:11] PROBLEM - puppet last run on mw2162 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:33:11] PROBLEM - puppet last run on mw2202 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures.
Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:33:13] (PS1) Giuseppe Lavagetto: admin: remove bawolff from remaining groups [puppet] - https://gerrit.wikimedia.org/r/511850 [10:33:31] PROBLEM - puppet last run on mw2265 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:33:31] PROBLEM - puppet last run on mw2288 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:33:31] PROBLEM - puppet last run on mw2221 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:33:53] PROBLEM - puppet last run on mw1247 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:33:57] PROBLEM - puppet last run on mw1262 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:05] PROBLEM - puppet last run on snapshot1009 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:07] PROBLEM - puppet last run on mw1282 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:07] PROBLEM - puppet last run on mw1316 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:07] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures.
Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:13] PROBLEM - puppet last run on mw2280 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:13] PROBLEM - puppet last run on mw2248 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:13] PROBLEM - puppet last run on mw2234 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:13] PROBLEM - puppet last run on mw2277 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:13] PROBLEM - puppet last run on mw2223 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:14] PROBLEM - puppet last run on mw2290 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:14] PROBLEM - puppet last run on mw2157 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:15] PROBLEM - puppet last run on mw2182 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:15] PROBLEM - puppet last run on mw2141 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:16] PROBLEM - puppet last run on mw2195 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:16] PROBLEM - puppet last run on mw2212 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:23] PROBLEM - puppet last run on deploy1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:26] (CR) Vgutierrez: [C: +1] admin: remove bawolff from remaining groups [puppet] - https://gerrit.wikimedia.org/r/511850 (owner: Giuseppe Lavagetto) [10:34:29] PROBLEM - puppet last run on mw2152 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:34:39] (CR) ArielGlenn: [C: +1] admin: remove bawolff from remaining groups [puppet] - https://gerrit.wikimedia.org/r/511850 (owner: Giuseppe Lavagetto) [10:34:41] (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/511850 (owner: Giuseppe Lavagetto) [10:34:52] (CR) Giuseppe Lavagetto: [C: +2] admin: remove bawolff from remaining groups [puppet] - https://gerrit.wikimedia.org/r/511850 (owner: Giuseppe Lavagetto) [10:34:57] PROBLEM - puppet last run on mw1239 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:35:03] PROBLEM - puppet last run on mw1242 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:35:10] PROBLEM - puppet last run on mw2183 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:35:11] PROBLEM - puppet last run on mw1309 is CRITICAL: CRITICAL: Puppet has 1 failures.
Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:35:17] PROBLEM - puppet last run on mw2135 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:35:17] PROBLEM - puppet last run on mw2144 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:35:21] PROBLEM - puppet last run on labweb1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:35:21] PROBLEM - puppet last run on mw1326 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:35:29] PROBLEM - puppet last run on mw2251 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:35:29] PROBLEM - puppet last run on mw2177 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:35:49] PROBLEM - puppet last run on mw2169 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:35:53] PROBLEM - puppet last run on mw1346 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:36:05] PROBLEM - puppet last run on mw2266 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:36:17] PROBLEM - puppet last run on mw1332 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:36:17] PROBLEM - puppet last run on mw1339 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:36:21] PROBLEM - puppet last run on mw1322 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:36:35] PROBLEM - puppet last run on mw2246 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:36:35] PROBLEM - puppet last run on mw2156 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:36:37] PROBLEM - puppet last run on mw1293 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:36:41] PROBLEM - puppet last run on mw2137 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:36:45] PROBLEM - puppet last run on mw1295 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:36:57] PROBLEM - puppet last run on mw1284 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:37:07] PROBLEM - puppet last run on mw2252 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:37:11] PROBLEM - puppet last run on mw2240 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 8 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:37:11] PROBLEM - puppet last run on mw2214 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:37:41] PROBLEM - puppet last run on mw1276 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:37:49] PROBLEM - puppet last run on mw2255 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:37:49] PROBLEM - puppet last run on mw2176 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:37:49] PROBLEM - puppet last run on mw2172 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:37:59] PROBLEM - puppet last run on mw2245 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:37:59] PROBLEM - puppet last run on mw2287 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 8 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:38:03] PROBLEM - puppet last run on mw2218 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:38:27] RECOVERY - puppet last run on deploy2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:38:27] PROBLEM - puppet last run on mw2220 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:38:33] PROBLEM - puppet last run on mw1287 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:38:55] PROBLEM - puppet last run on mw2231 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:38:57] PROBLEM - puppet last run on mw2244 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:38:59] PROBLEM - puppet last run on mw2225 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:38:59] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:39:05] PROBLEM - puppet last run on mw1294 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:39:09] PROBLEM - puppet last run on mw2274 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:39:17] PROBLEM - puppet last run on mw1245 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:39:29] PROBLEM - puppet last run on mw1310 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:39:33] PROBLEM - puppet last run on mw2256 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:39:35] PROBLEM - puppet last run on mw2201 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:39:35] PROBLEM - puppet last run on mw2279 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:39:35] PROBLEM - puppet last run on mw2233 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:39:35] PROBLEM - puppet last run on mw2286 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:39:35] PROBLEM - puppet last run on mw2253 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:39:45] RECOVERY - puppet last run on deploy1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:39:47] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:40:21] PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:40:31] PROBLEM - puppet last run on mw1263 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:40:37] PROBLEM - puppet last run on mw1274 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:40:43] PROBLEM - puppet last run on mw1345 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:40:43] PROBLEM - puppet last run on mw2160 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:40:49] just ran manually on mw2231 and we're back to ok [10:40:53] PROBLEM - puppet last run on an-master1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[analytics-privatedata-users_ensure_members] [10:40:53] PROBLEM - puppet last run on mw1279 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:41:03] apergos: <3 [10:41:05] PROBLEM - puppet last run on mw2186 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:41:19] PROBLEM - puppet last run on mw2138 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:41:19] PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:41:24] just might have some spam until these clear [10:41:31] E_TOOMANYSERVERS [10:41:57] PROBLEM - puppet last run on mwlog2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:42:19] PROBLEM - puppet last run on mw1255 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:42:21] PROBLEM - puppet last run on mw2271 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:42:37] PROBLEM - puppet last run on mw2146 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:43:09] PROBLEM - puppet last run on mw2262 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[deployment_ensure_members] [10:44:17] RECOVERY - puppet last run on mw2231 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [10:55:47] RECOVERY - puppet last run on mw1323 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [10:57:05] RECOVERY - puppet last run on mw2285 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [10:57:23] RECOVERY - puppet last run on mw2258 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [10:57:23] (PS1) Marostegui: db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/511860 (https://phabricator.wikimedia.org/T224017) [10:57:43] RECOVERY - puppet last run on mw1283 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:58:23] RECOVERY - puppet last run on mw2147 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:58:25] RECOVERY - puppet last run on mw1303 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:58:29] RECOVERY - puppet last run on mw1311 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:58:43] RECOVERY - puppet last run on mw1314 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:59:11] RECOVERY - puppet
last run on mw1319 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [10:59:13] RECOVERY - puppet last run on mw1265 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:59:15] RECOVERY - puppet last run on mw2281 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:21] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [10:59:23] RECOVERY - puppet last run on mw1252 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [10:59:23] RECOVERY - puppet last run on mw2255 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [10:59:57] RECOVERY - puppet last run on mw1312 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:04] MaxSem, RoanKattouw, and Niharika: Dear deployers, time to do the European Mid-day SWAT(Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190522T1100). [11:00:04] matthiasmullie and Amir1: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[11:00:09] RECOVERY - puppet last run on mw2202 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [11:00:19] o/ [11:00:26] I can SWAT today I guess [11:00:29] RECOVERY - puppet last run on mw2265 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:30] RECOVERY - puppet last run on mw2288 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:30] RECOVERY - puppet last run on mw2221 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:00:50] RECOVERY - puppet last run on mw1247 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:00:55] RECOVERY - puppet last run on mw1262 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:00:58] Amir1: I can do mine if you want [11:01:07] RECOVERY - puppet last run on snapshot1009 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:01:07] RECOVERY - puppet last run on mw1282 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:01:07] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:01:07] RECOVERY - puppet last run on mw1316 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:01:13] RECOVERY - puppet last run on mw2234 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [11:01:13] RECOVERY - puppet last run on mw2277 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [11:01:13] RECOVERY - puppet last run on mw2280 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:01:13] RECOVERY - puppet last run on mw2248 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:01:13] RECOVERY - puppet last run on mw2290 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 
failures [11:01:14] RECOVERY - puppet last run on mw2212 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:01:14] RECOVERY - puppet last run on mw2223 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:01:15] RECOVERY - puppet last run on mw2195 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:01:15] RECOVERY - puppet last run on mw2157 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:01:16] RECOVERY - puppet last run on mw2141 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:01:16] RECOVERY - puppet last run on mw2182 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:01:29] RECOVERY - puppet last run on mw2152 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:01:32] matthiasmullie: nah, it's a lot of hassle to set up the system [11:01:41] I need to do it anyway [11:01:45] looks like all 3 patches are InitialiseSettings.php [11:01:57] RECOVERY - puppet last run on mw1239 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:02:03] RECOVERY - puppet last run on mw1242 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [11:02:11] RECOVERY - puppet last run on mw2183 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [11:02:13] RECOVERY - puppet last run on mw1309 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:02:19] RECOVERY - puppet last run on mw2144 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:02:20] RECOVERY - puppet last run on mw2135 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:02:23] RECOVERY - puppet last run on labweb1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:02:23] RECOVERY - puppet last run 
on mw1326 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [11:02:29] RECOVERY - puppet last run on mw2251 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [11:02:29] RECOVERY - puppet last run on mw2177 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [11:02:31] RECOVERY - puppet last run on mw2165 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [11:02:41] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:02:45] (PS2) Ladsgroup: [SDC] Enable depicts qualifiers on testcommons [mediawiki-config] - https://gerrit.wikimedia.org/r/511674 (owner: Matthias Mullie) [11:02:47] RECOVERY - puppet last run on mw2169 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:02:49] RECOVERY - puppet last run on mw1346 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:02:49] (CR) Ladsgroup: [C: +2] [SDC] Enable depicts qualifiers on testcommons [mediawiki-config] - https://gerrit.wikimedia.org/r/511674 (owner: Matthias Mullie) [11:03:03] RECOVERY - puppet last run on mw2266 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:03:15] RECOVERY - puppet last run on mw1339 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:19] RECOVERY - puppet last run on mw1322 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:31] RECOVERY - puppet last run on mw2246 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [11:03:31] RECOVERY - puppet last run on mw2156 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:03:33] RECOVERY - puppet last run on mw1293 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [11:03:37] RECOVERY - puppet last run on 
mw2137 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:03:41] RECOVERY - puppet last run on mw1295 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:03:57] RECOVERY - puppet last run on mw1284 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [11:04:04] (Merged) jenkins-bot: [SDC] Enable depicts qualifiers on testcommons [mediawiki-config] - https://gerrit.wikimedia.org/r/511674 (owner: Matthias Mullie) [11:04:07] RECOVERY - puppet last run on mw2252 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:10] RECOVERY - puppet last run on mw2240 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [11:04:11] RECOVERY - puppet last run on mw2214 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [11:04:21] (CR) jenkins-bot: [SDC] Enable depicts qualifiers on testcommons [mediawiki-config] - https://gerrit.wikimedia.org/r/511674 (owner: Matthias Mullie) [11:04:47] RECOVERY - puppet last run on mw2172 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:04:47] RECOVERY - puppet last run on mw2176 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:04:55] RECOVERY - puppet last run on mw2287 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [11:04:55] RECOVERY - puppet last run on mw2245 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:05:01] RECOVERY - puppet last run on mw2218 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:05:23] RECOVERY - puppet last run on mw2220 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:05:29] RECOVERY - puppet last run on mw1287 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:05:33] RECOVERY - puppet last run on mw2162 is 
OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:05:55] RECOVERY - puppet last run on mw2244 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:05:57] RECOVERY - puppet last run on mw2225 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:05:57] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:05:58] matthiasmullie: your patch is live on mwdebug1002 [11:06:03] RECOVERY - puppet last run on mw1294 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:06:09] RECOVERY - puppet last run on mw2274 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:06:13] RECOVERY - puppet last run on mw1245 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [11:06:14] okay [11:06:31] RECOVERY - puppet last run on mw1310 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:06:33] RECOVERY - puppet last run on mw2256 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [11:06:33] RECOVERY - puppet last run on mw2201 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:06:35] RECOVERY - puppet last run on mw2233 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:06:35] RECOVERY - puppet last run on mw2286 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:35] RECOVERY - puppet last run on mw2279 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:35] RECOVERY - puppet last run on mw2253 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:06:49] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:07:19] RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is 
currently enabled, last run 4 minutes ago with 0 failures [11:07:29] RECOVERY - puppet last run on mw1263 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:07:35] RECOVERY - puppet last run on mw1274 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:07:45] RECOVERY - puppet last run on mw1345 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:07:45] RECOVERY - puppet last run on mw2160 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:07:53] RECOVERY - puppet last run on an-master1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:07:53] RECOVERY - puppet last run on mw1279 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:08:03] RECOVERY - puppet last run on mw2186 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:08:15] RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:08:15] RECOVERY - puppet last run on mw2138 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:08:37] RECOVERY - puppet last run on mw1332 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [11:08:55] RECOVERY - puppet last run on mwlog2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:09:17] RECOVERY - puppet last run on mw1255 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:09:19] RECOVERY - puppet last run on mw2271 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:09:37] RECOVERY - puppet last run on mw2146 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:10:03] RECOVERY - puppet last run on mw1276 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [11:10:07] RECOVERY - 
puppet last run on mw2262 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:11:34] Amir1: seems to work! [11:12:08] okie dokie, I go live [11:14:13] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:511674|[SDC] Enable depicts qualifiers on testcommons]] (duration: 00m 57s) [11:14:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:47] (PS7) Ladsgroup: Add configuration for EntitySchema ShExSimpleUrl [mediawiki-config] - https://gerrit.wikimedia.org/r/509878 (https://phabricator.wikimedia.org/T223120) (owner: Michael Große) [11:16:33] (CR) Ladsgroup: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/509878 (https://phabricator.wikimedia.org/T223120) (owner: Michael Große) [11:17:32] (Merged) jenkins-bot: Add configuration for EntitySchema ShExSimpleUrl [mediawiki-config] - https://gerrit.wikimedia.org/r/509878 (https://phabricator.wikimedia.org/T223120) (owner: Michael Große) [11:17:44] (CR) jenkins-bot: Add configuration for EntitySchema ShExSimpleUrl [mediawiki-config] - https://gerrit.wikimedia.org/r/509878 (https://phabricator.wikimedia.org/T223120) (owner: Michael Große) [11:19:48] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:509878|Add configuration for EntitySchema ShExSimpleUrl (T223120)]] (duration: 00m 56s) [11:19:49] (CR) Arturo Borrero Gonzalez: "Adding Bryan Davis as reviewer, since he is the mind behind striker." 
[puppet] - https://gerrit.wikimedia.org/r/511769 (owner: Jbond) [11:19:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:53] T223120: Add configuration for ShExSimple url - https://phabricator.wikimedia.org/T223120 [11:21:09] (PS2) Muehlenhoff: Make the openldap class compatible with Cloud VPS [puppet] - https://gerrit.wikimedia.org/r/511689 [11:21:15] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/16694/deploy1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/508579 (owner: Giuseppe Lavagetto) [11:21:25] (PS5) Giuseppe Lavagetto: profile::mediawiki::deployment::server: rationalize puppetization. [puppet] - https://gerrit.wikimedia.org/r/508579 [11:21:45] (CR) Thiemo Kreuz (WMDE): [C: +1] dologmsg: fix variable [puppet] - https://gerrit.wikimedia.org/r/511750 (owner: Lucas Werkmeister (WMDE)) [11:22:20] (CR) Ladsgroup: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/503342 (https://phabricator.wikimedia.org/T220609) (owner: Lucas Werkmeister (WMDE)) [11:22:31] (CR) jerkins-bot: [V: -1] Remove constraint-suggestions beta feature [mediawiki-config] - https://gerrit.wikimedia.org/r/503342 (https://phabricator.wikimedia.org/T220609) (owner: Lucas Werkmeister (WMDE)) [11:25:41] (PS3) Giuseppe Lavagetto: role::deployment_server: fold in the base class [puppet] - https://gerrit.wikimedia.org/r/508580 [11:25:47] (PS3) Lucas Werkmeister (WMDE): Remove constraint-suggestions beta feature [mediawiki-config] - https://gerrit.wikimedia.org/r/503342 (https://phabricator.wikimedia.org/T220609) [11:26:08] (CR) Lucas Werkmeister (WMDE): "Rebased (conflicted with removal of PHP7 beta feature)." 
[mediawiki-config] - https://gerrit.wikimedia.org/r/503342 (https://phabricator.wikimedia.org/T220609) (owner: Lucas Werkmeister (WMDE)) [11:26:19] (CR) Ladsgroup: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/503342 (https://phabricator.wikimedia.org/T220609) (owner: Lucas Werkmeister (WMDE)) [11:27:19] (Merged) jenkins-bot: Remove constraint-suggestions beta feature [mediawiki-config] - https://gerrit.wikimedia.org/r/503342 (https://phabricator.wikimedia.org/T220609) (owner: Lucas Werkmeister (WMDE)) [11:27:33] (CR) jenkins-bot: Remove constraint-suggestions beta feature [mediawiki-config] - https://gerrit.wikimedia.org/r/503342 (https://phabricator.wikimedia.org/T220609) (owner: Lucas Werkmeister (WMDE)) [11:30:20] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:503342|Remove constraint-suggestions beta feature (T220609)]] (duration: 00m 57s) [11:30:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:27] T220609: Enable constraints suggestions for everyone and remove beta feature - https://phabricator.wikimedia.org/T220609 [11:30:48] !log EU SWAT is done [11:30:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:46] thanks, Amir1 ! [11:35:02] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/16695/deploy1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/508580 (owner: Giuseppe Lavagetto) [11:35:56] (CR) Jbond: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/511769 (owner: Jbond) [12:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190522T1200) [12:03:07] (CR) Jbond: "One comment currently wikimedia.modular.im. dose not exist ideally we should ensure that exists and isn't a cname before configuring this." 
[dns] - https://gerrit.wikimedia.org/r/511842 (https://phabricator.wikimedia.org/T223835) (owner: Volans) [12:04:38] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/511689 (owner: Muehlenhoff) [12:09:57] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [12:11:55] PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [12:13:40] (PS3) Muehlenhoff: Make the openldap class compatible with Cloud VPS [puppet] - https://gerrit.wikimedia.org/r/511689 [12:13:41] PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [12:14:15] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [12:18:42] (CR) Muehlenhoff: [C: +2] Make the openldap class compatible with Cloud VPS (1 comment) [puppet] - https://gerrit.wikimedia.org/r/511689 (owner: Muehlenhoff) [12:19:33] (CR) Gehel: [C: -1] "PCC in error: https://puppet-compiler.wmflabs.org/compiler1001/16697/" [puppet] - https://gerrit.wikimedia.org/r/511838 (owner: Mathew.onipe) [12:20:25] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [12:20:47] RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [12:21:03] PROBLEM - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is CRITICAL: connect to address 10.64.0.232 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886 [12:22:07] PROBLEM - cassandra-c SSL 10.64.0.232:7001 on restbase1007 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://phabricator.wikimedia.org/T120662 [12:23:48] (CR) Gehel: [C: -1] "Note that there is similar duplication between the different logstash/elasticsearch.yaml files." 
(1 comment) [puppet] - https://gerrit.wikimedia.org/r/511838 (owner: Mathew.onipe) [12:24:18] (PS3) Jbond: debdeploy client: update defaults [puppet] - https://gerrit.wikimedia.org/r/511760 [12:25:01] (CR) Jbond: [C: +2] debdeploy client: update defaults [puppet] - https://gerrit.wikimedia.org/r/511760 (owner: Jbond) [12:26:56] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/511860 (https://phabricator.wikimedia.org/T224017) (owner: Marostegui) [12:28:01] (Merged) jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/511860 (https://phabricator.wikimedia.org/T224017) (owner: Marostegui) [12:28:15] (CR) jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/511860 (https://phabricator.wikimedia.org/T224017) (owner: Marostegui) [12:29:13] PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [12:29:27] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1080 to rebuild revision table T224017 (duration: 00m 55s) [12:29:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:29:33] T224017: Slow query ApiQueryRevisions on enwiki - https://phabricator.wikimedia.org/T224017 [12:29:47] PROBLEM - Upload HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5 [12:30:45] (PS2) Jbond: hiera: update search order [puppet] - https://gerrit.wikimedia.org/r/511686 [12:34:01] !log Stop replication on 
db1080 to rebuild revision table - T224017 [12:34:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:35:31] !log cp3035: restarting varnish backend [12:35:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:36:22] (PS1) Tulsi Bhagat: Configuring $wgMetaNamespace for for ur.wiktionary, ur.wikibooks and ur.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/511868 [12:37:33] (CR) Jbond: [C: +2] flake8 - misc: add py extension so CI can run [puppet] - https://gerrit.wikimedia.org/r/510465 (https://phabricator.wikimedia.org/T144169) (owner: Jbond) [12:37:43] (PS3) Jbond: flake8 - misc: add py extension so CI can run [puppet] - https://gerrit.wikimedia.org/r/510465 (https://phabricator.wikimedia.org/T144169) [12:39:17] !log Stop replication on db2048 (s1 codfw master) to rebuild revision table - this will generate lag on codfw - T224017 [12:39:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:22] T224017: Slow query ApiQueryRevisions on enwiki - https://phabricator.wikimedia.org/T224017 [12:42:39] (PS1) Vgutierrez: ATS: Provide parent proxies support [puppet] - https://gerrit.wikimedia.org/r/511869 (https://phabricator.wikimedia.org/T221594) [12:43:05] RECOVERY - Check Varnish expiry mailbox lag on cp3035 is OK: OK: expiry mailbox lag is 0 https://wikitech.wikimedia.org/wiki/Varnish [12:43:06] (PS2) Tulsi Bhagat: Configuring $wgMetaNamespace for ur.wiktionary, ur.wikibooks and ur.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/511868 (https://phabricator.wikimedia.org/T223964) [12:43:25] (PS2) Vgutierrez: ATS: Provide parent proxies support [puppet] - https://gerrit.wikimedia.org/r/511869 (https://phabricator.wikimedia.org/T221594) [12:43:57] (CR) jerkins-bot: [V: -1] ATS: Provide parent proxies support [puppet] - https://gerrit.wikimedia.org/r/511869 (https://phabricator.wikimedia.org/T221594) (owner: 
Vgutierrez) [12:45:05] (CR) Tulsi Bhagat: "Requires `namespaceDupes.php --wiki=urwikibooks --fix` after deployment." [mediawiki-config] - https://gerrit.wikimedia.org/r/511868 (https://phabricator.wikimedia.org/T223964) (owner: Tulsi Bhagat) [12:46:13] RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [12:46:45] RECOVERY - Upload HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5 [12:48:48] Operations, DNS, Matrix, Traffic, and 2 others: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (jbond) @Tgr while reviewing the change created by volans i noticed that currently `wikimedia.modular.im.` dose not exist. We should ensur... 
[12:49:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [12:51:06] (PS3) Vgutierrez: ATS: Provide parent proxies support [puppet] - https://gerrit.wikimedia.org/r/511869 (https://phabricator.wikimedia.org/T221594) [12:53:19] (PS2) Arturo Borrero Gonzalez: sudo: decouple sudo from sudo-ldap [puppet] - https://gerrit.wikimedia.org/r/508311 (https://phabricator.wikimedia.org/T221225) [12:54:05] (CR) jerkins-bot: [V: -1] sudo: decouple sudo from sudo-ldap [puppet] - https://gerrit.wikimedia.org/r/508311 (https://phabricator.wikimedia.org/T221225) (owner: Arturo Borrero Gonzalez) [12:54:53] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1004 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [13:00:04] hashar: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for MediaWiki train - European version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190522T1300). 
[13:02:31] will do train soon [13:03:28] (PS4) Vgutierrez: ATS: Provide parent proxies support [puppet] - https://gerrit.wikimedia.org/r/511869 (https://phabricator.wikimedia.org/T221594) [13:05:47] Operations, Cassandra, RESTBase, Core Platform Team (Security, stability, performance and scalability (TEC1)), and 3 others: Decommission restbase10(0[7-9]|1[0-5]) - https://phabricator.wikimedia.org/T223976 (Eevans) [13:05:49] wish me luck [13:06:51] !log decommissioning restbase1008-a -- T223976 [13:06:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:56] T223976: Decommission restbase10(0[7-9]|1[0-5]) - https://phabricator.wikimedia.org/T223976 [13:08:03] (PS1) Hashar: group1 wikis to 1.34.0-wmf.6 [mediawiki-config] - https://gerrit.wikimedia.org/r/511871 [13:08:04] (CR) Hashar: [C: +2] group1 wikis to 1.34.0-wmf.6 [mediawiki-config] - https://gerrit.wikimedia.org/r/511871 (owner: Hashar) [13:09:11] (Merged) jenkins-bot: group1 wikis to 1.34.0-wmf.6 [mediawiki-config] - https://gerrit.wikimedia.org/r/511871 (owner: Hashar) [13:09:25] (CR) jenkins-bot: group1 wikis to 1.34.0-wmf.6 [mediawiki-config] - https://gerrit.wikimedia.org/r/511871 (owner: Hashar) [13:09:28] SELECT MASTER_GTID_WAIT(...., 10) which triggers a slow_query cause we have set the threshold at 10 seconds I guess [13:11:01] (PS3) Arturo Borrero Gonzalez: sudo: decouple sudo from sudo-ldap [puppet] - https://gerrit.wikimedia.org/r/508311 (https://phabricator.wikimedia.org/T221225) [13:11:14] !log hashar@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.6 [13:12:09] !log hashar@deploy1001 Synchronized php: group1 wikis to 1.34.0-wmf.6 (duration: 00m 54s) [13:13:07] (CR) Vgutierrez: "Mostly a NOOP for existing nodes: https://puppet-compiler.wmflabs.org/compiler1002/16700/" [puppet] - https://gerrit.wikimedia.org/r/511869 (https://phabricator.wikimedia.org/T221594) (owner: 
Vgutierrez) [13:13:14] hashar@deploy1001: Failed to log message to wiki. Somebody should check the error logs. [13:13:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:13] (PS4) Arturo Borrero Gonzalez: sudo: decouple sudo from sudo-ldap [puppet] - https://gerrit.wikimedia.org/r/508311 (https://phabricator.wikimedia.org/T221225) [13:16:50] (CR) Eevans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/511846 (https://phabricator.wikimedia.org/T223976) (owner: Mobrovac) [13:21:45] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [13:22:23] grblblb [13:22:36] oh those must be the web request timing out [13:25:31] (CR) Filippo Giunchedi: [C: +1] "PCC https://puppet-compiler.wmflabs.org/compiler1002/16699/" [puppet] - https://gerrit.wikimedia.org/r/511846 (https://phabricator.wikimedia.org/T223976) (owner: Mobrovac) [13:26:01] Operations, Puppet, Cloud-Services, Patch-For-Review, cloud-services-team (Kanban): Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188 (Krenair) Looks like we regressed here while I was busy - logged onto the new puppetmasters to find puppet has be... 
[13:27:21] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1004 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [13:27:47] !log reedy@deploy1001 Synchronized php-1.34.0-wmf.6/extensions/Collection/templates/: T224092 (duration: 00m 58s) [13:27:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:57] T224092: [Collections] Call to a member function getLocalUrl() on a non-object (null) - https://phabricator.wikimedia.org/T224092 [13:28:30] (PS1) Alex Monk: openstack puppetmaster profiles: don't include clientpackages [puppet] - https://gerrit.wikimedia.org/r/511875 (https://phabricator.wikimedia.org/T171188) [13:33:12] (CR) Gehel: [C: -1] Add maps reboot cookbook (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/511819 (https://phabricator.wikimedia.org/T224072) (owner: Mathew.onipe) [13:33:29] (PS1) Alex Monk: openstack puppetmaster roles: duplicate for set of profiles to be used in labs [puppet] - https://gerrit.wikimedia.org/r/511877 (https://phabricator.wikimedia.org/T171188) [13:34:00] (PS47) Gehel: icinga: create and apply cirrus config check [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: Mathew.onipe) [13:35:35] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=dns5001.wikimedia.org [13:35:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:50] Operations, Data-Services, cloud-services-team (Kanban): labstore1006 spontaneous reboot - https://phabricator.wikimedia.org/T217473 (Maintenance_bot) [13:36:08] Operations, Traffic: ATS is currently adding its own server header - https://phabricator.wikimedia.org/T224119 (Vgutierrez) [13:36:25] Operations, Traffic: ATS is currently adding its own server header - https://phabricator.wikimedia.org/T224119 
(10Vgutierrez) p:05Triage→03Normal [13:36:38] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [13:36:38] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [13:36:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:37:17] (03PS1) 10BBlack: wikimedia.org gsuite-test verification [dns] - 10https://gerrit.wikimedia.org/r/511881 (https://phabricator.wikimedia.org/T223921) [13:37:29] Reedy: for Flagged revs .. I guess it is worth a rollback. Though I am not sure what is the impact of those fatals [13:38:06] (03CR) 10BBlack: [C: 03+2] wikimedia.org gsuite-test verification [dns] - 10https://gerrit.wikimedia.org/r/511881 (https://phabricator.wikimedia.org/T223921) (owner: 10BBlack) [13:41:51] 10Operations, 10DNS, 10Traffic, 10Patch-For-Review: GSuite Test Domain Verification - https://phabricator.wikimedia.org/T223921 (10BBlack) @HMarcus - The record is live, can you try the validation and let me know how it goes? Note there was already another google site verification token like this at the s... [13:42:20] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=dns5001.wikimedia.org [13:42:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:42:32] 10Operations, 10Release Pipeline, 10Release-Engineering-Team, 10serviceops, and 4 others: Kask integration testing with Cassandra via the Deployment Pipeline - https://phabricator.wikimedia.org/T224041 (10akosiaris) >>! In T224041#5201573, @thcipriani wrote: > It seems that the cassandra subchart already e... 
[13:42:39] (03PS1) 10Hashar: Rollback cawikinews to 1.34.0-wmf.5 due to FlaggedRevs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511882 (https://phabricator.wikimedia.org/T220731) [13:42:56] (03CR) 10Hashar: [C: 03+2] Rollback cawikinews to 1.34.0-wmf.5 due to FlaggedRevs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511882 (https://phabricator.wikimedia.org/T220731) (owner: 10Hashar) [13:43:42] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=dns5002.wikimedia.org [13:43:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:43:49] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [13:43:49] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [13:43:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:43:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:44:01] (03Merged) 10jenkins-bot: Rollback cawikinews to 1.34.0-wmf.5 due to FlaggedRevs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511882 (https://phabricator.wikimedia.org/T220731) (owner: 10Hashar) [13:44:15] (03CR) 10jenkins-bot: Rollback cawikinews to 1.34.0-wmf.5 due to FlaggedRevs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511882 (https://phabricator.wikimedia.org/T220731) (owner: 10Hashar) [13:45:32] !log hashar@deploy1001 rebuilt and synchronized wikiversions files: (no justification provided) [13:45:52] (03PS7) 10Vgutierrez: prometheus: Toggle SSL certificate verification for trafficserver-exporter [puppet] - 10https://gerrit.wikimedia.org/r/508327 (https://phabricator.wikimedia.org/T221217) [13:45:54] (03PS17) 10Vgutierrez: ATS: Provide a unified monitoring define [puppet] - 10https://gerrit.wikimedia.org/r/506986 (https://phabricator.wikimedia.org/T221217) [13:45:58] (03PS7) 10Vgutierrez: ATS: Provide a unified logs define [puppet] - 10https://gerrit.wikimedia.org/r/510641 (https://phabricator.wikimedia.org/T221217) [13:46:00] 
(03PS5) 10Vgutierrez: ATS: Provide parent proxies support [puppet] - 10https://gerrit.wikimedia.org/r/511869 (https://phabricator.wikimedia.org/T221594) [13:46:02] (03PS58) 10Vgutierrez: ATS: Provide a TLS terminator profile [puppet] - 10https://gerrit.wikimedia.org/r/506398 (https://phabricator.wikimedia.org/T221594) [13:46:36] hashar@deploy1001: Failed to log message to wiki. Somebody should check the error logs. [13:46:57] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=dns5002.wikimedia.org [13:47:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:11] (03CR) 10Gehel: [C: 04-1] "Test run on elastic1038 shows an issue:" [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: 10Mathew.onipe) [13:52:27] (03PS1) 10BBlack: cache: reimage cp3046 as upload_ats [puppet] - 10https://gerrit.wikimedia.org/r/511886 (https://phabricator.wikimedia.org/T222937) [13:52:44] (03PS6) 10Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - 10https://gerrit.wikimedia.org/r/511838 [13:57:51] !log rebooting swift frontends in codfw [13:57:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:58:12] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [13:58:15] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [13:58:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:58:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:58:26] !log depool cp3046 for reimage to ats-be - T222937 [13:58:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:58:31] T222937: Replace Varnish backends with ATS on cache upload nodes in esams - https://phabricator.wikimedia.org/T222937 [13:59:59] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Bump service-checker docker container image [docker-images/production-images] - 
10https://gerrit.wikimedia.org/r/510743 (https://phabricator.wikimedia.org/T220401) (owner: 10Alexandros Kosiaris) [14:00:16] (03CR) 10Vgutierrez: [C: 03+1] cache: reimage cp3046 as upload_ats [puppet] - 10https://gerrit.wikimedia.org/r/511886 (https://phabricator.wikimedia.org/T222937) (owner: 10BBlack) [14:00:59] (03CR) 10BBlack: [C: 03+2] cache: reimage cp3046 as upload_ats [puppet] - 10https://gerrit.wikimedia.org/r/511886 (https://phabricator.wikimedia.org/T222937) (owner: 10BBlack) [14:01:34] 10Operations, 10Traffic, 10Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in esams - https://phabricator.wikimedia.org/T222937 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by bblack on cumin1001.eqiad.wmnet for hosts: ` ['cp3046.esams.wmnet'] ` The log can be found i... [14:02:18] !log Stop MySQL on db2078 for upgrade [14:02:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:04:28] 10Operations, 10ops-codfw, 10Analytics, 10Patch-For-Review: rack/setup/install kafka-main200[1-5] - https://phabricator.wikimedia.org/T223493 (10Ottomata) > can you please provide me with the partman recipe for those systems raid10-gpt-srv-lvm-ext4.cfg would work, but it uses only disks. I think you sho... 
[14:06:30] 10Operations, 10Core Platform Team (MCR), 10Core Platform Team Backlog (Next), 10Multi-Content-Revisions (Reactive), and 2 others: Unable to move page (Special:MovePage&action=submit) Title does not belong to page ID X but actually belong to Y - https://phabricator.wikimedia.org/T221763 (10hashar) [14:08:14] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [14:08:14] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:08:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:06] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=dns4001.wikimedia.org [14:09:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:10:58] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=dns4002.wikimedia.org [14:11:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:34] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [14:11:35] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:11:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:16] !log start it, es wiki dumps (fr and de completed) to fill the new parsoid tables - T215956 [14:14:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:22] T215956: Consider stashing data-parsoid for VE - https://phabricator.wikimedia.org/T215956 [14:14:35] !log 1.34.0-wmf.6 deployed to group1 with the exception of cawikinews due to T224116 [14:14:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:40] T224116: operand type was used: expects array(s) or collection(s) in /srv/mediawiki/wmf-config/flaggedrevs.php on line 182 - https://phabricator.wikimedia.org/T224116 
[14:16:59] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=dns4002.wikimedia.org [14:17:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:30] (03PS1) 10Dbarratt: Enable Partial Blocks on more Wikipedias. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511889 (https://phabricator.wikimedia.org/T222218) [14:19:18] (03PS2) 10Mobrovac: RESTBase: Replace soon-to-be-removed nodes with new ones [puppet] - 10https://gerrit.wikimedia.org/r/511846 (https://phabricator.wikimedia.org/T223976) [14:19:29] (03PS1) 10Vgutierrez: ATS: Fix typo in ssl_multicert template [puppet] - 10https://gerrit.wikimedia.org/r/511890 (https://phabricator.wikimedia.org/T221594) [14:21:34] (03CR) 10Vgutierrez: [C: 03+2] "NOOP for existing ATS instances: https://puppet-compiler.wmflabs.org/compiler1001/16705/" [puppet] - 10https://gerrit.wikimedia.org/r/511890 (https://phabricator.wikimedia.org/T221594) (owner: 10Vgutierrez) [14:21:54] (03PS2) 10Vgutierrez: ATS: Fix typo in ssl_multicert template [puppet] - 10https://gerrit.wikimedia.org/r/511890 (https://phabricator.wikimedia.org/T221594) [14:22:20] (03CR) 10Mobrovac: [C: 03+1] "PCC - https://puppet-compiler.wmflabs.org/compiler1002/16704/" [puppet] - 10https://gerrit.wikimedia.org/r/511846 (https://phabricator.wikimedia.org/T223976) (owner: 10Mobrovac) [14:23:58] (03CR) 10Filippo Giunchedi: [C: 03+2] RESTBase: Replace soon-to-be-removed nodes with new ones [puppet] - 10https://gerrit.wikimedia.org/r/511846 (https://phabricator.wikimedia.org/T223976) (owner: 10Mobrovac) [14:24:02] 10Operations, 10SRE-Access-Requests, 10Release-Engineering-Team (Kanban), 10User-Urbanecm, 10User-greg: Requesting access to production for SWAT deploy for Urbanecm - https://phabricator.wikimedia.org/T192830 (10Volans) Pending approval from sponsor (@zeljkofilipin ) and deployment group owner (@greg ) [14:24:06] (03PS3) 10Filippo Giunchedi: RESTBase: Replace soon-to-be-removed nodes with new ones [puppet] - 
10https://gerrit.wikimedia.org/r/511846 (https://phabricator.wikimedia.org/T223976) (owner: 10Mobrovac) [14:35:17] (03PS7) 10Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - 10https://gerrit.wikimedia.org/r/511838 [14:35:32] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=maerlant.wikimedia.org [14:35:32] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [14:35:33] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:35:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:38:19] 10Operations, 10DNS, 10Traffic: GSuite Test Domain Verification - https://phabricator.wikimedia.org/T223921 (10Maintenance_bot) [14:39:44] (03PS1) 10Volans: admin: convert jfishback to shell user [puppet] - 10https://gerrit.wikimedia.org/r/511896 (https://phabricator.wikimedia.org/T222910) [14:39:48] (03PS1) 10Volans: admin: add jfishback to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/511897 (https://phabricator.wikimedia.org/T222910) [14:42:05] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=maerlant.wikimedia.org [14:42:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:43:23] 10Operations, 10SRE-Access-Requests, 10Security-Team, 10Patch-For-Review, and 2 others: Requesting access to deployment and analytics-privatedata-users for jfishback - https://phabricator.wikimedia.org/T222910 (10Volans) p:05Low→03Normal a:03greg In the meanwhile I've sent patches for the conversion... 
[14:44:23] (03PS2) 10Dbarratt: Enable Partial Blocks on more Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511889 (https://phabricator.wikimedia.org/T222218) [14:45:18] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T222788 (10Volans) @darthmon_wmde could you verify all works as expected? Feel free to resolve this task if there isn't any problem. [14:45:29] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=nescio.wikimedia.org [14:45:29] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [14:45:30] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:45:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:49:00] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=nescio.wikimedia.org [14:49:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:06] (03CR) 10Dmaza: [C: 03+1] Enable Partial Blocks on more Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511889 (https://phabricator.wikimedia.org/T222218) (owner: 10Dbarratt) [14:50:12] (03CR) 10Muehlenhoff: [C: 03+1] admin: convert jfishback to shell user [puppet] - 10https://gerrit.wikimedia.org/r/511896 (https://phabricator.wikimedia.org/T222910) (owner: 10Volans) [14:50:15] (03PS1) 10Andrew Bogott: clouddb2001-dev: adjust firewall rules to allow access to both cloudcontrols. [puppet] - 10https://gerrit.wikimedia.org/r/511901 (https://phabricator.wikimedia.org/T223905) [14:53:02] (03CR) 10Andrew Bogott: [C: 03+2] clouddb2001-dev: adjust firewall rules to allow access to both cloudcontrols. 
[puppet] - 10https://gerrit.wikimedia.org/r/511901 (https://phabricator.wikimedia.org/T223905) (owner: 10Andrew Bogott) [14:53:03] 10Operations, 10Wikimedia-Logstash, 10netops, 10User-herron: Migrate network device syslogs to Kafka logging pipeline - https://phabricator.wikimedia.org/T224128 (10herron) p:05Triage→03Normal [14:54:10] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org [14:54:11] (03PS5) 10Herron: rsyslog: add netdev_kafka_relay compatibility endpoint [puppet] - 10https://gerrit.wikimedia.org/r/495980 (https://phabricator.wikimedia.org/T224128) [14:54:11] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [14:54:11] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:54:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:28] (03PS6) 10Herron: rsyslog: add netdev_kafka_relay compatibility endpoint [puppet] - 10https://gerrit.wikimedia.org/r/495980 (https://phabricator.wikimedia.org/T224128) [14:57:16] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org [14:57:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:52] (03CR) 10Tchanders: [C: 04-1] "Could you add Hungarian to the commit message?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/511889 (https://phabricator.wikimedia.org/T222218) (owner: 10Dbarratt) [14:58:33] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org [14:58:34] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [14:58:34] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:58:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:58:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:58:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:59:24] (03PS3) 10Dbarratt: Enable Partial Blocks on more Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511889 (https://phabricator.wikimedia.org/T222218) [15:00:30] (03CR) 10Tchanders: [C: 03+1] Enable Partial Blocks on more Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511889 (https://phabricator.wikimedia.org/T222218) (owner: 10Dbarratt) [15:00:56] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org [15:01:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:04] (03PS8) 10Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - 10https://gerrit.wikimedia.org/r/511838 [15:04:09] 10Operations, 10serviceops, 10User-jijiki: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 (10Jdforrester-WMF) [15:04:31] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org [15:04:32] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [15:04:32] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [15:04:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:41] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:57] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [15:07:45] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org [15:07:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:08:27] (03CR) 10Cwhite: [C: 03+2] update node exporter metrics to 0.16+ names [puppet] - 10https://gerrit.wikimedia.org/r/510977 (https://phabricator.wikimedia.org/T219825) (owner: 10Cwhite) [15:08:37] (03PS2) 10Cwhite: update node exporter metrics to 0.16+ names [puppet] - 10https://gerrit.wikimedia.org/r/510977 (https://phabricator.wikimedia.org/T219825) [15:08:48] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org [15:08:49] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [15:08:49] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [15:08:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:08:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:08:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:43] (03PS9) 10Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - 10https://gerrit.wikimedia.org/r/511838 [15:11:13] (03CR) 10Gehel: [C: 04-1] "PCC still failing: https://puppet-compiler.wmflabs.org/compiler1002/16709/" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/511838 (owner: 10Mathew.onipe) [15:11:22] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=dns1002.wikimedia.org [15:11:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:00] 10Operations, 10ops-eqiad: Install new PDUs into b5-eqiad - https://phabricator.wikimedia.org/T223126 (10Maintenance_bot) [15:14:39] (03CR) 10Gehel: [C: 04-1] "PCC now 
work: https://puppet-compiler.wmflabs.org/compiler1001/16712/" [puppet] - 10https://gerrit.wikimedia.org/r/511838 (owner: 10Mathew.onipe) [15:17:18] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] openstack puppetmaster roles: duplicate for set of profiles to be used in labs [puppet] - 10https://gerrit.wikimedia.org/r/511877 (https://phabricator.wikimedia.org/T171188) (owner: 10Alex Monk) [15:19:31] 10Operations, 10ops-eqiad, 10DBA, 10Goal, 10User-Marostegui: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (10Maintenance_bot) [15:21:49] 10Operations, 10ops-eqiad, 10Traffic: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293 (10Maintenance_bot) [15:23:52] 10Operations, 10Icinga, 10observability: re-create script for manual paging - https://phabricator.wikimedia.org/T82937 (10Maintenance_bot) [15:24:22] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Remove deprecated eventgate-analytics chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/510564 (owner: 10Ottomata) [15:24:33] 10Operations, 10Cassandra, 10RESTBase, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), and 2 others: Decommission restbase10(0[7-9]|1[0-5]) - https://phabricator.wikimedia.org/T223976 (10Maintenance_bot) [15:24:49] 10Operations, 10ops-codfw, 10decommission: Decommission db2036 - https://phabricator.wikimedia.org/T223885 (10Maintenance_bot) [15:25:27] 10Operations, 10ops-codfw, 10ops-eqiad, 10DC-Ops, and 2 others: Triage and resolve all outstanding Netbox report errors - https://phabricator.wikimedia.org/T223450 (10Maintenance_bot) [15:26:44] (03PS10) 10Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - 10https://gerrit.wikimedia.org/r/511838 [15:27:01] (03CR) 10Mathew.onipe: logstash: cleanup duplication in logstash hiera (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/511838 (owner: 10Mathew.onipe) [15:27:22] 10Operations, 10Data-Services, 10cloud-services-team (Kanban): 
labstore1006 spontaneous reboot - https://phabricator.wikimedia.org/T217473 (10Bstorm) 05Open→03Resolved This seems ok for now following the firmware upgrades. I'm going to close it. [15:28:34] 10Operations, 10Puppet, 10Icinga, 10observability: Puppet failing without Icinga alert in case of dependency cycle - https://phabricator.wikimedia.org/T221784 (10Maintenance_bot) [15:31:09] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:31:14] (03CR) 10Filippo Giunchedi: logstash: cleanup duplication in logstash hiera (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/511838 (owner: 10Mathew.onipe) [15:32:08] 10Operations, 10Acme-chief, 10Traffic, 10HTTPS: acme-chief: Validate that configured certificates can be actually issued - https://phabricator.wikimedia.org/T220518 (10Maintenance_bot) [15:32:26] 10Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic, 10User-Ladsgroup: Make UrlShortener 404s cacheable - https://phabricator.wikimedia.org/T220190 (10Maintenance_bot) [15:33:01] 10Operations, 10Data-Services, 10decommission, 10cloud-services-team (Kanban): Decommission labsdb1006.eqiad.wmnet and labsdb1007.eqiad.wmnet - https://phabricator.wikimedia.org/T220144 (10Maintenance_bot) [15:33:14] 10Operations, 10Wikimedia-Logstash, 10Goal, 10User-fgiunchedi: TEC6: Logging infrastructure (Q4 2018/19 goal) - https://phabricator.wikimedia.org/T220103 (10Maintenance_bot) [15:35:13] 10Operations, 10ops-eqiad, 10Operations-Software-Development, 10observability: ms-be1043 sdk failed - https://phabricator.wikimedia.org/T218544 (10Maintenance_bot) [15:36:07] 10Operations, 10Analytics, 10Analytics-Kanban, 10vm-requests, 10User-Elukey: Create an-tool1005 (Staging environment for Superset) - https://phabricator.wikimedia.org/T217738 (10Maintenance_bot) [15:40:13] 10Operations, 10Maps, 10Reading-Infrastructure-Team-Backlog: Fix node vs nodejs dependency issue - https://phabricator.wikimedia.org/T214153 
(10Maintenance_bot) [15:40:16] (03PS48) 10Mathew.onipe: icinga: create and apply cirrus config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [15:43:08] (03PS11) 10Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - 10https://gerrit.wikimedia.org/r/511838 [15:43:34] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decommission oxygen.eqiad.wmnet - https://phabricator.wikimedia.org/T211826 (10Maintenance_bot) [15:45:17] (03CR) 10Gehel: logstash: cleanup duplication in logstash hiera (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/511838 (owner: 10Mathew.onipe) [15:45:19] (03CR) 10Mathew.onipe: "PCC is Ok now: https://puppet-compiler.wmflabs.org/compiler1002/16713/" [puppet] - 10https://gerrit.wikimedia.org/r/511838 (owner: 10Mathew.onipe) [15:45:32] 10Operations, 10serviceops, 10Core Platform Team Backlog (Later), 10Services (next): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (10Maintenance_bot) [15:45:51] 10Operations, 10ORES, 10Release Pipeline, 10Scoring-platform-team, 10Release-Engineering-Team (Backlog): Execution of the deployment pipeline should be configurable via .pipeline/config.yaml - https://phabricator.wikimedia.org/T210267 (10Maintenance_bot) [15:46:20] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: Remove labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T209642 (10Maintenance_bot) [15:46:52] 10Operations, 10Traffic: Renew Digicert Unified in 2019 - https://phabricator.wikimedia.org/T209515 (10Maintenance_bot) [15:47:25] (03CR) 10Jbond: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/511701 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [15:47:57] 10Operations, 10ops-ulsfo: ulsfo: setup ulsfo PDUs - https://phabricator.wikimedia.org/T209101 (10Maintenance_bot) [15:48:17] 10Operations, 10Release Pipeline, 10Release-Engineering-Team (Backlog): Design pipeline image 
versioning scheme - https://phabricator.wikimedia.org/T209088 (10Maintenance_bot) [15:48:34] 10Operations, 10MediaWiki-Cache, 10serviceops, 10Performance-Team (Radar), and 2 others: Apply -R 200 to all the memcached mw object cache instances running in eqiad/codfw - https://phabricator.wikimedia.org/T208844 (10Maintenance_bot) [15:49:21] 10Operations, 10ops-eqiad, 10Traffic, 10decommission: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (10Maintenance_bot) [15:49:39] 10Operations, 10Traffic, 10Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (10Maintenance_bot) [15:50:00] 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Move labs-recursors in WMCS - https://phabricator.wikimedia.org/T207533 (10Maintenance_bot) [15:50:16] 10Operations, 10Release Pipeline, 10Core Platform Team Backlog (Watching / External), 10Release-Engineering-Team (Watching / External), 10Services (watching): Revisit the logging work done on Q1 2017-2018 for the standard pod setup - https://phabricator.wikimedia.org/T207200 (10Maintenance_bot) [15:50:28] (03PS5) 10Arturo Borrero Gonzalez: sudo: decouple sudo from sudo-ldap [puppet] - 10https://gerrit.wikimedia.org/r/508311 (https://phabricator.wikimedia.org/T221225) [15:52:10] 10Operations, 10Operations-Software-Development, 10Goal: Expand Spicerack library and SRE Cookbooks - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205867 (10Maintenance_bot) [15:52:24] 10Operations, 10observability, 10User-fgiunchedi: Expand modern metrics infrastructure coverage (2018-19 Q2 goal) - https://phabricator.wikimedia.org/T205862 (10Maintenance_bot) [15:56:11] (03CR) 10Bstorm: "Setting this to review (do not merge) to prepare for scheduled migration." 
[puppet] - 10https://gerrit.wikimedia.org/r/509469 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [15:56:22] (03PS2) 10Bstorm: cloudstore: switch scratch mounts from labstore1003 to cloudstore1008 [puppet] - 10https://gerrit.wikimedia.org/r/509469 (https://phabricator.wikimedia.org/T209527) [15:56:33] (03CR) 10BryanDavis: "Seems fine to me. Originally these placeholder values were only in hieradata/role/common/striker/web.yaml (I855f9484f799a6847590b5d1196abf" [puppet] - 10https://gerrit.wikimedia.org/r/511769 (owner: 10Jbond) [15:58:20] (03CR) 10Andrew Bogott: [C: 03+1] wmcs openstack: remove redundant hiera config [puppet] - 10https://gerrit.wikimedia.org/r/511769 (owner: 10Jbond) [16:00:04] MaxSem, RoanKattouw, and Niharika: It is that lovely time of the day again! You are hereby commanded to deploy Morning SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190522T1600). [16:00:04] davidwbarratt: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [16:00:14] here! [16:01:44] onimisionipe: it seems commit 842dd549e14af7320640ce3d7e8c1ca32ee73aaf introduced issues in Toolforge's elastic nodes [16:01:56] (03CR) 10Andrew Bogott: [C: 03+1] "The package conflict is a pain in the neck but I think this is an OK solution. Maybe we can make a note someplace about removing the comp" [puppet] - 10https://gerrit.wikimedia.org/r/511877 (https://phabricator.wikimedia.org/T171188) (owner: 10Alex Monk) [16:02:23] onimisionipe: [16:02:25] https://www.irccloud.com/pastebin/AUKUDDDa/ [16:02:45] not again :/ [16:03:15] is just a parameter in the class :-) the fix should be easy I think [16:03:24] yea.. :) [16:03:39] do you want a phabricator task? [16:04:08] arturo: Oh that's on wmflabs? 
I thought you meant cloudelastic100* and I was pretty sure I tested those [16:04:18] I have a task already but not linked to the patch. I should probably link it [16:04:28] arturo: sorry for the pain, forgot to check those [16:04:46] yup, nodes like `tools-elastic-02.tools.eqiad.wmflabs`, cloudvps VMs [16:05:11] it's OK folks, I appreciate your hard work, this is not a big deal [16:05:56] (03CR) 10Arturo Borrero Gonzalez: "PCC for Toolforge: https://puppet-compiler.wmflabs.org/compiler1001/16715/" [puppet] - 10https://gerrit.wikimedia.org/r/508311 (https://phabricator.wikimedia.org/T221225) (owner: 10Arturo Borrero Gonzalez) [16:06:58] gehel onimisionipe for the record, you can try PCC for Toolforge using this regexp in the `Nodes` field in the form: `re:.*\.tools\.eqiad\.wmflabs` [16:07:10] example: https://integration.wikimedia.org/ci/view/operations/job/operations-puppet-catalog-compiler/16715/console [16:07:28] arturo: Thanks! [16:08:23] MaxSem, RoanKattouw, and Niharika is anyone swatting? 
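(Note on the `re:.*\.tools\.eqiad\.wmflabs` selector quoted at 16:06:58: a minimal sketch of which hostnames such a pattern would pick, assuming the `re:` prefix only marks the value as a regular expression and that the pattern must match the whole hostname. How PCC actually applies the pattern is not shown in the log, and the second hostname below is a hypothetical example.)

```python
import re

# Pattern quoted in the log, with the "re:" prefix removed.
# Assumption: the prefix is only a marker telling the form to treat the value as a regex.
TOOLFORGE_NODES = re.compile(r".*\.tools\.eqiad\.wmflabs")


def select_nodes(hostnames):
    """Return the hostnames that the pattern matches in full."""
    # Assumption: whole-hostname matching; the real tool may search instead.
    return [name for name in hostnames if TOOLFORGE_NODES.fullmatch(name)]


hosts = [
    "tools-elastic-02.tools.eqiad.wmflabs",  # hostname mentioned in the log
    "cloudelastic1001.wikimedia.org",        # hypothetical non-Toolforge hostname
]

print(select_nodes(hosts))
```

The escaped dots keep the pattern from matching arbitrary characters between the labels, so only names ending in `.tools.eqiad.wmflabs` are selected.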
[16:08:56] hi [16:09:01] so FlaggedRevs is breaking the wikis [16:09:07] so gotta rollback the train :-\ [16:09:18] (03PS1) 10Anomie: wiki replicas: Improve index usage for queries against revision_userindex [puppet] - 10https://gerrit.wikimedia.org/r/511910 (https://phabricator.wikimedia.org/T221339) [16:10:09] (03PS12) 10Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - 10https://gerrit.wikimedia.org/r/511838 [16:10:50] (03PS13) 10Mathew.onipe: logstash: cleanup duplication in logstash hiera [puppet] - 10https://gerrit.wikimedia.org/r/511838 (https://phabricator.wikimedia.org/T224074) [16:11:36] (03PS1) 10Hashar: Revert "group1 wikis to 1.34.0-wmf.6" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511911 (https://phabricator.wikimedia.org/T224116) [16:13:03] (03CR) 10Hashar: [C: 03+2] Revert "group1 wikis to 1.34.0-wmf.6" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511911 (https://phabricator.wikimedia.org/T224116) (owner: 10Hashar) [16:14:04] (03Merged) 10jenkins-bot: Revert "group1 wikis to 1.34.0-wmf.6" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511911 (https://phabricator.wikimedia.org/T224116) (owner: 10Hashar) [16:14:20] (03CR) 10jenkins-bot: Revert "group1 wikis to 1.34.0-wmf.6" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511911 (https://phabricator.wikimedia.org/T224116) (owner: 10Hashar) [16:15:14] PROBLEM - ps1-b5-eqiad-infeed-load-tower-A-phase-Z on ps1-b5-eqiad is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [16:15:16] PROBLEM - ps1-b5-eqiad-infeed-load-tower-B-phase-X on ps1-b5-eqiad is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [16:15:16] PROBLEM - ps1-b5-eqiad-infeed-load-tower-B-phase-Z on ps1-b5-eqiad is CRITICAL: CRITICAL - Plugin timed out while executing system call 
https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [16:15:16] PROBLEM - ps1-b5-eqiad-infeed-load-tower-A-phase-X on ps1-b5-eqiad is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [16:16:59] !log hashar@deploy1001 rebuilt and synchronized wikiversions files: Revert group1 to 1.34.0-wmf.5 T224116 T224124 # T220731 [16:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:12] T220731: 1.34.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T220731 [16:17:13] T224116: operand type was used: expects array(s) or collection(s) in /srv/mediawiki/wmf-config/flaggedrevs.php on line 182 - https://phabricator.wikimedia.org/T224116 [16:17:13] T224124: Special:ProblemChanges on several Wiktionary sites show raw message IDs instead of translated strings - https://phabricator.wikimedia.org/T224124 [16:18:09] 04Critical Alert for device ps1-b5-eqiad.mgmt.eqiad.wmnet - Device rebooted [16:18:53] XioNoX: was that expected ^ ? [16:21:08] jijiki: yeah, some of the eqiad's PDUs have been replaced last week, and their configuration fixed only today (so LibreNMS discovered it only now) [16:21:20] great :D [16:22:31] wikis rolled back [16:28:09] 04̶C̶r̶i̶t̶i̶c̶a̶l Device ps1-b5-eqiad.mgmt.eqiad.wmnet recovered from Device rebooted [16:31:33] (03CR) 10Arturo Borrero Gonzalez: "This should be ready for another round of reviews CC @andrew @moritz @jbond" [puppet] - 10https://gerrit.wikimedia.org/r/508311 (https://phabricator.wikimedia.org/T221225) (owner: 10Arturo Borrero Gonzalez) [16:44:53] (03PS3) 10Bstorm: cloudstore: switch scratch mounts from labstore1003 to cloudstore1008 [puppet] - 10https://gerrit.wikimedia.org/r/509469 (https://phabricator.wikimedia.org/T209527) [16:45:56] (03CR) 10Ppchelko: [EventBus] Add eventgate-main event service. 
(031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/510299 (https://phabricator.wikimedia.org/T222822) (owner: 10Ppchelko) [16:50:00] (03CR) 10Mathew.onipe: "PCC is now ok for labs: https://puppet-compiler.wmflabs.org/compiler1001/16718/" [puppet] - 10https://gerrit.wikimedia.org/r/511838 (https://phabricator.wikimedia.org/T224074) (owner: 10Mathew.onipe) [16:52:39] 10Operations, 10ops-codfw, 10Analytics, 10Patch-For-Review: rack/setup/install kafka-main200[1-5] - https://phabricator.wikimedia.org/T223493 (10Papaul) @Ottomata thanks for the updates. Can you please make the necessaries modifications needed on raid10-gpt-srv-lvm-ext4.cfg . [16:59:59] (03CR) 10Gehel: logstash: cleanup duplication in logstash hiera (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/511838 (https://phabricator.wikimedia.org/T224074) (owner: 10Mathew.onipe) [17:06:06] !log decommissioning restbase1008-b -- T223976 [17:06:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:06:13] T223976: Decommission restbase10(0[7-9]|1[0-5]) - https://phabricator.wikimedia.org/T223976 [17:10:55] 10Operations, 10ops-codfw, 10Analytics, 10Patch-For-Review: rack/setup/install kafka-main200[1-5] - https://phabricator.wikimedia.org/T223493 (10RobH) Please note these are showing as an error state of staged in netbox, when they are not yet installed with an OS. I have changed all of the kafka-main200[1-... [17:11:48] 10Operations, 10Cloud-Services, 10cloud-services-team (Kanban): rack/setup/install labvirt10(19|20).eqiad.wmnet - https://phabricator.wikimedia.org/T172538 (10Bstorm) [17:12:11] 10Operations, 10ops-codfw, 10Analytics, 10Patch-For-Review: rack/setup/install kafka-main200[1-5] - https://phabricator.wikimedia.org/T223493 (10Ottomata) > Can you please make the necessaries modifications needed on raid10-gpt-srv-lvm-ext4.cfg . Ping @herron on this one, SRE is handling these. 
:) [17:16:31] (03PS14) 10Gehel: logstash: cleanup duplication in logstash hiera [puppet] - 10https://gerrit.wikimedia.org/r/511838 (https://phabricator.wikimedia.org/T224074) (owner: 10Mathew.onipe) [17:16:44] 10Operations, 10DNS, 10Traffic: GSuite Test Domain Verification - https://phabricator.wikimedia.org/T223921 (10HMarcus) @BBlack , receiving the following error from Google when attempting to verify: {F29207898} Would you mind clarifying the context of the second token? Are you saying the same domain name... [17:17:52] (03CR) 10Gehel: [C: 03+2] logstash: cleanup duplication in logstash hiera [puppet] - 10https://gerrit.wikimedia.org/r/511838 (https://phabricator.wikimedia.org/T224074) (owner: 10Mathew.onipe) [17:21:24] (03PS2) 10Volans: admin: convert jfishback to shell user [puppet] - 10https://gerrit.wikimedia.org/r/511896 (https://phabricator.wikimedia.org/T222910) [17:22:21] (03CR) 10Volans: [C: 03+2] admin: convert jfishback to shell user [puppet] - 10https://gerrit.wikimedia.org/r/511896 (https://phabricator.wikimedia.org/T222910) (owner: 10Volans) [17:25:46] 10Operations, 10ops-codfw, 10Analytics, 10Patch-For-Review: rack/setup/install kafka-main200[1-5] - https://phabricator.wikimedia.org/T223493 (10RobH) [17:31:46] 10Operations, 10DNS, 10Traffic: GSuite Test Domain Verification - https://phabricator.wikimedia.org/T223921 (10BBlack) The context of the second token is that all of our canonical wiki domains, including `wikimedia.org`, already have persistent Google Site Verification TXT tokens so that we can manage Google... 
[17:33:28] (03PS1) 10BBlack: Move gsuite-test token to the actual hostname [dns] - 10https://gerrit.wikimedia.org/r/511923 (https://phabricator.wikimedia.org/T223921) [17:34:04] (03CR) 10BBlack: [C: 03+2] Move gsuite-test token to the actual hostname [dns] - 10https://gerrit.wikimedia.org/r/511923 (https://phabricator.wikimedia.org/T223921) (owner: 10BBlack) [17:35:36] 10Operations, 10DNS, 10Traffic, 10Patch-For-Review: GSuite Test Domain Verification - https://phabricator.wikimedia.org/T223921 (10BBlack) The above is deployed. I'd wait a full 10 minutes from the time of this comment to re-test, in case they've negative-cached the previous lookup, then try again and let... [17:36:31] 10Operations, 10SRE-Access-Requests, 10Security-Team, 10Patch-For-Review, and 2 others: Requesting access to deployment and analytics-privatedata-users for jfishback - https://phabricator.wikimedia.org/T222910 (10Volans) I've verified with @JFishback_WMF that basic access works as expected. [17:38:23] !log repool cp3046 as esams cache_upload ats-be node - T222937 [17:38:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:38:28] T222937: Replace Varnish backends with ATS on cache upload nodes in esams - https://phabricator.wikimedia.org/T222937 [17:39:59] (03PS7) 10Dzahn: switch phabricator from phab1001 to phab1003 [puppet] - 10https://gerrit.wikimedia.org/r/437620 (https://phabricator.wikimedia.org/T196019) [17:42:41] 10Operations, 10Traffic, 10Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (10BBlack) 05Open→03Resolved a:03BBlack Scheme has been stable for ~1w now and seems to be working out fine. The net reduction in... 
[17:43:11] 10Operations, 10DNS, 10Traffic, 10Patch-For-Review: GSuite Test Domain Verification - https://phabricator.wikimedia.org/T223921 (10Dzahn) [17:51:18] 10Operations, 10Traffic: Lower geodns TTLs from 600 (10min) to 300 (5min) - https://phabricator.wikimedia.org/T140365 (10BBlack) So we've reduced query volume by ~32% in T208263 . Since the last significant updates here, we've also deployed newer versions of our authdns software which perform even better, and... [17:51:21] (03PS1) 10Dzahn: phabricator: Mediawiki -> Wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/511926 [17:53:53] (03PS1) 10Dzahn: phabricator: activate read-only mode for maintenance [puppet] - 10https://gerrit.wikimedia.org/r/511929 (https://phabricator.wikimedia.org/T196019) [17:55:10] (03PS2) 10Dzahn: phabricator: activate read-only mode for maintenance [puppet] - 10https://gerrit.wikimedia.org/r/511929 (https://phabricator.wikimedia.org/T196019) [17:59:57] (03PS1) 10Legoktm: flaggedrevs: Declare cswikinews permissions in the standard way [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511932 [18:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190522T1800) [18:03:54] (03CR) 10Paladox: [C: 03+1] phabricator: activate read-only mode for maintenance [puppet] - 10https://gerrit.wikimedia.org/r/511929 (https://phabricator.wikimedia.org/T196019) (owner: 10Dzahn) [18:03:55] (03PS8) 10Dzahn: switch phabricator from phab1001 to phab1003 [puppet] - 10https://gerrit.wikimedia.org/r/437620 (https://phabricator.wikimedia.org/T196019) [18:05:49] (03PS2) 10Dzahn: phabricator: Mediawiki -> Wikimedia and fix a typo [puppet] - 10https://gerrit.wikimedia.org/r/511926 [18:06:01] (03CR) 10Paladox: [C: 03+1] phabricator: Mediawiki -> Wikimedia and fix a typo [puppet] - 10https://gerrit.wikimedia.org/r/511926 (owner: 10Dzahn) [18:08:12] I'm going to deploy a back-port to wmf.6 now. 
[18:09:39] 10Operations, 10ops-codfw, 10Analytics, 10Patch-For-Review: rack/setup/install kafka-main200[1-5] - https://phabricator.wikimedia.org/T223493 (10Papaul) [18:10:27] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install kafka-main200[1-5] - https://phabricator.wikimedia.org/T223493 (10Ottomata) [18:11:49] MaxSem ping [18:17:27] !log jforrester@deploy1001 Synchronized php-1.34.0-wmf.6/extensions/UrlShortener/modules/ext.urlShortener.special.js: Fix i18n/command mix-up Ic99cf063a (duration: 01m 00s) [18:17:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:42] (03CR) 10Mathew.onipe: "script looks good:" [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: 10Mathew.onipe) [18:35:07] !log jforrester@deploy1001 Synchronized php-1.34.0-wmf.6/extensions/FlaggedRevs: Hot-deploy reverting FlaggedRevs config for T224116 T224124 (duration: 00m 58s) [18:35:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:35:14] T224116: operand type was used: expects array(s) or collection(s) in /srv/mediawiki/wmf-config/flaggedrevs.php on line 182 - https://phabricator.wikimedia.org/T224116 [18:35:14] T224124: Special:ProblemChanges on several Wiktionary sites show raw message IDs instead of translated strings - https://phabricator.wikimedia.org/T224124 [18:38:28] (03PS2) 10Dzahn: strongswan: add an Icinga notes_url [puppet] - 10https://gerrit.wikimedia.org/r/510962 [18:39:12] (03PS3) 10Dzahn: strongswan: add an Icinga notes_url [puppet] - 10https://gerrit.wikimedia.org/r/510962 [18:46:38] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/16722/cp1075.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/510962 (owner: 10Dzahn) [18:53:21] 10Operations, 10VisualEditor, 10Performance-Team (Radar), 10Software-Licensing: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 
(10Krinkle) Meanwhile, over on the OSI license-review mailing list ([March 2019 summary](http://lists.opensource.org/pipe... [18:53:36] !log jforrester@deploy1001 Started scap: Re-build i18n and re-scap everything for i18n issues for T224116 T224124 T220731 [18:53:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:53:46] T220731: 1.34.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T220731 [18:53:49] T224116: operand type was used: expects array(s) or collection(s) in /srv/mediawiki/wmf-config/flaggedrevs.php on line 182 - https://phabricator.wikimedia.org/T224116 [18:53:49] T224124: Special:ProblemChanges on several Wiktionary sites show raw message IDs instead of translated strings - https://phabricator.wikimedia.org/T224124 [18:55:02] (03CR) 10Dzahn: "> Patch Set 2: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/511751 (owner: 10Ori.livneh) [18:56:04] 10Operations, 10VisualEditor, 10Performance-Team (Radar), 10Software-Licensing: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10Krinkle) From the XHGui side, a refactor has landed that decouples it from the MongoDB backend, adding PHP-PDO support... [18:56:57] 10Operations, 10VisualEditor, 10Performance-Team (Radar), 10Software-Licensing, 10User-Ryasmeen: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10Krinkle) 05Open→03Resolved a:03Krinkle Closing for now as migration from MongoDB isn't in sco... 
[19:00:04] Deploy window MediaWiki train - Americas version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190522T1900) [19:03:01] (03PS1) 10Dzahn: ipmi: add Icinga notes_url [puppet] - 10https://gerrit.wikimedia.org/r/511949 (https://phabricator.wikimedia.org/T197873) [19:03:08] (03PS1) 10Andrew Bogott: nova: make nova-conductor and nova-scheduler active/active [puppet] - 10https://gerrit.wikimedia.org/r/511950 (https://phabricator.wikimedia.org/T223905) [19:03:36] 10Operations, 10Analytics, 10Analytics-Kanban, 10Discovery, and 2 others: Make hadoop cluster able to push to swift - https://phabricator.wikimedia.org/T219544 (10Ottomata) Alright, I've written a bash wrapper to help out with this. I'd do it with just the swift CLI, but we need to be able to source some... [19:04:56] (03PS1) 10Herron: partman: add 8 disk raid10 layout [puppet] - 10https://gerrit.wikimedia.org/r/511952 (https://phabricator.wikimedia.org/T223493) [19:05:20] 10Operations, 10DNS, 10Traffic, 10Patch-For-Review: GSuite Test Domain Verification - https://phabricator.wikimedia.org/T223921 (10HMarcus) {F29208502} Thanks so much for your help Brandon, that did it. I will follow up with the Google team to see if any additional DNS records are needed. Would you prefe... [19:05:47] (03CR) 10Herron: [C: 03+2] partman: add 8 disk raid10 layout [puppet] - 10https://gerrit.wikimedia.org/r/511952 (https://phabricator.wikimedia.org/T223493) (owner: 10Herron) [19:05:52] (03PS1) 10Jforrester: Re-apply "group1 wikis to 1.34.0-wmf.6" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511953 (https://phabricator.wikimedia.org/T224116) [19:10:22] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install kafka-main200[1-5] - https://phabricator.wikimedia.org/T223493 (10herron) Hey @papaul, I added a raid10-gpt-srv-lvm-ext4-8disks.cfg for the initial installs on these. Once they are up and running I'll do a little benchmarking to try and see i... 
[19:15:16] (03PS1) 10Dzahn: phabricator: add forensic apache logging and enable on phab1003 [puppet] - 10https://gerrit.wikimedia.org/r/511955 [19:16:17] (03CR) 10jerkins-bot: [V: 04-1] phabricator: add forensic apache logging and enable on phab1003 [puppet] - 10https://gerrit.wikimedia.org/r/511955 (owner: 10Dzahn) [19:22:05] PROBLEM - High CPU load on API appserver on mw1281 is CRITICAL: CRITICAL - load average: 72.04, 37.95, 24.45 [19:22:07] (03PS2) 10Dzahn: phabricator: add forensic apache logging and enable on phab1003 [puppet] - 10https://gerrit.wikimedia.org/r/511955 [19:22:45] PROBLEM - High CPU load on API appserver on mw1229 is CRITICAL: CRITICAL - load average: 49.60, 24.68, 17.00 [19:22:47] (03CR) 10Dzahn: "i'm copying this for phabricator at https://gerrit.wikimedia.org/r/511955" [puppet] - 10https://gerrit.wikimedia.org/r/511751 (owner: 10Ori.livneh) [19:22:55] PROBLEM - High CPU load on API appserver on mw1222 is CRITICAL: CRITICAL - load average: 64.06, 30.75, 19.83 [19:23:45] PROBLEM - High CPU load on API appserver on mw1276 is CRITICAL: CRITICAL - load average: 72.31, 37.76, 25.47 [19:24:05] PROBLEM - High CPU load on API appserver on mw1289 is CRITICAL: CRITICAL - load average: 64.26, 33.32, 22.01 [19:24:11] is this ^^^ release-train related, or is it something else? [19:24:14] The CPU load might be me (scap-cdb-rebuild is running). [19:24:17] RECOVERY - High CPU load on API appserver on mw1281 is OK: OK - load average: 23.34, 31.39, 23.85 [19:24:22] ah okay, thanks James_F [19:24:40] It's >90% done. 
[19:25:07] PROBLEM - High CPU load on API appserver on mw1286 is CRITICAL: CRITICAL - load average: 61.29, 34.76, 22.24 [19:26:03] RECOVERY - High CPU load on API appserver on mw1229 is OK: OK - load average: 15.32, 23.06, 18.24 [19:26:06] 10Operations, 10ops-codfw, 10ops-eqiad, 10DC-Ops, and 2 others: Triage and resolve all outstanding Netbox report errors - https://phabricator.wikimedia.org/T223450 (10faidon) [19:26:13] RECOVERY - High CPU load on API appserver on mw1222 is OK: OK - load average: 15.00, 24.16, 19.52 [19:26:15] RECOVERY - High CPU load on API appserver on mw1286 is OK: OK - load average: 28.90, 30.50, 21.67 [19:26:30] (03CR) 10Dzahn: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/16725/" [puppet] - 10https://gerrit.wikimedia.org/r/511955 (owner: 10Dzahn) [19:26:31] !log jforrester@deploy1001 Finished scap: Re-build i18n and re-scap everything for i18n issues for T224116 T224124 T220731 (duration: 32m 55s) [19:26:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:39] T220731: 1.34.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T220731 [19:26:40] T224116: operand type was used: expects array(s) or collection(s) in /srv/mediawiki/wmf-config/flaggedrevs.php on line 182 - https://phabricator.wikimedia.org/T224116 [19:26:40] T224124: Special:ProblemChanges on several Wiktionary sites show raw message IDs instead of translated strings - https://phabricator.wikimedia.org/T224124 [19:27:05] RECOVERY - High CPU load on API appserver on mw1276 is OK: OK - load average: 19.32, 30.10, 25.08 [19:27:11] Recovery for all four. 
[19:27:25] RECOVERY - High CPU load on API appserver on mw1289 is OK: OK - load average: 18.78, 29.30, 23.02 [19:27:38] (03PS1) 10Effie Mouzeli: profile::memcached::instance Migrate to lookup() [puppet] - 10https://gerrit.wikimedia.org/r/511963 [19:28:15] (03CR) 10Paladox: [C: 03+1] phabricator: add forensic apache logging and enable on phab1003 [puppet] - 10https://gerrit.wikimedia.org/r/511955 (owner: 10Dzahn) [19:28:36] (03CR) 10jerkins-bot: [V: 04-1] profile::memcached::instance Migrate to lookup() [puppet] - 10https://gerrit.wikimedia.org/r/511963 (owner: 10Effie Mouzeli) [19:30:03] 10Operations, 10DNS, 10Traffic, 10Patch-For-Review: GSuite Test Domain Verification - https://phabricator.wikimedia.org/T223921 (10BBlack) Either is fine. I assume you won't be able to do anything else with this (e.g. make https://gsuite-test.wikimedia.org/ work) without some followup records added on our... [19:32:56] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install kafka-main200[1-5] - https://phabricator.wikimedia.org/T223493 (10Papaul) [19:37:53] (03PS3) 10Dzahn: phabricator: add forensic apache logging and enable on phab1003 [puppet] - 10https://gerrit.wikimedia.org/r/511955 [19:45:02] (03PS2) 10Effie Mouzeli: profile::memcached::instance Migrate to lookup() [puppet] - 10https://gerrit.wikimedia.org/r/511963 [19:45:21] (03PS1) 10Ayounsi: Format README, remove mention to oldhardware.py [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/511969 [19:50:11] (03CR) 10Effie Mouzeli: [V: 03+1] "NOOP https://puppet-compiler.wmflabs.org/compiler1002/16729/" [puppet] - 10https://gerrit.wikimedia.org/r/511963 (owner: 10Effie Mouzeli) [20:00:04] cscott, arlolra, subbu, bearND, and halfak: It is that lovely time of the day again! You are hereby commanded to deploy Services – Parsoid / Citoid / Mobileapps / ORES / …. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190522T2000). 
[20:02:22] (03PS1) 10Effie Mouzeli: profile::memcached::instance: Add -R 200 option [puppet] - 10https://gerrit.wikimedia.org/r/511973 (https://phabricator.wikimedia.org/T208844) [20:06:04] (03CR) 10Effie Mouzeli: [V: 03+1] "1 NOOP, 1 CHANGE, as expected. https://puppet-compiler.wmflabs.org/compiler1001/16730/" [puppet] - 10https://gerrit.wikimedia.org/r/511973 (https://phabricator.wikimedia.org/T208844) (owner: 10Effie Mouzeli) [20:06:25] (03CR) 10Jbond: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/511769 (owner: 10Jbond) [20:08:16] (03CR) 10Dzahn: [C: 03+1] profile::memcached::instance Migrate to lookup() [puppet] - 10https://gerrit.wikimedia.org/r/511963 (owner: 10Effie Mouzeli) [20:08:40] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [20:09:34] PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [20:09:38] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [20:10:44] PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [20:11:32] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] 
https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5 [20:11:41] (03PS3) 10Effie Mouzeli: profile::memcached::instance: Migrate to lookup() [puppet] - 10https://gerrit.wikimedia.org/r/511963 [20:12:28] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [20:12:52] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [20:13:22] (03CR) 10Dzahn: [C: 03+1] "looks like https://gerrit.wikimedia.org/r/c/operations/puppet/+/473669/1/hieradata/hosts/mc1019.yaml just now making it default" [puppet] - 10https://gerrit.wikimedia.org/r/511973 (https://phabricator.wikimedia.org/T208844) (owner: 10Effie Mouzeli) [20:17:08] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5 [20:17:46] RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5 [20:18:00] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [20:18:24] (03CR) 10Jbond: [C: 03+1] "LGTM <3" [puppet] - 10https://gerrit.wikimedia.org/r/511963 (owner: 
10Effie Mouzeli) [20:24:24] (03CR) 10Dzahn: "it's soon been 2 months and a month since the last comment that asked to hold it for a week. i wonder who is working on this and what the " [puppet] - 10https://gerrit.wikimedia.org/r/498429 (owner: 10Dzahn) [20:26:42] (03CR) 10Dzahn: "So wikitech registration has been opened again. What is the current status? Did that make this change invalid or not?" [puppet] - 10https://gerrit.wikimedia.org/r/498429 (owner: 10Dzahn) [20:35:33] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298 [20:35:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:38:14] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298 (duration: 02m 41s) [20:38:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:38:58] 10Operations, 10ops-eqiad, 10Gerrit, 10serviceops, 10Release-Engineering-Team (Watching / External): Gerrit Hardware Upgrade - https://phabricator.wikimedia.org/T222391 (10Dzahn) 05Open→03Stalled [20:39:11] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298, take 2 [20:39:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:43:08] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: /{domain}/v1/page/metadata/{title}/{revision}/{tid} (retrieve extended metadata for Video article on English Wikipedia) is CRITICAL: Test retrieve extended metadata for Video article on English Wikipedia returned the unexpected status 404 (expecting: 200): /{domain}/v1/page/references/{title}/{revision}/{tid} (Get references of a test page) timed out before a response [20:43:08] omain}/v1/page/definition/{title}/{revision}/{tid} (retrieve en-wiktionary definitions for cat) is CRITICAL: Test retrieve en-wiktionary definitions for cat returned the unexpected status 404 (expecting: 200): 
/{domain}/v1/page/summary/{title}/{revision}/{tid} (Get summary for test page) is CRITICAL: Test Get summary for test page returned the unexpected status 404 (expecting: 200): /{domain}/v1/page/mobile-html/{title}/{revision [20:43:08] content HTML for test page) is CRITICAL: Test Get page content HTML for test page returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [20:43:30] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298, take 2 (duration: 04m 19s) [20:43:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:43:34] ^ rolling back [20:44:30] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [20:45:53] mdholloway: looks like the rollback is not logged in SAL automatically. You might want to add a manual entry using `!log` [20:46:29] !log mobileapps rolled back deployment due to endpoint check failures [20:46:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:59:37] 10Operations, 10ops-eqiad, 10Cloud-Services, 10cloud-services-team (Kanban): rack/setup/install (3) new osd ceph nodes - https://phabricator.wikimedia.org/T224188 (10RobH) [21:03:57] 10Operations: build grafana package for stretch - https://phabricator.wikimedia.org/T210034 (10Dzahn) Just checked on this again and i notice that meanwhile somebody has done this. ` [install1002:~] $ sudo -i reprepro ls grafana grafana | 6.1.3 | jessie-wikimedia | amd64 grafana | 5.4.2 | stretch-wikimedia |... [21:04:20] PROBLEM - Check systemd state on ms-be2017 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[21:08:53] 10Operations, 10ops-eqiad, 10Cloud-Services, 10cloud-services-team (Kanban): rack/setup/install (3) new osd ceph nodes - https://phabricator.wikimedia.org/T224188 (10RobH) a:03Andrew @andrew or @bstorm: Since you both were commenting on the hardware specification task, I'm assuming you would also be the... [21:24:04] RECOVERY - Check systemd state on ms-be2017 is OK: OK - running: The system is fully operational [21:27:03] 10Operations, 10ops-eqiad, 10Cloud-Services, 10cloud-services-team (Kanban): rack/setup/install (3) new osd ceph nodes - https://phabricator.wikimedia.org/T224188 (10Bstorm) - I vote to add cloudosd1xxx to the naming conventions unless my team rebels against that. The related monitor nodes would end up... [21:36:28] 10Operations, 10Operations-Software-Development, 10Patch-For-Review, 10cloud-services-team (Kanban): Flake8 for python files without extension in puppet repo - https://phabricator.wikimedia.org/T144169 (10hashar) @jbond those OpenStack `.original` files, I am assuming they are copied verbatim from the upst... [21:41:19] !log decommissioning restbase1008-c -- T223976 [21:41:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:41:26] T223976: Decommission restbase10(0[7-9]|1[0-5]) - https://phabricator.wikimedia.org/T223976 [21:44:21] * Krinkle staging on mwdebug1002 [21:45:46] 10Operations, 10DNS, 10Matrix, 10Traffic, and 2 others: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (10Tgr) Yeah, it shouldn't be merged before the server is up (which is in a few days if all goes well). The Matrix server (Synapse) will be a... 
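(Editor's note on the T223976 title logged at 21:41:26.) The host range `restbase10(0[7-9]|1[0-5])` can be expanded mechanically. This is a sketch only: the helper names are hypothetical, and reading `restbase1008-c` as instance `c` on host `restbase1008` is inferred from the `!log` entries rather than stated in them.

```python
import re

# Host range exactly as written in the task title.
HOST_PATTERN = re.compile(r"restbase10(0[7-9]|1[0-5])")


def expand(prefix: str = "restbase", low: int = 1000, high: int = 1099) -> list:
    """List every candidate host name in [low, high] that the pattern covers."""
    names = (f"{prefix}{n}" for n in range(low, high + 1))
    return [name for name in names if HOST_PATTERN.fullmatch(name)]


def host_of(instance: str) -> str:
    """Strip an instance suffix such as "-c" (assumed naming convention)."""
    return instance.split("-", 1)[0]


hosts = expand()
print(hosts)
print(host_of("restbase1008-c") in hosts)
```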
[21:47:49] !log krinkle@deploy1001 Synchronized php-1.34.0-wmf.6/includes/resourceloader/MessageBlobStore.php: T222539 / 3cb01cc73ce9 (duration: 00m 56s)
[21:47:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:47:55] T222539: Scap deployments are not purging MessageBlobStore (was: Stale localized messages) - https://phabricator.wikimedia.org/T222539
[21:48:02] (CR) Ori.livneh: "Instead of duplicating it, perhaps I should move this code to the httpd module? It already handles basic logging setup :" [puppet] - https://gerrit.wikimedia.org/r/511751 (owner: Ori.livneh)
[21:51:06] !log krinkle@deploy1001 Synchronized php-1.34.0-wmf.5/includes/resourceloader/MessageBlobStore.php: T222539 / 734b3d84f7 (duration: 00m 56s)
[21:51:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:03:50] Operations, serviceops: upgrade krypton (webserver_misc_apps) to stretch - https://phabricator.wikimedia.org/T210008 (Dzahn)
[22:05:14] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724
[22:05:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:06:18] Operations, serviceops: upgrade krypton (webserver_misc_apps) to stretch - https://phabricator.wikimedia.org/T210008 (Dzahn)
[22:07:16] Operations, serviceops: upgrade krypton (webserver_misc_apps) to stretch - https://phabricator.wikimedia.org/T210008 (Dzahn)
[22:07:18] (PS1) Reedy: Copy in FR config [mediawiki-config] - https://gerrit.wikimedia.org/r/512053
[22:08:13] !log reset user email and password for DarkKyoushu
[22:08:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:35] Operations, serviceops: upgrade krypton (webserver_misc_apps) to stretch - https://phabricator.wikimedia.org/T210008 (Dzahn)
[22:08:39] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724 (duration: 03m 25s)
[22:08:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:09:22] !log mobileapps rolled back deployment due to endpoint check failure (not the same one as before); retrying momentarily
[22:09:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:09:51] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724, take 2
[22:09:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:12:09] Operations, serviceops: upgrade krypton (webserver_misc_apps) to stretch - https://phabricator.wikimedia.org/T210008 (Dzahn) ~~test~~
[22:12:17] Operations, ops-eqiad: wmf7622 wont powercycle (cannot be allocated from spares) - https://phabricator.wikimedia.org/T222922 (crusnov) I'm definitely in favor or allowing a failed state to basically come from any other state.
[22:12:51] (PS3) Ori.livneh: Configure forensic logging of Apache requests; enable on beta [puppet] - https://gerrit.wikimedia.org/r/511751
[22:13:04] Operations, serviceops: upgrade krypton (webserver_misc_apps) to stretch - https://phabricator.wikimedia.org/T210008 (Dzahn)
[22:15:29] !log reset user email and password for Nv8200pa
[22:15:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:16:50] PROBLEM - mobileapps endpoints health on scb2005 is CRITICAL: /{domain}/v1/page/references/{title}/{revision} (Get references of a test page) is CRITICAL: Could not fetch url http://10.192.0.34:8888/en.wikipedia.org/v1/page/references/Video/830543386: Generic connection error: HTTPConnectionPool(host=u10.192.0.34, port=8888): Max retries exceeded with url: /en.wikipedia.org/v1/page/references/Video/830543386 (Caused by NewConnect
[22:16:50] connection.HTTPConnection object at 0x7f9b11c912d0: Failed to establish a new connection: [Errno 111] Connection refused,)) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[22:17:10] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724, take 2 (duration: 07m 19s)
[22:17:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:17:36] mdholloway: ^ i assume that alert will recover soon because it just happens during deploy
[22:17:39] Operations, Scap, Release-Engineering-Team (Backlog): Remove trusty-specific hacks from logstash_checker.py - https://phabricator.wikimedia.org/T216380 (greg)
[22:17:58] mutante: yes, i'd expect that to recover in a minute
[22:17:58] (PS2) Reedy: Copy in FR config [mediawiki-config] - https://gerrit.wikimedia.org/r/512053
[22:18:03] cool
[22:18:12] RECOVERY - mobileapps endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[22:18:18] hehe
[22:18:34] !log mobileapps rolled back deployment (again) due to occasional references endpoint timeouts
[22:18:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:21:53] Operations, serviceops: upgrade krypton (webserver_misc_apps) to stretch - https://phabricator.wikimedia.org/T210008 (Dzahn)
[22:23:02] Operations, serviceops: upgrade krypton (webserver_misc_apps) to stretch - https://phabricator.wikimedia.org/T210008 (Dzahn)
[22:23:07] Operations: build grafana package for stretch - https://phabricator.wikimedia.org/T210034 (Dzahn)
[22:28:48] Operations: build grafana package for stretch (upgrade grafana stretch package to 6.x?) - https://phabricator.wikimedia.org/T210034 (Dzahn)
[22:29:24] Operations, serviceops: upgrade krypton (webserver_misc_apps) to stretch - https://phabricator.wikimedia.org/T210008 (Dzahn) grafana is not on this host anymore meanwhile. unlinking subtask , not blocking this anymore
[22:31:06] (CR) Dzahn: "this happened to me again today on a different VM and role." [puppet] - https://gerrit.wikimedia.org/r/451206 (https://phabricator.wikimedia.org/T196968) (owner: Dzahn)
[22:32:57] (CR) Dzahn: "manually running 'sudo /usr/sbin/a2dismod mpm_event' followed by puppet fixes it" [puppet] - https://gerrit.wikimedia.org/r/451206 (https://phabricator.wikimedia.org/T196968) (owner: Dzahn)
[22:37:38] (PS1) Reedy: Stop using array_merge for $wgFlaggedRevsNamespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/512061
[22:38:34] (CR) jerkins-bot: [V: -1] Stop using array_merge for $wgFlaggedRevsNamespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/512061 (owner: Reedy)
[22:39:59] (PS2) Reedy: Stop using array_merge for $wgFlaggedRevsNamespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/512061
[22:43:52] (PS1) Dzahn: webserver_misc_apps: add PHP7.2 APT repository [puppet] - https://gerrit.wikimedia.org/r/512066
[22:44:16] Operations, serviceops, PHP 7.2 support: switch webserver_misc_apps to PHP 7.2 - https://phabricator.wikimedia.org/T224194 (Dzahn)
[22:44:43] (CR) jerkins-bot: [V: -1] webserver_misc_apps: add PHP7.2 APT repository [puppet] - https://gerrit.wikimedia.org/r/512066 (owner: Dzahn)
[22:45:07] (PS2) Dzahn: webserver_misc_apps: add PHP7.2 APT repository [puppet] - https://gerrit.wikimedia.org/r/512066 (https://phabricator.wikimedia.org/T224194)
[22:45:55] (CR) jerkins-bot: [V: -1] webserver_misc_apps: add PHP7.2 APT repository [puppet] - https://gerrit.wikimedia.org/r/512066 (https://phabricator.wikimedia.org/T224194) (owner: Dzahn)
[22:47:18] (CR) Dzahn: [C: -1] ""wmf-style: role 'role::webserver_misc_apps' should not include defines" meeeh.. this also needs a profile for the httpd setup as done for" [puppet] - https://gerrit.wikimedia.org/r/512066 (https://phabricator.wikimedia.org/T224194) (owner: Dzahn)
[22:47:55] (CR) BryanDavis: "> > Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/511769 (owner: Jbond)
[23:00:04] MaxSem, RoanKattouw, and Niharika: It is that lovely time of the day again! You are hereby commanded to deploy Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190522T2300).
[23:00:04] davidwbarratt: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:12] here!
[23:01:12] https://i.imgur.com/hsUZqgp.png
[23:02:11] (PS4) MaxSem: Enable Partial Blocks on more Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/511889 (https://phabricator.wikimedia.org/T222218) (owner: Dbarratt)
[23:03:17] (CR) MaxSem: [C: +2] Enable Partial Blocks on more Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/511889 (https://phabricator.wikimedia.org/T222218) (owner: Dbarratt)
[23:04:22] (Merged) jenkins-bot: Enable Partial Blocks on more Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/511889 (https://phabricator.wikimedia.org/T222218) (owner: Dbarratt)
[23:06:19] Operations, ops-codfw: pull decom hardware and ship to Harry/OIT @ SF office - https://phabricator.wikimedia.org/T222383 (HMarcus) Hi all, The controllers and RAM arrived today. Thanks very much for your help with coordinating this, I'll let you know if we need anything else but this ticket can be close...
[23:06:20] davidwbarratt: no idea if this patch can be tested, but I've pulled it on mwdebug1002
[23:06:53] (CR) jenkins-bot: Enable Partial Blocks on more Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/511889 (https://phabricator.wikimedia.org/T222218) (owner: Dbarratt)
[23:08:38] MaxSem oh yes it can sorry
[23:08:43] MaxSem testing now
[23:09:15] MaxSem looks perfect!
[23:10:45] !log maxsem@deploy1001 Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511889/ (duration: 00m 55s)
[23:10:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:12:40] davidwbarratt: ^
[23:12:51] MaxSem Thanks!
[23:16:18] (CR) Bstorm: [C: -1] "The comments are after the > in the view definition, so they are part of the definition as far as the pyyaml is concerned and the maintain" [puppet] - https://gerrit.wikimedia.org/r/511910 (https://phabricator.wikimedia.org/T221339) (owner: Anomie)
[23:16:45] (CR) Bstorm: [C: -1] "> Patch Set 1: Code-Review-1" [puppet] - https://gerrit.wikimedia.org/r/511910 (https://phabricator.wikimedia.org/T221339) (owner: Anomie)
[23:23:08] (PS1) Papaul: DNS: Add mgmt and production DNS for kafka-main200[1-5] [dns] - https://gerrit.wikimedia.org/r/512069
[23:28:37] Operations, ops-codfw: pull decom hardware and ship to Harry/OIT @ SF office - https://phabricator.wikimedia.org/T222383 (Papaul) Open→Resolved You welcome.
[23:28:45] (CR) 20after4: [C: +1] phabricator: add forensic apache logging and enable on phab1003 [puppet] - https://gerrit.wikimedia.org/r/511955 (owner: Dzahn)
[23:29:37] (CR) 20after4: [C: +1] phabricator: activate read-only mode for maintenance [puppet] - https://gerrit.wikimedia.org/r/511929 (https://phabricator.wikimedia.org/T196019) (owner: Dzahn)
[23:30:01] !log scheduling downtime for phabricator from 0:00 to 1:00 utc
[23:30:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:35:34] (PS9) 20after4: switch phabricator from phab1001 to phab1003 [puppet] - https://gerrit.wikimedia.org/r/437620 (https://phabricator.wikimedia.org/T196019) (owner: Dzahn)
[23:38:09] Operations, Analytics, Analytics-Cluster, CirrusSearch, and 2 others: Cirrus query clicks cron job for dropping partitions older than 90 days have started failing - https://phabricator.wikimedia.org/T224200 (EBernhardson)
[23:38:58] Operations, Analytics, Analytics-Cluster, CirrusSearch, and 2 others: Cirrus query clicks cron job for dropping partitions older than 90 days have started failing - https://phabricator.wikimedia.org/T224200 (EBernhardson)
[23:44:56] (CR) Petar.petkovic: "Partial blocks on Bengali Wikipedia are enabled by I7c8bf3d531765d3c05edb4a5fadfa5283c174c5a" [mediawiki-config] - https://gerrit.wikimedia.org/r/508122 (https://phabricator.wikimedia.org/T222258) (owner: Ammarpad)