[00:00:15] PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[00:00:21] PROBLEM - HTTP availability for Varnish at eqsin on icinga1001 is CRITICAL: job=varnish-text site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:00:28] looking, also bblack ^
[00:00:39] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:00:43] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[00:00:45] PROBLEM - HTTP availability for Varnish at codfw on icinga1001 is CRITICAL: job=varnish-text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:01:03] PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[00:01:53] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[00:02:07] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data
above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=codfw&var-cache_type=All&var-status_type=5
[00:04:05] RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:04:07] RECOVERY - HTTP availability for Varnish at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:04:17] RECOVERY - HTTP availability for Varnish at eqsin on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:04:33] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:04:35] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:04:39] RECOVERY - HTTP availability for Varnish at codfw on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:05:01] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:05:25] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:05:33] RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:05:37] RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[00:08:57] RECOVERY - Eqsin HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5
[00:09:53] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[00:09:59] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=codfw&var-cache_type=All&var-status_type=5
[00:10:13] RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[00:10:27] Operations, observability, Patch-For-Review: LibreNMS upgrade to 1.51 - https://phabricator.wikimedia.org/T207706 (ayounsi) CR above solved issue #1, but also needed: `sudo chown www-data:librenms /var/log/librenms.log` `sudo chmod a+rw bootstrap/cache/` solved issue #2 Now getting the following:...
[00:11:01] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[00:12:03] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[00:21:25] PROBLEM - Check systemd state on ms-be2013 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:30:27] !log ayounsi@deploy1001 Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481
[00:30:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:30:31] !log ayounsi@deploy1001 Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481 (duration: 00m 04s)
[00:30:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:37:01] PROBLEM - LibreNMS HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 10357 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/LibreNMS
[00:40:49] (PS1) Mobrovac: Handle application/octet-stream requests properly; release v0.1.5 [software/service-checker] - https://gerrit.wikimedia.org/r/507531 (https://phabricator.wikimedia.org/T220401)
[00:41:30] (PS2) Mobrovac: Handle application/octet-stream requests properly; release v0.1.5 [software/service-checker] - https://gerrit.wikimedia.org/r/507531 (https://phabricator.wikimedia.org/T220401)
[00:41:35] (CR) jerkins-bot: [V: -1] Handle application/octet-stream requests properly; release v0.1.5 [software/service-checker] - https://gerrit.wikimedia.org/r/507531 (https://phabricator.wikimedia.org/T220401) (owner: Mobrovac)
[00:42:24]
(CR) jerkins-bot: [V: -1] Handle application/octet-stream requests properly; release v0.1.5 [software/service-checker] - https://gerrit.wikimedia.org/r/507531 (https://phabricator.wikimedia.org/T220401) (owner: Mobrovac)
[00:43:35] RECOVERY - LibreNMS HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 1424 bytes in 0.058 second response time https://wikitech.wikimedia.org/wiki/LibreNMS
[00:44:06] (PS3) Mobrovac: Handle application/octet-stream requests properly; release v0.1.5 [software/service-checker] - https://gerrit.wikimedia.org/r/507531 (https://phabricator.wikimedia.org/T220401)
[00:55:27] RECOVERY - Check systemd state on ms-be2013 is OK: OK - running: The system is fully operational
[00:57:50] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@5d619e4]: Update spec x-amples
[00:57:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:59:27] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[01:00:31] RECOVERY - mobileapps endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[01:01:07] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[01:01:48] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@5d619e4]: Update spec x-amples (duration: 03m 58s)
[01:01:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:02:07] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[01:02:17] RECOVERY - mobileapps endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[01:02:17] RECOVERY - mobileapps endpoints health on scb2002
is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[01:15:57] Operations, observability, Patch-For-Review: LibreNMS upgrade to 1.51 - https://phabricator.wikimedia.org/T207706 (ayounsi) I managed to get it working with some live hacking that need to be puppetized: For the above error, the specific error was: ` [Wed May 01 00:33:06.878169 2019] [php7:error] [pid...
[01:37:44] (CR) Zoranzoki21: DNS: Remoce mgmt and production DNS for db2014,db2020,db2021,db2022,db2024,db2031 (1 comment) [dns] - https://gerrit.wikimedia.org/r/507525 (owner: Papaul)
[01:56:21] (PS2) Papaul: DNS: Remove mgmt and production DNS for db2014,db2020,db2021,db2022,db2024,db2031 [dns] - https://gerrit.wikimedia.org/r/507525
[02:30:52] (CR) Dzahn: "also see: https://phabricator.wikimedia.org/rMSCA54a2713aa23efaf640099ff5a26e3fb42762be02 , https://phabricator.wikimedia.org/T78076" [puppet] - https://gerrit.wikimedia.org/r/506750 (https://phabricator.wikimedia.org/T78076) (owner: Dzahn)
[03:29:51] PROBLEM - Check systemd state on ms-be2015 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[03:35:07] RECOVERY - Check systemd state on ms-be2015 is OK: OK - running: The system is fully operational
[04:01:07] PROBLEM - puppet last run on es1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:07:09] PROBLEM - puppet last run on db1075 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[04:09:43] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 71, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:09:51] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:23:13] PROBLEM - puppet last run on mc1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:24:47] PROBLEM - Check systemd state on ms-be1015 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:25:50] Operations, Continuous-Integration-Infrastructure: Jessie rsyslog_8.1901.0-1~bpo8+wmf1_amd64.deb package fails to upgrade - https://phabricator.wikimedia.org/T222166 (hashar) The workaround kind of make sense, however whenever we provision a new instance we would end up with a broken apt upgrade due to t...
[04:27:39] RECOVERY - puppet last run on es1017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:33:39] RECOVERY - puppet last run on db1075 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[04:49:15] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 73, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:49:17] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:49:45] RECOVERY - puppet last run on mc1036 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:50:20] any ops with a spare time to merge a puppet patch for me?
[04:53:13] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 71, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:53:15] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:57:33] RECOVERY - Check systemd state on ms-be1015 is OK: OK - running: The system is fully operational
[04:59:34] Operations, Wikimedia-Apache-configuration, Patch-For-Review, User-revi: Change kr.wikimedia.org redirection destination - https://phabricator.wikimedia.org/T222033 (revi) p:Normal→Triage Can I get an attention from Ops? Seems like this is not eligible for Puppet swat as an apache config...
[05:19:41] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:19:43] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 73, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:01:51] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[06:04:29] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1004 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[06:29:39] PROBLEM - Check systemd state on ms-be2015 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:34:57] RECOVERY - Check systemd state on ms-be2015 is OK: OK - running: The system is fully operational
[10:30:39] PROBLEM - Check systemd state on ms-be2015 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:34:35] RECOVERY - Check systemd state on ms-be2015 is OK: OK - running: The system is fully operational
[10:49:47] (PS12) Mathew.onipe: icinga: create and apply cirrus config check [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932)
[10:49:49] (CR) Mathew.onipe: icinga: create and apply cirrus config check (4 comments) [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: Mathew.onipe)
[10:56:58] Operations, Performance-Team, Thumbor, Traffic, Patch-For-Review: SwiftMedia URL rewrite returns some 404s with wrong Content-Length - https://phabricator.wikimedia.org/T222071 (Gilles) Probably
[10:57:08] (CR) Mathew.onipe: "PCC is happy: https://puppet-compiler.wmflabs.org/compiler1002/16252/" [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: Mathew.onipe)
[11:00:05] MaxSem, RoanKattouw, and Niharika: Dear deployers, time to do the European Mid-day SWAT(Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190501T1100).
[11:00:05] No GERRIT patches in the queue for this window AFAICS.
[11:10:18] I'll deploy a config change, unless someone has something else to deploy now?
[11:11:18] (PS1) Gilles: Renew origin trial tokens for ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/507545 (https://phabricator.wikimedia.org/T216499)
[11:12:31] (CR) Gilles: [C: +2] Renew origin trial tokens for ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/507545 (https://phabricator.wikimedia.org/T216499) (owner: Gilles)
[11:13:33] (Merged) jenkins-bot: Renew origin trial tokens for ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/507545 (https://phabricator.wikimedia.org/T216499) (owner: Gilles)
[11:14:41] (CR) jenkins-bot: Renew origin trial tokens for ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/507545 (https://phabricator.wikimedia.org/T216499) (owner: Gilles)
[11:22:05] !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T216499 T216598 T216594 Renew origin trial tokens for ruwiki (duration: 01m 14s)
[11:22:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:12] T216499: Priority Hints origin trial - https://phabricator.wikimedia.org/T216499
[11:22:12] T216594: Layout Stability API origin trial - https://phabricator.wikimedia.org/T216594
[11:22:13] T216598: Element Timing for Images origin trial - https://phabricator.wikimedia.org/T216598
[11:27:29] !log T216499 Y216594 T216598 mwscript purgeList.php ruwiki --all --verbose
[11:27:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:34] T216499: Priority Hints origin trial - https://phabricator.wikimedia.org/T216499
[11:27:34] T216598: Element Timing for Images origin trial - https://phabricator.wikimedia.org/T216598
[11:33:46] Operations, puppet-compiler, Jenkins: compiler1002.puppet-diffs.eqiad.wmflabs disk is full - https://phabricator.wikimedia.org/T222072 (Gilles) a:Gilles→None
[11:33:58] jouncebot: now
[11:33:58] For the next 0 hour(s) and 26 minute(s): European Mid-day SWAT(Max 6 patches)
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190501T1100)
[11:59:47] PROBLEM - Check systemd state on ms-be2015 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190501T1200)
[12:16:04] (PS1) Ottomata: Enable cirrussearch-request logging to eventgate-analytics for group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/507550 (https://phabricator.wikimedia.org/T214080)
[12:18:59] (CR) Ottomata: "This config change will be a no-op until 1.34.0-wmf.3 goes to group1 wikis at 19:00 UTC today." [mediawiki-config] - https://gerrit.wikimedia.org/r/507550 (https://phabricator.wikimedia.org/T214080) (owner: Ottomata)
[12:23:12] (PS6) Jbond: logstash: add ulog parser to logstash [puppet] - https://gerrit.wikimedia.org/r/506400 (https://phabricator.wikimedia.org/T220987)
[12:25:13] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/507522 (https://phabricator.wikimedia.org/T222214) (owner: Dzahn)
[12:28:07] PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[12:33:23] RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[12:34:48] (CR) Jbond: [C: +2] facter3/puppet5: enable puppet5/facter3 esams [puppet] - https://gerrit.wikimedia.org/r/507300 (https://phabricator.wikimedia.org/T219803) (owner: Jbond)
[12:34:58] (PS2) Jbond: facter3/puppet5: enable puppet5/facter3 esams [puppet] - https://gerrit.wikimedia.org/r/507300 (https://phabricator.wikimedia.org/T219803)
[12:35:17] RECOVERY - Check systemd state on ms-be2015 is OK: OK - running: The system is fully operational
[12:52:59] Operations, observability, Wikimedia-Incident: figure out why Kafka dashboard hammers Prometheus, and fix it - https://phabricator.wikimedia.org/T222112 (Ottomata) Hm, I just edited some of those graphs so that A. they didn't use regex '=~' matching for $kafka_brokers, or if they did, I removed the :...
[12:53:56] !log start recording 30 minutes of traffic from elasticsearch eqiad - T221121
[12:53:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:54:00] T221121: Capacity planning for elastic search - https://phabricator.wikimedia.org/T221121
[12:59:48] (PS1) Gilles: Enable Feature Policy Reporting origin trial [mediawiki-config] - https://gerrit.wikimedia.org/r/507555 (https://phabricator.wikimedia.org/T209572)
[13:00:04] Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190501T1300)
[13:02:06] (CR) Gilles: [C: +2] Enable Feature Policy Reporting origin trial [mediawiki-config] - https://gerrit.wikimedia.org/r/507555 (https://phabricator.wikimedia.org/T209572) (owner: Gilles)
[13:03:12] (Merged) jenkins-bot: Enable Feature Policy Reporting origin trial [mediawiki-config] - https://gerrit.wikimedia.org/r/507555 (https://phabricator.wikimedia.org/T209572) (owner: Gilles)
[13:05:53] (CR) jenkins-bot: Enable Feature Policy Reporting origin trial [mediawiki-config] -
https://gerrit.wikimedia.org/r/507555 (https://phabricator.wikimedia.org/T209572) (owner: Gilles)
[13:10:36] (PS1) Gilles: Fix syntax of wgFeaturePolicyReportOnly fields [mediawiki-config] - https://gerrit.wikimedia.org/r/507559 (https://phabricator.wikimedia.org/T209572)
[13:11:57] (CR) Gilles: [C: +2] Fix syntax of wgFeaturePolicyReportOnly fields [mediawiki-config] - https://gerrit.wikimedia.org/r/507559 (https://phabricator.wikimedia.org/T209572) (owner: Gilles)
[13:12:57] (Merged) jenkins-bot: Fix syntax of wgFeaturePolicyReportOnly fields [mediawiki-config] - https://gerrit.wikimedia.org/r/507559 (https://phabricator.wikimedia.org/T209572) (owner: Gilles)
[13:16:15] (CR) jenkins-bot: Fix syntax of wgFeaturePolicyReportOnly fields [mediawiki-config] - https://gerrit.wikimedia.org/r/507559 (https://phabricator.wikimedia.org/T209572) (owner: Gilles)
[13:20:57] (CR) Gehel: [C: -1] icinga: create and apply cirrus config check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: Mathew.onipe)
[13:21:32] (PS1) Jbond: prometheus: add timeout paramter to query method [software/spicerack] - https://gerrit.wikimedia.org/r/507561
[13:22:09] (CR) Gehel: [C: -1] "Note that some of the changes here should be moved to the parent CR. Or we might actually want to merge those 2 CRs since they are related" [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: Mathew.onipe)
[13:22:57] Operations, observability, Wikimedia-Incident: figure out why Kafka dashboard hammers Prometheus, and fix it - https://phabricator.wikimedia.org/T222112 (Ottomata) Ah, Chris clued me in, I have to collapse the Row in order to move it. I've modified the Kafka dashboard so that only the Summary Row is...
[13:24:31] (PS1) Gilles: Remove unsupported wgFeaturePolicyReportOnly types [mediawiki-config] - https://gerrit.wikimedia.org/r/507563 (https://phabricator.wikimedia.org/T209572)
[13:25:37] (CR) Gilles: [C: +2] Remove unsupported wgFeaturePolicyReportOnly types [mediawiki-config] - https://gerrit.wikimedia.org/r/507563 (https://phabricator.wikimedia.org/T209572) (owner: Gilles)
[13:26:36] (Merged) jenkins-bot: Remove unsupported wgFeaturePolicyReportOnly types [mediawiki-config] - https://gerrit.wikimedia.org/r/507563 (https://phabricator.wikimedia.org/T209572) (owner: Gilles)
[13:26:40] (CR) Herron: [C: -1] "Looks good overall! Needs a minor syntax and typo fix, please see inline. Works well in testing after making these changes." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/506400 (https://phabricator.wikimedia.org/T220987) (owner: Jbond)
[13:27:17] (CR) jenkins-bot: Remove unsupported wgFeaturePolicyReportOnly types [mediawiki-config] - https://gerrit.wikimedia.org/r/507563 (https://phabricator.wikimedia.org/T209572) (owner: Gilles)
[13:28:04] !log update puppet and facter on esams
[13:28:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:31:45] !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T209572 Enable Feature Policy Reporting origin trial (duration: 01m 01s)
[13:31:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:31:49] T209572: Feature Policy Reporting origin trial - https://phabricator.wikimedia.org/T209572
[13:32:02] (PS7) Jbond: logstash: add ulog parser to logstash [puppet] - https://gerrit.wikimedia.org/r/506400 (https://phabricator.wikimedia.org/T220987)
[13:32:03] PROBLEM - cxserver endpoints health on scb2002 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver
[13:32:27]
PROBLEM - cxserver endpoints health on scb1003 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver
[13:32:29] PROBLEM - cxserver endpoints health on scb1001 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver
[13:32:29] PROBLEM - cxserver endpoints health on scb1002 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver
[13:32:33] PROBLEM - cxserver endpoints health on scb2004 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver
[13:32:45] PROBLEM - cxserver endpoints health on scb2003 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver
[13:32:47] PROBLEM - cxserver endpoints health on scb2001 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver
[13:32:49] PROBLEM - cxserver endpoints health on scb2005 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver
[13:32:49] PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[13:32:57] PROBLEM - cxserver endpoints health on scb1004 is CRITICAL:
/v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver
[13:32:57] PROBLEM - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[13:32:59] PROBLEM - cxserver endpoints health on scb2006 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver
[13:34:12] (CR) Jbond: "issues should be fixed ready for review again" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/506400 (https://phabricator.wikimedia.org/T220987) (owner: Jbond)
[13:35:36] (CR) Herron: [C: +1] "Awesome! looks good!" [puppet] - https://gerrit.wikimedia.org/r/506400 (https://phabricator.wikimedia.org/T220987) (owner: Jbond)
[13:37:02] (CR) Jbond: [C: +2] logstash: add ulog parser to logstash [puppet] - https://gerrit.wikimedia.org/r/506400 (https://phabricator.wikimedia.org/T220987) (owner: Jbond)
[13:37:25] (PS8) Jbond: logstash: add ulog parser to logstash [puppet] - https://gerrit.wikimedia.org/r/506400 (https://phabricator.wikimedia.org/T220987)
[13:48:30] Operations, observability, Wikimedia-Incident: figure out why Kafka dashboard hammers Prometheus, and fix it - https://phabricator.wikimedia.org/T222112 (CDanis) >>! In T222112#5149757, @Ottomata wrote: > I've modified the Kafka dashboard so that only the Summary Row is uncollapsed bym default. I've...
[13:50:28] (PS2) BBlack: Convert most DYNA into 1H CNAME records [dns] - https://gerrit.wikimedia.org/r/507399 (https://phabricator.wikimedia.org/T208263)
[13:50:30] (PS2) BBlack: Change CNAME->DYNA TTLs from 1H to 1D [dns] - https://gerrit.wikimedia.org/r/507400 (https://phabricator.wikimedia.org/T208263)
[13:51:40] (PS1) Gilles: Disable Reporting API endpoint [mediawiki-config] - https://gerrit.wikimedia.org/r/507571 (https://phabricator.wikimedia.org/T209572)
[13:52:59] (CR) Gilles: [C: +2] Disable Reporting API endpoint [mediawiki-config] - https://gerrit.wikimedia.org/r/507571 (https://phabricator.wikimedia.org/T209572) (owner: Gilles)
[13:54:49] (Merged) jenkins-bot: Disable Reporting API endpoint [mediawiki-config] - https://gerrit.wikimedia.org/r/507571 (https://phabricator.wikimedia.org/T209572) (owner: Gilles)
[13:57:50] !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T209572 Disable Reporting API endpoint (duration: 00m 59s)
[13:57:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:56] T209572: Feature Policy Reporting origin trial - https://phabricator.wikimedia.org/T209572
[14:00:47] (CR) jenkins-bot: Disable Reporting API endpoint [mediawiki-config] - https://gerrit.wikimedia.org/r/507571 (https://phabricator.wikimedia.org/T209572) (owner: Gilles)
[14:05:27] I am checking the cxserver alert, it might be nothing
[14:09:14] cxserver latency has increased, but I am not sure why
[14:14:25] ACKNOWLEDGEMENT - cxserver endpoints health on scb1001 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received Effie Mouzeli These checks will be removed - effie https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver
[14:14:25] ACKNOWLEDGEMENT - cxserver endpoints health on scb1002 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was
received Effie Mouzeli These checks will be removed - effie https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver [14:14:25] ACKNOWLEDGEMENT - cxserver endpoints health on scb1003 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received Effie Mouzeli These checks will be removed - effie https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver [14:14:25] ACKNOWLEDGEMENT - cxserver endpoints health on scb1004 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received Effie Mouzeli These checks will be removed - effie https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver [14:14:25] ACKNOWLEDGEMENT - cxserver endpoints health on scb2001 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received: /v2/page/{sourcelanguage}/{targetlanguage}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received Effie Mouzeli These checks will be removed - effie https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver [14:14:25] ACKNOWLEDGEMENT - cxserver endpoints health on scb2002 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received: /v2/page/{sourcelanguage}/{targetlanguage}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received Effie Mouzeli These checks will be removed - effie https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver [14:14:25] ACKNOWLEDGEMENT - cxserver endpoints health on scb2003 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received: /v2/page/{sourcelanguage}/{targetlanguage}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received Effie Mouzeli These checks will be removed - effie 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver [14:14:26] ACKNOWLEDGEMENT - cxserver endpoints health on scb2004 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received: /v2/page/{sourcelanguage}/{targetlanguage}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received Effie Mouzeli These checks will be removed - effie https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver [14:14:26] ACKNOWLEDGEMENT - cxserver endpoints health on scb2005 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received: /v2/page/{sourcelanguage}/{targetlanguage}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received Effie Mouzeli These checks will be removed - effie https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver [14:14:27] ACKNOWLEDGEMENT - cxserver endpoints health on scb2006 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received: /v2/page/{sourcelanguage}/{targetlanguage}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received Effie Mouzeli These checks will be removed - effie https://wikitech.wikimedia.org/wiki/Services/Monitoring/cxserver [14:15:20] I am not ACKing the cxserver.svc.codfw.wmnet yet [14:15:29] I will have a look in a bit [14:32:21] (03PS2) 10Rush: wikitech: Disable Gerrit accounts when blocked on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506587 (https://phabricator.wikimedia.org/T218654) (owner: 10BryanDavis) [14:32:36] (03PS3) 10Rush: Revert "striker: Disable developer account creation" [puppet] - 10https://gerrit.wikimedia.org/r/507351 (https://phabricator.wikimedia.org/T219830) (owner: 10BryanDavis) [14:34:15] (03Abandoned) 10Rush: labstore: fix rsync rule for misc [puppet] - 10https://gerrit.wikimedia.org/r/392063 
(https://phabricator.wikimedia.org/T165136) (owner: 10Rush) [14:40:49] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission osm-db200[12] and osm-web200[1234] - https://phabricator.wikimedia.org/T187445 (10Papaul) [14:44:07] PROBLEM - puppet last run on elastic1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:46:21] (03CR) 10Reedy: [C: 03+2] wikitech: Disable Gerrit accounts when blocked on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506587 (https://phabricator.wikimedia.org/T218654) (owner: 10BryanDavis) [14:47:24] (03Merged) 10jenkins-bot: wikitech: Disable Gerrit accounts when blocked on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506587 (https://phabricator.wikimedia.org/T218654) (owner: 10BryanDavis) [14:52:48] !log reedy@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Add new logging channel for wikitech (duration: 00m 58s) [14:52:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:19] !log reedy@deploy1001 Synchronized wmf-config/wikitech.php: propagate blocks to gerrit (duration: 00m 57s) [14:54:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:41] (03CR) 10jenkins-bot: wikitech: Disable Gerrit accounts when blocked on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506587 (https://phabricator.wikimedia.org/T218654) (owner: 10BryanDavis) [15:06:50] (03PS1) 10Reedy: Revert "Temporarily disable account creation on wikitech" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507594 [15:10:37] RECOVERY - puppet last run on elastic1046 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:26:54] (03PS13) 10Mathew.onipe: icinga: create and apply cirrus config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [15:27:47] (03PS2) 10Sbisson: Enable cirrussearch-request logging to 
eventgate-analytics for group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507550 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [15:27:51] (03CR) 10Mathew.onipe: icinga: create and apply cirrus config check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: 10Mathew.onipe) [15:29:25] ACKNOWLEDGEMENT - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received Effie Mouzeli A change in restbase has increased cxserver's latency and this check fails, we'll fix it - effie https://wikitech.wikimedia.org/wiki/CX [15:29:25] ACKNOWLEDGEMENT - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received Effie Mouzeli A change in restbase has increased cxserver's latency and this check fails, we'll fix it - effie https://wikitech.wikimedia.org/wiki/CX [15:30:33] (03PS14) 10Mathew.onipe: icinga: create and apply cirrus config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [15:33:33] (03PS15) 10Mathew.onipe: icinga: create and apply cirrus config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [15:37:53] (03PS16) 10Mathew.onipe: icinga: create and apply cirrus config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [15:41:37] (03Abandoned) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) (owner: 10Mathew.onipe) [15:42:18] (03CR) 10Mathew.onipe: "PCC is ok and expected: https://puppet-compiler.wmflabs.org/compiler1002/16257/" [puppet] - 10https://gerrit.wikimedia.org/r/507045
(https://phabricator.wikimedia.org/T218932) (owner: 10Mathew.onipe) [15:49:35] (03PS1) 10Reedy: Revert "Adjust wikitech account settings." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507604 [15:49:39] PROBLEM - High CPU load on API appserver on mw1339 is CRITICAL: CRITICAL - load average: 87.84, 57.19, 32.19 [15:49:46] jouncebot: now [15:49:46] No deployments scheduled for the next 0 hour(s) and 10 minute(s) [15:49:47] jouncebot: next [15:49:47] (03CR) 10jerkins-bot: [V: 04-1] Revert "Adjust wikitech account settings." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507604 (owner: 10Reedy) [15:49:48] In 0 hour(s) and 10 minute(s): Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190501T1600) [15:50:11] (03CR) 10Rush: [C: 03+1] "forget you jenkins!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507604 (owner: 10Reedy) [15:52:07] (03PS2) 10Reedy: Revert "Adjust wikitech account settings." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507604 [15:53:17] (03CR) 10CRusnov: "LGTM, only the g= tag is removed." [dns] - 10https://gerrit.wikimedia.org/r/504948 (https://phabricator.wikimedia.org/T221290) (owner: 10Cwhite) [15:53:21] (03CR) 10CRusnov: [C: 03+1] remove granularity key from wiki-mail DKIM [dns] - 10https://gerrit.wikimedia.org/r/504948 (https://phabricator.wikimedia.org/T221290) (owner: 10Cwhite) [15:53:33] RECOVERY - High CPU load on API appserver on mw1339 is OK: OK - load average: 20.84, 35.87, 28.92 [15:54:03] (03PS3) 10Reedy: Partial Revert "Adjust wikitech account settings." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507604 [15:54:10] (03CR) 10CDanis: profile::ganeti: Add cumin hosts to RAPI and cleanup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/507384 (owner: 10CRusnov) [15:54:28] (03CR) 10Reedy: [C: 03+2] Partial Revert "Adjust wikitech account settings." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/507604 (owner: 10Reedy) [15:55:34] (03Merged) 10jenkins-bot: Partial Revert "Adjust wikitech account settings." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507604 (owner: 10Reedy) [15:57:04] (03CR) 10CRusnov: profile::ganeti: Add cumin hosts to RAPI and cleanup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/507384 (owner: 10CRusnov) [15:58:11] (03CR) 10CDanis: [C: 03+1] profile::ganeti: Add cumin hosts to RAPI and cleanup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/507384 (owner: 10CRusnov) [15:58:19] !log reedy@deploy1001 Synchronized wmf-config/wikitech.php: Re-enable password reset on wikitech (duration: 00m 58s) [15:58:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:04] MaxSem, RoanKattouw, and Niharika: Time to snap out of that daydream and deploy Morning SWAT (Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190501T1600). [16:00:04] ottomata and stephanebisson: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [16:00:26] (03CR) 10CRusnov: [C: 03+1] "If I'm reading the documentation correctly this looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/507365 (https://phabricator.wikimedia.org/T222198) (owner: 10Herron) [16:00:46] (03PS4) 10CRusnov: profile::ganeti: Add cumin hosts to RAPI and cleanup [puppet] - 10https://gerrit.wikimedia.org/r/507384 [16:01:15] (03CR) 10Herron: [C: 04-1] "Looks good aside from typo, please see inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/507384 (owner: 10CRusnov) [16:01:23] I'll SWAT [16:01:58] (03CR) 10jenkins-bot: Partial Revert "Adjust wikitech account settings." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/507604 (owner: 10Reedy) [16:02:25] (03PS1) 10Ayounsi: LibreNMS, remove email_from from config.php [puppet] - 10https://gerrit.wikimedia.org/r/507605 (https://phabricator.wikimedia.org/T207706) [16:03:02] ottomata: Your change is a no-op until wmf.3 is on group 1, should I deploy it now or do you prefer to deploy it when you can actually test it? [16:03:43] (03PS5) 10CRusnov: profile::ganeti: Add cumin hosts to RAPI and cleanup [puppet] - 10https://gerrit.wikimedia.org/r/507384 [16:03:51] (03CR) 10Ayounsi: [C: 03+2] LibreNMS, remove email_from from config.php [puppet] - 10https://gerrit.wikimedia.org/r/507605 (https://phabricator.wikimedia.org/T207706) (owner: 10Ayounsi) [16:03:55] (03CR) 10CRusnov: profile::ganeti: Add cumin hosts to RAPI and cleanup (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/507384 (owner: 10CRusnov) [16:04:11] (03PS6) 10CRusnov: profile::ganeti: Add cumin hosts to RAPI and cleanup [puppet] - 10https://gerrit.wikimedia.org/r/507384 [16:04:29] (03CR) 10Herron: [C: 03+1] "looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/507384 (owner: 10CRusnov) [16:05:00] ottomata: Let me know when you are around. I'll start with other patches in the meantime.
[16:05:49] (03CR) 10Herron: [C: 03+1] LibreNMS, remove email_from from config.php [puppet] - 10https://gerrit.wikimedia.org/r/507605 (https://phabricator.wikimedia.org/T207706) (owner: 10Ayounsi) [16:07:39] (03CR) 10CRusnov: [C: 03+2] profile::ganeti: Add cumin hosts to RAPI and cleanup [puppet] - 10https://gerrit.wikimedia.org/r/507384 (owner: 10CRusnov) [16:08:11] (03PS7) 10CRusnov: profile::ganeti: Add cumin hosts to RAPI and cleanup [puppet] - 10https://gerrit.wikimedia.org/r/507384 [16:09:53] stephanebisson: sorry in meeting [16:09:56] yes please deploy [16:10:03] its already live for group0 [16:10:05] and working fine [16:10:13] so deploying that now will just make it live for group1 when wmf.3 goes out later [16:10:23] i'm here, just in standup [16:11:01] ottomata: OK, thanks. I'll do it after the current patch. [16:13:37] (03PS3) 10Sbisson: Enable cirrussearch-request logging to eventgate-analytics for group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507550 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [16:13:55] (03CR) 10Sbisson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507550 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [16:14:55] (03Merged) 10jenkins-bot: Enable cirrussearch-request logging to eventgate-analytics for group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507550 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [16:17:25] !log sbisson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:507550|Enable cirrussearch-request logging to eventgate-analytics for group1 wikis]] (duration: 01m 00s) [16:17:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:19:31] ottomata: Done [16:22:35] thanks! 
[16:23:43] !log sbisson@deploy1001 Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Mentorship.js: SWAT: [[gerrit:507580|Mentorship module: Add data-link-id to mentor's talkpage link]] (duration: 01m 01s) [16:23:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:24:00] (03CR) 10jenkins-bot: Enable cirrussearch-request logging to eventgate-analytics for group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507550 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [16:27:10] 10Operations, 10ops-codfw, 10DBA, 10Goal, 10Patch-For-Review: rack/setup/install db2[103-120].codfw.wmnet (18 hosts) - https://phabricator.wikimedia.org/T221532 (10Papaul) [16:32:23] (03PS2) 10Reedy: Revert "Temporarily disable account creation on wikitech" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507594 [16:34:51] (03PS1) 10EBernhardson: Start writing to cloudelastic from testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507609 (https://phabricator.wikimedia.org/T220625) [16:35:44] (03CR) 10jerkins-bot: [V: 04-1] Start writing to cloudelastic from testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507609 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [16:38:02] (03PS1) 10Ayounsi: LibreNMS, run irc-bot as user deploy-librenms [puppet] - 10https://gerrit.wikimedia.org/r/507610 (https://phabricator.wikimedia.org/T207706) [16:39:25] (03CR) 10Ayounsi: [C: 03+2] LibreNMS, run irc-bot as user deploy-librenms [puppet] - 10https://gerrit.wikimedia.org/r/507610 (https://phabricator.wikimedia.org/T207706) (owner: 10Ayounsi) [16:41:13] !log sbisson@deploy1001 Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: SWAT: [[gerrit:507593|Re-use timestamp for section header and question storage]] (duration: 01m 01s) [16:41:16] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:42:47] (03CR) 10Herron: [C: 03+1] LibreNMS, run irc-bot as user deploy-librenms [puppet] - 10https://gerrit.wikimedia.org/r/507610 (https://phabricator.wikimedia.org/T207706) (owner: 10Ayounsi) [16:48:57] !log sbisson@deploy1001 Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: SWAT: [[gerrit:507593|Re-use timestamp for section header and question storage]] (duration: 01m 01s) [16:48:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:50] can i sneak one more item into SWAT? [16:51:59] (03CR) 10MSantos: [C: 04-1] "Maybe abandoning it would be a good idea." [puppet] - 10https://gerrit.wikimedia.org/r/457408 (https://phabricator.wikimedia.org/T198622) (owner: 10Gehel) [16:52:05] (03PS2) 10EBernhardson: Start writing to cloudelastic from testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507609 (https://phabricator.wikimedia.org/T220625) [16:52:06] !log sbisson@deploy1001 Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.QuestionPosterDialog.js: SWAT: [[gerrit:507598|Ensure text exists before logging enter-question-text action]] (duration: 01m 00s) [16:52:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:52:29] (03Abandoned) 10Gehel: maps: migrate maps2004 to stretch [puppet] - 10https://gerrit.wikimedia.org/r/457408 (https://phabricator.wikimedia.org/T198622) (owner: 10Gehel) [16:53:09] (03CR) 10MSantos: [C: 04-1] maps: change partitioning scheme for new SSDs in maps2004 [puppet] - 10https://gerrit.wikimedia.org/r/457409 (https://phabricator.wikimedia.org/T195285) (owner: 10Gehel) [16:53:32] (03Abandoned) 10Gehel: maps: change partitioning scheme for new SSDs in maps2004 [puppet] - 10https://gerrit.wikimedia.org/r/457409 (https://phabricator.wikimedia.org/T195285) (owner: 10Gehel) [16:53:50] SWAT is finished [16:54:00] 04Critical Testing 
transport from LibreNMS [16:54:00] This is a test alert [16:54:24] woop [16:55:10] ok, i'm going to deploy one late-added swat patch to test something on testwiki [16:55:26] (03PS3) 10EBernhardson: Start writing to cloudelastic from testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507609 (https://phabricator.wikimedia.org/T220625) [16:55:33] (03CR) 10EBernhardson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507609 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [16:56:37] (03Merged) 10jenkins-bot: Start writing to cloudelastic from testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507609 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [16:57:14] (03CR) 10jenkins-bot: Start writing to cloudelastic from testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507609 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [16:58:34] (03PS2) 10Dzahn: admins: add Joel Aufrecht to ldap_only_admins [puppet] - 10https://gerrit.wikimedia.org/r/507522 (https://phabricator.wikimedia.org/T222214) [16:58:47] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T220625 Start writing to cloudelastic from testwiki (duration: 01m 01s) [16:58:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:58:51] T220625: Initialize CirrusSearch on cloudelastic - https://phabricator.wikimedia.org/T220625 [16:59:04] (03CR) 10Dzahn: [C: 03+2] "for logstash access" [puppet] - 10https://gerrit.wikimedia.org/r/507522 (https://phabricator.wikimedia.org/T222214) (owner: 10Dzahn) [17:02:26] !log joal@deploy1001 Started deploy [analytics/refinery@682ab7c]: Regular analytics weekly train [17:02:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:03:27] (03PS1) 10Papaul: DNS: Add mgmt and production DNS for db2[103-120] [dns] - 10https://gerrit.wikimedia.org/r/507613 [17:03:48] (03CR) 10jerkins-bot: [V: 04-1] 
DNS: Add mgmt and production DNS for db2[103-120] [dns] - 10https://gerrit.wikimedia.org/r/507613 (owner: 10Papaul) [17:09:21] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb=PATCH https://grafana.wikimedia.org/dashboard/db/kubernetes-api [17:09:26] 04Critical Testing transport from LibreNMS [17:09:26] This is a test alert [17:09:31] PROBLEM - etcd request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 operation=compareAndSwap https://grafana.wikimedia.org/dashboard/db/kubernetes-api [17:10:29] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=PATCH https://grafana.wikimedia.org/dashboard/db/kubernetes-api [17:10:40] 04Critical Testing transport from LibreNMS [17:10:43] cool [17:11:23] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [17:11:27] nice [17:11:57] PROBLEM - Disk space on notebook1003 is CRITICAL: DISK CRITICAL - free space: /srv 4479 MB (3% inode=87%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space [17:11:57] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/pro [17:13:21] PROBLEM - Disk space on notebook1004 is CRITICAL: DISK CRITICAL - free space: /srv 5252 MB (3% inode=83%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space [17:14:20] (03PS1) 10Ayounsi: LibreNMS, set IRC alerting to single line messages [puppet] - 10https://gerrit.wikimedia.org/r/507616 (https://phabricator.wikimedia.org/T207706) [17:15:32] (03CR) 10Ayounsi: [C: 03+2] LibreNMS, set IRC alerting 
to single line messages [puppet] - 10https://gerrit.wikimedia.org/r/507616 (https://phabricator.wikimedia.org/T207706) (owner: 10Ayounsi) [17:17:09] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton [17:18:41] RECOVERY - etcd request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [17:18:48] (03PS2) 10Papaul: DNS: Add mgmt and production DNS for db2[103-120] [dns] - 10https://gerrit.wikimedia.org/r/507613 [17:19:13] RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [17:19:35] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [17:19:49] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [17:24:13] (03CR) 10Herron: [C: 03+2] remove granularity key from wiki-mail DKIM [dns] - 10https://gerrit.wikimedia.org/r/504948 (https://phabricator.wikimedia.org/T221290) (owner: 10Cwhite) [17:24:17] (03PS3) 10Herron: remove granularity key from wiki-mail DKIM [dns] - 10https://gerrit.wikimedia.org/r/504948 (https://phabricator.wikimedia.org/T221290) (owner: 10Cwhite) [17:27:45] !log joal@deploy1001 Finished deploy [analytics/refinery@682ab7c]: Regular analytics weekly train (duration: 25m 18s) [17:27:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:33:28] 10Operations, 10DNS, 10Mail, 10Traffic, 10Patch-For-Review: wiki-mail DKIM failing - https://phabricator.wikimedia.org/T221290 (10herron) Looking better after merging the above. From a password reminder mail: ` Date: Wed, 01 May 2019 17:29:44 +0000 dkim=pass (1024-bit rsa key sha256) header.d=wikim... 
[17:39:21] RECOVERY - Disk space on notebook1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space [17:39:25] RECOVERY - Disk space on notebook1004 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space [17:40:06] !log joal@deploy1001 Started deploy [analytics/refinery@682ab7c]: Regular analytics weekly train - Second try after space freed [17:40:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:43:21] !log joal@deploy1001 Finished deploy [analytics/refinery@682ab7c]: Regular analytics weekly train - Second try after space freed (duration: 03m 15s) [17:43:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:44:51] 10Operations, 10puppet-compiler, 10Jenkins: compiler1002.puppet-diffs.eqiad.wmflabs disk is full - https://phabricator.wikimedia.org/T222072 (10Dzahn) p:05Triage→03Normal [17:52:35] PROBLEM - Disk space on notebook1003 is CRITICAL: DISK CRITICAL - /mnt/hdfs is not accessible: Transport endpoint is not connected https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space [17:53:52] mrr? 
[17:59:58] !log force remount of /mnt/hdfs on notebook1003 (fuse hdfs got stuck) [18:00:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:05] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190501T1800) [18:00:23] RECOVERY - Disk space on notebook1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space [18:05:19] (03PS1) 10Dzahn: puppet_compiler: add cron to delete old output files [puppet] - 10https://gerrit.wikimedia.org/r/507623 (https://phabricator.wikimedia.org/T222072) [18:06:13] (03CR) 10jerkins-bot: [V: 04-1] puppet_compiler: add cron to delete old output files [puppet] - 10https://gerrit.wikimedia.org/r/507623 (https://phabricator.wikimedia.org/T222072) (owner: 10Dzahn) [18:06:49] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [18:07:55] ^^ refinery-sqoop-mediawiki-production.service failed [18:13:27] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational [18:16:27] (03PS2) 10Dzahn: puppet_compiler: add cron to delete old output files [puppet] - 10https://gerrit.wikimedia.org/r/507623 (https://phabricator.wikimedia.org/T222072) [18:18:16] (03CR) 10Dzahn: puppet_compiler: add cron to delete old output files (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/507623 (https://phabricator.wikimedia.org/T222072) (owner: 10Dzahn) [18:19:45] 10Operations, 10puppet-compiler, 10Jenkins, 10Patch-For-Review: compiler1002.puppet-diffs.eqiad.wmflabs disk is full - https://phabricator.wikimedia.org/T222072 (10Dzahn) >>! In T222072#5146890, @hashar wrote: > * add a puppet tidy class to purge old artifacts I think it can be even simpler. How about ht... 
[18:21:08] chaomodus: we are looking into the refinery-sqoop-mediawiki-production.service [18:21:17] Oke doke :) [18:21:21] was just filing a bug for you [18:22:07] (03CR) 10Herron: [C: 03+1] puppet_compiler: add cron to delete old output files [puppet] - 10https://gerrit.wikimedia.org/r/507623 (https://phabricator.wikimedia.org/T222072) (owner: 10Dzahn) [18:28:52] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: decommission db2014,db2020, db2021, db2022, db2024, db2031 - https://phabricator.wikimedia.org/T221424 (10Dzahn) Are the "remove all remaining puppet references" and "disable puppet" boxes done? [18:29:30] 10Operations, 10puppet-compiler, 10Jenkins, 10Patch-For-Review: compiler1002.puppet-diffs.eqiad.wmflabs disk is full - https://phabricator.wikimedia.org/T222072 (10herron) >>! In T222072#5150744, @Dzahn wrote: > I think it can be even simpler. How about https://gerrit.wikimedia.org/r/c/operations/puppet/+... [18:31:20] (03PS1) 10Ayounsi: LibreNMS, alerting syntax change (REGEX instead of ~) [puppet] - 10https://gerrit.wikimedia.org/r/507624 (https://phabricator.wikimedia.org/T207706) [18:31:37] (03CR) 10Dzahn: [C: 04-1] "db2024 is in the list but doesn't actually get removed" [dns] - 10https://gerrit.wikimedia.org/r/507525 (owner: 10Papaul) [18:34:45] (03PS2) 10Ayounsi: LibreNMS, alerting syntax change (REGEX instead of ~) [puppet] - 10https://gerrit.wikimedia.org/r/507624 (https://phabricator.wikimedia.org/T207706) [18:37:06] (03CR) 10Ayounsi: [C: 03+2] LibreNMS, alerting syntax change (REGEX instead of ~) [puppet] - 10https://gerrit.wikimedia.org/r/507624 (https://phabricator.wikimedia.org/T207706) (owner: 10Ayounsi) [18:41:18] 10Operations, 10puppet-compiler, 10Jenkins, 10Patch-For-Review: compiler1002.puppet-diffs.eqiad.wmflabs disk is full - https://phabricator.wikimedia.org/T222072 (10herron) On paper this use case also would lend itself to a filesystem with transparent compression. Maybe btrfs with compression. 
The data st... [18:46:33] (03CR) 10Dzahn: "can we send the output of this command to /dev/null? We are currently getting a lot of cron spam from cp servers because:" [puppet] - 10https://gerrit.wikimedia.org/r/385187 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [19:00:04] thcipriani: It is that lovely time of the day again! You are hereby commanded to deploy MediaWiki train - Americas version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190501T1900). [19:04:52] * thcipriani does [19:07:15] (03CR) 10Thcipriani: [V: 03+2 C: 03+2] Remove quota plugin [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/507523 (owner: 10Paladox) [19:11:46] (03PS1) 10Thcipriani: group1 wikis to 1.34.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507628 [19:11:48] (03CR) 10Thcipriani: [C: 03+2] group1 wikis to 1.34.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507628 (owner: 10Thcipriani) [19:13:02] (03Merged) 10jenkins-bot: group1 wikis to 1.34.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507628 (owner: 10Thcipriani) [19:13:28] (03PS1) 10Ayounsi: LibreNMS, fix typo (REGEX -> REGEXP) [puppet] - 10https://gerrit.wikimedia.org/r/507629 (https://phabricator.wikimedia.org/T207706) [19:13:39] (03PS1) 10Elukey: Fix default log4j logging for Hadoop Namenode [puppet/cdh] - 10https://gerrit.wikimedia.org/r/507630 (https://phabricator.wikimedia.org/T220702) [19:14:13] (03CR) 10Ayounsi: [C: 03+2] LibreNMS, fix typo (REGEX -> REGEXP) [puppet] - 10https://gerrit.wikimedia.org/r/507629 (https://phabricator.wikimedia.org/T207706) (owner: 10Ayounsi) [19:15:13] (03PS2) 10Elukey: Fix default log4j logging for Hadoop Namenode [puppet/cdh] - 10https://gerrit.wikimedia.org/r/507630 (https://phabricator.wikimedia.org/T220702) [19:15:33] !log thcipriani@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.3 [19:15:47] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:16:32] (03CR) 10Ottomata: [C: 03+1] Fix default log4j logging for Hadoop Namenode [puppet/cdh] - 10https://gerrit.wikimedia.org/r/507630 (https://phabricator.wikimedia.org/T220702) (owner: 10Elukey) [19:17:20] (03CR) 10Elukey: [V: 03+2 C: 03+2] Fix default log4j logging for Hadoop Namenode [puppet/cdh] - 10https://gerrit.wikimedia.org/r/507630 (https://phabricator.wikimedia.org/T220702) (owner: 10Elukey) [19:17:27] !log thcipriani@deploy1001 Synchronized php: group1 wikis to 1.34.0-wmf.3 (duration: 01m 53s) [19:17:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:18:10] ugh 60 second timeouts [19:18:41] (03PS1) 10Elukey: Update the cdh module to its latest sha [puppet] - 10https://gerrit.wikimedia.org/r/507631 [19:18:53] (03CR) 10Elukey: [V: 03+2 C: 03+2] Update the cdh module to its latest sha [puppet] - 10https://gerrit.wikimedia.org/r/507631 (owner: 10Elukey) [19:20:58] (03CR) 10jenkins-bot: group1 wikis to 1.34.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507628 (owner: 10Thcipriani) [19:21:52] (03PS1) 10Cwhite: initial attempt at a varnishkafka exporter [debs/prometheus-varnishkafka-exporter] - 10https://gerrit.wikimedia.org/r/507632 (https://phabricator.wikimedia.org/T196066) [19:22:53] (03PS1) 10Ottomata: Include eventlogging::dependencies on stat boxes to help with backfilling [puppet] - 10https://gerrit.wikimedia.org/r/507633 [19:23:01] PROBLEM - puppet last run on cp5005 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[19:25:27] (03PS1) 10Dzahn: smart-data-dump: add '-l error' to facter command to suppress warnings [puppet] - 10https://gerrit.wikimedia.org/r/507634 [19:25:33] (03CR) 10Nuria: [C: 03+1] Include eventlogging::dependencies on stat boxes to help with backfilling [puppet] - 10https://gerrit.wikimedia.org/r/507633 (owner: 10Ottomata) [19:26:49] (03PS2) 10Dzahn: smart-data-dump: add '-l error' to facter command to suppress warnings [puppet] - 10https://gerrit.wikimedia.org/r/507634 [19:27:52] (03CR) 10Ottomata: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/16258/" [puppet] - 10https://gerrit.wikimedia.org/r/507633 (owner: 10Ottomata) [19:29:06] (03PS3) 10Dzahn: smart-data-dump: add '-l error' to facter command to suppress warnings [puppet] - 10https://gerrit.wikimedia.org/r/507634 (https://phabricator.wikimedia.org/T86552) [19:29:23] (03CR) 10CDanis: smart-data-dump: add '-l error' to facter command to suppress warnings (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/507634 (https://phabricator.wikimedia.org/T86552) (owner: 10Dzahn) [19:29:36] (03CR) 10jerkins-bot: [V: 04-1] smart-data-dump: add '-l error' to facter command to suppress warnings [puppet] - 10https://gerrit.wikimedia.org/r/507634 (https://phabricator.wikimedia.org/T86552) (owner: 10Dzahn) [19:31:44] (03PS4) 10Dzahn: smart-data-dump: add '-l error' to facter command to suppress warnings [puppet] - 10https://gerrit.wikimedia.org/r/507634 (https://phabricator.wikimedia.org/T86552) [19:31:52] (03CR) 10Dzahn: smart-data-dump: add '-l error' to facter command to suppress warnings (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/507634 (https://phabricator.wikimedia.org/T86552) (owner: 10Dzahn) [19:33:27] (03CR) 10CDanis: [C: 03+1] smart-data-dump: add '-l error' to facter command to suppress warnings [puppet] - 10https://gerrit.wikimedia.org/r/507634 (https://phabricator.wikimedia.org/T86552) (owner: 10Dzahn) [19:40:33] (03CR) 10Dzahn: [C: 03+2] smart-data-dump: add '-l error' 
to facter command to suppress warnings [puppet] - 10https://gerrit.wikimedia.org/r/507634 (https://phabricator.wikimedia.org/T86552) (owner: 10Dzahn) [19:40:41] (03PS5) 10Dzahn: smart-data-dump: add '-l error' to facter command to suppress warnings [puppet] - 10https://gerrit.wikimedia.org/r/507634 (https://phabricator.wikimedia.org/T86552) [19:42:02] (03PS1) 10Ottomata: Only install eventlogging::dependencies on non buster [puppet] - 10https://gerrit.wikimedia.org/r/507639 [19:47:08] (03PS2) 10Ottomata: Only install eventlogging::dependencies on non buster [puppet] - 10https://gerrit.wikimedia.org/r/507639 [19:47:14] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Only install eventlogging::dependencies on non buster [puppet] - 10https://gerrit.wikimedia.org/r/507639 (owner: 10Ottomata) [19:47:59] PROBLEM - SSH on ms-be2018 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:48:34] Whoever is on the train deployment -See https://phabricator.wikimedia.org/T218511 [19:49:29] PROBLEM - MD RAID on ms-be2018 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.16.160: Connection reset by peer https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [19:49:33] RECOVERY - puppet last run on cp5005 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [19:49:37] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2018 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.16.160: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [19:50:27] RECOVERY - SSH on ms-be2018 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:50:39] RECOVERY - MD RAID on ms-be2018 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 
https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [19:50:47] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2018 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [19:52:34] (03PS1) 10Dzahn: Revert "smart-data-dump: add '-l error' to facter command to suppress warnings" [puppet] - 10https://gerrit.wikimedia.org/r/507642 [19:53:26] (03PS2) 10Dzahn: Revert "smart-data-dump: add '-l error' to facter command to suppress warnings" [puppet] - 10https://gerrit.wikimedia.org/r/507642 [19:53:57] (03PS3) 10Dzahn: Revert "smart-data-dump: add '-l error' to facter command to suppress warnings" [puppet] - 10https://gerrit.wikimedia.org/r/507642 [19:54:15] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:54:43] (03CR) 10Dzahn: [C: 03+2] "unfortunately there are more hosts with facter 2 than with facter 3.. so that would be even more spam" [puppet] - 10https://gerrit.wikimedia.org/r/507642 (owner: 10Dzahn) [19:58:47] ^^ is us [19:58:58] ottomata: thanks ! [19:59:40] (03PS1) 10Reedy: Invalidate user sessions and log them out upon blocking on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507644 (https://phabricator.wikimedia.org/T222282) [20:00:04] cscott, arlolra, subbu, bearND, and halfak: (Dis)respected human, time to deploy Services – Parsoid / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190501T2000). Please do the needful. [20:00:19] (03CR) 10Rush: [C: 03+1] Invalidate user sessions and log them out upon blocking on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507644 (https://phabricator.wikimedia.org/T222282) (owner: 10Reedy) [20:00:20] I've got a minor ORES deployment [20:00:31] I'll kick it off now. 
[20:00:42] (03CR) 10jerkins-bot: [V: 04-1] Invalidate user sessions and log them out upon blocking on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507644 (https://phabricator.wikimedia.org/T222282) (owner: 10Reedy) [20:01:06] !log halfak@deploy1001 Started deploy [ores/deploy@52e9759]: T222121 [20:01:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:01:18] T222121: Non-root features no longer being injected. - https://phabricator.wikimedia.org/T222121 [20:01:52] (03PS2) 10Reedy: Invalidate user sessions and log them out upon blocking on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507644 (https://phabricator.wikimedia.org/T222282) [20:02:15] (03CR) 10Reedy: "Why does the linter only complain about lack of tabs on hhvm?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507644 (https://phabricator.wikimedia.org/T222282) (owner: 10Reedy) [20:02:39] oh, it's not -lint, it's -test [20:03:15] ACKNOWLEDGEMENT - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. ottomata systemd timer stopped manually. 
[20:04:33] lol [20:10:45] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5 [20:11:37] PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [20:11:55] that's no good [20:14:53] (03CR) 10Rush: [C: 03+1] "nice" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507644 (https://phabricator.wikimedia.org/T222282) (owner: 10Reedy) [20:15:09] !log halfak@deploy1001 Finished deploy [ores/deploy@52e9759]: T222121 (duration: 14m 03s) [20:15:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:15:15] T222121: Non-root features no longer being injected. - https://phabricator.wikimedia.org/T222121 [20:16:55] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [20:18:39] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5 [20:25:20] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: decommission db2014,db2020, db2021, db2022, db2024, db2031 - https://phabricator.wikimedia.org/T221424 (10Papaul) @Robh please take a look at this task if you have a minute if there are any other puppet or dhcp references before i m... 
[20:27:00] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: decommission db2014,db2020, db2021, db2022, db2024, db2031 - https://phabricator.wikimedia.org/T221424 (10Papaul) a:05Papaul→03RobH [20:30:38] All done! [20:34:22] 10Operations, 10Wikimedia-Mailing-lists: Close the engineering mailing list - https://phabricator.wikimedia.org/T222308 (10Jdforrester-WMF) [20:35:24] lol [20:37:01] (03PS1) 10EBernhardson: Start writing to cloudelastic for group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507703 (https://phabricator.wikimedia.org/T220625) [20:47:12] ebernhardson: so about writing to cloudelastic I noticed: https://phabricator.wikimedia.org/T222307 spike up for a few in the logs [20:47:49] an hour or so after wmf.3 deployment. Looked like group0 wikis only. [20:49:06] (03PS2) 10Herron: mx: disable multi_domain in smtp transports [puppet] - 10https://gerrit.wikimedia.org/r/507365 (https://phabricator.wikimedia.org/T222198) [20:50:41] (03CR) 10Herron: [C: 03+2] mx: disable multi_domain in smtp transports [puppet] - 10https://gerrit.wikimedia.org/r/507365 (https://phabricator.wikimedia.org/T222198) (owner: 10Herron) [20:53:45] (03PS3) 10Andrew Bogott: wmcs: Remove puppet code for the 'main' region [puppet] - 10https://gerrit.wikimedia.org/r/507340 (https://phabricator.wikimedia.org/T167293) [20:55:54] (03CR) 10Andrew Bogott: [C: 03+2] wmcs: Remove puppet code for the 'main' region [puppet] - 10https://gerrit.wikimedia.org/r/507340 (https://phabricator.wikimedia.org/T167293) (owner: 10Andrew Bogott) [21:00:07] (03PS2) 10Andrew Bogott: OpenStack: Update firewall defines to remove references to things in ::main:: [puppet] - 10https://gerrit.wikimedia.org/r/507505 [21:04:34] (03CR) 10Andrew Bogott: [C: 03+2] OpenStack: Update firewall defines to remove references to things in ::main:: [puppet] - 10https://gerrit.wikimedia.org/r/507505 (owner: 10Andrew Bogott) [21:16:52] (03PS2) 10Andrew Bogott: labtest/codfw-dev: remove some dangling 
references to the main region [puppet] - 10https://gerrit.wikimedia.org/r/507506 [21:16:54] (03PS2) 10Andrew Bogott: wmcs: update or remove some old references to the main region [puppet] - 10https://gerrit.wikimedia.org/r/507507 [21:16:56] (03PS2) 10Andrew Bogott: prometheus: update references to the no-longer-existing 'main' deploy [puppet] - 10https://gerrit.wikimedia.org/r/507508 [21:16:58] (03PS2) 10Andrew Bogott: wmcs: remove hiera references to the now-deleted main deploy [puppet] - 10https://gerrit.wikimedia.org/r/507509 [21:18:07] (03CR) 10Andrew Bogott: [C: 03+2] labtest/codfw-dev: remove some dangling references to the main region [puppet] - 10https://gerrit.wikimedia.org/r/507506 (owner: 10Andrew Bogott) [21:18:44] !log start importing group1 into cloudelastic from mwmaint1002 [21:18:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:26:33] (03CR) 10Andrew Bogott: [C: 03+2] wmcs: update or remove some old references to the main region [puppet] - 10https://gerrit.wikimedia.org/r/507507 (owner: 10Andrew Bogott) [21:34:08] (03PS1) 10Ottomata: Enable cirrussearch-request logging to eventgate-analytics on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507709 (https://phabricator.wikimedia.org/T214080) [21:36:41] (03CR) 10Ottomata: "This will be a no-op until 1.34.0-wmf.3 goes to group2 later in the afternoon on May 2. This config change will take effect then. 
It is " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507709 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [21:37:59] (03CR) 10Andrew Bogott: [C: 03+2] prometheus: update references to the no-longer-existing 'main' deploy [puppet] - 10https://gerrit.wikimedia.org/r/507508 (owner: 10Andrew Bogott) [21:49:58] (03PS1) 10RobH: removing db2014,db2020, db2021, db2022, db2024, db2031 remaining references [puppet] - 10https://gerrit.wikimedia.org/r/507710 (https://phabricator.wikimedia.org/T221424) [21:50:08] (03CR) 10RobH: [C: 03+2] DNS: Remove mgmt and production DNS for db2014,db2020,db2021,db2022,db2024,db2031 [dns] - 10https://gerrit.wikimedia.org/r/507525 (owner: 10Papaul) [21:50:36] (03PS3) 10RobH: DNS: Remove mgmt and production DNS for db2014,db2020,db2021,db2022,db2024,db2031 [dns] - 10https://gerrit.wikimedia.org/r/507525 (owner: 10Papaul) [21:50:41] (03CR) 10jerkins-bot: [V: 04-1] removing db2014,db2020, db2021, db2022, db2024, db2031 remaining references [puppet] - 10https://gerrit.wikimedia.org/r/507710 (https://phabricator.wikimedia.org/T221424) (owner: 10RobH) [21:51:46] (03PS2) 10RobH: removing old db references [puppet] - 10https://gerrit.wikimedia.org/r/507710 (https://phabricator.wikimedia.org/T221424) [21:52:29] (03CR) 10RobH: [C: 03+2] removing old db references [puppet] - 10https://gerrit.wikimedia.org/r/507710 (https://phabricator.wikimedia.org/T221424) (owner: 10RobH) [21:55:16] (03PS4) 10RobH: DNS: Remove mgmt and production DNS for db2014,db2020,db2021,db2022,db2024,db2031 [dns] - 10https://gerrit.wikimedia.org/r/507525 (owner: 10Papaul) [21:55:31] (03CR) 10Catrope: [C: 04-1] Invariant config cleanup: V - Notifications matters (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/501007 (owner: 10Jforrester) [21:57:24] !log start importing group2 to cloudelastic in parallel with group1 [21:57:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:58:47] (03CR) 10RobH: [C: 03+2] 
DNS: Remove mgmt and production DNS for db2014,db2020,db2021,db2022,db2024,db2031 [dns] - 10https://gerrit.wikimedia.org/r/507525 (owner: 10Papaul) [22:00:35] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: decommission db2014,db2020, db2021, db2022, db2024, db2031 - https://phabricator.wikimedia.org/T221424 (10RobH) [22:00:43] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: decommission db2014,db2020, db2021, db2022, db2024, db2031 - https://phabricator.wikimedia.org/T221424 (10RobH) 05Open→03Resolved a:05RobH→03None [22:03:51] (03CR) 10Gehel: [C: 04-1] icinga: create and apply cirrus config check (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: 10Mathew.onipe) [22:07:57] !log LDAP - adding jaufrecht to wmf (T222214) [22:08:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:08:01] T222214: Access request for wikitech:jaufrecht to logstash - https://phabricator.wikimedia.org/T222214 [22:20:59] (03CR) 10CRusnov: "Since it's the way the winds are blowing, I've looked into adding timeout to this module; unfortunately pynetbox doesn't support passing a" [software/spicerack] - 10https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072) (owner: 10CRusnov) [22:26:53] (03PS3) 10CRusnov: Minor improvements to PuppetDB report [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/506001 (https://phabricator.wikimedia.org/T220422) [22:27:46] (03CR) 10CRusnov: [C: 03+2] Minor improvements to PuppetDB report [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/506001 (https://phabricator.wikimedia.org/T220422) (owner: 10CRusnov) [22:27:51] 10Operations, 10DNS, 10Mail, 10Traffic, 10Patch-For-Review: wiki-mail DKIM failing - https://phabricator.wikimedia.org/T221290 (10Krenair) Looks better to me too. 
[22:40:28] (03PS1) 10Ayounsi: Add librenms laravel_app_key fake public key [labs/private] - 10https://gerrit.wikimedia.org/r/507715 (https://phabricator.wikimedia.org/T207706) [22:46:06] (03PS1) 10Ayounsi: LibreNMS, file files permission, add app key, add logrotate [puppet] - 10https://gerrit.wikimedia.org/r/507716 (https://phabricator.wikimedia.org/T207706) [22:46:55] (03CR) 10jerkins-bot: [V: 04-1] LibreNMS, file files permission, add app key, add logrotate [puppet] - 10https://gerrit.wikimedia.org/r/507716 (https://phabricator.wikimedia.org/T207706) (owner: 10Ayounsi) [22:47:25] (03PS2) 10Ayounsi: LibreNMS, file files permission, add app key, add logrotate [puppet] - 10https://gerrit.wikimedia.org/r/507716 (https://phabricator.wikimedia.org/T207706) [22:47:55] (03CR) 10jerkins-bot: [V: 04-1] LibreNMS, file files permission, add app key, add logrotate [puppet] - 10https://gerrit.wikimedia.org/r/507716 (https://phabricator.wikimedia.org/T207706) (owner: 10Ayounsi) [22:53:29] (03PS1) 10CRusnov: Add device model/device type parity check [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/507717 [22:53:58] (03CR) 10jerkins-bot: [V: 04-1] Add device model/device type parity check [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/507717 (owner: 10CRusnov) [23:00:04] MaxSem, RoanKattouw, and Niharika: #bothumor My software never has bugs. It just develops random features. Rise for Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190501T2300). [23:00:04] ebernhardson: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:37] I can SWAT. [23:00:56] If ebernhardson is around. 
:) [23:02:56] (03PS2) 10CRusnov: Add device model/device type parity check [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/507717 [23:05:52] Oh I should add my patch too [23:07:06] Niharika: I added a patch to the page, and happy for you to SWAT it or to do it myself [23:07:38] RoanKattouw: Go ahead. I didn't swat yet - seems like Erik isn't around. [23:07:47] Cool thanks [23:09:13] Niharika: i'm here, not much to test on my patch we shipped a similar patch to enable just testwiki earlier, this turns on the rest of group0 [23:09:30] Niharika: nothing that can directly be tested, this variable is only referenced from job runners [23:09:49] * ebernhardson was doing an interview, ran a few minutes late [23:10:02] ebernhardson: Cool. I can swat your patch unless Roan is willing to do it after his own. [23:10:09] i can do it myself too, no big deal [23:12:00] ebernhardson: If it's a config patch go ahead, I'll be waiting for Jenkins for a while [23:14:15] ok [23:14:30] (03CR) 10EBernhardson: [C: 03+2] Start writing to cloudelastic for group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507703 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [23:15:37] (03Merged) 10jenkins-bot: Start writing to cloudelastic for group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507703 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [23:15:51] (03CR) 10jenkins-bot: Start writing to cloudelastic for group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507703 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [23:19:18] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T220625 Start writing to cloudelastic for group0 (duration: 01m 05s) [23:19:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:19:24] T220625: Initialize CirrusSearch on cloudelastic - https://phabricator.wikimedia.org/T220625 [23:20:43] RECOVERY - exim queue on mx1001 is 
OK: OK: Less than 1000 mails in exim queue. [23:24:48] RoanKattouw: I'm all done [23:25:47] (03CR) 10Ayounsi: "Thanks! reply inline." (032 comments) [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/507217 (owner: 10Ayounsi) [23:34:59] !log catrope@deploy1001 Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Drop RENDER_NOW for impact module images (T222223) (duration: 01m 04s) [23:35:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:35:04] T222223: Use thumb.php for impact image thumbnails - https://phabricator.wikimedia.org/T222223 [23:42:45] I'm also done