[00:02:15] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[00:02:21] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[00:12:33] <wikibugs>	 (CR) BryanDavis: [C: -1] "Need to address issues from Brooke's review" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243) (owner: BryanDavis)
[00:13:26] <wikibugs>	 (PS7) Alex Monk: labs puppetmaster migration: Puppet role for encapi/labspuppet DB hosts [puppet] - https://gerrit.wikimedia.org/r/500844
[00:17:02] <wikibugs>	 (PS8) Alex Monk: labs puppetmaster migration: Puppet role for encapi/labspuppet DB hosts [puppet] - https://gerrit.wikimedia.org/r/500844
[00:20:57] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[00:25:58] <wikibugs>	 Operations, ops-codfw, Patch-For-Review: Broken disk on ms-be2026 - https://phabricator.wikimedia.org/T219854 (Papaul) @Volans the server is running on old firmware. HPE Smart Storage Battery 1 Firmware 1.1 Embedded iLO 2.40 Dec 02 2015 System Board Intelligent Platform Abstraction Data 20.3 System...
[00:35:55] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[00:41:45] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[00:42:09] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[00:47:43] <wikibugs>	 (PS12) Ayounsi: Bird anycast: add anycast_healthchecker [puppet] - https://gerrit.wikimedia.org/r/397723
[00:49:45] <icinga-wm>	 RECOVERY - SSH on labservices1001 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.11 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[00:53:37] <icinga-wm>	 PROBLEM - SSH on labservices1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[00:54:41] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[01:08:51] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[01:11:13] <icinga-wm>	 PROBLEM - rsyslog TLS listener on port 6514 on lithium is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Logs
[01:15:13] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[01:15:17] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[01:20:33] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[01:20:41] <icinga-wm>	 RECOVERY - rsyslog TLS listener on port 6514 on lithium is OK: SSL OK - Certificate lithium.eqiad.wmnet valid until 2021-10-23 19:09:29 +0000 (expires in 934 days) https://wikitech.wikimedia.org/wiki/Logs
[01:24:15] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[01:29:29] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[01:38:59] <icinga-wm>	 RECOVERY - SSH on labservices1001 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.11 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[01:42:53] <icinga-wm>	 PROBLEM - SSH on labservices1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[01:52:11] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[01:56:37] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1002 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[01:59:27] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:59:37] <wikibugs>	 (PS1) Andrew Bogott: pdns-recursor: reduce maximum number of file descriptors [puppet] - https://gerrit.wikimedia.org/r/500880 (https://phabricator.wikimedia.org/T219953)
[02:02:02] <wikibugs>	 (CR) Andrew Bogott: [C: +2] pdns-recursor: reduce maximum number of file descriptors [puppet] - https://gerrit.wikimedia.org/r/500880 (https://phabricator.wikimedia.org/T219953) (owner: Andrew Bogott)
[02:02:59] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:04:55] <icinga-wm>	 RECOVERY - SSH on labservices1001 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.11 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[02:26:13] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[02:26:29] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[02:36:31] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 21701744 and 0 seconds
[02:37:49] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1688 and 2 seconds
[02:39:11] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[02:42:09] <icinga-wm>	 PROBLEM - HHVM rendering on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[02:43:19] <icinga-wm>	 RECOVERY - HHVM rendering on mwdebug1001 is OK: HTTP OK: HTTP/1.1 200 OK - 75009 bytes in 0.180 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[03:10:33] <icinga-wm>	 PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:29:51] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[03:36:59] <icinga-wm>	 RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[03:37:37] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:37:55] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:56:57] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[03:59:33] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:15:29] <wikibugs>	 (PS25) BryanDavis: wmcs: Migrate tools-checker to Stretch [puppet] - https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243)
[04:16:09] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:18:30] <wikibugs>	 (CR) BryanDavis: wmcs: Migrate tools-checker to Stretch (3 comments) [puppet] - https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243) (owner: BryanDavis)
[04:22:39] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:22:51] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:26:37] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:30:29] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:32:27] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:33:03] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:33:17] <wikibugs>	 Operations, ORES, Scoring-platform-team (Research): Investigate memory usage of ORES in kubernetes - https://phabricator.wikimedia.org/T210264 (Harej)
[04:35:03] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:35:41] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:38:15] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:38:57] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:39:45] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:44:23] <wikibugs>	 (PS9) CRusnov: Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072)
[04:47:47] <icinga-wm>	 PROBLEM - puppet last run on lvs5003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:49:15] <wikibugs>	 (CR) CRusnov: "Woowee lots of changes." (12 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072) (owner: CRusnov)
[04:49:29] <wikibugs>	 (CR) Mobrovac: "> Lemme know when it's ready to go (what's blocking it?) and I 'll" [puppet] - https://gerrit.wikimedia.org/r/496872 (https://phabricator.wikimedia.org/T204245) (owner: Mobrovac)
[04:49:47] <wikibugs>	 (CR) jerkins-bot: [V: -1] Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072) (owner: CRusnov)
[04:54:46] <wikibugs>	 (PS10) CRusnov: Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072)
[04:56:25] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:59:02] <wikibugs>	 (CR) jerkins-bot: [V: -1] Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072) (owner: CRusnov)
[05:01:02] <wikibugs>	 (PS11) CRusnov: Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072)
[05:06:10] <wikibugs>	 (CR) jerkins-bot: [V: -1] Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072) (owner: CRusnov)
[05:11:48] <wikibugs>	 (PS12) CRusnov: Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072)
[05:13:11] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2002 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:14:11] <icinga-wm>	 RECOVERY - puppet last run on lvs5003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:19:10] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool pc1007 [mediawiki-config] - https://gerrit.wikimedia.org/r/500892
[05:21:23] <wikibugs>	 (CR) Marostegui: [C: +2] db-eqiad.php: Depool pc1007 [mediawiki-config] - https://gerrit.wikimedia.org/r/500892 (owner: Marostegui)
[05:22:17] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool pc1007 [mediawiki-config] - https://gerrit.wikimedia.org/r/500892 (owner: Marostegui)
[05:23:42] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool pc1007 for upgrade (duration: 01m 00s)
[05:23:44] <marostegui>	 !log Upgrade pc1007
[05:23:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:23:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:25:35] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:25:55] <marostegui>	 uh?
[05:26:11] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:26:15] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Depool pc1007 [mediawiki-config] - https://gerrit.wikimedia.org/r/500892 (owner: Marostegui)
[05:26:39] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:26:45] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:26:59] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[05:27:06] <marostegui>	 what's going on?
[05:28:03] <wikibugs>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool pc1007" [mediawiki-config] - https://gerrit.wikimedia.org/r/500893
[05:28:57] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[05:29:05] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[05:29:08] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool pc1007" [mediawiki-config] - https://gerrit.wikimedia.org/r/500893 (owner: Marostegui)
[05:29:21] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:29:23] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[05:29:29] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:30:05] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:30:08] <wikibugs>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool pc1007" [mediawiki-config] - https://gerrit.wikimedia.org/r/500893 (owner: Marostegui)
[05:30:31] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:31:20] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 59s)
[05:31:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:31:55] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:34:07] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[05:34:17] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[05:34:33] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:35:51] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[05:37:11] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:37:22] <wikibugs>	 (CR) jenkins-bot: Revert "db-eqiad.php: Depool pc1007" [mediawiki-config] - https://gerrit.wikimedia.org/r/500893 (owner: Marostegui)
[05:38:41] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
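The graphite1004 "HTTP 5xx reqs/min" alerts above report what share of recent datapoints crossed a threshold, rather than a single value: 1000 req/min in the CRITICAL lines, while the OK lines quote 250, presumably the warning level. Percentages such as 22.22%, 11.11% and 10.00% are consistent with 2 or 1 bad datapoints out of a 9 or 10 point window. The Python sketch below illustrates that style of check. It is not the Icinga plugin that produced these messages; `percent_above` and `classify` are hypothetical names, and the 10% trigger is an assumption inferred only from "10.00%" being enough to go CRITICAL at 05:28:57.

```python
from typing import List, Optional, Sequence, Tuple


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Return the percentage of non-null datapoints strictly above `threshold`."""
    valid: List[float] = [p for p in datapoints if p is not None]
    if not valid:
        # No data at all: report 0% rather than dividing by zero.
        return 0.0
    above = sum(1 for p in valid if p > threshold)
    return 100.0 * above / len(valid)


def classify(
    datapoints: Sequence[Optional[float]],
    warning: float = 250.0,     # threshold quoted in the OK messages
    critical: float = 1000.0,   # threshold quoted in the CRITICAL messages
    trigger_pct: float = 10.0,  # ASSUMPTION: share of the window needed to alert
) -> Tuple[str, float]:
    """Hypothetical check: alert when >= trigger_pct of points cross a threshold."""
    crit_pct = percent_above(datapoints, critical)
    if crit_pct >= trigger_pct:
        return "CRITICAL", crit_pct
    warn_pct = percent_above(datapoints, warning)
    if warn_pct >= trigger_pct:
        return "WARNING", warn_pct
    return "OK", warn_pct


if __name__ == "__main__":
    # Nine made-up per-minute 5xx samples with two spikes over 1000 req/min.
    samples = [300, 420, 1500, 2200, 510, 480, 390, 350, 400]
    state, pct = classify(samples)
    print(f"{state}: {pct:.2f}% of data above the threshold")
```

Run as a script, this prints `CRITICAL: 22.22% of data above the threshold`. Alerting on a share of the window rather than on the latest sample trades a few minutes of recovery delay for less flapping.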
[05:40:05] <wikibugs>	 (PS1) Marostegui: mariadb: Move wikimedia_editor_tasks_entity_description_exists [puppet] - https://gerrit.wikimedia.org/r/500894 (https://phabricator.wikimedia.org/T218302)
[05:43:01] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[05:43:11] <wikibugs>	 (PS2) Marostegui: mariadb: Move wikimedia_editor_tasks_entity_description_exists [puppet] - https://gerrit.wikimedia.org/r/500894 (https://phabricator.wikimedia.org/T218302)
[05:43:31] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:43:45] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:43:55] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqiad on icinga1001 is CRITICAL: job=varnish-text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[05:44:21] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:44:39] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[05:46:15] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[05:46:15] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:46:25] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[05:46:31] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[05:46:41] <icinga-wm>	 PROBLEM - Eqsin HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5
[05:46:57] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[05:47:07] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[05:47:39] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:48:17] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:48:43] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:52:25] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[05:54:01] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[05:54:45] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[05:54:45] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:55:13] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:55:27] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:55:31] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[05:57:29] <marostegui>	 !log Fix data drifts on bnwikisource on x1 - T219493
[05:57:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:57:33] <stashbot>	 T219493: Decommission 2 codfw x1 hosts db2033 and db2034 - https://phabricator.wikimedia.org/T219493
[05:57:35] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[05:57:42] <_joe_>	 marostegui: can we stop deploying things?
[05:57:53] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[05:57:56] <marostegui>	 yep, I am not deploying anything for a while
[05:58:37] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[05:58:37] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:59:05] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:59:19] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:59:49] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:03:53] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[06:04:03] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[06:04:23] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[06:04:26] <_joe_>	 !log restart varnish backend on cp1085, causing unavailability
[06:04:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:04:33] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[06:04:47] <icinga-wm>	 RECOVERY - Eqsin HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5
[06:17:02] <elukey>	 Krinkle, AaronSchulz - o/ is https://gerrit.wikimedia.org/r/499693 going to be deployed this week by any chance? I'd really like to see it working asap, it might reduce a lot the number of TKOs that we are seeing :)
[06:23:03] <wikibugs>	 (CR) Elukey: [C: +1] uwsgi: allow setting routing rules [puppet] - https://gerrit.wikimedia.org/r/500729 (owner: Giuseppe Lavagetto)
[06:23:17] <wikibugs>	 (CR) Elukey: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1002/15514/graphite1004.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/500730 (owner: Giuseppe Lavagetto)
[06:25:53] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:26:57] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[06:31:05] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:37:37] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:38:07] <wikibugs>	 Operations, serviceops, Core Platform Team Backlog (Watching / External), Services (watching), User-jijiki: Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (jijiki) p:Triage→Normal
[06:53:05] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:00:06] <wikibugs>	 (CR) Jcrespo: "Filtered tables will attempt to create a trigger for each column- that will be a waste of resources. I would prefer to leave it as private" [puppet] - https://gerrit.wikimedia.org/r/500894 (https://phabricator.wikimedia.org/T218302) (owner: Marostegui)
[07:01:25] <wikibugs>	 (CR) Marostegui: "> Filtered tables will attempt to create a trigger for each column-" [puppet] - https://gerrit.wikimedia.org/r/500894 (https://phabricator.wikimedia.org/T218302) (owner: Marostegui)
[07:03:46] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1120 [mediawiki-config] - https://gerrit.wikimedia.org/r/500895 (https://phabricator.wikimedia.org/T219493)
[07:03:51] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:04:56] <wikibugs>	 (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1120 [mediawiki-config] - https://gerrit.wikimedia.org/r/500895 (https://phabricator.wikimedia.org/T219493) (owner: Marostegui)
[07:06:02] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1120 [mediawiki-config] - https://gerrit.wikimedia.org/r/500895 (https://phabricator.wikimedia.org/T219493) (owner: Marostegui)
[07:06:15] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Depool db1120 [mediawiki-config] - https://gerrit.wikimedia.org/r/500895 (https://phabricator.wikimedia.org/T219493) (owner: Marostegui)
[07:07:17] <wikibugs>	 (CR) Jcrespo: "Ah, so you want to make it public? Ok to me then, but it will have to be imported." [puppet] - https://gerrit.wikimedia.org/r/500894 (https://phabricator.wikimedia.org/T218302) (owner: Marostegui)
[07:07:33] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1120 T219493 (duration: 01m 13s)
[07:07:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:37] <stashbot>	 T219493: Decommission 2 codfw x1 hosts db2033 and db2034 - https://phabricator.wikimedia.org/T219493
[07:07:38] <wikibugs>	 (CR) Marostegui: "> Ah, so you want to make it public? Ok to me then, but it will have" [puppet] - https://gerrit.wikimedia.org/r/500894 (https://phabricator.wikimedia.org/T218302) (owner: Marostegui)
[07:09:22] <marostegui>	 !log Stop replication in sync on db1120 and db2034 (x1 codfw master) - T219493
[07:09:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:16:01] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:18:36] <wikibugs>	 (CR) Gilles: "Nothing blocking this, should be fine" [puppet] - https://gerrit.wikimedia.org/r/496872 (https://phabricator.wikimedia.org/T204245) (owner: Mobrovac)
[07:20:57] <wikibugs>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool db1120" [mediawiki-config] - https://gerrit.wikimedia.org/r/500897
[07:22:59] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:23:16] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1120" [mediawiki-config] - https://gerrit.wikimedia.org/r/500897 (owner: Marostegui)
[07:24:04] <wikibugs>	 Operations, Domains, Traffic: figure out if we can park wicipediacymraeg.org - https://phabricator.wikimedia.org/T128085 (Dzahn)
[07:24:08] <wikibugs>	 Operations, Domains, Traffic: wicipediacymraeg.org is on clientHold - https://phabricator.wikimedia.org/T219856 (Dzahn)
[07:24:47] <wikibugs>	 Operations, Domains, Traffic: wicipediacymraeg.org is on clientHold - https://phabricator.wikimedia.org/T219856 (Dzahn) also see T128085
[07:24:55] <wikibugs>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1120" [mediawiki-config] - https://gerrit.wikimedia.org/r/500897 (owner: Marostegui)
[07:25:47] <marostegui>	 !log Deploy schema change on db1073, labtestwiki - T219887
[07:25:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:50] <stashbot>	 T219887: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887
[07:26:08] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1120 T219493 (duration: 00m 57s)
[07:26:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:26:12] <stashbot>	 T219493: Decommission 2 codfw x1 hosts db2033 and db2034 - https://phabricator.wikimedia.org/T219493
[07:28:11] <wikibugs>	 (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1120" [mediawiki-config] - https://gerrit.wikimedia.org/r/500897 (owner: Marostegui)
[07:29:55] <wikibugs>	 Operations, Domains, Traffic: wicipediacymraeg.org is on clientHold - https://phabricator.wikimedia.org/T219856 (Vgutierrez) regarding TLS wicipediacymraeg.org should benefit from T133548 that should be implemented during Q4
[07:30:03] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[07:30:30] <wikibugs>	 (CR) Marostegui: "jcrespo so you ok with this to go?" [puppet] - https://gerrit.wikimedia.org/r/500894 (https://phabricator.wikimedia.org/T218302) (owner: Marostegui)
[07:32:01] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:32:07] <wikibugs>	 (CR) Jcrespo: [C: +1] mariadb: Move wikimedia_editor_tasks_entity_description_exists [puppet] - https://gerrit.wikimedia.org/r/500894 (https://phabricator.wikimedia.org/T218302) (owner: Marostegui)
[07:32:14] <marostegui>	 \o/
[07:36:01] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:38:37] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:40:45] <marostegui>	 !log Disable event scheduler on db1115 before restarting - tendril is stuck
[07:40:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:14] <marostegui>	 !log Reboot db1115 - tendril and dbtree will be down
[07:42:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:19] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:42:53] <wikibugs>	 Operations, Analytics, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (elukey) I think that we should move away from hacks done up to now and...
[07:43:49] <wikibugs>	 (PS1) Marostegui: db-eqiad,db-codfw.php: Depool s8 sanitarium masters [mediawiki-config] - https://gerrit.wikimedia.org/r/500899 (https://phabricator.wikimedia.org/T218302)
[07:44:09] <icinga-wm>	 PROBLEM - HTTP-dbtree on dbmonitor1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[07:44:43] <wikibugs>	 (PS2) Marostegui: db-eqiad,db-codfw.php: Depool s8 sanitarium masters [mediawiki-config] - https://gerrit.wikimedia.org/r/500899 (https://phabricator.wikimedia.org/T218302)
[07:44:43] <icinga-wm>	 PROBLEM - HTTP-dbtree on dbmonitor2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[07:44:57] <marostegui>	 ^  expected as !logged before
[07:45:39] <wikibugs>	 Operations, Analytics, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (MoritzMuehlenhoff) >>! In T148843#5080853, @elukey wrote: > * see if i...
[07:45:44] <wikibugs>	 (CR) Gilles: Make caching of static performance site explicit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417) (owner: Gilles)
[07:46:15] <wikibugs>	 (CR) Muehlenhoff: [C: +1] gerrit: admins: ops -> gerritadmin [puppet] - https://gerrit.wikimedia.org/r/498431 (owner: Hashar)
[07:47:53] <icinga-wm>	 RECOVERY - HTTP-dbtree on dbmonitor1001 is OK: HTTP OK: HTTP/1.1 200 OK - 80550 bytes in 0.325 second response time https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[07:48:29] <icinga-wm>	 RECOVERY - HTTP-dbtree on dbmonitor2001 is OK: HTTP OK: HTTP/1.1 200 OK - 80592 bytes in 0.834 second response time https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[07:51:12] <moritzm>	 !log installing new apache packages on mwdebug
[07:51:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:26] <logmsgbot>	 !log gilles@deploy1001 Synchronized php-1.33.0-wmf.24/includes/media/ThumbnailImage.php: T216499 Only apply high priority hint half the time (duration: 00m 58s)
[07:53:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:29] <stashbot>	 T216499: Priority Hints origin trial - https://phabricator.wikimedia.org/T216499
[07:55:49] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[08:02:10] <wikibugs>	 (PS4) Mathew.onipe: icinga: add mediawiki cirrus update lag check [puppet] - https://gerrit.wikimedia.org/r/500422 (https://phabricator.wikimedia.org/T219601)
[08:02:47] <wikibugs>	 (CR) Mathew.onipe: icinga: add mediawiki cirrus update lag check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/500422 (https://phabricator.wikimedia.org/T219601) (owner: Mathew.onipe)
[08:07:34] <wikibugs>	 (PS3) Dzahn: gerrit: admins: ops -> gerritadmin [puppet] - https://gerrit.wikimedia.org/r/498431 (owner: Hashar)
[08:08:32] <wikibugs>	 (CR) Dzahn: [C: +2] gerrit: admins: ops -> gerritadmin [puppet] - https://gerrit.wikimedia.org/r/498431 (owner: Hashar)
[08:09:31] <wikibugs>	 (PS1) Jcrespo: network constants: dbmonitor hosts are not general monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/500900
[08:09:52] <moritzm>	 !log installing new apache packages on mmw1261
[08:09:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:39] <wikibugs>	 (CR) Marostegui: [C: +1] network constants: dbmonitor hosts are not general monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/500900 (owner: Jcrespo)
[08:12:35] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/500900 (owner: Jcrespo)
[08:13:15] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:13:46] <wikibugs>	 (CR) Marostegui: [C: +2] db-eqiad,db-codfw.php: Depool s8 sanitarium masters [mediawiki-config] - https://gerrit.wikimedia.org/r/500899 (https://phabricator.wikimedia.org/T218302) (owner: Marostegui)
[08:14:54] <wikibugs>	 (Merged) jenkins-bot: db-eqiad,db-codfw.php: Depool s8 sanitarium masters [mediawiki-config] - https://gerrit.wikimedia.org/r/500899 (https://phabricator.wikimedia.org/T218302) (owner: Marostegui)
[08:15:10] <wikibugs>	 (PS2) Jcrespo: network constants: dbmonitor hosts are not general monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/500900
[08:15:32] <wikibugs>	 Operations, Traffic, Goal: Replace Varnish backends with ATS on cache upload nodes in ulsfo - https://phabricator.wikimedia.org/T219967 (ema)
[08:15:39] <wikibugs>	 Operations, Traffic, Goal: Replace Varnish backends with ATS on cache upload nodes in ulsfo - https://phabricator.wikimedia.org/T219967 (ema) p:Triage→Normal
[08:15:47] <wikibugs>	 (PS3) Jcrespo: network constants: dbmonitor hosts are not general monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/500900
[08:16:21] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool s8 sanitarium master (duration: 00m 58s)
[08:16:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:15] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:17:26] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Depool s8 sanitarium master (duration: 00m 57s)
[08:17:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:35] <marostegui>	 !log Stop replication on db2082 and db1087 (s8 sanitarium masters) T218302
[08:18:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:38] <stashbot>	 T218302: Choose DB/Cluster for WikimediaEditorTasks tables - https://phabricator.wikimedia.org/T218302
[08:19:27] <wikibugs>	 (PS3) Marostegui: mariadb: Move wikimedia_editor_tasks_entity_description_exists [puppet] - https://gerrit.wikimedia.org/r/500894 (https://phabricator.wikimedia.org/T218302)
[08:20:31] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Move wikimedia_editor_tasks_entity_description_exists [puppet] - https://gerrit.wikimedia.org/r/500894 (https://phabricator.wikimedia.org/T218302) (owner: Marostegui)
[08:22:57] <wikibugs>	 (PS4) Jcrespo: network constants: dbmonitor hosts are not general monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/500900
[08:23:02] <marostegui>	 !log Restart mysql on sanitarium hosts db1124 db1125 db2094 db2095 - T218302
[08:23:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:01] <wikibugs>	 (CR) jenkins-bot: db-eqiad,db-codfw.php: Depool s8 sanitarium masters [mediawiki-config] - https://gerrit.wikimedia.org/r/500899 (https://phabricator.wikimedia.org/T218302) (owner: Marostegui)
[08:27:35] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:33:53] <wikibugs>	 (CR) Jcrespo: [C: +2] network constants: dbmonitor hosts are not general monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/500900 (owner: Jcrespo)
[08:35:27] <jynus>	 !log merging change on network constants (firewall operation)
[08:35:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:09] <wikibugs>	 Operations, Analytics, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (elukey) @EBernhardson I think that the most pressing point now is to d...
[08:38:50] <jynus>	 ^marostegui be vigilant for any network issue, even if it should be a noop
[08:38:56] <marostegui>	 wilco
[08:38:58] <marostegui>	 thanks
[08:38:59] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:39:09] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:39:25] <jynus>	 I will check db1115, maybe now there is a port opening needed?
[08:40:16] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM. Some comments inline which are mostly optional stuff." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/500801 (https://phabricator.wikimedia.org/T209527) (owner: Bstorm)
[08:40:31] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:42:06] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] "Let me know if you want me to merge this or if andrew will handle it." [puppet] - https://gerrit.wikimedia.org/r/500825 (owner: Alex Monk)
[08:42:22] <jynus>	 marostegui: we need a new rule for db1115, I added one manually for now
[08:42:40] <marostegui>	 I just saw the dbmonitor complaining about timeout on icinga
[08:43:03] <icinga-wm>	 PROBLEM - HTTP-dbtree on dbmonitor2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[08:43:12] <marostegui>	 hehe that ^
[08:43:21] <jynus>	 it should be back
[08:43:27] <jynus>	 oh, I see, it is the other one
[08:43:41] <jynus>	 that should do it
[08:43:51] <marostegui>	 1001 is gone on icinga
[08:43:57] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] "let me know if andrew doesn't merge this, I can do it." [puppet] - https://gerrit.wikimedia.org/r/500824 (owner: Alex Monk)
[08:43:58] <marostegui>	 and dbtree works for me
[08:44:09] <jynus>	 dbmonitor2001 is passive anyway
[08:44:13] <icinga-wm>	 RECOVERY - HTTP-dbtree on dbmonitor2001 is OK: HTTP OK: HTTP/1.1 200 OK - 80589 bytes in 0.946 second response time https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[08:44:15] <marostegui>	 yep
[08:44:21] <jynus>	 that is why I didn't add it at first
[08:44:25] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:44:32] <jynus>	 so we need the glue between frontends and backends on a rule
[08:44:40] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/500823 (owner: Alex Monk)
[08:44:45] <mutante>	 soo.. i can explain the 2001 thing
[08:44:55] <mutante>	   if hiera('do_acme', true) {
[08:44:55] <mutante>	         ferm::service { 'tendril-http-https':
[08:45:04] <mutante>	 only opens port 80/443 if do_acme
[08:45:05] <jynus>	 ?
[08:45:08] <mutante>	 so only on the active one
[08:45:24] <jynus>	 mutante: it is actually the backend- it is ok now
[08:45:35] <mutante>	 oh, nevermind then :)
[08:46:41] <jynus>	 I am not worried about that, I am more worried about if others are ok
[08:46:48] <jynus>	 eg "ERROR ferm input drop default policy not set, ferm might not have been started correctly"
[08:47:17] <wikibugs>	 (CR) Arturo Borrero Gonzalez: openstack: clientpackages: fix missing deb repo installation (1 comment) [puppet] - https://gerrit.wikimedia.org/r/500797 (owner: Arturo Borrero Gonzalez)
[08:48:26] <wikibugs>	 (PS3) Arturo Borrero Gonzalez: openstack: clientpackages: fix missing deb repo installation [puppet] - https://gerrit.wikimedia.org/r/500797
[08:48:49] <mutante>	 jynus: gotcha, but even if ferm fails to restart for some reason, like failed DNS lookup, it doesn't mean that everything is now closed or open (anymore). had a case like that not too long ago
[08:49:38] <jynus>	 i don't know what was mw1338 thing, it seems ok now
[08:53:31] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:54:09] <wikibugs>	 (PS4) Ema: ATS: add ats-backend-restart [puppet] - https://gerrit.wikimedia.org/r/500675 (https://phabricator.wikimedia.org/T213263)
[08:56:08] <wikibugs>	 (PS4) Arturo Borrero Gonzalez: openstack: clientpackages: fix missing deb repo installation [puppet] - https://gerrit.wikimedia.org/r/500797
[08:57:09] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: clientpackages: fix missing deb repo installation [puppet] - https://gerrit.wikimedia.org/r/500797 (owner: Arturo Borrero Gonzalez)
[09:03:56] <wikibugs>	 (PS5) Arturo Borrero Gonzalez: openstack: clientpackages: fix missing deb repo installation [puppet] - https://gerrit.wikimedia.org/r/500797
[09:05:05] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: clientpackages: fix missing deb repo installation [puppet] - https://gerrit.wikimedia.org/r/500797 (owner: Arturo Borrero Gonzalez)
[09:08:12] <wikibugs>	 (PS1) Jcrespo: tendril: Open firewall only from tendril web to tendril db backend [puppet] - https://gerrit.wikimedia.org/r/500904
[09:08:51] <wikibugs>	 (PS3) Muehlenhoff: haproxy: Remove Ubuntu support [puppet] - https://gerrit.wikimedia.org/r/487895
[09:09:03] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[09:09:35] <wikibugs>	 (CR) jerkins-bot: [V: -1] tendril: Open firewall only from tendril web to tendril db backend [puppet] - https://gerrit.wikimedia.org/r/500904 (owner: Jcrespo)
[09:10:03] <wikibugs>	 (CR) Muehlenhoff: [C: +2] haproxy: Remove Ubuntu support [puppet] - https://gerrit.wikimedia.org/r/487895 (owner: Muehlenhoff)
[09:10:13] <wikibugs>	 (PS5) Ema: ATS: add ats-backend-restart [puppet] - https://gerrit.wikimedia.org/r/500675 (https://phabricator.wikimedia.org/T213263)
[09:11:58] <wikibugs>	 (PS6) Ema: ATS: add ats-backend-restart [puppet] - https://gerrit.wikimedia.org/r/500675 (https://phabricator.wikimedia.org/T213263)
[09:13:14] <wikibugs>	 (CR) Ema: [C: +2] ATS: add ats-backend-restart [puppet] - https://gerrit.wikimedia.org/r/500675 (https://phabricator.wikimedia.org/T213263) (owner: Ema)
[09:15:52] <wikibugs>	 (CR) Muehlenhoff: tendril: Open firewall only from tendril web to tendril db backend (1 comment) [puppet] - https://gerrit.wikimedia.org/r/500904 (owner: Jcrespo)
[09:17:18] <wikibugs>	 (PS1) Marostegui: Revert "db-eqiad,db-codfw.php: Depool s8 sanitarium masters" [mediawiki-config] - https://gerrit.wikimedia.org/r/500906
[09:17:32] <marostegui>	 jouncebot: next
[09:17:32] <jouncebot>	 In 1 hour(s) and 42 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190403T1100)
[09:18:47] <wikibugs>	 (PS2) Jcrespo: tendril: Open firewall only from tendril web to tendril db backend [puppet] - https://gerrit.wikimedia.org/r/500904
[09:23:44] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db-eqiad,db-codfw.php: Depool s8 sanitarium masters" [mediawiki-config] - https://gerrit.wikimedia.org/r/500906 (owner: Marostegui)
[09:24:51] <wikibugs>	 (Merged) jenkins-bot: Revert "db-eqiad,db-codfw.php: Depool s8 sanitarium masters" [mediawiki-config] - https://gerrit.wikimedia.org/r/500906 (owner: Marostegui)
[09:25:48] <wikibugs>	 (PS1) Volans: CHANGELOG: add changelogs for release v0.0.21 [software/spicerack] - https://gerrit.wikimedia.org/r/500907
[09:26:04] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool s8 sanitarium master (duration: 01m 00s)
[09:26:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:34] <wikibugs>	 (PS3) Jcrespo: tendril: Open firewall only from tendril web to tendril db backend [puppet] - https://gerrit.wikimedia.org/r/500904
[09:27:11] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool s8 sanitarium master (duration: 00m 56s)
[09:27:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:31] <marostegui>	 !log Drop wikishared.wikimedia_editor_tasks_entity_description_exists table from x1 T219963
[09:27:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:38] <stashbot>	 T219963: Drop wikishared.wikimedia_editor_tasks_entity_description_exists table from x1 - https://phabricator.wikimedia.org/T219963
[09:27:39] <wikibugs>	 (CR) Jcrespo: "Please give it a new look :-)" [puppet] - https://gerrit.wikimedia.org/r/500904 (owner: Jcrespo)
[09:29:22] <ema>	 !log cp-ats-codfw: test ATS rolling restart T213263
[09:29:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:26] <stashbot>	 T213263: Partial cache_upload traffic switchover to ATS and switchback to Varnish - https://phabricator.wikimedia.org/T213263
[09:29:33] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[09:30:26] <wikibugs>	 (CR) jenkins-bot: Revert "db-eqiad,db-codfw.php: Depool s8 sanitarium masters" [mediawiki-config] - https://gerrit.wikimedia.org/r/500906 (owner: Marostegui)
[09:30:47] <wikibugs>	 (CR) Jcrespo: "https://puppet-compiler.wmflabs.org/compiler1002/15517/" [puppet] - https://gerrit.wikimedia.org/r/500904 (owner: Jcrespo)
[09:33:04] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: designate: introduce openstack/debian split layout [puppet] - https://gerrit.wikimedia.org/r/500908
[09:33:12] <wikibugs>	 (CR) Volans: [C: +2] CHANGELOG: add changelogs for release v0.0.21 [software/spicerack] - https://gerrit.wikimedia.org/r/500907 (owner: Volans)
[09:33:49] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/500904 (owner: Jcrespo)
[09:36:07] <wikibugs>	 (PS1) Gehel: elasticsearch: only expose HTTP ports where needed. [puppet] - https://gerrit.wikimedia.org/r/500909
[09:36:17] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: openstack: designate: introduce openstack/debian split layout [puppet] - https://gerrit.wikimedia.org/r/500908
[09:37:41] <wikibugs>	 (CR) Jcrespo: "I am deploying this without much delay as this is technically broken right now and the scope of potential breakage is very small." [puppet] - https://gerrit.wikimedia.org/r/500904 (owner: Jcrespo)
[09:37:50] <wikibugs>	 (PS4) Jcrespo: tendril: Open firewall only from tendril web to tendril db backend [puppet] - https://gerrit.wikimedia.org/r/500904
[09:38:01] <wikibugs>	 (PS2) Gehel: elasticsearch: only expose HTTP ports where needed. [puppet] - https://gerrit.wikimedia.org/r/500909
[09:38:06] <wikibugs>	 (CR) Marostegui: [C: +1] tendril: Open firewall only from tendril web to tendril db backend [puppet] - https://gerrit.wikimedia.org/r/500904 (owner: Jcrespo)
[09:38:41] <wikibugs>	 (Merged) jenkins-bot: CHANGELOG: add changelogs for release v0.0.21 [software/spicerack] - https://gerrit.wikimedia.org/r/500907 (owner: Volans)
[09:38:48] <wikibugs>	 (CR) Gehel: "PCC looks good: https://puppet-compiler.wmflabs.org/compiler1002/15519/" [puppet] - https://gerrit.wikimedia.org/r/500909 (owner: Gehel)
[09:38:59] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] "PCC ok https://puppet-compiler.wmflabs.org/compiler1002/15518/" [puppet] - https://gerrit.wikimedia.org/r/500908 (owner: Arturo Borrero Gonzalez)
[09:39:54] <wikibugs>	 (CR) jenkins-bot: CHANGELOG: add changelogs for release v0.0.21 [software/spicerack] - https://gerrit.wikimedia.org/r/500907 (owner: Volans)
[09:43:19] <wikibugs>	 (PS1) Ladsgroup: Enable UrlShortener in mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500910 (https://phabricator.wikimedia.org/T108557)
[09:44:27] <wikibugs>	 (CR) Mathew.onipe: "PCC is Ok: https://puppet-compiler.wmflabs.org/compiler1002/15520/cloudelastic1001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/500909 (owner: Gehel)
[09:44:46] <wikibugs>	 (CR) Mathew.onipe: [C: +1] elasticsearch: only expose HTTP ports where needed. [puppet] - https://gerrit.wikimedia.org/r/500909 (owner: Gehel)
[09:45:36] <moritzm>	 !log removed labtestnet2003.codfw.wmnet from debmonitor (T219776)
[09:45:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:39] <stashbot>	 T219776: labtestnet2003.codfw.wmnet: rename to cloudnet2003-dev.codfw.wmnet and reimage to stretch - https://phabricator.wikimedia.org/T219776
[09:46:38] <wikibugs>	 (PS1) Volans: Upstream release v0.0.21 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/500912
[09:47:03] <wikibugs>	 (PS1) Dzahn: confd: remove upstart support [puppet] - https://gerrit.wikimedia.org/r/500913
[09:47:17] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[09:47:41] <wikibugs>	 (CR) Dzahn: [C: +1] "after https://gerrit.wikimedia.org/r/c/operations/puppet/+/500913" [puppet] - https://gerrit.wikimedia.org/r/456317 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[09:47:54] <wikibugs>	 (CR) Gehel: [C: +2] elasticsearch: only expose HTTP ports where needed. [puppet] - https://gerrit.wikimedia.org/r/500909 (owner: Gehel)
[09:48:02] <wikibugs>	 (PS3) Gehel: elasticsearch: only expose HTTP ports where needed. [puppet] - https://gerrit.wikimedia.org/r/500909
[09:48:06] <wikibugs>	 (PS5) Jcrespo: tendril: Open firewall only from tendril web to tendril db backend [puppet] - https://gerrit.wikimedia.org/r/500904
[09:49:27] <wikibugs>	 (CR) Jcrespo: [C: +2] tendril: Open firewall only from tendril web to tendril db backend [puppet] - https://gerrit.wikimedia.org/r/500904 (owner: Jcrespo)
[09:49:57] <wikibugs>	 (PS4) Gehel: elasticsearch: only expose HTTP ports where needed. [puppet] - https://gerrit.wikimedia.org/r/500909
[09:51:39] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good, none of the servers matched by "cumin C:confd" are running trusty." [puppet] - https://gerrit.wikimedia.org/r/500913 (owner: Dzahn)
[09:52:54] <wikibugs>	 (CR) Dzahn: "thanks. after this also https://gerrit.wikimedia.org/r/c/operations/puppet/+/456317" [puppet] - https://gerrit.wikimedia.org/r/500913 (owner: Dzahn)
[09:53:38] <wikibugs>	 (CR) Volans: [C: +2] Upstream release v0.0.21 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/500912 (owner: Volans)
[09:54:09] <mutante>	 !log running mysql select queries on m3-slave to get data from phabricator conpherence as requested by andre
[09:54:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:26] <moritzm>	 !log upgrading beta to hhvm wikidiff 1.8.1 (T203069)
[09:55:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:30] <stashbot>	 T203069: Deploy wikidiff2 v1.8.1 with changed signature - https://phabricator.wikimedia.org/T203069
[09:56:19] <wikibugs>	 (PS4) Jbond: jbond home: add user files [puppet] - https://gerrit.wikimedia.org/r/500739
[09:56:30] <marostegui>	 !log Alter empty job table on s6 primary master - T219887
[09:56:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:56:34] <stashbot>	 T219887: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887
[09:58:17] <wikibugs>	 (PS1) Gehel: elasticsearch: restrict access to cloudelastic to only cumin masters [puppet] - https://gerrit.wikimedia.org/r/500914
[09:59:28] <wikibugs>	 (PS2) Gehel: elasticsearch: restrict access to cloudelastic to only cumin masters [puppet] - https://gerrit.wikimedia.org/r/500914
[09:59:43] <wikibugs>	 (CR) Mathew.onipe: [C: +1] elasticsearch: restrict access to cloudelastic to only cumin masters [puppet] - https://gerrit.wikimedia.org/r/500914 (owner: Gehel)
[10:00:30] <wikibugs>	 (CR) Gehel: [C: +2] elasticsearch: restrict access to cloudelastic to only cumin masters [puppet] - https://gerrit.wikimedia.org/r/500914 (owner: Gehel)
[10:03:25] <wikibugs>	 (Merged) jenkins-bot: Upstream release v0.0.21 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/500912 (owner: Volans)
[10:07:42] <Leaderboard>	 Getting database query errors
[10:07:45] <Leaderboard>	 Import failed: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? Query: INSERT IGNORE INTO `page` (page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_latest,page_len) VALUES ('108','List_of_Australian_AM_radio_stations','','0','1','0.896837776405','20190403100623','0','0') Function: WikiPage::insertOn Error: 
[10:07:45] <Leaderboard>	 1205 Lock wait timeout exceeded; try restarting transaction (10.64.32.136) 
[10:08:02] <marostegui>	 where is that?
[10:08:14] <marostegui>	 is it happening all the time or just once?
[10:08:45] <Leaderboard>	 Just happened twice now while trying to import
[10:08:49] <marostegui>	 so far I have only seen that same error
[10:09:18] <marostegui>	 Leaderboard: that is the same as: https://phabricator.wikimedia.org/T219702 I think
[10:09:51] <Leaderboard>	 Looks like that, will add my case to it. Thanks
[10:09:54] <marostegui>	 thanks
[10:10:28] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: drop serverpackages profile [puppet] - https://gerrit.wikimedia.org/r/500916
[10:14:25] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] "pcc happy (mostly NOOP) https://puppet-compiler.wmflabs.org/compiler1002/15521/" [puppet] - https://gerrit.wikimedia.org/r/500916 (owner: Arturo Borrero Gonzalez)
[10:15:50] <wikibugs>	 (PS5) Jbond: jbond home: add user files [puppet] - https://gerrit.wikimedia.org/r/500739
[10:17:37] <volans>	 !log uploaded spicerack_0.0.21-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
[10:17:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:50] <wikibugs>	 (CR) GTirloni: [C: +1] Add python 3.5 and nodejs 10 types [software/tools-webservice] - https://gerrit.wikimedia.org/r/496265 (owner: BryanDavis)
[10:18:18] <wikibugs>	 (PS2) Dzahn: confd: remove upstart support [puppet] - https://gerrit.wikimedia.org/r/500913
[10:18:41] <wikibugs>	 (PS6) Arturo Borrero Gonzalez: openstack: clientpackages: fix missing deb repo installation [puppet] - https://gerrit.wikimedia.org/r/500797
[10:19:54] <volans>	 !log upgraded spicerack to 0.0.21 on cumin[12]001
[10:19:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:03] <volans>	 gehel, onimisionipe FYI ^^^ ;)
[10:22:51] <onimisionipe>	 nice!
[10:23:40] <wikibugs>	 (PS7) Arturo Borrero Gonzalez: openstack: clientpackages: fix missing deb repo installation [puppet] - https://gerrit.wikimedia.org/r/500797
[10:25:36] <arturo>	 !log updating puppet compiler facts
[10:25:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:45] <wikibugs>	 (PS1) Elukey: profile::aqs: add the analytics contact group to aqs's alarm [puppet] - https://gerrit.wikimedia.org/r/500917
[10:27:30] <wikibugs>	 (CR) Jbond: [C: +2] jbond home: add user files [puppet] - https://gerrit.wikimedia.org/r/500739 (owner: Jbond)
[10:27:39] <wikibugs>	 (PS6) Jbond: jbond home: add user files [puppet] - https://gerrit.wikimedia.org/r/500739
[10:27:48] <mutante>	 !log planet1001/2001 - upgrade apache2, openssh, locales, rsyslog ..
[10:27:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:28] <mutante>	 !log planet1001/2001 - apt autoremove un-required packages
[10:29:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:30:14] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: -1] "Aside from the comment below, I also recommend against deploying this feature until T219974 is resolved." (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/500910 (https://phabricator.wikimedia.org/T108557) (owner: Ladsgroup)
[10:31:21] <wikibugs>	 (PS2) Elukey: profile::aqs: add the analytics contact group to aqs's alarms [puppet] - https://gerrit.wikimedia.org/r/500917
[10:31:53] <wikibugs>	 (PS8) Arturo Borrero Gonzalez: openstack: clientpackages: fix missing deb repo installation [puppet] - https://gerrit.wikimedia.org/r/500797
[10:32:51] <wikibugs>	 (CR) Dzahn: profile::aqs: add the analytics contact group to aqs's alarms (1 comment) [puppet] - https://gerrit.wikimedia.org/r/500917 (owner: Elukey)
[10:36:29] <icinga-wm>	 PROBLEM - puppet last run on cp5006 is CRITICAL: CRITICAL: Puppet has 5 failures. Last run 3 minutes ago with 5 failures. Failed resources (up to 3 shown): File[/home/jbond/.gitconfig],File[/home/jbond/.vim/autoload/pathogen.vim],File[/home/jbond/.vim/bundle/README.md],File[/home/jbond/.vimrc]
[10:36:54] <elukey>	 thanks mutante !
[10:36:59] <wikibugs>	 Operations: Add support for temporary chroots to boron - https://phabricator.wikimedia.org/T219977 (ema)
[10:37:05] <wikibugs>	 Operations: Add support for temporary chroots to boron - https://phabricator.wikimedia.org/T219977 (ema) p:Triage→Normal
[10:37:13] <mutante>	 elukey: :) yw, enjoy lunch
[10:37:14] <wikibugs>	 (PS3) Elukey: profile::aqs: add the analytics contact group to aqs's alarms [puppet] - https://gerrit.wikimedia.org/r/500917
[10:37:29] <icinga-wm>	 PROBLEM - puppet last run on phab2001 is CRITICAL: CRITICAL: Puppet has 5 failures. Last run 5 minutes ago with 5 failures. Failed resources (up to 3 shown): File[/home/jbond/.gitconfig],File[/home/jbond/.vim/autoload/pathogen.vim],File[/home/jbond/.vim/bundle/README.md],File[/home/jbond/.vimrc]
[10:38:27] <mutante>	 i bet that's just a race and phab2001 is random
[10:38:34] <mutante>	 since home dir files change it on everything
[10:39:03] <icinga-wm>	 PROBLEM - puppet last run on labsdb1009 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/home/jbond/.vim/bundle/README.md],File[/home/jbond/.zshrc]
[10:39:18] <mutante>	 yea, confirmed on phab2001.. no issue
[10:39:27] <icinga-wm>	 PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 7 minutes ago with 4 failures. Failed resources (up to 3 shown): File[/home/jbond/.vim/autoload/pathogen.vim],File[/home/jbond/.vim/bundle/README.md],File[/home/jbond/.vimrc],File[/home/jbond/.zshenv]
[10:39:45] <icinga-wm>	 PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 7 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/home/jbond/.vim/bundle/README.md],File[/home/jbond/.vimrc]
[10:39:45] <icinga-wm>	 PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 7 minutes ago with 3 failures. Failed resources (up to 3 shown): File[/home/jbond/.gitconfig],File[/home/jbond/.vimrc],File[/home/jbond/.zshrc]
[10:40:43] <mutante>	 running puppet on all of that
[10:41:16] <wikibugs>	 (PS2) Ladsgroup: Enable UrlShortener in mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500910 (https://phabricator.wikimedia.org/T108557)
[10:41:32] <jbond42>	 sorry reverting
[10:41:52] <wikibugs>	 (PS1) Jbond: Revert "jbond home: add user files" [puppet] - https://gerrit.wikimedia.org/r/500919
[10:42:03] <wikibugs>	 (CR) jerkins-bot: [V: -1] Enable UrlShortener in mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500910 (https://phabricator.wikimedia.org/T108557) (owner: Ladsgroup)
[10:42:47] <icinga-wm>	 RECOVERY - puppet last run on phab2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:42:49] <mutante>	 jbond42: why revert. it's fine
[10:42:59] <mutante>	 nothing to worry
[10:43:09] <mutante>	 it's just a matter of scale and timing
[10:43:21] <jbond42>	 oh ok i just saw problems and my name 
[10:43:36] <mutante>	 a change that touches everything just gets like 5 out of 1000 
[10:43:43] <mutante>	 that happen to run during the check or so
[10:43:50] <mutante>	 they were all fine after next run
[10:43:57] <jbond42>	 ahh ok thanks
[10:44:04] <mutante>	 no worries
[10:44:19] <icinga-wm>	 RECOVERY - puppet last run on labsdb1009 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[10:44:21] <wikibugs>	 (Abandoned) Jbond: Revert "jbond home: add user files" [puppet] - https://gerrit.wikimedia.org/r/500919 (owner: Jbond)
[10:44:41] <icinga-wm>	 RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:45:01] <icinga-wm>	 RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:45:01] <icinga-wm>	 RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:46:24] <wikibugs>	 (PS9) Arturo Borrero Gonzalez: openstack: clientpackages: fix missing deb repo installation [puppet] - https://gerrit.wikimedia.org/r/500797
[10:49:07] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[10:49:41] <wikibugs>	 (PS3) Ladsgroup: Enable UrlShortener in mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500910 (https://phabricator.wikimedia.org/T108557)
[10:50:26] <wikibugs>	 (PS10) Arturo Borrero Gonzalez: openstack: clientpackages: fix missing deb repo installation [puppet] - https://gerrit.wikimedia.org/r/500797
[10:53:31] <wikibugs>	 Operations, Phabricator, Traffic: Make phame cacheable - https://phabricator.wikimedia.org/T219978 (ema)
[10:53:39] <wikibugs>	 Operations, Phabricator, Traffic: Make phame cacheable - https://phabricator.wikimedia.org/T219978 (ema) p:Triage→Normal
[10:54:07] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] "I'm finally happy with the resulting catalogs: https://puppet-compiler.wmflabs.org/compiler1002/15527/" [puppet] - https://gerrit.wikimedia.org/r/500797 (owner: Arturo Borrero Gonzalez)
[10:54:51] <wikibugs>	 (CR) Alex Monk: "I don't mind which person does it, whichever is most convenient for you two. I think Andrew is planning to when he has time." [puppet] - https://gerrit.wikimedia.org/r/500825 (owner: Alex Monk)
[11:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190403T1100).
[11:00:04] <jouncebot>	 Tulsi and Amir1: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:12] <Amir1>	 o/
[11:00:40] <zeljkof>	 Amir1: could you please also deploy Tulsi's patch? (if they are around)
[11:00:45] <wikibugs>	 Operations: Add support for temporary chroots to boron - https://phabricator.wikimedia.org/T219977 (ema)
[11:00:56] <Amir1>	 yeah sure, let's see if they are around
[11:01:30] <zeljkof>	 Amir1: swat is yours then, start with your patch, continue with Tulsi's
[11:01:42] <Amir1>	 yess
[11:02:51] <icinga-wm>	 RECOVERY - puppet last run on cp5006 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[11:03:54] <wikibugs>	 (CR) Ladsgroup: [C: +2] "SWAT" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/500910 (https://phabricator.wikimedia.org/T108557) (owner: Ladsgroup)
[11:05:02] <wikibugs>	 (Merged) jenkins-bot: Enable UrlShortener in mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500910 (https://phabricator.wikimedia.org/T108557) (owner: Ladsgroup)
[11:06:48] <akosiaris>	 XioNoX: they should be investigated, we can downtime for a few days though
[11:07:53] <wikibugs>	 (PS1) Jbond: jbond_home: use emacs bindkeys [puppet] - https://gerrit.wikimedia.org/r/500923
[11:09:37] <wikibugs>	 (CR) Jbond: [C: +2] jbond_home: use emacs bindkeys [puppet] - https://gerrit.wikimedia.org/r/500923 (owner: Jbond)
[11:09:48] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: Revert "openstack: clientpackages: fix missing deb repo installation" [puppet] - https://gerrit.wikimedia.org/r/500924
[11:09:50] <wikibugs>	 (CR) jenkins-bot: Enable UrlShortener in mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500910 (https://phabricator.wikimedia.org/T108557) (owner: Ladsgroup)
[11:11:06] <wikibugs>	 (CR) jerkins-bot: [V: -1] Revert "openstack: clientpackages: fix missing deb repo installation" [puppet] - https://gerrit.wikimedia.org/r/500924 (owner: Arturo Borrero Gonzalez)
[11:12:06] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: Revert "openstack: clientpackages: fix missing deb repo installation" [puppet] - https://gerrit.wikimedia.org/r/500924
[11:13:39] <wikibugs>	 (PS3) Arturo Borrero Gonzalez: Revert "openstack: clientpackages: fix missing deb repo installation" [puppet] - https://gerrit.wikimedia.org/r/500924
[11:14:55] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] Revert "openstack: clientpackages: fix missing deb repo installation" [puppet] - https://gerrit.wikimedia.org/r/500924 (owner: Arturo Borrero Gonzalez)
[11:16:13] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[11:16:32] <jbond42>	 !rolling security updates for apache
[11:16:36] <jbond42>	 !log rolling security updates for apache
[11:16:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:19:30] <Amir1>	 okay, it works for example w.wiki/6
[11:20:30] <wikibugs>	 (PS1) Ladsgroup: Revert "Enable UrlShortener in mediawikiwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/500926
[11:21:53] <wikibugs>	 (CR) Ladsgroup: [C: +2] Revert "Enable UrlShortener in mediawikiwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/500926 (owner: Ladsgroup)
[11:23:04] <wikibugs>	 (Merged) jenkins-bot: Revert "Enable UrlShortener in mediawikiwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/500926 (owner: Ladsgroup)
[11:25:20] <Amir1>	 !log EU SWAT is done
[11:25:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:25:50] <Amir1>	 zeljkof: I put all of the Easter eggs. like w.wiki/e (ta-duh)
[11:26:55] <wikibugs>	 (PS1) Volans: check_icinga: add configuration validator [software/external-monitoring] - https://gerrit.wikimedia.org/r/500927
[11:28:23] <wikibugs>	 (CR) Volans: "The plan is to add a symlink on the wikitech-static host like we have now for the check_icinga and tell people to run it when modifying th" [software/external-monitoring] - https://gerrit.wikimedia.org/r/500927 (owner: Volans)
[11:30:17] <icinga-wm>	 PROBLEM - Check systemd state on webperf2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[11:30:21] <icinga-wm>	 PROBLEM - HTTP-dbtree on dbmonitor2001 is CRITICAL: connect to address 208.80.153.52 and port 80: Connection refused https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[11:30:27] <icinga-wm>	 PROBLEM - Check systemd state on dbmonitor2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[11:30:37] <wikibugs>	 Operations, Analytics, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (Miriam) Thanks @EBernhardson and all!!. Would a CNN finetuning task, u...
[11:30:39] <icinga-wm>	 PROBLEM - puppet last run on mc2031 is CRITICAL: CRITICAL: Puppet has 12 failures. Last run 3 minutes ago with 12 failures. Failed resources (up to 3 shown)
[11:32:05] <icinga-wm>	 PROBLEM - puppet last run on maps1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:32:07] <icinga-wm>	 PROBLEM - puppet last run on mw2258 is CRITICAL: CRITICAL: Puppet has 80 failures. Last run 5 minutes ago with 80 failures. Failed resources (up to 3 shown)
[11:32:18] <wikibugs>	 (CR) jenkins-bot: Revert "Enable UrlShortener in mediawikiwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/500926 (owner: Ladsgroup)
[11:32:27] <icinga-wm>	 PROBLEM - puppet last run on elastic1026 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown)
[11:32:31] <icinga-wm>	 PROBLEM - puppet last run on mw2195 is CRITICAL: CRITICAL: Puppet has 35 failures. Last run 5 minutes ago with 35 failures. Failed resources (up to 3 shown): File[/var/lib/hphpd/hphpd.ini],File[/usr/local/bin/mwrepl],File[/etc/logrotate.d/mediawiki_apache],File[/etc/rsyslog.lookup.d/lookup_table_output.json]
[11:32:43] <icinga-wm>	 PROBLEM - puppet last run on mw2248 is CRITICAL: CRITICAL: Puppet has 37 failures. Last run 5 minutes ago with 37 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP],File[/etc/tmpreaper.conf],File[/usr/local/bin/cgroup-mediawiki-clean],File[/etc/ImageMagick-6/policy.xml]
[11:32:45] <icinga-wm>	 PROBLEM - puppet last run on cp5008 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/home/krinkle],File[/home/gilles]
[11:33:11] <icinga-wm>	 PROBLEM - puppet last run on an-worker1079 is CRITICAL: CRITICAL: Puppet has 40 failures. Last run 4 minutes ago with 40 failures. Failed resources (up to 3 shown): File[/home/filippo],File[/home/jgreen],File[/home/bblack],File[/home/andrew]
[11:33:15] <icinga-wm>	 PROBLEM - puppet last run on mw2165 is CRITICAL: CRITICAL: Puppet has 71 failures. Last run 6 minutes ago with 71 failures. Failed resources (up to 3 shown)
[11:33:35] <icinga-wm>	 PROBLEM - puppet last run on db1073 is CRITICAL: CRITICAL: Puppet has 11 failures. Last run 4 minutes ago with 11 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt2xml],File[/usr/local/bin/puppet-enabled],File[/usr/local/bin/prometheus-puppet-agent-stats],File[/etc/rsyslog.d]
[11:34:27] <icinga-wm>	 PROBLEM - Check systemd state on webperf1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[11:34:35] <icinga-wm>	 PROBLEM - puppet last run on cp3041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:34:44] <Zppix>	 Expected? ^
[11:35:11] <icinga-wm>	 PROBLEM - puppet last run on aqs1006 is CRITICAL: CRITICAL: Puppet has 31 failures. Last run 6 minutes ago with 31 failures. Failed resources (up to 3 shown): File[/etc/vim/vimrc.local],File[/usr/local/bin/phaste],File[/root/.screenrc],File[/usr/local/lib/nagios/plugins/]
[11:35:33] <icinga-wm>	 PROBLEM - puppet last run on cloudvirtan1004 is CRITICAL: CRITICAL: Puppet has 33 failures. Last run 6 minutes ago with 33 failures. Failed resources (up to 3 shown): File[/home/filippo],File[/home/jgreen],File[/home/bblack],File[/home/andrew]
[11:35:45] <icinga-wm>	 PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:35:55] <icinga-wm>	 RECOVERY - puppet last run on mc2031 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[11:36:03] <icinga-wm>	 PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Puppet has 13 failures. Last run 7 minutes ago with 13 failures. Failed resources (up to 3 shown): File[/usr/local/lib/nagios/plugins/check_systemd_state],File[/usr/local/lib/nagios/plugins/check_long_procs],File[/etc/smartmontools/run.d/20logger],File[/usr/lib/nagios/plugins/check_timedatectl]
[11:36:05] <jbond42>	 ^^ checking, however i think this may have occurred as i updated apache without first disabling puppet agent 
[11:36:19] <jbond42>	 sorry for the noise, everything i have tested so far is working
[11:36:39] <icinga-wm>	 PROBLEM - puppet last run on analytics1065 is CRITICAL: CRITICAL: Puppet has 8 failures. Last run 7 minutes ago with 8 failures. Failed resources (up to 3 shown): File[/usr/local/share/ca-certificates/RapidSSL_SHA256_CA_-_G3.crt],File[/usr/local/share/ca-certificates/DigiCert_High_Assurance_CA-3.crt],File[/usr/local/share/ca-certificates/DigiCert_SHA2_High_Assurance_Server_CA.crt],File[/usr/local/share/ca-certificates/GlobalSign_
[11:36:39] <icinga-wm>	 dation_CA_-_SHA256_-_G2.crt]
[11:36:55] <moritzm>	 you mean upgrading apache on one of the puppet masters?
[11:37:00] <Zppix>	 jbond42: hey we know icinga works now atleast lol
[11:37:30] <moritzm>	 if so, yes, that is the typical amount of puppet spam caused by the apache upgrade window
[11:37:49] <jbond42>	 moritzm: erm no, i didn't think these would be so noisy :S
[11:37:59] <icinga-wm>	 RECOVERY - puppet last run on mw2248 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[11:38:37] <moritzm>	 puppetdb restarts are even worse :-)
[11:39:29] <Zppix>	 moritzm: to be fair icinga is bound to complain regardless what you do :P
[11:40:32] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] "Let's err on the side of caution and wait for filippo's +1 as well" [puppet] - https://gerrit.wikimedia.org/r/496872 (https://phabricator.wikimedia.org/T204245) (owner: Mobrovac)
[11:40:37] <jbond42>	 fyi the alert on webperf1002 is genuine https://phabricator.wikimedia.org/P8334
[11:41:01] <icinga-wm>	 RECOVERY - puppet last run on cp3044 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[11:41:17] <icinga-wm>	 PROBLEM - puppet last run on webperf1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Service[apache2]
[11:42:11] <moritzm>	 looking at webperf1002
[11:42:29] <wikibugs>	 (PS1) Ladsgroup: wikilabels: update alias for database [puppet] - https://gerrit.wikimedia.org/r/500928 (https://phabricator.wikimedia.org/T219563)
[11:50:18] <wikibugs>	 (PS1) Muehlenhoff: xhgui: Properly include passwords [puppet] - https://gerrit.wikimedia.org/r/500930
[11:50:25] <wikibugs>	 (PS1) Jbond: webperf1002: include passwords class [puppet] - https://gerrit.wikimedia.org/r/500931
[11:50:36] <jbond42>	 lol moritzm seems you beat me to it
[11:51:02] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/500930 (owner: Muehlenhoff)
[11:51:15] <icinga-wm>	 PROBLEM - puppet last run on webperf2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[apache2]
[11:51:31] <wikibugs>	 (Abandoned) Jbond: webperf1002: include passwords class [puppet] - https://gerrit.wikimedia.org/r/500931 (owner: Jbond)
[11:51:40] <moritzm>	 great minds think alike :-)
[11:51:45] <jbond42>	 :)
[11:51:57] <wikibugs>	 (PS2) Muehlenhoff: xhgui: Properly include passwords [puppet] - https://gerrit.wikimedia.org/r/500930
[11:52:40] <jbond42>	 moritzm: looks like this error has been around for a while, is it possible it's only started to trigger because of the apache upgrade?
[11:53:01] <moritzm>	 yeah, it was introduced on 5th of March
[11:53:24] <moritzm>	 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/494425/
[11:53:32] <moritzm>	 actually, 13th, when it was merged
[11:53:57] <wikibugs>	 (CR) Muehlenhoff: [C: +2] xhgui: Properly include passwords [puppet] - https://gerrit.wikimedia.org/r/500930 (owner: Muehlenhoff)
[11:55:17] <jbond42>	 moritzm: ahh, the notify to apache just does a reload, so it probably errored once and never reloaded its config until just now, when it was restarted with the upgrade 
[11:55:56] <moritzm>	 ack, yes
[11:56:12] <Amir1>	 w.wiki/$ -> donate.wikimedia.org
[11:58:31] <icinga-wm>	 RECOVERY - puppet last run on mw2258 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:58:55] <icinga-wm>	 RECOVERY - puppet last run on mw2195 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:59:09] <icinga-wm>	 RECOVERY - puppet last run on cp5008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[11:59:21] <Tulsi>	 Hello can someone deploy https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/499530/
[11:59:35] <icinga-wm>	 RECOVERY - puppet last run on an-worker1079 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:59:39] <icinga-wm>	 RECOVERY - puppet last run on mw2165 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[11:59:43] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[11:59:59] <icinga-wm>	 RECOVERY - puppet last run on db1073 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[12:00:05] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190403T1200)
[12:00:19] <wikibugs>	 (PS1) Muehlenhoff: Remove passwords::ldap::production from role::webperf::profiling_tools [puppet] - https://gerrit.wikimedia.org/r/500932
[12:00:58] <Amir1>	 Tulsi: the SWAT is over, we waited for you
[12:01:01] <icinga-wm>	 RECOVERY - puppet last run on cp3041 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:01:15] <Amir1>	 be around in the time of deployment
[12:01:37] <icinga-wm>	 RECOVERY - puppet last run on aqs1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[12:01:57] <icinga-wm>	 RECOVERY - puppet last run on cloudvirtan1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[12:02:11] <Tulsi>	 Amir1: :/
[12:02:21] <icinga-wm>	 PROBLEM - puppet last run on dbmonitor2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[apache2]
[12:02:27] <icinga-wm>	 RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[12:02:33] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/500932 (owner: Muehlenhoff)
[12:03:05] <icinga-wm>	 RECOVERY - puppet last run on analytics1065 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:03:49] <icinga-wm>	 RECOVERY - puppet last run on maps1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:04:05] <icinga-wm>	 RECOVERY - puppet last run on elastic1026 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:05:35] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:07:02] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove passwords::ldap::production from role::webperf::profiling_tools [puppet] - https://gerrit.wikimedia.org/r/500932 (owner: Muehlenhoff)
[12:09:19] <icinga-wm>	 RECOVERY - Check systemd state on webperf1002 is OK: OK - running: The system is fully operational
[12:13:01] <icinga-wm>	 RECOVERY - puppet last run on webperf1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[12:15:49] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2002 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:17:39] <icinga-wm>	 RECOVERY - puppet last run on webperf2002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[12:18:15] <icinga-wm>	 RECOVERY - Check systemd state on webperf2002 is OK: OK - running: The system is fully operational
[12:18:43] <mutante>	 thanks for the fix moritz
[12:18:48] <mutante>	 re: webperf xhgui
[12:22:13] <moritzm>	 sure, yw :-)
[12:23:45] <wikibugs>	 (CR) CDanis: [C: +1] uwsgi: allow setting routing rules [puppet] - https://gerrit.wikimedia.org/r/500729 (owner: Giuseppe Lavagetto)
[12:25:00] <wikibugs>	 (CR) CDanis: [C: +1] graphite: correctly set Cache-control: no-store [puppet] - https://gerrit.wikimedia.org/r/500730 (owner: Giuseppe Lavagetto)
[12:26:21] <icinga-wm>	 PROBLEM - puppet last run on francium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:26:55] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[12:31:15] <icinga-wm>	 PROBLEM - puppet last run on ms-be1027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:31:18] <wikibugs>	 (PS1) Mathew.onipe: acme_chief: Issue a certificate for cloudelastic100[1-4].wm.o [puppet] - https://gerrit.wikimedia.org/r/500940 (https://phabricator.wikimedia.org/T214921)
[12:31:26] <mutante>	 !log restarting gerrit service to apply change 498431
[12:31:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:46] <mutante>	 gerrit is restarting.. be back momentarily
[12:31:51] <onimisionipe>	 ok
[12:33:39] <mutante>	 onimisionipe: it's back. sorry about the interruption. tried to find a quiet window 
[12:33:54] <onimisionipe>	 mutante: Oh... no p!
[12:34:45] <wikibugs>	 (PS2) Mathew.onipe: acme_chief: Issue a certificate for cloudelastic100[1-4].wm.o [puppet] - https://gerrit.wikimedia.org/r/500940 (https://phabricator.wikimedia.org/T214921)
[12:34:53] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: codfw1dev: don't include clientpackages in cloudcontrol servers [puppet] - https://gerrit.wikimedia.org/r/500946 (https://phabricator.wikimedia.org/T219981)
[12:35:01] <icinga-wm>	 PROBLEM - puppet last run on kafka2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[12:35:25] <mutante>	 ^ caused by gerrit restart. fixed by puppet. no issue
[12:36:55] <icinga-wm>	 PROBLEM - puppet last run on labsdb1011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config]
[12:37:19] <wikibugs>	 (03CR) 10CDanis: [C: 03+1] "LGTM with one nit" (031 comment) [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/500927 (owner: 10Volans)
[12:37:57] <icinga-wm>	 PROBLEM - puppet last run on labsdb1009 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config]
[12:38:11] <mutante>	 running puppet on those too
[12:38:31] <icinga-wm>	 PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_jenkins CI slave scripts]
[12:38:37] <cdanis>	 does anyone else have the sense that in the past few weeks we're seeing a fair bit more puppet failures due to 'Catalog fetch fail'?
[12:38:45] <icinga-wm>	 PROBLEM - puppet last run on neodymium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/cookbooks]
[12:38:53] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: codfw1dev: don't include clientpackages in cloudcontrol servers [puppet] - 10https://gerrit.wikimedia.org/r/500946 (https://phabricator.wikimedia.org/T219981) (owner: 10Arturo Borrero Gonzalez)
[12:39:29] <icinga-wm>	 PROBLEM - puppet last run on labsdb1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config]
[12:40:07] <icinga-wm>	 PROBLEM - puppet last run on releases1001 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 7 minutes ago with 3 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/tools/release],Exec[git_pull_operations/deployment-charts],Exec[git_pull_jenkins CI Composer]
[12:40:19] <icinga-wm>	 RECOVERY - puppet last run on kafka2002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:42:11] <icinga-wm>	 RECOVERY - puppet last run on labsdb1011 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[12:42:15] <wikibugs>	 (03CR) 10Volans: check_icinga: add configuration validator (031 comment) [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/500927 (owner: 10Volans)
[12:42:29] <arturo>	 !log T219626 reimaging cloudcontrol2001-dev
[12:42:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:42:33] <stashbot>	 T219626: codfw1dev: bootstrap cloudcontrol servers in mitaka/stretch - https://phabricator.wikimedia.org/T219626
[12:43:10] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/15528/aqs1007.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/500917 (owner: 10Elukey)
[12:43:15] <icinga-wm>	 RECOVERY - puppet last run on labsdb1009 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:43:25] <wikibugs>	 (03PS4) 10Elukey: profile::aqs: add the analytics contact group to aqs's alarms [puppet] - 10https://gerrit.wikimedia.org/r/500917
[12:43:49] <icinga-wm>	 RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:44:47] <icinga-wm>	 RECOVERY - puppet last run on labsdb1010 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[12:44:51] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:45:23] <icinga-wm>	 RECOVERY - puppet last run on releases1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:45:30] <logmsgbot>	 !log gehel@cumin2001 START - Cookbook sre.elasticsearch.rolling-reboot
[12:45:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:44] <wikibugs>	 (03PS1) 10Elukey: aqs: better handling of contact groups [puppet] - 10https://gerrit.wikimedia.org/r/500949
[12:49:49] <logmsgbot>	 !log gehel@cumin2001 END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
[12:49:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:51:15] <wikibugs>	 (03CR) 10Dzahn: "gerrit service restarted to apply this" [puppet] - 10https://gerrit.wikimedia.org/r/498431 (owner: 10Hashar)
[12:51:35] <wikibugs>	 (03PS2) 10Elukey: aqs: better handling of contact groups [puppet] - 10https://gerrit.wikimedia.org/r/500949
[12:52:49] <icinga-wm>	 RECOVERY - puppet last run on francium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[12:55:50] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: wait for host up after reboot [cookbooks] - 10https://gerrit.wikimedia.org/r/500952
[12:56:26] <wikibugs>	 10Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic: Shortened URLs won't redirect when there's data - https://phabricator.wikimedia.org/T219986 (10Ladsgroup) I think there's something with https://github.com/wikimedia/puppet/blob/production/modules/varnish/templates/text-backend.inc.vcl.erb but my...
[12:56:35] <icinga-wm>	 PROBLEM - puppet last run on kafka1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:58:46] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:59:42] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] elasticsearch: wait for host up after reboot [cookbooks] - 10https://gerrit.wikimedia.org/r/500952 (owner: 10Gehel)
[13:00:02] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] elasticsearch: wait for host up after reboot [cookbooks] - 10https://gerrit.wikimedia.org/r/500952 (owner: 10Gehel)
[13:00:04] <jouncebot>	 Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190403T1300)
[13:01:16] <logmsgbot>	 !log gehel@cumin2001 START - Cookbook sre.elasticsearch.rolling-reboot
[13:01:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:46] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[13:02:10] <wikibugs>	 10Operations, 10Domains, 10Traffic, 10serviceops: contact Wikivoyage e. V. and figure out status of wikivoyage-old.org / fix or park broken domain - https://phabricator.wikimedia.org/T219867 (10Dzahn) p:05Triage→03Normal `  cat wikivoyage-old.org  ; vim: set expandtab:smarttab @           1D  IN SOA  n...
[13:02:41] <logmsgbot>	 !log gehel@cumin2001 END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
[13:02:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:04] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/15531/aqs1007.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/500949 (owner: 10Elukey)
[13:03:11] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] aqs: better handling of contact groups [puppet] - 10https://gerrit.wikimedia.org/r/500949 (owner: 10Elukey)
[13:03:16] <wikibugs>	 10Operations, 10Domains, 10Traffic, 10serviceops: contact Wikivoyage e. V. and figure out status of wikivoyage-old.org / fix or park broken domain - https://phabricator.wikimedia.org/T219867 (10Dzahn)
[13:03:42] <wikibugs>	 10Operations: DNS for wikivoyage-old.org - https://phabricator.wikimedia.org/T81727 (10Dzahn)
[13:04:15] <wikibugs>	 10Operations: wikivoyage migration (tracking) - https://phabricator.wikimedia.org/T81583 (10Dzahn)
[13:04:40] <icinga-wm>	 RECOVERY - puppet last run on neodymium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[13:05:12] <mutante>	 making old tickets public that used to be NDA, but for no reason except being RT and out of caution
[13:05:33] <mutante>	 that's why you might see something ancient pop up in feed
[13:07:56] <mutante>	 it doesn't mean they are new issues, it's for transparency and linkability.. and comes from the ticket system before phab
[13:08:32] <icinga-wm>	 RECOVERY - puppet last run on ms-be1027 is OK: OK: Puppet is currently enabled, last run 12 minutes ago with 0 failures
[13:09:40] <wikibugs>	 10Operations: SSL / protoproxy config for wikivoyage - https://phabricator.wikimedia.org/T81686 (10Dzahn)
[13:10:19] <wikibugs>	 10Operations: SSL cert for wikivoyage.org - https://phabricator.wikimedia.org/T81588 (10Dzahn)
[13:10:40] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: codfw1dev: drop even more usage of clientpackages [puppet] - 10https://gerrit.wikimedia.org/r/500956 (https://phabricator.wikimedia.org/T219981)
[13:11:35] <wikibugs>	 10Operations: wikivoyage - import mysql dumps to s3 - https://phabricator.wikimedia.org/T81734 (10Dzahn)
[13:12:29] <wikibugs>	 10Operations: redirect wikivoyage.de to wikivoyage.org with/after switch - https://phabricator.wikimedia.org/T81726 (10Dzahn)
[13:12:51] <wikibugs>	 10Operations: create the new DNS zone file template for wikivoyage.org (wv "going live"-switch) - https://phabricator.wikimedia.org/T81569 (10Dzahn)
[13:13:42] <wikibugs>	 10Operations: setup wikivoyage-lb - https://phabricator.wikimedia.org/T81555 (10Dzahn)
[13:14:15] <wikibugs>	 10Operations, 10netops: IPv6 LVS service IPs (secure6) - https://phabricator.wikimedia.org/T81670 (10Dzahn)
[13:16:31] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: codfw1dev: drop even more usage of clientpackages [puppet] - 10https://gerrit.wikimedia.org/r/500956 (https://phabricator.wikimedia.org/T219981) (owner: 10Arturo Borrero Gonzalez)
[13:20:18] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[13:22:50] <marostegui>	 jynus: there is an alert on icinga about dbmonitor2001 and dbtree, could that be related to the earlier work with ferm?
[13:23:06] <wikibugs>	 10Operations, 10Patch-For-Review: Ferm rules for dumps (ms1001/datasets) - https://phabricator.wikimedia.org/T105040 (10Dzahn)
[13:23:17] <wikibugs>	 10Operations: Ferm rules for dumps (ms1001/datasets) - https://phabricator.wikimedia.org/T105040 (10Dzahn)
[13:27:48] <wikibugs>	 (03CR) 10Volans: [C: 03+2] check_icinga: add configuration validator [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/500927 (owner: 10Volans)
[13:27:50] <mutante>	 marostegui: jynus : syntax error in apache config related to ssl
[13:28:19] <wikibugs>	 (03Merged) 10jenkins-bot: check_icinga: add configuration validator [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/500927 (owner: 10Volans)
[13:28:20] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[13:30:03] <wikibugs>	 (03PS2) 10Gilles: Make caching of static performance site explicit [puppet] - 10https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417)
[13:30:18] <wikibugs>	 (03CR) 10Gilles: Make caching of static performance site explicit (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417) (owner: 10Gilles)
[13:31:17] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Make caching of static performance site explicit [puppet] - 10https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417) (owner: 10Gilles)
[13:33:26] <icinga-wm>	 RECOVERY - puppet last run on kafka1012 is OK: OK: Puppet is currently enabled, last run 9 minutes ago with 0 failures
[13:34:36] <icinga-wm>	 RECOVERY - Check systemd state on dbmonitor2001 is OK: OK - running: The system is fully operational
[13:34:38] <icinga-wm>	 RECOVERY - HTTP-dbtree on dbmonitor2001 is OK: HTTP OK: HTTP/1.1 200 OK - 80589 bytes in 1.084 second response time https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[13:34:47] <mutante>	 :)
[13:34:49] <moritzm>	 !log reverting dbmonitor2001 to deb8u12+wmf1 build
[13:34:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:14] <wikibugs>	 (03PS3) 10Gilles: Make caching of static performance site explicit [puppet] - 10https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417)
[13:36:22] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Make caching of static performance site explicit [puppet] - 10https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417) (owner: 10Gilles)
[13:36:58] <wikibugs>	 (03PS1) 10Volans: icinga: rename config validator and set exec bit [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/500960
[13:37:27] <wikibugs>	 (03CR) 10CDanis: [C: 03+1] icinga: rename config validator and set exec bit [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/500960 (owner: 10Volans)
[13:37:51] <wikibugs>	 (03CR) 10Volans: [C: 03+2] icinga: rename config validator and set exec bit [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/500960 (owner: 10Volans)
[13:38:15] <wikibugs>	 (03Merged) 10jenkins-bot: icinga: rename config validator and set exec bit [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/500960 (owner: 10Volans)
[13:38:29] <wikibugs>	 (03PS4) 10Gilles: Make caching of static performance site explicit [puppet] - 10https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417)
[13:44:07] <logmsgbot>	 !log gilles@deploy1001 Synchronized php-1.33.0-wmf.23/includes/media/MediaTransformOutput.php: T216499 Identify images that should have had high importance (duration: 00m 59s)
[13:44:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:10] <stashbot>	 T216499: Priority Hints origin trial - https://phabricator.wikimedia.org/T216499
[13:44:40] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on thumbor1004 is CRITICAL: 4.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad+prometheus/ops
[13:46:31] <wikibugs>	 (03PS5) 10Gilles: Make caching of static performance site explicit [puppet] - 10https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417)
[13:46:32] <andrewbogott>	 !log restarting neutron-metadata-agent on cloudnet1003
[13:46:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:00] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[13:49:38] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[13:50:06] <wikibugs>	 10Operations, 10Release-Engineering-Team: mwdebug2001 "/" almost full - https://phabricator.wikimedia.org/T219989 (10Marostegui)
[13:56:24] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[13:56:40] <icinga-wm>	 RECOVERY - puppet last run on dbmonitor2001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[13:59:32] <andrewbogott>	 !log restarting neutron-l3-agent on cloudnet1003 and cloudnet1004
[13:59:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:47] <wikibugs>	 10Operations, 10Puppet, 10Packaging: upgrade facter and puppet across the fleet - https://phabricator.wikimedia.org/T219803 (10MoritzMuehlenhoff) It's a tough nut to crack, I've made progress on a  number of issues, but still not fully done yet:  1. The failure quoted above is ultimately a bug in the C++ sta...
[14:02:32] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[14:03:14] <andrewbogott>	 !log restarting rabbitmq on cloudcontrol1003
[14:03:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:38] <icinga-wm>	 PROBLEM - puppet last run on mw1310 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:15:36] <wikibugs>	 (03PS1) 10Joal: Update sqoop number of processors to 10 [puppet] - 10https://gerrit.wikimedia.org/r/500964
[14:18:37] <marostegui>	 !log Stop replication on pc2007 for testing - T210725
[14:18:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:55] <stashbot>	 T210725: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725
[14:23:28] <wikibugs>	 (03PS1) 10Elukey: cumin: add more hadoop-related aliases [puppet] - 10https://gerrit.wikimedia.org/r/500967 (https://phabricator.wikimedia.org/T218343)
[14:28:24] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Kanban): Server rename: labtestnet2003 to cloudnet2003-dev, update label and switch ports descriptions, etc - https://phabricator.wikimedia.org/T219861 (10Papaul) 05Open→03Resolved complete
[14:28:26] <wikibugs>	 10Operations, 10ops-codfw, 10Cloud-VPS, 10DC-Ops, and 2 others: labtestnet2003.codfw.wmnet: rename to cloudnet2003-dev.codfw.wmnet and reimage to stretch - https://phabricator.wikimedia.org/T219776 (10Papaul)
[14:30:48] <wikibugs>	 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (10EBernhardson) >>! In T148843#5081277, @Miriam wrote: > Thanks @EBernha...
[14:33:15] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2070 - https://phabricator.wikimedia.org/T219852 (10Papaul) a:05Papaul→03Marostegui complete
[14:33:56] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2070 - https://phabricator.wikimedia.org/T219852 (10Marostegui) Thanks! `       physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 600 GB, Rebuilding) `
[14:35:12] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[14:36:58] <icinga-wm>	 RECOVERY - puppet last run on mw1310 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[14:40:11] <wikibugs>	 10Operations, 10Phabricator, 10Traffic: Make phame cacheable - https://phabricator.wikimedia.org/T219978 (10ema)
[14:45:38] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[14:50:29] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Update sqoop number of processors to 10 [puppet] - 10https://gerrit.wikimedia.org/r/500964 (owner: 10Joal)
[14:51:13] <wikibugs>	 10Operations, 10Parsoid, 10RESTBase, 10VisualEditor, and 5 others: Consider stashing data-parsoid for VE - https://phabricator.wikimedia.org/T215956 (10Pchelolo)
[14:51:28] <icinga-wm>	 PROBLEM - cache_text: Varnishkafka Webrequest Delivery Errors per second on icinga1001 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [5.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&var-instance=webrequest&var-host=All
[14:52:09] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] wmcs: Migrate tools-checker to Stretch [puppet] - 10https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243) (owner: 10BryanDavis)
[14:52:54] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[14:53:12] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[14:55:09] <logmsgbot>	 !log anomie@deploy1001 Synchronized php-1.33.0-wmf.23/maintenance/includes/MigrateActors.php: Backporting fix from [[gerrit:500754]] (duration: 01m 01s)
[14:55:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:55:33] <elukey>	 checking varnishkafka
[14:56:37] <logmsgbot>	 !log anomie@deploy1001 Synchronized php-1.33.0-wmf.24/maintenance/includes/MigrateActors.php: Backporting fix from [[gerrit:500754]] (duration: 01m 01s)
[14:56:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:20] <wikibugs>	 (03CR) 10Bstorm: cloudstore: start refactor for role switch up around the labstores (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/500801 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm)
[14:58:45] <wikibugs>	 (03PS1) 10Alex Monk: profile::cache::ssl::wikibase: Simplify [puppet] - 10https://gerrit.wikimedia.org/r/500973
[14:59:10] <icinga-wm>	 RECOVERY - cache_text: Varnishkafka Webrequest Delivery Errors per second on icinga1001 is OK: OK: Less than 1.00% above the threshold [1.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&var-instance=webrequest&var-host=All
[14:59:30] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::cache::ssl::wikibase: Simplify [puppet] - 10https://gerrit.wikimedia.org/r/500973 (owner: 10Alex Monk)
[14:59:33] <logmsgbot>	 !log anomie@mwmaint1002 Fixing empty values for 'target_author_actor' in log_search on section 1 wikis for T215525
[14:59:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:36] <stashbot>	 T215525: log_search rows with ls_field='target_author_actor' and empty ls_value are created during actor migration - https://phabricator.wikimedia.org/T215525
[14:59:42] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: batch_sleep should be None, not 0.0 [cookbooks] - 10https://gerrit.wikimedia.org/r/500974
[15:00:06] <logmsgbot>	 !log anomie@mwmaint1002 Fixing empty values for 'target_author_actor' in log_search on section 2 wikis for T215525
[15:00:06] <logmsgbot>	 !log anomie@mwmaint1002 Fixing empty values for 'target_author_actor' in log_search on remaining section 3 wikis for T215525
[15:00:06] <logmsgbot>	 !log anomie@mwmaint1002 Fixing empty values for 'target_author_actor' in log_search on section 4 wikis for T215525
[15:00:06] <logmsgbot>	 !log anomie@mwmaint1002 Fixing empty values for 'target_author_actor' in log_search on section 5 wikis for T215525
[15:00:06] <logmsgbot>	 !log anomie@mwmaint1002 Fixing empty values for 'target_author_actor' in log_search on section 6 wikis for T215525
[15:00:06] <logmsgbot>	 !log anomie@mwmaint1002 Fixing empty values for 'target_author_actor' in log_search on section 7 wikis for T215525
[15:00:06] <logmsgbot>	 !log anomie@mwmaint1002 Fixing empty values for 'target_author_actor' in log_search on section 8 wikis for T215525
[15:00:07] <logmsgbot>	 !log anomie@mwmaint1002 Fixing empty values for 'target_author_actor' in log_search on wikitech for T215525
[15:00:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:30] <wikibugs>	 (03PS2) 10Alex Monk: profile::cache::ssl::wikibase: Simplify [puppet] - 10https://gerrit.wikimedia.org/r/500973
[15:00:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:06] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:03:49] <wikibugs>	 10Operations, 10Release-Engineering-Team: mwdebug2001 "/" almost full - https://phabricator.wikimedia.org/T219989 (10greg)
[15:03:56] <wikibugs>	 10Operations, 10Release-Engineering-Team: mwdebug2001 "/" almost full - https://phabricator.wikimedia.org/T219989 (10thcipriani) This is related to T218783
[15:05:11] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM with a caveat" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/500974 (owner: 10Gehel)
[15:05:12] <wikibugs>	 10Operations, 10Puppet, 10Packaging: upgrade facter and puppet across the fleet - https://phabricator.wikimedia.org/T219803 (10jbond) Clang-4.0 is provided by security jessie/updates and have managed to get pbuilder working by adding the following  ` deb http://security.debian.org/ jessie/updates main `  to:...
[15:05:30] <wikibugs>	 10Operations, 10RESTBase, 10RESTBase-API, 10serviceops, and 3 others: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 - https://phabricator.wikimedia.org/T218218 (10mobrovac) p:05Triage→03Normal
[15:06:29] <wikibugs>	 (03PS3) 10Dzahn: varnish/trafficserver: add regex to cover www.wikiba.se as well [puppet] - 10https://gerrit.wikimedia.org/r/500715 (https://phabricator.wikimedia.org/T99531)
[15:06:46] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: db1078 s3 primary DB master BBU pre-failure - https://phabricator.wikimedia.org/T219115 (10Marostegui) @Cmjohnson let us know that the BBU arrived and he'll need to put the server down to be able to replace it. So we need to do a failover and failback to db1075 (the previous...
[15:06:57] <wikibugs>	 (03CR) 10Dzahn: "> Patch Set 2:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/500715 (https://phabricator.wikimedia.org/T99531) (owner: 10Dzahn)
[15:07:00] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on db2070 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2070&var-datasource=codfw+prometheus/ops
[15:07:00] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: db1078 s3 primary DB master BBU pre-failure - https://phabricator.wikimedia.org/T219115 (10Marostegui) a:05Cmjohnson→03Marostegui
[15:07:12] <wikibugs>	 10Operations, 10RESTBase, 10RESTBase-API, 10serviceops, and 3 others: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 - https://phabricator.wikimedia.org/T218218 (10mobrovac)
[15:08:20] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:08:48] <wikibugs>	 10Operations, 10RESTBase, 10RESTBase-API, 10serviceops, and 3 others: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 - https://phabricator.wikimedia.org/T218218 (10mobrovac)
[15:09:12] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (10jcrespo)
[15:09:24] <wikibugs>	 10Operations, 10RESTBase, 10RESTBase-API, 10serviceops, and 3 others: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 - https://phabricator.wikimedia.org/T218218 (10mobrovac) One other thing left to do here: replace optional parameters in the `/sys` hierarchy specs.
[15:09:52] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (10jcrespo) 05Open→03Resolved a:05jcrespo→03Papaul This is done, except the problems with mounting point of the ssds, to be handled...
[15:10:04] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:12:15] <wikibugs>	 (03CR) 10Bstorm: "Will merge after Iefcc0a8ea51a3cddc0e79218809e14d97acfc186 is merged" [puppet] - 10https://gerrit.wikimedia.org/r/500535 (https://phabricator.wikimedia.org/T219817) (owner: 10Bstorm)
[15:13:19] <wikibugs>	 (03PS1) 10Ladsgroup: Add mediawiki.org to the URL shortener whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500976
[15:13:21] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "Oh yes, we should do this one asap.  Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/500928 (https://phabricator.wikimedia.org/T219563) (owner: 10Ladsgroup)
[15:14:15] <wikibugs>	 (03PS4) 10Dzahn: varnish/trafficserver: add regex to cover www.wikiba.se as well [puppet] - 10https://gerrit.wikimedia.org/r/500715 (https://phabricator.wikimedia.org/T99531)
[15:14:17] <wikibugs>	 (03CR) 10Vgutierrez: "looks good but please go a step further and get rid of check_ssl_unified_sni_letsencrypt_no_ocsp" [puppet] - 10https://gerrit.wikimedia.org/r/500973 (owner: 10Alex Monk)
[15:14:41] <wikibugs>	 10Operations, 10Puppet, 10Packaging: upgrade facter and puppet across the fleet - https://phabricator.wikimedia.org/T219803 (10MoritzMuehlenhoff) >>! In T219803#5081880, @jbond wrote: > Clang-4.0 is provided by security jessie/updates and have managed to get pbuilder working by adding the following  Ah, righ...
[15:16:27] <wikibugs>	 (03PS3) 10Dzahn: confd: remove upstart support [puppet] - 10https://gerrit.wikimedia.org/r/500913
[15:16:36] <logmsgbot>	 !log volans@cumin1001 START - Cookbook sre.hosts.downtime
[15:16:36] <logmsgbot>	 !log volans@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[15:16:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:17:42] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] confd: remove upstart support [puppet] - 10https://gerrit.wikimedia.org/r/500913 (owner: 10Dzahn)
[15:18:12] <volans>	 !log shutdown ms-be2026 for firmware upgrade - T219854
[15:18:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:18:16] <stashbot>	 T219854: Broken disk on ms-be2026 - https://phabricator.wikimedia.org/T219854
[15:18:41] <wikibugs>	 (03CR) 10Krinkle: Enable UrlShortener in mediawikiwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500910 (https://phabricator.wikimedia.org/T108557) (owner: 10Ladsgroup)
[15:19:34] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:20:39] <wikibugs>	 (PS2) Gehel: elasticsearch: batch_sleep should be None, not 0.0 [cookbooks] - https://gerrit.wikimedia.org/r/500974
[15:22:50] <wikibugs>	 Operations, Packaging: Add security apt security suites to pbuilder base images - https://phabricator.wikimedia.org/T220003 (jbond)
[15:23:01] <wikibugs>	 Operations, Puppet, Packaging: upgrade facter and puppet across the fleet - https://phabricator.wikimedia.org/T219803 (jbond) have created https://phabricator.wikimedia.org/T220003
[15:23:05] <wikibugs>	 Operations, ops-eqiad, DBA: rack/setup/deploy eqiad dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T219399 (jcrespo)
[15:23:08] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (jcrespo)
[15:23:10] <wikibugs>	 Operations, ops-eqiad, DBA, Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (jcrespo)
[15:23:15] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (jcrespo)
[15:23:18] <wikibugs>	 Operations, Citoid, Wikimedia-Logstash, service-runner, and 2 others: Move citoid logging to new logging pipeline - https://phabricator.wikimedia.org/T219919 (mobrovac) p:Triage→Normal
[15:26:02] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:26:05] <wikibugs>	 Operations, Wikimedia-Logstash, service-runner, Core Platform Team (Security, stability, performance and scalability (TEC1)), and 2 others: Move graphoid logging to new logging pipeline - https://phabricator.wikimedia.org/T219923 (mobrovac) p:Triage→Normal Since Graphoid currently does no...
[15:26:15] <wikibugs>	 (CR) Volans: cumin: add more hadoop-related aliases (1 comment) [puppet] - https://gerrit.wikimedia.org/r/500967 (https://phabricator.wikimedia.org/T218343) (owner: Elukey)
[15:27:14] <wikibugs>	 (CR) Elukey: cumin: add more hadoop-related aliases (1 comment) [puppet] - https://gerrit.wikimedia.org/r/500967 (https://phabricator.wikimedia.org/T218343) (owner: Elukey)
[15:28:32] <wikibugs>	 (PS2) Elukey: cumin: add more hadoop-related aliases [puppet] - https://gerrit.wikimedia.org/r/500967 (https://phabricator.wikimedia.org/T218343)
[15:29:28] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: serverpackages: require apt-get update before moving on [puppet] - https://gerrit.wikimedia.org/r/500977 (https://phabricator.wikimedia.org/T219981)
[15:30:11] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: serverpackages: require apt-get update before moving on [puppet] - https://gerrit.wikimedia.org/r/500977 (https://phabricator.wikimedia.org/T219981) (owner: Arturo Borrero Gonzalez)
[15:30:46] <wikibugs>	 Operations, Parsoid, Wikimedia-Logstash, service-runner, and 2 others: Move parsoid logging to new logging pipeline - https://phabricator.wikimedia.org/T219927 (mobrovac) @fgiunchedi Since Parsoid is being moved over to PHP in the next 2 Qs, is there still point in moving the Node.js version over...
[15:30:49] <wikibugs>	 (PS1) Dzahn: park wikivoyage-old.org [dns] - https://gerrit.wikimedia.org/r/500978 (https://phabricator.wikimedia.org/T219867)
[15:31:05] <wikibugs>	 Operations: netbox: User's groups not updated - https://phabricator.wikimedia.org/T220004 (GTirloni)
[15:31:38] <wikibugs>	 (PS2) Andrew Bogott: openstack::monitor::spreadcheck: Use a list of projects [puppet] - https://gerrit.wikimedia.org/r/500823 (owner: Alex Monk)
[15:32:19] <wikibugs>	 (CR) Andrew Bogott: [C: +2] "Puppet compiler shows only test descriptions changing." [puppet] - https://gerrit.wikimedia.org/r/500823 (owner: Alex Monk)
[15:34:44] <wikibugs>	 (CR) Bstorm: [C: +2] wikilabels: update alias for database [puppet] - https://gerrit.wikimedia.org/r/500928 (https://phabricator.wikimedia.org/T219563) (owner: Ladsgroup)
[15:34:53] <wikibugs>	 (PS2) Bstorm: wikilabels: update alias for database [puppet] - https://gerrit.wikimedia.org/r/500928 (https://phabricator.wikimedia.org/T219563) (owner: Ladsgroup)
[15:36:40] <wikibugs>	 (PS2) Andrew Bogott: openstack::monitor::spreadcheck: rm old renaming absent file resources [puppet] - https://gerrit.wikimedia.org/r/500824 (owner: Alex Monk)
[15:37:32] <wikibugs>	 (PS2) Andrew Bogott: openstack::monitor::spreadcheck: add cloudinfra config [puppet] - https://gerrit.wikimedia.org/r/500825 (owner: Alex Monk)
[15:38:00] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack::monitor::spreadcheck: rm old renaming absent file resources [puppet] - https://gerrit.wikimedia.org/r/500824 (owner: Alex Monk)
[15:38:28] <wikibugs>	 Operations, Reading-Infrastructure-Team-Backlog, Wikimedia-Logstash, service-runner, and 2 others: Move proton logging to new logging pipeline - https://phabricator.wikimedia.org/T219925 (LGoto) p:Triage→Normal
[15:38:32] <wikibugs>	 (PS2) Alex Monk: sslcert: update-ocsp: Fix passing Host header in absence of proxy [puppet] - https://gerrit.wikimedia.org/r/500398
[15:38:33] <wikibugs>	 (PS4) Alex Monk: tlsproxy::localssl: No hardcoding of prod webproxy hostname [puppet] - https://gerrit.wikimedia.org/r/500406
[15:38:43] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:39:01] <wikibugs>	 (PS1) Jcrespo: mariadb-snapshots: Setup full daily snapshots for all sections [puppet] - https://gerrit.wikimedia.org/r/500980 (https://phabricator.wikimedia.org/T206203)
[15:39:06] <wikibugs>	 Operations, Reading-Infrastructure-Team-Backlog, Wikimedia-Logstash, service-runner, and 2 others: Move mobile apps logging to new logging pipeline - https://phabricator.wikimedia.org/T219924 (LGoto) p:Triage→Normal
[15:39:29] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:40:01] <wikibugs>	 (PS3) Alex Monk: profile::cache::ssl::wikibase: Simplify [puppet] - https://gerrit.wikimedia.org/r/500973
[15:40:13] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack::monitor::spreadcheck: add cloudinfra config [puppet] - https://gerrit.wikimedia.org/r/500825 (owner: Alex Monk)
[15:42:54] <wikibugs>	 (PS26) Andrew Bogott: wmcs: Migrate tools-checker to Stretch [puppet] - https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243) (owner: BryanDavis)
[15:44:06] <wikibugs>	 (PS2) Jcrespo: mariadb-snapshots: Setup full daily snapshots for all codfw sections [puppet] - https://gerrit.wikimedia.org/r/500980 (https://phabricator.wikimedia.org/T206203)
[15:45:33] <wikibugs>	 (CR) WMDE-Fisch: wikiba.se: add Apache rewrites for www to naked domain (1 comment) [puppet] - https://gerrit.wikimedia.org/r/500695 (https://phabricator.wikimedia.org/T99531) (owner: Dzahn)
[15:46:21] <wikibugs>	 (CR) Andrew Bogott: "Latest diffs:  https://puppet-compiler.wmflabs.org/compiler1001/15535/" [puppet] - https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243) (owner: BryanDavis)
[15:54:31] <wikibugs>	 Operations, Release-Engineering-Team: mwdebug2001 and mwdebug2002 "/" almost full - https://phabricator.wikimedia.org/T219989 (jcrespo)
[15:56:11] <wikibugs>	 Operations, ops-codfw, Patch-For-Review: Broken disk on ms-be2026 - https://phabricator.wikimedia.org/T219854 (Papaul)  HP FlexFabric 10Gb 2port 534FLR-SFP+ Adapter 7.17.19 Embedded HPE Smart Storage Battery 1 Firmware 1.1 Embedded iLO 2.60 May 23 2018 System Board Intelligent Platform Abstraction Da...
[15:57:35] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:58:07] <icinga-wm>	 PROBLEM - tools project instance distribution on cloudcontrol1003 is CRITICAL: CRITICAL: k8s-etcd,prometheus,static class instances not spread out enough https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:58:57] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:59:07] <wikibugs>	 (CR) BBlack: "recheck" [dns] - https://gerrit.wikimedia.org/r/500731 (https://phabricator.wikimedia.org/T208263) (owner: BBlack)
[15:59:31] <wikibugs>	 (CR) jerkins-bot: [V: -1] Turn on non-chaining CNAMEs experimental option [dns] - https://gerrit.wikimedia.org/r/500731 (https://phabricator.wikimedia.org/T208263) (owner: BBlack)
[16:00:05] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor My software never has bugs. It just develops random features. Rise for Morning SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190403T1600).
[16:00:05] <jouncebot>	 Zoranzoki21, Tulsi, and Pchelolo: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[16:00:13] <Pchelolo>	 here
[16:00:19] <Zoranzoki21>	 \o
[16:02:14] <twentyafterfour>	 can anyone swat? all of #wikimedia-releng is in a meeting right now 
[16:03:11] <icinga-wm>	 PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:07:34] <Lucas_WMDE>	 I could do the deploys, but AFAIK some SWAT member should still be available on standby or something in case I really screw up
[16:08:53] <twentyafterfour>	 Lucas_WMDE: there are several of us on standby ;) 
[16:09:04] <twentyafterfour>	 I'll keep an eye on this channel 
[16:09:10] <Lucas_WMDE>	 okay, then I can do it
[16:09:23] <twentyafterfour>	 Lucas_WMDE: thank you!  much appreciated
[16:09:45] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +1] "I checked that there’s no more namespace 104 in https://ar.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces, so I thin" [mediawiki-config] - https://gerrit.wikimedia.org/r/500153 (https://phabricator.wikimedia.org/T217507) (owner: Zoranzoki21)
[16:09:54] <wikibugs>	 (PS5) Lucas Werkmeister (WMDE): Remove namespace 104 from FlaggedRevs configuration for arwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500153 (https://phabricator.wikimedia.org/T217507) (owner: Zoranzoki21)
[16:10:04] <wikibugs>	 Operations, Puppet, Packaging: upgrade facter and puppet across the fleet - https://phabricator.wikimedia.org/T219803 (jbond) Looks like we may need to rebuild everything with stdc++[1]  already tried leatherman and get a simlar errors pointing to  relating to boost, hopefully we dont need to rebuild...
[16:10:08] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/500153 (https://phabricator.wikimedia.org/T217507) (owner: Zoranzoki21)
[16:10:39] <Zoranzoki21>	 Oh, super. I am online :)
[16:11:01] <Lucas_WMDE>	 good, because I already started with your first change :)
[16:11:07] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] "I think we can merge this for now to stop alerting so often, while trying to figure out the reason for this in https://phabricator.wikimed" [puppet] - https://gerrit.wikimedia.org/r/500839 (owner: CRusnov)
[16:11:24] <wikibugs>	 (Merged) jenkins-bot: Remove namespace 104 from FlaggedRevs configuration for arwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500153 (https://phabricator.wikimedia.org/T217507) (owner: Zoranzoki21)
[16:12:22] <Zoranzoki21>	 yes, I saw it now
[16:12:23] <Lucas_WMDE>	 Zoranzoki21: the first patch (ns104) should be on mwdebug1002, please test
[16:12:30] <Lucas_WMDE>	 I’ll review the next one in the meantime
[16:12:51] <Zoranzoki21>	 Lucas_WMDE: ok, will do
[16:13:33] <Zoranzoki21>	 so slow...
[16:13:39] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +1] Add three domains at wgCopyUploadDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/500154 (https://phabricator.wikimedia.org/T216886) (owner: Zoranzoki21)
[16:13:57] <wikibugs>	 (PS3) Zoranzoki21: Add three domains at wgCopyUploadDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/500154 (https://phabricator.wikimedia.org/T216886)
[16:14:02] <Tulsi>	 Hi
[16:14:17] <Zoranzoki21>	 Good to go, check logs
[16:14:21] <Lucas_WMDE>	 okay
[16:14:57] <Zoranzoki21>	 500154?
[16:15:22] <Lucas_WMDE>	 that one’s next
[16:15:26] <Lucas_WMDE>	 the previous one isn’t done yet though
[16:15:41] <wikibugs>	 (PS1) Zoranzoki21: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/500987 (https://phabricator.wikimedia.org/T220001)
[16:15:49] <wikibugs>	 (CR) Krinkle: "I cherry-picked this to deployment-puppetmaster03 and ran 'puppet agent -tv' on webperf11 but got this error:" [puppet] - https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417) (owner: Gilles)
[16:16:02] <Lucas_WMDE>	 hi Tulsi btw :)
[16:16:07] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/flaggedrevs.php: SWAT: [[gerrit:500153|Remove namespace 104 from FlaggedRevs configuration for arwiki (T217507)]] (duration: 01m 00s)
[16:16:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:17] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/500154 (https://phabricator.wikimedia.org/T216886) (owner: Zoranzoki21)
[16:16:17] <stashbot>	 T217507: FlaggedRevs still treats a removed namespace as if it still exists (arwiki) - https://phabricator.wikimedia.org/T217507
[16:16:20] <Zoranzoki21>	 oh ok
[16:16:26] <Tulsi>	 Hello Lucas_WMDE :-)
[16:16:34] <Tulsi>	 Please ping me when it's my turn.
[16:16:41] <Lucas_WMDE>	 will do, currently doing Zoranzoki21’s patches
[16:17:12] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/500974 (owner: Gehel)
[16:17:24] <wikibugs>	 (Merged) jenkins-bot: Add three domains at wgCopyUploadDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/500154 (https://phabricator.wikimedia.org/T216886) (owner: Zoranzoki21)
[16:17:33] <Zoranzoki21>	 Lucas_WMDE: Patch is public now, and works
[16:17:35] <wikibugs>	 (CR) Gehel: [C: +2] elasticsearch: batch_sleep should be None, not 0.0 [cookbooks] - https://gerrit.wikimedia.org/r/500974 (owner: Gehel)
[16:17:47] <Zoranzoki21>	 And 500154 can be merged directly
[16:18:05] <Lucas_WMDE>	 it’s on mwdebug1002 now
[16:18:09] <Lucas_WMDE>	 can it be tested?
[16:18:22] <wikibugs>	 (PS2) Zoranzoki21: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/500987 (https://phabricator.wikimedia.org/T220001)
[16:18:29] <wikibugs>	 (PS3) Zoranzoki21: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/500987 (https://phabricator.wikimedia.org/T220001)
[16:18:36] <Lucas_WMDE>	 I’ll just check that at least it’s not breaking Commons
[16:18:52] <Zoranzoki21>	 Lucas_WMDE: Yes, you can check it
[16:18:56] <Lucas_WMDE>	 (why is the debug server so slow? I don’t remember it being that bad)
[16:18:57] <icinga-wm>	 RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:19:12] <Zoranzoki21>	 Lucas_WMDE: I told it already previously
[16:19:41] <wikibugs>	 (CR) jenkins-bot: Remove namespace 104 from FlaggedRevs configuration for arwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500153 (https://phabricator.wikimedia.org/T217507) (owner: Zoranzoki21)
[16:19:43] <wikibugs>	 (CR) jenkins-bot: Add three domains at wgCopyUploadDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/500154 (https://phabricator.wikimedia.org/T216886) (owner: Zoranzoki21)
[16:20:29] <wikibugs>	 (CR) Acamicamacaraca: [C: +1] Enable Draft namespace on srwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500761 (https://phabricator.wikimedia.org/T214428) (owner: Zoranzoki21)
[16:20:31] <Zoranzoki21>	 Lucas_WMDE: I loaded commons, everything works
[16:20:41] <Lucas_WMDE>	 okay
[16:20:53] <Lucas_WMDE>	 going ahead
[16:21:25] <Zoranzoki21>	 Lucas_WMDE: I added https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/500987/ on deployments calendar too, can you do it after 500761?
[16:21:58] <wikibugs>	 (PS2) Zoranzoki21: Enable Draft namespace on srwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500761 (https://phabricator.wikimedia.org/T214428)
[16:22:37] <Lucas_WMDE>	 is it okay if we do it at the end of the SWAT window, if there’s still time?
[16:22:41] <Lucas_WMDE>	 doesn’t look urgent from the task
[16:22:52] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:500154|Add three domains at wgCopyUploadDomains (T216886, T219075)]] (duration: 01m 00s)
[16:22:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:57] <stashbot>	 T216886: Please add uni-hamburg.de to $wgCopyUploadsDomains - https://phabricator.wikimedia.org/T216886
[16:22:58] <stashbot>	 T219075: Add bruun-rasmussen.dk to the wgCopyUploadsDomains whitelist of Wikimedia Commons - https://phabricator.wikimedia.org/T219075
[16:23:16] <Zoranzoki21>	 Lucas_WMDE: Ok
[16:23:37] <Zoranzoki21>	 Now you can do adding namespace on srwiki
[16:24:49] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +1] "LGTM – NS 118 is still free on https://sr.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces and also matches the Draft " [mediawiki-config] - https://gerrit.wikimedia.org/r/500761 (https://phabricator.wikimedia.org/T214428) (owner: Zoranzoki21)
[16:25:06] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/500761 (https://phabricator.wikimedia.org/T214428) (owner: Zoranzoki21)
[16:25:16] <Lucas_WMDE>	 heh, it’s nice when I don’t even need to rebase the change, thank you :)
[16:25:19] <wikibugs>	 (PS5) Alex Monk: tlsproxy::localssl: No hardcoding of prod webproxy hostname [puppet] - https://gerrit.wikimedia.org/r/500406
[16:25:32] <Zoranzoki21>	 Lucas_WMDE: Your welcome
[16:26:13] <wikibugs>	 (Merged) jenkins-bot: Enable Draft namespace on srwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500761 (https://phabricator.wikimedia.org/T214428) (owner: Zoranzoki21)
[16:26:41] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[16:27:00] <Zoranzoki21>	 Lucas_WMDE: Is it ready for mwdebug?
[16:27:05] <Lucas_WMDE>	 it is now
[16:27:17] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[16:27:18] <Zoranzoki21>	 Will check
[16:27:22] <Lucas_WMDE>	 seems to work in the API at least
[16:27:55] <Zoranzoki21>	 Lucas_WMDE: Yes, I can confirm it. LGTM
[16:27:59] <Lucas_WMDE>	 alright, deploying
[16:28:15] <Zoranzoki21>	 https://i.snag.gy/seOh5w.jpg
[16:28:32] <wikibugs>	 Operations, Office-IT, Research, Wikimedia-Mailing-lists: Create research-alerts mailing list - https://phabricator.wikimedia.org/T219309 (bmansurov) @Dzahn thanks. Turns out a Google group needs at least one member besides the admin. The only person who will use this mailing list is me for now....
[16:29:05] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[16:29:17] <Lucas_WMDE>	 um
[16:29:32] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:500761|Enable Draft namespace on srwiki (T214428)]] (duration: 01m 00s)
[16:29:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:35] <stashbot>	 T214428: Enable Draft namespace on sr.wikipedia - https://phabricator.wikimedia.org/T214428
[16:29:37] <Lucas_WMDE>	 T218796 says that the new idwiktionary namespace will require a maintenance script after deployment
[16:29:37] <stashbot>	 T218796: Add namespace "Lampiran" at ID wiktionary - https://phabricator.wikimedia.org/T218796
[16:29:45] <wikibugs>	 (PS3) Elukey: cumin: add more hadoop-related aliases [puppet] - https://gerrit.wikimedia.org/r/500967 (https://phabricator.wikimedia.org/T218343)
[16:29:45] <Lucas_WMDE>	 is that the case for the new srwiki Draft namespace too?
[16:30:00] <Lucas_WMDE>	 that would be a question for the SWAT folks, e. g. twentyafterfour :)
[16:30:15] * Lucas_WMDE tries to find documentation in the meantime
[16:30:18] <Zoranzoki21>	 Lucas_WMDE: No, for srwiki we no need it, because namespace is empty
[16:30:46] <Zoranzoki21>	 Lucas_WMDE: For idwiktionary you need it because it contains articles already.
[16:30:48] <Zoranzoki21>	 mwscript namespaceDupes.php --wiki=idwiktionary --fix
[16:30:53] <Lucas_WMDE>	 okay
[16:31:58] <Zoranzoki21>	 Lucas_WMDE: srwiki is ok
[16:32:34] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +1] "LGTM – namespace 102 is still free according to https://id.wiktionary.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces&formatver" [mediawiki-config] - https://gerrit.wikimedia.org/r/499530 (https://phabricator.wikimedia.org/T218796) (owner: Tulsi Bhagat)
[16:32:41] <wikibugs>	 (PS3) Lucas Werkmeister (WMDE): Add namespace "Lampiran" at id.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/499530 (https://phabricator.wikimedia.org/T218796) (owner: Tulsi Bhagat)
[16:32:54] <Lucas_WMDE>	 Tulsi: starting with your change now
[16:33:00] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/499530 (https://phabricator.wikimedia.org/T218796) (owner: Tulsi Bhagat)
[16:33:07] <Tulsi>	 Okay
[16:33:51] <wikibugs>	 (CR) Vgutierrez: [C: +1] "pcc looks happy, let's merge this tomorrow EU morning: https://puppet-compiler.wmflabs.org/compiler1002/15536/" [puppet] - https://gerrit.wikimedia.org/r/500973 (owner: Alex Monk)
[16:34:06] <wikibugs>	 (Merged) jenkins-bot: Add namespace "Lampiran" at id.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/499530 (https://phabricator.wikimedia.org/T218796) (owner: Tulsi Bhagat)
[16:34:34] <Lucas_WMDE>	 Tulsi: it’s on mwdebug1002, please test
[16:34:41] <Tulsi>	 Testing
[16:35:16] <Tulsi>	 Looks good
[16:35:21] <Lucas_WMDE>	 alright, deploying
[16:35:33] <Tulsi>	 OK
[16:36:41] <Lucas_WMDE>	 Pchelolo: heads up, your change is coming up soon
[16:36:48] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:499530|Add namespace "Lampiran" at id.wiktionary (T218796)]] (duration: 00m 59s)
[16:36:49] <Lucas_WMDE>	 (I’m not yet done with idwiktionary though)
[16:36:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:36:53] <stashbot>	 T218796: Add namespace "Lampiran" at ID wiktionary - https://phabricator.wikimedia.org/T218796
[16:36:55] <Pchelolo>	 thank you Lucas_WMDE, 
[16:37:17] <wikibugs>	 (CR) jenkins-bot: Enable Draft namespace on srwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/500761 (https://phabricator.wikimedia.org/T214428) (owner: Zoranzoki21)
[16:37:19] <wikibugs>	 (CR) jenkins-bot: Add namespace "Lampiran" at id.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/499530 (https://phabricator.wikimedia.org/T218796) (owner: Tulsi Bhagat)
[16:37:39] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/namespaceDupes.php --wiki=idwiktionary --fix # T218796 – 41 links to fix, 41 were resolvable, Looks good!
[16:37:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:38:09] <Lucas_WMDE>	 okay that didn’t take long at all, phew
[16:38:36] <Zoranzoki21>	 Lucas_WMDE: I think you should put output in comment at task
[16:38:45] <Zoranzoki21>	 Lucas_WMDE: Some deployers always do it
[16:38:46] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/500967 (https://phabricator.wikimedia.org/T218343) (owner: Elukey)
[16:38:50] <Lucas_WMDE>	 Zoranzoki21: the full output?
[16:39:19] <Zoranzoki21>	 Lucas_WMDE: Yes
[16:39:33] <Tulsi>	 hmm
[16:39:48] <wikibugs>	 Operations, Continuous-Integration-Config, Patch-For-Review, User-zeljkofilipin: npm 6 consistently fails with "Z_DATA_ERROR: invalid distance too far back" on some repos - https://phabricator.wikimedia.org/T215562 (Krinkle) a:Krinkle→MoritzMuehlenhoff OK. Looks like the image will alread...
[16:39:53] <wikibugs>	 (CR) EBernhardson: [C: +1] Add 'depicts' statements to search index on testcommons [mediawiki-config] - https://gerrit.wikimedia.org/r/500080 (owner: Cparle)
[16:40:15] <Zoranzoki21>	 what it fixed....
[16:40:17] <Zoranzoki21>	 I think on it
[16:40:33] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[16:41:00] <wikibugs>	 (PS2) CRusnov: profile kubernetes node: Adjust latency alert thresholds [puppet] - https://gerrit.wikimedia.org/r/500839
[16:41:02] <Zoranzoki21>	 Lucas_WMDE: I means on this (as example) https://phabricator.wikimedia.org/T212100#4940931
[16:41:27] <Lucas_WMDE>	 Zoranzoki21: done
[16:41:34] <Lucas_WMDE>	 looks good? https://phabricator.wikimedia.org/T218796#5082331
[16:41:43] <Zoranzoki21>	 Lucas_WMDE: Yes, it is
[16:41:55] <Tulsi>	 Thank you so much Lucas_WMDE, \o/
[16:42:01] <Lucas_WMDE>	 okay, yay
[16:42:08] <Tulsi>	 ;)
[16:42:11] <Lucas_WMDE>	 going ahead with Pchelolo’s change now
[16:42:16] <Lucas_WMDE>	 thanks for the info Zoranzoki21
[16:42:35] <Zoranzoki21>	 Lucas_WMDE: yw
[16:42:36] <Pchelolo>	 Lucas_WMDE: gimme a headsup when on mwdebug, I'll test
[16:44:22] <Lucas_WMDE>	 +2ed, let’s hope gate-and-submit doesn’t take too long
[16:44:44] <logmsgbot>	 !log gehel@cumin2001 START - Cookbook sre.elasticsearch.rolling-reboot
[16:44:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:45:23] <Lucas_WMDE>	 okay, its jenkins jobs are already done but now it’s waiting for other changes in the gate-and-submit pipeline :/
[16:45:45] <Lucas_WMDE>	 sorry, nevermind, I was looking at the entirely wrong change
[16:46:34] <wikibugs>	 Operations, Core Platform Team, MediaWiki-General-or-Unknown, PHP 7.2 support: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (Joe) >>! In T219279#5079510, @Jdforrester-WMF wrote: >>>! In T219279#5068956, @Joe wro...
[16:46:49] <Zoranzoki21>	 Lucas_WMDE: Can you do my throttle rule patch meanwhile?
[16:48:00] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: -1] Add new throttle rule (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/500987 (https://phabricator.wikimedia.org/T220001) (owner: Zoranzoki21)
[16:48:05] <Lucas_WMDE>	 Zoranzoki21: reviewed
[16:48:15] <Lucas_WMDE>	 but I’d prefer to get the backport done first
[16:48:17] <wikibugs>	 Operations, Wikidata, Wikidata-Termbox-Hike, serviceops, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (WMDE-leszek) Disclaimer: I do understand that SRE and others have been pretty busy last weeks, and I would absolutely take "we cannot reall...
[16:48:32] <wikibugs>	 (PS4) Zoranzoki21: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/500987 (https://phabricator.wikimedia.org/T220001)
[16:48:38] <wikibugs>	 (PS5) Zoranzoki21: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/500987 (https://phabricator.wikimedia.org/T220001)
[16:48:42] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on sodium - https://phabricator.wikimedia.org/T212010 (RobH) Open→Resolved robh@sodium:~$ sudo megacli -PDList -aALL |grep "Firmware state" Firmware state: Online, Spun Up Firmware state: Online, Spun Up Firmware state: Online, Spun Up Firmware state: Online,...
[16:48:45] <Zoranzoki21>	 Lucas_WMDE: Fixed
[16:49:38] <Lucas_WMDE>	 um
[16:49:51] <Lucas_WMDE>	 well, never mind
[16:50:07] <Lucas_WMDE>	 I’m not sure if the T0:00 in the Hackathon rule is intentional (midnight) or the same typo
[16:50:12] <Lucas_WMDE>	 but it doesn’t really matter at the moment I guess
[16:50:37] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +1] Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/500987 (https://phabricator.wikimedia.org/T220001) (owner: Zoranzoki21)
[16:50:45] <Zoranzoki21>	 Lucas_WMDE: It is not related to me
[16:50:49] <Lucas_WMDE>	 yeah
[16:50:51] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2002 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[16:51:45] <logmsgbot>	 !log gehel@cumin2001 END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
[16:51:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:52:37] <wikibugs>	 (CR) CRusnov: [C: +2] profile kubernetes node: Adjust latency alert thresholds [puppet] - https://gerrit.wikimedia.org/r/500839 (owner: CRusnov)
[16:52:47] <Lucas_WMDE>	 Pchelolo: your change is on mwdebug1002 now, please test
[16:53:09] <Pchelolo>	 Lucas_WMDE: doing so. will ping when done
[16:53:23] <Lucas_WMDE>	 ok
[16:53:33] <Lucas_WMDE>	 and wmf.23 doesn’t need a backport?
[16:53:39] <bd808>	 marxarelli: Are you running the train this week? I just made a cherry-pick that it would be really really nice to get into the train for wikitech -- https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/LdapAuthentication/+/500994/
[16:57:05] <marxarelli>	 bd808: i am
[16:57:24] * bd808 makes puppet dog eyes at marxarelli 
[16:57:32] <bd808>	 *puppy
[16:57:33] <marxarelli>	 haha
[16:57:37] <Lucas_WMDE>	 lol
[16:57:45] <marxarelli>	 was going to say, puppet eyes are not effective on me :)
[16:58:26] <Pchelolo>	 Lucas_WMDE: seems ok. and wmf.23 does not need a backport, the issue was introduced in wmf.24
[16:58:30] <Lucas_WMDE>	 ok
[16:58:33] <Lucas_WMDE>	 going ahead
[16:58:36] <Lucas_WMDE>	 thanks
[16:58:48] <bd808>	 hopefully not this puppet dog -- https://www.amazon.com/Rotting-Zombie-Puppet-Halloween-Decoration/dp/B075FV1BRQ
[16:59:24] <Zoranzoki21>	 lmao
[16:59:51] <Zoranzoki21>	 Lucas_WMDE: I will move throttle in second SWAT windows as it is not more time anymore
[16:59:57] <Zoranzoki21>	 *window
[16:59:58] <Lucas_WMDE>	 Zoranzoki21: I was about to do it
[17:00:08] <Zoranzoki21>	 Lucas_WMDE: If you can do it, it will be great
[17:00:09] <Lucas_WMDE>	 I think it’d be okay to extend the SWAT by a few minutes
[17:00:18] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized php-1.33.0-wmf.24/extensions/EventBus: SWAT: [[gerrit:500959|Incorrect order of calls in createPageDeleteEvent.]] (duration: 00m 59s)
[17:00:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:00:22] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/500987 (https://phabricator.wikimedia.org/T220001) (owner: Zoranzoki21)
[17:00:26] <Lucas_WMDE>	 can’t be tested anyways right?
[17:00:27] <Zoranzoki21>	 Lucas_WMDE: I think so, becauuse throttles no needs mwdebug
[17:00:32] <Lucas_WMDE>	 well except for “it doesn’t break the site”
[17:00:36] <Zoranzoki21>	 *because
[17:00:45] <Zoranzoki21>	 It will not break
[17:01:12] <Lucas_WMDE>	 yeah I guess the canaries should be enough for that
[17:01:13] <marxarelli>	 bd808: i have no objections to deploying that but i'm not sure i can review/merge it
[17:01:33] <wikibugs>	 (Merged) jenkins-bot: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/500987 (https://phabricator.wikimedia.org/T220001) (owner: Zoranzoki21)
[17:01:34] <bd808>	 marxarelli: I already did the merge of it into master
[17:01:36] <wikibugs>	 (CR) Zoranzoki21: Add new throttle rule (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/500987 (https://phabricator.wikimedia.org/T220001) (owner: Zoranzoki21)
[17:01:47] <wikibugs>	 (CR) jenkins-bot: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/500987 (https://phabricator.wikimedia.org/T220001) (owner: Zoranzoki21)
[17:01:52] <marxarelli>	 and it's exercised somewhere? (beta)
[17:01:53] <bd808>	 so this is just a backport rubberstamp and then the critical part is the deploy :)
[17:02:16] <bd808>	 marxarelli: local test instances by me and akosiaris 
[17:02:21] <marxarelli>	 kk
[17:02:31] <bd808>	 we don't have a deployment-prep version of wikitech
[17:02:43] <marxarelli>	 word for me then :)
[17:02:46] <marxarelli>	 er, works
[17:03:05] <marxarelli>	 puppies, puppets, words, works!
[17:03:07] <Zoranzoki21>	 Lucas_WMDE: Is it deployed? Can I go now?
[17:03:24] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/throttle.php: SWAT: [[gerrit:500987|Add new throttle rule (T220001)]] (duration: 00m 58s)
[17:03:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:03:27] <stashbot>	 T220001: Throttle Exception for Amnesty International edit-a-thon on April 14th - https://phabricator.wikimedia.org/T220001
[17:03:31] <Lucas_WMDE>	 Zoranzoki21: it got done just now
[17:03:35] <Lucas_WMDE>	 should be fine now
[17:03:40] <Zoranzoki21>	 Yep, thanks
[17:03:42] <Zoranzoki21>	 Cya
[17:03:45] * Zoranzoki21 waves
[17:04:38] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/namespaceDupes.php --wiki=srwiki --fix # T214428 – 0 pages to fix, 0 links to fix, Looks good!
[17:04:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:04:42] <stashbot>	 T214428: Enable Draft namespace on sr.wikipedia - https://phabricator.wikimedia.org/T214428
[17:04:46] <Lucas_WMDE>	 !log EU SWAT done
[17:04:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:12:05] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[17:14:18] <herron>	 !log depooling kafka1001 to restart eventbus and kafka services for security updates
[17:14:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:15:06] <icinga-wm>	 PROBLEM - Hadoop Namenode - Stand By on an-master1002 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.namenode.NameNode https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[17:15:08] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: openstack: serverpackages: require apt-get update before moving on [puppet] - https://gerrit.wikimedia.org/r/500977 (https://phabricator.wikimedia.org/T219981)
[17:15:31] <volans>	 elukey: any work in progress?
[17:15:35] <robh>	 that paged
[17:15:36] <elukey>	 nope
[17:15:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: serverpackages: require apt-get update before moving on [puppet] - https://gerrit.wikimedia.org/r/500977 (https://phabricator.wikimedia.org/T219981) (owner: Arturo Borrero Gonzalez)
[17:16:12] <marostegui>	 is that active or an standby?
[17:16:46] <elukey>	 standby, i am checking
[17:16:56] <marostegui>	 thanks :)
[17:17:18] <wikibugs>	 (PS1) Anomie: Set actor migration to read-new on Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/501000 (https://phabricator.wikimedia.org/T188327)
[17:17:35] <wikibugs>	 (PS3) Arturo Borrero Gonzalez: openstack: serverpackages: require apt-get update before moving on [puppet] - https://gerrit.wikimedia.org/r/500977 (https://phabricator.wikimedia.org/T219981)
[17:17:48] <elukey>	 never seen this error before
[17:17:55] * apergos peeks in
[17:19:48] <elukey>	 !log restart hadoop-hdfs-namenode on an-master1002 after forced shutdown due to errors
[17:19:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:17] <elukey>	 need to check why it paged, I thought we removed pages for hadoop
[17:26:46] <wikibugs>	 (PS1) Hashar: jenkins: prevent reading IRC passwords [puppet] - https://gerrit.wikimedia.org/r/501001 (https://phabricator.wikimedia.org/T219991)
[17:27:25] <wikibugs>	 (PS1) Elukey: profile::hadoop::master::standby: remove page to SRE [puppet] - https://gerrit.wikimedia.org/r/501002
[17:27:37] <wikibugs>	 (PS1) Jforrester: Invariant config cleanup: I - Initial DB and performance items [mediawiki-config] - https://gerrit.wikimedia.org/r/501003
[17:27:39] <wikibugs>	 (PS1) Jforrester: Invariant config cleanup: II - Account and anti-abuse settings [mediawiki-config] - https://gerrit.wikimedia.org/r/501004
[17:27:41] <wikibugs>	 (PS1) Jforrester: Invariant config cleanup: III - SVG rendering [mediawiki-config] - https://gerrit.wikimedia.org/r/501005
[17:27:47] <wikibugs>	 (PS1) Jforrester: Invariant config cleanup: IV - DJVU rendering [mediawiki-config] - https://gerrit.wikimedia.org/r/501006
[17:27:49] <wikibugs>	 (PS1) Jforrester: Invariant config cleanup: V - Notifications matters [mediawiki-config] - https://gerrit.wikimedia.org/r/501007
[17:27:51] <wikibugs>	 (PS1) Jforrester: Invariant config cleanup: VI - Watchlist default setting [mediawiki-config] - https://gerrit.wikimedia.org/r/501008
[17:27:53] <wikibugs>	 (PS1) Jforrester: Invariant config cleanup: VII - RL local storage setting [mediawiki-config] - https://gerrit.wikimedia.org/r/501009
[17:27:55] <wikibugs>	 (PS1) Jforrester: Invariant config cleanup: VIII - ULS logging [mediawiki-config] - https://gerrit.wikimedia.org/r/501010
[17:27:57] <wikibugs>	 (PS1) Jforrester: Invariant config cleanup: IX - RightsIcon [mediawiki-config] - https://gerrit.wikimedia.org/r/501011
[17:27:59] <wikibugs>	 (PS1) Jforrester: Invariant config cleanup: X - Extensions loaded on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/501012
[17:28:44] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: II - Account and anti-abuse settings [mediawiki-config] - https://gerrit.wikimedia.org/r/501004 (owner: Jforrester)
[17:28:46] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: I - Initial DB and performance items [mediawiki-config] - https://gerrit.wikimedia.org/r/501003 (owner: Jforrester)
[17:28:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: III - SVG rendering [mediawiki-config] - https://gerrit.wikimedia.org/r/501005 (owner: Jforrester)
[17:29:04] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: IV - DJVU rendering [mediawiki-config] - https://gerrit.wikimedia.org/r/501006 (owner: Jforrester)
[17:29:10] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[17:29:12] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: V - Notifications matters [mediawiki-config] - https://gerrit.wikimedia.org/r/501007 (owner: Jforrester)
[17:29:22] <wikibugs>	 (CR) Elukey: [C: +2] profile::hadoop::master::standby: remove page to SRE [puppet] - https://gerrit.wikimedia.org/r/501002 (owner: Elukey)
[17:29:36] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: VI - Watchlist default setting [mediawiki-config] - https://gerrit.wikimedia.org/r/501008 (owner: Jforrester)
[17:30:00] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: VII - RL local storage setting [mediawiki-config] - https://gerrit.wikimedia.org/r/501009 (owner: Jforrester)
[17:30:40] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: VIII - ULS logging [mediawiki-config] - https://gerrit.wikimedia.org/r/501010 (owner: Jforrester)
[17:30:45] <wikibugs>	 (PS1) Hashar: jenkins: ensure secrets are only readable by jenkins [puppet] - https://gerrit.wikimedia.org/r/501013
[17:31:18] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: IX - RightsIcon [mediawiki-config] - https://gerrit.wikimedia.org/r/501011 (owner: Jforrester)
[17:31:29] <wikibugs>	 (PS1) Thcipriani: Gerrit 2.15.12 (update core only) [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/501014
[17:31:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: X - Extensions loaded on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/501012 (owner: Jforrester)
[17:32:14] <wikibugs>	 (PS2) Jforrester: Invariant config cleanup: I - Initial DB and performance items [mediawiki-config] - https://gerrit.wikimedia.org/r/501003
[17:32:16] <wikibugs>	 (PS2) Jforrester: Invariant config cleanup: II - Account and anti-abuse settings [mediawiki-config] - https://gerrit.wikimedia.org/r/501004
[17:32:18] <wikibugs>	 (PS2) Jforrester: Invariant config cleanup: III - SVG rendering [mediawiki-config] - https://gerrit.wikimedia.org/r/501005
[17:32:20] <wikibugs>	 (PS2) Jforrester: Invariant config cleanup: IV - DJVU rendering [mediawiki-config] - https://gerrit.wikimedia.org/r/501006
[17:32:22] <wikibugs>	 (PS2) Jforrester: Invariant config cleanup: V - Notifications matters [mediawiki-config] - https://gerrit.wikimedia.org/r/501007
[17:32:24] <wikibugs>	 (PS2) Jforrester: Invariant config cleanup: VI - Watchlist default setting [mediawiki-config] - https://gerrit.wikimedia.org/r/501008
[17:32:26] <wikibugs>	 (PS2) Jforrester: Invariant config cleanup: VII - RL local storage setting [mediawiki-config] - https://gerrit.wikimedia.org/r/501009
[17:32:28] <wikibugs>	 (PS2) Jforrester: Invariant config cleanup: VIII - ULS logging [mediawiki-config] - https://gerrit.wikimedia.org/r/501010
[17:32:33] <wikibugs>	 (PS2) Jforrester: Invariant config cleanup: IX - RightsIcon [mediawiki-config] - https://gerrit.wikimedia.org/r/501011
[17:32:35] <wikibugs>	 (PS2) Jforrester: Invariant config cleanup: X - Extensions loaded on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/501012
[17:33:26] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: III - SVG rendering [mediawiki-config] - https://gerrit.wikimedia.org/r/501005 (owner: Jforrester)
[17:33:31] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: IV - DJVU rendering [mediawiki-config] - https://gerrit.wikimedia.org/r/501006 (owner: Jforrester)
[17:33:53] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: V - Notifications matters [mediawiki-config] - https://gerrit.wikimedia.org/r/501007 (owner: Jforrester)
[17:34:11] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: VI - Watchlist default setting [mediawiki-config] - https://gerrit.wikimedia.org/r/501008 (owner: Jforrester)
[17:34:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: VII - RL local storage setting [mediawiki-config] - https://gerrit.wikimedia.org/r/501009 (owner: Jforrester)
[17:35:15] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: VIII - ULS logging [mediawiki-config] - https://gerrit.wikimedia.org/r/501010 (owner: Jforrester)
[17:35:49] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: IX - RightsIcon [mediawiki-config] - https://gerrit.wikimedia.org/r/501011 (owner: Jforrester)
[17:36:32] <wikibugs>	 (CR) jerkins-bot: [V: -1] Invariant config cleanup: X - Extensions loaded on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/501012 (owner: Jforrester)
[17:37:10] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] "Makes sense. Still doesn't cover the fact any program executed as the jenkins user can read that, which looks to me like the biggest issue" [puppet] - https://gerrit.wikimedia.org/r/501001 (https://phabricator.wikimedia.org/T219991) (owner: Hashar)
[17:37:42] <wikibugs>	 (CR) Hashar: [C: -1] "That is apparently done by the Debian package (per Joe)" [puppet] - https://gerrit.wikimedia.org/r/501013 (owner: Hashar)
[17:39:52] <icinga-wm>	 RECOVERY - Hadoop Namenode - Stand By on an-master1002 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.namenode.NameNode https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[17:40:03] <volans>	 thanks elukey !
[17:40:15] <elukey>	 there was a job hammering the namenodes :(
[17:40:20] <elukey>	 we just killed it
[17:41:04] <apergos>	 huh
[17:41:18] <apergos>	 what was it?
[17:41:56] <elukey>	 it seems a job that creates a ton of files on hdfs
[17:42:02] <elukey>	 from a researcher
[17:42:09] <elukey>	 we are going to follow up with him
[17:42:13] <elukey>	 I have little context
[17:43:09] <wikibugs>	 (CR) Hashar: [C: +1] "So the Debian package 'postinst' does not set the adm group." [puppet] - https://gerrit.wikimedia.org/r/501013 (owner: Hashar)
[17:44:45] <herron>	 !log shortly postponing restarts of eventbus and kafka services for security updates due to unrelated firefighting - repooling kafka1001
[17:44:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:38] <wikibugs>	 (PS10) Giuseppe Lavagetto: Add an update action [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/487793
[17:45:40] <wikibugs>	 (PS1) Giuseppe Lavagetto: Move pulling logic to us, away from the docker daemon [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/501017
[17:46:20] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[17:46:58] <wikibugs>	 (CR) jerkins-bot: [V: -1] Move pulling logic to us, away from the docker daemon [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/501017 (owner: Giuseppe Lavagetto)
[17:47:37] <wikibugs>	 (PS2) Herron: jenkins: prevent reading IRC passwords [puppet] - https://gerrit.wikimedia.org/r/501001 (https://phabricator.wikimedia.org/T219991) (owner: Hashar)
[17:47:41] <wikibugs>	 (PS2) Hashar: jenkins: ensure secrets and logs are only readable by jenkins [puppet] - https://gerrit.wikimedia.org/r/501013
[17:48:36] <wikibugs>	 (PS9) Andrew Bogott: labs puppetmaster migration: Puppet role for encapi/labspuppet DB hosts [puppet] - https://gerrit.wikimedia.org/r/500844 (owner: Alex Monk)
[17:48:46] <wikibugs>	 (CR) Herron: [C: +2] jenkins: prevent reading IRC passwords [puppet] - https://gerrit.wikimedia.org/r/501001 (https://phabricator.wikimedia.org/T219991) (owner: Hashar)
[17:50:42] <wikibugs>	 (PS10) Andrew Bogott: labs puppetmaster migration: Puppet role for encapi/labspuppet DB hosts [puppet] - https://gerrit.wikimedia.org/r/500844 (owner: Alex Monk)
[17:52:18] <wikibugs>	 (CR) Andrew Bogott: [C: +2] labs puppetmaster migration: Puppet role for encapi/labspuppet DB hosts [puppet] - https://gerrit.wikimedia.org/r/500844 (owner: Alex Monk)
[17:52:48] <wikibugs>	 Operations, Performance-Team, Traffic: Some load.php requests failing due to "ERR_SPDY_PROTOCOL_ERROR 200" - https://phabricator.wikimedia.org/T220022 (Krinkle)
[17:53:47] <wikibugs>	 (CR) Gilles: "Ok,yeah, I'll have to figure out another way to get the headers apache mod installed." [puppet] - https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417) (owner: Gilles)
[17:54:56] <wikibugs>	 (CR) Paladox: [C: +2] Gerrit 2.15.12 (update core only) [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/501014 (owner: Thcipriani)
[17:56:48] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[17:57:05] <elukey>	 !log restart hadoop-hdfs-namenode on an-master1001 as precautionary measure after the outage (currently standby)
[17:57:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:00:04] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190403T1800)
[18:00:04] <jouncebot>	 thcipriani, brennen, and paladox: #bothumor My software never has bugs. It just develops random features. Rise for Gerrit Core 2.15.12 Upgrade. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190403T1800).
[18:00:32] * paladox here
[18:00:33] * thcipriani does
[18:01:22] <wikibugs>	 (PS3) Herron: jenkins: ensure secrets and logs are only readable by jenkins [puppet] - https://gerrit.wikimedia.org/r/501013 (owner: Hashar)
[18:03:19] <wikibugs>	 (CR) Herron: [C: +2] jenkins: ensure secrets and logs are only readable by jenkins [puppet] - https://gerrit.wikimedia.org/r/501013 (owner: Hashar)
[18:04:18] <wikibugs>	 (CR) Thcipriani: [V: +2] Gerrit 2.15.12 (update core only) [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/501014 (owner: Thcipriani)
[18:04:37] <hashar>	 thcipriani: can you old the upgrade just for a few minutes? ;D
[18:04:57] <hashar>	 I got some patching for jenkins going on :]
[18:05:31] <thcipriani>	 hashar: sure ping me when I'm clear
[18:05:58] <hashar>	 we are running puppet on the hosts
[18:07:24] <hashar>	 thcipriani: all set thanks
[18:07:41] <thcipriani>	 hashar: cool, going ahead with the update now
[18:07:46] <hashar>	 good luck!
[18:08:59] <thcipriani>	 thanks :)
[18:09:26] <logmsgbot>	 !log thcipriani@deploy1001 Started deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on gerrit2001 only
[18:09:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:09:37] <logmsgbot>	 !log thcipriani@deploy1001 Finished deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on gerrit2001 only (duration: 00m 11s)
[18:09:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:46] <logmsgbot>	 !log thcipriani@deploy1001 Started deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on cobalt (restart to follow)
[18:11:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:57] <logmsgbot>	 !log thcipriani@deploy1001 Finished deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on cobalt (restart to follow) (duration: 00m 11s)
[18:11:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:20] <thcipriani>	 !log restarting gerrit for 2.15.12 update
[18:12:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:52] <thcipriani>	 !log gerrit back on 2.15.12
[18:14:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:15:12] <James_F>	 Yay.
[18:15:20] <icinga-wm>	 PROBLEM - puppet last run on stat1006 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 2 minutes ago with 4 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config],Exec[git_pull_mediawiki/event-schemas],Exec[git_pull_statistics_mediawiki],Exec[git_pull_analytics/reportupdater]
[18:17:57] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): Enable UrlShortener in mediawikiwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/500910 (https://phabricator.wikimedia.org/T108557) (owner: Ladsgroup)
[18:18:04] <icinga-wm>	 PROBLEM - puppet last run on db1125 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config]
[18:19:14] <icinga-wm>	 PROBLEM - puppet last run on webperf1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_performance/docroot]
[18:23:13] <James_F>	 thcipriani: Hmm. Phab isn't letting me create a task via https://phabricator.wikimedia.org/maniphest/task/edit/form/3/ without explicitly setting a number of points. Did we change something? It used to work fine.
[18:26:20] <thcipriani>	 James_F: hrm, I'm unware of any updates to phab that have happened recently (although twentyafterfour mmay know better). Krinkle updated the form on the 28th, but I can't figure out what changes were made looking at the ui.
[18:26:52] <wikibugs>	 (PS4) Elukey: cumin: add more hadoop-related aliases [puppet] - https://gerrit.wikimedia.org/r/500967 (https://phabricator.wikimedia.org/T218343)
[18:26:58] <twentyafterfour>	 James_F: no updates to phab that I'm aware of either.. I'll look at it
[18:27:10] <James_F>	 thcipriani: Not urgent at all, just confusing.
[18:27:51] <Krinkle>	 I changed the label from "Create Task (Advanced)" to "New Task (Advanced)".
[18:28:18] <Krinkle>	 I don't see a configuratin option to declare whether a field should be required to be non-empty or not.
[18:28:29] <Krinkle>	 That might've been changed in upstream code unintentionally or in our extension of it.
[18:28:34] <wikibugs>	 (CR) Elukey: [C: +2] cumin: add more hadoop-related aliases [puppet] - https://gerrit.wikimedia.org/r/500967 (https://phabricator.wikimedia.org/T218343) (owner: Elukey)
[18:29:37] * James_F nods.
[18:42:16] <wikibugs>	 (PS2) Bstorm: sonofgridengine: make tools-checker hosts submit hosts [puppet] - https://gerrit.wikimedia.org/r/500535 (https://phabricator.wikimedia.org/T219817)
[18:43:56] <icinga-wm>	 RECOVERY - puppet last run on db1125 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:45:08] <icinga-wm>	 RECOVERY - puppet last run on webperf1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[18:46:16] <icinga-wm>	 RECOVERY - puppet last run on stat1006 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[18:47:06] <wikibugs>	 (PS3) Jforrester: Invariant config cleanup: III - SVG rendering [mediawiki-config] - https://gerrit.wikimedia.org/r/501005
[18:47:08] <wikibugs>	 (PS3) Jforrester: Invariant config cleanup: IV - DJVU rendering [mediawiki-config] - https://gerrit.wikimedia.org/r/501006
[18:47:10] <wikibugs>	 (PS3) Jforrester: Invariant config cleanup: V - Notifications matters [mediawiki-config] - https://gerrit.wikimedia.org/r/501007
[18:47:12] <wikibugs>	 (PS3) Jforrester: Invariant config cleanup: VI - Watchlist default setting [mediawiki-config] - https://gerrit.wikimedia.org/r/501008
[18:47:14] <wikibugs>	 (PS3) Jforrester: Invariant config cleanup: VII - RL local storage setting [mediawiki-config] - https://gerrit.wikimedia.org/r/501009
[18:47:16] <wikibugs>	 (PS3) Jforrester: Invariant config cleanup: VIII - ULS logging [mediawiki-config] - https://gerrit.wikimedia.org/r/501010
[18:47:18] <wikibugs>	 (PS3) Jforrester: Invariant config cleanup: IX - RightsIcon [mediawiki-config] - https://gerrit.wikimedia.org/r/501011
[18:47:20] <wikibugs>	 (PS3) Jforrester: Invariant config cleanup: X - Extensions loaded on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/501012
[18:50:34] <wikibugs>	 Operations, cloud-services-team, serviceops, Core Platform Team Backlog (Watching / External), and 2 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (jijiki)
[18:50:38] <wikibugs>	 (PS27) Andrew Bogott: wmcs: Migrate tools-checker to Stretch [puppet] - https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243) (owner: BryanDavis)
[18:52:24] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmcs: Migrate tools-checker to Stretch [puppet] - https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243) (owner: BryanDavis)
[18:55:13] <wikibugs>	 (CR) Jbond: tests: mark test strings with escape as raw (1 comment) [software/debmonitor] - https://gerrit.wikimedia.org/r/483131 (owner: Volans)
[18:55:40] <wikibugs>	 Operations, cloud-services-team, serviceops, Core Platform Team Backlog (Watching / External), and 2 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (jijiki)
[18:57:12] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1002 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[18:57:31] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM" [software/debmonitor] - https://gerrit.wikimedia.org/r/443367 (https://phabricator.wikimedia.org/T198592) (owner: Volans)
[18:59:00] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[19:00:04] <jouncebot>	 marxarelli: How many deployers does it take to do MediaWiki train - Americas version deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190403T1900).
[19:02:44] <icinga-wm>	 RECOVERY - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.012 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[19:03:55] <wikibugs>	 (PS3) Bstorm: sonofgridengine: make tools-checker hosts submit hosts [puppet] - https://gerrit.wikimedia.org/r/500535 (https://phabricator.wikimedia.org/T219817)
[19:05:25] <wikibugs>	 (CR) Bstorm: [C: +2] sonofgridengine: make tools-checker hosts submit hosts [puppet] - https://gerrit.wikimedia.org/r/500535 (https://phabricator.wikimedia.org/T219817) (owner: Bstorm)
[19:08:35] <wikibugs>	 (CR) Jforrester: "Found via `'(wm?g)(.*)' => \[\n\t'default' => ([^\n,]*),?( ([^\n]*))?\n\],`, with judgement as to which vary from time to time and which d" [mediawiki-config] - https://gerrit.wikimedia.org/r/501003 (owner: Jforrester)
[19:09:40] <twentyafterfour>	 James_F: I set the points field to default=0 ... maybe that helps?
[19:10:39] <twentyafterfour>	 well sort of helps: https://phabricator.wikimedia.org/T220027
[19:10:53] <twentyafterfour>	 only drawback is that shows "0 story points" at the top for no good reason
[19:11:10] <James_F>	 Yeah, difference between 0 and null…
[19:11:33] <James_F>	 This is probably an upstream change we didn't notice until now.
[19:16:32] <wikibugs>	 (PS1) Ladsgroup: varnish: allow short urls that have query [puppet] - https://gerrit.wikimedia.org/r/501032 (https://phabricator.wikimedia.org/T219986)
[19:19:50] <wikibugs>	 (PS1) Hashar: contint: deny some Jenkins entrypoint [puppet] - https://gerrit.wikimedia.org/r/501033 (https://phabricator.wikimedia.org/T219991)
[19:19:59] <wikibugs>	 (CR) Ladsgroup: varnish: allow short urls that have query (1 comment) [puppet] - https://gerrit.wikimedia.org/r/501032 (https://phabricator.wikimedia.org/T219986) (owner: Ladsgroup)
[19:20:51] <wikibugs>	 (CR) Andrew Bogott: [C: +2] contint: deny some Jenkins entrypoint [puppet] - https://gerrit.wikimedia.org/r/501033 (https://phabricator.wikimedia.org/T219991) (owner: Hashar)
[19:21:52] <wikibugs>	 (PS1) Bstorm: toolschecker: Typo fix [puppet] - https://gerrit.wikimedia.org/r/501034 (https://phabricator.wikimedia.org/T219243)
[19:22:20] <wikibugs>	 (PS2) Bstorm: toolschecker: Typo fix [puppet] - https://gerrit.wikimedia.org/r/501034 (https://phabricator.wikimedia.org/T219243)
[19:23:22] <logmsgbot>	 !log smalyshev@deploy1001 Started deploy [wdqs/wdqs@50b2af9]: Deploy new Updater for more cache-friendly update startegy
[19:23:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:23:57] <wikibugs>	 Operations, cloud-services-team, serviceops, Core Platform Team Backlog (Watching / External), and 3 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (jijiki)
[19:24:36] <wikibugs>	 (CR) Bstorm: [C: +2] toolschecker: Typo fix [puppet] - https://gerrit.wikimedia.org/r/501034 (https://phabricator.wikimedia.org/T219243) (owner: Bstorm)
[19:25:14] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[19:26:10] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[19:28:40] <wikibugs>	 (CR) Andrew Bogott: [C: -1] "This needs a change in the puppet compiler client code to search subdirs" [puppet] - https://gerrit.wikimedia.org/r/500501 (https://phabricator.wikimedia.org/T219430) (owner: Andrew Bogott)
[19:34:17] <logmsgbot>	 !log smalyshev@deploy1001 Finished deploy [wdqs/wdqs@50b2af9]: Deploy new Updater for more cache-friendly update startegy (duration: 10m 54s)
[19:34:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:19] <wikibugs>	 (PS10) Jforrester: SDC: Enable Depicts functionality on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/498145 (https://phabricator.wikimedia.org/T218913)
[19:36:00] <wikibugs>	 (PS11) Jforrester: SDC: Enable Depicts functionality on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/498145 (https://phabricator.wikimedia.org/T218913)
[19:39:06] <wikibugs>	 (PS1) Andrew Bogott: utils.facts_file: do a recursive search in the 'facts' dir [software/puppet-compiler] - https://gerrit.wikimedia.org/r/501039 (https://phabricator.wikimedia.org/T219430)
[19:39:50] <wikibugs>	 (CR) jerkins-bot: [V: -1] utils.facts_file: do a recursive search in the 'facts' dir [software/puppet-compiler] - https://gerrit.wikimedia.org/r/501039 (https://phabricator.wikimedia.org/T219430) (owner: Andrew Bogott)
[19:44:59] <wikibugs>	 (PS1) Dduvall: group1 wikis to 1.33.0-wmf.24 [mediawiki-config] - https://gerrit.wikimedia.org/r/501042
[19:45:01] <wikibugs>	 (CR) Dduvall: [C: +2] group1 wikis to 1.33.0-wmf.24 [mediawiki-config] - https://gerrit.wikimedia.org/r/501042 (owner: Dduvall)
[19:46:16] <wikibugs>	 (Merged) jenkins-bot: group1 wikis to 1.33.0-wmf.24 [mediawiki-config] - https://gerrit.wikimedia.org/r/501042 (owner: Dduvall)
[19:46:50] <marxarelli>	 bd808: rolling the train, fyi
[19:48:19] <bd808>	 marxarelli: thanks for the heads up. We haven't flipped the feature flag on for that patch yet, so hopefully it will be a very boring roll out :)
[19:48:35] <marxarelli>	 ah, good to know
[19:49:25] <logmsgbot>	 !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
[19:50:29] <stashbot>	 dduvall@deploy1001: Failed to log message to wiki. Somebody should check the error logs.
[19:50:39] <wikibugs>	 (PS1) BryanDavis: sudo: Allow root to assume any group [puppet] - https://gerrit.wikimedia.org/r/501043
[19:50:48] <marxarelli>	 !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
[19:50:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:15] <logmsgbot>	 !log dduvall@deploy1001 Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 49s)
[19:51:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:24] <icinga-wm>	 PROBLEM - HHVM rendering on mw1241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[19:52:11] <wikibugs>	 (PS1) Dmaza: Enable Partial Blocks on French and Polish wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/501044 (https://phabricator.wikimedia.org/T219327)
[19:52:16] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[19:52:18] <marxarelli>	 uh ohs
[19:52:24] <icinga-wm>	 PROBLEM - Apache HTTP on mw1241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[19:53:02] <marxarelli>	 !log massive spike in DBTransactionError ([{exception_id}] {exception_url} Wikimedia\Rdbms\DBTransactionError from line 246 of /srv/mediawiki/php-1.33.0-wmf.24/includes/libs/rdbms/lbfactory/LBFactory.php: RefreshLinksJob::runForTitle: transaction round 'RefreshLinksJob::run' already started.)
[19:53:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:16] <marxarelli>	 !log rolling back group1
[19:53:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:01] <wikibugs>	 (CR) jenkins-bot: group1 wikis to 1.33.0-wmf.24 [mediawiki-config] - https://gerrit.wikimedia.org/r/501042 (owner: Dduvall)
[19:54:17] <wikibugs>	 Operations, Prod-Kubernetes, Kubernetes: Alert "kubelet operational latencies" - https://phabricator.wikimedia.org/T219696 (akosiaris) Open→Resolved Culprit identified.  On `Thu Mar 28 15:07:55 2019` a new version of the eventgate-analytics chart was deployed to both codfw and eqiad. That new...
[19:55:26] <marxarelli>	 !log 111,185 and counting DBTransactionError for jobrunner.discovery.wmnet
[19:55:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:55:37] <wikibugs>	 Operations, Analytics, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (Miriam) OK, I can prepare a task for this, or we can start from someth...
[19:56:00] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1241 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 618 bytes in 0.216 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[19:56:06] <icinga-wm>	 RECOVERY - Apache HTTP on mw1241 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.261 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[19:56:14] <logmsgbot>	 !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: Revert group1 to 1.33.0-wmf.24
[19:56:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:56:22] <icinga-wm>	 RECOVERY - HHVM rendering on mw1241 is OK: HTTP OK: HTTP/1.1 200 OK - 74924 bytes in 0.493 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[19:56:40] <wikibugs>	 (CR) Aezell: [C: +1] "These are the ones." [mediawiki-config] - https://gerrit.wikimedia.org/r/501044 (https://phabricator.wikimedia.org/T219327) (owner: Dmaza)
[19:56:51] <marxarelli>	 !log log correction group1 reverted to 1.33.0-wmf.23
[19:56:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:57:59] <wikibugs>	 (PS13) Ayounsi: Bird anycast: add anycast_healthchecker [puppet] - https://gerrit.wikimedia.org/r/397723
[19:59:52] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[20:00:05] <jouncebot>	 cscott, arlolra, subbu, bearND, halfak, and Amir1: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190403T2000).
[20:09:18] <marxarelli>	 !log 1.33.0-wmf.24 is holding at group0 following rollback. filed T220037. cc: T206678
[20:09:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:28] <stashbot>	 T206678: 1.33.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T206678
[20:09:29] <stashbot>	 T220037: Spike in DBTransactionError following 1.33.0-wmf.24 group1 promotion - https://phabricator.wikimedia.org/T220037
[20:11:22] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1004 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[20:14:27] <logmsgbot>	 !log arlolra@deploy1001 Started deploy [parsoid/deploy@4f740e3]: Updating Parsoid to 0b3bb10
[20:14:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:17:20] <wikibugs>	 (PS2) Giuseppe Lavagetto: Move pulling logic to us, away from the docker daemon [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/501017
[20:18:27] <wikibugs>	 (CR) jerkins-bot: [V: -1] Move pulling logic to us, away from the docker daemon [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/501017 (owner: Giuseppe Lavagetto)
[20:19:20] <wikibugs>	 Operations, Analytics, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (Nuria) I wonder if we can use fashion mist to benchmark: https://resea...
[20:20:11] <logmsgbot>	 !log arlolra@deploy1001 Finished deploy [parsoid/deploy@4f740e3]: Updating Parsoid to 0b3bb10 (duration: 05m 44s)
[20:20:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:23:30] <icinga-wm>	 PROBLEM - MD RAID on elastic2048 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0
[20:23:34] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on elastic2048 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T220038
[20:23:46] <wikibugs>	 Operations, ops-codfw: Degraded RAID on elastic2048 - https://phabricator.wikimedia.org/T220038 (ops-monitoring-bot)
[20:24:24] <icinga-wm>	 PROBLEM - Check systemd state on elastic2048 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[20:26:20] <wikibugs>	 (CR) Smalyshev: [C: +1] "Deployed." [puppet] - https://gerrit.wikimedia.org/r/500359 (https://phabricator.wikimedia.org/T217897) (owner: Smalyshev)
[20:29:57] <arlolra>	 !log Updated Parsoid to 0b3bb10 (T219337)
[20:30:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:00] <stashbot>	 T219337: Port Parsoid tokenizer to PHP - https://phabricator.wikimedia.org/T219337
[20:30:50] <icinga-wm>	 RECOVERY - Check systemd state on elastic2048 is OK: OK - running: The system is fully operational
[20:34:42] <icinga-wm>	 PROBLEM - Check systemd state on elastic2048 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[20:36:22] <wikibugs>	 (PS4) KartikMistry: Add publish restrictions config for enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/495677 (https://phabricator.wikimedia.org/T217237) (owner: Petar.petkovic)
[20:36:59] * gehel is looking into elastic2048
[20:37:48] <wikibugs>	 (PS14) Ayounsi: Bird anycast: add anycast_healthchecker [puppet] - https://gerrit.wikimedia.org/r/397723
[20:39:31] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on elastic2048 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Gehel raid failed, disk read only - https://phabricator.wikimedia.org/T220038
[20:40:39] <gehel>	 !log excluding elastic2048 from cluster and depooling - T220038
[20:40:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:40:42] <stashbot>	 T220038: Degraded RAID on elastic2048 - https://phabricator.wikimedia.org/T220038
[20:43:54] <wikibugs>	 Operations, ops-codfw: Degraded RAID on elastic2048 - https://phabricator.wikimedia.org/T220038 (Gehel) Node is depooled and excluded from the cluster. @Papaul if you have a spare, feel free to do what needs doing. Ping me when done and I'll reimage.
[20:45:37] <wikibugs>	 (PS5) Gehel: Enable using revision-fetch mechanism for test & internal clusters [puppet] - https://gerrit.wikimedia.org/r/500359 (https://phabricator.wikimedia.org/T217897) (owner: Smalyshev)
[20:47:08] <wikibugs>	 (CR) Gehel: [C: +2] Enable using revision-fetch mechanism for test & internal clusters [puppet] - https://gerrit.wikimedia.org/r/500359 (https://phabricator.wikimedia.org/T217897) (owner: Smalyshev)
[20:47:35] <wikibugs>	 Operations, Core Platform Team, MediaWiki-General-or-Unknown, PHP 7.2 support: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (tstarling) Excluding the ligatures, since I think they are correct already in HHVM, th...
[20:53:03] <wikibugs>	 (PS1) Gehel: Revert "Enable using revision-fetch mechanism for test & internal clusters" [puppet] - https://gerrit.wikimedia.org/r/501054
[20:54:15] <wikibugs>	 (CR) Gehel: [C: +2] Revert "Enable using revision-fetch mechanism for test & internal clusters" [puppet] - https://gerrit.wikimedia.org/r/501054 (owner: Gehel)
[20:54:35] <wikibugs>	 (PS15) Ayounsi: Bird anycast: add anycast_healthchecker [puppet] - https://gerrit.wikimedia.org/r/397723
[20:56:46] <icinga-wm>	 PROBLEM - Hadoop Namenode - Stand By on an-master1002 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.namenode.NameNode https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[20:58:19] <wikibugs>	 Operations, Puppet, Packaging: upgrade facter and puppet across the fleet - https://phabricator.wikimedia.org/T219803 (jbond) both rapidjson and catch build find using libc++ however even using theses packages we get the above error
[20:59:26] <wikibugs>	 (PS16) Ayounsi: Bird anycast: add anycast_healthchecker [puppet] - https://gerrit.wikimedia.org/r/397723
[21:02:14] <wikibugs>	 (PS1) Gehel: wdqs: expose revision-fetch mechanism [puppet] - https://gerrit.wikimedia.org/r/501056 (https://phabricator.wikimedia.org/T217897)
[21:04:17] <wikibugs>	 (CR) Smalyshev: [C: +1] wdqs: expose revision-fetch mechanism [puppet] - https://gerrit.wikimedia.org/r/501056 (https://phabricator.wikimedia.org/T217897) (owner: Gehel)
[21:07:33] <wikibugs>	 (CR) Gehel: [C: +2] wdqs: expose revision-fetch mechanism [puppet] - https://gerrit.wikimedia.org/r/501056 (https://phabricator.wikimedia.org/T217897) (owner: Gehel)
[21:14:08] <elukey>	 checking an-master 1002 
[21:14:13] <wikibugs>	 (PS17) Ayounsi: Bird anycast: add anycast_healthchecker [puppet] - https://gerrit.wikimedia.org/r/397723
[21:16:59] <wikibugs>	 (CR) Ayounsi: "https://puppet-compiler.wmflabs.org/compiler1001/15553/dns1001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/397723 (owner: Ayounsi)
[21:19:00] <wikibugs>	 (CR) Andrew Bogott: [C: -1] "We also need to pass in a host-specific yamldir to the puppet master itself" [software/puppet-compiler] - https://gerrit.wikimedia.org/r/501039 (https://phabricator.wikimedia.org/T219430) (owner: Andrew Bogott)
[21:21:12] <icinga-wm>	 RECOVERY - Hadoop Namenode - Stand By on an-master1002 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.namenode.NameNode https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[21:21:47] <wikibugs>	 (PS6) Gilles: Make caching of static performance site explicit [puppet] - https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417)
[21:22:29] <wikibugs>	 (PS9) Bstorm: cloudstore: start refactor for role switch up around the labstores [puppet] - https://gerrit.wikimedia.org/r/500801 (https://phabricator.wikimedia.org/T209527)
[21:22:56] <wikibugs>	 Operations, Core Platform Team, MediaWiki-General-or-Unknown, PHP 7.2 support: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (tstarling) I see now that the ligatures are indeed changing, but there is only one aff...
[21:22:58] <wikibugs>	 (CR) jerkins-bot: [V: -1] Make caching of static performance site explicit [puppet] - https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417) (owner: Gilles)
[21:24:23] <wikibugs>	 (PS7) Gilles: Make caching of static performance site explicit [puppet] - https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417)
[21:25:36] <wikibugs>	 (CR) Bstorm: [C: +2] cloudstore: start refactor for role switch up around the labstores [puppet] - https://gerrit.wikimedia.org/r/500801 (https://phabricator.wikimedia.org/T209527) (owner: Bstorm)
[21:32:03] <elukey>	 !log start hadoop-hdfs-namenode on an-master1002 after outage due to big job hitting HDFS
[21:32:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:48:37] <wikibugs>	 (PS1) Bstorm: labstore: fix mistake in maintain_dbusers service [puppet] - https://gerrit.wikimedia.org/r/501066 (https://phabricator.wikimedia.org/T209527)
[21:50:41] <wikibugs>	 (CR) Bstorm: [C: +2] labstore: fix mistake in maintain_dbusers service [puppet] - https://gerrit.wikimedia.org/r/501066 (https://phabricator.wikimedia.org/T209527) (owner: Bstorm)
[21:55:03] <wikibugs>	 (PS1) Elukey: admin: temporary remove piccardi from analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/501067
[21:55:30] <elukey>	 cdanis: --^
[21:55:55] <wikibugs>	 (CR) CDanis: [C: +1] admin: temporary remove piccardi from analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/501067 (owner: Elukey)
[21:57:09] <elukey>	 thanks!
[21:57:16] <cdanis>	 np!
[21:57:49] <wikibugs>	 (CR) Elukey: [C: +2] admin: temporary remove piccardi from analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/501067 (owner: Elukey)
[22:14:13] <wikibugs>	 (PS1) Bstorm: labstore: cleanup the remaining files after Icc89332f0e779 [puppet] - https://gerrit.wikimedia.org/r/501070 (https://phabricator.wikimedia.org/T209527)
[22:29:52] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[22:41:47] <wikibugs>	 (PS1) Catrope: Set GrowthExperiments homepage config for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/501079
[22:45:39] <wikibugs>	 (PS1) Volans: icinga: sync only if config is valid and log it [puppet] - https://gerrit.wikimedia.org/r/501083
[22:45:52] <wikibugs>	 (PS1) Catrope: GrowthExperiments Homepage: configure tutorial pages [mediawiki-config] - https://gerrit.wikimedia.org/r/501084 (https://phabricator.wikimedia.org/T219395)
[22:48:00] <wikibugs>	 (PS1) Catrope: Enable Flow and Flow beta feature on zhwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/501086 (https://phabricator.wikimedia.org/T219588)
[22:48:37] <wikibugs>	 (CR) Volans: "Compiler results here:" [puppet] - https://gerrit.wikimedia.org/r/501083 (owner: Volans)
[22:51:57] <wikibugs>	 (CR) CDanis: [C: +1] icinga: sync only if config is valid and log it [puppet] - https://gerrit.wikimedia.org/r/501083 (owner: Volans)
[22:54:19] <wikibugs>	 (PS1) Catrope: GrowthExperiments: Enable homepage instrumentation [mediawiki-config] - https://gerrit.wikimedia.org/r/501091
[22:55:47] <wikibugs>	 (PS1) Catrope: Beta cluster: Enable GrowthExperiments homepage for 50% of new accounts [mediawiki-config] - https://gerrit.wikimedia.org/r/501093
[22:56:58] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[22:57:04] <wikibugs>	 (PS3) CRusnov: Break report into 3 parts and adjust the way devices are filtered [software/netbox-reports] - https://gerrit.wikimedia.org/r/499245
[22:57:49] <wikibugs>	 (CR) Catrope: [C: +2] Beta cluster: Enable GrowthExperiments homepage for 50% of new accounts [mediawiki-config] - https://gerrit.wikimedia.org/r/501093 (owner: Catrope)
[22:58:29] <wikibugs>	 (CR) Gergő Tisza: [C: +1] Add cron job to update WikimediaEditorTasks suggestions table [puppet] - https://gerrit.wikimedia.org/r/500104 (https://phabricator.wikimedia.org/T218136) (owner: Mholloway)
[22:59:01] <wikibugs>	 (Merged) jenkins-bot: Beta cluster: Enable GrowthExperiments homepage for 50% of new accounts [mediawiki-config] - https://gerrit.wikimedia.org/r/501093 (owner: Catrope)
[22:59:21] <wikibugs>	 (PS18) Ayounsi: Bird anycast: add anycast_healthchecker [puppet] - https://gerrit.wikimedia.org/r/397723
[23:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Evening SWAT (Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190403T2300).
[23:00:04] <jouncebot>	 RoanKattouw: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:26] <RoanKattouw>	 I'll SWAT since I'm the only customer
[23:00:38] <wikibugs>	 (CR) Catrope: [C: +2] Set GrowthExperiments homepage config for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/501079 (owner: Catrope)
[23:01:55] <wikibugs>	 (Merged) jenkins-bot: Set GrowthExperiments homepage config for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/501079 (owner: Catrope)
[23:03:03] <wikibugs>	 (PS4) CRusnov: Break report into 3 parts and adjust the way devices are filtered [software/netbox-reports] - https://gerrit.wikimedia.org/r/499245
[23:04:15] <wikibugs>	 (CR) jenkins-bot: Beta cluster: Enable GrowthExperiments homepage for 50% of new accounts [mediawiki-config] - https://gerrit.wikimedia.org/r/501093 (owner: Catrope)
[23:04:17] <wikibugs>	 (CR) jenkins-bot: Set GrowthExperiments homepage config for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/501079 (owner: Catrope)
[23:05:59] <RoanKattouw>	 marxarelli: In the future please push train reverts to Gerrit and merge them rather than leaving them as local patches on the deployment host
[23:06:54] <marxarelli>	 RoanKattouw: ah, my mistake
[23:07:05] <wikibugs>	 (PS1) Catrope: Revert "group1 wikis to 1.33.0-wmf.24" [mediawiki-config] - https://gerrit.wikimedia.org/r/501100
[23:07:05] <marxarelli>	 did you submit the revert already?
[23:07:12] <marxarelli>	 there it is
[23:07:12] <RoanKattouw>	 Just did
[23:07:14] <marxarelli>	 sorry about that
[23:07:20] <wikibugs>	 (CR) Catrope: [C: +2] Revert "group1 wikis to 1.33.0-wmf.24" [mediawiki-config] - https://gerrit.wikimedia.org/r/501100 (owner: Catrope)
[23:08:21] <wikibugs>	 (Merged) jenkins-bot: Revert "group1 wikis to 1.33.0-wmf.24" [mediawiki-config] - https://gerrit.wikimedia.org/r/501100 (owner: Catrope)
[23:09:06] <RoanKattouw>	 No worries, I was thrown off by git pull trying to do a merge commit at first, but once I figured out it was just a train revert it was easy to recover from
[23:12:30] <wikibugs>	 (PS1) CRusnov: puppetdb_microservice: Redo how it returns values [puppet] - https://gerrit.wikimedia.org/r/501104
[23:13:37] <wikibugs>	 (CR) jerkins-bot: [V: -1] puppetdb_microservice: Redo how it returns values [puppet] - https://gerrit.wikimedia.org/r/501104 (owner: CRusnov)
[23:14:28] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Configure GrowthExperiments homepage on testwiki (duration: 01m 01s)
[23:14:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:15:17] <wikibugs>	 (CR) jenkins-bot: Revert "group1 wikis to 1.33.0-wmf.24" [mediawiki-config] - https://gerrit.wikimedia.org/r/501100 (owner: Catrope)
[23:15:39] <wikibugs>	 (PS2) CRusnov: puppetdb_microservice: Redo how it returns values [puppet] - https://gerrit.wikimedia.org/r/501104
[23:16:18] <wikibugs>	 (CR) Catrope: [C: +2] GrowthExperiments Homepage: configure tutorial pages [mediawiki-config] - https://gerrit.wikimedia.org/r/501084 (https://phabricator.wikimedia.org/T219395) (owner: Catrope)
[23:16:35] <wikibugs>	 (CR) jerkins-bot: [V: -1] GrowthExperiments Homepage: configure tutorial pages [mediawiki-config] - https://gerrit.wikimedia.org/r/501084 (https://phabricator.wikimedia.org/T219395) (owner: Catrope)
[23:16:54] <wikibugs>	 (PS2) Catrope: GrowthExperiments Homepage: configure tutorial pages [mediawiki-config] - https://gerrit.wikimedia.org/r/501084 (https://phabricator.wikimedia.org/T219395)
[23:17:02] <wikibugs>	 (CR) Catrope: [C: +2] GrowthExperiments Homepage: configure tutorial pages [mediawiki-config] - https://gerrit.wikimedia.org/r/501084 (https://phabricator.wikimedia.org/T219395) (owner: Catrope)
[23:18:12] <wikibugs>	 (Merged) jenkins-bot: GrowthExperiments Homepage: configure tutorial pages [mediawiki-config] - https://gerrit.wikimedia.org/r/501084 (https://phabricator.wikimedia.org/T219395) (owner: Catrope)
[23:18:33] <logmsgbot>	 !log catrope@deploy1001 sync-file aborted: (no justification provided) (duration: 00m 00s)
[23:18:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:20:42] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Configure GrowthExperiments homepage tutorial pages on cswiki, kowiki, viwiki (dark deploy) (duration: 00m 59s)
[23:20:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:02] <wikibugs>	 (PS2) Catrope: GrowthExperiments: Enable homepage instrumentation [mediawiki-config] - https://gerrit.wikimedia.org/r/501091
[23:22:08] <wikibugs>	 (CR) Catrope: [C: +2] GrowthExperiments: Enable homepage instrumentation [mediawiki-config] - https://gerrit.wikimedia.org/r/501091 (owner: Catrope)
[23:23:53] <wikibugs>	 (Merged) jenkins-bot: GrowthExperiments: Enable homepage instrumentation [mediawiki-config] - https://gerrit.wikimedia.org/r/501091 (owner: Catrope)
[23:26:00] <wikibugs>	 (PS1) Catrope: Fix missing wg prefix [mediawiki-config] - https://gerrit.wikimedia.org/r/501110
[23:26:05] <wikibugs>	 (PS5) CRusnov: Break report into 3 parts and adjust the way devices are filtered [software/netbox-reports] - https://gerrit.wikimedia.org/r/499245
[23:26:25] <wikibugs>	 (CR) jenkins-bot: GrowthExperiments Homepage: configure tutorial pages [mediawiki-config] - https://gerrit.wikimedia.org/r/501084 (https://phabricator.wikimedia.org/T219395) (owner: Catrope)
[23:26:27] <wikibugs>	 (CR) jenkins-bot: GrowthExperiments: Enable homepage instrumentation [mediawiki-config] - https://gerrit.wikimedia.org/r/501091 (owner: Catrope)
[23:28:16] <wikibugs>	 (PS6) CRusnov: Break report into 3 parts and adjust the way devices are filtered [software/netbox-reports] - https://gerrit.wikimedia.org/r/499245
[23:28:26] <wikibugs>	 (CR) Catrope: [C: +2] Fix missing wg prefix [mediawiki-config] - https://gerrit.wikimedia.org/r/501110 (owner: Catrope)
[23:29:32] <wikibugs>	 (Merged) jenkins-bot: Fix missing wg prefix [mediawiki-config] - https://gerrit.wikimedia.org/r/501110 (owner: Catrope)
[23:37:44] <wikibugs>	 (CR) jenkins-bot: Fix missing wg prefix [mediawiki-config] - https://gerrit.wikimedia.org/r/501110 (owner: Catrope)
[23:38:01] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments homepage EventLogging on testwiki (duration: 00m 59s)
[23:38:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:38:19] <wikibugs>	 (PS7) CRusnov: Break report into parts and adjust the way devices are filtered [software/netbox-reports] - https://gerrit.wikimedia.org/r/499245
[23:39:19] <wikibugs>	 (PS2) Catrope: Enable Flow and Flow beta feature on zhwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/501086 (https://phabricator.wikimedia.org/T219588)
[23:39:25] <wikibugs>	 (CR) Catrope: [C: +2] Enable Flow and Flow beta feature on zhwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/501086 (https://phabricator.wikimedia.org/T219588) (owner: Catrope)
[23:41:11] <wikibugs>	 (Merged) jenkins-bot: Enable Flow and Flow beta feature on zhwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/501086 (https://phabricator.wikimedia.org/T219588) (owner: Catrope)
[23:41:41] <RoanKattouw>	 TimStarling: It looks like you're the only one using mwscript on deploy1001 right now, so I think it might be you that just triggered 15k error log entries of the form  ErrorException from line 0 of : PHP Warning: Class WikiPageMessageGroup has no unserializer ?
[23:42:56] <RoanKattouw>	 https://logstash.wikimedia.org/goto/2216d523464e3fce980768445c77b13b
[23:44:33] <TimStarling>	 no, I don't think so
[23:44:39] <wikibugs>	 (CR) Jforrester: "Pinging people who like these kinds of changes. ;-)" [mediawiki-config] - https://gerrit.wikimedia.org/r/501003 (owner: Jforrester)
[23:46:07] <TimStarling>	 I had an sql terminal open, not sure how that could cause a flood of messages
[23:46:13] <TimStarling>	 wasn't doing anything with it
[23:48:50] <wikibugs>	 (CR) jenkins-bot: Enable Flow and Flow beta feature on zhwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/501086 (https://phabricator.wikimedia.org/T219588) (owner: Catrope)
[23:50:41] <logmsgbot>	 !log catrope@deploy1001 Synchronized dblists/flow.dblist: Enable Flow on zhwikisource (T219588) (duration: 00m 57s)
[23:50:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:50:45] <stashbot>	 T219588: Enable "Structured Discussion" for zh.wikisource.org - https://phabricator.wikimedia.org/T219588
[23:50:54] <TimStarling>	 it was open untouched since about 21:19, hard to see how it could cause errors at 23:37
[23:51:40] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable Flow beta feature on zhwikisource (T219588) (duration: 00m 58s)
[23:51:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:55:42] <wikibugs>	 Operations, Traffic, Wikidata, Wikidata-Query-Service, and 2 others: Reduce / remove the aggessive cache busting behaviour of wdqs-updater - https://phabricator.wikimedia.org/T217897 (Smalyshev) Looks like we have problem with redirects - they can not be fetched by-revision. E.g.:  https://www.wi...