[00:00:51] <SMalyshev>	 can be deployed
[00:00:53] <logmsgbot>	 !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: I24a5469dbfd0 / T216206 for testwikidatawiki (duration: 00m 50s)
[00:00:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:00:58] <stashbot>	 T216206: Set up WikibaseLexemeCirrusSearch extension for Elastic code in WikibaseLexeme - https://phabricator.wikimedia.org/T216206
[00:01:51] <Krinkle>	 SMalyshev: Ok to proceed?
[00:01:58] <SMalyshev>	 Krinkle: yes
[00:02:01] <wikibugs>	 (CR) Krinkle: [C: +2] Enable new Lexeme search on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/499868 (owner: Smalyshev)
[00:03:17] <wikibugs>	 (Merged) jenkins-bot: Enable new Lexeme search on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/499868 (owner: Smalyshev)
[00:04:00] <Krinkle>	 SMalyshev: staged on mwdebug1002
[00:04:40] <Krinkle>	 SMalyshev: This may be unrelated but I'm seeing PHP errors from mwdebug1002
[00:04:44] <Krinkle>	 [XJ1gugpAAC4AADrz5YwAAABE] /w/index.php?search=test&title=Special%3ASearch&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns146=1   ErrorException from line 85 of /srv/mediawiki/php-1.33.0-wmf.23/extensions/ArticlePlaceholder/includes/ItemNotabilityFilter.php: PHP Notice: Undefined index: Q3519023
[00:06:37] <Krinkle>	 I guess this is https://phabricator.wikimedia.org/T207235
[00:06:41] <wikibugs>	 (CR) jenkins-bot: Enable new Lexeme search on testwikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/499867 (https://phabricator.wikimedia.org/T216206) (owner: Smalyshev)
[00:06:43] <wikibugs>	 (CR) jenkins-bot: Enable new Lexeme search on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/499868 (owner: Smalyshev)
[00:07:01] <SMalyshev>	 hmm
[00:07:23] <SMalyshev>	 Krinkle: it is search, but the code place is wrong
[00:07:33] <SMalyshev>	 it's ArticlePlaceholder
[00:08:22] <SMalyshev>	 Krinkle: are there a lot of them? When did they start?
[00:08:31] <Krinkle>	 23:59
[00:08:38] <Krinkle>	 when I synced the previous commit to test
[00:08:51] <Krinkle>	 but I'm sure it's just because there was no activity there prior and it's probably in prod as well
[00:08:52] <Krinkle>	 checking now
[00:09:18] <SMalyshev>	 hmm not sure... the fact it's search does make me suspicious but could be also a coincidence
[00:09:41] <SMalyshev>	 and it says ArticlePlaceholder which has nothing to do with what I'm doing... let me see
[00:09:48] <Krinkle>	 Yeah, it's not becoming more common
[00:09:58] <Krinkle>	 https://logstash.wikimedia.org/goto/29470c49933fb1a7e4ee3f7b119789c3
[00:10:59] <Krinkle>	 SMalyshev: proceeding to prod?
[00:11:08] <SMalyshev>	 hmm it does seem to be related to search...
[00:11:12] <SMalyshev>	 Krinkle: give me a minute
[00:11:14] <Krinkle>	 https://logstash.wikimedia.org/app/kibana#/dashboard/mwdebug1002
[00:11:17] <Krinkle>	 Okay, no worries :)
[00:12:24] <SMalyshev>	 Krinkle: I see same errors before that - days ago
[00:13:08] <SMalyshev>	 e.g. on the 22nd. So I presume it's not something we broke
[00:13:35] <SMalyshev>	 it also happens on wmf.22 according to Kibana so probably been broken for a while
[00:14:05] <Krinkle>	 Yeah
[00:14:12] <SMalyshev>	 Krinkle: also, it seems to only happen on testwikis
[00:14:12] <Krinkle>	 I've left a new trace on the task.
[00:14:51] <SMalyshev>	 which makes me say we can proceed
[00:15:11] <Krinkle>	 Okay
[00:16:05] <logmsgbot>	 !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: I8887ce013a8 (duration: 00m 51s)
[00:16:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:27] <logmsgbot>	 !log krinkle@deploy1001 Synchronized php-1.33.0-wmf.23/includes/api/ApiStashEdit.php: I35213d83a0 (duration: 00m 49s)
[00:18:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:19:43] <icinga-wm>	 PROBLEM - HHVM rendering on mwdebug1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[00:24:41] <icinga-wm>	 RECOVERY - HHVM rendering on mwdebug1002 is OK: HTTP OK: HTTP/1.1 200 OK - 78721 bytes in 0.172 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[00:24:51] <wikibugs>	 Operations, Traffic, Wikidata, Wikidata-Query-Service, and 2 others: Reduce / remove the aggessive cache busting behaviour of wdqs-updater - https://phabricator.wikimedia.org/T217897 (Smalyshev) @Addshore btw do I understand right that constraints can not be fetched per-revision? In this case, do...
[00:34:37] <SMalyshev>	 Krinkle: thanks!
[00:41:52] <wikibugs>	 (CR) BryanDavis: [C: +1] "Pretty sure this was a typo on my part, yes." [puppet] - https://gerrit.wikimedia.org/r/499887 (owner: Andrew Bogott)
[02:08:55] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1024 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[02:14:37] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster={cache_text,cache_upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[02:14:43] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[02:15:17] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1024 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[02:18:28] <icinga-wm>	 ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt1024 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 andrew bogott I can not make icinga shut up about this! https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[02:18:28] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc maximum on cloudvirt1024 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute andrew bogott I can not make icinga shut up about this! https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[02:18:29] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc minimum on cloudvirt1024 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute andrew bogott I can not make icinga shut up about this! https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[02:18:33] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[02:19:41] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[02:50:13] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[02:50:19] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job={varnish-text,varnish-upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[02:53:21] <icinga-wm>	 PROBLEM - puppet last run on snapshot1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:00:29] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[03:00:35] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[03:05:35] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster={cache_text,cache_upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[03:05:41] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job={varnish-text,varnish-upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[03:09:23] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[03:09:29] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[03:19:24] <icinga-wm>	 RECOVERY - puppet last run on snapshot1008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[03:20:02] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[03:20:10] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[03:21:54] <icinga-wm>	 PROBLEM - puppet last run on relforge1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:25:12] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[03:25:20] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[03:45:16] <icinga-wm>	 PROBLEM - puppet last run on druid1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:52:24] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:52:41] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on cp4032 is CRITICAL: connect to address 10.128.0.132 port 5666: No route to host nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T219586
[03:52:46] <wikibugs>	 Operations, ops-ulsfo: Degraded RAID on cp4032 - https://phabricator.wikimedia.org/T219586 (ops-monitoring-bot)
[03:53:05] <icinga-wm>	 RECOVERY - puppet last run on relforge1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[03:53:09] <icinga-wm>	 PROBLEM - BFD status on cr1-codfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[03:54:27] <icinga-wm>	 RECOVERY - BFD status on cr1-codfw is OK: OK: UP: 10 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[03:54:37] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job={varnish-text,varnish-upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[03:55:41] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster={cache_text,cache_upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[04:04:35] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[04:04:47] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[04:11:39] <icinga-wm>	 RECOVERY - puppet last run on druid1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:12:03] <icinga-wm>	 RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/Help:Toolforge/Monitoring
[04:12:35] <icinga-wm>	 PROBLEM - puppet last run on labpuppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:27:29] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster={cache_text,cache_upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[04:27:39] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[04:28:05] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:31:09] <icinga-wm>	 PROBLEM - puppet last run on ms-be1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:31:29] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[04:32:35] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[04:35:03] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:35:35] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:35:53] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:38:53] <icinga-wm>	 RECOVERY - puppet last run on labpuppetmaster1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:40:05] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:47:27] <icinga-wm>	 PROBLEM - puppet last run on kubernetes1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:48:35] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:55:47] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:56:39] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:57:27] <icinga-wm>	 RECOVERY - puppet last run on ms-be1035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:03:59] <icinga-wm>	 PROBLEM - puppet last run on db1121 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:13:37] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2002 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:13:45] <icinga-wm>	 RECOVERY - puppet last run on kubernetes1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:14:33] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster={cache_text,cache_upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:14:43] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job={varnish-text,varnish-upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[05:19:39] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:19:49] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[05:23:37] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[05:24:41] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster={cache_text,cache_upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:25:25] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:27:53] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:28:29] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:28:33] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:28:45] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[05:30:17] <icinga-wm>	 RECOVERY - puppet last run on db1121 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:34:19] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:41:57] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:42:39] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1075 [mediawiki-config] - https://gerrit.wikimedia.org/r/499970
[05:44:13] <wikibugs>	 (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1075 [mediawiki-config] - https://gerrit.wikimedia.org/r/499970 (owner: Marostegui)
[05:45:36] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1075 [mediawiki-config] - https://gerrit.wikimedia.org/r/499970 (owner: Marostegui)
[05:46:05] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Depool db1075 [mediawiki-config] - https://gerrit.wikimedia.org/r/499970 (owner: Marostegui)
[05:46:48] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1075 (duration: 00m 52s)
[05:46:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:09] <marostegui>	 !log Remove labsdb1004 and labsdb1005 from tendril - T216749
[05:47:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:12] <stashbot>	 T216749: Reclaim/Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet as soon as they are ready - https://phabricator.wikimedia.org/T216749
[05:49:26] <marostegui>	 !log Disable notifications on labsdb1004 and labsdb1005 - T216749
[05:49:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:35] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:59:00] <wikibugs>	 Operations, Data-Services, decommission, Patch-For-Review, cloud-services-team (Kanban): Reclaim/Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet as soon as they are ready - https://phabricator.wikimedia.org/T216749 (Marostegui) @Bstorm I have removed the hosts from Tendril (ten...
[06:01:25] <wikibugs>	 Operations, Data-Services, decommission, Patch-For-Review, cloud-services-team (Kanban): Reclaim/Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet as soon as they are ready - https://phabricator.wikimedia.org/T216749 (Marostegui)
[06:14:19] <wikibugs>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool db1075" [mediawiki-config] - https://gerrit.wikimedia.org/r/499971
[06:14:49] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 241, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:15:05] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:15:28] <wikibugs>	 (PS10) Vgutierrez: acme_chief: Provide OCSP stapling support [puppet] - https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705)
[06:15:30] <wikibugs>	 (PS14) Vgutierrez: Allow acme-chief to provide unified cert [puppet] - https://gerrit.wikimedia.org/r/497929 (https://phabricator.wikimedia.org/T182927) (owner: Alex Monk)
[06:15:32] <wikibugs>	 (PS7) Vgutierrez: hieradata: Deploy acme-chief unified certificate in eqsin cp servers [puppet] - https://gerrit.wikimedia.org/r/499780 (https://phabricator.wikimedia.org/T213705)
[06:15:34] <wikibugs>	 (PS2) Vgutierrez: nagios_common: provide check_ssl_unified variants for LE certs [puppet] - https://gerrit.wikimedia.org/r/499823 (https://phabricator.wikimedia.org/T213705)
[06:15:36] <wikibugs>	 (PS4) Vgutierrez: cache: serve wikiba.se traffic using cache::canary servers [puppet] - https://gerrit.wikimedia.org/r/499825 (https://phabricator.wikimedia.org/T213705)
[06:15:58] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1075" [mediawiki-config] - https://gerrit.wikimedia.org/r/499971 (owner: Marostegui)
[06:18:45] <wikibugs>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1075" [mediawiki-config] - https://gerrit.wikimedia.org/r/499971 (owner: Marostegui)
[06:19:45] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1075 (duration: 00m 50s)
[06:19:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:17] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:28:03] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool pc1009 [mediawiki-config] - https://gerrit.wikimedia.org/r/499972
[06:29:31] <wikibugs>	 (CR) Marostegui: [C: +2] db-eqiad.php: Depool pc1009 [mediawiki-config] - https://gerrit.wikimedia.org/r/499972 (owner: Marostegui)
[06:29:39] <icinga-wm>	 PROBLEM - puppet last run on cloudvirt1029 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt-upgrade-activity]
[06:29:49] <wikibugs>	 (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1075" [mediawiki-config] - https://gerrit.wikimedia.org/r/499971 (owner: Marostegui)
[06:30:28] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool pc1009 [mediawiki-config] - https://gerrit.wikimedia.org/r/499972 (owner: Marostegui)
[06:30:41] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Depool pc1009 [mediawiki-config] - https://gerrit.wikimedia.org/r/499972 (owner: Marostegui)
[06:31:23] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:32:06] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool pc1009 (duration: 00m 50s)
[06:32:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:51] <wikibugs>	 (CR) Vgutierrez: "pcc looks good on existing acme-chief clients: https://puppet-compiler.wmflabs.org/compiler1002/15427/" [puppet] - https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) (owner: Vgutierrez)
[06:39:39] <icinga-wm>	 PROBLEM - MariaDB Slave IO: pc3 on pc2009 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2003, Errmsg: error reconnecting to master repl@pc1009.eqiad.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: Cant connect to MySQL server on pc1009.eqiad.wmnet (111 Connection refused) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[06:40:00] <marostegui>	 ^ that is me
[06:41:03] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:41:04] <marostegui>	 !log Upgrade pc1009
[06:41:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:28] <wikibugs>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool pc1009" [mediawiki-config] - https://gerrit.wikimedia.org/r/499973
[06:42:56] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool pc1009" [mediawiki-config] - https://gerrit.wikimedia.org/r/499973 (owner: Marostegui)
[06:43:25] <icinga-wm>	 RECOVERY - MariaDB Slave IO: pc3 on pc2009 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[06:43:53] <wikibugs>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool pc1009" [mediawiki-config] - https://gerrit.wikimedia.org/r/499973 (owner: Marostegui)
[06:44:54] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool pc1009 (duration: 00m 49s)
[06:44:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:23] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:48:25] <wikibugs>	 (PS2) Gilles: Element Timing for Images and Layout Stability on ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/499152 (https://phabricator.wikimedia.org/T216598)
[06:49:08] <wikibugs>	 (PS11) Vgutierrez: acme_chief: Provide OCSP stapling support [puppet] - https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705)
[06:49:10] <wikibugs>	 (PS15) Vgutierrez: Allow acme-chief to provide unified cert [puppet] - https://gerrit.wikimedia.org/r/497929 (https://phabricator.wikimedia.org/T182927) (owner: Alex Monk)
[06:49:13] <wikibugs>	 (PS8) Vgutierrez: hieradata: Deploy acme-chief unified certificate in eqsin cp servers [puppet] - https://gerrit.wikimedia.org/r/499780 (https://phabricator.wikimedia.org/T213705)
[06:49:15] <wikibugs>	 (PS3) Vgutierrez: nagios_common: provide check_ssl_unified variants for LE certs [puppet] - https://gerrit.wikimedia.org/r/499823 (https://phabricator.wikimedia.org/T213705)
[06:49:17] <wikibugs>	 (PS5) Vgutierrez: cache: serve wikiba.se traffic using cache::canary servers [puppet] - https://gerrit.wikimedia.org/r/499825 (https://phabricator.wikimedia.org/T213705)
[06:49:19] <wikibugs>	 (PS1) Vgutierrez: acme_chief: Allow cp1008 to fetch the unified certificate [puppet] - https://gerrit.wikimedia.org/r/499974 (https://phabricator.wikimedia.org/T213705)
[06:49:21] <wikibugs>	 (PS1) Vgutierrez: hieradata: Deploy acme-chief unified certificate in the cp1008 [puppet] - https://gerrit.wikimedia.org/r/499975 (https://phabricator.wikimedia.org/T213705)
[06:50:44] <wikibugs>	 (PS2) Vgutierrez: hieradata: Deploy acme-chief unified certificate in cp1008 [puppet] - https://gerrit.wikimedia.org/r/499975 (https://phabricator.wikimedia.org/T213705)
[06:50:46] <wikibugs>	 (PS9) Vgutierrez: hieradata: Deploy acme-chief unified certificate in eqsin cp servers [puppet] - https://gerrit.wikimedia.org/r/499780 (https://phabricator.wikimedia.org/T213705)
[06:50:48] <wikibugs>	 (PS4) Vgutierrez: nagios_common: provide check_ssl_unified variants for LE certs [puppet] - https://gerrit.wikimedia.org/r/499823 (https://phabricator.wikimedia.org/T213705)
[06:50:50] <wikibugs>	 (PS6) Vgutierrez: cache: serve wikiba.se traffic using cache::canary servers [puppet] - https://gerrit.wikimedia.org/r/499825 (https://phabricator.wikimedia.org/T213705)
[06:51:57] <wikibugs>	 (CR) Gilles: [C: +2] Element Timing for Images and Layout Stability on ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/499152 (https://phabricator.wikimedia.org/T216598) (owner: Gilles)
[06:52:00] <wikibugs>	 (CR) jenkins-bot: Revert "db-eqiad.php: Depool pc1009" [mediawiki-config] - https://gerrit.wikimedia.org/r/499973 (owner: Marostegui)
[06:55:57] <icinga-wm>	 RECOVERY - puppet last run on cloudvirt1029 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:13] <marostegui>	 !log Remove tools section from tendril by doing: update shards set display='0' where name='tools'; T216749
[06:56:13] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:56:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:56:18] <stashbot>	 T216749: Reclaim/Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet as soon as they are ready - https://phabricator.wikimedia.org/T216749
[06:57:33] <wikibugs>	 (Merged) jenkins-bot: Element Timing for Images and Layout Stability on ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/499152 (https://phabricator.wikimedia.org/T216598) (owner: Gilles)
[06:58:11] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/499977
[06:58:55] <wikibugs>	 (CR) Vgutierrez: "pcc is happy and shows almost a NOOP (just setting acme_chief => False) in all DCs for text and upload cache servers: https://puppet-compi" [puppet] - https://gerrit.wikimedia.org/r/497929 (https://phabricator.wikimedia.org/T182927) (owner: Alex Monk)
[07:01:07] <logmsgbot>	 !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T216598 T216594 Element Timing for Images and Layout Stability on ruwiki (duration: 00m 51s)
[07:01:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:01:17] <stashbot>	 T216594: Layout Stability API origin trial - https://phabricator.wikimedia.org/T216594
[07:01:18] <stashbot>	 T216598: Element Timing for Images origin trial - https://phabricator.wikimedia.org/T216598
[07:03:22] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499977 (owner: 10Marostegui)
[07:04:34] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499977 (owner: 10Marostegui)
[07:06:01] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1110 (duration: 00m 49s)
[07:06:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:06:11] <wikibugs>	 (03CR) 10Vgutierrez: "pcc shows unified being deployed in cp1008 and the upload/text servers being unaffected: https://puppet-compiler.wmflabs.org/compiler1002/" [puppet] - 10https://gerrit.wikimedia.org/r/499975 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez)
[07:06:21] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:06:29] <marostegui>	 !log Upgrade db1110
[07:06:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:14] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499978
[07:10:42] <wikibugs>	 (03PS16) 10Vgutierrez: Allow acme-chief to provide unified cert [puppet] - 10https://gerrit.wikimedia.org/r/497929 (https://phabricator.wikimedia.org/T182927) (owner: 10Alex Monk)
[07:10:45] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:10:55] <wikibugs>	 (03PS2) 10Vgutierrez: acme_chief: Allow cp1008 to fetch the unified certificate [puppet] - 10https://gerrit.wikimedia.org/r/499974 (https://phabricator.wikimedia.org/T213705)
[07:10:57] <wikibugs>	 (03PS3) 10Vgutierrez: hieradata: Deploy acme-chief unified certificate in cp1008 [puppet] - 10https://gerrit.wikimedia.org/r/499975 (https://phabricator.wikimedia.org/T213705)
[07:10:59] <wikibugs>	 (03PS10) 10Vgutierrez: hieradata: Deploy acme-chief unified certificate in eqsin cp servers [puppet] - 10https://gerrit.wikimedia.org/r/499780 (https://phabricator.wikimedia.org/T213705)
[07:11:01] <wikibugs>	 (03PS5) 10Vgutierrez: nagios_common: provide check_ssl_unified variants for LE certs [puppet] - 10https://gerrit.wikimedia.org/r/499823 (https://phabricator.wikimedia.org/T213705)
[07:11:03] <wikibugs>	 (03PS7) 10Vgutierrez: cache: serve wikiba.se traffic using cache::canary servers [puppet] - 10https://gerrit.wikimedia.org/r/499825 (https://phabricator.wikimedia.org/T213705)
[07:11:59] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:12:11] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:14:19] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Slowly repool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499978 (owner: 10Marostegui)
[07:15:43] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499978 (owner: 10Marostegui)
[07:16:07] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:16:15] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] acme_chief: Provide OCSP stapling support [puppet] - 10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez)
[07:17:05] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1110 (duration: 01m 06s)
[07:17:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:19:38] <vgutierrez>	 !log reenabling puppet in acme-chief clients after verifying NOOP in netmon2001
[07:19:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:20:41] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: More weight to db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499979
[07:22:29] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:23:17] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:26:16] <wikibugs>	 (03CR) 10jenkins-bot: Element Timing for Images and Layout Stability on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499152 (https://phabricator.wikimedia.org/T216598) (owner: 10Gilles)
[07:26:18] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: More weight to db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499979 (owner: 10Marostegui)
[07:26:23] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499977 (owner: 10Marostegui)
[07:27:19] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: More weight to db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499979 (owner: 10Marostegui)
[07:28:21] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1110 (duration: 00m 50s)
[07:28:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:25] <wikibugs>	 10Operations, 10Phabricator, 10Release-Engineering-Team, 10Wikimedia-Incident: Analyze and amend (if necessary) workflow of user reporting and detecting large regressions/outages - https://phabricator.wikimedia.org/T219589 (10jcrespo) Adding release engineering, although they should not own this, but so th...
[07:30:37] <wikibugs>	 (03PS1) 10Vgutierrez: cache: serve wikiba.se traffic using cache::text servers [puppet] - 10https://gerrit.wikimedia.org/r/499981 (https://phabricator.wikimedia.org/T213705)
[07:30:49] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:31:57] <wikibugs>	 (03CR) 10Vgutierrez: [C: 04-2] "Do not merge till I222b8ef48bf0ca2b23c091ebafd2bb933a9faa99 has been tested thoroughly" [puppet] - 10https://gerrit.wikimedia.org/r/499981 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez)
[07:34:27] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:34:27] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: More weight to db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499982
[07:34:37] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job={varnish-text,varnish-upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[07:34:40] <wikibugs>	 (03CR) 10Vgutierrez: [C: 04-2] "pcc shows the expected NOOPs in the upload cluster and the proper changes in text: https://puppet-compiler.wmflabs.org/compiler1002/15430/" [puppet] - 10https://gerrit.wikimedia.org/r/499981 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez)
[07:36:18] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499978 (owner: 10Marostegui)
[07:36:35] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: More weight to db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499979 (owner: 10Marostegui)
[07:38:25] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[07:39:31] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:43:35] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job={varnish-text,varnish-upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[07:44:41] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster={cache_text,cache_upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:45:02] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] Enable base::service_auto_restart for rsync/namenode standby [puppet] - 10https://gerrit.wikimedia.org/r/498834 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[07:48:29] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:48:39] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[07:51:21] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:51:42] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: More weight to db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499982 (owner: 10Marostegui)
[07:52:59] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: More weight to db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499982 (owner: 10Marostegui)
[07:54:08] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1110 (duration: 00m 51s)
[07:54:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:35] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: More weight to db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499982 (owner: 10Marostegui)
[07:58:42] <logmsgbot>	 !log gilles@deploy1001 Synchronized php-1.33.0-wmf.23/includes/media/MediaTransformOutput.php: T216499 Only apply high priority half the time (duration: 00m 50s)
[07:58:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:46] <stashbot>	 T216499: Priority Hints origin trial - https://phabricator.wikimedia.org/T216499
[08:00:04] <wikibugs>	 (03PS8) 10Vgutierrez: cache: serve wikiba.se traffic using cache::canary servers [puppet] - 10https://gerrit.wikimedia.org/r/499825 (https://phabricator.wikimedia.org/T213705)
[08:00:06] <wikibugs>	 (03PS2) 10Vgutierrez: cache: serve wikiba.se traffic using cache::text servers [puppet] - 10https://gerrit.wikimedia.org/r/499981 (https://phabricator.wikimedia.org/T213705)
[08:05:32] <wikibugs>	 (03PS2) 10Muehlenhoff: Enable base::service_auto_restart for rsync/namenode standby [puppet] - 10https://gerrit.wikimedia.org/r/498834 (https://phabricator.wikimedia.org/T135991)
[08:07:33] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Enable base::service_auto_restart for rsync/namenode standby [puppet] - 10https://gerrit.wikimedia.org/r/498834 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[08:10:00] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499984
[08:15:21] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:17:35] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:17:58] <wikibugs>	 (03PS34) 10Mathew.onipe: elasticsearch: add profile for icinga checks [puppet] - 10https://gerrit.wikimedia.org/r/496782 (https://phabricator.wikimedia.org/T214921)
[08:19:10] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:22:39] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: wmcs-spreadcheck: return 0 on success [puppet] - 10https://gerrit.wikimedia.org/r/499887 (owner: 10Andrew Bogott)
[08:23:57] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:23:57] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] wmcs-spreadcheck: return 0 on success [puppet] - 10https://gerrit.wikimedia.org/r/499887 (owner: 10Andrew Bogott)
[08:25:12] <wikibugs>	 (03PS35) 10Mathew.onipe: elasticsearch: add profile for icinga checks [puppet] - 10https://gerrit.wikimedia.org/r/496782 (https://phabricator.wikimedia.org/T214921)
[08:27:41] <wikibugs>	 10Operations, 10Traffic, 10netops: ulsfo <-> codfw transit link flapping causing nginx availability alerts - https://phabricator.wikimedia.org/T219591 (10elukey) p:05Triage→03High
[08:28:10] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/499355 (owner: 10Alex Monk)
[08:28:12] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Fully repool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499984 (owner: 10Marostegui)
[08:28:16] <wikibugs>	 10Operations, 10Continuous-Integration-Config, 10Patch-For-Review, 10User-zeljkofilipin: npm 6 consistently fails with "Z_DATA_ERROR: invalid distance too far back" on some repos - https://phabricator.wikimedia.org/T215562 (10MoritzMuehlenhoff) >>! In T215562#5066711, @Krinkle wrote: > As such, it is effec...
[08:28:49] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: Stop serving trusty repositories in aptly [puppet] - 10https://gerrit.wikimedia.org/r/499935 (owner: 10Muehlenhoff)
[08:29:36] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499984 (owner: 10Marostegui)
[08:30:33] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499984 (owner: 10Marostegui)
[08:30:41] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1110 (duration: 00m 50s)
[08:30:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:20] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "As of this patch, we only have 2 trusty servers in toolforge: toolschecker. And we are actively working on removing them." [puppet] - 10https://gerrit.wikimedia.org/r/499935 (owner: 10Muehlenhoff)
[08:34:13] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Depool ulsfo [dns] - 10https://gerrit.wikimedia.org/r/499987 (https://phabricator.wikimedia.org/T219591)
[08:34:18] <godog>	 elukey: ^
[08:35:03] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] Depool ulsfo [dns] - 10https://gerrit.wikimedia.org/r/499987 (https://phabricator.wikimedia.org/T219591) (owner: 10Filippo Giunchedi)
[08:35:11] <wikibugs>	 10Operations, 10Traffic, 10Wikidata, 10Wikidata-Query-Service, and 2 others: Reduce / remove the aggessive cache busting behaviour of wdqs-updater - https://phabricator.wikimedia.org/T217897 (10Addshore) >>! In T217897#5066900, @Smalyshev wrote: >> WDQS does know what the latest version of the entity that...
[08:35:40] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] Depool ulsfo [dns] - 10https://gerrit.wikimedia.org/r/499987 (https://phabricator.wikimedia.org/T219591) (owner: 10Filippo Giunchedi)
[08:36:20] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "I'm not sure about this one. Leaving it for Andrew to decide if we can merge this now or worth waiting a couple of days until all Trusty i" [puppet] - 10https://gerrit.wikimedia.org/r/499933 (owner: 10Muehlenhoff)
[08:36:42] <godog>	 !log depool ulsfo as precaution -- link repair in progress
[08:36:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:17] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:42:05] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_upload site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:42:27] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at codfw on icinga1001 is CRITICAL: job=varnish-upload site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[08:44:39] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at ulsfo on icinga1001 is CRITICAL: 52.32 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[08:48:12] <elukey>	 ah! Just in time :)
[08:50:39] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 124, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:50:43] <icinga-wm>	 PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:53:15] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 126, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:53:17] <icinga-wm>	 RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:54:19] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[08:59:41] <icinga-wm>	 PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:02:15] <icinga-wm>	 RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:03:32] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts: ` ['dbprov2002.codfw.wmnet'] ` The...
[09:05:03] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
[09:05:04] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
[09:05:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:39] <Trizek>	 Hey ops, I have a last-minute request to temporarily lift the IP cap on fr.wikipedia.org https://phabricator.wikimedia.org/T219594
[09:05:40] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
[09:05:41] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
[09:05:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:56] <Trizek>	 Last-minute as in I just got the IP for an event starting in 3 hours.
[09:08:41] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 124, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:08:41] <icinga-wm>	 PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:09:57] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 126, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:09:59] <icinga-wm>	 RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:10:30] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at codfw on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[09:10:37] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: add mitaka/stretch support for neutron server in cloudcontrol [puppet] - 10https://gerrit.wikimedia.org/r/499992 (https://phabricator.wikimedia.org/T215407)
[09:12:41] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:15:35] <wikibugs>	 (03PS2) 10Dzahn: xvfb: remove upstart support [puppet] - 10https://gerrit.wikimedia.org/r/499771
[09:16:15] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[09:16:29] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[09:16:44] <mutante>	 @seen hashar
[09:16:44] <wm-bot>	 mutante: Last time I saw hashar they were quitting the network with reason: Quit: I am a virus. Please copy paste me in your /quit message to help me propagate N/A at 3/28/2019 2:56:32 PM (18h20m12s ago)
[09:18:02] <wikibugs>	 10Operations, 10Traffic, 10netops, 10Patch-For-Review: ulsfo <-> codfw transit link flapping causing nginx availability alerts - https://phabricator.wikimedia.org/T219591 (10ema)
[09:21:27] <wikibugs>	 (03PS1) 10Dzahn: xvfb: replace base::service_unit with systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/499993 (https://phabricator.wikimedia.org/T194724)
[09:21:53] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[09:22:44] <wikibugs>	 (03CR) 10Mathew.onipe: "PCC output is Ok. Changes are expected: https://puppet-compiler.wmflabs.org/compiler1002/15432/" [puppet] - 10https://gerrit.wikimedia.org/r/496782 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe)
[09:23:43] <mutante>	 re: icinga alert on operational latencies on kubernetes1003 - the actual graph doesn't look like it, and 1003 is not the slowest?
[09:24:45] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: openstack: add mitaka/stretch support for neutron server in cloudcontrol [puppet] - 10https://gerrit.wikimedia.org/r/499992 (https://phabricator.wikimedia.org/T215407)
[09:27:21] <mutante>	 jouncebot: next
[09:27:21] <jouncebot>	 In 73 hour(s) and 2 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190401T1030)
[09:27:35] <mutante>	 #wikimedia-tech
[09:27:57] <mutante>	 nevermind, somebody was asking for a deploy there. but it's Friday
[09:31:10] <wikibugs>	 (03PS1) 10Aklapper: Add throttling rule for frwiki event [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499994
[09:32:42] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (10jcrespo) a:05jcrespo→03Papaul @papaul we need help from you.  We cannot network boot on dbprov2002 (we did on dbprov2001 already). `...
[09:32:56] <wikibugs>	 (03PS2) 10Aklapper: Add throttling rule for frwiki event [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499994 (https://phabricator.wikimedia.org/T219594)
[09:33:39] <wikibugs>	 (03PS4) 10Ladsgroup: ores: use hiera for statsd host [puppet] - 10https://gerrit.wikimedia.org/r/499875 (https://phabricator.wikimedia.org/T218567)
[09:34:09] <icinga-wm>	 PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:35:37] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), 10Patch-For-Review, and 3 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) Summary of what each team is currently wo...
[09:36:37] <wikibugs>	 (03PS36) 10Gehel: elasticsearch: add profile for icinga checks [puppet] - 10https://gerrit.wikimedia.org/r/496782 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe)
[09:37:31] <wikibugs>	 (03CR) 10Dzahn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/499992 (https://phabricator.wikimedia.org/T215407) (owner: 10Arturo Borrero Gonzalez)
[09:37:37] <mutante>	 !log restarting zuul on contint1001
[09:37:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:48] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: add mitaka/stretch support for neutron server in cloudcontrol [puppet] - 10https://gerrit.wikimedia.org/r/499992 (https://phabricator.wikimedia.org/T215407) (owner: 10Arturo Borrero Gonzalez)
[09:43:05] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "pcc https://puppet-compiler.wmflabs.org/compiler1002/15435/" [puppet] - 10https://gerrit.wikimedia.org/r/499992 (https://phabricator.wikimedia.org/T215407) (owner: 10Arturo Borrero Gonzalez)
[09:43:13] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at ulsfo on icinga1001 is OK: (C)60 le (W)70 le 70.56 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[09:51:23] <wikibugs>	 10Operations, 10CirrusSearch, 10Discovery-Search, 10Elasticsearch: Create checks that alerts on cirrussearch update lags - https://phabricator.wikimedia.org/T219601 (10Mathew.onipe)
[09:51:57] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "At least for CI, we are no more using Xvfb." [puppet] - 10https://gerrit.wikimedia.org/r/499771 (owner: 10Dzahn)
[09:58:18] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] xvfb: remove upstart support [puppet] - 10https://gerrit.wikimedia.org/r/499771 (owner: 10Dzahn)
[09:58:31] <wikibugs>	 (03PS3) 10Dzahn: xvfb: remove upstart support [puppet] - 10https://gerrit.wikimedia.org/r/499771
[09:58:59] <wikibugs>	 (03PS5) 10Ladsgroup: ores: use hiera for statsd host [puppet] - 10https://gerrit.wikimedia.org/r/499875 (https://phabricator.wikimedia.org/T218567)
[10:00:25] <icinga-wm>	 RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[10:01:05] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[10:01:27] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[10:03:32] <wikibugs>	 10Operations, 10MediaWiki-General-or-Unknown, 10PHP 7.2 support: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Joe)
[10:06:07] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[10:11:09] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster={cache_text,cache_upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:11:25] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[10:13:59] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[10:15:09] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[10:16:51] <wikibugs>	 (03PS1) 10Jcrespo: mariadb-backups: Allow remote dumps from cumin hosts [puppet] - 10https://gerrit.wikimedia.org/r/499997 (https://phabricator.wikimedia.org/T206203)
[10:17:59] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb-backups: Allow remote dumps from cumin hosts [puppet] - 10https://gerrit.wikimedia.org/r/499997 (https://phabricator.wikimedia.org/T206203) (owner: 10Jcrespo)
[10:18:28] <wikibugs>	 (03PS6) 10Ladsgroup: ores: use hiera for statsd host [puppet] - 10https://gerrit.wikimedia.org/r/499875 (https://phabricator.wikimedia.org/T218567)
[10:20:13] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[10:20:15] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[10:21:19] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:22:02] <wikibugs>	 (CR) Ladsgroup: "Now it's ready:" [puppet] - https://gerrit.wikimedia.org/r/499875 (https://phabricator.wikimedia.org/T218567) (owner: Ladsgroup)
[10:24:51] <wikibugs>	 (PS2) Dzahn: xvfb: replace base::service_unit with systemd::service [puppet] - https://gerrit.wikimedia.org/r/499993 (https://phabricator.wikimedia.org/T194724)
[10:24:55] <wikibugs>	 (PS2) Jcrespo: mariadb-backups: Allow remote dumps from cumin hosts [puppet] - https://gerrit.wikimedia.org/r/499997 (https://phabricator.wikimedia.org/T206203)
[10:26:21] <wikibugs>	 (CR) jerkins-bot: [V: -1] mariadb-backups: Allow remote dumps from cumin hosts [puppet] - https://gerrit.wikimedia.org/r/499997 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[10:27:39] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster={cache_text,cache_upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:27:51] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[10:30:31] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[10:31:49] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[10:33:06] <wikibugs>	 (PS3) Jcrespo: mariadb-backups: Allow remote dumps from cumin hosts [puppet] - https://gerrit.wikimedia.org/r/499997 (https://phabricator.wikimedia.org/T206203)
[10:35:02] <ema>	 briefly pooling cp2002's varnish-fe again to try reproduce the 503s we got earlier
[10:35:19] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:35:27] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
[10:35:28] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
[10:35:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:35:29] <wikibugs>	 (CR) Jcrespo: "Not sure about this..." [puppet] - https://gerrit.wikimedia.org/r/499997 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[10:35:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:36:06] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
[10:36:07] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
[10:36:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:36:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:27] <wikibugs>	 (PS4) Jcrespo: mariadb-backups: Allow remote dumps from cumin hosts [puppet] - https://gerrit.wikimedia.org/r/499997 (https://phabricator.wikimedia.org/T206203)
[10:37:39] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[10:37:47] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: serverpackages: mitaka: stretch: ignore python-cryptography from bpo [puppet] - https://gerrit.wikimedia.org/r/499998
[10:38:50] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: serverpackages: mitaka: stretch: ignore python-cryptography from bpo [puppet] - https://gerrit.wikimedia.org/r/499998 (owner: Arturo Borrero Gonzalez)
[10:40:11] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[10:40:51] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: openstack: serverpackages: mitaka: stretch: ignore python-cryptography from bpo [puppet] - https://gerrit.wikimedia.org/r/499998
[10:41:03] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/499998 (owner: Arturo Borrero Gonzalez)
[10:41:17] <wikibugs>	 (CR) Marostegui: "Agreed it is unclear, but let's try to see if it works so at least we know whether we have the ability to produce remote dumps or not." [puppet] - https://gerrit.wikimedia.org/r/499997 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[10:41:39] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: serverpackages: mitaka: stretch: ignore python-cryptography from bpo [puppet] - https://gerrit.wikimedia.org/r/499998 (owner: Arturo Borrero Gonzalez)
[10:42:55] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:43:00] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: serverpackages: mitaka: stretch: ignore python-cryptography from bpo [puppet] - https://gerrit.wikimedia.org/r/499998 (owner: Arturo Borrero Gonzalez)
[10:43:11] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[10:46:50] <wikibugs>	 (PS1) Ladsgroup: Add tmpSerializeEmptyListsAsObjects Wikibase repo config [mediawiki-config] - https://gerrit.wikimedia.org/r/499999 (https://phabricator.wikimedia.org/T138104)
[10:46:59] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[10:47:52] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add tmpSerializeEmptyListsAsObjects Wikibase repo config [mediawiki-config] - https://gerrit.wikimedia.org/r/499999 (https://phabricator.wikimedia.org/T138104) (owner: Ladsgroup)
[10:47:59] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:50:03] <wikibugs>	 (PS2) Ladsgroup: Add tmpSerializeEmptyListsAsObjects Wikibase repo config [mediawiki-config] - https://gerrit.wikimedia.org/r/499999 (https://phabricator.wikimedia.org/T138104)
[10:50:30] <wikibugs>	 (PS3) Arturo Borrero Gonzalez: openstack: serverpackages: mitaka: stretch: ignore python-cryptography from bpo [puppet] - https://gerrit.wikimedia.org/r/499998
[10:53:06] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/499998 (owner: Arturo Borrero Gonzalez)
[10:53:59] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[10:54:50] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: serverpackages: mitaka: stretch: ignore python-cryptography from bpo [puppet] - https://gerrit.wikimedia.org/r/499998 (owner: Arturo Borrero Gonzalez)
[11:00:33] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[11:03:57] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[11:06:13] <wikibugs>	 Operations, MediaWiki-General-or-Unknown, PHP 7.2 support: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (MoritzMuehlenhoff) >>! In T219279#5068956, @Joe wrote: > @Anomie so you're suggesting we need to complete the...
[11:07:17] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1002 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[11:09:17] <icinga-wm>	 PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:10:19] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[11:10:25] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +1] Add throttling rule for frwiki event (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/499994 (https://phabricator.wikimedia.org/T219594) (owner: Aklapper)
[11:11:01] <icinga-wm>	 PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[11:12:09] <icinga-wm>	 RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.035 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[11:14:05] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1024 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[11:16:55] <wikibugs>	 (Abandoned) Lucas Werkmeister (WMDE): Add throttling rule for frwiki event [mediawiki-config] - https://gerrit.wikimedia.org/r/499994 (https://phabricator.wikimedia.org/T219594) (owner: Aklapper)
[11:17:42] <wikibugs>	 (PS2) Giuseppe Lavagetto: profile::mediawiki::php: remove references to tideways [puppet] - https://gerrit.wikimedia.org/r/499144
[11:19:01] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "Not sure about changes to fixtures and spec but LGTM in principle" [puppet] - https://gerrit.wikimedia.org/r/496719 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[11:25:06] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] profile::mediawiki::php: remove references to tideways [puppet] - https://gerrit.wikimedia.org/r/499144 (owner: Giuseppe Lavagetto)
[11:29:39] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: admin_scripts: add missing directory dependency for root ssh key [puppet] - https://gerrit.wikimedia.org/r/500002
[11:30:44] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] docker: Remove support for trusty images [puppet] - https://gerrit.wikimedia.org/r/499929 (owner: Muehlenhoff)
[11:31:32] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: admin_scripts: add missing directory dependency for root ssh key [puppet] - https://gerrit.wikimedia.org/r/500002 (owner: Arturo Borrero Gonzalez)
[11:32:49] <icinga-wm>	 PROBLEM - puppet last run on an-worker1078 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:35:32] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] "This is great, thanks a lot. LGTM" [puppet] - https://gerrit.wikimedia.org/r/497767 (owner: Jbond)
[11:35:37] <icinga-wm>	 RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[11:36:00] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: admin_script: delete code used for renaming cleanup [puppet] - https://gerrit.wikimedia.org/r/500003
[11:36:18] <wikibugs>	 (PS2) Giuseppe Lavagetto: profile::mediawiki::php: raise mysql.connect_timeout to 3 seconds [puppet] - https://gerrit.wikimedia.org/r/499143 (https://phabricator.wikimedia.org/T211488)
[11:39:53] <icinga-wm>	 PROBLEM - Gerrit Health Check on gerrit.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://gerrit.wikimedia.org/r/config/server/healthcheck%7Estatus
[11:40:07] <_joe_>	 I did notice
[11:40:23] <icinga-wm>	 PROBLEM - Gerrit JSON on gerrit.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Gerrit%23Monitoring
[11:42:50] <tarrow>	 gerrit?
[11:42:54] <tarrow>	 ah
[11:44:11] <icinga-wm>	 PROBLEM - puppet last run on netmon1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/software/netbox-reports]
[11:45:22] <mutante>	 !log cobalt - systemctl restart gerrit
[11:45:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:45:47] <wikibugs>	 (CR) jerkins-bot: [V: -1] profile::mediawiki::php: raise mysql.connect_timeout to 3 seconds [puppet] - https://gerrit.wikimedia.org/r/499143 (https://phabricator.wikimedia.org/T211488) (owner: Giuseppe Lavagetto)
[11:47:29] <icinga-wm>	 RECOVERY - Gerrit Health Check on gerrit.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 950 bytes in 0.090 second response time https://gerrit.wikimedia.org/r/config/server/healthcheck%7Estatus
[11:47:42] <mutante>	 gerrit back for me
[11:47:59] <icinga-wm>	 RECOVERY - Gerrit JSON on gerrit.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 27453 bytes in 0.056 second response time https://wikitech.wikimedia.org/wiki/Gerrit%23Monitoring
[11:48:39] <addshore>	 same
[11:48:49] <wikibugs>	 (CR) Giuseppe Lavagetto: "recheck" [puppet] - https://gerrit.wikimedia.org/r/499143 (https://phabricator.wikimedia.org/T211488) (owner: Giuseppe Lavagetto)
[11:49:31] <icinga-wm>	 PROBLEM - puppet last run on db1125 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config]
[11:49:53] <icinga-wm>	 PROBLEM - puppet last run on webperf1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_performance/docroot]
[11:51:07] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2002 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[11:51:39] <icinga-wm>	 PROBLEM - puppet last run on webperf2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_performance/docroot]
[11:51:40] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: admin_script: delete code used for renaming cleanup [puppet] - https://gerrit.wikimedia.org/r/500003 (owner: Arturo Borrero Gonzalez)
[11:51:57] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/15445/mw1261.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/499143 (https://phabricator.wikimedia.org/T211488) (owner: Giuseppe Lavagetto)
[11:52:07] <icinga-wm>	 PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 6 minutes ago with 3 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config],Exec[git_pull_mediawiki/event-schemas],Exec[git_pull_statistics_mediawiki]
[11:52:07] <icinga-wm>	 PROBLEM - puppet last run on stat1006 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 6 minutes ago with 4 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config],Exec[git_pull_mediawiki/event-schemas],Exec[git_pull_statistics_mediawiki],Exec[git_pull_analytics/reportupdater]
[11:52:19] <wikibugs>	 (PS13) Jbond: Move qualified parameters to there correct location [puppet] - https://gerrit.wikimedia.org/r/497767
[11:52:21] <icinga-wm>	 PROBLEM - puppet last run on stat1007 is CRITICAL: CRITICAL: Puppet has 7 failures. Last run 5 minutes ago with 7 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config],Exec[git_pull_wmde/scripts],Exec[git_pull_wmde/toolkit-analyzer-build],Exec[git_pull_mediawiki/event-schemas]
[11:52:27] <wikibugs>	 (PS3) Giuseppe Lavagetto: profile::mediawiki::php: raise mysql.connect_timeout to 3 seconds [puppet] - https://gerrit.wikimedia.org/r/499143 (https://phabricator.wikimedia.org/T211488)
[11:53:07] <wikibugs>	 (CR) Jbond: [C: +2] Move qualified parameters to there correct location [puppet] - https://gerrit.wikimedia.org/r/497767 (owner: Jbond)
[11:53:13] <icinga-wm>	 PROBLEM - puppet last run on cumin2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/cookbooks]
[11:53:29] <icinga-wm>	 PROBLEM - puppet last run on notebook1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[11:53:37] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[11:53:49] <wikibugs>	 (CR) Alex Monk: "What's the deal with those hosts under 'Hosts that fail to compile when the change is applied'? do they not work in puppet-compiler?" [puppet] - https://gerrit.wikimedia.org/r/499355 (owner: Alex Monk)
[11:53:51] <mutante>	 runs puppet on some of those
[11:53:59] <icinga-wm>	 PROBLEM - puppet last run on webperf2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/software/xhgui]
[11:54:17] <jbond42>	 running puppet on cumin worked fine
[11:54:18] <wikibugs>	 (PS4) Giuseppe Lavagetto: profile::mediawiki::php: raise mysql.connect_timeout to 3 seconds [puppet] - https://gerrit.wikimedia.org/r/499143 (https://phabricator.wikimedia.org/T211488)
[11:54:29] <icinga-wm>	 PROBLEM - puppet last run on notebook1003 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 7 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config],Exec[git_pull_mediawiki/event-schemas]
[11:54:43] <jbond42>	 however i did get a 503 from gerrit a second ago so wonder if it was having issues, most the errors seem to be related to pulling
[11:54:44] <mutante>	 jbond42: it's fallout from gerrit crash. they are all expected to recover
[11:54:53] <jbond42>	 ok cool :)
[11:55:20] <mutante>	 i was just going to run it on a few to get the recoveries
[11:55:39] <jbond42>	 ack
[11:57:21] <icinga-wm>	 RECOVERY - puppet last run on stat1006 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[11:57:21] <icinga-wm>	 RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[11:58:45] <icinga-wm>	 RECOVERY - puppet last run on notebook1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[11:58:52] <wikibugs>	 (PS1) Elukey: role::piwik: simplify profile's parameters and remove dead code [puppet] - https://gerrit.wikimedia.org/r/500007 (https://phabricator.wikimedia.org/T218037)
[11:59:17] <icinga-wm>	 RECOVERY - puppet last run on webperf2002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[12:01:30] <wikibugs>	 (CR) Filippo Giunchedi: [C: -1] "See inline" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: Jbond)
[12:03:03] <wikibugs>	 (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/15446/ - no op for matomo1001" [puppet] - https://gerrit.wikimedia.org/r/500007 (https://phabricator.wikimedia.org/T218037) (owner: Elukey)
[12:03:57] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:04:06] <wikibugs>	 (PS1) Ema: ATS: unset AE:gzip [puppet] - https://gerrit.wikimedia.org/r/500011 (https://phabricator.wikimedia.org/T125938)
[12:04:27] <icinga-wm>	 RECOVERY - puppet last run on an-worker1078 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:05:13] <wikibugs>	 (PS2) Ema: ATS: unset Accept-Encoding [puppet] - https://gerrit.wikimedia.org/r/500011 (https://phabricator.wikimedia.org/T125938)
[12:05:47] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster={cache_text,cache_upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[12:05:59] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:06:05] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[12:07:08] <wikibugs>	 Operations, Analytics, EventBus, vm-requests, Services (watching): Create schema[12]00[12] (schema.svc.{eqiad,codfw}.wmnet) - https://phabricator.wikimedia.org/T219556 (Pchelolo)
[12:07:26] <wikibugs>	 (CR) Dzahn: "hashar: fyi and re: puppet cleanup." [puppet] - https://gerrit.wikimedia.org/r/499993 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[12:07:38] <wikibugs>	 Operations, Analytics, EventBus, vm-requests, and 2 others: Create schema[12]00[12] (schema.svc.{eqiad,codfw}.wmnet) - https://phabricator.wikimedia.org/T219556 (Pchelolo)
[12:08:59] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: keystone: drop jessie-backport install option [puppet] - https://gerrit.wikimedia.org/r/500013 (https://phabricator.wikimedia.org/T216497)
[12:09:37] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[12:09:53] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[12:10:33] <icinga-wm>	 RECOVERY - puppet last run on netmon1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[12:12:27] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: openstack: keystone: drop jessie-backport install option [puppet] - https://gerrit.wikimedia.org/r/500013 (https://phabricator.wikimedia.org/T216497)
[12:12:31] <wikibugs>	 (CR) Ema: [C: +2] ATS: unset Accept-Encoding [puppet] - https://gerrit.wikimedia.org/r/500011 (https://phabricator.wikimedia.org/T125938) (owner: Ema)
[12:12:43] <icinga-wm>	 RECOVERY - puppet last run on webperf2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:13:09] <wikibugs>	 (PS3) Arturo Borrero Gonzalez: openstack: keystone: drop jessie-backport install option [puppet] - https://gerrit.wikimedia.org/r/500013 (https://phabricator.wikimedia.org/T216497)
[12:13:14] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: keystone: drop jessie-backport install option [puppet] - https://gerrit.wikimedia.org/r/500013 (https://phabricator.wikimedia.org/T216497) (owner: Arturo Borrero Gonzalez)
[12:13:52] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: keystone: drop jessie-backport install option [puppet] - https://gerrit.wikimedia.org/r/500013 (https://phabricator.wikimedia.org/T216497) (owner: Arturo Borrero Gonzalez)
[12:14:07] <wikibugs>	 (PS4) Arturo Borrero Gonzalez: openstack: keystone: drop jessie-backport install option [puppet] - https://gerrit.wikimedia.org/r/500013 (https://phabricator.wikimedia.org/T216497)
[12:14:17] <icinga-wm>	 RECOVERY - puppet last run on cumin2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[12:15:16] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: keystone: drop jessie-backport install option [puppet] - https://gerrit.wikimedia.org/r/500013 (https://phabricator.wikimedia.org/T216497) (owner: Arturo Borrero Gonzalez)
[12:15:38] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] Expose rsyslog_udp_port to services configs. [puppet] - https://gerrit.wikimedia.org/r/498872 (https://phabricator.wikimedia.org/T211125) (owner: Ppchelko)
[12:15:51] <icinga-wm>	 RECOVERY - puppet last run on db1125 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[12:16:13] <icinga-wm>	 RECOVERY - puppet last run on webperf1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:16:23] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:16:23] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] Add rsyslog kafka to service nodes. [puppet] - https://gerrit.wikimedia.org/r/496813 (https://phabricator.wikimedia.org/T211125) (owner: Ppchelko)
[12:18:39] <icinga-wm>	 RECOVERY - puppet last run on stat1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:20:49] <icinga-wm>	 RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:23:52] <ema>	 !log rolling ATS restarts to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/500011/ T213263
[12:23:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:23:57] <stashbot>	 T213263: Partial cache_upload traffic switchover to ATS and switchback to Varnish - https://phabricator.wikimedia.org/T213263
[12:25:33] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: keystone: mitaka: stretch: use python-pyldap instead of python-ldap [puppet] - https://gerrit.wikimedia.org/r/500014 (https://phabricator.wikimedia.org/T215407)
[12:26:54] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: openstack: keystone: mitaka: stretch: use python-pyldap instead of python-ldap [puppet] - https://gerrit.wikimedia.org/r/500014 (https://phabricator.wikimedia.org/T215407)
[12:27:41] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:28:25] <wikibugs>	 (PS9) Jbond: jessie-backports: warn users if they try to use backports on jessie [puppet] - https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333)
[12:28:41] <wikibugs>	 (CR) Dzahn: "@integration-slave-jessie-1001 has an unrelated puppet issue:  php-xdebug : PreDepends: php-common (>= 2:69~) but 1:51~bpo8+1+wmf1 is to b" [puppet] - https://gerrit.wikimedia.org/r/499993 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[12:28:45] <wikibugs>	 (CR) Jbond: "comments addresses" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: Jbond)
[12:29:36] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: keystone: mitaka: stretch: use python-pyldap instead of python-ldap [puppet] - https://gerrit.wikimedia.org/r/500014 (https://phabricator.wikimedia.org/T215407) (owner: Arturo Borrero Gonzalez)
[12:32:33] <paladox>	 If anyone has any details that could help upstream here https://groups.google.com/forum/m/#!topic/repo-discuss/pBMh09-XJsw with the gerrit issue please add :)
[12:33:13] <wikibugs>	 (PS1) Thcipriani: Revert "Gerrit 2.15.12 release" [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/500016
[12:33:59] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:34:32] <wikibugs>	 (CR) Paladox: [V: +2 C: +2] Revert "Gerrit 2.15.12 release" [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/500016 (owner: Thcipriani)
[12:35:23] <wikibugs>	 (CR) Paladox: [C: +2] Revert "Gerrit 2.15.12 release" [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/500016 (owner: Thcipriani)
[12:36:35] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:37:13] <wikibugs>	 (CR) Thcipriani: [V: +2] Revert "Gerrit 2.15.12 release" [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/500016 (owner: Thcipriani)
[12:40:23] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:41:15] <icinga-wm>	 PROBLEM - puppet last run on elastic1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:42:01] <wikibugs>	 (PS1) Thcipriani: Revert "Revert "gerrit: Disable jgit gc"" [puppet] - https://gerrit.wikimedia.org/r/500017
[12:43:37] <wikibugs>	 (CR) Dzahn: [C: +2] Revert "Revert "gerrit: Disable jgit gc"" [puppet] - https://gerrit.wikimedia.org/r/500017 (owner: Thcipriani)
[12:46:48] <moritzm>	 !log  upgrading snapshot1005-1007/1009 to component/php72 (T218193)
[12:46:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:46:52] <stashbot>	 T218193: Switch dumps to component/php7.2 - https://phabricator.wikimedia.org/T218193
[12:46:59] <moritzm>	 !log upgrading snapshot1008 to component/php72 (T218193)
[12:47:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:45] <logmsgbot>	 !log thcipriani@deploy1001 Started deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 (gerrit2001 only)
[12:50:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:55] <logmsgbot>	 !log thcipriani@deploy1001 Finished deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 (gerrit2001 only) (duration: 00m 10s)
[12:50:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:51:15] <moritzm>	 !log removing php 7.0 packages from snapshot1008, dumps are only using 7.2 (T218193)
[12:51:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:52:42] <wikibugs>	 10Operations, 10puppet-compiler, 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban): operations-puppet-catalog-compiler-test fails due to commit message validator linter error - https://phabricator.wikimedia.org/T219615 (10hashar)
[12:52:45] <logmsgbot>	 !log thcipriani@deploy1001 Started deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 on cobalt -- restart of gerrit incoming
[12:52:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:52:56] <logmsgbot>	 !log thcipriani@deploy1001 Finished deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 on cobalt -- restart of gerrit incoming (duration: 00m 11s)
[12:52:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:53:40] <thcipriani>	 !log restarting gerrit to finish rollback to 2.15.11
[12:53:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:24] <thcipriani>	 !log gerrit running on 2.15.11
[12:55:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:50] <mutante>	 !tzag is Time Zone Appropriate Greeting - https://www.urbandictionary.com/define.php?term=TZAG
[12:55:51] <wm-bot>	 Key was added
[13:04:00] <wikibugs>	 (03PS4) 10ArielGlenn: use MediaWiki maintenance script to get db user and password [dumps] - 10https://gerrit.wikimedia.org/r/498245 (https://phabricator.wikimedia.org/T218923)
[13:04:17] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] use MediaWiki maintenance script to get db user and password [dumps] - 10https://gerrit.wikimedia.org/r/498245 (https://phabricator.wikimedia.org/T218923) (owner: 10ArielGlenn)
[13:05:56] <ema>	 !log cp2002/cp2005: repool varnish-fe for user traffic T213263
[13:05:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:59] <stashbot>	 T213263: Partial cache_upload traffic switchover to ATS and switchback to Varnish - https://phabricator.wikimedia.org/T213263
[13:06:04] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
[13:06:05] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
[13:06:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:30] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
[13:06:31] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
[13:06:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:35] <wikibugs>	 (03PS5) 10ArielGlenn: use MediaWiki maintenance script to get db user and password [dumps] - 10https://gerrit.wikimedia.org/r/498245 (https://phabricator.wikimedia.org/T218923)
[13:07:35] <icinga-wm>	 RECOVERY - puppet last run on elastic1020 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[13:10:52] <wikibugs>	 (03PS1) 10Ema: Revert "Depool ulsfo" [dns] - 10https://gerrit.wikimedia.org/r/500031 (https://phabricator.wikimedia.org/T219591)
[13:12:03] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[13:15:53] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[13:18:29] <wikibugs>	 (03PS1) 10Jcrespo: transfer.py: Allow for a 3rd transfer type: decompression [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/500043 (https://phabricator.wikimedia.org/T219631)
[13:18:51] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] transfer.py: Allow for a 3rd transfer type: decompression [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/500043 (https://phabricator.wikimedia.org/T219631) (owner: 10Jcrespo)
[13:20:32] <wikibugs>	 (03PS2) 10Jcrespo: transfer.py: Allow for a 3rd transfer type: decompression [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/500043 (https://phabricator.wikimedia.org/T219631)
[13:20:59] <wikibugs>	 (03CR) 10Filippo Giunchedi: "AFAICS link maintenance will be performed tonight, I'm not opposed to repooling ulsfo but we should make sure the link under maintenance i" [dns] - 10https://gerrit.wikimedia.org/r/500031 (https://phabricator.wikimedia.org/T219591) (owner: 10Ema)
[13:21:10] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] transfer.py: Allow for a 3rd transfer type: decompression [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/500043 (https://phabricator.wikimedia.org/T219631) (owner: 10Jcrespo)
[13:23:22] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: wmcs: introduce codfw1dev puppet code [puppet] - 10https://gerrit.wikimedia.org/r/500044 (https://phabricator.wikimedia.org/T219626)
[13:23:32] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM, see last inline comment re: ARCHIVE vs ARCHIVE_BACKPORTS" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond)
[13:29:44] <wikibugs>	 (03PS10) 10Jbond: jessie-backports: warn users if they try to use backports on jessie [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333)
[13:30:07] <wikibugs>	 (03CR) 10Jbond: jessie-backports: warn users if they try to use backports on jessie (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond)
[13:31:13] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[13:34:32] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] jessie-backports: warn users if they try to use backports on jessie [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond)
[13:34:50] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] jessie-backports: warn users if they try to use backports on jessie [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond)
[13:35:01] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[13:41:18] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] wmcs: introduce codfw1dev puppet code [puppet] - 10https://gerrit.wikimedia.org/r/500044 (https://phabricator.wikimedia.org/T219626) (owner: 10Arturo Borrero Gonzalez)
[13:43:54] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: cloudcontrol2001-dev: assign proper role [puppet] - 10https://gerrit.wikimedia.org/r/500047 (https://phabricator.wikimedia.org/T219626)
[13:46:13] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloudcontrol2001-dev: assign proper role [puppet] - 10https://gerrit.wikimedia.org/r/500047 (https://phabricator.wikimedia.org/T219626) (owner: 10Arturo Borrero Gonzalez)
[13:50:31] <wikibugs>	 10Operations, 10media-storage: swift falsely claims 404s are gzipped - https://phabricator.wikimedia.org/T219635 (10ema)
[13:50:50] <wikibugs>	 10Operations, 10media-storage: swift falsely claims 404s are gzipped - https://phabricator.wikimedia.org/T219635 (10ema) p:05Triage→03Normal
[13:53:54] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] jessie-backports: warn users if they try to use backports on jessie [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond)
[13:54:03] <wikibugs>	 (03PS11) 10Jbond: jessie-backports: warn users if they try to use backports on jessie [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333)
[13:54:09] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[13:59:15] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[14:03:42] <wikibugs>	 (03PS1) 10Mholloway: Cleanup: Remove obsolete WikimediaEditorTasks beta cluster prefs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500049
[14:07:15] <wikibugs>	 (03CR) 10Volans: admin: allow users to be removed preserving their home directories (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: 10Elukey)
[14:16:19] <wikibugs>	 (03CR) 10Marostegui: "that will be a snapshot that has already being prepared on source, right?" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/500043 (https://phabricator.wikimedia.org/T219631) (owner: 10Jcrespo)
[14:22:31] <icinga-wm>	 PROBLEM - puppet last run on analytics1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:30:11] <wikibugs>	 (03PS3) 10Dzahn: xvfb: replace base::service_unit with systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/499993 (https://phabricator.wikimedia.org/T194724)
[14:35:18] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] docker: Remove support for trusty images [puppet] - 10https://gerrit.wikimedia.org/r/499929 (owner: 10Muehlenhoff)
[14:35:34] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: docker: Remove support for trusty images [puppet] - 10https://gerrit.wikimedia.org/r/499929 (owner: 10Muehlenhoff)
[14:35:39] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] docker: Remove support for trusty images [puppet] - 10https://gerrit.wikimedia.org/r/499929 (owner: 10Muehlenhoff)
[14:42:25] <wikibugs>	 (03PS4) 10Dzahn: xvfb: replace base::service_unit with systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/499993 (https://phabricator.wikimedia.org/T194724)
[14:44:59] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: docker::baseimages: remove backports from jessie [puppet] - 10https://gerrit.wikimedia.org/r/500054 (https://phabricator.wikimedia.org/T219580)
[14:47:41] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] xvfb: replace base::service_unit with systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/499993 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn)
[14:48:42] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] "Indeed, entirely unused branching. Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/499944 (owner: 10Alex Monk)
[14:48:49] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: docker::baseimages: remove backports from jessie [puppet] - 10https://gerrit.wikimedia.org/r/500054 (https://phabricator.wikimedia.org/T219580)
[14:48:52] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: base::firewall: rm seemingly unused realm branch [puppet] - 10https://gerrit.wikimedia.org/r/499944 (owner: 10Alex Monk)
[14:54:05] <icinga-wm>	 RECOVERY - puppet last run on analytics1046 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[14:56:10] <wikibugs>	 (03CR) 10Dzahn: "thanks for the follow-up on mediawiki params, Krenair!" [puppet] - 10https://gerrit.wikimedia.org/r/499567 (owner: 10Alex Monk)
[14:56:57] <icinga-wm>	 PROBLEM - Disk space on ldap-eqiad-replica02 is CRITICAL: DISK CRITICAL - free space: / 676 MB (3% inode=96%)
[15:00:07] <mutante>	 !log ldap-eqiad-replica02 - running out of disk - apt-get clean - gzipping /var/log/debug
[15:00:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:40] <wikibugs>	 10Operations, 10Traffic: Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (10BBlack) There's some complexities here that I've been stewing on for a while, mostly noted in the original description, but I like this general direction.  Most of...
[15:04:08] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/500054 (https://phabricator.wikimedia.org/T219580) (owner: 10Giuseppe Lavagetto)
[15:05:22] <_joe_>	 thanks!
[15:05:26] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] docker::baseimages: remove backports from jessie [puppet] - 10https://gerrit.wikimedia.org/r/500054 (https://phabricator.wikimedia.org/T219580) (owner: 10Giuseppe Lavagetto)
[15:05:45] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: docker::baseimages: remove backports from jessie [puppet] - 10https://gerrit.wikimedia.org/r/500054 (https://phabricator.wikimedia.org/T219580)
[15:13:27] <icinga-wm>	 RECOVERY - Disk space on ldap-eqiad-replica02 is OK: DISK OK
[15:14:39] <_joe_>	 !log pruning old images and containers on boron
[15:14:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:21:48] <wikibugs>	 10Operations, 10Discovery-Search (Current work), 10Wikimedia-Incident: Create cookbook to reset frozen write state on elasticsearch / cirrus - https://phabricator.wikimedia.org/T219638 (10Gehel)
[15:22:25] <wikibugs>	 (03PS1) 10Gehel: Cookbook to reset frozen writes on elasticsearch / cirrus. [cookbooks] - 10https://gerrit.wikimedia.org/r/500064 (https://phabricator.wikimedia.org/T219638)
[15:23:02] <wikibugs>	 (03PS2) 10Gehel: Cookbook to reset frozen writes on elasticsearch / cirrus. [cookbooks] - 10https://gerrit.wikimedia.org/r/500064 (https://phabricator.wikimedia.org/T219638)
[15:23:20] <wikibugs>	 (03CR) 10Jcrespo: "> that will be a snapshot that has already being prepared on source," [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/500043 (https://phabricator.wikimedia.org/T219631) (owner: 10Jcrespo)
[15:24:06] <wikibugs>	 10Operations, 10CirrusSearch, 10Discovery-Search, 10Elasticsearch, 10Wikimedia-Incident: Create checks that alerts on cirrussearch update lags - https://phabricator.wikimedia.org/T219601 (10Gehel)
[15:24:17] <wikibugs>	 (03CR) 10Marostegui: "As per https://phabricator.wikimedia.org/T219631#5069579 that is this very same code?" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/500043 (https://phabricator.wikimedia.org/T219631) (owner: 10Jcrespo)
[15:25:05] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Cookbook to reset frozen writes on elasticsearch / cirrus. [cookbooks] - 10https://gerrit.wikimedia.org/r/500064 (https://phabricator.wikimedia.org/T219638) (owner: 10Gehel)
[15:25:31] <wikibugs>	 10Operations, 10Discovery-Search (Current work), 10Wikimedia-Incident: Make spicerack more robust when unfreezing writes to elasticsearch / cirrus - https://phabricator.wikimedia.org/T219640 (10Gehel)
[15:25:49] <wikibugs>	 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review, 10Wikimedia-Incident: Create cookbook to reset frozen write state on elasticsearch / cirrus - https://phabricator.wikimedia.org/T219638 (10Gehel) a:03Gehel
[15:25:55] <wikibugs>	 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review, 10Wikimedia-Incident: Create cookbook to reset frozen write state on elasticsearch / cirrus - https://phabricator.wikimedia.org/T219638 (10Gehel) p:05Triage→03High
[15:26:01] <wikibugs>	 10Operations, 10Discovery-Search (Current work), 10Wikimedia-Incident: Make spicerack more robust when unfreezing writes to elasticsearch / cirrus - https://phabricator.wikimedia.org/T219640 (10Gehel) p:05Triage→03High a:03Gehel
[15:26:51] <wikibugs>	 (03PS3) 10Gehel: Cookbook to reset frozen writes on elasticsearch / cirrus. [cookbooks] - 10https://gerrit.wikimedia.org/r/500064 (https://phabricator.wikimedia.org/T219638)
[15:28:32] <wikibugs>	 (03CR) 10Jcrespo: "> As per https://phabricator.wikimedia.org/T219631#5069579 that is" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/500043 (https://phabricator.wikimedia.org/T219631) (owner: 10Jcrespo)
[15:30:06] <wikibugs>	 (03CR) 10Marostegui: "Yeah sure, just asking if it was already functional, as in, it was already tested that works :)" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/500043 (https://phabricator.wikimedia.org/T219631) (owner: 10Jcrespo)
[15:30:09] <wikibugs>	 (03PS1) 10Muehlenhoff: Pull in kibana/logstash 5.6.15 [puppet] - 10https://gerrit.wikimedia.org/r/500066
[15:30:13] <icinga-wm>	 PROBLEM - puppet last run on elastic1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:32:16] <wikibugs>	 (03PS1) 10Gehel: Elasticsearch: make unfreezing writes more robust. [software/spicerack] - 10https://gerrit.wikimedia.org/r/500067 (https://phabricator.wikimedia.org/T219640)
[15:34:40] <wikibugs>	 (03PS6) 10ArielGlenn: use MediaWiki maintenance script to get db user and password [dumps] - 10https://gerrit.wikimedia.org/r/498245 (https://phabricator.wikimedia.org/T218923)
[15:35:34] <wikibugs>	 (03PS4) 10CRusnov: Add synchronizing nodes to ganeti-netbox sync. [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/498268
[15:35:47] <wikibugs>	 (03CR) 10CRusnov: Add synchronizing nodes to ganeti-netbox sync. (036 comments) [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/498268 (owner: 10CRusnov)
[15:36:57] <wikibugs>	 (03PS5) 10CRusnov: netbox ganeti sync: Fix path to logfiles. [puppet] - 10https://gerrit.wikimedia.org/r/499288
[15:38:19] <wikibugs>	 (03CR) 10CRusnov: [C: 03+2] netbox ganeti sync: Fix path to logfiles. [puppet] - 10https://gerrit.wikimedia.org/r/499288 (owner: 10CRusnov)
[15:40:08] <wikibugs>	 (03CR) 10EBernhardson: Elasticsearch: make unfreezing writes more robust. (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/500067 (https://phabricator.wikimedia.org/T219640) (owner: 10Gehel)
[15:43:47] <wikibugs>	 (03CR) 10Dzahn: "the URL parameters are fine now. now just: Class[Tilerator::Ui]: parameter 'sources_to_invalidate' expects a String value, got Tuple" [puppet] - 10https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523) (owner: 10MSantos)
[15:44:27] <wikibugs>	 (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1002/15447/maps1001.eqiad.wmnet/change.maps1001.eqiad.wmnet.err" [puppet] - 10https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523) (owner: 10MSantos)
[15:48:43] <XioNoX>	 !log bump ulsfo-codfw ospf link cost to 1000 - T219591
[15:48:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:48] <stashbot>	 T219591: ulsfo <-> codfw transit link flapping causing nginx availability alerts - https://phabricator.wikimedia.org/T219591
[15:50:21] <wikibugs>	 (03CR) 10Dzahn: "tested to stop and start with systemctl on both integration-slave-jessie-1001.integration and jenkins-slave-01.git , no issues" [puppet] - 10https://gerrit.wikimedia.org/r/499993 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn)
[15:50:59] <wikibugs>	 (03PS3) 10Jcrespo: transfer.py: Allow for a 3rd transfer type: decompression [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/500043 (https://phabricator.wikimedia.org/T219631)
[15:51:23] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] transfer.py: Allow for a 3rd transfer type: decompression [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/500043 (https://phabricator.wikimedia.org/T219631) (owner: 10Jcrespo)
[15:51:41] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Revert "Depool ulsfo" [dns] - 10https://gerrit.wikimedia.org/r/500031 (https://phabricator.wikimedia.org/T219591) (owner: 10Ema)
[15:51:54] <XioNoX>	 !log repool ulsfo - T219591
[15:51:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:52:42] <wikibugs>	 10Operations, 10Traffic, 10netops, 10Patch-For-Review: ulsfo <-> codfw transit link flapping causing nginx availability alerts - https://phabricator.wikimedia.org/T219591 (10ayounsi) a:03ayounsi
[15:56:33] <icinga-wm>	 RECOVERY - puppet last run on elastic1047 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[15:57:41] <wikibugs>	 (03PS2) 10Jbond: jessie-backports: remove redundant pins [puppet] - 10https://gerrit.wikimedia.org/r/499808 (https://phabricator.wikimedia.org/T219333)
[15:57:43] <wikibugs>	 (03PS1) 10Jbond: jessie-backports: remove updates from jessie bootstrap-vz config [puppet] - 10https://gerrit.wikimedia.org/r/500069 (https://phabricator.wikimedia.org/T219580)
[16:00:00] <wikibugs>	 (03PS2) 10Gehel: Elasticsearch: make unfreezing writes more robust. [software/spicerack] - 10https://gerrit.wikimedia.org/r/500067 (https://phabricator.wikimedia.org/T219640)
[16:00:08] <wikibugs>	 (03CR) 10Gehel: Elasticsearch: make unfreezing writes more robust. (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/500067 (https://phabricator.wikimedia.org/T219640) (owner: 10Gehel)
[16:00:59] <wikibugs>	 (03CR) 10EBernhardson: [C: 03+1] Elasticsearch: make unfreezing writes more robust. [software/spicerack] - 10https://gerrit.wikimedia.org/r/500067 (https://phabricator.wikimedia.org/T219640) (owner: 10Gehel)
[16:01:25] <icinga-wm>	 PROBLEM - BFD status on cr1-codfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[16:01:45] <wikibugs>	 (03PS2) 10Muehlenhoff: Pull in kibana/logstash 5.6.15 [puppet] - 10https://gerrit.wikimedia.org/r/500066
[16:02:43] <icinga-wm>	 RECOVERY - BFD status on cr1-codfw is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[16:07:22] <wikibugs>	 (03PS1) 10EBernhardson: Disable wbcs dispatching query builder on commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500070 (https://phabricator.wikimedia.org/T218954)
[16:07:31] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 58.25 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:14:17] <icinga-wm>	 PROBLEM - BFD status on cr1-codfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[16:15:35] <icinga-wm>	 RECOVERY - BFD status on cr1-codfw is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[16:19:29] <icinga-wm>	 PROBLEM - BFD status on cr1-codfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[16:20:45] <icinga-wm>	 RECOVERY - BFD status on cr1-codfw is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[16:21:42] <cdanis>	 looks like the "Varnish traffic drop" alert was just ulsfo being repooled?
[16:21:54] <chaomodus>	 seems reasonable explanation
[16:21:57] <cdanis>	 are the BFD status alerts of any concern XioNoX?
[16:23:03] <XioNoX>	 cdanis: "Varnish traffic drop" alert was just ulsfo being repooled?   <- correct
[16:23:52] <XioNoX>	 about the BFD, not a concern, that link is now the backup one
[16:26:45] <wikibugs>	 (03PS1) 10Nuria: Removing TestSearchSatisfaction from it being persisted to MySQL [puppet] - 10https://gerrit.wikimedia.org/r/500076 (https://phabricator.wikimedia.org/T216055)
[16:28:29] <wikibugs>	 (03CR) 10Nuria: "Let's wait for @bearloga to be done with moving dashboards to run on top of hadoop to execute this." [puppet] - 10https://gerrit.wikimedia.org/r/500076 (https://phabricator.wikimedia.org/T216055) (owner: 10Nuria)
[16:32:59] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is OK: (C)60 le (W)70 le 72.26 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:33:36] <apergos>	 brb
[16:39:55] <icinga-wm>	 PROBLEM - BFD status on cr1-codfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[16:41:11] <icinga-wm>	 RECOVERY - BFD status on cr1-codfw is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[16:44:36] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Contrary to what one would expect from the documentation, jessie-updates gets added anyways. See https://github.com/andsens/bootstrap-vz/b" [puppet] - 10https://gerrit.wikimedia.org/r/500069 (https://phabricator.wikimedia.org/T219580) (owner: 10Jbond)
[16:55:15] <icinga-wm>	 PROBLEM - BFD status on cr1-codfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[16:56:31] <icinga-wm>	 RECOVERY - BFD status on cr1-codfw is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[17:00:02] <wikibugs>	 (03CR) 10Smalyshev: Disable wbcs dispatching query builder on commons (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500070 (https://phabricator.wikimedia.org/T218954) (owner: 10EBernhardson)
[17:02:25] <wikibugs>	 (03PS1) 10Cparle: Add 'depicts' statements to search index on testcommons and commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500080
[17:03:07] <wikibugs>	 (03PS2) 10Cparle: Add 'depicts' statements to search index on testcommons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500080
[17:08:48] <wikibugs>	 (03CR) 10Gehel: "I think this looks reasonable. I'll do a last pass and merge this next Monday." [puppet] - 10https://gerrit.wikimedia.org/r/496782 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe)
[17:09:33] <wikibugs>	 10Operations, 10serviceops: Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (10jijiki)
[17:09:46] <wikibugs>	 (03CR) 10Gehel: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/500066 (owner: 10Muehlenhoff)
[17:10:07] <wikibugs>	 (03PS3) 10Cparle: Add 'depicts' statements to search index on testcommons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500080
[17:11:30] <wikibugs>	 (03CR) 10Eric Gardner: [C: 03+1] Add 'depicts' statements to search index on testcommons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500080 (owner: 10Cparle)
[17:12:54] <wikibugs>	 10Operations, 10serviceops, 10Services (watching): Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (10Pchelolo)
[17:23:42] <wikibugs>	 10Operations, 10serviceops, 10Services (watching): Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (10jijiki)
[17:29:48] <wikibugs>	 10Operations, 10serviceops, 10Services (watching), 10User-jijiki: Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (10jijiki)
[17:31:27] <wikibugs>	 10Operations, 10serviceops, 10Services (watching), 10User-jijiki: Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (10Pchelolo) If I understand correctly, in order to switch a particular job execution to PHP7 all we need to do is to add `Cookie: PHP_ENGINE=php7` header to the requ...
[17:41:52] <wikibugs>	 (03PS1) 10Bstorm: osmdb: set the CNAME for osmdb to the new instance in Cloud VPS [dns] - 10https://gerrit.wikimedia.org/r/500086 (https://phabricator.wikimedia.org/T219652)
[17:44:59] <wikibugs>	 (03PS10) 10MSantos: Pass flag use_nodejs10 for maps services [puppet] - 10https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523)
[17:48:34] <wikibugs>	 10Operations, 10Data-Services, 10decommission, 10Patch-For-Review, 10cloud-services-team (Kanban): Reclaim/Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet as soon as they are ready - https://phabricator.wikimedia.org/T216749 (10Bstorm) Thanks @Marostegui !
[17:55:26] <wikibugs>	 (03CR) 10Herron: [C: 03+1] Pull in kibana/logstash 5.6.15 [puppet] - 10https://gerrit.wikimedia.org/r/500066 (owner: 10Muehlenhoff)
[17:58:03] <icinga-wm>	 PROBLEM - BFD status on cr1-codfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[17:59:17] <icinga-wm>	 RECOVERY - BFD status on cr1-codfw is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[18:02:08] <XioNoX>	 I downtimed the BFD alerts
[18:13:51] <wikibugs>	 (03CR) 10EBernhardson: Disable wbcs dispatching query builder on commons (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500070 (https://phabricator.wikimedia.org/T218954) (owner: 10EBernhardson)
[18:17:11] <wikibugs>	 (03PS1) 10Bstorm: labsdb: remove old and likely unused cname for labsdb1004 [dns] - 10https://gerrit.wikimedia.org/r/500090 (https://phabricator.wikimedia.org/T216749)
[18:20:44] <wikibugs>	 (03CR) 10EBernhardson: Disable wbcs dispatching query builder on commons (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500070 (https://phabricator.wikimedia.org/T218954) (owner: 10EBernhardson)
[18:20:45] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 124, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[18:23:17] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 126, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[18:39:59] <wikibugs>	 (03CR) 10Marostegui: "I don't know if it is in use or not, but a good way to test it is to stop mysql and postgresql for a few days, if no one complains those ho" [dns] - 10https://gerrit.wikimedia.org/r/500090 (https://phabricator.wikimedia.org/T216749) (owner: 10Bstorm)
[18:44:52] <wikibugs>	 (03PS2) 10EBernhardson: Disable wbcs dispatching query builder on commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500070 (https://phabricator.wikimedia.org/T218954)
[18:45:01] <wikibugs>	 (03PS1) 10BryanDavis: wmcs: Migrate tools-checker to Stretch [puppet] - 10https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243)
[18:46:12] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmcs: Migrate tools-checker to Stretch [puppet] - 10https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243) (owner: 10BryanDavis)
[18:49:06] <wikibugs>	 (03PS2) 10BryanDavis: wmcs: Migrate tools-checker to Stretch [puppet] - 10https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243)
[18:50:03] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (10Papaul) @jcrespo the problem was that  ge-4/0/3 was already part of private1-b-codfw and not xe-4/0/3 so the install is in progress. will...
[18:50:11] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmcs: Migrate tools-checker to Stretch [puppet] - 10https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243) (owner: 10BryanDavis)
[18:53:17] <wikibugs>	 (03PS3) 10BryanDavis: wmcs: Migrate tools-checker to Stretch [puppet] - 10https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243)
[18:54:18] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmcs: Migrate tools-checker to Stretch [puppet] - 10https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243) (owner: 10BryanDavis)
[18:56:16] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (10Papaul) a:05Papaul→03jcrespo @jcrespo all yours let me know if you have any questions
[19:01:55] <dmaza>	 any idea why this is not going through after the +2? I can't find it in here (https://integration.wikimedia.org/zuul/) either
[19:04:01] <wikibugs>	 (03PS1) 10Cwhite: profile: do not mutate level for mjolnir [puppet] - 10https://gerrit.wikimedia.org/r/500099 (https://phabricator.wikimedia.org/T213899)
[19:09:26] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/500064 (https://phabricator.wikimedia.org/T219638) (owner: 10Gehel)
[19:10:16] <wikibugs>	 10Operations, 10Release Pipeline, 10Core Platform Team Kanban (Done with CPT), 10Release-Engineering-Team (Watching / External), 10Services (done): Track and install additional npm packages for all service container images - https://phabricator.wikimedia.org/T205911 (10mobrovac) 05Open→03Resolved a:...
[19:16:12] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM, a couple of questions inline, feel free to merge as needed." (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/500067 (https://phabricator.wikimedia.org/T219640) (owner: 10Gehel)
[19:31:01] <wikibugs>	 10Operations, 10Cassandra, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Backlog (Watching / External), and 2 others: Credentials needed for session storage Cassandra cluster - https://phabricator.wikimedia.org/T219560 (10mobrovac)
[19:35:13] <wikibugs>	 (03PS4) 10BryanDavis: wmcs: Migrate tools-checker to Stretch [puppet] - 10https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243)
[19:36:24] <wikibugs>	 10Operations, 10Analytics, 10EventBus, 10vm-requests, and 3 others: Create schema[12]00[12] (schema.svc.{eqiad,codfw}.wmnet) - https://phabricator.wikimedia.org/T219556 (10mobrovac)
[19:42:01] <wikibugs>	 (03PS1) 10Mholloway: Add cron job to update WikimediaEditorTasks suggestions table [puppet] - 10https://gerrit.wikimedia.org/r/500104 (https://phabricator.wikimedia.org/T218136)
[19:43:06] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add cron job to update WikimediaEditorTasks suggestions table [puppet] - 10https://gerrit.wikimedia.org/r/500104 (https://phabricator.wikimedia.org/T218136) (owner: 10Mholloway)
[19:44:11] <wikibugs>	 (03PS2) 10Mholloway: Add cron job to update WikimediaEditorTasks suggestions table [puppet] - 10https://gerrit.wikimedia.org/r/500104 (https://phabricator.wikimedia.org/T218136)
[19:53:14] <wikibugs>	 (03CR) 10Smalyshev: Disable wbcs dispatching query builder on commons (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/500070 (https://phabricator.wikimedia.org/T218954) (owner: 10EBernhardson)
[19:54:58] <wikibugs>	 10Operations, 10serviceops, 10Core Platform Team Backlog (Watching / External), 10Services (watching), 10User-jijiki: Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (10mobrovac)
[20:07:19] <wikibugs>	 (03CR) 10BryanDavis: "PCC output for icinga[12]001.wikimedia.org: https://puppet-compiler.wmflabs.org/compiler1002/15456/" [puppet] - 10https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243) (owner: 10BryanDavis)
[20:09:08] <wikibugs>	 (03CR) 10EBernhardson: [C: 03+1] Elasticsearch: make unfreezing writes more robust. (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/500067 (https://phabricator.wikimedia.org/T219640) (owner: 10Gehel)
[20:20:52] <wikibugs>	 (03PS1) 10Bstorm: labsdb: decommissioning labsdb1004/5 [puppet] - 10https://gerrit.wikimedia.org/r/500117 (https://phabricator.wikimedia.org/T216749)
[20:23:05] <wikibugs>	 (03PS2) 10Bstorm: labsdb: decommissioning labsdb1004/5 [puppet] - 10https://gerrit.wikimedia.org/r/500117 (https://phabricator.wikimedia.org/T216749)
[20:25:34] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] labsdb: decommissioning labsdb1004/5 [puppet] - 10https://gerrit.wikimedia.org/r/500117 (https://phabricator.wikimedia.org/T216749) (owner: 10Bstorm)
[20:29:38] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers
[20:29:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:08] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (duration: 00m 30s)
[20:30:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:15] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 124, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[20:31:47] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 2)
[20:31:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:32:51] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 126, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[20:35:17] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 2) (duration: 03m 30s)
[20:35:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:43:09] <wikibugs>	 10Operations, 10Cloud-VPS, 10DNS, 10Maps, and 2 others: multi-component wmflabs.org subdomains doesn't work under simple wildcard TLS cert - https://phabricator.wikimedia.org/T161256 (10TheDJ) FYI, I have configured [abc].tiles.wmflabs.org webhosts to redirect to http://tiles.wmflabs.org during {T204506}...
[20:46:12] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[20:46:44] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3)
[20:46:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:48:02] <icinga-wm>	 PROBLEM - puppet last run on restbase1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:49:58] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3) (duration: 03m 13s)
[20:49:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:01] <wikibugs>	 (03PS4) 10CRusnov: Add basic Ganeti RAPI module and tests [software/spicerack] - 10https://gerrit.wikimedia.org/r/499032
[20:55:56] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3)
[20:55:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:58:12] <icinga-wm>	 PROBLEM - puppet last run on cp1080 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:01:10] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3) (duration: 05m 14s)
[21:01:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:03:31] <icinga-wm>	 RECOVERY - puppet last run on restbase1011 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[21:07:08] <icinga-wm>	 PROBLEM - HHVM rendering on mw1285 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[21:07:11] <wikibugs>	 (03PS5) 10BryanDavis: wmcs: Migrate tools-checker to Stretch [puppet] - 10https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243)
[21:08:22] <icinga-wm>	 RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 78663 bytes in 0.363 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[21:14:14] <icinga-wm>	 PROBLEM - puppet last run on kafkamon1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:54] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[21:20:48] <icinga-wm>	 PROBLEM - puppet last run on kafka-jumbo1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:23:14] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[21:29:42] <icinga-wm>	 RECOVERY - puppet last run on cp1080 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[21:30:15] <wikibugs>	 (03PS6) 10BryanDavis: wmcs: Migrate tools-checker to Stretch [puppet] - 10https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243)
[21:40:30] <icinga-wm>	 RECOVERY - puppet last run on kafkamon1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[21:45:48] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[21:47:04] <icinga-wm>	 RECOVERY - puppet last run on kafka-jumbo1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:06:39] <bstorm_>	 !log stopped database services on labsdb1004 and labsdb1005
[22:06:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:07:24] <wikibugs>	 10Operations, 10Data-Services, 10decommission, 10Patch-For-Review, 10cloud-services-team (Kanban): Reclaim/Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet as soon as they are ready - https://phabricator.wikimedia.org/T216749 (10Bstorm) Database services (postgres and mariadb) are now shut...
[22:09:19] <wikibugs>	 10Operations, 10Data-Services, 10decommission, 10Patch-For-Review, 10cloud-services-team (Kanban): Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet as soon as they are ready - https://phabricator.wikimedia.org/T216749 (10Bstorm)
[22:15:04] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[22:24:00] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[22:28:04] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2002 is CRITICAL: instance=kubernetes2002.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[22:35:06] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[22:40:43] <wikibugs>	 10Operations, 10Data-Services, 10decommission, 10Patch-For-Review, 10cloud-services-team (Kanban): Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet - https://phabricator.wikimedia.org/T216749 (10Bstorm)
[22:45:34] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[22:46:34] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[22:46:50] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 4 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[22:48:34] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 124, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[22:49:50] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 126, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:00:22] <wikibugs>	 (03CR) 10BryanDavis: [C: 04-1] wmcs: Migrate tools-checker to Stretch (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/500095 (https://phabricator.wikimedia.org/T219243) (owner: 10BryanDavis)
[23:15:20] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2001 is CRITICAL: instance=kubernetes2001.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[23:25:48] <icinga-wm>	 PROBLEM - puppet last run on ms-be1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:36:44] <icinga-wm>	 PROBLEM - puppet last run on druid1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:42:02] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[23:44:36] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 5 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[23:51:10] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[23:56:10] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[23:57:26] <icinga-wm>	 RECOVERY - puppet last run on ms-be1026 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[23:58:44] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 5 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status