[00:59:40] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1276 is CRITICAL: CRITICAL - load average: 62.87, 29.27, 18.63
[01:03:20] <icinga-wm>	 PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb={PATCH,PUT} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:03:20] <icinga-wm>	 PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation=compareAndSwap https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:04:41] <icinga-wm>	 PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133:6443 operation=compareAndSwap https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:05:10] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1276 is OK: OK - load average: 18.55, 30.51, 23.38
[02:06:11] <icinga-wm>	 RECOVERY - etcd request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:07:01] <icinga-wm>	 RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:07:01] <icinga-wm>	 RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:28:50] <wikibugs_>	 (PS6) Zhuyifei1999: Load project name dynamically from /etc/wmcs-project [software/tools-webservice] - https://gerrit.wikimedia.org/r/430647 (https://phabricator.wikimedia.org/T190893)
[02:39:04] <wikibugs_>	 (CR) BryanDavis: [C: 1] "Guess this could be tagged as resolving T192244 as well, but I really hope people don't rely on /etc/wmcs-instancename widely." [software/tools-webservice] - https://gerrit.wikimedia.org/r/430647 (https://phabricator.wikimedia.org/T190893) (owner: Zhuyifei1999)
[02:40:10] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={container_status,create_container,list_containers,list_podsandbox,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[02:40:11] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet operation_type={container_status,create_container,list_containers,list_podsandbox,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[02:53:20] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[02:53:21] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[02:54:01] <wikibugs_>	 (CR) Zhuyifei1999: "pykube.exceptions.HTTPError: /etc/wmcs-project is not in allowed host paths nor allowed host path prefixes" [software/tools-webservice] - https://gerrit.wikimedia.org/r/430647 (https://phabricator.wikimedia.org/T190893) (owner: Zhuyifei1999)
[02:59:00] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet operation_type={list_podsandbox,podsandbox_status,remove_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:00:00] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={create_container,list_containers,list_podsandbox,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:01:10] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:02:10] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:07:40] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={container_status,create_container,list_containers,list_podsandbox,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:07:41] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet operation_type={list_podsandbox,podsandbox_status,remove_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:08:50] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:09:50] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:15:21] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={container_status,create_container,list_podsandbox,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:15:30] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet operation_type={container_status,create_container,list_podsandbox,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:17:40] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:18:40] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:23:10] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet operation_type={container_status,create_container,list_containers,list_podsandbox,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:24:11] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={container_status,create_container,list_containers,list_podsandbox,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:26:30] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:27:12] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 807.07 seconds
[03:27:31] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:28:10] <wikibugs_>	 (CR) Krinkle: Raise Scribunto maxLangCacheSize to 200 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/430068 (https://phabricator.wikimedia.org/T85461) (owner: Anomie)
[03:28:58] <wikibugs_>	 Operations, Commons, Wikimedia-SVG-rendering, media-storage, Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3891777 (Peachey88) >>! In T184664#4184441, @Verdy_p wrote: > How do you plan to update these fonts when the Noto...
[03:33:10] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet operation_type={list_podsandbox,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:34:10] <icinga-wm>	 PROBLEM - puppet last run on mw2137 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[03:34:11] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={podsandbox_status,remove_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:34:20] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:34:21] <icinga-wm>	 PROBLEM - puppet last run on analytics1067 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-ISP.mmdb.gz]
[03:35:21] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:41:00] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet operation_type={create_container,list_podsandbox,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:42:00] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={container_status,create_container,list_containers,list_podsandbox,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:43:10] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:44:11] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:49:51] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet operation_type={remove_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:51:00] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:53:31] <wikibugs_>	 (PS1) Zhuyifei1999: toolforge k8s: allow /etc/wmcs-project to be mounted [puppet] - https://gerrit.wikimedia.org/r/431285 (https://phabricator.wikimedia.org/T192244)
[03:54:25] <wikibugs_>	 Operations, MediaWiki-Platform-Team, HHVM, TechCom-RFC (TechCom-Approved), User-ArielGlenn: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#4184607 (alex-mashin) > but I think we should also have Scribunto for other frameworks, notably Javascript/ECMASCript like...
[03:54:46] <wikibugs_>	 (PS2) Zhuyifei1999: toolforge k8s: allow /etc/wmcs-project to be mounted [puppet] - https://gerrit.wikimedia.org/r/431285 (https://phabricator.wikimedia.org/T190893)
[03:55:14] <wikibugs_>	 (PS7) Zhuyifei1999: Mount & load project name dynamically from /etc/wmcs-project [software/tools-webservice] - https://gerrit.wikimedia.org/r/430647 (https://phabricator.wikimedia.org/T190893)
[03:59:35] <wikibugs_>	 (PS8) Zhuyifei1999: Mount & load project name dynamically from /etc/wmcs-project [software/tools-webservice] - https://gerrit.wikimedia.org/r/430647 (https://phabricator.wikimedia.org/T190893)
[04:00:21] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 284.94 seconds
[04:04:50] <icinga-wm>	 RECOVERY - puppet last run on mw2137 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[04:05:00] <icinga-wm>	 RECOVERY - puppet last run on analytics1067 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[04:16:30] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet operation_type={container_status,create_container,list_containers,list_podsandbox,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:17:31] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[04:38:10] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on mw1230 is CRITICAL: cluster=api_appserver device=sda instance=mw1230:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=mw1230&var-datasource=eqiad%2520prometheus%252Fops
[06:28:00] <icinga-wm>	 PROBLEM - puppet last run on mw2161 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/cgroup-mediawiki-clean]
[06:28:10] <icinga-wm>	 PROBLEM - puppet last run on mw1323 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[06:29:20] <icinga-wm>	 PROBLEM - puppet last run on mw1305 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/cgroup-mediawiki-clean]
[06:58:30] <icinga-wm>	 RECOVERY - puppet last run on mw2161 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:58:40] <icinga-wm>	 RECOVERY - puppet last run on mw1323 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:59:50] <icinga-wm>	 RECOVERY - puppet last run on mw1305 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[11:09:50] <wikibugs_>	 (PS1) Addshore: WIP DNM WikibaseLexeme Config [mediawiki-config] - https://gerrit.wikimedia.org/r/431306 (https://phabricator.wikimedia.org/T184745)
[11:11:00] <wikibugs_>	 (CR) jerkins-bot: [V: -1] WIP DNM WikibaseLexeme Config [mediawiki-config] - https://gerrit.wikimedia.org/r/431306 (https://phabricator.wikimedia.org/T184745) (owner: Addshore)
[11:12:00] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- on einsteinium is CRITICAL: cluster=cache_upload site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[11:14:10] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[11:45:30] <wikibugs_>	 Operations, MediaWiki-Platform-Team, HHVM, TechCom-RFC (TechCom-Approved), User-ArielGlenn: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#4185033 (Verdy_p) "proposals as an attack" ? Strange attitude. I's easy to see that Lua is in fact very slow compared to m...
[11:57:45] <wikibugs_>	 Operations, MediaWiki-Platform-Team, HHVM, TechCom-RFC (TechCom-Approved), User-ArielGlenn: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#4185035 (Reedy) This is off topic for this task. Please use/create an appropriate one
[12:11:04] <wikibugs_>	 Operations, MediaWiki-Platform-Team, HHVM, TechCom-RFC (TechCom-Approved), User-ArielGlenn: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#4185036 (Verdy_p) What was off-topic was the sentence I commented: "proposals seen as an attack". Which is completely wron...
[12:14:02] <wikibugs_>	 Operations, MediaWiki-Platform-Team, HHVM, TechCom-RFC (TechCom-Approved), User-ArielGlenn: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#4185037 (Reedy) This task isn’t about Lua. Or Scribunto.  And where is your reference for it “draining resources”?
[13:18:31] <wikibugs_>	 (PS1) ArielGlenn: wikidata weekly dumps: set all default vars before parsing args [puppet] - https://gerrit.wikimedia.org/r/431312
[13:51:34] <wikibugs_>	 Operations, MediaWiki-Platform-Team, HHVM, TechCom-RFC (TechCom-Approved), User-ArielGlenn: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#4185070 (Joe) The decision to migrate the WMF production back to PHP 7.x is long taken and is not something we'd have done...
[14:12:54] <wikibugs_>	 (CR) BryanDavis: [C: 1] Mount & load project name dynamically from /etc/wmcs-project [software/tools-webservice] - https://gerrit.wikimedia.org/r/430647 (https://phabricator.wikimedia.org/T190893) (owner: Zhuyifei1999)
[14:35:02] <wikibugs_>	 (CR) Hoo man: [C: 1] "Very good catch. Please note that is only relevant in case the last run aborted unsuccessfully/ was killed, otherwise the files will be de" [puppet] - https://gerrit.wikimedia.org/r/431312 (owner: ArielGlenn)
[15:06:58] <wikibugs_>	 (PS1) Hoo man: Fix dumpwikidatardf size sanity check [puppet] - https://gerrit.wikimedia.org/r/431315
[15:13:38] <wikibugs_>	 (CR) ArielGlenn: [C: 2] wikidata weekly dumps: set all default vars before parsing args [puppet] - https://gerrit.wikimedia.org/r/431312 (owner: ArielGlenn)
[15:14:10] <wikibugs_>	 (CR) ArielGlenn: [C: 2] Fix dumpwikidatardf size sanity check [puppet] - https://gerrit.wikimedia.org/r/431315 (owner: Hoo man)
[15:14:16] <wikibugs_>	 (PS2) ArielGlenn: Fix dumpwikidatardf size sanity check [puppet] - https://gerrit.wikimedia.org/r/431315 (owner: Hoo man)
[18:12:24] <wikibugs_>	 Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#3591799 (TerraCodes) Can this task be closed? (since everything in the task description is checked off)
[18:13:27] <wikibugs_>	 Operations, HHVM, Patch-For-Review, User-Elukey: Upgrade mw* servers to Debian Stretch (using HHVM) - https://phabricator.wikimedia.org/T174431#4185208 (TerraCodes)