[00:00:13] <hauskatze>	 mutante: for some reason 569100 says "can't merge" but the rebase button says "already up to date". Black and white at the same time? :-)
[00:02:11] <icinga-wm>	 PROBLEM - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1016 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} https://wikitech.wikimedia.org/wiki/Microcode
[00:02:11] <icinga-wm>	 PROBLEM - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1009 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} https://wikitech.wikimedia.org/wiki/Microcode
[00:06:53] <icinga-wm>	 PROBLEM - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1010 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} https://wikitech.wikimedia.org/wiki/Microcode
[00:10:52] <wikibugs>	 (PS1) Marostegui: Revert "db2087: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/569102
[00:11:35] <icinga-wm>	 PROBLEM - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1011 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} https://wikitech.wikimedia.org/wiki/Microcode
[00:12:17] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db2087: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/569102 (owner: Marostegui)
[00:14:23] <wikibugs>	 (PS1) Dzahn: ATS: directly talk wss:// to aphlict [puppet] - https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593)
[00:16:17] <icinga-wm>	 PROBLEM - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1012 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} https://wikitech.wikimedia.org/wiki/Microcode
[00:16:24] <wikibugs>	 (CR) Dzahn: ATS: directly talk wss:// to aphlict (1 comment) [puppet] - https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: Dzahn)
[00:19:54] <icinga-wm>	 PROBLEM - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1013 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} https://wikitech.wikimedia.org/wiki/Microcode
[00:23:44] <icinga-wm>	 PROBLEM - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1014 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} https://wikitech.wikimedia.org/wiki/Microcode
[00:31:38] <mutante>	 !log importing jenkins 2.219 to stretch-wikimedia APT repo; releases1001: upgrading jenkins to 2.219
[00:31:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:36:16] <mutante>	 !log releases2001: upgrading jenkins to 2.219; install1002: import jenkins 2.219 into jessie-wikimedia APT repo
[00:36:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:41:37] <mutante>	 !log contint1001/contint2001 - upgrading jenkins to 2.219
[00:41:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:44:16] <icinga-wm>	 PROBLEM - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1015 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} https://wikitech.wikimedia.org/wiki/Microcode
[00:57:56] <icinga-wm>	 PROBLEM - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1018 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} https://wikitech.wikimedia.org/wiki/Microcode
[01:02:59] <logmsgbot>	 !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@322ee4c]: Update mobileapps to 3eec28d
[01:03:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:09:52] <logmsgbot>	 !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@322ee4c]: Update mobileapps to 3eec28d (duration: 06m 53s)
[01:09:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:38:40] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[01:40:32] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[01:42:52] <icinga-wm>	 PROBLEM - Host cp3063 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:20] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:16:10] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:22:45] <mutante>	 !log powercycling crashed cp3063
[03:22:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:56] <icinga-wm>	 RECOVERY - Host cp3063 is UP: PING OK - Packet loss = 0%, RTA = 83.37 ms
[03:53:22] <wikibugs>	 Operations, ops-eqiad, serviceops: (No Need By Date Provided) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (Jclark-ctr) @jijiki will most likely be early March
[09:18:16] <addshore>	 !log addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --sleep 4 --batch-size=25 # In a screen for T219301
[09:18:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:18:20] <stashbot>	 T219301: Migrate to and read from new store for property terms - https://phabricator.wikimedia.org/T219301
[09:28:16] <wikibugs>	 Operations, ops-eqiad, Analytics, Analytics-Cluster: Degraded RAID on analytics1030 - https://phabricator.wikimedia.org/T243971 (Peachey88)
[09:31:14] <apergos>	 hey addshore while you're here...
[09:31:22] <addshore>	 o/
[09:31:25] <apergos>	 what's the new eta on having the full migration done?
[09:31:34] <apergos>	 (this is my monthly check-in)
[09:31:48] <addshore>	 I should be able to give you a better prediction at the start of next week
[09:32:14] <apergos>	 do we think it's a month or a week or much longer than a month?  just looking for a ballpark figure
[09:32:28] <addshore>	 Started again around 12 hours ago, and nearly at Q 2 million, so maybe it is 4 million ish every day? so maybe 20 days?
[09:32:38] <addshore>	 ballpark would be 1 month
[09:32:44] <apergos>	 so I should note to check in again around a month from now... that works for me.
[09:32:46] <apergos>	 thanks much!
[09:32:55] <addshore>	 I'd like to see it happen sooner however, and it might, but if it does, you'll know about it :)
[09:34:06] <apergos>	 :-)
[09:34:23] <apergos>	 this is more for my own bookkeeping on a phab task 
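The ballpark in the exchange above is simple rate arithmetic, and a minimal sketch of it follows. The function names are hypothetical and only the three input figures come from the chat. The implied total is merely what the 20-day guess works out to at that rate, not a number anyone stated.

```python
# Back-of-the-envelope check of the ETA quoted in the chat.
# Inputs taken from the conversation; everything else is illustrative.
ITEMS_DONE = 2_000_000   # "nearly at Q 2 million"
HOURS_ELAPSED = 12       # "started again around 12 hours ago"
GUESSED_DAYS = 20        # "so maybe 20 days?"


def daily_rate(items_done: int, hours_elapsed: float) -> float:
    """Extrapolate the observed progress to a per-day rate."""
    return items_done * 24 / hours_elapsed


def implied_total(rate_per_day: float, days: float) -> float:
    """Item count that a given duration implies at a constant rate."""
    return rate_per_day * days


rate = daily_rate(ITEMS_DONE, HOURS_ELAPSED)
total = implied_total(rate, GUESSED_DAYS)
print(f"rate: {rate:,.0f} items/day")
print(f"implied total for {GUESSED_DAYS} days: {total:,.0f} items")
```

The extrapolation assumes a constant rate, which is why the answer given in the channel is rounded up to "1 month" rather than the raw 20 days.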
[10:32:56] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:33:05] <effie>	 ^ checking 
[10:34:44] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:35:19] <effie>	 nothing interesting 
[10:50:51] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] Buster vms: include python3 versions of openstack clients [puppet] - https://gerrit.wikimedia.org/r/569084 (owner: Andrew Bogott)
[11:53:59] <wikibugs>	 (PS2) WMDE-leszek: Wikibase: added config variables to configure entity sources [mediawiki-config] - https://gerrit.wikimedia.org/r/569031 (https://phabricator.wikimedia.org/T242087)
[11:54:01] <wikibugs>	 (PS1) WMDE-leszek: Beta wikidata: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569204 (https://phabricator.wikimedia.org/T242087)
[11:54:03] <wikibugs>	 (PS1) WMDE-leszek: Beta commons: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569205 (https://phabricator.wikimedia.org/T242087)
[11:54:05] <wikibugs>	 (PS1) WMDE-leszek: Beta cluster: use entity source Wikibase setting for all wikibase-enabled wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/569206 (https://phabricator.wikimedia.org/T242087)
[11:54:07] <wikibugs>	 (PS1) WMDE-leszek: Beta commons: Remove custom wmgWikibaseRepoForeignRepositories setting [mediawiki-config] - https://gerrit.wikimedia.org/r/569207 (https://phabricator.wikimedia.org/T242087)
[11:54:09] <wikibugs>	 (PS1) WMDE-leszek: Beta cluster: remove custom wmgWikibaseClientRepositories settings [mediawiki-config] - https://gerrit.wikimedia.org/r/569208 (https://phabricator.wikimedia.org/T242087)
[11:54:11] <wikibugs>	 (PS1) WMDE-leszek: Test wikidata: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569209 (https://phabricator.wikimedia.org/T242087)
[11:58:57] <wikibugs>	 (CR) jerkins-bot: [V: -1] Beta commons: Remove custom wmgWikibaseRepoForeignRepositories setting [mediawiki-config] - https://gerrit.wikimedia.org/r/569207 (https://phabricator.wikimedia.org/T242087) (owner: WMDE-leszek)
[12:00:11] <wikibugs>	 (CR) jerkins-bot: [V: -1] Beta cluster: remove custom wmgWikibaseClientRepositories settings [mediawiki-config] - https://gerrit.wikimedia.org/r/569208 (https://phabricator.wikimedia.org/T242087) (owner: WMDE-leszek)
[12:01:32] <wikibugs>	 (CR) jerkins-bot: [V: -1] Test wikidata: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569209 (https://phabricator.wikimedia.org/T242087) (owner: WMDE-leszek)
[12:21:10] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[12:26:40] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[12:39:28] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is CRITICAL: 1.179e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[12:43:07] <effie>	   ^ this alert can be looked at later
[12:56:00] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[13:06:29] <wikibugs>	 (PS2) WMDE-leszek: Beta commons: Remove custom wmgWikibaseRepoForeignRepositories setting [mediawiki-config] - https://gerrit.wikimedia.org/r/569207 (https://phabricator.wikimedia.org/T242087)
[13:06:35] <wikibugs>	 (PS2) WMDE-leszek: Beta cluster: remove custom wmgWikibaseClientRepositories settings [mediawiki-config] - https://gerrit.wikimedia.org/r/569208 (https://phabricator.wikimedia.org/T242087)
[13:06:37] <wikibugs>	 (PS2) WMDE-leszek: Test wikidata: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569209 (https://phabricator.wikimedia.org/T242087)
[13:27:16] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:32:46] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:34:36] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:58:30] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:00:20] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:02:50] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: cloud: hiera: puppetmaster: refactor hiera [puppet] - https://gerrit.wikimedia.org/r/569230 (https://phabricator.wikimedia.org/T229441)
[14:04:00] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:09:30] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:09:54] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "Sharing this idea with you." [puppet] - https://gerrit.wikimedia.org/r/569230 (https://phabricator.wikimedia.org/T229441) (owner: Arturo Borrero Gonzalez)
[14:22:22] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:26:02] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:29:17] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1009 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:29:17] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1010 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:29:17] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1011 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:29:17] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1012 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:29:17] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1013 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:29:17] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1014 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:29:18] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1015 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:29:18] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1016 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:29:18] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1018 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:29:19] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1019 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:29:20] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1020 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:29:20] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1021 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:29:20] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1022 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear} Effie Mouzeli Will be taken care of after all hands - The acknowledgement expires at: 2020-02-03 13:27:00. https://wikitech.wikimedia.org/wiki/Microcode
[14:33:39] <addshore>	 :D
[14:38:52] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:40:40] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:41:08] <Vermont>	 :/
[14:50:40] <wikibugs>	 Operations, Analytics, Analytics-Cluster: notebook1004 - /srv is full - https://phabricator.wikimedia.org/T232068 (jijiki) Resolved→Open
[14:51:56] <wikibugs>	 Operations, Analytics, Analytics-Cluster: notebook1004 - /srv is full - https://phabricator.wikimedia.org/T232068 (jijiki) Host alerted again about /srv being full, /srv/home is 119G.
[14:52:47] <icinga-wm>	 ACKNOWLEDGEMENT - Disk space on notebook1004 is CRITICAL: DISK CRITICAL - free space: /srv 2102 MB (1% inode=78%): Effie Mouzeli Reopened T232068 https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1004&var-datasource=eqiad+prometheus/ops
[14:55:32] <icinga-wm>	 PROBLEM - etherpad_up reduced availability on icinga1001 is CRITICAL: 0 le 0.8 https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_exporters_%22up%22_metrics_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:57:22] <icinga-wm>	 RECOVERY - etherpad_up reduced availability on icinga1001 is OK: (C)0.8 le (W)0.9 le 1 https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_exporters_%22up%22_metrics_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:11:56] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:13:46] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:37:34] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:41:12] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:03:12] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:05:02] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:10:04] <wikibugs>	 Operations, Analytics, Analytics-Cluster: notebook1004 - /srv is full - https://phabricator.wikimedia.org/T232068 (elukey) @Groceryheist hello, can you check your home directory size ? :)
[16:23:24] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:25:14] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:34:24] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:34:32] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:34:45] <effie>	 ok that is different now 
[16:36:22] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:38:04] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:52:41] * addshore reads up
[16:53:21] <addshore>	 are these error spikes all abusefilter? O_o
[16:53:52] <effie>	 the app or the api ones?
[16:54:10] <effie>	 api 
[16:54:27] <effie>	 it is a bot doing too many expensive requests and timing out 
[16:54:34] <addshore>	 I see
[16:54:38] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:54:44] <addshore>	 I just saw this, lots of spikiness for abusefilter https://usercontent.irccloud-cdn.com/file/OodmiHwl/image.png
[16:54:46] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[16:54:46] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST
[16:55:20] <effie>	 addshore: which dashboard are you looking at ?
[16:55:31] <addshore>	 that's just on the logstash homepage
[16:55:49] <effie>	 oh lol
[16:56:36] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST
[16:57:11] <bawolff>	 Abusefilter looks normal to me
[16:57:19] <bawolff>	 there's always lots of noise in logstash about it
[16:57:30] <addshore>	 just zhwiki then :)
[16:58:26] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:59:14] <marostegui>	 !log Re-enable notifications on the dbstore1005:3318 check T243871
[16:59:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:17] <stashbot>	 T243871: Long query running on dbstore1005:3318 - https://phabricator.wikimedia.org/T243871
[17:00:06] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:05:38] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:25:33] <wikibugs>	 (PS1) Bstorm: wiki-replicas: Correct the actor subquery against revision [puppet] - https://gerrit.wikimedia.org/r/569249 (https://phabricator.wikimedia.org/T243984)
[17:32:04] <wikibugs>	 (CR) Brian Wolff: [C: +1] wiki-replicas: Correct the actor subquery against revision [puppet] - https://gerrit.wikimedia.org/r/569249 (https://phabricator.wikimedia.org/T243984) (owner: Bstorm)
[17:44:12] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:47:54] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:51:32] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:53:24] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:59:16] <wikibugs>	 (03PS1) 10WMDE-leszek: Test wikibase clients: Define entity sources configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569256 (https://phabricator.wikimedia.org/T242087)
[17:59:19] <wikibugs>	 (03PS1) 10WMDE-leszek: Test commons: Define entity sources configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569257 (https://phabricator.wikimedia.org/T242087)
[17:59:21] <wikibugs>	 (03PS1) 10WMDE-leszek: Wikidata: Define entity sources configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569258 (https://phabricator.wikimedia.org/T242087)
[17:59:23] <wikibugs>	 (03PS1) 10WMDE-leszek: Wikidata client wikis: Define entity sources configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569259 (https://phabricator.wikimedia.org/T242087)
[17:59:24] <wikibugs>	 (03PS1) 10WMDE-leszek: Commons: Define entity sources configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569260 (https://phabricator.wikimedia.org/T242087)
[17:59:27] <wikibugs>	 (03PS1) 10WMDE-leszek: Wikidata/Wikibase: use entity source Wikibase setting for all wikibase-enabled wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569261 (https://phabricator.wikimedia.org/T242087)
[17:59:37] <wikibugs>	 (03PS1) 10WMDE-leszek: Set wmgUseEntitySourceBasedFederation to true for all wikibase-enabled wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569262 (https://phabricator.wikimedia.org/T241971)
[17:59:39] <wikibugs>	 (03PS1) 10WMDE-leszek: Wikibase: Removed config option wmgUseEntitySourceBasedFederation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569263 (https://phabricator.wikimedia.org/T241975)
[18:01:58] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Wikibase: Removed config option wmgUseEntitySourceBasedFederation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569263 (https://phabricator.wikimedia.org/T241975) (owner: 10WMDE-leszek)
[18:03:19] <bblack>	 !log depool ats-tls on cp4029
[18:03:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:05:11] <bblack>	 !log depool varnish-fe on cp4029
[18:05:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:34] <bblack>	 !log restarted ats-tls and varnish-fe on cp4029
[18:13:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:53] <wikibugs>	 (03PS1) 10Majavah: Add wgImportSources for hiwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569267 (https://phabricator.wikimedia.org/T244022)
[18:14:20] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[18:14:25] <bblack>	 !log repool cp4029
[18:14:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:37] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Wikidata/Wikibase: use entity source Wikibase setting for all wikibase-enabled wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569261 (https://phabricator.wikimedia.org/T242087) (owner: 10WMDE-leszek)
[18:16:12] <wikibugs>	 (03CR) 10Majavah: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569267 (https://phabricator.wikimedia.org/T244022) (owner: 10Majavah)
[18:16:14] <icinga-wm>	 PROBLEM - Webrequests Varnishkafka log producer on cp4029 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[18:16:52] <icinga-wm>	 PROBLEM - statsv Varnishkafka log producer on cp4029 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[18:17:23] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Set wmgUseEntitySourceBasedFederation to true for all wikibase-enabled wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569262 (https://phabricator.wikimedia.org/T241971) (owner: 10WMDE-leszek)
[18:17:36] <icinga-wm>	 PROBLEM - eventlogging Varnishkafka log producer on cp4029 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/eventlogging.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[18:17:42] <logmsgbot>	 !log bblack@cumin1001 conftool action : set/pooled=yes; selector: name=cp4032.ulsfo.wmnet
[18:17:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:17:49] <bblack>	 !log repool cp4032 (buster)
[18:17:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:18:04] <icinga-wm>	 RECOVERY - Webrequests Varnishkafka log producer on cp4029 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[18:18:42] <icinga-wm>	 RECOVERY - statsv Varnishkafka log producer on cp4029 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[18:19:28] <icinga-wm>	 RECOVERY - eventlogging Varnishkafka log producer on cp4029 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/eventlogging.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[18:22:22] <wikibugs>	 (03PS1) 10BBlack: geodns: bugfix for TR entry, use esams [dns] - 10https://gerrit.wikimedia.org/r/569269
[18:25:25] <wikibugs>	 (03CR) 10Bstorm: "I confirmed in my local dev environment that this runs and produces a functional view.  I'll merge and start deploying it." [puppet] - 10https://gerrit.wikimedia.org/r/569249 (https://phabricator.wikimedia.org/T243984) (owner: 10Bstorm)
[18:25:36] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] wiki-replicas: Correct the actor subquery against revision [puppet] - 10https://gerrit.wikimedia.org/r/569249 (https://phabricator.wikimedia.org/T243984) (owner: 10Bstorm)
[18:29:08] <wikibugs>	 (03CR) 10BBlack: [V: 03+2 C: 03+2] "jerkins, where are you?" [dns] - 10https://gerrit.wikimedia.org/r/569269 (owner: 10BBlack)
[18:32:51] <wikibugs>	 10Operations, 10Traffic: cp4029 varnish-fe freakout - https://phabricator.wikimedia.org/T243634 (10BBlack) We also tested depooling just port 80 yesterday, which didn't affect anything (fd leak was still growing), which means this isn't driven by external->:80 traffic.  cp4029 was at ~400K fds this morning, so...
[18:46:12] <wikibugs>	 (03CR) 10Brian Wolff: [C: 03+1] "Oh, i just noticed there is an actor_revision view that probably needs the same change applied." [puppet] - 10https://gerrit.wikimedia.org/r/569249 (https://phabricator.wikimedia.org/T243984) (owner: 10Bstorm)
[18:52:13] <wikibugs>	 (03CR) 10Bstorm: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/569249 (https://phabricator.wikimedia.org/T243984) (owner: 10Bstorm)
[18:53:48] <wikibugs>	 (03PS1) 10Bstorm: wiki-replicas: Correct the actor_revision subquery against revision [puppet] - 10https://gerrit.wikimedia.org/r/569270 (https://phabricator.wikimedia.org/T243984)
[18:56:05] <wikibugs>	 10Operations, 10Traffic: ulsfo varnish-fe vcache processes overflow on FDs - https://phabricator.wikimedia.org/T243634 (10CDanis)
[18:59:08] <wikibugs>	 (03CR) 10Brian Wolff: [C: 03+1] wiki-replicas: Correct the actor_revision subquery against revision [puppet] - 10https://gerrit.wikimedia.org/r/569270 (https://phabricator.wikimedia.org/T243984) (owner: 10Bstorm)
[19:00:20] <wikibugs>	 (03PS1) 10CDanis: WIP [puppet] - 10https://gerrit.wikimedia.org/r/569271
[19:00:22] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] wiki-replicas: Correct the actor_revision subquery against revision [puppet] - 10https://gerrit.wikimedia.org/r/569270 (https://phabricator.wikimedia.org/T243984) (owner: 10Bstorm)
[19:02:39] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] WIP [puppet] - 10https://gerrit.wikimedia.org/r/569271 (owner: 10CDanis)
[19:06:58] <wikibugs>	 (03PS2) 10CDanis: ulsfo cp-text: Prometheus export # of vcache fds [puppet] - 10https://gerrit.wikimedia.org/r/569271 (https://phabricator.wikimedia.org/T243634)
[19:08:46] <wikibugs>	 (03PS3) 10CDanis: ulsfo cp-text: Prometheus export # of vcache fds [puppet] - 10https://gerrit.wikimedia.org/r/569271 (https://phabricator.wikimedia.org/T243634)
[19:14:29] <wikibugs>	 (03CR) 10CDanis: "PCC looks right to me: https://puppet-compiler.wmflabs.org/compiler1001/20579/" [puppet] - 10https://gerrit.wikimedia.org/r/569271 (https://phabricator.wikimedia.org/T243634) (owner: 10CDanis)
[19:48:22] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "> Alexandros Kosiaris, thanks for the reviews. After adding the config.app, I'm getting an error from Jenkins. Do I need to make configmap" [deployment-charts] - 10https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: 10Bmansurov)
[19:48:38] <wikibugs>	 (03PS1) 10Jhedden: wiki replicas: update comment filtering in maintain-views [puppet] - 10https://gerrit.wikimedia.org/r/569276
[19:52:32] <wikibugs>	 (03PS2) 10Jhedden: wiki replicas: update comment filtering in maintain-views [puppet] - 10https://gerrit.wikimedia.org/r/569276
[19:53:28] <wikibugs>	 (03PS3) 10Jhedden: wiki replicas: update comment filtering in maintain-views [puppet] - 10https://gerrit.wikimedia.org/r/569276
[19:54:08] <wikibugs>	 (03PS4) 10Jhedden: wiki replicas: update comment filtering in maintain-views [puppet] - 10https://gerrit.wikimedia.org/r/569276
[20:50:32] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[21:07:30] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] wiki replicas: update comment filtering in maintain-views [puppet] - 10https://gerrit.wikimedia.org/r/569276 (owner: 10Jhedden)
[21:07:42] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] "Nice work, tests out perfectly locally." [puppet] - 10https://gerrit.wikimedia.org/r/569276 (owner: 10Jhedden)
[21:17:16] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:21:38] <bstorm_>	 !log updated actor views on labsdb1012
[21:21:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:22:08] <bstorm_>	 !log updated views on labsdb1009
[21:22:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:28:20] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:32:24] <bstorm_>	 !log updated views on labsdb1010
[21:32:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:33:54] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] "visual irl review by godog" [puppet] - 10https://gerrit.wikimedia.org/r/569271 (https://phabricator.wikimedia.org/T243634) (owner: 10CDanis)
[21:40:50] <wikibugs>	 (03CR) 10BryanDavis: [C: 03+1] "Thank you for finding and fixing this logic bug I hid in my prior patch Jason." [puppet] - 10https://gerrit.wikimedia.org/r/569276 (owner: 10Jhedden)
[21:43:33] <wikibugs>	 (03PS1) 10Bstorm: wiki-replicas: Depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/569284 (https://phabricator.wikimedia.org/T243984)
[21:45:35] <wikibugs>	 (03PS2) 10Bstorm: wiki-replicas: Depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/569284 (https://phabricator.wikimedia.org/T243984)
[21:50:28] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] wiki-replicas: Depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/569284 (https://phabricator.wikimedia.org/T243984) (owner: 10Bstorm)
[21:51:36] <wikibugs>	 (03PS1) 10CDanis: prom-file-count: include symlinks & other special files [puppet] - 10https://gerrit.wikimedia.org/r/569285 (https://phabricator.wikimedia.org/T243634)
[21:52:16] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:54:38] <wikibugs>	 (03PS2) 10CDanis: prom-file-count: include symlinks & other special files [puppet] - 10https://gerrit.wikimedia.org/r/569285 (https://phabricator.wikimedia.org/T243634)
[21:55:52] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] prom-file-count: include symlinks & other special files [puppet] - 10https://gerrit.wikimedia.org/r/569285 (https://phabricator.wikimedia.org/T243634) (owner: 10CDanis)
[21:59:27] <bstorm_>	 !log depooled labsdb1011
[21:59:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:03:22] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[22:05:36] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: ulsfo varnish-fe vcache processes overflow on FDs - https://phabricator.wikimedia.org/T243634 (10CDanis) https://grafana.wikimedia.org/d/OU_pxz8Wz/cdanis-ulsfo-vcache-open-fds?orgId=1
[22:09:42] <wikibugs>	 (03PS1) 10Bstorm: Revert "wiki-replicas: Depool labsdb1011" [puppet] - 10https://gerrit.wikimedia.org/r/569287
[22:10:08] <wikibugs>	 (03PS2) 10Bstorm: Revert "wiki-replicas: Depool labsdb1011" [puppet] - 10https://gerrit.wikimedia.org/r/569287
[22:11:25] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] Revert "wiki-replicas: Depool labsdb1011" [puppet] - 10https://gerrit.wikimedia.org/r/569287 (owner: 10Bstorm)
[22:14:22] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[22:14:27] <bstorm_>	 !log repooled labsdb1011 now that view work is done
[22:14:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:18:39] <wikibugs>	 10Operations, 10Traffic: ulsfo varnish-fe vcache processes overflow on FDs - https://phabricator.wikimedia.org/T243634 (10CDanis) Here's some `lsof` output from a faulty-looking vcache process, showing garbage-y sockets that aren't actually associated with any TCP stream:  `cache-mai 16000 vcache *870u     soc...
[22:52:58] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[23:00:20] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[23:07:42] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[23:09:32] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[23:54:30] <wikibugs>	 (03CR) 10Cwhite: "Looks good, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/569285 (https://phabricator.wikimedia.org/T243634) (owner: 10CDanis)