[00:13:32] <logmsgbot>	 !log volker-e@deploy1001 Started deploy [design/style-guide@efc240b]: Deploy design/style-guide:
[00:13:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:13:39] <logmsgbot>	 !log volker-e@deploy1001 Finished deploy [design/style-guide@efc240b]: Deploy design/style-guide:  (duration: 00m 07s)
[00:13:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:12:02] <wikibugs>	 10Operations, 10Performance-Team, 10Traffic, 10observability: Ensure graphs used by Performance account for Varnish-to-ATS migration - https://phabricator.wikimedia.org/T233474 (10Krinkle) a:03ema It looks like the Apache Backend-Timing graphs dried up.  <https://grafana.wikimedia.org/d/000000580/apache-...
[01:24:21] <wikibugs>	 (03PS1) 10Bmansurov: Recommendation API: upgrade node to version 10 [puppet] - 10https://gerrit.wikimedia.org/r/560454 (https://phabricator.wikimedia.org/T241230)
[02:14:23] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2098 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1067.82 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[02:45:25] <logmsgbot>	 !log volker-e@deploy1001 Started deploy [design/style-guide@8b2eda6]: Deploy design/style-guide:
[02:45:33] <logmsgbot>	 !log volker-e@deploy1001 Finished deploy [design/style-guide@8b2eda6]: Deploy design/style-guide:  (duration: 00m 07s)
[02:45:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:45:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:34:45] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2098 is OK: OK slave_sql_lag Replication lag: 0.51 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[05:10:49] <icinga-wm>	 PROBLEM - snapshot of s7 in eqiad on db1115 is CRITICAL: snapshot for s7 at eqiad taken more than 4 days ago: Most recent backup 2019-12-20 05:04:29 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[06:20:12] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/560416 (owner: 10Volans)
[06:32:42] <wikibugs>	 (03CR) 10Ammarpad: [C: 03+1] "LGTM also." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/560386 (https://phabricator.wikimedia.org/T150618) (owner: 10Subscriptshoe9)
[07:21:36] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: codfw: rack/setup/install  es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (10jcrespo)
[07:30:07] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Restore access for bmansurov - https://phabricator.wikimedia.org/T241089 (10jcrespo) a:03jcrespo I need to ask internally the reasons of removal (expiration, inactivity, other). Knowing the original access request would expedite handling this ticket.
[07:36:35] <wikibugs>	 (03CR) 10ArielGlenn: "I'm not totally excited about some of the formatting but I can live with it, as far as the changes to the dumps and snapshot modules in th" [puppet] - 10https://gerrit.wikimedia.org/r/554825 (https://phabricator.wikimedia.org/T221083) (owner: 10Jbond)
[07:42:48] <wikibugs>	 10Operations, 10Research, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users and researchers for Aroraakhil - https://phabricator.wikimedia.org/T241096 (10jcrespo) a:03Nuria This needs @nuria approval (in addition to @leila) as service owner. I haven't checked the other information giv...
[07:46:25] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Restore access for bmansurov - https://phabricator.wikimedia.org/T241089 (10MoritzMuehlenhoff) >>! In T241089#5762004, @jcrespo wrote: > I need to ask internally the reasons of removal (expiration, inactivity, other). Knowing the original access reques...
[07:51:21] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Restore access for bmansurov - https://phabricator.wikimedia.org/T241089 (10jcrespo) I found the original onboarding ticket: T113069. Sorry for the delay on handling this, these are bad dates. Will proceed as per procedure after the 3 business day delay.
[08:14:49] <icinga-wm>	 RECOVERY - Maps - OSM synchronization lag - codfw on icinga1001 is OK: (C)2.592e+05 ge (W)1.764e+05 ge 2.969e+04 https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=12&fullscreen&orgId=1
[08:23:44] <wikibugs>	 (03PS1) 10Muehlenhoff: Add a define to install a package from a repository component (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/560458 (https://phabricator.wikimedia.org/T240324)
[08:23:45] <icinga-wm>	 PROBLEM - Maps - OSM synchronization lag - codfw on icinga1001 is CRITICAL: 4.955e+06 ge 2.592e+05 https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=12&fullscreen&orgId=1
[08:24:26] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add a define to install a package from a repository component (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/560458 (https://phabricator.wikimedia.org/T240324) (owner: 10Muehlenhoff)
[08:28:46] <wikibugs>	 (03PS1) 10Mathew.onipe: maps: Enable osm replication after state file update. [puppet] - 10https://gerrit.wikimedia.org/r/560459 (https://phabricator.wikimedia.org/T239728)
[08:37:43] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1123 - https://phabricator.wikimedia.org/T240534 (10jcrespo) 05Open→03Resolved ` megacli -PDRbld -ShowProg -PhysDrv [32:9] -aALL                                       Device(Encl-32 Slot-9) is not in rebuild process  Exit Code: 0x00 ` `MegaRAID OK: opt...
[08:39:53] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is CRITICAL: 1.023e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[08:52:19] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is OK: (C)1e+05 gt (W)1e+04 gt 324 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[08:57:24] <icinga-wm>	 ACKNOWLEDGEMENT - snapshot of s7 in eqiad on db1115 is CRITICAL: snapshot for s7 at eqiad taken more than 4 days ago: Most recent backup 2019-12-20 05:04:29 Jcrespo retrying now, should be fixed soon https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[08:59:24] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to stat1004, stat1007, stat1006, notebook1003, notebook1004 for Kate Zimmerman - https://phabricator.wikimedia.org/T240732 (10jcrespo) 05Open→03Resolved Because no feedback has been given for a while, this is considered as resolved. Please reopen if yo...
[09:02:01] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review: puppet-merge can't accept an explicit SHA1 for an --ops merge - https://phabricator.wikimedia.org/T241277 (10jcrespo) @CDanis Is this something you plan to work on? Otherwise, who do you need help with? I am trying to triage the importance of this ticket.
[09:03:48] <wikibugs>	 10Operations, 10serviceops, 10Patch-For-Review: PHP Fatal error: Allowed memory size of 524288000 bytes exhausted (tried to allocate 20480 bytes) in /var/www/php-monitoring/lib.php on line 35 - https://phabricator.wikimedia.org/T240824 (10jcrespo) I believe this is a known issue tracked on another ticket (parse...
[09:05:14] <wikibugs>	 10Operations, 10serviceops, 10Patch-For-Review: PHP Fatal error: Allowed memory size of 524288000 bytes exhausted (tried to allocate 20480 bytes) in /var/www/php-monitoring/lib.php on line 35 - https://phabricator.wikimedia.org/T240824 (10jcrespo) T230076 I believe.
[09:11:47] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-extensions-PagedTiffHandler, 10Multimedia, and 2 others: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (10jcrespo) CC @MoritzMuehlenhoff (Re: get operations to use a newer versio...
[09:16:09] <wikibugs>	 (03PS1) 10Jcrespo: Revert "Increase nginx limits on http resp hdr block size" [puppet] - 10https://gerrit.wikimedia.org/r/560514
[09:16:20] <wikibugs>	 (03PS1) 10Jcrespo: Revert "varnish: temporarily allow more response headers" [puppet] - 10https://gerrit.wikimedia.org/r/560515
[09:18:29] <wikibugs>	 (03PS2) 10Jcrespo: Revert "varnish: temporarily allow more response headers" [puppet] - 10https://gerrit.wikimedia.org/r/560515
[09:18:48] <wikibugs>	 (03PS2) 10Jcrespo: Revert "Increase nginx limits on http resp hdr block size" [puppet] - 10https://gerrit.wikimedia.org/r/560514
[09:19:53] <wikibugs>	 10Operations, 10Core Platform Team, 10MediaWiki-extensions-CentralAuth, 10TimedMediaHandler, and 5 others: Consistent HTTP 503 Error on some urls for some logged-in users (CentralAuth Set-Cookie storm) - https://phabricator.wikimedia.org/T226840 (10jcrespo) p:05High→03Normal > Yes, I think, once those...
[09:22:07] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-extensions-PagedTiffHandler, 10Multimedia, and 2 others: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (10MoritzMuehlenhoff) Providing Imagemagick 7 is non-trivial given that Deb...
[09:22:51] <wikibugs>	 10Operations, 10SRE-swift-storage, 10serviceops, 10Patch-For-Review, and 2 others: Swift object servers become briefly unresponsive on a regular basis - https://phabricator.wikimedia.org/T226373 (10jcrespo) What is the right followup after a month? "I don't know" is an ok answer, I just want to clarify the...
[09:24:48] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-extensions-PagedTiffHandler, 10Multimedia, and 2 others: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (10jcrespo) @Bawolff Is the answer clarifying enough? Aiming for Bullseye (...
[09:27:02] <wikibugs>	 10Operations, 10Gerrit, 10serviceops, 10Patch-For-Review: Convert Gerrit to use H2 as the database - https://phabricator.wikimedia.org/T211139 (10jcrespo) 05Open→03Stalled p:05High→03Normal This seems to be stalled due to concerns raised at T211139#4798560, to be revisited later.
[09:34:47] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Restore access for bmansurov - https://phabricator.wikimedia.org/T241089 (10jcrespo) p:05Triage→03High
[09:35:33] <wikibugs>	 10Operations, 10Research, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users and researchers for Aroraakhil - https://phabricator.wikimedia.org/T241096 (10jcrespo) p:05Triage→03High
[09:51:48] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-extensions-PagedTiffHandler, 10Multimedia, and 2 others: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (10TheDJ) So what bawolff quotes:   > identify-im6.q16: cache resources exh...
[10:00:36] <wikibugs>	 10Operations, 10Mail: CA App Synthetic Monitor Mail (SMTP): Connection timed out; connect(): -2 - https://phabricator.wikimedia.org/T240906 (10jcrespo) There have been multiple mx1001 issues lately (even if that is unreliable, it is worth noting). My suggestion would be, at least initially, to detect the sam...
[10:05:04] <wikibugs>	 10Operations, 10Mail: CA App Synthetic Monitor Mail (SMTP): Connection timed out; connect(): -2 - https://phabricator.wikimedia.org/T240906 (10jcrespo) p:05Normal→03High I am going to mark this as high, as we have now daily alerts, assuming those are real.
[10:06:55] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-extensions-PagedTiffHandler, 10Multimedia, and 2 others: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (10TheDJ) This is likely the same issue as {T124662}
[10:08:49] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-extensions-PagedTiffHandler, 10Multimedia, and 2 others: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (10TheDJ) Interestingly enough tiffinfo can be used in place of identify.....
[10:09:45] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 27992816 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[10:15:07] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 897664 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[10:29:00] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM, minor comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/560373 (https://phabricator.wikimedia.org/T241348) (owner: 10Andrew Bogott)
[10:30:53] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-extensions-PagedTiffHandler, 10Multimedia, and 2 others: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (10TheDJ) It was [[ https://github.com/wikimedia/operations-mediawiki-confi...
[10:35:55] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is CRITICAL: 1.64e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[10:43:59] <wikibugs>	 (03PS1) 10Andrew Bogott: nova firstboot script: disable 'growpart' in cloud-config [puppet] - 10https://gerrit.wikimedia.org/r/560516 (https://phabricator.wikimedia.org/T241322)
[10:48:06] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: nova firstboot script: disable 'growpart' in cloud-config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/560516 (https://phabricator.wikimedia.org/T241322) (owner: 10Andrew Bogott)
[10:49:58] <wikibugs>	 (03CR) 10Andrew Bogott: nova firstboot script: disable 'growpart' in cloud-config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/560516 (https://phabricator.wikimedia.org/T241322) (owner: 10Andrew Bogott)
[10:51:50] <wikibugs>	 (03CR) 10Andrew Bogott: nova firstboot script: disable 'growpart' in cloud-config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/560516 (https://phabricator.wikimedia.org/T241322) (owner: 10Andrew Bogott)
[10:52:32] <wikibugs>	 (03PS2) 10Andrew Bogott: nova firstboot script: disable 'growpart' in cloud-config [puppet] - 10https://gerrit.wikimedia.org/r/560516 (https://phabricator.wikimedia.org/T241322)
[10:55:27] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[11:06:41] <wikibugs>	 (03PS1) 10Andrew Bogott: nova firstboot: move the serial tty logic out of the base image [puppet] - 10https://gerrit.wikimedia.org/r/560517 (https://phabricator.wikimedia.org/T181375)
[11:30:32] <wikibugs>	 (03PS3) 10Andrew Bogott: nova firstboot script: disable 'growpart' in cloud-config [puppet] - 10https://gerrit.wikimedia.org/r/560516 (https://phabricator.wikimedia.org/T241322)
[11:30:34] <wikibugs>	 (03PS2) 10Andrew Bogott: nova firstboot: move the serial tty logic out of the base image [puppet] - 10https://gerrit.wikimedia.org/r/560517 (https://phabricator.wikimedia.org/T181375)
[11:30:36] <wikibugs>	 (03PS1) 10Andrew Bogott: openstack nova: rename firstboot.sh to userdata.txt [puppet] - 10https://gerrit.wikimedia.org/r/560520
[11:36:28] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-extensions-PagedTiffHandler, 10Multimedia, and 2 others: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (10Reedy) >>! In T240455#5762118, @TheDJ wrote: > It was [[ https://github....
[11:36:32] <wikibugs>	 (03PS1) 10Reedy: Revert "Remove $wgUseImageResize as same as default" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/560521
[11:36:50] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Revert "Remove $wgUseImageResize as same as default" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/560521 (owner: 10Reedy)
[11:37:17] <wikibugs>	 (03PS2) 10Reedy: Revert "Remove $wgTiffUseTiffinfo because it doesn't exist" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/560521
[11:37:21] <wikibugs>	 (03PS3) 10Reedy: Revert "Remove $wgTiffUseTiffinfo because it doesn't exist" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/560521
[11:37:42] <wikibugs>	 (03PS4) 10Reedy: Revert "Remove $wgTiffUseTiffinfo because it doesn't exist" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/560521 (https://phabricator.wikimedia.org/T240455)
[11:40:59] <wikibugs>	 (03CR) 10Reedy: [C: 03+2] Revert "Remove $wgTiffUseTiffinfo because it doesn't exist" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/560521 (https://phabricator.wikimedia.org/T240455) (owner: 10Reedy)
[11:42:03] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Remove $wgTiffUseTiffinfo because it doesn't exist" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/560521 (https://phabricator.wikimedia.org/T240455) (owner: 10Reedy)
[11:43:45] <logmsgbot>	 !log reedy@deploy1001 Synchronized wmf-config/CommonSettings.php: use TiffInfo again T240455 (duration: 01m 07s)
[11:43:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:43:53] <stashbot>	 T240455: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455
[11:50:49] <icinga-wm>	 RECOVERY - snapshot of s7 in eqiad on db1115 is OK: snapshot for s7 at eqiad taken less than 4 days ago and larger than 90 GB: Last one 2019-12-24 10:11:43 from db1116.eqiad.wmnet:3317 (894 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[12:07:50] <wikibugs>	 (03CR) 10Volans: [C: 03+2] images: fix authentication [software/debmonitor] - 10https://gerrit.wikimedia.org/r/560416 (owner: 10Volans)
[12:10:30] <wikibugs>	 (03Merged) 10jenkins-bot: images: fix authentication [software/debmonitor] - 10https://gerrit.wikimedia.org/r/560416 (owner: 10Volans)
[12:17:12] <wikibugs>	 (03PS1) 10Volans: Release v0.2.2 [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/560525
[12:18:49] <wikibugs>	 (03CR) 10Volans: [V: 03+2 C: 03+2] Release v0.2.2 [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/560525 (owner: 10Volans)
[12:20:20] <logmsgbot>	 !log volans@deploy1001 Started deploy [debmonitor/deploy@39ad186]: Release v0.2.2 - T241206
[12:20:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:27] <stashbot>	 T241206: Report image metadata to debmonitor - https://phabricator.wikimedia.org/T241206
[12:21:00] <logmsgbot>	 !log volans@deploy1001 Finished deploy [debmonitor/deploy@39ad186]: Release v0.2.2 - T241206 (duration: 00m 40s)
[12:21:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:23:49] <wikibugs>	 10Operations, 10SRE-tools, 10docker-pkg, 10serviceops, 10Patch-For-Review: Report image metadata to debmonitor - https://phabricator.wikimedia.org/T241206 (10Volans) The issue for the `DELETE` has been fixed, I've successfully deleted the image `docker-registry.wikimedia.org/python3-build-stretch:0.0.2`...
[12:54:00] <wikibugs>	 (03PS1) 10Elukey: airflow: fix hdfs fuse mountpoint check [puppet] - 10https://gerrit.wikimedia.org/r/560527
[12:55:59] <icinga-wm>	 RECOVERY - Maps tiles generation on icinga1001 is OK: OK: Less than 90.00% under the threshold [10.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=8&fullscreen&orgId=1
[12:57:00] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] airflow: fix hdfs fuse mountpoint check [puppet] - 10https://gerrit.wikimedia.org/r/560527 (owner: 10Elukey)
[13:38:49] <icinga-wm>	 PROBLEM - Disk space on wdqs1006 is CRITICAL: DISK CRITICAL - free space: /srv 53349 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=wdqs1006&var-datasource=eqiad+prometheus/ops
[13:46:20] <jynus>	 ^gehel: something that could be deleted there?
[13:54:04] <gehel>	 jynus: looking. A data reload is probably the only solution
[13:54:19] <jynus>	 I was creating a ticket
[13:54:23] <gehel>	 At least the only short term solution
[13:54:26] <jynus>	 I was about to tune2fs -m 0
[13:54:31] <jynus>	 to get some gigabytes
[13:54:46] <jynus>	 ok with that?
[13:56:04] <wikibugs>	 (03PS3) 10Andrew Bogott: Add initial config for Openstack Pike [puppet] - 10https://gerrit.wikimedia.org/r/560372 (https://phabricator.wikimedia.org/T241347)
[13:56:08] <wikibugs>	 (03PS3) 10Andrew Bogott: Openstack Designate: add manifests for Openstack Pike [puppet] - 10https://gerrit.wikimedia.org/r/560373 (https://phabricator.wikimedia.org/T241348)
[13:56:10] <wikibugs>	 (03PS2) 10Andrew Bogott: keystone/pike: remove obsolete filter from paste.ini [puppet] - 10https://gerrit.wikimedia.org/r/560375 (https://phabricator.wikimedia.org/T241347)
[13:56:12] <wikibugs>	 (03PS2) 10Andrew Bogott: nova/pike: update policy.json for new Pike policy changes [puppet] - 10https://gerrit.wikimedia.org/r/560376 (https://phabricator.wikimedia.org/T241347)
[13:56:21] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] nova firstboot script: disable 'growpart' in cloud-config [puppet] - 10https://gerrit.wikimedia.org/r/560516 (https://phabricator.wikimedia.org/T241322) (owner: 10Andrew Bogott)
[13:56:45] <wikibugs>	 (03PS3) 10Andrew Bogott: nova firstboot: move the serial tty logic out of the base image [puppet] - 10https://gerrit.wikimedia.org/r/560517 (https://phabricator.wikimedia.org/T181375)
[13:57:07] <wikibugs>	 (03PS2) 10Andrew Bogott: openstack nova: rename firstboot.sh to userdata.txt [puppet] - 10https://gerrit.wikimedia.org/r/560520
[13:58:27] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] nova firstboot: move the serial tty logic out of the base image [puppet] - 10https://gerrit.wikimedia.org/r/560517 (https://phabricator.wikimedia.org/T181375) (owner: 10Andrew Bogott)
[13:58:28] <jynus>	 !log tune2fs -m 0 /dev/mapper/wdqs1006--vg-data T241418
[13:58:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:35] <stashbot>	 T241418: wdqs1006 /srv low on disk space - https://phabricator.wikimedia.org/T241418
[13:58:49] <gehel>	 jynus: can you hold up on the tune2fs?
[13:58:56] <gehel>	 or is it already done?
[13:59:08] <jynus>	 sorry, I did it already
[13:59:18] <gehel>	 ok, no problem
[13:59:19] <jynus>	 I can undo it
[13:59:47] <jynus>	 I guessed it was ok if you planned on rebuilding it
[13:59:49] <gehel>	 the journal is exploding, not the first time we have that, but we don't really understand how that happens
[14:00:06] <gehel>	 I'm just going to copy the journal from another system, not do a full rebuild
[14:00:09] <icinga-wm>	 RECOVERY - Disk space on wdqs1006 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=wdqs1006&var-datasource=eqiad+prometheus/ops
[14:00:41] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] openstack nova: rename firstboot.sh to userdata.txt [puppet] - 10https://gerrit.wikimedia.org/r/560520 (owner: 10Andrew Bogott)
[14:00:51] <wikibugs>	 (03PS4) 10Andrew Bogott: Add initial config for Openstack Pike [puppet] - 10https://gerrit.wikimedia.org/r/560372 (https://phabricator.wikimedia.org/T241347)
[14:00:53] <wikibugs>	 (03PS4) 10Andrew Bogott: Openstack Designate: add manifests for Openstack Pike [puppet] - 10https://gerrit.wikimedia.org/r/560373 (https://phabricator.wikimedia.org/T241348)
[14:00:55] <wikibugs>	 (03PS3) 10Andrew Bogott: keystone/pike: remove obsolete filter from paste.ini [puppet] - 10https://gerrit.wikimedia.org/r/560375 (https://phabricator.wikimedia.org/T241347)
[14:00:57] <wikibugs>	 (03PS3) 10Andrew Bogott: nova/pike: update policy.json for new Pike policy changes [puppet] - 10https://gerrit.wikimedia.org/r/560376 (https://phabricator.wikimedia.org/T241347)
[14:02:06] <wikibugs>	 (03PS2) 10Gehel: maps: Enable osm replication after state file update. [puppet] - 10https://gerrit.wikimedia.org/r/560459 (https://phabricator.wikimedia.org/T239728) (owner: 10Mathew.onipe)
[14:03:05] <jynus>	 gehel: to be fair, I don't see many disadvantages to 0% reserved blocks with 95% filesystem utilization on a non-root partition
[14:04:24] <gehel>	 it gives us some head space if we have a more subtle disk space issue in the future, but agreed, not much of a change
[14:05:47] <jynus>	 "disk space issue" like today :-P
[14:06:15] <gehel>	 well, this one, I know what to do to fix it in the short term
[14:06:40] <jynus>	 again, can be undone, I thought you were out and I was gaining a few hours
[14:06:47] <gehel>	 I might not know what to do with the next one :)
[14:06:57] <jynus>	 but more disks!
[14:06:59] <jynus>	 *buy
[14:07:17] <gehel>	 though honestly, I don't know what could be worse than having blazegraph unable to recover free space :/
[14:07:41] <jynus>	 let me know if I can help
[14:08:04] <gehel>	 jynus: thanks! but things are under control (well, as much as they can be)
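A rough sketch of the trade-off discussed above: `tune2fs -m 0` releases the blocks that ext filesystems hold back for root, which is where the "some gigabytes" come from. The log does not state the size of /srv on wdqs1006 or its previous reserved percentage, so the 2 TB volume and the 5% figure (the usual mke2fs default) below are assumptions for illustration only, and both function names are hypothetical.

```python
import os


def reserved_estimate_bytes(total_bytes, reserved_percent=5.0):
    """Bytes held back for root if reserved_percent of the filesystem is reserved."""
    return int(total_bytes * reserved_percent / 100)


def reserved_remaining_bytes(path):
    """Free space that only root can still use on the filesystem holding path."""
    st = os.statvfs(path)
    # f_bfree counts all free blocks; f_bavail counts those usable by non-root.
    return (st.f_bfree - st.f_bavail) * st.f_frsize


if __name__ == "__main__":
    # Assumed 2 TB volume at the assumed 5% default; not taken from the log.
    print(reserved_estimate_bytes(2 * 10**12) / 10**9, "GB reserved")
    # Live figure for whatever filesystem holds "/", for comparison.
    print(reserved_remaining_bytes("/"), "bytes currently reserved on /")
```

After `tune2fs -m 0`, the second function would be expected to report a value at or near zero for that filesystem, which is the head space gehel mentions giving up.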
[14:11:04] <logmsgbot>	 !log gehel@cumin1001 START - Cookbook sre.wdqs.data-transfer
[14:11:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:27] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service: Some queries causes wdqs-blazegraph on wdqs1006 to crash and restart - https://phabricator.wikimedia.org/T213191 (10jcrespo) Issue started the 22 Dec at around 2:16  https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=wdqs1006&var-dat...
[14:13:43] <gehel>	 !log data reload from wdqs1008 to wdqs1006 - T241418
[14:13:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:50] <stashbot>	 T241418: wdqs1006 /srv low on disk space - https://phabricator.wikimedia.org/T241418
[14:28:57] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:30:46] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[14:38:34] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime
[14:38:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:52] <logmsgbot>	 !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[14:38:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:26] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: don't show puppet diff in files which may contain passwords [puppet] - 10https://gerrit.wikimedia.org/r/560530
[14:48:17] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] openstack: don't show puppet diff in files which may contain passwords [puppet] - 10https://gerrit.wikimedia.org/r/560530 (owner: 10Arturo Borrero Gonzalez)
[14:49:50] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: don't show puppet diff in files which may contain passwords [puppet] - 10https://gerrit.wikimedia.org/r/560530 (owner: 10Arturo Borrero Gonzalez)
[15:13:16] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime
[15:13:21] <logmsgbot>	 !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[15:13:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:53] <logmsgbot>	 !log gehel@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
[15:33:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:01] <icinga-wm>	 PROBLEM - WDQS high update lag on wdqs1008 is CRITICAL: 4756 ge 3600 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[15:41:14] <icinga-wm>	 ACKNOWLEDGEMENT - WDQS high update lag on wdqs1008 is CRITICAL: 4559 ge 3600 Gehel recovery after data reload https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[15:53:25] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[16:20:51] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=eqiad
[16:21:01] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=codfw
[16:32:59] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[16:35:27] <wikibugs>	 (03PS1) 10Jcrespo: swift: Fix icinga+prometheus+grafana alert link (Dashboard not found) [puppet] - 10https://gerrit.wikimedia.org/r/560538
[16:38:19] <wikibugs>	 (03CR) 10Jcrespo: [C: 04-1] "Needs checking and fix, and reviewing of the other links, but uploading as a reminder to check later." [puppet] - 10https://gerrit.wikimedia.org/r/560538 (owner: 10Jcrespo)
[16:39:04] <wikibugs>	 (03PS2) 10Jcrespo: swift: Fix icinga+prometheus+grafana alert link (Dashboard not found) [puppet] - 10https://gerrit.wikimedia.org/r/560538
[16:39:36] <wikibugs>	 (03PS3) 10Jcrespo: swift: Fix icinga+prometheus+grafana alert link (Dashboard not found) [puppet] - 10https://gerrit.wikimedia.org/r/560538
[16:40:09] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[16:40:43] <icinga-wm>	 RECOVERY - WDQS high update lag on wdqs1008 is OK: (C)3600 ge (W)1200 ge 1113 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[16:51:23] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=eqiad
[16:51:35] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=codfw
[17:01:47] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[17:30:51] <icinga-wm>	 RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=eqiad
[17:31:01] <icinga-wm>	 RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=codfw
[17:32:13] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[17:57:13] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[18:09:45] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[18:43:39] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[19:28:21] <wikibugs>	 10Operations, 10Machine vision, 10Product-Infrastructure-Team-Backlog, 10Structured-Data-Backlog, and 5 others: Some jobs are not being processed / are processed slowly - https://phabricator.wikimedia.org/T240518 (10MarcoAurelio) @jcrespo @Pchelolo Could this be related to T241294 somewhat? One job is from...
[21:33:12] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-extensions-PagedTiffHandler, 10Multimedia, and 2 others: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (10Reedy) Can someone try the large tiff(s) again? :)
[21:37:06] <apergos>	 reedy, my connection is prohibitively slow for that, or I would try it. but 350 mb upload?  nope
[21:37:28] <apergos>	 it's already taking enough time just to download it :-P
[21:37:33] <Reedy>	 heh, if I was at home I'd do it
[21:37:46] <Reedy>	 In a B&B... but not sure on connection speed yet
[21:37:51] <apergos>	 mm
[21:37:55] <apergos>	 might be better than mine :-P
[21:38:03] <Reedy>	 It is Sweden, and they do like their Fibre
[21:38:15] <apergos>	 I wish they would hurry up with the dang fiber rollout (=vdsl for us)
[21:38:48] <apergos>	 in theory fiber to the home is supposed to be possible in this neighborhood once the rollout is complete, but no one has said they would be offering it I guess
[21:38:49] <apergos>	 meh
[21:39:01] <apergos>	 only vdsl, but that will still be way better than what I have
[21:39:14] <apergos>	 anyways, why not do a speed test where you are? :-P
[21:39:32] <apergos>	 it's STILL downloading
[21:40:17] <apergos>	 done at last :-/
[21:41:48] <Reedy>	 45-50 meg down...
[21:41:49] <Reedy>	 11 ish up
[21:42:21] <Reedy>	 "Anna Norrie, rollporträtt - SMV - NN054.tif (326M) is too large for Google to scan for viruses"
[21:42:26] <Reedy>	 Pfft. as if google doesn't have the resources
[21:43:43] <Reedy>	 ~5 minutes to upload
[21:43:45] <Reedy>	 Might as well try
[21:43:53] <Reedy>	 5-10, it's fluctuating
[21:47:56] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-extensions-PagedTiffHandler, 10Multimedia, and 2 others: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (10Bawolff) >Maybe it's just that Debian has adjusted the memory limit poli...
[21:50:59] <apergos>	 you have better upload speed than me indeed
[21:51:04] <apergos>	 welp, go go go
[21:58:00] <Reedy>	 http://commons.wikimedia.org/wiki/File:Anna_Norrie,_rollportr%C3%A4tt_-_SMV_-_NN054.tif
[21:59:51] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-extensions-PagedTiffHandler, 10Multimedia, and 2 others: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (10Reedy) 05Open→03Resolved a:03Reedy It works!  @Alicia_Fagerving_WM...
[22:06:19] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-extensions-PagedTiffHandler, 10Multimedia, and 2 others: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (10Reedy) >>! In T240455#5762109, @TheDJ wrote: > This is possibly related...
[22:22:45] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[22:24:33] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[22:40:30] <wikibugs>	 (03CR) 10BryanDavis: "I would personally be a lot happier with line length of 79. Black's formatting is tolerable otherwise. Black can be configured with a pypr" [puppet] - 10https://gerrit.wikimedia.org/r/554825 (https://phabricator.wikimedia.org/T221083) (owner: 10Jbond)
[22:41:25] <apergos>	 bd808: 100 or bust!  (I truly truly hate 79 as the cutoff)
[22:41:43] <apergos>	 and with that, good night from Greece where it is already the 25th, happy holidays, etc!
[22:41:50] <bd808>	 I truly hate anything >79 ;)
[22:42:03] <apergos>	 fight! fight! fight! :-P :-D
[22:42:05] <apergos>	 but not today
[22:42:08] <apergos>	 maybe next year :-D
[22:42:20] * apergos pulls the covers up
[22:45:17] <bd808>	 I can fit an 80 char terminal and a 1024x800 browser side by side on my laptop screen. If I make the terminal 100 chars then my browser has to be squished to a size that modern websites decide to treat as a tablet. Vim soft wraps, so >80 char lines are visible, but ugly.
[22:46:30] <bd808>	 Also (and this won't win an argument with a.pergos) I have used 79 chars for editing for something more than 25 years and wider files look really really weird
[23:14:28] <wikibugs>	 10Operations, 10Gerrit, 10serviceops, 10Patch-For-Review: Convert Gerrit to use H2 as the database - https://phabricator.wikimedia.org/T211139 (10Paladox) >>! In T211139#4798560, @Dzahn wrote: > On one hand i would love this because it would make the gerrit codfw slave work which is blocked to lack of misc...
[23:58:43] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - /usr/lib/nagios/plugins/check_ripe_atlas.py failed with HTTPError: HTTP Error 500: Internal Server Error https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[23:59:11] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets