[00:25:31] PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash instance=kafkamon1001:9501 job=burrow partition={2,3} site=eqiad topic=udp_localhost-info https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logg [00:25:31] ic=All&var-consumer_group=All [00:34:33] RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [01:44:27] 10Operations, 10Research, 10Wikimedia-Mailing-lists: Admin password reset request for a mailman list: research-wmf - https://phabricator.wikimedia.org/T255326 (10Reedy) [02:00:17] PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash instance=kafkamon1001:9501 job=burrow partition={2,3} site=eqiad topic=udp_localhost-info https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logg [02:00:17] ic=All&var-consumer_group=All [02:03:51] RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [02:17:07] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [02:18:55] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [02:38:53] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [02:39:29] RECOVERY - OSPF status on cr2-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [04:24:59] PROBLEM - Check systemd state on ms-be1055 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:50:37] RECOVERY - Check systemd state on ms-be1055 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:21:01] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [05:30:11] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is OK: HTTP OK: HTTP/1.0 200 OK - 23533 bytes in 0.255 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [05:40:03] PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash instance=kafkamon1001:9501 job=burrow partition=3 site=eqiad topic=udp_localhost-info https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging- [05:40:03] ll&var-consumer_group=All [05:41:51] RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [06:34:55] PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [06:35:49] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 133, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [06:46:53] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 135, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [06:47:49] RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200613T0700) [07:03:21] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job={atlas_exporter,swagger_check_cxserver_cluster_eqiad} site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [07:05:09] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [08:01:25] PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash instance=kafkamon1001:9501 job=burrow partition={2,3} site=eqiad topic=udp_localhost-info https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logg [08:01:25] ic=All&var-consumer_group=All [08:03:54] mmmm [08:04:09] so the max lag seems to be for udp_localhost-info [08:04:23] but the topic didn't seem to have changed its volume size (https://grafana.wikimedia.org/d/000000234/kafka-by-topic?orgId=1&from=now-12h&to=now&refresh=5m&var-datasource=eqiad%20prometheus%2Fops&var-kafka_cluster=logging-eqiad&var-kafka_broker=All&var-topic=udp_localhost-info) [08:11:26] there is a nice match between kafka lag and logstash1008's gc timings [08:20:22] I added a breakdown to the logstash dashboard with eden/survivor/old-gen heap areas [08:20:45] and I think that CMS is struggling to free space [08:22:57] we could try to restart it but we've never done it, logstash1008 is behind LVS etc., so better not to do anything weird on a saturday [08:23:07] Will Cc: godog,herron,shdubsh --^ [08:23:38] * elukey afk, will check later [09:13:43] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [09:15:33] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [10:07:01] Okay something is wrong [10:07:16] I shouldn't get an error message when trying to upload a file on Commons.. [10:07:17] Request from 88.97.96.89 via cp3062 frontend, Varnish XID 547625252 [10:07:17] Error: 503, Backend fetch failed at Sat, 13 Jun 2020 10:06:25 GMT [10:07:37] when trying to use the Upload form at Commons to upload something I KNOW exists [10:07:50] https://archive.org/download/catalogofcopy13libr/catalogofcopy13libr.pdf [10:07:52] exists [10:08:00] and is PD as a US Government work [10:08:22] If large files can't be uploaded from a URL, then the documentation ought to SAY SO!! [10:29:00] Anyone know what might have broken? [10:29:30] Because not being able to upload large PDFs to Commons is a distinct disincentive to my continuing to contribute to WMF projects [10:32:02] sunday at night US time is probably the worst time to complain about non-emergencies [10:38:18] (03CR) 10Dzahn: [C: 03+1] package_builder: Remove support for jessie [puppet] - 10https://gerrit.wikimedia.org/r/605240 (owner: 10Muehlenhoff) [10:41:57] (03CR) 10Dzahn: [C: 03+1] profile::icinga: move single line scripts in line [puppet] - 10https://gerrit.wikimedia.org/r/605271 (https://phabricator.wikimedia.org/T254480) (owner: 10Jbond) [10:46:20] (03CR) 10Dzahn: "The profile::openstack::eqiad1::galera::monitoring would have to be included by a role to actually be used?" 
[puppet] - 10https://gerrit.wikimedia.org/r/605315 (owner: 10Andrew Bogott) [10:53:57] Majavah: It's Saturday Morning where I am [10:54:39] oh wait it's saturday, I need more coffee :D [10:55:23] anyways I'd recommend creating a task on Phabricator so someone can take a look during the working week [10:55:48] ShakespeareFan00: when using tools that don't support chunked uploads the limit is 100 MiB. the docs say so at https://commons.wikimedia.org/wiki/Commons:Maximum_file_size the file you link to is _just_ over that limit with 101.69 MiB or something [10:56:12] mutante: Commons says uploading from a URL should allow anything up to 4GB [10:56:14] also Saturday or Sunday both are not during regular work hours. please use tickets to report problems. [10:56:30] (sigh) [10:56:36] ShakespeareFan00: see link above "Uploads using the Upload Wizard, other tools that support chunked uploads, and server-side uploads must be smaller than this limit.[1][2] Otherwise the limit is 100 MiB (104,857,600 bytes)[3]" [10:56:58] mutante: Then WHY DOES COMMONS SAY OTHERWISE? [10:57:07] (Apologies but I have to be blunt) [10:57:15] first google result when looking for commons maximum file size. please stop shouting [10:57:19] No [10:57:33] I followed the advice on commons, and I get errors [10:57:42] I expect some sort of answer [10:57:45] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 56 probes of 575 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:57:53] then you gotta ask on commons [10:57:55] Ok, maybe I am a little frustrated [10:58:10] (sigh) Defer the problem again instead of actually solving it :9 [10:58:13] :( [10:58:19] i am here by mere coincidence and decided to help you [10:58:22] good bye [11:03:33] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 48 probes of 575 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:12:16] mutante: Sorry [11:12:39] for shouting [11:13:06] but it gets very frustrating getting persistent errors when you follow the documentation [11:13:24] The actual error I am getting when trying to do the upload is [11:13:47] "Request from 88.97.96.89 via cp3062 frontend, Varnish XID 625639739 [11:13:47] Error: 503, Backend fetch failed at Sat, 13 Jun 2020 11:10:56 GMT" [11:14:06] I think the file still uploads, but then something fails elsewhere [11:14:17] I think there's already a phabricator ticket about that [11:14:48] I don't at this point think it's strictly a Commons issue though. [11:14:58] And once again sorry for shouting.. [11:18:57] a 5xx indicates something went wrong on the server side [11:19:47] so yeah, there should probably be a phab ticket somewhere [11:23:32] My apologies for getting upset [11:24:51] Krenair: https://phabricator.wikimedia.org/T255238 [11:25:06] https://phabricator.wikimedia.org/T254459 [11:25:24] Although those don't mention the 5xx errors when using Special:Upload directly [11:26:08] Krenair: Can I also borrow your expertise in #wikimedia-tech to help resolve a script problem? 
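A note on the upload-size discussion above: a quick way to see whether a source file will clear the 100 MiB (104,857,600 byte) limit that mutante quotes for non-chunked uploads is to check its Content-Length before trying Special:Upload or upload-by-URL. Below is a minimal Python sketch, assuming the remote server answers HEAD requests with a Content-Length header; the archive.org URL and the byte threshold come from the conversation, everything else is illustrative.

    #!/usr/bin/env python3
    # Sketch: check whether a remote file fits under the 100 MiB limit that
    # applies to non-chunked uploads per the Commons:Maximum_file_size page
    # cited above. Assumes the server answers HEAD with a Content-Length.
    import urllib.request

    NON_CHUNKED_LIMIT = 104_857_600  # 100 MiB

    def remote_size(url):
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=10) as resp:
            return int(resp.headers["Content-Length"])

    if __name__ == "__main__":
        url = "https://archive.org/download/catalogofcopy13libr/catalogofcopy13libr.pdf"
        size = remote_size(url)
        print(f"{size} bytes ({size / 2**20:.2f} MiB); "
              f"{'over' if size > NON_CHUNKED_LIMIT else 'within'} the non-chunked limit")

Per the Commons:Maximum_file_size text quoted above, the larger limit only applies to the Upload Wizard, other tools that support chunked uploads, and server-side uploads; anything else is capped at 100 MiB.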
[11:26:29] ok [11:49:13] 10Operations, 10Cloud-VPS, 10DNS, 10Maps, and 2 others: multi-component wmflabs.org subdomains doesn't work under simple wildcard TLS cert - https://phabricator.wikimedia.org/T161256 (10TheDJ) I replaced the redirects with a general http -> https redirect protocol upgrade. [12:12:29] PROBLEM - MariaDB Slave SQL: x1 on db2101 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1677, Errmsg: Column 2 of table wikishared.echo_unread_wikis cannot be converted from type varchar(30) to type varbinary(64) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [12:13:15] PROBLEM - MariaDB Slave Lag: x1 on db2101 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 86463.06 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [12:32:33] elukey: thanks! yeah I'll bounce logstash on logstash1008 [12:33:35] !log bounce logstash on logstash1008, GC death [12:33:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:10] !log Disabling puppet on gerrit1002 (test instance) to do some more upgrade testing [12:34:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:43:17] hey godog was just looking at the same thanks! [12:44:02] herron: np! [12:44:52] looks like it is recovering, waiting and see if some other logstash instance is affected [12:47:09] yeah I was thinking a full restart of the collectors wouldn't be a bad idea [12:51:47] !log restarted logstash service on logstash1007, logstash1009 [12:51:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:08] SGTM [12:54:44] since lag is trending in the right direction now I'll resume my trip to the grocery! thanks! please ping if needed [12:57:26] ok! ttyl [13:01:39] RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [16:02:22] 10Operations, 10Security-Team, 10Wikimedia-Mailing-lists: Have a conversation about migrating from GNU Mailman 2.1 to GNU Mailman 3.0 - https://phabricator.wikimedia.org/T52864 (10Ladsgroup) It works now, you can try it in https://lists-beta.wmflabs.org I haven't managed to get the archive working but you c... [19:00:01] PROBLEM - Check systemd state on dumpsdata1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:00:19] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 51 probes of 574 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [19:02:51] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [19:04:41] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [19:06:05] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 47 probes of 574 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [19:20:41] 10Operations, 10Release-Engineering-Team-TODO, 10serviceops, 10Continuous-Integration-Config, and 2 others: Add pcov PHP extension to wikimedia apt so it can be used in Wikimedia CI - https://phabricator.wikimedia.org/T243847 (10Daimona) [19:37:30] 10Operations, 10Thumbor, 10Wikimedia-SVG-rendering, 10Upstream: Update librsvg to ≥2.42.3 - https://phabricator.wikimedia.org/T193352 (10AntiCompositeNumber) [19:50:52] 10Operations, 10Wikimedia-General-or-Unknown, 10Wikimedia-SVG-rendering, 10Documentation: Document how to request installing additional SVG and PDF fonts on Wikimedia servers - https://phabricator.wikimedia.org/T228591 (10AntiCompositeNumber) SVG and PDF rendering are both handled by Thumbor on Wikimedia s... [20:09:57] (03Abandoned) 10Addshore: wikidata: post edit constraint jobs on 50% of edits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484633 (https://phabricator.wikimedia.org/T204031) (owner: 10Addshore) [20:10:00] (03Abandoned) 10Addshore: wikidata: post edit constraint jobs on 100% of edits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484635 (https://phabricator.wikimedia.org/T204031) (owner: 10Addshore) [20:33:05] (03PS1) 10Ladsgroup: rabbitmq: Rename "slave" to "replica" in comment [puppet] - 10https://gerrit.wikimedia.org/r/605382 (https://phabricator.wikimedia.org/T254646) [20:37:35] (03PS1) 10Ladsgroup: wmcs: Remove "slave" from comment [puppet] - 10https://gerrit.wikimedia.org/r/605383 (https://phabricator.wikimedia.org/T254646) [20:38:28] (03PS2) 10Ladsgroup: rabbitmq: Rename "slave" to "replica" in comment [puppet] - 10https://gerrit.wikimedia.org/r/605382 (https://phabricator.wikimedia.org/T254646) [20:44:59] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 54 probes of 574 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [20:50:49] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 47 probes of 574 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [21:12:32] !log Enabling puppet on gerrit1002 (test instance). Done with testing for today. 
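A side note on the logstash1008 incident earlier in the log (Kafka consumer lag tracking the host's GC timings, CMS unable to free old-gen space, resolved by bouncing the logstash collectors): one way to eyeball that kind of correlation is to pull the consumer-lag and old-gen-heap series from Prometheus over the same window. The Python sketch below does that; the Prometheus endpoint, metric names and label values are assumptions for illustration, not the exact series behind the Grafana dashboards linked in the alerts.

    #!/usr/bin/env python3
    # Sketch: fetch Kafka consumer lag and logstash JVM old-gen usage from a
    # Prometheus instance and summarise them side by side, to spot the kind of
    # "lag follows GC" correlation described above. Endpoint, metric names and
    # labels are hypothetical.
    import json
    import time
    import urllib.parse
    import urllib.request

    PROM = "http://prometheus.example.org/api/v1/query_range"  # hypothetical endpoint

    def query_range(expr, hours=3, step="60s"):
        end = int(time.time())
        params = urllib.parse.urlencode({
            "query": expr,
            "start": end - hours * 3600,
            "end": end,
            "step": step,
        })
        with urllib.request.urlopen(f"{PROM}?{params}", timeout=30) as resp:
            return json.load(resp)["data"]["result"]

    if __name__ == "__main__":
        # Hypothetical series: Burrow-reported lag for the logstash consumer group,
        # and old-gen heap usage exported by the JVM on logstash1008.
        lag = query_range('kafka_burrow_partition_lag{group="logstash",topic="udp_localhost-info"}')
        heap = query_range('jvm_memory_pool_bytes_used{pool="CMS Old Gen",instance=~"logstash1008.*"}')
        for series, label in ((lag, "consumer lag"), (heap, "old-gen bytes")):
            for s in series:
                vals = [float(v) for _, v in s["values"]]
                print(f"{label} {s['metric']}: min={min(vals):.0f} max={max(vals):.0f}")

If the old-gen curve stays pinned near its maximum while lag climbs, that matches the "CMS struggling to free space" picture that prompted the restarts above.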
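On the db2101 replication breakage reported above (Errno 1677: column 2 of wikishared.echo_unread_wikis cannot be converted from varchar(30) to varbinary(64)): with row-based replication this usually means the replica's column definition no longer matches what the primary is sending, for example because a schema change landed on only one side. A minimal sketch of confirming such a mismatch by comparing information_schema on both hosts; host names, credentials and the use of PyMySQL are assumptions, only the schema and table names come from the alert.

    #!/usr/bin/env python3
    # Sketch: compare echo_unread_wikis column definitions on the primary and on
    # db2101 to confirm the type mismatch behind the replication error above.
    # Hosts, user and password are hypothetical placeholders.
    import pymysql

    QUERY = """
        SELECT ORDINAL_POSITION, COLUMN_NAME, COLUMN_TYPE
        FROM information_schema.COLUMNS
        WHERE TABLE_SCHEMA = 'wikishared' AND TABLE_NAME = 'echo_unread_wikis'
        ORDER BY ORDINAL_POSITION
    """

    def columns(host):
        conn = pymysql.connect(host=host, user="readonly", password="...", read_timeout=10)
        try:
            with conn.cursor() as cur:
                cur.execute(QUERY)
                return cur.fetchall()
        finally:
            conn.close()

    if __name__ == "__main__":
        primary = columns("x1-primary.example.org")   # hypothetical host names
        replica = columns("db2101.example.org")
        for p, r in zip(primary, replica):
            marker = "" if p[2] == r[2] else "  <-- mismatch"
            print(f"col {p[0]} {p[1]}: primary={p[2]} replica={r[2]}{marker}")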
[21:12:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:17:45] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 66 probes of 574 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [23:19:11] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 52 probes of 574 (alerts on 50) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [23:23:35] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 48 probes of 574 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [23:25:01] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 46 probes of 574 (alerts on 50) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas