[00:04:03] PROBLEM - Check systemd state on netflow3001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:04:25] PROBLEM - Check systemd state on netflow5001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:04:57] PROBLEM - Check systemd state on netflow2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:04:59] PROBLEM - Check systemd state on netflow4001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:05:33] PROBLEM - Check systemd state on netflow1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:20:22] 10Operations, 10netops: every Sunday at 00:00 UTC, logrotate fails on netflow hosts - https://phabricator.wikimedia.org/T257128 (10CDanis) [00:20:27] 10Operations, 10netops: every Sunday at 00:00 UTC, logrotate fails on netflow hosts - https://phabricator.wikimedia.org/T257128 (10CDanis) p:05Triage→03Low [03:31:39] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 201.4 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37 [04:51:23] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 62.03 
https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37 [04:53:50] (03PS1) 10VulpesVulpes825: Don't index NS_USER on hywiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609559 (https://phabricator.wikimedia.org/T257112) [05:17:53] PROBLEM - Query Service HTTP Port on wdqs1007 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 298 bytes in 0.002 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [06:03:43] PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash-codfw instance=kafkamon1001 job=burrow partition={2,3} site=eqiad topic={rsyslog-info,rsyslog-notice,udp_localhost-info} https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+ [06:03:43] r-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [06:12:57] PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash-codfw instance=kafkamon1001 job=burrow partition={0,1,2,3,4,5} site=eqiad topic={udp_localhost-info,udp_localhost-warning} https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqia [06:12:57] var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [06:20:25] RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200705T0700) [07:08:19] PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:09:05] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 133, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:15:41] RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:16:27] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 135, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:29:30] (03PS1) 10Amire80: Remove englishwikisource.tumblr.com from Planet Wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/609564 [07:34:58] (03PS1) 10Amire80: Remove three entries from the Russian Planet [puppet] - 10https://gerrit.wikimedia.org/r/609565 [07:50:58] (03PS2) 10RhinosF1: Remove englishwikisource.tumblr.com from Planet Wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/609564 (owner: 10Amire80) [07:51:36] (03CR) 10RhinosF1: [C: 03+1] "Fixed the grammar because I'm insane and it was annoying me leaving it. 
Otherwise, +1" [puppet] - 10https://gerrit.wikimedia.org/r/609564 (owner: 10Amire80) [07:55:30] (03CR) 10Amire80: "> Patch Set 2: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/609564 (owner: 10Amire80) [08:07:47] (03CR) 10Ammarpad: Don't index NS_USER on hywiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609559 (https://phabricator.wikimedia.org/T257112) (owner: 10VulpesVulpes825) [09:07:43] (03PS1) 10Joal: Bump AQS druid snapshot to 2020-06 [puppet] - 10https://gerrit.wikimedia.org/r/609566 [09:39:37] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 102.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37 [10:23:37] (03CR) 10MarcoAurelio: [C: 04-1] "I think this needs changing on https://meta.wikimedia.org/wiki/Interwiki_map, not here." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577740 (https://phabricator.wikimedia.org/T227053) (owner: 10Fomafix) [10:51:45] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 73.22 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37 [11:22:53] PROBLEM - Host cr3-eqsin is DOWN: PING CRITICAL - Packet loss = 100% [11:24:18] here but not near my laptop [11:24:34] <_joe_> I'm around [11:24:36] checking dashboards [11:24:47] PROBLEM - Host re0.cr3-eqsin is DOWN: PING CRITICAL - Packet loss = 100% [11:25:18] <_joe_> eqsin is unreachable [11:25:27] <_joe_> I'll write a patch to depool it [11:25:39] is it? 
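The Icinga checks throughout this log embed their thresholds in the alert text, e.g. `(C)100 gt (W)80 gt 73.22`: critical above 100, warning above 80, current value 73.22. A minimal sketch of that classification logic, using the elastic1052 GC-rate values seen in this log (function name hypothetical):

```python
def classify(value, warn=80.0, crit=100.0):
    """Classify a metric against 'gt' warning/critical thresholds,
    mirroring the '(C)crit gt (W)warn gt value' alert format."""
    if value > crit:
        return "CRITICAL"
    if value > warn:
        return "WARNING"
    return "OK"

print(classify(201.4))  # CRITICAL: 201.4 gt 100
print(classify(102.7))  # CRITICAL: 102.7 gt 100
print(classify(73.22))  # OK: (C)100 gt (W)80 gt 73.22
```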
[11:25:47] <_joe_> it's unreachable from eqiad [11:26:02] <_joe_> icinga has all cp5* timing out checks [11:26:29]  [11:26:41] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [11:27:03] PROBLEM - Router interfaces on cr2-eqsin is CRITICAL: CRITICAL: host 103.102.166.130, interfaces up: 74, down: 3, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [11:27:10] that's the new router, but it's just the one [11:27:29] in theory one of the two should be enough [11:27:30] PROBLEM - OSPF status on mr1-eqsin is CRITICAL: OSPFv2: 1/2 UP : OSPFv3: 1/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [11:27:30] <_joe_> they have now recovered [11:27:48] <_joe_> but we had alerts on 9 cp5* hosts [11:28:01] <_joe_> so yes, now everything seems ok [11:28:18] (on my phone btw) [11:28:51] PROBLEM - Host cr3-eqsin IPv6 is DOWN: PING CRITICAL - Packet loss = 100% [11:29:03] I barely have any signal for the next 2h (in the mountains) [11:29:44] <_joe_> yeah I browse just fine on eqsin for the record [11:30:57] looks like even the mgmt interface went down (from the backlog) [11:31:27] <_joe_> yes [11:31:40] up/here reading backlog [11:31:52] <_joe_> I will prepare a patch to depool eqsin in case we lose the second router [11:32:02] <_joe_> but I don't think this warrants depooling the site as of now [11:32:13] logging off for now but LMK if I can help [11:35:08] (03PS1) 10Giuseppe Lavagetto: Depool eqsin [dns] - 10https://gerrit.wikimedia.org/r/609571 [11:35:57] <_joe_> paravoid: do you think there is something we can try to do now? 
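_joe_'s reasoning above — prepare the depool patch now, but only merge it if the second eqsin router also fails, since one router can carry the site — amounts to a simple decision rule. This is a hypothetical illustration of that reasoning only; the actual depool is the DNS change in the patch referenced above:

```python
def should_depool(routers_up: int) -> bool:
    """Hypothetical rule: one healthy router can carry the site,
    so only depool when no router at the site is reachable."""
    return routers_up == 0

# cr3-eqsin down but cr2-eqsin still up: keep eqsin pooled
print(should_depool(1))  # False
print(should_depool(0))  # True
```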
[11:36:06] <_joe_> oh you're on the phone too, heh [11:36:57] next steps will be to look at the console server, see if there is anything there [11:38:09] if yes, get the logs, etc., and most likely open a JTAC case [11:38:39] if not, probably depool eqsin and powercycle it by turning the PDU ports off/on [11:41:10] <_joe_> https://gerrit.wikimedia.org/r/c/operations/dns/+/609571/ [11:42:23] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 63 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:44:09] don't have access to the console server, unfortunately [11:48:13] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:52:39] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37 [12:03:35] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [12:09:23] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 569 (alerts on 
https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [12:29:33] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 52.88 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37 [13:48:55] PROBLEM - Check systemd state on wdqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:08:53] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [14:09:13] RECOVERY - Check systemd state on wdqs1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:10:45] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [14:14:45] PROBLEM - Check systemd state on wdqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:38:39] RECOVERY - Check systemd state on wdqs1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:44:03] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 101.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37 [14:44:11] PROBLEM - Check systemd state on wdqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:08:09] RECOVERY - Check systemd state on wdqs1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:13:41] PROBLEM - Check systemd state on wdqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:26:38] (03PS1) 10Thiemo Kreuz (WMDE): [POC] Convert all Wikipedia logos to (true) grayscale [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609584 [15:31:33] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [15:33:22] (03PS2) 10Thiemo Kreuz (WMDE): [POC] Convert all Wikipedia logos to (true) grayscale [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609584 (https://phabricator.wikimedia.org/T252108) [15:37:25] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 49 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [15:51:25] (03CR) 10Privacybatm: "In this new patch set, I used a file to write the checksum instead of a pipe. This change resolves the above stuck issue. 
The benchmark va" [software/transferpy] - 10https://gerrit.wikimedia.org/r/608640 (https://phabricator.wikimedia.org/T254979) (owner: 10Privacybatm) [15:56:16] !log restart blazegraph + updater on wdqs1007 and depool to allow catching up on lag [15:56:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:13] PROBLEM - WDQS high update lag on wdqs1007 is CRITICAL: 1.524e+05 ge 4.32e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [15:57:49] RECOVERY - Query Service HTTP Port on wdqs1007 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.025 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [15:57:55] RECOVERY - Check systemd state on wdqs1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:58:38] ACKNOWLEDGEMENT - WDQS high update lag on wdqs1007 is CRITICAL: 1.524e+05 ge 4.32e+04 Gehel catching up on lag after restart - The acknowledgement expires at: 2020-07-06 15:58:16. 
https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [16:01:22] !log restart elastic-psi on elastic1052 (high GC rate) [16:01:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:14:21] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 53.9 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37 [16:27:03] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:28:57] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [17:54:43] PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is CRITICAL: 48.3 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [17:58:23] RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [18:20:11] (03PS1) 10Andrew Bogott: profile::wmcs::novaproxy: add use_wmflabs_root option [puppet] - 10https://gerrit.wikimedia.org/r/609586 (https://phabricator.wikimedia.org/T256276) [18:20:13] (03PS1) 10Andrew Bogott: profile::wmcs::novaproxy: add acme_certname [puppet] - 10https://gerrit.wikimedia.org/r/609587 (https://phabricator.wikimedia.org/T256276) [19:10:28] 10Operations, 10netops: Investigate Junos vmhost snapshot - https://phabricator.wikimedia.org/T257153 (10ayounsi) p:05Triage→03High [19:16:28] 10Operations, 10netops: cr3-eqsin disk 1 failure - https://phabricator.wikimedia.org/T257154 (10ayounsi) p:05Triage→03High [19:34:10] 10Operations, 10netops: cr3-eqsin disk 1 failure - https://phabricator.wikimedia.org/T257154 (10ayounsi) Opened JTAC case 2020-0705-0136 [19:43:47] 10Operations, 10netops: cr3-eqsin disk 1 failure - https://phabricator.wikimedia.org/T257154 (10ayounsi) Note that there are no mentions of hard drives in Juniper's MX204 [[ https://www.juniper.net/documentation/en_US/release-independent/junos/information-products/pathway-pages/mx-series/mx204/mx204-hw-guide.... 
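The Varnish traffic-drop alert earlier (`48.3 le 60`) compares current request volume at eqsin against the volume 30 minutes before, expressed as a percentage, and goes critical when that ratio falls to the threshold or below. A minimal sketch of that comparison (names and request counts hypothetical):

```python
def traffic_pct_of_past(now_reqs: float, past_reqs: float) -> float:
    """Current traffic as a percentage of traffic 30 minutes ago."""
    return 100.0 * now_reqs / past_reqs

def is_drop_critical(now_reqs: float, past_reqs: float,
                     threshold: float = 60.0) -> bool:
    """Fires when traffic drops to <= threshold percent of the
    earlier value, as in '48.3 le 60'."""
    return traffic_pct_of_past(now_reqs, past_reqs) <= threshold

print(is_drop_critical(483.0, 1000.0))  # True: 48.3 le 60
print(is_drop_critical(900.0, 1000.0))  # False: 90.0 gt 60
```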
[20:45:27] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [20:51:17] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 49 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [21:01:53] 10Operations, 10netops: cr3-eqsin disk 1 failure - https://phabricator.wikimedia.org/T257154 (10ayounsi) JTAC asked for some more troubleshooting commands, the interesting one is: ` root> show vmhost hardware Compute cluster: rainier-re-cc Compute node: rainier-re-cn Hardware invento... [21:20:33] !log Enable puppet on gerrit1002 (gerrit-test) again to let it catch up again [21:20:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:24:18] (03PS1) 10QChris: Bump gerrit.war to Gerrit 3.2.2-102-g3bbb138e13 [software/gerrit] (deploy/wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/609590 (https://phabricator.wikimedia.org/T256550) [21:24:22] (03PS1) 10QChris: Bump zuul.jar to master-0-g7accc67 [software/gerrit] (deploy/wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/609591 (https://phabricator.wikimedia.org/T256550) [21:27:12] (03CR) 10QChris: [V: 03+2 C: 03+2] Bump gerrit.war to Gerrit 3.2.2-102-g3bbb138e13 [software/gerrit] (deploy/wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/609590 (https://phabricator.wikimedia.org/T256550) (owner: 10QChris) [21:27:42] (03CR) 10QChris: [V: 03+2 C: 03+2] Bump zuul.jar to master-0-g7accc67 [software/gerrit] (deploy/wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/609591 (https://phabricator.wikimedia.org/T256550) (owner: 10QChris) [21:32:21] !log qchris@deploy1001 Started deploy 
[gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13 and zuul plugin to master-0-g7accc67 [21:32:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:32:29] !log qchris@deploy1001 Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13 and zuul plugin to master-0-g7accc67 (duration: 00m 08s) [21:32:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:32:57] !log Restarting gerrit on gerrit1002 to pick up new wars and jars. [21:33:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:45:34] !log qchris@deploy1001 Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit2001 [21:45:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:45:44] !log qchris@deploy1001 Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit2001 (duration: 00m 10s) [21:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:46:21] !log Restarting gerrit on gerrit2001 to pick up new war and jars. 
[21:46:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:50:34] !log qchris@deploy1001 Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1001 [21:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:50:42] !log qchris@deploy1001 Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1001 (duration: 00m 07s) [21:50:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:50:48] !log Restarting gerrit on gerrit1001 to pick up new war and jars. [21:50:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:55:58] Gerrit is up, is properly connected to phabricator, allows clones, pushes, voting, and integration with Zuul works too. [22:20:29] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 62 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [22:32:09] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [22:41:43] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 58 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [22:47:35] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 49 probes of 569 (alerts on 
50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
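The recurring RIPE Atlas checks in this log count how many of the 569 measurement probes fail to reach eqsin, with "alerts on 50": the log reports 50 failed probes as OK and 51 as CRITICAL, so the check appears to fire once failures exceed the threshold. A minimal sketch of that logic (function name hypothetical):

```python
def atlas_status(failed_probes: int, alert_on: int = 50) -> str:
    """RIPE Atlas reachability check: CRITICAL once the number of
    failing probes exceeds the 'alerts on' threshold."""
    return "CRITICAL" if failed_probes > alert_on else "OK"

print(atlas_status(63))  # CRITICAL - failed 63 probes (alerts on 50)
print(atlas_status(50))  # OK - 50 failed is still within threshold
print(atlas_status(49))  # OK
```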