[00:00:09] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[00:41:25] Critical Alert for device cr2-eqdfw.wikimedia.org - Primary outbound port utilisation over 80%
[00:41:36] Critical Alert for device cr2-eqdfw.wikimedia.org - Primary inbound port utilisation over 80%
[00:47:25] C̶r̶i̶t̶i̶c̶a̶l̶ Device cr2-eqdfw.wikimedia.org recovered from Primary outbound port utilisation over 80%
[00:47:36] C̶r̶i̶t̶i̶c̶a̶l̶ Device cr2-eqdfw.wikimedia.org recovered from Primary inbound port utilisation over 80%
[01:29:57] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:30:25] RECOVERY - OSPF status on cr2-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:43:51] PROBLEM - rpki grafana alert on icinga1001 is CRITICAL: CRITICAL: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is alerting: eqiad rsync status alert. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[03:04:55] RECOVERY - rpki grafana alert on icinga1001 is OK: OK: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is not alerting.
https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[04:32:07] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[04:37:33] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:16:37] (PS1) Legoktm: keys.txt: Only include Tim's current key (73F146FECF9D333C) [mediawiki-config] - https://gerrit.wikimedia.org/r/557158
[05:16:39] (PS1) Legoktm: keys.html: Include Tim's new key (73F146FECF9D333C) [mediawiki-config] - https://gerrit.wikimedia.org/r/557159
[05:45:27] (PS3) TechneSiyam: Added missing comma in line 1712,1782 [mediawiki-config] - https://gerrit.wikimedia.org/r/557053
[05:46:30] (CR) jerkins-bot: [V: -1] Added missing comma in line 1712,1782 [mediawiki-config] - https://gerrit.wikimedia.org/r/557053 (owner: TechneSiyam)
[05:56:12] (CR) Legoktm: "Late review, but did you do this manually or script it?
It would be nice if we could do this on a regular basis are more sites move to HTT" [puppet] - https://gerrit.wikimedia.org/r/551919 (owner: Dzahn)
[06:00:05] (PS2) TechneSiyam: Modified files with correct sized logos [mediawiki-config] - https://gerrit.wikimedia.org/r/556212
[06:00:07] (PS4) TechneSiyam: Added bnwikibooks,bnwikisource,ukwikivoyage under wiki hd logos [mediawiki-config] - https://gerrit.wikimedia.org/r/557053
[06:01:09] (CR) jerkins-bot: [V: -1] Added bnwikibooks,bnwikisource,ukwikivoyage under wiki hd logos [mediawiki-config] - https://gerrit.wikimedia.org/r/557053 (owner: TechneSiyam)
[06:07:30] (PS5) TechneSiyam: Added bnwikibooks,bnwikisource,ukwikivoyage under wiki hd logos [mediawiki-config] - https://gerrit.wikimedia.org/r/557053
[06:08:20] (CR) jerkins-bot: [V: -1] Added bnwikibooks,bnwikisource,ukwikivoyage under wiki hd logos [mediawiki-config] - https://gerrit.wikimedia.org/r/557053 (owner: TechneSiyam)
[06:11:43] (PS6) TechneSiyam: Added bnwikibooks,bnwikisource,ukwikivoyage under wiki hd logos [mediawiki-config] - https://gerrit.wikimedia.org/r/557053
[06:12:30] (CR) jerkins-bot: [V: -1] Added bnwikibooks,bnwikisource,ukwikivoyage under wiki hd logos [mediawiki-config] - https://gerrit.wikimedia.org/r/557053 (owner: TechneSiyam)
[08:40:53] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 153452328 and 11 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[08:51:43] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 39015736 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[08:53:31] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 3144 and 34 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[08:53:31]
RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 109640 and 34 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[10:13:43] (Abandoned) Legoktm: php72: Switch from thirdparty/php72 to component/php72 [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/495291 (https://phabricator.wikimedia.org/T216712) (owner: Legoktm)
[10:42:01] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:09:03] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:41] (PS1) Aklapper: phabricator weekly project changes email: Tweak query about new assignees [puppet] - https://gerrit.wikimedia.org/r/557210 (https://phabricator.wikimedia.org/T227388)
[12:39:49] PROBLEM - Logstash Elasticsearch indexing errors on icinga1001 is CRITICAL: 0.65 ge 0.5 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash
[13:50:11] RECOVERY - Logstash Elasticsearch indexing errors on icinga1001 is OK: (C)0.5 ge (W)0.1 ge 0.07917 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash
[14:01:33] PROBLEM - rpki grafana alert on icinga1001 is CRITICAL: CRITICAL: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is alerting: rsync status alert.
https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[14:41:51] (CR) Andrew Bogott: [C: +2] Add shinken group and contacts for the 'gratitude' cloud-vps project [puppet] - https://gerrit.wikimedia.org/r/556389 (https://phabricator.wikimedia.org/T238424) (owner: Andrew Bogott)
[15:22:35] RECOVERY - rpki grafana alert on icinga1001 is OK: OK: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is not alerting. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[15:43:23] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:01:23] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:06:47] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:12:13] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:15:51] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:30:23] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:33:59] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[18:29:55] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:31:43] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:16:58] Is anyone able to restart gerrit on gerrit2001 for T240763 please?
[20:16:58] T240763: gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763
[20:38:03] (PS5) Ladsgroup: mediawiki: Use mediawiki::errorpage instead of a php7-fatal-error.php.erb [puppet] - https://gerrit.wikimedia.org/r/539203 (https://phabricator.wikimedia.org/T113114)
[20:51:01] (CR) Ladsgroup: "The rebase is done."
(2 comments) [puppet] - https://gerrit.wikimedia.org/r/539203 (https://phabricator.wikimedia.org/T113114) (owner: Ladsgroup)
[21:53:59] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:56:33] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase
[21:58:19] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[21:59:25] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[22:03:15] PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:06:21] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[22:06:51] PROBLEM - restbase endpoints health on restbase1019 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:07:57] PROBLEM - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase
[22:08:01] PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:08:11] PROBLEM - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is CRITICAL: /api/rest_v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase
[22:08:39] PROBLEM - restbase endpoints health on restbase1021 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from
Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:08:39] PROBLEM - restbase endpoints health on restbase1026 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:08:41] is graphite-labs.wikimedia.org okay?
[22:09:15] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase
[22:09:43] RECOVERY - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[22:09:55] RECOVERY - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[22:10:21] RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:10:27] RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:11:33] RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:11:43] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[22:12:19] PROBLEM - restbase endpoints health on restbase1027 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:12:19] PROBLEM - restbase
endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:13:35] PROBLEM - restbase endpoints health on restbase1025 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:14:01] RECOVERY - restbase endpoints health on restbase1021 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:14:03] RECOVERY - restbase endpoints health on restbase1027 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:14:03] RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:14:05] RECOVERY - restbase endpoints health on restbase1019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:15:19] RECOVERY - restbase endpoints health on restbase1025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:16:29] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[22:20:57] PROBLEM - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is CRITICAL: /api/rest_v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received: /api/rest_v1/transform/wikitext/to/html/{title} (Transform wikitext to html) timed out before a response was received: /api/rest_v1/media/math/check/{type} (Mathoid - check test formula) timed out before a response was received htt
[22:20:57] imedia.org/wiki/RESTBase
[22:25:05] PROBLEM - restbase
endpoints health on restbase1027 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:26:43] PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:26:47] RECOVERY - restbase endpoints health on restbase1027 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:28:37] RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:28:43] PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:30:27] RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:31:52] (CR) Hashar: [C: +1] Install pygerrit2 on releases server [puppet] - https://gerrit.wikimedia.org/r/557075 (https://phabricator.wikimedia.org/T196517) (owner: 20after4)
[22:33:37] RECOVERY - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[22:40:13] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase
[22:41:59] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[22:42:37] PROBLEM - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase
[22:44:23] RECOVERY - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[22:44:33] PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:46:23] RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:46:25] PROBLEM - restbase endpoints health on restbase1025 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:48:07] RECOVERY - restbase endpoints health on restbase1025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[22:50:07] !log Restarted Gerrit on gerrit2001 # T240763
[22:50:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:50:15] T240763: gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763
[22:51:31] Operations, Gerrit, LibUp: gerrit-replica is returning 502 responses when trying to git clone, breaking libup -
https://phabricator.wikimedia.org/T240763 (hashar) [2019-12-14 16:07:25,808] [HTTP-87043] WARN org.eclipse.jetty.servlet.ServletHandler : Error for /r/mediawiki/extensions/DataTransfer.git...
[22:51:53] Operations, Gerrit, LibUp, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201912): gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (hashar)
[22:53:19] Operations, Gerrit, LibUp, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201912): gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (Legoktm) Thank you hashar :) Confirmed that libup...
[22:54:12] Operations, Gerrit, LibUp, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201912): gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (hashar) And the trace analysis is https://fastthr...
[22:56:35] Operations, Gerrit, LibUp, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201912): gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (hashar) No out of memory messages for December 12...
[23:01:50] Operations, Gerrit, LibUp, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201912): gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (Legoktm) Looks like the memory usage has just bee...
[23:02:59] Operations, Gerrit, LibUp, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201912): gerrit-replica is returning 502 responses when trying to git clone, breaking libup - https://phabricator.wikimedia.org/T240763 (hashar) Open→Resolved a:hashar tldr:...
[23:24:59] (CR) Krinkle: mediawiki: Use mediawiki::errorpage instead of a php7-fatal-error.php.erb (2 comments) [puppet] - https://gerrit.wikimedia.org/r/539203 (https://phabricator.wikimedia.org/T113114) (owner: Ladsgroup)