[00:00:38] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:09:08] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:10:48] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:24:10] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:27:32] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:30:54] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:32:36] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:34:16] PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 18172960 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:35:56] RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 12632 and 6 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:21:37] (CR) Huji: "Now is the time to rebase and deploy this." [mediawiki-config] - https://gerrit.wikimedia.org/r/626851 (https://phabricator.wikimedia.org/T262689) (owner: Majavah)
[06:43:36] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 90, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:44:12] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 241, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:53:42] sigh this is still in progress -^
[06:57:04] (PS1) Elukey: Decom analytics1054 from Hadoop [puppet] - https://gerrit.wikimedia.org/r/634766 (https://phabricator.wikimedia.org/T255140)
[06:58:24] (CR) Elukey: [C: +2] Decom analytics1054 from Hadoop [puppet] - https://gerrit.wikimedia.org/r/634766 (https://phabricator.wikimedia.org/T255140) (owner: Elukey)
[07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201018T0700)
[07:54:52] Can someone check the stack trace for `[0c5ddc64-14bf-4c8e-aa80-7c85a13568f7] 2020-10-18 07:54:41: Fatal exception of type "MWException"` please?
[08:18:35] DannyS712: TeX input is invalid
[08:19:51] okay, reported at https://phabricator.wikimedia.org/T265814 if you'd be willing to add a full stack trace (though that has a different id)
[08:20:16] https://phabricator.wikimedia.org/P13017
[08:20:30] feel free to link it
[08:21:31] I have added the id so it's clear
[08:22:05] DannyS712: ^^
[08:23:01] thanks, added to task
[09:21:14] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:21:20] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 243, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:21:34] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 92, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:22:54] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:39:58] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:41:42] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:10:47] Operations, Commons, Tools, Wikimedia-Mailing-lists: daily-image-l stopped sending on 2020-10-11 - https://phabricator.wikimedia.org/T265568 (Steinsplitter) >>! In T265568#6548353, @Legoktm wrote: >>>! In T265568#6546816, @Steinsplitter wrote: >> @Legoktm If you provide a e-mail address to me, th...
[11:08:30] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:10:12] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:18:07] (PS1) Urbanecm: arbcom_ruwiki.yaml: Fix a copy/paste typo from creating the wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/634772
[11:29:50] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:31:26] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:33:10] (PS1) Urbanecm: Initial configuration for smnwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/634774 (https://phabricator.wikimedia.org/T264859)
[11:56:14] Operations, DBA, Epic, User-Kormat: Use zarcillo as an authoritative inventory of db instances/roles - https://phabricator.wikimedia.org/T257814 (Aklapper)
[13:25:06] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:26:48] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:03:44] Operations, Wikimedia-Mailing-lists, I18n: Several unreadable mailing list descriptions (Mojibake) due to wrong charset encodings, should be Unicode - https://phabricator.wikimedia.org/T261031 (alriazuddin2017) {F32397330} Today I received the mail (see screenshot) from the mailing list of Bengali...
[14:29:32] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:31:12] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:52:56] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:54:34] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:15:36] Operations, Wikimedia-Mailing-lists, I18n: Several unreadable mailing list descriptions (Mojibake) due to wrong charset encodings, should be Unicode - https://phabricator.wikimedia.org/T261031 (alanajjar) We are facing same issue on: * Wikimedia-levant list https://lists.wikimedia.org/mailman/listinf...
[15:26:16] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job={swagger_check_citoid_cluster_codfw,swagger_check_restbase_esams} site={codfw,esams} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:26:31] Operations, Wikimedia-Mailing-lists, I18n: Several unreadable mailing list descriptions (Mojibake) due to wrong charset encodings, should be Unicode - https://phabricator.wikimedia.org/T261031 (Ladsgroup) I, in my volunteer capacity, have been trying to figure out what's going on and have tried sever...
[15:31:18] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:43:10] Operations, Wikimedia-Mailing-lists, I18n: Several unreadable mailing list descriptions (Mojibake) due to wrong charset encodings, should be Unicode - https://phabricator.wikimedia.org/T261031 (Aklapper) Hi, we do **not** need more comments and confirmations that there is a problem. We all know that....
[16:48:56] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:50:36] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:26:28] (CR) Jforrester: "recheck" [dumps/dcat] - https://gerrit.wikimedia.org/r/628103 (owner: L10n-bot)
[18:07:13] Operations, Wikimedia-Mailing-lists: Create a mailing list for frwiki nominators - https://phabricator.wikimedia.org/T265835 (Kvardek_du)
[18:07:32] Operations, Wikimedia-Mailing-lists: Create a mailing list for frwiki nominators - https://phabricator.wikimedia.org/T265835 (Kvardek_du)
[18:16:40] Operations, Wikimedia-Mailing-lists: Create a mailing list for frwiki nominators - https://phabricator.wikimedia.org/T265835 (Kvardek_du)
[18:42:42] Operations, Wikimedia-Mailing-lists: Create a mailing list for frwiki nominators - https://phabricator.wikimedia.org/T265835 (Kvardek_du)
[19:21:42] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:23:24] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:15:52] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job={swagger_check_citoid_cluster_codfw,swagger_check_restbase_esams} site={codfw,esams} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:17:32] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:29:28] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:31:10] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:35:40] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:37:20] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:41:04] Operations, Wikimedia-Mailing-lists: Bot unable to send messages to wikipedia-fr-wikimag - https://phabricator.wikimedia.org/T265844 (Orlodrim)
[23:32:42] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:36:08] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets