[00:51:51] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1974301040 and 83 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:54:05] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 4116057112 and 312 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:54:39] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 894609800 and 181 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:55:13] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 170 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:55:45] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1016 and 202 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:56:21] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1007 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 202288 and 238 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:04:07] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 9049073864 and 595 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:04:45] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2006 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 5694610992 and 332 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:04:51] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 3724200072 and 189 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:05:21] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2009 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 5877000112 and 318 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:09:45] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2006 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1088 and 232 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:09:51] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 105424 and 238 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:12:01] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2009 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 6152 and 368 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:12:29] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 395 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
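Each replication-lag line above carries two numbers after "(host:localhost)". They appear to be the lag in bytes of WAL and the lag in seconds, which is how the check_postgres `hot_standby_delay` action reports it. The sketch below pulls both out of an icinga-wm line so they can be compared. The regular expression, `StandbyDelay` and `parse_standby_delay` are hypothetical helpers written against the lines in this log only, not part of any Icinga or check_postgres tooling.

```python
import re
from typing import NamedTuple, Optional


class StandbyDelay(NamedTuple):
    host: str
    state: str        # Icinga state word, e.g. "CRITICAL" or "OK"
    lag_bytes: int    # first number: assumed to be replication lag in bytes
    lag_seconds: int  # second number: replication lag in seconds


# Hypothetical pattern, matched against the icinga-wm lines in this log only.
_PATTERN = re.compile(
    r"Postgres Replication Lag on (?P<host>\S+) is (?P<state>[A-Z]+): "
    r"POSTGRES_HOT_STANDBY_DELAY \S+ DB \S+ \(host:\S+\) "
    r"(?P<bytes>\d+) and (?P<seconds>\d+) seconds"
)


def parse_standby_delay(line: str) -> Optional[StandbyDelay]:
    """Return the parsed fields, or None if the line is not a lag alert."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return StandbyDelay(m["host"], m["state"], int(m["bytes"]), int(m["seconds"]))


if __name__ == "__main__":
    line = ("PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: "
            "POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 "
            "(host:localhost) 1974301040 and 83 seconds")
    print(parse_standby_delay(line))
```

Keeping both numbers matters when reading this log: for maps1001 the seconds figure was higher at RECOVERY (170) than at PROBLEM (83), while the byte figure dropped to 0.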
[02:51:57] <wikibugs>	 SRE, vm-requests: What is the Scientific Method - https://phabricator.wikimedia.org/T271634 (Davishca)
[02:52:58] <wikibugs>	 SRE, vm-requests: What is the Scientific Method - https://phabricator.wikimedia.org/T271634 (Davishca) What is the Scientific Method? The scientific method is a method used to discover new understandings about the natural world based on making falsifiable predictions (hypotheses), testing them empiricall...
[03:04:58] <wikibugs>	 SRE, LDAP: Create auto-populated LDAP group of those who have production shell access - https://phabricator.wikimedia.org/T271587 (Peachey88)
[07:30:15] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 359 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:31:55] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 13 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
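The exceptions-and-fatals alert text ("359 gt 100", then "(C)100 gt (W)50 gt 13") reads as a critical threshold of 100 and a warning threshold of 50 per minute, each compared with "greater than". A minimal sketch of that classification follows; the function name `classify` is hypothetical, and treating a value exactly equal to a threshold as not tripping it is an assumption taken from the "gt" wording.

```python
def classify(value: float, warning: float = 50.0, critical: float = 100.0) -> str:
    """Map a per-minute error rate to an Icinga-style state.

    Assumption: a value strictly greater than a threshold trips it, matching
    the "gt" wording in the alerts; the 50/100 defaults are read off this log.
    """
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for rate in (359, 13):  # the two values reported at 07:30 and 07:31
        print(rate, classify(rate))
```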
[07:56:49] <wikibugs>	 SRE, vm-requests: What is the Scientific Method - https://phabricator.wikimedia.org/T271634 (DannyS712) @Aklapper or another phab admin, can you please close this? Or reset the edit policy at least? Thanks
[09:41:56] <icinga-wm>	 PROBLEM - Check nf_conntrack usage in neutron netns on cloudnet1004 is CRITICAL: CRITICAL: nf_conntrack usage over 80% in netns qrouter-d93771ba-2711-4f88-804a-8df6fd03978a https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:43:37] <icinga-wm>	 RECOVERY - Check nf_conntrack usage in neutron netns on cloudnet1004 is OK: OK: everything is apparently fine https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
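The cloudnet1004 alert fires when connection-tracking usage in a router namespace goes over 80%. On Linux that usage is the ratio of `nf_conntrack_count` to `nf_conntrack_max` under `/proc/sys/net/netfilter`. The sketch below computes it; the helper names are hypothetical, and it assumes the files are read from inside the relevant namespace (for example via `ip netns exec`). This is not the actual Icinga check.

```python
from pathlib import Path
from typing import Tuple

# Standard Linux sysctl location for connection-tracking counters. Reading it
# from inside a network namespace is assumed to give that namespace's count.
PROC_DIR = Path("/proc/sys/net/netfilter")


def read_conntrack(proc_dir: Path = PROC_DIR) -> Tuple[int, int]:
    """Return (current entries, maximum entries) from the sysctl files."""
    count = int((proc_dir / "nf_conntrack_count").read_text())
    maximum = int((proc_dir / "nf_conntrack_max").read_text())
    return count, maximum


def conntrack_usage_pct(count: int, maximum: int) -> float:
    """Percentage of the conntrack table currently in use."""
    return 100.0 * count / maximum


def is_critical(count: int, maximum: int, threshold_pct: float = 80.0) -> bool:
    # The 80% figure comes from the alert text; strict ">" is an assumption.
    return conntrack_usage_pct(count, maximum) > threshold_pct
```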
[09:53:41] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:55:23] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:57:01] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:00:21] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:33:26] <wikibugs>	 SRE, Graphoid, serviceops, MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), and 2 others: Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (Aklapper)
[13:43:37] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1042 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:55:19] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ms-be1042 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[14:10:19] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1042 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:25:53] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ms-be1042 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
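The ferm check on ms-be1042 decides whether ferm is active by looking at the default policy of the INPUT chain ("ferm input drop default policy not set"). A hypothetical re-creation of that idea parses the header line that `iptables -L INPUT -n` prints, such as `Chain INPUT (policy DROP)`. The function names and the "DROP means active" rule are assumptions drawn from the alert wording, not the real check_ferm script.

```python
import re
import subprocess
from typing import Optional

# Header printed by `iptables -L INPUT -n`, e.g. "Chain INPUT (policy DROP)".
# With -v the parentheses also contain packet/byte counters; the regex allows that.
_POLICY = re.compile(r"^Chain INPUT \(policy (\w+)")


def input_policy(listing: str) -> Optional[str]:
    """Return the INPUT chain's default policy from an iptables listing."""
    for line in listing.splitlines():
        m = _POLICY.match(line)
        if m:
            return m.group(1)
    return None


def ferm_looks_active(listing: str) -> bool:
    # Assumption: ferm is considered loaded when INPUT defaults to DROP.
    return input_policy(listing) == "DROP"


def current_listing() -> str:
    """Run the real iptables binary (needs root) and return its output."""
    return subprocess.run(["iptables", "-L", "INPUT", "-n"],
                          capture_output=True, text=True, check=True).stdout
```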
[16:19:09] <wikibugs>	 (PS1) Andrew Bogott: OpenStack haproxy: change http service health check interval to 3s [puppet] - https://gerrit.wikimedia.org/r/655275 (https://phabricator.wikimedia.org/T271647)
[16:22:32] <wikibugs>	 (CR) Andrew Bogott: [C: +2] OpenStack haproxy: change http service health check interval to 3s [puppet] - https://gerrit.wikimedia.org/r/655275 (https://phabricator.wikimedia.org/T271647) (owner: Andrew Bogott)
[16:57:40] <wikibugs>	 (PS1) Andrew Bogott: OpenStack rabbitmq: set busy wait threshold to 'none' [puppet] - https://gerrit.wikimedia.org/r/655277 (https://phabricator.wikimedia.org/T271647)
[16:59:06] <wikibugs>	 (CR) jerkins-bot: [V: -1] OpenStack rabbitmq: set busy wait threshold to 'none' [puppet] - https://gerrit.wikimedia.org/r/655277 (https://phabricator.wikimedia.org/T271647) (owner: Andrew Bogott)
[17:01:33] <wikibugs>	 (PS2) Andrew Bogott: OpenStack rabbitmq: set busy wait threshold to 'none' [puppet] - https://gerrit.wikimedia.org/r/655277 (https://phabricator.wikimedia.org/T271647)
[17:09:04] <wikibugs>	 (CR) Andrew Bogott: [C: +2] OpenStack rabbitmq: set busy wait threshold to 'none' [puppet] - https://gerrit.wikimedia.org/r/655277 (https://phabricator.wikimedia.org/T271647) (owner: Andrew Bogott)
[17:14:00] <wikibugs>	 (PS1) Andrew Bogott: When changing rabbitmq-env.conf, notify rabbit service [puppet] - https://gerrit.wikimedia.org/r/655278 (https://phabricator.wikimedia.org/T271647)
[17:15:26] <wikibugs>	 (CR) jerkins-bot: [V: -1] When changing rabbitmq-env.conf, notify rabbit service [puppet] - https://gerrit.wikimedia.org/r/655278 (https://phabricator.wikimedia.org/T271647) (owner: Andrew Bogott)
[17:15:58] <wikibugs>	 (PS2) Andrew Bogott: When changing rabbitmq-env.conf, notify rabbit service [puppet] - https://gerrit.wikimedia.org/r/655278 (https://phabricator.wikimedia.org/T271647)
[17:17:48] <wikibugs>	 (CR) Andrew Bogott: [C: +2] When changing rabbitmq-env.conf, notify rabbit service [puppet] - https://gerrit.wikimedia.org/r/655278 (https://phabricator.wikimedia.org/T271647) (owner: Andrew Bogott)
[17:51:49] <wikibugs>	 (PS1) Majavah: Revert "Switch fiwiki to their 500k temporary logo!" [mediawiki-config] - https://gerrit.wikimedia.org/r/655281
[17:51:51] <wikibugs>	 (PS1) Majavah: Revert "Add fiwiki 500k temporary logos" [mediawiki-config] - https://gerrit.wikimedia.org/r/655282
[18:49:41] <wikibugs>	 (PS5) Jforrester: wgAbuseFilterAflFilterMigrationStage: Make WRITE_BOTH everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/647116 (https://phabricator.wikimedia.org/T269712)
[18:52:35] <wikibugs>	 (PS3) Jforrester: wgAbuseFilterAflFilterMigrationStage: Make READ_NEW in Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/647117 (https://phabricator.wikimedia.org/T269712)
[18:52:41] <wikibugs>	 (PS3) Jforrester: wgAbuseFilterAflFilterMigrationStage: Make COMPAT_NEW in Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/647118 (https://phabricator.wikimedia.org/T269712)
[20:18:57] <icinga-wm>	 PROBLEM - SSH on logstash1008 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[20:19:09] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9200 on logstash1008 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(requests.packages.urllib3.connection.HTTPConnection object at 0x7f8e40720518: Failed to establish a new connection: [Errno 111] Connection
[20:19:09] <icinga-wm>	 ://wikitech.wikimedia.org/wiki/Search%23Administration
[20:20:23] <icinga-wm>	 PROBLEM - Check systemd state on logstash1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:20:27] <icinga-wm>	 RECOVERY - SSH on logstash1008 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[20:45:27] <icinga-wm>	 RECOVERY - Check systemd state on logstash1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:45:55] <icinga-wm>	 RECOVERY - ElasticSearch health check for shards on 9200 on logstash1008 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, active_shards: 916, delayed_unassigned_shards: 0, initializing_shards: 0, cluster_name: production-logstash-eqiad, relocating_shards: 0, number_of_nodes: 6, unassigned_shards: 0, timed_out: False, active_shards_percent_as_number: 100.0, number_of_in_flight_fetch: 0, number_of_data_nod
[20:45:55] <icinga-wm>	 aiting_in_queue_millis: 0, number_of_pending_tasks: 0, active_primary_shards: 483 https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:45:46] <wikibugs>	 SRE, Release-Engineering-Team-TODO, serviceops, Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Reedy) Is there a task (or should the be it?) for actually swapping from PHP 7.2 to... newer PHP (7.3 or whatever)?...
[22:19:50] <wikibugs>	 (PS1) Urbanecm: Enable anniversary logo for cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/655292 (https://phabricator.wikimedia.org/T271662)
[22:22:38] <wikibugs>	 (PS1) Urbanecm: Set import sources for mrwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/655293 (https://phabricator.wikimedia.org/T270402)