[00:01:02] RECOVERY - Check systemd state on deneb is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:05:46] PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:22:20] PROBLEM - WDQS high update lag on wdqs1004 is CRITICAL: 7.598e+04 ge 4.32e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[00:40:22] PROBLEM - MegaRAID on db1101 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[00:40:23] ACKNOWLEDGEMENT - MegaRAID on db1101 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T270571 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[00:41:21] Operations, ops-eqiad: Degraded RAID on db1101 - https://phabricator.wikimedia.org/T270571 (ops-monitoring-bot)
[00:44:26] PROBLEM - Postgres Replication Lag on maps1005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 846963944 and 261 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:46:00] RECOVERY - Postgres Replication Lag on maps1005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 29784 and 322 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[04:57:57] (PS1) Andrew Bogott: OpenStack haproxy: make logs much, much quieter [puppet] - https://gerrit.wikimedia.org/r/650943 (https://phabricator.wikimedia.org/T270554)
[05:03:57] (CR) Andrew Bogott: [C: +1] "Clearly better, presumably I was leaking this on purpose during dev and never fixed it." [puppet] - https://gerrit.wikimedia.org/r/650542 (https://phabricator.wikimedia.org/T270478) (owner: David Caro)
[05:14:38] (CR) Andrew Bogott: [C: +1] "This uses some Python conventions that are new to me and I like it!" [puppet] - https://gerrit.wikimedia.org/r/650141 (https://phabricator.wikimedia.org/T267195) (owner: David Caro)
[05:18:11] (CR) Andrew Bogott: [C: +1] [wmcs][backups] Add cli see where a project/vm is backed up [puppet] - https://gerrit.wikimedia.org/r/650496 (https://phabricator.wikimedia.org/T267195) (owner: David Caro)
[05:18:55] (CR) Andrew Bogott: [C: +1] [wmcs][backup] Added command to show a project [puppet] - https://gerrit.wikimedia.org/r/650497 (https://phabricator.wikimedia.org/T267195) (owner: David Caro)
[06:00:46] (PS1) Urbanecm: labs: bnwiki: Fix a typo in wgGEHelpPanelLinks config [mediawiki-config] - https://gerrit.wikimedia.org/r/650948 (https://phabricator.wikimedia.org/T270578)
[06:07:48] (PS2) Urbanecm: labs: bnwiki: Fix a typo in wgGEHelpPanelLinks config [mediawiki-config] - https://gerrit.wikimedia.org/r/650948 (https://phabricator.wikimedia.org/T270578)
[06:58:16] Operations, ops-eqiad, DBA: Degraded RAID on db1101 - https://phabricator.wikimedia.org/T270571 (Marostegui) p:Triage→Medium @wiki_willy this host is out of warranty, but do we have some spare disks (used is also ok) in the DC that we can replace this one with? Thanks
[07:10:30] (CR) DannyS712: [C: +1] labs: bnwiki: Fix a typo in wgGEHelpPanelLinks config [mediawiki-config] - https://gerrit.wikimedia.org/r/650948 (https://phabricator.wikimedia.org/T270578) (owner: Urbanecm)
[10:05:18] PROBLEM - Blazegraph Port for wdqs-blazegraph on wdqs1006 is CRITICAL: connect to address 127.0.0.1 and port 9999: Connection refused https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[10:06:50] RECOVERY - Blazegraph Port for wdqs-blazegraph on wdqs1006 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9999 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[13:15:22] RECOVERY - WDQS high update lag on wdqs1004 is OK: (C)4.32e+04 ge (W)2.16e+04 ge 2.154e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[14:29:36] PROBLEM - Long running screen/tmux on maps1010 is CRITICAL: CRIT: Long running SCREEN process. (user: root PID: 267261, 1741012s 1728000s). https://wikitech.wikimedia.org/wiki/Monitoring/Long_running_screens
[16:15:04] (CR) Elukey: Port IRCSocketHandler from Spickerack and create irc_utils.py (2 comments) [software/pywmflib] - https://gerrit.wikimedia.org/r/650546 (https://phabricator.wikimedia.org/T257905) (owner: Elukey)
[16:57:13] (PS1) Urbanecm: Grant oathauth-disable-for-user and oathauth-verify-user to wmf-supportsafety at Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/650988 (https://phabricator.wikimedia.org/T180896)
[17:10:46] (CR) Ladsgroup: [C: -1] Grant oathauth-disable-for-user and oathauth-verify-user to wmf-supportsafety at Meta (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/650988 (https://phabricator.wikimedia.org/T180896) (owner: Urbanecm)
[18:24:24] PROBLEM - Long running screen/tmux on maps1001 is CRITICAL: CRIT: Long running SCREEN process. (user: root PID: 16699, 1738974s 1728000s). https://wikitech.wikimedia.org/wiki/Monitoring/Long_running_screens
[20:20:25] (PS1) Ladsgroup: druid: Migrate hiera() to lookup() and setting datatype in middlemanager [puppet] - https://gerrit.wikimedia.org/r/650993 (https://phabricator.wikimedia.org/T209953)
[20:24:20] (CR) Ladsgroup: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/27221/" [puppet] - https://gerrit.wikimedia.org/r/650993 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
[22:35:19] (PS2) Urbanecm: Grant oathauth-disable-for-user and oathauth-verify-user to wmf-supportsafety at Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/650988 (https://phabricator.wikimedia.org/T180896)
[22:36:05] (CR) Urbanecm: Grant oathauth-disable-for-user and oathauth-verify-user to wmf-supportsafety at Meta (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/650988 (https://phabricator.wikimedia.org/T180896) (owner: Urbanecm)
[22:37:05] (PS1) Urbanecm: metawiki: Grant oathauth-view-log to stewards [mediawiki-config] - https://gerrit.wikimedia.org/r/651002
[23:13:36] (CR) DannyS712: "if we are going to be (eventually) allowing resets on other wikis, stewards will need it in the global group, so while it should also be a" [mediawiki-config] - https://gerrit.wikimedia.org/r/651002 (owner: Urbanecm)
[23:47:59] (CR) Urbanecm: "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/651002 (owner: Urbanecm)