[00:17:17] RECOVERY - MariaDB Slave Lag: s3 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 291.73 seconds
[00:30:45] PROBLEM - Memory correctable errors -EDAC- on thumbor1004 is CRITICAL: 5.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad+prometheus/ops
[00:50:34] PROBLEM - LVS HTTP IPv4 on wdqs.svc.eqiad.wmnet is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 387 bytes in 0.002 second response time
[00:53:22] anyone looking at the WDQS pages?
[00:54:16] RECOVERY - LVS HTTP IPv4 on wdqs.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.055 second response time
[00:56:34] I think wdqs1004 is hung somehow?
[01:01:34] PROBLEM - LVS HTTP IPv4 on wdqs.svc.eqiad.wmnet is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 387 bytes in 0.002 second response time
[01:02:48] RECOVERY - LVS HTTP IPv4 on wdqs.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.005 second response time
[01:06:23] RECOVERY - WDQS HTTP Port on wdqs1004 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.030 second response time
[01:07:10] !log cdanis@wdqs1004.eqiad.wmnet /var/log/wdqs % sudo service wdqs-blazegraph restart
[01:07:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:08:42] gehel: SMalyshev: the blazegraph daemon on wdqs1004 seemed to be throttling all requests and not doing useful work? Not sure why restarting it fixed anything, but it seems to have.
[01:09:08] gehel: SMalyshev: I'm also bothered that one server being on the fritz paged
[01:09:15] PROBLEM - Memory correctable errors -EDAC- on db1068 is CRITICAL: 4.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=db1068&var-datasource=eqiad+prometheus/ops
[02:45:50] cdanis: looks like something caused a thread spike there... but now it's ok
[02:46:13] I know nothing about paging so can't say anything about that
[02:46:18] SMalyshev: yeah, not sure what happened, didn't see anything obvious in a cursory look at the logs, but restarting it by hand fixed it
[02:46:36] ok then
[02:46:39] ideally it wouldn't be the case that one wedged server was causing requests to the service IP to fail
[02:46:41] thanks for restarting it
[02:47:19] yeah, it's weird that the service endpoint failed - it should have just been depooled, no?
[02:48:42] cdanis: maybe worth creating a task to check it... 1004 should alert but svc should have stayed fine
[02:49:09] yeah, I will create a task on Tuesday
[02:49:16] cool, thanks
[03:33:53] PROBLEM - puppet last run on mw2162 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz],File[/usr/share/GeoIP/GeoIP2-City.mmdb.test]
[03:38:52] Operations, Traffic, HTTPS: en.wikipedia.com [sic] serves an invalid certificate - https://phabricator.wikimedia.org/T214253 (Krenair) I think wikipedia.com is a junk redirect domain which makes this another case of {T133548}
[03:39:38] Operations, Traffic, HTTPS: en.wikipedia.com [sic] serves an invalid certificate - https://phabricator.wikimedia.org/T214253 (Krenair) > https://www.wikipedia.com works fine.
Nope: {F27953553}
[03:40:18] Operations, Traffic, HTTPS: en.wikipedia.com [sic] serves an invalid certificate - https://phabricator.wikimedia.org/T214253 (Krenair)
[03:40:22] Operations, Traffic, HTTPS, Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548 (Krenair)
[03:42:09] Operations, Domains, Traffic, Wikimedia-Apache-configuration: en-wp.org certificate error - https://phabricator.wikimedia.org/T190244 (Krenair)
[03:42:11] Operations, Traffic, HTTPS, Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548 (Krenair)
[03:42:55] Operations, Domains, Traffic, Wikimedia-Apache-configuration: en-wp.org certificate error - https://phabricator.wikimedia.org/T190244 (Krenair)
[03:42:58] Operations, Traffic, HTTPS, Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548 (Krenair)
[03:53:23] RECOVERY - MegaRAID on dbstore1002 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[04:05:17] RECOVERY - puppet last run on mw2162 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[04:21:22] Operations, Wikimedia-Mailing-lists: Adminship of MediaWiki-India Mailing List - https://phabricator.wikimedia.org/T212957 (Jayprakash12345) 11 Days has passed, There is no objection. now should we go ahead?
[04:54:18] (PS3) BryanDavis: toolforge: Prometheus replacement for sge.py diamond collector [puppet] - https://gerrit.wikimedia.org/r/485372 (https://phabricator.wikimedia.org/T211684)
[07:44:16] Operations, ops-eqiad, Analytics, Product-Analytics: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T206965 (Marostegui) RAID is back to optimal! @Cmjohnson was this disk replaced then? ` root@dbstore1002:~# megacli -LDInfo -lALL -aALL Adapter 0 -- Virtual Drive Information:...
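A note on the wdqs1004 discussion above: when the LVS service IP (wdqs.svc.eqiad.wmnet) flaps while only one backend is wedged, probing the VIP and each backend directly makes it obvious which host pybal should have depooled. A minimal sketch, assuming the backends answer plain HTTP on the same port the "WDQS HTTP Port" check uses; the host list and the confctl invocation are illustrative, not taken from this log:

    # Probe each backend directly (host list is an assumption, adjust to the real pool).
    for host in wdqs1004 wdqs1005 wdqs1006; do
      printf '%s: ' "$host"
      curl -s -o /dev/null --connect-timeout 2 --max-time 5 \
           -w '%{http_code} in %{time_total}s\n' "http://${host}.eqiad.wmnet/"
    done
    # Compare with the service VIP that LVS/pybal fronts:
    curl -s -o /dev/null -w 'VIP: %{http_code} in %{time_total}s\n' http://wdqs.svc.eqiad.wmnet/
    # Depooling a single bad backend is normally a conftool operation, roughly:
    #   confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no
    # (syntax from memory; verify against the conftool documentation before running)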
[07:58:19] PROBLEM - MariaDB Slave Lag: s2 on db2035 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.44 seconds
[07:58:49] PROBLEM - MariaDB Slave Lag: s2 on db2088 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 307.42 seconds
[07:59:03] PROBLEM - MariaDB Slave Lag: s2 on db2056 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.19 seconds
[07:59:07] PROBLEM - MariaDB Slave Lag: s2 on db2063 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 310.03 seconds
[07:59:07] PROBLEM - MariaDB Slave Lag: s2 on db2041 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 310.04 seconds
[07:59:15] PROBLEM - MariaDB Slave Lag: s2 on db2049 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.11 seconds
[07:59:21] PROBLEM - MariaDB Slave Lag: s2 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.21 seconds
[07:59:23] PROBLEM - MariaDB Slave Lag: s2 on db2091 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.96 seconds
[09:00:21] RECOVERY - MariaDB Slave Lag: s2 on db2088 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[09:00:37] RECOVERY - MariaDB Slave Lag: s2 on db2056 is OK: OK slave_sql_lag Replication lag: 0.49 seconds
[09:00:41] RECOVERY - MariaDB Slave Lag: s2 on db2063 is OK: OK slave_sql_lag Replication lag: 0.42 seconds
[09:00:41] RECOVERY - MariaDB Slave Lag: s2 on db2041 is OK: OK slave_sql_lag Replication lag: 0.43 seconds
[09:00:49] RECOVERY - MariaDB Slave Lag: s2 on db2049 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[09:00:53] RECOVERY - MariaDB Slave Lag: s2 on db2095 is OK: OK slave_sql_lag Replication lag: 0.07 seconds
[09:00:55] RECOVERY - MariaDB Slave Lag: s2 on db2091 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[09:01:05] RECOVERY - MariaDB Slave Lag: s2 on db2035 is OK: OK slave_sql_lag Replication lag: 0.24 seconds
[09:09:39] RECOVERY - MariaDB Slave Lag: s4 on db2051 is OK: OK slave_sql_lag Replication lag: 14.91 seconds
[09:09:49] RECOVERY - MariaDB Slave Lag: s4 on db2058 is OK: OK slave_sql_lag Replication lag: 0.05 seconds
[09:09:59] RECOVERY - MariaDB Slave Lag: s4 on db2091 is OK: OK slave_sql_lag Replication lag: 0.25 seconds
[09:10:01] RECOVERY - MariaDB Slave Lag: s4 on db2095 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[09:10:11] RECOVERY - MariaDB Slave Lag: s4 on db2073 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[09:10:11] RECOVERY - MariaDB Slave Lag: s4 on db2065 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[09:10:21] RECOVERY - MariaDB Slave Lag: s4 on db2090 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[09:10:27] RECOVERY - MariaDB Slave Lag: s4 on db2084 is OK: OK slave_sql_lag Replication lag: 0.34 seconds
[09:33:49] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 199879.93 seconds
[09:34:01] PROBLEM - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 14664.97 seconds
[09:38:26] Operations, MediaWiki Language Extension Bundle, MediaWiki-Cache, Language-Team (Language-2019-January-March), and 5 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (elukey) >>! In T203786#48803...
[10:35:24] (CR) Giuseppe Lavagetto: [C: -2] "I don't think this is a good idea." [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/484547 (https://phabricator.wikimedia.org/T183546) (owner: Hashar)
[11:25:21] PROBLEM - MariaDB Slave Lag: s2 on db2088 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.41 seconds
[11:26:09] PROBLEM - MariaDB Slave Lag: s2 on db2035 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.67 seconds
[11:26:51] PROBLEM - MariaDB Slave Lag: s2 on db2056 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.93 seconds
[11:26:55] PROBLEM - MariaDB Slave Lag: s2 on db2041 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.71 seconds
[11:26:55] PROBLEM - MariaDB Slave Lag: s2 on db2063 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.66 seconds
[11:27:05] PROBLEM - MariaDB Slave Lag: s2 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 308.87 seconds
[11:27:07] PROBLEM - MariaDB Slave Lag: s2 on db2049 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 308.73 seconds
[11:27:11] PROBLEM - MariaDB Slave Lag: s2 on db2091 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 308.66 seconds
[13:11:16] Operations, ops-codfw, cloud-services-team (Kanban): labstore2004 - memory error on DIMM A2 - https://phabricator.wikimedia.org/T214262 (GTirloni)
[13:18:29] RECOVERY - MariaDB Slave Lag: s4 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 204.10 seconds
[13:19:31] PROBLEM - MariaDB Slave Lag: s2 on db2035 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.64 seconds
[13:21:57] PROBLEM - MariaDB Slave Lag: s2 on db2035 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.13 seconds
[13:22:21] PROBLEM - MariaDB Slave Lag: s2 on db2088 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.36 seconds
[13:22:41] PROBLEM - MariaDB Slave Lag: s2 on db2056 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.69 seconds
[13:22:43] PROBLEM - MariaDB Slave Lag: s2 on db2041 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.66 seconds
[13:22:45] PROBLEM - MariaDB Slave Lag: s2 on db2063 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.91 seconds
[13:22:49] PROBLEM - MariaDB Slave Lag: s2 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 303.08 seconds
[13:22:57] PROBLEM - MariaDB Slave Lag: s2 on db2049 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.45 seconds
[13:22:57] PROBLEM - MariaDB Slave Lag: s2 on db2091 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.63 seconds
[13:54:29] (PS1) Zoranzoki21: Merge the "extended-uploader" and "autopatrolled" user groups on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/485487 (https://phabricator.wikimedia.org/T214003)
[13:58:29] Operations, Traffic, HTTPS: en.wikipedia.com [sic] serves an invalid certificate - https://phabricator.wikimedia.org/T214253 (GKFX)
[14:20:30] (PS1) Zoranzoki21: Add few domains at $wgCopyUploadsDomains and cleanup inline comments [mediawiki-config] - https://gerrit.wikimedia.org/r/485489 (https://phabricator.wikimedia.org/T213961)
[15:02:01] (PS1) Zoranzoki21: Change $wgUploadNavigationUrl for the Persian (fa) Wikisource to Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/485490 (https://phabricator.wikimedia.org/T214048)
[15:05:35] (PS1) Zoranzoki21: Assign "suppressredirect" to rollbacker on newiki [mediawiki-config] - https://gerrit.wikimedia.org/r/485491 (https://phabricator.wikimedia.org/T214012)
[15:13:17] !log Force WriteBack on db2040 - T214264
[15:13:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:21] T214264: BBU issues on codfw - https://phabricator.wikimedia.org/T214264
[16:06:14] (PS1) Urbanecm: Remove ability for bureaucrats on outreachwiki to remove bureaucrat flag [mediawiki-config] - https://gerrit.wikimedia.org/r/485493 (https://phabricator.wikimedia.org/T214133)
[16:13:27] (CR) Urbanecm: [C: +1] "LGTM!" [mediawiki-config] - https://gerrit.wikimedia.org/r/485491 (https://phabricator.wikimedia.org/T214012) (owner: Zoranzoki21)
[16:15:39] (CR) Urbanecm: [C: +1] "LGTM!" [mediawiki-config] - https://gerrit.wikimedia.org/r/485489 (https://phabricator.wikimedia.org/T213961) (owner: Zoranzoki21)
[16:17:34] (PS11) Urbanecm: Upload HD logos for several projects [mediawiki-config] - https://gerrit.wikimedia.org/r/478498 (owner: Robingan7)
[16:18:42] (PS12) Urbanecm: Upload HD logos for several projects [mediawiki-config] - https://gerrit.wikimedia.org/r/478498 (owner: Robingan7)
[16:24:22] (PS1) Urbanecm: Upload HD logos for several projects [mediawiki-config] - https://gerrit.wikimedia.org/r/485494 (https://phabricator.wikimedia.org/T150618)
[16:25:03] (CR) Urbanecm: [C: -1] "Since I had to completely redo all the files, I've uploaded a new version at https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/" [mediawiki-config] - https://gerrit.wikimedia.org/r/478498 (owner: Robingan7)
[16:28:35] (PS1) Urbanecm: Use new logos in IS.php [mediawiki-config] - https://gerrit.wikimedia.org/r/485495 (https://phabricator.wikimedia.org/T150618)
[16:28:56] (CR) Urbanecm: [C: -1] "Since I redid this patch, I've uploaded a new version at https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/485495. This sho" [mediawiki-config] - https://gerrit.wikimedia.org/r/478570 (owner: Robingan7)
[16:29:27] (CR) jerkins-bot: [V: -1] Use new logos in IS.php [mediawiki-config] - https://gerrit.wikimedia.org/r/485495 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[16:37:55] PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 305.06 seconds
[16:37:59] PROBLEM - MariaDB Slave Lag: s4 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.19 seconds
[16:38:05] PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.81 seconds
[16:38:09] PROBLEM - MariaDB Slave Lag: s4 on db2091 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 307.33 seconds
[16:38:21] PROBLEM - MariaDB Slave Lag: s4 on db2073 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.26 seconds
[16:38:25] PROBLEM - MariaDB Slave Lag: s4 on db2065 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.33 seconds
[16:38:31] PROBLEM - MariaDB Slave Lag: s4 on db2090 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.12 seconds
[16:38:39] PROBLEM - MariaDB Slave Lag: s4 on db2084 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 312.82 seconds
[16:49:32] (PS1) Giuseppe Lavagetto: Add a prune action [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/485499 (https://phabricator.wikimedia.org/T207703)
[16:59:37] PROBLEM - MariaDB Slave Lag: s6 on db2039 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.25 seconds
[16:59:43] PROBLEM - MariaDB Slave Lag: s6 on db2046 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.65 seconds
[16:59:57] PROBLEM - MariaDB Slave Lag: s6 on db2076 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 303.48 seconds
[17:00:05] PROBLEM - MariaDB Slave Lag: s6 on db2087 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 305.33 seconds
[17:00:05] PROBLEM - MariaDB Slave Lag: s6 on db2053 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 305.37 seconds
[17:00:09] PROBLEM - MariaDB Slave Lag: s6 on db2060 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 305.25 seconds
[17:00:29] PROBLEM - MariaDB Slave Lag: s6 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.23 seconds
[17:00:37] PROBLEM - MariaDB Slave Lag: s6 on db2089 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 316.79 seconds
[17:00:45] PROBLEM - MariaDB Slave Lag: s6 on db2067 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 323.56 seconds
[17:19:19] PROBLEM - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 601.90 seconds
[17:21:25] Operations, Wikimedia-Mailing-lists: Reset list admin password for Wikies-l mailing list - https://phabricator.wikimedia.org/T214249 (MarcoAurelio) I think that while we're at it, another list administrator could be added so there's always somebody able to moderate the list.
[18:29:29] RECOVERY - MariaDB Slave Lag: s2 on db2088 is OK: OK slave_sql_lag Replication lag: 57.59 seconds
[18:29:51] RECOVERY - MariaDB Slave Lag: s2 on db2095 is OK: OK slave_sql_lag Replication lag: 49.38 seconds
[18:29:55] RECOVERY - MariaDB Slave Lag: s2 on db2056 is OK: OK slave_sql_lag Replication lag: 49.64 seconds
[18:29:57] RECOVERY - MariaDB Slave Lag: s2 on db2041 is OK: OK slave_sql_lag Replication lag: 48.50 seconds
[18:30:01] RECOVERY - MariaDB Slave Lag: s2 on db2063 is OK: OK slave_sql_lag Replication lag: 46.43 seconds
[18:30:03] RECOVERY - MariaDB Slave Lag: s2 on db2091 is OK: OK slave_sql_lag Replication lag: 46.92 seconds
[18:30:09] RECOVERY - MariaDB Slave Lag: s2 on db2049 is OK: OK slave_sql_lag Replication lag: 45.89 seconds
[18:30:27] RECOVERY - MariaDB Slave Lag: s2 on db2035 is OK: OK slave_sql_lag Replication lag: 40.67 seconds
[19:34:59] RECOVERY - MariaDB Slave Lag: s6 on db2067 is OK: OK slave_sql_lag Replication lag: 59.24 seconds
[19:35:07] RECOVERY - MariaDB Slave Lag: s6 on db2039 is OK: OK slave_sql_lag Replication lag: 41.47 seconds
[19:35:13] RECOVERY - MariaDB Slave Lag: s6 on db2046 is OK: OK slave_sql_lag Replication lag: 24.56 seconds
[19:35:21] RECOVERY - MariaDB Slave Lag: s6 on db2076 is OK: OK slave_sql_lag Replication lag: 7.10 seconds
[19:35:31] RECOVERY - MariaDB Slave Lag: s6 on db2087 is OK: OK slave_sql_lag Replication lag: 0.24 seconds
[19:35:35] RECOVERY - MariaDB Slave Lag: s6 on db2053 is OK: OK slave_sql_lag Replication lag: 0.02 seconds
[19:35:37] RECOVERY - MariaDB Slave Lag: s6 on db2060 is OK: OK slave_sql_lag Replication lag: 0.35 seconds
[19:35:51] RECOVERY - MariaDB Slave Lag: s6 on db2095 is OK: OK slave_sql_lag Replication lag: 0.25 seconds
[19:36:05] RECOVERY - MariaDB Slave Lag: s6 on db2089 is OK: OK slave_sql_lag Replication lag: 1.28 seconds
[19:42:31] RECOVERY - MariaDB Slave Lag: s7 on db2087 is OK: OK slave_sql_lag Replication lag: 55.09 seconds
[19:42:37] RECOVERY - MariaDB Slave Lag: s7 on db2095 is OK: OK slave_sql_lag Replication lag: 45.00 seconds
[19:42:49] RECOVERY - MariaDB Slave Lag: s7 on db2077 is OK: OK slave_sql_lag Replication lag: 30.14 seconds
[19:43:05] RECOVERY - MariaDB Slave Lag: s7 on db2068 is OK: OK slave_sql_lag Replication lag: 15.63 seconds
[19:43:11] RECOVERY - MariaDB Slave Lag: s7 on db2047 is OK: OK slave_sql_lag Replication lag: 8.70 seconds
[19:43:11] RECOVERY - MariaDB Slave Lag: s7 on db2054 is OK: OK slave_sql_lag Replication lag: 4.19 seconds
[19:43:13] RECOVERY - MariaDB Slave Lag: s7 on db2040 is OK: OK slave_sql_lag Replication lag: 3.39 seconds
[19:43:15] RECOVERY - MariaDB Slave Lag: s7 on db2086 is OK: OK slave_sql_lag Replication lag: 0.15 seconds
[19:43:35] RECOVERY - MariaDB Slave Lag: s7 on db2061 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[20:21:26] Operations, Wikimedia-Mailing-lists: Adding administrator to mailing list for Wikimedia New Zealand - https://phabricator.wikimedia.org/T214271 (Podzemnik)
[21:43:35] PROBLEM - Memory correctable errors -EDAC- on kafka1023 is CRITICAL: 4.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=kafka1023&var-datasource=eqiad+prometheus/ops
[21:48:45] (CR) Krinkle: [C: +1] Drop the Wikipedia Zero debug log channel [mediawiki-config] - https://gerrit.wikimedia.org/r/482099 (https://phabricator.wikimedia.org/T212865) (owner: Jforrester)
[21:51:32] (CR) Krinkle: "Hm.. while the zero.wp domains for end-users have been rerouted in DNS, the zero.wikimedia.org portal seems to still be accessible:" [mediawiki-config] - https://gerrit.wikimedia.org/r/482101 (https://phabricator.wikimedia.org/T212865) (owner: Jforrester)
[23:06:19] RECOVERY - MariaDB Slave Lag: s4 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 287.70 seconds
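For reference on the recurring slave-lag alerts above: the CRITICAL threshold for slave_sql_lag appears to be roughly 300 seconds, and lag can be spot-checked directly on a replica. A minimal sketch (field names are standard MariaDB ones; which hosts to check and whether sudo is needed are assumptions):

    # On the affected replica, single replication source:
    sudo mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'Seconds_Behind_Master|Slave_IO_Running|Slave_SQL_Running'
    # On a multi-source replica such as dbstore1002 (MariaDB multi-source syntax):
    sudo mysql -e 'SHOW ALL SLAVES STATUS\G' | grep -E 'Connection_name|Seconds_Behind_Master'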