[01:00:29] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[01:01:37] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[01:44:35] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[01:48:09] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[01:56:47] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[01:57:53] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[02:27:37] PROBLEM - puppet last run on authdns2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:58:51] RECOVERY - puppet last run on authdns2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[03:33:39] PROBLEM - puppet last run on analytics1067 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[03:35:59] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[03:37:05] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[03:37:23] PROBLEM - puppet last run on mw2234 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[03:38:13] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 967.98 seconds
[04:03:25] RECOVERY - puppet last run on mw2234 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:04:53] RECOVERY - puppet last run on analytics1067 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[04:45:03] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 278.18 seconds
[05:51:15] PROBLEM - Memory correctable errors -EDAC- on kafka1023 is CRITICAL: 4.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=kafka1023&var-datasource=eqiad+prometheus/ops
[06:28:37] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:28:51] PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.012 second response time
[06:30:39] PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apache-status]
[06:37:07] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational
[06:37:23] RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.552 second response time
[06:56:37] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[07:09:29] (PS1) Revi: [WIP] Change links of wgGEHelpPanelLinks for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/483996 (https://phabricator.wikimedia.org/T209467)
[07:10:16] (PS2) Revi: [WIP] Change links of wgGEHelpPanelLinks for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/483996 (https://phabricator.wikimedia.org/T209467)
[07:19:33] (PS4) Revi: Add SPF record for wikidata.org [dns] - https://gerrit.wikimedia.org/r/477034 (https://phabricator.wikimedia.org/T210134)
[07:20:17] can ^ be SWAT'ed?
[07:20:26] I mean puppet swat?
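The overnight Citoid LVS alerts above flap between PROBLEM and RECOVERY within a few minutes each time. A minimal Python sketch for measuring how long each alert stayed open, assuming the one-message-per-line format of this log and that all timestamps fall on the same day; `ALERT_RE` and `alert_durations` are hypothetical helpers, not part of any Wikimedia tooling:

```python
import re
from datetime import datetime, timedelta

# Matches Icinga-bot lines such as
# "[01:00:29] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: ..."
# Assumption: the check name is everything between "- " and the first " is <STATE>".
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d{2}:\d{2}:\d{2})\] (?P<kind>PROBLEM|RECOVERY) - "
    r"(?P<check>.+?) is (?:CRITICAL|WARNING|OK)"
)


def alert_durations(lines):
    """Return (check, start, duration) for every PROBLEM later closed by a RECOVERY.

    Hypothetical helper: assumes all timestamps fall on the same day.
    """
    open_alerts = {}  # check name -> timestamp of the unresolved PROBLEM
    results = []
    for line in lines:
        m = ALERT_RE.match(line)
        if not m:
            continue  # skip gerrit-bot, Phabricator and human messages
        ts = datetime.strptime(m["ts"], "%H:%M:%S")
        check = m["check"]
        if m["kind"] == "PROBLEM":
            # Keep the earliest PROBLEM if the bot repeats it before recovery.
            open_alerts.setdefault(check, ts)
        elif check in open_alerts:
            start = open_alerts.pop(check)
            results.append((check, start.strftime("%H:%M:%S"), ts - start))
    return results


sample = [
    "[01:00:29] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received",
    "[01:01:37] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy",
    "[01:44:35] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received",
    "[01:48:09] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy",
]

for check, start, duration in alert_durations(sample):
    print(check, start, duration)
# Citoid LVS eqiad on citoid.svc.eqiad.wmnet 01:00:29 0:01:08
# Citoid LVS eqiad on citoid.svc.eqiad.wmnet 01:44:35 0:03:34
```

Keying open alerts by check name lets overlapping alerts on different hosts (for example the two puppet failures around 03:35) be paired independently.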
[07:31:51] PROBLEM - HHVM rendering on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:32:43] Operations, DNS, Mail, Traffic, and 3 others: wikidata.org lacks SPF record - https://phabricator.wikimedia.org/T210134 (Peachey88)
[07:32:45] Operations, Mail, Patch-For-Review: SPF record for canonical domains - https://phabricator.wikimedia.org/T193408 (Peachey88)
[07:32:55] RECOVERY - HHVM rendering on mw1229 is OK: HTTP OK: HTTP/1.1 200 OK - 74987 bytes in 0.110 second response time
[09:24:03] (CR) Luca Mauri: [C: +1] gerrit: Add colour to PolyGerrit header and update the theme slightly [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox)
[09:33:14] (PS1) Alexandros Kosiaris: Don't page for zotero [puppet] - https://gerrit.wikimedia.org/r/483999
[11:07:53] (PS5) MarcoAurelio: [WIP] mediawiki: Stop logging each run of purge_abusefilter.pp [puppet] - https://gerrit.wikimedia.org/r/483876 (https://phabricator.wikimedia.org/T213591)
[15:35:01] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[15:36:07] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[15:58:40] (PS1) Hoo man: Fix README typo, clarify variable naming [dumps/dcat] - https://gerrit.wikimedia.org/r/484011
[16:22:07] (CR) ArielGlenn: Fix README typo, clarify variable naming (1 comment) [dumps/dcat] - https://gerrit.wikimedia.org/r/484011 (owner: Hoo man)
[16:33:57] !log Updated operations/dumps/dcat (559dee37452..a86285f4e7) on snapshot1008
[16:33:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:16] (CR) Hoo man: Fix README typo, clarify variable naming (1 comment) [dumps/dcat] - https://gerrit.wikimedia.org/r/484011 (owner: Hoo man)
[17:00:51] (CR) Andrew Bogott: [C: +2] wmcs: add a script to update VPS proxies [puppet] - https://gerrit.wikimedia.org/r/483902 (https://phabricator.wikimedia.org/T213540) (owner: Andrew Bogott)
[17:06:47] PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:13:38] andrewbogott ^^ (puppet error may be related to your change)
[17:13:57] paladox: I doubt it but I'll check
[17:14:20] ok
[17:17:11] RECOVERY - puppet last run on puppetmaster1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:36:23] (PS1) Urbanecm: Create extra namespace in kawiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/484021 (https://phabricator.wikimedia.org/T212956)
[18:28:52] (PS1) Andrew Bogott: Update IPs for the new VPS proxy host [puppet] - https://gerrit.wikimedia.org/r/484024 (https://phabricator.wikimedia.org/T213540)
[18:31:09] (CR) Andrew Bogott: [C: +2] Update IPs for the new VPS proxy host [puppet] - https://gerrit.wikimedia.org/r/484024 (https://phabricator.wikimedia.org/T213540) (owner: Andrew Bogott)
[19:29:14] Operations, DBA: correctable memory errors db1068 (commons primary master database - https://phabricator.wikimedia.org/T213664 (Paladox)
[19:30:16] Operations, DBA: correctable memory errors db1068 (commons primary master database) - https://phabricator.wikimedia.org/T213664 (jcrespo)
[19:31:12] Operations, ops-eqiad, DBA: correctable memory errors db1068 (commons primary master database) - https://phabricator.wikimedia.org/T213664 (Reedy)
[20:09:26] Operations, ops-eqiad, DBA: correctable memory errors db1068 (commons primary master database) - https://phabricator.wikimedia.org/T213664 (Marostegui) Those have showed up before and normally get corrected by themselves after a few days
[20:20:22] Operations, DBA: correctable memory errors db1068 (commons primary master database) - https://phabricator.wikimedia.org/T213664 (Marostegui) Removing ops-eqiad tag as there is no action needed from Chris as this point.
[20:30:48] Operations, ops-eqiad, DBA, Patch-For-Review, User-Marostegui: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (Marostegui)
[21:39:02] (PS6) Mathew.onipe: Elasticsearch failed shard allocation check [puppet] - https://gerrit.wikimedia.org/r/482297 (https://phabricator.wikimedia.org/T212850)
[21:39:38] (CR) Mathew.onipe: Elasticsearch failed shard allocation check (7 comments) [puppet] - https://gerrit.wikimedia.org/r/482297 (https://phabricator.wikimedia.org/T212850) (owner: Mathew.onipe)
[21:48:22] Operations, Discovery-Search, Elasticsearch, Maps, and 2 others: Cleanup/Improve elasticsearch/maps/wdqs doc in wikitech - https://phabricator.wikimedia.org/T213665 (Mathew.onipe)
[21:48:32] Operations, Discovery-Search, Elasticsearch, Maps, and 2 others: Cleanup/Improve elasticsearch/maps/wdqs doc in wikitech - https://phabricator.wikimedia.org/T213665 (Mathew.onipe) p:Triage→Normal
[21:52:57] Just got an OTRS ticket (Ticket#2019011310003926) complaining about a "Secure Connection Failure" in Firefox, Chrome, and Edge only on Wikipedia. No other details were provided, but I figured I'd ask here if anyone had any details before responding
[22:00:37] just saw the same
[22:00:38] no alarms are going off
[22:00:41] I can get to enwiki fine
[22:00:55] I was going to ask what country they're in and if their ISP may be interfering
[22:19:39] (PS1) GTirloni: wmcs::nfs::misc - Fix typo and nsswitch.conf file [puppet] - https://gerrit.wikimedia.org/r/484149 (https://phabricator.wikimedia.org/T209527)
[22:48:22] (PS1) BryanDavis: toolforge: Add dev packages for giftbot [puppet] - https://gerrit.wikimedia.org/r/484151 (https://phabricator.wikimedia.org/T213646)
[23:03:54] Operations, media-storage: Lost file Juan_Guaidó.jpg - https://phabricator.wikimedia.org/T213655 (Aklapper) No matter if I set `eqiad`, `esams` or `codfw` for the `xxxxx` in `wget -S --header 'host: upload.wikimedia.org' 'http://upload-lb.xxxxx.wikimedia.org/wikipedia/commons/c/ca/Juan_Guaid%C3%B3.jpg'`,...
[23:34:49] (CR) BryanDavis: [C: -1] "Looking deeper. I think this may be the wrong approach." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/484151 (https://phabricator.wikimedia.org/T213646) (owner: BryanDavis)
[23:37:36] (Abandoned) BryanDavis: toolforge: Add dev packages for giftbot [puppet] - https://gerrit.wikimedia.org/r/484151 (https://phabricator.wikimedia.org/T213646) (owner: BryanDavis)
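Aklapper's check on T213655 above sends the same request to each datacenter's upload load balancer while forcing the `Host` header to `upload.wikimedia.org`. A minimal Python sketch of that loop, assuming the `upload-lb.<dc>.wikimedia.org` pattern from the quoted wget command (these 2019 hostnames may no longer resolve); `build_request` and `fetch_status` are hypothetical helpers, and unlike `wget -S` the sketch only reports the HTTP status instead of saving the file:

```python
import urllib.error
import urllib.request

# Datacenter names and URL pattern copied from the wget command quoted on T213655.
DATACENTERS = ("eqiad", "esams", "codfw")
PATH = "/wikipedia/commons/c/ca/Juan_Guaid%C3%B3.jpg"


def build_request(dc):
    """Mirror wget --header 'host: upload.wikimedia.org' against upload-lb.<dc>."""
    url = f"http://upload-lb.{dc}.wikimedia.org{PATH}"
    # An explicit Host header is kept by urllib instead of the one derived
    # from the URL, so the edge answers as if asked for upload.wikimedia.org.
    return urllib.request.Request(url, headers={"Host": "upload.wikimedia.org"})


def fetch_status(dc, timeout=10):
    """Return the HTTP status code one datacenter gives for the file."""
    try:
        with urllib.request.urlopen(build_request(dc), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers (e.g. 404 for a lost file) arrive as HTTPError.
        return err.code
```

Calling `fetch_status(dc)` for each name in `DATACENTERS` reproduces the comparison; identical 404s everywhere would point at the backing store rather than a single edge cache. DNS or connection failures still raise `urllib.error.URLError`, which the sketch deliberately leaves to the caller.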