[01:07:36] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226912 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[01:07:39] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226912 (ops-monitoring-bot)
[01:38:51] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226913 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[01:38:55] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226913 (ops-monitoring-bot)
[01:55:10] Operations, observability: ops-monitoring-bot creating dupes - https://phabricator.wikimedia.org/T226908 (Peachey88) Is this actually a problem with the bot, or with how acknowledgements are working in icinga?
[01:55:30] Operations, Icinga, observability: ops-monitoring-bot creating dupes - https://phabricator.wikimedia.org/T226908 (Peachey88)
[02:10:14] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226915 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[02:10:17] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226915 (ops-monitoring-bot)
[03:12:57] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226916 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[03:13:04] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226916 (ops-monitoring-bot)
[03:42:21] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (Peachey88)
[03:42:23] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226915 (Peachey88)
[03:42:25] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226916 (Peachey88)
[03:42:27] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226913 (Peachey88)
[03:42:29] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226912 (Peachey88)
[04:29:00] (CR) KartikMistry: [C: +1] "> > This can wait until July 11." [mediawiki-config] - https://gerrit.wikimedia.org/r/518260 (https://phabricator.wikimedia.org/T225398) (owner: Petar.petkovic)
[05:18:36] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226917 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[05:18:39] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226917 (ops-monitoring-bot)
[05:36:08] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226917 (Peachey88)
[05:36:10] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (Peachey88)
[06:10:53] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226919 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[06:10:57] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226919 (ops-monitoring-bot)
[06:17:19] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226919 (Peachey88)
[06:17:21] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (Peachey88)
[06:28:37] PROBLEM - puppet last run on elastic1045 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[06:28:45] (CR) Urbanecm: [C: +1] "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/518260 (https://phabricator.wikimedia.org/T225398) (owner: Petar.petkovic)
[06:33:17] PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/R/update-library.R]
[06:55:05] RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[07:01:20] RECOVERY - puppet last run on elastic1045 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[07:05:23] !log Remove 2FA from User:SQL (T226918)
[07:05:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:05:29] T226918: Disable two factor authentication for Wikimedia account "SQL" - https://phabricator.wikimedia.org/T226918
[07:06:13] <3
[07:25:01] PROBLEM - puppet last run on multatuli is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[07:52:19] RECOVERY - puppet last run on multatuli is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:08:22] Operations, Wikimedia-Mailing-lists, Space (Jan-Mar-2020): Integrate mailing lists in Wikimedia Space - https://phabricator.wikimedia.org/T226727 (Tgr) Note that deleting a thread and purging it from storage are different things. The latter will be needed here for compliance with the data retention p...
[08:37:18] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226921 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:37:21] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226921 (ops-monitoring-bot)
[08:48:31] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226921 (Peachey88)
[08:48:34] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (Peachey88)
[09:08:41] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226923 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[09:08:44] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226923 (ops-monitoring-bot)
[09:11:36] like most places, shit gets f'up when people start polarizing into bad/
[09:11:53] ugh. wwe
[09:19:00] (CR) Gergő Tisza: Create new http://www.mediawiki.org/xml/sitelist-1.1/ to reference sitelist-1.1.xsd (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/508130 (https://phabricator.wikimedia.org/T222516) (owner: Luca Mauri)
[09:40:04] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226924 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[09:40:06] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226924 (ops-monitoring-bot)
[10:45:50] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226923 (Reedy)
[10:45:52] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (Reedy)
[10:46:08] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226924 (Reedy)
[10:46:10] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (Reedy)
[10:47:11] Operations, Icinga, observability: ops-monitoring-bot creating dupes - https://phabricator.wikimedia.org/T226908 (Reedy) >>! In T226908#5294154, @Peachey88 wrote: > Is this actually a problem with the bot, or with how acknowledgements are working in icinga? Pass. The user facing "issue" is the dupli...
[11:56:03] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226933 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:56:06] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226933 (ops-monitoring-bot)
[11:57:21] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226933 (Peachey88)
[11:57:23] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (Peachey88)
[12:37:54] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226936 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[12:37:59] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226936 (ops-monitoring-bot)
[12:49:10] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (Peachey88)
[12:49:12] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226936 (Peachey88)
[12:49:57] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (Peachey88)
[12:49:59] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226936 (Peachey88)
[12:50:06] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T226936 (Peachey88)
[12:50:09] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (Peachey88)
[13:06:24] Operations, Icinga, observability: ops-monitoring-bot creating dupes - https://phabricator.wikimedia.org/T226908 (Volans) Sorry for the spam. My guess is that the check is flapping between critical and unknown. The script ignores the unknowns but it doesn't know if there is already a task opened (lon...
[14:33:53] Operations, media-storage: Not possible to server-side upload certain images - https://phabricator.wikimedia.org/T226937 (Urbanecm)
[14:34:22] Operations, media-storage: Not possible to server-side upload certain images - https://phabricator.wikimedia.org/T226937 (Urbanecm) Tagging @Reedy, who had problems with this too in T226845.
[14:40:00] PROBLEM - puppet last run on bast2002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[15:07:20] RECOVERY - puppet last run on bast2002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[17:21:53] Operations, LDAP-Access-Requests: Grant WMDE engineers access to logstash and creating grafana boards / Add WMDE engineers to 'nda' LDAP group - https://phabricator.wikimedia.org/T225004 (Addshore) >>! In T225004#5291475, @MoritzMuehlenhoff wrote: > we have two ways to approach this: If you specifically...
[17:38:56] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 55, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[17:39:34] RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[17:43:46] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 36 probes of 433 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[17:45:34] (CR) ArielGlenn: dumpwikidatajson: Fix error code detection (1 comment) [puppet] - https://gerrit.wikimedia.org/r/519494 (https://phabricator.wikimedia.org/T226601) (owner: Hoo man)
[17:49:11] Operations, Traffic, Wikidata, Wikidata-Query-Service, and 2 others: Reduce / remove the aggessive cache busting behaviour of wdqs-updater - https://phabricator.wikimedia.org/T217897 (Addshore) I guess this will eventually be in wdqs 0.3.3 ?
[17:49:14] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 19 probes of 433 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[18:21:50] Operations, Wikidata, wikidata-tech-focus: Move dispatching of wikidata to a dedicated node - https://phabricator.wikimedia.org/T193733 (Addshore) >>! In T193733#5276659, @Ladsgroup wrote: > So I like to just drop the whole thing but first we need to address {T220696} which enables us to make all the...
[18:55:12] PROBLEM - puppet last run on relforge1001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[19:22:28] RECOVERY - puppet last run on relforge1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[20:59:17] Operations, media-storage: Not possible to server-side upload certain images - https://phabricator.wikimedia.org/T226937 (Urbanecm) ` [urbanecm@mwmaint1002 T223052-upload2]$ ls Hurtigruten.05.11.1920x1080.NRK2.webm [urbanecm@mwmaint1002 T223052-upload2]$ mwscript importImages.php --wiki=commonswiki --use...
[21:08:27] Operations, media-storage: Not possible to server-side upload certain images: "An unknown error occurred in storage backend "local-swift-eqiad"" - https://phabricator.wikimedia.org/T226937 (Aklapper)
[21:14:30] PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[21:16:27] jbond42|away: Are you around? I need to PM you or a en-wiki sysadmin...
[21:23:17] (PS1) Aaron Schulz: Update my obsolete YubiKey-stored SSH keys [puppet] - https://gerrit.wikimedia.org/r/519941
[22:02:49] Operations, Traffic, Wikidata, Wikidata-Query-Service, and 2 others: Reduce / remove the aggessive cache busting behaviour of wdqs-updater - https://phabricator.wikimedia.org/T217897 (Smalyshev) Eventually, yes.
[22:24:42] PROBLEM - puppet last run on dns2001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[22:25:32] PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[22:51:58] RECOVERY - puppet last run on dns2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:24:35] (PS1) Urbanecm: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/519945
[23:24:37] (CR) Urbanecm: [C: +2] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/519945 (owner: Urbanecm)
[23:25:38] (Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/519945 (owner: Urbanecm)
[23:26:20] (CR) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/519945 (owner: Urbanecm)
[23:27:08] !log urbanecm@deploy1001 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 45s)
[23:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:49:36] Urbanecm: Um
[23:49:43] Why are you merging and deploying no-ops?