[00:27:51] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[00:39:17] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 80%, RTA = 231.37 ms
[00:52:29] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[00:58:13] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.74 ms
[01:05:41] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[01:17:09] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 230.07 ms
[01:33:35] Operations, DBA, Jade, Patch-For-Review, and 2 others: Introduce a new namespace for collaborative judgements about wiki entities - https://phabricator.wikimedia.org/T200297 (kchapman) TechCom has approved this RFC
[01:41:47] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[01:47:31] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 80%, RTA = 230.41 ms
[02:17:02] Operations, serviceops, PHP 7.2 support, Performance-Team (Radar), and 2 others: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) - https://phabricator.wikimedia.org/T224491 (Krinkle) The cause of these notices was code like this: `lang=php setThing( array $list ) { $key = im...
[02:17:55] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[02:23:39] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.68 ms
[02:57:29] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[03:08:57] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 86%, RTA = 233.35 ms
[03:16:25] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[03:27:53] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 86%, RTA = 229.54 ms
[03:35:21] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[03:41:05] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 86%, RTA = 229.54 ms
[03:44:17] PROBLEM - snapshot of s6 in codfw on db1115 is CRITICAL: snapshot for s6 at codfw taken more than 4 days ago: Most recent backup 2019-07-10 03:23:51 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[04:13:15] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[04:18:59] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 231.02 ms
[04:39:39] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[04:45:25] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 230.13 ms
[04:52:27] (CR) Hashar: "recheck" [debs/python-git-archive-all] - https://gerrit.wikimedia.org/r/522428 (owner: Hashar)
[05:17:31] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[05:46:11] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.68 ms
[06:01:09] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[06:12:41] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.60 ms
[06:31:31] PROBLEM - puppet last run on deploy1001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[06:32:17] Operations, serviceops, PHP 7.2 support, Performance-Team (Radar), and 2 others: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) - https://phabricator.wikimedia.org/T224491 (jijiki) @Krinkle Should we reopen this task then?
[06:33:07] PROBLEM - puppet last run on cp2010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-puppet-agent-stats]
[06:33:17] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[06:33:37] PROBLEM - puppet last run on mw1314 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[06:39:01] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 86%, RTA = 230.57 ms
[06:46:29] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[06:54:55] RECOVERY - puppet last run on cp2010 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:57:57] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.54 ms
[06:58:49] RECOVERY - puppet last run on deploy1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:00:55] RECOVERY - puppet last run on mw1314 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:31:49] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[07:37:33] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.54 ms
[08:05:40] Operations, netops: mr1-eqsin.oob IPv6 connectivity flapping - https://phabricator.wikimedia.org/T227967 (elukey)
[08:05:46] Operations, netops: mr1-eqsin.oob IPv6 connectivity flapping - https://phabricator.wikimedia.org/T227967 (elukey) p:Triage→High
[08:32:37] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[08:49:49] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 231.01 ms
[08:57:17] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[09:03:01] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.44 ms
[09:10:27] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[09:21:55] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.56 ms
[09:29:23] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[09:35:09] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.70 ms
[09:42:37] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[10:28:27] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.47 ms
[10:35:55] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[10:58:51] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.51 ms
[11:06:17] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[11:12:01] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.53 ms
[11:30:57] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[11:42:25] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.52 ms
[11:57:17] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[12:01:31] !log Running mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Sporti /home/urbanecm/T227968 for server side upload
[12:01:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:03:01] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.54 ms
[12:10:31] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[12:16:13] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 80%, RTA = 229.59 ms
[12:29:25] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[12:40:53] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 230.92 ms
[12:54:05] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[12:59:49] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING WARNING - Packet loss = 93%, RTA = 229.51 ms
[13:18:39] !log silence mr1-eqsin.oob IPv6 until tomorrow 8 UTC - T227967
[13:18:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:18:45] T227967: mr1-eqsin.oob IPv6 connectivity flapping - https://phabricator.wikimedia.org/T227967
[13:36:09] Operations, media-storage: Not possible to server-side upload certain images: "An unknown error occurred in storage backend "local-swift-eqiad"" - https://phabricator.wikimedia.org/T226937 (Urbanecm) @fgiunchedi Hi, is `Importing Radomlje_-_Nafta_1903_0-0_(2._polčas).webm...failed. (An unknown error occu...
[14:11:24] Operations, media-storage: Not possible to server-side upload certain images: "An unknown error occurred in storage backend "local-swift-eqiad"" - https://phabricator.wikimedia.org/T226937 (Urbanecm)
[14:48:30] Operations, media-storage: Not possible to server-side upload certain images: "An unknown error occurred in storage backend "local-swift-eqiad"" - https://phabricator.wikimedia.org/T226937 (Urbanecm) FYI, whole importImages.php took 3 mins and 17 secs, which is below the timeout (even the before-change t...
[14:54:51] PROBLEM - puppet last run on scb1004 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[15:22:05] RECOVERY - puppet last run on scb1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:55:52] (PS1) Urbanecm: Move private and fishbowl overrides from groupOverrides to groupOverrides2 [mediawiki-config] - https://gerrit.wikimedia.org/r/522987
[15:56:19] * Urbanecm is stashing on mwdebug1001
[15:56:40] (PS2) Urbanecm: Move private and fishbowl overrides from groupOverrides to groupOverrides2 [mediawiki-config] - https://gerrit.wikimedia.org/r/522987
[16:03:40] * Urbanecm is done with stashing on mwdebug
[16:08:07] (PS3) Urbanecm: Move private and fishbowl overrides from groupOverrides to groupOverrides2 [mediawiki-config] - https://gerrit.wikimedia.org/r/522987 (https://phabricator.wikimedia.org/T227980)
[16:13:11] (PS2) ArielGlenn: Remove BETA from RDF dump filenames [puppet] - https://gerrit.wikimedia.org/r/518108 (https://phabricator.wikimedia.org/T226153) (owner: Smalyshev)
[16:15:01] (CR) ArielGlenn: [C: +2] Remove BETA from RDF dump filenames [puppet] - https://gerrit.wikimedia.org/r/518108 (https://phabricator.wikimedia.org/T226153) (owner: Smalyshev)
[16:33:11] PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[16:33:41] PROBLEM - Apache HTTP on mw1347 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[16:34:57] RECOVERY - Apache HTTP on mw1347 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.038 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[16:49:31] PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[16:53:23] ACKNOWLEDGEMENT - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL Cas Rusnov Muting until next working day. - The acknowledgement expires at: 2019-07-15 14:00:00. https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[18:00:31] (PS2) CDanis: conftool: add support for --version to all executables [software/conftool] - https://gerrit.wikimedia.org/r/522235
[18:31:37] (PS1) Gergő Tisza: Allow CORS access to publichtml (people.wikimedia.org) [puppet] - https://gerrit.wikimedia.org/r/522991 (https://phabricator.wikimedia.org/T224068)
[18:35:25] (PS1) CDanis: WIP: nrpe: fix confusing comment [puppet] - https://gerrit.wikimedia.org/r/522992
[18:37:57] (PS2) CDanis: WIP: nrpe: fix confusing comment [puppet] - https://gerrit.wikimedia.org/r/522992
[18:38:45] (CR) jerkins-bot: [V: -1] WIP: nrpe: fix confusing comment [puppet] - https://gerrit.wikimedia.org/r/522992 (owner: CDanis)
[18:43:47] (PS3) CDanis: WIP: nrpe: fix confusing comment & critical's a bool [puppet] - https://gerrit.wikimedia.org/r/522992
[19:06:27] PROBLEM - HHVM rendering on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[19:07:45] RECOVERY - HHVM rendering on mw1234 is OK: HTTP OK: HTTP/1.1 200 OK - 75948 bytes in 0.198 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:57] (PS4) CDanis: nrpe: $critical is a boolean, NOT a string! 😤 [puppet] - https://gerrit.wikimedia.org/r/522992
[19:38:43] (CR) CDanis: "PCC says no-op: https://puppet-compiler.wmflabs.org/compiler1001/17355/" [puppet] - https://gerrit.wikimedia.org/r/522992 (owner: CDanis)
[20:52:56] * Krinkle debugging on mwdebug1002
[21:16:07] (PS1) Urbanecm: Enable WikiLove and SandboxLink on sqwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/523003 (https://phabricator.wikimedia.org/T227970)
[21:22:12] (PS1) Urbanecm: Grant skipcaptcha to everyone coming from whitelisted IP [mediawiki-config] - https://gerrit.wikimedia.org/r/523005 (https://phabricator.wikimedia.org/T227487)
[21:22:43] (PS2) Urbanecm: Grant skipcaptcha to everyone coming from whitelisted IP [mediawiki-config] - https://gerrit.wikimedia.org/r/523005 (https://phabricator.wikimedia.org/T227487)
[21:24:00] (CR) DannyS712: [C: +1] "Looks good to me" [mediawiki-config] - https://gerrit.wikimedia.org/r/523005 (https://phabricator.wikimedia.org/T227487) (owner: Urbanecm)
[21:44:52] * Krinkle resetting mwdebug1002
[22:07:21] (PS4) Urbanecm: Rename `Image-reviewer` to `image-reviewer` [mediawiki-config] - https://gerrit.wikimedia.org/r/520283 (https://phabricator.wikimedia.org/T216406)
[22:09:56] (PS5) Urbanecm: Rename `Image-reviewer` to `image-reviewer` [mediawiki-config] - https://gerrit.wikimedia.org/r/520283 (https://phabricator.wikimedia.org/T216406)
[22:15:10] (PS1) Urbanecm: Create image-reviewer for commonswiki with same rights as Image-reviewer [mediawiki-config] - https://gerrit.wikimedia.org/r/523006 (https://phabricator.wikimedia.org/T216406)
[22:16:57] (PS6) Urbanecm: Rename `Image-reviewer` to `image-reviewer` [mediawiki-config] - https://gerrit.wikimedia.org/r/520283 (https://phabricator.wikimedia.org/T216406)
[22:22:43] (PS7) Urbanecm: Rename `Image-reviewer` to `image-reviewer` for Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/520283 (https://phabricator.wikimedia.org/T216406)
[23:36:51] Operations, netops: mr1-eqsin.oob IPv6 connectivity flapping - https://phabricator.wikimedia.org/T227967 (ayounsi) a:ayounsi Thanks, email sent to Equinix NOC. So far I don't think there is a link between the ripe alerts and the oob alerts.
[23:48:33] (PS5) CDanis: nrpe: $critical is a boolean, NOT a string! 😤 [puppet] - https://gerrit.wikimedia.org/r/522992 (https://phabricator.wikimedia.org/T113783)