[00:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a Evening SWAT (Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190124T0000).
[00:00:04] Jdlrobson: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:23] \o
[00:02:18] @twentyafterfour are you able to run swat today?
[00:02:46] (PS1) Cwhite: prometheus: upgrade to node-exporter 0.17 [puppet] - https://gerrit.wikimedia.org/r/486192 (https://phabricator.wikimedia.org/T213708)
[00:03:18] (CR) jerkins-bot: [V: -1] prometheus: upgrade to node-exporter 0.17 [puppet] - https://gerrit.wikimedia.org/r/486192 (https://phabricator.wikimedia.org/T213708) (owner: Cwhite)
[00:03:33] (PS2) Cwhite: prometheus: upgrade to node-exporter 0.17 in backports [puppet] - https://gerrit.wikimedia.org/r/486192 (https://phabricator.wikimedia.org/T213708)
[00:04:06] (CR) jerkins-bot: [V: -1] prometheus: upgrade to node-exporter 0.17 in backports [puppet] - https://gerrit.wikimedia.org/r/486192 (https://phabricator.wikimedia.org/T213708) (owner: Cwhite)
[00:06:24] (PS3) Cwhite: prometheus: upgrade to node-exporter 0.17 in backports [puppet] - https://gerrit.wikimedia.org/r/486192 (https://phabricator.wikimedia.org/T213708)
[00:06:26] thcipriani: are you free?
[00:07:28] jdlrobson: I think I could be...I can SWAT
[00:08:11] thank you thcipriani! Got some broken iOS clients and unhappy wikivoyagers that SWAT will make happy
[00:08:14] (PS4) Cwhite: prometheus: upgrade to node-exporter 0.17 in backports [puppet] - https://gerrit.wikimedia.org/r/486192 (https://phabricator.wikimedia.org/T213708)
[00:09:20] that's good :) now I just have to remind myself how to do this :P
[00:11:31] jdlrobson: looks like you abandoned one of these...
[00:12:16] this one needs to go correct? https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/486148/
[00:13:07] and by "go" I mean "be deployed during SWAT" to be less ambiguous
[00:14:17] sorrry
[00:14:31] yes https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/486148/ is the right one
[00:14:36] not sure why jenkins is unhappy
[00:14:42] you can switch to the other 2 while i check that out
[00:14:58] subprocess.CalledProcessError: Command '['npm', 'install']' returned non-zero exit status 1 < oh i guess that's a blip?
[00:15:53] ah, yeah, that'll do it, I'm having jenkins recheck, we'll get the others out while we wait, hopefully just a momentary anomaly
[00:18:30] sounndddss good
[00:18:57] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: Traceback (most recent call last): https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[00:22:21] (PS10) Tim Starling: Class wrapper for ProductionServices.php etc. [mediawiki-config] - https://gerrit.wikimedia.org/r/477956
[00:23:06] (CR) Tim Starling: [C: +2] Class wrapper for ProductionServices.php etc. [mediawiki-config] - https://gerrit.wikimedia.org/r/477956 (owner: Tim Starling)
[00:23:17] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 1 probes of 409 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[00:23:41] (Merged) jenkins-bot: Class wrapper for ProductionServices.php etc. [mediawiki-config] - https://gerrit.wikimedia.org/r/477956 (owner: Tim Starling)
[00:24:54] thcipriani: are you deploying?
[00:25:19] AaronSchulz: I am nominally although I haven't actually started yet
[00:25:40] (CR) jenkins-bot: Class wrapper for ProductionServices.php etc. [mediawiki-config] - https://gerrit.wikimedia.org/r/477956 (owner: Tim Starling)
[00:26:23] why do you ask? do you have something for SWAT or were you hoping to scap something?
[00:27:33] I had https://gerrit.wikimedia.org/r/c/486134/
[00:29:34] AaronSchulz: I could deploy it after these patches that are merging, or I could ping you when I'm done deploying?
[00:29:59] jdlrobson: mobilefrontend changes for wmf.13 and wmf.14 are on mwdebug1002, check please
[00:30:18] testing! thanks!
[00:30:50] thcipriani: as long as I know when it's merged
[00:31:39] permission to sync thcipriani
[00:31:51] jdlrobson: cool, going wmf.14 first
[00:32:23] PROBLEM - HHVM rendering on mwdebug1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:32:33] sounds good!
[00:33:27] RECOVERY - HHVM rendering on mwdebug1002 is OK: HTTP OK: HTTP/1.1 200 OK - 76854 bytes in 0.671 second response time
[00:34:31] !log thcipriani@deploy1001 Synchronized php-1.33.0-wmf.14/extensions/MobileFrontend: SWAT: [[gerrit:486147|Explicitly pass in parseHTML]] T214451 (duration: 00m 57s)
[00:34:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:34:36] T214451: [Bug] A large amount of our errors are occurring in iOS Safari - https://phabricator.wikimedia.org/T214451
[00:34:41] ^ jdlrobson live now
[00:35:08] (PS4) Krinkle: errorpages: Remove unused php-fatal-error.html file [mediawiki-config] - https://gerrit.wikimedia.org/r/478575 (https://phabricator.wikimedia.org/T113114)
[00:35:56] AaronSchulz: I suppose I'll let you deploy it, I'll ping you when all clear, sound fine?
[00:36:24] ok
[00:36:45] thanks thcipriani
[00:37:22] jdlrobson: minervaneue update for wmf.14 live on mwdebug1002, check please
[00:39:03] checking!
[00:40:21] thcipriani: confirmed! please sync!
[00:40:46] just realized i didn't sync wmf.13 for mobilefrontend, will do that first then minervaneue
[00:41:47] cool was gonna ask about that :)
[00:41:57] 1.33.0-wmf.13 is where I'm seeing most of the bugs.
[00:42:45] !log thcipriani@deploy1001 Synchronized php-1.33.0-wmf.13/extensions/MobileFrontend: SWAT: [[gerrit:486146|Explicitly pass in parseHTML]] T214451 (duration: 00m 55s)
[00:42:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:42:54] T214451: [Bug] A large amount of our errors are occurring in iOS Safari - https://phabricator.wikimedia.org/T214451
[00:43:00] ^ jdlrobson mobilefrontend live for wmf.13
[00:44:06] (PS7) BryanDavis: toolforge: process dynamicproxy access logs [puppet] - https://gerrit.wikimedia.org/r/482237 (https://phabricator.wikimedia.org/T87001)
[00:45:12] (CR) BryanDavis: "> Uploaded patch set 7." [puppet] - https://gerrit.wikimedia.org/r/482237 (https://phabricator.wikimedia.org/T87001) (owner: BryanDavis)
[00:45:17] !log thcipriani@deploy1001 Synchronized php-1.33.0-wmf.14/skins/MinervaNeue/includes/skins/minerva.mustache: SWAT: [[gerrit:486148|Restore banners to Wikivoyage project]] (duration: 00m 52s)
[00:45:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:45:31] ^ jdlrobson minervaneue change live now
[00:46:06] AaronSchulz: SWAT is complete, deployment server is yours
[00:46:23] ok
[00:46:47] (PS3) Aaron Schulz: Switch parser cache top tier backing cache to mcrouter [mediawiki-config] - https://gerrit.wikimedia.org/r/486134 (https://phabricator.wikimedia.org/T214275)
[00:47:04] * Krinkle queues up after Aaron for a no-op change
[00:47:51] (CR) Aaron Schulz: [C: +2] Switch parser cache top tier backing cache to mcrouter [mediawiki-config] - https://gerrit.wikimedia.org/r/486134 (https://phabricator.wikimedia.org/T214275) (owner: Aaron Schulz)
[00:49:05] (Merged) jenkins-bot: Switch parser cache top tier backing cache to mcrouter [mediawiki-config] - https://gerrit.wikimedia.org/r/486134 (https://phabricator.wikimedia.org/T214275) (owner: Aaron Schulz)
[00:51:57] thcipriani: looks like 694ca2ddb173012ebcba5507f is not deployed
[00:52:37] (CR) jenkins-bot: Switch parser cache top tier backing cache to mcrouter [mediawiki-config] - https://gerrit.wikimedia.org/r/486134 (https://phabricator.wikimedia.org/T214275) (owner: Aaron Schulz)
[00:53:25] AaronSchulz: hrm, I see that, but I didn't merge that for SWAT. TimStarling ^ were you going to deploy something?
[00:53:36] yeah sorry
[00:53:36] (CR) Krinkle: Add pycountry to analytics cluster (1 comment) [puppet] - https://gerrit.wikimedia.org/r/484994 (https://phabricator.wikimedia.org/T209857) (owner: Gilles)
[00:53:40] (CR) Krinkle: [C: +1] Add pycountry to analytics cluster [puppet] - https://gerrit.wikimedia.org/r/484994 (https://phabricator.wikimedia.org/T209857) (owner: Gilles)
[00:53:46] I'll do it now?
[00:54:12] ok, i'll wait
[00:54:14] (CR) Krinkle: [C: +2] errorpages: Remove unused php-fatal-error.html file [mediawiki-config] - https://gerrit.wikimedia.org/r/478575 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[00:54:25] * Krinkle aborts
[00:54:56] (CR) Andrew Bogott: [C: +2] toolforge: process dynamicproxy access logs [puppet] - https://gerrit.wikimedia.org/r/482237 (https://phabricator.wikimedia.org/T87001) (owner: BryanDavis)
[00:55:06] I can push out both
[00:55:49] that's fine
[00:57:20] thanks thcipriani ... https://grafana.wikimedia.org/d/000000566/reading-web-dashboard?panelId=15&fullscreen&edit&orgId=1&tab=general&from=now-2d&to=now seems to be trending in a nice direction :
[00:57:22] !log tstarling@deploy1001 Synchronized src/ServiceConfig.php: gerrit 477956 (duration: 00m 53s)
[00:57:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:58:41] !log tstarling@deploy1001 Synchronized multiversion/MWRealm.php: (no justification provided) (duration: 00m 52s)
[00:58:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:59:47] !log tstarling@deploy1001 Synchronized errorpages/hhvm-fatal-error.php: (no justification provided) (duration: 00m 53s)
[00:59:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:00:04] twentyafterfour: I, the Bot under the Fountain, allow thee, The Deployer, to do Phabricator update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190124T0100).
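Editor's note: the "Synchronized" entries above share a regular shape (user@host, path, free-text message, duration). The following is a minimal sketch of how such entries could be pulled apart for later analysis. The pattern is inferred only from the lines in this log and is an assumption, not an official format; the names SYNC_RE and parse_sync are hypothetical.

```python
import re

# Pattern inferred from the "!log ... Synchronized ..." lines in this log.
# It is an assumption drawn from these samples, not a documented format.
SYNC_RE = re.compile(
    r"!log (?P<user>[^@\s]+)@(?P<host>\S+) Synchronized (?P<path>\S+): "
    r"(?P<message>.*) \(duration: (?P<min>\d+)m (?P<sec>\d+)s\)"
)


def parse_sync(line):
    """Return a dict describing one sync entry, or None if the line is not one."""
    m = SYNC_RE.search(line)
    if m is None:
        return None
    return {
        "user": m["user"],
        "host": m["host"],
        "path": m["path"],
        "message": m["message"],
        # Convert "00m 53s" style durations into whole seconds.
        "seconds": int(m["min"]) * 60 + int(m["sec"]),
    }


sample = (
    "[00:57:22] !log tstarling@deploy1001 Synchronized "
    "src/ServiceConfig.php: gerrit 477956 (duration: 00m 53s)"
)
entry = parse_sync(sample)
not_a_sync = parse_sync("[00:53:46] I'll do it now?")
```

Ordinary chat lines return None, so the function can be mapped over the whole log and the None results filtered out.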
[01:01:35] !log tstarling@deploy1001 Synchronized wmf-config/CommonSettings.php: 477956 and Aaron's 486134 (duration: 00m 52s)
[01:01:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:02:41] * AaronSchulz expects re-hashing churn for a while
[01:22:25] Operations, MediaWiki Language Extension Bundle, MediaWiki-Cache, Language-Team (Language-2019-January-March), and 5 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (Etonkovidova) @Nikerabbit t...
[01:24:54] Operations, DBA, Jade, TechCom-RFC, and 3 others: Introduce a new namespace for collaborative judgements about wiki entities - https://phabricator.wikimedia.org/T200297 (awight) >>! In T200297#4903879, @Milimetric wrote: > This wikitext-in-JSON thing seems really complicated. It's certainly tric...
[01:29:07] (CR) Krinkle: [C: +2] errorpages: Remove unused php-fatal-error.html file [mediawiki-config] - https://gerrit.wikimedia.org/r/478575 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[01:30:19] (Merged) jenkins-bot: errorpages: Remove unused php-fatal-error.html file [mediawiki-config] - https://gerrit.wikimedia.org/r/478575 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[01:33:19] (CR) jenkins-bot: errorpages: Remove unused php-fatal-error.html file [mediawiki-config] - https://gerrit.wikimedia.org/r/478575 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[01:35:47] !log krinkle@deploy1001 Synchronized errorpages/: Ic093c3122f - rm php-fatal-error.html (duration: 00m 54s)
[01:35:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:47:34] (PS1) Milimetric: Sqoop actor and comment from production monthly [puppet] - https://gerrit.wikimedia.org/r/486203 (https://phabricator.wikimedia.org/T209031)
[02:56:09] PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: Traceback (most recent call last): https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[03:00:59] RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 1 probes of 411 (alerts on 35) - https://atlas.ripe.net/measurements/1790945/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[03:06:51] Operations, Discovery-Search, Elasticsearch, Maps: Add more metrics to upstream's elasticsearch exporter. - https://phabricator.wikimedia.org/T214547 (Mathew.onipe) p:Triage→Normal
[03:20:03] (CR) Mathew.onipe: wdqs: convert prom exporter script tp py3 (4 comments) [puppet] - https://gerrit.wikimedia.org/r/484974 (https://phabricator.wikimedia.org/T213305) (owner: Mathew.onipe)
[03:25:25] (PS6) Mathew.onipe: wdqs: convert prom exporter script tp py3 [puppet] - https://gerrit.wikimedia.org/r/484974 (https://phabricator.wikimedia.org/T213305)
[03:38:03] Operations, Discovery-Search (Current work): Test spicerack elasticsearch module - https://phabricator.wikimedia.org/T207920 (Mathew.onipe) Gehel did some testing and here is the link to the results: https://etherpad.wikimedia.org/p/spicerack-elastic-test-plan The summary are as follows: - Need to fix...
[03:38:20] Operations, Discovery-Search, Elasticsearch: Test spicerack elasticsearch module - https://phabricator.wikimedia.org/T207920 (Mathew.onipe)
[03:39:54] Operations, Elasticsearch, Discovery-Search (Current work): Test spicerack elasticsearch module - https://phabricator.wikimedia.org/T207920 (Mathew.onipe)
[04:02:25] (PS1) BryanDavis: wmcs: Add missing admin::groups for labtestn hosts [puppet] - https://gerrit.wikimedia.org/r/486207
[04:46:49] (PS9) Tim Starling: Put profiler hostnames in ProductionServices.php [mediawiki-config] - https://gerrit.wikimedia.org/r/477957
[04:47:12] (CR) Tim Starling: [C: +2] Put profiler hostnames in ProductionServices.php [mediawiki-config] - https://gerrit.wikimedia.org/r/477957 (owner: Tim Starling)
[04:48:14] (Merged) jenkins-bot: Put profiler hostnames in ProductionServices.php [mediawiki-config] - https://gerrit.wikimedia.org/r/477957 (owner: Tim Starling)
[04:50:09] !log tstarling@deploy1001 Synchronized wmf-config/ProductionServices.php: gerrit 477957 (duration: 00m 56s)
[04:50:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:51:27] !log tstarling@deploy1001 Synchronized wmf-config/LabsServices.php: gerrit 477957 (duration: 00m 52s)
[04:51:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:52:36] !log tstarling@deploy1001 Synchronized wmf-config/PhpAutoPrepend.php: gerrit 477957 (duration: 00m 52s)
[04:52:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:53:52] (CR) jenkins-bot: Put profiler hostnames in ProductionServices.php [mediawiki-config] - https://gerrit.wikimedia.org/r/477957 (owner: Tim Starling)
[04:53:58] !log tstarling@deploy1001 Synchronized wmf-config/PhpAutoPrepend-labs.php: gerrit 477957 (duration: 00m 53s)
[04:54:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:55:39] (PS10) Tim Starling: Excimer and Tideways support [mediawiki-config] - https://gerrit.wikimedia.org/r/478137
[04:57:10] (PS1) Tulsi Bhagat: Configure $wgSitename and $wgMetaNamespace for ur.wiktionary, ur.wikibooks and ur.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/486208 (https://phabricator.wikimedia.org/T214290)
[04:57:38] (CR) Tim Starling: [C: +2] Excimer and Tideways support [mediawiki-config] - https://gerrit.wikimedia.org/r/478137 (owner: Tim Starling)
[04:58:42] (Merged) jenkins-bot: Excimer and Tideways support [mediawiki-config] - https://gerrit.wikimedia.org/r/478137 (owner: Tim Starling)
[05:01:38] !log tstarling@deploy1001 Synchronized wmf-config/PhpAutoPrepend.php: gerrit 478137 (duration: 00m 53s)
[05:01:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:01:43] (CR) Tulsi Bhagat: "Requires `namespaceDupes.php --wiki=urwikibooks --fix` after deployment." [mediawiki-config] - https://gerrit.wikimedia.org/r/486208 (https://phabricator.wikimedia.org/T214290) (owner: Tulsi Bhagat)
[05:03:00] !log tstarling@deploy1001 Synchronized wmf-config/profiler.php: gerrit 478137 (duration: 00m 53s)
[05:03:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:07:33] (CR) jenkins-bot: Excimer and Tideways support [mediawiki-config] - https://gerrit.wikimedia.org/r/478137 (owner: Tim Starling)
[05:19:55] (CR) محمد شعیب: "Please copy paste it" [mediawiki-config] - https://gerrit.wikimedia.org/r/486208 (https://phabricator.wikimedia.org/T214290) (owner: Tulsi Bhagat)
[06:06:44] (PS1) Marostegui: mariadb: Provision dbstore1005 [puppet] - https://gerrit.wikimedia.org/r/486209 (https://phabricator.wikimedia.org/T210478)
[06:07:17] (CR) jerkins-bot: [V: -1] mariadb: Provision dbstore1005 [puppet] - https://gerrit.wikimedia.org/r/486209 (https://phabricator.wikimedia.org/T210478) (owner: Marostegui)
[06:08:57] (PS2) Marostegui: mariadb: Provision dbstore1005 [puppet] - https://gerrit.wikimedia.org/r/486209 (https://phabricator.wikimedia.org/T210478)
[06:10:36] !log Add dbstore1003:3311 to tendril - T210478
[06:10:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:10:43] T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478
[06:11:25] (CR) Marostegui: [C: +2] mariadb: Provision dbstore1005 [puppet] - https://gerrit.wikimedia.org/r/486209 (https://phabricator.wikimedia.org/T210478) (owner: Marostegui)
[06:11:48] (PS8) Marostegui: dbstore_multiinstance: Add staging db [puppet] - https://gerrit.wikimedia.org/r/485367 (https://phabricator.wikimedia.org/T210478)
[06:12:52] (CR) Marostegui: [C: +2] dbstore_multiinstance: Add staging db [puppet] - https://gerrit.wikimedia.org/r/485367 (https://phabricator.wikimedia.org/T210478) (owner: Marostegui)
[06:14:39] !log Reboot dbstore1005 - T210478
[06:14:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:29] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: Traceback (most recent call last): https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[06:22:44] (PS1) Marostegui: db-eqiad.php: Depool db1100 [mediawiki-config] - https://gerrit.wikimedia.org/r/486210 (https://phabricator.wikimedia.org/T210713)
[06:23:55] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1100 [mediawiki-config] - https://gerrit.wikimedia.org/r/486210 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[06:25:04] (Merged) jenkins-bot: db-eqiad.php: Depool db1100 [mediawiki-config] - https://gerrit.wikimedia.org/r/486210 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[06:25:52] (PS2) Krinkle: PhpAutoPrepend: Merge php7.php into PhpAutoPrepend.php [mediawiki-config] - https://gerrit.wikimedia.org/r/486176
[06:26:15] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 1 probes of 409 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[06:26:18] (CR) Krinkle: [C: -2] "Deployer (probably me): AutoPrepend MUST sync before php7.php." [mediawiki-config] - https://gerrit.wikimedia.org/r/486176 (owner: Krinkle)
[06:26:27] (PS2) Krinkle: PhpAutoPrepend: Remove PhpAutoPrepend-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/486177
[06:27:09] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1100 T210713 (duration: 00m 53s)
[06:27:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:11] !log Deploy schema change on db1100 - T210713
[06:27:12] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[06:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:31] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1100" [mediawiki-config] - https://gerrit.wikimedia.org/r/486211
[06:27:35] (CR) jenkins-bot: db-eqiad.php: Depool db1100 [mediawiki-config] - https://gerrit.wikimedia.org/r/486210 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[06:28:16] Operations, Performance-Team, monitoring, Patch-For-Review: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837 (Krinkle) a:Krinkle→None (un-assigning the tracking task to reduce clutter on the workboard, see open sub tasks for current assignees)
[06:29:18] Operations, MediaWiki-Debug-Logger, Performance-Team: Set up request profiling for PHP 7 - https://phabricator.wikimedia.org/T206152 (Krinkle)
[06:29:59] Operations, MediaWiki-Debug-Logger, Performance-Team: Set up request profiling for PHP 7 - https://phabricator.wikimedia.org/T206152 (Krinkle) With it 3545a906d6b1 deployed (by @tstarling). I'll try this again tomorrow.
[06:35:10] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1100" [mediawiki-config] - https://gerrit.wikimedia.org/r/486211 (owner: Marostegui)
[06:36:13] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1100" [mediawiki-config] - https://gerrit.wikimedia.org/r/486211 (owner: Marostegui)
[06:37:29] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1100 T210713 (duration: 00m 52s)
[06:37:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:32] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[06:41:03] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1100" [mediawiki-config] - https://gerrit.wikimedia.org/r/486211 (owner: Marostegui)
[06:43:50] !log Add dbstore1005:3320 to tendril and zarcillo - T210478
[06:43:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:54] T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478
[06:48:10] (PS1) Marostegui: db-eqiad.php: Depool db1113:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/486213 (https://phabricator.wikimedia.org/T210713)
[06:49:27] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1113:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/486213 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[06:50:29] (Merged) jenkins-bot: db-eqiad.php: Depool db1113:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/486213 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[06:51:34] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1113:3315 T210713 (duration: 00m 53s)
[06:51:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:51:37] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[06:51:38] !log Deploy schema change on db1113:3315 - T210713
[06:51:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:54] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1113:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/486214
[06:54:17] (CR) jenkins-bot: db-eqiad.php: Depool db1113:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/486213 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[06:55:13] !log Transfer x1 from dbstore1001 to dbstore1005 using mariadbbackup - T210478
[06:55:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:55:16] T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478
[07:00:31] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1113:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/486214 (owner: Marostegui)
[07:01:35] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1113:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/486214 (owner: Marostegui)
[07:02:37] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1113:3315 T210713 (duration: 00m 53s)
[07:02:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:02:40] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[07:03:06] (PS1) Marostegui: db-eqiad.php: Depool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/486216 (https://phabricator.wikimedia.org/T210713)
[07:05:16] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/486216 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[07:06:25] (Merged) jenkins-bot: db-eqiad.php: Depool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/486216 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[07:07:32] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1082 T210713 (duration: 00m 52s)
[07:07:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:38] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1113:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/486214 (owner: Marostegui)
[07:07:40] (CR) jenkins-bot: db-eqiad.php: Depool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/486216 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[07:07:49] !log Deploy schema change on db1082, this will generate lag on labsdb s5 - T210713
[07:07:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:54] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[07:09:49] !log Compress Aria tables to InnoDB on dbstore1002 staging database - T213706
[07:09:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:53] T213706: Convert Aria/Tokudb tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706
[07:12:04] (CR) Hashar: gerrit: update PolyGerrit theme (2 comments) [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox)
[07:15:21] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - https://gerrit.wikimedia.org/r/486217
[07:18:09] !log Transfer s6 from dbstore1001 to dbstore1005 using mariadbbackup - T210478
[07:18:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:13] T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478
[07:21:31] Operations, MediaWiki Language Extension Bundle, MediaWiki-Cache, Language-Team (Language-2019-January-March), and 5 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (elukey) @Etonkovidova hi! I...
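Editor's note: the schema-change rollout above repeats one cycle per replica: sync a "Depool" of db-eqiad.php, run the schema change, then sync a "Repool". As a rough sketch, the time each host spent out of rotation can be estimated by pairing those two sync entries. The pattern and the names EVENT_RE and depool_windows are hypothetical, inferred from this log only, and the code assumes both entries fall on the same day.

```python
import re
from datetime import datetime

# Matches the depool/repool sync entries as they appear in this log.
# The shape is inferred from the samples here, not from a specification.
EVENT_RE = re.compile(
    r"\[(?P<ts>\d{2}:\d{2}:\d{2})\] !log \S+ Synchronized "
    r"wmf-config/db-eqiad\.php: (?P<action>Depool|Repool) (?P<db>\S+)"
)


def depool_windows(lines):
    """Map each database to the seconds between its Depool and Repool syncs."""
    started, windows = {}, {}
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue
        ts = datetime.strptime(m["ts"], "%H:%M:%S")
        if m["action"] == "Depool":
            started[m["db"]] = ts
        elif m["db"] in started:
            # Assumes no midnight rollover between the two entries.
            delta = ts - started.pop(m["db"])
            windows[m["db"]] = int(delta.total_seconds())
    return windows


log = [
    "[06:27:09] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1100 T210713 (duration: 00m 53s)",
    "[06:37:29] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1100 T210713 (duration: 00m 52s)",
    "[06:51:34] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1113:3315 T210713 (duration: 00m 53s)",
    "[07:02:37] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1113:3315 T210713 (duration: 00m 53s)",
]
windows = depool_windows(log)
```

The timestamps mark when each sync was logged, so the result approximates the depool window rather than measuring the schema change itself.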
[07:24:51] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - https://gerrit.wikimedia.org/r/486217 (owner: Marostegui)
[07:25:05] (CR) Elukey: [C: +1] mariadb: Provision dbstore1005 [puppet] - https://gerrit.wikimedia.org/r/486209 (https://phabricator.wikimedia.org/T210478) (owner: Marostegui)
[07:25:54] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - https://gerrit.wikimedia.org/r/486217 (owner: Marostegui)
[07:26:55] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1082 T210713 (duration: 00m 52s)
[07:26:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:26:58] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[07:34:31] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - https://gerrit.wikimedia.org/r/486217 (owner: Marostegui)
[07:43:56] !log Add dbstore1005:3316 to tendril and zarcillo - T210478
[07:43:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:44:00] T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478
[07:46:59] (PS4) Giuseppe Lavagetto: Beta Features: Add the new PHP7 beta feature to the whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/484799 (https://phabricator.wikimedia.org/T213934) (owner: Jforrester)
[07:47:42] (CR) jerkins-bot: [V: -1] Beta Features: Add the new PHP7 beta feature to the whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/484799 (https://phabricator.wikimedia.org/T213934) (owner: Jforrester)
[07:50:29] !log Compress InnoDB tables on dbstore1005:3316 - T210478
[07:50:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:32] T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478
[07:53:06] !log Deploy schema change on db1102:3315
[07:53:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:35] (PS5) Giuseppe Lavagetto: Beta Features: Add the new PHP7 beta feature to the whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/484799 (https://phabricator.wikimedia.org/T213934) (owner: Jforrester)
[07:56:11] (CR) Elukey: [C: +1] Beta Features: Add the new PHP7 beta feature to the whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/484799 (https://phabricator.wikimedia.org/T213934) (owner: Jforrester)
[07:56:29] (CR) Giuseppe Lavagetto: [C: +2] Beta Features: Add the new PHP7 beta feature to the whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/484799 (https://phabricator.wikimedia.org/T213934) (owner: Jforrester)
[07:57:34] (Merged) jenkins-bot: Beta Features: Add the new PHP7 beta feature to the whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/484799 (https://phabricator.wikimedia.org/T213934) (owner: Jforrester)
[07:58:13] !log Compress innodb on dbstor1004 s2 and s3 - T210478
[07:58:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:16] T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478
[08:01:41] (CR) jenkins-bot: Beta Features: Add the new PHP7 beta feature to the whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/484799 (https://phabricator.wikimedia.org/T213934) (owner: Jforrester)
[08:07:45] (PS1) WMDE-Fisch: Enable reference previews on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/486218 (https://phabricator.wikimedia.org/T213415)
[08:08:04] !log oblivian@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Whitelist the php7 beta feature (duration: 00m 54s)
[08:08:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:37] Operations, DBA, Jade, TechCom-RFC, and 3 others: Introduce a new namespace for collaborative judgements about wiki entities - https://phabricator.wikimedia.org/T200297 (daniel) > Simple text is a good suggestion, but I'm annoyed that it can't express links. I have a proposal in the pipeline for...
[08:14:08] (PS1) Marostegui: db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/486219 (https://phabricator.wikimedia.org/T210713)
[08:15:19] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/486219 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[08:16:22] (Merged) jenkins-bot: db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/486219 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[08:17:50] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1110 T210713 (duration: 00m 53s)
[08:17:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:53] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[08:18:05] !log Deploy schema change on db1110 - T210713
[08:18:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:18] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - https://gerrit.wikimedia.org/r/486220
[08:23:27] (PS1) Ammarpad: Add 'Author' namespace in Sanskrit Wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/486221 (https://phabricator.wikimedia.org/T214553)
[08:27:28] (CR) jenkins-bot: db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/486219 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[08:27:30] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - https://gerrit.wikimedia.org/r/486220 (owner: Marostegui)
[08:29:22] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486220 (owner: 10Marostegui) [08:30:32] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1110 T210713 (duration: 00m 52s) [08:30:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:30:36] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [08:30:40] !log Deploy schema change on db1070 (s5 master) - T210713 [08:30:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:35:05] (03PS2) 10Muehlenhoff: puppetmasters: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/483695 [08:39:51] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] "The extra `false` does not hurt, I'm just wondering." (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486218 (https://phabricator.wikimedia.org/T213415) (owner: 10WMDE-Fisch) [08:40:27] !log Deploy schema change on s2 codfw master (db2035). 
this will generate lag on codfw - T210713 [08:40:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:40:33] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [08:40:50] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486220 (owner: 10Marostegui) [08:44:32] (03CR) 10Muehlenhoff: [C: 03+2] puppetmasters: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/483695 (owner: 10Muehlenhoff) [08:49:05] !log Transfer s8 from db1116:3318 to dbstore1005:3318 T210478 [08:49:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:49:08] T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 [08:51:28] !log elasticsearch: deleting indices moved out of the search-chi@(eqiad|codfw) cluster (T214052) [08:51:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:51:31] T214052: Delete indices moved from chi to psi/omega - https://phabricator.wikimedia.org/T214052 [08:54:35] (03PS1) 10Muehlenhoff: uwsgi: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/486223 [08:56:49] (03PS2) 10Ammarpad: Add 'Author' namespace in Sanskrit Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486221 (https://phabricator.wikimedia.org/T214553) [08:58:16] (03PS3) 10Ammarpad: Add 'Author' namespace in Sanskrit Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486221 (https://phabricator.wikimedia.org/T214553) [09:01:59] Hi, a question for the SWAT team: European Mid-day SWAT reads "max 6 patches". I see there are already 7 listed, any chance it would be possible to add another one? 
[09:04:29] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Fix mathoid's prometheus-statsd.conf inclusion [deployment-charts] - 10https://gerrit.wikimedia.org/r/486114 (owner: 10Alexandros Kosiaris) [09:09:01] 10Puppet, 10Beta-Cluster-Infrastructure, 10monitoring: Puppet failure on deployment-prometheus01.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T214558 (10greg) [09:12:26] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/14456/" [puppet] - 10https://gerrit.wikimedia.org/r/486223 (owner: 10Muehlenhoff) [09:15:33] PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: Traceback (most recent call last): https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts [09:20:33] RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 2 probes of 411 (alerts on 35) - https://atlas.ripe.net/measurements/1790945/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts [09:24:14] !log temp stop prometheus@global on prometheus2003 to grab a snapshot [09:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:25:53] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] "Talked with Fisch and decided we will change the extensions default to `true` (this does make the rollout on labs sooo much easier), and n" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486218 (https://phabricator.wikimedia.org/T213415) (owner: 10WMDE-Fisch) [09:26:05] (03PS1) 10Marostegui: db-eqiad.php: Depool db1103:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486229 (https://phabricator.wikimedia.org/T210713) [09:27:39] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1103:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486229 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [09:28:45] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1103:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486229 
(https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [09:29:49] (03CR) 10Addshore: [C: 03+1] Enable reference previews on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486218 (https://phabricator.wikimedia.org/T213415) (owner: 10WMDE-Fisch) [09:29:51] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 T210713 (duration: 00m 53s) [09:29:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:29:54] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [09:30:03] !log Deploy schema change on db1103:3312 - T210713 [09:30:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:31:33] (03PS8) 10Daimona Eaytoy: Move all AbuseFilter config to abusefilter.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477063 (https://phabricator.wikimedia.org/T145931) [09:33:08] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1103:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486231 [09:33:56] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1103:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486229 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [09:36:26] (03PS1) 10Muehlenhoff: graphite: Remove support for trusty/upstart [puppet] - 10https://gerrit.wikimedia.org/r/486232 [09:37:57] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: Traceback (most recent call last): https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts [09:39:24] (03PS9) 10Daimona Eaytoy: Move all AbuseFilter config to abusefilter.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477063 (https://phabricator.wikimedia.org/T145931) [09:41:19] PROBLEM - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 8974.20 seconds [09:41:31] PROBLEM - MariaDB Slave Lag: s8 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 7888.41 seconds 
[09:42:53] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 1 probes of 410 (alerts on 35) - https://atlas.ripe.net/measurements/1791210/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts [09:42:55] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1103:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486231 (owner: 10Marostegui) [09:43:33] !log Deploy schema change on db1095:3312 - T210713 [09:43:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:43:36] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [09:44:01] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1103:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486231 (owner: 10Marostegui) [09:45:13] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1103:3312 T210713 (duration: 00m 53s) [09:45:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:46:20] marostegui: are you doing anything with s8? just noticed our change dispatching got super slow [09:46:32] addshore: nope [09:46:50] between 10 and 10:09 and then stayed slow [09:46:53] * addshore will investigate [09:46:55] addshore: I am touching the backups host for s8, but that shouldn't affect you [09:47:09] 10Operations, 10Puppet, 10Continuous-Integration-Config: puppet.git rake fails with ruby 2.5 - https://phabricator.wikimedia.org/T208566 (10Joe) There is a specific reason why the recommended method for running our spec tests includes the use of `rbenv` https://wikitech.wikimedia.org/wiki/Puppet_coding#Insta... 
[09:47:15] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1103:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486231 (owner: 10Marostegui) [09:49:10] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/14458/" [puppet] - 10https://gerrit.wikimedia.org/r/486232 (owner: 10Muehlenhoff) [09:51:14] interesting, it seems to be recovering..... [09:52:14] (03PS2) 10Alexandros Kosiaris: add statsd_exporter config to mathoid [deployment-charts] - 10https://gerrit.wikimedia.org/r/482718 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite) [09:55:04] kind of felt like something odd was up with redis / the locks for dispatching, but all recovered now, oh well [09:56:23] marostegui: Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1075 pos 0-171966669-4075108480,171966669-171966669-3589226378,171974792-171974792-127015932,180359174-180359174-94123433,180363367-180363367-134174373 [09:56:29] I see that in logstash for the period [09:56:34] I see that [09:57:03] https://logstash.wikimedia.org/goto/2c52568b079dfaa6dfe4c3c5d67d87e4 [09:57:41] addshore: db1075 is s3 [09:58:15] and db1077 too [09:58:22] I can't help but think it is somehow related, as it is for the exact same time period as the dispatching issues that I saw [09:58:23] I don't think anyone was touching s3 at the time [09:58:42] (03PS12) 10Giuseppe Lavagetto: profile::services_proxy: simple local proxying for remote services [puppet] - 10https://gerrit.wikimedia.org/r/483788 (https://phabricator.wikimedia.org/T210717) [09:58:44] (03PS13) 10Giuseppe Lavagetto: mediawiki::common: add proxy for services [puppet] - 10https://gerrit.wikimedia.org/r/483789 (https://phabricator.wikimedia.org/T210717) [09:58:46] (03PS1) 10Arturo Borrero Gonzalez: cloudnet1004: reimage to Debian Stretch [puppet] - 10https://gerrit.wikimedia.org/r/486236 (https://phabricator.wikimedia.org/T214299) [09:58:50] Yeah, it might be, but I don't understand how [09:58:52] !log 
installing tiff security updates on trusty [09:58:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:48] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloudnet1004: reimage to Debian Stretch [puppet] - 10https://gerrit.wikimedia.org/r/486236 (https://phabricator.wikimedia.org/T214299) (owner: 10Arturo Borrero Gonzalez) [10:01:47] addshore: something happened on s3, there is an increase on activity: https://grafana.wikimedia.org/d/000000273/mysql?panelId=7&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1075&var-port=9104&from=now-24h&to=now [10:03:46] !log T214299 reimage cloudnet1004 to debian stretch [10:03:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:03:50] T214299: cloudvps: neutron: upgrade jessie -> stretch - https://phabricator.wikimedia.org/T214299 [10:03:50] <_joe_> marostegui: since when? [10:03:51] (03PS1) 10Alexandros Kosiaris: ores: Enable checking of celery list key [puppet] - 10https://gerrit.wikimedia.org/r/486238 (https://phabricator.wikimedia.org/T182914) [10:03:56] (03CR) 10Sayant Mahato: "102: Author should be लेखकः" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486221 (https://phabricator.wikimedia.org/T214553) (owner: 10Ammarpad) [10:04:24] <_joe_> oh since 8 am [10:04:32] _joe_: similar thing happened on db1078 but finished already: https://grafana.wikimedia.org/d/000000273/mysql?panelId=7&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1078&var-port=9104&from=now-24h&to=now [10:04:36] <_joe_> is it when I deployed the whitelisting of the php7 beta? 
[10:05:06] (03CR) 10Filippo Giunchedi: [C: 04-1] "Good start, though we should introduce a feature flag or similar mechanism to cater for partial rollouts" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/486192 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [10:05:26] I'm in a meeting now but still watching here, and will have more of a look after [10:05:47] <_joe_> no the surge is definitely started before [10:06:10] (03CR) 10Alexandros Kosiaris: [C: 03+1] "Much better" [puppet] - 10https://gerrit.wikimedia.org/r/483788 (https://phabricator.wikimedia.org/T210717) (owner: 10Giuseppe Lavagetto) [10:07:22] 10Puppet, 10Beta-Cluster-Infrastructure, 10monitoring: Puppet failure on deployment-prometheus01.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T214558 (10fgiunchedi) Indeed, this failures are part of testing for {T213708}, is there a way we can temporarily mute the emails or sth like that? [10:11:37] (03PS7) 10Gehel: wdqs: convert prom exporter script tp py3 [puppet] - 10https://gerrit.wikimedia.org/r/484974 (https://phabricator.wikimedia.org/T213305) (owner: 10Mathew.onipe) [10:12:47] (03CR) 10Gehel: [C: 03+2] wdqs: convert prom exporter script tp py3 [puppet] - 10https://gerrit.wikimedia.org/r/484974 (https://phabricator.wikimedia.org/T213305) (owner: 10Mathew.onipe) [10:13:31] !log installing libav security updates [10:13:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:14:41] marostegui: _joe_ seems to be happening again? 
[10:14:51] it is still happening on db1075 [10:15:08] dropped off between 10:50 and 11 and then resumed (looking at the wikidata dispatch lag), still not sure how they are linked [10:16:39] https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&from=now-24h&to=now&var-dc=eqiad%20prometheus%2Fops&var-server=db1075&var-port=9104&panelId=7&fullscreen [10:17:30] (03CR) 10Alexandros Kosiaris: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/14462/ points out it's effectively a noop" [puppet] - 10https://gerrit.wikimedia.org/r/486238 (https://phabricator.wikimedia.org/T182914) (owner: 10Alexandros Kosiaris) [10:19:43] (03PS9) 10Filippo Giunchedi: WIP prometheus: add feature flag for v2 compat [puppet] - 10https://gerrit.wikimedia.org/r/486051 (https://phabricator.wikimedia.org/T187987) [10:19:45] (03PS6) 10Filippo Giunchedi: WIP: hieradata: use v2 for prometheus1003 [puppet] - 10https://gerrit.wikimedia.org/r/486059 [10:20:40] (03CR) 10jerkins-bot: [V: 04-1] WIP prometheus: add feature flag for v2 compat [puppet] - 10https://gerrit.wikimedia.org/r/486051 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [10:22:49] marostegui: any idea what the queries are that are causing the thing? 
[10:22:55] (03PS10) 10Filippo Giunchedi: WIP prometheus: add feature flag for v2 compat [puppet] - 10https://gerrit.wikimedia.org/r/486051 (https://phabricator.wikimedia.org/T187987) [10:22:56] nope [10:22:57] (03PS7) 10Filippo Giunchedi: WIP: hieradata: use v2 for prometheus1003 [puppet] - 10https://gerrit.wikimedia.org/r/486059 [10:23:35] (03Abandoned) 10Alexandros Kosiaris: Don't page for zotero [puppet] - 10https://gerrit.wikimedia.org/r/483999 (owner: 10Alexandros Kosiaris) [10:23:51] (03CR) 10jerkins-bot: [V: 04-1] WIP prometheus: add feature flag for v2 compat [puppet] - 10https://gerrit.wikimedia.org/r/486051 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [10:25:09] (03PS1) 10Marostegui: db-eqiad.php: Depool db1075 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486246 [10:26:22] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1075 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486246 (owner: 10Marostegui) [10:27:35] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1075 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486246 (owner: 10Marostegui) [10:27:51] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1075 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486246 (owner: 10Marostegui) [10:28:47] addshore: looks like it is something that happens every day: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&from=now-4d&to=now&panelId=7&fullscreen&var-dc=eqiad%20prometheus%2Fops&var-server=db1075&var-port=9104 [10:28:48] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1075 (duration: 00m 52s) [10:28:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:28:51] same times [10:29:42] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1075" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486247 [10:29:56] marostegui: interesting, this is just the first time is has also hit wikidata via causing the dispatch lag to increase somehow.... 
[10:30:03] still no idea what the link is... [10:31:33] (03CR) 10Tulsi Bhagat: "@محمد شعیب: Already done. Don't worry." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486208 (https://phabricator.wikimedia.org/T214290) (owner: 10Tulsi Bhagat) [10:31:47] addshore: we are guessing https://phabricator.wikimedia.org/T172497#4905030 [10:31:56] db1075 was a master a week ago [10:32:28] second time I have seen that ticket in 3 days now! :P [10:32:33] XDDDD [10:33:35] marostegui: it looks like Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on {host} pos {pos} hit all sites? not just s3 sites [10:33:56] https://logstash.wikimedia.org/goto/ff5202a70e6610d1054501631d40b730 [10:35:00] always on db1087 [10:35:24] which is s8 [10:36:40] it started at 08:59 [10:37:09] No idea why it says for example itwiki, as db1087 doesn't contain itwiki [10:37:14] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1075" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486247 (owner: 10Marostegui) [10:37:23] !log starting stretch upgrade on maps1001 - T198622 [10:37:26] !log installing libsndfile security updates [10:37:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:27] T198622: migrate maps servers to stretch with the current style - https://phabricator.wikimedia.org/T198622 [10:37:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:38] addshore: I will depool db1087 to see if that stops or moves to another host [10:37:43] marostegui: okay [10:38:16] marostegui: if it relates to dispatching, that could explain the different sites vs the messages always being for this s8 server [10:38:24] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1075" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486247 (owner: 10Marostegui) [10:38:50] apergos: are the dumps running? 
[10:39:02] db1087 is vslow/dump [10:39:23] yeah, they are running [10:39:27] and hitting db1087 [10:39:35] (03PS2) 10Gehel: maps: migrate maps1001 to stretch [puppet] - 10https://gerrit.wikimedia.org/r/486062 (https://phabricator.wikimedia.org/T198622) (owner: 10Mathew.onipe) [10:39:36] let me put weight 0 to db1087 [10:40:34] (03PS1) 10Marostegui: db-eqiad.php: Weight 0 to db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486250 [10:40:37] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1075" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486247 (owner: 10Marostegui) [10:40:55] (03PS1) 10Filippo Giunchedi: prometheus: set retention period for v2 compatibility [puppet] - 10https://gerrit.wikimedia.org/r/486251 (https://phabricator.wikimedia.org/T187987) [10:41:49] (03PS2) 10Filippo Giunchedi: prometheus: set retention period for v2 compatibility [puppet] - 10https://gerrit.wikimedia.org/r/486251 (https://phabricator.wikimedia.org/T187987) [10:41:51] (03PS11) 10Filippo Giunchedi: WIP prometheus: add feature flag for v2 compat [puppet] - 10https://gerrit.wikimedia.org/r/486051 (https://phabricator.wikimedia.org/T187987) [10:41:53] (03PS8) 10Filippo Giunchedi: WIP: hieradata: use v2 for prometheus1003 [puppet] - 10https://gerrit.wikimedia.org/r/486059 [10:42:02] addshore: it doesn't look specific to db1087 as it is now db1091 [10:42:05] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1075 (duration: 00m 53s) [10:42:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:00] (03Abandoned) 10Marostegui: db-eqiad.php: Weight 0 to db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486250 (owner: 10Marostegui) [10:43:02] (03CR) 10Gehel: [C: 03+2] maps: migrate maps1001 to stretch [puppet] - 10https://gerrit.wikimedia.org/r/486062 (https://phabricator.wikimedia.org/T198622) (owner: 10Mathew.onipe) [10:43:07] (03CR) 10jerkins-bot: [V: 04-1] WIP prometheus: add feature flag for v2 compat 
[puppet] - 10https://gerrit.wikimedia.org/r/486051 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [10:43:58] marostegui: hmmm inersting [10:44:02] *interesting [10:44:22] addshore: i think it just happened to hit the script, but it is not something "new" https://logstash.wikimedia.org/goto/04feba7a0024afcb577ef6e7ddd0fefe [10:46:00] marostegui: if you filter that by "Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on {host} pos {pos}", then it is new [10:46:05] but maybe I should stop worrying [10:46:18] 10Puppet, 10Beta-Cluster-Infrastructure, 10monitoring: Puppet failure on deployment-prometheus01.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T214558 (10thiemowmde) It looks like I also get these emails. I don't understand why. They are of zero use for me. How can I turn them off? [10:47:27] addshore: I will comment on the existing task I think [10:47:32] okay! [10:47:42] 10Operations, 10Maps, 10Reading-Infrastructure-Team-Backlog, 10Patch-For-Review: migrate maps servers to stretch with the current style - https://phabricator.wikimedia.org/T198622 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on cumin1001.eqiad.wmnet for hosts: ` ['maps1001.eqiad.wmn... 
[10:48:00] I'll keep an eye out and see if this keeps hitting wikidata dispatching now, as they will start to get annoying [10:48:41] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1002/14464/" [puppet] - 10https://gerrit.wikimedia.org/r/486251 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [10:49:35] (03CR) 10Filippo Giunchedi: [C: 03+1] "Note this will trigger a restart of prometheus, I'll stop puppet on the affected hosts and re-enable in a controlled way" [puppet] - 10https://gerrit.wikimedia.org/r/486251 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [10:49:56] 10Puppet, 10Beta-Cluster-Infrastructure, 10monitoring: Puppet failure on deployment-prometheus01.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T214558 (10Krenair) I think you have to give up deployment-prep adminship (and possibly membership?) to avoid getting emails about puppet failures... [10:51:12] !log Compress innodb tables on dbstore1005:3318 - T210478 [10:51:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:51:15] T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 [10:52:09] addshore: for what is worth, the temporary tables on db1075 are now gone, same pattern as the rest of the days [10:59:07] !log T214299 additional reboot for cloudnet1004 [10:59:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:11] T214299: cloudvps: neutron: upgrade jessie -> stretch - https://phabricator.wikimedia.org/T214299 [11:01:45] (03PS1) 10Effie Mouzeli: Revert "role::eqiad::scb: Switch rdb1006 to redis::misc::master" [puppet] - 10https://gerrit.wikimedia.org/r/486256 [11:05:03] (03PS1) 10Jcrespo: transfer.py: Add the ability to transfer from a new mariabackup [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/486257 (https://phabricator.wikimedia.org/T210292) [11:05:30] 
(03CR) 10jerkins-bot: [V: 04-1] transfer.py: Add the ability to transfer from a new mariabackup [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/486257 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [17:59:30] (03PS20) 10Thcipriani: gerrit: update PolyGerrit theme [puppet] - 10https://gerrit.wikimedia.org/r/482379 (owner: 10Paladox) [17:59:35] added your cheat sheet link and counting mails to https://wikitech.wikimedia.org/wiki/Exim [18:00:04] cscott, arlolra, subbu, halfak, and Amir1: How many deployers does it take to do Services – Graphoid / Parsoid / Citoid / ORES deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190124T1800). [18:00:04] (03CR) 10Thcipriani: gerrit: update PolyGerrit theme (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/482379 (owner: 10Paladox) [18:00:43] maybe it could have an example of the iptables commands used to block [18:00:58] thanks thcipriani [18:01:08] * thcipriani doffs cap [18:01:12] !log deactive ping offload redirect for ping1001 restart [18:01:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:02:01] (03PS11) 10Mathew.onipe: Add allocator metrics export for Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/485135 (https://phabricator.wikimedia.org/T213372) (owner: 10Smalyshev) [18:02:52] (03PS2) 10Dzahn: phabricator: firewall hole to allow http from deployment hosts [puppet] - 10https://gerrit.wikimedia.org/r/486181 [18:03:40] !log rebooting ping1001 to pick up SSBD-enabled qemu [18:03:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:03:55] akosiaris, hmm... 
maybe [18:04:23] actually I can only move stuff to junk [18:04:28] and close [18:04:32] stephanebisson nobody on our team has permission to +2 [18:04:33] would need an OTRS admin to delete [18:05:09] stephanebisson I should say I have +2 rights in general, but not to this repo [18:05:20] 10Operations, 10Elasticsearch, 10Maps, 10Discovery-Search (Current work): Add more metrics to upstream's elasticsearch exporter. - https://phabricator.wikimedia.org/T214547 (10EBjune) [18:06:29] Krenair: I have the same right it seems [18:06:33] PROBLEM - exim queue on mx2001 is CRITICAL: CRITICAL: 5064 mails in exim queue. [18:07:28] !log re-activate ping offload redirect for ping1001 restart [18:07:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:38] to delete tickets? [18:10:04] Krenair: no I can't delete either, that's what I mean [18:10:21] funny I am not an admin... maybe I should just give myself that right [18:10:29] on the other hand, no. That would be a mistake [18:10:36] I 'll ask though on the wiki [18:10:36] davidwbarratt: if someone from your team can +1, I can +2 [18:10:58] (03CR) 10Thcipriani: [C: 03+1] "nice update, paladox!" [puppet] - 10https://gerrit.wikimedia.org/r/482379 (owner: 10Paladox) [18:11:17] (03CR) 10Tchanders: [C: 03+1] Disable partial blocks on most of the beta cluster. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486314 (https://phabricator.wikimedia.org/T214596) (owner: 10Dbarratt) [18:11:30] stephanebisson done! [18:11:41] akosiaris, surprised the WM Tech Staff right does not give you that ability [18:12:00] (03CR) 10Smalyshev: [C: 03+1] Add allocator metrics export for Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/485135 (https://phabricator.wikimedia.org/T213372) (owner: 10Smalyshev) [18:13:32] (03CR) 10Volans: [C: 03+1] "I've not tested it but the change seems reasonable to me. I'd like _joe_ to have a look too being related to conftool." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/413745 (https://phabricator.wikimedia.org/T157133) (owner: 10Andrew Bogott) [18:14:11] (03CR) 10Sbisson: [C: 03+2] Disable partial blocks on most of the beta cluster. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486314 (https://phabricator.wikimedia.org/T214596) (owner: 10Dbarratt) [18:15:27] (03Merged) 10jenkins-bot: Disable partial blocks on most of the beta cluster. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486314 (https://phabricator.wikimedia.org/T214596) (owner: 10Dbarratt) [18:15:31] davidwbarratt: voila! (you can unregister you patch from SWAT) [18:17:50] (03CR) 10Dzahn: [C: 03+2] gerrit: update PolyGerrit theme [puppet] - 10https://gerrit.wikimedia.org/r/482379 (owner: 10Paladox) [18:18:37] stephanebisson done! https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=1814564&oldid=1814557 [18:19:36] !log deploying polygerrit (new gerrit UI) theme change to roughly match MediaWiki timeless theme (gerrit:482379) (shoutouts: paladox, thcipiriani) [18:19:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:02] paladox: please test :) [18:20:14] thanks! [18:20:15] * paladox tests [18:20:27] ah [18:20:36] forgot /r/ for /static/wikimedia-codereview-logo.cache.svg [18:20:47] though otherwise LGTM [18:20:55] ok [18:21:04] (03CR) 10jenkins-bot: Disable partial blocks on most of the beta cluster. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/486314 (https://phabricator.wikimedia.org/T214596) (owner: 10Dbarratt) [18:21:10] 10Operations, 10monitoring, 10Goal, 10Patch-For-Review: Upgrade production prometheus-node-exporter to >= 0.16 - https://phabricator.wikimedia.org/T213708 (10colewhite) [18:21:24] !log onimisionipe@deploy1001 Started deploy [kartotherian/deploy@26a8bbd] (stretch): Updating maps1001 to reflect latest changes [18:21:24] (03PS1) 10Paladox: Add missing /r/ to /static/wikimedia-codereview-logo.cache.svg [puppet] - 10https://gerrit.wikimedia.org/r/486340 [18:21:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:48] (03PS2) 10Paladox: Add missing /r/ to /static/wikimedia-codereview-logo.cache.svg [puppet] - 10https://gerrit.wikimedia.org/r/486340 [18:21:53] mutante ^^ [18:22:48] !log onimisionipe@deploy1001 Finished deploy [kartotherian/deploy@26a8bbd] (stretch): Updating maps1001 to reflect latest changes (duration: 01m 24s) [18:22:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:57] (03PS3) 10Dzahn: gerrit: Add missing /r/ to /static/wikimedia-codereview-logo.cache.svg [puppet] - 10https://gerrit.wikimedia.org/r/486340 (owner: 10Paladox) [18:23:38] (03CR) 10Dzahn: [C: 03+2] "ACK - https://gerrit.wikimedia.org/r/static/wikimedia-codereview-logo.cache.svg" [puppet] - 10https://gerrit.wikimedia.org/r/486340 (owner: 10Paladox) [18:24:22] 10Operations, 10ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T212990 (10Cmjohnson) Replaced the failed disk [18:24:55] paladox: deployed [18:24:55] thanks! [18:25:05] works! 
[18:25:15] ok :) [18:25:41] people who opted into new gerrit UI will see design changes now a new theme [18:25:56] default is still gwtui [18:26:11] the theme is made to kind of match MW Timeless [18:26:22] * Hauskatze still using the old one until creating and editting patches with polygerrit becomes a reality [18:26:28] 10Operations, 10serviceops, 10User-Joe: SRE FY2019 Q3 goal: Ramp-up serving traffic to PHP 7 - https://phabricator.wikimedia.org/T212828 (10Jdforrester-WMF) [18:26:33] 10Operations, 10serviceops, 10Core Platform Team (PHP7 (TEC4)), 10Core Platform Team Backlog (Watching / External), and 4 others: Set up a beta feature offering the use of PHP7 - https://phabricator.wikimedia.org/T213934 (10Jdforrester-WMF) 05Open→03Resolved a:03Joe [18:26:40] yup [18:26:52] * paladox recently fixed a bug in pg inline editor [18:26:59] recently as in today [18:27:10] will it be shipped in 2.16.4? [18:27:17] yup [18:27:21] which was released today [18:27:24] *claps* [18:27:46] we need to shepherd some devs into updating gerrit :) [18:27:46] 10Operations, 10serviceops, 10User-Joe: SRE FY2019 Q3 goal: Ramp-up serving traffic to PHP 7 - https://phabricator.wikimedia.org/T212828 (10Jdforrester-WMF) [18:27:49] (03PS12) 10Gehel: Add allocator metrics export for Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/485135 (https://phabricator.wikimedia.org/T213372) (owner: 10Smalyshev) [18:29:16] (03CR) 10Gehel: [C: 03+2] Add allocator metrics export for Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/485135 (https://phabricator.wikimedia.org/T213372) (owner: 10Smalyshev) [18:30:40] Hauskatze: no worries, it's on a roadmap [18:31:11] mutante: did that puppet patch finally worked? [18:31:32] the one that you linked to me a couple of days ago about mw2150 [18:32:15] Hauskatze: oh, right. 
yes that "worked" as in "something got applied" instead of nothing happening [18:32:24] the issue was it showed up twice in site.pp [18:32:52] now it got applied but there is an unrelated puppet error [18:33:07] oh, puppet... [18:33:26] well, it's missing steps to add certs, can't directly blame it in this case [18:33:43] i need to fix it still but it's not urgent [18:33:47] Ciao Daimona [18:34:01] mutante: something certcentral could be useful for? [18:34:21] (03CR) 10Volans: "LGTM, I've also run a compiler (missed the one already linked), results here:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/486306 (owner: 10Jbond) [18:34:25] (03CR) 10Volans: [C: 03+1] Add apt pinning for buster [puppet] - 10https://gerrit.wikimedia.org/r/486306 (owner: 10Jbond) [18:34:49] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review: Broken disk on analytics1056 - https://phabricator.wikimedia.org/T214057 (10Cmjohnson) disk is replaced but shows as unconfigured (good) [18:35:03] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: Degraded RAID on analytics1054 - https://phabricator.wikimedia.org/T213038 (10Cmjohnson) disk is replaced but shows as unconfigured (good) [18:36:26] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10Cmjohnson) the ssd is on-site, it's /dev/sda...the disk will need to be failed before I can replace. This server may need a reinstall if /dev/sda does not r... [18:36:58] Hola :) [18:37:14] !log pooling maps1003 - stretch migration is complete. 
T198622 [18:37:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:18] T198622: migrate maps servers to stretch with the current style - https://phabricator.wikimedia.org/T198622 [18:38:23] Hauskatze: no, they are for internal use, not for external clients [18:38:33] got it [18:49:13] !log cp4026: T214529: apt-get install'ing edac-utils with new deps libedac1 libsysfs2 [18:49:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:49:19] T214529: EDAC events not being reported by node-exporter? - https://phabricator.wikimedia.org/T214529 [18:49:59] PROBLEM - configured eth on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [18:50:07] PROBLEM - DPKG on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [18:50:19] PROBLEM - Disk space on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [18:50:25] PROBLEM - MD RAID on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [18:50:29] PROBLEM - puppet last run on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [18:50:43] PROBLEM - Check systemd state on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [18:50:57] PROBLEM - dhclient process on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [18:51:14] sigh.. it's notebook.. 
happens [18:51:32] users doing things that uses all the memory [18:51:40] existing ticket [18:52:14] yep [18:52:27] RECOVERY - configured eth on notebook1003 is OK: OK - interfaces up [18:52:35] RECOVERY - DPKG on notebook1003 is OK: All packages OK [18:52:47] !log notebook1002: restarted nagios-nrpe-server due to oom [18:52:47] RECOVERY - Disk space on notebook1003 is OK: DISK OK [18:52:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:52:53] RECOVERY - MD RAID on notebook1003 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [18:52:54] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Analytics query access for search platform NLP contractor - https://phabricator.wikimedia.org/T214623 (10EBernhardson) p:05Triage→03Normal [18:53:11] RECOVERY - Check systemd state on notebook1003 is OK: OK - running: The system is fully operational [18:53:15] !log notebook1003 - restarted nagios-nrpe-server... T212824 [18:53:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:53:18] T212824: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 [18:53:25] RECOVERY - dhclient process on notebook1003 is OK: PROCS OK: 0 processes with command name dhclient [18:53:54] !log arlolra@deploy1001 Started deploy [parsoid/deploy@f2384f0]: Updating Parsoid to f1d717f [18:53:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:55:37] chaomodus: let's keep logging to that ticket when it happens [18:55:43] RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 9 minutes ago with 0 failures [18:55:44] to gather some data points [18:55:45] ah good point [18:55:49] will do [18:55:52] :) [18:57:55] jouncebot: next [18:57:56] In 0 hour(s) and 2 minute(s): Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190124T1900) [18:58:00] Bah. 
[18:58:45] OK, I'll SWAT then [18:58:54] raynor: fwiw I have no problem with doing the Proton handover first and figuring out the ops tasks afterwards - they should be deployment blockers IMO but not handover blockers [18:59:14] 10Operations, 10monitoring: EDAC events not being reported by node-exporter? - https://phabricator.wikimedia.org/T214529 (10CDanis) `cdanis@cp4026.ulsfo.wmnet ~ % edac-util -v mc0: 0 Uncorrected Errors with no DIMM info mc0: 0 Corrected Errors with no DIMM info mc0: csrow0: 0 Uncorrected Errors mc0: csrow0:... [18:59:14] jouncebot: refresh [18:59:16] I refreshed my knowledge about deployments. [18:59:35] I said that on the handover phab task a few times but I think people misunderstood it anyways [19:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Morning SWAT (Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190124T1900). [19:00:04] Daimona, bmansurov, stephanebisson, and James_F: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:11] here [19:00:12] Hi [19:00:18] bmansurov: I'll do yours first, if that's OK, as it's Beta-Only [19:00:22] (03CR) 10Jforrester: [C: 03+2] Labs: set wgWMECitationUsagePageLoadPopulationSize at 33.3% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486329 (https://phabricator.wikimedia.org/T213969) (owner: 10Bmansurov) [19:00:29] James_F: thanks [19:00:57] Hauskatze anyways im working on bringing zuul status view onto changes in PolyGerrit. [19:01:00] o/ [19:01:05] stephanebisson: I assume you need the wmf.14 bit for viwiki before the config change? :-) [19:01:20] paladox: how's so? 
[19:01:23] (03CR) 10Volans: [C: 03+2] sre.host: add Icinga downtime cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/484432 (https://phabricator.wikimedia.org/T205886) (owner: 10Volans) [19:01:40] (03PS2) 10Volans: spicerack: expose the icinga_master_host property [software/spicerack] - 10https://gerrit.wikimedia.org/r/486135 [19:01:44] Hauskatze polygerrit has a ci hook that we can integrate with to bring in a live view of zuul status page. [19:01:46] no need to go to integration.wm... [19:01:48] James_F: Not really because it won't come into effect until wmf.14 is on group2 [19:01:55] upstream are already using it for some other ci. [19:01:59] paladox: looks good [19:02:17] stephanebisson: Ah, OK. Should I just sling out your config patch now, then? Or should we wait? [19:02:37] !log T214529: cdanis@cp4026.ulsfo.wmnet ~ % sudo apt-get --purge remove edac-utils libsysfs2 libedac1 [19:02:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:02:40] T214529: EDAC events not being reported by node-exporter? - https://phabricator.wikimedia.org/T214529 [19:03:00] (03Merged) 10jenkins-bot: sre.host: add Icinga downtime cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/484432 (https://phabricator.wikimedia.org/T205886) (owner: 10Volans) [19:03:06] James_F: the wmf.14 (GrowthExperiment) patch can go anytime. The config change I would like to test. [19:03:07] 10Operations, 10Proton, 10Security-Team, 10Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q3), 10Reading-Infrastructure-Team-Backlog (Kanban): [2 hrs] Decide on handling system updates for Proton - https://phabricator.wikimedia.org/T213366 (10MoritzMuehlenhoff) >>! In T213366#4906157, @Tgr wrot... [19:03:19] * James_F nods. [19:03:30] Will wait for the wmf.14 patches then. 
[19:03:35] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@f2384f0]: Updating Parsoid to f1d717f (duration: 09m 41s) [19:03:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:04:07] (03CR) 10Volans: tests: test also with Python 3.7 (031 comment) [software/cumin] - 10https://gerrit.wikimedia.org/r/481914 (owner: 10Volans) [19:04:42] (03PS2) 10Jforrester: Labs: set wgWMECitationUsagePageLoadPopulationSize at 33.3% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486329 (https://phabricator.wikimedia.org/T213969) (owner: 10Bmansurov) [19:04:55] (03CR) 10Jforrester: [C: 03+2] "…" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486329 (https://phabricator.wikimedia.org/T213969) (owner: 10Bmansurov) [19:05:37] (03CR) 10Volans: [C: 03+2] tests: fix Pytest RemovedInPytest4Warning [software/cumin] - 10https://gerrit.wikimedia.org/r/481913 (owner: 10Volans) [19:05:53] PROBLEM - Nginx local proxy to apache on mw1270 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.008 second response time [19:06:04] (03Merged) 10jenkins-bot: Labs: set wgWMECitationUsagePageLoadPopulationSize at 33.3% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486329 (https://phabricator.wikimedia.org/T213969) (owner: 10Bmansurov) [19:06:05] PROBLEM - Apache HTTP on mw1270 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1309 bytes in 0.006 second response time [19:06:09] PROBLEM - HHVM rendering on mw1270 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.962 second response time [19:06:59] RECOVERY - exim queue on mx2001 is OK: OK: Less than 1000 mails in exim queue. 
[19:07:07] RECOVERY - Nginx local proxy to apache on mw1270 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.039 second response time [19:07:17] RECOVERY - Apache HTTP on mw1270 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.042 second response time [19:07:23] RECOVERY - HHVM rendering on mw1270 is OK: HTTP OK: HTTP/1.1 200 OK - 76851 bytes in 0.506 second response time [19:09:26] * James_F twiddles thumbs waiting for CI. [19:09:53] (03CR) 10Volans: [C: 03+2] spicerack: expose the icinga_master_host property [software/spicerack] - 10https://gerrit.wikimedia.org/r/486135 (owner: 10Volans) [19:10:03] bmansurov: Your patch is landed and should be on Beta in a few minutes' time, sorry for not saying explicitly. [19:10:12] (03CR) 10jenkins-bot: Labs: set wgWMECitationUsagePageLoadPopulationSize at 33.3% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486329 (https://phabricator.wikimedia.org/T213969) (owner: 10Bmansurov) [19:10:29] There goes the Beta sync process. [19:11:56] (03CR) 10CRusnov: "This little bit of documentation is complicated! Anyways, just some minor word changes." 
(034 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/484330 (owner: 10Volans) [19:12:03] (03CR) 10Dzahn: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/486150 (owner: 10Dzahn) [19:12:16] !log Convert dbstore1002 staging.organic_link from Aria to InnoDB - T213706 [19:12:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:12:19] T213706: Convert Aria/Tokudb tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 [19:13:43] (03Merged) 10jenkins-bot: tests: fix Pytest RemovedInPytest4Warning [software/cumin] - 10https://gerrit.wikimedia.org/r/481913 (owner: 10Volans) [19:14:58] (03CR) 10jenkins-bot: tests: fix Pytest RemovedInPytest4Warning [software/cumin] - 10https://gerrit.wikimedia.org/r/481913 (owner: 10Volans) [19:15:35] (03Merged) 10jenkins-bot: spicerack: expose the icinga_master_host property [software/spicerack] - 10https://gerrit.wikimedia.org/r/486135 (owner: 10Volans) [19:16:14] Finally. [19:17:03] (03CR) 10jenkins-bot: spicerack: expose the icinga_master_host property [software/spicerack] - 10https://gerrit.wikimedia.org/r/486135 (owner: 10Volans) [19:17:09] stephanebisson: The wmf.14 change is live on mwdebug1002. [19:17:25] James_F: will you take care of the blockAttacker cherry-pick? [19:17:34] Hauskatze: Yes. [19:17:41] James_F: appreciated, thanks. [19:17:41] James_F: you can sync [19:17:49] stephanebisson: Kk, syncing. [19:18:45] Daimona: AbuseFilter change live on mwdebug1002. 
[19:18:54] (03PS6) 10Volans: documentation: fine-tune generated documentation [software/spicerack] - 10https://gerrit.wikimedia.org/r/484330 [19:19:00] James_F: checking [19:19:07] Although it should only have effect on logstash logs [19:19:22] !log jforrester@deploy1001 Synchronized php-1.33.0-wmf.14/extensions/GrowthExperiments/GrowthExperiments.alias.php: SWAT T213356 Add Special:WelcomeSurvey Vietnamese alias (duration: 00m 54s) [19:19:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:19:26] T213356: Personalized first day: activate for Vietnamese Wikipedia - https://phabricator.wikimedia.org/T213356 [19:19:32] (03CR) 10Volans: "Changes made. I'm not the author of the elastic docstrings so I don't want to change them too much risking to alter the meaning ;)" (033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/484330 (owner: 10Volans) [19:19:45] Daimona: Yeah, just check it doesn't PHP fatal/etc. :-) [19:19:48] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Analytics query access for search platform NLP contractor - https://phabricator.wikimedia.org/T214623 (10Nuria) We also need to have an NDA on file that I am guessing will be signed after contract is, @TJones has the NDA signing taken pl... [19:19:51] (03PS7) 10Volans: documentation: fine-tune generated documentation [software/spicerack] - 10https://gerrit.wikimedia.org/r/484330 [19:19:55] What I'm doing :) [19:20:11] (03PS3) 10Jforrester: Enable WelcomeSurvey experiment 2 on viwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486335 (https://phabricator.wikimedia.org/T213356) (owner: 10Sbisson) [19:20:24] (03CR) 10Jforrester: [C: 03+2] Enable WelcomeSurvey experiment 2 on viwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486335 (https://phabricator.wikimedia.org/T213356) (owner: 10Sbisson) [19:21:04] tgr, thanks for clarification. I think that the most important bit is to have a good mutual understanding Proton. 
[19:21:15] James_F tested a bit, nothing strange [19:21:24] OK, let's see if we break everything. ;-) [19:21:46] (03PS1) 10Bstorm: toolforge: change dev_environ into shell_environ [puppet] - 10https://gerrit.wikimedia.org/r/486350 (https://phabricator.wikimedia.org/T213965) [19:22:36] tgr: I'm happy to help you with everything, the biggest problem is that this project took ~2yr, and I don't recall all things. It's difficult to write a good documentation, all things are in my mind but they don't come up immediately. If something is unclear please ask, I'll keep improving the Proton docs [19:23:00] !log jforrester@deploy1001 Synchronized php-1.33.0-wmf.14/extensions/AbuseFilter/includes/AbuseFilter.php: SWAT AbuseFilter Optionally pass the filter ID to checkConditions for error reporting I8510319c (duration: 00m 53s) [19:23:17] Daimona: OK, we look to be done. [19:23:21] !log delete 5076 tickets from OTRS with customerID MAILER-DAEMON@ubuntu.member.linode.com T214604 [19:23:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:23:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:23:32] T214604: OTRS receiving flood of emails into info-en-c - https://phabricator.wikimedia.org/T214604 [19:23:35] James_F Yes everything seems fine on logstash and logs are now updated with the ID [19:23:45] Excellent. [19:23:45] (03PS2) 10Bstorm: toolforge: change dev_environ into shell_environ [puppet] - 10https://gerrit.wikimedia.org/r/486350 (https://phabricator.wikimedia.org/T213965) [19:23:57] Many thanks :) [19:24:01] stephanebisson: Waiting for the config to merge for you. [19:24:19] tgr: the biggest struggle to me (atm) is that most of the questions are related to the service infrastructure/services in general. Marko set both production and beta Proton instances and I'm not aware of all things he did to keep it running. I'm afraid that from now, I'll keep redirecting you to services more often. 
[19:24:21] PROBLEM - puppet last run on notebook1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[cdh::hadoop::directory /user/spark] [19:24:23] (03PS3) 10Bstorm: toolforge: change dev_environ into shell_environ [puppet] - 10https://gerrit.wikimedia.org/r/486350 (https://phabricator.wikimedia.org/T213965) [19:24:47] (03CR) 10CRusnov: [C: 03+1] "Looks good!" (033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/484330 (owner: 10Volans) [19:25:13] !log Updated Parsoid to f1d717f (T187958, T205337, T214103) [19:25:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:25:22] T214103: Instrument Parsoid language variant conversions - https://phabricator.wikimedia.org/T214103 [19:25:26] raynor: thanks! yeah, I think the documentation is in a good shape, the open questions are about WMF infrastructure more than Proton itself [19:25:28] T205337: Extract and use a token transformation interface (API) in place of custom token handlers - https://phabricator.wikimedia.org/T205337 [19:25:28] T187958: Parsoid and PHP parser parse differently - https://phabricator.wikimedia.org/T187958 [19:26:34] !log jforrester@deploy1001 Synchronized php-1.33.0-wmf.14/extensions/WikimediaMessages/i18n/wikimedia/en.json: SWAT T208097 WikimediaMessages: Add message for BlockAttacker password policy (duration: 00m 50s) [19:26:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:27:27] (Not that a basic sync of the file will do anything, of course.) [19:27:29] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Analytics query access for search platform NLP contractor - https://phabricator.wikimedia.org/T214623 (10TJones) >>! In T214623#4906847, @Nuria wrote: > We also need to have an NDA on file that I am guessing will be signed after contract... [19:27:48] (03CR) 10Gergő Tisza: [C: 04-2] "Blocked on T211621, apparently." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/479571 (https://phabricator.wikimedia.org/T211622) (owner: 10Jforrester) [19:29:35] 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-Cache, 10Language-Team (Language-2019-January-March), and 5 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10Etonkovidova) Thanks, @eluke... [19:30:06] (03Merged) 10jenkins-bot: Enable WelcomeSurvey experiment 2 on viwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486335 (https://phabricator.wikimedia.org/T213356) (owner: 10Sbisson) [19:31:06] 10Operations, 10Continuous-Integration-Config: jenkins-bot puppet-compiler-test may report SUCCESS though compiling failed - https://phabricator.wikimedia.org/T214629 (10Dzahn) [19:32:06] !log jforrester@deploy1001 Synchronized php-1.33.0-wmf.14/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: SWAT T213885 Don't add mw:mediainfoView on File pages with no captions either (duration: 00m 51s) [19:32:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:32:09] T213885: Multilingual Captions: Content isn't clear:both'ed on Files with no captions yet, because it's still wrapped in a mw:mediainfoView tags - https://phabricator.wikimedia.org/T213885 [19:32:19] RECOVERY - exim queue on mx1001 is OK: OK: Less than 1000 mails in exim queue. [19:32:48] !log delete 5076 tickets from OTRS with customerID Mailer-Daemon@wizengo.ds.planet-work.net T214604 [19:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:32:51] T214604: OTRS receiving flood of emails into info-en-c - https://phabricator.wikimedia.org/T214604 [19:33:02] stephanebisson: OK, config patch live on mwdebug1002. Please test. 
[19:33:09] !log delete 8505 tickets from OTRS with customerID Mailer-Daemon@wizengo.ds.planet-work.net T214604 - correction [19:33:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:12] James_F: testing... [19:34:17] James_F: works as expected, you can sync [19:34:25] Excellent. [19:35:37] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT T213356 Enable WelcomeSurvey experiment 2 on viwiki (duration: 00m 53s) [19:35:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:40] T213356: Personalized first day: activate for Vietnamese Wikipedia - https://phabricator.wikimedia.org/T213356 [19:35:44] stephanebisson: All done? [19:35:53] 10Operations, 10Proton, 10Security-Team, 10Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q3), 10Reading-Infrastructure-Team-Backlog (Kanban): [2 hrs] Decide on handling system updates for Proton - https://phabricator.wikimedia.org/T213366 (10Tgr) @MoritzMuehlenhoff thanks, that's good to know.... [19:36:01] James_F: yep, thanks! [19:36:11] 10Operations, 10Continuous-Integration-Config: jenkins-bot puppet-compiler-test may report SUCCESS though compiling failed - https://phabricator.wikimedia.org/T214629 (10Dzahn) p:05Triage→03Low prio isn't high since this is clearly labelled as experimental and the feature itself is really cool to have :) t... [19:36:11] Excellent. [19:36:13] OK, Swat's done. [19:36:48] (03CR) 10jenkins-bot: Enable WelcomeSurvey experiment 2 on viwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486335 (https://phabricator.wikimedia.org/T213356) (owner: 10Sbisson) [19:37:05] !log jforrester@deploy1001 Started scap: Post-SWAT full sync for new i18n for T208097 [19:37:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:38:43] volans: "check experimental" is cool but don't always trust it yet without clicking through to compiler output .. 
https://phabricator.wikimedia.org/T214629 [19:39:05] for now i will do the "manual" compile [19:39:10] mutante: yeah saw your comment [19:39:37] i just made a ticket for it and called it low prio.. maybe it is slightly higher.. normal [19:39:43] (03CR) 10Volans: [C: 03+2] documentation: fine-tune generated documentation [software/spicerack] - 10https://gerrit.wikimedia.org/r/484330 (owner: 10Volans) [19:39:57] but i can't say High when something is called experimental :) [19:40:27] James_F: I thought it was l10nupdate and not scap what we had to run :) [19:40:45] Hauskatze: Nope. [19:41:36] 10Operations, 10SRE-Access-Requests, 10WMF-NDA-Requests, 10Discovery-Search (Current work): Volunteer / Collaborator NDA - https://phabricator.wikimedia.org/T214630 (10Nuria) p:05Triage→03Normal [19:42:14] * James_F waits for the i18n build step. [19:44:06] (03PS14) 10Jbond: Add apt pinning for buster [puppet] - 10https://gerrit.wikimedia.org/r/486306 [19:44:40] (03CR) 10jerkins-bot: [V: 04-1] Add apt pinning for buster [puppet] - 10https://gerrit.wikimedia.org/r/486306 (owner: 10Jbond) [19:45:07] (03Merged) 10jenkins-bot: documentation: fine-tune generated documentation [software/spicerack] - 10https://gerrit.wikimedia.org/r/484330 (owner: 10Volans) [19:45:22] mutante: ack, thanks [19:45:54] (03PS3) 10Dzahn: phabricator: firewall hole to allow http from deployment hosts [puppet] - 10https://gerrit.wikimedia.org/r/486181 [19:46:03] (03CR) 10jenkins-bot: documentation: fine-tune generated documentation [software/spicerack] - 10https://gerrit.wikimedia.org/r/484330 (owner: 10Volans) [19:47:30] (03CR) 10Dzahn: [C: 03+2] phabricator: firewall hole to allow http from deployment hosts [puppet] - 10https://gerrit.wikimedia.org/r/486181 (owner: 10Dzahn) [19:48:35] (03PS15) 10Jbond: Add apt pinning for buster [puppet] - 10https://gerrit.wikimedia.org/r/486306 [19:49:30] (03PS16) 10Jbond: Add apt pinning for buster [puppet] - 10https://gerrit.wikimedia.org/r/486306 [19:51:02] (03PS1) 
10Andrew Bogott: prometheus: update for new eqiad1 service hosts [puppet] - 10https://gerrit.wikimedia.org/r/486353 [19:51:45] (03PS2) 10Andrew Bogott: prometheus: update for new eqiad1 service hosts [puppet] - 10https://gerrit.wikimedia.org/r/486353 [19:52:00] 10Operations, 10SRE-Access-Requests, 10WMF-NDA-Requests, 10Discovery-Search (Current work): Volunteer / Collaborator NDA for @Julia.glen - https://phabricator.wikimedia.org/T214630 (10Framawiki) [19:52:13] 10Operations, 10SRE-Access-Requests, 10WMF-NDA-Requests, 10Discovery-Search (Current work): Volunteer / Collaborator NDA for @Julia.glen - https://phabricator.wikimedia.org/T214630 (10TJones) I support this request. @Julia.glen needs access to our search logs to design and build algorithms to improve the q... [19:57:52] (03PS1) 10Gehel: wdqs: add icinga check of free blazegraph allocators [puppet] - 10https://gerrit.wikimedia.org/r/486356 (https://phabricator.wikimedia.org/T213372) [19:59:11] (03CR) 10Smalyshev: [C: 03+1] wdqs: add icinga check of free blazegraph allocators [puppet] - 10https://gerrit.wikimedia.org/r/486356 (https://phabricator.wikimedia.org/T213372) (owner: 10Gehel) [19:59:31] (03PS2) 10Gehel: wdqs: add icinga check of free blazegraph allocators [puppet] - 10https://gerrit.wikimedia.org/r/486356 (https://phabricator.wikimedia.org/T213372) [19:59:39] !log temp disabled puppet on phab1001 , applying ferm change to allow deployment servers to http to phab servers [19:59:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:59:42] (03CR) 10Framawiki: [C: 04-1] "Colon removing will also be needed in commit message." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486221 (https://phabricator.wikimedia.org/T214553) (owner: 10Ammarpad) [20:00:05] twentyafterfour: Dear deployers, time to do the MediaWiki train - Americas version deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190124T2000). 
[20:00:18] twentyafterfour: Sorry, full scap is 96% done. [20:00:35] (03CR) 10Gehel: [C: 03+2] wdqs: add icinga check of free blazegraph allocators [puppet] - 10https://gerrit.wikimedia.org/r/486356 (https://phabricator.wikimedia.org/T213372) (owner: 10Gehel) [20:02:46] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 3 others: ForeignAPIRepo wrongly returns non-protocol-relative URLs for original "thumbs" - https://phabricator.wikimedia.org/T50133 (10Tacsipacsi) [20:03:09] PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 55.98, 24.31, 15.27 [20:03:19] PROBLEM - Apache HTTP on mw1246 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.002 second response time [20:03:29] James_F: no problem [20:03:43] PROBLEM - Nginx local proxy to apache on mw1346 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.008 second response time [20:03:48] er, maybe one problem ^ [20:03:57] Argh. [20:04:09] PROBLEM - Apache HTTP on mw1346 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.002 second response time [20:04:12] It's currently running scap-cdb-rebuild which is expensive. [20:04:20] 10Operations, 10SRE-Access-Requests, 10WMF-NDA-Requests, 10Discovery-Search (Current work): Volunteer / Collaborator NDA for @Julia.glen - https://phabricator.wikimedia.org/T214630 (10Aklapper) Hmm, was the #wmf-nda-requests tag set intentionally (see its description)? 
[20:04:33] RECOVERY - Apache HTTP on mw1246 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.034 second response time [20:04:57] RECOVERY - Nginx local proxy to apache on mw1346 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.071 second response time [20:05:23] RECOVERY - Apache HTTP on mw1346 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.032 second response time [20:05:39] RECOVERY - High CPU load on API appserver on mw1227 is OK: OK - load average: 20.28, 22.60, 16.10 [20:05:45] hmm [20:05:57] Everything recovered? [20:06:17] If an expensive user request came into a box at the same time as the rebuild I suppose that'd have slowed things down? [20:06:27] But user requests ideally shouldn't be expensive. [20:07:23] It's been stuck on scap-cdb-rebuild with four machines left for a couple of minutes now. [20:08:49] Oh, just dropped to three left, presumably it's not stuck, just very slow for some machines. [20:08:51] James_F: looks like just 60 second timeouts [20:09:17] Yeah. *sighs* [20:09:38] PROBLEM - Apache HTTP on mw1323 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time [20:10:01] (03CR) 10Dzahn: "So this worked partially. You can now use apache-fast-test on phab1001 from deploy1001, confirmed. Though it does not mean it can be used " [puppet] - 10https://gerrit.wikimedia.org/r/486181 (owner: 10Dzahn) [20:10:38] RECOVERY - Apache HTTP on mw1323 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.041 second response time [20:10:46] twentyafterfour: Last time I needed to do this it took 17 minutes total and was smooth as butter. [20:10:59] !log jforrester@deploy1001 Finished scap: Post-SWAT full sync for new i18n for T208097 (duration: 33m 54s) [20:11:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:02] twentyafterfour: Naturally, as soon as I'm on a deadline it's slow as molasses. [20:11:06] OK, done. Finally. 
[20:11:21] James_F: not your fault ;) [20:11:23] And timeouts don't seem to be growing. [20:11:35] OK, I'm handing over the conch. :-) [20:11:41] no, I wonder if an hhvm restart is in order [20:11:51] James_F: thanks, I'll watch it [20:11:53] Maybe. [20:11:55] Ta! [20:13:39] akosiaris, we seem to be getting more [20:15:31] something is up with mw1323 I think [20:16:19] e.g. 2019012410179575 [20:16:21] lower rate now though [20:16:23] and basically all of this: https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketQueue;QueueID=58;Filter=Unlocked;View=Preview;SortBy=Age;OrderBy=Up;StartHit=101 [20:17:09] 10Operations, 10Traffic, 10netops: Connection problem (Moscow ISP, 4G) with Beeline / Sovintel - https://phabricator.wikimedia.org/T214459 (10ayounsi) >>! In T214459#4904249, @Iluvatar wrote: > See https://disk.yandex.by/d/Scrhfdy0BBYdAQ Thank you, it's very curious. Could you try `curl http://www.google.c... [20:19:41] 10Operations, 10SRE-Access-Requests, 10WMF-NDA-Requests, 10Discovery-Search (Current work): Volunteer / Collaborator NDA for @Julia.glen - https://phabricator.wikimedia.org/T214630 (10TJones) [20:21:20] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Volunteer / Collaborator NDA for @Julia.glen - https://phabricator.wikimedia.org/T214630 (10TJones) I think Andre is right and we aren't after Phab NDA access, so I've removed the tag. #sre-access-requests seems to be the right tag. 
[20:25:33] (03PS1) 10Dzahn: phabricator: allow http from deployment hosts on stand-by servers [puppet] - 10https://gerrit.wikimedia.org/r/486368 [20:26:22] (03CR) 10jerkins-bot: [V: 04-1] phabricator: allow http from deployment hosts on stand-by servers [puppet] - 10https://gerrit.wikimedia.org/r/486368 (owner: 10Dzahn) [20:26:26] (03CR) 10Dzahn: "follow-up https://gerrit.wikimedia.org/r/c/operations/puppet/+/486368" [puppet] - 10https://gerrit.wikimedia.org/r/486181 (owner: 10Dzahn) [20:28:04] (03PS3) 10Andrew Bogott: prometheus: update for new eqiad1 service hosts [puppet] - 10https://gerrit.wikimedia.org/r/486353 [20:28:13] 10Operations, 10Traffic, 10netops: Connection problem (Moscow ISP, 4G) with Beeline / Sovintel - https://phabricator.wikimedia.org/T214459 (10Iluvatar) Problem solved. The user received a new answer to one of many his letters to Support: "we reloaded GPRS settings". Connection is restored. 1 month later...... [20:29:36] (03PS2) 10Dzahn: phabricator: allow http from deployment hosts on stand-by servers [puppet] - 10https://gerrit.wikimedia.org/r/486368 [20:29:56] (03PS4) 10Andrew Bogott: prometheus: update for new eqiad1 service hosts [puppet] - 10https://gerrit.wikimedia.org/r/486353 [20:30:29] (03PS3) 10Dzahn: phabricator: allow http from deployment hosts on stand-by servers [puppet] - 10https://gerrit.wikimedia.org/r/486368 [20:30:37] (03CR) 10Andrew Bogott: [C: 03+2] prometheus: update for new eqiad1 service hosts [puppet] - 10https://gerrit.wikimedia.org/r/486353 (owner: 10Andrew Bogott) [20:30:38] hmm I guess things stabilized, not sure what the load was about. 
mw1321 and mw1323 seem to be having a lot more timeouts than others and the cpu load on those servers seems a bit higher [20:31:08] (compared to some random other mw* servers [20:31:43] I'm just going to assume transient issues since things look stable [20:34:18] (03PS4) 10Dzahn: phabricator: allow http from deployment hosts on stand-by servers [puppet] - 10https://gerrit.wikimedia.org/r/486368 (https://phabricator.wikimedia.org/T190568) [20:34:56] (03PS1) 1020after4: all wikis to 1.33.0-wmf.14 refs T206668 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486370 [20:34:58] (03CR) 1020after4: [C: 03+2] all wikis to 1.33.0-wmf.14 refs T206668 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486370 (owner: 1020after4) [20:35:48] (03CR) 10Dzahn: [V: 03+1 C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/14475/" [puppet] - 10https://gerrit.wikimedia.org/r/486368 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn) [20:36:23] (03PS5) 10Dzahn: phabricator: allow http from deployment hosts on stand-by servers [puppet] - 10https://gerrit.wikimedia.org/r/486368 (https://phabricator.wikimedia.org/T190568) [20:39:22] (03CR) 10Dzahn: "no change on phab1001 (prod), added firewall rules on phab1002 and phab2001" [puppet] - 10https://gerrit.wikimedia.org/r/486368 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn) [20:40:23] (03Merged) 10jenkins-bot: all wikis to 1.33.0-wmf.14 refs T206668 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486370 (owner: 1020after4) [20:43:37] (03CR) 10jenkins-bot: all wikis to 1.33.0-wmf.14 refs T206668 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486370 (owner: 1020after4) [20:46:48] Before the train rolls out, I noticed a problem in AbuseFilter which should probably be train blocker [20:47:07] !log twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.14 refs T206668 [20:47:09] Dunno if rollout has already started, but I'm gonna send a fix ASAP, could 
someone please deploy it? [20:47:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:47:10] T206668: 1.33.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T206668 [20:47:39] Daimona: sure [20:47:39] Uh [20:47:46] Alright [20:47:50] The fix is simple, coming in a minute [20:47:57] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568 (10Dzahn) >>! In T190568#4903583, @mmodell wrote: > https://wikitech.wikimedia.org/wiki/Phabricator/Meeting_Notes/2019... [20:48:30] (03PS1) 10Andrew Bogott: prometheus: replace nova API endpoint for eqiad1 [puppet] - 10https://gerrit.wikimedia.org/r/486376 [20:49:39] 10Operations, 10Phabricator, 10serviceops, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568 (10Paladox) "* 429 Too Many Requests" someone will want to whitelist deploy1001 ip so it's not hit by... 
[20:49:43] 10Operations, 10Phabricator, 10serviceops, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568 (10Dzahn) [20:50:02] https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/486379/ for who wants to do the honours [20:50:13] Daimona: I'll deploy it [20:50:19] Noice [20:50:21] Thanks [20:50:46] (03CR) 10Andrew Bogott: [C: 03+2] prometheus: replace nova API endpoint for eqiad1 [puppet] - 10https://gerrit.wikimedia.org/r/486376 (owner: 10Andrew Bogott) [20:58:39] 10Operations, 10Phabricator, 10serviceops, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568 (10Dzahn) [21:00:40] 10Puppet: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (10jbond) [21:01:12] (03PS1) 10Andrew Bogott: Designate: allow monitoring access to the designate api over ipv6 [puppet] - 10https://gerrit.wikimedia.org/r/486381 [21:02:10] (03PS1) 10Framawiki: [tests] wgExtraNamespaces should not contain colons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) [21:02:27] (03CR) 10Andrew Bogott: [C: 03+2] Designate: allow monitoring access to the designate api over ipv6 [puppet] - 10https://gerrit.wikimedia.org/r/486381 (owner: 10Andrew Bogott) [21:03:13] (03CR) 10jerkins-bot: [V: 04-1] [tests] wgExtraNamespaces should not contain colons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) (owner: 10Framawiki) [21:03:29] 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019 (10Dzahn) there has been a meeting (https://wikitech.wikimedia.org/wiki/Phabricator/Meeting_Notes/2019-01-23) and further progress happened at T190568#4907166 which resolved on of the... 
[21:04:09] 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019 (10Dzahn) [21:05:09] 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019 (10Dzahn) I can't speak for the " - bios/drac/serial setup/testing" checkbox but the "add to operations puppet" and "make phabricator role work on stretch" part has been done. updated bo... [21:07:41] (03PS2) 10Framawiki: [tests] wgExtraNamespaces should not contain colons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) [21:07:47] 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019 (10Dzahn) @Robh See above fyi and i am not sure if i should have reopened the original hardware request as i did on T195623#4903960 or if this ticket here is the most appropriate place b... [21:08:41] 10Operations, 10wikitech.wikimedia.org: wikitech-static cert about to expire - https://phabricator.wikimedia.org/T214640 (10Andrew) [21:14:04] 10Operations, 10wikitech.wikimedia.org: wikitech-static cert about to expire - https://phabricator.wikimedia.org/T214640 (10Andrew) access to this host is just ssh root@wikitech-static.wikimedia.org with the password in pwstore. 
[21:14:20] PROBLEM - ElasticSearch shard size check on search.svc.eqiad.wmnet is CRITICAL: CRITICAL - commonswiki_content_1538078672(69gb) [21:16:29] twentyafterfour The patch is now merged and ready for deploy [21:16:39] !log twentyafterfour@deploy1001 Synchronized php-1.33.0-wmf.14/extensions/AbuseFilter/includes/Views/AbuseFilterView.php: sync I67ca475fa16ea449820f8c735531c2cc1b0ec975 refs T206668 (duration: 00m 47s) [21:16:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:16:43] T206668: 1.33.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T206668 [21:17:36] Daimona: deployed [21:17:47] Thanks :) [21:20:36] 10Operations, 10wikitech.wikimedia.org: wikitech-static cert about to expire - https://phabricator.wikimedia.org/T214640 (10Andrew) a:03Dzahn [21:25:58] (03CR) 10BryanDavis: "Some comments inline. All of them are things that could be done as followup refactors/cleanup. Per irc chat we could also rip out the OS v" (037 comments) [puppet] - 10https://gerrit.wikimedia.org/r/486350 (https://phabricator.wikimedia.org/T213965) (owner: 10Bstorm) [21:26:19] (03CR) 10Urbanecm: [tests] wgExtraNamespaces should not contain colons (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) (owner: 10Framawiki) [21:30:06] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Volunteer / Collaborator NDA for @Julia.glen - https://phabricator.wikimedia.org/T214630 (10EBernhardson) [21:30:09] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Analytics query access for search platform NLP contractor - https://phabricator.wikimedia.org/T214623 (10EBernhardson) [21:31:10] (03PS3) 10Framawiki: [tests] wgExtraNamespaces should not contain colons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) [21:31:12] (03CR) 10Framawiki: [tests] wgExtraNamespaces should not contain colons 
(031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) (owner: 10Framawiki) [21:32:29] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10Framawiki) [21:33:46] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Volunteer / Collaborator NDA for @Julia.glen - https://phabricator.wikimedia.org/T214630 (10EBernhardson) In that case we don't need two SRE-Access-Requests tickets, merging into the other one. [21:38:04] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10EBernhardson) @Julia.glen For full details see https://wikitech.wikimedia.org/wiki/Production_shell_access#New_users... [21:40:23] !log Finished MediaWiki train for 1.33.0-wmf.14 (T206668) - there is no train next week so I'll be back with wmf.16 (T206670) in two weeks. 
[21:40:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:40:29] T206670: 1.33.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T206670 [21:40:30] T206668: 1.33.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T206668 [21:42:53] (03CR) 10Bstorm: toolforge: change dev_environ into shell_environ (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/486350 (https://phabricator.wikimedia.org/T213965) (owner: 10Bstorm) [21:44:17] (03CR) 10Urbanecm: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) (owner: 10Framawiki) [21:44:49] 10Operations, 10Operations-Software-Development, 10User-Joe, 10User-jijiki: Convert automation scripts to spicerack cookbooks - https://phabricator.wikimedia.org/T203943 (10Volans) [21:44:52] 10Operations, 10Operations-Software-Development, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work): Create a cookbook to copy data between WDQS servers - https://phabricator.wikimedia.org/T213401 (10Volans) [21:45:29] 10Operations, 10ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T212990 (10Cmjohnson) [21:46:20] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10TJones) I support this request. @Julia.glen needs access to our search logs to design and build algorithms to improve th... 
[21:46:53] 10Operations, 10Operations-Software-Development, 10User-Joe, 10User-jijiki: Spicerack cookbooks TODO list - https://phabricator.wikimedia.org/T203943 (10Volans) [21:48:24] 10Operations, 10Operations-Software-Development, 10User-Joe, 10User-jijiki: Spicerack cookbooks TODO list - https://phabricator.wikimedia.org/T203943 (10Volans) [21:48:26] 10Operations, 10Operations-Software-Development, 10Kubernetes: Create Spicerack cook book to drain/reboot/uncordon a Kubernetes worker - https://phabricator.wikimedia.org/T212866 (10Volans) [21:48:33] (03PS4) 10Bstorm: toolforge: change dev_environ into shell_environ [puppet] - 10https://gerrit.wikimedia.org/r/486350 (https://phabricator.wikimedia.org/T213965) [21:49:20] RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [21:51:50] 10Operations, 10Operations-Software-Development, 10User-Joe, 10User-jijiki: Spicerack cookbooks TODO list - https://phabricator.wikimedia.org/T203943 (10Volans) [21:51:52] 10Operations, 10Discovery-Search (Current work), 10Epic: Migrate elasticsearch scripts to spicerack cookbooks - https://phabricator.wikimedia.org/T202885 (10Volans) [21:54:32] 10Operations, 10Wikidata, 10Wikimedia-production-error: DBQueryErrors when trying to create Wikidata Items - https://phabricator.wikimedia.org/T214644 (10abian) [21:57:36] (03PS5) 10Bstorm: toolforge: change dev_environ into shell_environ [puppet] - 10https://gerrit.wikimedia.org/r/486350 (https://phabricator.wikimedia.org/T213965) [21:58:54] (03CR) 10Bstorm: toolforge: change dev_environ into shell_environ (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/486350 (https://phabricator.wikimedia.org/T213965) (owner: 10Bstorm) [21:59:28] PROBLEM - HTTPS-wikitech-static on wikitech-static.wikimedia.org is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [22:04:41] that's me trying to fix the cert.. 
ugh [22:04:54] ACKing it and logging [22:05:11] eh, it's already back though [22:05:59] (03CR) 10BryanDavis: [C: 03+1] toolforge: change dev_environ into shell_environ [puppet] - 10https://gerrit.wikimedia.org/r/486350 (https://phabricator.wikimedia.org/T213965) (owner: 10Bstorm) [22:07:42] 10Operations, 10monitoring, 10User-fgiunchedi, 10cloud-services-team (Kanban): Deprecate Diamond collectors in Cloud VPS - https://phabricator.wikimedia.org/T210993 (10bd808) [22:08:01] 10Operations, 10Toolforge, 10monitoring, 10User-fgiunchedi, 10cloud-services-team (Kanban): Deprecate Diamond collectors in Tool Labs / Tool Forge - https://phabricator.wikimedia.org/T210991 (10bd808) [22:08:20] !log wikitech-static - certbot was already installed but it wasn't used to generate the existing certs so just running certbot renew did not work, attempted to use certbot to renew but apache plugin missing, installed python-certbot-apache (T214640) [22:08:21] 10Operations, 10Mail, 10OTRS: OTRS receiving flood of emails into info-en-c - https://phabricator.wikimedia.org/T214604 (10Peachey88) [22:08:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:08:26] T214640: wikitech-static cert about to expire - https://phabricator.wikimedia.org/T214640 [22:08:42] 10Operations, 10Puppet: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (10Peachey88) [22:11:55] !log wikitech-static attempted to use certbot with --authenticator webroot and --installer apache to make it properly work with certbot renew in the future. it created account in /etc/letsencrypt/ made backup in /root/; challenge fails though because all domains need to serve out of a webroot and there is status.wikimedia.org here as well. 
(T21640) [22:11:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:11:59] T21640: Categorymembers namespace filtering is inefficient, uses ugly hack in miser mode - https://phabricator.wikimedia.org/T21640 [22:13:12] ugh, wrong ticket [22:13:46] (03PS1) 10Papaul: DNS: Add mgmt DNS entries for cloudcontrol2001-dev and cloudvirt200[123]-dev [dns] - 10https://gerrit.wikimedia.org/r/486391 (https://phabricator.wikimedia.org/T214448) [22:16:51] (03PS2) 10Herron: rsyslog-shipper: enable omkafka action queue and retry [puppet] - 10https://gerrit.wikimedia.org/r/486169 (https://phabricator.wikimedia.org/T214176) [22:21:55] !log wikitech-static splitting apache2 config files into one file per vhost to make it possible for certbot to detect them [22:21:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:23:56] (03PS6) 10Bstorm: toolforge: change dev_environ into shell_environ [puppet] - 10https://gerrit.wikimedia.org/r/486350 (https://phabricator.wikimedia.org/T213965) [22:23:58] Perhaps one of you guys could take a look at this and get the traceback? https://phabricator.wikimedia.org/T214644 [22:25:25] (03CR) 10Bstorm: [C: 03+2] toolforge: change dev_environ into shell_environ [puppet] - 10https://gerrit.wikimedia.org/r/486350 (https://phabricator.wikimedia.org/T213965) (owner: 10Bstorm) [22:26:32] RECOVERY - HTTPS-wikitech-static on wikitech-static.wikimedia.org is OK: SSL OK - Certificate wikitech-static.wikimedia.org valid until 2019-04-24 20:59:28 +0000 (expires in 89 days) [22:27:40] andrewbogott: ^ i did a bunch of things.. 
i will write them down [22:27:55] but the TLDR is it will be much easier next time [22:27:59] or not happen at all [22:28:13] (03CR) 10Tim Starling: [C: 03+1] "OK to deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486176 (owner: 10Krinkle) [22:37:42] 10Operations, 10Patch-For-Review, 10User-Marostegui, 10User-fgiunchedi: Audit "misc" cluster hosts - https://phabricator.wikimedia.org/T210486 (10colewhite) a:05colewhite→03None [22:41:53] (03PS4) 10Daimona Eaytoy: Rename globals and rights in AbuseFilter config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/480074 [22:48:24] (03PS9) 10Daimona Eaytoy: Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772 [22:48:35] (03PS5) 10Daimona Eaytoy: Enable $wgAbuseFilterRuntimeProfile on every wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423945 (https://phabricator.wikimedia.org/T191039) [22:49:16] PROBLEM - puppet last run on notebook1003 is CRITICAL: CRITICAL: Puppet has 42 failures. Last run 3 minutes ago with 42 failures. 
Failed resources (up to 3 shown): Package[tzdata],Package[apport],Package[command-not-found],Package[command-not-found-data] [22:50:53] thank you mutante [22:50:54] (03CR) 10CRusnov: [C: 03+1] tests: test also with Python 3.7 (031 comment) [software/cumin] - 10https://gerrit.wikimedia.org/r/481914 (owner: 10Volans) [22:51:28] (03PS10) 10Daimona Eaytoy: Move all AbuseFilter config to abusefilter.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477063 (https://phabricator.wikimedia.org/T145931) [22:51:56] RECOVERY - Memory correctable errors -EDAC- on kafka1023 is OK: (C)4 ge (W)2 ge 1 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=kafka1023&var-datasource=eqiad+prometheus/ops [22:58:26] (03PS11) 10Cwhite: prometheus: upgrade to node-exporter 0.17 in backports [puppet] - 10https://gerrit.wikimedia.org/r/486192 (https://phabricator.wikimedia.org/T213708) [23:00:15] (03PS1) 10Alexandros Kosiaris: mathoid: Update prometheus-stats.conf [deployment-charts] - 10https://gerrit.wikimedia.org/r/486396 [23:01:07] (03CR) 10Alexandros Kosiaris: "Proceeded with some changes per filippo's recommendation in https://wikitech.wikimedia.org/wiki/Prometheus/statsd_k8s" [deployment-charts] - 10https://gerrit.wikimedia.org/r/486396 (owner: 10Alexandros Kosiaris) [23:01:29] (03PS2) 10Alexandros Kosiaris: mathoid: Update prometheus-stats.conf [deployment-charts] - 10https://gerrit.wikimedia.org/r/486396 [23:13:57] (03PS1) 10CRusnov: Nudge requirements to Django 2.1.5 [software/netbox] - 10https://gerrit.wikimedia.org/r/486399 [23:19:52] 10Operations, 10wikitech.wikimedia.org: wikitech-static cert about to expire - https://phabricator.wikimedia.org/T214640 (10Dzahn) So... [[ https://certbot.eff.org/ | certbot ]] was already installed on this system but it had not been used to create the existing certificates. This meant simply running `certbot... 
[23:19:54] (03PS12) 10Cwhite: prometheus: upgrade to node-exporter 0.17 in backports [puppet] - 10https://gerrit.wikimedia.org/r/486192 (https://phabricator.wikimedia.org/T213708) [23:22:41] (03PS1) 10BryanDavis: toolforge: install libboost-python-dev on grid nodes [puppet] - 10https://gerrit.wikimedia.org/r/486401 (https://phabricator.wikimedia.org/T213965) [23:23:04] (03CR) 10Cwhite: "https://puppet-compiler.wmflabs.org/compiler1002/14477/" [puppet] - 10https://gerrit.wikimedia.org/r/486192 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [23:23:40] 10Operations, 10wikitech.wikimedia.org: wikitech-static cert about to expire - https://phabricator.wikimedia.org/T214640 (10Dzahn) 05Open→03Resolved [23:28:06] last wikitech-static [23:29:50] 10Operations, 10wikitech.wikimedia.org: wikitech-static cert about to expire - https://phabricator.wikimedia.org/T214640 (10Dzahn) ` root@wikitech-static-ord:/# certbot certificates Saving debug log to /var/log/letsencrypt/letsencrypt.log -----------------------------------------------------------------------... [23:32:34] 10Operations, 10Traffic, 10netops: Connection problem (Moscow ISP, 4G) with Beeline / Sovintel - https://phabricator.wikimedia.org/T214459 (10ayounsi) 05Open→03Resolved a:03ayounsi Good to see a happy ending here. Thanks to everyone who helped. [23:34:22] hello! arlolra is going to do an out-of-band parsoid deploy to pre-emptively address any potential dirty diffs that we might see on VE edits to galleries (followup to an earlier deploy from today). greg-g approves :-) fyi in case anyone is wondering why there is a parsoid deploy happening now. 
[23:37:57] (03CR) 10Bstorm: [C: 03+2] toolforge: install libboost-python-dev on grid nodes [puppet] - 10https://gerrit.wikimedia.org/r/486401 (https://phabricator.wikimedia.org/T213965) (owner: 10BryanDavis) [23:42:30] !log arlolra@deploy1001 Started deploy [parsoid/deploy@f9ef630]: Updating Parsoid to 4772f44 [23:42:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:46:00] (03PS1) 10Bstorm: toolforge: upgrade the stretch grid to openjdk-11 [puppet] - 10https://gerrit.wikimedia.org/r/486404 [23:46:18] (03PS1) 10MaxSem: Set confirmed permissions after extensions are loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486405 (https://phabricator.wikimedia.org/T213003) [23:47:24] (03CR) 10jerkins-bot: [V: 04-1] Set confirmed permissions after extensions are loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486405 (https://phabricator.wikimedia.org/T213003) (owner: 10MaxSem) [23:48:19] (03PS2) 10MaxSem: Set confirmed permissions after extensions are loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486405 (https://phabricator.wikimedia.org/T213003) [23:52:52] (03CR) 10Bstorm: [C: 03+2] toolforge: upgrade the stretch grid to openjdk-11 [puppet] - 10https://gerrit.wikimedia.org/r/486404 (owner: 10Bstorm) [23:54:28] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@f9ef630]: Updating Parsoid to 4772f44 (duration: 11m 58s) [23:54:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:56:40] 10Operations, 10wikitech.wikimedia.org: wikitech-static cert about to expire - https://phabricator.wikimedia.org/T214640 (10Andrew) Thanks! It would be nice to add instructions about how to do this in the future to https://wikitech.wikimedia.org/wiki/Wikitech-static#How_do_we_maintain_it? 
[23:57:50] 10Operations, 10wikitech.wikimedia.org: wikitech-static cert about to expire - https://phabricator.wikimedia.org/T214640 (10Andrew) Oh, sorry I didn't read to the end -- looks like it automatically renews! So, never mind :)