[00:00:22] (CR) Dzahn: [C: 1] icinga-sms: use localhost as smtp server [puppet] - https://gerrit.wikimedia.org/r/429344 (https://phabricator.wikimedia.org/T82937) (owner: Herron)
[00:02:21] (PS1) Dzahn: icinga-sms: remove dry_run mode for single user SMS [puppet] - https://gerrit.wikimedia.org/r/434621 (https://phabricator.wikimedia.org/T82937)
[00:02:52] (PS2) Dzahn: icinga-sms: remove dry_run mode for single user SMS [puppet] - https://gerrit.wikimedia.org/r/434621 (https://phabricator.wikimedia.org/T82937)
[00:04:15] (PS3) Dzahn: icinga-sms: remove dry_run mode for single user SMS [puppet] - https://gerrit.wikimedia.org/r/434621 (https://phabricator.wikimedia.org/T82937)
[00:06:12] (CR) Dzahn: [C: 2] icinga-sms: remove dry_run mode for single user SMS [puppet] - https://gerrit.wikimedia.org/r/434621 (https://phabricator.wikimedia.org/T82937) (owner: Dzahn)
[00:43:09] PROBLEM - puppet last run on ms-be1038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:13:29] RECOVERY - puppet last run on ms-be1038 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[01:32:09] (CR) Aaron Schulz: profile::mediawiki::mcrouter_wancache: add ssl, proxy support (2 comments) [puppet] - https://gerrit.wikimedia.org/r/431737 (https://phabricator.wikimedia.org/T192370) (owner: Giuseppe Lavagetto)
[02:02:58] (CR) Aaron Schulz: profile::mediawiki::mcrouter_wancache: add ssl, proxy support (1 comment) [puppet] - https://gerrit.wikimedia.org/r/431737 (https://phabricator.wikimedia.org/T192370) (owner: Giuseppe Lavagetto)
[02:37:58] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1970 bytes in 0.147 second response time
[02:48:08] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1951 bytes in 0.110 second response time
[02:54:54] Operations, Availability (MediaWiki-MultiDC), Patch-For-Review, Performance-Team (Radar), User-Joe: mcrouter production architecture - https://phabricator.wikimedia.org/T192771#4223914 (aaron) If SET/DELETE go to all mc* servers in the wancache-(eqiad/codfw) pools (as mediawiki_wancache is co...
[02:58:54] !log l10nupdate@tin scap sync-l10n completed (1.32.0-wmf.4) (duration: 10m 54s)
[02:58:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:00:19] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1970 bytes in 0.088 second response time
[03:20:38] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1953 bytes in 0.112 second response time
[03:27:38] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 894.44 seconds
[03:51:39] ACKNOWLEDGEMENT - MD RAID on wasat is CRITICAL: CRITICAL: State: degraded, Active: 4, Working: 4, Failed: 2, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T195339
[03:51:44] Operations, ops-codfw: Degraded RAID on wasat - https://phabricator.wikimedia.org/T195339#4223921 (ops-monitoring-bot)
[03:57:21] !log l10nupdate@tin scap sync-l10n completed (1.32.0-wmf.5) (duration: 16m 39s)
[03:57:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:04:43] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed May 23 04:04:43 UTC 2018 (duration 7m 22s)
[04:04:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:23:18] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 215.90 seconds
[05:06:25] !log Deploy schema change on s3 primary master (db1075) - T191519 T188299 T190148
[05:06:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:06:32] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519
[05:06:32] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148
[05:06:32] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299
[05:12:05] !log Deploy schema change on s8 codfw primary master (db2045), this will generate lag on codfw - T194273
[05:12:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:12:10] T194273: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273
[05:21:48] !log Deploy schema change on dbstore1002:s8 - T191519 T188299 T190148 T194273 T194270
[05:21:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:21:57] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519
[05:21:57] T194270: Drop 'tmp1' index from wb_terms table in production - https://phabricator.wikimedia.org/T194270
[05:21:58] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148
[05:21:58] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299
[05:21:58] T194273: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273
[05:34:56] (PS1) Marostegui: db-eqiad.php: Depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/434624 (https://phabricator.wikimedia.org/T190148)
[05:36:15] (CR) jerkins-bot: [V: -1] db-eqiad.php: Depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/434624 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[05:37:23] (PS2) Marostegui: db-eqiad.php: Depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/434624 (https://phabricator.wikimedia.org/T190148)
[05:39:23] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/434624 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[05:40:45] (Merged) jenkins-bot: db-eqiad.php: Depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/434624 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[05:41:00] (CR) jenkins-bot: db-eqiad.php: Depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/434624 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[05:43:07] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1092 for alter table (duration: 01m 27s)
[05:43:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:43:44] Operations, Availability (MediaWiki-MultiDC), Patch-For-Review, Performance-Team (Radar), User-Joe: mcrouter production architecture - https://phabricator.wikimedia.org/T192771#4223990 (Joe) The reason of the hybrid proxy approach is that mcrouter is known to use a non-insignificant amount of...
[05:43:48] !log Deploy schema change on db1092 - T191519 T188299 T190148 T194273 T194270
[05:43:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:43:55] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519
[05:43:55] T194270: Drop 'tmp1' index from wb_terms table in production - https://phabricator.wikimedia.org/T194270
[05:43:55] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148
[05:43:55] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299
[05:43:55] T194273: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273
[05:47:49] Operations, Move-Files-To-Commons, TCB-Team, Wikimedia-Extension-setup, and 2 others: Deploying FileExporter and FileImporter - https://phabricator.wikimedia.org/T190716#4224000 (Joe) Looking through the tickets and the Readme of the extension, I think the default configurations are somewhat sane...
[06:05:39] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1968 bytes in 0.102 second response time
[06:18:06] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - https://gerrit.wikimedia.org/r/434629
[06:19:28] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - https://gerrit.wikimedia.org/r/434629 (owner: Marostegui)
[06:20:51] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - https://gerrit.wikimedia.org/r/434629 (owner: Marostegui)
[06:21:10] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - https://gerrit.wikimedia.org/r/434629 (owner: Marostegui)
[06:22:24] (PS1) Marostegui: db-eqiad.php: Depool db1104 [mediawiki-config] - https://gerrit.wikimedia.org/r/434631 (https://phabricator.wikimedia.org/T190148)
[06:22:27] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1092 after alter table (duration: 01m 20s)
[06:22:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:23:37] (CR) jerkins-bot: [V: -1] db-eqiad.php: Depool db1104 [mediawiki-config] - https://gerrit.wikimedia.org/r/434631 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[06:24:19] (PS2) Marostegui: db-eqiad.php: Depool db1104 [mediawiki-config] - https://gerrit.wikimedia.org/r/434631 (https://phabricator.wikimedia.org/T190148)
[06:36:02] (Abandoned) Marostegui: db-eqiad.php: Depool db1104 [mediawiki-config] - https://gerrit.wikimedia.org/r/434631 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[06:37:48] (PS1) Marostegui: db-eqiad.php: Depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/434632 (https://phabricator.wikimedia.org/T194273)
[06:39:37] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/434632 (https://phabricator.wikimedia.org/T194273) (owner: Marostegui)
[06:41:01] (Merged) jenkins-bot: db-eqiad.php: Depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/434632 (https://phabricator.wikimedia.org/T194273) (owner: Marostegui)
[06:41:37] (CR) jenkins-bot: db-eqiad.php: Depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/434632 (https://phabricator.wikimedia.org/T194273) (owner: Marostegui)
[06:44:31] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1092 for alter table (duration: 01m 20s)
[06:44:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:56] !log restart zookeeper on druid100[4-6] for openjdk-8 upgrades
[06:44:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:49:25] !log Re-add indexes on wb_terms on db1092 - T194273
[06:49:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:49:29] T194273: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273
[06:52:08] !log upgrading mw1299-mw1306 (job runners) to HHVM 3,18.5+dfsg-1+wmf8+deb9u1
[06:52:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:56:57] * addshore thought he scheduled his deploy slot to start 1 hour earlier, sill timezones
[06:57:00] *silly
[06:58:12] marostegui: I see you doing merges in mediawiki-config, do you have more to come?
[06:58:25] nope
[06:58:28] all done :)
[06:58:49] okay, in which case I'll bring my slot forward 1 hour, if you do need to merge anything feel free to poke me as it is probably also fine :)
[06:59:04] sure, not expecting any changes soon :)
[06:59:06] so feel free
[06:59:14] jouncebot: update
[06:59:24] jouncebot: refresh
[06:59:25] I refreshed my knowledge about deployments.
[06:59:27] :)
[06:59:29] jouncebot: next
[06:59:29] In 0 hour(s) and 0 minute(s): WikibaseLexeme wikidata.org deployment (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T0700)
[07:00:04] addshore: (Dis)respected human, time to deploy WikibaseLexeme wikidata.org deployment (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T0700). Please do the needful.
[07:11:09] Operations, Move-Files-To-Commons, TCB-Team, Wikimedia-Extension-setup, and 2 others: Deploying FileExporter and FileImporter - https://phabricator.wikimedia.org/T190716#4224074 (Gilles) A Swift COPY is possible, but would require exposing sharding information of all wikis to a given wiki's confi...
[07:13:09] (PS1) Elukey: profile::druid::*: move prometheus jmx exp's config to /etc/prometheus [puppet] - https://gerrit.wikimedia.org/r/434634 (https://phabricator.wikimedia.org/T192636)
[07:19:13] Operations, WMDE-QWERTY-Team, wikidiff2, Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4224086 (WMDE-Fisch) >>! In T190717#4222177, @MoritzMuehlenhoff wrote: >>>! In T190717#4213084, @WMDE-Fisch wrote: >> So for deployment on...
[07:20:13] (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/11269/" [puppet] - https://gerrit.wikimedia.org/r/434634 (https://phabricator.wikimedia.org/T192636) (owner: Elukey)
[07:20:17] (PS2) Elukey: profile::druid::*: move prometheus jmx exp's config to /etc/prometheus [puppet] - https://gerrit.wikimedia.org/r/434634 (https://phabricator.wikimedia.org/T192636)
[07:23:09] twentyafterfour: it looks like https://gerrit.wikimedia.org/r/#/c/434529/ is merged but not deployed on .4 on tin :(
[07:25:52] again, unless i'm missing something....
[07:29:55] !log upload druid debs 0.11.0-3 to stretch-wikimedia
[07:29:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:38:12] !log upgrading remaining API servers in eqiad to HHVM 3,18.5+dfsg-1+wmf8+deb9u1
[07:38:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:18] (PS1) Jcrespo: mariadb: Allow to setup a custom template for individual instances [puppet] - https://gerrit.wikimedia.org/r/434638 (https://phabricator.wikimedia.org/T192979)
[07:44:01] !log addshore@tin Synchronized php-1.32.0-wmf.4/extensions/WikibaseLexeme: [[gerrit:434601|Use ISO code und for missing language code]] (duration: 01m 31s)
[07:44:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:01] Operations, Analytics, Research, Traffic, and 6 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#4224098 (TheDJ) @gh87 yeah, so that is correct. We indicate as referer policy: "origin, origin-when-crossorigin, origin-when-cross-origi...
[07:45:45] !log addshore@tin Synchronized php-1.32.0-wmf.5/extensions/WikibaseLexeme: [[gerrit:434602|Use ISO code und for missing language code]] (duration: 01m 30s)
[07:45:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:46:08] (PS2) Addshore: testwikidata: Add Property NS to wgNamespacesToBeSearchedDefault [mediawiki-config] - https://gerrit.wikimedia.org/r/434591
[07:46:17] (PS3) Addshore: testwikidata: Add Lexeme NS to wgNamespacesToBeSearchedDefault [mediawiki-config] - https://gerrit.wikimedia.org/r/434592 (https://phabricator.wikimedia.org/T191458)
[07:46:28] PROBLEM - DPKG on mw1312 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[07:46:56] (CR) Addshore: [C: 2] testwikidata: Add Property NS to wgNamespacesToBeSearchedDefault [mediawiki-config] - https://gerrit.wikimedia.org/r/434591 (owner: Addshore)
[07:47:08] PROBLEM - HHVM rendering on mw1315 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:47:17] (CR) Muehlenhoff: [C: 1] Proton: Require the standard MW fonts [puppet] - https://gerrit.wikimedia.org/r/434545 (https://phabricator.wikimedia.org/T186748) (owner: Mobrovac)
[07:47:29] RECOVERY - DPKG on mw1312 is OK: All packages OK
[07:47:59] RECOVERY - HHVM rendering on mw1315 is OK: HTTP OK: HTTP/1.1 200 OK - 76235 bytes in 0.153 second response time
[07:48:35] (Merged) jenkins-bot: testwikidata: Add Property NS to wgNamespacesToBeSearchedDefault [mediawiki-config] - https://gerrit.wikimedia.org/r/434591 (owner: Addshore)
[07:49:41] (CR) jenkins-bot: testwikidata: Add Property NS to wgNamespacesToBeSearchedDefault [mediawiki-config] - https://gerrit.wikimedia.org/r/434591 (owner: Addshore)
[07:52:00] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:434591|testwikidata: Add Property NS to wgNamespacesToBeSearchedDefault]] (duration: 01m 20s)
[07:52:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:03] !log addshore@terbium:~$ mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki testwikidatawiki --reindexAndRemoveOk --indexIdentifier=now
[07:54:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:43] !log upgrading remaining app servers in eqiad to HHVM 3,18.5+dfsg-1+wmf8+deb9u1
[08:00:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:08] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1969 bytes in 0.210 second response time
[08:06:00] !log addshore@wasat:~$ mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki testwikidatawiki --reindexAndRemoveOk --indexIdentifier=now --reindexProcesses=10
[08:06:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:35] Operations, WMDE-QWERTY-Team, wikidiff2, Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4224151 (MoritzMuehlenhoff) >>! In T190717#4224086, @WMDE-Fisch wrote: > > Just to let you know, the config change will be done today so w...
[08:13:31] (CR) Addshore: [C: 2] testwikidata: Add Lexeme NS to wgNamespacesToBeSearchedDefault [mediawiki-config] - https://gerrit.wikimedia.org/r/434592 (https://phabricator.wikimedia.org/T191458) (owner: Addshore)
[08:13:50] (PS1) Jcrespo: mariadb: Setup db1107 as misc multiinstance [puppet] - https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979)
[08:13:59] (PS1) Marostegui: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/434640 (https://phabricator.wikimedia.org/T190148)
[08:14:18] (PS1) Gilles: Launch performance survey on cawiki and enwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/434641 (https://phabricator.wikimedia.org/T187299)
[08:14:35] (CR) jerkins-bot: [V: -1] mariadb: Setup db1107 as misc multiinstance [puppet] - https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: Jcrespo)
[08:14:48] (Merged) jenkins-bot: testwikidata: Add Lexeme NS to wgNamespacesToBeSearchedDefault [mediawiki-config] - https://gerrit.wikimedia.org/r/434592 (https://phabricator.wikimedia.org/T191458) (owner: Addshore)
[08:15:56] (CR) Marostegui: "Commit says db1107 but you mean db1117 right? as the change on site.pp reflects" [puppet] - https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: Jcrespo)
[08:16:43] addshore: can I merge https://gerrit.wikimedia.org/r/#/c/434640/ sometime soon?
[08:17:03] marostegui: yup, just wait 1 sec for my current patch to sync :)
[08:17:10] ~1 min
[08:17:14] sure, let me know when I can :)
[08:17:50] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:434592|testwikidata: Add Lexeme NS to wgNamespacesToBeSearchedDefault]] (duration: 01m 19s)
[08:17:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:55] marostegui: all yours! :)
[08:17:59] thanks!
[08:18:02] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1099:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/434640 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[08:18:36] !log addshore@terbium:~$ mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki testwikidatawiki --reindexAndRemoveOk --indexIdentifier=now
[08:18:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:41] !log addshore@wasat:~$ mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki testwikidatawiki --reindexAndRemoveOk --indexIdentifier=now
[08:18:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:18] (Merged) jenkins-bot: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/434640 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[08:19:36] (CR) jenkins-bot: testwikidata: Add Lexeme NS to wgNamespacesToBeSearchedDefault [mediawiki-config] - https://gerrit.wikimedia.org/r/434592 (https://phabricator.wikimedia.org/T191458) (owner: Addshore)
[08:20:50] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1099:3318 for alter table (duration: 01m 20s)
[08:20:50] addshore: I am done, all yours now
[08:20:52] !log all the updateSearchIndexConfig runs from wasat also had --cluster=codfw
[08:20:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:54] marostegui: thanks!
[08:20:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:21:11] !log Deploy schema change on db1099:3318 - T191519 T188299 T190148 T194270 [08:21:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:21:17] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519 [08:21:17] T194270: Drop 'tmp1' index from wb_terms table in production - https://phabricator.wikimedia.org/T194270 [08:21:17] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148 [08:21:17] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299 [08:21:26] (03CR) 10Jcrespo: [C: 031] "This look as expected: https://puppet-compiler.wmflabs.org/compiler02/11270/" [puppet] - 10https://gerrit.wikimedia.org/r/434638 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [08:21:56] (03CR) 10Marostegui: [C: 031] mariadb: Allow to setup a custom template for individual instances [puppet] - 10https://gerrit.wikimedia.org/r/434638 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [08:22:17] (03CR) 10Jcrespo: [C: 032] mariadb: Allow to setup a custom template for individual instances [puppet] - 10https://gerrit.wikimedia.org/r/434638 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [08:31:49] !log upgrading remaining app/API servers in codfw to HHVM 3,18.5+dfsg-1+wmf8+deb9u1 [08:31:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:42:23] (03PS2) 10Jcrespo: mariadb: Setup db1107 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [08:43:07] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1107 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [08:43:32] (03PS3) 10Jcrespo: mariadb: Setup db1117 as misc 
multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [08:44:08] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [08:44:30] (03PS3) 10WMDE-Fisch: Disable wikidiff2 inline moved paragraphs by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434158 (https://phabricator.wikimedia.org/T194271) [08:47:17] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4224212 (10WMDE-Fisch) > In T190717#4224151, @MoritzMuehlenhoff wrote: > Ok, if the config change gets deployed today (can you please add me... [08:48:29] (03PS4) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [08:49:03] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [08:49:24] (03PS1) 10DCausse: [testwikidata] Add Lexeme NS to ContentNamespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434642 (https://phabricator.wikimedia.org/T191458) [08:49:49] (03PS5) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [08:50:32] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [08:51:36] (03PS6) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [08:51:47] (03CR) 10Addshore: [C: 032] [testwikidata] Add Lexeme NS to 
ContentNamespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434642 (https://phabricator.wikimedia.org/T191458) (owner: 10DCausse) [08:52:17] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [08:52:19] 10Operations, 10MediaWiki-extensions-Translate, 10Wikimedia-Incident, 10Wikimedia-log-errors: 503 error attempting to open multiple projects (Wikipedia and meta wiki are loading very slowly) - https://phabricator.wikimedia.org/T195293#4224220 (10Nikerabbit) Also: https://meta.wikimedia.org/w/index.php?titl... [08:53:07] thanks for the analysis Nikerabbit :) [08:53:18] (03Merged) 10jenkins-bot: [testwikidata] Add Lexeme NS to ContentNamespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434642 (https://phabricator.wikimedia.org/T191458) (owner: 10DCausse) [08:54:37] (03PS7) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [08:55:14] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [08:56:19] 10Operations, 10MediaWiki-extensions-Translate, 10Wikimedia-Incident, 10Wikimedia-log-errors: 503 error attempting to open multiple projects (Wikipedia and meta wiki are loading very slowly) - https://phabricator.wikimedia.org/T195293#4224241 (10Nikerabbit) Also, does anyone think that fixing the index is... [08:57:53] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434643 [08:57:56] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434643 [08:58:04] addshore: Can I merge? 
^ [08:59:02] (03CR) 10Addshore: [C: 04-2] wikidata: Add Lexeme NS to wgNamespacesToBeSearchedDefault [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434593 (https://phabricator.wikimedia.org/T191457) (owner: 10Addshore) [08:59:14] (03PS1) 10Ema: text-be: remove code adding X-F-P to Vary [puppet] - 10https://gerrit.wikimedia.org/r/434644 (https://phabricator.wikimedia.org/T53700) [09:00:05] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:434642|[testwikidata] Add Lexeme NS to ContentNamespaces]] (duration: 01m 20s) [09:00:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:00:17] marostegui: go for it! [09:00:28] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434643 (owner: 10Marostegui) [09:00:30] :) [09:00:31] thanks! [09:01:47] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434643 (owner: 10Marostegui) [09:02:05] !log addshore@terbium:~$ mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki testwikidatawiki --queue [09:02:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:02:19] (03PS2) 10Ema: text-be: remove code adding X-F-P to Vary [puppet] - 10https://gerrit.wikimedia.org/r/434644 (https://phabricator.wikimedia.org/T53700) [09:02:58] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0 [09:03:08] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [09:03:09] addshore: yeah I'd rather not "break" Wikipedia gain in the future :) [09:03:23] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1099:3318 after alter table (duration: 01m 19s) [09:03:26] addshore: I am done [09:03:26] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:03:39] marostegui: thanks! [09:04:15] (03PS8) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [09:04:53] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [09:08:24] (03Abandoned) 10Addshore: wikidata: Add Lexeme NS to wgNamespacesToBeSearchedDefault [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434593 (https://phabricator.wikimedia.org/T191457) (owner: 10Addshore) [09:10:47] (03PS9) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [09:11:23] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [09:21:59] PROBLEM - puppet last run on mw2194 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[hhvm-dbg] [09:24:48] (03PS1) 10Marostegui: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434648 (https://phabricator.wikimedia.org/T190148) [09:26:58] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434648 (https://phabricator.wikimedia.org/T190148) (owner: 10Marostegui) [09:28:17] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434648 (https://phabricator.wikimedia.org/T190148) (owner: 10Marostegui) [09:29:57] (03PS10) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [09:30:14] (03PS2) 10Muehlenhoff: Switch video scalers to a profile [puppet] - 10https://gerrit.wikimedia.org/r/430892 [09:30:28] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1101:3318 for alter table (duration: 01m 20s) [09:30:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:30:36] !log Deploy schema change on db1101:3318 - T191519 T188299 T190148 T194270 [09:30:40] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [09:30:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:30:43] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519 [09:30:43] T194270: Drop 'tmp1' index from wb_terms table in production - https://phabricator.wikimedia.org/T194270 [09:30:43] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148 [09:30:43] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299 [09:32:55] (03CR) 10Mark Bergsma: [C: 031] "Nice 
work!" (034 comments) [debs/pybal] - 10https://gerrit.wikimedia.org/r/434328 (https://phabricator.wikimedia.org/T192437) (owner: 10Vgutierrez) [09:33:24] (03PS11) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [09:33:49] (03PS12) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [09:34:22] !log upgrading remaining job runners in eqiad to HHVM 3.18.5+dfsg-1+wmf8+deb9u1 [09:34:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:34:26] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [09:35:58] !log Finished, forceSearchIndex.php --wiki testwikidatawiki [09:36:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:37:54] (03PS1) 10Elukey: role::druid::public:worker: upgrade settings for Druid 0.11 [puppet] - 10https://gerrit.wikimedia.org/r/434649 (https://phabricator.wikimedia.org/T193712) [09:38:29] (03PS13) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [09:39:03] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [09:41:48] (03PS14) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [09:42:22] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [09:42:45] (03CR) 10Elukey: 
"https://puppet-compiler.wmflabs.org/compiler02/11271/druid1004.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/434649 (https://phabricator.wikimedia.org/T193712) (owner: 10Elukey) [09:46:41] (03PS15) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [09:47:18] RECOVERY - puppet last run on mw2194 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:47:20] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [09:48:29] (03PS2) 10Addshore: Enable WikibaseLexeme on wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434453 (https://phabricator.wikimedia.org/T191457) [09:48:47] (03PS3) 10Addshore: Enable WikibaseLexeme on wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434453 (https://phabricator.wikimedia.org/T191457) [09:49:06] (03PS4) 10Addshore: Enable WikibaseLexeme on wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434453 (https://phabricator.wikimedia.org/T191457) [09:49:41] jouncebot now [09:49:41] For the next 3 hour(s) and 10 minute(s): WikibaseLexeme wikidata.org deployment (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T0700) [09:49:54] * addshore is happy with the size of this deploy slot [09:50:04] (03CR) 10DCausse: [C: 031] Enable WikibaseLexeme on wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434453 (https://phabricator.wikimedia.org/T191457) (owner: 10Addshore) [09:50:14] (03CR) 10jerkins-bot: [V: 04-1] Enable WikibaseLexeme on wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434453 (https://phabricator.wikimedia.org/T191457) (owner: 10Addshore) [09:51:15] (03PS16) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 
10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [09:51:28] (03PS5) 10Addshore: Enable WikibaseLexeme on wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434453 (https://phabricator.wikimedia.org/T191457) [09:51:48] (03CR) 10Mark Bergsma: [C: 031] Create MonitoringProtocolTestCase base class (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/430337 (owner: 10Mark Bergsma) [09:51:55] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [09:52:30] Morning apergos :) [09:52:57] (03PS17) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [09:53:23] morning [09:53:30] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [09:53:39] apergos: I just wanted to check an assumption that I'm about to make... tin and terbium definitely serve no mediawiki web requests? :) [09:54:23] they do not serve web requests. terbium in particular does many other mediawiki-related things but not that [09:54:41] why do you ask? [09:54:48] Good good, just what I was expecting, but wasn't sure how to technically check that. Assumption removed! 
[09:55:12] (03PS18) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [09:55:42] apergos: for the WikibaseLexeme deployment I'm going to merge the enabling patch, then scap pull to tin or terbium, and then run a maint script to set up search before actually syncing everywhere and enabling users to create entities [09:56:04] and I didn't want my scap pull from tin to suddenly start serving the feature to any web users at all [09:56:09] right [09:56:11] you're good [09:56:23] :) [09:56:28] I'd run the script on terbium please [09:56:35] tin really is meant to be just for deploys [09:57:38] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Renaming file to have a .gz suffix without actually being .gz is bound to cause confusion. Better fix the cron instead. I have to ask why " [puppet] - 10https://gerrit.wikimedia.org/r/434605 (owner: 10Paladox) [09:58:40] (03PS4) 10Alexandros Kosiaris: Proton: Require the standard MW fonts [puppet] - 10https://gerrit.wikimedia.org/r/434545 (https://phabricator.wikimedia.org/T186748) (owner: 10Mobrovac) [09:58:45] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Proton: Require the standard MW fonts [puppet] - 10https://gerrit.wikimedia.org/r/434545 (https://phabricator.wikimedia.org/T186748) (owner: 10Mobrovac) [09:59:14] apergos: ack! 
[10:00:03] (03CR) 10Paladox: "I’m not really sure how we are compressing the logs, I think it’s gerrit, but looking at gerrit source code, it disables doing the logs it" [puppet] - 10https://gerrit.wikimedia.org/r/434605 (owner: 10Paladox) [10:01:52] (03CR) 10Paladox: "> Renaming file to have a .gz suffix without actually being .gz is" [puppet] - 10https://gerrit.wikimedia.org/r/434605 (owner: 10Paladox) [10:02:34] (03CR) 10Jcrespo: [C: 032] mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [10:02:40] (03PS19) 10Jcrespo: mariadb: Setup db1117 as misc multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434639 (https://phabricator.wikimedia.org/T192979) [10:04:50] (03PS3) 10Ppchelko: Switch all jobs for everything except wikipedia, commons and wikidata. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429980 (https://phabricator.wikimedia.org/T190327) [10:06:38] (03PS1) 10Jcrespo: mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) [10:06:54] (03PS2) 10Jcrespo: mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) [10:07:30] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [10:07:48] (03CR) 10Addshore: [C: 032] Enable WikibaseLexeme on wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434453 (https://phabricator.wikimedia.org/T191457) (owner: 10Addshore) [10:07:52] :O [10:08:57] (03PS3) 10Jcrespo: mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) [10:09:06] (03Merged) 10jenkins-bot: Enable 
WikibaseLexeme on wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434453 (https://phabricator.wikimedia.org/T191457) (owner: 10Addshore) [10:09:08] PROBLEM - puppet last run on db1117 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:09:30] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [10:10:06] (03PS4) 10Jcrespo: mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) [10:10:40] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [10:10:58] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434653 [10:11:10] (03PS5) 10Jcrespo: mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) [10:11:44] (03PS4) 10Mobrovac: Switch all jobs for everything except wikipedia, commons and wikidata. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/429980 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko) [10:11:46] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [10:12:00] (03PS6) 10Jcrespo: mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) [10:12:20] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434653 (owner: 10Marostegui) [10:12:26] addshore: i see you merging in mw-config, are you done deploying? [10:12:30] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [10:12:35] mobrovac: no [10:12:37] ah marostegui is in it too :P [10:12:42] Just this one [10:12:48] And then I will be done for a couple of hours [10:12:50] :) [10:12:54] ack! just don't sync InitialiseSettings.php :P [10:13:12] :) [10:13:12] * addshore waits for your sync [10:13:21] addshore: can you ping me when done? [10:13:21] (03PS7) 10Jcrespo: mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) [10:13:31] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434653 (owner: 10Marostegui) [10:13:35] mobrovac: yes, although my slot is another 2 hours long I think :P [10:13:46] if it is something trivial though it could go in during it? 
[10:13:59] (03PS1) 10Alexandros Kosiaris: ircecho: Add restart => true to systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/434654 [10:14:02] heh no, but it's a quick one [10:14:04] (03CR) 10Jcrespo: [C: 032] mariadb: Fix reference to misc_multiinstance template [puppet] - 10https://gerrit.wikimedia.org/r/434651 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [10:14:19] !log addshore@wasat:~$ mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki wikidatawiki --justMapping --cluster codfw [10:14:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:14:34] (03CR) 10Alexandros Kosiaris: [C: 04-1] keyholder: base::service_unit -> systemd::service (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/434535 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [10:15:08] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1101:3318 after alter table (duration: 01m 20s) [10:15:09] mobrovac addshore I am done [10:15:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:16] marostegui: thanks! [10:15:28] mobrovac: is yours in IS.php or? [10:15:47] ofc it is addshore :) [10:15:51] !log updateSearchIndexConfig.php with --justMapping failed :( [10:15:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:16:28] mobrovac: okay, well in that case I'll ping you when ready, but it could be a while! [10:16:33] (03PS1) 10Jcrespo: mariadb: Fix duplicate definition of monitoring at misc_multiinstance role [puppet] - 10https://gerrit.wikimedia.org/r/434655 (https://phabricator.wikimedia.org/T192979) [10:16:40] If only you'd caught me 5 mins earlier ;) [10:16:56] addshore: don't worry, i'll just delay mine to after swat, do your things in peace :) [10:17:01] heh timing is everything! [10:17:03] ack! 
[10:17:13] (03CR) 10Jcrespo: [C: 032] mariadb: Fix duplicate definition of monitoring at misc_multiinstance role [puppet] - 10https://gerrit.wikimedia.org/r/434655 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [10:17:15] !log manually updating mapping on wikidatawiki elastic indices to add new lexeme fields [10:17:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:19:03] addshore: done ^ [10:19:13] dcausse: awesome [10:19:18] so, I should be okay to try an edit now? [10:19:18] RECOVERY - puppet last run on db1117 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [10:19:26] addshore: yes [10:20:03] 10Operations, 10Electron-PDFs, 10Proton, 10Readers-Web-Backlog, and 3 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#4224518 (10mobrovac) >>! In T186748#4223186, @pmiazga wrote: > I hope that https://gerrit.wikimedia.org/r/434545 can help with that. @mobrova... [10:20:08] PROBLEM - Check systemd state on db1117 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [10:20:41] (03PS1) 10Ladsgroup: Make clients read from term_text instead of term_search_key [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434658 (https://phabricator.wikimedia.org/T194273) [10:21:19] !log restarting db1117 for setup and upgrade [10:21:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:23:30] addshore or anybody else with deploy privs, can you purge https://en.wikipedia.org/static/images/project-logos/liwikibooks.png please? 
[10:24:09] $ you@terbium echo "https://en.wikipedia.org/static/images/project-logos/liwikibooks.png" | mwscript purgeList.php [10:26:20] !log ppchelko@tin Started deploy [cpjobqueue/deploy@ffdc19a]: Logging improvements [10:26:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:27:07] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@ffdc19a]: Logging improvements (duration: 00m 47s) [10:27:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:28:06] (03PS2) 10Mobrovac: VCL: Normalise the Accept-Language header for the REST API [puppet] - 10https://gerrit.wikimedia.org/r/434558 (https://phabricator.wikimedia.org/T195327) [10:28:46] If anybody's going to run my command, please associate the SAL entry with T193680 [10:28:47] T193680: Logo change request li.wikibooks - https://phabricator.wikimedia.org/T193680 [10:29:21] (or if not, I'll request it during SWAT window) [10:32:23] !log stop db1065 for cloning (proxies will complain) [10:32:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:17] I don't want to ack the proxy errors [10:35:37] because we need to be notified if the other host fails, too [10:36:31] PROBLEM - haproxy failover on dbproxy1006 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [10:36:40] PROBLEM - haproxy failover on dbproxy1001 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [10:37:28] * addshore continues Lexeme things [10:38:04] 10Operations, 10MediaWiki-extensions-Translate, 10Language-2018-Apr-June, 10Wikimedia-Incident, 10Wikimedia-log-errors: 503 error attempting to open multiple projects (Wikipedia and meta wiki are loading very slowly) - https://phabricator.wikimedia.org/T195293#4224566 (10Nikerabbit) a:03Nikerabbit [10:39:43] syncing [10:40:52] (03CR) 10Giuseppe Lavagetto: profile::mediawiki::mcrouter_wancache: add ssl, proxy support (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/431737 
(https://phabricator.wikimedia.org/T192370) (owner: 10Giuseppe Lavagetto) [10:40:53] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: Enable WikibaseLexeme on wikidata.org T191457 T168260 https://gerrit.wikimedia.org/r/#/c/434453 (duration: 01m 20s) [10:40:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:40:59] T168260: Deploy WikibaseLexeme extension on Wikimedia cluster - https://phabricator.wikimedia.org/T168260 [10:40:59] T191457: Deploy WikibaseLexeme on www.wikidata.org - https://phabricator.wikimedia.org/T191457 [10:42:20] !log addshore@terbium mwscript extensions/WikibaseLexeme/maintenance/createBlacklistedLexemes.php --wiki testwikidatawiki [10:42:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:27] * addshore waits for explosions [10:46:27] :-D [10:46:45] BOOM! [10:47:02] zeljkof: I hope that was some sort of beer bottle [10:47:08] or something fizzy ;) [10:47:28] carbonated water ;) [10:49:29] :D [10:51:50] 10Operations, 10Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4224611 (10MaxBioHazard) I re-run this bot with this option. 
[10:58:01] RECOVERY - haproxy failover on dbproxy1001 is OK: OK check_failover servers up 2 down 0 [10:59:01] RECOVERY - haproxy failover on dbproxy1006 is OK: OK check_failover servers up 2 down 0 [11:06:46] (03PS1) 10Jcrespo: misc-mariadb-monitoring: Add db1117 replica instances to monitoring [puppet] - 10https://gerrit.wikimedia.org/r/434672 (https://phabricator.wikimedia.org/T192979) [11:09:00] !log Lexeme deploy window probably done (unless something explodes) [11:09:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:13:55] (03CR) 10Jkroll: Disable wikidiff2 inline moved paragraphs by default (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434158 (https://phabricator.wikimedia.org/T194271) (owner: 10WMDE-Fisch) [11:15:39] (03PS1) 10Jcrespo: mariadb-hosts: Add db1117 instances to m1,m2,m3 and m5 [software] - 10https://gerrit.wikimedia.org/r/434675 (https://phabricator.wikimedia.org/T192979) [11:16:16] (03CR) 10Jcrespo: "I may need a second look 0:-)" [software] - 10https://gerrit.wikimedia.org/r/434675 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [11:17:44] (03PS2) 10Jcrespo: mariadb-hosts: Add db1117 instances to m1,m2,m3 and m5 [software] - 10https://gerrit.wikimedia.org/r/434675 (https://phabricator.wikimedia.org/T192979) [11:19:27] (03CR) 10Jcrespo: [C: 032] misc-mariadb-monitoring: Add db1117 replica instances to monitoring [puppet] - 10https://gerrit.wikimedia.org/r/434672 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [11:20:39] (03CR) 10Marostegui: [C: 031] mariadb-hosts: Add db1117 instances to m1,m2,m3 and m5 [software] - 10https://gerrit.wikimedia.org/r/434675 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [11:21:06] (03PS1) 10Muehlenhoff: Enable microcode updates for Ganeti hosts [puppet] - 10https://gerrit.wikimedia.org/r/434676 [11:24:37] (03PS2) 10Alexandros Kosiaris: ircecho: Add restart => true to systemd::service [puppet] - 
10https://gerrit.wikimedia.org/r/434654 [11:24:49] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] ircecho: Add restart => true to systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/434654 (owner: 10Alexandros Kosiaris) [11:25:01] (03PS3) 10MarcoAurelio: security: Remove dangerous unused 'botadmin' group at mlwik{tionary|isource} [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433136 (https://phabricator.wikimedia.org/T152296) [11:25:12] (03PS3) 10MarcoAurelio: zhwiki: let 'accountcreators' to self-remove their permissions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433546 (https://phabricator.wikimedia.org/T194871) [11:25:19] jouncebot: next [11:25:19] In 1 hour(s) and 34 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T1300) [11:26:26] mobrovac: if you're around it's all free now [11:26:40] kk thnx addshore [11:30:27] !log installing curl security updates [11:30:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:33:25] !log addshore@mw1317:~$ scap pull # It seemed to be missing changes..... 
[11:33:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:34:52] 10Operations, 10Wikimedia-General-or-Unknown: Remove pear/mail packages from WMF MW app servers - https://phabricator.wikimedia.org/T195364#4224797 (10Reedy) [11:35:47] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4224809 (10jcrespo) m1 master is on C, so the following services may go down: * bacula * etherpadlite * librenms * puppet * racktables * rt [11:36:14] 10Operations, 10Traffic, 10netops: cp intermitent IPsec MTU issue - https://phabricator.wikimedia.org/T195365#4224812 (10ayounsi) p:05Triage>03Low [11:36:32] 10Operations, 10Traffic, 10netops: cp intermittent IPsec MTU issue - https://phabricator.wikimedia.org/T195365#4224827 (10ayounsi) [11:37:02] (03CR) 10Alexandros Kosiaris: [C: 032] "Production is fully kubernetes, this is fine as far as production goes. I had a look at https://tools.wmflabs.org/openstack-browser/projec" [puppet] - 10https://gerrit.wikimedia.org/r/434537 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [11:37:08] (03PS2) 10Alexandros Kosiaris: k8s: remove trusty/upstart support [puppet] - 10https://gerrit.wikimedia.org/r/434537 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [11:40:41] PROBLEM - ganeti-noded running on ganeti2002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded [11:40:51] PROBLEM - ganeti-mond running on ganeti2002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-mond [11:41:20] PROBLEM - ganeti-confd running on ganeti2002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 111 (gnt-confd), command name ganeti-confd [11:41:36] (03CR) 10WMDE-Fisch: Disable wikidiff2 inline moved paragraphs by default (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434158 (https://phabricator.wikimedia.org/T194271) (owner: 
10WMDE-Fisch) [11:43:14] (03PS1) 10Jcrespo: mariadb: Switch m2-replica to db1117:3321 [puppet] - 10https://gerrit.wikimedia.org/r/434680 (https://phabricator.wikimedia.org/T192979) [11:44:27] !log ppchelko@tin Started deploy [cpjobqueue/deploy@59674ba]: Correctly commit pending messages for multi-topic rules [11:44:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:45:14] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@59674ba]: Correctly commit pending messages for multi-topic rules (duration: 00m 47s) [11:45:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:45:45] !log reimage ganeti2002, ganeti2006 as debian stretch [11:45:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:46:11] (03PS4) 10Volans: facter: refactor the net_driver fact [puppet] - 10https://gerrit.wikimedia.org/r/434032 [11:47:13] (03CR) 10Volans: "The ruby code was tested on cp2024, sarin, lvs2002, ganeti2008, labvirt1019." [puppet] - 10https://gerrit.wikimedia.org/r/434032 (owner: 10Volans) [11:47:22] (03PS5) 10Volans: facter: refactor the net_driver fact [puppet] - 10https://gerrit.wikimedia.org/r/434032 [11:52:34] (03CR) 10MarcoAurelio: "@Foks: can I consider this approved by Trust and Safety?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/433136 (https://phabricator.wikimedia.org/T152296) (owner: 10MarcoAurelio) [11:56:48] (03PS7) 10Vgutierrez: Implement kubernetes configuration observer [debs/pybal] - 10https://gerrit.wikimedia.org/r/434328 (https://phabricator.wikimedia.org/T192437) [11:59:35] (03CR) 10Alexandros Kosiaris: [C: 04-1] otrs: base::service_unit -> systemd::service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/433491 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [12:02:52] (03CR) 10Alexandros Kosiaris: [C: 031] Enable microcode updates for Ganeti hosts [puppet] - 10https://gerrit.wikimedia.org/r/434676 (owner: 10Muehlenhoff) [12:04:04] (03PS2) 10Elukey: role::druid::public:worker: upgrade settings for Druid 0.11 [puppet] - 10https://gerrit.wikimedia.org/r/434649 (https://phabricator.wikimedia.org/T193712) [12:04:49] (03PS8) 10Vgutierrez: Implement kubernetes configuration observer [debs/pybal] - 10https://gerrit.wikimedia.org/r/434328 (https://phabricator.wikimedia.org/T192437) [12:05:15] (03CR) 10Vgutierrez: Implement kubernetes configuration observer (033 comments) [debs/pybal] - 10https://gerrit.wikimedia.org/r/434328 (https://phabricator.wikimedia.org/T192437) (owner: 10Vgutierrez) [12:05:57] (03CR) 10Elukey: [C: 032] role::druid::public:worker: upgrade settings for Druid 0.11 [puppet] - 10https://gerrit.wikimedia.org/r/434649 (https://phabricator.wikimedia.org/T193712) (owner: 10Elukey) [12:06:18] !log upgrade druid public to druid 0.11 (druid100[4-6]) [12:06:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:16:05] !log set normal metric on codfw-eqsin link [12:16:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:18:14] akosiaris: I see this puppet error in toolforge: [12:18:17] https://www.irccloud.com/pastebin/Tq2MlwWz/ [12:18:36] 10Operations, 10Code-Stewardship-Reviews, 10Services (watching): zotero translation server: code 
stewardship request - https://phabricator.wikimedia.org/T187194#3967379 (10Tgr) //"The translators repository has had 6 commits, 5 of them updating from upstream"// - note that the recommended way to write/update... [12:18:58] related to your upstart work in T194724 ? [12:18:59] T194724: Deprecate `base::service_unit` in puppet - https://phabricator.wikimedia.org/T194724 [12:19:14] arturo: dammit... I never expected the tools bastions to have flannel [12:19:41] I 'll revert :-( [12:20:02] thanks! [12:20:10] let me know if I can be of any help [12:20:34] (03PS1) 10Alexandros Kosiaris: Revert "k8s: remove trusty/upstart support" [puppet] - 10https://gerrit.wikimedia.org/r/434682 [12:21:04] (03PS2) 10Alexandros Kosiaris: Revert "k8s: remove trusty/upstart support" [puppet] - 10https://gerrit.wikimedia.org/r/434682 [12:21:09] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Revert "k8s: remove trusty/upstart support" [puppet] - 10https://gerrit.wikimedia.org/r/434682 (owner: 10Alexandros Kosiaris) [12:28:08] 10Operations, 10Cloud-Services, 10netops: Allocate public v4 IPs for Neutron setup in eqiad - https://phabricator.wikimedia.org/T193496#4224987 (10faidon) OK, so it looks 185.15.56.0/24 is proposed to be used immediately in eqiad, to replace 208.80.155.128/25 in the next ~6 months. Additionally, 185.15.57.0/... [12:28:35] 10Operations, 10Move-Files-To-Commons, 10TCB-Team, 10Wikimedia-Extension-setup, 10Wikimedia-extension-review-queue: Deploy FileExporter and FileImporter to group0 - https://phabricator.wikimedia.org/T195370#4224988 (10Lea_WMDE) p:05Triage>03Normal [12:33:02] 10Operations, 10Move-Files-To-Commons, 10TCB-Team, 10Wikimedia-Extension-setup, 10Wikimedia-extension-review-queue: Deploy FileExporter and FileImporter to group0 - https://phabricator.wikimedia.org/T195370#4225006 (10Lea_WMDE) @Jdforrester-WMF again we need your official approval to put the [[ https://m... 
[12:38:07] (03PS1) 10Addshore: Fix disabledRdfExportEntityTypes for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434683 (https://phabricator.wikimedia.org/T168260) [12:38:38] (03CR) 10WMDE-leszek: [C: 031] Fix disabledRdfExportEntityTypes for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434683 (https://phabricator.wikimedia.org/T168260) (owner: 10Addshore) [12:38:55] (03PS2) 10Addshore: Fix disabledRdfExportEntityTypes for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434683 (https://phabricator.wikimedia.org/T168260) [12:39:23] * addshore will be deploying the above in a second [12:39:39] (03CR) 10WMDE-leszek: [C: 031] Fix disabledRdfExportEntityTypes for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434683 (https://phabricator.wikimedia.org/T168260) (owner: 10Addshore) [12:39:48] (03CR) 10Addshore: [C: 032] Fix disabledRdfExportEntityTypes for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434683 (https://phabricator.wikimedia.org/T168260) (owner: 10Addshore) [12:41:02] (03Merged) 10jenkins-bot: Fix disabledRdfExportEntityTypes for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434683 (https://phabricator.wikimedia.org/T168260) (owner: 10Addshore) [12:43:35] !log addshore@tin Synchronized wmf-config/Wikibase.php: [[gerrit:434683|Fix disabledRdfExportEntityTypes for wikidata]] T168260 (duration: 01m 20s) [12:43:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:43:40] T168260: Deploy WikibaseLexeme extension on Wikimedia cluster - https://phabricator.wikimedia.org/T168260 [12:45:45] Hmmm, how does one clear the cache of a page such as https://www.wikidata.org/wiki/Special:EntityData/L1.rdf ? 
[12:46:42] ooooh https://wikitech.wikimedia.org/wiki/Multicast_HTCP_purging#One-off_purge [12:47:48] !log addshore@terbium:~$ echo 'https://www.wikidata.org/wiki/Special:EntityData/L1.rdf' | mwscript purgeList.php [12:47:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:00] RECOVERY - Check systemd state on notebook1004 is OK: OK - running: The system is fully operational [12:51:30] RECOVERY - Check systemd state on notebook1003 is OK: OK - running: The system is fully operational [12:52:08] just fixed those --^ [12:52:33] (03PS1) 10Mark Bergsma: Use passed-in reactor in all monitors [debs/pybal] - 10https://gerrit.wikimedia.org/r/434684 [12:52:35] (03PS1) 10Mark Bergsma: Use MemoryReactorClock for monitor unit tests and adopt tests [debs/pybal] - 10https://gerrit.wikimedia.org/r/434685 [12:54:02] (03CR) 10jerkins-bot: [V: 04-1] Use passed-in reactor in all monitors [debs/pybal] - 10https://gerrit.wikimedia.org/r/434684 (owner: 10Mark Bergsma) [12:54:08] CFisch_remote: I won't be here for swat after all :) [12:54:13] I imagine zeljkof will pick it up :) [12:54:25] addshore: ok, no worries :-) [12:54:33] you deployed enough already ^^ [12:55:02] addshore: so going for some L4 now? :-D [12:56:11] addshore CFisch_remote: Ready for swat [12:57:07] zeljkof: yepp! [12:59:21] 10Operations, 10Move-Files-To-Commons, 10TCB-Team, 10Wikimedia-Extension-setup, and 2 others: Deploying FileExporter and FileImporter - https://phabricator.wikimedia.org/T190716#4225078 (10Lea_WMDE) >>! In T190716#4224000, @Joe wrote: > Looking through the tickets and the Readme of the extension, I think t... [13:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T1300). Please do the needful. 
[13:00:04] CFisch_WMDE, Zoranzoki21, and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:16] Ready for breaking the site again [13:00:17] I can SWAT today [13:00:19] (03CR) 10Nuria: Blacklisting new iOS eventlogging schemas on MySQL (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/434424 (https://phabricator.wikimedia.org/T192819) (owner: 10Chelsyx) [13:00:20] !log starting logical dump of m2-master [13:00:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:24] zeljkof, can you please run a script first? [13:00:49] It's just purgeList.php [13:00:52] For https://li.wikibooks.org/static/images/project-logos/liwikibooks.png [13:00:57] should if you see something strange on gerrit or otrs now [13:00:59] *shout [13:01:03] T193680 [13:01:04] T193680: Logo change request li.wikibooks - https://phabricator.wikimedia.org/T193680 [13:01:10] I am monitoring and everthing seems ok, but just in case [13:02:13] Urbanecm: is it urgent? CFisch_remote is first [13:02:27] zeljkof, it is not, feel free to run it during doing my patches [13:02:35] (03PS2) 10Mark Bergsma: Use passed-in reactor in all monitors [debs/pybal] - 10https://gerrit.wikimedia.org/r/434684 [13:02:37] (03PS2) 10Mark Bergsma: Use MemoryReactorClock for monitor unit tests and adopt tests [debs/pybal] - 10https://gerrit.wikimedia.org/r/434685 [13:02:38] It just should be run sometime :) [13:03:01] Urbanecm: ok, please make sure it's in gerrit comments or commit message [13:03:12] I cannot do that [13:03:16] It is not related to any commit [13:03:22] ah [13:03:31] Urbanecm: hm, not sure I had that before [13:03:50] Urbanecm: turns out it probably wasn't related to any of your commits yesterday [13:03:58] can you add it to the calendar then, as a separate thing, with link to a phab task? 
[13:04:05] zeljkof, will do [13:04:16] also I just ran purgeList for something else :D [13:04:23] Zoranzoki21: around for SWAT? [13:04:35] Urbanecm, addshore: so there is no need to run the script? [13:04:43] (03PS4) 10Zfilipin: Disable wikidiff2 inline moved paragraphs by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434158 (https://phabricator.wikimedia.org/T194271) (owner: 10WMDE-Fisch) [13:04:47] (for a different url) [13:04:57] the docs fyi are at https://wikitech.wikimedia.org/wiki/Multicast_HTCP_purging#One-off_purge :) [13:05:07] Maybe it's doing all the images... [13:05:23] CFisch_remote: please stand by, I'll ping you when your commit is at mwdebug1002 [13:05:30] Do you see new logo on https://en.wikipedia.org/static/images/project-logos/liwikibooks.png? [13:05:37] It should be non-English [13:05:43] addshore, does it mean I have green light for scheduling? [13:05:45] !log starting logical dump of m3-master [13:05:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:05:49] zeljkof: yeah, cool there should be nothing really to see but I can check anyway [13:05:51] (03CR) 10Mark Bergsma: [C: 031] Create MonitoringProtocolTestCase base class (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/430337 (owner: 10Mark Bergsma) [13:06:24] CFisch_remote: soooo, your commit can not be deployed as-is :/ [13:06:35] there is a rule one deploy per commit [13:06:40] let me find it [13:06:43] zeljkof, run full scap [13:06:51] so, you have to split the commit into two [13:07:01] Urbanecm: probably, although maybe wait for the ticket to evolve a little more first [13:07:02] zeljkof: I do? 
[13:07:24] CFisch_remote, yes, or zeljkof must run full scap, which is a pain [13:07:32] The policy is at https://wikitech.wikimedia.org/wiki/SWAT_deploys#Guidelines [13:07:33] Urbanecm: not sure full scap is recommended yet [13:07:42] "Single patches that require more than one sync - in other words, changes to multiple files which depend on each other." [13:07:56] ahhhhh [13:08:19] CFisch_remote: yes [13:08:21] uff ok, then let me fix the patch to do the first thing first so at least that can go in today [13:08:23] (03CR) 10Mark Bergsma: [C: 032] Create MonitoringProtocolTestCase base class [debs/pybal] - 10https://gerrit.wikimedia.org/r/430337 (owner: 10Mark Bergsma) [13:08:29] would that work for you zeljkof ? [13:08:44] CFisch_remote, well...we are already full [13:08:47] (03PS1) 10Ayounsi: Netbox: use uid instead of cn as login [puppet] - 10https://gerrit.wikimedia.org/r/434687 [13:08:49] and you can do another thing first [13:08:50] CFisch_remote: sure, and we can deploy the second one maybe today [13:08:57] (03Merged) 10jenkins-bot: Create MonitoringProtocolTestCase base class [debs/pybal] - 10https://gerrit.wikimedia.org/r/430337 (owner: 10Mark Bergsma) [13:09:02] depends on the time [13:09:09] (03CR) 10Ema: [C: 04-1] "This is fine for the vast majority of our traffic, but would not really work as advertised for non-canonical Host header values (those not" [puppet] - 10https://gerrit.wikimedia.org/r/434644 (https://phabricator.wikimedia.org/T53700) (owner: 10Ema) [13:09:17] 6 patches is a guideline, not a hard limit, depends on available time [13:09:48] also, Zoranzoki21 is not around, so his patch will not be deployed [13:09:48] zeljkof, you're right. [13:10:06] so that's 5 patches so far [13:10:22] Ok, so I propose we'll replace Zoranzoki21 patch with CFisch_remote one [13:10:32] If we have time, I'll take care about his patch at the end of my patches [13:10:35] Is it okay with you zeljkof? 
[13:10:51] sure [13:11:06] (03PS5) 10WMDE-Fisch: Disable wikidiff2 inline moved paragraphs by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434158 (https://phabricator.wikimedia.org/T194271) [13:11:29] zeljkof: I updated the patch to do the first thing that has to be done only [13:11:31] (03CR) 10Ayounsi: [C: 032] "Tested manually on netmon1002" [puppet] - 10https://gerrit.wikimedia.org/r/434687 (owner: 10Ayounsi) [13:11:41] CFisch_remote: reviewing [13:11:56] (03CR) 10Mark Bergsma: [C: 032] Add minimal test cases for Skeleton and ProxyFetch [debs/pybal] - 10https://gerrit.wikimedia.org/r/430338 (owner: 10Mark Bergsma) [13:12:37] (03Merged) 10jenkins-bot: Add minimal test cases for Skeleton and ProxyFetch [debs/pybal] - 10https://gerrit.wikimedia.org/r/430338 (owner: 10Mark Bergsma) [13:13:07] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434158 (https://phabricator.wikimedia.org/T194271) (owner: 10WMDE-Fisch) [13:13:36] CFisch_remote: is there anything to test at mwdebug, or should I deploy it directly? [13:13:51] nothing to test there [13:13:57] zeljkof: [13:14:11] (03Merged) 10jenkins-bot: Disable wikidiff2 inline moved paragraphs by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434158 (https://phabricator.wikimedia.org/T194271) (owner: 10WMDE-Fisch) [13:14:25] CFisch_remote: ok, then deploying as soon as it merges; please prepare the second patch and add it to the calendar [13:14:33] I am on it [13:14:47] * zeljkof thumbs up [13:15:27] CFisch_remote, while you're editing, move Zoranozki21 patch to my nickname [13:15:27] Thanks! [13:15:34] (03PS1) 10WMDE-Fisch: Disable wikidiff2 inline moved paragraphs on production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434692 (https://phabricator.wikimedia.org/T194271) [13:15:42] Urbanecm: kk [13:15:51] jouncebot update [13:15:55] jouncebot refresh [13:15:56] I refreshed my knowledge about deployments. 
[13:16:33] (03CR) 10Mark Bergsma: [C: 031] Add unit tests for ProxyFetchMonitoringProtocol (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/430339 (owner: 10Mark Bergsma) [13:17:39] !log zfilipin@tin scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/2cc7028226a539553178454fc2f14459 for details) [13:17:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:00] CFisch_remote: scap failed, see above [13:18:08] reverting [13:18:49] *looks* [13:18:53] oha [13:19:00] (03PS1) 10Zfilipin: Revert "Disable wikidiff2 inline moved paragraphs by default" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434693 [13:19:10] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434693 (owner: 10Zfilipin) [13:19:11] Notice: Undefined variable: wmgDeactivateWikiDiff2InlineMovedParagaphDetection in /srv/mediawiki/wmf-config/CommonSettings.php on line 3194 [13:19:58] * addshore reads up [13:20:07] hmmm [13:20:16] (03Merged) 10jenkins-bot: Revert "Disable wikidiff2 inline moved paragraphs by default" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434693 (owner: 10Zfilipin) [13:20:25] not sure how jenkins did not catch it [13:21:02] CFisch_remote, CS.php should be deployed AFTER IS.php [13:21:04] Not before [13:21:45] Urbanecm: thanks for that, I would have guessed it's the other way around [13:21:58] zeljkof: just tell me if you have the time this SWAT [13:22:10] !log zfilipin@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:434693|Revert "Disable wikidiff2 inline moved paragraphs by default" (T194271)]] (duration: 01m 19s) [13:22:11] so basically the order in the wiki must be swapped [13:22:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:15] T194271: Change config to disable new mobile behavior on all wikis except beta - 
https://phabricator.wikimedia.org/T194271 [13:22:17] CFisch_remote, IS.php introduce variable, CS.php uses it [13:22:23] You cannot use variable which is not introduced [13:22:51] Urbanecm: Yes makes sense now :-) [13:23:17] CFisch_remote: there should be time, please add both patches _in the correct order_ to the calendar ;) [13:23:28] k :-) [13:23:40] thanks so far zeljkof , Urbanecm and addshore [13:23:52] If the rule won't exist, I'd just write the correct order in the channel... [13:24:12] CFisch_remote: 434692 can be deployed? [13:24:30] Urbanecm: well, I'm surprised jenkins did not catch it :/ [13:24:47] zeljkof, we might rely on deployer's knowledge... [13:24:56] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434640 (https://phabricator.wikimedia.org/T190148) (owner: 10Marostegui) [13:25:00] (03CR) 10jenkins-bot: [testwikidata] Add Lexeme NS to ContentNamespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434642 (https://phabricator.wikimedia.org/T191458) (owner: 10DCausse) [13:25:04] jenkins doesnt actually run the code ;) [13:25:08] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434643 (owner: 10Marostegui) [13:25:09] afaik [13:25:11] Urbanecm: well, the new rule should prevent such mistakes [13:25:12] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434648 (https://phabricator.wikimedia.org/T190148) (owner: 10Marostegui) [13:25:17] (03CR) 10jenkins-bot: Enable WikibaseLexeme on wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434453 (https://phabricator.wikimedia.org/T191457) (owner: 10Addshore) [13:25:21] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434653 (owner: 10Marostegui) [13:25:23] zeljkof, it should prevent all mistakes THAT ARE TESTED [13:25:24] (03CR) 
10jenkins-bot: Fix disabledRdfExportEntityTypes for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434683 (https://phabricator.wikimedia.org/T168260) (owner: 10Addshore) [13:25:28] (03CR) 10jenkins-bot: Disable wikidiff2 inline moved paragraphs by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434158 (https://phabricator.wikimedia.org/T194271) (owner: 10WMDE-Fisch) [13:25:32] addshore: but a linter should detect variable is used but not initialized :/ [13:25:32] (03CR) 10jenkins-bot: Revert "Disable wikidiff2 inline moved paragraphs by default" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434693 (owner: 10Zfilipin) [13:25:33] This apparently isn't tested somehow [13:25:38] *somewhy [13:25:47] Urbanecm: looks like it is not tested, yes [13:25:54] zeljkof: nope, as they are global they can come from anywhere :) [13:26:05] zeljkof: can be deployed [13:26:06] zeljkof, CFisch_remote: If you want, I'll review the patches and their dependencies [13:26:07] addshore: another reason to kill globals :P [13:26:09] there are all just array keys in IS [13:26:11] :D [13:26:28] addshore, if jenkins install MW... [13:26:29] CFisch_remote: reviewing... can it be tested at mwdebug? [13:26:39] no there is nothing to tests [13:26:50] these setting will not do anything for now [13:27:23] they are needed before the upcomming wikidiff lib deployment [13:27:30] CFisch_remote, zeljkof, what commit you're reviewing&deploying? [13:27:34] so, 434692 (IS) then 434158 (CS)? 
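The ordering rule discussed above (sync the file that introduces a setting before the file that reads it) can be sketched as a small dependency sort. This is only an illustration, not part of scap or any deployment tooling: the function name `sync_order` and the `defines`/`uses` mappings are hypothetical, and the file and variable names are taken from the messages in this log.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def sync_order(defines, uses):
    """Order files so that a file defining a variable precedes files using it.

    `defines` and `uses` are hypothetical mappings of file name -> set of
    variable names; they are assumptions made for this sketch.
    """
    # Map each variable to the file that introduces it.
    owner = {var: f for f, names in defines.items() for var in names}
    # Each file maps to the set of files that must be synced before it.
    graph = {f: set() for f in set(defines) | set(uses)}
    for f, names in uses.items():
        for var in names:
            definer = owner.get(var)
            if definer is not None and definer != f:
                graph[f].add(definer)
    return list(TopologicalSorter(graph).static_order())


# Variable name copied verbatim from the PHP notice quoted in the log.
var = "wmgDeactivateWikiDiff2InlineMovedParagaphDetection"
defines = {"InitialiseSettings.php": {var}}
uses = {"CommonSettings.php": {var}}
order = sync_order(defines, uses)
print(order)
```

With these inputs the file that introduces the setting comes first, matching the "IS then CS" order agreed in the channel.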
[13:27:40] yeah [13:27:44] Urbanecm: ^ [13:27:49] both zeljkof [13:27:53] CFisch_remote: ok, reviewing and merging [13:28:26] yes [13:29:12] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434692 (https://phabricator.wikimedia.org/T194271) (owner: 10WMDE-Fisch) [13:30:24] (03Merged) 10jenkins-bot: Disable wikidiff2 inline moved paragraphs on production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434692 (https://phabricator.wikimedia.org/T194271) (owner: 10WMDE-Fisch) [13:31:07] (03PS1) 10Mark Bergsma: Adapt ProxyFetch tests to use tcpClients and sslClients [debs/pybal] - 10https://gerrit.wikimedia.org/r/434695 [13:31:31] (03PS2) 10Jcrespo: mariadb: Switch m1-replica to db1117:3321 [puppet] - 10https://gerrit.wikimedia.org/r/434680 (https://phabricator.wikimedia.org/T192979) [13:31:52] (03CR) 10jenkins-bot: Disable wikidiff2 inline moved paragraphs on production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434692 (https://phabricator.wikimedia.org/T194271) (owner: 10WMDE-Fisch) [13:32:19] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:434692|Disable wikidiff2 inline moved paragraphs on production (T194271)]] (duration: 01m 21s) [13:32:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:23] T194271: Change config to disable new mobile behavior on all wikis except beta - https://phabricator.wikimedia.org/T194271 [13:32:31] CFisch_remote: 434692 deployed [13:33:01] zeljkof: nice [13:33:21] (03CR) 10Mark Bergsma: [C: 031] Add unit tests for ProxyFetchMonitoringProtocol (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/430339 (owner: 10Mark Bergsma) [13:33:22] Urbanecm: please make sure commits are sorted by priority at calendar, we will probably run out of time [13:33:36] Going to make sure... [13:33:47] 33 minutes with one patch, that's the highest value... 
[13:33:59] (03PS1) 10Jcrespo: mariadb: remove db1065 as an m1 replica, set db1117 instead [dns] - 10https://gerrit.wikimedia.org/r/434696 (https://phabricator.wikimedia.org/T192979) [13:34:14] zeljkof: feel free to overrun a bit, I have slots after but wont need all the time :) [13:34:44] addshore: sure, if it's a few minutes, but I'll have to move on to other tasks... [13:35:01] as much as I like deployments, I can't spend all day... ;) [13:35:31] CFisch_remote: 434158 has been reverted, I am not sure anything will happen if I deploy it [13:36:01] zeljkof: ah so it needs to be submitted as new patch ... [13:36:02] you should make sure your commit adds the change to CS [13:36:11] no worries I can also plan for another swat [13:36:29] zeljkof, revert your revert [13:36:30] CFisch_remote: yes, new patch, since we always deploy master of the repo [13:36:45] Urbanecm: that's hurting my head ;) [13:36:52] Because? :) [13:37:23] CFisch_remote: if that's what you want to do, revert the revert, just make sure the commit message is sane [13:37:32] zeljkof, I confirm the patch are priority-ordered [13:37:47] Urbanecm: as a deployer, I can not really think about that [13:37:58] that should be done by the developer [13:38:00] (03CR) 10Mark Bergsma: [C: 032] Add unit tests for ProxyFetchMonitoringProtocol [debs/pybal] - 10https://gerrit.wikimedia.org/r/430339 (owner: 10Mark Bergsma) [13:38:23] Urbanecm: ok, starting with your patches, please stand by, will ping you when each is at mwdebug [13:38:30] (03Merged) 10jenkins-bot: Add unit tests for ProxyFetchMonitoringProtocol [debs/pybal] - 10https://gerrit.wikimedia.org/r/430339 (owner: 10Mark Bergsma) [13:39:02] zeljkof, please skip mwdebug with all the logo patches [13:39:05] (03PS1) 10WMDE-Fisch: Disable wikidiff2 inline moved paragraphs on production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434697 (https://phabricator.wikimedia.org/T194271) [13:39:11] Urbanecm: ok [13:39:18] Thanks! 
[13:39:29] Urbanecm: I forgot, does a script not related to a patch need to run? [13:39:41] zeljkof, seems to being already fixed somehow [13:39:42] if so, please add it to the calendar with link to a phab task [13:39:46] (03PS2) 10WMDE-Fisch: Disable wikidiff2 inline moved paragraphs on production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434697 (https://phabricator.wikimedia.org/T194271) [13:39:53] Urbanecm: ok [13:40:08] zeljkof: reverted revert with sane commit message 434697 [13:40:33] CFisch_remote: I hear you like reverts, so I have reverted your revert... ;) [13:40:52] CFisch_remote: please add it to the calendar, will deploy as soon as possible [13:41:08] (03CR) 10Addshore: [C: 031] add addshore, aude, ladsgroup to wikidata contact group [puppet] - 10https://gerrit.wikimedia.org/r/434479 (https://phabricator.wikimedia.org/T195289) (owner: 10ArielGlenn) [13:41:09] CFisch_remote, also note that the patch that zeljkof reverted was reverted, please. [13:41:13] (03PS1) 10Jcrespo: mariadb: Move db1065 from m1 to m2 [puppet] - 10https://gerrit.wikimedia.org/r/434698 (https://phabricator.wikimedia.org/T192979) [13:41:27] (03PS3) 10Addshore: add addshore, aude, ladsgroup to wikidata contact group [puppet] - 10https://gerrit.wikimedia.org/r/434479 (https://phabricator.wikimedia.org/T195289) (owner: 10ArielGlenn) [13:41:43] Urbanecm: uh, wait, what? 
:/ [13:42:02] zeljkof, you reverted the first attempt (that jenkins didn't catch) because of scap failure [13:42:08] It's still in the calendar [13:42:24] And it a) should be removed b) listed as being reverted [13:42:29] Not sure what's better [13:43:03] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434598 (https://phabricator.wikimedia.org/T194340) (owner: 10Urbanecm) [13:43:28] Urbanecm: I think CFisch_remote misunderstood the process, he thought it could be reused [13:43:39] That's true [13:43:40] yes [13:43:41] (03CR) 10Jcrespo: [C: 032] mariadb: remove db1065 as an m1 replica, set db1117 instead [dns] - 10https://gerrit.wikimedia.org/r/434696 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [13:43:44] sorry guys [13:43:56] CFisch_remote, everyone's learning the whole life :) [13:44:02] Thank you for your work! [13:44:12] Urbanecm: for 434598 purge should be done for the existing logo, right? [13:44:26] Exactly [13:44:31] CFisch_remote: no problem, problems happen :) [13:44:42] (03Merged) 10jenkins-bot: Change logo assets for wikimania2018wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434598 (https://phabricator.wikimedia.org/T194340) (owner: 10Urbanecm) [13:45:16] Yesterday all WMF projects started to being loaded slower and slower, in the middle of the SWAT [13:45:40] (03CR) 10Jcrespo: [C: 032] mariadb: Switch m1-replica to db1117:3321 [puppet] - 10https://gerrit.wikimedia.org/r/434680 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [13:47:32] !log zfilipin@tin Synchronized static/images/project-logos/: SWAT: [[gerrit:434598|Change logo assets for wikimania2018wiki (T194340)]] (duration: 01m 20s) [13:47:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:36] T194340: Change Wikimania2018 site logo - https://phabricator.wikimedia.org/T194340 [13:47:41] Urbanecm: argh, can not log in to terbium... 
looking into it [13:47:56] zeljkof, try ssh -v ;) [13:48:00] (v for verbose) [13:48:02] Urbanecm: 434598 is deployed, the script did not run yet [13:48:07] ack [13:48:15] Can help you with debugging SSH, if you want [13:49:03] Urbanecm: thanks, swat window 12 minutes before closing is probably not the best time :D [13:49:15] will deploy what I can, then debug and run the scripts [13:49:40] ack [13:49:41] I remember having that problem recently, can not remember how I fixed it [13:49:43] :/ [13:50:25] Can you log in at least to deployment.eqiad.wmnet? [13:50:57] Urbanecm: I can log in to everywhere but terbium [13:51:01] Ok [13:51:11] CFisch_remote: argh, my eyes! o.O https://gerrit.wikimedia.org/r/#/c/434697/2//COMMIT_MSG [13:51:26] 10Operations, 10Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4225244 (10Vgutierrez) >>! In T194380#4224611, @MaxBioHazard wrote: > I re-run this bot with this option. Thanks! I've already seen it behaving properly :) [13:51:28] (trailing white-space in commit message) [13:51:31] Nothing's doable now :D [13:51:47] (should be covered by jenkins as well btw) [13:51:51] (as is all the formatting) [13:52:00] (03PS3) 10WMDE-Fisch: Disable wikidiff2 inline moved paragraphs on production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434697 (https://phabricator.wikimedia.org/T194271) [13:52:15] zeljkof: fixed .. stupid c&p [13:52:27] Urbanecm: will try ssh to terbium from another machine [13:53:27] zeljkof, ssh -v can help you with debugging. 
But please deploy the rest, it won't break anything but basic version [13:53:31] Which is fixable later [13:55:13] !log stopping db1065 database to move it to m2 [13:55:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:00] (03CR) 10Zfilipin: [C: 032] "zfilipin@terbium:~$ echo "https://en.wikipedia.org/static/images/project-logos/wikimania2018wiki.png" | mwscript purgeList.php" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434598 (https://phabricator.wikimedia.org/T194340) (owner: 10Urbanecm) [13:56:29] Urbanecm: ok, purged from another machine, works there [13:56:43] works [13:56:54] (03CR) 10Jcrespo: [C: 032] mariadb: Move db1065 from m1 to m2 [puppet] - 10https://gerrit.wikimedia.org/r/434698 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [13:56:56] Can you continue with the rest, please? [13:57:05] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434697 (https://phabricator.wikimedia.org/T194271) (owner: 10WMDE-Fisch) [13:57:22] CFisch_remote: merging 434697 [13:57:29] Urbanecm: ^ [13:57:32] zeljkof: thanks [13:57:38] ack [13:57:48] Urbanecm: will try to merge one more your patch after 434697 [13:58:26] Friendly reminder for zeljkof: addshore confirmed you can use the start of his window [13:58:34] His window starts in 2 minutes [13:58:35] (03Merged) 10jenkins-bot: Disable wikidiff2 inline moved paragraphs on production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434697 (https://phabricator.wikimedia.org/T194271) (owner: 10WMDE-Fisch) [13:59:08] Urbanecm: thanks, saw that, I'll finish up CFisch_remote's patches and do one more of yours, then I have to move on to other tasks [13:59:30] oh, ok... [14:00:04] addshore: Dear deployers, time to do the WikibaseLexeme backports deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T1400). 
[14:00:14] maybe addshore can take over :-) [14:00:17] Urbanecm: as much as I like SWAT, I can't do it all day... ;) [14:00:35] understood [14:00:36] (03PS5) 10Giuseppe Lavagetto: profile::mediawiki::mcrouter_wancache: add ssl, proxy support [puppet] - 10https://gerrit.wikimedia.org/r/431737 (https://phabricator.wikimedia.org/T192370) [14:00:43] addshore: please wait, deploying 434697 [14:01:03] !log zfilipin@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:434697|Disable wikidiff2 inline moved paragraphs by default (T194271)]] (duration: 01m 18s) [14:01:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:01:08] T194271: Change config to disable new mobile behavior on all wikis except beta - https://phabricator.wikimedia.org/T194271 [14:01:27] CFisch_remote: 434697 deployed, please check and thanks for deploying with #releng! ;) [14:01:40] addshore: do I have the time for one more swat patch? [14:01:43] yup [14:01:47] * addshore is eating ;) [14:01:52] addshore: great, thanks! :D [14:02:11] thanks again zeljkof [14:02:17] (03PS6) 10Zfilipin: Enable HD logos for wikimania2018wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434600 (https://phabricator.wikimedia.org/T194340) (owner: 10Urbanecm) [14:02:33] CFisch_remote: no problemo, it ended up being more complicated than usual :D [14:02:51] zeljkof: always :P [14:02:55] * CFisch_remote blames a bit addshore for that ;-) [14:03:14] yesterday someone, god forbid them, made an edit on metawiki in the middle of the swat and it caused ~20 mins of exceptions :P [14:04:18] <_joe_> addshore: uh? [14:05:04] _joe_: well an edit and marking a page for translation [14:05:21] _joe_: looks like that was the actual cause now, I updated the incident report & also there is a comment on the ticket [14:05:21] Urbanecm: sorry, now my interview went down :( I give up for today [14:05:28] (on the phone) [14:05:37] <_joe_> addshore: just marking it for translation? 
wow [14:05:42] !log EU SWAT finished [14:05:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:06:09] addshore, can you please deploy at least 434600? [14:06:23] *looks* [14:06:30] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review, and 2 others: Update and use php-wikidiff2 1.5.1 & MovedParagraphDetectionCutoff in production - https://phabricator.wikimedia.org/T177891#4225295 (10Lea_WMDE) [14:06:42] (I'd do it myself if I had the deployer privs [14:06:49] (03CR) 10Addshore: [C: 032] Enable HD logos for wikimania2018wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434600 (https://phabricator.wikimedia.org/T194340) (owner: 10Urbanecm) [14:06:56] I'll take over the last one [14:07:35] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review, and 2 others: Update and use php-wikidiff2 1.5.1 & MovedParagraphDetectionCutoff in production - https://phabricator.wikimedia.org/T177891#3674105 (10Lea_WMDE) 05Open>03Resolved Closing this ticket as we have passed 1.5.1 already. For 1... [14:08:01] (03Merged) 10jenkins-bot: Enable HD logos for wikimania2018wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434600 (https://phabricator.wikimedia.org/T194340) (owner: 10Urbanecm) [14:08:43] Urbanecm: its on mwdebug1002 [14:09:07] I don't have Retina...hard to test then. [14:09:15] ack [14:09:20] *checks the site works* [14:10:49] syncing [14:12:06] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:434600|Enable HD logos for wikimania2018wiki]] T194340 (duration: 01m 20s) [14:12:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:11] T194340: Change Wikimania2018 site logo - https://phabricator.wikimedia.org/T194340 [14:12:12] Urbanecm: done [14:12:16] thanks a lot [14:12:36] * addshore waits for his patches to merge [14:13:05] are we swatting now? 
/me is switching timezones [14:13:14] Hauskatze: just finished [14:13:26] jouncebot now [14:13:27] For the next 0 hour(s) and 46 minute(s): WikibaseLexeme backports (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T1400) [14:13:32] addshore: hmm, but morning swat is in an hour? [14:13:39] jouncebot: next [14:13:39] In 0 hour(s) and 46 minute(s): test.wikidata.org dispatching lock manager change (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T1500) [14:13:45] Hauskatze, no, it is not [14:13:46] Hauskatze: yes, and that hour finished 13 mins ago! [14:13:52] addshore, you're wrong [14:14:10] Morning SWAT is at 18:00 UTC [14:14:18] EU SWAT is at 13:00 UTC [14:14:22] EU SWAT finished [14:14:27] ack, the other morning swat :P [14:14:36] I am waiting for morning SWAT [14:14:48] Hauskatze: not for 3 hours I don't think [14:14:51] (03PS1) 10Ema: VCL: move RB Accept header normalization to text-fe [puppet] - 10https://gerrit.wikimedia.org/r/434706 [14:15:10] Hauskatze, morning swat is at every workday but tuesday and wednesday [14:15:11] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4225335 (10Lea_WMDE) [14:15:16] 17:00 utc wikitech dixit [14:15:23] (03CR) 10jerkins-bot: [V: 04-1] VCL: move RB Accept header normalization to text-fe [puppet] - 10https://gerrit.wikimedia.org/r/434706 (owner: 10Ema) [14:15:32] Hauskatze, so there's no morning swat at wednesdays [14:15:39] Urbanecm: https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T1700 ?? [14:15:51] Oh... 
[14:15:53] It's MOVED [14:15:58] I'm sorry [14:16:00] 10Operations, 10Traffic, 10netops: cp intermittent IPsec MTU issue - https://phabricator.wikimedia.org/T195365#4225342 (10ayounsi) [14:16:25] okay so I'm not late for my swat window because it's not started yet, that's why I wanted to know [14:16:41] s/why/what [14:17:43] (03PS2) 10Ema: VCL: move RB Accept header normalization to text-fe [puppet] - 10https://gerrit.wikimedia.org/r/434706 [14:19:57] (03CR) 10Urbanecm: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434359 (https://phabricator.wikimedia.org/T195247) (owner: 10Zoranzoki21) [14:21:16] 10Operations, 10Traffic, 10netops: cp intermittent IPsec MTU issue - https://phabricator.wikimedia.org/T195365#4225379 (10BBlack) I don't have complete thoughts, but keep in mind in general it's complicated to go changing our actual host interface MTUs to anything larger than 1500 ("jumbo frames"), for a few... [14:21:34] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4225381 (10WMDE-Fisch) Config change is done and deployed ... even though it's just a minor thing we had some confusion and problems with tha... 
[14:22:16] (03CR) 10jenkins-bot: Change logo assets for wikimania2018wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434598 (https://phabricator.wikimedia.org/T194340) (owner: 10Urbanecm) [14:22:21] (03CR) 10jenkins-bot: Disable wikidiff2 inline moved paragraphs on production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434697 (https://phabricator.wikimedia.org/T194271) (owner: 10WMDE-Fisch) [14:22:26] (03CR) 10jenkins-bot: Enable HD logos for wikimania2018wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434600 (https://phabricator.wikimedia.org/T194340) (owner: 10Urbanecm) [14:24:56] (03PS3) 10Ottomata: Redirect pivot -> turnilo [puppet] - 10https://gerrit.wikimedia.org/r/434499 (https://phabricator.wikimedia.org/T194427) [14:25:03] (03CR) 10Ottomata: [V: 032 C: 032] Redirect pivot -> turnilo [puppet] - 10https://gerrit.wikimedia.org/r/434499 (https://phabricator.wikimedia.org/T194427) (owner: 10Ottomata) [14:25:39] (03PS3) 10Mark Bergsma: Avoid Deferred.cancel() induced CancelledErrors [debs/pybal] - 10https://gerrit.wikimedia.org/r/433364 [14:25:41] (03PS4) 10Mark Bergsma: Handle HTTP status 302 and 303 as well as 301 [debs/pybal] - 10https://gerrit.wikimedia.org/r/430393 (https://phabricator.wikimedia.org/T102393) [14:25:43] (03PS4) 10Mark Bergsma: Add full unit test coverage of IdleConnection [debs/pybal] - 10https://gerrit.wikimedia.org/r/433341 [14:25:45] (03PS3) 10Mark Bergsma: Cleanup monitor shutdown handler (invoking stop) after run [debs/pybal] - 10https://gerrit.wikimedia.org/r/433369 [14:25:47] (03PS3) 10Mark Bergsma: Split monitor tests into separate modules [debs/pybal] - 10https://gerrit.wikimedia.org/r/433370 [14:25:49] (03PS3) 10Mark Bergsma: Extend unit testing of RunCommand [debs/pybal] - 10https://gerrit.wikimedia.org/r/433702 [14:25:51] (03PS3) 10Mark Bergsma: Use passed-in reactor in all monitors [debs/pybal] - 10https://gerrit.wikimedia.org/r/434684 [14:25:53] (03PS3) 10Mark Bergsma: Use MemoryReactorClock for 
monitor unit tests and adopt tests [debs/pybal] - 10https://gerrit.wikimedia.org/r/434685 [14:25:55] (03PS2) 10Mark Bergsma: Adapt ProxyFetch tests to use tcpClients and sslClients [debs/pybal] - 10https://gerrit.wikimedia.org/r/434695 [14:27:02] 10Operations, 10Discovery, 10Maps: disk usage increase on maps servers - https://phabricator.wikimedia.org/T194966#4225438 (10Gehel) I see no significant drop in disk usage since enabling `unchecked_tombstone_compaction`. But disk usage remains stable, which means we are probably OK, at least for the short t... [14:27:17] * vgutierrez hides [14:28:08] (03CR) 10Volans: "To be on the safe side I've tested across the whole fleet the ruby code. It works everywhere except on Trusty, where it was already failin" [puppet] - 10https://gerrit.wikimedia.org/r/434032 (owner: 10Volans) [14:28:10] (03CR) 10Mark Bergsma: Avoid Deferred.cancel() induced CancelledErrors (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/433364 (owner: 10Mark Bergsma) [14:28:18] !log addshore@tin Started scap: WikibaseLexeme: Do not refer to the spelling variant as language, T193603, [[gerrit:434691|Patch 1]], [[gerrit:434690|Patch 2]] [14:28:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:22] T193603: [UI] Change title of the field of a lemma language - https://phabricator.wikimedia.org/T193603 [14:29:36] * addshore waits [14:30:56] !log installing curl security updates on mediawiki canaries along with HHVM restart to pick up new library version [14:31:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:43:19] PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds [14:43:29] PROBLEM - Disk space on stat1005 is CRITICAL: Return code of 255 is out of bounds [14:43:29] PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of bounds [14:43:39] PROBLEM - configured eth on stat1005 is CRITICAL: Return code of 255 is out of bounds 
[14:44:00] PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds [14:44:09] PROBLEM - Check systemd state on stat1005 is CRITICAL: Return code of 255 is out of bounds [14:44:30] PROBLEM - SSH on stat1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:44:50] PROBLEM - puppet last run on stat1005 is CRITICAL: Return code of 255 is out of bounds [14:46:17] I guess that somebody is crunching data [14:46:19] checking [14:48:30] RECOVERY - SSH on stat1005 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u3 (protocol 2.0) [14:49:29] (03PS1) 10ArielGlenn: Monitor dump output file production [puppet] - 10https://gerrit.wikimedia.org/r/434709 [14:50:04] (03CR) 10jenkins-bot: [V: 04-1] Monitor dump output file production [puppet] - 10https://gerrit.wikimedia.org/r/434709 (owner: 10ArielGlenn) [14:50:50] PROBLEM - Check the NTP synchronisation status of timesyncd on stat1005 is CRITICAL: Return code of 255 is out of bounds [14:52:48] !log restart db2078 for upgrade and to convert it to multiinstance [14:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:34] (03PS1) 10Reedy: Remove pear packages from MW Application Servers [puppet] - 10https://gerrit.wikimedia.org/r/434710 (https://phabricator.wikimedia.org/T195364) [14:53:47] (03PS1) 10Jcrespo: mariadb: Convert db2078 to multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434711 [14:54:34] (03PS2) 10ArielGlenn: Monitor dump output file production [puppet] - 10https://gerrit.wikimedia.org/r/434709 [14:55:35] (03PS2) 10Jcrespo: mariadb: Convert db2078 to multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434711 [14:55:37] (03CR) 10Reedy: Remove pear packages from MW Application Servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/434710 (https://phabricator.wikimedia.org/T195364) (owner: 10Reedy) [14:55:49] RECOVERY - DPKG on stat1005 is OK: All packages OK [14:55:50] RECOVERY - Check systemd state on stat1005 is OK: OK - running: The 
system is fully operational [14:56:01] 10Operations, 10Patch-For-Review: Ship host syslogs to ELK - https://phabricator.wikimedia.org/T193766#4225600 (10herron) >>! In T193766#4193319, @Gehel wrote: > * We already have syslog messages coming in. We should probably ensure that moving those to different indices is as transparent as possible. My under... [14:56:10] RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient [14:56:19] RECOVERY - Disk space on stat1005 is OK: DISK OK [14:56:19] RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [14:56:29] RECOVERY - configured eth on stat1005 is OK: OK - interfaces up [14:56:43] (03CR) 10Jcrespo: [C: 032] mariadb: Convert db2078 to multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/434711 (owner: 10Jcrespo) [14:59:31] 10Operations, 10Traffic, 10Wikimedia-Hackathon-2018: Create and deploy a centralized letsencrypt service - https://phabricator.wikimedia.org/T194962#4225611 (10BBlack) >>! In T194962#4221028, @Krenair wrote: >>>! In T194962#4217296, @BBlack wrote: >> * The reference earlier to emulating the puppet fileserver... [15:00:00] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [15:00:04] addshore: How many deployers does it take to do test.wikidata.org dispatching lock manager change deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T1500). [15:00:32] * addshore continues [15:00:48] I actually think this single scap will end up taking both the slots, oh well [15:02:07] addshore, it supposed to be displayed like https://ctrlv.cz/QyTQ? [15:02:16] *is it [15:03:37] that's https://phabricator.wikimedia.org/T195359 [15:03:52] thats the one! 
[15:04:02] (03PS3) 10ArielGlenn: Monitor dump output file production [puppet] - 10https://gerrit.wikimedia.org/r/434709 [15:05:06] addshore, Nikerabbit, thank you both [15:15:56] (03PS1) 10Herron: puppet-agent: remove --show_diff from scheduled puppet-run script [puppet] - 10https://gerrit.wikimedia.org/r/434719 (https://phabricator.wikimedia.org/T1) [15:24:00] (03PS1) 10Cmjohnson: Adding dns entry for new snapshot1008 [dns] - 10https://gerrit.wikimedia.org/r/434721 (https://phabricator.wikimedia.org/T195385) [15:24:03] 10Operations, 10ops-eqiad, 10Datasets-General-or-Unknown, 10Dumps-Generation, and 2 others: Rack and setup snapshot1008 - https://phabricator.wikimedia.org/T195385#4225718 (10Cmjohnson) p:05Triage>03Normal [15:25:57] (03CR) 10Cmjohnson: [C: 032] Adding dns entry for new snapshot1008 [dns] - 10https://gerrit.wikimedia.org/r/434721 (https://phabricator.wikimedia.org/T195385) (owner: 10Cmjohnson) [15:26:14] (03PS1) 10Ladsgroup: ores: Add hunspell-sr library [puppet] - 10https://gerrit.wikimedia.org/r/434723 (https://phabricator.wikimedia.org/T195063) [15:30:56] !log stop db2044 for cloning to db2078 + upgrade [15:31:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:31:32] proxies should complain in a minute [15:31:33] (03PS2) 10Muehlenhoff: Enable microcode updates for Ganeti hosts [puppet] - 10https://gerrit.wikimedia.org/r/434676 [15:32:10] (03CR) 10Muehlenhoff: [C: 032] Enable microcode updates for Ganeti hosts [puppet] - 10https://gerrit.wikimedia.org/r/434676 (owner: 10Muehlenhoff) [15:32:13] (03CR) 10Alexandros Kosiaris: [C: 032] ores: Add hunspell-sr library [puppet] - 10https://gerrit.wikimedia.org/r/434723 (https://phabricator.wikimedia.org/T195063) (owner: 10Ladsgroup) [15:32:18] (03PS2) 10Alexandros Kosiaris: ores: Add hunspell-sr library [puppet] - 10https://gerrit.wikimedia.org/r/434723 (https://phabricator.wikimedia.org/T195063) (owner: 10Ladsgroup) [15:32:20] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] 
ores: Add hunspell-sr library [puppet] - 10https://gerrit.wikimedia.org/r/434723 (https://phabricator.wikimedia.org/T195063) (owner: 10Ladsgroup) [15:33:37] PROBLEM - haproxy failover on dbproxy1007 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:34:07] PROBLEM - haproxy failover on dbproxy1002 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:36:30] 10Operations, 10Code-Stewardship-Reviews, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4225795 (10akosiaris) >>! In T187194#4224971, @Tgr wrote: > //"The translators repository has had 6 commits, 5 of them updating from upstream"// - n... [15:39:17] RECOVERY - Check the NTP synchronisation status of timesyncd on stat1005 is OK: OK: synced at Wed 2018-05-23 15:39:09 UTC. [15:39:18] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1983 bytes in 0.133 second response time [15:41:51] (03PS3) 10Gehel: Convert string fields to text in apifeatureusage [puppet] - 10https://gerrit.wikimedia.org/r/430250 (https://phabricator.wikimedia.org/T192614) (owner: 10EBernhardson) [15:42:43] (03CR) 10Gehel: [C: 032] Convert string fields to text in apifeatureusage [puppet] - 10https://gerrit.wikimedia.org/r/430250 (https://phabricator.wikimedia.org/T192614) (owner: 10EBernhardson) [15:44:27] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1960 bytes in 0.099 second response time [15:47:54] shhhh wikidata [15:51:03] (03CR) 10Mark Bergsma: [C: 032] Avoid Deferred.cancel() induced CancelledErrors [debs/pybal] - 10https://gerrit.wikimedia.org/r/433364 (owner: 10Mark Bergsma) [15:51:38] (03Merged) 10jenkins-bot: Avoid Deferred.cancel() induced CancelledErrors [debs/pybal] - 10https://gerrit.wikimedia.org/r/433364 (owner: 10Mark Bergsma) [15:53:42] !log Running populateExternallinksIndex60.php 
on group 0 for T59176 [15:53:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:47] T59176: ApiQueryExtLinksUsage::run query has crazy limit - https://phabricator.wikimedia.org/T59176 [15:58:07] (03CR) 10Ema: VCL: Normalise the Accept-Language header for the REST API (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/434558 (https://phabricator.wikimedia.org/T195327) (owner: 10Mobrovac) [16:01:20] (03PS1) 10Cmjohnson: Adding dhcpd snapshot1008 [puppet] - 10https://gerrit.wikimedia.org/r/434726 (https://phabricator.wikimedia.org/T195385) [16:01:50] (03CR) 10Cmjohnson: [C: 032] Adding dhcpd snapshot1008 [puppet] - 10https://gerrit.wikimedia.org/r/434726 (https://phabricator.wikimedia.org/T195385) (owner: 10Cmjohnson) [16:04:13] (03CR) 10Chelsyx: Blacklisting new iOS eventlogging schemas on MySQL (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/434424 (https://phabricator.wikimedia.org/T192819) (owner: 10Chelsyx) [16:04:54] (03PS2) 10Dzahn: keyholder: base::service_unit -> systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/434535 (https://phabricator.wikimedia.org/T194724) [16:05:14] 10Operations, 10ops-eqiad, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests: Rack and setup snapshot1008 - https://phabricator.wikimedia.org/T195385#4225942 (10Cmjohnson) [16:05:47] (03PS4) 10Dzahn: otrs: base::service_unit -> systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/433491 (https://phabricator.wikimedia.org/T194724) [16:06:47] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1973 bytes in 0.084 second response time [16:10:19] 10Operations, 10ops-eqiad, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests: Rack and setup snapshot1008 - https://phabricator.wikimedia.org/T195385#4225960 (10Cmjohnson) @robh the server is racked and setup (dns, switch, idrac, racktables, etc) .....I 
did the dhcpd file but can you... [16:10:21] !log Running deduplicateArchiveRevId.php on group 0 for T193180 [16:10:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:10:26] T193180: Clean up archive rows with duplicate revision IDs - https://phabricator.wikimedia.org/T193180 [16:10:51] 10Operations, 10ops-eqiad, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests: Rack and setup snapshot1008 - https://phabricator.wikimedia.org/T195385#4225963 (10Cmjohnson) [16:11:48] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1972 bytes in 0.082 second response time [16:11:56] (03PS2) 10Dzahn: sentry: base::service_unit -> systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/434539 (https://phabricator.wikimedia.org/T194724) [16:11:56] !log addshore@tin Finished scap: WikibaseLexeme: Do not refer to the spelling variant as language, T193603, [[gerrit:434691|Patch 1]], [[gerrit:434690|Patch 2]] (duration: 103m 38s) [16:12:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:01] T193603: [UI] Change title of the field of a lemma language - https://phabricator.wikimedia.org/T193603 [16:12:26] finally [16:12:27] PROBLEM - MariaDB Slave Lag: s3 on db2036 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.60 seconds [16:12:29] * addshore is done [16:12:34] (03CR) 10Dzahn: "not adding "restart => true" here because refresh was set to false explicitly before" [puppet] - 10https://gerrit.wikimedia.org/r/434538 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [16:12:38] PROBLEM - MariaDB Slave Lag: s3 on db2057 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.68 seconds [16:12:38] PROBLEM - MariaDB Slave Lag: s3 on db2043 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.70 seconds [16:12:47] PROBLEM - MariaDB Slave Lag: s3 on db2074 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 305.70 seconds 
[16:12:47] PROBLEM - MariaDB Slave Lag: s3 on db2050 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 307.08 seconds [16:14:51] (03CR) 10Mark Bergsma: [C: 032] Handle HTTP status 302 and 303 as well as 301 [debs/pybal] - 10https://gerrit.wikimedia.org/r/430393 (https://phabricator.wikimedia.org/T102393) (owner: 10Mark Bergsma) [16:16:35] ^marostegui is any alter table ongoing on codfw? [16:16:59] (03Merged) 10jenkins-bot: Handle HTTP status 302 and 303 as well as 301 [debs/pybal] - 10https://gerrit.wikimedia.org/r/430393 (https://phabricator.wikimedia.org/T102393) (owner: 10Mark Bergsma) [16:17:29] jynus: I wonder if it's the maintenance script I'm running. Does a maintenance script running on terbium wait for all replicas, or just eqiad replicas? [16:17:32] 10Operations, 10ops-eqiad, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests: Rack and setup snapshot1008 - https://phabricator.wikimedia.org/T195385#4225981 (10RobH) [16:17:38] just eqiad [16:17:51] it is ok, then [16:18:07] Not on s3 [16:18:09] until we are fully active-active of course [16:18:45] jynus: nothing from my side running at the moment [16:19:30] (03CR) 10Mark Bergsma: [C: 032] Add full unit test coverage of IdleConnection [debs/pybal] - 10https://gerrit.wikimedia.org/r/433341 (owner: 10Mark Bergsma) [16:20:09] (03Merged) 10jenkins-bot: Add full unit test coverage of IdleConnection [debs/pybal] - 10https://gerrit.wikimedia.org/r/433341 (owner: 10Mark Bergsma) [16:20:43] anomie: how long will it take? 
[16:21:43] (03CR) 10Mark Bergsma: [C: 031] Cleanup monitor shutdown handler (invoking stop) after run (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/433369 (owner: 10Mark Bergsma) [16:21:56] (03PS3) 10Krinkle: Drop $wgTitle usages from robots.txt and extract2.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428843 (owner: 10Chad) [16:22:19] (03CR) 10Krinkle: [C: 032] Drop $wgTitle usages from robots.txt and extract2.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428843 (owner: 10Chad) [16:22:24] jynus: I'm not sure. It's running on group 0, up to testwiki right now. Most of the wikis go pretty fast, but there are a few with lots of rows. [16:22:42] (03CR) 10Dzahn: "@Alex, yea we don't know what actually does the compression. It works in production but this change breaks it: https://gerrit.wikimedia.or" [puppet] - 10https://gerrit.wikimedia.org/r/434605 (owner: 10Paladox) [16:22:52] (03CR) 10Dzahn: [C: 04-1] Gerrit: Fix log4j rotating files [puppet] - 10https://gerrit.wikimedia.org/r/434605 (owner: 10Paladox) [16:23:44] (03Merged) 10jenkins-bot: Drop $wgTitle usages from robots.txt and extract2.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428843 (owner: 10Chad) [16:24:06] 10Operations, 10ops-eqiad, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests: Rack and setup snapshot1008 - https://phabricator.wikimedia.org/T195385#4226012 (10RobH) So raid1-lvm-ext4-srv-noswap.cfg looks like it will work well for this. Discussed with @ArielGlenn via IRC as well. 
[16:24:07] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1967 bytes in 0.090 second response time [16:24:25] 10Operations, 10ops-eqiad, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests: Rack and setup snapshot1008 - https://phabricator.wikimedia.org/T195385#4226014 (10RobH) a:05Cmjohnson>03RobH [16:24:32] (03CR) 10Dzahn: [C: 04-1] "we both confirmed log4j should not be able to do it without "extras". and we dont see a cron doing it.. must be Gerrit.. but where does th" [puppet] - 10https://gerrit.wikimedia.org/r/434605 (owner: 10Paladox) [16:24:58] jynus: It finished now. [16:27:02] (03CR) 10Dzahn: [C: 04-1] "of course we could just add a second cron in puppet that actually runs the gzip and forget about Gerrit doing it.. Gerrit probably doesn't" [puppet] - 10https://gerrit.wikimedia.org/r/434605 (owner: 10Paladox) [16:28:16] chasemp I am working on labvirt1019 and 1020....moving them to 10G and connecting second cable [16:29:28] (03CR) 10jenkins-bot: Drop $wgTitle usages from robots.txt and extract2.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428843 (owner: 10Chad) [16:30:39] jynus: I was planning on running the same script on group 1 tomorrow, probably starting around 13:00 or 13:30 UTC. [16:31:01] !log starting wikidata full reindex for T163642 [16:31:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:31:05] T163642: Index Wikidata strings in statements in the search engine - https://phabricator.wikimedia.org/T163642 [16:32:40] (03PS4) 10ArielGlenn: Monitor dump output file production [puppet] - 10https://gerrit.wikimedia.org/r/434709 [16:33:07] 10Operations, 10Code-Stewardship-Reviews, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4226047 (10Tgr) That number doesn't really mean anything though. 
The latest mediawiki/services/zotero/translators commit for example bundles 213 ups... [16:33:17] PROBLEM - Host labvirt1019 is DOWN: PING CRITICAL - Packet loss = 100% [16:33:28] 10Operations, 10Move-Files-To-Commons, 10TCB-Team, 10Wikimedia-Extension-setup, and 2 others: Deploying FileExporter and FileImporter - https://phabricator.wikimedia.org/T190716#4226051 (10Joe) Shortly: - It's ok to reevaluate the limits after we deploy, 250 MB is not a huge amount of data and will save us... [16:34:30] jynus: Then group 2 same time Friday. enwiki will probably take a really long time. [16:34:41] please no friday [16:36:27] RECOVERY - MariaDB Slave Lag: s3 on db2043 is OK: OK slave_sql_lag Replication lag: 36.01 seconds [16:36:28] RECOVERY - MariaDB Slave Lag: s3 on db2057 is OK: OK slave_sql_lag Replication lag: 37.01 seconds [16:36:28] RECOVERY - MariaDB Slave Lag: s3 on db2074 is OK: OK slave_sql_lag Replication lag: 8.80 seconds [16:36:37] RECOVERY - MariaDB Slave Lag: s3 on db2050 is OK: OK slave_sql_lag Replication lag: 0.04 seconds [16:37:18] RECOVERY - MariaDB Slave Lag: s3 on db2036 is OK: OK slave_sql_lag Replication lag: 0.24 seconds [16:38:27] it will likely take a long time, start it on monday so we can monitor any issues during the weekday [16:38:48] Fine, I can leave it til Monday or Tuesday (Monday is a US holiday). Although depending on how big other group-2 wikis are I can't promise it won't be running all week until *next* Friday. [16:39:03] that is ok [16:39:07] tuesday is ok [16:39:09] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4226071 (10Lea_WMDE) @MoritzMuehlenhoff we found a bug. Could you give us tomorrow (Thursday) to find out if that means if we should postpone... 
[16:39:16] and 7 days is ok, too [16:39:27] the main issue is the start/middle [16:40:18] we can also massage the table on codfw in advance to prevent alerts [16:40:43] I have to wait for the train to run for the maintenance script to be deployed places, and by then it's already Thursday evening for group 2 :( [16:41:15] I understand, but if it is not an emergency, I would wait until the following week [16:41:37] and that is assuming it doesn't get reverted [16:51:37] here come back the proxies [16:52:07] RECOVERY - haproxy failover on dbproxy1002 is OK: OK check_failover servers up 2 down 0 [16:52:37] RECOVERY - haproxy failover on dbproxy1007 is OK: OK check_failover servers up 2 down 0 [16:54:02] (03CR) 10Nuria: Blacklisting new iOS eventlogging schemas on MySQL (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/434424 (https://phabricator.wikimedia.org/T192819) (owner: 10Chelsyx) [16:55:40] 10Operations, 10Analytics, 10Research, 10Traffic, and 6 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#4226106 (10Nuria) Latest data about this show internal referrers for safari creeping up from Late march: {F18492888} [16:59:27] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1940 bytes in 0.105 second response time [17:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Morning SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T1700). [17:00:04] Amir1, Nikerabbit, and Hauskatze: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[17:00:11] o/ [17:00:22] o/ [17:00:41] Not testable but we should keep looking at logs for ~10 minutes [17:00:59] My patch is testable on mwdebug{whatever} [17:02:06] Mine should be too [17:02:46] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4226127 (10MoritzMuehlenhoff) @Lea_WMDE : Sure, no problem! [17:04:06] "Not testable" - my favorite kind of code [17:04:12] xD [17:04:34] * Hauskatze afk for a minute-ish [17:04:50] I suppose I can deploy unless someone else wants to [17:05:17] (03PS1) 10Cmjohnson: Updating labvirt1019 mac [puppet] - 10https://gerrit.wikimedia.org/r/434737 (https://phabricator.wikimedia.org/T194964) [17:05:49] Amir1: merging yours first [17:05:52] (03CR) 1020after4: [C: 032] Make clients read from term_text instead of term_search_key [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434658 (https://phabricator.wikimedia.org/T194273) (owner: 10Ladsgroup) [17:06:19] (03CR) 1020after4: [C: 032] "Merging for SWAT deployment..." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434658 (https://phabricator.wikimedia.org/T194273) (owner: 10Ladsgroup) [17:06:30] (03CR) 10Cmjohnson: [C: 032] Updating labvirt1019 mac [puppet] - 10https://gerrit.wikimedia.org/r/434737 (https://phabricator.wikimedia.org/T194964) (owner: 10Cmjohnson) [17:06:33] (03CR) 10Krinkle: [C: 031] "I've added some links to the task description about previewing the survey and the EventLogging input." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434641 (https://phabricator.wikimedia.org/T187299) (owner: 10Gilles) [17:06:37] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1970 bytes in 0.116 second response time [17:07:08] uhm, what's that mean? 
^ [17:07:14] (03Merged) 10jenkins-bot: Make clients read from term_text instead of term_search_key [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434658 (https://phabricator.wikimedia.org/T194273) (owner: 10Ladsgroup) [17:08:48] no_justification: I see Ib1c6d67666814ea5383493401d24588abfe2577b is merged and undeployed, should I sync it with SWAT? [17:09:23] Amir1: which logs should I be monitoring? [17:10:22] twentyafterfour: fatals [17:10:28] (03PS1) 10Jcrespo: mariadb: Set up db1117:3325 as the backup host for m5 database section [puppet] - 10https://gerrit.wikimedia.org/r/434740 (https://phabricator.wikimedia.org/T192979) [17:10:37] and performance regressions (specially on s8) [17:12:03] Amir1: can you watch performance? I've got fatalmonitor covered [17:12:27] * twentyafterfour syncs it [17:13:18] no_justification: twentyafterfour: woops sorry, forgot to press return on the scap sync command [17:13:24] Have it in my shell for the past hour :/ [17:13:40] It's two simple file changes, fine to roll out now or later. Sorry about that [17:13:45] Already tested on mwdebug [17:13:51] (yesterday) [17:14:12] Krinkle: if you want to deploy that next I'll be done syncing shortly [17:14:19] twentyafterfour: sure [17:14:21] OK. [17:14:27] !log twentyafterfour@tin Synchronized wmf-config/Wikibase-production.php: SWAT deploying https://gerrit.wikimedia.org/r/#/c/434658/ refs T194273 (duration: 01m 20s) [17:14:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:32] T194273: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273 [17:14:36] If you need a full scap, I'll revert for now, otherwise, let's keep it in staging and I'll roll out after this [17:14:56] I just did sync-file, you can go next [17:15:12] K [17:15:17] Amir1: no new fatals that I can see [17:16:23] twentyafterfour: can I have a 'submit' at https://gerrit.wikimedia.org/r/#/c/434742/ for new repo setup? 
[17:16:38] twentyafterfour: the number of rows read has increased [17:16:55] if would be great if we wait for five minutes [17:17:09] !log krinkle@tin Synchronized w/extract2.php: Ib1c6d676 - Bye, wgTitle (duration: 01m 19s) [17:17:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:17:42] Amir1: ok, let me know what you want me to do [17:17:47] Hauskatze: ok [17:17:59] twentyafterfour: it's back to normal now [17:18:06] https://grafana.wikimedia.org/dashboard/db/mysql-aggregated?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-group=core&var-shard=s8&var-role=All&from=now-3h&to=now [17:18:08] (03PS1) 10Bstorm: toolforge: maintain-kubeusers did not page through ldap [puppet] - 10https://gerrit.wikimedia.org/r/434744 [17:18:25] wow that was quite a spike [17:18:25] thanks [17:18:36] !log krinkle@tin Synchronized w/robots.php: Ib1c6d676 - Bye, wgTitle (duration: 01m 20s) [17:18:38] * Krinkle unlocks scap [17:18:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:18:42] twentyafterfour: done (both) and confirmed in prod. Sorry again :) [17:18:49] Krinkle: no problem \ [17:18:53] a scary one :D [17:18:55] Thanks! [17:19:19] Amir1: so we're good to move on to the next one? [17:20:04] Amir1: worst response time seems to be up and staying up? 
[17:20:26] eh I guess that started before the spike [17:20:56] twentyafterfour: let me double check worst response time [17:21:23] Nikerabbit: looks like you're next [17:21:27] okay [17:22:44] twentyafterfour: I don't see any increase in worst response time [17:22:52] Amir1: ok cool [17:23:21] Amir1: I think we're good, still no new fatals [17:23:23] please move forward, thank you a lot [17:26:04] (03CR) 10jenkins-bot: Make clients read from term_text instead of term_search_key [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434658 (https://phabricator.wikimedia.org/T194273) (owner: 10Ladsgroup) [17:26:44] (03CR) 1020after4: [C: 032] "Merging for SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433546 (https://phabricator.wikimedia.org/T194871) (owner: 10MarcoAurelio) [17:27:03] Nikerabbit: still waiting for CI [17:27:28] ~4 minutes remaining, says zuul [17:27:50] ack [17:28:16] if you can then sync wmf.4 to mwdebug so that I can test it on wikidata [17:33:30] 10Operations, 10DBA, 10Patch-For-Review: Rack and setup db1116 - db1123 - https://phabricator.wikimedia.org/T191792#4226176 (10Marostegui) [17:34:39] 10Operations, 10MediaWiki-Platform-Team, 10HHVM, 10User-ArielGlenn: Run all jobs on PHP7 - https://phabricator.wikimedia.org/T195392#4226179 (10Jdforrester-WMF) p:05Triage>03Normal [17:35:00] 10Operations, 10MediaWiki-Platform-Team, 10HHVM, 10User-ArielGlenn: Run all jobs on PHP7 or HHVM - https://phabricator.wikimedia.org/T195393#4226190 (10Jdforrester-WMF) p:05Triage>03Normal [17:35:56] 10Operations: rename wasat to mwmaint2001 and reinstall it with stretch - https://phabricator.wikimedia.org/T193915#4226202 (10Jdforrester-WMF) [17:35:59] 10Operations, 10MediaWiki-Platform-Team, 10HHVM, 10User-ArielGlenn: Run all jobs on PHP7 or HHVM - https://phabricator.wikimedia.org/T195393#4226190 (10Jdforrester-WMF) [17:36:02] 10Operations, 10Patch-For-Review: setup replacement for terbium (maintenance_server) on stretch - 
https://phabricator.wikimedia.org/T192092#4226203 (10Jdforrester-WMF) [17:37:27] now merged [17:38:47] twentyafterfour: let me know when testable [17:39:53] Nikerabbit: it should be ready on mwdebug1001 [17:40:04] twentyafterfour: ok, testing [17:40:33] (03PS4) 1020after4: zhwiki: let 'accountcreators' to self-remove their permissions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433546 (https://phabricator.wikimedia.org/T194871) (owner: 10MarcoAurelio) [17:41:02] https://www.wikidata.org/w/index.php?title=Wikidata:News/cs&diff=prev&oldid=683956646 [17:41:09] I confirm it works as expected [17:41:34] Nikerabbit: sweet, syncing [17:42:24] twentyafterfour: I don't have separate test for wmf.5, I suggest just go ahead and sync that right after this [17:42:45] 10Operations, 10MediaWiki-Platform-Team, 10HHVM, 10User-ArielGlenn: Run all jobs on PHP7 or HHVM - https://phabricator.wikimedia.org/T195393#4226212 (10Jdforrester-WMF) [17:44:04] (03PS2) 10Bstorm: toolforge: maintain-kubeusers did not page through ldap [puppet] - 10https://gerrit.wikimedia.org/r/434744 [17:44:06] Nikerabbit: yeah I'm syncing them both [17:44:09] (03PS2) 10Dzahn: add deploy2001 to site.pp with deployment_server role [puppet] - 10https://gerrit.wikimedia.org/r/433638 (https://phabricator.wikimedia.org/T193916) [17:44:47] yay [17:45:10] !log twentyafterfour@tin Synchronized php-1.32.0-wmf.5/extensions/Translate/MessageCollection.php: syncing https://gerrit.wikimedia.org/r/#/c/434699/ refs T195347 (duration: 01m 20s) [17:45:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:45:15] T195347: Change to a single translation unit refreshed the whole page with an obsolete content - https://phabricator.wikimedia.org/T195347 [17:45:24] (03CR) 10Bstorm: "Moving style and format changes to another change." 
[puppet] - 10https://gerrit.wikimedia.org/r/434744 (owner: 10Bstorm) [17:45:29] (03CR) 10Dzahn: [C: 032] add deploy2001 to site.pp with deployment_server role [puppet] - 10https://gerrit.wikimedia.org/r/433638 (https://phabricator.wikimedia.org/T193916) (owner: 10Dzahn) [17:45:36] (03PS3) 10Bstorm: toolforge: maintain-kubeusers did not page through ldap [puppet] - 10https://gerrit.wikimedia.org/r/434744 [17:46:29] (03PS4) 10Bstorm: toolforge: maintain-kubeusers did not page through ldap [puppet] - 10https://gerrit.wikimedia.org/r/434744 [17:46:50] Nikerabbit: that should do it. Up next is https://gerrit.wikimedia.org/r/#/c/433546/ [17:47:09] !log twentyafterfour@tin Synchronized php-1.32.0-wmf.4/extensions/Translate/MessageCollection.php: syncing https://gerrit.wikimedia.org/r/#/c/434700/ refs T195347 (duration: 01m 19s) [17:47:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:47:18] (03CR) 10Bstorm: [C: 032] toolforge: maintain-kubeusers did not page through ldap [puppet] - 10https://gerrit.wikimedia.org/r/434744 (owner: 10Bstorm) [17:47:34] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916#4226224 (10Dzahn) [17:47:52] twentyafterfour: thanks, that went smoothly! [17:48:13] twentyafterfour: I see my patch is merged on gerrit but it's not sync yet right? [17:48:18] Nikerabbit: you're welcome, glad it did [17:48:32] Hauskatze: I just pulled it to mwdebug1001. Can you test? 
[17:48:48] twentyafterfour: sure, doing
[17:49:21] (CR) Dzahn: [C: -1] "i should first remove naos, then reinstall it, then add deploy2001" [puppet] - https://gerrit.wikimedia.org/r/433616 (https://phabricator.wikimedia.org/T193916) (owner: Dzahn)
[17:49:53] twentyafterfour: looks good to me
[17:50:03] Hauskatze: ok syncing globally
[17:50:08] thanks
[17:50:38] Krinkle: got a second for https://gerrit.wikimedia.org/r/#/c/434749/ ?
[17:50:49] it's a mediawiki/extensions .gitmodules update
[17:52:28] !log twentyafterfour@tin Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/#/c/433546/ for SWAT refs T194871 (duration: 01m 19s)
[17:52:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:52:33] T194871: Allow the user with account creator flag to remove the flag themselve on zh.wikipedia - https://phabricator.wikimedia.org/T194871
[17:52:49] Hauskatze: done. Can you confirm without mwdebug?
[17:53:07] re-checking
[17:53:09] (CR) jenkins-bot: zhwiki: let 'accountcreators' to self-remove their permissions [mediawiki-config] - https://gerrit.wikimedia.org/r/433546 (https://phabricator.wikimedia.org/T194871) (owner: MarcoAurelio)
[17:53:49] twentyafterfour: yep, accountcreators are now $wgRemoveFromSelf as well
[17:54:38] !log Finished SWATting, thanks everyone!
[17:54:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:42] purrfehktz
[17:56:15] (PS1) Bstorm: toolforge: refactor some python in maintain-kubeusers [puppet] - https://gerrit.wikimedia.org/r/434752
[17:56:31] jouncebot: next
[17:56:31] In 0 hour(s) and 3 minute(s): Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T1800)
[17:56:31] (PS1) Dzahn: remove naos as scap master and scap host [puppet] - https://gerrit.wikimedia.org/r/434753 (https://phabricator.wikimedia.org/T193916)
[18:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T1800)
[18:09:31] Operations, MediaWiki-Platform-Team, HHVM, User-ArielGlenn: Run all jobs on PHP7 or HHVM - https://phabricator.wikimedia.org/T195393#4226294 (Reedy)
[18:09:42] hmm.. i will wait until the 2-4p PST window then
[18:09:52] maybe that's not even enough.. grmm
[18:09:57] i need to reinstall/rename naos
[18:10:08] but the first puppet run on a deployment server takes forever
[18:10:23] and not sure what happens if you deploy and there is no scap master in codfw
[18:10:28] in that moment..until it's back
[18:13:57] twentyafterfour: any idea what would happen if i remove the codfw scap master, like this: https://gerrit.wikimedia.org/r/#/c/434753/1/hieradata/common/scap/dsh.yaml and then there is a deploy
[18:14:01] and after i add it back
[18:14:16] i mean.. it's just "a scap master" per that config file. not "the codfw master"
[18:15:14] mutante: I don't know, honestly. when scap does `sync masters` it might break
[18:15:16] it would be the same host with a new name..
deploy2001 to match deploy1001 which we'll start to use in 2 days
[18:15:39] jouncebot: next
[18:15:39] In 0 hour(s) and 44 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T1900)
[18:15:55] mutante: we can test it real quick and undo the change if it breaks scap?
[18:16:03] looking at calendar... i can wait until after 5pm PST
[18:16:05] and do it then
[18:16:12] no wait.. i can't today..arg
[18:16:22] twentyafterfour: oh, nice idea:) yes please
[18:16:35] today's train should be short and I can delay it if you want to use the time now to do this
[18:17:04] i don't want to start it because i know the initial puppet run will take a LONG time
[18:17:11] oh
[18:17:15] how long?
[18:17:21] hours?
[18:17:22] but i would love to merge that change and test if it breaks anything
[18:17:33] and if it doesn't... i can do the reinstall later
[18:17:36] ok
[18:17:42] yes, hours
[18:17:54] might have to do that tomorrow after 5
[18:18:37] (CR) Dzahn: [C: 2] remove naos as scap master and scap host [puppet] - https://gerrit.wikimedia.org/r/434753 (https://phabricator.wikimedia.org/T193916) (owner: Dzahn)
[18:21:03] i ran puppet on tin and naos and naos has been removed
[18:21:21] from scap/dsh groups/confd
[18:21:24] (PS1) Reedy: s/php5/php/ in foreachwikiindblist [puppet] - https://gerrit.wikimedia.org/r/434754 (https://phabricator.wikimedia.org/T195393)
[18:22:37] mutante: ok I'll test
[18:22:41] thanks
[18:24:36] !log twentyafterfour@tin Synchronized README: testing sync-masters without naos (duration: 01m 09s)
[18:24:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:01] mutante: I got a warning in the log:
[18:26:03] 18:24:04 Job ['/usr/bin/scap', 'pull-master', 'tin.eqiad.wmnet'] called with an empty host list.
[18:26:14] but it appears that everything worked ok besides that
[18:26:22] IIRC we get that on beta without any issues
[18:26:49] sanity check for behaviour that probably means something is 'wrong'
[18:27:02] (CR) ArielGlenn: [C: 1] "Makes sense to me." [puppet] - https://gerrit.wikimedia.org/r/434754 (https://phabricator.wikimedia.org/T195393) (owner: Reedy)
[18:27:46] (PS1) Bstorm: wiki replicas: maintain-dbusers to page through ldap [puppet] - https://gerrit.wikimedia.org/r/434755
[18:28:13] twentyafterfour/Reedy: thank you, sounds good. i guess i can then start the reinstall despite what the deployment calendar says :)
[18:28:29] and we see the result a day before we switch to deploy1001.. cant hurt
[18:31:37] Operations, MediaWiki-Platform-Team, HHVM, Patch-For-Review, User-ArielGlenn: Run all jobs on PHP7 or HHVM - https://phabricator.wikimedia.org/T195393#4226190 (Reedy) Stuff that uses `foreachwiki`, and hence, `php5` ``` modules/mediawiki/manifests/maintenance/purge_abusefilter.pp: com...
[18:50:25] (CR) Merlijn van Deen: [C: 1] "Looks sane to me." [puppet] - https://gerrit.wikimedia.org/r/434752 (owner: Bstorm)
[18:56:55] (CR) Foks: [C: 1] "> @Foks: can I consider this approved by Trust and Safety?" [mediawiki-config] - https://gerrit.wikimedia.org/r/433136 (https://phabricator.wikimedia.org/T152296) (owner: MarcoAurelio)
[19:00:04] twentyafterfour: My dear minions, it's time we take the moon! Just kidding. Time for MediaWiki train deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T1900).
[19:18:47] (PS1) 20after4: group1 wikis to 1.32.0-wmf.5 refs T191051 [mediawiki-config] - https://gerrit.wikimedia.org/r/434756
[19:18:49] (CR) 20after4: [C: 2] group1 wikis to 1.32.0-wmf.5 refs T191051 [mediawiki-config] - https://gerrit.wikimedia.org/r/434756 (owner: 20after4)
[19:20:22] (Merged) jenkins-bot: group1 wikis to 1.32.0-wmf.5 refs T191051 [mediawiki-config] - https://gerrit.wikimedia.org/r/434756 (owner: 20after4)
[19:21:51] Operations, MediaWiki-Platform-Team, HHVM, Patch-For-Review, User-ArielGlenn: Run all jobs on PHP7 or HHVM - https://phabricator.wikimedia.org/T195393#4226468 (Reedy) So, running `extensions/CheckUser/maintenance/purgeOldData.php` under normal foreachwiki, and a hacked version of ` `foreachwi...
[19:21:58] !log twentyafterfour@tin rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.5 refs T191051
[19:22:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:22:13] T191051: 1.32.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T191051
[19:23:06] !log twentyafterfour@tin Synchronized php: group1 wikis to 1.32.0-wmf.5 refs T191051 (duration: 01m 08s)
[19:23:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:26:18] Operations, MediaWiki-Platform-Team, HHVM, Patch-For-Review, User-ArielGlenn: Run all jobs on PHP7 or HHVM - https://phabricator.wikimedia.org/T195393#4226495 (Reedy) Most of the jobs aren't time sensitive... But I think the Echo and CirrusSearch ones are more important and run more regularly...
[19:27:50] (CR) jenkins-bot: group1 wikis to 1.32.0-wmf.5 refs T191051 [mediawiki-config] - https://gerrit.wikimedia.org/r/434756 (owner: 20after4)
[19:36:53] !log reinstalling naos as deploy2001, booting to PXE (T193916)
[19:37:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:02] T193916: rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916
[19:38:18] PROBLEM - Host naos is DOWN: PING CRITICAL - Packet loss = 100%
[19:38:58] ACKNOWLEDGEMENT - Host naos is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn reinstall as deploy2001
[19:41:12] mutante: FYI you can use the reimage script also when changing hostname ;)
[19:43:29] Operations, Code-Stewardship-Reviews, Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4226548 (mobrovac) I'd like to point out that this is about the Zotero translation server. We want to be able to preserve the benefits that the tr...
[19:49:42] ok ok ;)
[19:49:57] we were wondering about https://phabricator.wikimedia.org/T195402#4226544
[19:50:50] "I'm using mw1267 as a server and for me Special:Version says 0.0.0 "
[19:51:06] ?
[19:51:36] it's Wikidata. people are getting different results https://phabricator.wikimedia.org/T195402
[19:51:48] I guess is unrelated to my previous comment :)
[19:51:53] already talking to twentyafterfour
[19:52:34] well, almost unrelated except i was worried for a moment that removing naos could be related to "different code on different servers" but seems not
[19:53:07] Using mwdebug extension I confirmed that mw2017 and mw2099.codfw are running wmf.5. I'm not sure why 1267 would be running "0.0.0"
[19:53:39] everyone should be seeing wmf.5 I think
[19:53:40] 1267 is a canary
[19:53:55] no, it's not. the comment is wrong
[19:55:02] ohh that comment is referring to the wikibaselexeme extension version, not the mediawiki version
[19:55:51] ack.
i could depool mw1267
[19:56:43] I think 1267 is showing the right thing, it's the same version I see on every other server I've tried
[19:56:58] aah
[19:57:35] so far I'm unable to find any servers that are running a different version of the code
[19:57:47] (PS1) Ottomata: Add comment to refine command about extraClassPath [puppet] - https://gerrit.wikimedia.org/r/434758
[19:57:57] good
[19:58:08] (CR) Ottomata: [C: 1] "Jenkins doesn't like your commit message (lines too long!) but otherwise +1 from me :)" [puppet] - https://gerrit.wikimedia.org/r/434424 (https://phabricator.wikimedia.org/T192819) (owner: Chelsyx)
[19:58:27] twentyafterfour: if you have a command to run locally to test it we can run cumin across all of them to quickly verify
[19:58:45] (CR) Ottomata: [C: 2] Add comment to refine command about extraClassPath [puppet] - https://gerrit.wikimedia.org/r/434758 (owner: Ottomata)
[19:58:50] (PS2) Ottomata: Add comment to refine command about extraClassPath [puppet] - https://gerrit.wikimedia.org/r/434758
[19:58:52] (CR) Ottomata: [V: 2 C: 2] Add comment to refine command about extraClassPath [puppet] - https://gerrit.wikimedia.org/r/434758 (owner: Ottomata)
[20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: My dear minions, it's time we take the moon! Just kidding. Time for Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T2000).
[20:00:21] Operations, Traffic, Wikimedia-Hackathon-2018: Create and deploy a centralized letsencrypt service - https://phabricator.wikimedia.org/T194962#4226649 (Krenair) Ah so the trick is where we normally `puppet:///` in a file source, between the second and third slashes can be a host name of the machine t...
[20:00:52] volans: hmm, lets see...
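A fleet-wide check of the kind suggested at 19:58:27 comes down to collecting one value per host and grouping the hosts by that value. What follows is a minimal sketch under stated assumptions: the per-host values are assumed to be gathered already into a dict, the names `group_hosts_by_value` and `is_consistent` are hypothetical, the sample values are made up, and none of this reflects how cumin itself is implemented.

```python
from collections import defaultdict


def group_hosts_by_value(results):
    """Group hostnames by the value each host reported.

    `results` maps hostname -> reported value (for example a version
    string or a git SHA). A single group means every host agrees.
    """
    groups = defaultdict(list)
    for host, value in results.items():
        groups[value].append(host)
    # Sort hostnames so the report is stable from run to run.
    return {value: sorted(hosts) for value, hosts in groups.items()}


def is_consistent(results):
    """Return True when all hosts reported the same value (or none exist)."""
    return len(set(results.values())) <= 1


if __name__ == "__main__":
    # Hypothetical sample data, not real output from any server.
    sample = {"mw1267": "abc123", "mw2017": "abc123", "mw2099": "abc123"}
    for value, hosts in group_hosts_by_value(sample).items():
        print(f"{value}: {len(hosts)} host(s)")
    print("consistent" if is_consistent(sample) else "MISMATCH")
```

Grouping rather than comparing against one expected value keeps the outliers visible: if a single host disagrees, it shows up as its own small group.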
[20:04:06] volans:
[20:04:09] jq .headSHA1 /srv/mediawiki/php-1.32.0-wmf.5/cache/gitinfo/info.json
[20:04:33] ack, doing
[20:05:19] !log arlolra@tin Started deploy [parsoid/deploy@de18a58]: Updating Parsoid to dccfeafd
[20:05:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:35] expected result: "fda06ee2e1ff51c8b8145bca7100cbc7cc4709d0"
[20:05:40] twentyafterfour: same sha everywhere
[20:06:00] including videoscalers and jobrunners
[20:06:12] 274 hosts checked
[20:06:13] ok that's good. So I don't know why people are seeing different results, perhaps an intermittent exception
[20:06:46] thanks for checking, volans!
[20:06:47] that was a nice way to check, thanks
[20:07:36] yw, it's so easy and quick that it doesn't bother at all ;)
[20:07:45] cumin is really cool
[20:07:51] (and tasty too!)
[20:07:55] and the combo using the checksum too
[20:08:06] jq is handy :)
[20:08:19] I have used it a few times lately
[20:08:41] hmm there is an oddity though: https://tools.wmflabs.org/versions/
[20:08:47] says group1 is wmf.4
[20:08:52] it should be wmf.5
[20:10:01] hmm, the tool seems to be broken because wikiversions.json looks correct: https://noc.wikimedia.org/conf/highlight.php?file=wikiversions.json
[20:11:02] (PS1) Dzahn: remove naos from site.pp, tcpircbot. update scap comment [puppet] - https://gerrit.wikimedia.org/r/434763 (https://phabricator.wikimedia.org/T193916)
[20:12:28] yeah jq is pretty nice, once you start remembering the syntax for more complex stuff :D
[20:12:57] I wish it used css selector syntax, honestly...
[20:13:04] but that's a bit hard to apply to json I guess
[20:14:28] lol
[20:14:36] twentyafterfour: i cant confirm. (now) it shows wmf.5 for group 0 and group 1, just not for group 2
[20:14:38] it can do much more though
[20:16:18] mutante: hhmmm now it says wmf.5 for me as well
[20:16:24] a glitch in the matrix
[20:16:31] the tool is delayed somehow?
[20:16:51] or maybe cached?
I hard-refreshed the page and it took a while to load instead of being instant as before
[20:17:01] (CR) Dzahn: [C: 2] "naos is gone. long live deploy2001" [puppet] - https://gerrit.wikimedia.org/r/434763 (https://phabricator.wikimedia.org/T193916) (owner: Dzahn)
[20:17:02] that's kinda shitty if a status page is cached with a long timeout
[20:17:46] right
[20:18:28] !log arlolra@tin Finished deploy [parsoid/deploy@de18a58]: Updating Parsoid to dccfeafd (duration: 13m 09s)
[20:18:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:18:54] no it doesn't appear to have cache headers hmm
[20:19:03] "
[20:19:04] Version information read from live configuration files hosted on noc.wikimedia.org.
[20:20:20] header: X-Clacks-Overhead: GNU Terry Pratchett
[20:20:24] lols
[20:20:33] http://clacksoverhead.discworld.us/
[20:20:59] hahaa. nice
[20:21:10] !log bsitzmann@tin Started deploy [mobileapps/deploy@5896151]: Update mobileapps to 29ebe0f (T192664)
[20:21:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:21:15] T192664: Announce browser extension support for Reading Lists in the apps - https://phabricator.wikimedia.org/T192664
[20:23:52] (PS1) Dzahn: remove naos from network::constants and kubernetes staging [puppet] - https://gerrit.wikimedia.org/r/434798 (https://phabricator.wikimedia.org/T193916)
[20:24:51] twentyafterfour, I think that's added by the proxy?
[20:25:00] Operations, Code-Stewardship-Reviews, Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4226724 (akosiaris) >>! In T187194#4226047, @Tgr wrote: > That number doesn't really mean anything though. I respectfully disagree. That number...
[20:25:45] Krenair: I'm not sure
[20:28:07] !log Updated Parsoid to dccfeafd (T157418, T194777, T195317, T195174, T194763, T194658)
[20:28:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:24] T157418: RFC: Make some aspects of Tidy's whitespace stripping behavior part of wikitext parsing "spec" - https://phabricator.wikimedia.org/T157418
[20:28:24] T194777: Category links are swallowed into definition list output even when they are outside the list - https://phabricator.wikimedia.org/T194777
[20:28:25] T195317: Comment prevents category from being migrated out of list - https://phabricator.wikimedia.org/T195317
[20:28:25] T194763: stripUnnecessaryIndentPreNowikis in wikitext serializer is over-aggressive and can introduce semantic diffs - https://phabricator.wikimedia.org/T194763
[20:28:25] T195174: Edge case tokenizing difference - https://phabricator.wikimedia.org/T195174
[20:28:25] T194658: Empty table cells aren't serialized correctly - https://phabricator.wikimedia.org/T194658
[20:30:03] !log bsitzmann@tin Finished deploy [mobileapps/deploy@5896151]: Update mobileapps to 29ebe0f (T192664) (duration: 08m 54s)
[20:30:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:08] T192664: Announce browser extension support for Reading Lists in the apps - https://phabricator.wikimedia.org/T192664
[20:30:21] (CR) Dzahn: [C: 2] remove naos from network::constants and kubernetes staging [puppet] - https://gerrit.wikimedia.org/r/434798 (https://phabricator.wikimedia.org/T193916) (owner: Dzahn)
[20:33:52] (PS2) Dzahn: scap: add deploy2001 as scap master [puppet] - https://gerrit.wikimedia.org/r/433616 (https://phabricator.wikimedia.org/T193916)
[20:34:20] PROBLEM - Confd template for /etc/dsh/group/parsoid on deploy2001 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:34:23] (CR) jerkins-bot: [V: -1] scap: add deploy2001 as scap master [puppet] - https://gerrit.wikimedia.org/r/433616 (https://phabricator.wikimedia.org/T193916) (owner: Dzahn)
[20:34:50] PROBLEM - Confd template for /etc/dsh/group/ores on deploy2001 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:35:03] (PS3) Dzahn: scap: add deploy2001 as scap master and host [puppet] - https://gerrit.wikimedia.org/r/433616 (https://phabricator.wikimedia.org/T193916)
[20:35:24] mutante: I don't know if it was mentioned already in the task(s) but probably there might be some DB grants that might need to be upgraded for the naos->deploy2001 reimage
[20:35:33] *updated
[20:37:09] volans: ugh :/ thanks! then i am blocked i think
[20:37:22] i have one for mwmaint1001 pending at https://gerrit.wikimedia.org/r/#/c/430524/
[20:37:40] while it has +1 it also always says it needs manual deploy
[20:37:53] so that would be similar
[20:38:26] I guess most of them might be for terbium/wasat, but tin/naos might be in the list too. At least I know I asked for it for debmonitor
[20:38:56] did the IP change?
[20:39:29] terbium/wasat is already in the queue
[20:39:44] if there is one for naos then it's not using the host name
[20:39:53] yes the IP changed on purpose
[20:40:07] for smoother migration
[20:40:10] ok
[20:42:57] !log pnorman@tin Started deploy [tilerator/deploy@18faaa6]: Deploy scap fixes to cleartables map test server
[20:43:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:43:02] !log pnorman@tin Finished deploy [tilerator/deploy@18faaa6]: Deploy scap fixes to cleartables map test server (duration: 00m 08s)
[20:43:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:46:47] Operations, ops-eqiad, fundraising-tech-ops: rack frdata1001 - https://phabricator.wikimedia.org/T187364#4226818 (Jgreen) a:Jgreen>None
[20:48:10] Operations, ops-eqiad, fundraising-tech-ops, Patch-For-Review: Rack/setup frmon1001 - https://phabricator.wikimedia.org/T186073#4226819 (Jgreen) a:Jgreen>None
[20:49:31] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/
[20:52:05] !log pnorman@tin Started deploy [tilerator/deploy@63617a9]: Deploy scap fixes to cleartables map test server go 2
[20:52:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:52:28] !log pnorman@tin Finished deploy [tilerator/deploy@63617a9]: Deploy scap fixes to cleartables map test server go 2 (duration: 00m 23s)
[20:52:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:53:34] !log arlolra@tin Started deploy [parsoid/deploy@de18a58]: Reverting Parsoid deploy
[20:53:37] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:54:00] RECOVERY - Confd template for /etc/dsh/group/parsoid on deploy2001 is OK: No errors detected
[20:54:30] RECOVERY - Confd template for /etc/dsh/group/ores on deploy2001 is OK: No errors detected
[20:55:44] in this case recovery means "just started to exist"
[20:55:51] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy
[20:56:11] !log arlolra@tin Finished deploy [parsoid/deploy@de18a58]: Reverting Parsoid deploy (duration: 02m 37s)
[20:56:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:57:25] !log rebooting deploy2001
[20:57:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:30] (CR) EBernhardson: [C: 1] Add extra-analysis analyzers as separate plugins [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/432136 (https://phabricator.wikimedia.org/T193734) (owner: DCausse)
[21:08:22] (PS1) Dzahn: remove naos.codfw.wmnet [dns] - https://gerrit.wikimedia.org/r/434802 (https://phabricator.wikimedia.org/T193916)
[21:09:45] volans: yes, i found grants that are using the IP, i'll upload another change.
thanks
[21:11:15] mutante: ack, probably best to ask the DBAs to check it too tomorrow
[21:11:52] yes, i will pause it here and do that tomorrow
[21:13:01] besides that the server will be ready, with the puppet role and has been added to network/constants and all that
[21:14:09] great
[21:16:15] (PS1) Dzahn: mariadb: update m5 grants after naos became deploy2001 [puppet] - https://gerrit.wikimedia.org/r/434803 (https://phabricator.wikimedia.org/T193916)
[21:17:36] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916#4226921 (Dzahn)
[21:19:28] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916#4183490 (Dzahn)
[21:24:12] (CR) Dzahn: [C: 1] "+1 because that would only be after the other change this depends on (and which needs review from others)" [puppet] - https://gerrit.wikimedia.org/r/429344 (https://phabricator.wikimedia.org/T82937) (owner: Herron)
[21:25:30] Operations, Availability (MediaWiki-MultiDC), Patch-For-Review, Performance-Team (Radar), User-Joe: mcrouter production architecture - https://phabricator.wikimedia.org/T192771#4226937 (aaron) >>! In T192771#4223990, @Joe wrote: > The reason of the hybrid proxy approach is that mcrouter is kn...
[21:29:57] !log arming keyholder on deploy2001
[21:30:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:34:03] (CR) Aaron Schulz: profile::mediawiki::mcrouter_wancache: add ssl, proxy support (1 comment) [puppet] - https://gerrit.wikimedia.org/r/431737 (https://phabricator.wikimedia.org/T192370) (owner: Giuseppe Lavagetto)
[21:43:04] Operations, MediaWiki-Platform-Team, HHVM, Patch-For-Review, User-ArielGlenn: Run all jobs on PHP7 or HHVM - https://phabricator.wikimedia.org/T195393#4226973 (Jdforrester-WMF) If you're testing anyway, how long does `maintenance/rebuildLocalisationCache.php` take on PHP7 (ref. {T191921})? ;-)
[21:43:52] Operations, Analytics, SRE-Access-Requests, Patch-For-Review: Access to usergroups for Marshall Miller - https://phabricator.wikimedia.org/T194550#4226980 (MMiller_WMF) Thank you! I'm in.
[21:58:17] Operations, ops-codfw: update physical labels from naos.codfw.wmnet to deploy2001.codfw.wmnet - https://phabricator.wikimedia.org/T195421#4227023 (Dzahn)
[21:58:32] Operations, ops-codfw: update physical labels from naos.codfw.wmnet to deploy2001.codfw.wmnet - https://phabricator.wikimedia.org/T195421#4227036 (Dzahn) p:Triage>Normal
[21:58:58] Operations, ops-codfw: update physical labels from naos.codfw.wmnet to deploy2001.codfw.wmnet - https://phabricator.wikimedia.org/T195421#4227023 (Dzahn)
[21:59:32] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916#4183490 (Dzahn)
[21:59:34] Operations, ops-codfw: update physical labels from naos.codfw.wmnet to deploy2001.codfw.wmnet - https://phabricator.wikimedia.org/T195421#4227023 (Dzahn)
[22:03:28] Operations, netops: update switch port label from naos.codfw.wmnet to deploy2001.codfw.wmnet - https://phabricator.wikimedia.org/T195422#4227061 (Dzahn)
[22:03:52]
Operations, netops: update switch port label from naos.codfw.wmnet to deploy2001.codfw.wmnet - https://phabricator.wikimedia.org/T195422#4227063 (Dzahn)
[22:03:55] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916#4183490 (Dzahn)
[22:06:25] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916#4227070 (Dzahn)
[22:07:28] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916#4183490 (Dzahn)
[22:08:31] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916#4227075 (Dzahn) p:Normal>High
[22:10:07] (CR) Aaron Schulz: profile::mediawiki::mcrouter_wancache: add ssl, proxy support (1 comment) [puppet] - https://gerrit.wikimedia.org/r/431737 (https://phabricator.wikimedia.org/T192370) (owner: Giuseppe Lavagetto)
[22:15:43] Operations, monitoring: Reduce false positive icinga alerts during host reimages - https://phabricator.wikimedia.org/T195423#4227076 (herron) p:Triage>Normal
[22:18:46] Operations, monitoring: Reduce false positive icinga alerts during host reimages - https://phabricator.wikimedia.org/T195423#4227089 (herron) One theory is that this is occurring if a puppet agent runs happens on the icinga server while a downtimed node is deactivated in puppetdb and in the middle of an...
[22:24:36] (CR) Krinkle: Remove pear packages from MW Application Servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/434710 (https://phabricator.wikimedia.org/T195364) (owner: Reedy)
[22:36:37] (CR) Dzahn: [C: 1] security: Remove dangerous unused 'botadmin' group at mlwik{tionary|isource} [mediawiki-config] - https://gerrit.wikimedia.org/r/433136 (https://phabricator.wikimedia.org/T152296) (owner: MarcoAurelio)
[22:41:12] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916#4227138 (Dzahn) - rename/reinstall is done - naos has been removed from everything (except DNS) - deploy2001 has been added...
[22:43:39] (CR) Dzahn: "it would be great if this can get merged soon since it is the sole blocker of https://phabricator.wikimedia.org/T193916#4227138 and the s" [puppet] - https://gerrit.wikimedia.org/r/434803 (https://phabricator.wikimedia.org/T193916) (owner: Dzahn)
[22:50:12] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4227160 (Dzahn)
[22:50:59] (PS1) Dzahn: mariadb: grant deploy1001 access to labswiki [puppet] - https://gerrit.wikimedia.org/r/434821 (https://phabricator.wikimedia.org/T175288)
[22:52:39] why does a deployment server need to talk to the "labswiki" database?
[22:53:22] what breaks if it cant
[22:55:20] (CR) Dzahn: "would be super great if this can be done before Friday because we scheduled the switch to deploy1001 and this could be the blocker for it " [puppet] - https://gerrit.wikimedia.org/r/434821 (https://phabricator.wikimedia.org/T175288) (owner: Dzahn)
[22:56:06] (CR) Dzahn: "that being said. i am not sure what would break if the deployment server can't talk to "labswiki" (wikitech) db" [puppet] - https://gerrit.wikimedia.org/r/434821 (https://phabricator.wikimedia.org/T175288) (owner: Dzahn)
[22:57:54] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4227190 (Dzahn) We'll need the database grants above added or that might block the switch. That being said, not sure why exact...
[23:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T2300).
[23:00:04] ebernhardson and James_F: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:20] * James_F waves.
[23:00:22] James_F: you just had to make a pile of patches :P
[23:00:31] any particular ordering?
[23:00:48] ebernhardson: Well, that's what happens when RelEng keep the train running whilst everyone is at the Hackathon.
[23:00:58] ebernhardson: Nope. Ideally wmf.5s before wmf.4s though.
[23:01:30] (CR) EBernhardson: [C: 2] security: Remove dangerous unused 'botadmin' group at mlwik{tionary|isource} [mediawiki-config] - https://gerrit.wikimedia.org/r/433136 (https://phabricator.wikimedia.org/T152296) (owner: MarcoAurelio)
[23:04:38] ebernhardson: That one's not really testable. ;-)
[23:04:44] James_F: i figured :)
[23:04:53] it's pretty straightforward though
[23:05:04] And theoretically a no-op.
[23:05:12] "Does it crash? No, then it's fine."
[23:06:46] (PS4) EBernhardson: security: Remove dangerous unused 'botadmin' group at mlwik{tionary|isource} [mediawiki-config] - https://gerrit.wikimedia.org/r/433136 (https://phabricator.wikimedia.org/T152296) (owner: MarcoAurelio)
[23:06:52] (CR) EBernhardson: [C: 2] security: Remove dangerous unused 'botadmin' group at mlwik{tionary|isource} [mediawiki-config] - https://gerrit.wikimedia.org/r/433136 (https://phabricator.wikimedia.org/T152296) (owner: MarcoAurelio)
[23:07:41] jouncebot: now
[23:07:41] For the next 0 hour(s) and 52 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180523T2300)
[23:08:08] (Merged) jenkins-bot: security: Remove dangerous unused 'botadmin' group at mlwik{tionary|isource} [mediawiki-config] - https://gerrit.wikimedia.org/r/433136 (https://phabricator.wikimedia.org/T152296) (owner: MarcoAurelio)
[23:08:15] note: you are scapping with just one scap master right now, the other one will come back but i guess not _while_ you are swatting
[23:08:31] ok
[23:09:48] (CR) jenkins-bot: security: Remove dangerous unused 'botadmin' group at mlwik{tionary|isource} [mediawiki-config] - https://gerrit.wikimedia.org/r/433136 (https://phabricator.wikimedia.org/T152296) (owner: MarcoAurelio)
[23:10:51] !log ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: T152296: Remove dangerous unused botadmin group at mlwik{tionary|isource} (duration: 01m 10s)
[23:10:55] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:10:56] T152296: Review the 'botadmin' group at mlwiktionary and mlwikisource - https://phabricator.wikimedia.org/T152296 [23:12:43] waiting for the next batch of patches to make it through jenkins [23:15:37] hmm, `git log HEAD..origin/wmf/1.32.0-wmf.4 claims there is an undeployed patch to Translate extension [23:16:09] I know they were working on some UBN earlier today. [23:16:18] Maybe debugging work? [23:16:19] Nikerabbit: around? it seems https://gerrit.wikimedia.org/r/#/c/434700/ is merged to the branch but not merged into the deploy repo on tin [23:17:15] ebernhardson: But https://phabricator.wikimedia.org/T195347#4226223 says it was? [23:17:37] twentyafterfour: You around? [23:17:50] hmm, same in wmf.5. git log is claiming it hasn't been merged into HEAD [23:18:00] (on tin) [23:19:10] well, safest path from here is to revert and let twentyafterfour / Nikerabbit re-deploy tomorrow i suppose [23:19:37] Yeah… :-( [23:19:48] Unless tin is confused and it actually /is/ deployed. [23:20:12] looking [23:21:38] ok it looks like the expected patch *is* in the extensions/Translate repo for each deploy branch, somehow the submodule was updated without ever merging the submodule bump into the mediawiki/core branch [23:22:04] (and so Translate extension shows dirty on both branches in tin). sec. [23:23:48] Fun. [23:25:08] twentyafterfour: ^ [23:25:56] i have to go afk but i know earlier there was already this issue with wikidata and the version [23:26:16] it was deployed [23:26:21] That sounds worrying. Different versions of Wikidata than it should be? [23:27:19] James_F: no, dont be _that_ worried.. all the versions on all appservers were checked [23:27:59] yeah we checked thoroughly everything was consistent on all the app servers [23:28:00] i just said that because in both cases there is a relation to wmf.5 [23:28:07] bad phrasing [23:28:07] Ah, OK. 
[23:28:09] this is the task in question: https://phabricator.wikimedia.org/T195402
[23:28:24] sorry, i have to go afk. call if needed
[23:28:39] twentyafterfour: oh good you're here ... so two things (one pebkac on this side that i need your help with now ...)
[23:28:47] thanks mutante
[23:28:56] ebernhardson: ok
[23:29:28] twentyafterfour: it looks like from the git repos that translate extension submodule was checked out and deployed with right version, but somehow the main mediawiki/core repo for the branch didn't merge in the upstream one. Should have been an easy fix but ...
[23:29:42] twentyafterfour: for some reason i did `git rebase origin/master` in php-1.32.0-wmf.5 instead of git rebase origin/wmf/1.32.0-wmf.5
[23:30:11] ebernhardson: lovely. ok
[23:30:35] git reflog is our friend
[23:31:01] actually it looks like you fixed it already?
[23:31:33] twentyafterfour: well, sorta. I looked at git reflog and issued `git reset --soft 2a240be8c8` to put it back in the old state
[23:31:48] twentyafterfour: soft because i didn't want it further mucking with files on disk and wanted to look at git status
[23:32:13] it looks like it's almost right, but you picked up a few patches from master
[23:32:24] twentyafterfour: so now it sorta looks right ... but `git log origin/wmf/1.32.0-wmf.5..HEAD` should only list security patches, things that shouldn't be in gerrit. But it has extras
[23:32:39] one option: interactive rebase to remove everything other than the two security patches?
[23:33:18] another option: hard reset to origin/wmf/1.32.0-wmf.5 and then re-run scap patch to apply security patches
[23:33:34] hmm, hard reset with scap patch sounds most likely to put into the correct state
[23:33:43] i didn't realize it was that easy to re-apply the patches. sec
[23:34:31] it should be easy to apply patches, I had to fix conflicts already once this week so the patches should be up to date
[23:34:50] twentyafterfour: what's the command? scap -h doesn't have a 'patch'
[23:35:10] `scap patch 1.32.0-wmf.5` from the mediawiki-staging directory
[23:35:30] oh, scap has different options in different directories. magic!
[23:35:57] it picks up plugins from the ./scap/plugins/ directory
[23:36:11] ok, so ok git logs in both directions look sane between tin and gerrit. Thanks!
[23:36:22] so each repo can have custom scap commands, however, it doesn't look up in parent directories to find the .git root (it should, IMO)
[23:36:31] ebernhardson: cool
[23:36:45] James_F: ok, finally back to deploy now that we un-mucked my pebkac
[23:37:04] Yay.
[23:37:39] James_F: VE MWSaveDialog is on mwdebug1001 for wmf.5, and the mobile frontend one is synced for wmf.4
[23:38:13] Checking.
[23:40:01] The MF one LGTM on wmf.4.
[23:40:41] ebernhardson: And the MWSaveDialog one on wmf.5 is good too.
[23:41:32] ok syncing MF on wmf.4
[23:42:30] !log ebernhardson@tin Synchronized php-1.32.0-wmf.4/extensions/MobileFrontend/resources/mobile.editor.common/editor.less: SWAT: T194832: Fix layout of editor switcher dropdown (duration: 01m 08s)
[23:42:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:42:36] T194832: Editor switcher in MobileFrontend visually broken - https://phabricator.wikimedia.org/T194832
[23:45:01] !log ebernhardson@tin Synchronized php-1.32.0-wmf.5/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWSaveDialog.js: SWAT: T195323: MWSaveDialog: Fix typo in no-categories branch (duration: 01m 08s)
[23:45:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:45:05] T195323: Visual Editor changes dialog hangs when clicking on preview on pages without categories - https://phabricator.wikimedia.org/T195323
[23:45:52] James_F: typo in api call for version number for VE on mwdebug1001
[23:46:53] ebernhardson: Yup, LGTM.
[23:49:33] !log ebernhardson@tin Synchronized php-1.32.0-wmf.4/extensions/VisualEditor/modules/ve-mw/ui/tools/ve.ui.MWPopupTool.js: SWAT: Fix typo in API call for version number help (duration: 01m 08s)
[23:49:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:50:38] James_F: ok just waiting on jenkins for the VE MWSaveDialog patch to wmf.4 (that was already deployed to wmf.5)
[23:50:54] Cool!
[23:51:32] poor jenkins, not really the source of slowness just gets blamed by everyone when they wait :P
[23:57:44] James_F: MWSaveDialog up on wmf.4 in mwdebug1001
[23:58:37] ebernhardson: Yup!