[00:00:13] and we're outta time [00:00:21] wahhh [00:00:37] MaxSem: Verified [00:00:41] I'll still deploy my changes because they're urgent. anything else urgent? [00:00:48] you mean i sat here for the last hour for nothing? [00:01:10] life sucks [00:01:44] https://gerrit.wikimedia.org/r/#/c/340697/ [config] 340697 Make Page Previews use RESTBase on "stage 0" wikis [00:01:57] can i at least have that one? [00:02:07] let's see [00:05:58] !log dzahn@puppetmaster1001 conftool action : get/pooled; selector: dc=eqiad,name=mw2256.codfw.wmnet [00:06:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:07:59] (03PS2) 10MaxSem: Make Page Previews use RESTBase on "stage 0" wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340697 (https://phabricator.wikimedia.org/T158221) (owner: 10Phuedx) [00:08:14] (03CR) 10MaxSem: [C: 032] Make Page Previews use RESTBase on "stage 0" wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340697 (https://phabricator.wikimedia.org/T158221) (owner: 10Phuedx) [00:08:23] !log dzahn@puppetmaster1001 conftool action : set/pooled=no; selector: name=mw2256.codfw.wmnet [00:08:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:08:32] Reedy: ^ [00:08:41] cheers [00:08:52] !log depooled mw2256 because it's down again (T155180) [00:08:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:08:57] T155180: codfw: mw2251-mw2260 rack/setup - https://phabricator.wikimedia.org/T155180 [00:09:28] (03Merged) 10jenkins-bot: Make Page Previews use RESTBase on "stage 0" wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340697 (https://phabricator.wikimedia.org/T158221) (owner: 10Phuedx) [00:09:37] (03CR) 10jenkins-bot: Make Page Previews use RESTBase on "stage 0" wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340697 (https://phabricator.wikimedia.org/T158221) (owner: 10Phuedx) [00:16:37] (03CR) 10Mattflaschen: "See inline. 
The primary purpose of this patch is actually to fix that enwiki part." (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342767 (owner: 10Mattflaschen) [00:19:32] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/340697/2 (duration: 02m 53s) [00:19:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:20:12] jdlrobson, please verify ^ [00:22:18] (03PS1) 10Dzahn: mediawiki::maintenance: convert to profile/role (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/342777 [00:24:19] (03PS1) 10Gergő Tisza: Temporary fix to avoid referencing AuthManagerStatsdHandler when not loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342778 [00:24:37] PROBLEM - MegaRAID on ms-be1008 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) [00:24:48] ACKNOWLEDGEMENT - MegaRAID on ms-be1008 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T160488 [00:24:51] 06Operations, 10ops-eqiad: Degraded RAID on ms-be1008 - https://phabricator.wikimedia.org/T160488#3100581 (10ops-monitoring-bot) [00:25:06] :o [00:25:10] When did that start? [00:25:55] the autohandler for RAID issues? :) [00:26:13] a couple months ago, v.olans did that [00:26:19] nice, eh [00:26:38] (03CR) 10jerkins-bot: [V: 04-1] mediawiki::maintenance: convert to profile/role (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/342777 (owner: 10Dzahn) [00:26:40] (03CR) 10Krinkle: [C: 031] Temporary fix to avoid referencing AuthManagerStatsdHandler when not loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342778 (owner: 10Gergő Tisza) [00:26:48] I've been wanting something like that for mw logs for aaages [00:27:00] MaxSem: still doing the SWAT? 
[00:27:11] what's also nice, trump tax returns, they have them [00:27:19] shuts up but that was special [00:27:20] tgr, yes [00:27:25] !log maxsem@tin Synchronized php-1.29.0-wmf.15/extensions/RelatedSites/: Hide DMOZ links with https://gerrit.wikimedia.org/r/#/c/342753/ + https://gerrit.wikimedia.org/r/#/c/342768/ (duration: 02m 48s) [00:27:25] (03PS2) 10Krinkle: Temporary fix to avoid referencing AuthManagerStatsdHandler when not loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342778 (owner: 10Gergő Tisza) [00:27:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:27:34] tgr, how about leaving a comment? [00:27:51] Reedy: pretty sure you can build on the autohandler for non-RAID stuff too :) [00:27:59] Reedy: it does create ticket in phab [00:29:56] (03PS3) 10Gergő Tisza: Temporary fix to avoid referencing AuthManagerStatsdHandler when not loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342778 [00:30:32] (03CR) 10MaxSem: [C: 04-1] Temporary fix to avoid referencing AuthManagerStatsdHandler when not loaded (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342778 (owner: 10Gergő Tisza) [00:30:49] MaxSem: too slow :) [00:31:09] rrr [00:31:37] (03CR) 10MaxSem: [C: 032] Temporary fix to avoid referencing AuthManagerStatsdHandler when not loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342778 (owner: 10Gergő Tisza) [00:32:08] !log maxsem@tin Synchronized php-1.29.0-wmf.16/extensions/RelatedSites/: Hide DMOZ links with https://gerrit.wikimedia.org/r/#/c/342753/ + https://gerrit.wikimedia.org/r/#/c/342768/ (duration: 02m 48s) [00:32:08] PROBLEM - puppet last run on ms-be1008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[mkfs-/dev/sdh1] [00:32:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:33:56] (03Merged) 10jenkins-bot: Temporary fix to avoid referencing AuthManagerStatsdHandler when not loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342778 (owner: 10Gergő Tisza) [00:34:04] (03CR) 10jenkins-bot: Temporary fix to avoid referencing AuthManagerStatsdHandler when not loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342778 (owner: 10Gergő Tisza) [00:34:58] tgr, pulled on mwdebug1002 [00:36:33] MaxSem: the wikis did not explode, other than that, not sure what to test [00:41:08] !log maxsem@tin Synchronized wmf-config/logging.php: https://gerrit.wikimedia.org/r/342778 (duration: 02m 46s) [00:41:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:57:20] (03CR) 10Ladsgroup: [C: 031] "Okay, thanks for clarification." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342767 (owner: 10Mattflaschen) [01:02:52] (03CR) 10Catrope: [C: 032] Fix issues with ORES models: [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342767 (owner: 10Mattflaschen) [01:04:38] (03Merged) 10jenkins-bot: Fix issues with ORES models: [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342767 (owner: 10Mattflaschen) [01:04:50] (03CR) 10jenkins-bot: Fix issues with ORES models: [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342767 (owner: 10Mattflaschen) [02:21:25] (03PS1) 10Chad: Remove integration/* repos from trebuchet [puppet] - 10https://gerrit.wikimedia.org/r/342785 [02:28:18] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:28:27] (03CR) 10Chad: [C: 031] "Mostly just moving stuff around, so lgtm. 
Once it passes compiler let's get it done :)" [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [02:35:20] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.15) (duration: 13m 33s) [02:35:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:55:18] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:56:18] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [03:23:18] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [03:29:43] (03PS1) 10Chad: Remove salt grain "php" from mediawiki hosts [puppet] - 10https://gerrit.wikimedia.org/r/342787 [03:35:24] (03PS1) 10Chad: Scap3: Prep MediaWiki to be available from /srv/deployment [puppet] - 10https://gerrit.wikimedia.org/r/342788 [03:45:28] PROBLEM - puppet last run on analytics1057 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:55:38] PROBLEM - puppet last run on labtestcontrol2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:14:28] RECOVERY - puppet last run on analytics1057 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [04:23:38] RECOVERY - puppet last run on labtestcontrol2001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [04:54:46] (03CR) 10Hashar: [C: 031] Remove integration/* repos from trebuchet [puppet] - 10https://gerrit.wikimedia.org/r/342785 (owner: 10Chad) [06:17:28] PROBLEM - puppet last run on mw1298 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:41:28] PROBLEM - puppet last run on analytics1049 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[tzdata] [06:47:28] RECOVERY - puppet last run on mw1298 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [06:52:18] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 23 probes of 276 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [06:57:18] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 16 probes of 276 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [07:09:28] RECOVERY - puppet last run on analytics1049 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [07:09:31] 06Operations, 10Analytics, 10DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3100960 (10Marostegui) >>! In T156844#3099522, @Ottomata wrote: > Oh yeah, rats, I totally forgot to put this in our budget request. Hm. do db1046 and db1047 host just EL da... [07:13:43] 06Operations, 10MediaWiki-extensions-InterwikiSorting, 10Wikidata, 10Wikimedia-Extension-setup, and 3 others: Deploy InterwikiSorting extension to production - https://phabricator.wikimedia.org/T150183#3100962 (10Addshore) >>! In T150183#3100090, @Aklapper wrote: >>>! In T150183#3097639, @Addshore wrote: >... [07:15:26] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "the direction is correct; there are however a few things to fix, see inline comments." 
(037 comments) [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [07:18:26] 06Operations, 13Patch-For-Review, 15User-Elukey: Tracking and Reducing cron-spam from root@ - https://phabricator.wikimedia.org/T132324#3100965 (10elukey) [07:18:28] 06Operations, 10Analytics, 13Patch-For-Review: Remove cronspam from stat1002 to root@ - https://phabricator.wikimedia.org/T145606#3100964 (10elukey) 05Open>03Resolved [07:21:13] (03PS1) 10Elukey: Reduce daily cronspam from mwlog2001 [puppet] - 10https://gerrit.wikimedia.org/r/342795 (https://phabricator.wikimedia.org/T156151) [07:22:14] (03CR) 10Elukey: [C: 032] Reduce daily cronspam from mwlog2001 [puppet] - 10https://gerrit.wikimedia.org/r/342795 (https://phabricator.wikimedia.org/T156151) (owner: 10Elukey) [07:25:06] 06Operations, 13Patch-For-Review, 15User-Elukey: Tracking and Reducing cron-spam from root@ - https://phabricator.wikimedia.org/T132324#3100971 (10elukey) [07:25:08] 06Operations, 13Patch-For-Review: Cronspam from mwlog* - https://phabricator.wikimedia.org/T156151#3100970 (10elukey) 05Open>03Resolved [07:26:56] !log Enable parallel replication on x1 slaves - T160407 [07:27:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:27:03] T160407: Apply wikishared.cx_translations index change - https://phabricator.wikimedia.org/T160407 [07:28:54] 06Operations, 10Wikimedia-General-or-Unknown, 13Patch-For-Review: foreachwikiindblist regular cronspam - https://phabricator.wikimedia.org/T159438#3100976 (10elukey) The last cronspam email seems to be the same issue described in T145360, we have more than one issue with terbium so I got confused :) The... [07:29:35] nothing better than cronspam emails in the morning [07:29:41] [07:34:28] PROBLEM - MariaDB Slave IO: s4 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:34:28] PROBLEM - MariaDB Slave SQL: s6 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:34:28] PROBLEM - MariaDB Slave SQL: s5 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:34:38] PROBLEM - MariaDB Slave IO: s1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:34:38] PROBLEM - MariaDB Slave IO: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:34:48] PROBLEM - MariaDB Slave IO: s5 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:34:52] ^ backups are probably running - I will check [07:35:18] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:35:19] PROBLEM - MariaDB Slave SQL: m3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:35:19] PROBLEM - MariaDB Slave SQL: m2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:42:08] RECOVERY - MariaDB Slave SQL: m3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes [07:42:08] RECOVERY - MariaDB Slave SQL: m2 on dbstore1001 is OK: OK slave_sql_state not a slave [07:42:38] RECOVERY - MariaDB Slave IO: s1 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes [07:42:38] RECOVERY - MariaDB Slave IO: s2 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes [07:42:38] RECOVERY - MariaDB Slave IO: s5 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes [07:43:19] RECOVERY - MariaDB Slave IO: s4 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes [07:43:19] RECOVERY - MariaDB Slave SQL: s6 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional) [07:43:19] RECOVERY - MariaDB Slave SQL: s5 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes [07:47:08] PROBLEM - MariaDB Slave Lag: x1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 832.26 seconds [07:50:08] PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 
834.17 seconds [07:52:38] PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:55:14] <_joe_> elukey: wanna try the big morning outage of last week as an alternative? [07:58:09] PROBLEM - MariaDB Slave Lag: m3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 870.82 seconds [07:59:32] (03PS1) 10Urbanecm: Enable Multimedia Viewer at officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342798 (https://phabricator.wikimedia.org/T160420) [08:01:00] _joe_ nononono I'll take the cronspam :D [08:02:08] RECOVERY - MariaDB Slave Lag: x1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [08:02:08] RECOVERY - MariaDB Slave Lag: m3 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [08:03:18] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [08:05:08] RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 190.87 seconds [08:09:06] (03PS3) 10Addshore: Enable Cognate on beta wiktionary sites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341401 [08:11:48] !log installing imagemagick security updates [08:11:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:18:23] (03CR) 10Addshore: [C: 032] Enable Cognate on beta wiktionary sites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341401 (owner: 10Addshore) [08:18:33] (03PS2) 10Filippo Giunchedi: statsite: ignore spammy messages [puppet] - 10https://gerrit.wikimedia.org/r/342654 (https://phabricator.wikimedia.org/T73322) [08:19:38] (03CR) 10Filippo Giunchedi: [C: 032] statsite: ignore spammy messages [puppet] - 10https://gerrit.wikimedia.org/r/342654 (https://phabricator.wikimedia.org/T73322) (owner: 10Filippo Giunchedi) [08:19:52] (03Merged) 10jenkins-bot: Enable Cognate on beta wiktionary sites [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/341401 (owner: 10Addshore) [08:20:00] (03CR) 10jenkins-bot: Enable Cognate on beta wiktionary sites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341401 (owner: 10Addshore) [08:20:38] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [08:20:51] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/5776/" [puppet] - 10https://gerrit.wikimedia.org/r/342648 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [08:22:25] !log Deploy alter table x1 testing parallel replication - T160407 [08:22:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:22:31] T160407: Apply wikishared.cx_translations index change - https://phabricator.wikimedia.org/T160407 [08:24:19] (03PS2) 10Filippo Giunchedi: Remove salt grain "php" from mediawiki hosts [puppet] - 10https://gerrit.wikimedia.org/r/342787 (owner: 10Chad) [08:26:03] 06Operations, 10ops-codfw, 13Patch-For-Review, 15User-fgiunchedi: codfw: ms-be2028-ms-be2039 rack/setup - https://phabricator.wikimedia.org/T158337#3101032 (10fgiunchedi) [08:26:30] !log removed imagemagick 6.8.9.9-5+deb8u7+wmf1 from apt.wikimedia.org (the sharpen patch is folded into the new 6.8.9.9-5+deb8u8 security update) [08:26:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:26:36] 06Operations, 10Monitoring, 13Patch-For-Review, 05Prometheus-metrics-monitoring, 15User-fgiunchedi: Evaluate prometheus snmp_exporter for Torrus PDUs metrics use case - https://phabricator.wikimedia.org/T148541#3101045 (10fgiunchedi) [08:27:12] 06Operations, 10ops-eqiad, 13Patch-For-Review, 15User-fgiunchedi: Rack and set up ms-fe100[5-8] - https://phabricator.wikimedia.org/T155095#3101047 (10fgiunchedi) [08:27:16] !log addshore@tin Synchronized wmf-config/InitialiseSettings-labs.php: [[gerrit:341401|Enable Cognate on beta wiktionary sites]] T156241 Beta Only (duration: 02m 48s) 
[08:27:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:27:21] T156241: Deploy Cognate extension to beta - https://phabricator.wikimedia.org/T156241 [08:27:31] 06Operations, 13Patch-For-Review, 05Prometheus-metrics-monitoring, 15User-fgiunchedi: Put prometheus baremetal servers in service - https://phabricator.wikimedia.org/T148408#3101051 (10fgiunchedi) [08:28:11] 06Operations, 15User-fgiunchedi: upgrade netmon1001 to jessie - https://phabricator.wikimedia.org/T125020#3101052 (10fgiunchedi) [08:28:42] Filippo going full power --> 1 task per minute [08:28:44] 06Operations, 10Wikimedia-Logstash, 15User-fgiunchedi: Get 5xx logs into kibana/logstash - https://phabricator.wikimedia.org/T149451#3101055 (10fgiunchedi) [08:29:47] elukey: yes, like this https://i.imgur.com/bF8jLYs.gifv [08:29:58] ahahhahaha [08:30:08] PROBLEM - swift-object-updater on ms-be1005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [08:30:37] I got a workboard created and was moving tasks into it [08:30:40] now I can go agile [08:32:17] (03CR) 10Filippo Giunchedi: [C: 032] Remove salt grain "php" from mediawiki hosts [puppet] - 10https://gerrit.wikimedia.org/r/342787 (owner: 10Chad) [08:35:09] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:47:08] PROBLEM - swift-object-updater on ms-be1015 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [08:50:08] RECOVERY - swift-object-updater on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [08:52:13] marostegui: Thanks! [08:52:23] no worries! 
:) [08:54:08] RECOVERY - swift-object-updater on ms-be1015 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [09:02:15] !log Disable parallel replication on x1 slaves (db1029, db2033) - T160407 [09:02:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:02:21] T160407: Apply wikishared.cx_translations index change - https://phabricator.wikimedia.org/T160407 [09:04:08] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [09:04:44] 06Operations, 10ops-eqiad: Degraded RAID on ms-be1008 - https://phabricator.wikimedia.org/T160488#3101154 (10fgiunchedi) [09:05:31] 06Operations, 10ops-eqiad: Degraded RAID on ms-be1008 - https://phabricator.wikimedia.org/T160488#3100581 (10fgiunchedi) a:03Cmjohnson @Cmjohnson if we have 2TB spares onsite please replace. We can reclaim the disks once decommission happens in a few weeks [09:05:42] 06Operations, 10ops-eqiad, 15User-fgiunchedi: Degraded RAID on ms-be1008 - https://phabricator.wikimedia.org/T160488#3101158 (10fgiunchedi) [09:09:11] 06Operations, 10MediaWiki-extensions-InterwikiSorting, 10Wikidata, 10Wikimedia-Extension-setup, and 3 others: Deploy InterwikiSorting extension to production - https://phabricator.wikimedia.org/T150183#2776612 (10Addshore) 05Open>03Resolved [09:09:38] PROBLEM - MariaDB Slave Lag: s4 on dbstore2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:09:38] PROBLEM - MariaDB Slave Lag: x1 on dbstore2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:09:48] PROBLEM - MariaDB Slave Lag: s7 on dbstore2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[09:11:12] !log Disable parallel replication on dbstore2002, dbstore2001, dbstore1002, dbstore1001 - T160407 [09:11:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:19] T160407: Apply wikishared.cx_translations index change - https://phabricator.wikimedia.org/T160407 [09:13:36] (03CR) 10Alexandros Kosiaris: [C: 031] "seems fine to me, @hashar any objections ?" [puppet] - 10https://gerrit.wikimedia.org/r/342637 (owner: 10Alexandros Kosiaris) [09:18:31] (03CR) 10Hashar: [C: 031] "I am surprised it pass! I guess Daniel and others fixed most of them." [puppet] - 10https://gerrit.wikimedia.org/r/342637 (owner: 10Alexandros Kosiaris) [09:22:39] (03CR) 10Alexandros Kosiaris: [C: 031] "makes sense. emailing ops about this" [puppet] - 10https://gerrit.wikimedia.org/r/342637 (owner: 10Alexandros Kosiaris) [09:26:52] (03CR) 10Volans: "The logic sounds good, see few comments inline" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/342633 (https://phabricator.wikimedia.org/T142825) (owner: 10Muehlenhoff) [09:32:28] RECOVERY - MariaDB Slave Lag: x1 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 89807.55 seconds [09:33:21] (03PS2) 10Ema: logrotate: use 'rsyslog rotate' instead of 'reload rsyslog' [puppet] - 10https://gerrit.wikimedia.org/r/342608 (https://phabricator.wikimedia.org/T160405) [09:36:48] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM!" 
[puppet] - 10https://gerrit.wikimedia.org/r/342608 (https://phabricator.wikimedia.org/T160405) (owner: 10Ema) [09:45:33] (03CR) 10Ema: [C: 032] logrotate: use 'rsyslog rotate' instead of 'reload rsyslog' [puppet] - 10https://gerrit.wikimedia.org/r/342608 (https://phabricator.wikimedia.org/T160405) (owner: 10Ema) [09:51:40] (03PS4) 10Giuseppe Lavagetto: Add redis switching task, some more stages boilerplate [switchdc] - 10https://gerrit.wikimedia.org/r/342498 (https://phabricator.wikimedia.org/T160178) [09:51:42] (03PS2) 10Giuseppe Lavagetto: Decouple logging setup from importing the module [switchdc] - 10https://gerrit.wikimedia.org/r/342657 [09:51:43] (03PS2) 10Giuseppe Lavagetto: Add cache wipe + warmup phase implementation [switchdc] - 10https://gerrit.wikimedia.org/r/342666 (https://phabricator.wikimedia.org/T160178) [09:51:45] (03PS1) 10Giuseppe Lavagetto: Add conftool support [switchdc] - 10https://gerrit.wikimedia.org/r/342805 (https://phabricator.wikimedia.org/T160178) [09:51:47] (03PS1) 10Giuseppe Lavagetto: Add stages to manage maintenance [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) [09:54:14] (03CR) 10Giuseppe Lavagetto: [C: 031] Add network::monitor role [puppet] - 10https://gerrit.wikimedia.org/r/342648 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [09:58:55] 06Operations, 06Performance-Team, 10Thumbor, 13Patch-For-Review: Implement DC-local cache failure limiter in Thumbor - https://phabricator.wikimedia.org/T151065#3101272 (10Gilles) [10:02:35] (03CR) 10Giuseppe Lavagetto: [C: 031] "I didn't check the config files, the rest looks good to me." 
[puppet] - 10https://gerrit.wikimedia.org/r/341005 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [10:02:38] RECOVERY - MariaDB Slave Lag: s7 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 89998.23 seconds [10:07:13] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Overall looks promising, various comments inline." (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/342431 (https://phabricator.wikimedia.org/T156245) (owner: 10Gilles) [10:08:26] (03CR) 10Volans: [C: 031] "LGTM, it was actually my first implementation :D" [switchdc] - 10https://gerrit.wikimedia.org/r/342657 (owner: 10Giuseppe Lavagetto) [10:11:26] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Initial import [switchdc] - 10https://gerrit.wikimedia.org/r/342492 (https://phabricator.wikimedia.org/T160178) (owner: 10Volans) [10:11:56] 06Operations, 15User-Elukey: JobQueue Redis codfw replicas periodically lagging - https://phabricator.wikimedia.org/T159850#3101302 (10akosiaris) >>! In T159850#3095981, @elukey wrote: > ``` > elukey@neodymium:~$ sudo -i salt -E 'rdb100[1357].eqiad.wmnet' cmd.run "du -hs /srv/redis/* | sort -h" > rdb1007.eqiad... [10:14:54] 06Operations, 10Pybal, 10Traffic, 13Patch-For-Review: pybal stops logging - https://phabricator.wikimedia.org/T160405#3101310 (10ema) 05Open>03Resolved a:03ema In the past we've always had a bunch of hosts constantly depooled because of various issues (eg: T148891) thus we got used to seeing frequen... [10:15:23] (03PS1) 10Gilles: Upgrade to 0.1.36 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/342809 (https://phabricator.wikimedia.org/T159358) [10:17:21] (03PS1) 10Filippo Giunchedi: prometheus: adjust storage retention [puppet] - 10https://gerrit.wikimedia.org/r/342810 [10:17:53] (03CR) 10Volans: [C: 04-1] "There is some error/missing things, see inline." 
(036 comments) [switchdc] - 10https://gerrit.wikimedia.org/r/342666 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [10:18:08] (03PS5) 10Giuseppe Lavagetto: Add redis switching task, some more stages boilerplate [switchdc] - 10https://gerrit.wikimedia.org/r/342498 (https://phabricator.wikimedia.org/T160178) [10:18:10] (03PS3) 10Giuseppe Lavagetto: Decouple logging setup from importing the module [switchdc] - 10https://gerrit.wikimedia.org/r/342657 [10:18:12] (03PS3) 10Giuseppe Lavagetto: Add cache wipe + warmup phase implementation [switchdc] - 10https://gerrit.wikimedia.org/r/342666 (https://phabricator.wikimedia.org/T160178) [10:18:14] (03PS2) 10Giuseppe Lavagetto: Add conftool support [switchdc] - 10https://gerrit.wikimedia.org/r/342805 (https://phabricator.wikimedia.org/T160178) [10:18:16] (03PS2) 10Giuseppe Lavagetto: Add stages to manage maintenance [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) [10:18:18] (03PS3) 10Muehlenhoff: Add support for Phabricator offboarding to offboarding script [puppet] - 10https://gerrit.wikimedia.org/r/342633 (https://phabricator.wikimedia.org/T142825) [10:18:23] (03CR) 10Muehlenhoff: Add support for Phabricator offboarding to offboarding script (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/342633 (https://phabricator.wikimedia.org/T142825) (owner: 10Muehlenhoff) [10:21:09] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: adjust storage retention [puppet] - 10https://gerrit.wikimedia.org/r/342810 (owner: 10Filippo Giunchedi) [10:23:28] RECOVERY - MariaDB Slave Lag: s4 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 89959.41 seconds [10:24:05] (03PS1) 10Gilles: Enable memcache-based Thumbor broken thumbnail throttling [puppet] - 10https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) [10:28:39] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "I don't like the idea of mixing data on our memcached servers, that are at 
the moment dedicated to mediawiki." [puppet] - 10https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) (owner: 10Gilles) [10:31:54] (03CR) 10Gilles: "It's reproducing a mediawiki feature almost verbatim, which will become a dead codepath when thumbor replaces mediawiki. I'm fine with a n" [puppet] - 10https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) (owner: 10Gilles) [10:33:28] (03CR) 10Gilles: "Or I guess nutcracker can point to the memcache instances on both thumbor nodes. Makinf the memcache cluster run on the thumbor machines t" [puppet] - 10https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) (owner: 10Gilles) [10:34:10] gilles: o/ [10:34:44] !log upgrade cp4001 (misc) and cp4011 (maps) to linux 4.9 T154934 [10:34:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:34:51] T154934: Package the next LTS kernel (4.9) - https://phabricator.wikimedia.org/T154934 [10:35:18] <_joe_> gilles: my point is I would like to have isolation of services [10:35:31] <_joe_> they shouldn't be able to pollute each other's datastores [10:36:46] (03CR) 10Giuseppe Lavagetto: Add cache wipe + warmup phase implementation (035 comments) [switchdc] - 10https://gerrit.wikimedia.org/r/342666 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [10:36:48] to support --^ there is also a problem of mixing too many different object sizes in our caches.. 
it would be great if we could start to have separate pools for services [10:37:03] !log Run namespaceDupes.php for pnb.wikipedia (T159976) [10:37:05] so they'll have a better and more consistent slab usage [10:37:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:09] T159976: Run namespaceDupes.php for wikis in Western Punjabi (pnb) - https://phabricator.wikimedia.org/T159976 [10:38:05] (03CR) 10Giuseppe Lavagetto: Add cache wipe + warmup phase implementation (031 comment) [switchdc] - 10https://gerrit.wikimedia.org/r/342666 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [10:41:43] !log Run namespaceDupes.php for pnb.wiktionary (T159976): all looks good for this one [10:41:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:42:01] (03CR) 10Elukey: "Yes I am pretty sure that nutcracker could easily work with memcached instances running on the same hosts, but I am concerned about puttin" [puppet] - 10https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) (owner: 10Gilles) [10:43:55] (03CR) 10Gilles: "So what do you want to do right now? 
Start off with nutcracker+memcached running on the same machines, or take on a task to create a separ" [puppet] - https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) (owner: Gilles)
[10:45:08] (PS1) Muehlenhoff: Change email address for moushira [puppet] - https://gerrit.wikimedia.org/r/342812
[10:52:20] (CR) Ema: add parsoid-vd-tests.wikimedia.org (1 comment) [dns] - https://gerrit.wikimedia.org/r/341920 (https://phabricator.wikimedia.org/T159995) (owner: Dzahn)
[10:52:51] (CR) Ema: [C: 1] varnish/misc: add parsoid-vd-tests -> ruthenium [puppet] - https://gerrit.wikimedia.org/r/341925 (https://phabricator.wikimedia.org/T159995) (owner: Dzahn)
[10:53:11] (CR) Ema: [C: 1] varnish/misc: rename parsoid-tests to parsoid-rt-tests [puppet] - https://gerrit.wikimedia.org/r/341926 (https://phabricator.wikimedia.org/T159995) (owner: Dzahn)
[10:55:02] (CR) Elukey: "I'd like to involve Filippo in the discussion, I'll ping him and get back to you asap." [puppet] - https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) (owner: Gilles)
[10:57:59] (PS1) Elukey: Ensure that all the Cassandra dirs are created after related user/group [puppet] - https://gerrit.wikimedia.org/r/342813
[11:04:08] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:08:20] (CR) Volans: [C: -1] "see inline" (3 comments) [switchdc] - https://gerrit.wikimedia.org/r/342666 (https://phabricator.wikimedia.org/T160178) (owner: Giuseppe Lavagetto)
[11:09:23] akosiaris: the puppet-lint bump will most probably not work
[11:09:40] akosiaris: the CI job on runs puppet-lint against files changed in HEAD, not on the whole repo
[11:10:27] hashar: do you mean it will make new problems show up ? cause I don't think it will
[11:10:40] I 've done a find . -name "*.pp" -ls -exec puppet-lint {} \;
[11:10:53] and aside from the 140 chars warnings that are already there
[11:10:55] I am running it on the whole repo right now using bundle exec rake puppetlint
[11:10:57] I saw nothing else big
[11:11:14] I am running 2.0.2 btw
[11:11:19] ideally CI should detect that one of the gem version bump requires to run the associated linter on the whole repo instead of just HEAD
[11:12:18] you are going btw to find already existing problems that need fixing. but which from what I gather show up in 1.1.0 too
[11:12:25] (CR) Volans: [C: -1] "There is a typo, see inline. LGTM otherwise" (2 comments) [switchdc] - https://gerrit.wikimedia.org/r/342805 (https://phabricator.wikimedia.org/T160178) (owner: Giuseppe Lavagetto)
[11:12:37] possibly yeah
[11:12:47] I think mutante had a patch to disable the 140 chars warning
[11:12:58] Operations, DBA, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3101425 (Marostegui)
[11:13:11] but yeah your find *.pp -exec puppet-lint would reproduce what CI does
[11:13:13] nice
[11:13:21] modules/stdlib/spec/fixtures/test/manifests/absolutepath.pp:5 WARNING double quoted string containing no variables (double_quoted_strings)
[11:13:21] :(
[11:13:53] heh. not much we can do about that btw
[11:14:05] upstream's error
[11:14:17] puppet 4 ?
[11:14:26] I mean, could you have puppet 4 on your local machine?
[11:14:32] I do indeed [11:14:56] but puppet-lint is unrelated to puppet [11:14:59] funnily enough [11:15:27] it does use Puppet to lex/parse whatever the puppet manifest iirc [11:15:41] on the contrary, it doesn't IIRC [11:16:13] lexer.rb:require 'puppet-lint/lexer/token' [11:16:20] has it's own lexer [11:16:51] ah [11:17:25] I mix it up with puppet-syntax sorry :/ [11:18:03] (03PS4) 10Giuseppe Lavagetto: Add cache wipe + warmup phase implementation [switchdc] - 10https://gerrit.wikimedia.org/r/342666 (https://phabricator.wikimedia.org/T160178) [11:18:05] (03PS3) 10Giuseppe Lavagetto: Add conftool support [switchdc] - 10https://gerrit.wikimedia.org/r/342805 (https://phabricator.wikimedia.org/T160178) [11:18:07] (03PS3) 10Giuseppe Lavagetto: Add stages to manage maintenance [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) [11:19:27] <_joe_> volans: I had somehow not re-sent the new patch [11:19:31] <_joe_> but just a rebase, sorry [11:20:07] eheheh now make sense [11:24:39] 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 07Wikimedia-Multiple-active-datacenters: Prepare and improve the datacenter switchover procedure - https://phabricator.wikimedia.org/T154658#3101447 (10akosiaris) [11:24:42] 06Operations, 10Revision-Scoring-As-A-Service-Backlog, 13Patch-For-Review: Set up oresrdb redis node in codfw - https://phabricator.wikimedia.org/T139372#3101444 (10akosiaris) 05Open>03Resolved a:03akosiaris oresrdb2002 is up and running fine as a slave of oresrdb2001 for the last 2 days, so I consider... [11:25:26] 06Operations, 10Revision-Scoring-As-A-Service-Backlog, 13Patch-For-Review: Set up oresrdb redis node in codfw - https://phabricator.wikimedia.org/T139372#3101452 (10akosiaris) [11:25:28] 06Operations, 10hardware-requests: codfw: (2) servers request for ORES redis databases - https://phabricator.wikimedia.org/T142190#3101449 (10akosiaris) 05Open>03Resolved a:03akosiaris I think we can resolve this. 
We have 1 VM and 1 hardware box for this and seem to be working fine. [11:29:01] jouncebot: next [11:29:01] In 1 hour(s) and 30 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T1300) [11:32:15] (03CR) 10Volans: [C: 04-1] "Some typo inline. The logic looks good." (037 comments) [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [11:33:18] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [11:41:19] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:47:55] (03CR) 10Filippo Giunchedi: [C: 032] Upgrade to 0.1.36 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/342809 (https://phabricator.wikimedia.org/T159358) (owner: 10Gilles) [11:49:34] (03CR) 10Filippo Giunchedi: [C: 031] Ensure that all the Cassandra dirs are created after related user/group [puppet] - 10https://gerrit.wikimedia.org/r/342813 (owner: 10Elukey) [11:53:41] (03CR) 10Elukey: [C: 032] Ensure that all the Cassandra dirs are created after related user/group [puppet] - 10https://gerrit.wikimedia.org/r/342813 (owner: 10Elukey) [11:54:24] (03CR) 10Gilles: "It makes sense for Thumbor to have its own memcache. I just don't think that the only small feature thumbor needs memcache for warrants de" [puppet] - 10https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) (owner: 10Gilles) [11:59:54] (03CR) 10Filippo Giunchedi: "I think for now we can co-locate memcached on thumbor, on the basis thumbor IIRC will fail-open if the key isn't there (i.e. 
skip the brok" [puppet] - 10https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) (owner: 10Gilles) [12:00:06] (03PS1) 10Muehlenhoff: Extend list of privileged LDAP groups [puppet] - 10https://gerrit.wikimedia.org/r/342818 [12:01:03] (03CR) 10Gilles: "I have an update ready for thumbor100* colocation, pushing it in a sec" [puppet] - 10https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) (owner: 10Gilles) [12:03:38] 06Operations, 10MediaWiki-Configuration, 06Performance-Team, 06Services (watching), and 5 others: Allow integration of data from etcd into the MediaWiki configuration - https://phabricator.wikimedia.org/T156924#3101507 (10Joe) Thinking of a general way to represent any mediawiki-config variable left me wit... [12:04:09] (03PS2) 10Muehlenhoff: Extend list of privileged LDAP groups [puppet] - 10https://gerrit.wikimedia.org/r/342818 [12:05:05] (03PS2) 10Gilles: Enable memcache-based Thumbor broken thumbnail throttling [puppet] - 10https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) [12:06:00] (03CR) 10Muehlenhoff: [C: 032] Extend list of privileged LDAP groups [puppet] - 10https://gerrit.wikimedia.org/r/342818 (owner: 10Muehlenhoff) [12:07:44] 06Operations: Enhance account handling (meta bug) - https://phabricator.wikimedia.org/T142815#3101532 (10MoritzMuehlenhoff) [12:07:49] 06Operations, 07Documentation, 07LDAP, 13Patch-For-Review: Review list of LDAP groups and document exactly what kind of access they can be allowed to provide - https://phabricator.wikimedia.org/T129788#3101530 (10MoritzMuehlenhoff) 05Open>03Resolved https://wikitech.wikimedia.org/wiki/LDAP_Groups is n... 
[12:09:18] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [12:13:56] !log deploy thumbor 0.1.36-1 on thumbor100* [12:14:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:02] gilles: ^ [12:14:49] thanks [12:15:39] (03PS2) 10Gilles: Performance Grafana alerts [puppet] - 10https://gerrit.wikimedia.org/r/342431 (https://phabricator.wikimedia.org/T156245) [12:15:48] (03CR) 10Gilles: Performance Grafana alerts (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/342431 (https://phabricator.wikimedia.org/T156245) (owner: 10Gilles) [12:16:24] (03PS19) 10Filippo Giunchedi: prometheus: add snmp_exporter module and profile [puppet] - 10https://gerrit.wikimedia.org/r/341005 (https://phabricator.wikimedia.org/T148541) [12:16:47] (03CR) 10jerkins-bot: [V: 04-1] Performance Grafana alerts [puppet] - 10https://gerrit.wikimedia.org/r/342431 (https://phabricator.wikimedia.org/T156245) (owner: 10Gilles) [12:18:01] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: add snmp_exporter module and profile [puppet] - 10https://gerrit.wikimedia.org/r/341005 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [12:18:16] (03PS3) 10Filippo Giunchedi: Add network::monitor role [puppet] - 10https://gerrit.wikimedia.org/r/342648 (https://phabricator.wikimedia.org/T148541) [12:18:24] (03PS3) 10Gilles: Performance Grafana alerts [puppet] - 10https://gerrit.wikimedia.org/r/342431 (https://phabricator.wikimedia.org/T156245) [12:20:41] (03CR) 10Filippo Giunchedi: [C: 032] Add network::monitor role [puppet] - 10https://gerrit.wikimedia.org/r/342648 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [12:25:28] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[prometheus-snmp-exporter] [12:26:18] PROBLEM - DPKG on netmon1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:29:17] that's me ^ [12:30:34] phuedx: want to deploy your change yourself today during eu swat? [12:31:15] zeljkof: sure [12:31:35] phuedx: want to deploy the other two changes? ;) [12:31:45] saw that coming [12:31:48] :D [12:32:06] I can deploy them if you are busy, just asking [12:32:25] no pressure [12:32:55] zeljkof: no sure, i think i can get 'em done [12:32:58] i'll go get lunch now [12:33:18] phuedx: great, claim swat window then and good luck [12:33:29] how does one claim the swat window [12:33:31] stick a flag in it? [12:33:35] I will be around if you need help, probably hasharLunch also [12:33:54] when jouncebot pings you just say "I can swat today!" [12:34:22] but you have type it with the left hand only, if you are right-handed, and with right hand only of you are left-handed [12:34:52] don't cheat, jouncebot can tell the difference [12:35:18] RECOVERY - DPKG on netmon1001 is OK: All packages OK [12:36:28] I am here for euro swat FYI [12:39:11] 06Operations, 10Wikimedia-Site-requests, 07WMF-maintenance-script-run: Run "refreshLinks.php --dfn-only" on all wikis periodically - https://phabricator.wikimedia.org/T18112#3101690 (10MarcoAurelio) [12:41:28] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [12:47:37] (03PS1) 10Filippo Giunchedi: prometheus: fix erb yaml indentation [puppet] - 10https://gerrit.wikimedia.org/r/342823 [12:48:18] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [12:49:04] 06Operations, 10Wikimedia-Site-requests, 07WMF-maintenance-script-run: Run "refreshLinks.php --dfn-only" on all wikis periodically - https://phabricator.wikimedia.org/T18112#3101862 (10MarcoAurelio) [12:49:38] 06Operations, 10ops-codfw: troubleshoot drac on ms-be2010.codfw.wmnet - https://phabricator.wikimedia.org/T155690#3101868 (10fgiunchedi) a:05fgiunchedi>03RobH [12:50:06] 06Operations, 10ops-eqiad, 13Patch-For-Review, 15User-fgiunchedi: rack/setup prometheus100[3-4] - https://phabricator.wikimedia.org/T152504#3101876 (10fgiunchedi) [12:50:38] 06Operations, 10ops-codfw, 13Patch-For-Review: rack/setup restbase201[0-2] - https://phabricator.wikimedia.org/T150680#3101879 (10fgiunchedi) 05Open>03Resolved This was done a while ago [12:51:42] (03PS3) 10Gehel: wdqs - set heap size for blazegraph from puppet [puppet] - 10https://gerrit.wikimedia.org/r/342695 (https://phabricator.wikimedia.org/T160218) [12:53:10] 06Operations, 10ops-codfw, 06DC-Ops, 13Patch-For-Review: mw2212 had several downtimes recently - test before repool - https://phabricator.wikimedia.org/T129196#3101909 (10fgiunchedi) 05Open>03Resolved No issues, resolving [12:53:36] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: fix erb yaml indentation [puppet] - 10https://gerrit.wikimedia.org/r/342823 (owner: 10Filippo Giunchedi) [12:53:49] jouncebot: next [12:53:49] In 0 hour(s) and 6 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T1300) [12:58:05] (03CR) 10Gehel: [C: 032] wdqs - set heap size for blazegraph from puppet [puppet] - 10https://gerrit.wikimedia.org/r/342695 (https://phabricator.wikimedia.org/T160218) (owner: 10Gehel) [12:58:13] (03PS4) 10Gehel: wdqs - set heap size for blazegraph from puppet [puppet] - 10https://gerrit.wikimedia.org/r/342695 (https://phabricator.wikimedia.org/T160218) [13:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, 
twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T1300). [13:00:05] Urbanecm, Zppix, and Phuedx: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [13:00:09] 06Operations, 10media-storage: File not found after reupload - https://phabricator.wikimedia.org/T125140#3101922 (10fgiunchedi) a:05fgiunchedi>03None Indeed the previous version from 2007 doesn't seem to be there, not sure there's much we can do [13:00:27] i can swat today [13:00:34] i typed it with both hands [13:00:37] i'm a monster [13:00:40] o/ [13:00:49] Urbanecm: yt? [13:00:57] o/ Zppix [13:01:00] 06Operations, 05Prometheus-metrics-monitoring, 15User-fgiunchedi: Provide authenticated access to Prometheus native web interface - https://phabricator.wikimedia.org/T151009#3101927 (10fgiunchedi) [13:01:22] alright Zppix, we'll start with yours [13:01:25] ok [13:01:33] give Urbanecm a little time to appear/get ready [13:01:46] 06Operations, 10RESTBase, 10RESTBase-Cassandra, 13Patch-For-Review, 06Services (watching): rename cassandra cluster - https://phabricator.wikimedia.org/T112257#3101930 (10fgiunchedi) a:05fgiunchedi>03None [13:01:57] (03PS16) 10Phuedx: Removal of "editusercssjs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342456 (owner: 10Zppix) [13:04:01] (03PS1) 10Elukey: Update prometheus jmx_exporter path in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/342825 (https://phabricator.wikimedia.org/T155120) [13:04:22] (03CR) 10Volans: [C: 04-1] "Thanks for the work and the tests! 
Looks good in general, see "few" things inline ;)" (0325 comments) [switchdc] - 10https://gerrit.wikimedia.org/r/342498 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [13:04:25] !log syncing puppet git repo on wdqs-puppet.wikidata-query.eqiad.wmflabs [13:04:28] PROBLEM - puppet last run on db1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:04:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:04:47] (03CR) 10Phuedx: [C: 032] "SWAT!!1" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342456 (owner: 10Zppix) [13:05:21] lol - see "few" things inline ;)" (25 comments) [13:05:32] Zppix: ^ once it's merged, i'll pull it onto mwdebug1001 so that you can test it [13:05:33] volans strikes back [13:05:39] phuedx: ack [13:05:43] elukey: lol [13:05:46] elukey: lol [13:05:58] it's the wink that makes that comment [13:06:00] (03Merged) 10jenkins-bot: Removal of "editusercssjs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342456 (owner: 10Zppix) [13:07:03] (03CR) 10Volans: [C: 04-1] "One typo, looks good otherwise" (031 comment) [switchdc] - 10https://gerrit.wikimedia.org/r/342666 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [13:07:20] (03CR) 10jenkins-bot: Removal of "editusercssjs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342456 (owner: 10Zppix) [13:07:26] (03PS1) 10Muehlenhoff: Add a delay in LDAP memberof rewrite script to cope with replication [puppet] - 10https://gerrit.wikimedia.org/r/342826 (https://phabricator.wikimedia.org/T142817) [13:07:29] unexpected commit in the diff on the deployment host [13:07:52] oh, it's beta only, merged by addshore [13:07:53] okie poke [13:07:58] hmmmm [13:08:18] phuedx: question, do i login to debug the same as labs? [13:08:29] addshore: https://gerrit.wikimedia.org/r/#/c/341401/3 [13:08:29] phuedx: I did sync that... [13:08:53] elukey: :-P [13:08:54] unlesss..... 
I didn't..... [13:09:28] addshore: it's scapped by jenkins automatically, right? /cc zeljkof [13:09:36] for beta yes [13:09:38] 06Operations, 07LDAP, 13Patch-For-Review: Enhance group membership visibility using the memberof LDAP overlay - https://phabricator.wikimedia.org/T142817#3101953 (10MoritzMuehlenhoff) I've revisited this and after some digging it's apparent that maintaining the memberOf attribute of all newly added groups is... [13:09:58] It would appear what I did was, not fetch it and not rebase it, but just sync the old version of the files again... [13:10:12] *file [13:10:19] Zppix: you use the WikimediaDebug browser extension [13:10:26] Zppix: e.g. you use https://chrome.google.com/webstore/detail/wikimediadebug/binmakecefompkjggiklgjenddjoifbb for chrome [13:10:27] (03CR) 10Filippo Giunchedi: [C: 04-1] Update prometheus jmx_exporter path in deployment-prep (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342825 (https://phabricator.wikimedia.org/T155120) (owner: 10Elukey) [13:10:42] phuedx: yeah i misread the deploy code page [13:11:09] addshore, zeljkof: not quite sure what to do here [13:11:14] addshore: want me to sync the file? 
[13:11:21] phuedx: I can sync it if you would like :) [13:11:29] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/342633 (https://phabricator.wikimedia.org/T142825) (owner: 10Muehlenhoff) [13:11:30] addshore: go for it while i help Zppix [13:11:40] {{doing}} [13:11:51] note that tin currently has https://gerrit.wikimedia.org/r/#/c/342456/ on it too [13:11:55] addshore ^ [13:12:04] ack :) [13:12:21] testing now phuedx [13:13:21] phuedx: all good no errors on my end [13:13:48] (03PS2) 10Elukey: Update prometheus jmx_exporter path in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/342825 (https://phabricator.wikimedia.org/T155120) [13:14:05] Zppix: cool, addshore: lmk when you're done [13:14:19] will do, the sync seems to be waiting for 1 host [13:14:22] !log addshore@tin Synchronized wmf-config/InitialiseSettings-labs.php: [[gerrit:341401|Enable Cognate on beta wiktionary sites]] T156241 Beta Only (again) (duration: 02m 45s) [13:14:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:14:27] T156241: Deploy Cognate extension to beta - https://phabricator.wikimedia.org/T156241 [13:14:28] phuedx: ^^ thats my mess all gone [13:14:38] and FYI ssh: connect to host mw2256.codfw.wmnet port 22: Connection timed out [13:14:44] (03CR) 10Elukey: [V: 032 C: 032] "Didn't get the meaning of SNAPSHOT before your comment, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/342825 (https://phabricator.wikimedia.org/T155120) (owner: 10Elukey) [13:15:04] addshore: ta <3 [13:15:21] It looks like mw2256 is down, so you will likely get the same error on your sync [13:15:39] addshore: it is depooled at the moment i believe i remember seeing on phab [13:17:10] 06Operations, 10ops-codfw, 13Patch-For-Review, 15User-Elukey: codfw: mw2251-mw2260 rack/setup - https://phabricator.wikimedia.org/T155180#2936454 (10Addshore) >>! In T155180#3100345, @Reedy wrote: > @elukey Can we get mw2256 depooled from the dsh lists etc? 
> > Will stop scap giving timeout errors for the... [13:17:18] RECOVERY - puppet last run on mw1194 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [13:17:29] Urbanecm: ping pong [13:18:04] addshore: o/ - timeouts for mw2256? [13:18:10] yup [13:18:14] elukey: yup [13:18:22] weird, let me check [13:18:29] !log phuedx@tin Synchronized wmf-config/InitialiseSettings.php: 342456: Remove "editusercssjs". (duration: 02m 50s) [13:18:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:37] Zppix: ^ [13:18:38] I saw in the ticket a SAL for it being depooled, but.. :P [13:18:40] (03PS1) 10Alexandros Kosiaris: Remove role::list:migration [puppet] - 10https://gerrit.wikimedia.org/r/342828 [13:18:53] Thanks phuedx :) [13:19:09] ok, i guess i'm up next [13:19:23] ah it wasn't set inactive [13:19:36] I am running puppet on tin to update the dsh [13:20:57] addshore: now mw2256 is removed from dsh, should not return timeouts anymore [13:20:58] :) [13:21:04] thanks elukey! [13:21:20] (03PS2) 10Phuedx: pagePreviews: Enable perf instrumentation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342672 (https://phabricator.wikimedia.org/T157111) [13:21:26] thanks elukey [13:21:32] 06Operations, 10ops-codfw, 13Patch-For-Review, 15User-Elukey: codfw: mw2251-mw2260 rack/setup - https://phabricator.wikimedia.org/T155180#3101984 (10elukey) set mw2256 as inactive and ran puppet on tin, host removed from DSH. 
[13:21:58] thanks you guys for the patience, sorry for the noise :( [13:22:07] (03CR) 10Muehlenhoff: [C: 032] Add a delay in LDAP memberof rewrite script to cope with replication [puppet] - 10https://gerrit.wikimedia.org/r/342826 (https://phabricator.wikimedia.org/T142817) (owner: 10Muehlenhoff) [13:22:09] elukey: we all make mistakes [13:22:12] (03PS2) 10Muehlenhoff: Add a delay in LDAP memberof rewrite script to cope with replication [puppet] - 10https://gerrit.wikimedia.org/r/342826 (https://phabricator.wikimedia.org/T142817) [13:24:44] (03PS3) 10Phuedx: pagePreviews: Enable perf instrumentation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342672 (https://phabricator.wikimedia.org/T157111) [13:25:16] come on jenkins [13:25:43] Urbanecm: yt? if so, then i can deploy your change (https://gerrit.wikimedia.org/r/#/c/342609/) [13:25:44] its almost finished [13:27:19] (03CR) 10Phuedx: [C: 032] "SWAT!!1" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342672 (https://phabricator.wikimedia.org/T157111) (owner: 10Phuedx) [13:27:41] (03PS1) 10Elukey: Update Cassandra jmx_exporter config path in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/342829 (https://phabricator.wikimedia.org/T155120) [13:27:53] (03PS2) 10Elukey: Update Cassandra jmx_exporter config path in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/342829 (https://phabricator.wikimedia.org/T155120) [13:29:01] (03CR) 10Elukey: [V: 032 C: 032] Update Cassandra jmx_exporter config path in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/342829 (https://phabricator.wikimedia.org/T155120) (owner: 10Elukey) [13:29:31] (03Merged) 10jenkins-bot: pagePreviews: Enable perf instrumentation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342672 (https://phabricator.wikimedia.org/T157111) (owner: 10Phuedx) [13:29:39] (03CR) 10jenkins-bot: pagePreviews: Enable perf instrumentation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342672 
(https://phabricator.wikimedia.org/T157111) (owner: 10Phuedx) [13:30:58] ^ pulling the change onto mwdebug1001 [13:32:28] RECOVERY - puppet last run on db1030 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [13:33:03] 06Operations, 07LDAP, 13Patch-For-Review: Enhance group membership visibility using the memberof LDAP overlay - https://phabricator.wikimedia.org/T142817#3102012 (10MoritzMuehlenhoff) All privileged groups have been updated with the exception of cn=wmf and cn=ops, I will these during an upcoming early mornin... [13:34:42] ok, looks good on mwdebug1001 [13:34:59] page previews working as expected on mw.org and the sampling rate is showing up [13:36:21] jouncebot, next [13:36:21] In 4 hour(s) and 23 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T1800) [13:36:30] jouncebot, now [13:36:30] For the next 0 hour(s) and 23 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T1300) [13:37:47] !log phuedx@tin Synchronized wmf-config/InitialiseSettings.php: T157111: pagePreviews: Enable perf instrumentation (duration: 00m 42s) [13:37:51] phuedx, I saw you mentioned me before a moment. I'm sorry I thought the EU SWAT is from 14:00 to 15:00 UTC. Could my change be deployed? [13:37:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:37:54] T157111: [8 hours] Create a Page Previews performance dashboard - https://phabricator.wikimedia.org/T157111 [13:38:36] Urbanecm: remember deploy slots are not locked to UTC ;) [13:38:51] Urbanecm: sure [13:38:53] ! :) [13:38:56] also, no worries [13:39:01] addshore, to what are the windows locked then? [13:39:11] (03PS2) 10Phuedx: Add d to enwikisource's import list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342609 (https://phabricator.wikimedia.org/T160403) (owner: 10Urbanecm) [13:39:43] to PDT? 
[13:39:52] whatever timezone SF is [13:40:12] so, yes PDT I believe [13:41:50] (03CR) 10Phuedx: [C: 032] "SWAT!!1" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342609 (https://phabricator.wikimedia.org/T160403) (owner: 10Urbanecm) [13:42:53] (03Merged) 10jenkins-bot: Add d to enwikisource's import list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342609 (https://phabricator.wikimedia.org/T160403) (owner: 10Urbanecm) [13:43:04] (03CR) 10jenkins-bot: Add d to enwikisource's import list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342609 (https://phabricator.wikimedia.org/T160403) (owner: 10Urbanecm) [13:43:52] Urbanecm: can your change be tested on mwdebug1001? i've pulled it onto that server [13:44:17] phuedx, it can be but only with sysop rights which I do not have. [13:44:30] ah [13:44:35] I can test it on your behalf phuedx Urbanecm [13:44:41] TabbyCat: <3 <3 <3 [13:44:45] TabbyCat, thank you! [13:44:52] what was done? [13:45:03] add d: to import? [13:45:39] TabbyCat: yup [13:45:46] (03PS4) 10Muehlenhoff: Add support for Phabricator offboarding to offboarding script [puppet] - 10https://gerrit.wikimedia.org/r/342633 (https://phabricator.wikimedia.org/T142825) [13:46:13] d is in the dropdown of the Special:Import @en.wikisource on mwdebug1001 [13:47:21] Urbanecm: & phuedx ^^ [13:47:30] (03PS1) 10Ema: cache: different parity for start/end ip_local_port_range values [puppet] - 10https://gerrit.wikimedia.org/r/342832 [13:47:35] TabbyCat, then it works as it should. [13:47:36] cool [13:47:38] syncing [13:47:39] (03CR) 10Muehlenhoff: [C: 032] Add support for Phabricator offboarding to offboarding script [puppet] - 10https://gerrit.wikimedia.org/r/342633 (https://phabricator.wikimedia.org/T142825) (owner: 10Muehlenhoff) [13:47:40] yes [13:47:41] phuedx, thx [13:47:52] can I invoice you now? 
:P [13:48:11] (my pleasure to help) [13:48:44] !log phuedx@tin Synchronized wmf-config/InitialiseSettings.php: T160403: Add d to enwikisource's import list (duration: 00m 42s) [13:48:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:48:50] T160403: Add wikidata to English Wikisource's Special:Import - https://phabricator.wikimedia.org/T160403 [13:50:41] cool [13:54:27] (03PS1) 10Alexandros Kosiaris: Set up sync for icinga state between hosts [puppet] - 10https://gerrit.wikimedia.org/r/342833 [14:01:25] jouncebot: next [14:01:25] In 3 hour(s) and 58 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T1800) [14:04:10] (03PS2) 10Alexandros Kosiaris: Set up sync for icinga state between hosts [puppet] - 10https://gerrit.wikimedia.org/r/342833 [14:05:48] !log uploaded python-phabricator 0.6.1-1~bpo8~trusty1 for trusty-wikimedia to apt.wikimedia.org (required for Phabricator support in offboarding script running on terbium (trusty)) [14:05:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:18] PROBLEM - puppet last run on db1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:20:01] 06Operations, 10fundraising-tech-ops, 10netops: set up firewall policies for barium, lutetium, db1025, and indium replacement servers - https://phabricator.wikimedia.org/T159336#3102151 (10Jgreen) p:05Normal>03Unbreak! Raising priority because this blocks deprecating Precise in fundraising. 
[14:21:41] (03CR) 10Giuseppe Lavagetto: Add redis switching task, some more stages boilerplate (0324 comments) [switchdc] - 10https://gerrit.wikimedia.org/r/342498 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [14:21:53] 06Operations, 10fundraising-tech-ops, 10netops: set up firewall policies for barium, lutetium, db1025, and indium replacement servers - https://phabricator.wikimedia.org/T159336#3102171 (10Jgreen) [14:22:46] !log installing chromium security update on osmium [14:22:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:24:04] 06Operations, 10ops-eqiad: rack and cable frdb1002 - https://phabricator.wikimedia.org/T159886#3102178 (10Jgreen) p:05Normal>03High This blocks deprecating Precise in fundraising, thus priority is high. [14:25:04] (03CR) 10Giuseppe Lavagetto: Add cache wipe + warmup phase implementation (031 comment) [switchdc] - 10https://gerrit.wikimedia.org/r/342666 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [14:26:22] (03PS1) 10Elukey: Fix cassandra seeds for deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/342836 [14:26:59] 06Operations, 10fundraising-tech-ops, 10netops: deploy firewall policies for (barium,lutetium,db1025,indium) replacements (civi1001,frdev1001,frdb1002,frlog1001) - https://phabricator.wikimedia.org/T159336#3102204 (10Jgreen) [14:27:40] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: rack and cable frdev1001 - https://phabricator.wikimedia.org/T159887#3102205 (10Jgreen) p:05Normal>03High This blocks deprecating Precise in Fundraising, thus raising priority to high. 
[14:27:54] (03CR) 10Elukey: "There is probably a better and less repetitive way to avoid this mess, please let me know in case and I'll amend the code review :)" [puppet] - 10https://gerrit.wikimedia.org/r/342836 (owner: 10Elukey) [14:28:12] (03CR) 10Giuseppe Lavagetto: Add conftool support (031 comment) [switchdc] - 10https://gerrit.wikimedia.org/r/342805 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [14:28:44] 06Operations, 10fundraising-tech-ops, 10netops: deploy firewall policies for (barium,lutetium,db1025,indium) replacements (civi1001,frdev1001,frdb1002,frlog1001) - https://phabricator.wikimedia.org/T159336#3064590 (10Jgreen) [14:30:09] 06Operations, 10fundraising-tech-ops, 10netops: deploy firewall policies for (barium,lutetium,db1025,indium) replacements (civi1001,frdev1001,frdb1002,frlog1001) - https://phabricator.wikimedia.org/T159336#3102213 (10Jgreen) [14:30:52] 06Operations, 10fundraising-tech-ops, 10netops: deploy firewall policies for (barium,lutetium,db1025,indium) replacements (civi1001,frdev1001,frdb1002,frlog1001) - https://phabricator.wikimedia.org/T159336#3064590 (10Jgreen) [14:30:56] 06Operations, 10fundraising-tech-ops, 10netops: reassign wmf7010/frpm1001 to host "civi1001.frack.eqiad.wmnet" - https://phabricator.wikimedia.org/T159342#3102217 (10Jgreen) [14:32:23] (03CR) 10Giuseppe Lavagetto: Add stages to manage maintenance (037 comments) [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [14:33:23] 06Operations, 10Wikimedia-General-or-Unknown, 13Patch-For-Review: foreachwikiindblist regular cronspam - https://phabricator.wikimedia.org/T159438#3102242 (10demon) >>! In T159438#3100976, @elukey wrote: > The last cronspam email seems to be the same issue described in T145360, we have more than one issue wi... 
[14:33:40] (03PS6) 10Giuseppe Lavagetto: Add redis switching task, some more stages boilerplate [switchdc] - 10https://gerrit.wikimedia.org/r/342498 (https://phabricator.wikimedia.org/T160178) [14:33:43] (03PS4) 10Giuseppe Lavagetto: Decouple logging setup from importing the module [switchdc] - 10https://gerrit.wikimedia.org/r/342657 [14:33:45] (03PS5) 10Giuseppe Lavagetto: Add cache wipe + warmup phase implementation [switchdc] - 10https://gerrit.wikimedia.org/r/342666 (https://phabricator.wikimedia.org/T160178) [14:33:47] (03PS4) 10Giuseppe Lavagetto: Add conftool support [switchdc] - 10https://gerrit.wikimedia.org/r/342805 (https://phabricator.wikimedia.org/T160178) [14:33:49] (03PS4) 10Giuseppe Lavagetto: Add stages to manage maintenance [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) [14:37:50] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 2 others: Purge Varnish cache when a banner is saved - https://phabricator.wikimedia.org/T154954#3102252 (10Pcoombe) @AndyRussG That sounds fine to me. [14:46:18] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [14:46:40] 06Operations, 10ops-codfw, 10hardware-requests: Decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#3102271 (10Papaul) [14:51:19] (03CR) 10Elukey: "From https://wikitech.wikimedia.org/wiki/Puppet_Hiera#In_Labs this one seems the only viable option?" [puppet] - 10https://gerrit.wikimedia.org/r/342836 (owner: 10Elukey) [14:53:07] jouncebot: next [14:53:07] In 3 hour(s) and 6 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T1800) [14:57:43] (03CR) 10Elukey: [C: 032] Fix cassandra seeds for deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/342836 (owner: 10Elukey) [14:58:18] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [14:58:38] PROBLEM - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry mailbox lag is 632182 [15:00:11] (03PS5) 10Gehel: elasticsearch - statsd plugin isn't used anymore [puppet] - 10https://gerrit.wikimedia.org/r/342052 [15:00:27] (03CR) 10Volans: [C: 031] "LGTM" (031 comment) [switchdc] - 10https://gerrit.wikimedia.org/r/342498 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [15:01:09] (03PS2) 10Ema: cache: different parity for start/end ip_local_port_range values [puppet] - 10https://gerrit.wikimedia.org/r/342832 [15:01:25] (03CR) 10DCausse: [C: 031] elasticsearch - statsd plugin isn't used anymore [puppet] - 10https://gerrit.wikimedia.org/r/342052 (owner: 10Gehel) [15:02:20] (03CR) 10Volans: [C: 031] "LGTM" [switchdc] - 10https://gerrit.wikimedia.org/r/342666 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [15:03:05] (03CR) 10Volans: [C: 031] "LGTM" [switchdc] - 10https://gerrit.wikimedia.org/r/342805 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [15:05:22] (03CR) 10Volans: [C: 031] "LGTM" [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [15:18:58] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 606310 [15:19:34] (03PS6) 10Gehel: elasticsearch - statsd plugin isn't used anymore [puppet] - 10https://gerrit.wikimedia.org/r/342052 [15:20:58] (03CR) 10Gehel: [C: 032] elasticsearch - statsd plugin isn't used anymore [puppet] - 10https://gerrit.wikimedia.org/r/342052 (owner: 10Gehel) [15:22:07] 06Operations, 10ops-codfw, 10hardware-requests: Decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#3102427 (10Papaul) [15:26:18] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [15:27:00] 06Operations, 10ops-eqiad, 
10fundraising-tech-ops: Rack and setup Fundraising DB frdb1001 - https://phabricator.wikimedia.org/T136200#3102455 (10Jgreen) [15:29:13] 06Operations, 10fundraising-tech-ops, 10netops: Cleanup layer2 firewall config from pfw-eqiad - https://phabricator.wikimedia.org/T111463#3102464 (10Jgreen) [15:29:18] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 22 probes of 277 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [15:31:31] (03PS1) 10Papaul: DNS/Decom: Remove DNS entries for db200[1-9] [dns] - 10https://gerrit.wikimedia.org/r/342841 [15:32:42] (03CR) 10RobH: [C: 04-1] "-1 since the sda2 and sdb2 partitions are still created, just not labled/assigned. See in line comment =]" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342662 (https://phabricator.wikimedia.org/T158884) (owner: 10Gehel) [15:33:04] 06Operations, 10ops-codfw, 10hardware-requests, 13Patch-For-Review: Decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#3102470 (10Papaul) switch port information All servers are in row A rack A6 db2001 ge-6/0/0 db2002 ge-6/0/1 db2003 ge-6/0/2 db2004 ge-6/0/3 db2005 ge-6/0/4 db2006 ge-6/0/5 db... 
[15:34:18] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 15 probes of 277 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [15:34:29] 06Operations, 10ops-codfw, 10hardware-requests, 13Patch-For-Review: Decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#3102474 (10Papaul) a:05Papaul>03RobH [15:34:41] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: decommission the old pay-lvs1001/pay-lvs1002 boxes - https://phabricator.wikimedia.org/T156284#3102475 (10Jgreen) a:03Cmjohnson [15:35:33] (03PS2) 10Gehel: elasticsearch - no need to use swap [puppet] - 10https://gerrit.wikimedia.org/r/342662 (https://phabricator.wikimedia.org/T158884) [15:36:26] 06Operations, 10ops-eqiad: decommission the old pay-lvs1001/pay-lvs1002 boxes - https://phabricator.wikimedia.org/T156284#2969512 (10Jgreen) [15:37:33] (03PS2) 10Giuseppe Lavagetto: discovery: add parsoid entry [puppet] - 10https://gerrit.wikimedia.org/r/340992 [15:37:35] (03PS2) 10Giuseppe Lavagetto: discovery: add more DNS entries [puppet] - 10https://gerrit.wikimedia.org/r/340994 [15:37:37] (03PS2) 10Giuseppe Lavagetto: realm: remove graphoid_site [puppet] - 10https://gerrit.wikimedia.org/r/340995 [15:37:39] (03PS2) 10Giuseppe Lavagetto: realm: get rid of more entries [puppet] - 10https://gerrit.wikimedia.org/r/340996 [15:37:41] (03PS2) 10Giuseppe Lavagetto: realm: remove rb_site [puppet] - 10https://gerrit.wikimedia.org/r/340997 [15:37:43] (03PS2) 10Giuseppe Lavagetto: discovery: add api endpoint [puppet] - 10https://gerrit.wikimedia.org/r/340998 [15:37:45] (03PS2) 10Giuseppe Lavagetto: realm: remove most references to mwprimary where dns discovery should be enough. [puppet] - 10https://gerrit.wikimedia.org/r/340999 [15:38:41] (03CR) 10RobH: [C: 031] "Looks much better!" 
[puppet] - 10https://gerrit.wikimedia.org/r/342662 (https://phabricator.wikimedia.org/T158884) (owner: 10Gehel) [15:39:55] (03PS3) 10Gehel: elasticsearch - no need to use swap [puppet] - 10https://gerrit.wikimedia.org/r/342662 (https://phabricator.wikimedia.org/T158884) [15:43:14] (03CR) 10jerkins-bot: [V: 04-1] realm: remove most references to mwprimary where dns discovery should be enough. [puppet] - 10https://gerrit.wikimedia.org/r/340999 (owner: 10Giuseppe Lavagetto) [15:45:07] !log For the record: deployed schema change on s2 and s6 for image table (add an index) - T160415 [15:45:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:13] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [15:45:39] (03PS1) 10Elukey: Add AQS Beta configuration to deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/342842 [15:47:25] (03CR) 10Elukey: [V: 032 C: 032] Add AQS Beta configuration to deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/342842 (owner: 10Elukey) [15:47:48] !log uploaded new HHVM 3.18 package with backported patch for stat_cache regression (T158176) [15:47:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:47:53] T158176: Build / migrate to HHVM 3.18 - https://phabricator.wikimedia.org/T158176 [15:51:23] (03PS3) 10DatGuy: Turn off patrolling for FlaggedRevs in bswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341350 (https://phabricator.wikimedia.org/T158662) [15:52:06] (03PS4) 10DatGuy: Turn off patrolling for FlaggedRevs in bswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341350 (https://phabricator.wikimedia.org/T158662) [15:53:51] (03CR) 10Gehel: [C: 032] elasticsearch - no need to use swap [puppet] - 10https://gerrit.wikimedia.org/r/342662 (https://phabricator.wikimedia.org/T158884) (owner: 10Gehel) [15:53:58] (03PS4) 10Gehel: elasticsearch - no need to use swap [puppet] - 
10https://gerrit.wikimedia.org/r/342662 (https://phabricator.wikimedia.org/T158884) [15:54:11] (03CR) 10Gehel: [V: 032 C: 032] elasticsearch - no need to use swap [puppet] - 10https://gerrit.wikimedia.org/r/342662 (https://phabricator.wikimedia.org/T158884) (owner: 10Gehel) [15:58:56] !log upgraded jessie systems running HHVM in deployment-prep to 3.18.1+dfsg-1+wmf1 [15:59:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:30] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 2 others: Purge Varnish cache when a banner is saved - https://phabricator.wikimedia.org/T154954#3102614 (10DStrine) I'm cool with this too. thanks! [16:09:53] (03PS1) 10ArielGlenn: scripts to generate a series of checkpoint files for a dump run manually [dumps] - 10https://gerrit.wikimedia.org/r/342846 (https://phabricator.wikimedia.org/T160507) [16:18:34] (03CR) 10Eevans: [C: 031] "> could you add it in labs/private as it should be in prod? then the" [puppet] - 10https://gerrit.wikimedia.org/r/342088 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [16:18:38] RECOVERY - Check Varnish expiry mailbox lag on cp1072 is OK: OK: expiry mailbox lag is 9228 [16:23:38] PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 381 bytes in 0.032 second response time [16:24:43] icinga-wm: all k8s workers nodes are healthy and critical in the same sentence is kinda contridicing [16:25:14] No [16:25:18] That's what it's checking for [16:25:35] maybe it should be reworded then? 
[16:40:13] (03PS1) 10Yuvipanda: tools: Set k8s service IP range to new range [puppet] - 10https://gerrit.wikimedia.org/r/342852 (https://phabricator.wikimedia.org/T152399) [16:43:31] (03PS1) 10Addshore: wmgUseInterwikiSorting false everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342853 (https://phabricator.wikimedia.org/T160465) [16:44:00] 06Operations, 10MediaWiki-extensions-InterwikiSorting, 10Wikidata, 10Wikimedia-Extension-setup, and 3 others: Deploy InterwikiSorting extension to production - https://phabricator.wikimedia.org/T150183#3102899 (10Addshore) 05Resolved>03Open [16:46:50] *looks around* is anyone deploying anything? if not I'm going to push the above patch out [16:47:28] 06Operations, 07HHVM, 13Patch-For-Review, 07Upstream: Build / migrate to HHVM 3.18 - https://phabricator.wikimedia.org/T158176#3102910 (10hashar) I guess we can upgrade hhvm on beta cluster and drop the revert patch from the beta cluster puppet master ( https://gerrit.wikimedia.org/r/#/c/341916/ ). [16:48:28] 06Operations, 07HHVM, 13Patch-For-Review, 07Upstream: Build / migrate to HHVM 3.18 - https://phabricator.wikimedia.org/T158176#3102911 (10Reedy) >>! In T158176#3102910, @hashar wrote: > I guess we can upgrade hhvm on beta cluster and drop the revert patch from the beta cluster puppet master ( https://gerri... 
[16:48:46] (03CR) 10Addshore: [C: 032] wmgUseInterwikiSorting false everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342853 (https://phabricator.wikimedia.org/T160465) (owner: 10Addshore) [16:52:08] (03Merged) 10jenkins-bot: wmgUseInterwikiSorting false everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342853 (https://phabricator.wikimedia.org/T160465) (owner: 10Addshore) [16:52:17] (03CR) 10jenkins-bot: wmgUseInterwikiSorting false everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342853 (https://phabricator.wikimedia.org/T160465) (owner: 10Addshore) [16:52:21] 06Operations, 10fundraising-tech-ops, 10netops: reassign wmf7010/frpm1001 to host "civi1001.frack.eqiad.wmnet" - https://phabricator.wikimedia.org/T159342#3102920 (10faidon) [16:52:33] 06Operations, 10fundraising-tech-ops, 10netops: deploy firewall policies for (barium,lutetium,db1025,indium) replacements (civi1001,frdev1001,frdb1002,frlog1001) - https://phabricator.wikimedia.org/T159336#3102917 (10faidon) 05Open>03Resolved a:03faidon That's done now :) [16:53:38] RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.259 second response time [16:54:12] 06Operations, 10ops-codfw, 10DBA: es2015 crashed on 2017-03-11 - https://phabricator.wikimedia.org/T160242#3102927 (10Papaul) Dell we replace the main board and the CPU' Hi Papaul, Please accept my apologies for the delayed response. I just came into office and hence there was a delay in the response. The... [16:55:18] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:342853|wmgUseInterwikiSorting false everywhere]] T160465 T150183 (duration: 01m 01s) [16:56:59] 06Operations, 10ops-codfw, 10DBA: es2015 crashed on 2017-03-11 - https://phabricator.wikimedia.org/T160242#3102940 (10jcrespo) Thank you very much! We will shutdown the server tomorrow ahead of time- ping us if have more details about the predicted schedule for it. 
[17:03:23] !log demon@tin Synchronized wmf-config/: pruning old extensionmessages files (duration: 00m 49s) [17:03:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:05:08] (03PS1) 10Volans: Bump version [software/cumin] - 10https://gerrit.wikimedia.org/r/342856 [17:06:44] jouncebot: now [17:06:44] No deployments scheduled for the next 0 hour(s) and 53 minute(s) [17:06:46] jouncebot: next [17:06:47] In 0 hour(s) and 53 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T1800) [17:08:09] (03PS1) 10Chad: Move contribution tracking config to CommonSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342857 (https://phabricator.wikimedia.org/T147479) [17:08:24] 06Operations, 10fundraising-tech-ops, 10netops: deploy firewall policies for (barium,lutetium,db1025,indium) replacements (civi1001,frdev1001,frdb1002,frlog1001) - https://phabricator.wikimedia.org/T159336#3102993 (10Jgreen) Great, thank you!!! 
[17:08:46] (03CR) 10Volans: [C: 032] Bump version [software/cumin] - 10https://gerrit.wikimedia.org/r/342856 (owner: 10Volans) [17:08:58] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 29446 [17:10:08] (03Merged) 10jenkins-bot: Bump version [software/cumin] - 10https://gerrit.wikimedia.org/r/342856 (owner: 10Volans) [17:11:55] (03CR) 10Zppix: Move contribution tracking config to CommonSettings.php (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342857 (https://phabricator.wikimedia.org/T147479) (owner: 10Chad) [17:13:13] (03CR) 10Chad: Move contribution tracking config to CommonSettings.php (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342857 (https://phabricator.wikimedia.org/T147479) (owner: 10Chad) [17:15:14] (03Draft2) 10Zppix: Remove unprofessional comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342860 [17:16:22] (03CR) 10Chad: [C: 032] "For what it's worth, some users are idiots." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342860 (owner: 10Zppix) [17:17:53] Chad: Still publicly viewable things like that should contain such comments it looks bad [17:18:02] eshouldnt* [17:18:10] (03PS3) 10Chad: Remove unprofessional comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342860 (owner: 10Zppix) [17:18:25] (03CR) 10Chad: [C: 032] Remove unprofessional comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342860 (owner: 10Zppix) [17:18:42] Looks bad? 
Heh [17:18:55] * RainbowSprinkles chuckles [17:19:00] RainbowSprinkles: i meant on WMF and stuff it makes please look bad [17:19:11] jesus i cant type [17:19:14] Not really, but w/e [17:19:18] I'd rather merge than fight over it [17:19:42] it is being merged :P [17:19:43] (03Merged) 10jenkins-bot: Remove unprofessional comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342860 (owner: 10Zppix) [17:19:51] (03CR) 10jenkins-bot: Remove unprofessional comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342860 (owner: 10Zppix) [17:20:10] (03CR) 10Rush: [C: 032] tools: Set k8s service IP range to new range [puppet] - 10https://gerrit.wikimedia.org/r/342852 (https://phabricator.wikimedia.org/T152399) (owner: 10Yuvipanda) [17:20:11] that's why he merged it [17:20:14] chad@notsexy /a/vag/mediawiki (master)$ git log --grep=fuck | wc -l [17:20:14] 119 [17:20:21] Hehehehehe [17:20:25] Wonder how many of those are mine [17:20:59] I remember trying to change something like that in a Debian package way back in the day. Didn't go well. :) [17:21:11] Wait for it.... 
[17:21:39] !log demon@tin Synchronized wmf-config/CommonSettings.php: Stop calling an idiot user an idiot (duration: 00m 42s) [17:21:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:21:50] * RainbowSprinkles laughs at his own joke, wanders off to have coffee [17:22:18] (03PS1) 10Volans: Upgrade to version 0.0.2 [software/cumin] - 10https://gerrit.wikimedia.org/r/342861 [17:22:21] (03CR) 10Zppix: Move contribution tracking config to CommonSettings.php (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342857 (https://phabricator.wikimedia.org/T147479) (owner: 10Chad) [17:23:03] (03PS2) 10Chad: Move contribution tracking config to CommonSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342857 (https://phabricator.wikimedia.org/T147479) [17:23:36] (03PS1) 10Filippo Giunchedi: prometheus: fix file permissions and servertech template [puppet] - 10https://gerrit.wikimedia.org/r/342862 (https://phabricator.wikimedia.org/T148541) [17:23:53] (03PS2) 10Filippo Giunchedi: prometheus: fix file permissions and servertech template [puppet] - 10https://gerrit.wikimedia.org/r/342862 (https://phabricator.wikimedia.org/T148541) [17:24:10] (03PS4) 10Yuvipanda: tools: Allow readonly access to all namespace objects [puppet] - 10https://gerrit.wikimedia.org/r/341191 [17:24:12] (03PS2) 10Yuvipanda: tools: Set k8s service IP range to new range [puppet] - 10https://gerrit.wikimedia.org/r/342852 (https://phabricator.wikimedia.org/T152399) [17:24:25] (03CR) 10Yuvipanda: [V: 032 C: 032] tools: Allow readonly access to all namespace objects [puppet] - 10https://gerrit.wikimedia.org/r/341191 (owner: 10Yuvipanda) [17:24:36] (03CR) 10Yuvipanda: [V: 032] tools: Set k8s service IP range to new range [puppet] - 10https://gerrit.wikimedia.org/r/342852 (https://phabricator.wikimedia.org/T152399) (owner: 10Yuvipanda) [17:26:46] (03PS1) 10Jgreen: reassign wmf7010/frpm1001 mgmt to civi1001, add A+PTR for civi1001.frack.eqiad.wmnet [dns] 
- 10https://gerrit.wikimedia.org/r/342863 (https://phabricator.wikimedia.org/T159342) [17:27:53] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: fix file permissions and servertech template [puppet] - 10https://gerrit.wikimedia.org/r/342862 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [17:27:58] (03PS3) 10Filippo Giunchedi: prometheus: fix file permissions and servertech template [puppet] - 10https://gerrit.wikimedia.org/r/342862 (https://phabricator.wikimedia.org/T148541) [17:29:25] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] prometheus: fix file permissions and servertech template [puppet] - 10https://gerrit.wikimedia.org/r/342862 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [17:29:45] (03CR) 10Jgreen: [C: 032] reassign wmf7010/frpm1001 mgmt to civi1001, add A+PTR for civi1001.frack.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/342863 (https://phabricator.wikimedia.org/T159342) (owner: 10Jgreen) [17:32:36] 06Operations, 10RESTBase, 10service-runner, 13Patch-For-Review, and 2 others: enable restbase syslog/file logging - https://phabricator.wikimedia.org/T112648#1641285 (10Nuria) no movement on this for a while, is this work still ongoing? [17:35:06] (03PS1) 10Filippo Giunchedi: prometheus: python3 octal syntax [puppet] - 10https://gerrit.wikimedia.org/r/342868 [17:35:27] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] prometheus: python3 octal syntax [puppet] - 10https://gerrit.wikimedia.org/r/342868 (owner: 10Filippo Giunchedi) [17:38:06] lol RainbowSprinkles at your sync message [17:38:17] ;-) [17:40:20] !upgraded librdkafka to 0.9.4 on scb2001 [17:41:52] !log ppchelko@tin Started deploy [trending-edits/deploy@85be190]: Trending: Update to node-rdkafka 0.8.0. Canary on scb2001. 
T159200 [17:41:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:41:58] T159200: Update to node-rdkafka 0.8.0 - https://phabricator.wikimedia.org/T159200 [17:42:49] !log mobrovac@tin Started deploy [changeprop/deploy@614cb4b]: Canary deploy for switching to librdkafka 0.9.4 T159200 [17:42:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:43:14] !log ppchelko@tin Finished deploy [trending-edits/deploy@85be190]: Trending: Update to node-rdkafka 0.8.0. Canary on scb2001. T159200 (duration: 01m 21s) [17:43:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:43:23] 06Operations, 10fundraising-tech-ops, 10netops, 13Patch-For-Review: reassign wmf7010/frpm1001 to host "civi1001.frack.eqiad.wmnet" - https://phabricator.wikimedia.org/T159342#3103222 (10Jgreen) p:05Normal>03High [17:43:42] !log mobrovac@tin Finished deploy [changeprop/deploy@614cb4b]: Canary deploy for switching to librdkafka 0.9.4 T159200 (duration: 00m 53s) [17:43:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:44:41] (03PS1) 10Gehel: maps - increase postgresql track_activity_query_size to 16384 [puppet] - 10https://gerrit.wikimedia.org/r/342870 (https://phabricator.wikimedia.org/T160209) [17:45:19] jouncebot: next [17:45:19] In 0 hour(s) and 14 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T1800) [17:46:11] (03CR) 10Pnorman: [C: 031] maps - increase postgresql track_activity_query_size to 16384 [puppet] - 10https://gerrit.wikimedia.org/r/342870 (https://phabricator.wikimedia.org/T160209) (owner: 10Gehel) [17:46:40] !log otto@tin Started deploy [eventstreams/deploy@eb8698e]: T159200 [17:46:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:46:58] !log otto@tin Finished deploy [eventstreams/deploy@eb8698e]: T159200 (duration: 00m 17s) [17:47:03] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:47:04] T159200: Update to node-rdkafka 0.8.0 - https://phabricator.wikimedia.org/T159200 [17:48:37] (03PS1) 10Legoktm: Deploy Linter to group0 and small wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342874 (https://phabricator.wikimedia.org/T148609) [17:50:08] !log upgrading librdkafka on scb in codfw T159200 [17:50:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:50:38] 06Operations, 10ops-codfw, 10DBA: es2015 crashed on 2017-03-11 - https://phabricator.wikimedia.org/T160242#3103272 (10Marostegui) >>! In T160242#3102940, @jcrespo wrote: > Thank you very much! We will shutdown the server tomorrow ahead of time- ping us if have more details about the predicted schedule for it... [17:50:55] !log otto@tin Started deploy [eventstreams/deploy@eb8698e]: T159200 [17:51:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:51:09] !log ppchelko@tin Started deploy [trending-edits/deploy@85be190]: Update to node-rdkafka 0.8.0 in codfw. 
T159200 [17:51:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:51:31] !log mobrovac@tin Started deploy [changeprop/deploy@614cb4b]: Deploy to CODFW for switching to librdkafka 0.9.4 T159200 [17:51:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:52:30] !log otto@tin Finished deploy [eventstreams/deploy@eb8698e]: T159200 (duration: 01m 35s) [17:52:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:52:36] T159200: Update to node-rdkafka 0.8.0 - https://phabricator.wikimedia.org/T159200 [17:53:15] !log mobrovac@tin Finished deploy [changeprop/deploy@614cb4b]: Deploy to CODFW for switching to librdkafka 0.9.4 T159200 (duration: 01m 44s) [17:53:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:55:00] !log ppchelko@tin Finished deploy [trending-edits/deploy@85be190]: Update to node-rdkafka 0.8.0 in codfw. T159200 (duration: 03m 51s) [17:55:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:56:35] 06Operations, 10Analytics, 10Analytics-Cluster, 10hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3103312 (10RobH) So there are not going to be a lot of server chassis that can accomodate an off the shelf GPU card. Additionally, it would then not have any kind of... [17:57:24] (03PS2) 10Filippo Giunchedi: Remove integration/* repos from trebuchet [puppet] - 10https://gerrit.wikimedia.org/r/342785 (owner: 10Chad) [17:58:56] 06Operations, 10Analytics, 10Analytics-Cluster, 10hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3103329 (10RobH) Once we get the GPU options hammered down, all the rest of the specs are easy in comparison. [18:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. 
Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T1800). [18:00:04] James_F, tgr, and MatmaRex: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [18:00:19] jouncebot: refresh [18:00:21] hi [18:00:23] * legoktm put up a patch too... [18:00:24] I refreshed my knowledge about deployments. [18:00:38] Heya. [18:00:44] jouncebot: now [18:00:46] For the next 0 hour(s) and 59 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T1800) [18:00:54] Well that's unhelpful. [18:01:43] not helpful enough [18:03:31] uh no deployers? I guess I could do it then [18:03:43] elukey: aha, so i should have used "pooled=inactive" instead of "pooled=no" (and ran puppet on tin) when i see an appserver that is down for hardware fail ? [18:04:09] (03PS3) 10Legoktm: Show 'Publish' not 'Save' on most public wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/337530 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [18:04:17] (03CR) 10Legoktm: [C: 032] Show 'Publish' not 'Save' on most public wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/337530 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [18:04:17] and that's what "remove from dsh groups" is nowadays [18:04:22] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] Remove integration/* repos from trebuchet [puppet] - 10https://gerrit.wikimedia.org/r/342785 (owner: 10Chad) [18:04:25] mutante: yep exactly! [18:04:25] Thanks, legoktm. [18:04:30] elukey: cool, thank you! [18:05:14] tgr: around? 
[18:05:24] legoktm: o/ [18:05:42] ok, I'll do your stuff after James_F's [18:06:08] Mines pretty much untestable on mwdebug [18:06:10] (03CR) 10Jforrester: "\o/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342874 (https://phabricator.wikimedia.org/T148609) (owner: 10Legoktm) [18:06:15] !log otto@tin Started deploy [eventstreams/deploy@eb8698e]: T159200 [18:06:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:06:21] T159200: Update to node-rdkafka 0.8.0 - https://phabricator.wikimedia.org/T159200 [18:06:31] !log ppchelko@tin Started deploy [trending-edits/deploy@85be190]: Update to node-rdkafka 0.8.0. Canary on scb1001.eqiad.wmnet. T159200 [18:06:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:07:14] 06Operations, 10Analytics, 10Analytics-Cluster, 10hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3103386 (10RobH) Please also note a concern was raised about the driver support of these GPU options: >>! In T148843#3075519, @Ladsgroup wrote: > Regarding GPU optio... [18:07:20] Reedy: did you add a patch? [18:07:38] !log ppchelko@tin Finished deploy [trending-edits/deploy@85be190]: Update to node-rdkafka 0.8.0. Canary on scb1001.eqiad.wmnet. 
T159200 (duration: 01m 07s) [18:07:39] !log mobrovac@tin Started deploy [changeprop/deploy@614cb4b]: Deploy to EQIAD canary for switching to librdkafka 0.9.4 T159200 [18:07:39] (03PS3) 10Legoktm: Use custom LogstashFormatter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323492 (https://phabricator.wikimedia.org/T145133) (owner: 10Gergő Tisza) [18:07:41] (03PS2) 10Legoktm: Deploy PageViewInfo to group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342728 (https://phabricator.wikimedia.org/T125917) (owner: 10Gergő Tisza) [18:07:43] (03PS2) 10Legoktm: Deploy Linter to group0 and small wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342874 (https://phabricator.wikimedia.org/T148609) [18:07:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:07:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:07:50] https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=prev&oldid=1702167 [18:07:59] !log mobrovac@tin Finished deploy [changeprop/deploy@614cb4b]: Deploy to EQIAD canary for switching to librdkafka 0.9.4 T159200 (duration: 00m 20s) [18:08:00] (03CR) 10Legoktm: [C: 032] Use custom LogstashFormatter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323492 (https://phabricator.wikimedia.org/T145133) (owner: 10Gergő Tisza) [18:08:00] In apparently the wrong box [18:08:03] 06Operations, 10Analytics, 10Analytics-Cluster, 10hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3080331 (10MoritzMuehlenhoff) Note that the Nvidia OpenCL drivers are closed-source (as the other parts of the Nvidia drivers). Note sure about AMD, but they've becom... 
[18:08:04] (03CR) 10Legoktm: [C: 032] Deploy PageViewInfo to group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342728 (https://phabricator.wikimedia.org/T125917) (owner: 10Gergő Tisza) [18:08:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:08:06] (03CR) 10Legoktm: [C: 032] Deploy Linter to group0 and small wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342874 (https://phabricator.wikimedia.org/T148609) (owner: 10Legoktm) [18:08:08] (03Merged) 10jenkins-bot: Show 'Publish' not 'Save' on most public wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/337530 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [18:08:17] (03CR) 10jenkins-bot: Show 'Publish' not 'Save' on most public wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/337530 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [18:08:19] Yup, one too hig [18:10:02] James_F: it's on mwdebug1002 [18:10:46] legoktm: LGTM. [18:11:12] (03Merged) 10jenkins-bot: Use custom LogstashFormatter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323492 (https://phabricator.wikimedia.org/T145133) (owner: 10Gergő Tisza) [18:11:22] (03CR) 10jenkins-bot: Use custom LogstashFormatter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323492 (https://phabricator.wikimedia.org/T145133) (owner: 10Gergő Tisza) [18:12:11] (03Merged) 10jenkins-bot: Deploy PageViewInfo to group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342728 (https://phabricator.wikimedia.org/T125917) (owner: 10Gergő Tisza) [18:12:17] !log legoktm@tin Synchronized wmf-config/InitialiseSettings.php: Show 'Publish' not 'Save' on most public wikis -T131132 (duration: 00m 42s) [18:12:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:24] T131132: Re-label the "Save" button to be "Publish", to better indicate to users the outcomes of their action - https://phabricator.wikimedia.org/T131132 [18:12:52] !log 
upgrading librdkafka on scb eqiad nodes T159200 [18:12:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:58] T159200: Update to node-rdkafka 0.8.0 - https://phabricator.wikimedia.org/T159200 [18:13:20] tgr: both of your patches are on mwdebug1002 [18:13:43] !log otto@tin Started deploy [eventstreams/deploy@eb8698e]: T159200 [18:13:46] !log ppchelko@tin Started deploy [trending-edits/deploy@85be190]: Update to node-rdkafka 0.8.0. T159200 [18:13:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:49] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, PCC https://puppet-compiler.wmflabs.org/5781/ see also typo in commit message" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342088 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [18:13:52] (03CR) 10jenkins-bot: Deploy PageViewInfo to group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342728 (https://phabricator.wikimedia.org/T125917) (owner: 10Gergő Tisza) [18:13:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:14:10] (03Merged) 10jenkins-bot: Deploy Linter to group0 and small wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342874 (https://phabricator.wikimedia.org/T148609) (owner: 10Legoktm) [18:14:19] (03CR) 10jenkins-bot: Deploy Linter to group0 and small wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342874 (https://phabricator.wikimedia.org/T148609) (owner: 10Legoktm) [18:14:35] !log restbase deploying f047dabb [18:14:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:07] !log mobrovac@tin Started deploy [changeprop/deploy@614cb4b]: Deploy for switching to librdkafka 0.9.4 T159200 [18:15:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:40] !log mobrovac@tin Finished deploy [changeprop/deploy@614cb4b]: Deploy for switching to librdkafka 0.9.4 T159200 (duration: 00m 33s) [18:15:47] Logged 
the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:48] PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:16:50] tgr: ? [18:17:01] (03CR) 10Volans: [C: 032] Upgrade to version 0.0.2 [software/cumin] - 10https://gerrit.wikimedia.org/r/342861 (owner: 10Volans) [18:17:43] legoktm: thanks! PageViewInfo works, the other is not testable but does not seem to break anything [18:17:50] ok [18:18:38] PROBLEM - puppet last run on elastic1045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:19:07] (03PS12) 10Eevans: Cassandra TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088 (https://phabricator.wikimedia.org/T111113) [18:19:17] !log legoktm@tin Synchronized wmf-config/logging.php: Use custom LogstashFormatter - T145133, T151290 (duration: 00m 42s) [18:19:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:19:23] T145133: A 'message' key in the extra log data overwrites the actual message - https://phabricator.wikimedia.org/T145133 [18:19:23] T151290: When logging an exception in Logstash as a PSR-3 context parameter, the trace does not include class/method names - https://phabricator.wikimedia.org/T151290 [18:19:58] !log ppchelko@tin Finished deploy [trending-edits/deploy@85be190]: Update to node-rdkafka 0.8.0. 
T159200 (duration: 06m 11s) [18:20:02] !log otto@tin Finished deploy [eventstreams/deploy@eb8698e]: T159200 (duration: 06m 18s) [18:20:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:06] T159200: Update to node-rdkafka 0.8.0 - https://phabricator.wikimedia.org/T159200 [18:20:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:36] !log legoktm@tin Synchronized wmf-config/InitialiseSettings.php: Deploy PageViewInfo to group0 - T125917 (duration: 00m 42s) [18:21:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:42] T125917: Deploy the PageViewInfo extension to production - https://phabricator.wikimedia.org/T125917 [18:22:02] tgr: everything should be live now [18:22:09] thx [18:22:51] (03CR) 10Filippo Giunchedi: [C: 04-1] "See inline, LGTM overall" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/342431 (https://phabricator.wikimedia.org/T156245) (owner: 10Gilles) [18:24:17] (03Merged) 10jenkins-bot: Upgrade to version 0.0.2 [software/cumin] - 10https://gerrit.wikimedia.org/r/342861 (owner: 10Volans) [18:25:16] !log legoktm@tin Synchronized wmf-config/InitialiseSettings.php: Deploy Linter to group0 and small wikis - T148609 (duration: 00m 42s) [18:25:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:25:21] T148609: Review and deploy Linter extension to Wikimedia wikis - https://phabricator.wikimedia.org/T148609 [18:27:01] legoktm: Yay. [18:29:23] https://www.mediawiki.org/w/index.php?title=Wikimedia_Engineering/2016-17_Q3_Goals&diff=2421377&oldid=2421334 :-) [18:30:33] godog: Ty [18:31:06] MatmaRex: your patch is on mwdebug1002 [18:31:49] legoktm: thanks, seems to work [18:32:09] MatmaRex: which file should I sync first? 
or does it not matter [18:32:37] !log legoktm@tin Synchronized php-1.29.0-wmf.16/includes/libs/filebackend/SwiftFileBackend.php: Make sure Swift store operations close the source file handle - T159607 (duration: 00m 44s) [18:32:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:32:43] T159607: Investigate why --delete seemingly deleted all but 1 captcha file when run in cronjob - https://phabricator.wikimedia.org/T159607 [18:32:44] RainbowSprinkles: yw [18:32:46] legoktm: doesn't matter [18:32:49] Reedy: ^ [18:34:03] oops, I didn't realize bash would execute ` even in quotes [18:34:13] !log legoktm@tin Synchronized php-1.29.0-wmf.16/includes/widget/SearchInputWidget.php: mw.widgets.SearchInputWidget: Do not pass to TextInputWidget - T148471 (1/2) (duration: 00m 41s) [18:34:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:34:21] (03CR) 10Dzahn: add parsoid-vd-tests.wikimedia.org (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/341920 (https://phabricator.wikimedia.org/T159995) (owner: 10Dzahn) [18:34:22] T148471: Break out search stuff in TextInputWidget into SearchInputWidget - https://phabricator.wikimedia.org/T148471 [18:35:04] (03PS7) 10Mbch331: Remove exception on Other Projects sidebar for Dutch Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341195 (https://phabricator.wikimedia.org/T159634) [18:35:32] !log legoktm@tin Synchronized php-1.29.0-wmf.16/resources/src/mediawiki.widgets/mw.widgets.SearchInputWidget.js: mw.widgets.SearchInputWidget: Do not pass to TextInputWidget - T148471 (2/2) (duration: 00m 42s) [18:35:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:35:52] MatmaRex: ^ [18:35:55] all done I think [18:36:20] thanks [18:37:00] (03PS3) 10Dzahn: add parsoid-vd-tests, parsoid-rt-tests [dns] - 10https://gerrit.wikimedia.org/r/341920 (https://phabricator.wikimedia.org/T159995) [18:37:40] (03PS4) 10Dzahn: add parsoid-vd-tests, 
parsoid-rt-tests [dns] - 10https://gerrit.wikimedia.org/r/341920 (https://phabricator.wikimedia.org/T159995) [18:38:31] (03CR) 10Dzahn: "pre-requisite for varnish changes for "Separate subdomain for parsoid visual diff test service on ruthenium", former "parsoid-tests" will " [dns] - 10https://gerrit.wikimedia.org/r/341920 (https://phabricator.wikimedia.org/T159995) (owner: 10Dzahn) [18:41:02] 06Operations, 10DNS, 10Parsoid, 10Traffic, 13Patch-For-Review: Separate subdomain for parsoid visual diff test service on ruthenium - https://phabricator.wikimedia.org/T159995#3103496 (10Dzahn) p:05Triage>03Normal [18:41:09] 06Operations, 10DNS, 10Parsoid, 10Traffic, 13Patch-For-Review: Separate subdomain for parsoid visual diff test service on ruthenium - https://phabricator.wikimedia.org/T159995#3085873 (10Dzahn) a:03Dzahn [18:41:39] (03CR) 10Dzahn: [C: 032] add parsoid-vd-tests, parsoid-rt-tests [dns] - 10https://gerrit.wikimedia.org/r/341920 (https://phabricator.wikimedia.org/T159995) (owner: 10Dzahn) [18:43:32] (03CR) 10Subramanya Sastry: "Let us keep the parsoid-tests domain for a little bit while we identify out all references to it and update docs and wikis pages." [dns] - 10https://gerrit.wikimedia.org/r/341920 (https://phabricator.wikimedia.org/T159995) (owner: 10Dzahn) [18:44:45] (03CR) 10Dzahn: "yes, that is why i changed it from "rename parsoid-tests to parsoid-rt-tests" to just "add the 2 new names and remove the old one later". " [dns] - 10https://gerrit.wikimedia.org/r/341920 (https://phabricator.wikimedia.org/T159995) (owner: 10Dzahn) [18:44:48] RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [18:45:18] PROBLEM - puppet last run on mw1263 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [18:45:36] (03CR) 10Filippo Giunchedi: "+1 on the idea, see inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/342604 (https://phabricator.wikimedia.org/T119140) (owner: 10Hashar) [18:46:14] (03CR) 10Filippo Giunchedi: [C: 04-1] "fails PCC https://puppet-compiler.wmflabs.org/5782/" [puppet] - 10https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) (owner: 10Gilles) [18:47:26] (03PS3) 10Dzahn: delete parsoid-test (was: rename parsoid-tests to parsoid-rt-tests) [dns] - 10https://gerrit.wikimedia.org/r/341923 (https://phabricator.wikimedia.org/T159995) [18:47:32] (03CR) 10jerkins-bot: [V: 04-1] delete parsoid-test (was: rename parsoid-tests to parsoid-rt-tests) [dns] - 10https://gerrit.wikimedia.org/r/341923 (https://phabricator.wikimedia.org/T159995) (owner: 10Dzahn) [18:47:38] RECOVERY - puppet last run on elastic1045 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [18:48:28] PROBLEM - puppet last run on labvirt1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:48:33] (03CR) 10Dzahn: "i was going to turn this from "rename parsoid-tests to rt-tests" into "remove parsoid-tests" now.. 
but i'll abandon and it just serves as " [dns] - 10https://gerrit.wikimedia.org/r/341923 (https://phabricator.wikimedia.org/T159995) (owner: 10Dzahn) [18:48:41] (03Abandoned) 10Dzahn: delete parsoid-test (was: rename parsoid-tests to parsoid-rt-tests) [dns] - 10https://gerrit.wikimedia.org/r/341923 (https://phabricator.wikimedia.org/T159995) (owner: 10Dzahn) [18:49:10] (03CR) 10Dzahn: "superseded by https://gerrit.wikimedia.org/r/#/c/341920/" [dns] - 10https://gerrit.wikimedia.org/r/341923 (https://phabricator.wikimedia.org/T159995) (owner: 10Dzahn) [18:49:33] (03PS2) 10Dzahn: varnish/misc: add parsoid-vd-tests -> ruthenium [puppet] - 10https://gerrit.wikimedia.org/r/341925 (https://phabricator.wikimedia.org/T159995) [18:49:46] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/342231 (owner: 10Gehel) [18:52:47] (03CR) 10Dzahn: [C: 032] varnish/misc: add parsoid-vd-tests -> ruthenium [puppet] - 10https://gerrit.wikimedia.org/r/341925 (https://phabricator.wikimedia.org/T159995) (owner: 10Dzahn) [18:56:38] PROBLEM - puppet last run on mc1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:57:11] 06Operations, 10ops-codfw: troubleshoot drac on ms-be2010.codfw.wmnet - https://phabricator.wikimedia.org/T155690#3103559 (10RobH) >>! In T155690#3060353, @fgiunchedi wrote: > We can decom the old ms-be machines as soon as the new ms-be hardware is fully in service. Specifically for ms-be2010 I wouldn't spend... 
[18:57:17] (03CR) 10Filippo Giunchedi: Scap3: Prep MediaWiki to be available from /srv/deployment (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342788 (owner: 10Chad) [18:59:18] (03PS3) 10Dzahn: varnish/misc: add parsoid-rt-tests (was: rename parsoid-tests) [puppet] - 10https://gerrit.wikimedia.org/r/341926 (https://phabricator.wikimedia.org/T159995) [19:00:04] twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T1900). Please do the needful. [19:00:59] (03PS4) 10Dzahn: varnish/misc: add parsoid-rt-tests (was: rename parsoid-tests) [puppet] - 10https://gerrit.wikimedia.org/r/341926 (https://phabricator.wikimedia.org/T159995) [19:01:27] (03PS5) 10Dzahn: varnish/misc: add parsoid-rt-tests (was: rename parsoid-tests) [puppet] - 10https://gerrit.wikimedia.org/r/341926 (https://phabricator.wikimedia.org/T159995) [19:06:55] (03CR) 10Dzahn: [C: 04-1] "Duplicate declaration: Class[Exim4] is already declared .. 
sigh - http://puppet-compiler.wmflabs.org/5774/ununpentium.wikimedia.org/chang" [puppet] - 10https://gerrit.wikimedia.org/r/342771 (owner: 10Dzahn) [19:07:01] legoktm: Cheers [19:07:42] (03CR) 10Dzahn: [C: 04-1] "Must pass ipv4 to Class[Profile::Gerrit::Server] http://puppet-compiler.wmflabs.org/5773/" [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [19:07:58] (03CR) 10Dzahn: [C: 032] varnish/misc: add parsoid-rt-tests (was: rename parsoid-tests) [puppet] - 10https://gerrit.wikimedia.org/r/341926 (https://phabricator.wikimedia.org/T159995) (owner: 10Dzahn) [19:13:28] RECOVERY - puppet last run on mw1263 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [19:15:30] !log mobrovac@tin Started deploy [changeprop/deploy@b68bf51]: Deploy producer fix for T159200 [19:15:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:35] T159200: Update to node-rdkafka 0.8.0 - https://phabricator.wikimedia.org/T159200 [19:16:21] !log mobrovac@tin Finished deploy [changeprop/deploy@b68bf51]: Deploy producer fix for T159200 (duration: 00m 51s) [19:16:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:17:28] RECOVERY - puppet last run on labvirt1009 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [19:19:22] 06Operations, 10DNS, 10Parsoid, 10Traffic, 13Patch-For-Review: Separate subdomain for parsoid visual diff test service on ruthenium - https://phabricator.wikimedia.org/T159995#3103645 (10Dzahn) @ssastry on the DNS and Varnish side this should be done now. It should now just be up to Nginx config on ruth... 
[19:20:36] Error: 1054 Unknown column 'qc_ra.rd_namespace' in 'order clause' [19:20:55] I see that error in wmf.16 [19:21:14] qc_ra, that is new [19:21:57] (03PS2) 10Chad: Scap3: Prep MediaWiki to be available from /srv/deployment [puppet] - 10https://gerrit.wikimedia.org/r/342788 [19:22:03] the whole query was SELECT qc_type,qc_namespace AS `namespace`,qc_title AS `title`,qc_value AS `value` FROM `querycache` WHERE qc_type = 'DoubleRedirects' ORDER BY qc_ra.rd_namespace,qc_ra.rd_title LIMIT 5001 [19:22:05] (03CR) 10Chad: Scap3: Prep MediaWiki to be available from /srv/deployment (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342788 (owner: 10Chad) [19:22:27] 06Operations, 10DNS, 10Parsoid, 10Traffic, 13Patch-For-Review: Separate subdomain for parsoid visual diff test service on ruthenium - https://phabricator.wikimedia.org/T159995#3103650 (10ssastry) >>! In T159995#3103645, @Dzahn wrote: > @ssastry on the DNS and Varnish side this should be done now. It sho... [19:22:42] you cannot order from a table you do not select :-) [19:22:44] 06Operations, 10DNS, 10Parsoid, 10Traffic, 13Patch-For-Review: Separate subdomain for parsoid visual diff test service on ruthenium - https://phabricator.wikimedia.org/T159995#3103653 (10Dzahn) a:05Dzahn>03ssastry All good for now? Could you confirm the new names work for you and take the next step o... [19:23:02] Hmm, someone changed DoubleRedirects page [19:23:03] ? [19:23:04] qc_ra isn't a MW table [19:23:13] 06Operations, 10DNS, 10Parsoid, 10Traffic, 13Patch-For-Review: Separate subdomain for parsoid visual diff test service on ruthenium - https://phabricator.wikimedia.org/T159995#3103657 (10Dzahn) Ah, i sent my comment before i saw your new one. Ok, cool. 
[19:23:17] so presumably it's some alias [19:23:18] even if it was, it would not work [19:23:24] (03CR) 10BBlack: [C: 031] cache: different parity for start/end ip_local_port_range values [puppet] - 10https://gerrit.wikimedia.org/r/342832 (owner: 10Ema) [19:23:28] Or not [19:23:46] I cannot find a qc_ra in any WMF hosted extension/core [19:23:51] Me either [19:23:52] Hmmm [19:24:17] Dynamic-generated table alias in some query? [19:24:21] That'd be ugly for grepping [19:25:10] hmmm [19:25:35] I guess I know where that's from. QueryPage::fetchFromCache was changed [19:25:38] RECOVERY - puppet last run on mc1021 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [19:25:41] twentyafterfour: Is there a full stack for the DB query? [19:25:58] See https://gerrit.wikimedia.org/r/#/c/338965/ [19:26:04] /srv/mediawiki/php-1.29.0-wmf.16/includes/specialpage/QueryPage.php [19:26:12] line 485 [19:26:14] Within fetchfromcache everything is now prefixed with qc_ [19:26:22] foreach ( $orderFields as $field ) { [19:26:22] $order[] = "qc_${field}${DESC}"; [19:26:22] } [19:26:39] Let's revert that for now [19:26:47] 06Operations, 10Wikimedia-Mailing-lists: Create a public mailing list for Maithili Wikimedians - https://phabricator.wikimedia.org/T160499#3101034 (10Dzahn) [19:26:54] 06Operations, 06Performance-Team, 10Wikimedia-General-or-Unknown: Run EventLogging test to determine best DC for each country - https://phabricator.wikimedia.org/T55497#3103663 (10Gilles) 05Open>03declined This is probably not needed anymore [19:27:07] Seems simplest for now [19:27:42] Was my patch, I'm absolutely okay with reverting it :) [19:28:07] * twentyafterfour reverts [19:28:15] Should the $ really be outside the {}? [19:28:31] I know there's some dereferencing changes for nested stuff in PHP7 [19:29:02] I prefer just concating...always clear then :) [19:29:06] I don't know, I wrote the variables without {} into the string ... I think twentyafterfour changed that part. 
[19:29:32] {$foo} is usually used [19:29:37] more so when calling a function on foo [19:29:59] https://gerrit.wikimedia.org/r/#/c/342886/ [19:30:31] (03PS1) 10BBlack: [WIP/POC] DNS zones to puppet repo [puppet] - 10https://gerrit.wikimedia.org/r/342887 [19:30:50] Cherry pick to .16 and CR+2 in https://gerrit.wikimedia.org/r/#/c/342888 [19:30:57] Most important is getting it fixed in the branch [19:31:03] Master can follow, or be fixed [19:32:36] Reedy: thanks [19:32:54] Reedy: Let's revert in master too, re-doing this probably needs more eyes on it [19:33:05] (and I'd rather get it right than trip ourselves up again next week on wmf.17) [19:33:15] RainbowSprinkles: WFM. Just getting it done in .16 first because jenkins [19:33:19] Yeah ofc [19:33:26] 06Operations, 06Discovery, 10Wikimedia-Portals, 03Discovery-Portal-Sprint, and 2 others: https://www.wikipedia.org/ portal doesn't have any text - https://phabricator.wikimedia.org/T158782#3103669 (10debt) 05Open>03Resolved [19:33:29] * Reedy tries DoubleRedirects locally [19:33:31] and thanks eddiegp for remembering that change. /me should have remembered, given I reviewed it [19:33:36] :D [19:33:42] twentyafterfour: "it was a week ago" [19:33:49] You've hopefully slept. Or maybe not [19:33:51] (03PS1) 10Volans: Upgrade to version 0.0.2 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/342892 [19:33:56] * twentyafterfour has slept indeed [19:34:38] "That's so long ago I don't remember" [19:34:38] good example of why faster dev process is better ...sooner you see the error the more likely it's remembered and easily tracked down [19:34:45] Well I didn't expect that one too. Prefixing works fine as long as only field names are given. Error is with prefixing "table.field" things. 
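The failure described here can be reproduced in a few lines. This is a minimal Python re-creation of the PHP loop quoted from QueryPage.php, not the MediaWiki code itself; the function name `build_order_clause` and the handling of the DESC suffix are assumptions made for illustration.

```python
def build_order_clause(order_fields, descending=False):
    """Mimic the quoted loop: prefix every order field with 'qc_'.

    Hypothetical helper; the real code lives in QueryPage::fetchFromCache.
    """
    suffix = " DESC" if descending else ""
    return ", ".join(f"qc_{field}{suffix}" for field in order_fields)


# Plain field names map onto columns that exist in the querycache table.
plain = build_order_clause(["namespace", "title"])

# A table-qualified field gets the prefix glued onto the table alias,
# which yields the unknown column seen in the 1054 error.
qualified = build_order_clause(["ra.rd_namespace", "ra.rd_title"])

print(plain)
print(qualified)
```

The second result matches the ORDER BY fragment in the failing query pasted at 19:22:03, which is consistent with the diagnosis that prefixing only breaks for "table.field" inputs.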
[19:34:48] 06Operations, 10Wikimedia-Mailing-lists: Create a public mailing list for Maithili Wikimedians - https://phabricator.wikimedia.org/T160499#3103676 (10Dzahn) You have successfully created the mailing list **wikimedia-mai **and notification has been sent to the list owner **maithiliwikimedians@gmail.com**. You c... [19:35:12] Well, if we expected bugs we'd never create them :) [19:35:13] (03PS2) 10BBlack: [WIP/POC] DNS zones to puppet repo [puppet] - 10https://gerrit.wikimedia.org/r/342887 [19:35:19] hahah [19:35:28] but I thought everyone liked writing bugs [19:35:43] twentyafterfour: Well, I wrote a change for scap today that breaks everything ;-) [19:35:59] The best changes. [19:36:39] 06Operations, 10Wikimedia-Mailing-lists: Create a public mailing list for Maithili Wikimedians - https://phabricator.wikimedia.org/T160499#3103679 (10Dzahn) 05Open>03Resolved The other people mentioned on the ticket can subscribe on https://lists.wikimedia.org/mailman/listinfo/wikimedia-mai . This is also... [19:37:29] meh, eddiegp it gives the right output [19:37:36] 06Operations, 10Scap (Scap3-MediaWiki-MVP): Depool proxies temporarily while scap is ongoing to avoid taxing those nodes - https://phabricator.wikimedia.org/T125629#3103684 (10demon) [19:37:48] ${ vs {$ [19:38:29] twentyafterfour: "Test plan: Totally untested. Oh, and it breaks everything for now because we don't have conftool installed in vagrant or on deploy masters. So probably shouldn't merge yet" [19:38:32] hehehehehehe [19:38:50] (03CR) 10Dzahn: "yea, not used. 
the reason i had not deleted it earlier was that i kind of used it as a template whenever i needed a migration rsync class " [puppet] - 10https://gerrit.wikimedia.org/r/342828 (owner: 10Alexandros Kosiaris) [19:39:06] (03PS2) 10Dzahn: Remove role::list:migration [puppet] - 10https://gerrit.wikimedia.org/r/342828 (owner: 10Alexandros Kosiaris) [19:40:20] (03CR) 10Dzahn: "the only reason to keep it would be that one day there will probably be a mailman migration again.." [puppet] - 10https://gerrit.wikimedia.org/r/342828 (owner: 10Alexandros Kosiaris) [19:40:34] (03CR) 10Dzahn: [C: 032] Remove role::list:migration [puppet] - 10https://gerrit.wikimedia.org/r/342828 (owner: 10Alexandros Kosiaris) [19:40:38] I remember I didn't want to change the parent class QueryPage because it is inherited widely in the first place because I was afraid it might break something lol [19:40:56] Those are the most fun to change :) [19:42:02] (03PS3) 10Dzahn: Remove role::lists:migration [puppet] - 10https://gerrit.wikimedia.org/r/342828 (owner: 10Alexandros Kosiaris) [19:42:03] Agreed, though I tried to start with code parts that won't break too much as a newbie. Worked great until here :D [19:42:23] It happens to everyone [19:43:03] (03PS2) 10ArielGlenn: scripts to generate a series of checkpoint files for a dump run manually [dumps] - 10https://gerrit.wikimedia.org/r/342846 (https://phabricator.wikimedia.org/T160507) [19:43:17] (03CR) 10Bmansurov: [C: 031] Restrict page images to lead section on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342696 (https://phabricator.wikimedia.org/T152115) (owner: 10Jdlrobson) [19:43:44] it's a rite of passage. at least your mistake is easily reverted. 
one of the problems i caused when i was starting out took a month to disappear from all the caches ;) [19:44:01] 06Operations, 10ops-eqiad: db1057 does not react to powercycle/powerdown/powerup commands - https://phabricator.wikimedia.org/T160435#3103717 (10jcrespo) I am going to pool db1067 as a replacement- we cannot be without this server for long. [19:44:19] I know those things happen, I don't feel bad about it ;) [19:46:39] !log shutting down db1067 for maintenance (as a db1057 replacement) T160435 [19:46:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:46:45] T160435: db1057 does not react to powercycle/powerdown/powerup commands - https://phabricator.wikimedia.org/T160435 [19:47:38] (03CR) 10jerkins-bot: [V: 04-1] [WIP/POC] DNS zones to puppet repo [puppet] - 10https://gerrit.wikimedia.org/r/342887 (owner: 10BBlack) [19:48:55] jenkins is slow (says captain obvious) [19:49:20] (03PS1) 10Dzahn: rm manifests/microsites/peopleweb/migration.pp [puppet] - 10https://gerrit.wikimedia.org/r/342897 [19:49:27] ....and our tests are slow [19:49:49] there was a spike on deadlocks on ipblock for an autoblocking on enwiki [19:50:17] twentyafterfour I think the slowness will pass soon [19:50:33] if there is an enwiki admin, check there were no unintended users blocked [19:51:15] jynus: should we notify someone in another channel? [19:51:49] only 100 deadlocks, it could be a vandal with many ips [19:52:02] or a too wide block range [19:53:02] twentyafterfour, I do not know, maybe wikimedia tech or some enwiki channel? [19:54:18] tells #en.wikipedia [19:54:19] (03PS1) 10Eevans: Enable Cassandra client encryption in RESTBase production [puppet] - 10https://gerrit.wikimedia.org/r/342898 (https://phabricator.wikimedia.org/T111113) [19:54:28] How do we check if a block was unintended? 
[19:54:44] well, I do not know- but there is a lot of ip blocking ongoing [19:55:06] automatically [19:55:11] (03PS2) 10Eevans: [WIP]: Enable Cassandra client encryption in RESTBase production [puppet] - 10https://gerrit.wikimedia.org/r/342898 (https://phabricator.wikimedia.org/T111113) [19:55:39] https://en.wikipedia.org/wiki/Special:BlockList [19:55:54] Checkuser sockpuppeters [19:56:45] !log twentyafterfour@tin Synchronized php-1.29.0-wmf.16/includes/specialpage/: deploy revert of 5b15728478f9b167389268fb988a7b9f9f78fcf5 (duration: 00m 44s) [19:56:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:56:59] (03PS3) 10Eevans: Enable Cassandra client encryption in RESTBase production [puppet] - 10https://gerrit.wikimedia.org/r/342898 (https://phabricator.wikimedia.org/T111113) [19:57:41] 06Operations, 10RESTBase, 10service-runner, 13Patch-For-Review, and 2 others: enable restbase syslog/file logging - https://phabricator.wikimedia.org/T112648#3103738 (10GWicke) See also: https://github.com/wikimedia/hyperswitch/pull/78 [20:00:04] gwicke, cscott, arlolra, subbu, bearND, halfak, and Amir1: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T2000). Please do the needful. [20:00:51] (03PS1) 1020after4: group1 wikis to 1.29.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342899 [20:00:53] (03CR) 1020after4: [C: 032] group1 wikis to 1.29.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342899 (owner: 1020after4) [20:00:55] (03CR) 10jerkins-bot: [V: 04-1] rm manifests/microsites/peopleweb/migration.pp [puppet] - 10https://gerrit.wikimedia.org/r/342897 (owner: 10Dzahn) [20:01:48] You cannot do SELECT * FROM table_a.field_foo ORDER BY table_b.field_bar , can you? What is the sense in prefixing the field to order by with the table name then? 
[20:02:13] eddiegp, for that [20:02:17] (03PS1) 10Dzahn: gerrit: rm role class gerrit::migration [puppet] - 10https://gerrit.wikimedia.org/r/342902 [20:02:19] you need to select both tables [20:02:25] (03Merged) 10jenkins-bot: group1 wikis to 1.29.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342899 (owner: 1020after4) [20:02:33] (03CR) 10jenkins-bot: group1 wikis to 1.29.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342899 (owner: 1020after4) [20:03:01] Aah, okay, in that case you might get an order clause like this, yeah. [20:03:14] (03PS1) 10Eevans: [WIP]: Enable encrypted client connections in RESTBase production [puppet] - 10https://gerrit.wikimedia.org/r/342903 (https://phabricator.wikimedia.org/T111113) [20:03:15] eddiegp, maybe you intended to JOIN two tables? [20:04:20] something along the lines of SELECT * FROM table_a JOIN table_b ON table_a.id = table_b.id ORDER BY table_b.field_bar ? [20:04:41] (03CR) 10Eevans: [C: 04-1] [WIP]: Enable encrypted client connections in RESTBase production [puppet] - 10https://gerrit.wikimedia.org/r/342903 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [20:04:49] PROBLEM - puppet last run on phab2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:04:52] Question was about amending the patch just reverted. Thought about a solution [20:04:54] (03CR) 10Dzahn: [C: 031] "lgtm, i'll let Robh merge once he did switch ports" [dns] - 10https://gerrit.wikimedia.org/r/342841 (owner: 10Papaul) [20:05:13] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.16 [20:05:30] Child classes define a function called getOrderFields that returns which order to sort by. I thought one just might kick out the "table." prefix in all child classes to make this patch work again (as I thought there would be no benefit from having it anyway). 
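The point about needing to JOIN a table before ordering by one of its columns can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in; production runs a different database engine, so the error text differs from the 1054 message above, and the sample rows are invented for illustration. The table and column names follow the ones used in the chat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, field_foo TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, field_bar INTEGER);
    INSERT INTO table_a VALUES (1, 'first'), (2, 'second');
    INSERT INTO table_b VALUES (1, 20), (2, 10);
""")

# Ordering by a column of a table that is not part of the query fails.
try:
    conn.execute("SELECT * FROM table_a ORDER BY table_b.field_bar")
    unjoined_error = None
except sqlite3.OperationalError as exc:
    unjoined_error = str(exc)

# Joining table_b makes its columns available to ORDER BY.
joined_rows = conn.execute(
    "SELECT table_a.field_foo FROM table_a "
    "JOIN table_b ON table_a.id = table_b.id "
    "ORDER BY table_b.field_bar"
).fetchall()

print(unjoined_error)
print(joined_rows)
```

A table-qualified order field is therefore only meaningful when the query that uses it also brings in that table.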
[20:05:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:06:13] (03CR) 10Eevans: [C: 04-1] "Not ready to be merged." [puppet] - 10https://gerrit.wikimedia.org/r/342903 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [20:06:23] But your answers tell me that there's a use case for adding the table name to the order field, and hence I need to find another solution. [20:06:32] if the fields are unambiguous, they are optional (do not require the full table name or alias) [20:06:36] in mediawiki style [20:06:43] (you like it or not) [20:06:57] the convention is to always use unambiguous field names [20:06:58] we have deploy for ores, if that's okay [20:07:46] something like SELECT * FROM table_a JOIN table_b ON ta_id = tb_id ORDER BY ta_bar [20:08:24] but you do not even need to do SQL, mediawiki wrappers take care of most of the thinking [20:09:14] (03CR) 10Dzahn: [C: 031] "hehe, yes please. i just ran 2.1.2 across the repo not long ago and fixed like 3/4 that i found. (https://gerrit.wikimedia.org/r/#/c/3419" [puppet] - 10https://gerrit.wikimedia.org/r/342637 (owner: 10Alexandros Kosiaris) [20:09:25] 06Operations, 10ops-codfw, 10hardware-requests, 13Patch-For-Review: Decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#3103785 (10RobH) robh@asw-a-codfw# show | compare [edit interfaces ge-6/0/0] - description db2001; [edit interfaces ge-6/0/1] - description db2002; [edit interfaces ge-6/... [20:09:39] 'Variable 'wgCirrusSearchCompletionGeoContextSettings' is not set.' 
in /srv/mediawiki/php-1.29.0-wmf.16/maintenance/getConfiguration.php:105 [20:09:52] !log bsitzmann@tin Started deploy [mobileapps/deploy@fa43048]: Update mobileapps to bb8fcf2 [20:09:58] (03PS1) 10Eevans: [WIP]: Mandatory Cassandra client encryption [puppet] - 10https://gerrit.wikimedia.org/r/342904 (https://phabricator.wikimedia.org/T111113) [20:09:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:10:35] (03PS2) 10RobH: DNS/Decom: Remove DNS entries for db200[1-9] [dns] - 10https://gerrit.wikimedia.org/r/342841 (owner: 10Papaul) [20:10:57] (03CR) 10Eevans: [C: 04-1] "Not yet ready to be merged." [puppet] - 10https://gerrit.wikimedia.org/r/342904 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [20:12:23] (03CR) 10Dzahn: "> how does restbase1001 factor in here?" [puppet] - 10https://gerrit.wikimedia.org/r/342088 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [20:12:25] (03CR) 10RobH: [C: 032] DNS/Decom: Remove DNS entries for db200[1-9] [dns] - 10https://gerrit.wikimedia.org/r/342841 (owner: 10Papaul) [20:13:03] since i crashed baham i now am paranoid about dns update runs... 
[20:13:05] (03PS13) 10Dzahn: Cassandra TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [20:13:43] !log bsitzmann@tin Finished deploy [mobileapps/deploy@fa43048]: Update mobileapps to bb8fcf2 (duration: 03m 51s) [20:13:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:01] 06Operations, 10ops-codfw, 10hardware-requests, 13Patch-For-Review: Decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#3103794 (10RobH) [20:14:08] 06Operations, 10ops-codfw, 10hardware-requests, 13Patch-For-Review: Decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#2098270 (10RobH) 05Open>03Resolved [20:14:38] jynus: So basically it shouldn't make any difference if a class says "I want to order by table.foo" or if it states "I want to order by foo" as out of the context (select, join) they would do the very same (due to mw never having a column named "foo" twice in different tables), I got that right? [20:15:24] in sql, no [20:15:38] eddiegp: seems like it's best not to make the assumption and instead always use the table name explicitly [20:15:41] (03CR) 10Dzahn: "compiler run now looks good" [puppet] - 10https://gerrit.wikimedia.org/r/342088 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [20:15:44] mw may require not a qualifier [20:15:59] I am not 100% sure [20:16:09] I would use the field directly [20:16:38] whatever works [20:16:53] I am probably the worse person to tell you about mediawiki :-) [20:17:08] twentyafterfour: Yeah, definitely if that's not a hard rule ("We will NEVER do this") but just a "usually we don't". [20:17:14] wgCirrusSearchCompletionGeoContextSettings was recently removed from CirrusSearch? 
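On the question of whether "table.foo" and "foo" sort the same way: when every column name is unique across the joined tables (the ta_/tb_ prefix convention mentioned earlier), the two spellings resolve to the same column. The sketch below illustrates that with Python's sqlite3 module as a stand-in engine; the column tb_baz and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (ta_id INTEGER PRIMARY KEY, ta_bar INTEGER);
    CREATE TABLE table_b (tb_id INTEGER PRIMARY KEY, tb_baz TEXT);
    INSERT INTO table_a VALUES (1, 30), (2, 10), (3, 20);
    INSERT INTO table_b VALUES (1, 'x'), (2, 'y'), (3, 'z');
""")

# Same query, differing only in how the ORDER BY column is spelled.
base = (
    "SELECT ta_id, tb_baz FROM table_a "
    "JOIN table_b ON ta_id = tb_id ORDER BY "
)
unqualified = conn.execute(base + "ta_bar").fetchall()
qualified = conn.execute(base + "table_a.ta_bar").fetchall()

print(unqualified == qualified)
```

This only shows that the results agree when names are unique; it says nothing about whether MediaWiki's query wrappers accept a qualifier in a given call, which the chat leaves open.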
[20:17:52] (03CR) 10Dzahn: [C: 032] Cassandra TLS configuration for RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/342088 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [20:18:16] (03PS1) 10Jcrespo: Move db1067 from s2 to s1 as a db1057 replacement [puppet] - 10https://gerrit.wikimedia.org/r/342905 (https://phabricator.wikimedia.org/T160435) [20:18:46] where does maintenance/getConfiguration.php get it's list of expected globals? [20:19:07] seems like it's validating the globals in the wmf.16 branch against those which are defined in the wmf.15 branch [20:19:18] then throwing an exception because one was removed between branches [20:19:18] halfak: we have the key issue again :D [20:20:41] (03PS1) 10Jcrespo: Move db1067 from s2 to s1 as a db1057 replacement [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342906 (https://phabricator.wikimedia.org/T160435) [20:20:43] urandom: the restbase config change has been applied on xenon, want me to do the other 2 ? [20:21:44] (03PS2) 10Gehel: maps - increase postgresql track_activity_query_size to 16384 [puppet] - 10https://gerrit.wikimedia.org/r/342870 (https://phabricator.wikimedia.org/T160209) [20:21:54] doing it, all test hosts [20:22:24] (03CR) 10Dzahn: "ran puppet and applied config change on xenon, cerium, praseodymium test hosts" [puppet] - 10https://gerrit.wikimedia.org/r/342088 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [20:22:28] (03CR) 10Jcrespo: [C: 032] Move db1067 from s2 to s1 as a db1057 replacement [puppet] - 10https://gerrit.wikimedia.org/r/342905 (https://phabricator.wikimedia.org/T160435) (owner: 10Jcrespo) [20:22:33] (03PS2) 10Jcrespo: Move db1067 from s2 to s1 as a db1057 replacement [puppet] - 10https://gerrit.wikimedia.org/r/342905 (https://phabricator.wikimedia.org/T160435) [20:23:23] (03PS2) 10Dzahn: gerrit: rm role class gerrit::migration [puppet] - 10https://gerrit.wikimedia.org/r/342902 [20:23:26] (03CR) 10Gehel: [C: 032] maps - increase postgresql 
track_activity_query_size to 16384 [puppet] - 10https://gerrit.wikimedia.org/r/342870 (https://phabricator.wikimedia.org/T160209) (owner: 10Gehel) [20:24:15] (03PS3) 10Jcrespo: Move db1067 from s2 to s1 as a db1057 replacement [puppet] - 10https://gerrit.wikimedia.org/r/342905 (https://phabricator.wikimedia.org/T160435) [20:24:37] (03CR) 10Dzahn: [C: 032] gerrit: rm role class gerrit::migration [puppet] - 10https://gerrit.wikimedia.org/r/342902 (owner: 10Dzahn) [20:24:43] (03PS3) 10Dzahn: gerrit: rm role class gerrit::migration [puppet] - 10https://gerrit.wikimedia.org/r/342902 [20:27:22] !log ladsgroup@tin Started deploy [ores/deploy@bc0bc74]: Mid-March deploy of ORES (T160279) [20:27:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:29] T160279: Deploy ores in prod (Mid-March) - https://phabricator.wikimedia.org/T160279 [20:27:55] (03PS2) 10Jcrespo: Move db1067 from s2 to s1 as a db1057 replacement [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342906 (https://phabricator.wikimedia.org/T160435) [20:29:37] added as deployment blocker: https://phabricator.wikimedia.org/T160569 [20:30:53] !log T160569 blocks the train until I can figure out what is causing it. 
The frequency is low so I haven't reverted to wmf.15, group 1 remains on wmf.16 refs T158997 [20:31:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:31:01] T160569: Variable wgCirrusSearchCompletionGeoContextSettings is not set in getConfiguration.php on line 105 - https://phabricator.wikimedia.org/T160569 [20:31:01] T158997: MW-1.29.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T158997 [20:31:15] (03CR) 10Jcrespo: [C: 032] Move db1067 from s2 to s1 as a db1057 replacement [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342906 (https://phabricator.wikimedia.org/T160435) (owner: 10Jcrespo) [20:31:31] canary is done, let's wait to see if anything weird happens [20:31:43] (03PS4) 10Dzahn: gerrit: rm role class gerrit::migration [puppet] - 10https://gerrit.wikimedia.org/r/342902 [20:31:45] halfak: ^ [20:31:48] RECOVERY - puppet last run on phab2001 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [20:32:55] (03Merged) 10jenkins-bot: Move db1067 from s2 to s1 as a db1057 replacement [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342906 (https://phabricator.wikimedia.org/T160435) (owner: 10Jcrespo) [20:33:04] (03CR) 10jenkins-bot: Move db1067 from s2 to s1 as a db1057 replacement [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342906 (https://phabricator.wikimedia.org/T160435) (owner: 10Jcrespo) [20:33:40] mutante: oh hai! [20:33:56] mutante: when you say "applied", do you mean puppet? [20:34:07] urandom: hi! yes, i mean that puppet changed the restbase config file [20:34:17] but i did not see it do a restart i think [20:34:34] mutante: roger that: i can restart restbase; thanks for the merge! 
[20:34:44] which could be as intended or not :) [20:34:46] ok, cool [20:34:59] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Move db1067 from s2 to s1 as a db1057 replacement (duration: 00m 42s) [20:35:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:36:02] (03PS2) 10Dzahn: rm manifests/microsites/peopleweb/migration.pp [puppet] - 10https://gerrit.wikimedia.org/r/342897 [20:36:44] (03PS1) 10EBernhardson: Keep variable around that was removed in wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342907 (https://phabricator.wikimedia.org/T160569) [20:36:58] twentyafterfour: ^ [20:37:17] Amir1, cool. Stepped away for a minute. [20:37:25] twentyafterfour: i have to run to a dentist appointment in 20 minutes, should be safe to deploy but if you want to wait i'll be back in ~2.5 hours or so (by SWAT i expect) [20:37:27] * halfak pulls up dashboard [20:38:00] Everything looks good, showed a bump in number of errored scores but that's natural. Everything is normal [20:38:06] going live everywhere [20:38:10] (03CR) 10Dzahn: [C: 032] rm manifests/microsites/peopleweb/migration.pp [puppet] - 10https://gerrit.wikimedia.org/r/342897 (owner: 10Dzahn) [20:38:15] (03PS3) 10Dzahn: rm manifests/microsites/peopleweb/migration.pp [puppet] - 10https://gerrit.wikimedia.org/r/342897 [20:38:20] +1 AmandaNP [20:38:27] Woops. 
+1 Amir1 [20:38:29] (03CR) 10Chad: "Heh, I was actually going to resurrect/repurpose this code a little for master/slave replication, but it's fine :)" [puppet] - 10https://gerrit.wikimedia.org/r/342902 (owner: 10Dzahn) [20:38:35] Sorry Amand.aNP [20:38:45] !log T111113: Restarting xenon (RESTBase Staging) to enable client encryption (canary) [20:38:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:38:53] T111113: Cassandra client encryption - https://phabricator.wikimedia.org/T111113 [20:39:41] thanks ebernhardson [20:40:38] (03CR) 1020after4: [C: 032] Keep variable around that was removed in wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342907 (https://phabricator.wikimedia.org/T160569) (owner: 10EBernhardson) [20:41:48] PROBLEM - restbase endpoints health on xenon is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.200, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7f0ca6da7990: Failed to establish a new connection: [Errno 111] Connection refused,)) [20:41:58] PROBLEM - Restbase root url on xenon is CRITICAL: connect to address 10.64.0.200 and port 7231: Connection refused [20:42:53] ^^^ got thtat [20:42:54] that [20:44:49] !log restarting postgresql on maps clusters - T160209 [20:44:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:44:56] T160209: Increase track_activity_query_size on rendering databases - https://phabricator.wikimedia.org/T160209 [20:48:23] restarting [20:48:44] *restarting services now 1 down, 6 to go [20:51:03] (03PS2) 1020after4: Keep variable around that was removed in wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342907 (https://phabricator.wikimedia.org/T160569) (owner: 10EBernhardson) [20:52:51] halfak: https://ores.wikimedia.org/v2/scores/etwiki/ (still three nodes have not been done yet) [20:53:19] I can see 
the new models. :) [20:53:23] (03CR) 1020after4: [V: 032 C: 032] Keep variable around that was removed in wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342907 (https://phabricator.wikimedia.org/T160569) (owner: 10EBernhardson) [20:53:24] * halfak touches fingertips. [20:53:28] PROBLEM - puppet last run on ms-be1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:53:31] Why is scap so much slower than fabric? [20:54:05] halfak: what part is so much slower? [20:54:06] It shouldn't be, I think our repos are huge [20:54:08] !log ladsgroup@tin Finished deploy [ores/deploy@bc0bc74]: Mid-March deploy of ORES (T160279) (duration: 26m 46s) [20:54:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:54:15] T160279: Deploy ores in prod (Mid-March) - https://phabricator.wikimedia.org/T160279 [20:54:35] okay, deploy is done. [20:54:47] halfak: I think that submodules are handled inefficiently, currently. [20:54:50] halfak: Do we have a card for enabling ores review tool in etwiki? [20:55:07] twentyafterfour, in fab, we do the same thing. [20:55:12] Amir1, we do [20:55:46] (03Merged) 10jenkins-bot: Keep variable around that was removed in wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342907 (https://phabricator.wikimedia.org/T160569) (owner: 10EBernhardson) [20:55:51] Amir1, https://phabricator.wikimedia.org/T159609 [20:56:25] halfak: keyholder could slow it down a little, and scap doesn't reuse ssh connections when making multiple requests to the same target [20:56:26] I'll do it [20:56:58] shouldn't be much slower. fabric avoids forking by doing ssh in native python, right? 
that could be the difference [20:57:12] but I wouldn't expect that to be faster, really [20:57:14] halfak: we need to clean the done column [20:57:20] (03CR) 10jenkins-bot: Keep variable around that was removed in wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342907 (https://phabricator.wikimedia.org/T160569) (owner: 10EBernhardson) [20:57:24] We do. It's bad. [20:57:32] I'm thrashing recently [21:00:02] RECOVERY - Restbase root url on xenon is OK: HTTP OK: HTTP/1.1 200 - 15500 bytes in 0.035 second response time [21:00:52] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [21:01:32] !log twentyafterfour@tin Synchronized wmf-config: deploy I489c4aa1b862f053b0cb385d509f9ac5f8c6deca to fix 160569 and unblock the train refs T158997 (duration: 00m 45s) [21:01:34] (03PS3) 10BBlack: [WIP/POC] DNS zones to puppet repo [puppet] - 10https://gerrit.wikimedia.org/r/342887 [21:01:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:01:42] T158997: MW-1.29.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T158997 [21:02:33] https://grafana.wikimedia.org/dashboard/db/ores-extension?from=now-1h&to=now [21:02:53] everything looks fine, the review tool is happy in new edits in enwiki and wikidata [21:03:00] I'm heading to bed [21:03:12] halfak: o/ [21:03:44] o/ AmandaNP [21:03:49] Bah! Amir1! [21:03:58] * halfak needs to get his [TAB] game together [21:03:59] :)))) [21:04:15] there is a bug in IRC cloud I think [21:04:38] (03PS1) 10Eevans: Use empty 'ca' directive, not 'cert' [puppet] - 10https://gerrit.wikimedia.org/r/342912 (https://phabricator.wikimedia.org/T111113) [21:05:35] mutante: i hate to ask, but i bungled that config change, could I get another merge to fix it? [21:13:32] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [21:22:32] RECOVERY - puppet last run on ms-be1024 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:22:52] https://www.leaseweb.com/customers/cases/wikimedia a third? [21:23:30] is this still accurate? [21:24:05] (03PS1) 1020after4: Temp fix for wmf.16: define wgCirrusSearchCacheWarmers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342914 (https://phabricator.wikimedia.org/T160569) [21:25:48] (03PS2) 1020after4: Temp fix for wmf.16: define wgCirrusSearchCacheWarmers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342914 (https://phabricator.wikimedia.org/T160569) [21:26:16] (03CR) 1020after4: [C: 032] Temp fix for wmf.16: define wgCirrusSearchCacheWarmers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342914 (https://phabricator.wikimedia.org/T160569) (owner: 1020after4) [21:27:15] (03Merged) 10jenkins-bot: Temp fix for wmf.16: define wgCirrusSearchCacheWarmers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342914 (https://phabricator.wikimedia.org/T160569) (owner: 1020after4) [21:27:25] (03CR) 10jenkins-bot: Temp fix for wmf.16: define wgCirrusSearchCacheWarmers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342914 (https://phabricator.wikimedia.org/T160569) (owner: 1020after4) [21:29:00] !log twentyafterfour@tin Synchronized wmf-config: deploy Iad984924671b499aea19ef536ec6367e73a84a7c to fix 160569 and unblock the train refs T158997 (duration: 00m 49s) [21:29:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:29:07] T158997: MW-1.29.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T158997 [21:35:32] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [21:41:32] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [21:49:27] (03PS1) 1020after4: Undefined global wgCirrusSearchCompletionGeoContextProfiles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342944 (https://phabricator.wikimedia.org/T160569) [21:50:06] (03CR) 10Eevans: [C: 031] "This is a breakfix for https://gerrit.wikimedia.org/r/342088, which mistakenly used the `cert` directive instead of `ca`." [puppet] - 10https://gerrit.wikimedia.org/r/342912 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [21:50:26] (03CR) 1020after4: [C: 032] Undefined global wgCirrusSearchCompletionGeoContextProfiles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342944 (https://phabricator.wikimedia.org/T160569) (owner: 1020after4) [21:51:44] (03Merged) 10jenkins-bot: Undefined global wgCirrusSearchCompletionGeoContextProfiles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342944 (https://phabricator.wikimedia.org/T160569) (owner: 1020after4) [21:51:53] (03CR) 10jenkins-bot: Undefined global wgCirrusSearchCompletionGeoContextProfiles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342944 (https://phabricator.wikimedia.org/T160569) (owner: 1020after4) [21:53:04] 06Operations, 06Analytics-Kanban, 10ChangeProp, 10Reading-Web-Trending-Service, 06Services (done): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3104126 (10mobrovac) 05Open>03Resolved This has been completed. Thank you @Pchelolo and @Ottomata [21:53:22] PROBLEM - puppet last run on ms-fe1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:54:27] chanop, please update the Topic to link the new log location: https://wm-bot.wmflabs.org/logs/%23wikimedia-operations/ [21:54:29] Thanks! 
[21:54:50] !log twentyafterfour@tin Synchronized wmf-config/CirrusSearch-common.php: Deploy I67d71217d16cd75a2646cf658cead2c5fd740cd9 refs T160569 and T158997 (duration: 00m 42s) [21:54:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:54:58] T160569: Variable wgCirrusSearchCompletionGeoContextSettings is not set in getConfiguration.php on line 105 - https://phabricator.wikimedia.org/T160569 [21:54:58] T158997: MW-1.29.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T158997 [22:01:10] Platonides, ack, my bad, that one is ok. I didn't check some of the URL shorteners, after the first few gave me the old link. [22:02:30] hmm, indeed [22:03:32] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [22:05:10] twentyafterfour: it work now? [22:06:10] ebernhardson: no, every time I track one down, another unset global pops up [22:06:24] apparently a bunch of wgCirrusSearchCompletion* globals were removed [22:06:35] and it takes a long time for me to find them [22:07:23] see also T160578 for an apparently unrelated bug [22:07:23] T160578: Fatal error: Couldn't find constant Elastica\Search::OPTION_SEARCH_TYPE_COUNT in ApiFeatureUsageQueryEngineElastica.php on line 88 - https://phabricator.wikimedia.org/T160578 [22:07:39] twentyafterfour: yea the elasticsearch 5 completion suggester is a big rewrite at the elasticsearch level, so we changed a bunch of things. 
Because we wrote this code over ~2 months it's harder to remember what differed, but i think i can get git to tell me [22:08:18] I tried reading through the commit diffs but that wasn't very clear either [22:08:44] twentyafterfour: it's probably because what's deployed this week is a merge of a branch into master that's been around since January for this transition [22:08:46] the way we do configuration is horribly broken [22:09:01] * paladox wonders do these configs get linted [22:09:25] paladox: not in a useful way, no [22:09:36] paladox: depends what you mean by lint :P [22:09:43] this exception that's happening should really be detected in a commit hook [22:09:54] that would be a useful form of linting ;) [22:09:56] Oh. By lint i mean it tests for if the config is ok. [22:10:15] problem is, config has to be consistent across branches which is annoying [22:10:27] that shouldn't be necessary [22:10:28] the config is ok, but what's happening here is code in wmf.15 is taking a list of config it knows about, and asking a wmf.16 wiki to give it all those variables [22:10:49] the wmf.16 wiki says OMG, i've never heard of wgFooBarBaz because it was removed in .16 [22:11:15] we just need to teach mediawiki to remember ;D [22:12:06] twentyafterfour, what if we set up a live test site based on patches going into mediawiki (by that i mean merged changes but still catchable) Running all the extensions that are on wikimedia sites. [22:12:25] paladox: we have that, it's called beta. The problem is beta doesn't run two different versions at the same time [22:12:57] (03CR) 10Volans: [C: 032] Upgrade to version 0.0.2 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/342892 (owner: 10Volans) [22:13:00] oh, we could set up two sites in that cluster that run two different versions?
[22:13:37] (03Merged) 10jenkins-bot: Upgrade to version 0.0.2 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/342892 (owner: 10Volans) [22:14:06] is there some way to know how many rows are in a certain wiki db table? count(*) takes too long.... doesn't have to be exact. [22:14:07] well, the fix i'm planning for here that i haven't gotten around to is instead of wmf.16 asking wmf.15 for variables it knows about, have wmf.16 ask for 'the cirrus config' and have wmf.15 return whatever it knows about, flipping it around. That wasn't originally done because that's not how the code in core that supports this was originally designed [22:14:21] paladox: this is a particularly obscure kind of bug and indicates a bad design pattern more than something that we should catch in CI [22:14:23] SMalyshev: you can use explain to get an estimate, but it can be off by a factor of 10 in each direction [22:14:30] oh [22:14:39] ebernhardson: factor of 10 is actually not too bad for me [22:14:51] ebernhardson: that's the right way to fix it, yeah [22:15:44] SMalyshev: try `explain select 1 from table`, i think that should give a cardinality estimate [22:16:05] (03PS1) 10Catrope: Set $wgOresExtension for I63b11eff3a4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342947 (https://phabricator.wikimedia.org/T159763) [22:16:09] ebernhardson: yeah that worked thanks [22:20:08] (03PS1) 10EBernhardson: Undefined globals from CirrusSearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342948 (https://phabricator.wikimedia.org/T160569) [22:20:11] do we ever prune the archive table?
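Note on the cross-branch configuration failure discussed above: a minimal sketch of the two lookup styles, assuming plain dictionaries stand in for each branch's settings. Every name here (`get_by_name`, `get_group`, `wgExampleA`, `wgExampleB`) is hypothetical, and `wgFooBarBaz` is the placeholder used in the conversation. This is not how getConfiguration.php is implemented; it only illustrates why a caller-supplied list of names breaks when a variable is removed, while a "give me what you know about" request does not.

```python
# Names the older branch believes exist (hypothetical).
OLD_BRANCH_KNOWN = ["wgExampleA", "wgExampleB", "wgFooBarBaz"]

# Settings actually defined by the newer branch; wgFooBarBaz was removed.
NEW_BRANCH_CONFIG = {"wgExampleA": 1, "wgExampleB": "x"}


def get_by_name(config, names):
    """Caller-driven lookup: the caller dictates the names, so any
    variable the target no longer defines is an error."""
    for name in names:
        if name not in config:
            raise KeyError(f"Variable {name} is not set")
    return {name: config[name] for name in names}


def get_group(config, prefix):
    """Target-driven lookup: the target returns whatever it currently
    defines under the prefix, so removed variables simply do not appear."""
    return {k: v for k, v in config.items() if k.startswith(prefix)}


# The strict style fails as soon as one expected variable is gone.
try:
    get_by_name(NEW_BRANCH_CONFIG, OLD_BRANCH_KNOWN)
    strict_error = None
except KeyError as exc:
    strict_error = exc.args[0]

# The flipped style tolerates the removal.
flipped = get_group(NEW_BRANCH_CONFIG, "wgExample")
```

With the second style the two branches no longer have to agree on the full list of globals, which is the consistency requirement the conversation describes as annoying.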
[22:21:18] SMalyshev, show table status like ''\G [22:21:51] please do not run count(*) on production [22:21:58] jynus: I didn't :) [22:22:22] RECOVERY - puppet last run on ms-fe1003 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [22:22:38] show table status gives you an estimate size in rows, data size and index size [22:22:50] I tried actually to look it up on _p tables but neither explain nor table status work there... fortunately they do on terbium, so I git what I need [22:22:56] *I got [22:23:08] jynus: thanks, that is what I needed [22:24:15] 06Operations, 10ops-ulsfo: track the ops juniper kit in OIT den - https://phabricator.wikimedia.org/T160581#3104231 (10RobH) [22:27:44] twentyafterfour: random other thing that would be nice, i'd love if i could send a request that somehow uses a different wmf-* version that currently configured, so can test if something is going to work without rolling all the wikis forward [22:28:29] ebernhardson: I've been wanting that exact feature for a long time [22:28:43] I think we could do it with the wikimedia-debug extension somehow [22:29:46] urandom: i am back now, looking at the break-fix now [22:29:47] it would be really cool if you could run a request against master through production setup [22:30:06] (03CR) 10Dzahn: [C: 032] Use empty 'ca' directive, not 'cert' [puppet] - 10https://gerrit.wikimedia.org/r/342912 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [22:31:24] urandom: running puppet on cerium and praseo*, not xenon because it's disable with reason that you are testing [22:33:53] (03PS5) 10Paladox: gerrit: convert to profile/role structure [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [22:34:05] (03PS4) 10Dzahn: rm manifests/microsites/peopleweb/migration.pp [puppet] - 10https://gerrit.wikimedia.org/r/342897 [22:34:22] PROBLEM - puppet last run on ocg1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [22:34:45] (03CR) 10Paladox: "Only added the interface for ipv4. I can test this change on gerrit-test3 once you make the changes _joe_ gave. :)" [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [22:34:59] !log Cassandra test hosts: deploy break-fix gerrit:342912 , run puppet on cerium and praseodymium. on xenon puppet is disabled. [22:35:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:41:11] mutante: ok [22:41:39] mutante: thanks! [22:42:59] urandom: ok, no problem [22:43:25] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 3 others: Purge Varnish cache when a banner is saved - https://phabricator.wikimedia.org/T154954#3104307 (10DStrine) [22:47:02] PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [22:47:23] (03CR) 1020after4: [C: 032] Undefined globals from CirrusSearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342948 (https://phabricator.wikimedia.org/T160569) (owner: 10EBernhardson) [22:48:43] (03Merged) 10jenkins-bot: Undefined globals from CirrusSearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342948 (https://phabricator.wikimedia.org/T160569) (owner: 10EBernhardson) [22:48:55] (03PS6) 10Paladox: gerrit: convert to profile/role structure [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [22:51:07] !log twentyafterfour@tin Synchronized wmf-config/CirrusSearch-common.php: Deploy I4980daf73d5a98afedfae4a254a88ea078912296 refs T160569 and T158997 (duration: 00m 42s) [22:51:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:51:16] T160569: Variable wgCirrusSearchCompletionGeoContextSettings is not set in getConfiguration.php on line 105 - https://phabricator.wikimedia.org/T160569 [22:51:16] T158997: 
MW-1.29.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T158997 [22:52:02] RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. [22:53:13] twentyafterfour: my repro of the config vars problem works now (and also doesn't log errors) [22:58:58] (03CR) 10jenkins-bot: Undefined globals from CirrusSearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342948 (https://phabricator.wikimedia.org/T160569) (owner: 10EBernhardson) [23:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170315T2300). Please do the needful. [23:00:04] tgr, RoanKattouw, and Jdlrobson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:26] o/ [23:02:05] I can SWAT [23:02:18] twentyafterfour: is SWAT all clear? Deployment go ok? [23:02:22] RECOVERY - puppet last run on ocg1003 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [23:04:03] (03PS3) 10Thcipriani: Deploy PageViewInfo to group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342684 (https://phabricator.wikimedia.org/T125917) (owner: 10Gergő Tisza) [23:04:14] (03CR) 10Jforrester: [C: 031] "For whenever anyone gets around to SWATing this, sign-off from me." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/342798 (https://phabricator.wikimedia.org/T160420) (owner: 10Urbanecm) [23:05:13] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342684 (https://phabricator.wikimedia.org/T125917) (owner: 10Gergő Tisza) [23:05:22] I'm here [23:05:33] okie doke [23:07:10] (03Merged) 10jenkins-bot: Deploy PageViewInfo to group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342684 (https://phabricator.wikimedia.org/T125917) (owner: 10Gergő Tisza) [23:07:21] im here btw thcipriani [23:07:32] jdlrobson: hi :) [23:07:58] tgr: your patch is live on mwdebug1002, check please [23:08:12] thcipriani: go ahead, I still need to deploy one patch but you can SWAT [23:08:42] twentyafterfour: ok, didn't mean to step on your toes :( [23:09:07] it's not a problem [23:09:11] (03CR) 10jenkins-bot: Deploy PageViewInfo to group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342684 (https://phabricator.wikimedia.org/T125917) (owner: 10Gergő Tisza) [23:09:45] thcipriani: works, thanks [23:09:58] tgr: ok, going live [23:10:06] (03PS2) 10Thcipriani: Set $wgOresExtension for I63b11eff3a4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342947 (https://phabricator.wikimedia.org/T159763) (owner: 10Catrope) [23:10:23] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342947 (https://phabricator.wikimedia.org/T159763) (owner: 10Catrope) [23:11:50] (03Merged) 10jenkins-bot: Set $wgOresExtension for I63b11eff3a4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342947 (https://phabricator.wikimedia.org/T159763) (owner: 10Catrope) [23:11:58] (03CR) 10jenkins-bot: Set $wgOresExtension for I63b11eff3a4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342947 (https://phabricator.wikimedia.org/T159763) (owner: 10Catrope) [23:12:00] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:342684|Deploy PageViewInfo to group1]] 
T125917 (duration: 00m 43s) [23:12:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:12:07] T125917: Deploy the PageViewInfo extension to production - https://phabricator.wikimedia.org/T125917 [23:12:07] ^ tgr live everywhere [23:12:28] thx thcipriani [23:13:17] (03CR) 10Dzahn: [C: 04-1] gerrit: convert to profile/role structure (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [23:13:35] RoanKattouw: your patch is live mwdebug1002 [23:13:54] (03PS7) 10Dzahn: gerrit: convert to profile/role structure [puppet] - 10https://gerrit.wikimedia.org/r/342692 [23:15:10] thcipriani: let me know when you're done I'll deploy https://gerrit.wikimedia.org/r/#/c/342961/ [23:15:28] twentyafterfour: will do [23:15:45] thcipriani: Thanks. It's a no-op, setting a $wg that won't exist until wmf.17 next week [23:16:07] RoanKattouw: ok, I'll go ahead and sync everywhere, thanks [23:17:55] !log thcipriani@tin Synchronized wmf-config: SWAT: [[gerrit:342947|Set $wgOresExtension for I63b11eff3a4]] T159763 (duration: 00m 44s) [23:18:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:18:04] T159763: Add feature flag to enable parts of ORES extension by default - https://phabricator.wikimedia.org/T159763 [23:18:09] (03PS2) 10Thcipriani: Restrict page images to lead section on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342696 (https://phabricator.wikimedia.org/T152115) (owner: 10Jdlrobson) [23:18:15] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342696 (https://phabricator.wikimedia.org/T152115) (owner: 10Jdlrobson) [23:18:32] w00t w00t [23:18:37] :) [23:20:02] (03CR) 10Paladox: gerrit: convert to profile/role structure (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [23:20:47] (03Merged) 10jenkins-bot: Restrict page images to lead section on cawiki [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/342696 (https://phabricator.wikimedia.org/T152115) (owner: 10Jdlrobson) [23:20:56] (03CR) 10jenkins-bot: Restrict page images to lead section on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342696 (https://phabricator.wikimedia.org/T152115) (owner: 10Jdlrobson) [23:21:34] jdlrobson: your change is live on mwdebug1002, check please [23:22:36] (03CR) 10Dzahn: [C: 04-1] "getting there step by step. but: Error: Could not find data item profile::gerrit::server::ipv4 in any Hiera data file and no default supp" [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [23:23:17] thcipriani: hard to tell.. i think i can only test this kind of change when it's live [23:23:34] (it saves a page prop) [23:23:39] oh good :) [23:23:41] but nothing has exploded so proceed [23:23:45] alright, going live. [23:25:22] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:342696|Restrict page images to lead section on cawiki]] T152115 (duration: 00m 42s) [23:25:28] ^ jdlrobson live everywhere [23:25:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:25:29] T152115: [Config] Restricted lead images to lead section - https://phabricator.wikimedia.org/T152115 [23:25:33] awesome on it thcipriani [23:27:28] works [23:27:31] thanks thcipriani [23:27:44] jdlrobson: cool, thanks for checking :) [23:27:53] twentyafterfour: all clear [23:28:15] 06Operations, 10hardware-requests: hardware request for netmon1001 replacement - https://phabricator.wikimedia.org/T156040#3104466 (10RobH) The order for netmon1002 has been placed. 
[23:32:04] !log twentyafterfour@tin Synchronized php-1.29.0-wmf.16/extensions/ApiFeatureUsage/ApiFeatureUsageQueryEngineElastica.php: deploy I2d8603cef61723ddcc8c1d37566c334333422248 refs T160578 T158997 (duration: 00m 42s) [23:32:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:32:13] T158997: MW-1.29.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T158997 [23:32:13] T160578: Fatal error: Couldn't find constant Elastica\Search::OPTION_SEARCH_TYPE_COUNT in ApiFeatureUsageQueryEngineElastica.php on line 88 - https://phabricator.wikimedia.org/T160578 [23:36:52] !log train unblocked and wmf.16 is deployed to group1 wikis. [23:36:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:44:11] (03PS8) 10Dzahn: gerrit: convert to profile/role structure [puppet] - 10https://gerrit.wikimedia.org/r/342692 [23:54:06] (03PS9) 10Dzahn: gerrit: convert to profile/role structure [puppet] - 10https://gerrit.wikimedia.org/r/342692 [23:55:07] (03CR) 10Paladox: "You should keep what you did in patch set 8. As we need those in hiera. Otherwise it may break on labs." [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn)