[00:00:04] twentyafterfour: Respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170810T0000). Please do the needful.
[00:07:12] (PS2) Ppchelko: JobQueueEventBus: Enable group1 - wikidata. [mediawiki-config] - https://gerrit.wikimedia.org/r/370975 (https://phabricator.wikimedia.org/T163380)
[00:08:41] (CR) jerkins-bot: [V: -1] JobQueueEventBus: Enable group1 - wikidata. [mediawiki-config] - https://gerrit.wikimedia.org/r/370975 (https://phabricator.wikimedia.org/T163380) (owner: Ppchelko)
[00:10:12] (PS3) Ppchelko: JobQueueEventBus: Enable group1 - wikidata. [mediawiki-config] - https://gerrit.wikimedia.org/r/370975 (https://phabricator.wikimedia.org/T163380)
[00:11:50] (CR) jerkins-bot: [V: -1] JobQueueEventBus: Enable group1 - wikidata. [mediawiki-config] - https://gerrit.wikimedia.org/r/370975 (https://phabricator.wikimedia.org/T163380) (owner: Ppchelko)
[00:17:58] (PS4) Ppchelko: JobQueueEventBus: Enable group1 - wikidata. [mediawiki-config] - https://gerrit.wikimedia.org/r/370975 (https://phabricator.wikimedia.org/T163380)
[00:18:50] PROBLEM - MariaDB Slave Lag: s7 on db2068 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 322.29 seconds
[00:21:04] PROBLEM - MariaDB Slave Lag: s7 on db1039 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 329.02 seconds
[00:21:25] Checking
[00:23:13] RECOVERY - MariaDB Slave Lag: s7 on db1039 is OK: OK slave_sql_lag Replication lag: 0.40 seconds
[00:27:00] RECOVERY - MariaDB Slave Lag: s7 on db2068 is OK: OK slave_sql_lag Replication lag: 21.95 seconds
[00:39:20] PROBLEM - puppet last run on mw1265 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:08:30] RECOVERY - puppet last run on mw1265 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[01:23:44] Operations, Analytics, Analytics-Wikistats, Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3514829 (Dcljr) @Reedy Should the To-Do list be completed before this task is closed?
[02:28:10] PROBLEM - Host mw2256 is DOWN: PING CRITICAL - Packet loss = 100%
[02:28:44] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.11) (duration: 09m 12s)
[02:29:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:08:55] (CR) Mobrovac: JobQueueEventBus: Enable group1 - wikidata. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/370975 (https://phabricator.wikimedia.org/T163380) (owner: Ppchelko)
[03:14:21] PROBLEM - MariaDB Slave Lag: s7 on db2068 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 320.22 seconds
[03:16:52] (PS1) BBlack: Revoke dzahn ssh keys [puppet] - https://gerrit.wikimedia.org/r/370981
[03:17:38] (CR) BBlack: [C: 2] Revoke dzahn ssh keys [puppet] - https://gerrit.wikimedia.org/r/370981 (owner: BBlack)
[03:20:32] !log batched cumin puppet agent run on all hosts (not forced)
[03:20:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:23:33] Did anything change with phabricator.wikimedia.org lately? Python's urllib is now returning an HTTP 400 Bad Request for it.
[03:23:36] $ python -c 'import urllib; f = urllib.urlopen("https://phabricator.wikimedia.org/T47731#2331789"); print(f.read())'
[03:23:52] Web server change? Application upgrade? Something like that?
[03:23:55] !log cp1008, restbase-dev100[456] have puppet disabled, manual "rm /etc/ssh/userkeys/dzahn"
[03:24:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:27:00] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[03:28:30] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[03:29:16] Esther, maybe set a User Agent?
[03:31:30] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 647.44 seconds
[03:31:36] Krenair: I never did before. But maybe, yeah.
[03:32:20] PROBLEM - MariaDB Slave Lag: s4 on db2065 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.02 seconds
[03:32:21] PROBLEM - MariaDB Slave Lag: s4 on db2019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 307.45 seconds
[03:32:40] PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 316.28 seconds
[03:32:41] PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 319.05 seconds
[03:33:00] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:33:03] PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 331.19 seconds
[03:33:09] Esther, oh wait
[03:33:10] PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 335.08 seconds
[03:33:12] Esther, the anchor
[03:33:45] Whoa.
[03:33:48] Esther, get rid of the #2331789
[03:33:54] When did that become a problem? :-/
[03:33:57] browsers handle that themselves, it's not sent to the server
[03:34:00] That's silly.
[03:34:14] pretty sure that's HTTP standard
[03:34:18] Doesn't some spec say the server can just ignore that part?
[03:34:34] Or use it to redirect the answer in a Location: header?
[03:34:39] the user
[03:34:42] it's part of the standard for URLs. UAs aren't supposed to transmit it, but servers probably *should* tolerate/ignore it
[03:35:02] I'm mostly curious what changed. urllib definitely used to work with phabricator.wikimedia.org
[03:35:11] I have no idea
[03:35:12] It works with other sites still. The anchor seems to be the culprit.
[03:35:21] $ python -c 'import urllib; f = urllib.urlopen("https://phabricator.wikimedia.org/T47731"); print(f.read())'
[03:35:24] Worked fine.
[03:35:41] I guess I could just strip it off.
[03:35:53] yeah that would be best
[03:36:10] PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.26 seconds
[03:36:11] PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.75 seconds
[03:36:21] PROBLEM - MariaDB Slave Lag: s4 on db2065 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 313.18 seconds
[03:36:30] PROBLEM - MariaDB Slave Lag: s4 on db2019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 316.32 seconds
[03:36:41] PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 324.25 seconds
[03:36:50] PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 327.06 seconds
[03:38:30] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:39:40] RECOVERY - MariaDB Slave Lag: s7 on db2068 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[03:40:57] Krenair: Thanks for the anchor clue.
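Here is a Python 3 sketch of the fix the channel landed on: strip the `#2331789` fragment before making the request, and send an explicit User-Agent as Krenair suggested. It is a minimal illustration, not the script from the log. `build_request`, `fetch`, and the User-Agent string are hypothetical names. The commands in the log use Python 2's `urllib.urlopen`, which appears to have passed the fragment through as part of the request path. Recent Python 3 versions of `urllib.request.Request` seem to keep the fragment out of the request line on their own, so the explicit `urldefrag` call mostly documents intent and keeps the URL clean for other HTTP clients.

```python
from urllib.parse import urldefrag
from urllib.request import Request, urlopen


def build_request(url: str, user_agent: str) -> Request:
    """Return a Request for `url` with any #fragment removed (hypothetical helper).

    The fragment (e.g. "#2331789") only tells the browser where to scroll;
    user agents are not supposed to send it to the server.
    """
    clean_url, _fragment = urldefrag(url)
    # Identify the client explicitly instead of relying on urllib's default UA.
    return Request(clean_url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str) -> bytes:
    """Fetch the page body. This performs a real network request."""
    with urlopen(build_request(url, user_agent), timeout=30) as resp:
        return resp.read()


if __name__ == "__main__":
    req = build_request(
        "https://phabricator.wikimedia.org/T47731#2331789",
        "example-script/0.1 (hypothetical contact address)",
    )
    print(req.full_url)  # prints https://phabricator.wikimedia.org/T47731
```

Doing the stripping in one helper means every caller gets the same cleaned URL and User-Agent, whatever HTTP library ends up sending the request.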
[03:44:20] PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.91 seconds
[03:44:30] PROBLEM - MariaDB Slave Lag: s4 on db2065 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 307.06 seconds
[03:44:31] PROBLEM - MariaDB Slave Lag: s4 on db2019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 310.81 seconds
[03:44:50] PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 320.43 seconds
[03:44:51] PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 322.45 seconds
[03:45:11] PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 336.70 seconds
[03:48:30] RECOVERY - MariaDB Slave Lag: s4 on db2065 is OK: OK slave_sql_lag Replication lag: 21.48 seconds
[03:48:40] RECOVERY - MariaDB Slave Lag: s4 on db2019 is OK: OK slave_sql_lag Replication lag: 0.20 seconds
[03:48:50] RECOVERY - MariaDB Slave Lag: s4 on db2051 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[03:48:51] RECOVERY - MariaDB Slave Lag: s4 on db2037 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[03:49:11] RECOVERY - MariaDB Slave Lag: s4 on db2058 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[03:49:20] RECOVERY - MariaDB Slave Lag: s4 on db2044 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[04:33:50] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 182.37 seconds
[04:37:00] PROBLEM - Disk space on graphite2002 is CRITICAL: DISK CRITICAL - free space: /var/lib/carbon 78666 MB (3% inode=97%)
[04:44:24] Operations, Analytics, Analytics-Wikistats, Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3514911 (Jayprakash12345) Resolved>Open The namespace are not redirecting respectively Portal: प्रवेशद्वार: School: विद्यालय: Collection: संग्रह: And...
[04:52:10] PROBLEM - MariaDB Slave Lag: s7 on db2068 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 378.52 seconds
[04:56:21] PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 474.24 seconds
[04:56:30] PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 478.00 seconds
[04:56:50] PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 493.88 seconds
[04:56:51] PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 499.01 seconds
[04:57:01] PROBLEM - MariaDB Slave Lag: s4 on db2065 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 507.74 seconds
[04:57:10] PROBLEM - MariaDB Slave Lag: s4 on db2019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 511.88 seconds
[04:59:20] RECOVERY - MariaDB Slave Lag: s7 on db2068 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[05:02:30] RECOVERY - MariaDB Slave Lag: s4 on db2051 is OK: OK slave_sql_lag Replication lag: 16.67 seconds
[05:02:30] RECOVERY - MariaDB Slave Lag: s4 on db2037 is OK: OK slave_sql_lag Replication lag: 0.43 seconds
[05:02:50] RECOVERY - MariaDB Slave Lag: s4 on db2058 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[05:03:00] RECOVERY - MariaDB Slave Lag: s4 on db2044 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[05:03:10] RECOVERY - MariaDB Slave Lag: s4 on db2065 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[05:03:11] RECOVERY - MariaDB Slave Lag: s4 on db2019 is OK: OK slave_sql_lag Replication lag: 0.17 seconds
[05:32:52] Operations, OCG-General, Reading-Community-Engagement, Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#3514932 (Volker_E)
[06:04:11] !log Removed 2FA for GoldRingChip account (T172878)
[06:04:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:11:59] Operations, Analytics, Analytics-Wikistats, Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3514961 (Dereckson) Actually, we want to keep a standard common set of extensions on wikis, so forget the `Topic:` namespace and find a new name. You don't wan...
[06:13:41] PROBLEM - pdfrender on scb1004 is CRITICAL: connect to address 10.64.48.29 and port 5252: Connection refused
[06:22:31] PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 374.81 seconds
[06:22:40] PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 381.97 seconds
[06:22:50] PROBLEM - MariaDB Slave Lag: s4 on db2065 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 385.98 seconds
[06:22:51] PROBLEM - MariaDB Slave Lag: s4 on db2019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 391.18 seconds
[06:24:10] PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 439.40 seconds
[06:24:11] PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 441.24 seconds
[06:27:41] RECOVERY - MariaDB Slave Lag: s4 on db2044 is OK: OK slave_sql_lag Replication lag: 18.47 seconds
[06:27:50] RECOVERY - MariaDB Slave Lag: s4 on db2065 is OK: OK slave_sql_lag Replication lag: 0.32 seconds
[06:28:00] RECOVERY - MariaDB Slave Lag: s4 on db2019 is OK: OK slave_sql_lag Replication lag: 0.41 seconds
[06:28:10] RECOVERY - MariaDB Slave Lag: s4 on db2051 is OK: OK slave_sql_lag Replication lag: 0.16 seconds
[06:28:20] RECOVERY - MariaDB Slave Lag: s4 on db2037 is OK: OK slave_sql_lag Replication lag: 0.24 seconds
[06:28:40] RECOVERY - MariaDB Slave Lag: s4 on db2058 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[06:29:20] (PS1) Urbanecm: Create a few of namespace aliases for hiwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/370990 (https://phabricator.wikimedia.org/T172977)
[06:33:16] (PS1) Urbanecm: Set $wgArticleCountMethod to 'any' on srwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/370991 (https://phabricator.wikimedia.org/T172974)
[06:38:43] !log restart pdfrender on scb1004
[06:38:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:39:51] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.004 second response time
[06:40:11] argh mw2256 again down
[06:40:11] sigh
[06:41:07] Operations, Analytics, Analytics-Wikistats, Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3515000 (Dereckson)
[06:45:49] !log powercycle mw2256 - T163346
[06:46:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:46:01] T163346: mw2256 - hardware issue - https://phabricator.wikimedia.org/T163346
[06:47:50] RECOVERY - Host mw2256 is UP: PING OK - Packet loss = 0%, RTA = 31.92 ms
[06:47:56] Operations, DBA, MediaWiki-extensions-WikibaseClient, Wikidata, and 5 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3515010 (jcrespo) I've been told that several thousands of UPDATES Title::invalidateCache per sec...
[06:50:00] PROBLEM - MariaDB Slave Lag: s4 on db2065 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 490.29 seconds
[06:50:10] PROBLEM - MariaDB Slave Lag: s4 on db2019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 495.32 seconds
[06:50:21] PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 505.14 seconds
[06:50:30] PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 507.71 seconds
[06:50:41] PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 519.30 seconds
[06:50:51] PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 525.81 seconds
[06:59:17] Operations, Analytics, Analytics-Wikistats, Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3477623 (Anooprao) Wouldn't talk pages should named चर्चा instead of वार्ता in hindi
[07:07:20] RECOVERY - MariaDB Slave Lag: s4 on db2019 is OK: OK slave_sql_lag Replication lag: 0.06 seconds
[07:07:31] RECOVERY - MariaDB Slave Lag: s4 on db2051 is OK: OK slave_sql_lag Replication lag: 0.16 seconds
[07:08:51] RECOVERY - MariaDB Slave Lag: s4 on db2058 is OK: OK slave_sql_lag Replication lag: 0.39 seconds
[07:09:10] RECOVERY - MariaDB Slave Lag: s4 on db2065 is OK: OK slave_sql_lag Replication lag: 0.43 seconds
[07:16:40] RECOVERY - MariaDB Slave Lag: s4 on db2037 is OK: OK slave_sql_lag Replication lag: 0.26 seconds
[07:17:10] RECOVERY - MariaDB Slave Lag: s4 on db2044 is OK: OK slave_sql_lag Replication lag: 0.32 seconds
[07:25:49] (PS1) Jcrespo: labsdb: Set InnoDB as the default storage engine [puppet] - https://gerrit.wikimedia.org/r/370992 (https://phabricator.wikimedia.org/T172882)
[07:26:50] (CR) Jcrespo: [C: 2] labsdb: Set InnoDB as the default storage engine [puppet] - https://gerrit.wikimedia.org/r/370992 (https://phabricator.wikimedia.org/T172882) (owner: Jcrespo)
[07:29:56] !log disabling semisync slave replication on all SSD hosts on s4
[07:30:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:32:25] !log enabling semisymc master replication on db1068
[07:32:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:09] Operations, DBA, MediaWiki-extensions-WikibaseClient, Wikidata, and 5 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3515038 (jcrespo) To avoid the continuous lagging on non-directly pooled hosts (passive dc codfw,...
[08:07:26] (PS1) Jcrespo: mariadb: Remove custom salt grains due to salt deprecation [puppet] - https://gerrit.wikimedia.org/r/370993 (https://phabricator.wikimedia.org/T164780)
[08:14:31] RECOVERY - Disk space on graphite2002 is OK: DISK OK
[08:15:56] !log add 50G to carbon lv on graphite1003 and 100G on graphite2002
[08:16:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:16] Cc: godog --^
[08:25:23] (CR) MarcoAurelio: [C: -1] "Until we evaluate and decide that we really want to keep bureaucrats managing that user right; which will require some time. Also commit m" [mediawiki-config] - https://gerrit.wikimedia.org/r/370791 (https://phabricator.wikimedia.org/T101983) (owner: TerraCodes)
[08:26:44] (PS2) MarcoAurelio: Enabling OAuth on foundationwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301)
[08:31:17] (PS1) Giuseppe Lavagetto: Fix call to compile_cmd_env [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370994
[08:41:26] (PS6) Elukey: statistics::package: add missing package depencency [puppet] - https://gerrit.wikimedia.org/r/370786 (https://phabricator.wikimedia.org/T171924)
[08:59:37] !log update librdkafka1 to 0.9.4.1 on eventlog1001
[08:59:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:40] PROBLEM - MariaDB Slave IO: s4 on db2051 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 1593, Errmsg: Fatal error: Failed to run after_read_event hook
[09:15:13] ?
[09:17:19] !log disabling lag notification on all s4 replicas
[09:17:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:41] RECOVERY - MariaDB Slave IO: s4 on db2051 is OK: OK slave_io_state Slave_IO_Running: Yes
[09:23:53] (PS7) Elukey: statistics::package: add missing package depencency [puppet] - https://gerrit.wikimedia.org/r/370786 (https://phabricator.wikimedia.org/T171924)
[09:25:37] !log gehel@tin Started deploy [wdqs/wdqs@c186e3e]: (no justification provided)
[09:25:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:06] (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/7381/" [puppet] - https://gerrit.wikimedia.org/r/370786 (https://phabricator.wikimedia.org/T171924) (owner: Elukey)
[09:27:08] !log gehel@tin Finished deploy [wdqs/wdqs@c186e3e]: (no justification provided) (duration: 01m 31s)
[09:27:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:10] (PS1) Giuseppe Lavagetto: puppet-compiler: bump version, run puppetdb-populate [puppet] - https://gerrit.wikimedia.org/r/370995 (https://phabricator.wikimedia.org/T150456)
[09:30:00] RECOVERY - puppet last run on notebook1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[09:30:10] \o/
[09:30:28] !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=wdqs2001.codfw.wmnet
[09:30:36] (PS2) Giuseppe Lavagetto: puppet-compiler: bump version, run puppetdb-populate [puppet] - https://gerrit.wikimedia.org/r/370995 (https://phabricator.wikimedia.org/T150456)
[09:30:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:51] !log repooling wdqs2001, long after data reload completed
[09:31:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:04] (CR) Giuseppe Lavagetto: [C: 2] puppet-compiler: bump version, run puppetdb-populate [puppet] - https://gerrit.wikimedia.org/r/370995 (https://phabricator.wikimedia.org/T150456) (owner: Giuseppe Lavagetto)
[09:31:40] RECOVERY - puppet last run on notebook1002 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[09:32:18] Operations, cloud-services-team, Patch-For-Review: notebook100[12] - Invalid relationship: Apt::Pin[r-base] - https://phabricator.wikimedia.org/T171924#3515121 (elukey) Open>Resolved a:elukey
[09:35:16] (CR) Giuseppe Lavagetto: [C: 2] Fix call to compile_cmd_env [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370994 (owner: Giuseppe Lavagetto)
[09:58:48] !log continuing cloning of dbstore2002 to dbstore2001
[09:59:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:02:50] Operations, puppet-compiler, Patch-For-Review, User-Joe: puppet compiler fails with modules using puppetdb - https://phabricator.wikimedia.org/T150456#3515144 (Joe) Open>Resolved
[10:04:20] <_joe_> finally.
[10:06:18] nice!
[10:06:50] (PS1) Giuseppe Lavagetto: [TEST] Test commit for T172362 [puppet] - https://gerrit.wikimedia.org/r/370997
[10:10:31] PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 435.27 seconds
[10:11:10] PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 447.76 seconds
[10:13:31] RECOVERY - MariaDB Slave Lag: s4 on db2044 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[10:14:10] RECOVERY - MariaDB Slave Lag: s4 on db2037 is OK: OK slave_sql_lag Replication lag: 0.04 seconds
[10:16:46] jouncebot: next
[10:16:46] In 2 hour(s) and 43 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170810T1300)
[10:25:10] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0
[10:25:27] (Abandoned) Giuseppe Lavagetto: [TEST] Test commit for T172362 [puppet] - https://gerrit.wikimedia.org/r/370997 (owner: Giuseppe Lavagetto)
[10:30:50] (PS4) Giuseppe Lavagetto: role::mediawiki::jobrunner/videoscaler: switch to the future parser [puppet] - https://gerrit.wikimedia.org/r/368621 (https://phabricator.wikimedia.org/T171704)
[10:32:29] (CR) Giuseppe Lavagetto: [C: 2] role::mediawiki::jobrunner/videoscaler: switch to the future parser [puppet] - https://gerrit.wikimedia.org/r/368621 (https://phabricator.wikimedia.org/T171704) (owner: Giuseppe Lavagetto)
[10:56:27] PROBLEM - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry mailbox lag is 2003789
[11:06:29] (CR) Zoranzoki21: [C: 1] Set $wgArticleCountMethod to 'any' on srwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/370991 (https://phabricator.wikimedia.org/T172974) (owner: Urbanecm)
[11:17:24] !log disabled puppet on cp3032 and restarted varnishkafka with debug logging
[11:17:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:15] !log restored varnishakafka on cp3032
[12:07:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:08:46] Operations, Analytics-Kanban, Patch-For-Review, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3515219 (elukey) Upgraded librdkafka to 0.9.4.1 on eventlog1001 but no real changes registered. I tried to set again d...
[12:26:34] Operations, Analytics-Kanban, Patch-For-Review, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3515242 (elukey) Another interesting thing is the fact that the error ***always*** happens once per minute at the first...
[12:26:47] PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 339.65 seconds
[12:27:17] PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 346.43 seconds
[12:29:17] RECOVERY - MariaDB Slave Lag: s4 on db2037 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[12:29:48] RECOVERY - MariaDB Slave Lag: s4 on db2044 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[12:47:52] (PS1) Elukey: role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506)
[12:48:14] (CR) jerkins-bot: [V: -1] role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) (owner: Elukey)
[12:49:31] (PS2) Elukey: role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506)
[12:49:52] (CR) jerkins-bot: [V: -1] role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) (owner: Elukey)
[12:50:14] this one was sitting there waiting for me to get into the trap
[12:51:43] (PS3) Elukey: role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506)
[12:52:11] (CR) jerkins-bot: [V: -1] role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) (owner: Elukey)
[12:52:42] ah this is new WARNING optional parameter listed before required parameter (parameter_order)
[12:55:02] (PS4) Elukey: role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506)
[12:56:42] !log Stop MySQL on db2045 to upgrade socket location - T148507
[12:56:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:56:58] (PS5) Gehel: Switch elastic1017-1031 to niofs [puppet] - https://gerrit.wikimedia.org/r/370834 (https://phabricator.wikimedia.org/T169498) (owner: EBernhardson)
[12:59:05] (CR) Gehel: [C: 2] Switch elastic1017-1031 to niofs [puppet] - https://gerrit.wikimedia.org/r/370834 (https://phabricator.wikimedia.org/T169498) (owner: EBernhardson)
[12:59:07] (CR) Marostegui: [C: 2] db2045.yaml: Update socket location [puppet] - https://gerrit.wikimedia.org/r/370958 (https://phabricator.wikimedia.org/T148507) (owner: Marostegui)
[12:59:18] (PS2) Marostegui: db2045.yaml: Update socket location [puppet] - https://gerrit.wikimedia.org/r/370958 (https://phabricator.wikimedia.org/T148507)
[13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170810T1300).
[13:00:04] Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[13:00:18] Here
[13:00:56] o/
[13:01:13] (CR) Mobrovac: role:eventubus: set deploy-service as scap deploy_user (1 comment) [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) (owner: Elukey)
[13:01:58] Urbanecm: looks like there is only one patch from you, reviewing...
[13:02:01] PROBLEM - mysqld processes on db2045 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[13:02:21] That is me
[13:02:23] I think I downtimed it
[13:02:32] Ah no, I downtimed replication only :(
[13:02:34] sorry
[13:03:56] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/370815 (https://phabricator.wikimedia.org/T172894) (owner: Urbanecm)
[13:05:03] RECOVERY - mysqld processes on db2045 is OK: PROCS OK: 1 process with command name mysqld
[13:05:20] (Merged) jenkins-bot: Enable NewUserMessage on knwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370815 (https://phabricator.wikimedia.org/T172894) (owner: Urbanecm)
[13:06:07] (CR) jenkins-bot: Enable NewUserMessage on knwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370815 (https://phabricator.wikimedia.org/T172894) (owner: Urbanecm)
[13:10:26] Urbanecm:
[13:10:32] Yes?
[13:10:59] sorry, clicked something wrong, the patch is at mwdebug1002
[13:11:09] Thank you.
[13:11:21] please test and let me know if I can push
[13:11:25] (further)
[13:13:21] Urbanecm: can I deploy the patch?
[13:13:40] Yes :)
[13:14:59] Urbanecm: ok, deploying
[13:15:35] ack
[13:15:54] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:370815|Enable NewUserMessage on knwiki (T172894)]] (duration: 00m 52s)
[13:16:04] Urbanecm: deployed, please check
[13:16:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:05] T172894: Enable Extension:NewUserMessage on knwiki - https://phabricator.wikimedia.org/T172894
[13:16:30] Great, thank you :)
[13:16:43] since there is nothing else for swat today...
[13:16:52] !log EU SWAT finished
[13:17:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:39] zeljkof, can we reopen? I've realized I wished to add one change too :D
[13:17:47] Urbanecm: sure
[13:17:49] !log Drop m3 databases from dbstore1002 - T156758
[13:17:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:59] T156758: Drop m3 from dbstore servers - https://phabricator.wikimedia.org/T156758
[13:18:09] !log EU SWAT, part two
[13:18:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:18:20] zeljkof, thank you. https://gerrit.wikimedia.org/r/#/c/370991/ is the patch. A script will be needed probably
[13:18:33] (PS5) Elukey: role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506)
[13:19:00] (CR) jerkins-bot: [V: -1] role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) (owner: Elukey)
[13:19:27] This is uncheckable patch, please just push. A script will be needed too.
[13:19:53] Urbanecm: please document how to run the script in the patch comment
[13:19:57] Sure
[13:20:44] (PS1) Giuseppe Lavagetto: role::mediawiki::memcached: switch to the future parser [puppet] - https://gerrit.wikimedia.org/r/371021
[13:20:46] (PS1) Giuseppe Lavagetto: confd: fix templates for the future parser. [puppet] - https://gerrit.wikimedia.org/r/371022
[13:20:48] (PS1) Giuseppe Lavagetto: profile::docker::storage: fix guard around vg_to_remove [puppet] - https://gerrit.wikimedia.org/r/371023
[13:20:50] (PS6) Elukey: role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506)
[13:21:00] (CR) Urbanecm: "A script will be needed" [mediawiki-config] - https://gerrit.wikimedia.org/r/370991 (https://phabricator.wikimedia.org/T172974) (owner: Urbanecm)
[13:21:05] Done
[13:21:12] (CR) Luke081515: [C: 1] Set $wgArticleCountMethod to 'any' on srwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/370991 (https://phabricator.wikimedia.org/T172974) (owner: Urbanecm)
[13:21:14] (PS2) Zfilipin: Set $wgArticleCountMethod to 'any' on srwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/370991 (https://phabricator.wikimedia.org/T172974) (owner: Urbanecm)
[13:21:33] Urbanecm: sure that --wiki is not needed?
[13:21:38] https://phabricator.wikimedia.org/T131771#2205465 [13:21:41] Opps, mistake [13:21:54] so use --wiki=srwikisource as well [13:21:59] as param [13:22:27] (CR) Urbanecm: "Mistake in syntax above" [mediawiki-config] - https://gerrit.wikimedia.org/r/370991 (https://phabricator.wikimedia.org/T172974) (owner: Urbanecm) [13:22:41] Thank you Sagan :) [13:22:50] Urbanecm: you're welcome :) [13:23:11] my favorite random quote for today: Code wars: the empire strikes tap [13:23:20] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/370991 (https://phabricator.wikimedia.org/T172974) (owner: Urbanecm) [13:23:51] Operations, Puppet, Patch-For-Review, User-Joe: Switch all hosts to the future parser - https://phabricator.wikimedia.org/T171704#3515324 (Joe) Full list of hosts using the future parser: ``` (297) conf[2001-2003].codfw.wmnet,conf[1001-1003].eqiad.wmnet,mw[2017,2097,2099-2147,2150-2258].codfw.wm... [13:24:45] (Merged) jenkins-bot: Set $wgArticleCountMethod to 'any' on srwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/370991 (https://phabricator.wikimedia.org/T172974) (owner: Urbanecm) [13:26:31] (CR) jenkins-bot: Set $wgArticleCountMethod to 'any' on srwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/370991 (https://phabricator.wikimedia.org/T172974) (owner: Urbanecm) [13:26:46] (PS7) Elukey: role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) [13:26:52] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:370991|Set $wgArticleCountMethod to any on srwikisource (T172974)]] (duration: 00m 53s) [13:27:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:06] T172974: Set $wgArticleCountMethod to 'any' for srwikisource - https://phabricator.wikimedia.org/T172974 [13:27:08] (CR) jerkins-bot: [V:
-1] role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) (owner: Elukey) [13:28:38] (PS1) Marostegui: Revert "db-codfw.php: Depool db2045" [mediawiki-config] - https://gerrit.wikimedia.org/r/371027 [13:28:44] (PS2) Marostegui: Revert "db-codfw.php: Depool db2045" [mediawiki-config] - https://gerrit.wikimedia.org/r/371027 [13:30:05] (CR) Elukey: [C: 1] Allow ANALYTICS_NETWORKS to talk to druid zookeeper cluster [puppet] - https://gerrit.wikimedia.org/r/370865 (https://phabricator.wikimedia.org/T168550) (owner: Ottomata) [13:31:39] (CR) Zfilipin: "Script: https://phabricator.wikimedia.org/T172974#3515358" [mediawiki-config] - https://gerrit.wikimedia.org/r/370991 (https://phabricator.wikimedia.org/T172974) (owner: Urbanecm) [13:31:50] Urbanecm: deployed, ran the script [13:31:58] Great. Thank you :) [13:32:15] Urbanecm: anything else? ;) [13:32:34] No, that's finally all :). Thank you for your SWAT [13:32:44] !log EU SWAT finished [13:32:55] Urbanecm: see you next week :D [13:32:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:33:22] See you :) [13:34:00] Urbanecm: are you going to wikimania? just asking, I am not, so I will be around for deploys [13:34:25] (PS8) Elukey: role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) [13:34:34] No, I'm not [13:35:34] (PS1) Urbanecm: Update wikviersity favicon [mediawiki-config] - https://gerrit.wikimedia.org/r/371029 (https://phabricator.wikimedia.org/T160491) [13:36:14] Urbanecm: wikviersity?
[13:36:34] (PS2) Urbanecm: Update wikiversity favicon [mediawiki-config] - https://gerrit.wikimedia.org/r/371029 (https://phabricator.wikimedia.org/T160491) [13:36:40] Everything but message was correct :D [13:38:26] (CR) Elukey: "pcc https://puppet-compiler.wmflabs.org/compiler02/7388/" [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) (owner: Elukey) [13:39:24] elukey: thanks re: graphite disk space! I'll take a look now [13:40:11] (PS2) Giuseppe Lavagetto: role::mediawiki::memcached: switch to the future parser [puppet] - https://gerrit.wikimedia.org/r/371021 [13:40:49] Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3515429 (Ottomata) > As a reminder, there is still no real support for EventLogging data analysis using Hive/Hadoop. IT IS COMING! :) > ColumnStore Or something. There... [13:41:26] (CR) Mobrovac: [C: -1] role:eventubus: set deploy-service as scap deploy_user (1 comment) [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) (owner: Elukey) [13:43:08] !log Compress cebwiki on db1095 - T153058 [13:43:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:43:19] T153058: LabsDB infrastructure pending work - https://phabricator.wikimedia.org/T153058 [13:44:29] (PS9) Elukey: role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) [13:44:52] (PS3) Giuseppe Lavagetto: role::mediawiki::memcached: switch to the future parser [puppet] - https://gerrit.wikimedia.org/r/371021 [13:45:22] (CR) jerkins-bot: [V: -1] role::mediawiki::memcached: switch to the future parser [puppet] - https://gerrit.wikimedia.org/r/371021 (owner: Giuseppe Lavagetto) [13:48:21] (CR) Mobrovac: [C: 1] role:eventubus: set deploy-service as scap
deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) (owner: Elukey) [13:49:31] (CR) Ottomata: role:eventubus: set deploy-service as scap deploy_user (1 comment) [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) (owner: Elukey) [13:54:30] (PS10) Elukey: role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) [13:57:48] (PS4) Giuseppe Lavagetto: role::mediawiki::memcached: switch to the future parser [puppet] - https://gerrit.wikimedia.org/r/371021 [13:58:13] (CR) jerkins-bot: [V: -1] role::mediawiki::memcached: switch to the future parser [puppet] - https://gerrit.wikimedia.org/r/371021 (owner: Giuseppe Lavagetto) [13:59:58] (PS3) Filippo Giunchedi: mediawiki: clean up deprecated fonts packages [puppet] - https://gerrit.wikimedia.org/r/370969 (https://phabricator.wikimedia.org/T170817) [14:00:00] (PS1) Filippo Giunchedi: thumbor: do not hardcode jessie-backports [puppet] - https://gerrit.wikimedia.org/r/371034 (https://phabricator.wikimedia.org/T170817) [14:04:13] (CR) Elukey: "new pcc https://puppet-compiler.wmflabs.org/compiler02/7392/kafka1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) (owner: Elukey) [14:09:33] (PS2) Giuseppe Lavagetto: confd: fix templates for the future parser.
[puppet] - https://gerrit.wikimedia.org/r/371022 [14:09:35] (PS5) Giuseppe Lavagetto: role::mediawiki::memcached: switch to the future parser [puppet] - https://gerrit.wikimedia.org/r/371021 [14:10:10] (CR) jerkins-bot: [V: -1] role::mediawiki::memcached: switch to the future parser [puppet] - https://gerrit.wikimedia.org/r/371021 (owner: Giuseppe Lavagetto) [14:11:19] !log restart kafka1012 temporary with some logs to TRACE to debug T172681 [14:11:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:30] T172681: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681 [14:17:22] (PS1) Filippo Giunchedi: package_builder: add hooks for stretch [puppet] - https://gerrit.wikimedia.org/r/371039 [14:18:41] Operations, Thumbor: Long running thumbnail requests locking up Thumbor instances - https://phabricator.wikimedia.org/T172930#3513559 (Gilles) I did some load testing yesterday that caused 502s, do you have a list of pages with times?
[14:21:49] (CR) Giuseppe Lavagetto: [C: 1] package_builder: add hooks for stretch (1 comment) [puppet] - https://gerrit.wikimedia.org/r/371039 (owner: Filippo Giunchedi) [14:28:25] (PS2) Filippo Giunchedi: package_builder: add hooks for stretch [puppet] - https://gerrit.wikimedia.org/r/371039 [14:28:53] (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2045" [mediawiki-config] - https://gerrit.wikimedia.org/r/371027 (owner: Marostegui) [14:30:12] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2045" [mediawiki-config] - https://gerrit.wikimedia.org/r/371027 (owner: Marostegui) [14:30:25] (CR) jenkins-bot: Revert "db-codfw.php: Depool db2045" [mediawiki-config] - https://gerrit.wikimedia.org/r/371027 (owner: Marostegui) [14:31:12] (PS2) Rush: Add hiwikiversity to labsrecursor [puppet] - https://gerrit.wikimedia.org/r/369924 (https://phabricator.wikimedia.org/T168765) (owner: Reedy) [14:31:47] (CR) Rush: [C: 2] Add hiwikiversity to labsrecursor [puppet] - https://gerrit.wikimedia.org/r/369924 (https://phabricator.wikimedia.org/T168765) (owner: Reedy) [14:32:09] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2045 - T151029 (duration: 00m 51s) [14:32:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:20] T151029: duplicate key problems - https://phabricator.wikimedia.org/T151029 [14:33:38] (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler02/7396/copper.eqiad.wmnet/" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/371039 (owner: Filippo Giunchedi) [14:33:44] (CR) Filippo Giunchedi: [C: 2] package_builder: add hooks for stretch [puppet] - https://gerrit.wikimedia.org/r/371039 (owner: Filippo Giunchedi) [14:33:55] (PS3) Filippo Giunchedi: package_builder: add hooks for stretch [puppet] - https://gerrit.wikimedia.org/r/371039 [14:37:50] Operations, DBA,
MediaWiki-extensions-WikibaseClient, Wikidata, and 5 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3515701 (Marostegui) @daniel this happened again yesterday evening (Montreal time, and we got som... [14:39:10] Operations, DBA, MediaWiki-extensions-WikibaseClient, Wikidata, and 5 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3515704 (hoo) @Marostegui @daniel: Please keep me in the loop when discussing this. [14:39:20] Operations, OCG-General, Reading-Community-Engagement, Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#3515706 (phuedx) [14:40:01] (PS1) Marostegui: db-codfw.php: Pool db2075 [mediawiki-config] - https://gerrit.wikimedia.org/r/371043 (https://phabricator.wikimedia.org/T170662) [14:42:28] (CR) Marostegui: [C: 2] db-codfw.php: Pool db2075 [mediawiki-config] - https://gerrit.wikimedia.org/r/371043 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui) [14:42:40] (PS5) Rush: openstack: add wikitech-grep as utility for adminscripts [puppet] - https://gerrit.wikimedia.org/r/363896 (https://phabricator.wikimedia.org/T169820) [14:43:32] (CR) Rush: [C: 2] openstack: add wikitech-grep as utility for adminscripts [puppet] - https://gerrit.wikimedia.org/r/363896 (https://phabricator.wikimedia.org/T169820) (owner: Rush) [14:43:58] (Merged) jenkins-bot: db-codfw.php: Pool db2075 [mediawiki-config] - https://gerrit.wikimedia.org/r/371043 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui) [14:44:55] (PS2) Rush: labsdb: maintain-views and maintain_meta-p sock option [puppet] - https://gerrit.wikimedia.org/r/370217 (https://phabricator.wikimedia.org/T172496) [14:45:18] Operations, Thumbor: Long running thumbnail requests
locking up Thumbor instances - https://phabricator.wikimedia.org/T172930#3515768 (fgiunchedi) >>! In T172930#3515622, @Gilles wrote: > Do you have a list of pages with times? How often does it happen organically? Good question, I don't have a list of... [14:45:22] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Pool db2075 - T170662 (duration: 00m 51s) [14:45:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:35] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662 [14:46:27] (CR) jenkins-bot: db-codfw.php: Pool db2075 [mediawiki-config] - https://gerrit.wikimedia.org/r/371043 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui) [14:48:31] (CR) Rush: [C: 2] labsdb: maintain-views and maintain_meta-p sock option [puppet] - https://gerrit.wikimedia.org/r/370217 (https://phabricator.wikimedia.org/T172496) (owner: Rush) [14:49:40] (Abandoned) Rush: diamond: monitor nscd behavior for ldap clients [puppet] - https://gerrit.wikimedia.org/r/265847 (owner: Rush) [14:53:18] PROBLEM - Host mw2256.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:55:10] this one is me and papaul, applying thermal paste, forgot to silence [14:55:11] (PS1) Marostegui: db-codfw.php: Depool db2046 [mediawiki-config] - https://gerrit.wikimedia.org/r/371050 (https://phabricator.wikimedia.org/T151029) [14:55:21] (the mgmt) [14:58:06] (PS1) Filippo Giunchedi: package_builder: require gnupg too for apt.wikimedia.org hook [puppet] - https://gerrit.wikimedia.org/r/371051 [14:59:22] (CR) Marostegui: [C: 2] db-codfw.php: Depool db2046 [mediawiki-config] - https://gerrit.wikimedia.org/r/371050 (https://phabricator.wikimedia.org/T151029) (owner: Marostegui) [14:59:46] (CR) Filippo Giunchedi: [C: 2] package_builder: require gnupg too for apt.wikimedia.org hook [puppet] - https://gerrit.wikimedia.org/r/371051 (owner: Filippo Giunchedi) [14:59:55]
(PS2) Filippo Giunchedi: package_builder: require gnupg too for apt.wikimedia.org hook [puppet] - https://gerrit.wikimedia.org/r/371051 [15:00:00] (CR) Filippo Giunchedi: [V: 2 C: 2] package_builder: require gnupg too for apt.wikimedia.org hook [puppet] - https://gerrit.wikimedia.org/r/371051 (owner: Filippo Giunchedi) [15:03:26] !log Poweroff es2013 for maintenance - T172265 [15:03:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:38] T172265: es2013 faulty BBU - https://phabricator.wikimedia.org/T172265 [15:03:46] (PS6) Rush: Add a default Apache 2.0 license [puppet] - https://gerrit.wikimedia.org/r/183862 (https://phabricator.wikimedia.org/T67270) [15:03:47] RECOVERY - Host mw2256.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.94 ms [15:04:04] (Merged) jenkins-bot: db-codfw.php: Depool db2046 [mediawiki-config] - https://gerrit.wikimedia.org/r/371050 (https://phabricator.wikimedia.org/T151029) (owner: Marostegui) [15:04:24] (Abandoned) Rush: maintain-dbusers: cleanup one-time legacy functions [puppet] - https://gerrit.wikimedia.org/r/355103 (owner: Rush) [15:05:21] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2046 - T151029 (duration: 00m 50s) [15:05:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:33] T151029: duplicate key problems - https://phabricator.wikimedia.org/T151029 [15:06:18] (CR) jenkins-bot: db-codfw.php: Depool db2046 [mediawiki-config] - https://gerrit.wikimedia.org/r/371050 (https://phabricator.wikimedia.org/T151029) (owner: Marostegui) [15:06:45] Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3515845 (Halfak) As I see it, the outcome is to phase out the use of the multisource database commonly referred to as analytics-store. This involved determining what the m...
[15:09:15] Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3515849 (Halfak) @jcrespo, I thought it was clear from your comments in IRC that you'd like to begin phasing out the singular big multisource database host. I created this... [15:09:44] !log Stop replication on db2046 to fix duplicate entries - T151029 [15:09:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:10:21] Operations, Thumbor, Patch-For-Review, Performance-Team (Radar): Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817#3515856 (fgiunchedi) I started playing with thumbor on stretch and building the package on copper yields an error with pillow 4 whereas thumbor wants pil... [15:12:49] Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3515874 (jcrespo) > Phase out and replace analytics-store / As I see it, the outcome is to phase out the use of the multisource database (many databases on a single host)... [15:15:25] Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3515883 (Ottomata) > , there are other tickets about eventlogging in hadoop so let's please keep that conversation on those. +1. Just to be clear T162610 is about EventLog... [15:16:17] (PS1) Rush: wikireplica: add cnames for wikireplica services [dns] - https://gerrit.wikimedia.org/r/371055 (https://phabricator.wikimedia.org/T166404) [15:16:28] Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3515888 (jcrespo) Also, please stop T172410#3507261 re-adding me to this ticket- when I explicitly asked to be left aside and have nothing to do with whatever you want to do.
[15:16:58] PROBLEM - Host es2013.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:19:29] Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3515898 (Ottomata) @jcrespo, earlier in this ticket, you said: > we are going to deprecate multi-source replication, so instead of one big fat server that is unmaintained,... [15:21:27] (CR) BryanDavis: [C: 1] wikireplica: add cnames for wikireplica services [dns] - https://gerrit.wikimedia.org/r/371055 (https://phabricator.wikimedia.org/T166404) (owner: Rush) [15:21:47] (Abandoned) Thcipriani: Scap: add beta canary_dashboard_url config value [puppet] - https://gerrit.wikimedia.org/r/353179 (https://phabricator.wikimedia.org/T164981) (owner: Thcipriani) [15:27:27] RECOVERY - Host es2013.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.72 ms [15:27:39] Operations, Ops-Access-Requests, Research, Patch-For-Review: Access for new Research Scientist: Diego Saez - https://phabricator.wikimedia.org/T172891#3515923 (RobH) [15:29:41] (PS2) Thcipriani: Beta: Add prometheus/jmx_exporter to scap::sources [puppet] - https://gerrit.wikimedia.org/r/337038 [15:30:45] Operations, ops-codfw, DBA, Patch-For-Review: es2013 faulty BBU - https://phabricator.wikimedia.org/T172265#3515925 (Papaul) a:Papaul>Marostegui Raid controller replacement complete [15:31:58] (PS11) Rush: tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235) [15:32:39] (CR) jerkins-bot: [V: -1] tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235) (owner: Rush) [15:35:56] (PS12) Rush: tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235) [15:37:40] (PS13) Rush: tools: job to
copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235) [15:37:58] Operations, ops-codfw, DBA, Patch-For-Review: es2013 faulty BBU - https://phabricator.wikimedia.org/T172265#3515977 (Marostegui) Thanks @Papaul Everything looks good so far, it is re-charging. I have disabled BBU auto-learn. The BBU itself looks good, and so does the storage ``` Charger Status:... [15:39:32] (CR) MarkTraceur: [C: 1] Upgrade to 1.2 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/370907 (https://phabricator.wikimedia.org/T161719) (owner: Gilles) [15:40:17] PROBLEM - Check Varnish expiry mailbox lag on cp1049 is CRITICAL: CRITICAL: expiry mailbox lag is 2066862 [15:40:19] (PS14) Rush: tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235) [15:40:53] (PS6) Mark Bergsma: WIP: add prometheus metrics [debs/pybal] - https://gerrit.wikimedia.org/r/370962 (https://phabricator.wikimedia.org/T171710) (owner: Ema) [15:40:56] (PS1) Mark Bergsma: Add generic monitor metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371060 (https://phabricator.wikimedia.org/T171710) [15:40:58] (PS1) Mark Bergsma: Add monitoring specific metric to ProxyFetch [debs/pybal] - https://gerrit.wikimedia.org/r/371061 (https://phabricator.wikimedia.org/T171710) [15:41:36] (PS16) Rush: tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235) [15:43:18] (PS3) Ottomata: Allow ANALYTICS_NETWORKS to talk to druid zookeeper cluster [puppet] - https://gerrit.wikimedia.org/r/370865 (https://phabricator.wikimedia.org/T168550) [15:44:27] Operations, Analytics-Kanban, Patch-For-Review, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3516024 (elukey)
Example of more verbose logging from kafka1012: ``` [2017-08-10 14:16:01,259] DEBUG Processor 0 liste... [15:44:56] (CR) Ottomata: [C: 2] Allow ANALYTICS_NETWORKS to talk to druid zookeeper cluster [puppet] - https://gerrit.wikimedia.org/r/370865 (https://phabricator.wikimedia.org/T168550) (owner: Ottomata) [15:46:23] Operations, ops-codfw, DBA, Patch-For-Review: es2013 faulty BBU - https://phabricator.wikimedia.org/T172265#3516028 (Marostegui) ``` Charger Status: Complete ``` [15:46:24] (PS7) Ema: Add prometheus metrics [debs/pybal] - https://gerrit.wikimedia.org/r/370962 (https://phabricator.wikimedia.org/T171710) [15:47:47] (PS1) Ottomata: Remove comma in ferm rule [puppet] - https://gerrit.wikimedia.org/r/371064 [15:48:10] Operations, ops-codfw, DBA, Patch-For-Review: es2013 faulty BBU - https://phabricator.wikimedia.org/T172265#3516033 (Marostegui) Open>Resolved I have started MySQL and everything else looks fine. So I am going to close this as resolved and I hope that we do not have to re-open it :-) [15:48:11] !log troubleshoting interface errors between pfw3-codfw and fasw-codfw [15:48:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:48:23] Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3516036 (Nuria) Resolved>Open [15:49:17] PROBLEM - Check systemd state on druid1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [15:54:00] (PS1) Giuseppe Lavagetto: Rakefile: run syntax checks in sequence [puppet] - https://gerrit.wikimedia.org/r/371066 [15:54:33] (CR) Giuseppe Lavagetto: [C: 2] Rakefile: run syntax checks in sequence [puppet] - https://gerrit.wikimedia.org/r/371066 (owner: Giuseppe Lavagetto) [15:55:14] (PS3) Giuseppe Lavagetto: confd: fix templates for the future parser.
[puppet] - https://gerrit.wikimedia.org/r/371022 [15:55:17] sorry ^^^ is me [15:55:19] and in meeting so am slow fixing [15:55:22] (CR) Ottomata: [C: 2] Remove comma in ferm rule [puppet] - https://gerrit.wikimedia.org/r/371064 (owner: Ottomata) [15:55:28] (PS2) Ottomata: Remove comma in ferm rule [puppet] - https://gerrit.wikimedia.org/r/371064 [15:55:31] (CR) Ottomata: [V: 2 C: 2] Remove comma in ferm rule [puppet] - https://gerrit.wikimedia.org/r/371064 (owner: Ottomata) [15:55:51] (PS6) Giuseppe Lavagetto: role::mediawiki::memcached: switch to the future parser [puppet] - https://gerrit.wikimedia.org/r/371021 [15:56:04] (PS1) Thiemo Mättig (WMDE): remove unused injectrecentChanges option [mediawiki-config] - https://gerrit.wikimedia.org/r/371067 [15:56:15] <_joe_> elukey: ottomata can I merge your changes as well? [15:56:34] ya [15:56:36] was about to do yours [15:56:36] <_joe_> err just ottomata really [15:56:39] _joe_: please go ehead [15:56:40] <_joe_> go on then [15:56:43] <_joe_> ahahahah [15:56:46] ok doing [15:56:48] PROBLEM - Router interfaces on pfw3-codfw is CRITICAL: CRITICAL: host 208.80.153.197, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0 [15:57:51] (PS1) Thiemo Mättig (WMDE): Remove pointless showExternalRecentChanges option. [mediawiki-config] - https://gerrit.wikimedia.org/r/371069 [15:58:16] (CR) Rush: [C: 2] wikireplica: add cnames for wikireplica services [dns] - https://gerrit.wikimedia.org/r/371055 (https://phabricator.wikimedia.org/T166404) (owner: Rush) [15:58:57] RECOVERY - Router interfaces on pfw3-codfw is OK: OK: host 208.80.153.197, interfaces up: 65, down: 0, dormant: 0, excluded: 0, unused: 0 [15:59:07] PROBLEM - Check systemd state on druid1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:59:17] RECOVERY - Check systemd state on druid1001 is OK: OK - running: The system is fully operational [15:59:28] Operations, Traffic, Community-Liaisons (Jul-Sep 2017), User-Johan: Get translations for "IE8 on XP won't work" page - https://phabricator.wikimedia.org/T172418#3516071 (Elitre) [16:00:04] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170810T1600). [16:00:04] thcipriani: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [16:00:08] Operations, Thumbor, Patch-For-Review, Performance-Team (Radar): Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817#3516072 (fgiunchedi) TODO: * Update thumbor package to latest upstream (fixes pillow dep and all fixes from @gilles have been merged upstream) I also co... [16:00:14] * thcipriani *waves* [16:01:08] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [16:01:44] my couple of patches for puppet SWAT are for CI-only and Beta-only and are already cherry-picked on both, just trying to minimize delta/close out tasks. [16:02:02] thcipriani: ok! I'll take a look shortly [16:02:09] thanks :)( [16:02:20] er..s/\(// [16:03:40] (PS3) Filippo Giunchedi: Beta: Add prometheus/jmx_exporter to scap::sources [puppet] - https://gerrit.wikimedia.org/r/337038 (owner: Thcipriani) [16:04:03] (PS8) Ema: Add prometheus metrics [debs/pybal] - https://gerrit.wikimedia.org/r/370962 (https://phabricator.wikimedia.org/T171710) [16:04:12] Operations, Traffic, Community-Liaisons (Jul-Sep 2017), User-Johan: Get translations for "IE8 on XP won't work" page - https://phabricator.wikimedia.org/T172418#3516095 (Elitre) Does the page exist already?
[16:04:27] (PS2) Ema: Add generic monitor metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371060 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [16:04:38] (PS2) Ema: Add monitoring specific metric to ProxyFetch [debs/pybal] - https://gerrit.wikimedia.org/r/371061 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [16:05:09] (CR) Filippo Giunchedi: [C: 2] Beta: Add prometheus/jmx_exporter to scap::sources [puppet] - https://gerrit.wikimedia.org/r/337038 (owner: Thcipriani) [16:07:32] Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3516106 (Nuria) How did this added jaime again? Sorry. [16:10:07] PROBLEM - Router interfaces on pfw3-codfw is CRITICAL: CRITICAL: host 208.80.153.197, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0 [16:11:00] (CR) Filippo Giunchedi: CI/integration: Create role for docker CI agent (1 comment) [puppet] - https://gerrit.wikimedia.org/r/365416 (https://phabricator.wikimedia.org/T150502) (owner: Thcipriani) [16:11:14] thcipriani: ^ a nit but LGTM otherwise [16:11:42] godog: ah, neat [16:11:44] * thcipriani fixes [16:13:07] RECOVERY - Router interfaces on pfw3-codfw is OK: OK: host 208.80.153.197, interfaces up: 65, down: 0, dormant: 0, excluded: 0, unused: 0 [16:14:17] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [16:15:23] (PS10) Thcipriani: CI/integration: Create role for docker CI agent [puppet] - https://gerrit.wikimedia.org/r/365416 (https://phabricator.wikimedia.org/T150502) [16:15:42] ^ godog seems like it works [16:18:05] (CR) Filippo Giunchedi: [C: 2] CI/integration: Create role for docker CI agent [puppet] - https://gerrit.wikimedia.org/r/365416 (https://phabricator.wikimedia.org/T150502) (owner: Thcipriani) [16:18:31] (CR) Ema: "recheck" [debs/pybal] -
https://gerrit.wikimedia.org/r/370962 (https://phabricator.wikimedia.org/T171710) (owner: Ema) [16:18:37] thcipriani: sweet [16:18:40] thcipriani: merged [16:18:46] godog: \o/ [16:18:50] thank you! [16:19:35] I'll make sure beta and ci puppetmaster are up-to-date. Thanks again! [16:21:56] (PS1) Jcrespo: dbstore_multiinstance: All hosts other than dbstore2002 will have 8 instances [puppet] - https://gerrit.wikimedia.org/r/371073 (https://phabricator.wikimedia.org/T168409) [16:23:17] !log cp1072: restart varnish backend [16:23:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:24:22] (CR) Ottomata: [C: 1] role:eventubus: set deploy-service as scap deploy_user [puppet] - https://gerrit.wikimedia.org/r/371014 (https://phabricator.wikimedia.org/T171506) (owner: Elukey) [16:26:37] RECOVERY - Check Varnish expiry mailbox lag on cp1072 is OK: OK: expiry mailbox lag is 0 [16:27:17] RECOVERY - Check systemd state on druid1003 is OK: OK - running: The system is fully operational [16:27:58] Operations, Traffic, Community-Liaisons (Jul-Sep 2017), User-Johan: Get translations for "IE8 on XP won't work" page - https://phabricator.wikimedia.org/T172418#3516259 (Whatamidoing-WMF) Yes: https://en.wikipedia.org/test-sec-warning However: I believe that there was some interest in re-visit... [16:31:18] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [16:31:57] Operations, Traffic, Community-Liaisons (Jul-Sep 2017), User-Johan: Get translations for "IE8 on XP won't work" - https://phabricator.wikimedia.org/T172418#3516279 (Johan) [16:32:15] Operations, Traffic, Community-Liaisons (Jul-Sep 2017), User-Johan: Get translations for "IE8 on XP won't work" - https://phabricator.wikimedia.org/T172418#3497985 (Johan) This isn't for a page, it's really just for two sentences to use on a page.
[16:33:08] PROBLEM - HHVM jobrunner on mw1259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:04] (PS1) Gehel: rename "r" module to "r_lang" [puppet] - https://gerrit.wikimedia.org/r/371075 [16:35:18] PROBLEM - HHVM jobrunner on mw1169 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:35] (CR) jerkins-bot: [V: -1] rename "r" module to "r_lang" [puppet] - https://gerrit.wikimedia.org/r/371075 (owner: Gehel) [16:35:40] <_joe_> /win 27 [16:36:08] RECOVERY - HHVM jobrunner on mw1169 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time [16:37:37] PROBLEM - HHVM jobrunner on mw1168 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:37:52] (PS2) Gehel: rename "r" module to "r_lang" [puppet] - https://gerrit.wikimedia.org/r/371075 [16:40:27] PROBLEM - HHVM jobrunner on mw1169 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:41:17] RECOVERY - HHVM jobrunner on mw1169 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time [16:41:47] (PS1) Giuseppe Lavagetto: base::service_unit: deprecate autolookup of templates [puppet] - https://gerrit.wikimedia.org/r/371076 (https://phabricator.wikimedia.org/T171704) [16:45:24] <_joe_> gehel: I'd like your opinion on ^^ [16:45:27] PROBLEM - HHVM jobrunner on mw1169 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:45:45] _joe_: I'm honored :) [16:46:11] <_joe_> context is allowing to fix base::service_unit declarations easily for the future parser [16:46:37] <_joe_> my idea is to do something like this https://wikitech.wikimedia.org/wiki/User:Giuseppe_Lavagetto/PuppetFutureParser#Variable_scope_is_now_respected_in_templates [16:48:31] <_joe_> anyways, I'll go away for a few, will be back later [16:48:37] Operations, Analytics-Kanban, Patch-For-Review, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3516339 (elukey) >>!
In T172681#3511453, @elukey wrote: > > {F8994394} > Interesting finding: https://gerrit.wikimedi... [16:49:31] _joe_: I don't like much the parameters being boolean or string... (but that's probably my strongly typed background speaking) [16:53:37] PROBLEM - mediawiki-installation DSH group on mw2256 is CRITICAL: Host mw2256 is not in mediawiki-installation dsh group [16:53:56] ok this is me --^ [16:54:03] (CR) Gehel: base::service_unit: deprecate autolookup of templates (2 comments) [puppet] - https://gerrit.wikimedia.org/r/371076 (https://phabricator.wikimedia.org/T171704) (owner: Giuseppe Lavagetto) [16:54:04] I've set it inactive [16:58:48] PROBLEM - BGP status on pfw3-codfw is CRITICAL: BGP CRITICAL - The requested table is empty or does not exist [16:59:48] RECOVERY - BGP status on pfw3-codfw is OK: BGP OK - up: 3, down: 2, shutdown: 0 [17:00:04] gwicke, cscott, arlolra, subbu, halfak, and Amir1: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170810T1700). Please do the needful. [17:00:17] Nothing for ORES, ty! [17:02:58] \o/ [17:03:27] (CR) Marostegui: [C: +1] dbstore_multiinstance: All hosts other than dbstore2002 will have 8 instances [puppet] - https://gerrit.wikimedia.org/r/371073 (https://phabricator.wikimedia.org/T168409) (owner: Jcrespo) [17:05:37] RECOVERY - HHVM jobrunner on mw1169 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.011 second response time [17:07:48] RECOVERY - HHVM jobrunner on mw1259 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 8.900 second response time [17:07:58] RECOVERY - HHVM jobrunner on mw1168 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time [17:11:12] nothing for parsoid [17:23:27] (CR) Gergő Tisza: "You'll probably want to add bureaucrats->oauthadmin to wgAdd/RemoveGroups as well."
[mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301) (owner: MarcoAurelio) [17:24:28] tgr: hmm, not sure about that. Not all the wikis have local OAuth tables [17:25:05] I mean, Meta is the central place for OAuth at Wikimedia [17:25:29] other wikis do not manage that, right? [17:25:56] the patch is about adding OAuth support to a non-SUL wiki [17:25:57] meh, sorry tgr -- I was confused with another patch [17:26:05] indeed you're right [17:26:24] will fix that after I finish with some work I've got pending :) [17:38:12] (PS5) HakanIST: mariadb/phabricator: update GRANTS from iridium to phab1001 [puppet] - https://gerrit.wikimedia.org/r/369832 (https://phabricator.wikimedia.org/T163938) (owner: Dzahn) [17:39:37] Operations, Traffic, netops: eqiad row D switch upgrade - https://phabricator.wikimedia.org/T172459#3516504 (ayounsi) [17:40:50] (PS3) MarcoAurelio: Enabling OAuth on foundationwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301) [17:41:22] (CR) MarcoAurelio: "> You'll probably want to add bureaucrats->oauthadmin to" [mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301) (owner: MarcoAurelio) [17:41:44] (PS4) MarcoAurelio: Enabling OAuth on foundationwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301) [17:42:13] !log kartik@tin Started deploy [cxserver/deploy@1065ffe]: Update cxserver to 686f4f3 [17:42:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:42:53] !log kartik@tin Finished deploy [cxserver/deploy@1065ffe]: Update cxserver to 686f4f3 (duration: 00m 40s) [17:43:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:44:47] PROBLEM - cxserver endpoints health on scb2001 is CRITICAL: /v1/translate/{from}/{to}{/provider} (Machine translate an HTML
fragment using Apertium, adapt the links to target language wiki.) is CRITICAL: Could not fetch url http://10.192.32.132:8080/v1/translate/en/es/Apertium: Generic connection error: HTTPConnectionPool(host=u10.192.32.132, port=8080): Max retries exceeded with url: /v1/translate/en/es/Apertium (Caused by Pro [17:44:47] on aborted., BadStatusLine(,))) [17:46:58] (PS8) Freddy2001: gerrit: let Apache proxy only listen on service IP [puppet] - https://gerrit.wikimedia.org/r/354078 (owner: Dzahn) [17:47:24] (Draft2) MarcoAurelio: Cloud VPS configuration for hi.wikivoyage [puppet] - https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) [17:47:46] (CR) jerkins-bot: [V: -1] Cloud VPS configuration for hi.wikivoyage [puppet] - https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) (owner: MarcoAurelio) [17:48:23] (CR) MarcoAurelio: "recheck" [puppet] - https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) (owner: MarcoAurelio) [17:48:48] (CR) jerkins-bot: [V: -1] Cloud VPS configuration for hi.wikivoyage [puppet] - https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) (owner: MarcoAurelio) [17:49:39] (PS3) MarcoAurelio: Cloud VPS configuration for hi.wikivoyage [puppet] - https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) [17:50:17] RECOVERY - Check Varnish expiry mailbox lag on cp1049 is OK: OK: expiry mailbox lag is 17488 [17:52:38] !log kartik@tin Started deploy [cxserver/deploy@f43ef96]: (no justification provided) [17:52:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:52:57] RECOVERY - cxserver endpoints health on scb2001 is OK: All endpoints are healthy [17:53:22] !log kartik@tin Finished deploy [cxserver/deploy@f43ef96]: (no justification provided) (duration: 00m 44s) [17:53:32] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:58:56] (PS1) Mark Bergsma: Handle failing TCP socket options in IdleConnection [debs/pybal] - https://gerrit.wikimedia.org/r/371102 [18:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170810T1800). Please do the needful. [18:00:04] RoanKattouw and Jdlrobson: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [18:00:13] \o [18:00:31] I can do [18:01:15] (CR) Mark Bergsma: [C: +1] Add prometheus metrics [debs/pybal] - https://gerrit.wikimedia.org/r/370962 (https://phabricator.wikimedia.org/T171710) (owner: Ema) [18:01:25] (CR) Ema: [C: +2] Add prometheus metrics [debs/pybal] - https://gerrit.wikimedia.org/r/370962 (https://phabricator.wikimedia.org/T171710) (owner: Ema) [18:01:53] (PS2) Chad: Correct config - commonswiki not commons [mediawiki-config] - https://gerrit.wikimedia.org/r/370924 (https://phabricator.wikimedia.org/T170687) (owner: Jdlrobson) [18:02:04] \o [18:04:23] (PS3) Mark Bergsma: Add generic monitor metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371060 (https://phabricator.wikimedia.org/T171710) [18:04:25] (PS3) Mark Bergsma: Add monitoring specific metric to ProxyFetch [debs/pybal] - https://gerrit.wikimedia.org/r/371061 (https://phabricator.wikimedia.org/T171710) [18:04:27] (PS1) Mark Bergsma: Add monitoring specific metric to DNSQuery [debs/pybal] - https://gerrit.wikimedia.org/r/371104 (https://phabricator.wikimedia.org/T171710) [18:04:29] (PS1) Mark Bergsma: Add monitoring specific metric to RunCommand [debs/pybal] - https://gerrit.wikimedia.org/r/371105 (https://phabricator.wikimedia.org/T171710) [18:08:00] jdlrobson, RoanKattouw: Gonna take a
bit, looks like somebody just did a bunch of +2s to core :) [18:08:05] (CR) Chad: [C: +2] Correct config - commonswiki not commons [mediawiki-config] - https://gerrit.wikimedia.org/r/370924 (https://phabricator.wikimedia.org/T170687) (owner: Jdlrobson) [18:08:15] (PS4) Mark Bergsma: Add generic monitor metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371060 (https://phabricator.wikimedia.org/T171710) [18:08:17] (PS4) Mark Bergsma: Add monitoring specific metric to ProxyFetch [debs/pybal] - https://gerrit.wikimedia.org/r/371061 (https://phabricator.wikimedia.org/T171710) [18:08:19] (PS2) Mark Bergsma: Add monitoring specific metric to DNSQuery [debs/pybal] - https://gerrit.wikimedia.org/r/371104 (https://phabricator.wikimedia.org/T171710) [18:08:21] (PS2) Mark Bergsma: Add monitoring specific metric to RunCommand [debs/pybal] - https://gerrit.wikimedia.org/r/371105 (https://phabricator.wikimedia.org/T171710) [18:10:03] (CR) Mark Bergsma: [C: +1] Add generic monitor metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371060 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [18:10:24] (CR) Ema: [V: +2 C: +2] Add generic monitor metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371060 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [18:10:36] RainbowSprinkles: k - i'll be around waiting for the ping [18:11:41] RoanKattouw: RainbowSprinkles re https://gerrit.wikimedia.org/r/#/c/368330/ one note wikidatawiki is not on wmf.12 yet, saw the quote in the commit "Now that wmf.12 is deployed everywhere." [18:11:52] Oh still?
[18:11:54] Yeah [18:11:56] (CR) Ema: [V: +2 C: +2] Add monitoring specific metric to ProxyFetch [debs/pybal] - https://gerrit.wikimedia.org/r/371061 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [18:11:58] I just noticed that [18:12:02] Then I need to postpone mine [18:12:41] (CR) Mark Bergsma: [C: +1] Add monitoring specific metric to DNSQuery [debs/pybal] - https://gerrit.wikimedia.org/r/371104 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [18:13:41] (CR) Mark Bergsma: [C: +1] Add monitoring specific metric to RunCommand [debs/pybal] - https://gerrit.wikimedia.org/r/371105 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [18:15:00] (Merged) jenkins-bot: Correct config - commonswiki not commons [mediawiki-config] - https://gerrit.wikimedia.org/r/370924 (https://phabricator.wikimedia.org/T170687) (owner: Jdlrobson) [18:16:08] (PS1) Mark Bergsma: Add monitoring specific metric to IdleConnection [debs/pybal] - https://gerrit.wikimedia.org/r/371110 (https://phabricator.wikimedia.org/T171710) [18:16:57] (CR) Ema: [V: +2 C: +2] Add monitoring specific metric to DNSQuery [debs/pybal] - https://gerrit.wikimedia.org/r/371104 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [18:19:31] (CR) Ema: [V: +2 C: +2] Add monitoring specific metric to RunCommand [debs/pybal] - https://gerrit.wikimedia.org/r/371105 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [18:20:17] (CR) jenkins-bot: Correct config - commonswiki not commons [mediawiki-config] - https://gerrit.wikimedia.org/r/370924 (https://phabricator.wikimedia.org/T170687) (owner: Jdlrobson) [18:21:54] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: Fix commons/commonswiki snafu (duration: 00m 52s) [18:22:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:04] jdlrobson: You're live everywhere (too trivial for mwdebug
imo) [18:22:15] RoanKattouw: Sorry, try again next time :\ [18:22:20] [18:23:07] RainbowSprinkles: k [18:23:20] RainbowSprinkles: looks good! [18:23:21] thanks! [18:24:30] (CR) Filippo Giunchedi: [C: +1] Upgrade to 1.2 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/370907 (https://phabricator.wikimedia.org/T161719) (owner: Gilles) [18:27:14] Operations, MediaWiki-extensions-Scribunto: Build and push a new hhvm-luasandbox package - https://phabricator.wikimedia.org/T171166#3456205 (eranroz) Should this task be unbreak now, as it is blocker/possible root cause for T170039 which generate thousands of errors? Anyway, what is the current statu... [18:29:48] ;Reedy want to make the newsletter tables on test? [18:31:51] (PS2) Ema: Handle failing TCP socket options in IdleConnection [debs/pybal] - https://gerrit.wikimedia.org/r/371102 (owner: Mark Bergsma) [18:32:14] (CR) Ema: [C: +2] Handle failing TCP socket options in IdleConnection [debs/pybal] - https://gerrit.wikimedia.org/r/371102 (owner: Mark Bergsma) [18:32:20] (CR) Ema: [V: +2 C: +2] Handle failing TCP socket options in IdleConnection [debs/pybal] - https://gerrit.wikimedia.org/r/371102 (owner: Mark Bergsma) [18:45:23] test or beta cluster?
[18:46:44] Operations, DBA, MediaWiki-extensions-WikibaseClient, Wikidata, and 5 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3516822 (mmodell) [18:48:32] addshore: Sounds like effort [18:48:36] Need to backport a few things [18:48:41] Reedy: you sound like effort ;) [18:48:45] MAKE ME [18:49:29] (Draft2) MarcoAurelio: Initial configuration for hi.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) [18:49:34] (Draft1) MarcoAurelio: Initial configuration for hi.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) [18:49:41] (PS3) MarcoAurelio: Initial configuration for hi.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) [18:50:12] (CR) MarcoAurelio: "I'll upload the logos in a new patch. I need to optimize them, etc."
[mediawiki-config] - https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) (owner: MarcoAurelio) [18:51:35] (PS1) Mark Bergsma: Add monitoring specific metric to IdleConnection [debs/pybal] - https://gerrit.wikimedia.org/r/371117 (https://phabricator.wikimedia.org/T171710) [18:52:08] (CR) jerkins-bot: [V: -1] Initial configuration for hi.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) (owner: MarcoAurelio) [18:52:14] (Abandoned) Mark Bergsma: Add monitoring specific metric to IdleConnection [debs/pybal] - https://gerrit.wikimedia.org/r/371110 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [18:52:38] Operations, DBA, media-storage, monitoring: icinga hp raid check timeout on busy ms-be and db machines - https://phabricator.wikimedia.org/T141252#3516853 (herron) [18:52:41] Operations, monitoring, Patch-For-Review: Nrpe command_timeout and "Service Check Timed Out" errors - https://phabricator.wikimedia.org/T172921#3516851 (herron) [18:53:14] (CR) Filippo Giunchedi: [V: +2 C: +2] Add the Scap3 configuration [software/logstash-logback-encoder] - https://gerrit.wikimedia.org/r/366466 (https://phabricator.wikimedia.org/T116340) (owner: Mobrovac) [18:53:15] !log reedy@tin Synchronized php-1.30.0-wmf.13/extensions/WikimediaMaintenance/createExtensionTables.php: newsletter (duration: 00m 52s) [18:53:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:55:37] !log reedy@tin Synchronized php-1.30.0-wmf.13/extensions/Newsletter/: sql file updates (duration: 00m 52s) [18:55:42] (PS4) MarcoAurelio: Initial configuration for hi.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) [18:55:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:56:16] !log created newsletter tables on testwiki
[18:56:18] addshore: beer me [18:56:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:05] twentyafterfour: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170810T1900). [19:02:12] (PS3) Mobrovac: Cassandra: Switch logback-encoder to Scap3 [puppet] - https://gerrit.wikimedia.org/r/366473 (https://phabricator.wikimedia.org/T116340) [19:06:39] * TabbyCat pint-beers Reedy [19:07:41] (CR) Filippo Giunchedi: [C: +2] Cassandra: Switch logback-encoder to Scap3 [puppet] - https://gerrit.wikimedia.org/r/366473 (https://phabricator.wikimedia.org/T116340) (owner: Mobrovac) [19:08:36] (PS4) Reedy: Enable Newsletter on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/362394 (https://phabricator.wikimedia.org/T110170) (owner: Addshore) [19:12:16] (CR) jerkins-bot: [V: -1] Enable Newsletter on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/362394 (https://phabricator.wikimedia.org/T110170) (owner: Addshore) [19:12:24] gj addshore [19:12:47] 19:12:15 19334 | ERROR | [x] Tabs must be used to indent lines; spaces are not [19:12:47] 19:12:15 | | allowed [19:12:47] 19:12:15 | | (Generic.WhiteSpace.DisallowSpaceIndent.SpacesUsed) [19:12:50] addshore: You're the worst [19:13:11] !log mobrovac@tin Started deploy [cassandra/logstash-logback-encoder@d085ffa] (staging): first Scap3 deployment - T116340 [19:13:14] !log mobrovac@tin Finished deploy [cassandra/logstash-logback-encoder@d085ffa] (staging): first Scap3 deployment - T116340 (duration: 00m 03s) [19:13:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:13:23] T116340: Deploy logstash logback encoder with scap3 - https://phabricator.wikimedia.org/T116340 [19:13:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:20] (PS5) Reedy: Enable Newsletter on testwiki [mediawiki-config] -
https://gerrit.wikimedia.org/r/362394 (https://phabricator.wikimedia.org/T110170) (owner: Addshore) [19:16:14] !log restart cassandra instances on xenon to test logstash-logback-encoder deploy [19:16:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:19:43] (CR) Reedy: [C: +2] Enable Newsletter on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/362394 (https://phabricator.wikimedia.org/T110170) (owner: Addshore) [19:21:42] (Merged) jenkins-bot: Enable Newsletter on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/362394 (https://phabricator.wikimedia.org/T110170) (owner: Addshore) [19:21:50] (PS1) 20after4: All wikis (except wikidata) to 1.30.0-wmf.13 [mediawiki-config] - https://gerrit.wikimedia.org/r/371119 (https://phabricator.wikimedia.org/T170631) [19:22:13] (CR) 20after4: [C: +2] All wikis (except wikidata) to 1.30.0-wmf.13 [mediawiki-config] - https://gerrit.wikimedia.org/r/371119 (https://phabricator.wikimedia.org/T170631) (owner: 20after4) [19:23:49] (Merged) jenkins-bot: All wikis (except wikidata) to 1.30.0-wmf.13 [mediawiki-config] - https://gerrit.wikimedia.org/r/371119 (https://phabricator.wikimedia.org/T170631) (owner: 20after4) [19:27:44] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: All wikis (except wikidata) to 1.30.0-wmf.13 refs T170631 [19:27:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:27:55] T170631: 1.30.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T170631 [19:28:27] !log mobrovac@tin Started deploy [cassandra/logstash-logback-encoder@d085ffa] (staging): first Scap3 deployment (rest of the nodes) - T116340 [19:28:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:28:38] T116340: Deploy logstash logback encoder with scap3 - https://phabricator.wikimedia.org/T116340 [19:28:39] !log mobrovac@tin Finished deploy
[cassandra/logstash-logback-encoder@d085ffa] (staging): first Scap3 deployment (rest of the nodes) - T116340 (duration: 00m 12s) [19:28:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:29:58] !log mobrovac@tin Started deploy [cassandra/logstash-logback-encoder@d085ffa]: first Scap3 deployment - T116340 [19:30:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:30:33] !log mobrovac@tin Finished deploy [cassandra/logstash-logback-encoder@d085ffa]: first Scap3 deployment - T116340 (duration: 00m 34s) [19:30:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:30:49] !log mobrovac@tin Started deploy [cassandra/logstash-logback-encoder@d085ffa] (aqs): first Scap3 deployment - T116340 [19:31:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:31:58] !log mobrovac@tin Finished deploy [cassandra/logstash-logback-encoder@d085ffa] (aqs): first Scap3 deployment - T116340 (duration: 01m 08s) [19:32:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:32:28] (CR) Filippo Giunchedi: [V: +2 C: +2] Add the Scap configuration [software/cassandra-metrics-collector] - https://gerrit.wikimedia.org/r/366404 (https://phabricator.wikimedia.org/T137371) (owner: Mobrovac) [19:33:13] (CR) jenkins-bot: Enable Newsletter on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/362394 (https://phabricator.wikimedia.org/T110170) (owner: Addshore) [19:33:15] (CR) jenkins-bot: All wikis (except wikidata) to 1.30.0-wmf.13 [mediawiki-config] - https://gerrit.wikimedia.org/r/371119 (https://phabricator.wikimedia.org/T170631) (owner: 20after4) [19:33:17] !log reedy@tin Synchronized wmf-config/: Newsletter on testwiki (duration: 00m 49s) [19:33:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:39] (PS4) Mobrovac: Cassandra: Switch metrics-collector to use Scap3 [puppet] -
https://gerrit.wikimedia.org/r/366459 (https://phabricator.wikimedia.org/T137371) [19:33:52] Notice: Undefined variable: wmgUseNewsletter in /srv/mediawiki/wmf-config/CommonSettings.php on line 3389 [19:34:01] It should be transient [19:34:19] count: 3511 [19:34:24] (CR) Filippo Giunchedi: [C: +2] Cassandra: Switch metrics-collector to use Scap3 [puppet] - https://gerrit.wikimedia.org/r/366459 (https://phabricator.wikimedia.org/T137371) (owner: Mobrovac) [19:34:28] make that 7245 [19:35:02] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 47s) [19:35:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:28] ok that's better [19:35:33] :) [19:35:35] reedy@tin:/srv/mediawiki-staging/wmf-config$ grep wmgUseNewsletter InitialiseSettings.php [19:35:35] 'wmgUseNewsletter' => [ [19:35:35] reedy@tin:/srv/mediawiki-staging/wmf-config$ grep wmgUseNewsletter CommonSettings.php [19:35:35] if ( $wmgUseNewsletter ) { [19:37:53] (CR) Gergő Tisza: [C: +1] Enabling OAuth on foundationwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301) (owner: MarcoAurelio) [19:38:19] (PS1) Ema: coordinator: exit if exceptions are raised importing monitors [debs/pybal] - https://gerrit.wikimedia.org/r/371123 [19:39:04] (PS5) MarcoAurelio: Initial configuration for hi.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) [19:39:48] (PS6) MarcoAurelio: Initial configuration for hi.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) [19:40:02] (CR) Ema: [C: +2] Add monitoring specific metric to IdleConnection [debs/pybal] - https://gerrit.wikimedia.org/r/371117 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [19:40:18] (PS2) Ema: coordinator: exit if exceptions are raised importing monitors
[debs/pybal] - https://gerrit.wikimedia.org/r/371123 [19:41:52] (PS5) MarcoAurelio: Enabling OAuth on foundationwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301) [19:41:59] !log mobrovac@tin Started deploy [cassandra/metrics-collector@5db1a43] (staging): First Scap3 deployment - T137371 [19:42:06] (CR) Mark Bergsma: [C: +1] coordinator: exit if exceptions are raised importing monitors [debs/pybal] - https://gerrit.wikimedia.org/r/371123 (owner: Ema) [19:42:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:42:10] T137371: Deploy cassandra metrics collector via scap3 - https://phabricator.wikimedia.org/T137371 [19:42:22] (PS4) MarcoAurelio: Cloud VPS configuration for hi.wikivoyage [puppet] - https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) [19:42:28] Notice: Cannot access property on non-object in /srv/mediawiki/php-1.30.0-wmf.13/includes/specials/SpecialDoubleRedirects.php on line 143 [19:42:50] (CR) Ema: [V: +2 C: +2] coordinator: exit if exceptions are raised importing monitors [debs/pybal] - https://gerrit.wikimedia.org/r/371123 (owner: Ema) [19:53:34] (PS1) Mark Bergsma: Add pybal service (coordinator) metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371126 [20:00:17] !log mobrovac@tin Finished deploy [cassandra/metrics-collector@5db1a43] (staging): First Scap3 deployment - T137371 (duration: 18m 17s) [20:00:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:30] T137371: Deploy cassandra metrics collector via scap3 - https://phabricator.wikimedia.org/T137371 [20:01:38] (PS1) Mobrovac: Scap config: restart the service after a deploy [software/cassandra-metrics-collector] - https://gerrit.wikimedia.org/r/371127 (https://phabricator.wikimedia.org/T137371) [20:02:16] hmm twentyafterfour -- still that DoubleRedirects error? [20:03:44] (PS2) C.
Scott Ananian: Rename language codes sr-ec and sr-el to sr-cyrl and sr-latn [puppet] - https://gerrit.wikimedia.org/r/368248 (https://phabricator.wikimedia.org/T117845) (owner: Fomafix) [20:03:52] (CR) C. Scott Ananian: [C: +1] Rename language codes sr-ec and sr-el to sr-cyrl and sr-latn [puppet] - https://gerrit.wikimedia.org/r/368248 (https://phabricator.wikimedia.org/T117845) (owner: Fomafix) [20:04:39] (CR) C. Scott Ananian: [C: +1] "Seems like this can be merged w/o waiting for the other patches on the 'sr' topic. The new aliases are harmless until/unless core recogni" [puppet] - https://gerrit.wikimedia.org/r/368248 (https://phabricator.wikimedia.org/T117845) (owner: Fomafix) [20:12:22] (PS1) MarcoAurelio: Set project logo for wikimania2018wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/371135 (https://phabricator.wikimedia.org/T173042) [20:13:29] (CR) MarcoAurelio: "Logos was optiPNG -o7 already. Let me know if it is too small or too large (size: 150 px)."
[mediawiki-config] - https://gerrit.wikimedia.org/r/371135 (https://phabricator.wikimedia.org/T173042) (owner: MarcoAurelio) [20:24:54] (PS2) Mark Bergsma: Add pybal service (coordinator) metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371126 [20:34:38] (CR) Mobrovac: [V: +2 C: +2] Scap config: restart the service after a deploy [software/cassandra-metrics-collector] - https://gerrit.wikimedia.org/r/371127 (https://phabricator.wikimedia.org/T137371) (owner: Mobrovac) [20:34:52] !log mobrovac@tin Started deploy [cassandra/metrics-collector@d0169ee] (staging): First Scap3 deployment - T137371 [20:34:58] !log mobrovac@tin Finished deploy [cassandra/metrics-collector@d0169ee] (staging): First Scap3 deployment - T137371 (duration: 00m 05s) [20:35:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:35:05] T137371: Deploy cassandra metrics collector via scap3 - https://phabricator.wikimedia.org/T137371 [20:35:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:42:14] (PS1) Mobrovac: Cassandra metrics: Allow Scap3 to restart the service [puppet] - https://gerrit.wikimedia.org/r/371159 (https://phabricator.wikimedia.org/T137371) [20:43:55] (PS3) Ema: Add pybal service (coordinator) metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371126 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [20:47:46] (CR) Filippo Giunchedi: [C: +2] Cassandra metrics: Allow Scap3 to restart the service [puppet] - https://gerrit.wikimedia.org/r/371159 (https://phabricator.wikimedia.org/T137371) (owner: Mobrovac) [20:50:30] (CR) Ema: [V: +2 C: +2] Add pybal service (coordinator) metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371126 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma) [20:50:41] !log mobrovac@tin Started deploy [cassandra/metrics-collector@d0169ee] (staging): First Scap3 deployment, take #2 - T137371 [20:50:50] !log
mobrovac@tin Finished deploy [cassandra/metrics-collector@d0169ee] (staging): First Scap3 deployment, take #2 - T137371 (duration: 00m 09s) [20:50:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:50:53] T137371: Deploy cassandra metrics collector via scap3 - https://phabricator.wikimedia.org/T137371 [20:51:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:51:11] !log mobrovac@tin Started deploy [cassandra/metrics-collector@d0169ee] (staging): First Scap3 deployment, take #2.5 - T137371 [20:51:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:51:32] !log mobrovac@tin Finished deploy [cassandra/metrics-collector@d0169ee] (staging): First Scap3 deployment, take #2.5 - T137371 (duration: 00m 21s) [20:51:44] Operations, Mail: Do not apply spam headers on email assessed NOT to be spam - https://phabricator.wikimedia.org/T111595#1611313 (Platonides) Maybe changing the contents for X-Spam-Report would be enough to work around this Google problem. Replace perhaps with: > X-Antispam-Host: polonium.wikimedia.org... [20:51:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:57:22] TabbyCat: apparently [20:58:04] I asked because I run a bot at several projects tasked to fix double redirects. If it breaks I now know why :) [21:02:10] Operations, Mail: status of studentgroups@ and studentclubs@ mail aliases? - https://phabricator.wikimedia.org/T127550#2046289 (Platonides) @Dzahn was anything found there? Can this task be closed? [21:11:51] (PS1) Mark Bergsma: Add metric pybal_service_depool_threshold [debs/pybal] - https://gerrit.wikimedia.org/r/371185 (https://phabricator.wikimedia.org/T171710) [21:21:08] Operations, Mail: status of studentgroups@ and studentclubs@ mail aliases? - https://phabricator.wikimedia.org/T127550#3517404 (Dzahn) @Platonides No, it can't be closed. It's still unknown if these are used.
If studentclubs@ is a Google group, the easiest solution would be if OIT can just add studentg... [21:30:33] (PS1) Catrope: Allow WOFF uploads on private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/371196 [21:32:27] mutante: did you get to check the logs? [21:37:10] Operations, Wiki-Loves-Monuments (2017): Import Wiki Loves Monuments photos from Flickr to Commons - https://phabricator.wikimedia.org/T173056#3517417 (LilyOfTheWest) [21:37:23] Operations, Wiki-Loves-Monuments (2017): Import Wiki Loves Monuments photos from Flickr to Commons - https://phabricator.wikimedia.org/T173056#3517417 (LilyOfTheWest) p:Triage>High [21:39:42] Operations, Wiki-Loves-Monuments (2017): Import Wiki Loves Monuments photos from Flickr to Commons - https://phabricator.wikimedia.org/T173056#3517433 (LilyOfTheWest) @Multichill let's make sure we coordinate with Ops before we start the import of photos to Commons to make sure we don't break things. ;)... [21:41:58] (CR) Brian Wolff: [C: +1] "+1, this is fine from a security perspective to upload on private wikis." [mediawiki-config] - https://gerrit.wikimedia.org/r/371196 (owner: Catrope) [21:54:11] greg-g: untaped? :O [22:00:42] jouncebot: next [22:00:42] In 0 hour(s) and 59 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170810T2300) [22:06:38] (PS5) MarcoAurelio: Cloud VPS configuration for hi.wikivoyage [puppet] - https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) [22:08:27] (CR) MarcoAurelio: "I'm not sure I'll be able to be around during the SWAT window. If possible, please deploy nonetheless.
If not possible, I'll try to look f" [mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301) (owner: MarcoAurelio)
[22:17:16] !log installing git security-updates
[22:17:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:18:09] moritzm: was the package itself updated, so can I just update it for my private servers?
[22:18:25] Sagan: if you use Debian, yes
[22:18:33] Sagan debian package was updated
[22:18:41] hm, not for ubuntu?
[22:18:48] for labs instances unattended--upgrades will upgrade shortly
[22:18:59] Sagan i dont know about that
[22:19:10] Sagan: they'll probably also release updates, but don't have insight there
[22:19:23] moritzm: Zppix ok, thanks :)
[22:19:40] np
[22:24:08] PROBLEM - puppet last run on rdb2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata]
[22:24:38] Sagan: ping
[22:24:51] TabbyCat: pong
[22:25:14] Sagan: are you staying around for some more time?, say, til the Evening SWAT?
[22:25:31] TabbyCat: you mean https://gerrit.wikimedia.org/r/370310 ?
[22:25:37] in theory, yes
[22:25:53] yes
[22:26:12] TabbyCat: is there something more to do then just merging the patch (script or so, table creating?)
[22:27:07] PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git]
[22:27:18] Sagan: I don't think so. Reedy deployed the OAuth tables some days ago and that should be the only 'blocker' before doing that; I don't know if there's a script to run after that. Have not seen anything in the docs.
[22:27:26] moritzm: related to your patch ^
[22:27:37] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git]
[22:27:39] TabbyCat: ah, ok.
[22:28:03] Sagan: so, in case I'm not around, would you babysit that patch for me?
[22:28:15] (if you wish/are around/etc)
[22:28:20] TabbyCat: yeah, the patch looks easy enough for me :)
[22:28:37] Sagan: thank you; I'll try to stay around though
[22:29:25] Sagan: that's normal for fleet-wide updates, in some cases it clashes with apt updates done by puppet
[22:30:19] moritzm: ah, ok. Why worry was just that these were the first signs of "puppet failed at all hosts" like we had that some days ago :)
[22:31:03] !log twentyafterfour@tin Synchronized php-1.30.0-wmf.13/includes/specials/SpecialDoubleRedirects.php: Hopefully fix T173045 (duration: 00m 48s)
[22:31:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:31:13] T173045: Notice: Cannot access property on non-object in /srv/mediawiki/php-1.30.0-wmf.13/includes/specials/SpecialDoubleRedirects.php on line 143 - https://phabricator.wikimedia.org/T173045
[22:34:34] TabbyCat: looks like double redirect bug is fixed
[22:34:52] good :)
[22:51:38] RECOVERY - puppet last run on rdb2003 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[22:54:28] RECOVERY - puppet last run on hydrogen is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[22:55:07] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170810T2300).
[23:00:04] TabbyCat and RoanKattouw: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:14] meow o/
[23:00:16] I can deploy today
[23:00:18] o/
[23:00:28] Seddon: You'll need to verify for this one
[23:00:40] Seddon: And you'll need to install the WikimediaDebug browser extension for that
[23:04:08] (CR) MaxSem: [C: 2] Enabling OAuth on foundationwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301) (owner: MarcoAurelio)
[23:05:36] (Merged) jenkins-bot: Enabling OAuth on foundationwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301) (owner: MarcoAurelio)
[23:05:58] (CR) jenkins-bot: Enabling OAuth on foundationwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301) (owner: MarcoAurelio)
[23:08:18] mhm, were the tables already there?
[23:08:35] MaxSem: yes, Reedy created them few days ago
[23:20:55] hmm, WikimediaDebug doesn't work for me :O
[23:21:42] I don't remember if wikimediadebug worked on foundationwiki. Surely not on wikitech as it's in silver.
[23:22:07] foundationwiki is at least part of the main cluster
[23:22:27] yep
[23:22:40] to which mw did you pulled it?
[23:22:50] 1001 or 1002?
[23:22:55] mwdebug1002
[23:22:56] * Sagan guesses 1002
[23:23:00] :D
[23:23:06] pfft, https://github.com/wikimedia/ChromeWikimediaDebug/blob/master/manifest.json
[23:23:11] indeed it can't
[23:23:19] okay, I'm just syncing
[23:23:43] ah, right
[23:23:54] wikimediafoundation.org is not matched there
[23:25:21] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/370310/ (duration: 00m 46s)
[23:25:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:26:26] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/370310/ (duration: 00m 47s)
[23:26:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:26:57] https://wikimediafoundation.org/wiki/Special:ListGroupRights appears as expected
[23:27:52] WFM
[23:27:59] and for me as well
[23:28:05] (PS2) MaxSem: Allow WOFF uploads on private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/371196 (owner: Catrope)
[23:28:53] (CR) MaxSem: [C: 2] Allow WOFF uploads on private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/371196 (owner: Catrope)
[23:30:17] (Merged) jenkins-bot: Allow WOFF uploads on private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/371196 (owner: Catrope)
[23:30:35] (CR) jenkins-bot: Allow WOFF uploads on private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/371196 (owner: Catrope)
[23:31:17] RoanKattouw, pulled on mwdebug1002
[23:32:42] MaxSem: thanks.
It affects donatewiki so I need to find Seddon and get him to test it for me
[23:33:41] in principle, the patch is about uploading woff on private wikis, so if it works on officewiki it can be deployed
[23:34:07] if there's some arcane reason it doeasn't work on donatewiki we can address it later
[23:36:05] * MaxSem looks at html of a flow page and is horrified of all the json
[23:36:43] Tester located
[23:37:12] True
[23:37:19] But I have Seddon here anyway
[23:37:48] okay, so I don't need to upload junk to officewiki? :P
[23:38:50] (PS1) EBernhardson: Switch elastic1017 to LVM [puppet] - https://gerrit.wikimedia.org/r/371210 (https://phabricator.wikimedia.org/T169498)
[23:41:15] MaxSem: Working, thanks
[23:41:54] okay, pushing
[23:43:10] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/371196/2 (duration: 00m 47s)
[23:43:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:43:30] RoanKattouw, ^
[23:44:50] Thanks!