[00:25:15] Operations, Traffic, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#2684468 (TheDJ) @Pigsonthewing if such companies are out there, then us cutting them off might finally be an indication to them ho...
[00:38:15] Operations, DBA, MediaWiki-extensions-WikibaseClient, Performance-Team, and 5 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3508603 (Krinkle)
[00:40:14] Operations, DBA, MediaWiki-extensions-WikibaseClient, Performance-Team, and 5 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3224459 (Krinkle)
[00:56:12] Operations, Phabricator, Traffic, Release-Engineering-Team (Kanban): Verify that the codfw lvs is configured correctly for Phabricator - https://phabricator.wikimedia.org/T168699#3508632 (mmodell)
[01:12:36] PROBLEM - HHVM rendering on mw1293 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:13:36] RECOVERY - HHVM rendering on mw1293 is OK: HTTP OK: HTTP/1.1 200 OK - 75850 bytes in 0.132 second response time
[01:16:16] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0
[01:29:55] (CR) Krinkle: [C: 1] logging: Remove exceptionmonitor [puppet] - https://gerrit.wikimedia.org/r/368522 (owner: MaxSem)
[02:11:23] Operations, Patch-For-Review: Tracking and Reducing cron-spam from root@ - https://phabricator.wikimedia.org/T132324#3508707 (Krinkle)
[02:11:25] Operations: Disable cron.standard checking for lost+found directories - https://phabricator.wikimedia.org/T1249#3508705 (Krinkle) Open>declined Closing given the problem was specific to Precise, which is no longer used in production afaics.
[02:26:06] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.11) (duration: 08m 25s)
[02:26:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:46:20] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.12) (duration: 06m 59s)
[02:46:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:47:28] Operations, DNS, Traffic, Wikimedia-Apache-configuration: m.{project}.org portal/redirect consistency - https://phabricator.wikimedia.org/T78421#3508724 (Krinkle)
[02:53:13] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Aug 8 02:53:13 UTC 2017 (duration 6m 53s)
[02:53:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:56:06] Operations, Performance-Team: /usr/local/bin/xenon-generate-svgs and flamegraph.pl cronspam - https://phabricator.wikimedia.org/T169249#3508726 (Krinkle) p:Normal>Low
[03:05:56] (PS1) Dzahn: phabricator: ensure /srv/dumps exists [puppet] - https://gerrit.wikimedia.org/r/370607 (https://phabricator.wikimedia.org/T163938)
[03:06:24] (CR) jerkins-bot: [V: -1] phabricator: ensure /srv/dumps exists [puppet] - https://gerrit.wikimedia.org/r/370607 (https://phabricator.wikimedia.org/T163938) (owner: Dzahn)
[03:12:25] (PS2) Dzahn: phabricator: ensure /srv/dumps exists [puppet] - https://gerrit.wikimedia.org/r/370607 (https://phabricator.wikimedia.org/T163938)
[03:12:51] (CR) jerkins-bot: [V: -1] phabricator: ensure /srv/dumps exists [puppet] - https://gerrit.wikimedia.org/r/370607 (https://phabricator.wikimedia.org/T163938) (owner: Dzahn)
[03:14:10] (PS3) Dzahn: phabricator: ensure /srv/dumps exists [puppet] - https://gerrit.wikimedia.org/r/370607 (https://phabricator.wikimedia.org/T163938)
[03:14:42] (CR) jerkins-bot: [V: -1] phabricator: ensure /srv/dumps exists [puppet] - https://gerrit.wikimedia.org/r/370607 (https://phabricator.wikimedia.org/T163938) (owner: Dzahn)
[03:14:55] Operations, Thumbor, Performance-Team (Radar): Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817#3508732 (Krinkle)
[03:14:58] Operations, Commons, Thumbor, media-storage, Performance-Team (Radar): HTTP 429 on thumbnail images for specific SVG file on Commons - https://phabricator.wikimedia.org/T170628#3508733 (Krinkle)
[03:15:07] Operations, Traffic, Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3508737 (Krinkle)
[03:15:22] Operations, Performance-Team (Radar): Some Core availability Catchpoint tests might be more expensive than they need to be - https://phabricator.wikimedia.org/T162857#3508744 (Krinkle)
[03:15:29] Operations, DBA, MediaWiki-extensions-WikibaseClient, Wikidata, and 5 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3508742 (Krinkle)
[03:15:50] (PS4) Dzahn: phabricator: ensure /srv/dumps exists [puppet] - https://gerrit.wikimedia.org/r/370607 (https://phabricator.wikimedia.org/T163938)
[03:15:53] Operations, Performance-Team (Radar), User-Elukey: Update memcached package and configuration options - https://phabricator.wikimedia.org/T129963#3508761 (Krinkle)
[03:16:19] (CR) jerkins-bot: [V: -1] phabricator: ensure /srv/dumps exists [puppet] - https://gerrit.wikimedia.org/r/370607 (https://phabricator.wikimedia.org/T163938) (owner: Dzahn)
[03:16:28] Operations, scap2, HHVM, Performance-Team (Radar), and 2 others: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#3508774 (Krinkle)
[03:17:30] (PS5) Dzahn: phabricator: ensure /srv/dumps exists [puppet] - https://gerrit.wikimedia.org/r/370607 (https://phabricator.wikimedia.org/T163938)
[03:18:30] (CR) Dzahn: "hate inline editor because even more syntax issues, love inline editor because it works on bad connection" [puppet] - https://gerrit.wikimedia.org/r/370607 (https://phabricator.wikimedia.org/T163938) (owner: Dzahn)
[03:19:03] (CR) Dzahn: [C: 2] phabricator: ensure /srv/dumps exists [puppet] - https://gerrit.wikimedia.org/r/370607 (https://phabricator.wikimedia.org/T163938) (owner: Dzahn)
[03:19:31] (PS6) Dzahn: phabricator: ensure /srv/dumps exists [puppet] - https://gerrit.wikimedia.org/r/370607 (https://phabricator.wikimedia.org/T163938)
[03:25:22] !log phab1001 /srv/phab/tools/public_task_dump.py to create dump, was failed cron due to missing /srv/dumps/
[03:25:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:27:16] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 829.88 seconds
[03:40:10] (PS3) Dzahn: Phabricator: Install package heirloom-mailx for mail command [puppet] - https://gerrit.wikimedia.org/r/370518 (owner: Paladox)
[03:49:49] (CR) Dzahn: [C: 2] Phabricator: Install package heirloom-mailx for mail command [puppet] - https://gerrit.wikimedia.org/r/370518 (owner: Paladox)
[03:50:39] (CR) Dzahn: "indeed, this is a regression since trusty where the current mail command used in stats cron works" [puppet] - https://gerrit.wikimedia.org/r/370518 (owner: Paladox)
[03:52:49] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban): setup/install phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T163938#3508797 (Dzahn) https://gerrit.wikimedia.org/r/370518 by paladox merged as well, to fix another cron that sends the stats mail to admins
[03:55:06] !log phab1001 /usr/local/bin/community_metrics.sh | /usr/local/bin/project_changes.sh creating stats mails to admins (which failed before) (T163938)
[03:55:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:55:18] T163938: setup/install phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T163938
[03:58:26] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 295.70 seconds
[05:06:36] PROBLEM - pdfrender on scb1003 is CRITICAL: connect to address 10.64.32.153 and port 5252: Connection refused
[06:11:56] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.004 second response time
[06:12:07] !log restart pdfrender on scb1003
[06:12:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:12:55] !log alert users with big home directories for stat1005 disk alarms (will erase data later on only if they don't answer)
[06:13:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:13] (PS1) Ladsgroup: Add copyright info for Wikidata API [mediawiki-config] - https://gerrit.wikimedia.org/r/370611 (https://phabricator.wikimedia.org/T112606)
[06:41:39] (CR) jerkins-bot: [V: -1] Add copyright info for Wikidata API [mediawiki-config] - https://gerrit.wikimedia.org/r/370611 (https://phabricator.wikimedia.org/T112606) (owner: Ladsgroup)
[07:14:35] Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3508852 (jcrespo) **I do not want to phase out anything**. We can keep using eventlogging and have a mediawiki database copy or copies for analytics-like usage for all inte...
[07:21:33] !log start of ladsgroup@terbium:~$ time mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki=wikidatawiki --entity-type=property --deduplicate-terms (T171460)
[07:21:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:21:46] T171460: Populate term_full_entity_id on www.wikidata.org - https://phabricator.wikimedia.org/T171460
[07:34:17] !log stopped the script and re-running without --deduplicate-terms (T171460)
[07:34:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:34:29] T171460: Populate term_full_entity_id on www.wikidata.org - https://phabricator.wikimedia.org/T171460
[07:41:54] !log stop puppet on cp3032 (cache::text) to set varnishkafka-webrequest logging to debug
[07:42:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:11] (PS1) Giuseppe Lavagetto: puppet-compiler use the correct CA for puppedb [puppet] - https://gerrit.wikimedia.org/r/370612
[07:47:32] (CR) jerkins-bot: [V: -1] puppet-compiler use the correct CA for puppedb [puppet] - https://gerrit.wikimedia.org/r/370612 (owner: Giuseppe Lavagetto)
[07:48:41] (PS5) Jcrespo: mariadb: Add db1098 as new s6 recentchanges/watchlist/... replica [mediawiki-config] - https://gerrit.wikimedia.org/r/370480 (https://phabricator.wikimedia.org/T171027)
[07:48:46] (CR) Jcrespo: [C: 1] mariadb: Add db1098 as new s6 recentchanges/watchlist/... replica [mediawiki-config] - https://gerrit.wikimedia.org/r/370480 (https://phabricator.wikimedia.org/T171027) (owner: Jcrespo)
[07:50:11] (PS2) Giuseppe Lavagetto: puppet-compiler use the correct CA for puppetdb [puppet] - https://gerrit.wikimedia.org/r/370612
[07:50:15] (CR) Jcrespo: [C: 2] mariadb: Add db1098 as new s6 recentchanges/watchlist/... replica [mediawiki-config] - https://gerrit.wikimedia.org/r/370480 (https://phabricator.wikimedia.org/T171027) (owner: Jcrespo)
[07:50:30] (CR) jenkins-bot: mariadb: Add db1098 as new s6 recentchanges/watchlist/... replica [mediawiki-config] - https://gerrit.wikimedia.org/r/370480 (https://phabricator.wikimedia.org/T171027) (owner: Jcrespo)
[07:51:59] (CR) Giuseppe Lavagetto: [C: 2] puppet-compiler use the correct CA for puppetdb [puppet] - https://gerrit.wikimedia.org/r/370612 (owner: Giuseppe Lavagetto)
[07:53:21] !log jynus@tin Synchronized wmf-config/db-codfw.php: Add db1098 (duration: 00m 47s)
[07:53:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:12] (PS1) Giuseppe Lavagetto: puppet-compiler: fix the generates clause in the exec [puppet] - https://gerrit.wikimedia.org/r/370613
[07:59:14] (PS1) Jcrespo: mariadb: Pool db1098 as a generic server with low load [mediawiki-config] - https://gerrit.wikimedia.org/r/370614
[08:03:40] (PS2) Jcrespo: mariadb: Pool db1098 as a generic server with low load [mediawiki-config] - https://gerrit.wikimedia.org/r/370614
[08:15:21] Operations, Traffic, netops: eqiad row D switch upgrade - https://phabricator.wikimedia.org/T172459#3508884 (Gehel) **For the elasticsearch cluster:** The cluster //should// be able to survive the loss of a full row. This is not something we have tested under load yet, so this is a goo occasion to d...
[08:16:58] (CR) Giuseppe Lavagetto: [C: 2] puppet-compiler: fix the generates clause in the exec [puppet] - https://gerrit.wikimedia.org/r/370613 (owner: Giuseppe Lavagetto)
[08:20:35] (CR) Jcrespo: [C: 2] mariadb: Pool db1098 as a generic server with low load [mediawiki-config] - https://gerrit.wikimedia.org/r/370614 (owner: Jcrespo)
[08:21:59] (Merged) jenkins-bot: mariadb: Pool db1098 as a generic server with low load [mediawiki-config] - https://gerrit.wikimedia.org/r/370614 (owner: Jcrespo)
[08:22:09] (CR) jenkins-bot: mariadb: Pool db1098 as a generic server with low load [mediawiki-config] - https://gerrit.wikimedia.org/r/370614 (owner: Jcrespo)
[08:28:26] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Pool db1098 with limited load (duration: 00m 46s)
[08:28:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:29:29] (PS1) Jcrespo: mariadb: Pool db1098 as 50% of the recentchanges/watchlist s6 role [mediawiki-config] - https://gerrit.wikimedia.org/r/370617 (https://phabricator.wikimedia.org/T171027)
[08:38:00] (PS1) Jcrespo: mariadb: Pool db1098 as the main recentchanges/watchlist s6 role [mediawiki-config] - https://gerrit.wikimedia.org/r/370619 (https://phabricator.wikimedia.org/T171027)
[08:43:14] (CR) Jcrespo: [C: 2] mariadb: Pool db1098 as 50% of the recentchanges/watchlist s6 role [mediawiki-config] - https://gerrit.wikimedia.org/r/370617 (https://phabricator.wikimedia.org/T171027) (owner: Jcrespo)
[08:44:49] (Merged) jenkins-bot: mariadb: Pool db1098 as 50% of the recentchanges/watchlist s6 role [mediawiki-config] - https://gerrit.wikimedia.org/r/370617 (https://phabricator.wikimedia.org/T171027) (owner: Jcrespo)
[08:46:19] (CR) jenkins-bot: mariadb: Pool db1098 as 50% of the recentchanges/watchlist s6 role [mediawiki-config] - https://gerrit.wikimedia.org/r/370617 (https://phabricator.wikimedia.org/T171027) (owner: Jcrespo)
[08:46:23] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Pool db1098 at 50% load (duration: 00m 46s)
[08:46:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:19] (CR) Jcrespo: [C: 2] mariadb: Pool db1098 as the main recentchanges/watchlist s6 role [mediawiki-config] - https://gerrit.wikimedia.org/r/370619 (https://phabricator.wikimedia.org/T171027) (owner: Jcrespo)
[09:06:42] (Merged) jenkins-bot: mariadb: Pool db1098 as the main recentchanges/watchlist s6 role [mediawiki-config] - https://gerrit.wikimedia.org/r/370619 (https://phabricator.wikimedia.org/T171027) (owner: Jcrespo)
[09:06:51] (CR) jenkins-bot: mariadb: Pool db1098 as the main recentchanges/watchlist s6 role [mediawiki-config] - https://gerrit.wikimedia.org/r/370619 (https://phabricator.wikimedia.org/T171027) (owner: Jcrespo)
[09:09:12] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Pool db1098 as the main recentchanges/watchlist s6 role (duration: 00m 47s)
[09:09:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:04] (PS1) 20after4: PHAB: deployment scripts to be called by scap [puppet] - https://gerrit.wikimedia.org/r/370622
[10:01:33] (CR) jerkins-bot: [V: -1] PHAB: deployment scripts to be called by scap [puppet] - https://gerrit.wikimedia.org/r/370622 (owner: 20after4)
[10:02:26] (PS2) 20after4: PHAB: deployment scripts to be called by scap [puppet] - https://gerrit.wikimedia.org/r/370622
[10:03:17] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0
[10:04:45] (PS3) 20after4: PHAB: deployment scripts to be called by scap [puppet] - https://gerrit.wikimedia.org/r/370622
[10:05:33] (CR) 20after4: [C: 1] PHAB: deployment scripts to be called by scap (1 comment) [puppet] - https://gerrit.wikimedia.org/r/370622 (owner: 20after4)
[10:08:04] (PS1) Giuseppe Lavagetto: puppet-compiler: generate files in the vardir, not libdir [puppet] - https://gerrit.wikimedia.org/r/370623
[10:12:01] !log update librdkafka1* on notebook100[12] and stat1003
[10:12:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:36] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0
[10:19:31] (PS2) Giuseppe Lavagetto: puppet-compiler: generate files in the vardir, not libdir [puppet] - https://gerrit.wikimedia.org/r/370623
[10:19:58] (CR) Giuseppe Lavagetto: [C: 2] puppet-compiler: generate files in the vardir, not libdir [puppet] - https://gerrit.wikimedia.org/r/370623 (owner: Giuseppe Lavagetto)
[10:31:40] (PS1) Giuseppe Lavagetto: Remove templatedir from puppet compilation options. [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370624
[10:31:42] (PS1) Giuseppe Lavagetto: Copy over the puppetdb configuration if present [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370625
[10:32:09] (CR) jerkins-bot: [V: -1] Remove templatedir from puppet compilation options. [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370624 (owner: Giuseppe Lavagetto)
[10:32:16] (CR) jerkins-bot: [V: -1] Copy over the puppetdb configuration if present [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370625 (owner: Giuseppe Lavagetto)
[10:33:20] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban): setup/install phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T163938#3508974 (Paladox) we should probably only use @aklapper email for /usr/local/bin/project_changes.sh on prod. I was testing /usr/local/bin/pr...
[10:36:06] (CR) Paladox: PHAB: deployment scripts to be called by scap (1 comment) [puppet] - https://gerrit.wikimedia.org/r/370622 (owner: 20after4)
[10:37:25] (CR) 20after4: [C: 1] PHAB: deployment scripts to be called by scap (1 comment) [puppet] - https://gerrit.wikimedia.org/r/370622 (owner: 20after4)
[10:38:17] (PS2) Giuseppe Lavagetto: Escape content differences [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370160 (https://phabricator.wikimedia.org/T172362)
[10:38:19] (PS2) Giuseppe Lavagetto: Remove templatedir from puppet compilation options. [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370624
[10:38:21] (PS2) Giuseppe Lavagetto: Copy over the puppetdb configuration if present [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370625
[10:38:52] (CR) jerkins-bot: [V: -1] Remove templatedir from puppet compilation options. [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370624 (owner: Giuseppe Lavagetto)
[10:38:53] (CR) jerkins-bot: [V: -1] Copy over the puppetdb configuration if present [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370625 (owner: Giuseppe Lavagetto)
[10:38:57] (CR) Paladox: PHAB: deployment scripts to be called by scap (1 comment) [puppet] - https://gerrit.wikimedia.org/r/370622 (owner: 20after4)
[10:39:13] (PS3) Giuseppe Lavagetto: Escape content differences [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370160 (https://phabricator.wikimedia.org/T172362)
[10:39:15] (PS3) Giuseppe Lavagetto: Remove templatedir from puppet compilation options. [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370624
[10:39:17] (PS3) Giuseppe Lavagetto: Copy over the puppetdb configuration if present [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370625
[10:39:41] (CR) Giuseppe Lavagetto: [C: 2] Escape content differences (1 comment) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370160 (https://phabricator.wikimedia.org/T172362) (owner: Giuseppe Lavagetto)
[10:39:44] (CR) jerkins-bot: [V: -1] Remove templatedir from puppet compilation options. [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370624 (owner: Giuseppe Lavagetto)
[10:39:49] (CR) jerkins-bot: [V: -1] Copy over the puppetdb configuration if present [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370625 (owner: Giuseppe Lavagetto)
[10:40:07] (Merged) jenkins-bot: Escape content differences [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370160 (https://phabricator.wikimedia.org/T172362) (owner: Giuseppe Lavagetto)
[10:43:27] (PS1) Ladsgroup: mediawiki: Add puppetized cronjob for rebuildTermSqlIndex [puppet] - https://gerrit.wikimedia.org/r/370626 (https://phabricator.wikimedia.org/T171460)
[10:43:48] (CR) jerkins-bot: [V: -1] mediawiki: Add puppetized cronjob for rebuildTermSqlIndex [puppet] - https://gerrit.wikimedia.org/r/370626 (https://phabricator.wikimedia.org/T171460) (owner: Ladsgroup)
[10:44:42] (PS2) Ladsgroup: mediawiki: Add puppetized cronjob for rebuildTermSqlIndex [puppet] - https://gerrit.wikimedia.org/r/370626 (https://phabricator.wikimedia.org/T171460)
[10:45:04] (CR) jerkins-bot: [V: -1] mediawiki: Add puppetized cronjob for rebuildTermSqlIndex [puppet] - https://gerrit.wikimedia.org/r/370626 (https://phabricator.wikimedia.org/T171460) (owner: Ladsgroup)
[10:45:11] (PS4) Giuseppe Lavagetto: Remove templatedir from puppet compilation options. [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370624
[10:45:13] (PS4) Giuseppe Lavagetto: Copy over the puppetdb configuration if present [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370625
[10:47:33] (PS3) Ladsgroup: mediawiki: Add puppetized cronjob for rebuildTermSqlIndex [puppet] - https://gerrit.wikimedia.org/r/370626 (https://phabricator.wikimedia.org/T171460)
[10:47:56] (CR) Giuseppe Lavagetto: [C: 2] Remove templatedir from puppet compilation options. [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370624 (owner: Giuseppe Lavagetto)
[10:47:58] (CR) jerkins-bot: [V: -1] mediawiki: Add puppetized cronjob for rebuildTermSqlIndex [puppet] - https://gerrit.wikimedia.org/r/370626 (https://phabricator.wikimedia.org/T171460) (owner: Ladsgroup)
[10:48:40] (CR) Giuseppe Lavagetto: [C: 2] Copy over the puppetdb configuration if present [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370625 (owner: Giuseppe Lavagetto)
[10:48:59] (CR) Paladox: [C: 1] PHAB: deployment scripts to be called by scap [puppet] - https://gerrit.wikimedia.org/r/370622 (owner: 20after4)
[10:49:54] (PS4) Ladsgroup: mediawiki: Add puppetized cronjob for rebuildTermSqlIndex [puppet] - https://gerrit.wikimedia.org/r/370626 (https://phabricator.wikimedia.org/T171460)
[11:00:53] (PS1) Giuseppe Lavagetto: labs/puppet3-diffs: Bump version of the puppet compiler [puppet] - https://gerrit.wikimedia.org/r/370627 (https://phabricator.wikimedia.org/T172362)
[11:01:17] (CR) jerkins-bot: [V: -1] labs/puppet3-diffs: Bump version of the puppet compiler [puppet] - https://gerrit.wikimedia.org/r/370627 (https://phabricator.wikimedia.org/T172362) (owner: Giuseppe Lavagetto)
[11:02:27] (PS2) Giuseppe Lavagetto: labs/puppet3-diffs: Bump version of the puppet compiler [puppet] - https://gerrit.wikimedia.org/r/370627 (https://phabricator.wikimedia.org/T172362)
[11:03:02] (CR) Giuseppe Lavagetto: [C: 2] labs/puppet3-diffs: Bump version of the puppet compiler [puppet] - https://gerrit.wikimedia.org/r/370627 (https://phabricator.wikimedia.org/T172362) (owner: Giuseppe Lavagetto)
[11:12:05] (PS5) Ladsgroup: mediawiki: Add puppetized cronjob for rebuildTermSqlIndex [puppet] - https://gerrit.wikimedia.org/r/370626 (https://phabricator.wikimedia.org/T171460)
[11:22:12] !log start of ladsgroup@terbium:~$ timeout 3500s /usr/local/bin/mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki wikidatawiki --entity-type=item >>/tmp/rebuildTermSqlIndex.log 2>&1 (T171460)
[11:22:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:25] T171460: Populate term_full_entity_id on www.wikidata.org - https://phabricator.wikimedia.org/T171460
[11:28:55] (PS6) Ladsgroup: mediawiki: Add puppetized cronjob for rebuildTermSqlIndex [puppet] - https://gerrit.wikimedia.org/r/370626 (https://phabricator.wikimedia.org/T171460)
[11:54:17] PROBLEM - High lag on wdqs1001 is CRITICAL: CRITICAL: 34.48% of data above the critical threshold [1800.0]
[12:02:06] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0]
[12:13:42] ACKNOWLEDGEMENT - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [1000.0] Gehel known issue, work in progress - T169498
[12:14:42] !log stop eventlogging on eventlog1001 to test kafka consumer failures
[12:14:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:15:46] !log restarting wdqs-updater on wdqs1001
[12:15:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:17:24] (PS1) Aklapper: Block vandalism IP that repeatedly added comments / uploaded files [puppet] - https://gerrit.wikimedia.org/r/370630
[12:20:45] !log start of ladsgroup@terbium:~$ /usr/local/bin/mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki wikidatawiki --entity-type=property (T172776)
[12:20:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:56] T172776: Property labels missing on some items - https://phabricator.wikimedia.org/T172776
[12:27:47] PROBLEM - pdfrender on scb1002 is CRITICAL: connect to address 10.64.16.21 and port 5252: Connection refused
[12:30:47] (PS1) Elukey: Force Kafka protocol version [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/370631 (https://phabricator.wikimedia.org/T172681)
[12:31:45] checking scb1002..
[12:32:26] !log restart pdfrender on scb1002
[12:32:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:32:47] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time
[12:37:26] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[12:39:16] (PS2) Elukey: Force Kafka protocol version [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/370631 (https://phabricator.wikimedia.org/T172681)
[12:47:37] RECOVERY - High lag on wdqs1001 is OK: OK: Less than 30.00% above the threshold [600.0]
[12:49:39] !log start of ladsgroup@terbium:~$ /usr/local/bin/mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki wikidatawiki --entity-type=property --rebuild-all-terms (T172776)
[12:49:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:51] T172776: Property labels missing on some items - https://phabricator.wikimedia.org/T172776
[12:50:26] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[12:55:47] ACKNOWLEDGEMENT - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [1000.0] Gehel known issue, investigation in progress - T169498
[13:00:13] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170808T1300). Please do the needful.
[13:00:13] Urbanecm and Amir1: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[13:00:31] o/
[13:00:42] !log restart varnishkafka-webrequest with kafka.broker.version.fallback=0.9.0.1 + kafka.api.version.request=false on cp3032 (local test, to rollback remove the lines from /etc/varnishkafka/webrequest.conf)
[13:00:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:27] I'm here
[13:02:20] Who's the current SWATter?
[13:06:27] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[13:09:32] (CR) Elukey: [C: 2] Force Kafka protocol version [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/370631 (https://phabricator.wikimedia.org/T172681) (owner: Elukey)
[13:14:46] We have the problem all over again it seems :D
[13:15:16] I have the yubikey around today so I can SWAT but I need an explicit approval from releng
[13:17:38] (PS1) Elukey: role::cache::kafka::*: force kafka protocol to 0.9.0.1 [puppet] - https://gerrit.wikimedia.org/r/370637 (https://phabricator.wikimedia.org/T172681)
[13:21:02] (CR) Umherirrender: "Anything open, which avoids merging of this patch set?" [puppet] - https://gerrit.wikimedia.org/r/363851 (https://phabricator.wikimedia.org/T89741) (owner: Umherirrender)
[13:21:17] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler02/7337/cp3032.esams.wmnet/ looks good plus cp3032 is running with these settings now, all loo" [puppet] - https://gerrit.wikimedia.org/r/370637 (https://phabricator.wikimedia.org/T172681) (owner: Elukey)
[13:28:13] Operations, Cloud-Services, Cloud-VPS: logrotate/disk space on silver for nutcracker log - https://phabricator.wikimedia.org/T120683#3509264 (Andrew) Open>Resolved When all is well, silver is fine -- If something breaks that increases the logging rate then I get alerts. That said, I think we...
[13:40:47] Urbanecm: around?
[13:40:53] got the SWAT approval
[13:41:00] Yep.
[13:41:05] Great, would you SWAT?
[13:41:09] yeah
[13:41:12] let's start
[13:41:35] Great. Waiting till your ping :)
[13:41:48] welcome to my first official SWAT, let's not crash and I hope you enjoy deploying with releng
[13:42:57] (CR) Ladsgroup: [C: 2] Fix srwiki logos [mediawiki-config] - https://gerrit.wikimedia.org/r/370183 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[13:43:02] (CR) Ladsgroup: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/370183 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[13:43:30] Amir1: :) :)
[13:44:26] (Merged) jenkins-bot: Fix srwiki logos [mediawiki-config] - https://gerrit.wikimedia.org/r/370183 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[13:46:23] (CR) jenkins-bot: Fix srwiki logos [mediawiki-config] - https://gerrit.wikimedia.org/r/370183 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[13:47:05] Urbanecm: your patch is up in mwdebug1002
[13:47:16] Amir1, great, will test
[13:49:54] Amir1, working, please deploy to the whole universe
[13:50:02] ack
[13:51:14] Operations, ops-eqiad, Analytics-Kanban, User-Elukey: Analytics1034 eth0 negotiated speed to 100Mb/s instead of 1000Mb/s - https://phabricator.wikimedia.org/T172633#3509314 (Cmjohnson) @elukey: first and easiest thing is to swap the ethernet cable. Since you're not servicing traffic, I will jus...
[13:51:46] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0]
[13:54:24] !log ladsgroup@tin Synchronized static/images/project-logos: SWAT: Fix srwiki logos (T150618) (duration: 00m 48s)
[13:54:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:37] T150618: Provide HD logos for all projects - https://phabricator.wikimedia.org/T150618
[13:54:41] Urbanecm: your patch should be live, can you check and let me know?
[13:54:44] Operations, ops-eqiad, Analytics-Kanban, User-Elukey: Analytics1034 eth0 negotiated speed to 100Mb/s instead of 1000Mb/s - https://phabricator.wikimedia.org/T172633#3509317 (elukey) +1!
[13:55:27] Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3509319 (Halfak)
[13:55:31] ACKNOWLEDGEMENT - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0] Gehel known issue, investigation in progress - T169498
[13:56:11] (PS1) Jcrespo: mariadb: temporarely depooling db1060 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/370640 (https://phabricator.wikimedia.org/T166546)
[13:56:13] !log dump of task API for elasticsearch eqiad - T169498
[13:56:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:56:24] T169498: Investigate load spikes on the elasticsearch cluster in eqiad - https://phabricator.wikimedia.org/T169498
[13:56:26] Amir1, ack
[13:57:06] Amir1, working, thank you for your SWAT!
[13:57:31] Thank you for choosing releng, see you soon [13:57:40] (03PS2) 10Elukey: role::cache::kafka::*: force kafka protocol to 0.9.0.1 [puppet] - 10https://gerrit.wikimedia.org/r/370637 (https://phabricator.wikimedia.org/T172681) [13:58:43] 10Operations, 10Analytics, 10Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3509335 (10elukey) >>! In T172410#3508852, @jcrespo wrote: > The only thing I commented is that we are going to deprecate multi-source replication, so instead of one big fat... [13:59:00] 10Operations, 10puppet-compiler, 10Patch-For-Review, 10User-Joe: puppet compiler fails with modules using puppetdb - https://phabricator.wikimedia.org/T150456#3509336 (10Joe) Status update: * puppetdb now runs on every puppet-compiler node, using the CA/certs combination used by the puppet compiler * the... [13:59:24] (03PS2) 10Ladsgroup: Add copyright info for Wikidata API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370611 (https://phabricator.wikimedia.org/T112606) [14:01:19] (03CR) 10Ladsgroup: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370611 (https://phabricator.wikimedia.org/T112606) (owner: 10Ladsgroup) [14:02:41] (03Merged) 10jenkins-bot: Add copyright info for Wikidata API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370611 (https://phabricator.wikimedia.org/T112606) (owner: 10Ladsgroup) [14:02:46] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0] [14:02:54] (03CR) 10jenkins-bot: Add copyright info for Wikidata API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370611 (https://phabricator.wikimedia.org/T112606) (owner: 10Ladsgroup) [14:05:51] pulled in mwdebug1002, works just fine [14:05:58] going to all [14:07:31] !log ladsgroup@tin Synchronized wmf-config/Wikibase-production.php: SWAT: Add copyright info for Wikidata API (T112606) (duration: 00m 47s) [14:07:41] Logged 
the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:42] T112606: [Bug] The API query for rightsinfo on www.wikidata.org reports CC-SA 3.0 , while its page footer says CC0 as well - https://phabricator.wikimedia.org/T112606 [14:08:35] My patch is everywhere and looks fine [14:08:43] SWAT is officially done now [14:08:44] (03Abandoned) 10Elukey: role::cache::kafka::*: force kafka protocol to 0.9.0.1 [puppet] - 10https://gerrit.wikimedia.org/r/370637 (https://phabricator.wikimedia.org/T172681) (owner: 10Elukey) [14:10:50] (03PS1) 10Elukey: role::cache::kafka::[el|statsv]: force kafka protocol to 0.9.0.1 [puppet] - 10https://gerrit.wikimedia.org/r/370644 (https://phabricator.wikimedia.org/T172681) [14:12:51] (03CR) 10Elukey: [C: 032] "Looks good: https://puppet-compiler.wmflabs.org/compiler02/7340/cp3032.esams.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/370644 (https://phabricator.wikimedia.org/T172681) (owner: 10Elukey) [14:16:23] !log set mw2256 pooled=inactive + downtime to allow BIOS upgrade - T163346 [14:16:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:16:34] T163346: mw2256 - hardware issue - https://phabricator.wikimedia.org/T163346 [14:17:59] (03CR) 10Eevans: "This will break single instance setups, won't it?" 
[puppet] - 10https://gerrit.wikimedia.org/r/370554 (https://phabricator.wikimedia.org/T172610) (owner: 10Mobrovac) [14:19:03] !log restart of all the varnishkafka statsv/eventlogging instances on caching hosts to pick up https://gerrit.wikimedia.org/r/370644 (puppet automatic restarts) [14:19:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:24:02] (03CR) 10Gehel: Transports: improve target management (032 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/367825 (https://phabricator.wikimedia.org/T171684) (owner: 10Volans) [14:35:26] RECOVERY - Disk space on stat1005 is OK: DISK OK [14:35:56] niceeee [14:41:34] (03PS1) 10Herron: Add system filter for list owner spam from numeric qq.com addresses [puppet] - 10https://gerrit.wikimedia.org/r/370650 [14:42:35] (03CR) 10Herron: [C: 032] Add system filter for list owner spam from numeric qq.com addresses [puppet] - 10https://gerrit.wikimedia.org/r/370650 (owner: 10Herron) [14:42:43] (03PS2) 10Mobrovac: Cassandra: Do not include the main DNS in the list of seeds [puppet] - 10https://gerrit.wikimedia.org/r/370554 (https://phabricator.wikimedia.org/T172610) [14:48:24] (03CR) 10Mobrovac: [C: 04-1] "Good catch, Eric! Still working on it as the current version breaks maps..." [puppet] - 10https://gerrit.wikimedia.org/r/370554 (https://phabricator.wikimedia.org/T172610) (owner: 10Mobrovac) [14:52:23] (03CR) 10Elukey: Cassandra: Do not include the main DNS in the list of seeds (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/370554 (https://phabricator.wikimedia.org/T172610) (owner: 10Mobrovac) [14:54:43] XioNoX: hello i am ready wen you are [14:56:09] papaul: hello! Let's see if Jeff_Green is around [14:56:19] yup [14:56:26] Jeff_Green: hey [14:56:28] great [14:56:32] i am here too [14:56:35] hey hey [14:56:36] fwiw [14:57:20] Jeff_Green: good to start? 
[14:57:48] pretty much, but lemme put all the codfw hosts in icinga down-for-maintenance mode [14:58:04] no problem, just let me know when you're good [14:58:08] 10Operations, 10Ops-Access-Requests: Requesting access to wikimedia-tech-channel-op for Luke081515 - https://phabricator.wikimedia.org/T172793#3509528 (10Luke081515) [15:01:47] 10Operations, 10ops-codfw: mw2256 - hardware issue - https://phabricator.wikimedia.org/T163346#3509560 (10Papaul) @elukey done Update firmware from 2.40 to 2.41 Update BIOS from 2.3.4 to 2.4.2 [15:01:54] XioNoX: ok good [15:02:50] alright! [15:03:06] !log starting pfw-codfw migration - T171970 [15:03:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:20] T171970: Move codfw frack to new infra - https://phabricator.wikimedia.org/T171970 [15:04:24] disabling the interfaces, and advertising the pfw-codfw space from the new firewalls, I know that at least smokeping will complain, if not other external monitoring [15:05:21] papaul: you're good to move server's uplinks [15:06:18] ok [15:09:26] PROBLEM - Router interfaces on pfw-eqiad is CRITICAL: CRITICAL: host 208.80.154.218, interfaces up: 104, down: 1, dormant: 0, excluded: 3, unused: 0 [15:11:36] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2065854 [15:14:01] ACKNOWLEDGEMENT - Router interfaces on pfw-eqiad is CRITICAL: CRITICAL: host 208.80.154.218, interfaces up: 104, down: 1, dormant: 0, excluded: 3, unused: 0: Ayounsi T171970 [15:16:08] (03PS1) 10Herron: Change system filter for list owner spam from numeric qq.com addresses [puppet] - 10https://gerrit.wikimedia.org/r/370655 [15:16:44] (03CR) 10Herron: [C: 032] Change system filter for list owner spam from numeric qq.com addresses [puppet] - 10https://gerrit.wikimedia.org/r/370655 (owner: 10Herron) [15:18:53] XioNoX: all the servers i am plugging into fasw-c8b have no link [15:19:36] papaul: to the ports listed on T171970 ? 
[15:19:36] T171970: Move codfw frack to new infra - https://phabricator.wikimedia.org/T171970 [15:20:06] mobrovac: o/ do you have time to fix eventbus deployment with me? I am kinda worried that we'll need an urgent deployment and scap will not be available (depool failures) [15:20:31] huh elukey, i'm at debconf now [15:20:37] elukey: when would you like to do it? [15:21:29] papaul: I see link and mac on some ports, like ge-0/0/0 to /2 [15:21:55] XioNoX: give me a minute checking cables [15:23:13] and I can ping payments2001 from the firewall [15:25:30] mobrovac: ah snap I didn't know it! Whenever you want, even later on during the week [15:26:34] it would be awesome to easily be able to mute smokeping alerts [15:26:35] XioNoX: payments2002 you said ge-1/0/3 is it on fasw-c8b-codfw? [15:26:51] yep [15:27:38] XioNoX: ge-1/0/3 or ge-0/0/3 [15:28:13] 1/0/x is on the b node (serial PE3717131102) [15:28:51] 0/0/x is on the A node (serial PE3717130734) [15:28:56] papaul: ^ [15:29:22] so payments2002 is on 1/0/3 [15:33:51] XioNoX: can you see payements2002 now [15:33:57] or ping it [15:34:15] ge-1/0/3 up down [15:34:21] papaul: still shows as down [15:35:22] papaul: payment2002 should be on the other node compared to payment2001 [15:37:46] XioNoX: i have payment2001 on fasw-c8a and payments2002 on fasw-c8b [15:39:02] papaul: ge-1/0/3 (fasw-c8b) still shows as down :/ [15:39:20] XioNoX: let me replace cable [15:39:43] papaul: okay, then skip it and connect pay-lvs2002 (ge-1/0/4) [15:40:09] oh, are they crossover cables? [15:40:24] XioNoX: check now [15:40:56] 1/0/3 is good now [15:40:58] yay [15:41:04] papaul: ^ [15:41:09] XioNoX: ok [15:42:51] XioNoX: pay-ovs2002 [15:42:57] pay-lvs2002 [15:42:59] check [15:43:59] papaul: it pings! 
[15:46:38] XioNoX: ok [15:47:13] (03PS1) 10Ayounsi: Remove pfw-codfw from Smokeping, Rancid, Torrus, Icinga [puppet] - 10https://gerrit.wikimedia.org/r/370658 (https://phabricator.wikimedia.org/T171970) [15:47:54] fyi, those are the interfaces still down https://www.irccloud.com/pastebin/TbWet3sk/ [15:48:43] (03PS1) 10Elukey: role::cache::kafka::webrequest: force Kafka protocol to 0.9.0.1 [puppet] - 10https://gerrit.wikimedia.org/r/370659 (https://phabricator.wikimedia.org/T172681) [15:57:07] (03PS1) 10Ayounsi: Remove pfw-codfw from DNS [dns] - 10https://gerrit.wikimedia.org/r/370661 (https://phabricator.wikimedia.org/T171970) [15:58:58] Jeff_Green: how is it looking so far for you? Seeing hosts coming back to life? [15:59:07] XioNoX: looking [16:00:02] (03PS2) 10Elukey: role::cache::kafka::webrequest: force Kafka protocol to 0.9.0.1 [puppet] - 10https://gerrit.wikimedia.org/r/370659 (https://phabricator.wikimedia.org/T172681) [16:00:02] i am getting to stuff [16:00:05] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170808T1600). [16:00:06] Amir1: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[16:00:14] o/ [16:01:21] XioNoX: done [16:01:24] XioNoX: yeah, so far so good [16:01:39] papaul: I see all the ports as up, thanks [16:01:40] (03PS2) 10Jcrespo: mediawiki: Another increase of batch size in dispatchChanges cronjob [puppet] - 10https://gerrit.wikimedia.org/r/370315 (https://phabricator.wikimedia.org/T171263) (owner: 10Ladsgroup) [16:01:46] XioNoX: yw [16:02:01] papaul: I'll need you for the failover tests once we confirm everything is properly up [16:02:23] ok [16:02:28] (03CR) 10Jcrespo: [C: 032] mediawiki: Another increase of batch size in dispatchChanges cronjob [puppet] - 10https://gerrit.wikimedia.org/r/370315 (https://phabricator.wikimedia.org/T171263) (owner: 10Ladsgroup) [16:02:29] Jeff_Green: please test everything as much as you can :) [16:02:43] (03CR) 10Elukey: [C: 032] role::cache::kafka::webrequest: force Kafka protocol to 0.9.0.1 [puppet] - 10https://gerrit.wikimedia.org/r/370659 (https://phabricator.wikimedia.org/T172681) (owner: 10Elukey) [16:02:45] will do [16:02:54] (03PS3) 10Elukey: role::cache::kafka::webrequest: force Kafka protocol to 0.9.0.1 [puppet] - 10https://gerrit.wikimedia.org/r/370659 (https://phabricator.wikimedia.org/T172681) [16:02:56] (03CR) 10Elukey: [V: 032 C: 032] role::cache::kafka::webrequest: force Kafka protocol to 0.9.0.1 [puppet] - 10https://gerrit.wikimedia.org/r/370659 (https://phabricator.wikimedia.org/T172681) (owner: 10Elukey) [16:04:42] !log rolling restart of varnishkafka-webrequest to apply https://gerrit.wikimedia.org/r/#/c/370659/ (puppet automatically restarts) [16:04:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:26] RECOVERY - Router interfaces on pfw-eqiad is OK: OK: host 208.80.154.218, interfaces up: 104, down: 0, dormant: 0, excluded: 3, unused: 0 [16:06:43] XioNoX: i think lvs may not be working [16:08:03] Jeff_Green: can you tell me more?
[16:09:03] sec [16:09:15] (03CR) 10Gehel: "puppet compiler seems happy" [puppet] - 10https://gerrit.wikimedia.org/r/369682 (owner: 10Gehel) [16:09:23] huh this time it worked [16:09:42] i'm just testing 208.80.152.228 which should be the public IP that routes via lvs to the payments servers [16:09:44] jynus: thanks! [16:10:45] XioNoX: nevermind, it was a typo in the hostname [16:10:53] cool [16:12:50] 10Operations, 10Analytics, 10Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3509765 (10Nuria) Ok, seems that this ticket can be closed as it talks about work that is tracked on tickets and projects elsewhere. [16:12:55] 10Operations, 10Analytics, 10Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3509766 (10Nuria) 05Open>03Resolved [16:13:08] from my side everything looks good so far [16:14:11] 10Operations, 10Analytics, 10Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3509773 (10Halfak) Is switching analytics-store from multi-source to multi-instance tracked elsewhere? Is there a task for identifying the user-implications of making the sw... [16:16:53] 10Operations, 10Analytics, 10Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3509774 (10jcrespo) It is a Tecnology goal, it was already discussed by all managers: https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2017-18_Q1 [16:18:54] (03PS6) 10Gehel: wdqs - remove upstart configuration files [puppet] - 10https://gerrit.wikimedia.org/r/369688 [16:19:23] (03PS7) 10Gehel: wdqs - remove upstart configuration files [puppet] - 10https://gerrit.wikimedia.org/r/369688 [16:24:08] Jeff_Green: how is it going? 
[16:25:07] so far so good [16:25:28] awesome [16:29:10] (03PS8) 10Gehel: wdqs - remove upstart configuration files [puppet] - 10https://gerrit.wikimedia.org/r/369688 [16:29:15] Jeff_Green: let me know when we can do the failover tests [16:29:26] now is fine [16:30:36] PROBLEM - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry mailbox lag is 2026354 [16:32:25] Jeff_Green: anything in particular we should monitor during those tests? [16:32:43] you're just testing what happens i.e. if you kill a firewall? [16:32:43] (03Abandoned) 10Gehel: wdqs - explicit scope for systemd templates to ensure puppet 4 compatibility [puppet] - 10https://gerrit.wikimedia.org/r/369685 (owner: 10Gehel) [16:33:25] Jeff_Green: yeah, and how long it takes to recover [16:35:09] welllll [16:35:31] seems like we can test cross-datacenter pings [16:35:32] (03PS9) 10Gehel: wdqs - remove upstart configuration files [puppet] - 10https://gerrit.wikimedia.org/r/369688 [16:35:34] and public pings [16:35:55] oh cross DC is a good idea [16:36:00] (03PS1) 10Andrew Bogott: nova-fullstack: adjust timeouts [puppet] - 10https://gerrit.wikimedia.org/r/370665 (https://phabricator.wikimedia.org/T165555) [16:36:28] Jeff_Green: what's the jumphost name in eqiad? [16:36:40] we could maybe set up a ping internally running in screen too, but it seems unnecessary [16:36:50] tellurium.wikimedia.org for eqiad [16:39:41] er, password expired [16:39:42] haha [16:40:41] 10Operations, 10Ops-Access-Requests, 10Discovery, 10Wikidata, and 2 others: allow wdqs-admins to pool / depool wdqs servers - https://phabricator.wikimedia.org/T172798#3509813 (10Gehel) [16:40:54] (03PS2) 10Gehel: wdqs - allow wdqs-admins to pool / depool servers [puppet] - 10https://gerrit.wikimedia.org/r/370198 (https://phabricator.wikimedia.org/T172798) [16:40:55] :-P [16:41:02] Jeff_Green: okay, I'm running an mtr from bast2001 to rigel, and another one from tellurium to frbackup2001, sounds good? 
[16:41:26] (03CR) 10Gehel: [C: 04-1] "This requires approval in weekly Ops meeting before merging." [puppet] - 10https://gerrit.wikimedia.org/r/370198 (https://phabricator.wikimedia.org/T172798) (owner: 10Gehel) [16:41:26] PROBLEM - Check Varnish expiry mailbox lag on cp1049 is CRITICAL: CRITICAL: expiry mailbox lag is 2059805 [16:41:38] (03CR) 10Andrew Bogott: [C: 032] nova-fullstack: adjust timeouts [puppet] - 10https://gerrit.wikimedia.org/r/370665 (https://phabricator.wikimedia.org/T165555) (owner: 10Andrew Bogott) [16:41:39] XioNoX: yup, great [16:46:03] Jeff_Green: hum, I'm already seeing some intermittent packet loss over the vpn [16:46:12] 64 bytes from frbackup2001.frack.codfw.wmnet (10.195.0.77): icmp_seq=23 ttl=62 time=37.0 ms [16:46:12] 64 bytes from frbackup2001.frack.codfw.wmnet (10.195.0.77): icmp_seq=32 ttl=62 time=37.1 ms [16:46:19] here is like 9 pings losts [16:46:32] huh [16:47:00] do we have utilization graphs for that link? [16:47:56] yeah on librenms, one sec [16:51:21] papaul: there are some inbound CRC errors on fasw-c8a:xe-0/2/0 [16:51:32] could you please replace the optic there? [16:51:44] and verify cable, etc etc [16:52:34] XioNoX:ok [16:52:38] that's the link to pfw3a [16:53:04] XioNoX im not dcops or anything but if you have anything a volunteer can do to assist with your current task let me know [16:54:13] thank you! 
so far everything is under control, if you're curious we're doing https://phabricator.wikimedia.org/T171970 [16:57:07] Jeff_Green: the ipsec tunnel is under used so far https://librenms.wikimedia.org/device/device=153/tab=port/port=13267/ the CRC error though could be related to the packet loss [16:57:56] PROBLEM - Router interfaces on pfw3-codfw is CRITICAL: CRITICAL: host 208.80.153.197, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0 [16:58:45] XioNoX: done [16:58:56] RECOVERY - Router interfaces on pfw3-codfw is OK: OK: host 208.80.153.197, interfaces up: 65, down: 0, dormant: 0, excluded: 0, unused: 0 [16:59:32] XioNoX we've been seeing connection lag for a while between datacenters, I wonder if that's related [16:59:41] !log Branching mediawiki/master to mediawiki/wmf/1.30.0-wmf.13 refs T170631 [16:59:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:59:52] T170631: 1.30.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T170631 [17:00:04] gwicke, cscott, arlolra, subbu, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170808T1700). [17:00:11] (03PS1) 10Andrew Bogott: bootstrap_vz firstboot: specify eth0 when refreshing dhcp [puppet] - 10https://gerrit.wikimedia.org/r/370668 (https://phabricator.wikimedia.org/T165555) [17:00:16] papaul: thanks [17:00:17] PROBLEM - Host pfw-codfw is DOWN: CRITICAL - Network Unreachable (208.80.153.195) [17:01:26] RECOVERY - Check Varnish expiry mailbox lag on cp1049 is OK: OK: expiry mailbox lag is 0 [17:01:44] pfw-codfw down, expected? [17:02:18] maybe downtime expired? 
[17:02:20] jynus: yeah, the downtime expired [17:02:26] ok, cool [17:03:17] ACKNOWLEDGEMENT - Host pfw-codfw is DOWN: CRITICAL - Network Unreachable (208.80.153.195) Ayounsi pfw-codfw migration [17:03:17] (03CR) 10Andrew Bogott: [C: 032] bootstrap_vz firstboot: specify eth0 when refreshing dhcp [puppet] - 10https://gerrit.wikimedia.org/r/370668 (https://phabricator.wikimedia.org/T165555) (owner: 10Andrew Bogott) [17:04:42] (03CR) 10Smalyshev: [C: 031] wdqs - allow wdqs-admins to pool / depool servers [puppet] - 10https://gerrit.wikimedia.org/r/370198 (https://phabricator.wikimedia.org/T172798) (owner: 10Gehel) [17:18:10] Jeff_Green, papaul, I think I found the root cause of that packet loss issue, let's do the failover tests [17:18:42] cool, pls. go ahead [17:18:50] XioNoX: ok [17:19:45] papaul: please unplug the cr1 – pfw3a link [17:20:34] XioNoX: done [17:21:06] papaul: plug it back and then do the same with cr2-pfw3b [17:21:37] ok [17:21:52] XioNoX: done [17:22:31] papaul: plug it back, then do the same with pfw3a – fasw-c8a link [17:23:24] XioNoX: done [17:23:40] awesome [17:24:08] papaul: re-plug them pfw3b – fasw-c8b link [17:24:29] XioNoX: done [17:25:08] 10Operations, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3509923 (10elukey) All the varnishkafkas are now forced (via lidbrdkafka settings) to communicate using the kafka `0.9.0.... [17:25:50] papaul: replug, and now loet's do something more tricky, unplug the HA control link between the two pfw [17:26:34] XioNoX: done [17:28:12] papaul: please replug [17:28:30] XioNoX: done [17:31:35] need the two nodes to re-sync, etc... 
[17:33:38] XioNoX: the connection lag eqiad->codfw seems a lot better than before today [17:33:41] papaul: okay, unplug the power from pfw3b (backup node) [17:34:23] and plug it back [17:34:26] PROBLEM - Router interfaces on pfw3-codfw is CRITICAL: CRITICAL: host 208.80.153.197, interfaces up: 47, down: 2, dormant: 0, excluded: 0, unused: 0 [17:36:16] XioNoX: done [17:36:47] let's wait for it to come back up [17:37:15] there are only two things left to test, the primary firewall node, and the fabric/data link between the two firewalls [17:38:46] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0 [17:40:47] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [17:42:36] RECOVERY - Router interfaces on pfw3-codfw is OK: OK: host 208.80.153.197, interfaces up: 65, down: 0, dormant: 0, excluded: 0, unused: 0 [17:43:20] out of curiousity how did cr2-codfw go from 120 with only 1 down to having a total of 122? 
[17:43:30] same with pfw3 [17:44:07] unless my math is wrong it should be 121 and 49 respectivly [17:44:11] papaul: please powercycle pfw3a [17:44:14] respectively* [17:44:16] (03PS1) 10Andrew Bogott: nova fullstack: use new jessie-8.9 image [puppet] - 10https://gerrit.wikimedia.org/r/370673 [17:45:01] XioNoX: done [17:45:05] Zppix: I think the check ignores subinterfaces when the primary goes down [17:45:17] ah [17:45:23] that makes sense [17:45:44] (03CR) 10Andrew Bogott: [C: 032] nova fullstack: use new jessie-8.9 image [puppet] - 10https://gerrit.wikimedia.org/r/370673 (owner: 10Andrew Bogott) [17:46:46] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0 [17:47:37] PROBLEM - Router interfaces on pfw3-codfw is CRITICAL: CRITICAL: host 208.80.153.197, interfaces up: 47, down: 2, dormant: 0, excluded: 0, unused: 0 [17:49:26] PROBLEM - puppet last run on cp3038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:49:46] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [17:50:18] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:50:46] RECOVERY - Router interfaces on pfw3-codfw is OK: OK: host 208.80.153.197, interfaces up: 65, down: 0, dormant: 0, excluded: 0, unused: 0 [17:51:06] PROBLEM - puppet last run on cp2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:51:26] PROBLEM - puppet last run on cp3046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:51:29] papaul: while it reboots, I'm still seeing errors on the link between pfw3a and fasw, could you replace the fiber and the interface on the other side? 
(if not alread done) [17:51:36] PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:51:43] pfw3a-codfw:xe-0/0/17 [17:52:01] 10Operations, 10Epic, 10Goal, 10Services (doing), and 2 others: Services Q1 2017/18 goal: Begin migrating job queue processing to multi-DC enabled eventbus infrastructure. - https://phabricator.wikimedia.org/T169937#3510012 (10GWicke) p:05Triage>03Normal [17:52:16] PROBLEM - puppet last run on cp2014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:53:06] PROBLEM - puppet last run on cp1072 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:53:06] PROBLEM - puppet last run on cp1071 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:53:17] PROBLEM - puppet last run on cp1053 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:53:22] ema: ^ [17:53:54] XioNoX: i just replaced both the fiber and the connector on both sides [17:54:02] papaul: thanks [17:54:16] PROBLEM - puppet last run on cp2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:54:17] PROBLEM - puppet last run on cp2026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:54:51] papaul: last thing, please unplug the data link between the two firewalls (xe-0/0/19) [17:55:09] XioNoX: done [17:55:17] PROBLEM - puppet last run on cp1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:55:46] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:55:46] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:55:52] papaul: awesome, plug it back and we're all done with the DC work, thank you! [17:56:21] XioNoX: done [17:56:56] PROBLEM - puppet last run on cp1058 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:56:56] PROBLEM - puppet last run on cp2022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:57:17] PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:57:17] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:57:26] 10Operations, 10Citoid, 10ContentTranslation, 10ContentTranslation-CXserver, and 4 others: Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames - https://phabricator.wikimedia.org/T133001#2216638 (10GWicke) p:05Normal>03Triage [17:57:27] PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:57:46] PROBLEM - puppet last run on cp2025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:57:47] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:57:56] PROBLEM - puppet last run on cp3049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:57:57] PROBLEM - Router interfaces on pfw3-codfw is CRITICAL: CRITICAL: No response from remote host 208.80.153.197 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2 [17:58:17] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:58:26] 10Operations, 10Citoid, 10ContentTranslation, 10ContentTranslation-CXserver, and 4 others: Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames - https://phabricator.wikimedia.org/T133001#2216638 (10GWicke) p:05Triage>03Normal [17:58:56] PROBLEM - puppet last run on cp2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:58:57] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:59:06] PROBLEM - puppet last run on cp2016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:59:06] PROBLEM - puppet last run on cp2019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:59:16] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:59:17] PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:59:36] PROBLEM - BGP status on pfw3-codfw is CRITICAL: BGP CRITICAL - No response from remote host 208.80.153.197 [17:59:56] PROBLEM - puppet last run on cp2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:00:26] RECOVERY - BGP status on pfw3-codfw is OK: BGP OK - up: 1, down: 0, shutdown: 4 [18:00:36] PROBLEM - puppet last run on cp3039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:00:48] PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:00:56] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [18:02:07] PROBLEM - puppet last run on cp3032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:02:17] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:02:36] PROBLEM - puppet last run on cp2012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:02:46] PROBLEM - puppet last run on cp2008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:03:06] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:03:06] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:03:37] PROBLEM - puppet last run on cp2011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:03:37] PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:03:56] RECOVERY - Router interfaces on pfw3-codfw is OK: OK: host 208.80.153.197, interfaces up: 65, down: 0, dormant: 0, excluded: 0, unused: 0 [18:03:56] PROBLEM - puppet last run on cp1045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:04:26] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:04:56] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:04:56] PROBLEM - puppet last run on cp2005 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [18:05:06] PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:05:11] woa, anybody checking? [18:05:27] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:05:36] PROBLEM - puppet last run on cp3037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:05:46] mError: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid parameter force_protocol_version on Varnishkafka::Instance[webrequest] at /etc/puppet/modules/role/manifests/cache/kafka/webrequest.pp:121 on node cp3034.esams.wmnet [18:05:46] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:05:58] so this seems to be related to my last change, but it was a long time ago [18:06:48] like two hours ago [18:06:49] sigh [18:07:26] PROBLEM - puppet last run on cp1073 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:08:06] PROBLEM - puppet last run on cp3031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:08:16] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:08:57] PROBLEM - puppet last run on cp3030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:08:58] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:08:58] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [18:09:06] argh andrewbogott your change reverted mine :( [18:09:15] https://gerrit.wikimedia.org/r/#/c/370673/ [18:09:18] modules varnishkafka [18:09:49] wha? [18:09:53] oh, I see, sorry [18:10:15] This was a rare occasion where I did 'git commit .' instead of adding individual files [18:10:18] lemme see if I can fix [18:10:26] PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:10:36] PROBLEM - puppet last run on cp1074 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:10:46] PROBLEM - puppet last run on cp3035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:10:46] PROBLEM - puppet last run on cp1099 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:10:55] elukey: unless you already have a fix ready to go [18:11:17] andrewbogott: nope, but it should be simply to revert the submodule change.. I can try if you want [18:11:26] !log stop ircecho to avoid puppet shower [18:11:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:11:48] (03PS1) 10Andrew Bogott: Revert "nova fullstack: use new jessie-8.9 image" [puppet] - 10https://gerrit.wikimedia.org/r/370676 [18:11:57] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:12:12] (03CR) 10Andrew Bogott: [C: 032] "This contained a completely unwarranted change to varnishkafka" [puppet] - 10https://gerrit.wikimedia.org/r/370676 (owner: 10Andrew Bogott) [18:13:09] elukey: reverted — I'll redo my patch correctly [18:13:17] andrewbogott: thanks! 
[18:14:15] I hope the revert didn't just cause more chaos [18:14:33] nono I just checked and it should be a no-op (hopefully) [18:15:25] (03PS1) 10Andrew Bogott: nova fullstack: move to new jessie image [puppet] - 10https://gerrit.wikimedia.org/r/370677 [18:15:36] just ran puppet on cp3034, all good (no-op) [18:16:54] (03CR) 10Andrew Bogott: [C: 032] nova fullstack: move to new jessie image [puppet] - 10https://gerrit.wikimedia.org/r/370677 (owner: 10Andrew Bogott) [18:17:10] 10Operations, 10fundraising-tech-ops, 10netops, 10Patch-For-Review: Move codfw frack to new infra - https://phabricator.wikimedia.org/T171970#3510082 (10ayounsi) **Failover tests** Rate: 1 ping per second |Tested item|ping from bast2001 to rigel|ping from external to pfw3|ping from tellurium to frbackup20... [18:17:20] Jeff_Green: https://phabricator.wikimedia.org/T171970#3510082 [18:18:08] It seems like I hit a bug during the manual failover (as it went worse than the "unexpected" one) [18:18:18] ok [18:18:58] Jeff_Green: other than that I'm quite happy with the numbers. We could probably make some of the things faster (like IPsec) [18:19:28] yep, it's certainly better than the >5min before [18:20:27] running cumin/puppet with --failed-only for the cp hosts [18:20:38] irc-echo still disabled [18:24:50] Jeff_Green: an interface still shows some errors, but a very low rate, so nothing to be worried about, will investigate though [18:25:03] ok [18:26:09] will let it sit for an hour or two, then I'll start cleaning up monitoring, etc. 
if you're okay with it [18:26:33] Jeff_Green: ^ [18:26:41] yep, sounds good [18:27:57] !log re-enabled irc-echo after the puppet shower [18:28:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:57] !log frack-codfw moved to new infrastructure [18:29:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:35:09] 10Operations, 10ops-eqiad, 10Analytics-Kanban: Broken disk on analytics1055 - https://phabricator.wikimedia.org/T172808#3510118 (10elukey) [18:36:19] ACKNOWLEDGEMENT - MegaRAID on analytics1055 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T172809 [18:36:28] 10Operations, 10ops-eqiad: Degraded RAID on analytics1055 - https://phabricator.wikimedia.org/T172809#3510136 (10ops-monitoring-bot) [18:39:45] argh I thought that ops-monitoring-bot wasn't working [18:40:06] 10Operations, 10ops-eqiad, 10Analytics-Kanban: Broken disk on analytics1055 - https://phabricator.wikimedia.org/T172808#3510161 (10elukey) [18:40:08] 10Operations, 10ops-eqiad: Degraded RAID on analytics1055 - https://phabricator.wikimedia.org/T172809#3510164 (10elukey) [18:40:26] 10Operations, 10ops-eqiad, 10Analytics-Kanban: Degraded RAID on analytics1055 - https://phabricator.wikimedia.org/T172809#3510136 (10elukey) [19:00:05] twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170808T1900). Please do the needful. 
[19:14:32] (03CR) 10Ayounsi: [C: 032] Remove pfw-codfw from Smokeping, Rancid, Torrus, Icinga [puppet] - 10https://gerrit.wikimedia.org/r/370658 (https://phabricator.wikimedia.org/T171970) (owner: 10Ayounsi) [19:14:39] (03PS2) 10Ayounsi: Remove pfw-codfw from Smokeping, Rancid, Torrus, Icinga [puppet] - 10https://gerrit.wikimedia.org/r/370658 (https://phabricator.wikimedia.org/T171970) [19:36:58] !log twentyafterfour@tin Started scap: deploy 1.30.0-wmf.13 to testwikis and rebuild l10n refs T170631 [19:37:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:37:11] T170631: 1.30.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T170631 [19:44:31] !log twentyafterfour@tin scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="test2wiki" --outdir="/tmp/scap_l10n_224168097" --threads=10 --lang en --quiet' returned non-zero exit status 255 (duration: 07m 33s) [19:44:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:45:28] !log bounce thumbor-instances on thumbor1003 to make sure all memory limits are applied [19:45:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:46:34] !log restart varnish backend on cp1074 [19:46:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:50:05] 10Operations, 10RESTBase, 10RESTBase-API, 10Traffic, 10Services (next): RESTBase support for www.wikimedia.org missing - https://phabricator.wikimedia.org/T133178#3510337 (10GWicke) > On the RESTBase side, though, we have to figure out how to do the transition (for all of our environments as well as for... 
[19:51:46] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 0 [19:53:43] 10Operations, 10Analytics-Kanban, 10Traffic, 10Patch-For-Review, 10User-Elukey: Update Varnishkafka to support TLS encryption/authentication - https://phabricator.wikimedia.org/T165736#3510349 (10Nuria) 05Open>03Resolved [19:58:32] (03PS1) 10Andrew Bogott: labservices: remove ferm rule that opens mysql to all internal hosts [puppet] - 10https://gerrit.wikimedia.org/r/370689 (https://phabricator.wikimedia.org/T169075) [19:59:53] (03CR) 10Andrew Bogott: "My only worry about this is that it also removes the rule for port 3307 to db1011.eqiad.wmnet. I can't think why that would happen, but I'" [puppet] - 10https://gerrit.wikimedia.org/r/370689 (https://phabricator.wikimedia.org/T169075) (owner: 10Andrew Bogott) [20:18:24] (03PS3) 10Mobrovac: Cassandra: Do not include the main DNS in the list of seeds [puppet] - 10https://gerrit.wikimedia.org/r/370554 (https://phabricator.wikimedia.org/T172610) [20:22:55] !log twentyafterfour@tin Started scap: again: deploy 1.30.0-wmf.13 to testwikis and rebuild l10n refs T170631 [20:23:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:23:07] T170631: 1.30.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T170631 [20:25:51] (03CR) 10Mobrovac: [C: 04-1] Cassandra: Do not include the main DNS in the list of seeds (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/370554 (https://phabricator.wikimedia.org/T172610) (owner: 10Mobrovac) [20:26:23] (03CR) 10Mobrovac: "PCC is now looking good - https://puppet-compiler.wmflabs.org/compiler02/7352/" [puppet] - 10https://gerrit.wikimedia.org/r/370554 (https://phabricator.wikimedia.org/T172610) (owner: 10Mobrovac) [20:32:20] !log updated mediawiki.org changelog for 1.30.0-wmf.13 [20:32:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:35:45] 10Operations, 10OfflineContentGenerator, 10Services 
(designing): Improve stability and maintainability of our browser-based PDF render service - https://phabricator.wikimedia.org/T172815#3510411 (10GWicke) [20:37:11] 10Operations, 10OfflineContentGenerator, 10Services (designing): Improve stability and maintainability of our browser-based PDF render service - https://phabricator.wikimedia.org/T172815#3510428 (10GWicke) p:05Triage>03High [20:50:37] RECOVERY - Check Varnish expiry mailbox lag on cp1072 is OK: OK: expiry mailbox lag is 48 [20:59:32] !log ppchelko@tin Started deploy [restbase/deploy@c16fb6b]: Update summary and licensing information for pageviews API [20:59:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:05:50] !log twentyafterfour@tin Finished scap: again: deploy 1.30.0-wmf.13 to testwikis and rebuild l10n refs T170631 (duration: 42m 55s) [21:06:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:06:01] T170631: 1.30.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T170631 [21:08:03] !log ppchelko@tin Finished deploy [restbase/deploy@c16fb6b]: Update summary and licensing information for pageviews API (duration: 08m 31s) [21:08:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:08:33] (03PS1) 1020after4: scap prep: Fix comment in generated LocalSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370745 [21:09:17] (03PS1) 1020after4: group0 wikis to 1.30.0-wmf.13 refs T170631 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370748 [21:09:19] (03CR) 1020after4: [C: 032] group0 wikis to 1.30.0-wmf.13 refs T170631 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370748 (owner: 1020after4) [21:12:14] (03Merged) 10jenkins-bot: group0 wikis to 1.30.0-wmf.13 refs T170631 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370748 (owner: 1020after4) [21:12:28] (03CR) 10jenkins-bot: group0 wikis to 1.30.0-wmf.13 refs T170631 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/370748 (owner: 1020after4) [21:20:46] (03CR) 10Ayounsi: [C: 032] Remove pfw-codfw from DNS [dns] - 10https://gerrit.wikimedia.org/r/370661 (https://phabricator.wikimedia.org/T171970) (owner: 10Ayounsi) [21:25:11] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 wikis to 1.30.0-wmf.13 refs T170631 [21:25:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:25:22] T170631: 1.30.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T170631 [21:36:20] 10Operations: bios defaults on new hardware orders - https://phabricator.wikimedia.org/T112627#3510635 (10RobH) 05Open>03Resolved Unless we order systems racks at a time, vendors don't pre-set bios defaults. We'll have to set these ourselves. [21:40:12] 10Operations, 10OfflineContentGenerator, 10Services (designing): Improve stability and maintainability of our browser-based PDF render service - https://phabricator.wikimedia.org/T172815#3510643 (10GWicke) [21:40:16] 10Operations, 10Electron-PDFs, 10Patch-For-Review, 10Readers-Web-Backlog (Tracking), 10Services (blocked): pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 - https://phabricator.wikimedia.org/T159922#3510642 (10GWicke) [21:54:24] (03PS7) 10Eevans: WIP: Reshape RESTBase Cassandra production cluster; Provision new 3.x cluster [puppet] - 10https://gerrit.wikimedia.org/r/370098 (https://phabricator.wikimedia.org/T169939) [22:00:48] (03PS8) 10Eevans: WIP: Reshape RESTBase Cassandra production cluster; Provision new 3.x cluster [puppet] - 10https://gerrit.wikimedia.org/r/370098 (https://phabricator.wikimedia.org/T169939) [22:01:06] (03CR) 10jerkins-bot: [V: 04-1] WIP: Reshape RESTBase Cassandra production cluster; Provision new 3.x cluster [puppet] - 10https://gerrit.wikimedia.org/r/370098 (https://phabricator.wikimedia.org/T169939) (owner: 10Eevans) [22:13:09] (03PS9) 10Eevans: Reshape RESTBase Cassandra production cluster; 
Provision new 3.x cluster [puppet] - 10https://gerrit.wikimedia.org/r/370098 (https://phabricator.wikimedia.org/T169939) [22:16:06] (03PS10) 10Eevans: Reshape RESTBase Cassandra production cluster; Provision new 3.x cluster [puppet] - 10https://gerrit.wikimedia.org/r/370098 (https://phabricator.wikimedia.org/T169939) [22:17:31] 10Operations, 10Ops-Access-Requests: Requesting access to wikimedia-tech-channel-op for Luke081515 - https://phabricator.wikimedia.org/T172793#3509528 (10RobH) Reference: In #wikimedia-operations, @Luke081515 has the following; Luke081515 +Aiotv This task is to request the same for #wikimedia-te... [22:27:01] 10Operations, 10Release-Engineering-Team (Backlog), 10Services (later), 10Wikimedia-Incident: Review new service 'pre-deployment to production' checklist - https://phabricator.wikimedia.org/T141897#3510867 (10GWicke) p:05High>03Normal [22:33:22] 10Operations, 10RESTBase, 10Services (attic): RESTBase and domain renames - https://phabricator.wikimedia.org/T113307#3510889 (10GWicke) 05Open>03declined Two years later, this has not been an issue in practice. I am going to be bold & decline this task for now. We can resurrect it if needed at a later p... [22:36:47] 10Operations, 10Deployment-Systems, 10Release-Engineering-Team, 10Services: Streamline our service development and deployment process - https://phabricator.wikimedia.org/T93428#3510897 (10GWicke) [22:36:51] 10Operations, 10Deployment-Systems, 10Services (attic): Evaluate Docker as a container deployment tool - https://phabricator.wikimedia.org/T93439#3510895 (10GWicke) 05Open>03declined With all the ongoing work around T170453, this task has lost its usefulness. Much of the information is out of date, and i... 
[22:42:20] 10Operations, 10Parsoid, 10service-runner, 10service-template-node, 10Services (done): Create a standard service template / init / logging / package setup - https://phabricator.wikimedia.org/T88585#3510900 (10GWicke) [22:48:00] 10Operations, 10Release-Engineering-Team, 10Epic, 10Services (watching): FY2017/18 Program 6 - Outcome 2 - Objective 2: Set up a continuous integration and deployment pipeline - https://phabricator.wikimedia.org/T170481#3510904 (10GWicke) [22:53:24] 10Operations, 10TechCom-RfC, 10RfC, 10Services (attic), and 2 others: Service Ownership and Maintenance - https://phabricator.wikimedia.org/T122825#3510908 (10mobrovac) 05Open>03stalled [23:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170808T2300). [23:00:08] Hello [23:00:43] I can SWAT this evening if there is a last minute patch, but it seems the window is empty. [23:03:35] (03PS4) 10Dereckson: Run Lilypond from Firejail [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370358 (https://phabricator.wikimedia.org/T172582) (owner: 10Ebe123) [23:05:04] (03CR) 10Dereckson: "« Should not be merged until I011db0e9a is merged. » can be expressed as a Depends-On property. That ensures Zuul won't merge it by accide" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370358 (https://phabricator.wikimedia.org/T172582) (owner: 10Ebe123) [23:09:15] (03PS3) 10Dereckson: Allow bureaucrats on WMF wikis to grant and remove 'confirmed' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368939 (https://phabricator.wikimedia.org/T101983) (owner: 10MarcoAurelio) [23:12:00] (03CR) 10Ebe123: "> « Should not be merged until I011db0e9a is merged. 
» can be" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370358 (https://phabricator.wikimedia.org/T172582) (owner: 10Ebe123) [23:14:35] (03CR) 10Dereckson: [C: 032] Allow bureaucrats on WMF wikis to grant and remove 'confirmed' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368939 (https://phabricator.wikimedia.org/T101983) (owner: 10MarcoAurelio) [23:16:01] (03Merged) 10jenkins-bot: Allow bureaucrats on WMF wikis to grant and remove 'confirmed' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368939 (https://phabricator.wikimedia.org/T101983) (owner: 10MarcoAurelio) [23:16:11] (03CR) 10jenkins-bot: Allow bureaucrats on WMF wikis to grant and remove 'confirmed' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368939 (https://phabricator.wikimedia.org/T101983) (owner: 10MarcoAurelio) [23:21:16] (03CR) 10Dereckson: "This change grants the right to ADD users to the group, but not to remove them. Could be a little annoying as it doesn't allow easily to u" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368939 (https://phabricator.wikimedia.org/T101983) (owner: 10MarcoAurelio) [23:23:16] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Allow bureaucrats on WMF wikis to grant and remove 'confirmed' (T101983) (duration: 00m 51s) [23:23:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:23:30] T101983: Allow admins adding confirmed user group on all WMF wikis - https://phabricator.wikimedia.org/T101983 [23:30:30] (03CR) 10Dereckson: "I confirm https://ur.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-ur.svg exists." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367946 (https://phabricator.wikimedia.org/T171769) (owner: 10محمد شعیب) [23:40:31] (03Draft1) 10Paladox: Insert the description of the change. 
[puppet] - 10https://gerrit.wikimedia.org/r/370762 [23:40:34] (03PS2) 10Paladox: Phabricator: Fix phab dump script to use variable $app_user and $app_pass [puppet] - 10https://gerrit.wikimedia.org/r/370762 [23:51:53] (03PS3) 10Dereckson: Add urdu logo to mobile site [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367946 (https://phabricator.wikimedia.org/T171769) (owner: 10محمد شعیب) [23:52:04] (03CR) 10jerkins-bot: [V: 04-1] Add urdu logo to mobile site [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367946 (https://phabricator.wikimedia.org/T171769) (owner: 10محمد شعیب) [23:52:23] (03CR) 10Dereckson: "PS3: cleaned spaces" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367946 (https://phabricator.wikimedia.org/T171769) (owner: 10محمد شعیب) [23:59:48] 10Operations, 10Services (done), 10User-mobrovac: nodejs 6.11 - https://phabricator.wikimedia.org/T170548#3510985 (10MaxSem) @debt, I have no idea whether it works with 6.11 or not. It needs to be tested.