[00:00:05] twentyafterfour: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Phabricator update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T0000).
[00:04:45] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.329 second response time https://phabricator.wikimedia.org/T174916
[00:08:39] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[00:17:57] * Krinkle is staging on mwdebug1002
[00:21:45] !log krinkle@deploy1001 Synchronized php-1.33.0-wmf.23/includes/api/ApiStashEdit.php: Ic357dbfcd9ab / T203786 (duration: 00m 57s)
[00:21:46] Operations, PHP 7.2 support: PHP Fatal error: The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp) - https://phabricator.wikimedia.org/T214734 (Krinkle) Still seen.
[00:21:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:50] T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786
[00:21:51] Operations, PHP 7.2 support: PHP Fatal error: The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp) - https://phabricator.wikimedia.org/T214734 (Krinkle)
[00:22:11] Operations, Core Platform Team (PHP7 (TEC4)), Core Platform Team Kanban (Doing), HHVM, and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Krinkle)
[00:32:44] !log jnt push to standardize mr1-*
[00:32:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:56:09] !log jnt push to standardize asw*
[00:56:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:15:24] !log remove sandbox-out6 filter from all routers
[01:15:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:20:06] !log progressive jnt push to standardize cr*
[01:20:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:29:35] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.553 second response time https://phabricator.wikimedia.org/T174916
[01:33:25] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[01:45:51] !log add AS specific policy-statements to cr2-eqsin (but don't apply them yet) - T211930
[01:45:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:45:55] T211930: Add eqsin routing special cases to jnt - https://phabricator.wikimedia.org/T211930
[02:19:13] PROBLEM - Backup of x1 in codfw on db1115 is CRITICAL: Backup for x1 at codfw taken more than 8 days ago: Most recent backup 2019-03-20 02:06:35 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[02:28:33] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.129 second response time https://phabricator.wikimedia.org/T174916
[02:32:31] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[02:49:09] PROBLEM - puppet last run on cloudvirt1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:50:25] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time https://phabricator.wikimedia.org/T174916
[02:50:35] !log restarted pdfrender on scb1004 in order to attempt to address flapping errors
[02:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:54:14] Operations, Electron-PDFs, Core Platform Team Backlog (Attic), Services (attic): electron/pdfrender hangs - https://phabricator.wikimedia.org/T174916 (crusnov) Just a note, the service was flapping for a while, and I have restarted it on scb1004. {F28491495}
[03:02:55] PROBLEM - puppet last run on db1070 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:07:11] PROBLEM - puppet last run on debmonitor1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:15:31] RECOVERY - puppet last run on cloudvirt1009 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[03:33:29] RECOVERY - puppet last run on debmonitor1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:34:31] RECOVERY - puppet last run on db1070 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[03:37:43] (PS1) Smalyshev: Enable new WBCS search together with all search settings. [mediawiki-config] - https://gerrit.wikimedia.org/r/499695
[03:38:40] (PS2) Smalyshev: Enable new WBCS search together with all search settings. [mediawiki-config] - https://gerrit.wikimedia.org/r/499695
[04:24:05] PROBLEM - puppet last run on cloudelastic1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:32:43] RECOVERY - Backup of s7 in codfw on db1115 is OK: Backup for s7 at codfw taken less than 8 days ago and larger than 10 GB: Last one 2019-03-27 20:44:41 from dbstore2001.codfw.wmnet:3317 (103 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[04:36:33] RECOVERY - Mjolnir bulk update failure check - codfw on icinga1001 is OK: (C)2 gt (W)1 gt 0 https://grafana.wikimedia.org/d/000000591/elasticsearch-mjolnir-bulk-updates?orgId=1&from=now-7d&to=now&panelId=1&fullscreen
[04:50:25] RECOVERY - puppet last run on cloudelastic1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[05:15:38] just fyi, Phab is down
[05:39:58] !log Restart apache on phab1001 - phabricator is down
[05:40:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:49] Operations, ops-codfw, DBA, procurement: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Marostegui) a:jcrespo→Papaul
[05:48:36] Operations, ops-codfw, DBA, procurement: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Marostegui) Hostnames updated. Racking proposal, is basically one server per row. And as we have 5 servers and 4 rows, we have to place 2 server within the same row,...
[05:49:42] Operations, ops-codfw, DBA, procurement: rack/setup/install (1) testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui) a:jcrespo→Papaul
[05:51:44] Operations, ops-codfw, DBA, procurement: rack/setup/install (1) testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui) Done. Racking: Any 1G rack is fine. Hostname: db2102.codfw.wmnet
[06:01:14] (PS1) Marostegui: install_server: Allow image of the new codfw DBs. [puppet] - https://gerrit.wikimedia.org/r/499706 (https://phabricator.wikimedia.org/T219461)
[06:03:51] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (RobH)
[06:04:05] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/install (1) testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (RobH)
[06:14:37] (PS1) Marostegui: db-eqiad.php: Clean up non used entries [mediawiki-config] - https://gerrit.wikimedia.org/r/499707
[06:27:40] !log Deploy schema change on s3 codfw, lag will be generated on s3 codfw.
[06:27:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:28:39] PROBLEM - puppet last run on ms-be2014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/root/.screenrc]
[06:43:09] (CR) Giuseppe Lavagetto: profile::mediawiki::maintenance: systemd-timer based periodic jobs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto)
[06:54:16] (PS2) ArielGlenn: enable use of lbzip2 for compressing revison history dumps [puppet] - https://gerrit.wikimedia.org/r/499519 (https://phabricator.wikimedia.org/T214293)
[06:55:05] (CR) Marostegui: [C: +1] mariadb-snapshots: Require wmf mariadb package present [puppet] - https://gerrit.wikimedia.org/r/499599 (https://phabricator.wikimedia.org/T218336) (owner: Jcrespo)
[06:55:53] (CR) ArielGlenn: [C: +2] enable use of lbzip2 for compressing revison history dumps [puppet] - https://gerrit.wikimedia.org/r/499519 (https://phabricator.wikimedia.org/T214293) (owner: ArielGlenn)
[07:00:15] RECOVERY - puppet last run on ms-be2014 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[07:12:17] (CR) Jcrespo: [C: +1] install_server: Allow image of the new codfw DBs. [puppet] - https://gerrit.wikimedia.org/r/499706 (https://phabricator.wikimedia.org/T219461) (owner: Marostegui)
[07:12:39] (PS2) Jcrespo: mariadb-snapshots: Require wmf mariadb package present [puppet] - https://gerrit.wikimedia.org/r/499599 (https://phabricator.wikimedia.org/T218336)
[07:13:22] (CR) Jcrespo: [C: +1] "They will need dns and dhcp patches, too." [puppet] - https://gerrit.wikimedia.org/r/499706 (https://phabricator.wikimedia.org/T219461) (owner: Marostegui)
[07:13:48] (CR) Jcrespo: [C: +1] db-eqiad.php: Clean up non used entries [mediawiki-config] - https://gerrit.wikimedia.org/r/499707 (owner: Marostegui)
[07:14:01] (CR) Marostegui: "yep, those are usually done by papaul" [puppet] - https://gerrit.wikimedia.org/r/499706 (https://phabricator.wikimedia.org/T219461) (owner: Marostegui)
[07:14:08] (PS2) Marostegui: install_server: Allow image of the new codfw DBs. [puppet] - https://gerrit.wikimedia.org/r/499706 (https://phabricator.wikimedia.org/T219461)
[07:14:19] (PS1) Marostegui: db-codfw.php: Depool pc2008 [mediawiki-config] - https://gerrit.wikimedia.org/r/499711
[07:15:00] (CR) Marostegui: [C: +2] install_server: Allow image of the new codfw DBs. [puppet] - https://gerrit.wikimedia.org/r/499706 (https://phabricator.wikimedia.org/T219461) (owner: Marostegui)
[07:15:35] (CR) Marostegui: [C: +2] db-eqiad.php: Clean up non used entries [mediawiki-config] - https://gerrit.wikimedia.org/r/499707 (owner: Marostegui)
[07:15:59] PROBLEM - Backup of m1 in codfw on db1115 is CRITICAL: Backup for m1 at codfw taken more than 8 days ago: Most recent backup 2019-03-20 06:55:48 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[07:16:24] (PS3) Jcrespo: mariadb-snapshots: Require wmf mariadb package present [puppet] - https://gerrit.wikimedia.org/r/499599 (https://phabricator.wikimedia.org/T218336)
[07:16:38] (Merged) jenkins-bot: db-eqiad.php: Clean up non used entries [mediawiki-config] - https://gerrit.wikimedia.org/r/499707 (owner: Marostegui)
[07:18:04] (CR) Jcrespo: [C: +1] "> yep, those are usually done by papaul" [puppet] - https://gerrit.wikimedia.org/r/499706 (https://phabricator.wikimedia.org/T219461) (owner: Marostegui)
[07:18:05] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Clean up old non used entries (duration: 01m 04s)
[07:18:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:16] (CR) Jcrespo: [C: +2] mariadb-snapshots: Require wmf mariadb package present [puppet] - https://gerrit.wikimedia.org/r/499599 (https://phabricator.wikimedia.org/T218336) (owner: Jcrespo)
[07:18:43] Operations, Analytics, Analytics-Kanban, Wikimedia-Mailing-lists, Patch-For-Review: Sunset Wikimetrics - https://phabricator.wikimedia.org/T211835 (mforns) Thanks @ema!
[07:18:48] Operations, Analytics, Analytics-Kanban, Wikimedia-Mailing-lists, Patch-For-Review: Sunset Wikimetrics - https://phabricator.wikimedia.org/T211835 (mforns)
[07:19:26] (CR) Marostegui: [C: +2] db-codfw.php: Depool pc2008 [mediawiki-config] - https://gerrit.wikimedia.org/r/499711 (owner: Marostegui)
[07:20:49] (Merged) jenkins-bot: db-codfw.php: Depool pc2008 [mediawiki-config] - https://gerrit.wikimedia.org/r/499711 (owner: Marostegui)
[07:22:05] (CR) jenkins-bot: db-eqiad.php: Clean up non used entries [mediawiki-config] - https://gerrit.wikimedia.org/r/499707 (owner: Marostegui)
[07:22:06] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Depool pc2008 for upgrade (duration: 00m 57s)
[07:22:07] (CR) jenkins-bot: db-codfw.php: Depool pc2008 [mediawiki-config] - https://gerrit.wikimedia.org/r/499711 (owner: Marostegui)
[07:22:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:22:09] !log Upgrade pc2008
[07:22:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:19] (PS1) Marostegui: Revert "db-codfw.php: Depool pc2008" [mediawiki-config] - https://gerrit.wikimedia.org/r/499713
[07:27:32] (PS1) Elukey: role::mediawiki::maintenance: raise the mcrouter's conn to 5 [puppet] - https://gerrit.wikimedia.org/r/499714
[07:30:06] (CR) Marostegui: [C: +2] Revert "db-codfw.php: Depool pc2008" [mediawiki-config] - https://gerrit.wikimedia.org/r/499713 (owner: Marostegui)
[07:30:13] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/15393/mwmaint1002.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/499714 (owner: Elukey)
[07:31:06] (Merged) jenkins-bot: Revert "db-codfw.php: Depool pc2008" [mediawiki-config] - https://gerrit.wikimedia.org/r/499713 (owner: Marostegui)
[07:32:56] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool pc2008 after upgrade (duration: 00m 57s)
[07:32:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:33:26] (CR) jenkins-bot: Revert "db-codfw.php: Depool pc2008" [mediawiki-config] - https://gerrit.wikimedia.org/r/499713 (owner: Marostegui)
[07:38:13] PROBLEM - puppet last run on db1062 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:46:31] PROBLEM - SSH on dbprov2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:46:52] (PS1) Elukey: Switch the Yarn rmstore config back to zookeeper for Analytics Hadoop test [puppet] - https://gerrit.wikimedia.org/r/499715 (https://phabricator.wikimedia.org/T218758)
[07:47:45] RECOVERY - SSH on dbprov2001 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:51:17] RECOVERY - Mjolnir bulk update failure check - eqiad on icinga1001 is OK: (C)2 gt (W)1 gt 0 https://grafana.wikimedia.org/d/000000591/elasticsearch-mjolnir-bulk-updates?orgId=1&from=now-7d&to=now&panelId=1&fullscreen
[07:51:45] (PS1) Elukey: Switch the Yarn rmstore config back to zookeeper for Analytics Hadoop [puppet] - https://gerrit.wikimedia.org/r/499716 (https://phabricator.wikimedia.org/T218758)
[07:52:15] RECOVERY - Backup of s4 in codfw on db1115 is OK: Backup for s4 at codfw taken less than 8 days ago and larger than 10 GB: Last one 2019-03-28 00:41:56 from dbstore2002.codfw.wmnet:3314 (111 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[07:58:40] (Abandoned) Vgutierrez: acme_chief: Add a LE ACMEv2 staging environment account [puppet] - https://gerrit.wikimedia.org/r/499426 (https://phabricator.wikimedia.org/T213705) (owner: Vgutierrez)
[07:59:09] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/install (1) testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (jcrespo) vlan: private This will be like any other production db-hosts, documented here: https://wikitech.wikimedia.org/wiki/Raid_setup
[08:00:07] RECOVERY - Memory correctable errors -EDAC- on mw2206 is OK: (C)4 ge (W)2 ge 1 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw2206&var-datasource=codfw+prometheus/ops
[08:01:04] (PS7) Vgutierrez: acme_chief: Issue the global unified wildcard certificate [puppet] - https://gerrit.wikimedia.org/r/499185 (https://phabricator.wikimedia.org/T213705)
[08:04:33] RECOVERY - puppet last run on db1062 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[08:06:55] (PS1) Marostegui: db-codfw.php: Depool pc2009 [mediawiki-config] - https://gerrit.wikimedia.org/r/499718
[08:07:13] (CR) Vgutierrez: [C: +2] acme_chief: Issue the global unified wildcard certificate [puppet] - https://gerrit.wikimedia.org/r/499185 (https://phabricator.wikimedia.org/T213705) (owner: Vgutierrez)
[08:07:53] !log gehel@cumin2001 START - Cookbook sre.elasticsearch.force-shard-allocation
[08:07:53] RECOVERY - EDAC syslog messages on mw2206 is OK: (C)4 ge (W)2 ge 1.001 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw2206&var-datasource=codfw+prometheus/ops
[08:07:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:00] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Marostegui)
[08:09:35] (CR) Marostegui: [C: +2] db-codfw.php: Depool pc2009 [mediawiki-config] - https://gerrit.wikimedia.org/r/499718 (owner: Marostegui)
[08:10:41] !log gehel@cumin2001 END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
[08:10:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:44] (Merged) jenkins-bot: db-codfw.php: Depool pc2009 [mediawiki-config] - https://gerrit.wikimedia.org/r/499718 (owner: Marostegui)
[08:11:53] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Depool pc2009 for upgrade (duration: 00m 57s)
[08:11:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:00] !log Upgrade pc2009
[08:12:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:14:23] Operations, Acme-chief, Traffic, Goal, Patch-For-Review: Deploy managed LetsEncrypt certs for all public use-cases - https://phabricator.wikimedia.org/T213705 (Vgutierrez) ` root@acmechief1001:~# openssl x509 -text -noout -in /var/lib/acme-chief/certs/unified/live/rsa-2048.crt Certificate:...
[08:16:45] (CR) jenkins-bot: db-codfw.php: Depool pc2009 [mediawiki-config] - https://gerrit.wikimedia.org/r/499718 (owner: Marostegui)
[08:18:00] (PS1) Marostegui: Revert "db-codfw.php: Depool pc2009" [mediawiki-config] - https://gerrit.wikimedia.org/r/499719
[08:19:09] (PS3) Vgutierrez: acme_chief: Issue wikiba.se certificate [puppet] - https://gerrit.wikimedia.org/r/499189 (https://phabricator.wikimedia.org/T213705)
[08:19:29] (CR) Elukey: [C: +2] Switch the Yarn rmstore config back to zookeeper for Analytics Hadoop test [puppet] - https://gerrit.wikimedia.org/r/499715 (https://phabricator.wikimedia.org/T218758) (owner: Elukey)
[08:19:38] (PS2) Elukey: Switch the Yarn rmstore config back to zookeeper for Analytics Hadoop test [puppet] - https://gerrit.wikimedia.org/r/499715 (https://phabricator.wikimedia.org/T218758)
[08:19:40] (CR) Vgutierrez: acme_chief: Issue wikiba.se certificate (1 comment) [puppet] - https://gerrit.wikimedia.org/r/499189 (https://phabricator.wikimedia.org/T213705) (owner: Vgutierrez)
[08:20:30] (CR) Marostegui: [C: +2] Revert "db-codfw.php: Depool pc2009" [mediawiki-config] - https://gerrit.wikimedia.org/r/499719 (owner: Marostegui)
[08:21:17] (PS2) Hashar: gerrit: admins: ops -> gerritadmin [puppet] - https://gerrit.wikimedia.org/r/498431
[08:21:31] (Merged) jenkins-bot: Revert "db-codfw.php: Depool pc2009" [mediawiki-config] - https://gerrit.wikimedia.org/r/499719 (owner: Marostegui)
[08:23:13] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool pc2009 after upgrade (duration: 00m 57s)
[08:23:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:36] (PS1) Marostegui: db-codfw.php: Depool pc2007 [mediawiki-config] - https://gerrit.wikimedia.org/r/499720
[08:24:51] (CR) Giuseppe Lavagetto: [C: +1] gerrit: admins: ops -> gerritadmin [puppet] - https://gerrit.wikimedia.org/r/498431 (owner: Hashar)
[08:27:44] (CR) jenkins-bot: Revert "db-codfw.php: Depool pc2009" [mediawiki-config] - https://gerrit.wikimedia.org/r/499719 (owner: Marostegui)
[08:28:42] (CR) Gehel: [C: -1] elasticsearch: use standard resources for icinga checks (1 comment) [puppet] - https://gerrit.wikimedia.org/r/499511 (https://phabricator.wikimedia.org/T214921) (owner: Mathew.onipe)
[08:29:35] (CR) Marostegui: [C: +2] db-codfw.php: Depool pc2007 [mediawiki-config] - https://gerrit.wikimedia.org/r/499720 (owner: Marostegui)
[08:30:35] (Merged) jenkins-bot: db-codfw.php: Depool pc2007 [mediawiki-config] - https://gerrit.wikimedia.org/r/499720 (owner: Marostegui)
[08:31:37] (PS6) Filippo Giunchedi: profile: kafkatee instance for udp2log compat [puppet] - https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989)
[08:31:53] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Depool pc2007 for upgrade (duration: 00m 56s)
[08:31:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:11] !log Upgrade pc2007
[08:32:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:46] !log move hadoop yarn configuration from hdfs back to zookeeper - T218758
[08:33:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:51] T218758: Improve speed and reliability of Yarn's Resource Manager failover - https://phabricator.wikimedia.org/T218758
[08:35:05] (PS2) Elukey: Switch the Yarn rmstore config back to zookeeper for Analytics Hadoop [puppet] - https://gerrit.wikimedia.org/r/499716 (https://phabricator.wikimedia.org/T218758)
[08:37:42] !log retry shard allocation on elasticsearch codfw (curl -k -XPOST 'https://localhost:9243/_cluster/reroute?pretty&explain=true&retry_failed')
[08:37:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:57] (PS1) Marostegui: Revert "db-codfw.php: Depool pc2007" [mediawiki-config] - https://gerrit.wikimedia.org/r/499723
[08:38:36] (CR) jenkins-bot: db-codfw.php: Depool pc2007 [mediawiki-config] - https://gerrit.wikimedia.org/r/499720 (owner: Marostegui)
[08:38:42] !log retry shard allocation on elasticsearch codfw all clusters (curl -k -XPOST 'https://localhost:9243/_cluster/reroute?pretty&explain=true&retry_failed') - T218878
[08:38:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:46] T218878: Upgrade to elasticsearch 6.5.4 for cirrus / codfw - https://phabricator.wikimedia.org/T218878
[08:41:09] (CR) Marostegui: [C: +2] Revert "db-codfw.php: Depool pc2007" [mediawiki-config] - https://gerrit.wikimedia.org/r/499723 (owner: Marostegui)
[08:41:52] (PS1) Jcrespo: WMFBackup: Reimplement os.rename to make work on different fs [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/499724 (https://phabricator.wikimedia.org/T206203)
[08:42:08] (Merged) jenkins-bot: Revert "db-codfw.php: Depool pc2007" [mediawiki-config] - https://gerrit.wikimedia.org/r/499723 (owner: Marostegui)
[08:42:16] (CR) jerkins-bot: [V: -1] WMFBackup: Reimplement os.rename to make work on different fs [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/499724 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[08:43:17] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool pc2007 after upgrade (duration: 00m 57s)
[08:43:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:42] (PS2) Jcrespo: WMFBackup: Reimplement os.rename to make it work on different fs [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/499724 (https://phabricator.wikimedia.org/T206203)
[08:44:04] (CR) jerkins-bot: [V: -1] WMFBackup: Reimplement os.rename to make it work on different fs [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/499724 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[08:44:22] (CR) Elukey: [C: +2] Switch the Yarn rmstore config back to zookeeper for Analytics Hadoop [puppet] - https://gerrit.wikimedia.org/r/499716 (https://phabricator.wikimedia.org/T218758) (owner: Elukey)
[08:44:39] (CR) Vgutierrez: dynamicproxy: Prevent STS header from non-TLS connections (1 comment) [puppet] - https://gerrit.wikimedia.org/r/499669 (https://phabricator.wikimedia.org/T102367) (owner: BryanDavis)
[08:48:27] (PS1) Jcrespo: mariadb-backups: Update code to the latest version [puppet] - https://gerrit.wikimedia.org/r/499726 (https://phabricator.wikimedia.org/T206203)
[08:48:49] (CR) Dzahn: [C: +2] smokeping: replace bast2001 (A5) with bast2002 (B5) target [puppet] - https://gerrit.wikimedia.org/r/499421 (https://phabricator.wikimedia.org/T196665) (owner: Dzahn)
[08:48:59] (PS3) Dzahn: smokeping: replace bast2001 (A5) with bast2002 (B5) target [puppet] - https://gerrit.wikimedia.org/r/499421 (https://phabricator.wikimedia.org/T196665)
[08:49:33] (CR) jerkins-bot: [V: -1] mariadb-backups: Update code to the latest version [puppet] - https://gerrit.wikimedia.org/r/499726 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[08:49:38] (CR) jenkins-bot: Revert "db-codfw.php: Depool pc2007" [mediawiki-config] - https://gerrit.wikimedia.org/r/499723 (owner: Marostegui)
[08:51:45] (PS2) Jcrespo: mariadb-backups: Update code to the latest version [puppet] - https://gerrit.wikimedia.org/r/499726 (https://phabricator.wikimedia.org/T206203)
[08:51:56] (CR) Vgutierrez: "This looks good but right now we intend to deploy the unified certificate in eqsin along with the current certificate and switch traffic a" [puppet] - https://gerrit.wikimedia.org/r/497929 (https://phabricator.wikimedia.org/T182927) (owner: Alex Monk)
[08:52:15] (PS1) Marostegui: db-eqiad.php: Depool db1123 [mediawiki-config] - https://gerrit.wikimedia.org/r/499727
[08:54:07] (CR) Jcrespo: [V: +2 C: +2] WMFBackup: Reimplement os.rename to make it work on different fs [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/499724 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[08:54:25] (PS3) Jcrespo: mariadb-backups: Update code to the latest version [puppet] - https://gerrit.wikimedia.org/r/499726 (https://phabricator.wikimedia.org/T206203)
[08:54:51] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1123 [mediawiki-config] - https://gerrit.wikimedia.org/r/499727 (owner: Marostegui)
[08:55:52] (Merged) jenkins-bot: db-eqiad.php: Depool db1123 [mediawiki-config] - https://gerrit.wikimedia.org/r/499727 (owner: Marostegui)
[08:56:33] (PS1) Dzahn: smokeping: add authdns2001 as target host in A5 [puppet] - https://gerrit.wikimedia.org/r/499728
[08:57:09] (CR) Dzahn: "thanks Arzhel, here's a change to add authdns2001 as well https://gerrit.wikimedia.org/r/c/operations/puppet/+/499728" [puppet] - https://gerrit.wikimedia.org/r/499421 (https://phabricator.wikimedia.org/T196665) (owner: Dzahn)
[08:57:10] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1123 (duration: 00m 55s)
[08:57:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:48] (CR) Jcrespo: [C: +2] mariadb-backups: Update code to the latest version [puppet] - https://gerrit.wikimedia.org/r/499726 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[08:59:55] Operations, Performance-Team, Traffic: Send peering requests to AS with the worst TTFB - https://phabricator.wikimedia.org/T219486 (Gilles)
[09:00:41] (CR) jenkins-bot: db-eqiad.php: Depool db1123 [mediawiki-config] - https://gerrit.wikimedia.org/r/499727 (owner: Marostegui)
[09:00:43] !log restarting elasticsearch-psi on elastic2036 (shards stuck in recovery) - T218878
[09:00:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:00:46] T218878: Upgrade to elasticsearch 6.5.4 for cirrus / codfw - https://phabricator.wikimedia.org/T218878
[09:02:46] (CR) Giuseppe Lavagetto: Add an update action (8 comments) [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/487793 (owner: Giuseppe Lavagetto)
[09:05:21] (PS9) Dzahn: openldap/offboard-user: add wikitech user deactivation [puppet] - https://gerrit.wikimedia.org/r/498429
[09:05:59] !log restarting elasticsearch-psi on elastic20[35,36,53] (shards stuck in recovery) - T218878
[09:06:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:02] T218878: Upgrade to elasticsearch 6.5.4 for cirrus / codfw - https://phabricator.wikimedia.org/T218878
[09:07:31] (CR) Dzahn: [C: -2] "ok. i understood the last actions as "not worth it since it's going to be shutdown anyways"" [puppet] - https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[09:09:11] !log restarting elasticsearch-omega on elastic2050 (shards stuck in recovery) - T218878
[09:09:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:12:35] (PS2) Dzahn: Revert "gerrit: Disable jgit gc" [puppet] - https://gerrit.wikimedia.org/r/499289 (owner: Paladox)
[09:13:22] (CR) Alex Monk: [C: -1] "per Valentin this will need more work to be useful in prod" [puppet] - https://gerrit.wikimedia.org/r/497929 (https://phabricator.wikimedia.org/T182927) (owner: Alex Monk)
[09:14:14] !log install rsyslog 8.1901.0-1~bpo8+wmf1 on phab1001 and copper
[09:14:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:23] (CR) Alexandros Kosiaris: [C: -1] jessie-backports: Remove unsued pins (2 comments) [puppet] - https://gerrit.wikimedia.org/r/499453 (https://phabricator.wikimedia.org/T219333) (owner: Jbond)
[09:15:02] sigh, not copper, cobalt!
[09:16:06] (PS1) Jcrespo: mariadb-backups: Configure x1 as the backup source for codfw [puppet] - https://gerrit.wikimedia.org/r/499729 (https://phabricator.wikimedia.org/T206203)
[09:16:43] (PS2) Jcrespo: mariadb-backups: Configure x1 as the backup source for codfw [puppet] - https://gerrit.wikimedia.org/r/499729 (https://phabricator.wikimedia.org/T206203)
[09:17:24] (CR) Alexandros Kosiaris: [C: +1] "March 25 is past. I am guessing this can be merged now?" [puppet] - https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[09:18:41] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 76 probes of 401 (alerts on 35) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[09:19:13] !log restarting elasticsearch-omega on elastic20[38,50] (shards stuck in recovery) - T218878
[09:19:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:17] T218878: Upgrade to elasticsearch 6.5.4 for cirrus / codfw - https://phabricator.wikimedia.org/T218878
[09:19:59] RECOVERY - Backup of s1 in codfw on db1115 is OK: Backup for s1 at codfw taken less than 8 days ago and larger than 10 GB: Last one 2019-03-28 04:37:52 from dbstore2002.codfw.wmnet:3311 (144 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[09:20:18] (CR) Marostegui: [C: +1] mariadb-backups: Configure x1 as the backup source for codfw [puppet] - https://gerrit.wikimedia.org/r/499729 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[09:21:59] Operations, Analytics, hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (elukey) Open→Stalled Pending hardware procurement in https://phabricator.wikimedia.org/T217668
[09:22:02] Operations, Analytics, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (elukey)
[09:23:25] (CR) Alexandros Kosiaris: [C: +1] [WIP] POST test event to service for readinessProbe [deployment-charts] - https://gerrit.wikimedia.org/r/499576 (https://phabricator.wikimedia.org/T218680) (owner: Ottomata)
[09:23:38] Operations, Analytics, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (elukey) Open→Stalled All the info tracked in T216226. We are g...
[09:24:57] (CR) Dzahn: [C: +2] Revert "gerrit: Disable jgit gc" [puppet] - https://gerrit.wikimedia.org/r/499289 (owner: Paladox)
[09:27:49] (PS2) Giuseppe Lavagetto: arclamp: fix arclamp-grep file format [puppet] - https://gerrit.wikimedia.org/r/499554
[09:27:51] (PS5) Giuseppe Lavagetto: arclamp: add a second instance for excimer logs [puppet] - https://gerrit.wikimedia.org/r/499222 (https://phabricator.wikimedia.org/T176916)
[09:28:30] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1123" [mediawiki-config] - https://gerrit.wikimedia.org/r/499731
[09:28:30] !log restarting elasticsearch on elastic20[25,27] (shards stuck in recovery) - T218878
[09:28:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:33] T218878: Upgrade to elasticsearch 6.5.4 for cirrus / codfw - https://phabricator.wikimedia.org/T218878
[09:29:35] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1123" [mediawiki-config] - https://gerrit.wikimedia.org/r/499731 (owner: Marostegui)
[09:29:45] (PS2) Dzahn: smokeping: add authdns2001 as target host in A5 [puppet] - https://gerrit.wikimedia.org/r/499728
[09:30:06] (CR) Dzahn: [C: +2] "per comments on https://gerrit.wikimedia.org/r/c/operations/puppet/+/499421" [puppet] - https://gerrit.wikimedia.org/r/499728 (owner: Dzahn)
[09:30:26] (CR) Muehlenhoff: "At this point only the checker instances remain in the old cluster:" [puppet] - https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[09:31:04] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1123" [mediawiki-config] - https://gerrit.wikimedia.org/r/499731 (owner: Marostegui)
[09:32:56] (PS7) Filippo Giunchedi: profile: kafkatee instance for udp2log compat [puppet] - https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989)
[09:32:58] (PS1) Filippo Giunchedi: logrotate: add old_dir parameter [puppet] - https://gerrit.wikimedia.org/r/499734 (https://phabricator.wikimedia.org/T126989)
[09:33:34] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1123 (duration: 00m 56s)
[09:33:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:15] (PS1) Dzahn: smokeping: correct host name of authdns2001 [puppet] - https://gerrit.wikimedia.org/r/499735
[09:34:47] (CR) Filippo Giunchedi: profile: kafkatee instance for udp2log compat (1 comment) [puppet] - https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) (owner: Filippo Giunchedi)
[09:35:08] (CR) jerkins-bot: [V: -1] profile: kafkatee instance for udp2log compat [puppet] - https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) (owner: Filippo Giunchedi)
[09:35:10] (CR) Dzahn: [C: +2] smokeping: correct host name of authdns2001 [puppet] - https://gerrit.wikimedia.org/r/499735 (owner: Dzahn)
[09:35:16] (PS3) Jcrespo: mariadb-backups: Configure x1 as the backup source for codfw [puppet] - https://gerrit.wikimedia.org/r/499729 (https://phabricator.wikimedia.org/T206203)
[09:35:25] (PS4) Jcrespo: mariadb-backups: Configure x1 as the backup source for codfw [puppet] - https://gerrit.wikimedia.org/r/499729 (https://phabricator.wikimedia.org/T206203)
[09:35:27] (CR) Muehlenhoff: "Looks good, one note inline." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/499667 (https://phabricator.wikimedia.org/T213708) (owner: Cwhite)
[09:35:33] (CR) Dzahn: "duh..
typos https://gerrit.wikimedia.org/r/c/operations/puppet/+/499735" [puppet] - 10https://gerrit.wikimedia.org/r/499728 (owner: 10Dzahn) [09:36:52] (03PS1) 10Jcrespo: mariadb-backups: Once transferred, chown and prepare them as non-root [puppet] - 10https://gerrit.wikimedia.org/r/499736 (https://phabricator.wikimedia.org/T206203) [09:37:05] (03CR) 10Jcrespo: [C: 03+2] mariadb-backups: Configure x1 as the backup source for codfw [puppet] - 10https://gerrit.wikimedia.org/r/499729 (https://phabricator.wikimedia.org/T206203) (owner: 10Jcrespo) [09:37:07] jouncebot: now [09:37:07] No deployments scheduled for the next 1 hour(s) and 22 minute(s) [09:37:27] !log restarting elasticsearch-psi on elastic20[39,40] (shards stuck in recovery) - T218878 [09:37:29] * addshore is going to prepare a wikibase backport for an UBN https://phabricator.wikimedia.org/T219452 for deploying in the next hour [09:37:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:37:30] T218878: Upgrade to elasticsearch 6.5.4 for cirrus / codfw - https://phabricator.wikimedia.org/T218878 [09:38:05] (03CR) 10jerkins-bot: [V: 04-1] mariadb-backups: Once transferred, chown and prepare them as non-root [puppet] - 10https://gerrit.wikimedia.org/r/499736 (https://phabricator.wikimedia.org/T206203) (owner: 10Jcrespo) [09:38:12] (03Abandoned) 10Dzahn: simplelamp: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/415510 (https://phabricator.wikimedia.org/T202574) (owner: 10Dzahn) [09:39:20] (03PS1) 10Marostegui: db-eqiad.php: Change parsercache key [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499737 (https://phabricator.wikimedia.org/T210725) [09:39:33] (03CR) 10Marostegui: [C: 04-2] "Do not submit" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499737 (https://phabricator.wikimedia.org/T210725) (owner: 10Marostegui) [09:41:00] (03CR) 10Marostegui: [C: 04-2] "Check my plan at: https://phabricator.wikimedia.org/T210725#5065110" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/499737 (https://phabricator.wikimedia.org/T210725) (owner: 10Marostegui) [09:41:53] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665 (10Dzahn) [09:42:11] !log restarting elasticsearch on elastic20[28,29,41] (shards stuck in recovery) - T218878 [09:42:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:42:42] (03PS2) 10Jcrespo: mariadb-backups: Once transferred, chown and prepare them as non-root [puppet] - 10https://gerrit.wikimedia.org/r/499736 (https://phabricator.wikimedia.org/T206203) [09:42:44] (03PS2) 10Dzahn: network::constants: remove bast2001 [puppet] - 10https://gerrit.wikimedia.org/r/499449 (https://phabricator.wikimedia.org/T219492) [09:43:23] (03PS2) 10Dzahn: turn bast2001 into a spare, replaced by bast2002 [puppet] - 10https://gerrit.wikimedia.org/r/499224 (https://phabricator.wikimedia.org/T219492) [09:45:25] (03PS1) 10Dzahn: install_server: switch bast2001 to stretch installer [puppet] - 10https://gerrit.wikimedia.org/r/499739 (https://phabricator.wikimedia.org/T219492) [09:46:55] (03PS1) 10Dzahn: bastionhost: remove rsync and motd warning for bast2001 [puppet] - 10https://gerrit.wikimedia.org/r/499740 (https://phabricator.wikimedia.org/T219492) [09:46:58] (03CR) 10Jcrespo: "Let me know what you think." 
[puppet] - 10https://gerrit.wikimedia.org/r/499736 (https://phabricator.wikimedia.org/T206203) (owner: 10Jcrespo) [09:47:21] RECOVERY - Backup of m1 in codfw on db1115 is OK: Backup for m1 at codfw taken less than 8 days ago and larger than 10 GB: Last one 2019-03-28 09:00:40 from db2078.codfw.wmnet:3321 (13 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups [09:47:56] (03CR) 10Giuseppe Lavagetto: [C: 03+2] arclamp: fix arclamp-grep file format [puppet] - 10https://gerrit.wikimedia.org/r/499554 (owner: 10Giuseppe Lavagetto) [09:48:00] (03PS8) 10Filippo Giunchedi: profile: kafkatee instance for udp2log compat [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) [09:48:04] (03CR) 10Dzahn: [C: 03+2] "probably gets decom'ed but if we reinstall we want stretch" [puppet] - 10https://gerrit.wikimedia.org/r/499739 (https://phabricator.wikimedia.org/T219492) (owner: 10Dzahn) [09:48:09] (03PS3) 10Giuseppe Lavagetto: arclamp: fix arclamp-grep file format [puppet] - 10https://gerrit.wikimedia.org/r/499554 [09:48:15] (03PS2) 10Dzahn: install_server: switch bast2001 to stretch installer [puppet] - 10https://gerrit.wikimedia.org/r/499739 (https://phabricator.wikimedia.org/T219492) [09:48:20] (03CR) 10Marostegui: "I know we discussed it, but we were not fully sure if it needed root or not, did you try it locally with a non privileged user?" 
[puppet] - 10https://gerrit.wikimedia.org/r/499736 (https://phabricator.wikimedia.org/T206203) (owner: 10Jcrespo) [09:48:24] lets joe win the race [09:48:34] <_joe_> ahah thanks [09:49:03] (03CR) 10jerkins-bot: [V: 04-1] profile: kafkatee instance for udp2log compat [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [09:49:08] 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10dr0ptp4kt) Okay, this has been sitting in draft for too long, so I'm going to provide this simply so that... [09:49:22] <_joe_> mutante: merged [09:49:32] _joe_: 'k thx [09:49:48] (03PS3) 10Dzahn: install_server: switch bast2001 to stretch installer [puppet] - 10https://gerrit.wikimedia.org/r/499739 (https://phabricator.wikimedia.org/T219492) [09:51:49] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1123" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499731 (owner: 10Marostegui) [09:52:12] (03PS9) 10Filippo Giunchedi: profile: kafkatee instance for udp2log compat [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) [09:52:31] (03CR) 10Arturo Borrero Gonzalez: "This should be fine. All of our servers should support systemd by now. On the other hand, this change worth coordinating with @Bstorm." 
[puppet] - 10https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [09:55:05] (03CR) 10Jcrespo: "> I know we discussed it, but we were not fully sure if it needed" [puppet] - 10https://gerrit.wikimedia.org/r/499736 (https://phabricator.wikimedia.org/T206203) (owner: 10Jcrespo) [09:56:36] !log restarting elasticsearch-omega on elastic2031 (shards stuck in recovery) - T218878 [09:56:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:40] T218878: Upgrade to elasticsearch 6.5.4 for cirrus / codfw - https://phabricator.wikimedia.org/T218878 [09:57:21] (03CR) 10Marostegui: [C: 03+1] "let's try then" [puppet] - 10https://gerrit.wikimedia.org/r/499736 (https://phabricator.wikimedia.org/T206203) (owner: 10Jcrespo) [09:57:24] (03CR) 10Muehlenhoff: admin: allow users to be removed preserving their home directories (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: 10Elukey) [10:02:08] (03CR) 10Arturo Borrero Gonzalez: "> Patch Set 6:" [puppet] - 10https://gerrit.wikimedia.org/r/499516 (https://phabricator.wikimedia.org/T218146) (owner: 10GTirloni) [10:04:29] jouncebot: now [10:04:29] No deployments scheduled for the next 0 hour(s) and 55 minute(s) [10:07:17] I'm going to do that wikibase backport now [10:10:32] (03CR) 10Arturo Borrero Gonzalez: "You are touching several pieces of codes that may break some servers badly if not properly configured from the hiera point of view. 
Also, " [puppet] - 10https://gerrit.wikimedia.org/r/499355 (owner: 10Alex Monk) [10:11:21] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 16 probes of 401 (alerts on 35) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts [10:11:23] !log restarting elasticsearch-omega on elastic2050 (shards stuck in recovery) - T218878 [10:11:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:11:27] T218878: Upgrade to elasticsearch 6.5.4 for cirrus / codfw - https://phabricator.wikimedia.org/T218878 [10:14:38] syncing now [10:15:46] !log addshore@deploy1001 Synchronized php-1.33.0-wmf.23/extensions/Wikibase/lib: T219452 [[gerrit:499738|Revert: Use enableModuleContentVersion() for Wikibase\lib\SitesModule]] (duration: 01m 06s) [10:15:51] (03CR) 10Elukey: admin: allow users to be removed preserving their home directories (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: 10Elukey) [10:15:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:54] T219452: Cannot add a Wikidata sitelink [2019-03-27] - https://phabricator.wikimedia.org/T219452 [10:18:40] (03CR) 10Muehlenhoff: "Looks good to me! One comment inline wrt the git package." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/499453 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [10:19:31] (03CR) 10Alex Monk: "Yep, I expect we're going to need some careful review, and also to figure out a list of hosts to run puppet compiler against." 
[puppet] - 10https://gerrit.wikimedia.org/r/499355 (owner: 10Alex Monk) [10:20:24] (03CR) 10Muehlenhoff: admin: allow users to be removed preserving their home directories (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: 10Elukey) [10:22:54] !log restarting elasticsearch on elastic20[34,36,50] (shards stuck in recovery) - T218878 [10:22:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:22:57] T218878: Upgrade to elasticsearch 6.5.4 for cirrus / codfw - https://phabricator.wikimedia.org/T218878 [10:28:59] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 41 probes of 401 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts [10:36:40] (03CR) 10Volans: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/499355 (owner: 10Alex Monk) [10:38:19] (03PS1) 10Filippo Giunchedi: prometheus: set v2 max block duration to 24h [puppet] - 10https://gerrit.wikimedia.org/r/499742 (https://phabricator.wikimedia.org/T187987) [10:39:31] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 14 probes of 401 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts [10:39:47] (03CR) 10Elukey: admin: allow users to be removed preserving their home directories (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: 10Elukey) [10:42:19] (03CR) 10Gehel: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/499734 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [10:42:36] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/499734 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [10:51:51] (03CR) 10Volans: [C: 03+1] "Python code LGTM, 
a couple of optional comments. The puppet part looks sane but a compiler would check it better than human eye ;)" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [10:54:09] !log restarting elasticsearch-psi on elastic20[35,36,53] (shards stuck in recovery) - T218878 [10:54:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:54:12] T218878: Upgrade to elasticsearch 6.5.4 for cirrus / codfw - https://phabricator.wikimedia.org/T218878 [10:59:01] (03PS2) 10Filippo Giunchedi: logrotate: add old_dir parameter [puppet] - 10https://gerrit.wikimedia.org/r/499734 (https://phabricator.wikimedia.org/T126989) [10:59:10] (03CR) 10Filippo Giunchedi: [C: 03+2] logrotate: add old_dir parameter [puppet] - 10https://gerrit.wikimedia.org/r/499734 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [11:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T1100). [11:00:04] No GERRIT patches in the queue for this window AFAICS. 
[11:00:24] I'm stealing SWAT for wiki creation [11:01:10] Amir1: go ahead, nothing for swat [11:01:42] !log test copying prometheus metrics on bast3002 [11:01:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:19] (03PS1) 10Jbond: jessie-backport: Remove the jessie backport [puppet] - 10https://gerrit.wikimedia.org/r/499745 (https://phabricator.wikimedia.org/T219333) [11:04:37] (03CR) 10jerkins-bot: [V: 04-1] jessie-backport: Remove the jessie backport [puppet] - 10https://gerrit.wikimedia.org/r/499745 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [11:05:23] (03CR) 10Alex Monk: [C: 04-1] "I haven't been able to find a reason not to do that... Let's try that." [puppet] - 10https://gerrit.wikimedia.org/r/499355 (owner: 10Alex Monk) [11:06:25] (03CR) 10Muehlenhoff: [C: 03+1] jessie-backport: Remove the jessie backport [puppet] - 10https://gerrit.wikimedia.org/r/499745 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [11:07:03] (03PS2) 10Jbond: jessie-backport: Remove the jessie backport [puppet] - 10https://gerrit.wikimedia.org/r/499745 (https://phabricator.wikimedia.org/T219333) [11:07:42] (03PS1) 10Vgutierrez: acme_chief: Provide OCSP stapling support [puppet] - 10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) [11:07:46] (03PS3) 10Jbond: jessie-backport: Remove the jessie backport [puppet] - 10https://gerrit.wikimedia.org/r/499745 (https://phabricator.wikimedia.org/T219333) [11:08:52] (03CR) 10jerkins-bot: [V: 04-1] acme_chief: Provide OCSP stapling support [puppet] - 10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez) [11:09:54] (03CR) 10Jbond: [C: 03+2] jessie-backport: Remove the jessie backport [puppet] - 10https://gerrit.wikimedia.org/r/499745 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [11:11:23] (03PS2) 10Vgutierrez: acme_chief: Provide OCSP stapling support [puppet] - 
10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) [11:12:07] (03CR) 10jerkins-bot: [V: 04-1] acme_chief: Provide OCSP stapling support [puppet] - 10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez) [11:12:25] (03PS3) 10Ladsgroup: Reinstate "Initial configuration for hyw.wikipedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499477 (owner: 10MarcoAurelio) [11:12:40] (03PS1) 10Ladsgroup: Revert "Revert "Add hywwiki to wikiversions.json"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499747 [11:12:49] (03CR) 10Ladsgroup: [C: 03+2] Reinstate "Initial configuration for hyw.wikipedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499477 (owner: 10MarcoAurelio) [11:12:52] (03CR) 10Ladsgroup: [C: 03+2] Revert "Revert "Add hywwiki to wikiversions.json"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499747 (owner: 10Ladsgroup) [11:13:16] Amir1: Both of those won't merge [11:13:16] (03CR) 10jerkins-bot: [V: 04-1] Revert "Revert "Add hywwiki to wikiversions.json"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499747 (owner: 10Ladsgroup) [11:14:23] (03Merged) 10jenkins-bot: Reinstate "Initial configuration for hyw.wikipedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499477 (owner: 10MarcoAurelio) [11:14:25] (03CR) 10jerkins-bot: [V: 04-1] Revert "Revert "Add hywwiki to wikiversions.json"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499747 (owner: 10Ladsgroup) [11:14:37] (03PS3) 10Vgutierrez: acme_chief: Provide OCSP stapling support [puppet] - 10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) [11:15:23] (03CR) 10jerkins-bot: [V: 04-1] acme_chief: Provide OCSP stapling support [puppet] - 10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez) [11:15:38] Reedy: shoot [11:15:43] let me check [11:16:13] no merge commits on mw-config [11:16:23] Never 
mind any rebase conflict from the bumps yesterday [11:16:47] I see [11:18:28] (03Abandoned) 10Ladsgroup: Revert "Revert "Add hywwiki to wikiversions.json"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499747 (owner: 10Ladsgroup) [11:18:54] (03PS4) 10Vgutierrez: acme_chief: Provide OCSP stapling support [puppet] - 10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) [11:20:00] (03PS1) 10Ladsgroup: Add hywwiki to wikiversions.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499748 (https://phabricator.wikimedia.org/T212597) [11:20:02] (03CR) 10Elukey: admin: allow users to be removed preserving their home directories (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: 10Elukey) [11:20:18] (03CR) 10Ladsgroup: [C: 03+2] Add hywwiki to wikiversions.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499748 (https://phabricator.wikimedia.org/T212597) (owner: 10Ladsgroup) [11:20:30] (03CR) 10jenkins-bot: Reinstate "Initial configuration for hyw.wikipedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499477 (owner: 10MarcoAurelio) [11:20:48] (03CR) 10Filippo Giunchedi: [C: 03+2] "> Patch Set 9: Code-Review+1" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [11:20:58] (03PS10) 10Filippo Giunchedi: profile: kafkatee instance for udp2log compat [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) [11:21:19] PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [11:21:22] (03Merged) 10jenkins-bot: Add hywwiki to wikiversions.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499748 (https://phabricator.wikimedia.org/T212597) (owner: 10Ladsgroup) [11:21:27] PROBLEM - puppet last run on chlorine is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:21:39] (03CR) 10jenkins-bot: Add hywwiki to wikiversions.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499748 (https://phabricator.wikimedia.org/T212597) (owner: 10Ladsgroup) [11:24:01] RECOVERY - Backup of x1 in codfw on db1115 is OK: Backup for x1 at codfw taken less than 8 days ago and larger than 10 GB: Last one 2019-03-28 09:32:42 from dbstore2002.codfw.wmnet:3320 (16 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups [11:24:05] (03CR) 10GTirloni: [C: 03+1] k8s::flannel: remove upstart, use systemd::service instead [puppet] - 10https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [11:25:37] (03PS1) 10Jbond: jessie-backports: remove jessie pinning. [puppet] - 10https://gerrit.wikimedia.org/r/499749 (https://phabricator.wikimedia.org/T219333) [11:26:05] (03PS2) 10Jbond: jessie-backports: remove jessie pinning. [puppet] - 10https://gerrit.wikimedia.org/r/499749 (https://phabricator.wikimedia.org/T219333) [11:27:31] !log ladsgroup@deploy1001 Synchronized dblists: T212597 (duration: 00m 56s) [11:27:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:27:34] T212597: Create Wikipedia Western Armenian - https://phabricator.wikimedia.org/T212597 [11:29:38] !log ladsgroup@deploy1001 rebuilt and synchronized wikiversions files: T212597 [11:29:45] https://phabricator.wikimedia.org/T219450 [11:29:53] This is a very serious bug. Now Main Namespace pages are now locked for everybody, including admins. Please fix ASAP. 
[11:29:59] (03CR) 10Muehlenhoff: [C: 03+1] jessie-backports: remove jessie pinning. [puppet] - 10https://gerrit.wikimedia.org/r/499749 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [11:30:18] (03PS2) 10Filippo Giunchedi: prometheus: set v2 max block duration to 24h [puppet] - 10https://gerrit.wikimedia.org/r/499742 (https://phabricator.wikimedia.org/T187987) [11:31:01] PROBLEM - puppet last run on labmon1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:31:40] ladsgroup@deploy1001: Failed to log message to wiki. Somebody should check the error logs. [11:33:29] yannf: for any new article? what happens? [11:34:13] can't edit Main Namespace pages :( [11:34:23] reported by a dozen people [11:34:48] (03PS5) 10Vgutierrez: acme_chief: Provide OCSP stapling support [puppet] - 10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) [11:35:04] the error message is : "wikitext" content is not allowed on page Julia Margaret Cameron in slot "Main" [11:35:22] 10Operations, 10Dumps-Generation: Switch dumps to component/php7.2 - https://phabricator.wikimedia.org/T218193 (10ArielGlenn) Tests in beta all look good. Update at will except for snapshot1008; that one can go tomorrow afternoon (Fri Mar 28). 
[11:35:22] now on https://commons.wikimedia.org/wiki/Julia_Margaret_Cameron [11:35:33] (03PS6) 10Vgutierrez: acme_chief: Provide OCSP stapling support [puppet] - 10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) [11:35:41] thanks, yannf [11:36:09] (03CR) 10Volans: [C: 04-1] "Most of the added code seems very similar to existing code, I think with some little effort the existing one could be generalized a bit to" (036 comments) [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/498268 (owner: 10CRusnov) [11:36:17] RECOVERY - puppet last run on labmon1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:36:22] I can delete and recreate the page, but I can't edit it [11:36:30] ok [11:37:16] yannf: Presumably this is since the deploy yesterday? [11:37:29] and I can't recreate the page with the same content after deletion [11:37:41] yes, probably [11:37:51] (03CR) 10Jbond: [C: 03+2] jessie-backports: remove jessie pinning. [puppet] - 10https://gerrit.wikimedia.org/r/499749 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [11:38:18] first report was 18:43, 27 March 2019 (UTC) [11:38:32] (03CR) 10Giuseppe Lavagetto: "> Patch Set 4: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/499222 (https://phabricator.wikimedia.org/T176916) (owner: 10Giuseppe Lavagetto) [11:38:33] Yeah then [11:39:03] (03PS1) 10Jbond: jessie-backports: remove pins files as its redundent [puppet] - 10https://gerrit.wikimedia.org/r/499751 (https://phabricator.wikimedia.org/T219333) [11:39:17] (03PS1) 10Reedy: Revert commonswiki to .22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499752 (https://phabricator.wikimedia.org/T219450) [11:39:26] Amir1: What's the state of the deploy server? 
^ Want to deploy that [11:39:33] 10Operations, 10DBA, 10Data-Services, 10cloud-services-team: Prepare and check storage layer for hywwiki - https://phabricator.wikimedia.org/T212625 (10MarcoAurelio) https://hyw.wikipedia.org is now up. However there's: [XJyyFQpAAD0AAEF-ZM8AAAAD] 2019-03-28 11:37:57: Fatal exception of type "MediaWiki\R... [11:40:00] Reedy: at the middle of the isse [11:40:07] 10Operations, 10DBA, 10Data-Services, 10cloud-services-team: Prepare and check storage layer for hywwiki - https://phabricator.wikimedia.org/T212625 (10Marostegui) We are investigating on -databases [11:40:14] trying to fix it, it seems T212881 all over again [11:40:15] T212881: addWiki.php broken creating ES tables - https://phabricator.wikimedia.org/T212881 [11:41:11] (03PS7) 10Vgutierrez: acme_chief: Provide OCSP stapling support [puppet] - 10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) [11:42:57] (03CR) 10Giuseppe Lavagetto: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/15395/mwlog1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/499222 (https://phabricator.wikimedia.org/T176916) (owner: 10Giuseppe Lavagetto) [11:43:14] (03PS6) 10Giuseppe Lavagetto: arclamp: add a second instance for excimer logs [puppet] - 10https://gerrit.wikimedia.org/r/499222 (https://phabricator.wikimedia.org/T176916) [11:45:06] <_joe_> Amir1: can Reedy deploy that change? 
seems quite important [11:45:23] _joe_: It's config not mw version [11:45:29] Just happened around a similar time [11:45:45] https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/499531/ is the likely culprit [11:45:50] <_joe_> Reedy: yeah I was reading the ticket right now and about to ask [11:46:04] (03Abandoned) 10Reedy: Revert commonswiki to .22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499752 (https://phabricator.wikimedia.org/T219450) (owner: 10Reedy) [11:46:19] (03PS1) 10Ladsgroup: Revert "Reinstate "Initial configuration for hyw.wikipedia"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499754 [11:46:26] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/499751 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [11:46:30] _joe_: I can revert the patches and wait a little [11:47:01] (03CR) 10Ladsgroup: [C: 03+2] Revert "Reinstate "Initial configuration for hyw.wikipedia"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499754 (owner: 10Ladsgroup) [11:47:35] RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [11:47:43] RECOVERY - puppet last run on chlorine is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:48:09] (03Merged) 10jenkins-bot: Revert "Reinstate "Initial configuration for hyw.wikipedia"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499754 (owner: 10Ladsgroup) [11:48:22] (03PS1) 10Ladsgroup: Revert "Add hywwiki to wikiversions.json" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499755 [11:48:33] (03CR) 10Ladsgroup: [C: 03+2] Revert "Add hywwiki to wikiversions.json" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499755 (owner: 10Ladsgroup) [11:49:37] (03Merged) 10jenkins-bot: Revert "Add hywwiki to wikiversions.json" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499755 (owner: 10Ladsgroup) [11:49:56] marostegui: I'm reverting 
for now [11:50:01] PROBLEM - Backup of m2 in codfw on db1115 is CRITICAL: Backup for m2 at codfw taken more than 8 days ago: Most recent backup 2019-03-20 11:29:00 https://wikitech.wikimedia.org/wiki/MariaDB/Backups [11:50:06] Amir1: ok [11:50:51] RECOVERY - puppet last run on labtestmetal2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:50:53] Amir1: To confirm, the database and tables are present on es for both cluster24 and cluster25. What is not present is the x1 database. However, logtash doesn't indicate a problem with that anymore [11:51:26] !log ladsgroup@deploy1001 Synchronized dblists: Revert T212597 (duration: 00m 58s) [11:51:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:30] T212597: Create Wikipedia Western Armenian - https://phabricator.wikimedia.org/T212597 [11:51:49] marostegui: That's a good info [11:52:04] I will check addwiki.php to see what's going with that [11:52:23] did you reopen the ticket or should a new one be created? [11:52:42] I think it's another issue [11:53:09] !log ladsgroup@deploy1001 rebuilt and synchronized wikiversions files: Revert T212597 [11:53:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:53:15] Reedy: I'm done [11:53:21] (03CR) 10jenkins-bot: Revert "Reinstate "Initial configuration for hyw.wikipedia"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499754 (owner: 10Ladsgroup) [11:53:23] (03CR) 10jenkins-bot: Revert "Add hywwiki to wikiversions.json" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499755 (owner: 10Ladsgroup) [11:53:41] Reedy: about T219450 did we get positive reports or is it WIP? 
[11:53:41] T219450: "wikitext" content is not allowed on page … in slot "Main" - https://phabricator.wikimedia.org/T219450 [11:54:07] Lucas_WMDE is trying to narrow down the mw-config commit that caused it/needs reverting [11:54:09] !log upgrading snapshot1005-1007/1009 to component/php72 (T218193) [11:54:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:54:12] T218193: Switch dumps to component/php7.2 - https://phabricator.wikimedia.org/T218193 [11:54:23] sorry, not the fix, the revert [11:54:29] is is live already? [11:54:34] No [11:54:48] :-( [11:55:26] (03PS1) 10Lucas Werkmeister (WMDE): Revert "SDC: Enable both new-style and old-style Wikibase federation on Commons" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499756 (https://phabricator.wikimedia.org/T219450) [11:55:27] soon [11:56:11] (03PS2) 10Jbond: jessie-backports: remove pins files as its redundent [puppet] - 10https://gerrit.wikimedia.org/r/499751 (https://phabricator.wikimedia.org/T219333) [11:56:28] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "will try it out on the debug server and revert again if it doesn’t help" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499756 (https://phabricator.wikimedia.org/T219450) (owner: 10Lucas Werkmeister (WMDE)) [11:57:28] (03Merged) 10jenkins-bot: Revert "SDC: Enable both new-style and old-style Wikibase federation on Commons" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499756 (https://phabricator.wikimedia.org/T219450) (owner: 10Lucas Werkmeister (WMDE)) [11:57:44] (03CR) 10Jbond: [C: 03+2] jessie-backports: remove pins files as its redundent [puppet] - 10https://gerrit.wikimedia.org/r/499751 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [11:58:43] testing… [11:58:51] * apergos bites nails [11:59:21] apergos: do you know the public url or testcommonswiki ? 
[12:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T1200) [12:00:09] *of [12:00:15] yup, that appears to have worked [12:00:17] syncing [12:00:22] then testcommonswiki as well, presumably [12:00:35] That can stay broken, presumably? [12:00:37] It's a test wiki [12:00:38] jynus: no, sorry [12:00:49] +1, I wanted to explore it for testing [12:01:09] Saves breaking commons to test the fix etc [12:01:11] ok whew, thanks Lucas [12:01:37] (03PS1) 10Jbond: jessie-backports: remove pins for packages in openstack-mitaka-jessie [puppet] - 10https://gerrit.wikimedia.org/r/499757 (https://phabricator.wikimedia.org/T219333) [12:02:02] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:499756|Revert "SDC: Enable both new-style and old-style Wikibase federation on Commons" (T219450)]] (duration: 00m 57s) [12:02:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:02:05] T219450: "wikitext" content is not allowed on page … in slot "Main" - https://phabricator.wikimedia.org/T219450 [12:02:10] (03PS2) 10Jbond: jessie-backports: remove pins for packages in openstack-mitaka-jessie [puppet] - 10https://gerrit.wikimedia.org/r/499757 (https://phabricator.wikimedia.org/T219333) [12:02:37] jynus: https://test-commons.wikimedia.org/wiki/Main_Page [12:02:45] mutante: thanks! [12:03:13] we need a better way for these to come to folks' notice when they are urgent and affect main functionality of the site https://phabricator.wikimedia.org/T219450 [12:04:04] what's failing now? [12:04:13] "to catch issues a day or so before they hit the actual Wikimedia Commons" sounds like in this case we needed slightly more than a day.. 
but 2 days or so [12:04:28] (03CR) 10jenkins-bot: Revert "SDC: Enable both new-style and old-style Wikibase federation on Commons" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499756 (https://phabricator.wikimedia.org/T219450) (owner: 10Lucas Werkmeister (WMDE)) [12:04:46] * Lucas_WMDE takes a deep breath [12:05:02] (03CR) 10Lucas Werkmeister (WMDE): "seems to have worked" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499756 (https://phabricator.wikimedia.org/T219450) (owner: 10Lucas Werkmeister (WMDE)) [12:05:12] \o/ [12:07:01] apparently this config had been enabled on testcommonswiki for almost a week [12:07:08] and that was still not enough? :/ [12:07:15] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/499757 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [12:07:17] (unless the config was only made breaking by some other change, I suppose) [12:07:28] hauskatze: editing main pages on commons.. and now it works again [12:07:31] Lucas_WMDE: You think we test things? :p [12:07:33] enwiki is for testing [12:07:45] nope. and that ticket was not noticed by anyone here; if yannf hadn't reported it we still wouldn't know [12:07:45] well. your edit at https://commons.wikimedia.org/w/index.php?title=Orgues_d%27Alsace&diff=prev&oldid=344162314 seems to proof it's fixed [12:08:35] ah yes, works now [12:08:39] yannf: thanks [12:08:40] I feel like this warrants an 'incident report' (main namespace articles uneditable > 12 hours on commons) but what lessons go into that? 
[12:08:45] thank you all [12:08:46] test-commons says specifically the whole existence is to notice breaking changes a day before they are on commons ..sigh [12:08:58] apergos: production (regression) testing [12:09:06] maybe so [12:09:17] also, [12:09:34] * apergos slaps Reedy around with a big trout-shaped object [12:09:46] I saw that remark about where to test, don't think I didn't [12:10:26] I agree an incident report is probably warranted [12:10:39] not sure who’d be responsible for it [12:10:40] Lucas_WMDE: do you want to do the honors, since you fixed it? at least, putting the things you did [12:10:44] so reduce unbreak now on task, make sure to handover to the right team? [12:10:46] we can add to it [12:11:19] alright, I can start [12:11:30] looking at the RC of test-commons, probably the issue was nobody tried editing an article in main name space.. while a lot of file uploads were tested [12:11:37] link us when you want us to add thing [12:11:38] s [12:13:08] !log move git from jessie-wikimedia backports repo components/ci [12:13:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:20:00] !log removing php 7.0 packages from snapshot1005-1007/1009, dumps are only using 7.2 (T218193) [12:20:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:20:08] T218193: Switch dumps to component/php7.2 - https://phabricator.wikimedia.org/T218193 [12:20:57] (03PS3) 10Jcrespo: mariadb-backups: Once transferred, chown and prepare them as non-root [puppet] - 10https://gerrit.wikimedia.org/r/499736 (https://phabricator.wikimedia.org/T206203) [12:21:22] woo hoo! [12:21:49] (03CR) 10Jbond: [C: 03+2] jessie-backports: remove pins for packages in openstack-mitaka-jessie [puppet] - 10https://gerrit.wikimedia.org/r/499757 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [12:23:10] has anyone confirmed that the bug actually affected testcommonswiki as well? 
[12:24:31] (03PS4) 10Jcrespo: mariadb-backups: Once transferred, chown and prepare them as non-root [puppet] - 10https://gerrit.wikimedia.org/r/499736 (https://phabricator.wikimedia.org/T206203) [12:24:43] Lucas_WMDE I did [12:24:52] it can be done now, or at least a few minutes ago [12:24:55] (03PS1) 10Jbond: jessie-backports: add components/ci repository [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) [12:25:00] it is as described [12:25:25] expects main to be wikidata jsons, not wikitext [12:25:35] okay thanks [12:25:45] incident documentation started at https://wikitech.wikimedia.org/wiki/Incident_documentation/20190328-commons, please expand [12:25:59] (03CR) 10jenkins-bot: [V: 04-1] jessie-backports: add components/ci repository [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [12:26:06] I’ll send an email to ops@ next [12:27:39] let me find the exact timestamps and add it for workflow purposes [12:27:54] oh, it is below already [12:31:35] (03PS2) 10Jbond: jessie-backports: add components/ci repository [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) [12:36:00] (03CR) 10Jcrespo: [C: 03+2] mariadb-backups: Once transferred, chown and prepare them as non-root [puppet] - 10https://gerrit.wikimedia.org/r/499736 (https://phabricator.wikimedia.org/T206203) (owner: 10Jcrespo) [12:37:55] made a couple tiny edits [12:39:28] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me, two nits" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [12:40:25] if we unbreak the main namespace on test commons, that’ll probably break the “depicts” testing https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved/Feedback_requests/Depicts_testing [12:41:02] and since https://test-commons.wikimedia.org/wiki/Special:AllPages only lists two pages in 
the main namespace anyways, one protected and the other a redirect [12:41:08] I’d say let’s just leave it as it is for now [12:41:55] PROBLEM - puppet last run on puppetmaster2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:42:29] (03CR) 10Jbond: jessie-backports: add components/ci repository (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [12:43:38] (which does make me wonder, why was the configuration enabled on real Commons? am I missing something and it was already used after all?) [12:46:43] (03PS3) 10Jbond: jessie-backports: add components/ci repository [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) [12:48:57] PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [12:49:11] that is me, I forgot to press yes [12:50:07] (03CR) 10Volans: "Few comments inline" (033 comments) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (https://phabricator.wikimedia.org/T212526) (owner: 10CRusnov) [12:50:15] RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. [12:53:02] apergos, testcommons is not really useful: I have tested SDC proposals there, but things do not work like Commons (images are not shown, etc.) [12:53:23] good to know [12:53:34] (03PS1) 10Lucas Werkmeister (WMDE): exec watch in fatalmonitor [puppet] - 10https://gerrit.wikimedia.org/r/499761 [12:54:53] what can be done to make testcommons more like commons I wonder [12:54:57] addshore: any idea why beta is affected? [12:55:06] I would’ve expected that to pick up the fix [12:55:17] or does that call itself 'testcommonswiki'? 
[12:55:23] Lucas_WMDE: no the config is deployed there separately [12:55:28] oh [12:55:31] leaving test and beta broken for this is probably fine [12:55:44] well having outdated config on beta is probably not good, right? [12:55:44] they have been broken there for weeks, if not over a month, just apparently no one noticed [12:56:21] apergos, making files appear would be a good start [12:56:24] not outdated, different, as they point to different wikidatas [12:56:53] (03CR) 10Hashar: jessie-backports: add components/ci repository (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [12:57:08] I have had thumbnail issues recently [12:57:09] added a note about testcommons reduced functionality on incident report [12:57:21] "which does make me wonder, why was the configuration enabled on real Commons?", to catch more issues, we didn't think anything would happen however [12:57:30] * apergos does the backread [12:57:59] which can't be tested on https://test-commons.wikimedia.org/wiki/Special:NewFiles [12:58:08] Lucas_WMDE: i guess the sdoc & wikidata team would be responsible for the incident report? [12:58:37] on the wikidata side, mainly leszek & I, sdoc side, jamesF probably [12:58:43] addshore: I already started the report, but others are welcome to expand it [12:58:59] sdoc as soon as they wake up, I guess [12:59:31] indeed [12:59:44] I'm going to dive in and see if I can figure out exactly why the thing happened now :) [12:59:50] I wish we could add people as (back-)subscribers to a wiki page so they would wake up and be notified to go read it... [13:00:02] thanks, interested in your findings [13:00:04] Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T1300) [13:00:19] really? a deploy already? 
uh [13:00:23] (03PS1) 10Ema: ATS: use pointer trick in SystemTap scripts [puppet] - 10https://gerrit.wikimedia.org/r/499762 (https://phabricator.wikimedia.org/T213263) [13:00:35] (03PS1) 10Ema: ATS: add ats_transaction_err.stp [puppet] - 10https://gerrit.wikimedia.org/r/499763 (https://phabricator.wikimedia.org/T213263) [13:00:47] Lucas_WMDE: but yes, reverting the config will not have broken anything, no need to worry there [13:00:48] apergos: that is the european time window, it is always set but is not used this week ( Dan Duvall is the train conductor and still sleeping at this time) /D [13:00:56] okey dokey [13:00:59] it was going to be turned on and left to sit for a short while to see if anything final was missed [13:01:00] :-) [13:01:19] addshore: okay, it did its job then [13:01:25] though it would’ve been good to notice the issue faster [13:01:55] indeed [13:02:00] im surprised commons people didnt notice sooner? [13:02:25] well they did notice pretty quickly, we just didn’t find out about it [13:02:35] (03PS2) 10Ema: ATS: use pointer trick in SystemTap scripts [puppet] - 10https://gerrit.wikimedia.org/r/499762 (https://phabricator.wikimedia.org/T213263) [13:02:37] Yup, looks like it was 4 hours until the phab ticket was filed [13:03:01] if the Phabricator task had been triaged UBN! 
earlier, perhaps it wouldn’t have taken so long [13:03:11] but I’m not sure if that’s an outcome we want, to encourage that [13:03:14] Lucas_WMDE: ahh yes, it wasn't UBN initially :) [13:03:15] or if that’s going to result in too much noise [13:03:38] I still had no idea it happened until Tom just told me in our call [13:06:15] ah yeah, bug is there: https://test-commons.wikimedia.org/w/index.php?title=Julia_Margaret_Cameron&action=submit [13:06:23] (03CR) 10Ema: [C: 03+2] ATS: use pointer trick in SystemTap scripts [puppet] - 10https://gerrit.wikimedia.org/r/499762 (https://phabricator.wikimedia.org/T213263) (owner: 10Ema) [13:06:44] (03PS2) 10Ema: ATS: add ats_transaction_err.stp [puppet] - 10https://gerrit.wikimedia.org/r/499763 (https://phabricator.wikimedia.org/T213263) [13:08:12] (03CR) 10Ema: [C: 03+2] ATS: add ats_transaction_err.stp [puppet] - 10https://gerrit.wikimedia.org/r/499763 (https://phabricator.wikimedia.org/T213263) (owner: 10Ema) [13:08:19] RECOVERY - puppet last run on puppetmaster2001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [13:12:41] !log ppchelko@deploy1001 Started deploy [cpjobqueue/deploy@17285f8]: Partition htmlCacheUpdate topic, step 1 T219159 [13:12:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:45] T219159: Partition htmlCacheUpdate job topic - https://phabricator.wikimedia.org/T219159 [13:14:26] !log ppchelko@deploy1001 Finished deploy [cpjobqueue/deploy@17285f8]: Partition htmlCacheUpdate topic, step 1 T219159 (duration: 01m 46s) [13:14:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:24] Lucas_WMDE, I have suggested long ago that users could set a severity level for bugs [13:17:59] "users could set a severity level for bugs" isn't that possible now? 
[13:18:02] I think that could also help devs [13:19:09] I think that should be possible for everyone [13:19:17] jynus, it seems only devs are allowed to do that [13:19:43] I dont think you can do it in the bug creation form, but you can edit the priority once it has been created? [13:20:27] well, I think I was reverted when I did it [13:20:49] RECOVERY - Backup of m2 in codfw on db1115 is OK: Backup for m2 at codfw taken less than 8 days ago and larger than 10 GB: Last one 2019-03-28 08:21:33 from db2044.codfw.wmnet (370 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups [13:21:01] https://www.mediawiki.org/wiki/Phabricator/Project_management#Setting_task_priorities is unclear- I think it should be ok unless someone that attends the bug contradicts you [13:21:02] I was told "you are not a dev" [13:21:25] !log ppchelko@deploy1001 Started deploy [cpjobqueue/deploy@c120b38]: Partition htmlCacheUpdate topic, explicitly exclude htmlCacheUpdate T219159 [13:21:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:21:29] T219159: Partition htmlCacheUpdate job topic - https://phabricator.wikimedia.org/T219159 [13:21:57] for example, if you say it is important, and someone else disagrees, I think it make sense- if no one sees it because it is in triage it should be justified [13:22:02] (03PS4) 10Jbond: jessie-backports: add components/ci repository [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) [13:22:10] I will suggest to add that to the documentation [13:22:11] (03CR) 10Jbond: jessie-backports: add components/ci repository (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [13:22:13] !log ppchelko@deploy1001 Finished deploy [cpjobqueue/deploy@c120b38]: Partition htmlCacheUpdate topic, explicitly exclude htmlCacheUpdate T219159 (duration: 00m 48s) [13:22:16] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:47] if it is OK, I will set a severity level next time I report a bug [13:22:52] (03CR) 10Alex Monk: [C: 03+1] acme_chief: Issue wikiba.se certificate [puppet] - 10https://gerrit.wikimedia.org/r/499189 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez) [13:23:01] (03PS5) 10Jbond: jessie-backports: add components/ci repository [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) [13:23:08] yannf: I think it is ok if it is a clear "things are broken, fix now" [13:23:23] and important stuff like "we cannot edit" [13:23:47] sure, it could be the other way: "a bit annoying, but not serious" [13:23:53] sometimes some common sense is expected to be applied [13:24:12] if the site is down developers aren't going to revert you setting UBN [13:24:27] ok ;) [13:24:33] same for other bugs universally considered very serious [13:25:15] requestors shouldn't be prioritising their own requests [13:25:35] bug reporters of very big problems might [13:27:08] (03PS3) 10Gehel: elasticsearch: use standard resources for icinga checks [puppet] - 10https://gerrit.wikimedia.org/r/499511 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [13:27:21] avoid prioritising controversial things [13:27:24] yannf: so we are looking on how to improve response time [13:27:39] I think another thing that may have contributed to the delay is the title [13:28:22] "Can't edit on Commons" vs "" [13:28:40] part of the problem with that task was likely the tag chosen [13:28:58] some bugs I have reported are still on Triage weeks, or even months, after I open them [13:29:01] indeed, but that shouldn't be the reporter's problem [13:29:12] (03CR) 10Gehel: [C: 03+2] elasticsearch: use standard resources for icinga checks [puppet] - 10https://gerrit.wikimedia.org/r/499511 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [13:29:31] onimisionipe: ^ [13:29:50] I am 
going to add an actionable which is "give a recommendation" on how to act in case of large issues [13:30:03] for faster response [13:30:31] good, but my point is that a recommendation would be useful in other cases [13:30:58] again, not the reporter's fault, but I think we all agree it could be handled better/faster [13:32:35] e.g. https://phabricator.wikimedia.org/T212101 reported Dec 17, still on triage [13:33:17] not all tasks get given a priority [13:33:33] I don't usually check the field is set when working on something [13:33:34] I agree it is not a serious issue, but not having a priority 3 months after it was open is... bad [13:34:10] not all projects are actively prioritised [13:34:11] Well, other than UBN, a lot of priorities are ignored [13:34:21] don't worry too much about priority [13:34:34] as Reedy, unless it is a clear UBN [13:34:39] *says [13:35:03] (for example, I work sometimes on untriaged tickets) [13:35:08] (03CR) 10Muehlenhoff: [C: 03+1] jessie-backports: add components/ci repository [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [13:41:34] yannf: I added as part on the actionables analyzing what could be done better on that area [13:42:03] ok thanks [13:42:57] thank you again for the ping [13:49:46] (03PS1) 10Addshore: Do not set useEntitySourceBasedFederation for wikibase client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499768 [13:49:58] !log filippo@puppetmaster1001 conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet [13:50:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:15] !log filippo@puppetmaster1001 conftool action : set/pooled=no; selector: name=prometheus2004.codfw.wmnet [13:50:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:53] (03PS1) 10Dzahn: k8s:proxy: remove upstart support [puppet] - 10https://gerrit.wikimedia.org/r/499769 [13:55:57] (03CR) 10Jbond: [C: 
03+2] jessie-backports: add components/ci repository [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [13:56:10] (03PS6) 10Jbond: jessie-backports: add components/ci repository [puppet] - 10https://gerrit.wikimedia.org/r/499758 (https://phabricator.wikimedia.org/T219333) [13:56:12] (03PS1) 10Dzahn: xvfb: remove upstart support [puppet] - 10https://gerrit.wikimedia.org/r/499771 [13:57:11] !log upgrading Java on elasticsearch hosts [13:57:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:58:06] (03CR) 10Muehlenhoff: [C: 03+1] xvfb: remove upstart support [puppet] - 10https://gerrit.wikimedia.org/r/499771 (owner: 10Dzahn) [13:59:34] !log restarting elasticsearch on elastic2050 to validate JVM upgrade [13:59:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:44] (03CR) 10Vgutierrez: [C: 03+2] acme_chief: Issue wikiba.se certificate [puppet] - 10https://gerrit.wikimedia.org/r/499189 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez) [14:02:53] (03PS4) 10Vgutierrez: acme_chief: Issue wikiba.se certificate [puppet] - 10https://gerrit.wikimedia.org/r/499189 (https://phabricator.wikimedia.org/T213705) [14:03:51] gehel: yay!!! [14:04:07] Thanks! [14:04:14] 10Operations, 10monitoring, 10Patch-For-Review: Serve >= 50% of production Prometheus systems with Prometheus v2 - https://phabricator.wikimedia.org/T187987 (10fgiunchedi) Status update: codfw and eqiad are served by Prometheus v2 now (single host in each, prometheus2003 and prometheus1003 respectively). The... 
[14:05:22] (03PS5) 10Vgutierrez: acme_chief: Issue wikiba.se certificate [puppet] - 10https://gerrit.wikimedia.org/r/499189 (https://phabricator.wikimedia.org/T213705) [14:05:41] (03PS1) 10Jbond: jessie-backports: remove pining from packages [puppet] - 10https://gerrit.wikimedia.org/r/499773 (https://phabricator.wikimedia.org/T219333) [14:07:00] !log reindexing changes from '2019-03-26T12:00:00Z' to '2019-03-28T12:00:00Z' into cirrus / elasticsearch - T218878 [14:07:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:04] T218878: Upgrade to elasticsearch 6.5.4 for cirrus / codfw - https://phabricator.wikimedia.org/T218878 [14:07:52] 10Operations: ms1002 - broken dpkg - https://phabricator.wikimedia.org/T80033 (10Dzahn) [14:09:11] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/499773 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [14:09:56] (03CR) 10Jbond: [C: 03+2] jessie-backports: remove pining from packages [puppet] - 10https://gerrit.wikimedia.org/r/499773 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [14:10:06] (03PS2) 10Jbond: jessie-backports: remove pining from packages [puppet] - 10https://gerrit.wikimedia.org/r/499773 (https://phabricator.wikimedia.org/T219333) [14:10:07] 10Operations, 10Discovery-Search: Create cookbook to reindex into elasticsearch / cirrus - https://phabricator.wikimedia.org/T219507 (10Gehel) [14:14:15] (03PS1) 10Dzahn: remove ms1002 from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/499776 [14:17:46] 10Operations, 10Acme-chief, 10Traffic, 10Goal, 10Patch-For-Review: Deploy managed LetsEncrypt certs for all public use-cases - https://phabricator.wikimedia.org/T213705 (10Vgutierrez) ` vgutierrez@acmechief1001:~$ sudo -i openssl x509 -text -noout -in /var/lib/acme-chief/certs/wikibase/live/rsa-2048.crt... 
[14:20:05] (03PS1) 10Ladsgroup: Add the 'urlshortener-manage-url' right and enable it for stewards [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499777 (https://phabricator.wikimedia.org/T133109) [14:20:07] jouncebot: now [14:20:07] For the next 0 hour(s) and 39 minute(s): MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T1300) [14:20:59] (03PS1) 10Jbond: jessie-backports: remove pins for jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/499778 (https://phabricator.wikimedia.org/T219333) [14:21:50] (03CR) 10Volans: Add report which checks against puppetdb and compares serial numbers (032 comments) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (https://phabricator.wikimedia.org/T212526) (owner: 10CRusnov) [14:27:39] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (10Papaul) [14:28:21] (03CR) 10CRusnov: "Thanks for the feedback! Replies inline." (034 comments) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (https://phabricator.wikimedia.org/T212526) (owner: 10CRusnov) [14:28:27] (03PS1) 10Vgutierrez: tlsproxy: Allow acme-chief certs to be deployed [puppet] - 10https://gerrit.wikimedia.org/r/499779 (https://phabricator.wikimedia.org/T213705) [14:28:31] (03PS1) 10Vgutierrez: hieradata: Deploy acme-chief unified certificate in eqsin cp servers [puppet] - 10https://gerrit.wikimedia.org/r/499780 (https://phabricator.wikimedia.org/T213705) [14:29:49] 10Operations, 10Electron-PDFs, 10Core Platform Team Backlog (Attic), 10Services (attic): electron/pdfrender hangs - https://phabricator.wikimedia.org/T174916 (10crusnov) Additional follow-up: THere were numerous OOMs in the log, even though the box has around 20gb of free ram +/- buffers. I'm not sure if t... 
[14:31:12] !log ppchelko@deploy1001 Started deploy [cpjobqueue/deploy@3a8a889]: Partition htmlCacheUpdate topic, step 2 T219159 [14:31:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:31:16] T219159: Partition htmlCacheUpdate job topic - https://phabricator.wikimedia.org/T219159 [14:32:05] !log ppchelko@deploy1001 Finished deploy [cpjobqueue/deploy@3a8a889]: Partition htmlCacheUpdate topic, step 2 T219159 (duration: 00m 53s) [14:32:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:21] (03CR) 10CDanis: [C: 03+1] prometheus: set v2 max block duration to 24h [puppet] - 10https://gerrit.wikimedia.org/r/499742 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [14:32:34] (03Abandoned) 10Jbond: jessie-backports: Remove unsued pins [puppet] - 10https://gerrit.wikimedia.org/r/499453 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [14:33:05] (03PS8) 10Vgutierrez: acme_chief: Provide OCSP stapling support [puppet] - 10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) [14:33:07] (03PS2) 10Vgutierrez: tlsproxy: Allow acme-chief certs to be deployed [puppet] - 10https://gerrit.wikimedia.org/r/499779 (https://phabricator.wikimedia.org/T213705) [14:33:09] (03PS2) 10Vgutierrez: hieradata: Deploy acme-chief unified certificate in eqsin cp servers [puppet] - 10https://gerrit.wikimedia.org/r/499780 (https://phabricator.wikimedia.org/T213705) [14:34:14] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: rack/setup/install (1) testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (10Papaul) [14:35:32] (03CR) 10Ayounsi: Add report which checks against puppetdb and compares serial numbers (031 comment) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (https://phabricator.wikimedia.org/T212526) (owner: 10CRusnov) [14:36:29] (03PS2) 10Dzahn: remove ms1002 from site.pp [puppet] - 
10https://gerrit.wikimedia.org/r/499776 [14:36:51] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: rack/setup/install (1) testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (10Papaul) switch port information asw-c5-codfw ge-5/0 11 [14:37:06] (03CR) 10Dzahn: [C: 03+2] "Host ms1002.eqiad.wmnet not found: 3(NXDOMAIN)" [puppet] - 10https://gerrit.wikimedia.org/r/499776 (owner: 10Dzahn) [14:37:46] (03PS3) 10Dzahn: remove ms1002 from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/499776 [14:38:36] 10Operations, 10monitoring, 10Patch-For-Review: Serve >= 50% of production Prometheus systems with Prometheus v2 - https://phabricator.wikimedia.org/T187987 (10fgiunchedi) As far as eqsin/esams/ulsfo are concerned I've thought about rsync'ing the data off hosts for migration, however there are a lot of metri... [14:41:12] (03CR) 10Alexandros Kosiaris: [C: 03+1] k8s:proxy: remove upstart support [puppet] - 10https://gerrit.wikimedia.org/r/499769 (owner: 10Dzahn) [14:41:43] (03PS2) 10Elukey: admin: allow users to be removed preserving their home directories [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) [14:41:48] (03PS9) 10Vgutierrez: acme_chief: Provide OCSP stapling support [puppet] - 10https://gerrit.wikimedia.org/r/499746 (https://phabricator.wikimedia.org/T213705) [14:41:50] (03PS3) 10Vgutierrez: tlsproxy: Allow acme-chief certs to be deployed [puppet] - 10https://gerrit.wikimedia.org/r/499779 (https://phabricator.wikimedia.org/T213705) [14:41:52] (03PS3) 10Vgutierrez: hieradata: Deploy acme-chief unified certificate in eqsin cp servers [puppet] - 10https://gerrit.wikimedia.org/r/499780 (https://phabricator.wikimedia.org/T213705) [14:43:42] (03PS3) 10Elukey: admin: allow users to be removed preserving their home directories [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) [14:44:17] 10Operations, 10Analytics, 10Analytics-Kanban, 
10Patch-For-Review, 10User-Elukey: Archival of home directories on servers with very large homes - https://phabricator.wikimedia.org/T215171 (10elukey) [14:44:46] (03CR) 10Vgutierrez: "pcc looks as expected, it shows no significative changes in cp3030 and the expected changes in cp5007: https://puppet-compiler.wmflabs.org" [puppet] - 10https://gerrit.wikimedia.org/r/499780 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez) [14:45:11] !log ppchelko@deploy1001 Started deploy [cpjobqueue/deploy@4deeb04]: Partition htmlCacheUpdate topic, final cleanup stage T219159 [14:45:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:15] T219159: Partition htmlCacheUpdate job topic - https://phabricator.wikimedia.org/T219159 [14:46:03] !log ppchelko@deploy1001 Finished deploy [cpjobqueue/deploy@4deeb04]: Partition htmlCacheUpdate topic, final cleanup stage T219159 (duration: 00m 52s) [14:46:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:47:48] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (10Papaul) switch port information db2097 asw-a6-codfw ge-6/0 6 db2098 asw-b6-codfw ge-6/0/0 db2099 asw-c6-codfw ge-6/0/6 db2100 asw-d1-codfw ge-1/0/0 db2101 as... [14:48:51] (03CR) 10CRusnov: "> Patch Set 6:" (031 comment) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (https://phabricator.wikimedia.org/T212526) (owner: 10CRusnov) [14:50:15] 10Operations, 10ops-codfw, 10decommission, 10Patch-For-Review, 10cloud-services-team (Kanban): decommission: cloudnet2001-dev.codfw.wmnet - https://phabricator.wikimedia.org/T218025 (10Papaul) p:05Triage→03Normal [14:52:06] 10Operations, 10wikitech.wikimedia.org: wikitech-static cert renewal seems to stop apache2 - https://phabricator.wikimedia.org/T214640 (10Dzahn) :( Sad. 
Such a fight to get certbot to take over and not have to manually deal with renewals anymore and now this. I can find some other users reporting it: https:... [14:53:49] (03PS2) 10Cwhite: prometheus: clean up node exporter transition code [puppet] - 10https://gerrit.wikimedia.org/r/499667 (https://phabricator.wikimedia.org/T213708) [14:54:09] (03CR) 10Cwhite: prometheus: clean up node exporter transition code (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/499667 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [14:54:42] (03CR) 10Muehlenhoff: [C: 03+1] prometheus: clean up node exporter transition code [puppet] - 10https://gerrit.wikimedia.org/r/499667 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [14:54:59] (03PS2) 10Ottomata: eventgate-analytics - POST test event to service for readinessProbe [deployment-charts] - 10https://gerrit.wikimedia.org/r/499576 (https://phabricator.wikimedia.org/T218680) [14:55:27] (03CR) 10Cwhite: [C: 03+1] prometheus: set v2 max block duration to 24h [puppet] - 10https://gerrit.wikimedia.org/r/499742 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [14:56:36] 10Operations, 10wikitech.wikimedia.org: wikitech-static cert renewal seems to stop apache2 - https://phabricator.wikimedia.org/T214640 (10Dzahn) Our certbot crontab lines are: ` @monthly /usr/local/sbin/acme-setup -i wikitech-static -s wikitech-static.wikimedia.org -m acme -w apache2 @monthly /usr/local/sbin... 
[14:56:54] 10Operations, 10ops-codfw, 10decommission, 10Patch-For-Review, 10cloud-services-team (Kanban): decommission: cloudnet2001-dev.codfw.wmnet - https://phabricator.wikimedia.org/T218025 (10Papaul) Switch information ge-8/0/10 ge-8/0/11 [14:57:43] (03CR) 10Ottomata: [V: 03+2 C: 03+2] eventgate-analytics - POST test event to service for readinessProbe [deployment-charts] - 10https://gerrit.wikimedia.org/r/499576 (https://phabricator.wikimedia.org/T218680) (owner: 10Ottomata) [14:59:20] 10Operations, 10wikitech.wikimedia.org: wikitech-static cert renewal seems to stop apache2 - https://phabricator.wikimedia.org/T214640 (10Krenair) Can't we have it use webroot-based authorisation with the existing web server? [14:59:27] (03CR) 10Volans: Add report which checks against puppetdb and compares serial numbers (031 comment) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (https://phabricator.wikimedia.org/T212526) (owner: 10CRusnov) [15:01:00] (03PS1) 10Ottomata: eventgate-analytics - bump to 0.0.21 [deployment-charts] - 10https://gerrit.wikimedia.org/r/499786 [15:01:11] (03PS2) 10Ottomata: eventgate-analytics - bump to 0.0.21 [deployment-charts] - 10https://gerrit.wikimedia.org/r/499786 [15:01:29] (03CR) 10Ottomata: [V: 03+2 C: 03+2] eventgate-analytics - bump to 0.0.21 [deployment-charts] - 10https://gerrit.wikimedia.org/r/499786 (owner: 10Ottomata) [15:06:14] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [15:06:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:17] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [15:06:17] !log otto@deploy1001 scap-helm eventgate-analytics finished [15:06:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:20] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:47] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: 10Elukey) [15:07:47] !log otto@deploy1001 scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw] [15:07:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:50] !log otto@deploy1001 scap-helm eventgate-analytics cluster codfw completed [15:07:50] !log otto@deploy1001 scap-helm eventgate-analytics finished [15:07:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:54] !log otto@deploy1001 scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad] [15:07:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:56] !log otto@deploy1001 scap-helm eventgate-analytics cluster eqiad completed [15:07:56] !log otto@deploy1001 scap-helm eventgate-analytics finished [15:07:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:48] (03PS18) 10Daimona Eaytoy: Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772 [15:09:54] (03PS17) 10Daimona Eaytoy: Move all AbuseFilter config to abusefilter.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477063 (https://phabricator.wikimedia.org/T145931) [15:10:12] (03PS2) 10Daimona Eaytoy: Revert "Revert "Remove $wgAbuseFilterRuntimeProfile"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498818 
(https://phabricator.wikimedia.org/T191039) [15:15:42] !log wikitech-static - removing acme-setup cron jobs from root's crontab. this was used before the switch to certbot, is unrelated and added to confusion and maybe the problem (T214640) [15:15:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:15:47] T214640: wikitech-static cert renewal seems to stop apache2 - https://phabricator.wikimedia.org/T214640 [15:16:01] 10Operations, 10CirrusSearch, 10Wikidata, 10Discovery-Search (Current work): Elasticsearch indices went read-only causing huge lag - https://phabricator.wikimedia.org/T219364 (10dcausse) Backlog of updates is now completely absorbed, a script has been run to catchup lost updates, nothing we can do at this... [15:16:11] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (10RobH) [15:17:24] (03PS3) 10Jbond: jessie-backports: warn users if they try to use backports on jessie [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) [15:17:59] (03CR) 10Bstorm: [C: 03+1] "I think we killed off the last flannel trusty node. This should be good to go." [puppet] - 10https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [15:18:34] (03PS1) 10Mathew.onipe: elasticsearch: split plugin into base and cirrus [puppet] - 10https://gerrit.wikimedia.org/r/499790 (https://phabricator.wikimedia.org/T214921) [15:18:39] 10Operations, 10ops-eqiad, 10Operations-Software-Development, 10monitoring, 10Patch-For-Review: ms-be1043 sdk failed - https://phabricator.wikimedia.org/T218544 (10fgiunchedi) >>! In T218544#5054275, @Cmjohnson wrote: > You have successfully submitted request SR988320478. > > A disk has been ordered Th... 
[15:18:42] (03PS3) 10Dzahn: k8s::flannel: remove upstart, use systemd::service instead [puppet] - 10https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) [15:21:13] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1020 - https://phabricator.wikimedia.org/T214778 (10fgiunchedi) >>! In T214778#5054226, @Cmjohnson wrote: > This server raid appears to be in optimal condition. I verified the h/w and icinga is not reporting a degraded raid. Resolving for now The firmware upgra... [15:25:39] (03CR) 10Mathew.onipe: "PCC is Ok. Changes are expected: https://puppet-compiler.wmflabs.org/compiler1002/15403/" [puppet] - 10https://gerrit.wikimedia.org/r/499790 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [15:25:52] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): decommission: labtestvirt200[12].codfw.wmnet - https://phabricator.wikimedia.org/T218023 (10Papaul) ge-5/0/8 ge-5/0/30 ge-5/0/31 ge-5/0/17 [15:37:30] 10Operations, 10Analytics, 10Discovery, 10Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10Nuria) Also, https://etherpad.wikimedia.org/p/moving-data-analytics-prod [15:41:41] (03CR) 10Cwhite: [C: 03+2] "PCC looks good: https://puppet-compiler.wmflabs.org/compiler1002/15404/" [puppet] - 10https://gerrit.wikimedia.org/r/499667 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [15:41:55] (03PS3) 10Cwhite: prometheus: clean up node exporter transition code [puppet] - 10https://gerrit.wikimedia.org/r/499667 (https://phabricator.wikimedia.org/T213708) [15:43:38] (03PS4) 10Jbond: jessie-backports: warn users if they try to use backports on jessie [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) [15:47:42] dcausse: ping on https://phabricator.wikimedia.org/T219162#5053961 - still not working :) [15:47:52] (03PS1) 10DCausse: [cirrus] Use bm25 
similarity for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499795 (https://phabricator.wikimedia.org/T219268) [15:48:15] I've downloaded the patch to my home dir for the time being. [15:48:50] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Use bm25 similarity for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499795 (https://phabricator.wikimedia.org/T219268) (owner: 10DCausse) [15:48:56] Krinkle: ok, will ping Guillaume, cannot deploy myself [15:49:25] gehel: mind shipping https://gerrit.wikimedia.org/r/#/c/498927/ ? [15:49:37] looking [15:50:23] (03PS3) 10Gehel: mwgrep: add support for elasticsearch6 [puppet] - 10https://gerrit.wikimedia.org/r/498927 (https://phabricator.wikimedia.org/T219162) (owner: 10DCausse) [15:51:33] (03CR) 10Gehel: [C: 03+2] mwgrep: add support for elasticsearch6 [puppet] - 10https://gerrit.wikimedia.org/r/498927 (https://phabricator.wikimedia.org/T219162) (owner: 10DCausse) [15:51:36] (03CR) 10Mobrovac: "ping Filippo / Alex" [puppet] - 10https://gerrit.wikimedia.org/r/469791 (https://phabricator.wikimedia.org/T207143) (owner: 10Mobrovac) [15:51:42] gehel: thanks! [15:56:11] (03PS5) 10Jbond: jessie-backports: warn users if they try to use backports on jessie [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) [16:00:00] !log poweroff sessionstore2001 for a re-racking [16:00:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:05] godog and _joe_: Time to snap out of that daydream and deploy Puppet SWAT(Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T1600). [16:00:05] No GERRIT patches in the queue for this window AFAICS. 
[16:00:53] 10Operations, 10Performance-Team, 10Traffic: Send peering requests to AS with the worst TTFB - https://phabricator.wikimedia.org/T219486 (10ayounsi) The first step when looking at peering with a provider is to check if we're both present at a common exchange point. You can see where we are present on https:... [16:01:23] (03PS6) 10Jbond: jessie-backports: warn users if they try to use backports on jessie [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) [16:03:26] (03PS2) 10Gehel: elasticsearch: split plugin into base and cirrus [puppet] - 10https://gerrit.wikimedia.org/r/499790 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [16:04:07] 10Operations, 10ops-eqiad, 10Analytics: install new GPU in stat1005 - https://phabricator.wikimedia.org/T219522 (10RobH) p:05Triage→03Normal [16:04:29] (03CR) 10Gehel: [C: 03+2] elasticsearch: split plugin into base and cirrus [puppet] - 10https://gerrit.wikimedia.org/r/499790 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [16:05:38] 10Operations, 10ops-eqiad, 10Analytics: install new GPU in stat1005 - https://phabricator.wikimedia.org/T219522 (10RobH) So the main concern about this is if it will physically fit. Task T216226 - GPU upgrade for stat1005, has a sub-task T216528 where Chris took measurements of the inside of the chassis,... 
[16:09:23] 10Operations: Session storage Cassandra metrics (Prometheus) not being collected - https://phabricator.wikimedia.org/T219523 (10Eevans) [16:09:25] (03PS2) 10Jbond: jessie-backports: remove pins for jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/499778 (https://phabricator.wikimedia.org/T219333) [16:09:27] (03PS1) 10Jbond: jessie-backports: create new component for kube2proxy [puppet] - 10https://gerrit.wikimedia.org/r/499803 (https://phabricator.wikimedia.org/T213711) [16:18:19] 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: install new GPU in stat1005 - https://phabricator.wikimedia.org/T219522 (10elukey) [16:20:02] (03CR) 10Muehlenhoff: [C: 03+1] jessie-backports: create new component for kube2proxy [puppet] - 10https://gerrit.wikimedia.org/r/499803 (https://phabricator.wikimedia.org/T213711) (owner: 10Jbond) [16:23:14] (03PS2) 10Jbond: jessie-backports: create new component for kube2proxy [puppet] - 10https://gerrit.wikimedia.org/r/499803 (https://phabricator.wikimedia.org/T213711) [16:24:32] (03CR) 10Jbond: [C: 03+2] jessie-backports: create new component for kube2proxy [puppet] - 10https://gerrit.wikimedia.org/r/499803 (https://phabricator.wikimedia.org/T213711) (owner: 10Jbond) [16:27:41] 10Operations, 10wikitech.wikimedia.org: wikitech-static cert renewal seems to stop apache2 - https://phabricator.wikimedia.org/T214640 (10Dzahn) Soo.. the acme-setup crons were unrelated and are removed now and the actual cron that comes with certbot, from the Debian package of certbot, so done by Debian, is:... 
[16:27:50] (03PS1) 10Arturo Borrero Gonzalez: openstack: add glance support for mitaka/stretch in cloudcontrol servers [puppet] - 10https://gerrit.wikimedia.org/r/499806 (https://phabricator.wikimedia.org/T215407) [16:29:12] (03CR) 10jerkins-bot: [V: 04-1] openstack: add glance support for mitaka/stretch in cloudcontrol servers [puppet] - 10https://gerrit.wikimedia.org/r/499806 (https://phabricator.wikimedia.org/T215407) (owner: 10Arturo Borrero Gonzalez) [16:29:26] (03PS3) 10Jbond: jessie-backports: remove pins for jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/499778 (https://phabricator.wikimedia.org/T219333) [16:29:29] (03PS1) 10Jbond: jessie-backports: remove redundant pins [puppet] - 10https://gerrit.wikimedia.org/r/499808 (https://phabricator.wikimedia.org/T219333) [16:31:01] PROBLEM - Apache HTTP on mw1315 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Application_servers [16:31:26] (03PS1) 10Arturo Borrero Gonzalez: wmcs: instances: require clientpackages from eqiad1 [puppet] - 10https://gerrit.wikimedia.org/r/499810 [16:32:17] RECOVERY - Apache HTTP on mw1315 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.039 second response time https://wikitech.wikimedia.org/wiki/Application_servers [16:32:49] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] wmcs: instances: require clientpackages from eqiad1 [puppet] - 10https://gerrit.wikimedia.org/r/499810 (owner: 10Arturo Borrero Gonzalez) [16:33:20] !log disable cr2-codfw:xe-5/0/0 (to cr2-eqdfw) [16:33:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:35:48] (03PS1) 10Bstorm: gridengine: fix the dedicated class up a bit [puppet] - 10https://gerrit.wikimedia.org/r/499813 (https://phabricator.wikimedia.org/T218126) [16:36:17] !log move python3-requests and python3-urllib3 from jessie-wikimedia backports to component/kube2proxy [16:36:19] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:36:23] (03PS11) 10Vgutierrez: Allow acme-chief to provide unified cert [puppet] - 10https://gerrit.wikimedia.org/r/497929 (https://phabricator.wikimedia.org/T182927) (owner: 10Alex Monk) [16:36:27] !log wikitech-static - changing [renewalparams] authenticator = to 'apache' from 'standalone' (installer = was already apache) (T214640) [16:36:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:36:31] T214640: wikitech-static cert renewal seems to stop apache2 - https://phabricator.wikimedia.org/T214640 [16:38:30] 10Operations, 10Performance-Team, 10Traffic: Send peering requests to AS with the worst TTFB - https://phabricator.wikimedia.org/T219486 (10Gilles) Thanks for the details. I don't have permission to access T186835 Is this manual work something I can do myself? [16:39:46] !log enable cr2-codfw:xe-5/0/0 (to cr2-eqdfw) [16:39:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:40:17] (03PS2) 10Bstorm: gridengine: fix the dedicated class up a bit [puppet] - 10https://gerrit.wikimedia.org/r/499813 (https://phabricator.wikimedia.org/T218126) [16:40:20] (03CR) 10Vgutierrez: "pcc happy and showing almost a NOOP in cp3030: https://puppet-compiler.wmflabs.org/compiler1001/15408/" [puppet] - 10https://gerrit.wikimedia.org/r/497929 (https://phabricator.wikimedia.org/T182927) (owner: 10Alex Monk) [16:41:05] (03Abandoned) 10Vgutierrez: tlsproxy: Allow acme-chief certs to be deployed [puppet] - 10https://gerrit.wikimedia.org/r/499779 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez) [16:41:31] 10Operations, 10PHP 7.2 support: PHP Fatal error: The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp) - https://phabricator.wikimedia.org/T214734 (10fgiunchedi) Is it sporadic or reproducible at will? 
My first thought was rsyslog not running (or restarting) and thus 10514 udp not open [16:41:48] 10Operations, 10PHP 7.2 support, 10User-fgiunchedi: PHP Fatal error: The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp) - https://phabricator.wikimedia.org/T214734 (10fgiunchedi) [16:42:24] 10Operations, 10User-fgiunchedi: Session storage Cassandra metrics (Prometheus) not being collected - https://phabricator.wikimedia.org/T219523 (10fgiunchedi) [16:42:39] (03CR) 10Bstorm: [C: 03+2] gridengine: fix the dedicated class up a bit [puppet] - 10https://gerrit.wikimedia.org/r/499813 (https://phabricator.wikimedia.org/T218126) (owner: 10Bstorm) [16:45:49] (03PS4) 10Vgutierrez: hieradata: Deploy acme-chief unified certificate in eqsin cp servers [puppet] - 10https://gerrit.wikimedia.org/r/499780 (https://phabricator.wikimedia.org/T213705) [16:46:26] (03PS1) 10Jbond: jessie-backports: add component/kube2proxy apt repository [puppet] - 10https://gerrit.wikimedia.org/r/499815 (https://phabricator.wikimedia.org/T219333) [16:46:50] vgutierrez, you realise this thing is set to actually start using the cert right? [16:47:25] like the cert is not just pulled down to the host, it's actually going to serve traffic using it [16:48:19] (03PS1) 10Herron: rename admin stub files [labs/private] - 10https://gerrit.wikimedia.org/r/499816 [16:48:44] (03CR) 10Herron: [V: 03+2 C: 03+2] rename admin stub files [labs/private] - 10https://gerrit.wikimedia.org/r/499816 (owner: 10Herron) [16:49:26] 10Operations, 10netops: Add eqsin routing special cases to jnt - https://phabricator.wikimedia.org/T211930 (10ayounsi) a:05faidon→03ayounsi Moving forward on that as the latest plan (taking the feedback into consideration) is anyway better than what we currently have deployed in Singapore. 
[16:50:33] (03PS12) 10Vgutierrez: Allow acme-chief to provide unified cert [puppet] - 10https://gerrit.wikimedia.org/r/497929 (https://phabricator.wikimedia.org/T182927) (owner: 10Alex Monk) [16:50:35] (03PS5) 10Vgutierrez: hieradata: Deploy acme-chief unified certificate in eqsin cp servers [puppet] - 10https://gerrit.wikimedia.org/r/499780 (https://phabricator.wikimedia.org/T213705) [16:51:26] (03PS12) 10Jbond: Move qualified parameters to there correct location [puppet] - 10https://gerrit.wikimedia.org/r/497767 [16:51:45] (03PS2) 10DCausse: [cirrus] Use bm25 similarity for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499795 (https://phabricator.wikimedia.org/T219268) [16:53:46] (03PS13) 10Vgutierrez: Allow acme-chief to provide unified cert [puppet] - 10https://gerrit.wikimedia.org/r/497929 (https://phabricator.wikimedia.org/T182927) (owner: 10Alex Monk) [16:53:48] (03PS6) 10Vgutierrez: hieradata: Deploy acme-chief unified certificate in eqsin cp servers [puppet] - 10https://gerrit.wikimedia.org/r/499780 (https://phabricator.wikimedia.org/T213705) [16:53:52] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Use bm25 similarity for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499795 (https://phabricator.wikimedia.org/T219268) (owner: 10DCausse) [16:54:36] (03PS1) 10Arturo Borrero Gonzalez: Revert "wmcs: instances: require clientpackages from eqiad1" [puppet] - 10https://gerrit.wikimedia.org/r/499817 [16:54:48] vgutierrez? [16:55:51] (03PS2) 10Arturo Borrero Gonzalez: Revert "wmcs: instances: require clientpackages from eqiad1" [puppet] - 10https://gerrit.wikimedia.org/r/499817 [16:57:19] -traffic [16:57:29] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] Revert "wmcs: instances: require clientpackages from eqiad1" [puppet] - 10https://gerrit.wikimedia.org/r/499817 (owner: 10Arturo Borrero Gonzalez) [17:00:04] cscott, arlolra, subbu, halfak, and Amir1: #bothumor I � Unicode. 
All rise for Services – Graphoid / Parsoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T1700). [17:01:56] 10Operations, 10Continuous-Integration-Config, 10Patch-For-Review, 10Upstream, 10User-zeljkofilipin: npm 6 consistently fails with "Z_DATA_ERROR: invalid distance too far back" on some repos - https://phabricator.wikimedia.org/T215562 (10Krinkle) a:03MoritzMuehlenhoff [17:02:24] (03CR) 10Cwhite: [C: 03+2] Switch mjolnir to rsyslog based structured logging [puppet] - 10https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) (owner: 10EBernhardson) [17:02:32] (03PS8) 10Cwhite: Switch mjolnir to rsyslog based structured logging [puppet] - 10https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) (owner: 10EBernhardson) [17:04:27] (03CR) 10Herron: [C: 03+1] facter: fix interface_primary under newer versions of facter [puppet] - 10https://gerrit.wikimedia.org/r/398120 (https://phabricator.wikimedia.org/T182819) (owner: 10Herron) [17:15:37] (03PS3) 10DCausse: [cirrus] Use bm25 similarity for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499795 (https://phabricator.wikimedia.org/T219268) [17:16:12] 10Operations, 10Continuous-Integration-Config, 10Patch-For-Review, 10User-zeljkofilipin: npm 6 consistently fails with "Z_DATA_ERROR: invalid distance too far back" on some repos - https://phabricator.wikimedia.org/T215562 (10Krinkle) It seems we've found the culprit. The problem is indeed the zlib1g libra... 
[17:17:34] (03PS5) 10Jbond: facter: fix interface_primary under newer versions of facter [puppet] - 10https://gerrit.wikimedia.org/r/398120 (https://phabricator.wikimedia.org/T182819) (owner: 10Herron) [17:17:54] (03PS1) 10Vgutierrez: nagios_common: provide check_ssl_unified variants for LE certs [puppet] - 10https://gerrit.wikimedia.org/r/499823 (https://phabricator.wikimedia.org/T213705) [17:19:13] (03CR) 10Jbond: [C: 03+2] facter: fix interface_primary under newer versions of facter [puppet] - 10https://gerrit.wikimedia.org/r/398120 (https://phabricator.wikimedia.org/T182819) (owner: 10Herron) [17:21:05] (03PS2) 10Addshore: wikibase.php, define sharedCacheKeyGroup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499158 [17:21:07] (03PS2) 10Arturo Borrero Gonzalez: openstack: add glance support for mitaka/stretch in cloudcontrol servers [puppet] - 10https://gerrit.wikimedia.org/r/499806 (https://phabricator.wikimedia.org/T215407) [17:23:48] (03PS1) 10Vgutierrez: cache: serve wikiba.se traffic using cache::text servers [puppet] - 10https://gerrit.wikimedia.org/r/499825 (https://phabricator.wikimedia.org/T213705) [17:24:07] 10Operations, 10Traffic, 10Wikidata, 10Wikidata-Query-Service, and 2 others: Reduce / remove the aggessive cache busting behaviour of wdqs-updater - https://phabricator.wikimedia.org/T217897 (10Addshore) >>! In T217897#5062728, @Smalyshev wrote: >> the cache we are talking about there would be unnecessary... [17:25:14] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "catalog compilation: https://puppet-compiler.wmflabs.org/compiler1002/15413/" [puppet] - 10https://gerrit.wikimedia.org/r/499806 (https://phabricator.wikimedia.org/T215407) (owner: 10Arturo Borrero Gonzalez) [17:25:36] 10Operations, 10Performance-Team, 10Traffic: Send peering requests to AS with the worst TTFB - https://phabricator.wikimedia.org/T219486 (10ayounsi) (Added you to the task) In some measure, yes. Getting the routing table is the most complicated part. 
As a one of, I've been SSHing directly to the routers, bu... [17:25:46] (03CR) 10jerkins-bot: [V: 04-1] cache: serve wikiba.se traffic using cache::text servers [puppet] - 10https://gerrit.wikimedia.org/r/499825 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez) [17:27:09] Hello! The fix for T219514 should be backported before train. Shall I add it to the next SWAT window, or will whoever is in charge of the train take care of that? [17:27:10] T219514: Variables old_wikitext and new_wikitext are blank in Page namespace - https://phabricator.wikimedia.org/T219514 [17:29:18] (03PS2) 10Vgutierrez: cache: serve wikiba.se traffic using cache::text servers [puppet] - 10https://gerrit.wikimedia.org/r/499825 (https://phabricator.wikimedia.org/T213705) [17:29:25] (03CR) 10Andrew Bogott: "@Giuseppe, given that your objections haven't born out (this works, and doesn't disrupt puppetdb-populate), can I persuade you to remove y" [puppet] - 10https://gerrit.wikimedia.org/r/499026 (https://phabricator.wikimedia.org/T219430) (owner: 10Andrew Bogott) [17:29:51] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 239, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [17:30:44] (03PS1) 10Ema: Use scanner.Bytes instead of .Text [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/499827 [17:35:45] (03PS1) 10Cwhite: role: add kafka_shipper to elasticsearch::cirrus role [puppet] - 10https://gerrit.wikimedia.org/r/499829 (https://phabricator.wikimedia.org/T213899) [17:37:43] (03PS1) 10Alex Monk: db-labs: Update MW to use new master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499830 (https://phabricator.wikimedia.org/T219087) [17:38:59] (03CR) 10Herron: [C: 03+1] "LGTM as long as PCC is happy" [puppet] - 10https://gerrit.wikimedia.org/r/499829 (https://phabricator.wikimedia.org/T213899) (owner: 10Cwhite) [17:40:37] (03PS3) 
10Vgutierrez: cache: serve wikiba.se traffic using cache::canary servers [puppet] - 10https://gerrit.wikimedia.org/r/499825 (https://phabricator.wikimedia.org/T213705) [17:40:43] (03CR) 10Cwhite: [C: 03+2] "PCC looks good https://puppet-compiler.wmflabs.org/compiler1002/15418/" [puppet] - 10https://gerrit.wikimedia.org/r/499829 (https://phabricator.wikimedia.org/T213899) (owner: 10Cwhite) [17:41:47] PROBLEM - nova-compute proc maximum on cloudvirt1024 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [17:42:19] (03PS1) 10Paladox: Fix Quota rest rate limiter support [software/gerrit/plugins/quota] (stable-2.15) - 10https://gerrit.wikimedia.org/r/499833 [17:42:22] ^^^ looking [17:42:35] (03CR) 10Paladox: [V: 03+2 C: 03+2] Fix Quota rest rate limiter support [software/gerrit/plugins/quota] (stable-2.15) - 10https://gerrit.wikimedia.org/r/499833 (owner: 10Paladox) [17:43:31] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@ff9d424]: Deploy logging @cee: prefixing bugfix [17:43:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:43:37] (03CR) 10Vgutierrez: "pcc shows the expected changes in cp1008, the changes shown in cp5007 are from the previous changes that are not still merged, (NOOP for t" [puppet] - 10https://gerrit.wikimedia.org/r/499825 (https://phabricator.wikimedia.org/T213705) (owner: 10Vgutierrez) [17:43:39] (03CR) 10Bstorm: [C: 03+1] "Any affected hosts are definitely not Trusty on the cloud side at this point." 
[puppet] - 10https://gerrit.wikimedia.org/r/499769 (owner: 10Dzahn) [17:44:20] (03PS1) 10Paladox: Merge branch 'stable-2.15' into HEAD [software/gerrit/plugins/quota] (stable-2.16) - 10https://gerrit.wikimedia.org/r/499834 [17:46:02] (03PS2) 10Paladox: Merge branch 'stable-2.15' into stable-2.16 [software/gerrit/plugins/quota] (stable-2.16) - 10https://gerrit.wikimedia.org/r/499834 [17:46:17] (03PS1) 10Paladox: Merge branch 'stable-2.16' [software/gerrit/plugins/quota] - 10https://gerrit.wikimedia.org/r/499835 [17:46:32] (03CR) 10Paladox: [V: 03+2 C: 03+2] Merge branch 'stable-2.15' into stable-2.16 [software/gerrit/plugins/quota] (stable-2.16) - 10https://gerrit.wikimedia.org/r/499834 (owner: 10Paladox) [17:46:39] (03CR) 10Paladox: [V: 03+2 C: 03+2] Merge branch 'stable-2.16' [software/gerrit/plugins/quota] - 10https://gerrit.wikimedia.org/r/499835 (owner: 10Paladox) [17:46:55] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@ff9d424]: Deploy logging @cee: prefixing bugfix (duration: 03m 24s) [17:46:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:48:35] (03PS1) 10Paladox: Update quota to our fork [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/499836 [17:50:51] (03PS1) 10Paladox: Update quota to our fork [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/499837 [17:52:10] (03CR) 10Paladox: [V: 03+2 C: 03+2] Update quota to our fork [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/499836 (owner: 10Paladox) [17:52:25] (03CR) 10Paladox: [V: 03+2 C: 03+2] Update quota to our fork [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/499837 (owner: 10Paladox) [17:53:41] jouncebot: now [17:53:41] For the next 0 hour(s) and 6 minute(s): Services – Graphoid / Parsoid / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T1700) [17:53:44] jouncebot: next [17:53:44] In 0 hour(s) and 6 minute(s): Morning 
SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T1800) [17:53:52] aah James_F if we are aiming for that swat i can do it [17:54:17] (03PS9) 10MSantos: Pass flag use_nodejs10 for maps services [puppet] - 10https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523) [17:56:05] addshore: It's UBN, it takes priority. [17:57:53] (03PS1) 10Arturo Borrero Gonzalez: openstack: add nova support for mitaka/stretch in cloudcontrol servers [puppet] - 10https://gerrit.wikimedia.org/r/499840 (https://phabricator.wikimedia.org/T215407) [17:58:15] James_F: I'll make the cherry pick :) [17:58:49] (03PS2) 10Alex Monk: db-labs: Update MW to use new master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499830 (https://phabricator.wikimedia.org/T219087) [17:59:30] 10Operations, 10Traffic, 10Wikidata, 10Wikidata-Query-Service, and 2 others: Reduce / remove the aggessive cache busting behaviour of wdqs-updater - https://phabricator.wikimedia.org/T217897 (10Smalyshev) > WDQS does know what the latest version of the entity that it is trying to get updates for is, But "... [17:59:43] addshore: I'm waiting for it to land so the cherry-pick refers to the git hash of master. [18:00:02] James_F: is that a thing we normally do? =o [18:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Morning SWAT (Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T1800). [18:00:04] No GERRIT patches in the queue for this window AFAICS. [18:00:12] [I'm crashing SWAT. No deploys please.] [18:00:20] addshore: Yes. Traceability is important. 
[18:00:29] * addshore is going to add something to the calendar for after this UBN deploy fix [18:00:36] James_F: I mean, they always have the same changeid [18:01:03] (03CR) 10jerkins-bot: [V: 04-1] openstack: add nova support for mitaka/stretch in cloudcontrol servers [puppet] - 10https://gerrit.wikimedia.org/r/499840 (https://phabricator.wikimedia.org/T215407) (owner: 10Arturo Borrero Gonzalez) [18:04:01] 10Operations, 10DBA, 10Data-Services, 10cloud-services-team: Prepare and check storage layer for hywwiki - https://phabricator.wikimedia.org/T212625 (10Marostegui) This is no longer throwing errors from what I can see. @Ladsgroup you fixed it in the end? I can also see 3 users there already created and we... [18:05:20] 10Operations, 10DBA, 10Data-Services, 10cloud-services-team: Prepare and check storage layer for hywwiki - https://phabricator.wikimedia.org/T212625 (10Ladsgroup) No it got reverted. It was sending fatal errors. [18:07:16] (03CR) 10CDanis: [C: 03+1] "LGTM with one nit" (031 comment) [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/499368 (owner: 10Volans) [18:12:21] (03PS2) 10Arturo Borrero Gonzalez: openstack: add nova support for mitaka/stretch in cloudcontrol servers [puppet] - 10https://gerrit.wikimedia.org/r/499840 (https://phabricator.wikimedia.org/T215407) [18:12:49] 10Operations, 10DBA, 10Data-Services, 10cloud-services-team: Prepare and check storage layer for hywwiki - https://phabricator.wikimedia.org/T212625 (10Marostegui) Ok - let me know when it is attempted again. Thank you! [18:13:10] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: add nova support for mitaka/stretch in cloudcontrol servers [puppet] - 10https://gerrit.wikimedia.org/r/499840 (https://phabricator.wikimedia.org/T215407) (owner: 10Arturo Borrero Gonzalez) [18:13:37] addshore: Testing now, looks good. [18:14:51] great! [18:17:11] James_F Any chance you could backport+deploy the fix for the other train blocker too? 
[18:17:50] !log jforrester@deploy1001 Synchronized php-1.33.0-wmf.23/extensions/AdvancedSearch/: AdvancedSearch: Fix two UBNs T219455 T219539 (duration: 00m 59s) [18:17:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:55] T219455: AdvancedSearch extension changes behaviour of default namespaces to be search for anon and logged in users - https://phabricator.wikimedia.org/T219455 [18:17:55] T219539: Browsing to Special:Search is broken on wmf.23, redirects to Special:Search&ns0=1 not Special:Search?ns0=1 - https://phabricator.wikimedia.org/T219539 [18:19:28] addshore: Thank you so much. [18:21:04] Or actually, since I'm here: am I on time to add a patch to this SWAT window? [18:21:50] Daimona: Sure, it's UBN. T219514, right? [18:21:51] T219514: Variables old_wikitext and new_wikitext are blank in Page namespace - https://phabricator.wikimedia.org/T219514 [18:21:57] Thanks! Yes [18:22:05] Linked the backport on phab, it's very simple [18:23:53] James_F: and I okay to do my config one while you prep that backport? [18:24:07] addshore: I'm already pulling it to debug. [18:24:12] ack, will wait [18:24:48] Daimona: Live on mwdebug1002; can you test? [18:24:58] Yes, going [18:25:06] Thank you. [18:27:35] * Daimona is waiting for the test page to save since about half a minute :O [18:28:08] yay, mwdebug servers [18:29:12] I never found them so slow [18:29:28] And now another minute to load AF... Eheh [18:29:43] Daimona: Yeah, they've been particularly bad today. [18:30:19] OK looks good [18:31:43] OK, deploying. [18:31:55] Thanks! 
[18:32:36] !log jforrester@deploy1001 Synchronized php-1.33.0-wmf.23/extensions/ProofreadPage/includes/Index/IndexContent.php: ProofreadPage: Fix AbuseFilter UBN T219514 (duration: 00m 57s) [18:32:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:32:41] T219514: Variables old_wikitext and new_wikitext are blank in Page namespace - https://phabricator.wikimedia.org/T219514 [18:32:47] Done. [18:32:54] James_F: am I free to proceed with my config change? :) [18:32:54] What's next. addshore? [18:32:57] Go go go. [18:33:00] (03CR) 10Addshore: [C: 03+2] wikibase.php, define sharedCacheKeyGroup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499158 (owner: 10Addshore) [18:33:40] (03PS3) 10Alex Monk: db-labs: Update MW to use new master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499830 (https://phabricator.wikimedia.org/T219087) [18:34:06] (03Merged) 10jenkins-bot: wikibase.php, define sharedCacheKeyGroup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499158 (owner: 10Addshore) [18:34:32] (03CR) 10jerkins-bot: [V: 04-1] db-labs: Update MW to use new master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499830 (https://phabricator.wikimedia.org/T219087) (owner: 10Alex Monk) [18:35:32] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: SWAT: wikibase.php, define sharedCacheKeyGroup (duration: 00m 57s) [18:35:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:35:34] James_F: thats me all done [18:35:37] (03PS4) 10Alex Monk: db-labs: Update MW to use new master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499830 (https://phabricator.wikimedia.org/T219087) [18:35:42] Cool. [18:35:52] Any chance I can add one to SWAT? [18:36:15] i dont see why not [18:37:29] Krenair: There's a no-need-to-deploy test fix landing in wmf.23 for ProofreadPage, but that doesn't need to block anyone. 
[18:37:51] added, thanks [18:38:11] (03PS3) 10Bstorm: osmdb: Switch the replica to the VM that needs to become the master [puppet] - 10https://gerrit.wikimedia.org/r/495290 (https://phabricator.wikimedia.org/T193264) [18:38:25] (03PS5) 10Addshore: db-labs: Update MW to use new master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499830 (https://phabricator.wikimedia.org/T219087) (owner: 10Alex Monk) [18:38:39] Krenair: its all correct? :) [18:38:48] it's a beta-only thing which is already cherry-pick-deployed [18:39:01] yes [18:39:01] aaah :D [18:39:06] (03CR) 10Addshore: [C: 03+2] db-labs: Update MW to use new master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499830 (https://phabricator.wikimedia.org/T219087) (owner: 10Alex Monk) [18:39:13] (03CR) 10Addshore: [C: 03+2] "already CPed and deployed on beta" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499830 (https://phabricator.wikimedia.org/T219087) (owner: 10Alex Monk) [18:39:42] Cool. [18:40:27] (03CR) 10jenkins-bot: wikibase.php, define sharedCacheKeyGroup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499158 (owner: 10Addshore) [18:40:59] (03CR) 10Bstorm: [C: 03+2] osmdb: Switch the replica to the VM that needs to become the master [puppet] - 10https://gerrit.wikimedia.org/r/495290 (https://phabricator.wikimedia.org/T193264) (owner: 10Bstorm) [18:41:15] (03Merged) 10jenkins-bot: db-labs: Update MW to use new master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499830 (https://phabricator.wikimedia.org/T219087) (owner: 10Alex Monk) [18:41:51] syncing [18:42:43] !log addshore@deploy1001 Synchronized wmf-config/db-labs.php: BETA ONLY db-labs (duration: 00m 57s) [18:42:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:42:44] Krenair: all done [18:42:50] thanks [18:43:03] I am tapping out for the day [18:44:51] 10Operations, 10Analytics, 10Discovery, 10Research: Make hadoop cluster able to push to swift - 
https://phabricator.wikimedia.org/T219544 (10Nuria) [18:45:16] !log switching replica for osmdb to clouddb1003 VM from labsdb1007 [18:45:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:48:47] 10Operations, 10Analytics, 10Discovery, 10Research: Make hadoop cluster able to push to swift - https://phabricator.wikimedia.org/T219544 (10Ottomata) Oh ho hoooo https://hadoop.apache.org/docs/current/hadoop-openstack/index.html [18:51:38] (03CR) 10jenkins-bot: db-labs: Update MW to use new master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499830 (https://phabricator.wikimedia.org/T219087) (owner: 10Alex Monk) [18:54:04] (03PS1) 10Bstorm: Revert "osmdb: Switch the replica to the VM that needs to become the master" [puppet] - 10https://gerrit.wikimedia.org/r/499862 [18:55:51] (03PS1) 10Mholloway: Enable WikimediaEditorTasks on wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499863 (https://phabricator.wikimedia.org/T218136) [18:56:20] (03CR) 10Mholloway: [C: 04-1] "hold for deployment" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499863 (https://phabricator.wikimedia.org/T218136) (owner: 10Mholloway) [18:56:22] (03CR) 10Bstorm: [C: 03+2] Revert "osmdb: Switch the replica to the VM that needs to become the master" [puppet] - 10https://gerrit.wikimedia.org/r/499862 (owner: 10Bstorm) [18:56:43] (03CR) 10Cwhite: [C: 03+2] service::uwsgi: Allow instances to disable logging config [puppet] - 10https://gerrit.wikimedia.org/r/498516 (https://phabricator.wikimedia.org/T217932) (owner: 10BryanDavis) [18:56:51] (03PS3) 10Cwhite: service::uwsgi: Allow instances to disable logging config [puppet] - 10https://gerrit.wikimedia.org/r/498516 (https://phabricator.wikimedia.org/T217932) (owner: 10BryanDavis) [18:57:09] (03CR) 10Cwhite: [C: 03+2] "PCC looks good: https://puppet-compiler.wmflabs.org/compiler1002/15412/" [puppet] - 10https://gerrit.wikimedia.org/r/498516 (https://phabricator.wikimedia.org/T217932) (owner: 
10BryanDavis) [18:58:03] (03CR) 10Cwhite: [C: 03+2] "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/498516 (https://phabricator.wikimedia.org/T217932) (owner: 10BryanDavis) [19:00:04] marxarelli: #bothumor I � Unicode. All rise for MediaWiki train - Americas version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T1900). [19:04:36] (03PS1) 10Dduvall: all wikis to 1.33.0-wmf.23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499866 [19:04:38] (03CR) 10Dduvall: [C: 03+2] all wikis to 1.33.0-wmf.23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499866 (owner: 10Dduvall) [19:05:35] (03PS3) 10Smalyshev: Enable new WBCS search together with all search settings. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499695 [19:05:51] (03Merged) 10jenkins-bot: all wikis to 1.33.0-wmf.23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499866 (owner: 10Dduvall) [19:07:45] (03PS1) 10Smalyshev: Enable new Lexeme search on testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499867 (https://phabricator.wikimedia.org/T216206) [19:07:47] (03PS1) 10Smalyshev: Enable new Lexeme search on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499868 [19:08:15] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.23 [19:09:19] dduvall@deploy1001: Failed to log message to wiki. Somebody should check the error logs. 
[19:09:37] (03CR) 10Volans: "reply inline" (031 comment) [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/499368 (owner: 10Volans) [19:09:46] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.23 [19:09:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:10:59] PROBLEM - HHVM rendering on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [19:11:09] PROBLEM - HHVM rendering on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [19:11:11] PROBLEM - Apache HTTP on mw1221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [19:11:25] PROBLEM - HHVM rendering on mw1346 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [19:11:33] PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [19:12:17] RECOVERY - Apache HTTP on mw1221 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.029 second response time https://wikitech.wikimedia.org/wiki/Application_servers [19:12:39] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 76012 bytes in 0.144 second response time https://wikitech.wikimedia.org/wiki/Application_servers [19:13:23] RECOVERY - HHVM rendering on mw1289 is OK: HTTP OK: HTTP/1.1 200 OK - 76012 bytes in 0.141 second response time https://wikitech.wikimedia.org/wiki/Application_servers [19:13:35] RECOVERY - HHVM rendering on mw1288 is OK: HTTP OK: HTTP/1.1 200 OK - 76014 bytes in 2.443 second response time https://wikitech.wikimedia.org/wiki/Application_servers [19:15:05] RECOVERY - HHVM rendering on mw1346 is OK: HTTP OK: HTTP/1.1 200 OK - 76012 bytes in 0.129 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers [19:19:20] !log 1.33.0-wmf.23 deployed for all wikis (T206677) [19:19:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:19:23] T206677: 1.33.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T206677 [19:20:45] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [19:20:49] (03CR) 10jenkins-bot: all wikis to 1.33.0-wmf.23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499866 (owner: 10Dduvall) [19:21:55] PROBLEM - Nginx local proxy to apache on mw1314 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.009 second response time https://wikitech.wikimedia.org/wiki/Application_servers [19:23:11] RECOVERY - Nginx local proxy to apache on mw1314 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.041 second response time https://wikitech.wikimedia.org/wiki/Application_servers [19:24:33] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1004 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [19:25:44] (03CR) 10Muehlenhoff: [C: 03+1] jessie-backports: add component/kube2proxy apt repository [puppet] - 10https://gerrit.wikimedia.org/r/499815 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [19:33:35] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, the labmon* hosts and labstore100[4-5] have the mitaka component enabled and the packages are available." 
[puppet] - 10https://gerrit.wikimedia.org/r/499778 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [19:34:12] (03PS1) 10Ladsgroup: ores: use hiera for statsd host [puppet] - 10https://gerrit.wikimedia.org/r/499875 (https://phabricator.wikimedia.org/T218567) [19:35:05] marxarelli: All well with the train? I've got a deployment scheduled for 90 mins from now, but it would be great to get it in sooner if possible [19:35:42] mdholloway: yep yep [19:35:56] great, thanks! [19:36:01] np [19:39:05] !log created table wikimedia_editor_tasks_entity_description_exists on wikidatawiki [19:39:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:40:24] (03PS2) 10Mholloway: Enable WikimediaEditorTasks on wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499863 (https://phabricator.wikimedia.org/T218136) [19:42:25] (03CR) 10Mholloway: [C: 03+2] Enable WikimediaEditorTasks on wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499863 (https://phabricator.wikimedia.org/T218136) (owner: 10Mholloway) [19:43:52] jouncebot: refresh [19:43:53] I refreshed my knowledge about deployments. [19:43:57] (03Merged) 10jenkins-bot: Enable WikimediaEditorTasks on wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499863 (https://phabricator.wikimedia.org/T218136) (owner: 10Mholloway) [19:44:15] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, two comments inline. Let's maybe also amend the existing docs in README.md, e.g." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [19:44:58] greg-g: FYI I just claimed the 22:00-23:00 UTC slot for s Striker bug fix to follow up my release last week [19:45:48] (03PS2) 10Ladsgroup: ores: use hiera for statsd host [puppet] - 10https://gerrit.wikimedia.org/r/499875 (https://phabricator.wikimedia.org/T218567) [19:47:21] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable WikimediaEditorTasks on wikidatawiki (duration: 00m 52s) [19:47:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:47:40] (03PS6) 10Herron: ores: ship to logstash via the kafka logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/497614 (https://phabricator.wikimedia.org/T213899) [19:48:18] (03PS2) 10Jbond: jessie-backports: add component/kube2proxy apt repository [puppet] - 10https://gerrit.wikimedia.org/r/499815 (https://phabricator.wikimedia.org/T219333) [19:49:19] PROBLEM - puppet last run on lvs1004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [19:49:56] (03CR) 10Jbond: [C: 03+2] jessie-backports: add component/kube2proxy apt repository [puppet] - 10https://gerrit.wikimedia.org/r/499815 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [19:50:03] Mar 28 19:45:16 lvs1004 puppet-agent[19194]: Could not retrieve catalog from remote server: Error 503 on SERVER: [19:51:08] manual run completed ok [19:51:13] will clear shortly [19:53:23] (03PS4) 10Jbond: jessie-backports: remove pins for jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/499778 (https://phabricator.wikimedia.org/T219333) [19:54:35] RECOVERY - puppet last run on lvs1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [19:55:11] (03CR) 10jenkins-bot: Enable WikimediaEditorTasks on wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499863 (https://phabricator.wikimedia.org/T218136) (owner: 10Mholloway) [19:55:17] (03CR) 10Jbond: [C: 03+2] jessie-backports: remove pins for jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/499778 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [19:56:35] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: 10Elukey) [19:56:57] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1002/15421/" [puppet] - 10https://gerrit.wikimedia.org/r/497614 (https://phabricator.wikimedia.org/T213899) (owner: 10Herron) [19:57:10] (03PS3) 10Ladsgroup: ores: use hiera for statsd host [puppet] - 10https://gerrit.wikimedia.org/r/499875 (https://phabricator.wikimedia.org/T218567) [20:11:56] (03CR) 10Ladsgroup: "It doesn't work because the labs role module doesn't include profile and it includes ores::web directly. I need to fix it." 
[puppet] - 10https://gerrit.wikimedia.org/r/499875 (https://phabricator.wikimedia.org/T218567) (owner: 10Ladsgroup) [20:13:25] (03Abandoned) 10Ladsgroup: Deprecate statsd hiera config in favor of statsd_host and statsd_port [puppet] - 10https://gerrit.wikimedia.org/r/497316 (https://phabricator.wikimedia.org/T218567) (owner: 10Ladsgroup) [20:14:45] (03PS7) 10Jbond: jessie-backports: warn users if they try to use backports on jessie [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) [20:16:24] (03CR) 10CDanis: [C: 03+1] check_icinga: fix retry logic (031 comment) [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/499368 (owner: 10Volans) [20:17:31] (03CR) 10Jbond: "> Patch Set 6: Code-Review+1" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [20:17:34] 10Operations, 10Data-Services, 10decommission, 10cloud-services-team (Kanban): Reclaim/Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet as soon as they are ready - https://phabricator.wikimedia.org/T216749 (10Bstorm) In the course of sorting out the disabling of things, I found out we monitor... [20:18:55] (03PS1) 10Andrew Bogott: wmcs-spreadcheck: return 0 on success [puppet] - 10https://gerrit.wikimedia.org/r/499887 [20:24:05] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [20:29:52] !log restarting Gerrit on cobalt to effect new Java security update [20:29:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:33:13] (03CR) 10Jbond: jessie-backports: warn users if they try to use backports on jessie (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [20:33:29] 10Operations, 10Analytics, 10EventBus, 10Services, 10vm-requests: Create schema[12]00[12] (schema.svc.{eqiad,codfw}.wmnet) - https://phabricator.wikimedia.org/T219556 (10Ottomata) [20:37:08] (03CR) 10Muehlenhoff: [C: 03+1] jessie-backports: warn users if they try to use backports on jessie (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [20:40:37] jouncebot: now [20:40:37] For the next 0 hour(s) and 19 minute(s): MediaWiki train - Americas version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T1900) [20:40:59] marxarelli, mdholloway: Deployment clear? [20:42:01] James_F: all done here [20:42:04] Kk. 
[20:42:11] (03PS5) 10Jforrester: VE section editing: Enable mobile AB test on remaining target wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498084 (https://phabricator.wikimedia.org/T218851) (owner: 10Esanders) [20:42:17] (03CR) 10Jforrester: [C: 03+2] VE section editing: Enable mobile AB test on remaining target wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498084 (https://phabricator.wikimedia.org/T218851) (owner: 10Esanders) [20:42:29] (03PS3) 10Jforrester: Wikimaniawiki: Enable visual editor in 2019 namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497682 (https://phabricator.wikimedia.org/T218645) (owner: 10Ammarpad) [20:42:35] (03CR) 10Jforrester: [C: 03+2] Wikimaniawiki: Enable visual editor in 2019 namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497682 (https://phabricator.wikimedia.org/T218645) (owner: 10Ammarpad) [20:42:53] (03PS3) 10Jforrester: [Wikitech] Enable VisualEditor in extra namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497081 [20:42:58] jouncebot: next [20:42:59] In 0 hour(s) and 17 minute(s): Enable WikimediaEditorTasks on wikidatawiki (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T2100) [20:43:23] (03Merged) 10jenkins-bot: VE section editing: Enable mobile AB test on remaining target wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498084 (https://phabricator.wikimedia.org/T218851) (owner: 10Esanders) [20:44:15] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster={cache_text,cache_upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [20:44:19] PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job={varnish-text,varnish-upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [20:45:45] (03CR) 10Jbond: "will wait for filippo before 
merging to make sure i haven't overlooked something" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) (owner: 10Jbond) [20:46:19] 10Operations, 10User-Eevans: Credentials needed for session storage Cassandra cluster - https://phabricator.wikimedia.org/T219560 (10Eevans) [20:46:49] (03PS1) 10Andrew Bogott: designate: open API to the backup nova controller [puppet] - 10https://gerrit.wikimedia.org/r/499908 [20:46:51] 10Operations, 10Cassandra, 10User-Eevans: Credentials needed for session storage Cassandra cluster - https://phabricator.wikimedia.org/T219560 (10Eevans) [20:48:02] (03PS4) 10Jforrester: Wikimaniawiki: Enable visual editor in 2019 namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497682 (https://phabricator.wikimedia.org/T218645) (owner: 10Ammarpad) [20:48:09] (03CR) 10Jforrester: [C: 03+2] "…" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497682 (https://phabricator.wikimedia.org/T218645) (owner: 10Ammarpad) [20:48:26] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: VisualEditor: Enable mobile section editing A/B test on 10 Wikipedias T218851 T218939 (duration: 00m 50s) [20:48:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:48:31] T218939: Deploy mobile section editing to the next set of wikis (listed in the description) - https://phabricator.wikimedia.org/T218939 [20:48:31] T218851: Instrument mobile visual editor for a section editing experiment - https://phabricator.wikimedia.org/T218851 [20:49:12] (03Merged) 10jenkins-bot: Wikimaniawiki: Enable visual editor in 2019 namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497682 (https://phabricator.wikimedia.org/T218645) (owner: 10Ammarpad) [20:49:21] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [20:49:25] RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [20:49:37] (03PS2) 10Andrew Bogott: designate: open API to the backup nova controller [puppet] - 10https://gerrit.wikimedia.org/r/499908 [20:50:32] (03CR) 10jenkins-bot: VE section editing: Enable mobile AB test on remaining target wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498084 (https://phabricator.wikimedia.org/T218851) (owner: 10Esanders) [20:50:34] (03CR) 10jenkins-bot: Wikimaniawiki: Enable visual editor in 2019 namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497682 (https://phabricator.wikimedia.org/T218645) (owner: 10Ammarpad) [20:50:52] (03CR) 10Andrew Bogott: [C: 03+2] designate: open API to the backup nova controller [puppet] - 10https://gerrit.wikimedia.org/r/499908 (owner: 10Andrew Bogott) [20:54:51] (03PS1) 10Bstorm: wikilabels: Update toolschecker to monitor the live DB [puppet] - 10https://gerrit.wikimedia.org/r/499910 (https://phabricator.wikimedia.org/T216749) [20:54:54] (03PS1) 10Andrew Bogott: designate ferm: remove some commas that were upsetting ferm [puppet] - 10https://gerrit.wikimedia.org/r/499911 [20:55:31] PROBLEM - Check systemd state on cloudservices1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [20:56:27] (03CR) 10Andrew Bogott: [C: 03+2] designate ferm: remove some commas that were upsetting ferm [puppet] - 10https://gerrit.wikimedia.org/r/499911 (owner: 10Andrew Bogott) [20:57:48] (03CR) 10Bstorm: "I've manually tested all the checker's actions using the privs the checker uses." 
[puppet] - 10https://gerrit.wikimedia.org/r/499910 (https://phabricator.wikimedia.org/T216749) (owner: 10Bstorm) [20:58:05] RECOVERY - Check systemd state on cloudservices1004 is OK: OK - running: The system is fully operational [20:58:32] (03PS1) 10Muehlenhoff: Extend account date for pbj [puppet] - 10https://gerrit.wikimedia.org/r/499912 [20:59:21] (03PS1) 10Jforrester: Revert "Wikimaniawiki: Enable visual editor in 2019 namespace" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499913 [20:59:23] 10Operations, 10ops-eqiad: Degraded RAID on sodium - https://phabricator.wikimedia.org/T212010 (10RobH) No update back from Dell, so sent a followup today: > Ivan, > > Any updates on this? [20:59:43] (03CR) 10Jforrester: [C: 03+2] Revert "Wikimaniawiki: Enable visual editor in 2019 namespace" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499913 (owner: 10Jforrester) [21:00:04] mdholloway: Dear deployers, time to do the Enable WikimediaEditorTasks on wikidatawiki deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T2100). 
[21:00:18] already done ^ [21:00:44] 10Operations, 10Patch-For-Review: Switch the main etcd cluster in eqiad to use conf1004-1006 - https://phabricator.wikimedia.org/T205814 (10RobH) [21:00:48] 10Operations, 10ops-eqiad, 10decommission: Decommission conf100[1-3] - https://phabricator.wikimedia.org/T206626 (10RobH) 05Open→03Resolved [21:00:52] (03Merged) 10jenkins-bot: Revert "Wikimaniawiki: Enable visual editor in 2019 namespace" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499913 (owner: 10Jforrester) [21:01:41] (03CR) 10jenkins-bot: Revert "Wikimaniawiki: Enable visual editor in 2019 namespace" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499913 (owner: 10Jforrester) [21:04:04] (03PS4) 10Jforrester: [Wikitech] Enable VisualEditor in extra namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497081 [21:04:11] (03CR) 10Jforrester: [C: 03+2] [Wikitech] Enable VisualEditor in extra namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497081 (owner: 10Jforrester) [21:05:13] (03PS2) 10Muehlenhoff: Extend account date for pbj [puppet] - 10https://gerrit.wikimedia.org/r/499912 [21:05:20] (03Merged) 10jenkins-bot: [Wikitech] Enable VisualEditor in extra namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497081 (owner: 10Jforrester) [21:06:28] (03CR) 10Muehlenhoff: [C: 03+2] Extend account date for pbj [puppet] - 10https://gerrit.wikimedia.org/r/499912 (owner: 10Muehlenhoff) [21:06:58] (03PS2) 10Smalyshev: Enable new Lexeme search on testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499867 (https://phabricator.wikimedia.org/T216206) [21:10:00] 10Operations, 10Patch-For-Review: Audit our puppet tree for uses of jessie-backports - https://phabricator.wikimedia.org/T216711 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff Closing, the remaining work for this is handled via T219333 [21:11:58] (03CR) 10Jforrester: "Works fine: 
https://wikitech.wikimedia.org/w/index.php?title=Tool:ReleaseTaggerBot&diff=1821735&oldid=1815383" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497081 (owner: 10Jforrester) [21:12:02] 10Operations, 10Analytics, 10hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (10RobH) a:05RobH→03elukey As the hardware order is pending, and T219522 is setup for the installation, I'm reassigning this to @elukey as there is nothing more pending for me to do at thi... [21:12:45] (03CR) 10jenkins-bot: [Wikitech] Enable VisualEditor in extra namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497081 (owner: 10Jforrester) [21:12:52] 10Operations, 10Analytics, 10hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (10RobH) Basically this can be resolved as soon as @elukey is happy with it. It can stay open until after the new hardware is installed if preferred. [21:13:04] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [Wikitech] Enable VisualEditor in extra namespaces (duration: 00m 50s) [21:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:13:35] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: decommission of restbase200[1-6] (lease return in December 2018) - https://phabricator.wikimedia.org/T211070 (10RobH) 05Open→03Resolved [21:14:07] 10Operations, 10ops-codfw, 10decommission: Decommission elastic2001-2024 - https://phabricator.wikimedia.org/T211023 (10RobH) 05Open→03Resolved [21:15:13] (03PS1) 10Jforrester: [Wikimania] Enable VisualEditor in the 2019 namespace via ID [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499915 (https://phabricator.wikimedia.org/T218645) [21:16:14] (03CR) 10Jforrester: [C: 03+2] [Wikimania] Enable VisualEditor in the 2019 namespace via ID [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499915 (https://phabricator.wikimedia.org/T218645) (owner: 10Jforrester) 
[21:16:30] !log add AS specific policy-statements to cr2-eqsin v6 transits - T211930 [21:16:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:16:35] T211930: Add eqsin routing special cases to jnt - https://phabricator.wikimedia.org/T211930 [21:17:21] (Merged) jenkins-bot: [Wikimania] Enable VisualEditor in the 2019 namespace via ID [mediawiki-config] - https://gerrit.wikimedia.org/r/499915 (https://phabricator.wikimedia.org/T218645) (owner: Jforrester) [21:18:36] https://commons.wikimedia.org/wiki/File:Julia_Margaret_Cameron_-_Queen_of_the_May_-_1984.166_-_Cleveland_Museum_of_Art.tif [21:18:45] no thumbnails for all day [21:18:50] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [Wikimania] Enable VisualEditor in the 2019 namespace T218645 (duration: 00m 50s) [21:18:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:18:58] T218645: Wikimaniawiki: enable VisualEditor in the "2019:" namespace - https://phabricator.wikimedia.org/T218645 [21:19:44] https://upload.wikimedia.org/wikipedia/commons/thumb/2/28/Julia_Margaret_Cameron_-_Queen_of_the_May_-_1984.166_-_Cleveland_Museum_of_Art.tif/lossy-page1-100px-Julia_Margaret_Cameron_-_Queen_of_the_May_-_1984.166_-_Cleveland_Museum_of_Art.tif.jpg gives a 500 [21:20:06] Request from 88.182.181.224 via cp1090 cp1090, Varnish XID 398255068 [21:23:54] (CR) jenkins-bot: [Wikimania] Enable VisualEditor in the 2019 namespace via ID [mediawiki-config] - https://gerrit.wikimedia.org/r/499915 (https://phabricator.wikimedia.org/T218645) (owner: Jforrester) [21:24:00] (CR) Jforrester: "This needs for I80e054a2134ca to be live in production before we can deploy it. Provisionally, Wednesday 3 April." [mediawiki-config] - https://gerrit.wikimedia.org/r/496677 (https://phabricator.wikimedia.org/T218363) (owner: Varnent) [21:27:24] is Gerrit okay? 
i just tried to push a change for review and got "fatal: Unpack error, check server log" [21:27:31] "error: remote unpack failed: error Missing tree 9d91b3c6800247ad7e9cf06698d5d7ebd51a4ad3" [21:27:50] hmm gerrit's not loading for me. [21:28:28] paladox: same [21:28:55] thcipriani ^^ [21:29:24] ERR_TIMED_OUT [21:30:09] PROBLEM - Gerrit Health Check on gerrit.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://gerrit.wikimedia.org/r/config/server/healthcheck%7Estatus [21:30:19] PROBLEM - Gerrit JSON on gerrit.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Gerrit%23Monitoring [21:31:20] > is Gerrit okay? i just tried to push a change for review and got "fatal: Unpack error, check server log" [21:31:20] FWIW i've been hitting this last 2 weeks. Rebasing always solves it [21:31:31] I love that users are quicker than monitoring [21:31:32] Gerrit down for me too :/ [21:32:49] yah it's being looked at [21:33:51] There was some weirdness happening about 15 minutes before it fully died (see messages in releng channel) [21:34:13] someone walking into the dc unplugged the wrong thing - please be patient :) [21:41:37] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_jenkins CI slave scripts] [21:42:07] ^ this kind of error is expected given that gerrit is down currently. [21:43:03] ACKNOWLEDGEMENT - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[git_pull_jenkins CI slave scripts] cole_white gerrit fallout [21:44:56] thcipriani: ^^^ [21:45:07] looking [21:51:57] !log restarting gerrit [21:51:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:53:29] RECOVERY - Gerrit JSON on gerrit.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 25685 bytes in 8.442 second response time https://wikitech.wikimedia.org/wiki/Gerrit%23Monitoring [21:54:25] RECOVERY - Gerrit Health Check on gerrit.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 950 bytes in 0.049 second response time https://gerrit.wikimedia.org/r/config/server/healthcheck%7Estatus [21:56:09] PROBLEM - puppet last run on notebook1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] [21:56:25] PROBLEM - puppet last run on kafka1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] [21:56:53] ACKNOWLEDGEMENT - puppet last run on kafka1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] cole_white gerrit fallout [21:56:53] ACKNOWLEDGEMENT - puppet last run on notebook1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] cole_white gerrit fallout [21:57:35] ACKNOWLEDGEMENT - puppet last run on bromine is CRITICAL: CRITICAL: Puppet has 5 failures. Last run 4 minutes ago with 5 failures. 
Failed resources (up to 3 shown): Exec[git_pull_wikibase/wikiba.se-deploy],Exec[git_pull_research/landing-page],Exec[git_pull_design/landing-page],Exec[git_pull_design/style-guide] cole_white gerrit fallout [21:57:35] ACKNOWLEDGEMENT - puppet last run on db2094 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config] cole_white gerrit fallout [21:57:35] ACKNOWLEDGEMENT - puppet last run on kafka1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] cole_white gerrit fallout [21:57:35] ACKNOWLEDGEMENT - puppet last run on notebook1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] cole_white gerrit fallout [21:57:35] ACKNOWLEDGEMENT - puppet last run on stat1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_statistics_mediawiki] cole_white gerrit fallout [21:57:35] ACKNOWLEDGEMENT - puppet last run on stat1006 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[git_pull_statistics_mediawiki],Exec[git_pull_analytics/reportupdater] cole_white gerrit fallout [21:57:35] ACKNOWLEDGEMENT - puppet last run on stat1007 is CRITICAL: CRITICAL: Puppet has 6 failures. Last run 3 minutes ago with 6 failures. Failed resources (up to 3 shown): Exec[git_pull_wmde/scripts],Exec[git_pull_wmde/toolkit-analyzer-build],Exec[git_pull_mediawiki/event-schemas],Exec[git_pull_statistics_mediawiki] cole_white gerrit fallout [21:57:36] ACKNOWLEDGEMENT - puppet last run on vega is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 3 minutes ago with 4 failures. 
Failed resources (up to 3 shown): Exec[git_pull_research/landing-page],Exec[git_pull_design/landing-page],Exec[git_pull_design/style-guide],Exec[git_pull_wikimedia/campaigns/eswiki-2018] cole_white gerrit fallout [21:58:13] ACKNOWLEDGEMENT - puppet last run on an-coord1001 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas],Exec[git_pull_operations/mediawiki-config] cole_white gerrit fallout [21:58:13] ACKNOWLEDGEMENT - puppet last run on kafka2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] cole_white gerrit fallout [22:00:04] bd808: (Dis)respected human, time to deploy Striker update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190328T2200). Please do the needful. [22:02:43] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [22:02:45] * bd808 is running slightly behind, but will be ready to deploy "soon" [22:11:10] !log add AS specific policy-statements to cr1-eqsin v6 transits - T211930 [22:11:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:11:14] T211930: Add eqsin routing special cases to jnt - https://phabricator.wikimedia.org/T211930 [22:12:04] !log bd808@deploy1001 Started deploy [striker/deploy@2f62c43]: Fixes for error pages and repo creation (T176325) [22:12:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:12:07] T176325: "No Phabricator accounts found for tool maintainers." 
while creating the a new diffusion repository - https://phabricator.wikimedia.org/T176325 [22:12:22] (PS1) Muehlenhoff: docker: Remove support for trusty images [puppet] - https://gerrit.wikimedia.org/r/499929 [22:13:03] !log bd808@deploy1001 Finished deploy [striker/deploy@2f62c43]: Fixes for error pages and repo creation (T176325) (duration: 00m 59s) [22:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:14:46] (PS4) Jayprakash12345: Enable $wgAllowCopyUploads for pawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/495446 (https://phabricator.wikimedia.org/T217486) [22:17:13] RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:17:43] (PS1) Muehlenhoff: role::labs::instance: Remove support for trusty [puppet] - https://gerrit.wikimedia.org/r/499933 [22:22:16] (PS1) Muehlenhoff: Stop serving trusty repositories in aptly [puppet] - https://gerrit.wikimedia.org/r/499935 [22:22:49] RECOVERY - puppet last run on kafka1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:23:51] (PS2) Bstorm: wikilabels: Update toolschecker to monitor the live DB [puppet] - https://gerrit.wikimedia.org/r/499910 (https://phabricator.wikimedia.org/T216749) [22:23:59] Operations, Wikimedia-Logstash, Patch-For-Review, User-herron: Migrate at least 3 existing Logstash inputs and associated producers to the new Kafka-logging pipeline, and remove the associated non-Kafka Logstash inputs - https://phabricator.wikimedia.org/T213899 (bd808) [22:27:45] (CR) Bstorm: [C: +2] wikilabels: Update toolschecker to monitor the live DB [puppet] - https://gerrit.wikimedia.org/r/499910 (https://phabricator.wikimedia.org/T216749) (owner: Bstorm) [22:28:58] * bd808 is done on deploy1001 [22:33:27] (PS8) Jbond: jessie-backports: warn users if they try to use backports on jessie [puppet] - 
https://gerrit.wikimedia.org/r/499505 (https://phabricator.wikimedia.org/T219333) [22:45:47] Operations, Data-Services, decommission, Patch-For-Review, cloud-services-team (Kanban): Reclaim/Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet as soon as they are ready - https://phabricator.wikimedia.org/T216749 (Bstorm) Need to disable or update any monitoring of the mariad... [22:54:50] Operations, Traffic, VisualEditor, Wikimedia-Apache-configuration, User-Ryasmeen: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200/Loading failed for the