[00:00:17] :) :)
[00:04:13] Operations, Patch-For-Review, Services (doing), User-mobrovac: nodejs 6.11 - https://phabricator.wikimedia.org/T170548#3473292 (debt) @ksmith and @Gehel - I believe we updated maps to use node 6 in these tickets: T150354 and T158984. @MaxSem - is there more to do, that you know of?
[00:15:04] (PS1) Reedy: phpcs on multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/367831 (https://phabricator.wikimedia.org/T171509)
[00:17:52] (CR) Reedy: "Woo, less exclusions" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/367831 (https://phabricator.wikimedia.org/T171509) (owner: Reedy)
[00:23:02] (PS2) Reedy: phpcs on multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/367831 (https://phabricator.wikimedia.org/T171509)
[00:25:40] (CR) jerkins-bot: [V: -1] phpcs on multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/367831 (https://phabricator.wikimedia.org/T171509) (owner: Reedy)
[00:27:23] (PS3) Reedy: phpcs on multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/367831 (https://phabricator.wikimedia.org/T171509)
[01:07:16] (CR) Krinkle: [C: 1] phpcs on multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/367831 (https://phabricator.wikimedia.org/T171509) (owner: Reedy)
[01:08:06] (CR) Krinkle: [C: 1] "@Reedy: Wanna land?" [mediawiki-config] - https://gerrit.wikimedia.org/r/367831 (https://phabricator.wikimedia.org/T171509) (owner: Reedy)
[01:08:29] Krinkle: you mean jfdi? :P
[01:08:44] Well, I did review it, and some tests are passing.
[01:08:52] We can give it a few minutes in beta first, and on XMD
[01:08:54] XWD*
[01:09:06] I thought Krinkle was making a pilot joke
[01:09:15] ...
[01:09:30] lol
[01:09:44] (CR) Reedy: [C: 2] phpcs on multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/367831 (https://phabricator.wikimedia.org/T171509) (owner: Reedy)
[01:11:12] (Merged) jenkins-bot: phpcs on multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/367831 (https://phabricator.wikimedia.org/T171509) (owner: Reedy)
[01:11:22] (CR) jenkins-bot: phpcs on multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/367831 (https://phabricator.wikimedia.org/T171509) (owner: Reedy)
[01:12:21] !log reedy@tin Synchronized tests/multiversion/: phpcs (duration: 00m 46s)
[01:12:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:13:17] !log reedy@tin Synchronized phpcs.xml: phpcs (duration: 00m 46s)
[01:13:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:14:28] PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:15:00] !log reedy@tin Synchronized multiversion/: phpcs (duration: 01m 06s)
[01:15:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:15:20] next stop mediawiki-codesniffer 0.10.1
[01:24:37] RECOVERY - puppet last run on puppetmaster1001 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[01:29:34] (CR) Krinkle: [C: 1] Do the echo when running update.php [puppet] - https://gerrit.wikimedia.org/r/354932 (owner: Reedy)
[01:30:48] (CR) Krinkle: [C: -1] "foreachwikiindblist (mwscriptwikiset) doesn't fail early when one wiki returns non-zeor, right?
So we could do all.dblist, as long as we f" [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)
[02:04:53] (CR) Krinkle: [C: 1] Allow mwdeploy user to restart jobchron [puppet] - https://gerrit.wikimedia.org/r/367815 (https://phabricator.wikimedia.org/T129148) (owner: Thcipriani)
[02:07:07] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 10765
[02:40:40] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Hindi-Sites: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3473485 (Jayprakash12345)
[03:01:06] !log l10nupdate@tin LocalisationUpdate failed: git pull of extensions failed
[03:01:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:27:57] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 770.50 seconds
[04:10:57] PROBLEM - Check systemd state on cp1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:11:17] PROBLEM - Varnish HTTP text-backend - port 3128 on cp1008 is CRITICAL: connect to address 208.80.154.42 and port 3128: Connection refused
[04:22:07] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 288.98 seconds
[04:27:47] PROBLEM - puppet last run on cp1008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[varnish]
[04:49:57] Operations, ops-eqiad, Cloud-Services: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3473675 (madhuvishy) @Cmjohnson These two need to be in the public vlan.
[05:05:02] Operations, ops-eqiad, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3473678 (Marostegui) Open>Resolved ``` [root@labsdb1001 05:04 /root] # megacli -pdrbld -showprog -physdrv\[16:9\] -aALL Device...
[05:31:45] (CR) Giuseppe Lavagetto: [C: 2] prometheus::node_exporter: fix compatibility with the future parser [puppet] - https://gerrit.wikimedia.org/r/367694 (owner: Giuseppe Lavagetto)
[05:31:52] (PS3) Giuseppe Lavagetto: prometheus::node_exporter: fix compatibility with the future parser [puppet] - https://gerrit.wikimedia.org/r/367694
[05:32:09] (CR) Giuseppe Lavagetto: [V: 2 C: 2] prometheus::node_exporter: fix compatibility with the future parser [puppet] - https://gerrit.wikimedia.org/r/367694 (owner: Giuseppe Lavagetto)
[05:36:02] (PS10) Giuseppe Lavagetto: role::configcluster: move to future environment [puppet] - https://gerrit.wikimedia.org/r/365572
[05:37:34] (CR) Giuseppe Lavagetto: [C: 2] role::configcluster: move to future environment [puppet] - https://gerrit.wikimedia.org/r/365572 (owner: Giuseppe Lavagetto)
[05:38:57] PROBLEM - MariaDB Slave Lag: s1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:39:11] ^ backups
[05:40:28] PROBLEM - MariaDB Slave SQL: s4 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:40:28] PROBLEM - MariaDB Slave SQL: m2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:40:28] PROBLEM - MariaDB Slave IO: m2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:40:28] PROBLEM - MariaDB Slave IO: s3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:40:28] PROBLEM - MariaDB Slave SQL: s6 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:40:28] PROBLEM - MariaDB Slave SQL: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:40:28] PROBLEM - MariaDB Slave IO: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:40:38] PROBLEM - MariaDB Slave SQL: s3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:40:38] PROBLEM - MariaDB Slave IO: s4 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:40:38] PROBLEM - MariaDB Slave IO: s5 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:40:38] PROBLEM - MariaDB Slave SQL: s1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:41:18] RECOVERY - MariaDB Slave SQL: s4 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[05:41:18] RECOVERY - MariaDB Slave SQL: s2 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[05:41:18] RECOVERY - MariaDB Slave IO: s3 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[05:41:18] RECOVERY - MariaDB Slave IO: m2 on dbstore1001 is OK: OK slave_io_state not a slave
[05:41:18] RECOVERY - MariaDB Slave SQL: s6 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[05:41:18] RECOVERY - MariaDB Slave SQL: m2 on dbstore1001 is OK: OK slave_sql_state not a slave
[05:41:19] RECOVERY - MariaDB Slave IO: s2 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[05:41:37] RECOVERY - MariaDB Slave SQL: s3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[05:41:37] RECOVERY - MariaDB Slave IO: s4 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[05:41:37] RECOVERY - MariaDB Slave SQL: s1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[05:41:37] RECOVERY - MariaDB Slave IO: s5 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[05:53:17] <_joe_> !log moving all conf* servers to the future puppet parser
[05:53:27] Logged the message
at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:56:42] Operations, Puppet, User-Joe: Switch all hosts to the future parser - https://phabricator.wikimedia.org/T171704#3473695 (Joe)
[05:59:48] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=1105.80 Read Requests/Sec=758.80 Write Requests/Sec=1.50 KBytes Read/Sec=49154.40 KBytes_Written/Sec=39.60
[06:06:10] (PS1) Krinkle: Revert "Bump cache epoch for Wikidata" [mediawiki-config] - https://gerrit.wikimedia.org/r/367853 (https://phabricator.wikimedia.org/T167784)
[06:07:58] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=97.30 Read Requests/Sec=0.00 Write Requests/Sec=1.30 KBytes Read/Sec=0.00 KBytes_Written/Sec=29.60
[06:18:50] Operations, Patch-For-Review, Services (doing), User-mobrovac: nodejs 6.11 - https://phabricator.wikimedia.org/T170548#3473724 (MoritzMuehlenhoff) @debt , @ksmith : We currently use nodejs 6.9 in the production cluster and are migrating to 6.11. While 6.x is an LTS release, there's a sizable numb...
[06:21:35] (PS8) Muehlenhoff: Clean up stray binary packages after Debian updates [puppet] - https://gerrit.wikimedia.org/r/367645
[06:28:13] PROBLEM - ores on scb1003 is CRITICAL: connect to address 10.64.32.153 and port 8081: Connection refused
[06:44:13] RECOVERY - ores on scb1003 is OK: HTTP OK: HTTP/1.0 200 OK - 3666 bytes in 0.011 second response time
[06:57:11] (CR) Muehlenhoff: [C: 2] Clean up stray binary packages after Debian updates [puppet] - https://gerrit.wikimedia.org/r/367645 (owner: Muehlenhoff)
[06:57:23] PROBLEM - MariaDB Slave Lag: s1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:59:53] PROBLEM - MariaDB Slave SQL: m2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:59:54] PROBLEM - MariaDB Slave SQL: s4 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:59:54] PROBLEM - MariaDB Slave IO: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:59:54] PROBLEM - MariaDB Slave SQL: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:59:54] PROBLEM - MariaDB Slave SQL: s6 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:59:54] PROBLEM - MariaDB Slave IO: s3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:59:54] PROBLEM - MariaDB Slave IO: m2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:00:04] PROBLEM - MariaDB Slave IO: s5 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:00:04] PROBLEM - MariaDB Slave IO: s4 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:00:04] PROBLEM - MariaDB Slave SQL: s3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:00:04] PROBLEM - MariaDB Slave SQL: s1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:00:23] PROBLEM - MariaDB Slave Lag: s1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:02:03] RECOVERY - MariaDB Slave IO: s5 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[07:02:03] RECOVERY - MariaDB Slave IO: s4 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[07:02:03] RECOVERY - MariaDB Slave SQL: s3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[07:02:03] RECOVERY - MariaDB Slave SQL: s1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[07:02:53] RECOVERY - MariaDB Slave SQL: m2 on dbstore1001 is OK: OK slave_sql_state not a slave
[07:02:53] RECOVERY - MariaDB Slave IO: m2 on dbstore1001 is OK: OK slave_io_state not a slave
[07:02:53] RECOVERY - MariaDB Slave SQL: s4 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[07:02:53] RECOVERY - MariaDB Slave SQL: s6 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[07:02:53] RECOVERY - MariaDB Slave IO: s2 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[07:02:53] RECOVERY - MariaDB Slave IO: s3 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[07:02:53] RECOVERY - MariaDB Slave SQL: s2 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[07:09:04] Operations, ops-codfw: failing RAID disk on frdb2001 - https://phabricator.wikimedia.org/T171584#3473760 (MoritzMuehlenhoff) p:Triage>Normal
[07:11:04] PROBLEM - BGP status on cr1-eqord is CRITICAL: BGP CRITICAL - AS2914/IPv6: Active, AS2914/IPv4: Active
[07:13:53] RECOVERY - Varnish HTTP text-backend - port 3128 on cp1008 is OK: HTTP OK: HTTP/1.1 200 OK - 176 bytes in 0.001 second response time
[07:14:33] RECOVERY - Check systemd state on cp1008 is OK: OK - running: The system is fully operational
[07:14:53] !log cp1008: use sdb only in varnish.service, waiting for Chris to replace sda T171028
[07:15:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:06] T171028: Degraded RAID on cp1008 -
https://phabricator.wikimedia.org/T171028
[07:15:10] Operations, Interactive-Sprint, Maps (Kartotherian): Upgrade kartotherian and tilerator to nodejs 6.11 - https://phabricator.wikimedia.org/T171707#3473775 (Gehel)
[07:18:02] (PS24) Ema: varnish: Avoid std.fileread() and use new errorpage template [puppet] - https://gerrit.wikimedia.org/r/350966 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[07:28:02] (CR) Jcrespo: [C: -1] "Right now this is a no, based on 2 ongoing issues: https://phabricator.wikimedia.org/T167784 and the unbreak now https://phabricator.wikim" [puppet] - https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[07:28:56] (PS25) Ema: varnish: Avoid std.fileread() and use new errorpage template [puppet] - https://gerrit.wikimedia.org/r/350966 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[07:30:50] (CR) Ema: [C: 2] varnish: Avoid std.fileread() and use new errorpage template [puppet] - https://gerrit.wikimedia.org/r/350966 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[07:42:24] RECOVERY - BGP status on cr1-eqord is OK: BGP OK - up: 52, down: 0, shutdown: 4
[07:43:03] (PS1) Muehlenhoff: Reimage mw2119 with jessie [puppet] - https://gerrit.wikimedia.org/r/367855
[07:43:53] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[07:46:13] (PS4) Giuseppe Lavagetto: wmflib: fix all Hiera backends' Rubocop infractions [puppet] - https://gerrit.wikimedia.org/r/359447 (owner: Faidon Liambotis)
[07:48:49] (CR) Muehlenhoff: [C: 2] Reimage mw2119 with jessie [puppet] - https://gerrit.wikimedia.org/r/367855 (owner: Muehlenhoff)
[07:48:53] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[07:48:56] (PS2) Muehlenhoff: Reimage
mw2119 with jessie [puppet] - https://gerrit.wikimedia.org/r/367855
[07:49:38] (CR) Giuseppe Lavagetto: [C: 2] wmflib: fix all Hiera backends' Rubocop infractions [puppet] - https://gerrit.wikimedia.org/r/359447 (owner: Faidon Liambotis)
[07:50:33] PROBLEM - BGP status on cr1-eqord is CRITICAL: BGP CRITICAL - AS2914/IPv4: Active, AS2914/IPv6: Active
[07:52:26] (PS3) Muehlenhoff: Reimage mw2119 with jessie [puppet] - https://gerrit.wikimedia.org/r/367855
[07:53:08] ACKNOWLEDGEMENT - BGP status on cr1-eqord is CRITICAL: BGP CRITICAL - AS2914/IPv6: Active, AS2914/IPv4: Active Ema Peering with NTT flapping (AS2914)
[07:53:14] (CR) Muehlenhoff: [V: 2 C: 2] Reimage mw2119 with jessie [puppet] - https://gerrit.wikimedia.org/r/367855 (owner: Muehlenhoff)
[07:53:44] !log start defragmenging on pc1* hosts T167784
[07:53:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:54] T167784: WMF ParserCache disk space exhaustion - https://phabricator.wikimedia.org/T167784
[07:56:30] !log restarting cassandra-metrics-collector on restbase* to pick up openjdk security update
[07:56:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:59:46] !log restarting cassandra-metrics-collector on maps* to pick up openjdk security update
[07:59:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:40] (Abandoned) Giuseppe Lavagetto: prometheus::node::exporter: ugly workaround for future parser [puppet] - https://gerrit.wikimedia.org/r/367659 (owner: Giuseppe Lavagetto)
[08:02:25] !log installing Java security updates on jessie-based stat systems
[08:02:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:51] (PS2) Filippo Giunchedi: thumbor: fix connections-per-backend in nginx [puppet] - https://gerrit.wikimedia.org/r/367687 (https://phabricator.wikimedia.org/T171468)
[08:03:23] Operations, Traffic, Patch-For-Review,
User-Elukey: logster should not resolve statsd's IP every time it sends a metric - https://phabricator.wikimedia.org/T171318#3473823 (elukey) After a chat with Moritz and Ema we decided to pick the current jessie version and apply the patch on top of it. In...
[08:04:27] Operations, ops-eqiad, DBA: Degraded RAID on db1066 - https://phabricator.wikimedia.org/T169448#3473825 (jcrespo) This is taking a long time to be rebuilt :-/ - It is still doing it.
[08:04:54] (CR) Ema: [C: 1] thumbor: fix connections-per-backend in nginx [puppet] - https://gerrit.wikimedia.org/r/367687 (https://phabricator.wikimedia.org/T171468) (owner: Filippo Giunchedi)
[08:06:33] PROBLEM - puppet last run on stat1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[libgsl0-dev]
[08:07:24] (PS3) Filippo Giunchedi: thumbor: fix connections-per-backend in nginx [puppet] - https://gerrit.wikimedia.org/r/367687 (https://phabricator.wikimedia.org/T171468)
[08:08:00] (PS4) Filippo Giunchedi: thumbor: fix connections-per-backend in nginx [puppet] - https://gerrit.wikimedia.org/r/367687 (https://phabricator.wikimedia.org/T171468)
[08:08:33] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[08:08:52] Operations, Traffic, Patch-For-Review, User-Elukey: logster should not resolve statsd's IP every time it sends a metric - https://phabricator.wikimedia.org/T171318#3460781 (MoritzMuehlenhoff) Ideally we upgrade to 1.x in Debian, the version currently in the archive is from 2014 and hasn't been to...
[08:09:08] (CR) Filippo Giunchedi: [C: 2] thumbor: fix connections-per-backend in nginx [puppet] - https://gerrit.wikimedia.org/r/367687 (https://phabricator.wikimedia.org/T171468) (owner: Filippo Giunchedi)
[08:10:12] (PS1) Jcrespo: mariadb: Depool db1066 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/367859 (https://phabricator.wikimedia.org/T169448)
[08:12:42] (CR) Jcrespo: [C: 2] mariadb: Depool db1066 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/367859 (https://phabricator.wikimedia.org/T169448) (owner: Jcrespo)
[08:13:22] ACKNOWLEDGEMENT - Host ms-be2024 is DOWN: PING CRITICAL - Packet loss = 100% Filippo Giunchedi T171275
[08:14:33] RECOVERY - MegaRAID on db1066 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[08:14:44] (Merged) jenkins-bot: mariadb: Depool db1066 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/367859 (https://phabricator.wikimedia.org/T169448) (owner: Jcrespo)
[08:16:29] (CR) jenkins-bot: mariadb: Depool db1066 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/367859 (https://phabricator.wikimedia.org/T169448) (owner: Jcrespo)
[08:18:37] Operations, ops-codfw: ms-be2024 not powering on - https://phabricator.wikimedia.org/T171275#3473840 (fgiunchedi) a:fgiunchedi>Papaul @papaul I don't seem to be able to bring back the power via ilo, connected via ssh and power is off. Turning power on doesn't seem to do anything. ``` hpiLO->...
[08:21:05] Operations, ops-codfw, User-fgiunchedi: ms-be2024 not powering on - https://phabricator.wikimedia.org/T171275#3473851 (fgiunchedi)
[08:21:22] Operations, monitoring: On stretch, python metric collector for disk is on DEBUG logging mode - https://phabricator.wikimedia.org/T171638#3473856 (fgiunchedi)
[08:21:24] Operations, monitoring, User-fgiunchedi: Diamond log level set to DEBUG spams syslog - https://phabricator.wikimedia.org/T171580#3473854 (fgiunchedi)
[08:26:01] (CR) Filippo Giunchedi: "LGTM in general, see inline. Too bad the disk collector isn't able to blacklist filesystems :(" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/367710 (https://phabricator.wikimedia.org/T171583) (owner: Rush)
[08:26:47] !log reimaging mw2119 to jessie (T145742)
[08:26:51] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1066 for maintenance (duration: 00m 46s)
[08:26:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:59] T145742: Migrate video scalers to jessie - https://phabricator.wikimedia.org/T145742
[08:27:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:56] Operations, Pybal, Traffic, monitoring: pybal: add prometheus metrics - https://phabricator.wikimedia.org/T171710#3473875 (ema)
[08:28:03] Operations, Pybal, Traffic, monitoring: pybal: add prometheus metrics - https://phabricator.wikimedia.org/T171710#3473888 (ema) p:Triage>Normal
[08:28:18] Operations, MediaWiki-extensions-LocalisationUpdate, MinervaNeue, Release-Engineering-Team: The mobile-frontend-placeholder message is not updated in din.wikipedia.org - https://phabricator.wikimedia.org/T171711#3473889 (Amire80)
[08:29:10] Operations, ops-eqiad, DBA, Patch-For-Review: Degraded RAID on db1066 - https://phabricator.wikimedia.org/T169448#3473901 (jcrespo) I depool it and not it finishes :-(
[08:29:37] (PS6) Filippo Giunchedi: librenms:
enable graphite extension [puppet] - https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167)
[08:31:14] (CR) Filippo Giunchedi: [C: 2] librenms: enable graphite extension [puppet] - https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167) (owner: Filippo Giunchedi)
[08:32:56] (CR) Volans: [C: -1] "Nice check to add! I have a couple of general doubts and there are a couple of things to fix." (12 comments) [puppet] - https://gerrit.wikimedia.org/r/367662 (https://phabricator.wikimedia.org/T134893) (owner: Ema)
[08:36:27] "couple of doubts" --> 12 comments
[08:36:36] * elukey loves volans code reviews
[08:36:51] elukey: read the whole comment though ;)
[08:37:29] volans: I was kiddiiiinggggg
[08:38:03] :D
[08:40:02] s/a couple/a couple dozen/
[08:40:22] rotfl
[08:40:52] "couple" is an arbitrary definition ;)
[08:42:09] !log upgrading and restarting db1066
[08:42:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:27] Operations, MediaWiki-extensions-LocalisationUpdate, MinervaNeue, Release-Engineering-Team: The mobile-frontend-placeholder message is not updated in din.wikipedia.org - https://phabricator.wikimedia.org/T171711#3473906 (KartikMistry) It seems LocalisationUpdate is failing. See: https://tools.wmf...
[08:45:17] (PS5) Ema: pybal: bind instrumentation TCP port to private addresses [puppet] - https://gerrit.wikimedia.org/r/348074 (https://phabricator.wikimedia.org/T103882)
[08:45:22] Operations, ops-eqiad, hardware-requests, Patch-For-Review: Decommission mw1196 - https://phabricator.wikimedia.org/T170441#3431403 (MoritzMuehlenhoff) Have the "non-interruptuable steps" really been completed? mw1196 still has a salt key and shows up https://servermon.wikimedia.org/hosts/
[08:45:55] PROBLEM - MariaDB Slave Lag: s1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:46:44] !log upload logster 0.0.10-2~jessie1 to jessie-wikimedia
[08:46:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:26] !log rollout logster 0.0.10-2~jessie1 to the cache hosts
[08:47:36] ema: --^
[08:47:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:48:23] (PS1) Filippo Giunchedi: librenms: fix log file ownership after rotation [puppet] - https://gerrit.wikimedia.org/r/367863
[08:49:18] elukey: \o/
[08:50:24] (PS2) Filippo Giunchedi: librenms: fix log file ownership after rotation [puppet] - https://gerrit.wikimedia.org/r/367863
[08:51:43] (CR) Filippo Giunchedi: [C: 2] librenms: fix log file ownership after rotation [puppet] - https://gerrit.wikimedia.org/r/367863 (owner: Filippo Giunchedi)
[08:56:14] RECOVERY - BGP status on cr1-eqord is OK: BGP OK - up: 54, down: 0, shutdown: 2
[08:58:11] (PS1) Jcrespo: Revert "mariadb: Depool db1066 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/367865
[09:07:19] Operations, Traffic, Patch-For-Review, User-Elukey: logster should not resolve statsd's IP every time it sends a metric - https://phabricator.wikimedia.org/T171318#3473940 (elukey) New package uploaded to jessie-wikimedia and rolled out to role cache::misc/upload/text/canary.
[09:07:42] good morning
[09:08:11] for info: the puppet masters for CI (and probably for tools as well) yield Data retrieved from Integration/host/integration-puppetmaster01 is String, not Hash or nil at /etc/puppet/manifests/realm.pp:51
[09:08:19] which is: $app_routes = hiera('discovery::app_routes')
[09:08:34] filled as T171712 I am trying to investigate :-}
[09:08:34] T171712: integration puppetmaster yield String, not Hash or nil at /etc/puppet/manifests/realm.pp:51 - https://phabricator.wikimedia.org/T171712
[09:09:08] hashar: _joe_ is already on it
[09:09:50] good joe :)
[09:10:49] (PS1) Muehlenhoff: Install mw2152 and mw2246 with jessie [puppet] - https://gerrit.wikimedia.org/r/367868
[09:21:23] Operations, Traffic, Patch-For-Review, User-Elukey: logster should not resolve statsd's IP every time it sends a metric - https://phabricator.wikimedia.org/T171318#3473953 (elukey) Open>Resolved a:elukey Impact to maerlant and acamar's traffic: {F8854082} {F8854085} In eqiad hydron...
[09:21:24] Operations, netops: "MySQL server has gone away" from librenms logs - https://phabricator.wikimedia.org/T171714#3473956 (fgiunchedi)
[09:21:59] PROBLEM - mediawiki-installation DSH group on mw2119 is CRITICAL: Host mw2119 is not in mediawiki-installation dsh group
[09:22:49] PROBLEM - nutcracker port on mw2119 is CRITICAL: connect to address 127.0.0.1 and port 11212: Connection refused
[09:22:49] PROBLEM - Check systemd state on mw2119 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:22:56] (CR) Ema: pybal: bind instrumentation TCP port to private addresses (1 comment) [puppet] - https://gerrit.wikimedia.org/r/348074 (https://phabricator.wikimedia.org/T103882) (owner: Ema)
[09:23:40] PROBLEM - nutcracker process on mw2119 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 111 (nutcracker), command name nutcracker
[09:24:20] (PS1) Giuseppe Lavagetto: hiera: fix mwcache library [puppet] - https://gerrit.wikimedia.org/r/367870
[09:25:10] (CR) Giuseppe Lavagetto: [C: 2] hiera: fix mwcache library [puppet] - https://gerrit.wikimedia.org/r/367870 (owner: Giuseppe Lavagetto)
[09:26:09] _joe_: wanna test it on a labs puppet master?
[09:26:16] and I had https://phabricator.wikimedia.org/T171712 for that
[09:26:19] PROBLEM - Host mw2119 is DOWN: PING CRITICAL - Packet loss = 100%
[09:26:40] <_joe_> hashar: please do
[09:26:44] doing
[09:26:50] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: Puppet last ran 21 hours ago
[09:27:08] (PS2) Hashar: hiera: fix mwcache library [puppet] - https://gerrit.wikimedia.org/r/367870 (https://phabricator.wikimedia.org/T171712) (owner: Giuseppe Lavagetto)
[09:27:11] <_joe_> hashar: you will need to run puppet on the puppetmaster to fix it I think
[09:27:16] (CR) Hashar: "testing it on the CI puppet master" [puppet] - https://gerrit.wikimedia.org/r/367870 (https://phabricator.wikimedia.org/T171712) (owner: Giuseppe Lavagetto)
[09:27:29] RECOVERY - Host mw2119 is UP: PING OK - Packet loss = 0%, RTA = 36.08 ms
[09:27:44] I am still unsure how you manage to fix those weird issues
[09:27:49] RECOVERY - nutcracker process on mw2119 is OK: PROCS OK: 1 process with UID = 111 (nutcracker), command name nutcracker
[09:27:52] (CR) Muehlenhoff: [C: 2] Install mw2152 and mw2246 with jessie [puppet] - https://gerrit.wikimedia.org/r/367868 (owner: Muehlenhoff)
[09:27:53] cherry picked on integration-puppetmaster
[09:27:58] ran puppet which applied your patch
[09:27:59]
RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:27:59] RECOVERY - nutcracker port on mw2119 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[09:27:59] RECOVERY - Check systemd state on mw2119 is OK: OK - running: The system is fully operational
[09:28:07] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class classes for integration-puppetmaster01.integration.eqiad.wmflabs on node integration-puppetmaster01.integration.eqiad.wmflabs
[09:28:08] :(
[09:28:16] <_joe_> ok
[09:28:22] <_joe_> it seems we fixed that issue at least
[09:28:43] it is on integration-puppetmaster01.integration.eqiad.wmflabs if you want to live hack it
[09:28:44] lol
[09:29:10] maybe I can restart apache2 / passenger?
[09:29:12] <_joe_> yes
[09:29:14] <_joe_> do that
[09:29:38] restarted apache2, running puppet
[09:29:43] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Reading data from Integration/host/integration-puppetmaster01 failed: TypeError: Data retrieved from Integration/host/integration-puppetmaster01 is String, not Hash or nil at /etc/puppet/manifests/realm.pp:51 on node integration-puppetmaster01.integration.eqiad.wmflabs
[09:29:45] bah
[09:29:47] it is back
[09:30:09] PROBLEM - puppet last run on mw2119 is CRITICAL: CRITICAL: Puppet has 6 failures. Last run 6 minutes ago with 6 failures.
Failed resources (up to 3 shown): File_line[login.defs-SYS_GID_MAX],File[/etc/firejail/mediawiki-converters.profile],Package[fonts-noto-cjk],Service[nutcracker]
[09:30:33] stil in realm.pp:51 for the discovery::apps_route which is probably the first hiera() call
[09:30:39] <_joe_> yes
[09:31:05] the "could not find class classes" was probably a misleading error
[09:31:09] <_joe_> I'm gonna play with it
[09:31:15] ok
[09:32:03] and if you want a guinea ping puppet agent, you can use integration-r-lang-01.integration.eqiad.wmflabs (jessie)
[09:32:13] <_joe_> ok, thanks
[09:32:14] <_joe_> sigh
[09:32:28] any clue why it would suddenly start falling ?
[09:32:36] <_joe_> yes
[09:32:43] <_joe_> a change by paravoid that I merged
[09:32:58] cool!
[09:33:09] RECOVERY - puppet last run on mw2119 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[09:33:20] I mean, at least the root cause is known :}
[09:33:30] <_joe_> ohhh jeez
[09:33:40] <_joe_> I forgot something in my fix
[09:36:16] (PS1) Ema: ipresolve: update documentation [puppet] - https://gerrit.wikimedia.org/r/367871
[09:36:34] (PS3) Giuseppe Lavagetto: hiera: fix mwcache library [puppet] - https://gerrit.wikimedia.org/r/367870
[09:37:00] RECOVERY - MariaDB Slave Lag: s1 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89998.71 seconds
[09:38:42] _joe_: and while at it you can attach it to Bug: T171712 :}
[09:38:43] T171712: integration puppetmaster yield String, not Hash or nil at /etc/puppet/manifests/realm.pp:51 - https://phabricator.wikimedia.org/T171712
[09:39:41] <_joe_> hashar: fixed I'd say
[09:39:52] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/367870 (owner: Giuseppe Lavagetto)
[09:39:54] <_joe_> can you try on other servers?
[09:40:07] sure [09:40:50] catalog being cached [09:41:10] <_joe_> so it works [09:41:11] <_joe_> ok [09:41:15] <_joe_> let me merge this [09:41:16] at least puppet is no longer complaining [09:41:21] _joe_: can you add Bug: T171712 to it ? [09:41:27] might help for later reference [09:41:40] <_joe_> sure [09:41:44] and yeah jessie/trusty hosts are passing just fine \O/ [09:42:28] (03PS4) 10Giuseppe Lavagetto: hiera: fix mwcache library [puppet] - 10https://gerrit.wikimedia.org/r/367870 (https://phabricator.wikimedia.org/T171712) [09:42:50] (03CR) 10Hashar: [C: 031] "That fixed it on the CI puppet master" [puppet] - 10https://gerrit.wikimedia.org/r/367870 (https://phabricator.wikimedia.org/T171712) (owner: 10Giuseppe Lavagetto) [09:43:40] (03CR) 10Giuseppe Lavagetto: [C: 032] hiera: fix mwcache library [puppet] - 10https://gerrit.wikimedia.org/r/367870 (https://phabricator.wikimedia.org/T171712) (owner: 10Giuseppe Lavagetto) [09:47:22] <_joe_> fun thing - hiera changes are enabled on a puppetmaster only when the agent runs on it [09:47:25] <_joe_> or it is restarted [09:47:33] <_joe_> can you fucking believe it? [09:47:34] <_joe_> :P [09:48:13] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1066 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367865 (owner: 10Jcrespo) [09:48:52] _joe_: sounds like the script used to deploy puppet changes would have to take care of that whenever /hieradata is touched ? 
:( [09:49:31] hmm since today puppet now fails on a puppet master running stretch [09:49:32] E: Package 'ruby-mysql' has no installation candidate [09:49:33] or maybe a HUP is sufficient [09:50:35] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1066 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367865 (owner: 10Jcrespo) [09:50:45] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1066 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367865 (owner: 10Jcrespo) [09:52:11] paladox: it's ruby-mysql2 in stretch [09:52:18] ah thanks [09:52:37] https://github.com/wikimedia/puppet/blob/7908798a594f809fc2286333c9e3f8387362a6af/modules/puppetmaster/manifests/init.pp#L87 probably needs updating :) [09:52:45] not sure if it's a clean drop-in replacement, though, when that has been tested, we can update puppet [09:52:56] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1066 after maintenance (duration: 00m 46s) [09:53:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:53:17] moritzm: version numbers are very different though [09:53:18] <_joe_> puppet master running stretch is not supported. 
[09:53:20] <_joe_> next [09:53:37] <_joe_> do not try to fix it [09:54:34] volans: src:ruby-mysql was removed from Debian with the comment "replaced by ruby-mysql2", so that seems fine: https://packages.qa.debian.org/r/ruby-mysql/news/20161222T191352Z.html [09:54:51] <_joe_> moritzm: again, stretch has puppet 4 [09:54:55] <_joe_> it's completely unsupported [09:55:12] <_joe_> and if he's using a puppet3 master on stretch [09:55:23] <_joe_> that's unsupported by us and makes no sense to put effort into it [09:57:05] !log reimaging mw2152 to jessie (T145742) [09:57:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:57:17] T145742: Migrate video scalers to jessie - https://phabricator.wikimedia.org/T145742 [09:57:36] _joe_: sure, I'm not planning to work on that anyway [09:58:31] !log jmm@puppetmaster1001 conftool action : set/pooled=yes; selector: mw2119.codfw.wmnet [09:58:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:58:42] !log jmm@puppetmaster1001 conftool action : set/pooled=no; selector: mw2119.codfw.wmnet [09:58:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:58:55] !log jmm@puppetmaster1001 conftool action : set/pooled=no; selector: mw2119.codfw.wmnet [09:59:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:05:04] (03CR) 10Giuseppe Lavagetto: [C: 031] Tests: simplify and improve parametrized tests (032 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/366733 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [10:07:52] _joe_: tools labs puppet is recovering as well. Kudos! 
[10:09:22] <_joe_> well I merged the bad patch [10:09:32] <_joe_> so I guess not really 'kudos' [10:16:35] (03PS1) 10Filippo Giunchedi: librenms: explicit graphite port [puppet] - 10https://gerrit.wikimedia.org/r/367875 (https://phabricator.wikimedia.org/T171167) [10:17:56] (03CR) 10Filippo Giunchedi: [C: 032] librenms: explicit graphite port [puppet] - 10https://gerrit.wikimedia.org/r/367875 (https://phabricator.wikimedia.org/T171167) (owner: 10Filippo Giunchedi) [10:19:22] (03PS1) 10Giuseppe Lavagetto: role::mediawiki::canary_appserver: move to the future parser [puppet] - 10https://gerrit.wikimedia.org/r/367876 [10:19:52] if anyone has bandwidth, I could use a puppet merge for yet another CI role ( https://gerrit.wikimedia.org/r/#/c/367411/ ) [10:20:07] (still a role cause I havent switched my mind yet to the role/profile/module pattern :-( ) [10:21:49] hashar: so it's a -1 by definition :-P [10:22:00] RECOVERY - mediawiki-installation DSH group on mw2119 is OK: OK [10:22:03] yeah I guess [10:22:16] then I would have to refactor the whole CI mess which is slightly more complicated ;} [10:22:25] I'll leave it to the human-puppetmasters ;) [10:22:54] I will probably refactor the zuul stuff first to train myself. But that would be after my relocation/vacations [10:23:49] 10Operations, 10monitoring, 10User-fgiunchedi: prometheus-puppet-agent-stats cronspam on missing puppet stats - https://phabricator.wikimedia.org/T170932#3474166 (10faidon) [10:25:27] 10Operations, 10monitoring, 10netops: "MySQL server has gone away" from librenms logs - https://phabricator.wikimedia.org/T171714#3474171 (10faidon) [10:30:08] (03PS3) 10Faidon Liambotis: Kill module puppet_statsd [puppet] - 10https://gerrit.wikimedia.org/r/359448 [10:31:04] (03CR) 10Faidon Liambotis: [C: 032] Kill module puppet_statsd [puppet] - 10https://gerrit.wikimedia.org/r/359448 (owner: 10Faidon Liambotis) [10:33:50] PROBLEM - puppet last run on alsafi is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:34:11] grumble grumble [10:34:20] PROBLEM - puppet last run on dbmonitor2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:20] PROBLEM - puppet last run on dbmonitor1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:20] PROBLEM - puppet last run on elastic1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:21] PROBLEM - puppet last run on mw1167 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:30] PROBLEM - puppet last run on mw2122 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:30] PROBLEM - puppet last run on db1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:30] PROBLEM - puppet last run on dubnium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:30] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:30] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:30] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:30] PROBLEM - puppet last run on mc2028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:31] PROBLEM - puppet last run on planet2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:40] PROBLEM - puppet last run on mw1256 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:34:40] PROBLEM - puppet last run on db1092 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:40] PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:40] PROBLEM - puppet last run on analytics1040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:35:18] (03PS1) 10Faidon Liambotis: Revert "Kill module puppet_statsd" [puppet] - 10https://gerrit.wikimedia.org/r/367877 [10:35:27] (03CR) 10Faidon Liambotis: [V: 032 C: 032] Revert "Kill module puppet_statsd" [puppet] - 10https://gerrit.wikimedia.org/r/367877 (owner: 10Faidon Liambotis) [10:35:28] 10Operations, 10monitoring, 10netops, 10Patch-For-Review, 10User-fgiunchedi: Evaluate LibreNMS' Graphite backend - https://phabricator.wikimedia.org/T171167#3474186 (10fgiunchedi) 05Open>03Resolved This is the resolved, note that the port in https://gerrit.wikimedia.org/r/367875 is required since `$c... [10:59:30] (03CR) 10Daniel Kinzler: "From the comments, it doesn't look like T167784 was caused by Wikidata. T171370 is my bad, patch is up, see I9100c1745. However, I don't " [puppet] - 10https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: 10Ladsgroup) [11:02:31] !log Updated the Wikidata property suggester with data from Monday's JSON dump and applied the T132839 workarounds [11:02:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:43] T132839: [RfC] Property suggester suggests human properties for non-human items - https://phabricator.wikimedia.org/T132839 [11:02:58] hoo: <3 [11:03:59] (03CR) 10Daniel Kinzler: "> Also, we would be raising the dispatch rate of wikidata changes by ~ 25%. That's sign-off from the DBAs." 
[puppet] - 10https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: 10Ladsgroup) [11:04:56] (03CR) 10Jcrespo: [C: 04-1] "We have to degradations of service directly related to Wikidata jobs/crons/maintenance- we do not want to add more variables because we wi" [puppet] - 10https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: 10Ladsgroup) [11:08:36] 10Operations, 10ops-eqiad: Degraded RAID on db1068 - https://phabricator.wikimedia.org/T171723#3474264 (10ops-monitoring-bot) [11:10:20] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1068 - https://phabricator.wikimedia.org/T171723#3474269 (10Volans) p:05Triage>03High This is s4 master. [11:22:36] (03CR) 10Daniel Kinzler: "This patch was made to address one of these service degradations. We can hold it back, but I don't see how we can fix the problem without " [puppet] - 10https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: 10Ladsgroup) [11:49:40] RECOVERY - mediawiki-installation DSH group on mw2152 is OK: OK [11:53:10] PROBLEM - MariaDB Slave Lag: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:53:24] jouncebot: next [11:53:24] In 1 hour(s) and 6 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170726T1300) [11:53:31] PROBLEM - MariaDB Slave Lag: s3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:53:59] hashar: nothing for swat so far ^ [11:58:11] PROBLEM - MariaDB Slave Lag: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:58:40] PROBLEM - MariaDB Slave Lag: s3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[12:12:40] !log installing xorg-server updates from jessie 8.9 point release [12:12:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:24:00] PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL: CRITICAL: 1.67% of data above the critical threshold [1000.0] [12:49:10] PROBLEM - Apache HTTP on mw1289 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.002 second response time [12:49:21] PROBLEM - HHVM rendering on mw1289 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 3.850 second response time [12:49:21] PROBLEM - Nginx local proxy to apache on mw1289 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 1.858 second response time [12:50:10] RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.025 second response time [12:50:20] RECOVERY - HHVM rendering on mw1289 is OK: HTTP OK: HTTP/1.1 200 OK - 75226 bytes in 0.273 second response time [12:50:20] RECOVERY - Nginx local proxy to apache on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.083 second response time [12:51:11] jouncebot: next [12:51:11] In 0 hour(s) and 8 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170726T1300) [13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170726T1300). [13:00:27] o/ [13:00:32] but nothing to deploy... [13:02:37] * TabbyCat checks just in case he's got anything [13:03:04] zeljkof: https://gerrit.wikimedia.org/r/#/c/367676/ <-- wanna do? 
[13:03:11] if yes, I can list at wikitech [13:04:18] TabbyCat: sure [13:04:44] (03PS2) 10Giuseppe Lavagetto: role::mediawiki::canary_appserver: move to the future parser [puppet] - 10https://gerrit.wikimedia.org/r/367876 [13:05:07] zeljkof: okay I'll list it right now [13:05:40] PROBLEM - MariaDB Slave Lag: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:06:33] done [13:06:40] TabbyCat: on it [13:06:47] for the record: I can SWAT today! [13:07:17] hashar: one thing for SWAT, I'll take care of it [13:08:11] (03PS3) 10Zfilipin: HD logos for eswikivoyage and added some missing paths to the config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367676 (https://phabricator.wikimedia.org/T170604) (owner: 10MarcoAurelio) [13:08:29] (03PS3) 10Ema: pybal::monitoring: add check_pybal_ipvs_diff [puppet] - 10https://gerrit.wikimedia.org/r/367662 (https://phabricator.wikimedia.org/T134893) [13:08:55] TabbyCat: having some intertubes slowdown at the moment, so it might be a bit slower that usual... [13:09:21] zeljkof: no problem, take your time or we can delay it if the server is too overloaded [13:12:10] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367676 (https://phabricator.wikimedia.org/T170604) (owner: 10MarcoAurelio) [13:12:45] (03CR) 10Ema: pybal::monitoring: add check_pybal_ipvs_diff (0312 comments) [puppet] - 10https://gerrit.wikimedia.org/r/367662 (https://phabricator.wikimedia.org/T134893) (owner: 10Ema) [13:12:53] TabbyCat: it will get done in a few minutes, tubes not as bad as I thought [13:13:00] the problem is on my side, servers are fine [13:13:11] okay no probs [13:13:38] (03Merged) 10jenkins-bot: HD logos for eswikivoyage and added some missing paths to the config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367676 (https://phabricator.wikimedia.org/T170604) (owner: 10MarcoAurelio) [13:13:40] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:13:51] (03CR) 10jenkins-bot: HD logos for eswikivoyage and added some missing paths to the config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367676 (https://phabricator.wikimedia.org/T170604) (owner: 10MarcoAurelio) [13:14:11] 10Operations, 10Diamond, 10Traffic, 10monitoring, 10Prometheus-metrics-monitoring: Enable diamond PowerDNSRecursor collector on dnsrecursors - https://phabricator.wikimedia.org/T169600#3474646 (10ema) [13:15:40] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [13:18:25] TabbyCat: the commit is at mwdebug1002, can you test there? [13:18:35] zeljkof: sure thing, I'm on it [13:19:38] zeljkof: oops https://es.wikisource.org/wiki/Portada <_< [13:19:45] I guess I need to reduce that one [13:20:28] TabbyCat: should I revert? or will you create a follow up commit? [13:20:31] but for everything else, all looks good to me [13:20:34] I'll follow-up [13:20:47] TabbyCat: ok, so I can deploy? [13:21:19] zeljkof: yes, and I'll create a followup removing the path for eswikisource until I can find the right sizes [13:21:35] TabbyCat: ok, deploying... [13:22:46] !log zfilipin@tin Synchronized static/images/project-logos/: SWAT: [[gerrit:367676|HD logos for eswikivoyage and added some missing paths to the config (T170604)]] (duration: 00m 54s) [13:22:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:57] T170604: High density logos for spanish sister projects - https://phabricator.wikimedia.org/T170604 [13:24:10] (03Draft2) 10MarcoAurelio: Revert eswikisource paths due to oversized logos. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367889 [13:24:15] (03Draft1) 10MarcoAurelio: Revert eswikisource paths due to oversized logos. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/367889 [13:24:31] zeljkof: ^^ [13:24:49] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:367676|HD logos for eswikivoyage and added some missing paths to the config (T170604)]] (duration: 00m 46s) [13:24:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:11] TabbyCat: deployed, please check [13:25:26] TabbyCat: also, could you please add the follow up commit to Deployments page? [13:25:37] yes, I was doing that :) [13:27:06] done [13:27:35] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367889 (owner: 10MarcoAurelio) [13:27:43] TabbyCat: thanks [13:29:00] (03Merged) 10jenkins-bot: Revert eswikisource paths due to oversized logos. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367889 (owner: 10MarcoAurelio) [13:29:13] (03CR) 10jenkins-bot: Revert eswikisource paths due to oversized logos. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367889 (owner: 10MarcoAurelio) [13:29:20] TabbyCat: you have added it to yesterday :) please update [13:29:50] sorry, I am tense due to this [13:29:52] fixing [13:30:13] (03CR) 10Jcrespo: "I don't know if this can create a regression, but this will likely mitigate T171638 (maybe), even if it is a separate issue." [puppet] - 10https://gerrit.wikimedia.org/r/367710 (https://phabricator.wikimedia.org/T171583) (owner: 10Rush) [13:30:37] TabbyCat: 367889 is at mwdebug1002, please check [13:30:54] on it [13:31:10] PROBLEM - MariaDB Slave Lag: s3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:31:30] looks good again on mwdebug1002 [13:31:30] PROBLEM - salt-minion processes on stat1002 is CRITICAL: Return code of 255 is out of bounds [13:31:37] 10Operations, 10monitoring, 10User-fgiunchedi: Diamond log level set to DEBUG spams syslog - https://phabricator.wikimedia.org/T171580#3474672 (10jcrespo) Migrating my comment from T171638, as it may be useful for debugging: > I believe it is on stretch, because I have not seen it on other hosts, but it c... [13:31:40] PROBLEM - configured eth on stat1002 is CRITICAL: Return code of 255 is out of bounds [13:31:41] PROBLEM - puppet last run on stat1002 is CRITICAL: Return code of 255 is out of bounds [13:31:45] (03PS3) 10Giuseppe Lavagetto: role::mediawiki::canary_appserver: move to the future parser [puppet] - 10https://gerrit.wikimedia.org/r/367876 [13:31:50] PROBLEM - MariaDB Slave Lag: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:32:25] TabbyCat: ok, deploying [13:32:34] I thought 1001 was getting less loaded [13:32:36] I am working on stat1002 sorry, silenced [13:33:10] PROBLEM - MariaDB Slave SQL: x1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:33:11] PROBLEM - MariaDB Slave IO: s6 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:33:20] PROBLEM - MariaDB Slave IO: m3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:33:23] PROBLEM - MariaDB Slave IO: x1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:33:23] PROBLEM - MariaDB Slave IO: s1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:33:30] RECOVERY - salt-minion processes on stat1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [13:33:40] RECOVERY - configured eth on stat1002 is OK: OK - interfaces up [13:33:51] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 18 minutes ago with 0 failures [13:34:01] RECOVERY - MariaDB Slave SQL: x1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional) [13:34:01] RECOVERY - MariaDB Slave IO: s6 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes [13:34:10] RECOVERY - MariaDB Slave IO: m3 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes [13:34:10] RECOVERY - MariaDB Slave IO: x1 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes [13:34:11] RECOVERY - MariaDB Slave IO: s1 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes [13:34:16] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:367889|Revert eswikisource paths due to oversized logos (T170604)]] (duration: 00m 46s) [13:34:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:27] T170604: High density logos for spanish sister projects - https://phabricator.wikimedia.org/T170604 [13:34:39] TabbyCat: deployed, please check [13:35:05] lgtm [13:35:20] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2040233 [13:35:21] TabbyCat: anything else, or are we done with scap for now? 
:) [13:35:33] (03PS1) 10Giuseppe Lavagetto: apache::conf: convert to use validate_numeric [puppet] - 10https://gerrit.wikimedia.org/r/367891 [13:35:50] zeljkof: I give way now, no more from me to swat [13:35:55] thank you for your help [13:36:01] !log EU SWAT finished [13:36:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:15] TabbyCat: thanks for deploying with #releng :) [13:37:22] as if it could be done with another "company" xD -- my pleasure :) [13:37:47] TabbyCat: ;) [13:41:20] PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL: CRITICAL: 1.67% of data above the critical threshold [1000.0] [13:44:21] PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL: CRITICAL: 1.67% of data above the critical threshold [1000.0] [13:44:30] !log restarting cassandra on maps clusters [13:44:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:59] (03Draft2) 10MarcoAurelio: Make ptwikimedia a fishbowl wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367892 (https://phabricator.wikimedia.org/T171501) [13:51:07] (03Draft1) 10MarcoAurelio: Make ptwikimedia a fishbowl wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367892 (https://phabricator.wikimedia.org/T171501) [13:51:16] (03PS2) 10Giuseppe Lavagetto: apache::conf: convert to use validate_numeric [puppet] - 10https://gerrit.wikimedia.org/r/367891 (https://phabricator.wikimedia.org/T171704) [13:51:18] (03PS4) 10Giuseppe Lavagetto: role::mediawiki::canary_appserver: move to the future parser [puppet] - 10https://gerrit.wikimedia.org/r/367876 (https://phabricator.wikimedia.org/T171704) [13:51:54] (03PS3) 10MarcoAurelio: Make ptwikimedia a fishbowl wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367892 (https://phabricator.wikimedia.org/T171501) [13:53:20] (03CR) 10jerkins-bot: [V: 04-1] Make ptwikimedia a fishbowl wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367892 
(https://phabricator.wikimedia.org/T171501) (owner: 10MarcoAurelio) [13:58:30] (03PS4) 10MarcoAurelio: Make ptwikimedia a fishbowl wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367892 (https://phabricator.wikimedia.org/T171501) [14:02:17] Dereckson: does it make sense to make ptwikimedia fishbowl, given that they use the abusefilter to make it ''de facto'' already? [14:02:54] (03CR) 10Volans: "Much nicer!. I still have some doubts about getting the data from prometheus because of potential stale data that doesn't come from the so" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/367662 (https://phabricator.wikimedia.org/T134893) (owner: 10Ema) [14:04:20] (03PS3) 10Jcrespo: mariadb: Pool db2072 with low load as s1 main traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365285 (https://phabricator.wikimedia.org/T170662) [14:05:20] (03CR) 10Volans: [C: 04-1] "Reply inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/367662 (https://phabricator.wikimedia.org/T134893) (owner: 10Ema) [14:05:26] damn gerrit [14:05:29] it was not a -1 [14:06:02] seems a bug on the IRC side though, the patch doesn't have the -1 :D [14:08:28] (03PS8) 10Rush: diamond: set diskspace filesystems explicitly [puppet] - 10https://gerrit.wikimedia.org/r/367710 (https://phabricator.wikimedia.org/T171583) [14:08:47] (03PS9) 10Rush: diamond: set diskspace filesystems explicitly [puppet] - 10https://gerrit.wikimedia.org/r/367710 (https://phabricator.wikimedia.org/T171583) [14:09:21] (03CR) 10Jcrespo: [C: 032] mariadb: Pool db2072 with low load as s1 main traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365285 (https://phabricator.wikimedia.org/T170662) (owner: 10Jcrespo) [14:11:52] * elukey will not make jokes about volans' last -1 [14:11:58] (03Merged) 10jenkins-bot: mariadb: Pool db2072 with low load as s1 main traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365285 (https://phabricator.wikimedia.org/T170662) (owner: 10Jcrespo) 
[14:12:13] (03CR) 10jenkins-bot: mariadb: Pool db2072 with low load as s1 main traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365285 (https://phabricator.wikimedia.org/T170662) (owner: 10Jcrespo) [14:12:13] elukey: you're so kind [14:12:17] :-P [14:12:20] :D [14:13:01] (03Abandoned) 10Rush: DON'T MERGE: labsdb: in case labsdb1001 falls over [puppet] - 10https://gerrit.wikimedia.org/r/367625 (https://phabricator.wikimedia.org/T171538) (owner: 10Rush) [14:13:57] (03CR) 10Jcrespo: "Don't keep it too far ;-)" [puppet] - 10https://gerrit.wikimedia.org/r/367625 (https://phabricator.wikimedia.org/T171538) (owner: 10Rush) [14:16:24] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Degraded RAID on db1066 - https://phabricator.wikimedia.org/T169448#3474822 (10jcrespo) 05Open>03Resolved a:03Cmjohnson [14:17:09] (03CR) 10Filippo Giunchedi: [C: 031] diamond: set diskspace filesystems explicitly [puppet] - 10https://gerrit.wikimedia.org/r/367710 (https://phabricator.wikimedia.org/T171583) (owner: 10Rush) [14:21:13] !log jynus@tin Synchronized wmf-config/db-codfw.php: Pool db2072 (duration: 00m 45s) [14:21:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:19] 10Operations, 10Beta-Cluster-Infrastructure, 10media-storage, 10Release-Engineering-Team (Kanban): nscd does not cache localhost causing high CPU usage when localhost is often resolved - https://phabricator.wikimedia.org/T171745#3474846 (10hashar) [14:25:51] (03PS2) 10Hashar: swift: save nscd CPU by using IP address [puppet] - 10https://gerrit.wikimedia.org/r/358799 (https://phabricator.wikimedia.org/T160990) [14:27:18] (03CR) 10Hashar: "This patch is cherry picked on the beta cluster and definitely reduce the load / CPU usages of nscd on the labs instances." 
[puppet] - 10https://gerrit.wikimedia.org/r/358799 (https://phabricator.wikimedia.org/T160990) (owner: 10Hashar) [14:28:09] 10Operations, 10Beta-Cluster-Infrastructure, 10media-storage, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): nscd does not cache localhost causing high CPU usage when localhost is often resolved - https://phabricator.wikimedia.org/T171745#3474869 (10hashar) For #beta-cluster-infrastructure the... [14:33:40] (03CR) 10Giuseppe Lavagetto: [C: 031] CLI: simplify imports and introspection [software/cumin] - 10https://gerrit.wikimedia.org/r/366734 (owner: 10Volans) [14:34:51] (03PS1) 10Jcrespo: mariadb: Depool db2068 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367896 [14:35:56] (03PS2) 10Jcrespo: mariadb: Depool db2068 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367896 [14:40:50] !log installing spice security updates [14:40:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:01] (03CR) 10Filippo Giunchedi: "LGTM, one comment re: exceptions" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/367662 (https://phabricator.wikimedia.org/T134893) (owner: 10Ema) [14:50:17] (03CR) 10Volans: pybal::monitoring: add check_pybal_ipvs_diff (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/367662 (https://phabricator.wikimedia.org/T134893) (owner: 10Ema) [14:51:19] 10Operations, 10ops-codfw, 10User-fgiunchedi: ms-be2024 not powering on - https://phabricator.wikimedia.org/T171275#3474929 (10Papaul) @fgiunchedi Yes the server is back at the stage it was yesterday before updating the firmware and removing the power. I am going to remove the power again and let you try to... [14:54:14] papaul: ^ thanks! 
I'm about to jump into a meeting, will be able to take a look in an hour or so [14:55:00] PROBLEM - Host ms-be2024.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:00:08] godog: ok [15:00:12] PROBLEM - HHVM jobrunner on mw1169 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:01:10] RECOVERY - HHVM jobrunner on mw1169 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time [15:07:47] (03CR) 10Giuseppe Lavagetto: [C: 031] Logging: add a custom trace() logging level [software/cumin] - 10https://gerrit.wikimedia.org/r/366735 (owner: 10Volans) [15:08:02] 10Operations, 10Performance-Team, 10Thumbor, 10Patch-For-Review, 10User-fgiunchedi: Deploy thumbor in codfw - https://phabricator.wikimedia.org/T167801#3474987 (10Papaul) [15:08:06] 10Operations, 10ops-codfw, 10Performance-Team, 10Thumbor, 10User-fgiunchedi: Rename mw2148 / mw2149 / mw2259 / mw2260 to thumbor200[1234] - https://phabricator.wikimedia.org/T168881#3474985 (10Papaul) 05Open>03Resolved Complete [15:10:50] RECOVERY - Host ms-be2024.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.55 ms [15:11:15] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db2068 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367896 (owner: 10Jcrespo) [15:12:32] (03Merged) 10jenkins-bot: mariadb: Depool db2068 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367896 (owner: 10Jcrespo) [15:12:44] (03CR) 10jenkins-bot: mariadb: Depool db2068 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367896 (owner: 10Jcrespo) [15:15:21] !log jynus@tin Synchronized wmf-config/db-codfw.php: Depool db2068 (duration: 00m 46s) [15:15:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:17:37] !log restarting and upgrading db2068 [15:17:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:00] (03PS3) 10Dzahn: contint: webperformance Jenkins slave [puppet] - 10https://gerrit.wikimedia.org/r/367411 
(https://phabricator.wikimedia.org/T166756) (owner: 10Hashar) [15:20:55] !log upgrade nodejs on scb2001 (currently depooled for testing) [15:21:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:21:53] PROBLEM - Check Varnish expiry mailbox lag on cp1049 is CRITICAL: CRITICAL: expiry mailbox lag is 2026383 [15:21:53] (03CR) 10Dzahn: [C: 032] contint: webperformance Jenkins slave [puppet] - 10https://gerrit.wikimedia.org/r/367411 (https://phabricator.wikimedia.org/T166756) (owner: 10Hashar) [15:22:56] (03PS2) 10Dzahn: visualdiff: Remove manually built `uprightdiff` [puppet] - 10https://gerrit.wikimedia.org/r/367131 (owner: 10Legoktm) [15:23:46] mutante: danke :) [15:27:21] de rien [15:27:40] (03CR) 10Dzahn: [C: 032] visualdiff: Remove manually built `uprightdiff` [puppet] - 10https://gerrit.wikimedia.org/r/367131 (owner: 10Legoktm) [15:29:21] !log upgrade nodejs on remaining scb hosts (along with service restarts) [15:29:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:53] 10Operations, 10MediaWiki-extensions-LocalisationUpdate, 10MinervaNeue, 10Reading-Web-Backlog, 10Release-Engineering-Team: The mobile-frontend-placeholder message is not updated in din.wikipedia.org - https://phabricator.wikimedia.org/T171711#3475060 (10bmansurov) [15:30:17] (03PS9) 10Andrew Bogott: Puppetmaster: Fix apache config ssldir [puppet] - 10https://gerrit.wikimedia.org/r/365053 [15:31:34] (03CR) 10Andrew Bogott: [C: 032] Puppetmaster: Fix apache config ssldir [puppet] - 10https://gerrit.wikimedia.org/r/365053 (owner: 10Andrew Bogott) [15:32:15] (03PS1) 10Jcrespo: Revert "mariadb: Depool db2068 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367900 [15:32:43] !log patching puppetmaster1001, possible puppet hiccups coming up [15:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:43] (03CR) 10Dzahn: "cleaned up manually on ruthenium to reflect this" [puppet] - 
10https://gerrit.wikimedia.org/r/367131 (owner: 10Legoktm) [15:38:12] (03CR) 10Dzahn: [C: 031] "Can you confirm the desired result is that on jessie both PHP5 and PHP7 are installed in parallel?" [puppet] - 10https://gerrit.wikimedia.org/r/361680 (https://phabricator.wikimedia.org/T166611) (owner: 10Paladox) [15:38:35] (03CR) 10Dzahn: [C: 031] "@Hashar" [puppet] - 10https://gerrit.wikimedia.org/r/361680 (https://phabricator.wikimedia.org/T166611) (owner: 10Paladox) [15:38:54] (03PS2) 10Jcrespo: Revert "mariadb: Depool db2068 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367900 [15:38:56] (03PS1) 10Jcrespo: mariadb: Increase db2072 weight after pooling it with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367901 [15:38:58] (03PS1) 10Jcrespo: mariadb: Depool db2069 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367902 [15:39:00] (03PS1) 10Ema: Add support for One-packet scheduling (OPS) [debs/pybal] - 10https://gerrit.wikimedia.org/r/367903 [15:39:12] !log rolling upgrade/service restarts of nodejs in eqiad [15:39:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:39:45] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db2068 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367900 (owner: 10Jcrespo) [15:40:03] (03CR) 10Jcrespo: [C: 032] mariadb: Increase db2072 weight after pooling it with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367901 (owner: 10Jcrespo) [15:40:21] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db2069 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367902 (owner: 10Jcrespo) [15:40:40] (03PS2) 10Dzahn: Don't need to update submodules recursively [puppet] - 10https://gerrit.wikimedia.org/r/367639 (owner: 10Reedy) [15:41:16] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db2068 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367900 (owner: 10Jcrespo) [15:41:26] 
(03CR) 10jenkins-bot: Revert "mariadb: Depool db2068 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367900 (owner: 10Jcrespo) [15:42:33] (03Merged) 10jenkins-bot: mariadb: Increase db2072 weight after pooling it with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367901 (owner: 10Jcrespo) [15:42:52] (03Merged) 10jenkins-bot: mariadb: Depool db2069 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367902 (owner: 10Jcrespo) [15:43:55] (03CR) 10jenkins-bot: mariadb: Increase db2072 weight after pooling it with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367901 (owner: 10Jcrespo) [15:43:57] (03CR) 10jenkins-bot: mariadb: Depool db2069 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367902 (owner: 10Jcrespo) [15:45:58] !log jynus@tin Synchronized wmf-config/db-codfw.php: Repool db2068, depool db2069, pool db2072 with more weight (duration: 00m 46s) [15:46:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:28] db2068 IPMI Temperature check starting timing out after reboot :-/ [15:48:03] !log upgrade and reboot db2069 [15:48:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:48:38] (03CR) 10Dzahn: [C: 032] Don't need to update submodules recursively [puppet] - 10https://gerrit.wikimedia.org/r/367639 (owner: 10Reedy) [15:51:23] (03PS1) 10Jcrespo: mariadb: Change db2069 mysql socket location to the default [puppet] - 10https://gerrit.wikimedia.org/r/367906 (https://phabricator.wikimedia.org/T148507) [15:53:22] (03CR) 10jerkins-bot: [V: 04-1] Add support for One-packet scheduling (OPS) [debs/pybal] - 10https://gerrit.wikimedia.org/r/367903 (owner: 10Ema) [15:54:22] (03CR) 10Jcrespo: [C: 032] mariadb: Change db2069 mysql socket location to the default [puppet] - 10https://gerrit.wikimedia.org/r/367906 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo) [15:57:48] PROBLEM - Check Varnish expiry 
mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2082684 [15:57:57] (03PS10) 10Rush: diamond: set diskspace filesystems explicitly [puppet] - 10https://gerrit.wikimedia.org/r/367710 (https://phabricator.wikimedia.org/T171583) [15:59:00] 10Operations, 10ops-eqiad, 10hardware-requests, 10Patch-For-Review: Decommission mw1196 - https://phabricator.wikimedia.org/T170441#3475150 (10RobH) >>! In T170441#3473908, @MoritzMuehlenhoff wrote: > Have the "non-interruptuable steps" really been completed? mw1196 still has a salt key and shows up htt... [15:59:47] 10Operations, 10ops-codfw, 10User-fgiunchedi: ms-be2024 not powering on - https://phabricator.wikimedia.org/T171275#3475157 (10Papaul) @fgiunchedi Now the server can't not power on even when using the power button on the server . I contact HP and after troubleshooting they decide to send a replacement board... [16:01:16] (03CR) 10Rush: [C: 032] diamond: set diskspace filesystems explicitly [puppet] - 10https://gerrit.wikimedia.org/r/367710 (https://phabricator.wikimedia.org/T171583) (owner: 10Rush) [16:03:03] (03CR) 10Dzahn: "doesn't this mean merging this would break scap deployment on all labs instances that aren't using that puppetmaster with the cherry-picke" [puppet] - 10https://gerrit.wikimedia.org/r/365891 (https://phabricator.wikimedia.org/T166013) (owner: 1020after4) [16:04:21] (03CR) 10Dzahn: "I don't understand how this is related to /home since it's /var/lib/scap before and /var/lib/something_else after. 
Where is $deploy_user g" [puppet] - 10https://gerrit.wikimedia.org/r/365891 (https://phabricator.wikimedia.org/T166013) (owner: 1020after4) [16:08:08] (03PS1) 10Jcrespo: Revert "mariadb: Depool db2069 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367909 [16:12:19] (03PS2) 10Jcrespo: Revert "mariadb: Depool db2069 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367909 [16:12:21] (03PS1) 10Jcrespo: mariadb: Depool db2070 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367911 [16:14:25] !log upgraded nodejs on restbase* [16:14:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:19:06] (03PS1) 10Jcrespo: mariadb: Move db2070 socket location to the default after reboot [puppet] - 10https://gerrit.wikimedia.org/r/367912 (https://phabricator.wikimedia.org/T148507) [16:19:08] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db2069 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367909 (owner: 10Jcrespo) [16:19:53] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db2070 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367911 (owner: 10Jcrespo) [16:20:07] (03CR) 10Daniel Kinzler: [C: 031] "yes, please." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367393 (https://phabricator.wikimedia.org/T165197) (owner: 10Ladsgroup) [16:20:22] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db2069 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367909 (owner: 10Jcrespo) [16:20:32] (03CR) 10jenkins-bot: Revert "mariadb: Depool db2069 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367909 (owner: 10Jcrespo) [16:21:01] (03Merged) 10jenkins-bot: mariadb: Depool db2070 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367911 (owner: 10Jcrespo) [16:22:33] PROBLEM - puppet last run on mw1295 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:22:42] PROBLEM - puppet last run on kafka1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:22:42] PROBLEM - puppet last run on prometheus2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:22:42] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:22:43] PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:22:46] !log jynus@tin Synchronized wmf-config/db-codfw.php: Repool db2069, depool db2070 (duration: 00m 45s) [16:22:52] PROBLEM - puppet last run on aluminium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:22:52] PROBLEM - puppet last run on ms-be1032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:22:52] PROBLEM - puppet last run on ores1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:22:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:23:02] PROBLEM - puppet last run on notebook1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:02] PROBLEM - puppet last run on wtp1041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:03] PROBLEM - puppet last run on labtestservices2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:12] PROBLEM - puppet last run on restbase1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:12] PROBLEM - puppet last run on db1098 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:23:12] PROBLEM - puppet last run on db1100 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:12] PROBLEM - puppet last run on krypton is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:13] PROBLEM - puppet last run on wtp1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:13] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:22] PROBLEM - puppet last run on install1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:22] PROBLEM - puppet last run on mw1212 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:22] PROBLEM - puppet last run on wtp1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:32] PROBLEM - puppet last run on analytics1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:32] PROBLEM - puppet last run on db1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:32] PROBLEM - puppet last run on lvs3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:32] (03CR) 10jenkins-bot: mariadb: Depool db2070 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367911 (owner: 10Jcrespo) [16:23:33] PROBLEM - puppet last run on wtp1017 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:25:13] RECOVERY - puppet last run on db1098 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:25:14] 10Operations, 10ops-codfw, 10hardware-requests: Decommission subra/suhail - https://phabricator.wikimedia.org/T169506#3475211 (10Papaul) [16:26:13] RECOVERY - puppet last run on wtp1019 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:27:05] (03PS1) 10Lucas Werkmeister (WMDE): Remove wbq_evaluation logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367913 [16:27:28] !log upgrading and rebooting db2070 [16:27:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:22] 10Operations, 10ops-codfw, 10hardware-requests: reclaim/decom tmh200[12] - https://phabricator.wikimedia.org/T168472#3475218 (10Papaul) [16:29:23] (03CR) 10Jcrespo: [C: 032] mariadb: Move db2070 socket location to the default after reboot [puppet] - 10https://gerrit.wikimedia.org/r/367912 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo) [16:30:57] 10Operations, 10MediaWiki-extensions-LocalisationUpdate, 10MinervaNeue, 10Release-Engineering-Team, 10Reading-Web-Backlog (Tracking): The mobile-frontend-placeholder message is not updated in din.wikipedia.org - https://phabricator.wikimedia.org/T171711#3475232 (10Jdlrobson) [16:31:54] RECOVERY - Check Varnish expiry mailbox lag on cp1049 is OK: OK: expiry mailbox lag is 587 [16:33:22] RECOVERY - puppet last run on mw1212 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:34:41] (03PS1) 10Lucas Werkmeister (WMDE): Log 'WikibaseQualityConstraints' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367914 (https://phabricator.wikimedia.org/T171281) [16:35:38] 10Operations, 10MediaWiki-extensions-LocalisationUpdate, 10Release-Engineering-Team: The mobile-frontend-placeholder message is not updated in din.wikipedia.org - 
https://phabricator.wikimedia.org/T171711#3475238 (10Amire80) Removing reading tags. It's most likely an ops issue with LU. [16:36:11] (03PS1) 10Jcrespo: Revert "mariadb: Depool db2070 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367915 [16:36:18] (03CR) 10Jcrespo: [C: 04-1] "Not yet" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367915 (owner: 10Jcrespo) [16:36:27] (03CR) 10Lucas Werkmeister (WMDE): "If you don’t like the log channel name, it can still be changed – the code that uses it hasn’t been deployed yet." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367914 (https://phabricator.wikimedia.org/T171281) (owner: 10Lucas Werkmeister (WMDE)) [16:36:41] papaul: thanks for dealing with ms-be2024, did they give you an ETA? [16:37:42] RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 147718 [16:38:00] (03CR) 10Gehel: [C: 032] Decrease elasticsearch search thread pool to 32 for cirrus servers (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/367709 (https://phabricator.wikimedia.org/T169498) (owner: 10EBernhardson) [16:38:08] (03CR) 10Gehel: [C: 04-1] Decrease elasticsearch search thread pool to 32 for cirrus servers [puppet] - 10https://gerrit.wikimedia.org/r/367709 (https://phabricator.wikimedia.org/T169498) (owner: 10EBernhardson) [16:40:02] godog: tomorrow between 9am-1pm [16:40:04] 10Operations, 10MediaWiki-extensions-LocalisationUpdate, 10Release-Engineering-Team: l10nupdate failing with "git pull of extensions failed" since July 19th - https://phabricator.wikimedia.org/T171711#3475257 (10greg) [16:40:12] 10Operations, 10MediaWiki-extensions-LocalisationUpdate, 10Release-Engineering-Team: l10nupdate failing with "git pull of extensions failed" since July 19th - https://phabricator.wikimedia.org/T171711#3473889 (10greg) p:05Triage>03High [16:40:41] (03CR) 10Dzahn: "regarding the previous comments about this not being merged earlier etc - i have to point out now that YES, it 
did BREAK and caused a new " [puppet] - 10https://gerrit.wikimedia.org/r/255958 (owner: 10Reedy) [16:41:45] 10Operations, 10MediaWiki-extensions-LocalisationUpdate, 10Release-Engineering-Team: l10nupdate failing with "git pull of extensions failed" since July 19th - https://phabricator.wikimedia.org/T171711#3473889 (10Dzahn) likely caused by https://gerrit.wikimedia.org/r/#/c/255958/ follow-up to fix it was uploa... [16:43:09] mutante: thanks for that ^ so likely will run successfully tonight? [16:43:51] 10Operations, 10MediaWiki-extensions-LocalisationUpdate, 10Release-Engineering-Team: l10nupdate failing with "git pull of extensions failed" since July 19th - https://phabricator.wikimedia.org/T171711#3475273 (10greg) >>! In T171711#3475263, @Dzahn wrote: > likely caused by https://gerrit.wikimedia.org/r/#/c... [16:44:00] Reedy: ^ just fyi [16:44:15] Reedy: well, more than fyi, more like "hey, think you fixed it?" [16:44:16] greg-g: I made a fix a day or two ago [16:44:43] yeah, merged this morning, so hopefully fixed tonight [16:44:48] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db2070 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367915 (owner: 10Jcrespo) [16:45:22] 10Operations, 10MediaWiki-extensions-LocalisationUpdate, 10Release-Engineering-Team: l10nupdate failing with "git pull of extensions failed" since July 19th - https://phabricator.wikimedia.org/T171711#3475277 (10greg) a:03Reedy [16:46:11] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db2070 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367915 (owner: 10Jcrespo) [16:46:23] (03CR) 10jenkins-bot: Revert "mariadb: Depool db2070 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367915 (owner: 10Jcrespo) [16:47:48] (03PS2) 10Ema: Add support for One-packet scheduling (OPS) [debs/pybal] - 10https://gerrit.wikimedia.org/r/367903 [16:49:02] 10Operations, 10monitoring, 10netops, 10Patch-For-Review, 10User-fgiunchedi: Evaluate 
LibreNMS' Graphite backend - https://phabricator.wikimedia.org/T171167#3475301 (10fgiunchedi) Turns out this is more data than I expected (just slowly increasing by now) ``` $ du -hcs /var/lib/carbon/whisper/librenms/... [16:49:13] RECOVERY - puppet last run on wtp1041 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:49:22] RECOVERY - puppet last run on labtestservices2002 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:49:22] RECOVERY - puppet last run on krypton is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:49:42] RECOVERY - puppet last run on wtp1017 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [16:50:02] RECOVERY - puppet last run on aluminium is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:50:03] RECOVERY - puppet last run on cp3043 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [16:50:22] RECOVERY - puppet last run on restbase1010 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:50:23] RECOVERY - puppet last run on db1100 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:50:42] RECOVERY - puppet last run on db1029 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:50:52] RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:51:02] RECOVERY - puppet last run on prometheus2004 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:51:02] RECOVERY - puppet last run on ores1004 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:51:32] RECOVERY - puppet last run on install1002 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:51:32] RECOVERY - puppet last run on baham is OK: OK: Puppet is 
currently enabled, last run 38 seconds ago with 0 failures [16:51:33] RECOVERY - puppet last run on wtp1026 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [16:51:33] 10Operations, 10Commons, 10MediaWiki-extensions-Scribunto, 10Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3475310 (10Anomie) a:03Anomie [16:51:37] (03PS5) 10Elukey: puppetdb: Bump Java Heap max size to 6GB [puppet] - 10https://gerrit.wikimedia.org/r/366229 (https://phabricator.wikimedia.org/T170740) (owner: 10Alexandros Kosiaris) [16:51:52] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:52:02] (03CR) 10Reedy: "The thing that broke it... Was adding something that wasn't there before :(" [puppet] - 10https://gerrit.wikimedia.org/r/255958 (owner: 10Reedy) [16:52:12] RECOVERY - puppet last run on ms-be1032 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:52:12] RECOVERY - puppet last run on notebook1001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:52:42] RECOVERY - puppet last run on analytics1051 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [16:52:52] RECOVERY - puppet last run on kafka1022 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [16:54:52] RECOVERY - puppet last run on mw1295 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:55:10] !log jynus@tin Synchronized wmf-config/db-codfw.php: Repool db2070 (duration: 00m 45s) [16:55:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:58:46] (03CR) 10BBlack: [C: 031] Add support for One-packet scheduling (OPS) [debs/pybal] - 10https://gerrit.wikimedia.org/r/367903 (owner: 10Ema) [16:59:53] 
(03PS3) 10Gehel: Decrease elasticsearch search thread pool to 32 for cirrus servers [puppet] - 10https://gerrit.wikimedia.org/r/367709 (https://phabricator.wikimedia.org/T169498) (owner: 10EBernhardson) [17:02:50] (03CR) 10Ema: [C: 032] Add support for One-packet scheduling (OPS) [debs/pybal] - 10https://gerrit.wikimedia.org/r/367903 (owner: 10Ema) [17:04:39] (03CR) 10Faidon Liambotis: [C: 031] "LGTM. Maybe add IPv6 while we're at it?" [dns] - 10https://gerrit.wikimedia.org/r/367809 (https://phabricator.wikimedia.org/T169643) (owner: 10Ayounsi) [17:06:10] 10Operations, 10Puppet, 10Patch-For-Review: PuppetDB misbehaving on 2017-07-15 - https://phabricator.wikimedia.org/T170740#3475402 (10Volans) So we had a small hiccup today in which puppetdb responded 28 times 503s between 16:20:13 and 16:20:39 UTC, of those 17 where POSTs to update the hosts facts and we ha... [17:06:56] RECOVERY - carbon-cache too many creates on graphite1001 is OK: OK: Less than 1.00% above the threshold [500.0] [17:10:05] 10Operations, 10ops-codfw, 10hardware-requests: Decommission subra/suhail - https://phabricator.wikimedia.org/T169506#3475415 (10Papaul) [17:10:30] (03CR) 10Gehel: [C: 04-1] "puppet compiler result: https://puppet-compiler.wmflabs.org/compiler02/7171/" [puppet] - 10https://gerrit.wikimedia.org/r/367709 (https://phabricator.wikimedia.org/T169498) (owner: 10EBernhardson) [17:11:11] (03PS1) 10Chad: Group1 to wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367918 [17:11:22] (03CR) 10Chad: [C: 04-2] "later" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367918 (owner: 10Chad) [17:11:41] !log installing openjdk-8 security updates on cobalt and removing unused openjdk-7 packages [17:11:43] PROBLEM - Check systemd state on relforge1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[17:11:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:12:32] !log restarting gerrit to pick up Java security update [17:12:39] ^ checking relforge... [17:12:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:53] ACKNOWLEDGEMENT - Check systemd state on relforge1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Gehel experimentation in progress by dcausse [17:14:54] ACKNOWLEDGEMENT - Check systemd state on relforge1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Gehel experimentation in progress by dcausse [17:15:02] PROBLEM - puppet last run on stat1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_statistics_mediawiki] [17:15:58] (03PS1) 10Papaul: DNS: Remove mgmt DNS entries for subra and suhail [dns] - 10https://gerrit.wikimedia.org/r/367919 [17:16:59] 10Operations, 10Traffic: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442#3475432 (10BBlack) So to recap a small part of IRC discussion today in the wake of issues with rebooting hydrogen, I think our short-term improvement plan looks like this: 1) Implement OPS (one-p... [17:17:51] 10Operations, 10ops-codfw, 10hardware-requests, 10Patch-For-Review: Decommission subra/suhail - https://phabricator.wikimedia.org/T169506#3475436 (10Papaul) [17:18:27] 10Operations, 10ops-codfw: failing RAID disk on frdb2001 - https://phabricator.wikimedia.org/T171584#3475443 (10RobH) We've ordered two disks for this, one for immediate use and one for standby spares. They should arrive by early next week. 
[17:22:18] 10Operations, 10Puppet, 10Mobile, 10Need-volunteer, 10Reading-Web-Backlog (Tracking): URLs with title query string parameter and additional query string parameters do not redirect to mobile site - https://phabricator.wikimedia.org/T154227#3475450 (10Jdlrobson) [17:22:52] 10Operations, 10Analytics, 10Analytics-Cluster, 10Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2734568 (10leila) (There is a bit of IRC, email, meeting discussions as background missing here. but basically, Aaron, Andrew, and I chatted a couple of weeks ago... [17:25:48] (03PS1) 10Ema: Add support for One-packet scheduling (OPS) [debs/pybal] (1.13) - 10https://gerrit.wikimedia.org/r/367923 [17:26:53] (03PS2) 10RobH: DNS: Remove mgmt DNS entries for subra and suhail [dns] - 10https://gerrit.wikimedia.org/r/367919 (owner: 10Papaul) [17:27:12] (03CR) 10RobH: [C: 032] DNS: Remove mgmt DNS entries for subra and suhail [dns] - 10https://gerrit.wikimedia.org/r/367919 (owner: 10Papaul) [17:28:11] (03PS1) 10Ema: 1.13.10: Add support for One-packet scheduling (OPS) [debs/pybal] - 10https://gerrit.wikimedia.org/r/367924 [17:28:25] 10Operations, 10ops-codfw, 10hardware-requests, 10Patch-For-Review: Decommission subra/suhail - https://phabricator.wikimedia.org/T169506#3475464 (10RobH) a:05Papaul>03RobH merging papaul's dns change and removing switch port config [17:29:45] (03PS2) 10Ema: 1.13.10: Add support for One-packet scheduling (OPS) [debs/pybal] - 10https://gerrit.wikimedia.org/r/367924 (https://phabricator.wikimedia.org/T104442) [17:30:54] (03CR) 10Ema: [C: 032] Add support for One-packet scheduling (OPS) [debs/pybal] (1.13) - 10https://gerrit.wikimedia.org/r/367923 (owner: 10Ema) [17:31:11] (03PS1) 10Ema: 1.13.10: Add support for One-packet scheduling (OPS) [debs/pybal] (1.13) - 10https://gerrit.wikimedia.org/r/367925 (https://phabricator.wikimedia.org/T104442) [17:31:35] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: pdu phase 
inbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw - https://phabricator.wikimedia.org/T163339#3475474 (10Papaul) p:05Normal>03Low [17:31:59] (03PS1) 10BBlack: VCL: mobile_redirect: unconditional https [puppet] - 10https://gerrit.wikimedia.org/r/367926 [17:32:20] (03PS1) 10BBlack: recdns: do not use self in local resolv.conf [puppet] - 10https://gerrit.wikimedia.org/r/367927 (https://phabricator.wikimedia.org/T104442) [17:35:42] PROBLEM - puppet last run on conf1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:35:44] PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:35:44] PROBLEM - puppet last run on relforge1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:35:44] PROBLEM - puppet last run on mw1167 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:35:52] PROBLEM - puppet last run on etherpad1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:35:52] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:35:57] 10Operations, 10ops-codfw, 10hardware-requests: Decommission subra/suhail - https://phabricator.wikimedia.org/T169506#3475500 (10RobH) 05Open>03Resolved [17:36:02] PROBLEM - puppet last run on mw1256 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:02] PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:02] PROBLEM - puppet last run on labvirt1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:02] PROBLEM - puppet last run on mc1036 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:36:02] PROBLEM - puppet last run on mw1300 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:03] PROBLEM - puppet last run on ms-fe1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:03] PROBLEM - puppet last run on ms1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:03] PROBLEM - puppet last run on db1075 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:12] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:12] PROBLEM - puppet last run on dbproxy1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:12] PROBLEM - puppet last run on stat1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:12] PROBLEM - puppet last run on db1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:13] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:13] PROBLEM - puppet last run on wtp1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:13] PROBLEM - puppet last run on labvirt1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:22] PROBLEM - puppet last run on dbmonitor1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:22] PROBLEM - puppet last run on alsafi is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:36:22] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:22] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:23] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:23] PROBLEM - puppet last run on analytics1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:23] PROBLEM - puppet last run on mwdebug1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:23] PROBLEM - puppet last run on graphite1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:23] PROBLEM - puppet last run on maps1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:32] PROBLEM - puppet last run on cp1045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:34] PROBLEM - puppet last run on labcontrol1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:34] PROBLEM - puppet last run on db1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:34] PROBLEM - puppet last run on praseodymium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:34] PROBLEM - puppet last run on ms-be1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:34] PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:36:34] PROBLEM - puppet last run on rdb1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:34] PROBLEM - puppet last run on dbproxy1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:34] PROBLEM - puppet last run on iridium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:34] PROBLEM - puppet last run on wtp1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:34] PROBLEM - puppet last run on prometheus2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:42] PROBLEM - puppet last run on db1082 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:42] PROBLEM - puppet last run on db1092 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:42] PROBLEM - puppet last run on labvirt1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:42] PROBLEM - puppet last run on lvs1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:42] PROBLEM - puppet last run on mw1303 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:43] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:43] PROBLEM - puppet last run on etcd1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:36:43] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:36:43] PROBLEM - puppet last run on labweb1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:02] PROBLEM - puppet last run on elastic1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:02] PROBLEM - puppet last run on mw1240 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:02] PROBLEM - puppet last run on mw1250 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:02] PROBLEM - puppet last run on rdb1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:02] PROBLEM - puppet last run on analytics1056 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:03] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:03] PROBLEM - puppet last run on puppetmaster1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:04] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:12] PROBLEM - puppet last run on pc1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:12] PROBLEM - puppet last run on elastic1042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:12] PROBLEM - puppet last run on dbproxy1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:22] PROBLEM - puppet last run on cp3037 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:37:22] PROBLEM - puppet last run on uranium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:37:23] PROBLEM - puppet last run on mw1218 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:38:03] PROBLEM - puppet last run on mw1246 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:38:32] PROBLEM - puppet last run on francium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:33] PROBLEM - puppet last run on analytics1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:33] PROBLEM - puppet last run on notebook1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:42] PROBLEM - puppet last run on db1070 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:42] PROBLEM - puppet last run on restbase1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:42] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:43] PROBLEM - puppet last run on kafka1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:43] PROBLEM - puppet last run on etcd1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:52] PROBLEM - puppet last run on ms-be1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:52] PROBLEM - puppet last run on mw1265 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:39:52] PROBLEM - puppet last run on mw1267 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:52] PROBLEM - puppet last run on mw1268 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:52] PROBLEM - puppet last run on ores1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:52] PROBLEM - puppet last run on es1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:55] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:56] PROBLEM - puppet last run on wasat is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:56] PROBLEM - puppet last run on cp1073 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:56] PROBLEM - puppet last run on labweb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:56] PROBLEM - puppet last run on ores1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:39:56] PROBLEM - puppet last run on ores1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:06] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:13] PROBLEM - puppet last run on ms-fe3001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:13] PROBLEM - puppet last run on es1014 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:40:13] PROBLEM - puppet last run on logstash1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:13] PROBLEM - puppet last run on wtp1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:13] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:13] PROBLEM - puppet last run on analytics1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:13] PROBLEM - puppet last run on analytics1062 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:14] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:14] PROBLEM - puppet last run on db1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:15] PROBLEM - puppet last run on wtp1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:15] PROBLEM - puppet last run on maps1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:16] PROBLEM - puppet last run on californium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:22] PROBLEM - puppet last run on dbstore1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:22] PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:22] PROBLEM - puppet last run on cp3035 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:40:22] PROBLEM - puppet last run on mw1223 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:22] PROBLEM - puppet last run on wtp1040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:23] PROBLEM - puppet last run on elastic1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:23] PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:23] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:34] PROBLEM - puppet last run on phab2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:42] PROBLEM - puppet last run on labsdb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:42] PROBLEM - puppet last run on db1066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:43] PROBLEM - puppet last run on cp3030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:52] PROBLEM - puppet last run on kubestagetcd1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:52] PROBLEM - puppet last run on hassium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:52] PROBLEM - puppet last run on aqs1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:52] PROBLEM - puppet last run on aqs1007 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:40:52] PROBLEM - puppet last run on ores1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:52] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:52] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:53] PROBLEM - puppet last run on prometheus1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:53] PROBLEM - puppet last run on mw1271 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:54] PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:40:58] PROBLEM - puppet last run on oresrdb1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:02] PROBLEM - puppet last run on ganeti1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:02] PROBLEM - puppet last run on mw1254 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:02] PROBLEM - puppet last run on rhenium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:02] PROBLEM - puppet last run on mw1299 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:02] PROBLEM - puppet last run on rdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:03] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:41:03] PROBLEM - puppet last run on mc1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:03] PROBLEM - puppet last run on mw1283 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:03] PROBLEM - puppet last run on mw1255 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:04] PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:04] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:12] PROBLEM - puppet last run on rdb1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:12] PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:12] PROBLEM - puppet last run on radon is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:12] PROBLEM - puppet last run on oresrdb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:12] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:13] PROBLEM - puppet last run on dbproxy1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:13] PROBLEM - puppet last run on mw1282 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:13] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:41:32] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:33] PROBLEM - puppet last run on eventlog1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:42] PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:42] PROBLEM - puppet last run on ms-be1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:42] PROBLEM - puppet last run on cp1099 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:42] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:42] PROBLEM - puppet last run on ms-be1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:44] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:44] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:44] PROBLEM - puppet last run on labtestservices2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:52] PROBLEM - puppet last run on cobalt is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:52] PROBLEM - puppet last run on kubestage1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:52] PROBLEM - puppet last run on ms-be1030 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:41:52] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:52] PROBLEM - puppet last run on mw1269 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:53] PROBLEM - puppet last run on thorium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:42:03] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:42:03] PROBLEM - puppet last run on elastic1038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:42:03] PROBLEM - puppet last run on analytics1057 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:42:12] PROBLEM - puppet last run on es1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:42:12] PROBLEM - puppet last run on mw1296 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:42:22] PROBLEM - puppet last run on db1091 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:42:23] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:42:42] PROBLEM - puppet last run on poolcounter1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:42:42] PROBLEM - puppet last run on db1071 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:42:43] PROBLEM - puppet last run on kafka1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:42:52] PROBLEM - puppet last run on mc1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:42:52] PROBLEM - puppet last run on db1097 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:42:52] PROBLEM - puppet last run on mw1263 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:02] PROBLEM - puppet last run on ms-be1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:04] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:04] PROBLEM - puppet last run on ms-be1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:04] PROBLEM - puppet last run on mw1252 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:04] PROBLEM - puppet last run on wdqs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:12] PROBLEM - puppet last run on cerium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:12] PROBLEM - puppet last run on conf1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:12] PROBLEM - puppet last run on ganeti1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:12] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:12] PROBLEM - puppet last run on snapshot1006 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:43:13] PROBLEM - puppet last run on labvirt1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:13] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:13] PROBLEM - puppet last run on wtp1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:22] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:22] PROBLEM - puppet last run on ganeti1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:33] PROBLEM - puppet last run on wtp1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:42] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:43] PROBLEM - puppet last run on mw1239 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:43] PROBLEM - puppet last run on db1085 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:43] PROBLEM - puppet last run on mw1281 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:43] PROBLEM - puppet last run on labvirt1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:52] PROBLEM - puppet last run on db1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:54] PROBLEM - puppet last run on mwdebug1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:43:54] PROBLEM - puppet last run on db1099 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:43:54] PROBLEM - puppet last run on mw1209 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:02] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:12] PROBLEM - puppet last run on mc1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:12] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:12] PROBLEM - puppet last run on analytics1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:12] PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:12] PROBLEM - puppet last run on mw1297 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:12] PROBLEM - puppet last run on ms-be3003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:13] PROBLEM - puppet last run on labcontrol1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:13] PROBLEM - puppet last run on thumbor1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:13] PROBLEM - puppet last run on analytics1061 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:14] PROBLEM - puppet last run on graphite1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:44:23] PROBLEM - puppet last run on ores1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:23] PROBLEM - puppet last run on mw1229 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:23] PROBLEM - puppet last run on wtp1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:23] PROBLEM - puppet last run on db1096 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:23] PROBLEM - puppet last run on mw1162 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:33] PROBLEM - puppet last run on db1041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:33] PROBLEM - puppet last run on mw1231 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:33] PROBLEM - puppet last run on wtp1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:33] PROBLEM - puppet last run on db1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:42] PROBLEM - puppet last run on rdb1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:42] PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:42] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:42] PROBLEM - puppet last run on scb1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:44:43] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:52] PROBLEM - puppet last run on cp3047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:52] PROBLEM - puppet last run on mw1207 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:52] PROBLEM - puppet last run on dysprosium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:52] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:53] PROBLEM - puppet last run on analytics1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:53] PROBLEM - puppet last run on elastic1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:53] PROBLEM - puppet last run on ocg1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:44:53] PROBLEM - puppet last run on labtestservices2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:02] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:03] PROBLEM - puppet last run on db1087 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:03] PROBLEM - puppet last run on darmstadtium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:03] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:45:12] PROBLEM - puppet last run on analytics1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:12] PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:12] PROBLEM - puppet last run on db1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:13] PROBLEM - puppet last run on mw1294 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:13] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:13] PROBLEM - puppet last run on logstash1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:13] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:13] PROBLEM - puppet last run on aqs1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:22] PROBLEM - puppet last run on thumbor1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:22] PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:32] PROBLEM - puppet last run on mc1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:32] PROBLEM - puppet last run on ms-be1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:32] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:45:36] PROBLEM - puppet last run on labcontrol1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:42] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:42] PROBLEM - puppet last run on db1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:42] PROBLEM - puppet last run on poolcounter1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:52] PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:52] PROBLEM - puppet last run on db1105 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:52] PROBLEM - puppet last run on db1061 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:52] PROBLEM - puppet last run on ms-be1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:45:52] PROBLEM - puppet last run on bromine is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:02] PROBLEM - puppet last run on ms-fe1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:03] PROBLEM - puppet last run on db1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:03] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:03] PROBLEM - puppet last run on wtp1044 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:46:03] PROBLEM - puppet last run on labvirt1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:12] PROBLEM - puppet last run on mw1276 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:12] PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:12] PROBLEM - puppet last run on analytics1058 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:13] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:13] !log nitrogen: disabled puppet agent, manually hacked puppetdb.service unit file, restarted puppetdb.service... [17:46:22] PROBLEM - puppet last run on mw1285 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:22] PROBLEM - puppet last run on db1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:46:22] PROBLEM - puppet last run on mw1260 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:22] PROBLEM - puppet last run on elastic1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:32] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:33] PROBLEM - puppet last run on ganeti1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:42] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:46:42] PROBLEM - puppet last run on analytics1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:42] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:43] PROBLEM - puppet last run on cp3033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:43] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:55] PROBLEM - puppet last run on lvs1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:55] PROBLEM - puppet last run on mw1305 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:55] PROBLEM - puppet last run on fermium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:55] PROBLEM - puppet last run on mw1245 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:46:55] PROBLEM - puppet last run on analytics1064 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:02] PROBLEM - puppet last run on mw1215 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:02] PROBLEM - puppet last run on mw1165 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:02] PROBLEM - puppet last run on elastic1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:02] PROBLEM - puppet last run on etcd1006 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:47:03] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:03] PROBLEM - puppet last run on mw1247 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:12] PROBLEM - puppet last run on elastic1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:12] PROBLEM - puppet last run on cp3040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:12] PROBLEM - puppet last run on cp3041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:12] PROBLEM - puppet last run on elastic1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:12] PROBLEM - puppet last run on analytics1066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:12] PROBLEM - puppet last run on analytics1063 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:12] PROBLEM - puppet last run on labtestcontrol2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:13] PROBLEM - puppet last run on wtp1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:22] PROBLEM - puppet last run on wtp1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:24] PROBLEM - puppet last run on ms-be1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:32] PROBLEM - puppet last run on cp3045 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:47:32] PROBLEM - puppet last run on ms-be1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:32] PROBLEM - puppet last run on elastic1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:42] PROBLEM - puppet last run on aqs1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:42] PROBLEM - puppet last run on mw1225 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:42] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:52] PROBLEM - puppet last run on db1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:52] PROBLEM - puppet last run on kafka1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:47:52] PROBLEM - puppet last run on db1084 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:48:02] PROBLEM - puppet last run on lvs1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:48:12] PROBLEM - puppet last run on kubernetes1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:48:17] I got acamar (one of the failed above) to run [17:48:22] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [17:48:23] I wonder if things will now settle on their own, or not? [17:48:34] (or how delayed these reports are, too) [17:48:35] (03CR) 10Ayounsi: [C: 032] "Thanks, I'll add IPv6 to the interfaces when we will bring v6 to the services." 
[dns] - https://gerrit.wikimedia.org/r/367809 (https://phabricator.wikimedia.org/T169643) (owner: Ayounsi) [17:48:39] (PS2) Ayounsi: Add pfw3-codfw loopback and uplinks IPs to DNS [dns] - https://gerrit.wikimedia.org/r/367809 (https://phabricator.wikimedia.org/T169643) [17:53:15] Operations, Puppet, Patch-For-Review: PuppetDB misbehaving on 2017-07-15 - https://phabricator.wikimedia.org/T170740#3475569 (BBlack) So, things fell over again with a ton of puppetfail spam. As a stopgap, I've done the following: 1. Disabled the agent on nitrogen 2. Edited the puppetdb.service sys... [17:54:46] Operations, ops-codfw: failing RAID disk on frdb2001 - https://phabricator.wikimedia.org/T171584#3475576 (RobH) p:Normal>High a:Papaul I've assigned this to @papaul and moved it into the high priority column on the #ops-codfw workboard. This is blocked until the disks ordered on T171620 arri... [17:55:12] Operations, Cloud-VPS: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3475586 (RobH) [17:55:34] Operations, Cloud-VPS, Patch-For-Review: rack/setup/install labtestpuppetmaster2001 - https://phabricator.wikimedia.org/T167157#3475589 (RobH) [17:55:51] schana: there you go [17:56:00] (PS1) Bearloga: statistics::discovery: Reconfigure for Golden data retrieval [puppet] - https://gerrit.wikimedia.org/r/367930 (https://phabricator.wikimedia.org/T170494) [17:56:28] thanks Sagan [17:56:34] you're welcome :) [17:57:27] (CR) jerkins-bot: [V: -1] statistics::discovery: Reconfigure for Golden data retrieval [puppet] - https://gerrit.wikimedia.org/r/367930 (https://phabricator.wikimedia.org/T170494) (owner: Bearloga) [17:58:52] RECOVERY - puppet last run on prometheus2003 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:58:56] (PS1) Ayounsi: Assign internal IPs to pfw3-codfw<->pfw3-eqiad ipsec link [dns] - https://gerrit.wikimedia.org/r/367933
(https://phabricator.wikimedia.org/T169643) [17:59:00] (03PS2) 10Bearloga: statistics::discovery: Reconfigure for Golden data retrieval [puppet] - 10https://gerrit.wikimedia.org/r/367930 (https://phabricator.wikimedia.org/T170494) [18:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170726T1800). Please do the needful. [18:00:05] RoanKattouw: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [18:00:18] (03CR) 10jerkins-bot: [V: 04-1] statistics::discovery: Reconfigure for Golden data retrieval [puppet] - 10https://gerrit.wikimedia.org/r/367930 (https://phabricator.wikimedia.org/T170494) (owner: 10Bearloga) [18:00:24] blergh [18:01:52] RECOVERY - puppet last run on wtp1010 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [18:01:52] RECOVERY - puppet last run on dbproxy1003 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [18:01:53] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [18:02:03] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [18:02:12] RECOVERY - puppet last run on mw1256 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [18:02:12] RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [18:02:13] RECOVERY - puppet last run on mc1036 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [18:02:13] RECOVERY - puppet last run on analytics1056 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [18:02:22] RECOVERY - puppet last 
run on puppetmaster1002 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [18:02:22] RECOVERY - puppet last run on dbproxy1011 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [18:02:32] RECOVERY - puppet last run on db1036 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [18:02:32] RECOVERY - puppet last run on dbmonitor1001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [18:02:33] RECOVERY - puppet last run on alsafi is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [18:02:42] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [18:02:42] RECOVERY - puppet last run on cp1045 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:02:52] RECOVERY - puppet last run on conf1005 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [18:02:53] RECOVERY - puppet last run on etcd1001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [18:03:02] RECOVERY - puppet last run on mw1167 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [18:03:02] RECOVERY - puppet last run on labweb1002 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [18:03:03] RECOVERY - puppet last run on etherpad1001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [18:03:03] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [18:03:12] RECOVERY - puppet last run on labvirt1011 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:03:12] RECOVERY - puppet last run on rdb1005 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [18:03:15] I can SWAT. 
[18:03:22] RECOVERY - puppet last run on ms-fe1006 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:03:22] o/ [18:03:23] RECOVERY - puppet last run on elastic1042 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [18:03:32] RECOVERY - puppet last run on wtp1015 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [18:03:32] RECOVERY - puppet last run on labvirt1007 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [18:03:33] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [18:03:42] RECOVERY - puppet last run on analytics1050 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [18:03:42] RECOVERY - puppet last run on mwdebug1001 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [18:03:42] RECOVERY - puppet last run on cp3037 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [18:03:42] RECOVERY - puppet last run on graphite1003 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [18:03:42] RECOVERY - puppet last run on achernar is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [18:03:43] RECOVERY - puppet last run on analytics1039 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [18:03:43] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [18:03:52] RECOVERY - puppet last run on db1092 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [18:03:53] RECOVERY - puppet last run on labvirt1002 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [18:03:53] RECOVERY - puppet last run on mw1303 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [18:03:53] RECOVERY 
- puppet last run on relforge1001 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [18:04:02] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [18:04:02] RECOVERY - puppet last run on cp3034 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [18:04:13] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [18:04:13] RECOVERY - puppet last run on mw1246 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [18:04:13] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [18:04:19] I'm here [18:04:22] RECOVERY - puppet last run on db1075 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:04:22] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [18:04:22] RECOVERY - puppet last run on cp1066 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [18:04:22] RECOVERY - puppet last run on pc1006 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [18:04:32] RECOVERY - puppet last run on dbproxy1001 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [18:04:32] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [18:04:32] RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [18:04:42] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [18:04:43] RECOVERY - puppet last run on maps1003 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [18:04:43] RECOVERY - puppet last run on francium is OK: OK: 
Puppet is currently enabled, last run 27 seconds ago with 0 failures [18:04:43] RECOVERY - puppet last run on labcontrol1002 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [18:04:43] RECOVERY - puppet last run on db1030 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [18:04:52] RECOVERY - puppet last run on rdb1003 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [18:04:52] RECOVERY - puppet last run on ms-be1037 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [18:04:52] RECOVERY - puppet last run on db1082 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [18:05:02] RECOVERY - puppet last run on aqs1007 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [18:05:12] RECOVERY - puppet last run on ores1001 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [18:05:12] RECOVERY - puppet last run on ores1006 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [18:05:13] RECOVERY - puppet last run on mw1241 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [18:05:13] RECOVERY - puppet last run on elastic1046 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [18:05:13] RECOVERY - puppet last run on mw1250 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:05:13] RECOVERY - puppet last run on mw1300 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [18:05:22] RECOVERY - puppet last run on ms1001 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [18:05:22] RECOVERY - puppet last run on es1014 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [18:05:32] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 
0 failures [18:05:32] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [18:05:32] RECOVERY - puppet last run on maps1001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [18:05:42] RECOVERY - puppet last run on relforge1002 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [18:05:42] RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [18:05:43] RECOVERY - puppet last run on kubestage1001 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [18:05:43] RECOVERY - puppet last run on praseodymium is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [18:05:52] RECOVERY - puppet last run on analytics1002 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [18:06:02] RECOVERY - puppet last run on db1070 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [18:06:02] RECOVERY - puppet last run on lvs1001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [18:06:02] RECOVERY - puppet last run on mw1271 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [18:06:03] RECOVERY - puppet last run on cp1073 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:06:12] RECOVERY - puppet last run on ganeti1007 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [18:06:12] RECOVERY - puppet last run on labweb1001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [18:06:12] RECOVERY - puppet last run on db1053 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [18:06:12] RECOVERY - puppet last run on mw1302 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [18:06:12] RECOVERY - puppet 
last run on cp1051 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [18:06:22] RECOVERY - puppet last run on alcyone is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [18:06:22] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [18:06:22] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [18:06:32] RECOVERY - puppet last run on wtp1040 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [18:06:33] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [18:06:43] RECOVERY - puppet last run on kafka1013 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [18:06:51] (03PS2) 10Niharika29: Create 'rollbacker' user group in frwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365538 (https://phabricator.wikimedia.org/T170780) (owner: 10Framawiki) [18:06:52] RECOVERY - puppet last run on ms-be1025 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [18:07:02] RECOVERY - puppet last run on restbase1011 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [18:07:02] RECOVERY - puppet last run on kafka1020 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [18:07:02] RECOVERY - puppet last run on etcd1004 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [18:07:02] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [18:07:02] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [18:07:02] RECOVERY - puppet last run on mw1268 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [18:07:03] 
RECOVERY - puppet last run on ores1007 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [18:07:03] RECOVERY - puppet last run on es1013 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [18:07:03] RECOVERY - puppet last run on cp3030 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [18:07:04] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:07:04] RECOVERY - puppet last run on ms-be1031 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [18:07:08] (03CR) 10Niharika29: [C: 032] Create 'rollbacker' user group in frwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365538 (https://phabricator.wikimedia.org/T170780) (owner: 10Framawiki) [18:07:12] RECOVERY - puppet last run on db1076 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [18:07:13] RECOVERY - puppet last run on mc1019 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:07:13] RECOVERY - puppet last run on maps1004 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [18:07:22] RECOVERY - puppet last run on wtp1018 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [18:07:22] RECOVERY - puppet last run on wtp1002 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [18:07:22] RECOVERY - puppet last run on dbproxy1006 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [18:07:27] Shush, icinga-wm. 
[18:07:32] RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [18:07:33] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [18:07:33] RECOVERY - puppet last run on wtp1029 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [18:07:33] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [18:07:33] RECOVERY - puppet last run on mw1223 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [18:07:42] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [18:07:43] RECOVERY - puppet last run on labtestweb2001 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [18:07:52] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [18:07:52] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [18:07:52] RECOVERY - puppet last run on mw1227 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [18:07:52] RECOVERY - puppet last run on notebook1002 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [18:07:53] RECOVERY - puppet last run on phab2001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [18:08:02] RECOVERY - puppet last run on labsdb1001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [18:08:02] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [18:08:03] RECOVERY - puppet last run on hassium is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [18:08:03] RECOVERY - puppet last run on aqs1008 
is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:08:12] RECOVERY - puppet last run on mw1267 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [18:08:12] RECOVERY - puppet last run on mw1265 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [18:08:12] RECOVERY - puppet last run on db1023 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [18:08:13] RECOVERY - puppet last run on mw1254 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:08:13] RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [18:08:22] RECOVERY - puppet last run on rdb1004 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [18:08:23] RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [18:08:23] RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [18:08:23] RECOVERY - puppet last run on chlorine is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [18:08:23] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [18:08:23] RECOVERY - puppet last run on logstash1003 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [18:08:32] RECOVERY - puppet last run on analytics1065 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [18:08:32] RECOVERY - puppet last run on californium is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [18:08:32] RECOVERY - puppet last run on db1001 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [18:08:32] RECOVERY - puppet last run on dbstore1002 is OK: OK: Puppet is currently enabled, last run 42 
seconds ago with 0 failures [18:08:33] RECOVERY - puppet last run on cp3044 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [18:08:33] RECOVERY - puppet last run on mw1187 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [18:08:33] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [18:08:34] RECOVERY - puppet last run on ms-fe3001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [18:08:42] RECOVERY - puppet last run on elastic1051 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [18:08:42] RECOVERY - puppet last run on labsdb1005 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [18:08:43] RECOVERY - puppet last run on wtp1025 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [18:08:43] RECOVERY - puppet last run on wtp1038 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [18:08:43] RECOVERY - puppet last run on cp3035 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [18:08:43] RECOVERY - puppet last run on ms-be1028 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [18:08:52] RECOVERY - puppet last run on elastic1021 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [18:08:52] RECOVERY - puppet last run on ms-be1016 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [18:09:02] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [18:09:02] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [18:09:02] RECOVERY - puppet last run on db1066 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [18:09:03] RECOVERY - 
puppet last run on ores1009 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [18:09:03] RECOVERY - puppet last run on prometheus1003 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [18:09:12] RECOVERY - puppet last run on ms-be1030 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [18:09:12] RECOVERY - puppet last run on oresrdb1002 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [18:09:12] RECOVERY - puppet last run on cp1074 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [18:09:13] (PS3) Bearloga: statistics::discovery: Reconfigure for Golden data retrieval [puppet] - https://gerrit.wikimedia.org/r/367930 (https://phabricator.wikimedia.org/T170494) [18:09:13] RECOVERY - puppet last run on wasat is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [18:09:22] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [18:09:22] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [18:09:22] RECOVERY - puppet last run on mw1283 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [18:09:22] RECOVERY - puppet last run on mw1255 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [18:09:22] RECOVERY - puppet last run on es1019 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [18:09:23] RECOVERY - puppet last run on rdb1007 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [18:09:23] RECOVERY - puppet last run on radon is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [18:09:23] RECOVERY - puppet last run on oresrdb1001 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [18:09:28] (Merged) 
jenkins-bot: Create 'rollbacker' user group in frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/365538 (https://phabricator.wikimedia.org/T170780) (owner: Framawiki) [18:09:32] RECOVERY - puppet last run on mw1282 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [18:09:32] RECOVERY - puppet last run on mw1259 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [18:09:32] RECOVERY - puppet last run on analytics1062 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [18:09:33] RECOVERY - puppet last run on db1091 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [18:09:37] (CR) jenkins-bot: Create 'rollbacker' user group in frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/365538 (https://phabricator.wikimedia.org/T170780) (owner: Framawiki) [18:09:42] RECOVERY - puppet last run on wtp1006 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [18:09:52] RECOVERY - puppet last run on eventlog1001 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [18:09:52] RECOVERY - puppet last run on wtp1037 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [18:09:52] RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [18:09:52] RECOVERY - puppet last run on cp1099 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [18:09:53] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [18:10:02] RECOVERY - puppet last run on poolcounter1001 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [18:10:02] RECOVERY - puppet last run on ms-be1022 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [18:10:02] RECOVERY - puppet last run on kafka1001 is 
OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [18:10:03] RECOVERY - puppet last run on cobalt is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:10:03] RECOVERY - puppet last run on kubestage1002 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [18:10:03] RECOVERY - puppet last run on labtestservices2003 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [18:10:12] RECOVERY - puppet last run on mc1031 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [18:10:12] RECOVERY - puppet last run on mw1269 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [18:10:12] RECOVERY - puppet last run on mw1263 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [18:10:12] RECOVERY - puppet last run on db1097 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [18:10:12] RECOVERY - puppet last run on thorium is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [18:10:22] RECOVERY - puppet last run on mw1299 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [18:10:24] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [18:10:24] RECOVERY - puppet last run on mw1252 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [18:10:24] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [18:10:24] RECOVERY - puppet last run on elastic1038 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [18:10:24] RECOVERY - puppet last run on analytics1057 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:10:24] RECOVERY - puppet last run on wdqs1002 is OK: OK: Puppet is currently enabled, last run 17 
seconds ago with 0 failures [18:10:24] RECOVERY - puppet last run on conf1001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [18:10:24] RECOVERY - puppet last run on ganeti1003 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [18:10:25] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [18:10:25] RECOVERY - puppet last run on ms-be1017 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [18:10:26] RECOVERY - puppet last run on snapshot1006 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [18:10:32] RECOVERY - puppet last run on labvirt1017 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [18:10:32] RECOVERY - puppet last run on analytics1046 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [18:10:32] RECOVERY - puppet last run on wtp1048 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [18:10:42] RECOVERY - puppet last run on ganeti1001 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [18:10:42] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [18:11:02] RECOVERY - puppet last run on mw1239 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [18:11:02] RECOVERY - puppet last run on db1071 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [18:11:02] RECOVERY - puppet last run on db1085 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [18:11:03] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:11:12] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:11:13] RECOVERY - 
puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [18:11:22] RECOVERY - puppet last run on db1072 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [18:11:22] RECOVERY - puppet last run on mc1035 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [18:11:22] RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [18:11:22] RECOVERY - puppet last run on ms-be1034 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [18:11:22] RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [18:11:22] RECOVERY - puppet last run on cerium is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [18:11:23] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [18:11:32] RECOVERY - puppet last run on labcontrol1003 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [18:11:32] RECOVERY - puppet last run on analytics1061 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [18:11:32] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [18:11:32] RECOVERY - puppet last run on ores1005 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [18:11:33] RECOVERY - puppet last run on mw1229 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [18:11:43] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [18:11:52] RECOVERY - puppet last run on db1054 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [18:11:53] RECOVERY - puppet last run on rhodium is OK: OK: 
Puppet is currently enabled, last run 16 seconds ago with 0 failures [18:12:02] RECOVERY - puppet last run on mw1281 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:12:04] RECOVERY - puppet last run on labvirt1005 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [18:12:04] RECOVERY - puppet last run on dysprosium is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [18:12:12] RECOVERY - puppet last run on db1099 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [18:12:13] RECOVERY - puppet last run on ocg1003 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [18:12:13] RECOVERY - puppet last run on elastic1029 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [18:12:13] RECOVERY - puppet last run on mw1209 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [18:12:22] RECOVERY - puppet last run on analytics1067 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [18:12:22] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [18:12:22] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [18:12:23] RECOVERY - puppet last run on mw1296 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [18:12:23] RECOVERY - puppet last run on mw1297 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [18:12:23] RECOVERY - puppet last run on db1035 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [18:12:32] RECOVERY - puppet last run on logstash1004 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [18:12:32] RECOVERY - puppet last run on thumbor1004 is OK: OK: Puppet is currently enabled, last run 13 
seconds ago with 0 failures [18:12:32] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [18:12:32] RECOVERY - puppet last run on graphite1001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [18:12:40] (CR) jerkins-bot: [V: -1] statistics::discovery: Reconfigure for Golden data retrieval [puppet] - https://gerrit.wikimedia.org/r/367930 (https://phabricator.wikimedia.org/T170494) (owner: Bearloga) [18:12:42] RECOVERY - puppet last run on stat1006 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [18:12:42] RECOVERY - puppet last run on wtp1046 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [18:12:42] RECOVERY - puppet last run on db1096 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [18:12:42] RECOVERY - puppet last run on mw1162 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [18:12:52] RECOVERY - puppet last run on mw1231 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [18:12:53] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [18:12:53] RECOVERY - puppet last run on rdb1008 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [18:13:02] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [18:13:02] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [18:13:02] RECOVERY - puppet last run on mw1207 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [18:13:03] RECOVERY - puppet last run on kubestagetcd1003 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [18:13:12] RECOVERY - puppet last run on mwdebug1002 is OK: OK: 
Puppet is currently enabled, last run 28 seconds ago with 0 failures [18:13:12] RECOVERY - puppet last run on cp3047 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [18:13:13] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [18:13:22] RECOVERY - puppet last run on darmstadtium is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [18:13:32] RECOVERY - puppet last run on aqs1005 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [18:13:33] RECOVERY - puppet last run on ms-be1015 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [18:13:42] RECOVERY - puppet last run on ms-be3003 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [18:13:42] RECOVERY - puppet last run on mc1023 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [18:13:43] RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [18:13:52] RECOVERY - puppet last run on ms-be1029 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [18:13:52] RECOVERY - puppet last run on labcontrol1001 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [18:13:52] RECOVERY - puppet last run on db1041 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [18:13:52] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [18:13:52] RECOVERY - puppet last run on wtp1003 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [18:13:53] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [18:13:53] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 14 seconds ago 
with 0 failures [18:14:02] RECOVERY - puppet last run on scb1003 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [18:14:03] RECOVERY - puppet last run on db1105 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [18:14:12] RECOVERY - puppet last run on bromine is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [18:14:12] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [18:14:12] RECOVERY - puppet last run on analytics1052 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:14:13] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [18:14:13] RECOVERY - puppet last run on wtp1044 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [18:14:22] RECOVERY - puppet last run on db1087 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [18:14:23] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [18:14:23] RECOVERY - puppet last run on analytics1058 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [18:14:32] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [18:14:32] Gah, I'm sorry, someone will have to take over the SWAT. This wifi network doesn't let me ssh in anywhere. :( I should have checked before I volunteered. 
[18:14:32] RECOVERY - puppet last run on thumbor1003 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [18:14:33] RECOVERY - puppet last run on mw1260 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [18:14:33] RECOVERY - puppet last run on elastic1052 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [18:14:42] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [18:14:43] Or wait. [18:14:52] RECOVERY - puppet last run on ganeti1004 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [18:14:52] RECOVERY - puppet last run on aqs1004 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [18:14:52] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [18:14:56] Oh, I'm in. Never mind. [18:15:02] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [18:15:02] RECOVERY - puppet last run on poolcounter1002 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [18:15:02] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [18:15:03] RECOVERY - puppet last run on fermium is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [18:15:12] RECOVERY - puppet last run on db1061 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [18:15:12] RECOVERY - puppet last run on cp3033 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [18:15:12] RECOVERY - puppet last run on elastic1028 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [18:15:12] RECOVERY - puppet last run on mw1215 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 
failures [18:15:12] RECOVERY - puppet last run on ms-fe1008 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [18:15:13] RECOVERY - puppet last run on ms-be1033 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:15:13] RECOVERY - puppet last run on db1044 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [18:15:13] RECOVERY - puppet last run on etcd1006 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [18:15:13] RECOVERY - puppet last run on labtestservices2001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [18:15:22] RECOVERY - puppet last run on labvirt1001 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [18:15:23] RECOVERY - puppet last run on mw1276 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [18:15:23] RECOVERY - puppet last run on elastic1043 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [18:15:23] RECOVERY - puppet last run on elastic1039 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [18:15:23] RECOVERY - puppet last run on analytics1063 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [18:15:23] RECOVERY - puppet last run on kubernetes1002 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [18:15:23] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [18:15:32] RECOVERY - puppet last run on labtestcontrol2001 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [18:15:32] RECOVERY - puppet last run on db1047 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [18:15:32] RECOVERY - puppet last run on mw1285 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures 
[18:15:32] RECOVERY - puppet last run on cp3040 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [18:15:32] RECOVERY - puppet last run on cp3041 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [18:15:32] RECOVERY - puppet last run on wtp1022 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [18:15:52] RECOVERY - puppet last run on elastic1050 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [18:15:52] RECOVERY - puppet last run on ms-be1035 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [18:15:52] RECOVERY - puppet last run on mw1225 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [18:15:52] RECOVERY - puppet last run on analytics1036 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [18:16:02] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [18:16:02] RECOVERY - puppet last run on db1037 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [18:16:02] RECOVERY - puppet last run on mw1305 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [18:16:03] RECOVERY - puppet last run on db1084 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [18:16:03] RECOVERY - puppet last run on lvs1004 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [18:16:12] RECOVERY - puppet last run on mw1245 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [18:16:12] RECOVERY - puppet last run on analytics1064 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [18:16:12] RECOVERY - puppet last run on mw1165 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [18:16:12] RECOVERY - puppet last run on lvs1006 is 
OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [18:16:22] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [18:16:23] RECOVERY - puppet last run on mw1247 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [18:16:24] RECOVERY - puppet last run on analytics1066 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:16:32] RECOVERY - puppet last run on mw1294 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [18:16:36] framawiki: Can you check your changes on mwdebug1002? They're there. [18:16:42] RECOVERY - puppet last run on ms-be1018 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [18:16:49] Niharika: i'm on it [18:16:52] RECOVERY - puppet last run on cp3045 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [18:17:03] RECOVERY - puppet last run on kafka1014 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [18:17:44] Niharika: good for me on mwdebug1002 [18:17:54] framawiki: Ack. [18:18:38] RoanKattouw: Hey there. https://gerrit.wikimedia.org/r/#/c/367833/ is on mwdebug1002. [18:19:01] Cool, checking [18:20:33] Niharika: Looks like it's working, but it also needs https://gerrit.wikimedia.org/r/#/c/367850 otherwise it'll be worse than what's there currently [18:20:53] !log niharika29@tin Synchronized wmf-config/InitialiseSettings.php: Create 'rollbacker' user group in frwiki https://gerrit.wikimedia.org/r/#/c/365538/ (duration: 00m 47s) [18:21:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:12] framawiki: All synced. [18:21:17] RoanKattouw: I'm on it. [18:22:22] Niharika: :) thx ! [18:23:00] framawiki: What's the deal with https://gerrit.wikimedia.org/r/#/c/341267/ ? [18:23:15] I read the discussion briefly. 
[18:23:45] Nikerabbit: looks like it's not for today. [18:24:06] I'll remove this from wikitech deployments page [18:24:34] If I had a penny for every time someone pinged Niklas when they meant to ping me... [18:24:42] framawiki: Okay, cool. [18:25:06] lol [18:28:14] Operations, Puppet, Traffic, Mobile, and 2 others: URLs with title query string parameter and additional query string parameters do not redirect to mobile site - https://phabricator.wikimedia.org/T154227#3475647 (Dzahn) [18:31:37] RoanKattouw: https://gerrit.wikimedia.org/r/#/c/367837/ is on mwdebug1002 as well. Do you also want https://gerrit.wikimedia.org/r/#/c/367850/ to be able to test this one? [18:32:06] No I can test that separately [18:33:08] Niharika: It works [18:34:43] RoanKattouw: Syncing... [18:35:07] (PS1) Jdlrobson: Update several Wikipedia projects to existing wordmarks [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) [18:35:15] !log niharika29@tin Synchronized php-1.30.0-wmf.11/resources/src/mediawiki.rcfilters/: RCFilters: Improve loading animation https://gerrit.wikimedia.org/r/#/c/367833/, RCFilters UI: Unbreak limit and days widgets in non-experimental mode https://gerrit.wikimedia.org/r/#/c/367837/ (duration: 00m 45s) [18:35:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:35:29] room for one more swat Niharika ? https://gerrit.wikimedia.org/r/367938 Update several Wikipedia projects to existing wordmarks [18:35:34] Both of those done. [18:35:41] Yup, jdlrobson. Add it to the calendar. 
[18:35:57] thank you and done :) [18:36:59] (CR) Niharika29: [C: 2] Update several Wikipedia projects to existing wordmarks [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) (owner: Jdlrobson) [18:38:01] oh shoot wait [18:38:13] (PS2) Jdlrobson: Update several Wikipedia projects to existing wordmarks [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) [18:38:15] i forgot to check in the dblists... ^ [18:38:47] ^ Niharika [18:39:41] (CR) Niharika29: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) (owner: Jdlrobson) [18:39:59] (CR) jerkins-bot: [V: -1] Update several Wikipedia projects to existing wordmarks [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) (owner: Jdlrobson) [18:40:37] RoanKattouw: And https://gerrit.wikimedia.org/r/#/c/367850/ is on mwdebug1002 too. [18:40:43] jdlrobson: ^^ [18:41:01] Niharika: what about the -1 above? [18:41:12] (CR) jerkins-bot: [V: -1] Update several Wikipedia projects to existing wordmarks [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) (owner: Jdlrobson) [18:41:24] jdlrobson: Yeah, the -1. [18:41:34] something's broke.. [18:41:49] NocDblistTest::testNocDblists 18:39:58 Failed asserting that two arrays are equal. [18:43:58] Niharika: that's weird.. is that for ps1 or ps2? [18:44:06] (CR) Jdlrobson: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) (owner: Jdlrobson) [18:44:36] The first one got a +2 from Jenkins. [18:46:15] can replicate locally.. 
so exploring what's going on here [18:46:20] (CR) Niharika29: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) (owner: Jdlrobson) [18:46:34] (CR) jerkins-bot: [V: -1] Update several Wikipedia projects to existing wordmarks [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) (owner: Jdlrobson) [18:47:46] Niharika: Looks good [18:47:56] Alright then. [18:48:27] anyone know what docroot/noc/conf is for? [18:48:46] RainbowSprinkles: ? [18:49:26] jdlrobson: https://noc.wikimedia.org/ [18:49:30] it looks like the process for adding dblists changed.. https://gerrit.wikimedia.org/r/#/c/367938/2 and im not sure what i need to do [18:50:26] !log niharika29@tin Synchronized php-1.30.0-wmf.11/resources/src/mediawiki.rcfilters/: RCFilters: Followup I78e23f85c3: Don't disable RCFilters system when fetching results https://gerrit.wikimedia.org/r/#/c/367850/ (duration: 00m 46s) [18:50:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:36] RoanKattouw: Done^ [18:51:45] jdlrobson: Um, should we take it off this SWAT? [18:53:51] I don't like that patch [18:53:54] It looks weird [18:53:57] Plz remove [18:54:11] RainbowSprinkles: what's weird about it? what am i doing wrong? [18:54:38] (PS3) Jdlrobson: Update several Wikipedia projects to existing wordmarks [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) [18:55:00] jdlrobson: It just feels weird. I haven't even looked at jenkins failing yet. [18:55:14] Swat ends in 5 and I have train window, let's just boot it for now [18:55:26] sure, but can you articulate more? I will be swatting this at 4pm today if not now [18:55:45] so i need to understand what's weird about it. 
I'm trying to avoid having 20 identical lines [18:55:49] (in config) [18:56:01] (CR) jerkins-bot: [V: -1] Update several Wikipedia projects to existing wordmarks [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) (owner: Jdlrobson) [18:56:18] dblist seems appropriate in this case, or if not that a programmatic approach e.g. if lang=='hu' use the fr value [18:56:50] I'm not familiar with NocDblistTest::testNocDblists so I'm not sure what it's testing.. [18:58:09] (PS4) Jdlrobson: Update several Wikipedia projects to existing wordmarks [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) [18:58:09] I think it's making sure all dblists are shown on https://noc.wikimedia.org/conf/ [18:58:18] (CR) Jdlrobson: "dblist seems appropriate in this case, or if not that a programmatic approach e.g. if lang=='hu' use the fr value" [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) (owner: Jdlrobson) [18:59:16] jdlrobson: Well, they're not alphabetized for oneeeee [18:59:24] And we don't really do any wikipedia-* style ones [18:59:28] But that could be me nitpicking [18:59:30] Anyway [18:59:41] * RainbowSprinkles orders lunch, chugs rest of coffee, puts on train conductor hat [19:00:01] (CR) jerkins-bot: [V: -1] Update several Wikipedia projects to existing wordmarks [mediawiki-config] - https://gerrit.wikimedia.org/r/367938 (https://phabricator.wikimedia.org/T171556) (owner: Jdlrobson) [19:00:04] RainbowSprinkles: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170726T1900). 
[19:00:57] Choo choo [19:01:05] !log depooling wdqs1001 for data reload - T166244 [19:01:07] !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=wdqs1001.wmnet [19:01:13] !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=wdqs1001.eqiad.wmnet [19:01:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:01:15] T166244: Reload WDQS data after T131960 is merged - https://phabricator.wikimedia.org/T166244 [19:01:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:01:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:06] !log cp1074: run-no-puppet varnish-backend-restart (mailbox lag in icinga) [19:06:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:12:36] (PS2) Chad: Group1 to wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/367918 [19:13:18] jdlrobson: the "NocDBlistTest" checks if there are links in both ./dblists/ with ./docroot/noc/conf/ directories [19:13:26] https://github.com/wikimedia/operations-mediawiki-config/blob/master/tests/noc-conf/NOCDblistTest.php [19:13:44] eh, i meant "if there are links from both directories to a third place", i guess [19:15:32] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 0 [19:26:48] (CR) Chad: [C: 2] Group1 to wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/367918 (owner: Chad) [19:28:08] (Merged) jenkins-bot: Group1 to wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/367918 (owner: Chad) [19:28:19] (CR) jenkins-bot: Group1 to wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/367918 (owner: Chad) [19:30:50] !log mx1001 - temp disable puppet to test adjusted sudo privileges for an icinga check [19:30:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:31] !log demon@tin Started scap: group1 to wmf.11 [19:35:40] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:39:14] (03PS1) 10Eevans: Enable prometheus jmx exporter in dev environment [puppet] - 10https://gerrit.wikimedia.org/r/367952 (https://phabricator.wikimedia.org/T171772) [19:40:39] (03Draft2) 10محمد شعیب: Add urdu logo to mobile site [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367946 (https://phabricator.wikimedia.org/T171769) [19:47:35] !log demon@tin scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/2cc7028226a539553178454fc2f14459 for details) [19:47:35] !log demon@tin scap failed: RuntimeError scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/2cc7028226a539553178454fc2f14459 for details) (duration: 12m 03s) [19:47:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:47:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:48:15] !log demon@tin Started scap: group1 to wmf.11 [19:48:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:50:56] (03CR) 10Eevans: [C: 031] "Enables the Prometheus agent in the dev environment only ([see puppet compiler output](http://puppet-compiler.wmflabs.org/7172))" [puppet] - 10https://gerrit.wikimedia.org/r/367952 (https://phabricator.wikimedia.org/T171772) (owner: 10Eevans) [19:57:10] (03CR) 10Dzahn: [C: 032] "thanks for adding compiler link. 
merging per "dev only"" [puppet] - 10https://gerrit.wikimedia.org/r/367952 (https://phabricator.wikimedia.org/T171772) (owner: 10Eevans) [19:57:39] (03PS2) 10Dzahn: restbase: Enable prometheus jmx exporter in dev environment [puppet] - 10https://gerrit.wikimedia.org/r/367952 (https://phabricator.wikimedia.org/T171772) (owner: 10Eevans) [20:00:04] gwicke, cscott, arlolra, subbu, bearND, halfak, and Amir1: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170726T2000). Please do the needful. [20:00:18] (03CR) 10Mobrovac: [C: 031] restbase: Enable prometheus jmx exporter in dev environment [puppet] - 10https://gerrit.wikimedia.org/r/367952 (https://phabricator.wikimedia.org/T171772) (owner: 10Eevans) [20:01:37] !log demon@tin Finished scap: group1 to wmf.11 (duration: 13m 22s) [20:01:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:43] RECOVERY - MariaDB Slave Lag: s3 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89976.84 seconds [20:05:16] no parsoid deploy today [20:06:59] mutante: thanks! [20:08:04] urandom: yw. 
and now it's actually submitted [20:08:19] !log mobrovac@tin Started deploy [cxserver/deploy@f43ef96]: Switch node_modules to node v6.11 [20:08:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:06] !log mobrovac@tin Started deploy [citoid/deploy@43c2776]: Switch node_modules to Node v6.11 [20:09:09] !log demon@tin Started scap: no-op, ideal timing scenario [20:09:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:10:55] !log mobrovac@tin Finished deploy [cxserver/deploy@f43ef96]: Switch node_modules to node v6.11 (duration: 02m 36s) [20:11:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:27] !log mobrovac@tin Started deploy [graphoid/deploy@1707b3c]: Switch node_modules to node v6.11 [20:11:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:03] !log mobrovac@tin Finished deploy [citoid/deploy@43c2776]: Switch node_modules to Node v6.11 (duration: 02m 56s) [20:12:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:35] (03CR) 10Bearloga: "Seems to me the build failure is (1) unrelated to the patch, and (2) about multiple Phab tickets in the commit message which is also weird" [puppet] - 10https://gerrit.wikimedia.org/r/367930 (https://phabricator.wikimedia.org/T170494) (owner: 10Bearloga) [20:12:45] !log demon@tin Finished scap: no-op, ideal timing scenario (duration: 03m 35s) [20:12:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:10] !log mobrovac@tin Started deploy [mobileapps/deploy@bb81d91]: Switch node_modules to Node v6.11 [20:18:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:19:18] !log mobrovac@tin Finished deploy [graphoid/deploy@1707b3c]: Switch node_modules to node v6.11 (duration: 07m 50s) [20:19:28] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:06] !log mobrovac@tin Started deploy [trending-edits/deploy@22967f3]: Switch node_modules to node v6.11 [20:20:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:21:42] PROBLEM - HHVM jobrunner on mw1301 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time [20:22:18] !log mobrovac@tin Finished deploy [mobileapps/deploy@bb81d91]: Switch node_modules to Node v6.11 (duration: 04m 08s) [20:22:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:22:39] !log mobrovac@tin Started deploy [changeprop/deploy@444223d]: Switch node_modules to Node v6.11 [20:22:42] RECOVERY - HHVM jobrunner on mw1301 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.001 second response time [20:22:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:24:16] !log mobrovac@tin Finished deploy [changeprop/deploy@444223d]: Switch node_modules to Node v6.11 (duration: 01m 35s) [20:24:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:25:12] 10Operations, 10ORES, 10Scoring-platform-team: rack/setup/install ores2001-2009 - https://phabricator.wikimedia.org/T165170#3476095 (10RobH) [20:25:31] !log mobrovac@tin Started deploy [eventstreams/deploy@a2a0f19]: Switch node_modules to Node v6.11 [20:25:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:07] !log mobrovac@tin Finished deploy [trending-edits/deploy@22967f3]: Switch node_modules to node v6.11 (duration: 07m 01s) [20:27:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:40] !log mobrovac@tin Started deploy [recommendation-api/deploy@e7adea0]: Switch node_modules to node v6.11 [20:27:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:28:28] !log mobrovac@tin Finished deploy [eventstreams/deploy@a2a0f19]: Switch 
node_modules to Node v6.11 (duration: 02m 57s) [20:28:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:23] !log mobrovac@tin Started deploy [mathoid/deploy@44ea6d8]: Switch node_modules to Node v6.11 [20:29:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:30:06] !log mobrovac@tin Finished deploy [recommendation-api/deploy@e7adea0]: Switch node_modules to node v6.11 (duration: 02m 26s) [20:30:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:31:56] !log mobrovac@tin Started deploy [electron-render/deploy@8dd5f13]: Switch node_modules to node v6.11 [20:32:00] (03CR) 10Hoo man: [C: 031] "Fine to deploy at any time." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367913 (owner: 10Lucas Werkmeister (WMDE)) [20:32:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:32:39] !log mobrovac@tin Finished deploy [mathoid/deploy@44ea6d8]: Switch node_modules to Node v6.11 (duration: 03m 16s) [20:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:35:13] PROBLEM - pdfrender on scb2006 is CRITICAL: connect to address 10.192.32.20 and port 5252: Connection refused [20:36:12] RECOVERY - pdfrender on scb2006 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.074 second response time [20:38:22] PROBLEM - pdfrender on scb1001 is CRITICAL: connect to address 10.64.0.16 and port 5252: Connection refused [20:41:12] PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL: CRITICAL: 1.67% of data above the critical threshold [1000.0] [20:48:24] 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Switch to new labs puppetmasters - https://phabricator.wikimedia.org/T171786#3476152 (10Andrew) [20:51:01] !log mforns@tin Started deploy [analytics/refinery@58176d0]: deploying refinery to use 0.0.49 jars [20:51:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:54:14] !log 
mforns@tin Finished deploy [analytics/refinery@58176d0]: deploying refinery to use 0.0.49 jars (duration: 03m 12s) [20:54:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:00:52] !log mobrovac@tin Started deploy [electron-render/deploy@8dd5f13]: (no justification provided) [21:01:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:01:27] (03CR) 10Thcipriani: "inline comment" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363970 (owner: 10Chad) [21:01:39] (03PS1) 10Andrew Bogott: puppet-merge: fix a syntax error when there's only one worker [puppet] - 10https://gerrit.wikimedia.org/r/368088 [21:02:06] (03PS1) 10Andrew Bogott: add puppetmaster roles to labpuppetmaster1001 and 2 [puppet] - 10https://gerrit.wikimedia.org/r/368090 [21:03:31] !log mobrovac@tin Finished deploy [electron-render/deploy@8dd5f13]: (no justification provided) (duration: 02m 38s) [21:03:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:04:12] (03CR) 10Andrew Bogott: [C: 032] add puppetmaster roles to labpuppetmaster1001 and 2 [puppet] - 10https://gerrit.wikimedia.org/r/368090 (owner: 10Andrew Bogott) [21:07:29] (03PS10) 10Dzahn: icinga/role:mail::mx: add monitoring of exim queue size [puppet] - 10https://gerrit.wikimedia.org/r/361023 (https://phabricator.wikimedia.org/T133110) [21:08:23] !log mobrovac@tin Started deploy [electron-render/deploy@8dd5f13]: (no justification provided) [21:08:36] PROBLEM - puppet last run on labpuppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [21:08:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:10:00] !log mobrovac@tin Finished deploy [electron-render/deploy@8dd5f13]: (no justification provided) (duration: 01m 37s) [21:10:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:13:43] PROBLEM - cassandra-a CQL 10.64.16.97:9042 on restbase-dev1005 is CRITICAL: connect to address 10.64.16.97 and port 9042: Connection refused [21:13:49] that's me ^^^ [21:13:52] dev only [21:14:03] PROBLEM - cassandra-a service on restbase-dev1005 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed [21:14:17] 10Operations, 10Analytics, 10Analytics-Cluster, 10Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#3476259 (10Halfak) Looks to me like this task is ready to be resolved. Also, I have no idea why it is assigned to me as I've only consulted on it. @dr0ptp4kt w... [21:14:32] PROBLEM - cassandra-a SSL 10.64.16.97:7001 on restbase-dev1005 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [21:15:03] RECOVERY - cassandra-a service on restbase-dev1005 is OK: OK - cassandra-a is active [21:15:32] RECOVERY - cassandra-a SSL 10.64.16.97:7001 on restbase-dev1005 is OK: SSL OK - Certificate restbase-dev1005-a valid until 2018-07-20 15:08:07 +0000 (expires in 358 days) [21:15:42] RECOVERY - cassandra-a CQL 10.64.16.97:9042 on restbase-dev1005 is OK: TCP OK - 0.000 second response time on 10.64.16.97 port 9042 [21:21:32] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.004 second response time [21:21:56] (03PS1) 10Andrew Bogott: labtestpuppetmaster: add lots of hiera defs [puppet] - 10https://gerrit.wikimedia.org/r/368097 [21:23:02] (03CR) 10Dzahn: [C: 04-1] icinga/role:mail::mx: add monitoring of exim queue size (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/361023 
(https://phabricator.wikimedia.org/T133110) (owner: 10Dzahn) [21:23:05] (03PS4) 10Chad: WIP: Simple wrapper around updating the interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363970 [21:23:26] (03CR) 10Andrew Bogott: [C: 032] labtestpuppetmaster: add lots of hiera defs [puppet] - 10https://gerrit.wikimedia.org/r/368097 (owner: 10Andrew Bogott) [21:25:13] (03PS5) 10Chad: WIP: Simple wrapper around updating the interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363970 [21:25:32] (03PS1) 10Andrew Bogott: labpuppetmaster: more hiera config [puppet] - 10https://gerrit.wikimedia.org/r/368099 [21:25:39] (03PS6) 10Chad: WIP: Simple wrapper around updating the interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363970 [21:25:41] (03CR) 10Chad: [C: 032] WIP: Simple wrapper around updating the interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363970 (owner: 10Chad) [21:25:43] (03PS1) 10Chad: Updating interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368100 [21:25:45] (03CR) 10Chad: [C: 032] Updating interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368100 (owner: 10Chad) [21:26:56] (03PS11) 10Dzahn: icinga/role:mail::mx: add monitoring of exim queue size [puppet] - 10https://gerrit.wikimedia.org/r/361023 (https://phabricator.wikimedia.org/T133110) [21:29:02] PROBLEM - pdfrender on scb1002 is CRITICAL: connect to address 10.64.16.21 and port 5252: Connection refused [21:29:17] known ^ [21:29:53] ACKNOWLEDGEMENT - puppet last run on labpuppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues andrew bogott It's going to take me a while to get this right.
[21:30:02] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time [21:34:46] 10Operations, 10Services (done), 10User-mobrovac: nodejs 6.11 - https://phabricator.wikimedia.org/T170548#3476271 (10mobrovac) FYI, all of the SCB services have been migrated. [21:37:32] RECOVERY - carbon-cache too many creates on graphite1001 is OK: OK: Less than 1.00% above the threshold [500.0] [21:39:24] (03PS1) 10MaxSem: WIP: [labs] Puppetize XTools [puppet] - 10https://gerrit.wikimedia.org/r/368101 (https://phabricator.wikimedia.org/T170514) [21:39:35] !log restarting rabbitmq on labcontrol1001 [21:39:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:51:27] (03PS1) 10Andrew Bogott: Nodepool: raise rate to 8 seconds [puppet] - 10https://gerrit.wikimedia.org/r/368102 (https://phabricator.wikimedia.org/T170492) [21:52:36] (03CR) 10Andrew Bogott: [V: 032 C: 032] Nodepool: raise rate to 8 seconds [puppet] - 10https://gerrit.wikimedia.org/r/368102 (https://phabricator.wikimedia.org/T170492) (owner: 10Andrew Bogott) [21:54:46] andrewbogott: why the restart? [21:56:18] (03CR) 10Dzahn: [V: 032 C: 032] "nodepool issue and already verified before rebase" [puppet] - 10https://gerrit.wikimedia.org/r/361023 (https://phabricator.wikimedia.org/T133110) (owner: 10Dzahn) [21:56:31] (03PS12) 10Dzahn: icinga/role:mail::mx: add monitoring of exim queue size [puppet] - 10https://gerrit.wikimedia.org/r/361023 (https://phabricator.wikimedia.org/T133110) [21:56:32] chasemp: lots of things were timing out… I'm not sure exactly what's going on. [21:56:35] kk [21:56:39] andrewbogott: how did you notice? [21:56:45] chasemp: I turned down the nodepool rate a bit [21:57:01] I was waiting for Jenkins and it was slow so then looked at the contintcloud server list [21:57:06] gotcha [21:58:00] (03CR) 1020after4: "@dzahn: it depends. On labs, it gets the value from ldap for mwdeploy because that is where the user is defined. 
The current value on labs" [puppet] - 10https://gerrit.wikimedia.org/r/365891 (https://phabricator.wikimedia.org/T166013) (owner: 1020after4) [21:59:00] (03CR) 1020after4: "On production we shouldn't have this problem because puppet can change the user's home directory, due to not being defined in ldap." [puppet] - 10https://gerrit.wikimedia.org/r/365891 (https://phabricator.wikimedia.org/T166013) (owner: 1020after4) [22:01:40] (03CR) 10Dzahn: [V: 032 C: 032] icinga/role:mail::mx: add monitoring of exim queue size [puppet] - 10https://gerrit.wikimedia.org/r/361023 (https://phabricator.wikimedia.org/T133110) (owner: 10Dzahn) [22:04:56] (03CR) 10Dzahn: "[mx1001:~] $ sudo -u nagios /usr/local/lib/nagios/plugins/check_exim_queue -w 1000 -c 3000" [puppet] - 10https://gerrit.wikimedia.org/r/361023 (https://phabricator.wikimedia.org/T133110) (owner: 10Dzahn) [22:06:10] (03CR) 10Dzahn: "# This file is managed by Puppet!" [puppet] - 10https://gerrit.wikimedia.org/r/361023 (https://phabricator.wikimedia.org/T133110) (owner: 10Dzahn) [22:12:58] (03PS2) 10Andrew Bogott: labpuppetmaster: more hiera config [puppet] - 10https://gerrit.wikimedia.org/r/368099 [22:17:03] Warning: Certificate 'Puppet CA: virt1000.wikimedia.org' will expire on 2017-08-15T20:55:45UTC [22:17:40] 10Operations, 10monitoring, 10Patch-For-Review: Check for an oversized exim4 queue indicating mail delivery failures - https://phabricator.wikimedia.org/T133110#3476378 (10Dzahn) ``` [mx1001:~] $ sudo -u nagios /usr/local/lib/nagios/plugins/check_exim_queue -w 1000 -c 3000 OK: Less than 1000 mails in exim qu... [22:17:43] eh, wrong channel prolly [22:18:36] MaxSem: I'm working on it — all the labs VMs will be saying that for a week or two. [22:19:16] so it's because imma running shiny new stretch? 
:p [22:20:33] (03CR) 10Dzahn: "https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=exim+queue" [puppet] - 10https://gerrit.wikimedia.org/r/361023 (https://phabricator.wikimedia.org/T133110) (owner: 10Dzahn) [22:21:06] MaxSem: https://phabricator.wikimedia.org/T168110 [22:21:13] no [22:21:20] not because of stretch [22:23:19] (03CR) 10jerkins-bot: [V: 04-1] WIP: [labs] Puppetize XTools [puppet] - 10https://gerrit.wikimedia.org/r/368101 (https://phabricator.wikimedia.org/T170514) (owner: 10MaxSem) [22:26:33] (03Merged) 10jenkins-bot: WIP: Simple wrapper around updating the interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363970 (owner: 10Chad) [22:26:35] (03Merged) 10jenkins-bot: Updating interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368100 (owner: 10Chad) [22:26:44] (03CR) 10jenkins-bot: WIP: Simple wrapper around updating the interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363970 (owner: 10Chad) [22:29:20] (03CR) 10jenkins-bot: Updating interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368100 (owner: 10Chad) [22:45:22] RainbowSprinkles: hmpf, for merging WIP [22:45:45] There was an idea to make Jenkins reject those in gate at some point. Oh well :) [22:59:26] Ha. [23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170726T2300). [23:00:04] RoanKattouw: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:11] Roan's here, ish. [23:00:45] I can surpervise. [23:00:57] Err. I can also supervise, more usefully. [23:02:00] I can SWAT [23:02:23] James_F: sounds like you're volunteering to check RoanKattouw 's patch for SWAT? 
[23:03:36] (03PS2) 10Thcipriani: Enable Echo per-user blacklist on meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363049 (https://phabricator.wikimedia.org/T150419) (owner: 10Catrope) [23:06:32] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363049 (https://phabricator.wikimedia.org/T150419) (owner: 10Catrope) [23:08:36] (03CR) 10Thcipriani: Enable Echo per-user blacklist on meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363049 (https://phabricator.wikimedia.org/T150419) (owner: 10Catrope) [23:08:45] (03CR) 10Thcipriani: [C: 032] "SWAT try again" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363049 (https://phabricator.wikimedia.org/T150419) (owner: 10Catrope) [23:09:32] thcipriani: Yes, sorry. [23:10:04] okie doke :) [23:10:15] (03Merged) 10jenkins-bot: Enable Echo per-user blacklist on meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363049 (https://phabricator.wikimedia.org/T150419) (owner: 10Catrope) [23:10:26] (03CR) 10jenkins-bot: Enable Echo per-user blacklist on meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363049 (https://phabricator.wikimedia.org/T150419) (owner: 10Catrope) [23:11:08] James_F: patch is live on mwdebug1002, check please [23:12:46] thcipriani: Hmm. Checking, seems odd. [23:13:51] Sorry guys, I forgot I had a patch listed [23:18:08] RoanKattouw: np, live on mwdebug1002 now, the UI is there, seems to work, but haven't tested any further. James_F what are you seeing that's odd? [23:18:55] (03PS1) 10Dzahn: lists/icinga: remove I/O monitoring on lists server [puppet] - 10https://gerrit.wikimedia.org/r/368110 (https://phabricator.wikimedia.org/T133110) [23:19:32] thcipriani: We worked it out. My second test account wasn't going through mwdebug1002 and that's necessary. [23:19:50] thcipriani: Working for me [23:19:59] ah, cool, going live [23:19:59] thcipriani: Go for it. [23:21:09] oh [23:21:13] Oh?
[23:21:25] (03CR) 10Dzahn: [C: 04-1] "yes, thanks for the extra details! the exim queue size monitoring has been added today now. so i am going to remove this one instead as su" [puppet] - 10https://gerrit.wikimedia.org/r/358504 (owner: 10Dzahn) [23:21:27] RainbowSprinkles: you are updating the interwiki cache and that has scap locked? [23:21:37] (03Abandoned) 10Dzahn: lists/icinga: remove mailman I/O stat CRITs [puppet] - 10https://gerrit.wikimedia.org/r/358504 (owner: 10Dzahn) [23:21:41] > 23:21:00 sync-file failed: Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "demon"; reason is "Updating interwiki cache" [23:22:03] Tut. [23:22:22] hrm, except I don't see that running anywhere... [23:23:00] thcipriani: did you see that earlier "Simple wrapper around updating the interwiki cache" was merged? [23:23:09] sounds so related [23:23:31] that would make sense [23:23:37] https://gerrit.wikimedia.org/r/363970 [23:26:09] hrm, I don't know why that wouldn't clean up its lock file... [23:27:45] and RainbowSprinkles doesn't seem to be on this box and scap isn't running. [23:28:02] do you need somebody to delete the lock file as root or something? [23:28:05] mutante: could you use your superpowers to manually remove the lock for now. 
[23:28:07] yeah [23:28:20] which machine is it though [23:28:24] tin [23:28:26] ok [23:28:32] thank you [23:29:04] !log tin rm /var/lock/scap.operations_mediawiki-config.lock [23:29:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:30:14] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:363049|Enable Echo per-user blacklist on meta]] T150419 (duration: 00m 49s) [23:30:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:30:26] T150419: Allow users to restrict who can send them notifications - https://phabricator.wikimedia.org/T150419 [23:30:33] ^ RoanKattouw James_F live now, thanks for your patience [23:30:39] mutante: thanks for the assist [23:30:40] Thanks! [23:30:49] no problem, yw [23:31:37] (03CR) 10Dzahn: "16:26 < thcipriani> hrm, I don't know why that wouldn't clean up its lock file..." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363970 (owner: 10Chad) [23:42:58] (03PS1) 10Mattflaschen: Make emails for minor edits always available; keep defaults [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368113 (https://phabricator.wikimedia.org/T29884) [23:44:21] (03PS2) 10Dzahn: lists/icinga: remove I/O monitoring on lists server [puppet] - 10https://gerrit.wikimedia.org/r/368110 (https://phabricator.wikimedia.org/T133110) [23:47:15] (03CR) 10Dzahn: [C: 032] lists/icinga: remove I/O monitoring on lists server [puppet] - 10https://gerrit.wikimedia.org/r/368110 (https://phabricator.wikimedia.org/T133110) (owner: 10Dzahn) [23:56:43] (03CR) 10Dzahn: "removed 'bc' package manually, removed from icinga config by puppet" [puppet] - 10https://gerrit.wikimedia.org/r/368110 (https://phabricator.wikimedia.org/T133110) (owner: 10Dzahn)