[00:14:07] PROBLEM - puppet last run on elastic1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:38:56] (CR) Paladox: "@Chad or @Dzahn" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: Paladox)
[00:41:02] (PS10) Paladox: Gerrit: Add support for logstash in gerrit [puppet] - https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324)
[00:43:07] RECOVERY - puppet last run on elastic1017 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[01:27:33] (CR) Krinkle: tlsproxy::localssl: add ability to have an access.log (1 comment) [puppet] - https://gerrit.wikimedia.org/r/328495 (https://phabricator.wikimedia.org/T153797) (owner: Giuseppe Lavagetto)
[01:29:25] (CR) Tim Landscheidt: [C: 1] "Tested role::puppetmaster::standalone." [puppet] - https://gerrit.wikimedia.org/r/330959 (https://phabricator.wikimedia.org/T148781) (owner: Andrew Bogott)
[01:52:24] (CR) Chad: Gerrit: Add support for logstash in gerrit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: Paladox)
[02:03:57] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[02:06:57] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:10:51] Operations, OTRS, Wikimedia-Incident: OTRS error (back up, now monitoring) - https://phabricator.wikimedia.org/T154841#2926545 (Peachey88)
[02:21:58] (PS11) Paladox: Gerrit: Add support for logstash in gerrit [puppet] - https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324)
[02:22:02] (CR) Paladox: Gerrit: Add support for logstash in gerrit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: Paladox)
[02:22:50] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.7) (duration: 07m 33s)
[02:22:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:27:29] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Jan 8 02:27:29 UTC 2017 (duration 4m 40s)
[02:27:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:28:07] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:34:31] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[02:36:57] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:56:57] PROBLEM - Check systemd state on labstore1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:57:07] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[02:57:17] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1004 is CRITICAL: CRITICAL - Expecting active but unit maintain-dbusers is failed
[03:03:57] RECOVERY - Check systemd state on labstore1004 is OK: OK - running: The system is fully operational
[03:04:17] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1004 is OK: OK - maintain-dbusers is active
[03:20:57] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 614.12 seconds
[03:24:57] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 236.24 seconds
[03:35:07] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 2 minutes ago with 3 failures. Failed resources (up to 3 shown): Exec[wikidev_ensure_members],Exec[ops_ensure_members],Exec[absent_ensure_members]
[04:03:07] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[04:34:11] Puppet, MediaWiki-Vagrant, Patch-For-Review: mediawiki/vagrant puppet classes "3d" are illegal with puppet - https://phabricator.wikimedia.org/T154594#2926576 (Juniorsys) a:Juniorsys
[04:55:07] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[05:23:07] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:02:27] RECOVERY - Check systemd state on elastic2036 is OK: OK - running: The system is fully operational
[06:06:27] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[06:09:27] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:51:07] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tree]
[06:51:17] PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:04:27] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[07:07:27] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[07:09:27] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:19:37] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[07:22:44] Operations, IDS-extension, Wikimedia-Extension-setup, I18n: Deploy IDS rendering engine to production - https://phabricator.wikimedia.org/T148693#2926635 (Shoichi) >>! In T148693#2926097, @Arthur2e5 wrote: > Regarding marking translated comments, consider using something like /*e to > replace /**...
[07:26:32] Operations, Traffic: Plot number of cached objects on a per-server per-DC basis - https://phabricator.wikimedia.org/T154864#2926636 (ema)
[07:26:43] Operations, Traffic: Plot number of cached objects on a per-server per-DC basis - https://phabricator.wikimedia.org/T154864#2926650 (ema) p:Triage>Normal
[07:28:27] PROBLEM - puppet last run on mw1199 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:30:29] Operations, Monitoring, Traffic: Plot number of cached objects on a per-server per-DC basis - https://phabricator.wikimedia.org/T154864#2926651 (Peachey88)
[07:38:27] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[07:42:57] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=357.40 Read Requests/Sec=504.30 Write Requests/Sec=9.10 KBytes Read/Sec=29567.60 KBytes_Written/Sec=408.40
[07:50:27] PROBLEM - puppet last run on wasat is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:54:57] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=180.40 Read Requests/Sec=169.60 Write Requests/Sec=2.00 KBytes Read/Sec=5055.20 KBytes_Written/Sec=34.40
[07:56:27] RECOVERY - puppet last run on mw1199 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[08:03:17] PROBLEM - puppet last run on ms-be1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:04:07] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[08:18:27] RECOVERY - puppet last run on wasat is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[08:32:17] RECOVERY - puppet last run on ms-be1001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[08:33:08] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[08:34:27] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[08:37:27] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[08:55:27] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:24:27] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[09:41:17] RECOVERY - Check HHVM threads for leakage on mw1169 is OK: OK
[11:11:37] PROBLEM - puppet last run on cp3035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:33:32] hi, someone is asking which version of lilypond is running on WMF servers for Extension:Score, any idea?
[11:34:27] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[11:37:27] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[11:40:27] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[11:40:37] RECOVERY - puppet last run on cp3035 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[12:04:27] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[12:06:37] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:07:27] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:07:27] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[12:13:27] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:27:27] PROBLEM - MariaDB Slave Lag: s2 on db1047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 395.99 seconds
[12:35:37] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[12:41:27] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[12:50:37] PROBLEM - puppet last run on lvs3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:55:27] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:03:27] RECOVERY - MariaDB Slave Lag: s2 on db1047 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[13:18:37] RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[13:23:27] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[14:04:27] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[14:07:27] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:18:27] PROBLEM - puppet last run on maps1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:47:27] RECOVERY - puppet last run on maps1004 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[15:01:27] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 21 failures. Last run 2 minutes ago with 21 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[15:08:47] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:24:17] PROBLEM - puppet last run on ms-be1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:29:27] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[15:52:17] RECOVERY - puppet last run on ms-be1004 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[16:17:19] PROBLEM - MariaDB Slave IO: x1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:19] PROBLEM - MariaDB Slave SQL: s6 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:19] PROBLEM - MariaDB Slave IO: s6 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:19] PROBLEM - MariaDB Slave SQL: s1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:19] PROBLEM - MariaDB Slave IO: s3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:19] PROBLEM - MariaDB Slave IO: s5 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:19] PROBLEM - MariaDB Slave SQL: x1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:27] PROBLEM - MariaDB Slave SQL: s3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:27] PROBLEM - MariaDB Slave SQL: m3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:27] PROBLEM - MariaDB Slave IO: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:27] PROBLEM - MariaDB Slave IO: s7 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:27] PROBLEM - MariaDB Slave IO: m3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:28] PROBLEM - MariaDB Slave SQL: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:17:28] PROBLEM - MariaDB Slave IO: s4 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:18:07] RECOVERY - MariaDB Slave IO: x1 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[16:18:08] RECOVERY - MariaDB Slave SQL: s1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[16:18:08] RECOVERY - MariaDB Slave SQL: s6 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[16:18:08] RECOVERY - MariaDB Slave SQL: x1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[16:18:08] RECOVERY - MariaDB Slave IO: s6 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[16:18:08] RECOVERY - MariaDB Slave IO: s5 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[16:18:08] RECOVERY - MariaDB Slave IO: s3 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[16:18:17] RECOVERY - MariaDB Slave SQL: s3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[16:18:17] RECOVERY - MariaDB Slave SQL: m3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[16:18:17] RECOVERY - MariaDB Slave IO: s2 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[16:18:17] RECOVERY - MariaDB Slave IO: s7 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[16:18:17] RECOVERY - MariaDB Slave IO: m3 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[16:18:18] RECOVERY - MariaDB Slave IO: s4 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[16:18:18] RECOVERY - MariaDB Slave SQL: s2 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[16:48:17] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:05:27] PROBLEM - puppet last run on maps1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:16:17] RECOVERY - puppet last run on cp1066 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[17:34:27] RECOVERY - puppet last run on maps1003 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[19:04:37] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:32:37] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[21:25:27] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:53:27] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[22:32:27] PROBLEM - puppet last run on ms-be1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:34:27] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[22:37:27] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[23:00:27] RECOVERY - puppet last run on ms-be1020 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[23:04:27] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[23:07:27] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[23:09:57] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[23:35:09] (PS4) Tim Landscheidt: labstore: Use explicit groups for file resources [puppet] - https://gerrit.wikimedia.org/r/324729 (https://phabricator.wikimedia.org/T152095)
[23:37:57] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures