[00:37:50] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning.
[00:41:10] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running.
[00:52:40] PROBLEM - Cassandra database on praseodymium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (cassandra), command name java, args CassandraDaemon
[00:53:00] PROBLEM - Cassandra database on cerium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (cassandra), command name java, args CassandraDaemon
[00:55:59] !log stopped cassandra on cerium and praseodymium temporarily for testing
[00:55:59] RECOVERY - Cassandra database on praseodymium is OK: PROCS OK: 1 process with UID = 109 (cassandra), command name java, args CassandraDaemon
[00:56:04] Logged the message, Master
[00:59:50] RECOVERY - Cassandra database on cerium is OK: PROCS OK: 1 process with UID = 109 (cassandra), command name java, args CassandraDaemon
[01:07:40] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[01:07:40] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[01:10:39] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:12:59] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[01:16:40] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:19:10] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[01:24:20] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:24:50] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[01:24:50] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0]
[01:25:30] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[01:29:10] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:35:10] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[02:03:26] !log l10nupdate Synchronized php-1.25wmf18/cache/l10n: (no message) (duration: 00m 01s)
[02:03:32] Logged the message, Master
[02:04:33] !log LocalisationUpdate completed (1.25wmf18) at 2015-03-01 02:03:30+00:00
[02:04:37] Logged the message, Master
[02:04:57] !log l10nupdate Synchronized php-1.25wmf19/cache/l10n: (no message) (duration: 00m 01s)
[02:05:00] Logged the message, Master
[02:06:06] !log LocalisationUpdate completed (1.25wmf19) at 2015-03-01 02:05:02+00:00
[02:06:11] Logged the message, Master
[02:17:28] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Mar 1 02:16:24 UTC 2015 (duration 16m 23s)
[02:17:34] Logged the message, Master
[02:42:09] PROBLEM - puppetmaster https on virt1000 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:49:00] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.045 second response time
[03:33:10] PROBLEM - puppet last run on analytics1032 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:33:10] PROBLEM - puppet last run on mw1116 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:50:40] RECOVERY - puppet last run on analytics1032 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[03:51:50] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[03:56:23] !log unlocked stuck EmeraldRS --> Emerald-wiki global rename on dewiki
[04:14:49] PROBLEM - Cassandra database on praseodymium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (cassandra), command name java, args CassandraDaemon
[04:22:12] eh
[04:22:17] morebots: ?
[04:22:17] I am a logbot running on tools-exec-10.
[04:22:17] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[04:22:17] To log a message, type !log .
[04:22:41] !log unlocked stuck EmeraldRS --> Emerald-wiki global rename on dewiki
[04:22:58] !log morebots is not posting to wikis
[04:23:05] that should help :)
[04:25:50] RECOVERY - Cassandra database on praseodymium is OK: PROCS OK: 1 process with UID = 109 (cassandra), command name java, args CassandraDaemon
[04:45:41] PROBLEM - puppet last run on es2008 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:04:20] RECOVERY - puppet last run on es2008 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:23:26] !log logging a test to test the logging
[06:23:37] Logged the message, Master
[06:29:00] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:01] (PS1) Base: Enabling subpages for ns0 in uawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193661 (https://phabricator.wikimedia.org/T91154)
[06:29:19] PROBLEM - puppet last run on db1051 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:30] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:41] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 4 failures
[06:30:40] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:50] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:40] PROBLEM - puppet last run on mw1039 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:40] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:40:33] (PS2) Base: Enabling subpages for ns0 in uawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193661 (https://phabricator.wikimedia.org/T91185)
[06:46:09] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:46:10] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:46:20] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[06:46:20] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[06:46:40] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:47:00] RECOVERY - puppet last run on db1051 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:47:19] RECOVERY - puppet last run on mw1039 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[06:47:20] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:57:13] (PS1) Base: Setting import sources for uawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193662 (https://phabricator.wikimedia.org/T91187)
[06:58:11] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning.
[07:01:15] (CR) TTO: [C: 1] Setting import sources for uawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193662 (https://phabricator.wikimedia.org/T91187) (owner: Base)
[07:01:31] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running.
[07:06:46] <_joe_> bblack: happy to hear :)
[08:07:53] (PS1) BryanDavis: [WIP] Add role::mediawiki_vagrant_lxc [puppet] - https://gerrit.wikimedia.org/r/193665
[08:07:55] (PS1) BryanDavis: [DO NOT MERGE] Make git-sync-upstream allow dirty clone [puppet] - https://gerrit.wikimedia.org/r/193666
[08:10:23] (Abandoned) BryanDavis: [DO NOT MERGE] Make git-sync-upstream allow dirty clone [puppet] - https://gerrit.wikimedia.org/r/193666 (owner: BryanDavis)
[08:12:07] (PS5) Tim Landscheidt: labsdeprepo: Allow more than one local repository [puppet] - https://gerrit.wikimedia.org/r/118796 (https://phabricator.wikimedia.org/T62925)
[08:12:19] (PS2) Tim Landscheidt: Tools: Use labsdeprepo [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925)
[08:17:30] (CR) Tim Landscheidt: [C: -1] "Fuck. I tested this a lot, only to notice *now* that dynamicproxy has an "include misc::labsdebrepo" that will clash filenames. Need to " [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[08:19:38] (PS2) BryanDavis: [WIP] Add role::mediawiki_vagrant_lxc [puppet] - https://gerrit.wikimedia.org/r/193665
[08:23:03] (CR) Tim Landscheidt: "For instances that enable misc::labsdebrepo via wikitech's configuration page or "include misc::labsdebrepo", the only change will be in /" [puppet] - https://gerrit.wikimedia.org/r/118796 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[08:37:49] RECOVERY - uWSGI web apps on graphite2001 is OK: OK: All defined uWSGI apps are runnning.
[08:41:10] PROBLEM - uWSGI web apps on graphite2001 is CRITICAL: CRITICAL: Not all configured uWSGI apps are running.
[11:06:50] PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: puppet fail
[11:11:11] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:24:20] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[11:27:40] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[12:32:52] <_joe_> uhm had a weird malfunction on one itwiki page, but did a smotetest and all seems nice, overall
[12:50:14] (PS1) Nemo bis: Set $wgUploadNavigationUrl for it.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/193672
[13:13:31] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:32:10] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[15:00:20] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333
[15:05:20] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[16:40:30] PROBLEM - puppetmaster https on virt1000 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:50:09] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: Puppet has 1 failures
[16:53:30] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.156 second response time
[16:56:50] PROBLEM - puppetmaster https on virt1000 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:03:30] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.191 second response time
[17:04:31] PROBLEM - puppet last run on labstore2001 is CRITICAL: CRITICAL: puppet fail
[17:06:15] (CR) Nemo bis: "Thanks Jeff for this, can you do the same for all other icons in this directory please?" [mediawiki-config] - https://gerrit.wikimedia.org/r/162538 (owner: Jhobs)
[17:06:50] PROBLEM - puppetmaster https on virt1000 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:07:40] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[17:11:10] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 8.914 second response time
[17:14:30] PROBLEM - puppetmaster https on virt1000 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:15:20] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.037 second response time
[17:23:10] RECOVERY - puppet last run on labstore2001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[19:02:15] operations, Wikimedia-General-or-Unknown, Patch-For-Review: [gdash] "(cdn) HTTP Error Rate" would use log scale for 5xx errors - https://phabricator.wikimedia.org/T43754#1076651 (Nemo_bis) a:Nemo_bis>None
[20:08:37] operations, Continuous-Integration: gallium.wikimedia.org disk space running low - https://phabricator.wikimedia.org/T91211#1076705 (Krinkle) NEW
[20:15:10] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333
[20:20:20] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[21:23:50] PROBLEM - Redis on rdb1001 is CRITICAL: Connection refused
[21:30:20] RECOVERY - Redis on rdb1001 is OK: TCP OK - 0.001 second response time on port 6379
[21:32:09] PROBLEM - LVS HTTP IPv4 on ocg.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:44:11] RECOVERY - LVS HTTP IPv4 on ocg.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 452 bytes in 0.013 second response time
[22:08:49] !log restarted redis-server on rdb1001
[22:18:40] !log restarted jobrunner on mw1001, will restart them all
[22:45:50] PROBLEM - puppet last run on mw1006 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:03:40] RECOVERY - puppet last run on mw1006 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[23:14:08] (PS21) Gage: Strongswan: IPsec Puppet module [puppet] - https://gerrit.wikimedia.org/r/181742
[23:16:34] (PS22) Gage: Strongswan: IPsec Puppet module [puppet] - https://gerrit.wikimedia.org/r/181742
[23:17:37] (PS4) Nemo bis: Allow a full text search button on Commons whenever possible [mediawiki-config] - https://gerrit.wikimedia.org/r/186916 (https://phabricator.wikimedia.org/T19471)
[23:17:50] (CR) Gage: [C: 2] Strongswan: IPsec Puppet module [puppet] - https://gerrit.wikimedia.org/r/181742 (owner: Gage)
[23:19:58] (CR) Florianschmidtwelzow: [C: 1] Allow a full text search button on Commons whenever possible [mediawiki-config] - https://gerrit.wikimedia.org/r/186916 (https://phabricator.wikimedia.org/T19471) (owner: Nemo bis)
[23:20:48] (CR) Nemo bis: [C: -1] Added BounceHandler extension to group1 wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/191937 (https://phabricator.wikimedia.org/T48640) (owner: 01tonythomas)
[23:31:16] (CR) Alex Monk: Added BounceHandler extension to group1 wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/191937 (https://phabricator.wikimedia.org/T48640) (owner: 01tonythomas)
[23:44:22] (PS1) Gage: IPsec: hiera data for testing [puppet] - https://gerrit.wikimedia.org/r/193758
[23:46:10] (CR) Gage: [C: 2] "only affects dedicated IPsec testing nodes" [puppet] - https://gerrit.wikimedia.org/r/193758 (owner: Gage)
[23:51:41] (PS1) Gage: IPsec: hiera data for testing v2 [puppet] - https://gerrit.wikimedia.org/r/193759
[23:52:45] (CR) Gage: [C: 2] IPsec: hiera data for testing v2 [puppet] - https://gerrit.wikimedia.org/r/193759 (owner: Gage)
[23:55:45] (PS1) Gage: IPsec: hiera data for testing v3 [puppet] - https://gerrit.wikimedia.org/r/193760
[23:57:10] (CR) Gage: [C: 2] IPsec: hiera data for testing v3 [puppet] - https://gerrit.wikimedia.org/r/193760 (owner: Gage)