[00:58:33] <icinga-wm>	 PROBLEM - puppet last run on rdb2001 is CRITICAL puppet fail
[01:15:23] <icinga-wm>	 RECOVERY - puppet last run on rdb2001 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures
[01:27:06] <RD>	 I just noticed on a couple of WMF private wikis that PDFs don't generate?  Is this a known issue or just something that was never possible?
[01:27:06] <RD>	 Rendering failed
[01:27:06] <RD>	 Generation of the document file has failed.
[01:27:06] <RD>	 Status: Bundling process died with non zero code: 1
[01:49:12] <icinga-wm>	 PROBLEM - puppet last run on mw2127 is CRITICAL puppet fail
[02:07:43] <icinga-wm>	 RECOVERY - puppet last run on mw2127 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:21:29] <logmsgbot>	 !log l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 06m 41s)
[02:21:40] <morebots>	 Logged the message, Master
[02:26:47] <logmsgbot>	 !log LocalisationUpdate completed (1.26wmf7) at 2015-05-31 02:25:44+00:00
[02:26:53] <morebots>	 Logged the message, Master
[02:43:11] <logmsgbot>	 !log l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 05m 51s)
[02:43:16] <morebots>	 Logged the message, Master
[02:47:44] <logmsgbot>	 !log LocalisationUpdate completed (1.26wmf8) at 2015-05-31 02:46:41+00:00
[02:47:51] <morebots>	 Logged the message, Master
[03:21:14] <grrrit-wm>	 (PS1) Ladsgroup: Install Extension:Translate on labswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/214893 (https://phabricator.wikimedia.org/T100313)
[03:27:21] <wikibugs>	 operations, Wikimedia-Apache-configuration: Redirect for Wikimedia v NSA - https://phabricator.wikimedia.org/T97341#1323577 (Glaisher)
[03:34:52] <icinga-wm>	 PROBLEM - puppet last run on mw1188 is CRITICAL Puppet has 1 failures
[03:35:42] <icinga-wm>	 PROBLEM - puppet last run on mw1212 is CRITICAL Puppet has 1 failures
[03:39:04] <grrrit-wm>	 (CR) Glaisher: "Sysops/bureaucrats should be able to add/remove users to/from translationadmin group." [mediawiki-config] - https://gerrit.wikimedia.org/r/214893 (https://phabricator.wikimedia.org/T100313) (owner: Ladsgroup)
[03:48:54] <icinga-wm>	 PROBLEM - puppet last run on mw2050 is CRITICAL puppet fail
[03:50:47] <grrrit-wm>	 (PS1) Glaisher: Enable "Other Projects Links" by default on ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/214894 (https://phabricator.wikimedia.org/T99901)
[03:51:10] <grrrit-wm>	 (PS2) Glaisher: Enable "Other Projects Links" by default on ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/214894 (https://phabricator.wikimedia.org/T99901)
[03:51:43] <icinga-wm>	 RECOVERY - puppet last run on mw1188 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures
[03:52:42] <icinga-wm>	 RECOVERY - puppet last run on mw1212 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:59:24] <grrrit-wm>	 (PS2) Ladsgroup: Install Extension:Translate on labswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/214893 (https://phabricator.wikimedia.org/T100313)
[04:07:24] <icinga-wm>	 RECOVERY - puppet last run on mw2050 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures
[04:36:43] <icinga-wm>	 PROBLEM - puppet last run on mw2204 is CRITICAL puppet fail
[04:55:22] <icinga-wm>	 RECOVERY - puppet last run on mw2204 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:59:33] <icinga-wm>	 PROBLEM - puppet last run on sca1001 is CRITICAL puppet fail
[05:35:39] <logmsgbot>	 !log LocalisationUpdate ResourceLoader cache refresh completed at Sun May 31 05:34:36 UTC 2015 (duration 34m 35s)
[05:35:43] <morebots>	 Logged the message, Master
[05:56:43] <icinga-wm>	 RECOVERY - puppet last run on sca1001 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures
[06:03:13] <icinga-wm>	 PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL 1.69% of data above the critical threshold [1000.0]
[06:20:23] <icinga-wm>	 PROBLEM - puppet last run on sca1001 is CRITICAL Puppet has 2 failures
[06:32:13] <icinga-wm>	 PROBLEM - puppet last run on db2040 is CRITICAL Puppet has 1 failures
[06:33:32] <icinga-wm>	 PROBLEM - puppet last run on lvs2001 is CRITICAL Puppet has 1 failures
[06:34:03] <icinga-wm>	 PROBLEM - puppet last run on ms-fe2003 is CRITICAL Puppet has 1 failures
[06:34:12] <icinga-wm>	 PROBLEM - puppet last run on mw2093 is CRITICAL Puppet has 1 failures
[06:34:12] <icinga-wm>	 PROBLEM - puppet last run on mw2096 is CRITICAL Puppet has 1 failures
[06:34:13] <icinga-wm>	 PROBLEM - puppet last run on mw2079 is CRITICAL Puppet has 1 failures
[06:34:13] <icinga-wm>	 PROBLEM - puppet last run on mw2045 is CRITICAL Puppet has 1 failures
[06:34:13] <icinga-wm>	 PROBLEM - puppet last run on mw2003 is CRITICAL Puppet has 1 failures
[06:34:23] <icinga-wm>	 PROBLEM - puppet last run on mw1052 is CRITICAL Puppet has 1 failures
[06:35:52] <icinga-wm>	 PROBLEM - puppet last run on mw2092 is CRITICAL Puppet has 1 failures
[06:47:04] <icinga-wm>	 RECOVERY - puppet last run on lvs2001 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:47:32] <icinga-wm>	 RECOVERY - puppet last run on db2040 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:43] <icinga-wm>	 RECOVERY - puppet last run on ms-fe2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:43] <icinga-wm>	 RECOVERY - puppet last run on mw2093 is OK Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:47:44] <icinga-wm>	 RECOVERY - puppet last run on mw2096 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:52] <icinga-wm>	 RECOVERY - puppet last run on mw2079 is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:47:52] <icinga-wm>	 RECOVERY - puppet last run on mw2092 is OK Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:47:52] <icinga-wm>	 RECOVERY - puppet last run on mw2045 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:47:53] <icinga-wm>	 RECOVERY - puppet last run on mw2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:53] <icinga-wm>	 RECOVERY - puppet last run on mw1052 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:23] <icinga-wm>	 RECOVERY - carbon-cache too many creates on graphite1001 is OK Less than 1.00% above the threshold [500.0]
[07:47:32] <icinga-wm>	 PROBLEM - High load average on labstore1001 is CRITICAL 62.50% of data above the critical threshold [24.0]
[08:23:03] <icinga-wm>	 RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[09:15:52] <icinga-wm>	 RECOVERY - puppet last run on sca1001 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures
[09:30:12] <icinga-wm>	 PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 604
[09:35:13] <icinga-wm>	 RECOVERY - check_mysql on db1008 is OK: Uptime: 3876576 Threads: 1 Questions: 13455690 Slow queries: 25732 Opens: 61972 Flush tables: 2 Open tables: 64 Queries per second avg: 3.471 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[10:04:52] <wikibugs>	 operations, Wikimedia-DNS, Wikimedia-Site-requests, Patch-For-Review: Create fishbowl wiki for Wikimedia User Group China - https://phabricator.wikimedia.org/T98676#1323741 (AddisWang) Do you have mail server that we can use as xx@cn.wikimedia.org, or we can find outside service provider?
[10:33:11] <grrrit-wm>	 (PS1) Yuvipanda: ores: Increase processes per CPU to 4 [puppet] - https://gerrit.wikimedia.org/r/214908
[10:33:42] <grrrit-wm>	 (PS2) Yuvipanda: ores: Increase processes per CPU to 4 [puppet] - https://gerrit.wikimedia.org/r/214908
[10:34:14] <grrrit-wm>	 (PS3) Yuvipanda: ores: Increase processes per CPU to 4 [puppet] - https://gerrit.wikimedia.org/r/214908
[10:34:20] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] ores: Increase processes per CPU to 4 [puppet] - https://gerrit.wikimedia.org/r/214908 (owner: Yuvipanda)
[10:34:28] <grrrit-wm>	 (CR) Yuvipanda: [V: 2] ores: Increase processes per CPU to 4 [puppet] - https://gerrit.wikimedia.org/r/214908 (owner: Yuvipanda)
[11:15:22] <grrrit-wm>	 (PS1) Yuvipanda: ores: Add experimental nginx proxy caching [puppet] - https://gerrit.wikimedia.org/r/214909
[11:16:51] <grrrit-wm>	 (PS2) Yuvipanda: ores: Add experimental nginx proxy caching [puppet] - https://gerrit.wikimedia.org/r/214909
[11:19:23] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] ores: Add experimental nginx proxy caching [puppet] - https://gerrit.wikimedia.org/r/214909 (owner: Yuvipanda)
[11:30:24] <grrrit-wm>	 (PS1) Yuvipanda: ores: Specify labs lvm requirement correctly [puppet] - https://gerrit.wikimedia.org/r/214910
[11:30:27] <grrrit-wm>	 (PS1) Yuvipanda: ores: Enable caching even for resources with a cache header [puppet] - https://gerrit.wikimedia.org/r/214911
[11:30:30] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] ores: Specify labs lvm requirement correctly [puppet] - https://gerrit.wikimedia.org/r/214910 (owner: Yuvipanda)
[11:30:34] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] ores: Enable caching even for resources with a cache header [puppet] - https://gerrit.wikimedia.org/r/214911 (owner: Yuvipanda)
[11:30:36] <grrrit-wm>	 (PS2) Yuvipanda: ores: Specify labs lvm requirement correctly [puppet] - https://gerrit.wikimedia.org/r/214910
[11:30:42] <grrrit-wm>	 (PS2) Yuvipanda: ores: Enable caching even for resources with a cache header [puppet] - https://gerrit.wikimedia.org/r/214911
[11:31:36] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] ores: Specify labs lvm requirement correctly [puppet] - https://gerrit.wikimedia.org/r/214910 (owner: Yuvipanda)
[11:31:46] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] ores: Enable caching even for resources with a cache header [puppet] - https://gerrit.wikimedia.org/r/214911 (owner: Yuvipanda)
[11:38:04] <grrrit-wm>	 (PS1) Yuvipanda: ores: Specify protocol explicitly for nginx backend [puppet] - https://gerrit.wikimedia.org/r/214912
[11:38:09] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] ores: Specify protocol explicitly for nginx backend [puppet] - https://gerrit.wikimedia.org/r/214912 (owner: Yuvipanda)
[11:39:39] <grrrit-wm>	 (CR) Alex Monk: [C: -1] "Question on the task to be addressed first." [mediawiki-config] - https://gerrit.wikimedia.org/r/214893 (https://phabricator.wikimedia.org/T100313) (owner: Ladsgroup)
[11:42:23] <wikibugs>	 operations, Wikimedia-DNS, Wikimedia-Site-requests, Patch-For-Review: Create fishbowl wiki for Wikimedia User Group China - https://phabricator.wikimedia.org/T98676#1323802 (Krenair) I think that's technically possible, but you should make a new ticket about it and CC me there so we can work out ho...
[11:45:18] <wikibugs>	 operations, Wikimedia-DNS, Wikimedia-Site-requests: Create fishbowl wiki for Wikimedia User Group China - https://phabricator.wikimedia.org/T98676#1323812 (Krenair)
[11:51:11] <wikibugs>	 operations, wikitech.wikimedia.org: distribution upgrade for wikitech-static instance - https://phabricator.wikimedia.org/T94585#1323817 (Krenair)
[11:51:12] <wikibugs>	 operations, Tracking: Upgrade Wikimedia servers to Ubuntu Trusty (14.04) (tracking) - https://phabricator.wikimedia.org/T65899#1323816 (Krenair)
[11:59:33] <grrrit-wm>	 (CR) Alex Monk: Enable Echo on Wikimedia wikis by default (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/139326 (https://phabricator.wikimedia.org/T97760) (owner: Withoutaname)
[12:06:02] <grrrit-wm>	 (CR) Ladsgroup: "Done" [mediawiki-config] - https://gerrit.wikimedia.org/r/214893 (https://phabricator.wikimedia.org/T100313) (owner: Ladsgroup)
[12:06:38] <grrrit-wm>	 (CR) Alex Monk: Install Extension:Translate on labswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/214893 (https://phabricator.wikimedia.org/T100313) (owner: Ladsgroup)
[12:12:09] <grrrit-wm>	 (CR) Alex Monk: Remove echowikis.dblist (1 comment) [puppet] - https://gerrit.wikimedia.org/r/139581 (owner: Withoutaname)
[12:16:16] <grrrit-wm>	 (CR) Alex Monk: "Let's just do what CentralAuth does: https://github.com/wikimedia/mediawiki-extensions-CentralAuth/blob/master/maintenance/createLocalAcco" [mediawiki-config] - https://gerrit.wikimedia.org/r/139326 (https://phabricator.wikimedia.org/T97760) (owner: Withoutaname)
[12:26:30] <grrrit-wm>	 (CR) Alex Monk: "Please see I3537206f" [mediawiki-config] - https://gerrit.wikimedia.org/r/139326 (https://phabricator.wikimedia.org/T97760) (owner: Withoutaname)
[12:26:42] <grrrit-wm>	 (CR) Alex Monk: "I3537206f will fix this" [puppet] - https://gerrit.wikimedia.org/r/139581 (owner: Withoutaname)
[12:56:03] <icinga-wm>	 PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100%
[12:57:13] <icinga-wm>	 RECOVERY - Host mw2027 is UP: PING WARNING - Packet loss = 86%, RTA = 64.86 ms
[12:57:16] <Krenair>	 weird
[13:00:07] <matanya>	 Krenair: what is 20** ?  I know 10** is eqiad, 40** is codfw, and 30** is esams
[13:00:32] <Krenair>	 I think 2* was codfw?
[13:00:55] <Krenair>	 it's mw2027.codfw.wmnet
[13:01:24] <Krenair>	 4* is ulsfo
[13:01:36] <Krenair>	 https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions#Cluster_Servers
[13:05:16] <Krenair>	 matanya, ^
[13:05:33] <matanya>	 ah, right. sundays...
[13:05:40] <matanya>	 thanks.
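
The numbering convention settled on in the exchange above (the leading digit of a host's four-digit suffix indicates its site) can be sketched as a small lookup. This is a minimal illustration only: the function name `site_for_host` and the table `SITE_BY_PREFIX` are hypothetical, and the mapping covers just the four sites named in the conversation (with 2* taken as codfw, per Krenair's correction).

```python
import re

# Leading digit of the numeric suffix -> site, as discussed in the log above.
# Only the four sites mentioned in the conversation are listed (assumption:
# other prefixes are simply treated as unknown here).
SITE_BY_PREFIX = {
    "1": "eqiad",
    "2": "codfw",
    "3": "esams",
    "4": "ulsfo",
}


def site_for_host(hostname):
    """Return the site for a host such as 'mw2027', or None if it cannot be inferred."""
    # Drop any domain part, e.g. 'mw2027.codfw.wmnet' -> 'mw2027'.
    short_name = hostname.split(".", 1)[0]
    # Expect the short name to end in exactly four digits, e.g. 'ms-be1017'.
    match = re.search(r"(?<!\d)(\d)\d{3}$", short_name)
    if match is None:
        return None
    return SITE_BY_PREFIX.get(match.group(1))


if __name__ == "__main__":
    for host in ("mw2027.codfw.wmnet", "graphite1001", "ms-fe2003", "sodium"):
        print(host, "->", site_for_host(host))
```

Hosts without a four-digit suffix (for example `sodium`) return `None` rather than a guess.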
[13:22:48] <grrrit-wm>	 (CR) GWicke: "Yes, we haven't added all special wikis yet. Luckily it'll be straightforward to do so." [mediawiki-config] - https://gerrit.wikimedia.org/r/214833 (https://phabricator.wikimedia.org/T100026) (owner: Jforrester)
[13:39:25] <wikibugs>	 operations, Wikimedia-DNS, Wikimedia-Site-requests: Create fishbowl wiki for Wikimedia User Group China - https://phabricator.wikimedia.org/T98676#1323912 (jeremyb-phone) >>! re T98676#1323802, @Krenair  no, we never offer that to chapters.  you need to find your own provider and use your own domain n...
[13:52:56] <grrrit-wm>	 (PS1) Nemo bis: [English Planet] Add Bluerasberry, Nimish Gautam [puppet] - https://gerrit.wikimedia.org/r/214916
[14:07:40] <wikibugs>	 operations, Wikimedia-DNS, Wikimedia-Site-requests: Create fishbowl wiki for Wikimedia User Group China - https://phabricator.wikimedia.org/T98676#1323941 (MZMcBride) "Never" is a bit of a strong word. For example, we have OTRS queues. And, of course, past practice shouldn't necessarily dictate future...
[14:17:38] <wikibugs>	 operations, Wikimedia-DNS, Wikimedia-Site-requests: Create fishbowl wiki for Wikimedia User Group China - https://phabricator.wikimedia.org/T98676#1323947 (zhuyifei1999) @jeremyb, @Krenair, @AddisWang: Would a mailing on lists.wikimedia.org work? I'll create a ticket for it if it works.
[14:31:43] <wikibugs>	 operations, Wikimedia-DNS, Wikimedia-Site-requests: Create fishbowl wiki for Wikimedia User Group China - https://phabricator.wikimedia.org/T98676#1323956 (jeremyb) > Would a mailing on lists.wikimedia.org work? I'll create a ticket for it if it works.  to clarify xx@cn.wikimedia.org could mean many t...
[14:32:12] <wikibugs>	 operations, Wikimedia-DNS, Wikimedia-Site-requests: Create fishbowl wiki for Wikimedia User Group China - https://phabricator.wikimedia.org/T98676#1323957 (Krenair) Is it really so hard to create a new ticket and stop bothering people on this one?
[14:39:19] <wikibugs>	 operations, Wikimedia-DNS, Wikimedia-Site-requests: Create fishbowl wiki for Wikimedia User Group China - https://phabricator.wikimedia.org/T98676#1323963 (AddisWang) I'll make a new ticket, after having a discussion with other members.
[14:50:15] <wikibugs>	 operations, discovery-system, services-tooling: [RFC] Define the on-disk and live structure of etcd pool data - https://phabricator.wikimedia.org/T100793#1323975 (Joe) a:Joe
[14:50:28] <grrrit-wm>	 (PS21) Paladox: Adding task support instead of using Bug: which was for bugzilla [puppet] - https://gerrit.wikimedia.org/r/209741
[15:33:42] <grrrit-wm>	 (CR) Andrew Bogott: [C: 1] "Looks fine. We could probably purge a lot more puppet_db stuff." [puppet] - https://gerrit.wikimedia.org/r/214637 (owner: Alexandros Kosiaris)
[15:35:48] <grrrit-wm>	 (CR) Andrew Bogott: [C: -1] "I think we need to leave a couple more IPs in the allow range" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/214638 (owner: Alexandros Kosiaris)
[15:36:09] <grrrit-wm>	 (CR) Andrew Bogott: [C: 1] lint: fully qualify puppetmaster [puppet] - https://gerrit.wikimedia.org/r/214639 (owner: Alexandros Kosiaris)
[15:38:05] <grrrit-wm>	 (CR) Andrew Bogott: [C: 1] "This is better! Small variable-naming request." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/214640 (owner: Alexandros Kosiaris)
[15:39:47] <grrrit-wm>	 (CR) Andrew Bogott: [C: -1] Rename role::puppet::server::labs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/214641 (owner: Alexandros Kosiaris)
[15:41:40] <grrrit-wm>	 (CR) Andrew Bogott: [C: -1] "I don't mind the new name, although I'm not sure it's worth it for consistency. A fair amount of docs reference the old name, and lots of" [puppet] - https://gerrit.wikimedia.org/r/214642 (owner: Alexandros Kosiaris)
[16:03:01] <grrrit-wm>	 (CR) Yuvipanda: "1. Shim the old name to just include the new one" [puppet] - https://gerrit.wikimedia.org/r/214642 (owner: Alexandros Kosiaris)
[16:03:19] <wikibugs>	 operations, Phabricator, Wikimedia-Bugzilla, Tracking: Tracking: Remove Bugzilla from production - https://phabricator.wikimedia.org/T95184#1324067 (Nemo_bis) >>! In T95184#1322229, @Dzahn wrote: > T95267 - removed as a blocker because the dump exists now which makes it possible to build one without...
[16:05:50] <wikibugs>	 operations, Phabricator, Wikimedia-Bugzilla, Tracking: Tracking: Remove Bugzilla from production - https://phabricator.wikimedia.org/T95184#1324075 (Nemo_bis) > You are free to call more people to express their opinions.  I'm not the one who proposed this action and I don't intend to do the proposal...
[16:07:28] <wikibugs>	 operations, Phabricator, Wikimedia-Bugzilla, Tracking: Tracking: Remove Bugzilla from production - https://phabricator.wikimedia.org/T95184#1324076 (Nemo_bis)
[16:11:42] <wikibugs>	 operations, Phabricator, Wikimedia-Bugzilla: Sanitise a Bugzilla database dump - https://phabricator.wikimedia.org/T85141#1324079 (Nemo_bis) >>! In T85141#1315926, @MZMcBride wrote: > Anyone who wants this information should be given sufficient opportunity (a few months) to extract it from old-bugzilla...
[16:17:55] <grrrit-wm>	 (PS1) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/214923
[16:18:14] <grrrit-wm>	 (PS2) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/214923
[16:20:00] <grrrit-wm>	 (CR) Paladox: "Please see https://gerrit.wikimedia.org/r/#/c/214923/ which breaks a bit of this patch." [puppet] - https://gerrit.wikimedia.org/r/209741 (owner: Paladox)
[16:22:52] <grrrit-wm>	 (CR) Paladox: Add link in gitblit for phabricator (1 comment) [puppet] - https://gerrit.wikimedia.org/r/214923 (owner: Paladox)
[16:45:33] <icinga-wm>	 PROBLEM - puppet last run on ms-be1017 is CRITICAL Puppet has 1 failures
[17:02:22] <icinga-wm>	 RECOVERY - puppet last run on ms-be1017 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures
[17:12:00] <gwicke>	 !log performed a rolling restart of RESTBase Cassandra nodes to address elevated request error rates apparently related to schema disagreement
[17:12:05] <morebots>	 Logged the message, Master
[17:20:06] <Krinkle>	 !log Investigating RL issues (clients are loading mediawiki.notification&version=19700101T000000Z, mw.loader.moduleRegistry contains NaN for versions)
[17:20:10] <morebots>	 Logged the message, Master
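
For readers wondering about the `version=19700101T000000Z` value in the log entry above: that string is what Unix time 0 looks like in this compact UTC timestamp format. The sketch below only demonstrates that formatting fact; the helper name `format_version_timestamp` is hypothetical, and it makes no claim about how ResourceLoader itself computes versions or what caused the issue.

```python
from datetime import datetime, timezone


def format_version_timestamp(epoch_seconds):
    """Format a Unix timestamp as a compact UTC string such as 19700101T000000Z."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y%m%dT%H%M%SZ")


if __name__ == "__main__":
    # Unix time 0 is the start of the epoch: midnight UTC on 1 January 1970.
    print(format_version_timestamp(0))
```

A version string equal to the epoch start is therefore a hint that a timestamp of zero was formatted somewhere, rather than a real modification time.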
[17:26:19] <grrrit-wm>	 (PS1) Odder: Make import group assignable on newiki [mediawiki-config] - https://gerrit.wikimedia.org/r/214925 (https://phabricator.wikimedia.org/T100925)
[17:30:56] <wikibugs>	 operations, Wikimedia-Mailing-lists: mailman emails taking long time for delivery, getting stuck in sodium - https://phabricator.wikimedia.org/T61731#1324190 (Nemo_bis) Well, dunno, this doesn't really look entirely healthy: https://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&title=Emails%20pas...
[17:33:38] <logmsgbot>	 !log krinkle Synchronized php-1.26wmf7/resources: touch mediawiki.js (duration: 00m 13s)
[17:33:42] <morebots>	 Logged the message, Master
[17:35:50] <grrrit-wm>	 (PS2) Odder: Make import group assignable on newiki [mediawiki-config] - https://gerrit.wikimedia.org/r/214925 (https://phabricator.wikimedia.org/T100925)
[17:35:58] <wikibugs>	 operations, Phabricator, Wikimedia-Bugzilla, Tracking: Tracking: Remove Bugzilla from production - https://phabricator.wikimedia.org/T95184#1324193 (JohnLewis)
[17:36:48] <Krinkle>	 !log Confirmed RL problem solved. The jquery|mediawiki&version=bizqqnC request was cached with an old mw.loader implementation somehow. After the touch and sync, the version is now dQAzAsdU and the implementation is up to date.
[17:36:55] <morebots>	 Logged the message, Master
[17:40:05] <MatmaRex>	 hey Krinkle
[17:40:22] <Krinkle>	 Hi
[17:40:38] <MatmaRex>	 Krinkle: did that, by chance, have anything to do with https://phabricator.wikimedia.org/T100883 ?
[17:40:49] <MatmaRex>	 (or is that yet another issue?)
[17:41:12] <Krinkle>	 Don't know
[17:41:14] <Krinkle>	 Probably not related.
[17:41:23] <Krinkle>	 Unless the culprit is somehow not having synced properly
[17:41:29] <wikibugs>	 operations, Phabricator, Wikimedia-Bugzilla, Tracking: Tracking: Remove Bugzilla from production - https://phabricator.wikimedia.org/T95184#1324197 (JohnLewis) The above tasking is getting annoying with the unless debates. As has been said, old-Bugzilla is not required for this as all data can be cl...
[18:37:03] <logmsgbot>	 !log krinkle Synchronized php-1.26wmf8/resources/src/mediawiki/mediawiki.js: rl live fix - I717b86573 (duration: 00m 12s)
[18:37:07] <morebots>	 Logged the message, Master
[20:09:03] <icinga-wm>	 PROBLEM - puppet last run on mw2146 is CRITICAL puppet fail
[20:27:33] <icinga-wm>	 RECOVERY - puppet last run on mw2146 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures
[20:38:14] <icinga-wm>	 PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0]
[20:43:06] <grrrit-wm>	 (PS1) Odder: Provide static PNG logos for emlwiki and kgwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/214981 (https://phabricator.wikimedia.org/T100953)
[21:01:28] <wikibugs>	 operations, wikitech.wikimedia.org, Documentation: Wikitech: update Bacula article - https://phabricator.wikimedia.org/T100954#1324428 (Gage) NEW
[21:09:18] <wikibugs>	 operations, discovery-system, services-tooling: [RFC] Define the on-disk and live structure of etcd pool data - https://phabricator.wikimedia.org/T100793#1324445 (Joe)
[21:28:33] <icinga-wm>	 PROBLEM - RAID on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:29:04] <icinga-wm>	 PROBLEM - dhclient process on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:30:03] <icinga-wm>	 PROBLEM - statsite backend instances on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:30:03] <icinga-wm>	 PROBLEM - configured eth on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:30:03] <icinga-wm>	 RECOVERY - RAID on graphite2001 is OK Active: 8, Working: 8, Failed: 0, Spare: 0
[21:30:33] <icinga-wm>	 RECOVERY - dhclient process on graphite2001 is OK: PROCS OK: 0 processes with command name dhclient
[21:30:53] <ToAruShiroiNeko>	 icinga-wm is so dramatic... :p
[21:31:19] <ToAruShiroiNeko>	 one moment it's critical, then a swift recovery and everything is ok :)
[21:31:33] <icinga-wm>	 RECOVERY - statsite backend instances on graphite2001 is OK All defined statsite jobs are runnning.
[21:31:33] <icinga-wm>	 RECOVERY - configured eth on graphite2001 is OK - interfaces up
[21:43:03] <icinga-wm>	 PROBLEM - statsdlb process on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:43:23] <icinga-wm>	 PROBLEM - puppet last run on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:43:32] <icinga-wm>	 PROBLEM - statsite backend instances on graphite2001 is CRITICAL: CHECK_NRPError - Could not complete SSL handshake.
[21:43:33] <icinga-wm>	 PROBLEM - configured eth on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:43:33] <icinga-wm>	 PROBLEM - Disk space on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:43:33] <icinga-wm>	 PROBLEM - salt-minion processes on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:43:33] <icinga-wm>	 PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:43:43] <icinga-wm>	 PROBLEM - RAID on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:43:52] <icinga-wm>	 PROBLEM - uWSGI web apps on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:44:13] <icinga-wm>	 PROBLEM - dhclient process on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:44:43] <icinga-wm>	 PROBLEM - SSH on graphite2001 is CRITICAL - Socket timeout after 10 seconds
[21:44:43] <icinga-wm>	 PROBLEM - DPKG on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:45:03] <icinga-wm>	 RECOVERY - puppet last run on graphite2001 is OK Puppet is currently enabled, last run 4 minutes ago with 0 failures
[21:45:13] <icinga-wm>	 RECOVERY - Disk space on graphite2001 is OK: DISK OK
[21:45:13] <icinga-wm>	 RECOVERY - Graphite Carbon on graphite2001 is OK All defined Carbon jobs are runnning.
[21:45:13] <icinga-wm>	 RECOVERY - salt-minion processes on graphite2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[21:45:13] <icinga-wm>	 RECOVERY - RAID on graphite2001 is OK Active: 8, Working: 8, Failed: 0, Spare: 0
[21:45:23] <icinga-wm>	 RECOVERY - uWSGI web apps on graphite2001 is OK All defined uWSGI apps are runnning.
[21:45:43] <icinga-wm>	 RECOVERY - dhclient process on graphite2001 is OK: PROCS OK: 0 processes with command name dhclient
[21:46:13] <icinga-wm>	 RECOVERY - SSH on graphite2001 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0)
[21:46:13] <icinga-wm>	 RECOVERY - statsdlb process on graphite2001 is OK: PROCS OK: 1 process with command name statsdlb
[21:46:13] <icinga-wm>	 RECOVERY - DPKG on graphite2001 is OK: All packages OK
[21:46:43] <icinga-wm>	 RECOVERY - configured eth on graphite2001 is OK - interfaces up
[21:46:43] <icinga-wm>	 RECOVERY - statsite backend instances on graphite2001 is OK All defined statsite jobs are runnning.
[21:51:23] <icinga-wm>	 PROBLEM - SSH on graphite2001 is CRITICAL - Socket timeout after 10 seconds
[21:51:23] <icinga-wm>	 PROBLEM - statsdlb process on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:51:23] <icinga-wm>	 PROBLEM - DPKG on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:51:43] <icinga-wm>	 PROBLEM - puppet last run on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:51:52] <icinga-wm>	 PROBLEM - configured eth on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:51:53] <icinga-wm>	 PROBLEM - statsite backend instances on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:51:53] <icinga-wm>	 PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:51:53] <icinga-wm>	 PROBLEM - Disk space on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:51:53] <icinga-wm>	 PROBLEM - salt-minion processes on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:52:03] <icinga-wm>	 PROBLEM - RAID on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:52:12] <icinga-wm>	 PROBLEM - uWSGI web apps on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:52:33] <icinga-wm>	 PROBLEM - dhclient process on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:53:24] <wikibugs>	 operations, Phabricator, Wikimedia-Bugzilla, Tracking: Tracking: Remove Bugzilla from production - https://phabricator.wikimedia.org/T95184#1182562 (Nemo_bis) > The above tasking is getting annoying with the unless debates.  Pro-tip: http://meatballwiki.org/wiki/DiminishingReplies
[22:02:13] <icinga-wm>	 RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[22:03:12] <icinga-wm>	 RECOVERY - SSH on graphite2001 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0)
[22:03:12] <icinga-wm>	 RECOVERY - statsdlb process on graphite2001 is OK: PROCS OK: 1 process with command name statsdlb
[22:03:13] <icinga-wm>	 RECOVERY - DPKG on graphite2001 is OK: All packages OK
[22:08:22] <icinga-wm>	 PROBLEM - SSH on graphite2001 is CRITICAL - Socket timeout after 10 seconds
[22:08:22] <icinga-wm>	 PROBLEM - statsdlb process on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:08:23] <icinga-wm>	 PROBLEM - DPKG on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:18:43] <icinga-wm>	 RECOVERY - Disk space on graphite2001 is OK: DISK OK
[22:18:43] <icinga-wm>	 RECOVERY - salt-minion processes on graphite2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[22:18:43] <icinga-wm>	 RECOVERY - Graphite Carbon on graphite2001 is OK All defined Carbon jobs are runnning.
[22:18:52] <icinga-wm>	 RECOVERY - RAID on graphite2001 is OK Active: 8, Working: 8, Failed: 0, Spare: 0
[22:18:53] <icinga-wm>	 RECOVERY - uWSGI web apps on graphite2001 is OK All defined uWSGI apps are runnning.
[22:19:13] <icinga-wm>	 RECOVERY - dhclient process on graphite2001 is OK: PROCS OK: 0 processes with command name dhclient
[22:19:54] <icinga-wm>	 RECOVERY - SSH on graphite2001 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0)
[22:20:02] <icinga-wm>	 RECOVERY - statsdlb process on graphite2001 is OK: PROCS OK: 1 process with command name statsdlb
[22:20:02] <icinga-wm>	 RECOVERY - DPKG on graphite2001 is OK: All packages OK
[22:20:13] <icinga-wm>	 RECOVERY - puppet last run on graphite2001 is OK Puppet is currently enabled, last run 39 minutes ago with 0 failures
[22:20:23] <icinga-wm>	 RECOVERY - configured eth on graphite2001 is OK - interfaces up
[22:20:24] <icinga-wm>	 RECOVERY - statsite backend instances on graphite2001 is OK All defined statsite jobs are runnning.
[22:22:57] <wikibugs>	 operations, Phabricator, database: Add Story points (from Sprint Extension)  to the phabricator data dump - https://phabricator.wikimedia.org/T100846#1324508 (mmodell)
[22:28:43] <icinga-wm>	 PROBLEM - puppet last run on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:28:53] <icinga-wm>	 PROBLEM - configured eth on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:28:54] <icinga-wm>	 PROBLEM - statsite backend instances on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:29:02] <icinga-wm>	 PROBLEM - salt-minion processes on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:29:02] <icinga-wm>	 PROBLEM - Disk space on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:29:02] <icinga-wm>	 PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:29:12] <icinga-wm>	 PROBLEM - RAID on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:29:13] <icinga-wm>	 PROBLEM - uWSGI web apps on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:29:33] <icinga-wm>	 PROBLEM - dhclient process on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:29:45] <jgage>	 it fell off the net again while i was investigating :(
[22:30:13] <icinga-wm>	 PROBLEM - SSH on graphite2001 is CRITICAL - Socket timeout after 10 seconds
[22:30:13] <icinga-wm>	 PROBLEM - DPKG on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:30:13] <icinga-wm>	 PROBLEM - statsdlb process on graphite2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:31:03] <icinga-wm>	 RECOVERY - dhclient process on graphite2001 is OK: PROCS OK: 0 processes with command name dhclient
[22:35:51] <jgage>	 !log graphite2001 keeps falling off the net due to OOM; swap 100% in use. dist-upgraded & rebooted. dmesg in ~gage/dmesg.2015-05-31
[22:35:55] <morebots>	 Logged the message, Master
[22:36:12] <icinga-wm>	 PROBLEM - dhclient process on graphite2001 is CRITICAL: Timeout while attempting connection
[22:37:54] <icinga-wm>	 PROBLEM - Host graphite2001 is DOWN: PING CRITICAL - Packet loss = 100%
[22:38:23] <icinga-wm>	 RECOVERY - SSH on graphite2001 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0)
[22:38:23] <icinga-wm>	 RECOVERY - statsdlb process on graphite2001 is OK: PROCS OK: 1 process with command name statsdlb
[22:38:24] <icinga-wm>	 RECOVERY - DPKG on graphite2001 is OK: All packages OK
[22:38:32] <icinga-wm>	 RECOVERY - Host graphite2001 is UP: PING OK - Packet loss = 0%, RTA = 44.02 ms
[22:38:44] <icinga-wm>	 RECOVERY - configured eth on graphite2001 is OK - interfaces up
[22:38:44] <icinga-wm>	 RECOVERY - Disk space on graphite2001 is OK: DISK OK
[22:38:44] <icinga-wm>	 RECOVERY - statsite backend instances on graphite2001 is OK All defined statsite jobs are runnning.
[22:38:44] <icinga-wm>	 RECOVERY - salt-minion processes on graphite2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[22:38:44] <icinga-wm>	 RECOVERY - Graphite Carbon on graphite2001 is OK All defined Carbon jobs are runnning.
[22:39:03] <icinga-wm>	 RECOVERY - RAID on graphite2001 is OK Active: 8, Working: 8, Failed: 0, Spare: 0
[22:39:12] <icinga-wm>	 RECOVERY - uWSGI web apps on graphite2001 is OK All defined uWSGI apps are runnning.
[22:39:33] <icinga-wm>	 RECOVERY - dhclient process on graphite2001 is OK: PROCS OK: 0 processes with command name dhclient
[22:43:52] <icinga-wm>	 PROBLEM - Host graphite2001 is DOWN: PING CRITICAL - Packet loss = 100%
[22:45:22] <icinga-wm>	 RECOVERY - Host graphite2001 is UP: PING OK - Packet loss = 0%, RTA = 43.31 ms
[22:46:20] <wikibugs>	 operations, ops-codfw: graphite2001 bios config issue - https://phabricator.wikimedia.org/T100959#1324524 (Gage) NEW
[22:48:00] <jgage>	 ok i'm done with graphite2001 - had to reboot a second time for kernel update
[23:04:59] <wikibugs>	 operations, wikitech.wikimedia.org, Documentation: Create documentation on the requesting/allocation of virtual machines in the misc cluster - https://phabricator.wikimedia.org/T97072#1324575 (Krenair)
[23:05:13] <wikibugs>	 operations, Documentation: Create documentation on the requesting/allocation of virtual machines in the misc cluster - https://phabricator.wikimedia.org/T97072#1232100 (Krenair)
[23:05:39] <wikibugs>	 operations, Documentation: Wikitech: update Bacula article - https://phabricator.wikimedia.org/T100954#1324580 (Krenair)