[00:47:35] PROBLEM - Ubuntu mirror in sync with upstream on carbon is CRITICAL: /srv/ubuntu/project/trace/carbon.wikimedia.org is over 12 hours old. [01:21:57] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:22:57] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [01:26:06] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:28:06] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [01:41:33] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:43:34] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [02:02:36] PROBLEM - puppet last run on osmium is CRITICAL: CRITICAL: Puppet last ran 2606152 seconds ago, expected 14400 [02:05:36] RECOVERY - puppet last run on osmium is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [02:13:27] !log LocalisationUpdate completed (1.25wmf1) at 2014-10-05 02:13:26+00:00 [02:13:41] Logged the message, Master [02:22:47] !log LocalisationUpdate completed (1.25wmf2) at 2014-10-05 02:22:47+00:00 [02:22:55] Logged the message, Master [02:55:58] (03PS1) 10Ori.livneh: Enable LuaSandbox profiling when `forceprofile` is true [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164750 [02:57:04] ori: :D [02:58:12] jackmcbarn: do you think that's an acceptable trade-off? [02:58:34] ori: definitely good for now, but do we know if it affects performance yet? [02:58:55] no, i've been swamped and haven't gotten to it [02:59:29] (03CR) 10Jackmcbarn: [C: 031] "I like this for now, but in the future, I'd like to see it always enabled (unless we discover that it is bad for performance)." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164750 (owner: 10Ori.livneh) [02:59:55] (03PS2) 10Ori.livneh: Enable LuaSandbox profiling when `forceprofile` is true [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164750 [03:00:02] (03CR) 10Ori.livneh: [C: 032] Enable LuaSandbox profiling when `forceprofile` is true [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164750 (owner: 10Ori.livneh) [03:00:09] (03Merged) 10jenkins-bot: Enable LuaSandbox profiling when `forceprofile` is true [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164750 (owner: 10Ori.livneh) [03:01:08] ori: one other thing maybe for a follow-up. i think we should have it always enabled on jobrunners, since they sometimes exhibit weird behavior that can't be reproduced on the user-facing servers [03:01:59] !log ori Synchronized wmf-config/CommonSettings.php: I707b5754: Enable LuaSandbox profiling when is true (duration: 00m 07s) [03:02:05] Logged the message, Master [03:02:09] jackmcbarn: sure, +1 [03:02:38] ori: i notice that the cpu limit doubler patch got reverted though, since its logic for detecting jobrunners didn't work. do we have a good way to do that? [03:19:55] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Oct 5 03:19:55 UTC 2014 (duration 19m 53s) [03:20:04] Logged the message, Master [05:48:01] Carmela: just the tip. [05:48:46] No MSG. [05:49:07] Heh. 
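The change merged above gates LuaSandbox profiling behind the `forceprofile` request flag, so only requests that explicitly ask for profiling pay the overhead. A minimal sketch of what such a conditional could look like in wmf-config/CommonSettings.php — the Scribunto option name and the sampling period below are assumptions for illustration, not the actual I707b5754 diff:

```php
// Sketch only: turn on Scribunto LuaSandbox profiling when the request
// carries forceprofile, so ordinary page views run without profiling.
// The 'profilerPeriod' knob and the 2 ms period are assumed values.
if ( isset( $_GET['forceprofile'] ) || isset( $_POST['forceprofile'] ) ) {
	$wgScribuntoEngineConf['luasandbox']['profilerPeriod'] = 2;
}
```

jackmcbarn's follow-up idea — leaving it always on for jobrunners — would amount to dropping the conditional for that server group, which hinges on the open question at the end of the exchange: a reliable way to detect jobrunners in configuration.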
[06:28:37] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: puppet fail [06:29:27] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:30] PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:46] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:46] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:47] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 2 failures [06:29:47] PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:56] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:56] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 3 failures [06:29:57] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:57] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:57] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:57] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:06] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:06] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:06] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:07] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:17] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:17] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:27] PROBLEM - puppet last run on db1021 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:47] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [06:44:53] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [06:44:58] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [06:45:33] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [06:45:38] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [06:45:38] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [06:45:48] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [06:45:49] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:45:49] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [06:45:49] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [06:45:50] RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [06:46:00] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:46:20] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [06:46:20] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is 
currently enabled, last run 9 seconds ago with 0 failures [06:46:20] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [06:46:21] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:46:28] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [06:46:38] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [06:46:48] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:46:50] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [06:46:59] RECOVERY - Ubuntu mirror in sync with upstream on carbon is OK: /srv/ubuntu/project/trace/carbon.wikimedia.org is over 0 hours old. [06:47:18] RECOVERY - puppet last run on db1023 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [06:47:19] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [08:03:33] (03PS25) 10Catrope: Citoid puppetization [puppet] - 10https://gerrit.wikimedia.org/r/163068 [08:20:12] (03PS26) 10Catrope: Citoid puppetization [puppet] - 10https://gerrit.wikimedia.org/r/163068 [08:20:14] (03PS1) 10Catrope: Add citoid module to sca1001 and sca1002 [puppet] - 10https://gerrit.wikimedia.org/r/164758 [08:20:16] (03PS1) 10Catrope: Add LVS for citoid [puppet] - 10https://gerrit.wikimedia.org/r/164759 [11:00:45] (03PS1) 10QChris: Ensure that the namenode directory exists before starting the namenode [puppet/cdh] - 10https://gerrit.wikimedia.org/r/164761 [11:01:52] (03PS1) 10QChris: Declare namenode directory only once [puppet] - 10https://gerrit.wikimedia.org/r/164762 [11:01:54] (03PS1) 10QChris: Declare datanode's mount directories only once [puppet] - 10https://gerrit.wikimedia.org/r/164763 [11:22:00] RECOVERY - Host ns1-v4 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [11:23:05] !log adding static route for ns1 to rubidium (ns0) on cr1-eqiad to temporarily redirect its traffic while the codfw is offline [11:23:13] Logged the message, Master [11:32:00] RECOVERY - Host db2012 is UP: PING OK - Packet loss = 0%, RTA = 52.56 ms [11:32:00] RECOVERY - Host db2002 is UP: PING OK - Packet loss = 0%, RTA = 52.71 ms [11:32:00] RECOVERY - Host lvs2004 is UP: PING OK - Packet loss = 0%, RTA = 52.97 ms [11:32:00] RECOVERY - Host ms-be2006 is UP: PING OK - Packet loss = 0%, RTA = 53.11 ms [11:32:00] RECOVERY - Host db2011 is UP: PING OK - Packet loss = 0%, RTA = 53.13 ms [11:33:40] PROBLEM - Host ps1-d1-pmtpa is DOWN: CRITICAL - Plugin timed out after 15 seconds [11:33:40] PROBLEM - Host ps1-d3-pmtpa is DOWN: CRITICAL - Plugin timed out after 15 seconds [11:33:40] PROBLEM - Host ps1-c1-pmtpa is DOWN: CRITICAL - Plugin timed out after 15 seconds [11:33:40] PROBLEM - Host ps1-d2-pmtpa is DOWN: CRITICAL - Plugin timed out after 15 seconds [11:33:40] PROBLEM - Host ps1-c3-pmtpa is DOWN: CRITICAL - Plugin timed out after 15 seconds [11:33:50] PROBLEM - Host ps1-c2-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [11:34:00] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: puppet fail [11:34:00] PROBLEM - puppet last run on db2009 is CRITICAL: CRITICAL: puppet fail [11:34:00] PROBLEM - puppet last run on db2038 is CRITICAL: CRITICAL: puppet fail [11:34:09] 
PROBLEM - puppet last run on db2034 is CRITICAL: CRITICAL: puppet fail [11:34:19] PROBLEM - NTP peers on achernar is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [11:34:39] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: puppet fail [11:34:39] PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: puppet fail [11:34:39] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: puppet fail [11:34:39] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: puppet fail [11:34:40] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: puppet fail [11:34:40] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: puppet fail [11:34:40] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: puppet fail [11:34:41] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: puppet fail [11:34:41] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: puppet fail [11:34:49] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: puppet fail [11:34:49] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: puppet fail [11:34:50] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: puppet fail [11:34:50] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: puppet fail [11:35:00] PROBLEM - puppet last run on db2005 is CRITICAL: CRITICAL: puppet fail [11:35:11] PROBLEM - puppet last run on ms-be2003 is CRITICAL: CRITICAL: puppet fail [11:35:11] PROBLEM - puppet last run on db2002 is CRITICAL: CRITICAL: puppet fail [11:35:12] PROBLEM - puppet last run on db2017 is CRITICAL: CRITICAL: puppet fail [11:35:12] PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: puppet fail [11:35:12] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: puppet fail [11:35:12] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: puppet fail [11:35:12] PROBLEM - puppet last run on db2033 is CRITICAL: CRITICAL: puppet fail [11:35:12] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: puppet fail [11:35:13] PROBLEM - puppet last run on pollux is CRITICAL: CRITICAL: puppet fail [11:35:13] PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: puppet fail [11:35:14] PROBLEM - puppet last run on lvs2005 is CRITICAL: CRITICAL: puppet fail [11:35:14] PROBLEM - puppet last run on db2019 is CRITICAL: CRITICAL: puppet fail [11:35:15] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: puppet fail [11:35:22] PROBLEM - puppet last run on labcontrol2001 is CRITICAL: CRITICAL: puppet fail [11:35:23] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: puppet fail [11:35:23] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: puppet fail [11:35:42] RECOVERY - puppet last run on achernar is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [11:35:54] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [11:36:12] RECOVERY - puppet last run on bast2001 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [11:37:03] RECOVERY - puppet last run on db2017 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [11:37:03] RECOVERY - puppet last run on pollux is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [11:37:23] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [11:38:12] RECOVERY - puppet last run on lvs2005 is OK: OK: Puppet is currently enabled, 
last run 28 seconds ago with 0 failures [11:38:42] RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [11:39:54] RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [11:40:14] RECOVERY - puppet last run on db2033 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [11:41:18] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [11:42:11] RECOVERY - puppet last run on ms-be2002 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [11:42:50] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [11:43:34] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [11:43:34] RECOVERY - puppet last run on db2009 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [11:43:38] RECOVERY - puppet last run on db2034 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [11:43:58] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [11:44:12] RECOVERY - puppet last run on db2005 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [11:44:13] RECOVERY - puppet last run on db2002 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [11:44:28] RECOVERY - puppet last run on db2019 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [11:45:00] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [11:45:29] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [11:46:38] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [11:47:09] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [11:47:38] RECOVERY - puppet last run on labcontrol2001 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [11:47:38] RECOVERY - puppet last run on db2038 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [11:48:28] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [11:48:59] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [11:50:18] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [11:50:19] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [11:50:19] RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [11:50:38] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [11:51:19] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [11:51:59] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [11:54:59] 
RECOVERY - Host ps1-d2-pmtpa is UP: PING WARNING - Packet loss = 93%, RTA = 30.34 ms [11:54:59] RECOVERY - Host ps1-c3-pmtpa is UP: PING WARNING - Packet loss = 93%, RTA = 29.33 ms [11:54:59] RECOVERY - Host ps1-d3-pmtpa is UP: PING WARNING - Packet loss = 93%, RTA = 39.98 ms [11:54:59] RECOVERY - Host ps1-d1-pmtpa is UP: PING WARNING - Packet loss = 93%, RTA = 37.28 ms [11:54:59] RECOVERY - Host ps1-c1-pmtpa is UP: PING WARNING - Packet loss = 93%, RTA = 29.61 ms [11:55:14] RECOVERY - Host ps1-c2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 30.00 ms [11:56:18] PROBLEM - Host achernar is DOWN: CRITICAL - Time to live exceeded (208.80.153.42) [11:56:28] PROBLEM - Host install2001 is DOWN: CRITICAL - Time to live exceeded (208.80.153.4) [11:56:38] PROBLEM - Host acamar is DOWN: CRITICAL - Time to live exceeded (208.80.153.12) [11:56:38] PROBLEM - Host labcontrol2001 is DOWN: CRITICAL - Time to live exceeded (208.80.153.14) [11:56:38] PROBLEM - Host bast2001 is DOWN: CRITICAL - Time to live exceeded (208.80.153.5) [11:56:38] PROBLEM - Host pollux is DOWN: CRITICAL - Time to live exceeded (208.80.153.43) [11:56:49] PROBLEM - Host baham is DOWN: CRITICAL - Time to live exceeded (208.80.153.13) [11:56:49] PROBLEM - Host ns1-v6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::e [11:56:59] PROBLEM - Host ms-be2008 is DOWN: PING CRITICAL - Packet loss = 100% [11:57:00] PROBLEM - Host ms-be2001 is DOWN: PING CRITICAL - Packet loss = 100% [11:57:00] PROBLEM - Host db2033 is DOWN: PING CRITICAL - Packet loss = 100% [11:57:00] PROBLEM - Host db2037 is DOWN: PING CRITICAL - Packet loss = 100% [11:57:00] PROBLEM - Host db2039 is DOWN: PING CRITICAL - Packet loss = 100% [11:59:28] PROBLEM - Host cr1-codfw is DOWN: CRITICAL - Time to live exceeded (208.80.153.192) [11:59:39] PROBLEM - Host cr2-codfw is DOWN: CRITICAL - Time to live exceeded (208.80.153.193) [12:04:46] (03CR) 10Krinkle: "@Hashar: I'm aware. That's why I'm using 94, not 99. 
I'm still work-in-progress on this, but I'll probably end up using "xvdb-run --auto-s" [puppet] - 10https://gerrit.wikimedia.org/r/163791 (owner: 10Krinkle) [12:11:13] RECOVERY - Host pollux is UP: PING OK - Packet loss = 0%, RTA = 52.11 ms [12:11:13] RECOVERY - Host db2017 is UP: PING OK - Packet loss = 0%, RTA = 52.05 ms [12:11:13] RECOVERY - Host db2007 is UP: PING OK - Packet loss = 0%, RTA = 52.02 ms [12:11:13] RECOVERY - Host ms-be2003 is UP: PING OK - Packet loss = 0%, RTA = 52.06 ms [12:11:13] RECOVERY - Host ms-be2005 is UP: PING OK - Packet loss = 0%, RTA = 52.19 ms [12:12:53] PROBLEM - Host ps1-d3-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [12:12:53] PROBLEM - Host ps1-d1-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [12:12:53] PROBLEM - Host ps1-c3-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [12:12:53] PROBLEM - Host ps1-c1-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [12:13:14] PROBLEM - Host ps1-c2-pmtpa is DOWN: CRITICAL - Plugin timed out after 15 seconds [12:13:14] PROBLEM - Host ps1-d2-pmtpa is DOWN: CRITICAL - Plugin timed out after 15 seconds [12:13:23] PROBLEM - puppet last run on db2038 is CRITICAL: CRITICAL: puppet fail [12:13:24] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: puppet fail [12:13:33] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: puppet fail [12:13:33] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: puppet fail [12:13:34] PROBLEM - puppet last run on lvs2001 is CRITICAL: CRITICAL: puppet fail [12:13:43] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: puppet fail [12:13:53] PROBLEM - puppet last run on lvs2005 is CRITICAL: CRITICAL: puppet fail [12:13:59] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: puppet fail [12:13:59] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: puppet fail [12:13:59] PROBLEM - puppet last run on ms-be2003 is CRITICAL: CRITICAL: puppet fail [12:13:59] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: puppet fail [12:13:59] PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: puppet fail [12:13:59] PROBLEM - puppet last run on db2005 is CRITICAL: CRITICAL: puppet fail [12:13:59] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: puppet fail [12:14:02] PROBLEM - puppet last run on db2034 is CRITICAL: CRITICAL: puppet fail [12:14:03] PROBLEM - puppet last run on db2019 is CRITICAL: CRITICAL: puppet fail [12:14:03] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: puppet fail [12:14:03] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: puppet fail [12:14:03] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: puppet fail [12:14:13] PROBLEM - puppet last run on pollux is CRITICAL: CRITICAL: puppet fail [12:14:13] PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: puppet fail [12:14:14] PROBLEM - puppet last run on db2002 is CRITICAL: CRITICAL: puppet fail [12:14:14] PROBLEM - puppet last run on labcontrol2001 is CRITICAL: CRITICAL: puppet fail [12:14:14] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: puppet fail [12:14:14] PROBLEM - puppet last run on db2033 is CRITICAL: CRITICAL: puppet fail [12:14:14] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: puppet fail [12:14:14] PROBLEM - puppet last run on db2017 is CRITICAL: CRITICAL: puppet fail [12:14:14] PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: puppet fail [12:14:15] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: puppet fail [12:14:23] PROBLEM - 
puppet last run on db2009 is CRITICAL: CRITICAL: puppet fail [12:14:24] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: puppet fail [12:14:24] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: puppet fail [12:14:54] RECOVERY - Host cr1-codfw is UP: PING OK - Packet loss = 0%, RTA = 56.20 ms [12:15:03] RECOVERY - Host cr2-codfw is UP: PING OK - Packet loss = 0%, RTA = 52.76 ms [12:16:21] (03CR) 10Krinkle: "Note that live-1.5 is still actively used by lots of w/ symlinks in docroot/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162520 (owner: 10MaxSem) [12:16:24] RECOVERY - puppet last run on bast2001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [12:17:25] RECOVERY - puppet last run on db2017 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [12:17:44] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [12:18:27] RECOVERY - puppet last run on pollux is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [12:19:06] RECOVERY - puppet last run on lvs2005 is OK: OK: Puppet is currently enabled, last run 64 seconds ago with 0 failures [12:19:06] RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [12:19:24] RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [12:19:44] RECOVERY - puppet last run on db2033 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [12:20:41] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [12:22:51] RECOVERY - puppet last run on ms-be2002 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [12:23:22] RECOVERY - puppet last run on db2005 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [12:23:25] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [12:23:36] RECOVERY - puppet last run on db2034 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [12:23:51] RECOVERY - puppet last run on db2009 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [12:24:01] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [12:24:32] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [12:24:41] RECOVERY - puppet last run on db2019 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [12:24:44] RECOVERY - puppet last run on db2002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [12:25:31] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [12:26:01] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [12:27:01] RECOVERY - puppet last run on labcontrol2001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [12:27:12] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [12:27:22] RECOVERY - puppet last run on lvs2001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 
failures [12:28:05] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [12:28:22] RECOVERY - puppet last run on db2038 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [12:28:22] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [12:29:52] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [12:29:52] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [12:29:52] RECOVERY - NTP peers on achernar is OK: NTP OK: Offset -0.009409 secs [12:30:11] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [12:30:42] RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [12:31:31] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [12:31:32] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [12:37:13] (03PS1) 10Calak: Prevent search engines from indexing user pages and all talk pages on ckbwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164766 (https://bugzilla.wikimedia.org/71663) [15:22:40] (03PS1) 10Calak: Add namespace alias on fa.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164773 (https://bugzilla.wikimedia.org/71668) [15:27:54] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [15:34:24] https://gerrit.wikimedia.org/r/#/c/164773/1/wmf-config/InitialiseSettings.php gerrit is not good at rtl it seems [15:34:35] (see the 'عن' => 106, part) [15:37:33] (03PS2) 10Calak: Enable Echo for Persian Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164491 (https://bugzilla.wikimedia.org/71669) (owner: 10Reza) [15:40:14] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [16:39:51] !log restore ns1 routing to codfw [16:39:58] Logged the message, Master [17:48:49] RECOVERY - Host ps1-d3-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 38.12 ms [17:48:58] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 224, down: 0, dormant: 0, excluded: 0, unused: 0 [17:49:02] RECOVERY - Host ps1-c1-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 38.36 ms [17:49:02] RECOVERY - Host ps1-c2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 33.99 ms [17:49:02] RECOVERY - Host ps1-d2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 35.57 ms [17:49:02] RECOVERY - Host ps1-c3-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 35.74 ms [17:49:19] RECOVERY - Host ps1-d1-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 33.32 ms [18:14:29] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0] [18:32:00] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [19:11:18] (03PS2) 10Calak: Add namespace alias on fa.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164773 (https://bugzilla.wikimedia.org/71668) [19:22:32] legoktm: hi! around ? [19:23:01] hey [19:23:51] great. any thoughts on when we can schedule to get bouncehandler isntalled in prod ? 
[19:24:20] I meant the wiki side -- as per https://gerrit.wikimedia.org/r/#/c/155753/49/manifests/role/mail.pp we planned to get that into loginwiki [19:24:53] hmm [19:25:14] I think hoo pointed that we use loginwiki since thats where we have most of them [19:25:56] It's the Wiki i'd choose, but we certainly need to make sure stuff isn't going to go wrong if a user doesn't exist there [19:26:06] probably this is blocked by SULF [19:26:14] SULF ? [19:26:19] SUL finalisation [19:26:26] (https://www.mediawiki.org/wiki/SUL_finalisation) [19:26:30] but you probably wont want to wait... so bind against CA to check accounts match? [19:26:39] (If you haven't yet, dunno) [19:27:23] it already tries to use CA if possible [19:27:40] I am fetching the code. one sec [19:27:59] https://github.com/wikimedia/mediawiki-extensions-BounceHandler/blob/master/includes/BounceHandlerActions.php#L79 [19:28:01] Ok... what if we have local user enwiki:foo and that one gets an emailed bounced [19:28:16] but loginwiki:foo (global account) belongs to somebody else [19:28:23] https://www.mediawiki.org/wiki/SUL_finalisation [19:28:24] err [19:28:26] $caUser = CentralAuthUser::getInstance( $user ); [19:28:27] if ( $caUser->isAttached( $this->wikiId ) ) { [19:28:33] that's in BounceHandlerActions [19:29:07] legoktm: that's guards against the other way round (not attached on loginwiki, but on enwiki) [19:29:07] the code is currently assuming if CA is enabled, it is finalised. Needs some tweaking to remove that assumption... [19:29:10] not very likely [19:29:12] hoo: that can happen ? multiple users with same ? [19:29:13] (for us, at least) [19:29:21] tonythomas: Right now, sadly, yes [19:29:33] oh ! [19:29:47] hoo: isn't that what we care about? that the account is attached on enwiki? [19:29:51] legoktm: there is an else case there though [19:30:09] legoktm: both accounts need to be attached... loginwiki and enwiki [19:30:21] and we need to check that before taking action [19:30:40] hoo: anyawy, the first phase will be only log based though [19:30:58] $wgBounceHandlerUnconfirmUsers = false; [19:30:58] still should be tested [19:31:06] why does loginwiki attachment matter? [19:31:07] we don't want this to fatal or something in production [19:31:41] legoktm: $this->wikiId is not the id of the current wiki (wfWikiId()), but the one where the bounce came from? [19:31:58] yes [19:32:02] ohhhhh [19:32:04] hoo: Jeff and I found that the current labs design would make it difficult for us to test a webserver- mx mode [19:32:05] blareghad [19:32:14] I see now. [19:32:17] I meant 'beta' [19:32:30] tonythomas: What mode? [19:32:32] so we produced the same in labs - found it working alright [19:33:27] "to test a webserver- mx mode" [19:33:30] what is meant by that? [19:33:58] hoo: like we have bouncehandler installed in beta wiki - and we need the exim configurations to be placed on the mail server outlet of beta - which turns out to be polonium. [19:34:16] it sends mails through production? [19:34:19] remember our realm switches :( That wont happen -- as polonium is always configured in prodcution mode [19:34:31] Ok, I didn't expect that [19:34:55] even if it dont - the email should bounce back from a remote mx right ? it digs for the wikimedia domain and hits to polonium [19:35:01] and gets stuck there [19:36:09] tonythomas: I see the problem and can't really come up with a fix offhand [19:36:24] (despite of moving that away from polonium into a labs instance) [19:37:12] yeah. 
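hoo's concern above is that, before SUL finalisation, a bounce received for enwiki:Foo could be attributed to a different person who owns loginwiki:Foo. A rough sketch of the "attached on both wikis" guard being proposed, reusing the BounceHandlerActions calls quoted above — the action and helper names and the exact CentralAuth signatures are assumptions, not the deployed code:

```php
// Sketch of the extra guard discussed above: only act on a bounce if the
// same global account is attached both on the wiki the bounce came from
// ($this->wikiId) and on the wiki doing the processing (loginwiki).
$caUser = CentralAuthUser::getInstance( $user );
if ( $caUser->isAttached( $this->wikiId ) && $caUser->isAttached( wfWikiID() ) ) {
	// Hypothetical action; in the first phase this would only log,
	// since $wgBounceHandlerUnconfirmUsers is false.
	$this->handleFailingRecipient( $user );
} else {
	// Local names collide across wikis: never unconfirm, just record it.
	wfDebugLog( 'BounceHandler',
		'Skipping ' . $user->getName() . ': account not attached on both wikis' );
}
```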
we did that :) [19:37:20] Awesome [19:37:24] and its working fine there [19:37:33] the bounces gets registered into the db [19:37:48] and the user gets un-subscribed on exceeding the limit too [19:38:58] hoo: the difference - as Jeff tellls is between dig mx wikimedia.org and for beta emails dig mx deployment.wikimedia.beta.wmflabs.org [19:40:20] tonythomas: Ok, but how is that a problem? [19:40:42] polonium or lead will do it for prod. and whatever instance does it for beta [19:42:31] hoo: once the remote gmail mx that produce the bounce search for mx of deployment.wikimedia.beta.wmflabs.org and wont be able to find the mx [19:42:47] as you can see from the terminal output -- there is no mx shown afaik [19:42:48] because there's none set [19:42:53] yep [19:43:23] but if you still bounce to whatever the mx should be, it works [19:43:24] ? [19:44:11] I think the remote mx should lookup for the failing domain and pass the bounce to the mx of that domain [19:44:20] Yes [19:44:33] so... someone would need to update beta's DNS [19:44:34] that step should fail, if the remote mx is not able to find [19:44:41] it will [19:45:04] hoo: that would be great. any idea where the beta emails go through ? [19:45:07] the mail server ? [19:47:04] yeah. it goes through polonium [19:47:12] I just tested with a test email [19:47:55] hoo: https://dpaste.de/h7mb#L20,21 [19:48:47] and since its return address is wiki-deploymentwiki-blah-@deployment.wikimedia.beta.wmflabs.org the remote mx fails to route the email back to polonium. [19:49:13] tonythomas: I guess it could still bounce to something else, if the dns for deployment.wikimedia.beta.wmflabs.org were ok [19:49:22] ok as in set up for that [19:50:28] yeah. if deployment.wikimedia.beta.wmflabs.org would resolve to show up polonium and we have the role { labs } configuration in polonium -- then this should work [19:51:08] no, that shouldn't go via polonium [19:51:17] in fact it can't (w/o messing a lot of other stuffs) [19:51:22] the curl also wont work for exampel [19:51:27] * example [19:52:26] curl wont work ? [19:53:05] * tonythomas wish we had a testwiki in production :D [19:53:44] tonythomas: polinium wont be able to connect to the beta instances [19:53:59] not even via http ? [19:54:34] Probably not, no [19:54:49] (03PS4) 10Physikerwelt: Re-enable all Math modes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/139421 (https://bugzilla.wikimedia.org/66587) (owner: 10Reedy) [19:54:52] it can only connect to machines within the production cluster [19:55:02] (I guess, but it's probably that way) [19:55:20] hoo: that makes it almost impossible to test on beta :( [19:55:44] (03PS5) 10Physikerwelt: Re-enable all Math modes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/139421 (https://bugzilla.wikimedia.org/66587) (owner: 10Reedy) [19:55:45] No, just create a polonium-equivalent in beta and use that [19:55:50] I don't know what would block that [19:56:01] except that it's quite some work [19:56:05] maybe [19:56:11] hoo: yeah. that would be great. [19:56:29] and make beta be mail serve-d through that one ? [19:56:42] not necessary outgoing mail, I guess [19:56:49] but it should at least be in the return path [19:57:07] yeah. we will need the mx records for that hostname [19:57:43] so that someone looking up for deployment.wikimedia.beta.wmflabs.org should find that mx [19:57:47] I guess that can be done... virt1000 is the dns server for beta [19:57:54] yeah. 
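The routing gap being described is purely a DNS one: the remote MX that generated the bounce looks up an MX record for the return-path domain, and for the beta domain there is nothing to find. A small illustration of that lookup (only the presence or absence of records matters; the counts are what the conversation reports, not guaranteed values):

```php
// Illustrates the failure mode discussed above: the production domain
// publishes MX records (handled by polonium/lead per the discussion),
// while the beta return-path domain publishes none, so a remote MX has
// nowhere to deliver the bounce.
$prodMx = dns_get_record( 'wikimedia.org', DNS_MX ) ?: array();
$betaMx = dns_get_record( 'deployment.wikimedia.beta.wmflabs.org', DNS_MX ) ?: array();

echo "production MX records: " . count( $prodMx ) . "\n"; // > 0: bounces route back
echo "beta MX records: " . count( $betaMx ) . "\n";       // 0 at the time of this log: bounces are lost
```

The fix sketched in the conversation — an MX record for deployment.wikimedia.beta.wmflabs.org pointing at a polonium-equivalent inside beta, served from virt1000 — would make the second lookup succeed.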
[19:57:55] maybe that can even be done via wikitech [19:57:56] no idea [19:58:18] hoo: any idea who I should ping on a Sunday ? [19:58:59] Probably no one [19:59:19] hmm. sundays :| [19:59:54] we actually discussed about this earlier though - thought it would be difficult - and thought of limitting our tests only to labs [20:01:04] That's not up to me to decide... if Jeff is ok with that, maybe [20:01:46] so things to do: 1) Get the extension into prod, 2) Configure it for prod, 3) Set up the exim in prod to make use of the extension [20:02:07] 1. Step needs someone to say it's ok to ge tthat extension deployed (greg?) [20:02:13] true. Configuring is done - I think [20:02:26] 1) is the tough job [20:02:53] its been through the sec-review before getting into beta - but no other reviews [20:03:19] Nemo_bis: was going through the bugs today. [20:03:54] tonythomas: Ok, so I guess it needs the perf. one and then you can poke greg [20:04:46] yeah. I will add that in the bug [20:20:32] tonythomas, I would be interested in knowing how many emails we send daily on average [20:21:06] We have the exim stats in ganglia but I never undestood how "true" they are [20:21:45] Nemo_bis: few hundreds in a minute ? [20:22:17] exim stats say something like a hundred per second [20:22:47] oh. [20:23:10] we where discussing on the possbility of having a polonium-equivallent in beta [20:23:19] so that we can test the extension happily [20:27:03] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [20:53:30] (03CR) 10Ebrahim: [C: 031] Enable Echo for Persian Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164491 (https://bugzilla.wikimedia.org/71669) (owner: 10Reza) [20:54:29] (03CR) 10Ebrahim: [C: 031] Add namespace alias on fa.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164773 (https://bugzilla.wikimedia.org/71668) (owner: 10Calak) [20:54:39] (03CR) 10Ebrahim: [C: 031] Prevent search engines from indexing user pages and all talk pages on ckbwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164766 (https://bugzilla.wikimedia.org/71663) (owner: 10Calak) [20:59:49] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [21:14:16] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [21:14:56] hey ops, wikpedia: timeout on accessing wikipedia (via esams) [21:15:44] looks fine over here [21:16:35] hm, my traceroute/ping shows only one response of 10 [21:17:13] 100% over 15 packets here [21:17:17] ho se4598_2 having same problem when try to visit dewiki and enwiki [21:17:28] I'd blame your or your ISPs connectivity [21:17:37] What ISPs do you have? [21:17:54] DTAG mobile and wire connection, germany [21:18:16] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [21:18:22] DTAG mobile works for me as well [21:18:33] Unitymedia, Germany [21:18:33] mw.org, too :/ [21:18:39] wow, all german :D [21:18:40] two tracerts: http://pastebin.com/raw.php?i=0yz7zLJF [21:19:10] * FlorianSW waiting for tracert... [21:19:39] maybe the route via telia has a problem/is overloaded? [21:20:25] se4598_2 my goes over adm-b5-link.telia.net too... still waiting for finish :) [21:21:14] FlorianSW, yeah, I canceled my second route, b/c clearly beyond finish. first stopped normally [21:21:28] hoo: have you a tracert? 
[21:21:40] se4598_2 i hope it finish sometimes ;) But i think no :( [21:22:00] FlorianSW: I do, but am busy atm [21:22:25] hoo ok, just to can compare, bc it's working for you :) [21:22:54] all the wikis appear to have fallen over for me as well [21:23:00] !log Bypassed Wikibase restrictions and set https://www.wikidata.org/wiki/Q183 back to old serialization format [21:23:00] * FlorianSW wait's the last 5 hops [21:23:05] Logged the message, Master [21:23:08] aude: ^ [21:23:11] bits cache network drop, see https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Bits+caches+esams&m=cpu_report&s=by+name&mc=2&g=network_report [21:23:23] Coren? ^ [21:23:37] op on duty? [21:23:39] Maybe it should be superprotected [21:23:43] (for real) [21:23:46] yeah, something is up [21:23:48] se4598_2: http://pastebin.com/raw.php?i=KLM2Ce4T [21:24:15] i pinged a couple of folks, will page if no one replies [21:24:26] looks like the same as se4598_2's one [21:24:28] se4598_2: seems to be recovering [21:24:34] just as we speak, graph goes up [21:24:43] ori, recovering for me [21:25:08] ori same for me (at least on dewiki) [21:27:24] oh ffs [21:27:30] zend segfault [21:27:33] arrrg [21:28:04] yeah [21:28:17] hoo: what's the story with https://www.wikidata.org/wiki/Q183 ? [21:28:39] ori: It's to large for our new serialization format [21:28:46] it kind of worked in the old [21:28:50] but it's still awry [21:28:59] and it's causing mayor troubles [21:29:06] (also in the Wikipedias) [21:29:14] what sort of major troubles? [21:29:24] ori: Pages can't be edited/ re-parsed [21:29:28] articles that reference it not being editable :D [21:29:46] and watchlists fataling and "fun" like that [21:30:45] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [21:32:42] hoo: is there a plan for fixing it? [21:33:10] ori: yes, but nothing we can do in even a week [21:33:17] so I'm just going to reset it [21:33:24] and maybe then super protect even [21:33:45] would that make pages that reference it editable again? [21:33:47] as every edit will kill it again (if an edit makes it through) [21:33:50] ori: Yes [21:34:02] cool; do you need any help from me? [21:35:09] No, I don't think so [21:35:22] I guess these are the segfaults we saw before [21:35:35] so I'll just search for a revision that is so much older [21:35:45] one that renders [21:36:07] when stuff is ok again, we can revert back [21:41:04] * hoo cries [21:41:12] stupid php 5.3 [21:42:10] now I found a revision that doesn't segfault... but it's oom [21:42:22] * hoo goes back further [21:42:47] I found one [21:42:53] going to hell for that [21:59:48] Bleh, why is it shit hits fans when I'm eating? [22:00:31] 'nything I can do to help? [22:00:31] Coren: because you do such a good job, life feels the need to throw stuff at you when you're not here? [22:01:08] * Coren reads backlog. [22:01:14] I have +staff if you need to superprotect. [22:01:48] * Coren idly wonders if that segfaults hhvm too. [22:01:48] Coren: I guess that would be handy... I could also give my real name account +sysadmin, I guess? [22:02:04] I don't think +sysadmin has superprotect. [22:02:14] It does not last I checked. [22:02:15] but i can arrange that [22:02:24] can we just make sure admins don't touch it? [22:02:31] legoktm: Ok, will do [22:02:35] legoktm: That's what superprotect /does/ [22:02:38] using superprotect is just going to kick off the drahmaz [22:02:50] blehr [22:02:55] Coren: I meant socially, not technically. 
[22:03:04] legoktm: Not with an explanation. "Breats wikimedia because bug. Don't touch until bug is fix. ktxbai. [22:03:29] Coren: people will still get upset over it :P [22:03:40] I don't think so [22:03:42] super protect + wikidata item for Germany == web rage :( [22:03:52] bd808: Do you think so? [22:03:59] This would be the best case ever :p [22:04:09] No, seriously, nobody is issane enough to not understand that's a bug that breaks the wiki. [22:04:15] hoo: I don't think so [22:04:16] yeah [22:04:17] yeah [22:04:19] Super protect is first used on dewiki - second use - protecting the 'German' article on Wikidata! :o [22:04:30] *Germany [22:04:31] it's an unfortunate coincidence and that's it [22:04:37] JohnLewis: editing is waas broken for the last 3 months [22:04:49] Coren: haha, you're underestimating them [22:05:09] Will someone do it, or shall I do it myself [22:05:09] ori: we know it is - but the users won't :p [22:05:22] (I don't really mind much doing it myself) [22:05:23] hoo: urg yeah true [22:05:28] MatmaRex: I'm honestly not worried. I have a lot of tech cred with the dewiki folks; if I say "this is technical, things will break if this is touched" they'll beleive me. [22:05:28] but if hoo does it himself as a volunteer sysadmin, people will be less upset since it's not "omg WMF evil" [22:05:36] Coren: have you seen the patch that fixed the order of protection options after superprotect was introduced, that got four -1's from random "community members"? [22:06:05] MatmaRex: not random - users who are dewiki edits I believe [22:06:10] that was hilarious [22:06:19] hoo: Is the serialization bug something that can be fixed quicker if you get some help? [22:06:50] Coren: sure, but that's probably not the folks who are going to be making a fuss :> [22:06:52] bd808: Not sure what kind of help that would be [22:06:58] But, we need a solution now [22:07:08] if some admin edits in accident, we're screwed again [22:07:15] (i'd still superprotect that wikidata page, i'd just follow that up with getting some popcorn) [22:07:24] I'm about to superprotect with "Editing this item will break mediawiki because of a crash-causing bug; this is a temporary safeguard until the bug is fixed." [22:07:33] if protecting this protect us - I'm in favour of it. [22:07:38] Coren: That would be awesome [22:07:39] Coren: link to the bug number [22:07:46] MatmaRex: Ref? [22:07:48] Bug number is in the edit history [22:07:48] or link to something [22:07:52] (well one of the many) [22:07:55] https://bugzilla.wikimedia.org/show_bug.cgi?id=71519 [22:07:55] but link [22:07:56] it was also segfaulting [22:07:57] one sec [22:08:01] and running out of memory [22:08:04] couple of bugs [22:08:14] could be tracking bug about Q183 probs :S [22:08:17] yeah, 71519 is probably the best one [22:08:25] I'll also put a message on the talk page. [22:08:32] Add a "See [[bugzilla:71519]]", I suppose [22:08:38] And send an email to Philippe [22:08:46] I'll sign that off as Wikidata-Dev after [22:08:49] (Because I just use +staff) [22:09:14] Coren: I can also get +sysadmin on my non-community works acc. and then do it that way [22:09:20] not sure that's better process-wise [22:09:48] hoo: I'm ops on duty; I'me exactly the right person to intervene (and get any flak) [22:09:49] surely sysop-only is fine unless you can't trust wikidata admins to behave? [22:09:57] Coren: Ok, thanks [22:09:59] hey folks [22:10:00] wasup? 
[22:10:08] hmm [22:10:10] Lydia_WMDE: Will give you a summary in a bit [22:10:15] actually, protecting a page insert a null revision [22:10:25] will that revision use the old (good) or new (broken) format? [22:10:27] MC8: It'd be fine except that an error breaks the wiki. [22:10:29] hoo: I poked her about it since y'know :) [22:10:59] !log WD:Q183 was frozen on version 120566337, see bug 71519 (and others) [22:11:05] Logged the message, Master [22:11:09] That version is pretty old [22:11:26] but it was the first that actually worked w/o hitting the f... [22:11:32] why does this need superprotect? [22:11:47] Lydia_WMDE: If an admin edits, serialization changes again and it will fail again [22:12:01] ^ that which I was going to say in easier terms :p [22:12:04] don't forget to make yourself a userpage, Coren [22:12:23] Also we probably want to revert back to pre-trouble at some point [22:12:25] beurocrats then and tell them? [22:12:28] MC8: I'm doing the emergency communication now, I'll do so right after. [22:12:56] Coren: Ah, doh [22:13:06] Coren: I did it for you to help you :) [22:13:12] we froze it on an even older version now [22:13:18] but I don't think that's needed [22:13:36] (see the revision size as a estimate of the size) [22:13:42] hoo: That's not an issue - it's "somewhere in the past that doesn't break" only until the issue is fixed -- which revision is immaterial so long as it doesn't explode. [22:13:47] Ok [22:13:57] i disagree tbh [22:14:38] I would also favor to have it on 120566337 which should be ok-ish [22:14:52] if users on-wiki can't fix it it needs to be frozen at an acceptable revision at least [22:14:58] I only went back further because I thought it also fails, but that was a cache [22:16:04] hoo: do we know what the actual issue is? [22:16:16] Lydia_WMDE: About which of the bugs? [22:16:22] Lydia_WMDE: I'm more than happy to help the community find a better version, so long as things don't break. This is an emergency protection measure. [22:16:27] Coren: Ok [22:16:58] If you update the protection now, the revision you create will be the oldid 120566337 [22:17:03] hoo: all of the ones causing this protection [22:17:16] that is the newest I could find that doesn't make stuff go insane [22:17:55] Lydia_WMDE: Apparently the fact that the item can't be handled by neither client nor repo broke editing (and watchlists and ...) [22:18:09] ok [22:18:11] No idea why that happened now and not when this actually started coming up on Thursday [22:18:13] why can it not be handled? [22:18:23] Handled how? [22:18:29] Oh by PHP [22:18:39] well... mostly size [22:19:03] sigh [22:19:04] ok [22:19:05] New DataModel and new serialization is less performant then old one, that's why this broke [22:19:21] even some of the old serialized version were running OOM [22:19:22] ok can you write this all down in an email to tech? [22:19:35] Internal tech or public? [22:19:43] public is good i think [22:19:52] Ok [22:19:57] hoo: Can a preview show in advance if a revision would break or not? [22:20:01] hoo: thanks! [22:20:16] Coren: You can use &oldid with the revision and that should do it [22:20:23] but is not 100%, sadly [22:21:44] !log Updated gerrit's hooks-bugzilla to 6e1e659 (with hooks-its at a421db4) [22:21:50] Logged the message, Master [22:22:00] https://www.wikidata.org/w/index.php?title=Talk:Q183&diff=162221367&oldid=162220138 <-- please double check for accuracy? [22:22:29] Coren: Can you then please just do some kind of re-protection? 
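Superprotect is an ordinary MediaWiki restriction level that only one group holds the matching right for. A rough sketch of how such a level is typically wired up in site configuration — the group name and wiring below are assumptions for illustration, not the actual Wikimedia settings:

```php
// Sketch: register a restriction level and grant the matching right to a
// single group. Pages protected at this level (as Q183 is about to be)
// can no longer be edited even by ordinary sysops.
$wgRestrictionLevels[] = 'superprotect';
$wgGroupPermissions['staff']['superprotect'] = true; // 'staff' group assumed
```

Applying or updating such a protection through the normal protect action is also what creates the null revision hoo asks for just below.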
[22:22:34] I just need a new null-revsion [22:22:44] but obviously can't make a new one myself [22:22:44] hoo, kk [22:22:57] hoo: No. Q183 500s [22:23:05] Awesome [22:23:31] Is there a VP-equivalent on Wikidata? [22:23:37] VP? [22:23:44] Village Pump [22:23:45] WD:Project chat [22:23:48] oh sure [22:23:54] I guess Lydia_WMDE can post there? [22:23:58] salient only has one L btw [22:24:20] i'd rather not as i had no say in it and don't have all the details [22:24:32] Lydia_WMDE: Ok, fair point [22:24:45] I'm doing a post there. [22:25:26] I'm getting a 500 from https://www.wikidata.org/wiki/Q183 as logged in user with hhvm enabled. [22:25:39] bd808: HHVM doesn't exist on WD right now [22:26:04] what the heck [22:26:37] hoo: Can you roll back to the known good rev? [22:26:52] Coren: Yeah, will have to :S [22:27:08] Sorry for the further trouble [22:27:10] My view of https://www.wikidata.org/wiki/Special:Version says different. I have global js that is setting the cookie to use hhvm cluster. [22:27:19] bd808: We killed it [22:27:28] ask ori... it was causing to much pain [22:27:36] (03PS1) 10QChris: Update hooks-bugzilla to 6e1e659eedc8719a2a0ea0906266738a18c7aa42 [gerrit/plugins] - 10https://gerrit.wikimedia.org/r/164879 [22:27:43] hoo: I get Q183 again. Your doing? [22:27:49] hoo: what was? [22:27:51] !log Q183 is on revision 116786096 again, please don't alter this further! [22:27:55] Logged the message, Master [22:27:59] Ok, no more experiments [22:28:05] this one is good, so keep [22:28:06] it [22:28:22] !log Q183 superprotected as a safeguard [22:28:28] Logged the message, Master [22:28:41] ori: We disabled HHVM on Wikidata because of the huge amount of issues [22:28:53] There we go. I go finish my meal now, but I'm keeping an eye on the channel. Ping me if you need further help. [22:29:17] hoo: there was one issue, IIRC, and it was supposed to be fixed last tuesday, by a wikidata deploy [22:29:25] if there are a "huge amount of issues", you guys aren't filing bugs [22:29:43] ori: Well, there is more in the logs then we have on bugzilla I think [22:29:56] but I'm not sure [22:31:50] I don't know that there is any way right now to disable the hhvm cluster for a particular wiki. It would take a varnish patch to ignore the hhvm=true cookie when the hostname matched some blacklist. [22:31:51] HHVM isn't really optional -- we're in the middle of switching to it. it's OK to call a time-out to fix some issue but I expect some diligence with respect to reporting issues :/ [22:32:11] bd808: we disabled the beta feature; the code for the beta feature unsets the cookie onbeforepagedisplay [22:32:20] so your global script and wikimediaevents are duking it out on each page load [22:32:27] ah [22:32:30] ori: We've been doing a lot lately [22:32:47] aude will know the exact status [22:32:47] i know! [22:32:58] ok. i'm sure we'll work it out. [22:33:05] no stress, thanks for jumping on this issue. [22:35:52] (03PS1) 10QChris: Linkify Phabricator Task references in gerrit [puppet] - 10https://gerrit.wikimedia.org/r/164880 [22:45:05] (03PS1) 10QChris: Make gerrit set PATCH_TO_REVIEW status only in bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/164881 [22:47:10] * Coren is back. [22:47:44] error logs look okish [22:53:59] All of that said, this is just a dam over the flood; this bug is going to bite us in the ass with other items sooner rather than later. 
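Earlier in this exchange, bd808 and ori work out why bd808 still landed on the HHVM cluster: his global JS sets the hhvm=true routing cookie while the now-disabled beta feature clears it on every page view. A minimal sketch of a BeforePageDisplay handler doing that clearing — the hook name and WebResponse call are MediaWiki's, everything else (class, helper, cookie semantics) is an assumption, not the WikimediaEvents code:

```php
// Sketch of the behaviour ori describes: when the HHVM beta feature is off,
// expire the 'hhvm' cookie that Varnish uses to route a browser to the HHVM
// app servers, so stale opt-ins stop fighting scripts that set it.
class HhvmCookieHooks {
	public static function onBeforePageDisplay( OutputPage $out, Skin $skin ) {
		if ( !self::betaFeatureEnabled( $out->getUser() ) ) { // hypothetical check
			$out->getRequest()->response()->setcookie( 'hhvm', '', time() - 86400 );
		}
		return true;
	}

	private static function betaFeatureEnabled( User $user ) {
		return false; // placeholder: the real check would consult BetaFeatures
	}
}
$wgHooks['BeforePageDisplay'][] = 'HhvmCookieHooks::onBeforePageDisplay';
```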
[22:54:35] Yep [22:54:58] We had such troubles before (on smaller scale) [22:55:18] it's always solvable, but it requires major changes to a lot of layers so nothing we can do in a blink [23:01:04] Incidentally, if I'm going to be asked why "I don't trust admins", the reason is simple: It's not a question of trust but of ability to fix. [23:02:36] Yep... also protection in Wikibase is not well enough visible to avoid accidental changes (especially using scripts and the API) [23:02:39] which admins do [23:44:37] (03PS1) 10Ori.livneh: mediawiki: install `perf` on Trusty app servers [puppet] - 10https://gerrit.wikimedia.org/r/164883