[00:02:36] !log upgrading libtiff on imagescalers [00:02:42] Logged the message, Master [00:08:18] (03CR) 10Dzahn: [C: 032] "this isn't touching the original fenari version" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130600 (owner: 10Dzahn) [00:15:23] (03PS6) 10Dzahn: manutius: remove torrus [operations/puppet] - 10https://gerrit.wikimedia.org/r/130587 (owner: 10Matanya) [00:17:42] (03CR) 10Dzahn: [C: 04-1] "it's still torrus.wikimedia.org is an alias for manutius.wikimedia.org. making DNS change" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130587 (owner: 10Matanya) [00:22:08] (03PS1) 10Withoutaname: Reduce string URLs to defined constant [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131914 (https://bugzilla.wikimedia.org/48618) [00:25:38] (03PS1) 10Dzahn: switch torrus over to netmon1001 [operations/dns] - 10https://gerrit.wikimedia.org/r/131915 [00:26:56] (03CR) 10Dzahn: "only after Change-Id: I58090c5293" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130587 (owner: 10Matanya) [01:20:59] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 6 below the confidence bounds [02:12:26] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3788 MB (3% inode=99%): [02:15:47] !log LocalisationUpdate completed (1.24wmf2) at 2014-05-07 02:14:44+00:00 [02:15:57] Logged the message, Master [02:20:36] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3434 MB (3% inode=99%): [02:28:26] !log LocalisationUpdate completed (1.24wmf3) at 2014-05-07 02:27:23+00:00 [02:28:33] Logged the message, Master [03:12:43] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed May 7 03:11:36 UTC 2014 (duration 11m 35s) [03:12:49] Logged the message, Master [03:36:13] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Tue May 6 08:27:45 2014 [03:36:22] RECOVERY - Disk space on virt0 is OK: 
DISK OK [04:12:29] (03PS1) 10Ori.livneh: role::applicationserver [operations/puppet] - 10https://gerrit.wikimedia.org/r/131928 [04:13:05] (03PS2) 10Ori.livneh: Make mediawiki::jobrunner not include ::mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/131928 [05:00:20] (03PS3) 10Ori.livneh: Make mediawiki::jobrunner not include ::mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/131928 [05:00:22] (03PS1) 10Ori.livneh: Get rid of apache-graceful-all [operations/puppet] - 10https://gerrit.wikimedia.org/r/131931 [05:00:24] (03PS1) 10Ori.livneh: tidy mediawiki::jobrunner [operations/puppet] - 10https://gerrit.wikimedia.org/r/131932 [05:48:46] neon not responding [05:57:56] swapdeath [05:58:05] + oom killer heh [05:59:10] !log powercycled unresponsive neon, swapdeath + oom killer [05:59:18] Logged the message, Master [06:17:06] PROBLEM - MySQL Replication Heartbeat on db1049 is CRITICAL: CRIT replication delay 92029 seconds [06:22:35] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Tue May 6 08:27:45 2014 [06:24:28] (03PS2) 10Springle: Use m2-master CNAME to make DB master rotations neater. This allows a master switch to be a DNS change plus a simple port 3306 tcp redirect with socat until TTL. Should also help if we switch to a haproxy configuration in the future. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131420 [06:24:34] PROBLEM - MySQL Slave Delay on db1049 is CRITICAL: CRIT replication delay 90836 seconds [06:25:41] ACKNOWLEDGEMENT - MySQL Replication Heartbeat on db1049 is CRITICAL: CRIT replication delay 90313 seconds Sean Pringle Catching up after rebuild. - The acknowledgement expires at: 2014-05-09 06:25:10. [06:25:41] ACKNOWLEDGEMENT - MySQL Slave Delay on db1049 is CRITICAL: CRIT replication delay 90357 seconds Sean Pringle Catching up after rebuild. - The acknowledgement expires at: 2014-05-09 06:25:10. 
[06:40:52] argh [06:40:58] * springle kicks neon [06:42:20] (03PS1) 10Elvey: FixFileRedLinksBySettingwgUploadMissingFileUrlOnEn [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131933 [06:48:35] !log again [06:48:43] Logged the message, Master [06:55:55] PROBLEM - Puppet freshness on amssq52 is CRITICAL: Last successful Puppet run was Wed May 7 03:39:18 2014 [06:55:55] PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Last successful Puppet run was Wed May 7 03:39:19 2014 [06:55:55] PROBLEM - Puppet freshness on cp1046 is CRITICAL: Last successful Puppet run was Wed May 7 03:39:19 2014 [06:55:55] PROBLEM - Puppet freshness on cp1050 is CRITICAL: Last successful Puppet run was Wed May 7 03:39:19 2014 [06:55:55] PROBLEM - Puppet freshness on cp1063 is CRITICAL: Last successful Puppet run was Wed May 7 03:39:19 2014 [06:56:38] <_joe_> mh that isn't good [06:57:55] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Last successful Puppet run was Wed May 7 03:41:49 2014 [06:57:55] PROBLEM - Puppet freshness on dbstore1001 is CRITICAL: Last successful Puppet run was Wed May 7 03:41:24 2014 [06:57:55] PROBLEM - Puppet freshness on mc1007 is CRITICAL: Last successful Puppet run was Wed May 7 03:41:34 2014 [06:57:55] PROBLEM - Puppet freshness on mc1004 is CRITICAL: Last successful Puppet run was Wed May 7 03:41:34 2014 [06:57:55] PROBLEM - Puppet freshness on mc1009 is CRITICAL: Last successful Puppet run was Wed May 7 03:41:24 2014 [06:57:55] PROBLEM - Puppet freshness on wtp1015 is CRITICAL: Last successful Puppet run was Wed May 7 03:41:19 2014 [06:58:28] (03CR) 10Elvey: "For an explanation of what this is for and evidence of consensus, see/start at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(techn" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131933 (owner: 10Elvey) [07:05:11] (03CR) 10Nemo bis: [C: 04-1] "Please follow https://meta.wikimedia.org/wiki/Requesting_wiki_configuration_changes" (031 comment) 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131933 (owner: 10Elvey) [07:06:55] RECOVERY - Puppet freshness on amssq52 is OK: puppet ran at Wed May 7 07:06:51 UTC 2014 [07:07:05] RECOVERY - Puppet freshness on mw1017 is OK: puppet ran at Wed May 7 07:07:01 UTC 2014 [07:07:15] RECOVERY - Puppet freshness on cp4001 is OK: puppet ran at Wed May 7 07:07:06 UTC 2014 [07:07:19] RECOVERY - Puppet freshness on nescio is OK: puppet ran at Wed May 7 07:07:12 UTC 2014 [07:07:19] RECOVERY - Puppet freshness on search1008 is OK: puppet ran at Wed May 7 07:07:12 UTC 2014 [07:07:25] RECOVERY - Puppet freshness on mw1015 is OK: puppet ran at Wed May 7 07:07:22 UTC 2014 [07:07:25] RECOVERY - Puppet freshness on tin is OK: puppet ran at Wed May 7 07:07:22 UTC 2014 [07:07:35] RECOVERY - Puppet freshness on mw1101 is OK: puppet ran at Wed May 7 07:07:27 UTC 2014 [07:07:35] RECOVERY - Puppet freshness on lanthanum is OK: puppet ran at Wed May 7 07:07:27 UTC 2014 [07:07:36] RECOVERY - Puppet freshness on mw1162 is OK: puppet ran at Wed May 7 07:07:32 UTC 2014 [07:07:36] RECOVERY - Puppet freshness on cp1050 is OK: puppet ran at Wed May 7 07:07:32 UTC 2014 [07:07:45] RECOVERY - Puppet freshness on db1048 is OK: puppet ran at Wed May 7 07:07:37 UTC 2014 [07:07:45] RECOVERY - Puppet freshness on cp1046 is OK: puppet ran at Wed May 7 07:07:42 UTC 2014 [07:07:45] RECOVERY - Puppet freshness on mw1095 is OK: puppet ran at Wed May 7 07:07:42 UTC 2014 [07:07:45] RECOVERY - Puppet freshness on es1005 is OK: puppet ran at Wed May 7 07:07:42 UTC 2014 [07:07:45] RECOVERY - Puppet freshness on mw1044 is OK: puppet ran at Wed May 7 07:07:42 UTC 2014 [07:07:55] RECOVERY - Puppet freshness on db1036 is OK: puppet ran at Wed May 7 07:07:52 UTC 2014 [07:08:05] RECOVERY - Puppet freshness on es1003 is OK: puppet ran at Wed May 7 07:07:57 UTC 2014 [07:08:05] RECOVERY - Puppet freshness on lvs4003 is OK: puppet ran at Wed May 7 07:07:57 UTC 2014 [07:08:05] RECOVERY - Puppet freshness on 
mw1127 is OK: puppet ran at Wed May 7 07:08:02 UTC 2014 [07:08:15] RECOVERY - Puppet freshness on gadolinium is OK: puppet ran at Wed May 7 07:08:07 UTC 2014 [07:08:15] RECOVERY - Puppet freshness on mc1005 is OK: puppet ran at Wed May 7 07:08:07 UTC 2014 [07:08:25] RECOVERY - Puppet freshness on amslvs3 is OK: puppet ran at Wed May 7 07:08:22 UTC 2014 [07:08:35] RECOVERY - Puppet freshness on mw1190 is OK: puppet ran at Wed May 7 07:08:27 UTC 2014 [07:08:35] RECOVERY - Puppet freshness on mw1111 is OK: puppet ran at Wed May 7 07:08:27 UTC 2014 [07:08:35] RECOVERY - Puppet freshness on db1019 is OK: puppet ran at Wed May 7 07:08:32 UTC 2014 [07:08:45] RECOVERY - Puppet freshness on es4 is OK: puppet ran at Wed May 7 07:08:38 UTC 2014 [07:08:45] RECOVERY - Puppet freshness on mw1214 is OK: puppet ran at Wed May 7 07:08:38 UTC 2014 [07:08:55] RECOVERY - Puppet freshness on mw1182 is OK: puppet ran at Wed May 7 07:08:48 UTC 2014 [07:08:55] RECOVERY - Puppet freshness on mw1125 is OK: puppet ran at Wed May 7 07:08:48 UTC 2014 [07:09:05] RECOVERY - Puppet freshness on mw1004 is OK: puppet ran at Wed May 7 07:09:03 UTC 2014 [07:09:05] RECOVERY - Puppet freshness on virt1004 is OK: puppet ran at Wed May 7 07:09:03 UTC 2014 [07:09:15] RECOVERY - Puppet freshness on dobson is OK: puppet ran at Wed May 7 07:09:08 UTC 2014 [07:09:15] RECOVERY - Puppet freshness on elastic1015 is OK: puppet ran at Wed May 7 07:09:08 UTC 2014 [07:09:16] RECOVERY - Puppet freshness on ms-be1012 is OK: puppet ran at Wed May 7 07:09:08 UTC 2014 [07:09:16] RECOVERY - Puppet freshness on db1004 is OK: puppet ran at Wed May 7 07:09:08 UTC 2014 [07:09:16] RECOVERY - Puppet freshness on labsdb1002 is OK: puppet ran at Wed May 7 07:09:08 UTC 2014 [07:09:16] RECOVERY - Puppet freshness on mw1159 is OK: puppet ran at Wed May 7 07:09:08 UTC 2014 [07:09:16] RECOVERY - Puppet freshness on cp1063 is OK: puppet ran at Wed May 7 07:09:13 UTC 2014 [07:09:25] RECOVERY - Puppet freshness on mw1196 is OK: puppet 
ran at Wed May 7 07:09:18 UTC 2014 [07:09:25] RECOVERY - Puppet freshness on mw1057 is OK: puppet ran at Wed May 7 07:09:23 UTC 2014 [07:09:25] RECOVERY - Puppet freshness on amssq57 is OK: puppet ran at Wed May 7 07:09:23 UTC 2014 [07:09:25] RECOVERY - Puppet freshness on cp1069 is OK: puppet ran at Wed May 7 07:09:23 UTC 2014 [07:09:55] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Wed May 7 07:09:48 UTC 2014 [07:09:55] RECOVERY - Puppet freshness on search1024 is OK: puppet ran at Wed May 7 07:09:53 UTC 2014 [07:10:05] RECOVERY - Puppet freshness on mw1081 is OK: puppet ran at Wed May 7 07:09:58 UTC 2014 [07:10:05] RECOVERY - Puppet freshness on mw1147 is OK: puppet ran at Wed May 7 07:10:03 UTC 2014 [07:10:05] RECOVERY - Puppet freshness on mw1171 is OK: puppet ran at Wed May 7 07:10:03 UTC 2014 [07:10:15] RECOVERY - Puppet freshness on mw1130 is OK: puppet ran at Wed May 7 07:10:08 UTC 2014 [07:10:16] RECOVERY - Puppet freshness on wtp1004 is OK: puppet ran at Wed May 7 07:10:13 UTC 2014 [07:10:25] RECOVERY - Puppet freshness on mw1124 is OK: puppet ran at Wed May 7 07:10:18 UTC 2014 [07:10:25] RECOVERY - Puppet freshness on cp4007 is OK: puppet ran at Wed May 7 07:10:23 UTC 2014 [07:10:25] RECOVERY - Puppet freshness on mw1161 is OK: puppet ran at Wed May 7 07:10:23 UTC 2014 [07:10:35] RECOVERY - Puppet freshness on mw1087 is OK: puppet ran at Wed May 7 07:10:33 UTC 2014 [07:10:45] RECOVERY - Puppet freshness on lvs1006 is OK: puppet ran at Wed May 7 07:10:38 UTC 2014 [07:10:45] RECOVERY - Puppet freshness on mc1009 is OK: puppet ran at Wed May 7 07:10:43 UTC 2014 [07:10:55] RECOVERY - Puppet freshness on mw1188 is OK: puppet ran at Wed May 7 07:10:48 UTC 2014 [07:11:05] RECOVERY - Puppet freshness on mw1210 is OK: puppet ran at Wed May 7 07:10:55 UTC 2014 [07:11:05] RECOVERY - Puppet freshness on mc1004 is OK: puppet ran at Wed May 7 07:10:55 UTC 2014 [07:11:05] RECOVERY - Puppet freshness on wtp1015 is OK: puppet ran at Wed May 7 07:11:00 UTC 
2014 [07:11:15] RECOVERY - Puppet freshness on analytics1015 is OK: puppet ran at Wed May 7 07:11:05 UTC 2014 [07:11:15] RECOVERY - Puppet freshness on dbstore1001 is OK: puppet ran at Wed May 7 07:11:10 UTC 2014 [07:11:25] RECOVERY - Puppet freshness on mc1007 is OK: puppet ran at Wed May 7 07:11:15 UTC 2014 [07:11:55] RECOVERY - Puppet freshness on analytics1014 is OK: puppet ran at Wed May 7 07:11:45 UTC 2014 [07:12:55] PROBLEM - Puppet freshness on labsdb1004 is CRITICAL: Last successful Puppet run was Wed May 7 04:11:56 2014 [07:12:55] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Wed May 7 04:12:06 2014 [07:12:55] RECOVERY - Puppet freshness on labsdb1004 is OK: puppet ran at Wed May 7 07:12:50 UTC 2014 [07:12:55] RECOVERY - Puppet freshness on lvs3003 is OK: puppet ran at Wed May 7 07:12:50 UTC 2014 [07:14:55] PROBLEM - Puppet freshness on cp4002 is CRITICAL: Last successful Puppet run was Wed May 7 04:14:10 2014 [07:14:55] PROBLEM - Puppet freshness on mw1153 is CRITICAL: Last successful Puppet run was Wed May 7 04:14:20 2014 [07:14:55] RECOVERY - Puppet freshness on cp4002 is OK: puppet ran at Wed May 7 07:14:51 UTC 2014 [07:14:55] RECOVERY - Puppet freshness on mw1153 is OK: puppet ran at Wed May 7 07:14:51 UTC 2014 [07:17:55] PROBLEM - Puppet freshness on cp4005 is CRITICAL: Last successful Puppet run was Wed May 7 04:17:40 2014 [07:17:55] PROBLEM - Puppet freshness on elastic1016 is CRITICAL: Last successful Puppet run was Wed May 7 04:17:40 2014 [07:17:55] PROBLEM - Puppet freshness on nickel is CRITICAL: Last successful Puppet run was Wed May 7 04:17:34 2014 [07:17:55] PROBLEM - Puppet freshness on ssl1005 is CRITICAL: Last successful Puppet run was Wed May 7 04:17:34 2014 [07:18:05] RECOVERY - Puppet freshness on cp4005 is OK: puppet ran at Wed May 7 07:18:03 UTC 2014 [07:18:16] RECOVERY - Puppet freshness on elastic1016 is OK: puppet ran at Wed May 7 07:18:08 UTC 2014 [07:18:16] RECOVERY - Puppet freshness on 
nickel is OK: puppet ran at Wed May 7 07:18:08 UTC 2014 [07:18:25] RECOVERY - Puppet freshness on ssl1005 is OK: puppet ran at Wed May 7 07:18:19 UTC 2014 [07:19:55] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Wed May 7 04:19:36 2014 [07:20:17] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Wed May 7 07:20:12 UTC 2014 [07:21:13] (03CR) 10Springle: [C: 032] Use m2-master CNAME to make DB master rotations neater. This allows a master switch to be a DNS change plus a simple port 3306 tcp redirect [operations/puppet] - 10https://gerrit.wikimedia.org/r/131420 (owner: 10Springle) [07:21:15] PROBLEM - Puppet freshness on mw1040 is CRITICAL: Last successful Puppet run was Wed May 7 04:20:26 2014 [07:22:03] RECOVERY - Puppet freshness on mw1040 is OK: puppet ran at Wed May 7 07:21:58 UTC 2014 [07:22:13] PROBLEM - Puppet freshness on aluminium is CRITICAL: Last successful Puppet run was Wed May 7 04:20:46 2014 [07:22:13] PROBLEM - Puppet freshness on amssq59 is CRITICAL: Last successful Puppet run was Wed May 7 04:20:51 2014 [07:22:13] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:57 2014 [07:22:13] PROBLEM - Puppet freshness on cp3006 is CRITICAL: Last successful Puppet run was Wed May 7 04:20:46 2014 [07:22:13] PROBLEM - Puppet freshness on db1053 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:37 2014 [07:22:14] PROBLEM - Puppet freshness on db1063 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:36 2014 [07:22:14] PROBLEM - Puppet freshness on lvs4002 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:52 2014 [07:22:15] PROBLEM - Puppet freshness on magnesium is CRITICAL: Last successful Puppet run was Wed May 7 04:21:47 2014 [07:22:15] PROBLEM - Puppet freshness on manutius is CRITICAL: Last successful Puppet run was Wed May 7 04:21:52 2014 [07:22:16] PROBLEM - Puppet freshness on mc1011 is CRITICAL: Last successful Puppet run was 
Wed May 7 04:21:26 2014 [07:22:16] PROBLEM - Puppet freshness on mc1016 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:52 2014 [07:22:17] PROBLEM - Puppet freshness on mchenry is CRITICAL: Last successful Puppet run was Wed May 7 04:21:52 2014 [07:22:17] PROBLEM - Puppet freshness on ms-be1004 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:57 2014 [07:22:18] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:26 2014 [07:22:18] PROBLEM - Puppet freshness on mw1031 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:16 2014 [07:22:19] PROBLEM - Puppet freshness on mw1072 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:11 2014 [07:22:19] PROBLEM - Puppet freshness on mw1080 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:31 2014 [07:22:20] PROBLEM - Puppet freshness on mw1178 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:01 2014 [07:22:20] PROBLEM - Puppet freshness on mw1134 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:31 2014 [07:22:21] PROBLEM - Puppet freshness on ssl1007 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:37 2014 [07:22:21] PROBLEM - Puppet freshness on wtp1013 is CRITICAL: Last successful Puppet run was Wed May 7 04:21:01 2014 [07:22:45] RECOVERY - Puppet freshness on mc1011 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:45] RECOVERY - Puppet freshness on cp3006 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:45] RECOVERY - Puppet freshness on analytics1011 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:45] RECOVERY - Puppet freshness on mw1072 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:45] RECOVERY - Puppet freshness on manutius is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:46] RECOVERY - Puppet freshness on mc1016 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:46] RECOVERY - Puppet freshness on aluminium is OK: puppet ran at Wed May 7 07:22:39 
UTC 2014 [07:22:47] RECOVERY - Puppet freshness on db1063 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:47] RECOVERY - Puppet freshness on wtp1013 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:48] RECOVERY - Puppet freshness on mw1031 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:48] RECOVERY - Puppet freshness on lvs4002 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:49] RECOVERY - Puppet freshness on mw1134 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:49] RECOVERY - Puppet freshness on mw1080 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:50] RECOVERY - Puppet freshness on amssq59 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:50] RECOVERY - Puppet freshness on mw1178 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:51] RECOVERY - Puppet freshness on ms-fe1003 is OK: puppet ran at Wed May 7 07:22:39 UTC 2014 [07:22:51] RECOVERY - Puppet freshness on ssl1007 is OK: puppet ran at Wed May 7 07:22:40 UTC 2014 [07:32:47] hello [07:49:36] hi hashar is there any future for : https://gerrit.wikimedia.org/r/#/c/16419/ [07:49:44] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 12 data above and 9 below the confidence bounds [07:50:05] matanya: I think that is an awesome feature to add to mediawiki [07:50:31] matanya: was written via Google summer of code iirc. 
Might want to get it some more attention [07:50:41] yes, was in 2012 [07:50:48] matanya: though the patch it is so old it is probably going to be hard to rebase or might not be acceptable anymore [07:51:43] it was raised in he.wiki as a wanted feature, but seems abandoned [07:54:53] !log Jenkins: installing Claim plugin (allow folks to comment on builds and mark them) [07:55:01] Logged the message, Master [07:55:25] matanya: if you are interested, you can emit the feature to Dan Garry the mwcore product manager at WMF [07:55:29] matanya: he might be able to do something [07:56:01] i'll try to get his attention later today, thanks hashar [08:12:35] (03CR) 10Elvey: "Thanks for the review!" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131933 (owner: 10Elvey) [08:14:12] RECOVERY - MySQL Slave Delay on db1049 is OK: OK replication delay 122 seconds [08:14:14] RECOVERY - MySQL Replication Heartbeat on db1049 is OK: OK replication delay 66 seconds [08:22:17] (03CR) 10Matanya: Move cluster definition to the node level. (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/130591 (owner: 10Giuseppe Lavagetto) [08:46:10] (03PS2) 10Odder: Add a $wgUploadMissingFileUrl entry for enwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131933 (owner: 10Elvey) [08:53:56] (03PS3) 10Odder: Add a $wgUploadMissingFileUrl entry for enwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131933 (owner: 10Elvey) [08:56:06] akosiaris: morning, can you please support my ferm activities ? i would like if you can look at : https://etherpad.wikimedia.org/p/nodes_with_a_public_IP and help prioritize. and any other useful comment [08:56:58] matanya: I can try. I am in the middle of debugging torrus right now though. I will look into later in the day [08:57:19] thanks a lot. [09:05:54] (03PS1) 10Giuseppe Lavagetto: Pass the correct arguments to check_ganglia. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/131945 [09:06:08] <_joe_> springle: ^ [09:09:44] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [09:14:45] <_joe_> I am going to merge this anyway [09:14:51] (03CR) 10Odder: [C: 031] "Looks fine to me, nothing near controversial :-)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131933 (owner: 10Elvey) [09:15:34] (03CR) 10Giuseppe Lavagetto: [C: 032] "I tested this from neon, worst that can happen is we don't get any improvement." [operations/puppet] - 10https://gerrit.wikimedia.org/r/131945 (owner: 10Giuseppe Lavagetto) [09:20:15] (03PS1) 10Giuseppe Lavagetto: Fix ganglia check command. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131946 [09:21:37] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Fix ganglia check command. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131946 (owner: 10Giuseppe Lavagetto) [09:26:24] <_joe_> I will write a naggen replacement sooner or later I swear [09:40:22] !log springle synchronized wmf-config/db-eqiad.php 'warm up db1049 in s4' [09:40:29] Logged the message, Master [09:40:54] (03PS1) 10Springle: Reassign db1049 to s4 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131947 [09:41:53] (03CR) 10Springle: [C: 04-1] "No submit at LB 300 until warmed up." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131947 (owner: 10Springle) [09:48:43] (03CR) 10Ori.livneh: Add 'rcstream' module for broadcasting recent changes over WebSockets (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: 10Ori.livneh) [09:50:00] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Tue May 6 08:27:45 2014 [10:01:43] (03PS2) 10Ori.livneh: Get rid of apache-graceful-all [operations/puppet] - 10https://gerrit.wikimedia.org/r/131931 [10:03:40] (03PS14) 10Ori.livneh: Add 'rcstream' module for broadcasting recent changes over WebSockets [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) [10:03:49] (03CR) 10Ori.livneh: [C: 032] Get rid of apache-graceful-all [operations/puppet] - 10https://gerrit.wikimedia.org/r/131931 (owner: 10Ori.livneh) [10:04:04] (03PS4) 10Ori.livneh: Make mediawiki::jobrunner not include ::mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/131928 [10:05:48] (03PS2) 10Ori.livneh: tidy mediawiki::jobrunner [operations/puppet] - 10https://gerrit.wikimedia.org/r/131932 [10:06:43] (03CR) 10Ori.livneh: [C: 032] Make mediawiki::jobrunner not include ::mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/131928 (owner: 10Ori.livneh) [10:07:29] (03CR) 10Ori.livneh: [C: 032] tidy mediawiki::jobrunner [operations/puppet] - 10https://gerrit.wikimedia.org/r/131932 (owner: 10Ori.livneh) [10:13:18] (03PS15) 10Ori.livneh: Add 'rcstream' module for broadcasting recent changes over WebSockets [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) [10:13:34] [10:15:58] <_joe_> ori: you reverted most of my changes :) [10:16:37] <_joe_> the handle_exit function was a way to organize politely the exit of the server [10:16:51] it's still there; it just got renamed [10:16:57] <_joe_> oh sorry [10:17:06] and 
you missed the fact that link_exception calls the callback with the faulty greenlet as the argument [10:17:17] so you don't need to iterate over greenlets to see which one is the culprit [10:17:58] i removed a couple of logging calls by logging at a lower level [10:18:34] but i wouldn't say reverted -- just my usual OCD tweaking. i kept the principles :) [10:18:39] <_joe_> ori: looking at the code :) [10:18:46] <_joe_> ori: that's what counts [10:19:26] (03CR) 10Springle: [C: 032] Reassign db1049 to s4 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131947 (owner: 10Springle) [10:20:02] <_joe_> ori: ok the code is much better now :) [10:21:10] !log springle synchronized wmf-config/db-eqiad.php 'db1049 to full steam' [10:21:17] Logged the message, Master [10:21:27] <_joe_> ori: I just have one doubt. I'll wrap my head around that anyway [10:21:53] <_joe_> ori: are you in my TZ for a few days, right? so that I don't feel guilty for keeping you awake at crazy hours [10:22:22] i'm not, but my incorrigible insomnia is something of a running joke in this channel [10:22:43] ...and not only in this channel :) [10:22:52] heheh [10:24:55] <_joe_> ori: the trick is to wake up at 6 AM, then you will get to bed at 'normal' times like midnight [10:25:02] * _joe_ insomniac as well [10:29:47] _joe_: what's the one doubt? [10:30:02] "will it blend?"? 
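The point made above about link_exception (the callback is handed the faulty greenlet, so there is no need to iterate over all greenlets to find the culprit) can be sketched roughly as follows. This is not the rcstream code and it does not use gevent: it is a stand-in showing the same pattern with the standard library's concurrent.futures, where add_done_callback likewise passes the finished future to the callback. The names run_all and on_done are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def run_all(tasks):
    """Run zero-argument callables and report which ones failed.

    tasks maps a task name to a callable. Returns a list of
    (name, exception) pairs for the tasks that raised.
    """
    failures = []

    def on_done(name, future):
        # The callback receives the finished future itself, so the failing
        # task is known directly; no scan over all submitted tasks is needed.
        exc = future.exception()
        if exc is not None:
            failures.append((name, exc))

    with ThreadPoolExecutor(max_workers=4) as pool:
        for name, func in tasks.items():
            pool.submit(func).add_done_callback(partial(on_done, name))
    # Leaving the with-block waits for all workers, so callbacks have run.
    return failures
```

With gevent the equivalent hook would be registered per greenlet with link_exception; the stand-in above only illustrates why a callback that receives the failed task makes the bookkeeping simpler.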
[10:32:02] (03PS1) 10Filippo Giunchedi: add upstart job for idmapd [operations/puppet] - 10https://gerrit.wikimedia.org/r/131951 [10:32:38] <_joe_> ori: no about the logging adapter, but it's used wisely :) [10:34:30] ah, cool [10:34:36] godog: blech, i hate upstart_job [10:35:52] godog: it does two things: create init.d/ symlinks to ubuntu's upstart-job, for sysv compat which isn't really needed [10:36:08] and a refreshonly exec which starts the service [10:36:35] the latter you get more easily with service { 'foo': ensure => running, provider => upstart } [10:38:26] <_joe_> http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=large&h=neon.wikimedia.org&m=cpu_report&s=descending&mc=2&g=mem_report&c=Miscellaneous+eqiad looks like my patch succeeded [10:38:30] ori: mh yeah that's a better option indeed, I don't mind not having /etc/init.d files anymore esp with trusty [10:39:09] so no more upstart symlinks going forward [10:39:24] <_joe_> godog: why? [10:39:54] we should be using "service" (the command) anyway I'd say, which always DTRT [10:40:36] <_joe_> yes that will also force me to use it, I'm sometimes lazy [10:43:16] ori: any opinion on subscribe inside service vs say, require? I'm not a huge fan but perhaps for idmapd is harmless [10:44:10] (03PS2) 10Filippo Giunchedi: add upstart job for idmapd [operations/puppet] - 10https://gerrit.wikimedia.org/r/131951 [10:44:10] 13:37 < ori> thanks! good night [10:44:11] hello godog and welcome [10:44:15] :) [10:44:41] (03CR) 10Filippo Giunchedi: "ori suggested to have just provider => upstart and let puppet handle it, which I agree" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131951 (owner: 10Filippo Giunchedi) [10:45:01] fwiw, puppet 3 prefers upstart over sysv [10:45:04] hey matanya [10:45:10] so provider => upstart will be redundant [10:45:37] does carbon serve 14.04 images ? 
[10:46:18] s/images/packages/ [10:47:11] matanya: it does, trusty-wikimedia is the suite [10:47:18] thanks [10:48:17] (03PS1) 10Alexandros Kosiaris: Update torrus configurations [operations/puppet] - 10https://gerrit.wikimedia.org/r/131953 [10:51:36] (03CR) 10Alexandros Kosiaris: [C: 04-2] "Duplicate of https://gerrit.wikimedia.org/r/#/c/131499/" [operations/dns] - 10https://gerrit.wikimedia.org/r/131915 (owner: 10Dzahn) [10:52:57] (03CR) 10Alexandros Kosiaris: [C: 032] Update torrus configurations [operations/puppet] - 10https://gerrit.wikimedia.org/r/131953 (owner: 10Alexandros Kosiaris) [11:55:51] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 3 below the confidence bounds [12:12:51] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [12:22:51] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data exceeded the critical threshold [500.0] [12:27:53] (03PS4) 10Giuseppe Lavagetto: Move cluster definition to the node level. [operations/puppet] - 10https://gerrit.wikimedia.org/r/130591 [12:35:41] anyone already know what's going on? [12:37:34] (03PS1) 10Giuseppe Lavagetto: Get rid of $myshell dynamic lookup. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/131961 [12:39:46] <_joe_> bblack: no I did not see the page, sorry [12:40:51] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0] [12:41:43] I got only the recovery page just now [12:46:35] https://graphite.wikimedia.org/render/?title=HTTP%205xx%20Responses%20-4hours&from=-4hours&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=2&lineMode=connected&target=color%28cactiStyle%28alias%28reqstats.5xx,%225xx%20resp/min%22%29%29,%22blue%22%29 [12:47:11] PROBLEM - Disk space on analytics1019 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/k 74624 MB (3% inode=99%): /var/lib/hadoop/data/d 122387 MB (6% inode=99%): /var/lib/hadoop/data/l 93974 MB (5% inode=99%): /var/lib/hadoop/data/e 94721 MB (5% inode=99%): /var/lib/hadoop/data/g 124107 MB (6% inode=99%): /var/lib/hadoop/data/h 130858 MB (6% inode=99%): /var/lib/hadoop/data/f 102147 MB (5% inode=99%): /var/lib/hadoop/dat [12:49:02] <_joe_> bblack: I took a look on oxygen, around 12:20 we have some 5xx from ulsfo text caches [12:49:44] <_joe_> is oxygen the right place to look at? [12:50:51] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Tue May 6 08:27:45 2014 [12:57:55] <_joe_> bblack, apergos this was an imagescalers issue [12:58:04] <_joe_> not *that* relevant, but still... [12:59:39] I saw the spike... what sort of requests, anything we can get a handle on? [13:00:41] <_joe_> thumb.php I'd say [13:01:17] <_joe_> but let me look better [13:03:46] (03PS1) 10Alexandros Kosiaris: torrus: varnishmetric not squidmetric [operations/puppet] - 10https://gerrit.wikimedia.org/r/131967 [13:04:22] (03CR) 10Alexandros Kosiaris: [C: 032] Get rid of $myshell dynamic lookup. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/131961 (owner: 10Giuseppe Lavagetto) [13:04:24] <_joe_> mostly bogus requests from a couple of ips [13:04:33] <_joe_> thanks [13:04:44] my god... we were doing if in there ? [13:04:48] ah... [13:04:58] * akosiaris sighs [13:05:01] nice catch [13:05:20] (03CR) 10Alexandros Kosiaris: [C: 032] torrus: varnishmetric not squidmetric [operations/puppet] - 10https://gerrit.wikimedia.org/r/131967 (owner: 10Alexandros Kosiaris) [13:05:31] <_joe_> lol one of the two offending IPs is *ours* [13:05:40] _joe_: want me to also merge ? [13:06:09] <_joe_> akosiaris: be my guest :) [13:07:04] let's see how torrus now likes the config [13:07:07] (03PS1) 10Steinsplitter: adding mindat.org (mineral and locality database) to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 [13:10:28] <_joe_> apergos: bblack almost all the failed requests originate from the same IP, in Brazil [13:11:40] <_joe_> and most seem to be attempts to operate on images with urls. [13:12:52] <_joe_> it also seems to have simply tried to upload a ton of images at the same time via https://commons.wikimedia.org/wiki/Special:UploadWizard [13:13:22] I thought we capped those argh... no? [13:15:01] <_joe_> there are 8 attempts each for urls like this one: https://commons.wikimedia.org/wiki/Special:UploadStash/thumb/129z0fhb21os.7bgi29.3751919.png/100px-129z0fhb21os.7bgi29.3751919.png [13:16:01] <_joe_> exactly 8 for each of these; the '3751919.png' is fixed, which makes me think those are a ton of attempts to generate thumbs here [13:20:09] what px for the thumbs for that png? [13:20:12] any pattern? [13:20:28] or just repeated tries of the same group of sizes over and over? 
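The questions above (how many attempts per URL, from which IP, and at which thumbnail widths) amount to a grouping exercise over the request log. A minimal sketch, assuming the log has already been reduced to (client IP, URL) pairs; the real log format on oxygen is not shown in this conversation, and the function names are hypothetical:

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches the "<width>px-" prefix of the last path component of a thumb URL.
WIDTH_RE = re.compile(r"/(\d+)px-[^/]*$")


def attempts_per_url(records):
    """Count requests per (client IP, URL path) pair."""
    return Counter((ip, urlparse(url).path) for ip, url in records)


def repeated(records, threshold):
    """Return only the (IP, path) pairs requested at least threshold times."""
    counts = attempts_per_url(records)
    return {key: n for key, n in counts.items() if n >= threshold}


def widths_requested(records):
    """Count requests per thumbnail width in pixels, where one is present."""
    widths = Counter()
    for _ip, url in records:
        match = WIDTH_RE.search(urlparse(url).path)
        if match:
            widths[int(match.group(1))] += 1
    return widths
```

Calling repeated(records, 8) would surface the "8 attempts each" pattern described above, and widths_requested(records) would show whether the same group of sizes is being retried.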
[13:21:31] (03PS1) 10Odder: Change wgMetaNamespace for OfficeWiki, keep aliases [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131970 (https://bugzilla.wikimedia.org/64976) [13:21:41] (03CR) 10jenkins-bot: [V: 04-1] Change wgMetaNamespace for OfficeWiki, keep aliases [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131970 (https://bugzilla.wikimedia.org/64976) (owner: 10Odder) [13:23:12] (03PS2) 10Odder: Change wgMetaNamespace for OfficeWiki, keep aliases [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131970 (https://bugzilla.wikimedia.org/64976) [13:29:40] (03PS1) 10Odder: Allow all users on OfficeWiki to send mass messages [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131972 (https://bugzilla.wikimedia.org/64978) [13:34:36] (03CR) 10Andrew Bogott: [C: 032] add upstart job for idmapd [operations/puppet] - 10https://gerrit.wikimedia.org/r/131951 (owner: 10Filippo Giunchedi) [13:39:38] (03CR) 10Steinsplitter: [C: 031] "Looks ok" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130809 (https://bugzilla.wikimedia.org/57819) (owner: 10Withoutaname) [13:43:19] (03PS1) 10Alexandros Kosiaris: torrus: Update varnish configuration [operations/puppet] - 10https://gerrit.wikimedia.org/r/131973 [13:43:29] hashar: I have a question about the 'testswarm' user on gallium. It looks like the user is puppetized but the UID is not specified; does that sound right to you? [13:44:00] andrewbogott: yup that is a local user on gallium [13:44:30] Does that user own persistent files, or only own files for the life of a given test? [13:44:31] we are keeping the user around in case we decided to reuse that soft [13:44:45] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] torrus: Update varnish configuration [operations/puppet] - 10https://gerrit.wikimedia.org/r/131973 (owner: 10Alexandros Kosiaris) [13:44:54] andrewbogott: well we can probably clean it up. 
Will ask timo about it [13:45:11] hashar: I ask because gallium's (presumably arbitrary) uid# is the same as Timo's actual officially-allocated UID # [13:45:12] nuke it [13:45:22] So -- ok [13:45:34] I was going to assign it an official UID # but removing it will work as well :) [13:45:46] :) [13:46:29] andrewbogott: I mailed him, he usually replies on the spot [13:46:33] ah [13:46:44] you guys can dismiss the email I just sent hehe [13:47:04] (03CR) 10Rillke: "Can you please provide a link to a sample file description page with free license on mindat.org ?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 (owner: 10Steinsplitter) [13:47:12] andrewbogott: want me to do the puppet paperwork ? [13:47:24] hashar: Yes please, if you happen to know how offhand. [13:47:27] doing so [13:48:14] (03CR) 10Odder: [C: 04-1] "Please fix the first line of the commit message (the so-called commit title) per (03PS1) 10Hashar: contint: get rid of testswarm user [operations/puppet] - 10https://gerrit.wikimedia.org/r/131974 [13:49:53] (03PS1) 10Alexandros Kosiaris: Add missing sites in cache.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/131975 [13:50:31] hashar: is gallium the only box that was applied on? [13:50:44] andrewbogott: yes [13:50:49] (Because of course I will have to remove the user by hand after that is merged) [13:50:50] andrewbogott: I can handle the cleanup on the box [13:50:58] ok, great.
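Note on the UID clash discussed above: a local account sitting on a UID that has been officially allocated to someone else can be detected before a UID change is attempted. The following is only a rough sketch, not anything used on gallium. The function name uid_conflicts and the shape of the allocated mapping are made up for illustration; the account names and numbers in the usage note are the ones mentioned in this log.

```python
import pwd


def uid_conflicts(allocated, entries=None):
    """Return (name, uid, holder) for every allocated UID held by another account.

    allocated: dict mapping account name -> UID it is supposed to get
               (hypothetical input format, chosen for this sketch).
    entries:   optional list of (name, uid) pairs; defaults to the local
               passwd database as seen through pwd.getpwall().
    """
    if entries is None:
        entries = [(e.pw_name, e.pw_uid) for e in pwd.getpwall()]

    # Remember the first account seen for each UID.
    holder_by_uid = {}
    for name, uid in entries:
        holder_by_uid.setdefault(uid, name)

    conflicts = []
    for name, uid in sorted(allocated.items()):
        holder = holder_by_uid.get(uid)
        # A conflict only exists if a *different* account already has the UID.
        if holder is not None and holder != name:
            conflicts.append((name, uid, holder))
    return conflicts
```

Called as uid_conflicts({"krinkle": 2008}) on a box where testswarm holds 2008, this would report ("krinkle", 2008, "testswarm"), which is the situation that needs the local user removed first.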
[13:51:32] (03CR) 10Alexandros Kosiaris: [C: 032] Add missing sites in cache.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/131975 (owner: 10Alexandros Kosiaris) [13:51:39] (03CR) 10Krinkle: adding mindat.org (mineral and locality database) to wgCopyUploadsDomains (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 (owner: 10Steinsplitter) [13:51:42] (03CR) 10Andrew Bogott: [C: 032] contint: get rid of testswarm user [operations/puppet] - 10https://gerrit.wikimedia.org/r/131974 (owner: 10Hashar) [13:52:06] (03PS1) 10Springle: Weekly logical backup for dbstore100[12] all-shards boxes [operations/puppet] - 10https://gerrit.wikimedia.org/r/131976 [13:54:10] andrewbogott: puppet fails on gallium while changing timo uid [13:54:19] change from 607 to 2008 failed: Could not set uid on user[krinkle]: Execution of '/usr/sbin/usermod -u 2008 krinkle' returned 4: usermod: UID '2008' already exists [13:54:24] hashar: yes, that is the problem that we are fixing [13:54:36] testswarm is squatting on 2008 [13:54:53] (03CR) 10Springle: "Alex, how can I feed the resulting dumps to bacula?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131976 (owner: 10Springle) [13:55:16] andrewbogott: you can delete the user and group I guess [13:56:29] (03PS5) 10Giuseppe Lavagetto: Move cluster definition to the node level. [operations/puppet] - 10https://gerrit.wikimedia.org/r/130591 [13:56:40] (03CR) 10Steinsplitter: "@rillke:Example: http://www.mindat.org/photo-388401.html = PD" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 (owner: 10Steinsplitter) [13:58:11] (03PS2) 10Steinsplitter: Adding mindat.org (mineral and locality database) to wgCopyUploadsDomains to allow uploadbyurl. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 [13:59:18] hashar: ok, puppet on gallium seems happy now [13:59:21] \O/ [13:59:26] thx [14:07:55] ok, I have to pack and such. 
Depending on airplane & airport wifi I may or may not be reachable today. [14:12:09] (03PS2) 10Springle: Weekly logical backup for dbstore100[12] all-shards boxes [operations/puppet] - 10https://gerrit.wikimedia.org/r/131976 [14:12:41] (03CR) 10Anomie: [C: 04-1] "Should be an easy fix, I think." (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131970 (https://bugzilla.wikimedia.org/64976) (owner: 10Odder) [14:12:45] ahhh I found out sqlite apparently can be run in memory [14:13:26] (03PS1) 10Alexandros Kosiaris: Add missing sites/roles in cache.pp #2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131977 [14:15:17] (03PS3) 10Springle: Weekly logical backup for dbstore100[12] all-shards boxes [operations/puppet] - 10https://gerrit.wikimedia.org/r/131976 [14:15:48] (03CR) 10jenkins-bot: [V: 04-1] Add missing sites/roles in cache.pp #2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131977 (owner: 10Alexandros Kosiaris) [14:17:04] anomie: :-) [14:17:39] (03PS2) 10Alexandros Kosiaris: Add missing sites/roles in cache.pp #2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131977 [14:19:07] twkozlowski: Also, I'm not sure you can really be "responsible" for the OfficeWiki changes if you don't have access to OfficeWiki to test that they worked. ;) But I can test them easily enough. [14:19:25] (03PS3) 10Odder: Change wgMetaNamespace for OfficeWiki and add alias [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131970 (https://bugzilla.wikimedia.org/64976) [14:19:52] <_joe_> akosiaris: that role does something funny with $cluster [14:19:54] anomie: I did check MessagesEn.php to make sure that 'Project' isn't an alias, but forgot the canonical part, thanks for reminding me of it [14:20:12] <_joe_> it's the only place where moving $cluster to the node level does not work. [14:21:31] anomie: Open The Wiki, then? 
:-)) [14:21:53] twkozlowski: Well above my pay grade, sorry [14:22:05] _joe_: the role::cache::configuration ? [14:22:26] (03PS3) 10Steinsplitter: wgCopyUploadsDomains: Adding mindat.org (mineral and locality database) to wgCopyUploadsDomains to allow uploadbyurl. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 [14:23:07] (03CR) 10Alexandros Kosiaris: [C: 032] Add missing sites/roles in cache.pp #2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131977 (owner: 10Alexandros Kosiaris) [14:25:08] Failed to parse /etc/torrus/xmlconfig/varnish.xml: /etc/torrus/xmlconfig/varnish.xml:661: parser error : Opening and ending tag mismatch: subtree line 64 and datasources [14:25:10] (03PS4) 10Odder: Add mindat.org to $wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 (owner: 10Steinsplitter) [14:25:26] XML for configuration ???? WHY ? WHY ? WHY ??? [14:25:57] <^d> because java? [14:26:23] <^d> Probably not, but java's usually a good blame target when xml is config :p [14:26:41] well played sir, well played [14:26:58] not java but you had me smile at least [14:27:57] (03PS5) 10Rillke: Add "*.mindat.org" to wgCopyUploadsDomains for Wikimedia Commons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 (owner: 10Steinsplitter) [14:28:43] <^d> akosiaris: I try :p [14:31:34] (03CR) 10Odder: [C: 031] "Thanks for fixing the indentation issue, Rillke, it's been bothering me for a while." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 (owner: 10Steinsplitter) [14:32:51] (03CR) 10Rillke: "There doesn't appear to be any bug for the request, so" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 (owner: 10Steinsplitter) [14:33:17] (03CR) 10Anomie: [C: 031] "I'm inclined to think that Wikipedia:File_Upload_Wizard should be updated if not just replaced with [[mw:Extension:UploadWizard]] (which h" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131933 (owner: 10Elvey) [14:35:28] (03PS6) 10Odder: Add mindat.org to $wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 (owner: 10Steinsplitter) [14:37:26] * Steinsplitter needs to learn how to use git correctly \o/ [14:37:28] (03CR) 10Rillke: "for privileged users and use in the GWToolset tool" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 (owner: 10Steinsplitter) [14:38:34] See, Steinsplitter, that's what bugs are for. [14:38:52] (03CR) 10Steinsplitter: "No, it is for a upload campaign. I was ask to build a easy way to upload photos en mass from mindat." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 (owner: 10Steinsplitter) [14:38:54] (03CR) 10Odder: "No, but once this is merged, GWToolset users will be able to use the domain in the tool." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 (owner: 10Steinsplitter) [14:45:11] PROBLEM - Disk space on analytics1019 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/k 74973 MB (3% inode=99%): /var/lib/hadoop/data/d 127037 MB (6% inode=99%): /var/lib/hadoop/data/l 97607 MB (5% inode=99%): /var/lib/hadoop/data/e 88725 MB (4% inode=99%): /var/lib/hadoop/data/g 99004 MB (5% inode=99%): /var/lib/hadoop/data/f 106217 MB (5% inode=99%): /var/lib/hadoop/data/c 98028 MB (5% inode=99%): /var/lib/hadoop/data/ [14:50:26] manybubbles: I'll do the SWAT today, unless you were wanting to do it for some reason [14:50:51] anomie: I'm fine with you doing it. I'm not deep into anything though so I certainly could [14:51:14] except I _think_ I have a meeting with aude in 10 minutes. or maybe I don't and its in an hour and ten minutes. [14:52:57] i think it's 10 min, afaik [14:53:01] is robla coming? [14:53:17] twkozlowski: I think I'll save the zhsikisource change for last, just in case greg-g shows up and says there's still a blocker on https://gerrit.wikimedia.org/r/#q,127584,n,z [14:53:37] he'll come to the real one, whichever that one is [14:53:40] %@^% [14:53:47] no idea which one [14:54:03] anomie: Just because some people say there's a blocker doesn't mean there is :-(( [14:54:06] so irritating. [14:54:22] Nemo_bis explained it all. [14:54:27] twkozlowski: What irritates me is that there's apparently no public record of what the blocker might *be* [14:54:39] It doesn't exist, that's why it's not public. [14:55:06] They thought it blocked their super-cool feature which isn't even developed yet [14:55:47] If it is just a potential namespace conflict, they already have the problem with other wikisources [14:56:15] It's not, the Translate extension uses a namespace 1000+ while this is 114 or something [14:57:41] The name could conflict is what I was referring to. Numbers are easy to work around. 
[14:57:49] There is no conflict whatsoever, in namespace or features [14:57:55] No, the name is different as well [14:58:01] manybubbles: we'll see if someone comes but think it is in an hour [14:58:10] it would be 8am in SF which is rather early for robla [14:58:19] WMF is not going to make an extension to translate text from printed books, I can guarantee that :) [14:58:22] yeah [14:58:32] so another hour [14:59:48] <^d> Nemo_bis: I might. You know, weekend project. [14:59:50] <^d> ;-) [15:00:28] * anomie begins the SWAT deploy [15:00:44] :P [15:00:49] twkozlowski: Let's start with https://gerrit.wikimedia.org/r/#/c/131933/ [15:00:59] (03CR) 10Anomie: [C: 032] Add a $wgUploadMissingFileUrl entry for enwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131933 (owner: 10Elvey) [15:01:08] (03Merged) 10jenkins-bot: Add a $wgUploadMissingFileUrl entry for enwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131933 (owner: 10Elvey) [15:01:43] <^d> manybubbles: We don't need $wmgCirrusIsBuilding [15:02:03] ^d: did you remove it already? [15:02:17] <^d> I have a patch, your +1 would be nice. [15:02:18] <^d> https://gerrit.wikimedia.org/r/#/c/131513/ [15:02:30] (03CR) 10Manybubbles: [C: 031] Remove $wmgCirrusIsBuilding. Cirrus is always on everywhere now [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131513 (owner: 10Chad) [15:02:36] done [15:02:51] <^d> thx [15:03:43] !log anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: Set $wgUploadMissingFileUrl for enwiki' [15:03:51] Logged the message, Master [15:03:54] (03CR) 10Manybubbles: [C: 031] "Looks like we have plenty of space." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131647 (owner: 10Chad) [15:03:58] twkozlowski: ^ Test please [15:05:08] * anomie will do https://gerrit.wikimedia.org/r/#/c/131972/ next [15:06:30] anomie: doing... 
[15:06:43] anomie: \o/ [15:07:04] (03PS2) 10Anomie: Allow all users on OfficeWiki to send mass messages [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131972 (https://bugzilla.wikimedia.org/64978) (owner: 10Odder) [15:07:09] (03CR) 10Anomie: [C: 032] Allow all users on OfficeWiki to send mass messages [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131972 (https://bugzilla.wikimedia.org/64978) (owner: 10Odder) [15:07:17] (03Merged) 10jenkins-bot: Allow all users on OfficeWiki to send mass messages [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131972 (https://bugzilla.wikimedia.org/64978) (owner: 10Odder) [15:09:02] !log anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: Allow all users on OfficeWiki to send mass messages' [15:09:08] Logged the message, Master [15:09:25] * twkozlowski tests... oh well. [15:11:33] oops, let's try that again [15:11:59] * anomie fetched the change, but forgot to actually check it out [15:12:29] !log anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: Allow all users on OfficeWiki to send mass messages (for real this time)' [15:12:34] Logged the message, Master [15:12:48] There, now it worked [15:13:06] * anomie is doing https://gerrit.wikimedia.org/r/#/c/131970/ next [15:13:26] (03CR) 10Anomie: [C: 032] Change wgMetaNamespace for OfficeWiki and add alias [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131970 (https://bugzilla.wikimedia.org/64976) (owner: 10Odder) [15:13:34] anomie: twkozlowski have the i18n team responded yet? [15:13:35] (03Merged) 10jenkins-bot: Change wgMetaNamespace for OfficeWiki and add alias [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131970 (https://bugzilla.wikimedia.org/64976) (owner: 10Odder) [15:14:18] greg-g: Nemo_bis is the only person replying on the patch. 
I also asked on #mediawiki-i18n, and only Nemo_bis replied [15:14:21] anomie: twkozlowski no, they haven't [15:14:52] please don't deploy, I promised them I would let them respond first. The ball's in their court, but it'd be bad of me to just let this through now. [15:15:19] Nemo_bis: Good day to you. [15:15:35] greg-g: ok [15:16:03] Should I be surprised this is happening? [15:16:22] * anomie grumbles about people not bothering to raise these things publicly [15:16:42] !log anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: Change wgMetaNamespace for OfficeWiki and add alias' [15:16:49] Logged the message, Master [15:17:05] twkozlowski: sorry :( [15:17:29] I submitted this patch three weeks ago, and there is no reason why it has to wait [15:17:34] I threw the ball over to their court, they acknowledged, so i took it off my radar :/ [15:17:36] None at all. [15:17:48] greg-g: Had Odder not mentioned that WMF language team had blocked it, I wouldn't have known there ever *was* a blocker and it would've been deployed. Yell at the appropriate people about that for me, please. [15:17:58] anomie: /me nods [15:18:42] !log anomie Running maintenance/namespaceDupes.php on OfficeWiki [15:18:49] Logged the message, Master [15:19:00] twkozlowski: again, sorry, I should have followed up with them sooner, but I let it fall off my radar assuming they'd respond publicly. [15:19:22] Not the point: this is in no way relevant to their work [15:19:28] It's as if Office IT was blocking it.
[15:19:51] !log anomie namespaceDupes.php on OfficeWiki done (that was quick) [15:19:59] Logged the message, Master [15:20:53] greg-g: Please move the patch to the other SWAT window today or tomorrow [15:20:59] * anomie is done with SWAT deploys, unless the Language team clears https://gerrit.wikimedia.org/r/#/c/127584/ in the next 20 minutes [15:21:26] if it doesn't get merged tomorrow, that's another week gone, impeding the work of the zhwikisource community [15:22:27] thanks for your help, anomie [15:22:39] <^d> anomie: Mind doing a no-op config sync while you're on tin already? [15:22:42] <^d> https://gerrit.wikimedia.org/r/#/c/131513/ [15:23:03] ^d: Sure. Throw it on the Deployments page for good measure? [15:23:12] <^d> Will do. [15:23:20] <_joe_> akosiaris: the culprit was $nagios_group of course, the other lovely global variable we have [15:23:54] _joe_: argh [15:24:04] (03CR) 10Anomie: [C: 032] Remove $wmgCirrusIsBuilding. Cirrus is always on everywhere now [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131513 (owner: 10Chad) [15:24:13] (03Merged) 10jenkins-bot: Remove $wmgCirrusIsBuilding. Cirrus is always on everywhere now [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131513 (owner: 10Chad) [15:25:26] <_joe_> akosiaris: also, while hosts having $nagios_group have it completely depend on $cluster (usually $nagios_group == $cluster_$::site ), others DO NOT have it [15:26:05] <_joe_> so, the solution is once again add the damn variable to site.pp in any damn node, just like $cluster [15:26:30] ^d: Out of curiousity, is there a "sync-files" that I just don't know about to avoid having to do two "sync-file" log entries? [15:26:35] !log anomie synchronized wmf-config/CirrusSearch-common.php 'SWAT: Remove obsolete $wmgCirrusIsBuilding (no functionality change)' [15:26:41] Logged the message, Master [15:27:00] <^d> anomie: No, either sync-file or sync-dir. [15:27:08] <^d> sync-dir is usually overkill. [15:27:09] Oh well. 
[15:27:55] !log anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: Remove obsolete $wmgCirrusIsBuilding (no functionality change)' [15:27:58] * bd808 imagines a day when there is only `scap` [15:28:01] Logged the message, Master [15:28:09] ^d: Done, please make sure nothing broke ;) [15:28:42] <^d> beta feature still showing up where it should [15:28:45] <^d> thanks! [15:30:24] <^d> manybubbles: What magic do I have to do to raise the redundancy for commonswiki_file? [15:31:01] ^d: the top example here is almost exactly it, I think: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html [15:31:21] also, look at index.auto_expand_replicas [15:31:28] might be a useful default for us [15:31:50] like, default it to 3 on everything and users won't have to mess with the replicas settings ever [15:31:57] same for mw_cirrus_versions [15:32:01] <^d> Oh cool. [15:32:38] yeah [15:32:43] I'd never seen it before a few days ago [15:32:46] then I forgot about it [15:32:52] then I saw it again while looking that up for you [15:33:23] when maxsem comes back, tell him I think he broke mediawiki-vagrant [15:36:07] <^d> https://gist.github.com/demon/4a967ed8361c706dbad2 - look good?
[15:36:11] <^d> if so I'll go ahead and do it [15:37:03] ^d: looks cool [15:37:37] <^d> {"acknowledged":true} [15:38:38] (03CR) 10Ottomata: [C: 032] create admins::bastion for _just_ bastion access [operations/puppet] - 10https://gerrit.wikimedia.org/r/131743 (owner: 10Dzahn) [15:38:40] (03CR) 10Chad: [C: 032] Raise redundancy back up for commonswiki_file as well [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131647 (owner: 10Chad) [15:38:49] (03Merged) 10jenkins-bot: Raise redundancy back up for commonswiki_file as well [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131647 (owner: 10Chad) [15:39:44] (03PS1) 10Mark Bergsma: Remove now unused Tampa LVS service IPs [operations/dns] - 10https://gerrit.wikimedia.org/r/131980 [15:40:30] !log demon synchronized wmf-config/CirrusSearch-common.php 'Raised redundancy for commonswiki_file back up, config to match' [15:40:37] Logged the message, Master [15:40:50] (03CR) 10Mark Bergsma: [C: 032] Remove now unused Tampa LVS service IPs [operations/dns] - 10https://gerrit.wikimedia.org/r/131980 (owner: 10Mark Bergsma) [15:41:33] (03PS4) 10BryanDavis: Adding $deployable_networks variable in network.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/125184 (owner: 10Ottomata) [15:41:48] (03CR) 10jenkins-bot: [V: 04-1] Adding $deployable_networks variable in network.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/125184 (owner: 10Ottomata) [15:43:49] (03PS1) 10Alexandros Kosiaris: Fix a torrus config syntax error [operations/puppet] - 10https://gerrit.wikimedia.org/r/131981 [15:43:51] (03PS1) 10Alexandros Kosiaris: torrus: text is dual layered [operations/puppet] - 10https://gerrit.wikimedia.org/r/131982 [15:44:11] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] torrus: text is dual layered [operations/puppet] - 10https://gerrit.wikimedia.org/r/131982 (owner: 10Alexandros Kosiaris) [15:44:24] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Fix a torrus config syntax error 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/131981 (owner: 10Alexandros Kosiaris) [15:49:03] ^d: sweet. you can check how many it has allocated with the _cat api. play around with it. it loves grep [15:51:41] <^d> We be yellow, hmm [15:51:51] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Tue May 6 08:27:45 2014 [15:52:28] yellow just means its not done copying the new shards [15:52:34] probably [15:53:51] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data exceeded the critical threshold [500.0] [15:53:56] <^d> Ah now we green [15:53:59] <^d> Done I think [16:10:44] (03PS1) 10Alexandros Kosiaris: torrus: remove api-eqiad, remove squid only mib [operations/puppet] - 10https://gerrit.wikimedia.org/r/131988 [16:12:51] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0] [16:14:34] (03CR) 10Alexandros Kosiaris: [C: 032] torrus: remove api-eqiad, remove squid only mib [operations/puppet] - 10https://gerrit.wikimedia.org/r/131988 (owner: 10Alexandros Kosiaris) [16:21:16] (03PS1) 10Manybubbles: Update highlighter to 0.0.8 [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/131990 [16:21:39] (03CR) 10Manybubbles: [C: 032 V: 032] Update highlighter to 0.0.8 [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/131990 (owner: 10Manybubbles) [16:23:20] unless anyone objects I'm going to deploy the hebrew analyzer and a highlighter update to elasticsearch on the cluster. [16:23:35] greg-g: you that means a rolling restart of elasticsearch [16:23:56] the last time I did this the monitoring complained about it going red for a few seconds. I'm going to see if I can prevent that this time around. 
[16:27:17] (03Abandoned) 10Manybubbles: Update highlighter to 0.0.8 [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/131990 (owner: 10Manybubbles) [16:28:38] (03PS1) 10Manybubbles: Install hebrew analyzer and update highlighter [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/131992 [16:28:47] (03CR) 10Manybubbles: [C: 032 V: 032] Install hebrew analyzer and update highlighter [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/131992 (owner: 10Manybubbles) [16:29:08] there we go, that last one was broken because of a weird dependency [16:30:39] (03PS2) 10BBlack: Add (ten|quality).m.wikipedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/131896 (https://bugzilla.wikimedia.org/64972) (owner: 10MaxSem) [16:30:42] <_joe_> !log restarted mwprof/profiler-to-carbon on tungsten, stuck somehow [16:30:49] Logged the message, Master [16:31:49] (03CR) 10BBlack: [C: 032 V: 032] Add (ten|quality).m.wikipedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/131896 (https://bugzilla.wikimedia.org/64972) (owner: 10MaxSem) [16:32:29] silence means no objects then [16:32:47] (03PS3) 10BBlack: Disable mobile redirection for a bunch of .wikipedia.org domains [operations/puppet] - 10https://gerrit.wikimedia.org/r/131887 (https://bugzilla.wikimedia.org/64972) (owner: 10MaxSem) [16:33:13] (03PS1) 10Alexandros Kosiaris: ganglia::collector on netmon1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131993 [16:33:14] !log performing a rolling restart on elasticsearch nodes in production to pick up new plugins: experimental-highlight 0.0.8 and analysis-hebrew 1.1.0 [16:33:22] Logged the message, Master [16:34:50] (03CR) 10BBlack: [C: 032 V: 032] Disable mobile redirection for a bunch of .wikipedia.org domains [operations/puppet] - 10https://gerrit.wikimedia.org/r/131887 (https://bugzilla.wikimedia.org/64972) (owner: 10MaxSem) [16:35:01] (03CR) 10Alexandros Kosiaris: [C: 032] 
ganglia::collector on netmon1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131993 (owner: 10Alexandros Kosiaris) [16:36:02] matanya: when I'm done with the rolling restart (a few hours) I'll rebuild he.wikipedia.org's cirrus index with hebmorph. could you try it there once I'm done? it might be pretty late by the time it actually finishes. [16:42:14] oh lord, torrus configuration compiled successfully for the very first time in what is quite probably months [16:44:41] (03PS1) 10Alexandros Kosiaris: Give netmon1001 manutius's special ganglia handling [operations/puppet] - 10https://gerrit.wikimedia.org/r/131995 [16:46:03] (03CR) 10Alexandros Kosiaris: [C: 032] Give netmon1001 manutius's special ganglia handling [operations/puppet] - 10https://gerrit.wikimedia.org/r/131995 (owner: 10Alexandros Kosiaris) [16:49:57] akosiaris: I never trust anything that compiles the first time. Therefor that configuration must be great! [16:50:16] <_joe_> damn, tor is down [16:50:57] manybubbles: heh, it only compiles. I am not absolutely sure it does anything meaningful yet [16:51:04] but here's to hoping :-) [16:51:58] http://tinyurl.com/og6apls [16:52:05] manybubbles: yes, if it will be really late, i'll do it tomorrow [16:52:11] matanya: thanks! [16:53:03] it'll be another 4ish hours before the rolling restart is done. another while for the rebuild. I forget how long that takes hewiki. [16:53:22] akosiaris: i think i should warn you next time i start rolling the train of moving stuff. i didn't know torrus will be such a headache :) [16:53:34] neither did I [16:53:42] I was pretty sure it would be easy [16:53:48] FLW [16:54:17] ori, do you know if this code is in prod? https://gerrit.wikimedia.org/r/#/c/131011/ [16:55:04] yurikR: ori it is not [16:55:09] yurikR: afaik it is not yet [16:55:09] it is [16:55:12] oh? [16:55:15] backported? [16:55:22] checking on tin... 
[16:55:24] dr0ptp4kt added it to monday's swat [16:55:27] gotcha [16:55:30] then yeah, backported [16:55:31] thanks :) [16:55:50] * greg-g grumbles about no easy way to answer this other than with git commands [16:56:00] will deploy https://gerrit.wikimedia.org/r/#/c/130991/ than [16:56:31] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.109 [16:56:43] greg-g: do you have an idea how to implement it ? [16:57:27] icinga-wm: I know, I did that intentionally [17:00:54] I have an idea of what it would look like, but not the implementation of getting the info exactly (I mean, the easy case is doable) [17:01:29] can you please elaborate greg-g ? [17:16:34] !log yurik synchronized php-1.24wmf3/extensions/ZeroRatedMobileAccess/ [17:16:40] Logged the message, Master [17:19:18] !log yurik synchronized php-1.24wmf2/extensions/ZeroRatedMobileAccess/ [17:19:25] Logged the message, Master [17:27:51] PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.110 [17:31:02] (03CR) 10Dzahn: "Ori, so what replaces this? is there a salt group for just the right servers? can we keep the script name and just change the content?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131931 (owner: 10Ori.livneh) [17:31:39] matanya: hey, sorry, was helping someone test Zero things on my phone. So, for the simple use case the Gerrit API is sufficient (where a thing was merged into master and you're curious which branches were cut that include it). The backport use case is harder and I just need to find the right git command to help. [17:32:06] (03CR) 10Dzahn: "could you replace it here? 
https://wikitech.wikimedia.org/wiki/Apache#Deploying_config" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131931 (owner: 10Ori.livneh) [17:33:17] I see greg-g the simple case shows master, you still need to go to wikitech deployment to see if it was cut and when [17:33:19] (03CR) 10Dzahn: "it was also running dologmsg for SAL" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131931 (owner: 10Ori.livneh) [17:34:34] mutante: wikitech says: Note: this script is in /home/wikipedia/bin/ on fenari and NOT in puppet. [17:34:54] ori: eh, you just deleted it from puppet [17:34:56] matanya: right, you get which branches it's in from gerrit api (on the webview click "Included In"), then either query/screenscrap the roadmap wiki page or we finally get rid of the effing thing and make it all sensible (db-backed) [17:34:56] the version in fenari:/home/wikipedia/bin doesn't call apache-sanity-check, so it actually works [17:35:03] mutante: yes, deleted both versions [17:35:16] mutante: but maybe we should replace it with fenari's copy [17:35:45] ori: no, what i put into puppet was the straight copy from fenari [17:35:56] i puppetized it to put it onto tin [17:36:02] then i made the change, not touching the fenari copy [17:36:20] greg-g: is there a way (i'm thinking git log) to find what is in prod, and export it to a site ? 
[17:37:07] well, we do create these pages programmatically using git log output: https://www.mediawiki.org/wiki/MediaWiki_1.24/wmf3 [17:37:25] but, those are only single points in time, ie: it's already out of date since it doesn't list backports [17:37:56] so just add a cron job to update gerrit/bugzilla when something gets deployed [17:38:05] ori: the sanity check just fails on some boxes, not on the relevant ones, so it does actually work [17:38:16] oh, i see what you mean [17:38:27] but whatever we use, let's fix the docs, not just remove it [17:38:37] mutante: well, the docs are still accurate [17:38:49] ori: the whole point is to get rid of fenari [17:38:51] mutante: in fact, they weren't accurate before, but they are now [17:38:55] that is why i was moving it [17:38:55] "Note: this script is in /home/wikipedia/bin/ on fenari and NOT in puppet." [17:39:03] matanya: yeah, some kind of hook or cron job could fix that issue "easily" [17:39:09] matanya: maybe that's it [17:39:20] mutante: yeah, i think we should move it back into puppet, but it should be the correct script [17:39:25] ori, which server should kaldari log into for querying the eventlogging db? [17:39:27] matanya: maybe that's the easy hack the hack way forward instead of rewriting it all to be better [17:39:28] i think we can add a post merge hook [17:39:28] ori: i'm trying to get rid of fenari, that is precisely why i made that puppet role [17:39:53] ori, checking out the MobileOperatorCode stuff. just hit the API once per second 100 times, which I think should generate an event [17:39:56] * matanya is looking at ^d  [17:39:59] matanya: I'm only wary of a git-hook because I don't want weirdness with wikitech/mw.org to block a deployment :) [17:40:07] mutante: yes, but the file you put in was broken, and it was in two different places in puppet [17:40:15] and the docs weren't updated to indicate that it was in puppet [17:40:18] I'd rather it be async for safety. eventual consistency and such.
[17:40:30] mutante: i'll follow up with another patch for you to review, just a moment [17:40:31] greg-g: i think ^d can answer the best [17:40:32] <^d> huh? [17:40:33] yurik + yurikR, seeing anything in the fatals log? [17:40:54] dr0ptp4kt, no, why? [17:41:00] ^d: is it possible to add a post merge hook to gerrit to show a change was merged ? [17:41:01] ori: i put it in exactly as i found it on fenari, then applied the fix to make it clear.. ok, thanks [17:41:02] yurikR, just wondering just in case [17:41:07] deployed [17:41:13] not merged [17:41:14] mutante: what was the fix? [17:42:01] ori: removing that part about detecting "home mounted apaches" https://gerrit.wikimedia.org/r/#/c/130600/2/modules/apachesync/files/apache-graceful-all [17:42:08] <^d> git hooks in gerrit don't work like normal git hooks. [17:42:11] ori: and replacing ddsh with dsh [17:42:30] mutante: i have to run for a bit, but as i understand it, what needs to be done is: [17:42:39] 1) the docs should remove the note about the file not being in puppet, and the file being in fenari [17:42:53] <^d> I actually ripped all of the old python gerrit hooks out on purpose. [17:42:54] 2) the file should move back into puppet, but it should be fixed properly [17:43:04] <^d> I'm not really wanting to bring them back [17:43:12] 2a) it should not call apache-sanity-check [17:43:29] 2b) it should not rely on /etc/cluster == pmtpa [17:43:34] ^d: use case: we want to know what patch was deployed where, can you suggest an easy why to show that in gerrit interface ? [17:43:46] 2b makes it only log on fenari anyway [17:43:53] so it wasn't truly decoupled from it [17:44:06] <^d> matanya: not really because gerrit's interface sucks. [17:44:22] any other idea ^d ? [17:44:30] mutante: if you don't beat me to it, i'll submit a patch, but i can't right now since i have to run for a bit [17:44:34] <^d> Check the git repo, that's always the best. [17:45:06] ^d: i have no access to prod [17:45:23] <^d> The repo's public. 
Only thing that wouldn't be is a security patch. [17:45:24] matanya: ^d yeah, sounds like cron job is easiest for now [17:45:44] matanya: 1 hour error band is probably ok [17:45:55] i guess i can do that [17:46:30] where do you want to see it greg-g ? [17:46:46] ori: i agree with all of that,and we were in the middle of doing that. i would have updated docs once it actually worked of course [17:46:49] It could just be an addendum/special section on those wmfXX pages [17:46:59] matanya: see for reference: https://git.wikimedia.org/blob/mediawiki%2Ftools%2Frelease.git/HEAD/make-deploy-notes%2FuploadChangelog.php [17:47:13] 500 greg-g [17:47:16] but just deleting it without a replacement [17:47:42] ? [17:47:43] gitblit is down most of the time [17:47:46] oh [17:48:07] <^d> http://fab.wmflabs.org/rMW8c6fca60f2b1fa6fa3408e92a22da0f1be536f49 - I like how differential presents the data. [17:48:12] <^d> Nice list of branches/tags it's on [17:48:14] so, I think first pass just a wiki page proof of concept, we can do fun things after [17:48:37] "Branches master, wmf/1.24wmf3 " yeah [17:48:47] commented how it doesn't touch the original on fenari to avoid that [17:48:55] <^d> Go back a little later, you see even more: http://fab.wmflabs.org/rMWaa85568350993d7d0be7bc69b8ed96b388dbc07d [17:48:57] ^d: the problem is backports, at least gerrit isn't smart enough to know to link them [17:49:13] <^d> s/to know to link them// [17:49:19] ^d: eg: https://gerrit.wikimedia.org/r/#/c/131011/ that's out on prod as a backport [17:49:22] ^d: touche ;) [17:51:32] greg-g: any labs host with all the code checked out already ? [17:51:56] probably, but not sure which one you'd want to mess with [17:52:02] <^d> Only the initial clone takes awhile. 
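A minimal sketch of the "check the git repo" suggestion above, assuming a local clone of the public repository: it lists the remote branches that contain a given commit, similar to the "Branches master, wmf/1.24wmf3" listing quoted from Differential. The helper names (`parse_branch_list`, `branches_containing`) and the `repo_dir` parameter are hypothetical. As noted in the discussion, backports are a separate problem: a cherry-picked commit has a different hash, so a lookup by hash would not find it.

```python
import subprocess


def parse_branch_list(output):
    """Turn the output of `git branch -r --contains` into a list of branch names."""
    branches = []
    for line in output.splitlines():
        name = line.strip()
        # Skip blank lines and symbolic refs such as "origin/HEAD -> origin/master".
        if not name or " -> " in name:
            continue
        branches.append(name)
    return branches


def branches_containing(commit, repo_dir="."):
    """Return the remote branches in repo_dir (an assumed local clone) containing commit."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "branch", "-r", "--contains", commit],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_branch_list(result.stdout)
```

An hourly cron job along the lines discussed could call `branches_containing` for recent commits and publish the result to a wiki page; that publishing step is not sketched here.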
[17:52:20] I guess ideally you can bang on it in the beta cluster once its ready though [17:52:52] greg-g: my plan is simple: git log take all hashses and throw on a wiki page, run hourly [17:53:11] * greg-g nods [17:53:50] <^d> The full history of mw core is over 60k commits. You probably don't want the full list of hashes. [17:53:59] not "all" [17:54:03] :) [17:55:24] yurikR, things seem to be working. i'm running the phantomJS stuff from my local machine [17:56:43] (03CR) 10Dzahn: "note how this did "if [ `cat /etc/cluster` == eqiad ]", replaced ddsh with dsh, removed the "home-mounted" stuff, and did not touch the fe" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130600 (owner: 10Dzahn) [18:17:09] (03PS1) 10Dzahn: re-add apache-graceful-all with some fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/132011 [18:18:39] (03PS2) 10Dzahn: re-add apache-graceful-all with some fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/132011 [18:20:31] RECOVERY - ElasticSearch health check on elastic1002 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2049: active_shards: 6146: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [18:20:41] PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.112 [18:23:07] (03PS3) 10Dzahn: re-add apache-graceful-all with some fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/132011 [18:31:59] (03PS4) 10Dzahn: re-add apache-graceful-all with some fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/132011 [18:35:05] (03CR) 10Dzahn: "is the removal of package wikimedia-job-runner related ?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131931 (owner: 10Ori.livneh) [18:37:37] Reedy: do you know where /etc/cluster came from? 
[18:37:56] on fenari it has "pmtpa" in it, but we don't have it on tin yet with "eqiad" [18:39:08] ah, think i found it, nvm [18:39:21] in modules/appserver [18:40:21] so then the thing is that class applicationserver::config::apache isn't on tin [18:47:41] RECOVERY - ElasticSearch health check on elastic1005 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2049: active_shards: 6146: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [18:47:51] RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2049: active_shards: 6146: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [18:52:51] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Tue May 6 08:27:45 2014 [19:12:27] (03CR) 10Matanya: [C: 031] create admins::bastion for _just_ bastion access [operations/puppet] - 10https://gerrit.wikimedia.org/r/131743 (owner: 10Dzahn) [19:13:19] is there a nice graph of Commons uploads somewhere? [19:13:40] I want to find out the high and low values for upload/minute [19:15:06] tgr: I don't think we have anything like it; I for one have never heard of it. [19:15:36] tgr: The best place to ask, I think, is on the Village pump on Commons, maybe someone will have the answer for you [19:16:56] There might be some graphs for the load of servers that handle uploads, in which case I just don't know they exist :-) [19:30:23] greg-g: is the deployment train chugging along as usual through the Zurich Hackathon? [19:30:52] ragesoss: yeppers [19:31:07] * ragesoss wipes forehead, heaves a sigh of relief. 
[19:33:51] PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.140 [19:38:49] I'd love it if I could pre-silence the criticals from restarting the service.... [19:42:14] twkozlowski: http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=bytes_in&s=by+name&c=Upload+caches+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4 [19:42:24] seems to be the closest thing [19:42:34] manybubbles: You need an icinga account.. And maybe some rights [19:42:36] mutante: ^^ [19:43:14] manybubbles: we should be able to fix that [19:43:21] icinga login = LDAP wmf [19:43:31] but some more permissions to be able to ACK / scheduled downtime [19:43:46] http://ganglia.wikimedia.org/latest/?c=Swift%20eqiad&h=Swift%20eqiad%20prod&m=cpu_report&r=hour&s=descending&hc=4&mc=2 would be the closest thing [19:43:49] it's not great [19:43:59] and it's not just commons [19:46:34] the rights are in puppet, I had to fix them for myself a while back (after LDAP is working) [19:47:04] I don't think so many alerts should fire off when restarting a service, though [19:47:05] files/icinga/cgi.cfg [19:47:07] mutante: "Not Authorized" - but I got most of the way through [19:47:37] (several lines in there, search for all instances of bblack) [19:47:51] puppet/files/icinga/cgi.cfg [19:48:08] grep authorized cgi.cfg [19:48:15] paravoid: what we're seeing is the critical from the restart - the whole cluster hangs in yellow for most of the time when I restart them all [19:48:30] when it goes green is actually the trigger I can move on to the next one [19:52:19] (03PS1) 10Dzahn: add account for Dmitry Brant [operations/puppet] - 10https://gerrit.wikimedia.org/r/132024 [19:54:11] PROBLEM - Disk space on analytics1019 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/j 120814 MB (6% inode=99%): /var/lib/hadoop/data/k 76058 MB (4% inode=99%): /var/lib/hadoop/data/d 93439 MB (4% inode=99%): /var/lib/hadoop/data/l 75016 MB 
(3% inode=99%): /var/lib/hadoop/data/e 99738 MB (5% inode=99%): /var/lib/hadoop/data/g 98839 MB (5% inode=99%): /var/lib/hadoop/data/h 117565 MB (6% inode=99%): /var/lib/hadoop/data/ [19:54:51] PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.141 [19:55:15] (03PS1) 10Dzahn: allow manybubbles to run icinga commands [operations/puppet] - 10https://gerrit.wikimedia.org/r/132026 [19:56:40] I was trying to figure out which one to add.... [19:56:43] but it looks like all of them [19:56:46] thanks [19:57:04] Looks like that should be templated or something ;) [19:57:25] (03CR) 10Manybubbles: [C: 031] allow manybubbles to run icinga commands [operations/puppet] - 10https://gerrit.wikimedia.org/r/132026 (owner: 10Dzahn) [19:58:31] the Union of Sysadmins Who Like to Type a Lot vetoes your template suggestion [19:59:06] 0,$s/bblack/bblack,manybubbles/g :p [20:01:23] * subbu readies to update parsoid [20:03:10] (03PS1) 10Manybubbles: Turn new highlighter on for more wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132097 [20:04:16] (03CR) 10Manybubbles: "Any objection to this plan? I don't want to just do all the wikis that we're secondary on because I'd like to be able to pay attention du" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132097 (owner: 10Manybubbles) [20:05:57] (03CR) 10Chad: [C: 031] Turn new highlighter on for more wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132097 (owner: 10Manybubbles) [20:07:11] !log deployed parsoid 71f4e884 (with deploy sha 9a62899d) [20:07:18] Logged the message, Master [20:11:11] (03CR) 10Manybubbles: "Added max so he knows I'll be reindexing these tomorrow around 9am sf time which should make geodata start showing up. I won't do a full " [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132097 (owner: 10Manybubbles) [20:15:19] lets see if icinga notices the next one.... 
[20:16:51] RECOVERY - ElasticSearch health check on elastic1008 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2049: active_shards: 6146: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:16:51] RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2049: active_shards: 6146: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:17:28] it missed that one [20:21:24] (03PS1) 10RobH: adding iridium to dns [operations/dns] - 10https://gerrit.wikimedia.org/r/132103 [20:22:42] hrmm [20:25:01] (03CR) 10RobH: [C: 031] "looks good, but letting chase push live" [operations/dns] - 10https://gerrit.wikimedia.org/r/132103 (owner: 10RobH) [20:27:50] (03CR) 10Rush: [C: 032 V: 032] "mmmm dns good ...hopefully this isn't an elaborate joke on the new guy" [operations/dns] - 10https://gerrit.wikimedia.org/r/132103 (owner: 10RobH) [20:37:01] OMGOMGOMGRUSHBROKEWIKIPEDIAAAAAA [20:37:31] ? [20:37:51] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.143 [20:38:34] oops [20:41:03] ? [20:41:55] eh, elastic1011 is up [20:42:25] and has that IP.. should we do anything about the monitoring? [20:42:42] ferm rules? [20:43:51] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds [20:44:13] mutante: its "yellow" [20:44:19] meaning the cluster isn't healthy [20:44:41] we can do things about it, but I'm not really sure what [20:45:19] well, if it's not healthy then monitoring did its job? 
[20:45:53] thought it was just a false positive, but then it would be ok [20:48:16] mutante: it isn't healthy - it is only running with 2x redundancy instead of 3x [20:48:27] that is how you do a rolling restart most of the time though [20:48:38] there are other ways, but they are _so slow_ [20:48:59] you still keep your 3x, but it's like 3x the time [20:49:06] manybubbles: if you say this is part of the rolling restart then nvm [20:50:08] mutante: 12:33:13 manybubbles | !log performing a rolling restart on elasticsearch nodes in production to pick up [20:50:31] manybubbles: gotcha, yep [20:50:31] if it was all over in a few minutes I don't think anyone would care [20:50:52] but it takes so long that the log is outside of most people's radar [20:51:30] yea, back to "scheduled downtime" in icinga [20:51:48] If I could I'd just report downtime on the whole cluster and then watch it out of the corner of my eye [20:51:51] RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2049: active_shards: 6146: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:51:53] either you'll get the access or you can ping me next time if you like [20:52:49] mutante: if you want to you can schedule downtime for elastic1012-1016 - they'll be unreachable for a total of about 120 seconds during the service bounce [20:52:53] but only one time [20:53:17] I think there is a special downtime type for that [20:53:22] triggered or something [20:53:27] for a couple hours? [20:54:22] i see what you mean, yep [20:56:53] mutante: looks like I just did 12. I'll do 13 in about 20, 25 minutes [20:59:28] manybubbles: in scheduled downtime for 2 hours.. was easier than the trigger [20:59:48] mutante: thanks! 
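The fixed two-hour downtime for elastic1012-1016 described above was set through the Icinga web interface. As a rough sketch, the same thing can be expressed as Nagios/Icinga 1.x external commands, which use the `SCHEDULE_HOST_DOWNTIME` and `SCHEDULE_HOST_SVC_DOWNTIME` line format. The function name, author and comment values are hypothetical, and the sketch only builds the command lines; where they would be written (the location of the Icinga command file) is not stated in the log and is left out.

```python
import time


def downtime_commands(hosts, duration_s, author, comment, now=None):
    """Build external-command lines for a fixed downtime starting now.

    Field order follows the Nagios/Icinga 1.x external command format:
    host;start;end;fixed;trigger_id;duration;author;comment
    """
    now = int(time.time()) if now is None else int(now)
    end = now + duration_s
    lines = []
    for host in hosts:
        # fixed=1 (downtime covers exactly start..end), trigger_id=0 (not triggered)
        fields = f"{host};{now};{end};1;0;{duration_s};{author};{comment}"
        # One command for the host itself, one for all services on that host.
        lines.append(f"[{now}] SCHEDULE_HOST_DOWNTIME;{fields}")
        lines.append(f"[{now}] SCHEDULE_HOST_SVC_DOWNTIME;{fields}")
    return lines


# Hosts taken from the conversation; 2 hours as chosen there.
ELASTIC_HOSTS = [f"elastic{n}" for n in range(1012, 1017)]
COMMANDS = downtime_commands(ELASTIC_HOSTS, 2 * 3600, "example-user", "rolling restart")
```

A fixed window was described in the log as easier than a triggered downtime, which is why `trigger_id` is left at 0 here.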
[21:00:12] you can watch them at once on https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=elastic [21:00:55] (adding real servicegroup = nice to have) [21:01:29] but we don't use this much yet https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?servicegroup=all&style=overview&nostatusheader [21:06:30] (03PS1) 10Dzahn: add dbrant to mobile release uploaders [operations/puppet] - 10https://gerrit.wikimedia.org/r/132109 [21:16:02] (03PS1) 10Dzahn: add dbrant to stat1003 "special users" and bast [operations/puppet] - 10https://gerrit.wikimedia.org/r/132110 [21:20:13] manybubbles: is it done? [21:21:03] (03PS1) 10Gergő Tisza: Throttle GWToolset uploads [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132112 [21:24:49] matanya: no - I realize I can't do it until _after_ the train deploy tomorrow [21:25:01] we don't have any code that supports using the hebrew analyzer until then [21:44:18] (03CR) 10Gergő Tisza: "The 10% target is completely random; I will change if someone has a better suggestion." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132112 (owner: 10Gergő Tisza) [21:45:14] (03CR) 10Ottomata: [C: 032] add dbrant to stat1003 "special users" and bast [operations/puppet] - 10https://gerrit.wikimedia.org/r/132110 (owner: 10Dzahn) [21:53:51] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Tue May 6 08:27:45 2014 [22:10:34] (03CR) 10Rillke: [C: 031] "Site looks good to me. Clear authorship and license information available." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131968 (owner: 10Steinsplitter) [22:19:51] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [22:22:32] (03PS1) 10Ori.livneh: Apply ::mediawiki::jobrunner on osmium [operations/puppet] - 10https://gerrit.wikimedia.org/r/132119 [22:24:31] (03CR) 10Ori.livneh: [C: 032] Apply ::mediawiki::jobrunner on osmium [operations/puppet] - 10https://gerrit.wikimedia.org/r/132119 (owner: 10Ori.livneh) [22:32:45] (03PS1) 10Rush: dhcp and partmon for iridium [operations/puppet] - 10https://gerrit.wikimedia.org/r/132120 [22:34:28] (03PS1) 10Ori.livneh: Remove dependency on mediawiki::packages from mediawiki::sync [operations/puppet] - 10https://gerrit.wikimedia.org/r/132121 [22:34:30] (03CR) 10Dzahn: [C: 04-1] "fixed-address iron.wikimedia.org; = fixed-address iridium.wikimedia.org;" [operations/puppet] - 10https://gerrit.wikimedia.org/r/132120 (owner: 10Rush) [22:35:09] (03CR) 10Rush: "thanks" [operations/puppet] - 10https://gerrit.wikimedia.org/r/132120 (owner: 10Rush) [22:36:00] (03CR) 10Dzahn: "re-adding in I4d1b36476555" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131931 (owner: 10Ori.livneh) [22:36:07] (03PS2) 10Rush: dhcp and partmon for iridium [operations/puppet] - 10https://gerrit.wikimedia.org/r/132120 [22:36:51] mutante: is there no salt group for it? 
[22:36:54] for the apaches, i mean [22:38:17] (03CR) 10Dzahn: [C: 031] "Prefix Vendor" [operations/puppet] - 10https://gerrit.wikimedia.org/r/132120 (owner: 10Rush) [22:39:22] (03PS1) 10Ori.livneh: Provision mediawiki::sync on osmium [operations/puppet] - 10https://gerrit.wikimedia.org/r/132122 [22:39:33] ori: i don't think so, that's what i asked [22:40:26] (03CR) 10Ori.livneh: [C: 032] Remove dependency on mediawiki::packages from mediawiki::sync [operations/puppet] - 10https://gerrit.wikimedia.org/r/132121 (owner: 10Ori.livneh) [22:41:44] (03CR) 10Ori.livneh: [C: 031] "+1, but this should really be replaced with something that leverages salt, instead of being another special case." [operations/puppet] - 10https://gerrit.wikimedia.org/r/132011 (owner: 10Dzahn) [22:42:08] bugzilla admin online? [22:42:17] (03PS3) 10Rush: dhcp and partmon for iridium [operations/puppet] - 10https://gerrit.wikimedia.org/r/132120 [22:42:37] ori: yes, just the intention was "first move existing stuff, then talk about trebuchet/salt/deployment later" [22:43:04] also tried to not touch any mwdeployment things.. [22:44:06] (03PS4) 10Rush: dhcp and partmon for iridium [operations/puppet] - 10https://gerrit.wikimedia.org/r/132120 [22:44:31] mutante: makes sense :) [22:45:32] (03CR) 10Ori.livneh: [C: 032] Provision mediawiki::sync on osmium [operations/puppet] - 10https://gerrit.wikimedia.org/r/132122 (owner: 10Ori.livneh) [22:45:36] (03CR) 10RobH: [C: 031] dhcp and partmon for iridium [operations/puppet] - 10https://gerrit.wikimedia.org/r/132120 (owner: 10Rush) [22:46:00] (03CR) 10Rush: [C: 032 V: 032] "awayyyyyyy!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/132120 (owner: 10Rush) [22:56:15] (03PS1) 10Ori.livneh: Add osmium to mediawiki-installation [operations/puppet] - 10https://gerrit.wikimedia.org/r/132123 [22:56:39] I'm going to run a no-op scap in a moment to ensure that scapping to osmium works [22:56:59] (by 'no-op' I mean that no code change is going out.) 
[22:57:11] <^d> no objection here. [22:57:27] (03PS2) 10Ori.livneh: Add osmium to mediawiki-installation [operations/puppet] - 10https://gerrit.wikimedia.org/r/132123 [22:59:40] (03CR) 10Ori.livneh: [C: 032] Add osmium to mediawiki-installation [operations/puppet] - 10https://gerrit.wikimedia.org/r/132123 (owner: 10Ori.livneh) [23:02:51] !log ori Started scap: No changes; testing scap to osmium [23:02:56] Logged the message, Master [23:08:07] An unknown error occurred in storage backend "local-swift-eqiad". at Wed, 07 May 2014 23:07:37 GMT served by mw1135 [23:08:08] o-O [23:08:54] (03PS1) 10Ori.livneh: Make mediawiki::sync depend on mediawiki::user::* [operations/puppet] - 10https://gerrit.wikimedia.org/r/132125 [23:09:39] (03PS2) 10Ori.livneh: Make mediawiki::sync depend on mediawiki::user::* [operations/puppet] - 10https://gerrit.wikimedia.org/r/132125 [23:11:13] (03CR) 10Ori.livneh: [C: 032] Make mediawiki::sync depend on mediawiki::user::* [operations/puppet] - 10https://gerrit.wikimedia.org/r/132125 (owner: 10Ori.livneh) [23:14:18] !log ori scap aborted: No changes; testing scap to osmium (duration: 11m 27s) [23:14:22] !log ori Started scap: No changes; testing scap to osmium (again) [23:14:25] Logged the message, Master [23:14:32] Logged the message, Master [23:17:23] (03PS1) 10Rush: iridium dhcp mac correction [operations/puppet] - 10https://gerrit.wikimedia.org/r/132126 [23:17:48] !log ori Finished scap: No changes; testing scap to osmium (again) (duration: 03m 25s) [23:17:55] Logged the message, Master [23:18:40] chasemp: multiple nics, eh [23:18:40] (03CR) 10Rush: [C: 032 V: 032] "so sayeth the robh" [operations/puppet] - 10https://gerrit.wikimedia.org/r/132126 (owner: 10Rush) [23:18:57] (03PS1) 10Ori.livneh: include admins::mortals on osmium, to allow MediaWiki deployments [operations/puppet] - 10https://gerrit.wikimedia.org/r/132127 [23:19:23] (03CR) 10Ori.livneh: [C: 032 V: 032] include admins::mortals on osmium, to allow MediaWiki deployments 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/132127 (owner: 10Ori.livneh) [23:19:39] (03PS2) 10Ori.livneh: include admins::mortals on osmium, to allow MediaWiki deployments [operations/puppet] - 10https://gerrit.wikimedia.org/r/132127 [23:19:39] ori: It's already there [23:19:44] (03CR) 10Ori.livneh: [V: 032] include admins::mortals on osmium, to allow MediaWiki deployments [operations/puppet] - 10https://gerrit.wikimedia.org/r/132127 (owner: 10Ori.livneh) [23:20:16] (03PS1) 10Hoo man: Revert "include admins::mortals on osmium, to allow MediaWiki deployments" [operations/puppet] - 10https://gerrit.wikimedia.org/r/132128 [23:20:23] ori: ^ [23:20:42] (03CR) 10Dzahn: [C: 031] Revert "include admins::mortals on osmium, to allow MediaWiki deployments" [operations/puppet] - 10https://gerrit.wikimedia.org/r/132128 (owner: 10Hoo man) [23:21:24] blergh, stupid of me. thanks. [23:21:30] (03CR) 10Ori.livneh: [C: 032] Revert "include admins::mortals on osmium, to allow MediaWiki deployments" [operations/puppet] - 10https://gerrit.wikimedia.org/r/132128 (owner: 10Hoo man) [23:23:12] (03PS5) 10Dzahn: re-add apache-graceful-all with some fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/132011 [23:25:41] (03CR) 10Dzahn: [C: 032] re-add apache-graceful-all with some fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/132011 (owner: 10Dzahn) [23:27:48] !log demon Started scap: no-op scap, for ori [23:27:54] Logged the message, Master [23:29:03] (03CR) 10Dzahn: "now tin also has /etc/cluster with the site name, like fenari does" [operations/puppet] - 10https://gerrit.wikimedia.org/r/132011 (owner: 10Dzahn) [23:33:44] root is testing dologmsg on tin [23:34:02] ^ why does it take so long for one of those to execute ,btw [23:34:10] it goes through neon [23:34:15] ah [23:34:23] that makes more sense [23:36:12] duh, check before asking.. 
echo "$*" | nc -q0 neon.wikimedia.org 9200 , hah [23:36:22] didn't realize that was all it was [23:36:50] !log demon Finished scap: no-op scap, for ori (duration: 09m 01s) [23:36:57] Logged the message, Master [23:38:27] if [ -z $DOLOGMSGNOLOG ]; then ..:) [23:38:36] dologmsgnolog [23:38:49] ^d much obliged [23:38:56] <^d> anytime [23:42:51] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 5 below the confidence bounds
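For reference, a minimal Python sketch of what the quoted `dologmsg` one-liner does: it sends the message as a single line over TCP to neon.wikimedia.org port 9200. The host and port come from the quoted command. Only the `if [ -z $DOLOGMSGNOLOG ]` test is quoted, so the behaviour here (skip sending when `DOLOGMSGNOLOG` is set) is an assumption, and the `env` parameter and timeout are hypothetical additions for testability.

```python
import os
import socket


def dologmsg(message, host="neon.wikimedia.org", port=9200, env=None):
    """Send one log line over TCP, roughly like `echo "$*" | nc -q0 host port`.

    Returns True if the message was sent, False if logging was suppressed.
    """
    env = os.environ if env is None else env
    # Assumption: a non-empty DOLOGMSGNOLOG suppresses the message, mirroring
    # the quoted `[ -z $DOLOGMSGNOLOG ]` check.
    if env.get("DOLOGMSGNOLOG"):
        return False
    with socket.create_connection((host, port), timeout=10) as sock:
        # echo appends a trailing newline, so do the same here.
        sock.sendall((message + "\n").encode("utf-8"))
    return True
```

Calling `dologmsg("some message")` with the defaults would try to reach the real host, so point `host` and `port` at a local listener when experimenting.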