[00:38:14] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100% [00:39:04] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 26.98 ms [02:08:37] !log LocalisationUpdate completed (1.22wmf22) at Mon Oct 28 02:08:37 UTC 2013 [02:08:58] Logged the message, Master [02:15:57] !log LocalisationUpdate completed (1.23wmf1) at Mon Oct 28 02:15:57 UTC 2013 [02:16:13] Logged the message, Master [02:36:16] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 28 02:36:16 UTC 2013 [02:36:31] Logged the message, Master [04:02:34] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours [06:00:53] (03PS1) 10Ori.livneh: Remove references to 'olivneh' account from node defs [operations/puppet] - 10https://gerrit.wikimedia.org/r/92267 [06:35:36] (03PS1) 10Ori.livneh: [WIP] Add Graphite module & role [operations/puppet] - 10https://gerrit.wikimedia.org/r/92271 [06:35:59] paravoid: ^ [06:36:31] very incomplete. there's an overview of the state of the patch in the commit message [06:38:52] I wonder where python-{carbon,whisper} come from [06:38:56] and python-graphite-web [06:39:14] these aren't Debian/Ubuntu for sure [06:40:25] Debian does have a different python-whisper plus graphite-carbon and graphite-web [06:40:31] different from what we have [06:41:08] some random PPA at some point in the past is what I'm guessing [06:41:39] probably; IIRC the listed maintainer is asher@ [06:41:58] !log powercycled ms-be1001, inaccessible via ssh or mgmt [06:42:14] Logged the message, Master [06:42:16] Maintainer: Chris Davis [06:42:27] apergos: oh, I didn't realize -- thanks! [06:42:30] yw [06:42:42] they've been hanging [06:42:47] one of the benefits of checking puppet freshness every day [06:42:47] it's the third one I think [06:42:50] no idea why [06:42:54] nothing obvious [06:43:11] saw some: [2107952.990239] BUG: soft lockup - CPU#1 stuck for 23s! 
[kworker/1:1:17962] [06:43:14] yeah [06:43:18] different processes on each though [06:43:18] same as the others [06:43:34] ori-l: so, we need to either forward port these packages to precise or switch to the Debian ones [06:43:43] I think you can guess my preference :P [06:43:49] they're in precise [06:44:02] in our apt repo, i mean [06:44:23] oh are they [06:44:41] that's how i've been testing things; mediawiki-vagrant uses apt.wikimedia.org [06:44:54] RECOVERY - Disk space on ms-be1001 is OK: DISK OK [06:44:54] RECOVERY - swift-object-server on ms-be1001 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [06:44:54] RECOVERY - swift-account-reaper on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [06:44:55] RECOVERY - swift-container-auditor on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [06:44:55] RECOVERY - DPKG on ms-be1001 is OK: All packages OK [06:45:04] RECOVERY - swift-container-server on ms-be1001 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [06:45:04] RECOVERY - swift-object-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [06:45:05] RECOVERY - swift-object-auditor on ms-be1001 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [06:45:07] but i'm in favor of switching to debian ones too [06:45:14] RECOVERY - RAID on ms-be1001 is OK: OK: State is Optimal, checked 14 logical drive(s), 14 physical drive(s) [06:45:14] RECOVERY - swift-container-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [06:45:14] RECOVERY - swift-account-auditor on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [06:45:14] RECOVERY - 
swift-account-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [06:45:14] RECOVERY - swift-container-updater on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [06:45:14] RECOVERY - SSH on ms-be1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [06:45:24] RECOVERY - swift-account-server on ms-be1001 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [06:45:30] ok, I see them now [06:45:33] different packages [06:45:34] RECOVERY - swift-object-updater on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [06:45:34] RECOVERY - Puppet freshness on ms-be1001 is OK: puppet ran at Mon Oct 28 06:45:31 UTC 2013 [06:45:44] asher@ indeed [06:45:53] other differences too, including versions [06:45:54] oh fun [06:46:12] the whisper one is the Debian one though [06:46:39] so that leaves carbon & web [06:47:02] ceres is also packaged in Debian fwiw [06:47:30] what are your thoughts on ceres btw? [06:47:56] i saw, but graphite is already severely lacking in good documentation and on ceres the interwebs are practically silent [06:48:04] RECOVERY - NTP on ms-be1001 is OK: NTP OK: Offset -0.002200245857 secs [06:48:54] ok [06:48:56] maybe later then [06:49:31] yeah. there are some good resources for scaling carbon-cache, written from experience.. i linked to them in manifests/role/graphite.pp. haven't found anything comparable for ceres yet [06:50:16] http://anatolijd.blogspot.com/2013/06/graphitemegacarbonceres-multi-node.html : "And while being announced two years ago, Ceres comes completely undocumented. But lack of documentation should never stop us to experiment!" [06:50:57] class role::graphite { 1 [06:50:58] class { 'graphite': [06:51:03] does that even work? 
[06:51:25] if it is, I'll be impressed [06:51:36] at puppet's usual craziness :) [06:51:39] no, it probably needs to be qualified. the role class wrapper was something i added in the course of copying the files over to operations/puppet [06:51:50] oh, ok [06:54:15] ok, looks reasonable in general [06:54:28] I puked every time I saw /opt of course :P [06:55:20] but you anticipated that [06:55:59] at least it's contained in one tree [06:56:20] the Debian packages I was talking about before are of course not like that [06:57:12] well, if some good soul went through the trouble of dotting the is and crossing the ts in the config files to make it work, let's use it [06:57:52] carbon.conf.example has some cheery comment to the effect of 'to use FHS paths, just set these three values', which i dutifully did, and lots of weird things started to break [06:59:19] carbon & whisper are even part of debian stable now [06:59:29] ceres & web are only unstable/testing [06:59:43] but stable probably means that they work [07:17:41] !log delaying one tampa slave per shard during OSC [07:17:58] Logged the message, Master [07:31:07] (03PS2) 10ArielGlenn: removing last vestiges (mgmt ip) for sq31-36, decommed [operations/dns] - 10https://gerrit.wikimedia.org/r/91160 [07:37:03] (03CR) 10ArielGlenn: [C: 032] removing last vestiges (mgmt ip) for sq31-36, decommed [operations/dns] - 10https://gerrit.wikimedia.org/r/91160 (owner: 10ArielGlenn) [07:45:11] nov 2012? [07:45:12] lol [08:13:42] was etherpad updated? 
I'm noticing weird new behaviours and I wonder if it's because of their new handling of corrupted pads [08:21:13] actually, I'm unable to save anything on any pad [08:22:39] !log etherpad.wikimedia.org seems read-only [08:22:55] Logged the message, Master [08:27:35] filed as https://bugzilla.wikimedia.org/show_bug.cgi?id=56232 [08:28:15] (03PS1) 10ArielGlenn: remove entries for db5,6,7,26,27 long since decommed [operations/dns] - 10https://gerrit.wikimedia.org/r/92272 [08:31:52] Nemo_bis: please check now [08:34:04] PROBLEM - etherpad_lite_process_running on zirconium is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^node node_modules/ep_etherpad-lite/node/server.js [08:34:39] mm no good it seems. that's too bad [08:35:07] apergos: same [08:36:32] yeah just a sec [08:37:04] RECOVERY - etherpad_lite_process_running on zirconium is OK: PROCS OK: 1 process with regex args ^node node_modules/ep_etherpad-lite/node/server.js [08:37:12] Nemo_bis: now? [08:38:06] (it seems now to work for me) [08:42:35] apergos: seems to work, let me test a bit more [08:42:41] !log restarted etherpadlite on zirconium, see ticket 6093, it was not saving edits [08:42:56] Logged the message, Master [08:56:09] (03PS1) 10ArielGlenn: remove db5,8,26 from all dsh group files, long since decommed [operations/puppet] - 10https://gerrit.wikimedia.org/r/92273 [08:56:34] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 305 seconds [08:57:04] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 315 seconds [08:58:57] (03CR) 10ArielGlenn: [C: 032] remove db5,8,26 from all dsh group files, long since decommed [operations/puppet] - 10https://gerrit.wikimedia.org/r/92273 (owner: 10ArielGlenn) [09:02:04] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 142 seconds [09:02:34] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay -0 seconds [09:10:09] mark, i tried tons of different options - seems that 
the moment ESI is enabled, varnish creates tons of worker threads and dies, regardless of the URL that the backend asks it to include [09:23:22] (03PS1) 10ArielGlenn: one more db26 removal from dsh files [operations/puppet] - 10https://gerrit.wikimedia.org/r/92277 [09:28:05] (03CR) 10ArielGlenn: [C: 032] one more db26 removal from dsh files [operations/puppet] - 10https://gerrit.wikimedia.org/r/92277 (owner: 10ArielGlenn) [09:33:56] (03PS1) 10ArielGlenn: removing dhcp entries for arsenic/niobium (reclaimed, see rt #5848) [operations/puppet] - 10https://gerrit.wikimedia.org/r/92278 [09:36:31] (03CR) 10ArielGlenn: [C: 032] removing dhcp entries for arsenic/niobium (reclaimed, see rt #5848) [operations/puppet] - 10https://gerrit.wikimedia.org/r/92278 (owner: 10ArielGlenn) [09:45:57] @replag [09:57:53] (03PS1) 10ArielGlenn: add back palladium mgmt ip [operations/dns] - 10https://gerrit.wikimedia.org/r/92280 [10:08:46] (03PS2) 10ArielGlenn: add back palladium mgmt ip [operations/dns] - 10https://gerrit.wikimedia.org/r/92280 [10:10:11] (03CR) 10ArielGlenn: [C: 032] add back palladium mgmt ip [operations/dns] - 10https://gerrit.wikimedia.org/r/92280 (owner: 10ArielGlenn) [10:49:51] (03PS1) 10Matanya: removing cache clean up patch [operations/puppet] - 10https://gerrit.wikimedia.org/r/92288 [11:07:49] apergos: around? [11:08:20] yes (though busy), what's up? [11:09:26] apergos: hi, sorry to interrupt, just need to know if any lucid mysql server still exists [11:09:43] let's see [11:11:55] the m2 shard hosts are apparently lucid [11:12:19] there are a number in tampa as well [11:12:19] oh, darn. 
thanks a lot apergos [11:12:21] that's it [11:12:29] yw [11:25:27] (03PS1) 10Mark Bergsma: Tabs to spaces [operations/dns] - 10https://gerrit.wikimedia.org/r/92292 [11:25:28] (03PS1) 10Mark Bergsma: Cleanup [operations/dns] - 10https://gerrit.wikimedia.org/r/92293 [11:26:11] (03CR) 10Mark Bergsma: [C: 032] Tabs to spaces [operations/dns] - 10https://gerrit.wikimedia.org/r/92292 (owner: 10Mark Bergsma) [11:26:25] (03CR) 10Mark Bergsma: [C: 032] Cleanup [operations/dns] - 10https://gerrit.wikimedia.org/r/92293 (owner: 10Mark Bergsma) [11:37:54] PROBLEM - Disk space on wtp1005 is CRITICAL: DISK CRITICAL - free space: / 353 MB (3% inode=76%): [11:43:14] PROBLEM - Parsoid on wtp1005 is CRITICAL: Connection refused [11:44:30] (03PS1) 10ArielGlenn: remove srv151-192,194-234 from dsh groups and add back srv193 [operations/puppet] - 10https://gerrit.wikimedia.org/r/92295 [11:47:58] (03CR) 10ArielGlenn: [C: 032] remove srv151-192,194-234 from dsh groups and add back srv193 [operations/puppet] - 10https://gerrit.wikimedia.org/r/92295 (owner: 10ArielGlenn) [12:26:54] RECOVERY - Disk space on wtp1005 is OK: DISK OK [12:27:14] RECOVERY - Parsoid on wtp1005 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.007 second response time [12:37:40] (03PS1) 10ArielGlenn: add explicit comments for ranges, hoping to avoid off-by-one [operations/dns] - 10https://gerrit.wikimedia.org/r/92296 [12:40:20] (03CR) 10ArielGlenn: [C: 032] add explicit comments for ranges, hoping to avoid off-by-one [operations/dns] - 10https://gerrit.wikimedia.org/r/92296 (owner: 10ArielGlenn) [12:55:27] (03PS1) 10Mark Bergsma: Cleanup [operations/dns] - 10https://gerrit.wikimedia.org/r/92300 [12:56:30] (03CR) 10Mark Bergsma: [C: 032] Cleanup [operations/dns] - 10https://gerrit.wikimedia.org/r/92300 (owner: 10Mark Bergsma) [12:59:36] thanks for that [13:00:26] I had a patchset ready earlier today to unilaterally remove the kenniset legacy stuff but then thought better of it [13:00:34] so yay [13:05:27] better hold 
off on the other in-addr.arpa files btw, I have a branch with service IP changes in them [13:26:33] !log restarted elasticsearch nodes to pick up new config [13:26:48] Logged the message, Master [13:53:33] (03PS1) 10ArielGlenn: give cmjohnson perms to ack, disable notifications etc in icinga [operations/puppet] - 10https://gerrit.wikimedia.org/r/92305 [13:56:05] (03CR) 10ArielGlenn: [C: 032] give cmjohnson perms to ack, disable notifications etc in icinga [operations/puppet] - 10https://gerrit.wikimedia.org/r/92305 (owner: 10ArielGlenn) [14:50:57] (03CR) 10Chad: "Actually it will be upstreamed ;-)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91879 (owner: 10Odder) [14:51:02] (03PS1) 10Cmjohnson: removing search21-36 from decommissioning.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/92313 [14:52:12] (03CR) 10Cmjohnson: [C: 032] removing search21-36 from decommissioning.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/92313 (owner: 10Cmjohnson) [14:53:54] (03CR) 10Andrew Bogott: [C: 031] "Looks right; will get Faidon to merge." [operations/puppet] - 10https://gerrit.wikimedia.org/r/92288 (owner: 10Matanya) [14:54:14] RECOVERY - Host search29 is UP: PING OK - Packet loss = 0%, RTA = 27.63 ms [14:54:59] paravoid: Shall I merge this? https://gerrit.wikimedia.org/r/#/c/92288/ [14:58:04] (03CR) 10Andrew Bogott: [C: 032] "Yep! Thanks for cleanup." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/92079 (owner: 10Matanya) [14:58:32] (03PS2) 10Andrew Bogott: site.pp: removed apache-utils [operations/puppet] - 10https://gerrit.wikimedia.org/r/92079 (owner: 10Matanya) [14:58:59] (03PS1) 10Dereckson: DynamicPageList extension configuration maintenance [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92314 [14:59:43] (03CR) 10Andrew Bogott: [C: 032] site.pp: removed apache-utils [operations/puppet] - 10https://gerrit.wikimedia.org/r/92079 (owner: 10Matanya) [15:01:59] !log removing search21-36 from pybal search_pools [15:02:17] Logged the message, Master [15:56:28] moorning manybubbles|lunc i guess you are not up yet? [15:56:32] on lunch [15:56:32] lucnh! [16:07:34] (03PS1) 10Ottomata: Making elasticsearch ganglia plugin query $ipaddress instead of localhost for ES stats. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92325 [16:09:09] (03PS2) 10Ottomata: Making elasticsearch ganglia plugin query $ipaddress instead of localhost for ES stats. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92325 [16:10:44] manybubbles|lunc: ^ [16:18:38] (03PS1) 10Andrew Bogott: Remove generic::sysfs::enable-rps [operations/puppet] - 10https://gerrit.wikimedia.org/r/92326 [16:23:35] heya LeslieCarr, I know you are super busy with the dc stuff, got anytime to help fix the inter rack multicast issue? [16:23:39] i'm not really sure what the problem is [16:24:30] but, multicast doesn't seem to work between racks, which makes ganglia not really work correctly in the analytics cluster [16:35:18] (03PS2) 10Andrew Bogott: Remove generic::sysfs::enable-rps [operations/puppet] - 10https://gerrit.wikimedia.org/r/92326 [16:35:57] (03CR) 10Andrew Bogott: [C: 032] Remove generic::sysfs::enable-rps [operations/puppet] - 10https://gerrit.wikimedia.org/r/92326 (owner: 10Andrew Bogott) [16:49:58] ottomata: which ganglia group was it again that wasn't working? 
[16:52:02] analytics, so 239.192.1.32 [16:52:35] if I start a multicast listener on a node in row B [16:52:45] PROBLEM - Host sq44 is DOWN: PING CRITICAL - Packet loss = 100% [16:52:45] and then send to that multicast addy from a node in row C [16:52:48] I don't get any traffic [16:52:53] but, I do if I send from a node in Row B [16:52:54] right [16:52:59] vice-versa is the same [16:53:14] multicast emitted from row B is not received in row C [16:53:23] I have two ganglia aggregators in the analytics cluster [16:53:32] one on row B (analytics1009) and one on row C (analytics1011) [16:53:35] (03PS1) 10Aude: Temporary logo for Wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92333 [16:53:44] metrics generated in row B make it to ganglia fine [16:53:49] metrics generated in row C do not [16:53:59] most of our production nodes are in row C; row B is all ciscos [16:54:33] (03CR) 10Aude: [C: 04-1] "not to deploy until 0:00 UTC, October 29 or after." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92333 (owner: 10Aude) [16:56:15] RECOVERY - Host sq44 is UP: PING OK - Packet loss = 0%, RTA = 31.18 ms [16:56:29] i think it's the analytics ACL that's breaking it [16:56:57] oh yeah? [16:57:35] we had a problem with that before, but whatever was happening before just made the udp2log multicast firehose be duplicated in the analytics cluster [16:57:55] so why do I see all hosts in ganglia now? [16:59:10] (03CR) 10Bartosz Dziewoński: [C: 04-1] "You really do not want to do it this way; the logo URL is put in page HTML and cached for up to 30 days. 
What you actually want is more al" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92333 (owner: 10Aude) [17:00:27] mark, i dunno [17:00:28] actually [17:00:30] all hosts are there [17:00:36] and even normal metrics seem to make it [17:00:43] its just custom ones that don'e [17:00:44] don't [17:00:59] (03CR) 10Aude: "@Bartosz putting it in css would indeed be better :)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92333 (owner: 10Aude) [17:01:09] (03Abandoned) 10Aude: Temporary logo for Wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92333 (owner: 10Aude) [17:01:14] strange [17:01:22] that's all done over the same multicast channel [17:02:02] when you tested multicast [17:02:07] did you test the same ganglia multicast address? [17:02:10] so [17:02:11] netcat analytics1009.eqiad.wmnet 8649 | grep cpu | wc -l [17:02:11] 416 [17:02:11] because the filter blocks other ranges [17:02:16] netcat analytics1011.eqiad.wmnet 8649 | grep cpu | wc -l [17:02:16] 416 [17:02:17] and [17:02:25] netcat analytics1009.eqiad.wmnet 8649 | grep kafka | wc -l [17:02:25] 1905 [17:02:34] netcat analytics1011.eqiad.wmnet 8649 | grep kafka | wc -l [17:02:34] 4971 [17:02:35] hm [17:03:06] and, yes mark, when I tested I tested both, buuuut oh actually I did not use the ganglia multicast address to test [17:03:09] i just used a random one [17:03:15] i guess if that is not in the acl it won't make it through? [17:03:20] that would have been blocked then yes [17:03:27] ok let me try that again [17:03:31] but, in the output I just pasted [17:03:58] the cpu metrics look all the same on both aggregators [17:04:00] but not the kafka metrics [17:04:05] yeah strange [17:04:12] should test with tcpdump and friends [17:05:46] is the 8649 port in the acl? or just the multicast group addy? 
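The row-to-row listener/sender test described above can be scripted rather than improvised; a minimal sketch (the group 239.192.1.32 and port 8649 are the ones named in the discussion; the socket options are generic multicast setup, not WMF tooling, and as mark notes, a group outside the ACL'd ganglia range will simply be filtered):

```python
import socket

GROUP = "239.192.1.32"   # analytics ganglia multicast group from the discussion
PORT = 8649              # gmond's default udp_recv_channel port

def membership_request(group, iface="0.0.0.0"):
    """Pack an ip_mreq struct (group address + local interface) for IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface)

def listen(group=GROUP, port=PORT, timeout=15):
    """Run on the receiving row: join the group and print whatever datagrams arrive."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", port))
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                 membership_request(group))
    s.settimeout(timeout)
    try:
        while True:
            data, addr = s.recvfrom(65535)
            print(addr[0], len(data))
    except socket.timeout:
        print("no traffic within %ss" % timeout)

def send(group=GROUP, port=PORT, msg=b"mcast-test"):
    """Run on the sending row: emit one datagram to the group."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL > 1 so the datagram can cross the row routers
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 5)
    s.sendto(msg, (group, port))
```

Running `listen()` on an analytics1009-side host and `send()` from a row C host (and vice versa) reproduces the asymmetry being debugged here without involving gmond at all.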
[17:09:14] (03CR) 10Aaron Schulz: [C: 032] Switched to JobQueueFederated [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92032 (owner: 10Aaron Schulz) [17:09:32] (03Merged) 10jenkins-bot: Switched to JobQueueFederated [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92032 (owner: 10Aaron Schulz) [17:10:59] gah, tampa [17:11:02] ok this is strange to mark [17:11:16] i tcpdumped traffic on the gmond multicast on both an09 and an11 [17:11:28] (03CR) 10Dzahn: [C: 04-1] "the commit message says db5,6,7 but the change removes db5,7,8" [operations/dns] - 10https://gerrit.wikimedia.org/r/92272 (owner: 10ArielGlenn) [17:11:41] and then counted the occurrences of hostnames [17:11:47] in about 15 seconds of traffic [17:11:48] on an09 [17:11:54] 20 analytics1023.eqiad.wmnet.55418 [17:11:55] 27 analytics1024.eqiad.wmnet.60505 [17:12:07] sorry not those [17:12:09] these: [17:12:09] 6 analytics1021.eqiad.wmnet.35794 [17:12:09] 1 analytics1022.eqiad.wmnet.41099 [17:12:13] and then on an11 [17:12:17] 4345 analytics1021.eqiad.wmnet.53644 [17:12:17] 5928 analytics1022.eqiad.wmnet.38158 [17:12:18] (03PS1) 10Aaron Schulz: Update tampa for 7786c233a30d3c8552c862ab841d7b9dfa6d67be (also fixed gzip conf) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92336 [17:12:35] (03CR) 10Aaron Schulz: [C: 032] Update tampa for 7786c233a30d3c8552c862ab841d7b9dfa6d67be (also fixed gzip conf) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92336 (owner: 10Aaron Schulz) [17:12:47] (03Merged) 10jenkins-bot: Update tampa for 7786c233a30d3c8552c862ab841d7b9dfa6d67be (also fixed gzip conf) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92336 (owner: 10Aaron Schulz) [17:12:49] (03CR) 10Dzahn: [C: 031] "tickets matching the change though, db5/7/8. 
ack" [operations/dns] - 10https://gerrit.wikimedia.org/r/92272 (owner: 10ArielGlenn) [17:12:49] and conversely [17:12:53] an09 and an10 [17:12:56] (03PS1) 10coren: Labs DB: add recentchanges_userindex view [operations/software] - 10https://gerrit.wikimedia.org/r/92337 [17:12:56] on an09: [17:12:57] 813 analytics1010.eqiad.wmnet.36583 [17:12:57] 292 analytics1010.eqiad.wmnet.52368 [17:13:00] on an11: [17:13:05] 23 analytics1009.eqiad.wmnet.42442 [17:13:05] 4 analytics1010.eqiad.wmnet.47470 [17:13:24] !log aaron synchronized wmf-config/ 'Switched to JobQueueFederated' [17:13:26] (03CR) 10coren: [C: 032 V: 032] Labs DB: add recentchanges_userindex view [operations/software] - 10https://gerrit.wikimedia.org/r/92337 (owner: 10coren) [17:13:32] ah sorry, looks like multiple ports for an10 on an09, but still, same trend [17:13:38] Logged the message, Master [17:13:44] there are more metrics for hosts within the same rack [17:13:45] weird [17:13:59] so you're saying that most cross-rack multicast packets are not arriving, but some do [17:14:24] i think so? what's weirder is that it seems to be specifically custom ganglia metrics [17:14:43] which would indicate that maybe this is not a multicast specific issue, maybe? [17:14:44] not sure [17:14:57] i could be wrong on that, but that's just what i've noticed so far, [17:15:09] or at least, in ganglia the built in metrics seem to be fine on all hosts [17:15:22] yeah weird [17:22:49] ottomata: to be sure, perhaps do some multicast ping testing with the actual ganglia address (range)? [17:23:00] any other mcast address in that ganglia prefix works too [17:23:13] (03PS1) 10Andrew Bogott: Added the system_role module. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/92338 [17:25:12] hey [17:25:14] interesting problem [17:25:41] i wonder if it disappears if we deactivate the ACL [17:27:59] andrewbogott: or perhaps a "system" module, which could do some other wmf specific/inventory stuff in the future ;) [17:28:11] system::decommission [17:28:22] alarm at office :p.. wee [17:28:27] i don't know, just brainstorming [17:28:37] mark: That's probably better than putting it in the soon-to-exist generic::module [17:28:44] yeah [17:28:49] It still entails a massive search/replace but I guess that won't kill me [17:29:03] I think a "wmf" module is also too generic [17:29:04] s/_/::/g should do it, right? [17:29:21] well. for those system_role lines, yes ;) [17:29:30] kidding [17:29:59] generic module in progress: [17:30:25] (03PS1) 10Andrew Bogott: Add a 'generic' module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92339 [17:30:56] :) [17:34:59] paravoid: could you replace the custom graphite debs in apt w/the official debian packages? i'll update the patch to refer to the debian packages' file paths [17:38:36] And now, a patch that's way too big to read! [17:38:55] (03PS2) 10Andrew Bogott: Added the system module and the system::role class. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92338 [17:39:15] hehe [17:41:22] oops, that was broken, new patch coming up [17:41:25] (03PS3) 10Andrew Bogott: Added the system module and the system::role class. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92338 [17:46:18] paravoid: ping re ^? sorry if this merely rehashes what we talked about yesterday, i just realized afterward that we didn't really plan the next step [17:46:27] ori-l: oh yeah, sorry [17:46:36] it'll need a backport [17:46:52] I will have a look [17:47:51] paravoid: cool, thank you. [17:48:12] not now though [17:48:23] mark, thing systemuser should go in there as well? 
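The `s/_/::/g` rename joked about above (turning `system_role` references into the namespaced `system::role` across the manifests) is purely mechanical; a hedged sketch of the word-bounded substitution, with hypothetical manifest text, so unrelated underscored names like `system_user` are left alone:

```python
import re

def rename_system_role(manifest_text):
    """Rewrite bare system_role references to the namespaced system::role.

    Word boundaries keep other underscored identifiers (e.g. system_user) intact.
    """
    return re.sub(r"\bsystem_role\b", "system::role", manifest_text)

# Hypothetical manifest line for illustration only
sample = 'system_role { "graphite": description => "graphite server" }'
print(rename_system_role(sample))
```

Run over each `.pp` file, this is the safer equivalent of the blanket `s/_/::/g` mark was kidding about.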
[17:48:27] s/thing/think/ [17:48:38] (03PS1) 10Mark Bergsma: Repartition ulsfo LVS service IPs [operations/dns] - 10https://gerrit.wikimedia.org/r/92342 [17:48:39] (03PS1) 10Mark Bergsma: Repartition eqiad LVS service IPs [operations/dns] - 10https://gerrit.wikimedia.org/r/92343 [17:48:40] (03PS1) 10Mark Bergsma: Repartition esams LVS service IPs [operations/dns] - 10https://gerrit.wikimedia.org/r/92344 [17:48:50] andrewbogott: well... not really [17:49:05] 'k [17:49:11] system::role is a pretty meta, wmf inventory/role kinda thing [17:49:18] system_user is just a low level unix system user [17:49:20] very different [17:50:20] (03CR) 10Manybubbles: [C: 031] "Makes sense to me. I imagine when I put this together I thought the plugin would run on the Elasticsearch machine instead so localhost wo" [operations/puppet] - 10https://gerrit.wikimedia.org/r/92325 (owner: 10Ottomata) [17:50:25] ottomata: thanks! [17:52:56] paravoid: yeah, i didn't expect you to drop everything :) [18:04:22] mark, yeah i did the same test with the ganglia analytics multicast addy [18:04:24] same behavior [18:04:35] traffic only appears for listeners in the same rack [18:13:24] !log reedy synchronized php-1.23wmf1/includes/specials/SpecialContributions.php [18:13:37] Logged the message, Master [18:15:42] (03CR) 10Dzahn: "it was most likely the duplicate bot and start script as you pointed out, but honestly i don't remember the full history of this one." [operations/puppet] - 10https://gerrit.wikimedia.org/r/60359 (owner: 10Dzahn) [18:20:39] (03PS2) 10Andrew Bogott: Add a 'generic' module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92339 [18:25:04] (03CR) 10Odder: "Weee! 
:-)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91879 (owner: 10Odder) [18:26:08] !log reedy synchronized php-1.23wmf1/extensions/Wikibase [18:26:22] Logged the message, Master [18:39:50] (03PS1) 10Chad: Wikidatawiki gets Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92352 [18:40:05] (03CR) 10Chad: "Prepping for tomorrow" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92352 (owner: 10Chad) [18:45:47] (03CR) 10Chad: [C: 031] delete search.wikimedia.org Apache config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/91132 (owner: 10Dzahn) [18:46:30] (03CR) 10Chad: "Please merge me. Super easy :D" [operations/puppet] - 10https://gerrit.wikimedia.org/r/84743 (owner: 10QChris) [18:46:41] PROBLEM - MySQL Processlist on db1024 is CRITICAL: CRIT 88 unauthenticated, 0 locked, 0 copy to table, 0 statistics [18:46:48] (03PS2) 10Chad: (bug 40941) Increase font size in Gerrit diff messages [operations/puppet] - 10https://gerrit.wikimedia.org/r/91879 (owner: 10Odder) [18:46:55] (03CR) 10Chad: [C: 031] "Please merge me. Super easy :D" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91879 (owner: 10Odder) [18:49:12] PROBLEM - DPKG on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[18:49:41] RECOVERY - MySQL Processlist on db1024 is OK: OK 6 unauthenticated, 0 locked, 0 copy to table, 2 statistics [18:50:01] RECOVERY - DPKG on searchidx1001 is OK: All packages OK [19:04:19] (03PS1) 10Cmjohnson: Removing search21-36 from lucene.pp (decomm'ing these servers) [operations/puppet] - 10https://gerrit.wikimedia.org/r/92355 [19:05:57] (03CR) 10jenkins-bot: [V: 04-1] Removing search21-36 from lucene.pp (decomm'ing these servers) [operations/puppet] - 10https://gerrit.wikimedia.org/r/92355 (owner: 10Cmjohnson) [19:06:59] (03PS2) 10Cmjohnson: Removing search21-36 from lucene.pp (decomm'ing these servers) [operations/puppet] - 10https://gerrit.wikimedia.org/r/92355 [19:07:56] (03CR) 10jenkins-bot: [V: 04-1] Removing search21-36 from lucene.pp (decomm'ing these servers) [operations/puppet] - 10https://gerrit.wikimedia.org/r/92355 (owner: 10Cmjohnson) [19:16:50] ^d: manybubbles Either of you rebuilding any indexes? [19:16:59] Reedy: nop! [19:17:02] nope [19:17:09] not I, at least [19:17:55] There's 3 wikis showing IndexMissingException atm [19:18:28] uh, 4 - bhwiktionary, strategywiki, ugwikibooks and bmwikiquote [19:19:05] PROBLEM - Disk space on wtp1011 is CRITICAL: DISK CRITICAL - free space: / 232 MB (2% inode=76%): [19:20:39] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: non wikipedia to 1.23wmf1 [19:20:53] Logged the message, Master [19:23:05] PROBLEM - Parsoid on wtp1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:24:43] Reedy: will check [19:27:31] (03PS1) 10Reedy: non wikipedia to 1.23wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92361 [19:27:47] (03CR) 10Reedy: [C: 032] non wikipedia to 1.23wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92361 (owner: 10Reedy) [19:27:53] (03CR) 10Mwalker: [C: 04-2] "So we can't remove contribution tracking; it's what allows us to internally track and assign IDs to donors as they come through our system" 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91675 (owner: 10Reedy) [19:28:31] (03Merged) 10jenkins-bot: non wikipedia to 1.23wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92361 (owner: 10Reedy) [19:29:15] (03CR) 10Mwalker: [C: 031] Make misc::maintenance::foundationwiki cronjobs ensure => absent [operations/puppet] - 10https://gerrit.wikimedia.org/r/91676 (owner: 10Reedy) [19:29:53] Reedy: I'm really not sure where all these wikis came from that we never made indexes for. I made a bunch of indexes a while ago when they were badly autocreated. I'm just going to go through cirrus.dblist and make sure they are all working as expected so I'll be sure everything is clean as of today [19:31:05] RECOVERY - Disk space on wtp1011 is OK: DISK OK [19:34:27] Reedy: all the wikis you mentioned are built except strategywiki - that one is building [19:34:51] nlwikinews [19:34:59] advisorywiki [19:37:03] Reedy: more wikis than I've ever heard of. languages I didn't know existed. I'm just going to go down the list.... [19:37:31] 846 centralauthed wikis [19:37:35] plus a couple of handfuls more [19:37:46] 879 in total apparently [19:39:10] (03PS3) 10Andrew Bogott: Move generic::gluster* into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/91884 [19:41:13] hey manybubbles & ^d -- Krinkle & I have been meaning to ask you about something. How hard would it be to set up a search interface that greps through the MediaWiki namespace of all Wikimedia wikis? [19:41:20] (03CR) 10Andrew Bogott: [C: 032] Move generic::gluster* into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/91884 (owner: 10Andrew Bogott) [19:42:59] whenever someone proposes to modify MediaWiki CSS or JS APIs, there is always the question of how many gadgets, site scripts, etc. 
exist in the wild that utilize the interface or rely on the selectors [19:44:07] it'd be very useful to be able to answer such questions definitively in reference to the code that exists on the cluster [19:44:19] There's also User:Foo/(skinname|common).(js|css) [19:44:36] hrm, yes, good point! [19:45:28] Certainly doing MediaWiki namespace would be a good start for site stuff, gadgets etc as you said [19:47:14] can't protect everyone... [19:47:39] (03PS1) 10Andrew Bogott: Specify group => 'root' for the logrotate config. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92365 [19:48:22] (03CR) 10Andrew Bogott: [C: 032] Specify group => 'root' for the logrotate config. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92365 (owner: 10Andrew Bogott) [19:50:52] (03PS4) 10Andrew Bogott: Added the system module and the system::role class. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92338 [19:53:37] (03PS3) 10Cmjohnson: Removing search21-36 from lucene.pp (decomm'ing these servers) [operations/puppet] - 10https://gerrit.wikimedia.org/r/92355 [19:54:26] (03CR) 10jenkins-bot: [V: 04-1] Removing search21-36 from lucene.pp (decomm'ing these servers) [operations/puppet] - 10https://gerrit.wikimedia.org/r/92355 (owner: 10Cmjohnson) [19:55:08] (03Abandoned) 10Cmjohnson: Removing search21-36 from lucene.pp (decomm'ing these servers) [operations/puppet] - 10https://gerrit.wikimedia.org/r/92355 (owner: 10Cmjohnson) [19:56:59] (03PS1) 10Cmjohnson: Removing search21-36 from lucene.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/92366 [19:58:39] (03CR) 10Cmjohnson: [C: 032] Removing search21-36 from lucene.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/92366 (owner: 10Cmjohnson) [19:58:48] manybubbles: i'm real confused about that patchset I gave you [19:58:57] i was about to note that the metrics do come from the es machines [19:59:01] which means that localhost should work [19:59:10] but when I was checking, i could not get anything via 
http://localhost [19:59:31] i was getting an http error page of some kind [19:59:42] but :9200 stats worked when I used the IP [19:59:52] but now, i went back to try it to get you the exact error [19:59:53] and it works [19:59:54] AND [20:00:01] at least on testsearch1001 [20:00:05] the value changed today once. [20:00:13] its the only time it has changed in the metric's history [20:00:18] (im looking at es_docs_count) [20:06:53] (03PS3) 10Andrew Bogott: Add a 'generic' module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92339 [20:06:54] (03PS5) 10Andrew Bogott: Added the system module and the system::role class. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92338 [20:08:48] ottomata: I'm not sure I really trust that docs count then [20:08:59] it must be counting something that isn't what we want [20:09:25] (03CR) 10Ottomata: "Well, I believe it does run on the ES machine(s). Its just that http requests to localhost don't work. I was getting http error response" [operations/puppet] - 10https://gerrit.wikimedia.org/r/92325 (owner: 10Ottomata) [20:09:43] well, manybubbles, the ganglia python module works [20:09:49] it just queries the http interface [20:09:53] and gets new values all the time [20:09:56] its just that ganglia doesn't get them [20:09:59] except for one time today [20:10:15] weird [20:10:21] totally weird [20:10:31] no idea why the localhost url didn't work earlier but now does [20:10:32] I did restart elasticsearch this morning to pick up a config change [20:10:47] but I use the localhost url all the time [20:11:09] oh hm [20:11:14] maybe something was weird before it restarted? [20:11:29] dunno [20:11:34] i thought that was our problem [20:11:45] because the python module was not working this morning [20:11:51] since it queried localhost [20:11:55] but now it works fine...
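The es_docs_count value under discussion comes from Elasticsearch's HTTP stats interface on localhost:9200. A minimal sketch of pulling a docs count out of a node-stats response; the JSON payload here is a made-up stand-in shaped roughly like the node-stats format of that era, not actual module output:

```python
import json

# Made-up node-stats payload, loosely shaped like the response the
# ganglia module would fetch from http://localhost:9200; values invented.
sample = '''
{
  "cluster_name": "testsearch",
  "nodes": {
    "abc123": {
      "indices": {"docs": {"count": 4200, "deleted": 17}}
    }
  }
}
'''

def docs_count(stats_json):
    """Sum the per-node document counts out of a node-stats document."""
    stats = json.loads(stats_json)
    return sum(
        node["indices"]["docs"]["count"]
        for node in stats["nodes"].values()
    )

print(docs_count(sample))
```

If a collector like this only ever reports one frozen number, the bug is usually downstream of the HTTP fetch, which matches what the channel observes.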
[20:11:56] grr [20:19:58] ottomata: just watching the execution - I'd love it if I could get es_gc_time [20:20:21] you can get it, just dunno what's wrong with ganglia right now [20:21:02] ottomata: maybe nuke the rrds and start over? I've had to do that before with ganglia [20:21:15] not a bad idea.... [20:21:19] can I just delete them? [20:21:30] i'll move them out and restart gmetad i guess [20:21:50] I'm not really sure. I think I deleted them but I can't recall [20:21:53] actually, let's try just one, [20:21:57] i'll try gc_time first [20:24:33] seems to be working better, ok going to nuke all the es_* rrds [20:26:36] actually, manybubbles, you're getting new machines soon, right? [20:26:40] we probably should make a ganglia cluster for you [20:26:46] so that they are in their own group [20:26:53] outside of misc eqiad [20:27:00] ottomata: sounds good to me [20:27:09] oh that is much better! [20:28:42] we'll see if it actually works [20:28:47] i mean, it has the correct value at least the first time [20:28:49] we'll see if it changes [20:30:34] manybubbles: are you planning on keeping these test nodes around in the long term? [20:30:41] or eventually replacing them all with the new hardware? [20:31:05] ottomata: probably not keeping them, no. At this point it'd be too hard to deal with having both large and small nodes in the same cluster [20:32:12] ok great, so when you get the hardware let's set them up so that they are in their own ganglia cluster [20:32:37] perfect [20:35:14] ottomata: sad graph: http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=testsearch1001.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1382991791&v=71083&m=es_gc_time&vl=ms&ti=es_gc_time&z=large [20:36:00] yeah sigh. 
[20:36:12] that's at least a different behavior [20:36:20] reporting 0 instead of repeat values [20:36:33] hmm [20:37:41] it had different values for 4 different 15 second interval checks [20:37:51] so it worked for exactly 1 minute [20:37:52] agghh [20:40:37] ottomata: I just checked using the python module and watched the value change locally.... do you know where logs for gmond go? [20:41:03] yeah it works great through the python module [20:41:18] dunno of any ganglia logs on the client hosts :/ [20:41:26] !log built search indexes for a bunch of small wikis that didn't seem to have them. we should be done getting index doesn't exist errors. [20:41:45] Logged the message, Master [20:42:57] ottomata: I'm going to try running gmond in the foreground, it is suggested on http://sourceforge.net/apps/trac/ganglia/wiki/FAQ [20:43:07] oh good idea [20:43:12] i'll do that on 1002 to watch as well [20:48:53] looks like it is working for you on 1001 right now [20:48:56] 1002 isn't for me [20:53:52] not logging anything? [20:53:57] mine is logging and working and all wonderful [20:54:12] I don't particularly want to run this in screen though [20:55:04] yeah no [20:55:10] i'm seeing logs fine [20:55:13] and it says it works [20:55:18] but no new values are actually going to ganglia [20:57:38] the other suggestion was to try to nc the gmond - I'm not super sure what that would be supposed to look like [20:58:20] yeah [20:58:23] i've been doing that [20:58:29] the values don't change there [20:58:36] manybubbles: do you see the es_gc_count value in your gmond daemon output?
[20:58:41] i just see messages like [20:58:47] sent message 'es_gc_count' of length 56 with 0 errors [20:59:10] ottomata: no values, no [20:59:15] I think we're seeing the same logs [20:59:43] telnet ms1004.eqiad.wmnet 8649 | grep -P 'testsearch|es_gc_time' [20:59:50] (same as netcat) [20:59:58] netcat ms1004.eqiad.wmnet 8649 | grep -P 'testsearch|es_gc_time' [21:00:23] one moment, gotta go to a meeting and linphone is hateful so restarting [21:08:05] i dunno man, this whole thing looks totally busted to me [21:12:50] ottomata: it worked when i ran it in debug [21:12:54] .... [21:12:56] bleh [21:13:23] yeah [21:13:31] i have so many problems with ganglia in prod [21:17:30] !log awight synchronized php-1.22wmf22/extensions/CentralNotice [21:17:45] Logged the message, Master [21:18:31] !log awight synchronized php-1.23wmf1/extensions/CentralNotice [21:18:44] Logged the message, Master [21:25:21] !log reedy synchronized php-1.23wmf1/includes/db/Database.php 'bug 56124' [21:25:35] Logged the message, Master [21:33:54] (03PS1) 10Cmjohnson: removing search21-36 from site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/92426 [21:36:44] (03CR) 10Cmjohnson: [C: 032] removing search21-36 from site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/92426 (owner: 10Cmjohnson) [21:47:25] (03PS1) 10Cmjohnson: Decom'ing search 21-36 removing from dsh groups and dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/92430 [21:51:04] (03CR) 10Cmjohnson: [C: 032] Decom'ing search 21-36 removing from dsh groups and dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/92430 (owner: 10Cmjohnson) [22:00:54] (03PS1) 10Hashar: gallium: let ci folks sudo as jenkins-slave [operations/puppet] - 10https://gerrit.wikimedia.org/r/92431 [22:01:45] (03CR) 10Hashar: [C: 031] "Got to be merged by ops and puppetd should be run on gallium."
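The telnet/netcat check above works because gmond answers on its TCP port (8649 here) with an XML dump of the metrics it currently holds. A small sketch of pulling one metric value out of such a dump; the XML below is a hand-written stand-in for real gmond output, with invented host and values:

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the XML gmond emits when you connect to
# port 8649; real output nests METRIC under HOST under CLUSTER.
sample = '''
<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="Miscellaneous eqiad" OWNER="unspecified">
    <HOST NAME="testsearch1001.eqiad.wmnet" IP="10.0.0.1">
      <METRIC NAME="es_gc_time" VAL="1234" TYPE="uint32" UNITS="ms"/>
      <METRIC NAME="es_gc_count" VAL="56" TYPE="uint32" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
'''

def metric_value(xml_text, host, metric):
    """Return the VAL attribute for one metric on one host, or None."""
    root = ET.fromstring(xml_text)
    for h in root.iter("HOST"):
        if h.get("NAME") == host:
            for m in h.iter("METRIC"):
                if m.get("NAME") == metric:
                    return m.get("VAL")
    return None

print(metric_value(sample, "testsearch1001.eqiad.wmnet", "es_gc_time"))
```

Grepping the raw dump, as in the telnet one-liner above, answers the same question without parsing; parsing is only worth it when you want to compare values across polls.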
[operations/puppet] - 10https://gerrit.wikimedia.org/r/92431 (owner: 10Hashar) [22:02:07] (03CR) 10Krinkle: [C: 031] gallium: let ci folks sudo as jenkins-slave [operations/puppet] - 10https://gerrit.wikimedia.org/r/92431 (owner: 10Hashar) [22:05:07] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000 [22:08:16] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:08:34] (03CR) 10Ori.livneh: [C: 032] Fix annoyance with ctrl-C in mwscriptwikiset scripts [operations/puppet] - 10https://gerrit.wikimedia.org/r/91990 (owner: 10Aaron Schulz) [22:08:39] (03PS2) 10Krinkle: gallium: let ci folks sudo as jenkins-slave [operations/puppet] - 10https://gerrit.wikimedia.org/r/92431 (owner: 10Hashar) [22:08:44] (03CR) 10Krinkle: [C: 031] gallium: let ci folks sudo as jenkins-slave [operations/puppet] - 10https://gerrit.wikimedia.org/r/92431 (owner: 10Hashar) [22:09:25] Krinkle: so basically ping ops till it is merged :D [22:09:50] Krinkle: do we get our CI checkin at 5pm tomorrow? [22:10:01] !log powering down search 21 -36 [22:10:16] Logged the message, Master [22:10:48] hashar: ori-l has +2 on that puppet repo [22:10:54] mutante: if the change has ops approval i can do the merge and forced puppet run for 92431 [22:11:18] yeah, but it assigns sudo privs to a user, which is outside the scope of what i'm supposed to be merging [22:11:29] but if it gets a nod from someone in ops i can do the dirty work :) [22:11:43] ori-l: reasonable. 
[22:12:15] (03CR) 10Dzahn: [C: 031] "if this is ok for jenkins, don't see why it wouldn't be for jenkins-slave" [operations/puppet] - 10https://gerrit.wikimedia.org/r/92431 (owner: 10Hashar) [22:12:20] ori-l: ah true, that makes me jealous by the way :D [22:12:34] (03PS3) 10Faidon Liambotis: gallium: let ci folks sudo as jenkins-slave [operations/puppet] - 10https://gerrit.wikimedia.org/r/92431 (owner: 10Hashar) [22:12:41] (03CR) 10Faidon Liambotis: [C: 032] gallium: let ci folks sudo as jenkins-slave [operations/puppet] - 10https://gerrit.wikimedia.org/r/92431 (owner: 10Hashar) [22:12:57] ah, thanks faidon [22:13:08] wait for jenkins [22:13:08] paravoid: come on, isn't today a day off in Athens or are you starting to work because it is 00:13 ? [22:13:09] ironically :) [22:13:22] it's "no!" day! [22:13:29] it is, how did you know? [22:13:34] * paravoid is impressed [22:14:09] (03CR) 10Faidon Liambotis: [V: 032] gallium: let ci folks sudo as jenkins-slave [operations/puppet] - 10https://gerrit.wikimedia.org/r/92431 (owner: 10Hashar) [22:14:13] i saw alexandros's e-mail and was curious what the holiday was :P [22:14:16] bah, poor gallium is starving :( [22:15:43] and I just ran a dist-upgrade :) [22:16:17] pro tip: if wikitech-l seems flam-y at times, go have a look at debian-devel now [22:18:53] anddd i am off to bed [22:19:00] thank you guys [22:19:09] bye hashar [22:19:28] paravoid: is Upstart's future basically slow death, btw? [22:19:53] what do you mean? [22:20:22] systemd would replace it, no? [22:21:26] who knows [22:21:39] depends on how this will go [22:21:49] we've had a bunch of flamewars in Debian on whether we should go with upstart or systemd [22:21:55] never reached a resolution [22:22:16] this is the 4th or 5th time this debate is getting so out of hand [22:22:24] so someone called the big guns now, i.e.
the technical committee [22:22:54] ("tech-ctte") [22:23:25] if Debian goes systemd, maybe upstart will die eventually, or maybe Canonical will keep developing it as it's doing with e.g. Unity, who knows [22:24:06] if Debian goes upstart, then the answer to your question would be "no", imho [22:24:12] but it's all speculative at this point [22:25:28] systemd looks pretty neat. that thread, however.. does not :) [22:26:22] nope [22:26:38] one problem is the tone [22:26:47] the other one is that it's extremely repetitive [22:26:58] everything that is to be said has been said, a million times [22:27:29] then there's https://plus.google.com/u/0/115547683951727699051/posts/8RmiAQsW9qf [22:28:00] (03PS1) 10RobH: adding ruthenium to dns RT: 6078 [operations/dns] - 10https://gerrit.wikimedia.org/r/92435 [22:28:19] (03PS1) 10Cmjohnson: Removing dns entries for search21 -36 +mgmt [operations/dns] - 10https://gerrit.wikimedia.org/r/92436 [22:29:03] (03CR) 10RobH: [C: 032] adding ruthenium to dns RT: 6078 [operations/dns] - 10https://gerrit.wikimedia.org/r/92435 (owner: 10RobH) [22:29:33] (03CR) 10Cmjohnson: [C: 032] Removing dns entries for search21 -36 +mgmt [operations/dns] - 10https://gerrit.wikimedia.org/r/92436 (owner: 10Cmjohnson) [22:29:40] (03CR) 10Ori.livneh: [C: 032] "You could simply add the selector to the previous rule, but I am going to assume that you are keeping them discrete on purpose, because th" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91879 (owner: 10Odder) [22:29:49] !log updated dns [22:30:03] Logged the message, RobH [22:30:10] robh: did you see my changes on that update? [22:31:45] !log dns update 2nd set of changes [22:32:00] Logged the message, Master [22:34:52] (03CR) 10Ori.livneh: "Changing the format arg from JSON to JSON_SINGLE isn't controversial, but I'd appreciate a look-over from ops for the whole approach.
Dump" [operations/puppet] - 10https://gerrit.wikimedia.org/r/84743 (owner: 10QChris) [22:35:03] ^ paravoid [22:36:09] I have no idea what this does [22:38:44] bblack, dfoy's over at my desk. quick question: is netmapper now periodically polling for the latest w0 IP addresses? or is it something you kick off manually? [22:39:24] paravoid: it generates some metric about reviewers that the API doesn't expose by having cron run a SQL query and dumping the output into a file on /var/www [22:39:44] what's the difference of JSON vs. JSON_SIMPLE? [22:39:59] dr0ptp4kt: netmapper periodically polls for an updated file on disk and updates its runtime dataset seamlessly. Separately, our puppet config deploys a small shellscript that fetches the w0 stuff with wget and drops it where netmapper will see it. [22:40:04] So, yes. [22:40:04] anyway, if you know what it does and think it's a good idea, feel free to merge it [22:40:32] i don't know what it does and i'm not sure it's a good idea [22:40:52] heh :) [22:41:39] " In JSON mode records are output as JSON objects using the column names as the property names, one object per line. In JSON_SINGLE mode the whole result set is output as a single JSON object." [22:42:07] the switch from JSON to JSON_SINGLE isn't an issue, I'm just a bit uncomfortable merging anything that touches that setup [22:42:21] dr0ptp4kt: (oh I should have mentioned the shellscript is run from cron, every 10 minutes) [22:46:51] (03PS4) 10Ori.livneh: Switch to single Json object for gerrit's reviewer count query [operations/puppet] - 10https://gerrit.wikimedia.org/r/84743 (owner: 10QChris) [22:46:56] paravoid, you able to do a quick google hangout? i think it would be faster for us to talk about the meaning of a "landing page" that way [22:49:25] sorry, I'm in a personal call right now [22:49:56] paravoid, cool man. i'll try to describe what i was thinking over email, and hopefully we'll get it figured out that way.
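The fetch-and-drop pattern bblack describes, where a cron script writes a file that a long-running poller like netmapper picks up, generally needs the write to be atomic so the poller never reads a half-written file. A generic sketch of the write-to-temp-then-rename idiom; the filename and payload are hypothetical, not the actual puppet-deployed script:

```python
import json
import os
import tempfile

def drop_atomically(path, data):
    """Write data to path so readers only ever see old or new content.

    os.replace() (a rename within the same filesystem) is atomic on
    POSIX, so a process polling `path` never observes a partial file.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic swap into place
    except BaseException:
        os.unlink(tmp)
        raise

# Hypothetical target name and payload for illustration only.
target = "zero-ips.json"
drop_atomically(target, json.dumps({"w0": ["192.0.2.0/24"]}))
print(open(target).read())
```

Writing the temp file in the same directory as the target matters: rename is only atomic within one filesystem.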
[22:55:03] (03CR) 10Chad: "Yeah, this has always sucked. Unfortunately the REST api doesn't give us a clean way to get this data without requesting it change-by-chan" [operations/puppet] - 10https://gerrit.wikimedia.org/r/84743 (owner: 10QChris) [23:02:38] (03PS1) 10Cmjohnson: Adding ruthenium to site.pp netboot and dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/92448 [23:04:36] (03PS2) 10Cmjohnson: Adding ruthenium to site.pp netboot and dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/92448 [23:04:42] (03PS1) 10Reedy: It's Wikidatawikis birthday, and it shall cry if it wants to! [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92449 [23:05:16] oh Reedy, that's uncommon. [23:06:18] and it calls for a penis vandalism, but meh. [23:07:34] * ^d kicks grrrit-wm [23:08:25] (03PS3) 10Cmjohnson: Adding ruthenium to site.pp netboot and dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/92448 [23:08:31] (03CR) 10Cmjohnson: [C: 032] Adding ruthenium to site.pp netboot and dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/92448 (owner: 10Cmjohnson) [23:17:56] (03Abandoned) 10Reedy: It's Wikidatawikis birthday, and it shall cry if it wants to! 
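The JSON vs JSON_SINGLE distinction quoted above is simply line-delimited JSON versus one document: in JSON mode each record is a standalone object on its own line, while JSON_SINGLE emits the whole result set as a single object parsed in one call. A toy sketch of consuming both shapes; the rows and the JSON_SINGLE envelope keys are invented for illustration, not the actual gerrit gsql output:

```python
import json

# JSON mode: one object per line (line-delimited JSON), parsed per line.
json_mode = '{"reviewer":"alice","count":3}\n{"reviewer":"bob","count":5}\n'
rows = [json.loads(line) for line in json_mode.splitlines() if line]

# JSON_SINGLE mode: the entire result set as one object; the
# "columns"/"rows" envelope here is a made-up example shape.
json_single = '{"columns":["reviewer","count"],"rows":[["alice",3],["bob",5]]}'
doc = json.loads(json_single)  # a single json.loads() parses everything

print(len(rows), len(doc["rows"]))
```

The practical difference for a consumer is that line-delimited output can be streamed record by record, while a single object must be read in full before any of it parses.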
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92449 (owner: 10Reedy) [23:22:26] (03PS1) 10Cmjohnson: Adding dns entries for ruthenium [operations/dns] - 10https://gerrit.wikimedia.org/r/92452 [23:22:27] (03CR) 10jenkins-bot: [V: 04-1] Adding dns entries for ruthenium [operations/dns] - 10https://gerrit.wikimedia.org/r/92452 (owner: 10Cmjohnson) [23:23:54] (03Abandoned) 10Cmjohnson: Adding dns entries for ruthenium [operations/dns] - 10https://gerrit.wikimedia.org/r/92452 (owner: 10Cmjohnson) [23:31:58] (03PS1) 10Reedy: Update RC2UDP config to use $wgRCFeeds [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92455 [23:48:57] (03PS2) 10Reedy: Update RC2UDP config to use $wgRCFeeds [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92455