[00:03:30] RobH: this looks promising: http://hirise.lpl.arizona.edu/HiBlog/2009/07/02/problems-with-ias-viewer-jnlp-files/
[00:05:04] New patchset: Andre Engels; "Test" [analytics] (master) - https://gerrit.wikimedia.org/r/2054
[00:23:01] !log streaming a hotbackup of db1006 to db43
[00:23:02] Logged the message, Master
[00:24:47] New review: Diederik; "Test works." [analytics] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2054
[00:24:47] Change merged: Diederik; [analytics] (master) - https://gerrit.wikimedia.org/r/2054
[00:29:09] apergos: went to bed right?
[00:29:30] heh, i hope so its late there.
[00:40:35] PROBLEM - Apache HTTP on srv224 is CRITICAL: Connection refused
[00:43:04] PROBLEM - Disk space on db43 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=88%): /var/lib/ureadahead/debugfs 0 MB (0% inode=88%):
[00:44:34] PROBLEM - MySQL disk space on db43 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=88%): /var/lib/ureadahead/debugfs 0 MB (0% inode=88%):
[00:56:54] RECOVERY - Apache HTTP on srv224 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.057 second response time
[01:03:24] RECOVERY - Disk space on db43 is OK: DISK OK
[01:04:34] RECOVERY - MySQL disk space on db43 is OK: DISK OK
[01:07:30] !log restarting dhcp3-server on brewster
[01:07:32] Logged the message, Mistress of the network gear.
[01:27:34] New patchset: Pyoungmeister; "invalid param :/" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2055
[01:27:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2055
[01:28:44] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2055
[01:28:44] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2055
[02:15:22] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1585s
[02:18:22] RECOVERY - Memcached on srv256 is OK: TCP OK - 0.001 second response time on port 11000
[02:22:02] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1985s
[02:30:32] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online
[02:42:22] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[02:45:22] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[04:16:10] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:18:00] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:44:30] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[05:24:26] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
[07:12:15] RobH: you were asking if I was here... at like 2am
[07:12:21] did you need something?
[07:12:27] nope, dont worry about it
[07:12:31] (and yeah luckily I was asleep)
[07:12:36] ok
[08:47:24] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:06:14] PROBLEM - Puppet freshness on cp1039 is CRITICAL: Puppet has not run in the last 10 hours
[09:07:14] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[09:08:14] PROBLEM - Puppet freshness on cp1037 is CRITICAL: Puppet has not run in the last 10 hours
[09:09:14] PROBLEM - Puppet freshness on srv199 is CRITICAL: Puppet has not run in the last 10 hours
[09:14:14] PROBLEM - Puppet freshness on cp1038 is CRITICAL: Puppet has not run in the last 10 hours
[09:16:14] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1397s
[09:16:24] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1407s
[09:26:06] PROBLEM - MySQL replication status on db1025 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1989s
[09:36:46] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Puppet has not run in the last 10 hours
[09:46:46] PROBLEM - Puppet freshness on cp1036 is CRITICAL: Puppet has not run in the last 10 hours
[09:58:46] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 455006 MB (3% inode=99%):
[10:05:26] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 391287 MB (3% inode=99%):
[10:06:16] RECOVERY - MySQL replication status on db1025 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[10:19:34] PROBLEM - Puppet freshness on db43 is CRITICAL: Puppet has not run in the last 10 hours
[11:04:14] RECOVERY - MySQL slave status on es1004 is OK: OK:
[12:01:52] !log running authdns-update to add new lang "vep"
[12:11:44] no morebots
[12:14:59] ehm.. docs outdated: It runs from Andrew's home directory on Wikitech. ?
[12:15:53] PROBLEM - Puppet freshness on mw1115 is CRITICAL: Puppet has not run in the last 10 hours
[12:17:23] ah, netsplit
Netsplit *.net <-> *.split quits: +morebots
[12:35:53] does it not do that any more?
[12:35:58] where does it run from?
[12:37:58] ok it still does.. ~morebots@wikitech.wikimedia.org
[12:38:13] I was gonna say...
[12:38:28] if people start moving bots to some lab instance they had better document it :-P
[12:39:39] there is a labs-morebots as well
[12:39:44] Ryan_Lane is now known as labs-morebots
[12:41:09] he is?
[12:42:06] it appeared in logs some time
[15:33:54] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
[16:50:26] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[16:50:29] When you moved the blog behind varnish, did you guys forget about the mobile view?
[16:50:35] http://blog.wikimedia.org/2012/01/23/wikimedia-foundation-voted-1-global-ngo-by-the-global-journal/
[17:01:28] johnduhart, you'll have to wait for the SF guys to get in
[17:01:33] so either log a bug and poke them
[17:01:38] or just poke them when they are ;)
[17:02:01] I like poking people more than using bugzilla ;)
[17:11:37] robh: is mobile1 in rotation...i want to replace the network cable due to packet loss
[17:12:06] I thought it was not in use now
[17:12:07] but
[17:12:27] huh not even in dns
[17:12:28] okay...rather just be sure
[17:12:32] name change
[17:12:37] yup
[17:12:44] lemme check something
[17:12:52] it is capella
[17:12:56] yes
[17:12:58] it will be
[17:14:17] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[17:14:32] yeah go ahead
[17:14:55] apergos: thx
[17:15:12] !log replacing network cable mobile1
[17:17:47] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[17:30:12] hrmf, what oh what is gmond doing on storage3?
[18:13:58] cmjohnson1: i get a pxe boot failure on ms-be1
[18:14:09] i think the network cable is plugged into the wrong port
[18:14:25] cmjohnson1: ping me when you get back please
[18:14:34] ah, I knew there was something I was supposed to be watching over here...
[18:15:01] (and in the meantime the guys just sent me a windows app download link for the juniper class :-/)
[18:15:58] robh: ping
[18:16:44] robh: the network cable is not plugged in
[18:17:15] ok, can ya attach it for me pls =]
[18:17:30] sure...i am going to move my operation upstairs...brb
[18:17:34] k
[18:17:44] * apergos lurks
[18:18:15] so the new c series requires ipmi
[18:18:20] i am having to recall supermicro
[18:18:42] * RobH refrains from ptsd joke involving servers
[18:18:47] don't
[18:18:52] haha, why can dell never use the same OOB mgmt system twice ?
[18:19:05] cause their engineers get bored?
[18:19:17] or maybe it's a sort of bofh syndrome.. who knows
[18:19:23] C series is more industry standard with ipmi
[18:19:29] which isnt awful, its just cryptic as fucking hell
[18:19:51] though there is a good chance if i ever write the damned mgmt scripts i will just use ipmi for all of it.
[18:20:02] but the mgmt script is a lot like 'when i win the lotto'
[18:20:28] RECOVERY - Host ms6 is UP: PING OK - Packet loss = 0%, RTA = 109.64 ms
[18:21:24] LeslieCarr: so there may be confusion on this network port, i think due to my assigning this server and port before it arrived
[18:21:39] and another got its port after the fact, hence the label for ms-be1 on asw-d1 when it wasnt there yet
[18:21:41] hehe
[18:21:49] may need you to set the vlan once chris plugs it in
[18:21:53] okay
[18:21:58] ms-be1 is all sorts of trouble
[18:22:26] dont speak ill of my poor server
[18:22:27] ;_;
[18:22:32] trouble? ha ha ha...i laugh at trouble!
[18:22:34] its already the redheaded stepchild of the datacenter
[18:22:40] no that is dataset1
[18:22:41] only dataset1 gets more scorn
[18:22:44] heh
[18:22:48] yes, yes it does
[18:22:58] PROBLEM - Puppet freshness on ms6 is CRITICAL: Puppet has not run in the last 10 hours
[18:23:22] ms-be1 is gonna kick some ass
[18:23:26] just you wait and see
[18:23:28] robh: network cable is in.
[18:23:28] * RobH shakes fist
[18:23:43] ms-be1 is a powerful machine
[18:24:10] cmjohnson1: thx, what port #?
[18:24:19] port 6
[18:24:23] should be already set up
[18:25:07] yep, just confirming
[18:25:13] in case this doesnt work ;]
[18:25:46] why are all the cp's in 1 rack ? if we need to work on that switch, won't it kill the site ?
[18:25:57] because its how i was told to rack them ;]
[18:25:59] and mark, how'd you recover ms6? :)
[18:26:07] that's why we reroute traffic
[18:26:25] heh
[18:26:31] LeslieCarr: I have no idea
[18:26:36] I've tried mounting btrfs a few times
[18:26:43] started running debug commands for information gathering
[18:26:47] nothing that should do/change anything
[18:26:49] cmjohnson1: works, thanks!
[18:26:54] ms-be1 is in installer =]
[18:26:55] and after the 3rd time trying mounting, it worked
[18:26:57] haha
[18:26:58] awesome
[18:27:35] awesome...robh: what about racadm?
[18:27:44] racadm is drac stuffs, invalid on this
[18:27:51] right.. with bmc
[18:27:52] BUT!
[18:27:55] cmjohnson1: so no ssh into it, it uses ipmitool which is run from a bastion host
[18:27:57] I found this link... which is hilariouis
[18:27:59] hilarious
[18:28:00] http://www.spinics.net/lists/linux-btrfs/msg11887.html
[18:28:03] read the 2nd paragraph
[18:29:24] why do people do that
[18:29:24] ooo poor guy
[18:29:35] that so sucks!
[18:29:37] what an idiot
[18:29:45] apergos: put their thesis on experimental filesystems?
[18:29:47] who the fuck does his master thesis on an experimental btrfs fs
[18:29:56] and backs up only every two weeks
[18:29:57] with dd no less
[18:29:59] without a backup of whatever they don't want to lose
[18:30:02] if 2 weeks is too long
[18:30:07] just sayin...
[18:30:07] ya that's crazy
[18:30:51] I guess the hope is that the fsck util will get shoved out the door in the next couple weeks or so
[18:31:02] (was poking around the mailing list yesterday for some reason about it)
[18:35:23] New patchset: Catrope; "Moving stuff over from analytics.git/reportcard" [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2056
[18:36:58] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 226 MB (3% inode=60%): /var/lib/ureadahead/debugfs 226 MB (3% inode=60%):
[18:37:28] New patchset: Catrope; "Delete stuff that's been moved to analytics/reportcard.git" [analytics] (master) - https://gerrit.wikimedia.org/r/2057
[18:38:48] RECOVERY - Puppet freshness on ms6 is OK: puppet ran at Tue Jan 24 18:38:40 UTC 2012
[18:40:10] New patchset: Catrope; "Add dummy README file" [analytics] (master) - https://gerrit.wikimedia.org/r/2058
[18:40:34] ok so
[18:40:42] apparently you need to run a "btrfs device scan"
[18:40:46] before it will mount, on ms6
[18:40:53] that fixes it
[18:40:55] and should be in initrd
[18:43:08] PROBLEM - Host cp1036 is DOWN: PING CRITICAL - Packet loss = 100%
[18:43:08] PROBLEM - Host cp1037 is DOWN: PING CRITICAL - Packet loss = 100%
[18:43:18] PROBLEM - Host cp1038 is DOWN: PING CRITICAL - Packet loss = 100%
[18:43:26] ah, that all is me
[18:43:28] PROBLEM - Host cp1040 is DOWN: PING CRITICAL - Packet loss = 100%
[18:43:50] !log moved cp1001 - cp1040 out of public vlan
[18:44:28] PROBLEM - Host cp1039 is DOWN: PING CRITICAL - Packet loss = 100%
[18:44:37] New patchset: ArielGlenn; "clean up tmp on imagescalers more aggressively" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2059
[18:46:18] New patchset: Bhartshorne; "adding country filters for nimish RT-2260" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2060
[18:46:45] When you moved the blog behind varnish, did you guys forget about the mobile view?
[18:46:47] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2060
[18:46:48] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2060
[18:46:52] Ryan_Lane: ^
[18:47:01] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2059
[18:47:02] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2059
[18:47:12] sure did
[18:47:17] does it not work?
[18:47:21] Ryan_Lane: It doesn't
[18:47:38] apergos: can I merge your image scaler change?
[18:47:48] yes
[18:47:50] Ryan_Lane: Before there was a mobile view cached on a certain page, and right now I can't access the mobile skin on my phone without going to an invalid URL
[18:47:59] apergos: done.
[18:47:59] I tried to get in before you but no dice
[18:48:01] thanks
[18:48:05] every time I see this: emery-etc-locke-filters.erb I die on the inside
[18:48:05] np.
[18:48:13] I now wonder if */5 was better
[18:48:19] johnduhart: well, we weren't terribly worried about the mobile view at the time :D
[18:48:22] we'll look into it
[18:48:23] notpeter: I could check in another change and twist the knife a bit.
[18:48:36] maplebed: eh, it'll happen. I'm not worried about it
[18:48:46] New review: Diederik; "Ok." [analytics/reportcard] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2056
[18:48:46] Change merged: Diederik; [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2056
[18:49:10] doh, morebots is dead
[18:49:10] Ryan_Lane: It's more of an annoyance when a mobile view for a page gets displayed on the PC :p
[18:49:13] going to try to fix it
[18:49:56] Ryan_Lane: eg. http://blog.wikimedia.org/author/gerardm/
[18:50:08] New patchset: ArielGlenn; "hmm, guess I like */5 better" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2061
[18:50:17] I don't get a mobile view there
[18:50:35] Ryan_Lane: I do...
[18:50:36] Hm
[18:50:50] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2061
[18:50:50] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2061
[18:50:50] http://awesomescreenshot.com/0e0splz8c
[18:51:05] Ryan_Lane: Are you logged in?
[18:51:20] no
[18:51:32] New patchset: Diederik; "Adding readme file" [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2062
[18:51:37] though I may have a cookie that is causing me to skip cache
[18:51:41] !log Started find on ms5 to compile a list of thumbs changed in the last 5 days
[18:51:44] I believe you that it's a problem
[18:52:01] the mobile skin may not support w3 total cache
[18:52:02] New review: Diederik; "(no comment)" [analytics/reportcard] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2062
[18:52:02] Change merged: Diederik; [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2062
[18:52:02] Ryan_Lane: And now varnish is reporting it as a cache miss as well. wtf.
[18:52:30] !log restarted morebots
[18:52:31] Logged the message, Mistress of the network gear.
[18:52:41] !log moved cp1001-1040 to private vlan
[18:52:43] Logged the message, Mistress of the network gear.
[18:56:55] Unable to install GRUB in /dev/sda Executing 'grub-install /dev/sda' failed.
[18:56:59] urgh.
[18:57:54] This GPT partition label has no BIOS Boot Partition; embedding won't be possible!.
[18:58:37] yay
[19:06:49] RECOVERY - Puppet freshness on db43 is OK: puppet ran at Tue Jan 24 19:06:43 UTC 2012
[19:07:03] hrm, so i would like the puppetmaster to be able to pull information from the software directory (without copying stuff over because that would just cause us to have two copies of files, and we all know that would get out of sync in about 2 minutes…) -- so i think it's possible to just have stafford also looking at the software repo - do we do any manual config on that or is it all in the puppet files ?
[19:07:18] especially to mark ^^ since you set up the last incarnation
[19:08:24] New patchset: Catrope; "Add .gitreview file" [analytics/udp-filters] (master) - https://gerrit.wikimedia.org/r/2063
[19:08:41] New patchset: Asher; "db43 -> s6" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2064
[19:09:37] New review: Diederik; "Ok." [analytics/udp-filters] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2063
[19:09:37] Change merged: Diederik; [analytics/udp-filters] (master) - https://gerrit.wikimedia.org/r/2063
[19:10:10] New review: Diederik; "Ok." [analytics] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2058
[19:10:10] Change merged: Diederik; [analytics] (master) - https://gerrit.wikimedia.org/r/2058
[19:10:28] New review: Diederik; "Ok." [analytics] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2057
[19:10:28] Change merged: Diederik; [analytics] (master) - https://gerrit.wikimedia.org/r/2057
[19:10:29] RECOVERY - DPKG on db43 is OK: All packages OK
[19:11:01] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2064
[19:11:02] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2064
[19:16:17] New patchset: Asher; "move db43 to fully puppetized mysql conf" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2065
[19:16:42] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2065
[19:16:42] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2065
[19:18:59] PROBLEM - Puppet freshness on srv199 is CRITICAL: Puppet has not run in the last 10 hours
[19:19:49] RECOVERY - Disk space on srv221 is OK: DISK OK
[19:30:19] RECOVERY - Puppet freshness on srv199 is OK: puppet ran at Tue Jan 24 19:30:12 UTC 2012
[19:40:58] so, wtf? be 1H IN CNAME wikimedia-lb
[19:41:03] who put this into dns?
[19:41:08] and why did they not commit it?
[19:41:42] anyone know?
[19:42:41] they didn't? and it was mutante, and I looked at it with them, it's for a chapter wiki
[19:42:51] which presumably goes there
[19:42:54] Ryan_Lane:
[19:43:17] but if it's not committed I don't know how there is any service...
[19:43:29] ok, well I commented it out
[19:44:00] is there somewhere else that chapter wikis should go?
[19:44:31] well, I commented it because I didn't know what it was for
[19:44:37] and it wasn't committed
[19:46:30] ok, so if I see him I'll nag him to commit it, that must have been an oversight
[19:48:40] PROBLEM - udp2log processes on emery is CRITICAL: CRITICAL: filters absent: /var/log/squid/filters/countries-100, /var/log/squid/filters/countries-10, /var/log/squid/filters/countries-1,
[19:49:13] maplebed: fixed the 2tb grub2 issue
[19:49:16] so yay =]
[19:49:25] nice! how?
[19:49:26] oh wow
[19:49:30] what did you do?
[19:49:31] its not hard, just odd
[19:49:50] using gpt partitioning on 2tb disks means you have to create a 1mb bios type partition on the boot disks
[19:50:03] so grub has some idea how to read things, due to GPT EFI additions to grub2
[19:50:17] which means new partman templates in future =P
[19:50:23] ok
[20:02:51] hmm maybe this means that in a little while there will be a machine with a base install :-)
[20:02:53] woo hoo!
[20:14:08] RECOVERY - udp2log processes on emery is OK: OK: all filters present
[20:19:23] !log spinning up db54-58 for asher
[20:19:25] Logged the message, and now dispaching a T1000 to your position to terminate you.
[20:20:28] is notpeter around?
[20:21:35] diederik: what's up?
[20:21:43] hi!
[20:22:00] hey
[20:22:59] i filed issue: http://rt.wikimedia.org/Ticket/Display.html?id=2329
[20:23:17] and ryan mentioned that you were most likely to help me with this
[20:23:37] going off for lunch now, but if you have questions just email me
[20:23:43] cool!
[20:23:44] sounds good
[20:24:33] diederik: there is something high-profile that ct asked me to do, so it might be on the backburner for a little bit, but yes, I can do that
[20:25:47] binasher: the disks are looking good. I'll get them through an initial puppet run and hand them off to you
[20:31:16] !log restarting ms4 for memory testing
[20:31:18] Logged the message, Master
[20:44:08] PROBLEM - udp2log processes on emery is CRITICAL: CRITICAL: filters absent: /var/log/squid/filters/countries-100, /var/log/squid/filters/countries-10, /var/log/squid/filters/countries-1,
[20:46:03] diederik: that check seems to be flapping a bit
[20:46:11] I'll try to figure out why
[20:50:17] !log pulled db26, rebooting and re-imaging with lucid
[20:50:18] Logged the message, Master
[20:58:49] notpeter: the C filters aren't all well written, some segfault on certain log lines.
[20:59:40] so... they do actually die and need to be restarted?
[20:59:43] binasher: ^
[21:01:35] i think udp2log handles restarting them
[21:01:52] those countries filters are continually crashing
[21:01:57] nimish_g: ^^^
[21:02:58] they're crashing within seconds of starting
[21:03:24] wooo
[21:03:28] well, go go monitoring
[21:03:59] pull them from the udp2log conf
[21:04:07] and let nimish know
[21:11:48] PROBLEM - RAID on db26 is CRITICAL: Connection refused by host
[21:15:14] I am here
[21:15:17] let me help
[21:16:16] diederik: shall I just comment out those filters until they can be fixed?
[21:18:31] why you no boot from boot disk like i tell you
[21:18:32] bad server
[21:20:24] !log Starting dist-upgrade of sanger
[21:20:25] Logged the message, Master
[21:20:46] RECOVERY - RAID on db26 is OK: OK: 1 logical device(s) checked
[21:23:03] New patchset: Asher; "rebuilding db26 - enwiki pmtpa snapshot host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2066
[21:23:25] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2066
[21:23:26] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2066
[21:23:33] grr
[21:23:34] the damned c2100 is pxe booting on every reboot.
[21:23:38] whyyyyy
[21:26:13] New patchset: Lcarr; "Adding in the sw repo as well as symlinking it in files for ease of puppet pulling" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2067
[21:26:34] ^^ maplebed, mark - check out ?
[21:26:56] binasher: ok, I tried running random log lines through them and they seem to work, I'll try to tweak them further
[21:27:11] * maplebed defers to mark
[21:27:24] i'll have a look
[21:27:33] nimish_g: diederik apparently knows what it is and is working on it
[21:28:01] binasher: i am working on it but rob asked me to also have it reviewed by tim
[21:28:10] so i guess this won't go back online today or tomorrow
[21:29:42] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/2067
[21:30:00] New patchset: Asher; "rethinking db26 as a snapshot host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2068
[21:30:02] diederik: ok, also there's an improved version of this filter "template" that katie's written if we want to use that
[21:30:16] i know, i am using that
[21:30:18] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2068
[21:30:20] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2068
[21:30:20] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2068
[21:30:26] how important are those filters anyway?
[21:33:35] I cannot decide if i like the C series or not.
[21:35:30] diederik: did you pull them out of puppet?
[21:35:45] the filters?
[21:35:49] yeah.
[21:35:51] nope
[21:35:53] from svn
[21:35:58] ok.
[21:36:05] I think I need to change puppet then.
[21:36:06] * maplebed checks.
[21:36:15] but they were copied from locke/emery a couple of days ago
[21:36:38] but i want to use git/gerrit for filters
[21:36:46] (if we haven't been doing that already)
[21:37:11] diederik and nimish_g - is it correct that we need to disable those filters for now?
[21:37:17] (I'm in a different meeting and haven't paid full attention)
[21:37:20] yes for now
[21:37:27] yes
[21:37:50] New patchset: Bhartshorne; "removing recently installed country filters - they're crashing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2069
[21:37:56] nimish_g: ^^^ review please?
[21:38:28] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2069
[21:38:28] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2069
[21:38:46] maplebed: looks right
[21:39:47] ok, committed and deploying to emery.
[21:40:56] PROBLEM - DPKG on db55 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[21:40:57] and udp2log hupped.
[21:42:06] RECOVERY - udp2log processes on emery is OK: OK: all filters present
[21:42:23] thanks, nagios.
[21:42:43] does this mean that filters are automatically restarted?
[21:45:16] PROBLEM - DPKG on db56 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[21:48:20] diederik: once a filter segfaults it doesn't generate any more data until it's been manually restarted. I don't know what happens after that, but I suspect piping data to a dead process causes errors, presumably ones that asher was talking about earlier
[21:49:32] !log ms-be1 is online! MAN WE ARE AWESOME
[21:49:34] Logged the message, RobH
[21:50:40] maplebed: So ms-be1 is now all yours. Please keep me in the loop if it does anything odd.
[21:53:26] RobH: roger. thanks.
[21:54:18] its online with puppet NOT run but otherwise good to go, has raid1 os partition of 120GB
[21:54:23] and 1GB swap on sda and sdb
[21:54:36] otherwise its all unpartitioned space that the OS sees as independent disks
[21:55:10] yeah, puppet handles the partitioning for swift
[22:06:08] mark: I want to symlink this into the files directory for ease of adding files in. The other alternative would be to change the files directory link to go up 2 levels and just add in longer paths to all files. (comment on the code review of https://gerrit.wikimedia.org/r/#patch,sidebyside,2067,1,manifests/puppetmaster.pp )
[22:06:20] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2067
[22:06:59] ehm
[22:07:05] you know you can add modules in the puppet fileserver right
[22:07:08] you should probably do that
[22:07:11] and point that at the right directory
[22:07:22] so then you do puppet:///software/blah
[22:14:53] ah
[22:15:07] did not know that, good to know
[22:22:00] New patchset: Lcarr; "Adding in the sw repo as well as symlinking it in files for ease of puppet pulling" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2067
[22:23:03] !log Sanger is upgraded to lucid
[22:23:04] Logged the message, Master
[22:23:13]
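
Note on the 22:07 suggestion: a minimal sketch of a custom mount point in the puppetmaster's fileserver.conf, which is one way to get a puppet:///software/... URL without symlinking into the files directory. Only the mount name "software" comes from the chat. The config file location, the directory path, and the allow rule below are assumptions for illustration, not the actual Wikimedia setup.

```
# fileserver.conf on the puppetmaster (commonly /etc/puppet/fileserver.conf; assumed here)
[software]
    # hypothetical directory where the software repo is checked out
    path /srv/software
    # hypothetical access rule; limit this to the hosts that should fetch these files
    allow *.example.org
```

With a mount like this, a file resource in a manifest could use source => "puppet:///software/blah", matching the URL form given in the chat, so the repo stays in one place and nothing needs to be copied or symlinked.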