[23:59:51] safer than safe enough isn't useful. [00:01:09] it's easy, just use http://docs.python.org/library/os.html#os.setuid [00:01:49] maplebed: hrm, i only fully tested against pdns.controlsocket [00:01:51] hm. I will try that. [00:02:28] of course you'll need to lookup the uid [00:02:37] since it can vary between systems [00:02:46] sure. [00:04:36] that's in another library. pwd, I believe [00:04:52] pwd.getpwnam(name) [00:05:24] pwd.getpwnam(name)[2] <— that specifically [00:05:26] i think you do want the recursor for the particular stats you're getting [00:06:04] Ryan_Lane: thanks. that's the one. [00:06:09] yw [00:12:34] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [00:37:47] New patchset: Bhartshorne; "adding ganglia metrics to pds recursors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8876 [00:37:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8876 [00:37:48] blast. ryan left. binasher - I added dropping privs and sanitized input from the statefile. Care to look again? [00:46:36] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 294 seconds [00:49:18] New patchset: Bhartshorne; "adding ganglia metrics to pds recursors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8876 [00:49:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8876 [00:53:14] New patchset: Bhartshorne; "adding ganglia metrics to pds recursors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8876 [00:53:35] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8876 [00:58:24] New patchset: Bhartshorne; "adding ganglia metrics to pds recursors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8876 [00:58:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8876 [01:00:15] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 25 seconds [01:00:51] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 10 seconds [01:16:23] New patchset: Bhartshorne; "adding ganglia metrics to pds recursors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8876 [01:16:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8876 [01:32:02] New patchset: Bhartshorne; "adding ganglia metrics to pds recursors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8876 [01:32:23] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8876 [01:42:15] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 302 seconds [01:45:06] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds [02:37:41] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [03:05:53] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [03:46:41] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [03:48:38] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours [04:11:43] RECOVERY - MySQL Replication Heartbeat on db1003 is OK: OK replication delay 0 seconds [04:11:52] RECOVERY - MySQL Slave Delay on db1003 is OK: OK replication delay 0 seconds [08:07:09] so quiet this time of the morning... [08:43:39] nothing happen til 2pm CET (or noon GMT) [08:43:41] ++ [08:58:30] back [08:58:37] notpeter: so yeah European morning are really quiet [08:59:11] probably cause there are few staff/contractors in Eu [08:59:20] and most volunteer are at school / work / sleeping [09:02:12] PROBLEM - Puppet freshness on gurvin is CRITICAL: Puppet has not run in the last 10 hours [09:04:09] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [09:04:09] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours [09:04:09] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [09:04:09] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [09:05:12] hashar: gotcha [09:05:34] hashar: it also seems like lots of folk in europe sleep late/work late so that they get some overlap with west coast [09:06:13] we are kind of forced to do it :-( [09:06:51] the 9hours difference is really not helping [09:07:03] as an example, I end my day of work at 9am your time [09:07:11] some of us are just busy coding so we lay low [09:07:16] unless something big is broken [09:07:25] get my daughter, lunch, kiss my wife etc… then resume work at noon SF time when most people get out for lunch [09:07:32] and can work till 2pm (11pm eu) [09:07:59] notpeter: so I mostly use async communication with SF folk. 
Aka the good old email ;-D [09:08:11] and enjoy the morning coding [09:10:02] break [09:10:08] will be back this afternoon [10:03:33] PROBLEM - SSH on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:05:57] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:07:18] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 7.049 seconds [10:11:03] RECOVERY - SSH on kaulen is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [10:13:36] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [10:16:18] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:17:39] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 2.189 seconds [10:18:42] PROBLEM - SSH on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:20:12] RECOVERY - SSH on kaulen is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [10:34:03] seeing some segfaults from page.cgi on kaulen [10:34:28] trying to stop and restart apache over there, just stopping it is taking a very long time [10:35:03] PROBLEM - SSH on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:35:20] gotta wait it out [10:37:54] RECOVERY - SSH on kaulen is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [10:38:00] going to watch load drop for a minute here [10:38:57] PROBLEM - Apache HTTP on kaulen is CRITICAL: Connection refused [10:38:58] mark: ping [10:40:36] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 0.013 seconds [10:42:09] !log restarted apache on kaulen, was seeing page.cgi segfaults in dmesg and he logs, huge cpu wait spikes (why?) [10:42:16] Logged the message, Master [10:47:26] New patchset: ArielGlenn; "weak sync of wmf media from swift/other backend to local filesystem" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/8894 [10:52:36] RECOVERY - Host search13 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms [10:52:45] RECOVERY - Host search14 is UP: PING WARNING - Packet loss = 64%, RTA = 0.40 ms [10:52:45] RECOVERY - Host search15 is UP: PING WARNING - Packet loss = 93%, RTA = 0.75 ms [10:55:36] PROBLEM - SSH on search13 is CRITICAL: Connection refused [10:55:54] PROBLEM - SSH on search15 is CRITICAL: Connection refused [10:56:03] PROBLEM - SSH on search14 is CRITICAL: Connection refused [10:56:03] PROBLEM - Lucene disk space on search13 is CRITICAL: Connection refused by host [10:56:03] PROBLEM - Lucene disk space on search15 is CRITICAL: Connection refused by host [10:56:16] paravoid: pong [10:56:57] PROBLEM - Lucene disk space on search14 is CRITICAL: Connection refused by host [10:57:19] hi [10:57:31] so, I have several commits to pybal [10:57:37] ah? [10:57:38] how do I push them [10:57:43] do I need change-ids? [10:57:49] yes [10:57:54] yours didn't have [10:58:01] that's because I didn't push them via gerrit [10:58:05] but you should now [10:58:05] ah! 
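
A minimal sketch of the privilege-dropping approach maplebed and Ryan_Lane work out near the top of the log (around 00:01-00:05): look up the uid with pwd.getpwnam, then call os.setuid before doing any real work. The 'pdns' user name, the rec_control query and the gmetric call are illustrative assumptions about what a collector like the one in gerrit change 8876 might do; this is not the actual change, and the statefile sanitizing maplebed mentions is left out.

    import os
    import pwd
    import subprocess   # subprocess.check_output needs Python 2.7+

    def drop_privileges(username):
        pw = pwd.getpwnam(username)   # raises KeyError if the user is missing
        os.setgid(pw.pw_gid)          # drop the group first, while still root
        os.setuid(pw.pw_uid)          # pw.pw_uid is the pwd.getpwnam(name)[2] from the chat

    def recursor_stat(name):
        # Ask the running pdns recursor for one statistic; take the last token
        # of the reply, which is the bare value.
        out = subprocess.check_output(['rec_control', 'get', name])
        return int(out.split()[-1])

    def send_to_ganglia(metric, value):
        # Hand the sample to ganglia through the stock gmetric CLI.
        subprocess.check_call(['gmetric', '--name', metric,
                               '--value', str(value), '--type', 'uint32'])

    if __name__ == '__main__':
        drop_privileges('pdns')   # assumed run-as user; os.setuid fails unless started as root
        send_to_ganglia('pdns_questions', recursor_stat('questions'))

Note that the order matters: setgid has to happen before setuid, because once the uid is dropped the process no longer has the privilege to change its group.
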
[10:58:12] just push to refs/for/master [10:58:18] RECOVERY - Host search19 is UP: PING OK - Packet loss = 0%, RTA = 0.59 ms [10:58:18] RECOVERY - Host search16 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms [10:58:27] RECOVERY - Host search20 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [10:58:30] I need to add the hook first and amend the commits :) [10:58:34] yeah [10:58:34] dammit [10:58:40] or maybe [10:58:44] change-ids are not required in that repo [10:58:45] try it first ;) [10:58:48] not sure what I set [10:59:26] New patchset: Faidon; "Remove stub/placeholder files from debian/" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8895 [10:59:27] New patchset: Faidon; "Ship bgp.py with pybal for now" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8896 [10:59:27] New patchset: Faidon; "Change homepage to wikitech" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8897 [10:59:28] New patchset: Faidon; "Use Python absolute imports" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8898 [10:59:29] New patchset: Faidon; "Add twisted to setup.py's requires" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8899 [10:59:29] New patchset: Faidon; "Move main.py to scripts/pybal" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8900 [10:59:30] New patchset: Faidon; "Modernize Debian packaging" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8901 [10:59:31] New patchset: Faidon; "Modernize init script" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8902 [10:59:31] New patchset: Faidon; "Make example configuration less Wikipedia-specific" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8903 [10:59:32] New patchset: Faidon; "Rewrite debian/copyright" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8904 [10:59:33] New patchset: Faidon; "Add a debian/changelog entry" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8905 [10:59:36] yay :) [10:59:47] awesome :) [10:59:48] PROBLEM - Lucene on search15 is CRITICAL: Connection refused [10:59:48] enjoy [10:59:48] reviewing [11:00:31] also, which instance do you use for building packages? 
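
For reference, the workflow being described here (get the hook, amend the existing commits, then push to refs/for/master) is the standard Gerrit one. Roughly, assuming gerrit.wikimedia.org serves SSH on Gerrit's usual port 29418 and with <user> as a placeholder:

    # fetch Gerrit's commit-msg hook so new or amended commits get a Change-Id line
    scp -p -P 29418 <user>@gerrit.wikimedia.org:hooks/commit-msg .git/hooks/
    # amend each existing commit (for instance via an interactive rebase, marking
    # them "reword") so the hook can stamp a Change-Id into the message, then:
    git push origin HEAD:refs/for/master

As mark notes just above ("change-ids are not required in that repo"), a repository can be configured not to require Change-Ids at all, in which case the plain push is enough.
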
[11:00:44] I now use my 'varnish' instance [11:00:47] hah [11:00:52] which has a pybal setup with lucid and precise [11:00:55] er [11:00:56] pbuilder [11:01:04] i'm gonna make a precise instance now, for testing pybal on it [11:01:27] PROBLEM - SSH on search16 is CRITICAL: Connection refused [11:01:27] PROBLEM - SSH on search19 is CRITICAL: Connection refused [11:01:43] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/8895 [11:01:51] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8895 [11:01:52] mark: I made one, you can do whatever with it, if you like [11:01:53] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8895 [11:01:54] PROBLEM - Lucene on search14 is CRITICAL: Connection refused [11:01:54] PROBLEM - Lucene on search13 is CRITICAL: Connection refused [11:02:06] there's a general package builder instance as well [11:02:11] which is supposed to work [11:02:12] PROBLEM - Lucene disk space on search16 is CRITICAL: Connection refused by host [11:02:12] PROBLEM - Lucene disk space on search19 is CRITICAL: Connection refused by host [11:02:14] but I never used it [11:02:26] yeah, I have built on that [11:02:39] we should add a deb lint checker to gerrit [11:03:50] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8896 [11:03:52] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8896 [11:04:30] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8897 [11:04:32] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8897 [11:04:52] which one's that? [11:05:05] what do you mean? [11:05:29] the "general package builder instance" [11:05:41] * mark checks [11:05:49] labs-build1? 
[11:05:53] yeah I think so [11:05:57] fits :) [11:06:19] I had to filter project "testlabs" first, heh [11:07:02] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/8898 [11:07:11] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8898 [11:07:13] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8898 [11:07:27] PROBLEM - Lucene on search16 is CRITICAL: Connection refused [11:07:36] PROBLEM - Lucene on search19 is CRITICAL: Connection refused [11:07:47] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8899 [11:07:49] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8899 [11:08:29] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8900 [11:08:31] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8900 [11:08:39] RECOVERY - SSH on search13 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [11:09:48] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8901 [11:09:50] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8901 [11:10:09] RECOVERY - SSH on search14 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [11:10:09] RECOVERY - SSH on search15 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [11:10:12] hm, I was thinking of writing an upstart script for pybal [11:10:24] but do you know what ubuntu (debian) are gonna do with systemd? 
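
mark mentions just above that he is thinking of writing an upstart script for pybal. A minimal upstart job for a foreground daemon might look like the sketch below; the path and start conditions are assumptions, not whatever job eventually shipped, and if pybal forks into the background it would also need an 'expect daemon' stanza.

    # /etc/init/pybal.conf
    description "PyBal LVS monitor"
    start on (local-filesystems and net-device-up IFACE!=lo)
    stop on runlevel [!2345]
    respawn
    exec /usr/sbin/pybal
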
[11:10:51] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/8902 [11:10:59] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8902 [11:11:01] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8902 [11:11:50] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8903 [11:11:52] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8903 [11:11:57] PROBLEM - Host search19 is DOWN: PING CRITICAL - Packet loss = 100% [11:12:06] PROBLEM - Host search20 is DOWN: PING CRITICAL - Packet loss = 100% [11:12:33] RECOVERY - SSH on search20 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [11:12:42] RECOVERY - Host search20 is UP: PING OK - Packet loss = 0%, RTA = 0.17 ms [11:13:00] RECOVERY - SSH on search19 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [11:13:00] RECOVERY - SSH on search16 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [11:13:09] RECOVERY - Host search19 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [11:13:38] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8904 [11:13:40] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8904 [11:13:54] PROBLEM - NTP on search14 is CRITICAL: NTP CRITICAL: No response from NTP server [11:13:54] PROBLEM - NTP on search15 is CRITICAL: NTP CRITICAL: No response from NTP server [11:13:54] PROBLEM - NTP on search13 is CRITICAL: NTP CRITICAL: No response from NTP server [11:14:35] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8905 [11:14:37] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8905 [11:14:44] thanks a lot faidon :-) [11:16:00] RECOVERY - Host search17 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms [11:19:00] PROBLEM - Lucene disk space on search17 is CRITICAL: Connection refused by host [11:19:00] PROBLEM - SSH on search17 is CRITICAL: Connection refused [11:23:57] PROBLEM - Lucene on search17 is CRITICAL: Connection refused [11:25:54] hm, bug [11:26:04] I just created a 'pybal' project, gave my name as member [11:26:07] but now i'm not listed [11:26:10] and can't do anything with it [11:31:45] RECOVERY - SSH on search17 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [11:32:30] PROBLEM - NTP on search16 is CRITICAL: NTP CRITICAL: No response from NTP server [11:32:41] mark: ubuntu has said that they're not going to switch to systemd [11:32:49] debian has not migrated to either upstart or systemd [11:32:51] for various reasons [11:32:57] ok [11:33:04] technical and social [11:33:24] if Debian chooses systemd, I'd expect Ubuntu to follow [11:33:42] well then I'll continue creating upstart scripts from time to time [11:33:51] PROBLEM - NTP on search19 is CRITICAL: NTP CRITICAL: No response from NTP server [11:33:53] I've never written one [11:34:52] upstart has its problems [11:36:10] tbh, without having a very well educated opinion, I tend to prefer upstart over systemd [11:36:16] not a big fan of do-it-all daemons [11:36:38] yeah [11:36:42] PROBLEM - NTP on search17 is CRITICAL: NTP CRITICAL: No response from NTP server [11:36:58] anyway, as for pybal [11:37:08] maybe you 
should bump versions at some point? :) [11:37:11] yes [11:37:14] it's still at 0.1 :) [11:37:16] gonna do that now [11:37:22] I just didn't want to bother with it before ;) [11:37:25] and I was thinking of switching to git-buildpackage [11:37:30] absolutely [11:37:40] it's going to mess the workflow a bit [11:37:56] you have a master branch with just the upstream tree and a debian branch with the debian changes [11:37:59] I built it with git-buildpackage yesterday [11:38:12] it's a native package now, so everything on one branch [11:38:15] any reason to change that now? [11:39:03] it doesn't make a big difference for us but being a native package is a stopper for a Debian/Ubuntu upload [11:39:10] ok [11:39:12] feel free to change it [11:39:49] okay [11:40:02] later though, I have some pending tasks that I shouldn't postpone more [11:40:08] yeah, no rush [11:40:12] i'm gonna add in ipv6 now [11:40:15] although pybal brought me back to my comfort zone :) [11:40:18] gonna put pybal on an instance and test it there [11:40:21] haha [11:40:36] yeah and I'm happy too, didn't really feel like reading up on all the debian packaging updates [11:42:24] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [11:43:08] I saw the DSA too, but was waiting for the USN [11:49:02] grr [11:49:06] why doesn't my new instance let me in [11:49:31] did you try to login before you added yourself? [11:49:39] if so, there's a negative cache :) [11:49:45] been there, done that [11:49:59] i would think I've been added upon creation [11:50:10] ok, let me try [11:50:13] * mark checks [11:51:05] the labsconsole interface could really use some improvements ;) [11:51:13] ohrly? :) [11:52:26] negative caching... and then people always wonder why I hate ldap [11:52:27] hm, can't login either [11:52:38] I like ldap, I don't like pam/nss ldap :) [11:53:39] do you have access rights to remove the 'pybal' project? [11:53:48] it has no members so I can't [11:53:50] projects cannot get removed afaik [11:53:56] ever [11:53:56] whut [11:54:48] can't login to pybal-precise either, but I wasn't a sysadmin [11:55:02] added myself now, but perhaps I'm in the negative cache too now [11:55:04] fail :) [11:56:00] hrm, or puppet is still running? 
[11:56:03] might be [11:56:10] it's setup to create pbuilder instances [11:56:11] those take a while [11:56:15] especially when labs I/O is slow [11:56:25] yep, I'm looking at the console [11:56:38] it's creating instances for hardy, lucid and precise by default [11:56:59] i'll try again later [12:04:22] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:38:16] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [12:46:13] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours [12:49:13] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours [12:49:13] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours [12:55:13] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours [12:55:13] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours [12:55:13] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours [12:57:19] PROBLEM - Puppet freshness on dobson is CRITICAL: Puppet has not run in the last 10 hours [13:07:16] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [13:37:28] !log updating drac on search18, shouldnt cause system reboot. [13:37:32] Logged the message, RobH [13:47:19] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [13:49:16] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours [13:50:46] !log palladium has a bad disk, goign to replace it [13:50:46] so, wait, we have rsyslog in production and syslog-ng in labs? [13:50:49] Logged the message, RobH [13:50:52] mark: ^ palladium is a varnish cache [13:50:59] but it appears to be dual disk 1 TB raid 1 [13:51:09] yes it's a bits server [13:51:11] oh? we have caches that are neither cp or sq? [13:51:16] yes [13:51:25] mark: so since its hardware raid1 i should be ok to pull it to swap [13:51:26] dammit, and I thought I got a hang of our naming conventions [13:51:33] but since its varnish i am glad you are about ;] [13:51:39] there are 3 other ones [13:51:41] just shut it down [13:51:51] i would like to try the hot swap just to see if it works =] [13:51:57] we have not hot swapped that many 610s [13:52:17] !log replacing ps2 on mw1017 [13:52:21] Logged the message, Master [13:53:13] yep, still up [13:53:14] huzzah [13:53:28] New patchset: Hashar; "split filebackend conf out of CommonSettings" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/8914 [13:53:35] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/8914 [13:56:50] !log palladium disk replaced [13:56:54] Logged the message, RobH [14:05:29] gah, labs is unusable again [14:11:19] :( [14:36:01] PROBLEM - Lucene disk space on search20 is CRITICAL: Connection refused by host [14:40:19] okay, silly question [14:40:25] why are we naming our backport section "universe" [14:40:29] instead of "backports"? :) [14:41:00] and our patched/custom-built "main", instead of say, "wikimedia", "patched", "custom" or whatnot [14:41:00] "cause that's how it's always been?" ? 
[14:41:05] idk if that's right though [14:41:39] sounds like the kind of thing that maybe hasn't changed in ~5 yrs [14:43:48] paravoid: there's an RT ticket somewhere to change that [14:44:08] Setting up pybal (0.1+r20120524-1) ... [14:44:08] * Starting pybal pybal Traceback (most recent call last): [14:44:08] File "/usr/sbin/pybal", line 10, in [14:44:08] from pybal import pybal [14:44:08] ImportError: No module named pybal [14:44:22] oh? [14:44:26] hehe [14:44:27] that may be me [14:44:31] only the /usr/sbin/pybal binary is included [14:44:31] most probably :) [14:44:32] and configs [14:44:35] not the actual app ;) [14:44:36] eh?! [14:44:40] s/binary/script/ [14:44:45] are you sure? [14:44:52] that's what dpkg -L tells me [14:45:15] and dpkg -c [14:45:15] built on precise? [14:45:17] yes [14:45:24] on pybal-precise [14:45:24] I only built it on Debian [14:45:42] Use of uninitialized value $python_default in substitution (s///) at /usr/share/perl5/Debian/Debhelper/Buildsystem/python_distutils.pm line 121. [14:45:44] a lot of those [14:45:47] that may have something to do with it [14:46:23] * jeremyb wonders if we should start a pool on how long until there's a buildd ;) [14:46:31] PROBLEM - SSH on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:46:51] do we need a buildd? [14:46:58] # ls [14:47:02] yeah [14:47:02] and waits [14:47:12] gah, argh, grr [14:47:17] people keep wishing for a buildd [14:47:20] and although it would be nice [14:47:25] that's not actually where you spend most time ;) [14:47:30] you still have to test your packages [14:47:35] and thus do manual builds [14:47:48] ...which is pretty much one command anyway [14:48:19] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:48:38] I just built it on pybal-precise [14:48:43] and the deb includes them [14:48:47] so most probably a missing build depends [14:49:11] ah [14:49:15] I presume you built it in a clean pbuilder? [14:49:16] because I just installed twisted and stuff perhaps [14:49:16] mark: i guess the main benefit would be to ensure it builds in a clean, minimal env. has an accurate depends line, etc. [14:49:17] yes [14:49:19] brb [14:49:23] good mark, bad faidon [14:49:29] jeremyb: aka pbuilder, which we've had for years [14:49:39] oh, ok [14:51:01] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 4.509 seconds [14:51:02] i think it would be nice to have a build environment which builds/checks a package on every git checkin [14:51:03] automatically [14:51:08] from gerrit and perhaps jenkins [14:51:26] but it's not essential [14:51:50] sounds like an overkill :) [14:52:10] yeah definitely not something we can spend a lot of time on ;) [14:52:17] Sounds like a task for Jenkins [14:52:30] as I just said ;) [14:52:41] I put it as one of the berlin hackathon topics, but I don't expect much to happen [14:52:45] as we have more important stuff on our plate ;) [14:52:58] IPv6, git-deploy, ... [14:54:48] I think I just added ipv6 support to pybal, but still need to test it [15:00:41] hm, I just build pybal on a sid pbuilder and got the .py too [15:01:22] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:03:18] ah! 
[15:04:17] we're missing a build-dep on python-all [15:04:20] that's at least a problem [15:04:29] hehe [15:05:34] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 7.850 seconds [15:05:43] RECOVERY - SSH on kaulen is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [15:08:43] New patchset: Faidon; "Add b-d on python-all and use dh_python2" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8922 [15:08:45] mark: ^ [15:09:01] thanks! [15:09:03] I wasn't able to reproduce the issue but I think this will fix it [15:09:08] i'll test it now [15:09:11] *think* :) [15:09:41] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8922 [15:09:43] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/8922 [15:09:55] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:10:04] PROBLEM - SSH on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:10:21] btw, re: ipv6, two questions for you: [15:10:41] a) I think we don't really need precise — and this might make the upgrade riskier and more complicated [15:11:00] a backport of ipvsadm to lucid should be fine, lucid's kernel is old enough to have ipv6 [15:11:10] ipv6 lvs that is [15:11:48] nah I wanna upgrade those boxes anyway [15:11:54] okay :) [15:11:56] since we're doing ipv6 only on nonactive hosts first, I see no risks [15:12:11] could even do it in only one data center [15:12:21] the risk is that if things go wrong we won't be sure if it's precise or ipv6 [15:12:28] but I guess it's a small risk [15:12:43] yeah but we can move some ipv4 traffic over also [15:12:44] for testing [15:12:54] i'm not that worried [15:12:56] okay [15:13:15] if you're not worried, I have no reason to be :) [15:13:33] b) maybe we should run a teredo/miredo relay and maybe even a 6to4 (half) relay [15:13:34] well, if you know where we're coming from [15:13:48] where I was testing pybal changes by live edits in /usr/lib/python files on active load balancer hosts [15:14:01] yes, good one [15:14:01] hahahaha [15:14:04] please make an RT ticket for that [15:14:10] with 2391 as parent [15:14:17] ok, doing that now [15:18:19] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 1.345 seconds [15:18:55] RECOVERY - SSH on kaulen is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [15:21:42] buil [15:21:43] ding [15:21:45] one [15:21:46] step [15:21:47] at [15:21:48] a [15:21:49] time [15:22:52] are you aware of bugzilla being slow / unreachable ? 
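
The ImportError earlier in this stretch (only /usr/sbin/pybal and the configs in the package, no pybal module) together with the "Use of uninitialized value $python_default" warnings is the classic symptom of building a Python package in a clean chroot without the Python build dependencies. Going by its summary, change 8922 ("Add b-d on python-all and use dh_python2") most likely boils down to something like the following sketch; the actual diff is not shown in the log:

    # debian/control (excerpt)
    Build-Depends: debhelper (>= 7.0.50~), python-all (>= 2.6.6-3~)

    # debian/rules
    #!/usr/bin/make -f
    %:
    	dh $@ --with python2

With python-all present in the build chroot, dh_python2 installs the pybal module into the proper dist-packages path, which would explain why the module shows up when the package is built on a box that already has the Python tooling installed but goes missing in a clean pbuilder chroot.
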
[15:23:34] PROBLEM - SSH on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:23:43] yeah kaulen :-) [15:23:57] !log kaulen (bugzilla) unreacheable :-( [15:24:00] Logged the message, Master [15:24:44] mark: hahahahaha [15:26:16] RECOVERY - SSH on kaulen is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [15:26:35] paravoid: looks better now :) [15:30:37] PROBLEM - SSH on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:31:22] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:34:15] !log Power cycled kaulen [15:34:21] Logged the message, Master [15:35:06] mark: yay :) [15:35:53] \O/ [15:35:53] RECOVERY - SSH on kaulen is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [15:35:56] brute force ftw [15:36:05] but it may happen again [15:37:05] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 0.002 seconds [15:44:16] happened this morning but I was able to get on and stop/start apache [15:44:25] and it seemed fine for a while after that [15:51:47] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [15:53:51] I feel like I'm cursed to work on an environment where commands respond minutes after you type them [15:54:41] yeah [15:54:50] good excuse to yell hard at ryan ;) [15:58:50] mark: for deployment-prep, we need wikimedia-task-appserver for precise [15:58:53] and I wonder [15:59:00] why do we have that instead of having it all in puppet? [15:59:04] it's just a metapackage, isn't it? [15:59:12] also contains scap scripts [15:59:14] but yeah [15:59:19] I think those have been edited in puppet too [15:59:23] not sure what the current status is [15:59:27] it used to do everything puppet does now [15:59:38] No, some scripts are in puppet and some are in the package [15:59:42] I don't /think/ there's any duplication there [16:00:09] but that's really the only reason for that package nowadays [16:00:20] Specifically, the scripts executed by humans on fenari (such as sync-file and scap) are in puppet, while the scripts that actually run on the nodes (such as scap-1 and friends) are in the package [16:00:32] right [16:00:57] so, keeping the package? [16:01:03] no we can get rid of it [16:01:16] as long as puppet is not gonna deploy some extensive set of scripts/apps anyway [16:01:40] and I also hate it when puppet touches files in /usr/bin or /usr/sbin ;) [16:01:51] should use /usr/local instead [16:01:59] Feel free to puppetize it and put the scripts in /usr/local/{s,}bin [16:02:06] Just don't break the deployment system plz :) [16:02:11] heh [16:02:15] that's the main reason noone's done it yet ;) [16:02:31] paravoid: but likely, just copying that _all package from lucid to precise repo will work [16:02:40] (if you choose not to modify things now) [16:02:45] Also, if git-deploy happens soon, it'll be obsolete anyway [16:02:49] yeah [16:02:59] Well, mostly. 
Stuff like apache-sanity-check is in there too [16:03:06] can get rid of that too [16:03:23] hm, I think I should just reprero copy to precise and wait for git-deploy then [16:03:31] yeah [16:03:41] no reason to mess with things I don't understand if they're about to get replaced :) [16:03:57] ok, and another question for you guys, bear with me :) [16:04:04] it seems we have a patched php5 in lucid-wikimedia [16:04:05] by Ryan [16:04:13] that may no longer be necessary [16:04:22] it did like two things or something [16:04:24] the changelog mentions a) enable cdb, b) enable gdb3 symbols and don't strip [16:04:28] yes [16:04:37] if both of those are now handled by ubuntu [16:04:43] feel free to not use put it in precise [16:04:48] so, how can I test for (a) [16:04:49] all the better [16:04:54] and why php5-dbg does not suffice for (b)? [16:05:01] I think that didn't exist back then [16:05:16] i've heard that (a) is included now [16:05:27] it's just one added --enable option in debian/rules [16:05:35] ok, I'll look at the source [16:05:36] if the current ubuntu php package has that, you should be set [16:07:13] debdiff to the rescue [16:07:23] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [16:08:29] nope, still no cdb in stock ubuntu [16:08:41] and wtf, we're missing dozens of security updates [16:09:52] that's why it would be nice if it were included [16:10:27] * paravoid cries [16:11:20] use after free, remote execution, open_basedir bypass, null pointer dereference, arbitrary code execution [16:11:23] among others [16:11:41] yeah yeah, we know ;) [16:17:34] So I don't want to hijack the conversation, but who's in charge of things like patching in ops? I've been meaning to ask ryan, but since paravoid brought it up... [16:17:55] noone and everyone is in charge [16:19:10] it's the wiki way [16:19:38] Good to know.... if slightly scary. [16:23:46] another one for berlin [16:23:51] actually way more important than the buildd one ;) [16:24:17] gaah, labs is just unusable [16:24:24] yes [16:24:25] I'm gonna stop working on this soon, try again on... tuesday [16:24:40] (monday is day off) [16:24:56] following the U.S. holidays? :) [16:25:02] no, dutch [16:25:29] pentecost [16:25:30] ah, greek holiday too [16:25:43] ascencion day for us [16:25:51] we had that last week [16:26:08] hmm, or not [16:26:26] no, ascension day was yesterday for us, but we don't celebrate it as a day off I guess [16:27:38] ah, Whit Monday, same as you apparently [16:27:45] http://en.wikipedia.org/wiki/Whit_Monday [16:27:49] monday after the pentecost [16:29:26] well, would be good to get that php package in git then, in gbp format [16:30:43] I think they have cdb now, it's just enabled by default [16:30:50] ah good [16:30:52] ./configure --help doesn't have --enable-cdb, just --without-cdb [16:30:57] but I'm not sure, I'm trying to find a way to test that now :) [16:31:03] if that holds for lucid too, then let's switch ASAP [16:37:51] # php5 [16:37:51] [16:37:51] Array [16:37:51] ( [0] => cdb [1] => cdb_make [16:37:56] that's on stock lucid [16:38:17] I should ask Ryan [16:39:43] i don't think ryan knows any more about this than you do now [16:39:48] it just happens that he did the last security rebuild ;) [16:40:35] mediawiki uses cdb for things like the localization cache [16:43:00] I was told that [16:43:21] who should I ask before replacing php from our repo and upgrade servers then? 
:) [16:44:28] I'm nearly positive stock php has cdb now [16:44:46] mark: and if the ldap negative cache is too long, we can shorten it ;) [16:44:54] please do [16:44:59] had to wait for like an hour today [16:45:06] in general, you'll only hit that if you try to log into an instance before you're in the project [16:45:07] paravoid: noone ;) [16:45:09] just test it on one box [16:45:26] test what?? :) [16:45:29] Ryan_Lane: also I managed to create one project (pybal) without members today [16:45:30] otherwise it probably wasn't negative ldap cache [16:45:35] paravoid: stock php package [16:45:42] there's no error checking for membership [16:45:52] no, test mediawiki how? [16:45:55] when creating a project, that is [16:46:04] I just tested dba_handlers() on a stock lucid php package [16:46:08] and says cdb/cdb_make [16:46:14] so, it's /probably/ okay [16:46:36] but I don't know how (or who to ask) to actually test mediawiki & l10n cache [16:46:50] roan maybe? [16:46:58] o [16:47:02] ok. off to the office [16:51:52] ... lucid? [16:51:57] * rcoli shudders slightly [16:52:10] what? [16:52:49] * rcoli comes from a shop where they were still running lucid in 2010, is familiar with the sort of problems one might have doing so [16:53:00] er [16:53:05] lucid came out in 2010 [16:53:06] precise was released like two weeks ago [16:53:13] err, lol [16:53:16] *lenny* [16:53:19] hash fail lookup [16:53:40] lenny was released in 2010 too [16:53:42] :) [16:53:43] if only it were named "exploding cow" [16:53:58] sorry, 2009 [16:54:40] and the last point release was released like 2 months ago [16:54:42] it's not /that/ old [16:54:47] no [16:54:48] I suppose you're right [16:54:51] we have a few hardy boxes left still [16:54:55] even those are doing fine [16:55:06] they'll be gone soon, but [16:55:09] yeah, it's just integration awkwardness etc [16:55:09] it's not like they're really in our way [16:57:28] so, mark, shall I send a mail to ops? [16:57:48] ask jeff [16:57:51] he tested this for fundraising [16:57:52] with the hope than Roan or someone else who reads it and knows MW will help me? [16:58:08] confirm that he has the stock php package working with an l10n cache install there, and afaic, you can just roll it then [16:58:19] he's also claimed the stock package was fine now [17:20:59] New review: Aaron Schulz; "Can you move $wgDefaultUserOptions to a proper place before being making this change?" [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/8914 [17:25:00] !log resetting gurvin, load spiking at 370+, SSH unreachable, 214 days of uptime [17:25:03] Logged the message, Master [17:27:49] yvon will go that way soon, it's also at 214 [17:29:57] apergos: good catch [17:30:02] I'm upgrading and will reboot [17:30:05] cool [17:30:08] I should depool first though [17:30:14] (i've gone off the clock already) [17:30:42] I need to schedule the snapshot hosts soon (not instantly but "soon") [17:33:16] is loudon the same case? [17:33:21] New patchset: Bhartshorne; "adding ganglia metrics to pds recursors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8876 [17:33:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8876 [17:34:45] woosters: ganglia shows it as been dead for weeks, if not months [17:35:22] Jeff_Green, hi - is my cluster access supposed to work now? [17:36:18] MaxSem: in theory yeah [17:36:29] is it full of fail? 
[17:36:37] Jeff_Green, I couldn't log in [17:36:44] where did you try? [17:37:22] host: fenari.wikimedia.org, account name: maxsem, using the key I sent you [17:37:41] looking [17:38:19] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8876 [17:38:22] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8876 [17:40:34] hm, shouldn't I able to find gurvin in fenari's /home/wikipedia/conf/pybal? [17:41:13] oh how nice, gurvin is not getting any traffic or has nginx installed [17:41:55] MaxSem: can you try again? [17:42:06] !log rebooting gurvin & yvon with new kernel [17:42:10] Logged the message, Master [17:43:14] MaxSem: ah, key_read error. looking [17:46:15] MaxSem: try now? [17:46:29] whee [17:46:51] thanks, works now [17:47:00] you may run into the same issue on other hosts, let me double-check puppet config [17:48:30] New patchset: Bhartshorne; "yay typos" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8932 [17:48:37] somehow 'bqF' got mangled to '> ' :-( [17:48:50] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8932 [17:48:50] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8932 [17:49:45] New patchset: Jgreen; "fix maxsem's ssh public key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8933 [17:50:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8933 [17:50:27] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8933 [17:50:29] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8933 [17:51:46] MaxSem: ok fixed in puppet too, it will take a while to propagate, certainly should be out within an hour though [17:52:23] cool [17:52:33] sorry about that. [18:04:01] RECOVERY - Puppet freshness on gurvin is OK: puppet ran at Fri May 25 18:03:44 UTC 2012 [18:25:46] PROBLEM - SSH on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:28:19] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:29:49] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 8.757 seconds [18:30:25] RECOVERY - Puppet freshness on dobson is OK: puppet ran at Fri May 25 18:30:13 UTC 2012 [18:34:10] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:39:52] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 1.591 seconds [18:44:33] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:48:31] is someone actually looking at kaulen? [18:49:58] woosters: bugzilla (kaulen) is totally unreachable again [18:50:44] robla - someone will be [18:50:54] thanks! [18:51:45] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 2.633 seconds [18:52:41] robla - robh is investigating [18:52:54] ? [18:52:58] i am? [18:52:59] ok.. [18:53:33] ssh is borked, going in via serial [18:53:50] drac is borked, resettting drac [18:54:38] drac reset is slow. 
[18:55:09] !log kaulen serial console unresponsive, rebooting [18:55:13] Logged the message, RobH [18:56:02] its posting, im monitoring the boot [18:57:00] RECOVERY - SSH on kaulen is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:57:25] !log kaulen is rebooted, it may have had a runaway process or a memory leak, not sure yet, but it was locked up from access [18:57:29] Logged the message, RobH [18:57:32] !log bugzilla appears back online [18:57:35] Logged the message, RobH [18:57:37] robla: ^ [18:57:47] looks like it's back, but since this is the second reboot today, probably more investigation is in order [18:57:49] im going to grep the logs, but half th etime this kind of lockup shows little to nothing [18:58:03] but since it just happened, hopefully there is something [18:58:10] thanks [18:58:30] * robla disappears into interview [18:59:08] btw, kaulen in hokkien means mischevious [18:59:14] so it is acting up [18:59:16] ;-P [19:01:25] alright, from now on, all machines will be named for synonyms of "obedient" [19:01:32] i see a load spike around the last crash and this one [19:02:59] the cpuwait time skyrockets and the machine locks up [19:04:48] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours [19:04:48] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [19:04:48] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [19:04:48] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [19:09:57] !log disabled the outdated /etc/init.d/gmond on spence. use ganglia-monitor instead. [19:10:00] Logged the message, Master [19:11:14] RobH: the spikes correlate with something blowing its memory and pushing the machine into swap death [19:11:17] (hence the iowait) [19:11:52] yea, im just not sure what process is doing it [19:20:57] hexmode: are you running this? https://www.mediawiki.org/wiki/Special:Code/MediaWiki/115435 [19:41:34] robla: sumanah asked me to check in the stuff I have. So I did. [19:42:19] hexmode: sure, thanks for that! I'm just wondering if you're actually running it, or if we need to look elsewhere for causes of bugzilla freaking out and dying [19:42:45] robla: none of that stuff is running. [19:42:56] at least, not on a regular basis [19:43:07] did you run it a couple of hours ago? [19:44:24] "it"? 
no [19:44:36] I didn't take down bz :) [19:46:06] I was thinking that Sam was working on the extension I asked for, but wasn't aware of a problem till Sumanah told me [19:46:47] k...thanks :) [19:47:42] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours [20:14:14] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [20:47:50] PROBLEM - SSH on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:48:17] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:49:11] (so people know about the bz prob) [20:49:38] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 5.203 seconds [20:53:59] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:56:15] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 3.908 seconds [21:07:03] RECOVERY - SSH on kaulen is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [22:30:30] New patchset: Sara; "Ensure only one of the gmond and ganglia-monitor init scripts is in production." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8980 [22:30:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8980 [22:32:50] the above change is analogous to https://gerrit.wikimedia.org/r/8977 which i just pushed to labs. i don't think it should break anything, but i can also hold off until next week. [22:35:42] PROBLEM - swift-object-replicator on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [22:35:42] PROBLEM - swift-object-server on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [22:36:00] PROBLEM - swift-account-server on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [22:36:27] PROBLEM - swift-container-server on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [22:39:35] ^^^^^ that's me. 
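
RobH and maplebed concluded earlier (around 19:01-19:11) that something on kaulen keeps blowing its memory and pushing the box into swap death, but the culprit process is unknown. One low-tech way to catch it before the next lockup is to snapshot the largest resident-set sizes every minute from cron; a sketch of such a script, not anything that was actually deployed:

    import os
    import time

    def rss_kb(pid):
        # VmRSS is reported in kB; kernel threads have no VmRSS line at all.
        with open('/proc/%s/status' % pid) as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])
        return 0

    def top_memory(n=5):
        procs = []
        for pid in os.listdir('/proc'):
            if not pid.isdigit():
                continue
            try:
                with open('/proc/%s/cmdline' % pid) as f:
                    cmd = f.read().replace('\0', ' ').strip() or '[pid %s]' % pid
                procs.append((rss_kb(pid), pid, cmd))
            except (IOError, OSError):
                continue   # the process exited while we were reading it
        return sorted(procs, reverse=True)[:n]

    if __name__ == '__main__':
        print(time.strftime('%Y-%m-%d %H:%M:%S'))
        for kb, pid, cmd in top_memory():
            print('%10d kB  pid %-6s  %s' % (kb, pid, cmd))

Appending that output to a file from a one-minute cron job leaves a trail to read after the next freeze; maplebed later estimates roughly a ten-minute window between the memory starting to climb and the machine becoming unreachable, so a one-minute interval is plenty.
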
[22:39:36] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [22:41:15] PROBLEM - swift-account-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [22:41:15] PROBLEM - swift-object-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [22:41:15] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [22:41:33] PROBLEM - swift-container-replicator on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [22:41:33] PROBLEM - swift-account-reaper on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [22:42:00] PROBLEM - swift-account-replicator on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [22:42:18] PROBLEM - swift-container-updater on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [22:42:18] PROBLEM - swift-object-updater on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [22:42:27] yup. still me ^^^^ [22:42:52] (there's no user visible impact; I stopped the swift processes on one backend node for a moment to reformat one of the disks) [22:43:03] RECOVERY - swift-container-replicator on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [22:43:03] RECOVERY - swift-object-replicator on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [22:43:03] RECOVERY - swift-account-reaper on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [22:43:21] RECOVERY - swift-account-replicator on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [22:43:21] RECOVERY - swift-container-server on ms-be2 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [22:43:21] RECOVERY - swift-object-server on ms-be2 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [22:43:26] grumble. 
[22:43:48] RECOVERY - swift-object-updater on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [22:43:48] RECOVERY - swift-container-updater on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [22:45:45] PROBLEM - SSH on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:46:48] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:47:15] PROBLEM - swift-container-replicator on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [22:47:15] PROBLEM - swift-object-replicator on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [22:47:15] PROBLEM - swift-account-reaper on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [22:47:24] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours [22:47:33] PROBLEM - swift-account-replicator on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [22:47:33] PROBLEM - swift-container-server on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [22:48:00] RECOVERY - swift-account-server on ms-be2 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [22:48:00] PROBLEM - swift-object-server on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [22:48:00] PROBLEM - swift-object-updater on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [22:48:09] PROBLEM - swift-container-updater on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [22:48:18] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [22:48:18] RECOVERY - swift-object-auditor on ms-be2 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [22:48:18] RECOVERY - swift-account-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [22:48:36] RECOVERY - swift-container-replicator on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [22:48:36] RECOVERY - swift-account-reaper on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [22:48:36] RECOVERY - swift-object-replicator on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [22:48:54] RECOVERY - swift-container-server on ms-be2 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [22:48:54] RECOVERY - swift-account-replicator on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [22:49:21] RECOVERY - swift-object-server on ms-be2 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [22:49:21] RECOVERY - swift-object-updater on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [22:49:30] RECOVERY - swift-container-updater on ms-be2 is OK: 
PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [22:49:39] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 0.018 seconds [22:50:24] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours [22:50:24] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours [22:50:48] maplebed: in amongst the nagios spam, bugzilla/kaulen just went down again [22:51:19] chrismcmahon: I didn't know it went down before> [22:51:21] :D [22:51:26] heh [22:51:44] (04:45:45 PM) nagios-wm: PROBLEM - SSH on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:51:59] whee!!!! [22:51:59] http://ganglia.wikimedia.org/latest/?c=Miscellaneous%20pmtpa&h=kaulen.wikimedia.org&m=cpu_report&r=hour&s=descending&hc=4&mc=2 [22:52:12] someone has a runaway process! [22:52:24] 3rd time today [22:52:40] I think it was RobH handled it before? [22:52:43] your only choices are to wait for it to recover or pull the plug. [22:53:03] pulling the plug, of course, brings along with it all the wonderful possibilities of data corruption. [22:53:11] we've power cycled kaulen twice today I think [22:53:22] rough. [22:53:49] bad news late on Memorial Day Friday :( [22:53:52] this looks like the fourth time it's happened in the last 14 hours. [22:54:09] PROBLEM - Apache HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:54:17] I'll cycle it again, I suppose. [22:54:28] one of these times it won't come back and folk'll be pissed... [22:54:52] yeah, it's a matter of a few hours between cycles. it would suck to have bugzilla down for 3 days [22:56:24] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours [22:56:24] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours [22:56:24] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours [22:57:09] ok, power cycled. [22:57:19] give it a few minutes to boot. [22:57:45] !log powercycled kaulen on the mgmt interface [22:57:49] Logged the message, Master [22:58:30] ping is back [22:59:06] RECOVERY - SSH on kaulen is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [23:00:00] RECOVERY - Apache HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 0.006 seconds [23:05:06] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [23:05:39] I am SO ready to ditch Comcast. maplebed if you replied I may have missed it. [23:06:20] chrismcmahon: set up a relay? [23:06:44] anyway, I powercycled kaulen and it should be back now. [23:07:28] maplebed: my worry is that kaulen will die again in 4 or 5 hours [23:07:48] reasonable worry. [23:08:04] put in a cronjob that will kill anything that takes too much memory? [23:08:24] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [23:08:25] but I don't know how to identify the runaway process there [23:09:03] according to ganglia there's about a 10minute window to catch it before things go boom [23:09:29] I may be able to set up a nagios alert to fire when memory exceeds 1G or something, [23:09:54] after which there'll be about 3 minutes to kill the process (assuming 5 minutes to catch it and 2 minutes to respond) [23:10:07] maplebed: that'd be awesome. whatever it is, it's fairly new [23:10:15] do you know who (besides rotos) have shell on kaulen? 
[23:10:26] no idea. pretty sure I don't [23:13:39] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [23:31:05] New patchset: Bhartshorne; "adding check for memory usage and retabbing the file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8985 [23:31:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8985 [23:32:00] anybody want to review ^^^? [23:32:21] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8985 [23:32:23] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8985 [23:35:23] thanks maplebed [23:39:47] chrismcmahon: I've put in what I think will be an alert at 2.5G used on kaulen. we'll see if it works. [23:41:54] maplebed: good stuff, thanks [23:42:44] * chrismcmahon hopes that will keep bugzilla up for the weekend [23:48:28] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [23:50:34] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours