[00:00:48] lol
[00:01:06] if you can't fix the problem, just forget it ever happened ;)
[00:01:08] PROBLEM - NTP on search20 is CRITICAL: NTP CRITICAL: No response from NTP server
[00:01:18] can you merge that then Reedy and I'll pull
[00:01:28] I will just try not to make a mistake next time ;)
[00:01:30] I will, just need to tidy up something else first
[00:03:17] ok
[00:04:59] Ryan_Lane: how do I view the users in an ldap group on gerrit?
[00:05:09] you can't on gerrit
[00:05:19] bleh
[00:05:23] seemingly preilly isn't in WMF
[00:05:26] could you fix htat?
[00:05:59] Reedy: on any labs instance you can use ldaplist -l group
[00:06:38] The group list for wmf is weird..
[00:09:35] Ryan_Lane: http://p.defau.lt/?lyX0Np1knlmP3ObaJiyuYQ
[00:10:52] do you need to add a group first?
[00:10:56] so group wmf
[00:11:33] What?
[00:11:36] It already exists
[00:12:32] does the ordering matter? so modify-ldap-group wmf --addmembers=preilly
[00:14:17] https://labsconsole.wikimedia.org/wiki/Help:Access#Giving_users_Labs_access.2C_if_they_already_have_an_SVN_account
[00:14:23] and no, that wouldn't make sense
[00:21:22] New patchset: Ryan Lane; "Fixes for device detection" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7142
[00:21:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7142
[00:22:11] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7142
[00:22:13] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7142
[00:22:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:24:08] Ryan_Lane: can you run "modify-ldap-group --addmembers=preilly wmf" for me?
[00:28:35] Reedy: https://bugzilla.wikimedia.org/show_bug.cgi?id=33753 did you figure out which config bits you need?
[00:28:51] no
[00:31:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.813 seconds
[00:46:21] Reedy: yeah
[00:46:21] sec
[00:46:41] Reedy: you need to run that via sudo
[00:46:43] I ran it, though
[00:46:49] Ah
[00:46:56] just literally sudo foobar...?
[00:47:14] https://labsconsole.wikimedia.org/wiki/Help:Access#Giving_users_Labs_access.2C_if_they_already_have_an_SVN_account
[00:47:20] yep
[00:47:22] docs said nothing ;)
[00:47:34] add-labs-user doesn't need sudo, does it?
[00:47:41] umm
[00:47:45] I think all of them do
[00:47:50] hmm
[00:47:54] I thought I'd run that before
[00:48:14] preilly: can you do git push from /home/wikipedia/common now?
[00:48:23] (i haven't submitted your commit as it was complaining at me)
[00:49:11] Reedy: To ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config.git
[00:49:12] ! [remote rejected] master -> master (can not update the reference as a fast forward)
[00:49:13] error: failed to push some refs to 'ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config.git'
[00:50:10] You probably don't have the right to push merge commits then
[00:50:15] Run git rebsae origin/master and try again
[00:50:18] *rebase
[00:50:22] It's not a merge commit
[00:50:39] If he pulled after committing the pull might have created a merge commit locally
[00:50:51] Computers suck
[00:51:24] Reedy, RoanKattouw: git rebase origin/master
[00:51:25] Current branch master is up to date.
[00:52:00] Reedy, RoanKattouw: still ! [remote rejected] master -> master (can not update the reference as a fast forward)
[00:52:31] I get ! [remote rejected] master -> master (invalid committer)
[00:53:00] if (ctl.canUpdate()) {
[00:54:04] RefControl ctl = projectControl.controlForRef(cmd.getRefName());
[00:54:05] if (ctl.canUpdate()) {
[00:54:09] to be more complete
[00:54:14] preilly: Could you pastebin your git log --graph for me please?
[00:54:21] so it's probably permissions related right?
[00:54:50] I just attempted to fix the permissions
[00:54:55] Could you try again?
[00:55:07] Though really the typical perms error is 'prohibited by Gerrit'
[00:55:14] RoanKattouw: http://pastebin.mozilla.org/1628983
[00:55:38] RoanKattouw: now it's ! [remote rejected] master -> master (invalid committer)
[00:55:50] hmm
[00:55:59] Is the email you used for hte commit the one that's set on your gerrit account?
[00:57:14] Ah
[00:57:17] You used your full name
[00:57:24] in gerrit your display name is preilly
[00:57:44] Reedy: it's fixed To ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config.git
[00:57:45] d4c704f..1604a07 master -> master
[00:57:53] yay
[00:58:26] Lol
[00:58:34] I can direct push to that repo, but I can't review
[00:58:34] Code Review:
[00:58:35] +1 Looks good to me, but someone else must approve
[00:58:35] 0 No score
[00:58:35] -1 There's a problem with this change, please improve
[00:58:49] Reedy, RoanKattouw: thanks, for the help
[00:59:44] RoanKattouw: do you have rights to update https://gerrit.wikimedia.org/r/#/admin/projects/operations/mediawiki-config,access to give wmf and wmf-deployment?
[01:00:33] else Ryan_Lane ^
[01:00:52] Sure
[01:01:26] done
[01:03:39] Nothing has changed..
[01:04:01] beat me to it
[01:04:11] RoanKattouw: it needs to be added to submit
[01:04:15] and verified
[01:04:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:04:49] I thought we didn't want wmf to have this
[01:05:00] why does wmf need it? shouldn't it only be wmf-deployment?
[01:05:11] and why does push have anything specified?
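Roan's question at 00:55:59 reflects the usual reading of Gerrit's `invalid committer` rejection: the committer identity recorded in the pushed commit does not match the account doing the push. Below is a minimal sketch of that kind of check. It assumes, as the question implies, that the commit's committer email is compared against the account's registered addresses, and that some accounts may hold an override to forge committers. `Account`, `committer_allowed` and the example.org addresses are hypothetical and are not Gerrit's API. Locally, `git log -1 --format='%cn <%ce>'` shows the committer name and email a commit actually carries.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical stand-in for a code-review account record."""
    username: str
    emails: set[str] = field(default_factory=set)
    can_forge_committer: bool = False  # assumed override permission


def committer_allowed(account: Account, committer_email: str) -> bool:
    """Simplified model: accept the push only if the committer email is
    registered on the pushing account, or the account may forge committers."""
    if account.can_forge_committer:
        return True
    return committer_email in account.emails


# Hypothetical addresses, for illustration only.
acct = Account("preilly", {"preilly@example.org"})
print(committer_allowed(acct, "preilly@example.org"))
print(committer_allowed(acct, "someone@example.org"))
```

The sketch models only the email comparison Roan asks about. The log does not show how the name mismatch mentioned at 00:57 factors into Gerrit's decision.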
[01:05:17] by default everyone can push to everything
[01:05:22] lol
[01:05:30] * Reedy shrugs
[01:05:40] ok
[01:05:42] changed
[01:05:43] look now
[01:05:59] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7133
[01:06:00] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7133
[01:06:01] changed again
[01:06:01] Yup, thanks
[01:07:28] Krinkle-away: doesn't gitweb already handle this?
[01:07:36] or does this just link to gitweb?
[01:07:56] the mwSnapshots
[01:08:01] Ryan_Lane: snapshots ?
[01:08:04] Ryan_Lane: It doesn't use gitweb
[01:08:07] I know gitweb has it
[01:08:23] why not just link to that?
[01:08:24] I thought I mentioned in the mail why I didn't use it
[01:08:28] oh
[01:08:31] the problem is that those are generated on the fly and often time out
[01:08:33] very alos
[01:08:39] very slow, not cached
[01:08:42] ah
[01:08:43] right
[01:08:57] Need to get the checkouts for ED updated to have git in it
[01:08:57] these are hourly generated for the HEAD of each public branch
[01:09:03] ED ?
[01:09:09] extension distributor
[01:09:11] extension distributor
[01:09:13] Right
[01:09:15] fuck ED
[01:09:18] I hate ED
[01:09:27] I wonder what ED looks like now
[01:09:34] apparently git clone is too hard for people
[01:10:15] Reedy: For the regular john doe with one-click XAMP install or web host with FTP server, using the command line at all is too hard
[01:10:23] let a lone installing Git for Windows and what not
[01:10:43] Reedy: how do I abandon my changes?
[01:10:52] I agree thought, that those people likely won't need nighties though, and we provide .tar for releases
[01:10:53] it's giving me the stash or commit mesage again
[01:11:09] git checkout changedfile
[01:11:10] Thehelpfulone: On the gerrit change page, "Reject patch set #" or "Abandon change"
[01:11:21] I mean within git Krinkle on my computer
[01:11:39] Thehelpfulone: are you sure you mean "abandon" ?
[01:11:46] Or do you mean something else
[01:11:52] Oh, reset uncommitted changes ?
[01:12:04] I mean overwrite my changed file with the one on the git repo
[01:12:11] yeah, git checkout does that :)
[01:12:25] which checks out that file from the last committed version of that file
[01:12:43] now I've got merge conflict message
[01:13:00] I did git checkout wmf-config/InitialiseSettings.php
[01:13:05] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.019 seconds
[01:13:09] and then which command ?
[01:13:15] git pull
[01:13:30] Thehelpfulone: have you already made changes ?
[01:13:34] check git status
[01:13:37] I did make changes yeah
[01:13:41] git pull origin master --rebase
[01:13:44] but I want to discard my changes
[01:13:47] oh
[01:14:03] Thehelpfulone: To undo a committed change, "git checkout" doesn't' do that
[01:14:06] Thehelpfulone: If you want to completely discard your changes and reset your working tree to the master state, run git reset --hard origin/master
[01:14:12] ^
[01:14:14] ok
[01:14:33] what would I do if I just wanted to do it for one file?
[01:14:52] Thehelpfulone: Also to make this easier nex time, make it a habbit to never edit in 'master' but make a local branch
[01:14:55] I've done what you said and it worked Roan
[01:15:05] how do I do that? :)
[01:15:18] git branch?
[01:15:21] git checkout -b myfancynewfeature master
[01:15:24] so whenever you start working, checkout master, git reset --hard origin/master, git pull origin master; then git checkout -b my-fix-foobar
[01:15:36] That will create a new branch called 'myfancynewfeature' based off 'master', and switch to it
[01:15:41] yep
[01:16:02] If you then get a clash with the origin, you can simple pull origin master and then rebase locally, without getting merge conflicts
[01:16:57] ok
[01:17:07] I'll have a go with that on my next shell request :)
[01:17:39] git is actually quite good! it should help reduce these shell backlogs when devs just have to review it
[01:18:06] a bug like this https://bugzilla.wikimedia.org/show_bug.cgi?id=22939 to run a script
[01:18:12] The easy ones usually get done
[01:18:21] I guess mortals can't do that?
[01:18:37] anyone with shell access can
[01:18:47] We actually need to triage and make a list of all these scripts that need running
[01:19:11] screen foreachwiki scriptToRun.php
[01:19:12] wait
[01:19:27] hmm
[01:19:28] I can't run that anyways ;)
[01:19:31] seems cleanupimages is quick
[01:19:42] * Reedy leaves it running
[01:19:44] https://bugzilla.wikimedia.org/show_bug.cgi?id=29062 - did Priyanka put that on a cronjob?
[01:20:07] Probably not
[01:20:12] see comment 7 https://bugzilla.wikimedia.org/show_bug.cgi?id=29062#c7
[01:20:16] You can do that in the ops/puppet repo
[01:20:35] hell
[01:20:40] cleanupImages is quick on commons
[01:20:54] 2000+ images a second
[01:20:58] you have to run that for each wiki? all 800 and something on them?
[01:21:13] there's no nice global way of doing it?
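The local-branch habit Krinkle and Roan describe at 01:15, plus the single-file discard Thehelpfulone asked about at 01:14, can be written down as a fixed sequence of git commands. Below is a minimal sketch that builds those commands as argument lists and only executes them when `run` is called. The helper names are hypothetical; `origin` and `master` are the remote and branch names used in the channel. Be aware that `git reset --hard` throws away uncommitted edits, and also any local commits on the base branch that are not on the remote.

```python
import subprocess


def discard_file_commands(path: str) -> list[list[str]]:
    # Restore one file from the last commit (HEAD), dropping staged and
    # unstaged edits to it. "--" stops git reading the path as a branch name.
    return [["git", "checkout", "HEAD", "--", path]]


def fresh_branch_commands(branch: str, base: str = "master",
                          remote: str = "origin") -> list[list[str]]:
    # Reset the local base branch to the remote state, update it,
    # then create a topic branch off it and switch to that branch.
    return [
        ["git", "checkout", base],
        ["git", "reset", "--hard", f"{remote}/{base}"],
        ["git", "pull", remote, base],
        ["git", "checkout", "-b", branch, base],
    ]


def run(commands: list[list[str]], cwd: str = ".") -> None:
    # Execute in order; check=True raises CalledProcessError on the first failure.
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


for argv in fresh_branch_commands("my-fix-foobar"):
    print(" ".join(argv))
```

Keeping the steps as data makes them easy to print and review before running them against a checkout, which matters given how destructive the reset step is.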
[01:21:13] no, we have a script that runs it on each wiki in turn
[01:21:19] ah :)
[01:21:28] so literally screen foreachwiki cleanupImages.php
[01:21:32] and wait
[01:23:45] running cleanupTitles also now
[01:41:26] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 236 seconds
[01:44:17] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds
[01:45:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:52:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.685 seconds
[02:02:44] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[02:26:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.737 seconds
[03:18:47] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[03:24:22] New patchset: Hashar; "Enable show update markers on dewiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7152
[03:25:01] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7152
[03:25:05] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7152
[03:34:20] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[05:08:47] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[05:32:47] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[05:52:17] PROBLEM - Host knsq18 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:17] PROBLEM - Host hooft is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:17] PROBLEM - Host knsq20 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:17] PROBLEM - Host knsq21 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:17] PROBLEM - Host knsq23 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:17] PROBLEM - Host knsq16 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:18] PROBLEM - Host knsq17 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:18] PROBLEM - Host knsq22 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:19] PROBLEM - Host knsq24 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:19] PROBLEM - Host knsq19 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:20] PROBLEM - Host knsq26 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:26] PROBLEM - Host knsq28 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:26] PROBLEM - Host knsq27 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:35] PROBLEM - Host maerlant is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:35] PROBLEM - Host mediawiki-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:44] PROBLEM - Host ms6 is DOWN: PING CRITICAL - Packet loss = 100%
[05:52:53] PROBLEM - BGP status on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, sessions up: 21, down: 4, shutdown: 1 | Peering with AS1257 not established - The + flag cannot be used with the sub-query features described below. | Peering with AS24115 not established - ASN-EQIX-MLPE | Peering with AS3292 not established - TDC | Peering with AS4565 not established - The + flag cannot be used with the sub-query features described below.
[05:53:02] RECOVERY - Host knsq20 is UP: PING OK - Packet loss = 0%, RTA = 118.42 ms
[05:53:02] RECOVERY - Host knsq22 is UP: PING OK - Packet loss = 0%, RTA = 118.15 ms
[05:53:02] RECOVERY - Host ms6 is UP: PING OK - Packet loss = 0%, RTA = 119.78 ms
[05:53:02] RECOVERY - Host hooft is UP: PING OK - Packet loss = 0%, RTA = 118.41 ms
[05:53:02] RECOVERY - Host knsq18 is UP: PING OK - Packet loss = 0%, RTA = 118.43 ms
[05:53:02] RECOVERY - Host knsq19 is UP: PING OK - Packet loss = 0%, RTA = 119.27 ms
[05:53:03] RECOVERY - Host knsq28 is UP: PING OK - Packet loss = 0%, RTA = 119.79 ms
[05:53:11] RECOVERY - Host knsq26 is UP: PING OK - Packet loss = 0%, RTA = 118.51 ms
[05:53:11] RECOVERY - Host mediawiki-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 119.37 ms
[05:53:12] RECOVERY - Host knsq17 is UP: PING OK - Packet loss = 0%, RTA = 119.26 ms
[05:53:12] RECOVERY - Host knsq16 is UP: PING OK - Packet loss = 0%, RTA = 118.31 ms
[05:53:12] RECOVERY - Host knsq23 is UP: PING OK - Packet loss = 0%, RTA = 118.47 ms
[05:53:29] RECOVERY - Host knsq27 is UP: PING OK - Packet loss = 0%, RTA = 118.76 ms
[05:53:39] RECOVERY - Host knsq21 is UP: PING OK - Packet loss = 0%, RTA = 120.12 ms
[05:53:39] RECOVERY - Host knsq24 is UP: PING OK - Packet loss = 0%, RTA = 118.97 ms
[05:54:41] RECOVERY - Host maerlant is UP: PING OK - Packet loss = 0%, RTA = 120.29 ms
[06:04:16] hmmm
[06:05:59] my thoughts exactly
[06:06:08] cr2-eqiad didn't recover?
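Going back to the question at 01:21 about running a script on all 800-odd wikis: the answer is a wrapper that runs one maintenance script against every wiki in turn, left running in `screen`. Below is a minimal sketch of that loop. It assumes the wiki names come from some list and leaves the per-wiki command abstract, because the log does not show how foreachwiki actually invokes the script. `for_each_wiki`, `fake_cleanup` and `brokenwiki` are hypothetical. Continuing past a failed wiki is a design choice of this sketch, not a documented property of the real script.

```python
from typing import Callable, Iterable


def for_each_wiki(wikis: Iterable[str],
                  run_one: Callable[[str], None]) -> dict[str, Exception]:
    """Run run_one(wiki) for each wiki sequentially.

    A failure on one wiki is recorded and the loop moves on, so a single
    broken wiki does not abort a run across hundreds of them.
    """
    failures: dict[str, Exception] = {}
    for wiki in wikis:
        try:
            run_one(wiki)
        except Exception as exc:  # record the error, continue with the next wiki
            failures[wiki] = exc
    return failures


def fake_cleanup(wiki: str) -> None:
    # Stand-in for running a maintenance script such as cleanupImages.php.
    if wiki == "brokenwiki":
        raise RuntimeError("script failed")
    print(f"cleaned {wiki}")


print(for_each_wiki(["enwiki", "brokenwiki", "commonswiki"], fake_cleanup))
```

Running wikis strictly one after another matches the "runs it on each wiki in turn" description and keeps the load on the databases predictable, at the cost of a long wall-clock time (hence `screen`).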
[06:12:41] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[06:35:29] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[06:59:53] PROBLEM - BGP status on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, sessions up: 15, down: 10, shutdown: 1 | Peering with AS7385 not established - The + flag cannot be used with the sub-query features described below. | Peering with AS2603 not established - NORDUNET | Peering with AS2711 not established - The + flag cannot be used with the sub-query features described below. | Peering with AS1257 not established - The + flag c
[08:20:22] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2600*
[08:26:04] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2325
[08:26:22] RECOVERY - BGP status on cr2-eqiad is OK: OK: host 208.80.154.197, sessions up: 8, down: 0, shutdown: 18
[08:30:16] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2563*
[08:38:40] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2388
[09:10:08] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours
[09:10:08] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours
[09:29:38] PROBLEM - Host knsq26 is DOWN: PING CRITICAL - Packet loss = 100%
[09:30:41] RECOVERY - Host knsq26 is UP: PING OK - Packet loss = 0%, RTA = 108.86 ms
[10:17:53] PROBLEM - Host knsq29 is DOWN: PING CRITICAL - Packet loss = 100%
[10:19:23] RECOVERY - Host knsq29 is UP: PING OK - Packet loss = 0%, RTA = 109.56 ms
[10:33:53] PROBLEM - Host knsq24 is DOWN: PING CRITICAL - Packet loss = 100%
[10:35:05] RECOVERY - Host knsq24 is UP: PING OK - Packet loss = 0%, RTA = 108.65 ms
[10:35:14] PROBLEM - NTP on knsq24 is CRITICAL: NTP CRITICAL: Offset unknown
[10:39:35] RECOVERY - NTP on knsq24 is OK: NTP OK: Offset -0.0367449522 secs
[10:49:29] PROBLEM - Host knsq16 is DOWN: PING CRITICAL - Packet loss = 100%
[10:50:50] RECOVERY - Host knsq16 is UP: PING OK - Packet loss = 0%, RTA = 108.83 ms
[11:58:09] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2588*
[12:04:00] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2325
[12:04:00] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[12:12:24] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2563*
[12:15:06] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2263
[12:23:39] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2588*
[12:29:21] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2350
[12:55:26] PROBLEM - Host knsq18 is DOWN: CRITICAL - Host Unreachable (91.198.174.28)
[12:56:29] RECOVERY - Host knsq18 is UP: PING OK - Packet loss = 0%, RTA = 108.79 ms
[13:06:02] New patchset: Dzahn; "add logrotate config for lighttpd on install-server (RT-2753)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7167
[13:06:19] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/7167
[13:07:32] New patchset: Dzahn; "add logrotate config for lighttpd on install-server (RT-2753)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7167
[13:07:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7167
[13:09:33] New review: Dzahn; "config file as on brewster, just changed "weekly" to "size 100M" and fixed tabs." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/7167
[13:20:56] PROBLEM - Backend Squid HTTP on knsq19 is CRITICAL: Connection refused
[13:21:23] PROBLEM - Frontend Squid HTTP on knsq19 is CRITICAL: Connection refused
[13:28:42] mutante: were you going to post about getting chapter mw's to upgrade?
[13:33:03] mutante: if you haven't yet, I will... Just let me know if you want me to
[13:47:12] hexmode: i'll do it. i was going to post on the chapter specific mailing lists, sounds good?
[13:47:44] mutante: yes, that sounds fabulous. Could you CC me?
[13:47:50] sure
[13:51:02] mutante: so that we can keep a record, I'll update http://etherpad.wikimedia.org/OsJfzQH80T with whatever emails you send (or pointers to them).
[13:52:08] hexmode: alright
[13:54:34] RECOVERY - Backend Squid HTTP on knsq19 is OK: HTTP OK HTTP/1.0 200 OK - 632 bytes in 0.222 seconds
[13:59:22] RECOVERY - Frontend Squid HTTP on knsq19 is OK: HTTP OK HTTP/1.0 200 OK - 653 bytes in 0.218 seconds
[14:07:00] New patchset: Ottomata; "site.pp - Adding Fabian on stat1.wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7070
[14:07:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7070
[14:25:28] New patchset: Jgreen; "adding logmover account to hume so we can store fundraising impression logs there while storage3 is dead" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7169
[14:25:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7169
[14:26:16] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7169
[14:26:18] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7169
[14:28:46] PROBLEM - Host knsq20 is DOWN: PING CRITICAL - Packet loss = 100%
[14:31:37] RECOVERY - Host knsq20 is UP: PING OK - Packet loss = 0%, RTA = 108.97 ms
[14:34:55] PROBLEM - Frontend Squid HTTP on knsq20 is CRITICAL: Connection refused
[14:35:13] PROBLEM - Backend Squid HTTP on knsq20 is CRITICAL: Connection refused
[14:38:52] RECOVERY - Backend Squid HTTP on knsq20 is OK: HTTP OK HTTP/1.0 200 OK - 632 bytes in 0.218 seconds
[14:40:58] RECOVERY - Frontend Squid HTTP on knsq20 is OK: HTTP OK HTTP/1.0 200 OK - 788 bytes in 0.219 seconds
[14:46:58] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2638*
[14:48:23] mutante: got the emails, updated the etherpad. Let me know if there is any other place you think I should work on. But targetting chapters is a great strategic move. ty
[14:52:45] look at 'im go! \o/
[14:56:43] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2600*
[14:58:13] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2138
[15:07:28] hiya mark, you around?
[15:08:55] wondering about installing precise on stat1
[15:10:22] PROBLEM - Backend Squid HTTP on knsq22 is CRITICAL: Connection refused
[15:11:38] yes
[15:11:43] PROBLEM - Frontend Squid HTTP on knsq22 is CRITICAL: Connection refused
[15:11:43] RECOVERY - Backend Squid HTTP on knsq22 is OK: HTTP OK HTTP/1.0 200 OK - 632 bytes in 0.218 seconds
[15:14:25] RECOVERY - Frontend Squid HTTP on knsq22 is OK: HTTP OK HTTP/1.0 200 OK - 788 bytes in 0.219 seconds
[15:15:00] New patchset: Jgreen; "redo the way puppet manages logmover account which is used to move around fundraising proxy logs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7174
[15:15:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7174
[15:15:45] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7174
[15:15:47] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7174
[15:19:02] New patchset: Jgreen; "remove logmover from admins.pp, now installed by systemuser instead" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7176
[15:19:20] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/7176
[15:19:39] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7176
[15:23:16] PROBLEM - Host knsq21 is DOWN: PING CRITICAL - Packet loss = 100%
[15:23:38] New patchset: Jgreen; "remove logmover from admins.pp, now installed by systemuser instead fixed missing comma in site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7176
[15:23:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7176
[15:24:19] RECOVERY - Host knsq21 is UP: PING OK - Packet loss = 0%, RTA = 109.08 ms
[15:24:46] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7176
[15:24:49] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7176
[15:31:39] oop, mark, sorry, just saw you responded
[15:31:51] yeah, so, I heaarrrrd that precise is ready for prod machines
[15:32:00] and, if we are going to install on stat1, we should do sooner rather than later
[15:32:38] and to do that, we need a temporary places to store 206G
[15:32:43] til it is reinstalled
[15:32:50] possible?
[15:44:13] yes
[15:44:28] ideally we would use the netapps for that, but they're not in service at the moment :(
[15:45:39] is there an alternative?
[15:47:35] yeah
[15:47:40] just for during the reinstall, right?
[15:47:45] yeah
[15:47:59] just to hold it somewhere until we can copy it back after
[15:51:51] that shouldn't be hard
[15:57:45] ^demon, are you around?
[16:05:09] mark, if you let me know how to copy the stuff there, I can do start that now
[16:05:14] and then you guys can reinstall anything asap
[16:06:01] ottomata: how about we copy that data before the install, that's gonna be a whole lot easier ;)
[16:07:02] when is the install?
[16:07:09] we can schedule it
[16:07:12] mutante: where do you have the chapter wikis on wikistats?
[16:07:41] ok
[16:07:56] when can we schedule it for…or, how do I go about doing that?
[16:08:07] drdee, we need to make sure Erik Z and Andre E are ready
[16:08:10] and know that this is happening
[16:08:07] drdee, we need to make sure Erik Z and Andre E are ready [16:08:10] and know that this is happening [16:08:11] can do tomorrow or so [16:08:20] just put it in the ticket and get confirmation from us ;) [16:08:49] ok [16:08:56] will make sure this is ready with drdee, and then put in RT ticket [16:09:01] and will include info about saving /a [16:09:09] excellent [16:09:16] hexmode: "wmspecials" but that is exactly the table where version links are broken momentarily [16:09:38] thanks mark! [16:09:41] mutante: kk tyfti [16:14:34] LeslieCarr: so port 14/1 is tagged [16:14:49] and unless you changed it, cr1-eqiad is not using vlans, so expects it untagged... [16:16:04] yeah … now curious if i had changed it, looking through rollback versions ... [16:16:34] well just make 14/1 untagged I'd say, don't really wanna be switching vlans across that wave anyway ;) [16:16:40] ahha yeah rollback 8 [16:16:49] i had switched both to vlans [16:16:49] ok [16:18:03] and we're gonna replace this by a WDM setup anyway [16:18:28] let's talk about that after this is done :) [16:18:31] yep [16:18:38] mutante: wm.fr is running drupal!!!! [16:18:40] lame [16:19:22] yea, there are more :p [16:19:24] New patchset: Thehelpfulone; "adding flood group to itwikisource per bug 36600" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7180 [16:19:31] ok, have the deactivated config on cr2-pmtpa now [16:19:39] * mark checks [16:19:42] second time's a charm? [16:19:43] (deactivated interface config, that is) [16:20:30] hexmode: and wikimedia.org.ar requires login for version info [16:20:48] ok [16:20:50] * hexmode sighs [16:22:00] so, how about i deactivate ospf on cr1-eqiad and we try moving it ? [16:22:09] deactivate ospf?! [16:22:17] just that interface you mean? 
[16:22:24] just that one
[16:22:24] yes
[16:22:27] ah ok ;)
[16:22:28] sorry to give you a heart attack
[16:22:29] hehehe
[16:22:32] mutante: from page source on wm.o.ar: 
[16:22:44] yeah let's do that
[16:23:01] don't forget ospf3
[16:23:10] hexmode: ah, heh:) that's a bug "leaks version info when admin didnt want to" .hehe
[16:23:33] ah yeah, i now see some of the appeal of isis and not needing two protocols :)
[16:23:44] yes
[16:23:49] couldn't do isis because of foundry
[16:23:51] but that's now gone
[16:25:50] cool, 0 traffic --- cmjohnson1 are you ready to move a link ? xe-0/0/1 on cr1-sdtpa to eth14/1 on csw1-sdtpa ?
[16:25:59] same thing we were doing before, (but failed)
[16:26:09] sure
[16:26:26] let me know when u r ready
[16:26:47] that's another nice rule
[16:26:58] much better than andrew who would just run off and move links before I was finished talking
[16:27:18] (or actually worse, configuring)
[16:27:44] cmjohnson1: go for it :)
[16:27:47] k
[16:28:25] leslicarr: moved
[16:28:47] mark: so in fun juniper news, now junipers are expected to drop icmp packets from large pings on unused interfaces (blecch) and the IFD errors on cr1-sdtpa are a bug fixed in the latest 10.4 release --- however the 11 release looks stable in europe and i have not heard any screams so might be worth switching to that branch
[16:29:01] LeslieCarr: yeah I saw
[16:29:08] +1 on the 11R
[16:29:17] Hi Reedy :)
[16:29:20] gerrit question
[16:29:21] New patchset: Thehelpfulone; "adding flood group to itwikisource per bug 36600" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7180
[16:29:26] link is still down?
[16:29:31] if someone doesn't approve that, and another change is made to that file
[16:29:50] will git/gerrit still figure out how to merge the change?
[16:29:55] Usually
[16:29:59] unless there's a conflict
[16:30:03] in which case you've got to rebase it
[16:30:07] and then resubmit
[16:30:11] ok
[16:30:12] LeslieCarr: swap strands I think? :)
[16:30:15] and where is the interwiki table?
[16:30:23] I mean when we update it on meta and you copy it
[16:30:28] where does that go?
[16:30:38] It's not version controlled
[16:30:41] as there is no need to
[16:30:51] mark, i had it disabled on the port, it's up now
[16:30:55] ah ok
[16:30:56] yes indeed
[16:31:01] ping test time :)
[16:31:25] first activate that intf ;)
[16:32:38] ping rapid succeeding so far.. :)
[16:32:47] so how do I view the live copy of it Reedy to see if it matches meta?
[16:33:00] You can't really
[16:33:06] hexmode, you may be happier with the version of http://wiki.wikimedia.org.es/wiki/Portada
[16:33:07] Though, the CDB wouldn't be much use
[16:33:10] let's try ospf3 first
[16:33:10] See if the Iw link works on a wiki
[16:33:17] doesn't affect much if it breaks ;)
[16:34:38] Platonides: \o/
[16:35:00] I was trying to figure this one out https://bugzilla.wikimedia.org/show_bug.cgi?id=35557
[16:35:17] stuck in ExStart state eh
[16:35:35] yep .. :(
[16:36:06] let's disable interface-type p2p on both ends temporarily
[16:36:09] that sometimes fucks up things
[16:36:18] if it's not that, we'll look for the mtu error
[16:37:08] ah
[16:37:14] Protocol inet6, MTU: 9178
[16:37:25] Protocol inet6, MTU: 9170
[16:37:28] that's the tagging
[16:37:46] let's set to explicitly 9170 on both ends under family inet and inet6
[16:37:59] cool
[16:39:06] setting that explicitly on inet6 for now :)
[16:39:10] ok
[16:39:24] there you go
[16:39:30] ospf up
[16:39:36] oh look at that
[16:39:37] yay
[16:39:45] thanks
[16:39:55] yay for being able to reboot cr1-sdtpa
[16:40:03] cr2-pmtpa, not so much
[16:41:28] mark: why not cr-2-pmtpa?
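The OSPF3 session stuck in ExStart (16:35) turned out to involve the two ends reporting different inet6 MTUs, 9178 and 9170, a difference mark attributes to the VLAN tagging. OSPF neighbors exchange their interface MTU in Database Description (DD) packets. RFC 2328 section 10.6 says a DD packet advertising a larger IP datagram than the receiving interface can accept without fragmentation is rejected, and OSPFv3 (RFC 5340) keeps the same check. Below is a minimal sketch of just that rule, ignoring the rest of the neighbor state machine. The function names are hypothetical.

```python
def accepts_dd(local_mtu: int, advertised_mtu: int) -> bool:
    """Does a router with this interface MTU accept the neighbor's DD packet?

    Simplified from RFC 2328 section 10.6: reject the packet when the
    neighbor advertises a larger MTU than the local interface can take.
    """
    return advertised_mtu <= local_mtu


def dd_exchange_ok(mtu_a: int, mtu_b: int) -> bool:
    # Each side must accept the other's DD packets for the adjacency to progress.
    return accepts_dd(mtu_a, mtu_b) and accepts_dd(mtu_b, mtu_a)


for a, b in [(9178, 9170), (9170, 9170)]:
    print(a, b, "ok" if dd_exchange_ok(a, b) else "stuck")
```

With 9178 on one side and 9170 on the other, the 9170 side keeps rejecting its neighbor's DD packets. This is consistent with OSPF coming up at 16:39 once 9170 was set explicitly on inet6 on both ends.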
[16:41:39] because that's also switching traffic to row D (and C) [16:41:54] so when cr2-pmtpa is rebooting, those rows are unreachable [16:41:54] ah [16:42:02] whereis cr2-sdtpa doesn't switch anything [16:42:06] csw1-sdtpa does that [16:42:23] (but, likewise, we can't reboot csw1-sdtpa without it affecting lots of stuff) [16:42:26] Reedy: okay so I've done another bug request and it's telling me that I've got one outstanding commit, is this what I really want to do [16:42:37] at least there's usually much less need to reboot pure switches than to reboot routers [16:42:40] will saying "yes" overwrite my existing commit, or should everything be ok? [16:43:02] I inserted some more lines [16:43:07] http://lists.wikimedia.org/pipermail/wikitech-l/2012-April/060093.html [16:43:12] We should probably use -tech for this [16:52:47] so, WDM gear [16:56:53] New patchset: Thehelpfulone; "adding wikijunior, wikijunior_talk namespaces for frwikibooks, bug 35977" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7182 [16:57:09] New patchset: Jgreen; "adding user cmjohnson to base host build" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7183 [16:57:28] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7183 [16:57:39] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7183 [16:57:42] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7183 [16:59:00] LeslieCarr: https://gerrit.wikimedia.org/r/#/c/7181/ [17:03:18] mark: are you still around ? can i test a few sms's on you ? [17:04:22] apergos: are you around ? [17:04:23] cmjohnson1: are you in da DC today? [17:04:44] Im here and wondering if I'm getting any wm email via imap [17:04:50] and thinking maybe not [17:04:56] apergos: you're free! [17:05:02] seems to be working fine for me.. [17:05:11] imap? or google? 
[17:05:13] cause I don't have google [17:05:32] apergos: can i send you a few sms's to test the new system ? [17:05:33] imap [17:05:43] sure, send away [17:06:22] from imap.wm.org Reedy ? [17:06:37] Yeah [17:06:42] apergos: let me know when you get this one :) [17:06:50] now LeslieCarr [17:06:54] 1 from jeff 8 minutes ago [17:06:56] that was fast :) [17:07:02] it fell fast :-D [17:07:11] I have nothing after 10:30 am [17:07:23] I didn't pay it any mind, I was working but now it's much later [17:07:29] no good :-( [17:08:15] Bah [17:08:27] preilly: your umask is wrong [17:08:32] yes it is! [17:08:36] run for your lives! [17:08:39] -r--r--r-- 1 preilly wikidev 180 2012-05-09 23:58 .git/objects/06/78afdf06fddcc19eebf672c1a91f4aa3c5334e [17:08:40] ( LeslieCarr ) [17:08:42] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7181 [17:08:54] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7181 [17:08:55] yay :) [17:08:56] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7181 [17:09:20] LeslieCarr: just one quick question: does different VLAN mean different DNS zone? or will analytics just be .eqiad.wmnet in any case [17:09:30] it will be the same dns zone mutante [17:09:45] so i could already do the DHCP change..ok thanks [17:09:50] yep [17:10:05] kthxbye, bbl:) [17:11:14] it claims my mailbox is full [17:11:16] wtfff [17:11:30] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [17:11:43] not sending me mail about it, just in the log. [17:11:52] thanks sanger. thanks a whole lot [17:12:02] I am going to fix that right now [17:12:06] (the quota) [17:13:18] who did the new cmjohnson1 change in puppet ? 
[17:14:08] sheesh I never got more than the basic [17:15:15] fixed that sucker [17:17:57] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [17:18:29] Please can somone fix the permissions on /home/wikipedia/common/.git for me? chmod -R g+ws * and chgrp -R wikidev * [17:19:27] Reedy 'll do this [17:19:35] thanks [17:20:26] that should do it [17:20:45] check please [17:20:59] yup, thanks [17:21:10] sweet [17:21:40] Just need to not add the ppts in docroot to git :D [17:21:40] oh yay I see delivery [17:21:44] (of mail) [17:24:56] New patchset: Jgreen; "sigh, Could not find dependency Group[500] for User[cmjohnson] on mobile cache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7185 [17:25:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7185 [17:26:29] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7185 [17:26:31] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7185 [17:30:15] preilly: completed [17:33:39] mutante: fyi: analytics1001.mgmt is fixed [17:39:15] !log pushing out new zone files. only minor changes [17:39:18] Logged the message, notpeter [17:41:48] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [17:43:04] cmjohnson1: would you be able to set up the mgmt interfaces on search21-36 today? 
(or even just one for testing purposes) [17:43:31] notpeter: awesome timing...i am just getting the mgmt ip together to set up drac [17:43:51] spent the morning installing the drives [17:44:17] they will all be ready for a network ticket today [17:44:40] New patchset: Thehelpfulone; "adding wikijunior, wikijunior_talk namespaces for frwikibooks, bug 35977" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7182 [17:47:32] cmjohnson1: awesome! [17:47:35] thank you! [17:48:29] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [17:55:56] New patchset: Lcarr; "Cleanup directory structure and release 1.1 binary" [operations/software] (master) - https://gerrit.wikimedia.org/r/7188 [18:03:16] New review: pugmajere; "'tis fine as it stands, but this feels like a hack and it'd probably be worth adding proper support ..." [operations/software] (master) C: 1; - https://gerrit.wikimedia.org/r/7188 [18:03:52] !log starting innobackupex from db10 to blondel [18:03:55] Logged the message, notpeter [18:04:03] New patchset: Lcarr; "Cleanup directory structure and release 1.1 binary" [operations/software] (master) - https://gerrit.wikimedia.org/r/7188 [18:06:08] btw, opsen, the above will lock some tables on db10 while they're being dumped, so there might be some laggage [18:08:06] New patchset: Jgreen; "created new group and class for dctech, moved cmjohnson from 'restricted' to 'dctech'" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7191 [18:08:25] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7191 [18:08:47] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7191 [18:08:49] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7191 [18:10:09] Change abandoned: Lcarr; "abandon patchset 2" [operations/software] (master) - https://gerrit.wikimedia.org/r/7188 [18:10:57] New patchset: Jgreen; "fixed typo in admins.pp, added admins::dctech to ocg* hosts for testing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7192 [18:11:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7192 [18:11:18] Change restored: Lcarr; "(no reason)" [operations/software] (master) - https://gerrit.wikimedia.org/r/7188 [18:11:23] New patchset: Lcarr; "Cleanup directory structure and release 1.1 binary" [operations/software] (master) - https://gerrit.wikimedia.org/r/7188 [18:13:48] New review: Lcarr; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7188 [18:13:50] Change merged: Lcarr; [operations/software] (master) - https://gerrit.wikimedia.org/r/7188 [18:14:08] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7192 [18:14:10] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7192 [18:26:21] New patchset: preilly; "Partner IP Live testing - Thursday, May 10th, 10am - 12pm PDT" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7201 [18:26:39] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7201 [18:31:27] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7201 [18:31:29] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7201 [18:32:44] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2575* [18:33:53] !log reloaded and purged cache of mobile varnish [18:33:56] Logged the message, Mistress of the network gear. [18:38:35] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2613* [18:39:59] preilly: 100, 80, 75, 68, 59, 47 [18:40:56] That's numberwang? [18:41:35] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2588* [18:43:06] !log restarting mobile varnish [18:43:09] Logged the message, Mistress of the network gear. [18:43:17] Reedy: yes! [18:43:18] scary [18:48:38] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2588* [18:51:05] LeslieCarr: did anything touch vcl_recv_purge? 
[18:52:02] i don't know [18:52:49] LeslieCarr: okay [18:58:03] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2575* [19:10:39] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours [19:10:39] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours [19:13:30] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2563* [19:19:23] New patchset: Jgreen; "fighting with login gid assignment" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7206 [19:19:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7206 [19:19:56] New patchset: Pyoungmeister; "marking db29 for precise test install" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7207 [19:20:15] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7207 [19:20:40] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7206 [19:20:43] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7206 [19:20:58] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7207 [19:21:00] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7207 [19:27:09] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [19:27:27] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2575* [19:31:39] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2563* [19:34:30] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2600* [19:35:33] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [19:37:12] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2588* [19:39:05] cmjohnson1: you there? [19:40:51] New patchset: saper; "adding flood group to itwikisource per bug 36600" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7180 [19:41:07] New patchset: Jgreen; "still fighting with account creation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7208 [19:41:26] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7208 [19:41:42] notpeter: yes [19:42:11] just a note on the mgmt IPs for the search boxes [19:42:21] they were in dns slightly incorrectly, and I changed them this morning [19:42:37] oh..okay...what are you changes...cause I just finished with them [19:42:42] ugh [19:42:46] damnit, I'm really sorry [19:42:55] np...ez fix [19:43:12] basically, the last octet needs to be incremented by 1 on each of them [19:43:41] as it stands, both searchidx2 and search21 are trying to use the same IP [19:44:02] I think that rob didn't notice searchidx2's entry when he did the dns change [19:44:11] I can totally see how it would have happened [19:44:24] ok...let me fix them now [19:44:30] thanks! [19:44:35] and sorry for not saying something sooner [19:44:40] it didn't occur to me [19:45:23] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7208 [19:45:25] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7208 [19:45:36] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2375 [19:46:27] notpeter: no problem...plz update dns [19:47:15] grrr. i so want to stab puppet. [19:50:33] cmjohnson1: dns is updated [19:51:24] !log rebooting db29 for do a test install of precise [19:51:27] Logged the message, notpeter [19:51:40] hey, puppet Q, i'm googling but not finding the answer [19:51:43] wanna double check [19:52:02] with something like this: [19:52:02] class generic::geoip { [19:52:02] class packages { [19:52:02] … [19:52:02] class files { [19:52:03] ... [19:52:09] include generic::geoip [19:52:27] will include both generic::geoip::packages and generic::geoip::files [19:52:27] ? [19:52:30] is that correct? [19:52:30] nope [19:52:36] i believe it won't [19:52:50] would it do anything? 
[19:52:55] since there isn't really a class called generic::geoip ? [19:52:59] it would just define the two classes [19:53:01] it is more of a namespace in this case? [19:53:05] define them totally separate [19:53:06] yeah [19:53:07] include generic::geoip [19:53:16] would do nothing then? [19:53:19] class generic::geoip::packages { [19:53:28] class generic::geoip::files { [19:53:39] right, i'd have to include each of those classes individually [19:53:40] right? [19:53:50] I believe so [19:53:54] you do with either approach [19:53:54] or have a class that includes both of them [19:53:58] right [19:54:03] back up a sec [19:54:12] makes sense, someone else did a include generic::geoip, and I was wondering why I don't see what I should [19:54:13] thanks! [19:54:20] why create the separate nested classes at all? [19:54:32] cuz nested classes rule! [19:54:34] or something [19:54:38] omg they're awful [19:54:45] PROBLEM - Host db29 is DOWN: PING CRITICAL - Packet loss = 100% [19:54:48] stab those too! :-0 [19:55:08] i'm going blind trying to figure out why admin.pp is broken, and the nested classes enrage me [19:55:28] i dunno, cause someone else did it? [19:56:06] blame says it was maybe Ryan_Lane? [19:56:11] yeah, been there too, hated it, and then I found the style guide which says not to do it [19:56:23] our conf is full of them, it's the standard [19:56:29] ah, cool, too bad it is legal syntax [19:56:38] ok cool, i'll change these, i migth even take these out of generic-definitions.pp [19:56:40] they aren't defines! [19:57:41] oh my goodness there are two classes that deal with this [19:58:53] New patchset: Jgreen; "testing theory that admins.pp is broken with scoping issues" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7212 [19:59:12] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7212 [19:59:23] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7212 [19:59:25] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7212 [20:00:19] RECOVERY - Host db29 is UP: PING OK - Packet loss = 0%, RTA = 1.33 ms [20:03:36] ah HA, it is le broken. [20:03:37] PROBLEM - MySQL disk space on db29 is CRITICAL: Connection refused by host [20:04:13] PROBLEM - SSH on db29 is CRITICAL: Connection refused [20:05:43] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [20:08:40] New patchset: Jgreen; "more testing of admins.pp b0rk" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7214 [20:08:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7214 [20:09:43] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7214 [20:09:46] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7214 [20:12:46] RECOVERY - SSH on db29 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [20:16:22] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2575* [20:17:16] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [20:19:04] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2563* [20:21:10] PROBLEM - NTP on db29 is CRITICAL: NTP CRITICAL: No response from NTP server [20:22:14] notpeter: search box mgmt complete and network ticket was created [20:23:16] PROBLEM - 
ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2563* [20:23:55] is that my queue ? :) [20:24:13] New patchset: Jgreen; "adding admins::dctech to all hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7230 [20:24:31] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7230 [20:24:31] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7230 [20:26:07] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2575* [20:27:03] the puppet lint check hamster fell off its wheel [20:28:07] lesliecarr: enter stage right [20:28:10] you're up! [20:28:14] :) [20:28:22] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100% [20:29:34] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [20:33:33] cmjohnson1: done! [20:34:03] awesome! notpeter ^ [20:34:16] he may be afk [20:37:22] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2575* [20:41:52] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2575* [20:44:34] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2613* [20:51:55] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2638* [20:54:21] New patchset: Ottomata; "Refactoring some geoip classes." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7232 [20:54:41] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7232 [20:57:39] would love a review on that when someone gets a sec [21:00:19] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2563* [21:00:45] he was supposedly on wikipedia-en and wikimedia-ops [21:00:56] ottomata: i'll take a look [21:01:12] thank you! [21:03:10] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2575* [21:06:18] while i'm at it, i need to install pygeoip on stat1 [21:06:25] i see this comment in generic-definitions.pp [21:06:37] # python pip and virtualenv. [21:06:37] # only use this on development systems. [21:06:37] # in order to go to production, all dependencies need to come from debian packages, not pip. [21:06:37] class generic::pythonpip [21:06:48] i can't seem to find a deb for pygeoip [21:07:20] hmm, i guess I could let fabian just install that with virtualenv [21:07:25] rather than puppetizing it [21:07:35] any opinions on that? [21:08:20] ottomata: i don't see the misc::geoip class ... [21:08:30] http://packages.debian.org/python-geoip ? [21:08:35] am i just missing it ? [21:08:38] otherwise just make your own deb? [21:09:18] oooooP! [21:09:23] git add [21:09:35] jeremyb: you are a better googler than i [21:09:48] no, i just know debian package naming conventions better [21:09:51] ___-------___ [21:09:51] _-~~ ~~-_ [21:09:52] _-~ /~-_ [21:09:54] /^\__/^\ /~ \ / \ [21:09:56] /| O|| O| / \_______________/ \ [21:09:59] | |___||__| / / \ \ [21:10:01] | \ / / \ \ [21:10:04] | (_______) /______/ \_________ \ [21:10:06] | / / \ / \ [21:10:09] \ \^\\ \ / \ / [21:10:12] \ || \______________/ _-_ //\__// [21:10:16] \ ||------_-~~-_ ------------- \ --/~ ~\ || __/ [21:10:18] ~-----||====/~ |==================| |/~~~~~ [21:10:21] (_(__/ ./ / \_\ \. 
[21:10:23] (_(___/ \_____)_) [21:10:31] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [21:10:31] New patchset: Ottomata; "Refactoring some geoip classes." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7232 [21:10:40] LeslieCarr: das should be better [21:10:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7232 [21:10:55] nagios-wm and blackman should get a room [21:12:30] LeslieCarr, hold off before clicking approve, i'm going to include that python package too [21:12:31] one sec [21:12:50] New patchset: Ottomata; "Refactoring some geoip classes." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7232 [21:13:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7232 [21:13:11] thar we go [21:17:19] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2388 [21:18:33] New review: Lcarr; "Yay! Thanks for fixing this up" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7232 [21:18:35] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7232 [21:22:03] thank you! awesome [21:22:45] should all the rev_sha1 fields be filled by now or is the migration still in process? [21:26:01] drdee2: I'd imagine it'll still be in process [21:26:02] Ask AaronSchulz [21:27:42] LeslieCarr: re: sms, did you also try the mail gateway? [21:27:52] yeah, emailed from spence [21:27:56] want me to try emailing you ? :) [21:28:04] sure [21:28:11] Reedy: which one should be finished? 
[21:28:39] I've no idea [21:28:52] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [21:28:55] I don't know if it's just being run all in order, one procsess per wiki [21:29:45] they must be running per wiki [21:30:01] sure, but what order [21:30:12] some of the dblist files [21:30:12] by cluster? by project? [21:30:41] enwiki and eswiki are not started/finished yet (I checked that) [21:30:55] I'd be suprised if enwiki was done ;D [21:37:25] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [21:43:28] AaronSchulz: I've been checking the unsharded de thumb container over the last two days and it hasn't gotten any new objects. I'm thinking it's time to delete all the objects it contains then delete the container itself. [21:43:32] any objections? [21:44:02] nope [21:44:15] sweet. [21:44:22] tvtropes.org/pmwiki/pmwiki.php/Main/NukeEm [21:46:29] so apparently now that we have SMS work [21:46:41] what's stopping us from putting it into production [21:51:00] New patchset: Pyoungmeister; "giving db29 generic puppet definition" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7237 [21:51:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7237 [21:51:32] notpeter: would you be able to add some "new" wikis so they get indexed for searching? [21:52:05] Looks like according to wikitech it needs root [21:52:18] yerp [21:52:39] Actually, can we get a list of currently indexed wikis? And then I can find the difference? 
[21:53:00] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7237 [21:53:01] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [21:53:03] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7237 [21:53:37] yeah, I'll put that on fenari for you. one sec [21:57:38] Reedy: fenari:/tmp/wikilist.for.reedy [22:01:32] notpeter: looks like there's quite a few that aren't [22:01:45] I'm guessing the private ones are purposely not indexed [22:04:34] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [22:04:56] !log swift: deleting the unsharded wikipedia-de thumb container contents (the sharded version is currently serving traffic) [22:05:00] Logged the message, Master [22:05:33] notpeter: I count 10 (non restricted) wikis not indexed [22:06:38] http://p.defau.lt/?qJHnByDrOTfP3VimAHSmOw [22:10:20] ok, I can add those [22:10:27] Thanks [22:10:37] It takes 24-48 hours to fully take effect doesn't it? [22:11:37] hhhmmmm, I'm not fully sure, actually [22:11:45] but 48 hours seems like a generous upper bound [22:11:59] That's what I thought [22:12:09] I seemed to recall there is some cron that runs at sometime that does something [22:12:22] PROBLEM - udp2log log age for oxygen on oxygen is CRITICAL: CRITICAL: log files /a/squid/saudi-telecom.log, have not been written to in 24 hours [22:12:36] oh, yeah, that's true, the snapshotting cron will probably have to run before anything else will happen [22:14:10] Reedy: what's the url for the wikimania2013 wiki? [22:14:31] wikimania2013.wikimedia.org [22:14:55] I'm now curious [22:15:16] about? [22:16:33] Hm.. how do I know this link is working ? 
[22:16:34] http://ganglia.wikimedia.org/latest/?r=day&c=Apaches&h=srv189.pmtpa.wmnet [22:16:39] http://ganglia.wikimedia.org/latest/?r=day&c=Apaches&h=srv189.pmtpa.krinklenet [22:16:43] I see no difference [22:17:13] reedy how long it does take [22:17:24] what... group "Apaches" doesn't even exist [22:17:31] it was renamed to "Application servers pmtpa" [22:17:43] Yeah [22:17:47] It's always been called that AFAIK [22:17:52] ganglia has terrible error detection [22:17:54] aka, None [22:18:00] it even shows graphs [22:18:07] LOL [22:18:10] RoanKattouw: c=Apaches is all over wikitech [22:18:13] that's not the only fault ganglia has [22:18:21] RoanKattouw: linking to /small/ though, which is 404 error now [22:18:21] It shows graphs for srv189.pmpta.krinklenet [22:18:24] that's awesome [22:18:30] for anything you put there [22:19:02] RoanKattouw: did you know I run toolserver.org/~krinkle off a secret server at wmf ? [22:19:02] LP [22:19:02] anyone futzing with oxygen, or is it just being chatty with alerts? [22:19:03] http://ganglia.wikimedia.org/latest/?r=day&c=Toolserver&h=srv189.enschede.krinklenet [22:19:04] :P [22:19:15] Even http://ganglia.wikimedia.org/latest/?r=day&c=Apaches&h=foo.bar.baz works [22:19:27] c= is ignored to, even shows up in the breadcrumbs [22:20:07] wow, I even have my own cluster http://ganglia.wikimedia.org/latest/?c=toolserver.org:-krinkle&m=load_one&r=day&s=by%20name&hc=4&mc=2 [22:20:19] 6254 CPUs [22:26:15] LeslieCarr: I am unable do hit the mgmt cars for search34-36. would you be willing to use magic to let me know if these are plugged in currently? [22:26:25] *cards [22:26:53] do you have their mac addresses ? and actually maybe not… not sure if our mgmt switches or managed or not… checking [22:27:32] I don't think I do.... [22:27:57] I can also drop a ticket for chris [22:28:03] ok [22:28:12] yeah, magic only works when i have the mac addresses [22:28:25] ah, ok [22:28:29] cool! thanks! 
[22:28:54] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [22:39:14] Reedy: Do all apaches have fenari:/h/w mounted? [22:39:24] New patchset: Pyoungmeister; "adding dhcpd entries for search 21-36" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7243 [22:39:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7243 [22:40:22] no [22:41:30] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7243 [22:41:32] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7243 [22:43:34] Reedy: k, one last thing. fenari.wikimedia.org on port 80 reroutes to graphite.wikimedia.org, does that mean graphite.wikimedia.org runs on fenari? (like noc.wm.o) ? [22:44:04] Ye [22:44:05] s [22:44:08] k [22:44:08] They resolve to the same ip [22:44:50] graphite.wikimedia.org is an alias for noc.wikimedia.org. [22:44:52] noc.wikimedia.org is an alias for fenari.wikimedia.org. [22:44:53] fenari.wikimedia.org has address 208.80.152.165 [22:46:22] graphite runs off of fenari? [22:46:45] mmm [22:46:49] weren't we recently getting grief for doing deploy work off of fenari? [22:48:11] Reedy: those wikis should all be indexed... soon [22:48:26] ran the commands [22:48:50] Great [22:48:51] thanks [22:52:43] no, graphite does not run off fenari [22:53:12] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [22:53:13] that would be pretty awesome ;) [22:54:12] binasher: graphite would be a lot cooler if it did. 
just sayin' [22:54:19] graphite, gdash, and the actual profiling collector run on professor [22:54:27] we should run everything on fenari [22:54:29] well, professor's tweed jacket is pretty cool [22:54:31] It'll save a lot of money [22:54:50] a cloud of one. [23:06:42] notpeter: CloudStackOne ? [23:06:51] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [23:22:18] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [23:39:52] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [23:55:28] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor