[00:00:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17964 [00:01:06] * AaronSchulz waves TimStarling [00:05:02] hello [00:08:47] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6905 [00:11:10] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15458 [00:11:56] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15463 [00:13:20] New review: Ryan Lane; "Please ensure the file is actually gone. If the system is rebuilt, then we can merge this then." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/15597 [00:14:23] New review: Ryan Lane; "Hooray for cleanup!" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/15553 [00:14:24] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15553 [00:54:39] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17610 [01:04:00] !log a couple package upgrades on bast1001 [01:04:08] Logged the message, Master [01:41:20] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 212 seconds [01:42:59] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 279 seconds [01:49:44] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 682s [01:54:32] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 10 seconds [01:57:23] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 0 seconds [01:58:44] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 22s [03:11:30] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [03:25:27] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the 
last 10 hours [03:25:27] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [03:25:27] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours [04:02:30] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours [04:39:03] morning [04:50:38] paravoid: moin moin [04:50:48] :) [04:52:27] paravoid: you don't know any of the mirror team by chance? ftp.us.debian.org has a bad dual stack in rotation atm. i changed to cdn.debian.net which fixed it for me but it's still broke [04:52:47] * jeremyb got several people in a channel to try debian.gtisc.gatech.edu and all failed (not all with the same failure) [04:52:53] ping and HTTP both don't work [04:53:16] no one responds in #-mirrors [04:53:29] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours [04:57:03] sec [04:59:10] jeremyb: removed, wait 5mins for the TTL to expire [04:59:26] oh, hah. didn't expect you to do it yourself ;) [04:59:39] i already switched to cdn anyway ;) [05:01:14] didn't even have to wait that long (more than 5 mins since my last try?) [05:01:54] super cow powers. [05:01:57] :-) [05:02:35] * jeremyb imagines valessio making a super cow animation [05:03:00] paravoid: so what about notifying them that they are broken? [05:03:34] I asked symoon about it [05:03:51] we usually add/remove mirrors per mirroradm's requests [05:04:00] or mirror local admins [05:04:06] so I'm not sure of the process [05:04:19] huh. symoon is 13 days idle [05:04:26] anyway, thanks [05:04:35] I could just drop a mail but thought I should ask him before starting stepping on anyone's toes [05:05:05] oh, you asked but didn't get a response. i thought you were saying he already responded [05:05:51] yes. no. :P [05:06:31] :) [05:13:35] New review: Faidon; "Calling it videoscaler::files when it sets up a jobrunner is counter-intuitive. I'd propose moving e..." 
[operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/16654 [05:24:33] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [05:37:27] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [06:00:33] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [06:30:51] PROBLEM - SSH on labstore3 is CRITICAL: Server answer: [06:32:12] RECOVERY - SSH on labstore3 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [06:40:33] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours [06:55:33] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [07:10:32] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [07:25:32] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [07:27:29] RECOVERY - Puppet freshness on srv281 is OK: puppet ran at Wed Aug 8 07:27:19 UTC 2012 [07:44:42] New review: Nikerabbit; "Would be nice to mention which (parts of) commit this reverts." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17965 [07:46:24] New review: Nikerabbit; "Looks more like addition of new stuff than cleanup." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17966 [08:02:21] Hello. Pursuant instructions of some devs, I wish to notify that I've been requested to perform a bigdelete on enwiki on a page with +5000 revids (approximately 5,520 revisions). Do I have the ops blessing? [08:03:03] Page is: https://en.wikipedia.org/wiki/User:28bot/edit-tests-found/2012-July [08:09:32] I'm fine with it, anyone else want to weigh in? [08:10:59] apergos: do we know the exact number of revids the page has? Because vvv's tool says that less than 3,000 but the deletion warning says that well over 5,000 revids. 
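A side note on why the counts above can disagree: a tool that counts only live rows in the revision table can undercount relative to the deletion warning, which reflects the total, and the discussion later raises archived (already-deleted) revisions as the likely explanation. A toy illustration; the 5,000 limit and the helper are inferred from the warning in the log, not read from MediaWiki's actual configuration:

```python
DELETE_REVISIONS_LIMIT = 5000  # threshold implied by the deletion warning above

def needs_bigdelete(live_revs, archived_revs=0,
                    limit=DELETE_REVISIONS_LIMIT):
    """True when the total revision count exceeds the big-deletion
    threshold; tools that count only live revisions can undercount."""
    return live_revs + archived_revs > limit

print(needs_bigdelete(2999))         # False: the live count alone is under
print(needs_bigdelete(2999, 2521))   # True: archived rows push it past 5,000
```

With roughly 5,520 total revisions split between live and archived rows, a live-only counter and the warning can land on opposite sides of the limit.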
[08:11:13] lemme see [08:12:11] https://toolserver.org/~vvv/revcounter.php?wiki=enwiki_p&title=User%3A28bot%2Fedit-tests-found%2F2012-July [08:13:25] 4419 [08:14:15] well, I'll wait a bit to see if anybody objects then I'll delete [08:14:56] maybe there are some deleted revisions in the archive table [08:23:42] Performing deletion. [08:25:20] okey dokey [08:28:51] Done. [08:29:02] sweet [08:29:06] have a nice day :-) [08:29:22] Everythin is right? [08:29:32] *Everything [08:29:54] seems like it [08:30:03] until the hordes show up saying they can't edit :-D [08:30:55] we have a couple dbs that are lagged, the rest are fine [08:31:58] nothing that I can do to fix that, undeleting would increase the lag :) [08:32:12] well what I didn't do was check it before the delete [08:32:22] so it's hard to compare [08:33:11] I've seen the user proposing that bigdelete be assigned locally /me screams [08:34:39] anyways it's dropping so all good [08:35:26] I suppose that large deletes should be done in batches and maybe even jobqueued for these batches [08:35:29] someday [08:37:57] I remember when the meta:SRP page (+30k revids) was deleted in error... [08:38:47] that's pretty up there [08:40:33] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [08:54:30] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [08:54:30] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [10:00:38] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [10:03:34] apergos: hordes coming? how's possible before https://bugzilla.wikimedia.org/show_bug.cgi?id=16043 is fixed? 
:p [10:05:18] when things break we get hordes in wikimedia-tech [10:05:21] standard [10:44:36] New patchset: Nikerabbit; "Initial version of solr for ttmserver" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16732 [10:45:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16732 [10:49:38] New review: Nikerabbit; "Renamed the file and parametrized the class." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/16732 [11:44:10] New review: Oren; "Hi I've been learning puppet and it looks like you need to tell solr to restart if the schema changes" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/16732 [11:49:53] !log Upgraded Observium to latest SVN [11:50:04] Logged the message, Master [12:40:13] New patchset: Mark Bergsma; "Add partial RANCID manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18082 [12:40:52] New patchset: Mark Bergsma; "Add cr1-esams to RANCID" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18083 [12:41:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18082 [12:41:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18083 [12:41:56] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18082 [12:42:15] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18083 [12:54:01] New review: Nikerabbit; "Hi Oren, thanks for the review."
[operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/16732 [12:55:15] New patchset: Mark Bergsma; "Add cr1-esams to Torrus" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18084 [12:55:53] New patchset: Mark Bergsma; "Add cr1-esams to Nagios" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18085 [12:56:30] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18084 [12:56:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18085 [12:56:42] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18085 [13:02:07] New review: Oren; "I guess there is more than one way to make the puppet dance" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/16732 [13:12:30] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [13:26:36] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [13:26:36] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours [13:26:36] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [14:03:30] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours [14:35:21] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17698 [14:36:31] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16432 [14:36:45] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17918 [14:37:27] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16987 [14:38:08] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17398 [14:38:21] Change merged: Reedy;
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17399 [14:38:42] ottomata: care to work on your key issue? i was working in PM with chris about it [14:39:10] nice, yeah i'm talking with him too [14:39:17] ok [14:39:29] i'm going to generate a new one for wikimedia (root?) stuff [14:39:33] ottomata: you know about os x terminal sharing its keys across tabs right? [14:39:42] no? [14:39:46] unless you 'exec ssh-agent bash' on each tab before loading key [14:40:00] it matters when you are logging into multiple key domains (like production/labs) [14:40:11] dont want/need to pass your production key into labs. [14:40:23] hm [14:40:43] ok…, if I specify the identity on the ssh command it should be ok, though ja [14:42:58] chjohnson, i just emailed you my public key [14:54:30] New patchset: Catrope; "Redirect secure.wikimedia.org URLs to proper HTTPS" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13429 [14:55:08] Change abandoned: Catrope; "Superceded by https://gerrit.wikimedia.org/r/#/c/13429" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15599 [14:55:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13429 [14:57:21] New review: Catrope; "Differences between PS2 and PS3:" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/13429 [14:58:48] New patchset: Cmjohnson; "add new key for ottomata" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18093 [14:59:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18093 [15:01:29] cmjohnson1: nope, you need to change it, commenting [15:02:02] Totally need to make sure the old key doesn't exist then add the new one [15:02:15] yep, commented that ;] [15:02:54] New review: RobH; "The old key needs to be set to absent, then a new key stanza added." 
[operations/puppet] (production); V: -1 C: -1; - https://gerrit.wikimedia.org/r/18093 [15:03:10] there are in line comments for ya, cherrypick and fix =] [15:20:07] New patchset: Cmjohnson; "add new key for ottomata" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18093 [15:20:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18093 [15:25:32] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [15:33:10] hey [15:33:11] what's up? [15:34:37] cmjohnson1: looks good to me [15:36:02] New patchset: Mark Bergsma; "Cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18096 [15:36:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18096 [15:38:35] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [15:40:04] mark: re: why is it here? in your comments [15:40:17] applicationserver/apache/apache2.conf.erb uses the cluster to generate the correct apache2.conf [15:43:48] then that should become a parameter of the class [15:44:14] and probably it would be better to just make parameters for the apache config class [15:44:19] which sets the max clients as a nr [15:44:25] instead of putting that logic inside a template [15:44:48] i see that's not the only thing it controls [15:45:54] did you make those conditionals inside apache2.conf.erb? 
[15:46:55] New review: RobH; "much better, merging" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/18093 [15:46:56] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18093 [15:47:47] cmjohnson1: ok, change is merged and live on sockpuppet [15:48:09] so if ottomata needs access to a specific server, someone can force a puppet run a few times until they see the key change [15:48:19] otherwise it may take a few hours to update across the cluster [15:48:29] mark: yeah [15:48:52] hm [15:48:53] I can redo, if you'd like [15:49:04] so I am trying (and learning?) to set up the new analytics dell boxes [15:49:21] don't think puppet is on those yet [15:49:33] but maybe I need root to access other things to set them up? [15:49:39] (I have zero idea how to do this) [15:49:43] ottomata: what has been done to them so far? [15:49:47] I can take a look if you'd like [15:49:56] no OS, afaik, just subnet setup by LeslieCarr [15:50:02] ah [15:50:02] ok [15:50:17] https://rt.wikimedia.org/Ticket/Display.html?id=3367 [15:50:22] https://rt.wikimedia.org/Ticket/Display.html?id=3067 [15:50:50] notpeter: i'll do this [15:51:39] mark: ok, just let me know if there's anything you'd like me to do on that [16:00:30] so, notpeter and cmjohnson1 [16:00:35] i am in fenari with new key [16:00:49] wait [16:00:49] sorry [16:00:50] no [16:00:53] i ran the wrong command [16:00:55] i take it back complete [16:00:57] old key still works [16:00:59] new key does not [16:01:03] you accidentally the everything? 
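The "force a puppet run a few times until they see the key change" step above is just polling with retries; a minimal generic sketch, where the function names and the puppet-run stand-in are illustrative, not a real ops tool:

```python
import time

def wait_until(check, attempts=5, delay=1.0):
    """Retry a zero-argument check until it returns True or the
    attempts run out; returns whether it ever succeeded."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False

# Stand-in for "has the new key landed in authorized_keys yet?";
# each call models one forced puppet run.
state = {"runs": 0}
def key_present():
    state["runs"] += 1
    return state["runs"] >= 3

print(wait_until(key_present, attempts=5, delay=0))  # True (on the third try)
```

Without the forced runs, the same check would simply succeed on the regular puppet schedule a few hours later.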
[16:01:12] yeah, it'll take until puppet runs for it to be purged [16:01:33] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [16:02:22] notpeter, old key works on bast1001, new key does now [16:02:23] not [16:02:25] * [16:03:11] bast100: [16:03:14] old key: [16:03:15] debug1: Offering RSA public key: /Users/otto/.ssh/id_rsa [16:03:15] debug1: Server accepts key: pkalg ssh-rsa blen 277 [16:03:29] new key: [16:03:30] debug1: Offering RSA public key: /Users/otto/.ssh/id_rsa-wmf [16:03:30] debug1: Authentications that can continue: publickey,password [16:03:30] debug1: Next authentication method: password [16:03:30] otto@bast1001.wikimedia.org's password: [16:04:00] might be position of the moon [16:06:37] New patchset: Mark Bergsma; "Try a different way of configuring apache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18099 [16:07:03] try bast1001 now [16:07:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18099 [16:07:17] old key doesn't work [16:07:18] new key does [16:07:20] sorry [16:07:21] doesn't [16:07:25] both don't work on bast1001 now [16:07:59] but, notpeter, i may have been going to bast1001 through fenari before, [16:08:09] i realized I had a ProxyCommand set up in .ssh config that was going that [16:08:17] i've commented that out [16:08:20] ah, ok [16:08:26] I see your new key on bast1001 [16:08:31] fenari is the same: [16:08:31] old key works, new key does not [16:08:45] bast1001, neither key works [16:08:56] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16732 [16:09:30] rawr [16:10:19] New patchset: Mark Bergsma; "What kind of messed up indentation was that?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18101 [16:10:57] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18101 [16:11:49] New patchset: Mark Bergsma; "Cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18102 [16:12:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18102 [16:13:16] ottomata: can you verify that the new key in here is correct: https://gerrit.wikimedia.org/r/#/c/18093/2/manifests/admins.pp [16:13:30] cuz I see that key in your auth keys file on bast1001 [16:13:55] for user otto? [16:14:00] ja [16:14:46] yeah that is correct [16:16:25] New patchset: Mark Bergsma; "Kill $cluster, it conflicts with the global $::cluster." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18103 [16:16:29] ottomata: can you confirm you only have the new key loaded, and not both? (ssh-add -l) [16:16:49] in ssh -v I can see it using my key [16:16:53] i am specifying key manually with -i [16:16:56] cool [16:17:01] ssh -v -i /Users/otto/.ssh/id_rsa-wmf bast1001.wikimedia.org [16:17:02] New patchset: Mark Bergsma; "Kill $cluster, it conflicts with the global $::cluster." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18103 [16:17:06] yea that should be fine then [16:17:19] and it doesnt like it on bast1001? [16:17:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18103 [16:17:47] looks like my key was wrong in puppet, extra '==' [16:17:55] notpeter just edited it and I am in in bast1001 now [16:17:59] cool [16:18:14] it was the == at the end of the key [16:18:26] cmjohnson1: ^ [16:18:30] PROBLEM - Puppet freshness on labstore2 is CRITICAL: Puppet has not run in the last 10 hours [16:18:58] I've had this happen before. I know what to look for :) [16:19:56] notpeter: so... the sudo::appserver class [16:20:07] I think we should move that into the applicationserver and mediawiki modules [16:20:09] cool, s'ok! 
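The stray `==` that broke the key above is exactly the kind of error a padding-strict base64 check catches, since appending two characters always makes the encoded length invalid. A hypothetical pre-commit validator along those lines, not anything that exists in the puppet repo:

```python
import base64
import binascii
import struct

def valid_pubkey_line(line):
    """Check that an authorized_keys-style line ('type base64 [comment]')
    decodes cleanly and that the blob's embedded type matches the prefix."""
    parts = line.split()
    if len(parts) < 2:
        return False
    ktype, b64 = parts[0], parts[1]
    try:
        blob = base64.b64decode(b64, validate=True)
    except binascii.Error:
        return False  # bad alphabet or padding, e.g. a stray trailing '=='
    if len(blob) < 4:
        return False
    # The decoded blob starts with a 4-byte big-endian length, then the
    # key type string, which must match the first field of the line.
    (tlen,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + tlen] == ktype.encode()

# A tiny synthetic blob (not a usable key) to demonstrate the check:
blob = struct.pack(">I", 7) + b"ssh-rsa" + b"\x00" * 8
good = "ssh-rsa " + base64.b64encode(blob).decode() + " otto@host"
print(valid_pubkey_line(good))                                    # True
print(valid_pubkey_line(good.replace(" otto@host", "== otto@host")))  # False
```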
[16:20:18] mediawiki sync stuff in the mediawiki module [16:20:22] other stuff in the appserver module [16:20:39] so can you create those classes/modules/config files? [16:20:58] sure [16:21:02] perhaps it's just mediawiki [16:21:06] i'm not entirely sure [16:21:10] i see the nagios check raid [16:21:14] that doesn't belong in there at all I think [16:21:27] that's related to the raid check in base or wherever [16:21:39] so it should be a totally separate sudo include file [16:21:55] mind you, this is a bit similar to e.g. pybal_check [16:22:01] you said, "perhaps this should move to a pybal module" [16:22:04] no, it shouldn't :) [16:22:11] a pybal module sets up pybal [16:22:24] but this is applicationserver specific support for pybal [16:22:33] pybal needs it to check an applicationserver [16:22:37] New patchset: Pyoungmeister; "update of andrew otto's key without == at the end." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18104 [16:22:40] so it needs to live in the applicationserver module [16:23:00] much like nagios check definitions also don't live in the nagios module [16:23:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18104 [16:25:41] ok. I was thinking that both the client and server ends of the pybal-related things could live in the pybal manifests [16:25:47] but either way is fine with me [16:29:13] New patchset: Mark Bergsma; "Put the config::apache inclusion in the webserver role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18105 [16:29:51] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18104 [16:29:51] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18105 [16:29:56] feel free to merge my changes if that helps your work [16:30:40] ok [16:36:19] New patchset: Mark Bergsma; "Put the config::apache inclusion in the webserver role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18105 [16:36:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18105 [16:40:30] Logged the message, Master [16:41:12] woo! [16:41:36] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours [16:56:36] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [17:10:00] New patchset: Cmjohnson; "changing the MAC address for Search32 after main board swap." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18113 [17:10:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18113 [17:11:35] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [17:12:16] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18113 [17:13:03] why is the wiki soooooo slow? [17:13:23] bottleneck appears to be when loading from bits.wikimedia.org [17:16:36] cmjohnson1: should be good to go [17:18:53] nah, it'll break again [17:18:56] I can feel it in my bones [17:26:35] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [17:28:32] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours [17:57:50] cmjohnson1: around ? [17:58:18] ok [17:58:22] lemme know when you're back ? [17:59:47] yep [18:03:47] LeslieCarr: can I point you at my request for a public ip for ms10 in tampa? [18:04:04] ah sure [18:04:26] 1 minute [18:04:48] sure. 
it's this: https://rt.wikimedia.org/Ticket/Display.html?id=3379 [18:04:57] for whenever you get to it. [18:14:10] apergos: so are you ready to switch over the ip now ? [18:14:20] ah hm [18:14:31] so I would just take the box down and switch the ip [18:14:37] cause I'm gonna reinstall [18:14:40] *shrug* [18:14:48] if you want I'll shut it down [18:15:03] ok [18:15:05] I'll deal with puppet cleanup tomorrow [18:15:15] it was stupid for me not to remember it needed to be public facing [18:15:35] no problem [18:15:41] i do stupid shit all the time ;) [18:16:43] !log shutting down ms10, going to move it to public ip [18:16:52] Logged the message, Master [18:17:04] done [18:17:26] so yeah if all you want to do is dig up an ip addr and shove it in the ticket that's fine [18:17:31] I can do dns and the vlan move. [18:17:43] if you want to do the whole shebang that's fine too, whatever works [18:18:11] cool [18:18:12] :) [18:18:44] updated with an ip [18:18:54] let me know if you need any help/advice with the vlan move [18:19:25] ok, thanks. it'll be tomorrow, I've done these before, it shouldn't be a deal [18:19:33] PROBLEM - Host ms10 is DOWN: PING CRITICAL - Packet loss = 100% [18:19:35] this is a juniper router [18:19:43] (I can do it on foundry too but I like it less) [18:20:36] I do have a question though: how did you dig up an ip for tampa? [18:22:04] i just checked out the rdns file for an empty ip and pinged it [18:22:51] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms [18:23:12] and there were some? sweet [18:23:50] I thought we had been in the "none left" phase for a while now, scrimping by by stealing from decommisioned hosts [18:25:37] well, it's pretty tight still [18:26:11] ok cool [18:26:45] PROBLEM - Lucene disk space on search32 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
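The free IP above was found by checking the rDNS file for an empty address and pinging it. A sketch of the scanning half of that, against a made-up zone snippet; the record layout is an assumption, and real allocation still needs the ping check and coordination with whoever else is editing the file:

```python
def free_octets(rdns_text, last=254):
    """Return host octets with no PTR record in a /24 reverse-zone
    snippet, assuming each record line starts with the final octet."""
    used = set()
    for line in rdns_text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            used.add(int(fields[0]))
    return [n for n in range(1, last + 1) if n not in used]

# Illustrative zone fragment (hostnames invented):
zone = """\
1   IN PTR gw.example.wmnet.
2   IN PTR ms9.example.wmnet.
4   IN PTR ms11.example.wmnet.
"""
print(free_octets(zone, last=5))  # [3, 5]
```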
[18:27:12] PROBLEM - SSH on search32 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:30:03] RECOVERY - SSH on search32 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [18:35:31] !log completed metawiki centralnotice schema migration for fundraising [18:35:39] Logged the message, Master [18:38:27] PROBLEM - Lucene on search32 is CRITICAL: Connection refused [18:41:36] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [18:45:51] New patchset: Alex Monk; "(bug 37226) Fix portal talk namespace on bswiki. Also add comments with the bug number." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/18124 [18:51:30] PROBLEM - NTP on search32 is CRITICAL: NTP CRITICAL: No response from NTP server [18:55:14] New patchset: Catrope; "Set $wgVisualEditorParsoidPrefix correctly" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/18125 [18:55:33] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [18:55:33] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [19:08:08] RECOVERY - Lucene disk space on search32 is OK: DISK OK [19:09:59] ok, cmjohnson1, RobH, and notpeter [19:10:08] i finally got all my ssh stuff all sorted [19:10:09] feels good [19:10:20] cleaned up my .ssh/config, everything is smoooooth [19:10:33] what's next for getting OSes on these new machines? [19:10:39] i think I need management (IPMI?) access [19:10:44] I put in an RT ticket here: [19:11:03] https://rt.wikimedia.org/Ticket/Display.html?id=3406 [19:11:17] so an ops person has to do the actual install [19:11:26] so best to list how you wish the os partitioned and the base install stuff [19:11:38] then once we do the install we can do the initial puppet run to get you access [19:11:47] (i think) [19:12:04] i dunno if the network these are on can touch our install server [19:12:08] LeslieCarr: ^? 
[19:13:23] they should be able to [19:13:32] DHCP requests are forwarded on [19:13:43] if it doesn't respond let me know [19:13:44] ok cool [19:14:10] ottomata: so yep, detail in an rt ticket how you need them partitioned, then you can check in your gerrit changes for review on what puppet manifests to apply [19:14:55] I did that, but [19:15:05] I think the point of this is to teach me to do that part myself [19:15:08] I have done that, though [19:15:15] https://rt.wikimedia.org/Ticket/Display.html?id=3367 [19:15:48] CT suggested that I learn how to install the OS and setup puppet myself [19:16:00] once I get OS installed and network/ssh access, I probably already know how to do most of the rest [19:16:09] or can figure it out from wikitech [19:16:21] but at this point, I have zero idea of how to access machines with no OS [19:16:29] notpeter said IPMI (or KVM?) [19:16:33] but I need a management pw for that? [19:16:57] the mgmt password is only for folks with root [19:17:10] so unless you have been approved for cluster wide root access, no can do [19:17:16] its why we tend to put the OS on for folks. [19:17:28] i have root [19:17:33] oh, then thats fine. [19:17:41] ottomata: you are on fenari right now right? [19:17:45] i can write you the mgmt password [19:17:50] one sec [19:17:57] and walk you through how to do an install, with relevant wikitech docs [19:17:59] ok [19:18:01] cool! [19:18:02] on it now [19:18:49] ottomata: got it? [19:19:02] i think we can do the first link [19:19:11] is anyone on tampa management consoles ? 
[19:19:16] this is quite important :) [19:19:26] yup [19:19:26] got it [19:19:29] management access will go down during this first transition [19:19:48] i think the ones I'm about to use are in eqiad [19:19:57] cmjohnson1: ok [19:20:00] Ok, so all mgmt is broken down into a few different subnets [19:20:08] in tampa, its usually servername.mgmt.pmtpa.wmnet [19:20:15] in eqiad its servername.mgmt.eqiad.wmnet [19:20:34] now, are you looking to connect to the new c2100s? [19:20:48] (I advise we start on the analytics misc hosts over the analytics work hosts) [19:20:48] I think so? analytics1011-1027 [19:20:49] um [19:20:57] those are both [19:21:06] Leslie networked them up: [19:21:06] https://rt.wikimedia.org/Ticket/Display.html?id=3067 [19:21:10] so the lower end are c2100s, which are similar to setup as the rest of the dells, with a few differences [19:21:24] cmjohnson1: you happen to have the wikitech doc on how to install [19:21:30] i can never find the damned thing [19:21:37] zat this one? [19:21:37] http://wikitech.wikimedia.org/view/Build_a_new_server [19:21:58] yep [19:22:21] ok, so you need to do a few things, first is allocate DNS for them all [19:23:03] looks like 1001-1022 have that done [19:23:13] 1001-1010 are up and running [19:23:31] do they already have IPs assigned? 
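The mgmt naming convention just described (servername.mgmt.pmtpa.wmnet in tampa, servername.mgmt.eqiad.wmnet in ashburn) is mechanical enough to sketch, along with the kind of ipmitool invocation it enables; `-I lanplus`, `-H`, and `-U` are standard ipmitool options, but the username and the wrapper function are assumptions:

```python
def mgmt_fqdn(host, site):
    """Management-interface name per the convention in the log:
    <server>.mgmt.<site>.wmnet, where site is 'pmtpa' or 'eqiad'."""
    return "%s.mgmt.%s.wmnet" % (host, site)

def ipmi_power_status_cmd(host, site, user="root"):
    """Build (but do not run) an ipmitool command line against the
    host's mgmt interface; credentials would be supplied separately."""
    return ["ipmitool", "-I", "lanplus", "-H", mgmt_fqdn(host, site),
            "-U", user, "chassis", "power", "status"]

print(mgmt_fqdn("analytics1011", "eqiad"))  # analytics1011.mgmt.eqiad.wmnet
print(" ".join(ipmi_power_status_cmd("analytics1011", "eqiad")))
```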
[19:23:37] oh i guess they are just incremental in the subnet [19:23:40] so usually i end up allocating ip's, but these are internal, so if you wanna do it i can walk you thourhg it [19:23:42] yep [19:23:45] yeah totally [19:23:48] let's do it [19:23:50] just incremental, but we need to update it for up to 1027 [19:23:55] ok cool [19:24:18] so you wanna ssh into sockpuppet as root, and go into the /root/pdns_templates directory [19:24:46] of course, someone put vanadium in as 123 [19:24:49] =P [19:25:03] so now analytics1023 gets to be .124 ;] [19:25:04] doh [19:25:07] aww maaaaaannnnnn [19:25:08] i hate that [19:25:10] yargh [19:25:17] ottomata: oh wait [19:25:19] RobH: they are in their own vlan [19:25:21] that may be wrong [19:25:23] indeed [19:25:28] ok phew [19:25:47] LeslieCarr: there are no entries in the reverse file [19:25:58] so i have no idea what analytics in row c will be [19:26:03] cmjohnson1: found it thanks =] [19:26:18] RobH: for the vlan ? [19:26:26] RECOVERY - NTP on search32 is OK: NTP OK: Offset -0.01722049713 secs [19:26:32] LeslieCarr: I mean we have ; 10.64.21.0/24 - analytics1-b-eqiad [19:26:38] 10.in-addr.arpa:; 10.64.22.0/24 - analytics1-c-eqiad [19:26:39] but i need analytics1-c-eqiad [19:26:45] im just blind. 
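The two per-row analytics subnets named above (10.64.21.0/24 for analytics1-b-eqiad, 10.64.22.0/24 for analytics1-c-eqiad) make the row-membership question a simple containment check; a sketch using Python's ipaddress module, with the subnet names taken from the reverse file comments in the log:

```python
import ipaddress

# Per-row analytics subnets, as listed in the 10.in-addr.arpa comments.
ROW_SUBNETS = {
    "analytics1-b-eqiad": ipaddress.ip_network("10.64.21.0/24"),
    "analytics1-c-eqiad": ipaddress.ip_network("10.64.22.0/24"),
}

def row_for(ip):
    """Return which per-row analytics subnet an address falls in, if any."""
    addr = ipaddress.ip_address(ip)
    for name, net in ROW_SUBNETS.items():
        if addr in net:
            return name
    return None

print(row_for("10.64.21.122"))  # analytics1-b-eqiad (analytics1022)
print(row_for("10.64.22.123"))  # analytics1-c-eqiad (analytics1023)
```

This also makes the point in the discussion concrete: analytics1022 and analytics1023 are adjacent host numbers but sit in different row subnets.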
[19:26:53] and doing too much at same time =P [19:27:16] ottomata: so in the 10.in-addr.arpa file [19:27:18] look for ; 10.64.22.0/24 - analytics1-c-eqiad [19:27:32] then add in analytics1023-1027 [19:27:35] yeah ok i see it [19:27:54] then you add them into the wmnet file in alphabetical order under analytics1022 [19:28:05] then lemme look at them before you svn commit [19:28:10] easier that way ;] [19:28:29] all the dns update info is on wikitech http://wikitech.wikimedia.org/view/Dns [19:28:36] ok, i see up to 1022 under 10.64.21.0/24 - analytics1-b-eqiad [19:28:43] right, because they are in row b [19:28:49] the reverse file is broken up and ordered by subnet [19:29:06] every row in eqiad has its own subnet(s) [19:29:19] so analytics machines in row b have an internal subnet just for them [19:29:23] and same for row c [19:29:54] so 1001-1022 are in a different subnet than 1023-1027? [19:29:59] ottomata: if I am reviewing stuff you know or dont wanna deal with (either way) lemme know [19:30:00] yep [19:30:11] and they are all sectioned off from production for security [19:30:15] ok, naw cool, so far don't konw any of this [19:30:16] cool [19:30:17] since non ops will have sudo on them [19:30:25] so I need to add 1023-1027 into row c [19:30:26] (for a VERY limited time period) [19:30:45] and even though they are a different subnet [19:30:50] i might as well start 1023 at .123 [19:30:52] right? 
[19:30:53] yep, need to add them in the reverse under ; 10.64.22.0/24 - analytics1-c-eqiad [19:31:05] New patchset: Asher; "adding db10(49-50) to s1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18178 [19:31:10] you can, its a huge subnet [19:31:37] just know it will be 10.64.22.123, where analytics1022 is 10.64.21.122 [19:31:47] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18178 [19:31:55] right [19:31:57] annoying [19:32:00] if we didnt have a damned near limitless (for our purposes) number of internet ip's ;] [19:32:02] but at least a bit is consistent [19:32:10] internal i mean [19:32:12] not internet =P [19:32:14] yeah [19:32:35] so, what's the security issue here? these machines should all have equal access to each other [19:32:38] its just a convention? [19:32:50] to keep each row (um, is a row == a rack?) on a diff subnet? [19:33:07] kind of, each row has its own subnets [19:33:15] but those subnets may be linked in various ways [19:33:33] so in tampa, we just have our public subets, internal subnets, and mgmt subnets [19:33:52] we have a few public ones, as we have IP address (ipv4) in different subnets provided to us over the years [19:33:55] aye, i see that the mgmt names have already been given for all the analytics hosts, right? [19:33:58] so I don't need to mess with that? 
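The numbering convention being agreed on above (analytics1022 is 10.64.21.122, analytics1023 becomes 10.64.22.123) can be sketched in a couple of lines of shell. This is convention only — nothing enforces it except what gets written into the zonefiles:

```shell
# Convention from the walkthrough: analytics10NN gets last octet 100+NN,
# regardless of which row subnet the host lands in.
host=analytics1023
num=${host#analytics}           # strip the prefix -> 1023
octet=$(( num - 1000 + 100 ))   # 1023 -> 123
echo "10.64.22.${octet}"
```

So analytics1023 gets .123 in the row-c /24 even though the row-b numbering left off at .122 in a different subnet.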
[19:34:00] and networking there is far from redundant =P [19:34:13] right, mgmt is all done, onsite techs do that for you guys [19:34:17] cool [19:34:18] as we have to know the ip to program it [19:34:30] in ashburn, we have full redundancy on all networking [19:34:35] so mark got to plan it out quite a bit more [19:34:42] ok, saved [19:34:47] the result is row a has its public subnet, private subnet, and mgmt subnet [19:34:49] um, someone else has some uncommitted edits to that file [19:34:53] row b same, row c same [19:34:59] hm, ok cool [19:35:01] and also, labs has its own internal subnet per row [19:35:11] analytics has its own internal subnet per row (just b and c presently) [19:35:19] right, cool [19:35:32] my understanding is having root in a subnet lets you escalate some exploits or some such [19:35:35] but as for why the analytics can't be on the same subnet…just convention? [19:35:40] honestly mark is far more knowledgable about it [19:35:47] they are in different rows [19:35:54] and we dont allow subnets to span rows in eqiad [19:35:57] is that a technical thing or just a convention thing [19:35:58] due to the networking convention yep [19:36:00] both [19:36:01] aye ok [19:36:01] cool [19:36:05] its a convention due to technical concerns [19:36:13] ok cool, that's fine, thanks [19:36:22] we can deal, for the most part it won't matter, we'll just have to remember [19:36:24] as for those actual technical reasons, leslie and mark know better =] [19:36:33] i think there are some softwares we are going to use that want IPs rather than hostnames, but we can deal [19:36:37] you guys should always use fqdn when you can anyhow [19:36:38] but yea [19:36:39] yeah [19:36:40] totally [19:36:50] so anyway [19:36:53] i saved my changes [19:36:55] svn diff to check [19:37:01] but, someone else has uncommitted edits [19:37:04] not sure what is up with those [19:37:08] reviewing [19:37:16] well technical in order to not have to do spanning tree [19:37:16] frack subnets? 
[19:37:26] oh [19:37:26] LeslieCarr: prolly added them [19:37:28] that's me [19:37:29] sorry [19:37:30] please commit [19:37:32] they are all subnet comments, no problem =] [19:37:37] hold off on commit, reviewing [19:37:38] (i should call them fracking subnets) [19:37:40] ah right [19:37:41] cool [19:37:42] ok [19:37:45] ok to do in same commit? [19:37:48] or shoudl I take out my changes for a min [19:37:50] to do two commits [19:37:50] ottomata: you need to add the forward before you commit [19:37:51] ? [19:37:58] you can leave your changes [19:38:05] its just root commit to internal svn [19:38:05] ok, lemme look for forward.. [19:38:10] so you need to open wmnet [19:38:19] and add the analytics entires under the rest of them [19:38:35] ah cool [19:38:54] and here they are not divided by subnet, right? [19:38:55] in the file? [19:38:59] i can just add the entries in order? [19:39:00] you will be adding to the ;servers - listed alphabetically [19:39:14] they are broken up by either server or subnet (so servers in one area, mgmt in another) [19:39:21] line 1075 [19:39:25] is where you want to append beneath [19:39:26] yeah [19:39:29] that's where I am [19:39:37] and these are 10.64.22.0/24? [19:39:42] not .21. [19:39:43] right? [19:39:44] yep [19:39:47] k [19:40:23] by the time we finish today you will be able to do installs on any system ;] [19:40:28] cuz the c2100s are a pain in the ass [19:40:33] if you can do them the rest are easy. [19:40:34] yay! [19:40:37] ok [19:40:40] how's svn diff now [19:40:40] ? [19:41:19] looks good to me, you can svn commit and comment on why etc... [19:41:30] ok [19:41:39] once you svn commit on sockpuppet, you need to ssh as root into dobson, but pass your key, cuz it needs to do a sync to the other nameservers [19:41:50] i normally dont pass my key to further servers. [19:42:17] ok committed [19:42:30] dobson... 
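For the record, the two edits just described — PTR entries in the reverse file and A records in the wmnet forward file — would look roughly like this. Illustrative only: the real zonefiles are never quoted in the log, and the TTL/record formatting is assumed:

```text
; in 10.in-addr.arpa, under "; 10.64.22.0/24 - analytics1-c-eqiad"
123   1H  IN PTR   analytics1023.eqiad.wmnet.
124   1H  IN PTR   analytics1024.eqiad.wmnet.

; in wmnet, alphabetically under analytics1022
analytics1023   1H  IN A   10.64.22.123
analytics1024   1H  IN A   10.64.22.124
```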
[19:42:36] http://wikitech.wikimedia.org/view/Dns#Changing_records_in_a_zonefile [19:42:57] so on dobson (with key passed) you run the authdns-update script [19:43:03] dobson... [19:43:10] dobson is ns0.wikimedia.org [19:43:37] ah great [19:43:38] ok in [19:43:39] when you run that, it does a bunch of stuff [19:43:49] pulls langlists, generates the zones, etc [19:43:56] and syncs it out to the other nameservers [19:44:08] point of interest, if we have a site go down, like ESAMS or something [19:44:10] ok, does it svn update? [19:44:22] scripts related to this one migrate the load off them [19:44:26] yep [19:44:28] ok [19:44:32] should I do it? [19:44:35] yep =] [19:44:45] k its going [19:44:52] the paranoia is why i checked your work [19:45:00] its best to have someone do that, a bad typo in here can take things down. [19:45:26] you can tease both mark (I wouldnt recommend it) or Ryan_Lane (totally give him hell) as they both have taken things down via dns updates [19:45:29] great, and all 3 ns show the change [19:45:43] coolness, congrats, you are doign better than Ryan_Lane [19:45:48] * RobH is just being a shit ;] [19:45:57] haha, nice [19:45:59] Isn't part of Ryan's job breaking shit *looks at labs* [19:45:59] :D [19:46:04] oh snap [19:46:15] great cool! [19:46:18] so, we ahve DNS [19:46:30] yep, now you can pull the MAC addresses from each device, and update the dhcp lease files [19:46:39] so, these are two platforms, so you get to learn two ways to do it [19:46:54] two platforms, in that they are two different hardwares [19:46:54] ? [19:47:10] hrmmm...... [19:47:20] hm? [19:47:21] hold on, i think the ip info may be wrong for the older entries [19:47:24] uh oh [19:47:26] not something you did, checking [19:47:29] lk [19:47:50] yea, analytics1011+ is in row C [19:47:57] but they are in the files for row B [19:48:03] yeah, that makes more sense [19:48:06] ottomata: so you wanna move those entries and update or shall i? 
[19:48:11] 1001-1010 have been around a while [19:48:12] i'll do it [19:48:19] the reverse file is an easy change, just move 1011+ [19:48:29] and then replace the 21 with 22 in the wmnet file for them [19:48:32] you viming? [19:48:37] sorry, out now [19:48:48] meant to view, not vim, old habit [19:49:08] Damianz: heh [19:49:11] Damianz: I'll stab you [19:49:14] I'll stab you goof [19:49:17] *good [19:49:19] so IPs in forward need to go to .22. [19:49:22] for 11-22 [19:49:24] right? [19:49:26] yeah [19:49:29] sorry, you just said that [19:49:29] cool [19:49:34] ottomata: yep [19:49:53] Ryan_Lane: I'll take ryan stabs over mark stabs ;) [19:49:56] now, if you are 100% sure of your dns changes, you dont need to have someone review, but its recommended [19:50:11] i think I will take review for a while thank you [19:50:31] Ryan_Lane knows i can give him hell about dns outage since its only rivaled by an outage i caused years ago [19:50:46] hehe [19:50:49] dns outage = wonky stuff with folks who dont follow proper negative time to live [19:50:54] ok svn diff pluhlease :) [19:51:01] i did an apache redirect to send all pages to enwiki landing page including enwiki landing page [19:51:08] infinite loop that cached into squids [19:51:13] was not pretty ;_; [19:51:16] eeeek fun [19:51:28] yeah, everybody has those stories, ja? [19:51:42] at CouchSurfing once, shortly after I started messing with mysql replication for the first time (we had never used it before) [19:51:52] ottomata: looks good to me [19:51:54] i pointed the app master at the slaves (and the slaves were not running readonly) [19:51:56] THAT was nasty [19:52:06] ok cool, committing [19:52:28] yea, you really are part of the team when you take down major services with unplanned downtime. [19:52:36] it used to be enwiki [19:52:41] RobH: are there any ciscos available for me in eqiad? [19:52:55] binasher: nope.
[19:53:01] not unless we steal them abck from something [19:53:01] i have pc1-3 in pmtpa [19:53:06] ok, so committed, runing authdns-update on dobson? [19:53:08] ja? [19:53:16] (i'm being super careful right now and double checking everything before I do it) [19:53:18] (with you) [19:53:21] yep, but the ciscos in eqiad were allocated a long time ago for analytics and labs, where in tampa there is no analytics [19:53:23] ok, i'll work on stealing [19:53:34] ottomata: yep, so run authdns-update [19:53:37] also, admin log when you do that [19:53:41] just to be safe [19:53:45] oh, [19:53:49] lemme try [19:53:52] its ok that you didnt before, no big deal [19:54:01] !log updating DNS entries for analytics1011-1027, they are all in row-c [19:54:09] Logged the message, Master [19:54:12] coooool [19:54:13] hehe [19:54:18] just post push, always dig, pdns has a known issue where the slaves (ns1/2) will sometimes hang on reload [19:54:20] and crash out [19:54:28] ok [19:54:35] it happens like 5% of the time or less. [19:54:43] but it will always happen the time you forget to dig ;] [19:54:54] nagios then catches it, and its no big deal [19:55:02] but easier to just manually check anyhow [19:55:12] cool, checked analytics1015, all say ns .22.115 [19:55:28] ok, so now those are set, you need to get all the mac info for them [19:55:37] ottomata: you have racktables access right? [19:55:49] https://racktables.wikimedia.org/index.php?page=rack&rack_id=68 [19:55:51] naw [19:56:09] oh, you need it if you have root [19:56:12] let me get you setup. 
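The "always dig after a push" advice above, concretely — a sketch assuming the three authoritative servers are ns0/ns1/ns2.wikimedia.org (dobson is ns0 per the log, ns1/ns2 are the pdns slaves that occasionally hang on reload), queried from somewhere that can reach them:

```shell
# Confirm all three nameservers picked up the new records;
# a slave that hung on reload will answer stale or not at all.
for ns in ns0.wikimedia.org ns1.wikimedia.org ns2.wikimedia.org; do
    echo "== $ns =="
    dig +short analytics1015.eqiad.wmnet @"$ns"   # expect 10.64.22.115
done
```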
[19:57:04] k [19:57:57] ottomata: ok, i just emailed you a temp pass for it [19:58:00] login and change it [19:58:12] then you can see the rack layouts, which is important in that it will tell you what model server you are using [19:58:22] which tells you what wikitech page has the howto on using its mgmt [19:59:27] ottomata: so you can login and change that when you have a moment, for now though i can tell you that 1011-1022 are dell poweredge c2100 [19:59:42] and 1023-1027 are poweredge r310 [19:59:58] ok, pw changed [20:01:08] So, on wikitech, we have http://wikitech.wikimedia.org/view/Platform-specific_documentation [20:01:14] hm, how do I know which rack they are in? [20:01:18] (I'm clicking around) [20:01:21] in racktables, you can search in top right [20:01:28] serach by servername works fine [20:01:31] AH! [20:01:32] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [20:01:32] they are named [20:01:33] so cool [20:01:34] (it can search any of the fields) [20:01:44] COOOL [20:02:01] this is the tool we use for the majority of our rack planning [20:02:15] but thats only useful for myself, chris, mark, folks who rack and plan rack layouts [20:02:23] aye [20:02:23] for the rest of you its useful in that you can pull the hardware type from it [20:02:30] which lets you know how to interface [20:02:33] !log streaming a hotbackup of db1017 to db1050 [20:02:33] so on the c2100s [20:02:39] http://wikitech.wikimedia.org/view/Dell_PowerEdge_C2100 [20:02:41] Logged the message, Master [20:02:49] these are interesting, as dell servers, they do NOT have DRAC [20:02:54] which is dells remote access controller [20:02:58] they just have plain old IPMI [20:03:08] which means you either have to know how to use the ipmi_tool [20:03:16] or you can do the easy thing and use my ipmi_mgmt script [20:03:28] that script lives in two places, in sockpuppet and on 'iron' [20:03:33] iron is the ops bastion host [20:03:38] ok [20:03:41] that for now 
only holds that [20:03:43] root login or as me? [20:03:50] but eventually it will migrate over to what we use daily [20:03:52] either. [20:03:57] well, root works [20:04:02] i dunno if normal user does, need to cehck [20:04:11] yep, normal user should work [20:04:23] port 22? [20:04:25] but doesnt have acceess to the damned script [20:04:28] ottomata: so root@iron [20:04:31] =P [20:04:33] ah [20:04:44] i'm hanging on trying to connect [20:05:01] ssh'ing into iron? [20:05:03] yeah [20:05:06] debug1: Connecting to iron.wikimedia.org [208.80.154.151] port 22. [20:05:08] hrmm.... [20:05:23] lemme force an update, your key is in the root keys list right? [20:05:29] yeah [20:05:37] but its not getting as far as even to try the key [20:05:40] just hanging on trying to connect [20:05:47] thats odd, it works for me... [20:05:53] but, you can also just root@sockpuppet [20:06:04] telnet just hangs on port 22 too [20:06:04] telnet iron.wikimedia.org 22 [20:06:04] Trying 208.80.154.151... [20:06:06] ok [20:06:08] sockpuppet it is [20:06:48] hrmm, sockpuppet isnt the updated script, lemme pull update [20:06:51] k [20:08:27] bleh, its fine [20:08:31] i dunno why i thought it wasnt [20:08:38] so if you just type ipmi_mgmt [20:08:42] it should give you a bit of help [20:09:02] so for the analytics hosts, to get the mac info for say, analytics1011 [20:09:02] ay eok [20:09:16] ipmi_mgmt analytics1011.mgmt.eqiad.wmnet getsysinfo [20:09:27] sorry, sysinfo [20:09:35] my own help file is right there and i ignore it ;] [20:09:54] hmm, pw [20:10:00] mgmt password you now have [20:10:04] oh right! [20:10:16] you can also store it in your shell environment for the session [20:10:25] if you read the script, it shows what the call is [20:10:26] usage: delloem [option...] 
[20:10:26] commands: [20:10:26] mac [20:10:26] lan [20:10:26] powermonitor [20:10:27] For help on individual commands type: [20:10:28] delloem help [20:10:35] ok cool [20:11:08] hrmm [20:11:10] well, damn [20:11:16] looks like that doesnt include the mac info [20:11:20] lemme check the web interface real fast [20:11:29] (drac sysinfo does, ipmi doesnt, annoying!) [20:11:38] can I get a console and interrogate it? [20:11:45] yep, but it takes longer than http [20:11:50] the post on these is like 5 minutes [20:11:52] its insane long. [20:11:53] as in laggy? [20:11:57] the console? [20:12:02] just takes a long time to spin disks and check memory [20:12:15] no difference from physical or serial console, slow no matter what [20:12:43] oh, hm, not sure I understand, we'd have to boot them? [20:12:46] that's what takes a while? [20:12:48] you mean? [20:13:34] yep [20:13:45] when you say 'web interface', you mean racktables? [20:13:47] so the easiest (relativly) way to get the mac from the c2100 [20:13:58] nope, i mean the mgmt interface has a web interface =] [20:14:12] or you can figure out ipmi_tools exact command to pull the mac info [20:14:13] java applet? [20:14:17] nah, http [20:14:19] oh really? [20:14:20] cool [20:14:24] but you ahve to have proxy setup for the cluster [20:14:29] aye [20:14:30] ok [20:14:33] which i recommend anyhow, as it lets you use web interfaces on things like labs [20:14:37] yeah [20:14:45] lemme find the wikitech doc on this i wrote [20:14:45] i fire them up temporarily when I need them [20:14:51] it uses firefox plus a plugin called foxyproxy [20:14:51] on how to get mac? 
[20:15:01] oh yeah, i think the labsconsole has decent instructions [20:15:31] https://labsconsole.wikimedia.org/wiki/Help:Access#Accessing_web_services_using_a_SOCKS_proxy [20:15:37] http://wikitech.wikimedia.org/view/Proxy_access_to_cluster [20:15:51] the wikitech one is basically os x + firefox + ssh = access [20:16:03] i have it setup basically for ops folks, plus the rules you need [20:16:14] you can of course simplify it and just tunnel ALL traffic via the datacenter [20:16:18] but its annoying that way [20:16:47] ottomata: so yea, didnt expect to get loaded down with this much eh? ;] [20:17:01] bring it oooooooon [20:17:10] i'd rather figure out the mac from the cli anyway, if I can [20:17:12] looking into that... [20:17:26] well, the command line arguments are decent [20:17:41] i used to run a single line that routed localhost:8080 into whatever hard coded mgmt interface i specified [20:17:48] but that required me to fire it off on a host by host basis [20:17:56] lemme see if i have that command handy [20:18:15] oo [20:18:17] that sounds nice [20:19:19] ssh -L 8080:internal ip of mgmt interface:80(443) root@fenari.wikimedia.org [20:19:29] then in your browser you can localhost:8080 and it resolves [20:19:45] but then you have to open and close it per connection, the foxyproxy setup is more dynamic [20:20:07] so if you expect to do a lot of installs and the like, i recommend the foxyproxy setup, but the one liner works everywhere [20:20:12] and without more software =] [20:20:12] can I use mgmt name rather than IP?
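RobH's one-liner above, written out a little more explicitly. `<mgmt-ip>` is a placeholder for the internal management address — your laptop can't resolve `*.mgmt.eqiad.wmnet` names, hence the raw IP — and the C2100's interface is https with a self-signed cert, so forwarding 443 is the useful case:

```shell
# Forward a local port to one mgmt web interface via the bastion,
# one host at a time (the foxyproxy/SOCKS setup is the dynamic version).
ssh -L 8443:<mgmt-ip>:443 root@fenari.wikimedia.org
# then browse to https://localhost:8443/ and accept the self-signed cert
```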
[20:20:24] nope, cuz your local host will have no idea how to resolve that fqdn [20:20:46] when i setup foxyproxy, its rules on routing take that fqdn and say to pass it along via ssh [20:21:06] (in my experience) [20:21:17] ah right [20:21:27] also, you need https port on these [20:21:33] the c2100 uses https self signed cert [20:21:37] (so have to accept cert, etc) [20:22:08] ottomata: before you do all that though im checking the web mgmt interface to ensure it has the info we want [20:22:14] i should have done this before. [20:22:32] yay got it [20:22:36] haa, [20:22:37] well shit.... [20:22:38] ok [20:22:42] im not seeing it, i hate these things [20:22:45] hate. [20:22:55] what's the username for mgmt? [20:22:59] root [20:23:08] so the only way to get the mac may be to post and go into bios [20:23:17] cmjohnson1: Do you recall how we got the macs for c2100? [20:23:53] ottomata: sigh, annoyyyying, sorry to lead you down the web path [20:24:03] keep those notes, you may need them in the fugture, but seems you can stick with ipmi [20:24:09] you need to send a reboot, boot into BIOS [20:24:13] then copy down the mac from there [20:24:35] alternatively you could just tell it to PXE and try to grep its mac out when it hits, but thats not very useful (we have lots of dhcp hits) [20:24:49] use the one time boot option bios from the ipmi script [20:25:06] 'bootbios', then powercycle, then console [20:25:22] that will boot it and tell it on next boot enter bios, where you can then pull the mac out of it [20:25:37] once you ahve the mac addresses for them all, we can update the lease files for the install [20:25:38] ok, explain me something…is this machine currently booted? or just sitting there? 
[20:25:47] its just sitting there, more than likely powered up [20:25:59] but powercycle tends to power up (if i recall correctly) even if they arent powered up [20:26:03] hmm, ok ok ok [20:26:13] and we tend to power up servers when we rack them [20:26:18] to balance the power phases in the rack out [20:26:21] ah ok [20:26:22] so [20:26:26] so they tend to sit and spin with no OS [20:26:36] why do I powercycle after bootbios? [20:26:46] because if its booted, it wont reboot [20:26:53] oh [20:26:54] the bootbios is a one time, next boot option [20:26:54] bootbio [20:26:57] ahhh [20:26:57] ok [20:26:58] cool [20:27:11] ok, using ipmi_mgmt from sockpuppet to do that [20:27:20] i'm going to do it in a screen if you wanna look over my shoulder [20:27:56] im doing it on analytics1012 to confirm steps [20:28:00] k [20:28:18] (oh, btw, I looked in the ipmi_mgmt script for env pw setting, didn't see anything...) [20:29:03] hrmm, yea, i dont see it, so i must have seen it as part of the ipmi_tool calls [20:29:11] lemme see [20:29:12] aye ok [20:29:14] s'ok [20:29:20] can I console directly after powercycle? [20:29:23] or do I need to wait a bit? [20:29:23] yep [20:29:27] you can do immediately [20:29:37] it will start scrolling once things are posting [20:29:42] mine is already starting post [20:29:52] ahh there it goes! [20:30:03] oh this is so much better than the crap I had at CS [20:30:11] the dell DRAC systems are easier to deal with, richer feature set in the out of band mgmt [20:30:13] java applet was the only IPMI console I could get [20:30:18] but this isnt bad [20:31:00] hrmm, looking for mac [20:31:26] PCI? [20:31:43] ah there it is [20:31:46] yep [20:31:46] i see 2 NICs with MACs [20:31:51] which one is which?
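The three-step dance being run above, collected in one place for reference (`ipmi_mgmt` is the local wrapper script on sockpuppet/iron, not a stock tool):

```shell
# One-time "boot into BIOS" to read the NIC MACs off a C2100:
ipmi_mgmt analytics1012.mgmt.eqiad.wmnet bootbios    # flag the NEXT boot to enter BIOS setup
ipmi_mgmt analytics1012.mgmt.eqiad.wmnet powercycle  # force the reboot (powers on if off)
ipmi_mgmt analytics1012.mgmt.eqiad.wmnet console     # attach immediately and watch POST
```

bootbios is one-shot, which is why the powercycle has to follow it: a machine that is already sitting booted won't re-enter BIOS on its own.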
[20:31:51] so nic1 [20:31:53] k [20:32:01] oh, so you dont have to do analytics1012 [20:32:02] * NIC1 Mac Address [04-7D-7B-A5-E6-94] ** * [20:32:04] danke [20:32:17] so you need to do those steps (dont need to bother to exit bios if you dont wanna) [20:32:23] to d/c you can usually ~~. [20:32:31] but ipmi is odd and sometimes just kills your entire ssh [20:32:34] ha, ok [20:32:38] sorry ~~~. [20:32:51] you may also have to 'reset' it due to color [20:32:53] apple key + r [20:33:03] (soft terminal reset for color and type issues) [20:33:19] once serial console sends odd colors, terminal tends to stick to them without the reset [20:33:29] now, once you reset these, they will boot from disk [20:33:29] hm [20:33:40] so once you have all the macs, you will use the bootpxe command in ipmi [20:33:50] wait hang on, still trying to figure out how to exit [20:33:53] but thats once you have the macs, updated lease files, and updated installer files for auto partitioning [20:33:57] no problem [20:35:02] damn, it exited fine for me the first time [20:35:08] tried to do it again to confirm, now its not working [20:35:13] uhhh, i think I got out of bios [20:35:18] esc [20:35:25] but my terminal seems stuck [20:35:28] with a single black line [20:35:41] yea, welcome to annoying ipmi [20:35:43] hah [20:35:44] im stuck now too [20:35:49] can I just close the window and re-login? [20:35:53] yep [20:35:55] we don't care, we will boot these machines again later [20:35:56] k [20:36:03] just annoying to have to relog [20:36:06] ok, let's both go ahead and proceed with 11 and 12 [20:36:07] ? [20:37:01] we can if ya wanna [20:37:09] hm, do I need to do the rest of the bios instructions on the C2100 page?
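One small gotcha worth noting: the BIOS prints the MAC dash-separated and upper case, while dhcpd host entries use the usual colon-separated form (lower case by convention). A trivial conversion:

```shell
# BIOS shows e.g. [04-7D-7B-A5-E6-94]; dhcpd entries use 04:7d:7b:a5:e6:94
bios_mac='04-7D-7B-A5-E6-94'
dhcp_mac=$(printf '%s' "$bios_mac" | tr '-' ':' | tr 'A-Z' 'a-z')
echo "$dhcp_mac"
```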
[20:37:11] http://wikitech.wikimedia.org/view/Dell_PowerEdge_C2100 [20:37:31] nope [20:37:36] ah no, these were probably done by Leslie (or whoever) [20:37:37] k [20:37:39] we do the basic mgmt setup, password set, so on [20:37:45] those were done by me [20:37:51] or maybe chris when he was in town helping me [20:37:59] whoever physcially racks them has to do that stuff [20:38:09] ah ok, cool [20:38:34] so you are in the dhcp section of the build a new server [20:38:40] (fyi, I need to leave in 35 minutes) [20:38:43] the switch ports were done by leslie already i assume [20:38:46] ok [20:38:46] yeah [20:38:48] i think they were [20:38:48] ottomata: oh yea, we wont get this done today ;] [20:38:54] but we can resume tomorrow no problem [20:39:00] can we get one of them done today? [20:39:07] shoudl we stop here or is there another good stopping point coming up? [20:39:09] depends on the partman setup [20:39:13] ahhhhh right [20:39:16] well, we should keep going until the partman [20:39:19] ok cool [20:39:22] let's keep going then [20:39:23] thats going to decide if this is painless or painful. [20:39:32] plus you dont need to check in your changes today [20:39:33] should we switch to PM, we might be flooding ops here? [20:39:47] nah, no one else is chatting, and its not bad background info for other ops [20:39:49] ok cool [20:39:59] most of them should know this, but to some of them its very new [20:40:06] so it doesnt hurt [20:40:09] ok [20:40:22] So, these are dells, which communicate at 115200 on com2 [20:40:35] so this part is in your local puppet repo [20:40:47] local? [20:40:50] you are all setup to checkin gerrit changes for ops right? 
[20:40:53] yeah [20:41:00] RECOVERY - Lucene on search32 is OK: TCP OK - 0.001 second response time on port 8123 [20:41:00] ok, yea, your local git copy is all [20:41:05] ah ok [20:41:05] yeah [20:41:07] of origin/production [20:41:09] ohhhh [20:41:11] but my ssh key chanGED [20:41:14] ahh lemme fix gerrit [20:41:16] ahhh [20:41:17] heh [20:41:26] well, sounds like you need to connect to gerrit and give it your new key [20:41:46] on the gerrit interface its under your settings [20:42:01] ok [20:42:03] yeah [20:42:05] totally [20:42:05] done [20:42:05] so while you do that, i will list what you do so you have the info [20:42:11] i'm ready [20:42:19] in files/dhcpd [20:42:32] you will be editing linux-host-entries-ttyS1-115200 [20:42:45] the S1 is serial port 2 (s0 is com port 1) [20:43:00] you can see we have S0-115200 and S0-9600 as well [20:43:04] and S1-57600 [20:43:14] different server types require different speeds and settings for that [20:43:29] on these its all in linux-host-entries.ttyS1-115200 [20:43:36] this is dhcp over serial? [20:43:41] which you know just by knowing what machines do what, or by asking [20:43:54] nah, but the dhcp file also has the serial redirection settings for ubuntu is all [20:44:03] so it parses that info from the lease file name [20:44:15] we could have done it differently i suppose [20:44:19] serial redirection settings... [20:44:34] i think i'm confused as to what dhcp has to do with serial ports? [20:44:35] by default ubuntu will just stream all its console to the physical console port [20:44:52] well, the dhcp server tells the host where the install files are [20:45:00] ahhhh [20:45:01] hm [20:45:02] hm [20:45:13] then the installer files handed off are based on what file it lives in [20:45:28] oh crazy [20:45:36] if its in S0-115200 it tells the ubuntu installer to direct its console output on serial com port 1 at 115200 baud [20:45:40] installation via dhcp? 
[20:45:44] basically [20:45:50] well, DHCP fires off PXE Boot [20:45:53] which is the installer [20:45:59] right [20:46:15] but basically, dhcp is telling the hosts: HEY, go boot and install using X [20:46:16] ? [20:46:37] does dhcp serve it the install file? [20:46:38] well, if you look in dhcp.conf you can see what its telling it [20:46:43] it says hey, this is your ip, gateway, etc [20:46:46] and this is the install server [20:46:52] connect to this via tftp and load what it says [20:46:52] ahhhh [20:46:56] hm [20:46:57] cool [20:47:08] so PXE uses TFTP to launch the ubuntu installer [20:47:16] pretty standard for most enterprise level solutions [20:47:29] cool, never worked at enterprise level :p [20:47:31] most datacenters i know do the same thing for hosted solutions [20:47:33] but coooool [20:47:38] ok [20:47:39] so [20:47:40] yea imagine putting a cd in each of these =P [20:47:47] aye yea [20:47:53] it quickly becomes worth the overhead of setup [20:48:08] i need to add entries for an1011 and an1012 [20:48:11] with the macs we just found [20:48:23] yep, but dont commit change yet, there are a few others to make for the same two servers [20:48:28] ok [20:48:35] the entries don't have an installer file listed [20:48:37] that ok? [20:48:43] the default install is precise [20:48:50] as long as thats ok with you, then you are good =] [20:48:59] usually we dont list a distro unless we have to [20:49:11] and we change the default as we go along [20:49:17] but its always an LTS version [20:49:30] so we tend to cycle every two years, behind by about half a year. [20:49:57] yeah ok [20:49:58] that's good [20:49:59] So once you have the lease files updated, we can move on to partman [20:50:04] at this point, welcome to hell. [20:50:05] ok ready [20:50:09] because its about to get confusing as shit [20:50:13] yay [20:50:14] hm [20:50:18] maybe this is stopping time?
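What one of those lease-file entries presumably looks like — the actual contents of linux-host-entries-ttyS1-115200 never appear in the log, so this is the standard ISC dhcpd host-block shape, using the NIC1 MAC read out of analytics1012's BIOS earlier:

```text
host analytics1012 {
    hardware ethernet 04:7d:7b:a5:e6:94;
    fixed-address analytics1012.eqiad.wmnet;
}
```

The "go boot and install using X" part is plain PXE plumbing: dhcpd's `next-server` and `filename` options point the host at the TFTP server and bootloader, which in turn fetches the Ubuntu installer.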
[20:50:25] well, i should show you the file [20:50:27] ok [20:50:32] then you can feel free to pore over it at your whim [20:50:37] ok cool [20:50:50] look in files/autoinstall [20:50:55] there are a few files you need in there [20:51:09] netboot.cfg needs to be updated to list your hosts, along with what partman script to partition the disks from [20:51:28] it also needs an update to the subnet for analytics-c in it [20:51:34] which we can do tomorrow, dont worry about it too much [20:51:42] but then you have to have a working partman script on how to partition the disks [20:51:47] this is the part that is a bit hard [20:51:51] o they are in there [20:51:51] analytics101[1-9]|analytics102[0-9]) echo partman/analytics-dell.cfg ;; \ [20:51:57] ohhhhhh.... [20:52:02] you are lucky. [20:52:06] hah, yay! [20:52:08] wait, prolly not [20:52:13] those are for the cisco systems ;] [20:52:18] different disk setup. [20:52:20] i don't see a listing for the C [20:52:24] well no, there are both [20:52:27] yep, cuz it doesnt exist [20:52:28] analytics100[1-9]|analytics1010) echo partman/analytics-cisco.cfg ;; \ [20:52:28] analytics101[1-9]|analytics102[0-9]) echo partman/analytics-dell.cfg ;; \ [20:52:35] oh... true [20:52:42] damn, you are lucky [20:52:46] someone did that work already [20:52:52] partman is a pain in the ass [20:53:02] but if you, in the future, need to hack at it [20:53:10] myself, mark, daniel, and peter know about it [20:53:12] ok cool [20:53:20] so this is a good stopping point then [20:53:28] knowing tomorrow wont be quite so painful [20:53:32] i remember daniel and peter pulling hair out when doing the partman for ciscos [20:53:33] for us [20:53:37] ok [20:53:37] oh yea, its hell [20:53:41] should I add the row c line?
[20:53:47] right now just this [20:53:47] 10.64.21.255) echo subnets/analytics1-b-eqiad.cfg ;; \ [20:53:48] yep, need to add in that subnet [20:53:56] and add in the config file [20:54:09] should be easy enough to figure out looking at the existing file for row b [20:54:16] 10.64.22.255) echo subnets/analytics1-c-eqiad.cfg ;; \ [20:54:19] ? [20:54:33] yep, plus adding the analytics1-c-eqiad.cfg file [20:54:36] oo [20:54:42] ah [20:54:43] k [20:54:47] i would copy the row b then edit it [20:55:11] cool, yeah [20:55:18] everything the same cept for thegateway? [20:55:36] d-i netcfg/get_gateway string 10.64.22.1 [20:56:16] same cept for gateway yep [20:56:26] cool [20:56:28] then you git -add the file [20:56:30] yup [20:56:31] and shoudl be cool [20:56:43] should I commit/push, or is there more stuff that will go along with this tomorrow? [20:56:54] well, you will have to add the rest of the hosts to the lease file [20:57:02] so you may just wanna pull that info and add before pushing [20:57:04] but up to you [20:57:06] ok [20:57:13] oh, the rest of them [20:57:14] yeah with macs [20:57:16] ok yeah [20:57:20] when I get up tomorrow i'll do that [20:57:22] recall that the c2100 ends at 1022 [20:57:30] so 1023-1027 will need to be a different partman script [20:57:33] oh right, which means maybe a diff way to find MAC? [20:57:37] oh [20:57:44] you can add to lease file, but they need to be a different line in netboot.cfg partman declarations [20:57:48] do you need MACs on Cisco? [20:57:56] no, we ahve those, all the new machines are dells [20:57:59] mutante: working c2100s, not ciscos [20:58:04] alright [20:58:05] New patchset: J; "Add videoscaler class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16654 [20:58:31] are there different # of disks on the other dells? [20:58:42] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16654 [20:59:08] ottomata: the c2100s are your work horses, 12 disks [20:59:21] analytics1023-1027 are your misc mgmt hosts [20:59:28] well, analycis misc mgmt hosts [20:59:30] you know, the partman might be the same [20:59:32] so they just have dusl small disks [20:59:34] we are barely partitioning these at all [20:59:39] just root / really [20:59:41] only using two disks? [20:59:43] the rest are unpartitioned [20:59:44] yeah [20:59:47] we are doing that manually later [20:59:52] then it may work, but may also wanna use a misc partman [20:59:53] cause we don't know the exact setup we want yet [20:59:58] yeah [20:59:58] they are dual 500GB disks if i recall [21:00:15] the 500gb version of the file isnt working [21:00:23] cuz i just added it and its tryuing to setup raided swap [21:00:26] im haivng issues [21:00:36] though you can just use the 250gb version and have slightly less space [21:00:44] i would hold off on 1023-1027 until last though [21:00:52] (i may have the partman working by then) [21:00:56] ok cool, will do the others first then [21:01:06] ok, just so I know, once this is pushed and merged [21:01:18] then I use ipmi_mgmt to pxe boot? [21:01:20] or something? [21:01:26] yep, you tell it bootpxe [21:01:28] and kablam, OS is installed? [21:01:31] then when it reboots it loads the installer [21:01:39] you can console to watch and confirm it runs wwithout prompts [21:01:42] cool [21:01:42] so [21:01:47] bootpxe, powercycle, console [21:01:50] (sometimes it prompts for disk overwrite) [21:02:02] yep [21:02:05] coooool [21:02:17] fantastic, man that will be so cool if I can get those up and running tomorrow all by my lonesome [21:02:18] heheh [21:02:38] heh, they arent in service, knock yourself out =] [21:02:47] yeahhhhhh [21:02:48] cool [21:02:52] ottomata: you got the "Build a new server" wikitech page yet? 
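Backing up to the subnet file from a few minutes earlier: per the chat, subnets/analytics1-c-eqiad.cfg is a copy of the row-b file with only the gateway changed. Something like the following — only the gateway line is actually quoted in the log; the netmask follows from the /24, and the other keys are standard debian-installer netcfg preseed settings assumed to match the row-b file:

```text
# subnets/analytics1-c-eqiad.cfg -- copy of analytics1-b-eqiad.cfg,
# gateway swapped for the row-c subnet
d-i netcfg/get_netmask     string 255.255.255.0
d-i netcfg/get_gateway     string 10.64.22.1
d-i netcfg/get_nameservers string <same resolver as the row-b file>
```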
[21:02:55] yup
[21:02:58] great
[21:03:02] RobH has been giving me a huuuuuge run down all afternoon
[21:03:05] super helpful
[21:04:50] man yeah, amazing, thanks for the help RobH
[21:05:00] very welcome, glad to get others doing installs
[21:05:01] i'm going to head out purty soon, i'm sure I'll have more Qs for you tomorrow
[21:05:04] yeah, this is fun!
[21:05:09] especially since I didn't have to write partman!
[21:05:10] hehe
[21:05:17] indeed, that is the win of the day.
[21:05:21] you know, when I was playing with the ciscos
[21:05:29] i was doing a ton of partitioning
[21:05:31] playing with different setups
[21:05:36] repartitioning for cassandra, hdfs, etc.
[21:05:43] i wrote a bunch of fdisk scripts to do it
[21:05:49] was pretty hacky
[21:05:50] but it worked!
[21:06:18] ottomata: analytics-cisco.cfg btw
[21:06:40] yeah, this was post install though
[21:06:45] but it skips the sda/sdb
[21:06:47] gotcha
[21:06:47] can I use partman to do non-root partitions while booted?
[21:07:21] (this was my crazy fdisk stuff: https://github.com/wmf-analytics/kraken/blob/master/bin/setup-scripts/disks/cassandra.sh )
[21:07:38] yea, it should
[21:08:09] coool
[21:08:29] ok, maybe when I become a partman master I will do that instead of my fdisk < ok time to go
[21:08:41] thanks so much all!
[21:10:02] have a nice party =]
[21:11:45] enjoy
[22:07:56] Yay thanks
[22:08:22] Will puppet run on it automatically?
[22:10:08] OK, could you run puppet then?
[22:10:26] I tried SSHing into it as root, but it rejects my key, because it doesn't know about my key, because puppet hasn't told it about me yet
[22:14:22] OK
[22:34:23] Whee!
[22:34:24] Thanks
[22:40:43] Dang it, I broke the Parsoid puppet manifest, but that's my own fault
[22:42:08] New patchset: Catrope; "Fix paths in the Parsoid startup script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18193
[22:42:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18193
[22:42:54] Ryan_Lane: Easy change: https://gerrit.wikimedia.org/r/18193
[22:43:04] With that one I can move Parsoid onto wtp1 and give cadmium back
[22:43:24] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18193
[22:43:42] merged
[22:43:57] Thanks
[22:48:47] PROBLEM - Router interfaces on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/1/0: down - Core: cr2-eqiad:xe-5/2/1 (FPL/Level3, CV71028) [10Gbps wave]
[22:49:10] * jeremyb waves LeslieCarr
[22:50:33] grrr
[22:50:39] of course, i go to the bathroom and the network dies
[22:52:00] okay, we're back on network code yellow
[22:52:12] grrrr
[22:52:27] i need a traffic light in front of my desk
[22:53:08] can be arranged
[22:53:15] lights in barnstar shpae
[22:53:16] shape*
[22:53:40] :)
[22:53:43] ok that's awesome
[23:03:00] LeslieCarr: http://exchange.nagios.org/directory/Addons/Notifications/*-Visual-Notifications/NAmpel--2D-Nagios-Ampel-Project/details
[23:05:12] :)
[23:07:02] nicer and USB: http://translate.google.com/translate?langpair=de|en&u=http://shop.netways.de/alarmierung/nagios-ampel/nagios-usb-ampel-medium.html
[23:07:21] oh that is so cute
[23:08:37] "At least kernel version 2.6.8"
[23:08:38] Haha
[23:12:38] <^demon> I kinda like the LEDs in this one better: http://www.cleware.net/produkte/p-usbampel-E.html :)
[23:13:02] <^demon> Hahaha. "The usage of the USB-Temp is strictly prohibited when the failure of the sensor will harm people. The usage in medical applications of any kind request the written permission of the Cleware GmbH."
[23:13:33] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[23:23:46] New patchset: Pyoungmeister; "moving sudo definitions from sudo::applicationserver into the correct modules" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18194
[23:24:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18194
[23:27:30] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[23:27:30] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[23:27:30] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[23:41:31] New patchset: Asher; "new eqiad dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18198
[23:42:11] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18198
[23:42:26] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18198