[00:00:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17964 [00:01:06] * AaronSchulz waves TimStarling [00:05:02] hello [00:08:47] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6905 [00:11:10] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15458 [00:11:56] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15463 [00:13:20] New review: Ryan Lane; "Please ensure the file is actually gone. If the system is rebuilt, then we can merge this then." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/15597 [00:14:23] New review: Ryan Lane; "Hooray for cleanup!" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/15553 [00:14:24] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15553 [00:54:39] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17610 [01:04:00] !log a couple package upgrades on bast1001 [01:04:08] Logged the message, Master [01:41:20] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 212 seconds [01:42:59] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 279 seconds [01:49:44] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 682s [01:54:32] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 10 seconds [01:57:23] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 0 seconds [01:58:44] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 22s [03:11:30] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [03:25:27] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the 
last 10 hours [03:25:27] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [03:25:27] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours [04:02:30] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours [04:39:03] morning [04:50:38] paravoid: moin moin [04:50:48] :) [04:52:27] paravoid: you don't know any of the mirror team by chance? ftp.us.debian.org has a bad dual stack in rotation atm. i changed to cdn.debian.net which fixed it for me but it's still broke [04:52:47] * jeremyb got several people in a channel to try debian.gtisc.gatech.edu and all failed (not all with the same failure) [04:52:53] ping and HTTP both don't work [04:53:16] no one responds in #-mirrors [04:53:29] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours [04:57:03] sec [04:59:10] jeremyb: removed, wait 5mins for the TTL to expire [04:59:26] oh, hah. didn't expect you to do it yourself ;) [04:59:39] i already switched to cdn anyway ;) [05:01:14] didn't even have to wait that long (more than 5 mins since my last try?) [05:01:54] super cow powers. [05:01:57] :-) [05:02:35] * jeremyb imagines valessio making a super cow animation [05:03:00] paravoid: so what about notifying them that they are broken? [05:03:34] I asked symoon about it [05:03:51] we usually add/remove mirrors per mirroradm's requests [05:04:00] or mirror local admins [05:04:06] so I'm not sure of the process [05:04:19] huh. symoon is 13 days idle [05:04:26] anyway, thanks [05:04:35] I could just drop a mail but thought I should ask him before starting stepping on anyone's toes [05:05:05] oh, you asked but didn't get a response. i thought you were saying he already responded [05:05:51] yes. no. :P [05:06:31] :) [05:13:35] New review: Faidon; "Calling it videoscaler::files when it sets up a jobrunner is counter-intuitive. I'd propose moving e..." 
[operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/16654 [05:24:33] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [05:37:27] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [06:00:33] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [06:30:51] PROBLEM - SSH on labstore3 is CRITICAL: Server answer: [06:32:12] RECOVERY - SSH on labstore3 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [06:40:33] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours [06:55:33] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [07:10:32] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [07:25:32] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [07:27:29] RECOVERY - Puppet freshness on srv281 is OK: puppet ran at Wed Aug 8 07:27:19 UTC 2012 [07:44:42] New review: Nikerabbit; "Would be nice to mention which (parts of) commit this reverts." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17965 [07:46:24] New review: Nikerabbit; "Looks more like addition of new stuff than cleanup." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17966 [08:02:21] Hello. Pursuant instructions of some devs, I wish to notify that I've been requested to perform a bigdelete on enwiki on a page with +5000 revids (approximately 5,520 revisions). Do I have the ops blessing? [08:03:03] Page is: https://en.wikipedia.org/wiki/User:28bot/edit-tests-found/2012-July [08:09:32] I'm fine with it, anyone else want to weigh in? [08:10:59] apergos: do we know the exact number of revids the page has? Because vvv's tool says that less than 3,000 but the deletion warning says that well over 5,000 revids. 
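A side note on why the counts above can disagree: a tool that counts only live rows in the revision table can undercount relative to the deletion warning, which reflects the total, and the discussion later raises archived (already-deleted) revisions as the likely explanation. A toy illustration; the 5,000 limit and the helper are inferred from the warning in the log, not read from MediaWiki's actual configuration:

```python
DELETE_REVISIONS_LIMIT = 5000  # threshold implied by the deletion warning above

def needs_bigdelete(live_revs, archived_revs=0,
                    limit=DELETE_REVISIONS_LIMIT):
    """True when the total revision count exceeds the big-deletion
    threshold; tools that count only live revisions can undercount."""
    return live_revs + archived_revs > limit

print(needs_bigdelete(2999))         # False: the live count alone is under
print(needs_bigdelete(2999, 2521))   # True: archived rows push it past 5,000
```

With roughly 5,520 total revisions split between live and archived rows, a live-only counter and the warning can land on opposite sides of the limit.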
[08:11:13] lemme see [08:12:11] https://toolserver.org/~vvv/revcounter.php?wiki=enwiki_p&title=User%3A28bot%2Fedit-tests-found%2F2012-July [08:13:25] 4419 [08:14:15] well, I'll wait a bit to see if anybody objects then I'll delete [08:14:56] maybe there are some deleted revisions in the archive table [08:23:42] Performing deletion. [08:25:20] okey dokey [08:28:51] Done. [08:29:02] sweet [08:29:06] have a nice day :-) [08:29:22] Everythin is right? [08:29:32] *Everything [08:29:54] seems like it [08:30:03] until the hordes show up saying they can't edit :-D [08:30:55] we have a couple dbs that are lagged, the rest are fine [08:31:58] nothing that I can do to fix that, undeleting would increase the lag :) [08:32:12] well what I didn't do was check it before the delete [08:32:22] so it's hard to compare [08:33:11] I've seen the user proposing that bigdelete be assigned locally /me screams [08:34:39] anyways it's dropping so all good [08:35:26] I suppose that large deletes should be done in batches and maybe even jobqueued for these batches [08:35:29] someday [08:37:57] I remember when the meta:SRP page (+30k revids) was deleted in error... [08:38:47] that's pretty up there [08:40:33] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [08:54:30] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [08:54:30] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [10:00:38] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [10:03:34] apergos: hordes coming? how's possible before https://bugzilla.wikimedia.org/show_bug.cgi?id=16043 is fixed? 
:p [10:05:18] when things break we get hordes in wikimedia-tech [10:05:21] standard [10:44:36] New patchset: Nikerabbit; "Initial version of solr for ttmserver" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16732 [10:45:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16732 [10:49:38] New review: Nikerabbit; "Renamed the file and parametrized the class." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/16732 [11:44:10] New review: Oren; "Hi I've been learning puppet and it looks like you need to tell solr to restart if the schema changes" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/16732 [11:49:53] !log Upgraded Observium to latest SVN [11:50:04] Logged the message, Master [12:40:13] New patchset: Mark Bergsma; "Add partial RANCID manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18082 [12:40:52] New patchset: Mark Bergsma; "Add cr1-esams to RANCID" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18083 [12:41:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18082 [12:41:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18083 [12:41:56] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18082 [12:42:15] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18083 [12:54:01] New review: Nikerabbit; "Hi Oren, thanks for the review."
[operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/16732 [12:55:15] New patchset: Mark Bergsma; "Add cr1-esams to Torrus" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18084 [12:55:53] New patchset: Mark Bergsma; "Add cr1-esams to Nagios" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18085 [12:56:30] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18084 [12:56:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18085 [12:56:42] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18085 [13:02:07] New review: Oren; "I guess there is more than one way to make the puppet dance" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/16732 [13:12:30] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [13:26:36] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [13:26:36] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours [13:26:36] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [14:03:30] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours [14:35:21] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17698 [14:36:31] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16432 [14:36:45] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17918 [14:37:27] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16987 [14:38:08] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17398 [14:38:21] Change merged: Reedy;
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17399 [14:38:42] ottomata: care to work on your key issue? i was working in PM with chris about it [14:39:10] nice, yeah i'm talking with him too [14:39:17] ok [14:39:29] i'm going to generate a new one for wikimedia (root?) stuff [14:39:33] ottomata: you know about os x terminal sharing its keys across tabs right? [14:39:42] no? [14:39:46] unless you 'exec ssh-agent bash' on each tab before loading key [14:40:00] it matters when you are logging into multiple key domains (like production/labs) [14:40:11] dont want/need to pass your production key into labs. [14:40:23] hm [14:40:43] ok…, if I specify the identity on the ssh command it should be ok, though ja [14:42:58] chjohnson, i just emailed you my public key [14:54:30] New patchset: Catrope; "Redirect secure.wikimedia.org URLs to proper HTTPS" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13429 [14:55:08] Change abandoned: Catrope; "Superceded by https://gerrit.wikimedia.org/r/#/c/13429" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15599 [14:55:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13429 [14:57:21] New review: Catrope; "Differences between PS2 and PS3:" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/13429 [14:58:48] New patchset: Cmjohnson; "add new key for ottomata" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18093 [14:59:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18093 [15:01:29] cmjohnson1: nope, you need to change it, commenting [15:02:02] Totally need to make sure the old key doesn't exist then add the new one [15:02:15] yep, commented that ;] [15:02:54] New review: RobH; "The old key needs to be set to absent, then a new key stanza added." 
[operations/puppet] (production); V: -1 C: -1; - https://gerrit.wikimedia.org/r/18093 [15:03:10] there are in line comments for ya, cherrypick and fix =] [15:20:07] New patchset: Cmjohnson; "add new key for ottomata" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18093 [15:20:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18093 [15:25:32] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [15:33:10] hey [15:33:11] what's up? [15:34:37] cmjohnson1: looks good to me [15:36:02] New patchset: Mark Bergsma; "Cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18096 [15:36:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18096 [15:38:35] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [15:40:04] mark: re: why is it here? in your comments [15:40:17] applicationserver/apache/apache2.conf.erb uses the cluster to generate the correct apache2.conf [15:43:48] then that should become a parameter of the class [15:44:14] and probably it would be better to just make parameters for the apache config class [15:44:19] which sets the max clients as a nr [15:44:25] instead of putting that logic inside a template [15:44:48] i see that's not the only thing it controls [15:45:54] did you make those conditionals inside apache2.conf.erb? 
[15:46:55] New review: RobH; "much better, merging" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/18093 [15:46:56] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18093 [15:47:47] cmjohnson1: ok, change is merged and live on sockpuppet [15:48:09] so if ottomata needs access to a specific server, someone can force a puppet run a few times until they see the key change [15:48:19] otherwise it may take a few hours to update across the cluster [15:48:29] mark: yeah [15:48:52] hm [15:48:53] I can redo, if you'd like [15:49:04] so I am trying (and learning?) to set up the new analytics dell boxes [15:49:21] don't think puppet is on those yet [15:49:33] but maybe I need root to access other things to set them up? [15:49:39] (I have zero idea how to do this) [15:49:43] ottomata: what has been done to them so far? [15:49:47] I can take a look if you'd like [15:49:56] no OS, afaik, just subnet setup by LeslieCarr [15:50:02] ah [15:50:02] ok [15:50:17] https://rt.wikimedia.org/Ticket/Display.html?id=3367 [15:50:22] https://rt.wikimedia.org/Ticket/Display.html?id=3067 [15:50:50] notpeter: i'll do this [15:51:39] mark: ok, just let me know if there's anything you'd like me to do on that [16:00:30] so, notpeter and cmjohnson1 [16:00:35] i am in fenari with new key [16:00:49] wait [16:00:49] sorry [16:00:50] no [16:00:53] i ran the wrong command [16:00:55] i take it back complete [16:00:57] old key still works [16:00:59] new key does not [16:01:03] you accidentally the everything? 
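The "force a puppet run a few times until they see the key change" step above is just polling with retries; a minimal generic sketch, where the function names and the puppet-run stand-in are illustrative, not a real ops tool:

```python
import time

def wait_until(check, attempts=5, delay=1.0):
    """Retry a zero-argument check until it returns True or the
    attempts run out; returns whether it ever succeeded."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False

# Stand-in for "has the new key landed in authorized_keys yet?";
# each call models one forced puppet run.
state = {"runs": 0}
def key_present():
    state["runs"] += 1
    return state["runs"] >= 3

print(wait_until(key_present, attempts=5, delay=0))  # True (on the third try)
```

Without the forced runs, the same check would simply succeed on the regular puppet schedule a few hours later.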
[16:01:12] yeah, it'll take until puppet runs for it to be purged [16:01:33] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [16:02:22] notpeter, old key works on bast1001, new key does now [16:02:23] not [16:02:25] * [16:03:11] bast100: [16:03:14] old key: [16:03:15] debug1: Offering RSA public key: /Users/otto/.ssh/id_rsa [16:03:15] debug1: Server accepts key: pkalg ssh-rsa blen 277 [16:03:29] new key: [16:03:30] debug1: Offering RSA public key: /Users/otto/.ssh/id_rsa-wmf [16:03:30] debug1: Authentications that can continue: publickey,password [16:03:30] debug1: Next authentication method: password [16:03:30] otto@bast1001.wikimedia.org's password: [16:04:00] might be position of the moon [16:06:37] New patchset: Mark Bergsma; "Try a different way of configuring apache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18099 [16:07:03] try bast1001 now [16:07:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18099 [16:07:17] old key doesn't work [16:07:18] new key does [16:07:20] sorry [16:07:21] doesn't [16:07:25] both don't work on bast1001 now [16:07:59] but, notpeter, i may have been going to bast1001 through fenari before, [16:08:09] i realized I had a ProxyCommand set up in .ssh config that was going that [16:08:17] i've commented that out [16:08:20] ah, ok [16:08:26] I see your new key on bast1001 [16:08:31] fenari is the same: [16:08:31] old key works, new key does not [16:08:45] bast1001, neither key works [16:08:56] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16732 [16:09:30] rawr [16:10:19] New patchset: Mark Bergsma; "What kind of messed up indentation was that?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18101 [16:10:57] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18101 [16:11:49] New patchset: Mark Bergsma; "Cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18102 [16:12:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18102 [16:13:16] ottomata: can you verify that the new key in here is correct: https://gerrit.wikimedia.org/r/#/c/18093/2/manifests/admins.pp [16:13:30] cuz I see that key in your auth keys file on bast1001 [16:13:55] for user otto? [16:14:00] ja [16:14:46] yeah that is correct [16:16:25] New patchset: Mark Bergsma; "Kill $cluster, it conflicts with the global $::cluster." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18103 [16:16:29] ottomata: can you confirm you only have the new key loaded, and not both? (ssh-add -l) [16:16:49] in ssh -v I can see it using my key [16:16:53] i am specifying key manually with -i [16:16:56] cool [16:17:01] ssh -v -i /Users/otto/.ssh/id_rsa-wmf bast1001.wikimedia.org [16:17:02] New patchset: Mark Bergsma; "Kill $cluster, it conflicts with the global $::cluster." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18103 [16:17:06] yea that should be fine then [16:17:19] and it doesnt like it on bast1001? [16:17:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18103 [16:17:47] looks like my key was wrong in puppet, extra '==' [16:17:55] notpeter just edited it and I am in in bast1001 now [16:17:59] cool [16:18:14] it was the == at the end of the key [16:18:26] cmjohnson1: ^ [16:18:30] PROBLEM - Puppet freshness on labstore2 is CRITICAL: Puppet has not run in the last 10 hours [16:18:58] I've had this happen before. I know what to look for :) [16:19:56] notpeter: so... the sudo::appserver class [16:20:07] I think we should move that into the applicationserver and mediawiki modules [16:20:09] cool, s'ok! 
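The stray `==` that broke the key above is exactly the kind of error a padding-strict base64 check catches, since appending two characters always makes the encoded length invalid. A hypothetical pre-commit validator along those lines, not anything that exists in the puppet repo:

```python
import base64
import binascii
import struct

def valid_pubkey_line(line):
    """Check that an authorized_keys-style line ('type base64 [comment]')
    decodes cleanly and that the blob's embedded type matches the prefix."""
    parts = line.split()
    if len(parts) < 2:
        return False
    ktype, b64 = parts[0], parts[1]
    try:
        blob = base64.b64decode(b64, validate=True)
    except binascii.Error:
        return False  # bad alphabet or padding, e.g. a stray trailing '=='
    if len(blob) < 4:
        return False
    # The decoded blob starts with a 4-byte big-endian length, then the
    # key type string, which must match the first field of the line.
    (tlen,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + tlen] == ktype.encode()

# A tiny synthetic blob (not a usable key) to demonstrate the check:
blob = struct.pack(">I", 7) + b"ssh-rsa" + b"\x00" * 8
good = "ssh-rsa " + base64.b64encode(blob).decode() + " otto@host"
print(valid_pubkey_line(good))                                    # True
print(valid_pubkey_line(good.replace(" otto@host", "== otto@host")))  # False
```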
[16:20:18] mediawiki sync stuff in the mediawiki module [16:20:22] other stuff in the appserver module [16:20:39] so can you create those classes/modules/config files? [16:20:58] sure [16:21:02] perhaps it's just mediawiki [16:21:06] i'm not entirely sure [16:21:10] i see the nagios check raid [16:21:14] that doesn't belong in there at all I think [16:21:27] that's related to the raid check in base or wherever [16:21:39] so it should be a totally separate sudo include file [16:21:55] mind you, this is a bit similar to e.g. pybal_check [16:22:01] you said, "perhaps this should move to a pybal module" [16:22:04] no, it shouldn't :) [16:22:11] a pybal module sets up pybal [16:22:24] but this is applicationserver specific support for pybal [16:22:33] pybal needs it to check an applicationserver [16:22:37] New patchset: Pyoungmeister; "update of andrew otto's key without == at the end." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18104 [16:22:40] so it needs to live in the applicationserver module [16:23:00] much like nagios check definitions also don't live in the nagios module [16:23:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18104 [16:25:41] ok. I was thinking that both the client and server ends of the pybal-related things could live in the pybal manifests [16:25:47] but either way is fine with me [16:29:13] New patchset: Mark Bergsma; "Put the config::apache inclusion in the webserver role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18105 [16:29:51] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18104 [16:29:51] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18105 [16:29:56] feel free to merge my changes if that helps your work [16:30:40] ok [16:36:19] New patchset: Mark Bergsma; "Put the config::apache inclusion in the webserver role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18105 [16:36:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18105 [16:40:30] Logged the message, Master [16:41:12] woo! [16:41:36] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours [16:56:36] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [17:10:00] New patchset: Cmjohnson; "changing the MAC address for Search32 after main board swap." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18113 [17:10:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18113 [17:11:35] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [17:12:16] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18113 [17:13:03] why is the wiki soooooo slow? [17:13:23] bottleneck appears to be when loading from bits.wikimedia.org [17:16:36] cmjohnson1: should be good to go [17:18:53] nah, it'll break again [17:18:56] I can feel it in my bones [17:26:35] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [17:28:32] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours [17:57:50] cmjohnson1: around ? [17:58:18] ok [17:58:22] lemme know when you're back ? [17:59:47] yep [18:03:47] LeslieCarr: can I point you at my request for a public ip for ms10 in tampa? [18:04:04] ah sure [18:04:26] 1 minute [18:04:48] sure. 
it's this: https://rt.wikimedia.org/Ticket/Display.html?id=3379 [18:04:57] for whenever you get to it. [18:14:10] apergos: so are you ready to switch over the ip now ? [18:14:20] ah hm [18:14:31] so I would just take the box down and switch the ip [18:14:37] cause I'm gonna reinstall [18:14:40] *shrug* [18:14:48] if you want I'll shut it down [18:15:03] ok [18:15:05] I'll deal with puppet cleanup tomorrow [18:15:15] it was stupid for me not to remember it needed to be public facing [18:15:35] no problem [18:15:41] i do stupid shit all the time ;) [18:16:43] !log shutting down ms10, going to move it to public ip [18:16:52] Logged the message, Master [18:17:04] done [18:17:26] so yeah if all you want to do is dig up an ip addr and shove it in the ticket that's fine [18:17:31] I can do dns and the vlan move. [18:17:43] if you want to do the whole shebang that's fine too, whatever works [18:18:11] cool [18:18:12] :) [18:18:44] updated with an ip [18:18:54] let me know if you need any help/advice with the vlan move [18:19:25] ok, thanks. it'll be tomorrow, I've done these before, it shouldn't be a deal [18:19:33] PROBLEM - Host ms10 is DOWN: PING CRITICAL - Packet loss = 100% [18:19:35] this is a juniper router [18:19:43] (I can do it on foundry too but I like it less) [18:20:36] I do have a question though: how did you dig up an ip for tampa? [18:22:04] i just checked out the rdns file for an empty ip and pinged it [18:22:51] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms [18:23:12] and there were some? sweet [18:23:50] I thought we had been in the "none left" phase for a while now, scrimping by by stealing from decommisioned hosts [18:25:37] well, it's pretty tight still [18:26:11] ok cool [18:26:45] PROBLEM - Lucene disk space on search32 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
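The free IP above was found by checking the rDNS file for an empty address and pinging it. A sketch of the scanning half of that, against a made-up zone snippet; the record layout is an assumption, and real allocation still needs the ping check and coordination with whoever else is editing the file:

```python
def free_octets(rdns_text, last=254):
    """Return host octets with no PTR record in a /24 reverse-zone
    snippet, assuming each record line starts with the final octet."""
    used = set()
    for line in rdns_text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            used.add(int(fields[0]))
    return [n for n in range(1, last + 1) if n not in used]

# Illustrative zone fragment (hostnames invented):
zone = """\
1   IN PTR gw.example.wmnet.
2   IN PTR ms9.example.wmnet.
4   IN PTR ms11.example.wmnet.
"""
print(free_octets(zone, last=5))  # [3, 5]
```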
[18:27:12] PROBLEM - SSH on search32 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:30:03] RECOVERY - SSH on search32 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [18:35:31] !log completed metawiki centralnotice schema migration for fundraising [18:35:39] Logged the message, Master [18:38:27] PROBLEM - Lucene on search32 is CRITICAL: Connection refused [18:41:36] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [18:45:51] New patchset: Alex Monk; "(bug 37226) Fix portal talk namespace on bswiki. Also add comments with the bug number." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/18124 [18:51:30] PROBLEM - NTP on search32 is CRITICAL: NTP CRITICAL: No response from NTP server [18:55:14] New patchset: Catrope; "Set $wgVisualEditorParsoidPrefix correctly" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/18125 [18:55:33] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [18:55:33] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [19:08:08] RECOVERY - Lucene disk space on search32 is OK: DISK OK [19:09:59] ok, cmjohnson1, RobH, and notpeter [19:10:08] i finally got all my ssh stuff all sorted [19:10:09] feels good [19:10:20] cleaned up my .ssh/config, everything is smoooooth [19:10:33] what's next for getting OSes on these new machines? [19:10:39] i think I need management (IPMI?) access [19:10:44] I put in an RT ticket here: [19:11:03] https://rt.wikimedia.org/Ticket/Display.html?id=3406 [19:11:17] so an ops person has to do the actual install [19:11:26] so best to list how you wish the os partitioned and the base install stuff [19:11:38] then once we do the install we can do the initial puppet run to get you access [19:11:47] (i think) [19:12:04] i dunno if the network these are on can touch our install server [19:12:08] LeslieCarr: ^? 
[19:13:23] they should be able to [19:13:32] DHCP requests are forwarded on [19:13:43] if it doesn't respond let me know [19:13:44] ok cool [19:14:10] ottomata: so yep, detail in an rt ticket how you need them partitioned, then you can check in your gerrit changes for review on what puppet manifests to apply [19:14:55] I did that, but [19:15:05] I think the point of this is to teach me to do that part myself [19:15:08] I have done that, though [19:15:15] https://rt.wikimedia.org/Ticket/Display.html?id=3367 [19:15:48] CT suggested that I learn how to install the OS and setup puppet myself [19:16:00] once I get OS installed and network/ssh access, I probably already know how to do most of the rest [19:16:09] or can figure it out from wikitech [19:16:21] but at this point, I have zero idea of how to access machines with no OS [19:16:29] notpeter said IPMI (or KVM?) [19:16:33] but I need a management pw for that? [19:16:57] the mgmt password is only for folks with root [19:17:10] so unless you have been approved for cluster wide root access, no can do [19:17:16] its why we tend to put the OS on for folks. [19:17:28] i have root [19:17:33] oh, then thats fine. [19:17:41] ottomata: you are on fenari right now right? [19:17:45] i can write you the mgmt password [19:17:50] one sec [19:17:57] and walk you through how to do an install, with relevant wikitech docs [19:17:59] ok [19:18:01] cool! [19:18:02] on it now [19:18:49] ottomata: got it? [19:19:02] i think we can do the first link [19:19:11] is anyone on tampa management consoles ? 
[19:19:16] this is quite important :) [19:19:26] yup [19:19:26] got it [19:19:29] management access will go down during this first transition [19:19:48] i think the ones I'm about to use are in eqiad [19:19:57] cmjohnson1: ok [19:20:00] Ok, so all mgmt is broken down into a few different subnets [19:20:08] in tampa, its usually servername.mgmt.pmtpa.wmnet [19:20:15] in eqiad its servername.mgmt.eqiad.wmnet [19:20:34] now, are you looking to connect to the new c2100s? [19:20:48] (I advise we start on the analytics misc hosts over the analytics work hosts) [19:20:48] I think so? analytics1011-1027 [19:20:49] um [19:20:57] those are both [19:21:06] Leslie networked them up: [19:21:06] https://rt.wikimedia.org/Ticket/Display.html?id=3067 [19:21:10] so the lower end are c2100s, which are similar to setup as the rest of the dells, with a few differences [19:21:24] cmjohnson1: you happen to have the wikitech doc on how to install [19:21:30] i can never find the damned thing [19:21:37] zat this one? [19:21:37] http://wikitech.wikimedia.org/view/Build_a_new_server [19:21:58] yep [19:22:21] ok, so you need to do a few things, first is allocate DNS for them all [19:23:03] looks like 1001-1022 have that done [19:23:13] 1001-1010 are up and running [19:23:31] do they already have IPs assigned? 
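The mgmt naming convention just described (servername.mgmt.pmtpa.wmnet in tampa, servername.mgmt.eqiad.wmnet in ashburn) is mechanical enough to sketch, along with the kind of ipmitool invocation it enables; `-I lanplus`, `-H`, and `-U` are standard ipmitool options, but the username and the wrapper function are assumptions:

```python
def mgmt_fqdn(host, site):
    """Management-interface name per the convention in the log:
    <server>.mgmt.<site>.wmnet, where site is 'pmtpa' or 'eqiad'."""
    return "%s.mgmt.%s.wmnet" % (host, site)

def ipmi_power_status_cmd(host, site, user="root"):
    """Build (but do not run) an ipmitool command line against the
    host's mgmt interface; credentials would be supplied separately."""
    return ["ipmitool", "-I", "lanplus", "-H", mgmt_fqdn(host, site),
            "-U", user, "chassis", "power", "status"]

print(mgmt_fqdn("analytics1011", "eqiad"))  # analytics1011.mgmt.eqiad.wmnet
print(" ".join(ipmi_power_status_cmd("analytics1011", "eqiad")))
```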
[19:23:37] oh i guess they are just incremental in the subnet [19:23:40] so usually i end up allocating ip's, but these are internal, so if you wanna do it i can walk you thourhg it [19:23:42] yep [19:23:45] yeah totally [19:23:48] let's do it [19:23:50] just incremental, but we need to update it for up to 1027 [19:23:55] ok cool [19:24:18] so you wanna ssh into sockpuppet as root, and go into the /root/pdns_templates directory [19:24:46] of course, someone put vanadium in as 123 [19:24:49] =P [19:25:03] so now analytics1023 gets to be .124 ;] [19:25:04] doh [19:25:07] aww maaaaaannnnnn [19:25:08] i hate that [19:25:10] yargh [19:25:17] ottomata: oh wait [19:25:19] RobH: they are in their own vlan [19:25:21] that may be wrong [19:25:23] indeed [19:25:28] ok phew [19:25:47] LeslieCarr: there are no entries in the reverse file [19:25:58] so i have no idea what analytics in row c will be [19:26:03] cmjohnson1: found it thanks =] [19:26:18] RobH: for the vlan ? [19:26:26] RECOVERY - NTP on search32 is OK: NTP OK: Offset -0.01722049713 secs [19:26:32] LeslieCarr: I mean we have ; 10.64.21.0/24 - analytics1-b-eqiad [19:26:38] 10.in-addr.arpa:; 10.64.22.0/24 - analytics1-c-eqiad [19:26:39] but i need analytics1-c-eqiad [19:26:45] im just blind. 
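The two per-row analytics subnets named above (10.64.21.0/24 for analytics1-b-eqiad, 10.64.22.0/24 for analytics1-c-eqiad) make the row-membership question a simple containment check; a sketch using Python's ipaddress module, with the subnet names taken from the reverse file comments in the log:

```python
import ipaddress

# Per-row analytics subnets, as listed in the 10.in-addr.arpa comments.
ROW_SUBNETS = {
    "analytics1-b-eqiad": ipaddress.ip_network("10.64.21.0/24"),
    "analytics1-c-eqiad": ipaddress.ip_network("10.64.22.0/24"),
}

def row_for(ip):
    """Return which per-row analytics subnet an address falls in, if any."""
    addr = ipaddress.ip_address(ip)
    for name, net in ROW_SUBNETS.items():
        if addr in net:
            return name
    return None

print(row_for("10.64.21.122"))  # analytics1-b-eqiad (analytics1022)
print(row_for("10.64.22.123"))  # analytics1-c-eqiad (analytics1023)
```

This also makes the point in the discussion concrete: analytics1022 and analytics1023 are adjacent host numbers but sit in different row subnets.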
[19:26:53] and doing too much at same time =P [19:27:16] ottomata: so in the 10.in-addr.arpa file [19:27:18] look for ; 10.64.22.0/24 - analytics1-c-eqiad [19:27:32] then add in analytics1023-1027 [19:27:35] yeah ok i see it [19:27:54] then you add them into the wmnet file in alphabetical order under analytics1022 [19:28:05] then lemme look at them before you svn commit [19:28:10] easier that way ;] [19:28:29] all the dns update info is on wikitech http://wikitech.wikimedia.org/view/Dns [19:28:36] ok, i see up to 1022 under 10.64.21.0/24 - analytics1-b-eqiad [19:28:43] right, because they are in row b [19:28:49] the reverse file is broken up and ordered by subnet [19:29:06] every row in eqiad has its own subnet(s) [19:29:19] so analytics machines in row b have an internal subnet just for them [19:29:23] and same for row c [19:29:54] so 1001-1022 are in a different subnet than 1023-1027? [19:29:59] ottomata: if I am reviewing stuff you know or dont wanna deal with (either way) lemme know [19:30:00] yep [19:30:11] and they are all sectioned off from production for security [19:30:15] ok, naw cool, so far don't konw any of this [19:30:16] cool [19:30:17] since non ops will have sudo on them [19:30:25] so I need to add 1023-1027 into row c [19:30:26] (for a VERY limited time period) [19:30:45] and even though they are a different subnet [19:30:50] i might as well start 1023 at .123 [19:30:52] right? 
[19:30:53] yep, need to add them in the reverse under ; 10.64.22.0/24 - analytics1-c-eqiad [19:31:05] New patchset: Asher; "adding db10(49-50) to s1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18178 [19:31:10] you can, its a huge subnet [19:31:37] just know it will be 10.64.22.123, where analytics1022 is 10.64.21.122 [19:31:47] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18178 [19:31:55] right [19:31:57] annoying [19:32:00] if we didnt have a damned near limitless (for our purposes) number of internet ip's ;] [19:32:02] but at least a bit is consistent [19:32:10] internal i mean [19:32:12] not internet =P [19:32:14] yeah [19:32:35] so, what's the security issue here? these machines should all have equal access to each other [19:32:38] its just a convention? [19:32:50] to keep each row (um, is a row == a rack?) on a diff subnet? [19:33:07] kind of, each row has its own subnets [19:33:15] but those subnets may be linked in various ways [19:33:33] so in tampa, we just have our public subets, internal subnets, and mgmt subnets [19:33:52] we have a few public ones, as we have IP address (ipv4) in different subnets provided to us over the years [19:33:55] aye, i see that the mgmt names have already been given for all the analytics hosts, right? [19:33:58] so I don't need to mess with that? 
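The numbering convention being agreed on above (analytics1022 is 10.64.21.122, analytics1023 becomes 10.64.22.123) can be sketched in a couple of lines of shell. This is convention only — nothing enforces it except what gets written into the zonefiles:

```shell
# Convention from the walkthrough: analytics10NN gets last octet 100+NN,
# regardless of which row subnet the host lands in.
host=analytics1023
num=${host#analytics}           # strip the prefix -> 1023
octet=$(( num - 1000 + 100 ))   # 1023 -> 123
echo "10.64.22.${octet}"
```

So analytics1023 gets .123 in the row-c /24 even though the row-b numbering left off at .122 in a different subnet.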
[19:34:00] and networking there is far from redundant =P [19:34:13] right, mgmt is all done, onsite techs do that for you guys [19:34:17] cool [19:34:18] as we have to know the ip to program it [19:34:30] in ashburn, we have full redundancy on all networking [19:34:35] so mark got to plan it out quite a bit more [19:34:42] ok, saved [19:34:47] the result is row a has its public subnet, private subnet, and mgmt subnet [19:34:49] um, someone else has some uncommitted edits to that file [19:34:53] row b same, row c same [19:34:59] hm, ok cool [19:35:01] and also, labs has its own internal subnet per row [19:35:11] analytics has its own internal subnet per row (just b and c presently) [19:35:19] right, cool [19:35:32] my understanding is having root in a subnet lets you escalate some exploits or some such [19:35:35] but as for why the analytics can't be on the same subnet…just convention? [19:35:40] honestly mark is far more knowledgable about it [19:35:47] they are in different rows [19:35:54] and we dont allow subnets to span rows in eqiad [19:35:57] is that a technical thing or just a convention thing [19:35:58] due to the networking convention yep [19:36:00] both [19:36:01] aye ok [19:36:01] cool [19:36:05] its a convention due to technical concerns [19:36:13] ok cool, that's fine, thanks [19:36:22] we can deal, for the most part it won't matter, we'll just have to remember [19:36:24] as for those actual technical reasons, leslie and mark know better =] [19:36:33] i think there are some softwares we are going to use that want IPs rather than hostnames, but we can deal [19:36:37] you guys should always use fqdn when you can anyhow [19:36:38] but yea [19:36:39] yeah [19:36:40] totally [19:36:50] so anyway [19:36:53] i saved my changes [19:36:55] svn diff to check [19:37:01] but, someone else has uncommitted edits [19:37:04] not sure what is up with those [19:37:08] reviewing [19:37:16] well technical in order to not have to do spanning tree [19:37:16] frack subnets? 
[19:37:26] oh [19:37:26] LeslieCarr: prolly added them [19:37:28] that's me [19:37:29] sorry [19:37:30] please commit [19:37:32] they are all subnet comments, no problem =] [19:37:37] hold off on commit, reviewing [19:37:38] (i should call them fracking subnets) [19:37:40] ah right [19:37:41] cool [19:37:42] ok [19:37:45] ok to do in same commit? [19:37:48] or shoudl I take out my changes for a min [19:37:50] to do two commits [19:37:50] ottomata: you need to add the forward before you commit [19:37:51] ? [19:37:58] you can leave your changes [19:38:05] its just root commit to internal svn [19:38:05] ok, lemme look for forward.. [19:38:10] so you need to open wmnet [19:38:19] and add the analytics entires under the rest of them [19:38:35] ah cool [19:38:54] and here they are not divided by subnet, right? [19:38:55] in the file? [19:38:59] i can just add the entries in order? [19:39:00] you will be adding to the ;servers - listed alphabetically [19:39:14] they are broken up by either server or subnet (so servers in one area, mgmt in another) [19:39:21] line 1075 [19:39:25] is where you want to append beneath [19:39:26] yeah [19:39:29] that's where I am [19:39:37] and these are 10.64.22.0/24? [19:39:42] not .21. [19:39:43] right? [19:39:44] yep [19:39:47] k [19:40:23] by the time we finish today you will be able to do installs on any system ;] [19:40:28] cuz the c2100s are a pain in the ass [19:40:33] if you can do them the rest are easy. [19:40:34] yay! [19:40:37] ok [19:40:40] how's svn diff now [19:40:40] ? [19:41:19] looks good to me, you can svn commit and comment on why etc... [19:41:30] ok [19:41:39] once you svn commit on sockpuppet, you need to ssh as root into dobson, but pass your key, cuz it needs to do a sync to the other nameservers [19:41:50] i normally dont pass my key to further servers. [19:42:17] ok committed [19:42:30] dobson... 
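For the record, the two edits just described — PTR entries in the reverse file and A records in the wmnet forward file — would look roughly like this. Illustrative only: the real zonefiles are never quoted in the log, and the TTL/record formatting is assumed:

```text
; in 10.in-addr.arpa, under "; 10.64.22.0/24 - analytics1-c-eqiad"
123   1H  IN PTR   analytics1023.eqiad.wmnet.
124   1H  IN PTR   analytics1024.eqiad.wmnet.

; in wmnet, alphabetically under analytics1022
analytics1023   1H  IN A   10.64.22.123
analytics1024   1H  IN A   10.64.22.124
```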
[19:42:36] http://wikitech.wikimedia.org/view/Dns#Changing_records_in_a_zonefile [19:42:57] so on dobson (with key passed) you run the authdns-update script [19:43:03] dobson... [19:43:10] dobson is ns0.wikimedia.org [19:43:37] ah great [19:43:38] ok in [19:43:39] when you run that, it does a bunch of stuff [19:43:49] pulls langlists, generates the zones, etc [19:43:56] and syncs it out to the other nameservers [19:44:08] point of interest, if we have a site go down, like ESAMS or something [19:44:10] ok, does it svn update? [19:44:22] scripts related to this one migrate the load off them [19:44:26] yep [19:44:28] ok [19:44:32] should I do it? [19:44:35] yep =] [19:44:45] k its going [19:44:52] the paranoia is why i checked your work [19:45:00] its best to have someone do that, a bad typo in here can take things down. [19:45:26] you can tease both mark (I wouldnt recommend it) or Ryan_Lane (totally give him hell) as they both have taken things down via dns updates [19:45:29] great, and all 3 ns show the change [19:45:43] coolness, congrats, you are doign better than Ryan_Lane [19:45:48] * RobH is just being a shit ;] [19:45:57] haha, nice [19:45:59] Isn't part of Ryan's job breaking shit *looks at labs* [19:45:59] :D [19:46:04] oh snap [19:46:15] great cool! [19:46:18] so, we ahve DNS [19:46:30] yep, now you can pull the MAC addresses from each device, and update the dhcp lease files [19:46:39] so, these are two platforms, so you get to learn two ways to do it [19:46:54] two platforms, in that they are two different hardwares [19:46:54] ? [19:47:10] hrmmm...... [19:47:20] hm? [19:47:21] hold on, i think the ip info may be wrong for the older entries [19:47:24] uh oh [19:47:26] not something you did, checking [19:47:29] lk [19:47:50] yea, analytics1011+ is in row C [19:47:57] but they are in the files for row B [19:48:03] yeah, that makes more sense [19:48:06] ottomata: so you wanna move those entries and update or shall i? 
[19:48:11] 1001-1010 have been around a while [19:48:12] i'll do it [19:48:19] the reverse file is an easy change, just move 1011+ [19:48:29] and then replace the 21 with 22 in the wmnet file for them [19:48:32] you viming? [19:48:37] sorry, out now [19:48:48] meant to view, not vim, old habit [19:49:08] Damianz: heh [19:49:11] Damianz: I'll stab you [19:49:14] I'll stab you goof [19:49:17] *good [19:49:19] so IPs in forward need to go to .22. [19:49:22] for 11-22 [19:49:24] right? [19:49:26] yeah [19:49:29] sorry, you just said that [19:49:29] cool [19:49:34] ottomata: yep [19:49:53] Ryan_Lane: I'll take ryan stabs over mark stabs ;) [19:49:56] now, if you are 100% sure of your dns changes, you dont need to have someone review, but its recommended [19:50:11] i think I will take review for a while thank you [19:50:31] Ryan_Lane knows i can give him hell about dns outage since its only rivaled by an outage i caused years ago [19:50:46] hehe [19:50:49] dns outage = wonky stuff with folks who dont follow proper negative time to live [19:50:54] ok svn diff pluhlease :) [19:51:01] i did an apache redirect to send all pages to enwiki landing page including enwiki landing page [19:51:08] infinite loop that cached into squids [19:51:13] was not pretty ;_; [19:51:16] eeeek fun [19:51:28] yeah, everybody has those stories, ja? [19:51:42] at CouchSurfing once, shortly after I started messing with mysql replication for the first time (we had never used it before) [19:51:52] ottomata: looks good to me [19:51:54] i pointed the app master at the slaves (and the slaves were not running readonly) [19:51:56] THAT was nasty [19:52:06] ok cool, committing [19:52:28] yea, you really are part of the team when you take down major services with unplanned downtime. [19:52:36] it used to be enwiki [19:52:41] RobH: are there any ciscos available for me in eqiad? [19:52:55] binasher: nope.
[19:53:01] not unless we steal them abck from something [19:53:01] i have pc1-3 in pmtpa [19:53:06] ok, so committed, runing authdns-update on dobson? [19:53:08] ja? [19:53:16] (i'm being super careful right now and double checking everything before I do it) [19:53:18] (with you) [19:53:21] yep, but the ciscos in eqiad were allocated a long time ago for analytics and labs, where in tampa there is no analytics [19:53:23] ok, i'll work on stealing [19:53:34] ottomata: yep, so run authdns-update [19:53:37] also, admin log when you do that [19:53:41] just to be safe [19:53:45] oh, [19:53:49] lemme try [19:53:52] its ok that you didnt before, no big deal [19:54:01] !log updating DNS entries for analytics1011-1027, they are all in row-c [19:54:09] Logged the message, Master [19:54:12] coooool [19:54:13] hehe [19:54:18] just post push, always dig, pdns has a known issue where the slaves (ns1/2) will sometimes hang on reload [19:54:20] and crash out [19:54:28] ok [19:54:35] it happens like 5% of the time or less. [19:54:43] but it will always happen the time you forget to dig ;] [19:54:54] nagios then catches it, and its no big deal [19:55:02] but easier to just manually check anyhow [19:55:12] cool, checked analytics1015, all say ns .22.115 [19:55:28] ok, so now those are set, you need to get all the mac info for them [19:55:37] ottomata: you have racktables access right? [19:55:49] https://racktables.wikimedia.org/index.php?page=rack&rack_id=68 [19:55:51] naw [19:56:09] oh, you need it if you have root [19:56:12] let me get you setup. 
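The "always dig after a push" advice above, concretely — a sketch assuming the three authoritative servers are ns0/ns1/ns2.wikimedia.org (dobson is ns0 per the log, ns1/ns2 are the pdns slaves that occasionally hang on reload), queried from somewhere that can reach them:

```shell
# Confirm all three nameservers picked up the new records;
# a slave that hung on reload will answer stale or not at all.
for ns in ns0.wikimedia.org ns1.wikimedia.org ns2.wikimedia.org; do
    echo "== $ns =="
    dig +short analytics1015.eqiad.wmnet @"$ns"   # expect 10.64.22.115
done
```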
[19:57:04] k [19:57:57] ottomata: ok, i just emailed you a temp pass for it [19:58:00] login and change it [19:58:12] then you can see the rack layouts, which is important in that it will tell you what model server you are using [19:58:22] which tells you what wikitech page has the howto on using its mgmt [19:59:27] ottomata: so you can login and change that when you have a moment, for now though i can tell you that 1011-1022 are dell poweredge c2100 [19:59:42] and 1023-1027 are poweredge r310 [19:59:58] ok, pw changed [20:01:08] So, on wikitech, we have http://wikitech.wikimedia.org/view/Platform-specific_documentation [20:01:14] hm, how do I know which rack they are in? [20:01:18] (I'm clicking around) [20:01:21] in racktables, you can search in top right [20:01:28] serach by servername works fine [20:01:31] AH! [20:01:32] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [20:01:32] they are named [20:01:33] so cool [20:01:34] (it can search any of the fields) [20:01:44] COOOL [20:02:01] this is the tool we use for the majority of our rack planning [20:02:15] but thats only useful for myself, chris, mark, folks who rack and plan rack layouts [20:02:23] aye [20:02:23] for the rest of you its useful in that you can pull the hardware type from it [20:02:30] which lets you know how to interface [20:02:33] !log streaming a hotbackup of db1017 to db1050 [20:02:33] so on the c2100s [20:02:39] http://wikitech.wikimedia.org/view/Dell_PowerEdge_C2100 [20:02:41] Logged the message, Master [20:02:49] these are interesting, as dell servers, they do NOT have DRAC [20:02:54] which is dells remote access controller [20:02:58] they just have plain old IPMI [20:03:08] which means you either have to know how to use the ipmi_tool [20:03:16] or you can do the easy thing and use my ipmi_mgmt script [20:03:28] that script lives in two places, in sockpuppet and on 'iron' [20:03:33] iron is the ops bastion host [20:03:38] ok [20:03:41] that for now 
only holds that [20:03:43] root login or as me? [20:03:50] but eventually it will migrate over to what we use daily [20:03:52] either. [20:03:57] well, root works [20:04:02] i dunno if normal user does, need to cehck [20:04:11] yep, normal user should work [20:04:23] port 22? [20:04:25] but doesnt have acceess to the damned script [20:04:28] ottomata: so root@iron [20:04:31] =P [20:04:33] ah [20:04:44] i'm hanging on trying to connect [20:05:01] ssh'ing into iron? [20:05:03] yeah [20:05:06] debug1: Connecting to iron.wikimedia.org [208.80.154.151] port 22. [20:05:08] hrmm.... [20:05:23] lemme force an update, your key is in the root keys list right? [20:05:29] yeah [20:05:37] but its not getting as far as even to try the key [20:05:40] just hanging on trying to connect [20:05:47] thats odd, it works for me... [20:05:53] but, you can also just root@sockpuppet [20:06:04] telnet just hangs on port 22 too [20:06:04] telnet iron.wikimedia.org 22 [20:06:04] Trying 208.80.154.151... [20:06:06] ok [20:06:08] sockpuppet it is [20:06:48] hrmm, sockpuppet isnt the updated script, lemme pull update [20:06:51] k [20:08:27] bleh, its fine [20:08:31] i dunno why i thought it wasnt [20:08:38] so if you just type ipmi_mgmt [20:08:42] it should give you a bit of help [20:09:02] so for the analytics hosts, to get the mac info for say, analytics1011 [20:09:02] ay eok [20:09:16] ipmi_mgmt analytics1011.mgmt.eqiad.wmnet getsysinfo [20:09:27] sorry, sysinfo [20:09:35] my own help file is right there and i ignore it ;] [20:09:54] hmm, pw [20:10:00] mgmt password you now have [20:10:04] oh right! [20:10:16] you can also store it in your shell environment for the session [20:10:25] if you read the script, it shows what the call is [20:10:26] usage: delloem [option...] 
[20:10:26] commands: [20:10:26] mac [20:10:26] lan [20:10:26] powermonitor [20:10:27] For help on individual commands type: [20:10:28] delloem help [20:10:35] ok cool [20:11:08] hrmm [20:11:10] well, damn [20:11:16] looks like that doesnt include the mac info [20:11:20] lemme check the web interface real fast [20:11:29] (drac sysinfo does, ipmi doesnt, annoying!) [20:11:38] can I get a console and interrogate it? [20:11:45] yep, but it takes longer than http [20:11:50] the post on these is like 5 minutes [20:11:52] its insane long. [20:11:53] as in laggy? [20:11:57] the console? [20:12:02] just takes a long time to spin disks and check memory [20:12:15] no difference from physical or serial console, slow no matter what [20:12:43] oh, hm, not sure I understand, we'd have to boot them? [20:12:46] that's what takes a while? [20:12:48] you mean? [20:13:34] yep [20:13:45] when you say 'web interface', you mean racktables? [20:13:47] so the easiest (relativly) way to get the mac from the c2100 [20:13:58] nope, i mean the mgmt interface has a web interface =] [20:14:12] or you can figure out ipmi_tools exact command to pull the mac info [20:14:13] java applet? [20:14:17] nah, http [20:14:19] oh really? [20:14:20] cool [20:14:24] but you ahve to have proxy setup for the cluster [20:14:29] aye [20:14:30] ok [20:14:33] which i recommend anyhow, as it lets you use web interfaces on things like labs [20:14:37] yeah [20:14:45] lemme find the wikitech doc on this i wrote [20:14:45] i fire them up temporarily when I need them [20:14:51] it uses firefox plus a plugin called foxyproxy [20:14:51] on how to get mac? 
[20:15:01] oh yeah, i think the labsconsole has decent instructions [20:15:31] https://labsconsole.wikimedia.org/wiki/Help:Access#Accessing_web_services_using_a_SOCKS_proxy [20:15:37] http://wikitech.wikimedia.org/view/Proxy_access_to_cluster [20:15:51] the wikitech one is basically os x + firefox + ssh = access [20:16:03] i have it setup basically for ops folks, plus the rules you need [20:16:14] you can of course simplify it and just tunnel ALL traffic via the datacenter [20:16:18] but its annoying that way [20:16:47] ottomata: so yea, didnt expect to get loaded down with this much eh? ;] [20:17:01] bring it oooooooon [20:17:10] i'd rather figure out the mac from the cli anyway, if I can [20:17:12] looking into that... [20:17:26] well, the command line arguments are decent [20:17:41] i used to run a single line that routed localhost:8080 into whatever hard coded mgmt interface i specified [20:17:48] but that required me to fire it off on a host by host basis [20:17:56] lemme see if i have that command handy [20:18:15] oo [20:18:17] that sounds nice [20:19:19] ssh -L 8080:internal ip of mgmt interface:80(443) root@fenari.wikimedia.org [20:19:29] then in your browser you can localhost:8080 and it resolves [20:19:45] but then you have to open and close it per connection, the foxyproxy setup is more dynamic [20:20:07] so if you expect to do a lot of installs and the like, i recommend the foxyproxy setup, but the one liner works everywhere [20:20:12] and without more software =] [20:20:12] can I use mgmt name rather than IP?
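RobH's one-liner above, written out a little more explicitly. `<mgmt-ip>` is a placeholder for the internal management address — your laptop can't resolve `*.mgmt.eqiad.wmnet` names, hence the raw IP — and the C2100's interface is https with a self-signed cert, so forwarding 443 is the useful case:

```shell
# Forward a local port to one mgmt web interface via the bastion,
# one host at a time (the foxyproxy/SOCKS setup is the dynamic version).
ssh -L 8443:<mgmt-ip>:443 root@fenari.wikimedia.org
# then browse to https://localhost:8443/ and accept the self-signed cert
```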
[20:20:24] nope, cuz your local host will have no idea how to resolve that fqdn [20:20:46] when i setup foxyproxy, its rules on routing take that fqdn and say to pass it along via ssh [20:21:06] (in my experience) [20:21:17] ah right [20:21:27] also, you need https port on these [20:21:33] the c2100 uses https self signed cert [20:21:37] (so have to accept cert, etc) [20:22:08] ottomata: before you do all that though im checking the web mgmt interface to ensure it has the info we want [20:22:14] i should have done this before. [20:22:32] yay got it [20:22:36] haa, [20:22:37] well shit.... [20:22:38] ok [20:22:42] im not seeing it, i hate these things [20:22:45] hate. [20:22:55] what's the username for mgmt? [20:22:59] root [20:23:08] so the only way to get the mac may be to post and go into bios [20:23:17] cmjohnson1: Do you recall how we got the macs for c2100? [20:23:53] ottomata: sigh, annoyyyying, sorry to lead you down the web path [20:24:03] keep those notes, you may need them in the fugture, but seems you can stick with ipmi [20:24:09] you need to send a reboot, boot into BIOS [20:24:13] then copy down the mac from there [20:24:35] alternatively you could just tell it to PXE and try to grep its mac out when it hits, but thats not very useful (we have lots of dhcp hits) [20:24:49] use the one time boot option bios from the ipmi script [20:25:06] 'bootbios', then powercycle, then console [20:25:22] that will boot it and tell it on next boot enter bios, where you can then pull the mac out of it [20:25:37] once you ahve the mac addresses for them all, we can update the lease files for the install [20:25:38] ok, explain me something…is this machine currently booted? or just sitting there? 
[20:25:47] its just sitting there, more than likely powered up [20:25:59] but powercycle tends to power up (if i recall correctly) even if they arent powered up [20:26:03] hmm, ok ok ok [20:26:13] and we tend to power up servers when we rack them [20:26:18] to balance the power phases in the rack out [20:26:21] ah ok [20:26:22] so [20:26:26] so they tend to sit and spin with no OS [20:26:36] why do I powercycle after bootbios? [20:26:46] because if its booted, it wont reboot [20:26:53] oh [20:26:54] the bootbios is a one time, next boot option [20:26:54] bootbio [20:26:57] ahhh [20:26:57] ok [20:26:58] cool [20:27:11] ok, using ipmi_mgmt from sockpuppet to do that [20:27:20] i'm going to do it in a screen if you wanna look over my shoulder [20:27:56] im doing it on analytics1012 to confirm steps [20:28:00] k [20:28:18] (oh, btw, I looked in the ipmi_mgmt script for env pw setting, didn't see anything...) [20:29:03] hrmm, yea, i dont see it, so i must have seen it as part of the ipmi_tool calls [20:29:11] lemme see [20:29:12] aye ok [20:29:14] s'ok [20:29:20] can I console directly after powercycle? [20:29:23] or do I need to wait a bit? [20:29:23] yep [20:29:27] you can do immediately [20:29:37] it will start scrolling once things are posting [20:29:42] mine is already starting post [20:29:52] ahh there it goes! [20:30:03] oh this is so much better than the crap I had at CS [20:30:11] the dell DRAC systems are easier to deal with, richer feature set in the out of band mgmt [20:30:13] java applet was the only IPMI console I could get [20:30:18] but this isnt bad [20:31:00] hrmm, looking for mac [20:31:26] PCI? [20:31:43] ah there it is [20:31:46] yep [20:31:46] i see 2 NICs with MACs [20:31:51] which one is which?
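The three-step dance being run above, collected in one place for reference (`ipmi_mgmt` is the local wrapper script on sockpuppet/iron, not a stock tool):

```shell
# One-time "boot into BIOS" to read the NIC MACs off a C2100:
ipmi_mgmt analytics1012.mgmt.eqiad.wmnet bootbios    # flag the NEXT boot to enter BIOS setup
ipmi_mgmt analytics1012.mgmt.eqiad.wmnet powercycle  # force the reboot (powers on if off)
ipmi_mgmt analytics1012.mgmt.eqiad.wmnet console     # attach immediately and watch POST
```

bootbios is one-shot, which is why the powercycle has to follow it: a machine that is already sitting booted won't re-enter BIOS on its own.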
[20:31:51] so nic1 [20:31:53] k [20:32:01] oh, so you dont have to do analytics1012 [20:32:02] * NIC1 Mac Address [04-7D-7B-A5-E6-94] ** * [20:32:04] danke [20:32:17] so you need to do those steps (dont need to bother to exit bios if you dont wanna) [20:32:23] to d/c you can usually ~~. [20:32:31] but ipmi is odd and sometimes just kills your entire ssh [20:32:34] ha, ok [20:32:38] sorry ~~~. [20:32:51] you may also have to 'reset' it due to color [20:32:53] apple key + r [20:33:03] (soft terminal reset for color and type issues) [20:33:19] once serial console sends odd colors, terminal tends to stick to them without the reset [20:33:29] now, once you reset these, they will boot from disk [20:33:29] hm [20:33:40] so once you have all the macs, you will use the bootpxe command in ipmi [20:33:50] wait hang on, still trying to figure out how to exit [20:33:53] but thats once you have the macs, updated lease files, and updated installer files for auto partitioning [20:33:57] no problem [20:35:02] damn, it exited fine for me the first time [20:35:08] tried to do it again to confirm, now its not working [20:35:13] uhhh, i think I got out of bios [20:35:18] esc [20:35:25] but my terminal seems stuck [20:35:28] with a single black line [20:35:41] yea, welcome to annoying ipmi [20:35:43] hah [20:35:44] im stuck now too [20:35:49] can I just close the window and re-login? [20:35:53] yep [20:35:55] we don't care, we will boot these machines again later [20:35:56] k [20:36:03] just annoying to have to relog [20:36:06] ok, let's both go ahead and proceed with 11 and 12 [20:36:07] ? [20:37:01] we can if ya wanna [20:37:09] hm, do I need to do the rest of the bios instructions on the C2100 page?
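One small gotcha worth noting: the BIOS prints the MAC dash-separated and upper case, while dhcpd host entries use the usual colon-separated form (lower case by convention). A trivial conversion:

```shell
# BIOS shows e.g. [04-7D-7B-A5-E6-94]; dhcpd entries use 04:7d:7b:a5:e6:94
bios_mac='04-7D-7B-A5-E6-94'
dhcp_mac=$(printf '%s' "$bios_mac" | tr '-' ':' | tr 'A-Z' 'a-z')
echo "$dhcp_mac"
```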
[20:37:11] http://wikitech.wikimedia.org/view/Dell_PowerEdge_C2100 [20:37:31] nope [20:37:36] ah no, these were probably done by Leslie (or whoever) [20:37:37] k [20:37:39] we do the basic mgmt setup, password set, so on [20:37:45] those were done by me [20:37:51] or maybe chris when he was in town helping me [20:37:59] whoever physcially racks them has to do that stuff [20:38:09] ah ok, cool [20:38:34] so you are in the dhcp section of the build a new server [20:38:40] (fyi, I need to leave in 35 minutes) [20:38:43] the switch ports were done by leslie already i assume [20:38:46] ok [20:38:46] yeah [20:38:48] i think they were [20:38:48] ottomata: oh yea, we wont get this done today ;] [20:38:54] but we can resume tomorrow no problem [20:39:00] can we get one of them done today? [20:39:07] shoudl we stop here or is there another good stopping point coming up? [20:39:09] depends on the partman setup [20:39:13] ahhhhh right [20:39:16] well, we should keep going until the partman [20:39:19] ok cool [20:39:22] let's keep going then [20:39:23] thats going to decide if this is painless or painful. [20:39:32] plus you dont need to check in your changes today [20:39:33] should we switch to PM, we might be flooding ops here? [20:39:47] nah, no one else is chatting, and its not bad background info for other ops [20:39:49] ok cool [20:39:59] most of them should know this, but to some of them its very new [20:40:06] so it doesnt hurt [20:40:09] ok [20:40:22] So, these are dells, which communicate at 115200 on com2 [20:40:35] so this part is in your local puppet repo [20:40:47] local? [20:40:50] you are all setup to checkin gerrit changes for ops right? 
[20:40:53] yeah [20:41:00] RECOVERY - Lucene on search32 is OK: TCP OK - 0.001 second response time on port 8123 [20:41:00] ok, yea, your local git copy is all [20:41:05] ah ok [20:41:05] yeah [20:41:07] of origin/production [20:41:09] ohhhh [20:41:11] but my ssh key chanGED [20:41:14] ahh lemme fix gerrit [20:41:16] ahhh [20:41:17] heh [20:41:26] well, sounds like you need to connect to gerrit and give it your new key [20:41:46] on the gerrit interface its under your settings [20:42:01] ok [20:42:03] yeah [20:42:05] totally [20:42:05] done [20:42:05] so while you do that, i will list what you do so you have the info [20:42:11] i'm ready [20:42:19] in files/dhcpd [20:42:32] you will be editing linux-host-entries-ttyS1-115200 [20:42:45] the S1 is serial port 2 (s0 is com port 1) [20:43:00] you can see we have S0-115200 and S0-9600 as well [20:43:04] and S1-57600 [20:43:14] different server types require different speeds and settings for that [20:43:29] on these its all in linux-host-entries.ttyS1-115200 [20:43:36] this is dhcp over serial? [20:43:41] which you know just by knowing what machines do what, or by asking [20:43:54] nah, but the dhcp file also has the serial redirection settings for ubuntu is all [20:44:03] so it parses that info from the lease file name [20:44:15] we could have done it differently i suppose [20:44:19] serial redirection settings... [20:44:34] i think i'm confused as to what dhcp has to do with serial ports? [20:44:35] by default ubuntu will just stream all its console to the physical console port [20:44:52] well, the dhcp server tells the host where the install files are [20:45:00] ahhhh [20:45:01] hm [20:45:02] hm [20:45:13] then the installer files handed off are based on what file it lives in [20:45:28] oh crazy [20:45:36] if its in S0-115200 it tells the ubuntu installer to direct its console output on serial com port 1 at 115200 baud [20:45:40] installation via dhcp? 
[20:45:44] basically [20:45:50] well, DHCP fires off PXE Boot [20:45:53] which is the installer [20:45:59] right [20:46:15] but basically, dhcp is telling the hosts: HEY, go boot and install using X [20:46:16] ? [20:46:37] does dhcp serve it the install file? [20:46:38] well, if you look in dhcp.conf you can see what its telling it [20:46:43] it says hey, this is your ip, gateway, etc [20:46:46] and this is the install server [20:46:52] connect to this via tftp and load what it says [20:46:52] ahhhh [20:46:56] hm [20:46:57] cool [20:47:08] so PXE uses TFTP to launch the ubuntu installer [20:47:16] pretty standard for most enterprise level solutions [20:47:29] cool, never worked at enterprise level :p [20:47:31] most datacenters i know do the same thing for hosted solutions [20:47:33] but coooool [20:47:38] ok [20:47:39] so [20:47:40] yea imagine putting a cd in each of these =P [20:47:47] aye yea [20:47:53] it quickly becomes worth the overhead of setup [20:48:08] i need to add entries for an1011 and an1012 [20:48:11] with the macs we just found [20:48:23] yep, but dont commit change yet, there are a few others to make for the same two servers [20:48:28] ok [20:48:35] the entries don't have an installer file listed [20:48:37] that ok? [20:48:43] the default install is precise [20:48:50] as long as thats ok with you, then you are good =] [20:48:59] usually we dont list a distro unless we have to [20:49:11] and we change the default as we go along [20:49:17] but its always an LTS version [20:49:30] so we tend to cycle every two years, behind by about half a year. [20:49:57] yeah ok [20:49:58] that's good [20:49:59] So once you have the lease files updated, we can move on to partman [20:50:04] at this point, welcome to hell. [20:50:05] ok ready [20:50:09] because its about to get confusing as shit [20:50:13] yay [20:50:14] hm [20:50:18] maybe this is stopping time?
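What one of those lease-file entries presumably looks like — the actual contents of linux-host-entries-ttyS1-115200 never appear in the log, so this is the standard ISC dhcpd host-block shape, using the NIC1 MAC read out of analytics1012's BIOS earlier:

```text
host analytics1012 {
    hardware ethernet 04:7d:7b:a5:e6:94;
    fixed-address analytics1012.eqiad.wmnet;
}
```

The "go boot and install using X" part is plain PXE plumbing: dhcpd's `next-server` and `filename` options point the host at the TFTP server and bootloader, which in turn fetches the Ubuntu installer.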
[20:50:25] well, i should show you the file [20:50:27] ok [20:50:32] then you can feel free to pore over it at your whim [20:50:37] ok cool [20:50:50] look in files/autoinstall [20:50:55] there are a few files you need in there [20:51:09] netboot.cfg needs to be updated to list your hosts, along with what partman script to partition the disks from [20:51:28] it also needs an update to the subnet for analytics-c in it [20:51:34] which we can do tomorrow, dont worry about it too much [20:51:42] but then you have to have a working partman script on how to partition the disks [20:51:47] this is the part that is a bit hard [20:51:51] o they are in there [20:51:51] analytics101[1-9]|analytics102[0-9]) echo partman/analytics-dell.cfg ;; \ [20:51:57] ohhhhhh.... [20:52:02] you are lucky. [20:52:06] hah, yay! [20:52:08] wait, prolly not [20:52:13] those are for the cisco systems ;] [20:52:18] different disk setup. [20:52:20] i don't see a listing for the C [20:52:24] well no, there are both [20:52:27] yep, cuz it doesnt exist [20:52:28] analytics100[1-9]|analytics1010) echo partman/analytics-cisco.cfg ;; \ [20:52:28] analytics101[1-9]|analytics102[0-9]) echo partman/analytics-dell.cfg ;; \ [20:52:35] oh... true [20:52:42] damn, you are lucky [20:52:46] someone did that work already [20:52:52] partman is a pain in the ass [20:53:02] but if you, in the future, need to hack at it [20:53:10] myself, mark, daniel, and peter know about it [20:53:12] ok cool [20:53:20] so this is a good stopping point then [20:53:28] knowing tomorrow wont be quite so painful [20:53:32] i remember daniel and peter pulling hair out when doing the partman for ciscos [20:53:33] for us [20:53:37] ok [20:53:37] oh yea, its hell [20:53:41] should I add the row c line?
[20:53:47] right now just this [20:53:47] 10.64.21.255) echo subnets/analytics1-b-eqiad.cfg ;; \ [20:53:48] yep, need to add in that subnet [20:53:56] and add in the config file [20:54:09] should be easy enough to figure out looking at the existing file for row b [20:54:16] 10.64.22.255) echo subnets/analytics1-c-eqiad.cfg ;; \ [20:54:19] ? [20:54:33] yep, plus adding the analytics1-c-eqiad.cfg file [20:54:36] oo [20:54:42] ah [20:54:43] k [20:54:47] i would copy the row b then edit it [20:55:11] cool, yeah [20:55:18] everything the same cept for thegateway? [20:55:36] d-i netcfg/get_gateway string 10.64.22.1 [20:56:16] same cept for gateway yep [20:56:26] cool [20:56:28] then you git -add the file [20:56:30] yup [20:56:31] and shoudl be cool [20:56:43] should I commit/push, or is there more stuff that will go along with this tomorrow? [20:56:54] well, you will have to add the rest of the hosts to the lease file [20:57:02] so you may just wanna pull that info and add before pushing [20:57:04] but up to you [20:57:06] ok [20:57:13] oh, the rest of them [20:57:14] yeah with macs [20:57:16] ok yeah [20:57:20] when I get up tomorrow i'll do that [20:57:22] recall that the c2100 ends at 1022 [20:57:30] so 1023-1027 will need to be a different partman script [20:57:33] oh right, which means maybe a diff way to find MAC? [20:57:37] oh [20:57:44] you can add to lease file, but they need to be a different line in netboot.cfg partman declarations [20:57:48] do you need MACs on Cisco? [20:57:56] no, we ahve those, all the new machines are dells [20:57:59] mutante: working c2100s, not ciscos [20:58:04] alright [20:58:05] New patchset: J; "Add videoscaler class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16654 [20:58:31] are there different # of disks on the other dells? [20:58:42] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16654 [20:59:08] ottomata: the c2100s are your work horses, 12 disks [20:59:21] analytics1023-1027 are your misc mgmt hosts [20:59:28] well, analycis misc mgmt hosts [20:59:30] you know, the partman might be the same [20:59:32] so they just have dusl small disks [20:59:34] we are barely partitioning these at all [20:59:39] just root / really [20:59:41] only using two disks? [20:59:43] the rest are unpartitioned [20:59:44] yeah [20:59:47] we are doing that manually later [20:59:52] then it may work, but may also wanna use a misc partman [20:59:53] cause we don't know the exact setup we want yet [20:59:58] yeah [20:59:58] they are dual 500GB disks if i recall [21:00:15] the 500gb version of the file isnt working [21:00:23] cuz i just added it and its tryuing to setup raided swap [21:00:26] im haivng issues [21:00:36] though you can just use the 250gb version and have slightly less space [21:00:44] i would hold off on 1023-1027 until last though [21:00:52] (i may have the partman working by then) [21:00:56] ok cool, will do the others first then [21:01:06] ok, just so I know, once this is pushed and merged [21:01:18] then I use ipmi_mgmt to pxe boot? [21:01:20] or something? [21:01:26] yep, you tell it bootpxe [21:01:28] and kablam, OS is installed? [21:01:31] then when it reboots it loads the installer [21:01:39] you can console to watch and confirm it runs wwithout prompts [21:01:42] cool [21:01:42] so [21:01:47] bootpxe, powercycle, console [21:01:50] (sometimes it prompts for disk overwrite) [21:02:02] yep [21:02:05] coooool [21:02:17] fantastic, man that will be so cool if I can get those up and running tomorrow all by my lonesome [21:02:18] heheh [21:02:38] heh, they arent in service, knock yourself out =] [21:02:47] yeahhhhhh [21:02:48] cool [21:02:52] ottomata: you got the "Build a new server" wikitech page yet? 
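Backing up to the subnet file from a few minutes earlier: per the chat, subnets/analytics1-c-eqiad.cfg is a copy of the row-b file with only the gateway changed. Something like the following — only the gateway line is actually quoted in the log; the netmask follows from the /24, and the other keys are standard debian-installer netcfg preseed settings assumed to match the row-b file:

```text
# subnets/analytics1-c-eqiad.cfg -- copy of analytics1-b-eqiad.cfg,
# gateway swapped for the row-c subnet
d-i netcfg/get_netmask     string 255.255.255.0
d-i netcfg/get_gateway     string 10.64.22.1
d-i netcfg/get_nameservers string <same resolver as the row-b file>
```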
[21:02:55] yup
[21:02:58] great
[21:03:02] RobH has been giving me a huuuuuge run down all afternoon
[21:03:05] super helpful
[21:04:50] man yeah, amazing, thanks for the help RobH
[21:05:00] very welcome, glad to get others doing installs
[21:05:01] i'm going to head out purty soon, i'm sure I'll have more Qs for you tomorrow
[21:05:04] yeah, this is fun!
[21:05:09] especially since I didn't have to write partman!
[21:05:10] hehe
[21:05:17] indeed, that is the win of the day.
[21:05:21] you know, when I was playing with the ciscos
[21:05:29] i was doing a ton of partitioning
[21:05:31] playing with different setups
[21:05:36] repartitioning for cassandra, hdfs, etc.
[21:05:43] i wrote a bunch of fdisk scripts to do it
[21:05:49] was pretty hacky
[21:05:50] but it worked!
[21:06:18] ottomata: analytics-cisco.cfg btw
[21:06:40] yeah, this was post install though
[21:06:45] but it skips the sda/sdb
[21:06:47] gotcha
[21:06:47] can I use partman to do non-root partitions while booted?
[21:07:21] (this was my crazy fdisk stuff: https://github.com/wmf-analytics/kraken/blob/master/bin/setup-scripts/disks/cassandra.sh )
[21:07:38] yea, it should
[21:08:09] coool
[21:08:29] ok, maybe when I become a partman master I will do that instead of my fdisk < ok time to go
[21:08:41] thanks so much all!
[21:10:02] have a nice party =]
[21:11:45] enjoy
[22:07:56] Yay thanks
[22:08:22] Will puppet run on it automatically?
[22:10:08] OK, could you run puppet then?
[22:10:26] I tried SSHing into it as root, but it rejects my key, because it doesn't know about my key, because puppet hasn't told it about me yet
[22:14:22] OK
[22:34:23] Whee!
[22:34:24] Thanks
[22:40:43] Dang it, I broke the Parsoid puppet manifest, but that's my own fault
[22:42:08] New patchset: Catrope; "Fix paths in the Parsoid startup script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18193
[22:42:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18193
[22:42:54] Ryan_Lane: Easy change: https://gerrit.wikimedia.org/r/18193
[22:43:04] With that one I can move Parsoid onto wtp1 and give cadmium back
[22:43:24] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18193
[22:43:42] merged
[22:43:57] Thanks
[22:48:47] PROBLEM - Router interfaces on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/1/0: down - Core: cr2-eqiad:xe-5/2/1 (FPL/Level3, CV71028) [10Gbps wave]
[22:49:10] * jeremyb waves LeslieCarr
[22:50:33] grrr
[22:50:39] of course, i go to the bathroom and the network dies
[22:52:00] okay, we're back on network code yellow
[22:52:12] grrrr
[22:52:27] i need a traffic light in front of my desk
[22:53:08] can be arranged
[22:53:15] lights in barnstar shpae
[22:53:16] shape*
[22:53:40] :)
[22:53:43] ok that's awesome
[23:03:00] LeslieCarr: http://exchange.nagios.org/directory/Addons/Notifications/*-Visual-Notifications/NAmpel--2D-Nagios-Ampel-Project/details
[23:05:12] :)
[23:07:02] nicer and USB: http://translate.google.com/translate?langpair=de|en&u=http://shop.netways.de/alarmierung/nagios-ampel/nagios-usb-ampel-medium.html
[23:07:21] oh that is so cute
[23:08:37] "At least kernel version 2.6.8"
[23:08:38] Haha
[23:12:38] <^demon> I kinda like the LEDs in this one better: http://www.cleware.net/produkte/p-usbampel-E.html :)
[23:13:02] <^demon> Hahaha. "The usage of the USB-Temp is strictly prohibited when the failure of the sensor will harm people. The usage in medical applications of any kind request the written permission of the Cleware GmbH."
[23:13:33] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[23:23:46] New patchset: Pyoungmeister; "moving sudo definitions from sudo::applicationserver into the correct modules" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18194
[23:24:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18194
[23:27:30] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[23:27:30] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[23:27:30] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[23:41:31] New patchset: Asher; "new eqiad dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18198
[23:42:11] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18198
[23:42:26] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18198