[00:09:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.015 seconds [00:10:44] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [00:16:45] !log Scap complete [00:16:53] Logged the message, Master [00:42:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:58:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.032 seconds [01:19:53] PROBLEM - Puppet freshness on gallium is CRITICAL: Puppet has not run in the last 10 hours [01:26:20] New patchset: Ryan Lane; "Initial commit of deployment module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/27478 [01:31:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:48:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.021 seconds [02:20:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:35:29] PROBLEM - LVS HTTP IPv4 on wiktionary-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.026 seconds [02:36:59] RECOVERY - LVS HTTP IPv4 on wiktionary-lb.eqiad.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 62702 bytes in 0.140 seconds [02:56:49] TimStarling, I saw a report on WP:VPT about search updates being broken again on en.wp, are you aware of this / did you just fix it earlier? [02:58:34] I haven't looked at it or fixed it, I'll have a look now [02:58:38] thanks [03:04:04] TimStarling, fwiw, looking at the report at https://en.wikipedia.org/wiki/Wikipedia:VPT#Search_list_not_updating I am able to find titles from yesterday in the opensearch suggestions, but I'm not able to retrieve the text of those same pages in the fulltext search [03:09:32] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [03:12:09] so this report is from November 14? [03:14:46] no, the one I linked to is from nov 25 [03:16:57] https://en.wikipedia.org/wiki/Wikipedia:VPT#Search_list_not_updating ? [03:17:26] y [03:17:36] ah, I had an old cached version of that page somehow [03:18:01] the same section title was there, but from 11 days earlier [03:18:05] heh [03:30:24] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [03:45:25] I haven't had to look at this incremental index code before, but I'm getting there [03:46:38] we have this: [03:46:39] root@searchidx2:/a/search/indexes# cat status/enwiki [03:46:39] #Last incremental update timestamp [03:46:39] #Fri Nov 23 03:46:19 UTC 2012 [03:46:40] timestamp=2012-11-23T03\:44\:02Z [03:46:57] that is probably when the problem started [03:47:27] so around the time I did my upgrade [04:12:06] !log on searchidx2: restarted incremental updater [04:12:14] Logged the message, Master [04:13:02] I enabled INFO level logging temporarily, we will see what happens [04:13:24] but it seems to be working for now, it's up to arzwiki [04:44:03] TimStarling: apergos booted (or just started?) something on idx2. maybe it should have been done for 2 services and he only did one? [04:45:15] hrmmm, not seeing the !log for it [04:47:00] nope, i'm confusing with search13 apparently (based on my IRC log) [04:49:36] odd... 
you were looking at enwiki which was not one of the ones broken 3 days ago (AFAIK) [04:50:13] PROBLEM - Host srv238 is DOWN: PING CRITICAL - Packet loss = 100% [04:50:47] apergos was looking (3 days ago) at least at {fr,en}wikisource on searchidx2 and he said then that the files were recent. but that was 2 days after the timestamp tim pasted above [04:51:19] (i guess apergos meant the filesystem timestamps on the index files themselves) [05:16:07] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [05:46:39] jeremyb: there are two copies running now, one started at 04:09 and one at 04:24 [05:47:28] gerrit-wm: where's my lint check!? [05:47:31] !log on searchidx2: killed extra IncrementalUpdater daemon [05:47:38] Logged the message, Master [05:48:40] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/27478 [05:48:47] screw it. no lint check [05:51:46] New patchset: Ryan Lane; "Add a role for the deployment module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/27479 [05:51:55] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/27479 [05:55:36] New patchset: Ryan Lane; "Add deployment code to sockpuppet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35597 [05:56:14] -_- [05:56:23] seems the zuul changes means no more ops lint checks [05:57:31] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35597 [07:25:05] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [08:05:23] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 226 seconds [08:07:02] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 327 seconds [08:10:02] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [08:10:11] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 2 seconds [08:16:38] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 303 seconds [08:30:27] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 272 seconds [08:50:00] New patchset: Hashar; "Jenkins test please ignore." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35135 [08:54:53] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [08:54:54] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [08:54:54] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [08:55:33] New patchset: Hashar; "Jenkins test please ignore." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/35135 [09:10:56] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [09:10:56] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [09:25:47] PROBLEM - Puppet freshness on snapshot1002 is CRITICAL: Puppet has not run in the last 10 hours [09:29:32] doh parameterized class are killing me :-] [09:29:37] class { 'java::openjdk': version => '1.6', jdk => true, } [09:29:38] class { 'java::openjdk': version => '1.7', jdk => true, } [09:29:39] duplicate definitions :( [09:38:15] gotta use a define instead of class [09:44:51] New patchset: Hashar; "testing out multiple defines" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35601 [09:52:03] New patchset: Hashar; "testing out multiple defines" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35601 [09:58:58] New patchset: Hashar; "convert java::openjdk to a define" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35601 [09:59:13] solved [10:00:48] New review: Hashar; "This cause puppet to emit a duplicate definition errors for java::openjdk. The workaround is to use ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/34863 [10:11:23] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [10:13:06] hashar: oh yes, of course [10:13:11] should have caught that when I merged it [10:13:12] apologies [10:15:28] paravoid: I am not yet a pro :-] [10:15:35] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [10:15:42] on labs I only tested installing one version, not both :( [10:15:53] paravoid: so half tested == half working [10:16:01] and good morning! enjoy your coffee [10:18:06] getting one myself, brb [10:20:07] New review: Silke Meyer; "So, what's next? Can we use this in mediawiki.pp even if parts if it are copied? Or shall we drop it..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/35173 [10:20:14] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 1 seconds [10:55:55] New review: Silke Meyer; "Works for mediawiki installation but breaks wikidata installation: wikidata.pp needs to be adapted, ..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/35293 [11:20:33] New review: Silke Meyer; "Cool, works!" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/35313 [11:20:39] PROBLEM - Puppet freshness on gallium is CRITICAL: Puppet has not run in the last 10 hours [11:30:03] Silke_WMDE: hi; i just noticed your issue on github.com/atdt/wmf-vagrant, will answer tomorrow! [11:41:09] ori-l: you must sleepppp [11:41:21] ori-l: or you are never going to be productive tomorrow morning :-] [11:41:39] i've been sleeping since 10pm [11:41:46] no idea what you're talking about [11:41:47] i'm not here [11:41:48] etc [11:41:50] ah ok [11:41:51] sorry [11:41:52] :) [11:41:55] I though you were awake [11:42:12] seems I am dreaming myself [11:42:23] anyway, lunch time [11:42:55] enjoy! 
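On hashar's duplicate definition error above: a parameterized class may only be declared once per catalog, so the two class { 'java::openjdk': ... } declarations collide, while a define can be instantiated once per unique title. That is the workaround merged in change 35601. The sketch below is illustrative and not the actual contents of that change; package names assume Ubuntu's openjdk-N-jdk / openjdk-N-jre naming.

    # Minimal sketch of the class-to-define conversion. Each unique title
    # yields a new instance, so both Java versions can coexist in one catalog.
    define java::openjdk($version, $jdk = false) {
        if $jdk {
            $package = "openjdk-${version}-jdk"
        } else {
            $package = "openjdk-${version}-jre"
        }
        package { $package:
            ensure => present,
        }
    }

    # Both declarations now work side by side, no duplicate definition error:
    java::openjdk { 'openjdk-6': version => '6', jdk => true }
    java::openjdk { 'openjdk-7': version => '7', jdk => true }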
[13:10:07] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [13:28:45] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35601 [13:31:07] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [13:50:31] New patchset: Demon; "Reformat gerrit hooks to use more python-esque style" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35615 [13:55:56] RECOVERY - Puppet freshness on gallium is OK: puppet ran at Wed Nov 28 13:55:29 UTC 2012 [13:56:14] New review: Hashar; "ideally we would want to make them pass pep8 linter:" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/35615 [14:13:03] New review: Ottomata; "Interesting!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35601 [14:14:40] New review: Faidon; "A few days ago. I suggested a generic java module (instead ot the original idea) to Hashar esp. thin..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35601 [14:16:32] New review: Hashar; "I have introduced it with https://gerrit.wikimedia.org/r/#/c/34862/ a few days ago :-] I was not a..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35601 [14:17:18] hey hashar [14:17:20] https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=manifests/misc/java.pp;h=52bbfcddd4e41fbfd0e60b407de174a1b1ea4114;hb=HEAD [14:17:46] hello ottomata :-) [14:18:02] morning! [14:18:04] yeah I have been looking at that one [14:18:12] ah ok, wasn't sure if you knew of it :) [14:18:19] I haven't even tried to look for an existing java class, so I just created my own :( [14:18:23] i saw your change to openjdk.pp, I wasn't aware of that one [14:18:46] i created this one back uhhmmmm, in the spring or early summer or something [14:18:55] as Faidon and I replied on change 35601, the java module has been introduced this week [14:18:58] or maybe last week [14:19:01] anyway, really new one [14:19:12] it is uberly simple so should not be too much of trouble :-] [14:19:18] oh sorry, missed you rreply there [14:19:48] so hmm I ended up reinventing the wheel [14:23:16] hm [14:23:31] maybe we should just move java.pp to the java module [14:23:46] and just put it in init.pp [14:24:15] mind if I make that change? I'll ask you and faidon for review [14:24:51] I'd more than happy to +2 a change that merges those two efforts [14:25:23] ok cool, i'll do that then, it shoudl be the same to hashar's stuff, but we'll need him to veriy [14:26:33] ottomata: I am happy with anything that move code from the main puppet dir to a submodule [14:26:50] ottomata: making your existing java.pp an init.pp to java module makes a lot of sense [14:26:58] + that is definitely not going to break anything [14:27:16] I don't think java.pp uses any template / file so that should be fine. [14:27:50] cool, which java version do you want to be the alternative default? [14:27:51] 6 or 7? [14:29:03] as in, which version do you want to invoke when you just run `java` [14:29:03] ? [14:29:29] looks like you have 6 as default on gallium right now [14:33:09] ottomata: we have some Oracle version installed to build the mobile application [14:33:20] ottomata: and going to need openjdk 6 and 7 [14:33:32] I guess the default should be whatever default is shipped by Ubuntu [14:33:34] oh, ok, so this should install all 3? [14:33:38] this is probably oracle 6 [14:33:45] i think so [14:33:47] install all the javas! 
[14:33:57] the default on gallium is openjdk6 right now [14:34:00] ottomata: I still have to check with the mobile team if openjdk is fine with them [14:34:04] I think we want to drop oracle [14:34:10] (don't quote me on this :-) ) [14:34:21] want me to install oracle too then? [14:34:45] ah Ubuntu has default packages: default-jdk and default-jre [14:34:53] I would expect a puppet "java" class to install those [14:35:13] that is 1.6 on precise [14:35:21] maybe the next ubuntu version will have 1.7 by default [14:35:22] one is just the runtime environment [14:35:25] the other is a development kit [14:35:32] yeah, those are just package aliases to the real package names [14:35:54] on gallium I would prefer we do not install Oracle from puppet [14:35:56] but the java define does choose a default for you if you don't specify [14:36:03] but if you are installing multiple versions [14:36:07] I am not sure which version currently runs nor what is going to be installed by puppet [14:36:08] if you do it manually [14:36:18] the default will change based on the last one you install (90% on that) [14:36:20] and I am afraid it might cause trouble with the existing jobs building the mobile apps [14:36:28] so, this define lets you pick which one you want to be the default java [14:36:34] by specifying alternative => true [14:36:43] ok [14:36:54] so cool, puppet will only install openjdk6 and 7 [14:36:57] and make 6 the default [14:37:04] (since that is what you have as default now) [14:37:22] yup :) [14:37:31] could find out later on if we still need the Oracle version on gallium [14:37:37] if it is no longer needed, I will drop it :-] [14:38:01] as for the java.pp manifest you wrote, you are checking the $::lsbdistrelease to find out the package prefix [14:38:21] I think that should instead rely on the default packages provided by Ubuntu ( default-jre and default-jdk ) [14:38:32] but that is not much of a problem right now since our latest Ubuntu version is Precise [14:38:51] would be a problem whenever we start installing new Ubuntu versions though since we will end up forcing an install of 1.6 [14:38:58] I guess we can fix that later [14:40:11] hmm, ok I will add that to the TODO [14:41:05] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35540 [14:45:02] New patchset: Ottomata; "Merging duplicated java installation efforts into java module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35621 [14:45:17] <^demon> !log fenari:/home/wikipedia/common now reflects Ia319794c - rm'd all traces of submodule [14:45:26] Logged the message, Master [14:46:21] hashar, paravoid: https://gerrit.wikimedia.org/r/#/c/35621/ [14:46:55] ottomata: reviewing :) [14:49:10] New review: Hashar; "Looks fine, sorry for the earlier code duplication." [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/35621 [14:49:25] one more module, one less puppet manifest in global space [14:49:27] \O/ [14:49:54] <^demon> I can't seem to sync-dir multiversion :\ [14:50:25] wee, thank you, should I let paravoid merge it or shall I go ahead? [14:51:12] !g Ieea46ba6d92d32a83c8795992a6a6c4012a18d8d [14:51:12] https://gerrit.wikimedia.org/r/#q,Ieea46ba6d92d32a83c8795992a6a6c4012a18d8d,n,z [14:51:27] ottomata: fine to me. Merge is up to you :-] [14:51:38] ottomata: I guess since you have rights to merge you can go ahead and merge that in [14:51:53] oook [14:51:56] ^demon: does it complain about the path already existing or something ? 
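hashar's suggestion above, relying on Ubuntu's default-jdk / default-jre metapackages instead of deriving a version prefix from $::lsbdistrelease, could look roughly like this. The define and parameter names here are hypothetical, not the API of the java module merged in change 35621.

    # Hypothetical sketch: install the distro-default JDK/JRE unless a
    # specific version is requested, so a new Ubuntu release never gets a
    # forced openjdk-6 install. default-jdk and default-jre are real Ubuntu
    # metapackages; everything else here is illustrative.
    define java::runtime($version = undef, $jdk = true) {
        $suffix = $jdk ? {
            true    => 'jdk',
            default => 'jre',
        }
        if $version {
            $package = "openjdk-${version}-${suffix}"
        } else {
            $package = "default-${suffix}"    # tracks whatever the release ships
        }
        package { $package:
            ensure => present,
        }
    }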
[14:52:02] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35621 [14:52:02] <^demon> No, it was just slow. [14:52:03] New patchset: Hashar; "LBFactory_Multi setup for labs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/29344 [14:52:08] <^demon> paravoid: Who was doing the apache upgrades? [14:52:18] notpeter: [14:52:31] if we're talking about the same upgrades [14:52:33] what's the issue? [14:52:45] <^demon> I'm getting a ton of host authenticity errors from rsync. [14:53:02] ok, hashar, merged on sockpuppet, go ahead and try on gallium [14:53:09] <^demon> All in the mw* range, plus a few search idxs and hume. [14:53:11] i'm changing locations, i'll be back on in 15ish [14:54:34] New review: Hashar; "This has been on beta for quite some time, lets deploy it." [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/29344 [14:54:34] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/29344 [14:55:49] I liked hashar's version better tbh :) [14:56:02] java { 'java-6-openjdk': version => 6, alternative => true } [14:56:22] is there a java-7-openjdk version = 6 [14:56:27] win [14:57:08] <^demon> There is a 7, but those aren't the package names anymore. [14:57:21] <^demon> It's openjdk-6-jre, openjdk-7-jdk, so on. [14:57:51] no [14:57:56] that is just a title for the define [14:57:59] it is arbitrary [14:58:15] you can refer to it later if something depends on it, for example: [14:58:32] <^demon> Indeed. But better to use the actual package name, rather than the old one that's obsolete. [14:58:39] <^demon> Otherwise you'll just confuse someone like me ;-) [14:58:44] package { "java-crazy-thing": require => Java["java-6-openjdk"] .. [14:59:03] so the define doesn't care what you put there, [14:59:14] hashar could change that to whatever he wanted [14:59:22] but, the reason the define doesn't use the package names explicitly [14:59:28] is because they change in different versions of ubuntu [14:59:50] https://gist.github.com/4161806 [15:00:23] New patchset: Hashar; "beta: use IP for database hostname" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35623 [15:00:27] and not all versions and distributions are available on all OS versions [15:00:53] New review: Hashar; "beta is a bit broken without that. File content is properly safeguarded and not going to interact wi..." [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/35623 [15:00:53] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35623 [15:01:00] so, paravoid, if you like, we could change this line: [15:01:00] java { 'java-6-openjdk': version => 6, alternative => true  } [15:01:00] to [15:01:00] java { 'muffins': version => 6, alternative => true  } [15:01:13] <^demon> Muffins! [15:01:38] package { "breakfast": require => "muffins" } [15:01:49] require Java["muffins"] [15:02:00] they've already got java (coffee?) beans, eh? [15:02:08] ok and it is coffee time for meeeee [15:02:10] back in a bit [15:02:42] sbernardin: hey, can you do me a favor and see the brand name of the fiber raceway in rack d3-sdtpa? [15:02:50] chris and i are going to order some, or you can take a photo [15:02:52] or both. 
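To make ottomata's point concrete: the define's title is only a handle ('muffins' works as well as 'java-6-openjdk'), and other resources reference the instance by that title. The sketch below also shows one way an alternative => true flag could be honored via update-java-alternatives; the define name, the alternatives name (precise appends an architecture suffix such as -amd64), and the guard command are all illustrative, not the module's actual implementation.

    # Arbitrary titles, plus a sketch of selecting the default java.
    define java_demo($version, $alternative = false) {
        package { "openjdk-${version}-jdk":
            ensure => present,
        }
        if $alternative {
            # Point /etc/alternatives/java at this JVM; the 'unless' guard
            # keeps the exec idempotent. The alternatives name varies by
            # release and architecture, so this is illustrative only.
            exec { "set-default-java-${title}":
                command => "/usr/sbin/update-java-alternatives --set java-1.${version}.0-openjdk",
                unless  => "/bin/readlink /etc/alternatives/java | /bin/grep -q java-${version}-openjdk",
                require => Package["openjdk-${version}-jdk"],
            }
        }
    }

    java_demo { 'muffins': version => '6', alternative => true }

    # Dependents reference the instance by its title, whatever it is:
    package { 'java-crazy-thing':
        ensure  => present,
        require => Java_demo['muffins'],
    }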
[15:02:59] but brandname is what i really need [15:06:14] sbernardin: you may need to ask miguel...we got it from him [15:06:58] RobH: let me check on it [15:14:07] it may be panduit [15:14:13] usually is, but just confirm, thx [15:17:05] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [15:29:04] sbernardin: ms-be3 goes on b2...ms-be5 will replace the existing one..i need to schedule this w/apergos before taking ms-be5 offline [15:30:21] yep, I need to set weights to zero a couple days in advance [15:30:31] did you see the proposed schedule? [15:31:42] yes i did...so we are ready for be3 and be5 when you are [15:32:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:32:30] so do you have the info on the deadline for shipping back the c2100s? (if so can you stuff that on the wiki page?) [15:32:49] I should set the weights to zero today on those two [15:34:15] apergos: once i have the information, I will update the page [15:34:34] ah, I thought someone had that from talking with dell, my bad [15:34:56] okay, so do you want to plan for a Friday? [15:35:06] we did but our time frames have changed [15:35:13] do the slow shipping [15:38:05] PROBLEM - Puppet freshness on sockpuppet is CRITICAL: Puppet has not run in the last 10 hours [15:38:13] er for which boxes? are they not here? [15:39:11] apergos: for all of them...they didn't arrive like they should...and we are still waiting on 10 for eqiad [15:39:34] which ones did come in? [15:39:53] RECOVERY - Puppet freshness on snapshot1002 is OK: puppet ran at Wed Nov 28 15:39:40 UTC 2012 [15:40:00] all of Tampa is onsite...not all of it is racked because they need to be changed 1:1 [15:40:05] yup [15:40:11] eqiad is missing 10 swift still [15:40:20] racking 2 today [15:40:27] ok well we can start with ms-be3 and 5 on friday if you want [15:40:36] okay...lets do that [15:40:52] we'll follow your schedule [15:41:25] it might get stretched (assuming we don't run into the dell deadline) depending on cluster performance, hopefully not [15:41:29] how long after be3 and 5 are replaced should we wait until we do the next 2? [15:41:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.030 seconds [15:42:20] well, a few days, and the question is how few we can make it [15:42:26] if i recall correctly it was 45 days but since we haven't received the last group we have some extra time [15:42:32] great [15:43:03] ah do we have ssds for all the new boxes in tampa? [15:43:14] that was the other q I had on the wiki page [15:44:40] yes, we do [15:44:53] yay [15:45:14] ok I want them all to go in with ssds then (I know, after we said not, but then 4 boxes are being pulled for ceph testing so I want them all in) [15:45:38] okay...so all of them get ssds [15:45:49] yep [15:45:53] sbernardin: ^ [15:46:11] in tampa. [15:46:14] we will need to add the 2 ssd's to all of the 720's now [15:46:37] RobH: yes...It's Panduit [15:46:55] yes in tampa...we will deal w/eqiad once they all arrive [15:47:03] yeah, I don't know what is wanted there. [15:47:21] so we are set for ms-be3 and 5 for friday? 
[15:47:40] we will be [15:47:49] ok...sounds good [15:47:53] I have to head out in a minute but I have new rings ready to go for when I get back [15:48:11] and I'll watch over the next little while to make sure everything behaves properly [15:48:19] we'll chat friday mornign your time anyways [15:48:32] ok [15:49:44] sbernardin: make sure you check w/ apergos and myself before making any changes [15:53:18] backin a little while [16:02:14] RECOVERY - NTP on snapshot1002 is OK: NTP OK: Offset -0.002396702766 secs [16:14:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:23:17] does modification of Squid configuration a al http://wikitech.wikimedia.org/view/How_to_block_a_remote_loader require root privileges? [16:24:42] TIAS? [16:25:26] what if it explodes?:P [16:29:41] sooo? [16:33:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds [16:41:26] New patchset: Dereckson; "(bug 42077) Namespace configuration for ba.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35642 [17:05:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:16:15] MaxSem: yes, squid config changes need root [17:16:59] paravoid, could you do a little change for me then: https://bugzilla.wikimedia.org/show_bug.cgi?id=40919#c10 [17:19:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.218 seconds [17:19:53] huh, didn't know we had mobile stuff in squid [17:21:08] I wonder if pushing squid configs is okay or not during the FR :-) [17:21:35] Ryan_Lane: any idea if it is? [17:21:36] the only mobile stuff squids do is redirect people to .m. domains [17:22:11] paravoid: I'd imagine it is [17:22:17] it's just the redirects, yeah [17:22:30] it's the redirects, but it's pushed on all servers [17:22:39] assuming no one fucked up the redirects it should be ok ;) [17:22:46] we don't show banners on mobile [17:24:01] I find it funny that ops is asked to not do anything during the fundraiser, but devs can still do whatever they want [17:24:10] they break the site just as often as us [17:24:50] <^demon> I suppose keeping gerrit up during the fundraiser is probably a good idea. Scheduling the 2.6 upgrade might be tricky :\ [17:24:55] so, the risk here is a) me messing up the squid config (it's a single line change, so difficult but not impossible), b) that "mobi" in the UA might match desktop browsers and we'll end up redirecting random people on their desktops to the mobile site [17:25:03] where there's no FR banner :-) [17:25:27] yeah [17:25:30] that would be bad [17:25:35] ^demon: meh [17:25:50] ^demon: they can live without it [17:25:52] :) [17:26:06] seriously, though, I want the upgrade :) [17:26:15] <^demon> Nobody should have to depend on gerrit since git's distributed ;-) [17:26:38] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [17:26:49] technically people can deploy without gerrit [17:26:57] it's more of a pain in the ass, but it's doablw [17:27:06] <^demon> I've got the uuid thing mostly ironed out--just one last bug I'm trying to resolve. [17:27:07] doable* [17:27:16] why is Squid config not under version control/puppet? 
[17:27:47] because we're eventually moving away to varnish [17:28:01] anyway, the changes you want are in puppet [17:28:09] the change is already in [17:28:19] redirects.conf [17:29:33] MaxSem: replied to the bug report [17:29:41] paravoid, thank you [17:29:51] I didn't do it, so don't thank me yet [17:31:09] clarity is always good;) [17:31:24] ^demon: are the host fingerprints in your personal known hosts> [17:31:24] ? [17:33:48] <^demon> notpeter: I don't believe so. I've only got 5 entries in my known_hosts [17:34:52] gotcha [17:35:18] that's really weird... that was finished a week ago and no one else is seeing that, afaik [17:36:08] I mena, puppet's not that slow.... [17:43:17] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused [17:51:14] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.004 second response time on port 11000 [17:53:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:09:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.046 seconds [18:14:41] notpeter: can you please merge — https://gerrit.wikimedia.org/r/35653 [18:16:47] New patchset: Jgreen; "adding aluminium fundraising-ssl apache vhost" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35654 [18:17:49] New patchset: Dereckson; "Cosmetic code/README fixes for multiversion" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35655 [18:17:57] paravoid: ping [18:18:01] mutante: ping [18:18:05] heh [18:18:12] AaronSchulz: ping [18:18:21] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35654 [18:18:28] AaronSchulz: Just so you felt like you were part of the ping parade [18:18:31] isn't there an rt ticket to solve this? ;) [18:18:37] AaronSchulz: indeed [18:21:40] Ryan_Lane: ping [18:21:52] in a openstack-dns meeting [18:22:51] should be done in 20-30 mins [18:22:52] Okay, so I really need someone to merge — https://gerrit.wikimedia.org/r/35653 it's for Wikipedia Zero Partner IP Live testing [18:22:56] woosters: ^^ [18:23:15] ah [18:23:18] this is an easy one [18:23:18] See https://office.wikimedia.org/wiki/Partner_IP_Live_testing_schedule for more information [18:23:31] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35653 [18:23:40] Ryan_Lane: cool thanks [18:23:56] Ryan_Lane: can you merge it on sock puppet and force a puppet run as well? [18:24:06] hmm [18:24:11] there's something wrong with the repo [18:24:16] New patchset: Dereckson; "(bug 41877) Namespace configuration for bxr.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35656 [18:24:27] error: Ref refs/remotes/origin/production is at 3eade19b5b602d58ed87f26ac98a5fa20efbbf1b but expected 7153ca6b5989854f392c134de9dc95e990277813 [18:24:31] ^demon: ^^ [18:24:56] ya perilly ? [18:25:07] preilly i mean :-) [18:25:10] woosters: it's preilly [18:25:31] woosters: it looks like Ryan_Lane is addressing the issue — I appreciate your response [18:25:41] cool [18:25:53] <^demon> Ryan_Lane: Try `git remote update`? [18:26:38] well, it says it merged it [18:27:27] preilly: need me to run puppet anywhere? [18:27:39] <^demon> This is why I tell everyone to use `git pull --ff-only` rather than blindly pulling. [18:27:44] Ryan_Lane: on the Varnish boxes for mobile [18:27:53] <^demon> $5 says local history has diverged from the origin. 
[18:28:24] ^demon: we don't pull [18:28:28] Ryan_Lane: so is it merged and live now? [18:28:33] should be [18:28:35] well [18:28:38] not live [18:28:47] which varnish servers are mobile? [18:28:50] can never remember [18:28:55] cp10?? [18:29:13] I really need to set up salt grains [18:29:26] cp1041,2,3,4 [18:29:40] ok [18:29:46] Ryan_Lane: I'm basing that off of http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Mobile%2520caches%2520eqiad&tab=m&vn= [18:30:35] Ryan_Lane: cp104{1,2,3,4} [18:31:38] salt 'cp104[1-4]*' cmd.run 'puppetd -tv' [18:31:38] \o/ [18:31:38] yeah. I checked puppet [18:31:38] it's correct [18:31:39] ok. it's live [18:32:29] PROBLEM - SSH on ms1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:32:36] Ryan_Lane: Okay cool thanks! [18:32:43] yw [18:32:50] Ryan_Lane: can we rely on salt now? [18:33:02] I just used it for this [18:33:08] I used it to upgrade itself yesterday [18:33:14] Ryan_Lane: Okay cool [18:33:38] I need to add grains so that we can target clusters, but it's working otherwise [18:33:59] can I get a little help unfouling my git checkout? [18:34:06] I'm confused about its state [18:34:17] git reset --hard origin/production [18:34:21] I'm not sure what fucked it up [18:34:41] didn't work [18:34:51] what error are you getting? [18:34:52] "You are not currently on a branch, so I cannot use any 'branch..merge' in your configuration file." [18:35:24] i tried to amend a change, without thinking I did a git pull between [18:35:31] Jeff_Green: what does git branch tell you? [18:35:42] * (no branch) [18:35:42] production [18:35:42] test [18:35:52] git checkout production [18:35:57] Jeff_Green: so do a git checkout production [18:36:10] Ryan_Lane: jinks [18:36:24] oh ho. ok, then I do a stash and I'm back to square one! [18:36:24] heh [18:36:26] thanks! [18:36:29] yw [18:36:32] Jeff_Green: np [18:37:17] whoa that's interesting--the file I just tried to stash has inline diffs now. [18:37:27] err the file that I had modified before stashing [18:38:22] Errors running git rebase -i remotes/gerrit/production [18:38:50] * Jeff_Green face-desk [18:42:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:48:17] !log temp stopping puppet on brewster [18:48:26] Logged the message, notpeter [18:48:52] good morning guys :-) [18:49:27] I would like to add the "pep8" packages, I thought about creating a puppet module named python which will have a python::pep8 class simply requiring the package [18:49:49] is it worth having a module or should i just use the usual package { 'pep8' : ensure => present } ? 
:-] [18:52:28] New patchset: Jgreen; "fixed typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35661 [18:53:07] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35661 [18:55:35] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [18:55:35] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [18:55:35] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [18:57:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.162 seconds [18:58:21] New patchset: Pyoungmeister; "correcing mac on mc1016" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35662 [18:59:55] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35662 [19:00:14] New patchset: Dereckson; "(bug 42511) Namespaces configuration for ru.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35663 [19:08:27] New patchset: Hashar; "python module just containing pep8 for now" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35665 [19:08:27] New patchset: Hashar; "install python pep8 on contint server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35666 [19:09:06] so....still no logmsgbot_ message [19:09:10] *messages [19:09:38] i guess file an RT ticket? [19:11:36] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [19:11:36] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [19:12:12] Reedy: you ready to deploy 1.21wmf5? [19:18:02] preilly: is there a way to straighten these monitor stands? [19:18:22] I almost got used to it being crooked and then remembered how annoying it is [19:18:39] AaronSchulz: what do you mean? [19:19:42] I guess lots of people have this monitor tilt then [19:20:59] Ryan_Lane: so the "The authenticity of host can't be established" [19:21:05] agh [19:21:30] eh? [19:21:42] AaronSchulz: where are you getting an error like that? [19:21:47] on sync-dir [19:21:53] ah [19:22:41] maybe salt will fix all this one day magically ;) [19:23:13] Ryan_Lane: gerrit-wm seems to be broken? [19:25:08] lemme restart the bot [19:25:53] why's irc such a pita? [19:29:31] probably because it's a 500 year old protocol that was never meant to be used as long as it has been [19:30:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:30:51] anyone seen Reedy around today? [19:39:21] New patchset: Demon; "Moving all non-pedias to 1.21wmf5" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35677 [19:39:26] robla, /me [19:40:41] MaxSem: Chad's filling in for him on the deploy now. 
just wondering if he gave any hints about being afk [19:41:01] nope, but he was here today [19:41:42] ok...thanks anyway [19:45:46] New patchset: Dzahn; "move misc::monitoring::htcp-loss to ganglia.pp and misc::zfs::monitoring to ganglia.pp (RT-720)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35680 [19:48:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.019 seconds [20:07:34] New patchset: preilly; "Fixed init stop to wait for the process to stop" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/35345 [20:08:04] Change merged: preilly; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/35345 [20:08:26] it is too soon for me too push out new rings, I am going to look at it tomorrow morning and see how network activity is on the new hosts again [20:08:38] we may have to move the schedule back, I hope not [20:08:48] ( cmjohnson1, sbernardin ) [20:09:20] apergos: okay, let me know [20:09:26] I will [20:09:56] New patchset: Ottomata; "Including Ori and Evan on analytics machines" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35687 [20:10:14] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35687 [20:12:39] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [20:19:53] mutante: so when might those fingerprint prompts going to be fixed? [20:20:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:21:11] <^demon> AaronSchulz: We were discussing it in -tech a bit ago. [20:23:10] how long has this been happening? [20:23:25] AaronSchulz: you're getting them too then? [20:23:32] <^demon> Nobody else had complained until I noticed it earlier today. [20:23:36] <^demon> Which makes me think "recently" [20:23:49] I was trying to sync an extension hours ago and was getting that [20:24:04] <^demon> Yeah, I noticed this morning when trying to sync-dir something. [20:24:06] woosters: this is a pretty big problem. we can't deploy code until this is fixed [20:24:08] <^demon> So at least since this morning. [20:24:22] is anyone looking at this at the moment? [20:24:38] <^demon> mutante was, but he had to leave I think. [20:24:41] robla - on it [20:25:02] thanks [20:27:54] New patchset: Demon; "Moving all non-pedias to 1.21wmf5" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35677 [20:30:16] mutante: your fix didn't take? [20:30:31] have any changes been made to deployment system? [20:30:42] New review: Demon; "PS2 removes commonswiki pending resolution of bug 42512." [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/35677 [20:30:43] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35677 [20:30:56] mutante is out .. notpeter [20:31:07] I'm looking at the deployment issue [20:31:34] wow, crap [20:31:40] so [20:31:46] no changes have been made to the hosts [20:31:48] The authenticity of host 'mw14 (10.0.11.14)' can't be established. [20:31:49] RSA key fingerprint is 2f:09:54:54:91:9b:10:4e:8f:d7:af:ad:6c:af:ce:34. [20:31:51] Are you sure you want to continue connecting (yes/no)? [20:31:52] Lots of these, for various hosts [20:31:57] Including hume and searchidx1001 [20:32:00] <^demon> Yes, this is what I've been complaining about. 
[20:32:07] And -- WTF -- fenari itself [20:32:09] and the apaches are all in /etc/ssh/known_hosts [20:36:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.030 seconds [20:39:15] ok [20:39:17] should work now [20:39:28] fenari's giving me a known good fingerprint... [20:39:31] ^demon: RoanKattouw robla [20:39:45] so [20:39:47] yeah, I tried ssh from fenari command line to spence, I know I've done this multiple times. and yet today I get the warning message. can't tell what's changed, see nothing in the config files nor puppet runs [20:39:52] what did you fix? [20:39:55] OK seems to be working now [20:39:59] /etc/ssh/known_hosts is generated by puppet [20:40:09] Except that logmsgbot seems to be borked [20:40:14] it does work now, you must have updated it [20:40:37] and puppet sometimes, just sometimes, doesn't set the perms correctly on huge-ass crazy generated files [20:40:38] like the spence shit [20:40:47] /etc/ssh/known_hosts turned out to be 600 [20:40:51] ah gee [20:40:53] which meant that it worked just fine for root [20:40:55] <^demon> Yay, no more key errors. [20:40:55] RoanKattouw: https://rt.wikimedia.org/Ticket/Display.html?id=3982 <-logmsgbot ticket [20:40:56] but no one else [20:41:04] Test [20:41:09] binasher, what else is needed to move GeoData forward? [20:41:10] !log Restarted logmsgbot [20:41:10] !log aaron synchronized php-1.21wmf5/extensions/SwiftCloudFiles 'deployed d50d754971991bb2362455fe47eaadb20aff3a80' [20:41:17] Logged the message, Mr. Obvious [20:41:21] notpeter: now if only someone could fix the searchidx1001 warning [20:41:23] xlnt! [20:41:26] Logged the message, Master [20:41:45] <^demon> snapshot1002 is giving permission errors. [20:41:56] <^demon> I've got no problems with searchidx1001 [20:41:59] ignore it [20:42:02] not set up yet [20:42:18] sorry but it's in the middle of 'move to precise' and there are unbuilt packages and any kind of thing going on [20:42:24] the rest should be ok [20:42:32] <^demon> k [20:45:13] sorry about that, all. also... yay puppet.... [20:45:50] yeah, another reason to love it [20:51:08] notpeter: no worries, thanks for figuring it out! [20:57:26] ^demon: I'm about to head into a meeting, but it's ok for the deployment to go past the appointed time [20:57:40] <^demon> Well we've already finished 1.21wmf5 [21:08:06] RobH, around? [21:08:12] Yep [21:09:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:09:52] RobH, how can I make Yttrium inaccessible from teh interwebs? is it something doable in puppet? (we don't need it accessible in its future Solr server role) [21:10:23] ideally we would reinstall its os after putting it from external to internal vlan [21:10:37] we can do that without reinstalling, but its a major pain in the ass and i dont wanna do it. [21:10:45] with puppet its extremely painful to do that shit without reinstall [21:10:56] MaxSem: can we reinstall it or does it need to stay online as is? 
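On the incident notpeter diagnosed above (puppet leaving the generated /etc/ssh/known_hosts at mode 600, readable by root only, so everyone else got authenticity prompts): pinning the mode explicitly in the manifest is the obvious belt-and-braces fix. A minimal sketch, not the actual change applied that day; the file's contents are assembled elsewhere.

    # Keep the global known_hosts world-readable so non-root users' ssh
    # clients can verify host keys; a bad run can then no longer leave it
    # at 0600. Content is generated/collected by other resources.
    file { '/etc/ssh/known_hosts':
        ensure => file,
        owner  => 'root',
        group  => 'root',
        mode   => '0644',
    }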
[21:11:11] either way we can move it from external to internal vlan no problem [21:11:21] but if no reinstall, im leaving the making the OS work to someone else ;] [21:13:23] !log kaldari synchronized php-1.21wmf5/extensions/UploadWizard/resources/mw.UploadWizardDetails.js 'fixing uplaodwizrd for ie8' [21:13:30] Logged the message, Master [21:21:11] RobH, it's currently accessible only as yttrium.wikimedia.org and I've already salvaged everything that needed to be salvaged so feel free to do whatever you want with it [21:25:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds [21:26:51] New patchset: Demon; "Moving commonswiki back to 1.21wmf5" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35796 [21:27:38] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: Commonswiki back to 1.21wmf5 [21:27:45] Logged the message, Master [21:28:05] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35796 [21:29:37] Tim-away: ping [21:29:56] * preilly — is aware that it's 8:29am Thursday (EST)  [21:30:38] Tim-away: I'm keen to get https://gerrit.wikimedia.org/r/#/c/35345/ deployed [21:31:34] * AaronSchulz lols at the commit summary [21:31:58] AaronSchulz: what part of it? [21:32:26] the bit about soldiering on with useless results [21:33:07] AaronSchulz: ha ha yeah, "This apparently is not a serious enough problem for the daemon to fail to start, rather it just soldiers on, with empty result sets returned for queries that need RMI on the affected host." [21:35:10] On kaulen (Bugzilla), could somebody tell me the output of "ls /usr/share/bugzilla/extensions" (or wherever it is located) please? [21:35:57] Asking because I don't trust the SVN/Git repository that it's up-to-date when it comes to installed Bugzilla extensions. [21:37:04] <^demon> Needs a root. [21:37:11] <^demon> I can't access /srv/org/wikimedia/bugzilla/ [21:37:12] ah ha [21:37:55] root@kaulen:~# ls /srv/org/wikimedia/bugzilla/extensions/ [21:37:55] BmpConvert create.pl Example OldBugMove Sitemap Voting WeeklyReport Wikimedia [21:38:37] apergos, thanks! [21:39:21] ^demon: I have pinged you as a reviwer on https://gerrit.wikimedia.org/r/#/c/35799/ (need to build debian package for precise, those old deb files from 5 years ago are long since dead) [21:39:33] Hmm, I need to find out what Sitemap does. It's not in Git. [21:39:38] if you are not a good person, feel free to point me to someone else [21:39:42] andre__: sure thing [21:40:00] ah. Sitemap = Google indexing. [21:40:47] <^demon> apergos: So this change is just nuking the crappy build files? [21:41:06] yep and taking out the related cruft form the Makefile [21:41:38] I left the rpm stuff in though heh [21:41:52] <^demon> merged. [21:41:56] thanks [21:42:04] <^demon> yw [21:44:28] yeah andre__ I think it's this: http://bzr.mozilla.org/bugzilla/extensions/sitemap/trunk/files [21:44:47] hmm. I thought it's this: https://code.google.com/p/bugzilla-sitemap/ [21:44:49] (also here http://code.google.com/p/bugzilla-sitemap/) [21:44:55] same thing :-D [21:44:55] ah. phou :) [21:45:24] okay. I don't mind if that breaks. (I was asking all this only to get an idea about risks when upgrading Bugzilla) [21:45:32] ok [21:45:32] plus we cannot find out directly anyway :) [21:46:10] should be exciting :-D [21:55:39] MaxSem: Sorry, middle of a few things [21:55:43] thats good news then [21:55:50] apergos: yeah, I'm lovin' the planning already! 
;) [21:55:53] yes we can move it easily from external to internal and we prefer it! [21:56:05] MaxSem: i'll drop a network ticket for it later today and once thats done we can reinstall [21:56:12] thanks =] [21:56:20] RobH, thanks! [21:56:24] (always good to reclaim ipv4 addresses when we can ;) [21:57:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:58:02] MaxSem: On that note (mobile solr) [21:58:08] i just ordered your new cluster [21:58:29] looks like the estimated ship date is 12/19 [21:58:42] so in time for christmas, though they may not be racked by then ;] [21:58:55] RobH, cool! [21:59:02] tfinc: ^ [21:59:26] now that i did that, ill make that ticket for ya [21:59:33] :) [22:00:37] MaxSem: So yttrium is in nagios for monitoring on the external ip [22:00:53] im going to put it in the decommissioning.pp file so it parses out of nagios over the next 24 hours [22:00:59] !log disabled page counters on labsconsole [22:01:06] Logged the message, Master [22:01:12] MaxSem: we can manually pull it but its a pita, so i rather let it auto pull [22:01:26] but that means no nagios monitoring on it until we put it back in tomorrow, that ok? [22:01:34] (i assume yes but i wanted to let you know) [22:02:00] yes, it's unused right now [22:03:33] awesome [22:08:42] cmjohnson1: i have to update decomissioning.pp [22:08:46] want me to remove db42? [22:08:50] your note says 11/27 [22:08:59] yes please [22:09:03] thx [22:09:06] notpeter: so we have all the mw servers in here [22:09:11] from when we turned them off before i assume [22:09:19] should i pull them so as they add back we monitor ok? [22:09:29] (decomissioning.pp) [22:11:24] New patchset: RobH; "yttrium moving from external to internal vlan" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35806 [22:11:37] cmjohnson1: in an effort to do as i say when i do [22:11:45] wanna review that for me? i know you cannot +2 it [22:11:53] k' [22:11:55] but you can +1 which is good enough, if you think its fine its fine, and i can commit it [22:11:57] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.264 seconds [22:13:01] RobH: ? [22:13:05] we shouldn't have to do anything to them [22:13:14] robh: looks good to me [22:13:16] well, they are in the decommissioning.pp file [22:13:23] so when they are put back into service, they wont stay in nagios [22:13:30] New patchset: Andrew Bogott; "Replace keep_up_to_date with ensure present/latest." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35293 [22:14:11] robh: i am going to put labsdb1001 /1002 in 10.64.20.0/24 - labs-hosts1-b-eqiad [22:14:32] New review: Andrew Bogott; "Hm... the wikidata class still doesn't work, maybe because I'm doing something wrong?" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/35293 [22:14:37] yep [22:14:50] k [22:14:53] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35806 [22:16:09] New review: Andrew Bogott; "You should move it into mediawiki.pp and rearrange the code so that it invokes git::clone. Catch me..." 
[operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/35173 [22:18:02] preilly: ok, but note that the bug will hit again when the package is upgraded, since the stop action from the old init script will be used [22:18:10] !log aaron synchronized php-1.21wmf4/extensions/SwiftCloudFiles [22:18:17] Logged the message, Master [22:18:25] oh [22:18:30] I didn't realise they're in there [22:18:31] ^demon: you said searchidx1001 worked for you? [22:18:36] sure, take them out, that's smrt [22:27:50] is it possible to use ensure=>present for lucene-search-2 instead of ensure=>latest? [22:28:05] ensure=>latest is a very scary way to do a software upgrade [22:31:24] TimStarling: sure [22:31:28] i don't believe there is anyone seriously in charge of lucene so i would say yes [22:31:34] as less insanity == better [22:31:42] indeed :) [22:31:54] patrick and asher and I have been doing some work on it [22:31:58] TimStarling: I'm all for everything you do to make our lucene setup better :) [22:32:18] [21:30] Tim-away: I'm keen to get https://gerrit.wikimedia.org/r/#/c/35345/ deployed [22:32:41] that is a lucene-search-2 change that I committed, I'm going to deploy it [22:33:57] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: Rolling wikiversions.dat back to de90c27d, undoing todays deploy [22:34:04] Logged the message, Master [22:39:29] paravoid: Unexpected response (): (curl error: 6) Couldn't resolve host 'ms-fe.pmtpa.wmnet' [22:39:39] I wonder if just using the IP would help [22:39:53] that's how it is for the dbs [22:40:36] New patchset: Tim Starling; "lucene-search-2 "present" instead of "latest"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35813 [22:41:48] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: metawiki to wmf5 [22:41:55] Logged the message, Master [22:42:34] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35813 [22:45:03] anyone has any idea about how many row we have in the parsercache ? [22:45:21] !log adding DNS entry labsdb1001/1002 .eqiad.wmnet [22:45:28] Logged the message, Master [22:46:06] !log running authdns update [22:46:13] Logged the message, Master [22:46:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:47:52] hashar: based on the bad estimates of show table status and eyeballing, I would guess somewhere around 250million [22:47:55] but tim could probably give a better answer [22:48:04] that is good enough thanks notpeter :-] [22:48:34] notpeter: does it gives the table size on disk ? [22:48:34] if you mean now, then I would have just done a show table status myself [22:48:51] just wondering, don't spend anytime on that request :-] [22:48:56] if you mean generally, well the parser cache goes through cycles of using up all disk space, crashing, being deleted and then starting again [22:49:04] ahah [22:49:07] heh [22:49:10] so it depends on how long since the last total failure [22:49:20] paravoid: can MW just hardcode 10.2.1.27? [22:49:41] TimStarling: I should relocate to AUS to enjoy your sense of humor :-] [22:49:45] TimStarling: speaking of which! what do you think of adding db sharding support? [22:50:25] depends on what you mean by sharding [22:50:33] also are we going to replace memcached with redis ? 
;) [22:50:44] no [22:50:53] you know I was ranting to domas, apparently "sharding" just means "distributing" now [22:51:05] thanks asher :-) [22:51:06] s/sharding/hashing [22:51:11] it used to mean splitting the rows of a table over multiple servers [22:51:40] now it means distributing, e.g. "I sharded some milk over my breakfast cereal in order to get it uniformly wet" [22:51:48] hahaha [22:51:51] lol [22:52:00] TimStarling: that's very fine grained sharding [22:52:10] i dunno, looks like you had a milk hotspot [22:52:41] so, if you mean the traditional sense, I would want to know what table [22:52:57] and if you mean the modern sense, I would just want to know what it is you want to distribute [22:54:00] TimStarling: I think he means the bagostuff sql tables [22:54:23] we currently write to 256 tables on one server based on the modulo of a hash of the key name [22:54:38] i want to spread that out over three servers [22:54:43] New review: awjrichards; "req.http.User-Agent ~ "MSIE (8|9|1\d)\." should be sufficient. UA's with IEMobile in them should wil..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/35298 [22:54:50] yes, we can have that [22:55:14] maybe we should have a class for doing that in a generic way, wrapping multiple BagOStuff instances [22:55:21] similar to the multiwrite class [22:55:45] hrm, what else would shard? [22:55:57] milk on my cheerios [22:56:02] I mean memcached and redis are already sharded properly [22:56:22] they only reason to do it manually it when you are using something kind of hacky [22:56:34] like mysql for caching ;) [22:57:05] *shrug* we can add it to SqlBagOStuff I guess [22:57:42] the configuration parameter is "server", we can add another called "servers" [22:58:04] * TimStarling shoots puppet [22:58:50] TimStarling: and use composition with RDBStore! [22:58:55] * AaronSchulz likes to push tim's buttons [22:58:59] yay RDBStore [22:59:10] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: Moving all non-pedias back to wmf5 [22:59:17] Logged the message, Master [23:00:30] aren't puppet runs meant to be staggered? [23:00:46] so puppet has to be drunk first? [23:01:42] there's 562 connections to stafford right now [23:03:12] normal apparently: http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=stafford.pmtpa.wmnet&m=cpu_report&r=day&s=by%20name&hc=4&mc=2&st=1354143743&g=cpu_report&z=large&c=Miscellaneous%20pmtpa [23:04:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.014 seconds [23:06:06] splay [23:08:23] funny how there was a few hours when the splay seemed to be working [23:09:33] TimStarling: paravoid has a good explanation for this [23:10:45] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [23:11:18] I would be interested to hear that [23:17:00] New patchset: Aaron Schulz; "Enabled retries for jobs that fail to be acknowledged as done." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35818 [23:19:56] how is it possible for select() to block on a pipe that is reading from a zombie process? [23:20:10] as is the case for many many puppet processes stuck reading from apt-get zombies? 
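Before Tim's select() walkthrough continues below, a note on the ensure switch he merged above (change 35813): ensure => latest upgrades the package on whichever puppet run first notices a newer version in the repository, a scary way to roll out a search upgrade, while ensure => present only guarantees installation and leaves upgrades as a deliberate operator step. A sketch of the idea, not the literal diff:

    # Install once; upgrades happen deliberately (e.g. via apt, host by
    # host) instead of fleet-wide whenever a new package hits the repo.
    package { 'lucene-search-2':
        ensure => present,   # was: latest
    }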
[23:21:07] "man 7 pipe" says that if the write end is closed, the read end will see an end-of-file [23:21:51] and "man select" says "Those listed in readfds will be watched to see if characters become available for reading (more precisely, to see if a read will not block; in particular, a file descriptor is also ready on end-of-file)" [23:22:29] so you would think that a select(4, [3], [], [], {9, 828134}) would not block when 3 is a half-closed pipe [23:22:31] but it does [23:24:47] ahhh, except that it is not half-closed [23:25:12] puppet must call pipe() and then neglect to close the write end in the parent [23:25:33] so when apt-get exits, the pipe stays fully open [23:29:48] !log kaldari synchronized php-1.21wmf5/extensions/Vector/modules/ext.vector.collapsibleNav.css 'adding forward-compat to Vector for bug 42452' [23:29:55] Logged the message, Master [23:30:27] !log kaldari synchronized php-1.21wmf5/skins/modern/main.css 'updating modern skin' [23:30:34] Logged the message, Master [23:30:55] !log kaldari synchronized php-1.21wmf5/skins/vector/screen.css 'updating vector skin' [23:31:01] Logged the message, Master [23:31:39] TimStarling: maybe the ignoreDuplicates() check for the db job queue should be a select before the insert to reduce writes [23:32:39] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [23:33:12] yes, that should make it a bit faster [23:36:05] I mean it already has small race windows anyway, so it not much is lost there [23:36:55] the DELETE is missing a job_token = '' condition also, which would be a race if two runners to two identical jobs and crashed and claimTTL was on (they wouldn't get retried)...not likely but still amusing [23:37:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:45:40] so it seems select() on a self-pipe is just the normal way for puppet to spend its time [23:45:49] when it has nothing better to do [23:48:25] TimStarling: there's a joke to be made there, but I'm not going to make it. [23:48:49] puppet gets bored and hits the self pipe.. sounds like a root [23:48:50] I'm sure it would have been very funny [23:49:14] there we go [23:53:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds [23:58:47] * AaronSchulz feeds puppet some graham crackers