[00:42:16] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours
[00:56:22] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[01:41:40] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 278 seconds
[01:42:34] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 278 seconds
[01:48:16] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 622s
[01:52:01] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 5 seconds
[01:54:16] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 54s
[01:56:38] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[01:56:56] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 26 seconds
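The MySQL Slave Delay and Misc_Db_Lag alerts above both reduce to a single query on the replica. A minimal sketch of what the check reads; the thresholds named in the comments are an assumption for illustration, not the production values:

    # on the replica (e.g. db1025 or storage3)
    mysql -e 'SHOW SLAVE STATUS\G' | grep Seconds_Behind_Master
    # the plugin compares that value against warning/critical
    # thresholds, e.g. warn above 60s, crit above 300s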
[03:08:47] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[03:19:17] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[03:29:11] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[03:42:14] PROBLEM - Puppet freshness on ms1 is CRITICAL: Puppet has not run in the last 10 hours
[04:53:50] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[05:37:01] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[06:07:23] PROBLEM - swift-object-auditor on ms-be8 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[06:08:53] PROBLEM - swift-object-auditor on ms-be6 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[06:14:44] PROBLEM - swift-object-auditor on ms-be7 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[06:17:44] RECOVERY - swift-object-auditor on ms-be6 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[06:18:47] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[06:20:53] RECOVERY - swift-object-auditor on ms-be8 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[06:36:09] RECOVERY - swift-object-auditor on ms-be7 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[07:27:03] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[07:31:06] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[08:11:00] New patchset: Hashar; "docroot for deployment.wikimedia.beta.wmflabs.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14099
[08:12:18] New review: Hashar; "Platonides wrote:" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/14099
[08:16:36] !log pallium, updating jenkins build script with {{gerrit|14666}} & {{gerrit|14667}}
[08:16:45] Logged the message, Master
[08:27:15] PROBLEM - Puppet freshness on db1029 is CRITICAL: Puppet has not run in the last 10 hours
[10:04:24] New patchset: Mark Bergsma; "Make helium a DNS recursor" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14759
[10:04:58] New patchset: Mark Bergsma; "Prepare for helium precise reinstallation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14760
[10:05:29] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/14759
[10:05:29] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/14760
[10:08:41] New patchset: Mark Bergsma; "Make helium a DNS recursor" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14759
[10:09:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14759
[10:09:41] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14759
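The fail-then-pass sequence above is the standard Gerrit amend cycle gerrit2 is asking for: fix what lint flagged, amend the same commit so its Change-Id is preserved, and push the result as a new patchset on the same change. A sketch assuming the usual push-to-refs/for workflow; the file name is invented:

    # after fixing whatever puppet-lint complained about:
    git add manifests/dns.pp         # hypothetical file touched by the fix
    git commit --amend               # keeps the Change-Id, so Gerrit stacks a new patchset
    git push origin HEAD:refs/for/production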
[10:13:13] New patchset: Mark Bergsma; "Prepare for helium precise reinstallation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14760
[10:13:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14760
[10:13:59] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14760
[10:22:58] New patchset: Mark Bergsma; "Revert "Prepare for helium precise reinstallation"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14763
[10:23:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14763
[10:23:49] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14763
[10:27:18] New patchset: Mark Bergsma; "Make hydrogen a DNS server instead of helium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14764
[10:27:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14764
[10:28:41] New patchset: Mark Bergsma; "Prepare hydrogen for precise reinstallation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14765
[10:29:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14765
[10:29:47] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14765
[10:29:47] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14764
[10:38:30] for christ sake
[10:38:53] first I take a server which has apparently been grabbed by someone else
[10:39:01] now i take another which doesn't have serial console redir set or something
[10:43:26] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours
[10:49:08] third time's a charm?
[10:57:39] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[11:57:46] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[12:49:21] !authdns-update to add labs.wm entry for redirect to labsconsole
[12:49:35] !log authdns-update to add labs.wm entry for redirect to labsconsole
[12:49:42] Logged the message, Master
[12:56:01] New review: Dzahn; "labs redirect per RT-2402" [operations/apache-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/14509
[12:56:03] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/14509
[13:01:32] !log labs.wikimedia.org is now a redirect to labsconsole
[13:01:39] Logged the message, Master
[13:09:16] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[13:11:19] !log powercycling and upgrading a couple more mw* servers
[13:11:27] Logged the message, Master
[13:14:31] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 31.03 ms
[13:15:43] RECOVERY - Host mw1012 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms
[13:16:10] RECOVERY - Host mw1100 is UP: PING OK - Packet loss = 0%, RTA = 31.09 ms
[13:16:19] RECOVERY - Host mw1126 is UP: PING OK - Packet loss = 0%, RTA = 30.93 ms
[13:20:04] ACKNOWLEDGEMENT - Host db30 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn RT #3052: db30 is dead
[13:20:13] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[13:29:17] mutante: so db30 is dead...then let's decommission it
[13:30:08] cmjohnson1: ok, it's not in service, yep
[13:30:16] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[13:31:35] New patchset: Dzahn; "decom. db30 - dead per RT-3052 (irritating noises)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14768
[13:32:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14768
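Judging from this change and Asher's "adding db30 to decom list" patchset later in the log, decommissioning a host plausibly amounts to appending it to a list the manifests consult. A sketch of that pattern; the variable name and entry comments are assumptions, not copied from operations/puppet:

    # sketch only -- the real manifest layout may differ
    $decommissioned_servers = [
        'db30',    # dead per RT-3052
    ]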
[13:37:05] cmjohnson1: done in racktables
[13:43:19] PROBLEM - Puppet freshness on ms1 is CRITICAL: Puppet has not run in the last 10 hours
[13:48:43] PROBLEM - Host virt1003 is DOWN: PING CRITICAL - Packet loss = 100%
[13:49:28] PROBLEM - Host virt1002 is DOWN: PING CRITICAL - Packet loss = 100%
[13:58:47] RECOVERY - Host virt1003 is UP: PING OK - Packet loss = 0%, RTA = 30.89 ms
[13:58:47] RECOVERY - Host virt1002 is UP: PING OK - Packet loss = 0%, RTA = 31.34 ms
[14:02:23] PROBLEM - SSH on virt1003 is CRITICAL: Connection refused
[14:02:59] PROBLEM - SSH on virt1002 is CRITICAL: Connection refused
[14:08:29] mutante: cool...thx
[14:17:14] PROBLEM - Host virt1003 is DOWN: PING CRITICAL - Packet loss = 100%
[14:17:14] PROBLEM - Host virt1002 is DOWN: PING CRITICAL - Packet loss = 100%
[14:17:59] RECOVERY - SSH on virt1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[14:18:08] RECOVERY - Host virt1002 is UP: PING OK - Packet loss = 0%, RTA = 30.93 ms
[14:18:44] RECOVERY - SSH on virt1003 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[14:18:53] RECOVERY - Host virt1003 is UP: PING OK - Packet loss = 0%, RTA = 30.96 ms
[14:24:17] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[14:25:11] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms
[14:40:29] PROBLEM - NTP on virt1002 is CRITICAL: NTP CRITICAL: No response from NTP server
[14:40:56] PROBLEM - NTP on virt1003 is CRITICAL: NTP CRITICAL: No response from NTP server
[14:50:50] PROBLEM - swift-object-auditor on ms-be6 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[14:53:32] PROBLEM - swift-object-auditor on ms-be8 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[14:54:26] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[14:58:46] RECOVERY - swift-object-auditor on ms-be6 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[14:58:55] RECOVERY - swift-object-auditor on ms-be8 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[15:08:31] PROBLEM - swift-object-auditor on ms-be7 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[15:14:22] RECOVERY - swift-object-auditor on ms-be7 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
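The flapping swift-object-auditor alerts above are a Nagios process-count check; roughly the plugin invocation behind them, with the plugin path and threshold as assumptions:

    # produces "PROCS CRITICAL: 0 processes" / "PROCS OK: 2 processes" as seen above;
    # -c 1: means critical when fewer than one matching process is running
    /usr/lib/nagios/plugins/check_procs -c 1: \
        --ereg-argument-array '^/usr/bin/python /usr/bin/swift-object-auditor'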
[15:38:31] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[15:58:29] cmjohnson1: can't it stack into the switches near it?
[15:58:54] one of the two boxes should have a couple long-ish cables
[16:00:09] but i dunno if they wanna connect via stack or fiber, indeed.
[16:01:27] tampa dc is a pain.
[16:01:38] so it will prolly be dual fiber optic runs
[16:01:43] but dunno.
[16:11:21] RECOVERY - Host srv190 is UP: PING OK - Packet loss = 0%, RTA = 0.84 ms
[16:15:15] PROBLEM - Apache HTTP on srv190 is CRITICAL: Connection refused
[16:19:27] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[16:28:09] RobH: when will you be in eqiad?
[16:28:39] was in it on friday, will be in it all day wednesday
[16:28:46] cmjohnson1: run a fiber to rack A1/sdtpa I think
[16:28:54] i have the audit with garfield on wednesday for part of that day
[16:29:01] ok
[16:29:02] need me in sooner?
[16:29:04] no
[16:29:14] just wanted to install a box which then had no serial redir set
[16:29:23] no worries. oh, i will ship you those labels this week or next
[16:29:28] ok
[16:29:29] i carried them to berlin and forgot to give them to you =P
[16:29:33] nice
[16:29:46] also got a bunch of third hand tools coming on wednesday
[16:29:48] from Reedy
[16:29:52] \o/
[16:29:53] cool
[16:30:04] so i dont recall if he peeled off half a set to send you a single tool or not
[16:30:09] but will find out then
[16:30:13] I did, yup
[16:30:19] i don't need a single tool heh
[16:30:20] cool, then i wont hold off on label shipment
[16:30:25] mark: well, you have 3 of them
[16:30:29] i have 2
[16:30:30] rather than two sets right?
[16:30:34] one set of two
[16:30:38] i gave you my other set
[16:30:40] and one half a set
[16:30:43] spread over a year ;)
[16:30:45] nah, you only gave me HALF the other set
[16:30:51] i gave you one last year, gave you another one now
[16:30:53] oh wait, did you gimme the other one
[16:30:55] you did
[16:30:57] oh yea...
[16:31:00] well, shit.
[16:31:04] lmfao
[16:31:12] wanna send me the spare or want me to send you the extra from here?
[16:31:24] i think i planned this so you, me, and chris would each get two sets.
[16:31:24] hold on to whatever you get
[16:31:32] i have one set, it'll do for now
[16:31:35] we can always change that later
[16:31:49] You wanted a set for the west coast centre one IIRC
[16:31:49] ok, well, you are gonna get half a set from Reedy, just hold onto it and send out west for that datacenter
[16:32:00] and i will send the other half of that set to west coast as well
[16:32:08] we have gone from not enough to plenty =]
[16:32:57] i had plenty before :D
[16:33:00] now I have just enough
[16:33:17] only problem was that my other set was in the data center I rarely visit
[16:33:23] so I never had it on hand when meeting you
[16:33:42] now that dc has no set, but that's fine
[16:33:45] it has 12 RU
[16:33:51] brb
[16:35:47] true enough
[16:36:38] it's the second access switch in the rack correct?
[16:36:49] asw2-rack-sdtpa
[16:36:52] usually.
[16:37:06] unless mark wants to change the proposed standard, we have it that way in eqiad but not online yet
[16:37:29] so yea, asw2-bwhatever#-sdtpa
[16:37:37] or what have you
[16:41:11] cmjohnson1: the cable boots and ends, you need those asap or is 3-4 business day transit fine?
[16:49:09] RECOVERY - Apache HTTP on srv190 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.145 second response time
[16:55:52] RobH: lol, they've been delivered already
[17:00:03] yep, saw that this morning
[17:00:06] i forgot =P
[17:00:16] i put the ticket in and promptly forgot it happened at all
[17:01:55] Makes a change, something arriving early
[17:04:52] hrmm, ct said they were canceling that order.
[17:05:01] let's see what rachel did with them later today
[17:09:01] New patchset: preilly; "fix subdomain for carrier" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14429
[17:09:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14429
[17:09:50] notpeter: you there?
[17:10:00] Can someone please merge https://gerrit.wikimedia.org/r/#/c/14774/
[17:12:39] sup
[17:12:40] usre
[17:12:41] sure
[17:12:53] notpeter: cool — thanks!
[17:14:06] preilly: should I merge the previous commit in your branch as well?
[17:14:38] this one: https://gerrit.wikimedia.org/r/#/c/14429/2
[17:16:03] I'm gonna go with yet
[17:16:05] *yes
[17:16:20] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14774
[17:16:20] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14429
[17:27:50] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[17:31:53] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[17:44:19] maplebed: heh, my acl change will conflict with a pending patch from may, oh well :)
[17:44:29] lol
[17:44:46] this is why it's good to have code cr'ed within a reasonable time
[17:59:31] New patchset: Ottomata; "Now supporting logging of search queries and result statistics through log4j." [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/14776
[18:05:13] with whom can I discuss https://rt.wikimedia.org/Ticket/Display.html?id=3221 ?
[18:25:50] with whom can I discuss https://rt.wikimedia.org/Ticket/Display.html?id=3221 ?
[18:26:41] MaxSem - asher would be able to help
[18:26:46] he is not in yet
[18:26:51] what is rt wiki?
[18:27:15] it's a private ops bugtracker
[18:28:10] PROBLEM - Puppet freshness on db1029 is CRITICAL: Puppet has not run in the last 10 hours
[18:53:05] binasher, can we discuss https://rt.wikimedia.org/Ticket/Display.html?id=3221 ?
[18:53:49] sure
[18:57:35] New patchset: Asher; "adding db30 to decom list" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14779
[18:57:52] binasher, so what are the options for full-text search on cluster? according to Patrick, our existing custom Lucene can't be used for this - is it possible to install a vanilla Solr/Lucene somewhere? or put MyISAM on a separate machine?
[18:58:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14779
[18:58:35] (that table doesn't need to be on the same DB as the wiki it's running on)
[18:58:54] MaxSem: putting vanilla solr/lucene or even sphinx somewhere is an option. adding to lsearchd might still be an option too
[18:59:17] brion vibber would be a good person to rope in to figure out if we can get it indexed by existing infrastructure
[18:59:23] notpeter might have insight too ^^
[18:59:34] are ops prepared for this?
[19:00:10] mmm, brion seems to be on the plane right now
[19:01:00] Most of the tech staff would prob. be heading towards Wikimania.
[19:01:09] adding to the existing search infrastructure is the only thing ops would really be prepared for - that is, the option that would be least ops intensive, save for deploying a new lsearchd jar
[19:01:10] * domas is in DC
[19:01:33] how's the heat?
[19:01:46] hot
[19:01:49] better than last week
[19:01:53] last week was killing
[19:02:14] oh you've been there for a week?
[19:02:27] not here. this week is shaping up to be a killer
[19:02:42] MaxSem: preilly says he never said anything about lsearch not being suitable for this.. he was probably talking about geosearch
[19:02:58] PROBLEM - SSH on sq36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:02:58] PROBLEM - Backend Squid HTTP on sq36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:03:05] then we misunderstood each other
[19:03:17] since lsearch is based on an ancient version of lucene2 that doesn't have spatial indexing
[19:04:12] I was in NYC
[19:04:19] for a week
[19:04:23] nice
[19:04:32] * domas is office-hopping
[19:04:54] maxsem: adding another fulltext index shouldn't be a big problem, but moving forward will require resources that may not be available til after wikimania
[19:05:26] tbh, one can use myisam in wikimedia
[19:05:31] as long as everything fits in RAM
[19:05:39] I hear it isn't a problem nowadays!
[19:05:47] full month of parser cache is going to be all in RAM
[19:06:23] domas: we're going all memsql
[19:06:41] binasher, a week's wait is ok - we need this up and running by September 1st
[19:06:43] PROBLEM - Frontend Squid HTTP on sq36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:06:50] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14779
[19:07:47] binasher: right
[19:07:52] binasher: it has 10G limit
[19:08:53] MaxSem: if you have deadlines, you might want to start looking at the lucene-search-2 code and email brion sooner than later
[19:10:00] hold on a second
[19:10:04] you want a separate database
[19:10:07] with full text search
[19:10:10] and some API queries?
[19:10:23] and batch updating?
[19:11:16] just roll out some sqlite service
[19:11:19] and be done with it :)
[19:12:02] mmm, with this load SQLite will be an option ;)
[19:12:36] I HAVE DOMAS'S APPROVAL FOR SQLITE REVOLT!!1
[19:12:39] MaxSem: you should totally do that
[19:15:27] that's under a gig, web class machine could run such a service
[19:15:30] :)
[19:15:45] even memsql would work!!11
[19:15:49] if it were opensource
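domas's suggestion above, a small standalone service with full-text search, simple API queries, and batch updates, maps naturally onto SQLite's built-in FTS module. A minimal sketch with the sqlite3 CLI; the database, table, and column names are invented, and FTS4 availability is an assumption about the build:

    # one-time schema: an FTS4 virtual table full-text-indexes every column
    sqlite3 search.db "CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts4(title, body);"
    # batch updating: bulk-load rows inside a single transaction
    sqlite3 search.db "BEGIN; INSERT INTO pages (title, body) VALUES ('Main Page', 'welcome text'); COMMIT;"
    # the kind of query an API endpoint would run
    sqlite3 search.db "SELECT title FROM pages WHERE pages MATCH 'welcome';"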
[19:30:04] notpeter: can you merge this change: https://gerrit.wikimedia.org/r/14781
[19:30:27] notpeter: thanks
[19:30:41] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14781
[19:32:10] PROBLEM - Host sq36 is DOWN: PING CRITICAL - Packet loss = 100%
[20:06:44] anyone know the procedure for setting up a cron job on the cluster?
[20:09:02] chrismcmahon, hashar: we're about to have our bi-monthly sprint meeting. Any update on getting en.beta set up as a test env for the EE team?
[20:09:48] kaldari: I did write a simple doc on https://labsconsole.wikimedia.org/wiki/Deployment/Overview
[20:09:56] oh, nice
[20:10:00] and the beta cluster has been fully broken over the week-end
[20:10:05] :-/
[20:10:10] bookmarking
[20:10:13] will make it work again tomorrow
[20:10:27] kaldari: puppet sets up cron jobs, for example see manifests/apaches.pp
[20:10:39] and review that doc once more then send some announcement to interested people
[20:11:11] hashar: thanks!!
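Jeff_Green's pointer above refers to puppet's native cron resource; a minimal sketch of what such an entry looks like, with the job name and command invented rather than taken from manifests/apaches.pp:

    # declared in a manifest; puppet keeps the host's crontab in sync with it
    cron { 'example-maintenance':
        ensure  => present,
        command => '/usr/local/bin/example-maintenance > /dev/null 2>&1',
        user    => 'apache',
        hour    => 4,
        minute  => 30,
    }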
[20:11:22] hashar: I know you mentioned "My next targets are MobileFrontend and ArticleFeedbackv5 which I will implement manually" so I've been standing by :)
[20:11:34] kaldari: anyway, you can start sending the EE extensions setup to the operations/mediawiki-config repo, in the CommonSettings-wmflabs.php file
[20:11:40] Jeff_Green: I meant more on the human side, rather than the machine-side :)
[20:11:45] kaldari: that's the file used by 'beta' to override production settings
[20:12:04] chrismcmahon: :D
[20:12:04] kaldari: I suppose a detailed RT request would be the way
[20:12:08] hashar: OK, I'll do that
[20:12:39] chrismcmahon: I have worked on Wikidata last week, deployed it over the weekend/today but it's not really fully functional yet
[20:13:00] chrismcmahon: still have to write a workaround for a Jenkins bug. But hopefully will have it finished this week
[20:13:03] Jeff_Green: Thanks, we'll do that
[20:13:08] k
[20:13:27] chrismcmahon: ArticleFeedbackv5 / MobileFrontend should be trivial once Wikidata is setup :-] (**crosses fingers**)
[20:13:41] kaldari: don't you have several extensions to deploy?
[20:14:04] usually, but not this week :)
[20:14:04] kaldari: I would love to get one to install myself so I can complete my documentation :-]
[20:14:16] I meant, to deploy on beta.
[20:14:30] oh yeah, lots. I'll add them to the configs
[20:14:51] I'll try to make it match en.wiki otherwise
[20:16:04] hashar: how do you ssh to that server?
[20:16:33] ohh
[20:16:38] I did not cover the labs access :-D
[20:17:09] there is the huggeee doc at https://labsconsole.wikimedia.org/wiki/Help:Access
[20:17:46] I use the ssh ProxyCommand functionality : https://labsconsole.wikimedia.org/wiki/Help:Access#Using_ProxyCommand_ssh_option
[20:18:06] aka makes any request to *.pmtpa.wmflabs to be proxied through bastion.wmflabs.org
[20:18:24] so: ssh deployment-dbdump.pmtpa.wmflabs , ssh to bastion, then from there ssh to the instance
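A minimal ~/.ssh/config sketch of the ProxyCommand setup hashar describes; the user name is a placeholder, and -W requires OpenSSH 5.4 or newer:

    # anything under *.pmtpa.wmflabs is reached through the labs bastion
    Host *.pmtpa.wmflabs
        User         yourshellname
        ProxyCommand ssh -a -W %h:%p yourshellname@bastion.wmflabs.org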
[20:19:52] I can get that far, but I can't figure out the name of the instance
[20:20:30] the beta instances don't seem to be on the instance list
[20:21:00] look in the filter
[20:21:06] https://labsconsole.wikimedia.org/wiki/Special:NovaInstance
[20:21:14] you can filter by project
[20:21:26] when you are added to a project, it still defaults to filtering the new project
[20:21:52] so you have to click [Show project filter] then tick the project you want
[20:23:27] yeah, but I don't see anything about beta anywhere? am I looking for the wrong thing?
[20:24:32] maybe I need to be added to a beta group or something
[20:25:19] nope, there's no beta group
[20:26:36] hashar: ^
[20:26:51] hoh
[20:26:54] that is deployment-prep
[20:26:55] sorry :-D
[20:27:16] there is a list of instances on https://labsconsole.wikimedia.org/wiki/Nova_Resource:Deployment-prep
[20:27:42] i just checked, you are still in the project and even a sysadmin
[20:27:45] Logged the message, Master
[20:28:33] yeah, but what's the name of the instance? I don't see anything in that list that suggests it is the instance for the beta sites
[20:28:56] is it just apache30?
[20:30:42] Logged the message, Master
[20:30:49] Logged the message, Master
[20:30:52] kaldari: I am not sure I understand. All those instances are part of the beta sites
[20:30:56] Logged the message, Master
[20:31:14] kaldari: the apache30 / apache31 are the apaches. They are broken currently
[20:31:26] kaldari: deployment-dbdump is the equivalent of fenari (aka the bastion host)
[20:32:33] kaldari: we do MediaWiki changes in /home/wikipedia/common , as soon as a change is applied from gerrit or saved locally, it is instantly "deployed" : apaches are using a shared NFS dir which is /home/wikipedia/common on deployment-dbdump
[20:32:36] hashar: thanks, why didn't you just say that :)
[20:32:45] :-]]]]]]]]]
[20:33:13] deployment-dbdump is a pretty obscure name :P
[20:33:26] definitely :-]
[20:33:37] we have a deployment-bastion host but it is broken too
[20:33:43] ah
[20:34:03] anyway you might want to read https://labsconsole.wikimedia.org/wiki/Deployment/Overview
[20:34:17] and please do spam me with any question you might have :-]
[20:34:26] and feel free to update the page with your findings
[20:34:39] yeah, I definitely need to read that. Thanks for the help and sorry for my thick-headedness
[20:35:02] it is usually faster to ask than to read the doc :-]]
[20:35:12] and I am happy to answer hehe
[20:35:52] kaldari: the chapter about deploying a new extension is mostly empty. I just described what needs to be done but not the actual commands and culprit :/
[20:38:12] kaldari: I have added a few lines to https://labsconsole.wikimedia.org/wiki/Deployment/Overview#New_extension
[20:40:43] nice
[20:44:10] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours
[20:58:16] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[21:09:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:10:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.267 seconds
[21:11:46] notpeter, congratulations!
[21:27:07] if I'd like a list admin password to be reset for a mailing list, do you need an RT ticket or can I send an email to one person?
[21:28:22] you = any member of ops that knows the site admin password or can do it directly on sodium
[21:33:00] Thehelpfulone, I suppose you could directly mail "you"
[21:33:14] given that not everybody can file rt tickets
[21:33:34] yep, but I wanted to know if ops require some more accessible ticket to keep track of these things
[21:33:52] do you have access to rt?
[21:33:59] I guess they could create a ticket there if they need
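For reference, on a stock Mailman 2.1 install the reset itself is a single command on the list server (sodium here); the install path and list name below are examples:

    # run on the mailman host; by default generates a new list admin
    # password and mails it to the list owners
    sudo /usr/lib/mailman/bin/change_pw -l example-list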
[21:42:49] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Jul 9 21:42:15 UTC 2012
[21:44:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:51:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.137 seconds
[21:58:16] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[22:10:16] Platonides: thanks!
[22:11:22] RECOVERY - NTP on virt1001 is OK: NTP OK: Offset -0.02649593353 secs
[22:11:58] RECOVERY - NTP on virt1002 is OK: NTP OK: Offset -0.03662621975 secs
[22:15:52] RECOVERY - NTP on virt1003 is OK: NTP OK: Offset -0.03335916996 secs
[22:25:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:36:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.034 seconds
[23:05:55] New patchset: Bhartshorne; "adding MAC addresses of new swift backend servers ms-be9-12" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14855
[23:06:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14855
[23:06:55] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14855
[23:07:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:10:10] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:07] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.041 seconds
[23:21:07] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[23:28:19] New patchset: Tim Starling; "Separate l10n and l10nupdate cache directories" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14858
[23:31:00] PROBLEM - swift-object-auditor on ms-be8 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[23:31:36] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[23:32:30] PROBLEM - swift-object-auditor on ms-be6 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[23:39:17] New patchset: Tim Starling; "Create the l10nupdate cache directory" [operations/mediawiki-multiversion] (master) - https://gerrit.wikimedia.org/r/14861
[23:44:39] PROBLEM - Puppet freshness on ms1 is CRITICAL: Puppet has not run in the last 10 hours
[23:47:07] Change abandoned: Tim Starling; "l10nupdate needs to update l10n as well" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14858
[23:47:10] Change abandoned: Tim Starling; "l10nupdate needs to update l10n as well" [operations/mediawiki-multiversion] (master) - https://gerrit.wikimedia.org/r/14861
[23:47:48] PROBLEM - swift-object-auditor on ms-be7 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[23:52:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:57:06] New patchset: Asher; "initial config for new parsecache db" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14862
[23:57:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14862
[23:58:45] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14862