[00:02:23] RECOVERY - Memcached on ms-fe3001 is OK: TCP OK - 0.095 second response time on port 11211
[00:03:54] RECOVERY - Memcached on ms-fe3002 is OK: TCP OK - 0.096 second response time on port 11211
[00:03:54] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (203637)
[00:26:44] PROBLEM - Host mw1163 is DOWN: PING CRITICAL - Packet loss = 100%
[00:27:35] RECOVERY - Host mw1163 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[00:37:14] PROBLEM - Swift HTTP backend on ms-fe3002 is CRITICAL: Connection refused
[00:37:25] PROBLEM - Swift HTTP backend on ms-fe3001 is CRITICAL: Connection refused
[00:51:07] (PS1) CSteipp: Temporarily allow insecure token trasfer for OAuth [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126185
[01:46:14] PROBLEM - Host mw1163 is DOWN: PING CRITICAL - Packet loss = 100%
[01:47:04] RECOVERY - Host mw1163 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[02:13:44] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 2952 MB (3% inode=99%):
[02:19:44] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3492 MB (3% inode=99%):
[02:28:03] !log LocalisationUpdate completed (1.23wmf21) at 2014-04-16 02:28:01+00:00
[02:28:11] Logged the message, Master
[02:42:07] (PS1) Gergő Tisza: Enable MediaViewer user surveys on first batch of pilot sites [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126190
[02:49:04] PROBLEM - MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:51:54] RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[02:55:30] !log LocalisationUpdate completed (1.23wmf22) at 2014-04-16 02:55:28+00:00
[02:55:38] Logged the message, Master
[03:00:44] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:04:54] PROBLEM - Puppet freshness on ms-be3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:04:54] PROBLEM - Puppet freshness on ms-be3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:04:54] PROBLEM - Puppet freshness on ms-be3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:04:54] PROBLEM - Puppet freshness on ms-be3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:04:54] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:04:54] PROBLEM - Puppet freshness on ms-fe3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:06:04] PROBLEM - Host mw1163 is DOWN: PING CRITICAL - Packet loss = 100%
[03:06:44] RECOVERY - Host mw1163 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[03:46:28] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 16 03:46:23 UTC 2014 (duration 46m 22s)
[03:46:34] Logged the message, Master
[04:25:44] PROBLEM - Host mw1163 is DOWN: PING CRITICAL - Packet loss = 100%
[04:26:24] RECOVERY - Host mw1163 is UP: PING OK - Packet loss = 0%, RTA = 0.47 ms
[05:03:44] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[05:06:44] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202212)
[05:45:14] PROBLEM - Host mw1163 is DOWN: PING CRITICAL - Packet loss = 100%
[05:51:10] (CR) Amire80: "I'm not sure what do you mean by this.
No fonts will be loaded automatically unless specifically requested by the page. Fonts for Burmese " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115153 (owner: Odder)
[06:02:47] <_joe_> hi all!
[06:04:48] good morning
[06:05:01] <_joe_> akosiaris: isn't it like dawn there?
[06:05:08] 9 am
[06:05:11] close enough to dawn
[06:05:18] <_joe_> oh yes, 1 hour AHEAD
[06:05:27] hehe... it certainly looks like that today
[06:05:32] crappy weather
[06:05:41] yeah
[06:05:42] meh
[06:05:53] <_joe_> here it's chilly (but don't let andrewbogott_afk know I said that)
[06:05:54] PROBLEM - Puppet freshness on ms-be3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[06:05:54] PROBLEM - Puppet freshness on ms-be3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[06:05:54] PROBLEM - Puppet freshness on ms-be3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[06:05:54] PROBLEM - Puppet freshness on ms-be3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[06:05:54] PROBLEM - Puppet freshness on ms-fe3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[06:05:54] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[06:06:01] and rain tomorrow, bleah
[06:07:36] !log stop mysqld on db38 (x1) for decom
[06:07:41] Logged the message, Master
[06:09:05] <_joe_> springle: good evening, sir
[06:09:22] <_joe_> springle: next week I'll bug you with db questions (I hope)
[06:09:39] hi _joe_
[06:09:42] ok :)
[06:10:25] <_joe_> anyone: do we collect metrics on APC hit rate and usage somewhere?
[06:12:03] <_joe_> we have something in graphite
[06:12:37] no idea if we have anything else
[06:13:45] <_joe_> that's something we'll need to do with diamond
[06:18:44] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[06:22:32] (CR) ArielGlenn: "As sudo::appservers is removed from this class it should be explicitly added to fenari, just so nothing is broken there. I can't imagine " [operations/puppet] - https://gerrit.wikimedia.org/r/126014 (owner: Dzahn)
[06:23:59] (PS1) Springle: Remove db38 from x1 shard [operations/puppet] - https://gerrit.wikimedia.org/r/126202
[06:26:13] (CR) Springle: [C: 2] Remove db38 from x1 shard [operations/puppet] - https://gerrit.wikimedia.org/r/126202 (owner: Springle)
[06:26:44] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202102)
[06:31:46] (PS1) Springle: Remove db48 and db49 from OTRS mail duties. db49 is decommissioned already so hasn't worked as a secondary for a while. [operations/puppet] - https://gerrit.wikimedia.org/r/126203
[06:34:24] (CR) Springle: "Want to decommission db48, but mchenry is talking to MySQL there. Will this config work until mchenry goes away?" [operations/puppet] - https://gerrit.wikimedia.org/r/126203 (owner: Springle)
[06:43:43] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[06:45:58] (PS1) Dzahn: bugzilla,make Apache SSL CipherSuite configurable [operations/puppet] - https://gerrit.wikimedia.org/r/126204
[06:46:00] (PS1) Dzahn: bugzilla, use better SSL cipher suite [operations/puppet] - https://gerrit.wikimedia.org/r/126205
[06:46:02] (PS1) Dzahn: bugzilla, use SSLProtocol ALL -SSLv2 [operations/puppet] - https://gerrit.wikimedia.org/r/126206
[06:53:26] (CR) Dzahn: "why do i get this?"
[operations/puppet] - https://gerrit.wikimedia.org/r/126205 (owner: Dzahn)
[07:00:43] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (200906)
[07:03:03] !log zirconium - upgrading apache2, php5 packages
[07:03:09] Logged the message, Master
[07:06:43] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[07:09:00] (Abandoned) Ori.livneh: beta cluster: un-split-brain memcached config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125926 (owner: Ori.livneh)
[07:09:18] mutante: I doubt many of the ciphers in that ticket will work btw. ECDHE is only supported in apache 2.4 IIRC
[07:10:01] akosiaris: yes, but it's per "If your version of OpenSSL is old, unavailable ciphers will be discarded automatically. Always use the full ciphersuite above and let OpenSSL pick the ones it supports. "
[07:10:29] yea, want newer apache as well .. trusty
[07:10:39] <_joe_> akosiaris: I thought those arguments were passed directly to openSSL and that apache did nothing with them
[07:10:53] _joe_: so did I for some time
[07:10:57] oh, yea, and the error i get is from
[07:11:14] openssl ciphers -v '...'
[07:11:24] <_joe_> mutante: oh, ok
[07:12:01] i dont know yet if that means Apache would break
[07:12:05] <_joe_> mutante: I did some research on PFS, if we don't mind annoying IE users I can take a look at what I did back then
[07:12:12] the suite list is straight from the Mozilla page though
[07:12:26] yeah IE is a pita concerning FS and PFS
[07:12:26] !ie
[07:12:32] <_joe_> mutante: the list on the mozilla PFS page has some issues
[07:12:41] didnt IE6 support finally end in April?
[07:12:48] heh, yea, but it's Bugzilla...
[07:12:57] so we should kind of work for a wide audience
[07:13:02] <_joe_> akosiaris: in particular, TLSv1.1 and TLSv1.2 are disabled by default in IE up to version 9 at least
[07:13:29] <_joe_> mutante: exactly, the problem is not IE6 (which won't work anyway).
[07:13:30] mutante we should also add HSTS
[07:13:39] Header add Strict-Transport-Security "max-age=15552000"
[07:13:58] <_joe_> akosiaris: HSTS and pin the cert?
[07:14:53] !IRClog2Gerrit :)
[07:14:56] <_joe_> ok not our case though
[07:15:15] pinning the cert would be nice too
[07:15:33] <_joe_> akosiaris: that is client-side and verges on the paranoia a little bit
[07:16:14] how about # OCSP Stapling, only in httpd 2.3.3 and later
[07:16:23] SSLUseStapling on
[07:17:13] <_joe_> only in httpd 2.3.3 and later...
[07:17:18] <_joe_> :)
[07:17:45] ok i can live with stapling only as well :-)
[07:18:04] eh, yes:) i just upgraded the 2.2.22 packages btw
[07:19:33] https://launchpad.net/ubuntu/+source/apache2/2.2.22-1ubuntu1.5
[07:20:52] akosiaris: _joe_ , please leave the comments on the gerrit so they are not lost in irc backlog?:)
[07:22:35] 126204 is not touching the suite itself, just making it easier to change, fwiw
[07:26:03] (PS3) Dzahn: remove sudo::appserver from bastions [operations/puppet] - https://gerrit.wikimedia.org/r/126014
[07:26:31] <_joe_> mutante: will do!
[07:27:08] (CR) Dzahn: [C: 1] "now included on fenari, per Ariel's comment" [operations/puppet] - https://gerrit.wikimedia.org/r/126014 (owner: Dzahn)
[07:27:17] _joe_: thanks!
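
For reference, the `max-age=15552000` value in the HSTS header suggested at 07:13:39 is a lifetime in seconds. A minimal Python sketch of the conversion to days follows. The helper name `hsts_header` is hypothetical and only illustrates how the header value is put together; it is not taken from any of the patches discussed in the channel.

```python
from datetime import timedelta

# Value quoted in the log:
#   Header add Strict-Transport-Security "max-age=15552000"
HSTS_MAX_AGE_SECONDS = 15552000


def hsts_header(max_age_seconds: int) -> str:
    """Build the header value (hypothetical helper, for illustration only)."""
    return f"max-age={max_age_seconds}"


# Convert the lifetime from seconds to whole days.
max_age_days = timedelta(seconds=HSTS_MAX_AGE_SECONDS).days
print(hsts_header(HSTS_MAX_AGE_SECONDS), "is", max_age_days, "days")
```

With this value a browser that has seen the header would keep forcing HTTPS for the site for roughly half a year after its last visit.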
[07:27:34] (PS4) Dzahn: remove sudo::appserver from bastions [operations/puppet] - https://gerrit.wikimedia.org/r/126014
[07:28:30] mutante: I saw hoo man also suggested remove mysql_wmf::client from bastionhost
[07:28:52] akosiaris: yea, i saw it somewhere, i think it's already another patch
[07:29:39] https://gerrit.wikimedia.org/r/#/c/126027/
[07:31:49] (CR) Dzahn: [C: 1] "i guess it is like sudo::appservers, it should stay on fenari, but not the other newer bastions. the root cause is that fenari was half ba" [operations/puppet] - https://gerrit.wikimedia.org/r/126027 (owner: Hoo man)
[07:35:55] (PS2) Dzahn: Remove mysql client from bastionhost [operations/puppet] - https://gerrit.wikimedia.org/r/126027 (owner: Hoo man)
[07:45:33] (CR) Dzahn: [C: 2] "confirmed by Jeff" [operations/puppet] - https://gerrit.wikimedia.org/r/125715 (owner: Dzahn)
[07:52:47] (CR) ArielGlenn: [C: 1] remove sudo::appserver from bastions [operations/puppet] - https://gerrit.wikimedia.org/r/126014 (owner: Dzahn)
[08:03:53] good morning
[08:06:31] <_joe_> hi hashar
[08:07:15] I cleaned my email inbox at 2am
[08:07:30] and by 10 am I already have 43 emails
[08:07:55] <_joe_> hashar: I have like 1000 since tonight, but most are cron/root spam
[08:08:16] _joe_: most of mines are Gerrit notifications
[08:08:42] <_joe_> hashar: oh, yes, then there are those
[08:09:18] and bugzilla!!
[08:10:38] (CR) Dzahn: [C: 2] beta: drop pmtpa instances from the natfix subclass [operations/puppet] - https://gerrit.wikimedia.org/r/125194 (owner: Hashar)
[08:10:46] \O/
[08:10:46] adds one notification for hashar
[08:11:09] I created some filters that parse the mails headers and tags the Gerrit emails
[08:11:15] so I can then show them colored
[08:11:24] i.e.
merged notification shows up green
[08:11:47] Jenkins success are purple, Jenkins failure are orange
[08:11:50] and abandoned patches blue
[08:11:57] that ease the triage tremendously
[08:12:09] <_joe_> hashar: which client do you use?
[08:12:14] Thunderbird
[08:12:20] <_joe_> I should do something like that in mutt
[08:12:28] I should learn mutt
[08:12:29] :D
[08:12:32] <_joe_> as soon as I've time to set up the conf for WMF
[08:12:46] <_joe_> hashar: I'm using thunderbird for WMF email at the moment
[08:12:57] i think i'm going back to thunderbird
[08:13:14] <_joe_> mutante: from what?
[08:13:18] google web ui
[08:13:46] i want to sort alpha by subject
[08:13:55] and i have always been TB user in the past
[08:14:17] I definitely need the threaded view
[08:14:28] i got used to tags.. but meh
[08:15:00] do you guys talk to sanger or to google though when doing imap
[08:15:02] <_joe_> gmail is *horribly* dumbed down
[08:15:16] <_joe_> mutante: google, I do some pre-processing there
[08:15:44] <_joe_> I move some mails to places I do not make available for IMAP with labels, but it doesn't seem to work well
[08:15:45] (PS1) ArielGlenn: timeout submit_check_result, see rt #5311 [operations/puppet] - https://gerrit.wikimedia.org/r/126209
[08:16:07] nods
[08:17:30] _joe_ mutante : this is how it looks to me http://imgur.com/ceq4Klr
[08:18:03] (PS2) Dzahn: beta: adjust protoproxy for eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/124057 (owner: Hashar)
[08:18:36] spam = indesirable ?:)
[08:18:51] mutante: yes :-]
[08:19:00] could be translated as "unwanted"
[08:19:07] and the various labels I am applying http://imgur.com/Re6iFFV
[08:19:23] <_joe_> don't you have canned meat in france?
you could translate spam to some local brand :P
[08:20:00] we barely have any
[08:20:27] <_joe_> Yeah, it's the same here, and nobody expects the thing to be edible, either
[08:20:35] usually refers to it as "Corned-beef" ( https://en.wikipedia.org/wiki/Corned_beef )
[08:20:44] irrelevant of the actual 'brand'
[08:20:52] pourriel ?
[08:21:01] <_joe_> and no, we don't have corned beef at all I'd say
[08:21:14] we are huge fans of Pâté though https://en.wikipedia.org/wiki/Pâté :-]
[08:21:19] merdiel, polluriel
[08:21:34] RECOVERY - RAID on dataset1001 is OK: OK: optimal, 2 logical, 24 physical
[08:21:36] mutante: merdiel sounds great!
[08:21:41] ooohhhh
[08:21:44] <_joe_> "merdiel" sounds not so good
[08:22:01] <_joe_> hashar: eheh.
[08:22:13] hashar: http://en.wiktionary.org/wiki/pourriel#Synonyms
[08:22:31] Quebec has a government agency in charge of "properly" translating english words to french words
[08:22:39] you know i always have to check wikt
[08:22:45] red links!
[08:22:57] for email they came with the rather nice "courriel" (short for "courrier électronique" or "electronic mail")
[08:23:15] and yeah they recommend "pourriel" for spam mails http://gdt.oqlf.gouv.qc.ca/ficheOqlf.aspx?Id_Fiche=8349831
[08:23:43] apergos: that was a good "ooh" ,right
[08:23:45] the good thing on that dictionary is that they actually explain why one should not use the english word
[08:24:03] <_joe_> hashar: why so?
[08:24:15] so Pourriel comes from POUbelle (trash bin) and couRRIEL
[08:24:23] <_joe_> (I'm asking as in Italy we do happily use the english words)
[08:24:26] yes, it was a happy dataset1001 ooohhh
[08:24:27] and pourri also means rotted
[08:25:11] _joe_: Quebec is surrounded by english speaking people, I guess enforcing french everywhere is a way for us to show their "independency" and preserve the french culture
[08:25:25] in France we do just like everyone else, we use the english words
[08:26:38] <_joe_> hashar: I thought you used french terms as well (logiciel, ordinateur, etc...)
[08:27:53] (CR) Hashar: "I would adjust the services entries to point to whatever our syslog are. I am not sure whether they are used." [operations/dns] - https://gerrit.wikimedia.org/r/125952 (owner: Dzahn)
[08:28:20] _joe_: that is true, I guess it depends on how lazy we are
[08:28:49] _joe_: for online discussion we use "chat" (which also mean cat hehe)
[08:29:04] (CR) Dzahn: [C: 2] "from BZ: "Please enable ssl/https support for the beta wikis again. It is missing after migration to eqiad"" [operations/puppet] - https://gerrit.wikimedia.org/r/124057 (owner: Hashar)
[08:29:29] that was for star.wmflabs.org
[08:29:37] andrewbogott_afk: hashar &
[08:29:47] mutante: yeah ssl is still broken on beta.
The nginx refuse to start because of some SSL chain issue
[08:30:01] just merged that change
[08:30:09] that replaces pmtpa with eqiad
[08:30:15] in protoproxy
[08:30:15] yeah should be fine
[08:30:32] I have a few more changes applied to puppet master that should be harmless for prod
[08:31:20] https://gerrit.wikimedia.org/r/#/q/owner:hashar+project:operations/puppet+is:open+topic:contint,n,z :-]
[08:34:14] (CR) Dzahn: [C: 2] misc/dsh.pp: retab and almost pass puppet lint [operations/puppet] - https://gerrit.wikimedia.org/r/122789 (owner: Hashar)
[08:37:54] hashar: that monitor check can likely be removed though
[08:38:20] and i think it was on another matanya change
[08:40:14] (CR) Dzahn: "noop, no puppet changes on bast1001" [operations/puppet] - https://gerrit.wikimedia.org/r/122789 (owner: Hashar)
[08:43:05] (CR) Dzahn: [C: 2] "for beta uploads" [operations/puppet] - https://gerrit.wikimedia.org/r/122786 (owner: Hashar)
[08:44:51] (CR) Hashar: beta: New script to restart apaches (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/125888 (owner: BryanDavis)
[08:45:17] (CR) Dzahn: [C: 2] "no access changes, just sorts alpha" [operations/puppet] - https://gerrit.wikimedia.org/r/126154 (owner: Hashar)
[08:49:30] (CR) Dzahn: [C: 1] lvs: generic::upstart_job() now uses boolean values [operations/puppet] - https://gerrit.wikimedia.org/r/118717 (owner: Hashar)
[08:50:27] (CR) Dzahn: [C: 1] twemproxy: generic::upstart_job() now uses boolean values [operations/puppet] - https://gerrit.wikimedia.org/r/118718 (owner: Hashar)
[08:52:03] (CR) Dzahn: [C: 1 V: 1] Lint mediawiki::twemproxy [operations/puppet] - https://gerrit.wikimedia.org/r/121400 (owner: Hashar)
[08:55:35] (CR) Dzahn: [C: 1 V: 1] "dn: uid=parsoid,ou=people,dc=wikimedia,dc=org" [operations/puppet] - https://gerrit.wikimedia.org/r/123212 (owner: Hashar)
[08:59:20] (CR) Dzahn: [C: -1] "are you sure?
i got:" [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[08:59:49] (CR) Dzahn: "Puppet-lint 0.1.12" [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[09:05:55] (CR) Hashar: "I use the version from gem:" [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[09:05:57] PROBLEM - Puppet freshness on ms-be3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[09:05:57] PROBLEM - Puppet freshness on ms-be3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[09:05:57] PROBLEM - Puppet freshness on ms-be3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[09:05:57] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[09:05:57] PROBLEM - Puppet freshness on ms-fe3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[09:05:57] PROBLEM - Puppet freshness on ms-be3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[09:08:41] (CR) Mark Bergsma: [C: 2] Initial commit of pmacct module and role [operations/puppet] - https://gerrit.wikimedia.org/r/115345 (owner: Jkrauska)
[09:10:56] (CR) Dzahn: [C: 2] "wow, much newer" [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[09:11:09] (PS2) Dzahn: puppet-lint: ignore class_parameter_defaults [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[09:11:45] (CR) Dzahn: [C: 2 V: 2] puppet-lint: ignore class_parameter_defaults [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[09:16:11] (CR) Dzahn: "that would mean create syslog.eqiad.wmnet as an alias for ..?"
[operations/dns] - https://gerrit.wikimedia.org/r/125952 (owner: Dzahn)
[09:17:52] (CR) Dzahn: "it's gone from https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=payments.wikimedia.org&nostatusheader" [operations/puppet] - https://gerrit.wikimedia.org/r/125715 (owner: Dzahn)
[09:20:13] ACKNOWLEDGEMENT - Host mw1163 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn 09:22 cmjohnson1 !log shutting down mw1163 to replace DIMM
[09:24:53] (PS1) Dzahn: remove mw1163 from dsh, broken memory [operations/puppet] - https://gerrit.wikimedia.org/r/126215
[09:25:50] (CR) Hashar: [C: 1] "Fine to me. Feel free to +2 it at anytime to get the change applied on the beta cluster." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126185 (owner: CSteipp)
[09:26:22] !log disabling mw1163 in pybal
[09:26:27] Logged the message, Master
[09:28:26] (CR) Dzahn: [C: 2] "fyi, see ticket, also disabled in pybal. please revert in both places after hardware repair is done" [operations/puppet] - https://gerrit.wikimedia.org/r/126215 (owner: Dzahn)
[09:34:05] (CR) Dzahn: [C: 2] contint: directory to hold debian-glue packages [operations/puppet] - https://gerrit.wikimedia.org/r/122712 (owner: Hashar)
[09:35:14] (CR) Dzahn: [C: 1] contint: get rid of misc::pbuilder on slaves [operations/puppet] - https://gerrit.wikimedia.org/r/122707 (owner: Hashar)
[09:36:31] mutante: that last patch "contint: get rid of misc::pbuilder on slaves" https://gerrit.wikimedia.org/r/#/c/122707/ , you can get it merged. It is already on the local puppetmaster :-D
[09:37:02] hashar: ok, i just did not see the dependency first
[09:37:14] ahh
[09:37:17] and creating that directory was easy to check
[09:38:20] (CR) Dzahn: [C: 2] "already on local puppetmaster" [operations/puppet] - https://gerrit.wikimedia.org/r/122707 (owner: Hashar)
[09:39:53] hashar: ok, i hope that reduced your queue a bit..
now handing over to others for the rest though
[09:40:41] mutante: there is one more you can land in, which is to have the contint git-deploy repositories to use submodules https://gerrit.wikimedia.org/r/122342 :D
[09:40:46] all: https://gerrit.wikimedia.org/r/#/q/owner:hashar+status:open,n,z
[09:40:59] and follow up change https://gerrit.wikimedia.org/r/124305 which configure a contint repo in gitdeploy
[09:41:03] both straight forward :]
[09:41:23] the rests are not that trivial :/
[09:41:58] check the tickets for uploads and SSL on beta? :)
[09:42:05] after the recent merges
[09:42:17] i'll look again after a break
[09:42:32] gets coffee
[09:44:05] mutante: for some reason the star.wmflabs.org cert used by nginx protoproxy is invalid and rejected :D
[09:46:36] <_joe_> hashar: where can I fetch it?
[09:46:54] <_joe_> mmmh, coffee...
[09:46:57] got the issue on deployment-cache-bits01.eqiad.wmflabs
[09:47:00] hashar: still? sigh
[09:47:05] https://gerrit.wikimedia.org/r/#/c/126008/
[09:47:06] * hashar just like everyone grabs a coffeee
[09:47:18] <_joe_> well, I have to brew it first
[09:47:40] https://gerrit.wikimedia.org/r/#/c/124859/2/manifests/role/cache.pp
[09:48:04] (PS6) Giuseppe Lavagetto: Substituting the check_graphite script.
[operations/puppet] - https://gerrit.wikimedia.org/r/125726
[09:48:11] my daughter has hidden our coffee capsules grrr
[09:48:28] <_joe_> hashar: coffee capsules, you're doing it all wrong :)
[09:48:46] _joe_: na I am lazy :-]
[09:49:07] mutante: looks like chinese to me :-]
[09:49:14] openssl s_client -connect wikistats.wmflabs.org:443 -CAfile /etc/ssl/certs/ meh,, i got a Verified 0 there yesterday
[09:49:23] so on labs
[09:49:23] after they fixed the chain
[09:49:30] the .key comes from the repository labs/private apparently
[09:49:33] (CR) Giuseppe Lavagetto: "@chase, my responses are inline:" [operations/puppet] - https://gerrit.wikimedia.org/r/125726 (owner: Giuseppe Lavagetto)
[09:49:39] which is *roll the drums* a PUBLIC repository
[09:49:45] hashar: coffee capsules? she is doing the right thing
[09:50:01] hashar: did somebody submit it there? :o
[09:50:21] hashar: afaik there is the special labs project that is all locked down.. just for this
[09:50:33] last change was * 21b4656 - Adding keys for labs (2 years, 5 months ago)
[09:50:43] and.. capsules are just a way to raise the coffee price
[09:51:00] so I guess the keys I get installed are obsoletes
[09:51:03] once you get the machine to refill the capsules yourselves to save again ...
[09:51:31] mutante: well a capsule is only 0,25€ so it is cheaper than me brewing the coffee :-]
[09:51:47] hashar: yes, andrewbogott_afk and RobH replaced the self-signed cert with a real one from RapidSSL
[09:51:48] I usually get my first coffee in bar which is 1,30€ anyway :]
[09:51:53] ahh nice
[09:52:00] hashar: and first the chained file was wrong
[09:52:08] because of https://gerrit.wikimedia.org/r/#/c/126008/
[09:52:14] how can I get the real certs installed so ?
:]
[09:52:23] and another change
[09:52:41] hashar: you must be member in some special project
[09:52:46] that is locked down to NDA people
[09:52:51] and holds actual certs
[09:52:53] afaik
[09:53:02] s/certs/keys
[09:53:26] hashar: install_certificate() in puppet
[09:53:34] ah yeah
[09:53:37] i just don't know which step people did manually
[09:53:38] we did the same for beta cluster
[09:53:46] root access is only granted to folks having NDA
[09:53:49] i never logged into those instances
[09:54:04] it must be the one that has yuviproxy
[09:54:15] for star.wmflabs.org
[09:55:10] i will mail labs-l list to figure it out :-]
[09:55:13] hashar: openssl s_client -connect wikistats.wmflabs.org:443 | grep Subject
[09:55:25] OU = Domain Control Validated - RapidSSL(R), CN = *.wmflabs.org
[09:55:41] yes, good idea
[09:55:57] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC
[10:05:31] mutante: bah that is bug 48501 which has 85 comments
[10:06:02] hashar: indeed :p but there have been recent updates
[10:06:45] let me paste some links there
[10:09:04] sure :D
[10:09:40] I gave up with the SSL cert madness on the ground that i have no clue how certs work :]
[10:13:29] hashar: eh.. strictly speaking. that ticket is "*.{projects}.beta.wmflabs.org"
[10:13:38] *.beta.wmflabs.org != *.wmflabs.org
[10:14:07] unless *.wmflabs.org also has *.beta.wmflabs.org on it
[10:14:21] i doubt robh could buy that kind of cert though
[10:14:50] afaik can just have one level of *
[10:15:20] that is a pity :-(
[10:16:05] mutante: dont waste your time on it anyway :]
[10:19:05] hashar: ok..
last link https://bugzilla.wikimedia.org/show_bug.cgi?id=60833
[10:24:46] (CR) Dzahn: [C: 2] contint::slave-scripts recurse submodules [operations/puppet] - https://gerrit.wikimedia.org/r/122342 (owner: Hashar)
[10:28:11] (CR) Giuseppe Lavagetto: [C: 1] "LGTM, I was just thinking if there could be a way to generalize the cipher suite injection in apache virtualhosts creating a small ad-hoc " [operations/puppet] - https://gerrit.wikimedia.org/r/126204 (owner: Dzahn)
[10:30:07] (CR) Dzahn: "yes, agree to "out of the scope" because we have had soo many attempts to do apache setup in different generic ways" [operations/puppet] - https://gerrit.wikimedia.org/r/126204 (owner: Dzahn)
[10:34:11] (PS2) Dzahn: Remove pmtpa compute servers from puppet. [operations/puppet] - https://gerrit.wikimedia.org/r/125977 (owner: Andrew Bogott)
[10:35:53] (PS3) Dzahn: Remove pmtpa compute servers from puppet. [operations/puppet] - https://gerrit.wikimedia.org/r/125977 (owner: Andrew Bogott)
[10:37:12] node /virt([5-9]|1[0-1]).pmtpa.wmnet/ { .. if $::hostname =~ /^virt5$/ { ..
[10:38:01] how do i remove the change to zookeeper module.. hrmm
[10:38:06] from that
[10:40:01] <_joe_> mutante: I have a patch to your change to support PFS on bugzilla
[10:40:14] :)
[10:40:25] <_joe_> mutante: do you mind if I add it to your change?
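
The node regex pasted at 10:37:12, `/virt([5-9]|1[0-1]).pmtpa.wmnet/`, can be checked against candidate hostnames to see which hosts it covers. The sketch below is only an approximation: Puppet evaluates node regexes with Ruby, and Python's `re` module is used here as a stand-in on the assumption that this simple pattern behaves the same way in both. The candidate hostnames and the `matches_node` helper are made up for illustration.

```python
import re

# Pattern copied from the log line at 10:37:12. Python's re is a stand-in
# for Puppet's Ruby regex engine here (assumption).
NODE_PATTERN = re.compile(r"virt([5-9]|1[0-1]).pmtpa.wmnet")


def matches_node(fqdn: str) -> bool:
    """Unanchored search, assumed to mirror how the node regex is applied."""
    return NODE_PATTERN.search(fqdn) is not None


# Hypothetical candidates virt0 .. virt15, for illustration only.
candidates = [f"virt{n}.pmtpa.wmnet" for n in range(0, 16)]
matched = [host for host in candidates if matches_node(host)]
print(matched)

# The dots in the pattern are unescaped, so each one matches any character,
# not only a literal ".".
print(matches_node("virt5xpmtpaxwmnet"))
```

Under these assumptions the pattern selects virt5 through virt11, which lines up with the "decom virt5-11" change that follows, and leaves virt0, virt2, virt12 and virt15 alone.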
[10:40:35] not at all, please do
[10:40:43] wiki style
[10:41:36] <_joe_> mutante: that's the thing I dislike about gerrit, with normal git you can see the commit history and it's easy to include/exclude patcher
[10:41:57] <_joe_> *s
[10:42:11] you'll have author and committer
[10:43:53] (PS1) Hashar: contint: soften python-voluptuous version requirement [operations/puppet] - https://gerrit.wikimedia.org/r/126221
[10:46:32] (PS1) Dzahn: decom virt5-11, pmtpa compute nodes [operations/puppet] - https://gerrit.wikimedia.org/r/126222
[10:47:24] !log Upgraded Zuul on gallium to wmf-deploy-20140416 (depends on python-voluptuous 0.7+ , Alexandros packaged 0.8.2 which I manually installed to validate).
[10:47:30] Logged the message, Master
[10:48:03] (PS2) Dzahn: decom virt5-11, pmtpa compute nodes [operations/puppet] - https://gerrit.wikimedia.org/r/126222
[10:49:33] (PS1) Ricordisamoa: Activate the "other projects" sidebar managed by Wikibase in itwikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126223
[10:51:52] (CR) Dzahn: [C: 2] decom virt5-11, pmtpa compute nodes [operations/puppet] - https://gerrit.wikimedia.org/r/126222 (owner: Dzahn)
[10:53:04] (CR) Dzahn: [C: -2] "done in Change-Id: I413779fcd incl. DHCP" [operations/puppet] - https://gerrit.wikimedia.org/r/125977 (owner: Andrew Bogott)
[10:53:32] (Abandoned) Dzahn: Decom virt12 [operations/puppet] - https://gerrit.wikimedia.org/r/125984 (owner: Andrew Bogott)
[10:54:19] (Abandoned) Dzahn: Remove pmtpa compute servers from puppet.
[operations/puppet] - https://gerrit.wikimedia.org/r/125977 (owner: Andrew Bogott)
[10:55:47] (PS2) Giuseppe Lavagetto: bugzilla, use better SSL cipher suite [operations/puppet] - https://gerrit.wikimedia.org/r/126205 (owner: Dzahn)
[10:56:40] !log stopping puppet on virt5-11
[10:56:46] Logged the message, Master
[10:57:20] (CR) Giuseppe Lavagetto: [C: 1] "Also, the openssl ciphers -v works on zirconium if you eliminate whitespace from the ciphers list." [operations/puppet] - https://gerrit.wikimedia.org/r/126205 (owner: Dzahn)
[10:57:42] _joe_: whitespace? oh nice, thanks
[10:57:59] (CR) John F. Lewis: [C: 1] "Looks good." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126223 (owner: Ricordisamoa)
[10:58:38] SSLCompression Off .. oh, yes, i saw and forgot
[11:00:14] <_joe_> mutante: yes :)
[11:00:34] <_joe_> it's very important if we want to protect long-lived sessions
[11:01:19] <_joe_> I remember the time I saw CRIME and thought 'wow this is so bad' - it was pre-heartbleed I guess. Now it seems like a 2nd-tier problem
[11:01:51] <_joe_> and it must be noted that any sane browser now disables ssl compression on the client side
[11:02:24] <_joe_> so yeah, still important to score well on ssl checkers, but not really that relevant :)
[11:02:50] thanks for that new PS :)
[11:03:25] <_joe_> now moving to the fascinating problem of compiling our puppet manifests in puppet 3
[11:03:31] !log virt5-11 revoked puppet certs and salt keys
[11:03:35] Logged the message, Master
[11:03:44] <_joe_> Did anyone work on that already?
[11:03:48] _joe_: ooh.. there is an Etherpad link
[11:03:57] that lists all the issues you will run into
[11:04:00] <_joe_> mutante: that is for fixing things
[11:04:02] i think faidon did
[11:04:17] to create that list, and matanya has been creating the patches based on that
[11:04:31] <_joe_> mutante: did we try to actually compile our manifests with real facts somewhere?
[11:04:36] i dunno [11:04:38] <_joe_> ok I'll ask faidon [11:04:40] <_joe_> :) [11:12:28] mutante: are you doing the whole decom process for virt5-15, or just merging that one puppet patch? [11:13:19] 04:04 < mutante> !log virt5-11 revoked puppet certs and salt keys [11:13:24] already doing [11:13:32] ok [11:13:35] andrewbogott: i would have stopped before shutdown [11:13:38] and let you do that [11:14:05] you're welcome to do shutdown as well -- everything seems to have gone ok with 12 yesterday. [11:14:23] but either way, just let me know how far you get :) [11:14:26] alright, that's what i wanted to hear, doing so then [11:15:58] !log virt5-11 removing from icinga [11:16:04] Logged the message, Master [11:16:38] (PS1) Prtksxna: TextExtracts: Add classes and elements to the exclusion list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126226 [11:16:59] mutante: I want to shut down virt2 myself, and virt0 is going to live on for a while. [11:17:16] There also looks to be a virt15 which you can also decom, it isn't doing much as far as I can tell. [11:17:31] _joe_: there is a 3rd one, that is related https://gerrit.wikimedia.org/r/#/c/126206/ [11:17:52] andrewbogott: yes, i just touched the ones where i already knew you wanted to kill them, the compute nodes [11:18:00] not doing the others for now [11:18:21] virt15, ok [11:20:37] (CR) Giuseppe Lavagetto: [C: 1] "Much better that the preceding version." [operations/puppet] - https://gerrit.wikimedia.org/r/126206 (owner: Dzahn) [11:20:52] <_joe_> mutante: this was easy :) [11:21:47] (PS1) Dzahn: rm wap.wikipedia.org apache site [operations/puppet] - https://gerrit.wikimedia.org/r/126227 [11:22:40] <_joe_> sigh, WAP was such a good technology... [11:22:51] (CR) Dzahn: "isnt this done by?" 
[operations/puppet] - https://gerrit.wikimedia.org/r/126227 (owner: Dzahn) [11:23:23] apergos: reopened with that :P ^ [11:23:29] _joe_: thx [11:29:44] !log upgraded Zuul to wmf-deploy-20140416-2 [11:29:46] (CR) ArielGlenn: [C: 1] rm wap.wikipedia.org apache site [operations/puppet] - https://gerrit.wikimedia.org/r/126227 (owner: Dzahn) [11:29:50] Logged the message, Master [11:33:38] (PS1) Hashar: zuul: remove push_change_refs setting [operations/puppet] - https://gerrit.wikimedia.org/r/126229 [11:34:43] akosiaris: works fine, closed ticket :) https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=carbon&service=Ubuntu+mirror+in+sync+with+upstream [11:34:55] _joe_: haha [11:35:09] (CR) Gilles: [C: 2] Enable MediaViewer user surveys on first batch of pilot sites [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126190 (owner: Gergő Tisza) [11:35:15] (Merged) jenkins-bot: Enable MediaViewer user surveys on first batch of pilot sites [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126190 (owner: Gergő Tisza) [11:36:00] mutante: yey! [11:39:01] !log Upgraded Zuul to wmf-deploy-20140416-3 (bring in a84f0e4 - "Make queue processing more efficient" which was much needed) [11:39:04] Logged the message, Master [11:39:12] akosiaris: you can get python-voluptuous on apt.wikimedia.org :-] [11:39:17] zuul is happy! 
[11:39:30] would need https://gerrit.wikimedia.org/r/126221 as well [11:39:34] hashar: thanks :-) [11:39:36] since the package was pinned via puppet [11:39:41] ok will do [11:39:55] and https://gerrit.wikimedia.org/r/#/c/126229/ remove some old settings that are no more used [11:39:58] :] [11:40:58] !log upgraded python-voluptuous on apt.wikimedia.org to 0.8.2-1wmf1 [11:41:04] Logged the message, Master [11:41:20] PROBLEM - RAID on virt5 is CRITICAL: CRITICAL: Active: 14, Working: 14, Failed: 1, Spare: 0 [11:41:29] (CR) Alexandros Kosiaris: [C: 2] contint: soften python-voluptuous version requirement [operations/puppet] - https://gerrit.wikimedia.org/r/126221 (owner: Hashar) [11:42:25] ACKNOWLEDGEMENT - RAID on virt5 is CRITICAL: CRITICAL: Active: 14, Working: 14, Failed: 1, Spare: 0 daniel_zahn RT #6541 [11:42:52] mutante: after the decom we'll still be able to keep track of which box was virt5 so it can get a new disk? [11:43:42] andrewbogott: yes, from racktables data WMF3662 [11:43:48] (CR) Alexandros Kosiaris: [C: 2] zuul: remove push_change_refs setting [operations/puppet] - https://gerrit.wikimedia.org/r/126229 (owner: Hashar) [11:45:01] andrewbogott: updated 6541 [11:45:19] can you update 6158? [11:50:01] akosiaris: and you can get python-voluptuous updated on apt.wm.o :-] [11:52:50] (CR) Dzahn: "host wap.wikipedia.org = ??" [operations/puppet] - https://gerrit.wikimedia.org/r/126227 (owner: Dzahn) [11:55:59] hashar: we have beta.wap. but not wap it seems :p [11:57:30] (PS1) Dzahn: remove WAP / beta.wap.wikipedia.org ? [operations/dns] - https://gerrit.wikimedia.org/r/126232 [11:58:11] i see, all language versions [11:59:16] (Abandoned) Dzahn: remove WAP / beta.wap.wikipedia.org ? 
[operations/dns] - https://gerrit.wikimedia.org/r/126232 (owner: Dzahn) [11:59:18] mutante: that is not the beta cluster [11:59:46] hashar: ok, yea, i take the DNS thing back, just that old Apache config is not used [12:00:00] (CR) Hashar: "I think the mobile team is phasing out wap. Might catch up with them to have a confirmation." [operations/dns] - https://gerrit.wikimedia.org/r/126232 (owner: Dzahn) [12:01:03] (CR) Dzahn: "i see, all the language versions are generated from langlist.. so:" [operations/puppet] - https://gerrit.wikimedia.org/r/126227 (owner: Dzahn) [12:01:34] (CR) Dzahn: "this Apache config is not used nevertheless, it is all on Varnish, right" [operations/puppet] - https://gerrit.wikimedia.org/r/126227 (owner: Dzahn) [12:02:49] (CR) Dzahn: "also see Change-Id: I5152ff336ca3" [operations/puppet] - https://gerrit.wikimedia.org/r/126227 (owner: Dzahn) [12:11:54] !log virt5-11 - shut down [12:12:00] andrewbogott: they are actually down now [12:12:01] Logged the message, Master [12:12:13] mutante: cool. I'll tell virt0 that they're gone. [12:12:22] great [12:24:09] (CR) Alexandros Kosiaris: [C: -1] "I quick look show that at least one problem is going to be fixed by this. udp2log" (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/110943 (owner: Andrew Bogott) [12:25:30] (CR) Dzahn: "nfs1 is a Wikimedia central syslog server (nfs) (misc::syslog-server)." [operations/dns] - https://gerrit.wikimedia.org/r/125952 (owner: Dzahn) [12:30:44] (CR) Lydia Pintscher: [C: 1] "Good from PM side :)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126223 (owner: Ricordisamoa) [12:34:55] (PS8) Dzahn: move LDAP admin permissions,tools out of site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/125721 [12:38:23] (CR) Dzahn: [C: 2] "shouldn't change what people can do on formey who had sudo before. 
after this find a replacement for formey though and apply there (and ma" [operations/puppet] - https://gerrit.wikimedia.org/r/125721 (owner: Dzahn) [12:41:19] (CR) Dzahn: "ran fine on formey. well it did remove one thing, the permission to run /var/lib/gerrit2/review_site/bin/gerrit.sh but that was intended a" [operations/puppet] - https://gerrit.wikimedia.org/r/125721 (owner: Dzahn) [12:56:06] mornin manybubbles, shall I continue? [12:56:06] (CR) Andrew Bogott: Restore sysctl priorities. (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/110943 (owner: Andrew Bogott) [12:56:07]