[00:02:23] RECOVERY - Memcached on ms-fe3001 is OK: TCP OK - 0.095 second response time on port 11211
[00:03:54] RECOVERY - Memcached on ms-fe3002 is OK: TCP OK - 0.096 second response time on port 11211
[00:03:54] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (203637)
[00:26:44] PROBLEM - Host mw1163 is DOWN: PING CRITICAL - Packet loss = 100%
[00:27:35] RECOVERY - Host mw1163 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[00:37:14] PROBLEM - Swift HTTP backend on ms-fe3002 is CRITICAL: Connection refused
[00:37:25] PROBLEM - Swift HTTP backend on ms-fe3001 is CRITICAL: Connection refused
[00:51:07] (PS1) CSteipp: Temporarily allow insecure token trasfer for OAuth [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126185
[01:46:14] PROBLEM - Host mw1163 is DOWN: PING CRITICAL - Packet loss = 100%
[01:47:04] RECOVERY - Host mw1163 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[02:13:44] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 2952 MB (3% inode=99%):
[02:19:44] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3492 MB (3% inode=99%):
[02:28:03] !log LocalisationUpdate completed (1.23wmf21) at 2014-04-16 02:28:01+00:00
[02:28:11] Logged the message, Master
[02:42:07] (PS1) Gergő Tisza: Enable MediaViewer user surveys on first batch of pilot sites [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126190
[02:49:04] PROBLEM - MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:51:54] RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[02:55:30] !log LocalisationUpdate completed (1.23wmf22) at 2014-04-16 02:55:28+00:00
[02:55:38] Logged the message, Master
[03:00:44] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:04:54] PROBLEM - Puppet freshness on ms-be3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:04:54] PROBLEM - Puppet freshness on ms-be3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:04:54] PROBLEM - Puppet freshness on ms-be3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:04:54] PROBLEM - Puppet freshness on ms-be3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:04:54] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:04:54] PROBLEM - Puppet freshness on ms-fe3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:06:04] PROBLEM - Host mw1163 is DOWN: PING CRITICAL - Packet loss = 100%
[03:06:44] RECOVERY - Host mw1163 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[03:46:28] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 16 03:46:23 UTC 2014 (duration 46m 22s)
[03:46:34] Logged the message, Master
[04:25:44] PROBLEM - Host mw1163 is DOWN: PING CRITICAL - Packet loss = 100%
[04:26:24] RECOVERY - Host mw1163 is UP: PING OK - Packet loss = 0%, RTA = 0.47 ms
[05:03:44] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[05:06:44] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202212)
[05:45:14] PROBLEM - Host mw1163 is DOWN: PING CRITICAL - Packet loss = 100%
[05:51:10] (CR) Amire80: "I'm not sure what do you mean by this. No fonts will be loaded automatically unless specifically requested by the page. Fonts for Burmese " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115153 (owner: Odder)
[06:02:47] <_joe_> hi all!
[06:04:48] good morning
[06:05:01] <_joe_> akosiaris: isn't it like dawn there?
[06:05:08] 9 am
[06:05:11] close enough to dawn
[06:05:18] <_joe_> oh yes, 1 hour AHEAD
[06:05:27] hehe... it certainly looks like that today
[06:05:32] crappy weather
[06:05:41] yeah
[06:05:42] meh
[06:05:53] <_joe_> here it's chilly (but don't let andrewbogott_afk know I said that)
[06:05:54] PROBLEM - Puppet freshness on ms-be3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[06:05:54] PROBLEM - Puppet freshness on ms-be3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[06:05:54] PROBLEM - Puppet freshness on ms-be3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[06:05:54] PROBLEM - Puppet freshness on ms-be3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[06:05:54] PROBLEM - Puppet freshness on ms-fe3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[06:05:54] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[06:06:01] and rain tmorrow, bleah
[06:07:36] !log stop mysqld on db38 (x1) for decom
[06:07:41] Logged the message, Master
[06:09:05] <_joe_> springle: good evening, sir
[06:09:22] <_joe_> springle: next week I'll bug you with db questions (I hope)
[06:09:39] hi _joe_
[06:09:42] ok :)
[06:10:25] <_joe_> anyone: do we collect metrics on APC hit rate and usage somewhere?
[06:12:03] <_joe_> we have someth8ing in graphite
[06:12:37] no idea if we have anything else
[06:13:45] <_joe_> that's something we'll need to do with diamond
[06:18:44] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[06:22:32] (CR) ArielGlenn: "As sudo::appservers is removed from this class it should be explicitly added to fenari, just so nothing is broken there. I can't imagine " [operations/puppet] - https://gerrit.wikimedia.org/r/126014 (owner: Dzahn)
[06:23:59] (PS1) Springle: Remove db38 from x1 shard [operations/puppet] - https://gerrit.wikimedia.org/r/126202
[06:26:13] (CR) Springle: [C: 2] Remove db38 from x1 shard [operations/puppet] - https://gerrit.wikimedia.org/r/126202 (owner: Springle)
[06:26:44] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202102)
[06:31:46] (PS1) Springle: Remove db48 and db49 from OTRS mail duties. db49 is decommissioned already so hasn't worked as a secondary for a while. [operations/puppet] - https://gerrit.wikimedia.org/r/126203
[06:34:24] (CR) Springle: "Want to decommission db48, but mchenry is talking to MySQL there. Will this config work until mchenry goes away?" [operations/puppet] - https://gerrit.wikimedia.org/r/126203 (owner: Springle)
[06:43:43] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[06:45:58] (PS1) Dzahn: bugzilla,make Apache SSL CipherSuite configurable [operations/puppet] - https://gerrit.wikimedia.org/r/126204
[06:46:00] (PS1) Dzahn: bugzilla, use better SSL cipher suite [operations/puppet] - https://gerrit.wikimedia.org/r/126205
[06:46:02] (PS1) Dzahn: bugzilla, use SSLProtocol ALL -SSLv2 [operations/puppet] - https://gerrit.wikimedia.org/r/126206
[06:53:26] (CR) Dzahn: "why do i get this?" [operations/puppet] - https://gerrit.wikimedia.org/r/126205 (owner: Dzahn)
[07:00:43] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (200906)
[07:03:03] !log zirconium - upgrading apache2, php5 packages
[07:03:09] Logged the message, Master
[07:06:43] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[07:09:00] (Abandoned) Ori.livneh: beta cluster: un-split-brain memcached config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125926 (owner: Ori.livneh)
[07:09:18] mutante: I doubt many of the ciphers in that ticket will work btw. ECDHE is only supported in apache 2.4 IIRC
[07:10:01] akosiaris: yes, but it's per "If your version of OpenSSL is old, unavailable ciphers will be discarded automatically. Always use the full ciphersuite above and let OpenSSL pick the ones it supports. "
[07:10:29] yea, want newer apache as well .. trusty
[07:10:39] <_joe_> akosiaris: I thought those arguments were passed directly to openSSL and that apache did nothing with them
[07:10:53] _joe_: so did I for some time
[07:10:57] oh, yea, and the error i get is from
[07:11:14] openssl ciphers -v '...'
[07:11:24] <_joe_> mutante: oh, ok
[07:12:01] i dont know yet if that means Apache would break
[07:12:05] <_joe_> mutante: I did some research on PFS, if we don't mind annoying IE users I can take a look at what I did back then
[07:12:12] the suite list is straight from the Mozilla page though
[07:12:26] yeah IE is a pita concerning FS and PFS
[07:12:26] !ie
[07:12:32] <_joe_> mutante: the list on the mozilla PFS page hase some issues
[07:12:41] didnt IE6 support finally end in April?
[07:12:48] heh, yea, but it's Bugzilla...
[07:12:57] so we should kind of work for a wide audience
[07:13:02] <_joe_> akosiaris: in particular, TLSv1.1 and TLSv1.2 are disabled by default in IE up to version 9 at least
[07:13:29] <_joe_> mutante: exactly, the problem is not IE6 (which won't work anyway).
[07:13:30] mutante we should also add HSTS
[07:13:39] Header add Strict-Transport-Security "max-age=15552000"
[07:13:58] <_joe_> akosiaris: HSTS and pin the cert?
[07:14:53] !IRClog2Gerrit :)
[07:14:56] <_joe_> ok not our case though
[07:15:15] pinning the cert would be nice too
[07:15:33] <_joe_> akosiaris: that is client-side and verges on the paranoia a little bit
[07:16:14] how about # OCSP Stapling, only in httpd 2.3.3 and later
[07:16:23] SSLUseStapling on
[07:17:13] <_joe_> only in httpd 2.3.3 and later...
[07:17:18] <_joe_> :)
[07:17:45] ok i can live with stapling only as well :-)
[07:18:04] eh, yes:) i just upgraded the 2.2.22 packages btw
[07:19:33] https://launchpad.net/ubuntu/+source/apache2/2.2.22-1ubuntu1.5
[07:20:52] akosiaris: _joe_ , please leave the comments on the gerrit so they are not list in irc backlog?:)
[07:22:35] 126204 is not touching the suite itself, just making it easier to change, fwiw
[07:26:03] (PS3) Dzahn: remove sudo::appserver from bastions [operations/puppet] - https://gerrit.wikimedia.org/r/126014
[07:26:31] <_joe_> mutante: will do!
[07:27:08] (CR) Dzahn: [C: 1] "now included on fenari, per Ariel's comment" [operations/puppet] - https://gerrit.wikimedia.org/r/126014 (owner: Dzahn)
[07:27:17] _joe_: thanks!
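For reference, the `max-age` in the HSTS header proposed at 07:13:39 is a lifetime in seconds. The sketch below is not from the channel; `parse_hsts` and `max_age_days` are hypothetical helper names, and the parsing is deliberately simplified. It splits a Strict-Transport-Security value into its directives and converts the lifetime to days, which for the value quoted above works out to 180.

```python
from datetime import timedelta


def parse_hsts(value: str) -> dict:
    """Split a Strict-Transport-Security value into its directives.

    Hypothetical helper for illustration; directives without an argument
    (such as includeSubDomains) are stored as True.
    """
    directives = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or True
    return directives


def max_age_days(value: str) -> float:
    """Return the max-age directive of an HSTS value, expressed in days."""
    seconds = int(parse_hsts(value)["max-age"])
    return timedelta(seconds=seconds) / timedelta(days=1)


# The value quoted in the log at 07:13:39.
print(max_age_days("max-age=15552000"))
```

A browser that has seen this header would keep forcing HTTPS for that long after its last visit, so a short lifetime may be the safer choice while a configuration is still being tested.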
[07:27:34] (PS4) Dzahn: remove sudo::appserver from bastions [operations/puppet] - https://gerrit.wikimedia.org/r/126014
[07:28:30] mutante: I saw hoo man also suggested remove mysql_wmf::client from bastionhost
[07:28:52] akosiaris: yea, i saw it somewhere, i think it's already another patch
[07:29:39] https://gerrit.wikimedia.org/r/#/c/126027/
[07:31:49] (CR) Dzahn: [C: 1] "i guess it is like sudo::appservers, it should stay on fenari, but not the other newer bastions. the root cause is that fenari was half ba" [operations/puppet] - https://gerrit.wikimedia.org/r/126027 (owner: Hoo man)
[07:35:55] (PS2) Dzahn: Remove mysql client from bastionhost [operations/puppet] - https://gerrit.wikimedia.org/r/126027 (owner: Hoo man)
[07:45:33] (CR) Dzahn: [C: 2] "confirmed by Jeff" [operations/puppet] - https://gerrit.wikimedia.org/r/125715 (owner: Dzahn)
[07:52:47] (CR) ArielGlenn: [C: 1] remove sudo::appserver from bastions [operations/puppet] - https://gerrit.wikimedia.org/r/126014 (owner: Dzahn)
[08:03:53] good morning
[08:06:31] <_joe_> hi hashar
[08:07:15] I cleaned my email inbox at 2am
[08:07:30] and by 10 am I already have 43 emails
[08:07:55] <_joe_> hashar: I have like 1000 since tonight, but most are cron/root spam
[08:08:16] _joe_: most of mines are Gerrit notifications
[08:08:42] <_joe_> hashar: oh, yes, then there are those
[08:09:18] and bugzilla!!
[08:10:38] (CR) Dzahn: [C: 2] beta: drop pmtpa instances from the natfix subclass [operations/puppet] - https://gerrit.wikimedia.org/r/125194 (owner: Hashar)
[08:10:46] \O/
[08:10:46] adds one notification for hashar
[08:11:09] I created some filters that parse the mails headers and tags the Gerrit emails
[08:11:15] so I can then show them colored
[08:11:24] i.e. merged notification shows up green
[08:11:47] Jenkins success are purple, Jenkins failure are orange
[08:11:50] and abandonned patches blue
[08:11:57] that ease the triage tremendously
[08:12:09] <_joe_> hashar: which client do you use?
[08:12:14] Thunderbird
[08:12:20] <_joe_> I should do something like that in mutt
[08:12:28] I should learn mutt
[08:12:29] :D
[08:12:32] <_joe_> as soon as I've time to set up the conf for WMF
[08:12:46] <_joe_> hashar: I'm using thunderbird for WMF email at the moment
[08:12:57] i think i'm going back to thunderbird
[08:13:14] <_joe_> mutante: from what?
[08:13:18] google web ui
[08:13:46] i want to sort alpha by subject
[08:13:55] and i have always been TB user in the past
[08:14:17] I definitely need the threaded view
[08:14:28] i got used to tags.. but meh
[08:15:00] do you guys talk to sanger or to google though when doing imap
[08:15:02] <_joe_> gmail is *horribly* dumbed down
[08:15:16] <_joe_> mutante: google, I do some pre-processing there
[08:15:44] <_joe_> I move some mails to places I do not make available for IMAP with labels, but it doesn't seem to work well
[08:15:45] (PS1) ArielGlenn: timeout submit_check_result, see rt #5311 [operations/puppet] - https://gerrit.wikimedia.org/r/126209
[08:16:07] nods
[08:17:30] _joe_ mutante : this is how it looks to me http://imgur.com/ceq4Klr
[08:18:03] (PS2) Dzahn: beta: adjust protoproxy for eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/124057 (owner: Hashar)
[08:18:36] spam = indesirable ?:)
[08:18:51] mutante: yes :-]
[08:19:00] could be translated as "unwanted"
[08:19:07] and the various labels I am applying http://imgur.com/Re6iFFV
[08:19:23] <_joe_> don't you have canned meat in france? you could translate spam to some local brand :P
[08:20:00] we barely have any
[08:20:27] <_joe_> Yeah, it's the same here, and nobody expects the thing to be edible, either
[08:20:35] usually refers to it as "Corned-beef" ( https://en.wikipedia.org/wiki/Corned_beef )
[08:20:44] irrelevant of the actual 'brand'
[08:20:52] pourriel ?
[08:21:01] <_joe_> and no, we don't have corned beef at all I'd say
[08:21:14] we are huge fans of Pâté though https://en.wikipedia.org/wiki/Pâté :-]
[08:21:19] merdiel, polluriel
[08:21:34] RECOVERY - RAID on dataset1001 is OK: OK: optimal, 2 logical, 24 physical
[08:21:36] mutante: merdiel sounds great!
[08:21:41] ooohhhh
[08:21:44] <_joe_> "merdiel" sounds not so good
[08:22:01] <_joe_> hashar: eheh.
[08:22:13] hashar: http://en.wiktionary.org/wiki/pourriel#Synonyms
[08:22:31] Quebec has a government agency in charge of "properly" translating english words to french words
[08:22:39] you know i always have to check wikt
[08:22:45] red links!
[08:22:57] for email they came with the rather nice "courriel" (short for "courrier électronique" or "electronic mail)
[08:23:15] and yeah they recommand "pourriel" for spam mails http://gdt.oqlf.gouv.qc.ca/ficheOqlf.aspx?Id_Fiche=8349831
[08:23:43] apergos: that was a good "ooh" ,right
[08:23:45] the good thing on that dictionary is that they actually explain why one should not use the english word
[08:24:03] <_joe_> hashar: why so?
[08:24:15] so Pourriel comes from POUbelle (trash bin) and couRRIEL
[08:24:23] <_joe_> (I'm asking as in Italy we do happily use the english words)
[08:24:26] yes, it was a happy dataset1001 ooohhh
[08:24:27] and pourri also means rotted
[08:25:11] _joe_: Quebec is surrounded by english speaking people, I guess enforcing french everywhere is a way for us to show their "independency" and preserve the french culture
[08:25:25] in France we do just like everyone else, we use the english words
[08:26:38] <_joe_> hashar: I thought you used french terms as well (logiciel, ordinateur, etc...)
[08:27:53] (CR) Hashar: "I would adjust the services entries to point to whatever our syslog are. I am not sure whether they are used." [operations/dns] - https://gerrit.wikimedia.org/r/125952 (owner: Dzahn)
[08:28:20] _joe_: that is true, I guess it depends on how lazy we are
[08:28:49] _joe_: for online discussion we use "chat" (which also mean cat hehe)
[08:29:04] (CR) Dzahn: [C: 2] "from BZ: "Please enable ssl/https support for the beta wikis again. It is missing after migration to eqiad"" [operations/puppet] - https://gerrit.wikimedia.org/r/124057 (owner: Hashar)
[08:29:29] that was for star.wmflabs.org
[08:29:37] andrewbogott_afk: hashar &
[08:29:47] mutante: yeah ssl is still broken on beta. The nginx refuse to start because of some SSL chain issue
[08:30:01] just merged that change
[08:30:09] that replaces pmtpa with eqiad
[08:30:15] in protoproxy
[08:30:15] yeah should be fine
[08:30:32] I have a few more changes applied to puppet master that should be harmless for prod
[08:31:20] https://gerrit.wikimedia.org/r/#/q/owner:hashar+project:operations/puppet+is:open+topic:contint,n,z :-]
[08:34:14] (CR) Dzahn: [C: 2] misc/dsh.pp: retab and almost pass puppet lint [operations/puppet] - https://gerrit.wikimedia.org/r/122789 (owner: Hashar)
[08:37:54] hashar: that monitor check can likely be removed though
[08:38:20] and i think it was on another matanya change
[08:40:14] (CR) Dzahn: "noop, no puppet changes on bast1001" [operations/puppet] - https://gerrit.wikimedia.org/r/122789 (owner: Hashar)
[08:43:05] (CR) Dzahn: [C: 2] "for beta uploads" [operations/puppet] - https://gerrit.wikimedia.org/r/122786 (owner: Hashar)
[08:44:51] (CR) Hashar: beta: New script to restart apaches (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/125888 (owner: BryanDavis)
[08:45:17] (CR) Dzahn: [C: 2] "no access changes, just sorts alpha" [operations/puppet] - https://gerrit.wikimedia.org/r/126154 (owner: Hashar)
[08:49:30] (CR) Dzahn: [C: 1] lvs: generic::upstart_job() now uses boolean values [operations/puppet] - https://gerrit.wikimedia.org/r/118717 (owner: Hashar)
[08:50:27] (CR) Dzahn: [C: 1] twemproxy: generic::upstart_job() now uses boolean values [operations/puppet] - https://gerrit.wikimedia.org/r/118718 (owner: Hashar)
[08:52:03] (CR) Dzahn: [C: 1 V: 1] Lint mediawiki::twemproxy [operations/puppet] - https://gerrit.wikimedia.org/r/121400 (owner: Hashar)
[08:55:35] (CR) Dzahn: [C: 1 V: 1] "dn: uid=parsoid,ou=people,dc=wikimedia,dc=org" [operations/puppet] - https://gerrit.wikimedia.org/r/123212 (owner: Hashar)
[08:59:20] (CR) Dzahn: [C: -1] "are you sure? i got:" [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[08:59:49] (CR) Dzahn: "Puppet-lint 0.1.12" [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[09:05:55] (CR) Hashar: "I use the version from gem:" [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[09:05:57] PROBLEM - Puppet freshness on ms-be3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[09:05:57] PROBLEM - Puppet freshness on ms-be3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[09:05:57] PROBLEM - Puppet freshness on ms-be3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[09:05:57] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[09:05:57] PROBLEM - Puppet freshness on ms-fe3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[09:05:57] PROBLEM - Puppet freshness on ms-be3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[09:08:41] (CR) Mark Bergsma: [C: 2] Initial commit of pmacct module and role [operations/puppet] - https://gerrit.wikimedia.org/r/115345 (owner: Jkrauska)
[09:10:56] (CR) Dzahn: [C: 2] "wow, much newer" [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[09:11:09] (PS2) Dzahn: puppet-lint: ignore class_parameter_defaults [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[09:11:45] (CR) Dzahn: [C: 2 V: 2] puppet-lint: ignore class_parameter_defaults [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[09:16:11] (CR) Dzahn: "that would mean create syslog.eqiad.wmnet as an alias for ..?" [operations/dns] - https://gerrit.wikimedia.org/r/125952 (owner: Dzahn)
[09:17:52] (CR) Dzahn: "it's gone from https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=payments.wikimedia.org&nostatusheader" [operations/puppet] - https://gerrit.wikimedia.org/r/125715 (owner: Dzahn)
[09:20:13] ACKNOWLEDGEMENT - Host mw1163 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn 09:22 cmjohnson1 !log shutting down mw1163 to replace DIMM
[09:24:53] (PS1) Dzahn: remove mw1163 from dsh, broken memory [operations/puppet] - https://gerrit.wikimedia.org/r/126215
[09:25:50] (CR) Hashar: [C: 1] "Fine to me. Feel free to +2 it at anytime to get the change applied on the beta cluster." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126185 (owner: CSteipp)
[09:26:22] !log disabling mw1163 in pybal
[09:26:27] Logged the message, Master
[09:28:26] (CR) Dzahn: [C: 2] "fyi, see ticket, also disabled in pybal. please revert in both places after hardware repair is done" [operations/puppet] - https://gerrit.wikimedia.org/r/126215 (owner: Dzahn)
[09:34:05] (CR) Dzahn: [C: 2] contint: directory to hold debian-glue packages [operations/puppet] - https://gerrit.wikimedia.org/r/122712 (owner: Hashar)
[09:35:14] (CR) Dzahn: [C: 1] contint: get rid of misc::pbuilder on slaves [operations/puppet] - https://gerrit.wikimedia.org/r/122707 (owner: Hashar)
[09:36:31] mutante: that last patch "contint: get rid of misc::pbuilder on slaves" https://gerrit.wikimedia.org/r/#/c/122707/ , you can get it merged. It is already on the local puppetmaster :-D
[09:37:02] hashar: ok, i just did not see the dependency first
[09:37:14] ahh
[09:37:17] and creating that directory was easy to check
[09:38:20] (CR) Dzahn: [C: 2] "already on local puppetmaster" [operations/puppet] - https://gerrit.wikimedia.org/r/122707 (owner: Hashar)
[09:39:53] hashar: ok, i hope that reduced your queue a bit.. now handing over to others for the rest though
[09:40:41] mutante: there is one more you can land in, which is to have the contint git-deploy repositories to use submodules https://gerrit.wikimedia.org/r/122342 :D
[09:40:46] all: https://gerrit.wikimedia.org/r/#/q/owner:hashar+status:open,n,z
[09:40:59] and follow up change https://gerrit.wikimedia.org/r/124305 which configure a contint repo in gitdeploy
[09:41:03] both straight forward :]
[09:41:23] the rests are not that trivial :/
[09:41:58] check the tickets for uploads and SSL on beta? :)
[09:42:05] after the recent merges
[09:42:17] i'll look again after a break
[09:42:32] gets coffee
[09:44:05] mutante: for some reason the star.wmflabs.org cert used by nginx protoproxy is invalid and rejected :D
[09:46:36] <_joe_> hashar: where can I fetch it?
[09:46:54] <_joe_> mmmh, coffee...
[09:46:57] got the issue on deployment-cache-bits01.eqiad.wmflabs
[09:47:00] hashar: still? sigh
[09:47:05] https://gerrit.wikimedia.org/r/#/c/126008/
[09:47:06] * hashar just like everyone grabs a coffeee
[09:47:18] <_joe_> well, I have to brew it first
[09:47:40] https://gerrit.wikimedia.org/r/#/c/124859/2/manifests/role/cache.pp
[09:48:04] (PS6) Giuseppe Lavagetto: Substituting the check_graphite script. [operations/puppet] - https://gerrit.wikimedia.org/r/125726
[09:48:11] my daughter has hidden our coffee capsules grrr
[09:48:28] <_joe_> hashar: coffee capsules, you're doing it all wrong :)
[09:48:46] _joe_: na I am lazy :-]
[09:49:07] mutante: looks like chinese to me :-]
[09:49:14] openssl s_client -connect wikistats.wmflabs.org:443 -CAfile /etc/ssl/certs/ meh,, i got a Verified 0 there yesterday
[09:49:23] so on labs
[09:49:23] after they fixed the chain
[09:49:30] the .key comes from the repository labs/private apparently
[09:49:33] (CR) Giuseppe Lavagetto: "@chase, my responses are inline:" [operations/puppet] - https://gerrit.wikimedia.org/r/125726 (owner: Giuseppe Lavagetto)
[09:49:39] which is *roll the drums* a PUBLIC repository
[09:49:45] hashar: coffee capsules? she is doing the right thing
[09:50:01] hashar: did somebody submit it there? :o
[09:50:21] hashar: afaik there is the special labs project that is all locked down.. just for this
[09:50:33] last change was * 21b4656 - Adding keys for labs (2 years, 5 months ago)
[09:50:43] and.. capsules are just a way to raise the coffee price
[09:51:00] so I guess the keys I get installed are obsoletes
[09:51:03] once you get the machine to refill the capsules yourselves to save again ...
[09:51:31] mutante: well a capsule is only 0,25€ so it is cheaper than me brewing the coffee :-]
[09:51:47] hashar: yes, andrewbogott_afk and RobH replaced the self-signed cert with a real one from RapidSSL
[09:51:48] I usually get my first coffee in bar which is 1,30€ anyway :]
[09:51:53] ahh nice
[09:52:00] hashar: and first the chained file was wrong
[09:52:08] because of https://gerrit.wikimedia.org/r/#/c/126008/
[09:52:14] how can I get the real certs installed so ? :]
[09:52:23] and another change
[09:52:41] hashar: you must be member in some special project
[09:52:46] that is locked down to NDA people
[09:52:51] and holds actual certs
[09:52:53] afaik
[09:53:02] s/certs/keys
[09:53:26] hashar: install_certificate() in puppet
[09:53:34] ah yeah
[09:53:37] i just don't know which step people did manually
[09:53:38] we did the same for beta cluster
[09:53:46] root access is only granted to folks having NDA
[09:53:49] i never logged into those instances
[09:54:04] it must be the one that has yuviproxy
[09:54:15] for star.wmflabs.org
[09:55:10] i will mail labs-l list to figure it out :-]
[09:55:13] hashar: openssl s_client -connect wikistats.wmflabs.org:443 | grep Subject
[09:55:25] OU = Domain Control Validated - RapidSSL(R), CN = *.wmflabs.org
[09:55:41] yes, good idea
[09:55:57] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC
[10:05:31] mutante: bah that is bug 48501 which has 85 comments
[10:06:02] hashar: indeed :p but there have been recent updates
[10:06:45] let me paste some links there
[10:09:04] sure :D
[10:09:40] I gave up with the SSL cert madness on the ground that i have no clue how certs work :]
[10:13:29] hashar: eh.. strictly speaking. that ticket is "*.{projects}.beta.wmflabs.org"
[10:13:38] *.beta.wmflabs.org != *.wmflabs.org
[10:14:07] unless *.wmflabs.org also has *.beta.wmflabs.org on it
[10:14:21] i doubt robh could buy that kind of cert though
[10:14:50] afaik can just have one level of *
[10:15:20] that is a pity :-(
[10:16:05] mutante: dont waste your time on it anyway :]
[10:19:05] hashar: ok.. last link https://bugzilla.wikimedia.org/show_bug.cgi?id=60833
[10:24:46] (CR) Dzahn: [C: 2] contint::slave-scripts recurse submodules [operations/puppet] - https://gerrit.wikimedia.org/r/122342 (owner: Hashar)
[10:28:11] (CR) Giuseppe Lavagetto: [C: 1] "LGTM, I was just thinking if there could be a way to generalize the cipher suite injection in apache virtualhosts creating a small ad-hoc " [operations/puppet] - https://gerrit.wikimedia.org/r/126204 (owner: Dzahn)
[10:30:07] (CR) Dzahn: "yes, agree to "out of the scope" because we have had soo many attempts to do apache setup in different generic ways" [operations/puppet] - https://gerrit.wikimedia.org/r/126204 (owner: Dzahn)
[10:34:11] (PS2) Dzahn: Remove pmtpa compute servers from puppet. [operations/puppet] - https://gerrit.wikimedia.org/r/125977 (owner: Andrew Bogott)
[10:35:53] (PS3) Dzahn: Remove pmtpa compute servers from puppet. [operations/puppet] - https://gerrit.wikimedia.org/r/125977 (owner: Andrew Bogott)
[10:37:12] node /virt([5-9]|1[0-1]).pmtpa.wmnet/ { .. if $::hostname =~ /^virt5$/ { ..
[10:38:01] how do i remove the change to zookeeper module.. hrmm
[10:38:06] from that
[10:40:01] <_joe_> mutante: I have a patch to your change to support PFS on bugzilla
[10:40:14] :)
[10:40:25] <_joe_> mutante: do you mind if I add it to your change?
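For reference, the node regex pasted at 10:37:12 can be checked against candidate hostnames. This is a rough sketch that is not from the channel: it uses Python's `re.search` as a stand-in for the Ruby regex matching Puppet performs (an assumption, though this simple pattern should behave the same way in both), and the hostname list and the `matches_node` name are made up for illustration.

```python
import re

# Pattern copied from the node definition quoted at 10:37:12. It is
# unanchored and its dots are unescaped, so "." matches any character.
NODE_PATTERN = re.compile(r"virt([5-9]|1[0-1]).pmtpa.wmnet")


def matches_node(hostname: str) -> bool:
    """Return True if the node regex matches anywhere in the hostname."""
    return NODE_PATTERN.search(hostname) is not None


# Illustrative hostnames only; not an inventory of real machines.
for host in ("virt0.pmtpa.wmnet", "virt5.pmtpa.wmnet", "virt11.pmtpa.wmnet",
             "virt12.pmtpa.wmnet", "virt15.pmtpa.wmnet"):
    print(host, matches_node(host))
```

Under these assumptions the pattern selects virt5 through virt11 and leaves virt0, virt12 and virt15 alone, which lines up with the "decom virt5-11" change later in the log. Escaping the dots and anchoring the pattern would make it stricter.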
[10:40:35] not at all, please do
[10:40:43] wiki style
[10:41:36] <_joe_> mutante: that's the thing I dislike about gerrit, with normal git you can see the commit history and it's easy to include/exclude patcher
[10:41:57] <_joe_> *s
[10:42:11] you'll have author and committer
[10:43:53] (PS1) Hashar: contint: soften python-voluptuous version requirement [operations/puppet] - https://gerrit.wikimedia.org/r/126221
[10:46:32] (PS1) Dzahn: decom virt5-11, pmtpa compute nodes [operations/puppet] - https://gerrit.wikimedia.org/r/126222
[10:47:24] !log Upgraded Zuul on gallium to wmf-deploy-20140416 (depends on python-voluptuous 0.7+ , Alexandros packaged 0.8.2 which I manually installed to validate).
[10:47:30] Logged the message, Master
[10:48:03] (PS2) Dzahn: decom virt5-11, pmtpa compute nodes [operations/puppet] - https://gerrit.wikimedia.org/r/126222
[10:49:33] (PS1) Ricordisamoa: Activate the "other projects" sidebar managed by Wikibase in itwikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126223
[10:51:52] (CR) Dzahn: [C: 2] decom virt5-11, pmtpa compute nodes [operations/puppet] - https://gerrit.wikimedia.org/r/126222 (owner: Dzahn)
[10:53:04] (CR) Dzahn: [C: -2] "done in Change-Id: I413779fcd incl. DHCP" [operations/puppet] - https://gerrit.wikimedia.org/r/125977 (owner: Andrew Bogott)
[10:53:32] (Abandoned) Dzahn: Decom virt12 [operations/puppet] - https://gerrit.wikimedia.org/r/125984 (owner: Andrew Bogott)
[10:54:19] (Abandoned) Dzahn: Remove pmtpa compute servers from puppet. [operations/puppet] - https://gerrit.wikimedia.org/r/125977 (owner: Andrew Bogott)
[10:55:47] (PS2) Giuseppe Lavagetto: bugzilla, use better SSL cipher suite [operations/puppet] - https://gerrit.wikimedia.org/r/126205 (owner: Dzahn)
[10:56:40] !log stopping puppet on virt5-11
[10:56:46] Logged the message, Master
[10:57:20] (CR) Giuseppe Lavagetto: [C: 1] "Also, the openssl ciphers -v works on zirconium if you eliminate whitespace from the ciphers list." [operations/puppet] - https://gerrit.wikimedia.org/r/126205 (owner: Dzahn)
[10:57:42] _joe_: whitespace? oh nice, thanks
[10:57:59] (CR) John F. Lewis: [C: 1] "Looks good." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126223 (owner: Ricordisamoa)
[10:58:38] SSLCompression Off .. oh, yes, i saw and forgot
[11:00:14] <_joe_> mutante: yes :)
[11:00:34] <_joe_> it's very important if we want to protect long-lived sessions
[11:01:19] <_joe_> I remember the time I saw CRIME and thought 'wow this is so bad' - it was pre-heartbleed I guess. Now it seems like a 2nd-tier problem
[11:01:51] <_joe_> and it must be noted that any sane browser now disables ssl compression on the client side
[11:02:24] <_joe_> so yeah, still important to score well on ssl checkers, but not really that relevant :)
[11:02:50] thanks for that new PS :)
[11:03:25] <_joe_> now moving to the fascinating problem of compiling our puppet manifests in puppet 3
[11:03:31] !log virt5-11 revoked puppet certs and salt keys
[11:03:35] Logged the message, Master
[11:03:44] <_joe_> Did anyone work on that already?
[11:03:48] _joe_: ooh.. there is an Etherpad link
[11:03:57] that lists all the issues you will run into
[11:04:00] <_joe_> mutante: that is for fixing things
[11:04:02] i think faidon did
[11:04:17] to create that list, and matanya has been creating the patches based on that
[11:04:31] <_joe_> mutante: did we try to actually compile our manifests with real facts somewhere?
[11:04:36] i dunno [11:04:38] <_joe_> ok I'll ask faidon [11:04:40] <_joe_> :) [11:12:28] mutante: are you doing the whole decom process for virt5-15, or just merging that one puppet patch? [11:13:19] 04:04 < mutante> !log virt5-11 revoked puppet certs and salt keys [11:13:24] already doing [11:13:32] ok [11:13:35] andrewbogott: i would have stopped before shutdown [11:13:38] and let you do that [11:14:05] you're welcome to do shutdown as well -- everything seems to have gone ok with 12 yesterday. [11:14:23] but either way, just let me know how far you get :) [11:14:26] alright, that's what i wanted to hear, doing so then [11:15:58] !log virt5-11 removing from icinga [11:16:04] Logged the message, Master [11:16:38] (03PS1) 10Prtksxna: TextExtracts: Add classes and elements to the exclusion list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126226 [11:16:59] mutante: I want to shut down virt2 myself, and virt0 is going to live on for a while. [11:17:16] There also looks to be a virt15 which you can also decom, it isn't doing much as far as I can tell. [11:17:31] _joe_: there is a 3rd one, that is related https://gerrit.wikimedia.org/r/#/c/126206/ [11:17:52] andrewbogott: yes, i just touched the ones where i already knew you wanted to kill them, the compute nodes [11:18:00] not doing the others for now [11:18:21] virt15, ok [11:20:37] (03CR) 10Giuseppe Lavagetto: [C: 031] "Much better that the preceding version." [operations/puppet] - 10https://gerrit.wikimedia.org/r/126206 (owner: 10Dzahn) [11:20:52] <_joe_> mutante: this was easy :) [11:21:47] (03PS1) 10Dzahn: rm wap.wikipedia.org apache site [operations/puppet] - 10https://gerrit.wikimedia.org/r/126227 [11:22:40] <_joe_> sigh, WAP was such a good technology... [11:22:51] (03CR) 10Dzahn: "isnt this done by?" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/126227 (owner: 10Dzahn) [11:23:23] apergos: reopened with that :P ^ [11:23:29] _joe_: thx [11:29:44] !log upgraded Zuul to wmf-deploy-20140416-2 [11:29:46] (03CR) 10ArielGlenn: [C: 031] rm wap.wikipedia.org apache site [operations/puppet] - 10https://gerrit.wikimedia.org/r/126227 (owner: 10Dzahn) [11:29:50] Logged the message, Master [11:33:38] (03PS1) 10Hashar: zuul: remove push_change_refs setting [operations/puppet] - 10https://gerrit.wikimedia.org/r/126229 [11:34:43] akosiaris: works fine, closed ticket :) https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=carbon&service=Ubuntu+mirror+in+sync+with+upstream [11:34:55] _joe_: haha [11:35:09] (03CR) 10Gilles: [C: 032] Enable MediaViewer user surveys on first batch of pilot sites [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126190 (owner: 10Gergő Tisza) [11:35:15] (03Merged) 10jenkins-bot: Enable MediaViewer user surveys on first batch of pilot sites [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126190 (owner: 10Gergő Tisza) [11:36:00] mutante: yey! [11:39:01] !log Upgraded Zuul to wmf-deploy-20140416-3 (bring in a84f0e4 - "Make queue processing more efficient" which was much needed) [11:39:04] Logged the message, Master [11:39:12] akosiaris: you can get python-voluptuous on apt.wikimedia.org :-] [11:39:17] zuul is happy! 
[11:39:30] would need https://gerrit.wikimedia.org/r/126221 as well [11:39:34] hashar: thanks :-) [11:39:36] since the package was pinned via puppet [11:39:41] ok will do [11:39:55] and https://gerrit.wikimedia.org/r/#/c/126229/ remove some old settings that are no more used [11:39:58] :] [11:40:58] !log upgraded python-voluptuous on apt.wikimedia.org to 0.8.2-1wmf1 [11:41:04] Logged the message, Master [11:41:20] PROBLEM - RAID on virt5 is CRITICAL: CRITICAL: Active: 14, Working: 14, Failed: 1, Spare: 0 [11:41:29] (03CR) 10Alexandros Kosiaris: [C: 032] contint: soften python-voluptuous version requirement [operations/puppet] - 10https://gerrit.wikimedia.org/r/126221 (owner: 10Hashar) [11:42:25] ACKNOWLEDGEMENT - RAID on virt5 is CRITICAL: CRITICAL: Active: 14, Working: 14, Failed: 1, Spare: 0 daniel_zahn RT #6541 [11:42:52] mutante: after the decom we'll still be able to keep track of which box was virt5 so it can get a new disk? [11:43:42] andrewbogott: yes, from racktables data WMF3662 [11:43:48] (03CR) 10Alexandros Kosiaris: [C: 032] zuul: remove push_change_refs setting [operations/puppet] - 10https://gerrit.wikimedia.org/r/126229 (owner: 10Hashar) [11:45:01] andrewbogott: updated 6541 [11:45:19] can you update 6158? [11:50:01] akosiaris: and you can get python-voluptuous updated on apt.wm.o :-] [11:52:50] (03CR) 10Dzahn: "host wap.wikipedia.org = ??" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126227 (owner: 10Dzahn) [11:55:59] hashar: we have beta.wap. but not wap it seems :p [11:57:30] (03PS1) 10Dzahn: remove WAP / beta.wap.wikipedia.org ? [operations/dns] - 10https://gerrit.wikimedia.org/r/126232 [11:58:11] i see, all language versions [11:59:16] (03Abandoned) 10Dzahn: remove WAP / beta.wap.wikipedia.org ? 
[operations/dns] - 10https://gerrit.wikimedia.org/r/126232 (owner: 10Dzahn) [11:59:18] mutante: that is not the beta cluster [11:59:46] hashar: ok, yea, i take the DNS thing back, just that old Apache config is not used [12:00:00] (03CR) 10Hashar: "I think the mobile team is phasing out wap. Might catch up with them to have a confirmation." [operations/dns] - 10https://gerrit.wikimedia.org/r/126232 (owner: 10Dzahn) [12:01:03] (03CR) 10Dzahn: "i see, all the language versions are generated from langlist.. so:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126227 (owner: 10Dzahn) [12:01:34] (03CR) 10Dzahn: "this Apache config is not used nevertheless, it is all on Varnish, right" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126227 (owner: 10Dzahn) [12:02:49] (03CR) 10Dzahn: "also see Change-Id: I5152ff336ca3" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126227 (owner: 10Dzahn) [12:11:54] !log virt5-11 - shut down [12:12:00] andrewbogott: they are actually down now [12:12:01] Logged the message, Master [12:12:13] mutante: cool. I'll tell virt0 that they're gone. [12:12:22] great [12:24:09] (03CR) 10Alexandros Kosiaris: [C: 04-1] "I quick look show that at least one problem is going to be fixed by this. udp2log" (034 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/110943 (owner: 10Andrew Bogott) [12:25:30] (03CR) 10Dzahn: "nfs1 is a Wikimedia central syslog server (nfs) (misc::syslog-server)." [operations/dns] - 10https://gerrit.wikimedia.org/r/125952 (owner: 10Dzahn) [12:30:44] (03CR) 10Lydia Pintscher: [C: 031] "Good from PM side :)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126223 (owner: 10Ricordisamoa) [12:34:55] (03PS8) 10Dzahn: move LDAP admin permissions,tools out of site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/125721 [12:38:23] (03CR) 10Dzahn: [C: 032] "shouldn't change what people can do on formey who had sudo before. 
after this find a replacement for formey though and apply there (and ma" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125721 (owner: 10Dzahn) [12:41:19] (03CR) 10Dzahn: "ran fine on formey. well it did remove one thing, the permission to run /var/lib/gerrit2/review_site/bin/gerrit.sh but that was intended a" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125721 (owner: 10Dzahn) [12:56:06] mornin manybubbles, shall I continue? [12:56:06] (03CR) 10Andrew Bogott: Restore sysctl priorities. (034 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/110943 (owner: 10Andrew Bogott) [12:56:07] :) [12:56:16] (03PS3) 10Andrew Bogott: Restore sysctl priorities. [operations/puppet] - 10https://gerrit.wikimedia.org/r/110943 [12:56:18] ottomata: sure! [12:56:24] anyone who can create mailing list? [12:56:41] akosiaris: are the changes I made to backup stuff on this change ok? https://gerrit.wikimedia.org/r/#/c/126009/ [12:56:49] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC [12:57:05] Revi: Do you have a bug in bugzilla? [12:57:09] ottomata: look at the difference: [12:57:12] JohnLewis: https://bugzilla.wikimedia.org/show_bug.cgi?id=63869 [12:57:15] ottomata: the machine I doctored: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=Elasticsearch+cluster+eqiad&h=elastic1009.eqiad.wmnet&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=ALLGROUPS [12:57:24] one I didn't: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=Elasticsearch+cluster+eqiad&h=elastic1010.eqiad.wmnet&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=ALLGROUPS [12:58:04] RobH: Here? [12:58:34] Revi: "Thehelpfulone" in Bugzilla [12:58:45] Revi: if he is gone,come back and ping us here [12:58:52] mutante: did you do dns as well, or shall I update https://gerrit.wikimedia.org/r/#/c/125978/ ? [12:59:12] ;p Actually he didn't set himself as a assignee [12:59:15] Vogone did. [12:59:32] anyway, ok. 
[12:59:35] mutante ^ [12:59:59] Revi: sorry, i didnt read the link, i can make that happen [12:59:59] How many Korean-speaking OTRS agents are there? [13:00:22] Is there any precedent for creating a per-language mailing list for OTRS agents? [13:00:33] twkozlowski: There is. [13:00:39] A few exist AFAIK /me looks [13:01:18] (03CR) 10Alexandros Kosiaris: "Apart from the whitespaces issues LGTM" (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/126009 (owner: 10Ottomata) [13:01:32] twkozlowski: About 11 exist already for large and small teams. [13:01:38] manybubbles: hard to see much difference, maybe slightly less load? [13:01:44] is that what I should be looking at? [13:02:00] ottomata: yeah, proportionally it is actually a pretty decent load drop [13:02:39] but yeah, it isn't what I'd hope:(\ [13:03:07] (03PS3) 10Ottomata: Removing some references to stat1, replacing some of them with stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/126009 [13:03:36] Thanks akosiaris [13:03:44] (03PS4) 10Ottomata: Removing some references to stat1, replacing some of them with stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/126009 [13:03:50] (03CR) 10Ottomata: [C: 032 V: 032] Removing some references to stat1, replacing some of them with stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/126009 (owner: 10Ottomata) [13:03:51] ottomata: you may also want to merge https://gerrit.wikimedia.org/r/119754 [13:04:19] andrewbogott: sorry, eh..please update that [13:04:34] needed reverse [13:04:58] ah ok [13:04:59] thanks [13:05:02] Revi: How many Korean-speaking OTRS agents are there? [13:05:45] twkozlowski: sorry, my net was out. 
[13:06:02] (03CR) 10Sumanah: "Dzahn asked:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125721 (owner: 10Dzahn) [13:06:06] 5~7, afaik [13:07:33] (03PS2) 10Andrew Bogott: Remove virt5-15 from dns [operations/dns] - 10https://gerrit.wikimedia.org/r/125978 [13:07:40] mutante: ^ [13:10:20] (03PS3) 10Ottomata: Enabling cgi scripts for stats.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/119754 [13:10:39] ottomata: I think the elasticsearch plugin deployment thing doesn't always get the jars out there perfectly [13:11:01] I just had to bounce deployment-elastic01 because it hadn't received the jars the first time through [13:11:29] OH [13:11:35] duh duh duh, that's my fault [13:11:37] I'm not really sure - maybe we can run some extra step that verifies that all the .jars stopped being text files and became zip files with the right sha? [13:11:37] hmmmm [13:11:40] and puppets fault [13:11:54] so, I had been checking that the repo had been cloned, puppet is supposed to do a checkout when it first runs [13:11:57] if the repo isn't there [13:12:03] but i betcha it isn't running git fat [13:12:13] ottomata: hmmm - maybe? [13:12:36] I'm not sure that is what is going on here. I was pushing a new version of the jar out - different name but in same directory [13:12:44] different sha too, of course [13:13:01] what happened? [13:13:04] (03PS1) 10Andrew Bogott: Remove virt2 from puppet. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/126241 [13:13:04] it should work if you deploy from tin [13:13:08] i was letting puppet do it [13:13:15] and was thinking it did everything right [13:13:19] but i didn't check the jars [13:13:43] (03PS4) 10Ottomata: Enabling cgi scripts for stats.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/119754 [13:13:52] (03CR) 10Ottomata: [C: 032 V: 032] Enabling cgi scripts for stats.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/119754 (owner: 10Ottomata) [13:14:24] ottomata: deployment-bsatino. this is beta [13:14:32] ohhh [13:14:38] I think I'll just manually check the jars when I do a deploy [13:14:49] hmm [13:15:00] rather, something like foreach server; ssh find | grep jar | xargs sha1sum [13:15:04] have you done a deploy from tin since yesterday? [13:15:07] and make sure they all line up the same [13:15:14] ottomata: not from real tin [13:15:16] no [13:15:17] k [13:15:21] lemme check my stuff then [13:15:23] too [13:15:30] maybe puppet is doing git-fat properly after all [13:15:32] (03PS3) 10Dzahn: Remove virt5-15 from dns [operations/dns] - 10https://gerrit.wikimedia.org/r/125978 (owner: 10Andrew Bogott) [13:15:43] i haven't done a deploy for the latest reinstalls, because I thought puppet was doing it [13:16:16] nope. [13:16:20] not real jars [13:16:24] ok i'm doing a deployment [13:16:55] andrewbogott: also the mgmt entries ^ [13:17:33] ok deployment worked in prod [13:17:36] mutante: docs say "Reclaims never have mgmt entries removed, and decom servers should keep them until they are wiped and unracked." [13:17:48] ok so, manybubbles, not sure I understand what your beta problem is? [13:18:11] andrewbogott: it changed recently.. per chris [13:18:18] yes, it should be updated [13:18:21] mutante: will up update the docs then? 
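(Editor's note on the jar check discussed at 13:11-13:15: a minimal sketch of the "make sure they all line up the same" idea, assuming each server's plugin directory has already been copied or mounted locally. The function names `audit_jars` and `find_problems` are hypothetical and are not part of the deployment tooling mentioned in the log. It flags any jar that is missing on a host, has a differing SHA-1, or is not a real zip archive, which is what a placeholder left behind by an incomplete checkout would look like.)

```python
import hashlib
import zipfile
from pathlib import Path


def sha1_of(path):
    """Return the hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_jars(root):
    """Map each .jar under root (relative path) to (sha1, is_real_zip)."""
    root = Path(root)
    report = {}
    for jar in sorted(root.rglob("*.jar")):
        report[str(jar.relative_to(root))] = (sha1_of(jar), zipfile.is_zipfile(jar))
    return report


def find_problems(reports):
    """Given {host: report}, return jar names that are missing on some host,
    differ in checksum between hosts, or are not zip archives somewhere."""
    names = set()
    for report in reports.values():
        names.update(report)
    problems = []
    for name in sorted(names):
        entries = [report.get(name) for report in reports.values()]
        missing = any(entry is None for entry in entries)
        hashes = {entry[0] for entry in entries if entry is not None}
        not_zip = any(entry is not None and not entry[1] for entry in entries)
        if missing or not_zip or len(hashes) > 1:
            problems.append(name)
    return problems
```

(Usage, under the same assumptions: build one report per host with `audit_jars(path_to_that_hosts_copy)`, then pass `{"host-a": report_a, "host-b": report_b}` to `find_problems`; an empty list means every jar is present everywhere, identical, and a valid zip.)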
[13:18:26] *you [13:18:39] PROBLEM - Host virt2 is DOWN: PING CRITICAL - Packet loss = 100% [13:18:55] eh, wasnt that the one you wanted to keep [13:19:01] virt2 [13:19:34] mutante: virt0 [13:19:37] I'm shutting down virt2 now. [13:19:51] Which may or may not break dns :/ [13:20:15] ok :p [13:20:28] And, yeah, I purged it and updated neon, but... [13:20:46] re: docs, i don't have the phone next to me and it wants 2 factor :p [13:21:23] (03CR) 10Dzahn: [C: 032] Remove virt5-15 from dns [operations/dns] - 10https://gerrit.wikimedia.org/r/125978 (owner: 10Andrew Bogott) [13:22:27] !log DNS update - remove virt5-15 [13:22:33] Logged the message, Master [13:23:36] (03PS1) 10Yuvipanda: toollabs: Setup ssl cert for tools.wmflabs.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/126243 [13:23:40] Coren: ^ [13:23:52] cmjohnson1: can you do me the favor and just that one sentence in docs [13:24:41] (03PS2) 10Andrew Bogott: Remove virt2 from puppet. [operations/puppet] - 10https://gerrit.wikimedia.org/r/126241 [13:24:55] mutante: ok [13:25:13] thanks! [13:26:51] (03CR) 10Dzahn: "why do you remove ".chained.pem", that is what puppet creates for you from cert and ca cert" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126243 (owner: 10Yuvipanda) [13:27:07] (03PS1) 10Ottomata: Requiring deployment::packages in deployment::target [operations/puppet] - 10https://gerrit.wikimedia.org/r/126244 [13:27:14] mutante: because tools.wmflabs.org doesn't have .chained.pem (from what Coren told me) [13:27:18] mutante: and urlrouter is only used there. [13:27:31] mutante: I don't mind putting it back if the tools cert can be renamed [13:28:02] mutante: that line is actually correct...just not for Tampa since it's going away [13:28:40] YuviPanda: https://bugzilla.wikimedia.org/show_bug.cgi?id=60833 [13:28:57] I am updating it to say that [13:29:22] mutante: but that's unrelated to the tools proxy, no? dynamicproxy takes its config from domainrouter.conf. 
[13:29:24] YuviPanda: https://gerrit.wikimedia.org/r/#/c/126008/1 https://gerrit.wikimedia.org/r/111386 that was *.wmflabs.org [13:29:43] that should cover tools. [13:29:50] Coren: ^ [13:29:56] mutante: tools has its own cert, IIRC. [13:29:59] in the past it was self-signed, now it's from RapidSSL [13:30:01] mostly because it had it before *. [13:30:40] mutante: either way, I have no opinions. Coren ^ should we just use that cert? [13:30:57] brb phone [13:31:01] well, in that case, i dunno how tools sets it up, but if it's not -chained.pem it's not what the normal puppet install_certificate creates [13:31:20] let's await Coren being back from phone then :) [13:31:25] yep [13:31:49] cmjohnson1: thanks, alright [13:32:17] (03PS1) 10Andrew Bogott: remove virt2 from dns [operations/dns] - 10https://gerrit.wikimedia.org/r/126245 [13:32:51] (03CR) 10Andrew Bogott: [C: 032] Remove virt2 from puppet. [operations/puppet] - 10https://gerrit.wikimedia.org/r/126241 (owner: 10Andrew Bogott) [13:33:48] (Still on phone) Yeah, we /can/ use the star cert but since tools has more prople on it I liked the idea of the more restricted cert being there. [13:34:04] Better failure mode. [13:35:18] if there is a separate cert you'll need to edit https://gerrit.wikimedia.org/r/#/c/126008/1/manifests/certs.pp [13:35:37] and then it'll give you the right chained.pem [13:36:02] (03PS2) 10Yuvipanda: toollabs: Setup ssl cert for tools.wmflabs.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/126243 [13:36:08] otherwise you get default and that is wmf-ca.pem [13:36:19] mutante: I've updated it to add the .chained back. I'll leave it to Coren to do the other things [13:36:31] nods [13:40:00] andrewbogott: virt1002 is same role as virt2 ? [13:40:07] do they map like that? [13:40:09] mutante: no [13:40:18] which one replaces virt2 [13:40:21] I don't know why virt2 was called that. 
[13:40:22] labnet1001 [13:40:27] virt2 was the network node [13:40:31] which is labnet1001 in eqiad [13:41:04] so virt5-15 were ciscos [13:41:08] virt2 is a little misc dell I think [13:42:48] (03PS1) 10Dzahn: labs config,replace decom'ed virt2 -> labnet1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/126248 [13:42:54] andrewbogott: gotcha, ok, then ... this ^ [13:44:07] mutante: both of the code blocks that you edited are pmtpa-specific. [13:44:15] So they can be left alone for now, will probably get scrapped later on. [13:45:02] (03CR) 10Andrew Bogott: "I saw these in the grep, but both code blocks are specific to pmtpa. I think it's fine to leave things as they are for now... eventually " [operations/puppet] - 10https://gerrit.wikimedia.org/r/126248 (owner: 10Dzahn) [13:45:02] ah, ok [13:45:03] !log reinstalling elastic1011 [13:45:06] (03PS1) 10ArielGlenn: updated fonts list and sorted it [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126249 [13:45:09] Logged the message, Master [13:45:39] mutante: I'm pretty sure that we've done everything that needs doing for decom of virt2,5-15. Let me know if I missed something. [13:45:46] (03PS2) 10ArielGlenn: updated fonts list and sorted it, rt #810 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126249 [13:46:14] (03Abandoned) 10Dzahn: labs config,replace decom'ed virt2 -> labnet1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/126248 (owner: 10Dzahn) [13:47:36] PROBLEM - Host elastic1011 is DOWN: PING CRITICAL - Packet loss = 100% [13:48:05] (03CR) 10Dzahn: [C: 031] remove virt2 from dns [operations/dns] - 10https://gerrit.wikimedia.org/r/126245 (owner: 10Andrew Bogott) [13:49:12] (03CR) 10Andrew Bogott: [C: 032] remove virt2 from dns [operations/dns] - 10https://gerrit.wikimedia.org/r/126245 (owner: 10Andrew Bogott) [13:49:47] mutante: remind me how to merge dns changes? [13:49:56] yterbium? 
[13:50:06] andrewbogott: rubidium [13:50:14] andrewbogott: or easier to remember.. ssh root@ns0 [13:50:20] then authdns-update [13:50:30] !log raised the number of replicas for labswiki's search directly in elasticsearch because I can't easilly do for cirrus due to access restrictions [13:50:36] Logged the message, Master [13:50:42] !log restarting elastic1009 to suck up new config [13:50:48] Logged the message, Master [13:50:54] mutante: thx [13:50:57] yw [13:52:20] * Coren returns. [13:52:46] RECOVERY - Host elastic1011 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [13:52:55] hi Coren [13:52:56] PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.141 [13:53:07] I've updated the patch to put the .chained back. [13:53:19] YuviPanda: So, we goin' w/ tools. or star.? [13:53:46] I wouldn't mind unifying it and having star everywhere. [13:53:50] and one less cert. [13:54:36] If we're using tools, mutante mentioned a few other things that needed to be done, which I thought I'd leave up to you (security stuff, so I'd rather keep my hands off) [13:54:56] PROBLEM - Disk space on elastic1011 is CRITICAL: Connection refused by host [13:54:56] PROBLEM - puppet disabled on elastic1011 is CRITICAL: Connection refused by host [13:54:57] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.143 [13:55:02] Coren, tell me again how to delete a record in ldapvi? [13:55:06] PROBLEM - SSH on elastic1011 is CRITICAL: Connection refused [13:55:12] Like I said, I'm okay with using star; the only potential issue I see is how bad things fail if something happens. 
[13:55:16] PROBLEM - DPKG on elastic1011 is CRITICAL: Connection refused by host [13:55:16] PROBLEM - RAID on elastic1011 is CRITICAL: Connection refused by host [13:55:26] PROBLEM - check if dhclient is running on elastic1011 is CRITICAL: Connection refused by host [13:55:26] PROBLEM - check configured eth on elastic1011 is CRITICAL: Connection refused by host [13:55:29] andrewbogott: Replace the index number (first line, before the space) with "del" [13:56:01] so, like, "36 dc=pmtpa-proxy,ou=hosts,dc=wikimedia,dc=org" => "del dc=pmtpa-proxy,ou=hosts,dc=wikimedia,dc=org" [13:56:01] Coren: let's just use star? [13:56:01] * Coren nods. [13:56:01] https://bugzilla.wikimedia.org/show_bug.cgi?id=56237 [13:56:08] YuviPanda: Yeah, allright. [13:56:11] who's the best person to deal with this? [13:57:03] should only take a couple of minutes, it seems [13:57:33] Coren: ldapvi says "Error: Invalid key: `del'." [13:57:44] hm, interesting attempt at smart quotes there [13:58:48] twkozlowski: You'll need to find someone who understands the impact well enough. Do you remember who did the wikivoyage install? [13:58:55] twkozlowski: springle [13:58:56] andrewbogott: Oh, my bad, you need to say "delete" [13:59:19] mutante: Just assigned it to Sean, too [13:59:20] (03PS3) 10Yuvipanda: toollabs: Setup ssl cert for tools.wmflabs.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/126243 [13:59:24] andrewbogott: Abbreviation not allowed. :-) [13:59:25] Coren: mutante ^ [13:59:39] now it says "Error: Garbage at end of record." :( [13:59:54] (03CR) 10coren: [C: 031] "Seems sane." [operations/puppet] - 10https://gerrit.wikimedia.org/r/126243 (owner: 10Yuvipanda) [13:59:56] RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1896: active_shards: 5627: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [14:00:22] andrewbogott: That... isn't what normally happens to me when I do this. Try starting over? [14:00:26] I did [14:00:30] (Maybe you did a noise edit) [14:00:36] Really? [14:01:03] "36 dc=pmtpa-proxy,ou=hosts,dc=wikimedia,dc=org" => "delete dc=pmtpa-proxy,ou=hosts,dc=wikimedia,dc=org" [14:01:12] And no other changes? [14:01:20] yep [14:01:23] want to try? ldapvi -D "uid=andrew,ou=people,dc=wikimedia,dc=org" -b "ou=hosts,dc=wikimedia,dc=org" [14:01:27] and delete just that one record [14:01:37] * Coren tries. [14:01:43] Where did you do this from? [14:01:50] virt1000 [14:03:00] food, brb [14:04:08] andrewbogott: Ah. Two issues. (1) I didn't remember, but you want to keep /just/ the "delete ..." line when deleting. (2) additional info: The entry dc=pmtpa-proxy,ou=hosts,dc=wikimedia,dc=org cannot be deleted due to insufficient access rights [14:04:17] *I* don't have the right to do this. :-) [14:04:40] * andrewbogott tries again [14:05:14] dn: dc=pmtpa-proxy,ou=hosts,dc=wikimedia,dc=org [14:05:15] changetype: delete [14:05:15] dn: dc=pmtpa-proxy,ou=hosts,dc=wikimedia,dc=org [14:05:15] changetype: delete [14:05:22] Also odd, it tries removing it twice. [14:05:30] yeah, I'm seeing that too. [14:05:36] (03CR) 10Kaldari: [C: 032] updated fonts list and sorted it, rt #810 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126249 (owner: 10ArielGlenn) [14:05:42] (03Merged) 10jenkins-bot: updated fonts list and sorted it, rt #810 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126249 (owner: 10ArielGlenn) [14:05:49] works for me, then errors on the second attempt. [14:05:54] Anyway, this gets me moving again, thanks. [14:05:58] kk [14:06:13] andrewbogott: Although it'd be nice if I knew why I don't have the rights. 
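(Editor's note on the ldapvi exchange above: the change record shown at 14:05:14 is a standard LDIF delete, a `dn:` line followed by `changetype: delete`. A minimal sketch that builds such a record for a given DN follows. The helper name `ldif_delete` is hypothetical, and the sketch assumes the DN contains only plain printable characters, so it skips the base64 encoding that LDIF requires for other values.)

```python
def ldif_delete(dn):
    """Return an LDIF change record that deletes the entry at dn."""
    if not dn or "\n" in dn:
        raise ValueError("dn must be a non-empty single line")
    # Two lines only: the target DN and the delete change type.
    return f"dn: {dn}\nchangetype: delete\n"


# The DN is the one quoted in the log.
print(ldif_delete("dc=pmtpa-proxy,ou=hosts,dc=wikimedia,dc=org"), end="")
```

(Whether the delete is accepted still depends on the access rights of the bind DN, as the "insufficient access rights" error in the log shows.)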
[14:06:37] YuviPanda: +1'ed [14:07:06] PROBLEM - NTP on elastic1011 is CRITICAL: NTP CRITICAL: No response from NTP server [14:08:06] RECOVERY - SSH on elastic1011 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.3 (protocol 2.0) [14:10:37] (03PS1) 10Dzahn: remove virt15 from DHCP, decom [operations/puppet] - 10https://gerrit.wikimedia.org/r/126250 [14:11:20] YuviPanda: Coren how do you actually handle the key then [14:11:33] with privatekey => false [14:11:54] mutante: That just prevents installation of the fake key, we still have to put the real one by hand. [14:12:31] yea, i just wanted to understand where it comes from [14:13:00] hashar: [14:13:03] (03PS1) 10Odder: Create autopatrolled user group on brwikimedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126251 [14:13:34] mutante: yeah ? :] [14:13:40] (03CR) 10Dzahn: [C: 031] toollabs: Setup ssl cert for tools.wmflabs.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/126243 (owner: 10Yuvipanda) [14:13:56] hashar: what you said earlier about that key in the private/public labs repo [14:14:08] it's the old one [14:14:13] ah yeah so we apparently have some certificates keys in labs/private [14:14:18] but that is a public repository [14:14:21] and they copy the real one by hand [14:14:34] that was probably fine with self signed certs [14:14:43] as you can see in Yuvi's change there, with privatekey => false [14:14:50] but now that we have real certs, I guess we want to keep the .key out of public view [14:14:55] one can still use the install_certificate puppet class that creates the chained.pem [14:15:08] yes, what you said [14:15:23] ganglia in labs is unhappy, maybe you already know this [14:15:24] e.g. [14:15:31] http://ganglia.wmflabs.org/latest/?c=tools&h=tools-webproxy [14:15:36] (link from instance page) [14:15:46] Coren: ^ there were worries about that earlier, that's why i asked [14:16:02] about the old, private key in labs/private [14:16:07] ad the main page gives the same. 
[14:16:10] *and [14:17:00] (03PS1) 10Hashar: beta: do not use private key for star.wmflabs.org cert [operations/puppet] - 10https://gerrit.wikimedia.org/r/126252 [14:17:03] apergos: I did not. I'll take a look in a bit. [14:17:10] mutante: maybe something like https://gerrit.wikimedia.org/r/126252 ? [14:17:56] RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5627: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [14:20:24] (03CR) 10Hashar: "I have no clue what it is supposed to do. Based on a change by YuviPanda" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126252 (owner: 10Hashar) [14:23:16] RECOVERY - DPKG on elastic1011 is OK: All packages OK [14:23:17] RECOVERY - RAID on elastic1011 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [14:23:26] RECOVERY - check if dhclient is running on elastic1011 is OK: PROCS OK: 0 processes with command name dhclient [14:23:26] RECOVERY - check configured eth on elastic1011 is OK: NRPE: Unable to read output [14:23:56] RECOVERY - Disk space on elastic1011 is OK: DISK OK [14:23:56] RECOVERY - puppet disabled on elastic1011 is OK: OK [14:24:43] (03CR) 10Dzahn: [C: 032] "it was trying to install the old private snakeoil key from the "private" repo which was used for the self-signed cert. now it's a real cer" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126252 (owner: 10Hashar) [14:26:45] manybubbles: moving shards back to elastic1011 [14:26:52] ottomata: yay! [14:27:02] getting pretty close to time to do the primaries, right? [14:27:29] well, a day or two away [14:28:04] (03CR) 10Andrew Bogott: "The solution here is to just not try to use the star cert. At the moment the cert is pretty much reserved for just the proxy project." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/126252 (owner: 10Hashar) [14:29:33] (03CR) 10Dzahn: "but beta can solve the problem just like the proxy project does. beta is aiming to be close to production" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126252 (owner: 10Hashar) [14:29:53] beta might need its own cert then? I'm not sure... [14:30:06] Or it can have its own self-signed cert... [14:30:14] it's a huge bug with 50 comments, yep [14:30:23] mostly I want to keep the start cert out of any project that gives access to folks w/out an nda [14:30:39] right, that's why i said "lock down the instance" [14:30:54] i'm not sure how many people have beta access [14:31:07] but if one project solved it.. [14:31:42] it's been going on on https://bugzilla.wikimedia.org/show_bug.cgi?id=48501 [14:32:07] hashar: i think you just want to ask for *.beta.wmflabs.org separately [14:32:19] because it won't work with *.*. anyways [14:33:54] Even then the instance would need to be secure though. Not sure what to do about that :( [14:34:52] my thought was only "if it can be safe enough for proxy, why cant it for beta" [14:36:38] (03CR) 10Dzahn: [C: 032] toollabs: Setup ssl cert for tools.wmflabs.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/126243 (owner: 10Yuvipanda) [14:36:56] RECOVERY - NTP on elastic1011 is OK: NTP OK: Offset -0.01212704182 secs [14:37:25] Is there a way for a non-op (and non-member of wmf ldap group) to check the icinga version we use? [14:38:52] qchris: no, but it's precise's, 1.6.1-2 [14:40:06] paravoid: Ok. Thanks. [14:40:09] Coren: mutante +2? :) [14:40:46] qchris: we had it open to everyone, but it currently has unresolved vulnerabilities [14:40:56] paravoid: Any plans to upgrade, so we can open up access to everyone again? [14:40:58] YuviPanda: done [14:41:04] paravoid: Ah. Yes. [14:41:06] mutante: woot! [14:41:12] mutante: Coren will someone have to copy the private key now? 
[14:41:15] paravoid: That was just my question :-) [14:41:20] no plans to upgrade really, no [14:41:23] Ok. [14:41:27] we may replace it soon, who knows [14:42:27] qchris: paravoid RT #6838 is that,, update it so we can open it again [14:42:48] YuviPanda: Yes. [14:43:00] Coren: can you do that? [14:43:17] I don't think I've the access, and again I don't want to be responsible for security without knowing much :) [14:44:43] it wasn't closed, mutante [14:44:56] Coren: mutante stepping afk for a bit. [14:45:41] YuviPanda|afk: Sure, in a few. I gotta finish catching up to my email first. [14:48:49] (03CR) 10coren: [C: 032] Add rc_source recentchanges column to labs replica databases [operations/software] - 10https://gerrit.wikimedia.org/r/125369 (owner: 10Aude) [14:49:14] (03CR) 10Andrew Bogott: [C: 04-2] "I'm still finding this useful, but it will never be merged." [operations/puppet] - 10https://gerrit.wikimedia.org/r/98700 (owner: 10Andrew Bogott) [14:49:52] werdna: regarding "Your production login" -- thoughts? [14:58:01] manybubbles: Hmm. We have a SWAT requested for this morning, but I see no Kaldari on IRC. I'm also curious as to whether anyone has tested the change on Beta Labs or something. [14:58:35] anomie: if the requester isn't on irc I'd skip the deploy [14:58:42] or you can see if anyone else is on. [14:59:04] I'm going to start a meeting with wikidata folks now but will keep reading if I can [14:59:04] manybubbles: If he shows up before the end of the window I might still do it. [14:59:09] sure [15:00:44] manybubbles: there are 25 shards on elastic1011 [15:00:48] should I wait, or keep moving [15:00:49] ? [15:00:57] ottomata: you can keep going i think [15:01:00] k [15:01:32] moving shards off of 1012 [15:12:34] (03CR) 10Andrew Bogott: [C: 031] remove virt15 from DHCP, decom [operations/puppet] - 10https://gerrit.wikimedia.org/r/126250 (owner: 10Dzahn) [15:18:23] Coren: back [15:23:12] Down to <100 email to catch up to. 
[15:23:49] (03CR) 10BryanDavis: [C: 031] Requiring deployment::packages in deployment::target [operations/puppet] - 10https://gerrit.wikimedia.org/r/126244 (owner: 10Ottomata) [15:25:02] (03PS2) 10Ottomata: Requiring deployment::packages in deployment::target [operations/puppet] - 10https://gerrit.wikimedia.org/r/126244 [15:25:06] (03CR) 10Ottomata: [C: 032 V: 032] "Thanks Bryan" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126244 (owner: 10Ottomata) [15:28:00] Coren: heh, ok. poke when done [15:29:14] mutante|away: nfs1 and nfs2 are in tampa [15:29:18] are they going away? [15:29:29] any idea why there is a mw udp2log instance running on those? [15:36:44] (03PS1) 10Marco: Adding '*.panoramio.com' to the wgCopyUploadsDomains array [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126384 [15:38:40] !log reinstalling elastic1012 [15:38:46] Logged the message, Master [15:40:16] PROBLEM - Host elastic1012 is DOWN: PING CRITICAL - Packet loss = 100% [15:41:29] Can someone please touch fluorine:/a/mw-log/redis-jobqueue.log ? 
[15:45:26] RECOVERY - Host elastic1012 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms [15:45:56] PROBLEM - swift-object-updater on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [15:46:06] PROBLEM - swift-object-auditor on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [15:46:16] PROBLEM - swift-object-auditor on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [15:46:16] PROBLEM - swift-object-replicator on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [15:46:16] PROBLEM - swift-object-replicator on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [15:46:16] PROBLEM - swift-object-updater on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [15:46:17] PROBLEM - swift-object-auditor on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [15:46:26] PROBLEM - swift-object-updater on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [15:46:26] PROBLEM - swift-object-updater on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [15:46:27] PROBLEM - swift-object-auditor on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [15:46:46] PROBLEM - swift-object-replicator on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [15:46:46] PROBLEM - swift-object-replicator on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python 
/usr/bin/swift-object-replicator [15:47:36] PROBLEM - Disk space on elastic1012 is CRITICAL: Connection refused by host [15:47:36] PROBLEM - RAID on elastic1012 is CRITICAL: Connection refused by host [15:47:56] PROBLEM - DPKG on elastic1012 is CRITICAL: Connection refused by host [15:48:06] PROBLEM - check if dhclient is running on elastic1012 is CRITICAL: Connection refused by host [15:48:06] PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.144 [15:48:06] PROBLEM - SSH on elastic1012 is CRITICAL: Connection refused [15:48:06] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:48:16] PROBLEM - check configured eth on elastic1012 is CRITICAL: Connection refused by host [15:48:26] PROBLEM - puppet disabled on elastic1012 is CRITICAL: Connection refused by host [15:53:06] PROBLEM - Swift HTTP backend on ms-fe3001 is CRITICAL: Connection refused [15:54:46] PROBLEM - RAID on ms-fe3001 is CRITICAL: Connection refused by host [15:54:56] PROBLEM - Disk space on ms-fe3001 is CRITICAL: Connection refused by host [15:55:06] PROBLEM - Swift HTTP frontend on ms-fe3001 is CRITICAL: Connection refused [15:55:06] PROBLEM - check configured eth on ms-fe3001 is CRITICAL: Connection refused by host [15:55:06] PROBLEM - DPKG on ms-fe3001 is CRITICAL: Connection refused by host [15:55:13] ignore all that [15:55:16] PROBLEM - puppet disabled on ms-fe3001 is CRITICAL: Connection refused by host [15:55:16] PROBLEM - check if dhclient is running on ms-fe3001 is CRITICAL: Connection refused by host [15:55:17] PROBLEM - SSH on ms-fe3001 is CRITICAL: Connection refused [15:55:17] PROBLEM - Memcached on ms-fe3001 is CRITICAL: Connection refused [15:55:31] cmjohnson1: mutante|away mw1163 was the one that caused problems back in March as well [15:56:42] manybubbles: i need to get lunch before scrum of scrums and I want to get to a cafe before then too [15:56:57] 1012 is 
installing now and almost done, i'm going to run out now though and finish it when I get back [15:57:46] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC [15:59:36] PROBLEM - NTP on elastic1012 is CRITICAL: NTP CRITICAL: No response from NTP server [16:00:17] RECOVERY - swift-object-replicator on ms-be3003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [16:00:17] RECOVERY - swift-object-updater on ms-be3002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [16:00:17] RECOVERY - swift-object-auditor on ms-be3002 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [16:00:26] RECOVERY - swift-object-updater on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [16:00:27] RECOVERY - swift-object-updater on ms-be3001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [16:00:27] RECOVERY - swift-object-auditor on ms-be3004 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [16:00:46] RECOVERY - swift-object-replicator on ms-be3001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [16:00:46] RECOVERY - swift-object-replicator on ms-be3002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [16:00:56] RECOVERY - swift-object-updater on ms-be3003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [16:01:06] RECOVERY - swift-object-auditor on ms-be3001 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [16:01:16] RECOVERY - swift-object-replicator on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [16:01:16] RECOVERY - swift-object-auditor on 
ms-be3003 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [16:02:06] RECOVERY - SSH on elastic1012 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.3 (protocol 2.0) [16:04:16] PROBLEM - Swift HTTP backend on ms-fe3002 is CRITICAL: Connection refused [16:05:56] PROBLEM - Swift HTTP frontend on ms-fe3002 is CRITICAL: Connection refused [16:06:26] (03CR) 10BryanDavis: [C: 04-1] "Is there a shell bug request or other discussion explaining why this is needed and how it will be used?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126384 (owner: 10Marco) [16:10:37] On Commons I am getting a lot of "Internal error: Server failed to store temporary file." errors when using the UploadWizard. any issues knwon? [16:11:02] Raymond_: It may have been related to the swift errors above? [16:11:06] paravoid: ^^ [16:12:30] no it's not [16:13:08] the ms-fe3xxx/ms-be3xxx have nothing to do with anything production/mediawiki [16:16:26] marktraceur: good guess though, but yeah, ignore all that spamminess until further notice [16:16:42] marktraceur: ball is now back in your court re debugging ;) [16:17:48] (03CR) 10Marco: "It was requested at the German village pump ( https://commons.wikimedia.org/w/index.php?title=Commons:Forum&oldid=121493612#Alle_Bilder_ei" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126384 (owner: 10Marco) [16:19:08] Righto. [16:19:13] Well...probably. [16:19:31] greg-g: Internal server errors aren't my forte. bd808 maybe can halp me out [16:19:47] * bd808 reads backscroll [16:19:56] Yuck [16:21:34] Yeah [16:22:05] greg-g: mw1163 went from having DIMM errors to maybe a cpu or motherboard [16:22:08] bd808: I haven't tried to debug internal server errors before, I imagine it's something about fluoride [16:22:22] cmjohnson1: awesome. under warranty? :) [16:22:33] yes it is [16:22:53] marktraceur: we don't have flouride in our drinking water in Petaluma. [16:23:01] (did you mean flourine?) 
[16:23:10] Ugh whatever [16:23:14] marktraceur: I'll start grepping some things [16:23:38] greg-g: Flashbacks to bad dentistry etc. etc. [16:25:39] Raymond_: Do you have any further details? (/me left his upload pipeline knowledge in his other pants.) [16:26:20] (good thing you work from home?) [16:29:16] bd808: I am trying to upload 2 batches of 5 images in the last hour with the UploadWizard. I am getting this error for 3-5 images per batch [16:29:38] bd808: with chrome and Firefox but I think this is not relevant for this error [16:30:10] (03PS1) 10RobH: server osmium dhcp entry and netboot [operations/puppet] - 10https://gerrit.wikimedia.org/r/126669 [16:31:03] (03CR) 10Rush: [C: 031] "Seems pretty good to me so far. I would say we could give it a whirl. Just a few thoughts." [operations/puppet] - 10https://gerrit.wikimedia.org/r/125726 (owner: 10Giuseppe Lavagetto) [16:31:29] Raymond_: Can you describe the files? [16:31:37] Raymond_: Do you have the chunked uploads enabled in your preferences? [16:31:43] Like, format, size, metadata [16:32:02] (03PS4) 10Ori.livneh: Restore sysctl priorities. [operations/puppet] - 10https://gerrit.wikimedia.org/r/110943 (owner: 10Andrew Bogott) [16:34:50] (03CR) 10Rush: "One last thing w/ check_window if I understand we fetch a week and then use the check_window subset number out of it counting back from mo" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125726 (owner: 10Giuseppe Lavagetto) [16:35:29] bd808: photos in jpg format, 10-20 MB per file. and yes. 
the chunked upload is enabled in my prefs [16:35:38] (03CR) 10RobH: [C: 032] server osmium dhcp entry and netboot [operations/puppet] - 10https://gerrit.wikimedia.org/r/126669 (owner: 10RobH) [16:35:49] * bd808 was afraid of that [16:36:25] aaarghhhhh [16:36:26] found it [16:36:28] finally [16:36:29] bd808: prefs unchanged since weeks/monthssss and it works in the last weeks/months [16:36:34] (03CR) 10Ori.livneh: "Can you give me an example of a host that has incorrect settings now? The default is 60 now.." [operations/puppet] - 10https://gerrit.wikimedia.org/r/110943 (owner: 10Andrew Bogott) [16:36:43] paravoid: ? [16:36:48] !log reedy synchronized php-1.23wmf22/includes/jobqueue/JobQueueRedis.php 'I678ab55ae3678b5cd944393f2f2048851625f153' [16:36:53] an issue I've been chasing since last night [16:36:55] Logged the message, Master [16:36:58] for many, many hours [16:37:05] oh! anything fun? [16:37:16] RECOVERY - Swift HTTP backend on ms-fe3002 is OK: HTTP OK: HTTP/1.1 200 OK - 343 bytes in 0.213 second response time [16:37:33] not fun at all, no [16:37:33] !log reedy synchronized php-1.23wmf21/includes/jobqueue/JobQueueRedis.php 'I678ab55ae3678b5cd944393f2f2048851625f153' [16:37:38] Logged the message, Master [16:37:41] just our version of swift being buggy [16:37:49] Raymond_: Sure. I'm just remembering chasing my tail on chunk assembly issues many months ago. 
[16:37:56] RECOVERY - Swift HTTP frontend on ms-fe3002 is OK: HTTP OK: HTTP/1.1 200 OK - 137 bytes in 0.199 second response time [16:38:17] RECOVERY - SSH on ms-fe3001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.3 (protocol 2.0) [16:38:17] RECOVERY - Memcached on ms-fe3001 is OK: TCP OK - 0.098 second response time on port 11211 [16:38:44] anywone know about 2014-04-16 16:23:18 mw1219 enwiki: Redis exception connecting to 10.64.0.194: read error on connection [16:38:46] RECOVERY - RAID on ms-fe3001 is OK: OK: optimal, 1 logical, 2 physical [16:38:56] RECOVERY - Disk space on ms-fe3001 is OK: DISK OK [16:39:06] RECOVERY - Swift HTTP frontend on ms-fe3001 is OK: HTTP OK: HTTP/1.1 200 OK - 137 bytes in 0.198 second response time [16:39:06] RECOVERY - Swift HTTP backend on ms-fe3001 is OK: HTTP OK: HTTP/1.1 200 OK - 343 bytes in 0.208 second response time [16:39:06] RECOVERY - check configured eth on ms-fe3001 is OK: NRPE: Unable to read output [16:39:07] RECOVERY - DPKG on ms-fe3001 is OK: All packages OK [16:39:16] RECOVERY - puppet disabled on ms-fe3001 is OK: OK [16:39:16] RECOVERY - check if dhclient is running on ms-fe3001 is OK: PROCS OK: 0 processes with command name dhclient [16:40:03] (03PS1) 10Chad: All wikis with <250k pages opted in [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126671 [16:40:08] (03CR) 10jenkins-bot: [V: 04-1] All wikis with <250k pages opted in [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126671 (owner: 10Chad) [16:40:32] (03PS2) 10Chad: All wikis with <250k pages opted in [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126671 [16:40:42] paravoid: btw https://github.com/facebook/hhvm/pull/2450 ;) [16:40:56] (03CR) 10Andrew Bogott: "If the default is now 60 then this patch is probably moot. Except for the udp2log bit..." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/110943 (owner: 10Andrew Bogott) [16:43:35] (03CR) 10Ori.livneh: "ah, yes:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110943 (owner: 10Andrew Bogott) [16:44:03] andrewbogott: mind if i amend your patch? [16:44:10] ori: not at all. [16:45:06] PROBLEM - DPKG on copper is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:49:06] RECOVERY - DPKG on copper is OK: All packages OK [16:53:51] (03PS1) 10Reedy: Remove furhter pmtpa remnants [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126673 [16:54:31] (03PS2) 10Yuvipanda: Remove fuhrer pmtpa remnants [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126673 (owner: 10Reedy) [16:54:35] Reedy: ^ ftfy [16:54:46] (03PS5) 10Ori.livneh: Increase priority of Sysctl::Parameters['big rmem'] [operations/puppet] - 10https://gerrit.wikimedia.org/r/110943 (owner: 10Andrew Bogott) [16:55:45] (03CR) 10Andrew Bogott: [C: 031] Increase priority of Sysctl::Parameters['big rmem'] [operations/puppet] - 10https://gerrit.wikimedia.org/r/110943 (owner: 10Andrew Bogott) [16:56:04] (03CR) 10Ori.livneh: [C: 031] Increase priority of Sysctl::Parameters['big rmem'] [operations/puppet] - 10https://gerrit.wikimedia.org/r/110943 (owner: 10Andrew Bogott) [16:58:10] (03CR) 10Andrew Bogott: [C: 032] Increase priority of Sysctl::Parameters['big rmem'] [operations/puppet] - 10https://gerrit.wikimedia.org/r/110943 (owner: 10Andrew Bogott) [16:58:15] YuviPanda: it should be further, not fuhrer ;) [17:01:56] (03PS2) 10Ori.livneh: Enable web fonts by default on Hebrew Wikisource [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115153 (owner: 10Odder) [17:02:06] (03CR) 10Ori.livneh: [C: 032] "Ok, persuasive." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115153 (owner: 10Odder) [17:02:13] (03Merged) 10jenkins-bot: Enable web fonts by default on Hebrew Wikisource [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115153 (owner: 10Odder) [17:04:12] apergos: Ib7b2bc21a is merged in gerrit but not tin; shall i sync it? [17:04:36] that's [17:04:51] Trminator: :D [17:05:43] (03CR) 10Ori.livneh: "This was merged in gerrit but not on tin. It looks safe so I'll merge / sync." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126249 (owner: 10ArielGlenn) [17:06:26] marktraceur, Raymond_ : I didn't find anything obviously wrong with chunked uploads, but I did remind myself that the logging in ApiUpload leaves quite a bit to be desired. [17:06:40] *nod* [17:06:54] !log ori updated /a/common to {{Gerrit|I7b6e5c2d7}}: Enable web fonts by default on Hebrew Wikisource [17:06:54] Raymond_: The issue is persisting? [17:07:00] Logged the message, Master [17:07:45] ^d and ori : your benchmark server is online [17:07:50] !log ori synchronized fc-list 'Ib7b2bc21a: updated fonts list and sorted it, rt #810' [17:07:53] osmium [17:07:56] Logged the message, Master [17:08:27] !log ori synchronized wmf-config/InitialiseSettings.php 'I7b6e5c2d7: Enable web fonts by default on Hebrew Wikisource' [17:08:33] Logged the message, Master [17:08:37] bd808, marktraceur: I will try again now [17:08:44] marktraceur: There are 8 different failures that could produce the api-error-stashfailed message for the user. :/ [17:09:16] Of course there are sigh [17:09:21] !log starting swiftrepl on copper for eqiad->esams copy [17:09:27] Logged the message, Master [17:10:34] bd808, marktraceur: persists for 2 of 5 files of a batch [17:11:21] trying another batch [17:12:02] How often are new translations deployed onto production? 
[17:12:18] should be every 24 hours or so [17:12:28] twkozlowski: at 2am GMT each day [17:12:31] I had a patch merged some time ago adding a Latvian translation for NS_MODULE, wonder what happened to it. [17:12:41] GMT or UTC? [17:12:46] utc [17:12:51] Oh, those aren't translations [17:12:57] Those will come with code updates/deploys [17:12:58] * bd808 thinks of them as interchangable [17:13:50] bd808, marktraceur: another batch: 5 of 5 files fails with the error. [17:14:10] Reedy: Oh. Merged on Monday, so will be deployed next Thursday? [17:14:12] bd808, marktraceur: this moring (European time) I uploaded 10 files in a batch w/o errors [17:14:17] RobH: um, i may have an annoying followup request... [17:14:19] twkozlowski: Should be, yup [17:14:29] Wonder if I should keep the bug open till then, then. [17:14:32] paravoid: do you think that should be 14.04? [17:14:44] I never know what's the proper time to close bugs like this. [17:14:51] On merge or on deploy? Hmmm... [17:15:13] twkozlowski: on merge [17:15:20] if you need a different os then you'll need to reply to the ticket and be more specific on whats required [17:15:56] RobH: yeah, it'd be my fault if a change is required, i'm sorry about that [17:16:18] it was discussed during the quarterly review yesterday and i'm not sure we decided definitively [17:19:54] its not that big a deal, just clearing keys from systems and modifying the dhcpd file [17:20:14] but just dont have me reinstall like two more times ;] [17:20:46] (03PS1) 10Ottomata: Including yuvipanda on stat1003 and bast1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/126722 [17:22:09] ottomata: woot! merge? :D [17:22:20] Thanks Raymond_, closed that one. One bug less! Yay. [17:23:42] ottomata: are you going to do the SoS update? [17:23:50] I'll be there, I just don't have much because of the hackathon :) [17:24:10] sure, I don't have much either! 
[17:24:18] aside from what i've been working on [17:24:18] as usual :) [17:24:39] (03PS2) 10Ottomata: Including yuvipanda on stat1003 and bast1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/126722 [17:25:32] (03CR) 10Ottomata: [C: 032 V: 032] Including yuvipanda on stat1003 and bast1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/126722 (owner: 10Ottomata) [17:27:17] I've started swiftrepl jobs on copper [17:27:32] they heavily rely on the eqiad-esams link [17:27:54] that has been a bit troubled in the past, ottomata you had some issues for example [17:28:09] if there's esams issues, feel free to ssh to copper and kill all of them [17:28:22] it's currently doing 160mbps or so [17:28:51] its ok paravoid, if the link causes problem for kafka stuff we'll just deal...we've been dealing thus far :/ [17:29:03] but ok [17:29:04] (which should be nothing for our scale, but..) [17:29:06] RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5627: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [17:29:13] yeah just a heads-up :) [17:31:35] k thanks...trying to join SoS [17:34:06] RECOVERY - DPKG on elastic1012 is OK: All packages OK [17:34:06] RECOVERY - check if dhclient is running on elastic1012 is OK: PROCS OK: 0 processes with command name dhclient [17:34:16] RECOVERY - check configured eth on elastic1012 is OK: NRPE: Unable to read output [17:34:26] RECOVERY - puppet disabled on elastic1012 is OK: OK [17:34:27] paravoid, I don't think I can give update, I can barely hear people talking [17:35:01] RECOVERY - Disk space on elastic1012 is OK: DISK OK [17:35:01] RECOVERY - RAID on elastic1012 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [17:37:02] garrrrrr I came to this library because internet was supposed to be good :(:(:( [17:37:08] it was great on monday! [17:37:41] (03PS1) 10Reedy: Add ttf-kochi-mincho and ttf-kochi-gothic to imagescalers [operations/puppet] - 10https://gerrit.wikimedia.org/r/126729 [17:38:20] paravoid: sorry :( [17:38:30] no worries [17:42:09] (03CR) 10Odder: Create a FeaturedFeed for the Tech News bulletin (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124272 (owner: 10Odder) [17:44:04] <^d> RobH: Thank you sir :) [17:45:15] ottomata: did you ping me? I seem to have a ping but can't find anything in my scrollback [17:46:19] hmmm, a while ago but not recently :) [17:46:19] nothing important if I did [17:46:30] oh i was wondering what your beta plugin deploy problem was [17:46:30] ottomata: sweet [17:46:32] I didn't understand that [17:47:10] ottomata: I'm not really sure what it was - I've added a step before I reboot the elasticsearch servers where I check that the plugins were synced correctly manually. 
if that catches something we'll know more [17:47:26] RECOVERY - NTP on elastic1012 is OK: NTP OK: Offset 0.004471898079 secs [17:47:44] mmk cool [17:54:50] !log reedy synchronized php-1.23wmf22/includes/jobqueue/ 'I4b4dbe4637dc50cd4630ef19d54f01efba10e138' [17:54:57] Logged the message, Master [17:55:28] ok, queue now 400k, let's see how fast at going down [18:10:02] (03PS2) 10MarkTraceur: FUTURE: Second batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125032 [18:10:36] (03PS3) 10MarkTraceur: Second batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125032 [18:11:25] manybubbles: moving shards back to 1012 now [18:11:32] ottomata: sweet! [18:11:45] only two non master relevant nodes left [18:11:48] 1015 and 1016 [18:11:50] shall I continue? [18:12:02] do you want to continue (y/n)? [18:12:03] <^d> gogogogo [18:12:07] ottomata: and I have access. Thank you :) [18:12:15] yup! sorry about that YuviPanda! [18:13:33] ok, moving shards off of 1015 [18:14:38] (03CR) 10Manybubbles: [C: 031] "I have no objections. Should we schedule this during one of our "search deploy" windows. Maybe we can also add some of the other wikiped" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126671 (owner: 10Chad) [18:15:25] ottomata: np :) [18:15:54] (03CR) 10Chad: "This adds them as betas, not as primary based on current CommonSettings config." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126671 (owner: 10Chad) [18:16:58] hey paravoid, do your partman skills include any ideas of how to automate running tune2fs -m 0 /dev/md2 ? [18:17:28] this partition is only for elasticsearch, so there's no reason to reserve any blocks for priviledged processes [18:17:36] and that gets about 25G back from the reserve [18:17:42] (03CR) 10Manybubbles: "Ah, sorry, wasn't reading properly. We can do 250k any time I think. 
They aren't likely to make the infrastructure scream. We should pr" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126671 (owner: 10Chad) [18:20:10] (03CR) 10Chad: "Will amend to add them to building. I like doing them in small batches of ~20 or so, makes it easier to manage :)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126671 (owner: 10Chad) [18:20:25] (03CR) 10CSteipp: [C: 031] bugzilla, use SSLProtocol ALL -SSLv2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/126206 (owner: 10Dzahn) [18:26:06] (03PS1) 10Reedy: Only load/enable Lucene on production (not on labs) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126804 [18:26:39] (03PS3) 10Chad: All wikis with <250k pages opted in [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126671 [18:27:01] (03PS1) 10Chad: New wikis done building [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126806 [18:27:09] (03PS2) 10Reedy: Only load/enable Lucene on production (not on labs) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126804 [18:27:20] (03PS3) 10Reedy: Remove fuhrer pmtpa remnants [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126673 [18:27:29] (03PS4) 10Reedy: Remove further pmtpa remnants [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126673 [18:28:17] (03CR) 10CSteipp: [C: 031] "Awesome! I'm looking forward to seeing how the server load and outgoing bandwidth are affected." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/126205 (owner: 10Dzahn) [18:34:23] (03PS1) 10Manybubbles: Raise the Elasticsearch refresh interval [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126811 [18:35:25] (03PS2) 10Manybubbles: Raise the Elasticsearch refresh interval [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126811 [18:35:27] (03PS1) 10coren: Tool Labs: switch the portgrabber to tools-webproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/126813 [18:40:26] (03CR) 10coren: [C: 032] Tool Labs: switch the portgrabber to tools-webproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/126813 (owner: 10coren) [18:49:03] (03CR) 10Jgreen: [C: 032 V: 031] Adding Node 0.10 dependency back to OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/126039 (owner: 10Mwalker) [18:51:53] ottomata: hmm, I can't login to bast1001? I was able to login a few mins ago [18:52:07] still 400k though a bit less stale ones https://www.wikidata.org/wiki/Special:DispatchStats [18:52:58] no change from my end, let's see [18:53:21] YuviPanda: i'mw atching logs, try now? [18:53:26] 122.169.55.108 [18:53:28] oops [18:53:34] ottomata: 'tis ok :) [18:53:38] Invalid user ypanda f [18:53:44] ottomata: gaaah [18:53:44] your username is yuvipanda [18:53:46] ottomata: I'm an idiot. [18:53:51] :) [18:53:51] ottomata: forgot to add yuvipanda@ [18:53:53] ottomata: sorry! [18:53:55] np [18:58:06] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC [18:58:23] !log reinstalling elastic1015 [18:58:30] Logged the message, Master [18:59:50] oh YuviPanda [18:59:56] do you want me to rsync your homedir over from stat1? [19:00:21] ottomata: yeah, that'd be nice [19:00:24] k, doing so now [19:00:56] PROBLEM - Host elastic1015 is DOWN: PING CRITICAL - Packet loss = 100% [19:06:03] ottomata: this is the last one not part of the master dance, right? 
[19:06:06] RECOVERY - Host elastic1015 is UP: PING OK - Packet loss = 0%, RTA = 0.59 ms [19:07:32] no [19:07:37] this one and then 1016 [19:07:42] i'll try to get both of those done today [19:07:46] and we can start master dance tomorrow [19:08:06] PROBLEM - Disk space on elastic1015 is CRITICAL: Connection refused by host [19:08:16] PROBLEM - DPKG on elastic1015 is CRITICAL: Connection refused by host [19:08:17] PROBLEM - check configured eth on elastic1015 is CRITICAL: Connection refused by host [19:08:26] PROBLEM - RAID on elastic1015 is CRITICAL: Connection refused by host [19:08:26] PROBLEM - puppet disabled on elastic1015 is CRITICAL: Connection refused by host [19:08:26] PROBLEM - SSH on elastic1015 is CRITICAL: Connection refused [19:08:26] PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.12 [19:08:28] 1001,1002 1007,1008 1013,1014 are all part of the master dance [19:08:46] PROBLEM - check if dhclient is running on elastic1015 is CRITICAL: Connection refused by host [19:09:37] (03PS1) 10MaxSem: Kill all vestiges of $wgMFRemovableClasses [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126826 [19:10:52] Hi problems in accessing LAbs? [19:11:02] Qcoder00: yes, see /topic in -labs. [19:12:48] (03PS1) 10coren: Fix the *.wmflabs.org cert to actually match key [operations/puppet] - 10https://gerrit.wikimedia.org/r/126827 [19:13:38] YuviPanda: andrewbogott: ^^ [19:13:55] (03CR) 10Andrew Bogott: [C: 032] Fix the *.wmflabs.org cert to actually match key [operations/puppet] - 10https://gerrit.wikimedia.org/r/126827 (owner: 10coren) [19:13:58] cool :) [19:18:24] andrewbogott: hey, why did you specify a priority for Sysctl::Parameters['openstack'] (in )? there is no conflict that i can see, and 50 is within the range reserved for packaged params [19:19:34] bd808: can I run git pull on puppet repo on deployment-salt instance? [19:19:52] and run puppet [19:19:53] ? [19:19:54] ottomata: Sure. 
Just remember to rebase
[19:20:05] you have local changes there?
[19:20:16] Lots of them, yes
[19:20:18] ok
[19:20:23] cherry-picks
[19:20:26] git fetch; git rebase origin/production
[19:20:26] ok?
[19:20:27] PROBLEM - NTP on elastic1015 is CRITICAL: NTP CRITICAL: No response from NTP server
[19:20:35] ottomata: Yup.
[19:20:37] k
[19:20:42] ori: As I recall… when the sysctl code was refactored, it originally defaulted all sysctl::parameters uses to a low priority. It resulted in that setting being lower priority than the priority we were trying to override. Hence, breakage.
[19:20:52] cool, looks good
[19:20:53] thanks
[19:21:05] i'm going to see if I can fix this trebuchet submodule bug
[19:21:08] Finding/fixing that was the inspiration for the more general 'replace sysctl priorities' patch we were looking at earlier today.
[19:21:13] andrewbogott: right -- but it's 60 now, which is actually more correct than the hard-coded value for that resource (50). i'll fix
[19:21:27] yes, with the new default it doesn't need to be specified there anymore.
[19:21:36] RECOVERY - SSH on elastic1015 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.3 (protocol 2.0)
[19:24:50] (PS1) Ori.livneh: Remove unneeded priority settings [operations/puppet] - https://gerrit.wikimedia.org/r/126828
[19:25:57] (PS1) BBlack: Update Zero netmapper data from zero.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/126829
[19:26:24] (CR) BBlack: [C: -1] "This isn't ready quite yet" [operations/puppet] - https://gerrit.wikimedia.org/r/126829 (owner: BBlack)
[19:26:27] RECOVERY - ElasticSearch health check on elastic1015 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5627: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
[19:27:21] (CR) jenkins-bot: [V: -1] Update Zero netmapper data from zero.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/126829 (owner: BBlack)
[19:31:06] RECOVERY - Disk space on elastic1015 is OK: DISK OK
[19:31:16] RECOVERY - DPKG on elastic1015 is OK: All packages OK
[19:31:16] RECOVERY - check configured eth on elastic1015 is OK: NRPE: Unable to read output
[19:31:26] RECOVERY - puppet disabled on elastic1015 is OK: OK
[19:31:26] RECOVERY - RAID on elastic1015 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[19:31:46] RECOVERY - check if dhclient is running on elastic1015 is OK: PROCS OK: 0 processes with command name dhclient
[19:39:51] (PS1) Ottomata: Installing zpubsub on stat* servers [operations/puppet] - https://gerrit.wikimedia.org/r/126832
[19:40:23] (PS2) BBlack: Update Zero netmapper data from zero.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/126829
[19:40:51] (CR) BBlack: [C: -1] "..." [operations/puppet] - https://gerrit.wikimedia.org/r/126829 (owner: BBlack)
[19:41:22] manybubbles: moving shards back to 1015
[19:41:29] ottomata: yay!
[19:41:33] 1012 still only has 74
[19:41:37] its still getting shards back
[19:41:41] should I wait before doing 1016?
[19:41:44] or continue?
[19:42:51] (CR) Ottomata: [C: 2 V: 2] Installing zpubsub on stat* servers [operations/puppet] - https://gerrit.wikimedia.org/r/126832 (owner: Ottomata)
[19:43:06] ottomata: can you force a puppet run?
[19:44:26] RECOVERY - NTP on elastic1015 is OK: NTP OK: Offset -0.01369524002 secs
[19:44:38] (PS1) Brian Wolff: Add ttf-dejavu to image scalars for "DejaVu (Sans|Serif) Condensed". [operations/puppet] - https://gerrit.wikimedia.org/r/126834
[19:48:33] (PS2) Nemo bis: Add ttf-dejavu to image scalers for "DejaVu (Sans|Serif) Condensed". [operations/puppet] - https://gerrit.wikimedia.org/r/126834 (owner: Brian Wolff)
[19:48:49] YuviPanda: ja doing it
[19:48:53] distracted by many things at once :)
[19:49:00] manybubbles: should I wait before doing 1016?
[19:49:09] ottomata: let me look
[19:49:12] since neither 1012 and 1015 have their full share of shards yet?
[19:49:57] ottomata: they don't have their full share of shards yet....
[19:49:58] (CR) Nemo bis: "Yes, this was already fixed in the past. Would be nice to avoid the breakage from happening again." [operations/puppet] - https://gerrit.wikimedia.org/r/126834 (owner: Brian Wolff)
[19:50:10] but we aren't running very high on load
[19:50:39] probably should wait
[19:50:45] so tomorrow for that one?
[19:50:46] ottomata: hehe.
[19:52:09] sure, let's wait then
[19:52:19] YuviPanda: zpubsub installed
[19:52:31] ottomata: ty!
[19:59:22] (PS1) RobH: setting osmium to trusty [operations/puppet] - https://gerrit.wikimedia.org/r/126837
[20:00:12] (PS2) RobH: setting osmium to trusty [operations/puppet] - https://gerrit.wikimedia.org/r/126837
[20:00:37] (CR) RobH: [C: 2 V: 2] setting osmium to trusty [operations/puppet] - https://gerrit.wikimedia.org/r/126837 (owner: RobH)
[20:03:17] !log osmium cleared from salt, puppetca, and puppetstoredconfig for reinstall with trusty (ignore any icinga alerts, there are no pages)
[20:03:18] PROBLEM - Host osmium is DOWN: PING CRITICAL - Packet loss = 100%
[20:03:24] Logged the message, RobH
[20:08:26] RECOVERY - Host osmium is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[20:10:26] PROBLEM - DPKG on osmium is CRITICAL: Connection refused by host
[20:10:36] PROBLEM - Disk space on osmium is CRITICAL: Connection refused by host
[20:10:46] PROBLEM - RAID on osmium is CRITICAL: Connection refused by host
[20:10:56] PROBLEM - SSH on osmium is CRITICAL: Connection refused
[20:10:56] PROBLEM - check configured eth on osmium is CRITICAL: Connection refused by host
[20:11:06] PROBLEM - check if dhclient is running on osmium is CRITICAL: Connection refused by host
[20:11:06] PROBLEM - puppet disabled on osmium is CRITICAL: Connection refused by host
[20:11:10] ori: so trusty hates this disk controller
[20:11:20] Jeff_Green: Did you have this issue on trusty installs recently iirc?
[20:12:01] there was a kernel bug with one of the dell controllers, but afaik there's a workaround now
[20:12:57] its not detecting disks during installer
[20:13:30] does it even load a module for the controller?
[20:14:02] doesn't seem to, lemme reboot it and read exactly what it said
[20:14:12] ahh, its asking for the driver now
[20:14:58] (PS1) Ori.livneh: Sysctl: make the default priority 70 [operations/puppet] - https://gerrit.wikimedia.org/r/126839
[20:15:32] andrewbogott: ^
[20:15:39] see commit message for detailed rationale
[20:16:02] ori: fwiw, tools.wmflabs.org just got SPDY (v2, not 3, but still). I remember you were interested, so wanted to let you know.
[20:16:22] YuviPanda: yes! is it running precise, and is it using the packaged nginx?
[20:16:36] ori: precise, and not the packaged nginx.
[20:16:47] * ori nods
[20:16:51] ori: backported from the thing that starts with Q
[20:17:12] ori: haha, no. not even that. it's from the nginx ppa.
[20:18:09] ori: we might upgrade to even 1.5.10 at some point, to get spdy 3.1
[20:22:46] PROBLEM - NTP on osmium is CRITICAL: NTP CRITICAL: No response from NTP server
[20:23:17] (CR) Andrew Bogott: [C: 1] "This looks right to me but probably needs to be watched carefully when it merges" [operations/puppet] - https://gerrit.wikimedia.org/r/126839 (owner: Ori.livneh)
[20:50:07] Embarrassing
[20:50:21] I have a merged but not deployed patch from gi11es: https://gerrit.wikimedia.org/r/126190
[20:50:40] I figure it should go out with the SWAT
[20:51:00] At least I think it wasn't deployed
[20:51:12] Yeah no
[20:52:31] greg-g: Heads up ^^
[20:58:54] (PS1) Ottomata: Running update-server-info for submodules during deployment_server_init [operations/puppet] - https://gerrit.wikimedia.org/r/126846
[21:00:18] (CR) jenkins-bot: [V: -1] Running update-server-info for submodules during deployment_server_init [operations/puppet] - https://gerrit.wikimedia.org/r/126846 (owner: Ottomata)
[21:00:20] (PS2) Ottomata: Running update-server-info for submodules during deployment_server_init [operations/puppet] - https://gerrit.wikimedia.org/r/126846
[21:01:41] (CR) jenkins-bot: [V: -1] Running update-server-info for submodules during deployment_server_init [operations/puppet] - https://gerrit.wikimedia.org/r/126846 (owner: Ottomata)
[21:05:59] (PS3) Ottomata: Running update-server-info for submodules during deployment_server_init [operations/puppet] - https://gerrit.wikimedia.org/r/126846
[21:07:21] (CR) jenkins-bot: [V: -1] Running update-server-info for submodules during deployment_server_init [operations/puppet] - https://gerrit.wikimedia.org/r/126846 (owner: Ottomata)
[21:09:00] (PS4) Ottomata: Running update-server-info for submodules during deployment_server_init [operations/puppet] - https://gerrit.wikimedia.org/r/126846
[21:31:47] marktraceur: /me nods
[21:37:46] (CR) BryanDavis: Adding '*.panoramio.com' to the wgCopyUploadsDomains array [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126384 (owner: Marco)
[21:55:50] (CR) Ori.livneh: "I can watch it as it deploys (I'd watch the LVS servers, since it's the one case where an incorrect setting can quickly lead to severe fai" [operations/puppet] - https://gerrit.wikimedia.org/r/126839 (owner: Ori.livneh)
[21:57:27] (CR) BryanDavis: Running update-server-info for submodules during deployment_server_init (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/126846 (owner: Ottomata)
[21:58:44] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC
[22:08:49] (CR) MaxSem: [C: 1] rm wap.wikipedia.org apache site [operations/puppet] - https://gerrit.wikimedia.org/r/126227 (owner: Dzahn)
[22:43:07] bblack, have a sec? I thought we didn't have python on varnish srvrs
[22:59:37] ori, RoanKattouw, ebernhardson; I'm happy to take the swat if no one else is doing it
[22:59:46] mwalker: Sweet
[23:00:21] Thanks man, I'd totally forgotten about it
[23:00:24] * greg-g has an interview starting now, just fyi
[23:00:34] enjoy :)
[23:02:03] greg-g: Interview for? :o
[23:02:10] (CR) Mwalker: [C: 2] Activate the "other projects" sidebar managed by Wikibase in itwikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126223 (owner: Ricordisamoa)
[23:02:21] JohnLewis: hiring people :)
[23:02:22] (Merged) jenkins-bot: Activate the "other projects" sidebar managed by Wikibase in itwikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/126223 (owner: Ricordisamoa)
[23:02:42] greg-g: Enjoy :p
[23:02:51] mwalker: Thanks :)
[23:03:01] will try, second one today, two different positions, feel a bit scattered :)
[23:03:48] greg-g: If neither fit - tell them to hire me :D
[23:04:14] :)
[23:05:08] Recommend them each for the other's position
[23:05:33] Also ohai mwalker, thanks for the halp
[23:05:43] soooo many MMV patches :)
[23:05:43] As always
[23:05:50] mwalker: We're *too* productive.
[23:06:07] Alternatively we're lazy and getting things out too slowly.
[23:06:43] isn't that what the train is for?
[23:06:56] hiding that everything you're doing was at 2 in the morning the day before?
[23:07:33] Something like that
[23:07:42] mwalker: But we did need this to go out to wmf22
[23:07:46] Before the train. :)
[23:13:05] tgr, the configuration change https://gerrit.wikimedia.org/r/#/c/126190/ was deployed earlier today -- so I'm just going to do your code change
[23:15:10] !log mwalker Started scap: SWAT deploy: configuration change {{gerrit|126223}} and multimediaviewer {{gerrit|126852}}
[23:15:16] Logged the message, Master
[23:15:27] mwalker: thx
[23:15:54] <3 my hero
[23:19:17] !log mwalker Finished scap: SWAT deploy: configuration change {{gerrit|126223}} and multimediaviewer {{gerrit|126852}} (duration: 04m 07s)
[23:19:24] Logged the message, Master
[23:19:29] marktraceur, tgr ^
[23:19:33] Thanks
[23:19:42] so far no errors or exceptions
[23:19:45] mwalker: Thanks :)
[23:19:49] tgr: I still don't see the survey but I expect that's the issue we were already seeing...
[23:20:03] mwalker: Verified the change I asked for works.
[23:20:22] marktraceur: i just checked enwikivoyage and it appears so the config change was indeed deployed earlier
[23:20:31] JohnLewis, awesome
[23:20:50] i can check the other wikis
[23:21:39] OK now I see it, we good
[23:22:09] kk; I think we're done here then?
[23:22:47] Yup
[23:22:49] Thanks mwalker
[23:23:01] man; someone just caused a lot of angry to the proofreadpage in mobile frontend
[23:23:24] jdlrobson, jfyi: "Fatal error: Call to undefined method ProofreadPageDifferenceEngine::getWarningMessageText() at /usr/local/apache/common-local/php-1.23wmf22/extensions/MobileFrontend/includes/specials/SpecialMobileDiff.php on line 179"
[23:30:45] DifferenceEngine doesn't have a getWarningMessageText()
[23:31:12] InlineDifferenceEngine does
[23:32:36] well; that certainly makes sense for why it threw a fatal :) I filed it as https://bugzilla.wikimedia.org/show_bug.cgi?id=64037
[23:52:47] srsly, diffs on wikisource? who cares?:P
[23:54:35] MaxSem: I care :P
[23:54:36] I imagine people who edit wikisource