[00:01:34] (03CR) 10Ori.livneh: "cleaned up" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134282 (owner: 10Hoo man) [00:03:35] (03CR) 10GWicke: "Is it clear what is causing the underlying issue?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135875 (owner: 10Aaron Schulz) [00:16:54] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:16:33 UTC [00:18:54] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:17:55 UTC [00:23:54] PROBLEM - Puppet freshness on lvs1001 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:23:03 UTC [00:25:39] (03CR) 10Ori.livneh: [C: 031] scap: ensure=>absent /usr/local/bin/sync-common-file [operations/puppet] - 10https://gerrit.wikimedia.org/r/135924 (owner: 10BryanDavis) [00:25:56] (03CR) 10Ori.livneh: [C: 031] scap: /usr/local/bin/sync-common-file is unused [operations/puppet] - 10https://gerrit.wikimedia.org/r/135925 (owner: 10BryanDavis) [00:29:54] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:29:03 UTC [00:33:54] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:32:59 UTC [00:40:37] https://gerrit.wikimedia.org/r/#/admin/projects/phabricator/phabricator appears to be broken [00:40:44] rather. the git repo is broken [00:40:54] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:40:36 UTC [00:41:26] $ git clone https://gerrit.wikimedia.org/r/phabricator/phabricator [00:41:26] Cloning into 'phabricator'... [00:41:26] Checking connectivity... done. [00:41:26] warning: remote HEAD refers to nonexistent ref, unable to checkout. [00:41:36] <^d> Nobody's pushed anything to master yet. [00:41:48] <^d> HEAD's pointing at refs/heads/master [00:43:01] so the repo's empty [00:43:05] <^d> Yep. [00:43:09] nice. 
[00:47:54] PROBLEM - Puppet freshness on lvs1002 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:46:52 UTC [01:03:44] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu May 29 01:03:42 UTC 2014 [02:00:45] (03CR) 10BBlack: [C: 032 V: 032] fix daemonization stdio stuff [operations/debs/pybal] - 10https://gerrit.wikimedia.org/r/134651 (owner: 10BBlack) [02:36:48] !log LocalisationUpdate completed (1.24wmf5) at 2014-05-29 02:35:45+00:00 [02:36:54] Logged the message, Master [02:43:02] (03CR) 10Jeremyb: [C: 04-1] "John, do you want to fix "newikimedia.org"? (no new domains are being registered for this bug)" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133991 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [02:45:25] (03CR) 10Jeremyb: "renew +1" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133981 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [03:06:13] (03CR) 10Jeremyb: "Reedy, what do you think about DB name?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133982 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [03:06:42] (03CR) 10Jeremyb: "renew +1" [operations/dns] - 10https://gerrit.wikimedia.org/r/133980 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. 
Lewis) [03:11:18] !log LocalisationUpdate completed (1.24wmf6) at 2014-05-29 03:10:15+00:00 [03:11:23] Logged the message, Master [03:17:54] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:16:33 UTC [03:18:14] (03PS1) 10Springle: repool db1009 after raid tests [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135991 [03:18:47] (03CR) 10Springle: [C: 032] repool db1009 after raid tests [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135991 (owner: 10Springle) [03:18:56] (03Merged) 10jenkins-bot: repool db1009 after raid tests [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135991 (owner: 10Springle) [03:19:31] !log springle Synchronized wmf-config/db-eqiad.php: repool db1009 in s2 (duration: 00m 08s) [03:19:54] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:17:55 UTC [03:24:54] PROBLEM - Puppet freshness on lvs1001 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:23:03 UTC [03:30:54] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:29:03 UTC [03:32:25] ori: yt [03:32:33] springle: hey [03:32:43] what did i break? [03:33:40] for some reason sync-file isn't syncing db-eqiad.php. wondering if the mediawiki::sync merge earlier means i have to do something differently? [03:34:05] (wild guess looking at commit log and seeing 'sync') [03:35:02] i'll take a look [03:35:16] it deliberately doesn't work if the file has syntax errors [03:35:28] and it also has been hanging after successfully syncing at the !log part [03:36:03] it said this to me: http://paste.debian.net/102338/ [03:36:27] that looks right [03:36:32] which looks fine. 
only no traffic on db1009, and a random mw node has old db-eqiad.php [03:37:49] ok, first things first, to unblock you [03:37:55] sync-file has been recently ported to python [03:37:59] here's the old shell script: https://dpaste.de/Fwzr/raw [03:38:05] you can just run that for the time being [03:38:08] and i'll keep looking [03:38:26] bd808|BUFFER: fyi ^^ [03:38:33] ok thanks [03:39:41] heh [03:39:59] that's already broken by gerrit 135924 [03:40:25] oh not merged [03:40:37] well, can't find /usr/local/bin/sync-common-file [03:40:46] ori: i'll wait :) it's no hurry for db1009 [03:40:51] d'oh [03:40:58] though i might panic if we suddenly have to depool something [03:41:54] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:40:36 UTC [03:43:49] springle: try now [03:44:10] !log springle Synchronized wmf-config/db-eqiad.php: repool db1009 in s2 (duration: 00m 08s) [03:44:30] nope [03:44:33] same [03:45:52] !log disabled puppet on tin and copied sync-common-file from mediawiki/tools/scap@8f2a8356c38 into /usr/local/bin to debug sync issue [03:45:56] Logged the message, Master [03:48:06] wow, wtf [03:48:11] oh wait [03:48:20] try now -> which method [03:48:24] old or new [03:48:28] old [03:48:35] sorry, i should have been explicit [03:48:43] np trying [03:48:54] PROBLEM - Puppet freshness on lvs1002 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:46:52 UTC [03:49:01] yep old works [03:49:32] ok, the reason i had to disable puppet is that sync-common-file has an ensure => absent on its head at the moment, since it was just purged from the cluster today [03:49:32] ori: thanks. 
i'll leave you in peace to debug :) [03:49:39] ok [03:49:41] since it's already gone from all hosts, i'll submit a commit to remove it from puppet [03:49:47] and i'll leave just tin's copy in place [03:49:50] !log springle synchronized wmf-config/db-eqiad.php 'repool db1009 in s2, take #2' [03:49:51] for now, until i figure it out [03:49:56] Logged the message, Master [03:51:57] !log db1009 mariadb 5.5.37 live trial with low load [03:52:02] Logged the message, Master [03:55:21] !log ori Synchronized README: Debugging sync-file (duration: 00m 06s) [03:55:26] Logged the message, Master [03:56:33] that worked just fine [03:56:36] that's so add [03:56:38] *odd [03:57:59] :) [03:58:08] if i had to guess, i'd say ssh agent forwarding is getting screwed up somehow [03:58:35] and each host is quietly rebuffing your attempt to log in [03:59:05] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu May 29 03:57:59 UTC 2014 (duration 57m 58s) [03:59:09] Logged the message, Master [04:00:50] i can definitely still manually ssh from tin to mw nodes [04:01:00] i guess the old sync-file proves that too [04:13:07] !log re-enabled puppet on tin [04:13:13] Logged the message, Master [04:15:05] https://gerrit.wikimedia.org/r/#/c/135994/ is a revert to the old status quo, if i get a review on that i'll sync it and leave it for bryan to debug [04:18:05] my money is on something odd with the --include --exclude stuff in new sync_common [04:20:42] !log updated scap to 9ba9014: Partially revert "Convert sync-dir and sync-file to python" [04:20:47] Logged the message, Master [04:22:45] springle: i just reverted it for now. could you sync /a/common/README to confirm? 
[04:26:59] !log springle synchronized README 'test sync-file' [04:27:03] Logged the message, Master [04:27:23] ori: seems ok [04:28:37] you might be right [04:29:23] the generated command-line is presumably rsync yadda yadda --include=db-eqiad.php --exclude=* [04:32:10] i normally run sync-file wmf-config/db-eqiad.php 'blah' [04:32:20] with cwd /a/common [04:32:41] i didn't think to try it directly from wmf-config dir [04:33:14] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:35:10] two points from the man page: [04:35:12] 1) "If a pattern excludes a particular parent directory, it can render a deeper include pattern ineffectual because rsync did not descend through that excluded section of the hierarchy. This is particularly important when using a trailing ’*’ rule." [04:35:31] 2) "if the pattern starts with a / then it is anchored to a particular spot in the hierarchy of files, otherwise it is matched against the end of the pathname." [04:36:12] i don't know if it canonicalizes the path [04:36:20] but maybe it excluded . [04:37:10] anyways, things are consistent everywhere (no local hacks), and you're able to sync [04:37:13] so i'm calling it a night [04:37:17] * ori waves [04:37:36] ori: night! 
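The two man-page points quoted above can be illustrated with a short sketch. It assumes, as the log only presumes, that the generated command line had the form `--include=db-eqiad.php --exclude=*`. Under that assumption a nested path such as wmf-config/db-eqiad.php would never be reached, because `--exclude=*` also matches the wmf-config directory and rsync does not descend into an excluded directory. A common idiom is to include every parent directory, anchored with a leading `/`, before the final exclude. The helper name `single_file_filters` is hypothetical and is not taken from scap.

```python
from pathlib import PurePosixPath


def single_file_filters(rel_path):
    """Build rsync filter arguments that transfer exactly one file.

    Hypothetical helper for illustration only; not the scap implementation.
    Each parent directory is included explicitly (anchored with a leading
    "/") so that the trailing --exclude=* cannot stop rsync from descending.
    """
    path = PurePosixPath(rel_path)
    if path.is_absolute() or ".." in path.parts:
        raise ValueError("expected a path relative to the transfer root")
    args = []
    # Walk the parents from outermost to innermost, skipping the root ".".
    for parent in reversed(path.parents):
        if str(parent) != ".":
            args.append("--include=/%s/" % parent)
    # Include the file itself, then exclude everything else.
    args.append("--include=/%s" % path)
    args.append("--exclude=*")
    return args


print(single_file_filters("wmf-config/db-eqiad.php"))
```

A top-level file such as README needs no parent includes, which would be consistent with the README test syncs succeeding while the nested db-eqiad.php did not; the log itself does not confirm that this was the cause.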
[04:41:04] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.006 second response time [06:18:54] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:16:33 UTC [06:20:54] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:17:55 UTC [06:25:54] PROBLEM - Puppet freshness on lvs1001 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:23:03 UTC [06:27:33] _joe_: 24h for you ^ :) [06:31:54] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:29:03 UTC [06:42:54] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:40:36 UTC [06:49:54] PROBLEM - Puppet freshness on lvs1002 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 21:46:52 UTC [07:08:45] akosiaris: morning, wanted to touch base regarding sanger. where does it stand ? [07:09:49] matanya: meaning ? [07:10:37] I spoke to joel last night, asking him if he opened a ticket in rt regarding a replacement for sanger in eqiad [07:10:57] he said you are the person to contact regarding such a question, and you own the task [07:11:15] <_joe_> what the hell is happening to lvs servers? [07:11:24] <_joe_> akosiaris: did you take a look? [07:12:00] I know about #6163 but nothing moved on ever since [07:12:20] ah ok. So the LDAP part of sanger, yes I do own that task. We will be migrating the LDAP part to OpenLDAP and to a new machine in eqiad. I am developing the OpenLDAP module [07:12:27] _joe_: looking right now [07:15:06] duplicate definition ? [07:15:08] meh... 
[07:15:12] thanks akosiaris , please update the ticket in your spare time [07:15:27] akosiaris: i guess it is the admin yaml stuff [07:17:31] _joe_: the culprit seems to be https://gerrit.wikimedia.org/r/#/c/135928/ [07:17:48] bad guess :/ [07:19:55] <_joe_> akosiaris: will you fix that, or should we revert the change for now? [07:20:34] the puppet alert is not on those host changed in this patch [07:21:06] _joe_: trying to fix [07:21:44] matanya: sure it is. look at @@ -1643,6 +1646,12 @@ node /lvs100[1-6]\.wikimedia\.org/ { [07:22:39] stupid gerrit diff [07:22:49] why hide this? :/ [07:23:04] (03CR) 10Hashar: "Excellent! Thank you very much :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134739 (owner: 10Dzahn) [07:29:25] _joe_: here is a friend of yours. Look at class interface::bonding-tools [07:29:45] this is not going to break as long as it is included, but imagine a variable there ;-) [07:30:31] <_joe_> akosiaris: we're full of these [07:30:31] <_joe_> I thought we will change them once we need to [07:30:48] <_joe_> btw, merging naggen2 [07:30:56] <_joe_> I hope I don't screw neon up [07:30:57] yeah I know, I couldn't help myself though [07:31:34] (03PS9) 10Giuseppe Lavagetto: icinga: replace naggen [operations/puppet] - 10https://gerrit.wikimedia.org/r/135746 [07:35:37] (03CR) 10Giuseppe Lavagetto: [C: 032] icinga: replace naggen [operations/puppet] - 10https://gerrit.wikimedia.org/r/135746 (owner: 10Giuseppe Lavagetto) [07:36:58] <_joe_> hey ho, let's go! 
[07:38:58] (03PS1) 10Alexandros Kosiaris: Split interface-rps script File resources to class [operations/puppet] - 10https://gerrit.wikimedia.org/r/136000 [07:42:06] (03CR) 10Alexandros Kosiaris: [C: 032] Split interface-rps script File resources to class [operations/puppet] - 10https://gerrit.wikimedia.org/r/136000 (owner: 10Alexandros Kosiaris) [07:43:04] god I am an idiot [07:43:46] <_joe_> me too [07:43:47] <_joe_> I just broke something [07:44:47] (03PS1) 10Alexandros Kosiaris: Fix a typo introduced in 1a0605b [operations/puppet] - 10https://gerrit.wikimedia.org/r/136001 [07:45:40] <_joe_> akosiaris: let me merge a change quickly please [07:46:30] sure [07:46:39] (03PS1) 10Giuseppe Lavagetto: naggen2: fix template [operations/puppet] - 10https://gerrit.wikimedia.org/r/136002 [07:47:13] argh this is ugly :P [07:47:20] (03CR) 10Giuseppe Lavagetto: [C: 032] naggen2: fix template [operations/puppet] - 10https://gerrit.wikimedia.org/r/136002 (owner: 10Giuseppe Lavagetto) [07:47:50] <_joe_> akosiaris: yeah I was able to fuckup the simplest part of the commit [07:47:50] <_joe_> :/ [07:48:06] (03CR) 10Giuseppe Lavagetto: [V: 032] naggen2: fix template [operations/puppet] - 10https://gerrit.wikimedia.org/r/136002 (owner: 10Giuseppe Lavagetto) [07:49:24] btw. would it not work to do something like <% myvar = scope.lookupvar('::puppet::config') %> and then just do the evaluations ? like dsn <%= myvar['dbuser']:myvar['db'] %> and so on ? [07:49:24] <_joe_> akosiaris: that does not seem to be the only problem, crap. [07:49:40] (03CR) 10Alexandros Kosiaris: [C: 032] Fix a typo introduced in 1a0605b [operations/puppet] - 10https://gerrit.wikimedia.org/r/136001 (owner: 10Alexandros Kosiaris) [07:50:04] <_joe_> akosiaris: puppet is not running on the puppetmasters ATM [07:50:20] ok [07:50:27] <_joe_> akosiaris: the problem is, for some reason ::puppet::config is not available in that context [07:50:39] hmmm weird... 
[07:50:44] RECOVERY - Puppet freshness on lvs1002 is OK: puppet ran at Thu May 29 07:50:43 UTC 2014 [07:50:48] <_joe_> now I have to figure out why [07:50:50] good to know [07:50:51] <_joe_> but go on with your changes :) [07:51:22] ok so I merged and this fixed the lvs puppet issue [07:51:26] <_joe_> I think that is the name clash :) [07:51:41] <_joe_> I know what is the best way to solve this [07:51:46] ahhhh, now it makes sense [07:52:08] ok I got it, yeah you were absolutely right yesterday :) [07:52:08] <_joe_> I told you that was a problem :) [07:52:58] <_joe_> so, now I have to understand how to manage this. [07:53:25] <_joe_> the easiest way would be to restore my stash that was reading from puppet.conf [07:53:34] <_joe_> in naggen2 [07:53:34] RECOVERY - Puppet freshness on lvs1001 is OK: puppet ran at Thu May 29 07:53:30 UTC 2014 [07:53:51] <_joe_> yeah ok, on it [07:55:05] <_joe_> ~ 20 mins and it will be ok [07:59:14] RECOVERY - Puppet freshness on lvs1003 is OK: puppet ran at Thu May 29 07:59:09 UTC 2014 [08:11:05] (03PS1) 10Giuseppe Lavagetto: naggen2: read config from puppet.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/136003 [08:11:34] RECOVERY - Puppet freshness on lvs1006 is OK: puppet ran at Thu May 29 08:11:32 UTC 2014 [08:12:04] <_joe_> [08:12:04] <_joe_> mmmh gerrit-wm where art thou? [08:12:54] (03PS2) 10Giuseppe Lavagetto: naggen2: read config from puppet.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/136003 [08:14:34] (03CR) 10Giuseppe Lavagetto: [C: 032] naggen2: read config from puppet.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/136003 (owner: 10Giuseppe Lavagetto) [08:14:50] grrrit-wm: _joe_ lookst for thee [08:15:12] But thou art here, in foro. 
[08:16:17] <_joe_> marktraceur: my ancient english knowledge is very limited [08:16:54] RECOVERY - Puppet freshness on lvs1005 is OK: puppet ran at Thu May 29 08:16:49 UTC 2014 [08:17:03] _joe_: "in foro" is actually overly complicated Latin, but the rest should be understandable. "in foro" means "in the forum" [08:17:15] ....roughly [08:17:40] <_joe_> I do understand in foro [08:17:43] Because "forum" in modern English could translate to too many things for it to be an accurate translation [08:17:44] RECOVERY - Puppet freshness on lvs1004 is OK: puppet ran at Thu May 29 08:17:40 UTC 2014 [08:17:50] <_joe_> I'm italian and I studied latin :P [08:17:59] Well then I will shut up. :) [08:18:00] <_joe_> I just can't answer you properly [08:18:48] <_joe_> (in foro means 'in the forum' in italian as well, even if we'd use an article) [08:18:54] _joe_: Something about pining after grrrit-wm and also gesturing meaningfully at a papier-mache' moon that your classmates created. My knowledge of Shakespeare is limited. 
[08:22:30] <_joe_> :) [08:24:19] (03PS1) 10Giuseppe Lavagetto: naggen2: fix ensure_resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/136004 [08:26:49] (03CR) 10Giuseppe Lavagetto: [C: 032] naggen2: fix ensure_resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/136004 (owner: 10Giuseppe Lavagetto) [08:28:06] <_joe_> I should do things with more calm [08:28:10] <_joe_> my bad [08:28:56] <_joe_> one last horrible commit and I'm done - I hope :( [08:29:28] <_joe_> If we did not have that NS clash, it would have been good the second time though [08:30:11] (03PS1) 10Giuseppe Lavagetto: naggen2: remove references to the config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/136005 [08:31:35] (03CR) 10Giuseppe Lavagetto: [V: 032] naggen2: remove references to the config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/136005 (owner: 10Giuseppe Lavagetto) [08:31:43] (03CR) 10Giuseppe Lavagetto: [C: 032] naggen2: remove references to the config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/136005 (owner: 10Giuseppe Lavagetto) [08:49:58] (03PS1) 10Giuseppe Lavagetto: naggen2: remove ensure_resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/136006 [08:52:48] (03CR) 10Giuseppe Lavagetto: [C: 032] "This is hopefully the last one :(" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136006 (owner: 10Giuseppe Lavagetto) [09:13:25] (03PS1) 10Filippo Giunchedi: install python-imaging on appservers [operations/puppet] - 10https://gerrit.wikimedia.org/r/136007 [09:15:45] (03PS1) 10Giuseppe Lavagetto: naggen2: improving log format and usage [operations/puppet] - 10https://gerrit.wikimedia.org/r/136008 [09:17:00] (03CR) 10Giuseppe Lavagetto: [C: 031] install python-imaging on appservers [operations/puppet] - 10https://gerrit.wikimedia.org/r/136007 (owner: 10Filippo Giunchedi) [09:17:21] (03CR) 10Filippo Giunchedi: "I'm not an expert in the mediawiki module but it seems to make sense, please double check the 
RT too!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136007 (owner: 10Filippo Giunchedi) [09:17:36] (03CR) 10Giuseppe Lavagetto: [C: 032] naggen2: improving log format and usage [operations/puppet] - 10https://gerrit.wikimedia.org/r/136008 (owner: 10Giuseppe Lavagetto) [09:18:19] matanya: nice (re: 24h) [09:18:27] :) [09:19:12] I was looking for someone familiar with mediawiki role for https://gerrit.wikimedia.org/r/#/c/136007/ + related RT [09:19:37] ah I see _joe_ +1'd too, thanks! [09:19:49] <_joe_> godog: yes I mean, it does no harm [09:20:01] <_joe_> that's why I gave +1 [09:20:26] <_joe_> also, you're installing python packages, you can't be wrong :P [09:21:21] haha true [10:42:36] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 07:41:48 UTC [10:47:07] <_joe_> uhm, admins.pp I guess [10:47:43] <_joe_> no, this is mine :( [10:50:53] (03CR) 10Nikerabbit: naggen2: remove ensure_resource (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136006 (owner: 10Giuseppe Lavagetto) [10:52:17] <_joe_> Nikerabbit: there is one more serious problem there. I had a reason to use ensure_resource after all :( [10:57:36] PROBLEM - Puppet freshness on virt0 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 07:57:07 UTC [10:58:05] (03PS1) 10Giuseppe Lavagetto: naggen: avoid clash with saltmaster for packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/136011 [11:05:14] (03CR) 10Giuseppe Lavagetto: [C: 032] "This fixes virt* puppet catalogs, and does not break prod puppet masters." [operations/puppet] - 10https://gerrit.wikimedia.org/r/136011 (owner: 10Giuseppe Lavagetto) [11:05:46] _joe_: I'm just learning by reading commits, giving drive-by comments while doing it [11:06:15] <_joe_> Nikerabbit: my commits from this morning are a shame :/ [11:07:26] RECOVERY - Puppet freshness on virt1000 is OK: puppet ran at Thu May 29 11:07:24 UTC 2014 [11:07:43] _joe_: we all write bad code. 
the thing is to recognize it afterwards and avoid it in the future [11:08:28] <_joe_> Nikerabbit: no, very simply I was in a hurry to fix things and rushed a couple of commits [11:08:28] <_joe_> which is stupid and pointless [11:09:27] s/write bad code/do stupid things/ [11:09:28] <_joe_> well, now virt* are ok, so I can go to lunch [11:09:39] my lunch is ready too [11:27:36] RECOVERY - Puppet freshness on virt0 is OK: puppet ran at Thu May 29 11:27:29 UTC 2014 [11:35:39] _joe_: can you do a bigupload? [12:06:14] what is a bigupload? [12:49:34] paravoid: files over 100mb to commons. [12:49:43] I think it needs bugzilla [12:51:55] (03PS2) 10QChris: Allow to set up hive's auxpath globally [operations/puppet/cdh4] - 10https://gerrit.wikimedia.org/r/135539 [12:55:03] (03PS1) 10QChris: Use HCatalog as default auxpath for Hive [operations/puppet] - 10https://gerrit.wikimedia.org/r/136014 [12:56:39] (03CR) 10jenkins-bot: [V: 04-1] Use HCatalog as default auxpath for Hive [operations/puppet] - 10https://gerrit.wikimedia.org/r/136014 (owner: 10QChris) [12:57:02] (03CR) 10QChris: "> Hm, ok, but let's parameterize it then. Make a parameter [...]" [operations/puppet/cdh4] - 10https://gerrit.wikimedia.org/r/135539 (owner: 10QChris) [12:58:43] <_joe_> matanya: I don't think I know how to do that sorry [12:58:58] thanks [13:00:20] (03CR) 10QChris: [C: 04-1] "Jenkins fails because the corresponding commit" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136014 (owner: 10QChris) [13:04:11] _joe_: future use: https://commons.wikimedia.org/wiki/Help:Server-side_upload [13:04:39] <_joe_> matanya: thanks, will read. I'm a total n00b [13:05:05] not really. but meh [13:05:15] found a work around [13:06:25] I think I managed to get some server side upload done a few months ago [13:07:04] video sucks on commons, in so many ways [13:07:40] I wish i could upload files from labs. will save so much time [13:10:10] I'm pretty sure you can, we even have a tool for that. 
:) http://tools.wmflabs.org/ia-upload/commons/init [13:13:58] Nemo_bis: it gives me internal eroor [13:14:02] *error [13:18:08] _joe_: the naggen files are not filebucketed [13:18:14] we have backup => false iirc [13:18:24] they are checksummed, but this can be disabled as well [13:18:29] with checksum => mtime [13:20:16] (03CR) 10Faidon Liambotis: [C: 032] Switch LVS to "performance" cpufreq governor [operations/puppet] - 10https://gerrit.wikimedia.org/r/135929 (owner: 10BBlack) [13:20:18] matanya: you may need to enable cookies [13:20:26] I have cookies [13:20:29] but that's for archive.org files for [13:20:33] + only [13:23:25] andre__: When you get a second, just a bit more detail on https://bugzilla.wikimedia.org/show_bug.cgi?id=65861 will help me track it down. [13:23:55] Coren, I'm not even sure if I understand the question correctly :-/ [13:24:07] (03CR) 10Faidon Liambotis: [C: 031] "Looks good, but add your name to Authors at the top :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135931 (owner: 10BBlack) [13:24:08] Coren, isn't that info in the header files in comment 0? [13:24:25] (03CR) 10Faidon Liambotis: [C: 032] enable RSS for LVS servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/135932 (owner: 10BBlack) [13:25:49] Coren: Received: from jenkins-bot by fab2.eqiad.wmflabs with local (Exim 4.76) ? 
[13:26:24] (03CR) 10Ottomata: Use HCatalog as default auxpath for Hive (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136014 (owner: 10QChris) [13:28:36] (03CR) 10Ottomata: Allow to set up hive's auxpath globally (032 comments) [operations/puppet/cdh4] - 10https://gerrit.wikimedia.org/r/135539 (owner: 10QChris) [14:09:05] (03PS1) 10Giuseppe Lavagetto: rcs: install servers with trusty [operations/puppet] - 10https://gerrit.wikimedia.org/r/136022 [14:29:04] (03PS1) 10Alexandros Kosiaris: mysql-predump optimizations [operations/puppet] - 10https://gerrit.wikimedia.org/r/136026 [14:32:26] (03PS2) 10Alexandros Kosiaris: mysql-predump optimizations [operations/puppet] - 10https://gerrit.wikimedia.org/r/136026 [14:34:25] <_joe_> akosiaris: I was about to submit a comment about the missing -x :P [14:35:06] <_joe_> also, I don't understand why -x and not -e [14:35:28] cause I am an idiot [14:35:36] obviously -e and not -x [14:36:01] <_joe_> I was looking at the bash manual to see if I got something terribly wrong :P [14:36:38] and I even wrote a comment about the behaviour. I really do not know why I put -x there [14:36:42] thanks for catching that [14:37:18] (03PS3) 10Alexandros Kosiaris: mysql-predump optimizations [operations/puppet] - 10https://gerrit.wikimedia.org/r/136026 [14:37:28] (03PS2) 10Dzahn: admin yaml for tridge (backups) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135957 [14:39:01] (03CR) 10Dzahn: [C: 032] admin yaml for tridge (backups) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135957 (owner: 10Dzahn) [14:50:24] James_F|Away: SWAT in 10 minutes [14:59:08] (03PS4) 10Alexandros Kosiaris: mysql-predump optimizations [operations/puppet] - 10https://gerrit.wikimedia.org/r/136026 [14:59:38] anomie: Here. [15:00:13] * anomie begins SWAT [15:00:13] Oh, wow, we now have a bot? [15:00:17] Is this new? 
[15:00:35] James_F: A week old IIRC [15:00:37] I know mwalker|away was talking about this… [15:00:47] JohnLewis: Ah, clearly I just haven't been paying attention. :-) [15:01:03] James_F: Clearly :) [15:01:12] * anomie still doesn't like how the bot uses /notice instead of actually pinging in the channel [15:01:27] Indeed. [15:01:43] Especially since a /notice doesn't actually *ping* me [15:02:06] Helpful. [15:02:12] But it's in prettier colour! [15:02:19] (At least on irssi.) [15:02:46] * bd808 suggests pull requests to https://github.com/mattofak/jouncebot [15:02:52] Ping is configurable. (At least on irssi.) [15:03:09] a /notice is made for that? [15:03:28] (03PS1) 10Alexandros Kosiaris: ganeti evaluation shared IP [operations/dns] - 10https://gerrit.wikimedia.org/r/136055 [15:03:46] (03CR) 10Alexandros Kosiaris: [C: 032] mysql-predump optimizations [operations/puppet] - 10https://gerrit.wikimedia.org/r/136026 (owner: 10Alexandros Kosiaris) [15:03:47] The change would be at https://github.com/mattofak/jouncebot/blob/master/jouncebot.py#L135 [15:04:12] s/notice/privmsg/ [15:05:42] sounds like you're fixing the bot because the clients have an issue [15:06:13] mutante: Made for what? The problem here is that the bot is using /notice when that's not what /notice is for, surely? [15:06:17] according to the irc spec, the only time bots have to use notice is if they're responding to a privmsg [15:06:27] James_F: for sending notices to people? [15:06:40] mutante: /notice is normally used (IME, I could be wrong) for notices about the channel – not a ping to an individual to do a task. [15:06:48] !log anomie synchronized php-1.24wmf6/extensions/VisualEditor/modules/ve-mw/ 'SWAT: VisualEditor URL decoding and image alignment fixes. [[gerrit:135922]] [[gerrit:135946]]' [15:06:49] mutante: No, that's /query ;-) [15:06:57] James_F: ^ Test please. Don't forget you have two different changes included there. 
[15:06:58] Logged the message, Master [15:06:59] /query is a private message [15:07:05] anomie: Yup. Thanks. [15:07:07] mutante: Indeed. [15:07:22] do you want the bot to not talk in public? then i got you wrong [15:07:41] (03CR) 10Alexandros Kosiaris: [C: 04-2] "duh. I had this in my mind but instead of checking out and submitting my patch in this changeId I created a new changeId:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135517 (owner: 10Springle) [15:08:25] mutante: If all you want is to inform individuals, you should use /query. If you want to get people to do something, and for others to be aware, /privmsg in channel. If you want everyone to be vaguely aware of something related to the channel, /notice [15:08:47] mutante: Or at least that's the guidance/policy we used to have. Other policies might make sense, though. [15:09:41] James_F: ok, fair, so which of the 3 is the goal of jouncebot? [15:10:12] mutante: I think the goal of the bot is #2 not #3. It's a specific reminder to specific people (hey, go deploy), plus awareness to third parties (e.g. you, me, etc.) [15:10:28] mutante: But instead we could do /notice and /query combined, maybe? [15:10:34] * James_F is easy. [15:10:58] anomie: All tested, everything is great. Thank you! 
[15:11:02] James_F: no, you convinced me [15:11:05] * anomie is done with SWAT [15:14:26] (03PS1) 10Reedy: Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136056 [15:14:28] (03PS1) 10Reedy: testwiki to 1.24wmf7 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136057 [15:14:30] (03PS1) 10Reedy: Wikipedias to 1.24wmf6 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136058 [15:14:32] (03PS1) 10Reedy: group0 to 1.24wmf7 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136059 [15:14:38] time for reedy spam [15:15:07] (03CR) 10Reedy: [C: 032] Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136056 (owner: 10Reedy) [15:15:15] (03Merged) 10jenkins-bot: Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136056 (owner: 10Reedy) [15:15:27] (03PS1) 10Rush: admin yaml analytics1009.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136060 [15:15:29] (03CR) 10Reedy: [C: 032] testwiki to 1.24wmf7 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136057 (owner: 10Reedy) [15:16:26] mutante: Maybe, OTOH, the logging bots could use /notice. [15:16:29] * James_F ponders. 
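The /notice versus in-channel message question discussed above comes down to which IRC command the bot sends. As a rough sketch of the wire format, the IRC RFCs (1459 and 2812) define both commands with the same shape, `PRIVMSG <target> :<text>` and `NOTICE <target> :<text>`, so the suggested `s/notice/privmsg/` change amounts to swapping the command verb. The function `format_irc_line`, the channel name and the message text below are hypothetical and are not taken from jouncebot.

```python
def format_irc_line(command, target, text):
    """Return one raw IRC protocol line for a PRIVMSG or NOTICE.

    Illustrative sketch only; a real bot would normally go through an
    IRC client library rather than build protocol lines by hand.
    """
    command = command.upper()
    if command not in ("PRIVMSG", "NOTICE"):
        raise ValueError("unsupported command: %s" % command)
    # A CR or LF inside the arguments would start a new protocol line.
    if any(ch in target + text for ch in "\r\n"):
        raise ValueError("target and text must not contain line breaks")
    return "%s %s :%s\r\n" % (command, target, text)


# Same target and text, different verb: most clients highlight a nick
# mentioned in a channel PRIVMSG, while NOTICE handling varies by client.
reminder = "someone: SWAT deploy window starts in 10 minutes"
print(repr(format_irc_line("NOTICE", "#example-channel", reminder)))
print(repr(format_irc_line("PRIVMSG", "#example-channel", reminder)))
```

Sending to a channel name reaches everyone in the channel, while using a nick as the target delivers the line privately, which is the /query case mentioned in the discussion.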
[15:16:52] (03Merged) 10jenkins-bot: testwiki to 1.24wmf7 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136057 (owner: 10Reedy) [15:17:06] (03PS1) 10Christopher Johnson (WMDE): Icinga: new command "check_dispatch" for Wikidata [operations/puppet] - 10https://gerrit.wikimedia.org/r/136095 [15:17:39] !log reedy Started scap: testwiki to 1.24wmf7 and build l10n cache [15:17:44] Logged the message, Master [15:17:45] (03CR) 10jenkins-bot: [V: 04-1] admin yaml analytics1009.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136060 (owner: 10Rush) [15:29:10] (03PS2) 10Christopher Johnson (WMDE): Icinga: new command "check_dispatch" for Wikidata [operations/puppet] - 10https://gerrit.wikimedia.org/r/136095 [15:33:41] (03PS2) 10Rush: admin yaml analytics1009.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136060 [15:34:41] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "I'm strongly opposed to install a perl module via puppi. I'd rather package/backport libperl-json-path." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/136095 (owner: 10Christopher Johnson (WMDE)) [15:35:12] (03CR) 10jenkins-bot: [V: 04-1] admin yaml analytics1009.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136060 (owner: 10Rush) [15:37:21] (03PS3) 10Rush: admin yaml analytics1009.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136060 [15:39:14] (03CR) 10jenkins-bot: [V: 04-1] admin yaml analytics1009.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136060 (owner: 10Rush) [15:39:17] (03CR) 10Alexandros Kosiaris: [C: 032] ganeti evaluation shared IP [operations/dns] - 10https://gerrit.wikimedia.org/r/136055 (owner: 10Alexandros Kosiaris) [15:39:26] (03PS1) 10Alexandros Kosiaris: thallium,mercury ganeti eval machines setup [operations/puppet] - 10https://gerrit.wikimedia.org/r/136104 [15:40:11] (03PS4) 10Rush: admin yaml analytics1009.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136060 [15:42:03] !log reedy Finished scap: testwiki to 1.24wmf7 and build l10n cache (duration: 24m 24s) [15:42:03] (03CR) 10Alexandros Kosiaris: [C: 04-1] "I would say this is missing a partitioning scheme. Should be as easy as adding a line in" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136022 (owner: 10Giuseppe Lavagetto) [15:42:07] Logged the message, Master [15:42:55] (03PS5) 10Rush: admin yaml analytics1009.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136060 [15:42:58] <_joe_> akosiaris: oh I know that [15:42:58] <_joe_> :) [15:43:39] ah ok then. Sorry :-/ [15:43:58] bits broken ? [15:44:04] no js/css for me [15:45:26] PROBLEM - Apache HTTP on mw1152 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:45:26] PROBLEM - Apache HTTP on mw1149 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:45:26] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:45:28] akosiaris _joe_ ? 
[15:45:39] i predicted this [15:45:56] PROBLEM - Apache HTTP on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:46:03] on it [15:46:06] PROBLEM - Apache HTTP on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:46:09] thanks [15:46:26] PROBLEM - Apache HTTP on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:46:36] PROBLEM - Apache HTTP on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:47:06] <_joe_> wtf [15:47:06] RECOVERY - Apache HTTP on mw1190 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 9.331 second response time [15:47:26] RECOVERY - Apache HTTP on mw1191 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 9.242 second response time [15:47:26] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 6.197 second response time [15:47:26] PROBLEM - Apache HTTP on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:47:36] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:47:43] (03CR) 10Dzahn: [C: 031] "+1, but as the comment said this would be temp. and should become real analytics users later" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136060 (owner: 10Rush) [15:47:53] Bits odwn? [15:47:56] Down, even. 
[15:48:01] yes Marybelle [15:48:06] PROBLEM - Apache HTTP on mw1150 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:48:26] PROBLEM - Apache HTTP on mw1207 is CRITICAL: Connection timed out [15:48:26] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 8.512 second response time [15:48:32] <_joe_> segfaults [15:49:17] PROBLEM - Apache HTTP on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:49:33] oh fuck [15:49:46] RECOVERY - Apache HTTP on mw1189 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 3.987 second response time [15:49:58] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: (no message) [15:50:03] Logged the message, Master [15:50:05] I deployed too much [15:50:16] RECOVERY - Apache HTTP on mw1208 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 9.266 second response time [15:50:17] RECOVERY - Apache HTTP on mw1203 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.067 second response time [15:50:26] RECOVERY - Apache HTTP on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 4.463 second response time [15:50:26] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.058 second response time [15:50:36] (03CR) 10Hashar: "IIRC Chase said yesterday that this is manageable via the recently added admin module." [operations/puppet] - 10https://gerrit.wikimedia.org/r/76678 (owner: 10Tim Starling) [15:50:49] (03CR) 10JanZerebecki: "Why do you think we need to find out what percent of our readers are still vulnerable to BEAST client-side?" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: 10JanZerebecki) [15:51:17] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data exceeded the critical threshold [500.0] [15:51:26] RECOVERY - Apache HTTP on mw1149 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 8.449 second response time [15:51:48] bits.wikimedia.org 500s, causing unstyled documents everywhere [15:51:57] liangent: known, thanks [15:52:26] (03CR) 10Ottomata: [C: 032] admin yaml analytics1009.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136060 (owner: 10Rush) [15:53:22] <_joe_> Reedy: what has happened exactly? [15:53:33] what's going on? [15:53:36] I deployed the new version to the wikipedias accidentally [15:53:40] usual post-deply bits downtime? [15:53:48] It's not deploy time [15:54:10] apc? [15:54:50] (03CR) 10Rush: [C: 032 V: 032] admin yaml analytics1009.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136060 (owner: 10Rush) [15:55:06] RECOVERY - Apache HTTP on mw1150 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 7.262 second response time [15:56:26] RECOVERY - Apache HTTP on mw1152 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 3.794 second response time [15:58:35] (03PS2) 10Dzahn: admin yaml for palladium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135958 [16:00:19] (03CR) 10Rush: [C: 031] admin yaml for palladium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135958 (owner: 10Dzahn) [16:01:03] (03CR) 10Dzahn: [C: 032] admin yaml for palladium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135958 (owner: 10Dzahn) [16:01:37] (03CR) 10Hoo man: "re check" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135725 (owner: 10Hoo man) [16:03:17] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0] [16:03:27] (03CR) 10Rush: "so yaml parser seems to ignore 
dupes and just assume the last read is authoritative, I wasn't too worried as a sort will fix this and reor" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135725 (owner: 10Hoo man) [16:03:35] (03CR) 10Hoo man: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135725 (owner: 10Hoo man) [16:05:10] (03PS2) 10Dzahn: admin yaml for iodine [operations/puppet] - 10https://gerrit.wikimedia.org/r/135959 [16:06:38] (03CR) 10Hoo man: "Note: The jenkins failures were unrelated." [operations/puppet] - 10https://gerrit.wikimedia.org/r/135725 (owner: 10Hoo man) [16:07:45] (03CR) 10Dzahn: [C: 032] admin yaml for iodine [operations/puppet] - 10https://gerrit.wikimedia.org/r/135959 (owner: 10Dzahn) [16:11:11] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Just for the record, same opinion here. If all this is to be imported just for JSON::Path I will gladly package the dependency instead" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136095 (owner: 10Christopher Johnson (WMDE)) [16:15:26] so, what's going on? 
[16:15:57] (03PS1) 10Dzahn: admin yaml for potassium [operations/puppet] - 10https://gerrit.wikimedia.org/r/136115 [16:18:00] (03PS3) 10Ottomata: [WIP] Add CDH5 support, drop CDH4 support [operations/puppet/cdh4] (cdh5) - 10https://gerrit.wikimedia.org/r/135494 [16:18:37] we're still operating with three bits app servers [16:18:52] mw1151 hasn't been repooled [16:19:34] that's quite fragile obviously [16:19:50] paravoid: ^ [16:20:02] ori: it has comments in icinga that you disabled puppet and apache [16:20:12] on 5/17 [16:20:13] i emailed about it [16:20:37] it had/has a bad disk [16:20:55] the ticket about the bad disk is still open [16:21:04] so i guess it's not fixed and shouldn't be used [16:21:26] but in that case we should probably repurpose another app server to be a bits app server [16:21:51] <_joe_> ori: the problem was not only on bits, btw [16:22:10] <_joe_> ori: and I strongly suspect we would have been down even with 2x bits appservers [16:22:29] [09:53] < Reedy> I deployed the new version to the wikipedias accidentally [16:22:40] Three versions is too much for APC [16:23:03] heh [16:23:12] * bd808 has timestamps in UTC-6 [16:23:28] still, three bits app servers is too fragile imo [16:23:33] * bd808 nods [16:23:34] *four* is too fragile [16:23:35] <_joe_> bd808: I can confirm that I saw a some segfaults on one API server. 
[16:24:15] <_joe_> ori: we can repurpose one appserver for sure, but it won't have saved us in this case, IMO [16:24:33] <_joe_> so, we should do that if we have other problems, not because of this [16:25:14] (03PS2) 10Dzahn: admin yaml for potassium [operations/puppet] - 10https://gerrit.wikimedia.org/r/136115 [16:25:53] well, there is enough redundancy in the other app server clusters that there were at all times servers serving requests [16:25:59] not in bits [16:26:03] which is why users came reporting bits issues [16:26:27] so while it's true that the overload was affecting all apaches, it became critical for bits [16:26:33] <_joe_> bits is obviously the most hammmered part of the infrastructure [16:26:47] <_joe_> there is a multiplexing effect for bits in such a situation [16:27:05] <_joe_> we are usually serving almost everything out of varnish for bits, correct? [16:27:26] <_joe_> so when we have a mass cache invalidation for high traffic sites [16:27:47] yes, the bits app servers get hammered [16:27:48] <_joe_> bits will suffer the higher relative requests surge [16:27:55] hence " usual post-deply bits downtime" [16:28:03] users are quite familiar with it [16:28:09] <_joe_> so, I'd +1 the idea to merge the bits part into the main appserver pool :) [16:28:34] <_joe_> but I don't think that would be a solution for such an event [16:28:34] * bd808 inserts "crash all the things!" meme image [16:29:05] <_joe_> apc is the root of all evil anyway [16:29:13] well, it was a mistake deploy [16:29:31] but if bits was part of the general pool, users wouldn't have noticed [16:29:54] <_joe_> ori: users would have seen the whole site down :) [16:29:58] <^demon|away> ori: On the "bits as its own pool" thing. [16:30:06] <_joe_> I'm pretty sure of that [16:30:12] (03PS1) 10Rush: admin yaml analytics* [operations/puppet] - 10https://gerrit.wikimedia.org/r/136116 [16:30:13] <^demon|away> Another reason, if memory serves me, was that bits was the first thing to use varnish. 
[16:30:22] <^demon|away> At the time, the rest of the cluster was still on squid. [16:30:28] <^demon|away> So that was a 2nd reason for having its own cluster. [16:30:31] <^demon|away> (or 3rd?) [16:30:34] <^demon|away> :) [16:30:39] yes [16:30:41] <_joe_> ^demon|away: in theory, having things serving different payloads separated is a good idea [16:30:41] that's true [16:30:59] <_joe_> it is as long as you don't end up wasting too many resources [16:31:19] <^demon|away> Sure, makes sense. [16:31:20] it's fine for it to be separate, it's just too few servers atm [16:31:25] <^demon|away> I was just adding to the list of reasons it was before. [16:31:26] <^demon|away> :) [16:31:29] <_joe_> which may be the case here, but I'd like to have more experience before making a call :) [16:32:22] <_joe_> ori: did we have any bits-related outage due to "too few servers" that adding less than 10 servers would have solved in the last weeks? [16:32:46] <_joe_> ori: I think moving bits behind pybal/lvs is more important [16:32:57] <_joe_> than adding servers right now [16:33:06] <_joe_> I will ping chris about that disk replacement [16:33:08] <^d> Wait bits aren't lvs'd? [16:33:10] (03PS1) 10Rush: admin yaml tungsten [operations/puppet] - 10https://gerrit.wikimedia.org/r/136117 [16:33:16] nope [16:34:03] <_joe_> ^d: no, and that caused most probably the problem when mw1151 was not depooled [16:34:13] <_joe_> it's balanced directly by varnish [16:34:26] <^d> *nod* makes sense. 
[16:34:43] <_joe_> and that balancing is not that smart, I guess [16:35:24] (03CR) 10Dzahn: [C: 031] admin yaml tungsten [operations/puppet] - 10https://gerrit.wikimedia.org/r/136117 (owner: 10Rush) [16:35:26] it uses a consistent hashing scheme to map urls to servers iirc [16:36:03] (03CR) 10Ottomata: [C: 032] admin yaml analytics* [operations/puppet] - 10https://gerrit.wikimedia.org/r/136116 (owner: 10Rush) [16:36:51] (03PS1) 10Dzahn: run all maintenance crons as apache user [operations/puppet] - 10https://gerrit.wikimedia.org/r/136118 [16:37:12] (03PS2) 10Rush: admin yaml analytics* [operations/puppet] - 10https://gerrit.wikimedia.org/r/136116 [16:37:17] (03CR) 10Rush: [C: 032 V: 032] admin yaml analytics* [operations/puppet] - 10https://gerrit.wikimedia.org/r/136116 (owner: 10Rush) [16:37:20] <_joe_> ori: oh that's the reason maybe [16:37:57] (03PS2) 10Rush: admin yaml tungsten [operations/puppet] - 10https://gerrit.wikimedia.org/r/136117 [16:38:02] (03CR) 10Rush: [C: 032 V: 032] admin yaml tungsten [operations/puppet] - 10https://gerrit.wikimedia.org/r/136117 (owner: 10Rush) [16:40:10] what's the timing of the enwiki deploy today? 
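Note on the 16:35:26 remark that Varnish maps bits URLs to backends with a consistent hashing scheme: the following is a minimal sketch of the general technique only, not Varnish's actual director. The `HashRing` class, the replica count, the use of MD5, and the backend names in the usage note are all assumptions made for illustration.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; illustrative only, not Varnish's director."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (point, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server gets several points so load spreads more evenly.
        for i in range(self.replicas):
            bisect.insort(self._ring, (_point(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(p, s) for (p, s) in self._ring if s != server]

    def lookup(self, url):
        if not self._ring:
            raise LookupError("no servers in ring")
        # First ring point at or after the URL's point, wrapping around.
        idx = bisect.bisect_left(self._ring, (_point(url), ""))
        return self._ring[idx % len(self._ring)][1]
```

With a ring built from hypothetical names such as `HashRing(["backend-a", "backend-b", "backend-c", "backend-d"])`, calling `remove("backend-d")` only remaps the URLs that were on that backend. The flip side is that a URL stays pinned to its backend until that backend is actually taken out of the ring, which may be what _joe_ is alluding to regarding mw1151.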
[16:41:10] (03PS3) 10Dzahn: admin yaml for potassium [operations/puppet] - 10https://gerrit.wikimedia.org/r/136115 [16:41:28] cscott: It should happen in about 1.5h or so [16:41:57] (03PS1) 10Ori.livneh: Drop stub role::mediawiki::maintenance class [operations/puppet] - 10https://gerrit.wikimedia.org/r/136119 [16:42:07] (03CR) 10Dzahn: [C: 032] admin yaml for potassium [operations/puppet] - 10https://gerrit.wikimedia.org/r/136115 (owner: 10Dzahn) [16:43:15] (03PS2) 10Giuseppe Lavagetto: rcs: install servers with trusty [operations/puppet] - 10https://gerrit.wikimedia.org/r/136022 [16:46:05] _joe_: trailing ws in linux-host-entries.ttyS1-115200 [16:47:16] <_joe_> ori: I must have disabled whitespace-mode [16:51:04] (03PS3) 10Giuseppe Lavagetto: rcs: install servers with trusty [operations/puppet] - 10https://gerrit.wikimedia.org/r/136022 [16:55:45] (03PS1) 10Rush: admin yaml node /virt100[1-7].eqiad.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136121 [16:57:11] (03CR) 10Hoo man: "Please note that there are quite some files to chown... also some stuff is still logging to /home/mwdeploy/." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/136118 (owner: 10Dzahn) [17:00:32] (03PS1) 10Rush: admin yaml /^solr100[1-3]\.eqiad\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136126 [17:02:08] (03CR) 10Ori.livneh: [C: 031] rcs: install servers with trusty [operations/puppet] - 10https://gerrit.wikimedia.org/r/136022 (owner: 10Giuseppe Lavagetto) [17:02:17] let's do it :P [17:02:35] (03PS2) 10BBlack: Add optional RSS setup to interface RPS script [operations/puppet] - 10https://gerrit.wikimedia.org/r/135931 [17:02:52] (03CR) 10jenkins-bot: [V: 04-1] Add optional RSS setup to interface RPS script [operations/puppet] - 10https://gerrit.wikimedia.org/r/135931 (owner: 10BBlack) [17:03:57] (03PS1) 10Rush: admin yaml /^tmh100[1-2]\.eqiad\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136127 [17:04:41] (03PS2) 10Filippo Giunchedi: add mini-dinstall to releases.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/136100 [17:05:33] (03Abandoned) 10Filippo Giunchedi: add mini-dinstall to releases.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/136100 (owner: 10Filippo Giunchedi) [17:06:12] (03PS2) 10Rush: admin yaml node /virt100[1-7].eqiad.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136121 [17:06:21] (03CR) 10Rush: [C: 032 V: 032] "go" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136121 (owner: 10Rush) [17:07:34] (03PS3) 10BBlack: Add optional RSS setup to interface RPS script [operations/puppet] - 10https://gerrit.wikimedia.org/r/135931 [17:08:56] (03PS4) 10BBlack: Add optional RSS setup to interface RPS script [operations/puppet] - 10https://gerrit.wikimedia.org/r/135931 [17:10:05] (03PS1) 10Filippo Giunchedi: add mini-dinstall to releases.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/136128 [17:10:32] (03CR) 10CSteipp: "Since a lot of clients are now negotiating to AES-CBC (where they were negotiating rc4), they might be vulnerable to beast if 
they aren't " [operations/puppet] - 10https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: 10JanZerebecki) [17:13:30] (03PS1) 10Ori.livneh: dissolve mediawiki::pybal_check into mediawiki::users [operations/puppet] - 10https://gerrit.wikimedia.org/r/136129 [17:16:08] (03PS1) 10Rush: admin yaml vanadium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136130 [17:16:21] (03PS2) 10Rush: admin yaml /^solr100[1-3]\.eqiad\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136126 [17:16:27] (03CR) 10Rush: [C: 032 V: 032] admin yaml /^solr100[1-3]\.eqiad\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136126 (owner: 10Rush) [17:18:08] (03PS2) 10Rush: admin yaml /^tmh100[1-2]\.eqiad\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136127 [17:18:13] (03CR) 10Rush: [C: 032 V: 032] admin yaml /^tmh100[1-2]\.eqiad\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136127 (owner: 10Rush) [17:20:10] (03PS2) 10Rush: admin yaml vanadium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136130 [17:20:16] (03CR) 10Rush: [C: 032 V: 032] admin yaml vanadium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136130 (owner: 10Rush) [17:20:18] bd808: give me a heads up before enwiki deploy so i can obsessively hit refresh on the imagescaler ganglia [17:20:55] Reedy: ^ ping cscott before you do the deploy please [17:21:09] (03CR) 10Faidon Liambotis: [C: 032] dissolve mediawiki::pybal_check into mediawiki::users [operations/puppet] - 10https://gerrit.wikimedia.org/r/136129 (owner: 10Ori.livneh) [17:21:20] cscott: Don't DOS ganglia :) [17:23:41] (03PS1) 10Chad: All Wikipedias with 100k pages or less getting Cirrus as primary [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136132 [17:26:25] (03CR) 10Chad: "This means we'll be done everywhere except anwiki, arwiki, azwiki, be_x_oldwiki, bewiki, bgwiki, bnwiki, brwiki, bswiki, cebwiki, ckbwiki," 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136132 (owner: 10Chad) [17:27:08] so dissolute [17:27:14] (03PS1) 10Rush: admin yaml for pdf* and pc100[1-3] [operations/puppet] - 10https://gerrit.wikimedia.org/r/136135 [17:28:31] (03PS2) 10Rush: admin yaml for pdf* and pc100[1-3] [operations/puppet] - 10https://gerrit.wikimedia.org/r/136135 [17:28:38] (03CR) 10Rush: [C: 032 V: 032] "go" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136135 (owner: 10Rush) [17:33:16] (03CR) 10Ori.livneh: "the ridiculous path will be improved when rcstream becomes a proper python package rather than a single script.. this is just for now." [operations/puppet] - 10https://gerrit.wikimedia.org/r/132429 (owner: 10Ori.livneh) [17:33:24] (03PS1) 10Rush: admin yaml potassium & sodium [operations/puppet] - 10https://gerrit.wikimedia.org/r/136138 [17:34:17] (03CR) 10Ori.livneh: [C: 04-1] "should not be merged until sync-file / sync-dir issue is fixed" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135925 (owner: 10BryanDavis) [17:34:28] so it appears I don't have permission to push to phabricator/phabricator on gerrit? ?How would I go about initializing the repositories for phab since they are apparently empty [17:35:02] i can do that for you [17:35:24] can you produce a list of the gerrit repos and the repo they should be forked from? [17:35:51] ^d: can you give twentyafterfour rights to force push into phabricator/phabricator (or tell him how to do what he already has rights for)? [17:35:59] yes but it'd be nice to be able to administer them myself since I'm supposed to be doing this phabricator stuff [17:36:04] <^d> I shall do this! [17:36:10] <^d> At the time, I think he didn't have a gerrit acct yet. [17:36:18] ^d: cool thanks [17:36:28] it's phabricator/phabricator and phabricator/arcanist and phabricator/libphutil [17:36:38] everyone do this! [17:36:43] <^d> Added ori and bd808 for good measure too. [17:36:46] <^d> All done. 
[17:36:53] twentyafterfour: ^d is the man for this stuff. [17:36:59] Thanks ^d [17:37:02] <^d> yw [17:42:14] greg-g I am semi available the next ~2 hours in very unlikely event our stuff causes a problem during deploy [17:42:55] audephone: your nickname is reassuring ;) [17:43:36] * bd808 would prefer i-aude [17:43:37] wifi gods are not happy but can poke at code [17:43:47] heh [17:44:23] bd808: you would. [17:44:31] so next question: 'git@gerrit.wikimedia.org/r/phabricator/phabricator/' does not appear to be a git repository [17:44:41] ^d: ^ [17:44:57] is it not git@gerrit.wm.org? [17:45:05] twentyafterfour: it probably needs to be forked officially if that's what you mean [17:45:10] unsure if there is a default readme or something [17:45:27] probably the repo isn't actually there other than administratively [17:45:32] <^d> I'm going to push the master of all of this. [17:45:35] and it's your username@gerrit I believe [17:45:38] <^d> Should've done it myself. [17:45:43] unless I'm confused [17:45:55] chasemp: https://gerrit.wikimedia.org/r/#/c/136119/ is super trivial btw [17:46:15] if you +1 i can merge / apply [17:46:33] (03CR) 10Rush: [C: 031] "yup" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136119 (owner: 10Ori.livneh) [17:46:53] (03PS2) 10Rush: admin yaml potassium & sodium [operations/puppet] - 10https://gerrit.wikimedia.org/r/136138 [17:46:54] many thanks [17:46:55] (03PS2) 10Ori.livneh: Drop stub role::mediawiki::maintenance class [operations/puppet] - 10https://gerrit.wikimedia.org/r/136119 [17:47:00] (that's just a rebase) [17:47:01] (03CR) 10Rush: [C: 032 V: 032] admin yaml potassium & sodium [operations/puppet] - 10https://gerrit.wikimedia.org/r/136138 (owner: 10Rush) [17:47:02] twentyafterfour: The repo is there, it's just completely empty so the clone fails on checkout [17:47:17] `git clone ssh://20after4@gerrit.wikimedia.org:29418/phabricator/phabricator` [17:47:19] I'm not trying to clone I'm trying to push to master [17:47:20] (03PS3) 
10Ori.livneh: Drop stub role::mediawiki::maintenance class [operations/puppet] - 10https://gerrit.wikimedia.org/r/136119 [17:47:27] (03CR) 10Ori.livneh: [C: 032 V: 032] Drop stub role::mediawiki::maintenance class [operations/puppet] - 10https://gerrit.wikimedia.org/r/136119 (owner: 10Ori.livneh) [17:47:32] I cloned from github [17:47:44] * bd808 did this once but forgot how [17:47:46] twentyafterfour: try 20after4@gerrit.... [17:48:09] chasemp: i'm going to puppet-merge your admin change on puppetmaster since it's queued up with mine [17:48:12] is that cool? [17:48:17] oops thanks [17:48:18] neither 20after4 or twentyafterfour work either [17:48:19] There's something slightly sneaky you have to do for the first push [17:48:21] faster than I [17:48:30] done [17:48:41] <^d> That's the wrong path. [17:48:54] <^d> I'm almost done. [17:49:11] <^d> arcanist is done [17:49:22] <^d> libphutil and phabricator are pushing. [17:49:25] * twentyafterfour goes out for a smoke. no rush ^d [17:49:30] thanks though [17:49:56] * ^d twiddles thumbs while jgit does its shiz. [17:51:41] ^d: for posterity, what are you doing differently? [17:51:55] <^d> Using the correct repo path and port. [17:52:09] hah [17:52:19] <^d> @gerrit.wikimedia.org:29418/phabricator/libphutil, for example. [17:52:23] <^d> No /r/, that's web. [17:52:41] <^d> And unless you mess with your .ssh/config like I do, you'll need that stupid port. [17:52:45] [11:47] < bd808> `git clone ssh://20after4@gerrit.wikimedia.org:29418/phabricator/phabricator` [17:52:52] <^d> Oh, clone wouldn't work yet. [17:53:00] <^d> HEAD was pointing at non-existent refs/heads/master. [17:53:14] yeah. It actually clones just the checkout failed [17:53:14] <^d> You needed to clone from upstream, add the remote, then push history. [17:53:27] s/clones/cloned/ [17:53:46] <^d> Anyway, all 3 repos in gerrit with current master as of ~5m ago. [17:53:55] I have all three cloned now LGTM [17:54:18] I think git-review -s sets it right? 
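Note on ^d's explanation (17:51:55 to 17:52:41) that the SSH remote uses port 29418 and has no /r/ prefix, which belongs to the web URL: a small sketch of that mapping. The helper name `gerrit_ssh_url` is hypothetical; the host, port, and repository path are the ones quoted in the log.

```python
from urllib.parse import urlsplit


def gerrit_ssh_url(web_url: str, username: str, port: int = 29418) -> str:
    """Turn a Gerrit web clone URL into the SSH form described in the log."""
    parts = urlsplit(web_url)
    path = parts.path
    # The web UI serves repositories under /r/; the SSH endpoint does not.
    if path.startswith("/r/"):
        path = path[len("/r"):]
    return f"ssh://{username}@{parts.hostname}:{port}{path.rstrip('/')}"


print(gerrit_ssh_url(
    "https://gerrit.wikimedia.org/r/phabricator/phabricator", "20after4"
))
```

The result matches the URL bd808 pasted at 17:47:17. For the empty repository, that URL would be the remote to add to a clone of upstream before pushing history, as ^d describes at 17:53:14.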
[17:54:20] dunno [17:54:33] <^d> It creates a gerrit remote. [17:54:45] <^d> And downloads commit-msg hook. [17:54:59] <^d> We won't use git-review on this repo. [17:55:02] <^d> Anyway :) [17:55:25] (03PS1) 10Rush: admin yaml stragglers [operations/puppet] - 10https://gerrit.wikimedia.org/r/136146 [17:55:41] (03CR) 10Rush: [C: 032 V: 032] admin yaml stragglers [operations/puppet] - 10https://gerrit.wikimedia.org/r/136146 (owner: 10Rush) [17:56:13] hi, robh and i are onsite at ulsfo. [17:56:18] <^d> bd808: Oh while we're halfway on the subject. The fix for the phab/elastic 1.0 bug hit master yesterday. [17:56:42] Nice. I saw your patch [17:58:02] (03PS1) 10Rush: admin yaml osm-cp100[1-4]\.wikimedia\.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/136148 [17:58:17] (03CR) 10Rush: [C: 032 V: 032] admin yaml osm-cp100[1-4]\.wikimedia\.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/136148 (owner: 10Rush) [18:00:46] (03PS1) 10Rush: admin yaml /^mw10(0[1-9]|1[0-6])\.eqiad\.wmnet$/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136149 [18:01:06] (03CR) 10Rush: [C: 032 V: 032] admin yaml /^mw10(0[1-9]|1[0-6])\.eqiad\.wmnet$/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136149 (owner: 10Rush) [18:02:12] it's the port I didn't have right [18:02:21] er and the /r/ part [18:05:31] permission denied (publickey) [18:06:03] * twentyafterfour double-checks ssh keys in gerrit [18:06:11] (03PS1) 10Rush: admin yaml for mw* [operations/puppet] - 10https://gerrit.wikimedia.org/r/136150 [18:06:56] (03CR) 10Rush: [C: 032 V: 032] "ran ok on pilot batch, no reason to suspect issue but since this is such a large change I'm going to hold here for a bit." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/136150 (owner: 10Rush) [18:08:01] (03PS1) 10Ori.livneh: WIP Remove wikimedia-task-appserver from app servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/136151 [18:09:18] key_read: uudecode AAAAB3NzaC1yc2EAAAADAQABAAAAgQCF8...oVFf1CgQ== [18:09:19] failed [18:09:19] wtf [18:09:19] (03CR) 10Reedy: [C: 032] Wikipedias to 1.24wmf6 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136058 (owner: 10Reedy) [18:11:44] (03Merged) 10jenkins-bot: Wikipedias to 1.24wmf6 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136058 (owner: 10Reedy) [18:17:24] ok got it finally [18:18:24] Bits may flap again... [18:18:25] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf6 [18:18:30] Logged the message, Master [18:19:05] 191 Catchable fatal error: Argument 1 passed to ApiQueryBase::__construct() must be an instance of ApiQuery, instance of ApiMain given, called in /usr/local/apache [18:19:05] /common-local/php-1.24wmf6/includes/api/ApiModuleManager.php on line 107 and defined in /usr/local/apache/common-local/php-1.24wmf6/includes/api/ApiQueryBase.php on lin [18:19:05] e 43 [18:19:13] Imma revert [18:19:47] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: rv that [18:19:52] Logged the message, Master [18:19:53] bugger [18:20:00] * Reedy spins the blame wheel [18:20:06] bits started flapping, indeed [18:20:13] (03PS5) 10BBlack: Add optional RSS setup to interface RPS script [operations/puppet] - 10https://gerrit.wikimedia.org/r/135931 [18:20:28] It is fixed in Site Matrix that issue [18:20:40] same thing to do elsewhere seems [18:20:55] and we fixed in wikibase [18:21:49] What was the fix? 
[18:21:53] evil code somewhere with no phpunit tests [18:22:06] i see bits issue is known [18:22:10] I see it locally too [18:22:34] matanya: yeah [18:22:40] change how API Query is extended [18:22:44] should subside soon [18:23:05] second time today :) [18:23:06] Until we go back again :( [18:23:18] or we can revert core patch and resubmit when problem fixed [18:23:38] What's the fix? [18:23:46] How many extensions is there going to be to fix? [18:23:59] cool I've got a janky vagrant+puppet setup for phabricator [18:24:16] you have to look at git log to see fix [18:24:20] (03CR) 10BBlack: [C: 032 V: 032] Add optional RSS setup to interface RPS script [operations/puppet] - 10https://gerrit.wikimedia.org/r/135931 (owner: 10BBlack) [18:24:34] no idea what is affected now [18:25:02] (03PS2) 10Ori.livneh: Remove wikimedia-task-appserver from app servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/136151 [18:25:13] Why's this breaking wmf6? [18:25:15] (now) [18:25:29] Core changed [18:25:49] twentyafterfour: nice! [18:26:27] (03PS2) 10BBlack: enable RSS for LVS servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/135932 [18:26:27] We backported wikibase patch [18:26:35] greg-g: now to hook up mediawiki oauth in vagrant to talk to phabricator in vagrant. :) [18:26:44] that'll be fun :) [18:26:56] matanya: looks like bits are almost back to normal [18:29:53] no idea what is wrong with bits [18:30:10] Normal version changeover [18:30:14] (03CR) 10BBlack: [C: 032 V: 032] enable RSS for LVS servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/135932 (owner: 10BBlack) [18:30:17] PROBLEM - Disk space on analytics1012 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/i 73341 MB (3% inode=99%): [18:31:19] critical! [18:31:20] :) [18:31:47] 73gig free is critical? 
wow [18:31:59] everyone's a critic [18:32:06] I've still no idea what the hell apparently changed [18:32:06] https://github.com/wikimedia/mediawiki-core/commits/master/includes/api [18:32:27] (03PS1) 10Jean-Frédéric: Add French Ministry for Culture to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136154 (https://bugzilla.wikimedia.org/65905) [18:32:36] Adding type hints in apiquery [18:33:04] Wow [18:33:10] Likely the cause [18:33:22] i know right [18:33:27] its a hardcoded percentage :/ [18:33:39] we can revert that and then look at fix and resubmit [18:33:42] Reedy: the only breaking change as identified by anomie was https://gerrit.wikimedia.org/r/#/c/135283/ [18:34:01] cant look but probably that [18:34:11] Allow filtering log entries by namespace (API) [18:34:15] Add parameter lenamespace to the API, allowing filtering of log entries by [18:34:18] namespace. [18:34:30] * anomie looks for context [18:34:38] 14:19 < Reedy> 191 Catchable fatal error: Argument 1 passed to ApiQueryBase::__construct() must be an instance of ApiQuery, instance of ApiMain given, called in /usr/local/apache [18:34:42] 14:19 < Reedy> /common-local/php-1.24wmf6/includes/api/ApiModuleManager.php on line 107 and defined in /usr/local/apache/common-local/php-1.24wmf6/includes/api/ApiQueryBase.php on lin [18:34:46] 14:19 < Reedy> e 4 [18:34:46] https://gerrit.wikimedia.org/r/#/c/120827/ [18:35:09] Not that greg [18:35:15] yeah, what Reedy linked [18:35:34] I just made some revert commits [18:35:45] greg-g: Ah. Is there another instance of that that we didn't catch on Beta? I specifically merged that patch just after wmf6 so we'd have maximum time for testing. [18:36:00] What reedy says [18:36:01] Reedy: see any others? 
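Note on the 18:30:17 disk alert and the "hardcoded percentage" comment at 18:33:27: a sketch of why roughly 73 GB free can still be CRITICAL when the check compares a percentage rather than an absolute size. The function name, the thresholds, and the total size used below are assumptions; the log only gives 73341 MB free at 3%.

```python
def disk_state(free_mb: float, total_mb: float,
               warn_pct: float = 10.0, crit_pct: float = 5.0) -> str:
    """Classify free space by percentage; thresholds here are hypothetical."""
    free_pct = 100.0 * free_mb / total_mb
    if free_pct < crit_pct:
        return "CRITICAL"
    if free_pct < warn_pct:
        return "WARNING"
    return "OK"


# Hypothetical 2.4 TB volume, chosen so that 73341 MB is about 3% free.
print(disk_state(73341, 2_400_000))
```

On a large volume a fixed percentage threshold trips while tens of gigabytes remain, which is the situation the channel is joking about.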
[18:36:02] Not enough tests [18:36:03] I'll refrain from reverting from master for yet [18:36:08] kk [18:36:12] greg-g: When it spams out the log, it's hard to tell :( [18:36:23] true [18:36:49] (PS3) Ori.livneh: Remove wikimedia-task-appserver from app servers [operations/puppet] - https://gerrit.wikimedia.org/r/136151 [18:37:49] anomie: but yeah, way to do the smart thing and merge right after wmf6 [18:37:59] greg-g: oh, when this is over: just a nice and friendly hint for the volunteer ldap blocker ... [18:38:00] * greg-g wants more anomies [18:38:17] matanya: dragons. there be dragons. [18:38:17] We fixed wikibase and I know Site Matrix was fixed [18:38:49] greg-g: I see the bug is in EducationProgram. Working on a fix. WhyTF is it subclassing ApiQueryBase for a module added to $wgAPIModules? [18:39:09] :) [18:39:10] I Looked at bunch of other extensions and they looked ok [18:39:19] anomie: Nice methods? [18:39:24] Didnt look at all [18:39:27] Reedy: Doesn't use any of them [18:39:31] ... [18:39:49] grrrr [18:40:41] I think all the errors on enwiki were EP [18:40:50] greg-g: with deep appreciation : https://commons.wikimedia.org/wiki/Dragons#mediaviewer/File:DisegnoDrago.jpg [18:41:04] I looked up to central notice I think [18:41:39] didnt get to education extension [18:42:58] but makes sense why this issue didnt appear on group 1 wikis [18:44:58] !log reedy synchronized php-1.24wmf6/includes/api/ [18:45:01] Logged the message, Master [18:45:16] greg-g, Reedy: https://gerrit.wikimedia.org/r/#/c/136159/ should do it, unless I'm missing some way that ApiListStudents was actually needing ApiQueryBase. [18:45:40] cscott: enwiki time... [18:45:52] woot! [18:46:10] i checked ganglia and imagescaler load is very low. let's hope it stays that way.
[18:46:22] cscott: hang on [18:47:12] (CR) Odder: [C: 1] Add French Ministry for Culture to wgCopyUploadsDomains [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136154 (https://bugzilla.wikimedia.org/65905) (owner: Jean-Frédéric) [18:47:25] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf6 take 3 [18:47:29] Logged the message, Master [18:48:45] cscott: ok, is now [18:48:50] here goes bits again :) [18:49:23] !log removing mw1151 from pybal and dsh groups to replace disk and reinstall [18:49:25] Reedy, greg-g: I ran fatal.log through a quick Perl script, the only instances of this error logged include 'liststudents' in the backtrace. [18:49:28] Logged the message, Master [18:49:44] anomie: thanks much [18:49:53] oh 1151 [18:50:28] greg-g https://rt.wikimedia.org/Ticket/Display.html?id=7521 [18:50:59] also mw1163 is coming back..do you want me to add back to dsh and pybal...or ping you [18:51:07] cmjohnson1: yeah, I was just looking at the bits cluster, where 1151 is, and felt sad about it being gone ;) [18:51:20] cmjohnson1: how soon for 1163? [18:51:23] i will have it fixed within the hour [18:51:31] Reedy: you be around for that? [18:51:40] in an hour, to make sure 1161 rejoins correctly? [18:52:08] puppet still running....got this error http://pastebin.com/i7XHWtfs [18:52:12] er, 1163 [18:52:39] bd808: ^^ [18:52:43] bd808: the pastebin [18:53:05] * bd808 looks [18:53:22] cmjohnson1: That was on mw1163? [18:53:26] yes [18:53:41] during initial puppet run [18:54:25] might simply need to rerun puppet [18:54:31] or run it manually [18:54:43] it == /usr/local/sbin/grain-ensure contains deployment_target scap [18:54:58] and figure out how to not have to in the future :) [18:55:06] machines rejoining should "just work" [18:55:17] That's the bit that tells salt that the host should be registered for a given grain [18:55:29] no apparent effect of enwiki deploy on imagescalers.
which is what we expected/hoped, but it's nice to see that mr murphy appears to be asleep. [18:56:07] (PS1) Cmjohnson: removing mw1151 from dsh groups to replace hard drive and reinstall [operations/puppet] - https://gerrit.wikimedia.org/r/136160 [18:56:18] cscott: :) [18:56:38] (bits are starting to slowly return to normal) [18:56:49] greg-g: Assuming my grep-fu was good and no one screwed stuff up between wmf6 and wmf7, there don't seem to be any other WMF-deployed extensions with this problem. I extracted the class names from all assignments to wgAPIModules and then checked what they extend (recursively, in a few cases). [18:56:54] cmjohnson1: It looks like the error was the unless check to salt if it already knows about mw1163 being in the grain "scap" [18:57:09] anomie: rock on. [18:57:35] anomie: might it be worth a catch-up deploy after bits is done recovering? [18:57:49] and after EP is done patching, that is. [18:58:14] (CR) Reedy: [C: 2] group0 to 1.24wmf7 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136059 (owner: Reedy) [18:58:16] RECOVERY - Disk space on analytics1012 is OK: DISK OK [18:58:35] greg-g: "catch-up deploy" meaning? [18:58:58] bd808 i am also getting this error: Could not start Service[twemproxy] [18:59:02] (Merged) jenkins-bot: group0 to 1.24wmf7 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136059 (owner: Reedy) [18:59:40] ori: ^^ Could not start Service[twemproxy] on mw1163? [18:59:46] anomie: undo'ing the revert and patching EP [19:00:13] in wmf7 [19:00:17] 6/7 [19:00:35] Is EP on test2? [19:00:43] or test wikipedia [19:00:49] greg-g: I wouldn't mind that, rather than waiting another few weeks to find out if anything else is broken once wmf8 rolls around.
[19:00:50] should be I suppose :) [19:00:56] anomie: yeah [19:01:19] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf7 [19:01:24] Logged the message, Master [19:01:29] well, wmf7 is broken [19:01:37] cmjohnson1: I think mw1163.eqiad.wmnet is sad in multiple ways. It won't let me ssh in which means scap won't run against it. [19:01:48] Needs backport for this [19:01:54] Reedy: can you do that (where "that" == when https://gerrit.wikimedia.org/r/#/c/136159/ is merged, undo the API revert and push our the updated EP at the same time, for wmf6 and 7) [19:02:06] s/our/out/ [19:02:27] PROBLEM - twemproxy port on mw1163 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 65534 (nobody), command name nutcracker [19:02:37] PROBLEM - twemproxy process on mw1163 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 65534 (nobody), command name nutcracker [19:02:37] PROBLEM - Apache HTTP on mw1163 is CRITICAL: Connection refused [19:02:58] bd808: you should be able to get in now [19:03:29] 1 Catchable fatal error: Argument 1 passed to ApiQueryBase::__construct() must be an instance of ApiQuery, instance of ApiMain given, called in /usr/local/apache [19:03:29] /common-local/php-1.24wmf7/includes/api/ApiModuleManager.php on line 107 and defined in /usr/local/apache/common-local/php-1.24wmf7/includes/api/ApiQueryBase.php on lin [19:03:29] e 43 [19:03:29] cmjohnson1: Permission denied (publickey). [19:03:34] Think we might still have one [19:03:41] greg-g, audephone, Reedy: Oh good, test2wiki has EducationProgram (but not testwiki or mediawikiwiki) and Reedy doesn't seem to have reverted the change in wmf7. So we could push the EP change to wmf7 and see if it's fixed on test2. [19:03:57] Backport the ep patch [19:04:05] see that it fixes it [19:04:27] (CR) Cmjohnson: [C: 2] "Will revert this once the disk has been replaced."
[operations/puppet] - https://gerrit.wikimedia.org/r/136160 (owner: Cmjohnson) [19:04:36] anomie: can I leave you in charge of that right now, work it out with Reedy. I have to go afk for a bit or I'll be worthless the rest of the day) [19:04:39] -) [19:04:51] greg-g: ok [19:04:56] ty [19:06:36] bd808: odd, I can ssh into mw1163 [19:06:59] cmjohnson1: As a mortal? [19:07:15] oh [19:08:28] Reedy: Ping me when we can look at this API thing, please. [19:08:28] * bd808 wishes the only difference between mortals and roots was a sudoers.d file [19:09:36] !log powering down mw1151 for disk replacement [19:09:40] Logged the message, Master [19:10:46] ok, still no imagescaler action. i guess i can stop hitting refresh. [19:11:58] PROBLEM - Host mw1151 is DOWN: PING CRITICAL - Packet loss = 100% [19:15:10] (PS1) Dzahn: convert logstash root users to admin yaml [operations/puppet] - https://gerrit.wikimedia.org/r/136163 [19:15:19] cscott, that parser cache is probably working ;) [19:15:42] gwicke: yup, even murphy respects the parser cache ;) [19:17:07] (CR) Rush: [C: 1] convert logstash root users to admin yaml [operations/puppet] - https://gerrit.wikimedia.org/r/136163 (owner: Dzahn) [19:17:07] RECOVERY - Host mw1151 is UP: PING WARNING - Packet loss = 28%, RTA = 2.16 ms [19:17:55] (CR) Dzahn: [C: 2] convert logstash root users to admin yaml [operations/puppet] - https://gerrit.wikimedia.org/r/136163 (owner: Dzahn) [19:18:33] Hi JavaScript! Where've you gone?
:-) [19:19:17] PROBLEM - puppet disabled on mw1151 is CRITICAL: Connection refused by host [19:19:18] PROBLEM - Disk space on mw1151 is CRITICAL: Connection refused by host [19:19:18] PROBLEM - twemproxy port on mw1151 is CRITICAL: Connection refused by host [19:19:18] PROBLEM - check configured eth on mw1151 is CRITICAL: Connection refused by host [19:19:18] PROBLEM - twemproxy process on mw1151 is CRITICAL: Connection refused by host [19:19:18] PROBLEM - SSH on mw1151 is CRITICAL: Connection refused [19:19:37] PROBLEM - RAID on mw1151 is CRITICAL: Connection refused by host [19:19:47] PROBLEM - check if dhclient is running on mw1151 is CRITICAL: Connection refused by host [19:20:07] PROBLEM - DPKG on mw1151 is CRITICAL: Connection refused by host [19:20:46] Is someone already on mw1151? [19:20:58] (PS1) Rush: admin yaml ssl* [operations/puppet] - https://gerrit.wikimedia.org/r/136164 [19:21:15] Coren: yeah, that's the bits server that has a bad disk being replaced by cmjohnson1 [19:21:49] Ah, I'd have expected planned maintenance to have been marked in icinga. *hint, hint* [19:22:07] :-) [19:22:33] * Coren politely disconnects from the management interface. [19:22:58] bblack, around? dr0ptp4kt & i are thinking of varnish variance issue with HTTP->HTTPS redirect - https://gerrit.wikimedia.org/r/#/c/133029/ [19:23:18] coren: :-) *received [19:23:26] (CR) Dzahn: [C: 1] admin yaml ssl* [operations/puppet] - https://gerrit.wikimedia.org/r/136164 (owner: Rush) [19:25:37] anomie: I guess it can go whenever... [19:26:26] (CR) Rush: [C: 2 V: 2] admin yaml ssl* [operations/puppet] - https://gerrit.wikimedia.org/r/136164 (owner: Rush) [19:26:30] Reedy: So, first thing I'd say would be to backport https://gerrit.wikimedia.org/r/#/c/136159 to wmf7 and undo your revert (which doesn't seem to have been deployed?), and then we'll see if test2wiki is fixed.
[19:27:35] bd808: for mw1163: /proc/self/fd/9: 3: .: Can't open /usr/local/apache/common-local/multiversion/MWRealm.sh [19:27:51] it needs a sync-common [19:28:01] anomie: if Reedy doesn't respond soon enough, feel free to jfdi [19:28:15] * greg-g is back with two coffees in hand [19:28:36] bd808: but to have a sync-common it needs scap, so doing a git-deploy [19:29:00] Which is the bit that failed (joining the salt grain) [19:29:13] greg-g: I'm happy to jfdi, I just want to make sure I don't step on Reedy's toes if he's also working on it or on other stuff [19:29:15] anomie: I deployed to wmf6, not wmf7 [19:29:31] Reedy: agreed :) [19:29:56] Reedy: You merged https://gerrit.wikimedia.org/r/#/c/136157/ but didn't deploy it, is what I was referring to. [19:29:57] er, anomie [19:30:04] bah, coffee, work faster [19:30:15] ori: I still can't ssh into mw1163.eqiad.wmnet "Permission denied (publickey)." so scap will probably fail as well. sync-common should work though. [19:30:22] it succeeded now [19:30:30] no, needs scap for sync-common [19:31:27] PROBLEM - NTP on mw1151 is CRITICAL: NTP CRITICAL: No response from NTP server [19:31:30] (PS1) Dzahn: admin yaml for rdb (redis) boxes [operations/puppet] - https://gerrit.wikimedia.org/r/136166 [19:32:10] (CR) Rush: [C: 1] admin yaml for rdb (redis) boxes [operations/puppet] - https://gerrit.wikimedia.org/r/136166 (owner: Dzahn) [19:32:49] ori: yes, sync-common comes from installing scap. I meant that running scap from tin will fail as long as mortals can't ssh in, but running sync-common locally should work. [19:33:38] bd808: http://p.defau.lt/?tleQY0oanf_xHrY6_N8z8w [19:33:49] (PS2) Dzahn: admin yaml for rdb (redis) boxes [operations/puppet] - https://gerrit.wikimedia.org/r/136166 [19:34:10] (CR) Dzahn: [C: 2] admin yaml for rdb (redis) boxes [operations/puppet] - https://gerrit.wikimedia.org/r/136166 (owner: Dzahn) [19:34:40] ori: yuck. That looks like more puppet problems.
[19:34:47] bd808: i'll mkdir it, but needs to be puppetized [19:35:15] oh, odd [19:35:17] root@mw1163:/srv# mkdir /usr/local/bin/sync-common [19:35:17] mkdir: cannot create directory `/usr/local/bin/sync-common': File exists [19:35:25] we should throw a chaos-monkey-like at our infra sometime [19:36:08] ori: `mkdir /usr/local/apache/common-local` [19:37:32] ori: Apparently I make that directory in beta via beta::common. Not sure where it is supposed to come from in prod. [19:38:44] Reedy: If you're busy or just wanting to go eat or whatever, I'll be happy to take care of the EducationProgram thing. [19:38:51] (PS4) Dzahn: Remove duplicate users in admin class's data.yaml [operations/puppet] - https://gerrit.wikimedia.org/r/135725 (owner: Hoo man) [19:38:53] !log mw1163: mkdir -p /usr/local/apache/common-local && chown mwdeploy:mwdeploy /usr/local/apache/common-local [19:38:57] Logged the message, Master [19:40:48] <^d> !log hewiki elastic index was missing geodata mappings. re-map + in place reindex failed spectacularly. rebuilding from scratch now. [19:40:51] (CR) Dzahn: [C: 2] Remove duplicate users in admin class's data.yaml [operations/puppet] - https://gerrit.wikimedia.org/r/135725 (owner: Hoo man) [19:40:53] Logged the message, Master [19:43:37] !change 118036 | Coren [19:44:03] mutante: Hmm? [19:44:10] there used to be a bot for that:) [19:44:17] it would then nicely ask you for a review [19:44:32] ori: More puppet mysteries, the snapshot::sync class looks like it assumes that Exec['mw-sync'] (which just runs sync-common) creates /usr/local/apache/common-local [19:44:32] Ah! Found it; I'm taking a look now.
[19:44:46] Coren: https://gerrit.wikimedia.org/r/#/c/118036/ not important, just a random older one [19:45:14] but it's needed by "allow access for admins from bastion" [19:45:33] (PS2) coren: Tools: Rename references to local-admin to tools.admin [operations/puppet] - https://gerrit.wikimedia.org/r/118036 (owner: Tim Landscheidt) [19:46:37] ... rebasing it made it null. For some reason. [19:46:38] (CR) Dzahn: "eh in PS2 the changes disappeared? duplicate?" [operations/puppet] - https://gerrit.wikimedia.org/r/118036 (owner: Tim Landscheidt) [19:46:46] Coren: hah, yea, just saw that [19:46:53] actually duplicate? [19:47:05] Yeah, I'm thinking that's because that was made part of another changeset that already did it. Gimme a sec to git blame this. [19:47:13] (PS4) Ori.livneh: Remove wikimedia-task-appserver from app servers [operations/puppet] - https://gerrit.wikimedia.org/r/136151 [19:48:12] True duplicate. I gotta merge it since another one depends on it. [19:48:18] RECOVERY - SSH on mw1151 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [19:48:44] (CR) coren: [C: 2] "Made null because duplicate; merging to satisfy a dependency." [operations/puppet] - https://gerrit.wikimedia.org/r/118036 (owner: Tim Landscheidt) [19:48:54] Coren: :) [19:48:58] (PS2) coren: Tools: Allow access for administrators from bastions [operations/puppet] - https://gerrit.wikimedia.org/r/118039 (owner: Tim Landscheidt) [19:50:37] PROBLEM - Apache HTTP on mw1163 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki exception - 675 bytes in 0.056 second response time [19:51:00] (CR) coren: [C: 2] "Yeah, reasonable enough given the criteria for group membership."
[operations/puppet] - https://gerrit.wikimedia.org/r/118039 (owner: Tim Landscheidt) [19:51:27] RECOVERY - twemproxy port on mw1163 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:51:37] RECOVERY - twemproxy process on mw1163 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:51:37] RECOVERY - Apache HTTP on mw1163 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.106 second response time [19:53:08] * marktraceur notes that he'll be deploying MMV to the wikisources very soon [19:53:25] I will be back in a minute, am travelling upstairs. [19:55:53] marktraceur: k, one second. [19:56:02] Reedy: anomie how's things re ApiQuery? [19:56:18] greg-g: He handed it off to me a few minutes ago. Working on it now [19:56:45] kk, thanks [19:58:20] (PS1) Rush: datasets systemuser under dataset role [operations/puppet] - https://gerrit.wikimedia.org/r/136220 [20:00:05] marktraceur: The time is nigh to deploy Media Viewer (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140529T2000) [20:00:14] LO THE TIME APPROACHETH [20:00:53] I did not know you could do 'git checkout -' before this week [20:01:01] dear god how did I not know that [20:01:26] marktraceur: one second please, if you don't mind.
[20:01:38] chasemp: That's a handy command [20:01:45] greg-g: I know, I'm not going yet [20:01:50] marktraceur: :) thanks sir [20:01:51] (return to prior branch) [20:01:54] I have to get things sorted out anyway [20:02:05] * greg-g nods [20:02:05] (PS3) MarkTraceur: Launch Media Viewer for all users on all Wikisources [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134810 (owner: Gilles) [20:02:46] * bd808 notices that jouncebot didn't /notice [20:02:51] (CR) Dzahn: datasets systemuser under dataset role (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/136220 (owner: Rush) [20:02:55] * marktraceur grumbles [20:05:14] (PS2) Rush: datasets systemuser under dataset role [operations/puppet] - https://gerrit.wikimedia.org/r/136220 [20:05:24] I'm undecided if it should /notice or not [20:05:25] (PS1) MarkTraceur: Enable Media Viewer on wikisources [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136221 [20:05:36] (CR) Dzahn: "generic system users having ssh keys is new, but makes sense to me in this case, it needs to rsync things, and somehow this needs to go ou" [operations/puppet] - https://gerrit.wikimedia.org/r/136220 (owner: Rush) [20:05:39] !log anomie synchronized php-1.24wmf7/extensions/EducationProgram/includes/api/ApiListStudents.php 'Backport fix for [[bugzilla:65906]]' [20:05:47] Logged the message, Master [20:05:48] (CR) Dzahn: [C: 1] datasets systemuser under dataset role [operations/puppet] - https://gerrit.wikimedia.org/r/136220 (owner: Rush) [20:05:58] * marktraceur grumbles loudly [20:06:26] (CR) Rush: [C: 2] datasets systemuser under dataset role [operations/puppet] - https://gerrit.wikimedia.org/r/136220 (owner: Rush) [20:06:45] marktraceur: what?!
[20:06:48] (CR) Rush: [V: 2] datasets systemuser under dataset role [operations/puppet] - https://gerrit.wikimedia.org/r/136220 (owner: Rush) [20:07:09] greg-g: The fact I had to submit a new patch because the other one was -2'd, sigh [20:07:45] * mutante noticed [20:09:35] (PS1) Rush: typo for ssh_key in systemuser.pp [operations/puppet] - https://gerrit.wikimedia.org/r/136226 [20:09:51] (CR) Rush: [C: 2 V: 2] typo for ssh_key in systemuser.pp [operations/puppet] - https://gerrit.wikimedia.org/r/136226 (owner: Rush) [20:10:13] Anyway I have the patch all sorted, just need the go signal. [20:10:28] anomie: whenever you're confident, let me know [20:10:45] (PS1) Ori.livneh: Add 'deployment' service alias [operations/dns] - https://gerrit.wikimedia.org/r/136227 [20:11:11] marktraceur, greg-g: I'm backporting the fix to wmf6 now (waiting on Jenkins to merge). Then as soon as I make sure enwiki isn't somehow broken again we should be good. [20:11:21] coolio [20:13:15] marktraceur: If you're not doing any changes in 1.24wmf6, I don't see any reason you couldn't do yours in parallel. [20:13:25] It's a config change [20:13:35] I will literally take five minutes, so no reason to rush. [20:14:11] it's just enabling MMV on all wikisources, not terribly crazy [20:14:25] Ok, whichever you'd like. I should only be as long as it takes Jenkins to finish merging and then a sync-file and a sync-dir [20:14:36] (PS1) Rush: bad var interpolation from ssh_key resource systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/136228 [20:14:51] let's keep 'em separate for sanity's sake [20:14:58] "gotta keep 'em separated" [20:15:11] damn you greg-g now that's in my head for the day [20:15:19] chasemp: you're welcome.
[20:16:31] (PS1) Chad: Flow-ify mw:Talk:Search [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136229 [20:18:11] greg-g: If it's just a config change it won't be takin' too much ti-i-i-i-me [20:18:33] another good ear worm [20:18:44] It's...the same one? [20:18:51] (CR) Rush: [C: 2] "try again" [operations/puppet] - https://gerrit.wikimedia.org/r/136228 (owner: Rush) [20:19:10] oh, I thought you were going for "Tiiiiiiiiiime, it's on my side, yes it is." [20:19:18] Naw [20:19:19] !log anomie synchronized php-1.24wmf6/extensions/EducationProgram/includes/api/ApiListStudents.php 'Backport fix for [[bugzilla:65906]]' [20:19:23] Logged the message, Master [20:19:26] "If you're under eighteen, you won't be doing any time" [20:20:12] Sigh [20:20:19] Of course there's a non-obvious way to delete reviews [20:20:46] (Abandoned) MarkTraceur: Enable Media Viewer on wikisources [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136221 (owner: MarkTraceur) [20:24:18] chasemp: that's wrong [20:24:22] you're missing a $ [20:24:33] just figuring that out :) [20:24:37] thank you [20:24:44] my brain looked but did not see [20:25:30] !log anomie synchronized php-1.24wmf6/includes/api 'Revert revert of [[gerrit:120827]], underlying bug should be fixed now' [20:25:35] Logged the message, Master [20:25:37] marktraceur, greg-g: I'm done, go ahead [20:25:43] (CR) Faidon Liambotis: [C: 2] Remove wikimedia-task-appserver from app servers [operations/puppet] - https://gerrit.wikimedia.org/r/136151 (owner: Ori.livneh) [20:25:52] Cool [20:25:52] \o/ \o/ \o/ [20:25:57] (CR) MarkTraceur: [C: 2] Launch Media Viewer for all users on all Wikisources [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134810 (owner: Gilles) [20:25:57] finally [20:26:01] america, fuck yeah [20:26:07] hehe [20:26:09] (Merged) jenkins-bot: Launch Media Viewer for all users on all Wikisources [operations/mediawiki-config] -
https://gerrit.wikimedia.org/r/134810 (owner: Gilles) [20:26:52] thanks anomie for taking care of that [20:26:54] funny thing is that passed the linter [20:27:08] greg-g: No problem at all [20:27:11] the linter doesn't catch 90% of erros [20:27:14] errors :) [20:27:19] Perfect [20:27:56] (CR) Faidon Liambotis: "What kind of settings would you unify? SSH host verification would fail on the CNAME, so using it for ssh kind of sucks. Where do you inte" [operations/dns] - https://gerrit.wikimedia.org/r/136227 (owner: Ori.livneh) [20:27:56] git merge is slow on tin today [20:28:01] !log marktraceur updated /a/common to {{Gerrit|I95348e0d4}}: Launch Media Viewer for all users on all Wikisources [20:28:06] Logged the message, Master [20:29:18] RECOVERY - Apache HTTP on mw1151 is OK: HTTP OK: HTTP/1.1 200 OK - 454 bytes in 0.002 second response time [20:29:47] !log marktraceur synchronized wmf-config/InitialiseSettings.php 'Enable Media Viewer on all wikisources by default' [20:29:52] Logged the message, Master [20:30:07] PROBLEM - DPKG on mw1168 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:30:10] Testing [20:30:18] But I think I've managed under five minutes [20:30:49] :) [20:30:51] (PS1) Rush: fixing generic::systemuser typos ..again [operations/puppet] - https://gerrit.wikimedia.org/r/136230 [20:30:53] (CR) Ori.livneh: "for scap puppetization, to get rid of some nasty if labs / elsif production branching. I didn't think about the CNAME / SSH issue, that's " [operations/dns] - https://gerrit.wikimedia.org/r/136227 (owner: Ori.livneh) [20:31:01] getting pages [20:31:05] (CR) Rush: [C: 2 V: 2] fixing generic::systemuser typos ..again [operations/puppet] - https://gerrit.wikimedia.org/r/136230 (owner: Rush) [20:31:07] RECOVERY - DPKG on mw1168 is OK: All packages OK [20:31:16] for? [20:31:17] 1168 is a new one to me [20:31:22] cmjohnson1, we're getting raw 404's intermittently on meta.
[20:31:24] https://meta.wikimedia.org/wiki/Schema_talk:GuidedTourGuiderImpression [20:31:24] Main_Page [20:31:26] mw1168 is newly provisioned no? [20:31:31] no, 1163 was [20:31:31] wikisources now has MMV [20:31:35] 404s on Wikidata too. [20:31:36] no 1163 [20:31:36] 1151 is also out [20:31:37] {{done}} thanks greg-g [20:31:42] marktraceur: thank you [20:31:45] puppet running 1151 now [20:31:55] 1151 is just bits server [20:32:15] 1163 is depooled [20:32:17] PROBLEM - DPKG on mw1212 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:32:17] PROBLEM - DPKG on mw1216 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:32:19] superm401: what server is giving you the 404? [20:32:31] paravoid: yeah, hence xx68 being a new one to me [20:32:36] what? [20:32:39] dpkg issues? [20:32:54] greg-g, how do I check? [20:33:07] (getting 404s everywhere) [20:33:08] I'm getting 404s as well [20:33:17] RECOVERY - DPKG on mw1212 is OK: All packages OK [20:33:17] RECOVERY - DPKG on mw1216 is OK: All packages OK [20:33:29] I still have it open, but it doesn't look like the server is exposed on the DOM. [20:33:29] superm401: I guess you can't, i thought it have the