[00:01:57] !log ori synchronized wmf-config/CommonSettings.php 'Update wgFlowCacheVersion to 4.2'
[00:02:03] Logged the message, Master
[00:03:15] ^ ebernhardson
[00:19:49] (PS1) Dzahn: just include ldap::role::client::labs in role [operations/puppet] - https://gerrit.wikimedia.org/r/131638
[00:22:04] (PS2) Dzahn: just include ldap::role::client::labs in role [operations/puppet] - https://gerrit.wikimedia.org/r/131638
[00:25:15] (CR) Dzahn: [C: 2] just include ldap::role::client::labs in role [operations/puppet] - https://gerrit.wikimedia.org/r/131638 (owner: Dzahn)
[00:25:58] (CR) Tim Starling: "I would prefer to see the domain name be used as the stream name, instead of the wiki ID. The wiki ID is difficult to change, but the doma" [operations/puppet] - https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: Ori.livneh)
[00:27:39] (CR) Dzahn: "there was still a problem with this, but I4839d346028 fixed it" [operations/puppet] - https://gerrit.wikimedia.org/r/131597 (owner: Dzahn)
[00:28:31] (CR) Dzahn: "fixed with I70d4418fbb and I70d4418fbb" [operations/puppet] - https://gerrit.wikimedia.org/r/131123 (owner: Ori.livneh)
[00:29:03] (CR) Dzahn: "err.. Change-Id: I4839d3460" [operations/puppet] - https://gerrit.wikimedia.org/r/131123 (owner: Ori.livneh)
[00:41:09] RECOVERY - Puppet freshness on osmium is OK: puppet ran at Tue May 6 00:41:04 UTC 2014
[01:09:39] PROBLEM - MySQL Processlist on db1059 is CRITICAL: CRIT 116 unauthenticated, 0 locked, 0 copy to table, 1 statistics
[01:13:39] RECOVERY - MySQL Processlist on db1059 is OK: OK 2 unauthenticated, 0 locked, 0 copy to table, 1 statistics
[01:17:04] (PS1) Chad: Raise redundancy back up for commonswiki_file as well [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131647
[01:18:04] (CR) Chad: "My napkin math says we should be able to do this if we want.
But maybe not :)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131647 (owner: Chad)
[01:28:04] (CR) Krinkle: "While I too originally recommended subscription to by hostname, I'm not sure actually" [operations/puppet] - https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: Ori.livneh)
[02:03:03] (PS1) Springle: Enable parallel replication thread pool for analytics-slave. [operations/puppet] - https://gerrit.wikimedia.org/r/131651
[02:05:43] (CR) Springle: [C: 2] Enable parallel replication thread pool for analytics-slave. [operations/puppet] - https://gerrit.wikimedia.org/r/131651 (owner: Springle)
[02:12:50] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3791 MB (3% inode=99%):
[02:19:50] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3432 MB (3% inode=99%):
[02:28:24] !log LocalisationUpdate completed (1.24wmf2) at 2014-05-06 02:27:21+00:00
[02:28:33] Logged the message, Master
[02:49:49] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Mon May 5 20:48:34 2014
[03:00:50] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:03:06] !log LocalisationUpdate completed (1.24wmf3) at 2014-05-06 03:02:03+00:00
[03:03:14] Logged the message, Master
[03:48:47] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue May 6 03:47:41 UTC 2014 (duration 47m 40s)
[03:48:54] Logged the message, Master
[03:50:19] (CR) Ori.livneh: "@Krinkle: Hm, why don't we simply add the canonical hostname as a global & JS config var?" [operations/puppet] - https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: Ori.livneh)
[04:04:01] (PS1) Springle: Depool db1049 for maintenance. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131654
[04:04:30] (CR) Springle: [C: 2] Depool db1049 for maintenance.
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131654 (owner: Springle)
[04:04:38] (Merged) jenkins-bot: Depool db1049 for maintenance. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131654 (owner: Springle)
[04:06:54] !log springle synchronized wmf-config/db-eqiad.php 'depool db1049 for maintenance'
[04:07:01] Logged the message, Master
[04:27:05] (CR) Mattflaschen: "This isn't working. Is it possibly because there is no include for my new account into mortals?" [operations/puppet] - https://gerrit.wikimedia.org/r/131624 (owner: Andrew Bogott)
[04:29:06] After https://gerrit.wikimedia.org/r/#/c/131624/ , I should be able to connect to bast1001.wikimedia.org and tin with mattflaschen.
[04:29:31] However, it's not working, maybe because the new username is not added to mortals.
[04:29:49] ^ Jeff_Green
[04:36:09] I have a deployment tomorrow, so it would be good to resolve this today. /ping apergos
[04:47:02] superm401: I'm not nearly awake enough to look at anything yet, it's not even 8 am here
[04:47:12] !log mydumper/myloader clone db1042 to db1049
[04:47:15] apergos, sorry.
[04:47:19] Logged the message, Master
[04:47:34] Just randomly picking people marked unaway
[04:49:12] (PS1) Ori.livneh: Add mattflaschen to mortals (following rename from 'mflaschen') [operations/puppet] - https://gerrit.wikimedia.org/r/131656
[04:53:17] superm401: I'll merge that if no one from ops gets to it before your window
[04:53:21] but I'm sure someone will
[04:53:29] ori, much obliged.
[04:54:20] andrewbogott: FYI ^
[04:54:48] andrewbogott: other accounts were affected, as superm401 noted in his comment on Iacec7bb90
[05:19:13] (PS1) Springle: Replace db1020 with db1049 in s4 [operations/puppet] - https://gerrit.wikimedia.org/r/131657
[05:21:45] (CR) Springle: [C: 2] Replace db1020 with db1049 in s4 [operations/puppet] - https://gerrit.wikimedia.org/r/131657 (owner: Springle)
[05:33:59] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: / 4272 MB (3% inode=95%):
[05:50:50] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Mon May 5 20:48:34 2014
[06:01:13] (PS1) Springle: Adjust config for MariaDB 10 [operations/puppet] - https://gerrit.wikimedia.org/r/131659
[06:03:07] (CR) Springle: [C: 2] Adjust config for MariaDB 10 [operations/puppet] - https://gerrit.wikimedia.org/r/131659 (owner: Springle)
[06:19:32] (PS1) Springle: Use proper DB packages on tendril and dbstore [operations/puppet] - https://gerrit.wikimedia.org/r/131662
[06:21:16] (PS1) Ori.livneh: logrotate.d/eventlogging: maxage 90 days [operations/puppet] - https://gerrit.wikimedia.org/r/131663
[06:21:25] (CR) Springle: [C: 2] Use proper DB packages on tendril and dbstore [operations/puppet] - https://gerrit.wikimedia.org/r/131662 (owner: Springle)
[06:22:03] (PS2) Ori.livneh: logrotate.d/eventlogging: maxage 90 days [operations/puppet] - https://gerrit.wikimedia.org/r/131663
[06:23:48] (CR) Ori.livneh: [C: 2] logrotate.d/eventlogging: maxage 90 days [operations/puppet] - https://gerrit.wikimedia.org/r/131663 (owner: Ori.livneh)
[06:29:59] RECOVERY - Disk space on vanadium is OK: DISK OK
[06:31:30] !log deleting rotated logs in /var/log/eventlogging/archive that are older than 90 days
[06:31:37] Logged the message, Master
[06:31:45] !log ..on vanadium
[06:31:52] Logged the message, Master
[06:50:24] <_joe_> springle: hey :)
[06:54:34] hi _joe_
[06:55:12]
!log hammering dbstore1001 with dumps in screen session. ignore replag
[06:55:20] Logged the message, Master
[06:57:58] ori: is there an automated process for that yet? or are you just deleting logs that were grandfathered in somehow?
[06:58:14] also, good morning
[06:58:19] hi apergos
[06:58:28] i updated the logrotate.d file
[06:58:33] and did a manual run
[06:58:46] ok excellent
[06:59:39] PROBLEM - MySQL Processlist on db1059 is CRITICAL: CRIT 69 unauthenticated, 0 locked, 0 copy to table, 1 statistics
[07:00:39] RECOVERY - MySQL Processlist on db1059 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 3 statistics
[07:22:37] (CR) Raimond Spekking: [C: -1] "This patch set looks ok -> +1" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130809 (https://bugzilla.wikimedia.org/57819) (owner: Withoutaname)
[07:28:07] Raymond|afk: ?
[07:28:44] ah
[07:30:29] PROBLEM - RAID on ruthenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:30:34] replied on bug
[07:31:19] RECOVERY - RAID on ruthenium is OK: NRPE: Unable to read output
[07:31:49] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 8 below the confidence bounds
[07:32:11] <_joe_> bad, bad.
[07:32:30] <_joe_> I should work on that.
[08:36:35] !log temporarily disabling puppet on analytics1026 to troubleshoot a camus import problem
[08:36:43] Logged the message, Master
[08:37:29] hey ottomata
[08:37:43] hiya
[08:38:08] did you see my page last night?
[08:38:13] ah, no
[08:38:15] jgage responded in the end
[08:38:17] i'm in switzerland as of now
[08:38:21] was traveling
[08:38:26] no phone :/
[08:38:29] varnishkafka derr was freaking out
[08:38:35] ah!
[08:38:43] how did jgage fix?
[08:38:52] was it just esams?
[08:39:22] no
[08:39:27] eqiad too
[08:39:42] no idea what jgage did besides what's in SAL
[08:39:47] no, looks like everything
[08:39:49] hm
[08:39:57] 20:38 jgage: forced kafka broker reelection 20:58 jgage: both kafka brokers back in service
[08:40:00] ah ok
[08:40:04] hmmm
[08:41:20] ok so, yeah, we still have this problem where occasionally a broker drops out of the ISR for a few seconds, which makes the other broker become the leader for all topics
[08:41:31] this has always been fine in the past
[08:41:42] i'd notice and then do the reelection just like gage did last night
[08:41:55] but, this might be the first time that this has happened with text data in the stream
[08:42:03] perhaps one broker cannot handle all the data...
[08:42:50] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[08:45:21] !log re-enabling puppet agent on analytics1022; kafka broker is caught up there and is fully in all ISRs
[08:45:27] Logged the message, Master
[08:46:28] (CR) Ori.livneh: "..by reverting https://www.mediawiki.org/wiki/Special:Code/MediaWiki/73950, perhaps" [operations/puppet] - https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: Ori.livneh)
[08:46:39] RECOVERY - Puppet freshness on analytics1022 is OK: puppet ran at Tue May 6 08:46:36 UTC 2014
[08:48:45] (PS1) Hashar: Trust Swift proxies XFF headers [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131669 (https://bugzilla.wikimedia.org/64622)
[08:49:19] (PS1) Faidon Liambotis: Add Swift frontends to squid.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131670 (https://bugzilla.wikimedia.org/64622)
[08:49:19] hey
[08:49:27] I was on it
[08:50:40] eh?
[08:50:50] that was to hashar
[08:50:51] oh
[08:50:52] k
[08:51:00] we essentially pushed the same changeset at about the same time :)
[08:51:05] hehe
[08:51:19] paravoid: I noticed in site.pp we have node /^ms-fe300[1-2]\.esams\.wmnet$/ {
[08:51:19] include role::swift::esams-prod::proxy
[08:51:24] should we add them as well ?
[08:51:25] not used by mw
[08:51:28] no
[08:52:08] (Abandoned) Hashar: Trust Swift proxies XFF headers [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131669 (https://bugzilla.wikimedia.org/64622) (owner: Hashar)
[08:52:23] I almost abandoned mine
[08:52:29] that would have been funny
[08:52:53] hehe
[08:54:20] (PS2) Hashar: Add Swift frontends to squid.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131670 (https://bugzilla.wikimedia.org/64622) (owner: Faidon Liambotis)
[08:54:38] (PS1) Hashar: Mention ms-fe servers need to be XFF trusted by MW [operations/puppet] - https://gerrit.wikimedia.org/r/131671 (https://bugzilla.wikimedia.org/64622)
[08:54:58] paravoid: and I am adding a comment to site.pp to remind people to add new ms-fe servers to mediawiki-config https://gerrit.wikimedia.org/r/131671
[08:55:30] paravoid: should I handle the deploy for you ?
:-)
[08:56:20] (CR) Faidon Liambotis: [C: 2] Add Swift frontends to squid.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131670 (https://bugzilla.wikimedia.org/64622) (owner: Faidon Liambotis)
[08:56:28] (Merged) jenkins-bot: Add Swift frontends to squid.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131670 (https://bugzilla.wikimedia.org/64622) (owner: Faidon Liambotis)
[08:56:51] (CR) Faidon Liambotis: [C: 2] Mention ms-fe servers need to be XFF trusted by MW [operations/puppet] - https://gerrit.wikimedia.org/r/131671 (https://bugzilla.wikimedia.org/64622) (owner: Hashar)
[08:59:03] stuck at dologmsg
[08:59:10] !log faidon updated /a/common to {{Gerrit|Ica9086dcd}}: Add Swift frontends to squid.php
[08:59:18] Logged the message, Master
[09:01:21] the renderfile-nonstandard limits are no longer showing in limiter.log :]
[09:01:34] !log faidon synchronized wmf-config/squid.php 'add Swift to squid.php'
[09:01:40] Logged the message, Master
[09:02:34] we really should rename that list and that file :)
[09:03:26] I will let you close bug https://bugzilla.wikimedia.org/show_bug.cgi?id=64622 :]
[09:04:02] when reading php docs
[09:04:14] am i the only one that has to pause for a sec and think about which is the needle and which is the haystack?
[09:04:28] tell me i'm not the only one.
[09:04:33] ori: you are alone
[09:04:37] :(
[09:04:52] most folks use an IDE that shows the function documentation in a popup
[09:05:08] I don't use an IDE myself though, the workaround is to not have to write PHP
[09:07:17] ori: it must be listed at http://phpsadness.com
[09:07:30] ori: that site lists the RFCs / filed bugs for a bunch of PHP issues
[09:07:40] and the author strikes them whenever they get implemented / fixed
[09:07:58] ori: relevant sadness is http://phpsadness.com/sad/9
[09:08:48] !log LocalisationUpdate completed (1.22wmf15) at Tue May 6 09:07:45 UTC 2014
[09:08:54] Logged the message, Master
[09:16:24] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue May 6 09:15:20 UTC 2014
[09:16:31] Logged the message, Master
[09:17:09] and, unrelated: several users (at minimum mattflaschen and gdubuc) lost shell access as part of slightly careless labs / production account consolidation which did not add the consolidated usernames to mortals
[09:17:42] see for example
[09:52:58] ori, ping
[09:53:17] gwicke: funny, i was just about to comment on your patch
[09:53:27] he ;)
[09:53:33] you read my mind
[09:53:44] are you in Europe already?
[09:54:00] I am *g*
[09:54:06] gwicke, meet godog; godog, meet gwicke
[09:54:23] hey godog!
[09:54:31] and good morning paravoid
[09:54:43] i'm not yet, but i am forward-looking and shifting my timezones ahead of my flight
[09:54:51] godog is Filippo, if it wasn't obvious
[09:54:59] yes, I figured
[09:56:10] gwicke: hi!
[09:56:50] godog, paravoid praised your debian expertise & promised that you might help us set up a public repo
[09:57:18] we're keen to publish our wares as debs
[09:57:27] haha I'm flattered!
yep I can help of course
[09:58:20] there are some notes / background at https://www.mediawiki.org/wiki/Packaging and https://www.mediawiki.org/wiki/Talk:Packaging#Meeting_notes_2014-04-04
[09:59:26] the gist of it is that we'd like to set up a public repo that normal deployers can push debs to, ideally with an automated build process triggered by git tags
[10:00:02] (CR) Withoutaname: "@Raimond Spekking" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130809 (https://bugzilla.wikimedia.org/57819) (owner: Withoutaname)
[10:02:52] cool, will be reading up those two to get some context
[10:07:41] <_joe_> packages of mediawiki you mean?
[10:08:56] _joe_, yes
[10:09:58] <_joe_> mh, I never saw the point in packaging PHP more than a .tar.gz, but maybe that's me :)
[10:10:38] so far we have packages for parsoid and mathoid; there's an official deb for core as well, but that's mostly been done by outside debian folks so far
[10:11:35] <_joe_> gwicke: I get some people find that useful, so it's a good thing(TM) to provide those
[10:11:42] <_joe_> don't get me wrong :)
[10:13:02] paravoid: any objections to me merging those 3 submodule changes?
[10:13:06] (well, 6 changes total)
[10:13:11] <_joe_> I'm just surprised that people use debs to deploy php applications :)
[10:14:06] _joe_, it's just much easier to have packages handle all dependencies etc
[10:14:19] and it will become more important the further we move towards SOA
[10:14:43] as core will only be one of several components of the overall system
[10:14:54] <_joe_> gwicke: from my experience with operating a large SOA shop, debs do not solve the problems we will meet
[10:15:29] (CR) Ottomata: [C: 1] "I can merge if yall think I should." [operations/puppet] - https://gerrit.wikimedia.org/r/129814 (owner: BryanDavis)
[10:15:33] <_joe_> they just solve the 'deploy the software and run some scripts' problem, which can be solved in various ways.
[10:15:41] do you mean 'do not solve *all* problems' or 'do not solve *any* problem' ? ;)
[10:16:00] <_joe_> gwicke: do not solve any of the relevant problems for SOA
[10:16:23] normally you combine debs with puppet for config management of course
[10:16:23] we've had this debate before, and I generally tend to agree with _joe_ :)
[10:16:26] <_joe_> gwicke: the problem with SOA is that your dependencies become non-local
[10:16:27] and I think most opsens do
[10:16:43] so, for now we've agreed to set up a public Debian repo for third-party users
[10:16:53] <_joe_> oh yeah, sorry, I was not trying to light up a flame :)
[10:16:53] not for internal use, at least not just yet
[10:17:07] <_joe_> just giving everybody some wisdom from past experiences
[10:17:15] yup, it's good
[10:17:30] I don't have a strong opinion myself, so more PoVs are certainly helpful
[10:17:55] for services like Parsoid I'm also hoping that we can capitalize on debs internally in the longer term, so that we don't have to repeat everything in puppet
[10:18:44] better to just do the interesting config stuff in puppet, and leave the boring static stuff to debs
[10:18:54] IMHO ;)
[10:23:55] ori, can you help me understand the pros/cons of using EventLogging vs. ProfilerSimpleUDP?
[10:24:56] no, going to bed. but i'll reply tomorrow if you ask on the patch
[10:25:13] sorry, exhausted
[10:25:15] ori, okay; goodnight ;)
[11:28:49] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Tue May 6 08:27:45 2014
[12:30:06] Anyone around?
[12:31:14] <_joe_> Krenair: what do you need?
[12:31:33] A system admin who can poke around in email logs and view OTRS tickets.
[12:38:05] (for https://bugzilla.wikimedia.org/show_bug.cgi?id=64441 )
[12:41:02] <_joe_> Krenair: I don't really have time for that now, sorry
[12:42:13] <_joe_> also, I just discovered I don't have access to OTRS it seems
[12:46:28] (PS2) Andrew Bogott: Add mattflaschen to mortals (following rename from 'mflaschen') [operations/puppet] - https://gerrit.wikimedia.org/r/131656 (owner: Ori.livneh)
[12:47:34] (CR) Andrew Bogott: "yep, you're right, I forgot to grep. Should be fixed (thanks to Ori) with https://gerrit.wikimedia.org/r/#/c/131656/" [operations/puppet] - https://gerrit.wikimedia.org/r/131624 (owner: Andrew Bogott)
[12:48:38] (PS3) Andrew Bogott: Add mattflaschen to mortals (following rename from 'mflaschen') [operations/puppet] - https://gerrit.wikimedia.org/r/131656 (owner: Ori.livneh)
[12:50:27] (CR) Andrew Bogott: [C: 2] Add mattflaschen to mortals (following rename from 'mflaschen') [operations/puppet] - https://gerrit.wikimedia.org/r/131656 (owner: Ori.livneh)
[12:51:26] hashar: you have a pending unmerged patch… 'Mention ms-fe servers need to be XFF trusted by MW'
[12:51:30] shall I merge?
[12:51:47] andrewbogott: hi!
[12:52:01] https://gerrit.wikimedia.org/r/#/c/131671/ ?
[12:52:05] was merged by Faidon
[12:52:20] merged in gerrit but not on palladium
[12:52:24] oh yes I forgot
[12:52:24] anyway, I'll merge it now
[12:52:26] it's just a comment
[12:52:27] merge it
[12:52:31] ok :)
[12:52:35] ah on the puppetmaster hehe
[12:52:35] sorry
[12:52:36] I went off to do the mediawiki deploy and forgot
[12:53:29] superm401: I just merged your account fix… if that needs to be active in < 30 minutes then I can force a puppet update on whichever boxes you need it on. Sorry for the oversight!
[12:53:34] (and, thanks ori)
[12:56:36] (PS1) Andrew Bogott: Add 'gilles' in places where 'gdubuc' was installed.
[operations/puppet] - https://gerrit.wikimedia.org/r/131689
[12:58:40] (PS1) Andrew Bogott: Add amire80 to 'admins::restricted ' [operations/puppet] - https://gerrit.wikimedia.org/r/131690
[12:59:21] (CR) Andrew Bogott: [C: 2] Add 'gilles' in places where 'gdubuc' was installed. [operations/puppet] - https://gerrit.wikimedia.org/r/131689 (owner: Andrew Bogott)
[13:01:05] (CR) Andrew Bogott: [C: 2] Add amire80 to 'admins::restricted ' [operations/puppet] - https://gerrit.wikimedia.org/r/131690 (owner: Andrew Bogott)
[13:12:19] (PS2) Giuseppe Lavagetto: Adding ability to compute change-based diffs. [operations/software] - https://gerrit.wikimedia.org/r/131495
[13:12:22] (CR) jenkins-bot: [V: -1] Adding ability to compute change-based diffs. [operations/software] - https://gerrit.wikimedia.org/r/131495 (owner: Giuseppe Lavagetto)
[13:15:02] (CR) Raimond Spekking: [C: 1] "Thanks for adding the messages. No "dependencies" to set. It works automatically." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130809 (https://bugzilla.wikimedia.org/57819) (owner: Withoutaname)
[13:15:13] (CR) Giuseppe Lavagetto: "@Rush I added a README that should help you in setting up your own copy of this."
[operations/software] - https://gerrit.wikimedia.org/r/131495 (owner: Giuseppe Lavagetto)
[13:38:34] (PS1) Jgreen: fix puppet collision at naming unixaccount invocations for user Gilles [operations/puppet] - https://gerrit.wikimedia.org/r/131693
[13:39:33] (CR) jenkins-bot: [V: -1] fix puppet collision at naming unixaccount invocations for user Gilles [operations/puppet] - https://gerrit.wikimedia.org/r/131693 (owner: Jgreen)
[13:40:55] (PS2) Jgreen: fix puppet collision at naming unixaccount invocations for user Gilles [operations/puppet] - https://gerrit.wikimedia.org/r/131693
[13:42:48] (CR) Jgreen: [C: 2 V: 1] fix puppet collision at naming unixaccount invocations for user Gilles [operations/puppet] - https://gerrit.wikimedia.org/r/131693 (owner: Jgreen)
[13:50:03] (PS1) Jgreen: fix more naming collisions when calling unixaccount [operations/puppet] - https://gerrit.wikimedia.org/r/131696
[13:51:43] (CR) Krinkle: "Totally." [operations/puppet] - https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: Ori.livneh)
[13:53:53] (CR) Jgreen: [C: 2 V: 1] fix more naming collisions when calling unixaccount [operations/puppet] - https://gerrit.wikimedia.org/r/131696 (owner: Jgreen)
[14:08:08] (CR) Filippo Giunchedi: "a few comments here and there, but looks good overall!"
(8 comments) [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449 (owner: Gage)
[14:29:49] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Tue May 6 08:27:45 2014
[14:52:49] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[15:03:18] (PS2) Hashar: zuul: compress log daily [operations/puppet] - https://gerrit.wikimedia.org/r/127230 (https://bugzilla.wikimedia.org/63935)
[15:06:01] (CR) Hashar: [C: -1] "Cherry picked on contint puppetmaster. Will see what happens on the integration-dev.eqiad.wmflabs instance which has Zuul installed and pr" [operations/puppet] - https://gerrit.wikimedia.org/r/127230 (https://bugzilla.wikimedia.org/63935) (owner: Hashar)
[15:11:37] (CR) BryanDavis: "Ottomata: I just checked from tin and trebuchet is reporting that the scap repo has been synced across 229 hosts so it should be safe to m" [operations/puppet] - https://gerrit.wikimedia.org/r/129814 (owner: BryanDavis)
[15:23:13] (PS1) Alexandros Kosiaris: netmon1001 as a ganglia collector/aggregator [operations/puppet] - https://gerrit.wikimedia.org/r/131708
[15:24:11] (PS5) Matanya: manutius: remove torrus [operations/puppet] - https://gerrit.wikimedia.org/r/130587
[15:24:50] (PS3) Giuseppe Lavagetto: Move cluster definition to the node level. [operations/puppet] - https://gerrit.wikimedia.org/r/130591
[15:30:09] (PS4) Manybubbles: WIP: Add some plugins to labs [operations/software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/130969
[15:32:22] is it me or does jenkins/zuul have a problem ?
[15:34:48] superm401: Are your logins treating you OK now?
[15:34:49] not just you akosiaris
[15:36:30] (PS3) Giuseppe Lavagetto: Adding ability to compute change-based diffs.
[operations/software] - https://gerrit.wikimedia.org/r/131495
[15:37:09] (CR) Giuseppe Lavagetto: [C: 1] "Should work following instructions in the README." [operations/software] - https://gerrit.wikimedia.org/r/131495 (owner: Giuseppe Lavagetto)
[15:39:35] (Abandoned) Hashar: Renamed $wmf* to $wmg* [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101820 (https://bugzilla.wikimedia.org/43956) (owner: Arav93)
[15:40:35] (CR) jenkins-bot: [V: -1] Adding ability to compute change-based diffs. [operations/software] - https://gerrit.wikimedia.org/r/131495 (owner: Giuseppe Lavagetto)
[15:40:37] (CR) jenkins-bot: [V: -1] netmon1001 as a ganglia collector/aggregator [operations/puppet] - https://gerrit.wikimedia.org/r/131708 (owner: Alexandros Kosiaris)
[15:41:43] (CR) Alexandros Kosiaris: "recheck" [operations/puppet] - https://gerrit.wikimedia.org/r/131708 (owner: Alexandros Kosiaris)
[15:42:16] !log killed zuul processes on gallium and restarted the service
[15:42:23] Logged the message, Master
[15:43:28] (CR) Alexandros Kosiaris: [C: 2] netmon1001 as a ganglia collector/aggregator [operations/puppet] - https://gerrit.wikimedia.org/r/131708 (owner: Alexandros Kosiaris)
[15:46:49] There is a pending change for the MediaWiki Math extension that affects the database. In theory things should only change for people with private wiki installations that use the LaTeXML rendering mode. However, I want to be very sure that this change does not influence the production environment. Can someone take care of proper code review for this particular change https://gerrit.wikimedia.org/r/#/c/124805/?
[15:46:54] (PS1) Alexandros Kosiaris: Revert "netmon1001 as a ganglia collector/aggregator" [operations/puppet] - https://gerrit.wikimedia.org/r/131709
[15:47:11] (CR) Alexandros Kosiaris: [C: 2 V: 2] Revert "netmon1001 as a ganglia collector/aggregator" [operations/puppet] - https://gerrit.wikimedia.org/r/131709 (owner: Alexandros Kosiaris)
[15:47:22] (PS4) Giuseppe Lavagetto: Adding ability to compute change-based diffs. [operations/software] - https://gerrit.wikimedia.org/r/131495
[15:48:49] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[15:51:12] what is up with openssl? new vulnerabilities ... :/
[15:51:34] matanya: was it real?
[15:51:43] matanya: I assume you're talking about the ransom note one?
[15:51:55] no, two real ones
[15:51:57] that was openSSH not openSSL
[15:51:59] ugh
[15:51:59] low, but real
[15:52:03] akosiaris: ah
[15:53:19] <_joe_> matanya: people look at it
[15:53:43] greg-g: http://www.ubuntu.com/usn/usn-2192-1/
[15:53:44] <_joe_> matanya: I used to follow someone posting on twitter the openssl WTFs daily
[15:53:54] matanya: ah, thanks
[15:53:58] <_joe_> matanya: those are lower-grade vulns btw
[15:54:07] yes, noted
[15:54:17] <_joe_> yeah the openssh one... it seems bogus, and let's hope it is like that.
[15:54:19] still, somewhat troubling
[15:54:49] physikerwelt: hey, I haven't looked thoroughly at that patch, but thanks for giving a heads up. Quick question: is the extension backwards compat?
[15:55:08] physikerwelt: ie: do things break if the new code is deployed but the database changes haven't been made?
[15:55:35] greg-g: suddenly remembered a question i had for you yesterday
[15:55:56] is there a way to know if a specific patch was deployed?
[15:55:57] physikerwelt: see also: https://www.mediawiki.org/wiki/Development_policy#Database_patches (if you have already answered this on that patch, sorry!)
[15:56:19] matanya: in a round about way, yeah, to some degree of certainty (not 100%)
[15:56:39] greg-g: I'll look at that
[15:57:08] greg-g: e.g I wanted to know if https://bugzilla.wikimedia.org/show_bug.cgi?id=64727 was deployed, but bugzilla has no such status as "deployed"
[15:57:14] matanya: look at it on gerrit: 1) merged? 2) if yes, click "included in" 3) if nothing, no 4) if in some wmfXX branch cool, corroborate with https://www.mediawiki.org/wiki/MediaWiki_1.24/Roadmap#Schedule_for_the_deployments
[15:57:39] matanya: yeah, that's an annoying thing. Mostly because, what does "deployed" mean in our rolling train cycle?
[15:57:42] deployed where?
[15:57:46] physikerwelt: thankya
[15:57:50] akosiaris: hi, do you have an idea who could be a good WMF contact for feedback on DMARC vs. mailing lists issues? https://bugzilla.wikimedia.org/show_bug.cgi?id=64795 and https://bugzilla.wikimedia.org/show_bug.cgi?id=64818
[15:57:52] akosiaris, (asking you as you've commented on mailing lists related stuff, but maybe DMARC is too much up in the stack and Mailman settings territory?)
[15:59:09] andre__: looking
[15:59:32] matanya: see also: https://www.mediawiki.org/wiki/Wikimedia_Release_%26_QA_Team/Wishlist#Code_Deploy_Dashboard
[15:59:43] akosiaris: thanks. Just would love to see an educated comment (means: not me) for our community
[16:00:01] greg-g: bullseye
[16:00:04] thanks
[16:01:00] matanya: :)
[16:01:19] but sadly not in my wiki yet :)
[16:01:31] greg-g: I see that it is required to have MySQL and SQLite. I created separate files for both tables even though the content is the same. Is that the way to go?
[16:02:00] physikerwelt: I *believe* so, but I'm honestly not the best person to ask db schema change/migrations details :/
[16:02:10] somebody else in here know?
[16:05:25] Confirmed, I can now login with mattflaschen
[16:08:33] greg-g: It's not urgent but it would be great if the changes in the Math extension would make progress on a week scale.
Could you suggest some additional reviewers?
[16:10:24] greg-g: Frédéric Wang was really helpful with the review but he is more a Mozilla Firefox developer than a MediaWiki expert and neither of us has ever used a database that is distributed across more than one server
[16:11:36] greg-g, there is a mistake on the deployments page for our deployment. It should be https://gerrit.wikimedia.org/r/#/c/130381 , not https://gerrit.wikimedia.org/r/#/c/131020/
[16:11:57] Fixing now, but wanted to let people know. The actual description is accurate.
[16:15:08] Fixed on the page
[16:19:03] !log Changed email for global account "Elph".
[16:19:10] Logged the message, Master
[16:19:38] (CR) Rush: admin module for user/group/permissions cleanup (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/129501 (owner: Rush)
[16:20:05] (PS3) Rush: admin module for user/group/permissions cleanup [operations/puppet] - https://gerrit.wikimedia.org/r/129501
[16:20:07] (PS3) Rush: one-off to convert admins.pp to yaml [operations/puppet] - https://gerrit.wikimedia.org/r/129541
[16:20:11] ^d: ori : ldap on silver - i claim fixed
[16:20:18] (CR) jenkins-bot: [V: -1] admin module for user/group/permissions cleanup [operations/puppet] - https://gerrit.wikimedia.org/r/129501 (owner: Rush)
[16:20:28] <^d> mutante: I was able to ssh yesterday. Lemme check the scripts.
[16:21:52] (CR) jenkins-bot: [V: -1] one-off to convert admins.pp to yaml [operations/puppet] - https://gerrit.wikimedia.org/r/129541 (owner: Rush)
[16:22:32] (PS4) Rush: admin module for user/group/permissions cleanup [operations/puppet] - https://gerrit.wikimedia.org/r/129501
[16:22:34] (PS4) Rush: one-off to convert admins.pp to yaml [operations/puppet] - https://gerrit.wikimedia.org/r/129541
[16:23:40] <^d> mutante: Just need to add one more script to that list.
[16:23:42] <^d> `modify-ldap-group` [16:24:20] ^d: ah, that makes sense, yes [16:24:23] (03CR) 10jenkins-bot: [V: 04-1] one-off to convert admins.pp to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 (owner: 10Rush) [16:24:29] ^d: can do [16:24:38] (03PS1) 10Chad: Adding modify-ldap-group to permissions [operations/puppet] - 10https://gerrit.wikimedia.org/r/131723 [16:24:41] <^d> Done [16:26:34] (03PS2) 10Dzahn: Adding modify-ldap-group to permissions [operations/puppet] - 10https://gerrit.wikimedia.org/r/131723 (owner: 10Chad) [16:29:16] (03CR) 10Dzahn: [C: 032] "had just modify-ldap-user, but not -group" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131723 (owner: 10Chad) [16:31:00] superm401: cool, thanks [16:31:54] physikerwelt: yeah, just added our DBA (Sean). [16:32:47] ^d: /File[/etc/sudoers.d/demon]: Filebucketed .. [16:33:52] mutante: do you what is the eqiad replacement for sanger if any ? [16:34:08] ^d: svn-group doesnt make sense anymore on silver though [16:34:22] <^d> Yeah [16:34:25] <^d> Old stuff [16:35:23] matanya: just as much as your last comment on 6163 [16:36:11] so no host was brought up to be "eqiad replacement for sanger" [16:36:43] greg-g: do you know his gerrit username? [16:37:05] physikerwelt: the one I just added :) "Springle" [16:38:31] <^d> blargh. [16:38:42] matanya: not that i know of [16:38:43] <^d> user does exist you silly script. [16:38:54] ^d: ? [16:39:02] <^d> $ sudo modify-ldap-group --addmembers=dr0ptp4kt wmf [16:39:02] <^d> dr0ptp4kt doesn't exist, and won't be added to the group. [16:39:02] <^d> No changes to make; exiting. [16:39:13] greg-g: Thank you have a nice day [16:39:27] (03PS1) 10Cmjohnson: adding dhcpd & netboot for db1064-1073 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131728 [16:39:39] physikerwelt: you too! let me know if/when you need any more prodding. [16:39:55] ^d: ehm.. 
confirmed it exists with ldaplist -l passwd dr0ptp4kt [16:39:57] cajoel: can you please shed some light on this question ? [16:40:30] <^d> mutante: Indeed, so did I :) [16:41:15] (03CR) 10Cmjohnson: [C: 032] adding dhcpd & netboot for db1064-1073 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131728 (owner: 10Cmjohnson) [16:41:47] Accidentally ran: [16:41:49] git submodule update extensions/ on 1.24wmf2 [16:42:09] It looks like it affected EL. It says: [16:42:33] http://pastebin.com/RZcfBuRg [16:42:50] I'm going to do a sync-dir, but I'm not sure which commit EL was checkout out at before. ^ ori [16:43:36] ^d: grr.. "except KeyError:"? [16:43:49] <^d> superm401: It should be fine. [16:43:59] <^d> We changed the merge strategy of submodules to be rebased. [16:44:42] <^d> superm401: Yeah, looks ok [16:44:42] Okay, I just didn't mean to do anything with EL. [16:45:26] !log mattflaschen synchronized php-1.24wmf2/extensions/GettingStarted/ 'GettingStarted token and logging deployment' [16:45:33] Logged the message, Master [16:48:58] !log mattflaschen synchronized php-1.24wmf3/extensions/GettingStarted/ 'GettingStarted token and logging deployment' [16:49:05] Logged the message, Master [16:49:17] Done, testing now. [17:03:58] (03CR) 10Ori.livneh: [C: 032] Provision scap scripts using trebuchet [operations/puppet] - 10https://gerrit.wikimedia.org/r/129814 (owner: 10BryanDavis) [17:09:35] hm, hey, salt people? [17:09:51] I ran a 'tail -f' command [17:09:54] just to try it [17:10:08] but, i guess salt doesn't return until output ends? [17:10:25] i got [17:10:25] The minions may not have all finished running and any remaining minions will return upon completion. 
To look up the return data for this job later run: [17:10:25] salt-run jobs.lookup_jid 20140506170738464735 [17:11:18] i've just killed my tail processes on the targets [17:11:26] ah, so nm, should be fine [17:11:37] i was worried the salt stuff would just sit around forever unless I killed it somehow [17:11:59] mark: Any chance you could help us with https://bugzilla.wikimedia.org/show_bug.cgi?id=60003 ? Or give me pointers on how to solve it? [17:15:08] 170 FetchError c straight insufficient bytes [17:15:19] that's the error condition in Varnish [17:15:20] ^d: fwiw, it also doesn't work on virt0 .. [17:15:35] <^d> :\ [17:16:46] mutante: silver is missing ldap::client::nss. This is why modify-ldap-group fails [17:17:30] akosiaris: i just removed nss because if it has that puppet cant create the local users, because it looks in ldap [17:17:38] then users already exist and puppet cant create them [17:18:12] so i could cheat and re-add it now that the users have been created? [17:18:32] hmm or maybe fix the broken script [17:18:36] bblack: iirc the straight insufficient bytes error was what we were seeing with the gzip issues as well, right? [17:18:40] i see it does pwd.getnam [17:18:48] akosiaris: before this https://gerrit.wikimedia.org/r/#/c/131638/2/manifests/role/ldap.pp puppet was broken [17:18:58] akosiaris: yes, that "TODO" in there? [17:19:32] mutante: yes. Let me see what I can do about it [17:19:42] * ebernhardson faints : < dvorak> ickymettle: puppet is moving to ordering the way they are in the file, unless you ask for something different [17:19:44] preferable to having the class anyway [17:19:54] akosiaris: thank you! [17:25:59] mark: according to http://marc.info/?l=varnish-misc&m=138202412404543&w=2 the error happens because "either storage could not be obtained .... 
or content-length header and real content length do not match" [17:26:01] but: [17:26:02] ori@iron:~$ curl -qs -I -H 'host: www.wikidata.org' mw1100/wiki/Special:EntityData/Q30.json | grep ^Content-Length [17:26:02] Content-Length: 174426 [17:26:03] ori@iron:~$ curl -qs -H 'host: www.wikidata.org' mw1100/wiki/Special:EntityData/Q30.json | wc -c [17:26:05] 174426 [17:26:08] so that rules out option 2 [17:27:18] 174kb is not that large... [17:27:25] mark: I don't recall, it's been a while since I looked at it [17:29:50] (03PS13) 10Giuseppe Lavagetto: Add 'rcstream' module for broadcasting recent changes over WebSockets [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: 10Ori.livneh) [17:30:00] i'm also not seeing storage allocator failures [17:30:20] what is the proper value for Content-Length when the response is gzipped? [17:30:27] should it be the gzipped length or the decompressed length? [17:30:49] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Tue May 6 08:27:45 2014 [17:31:00] gzipped length [17:31:03] <_joe_> gzipped AFAIR but let me check [17:31:13] as it's opaque to http [17:31:59] there you go, then [17:32:04] ori@iron:~$ curl -qs -I -H 'accept-encoding: gzip' -H 'host: www.wikidata.org' mw1100/wiki/Special:EntityData/Q30.json | grep ^Content [17:32:05] Content-Length: 174426 [17:32:05] Content-Encoding: gzip [17:32:09] ori@iron:~$ curl -qs -H 'accept-encoding: gzip' -H 'host: www.wikidata.org' mw1100/wiki/Special:EntityData/Q30.json | wc -c [17:32:09] 29832 [17:32:43] <_joe_> ebernhardson: sorry, puppet is going to go back to ordered resources in file order? Who said that? [17:33:14] _joe_: dvorak in the #puppet room, not sure how accurate a source dvorak is [17:33:45] <_joe_> ebernhardson: because it was file order in 2.x, then they moved to random ordering by default in 3.x, now they go back? 
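(Editor's note on the curl output above: with `accept-encoding: gzip` the backend declares Content-Length: 174426 but only 29832 bytes arrive, while Content-Length is expected to be the gzipped length. Below is a minimal sketch of that mismatch, assuming a made-up origin that can compute the length either before or after compression. `build_response`, `content_length_matches` and the `buggy` flag are hypothetical names for illustration, not MediaWiki, Apache or Varnish code.)

```python
import gzip


def build_response(data: bytes, accept_gzip: bool, buggy: bool = False):
    """Hypothetical origin server: optionally gzip the body, then set headers.

    With buggy=True the Content-Length is taken from the uncompressed data
    even though the compressed body is what gets sent.
    """
    headers = {}
    body = data
    if accept_gzip:
        body = gzip.compress(data)
        headers["Content-Encoding"] = "gzip"
    # The length should describe the bytes on the wire, i.e. `body`.
    length_source = data if buggy else body
    headers["Content-Length"] = str(len(length_source))
    return headers, body


def content_length_matches(headers: dict, wire_body: bytes) -> bool:
    """True if the declared Content-Length equals the bytes actually sent."""
    return int(headers["Content-Length"]) == len(wire_body)


if __name__ == "__main__":
    payload = b'{"id": "Q30"}' * 1000  # made-up, highly compressible data
    for buggy in (False, True):
        headers, body = build_response(payload, accept_gzip=True, buggy=buggy)
        print(buggy, headers["Content-Length"], len(body),
              content_length_matches(headers, body))
```

(A client or cache that trusts the header would wait for more bytes than will ever arrive in the buggy case.)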
[17:34:29] 58 FetchError c straight insufficient bytes [17:34:29] 58 Gzip c u F - 29832 174426 80 80 238592 [17:35:16] so apache is sending the wrong content length? [17:36:08] we're actually doing gzip? [17:36:20] (in varnish?) [17:36:28] no [17:36:39] but mediawiki compresses [17:36:41] it's just noticing and varying? [17:36:50] it's decompressing when needed [17:36:55] it will typically store the compressed object [17:38:56] !log Changed email for global account "ElphiBot". [17:39:03] Logged the message, Master [17:39:47] yeah, then apache has the wrong length [17:39:57] hoo: so, for some reason apache sends the content length of the uncompressed object while sending a compressed object [17:40:01] and varnish rightfully chokes on that [17:40:33] ok, that makes sense [17:41:05] we do quite some weird output buffer mangling on our side, so it might be that we are triggering an apache bug that way [17:41:17] yeah [17:44:51] hoo: where is the weird output buffer mangling code? [17:45:10] (03PS1) 10Dzahn: create admins::bastion for _just_ bastion access [operations/puppet] - 10https://gerrit.wikimedia.org/r/131743 [17:45:36] (03PS1) 10Alexandros Kosiaris: modify-ldap-group: Remove the need for getent [operations/puppet] - 10https://gerrit.wikimedia.org/r/131744 [17:45:40] mutante: ^ . I already ran it for dr0ptp4kt, he is in wmf group now (why wasn't he before?) [17:46:47] $response->header( 'Content-Length: ' . strlen( $data ) ); [17:46:50] wtf [17:46:54] why are we doing that [17:46:55] akosiaris: awesome! i don't know why not before, but agree he should have [17:47:00] dr0ptp4kt: ^d [17:47:23] nice [17:48:21] akosaris and mutante (and ^demon), i'm not sure about that, although i did have a user id dr0ptp4kt for wikitech and abaso for other stuff. 
abogott last week consolidated the ids to dr0ptp4kt [17:48:40] (03CR) 10Dzahn: [C: 031] "thank you for this fix, since you say it already worked on silver, yay :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131744 (owner: 10Alexandros Kosiaris) [17:48:40] i'll see if i can login to logstash though [17:48:53] <^d> mutante: Yeah, that's what I was trying to fix :) [17:49:34] akosiaris, mutante, ^demon, abogott i'm now able to login to logstash with dr0ptp4kt and my password abcd1234. [17:49:40] JOKING! (on the password) [17:50:04] dr0ptp4kt: happy to hear that :-) [17:50:11] thank you all [17:50:12] akosiaris: happens more often that people get added later [17:50:13] ! [17:50:17] dr0ptp4kt: hah, nice:) [17:50:20] <^d> :) [17:50:33] thanks for your help mark :) [17:50:59] yw (and ori as well) [17:56:19] (03CR) 10Dzahn: "this is also "RT #6134: Replacement for formey ldap operations in eqiad" , wonder how it worked on formey before, it did not work on virt0" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131744 (owner: 10Alexandros Kosiaris) [18:12:50] (03PS1) 10Dzahn: add w.wiki, *.w.wiki [operations/apache-config] - 10https://gerrit.wikimedia.org/r/131750 [18:12:54] (03CR) 10jenkins-bot: [V: 04-1] add w.wiki, *.w.wiki [operations/apache-config] - 10https://gerrit.wikimedia.org/r/131750 (owner: 10Dzahn) [18:16:55] (03CR) 10Dzahn: [C: 032] modify-ldap-group: Remove the need for getent [operations/puppet] - 10https://gerrit.wikimedia.org/r/131744 (owner: 10Alexandros Kosiaris) [18:18:06] (03CR) 10Krinkle: Add 'rcstream' module for broadcasting recent changes over WebSockets (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: 10Ori.livneh) [18:19:27] (03CR) 10Krinkle: Add 'rcstream' module for broadcasting recent changes over WebSockets (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: 10Ori.livneh) [18:21:37] 
(03CR) 10Dzahn: "tested by removing myself from wmf with --deletemembers then re-adding with --addmembers, works fine on silver now, thank you" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131744 (owner: 10Alexandros Kosiaris) [18:48:49] PROBLEM - Puppet freshness on netmon1001 is CRITICAL: Last successful Puppet run was Tue May 6 15:48:07 2014 [18:55:44] (03CR) 10Giuseppe Lavagetto: [C: 031] Add 'rcstream' module for broadcasting recent changes over WebSockets (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: 10Ori.livneh) [18:56:32] hey, is somebody on netmon1001 and wanted to disable puppet or is it the bug? it's "admin disabled? [18:57:06] the bug = sometimes it becomes admin disabled without an actual admin doing that [18:57:34] bd808: https://gerrit.wikimedia.org/r/#/c/131758/ :) [18:59:10] bblack: Cool. I'll check it out. [18:59:22] No unit tests? tsk tsk [18:59:40] There's an XXX/TODO list inside the code that at least mentions the lack of them! :) [19:00:02] That's something! [19:00:10] !log enabling puppet on netmon1001 [19:00:17] Logged the message, Master [19:00:19] RECOVERY - Puppet freshness on netmon1001 is OK: puppet ran at Tue May 6 19:00:12 UTC 2014 [19:01:46] mutante: only 15 tickets blocking tampa, where would it be useful to push in order to help out ? [19:03:34] paravoid: sfp coming? [19:04:18] cajoel: what a timing! [19:04:32] cajoel: just ordered [19:04:45] nice. [19:04:54] can you please clarify the status of sanger? and OIT ldap stuff there ? [19:05:04] * mark out [19:05:07] (03PS1) 10Jdlrobson: Use ContentNamespace rather than NearbyNamespace [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131762 [19:06:06] ^ Steinsplitter there ya go [19:06:07] matanya: I don't have access to Sanger [19:06:12] what do you mean by clarify? 
[19:07:00] my understanding is that Sanger is an LDAP mirror of Office LDAP servers, and that it's used by the inbound MX server to whitelist recipients [19:07:15] I'm not sure if there are other uses for it. [19:07:48] cajoel: it's about replacing it with an eqiad box [19:07:51] jdlrobson: thx [19:08:04] mutante: sure, that makes sense. [19:08:45] cajoel: so matanya was asking if a box for that has already been specified [19:08:50] should we open a new ticket to provision a machine in equiad to provide the same function. [19:08:56] yes, that :) [19:08:59] assumption is no. [19:09:02] not by me. [19:09:10] I can start that flow [19:09:44] that would be cool, yes [19:10:00] (03CR) 10EBernhardson: [C: 031] "From the high level looks good, but i would like to see more about which specific use cases this is solving. I have to guess this is abou" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: 10Ori.livneh) [19:10:23] mutante: can you help by identifying the machine type sanger is currently on? [19:10:30] mutante: any chance you're in SF today? [19:10:40] cajoel: i'm here [19:10:44] on 3rd [19:11:05] logs in on racktables [19:11:38] I don't think I have access to racktables. [19:11:44] HW type:Dell PowerEdge 1950 [19:12:24] cajoel: it's _kind of_ old :) so i think pretty much anything misc will be fine [19:12:33] but we need to make a procurement ticket [19:12:43] can I some shoulder surf? [19:12:47] yes, sure [19:13:04] brt (leaving my laptop plugged in to desk) [19:15:10] there is a ticket already for sanger replacement [19:15:25] https://rt.wikimedia.org/Ticket/Display.html?id=7141 [19:15:48] now that the other misc systems have come in they can be allocated, its on my working list (im getting orders placed first, then allocating purchased hardware) [19:15:58] cajoel / mutante ^ [19:39:18] I think I'm going to try to send out a summary mail of sanger issues to ops. 
[19:39:30] just to encourage a broader conversation. [19:41:20] links those tickets (procurement etc.) [19:41:25] cajoel: "ETA 2014-05-07 [open]" [19:42:07] (what does your second line reference?) [19:42:40] cajoel: it's a ticket linked to the procurement ticket [19:42:49] https://rt.wikimedia.org/Ticket/Display.html?id=7145 [19:43:02] let me mail you , sec [19:45:04] k [19:45:21] RobH: tickets about hosts going to codfw, what are there fate? moved from tampa queue to codfw, when it is created ? [19:45:29] thanks for answers cajoel [19:45:36] mutante: templates/exim/exim4.conf.SMTP_IMAP_MM.erb [19:46:02] ori: Is the version of scap used by Beta Labs the master version or something else? Keen for the GitInfo change to have an effect there soon so I can see whether other changes are needed for e.g. VisualEditor. [19:46:06] "# Send mail for IMAP accounts to sanger" [19:47:16] cajoel: matanya: #6163 - shutdown sanger , #7141 - get replacement servers, #7145 - actual order [19:47:20] cajoel: ah! [19:47:22] James_F: i'm not sure when bd808 last updated it on beta [19:47:26] James_F: The cache files are present in beta now. I was just looking at includes/specials/SpecialVersion.php to figure out if there is a good way to circumvent the 24h memcache cache of the data shown there. [19:47:31] mutante: no access, as you may know [19:47:41] (03PS1) 10Rush: linter manual '-T xterm-256color' for noninteractive shells [operations/puppet] - 10https://gerrit.wikimedia.org/r/131771 [19:47:41] bd808: Aha, right. Awesome! Thank you. [19:48:00] mutante: can you try to run that sqllite qurty on mchenry to see what accounts are in USERDB [19:48:26] matanya: for hw repairs that were pending when server was shipped? 
i imagine that you are correct, they'll move to codfw queue, but lemme think about it [19:48:31] (03CR) 10Rush: [C: 032 V: 032] "pushing through no linting to be done" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131771 (owner: 10Rush) [19:48:50] matanya: robh/cmjohnson will know [19:48:52] i dont wanna say yes and then think of some horrible reason i shouldnt have agreed to in 5 minutes, heh [19:48:54] cajoel: yep [19:48:56] bd808: in a pinch: tcpdump -i eth0 -s 65535 -A -ttt port 11211 | cut -c 9- | grep -i '^get' | cut -d' ' -f2 [19:49:12] RobH: yes, that too, but also things like mexia for example [19:49:13] bd808: on memcN nodes, to see which keys the request is getting [19:49:47] ori: Neat trick. [19:53:34] James_F, ori: Ah ha. The cache will be invalidated as soon as the sha1 for MW-core changes. [19:54:29] bd808: Time to merge something in MW? ;- [19:55:24] James_F: Well you could review and merge https://gerrit.wikimedia.org/r/#/c/131764/ [19:56:07] matanya: mexia is still in pmtpa [19:56:14] so it wont move out of that queue until its decomissioned [19:56:25] not sure if thats what you meant [19:56:32] bd808: Shouldn't there be a \n between "if (" and "is_file" when it's a multi-line if? [19:56:44] bd808: Or is that JS-only WMF coding standard? [19:56:47] mexia is going to be a fairly special case, I think we'll have to update registrars before we move it [19:57:07] unless we have some crazy pan about routing tampa ip space to ulsfo or something [19:57:08] James_F: Dunno. Let me look at other multiline conditionals in that file... [19:57:13] s/pan/plan/ [19:57:20] RobH: i mean at last after decom in tampa, it will move for shipping, then it will need to enter a new queue [19:57:39] bd808: Apparently it's normal. [19:57:41] * James_F shrugs. 
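(Editor's note on ori's tcpdump one-liner above: the `cut | grep | cut` part keeps lines that start with "get" after a fixed-width prefix is dropped and prints the second space-separated field, which is the requested memcached key. Below is a rough Python equivalent of just that filtering step, assuming the captured text is already available as lines. `requested_keys`, the `skip` parameter and the sample keys are hypothetical. Unlike `cut`, this sketch skips lines that have no second field.)

```python
def requested_keys(lines, skip=8):
    """Return the keys named in memcached text-protocol 'get' lines.

    skip mirrors `cut -c 9-`: the first 8 characters of each line are dropped
    before matching. Matching is case-insensitive, like `grep -i '^get'`.
    """
    keys = []
    for line in lines:
        payload = line[skip:]
        if not payload.lower().startswith("get"):
            continue
        fields = payload.rstrip("\r\n").split(" ")
        if len(fields) >= 2:  # a bare "get" with no key is ignored here
            keys.append(fields[1])
    return keys


if __name__ == "__main__":
    # Made-up capture lines with an 8-character placeholder prefix.
    sample = [
        "........get wiki:key1\r\n",
        "........VALUE wiki:key1 0 5\r\n",
        "........GET other:key extra\r\n",
    ]
    for key in requested_keys(sample):
        print(key)
```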
[19:58:09] matanya: not always [19:58:13] that one is old, it prolly wont move [19:58:17] but be decommissioned entirely [19:58:20] so its a case by case basis [19:58:30] (we wont bother makign that call until we replace its use elsewhere) [19:58:37] cajoel: i mailed you a list of the IMAP users [19:58:44] thx [19:58:59] if it decoms, it just resolves the ticket after decom, doesnt move queues [19:59:10] thanks for clarifying this RobH [19:59:12] and if it moves, it also prolly still just resolves that ticket [19:59:18] and a new ticket ot rack as a new name elsewhere [19:59:33] i think the only tickets we plan to move is moving the ones where they are in hw fault [19:59:33] yes, got it now [19:59:36] cool [20:00:05] sorry for delay in clarification, phone rang, heh [20:01:22] cajoel: another one, incl. the "active"(0/1) column info [20:02:23] mutante: sent you a draft email -- anything missing? [20:03:45] mutante: the email list doesn't line up with the directory we saw... [20:06:17] is is possible that mchenry passes all these mails to sanger, and then sanger has a second delivery lookup to check if it should drop in to imap or go on to google? [20:06:32] (that would explain the mail outage for many users when sanger was down) [20:06:53] cajoel: the draft seems good to me, we can add 2 more tickets though [20:07:02] one would now be "ferm rule for sanger" [20:07:05] matanya: ^ [20:07:13] and the other one would be about monitoring for LDAP [20:07:41] cajoel: yes, i think that is the case [20:07:57] can I come look at the sanger email config again with you? [20:08:01] (mchenry -> sanger -> lookup -> imap or google) [20:08:13] and/or can we open an access ticker to add me to sanger? [20:08:26] a'la jdavis and/or a role [20:08:29] oit role? 
[20:08:49] cajoel: yes, let's do that, oit role +1 [20:08:56] and i can mail you config now [20:11:09] (03PS1) 10MaxSem: Enable GeoData collection mode everywhere [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131855 [20:13:17] (03PS1) 10Rush: diamond::generic manifest and updates to match [operations/puppet] - 10https://gerrit.wikimedia.org/r/131857 [20:14:42] cajoel: check for mail from sanger itself , and in there "user_filter:" [20:15:37] and local_user: [20:15:43] # Written on 2007-05-12 by Mark Bergsma [20:15:44] :) [20:15:58] (03CR) 10Manybubbles: [C: 031] "Lets get a window for this?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131855 (owner: 10MaxSem) [20:16:21] 7 year anniversary of that config [20:16:25] coming right up [20:17:46] greg-g, can we just SWAT ^^^ since it doesn't actually change anything until you reindex ES? [20:17:50] !log demon synchronized php-1.24wmf2/extensions/CirrusSearch/includes/Hooks.php 'I2638b695: fix for page moves' [20:17:57] Logged the message, Master [20:18:18] !log demon synchronized php-1.24wmf3/extensions/CirrusSearch/includes/Hooks.php 'I2638b695: fix for page moves' [20:18:25] Logged the message, Master [20:19:49] !log demon synchronized php-1.24wmf2/extensions/CirrusSearch/CirrusSearch.php 'I2638b695: fix for page moves' [20:19:51] !log demon synchronized php-1.24wmf3/extensions/CirrusSearch/CirrusSearch.php 'I2638b695: fix for page moves' [20:19:56] Logged the message, Master [20:20:04] Logged the message, Master [20:25:57] MaxSem: sure [20:25:59] !log demon synchronized php-1.24wmf2/extensions/CirrusSearch/includes/Hooks.php 'Fix typehinting' [20:26:06] Logged the message, Master [20:26:13] thanks greg-g [20:26:27] <^d> Debug in production. [20:26:29] <^d> Fun times. [20:26:38] mutante: is that sqllite db being pushed out with puppet? 
[20:26:58] cajoel: i don't think so, no [20:27:20] !log demon synchronized php-1.24wmf3/extensions/CirrusSearch/includes/Hooks.php 'Fix typehinting' [20:27:26] Logged the message, Master [20:27:51] (03CR) 10Dzahn: diamond::generic manifest and updates to match (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/131857 (owner: 10Rush) [20:28:41] (03CR) 10Dzahn: diamond::generic manifest and updates to match (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/131857 (owner: 10Rush) [20:31:44] (03CR) 10Dzahn: [C: 031] diamond::generic manifest and updates to match [operations/puppet] - 10https://gerrit.wikimedia.org/r/131857 (owner: 10Rush) [20:31:49] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Tue May 6 08:27:45 2014 [20:32:32] (03PS2) 10Rush: diamond::generic manifest and updates to match [operations/puppet] - 10https://gerrit.wikimedia.org/r/131857 [20:35:53] (03CR) 10Rush: diamond::generic manifest and updates to match (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/131857 (owner: 10Rush) [20:36:10] So...moving pages on officewiki seems to be giving me Wikimedia Foundation Errors [20:36:22] PHP fatal error in /usr/local/apache/common-local/php-1.24wmf2/extensions/CirrusSearch/includes/Hooks.php line 325: [20:36:25] Call to a member function getTitle() on a non-object [20:36:45] well, ^d just sync'd some debugging stuff [20:36:48] ^d: ^^?? 
[20:36:57] <^d> Gahh [20:36:57] (03CR) 10Dzahn: diamond::generic manifest and updates to match (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/131857 (owner: 10Rush) [20:36:59] (03CR) 10Rush: [C: 032 V: 032] "merging on daniel's +1 since mine was just comments" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131857 (owner: 10Rush) [20:37:45] (03CR) 10Dzahn: "yep, replied to comments, +1" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131857 (owner: 10Rush) [20:37:46] <^d> I'll live hack the fix because it's bad. [20:37:51] (03PS1) 10Jkrauska: Add oit admin group with jkrauska, Add access to sanger RT #7428 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131863 [20:37:53] hah [20:37:56] <^d> I'll get changes in gerrit asap too [20:38:12] <^d> I tested this crap too [20:38:13] <^d> blargh [20:41:04] (03CR) 10Dzahn: [C: 031] "lgtm, this should be sufficient since sanger has a public IP. moved ticket to access-requests. it just needs to wait for 3 days" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131863 (owner: 10Jkrauska) [20:42:33] !log demon synchronized php-1.24wmf2/extensions/CirrusSearch 'Rolling Cirrus back to known-good state' [20:42:39] Logged the message, Master [20:43:05] <^d> greg-g: I aborted everything I was doing rather than play whack a mole and get deployment further out of sync. [20:43:22] !log demon synchronized php-1.24wmf3/extensions/CirrusSearch 'Rolling Cirrus back to known-good state' [20:43:27] Logged the message, Master [20:53:45] ^d: thanks. 
[21:03:57] (03PS1) 10Dzahn: add redirect for w.wiki / *.w.wiki to www index [operations/apache-config] - 10https://gerrit.wikimedia.org/r/131869 [21:05:34] (03Abandoned) 10Dzahn: add w.wiki, *.w.wiki [operations/apache-config] - 10https://gerrit.wikimedia.org/r/131750 (owner: 10Dzahn) [21:07:49] (03PS2) 10Dzahn: add redirect for w.wiki / *.w.wiki to www index [operations/apache-config] - 10https://gerrit.wikimedia.org/r/131869 [21:36:32] (03CR) 10Kaldari: [C: 04-1] add redirect for w.wiki / *.w.wiki to www index (031 comment) [operations/apache-config] - 10https://gerrit.wikimedia.org/r/131869 (owner: 10Dzahn) [21:43:55] (03PS3) 10Dzahn: add redirect for w.wiki / *.w.wiki to www index [operations/apache-config] - 10https://gerrit.wikimedia.org/r/131869 [21:43:58] (03CR) 10jenkins-bot: [V: 04-1] add redirect for w.wiki / *.w.wiki to www index [operations/apache-config] - 10https://gerrit.wikimedia.org/r/131869 (owner: 10Dzahn) [21:44:04] bah [21:44:26] (03PS4) 10Dzahn: add redirect for w.wiki / *.w.wiki to www index [operations/apache-config] - 10https://gerrit.wikimedia.org/r/131869 [21:45:43] (03CR) 10Dzahn: [C: 032] add w.wiki, link to wikipedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/131627 (owner: 10Dzahn) [21:48:39] (03CR) 10Kaldari: [C: 031] add redirect for w.wiki / *.w.wiki to www index [operations/apache-config] - 10https://gerrit.wikimedia.org/r/131869 (owner: 10Dzahn) [21:50:09] (03CR) 10Dzahn: [C: 032] add redirect for w.wiki / *.w.wiki to www index [operations/apache-config] - 10https://gerrit.wikimedia.org/r/131869 (owner: 10Dzahn) [21:57:42] (03CR) 10Dzahn: "http://w.wiki" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/131869 (owner: 10Dzahn) [21:57:43] !log gracefull'ing apaches [21:57:49] Logged the message, Master [22:10:46] (03CR) 10Dzahn: "http://whois.nic.wiki/" [operations/dns] - 10https://gerrit.wikimedia.org/r/131627 (owner: 10Dzahn) [22:27:22] (03PS1) 10MaxSem: Disable mobile redirection for 
nostalgia.wikipedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/131887 (https://bugzilla.wikimedia.org/64972) [22:33:00] PROBLEM - MySQL Processlist on db1059 is CRITICAL: CRIT 76 unauthenticated, 0 locked, 0 copy to table, 1 statistics [22:33:59] RECOVERY - MySQL Processlist on db1059 is OK: OK 1 unauthenticated, 0 locked, 0 copy to table, 1 statistics [22:39:12] kaldari: try again :) [22:39:36] mutante: nice :) [22:41:02] eh, what happened to db1059 ^^^? query killer? [22:41:56] got ori ? [22:53:04] (03PS1) 10Rush: first diamond nodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/131891 [22:58:29] (03CR) 10Rush: [C: 032 V: 032] "verified all hosts have resources headroom, will babysit this. going for it." [operations/puppet] - 10https://gerrit.wikimedia.org/r/131891 (owner: 10Rush) [23:08:38] (03PS1) 10MaxSem: Add (ten|quality).m.wikipedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/131896 (https://bugzilla.wikimedia.org/64972) [23:11:51] (03PS2) 10MaxSem: Disable mobile redirection for a bunch of .wikipedia.org domains [operations/puppet] - 10https://gerrit.wikimedia.org/r/131887 (https://bugzilla.wikimedia.org/64972) [23:12:13] (03PS1) 10Rush: few more seed nodes for diamond [operations/puppet] - 10https://gerrit.wikimedia.org/r/131898 [23:12:29] MaxSem, anyone deploying your thing? [23:12:42] mwalker, nope? 
[23:12:54] kk; I'll do it [23:12:56] I can do now [23:13:19] ok; sure [23:13:25] MaxSem, on your plate [23:13:31] * mwalker goes back to debugging payments [23:13:32] thanks for reminding:) [23:13:55] (03CR) 10Rush: [C: 032 V: 032] "rolling this out" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131898 (owner: 10Rush) [23:14:22] (03CR) 10MaxSem: [C: 032] Enable GeoData collection mode everywhere [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131855 (owner: 10MaxSem) [23:15:59] (03Merged) 10jenkins-bot: Enable GeoData collection mode everywhere [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131855 (owner: 10MaxSem) [23:18:54] !log maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/131855' [23:19:01] Logged the message, Master [23:32:49] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Tue May 6 08:27:45 2014 [23:35:12] (03PS1) 10Dzahn: account for kleduc and add to admins::privatedata [operations/puppet] - 10https://gerrit.wikimedia.org/r/131905 [23:35:23] Eloquence: ping [23:37:18] (03CR) 10Dzahn: [C: 04-2] "please add the correct SSH key, confirm it and waiting period" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131905 (owner: 10Dzahn) [23:41:44] (03PS1) 10Withoutaname: Delete ve.wikimedia.org and leave redirect [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131907 (https://bugzilla.wikimedia.org/55737) [23:47:25] preilly, what's up? [23:47:43] Eloquence: may I PM [23:48:05] (03PS1) 10Dzahn: add kleduc to analytics users for hadoop access [operations/puppet] - 10https://gerrit.wikimedia.org/r/131908 [23:48:26] sure :) [23:50:32] (03CR) 10Dzahn: [C: 04-2] "only after Ie0386ce426 and ACK from analytics ops" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131908 (owner: 10Dzahn) [23:51:25] (03PS2) 10Dzahn: fix apache-graceful-all for use in eqiad [operations/puppet] - 10https://gerrit.wikimedia.org/r/130600