[00:00:36] James_F: The VE config change is now live according to Erik B
[00:01:03] greg-g: can i push out another config update for zerowiki in a bit?
[00:02:11] RoanKattouw: Yes, I know. :-)
[00:02:30] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 7.475 second response time
[00:03:50] http://privatekeycheck.com/ :D
[00:07:08] hoo: thx, made me smile
[00:07:41] :)
[00:08:02] yurikR: suppose so
[00:12:23] (PS1) Yurik: zerowiki enable sysop to add/remove all groups [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125045
[00:12:44] greg-g: ^
[00:13:24] yurikR: is there local wiki consensus? :P
[00:13:31] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 227867 bytes in 8.242 second response time
[00:14:49] greg-g: yes - the whole community of 1 user has agreed to it
[00:15:06] yurikR: :)
[00:15:50] 100% yes, 100% participation... a rare occurrence
[00:15:55] yurikR: Link to discussion? You know, for recording purposes :p
[00:16:23] JohnLewis: +1
[00:17:21] (CR) Yurik: [C: 2] "JohnLewis & greg-g concurred" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125045 (owner: Yurik)
[00:17:39] there!
[00:18:28] yurikR: If we're the community, then sure.
[00:18:47] bummer... i just can't please everyone, can i?
[00:18:52] true wiki spirit
[00:18:55] (Merged) jenkins-bot: zerowiki enable sysop to add/remove all groups [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125045 (owner: Yurik)
[00:19:11] yurikR: 'Sorry, en.zero.wikipedia.org is only supported by select mobile carriers and is not available from your mobile carrier.' :(
[00:20:39] !log yurik synchronized wmf-config/InitialiseSettings.php
[00:20:44] Logged the message, Master
[00:20:50] wow, how come it went so fast now?
[00:21:18] Because the wiki has Zero weight :D
[01:05:31] !log Undid local patch to "grunt-lib-phantomjs/phantomjs/main.js" (for bug 63579) in "/srv/deployment/integration/slave-scripts" on gallium
[01:05:39] Logged the message, Master
[01:06:18] !log git-deploy: Deploying integration/slave-scripts 'If2539ccb3152bd0'
[01:06:22] Logged the message, Master
[01:44:50] (PS1) Ori.livneh: Use '$channel' to branch on interpreter [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125062
[01:46:31] (PS2) Ori.livneh: Use '$channel' to branch on interpreter [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125062
[01:47:26] (CR) Ori.livneh: [C: 2] Use '$channel' to branch on interpreter [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125062 (owner: Ori.livneh)
[01:48:03] !log ori updated /a/common to {{Gerrit|I697f7e4a6}}: Use '$channel' to branch on interpreter
[01:48:10] Logged the message, Master
[01:48:44] !log ori synchronized wmf-config/CommonSettings.php 'I697f7e4a6: Use to branch on interpreter'
[01:48:49] Logged the message, Master
[01:49:17] !log ori synchronized wmf-config/InitialiseSettings-labs.php 'I697f7e4a6: Use to branch on interpreter'
[01:49:20] Logged the message, Master
[02:00:45] (PS1) Ori.livneh: Update multiversion regexp for *.beta-hhvm.wmflabs.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125063
[02:00:54] (CR) jenkins-bot: [V: -1] Update multiversion regexp for *.beta-hhvm.wmflabs.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125063 (owner: Ori.livneh)
[02:06:54] (PS1) Yurik: ZeroWiki: Add extra page to whitelist [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125064
[02:13:21] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 2787 MB (2% inode=99%):
[02:18:00] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[02:18:00] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[02:18:00] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[02:18:00] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[02:18:21] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3710 MB (3% inode=99%):
[02:22:13] !log LocalisationUpdate completed (1.23wmf20) at 2014-04-10 02:22:11+00:00
[02:22:19] Logged the message, Master
[02:48:30] PROBLEM - MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:49:40] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:50:21] RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[02:50:31] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[03:00:21] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:00:48] (PS2) Ori.livneh: Update multiversion regexp for *.beta-hhvm.wmflabs.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125063
[03:01:19] (CR) Ori.livneh: [C: 2] Update multiversion regexp for *.beta-hhvm.wmflabs.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125063 (owner: Ori.livneh)
[03:01:29] !log ori updated /a/common to {{Gerrit|Ibdbac982b}}: Update multiversion regexp for *.beta-hhvm.wmflabs.org
[03:01:37] Logged the message, Master
[03:01:59] !log ori synchronized multiversion/MWMultiVersion.php 'Ibdbac982b: Update multiversion regexp for *.beta-hhvm.wmflabs.org'
[03:02:03] Logged the message, Master
[03:19:21] !log LocalisationUpdate completed (1.23wmf21) at 2014-04-10 03:19:18+00:00
[03:19:26] Logged the message, Master
[03:31:16] (PS1) Ori.livneh: beta: correct wgCanonicalServer string format [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125070
[03:34:00] (PS2) Ori.livneh: beta: correct wgCanonicalServer string format [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125070
[03:34:22] (CR) Ori.livneh: [C: 2] beta: correct wgCanonicalServer string format [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125070 (owner: Ori.livneh)
[04:11:52] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Apr 10 04:11:47 UTC 2014 (duration 11m 46s)
[04:11:57] Logged the message, Master
[05:19:00] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[05:19:00] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[05:19:00] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[05:19:00] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[05:19:45] (CR) Giuseppe Lavagetto: "Fair enough :)" [operations/puppet] - https://gerrit.wikimedia.org/r/125025 (owner: Ori.livneh)
[06:15:35] !log Some interface messages are missing on wikidata.org. Started a manual l10nupdate.
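The '$channel' change merged above (Gerrit 125062) is explained later in this log by ori (09:05:34): '$channel' becomes one more string pattern that gets replaced, with 'beta-hhvm', 'beta' or 'stable'. The Python below is only a minimal sketch of that substitution, assuming a host-suffix rule for picking the channel; `pick_channel`, `expand`, the suffix table and the `$channel-settings.php` template are hypothetical and are not the PHP in operations/mediawiki-config.

```python
# Hypothetical sketch: '$channel' as one more placeholder in config strings.
# The suffix table and template are illustrative, not wmf-config values.
CHANNEL_SUFFIXES = (
    (".beta-hhvm.wmflabs.org", "beta-hhvm"),
    (".beta.wmflabs.org", "beta"),
)


def pick_channel(host: str) -> str:
    """Map a request host to a channel name (assumed rule)."""
    for suffix, channel in CHANNEL_SUFFIXES:
        if host.endswith(suffix):
            return channel
    return "stable"


def expand(template: str, channel: str) -> str:
    """Replace the literal '$channel' pattern in a settings string."""
    return template.replace("$channel", channel)


if __name__ == "__main__":
    for host in ("en.wikipedia.beta-hhvm.wmflabs.org",
                 "en.m.wikipedia.beta.wmflabs.org",
                 "en.wikipedia.org"):
        print(host, "->", expand("$channel-settings.php", pick_channel(host)))
```

Plain string replacement mirrors ori's description of '$channel' as "one of the string patterns that get replaced"; he calls the approach "slightly kludgy" himself, since it ties interpreter choice to the hostname.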
[06:15:39] Logged the message, Master
[06:24:58] (PS1) Ori.livneh: Avoid using bits on beta-hhvm.wmflabs.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125073
[06:26:11] (CR) Ori.livneh: [C: 2] Avoid using bits on beta-hhvm.wmflabs.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125073 (owner: Ori.livneh)
[06:26:55] !log ori updated /a/common to {{Gerrit|I20bbe05cc}}: Avoid using bits on beta-hhvm.wmflabs.org
[06:27:00] Logged the message, Master
[06:27:30] !log ori synchronized wmf-config/CommonSettings.php 'I20bbe05cc: Avoid using bits on beta-hhvm.wmflabs.org'
[06:27:34] Logged the message, Master
[06:27:35] !log LocalisationUpdate completed (1.23wmf20) at 2014-04-10 06:27:30+00:00
[06:27:40] Logged the message, Master
[06:27:57] ori: yeah no dice
[06:28:18] still broken
[06:28:23] that was wmf20
[06:28:25] though diffs now work
[06:28:26] wikidata.org is wmf21
[06:35:53] Jasper_Deng: give it another few minutes to finish rebuilding the wmf21 cache
[06:46:01] !log LocalisationUpdate completed (1.23wmf21) at 2014-04-10 06:45:59+00:00
[06:46:06] Logged the message, Master
[06:47:20] Jasper_Deng: ori still broken on that wikidata recentchanges page
[06:47:28] I don't see it anywhere else on a wmf21 wiki
[06:47:42] that's for Lydia to know then
[06:47:51] I'm falling asleep at the keyboard, mind poking her for me?
[06:48:32] * greg-g zonks out
[06:48:34] g'night
[07:19:09] (PS2) BBlack: Add proxy support for 437-01. [operations/puppet] - https://gerrit.wikimedia.org/r/124887 (owner: Dr0ptp4kt)
[07:19:57] (CR) BBlack: [C: 2 V: 2] Add proxy support for 437-01. [operations/puppet] - https://gerrit.wikimedia.org/r/124887 (owner: Dr0ptp4kt)
[07:21:21] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Apr 10 07:21:16 UTC 2014 (duration 7m 21s)
[07:21:26] Logged the message, Master
[07:39:31] localisation update fail????
[07:41:12] ori: ? any idea?
[07:42:49] aude: nope. i haven't investigated it. i ran a manual update because that fixed things before. it didn't, so the problem could be unrelated.
[07:45:02] i will put wikidata back on wmf20, and then we can investigate
[07:45:08] test wikidata would still be broken
[07:45:23] it's like all the wikibase messages are missing
[07:46:36] (PS1) Aude: Put wikidata back on wmf/1.23wmf20 due to localisation cache issues [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125091
[07:46:45] i might have an idea
[07:47:07] extension messages might be out of sync with the version of wikibase in wmf21
[07:48:18] i don't quite know how to fix it and don't want to touch it myself, but we probably need extension messages rebuilt, or just keep wikidata on wmf20
[07:48:28] and then on wmf22 next week
[07:48:44] well, i rebuilt them by forcing an l10nupdate, it didn't help
[07:48:46] nevermind
[07:48:53] hm?
[07:48:59] there are also messages in client (e.g. wikipedia)
[07:49:08] so we can't just skip wmf21
[07:49:29] like the 13th floor of some hotels
[07:49:33] (CR) Aude: [C: 2] Put wikidata back on wmf/1.23wmf20 due to localisation cache issues [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125091 (owner: Aude)
[07:49:49] we can take time to figure it out when people are around
[07:50:36] (PS8) Rush: module to manage new python-diamond package [operations/puppet] - https://gerrit.wikimedia.org/r/124608
[07:50:52] (Merged) jenkins-bot: Put wikidata back on wmf/1.23wmf20 due to localisation cache issues [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125091 (owner: Aude)
[07:52:08] (CR) ArielGlenn: [C: 1] "straight copy, and change of contents to follow separately as I understand it, then this is good to go by me." [operations/puppet] - https://gerrit.wikimedia.org/r/123852 (owner: Dzahn)
[07:52:21] (CR) Andrew Bogott: [C: 1] module to manage new python-diamond package [operations/puppet] - https://gerrit.wikimedia.org/r/124608 (owner: Rush)
[07:53:09] !log aude synchronized wikiversions.json 'Put Wikidata back on 1.23wmf20, due to localisation cache issues'
[07:53:15] Logged the message, Master
[07:54:08] ori: is there more i have to do than wikiversions.json? rebuild cdb?
[07:54:35] (PS9) Rush: module to manage new python-diamond package [operations/puppet] - https://gerrit.wikimedia.org/r/124608
[07:54:57] i think so
[07:55:11] (PS1) Alexandros Kosiaris: ignore vnet interfaces in check_eth [operations/puppet] - https://gerrit.wikimedia.org/r/125094
[07:55:43] sync-wikiversions
[07:55:49] yes
[07:55:53] also, i think i know what might be up
[07:55:58] (PS1) Springle: Enable engine condition pushdown for dbstore [operations/puppet] - https://gerrit.wikimedia.org/r/125096
[07:56:22] ok?
[07:56:27] i'll do that and then we can look
[07:56:37] (PS10) Rush: module to manage new python-diamond package [operations/puppet] - https://gerrit.wikimedia.org/r/124608
[07:57:03] !log aude rebuilt wikiversions.cdb and synchronized wikiversions files: Rebuild wikiversions and put wikidata on 1.23wmf20
[07:57:08] Logged the message, Master
[07:57:25] (CR) Alexandros Kosiaris: [C: 2] ignore vnet interfaces in check_eth [operations/puppet] - https://gerrit.wikimedia.org/r/125094 (owner: Alexandros Kosiaris)
[07:59:03] (CR) Andrew Bogott: [C: 1] module to manage new python-diamond package [operations/puppet] - https://gerrit.wikimedia.org/r/124608 (owner: Rush)
[07:59:29] but still seems broken
[07:59:40] PROBLEM - check configured eth on virt1007 is CRITICAL: virbr0 reporting no carrier.
[07:59:42] (CR) Rush: [C: 2] module to manage new python-diamond package [operations/puppet] - https://gerrit.wikimedia.org/r/124608 (owner: Rush)
[08:00:24] (PS2) Hashar: contint: apply maven settings on labs slaves [operations/puppet] - https://gerrit.wikimedia.org/r/124822
[08:03:24] (PS2) Springle: Enable engine condition pushdown for dbstore [operations/puppet] - https://gerrit.wikimedia.org/r/125096
[08:03:57] w/o js is good
[08:04:15] just js is broken
[08:05:31] PROBLEM - check configured eth on virt1006 is CRITICAL: virbr0 reporting no carrier.
[08:05:45] (CR) Springle: [C: 2] Enable engine condition pushdown for dbstore [operations/puppet] - https://gerrit.wikimedia.org/r/125096 (owner: Springle)
[08:07:20] PROBLEM - check configured eth on virt1004 is CRITICAL: virbr0 reporting no carrier.
[08:08:00] PROBLEM - check configured eth on virt1003 is CRITICAL: virbr0 reporting no carrier.
[08:09:42] trying to touch stuff
[08:09:54] !log aude synchronized php-1.23wmf20/extensions/Wikidata
[08:09:59] Logged the message, Master
[08:12:01] https://www.wikidata.org/wiki/Q60?debug=true is broken also
[08:12:06] still
[08:17:38] (PS3) Dzahn: puppetize apache-graceful-all [operations/puppet] - https://gerrit.wikimedia.org/r/123852
[08:19:26] trying to touch more js
[08:19:34] !log aude synchronized php-1.23wmf20/extensions/Wikidata
[08:20:00] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[08:20:00] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[08:20:00] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[08:20:00] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[08:22:07] js still broken
[08:22:15] * aude not sure what else to do :(
[08:22:21] (PS3) Hashar: contint: apply maven settings on labs slaves [operations/puppet] - https://gerrit.wikimedia.org/r/124822
[08:22:21] ori: still around?
[08:22:40] messages work in non-js
[08:22:42] broken in js
[08:23:25] (PS4) Hashar: contint: apply maven settings on labs slaves [operations/puppet] - https://gerrit.wikimedia.org/r/124822
[08:24:18] run scap, but seems scary
[08:24:20] ?
[08:26:21] PROBLEM - check configured eth on virt1005 is CRITICAL: virbr0 reporting no carrier.
[08:26:44] aude: step 1: take a break for a moment.
[08:27:15] ok
[08:27:20] * aude trying to debug
[08:27:21] it's no good to try everything in a panic; that's usually how our biggest outages happen (a careless, rushed fix for a small outage)
[08:27:28] of course
[08:27:53] are you on the engineering list?
[08:27:56] no
[08:27:57] (CR) Hashar: [C: 1 V: 2] "Cherry picked on integration-puppetmaster.eqiad.wmflabs PS2-4 were meant to fix a permission issue (file belonged to root:root) which cau" [operations/puppet] - https://gerrit.wikimedia.org/r/124822 (owner: Hashar)
[08:28:10] ops
[08:32:52] we also have stuff like "<wikibase-dataitem>" in wikivoyage
[08:33:13] so i suppose we need to get wmf21 fixed ... a blocker for wikipedia
[08:34:36] (CR) Alexandros Kosiaris: [C: 2] contint: apply maven settings on labs slaves [operations/puppet] - https://gerrit.wikimedia.org/r/124822 (owner: Hashar)
[08:34:41] (PS1) BBlack: add public service IPs to lvs300[1234] [operations/puppet] - https://gerrit.wikimedia.org/r/125101
[08:36:40] (PS2) BBlack: add public service IPs to lvs300[1234] [operations/puppet] - https://gerrit.wikimedia.org/r/125101
[08:36:57] (CR) BBlack: [C: 2 V: 2] add public service IPs to lvs300[1234] [operations/puppet] - https://gerrit.wikimedia.org/r/125101 (owner: BBlack)
[08:37:25] aude: i forwarded the last message to the wikidata list just now
[08:38:06] ok
[08:39:07] when i do wfMessage( 'wikidata-edit' ) .... in wmf21, broken of course
[08:39:21] in wmf20, it's correct on php side, as we see on wikidata
[08:39:41] just wonder why the js doesn't see them
[08:42:08] !log forcing Bugzilla logout for all users
[08:42:13] Logged the message, Master
[08:43:30] aude: try /usr/local/bin/mwscript extensions/WikimediaMaintenance/refreshMessageBlobs.php --wiki=wikidatawiki
[08:43:41] ok
[08:44:20] aude: as l10nupdate user ideally. have you started yet?
[08:44:59] not yet
[08:45:20] i'll try on testwikidata first, even though it will do no good
[08:47:21] nope
[08:48:14] Does anybody know what server you connect to when you save a page?
[08:48:27] It's for a school that wants to locate the vandals. :)
[08:50:54] (PS1) Rush: collector definition. CPU and Network collector enable. [operations/puppet] - https://gerrit.wikimedia.org/r/125104
[08:55:09] !log ms6 - shutdown -h now
[08:55:14] Logged the message, Master
[08:55:26] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[08:58:29] (PS1) BBlack: add lvs300[1234] to lvs::configuration [operations/puppet] - https://gerrit.wikimedia.org/r/125105
[08:58:29] akosiaris: regarding "virbr0 reporting no carrier", is it important that I sort out and understand that, or can we just turn off that check?
[08:58:45] hashar: http://en.wikipedia.beta-hhvm.wmflabs.org/wiki/Main_Page
[08:58:49] in other words… is that check a hard-coded enumeration of interfaces, or does it automatically check every interface that it can find?
[08:59:35] …actually, was that one you or cmjohnson1?
[09:00:50] (CR) BBlack: [C: 2 V: 2] add lvs300[1234] to lvs::configuration [operations/puppet] - https://gerrit.wikimedia.org/r/125105 (owner: BBlack)
[09:02:35] andrewbogott: it checks every interface it can find, but we can skip interfaces (we already skip alias interfaces and vnets). But it may be better to actually know why that virbr0 interface is there (and is not there on some hosts).
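akosiaris' answer above (09:02:35) describes check_eth as checking every interface it finds except alias and vnet interfaces, and andrewbogott's question is whether virbr0 should simply join that skip list. A minimal hypothetical Python sketch of such a filter follows; it is not the real check_eth plugin, and both the "':' in the name means alias" rule and prefix matching for vnet/virbr are assumptions.

```python
"""Hypothetical sketch of check_eth-style interface selection.

Assumptions (not taken from the real plugin): alias interfaces contain a
':' (e.g. eth0:1), VM tap devices start with 'vnet', and callers may pass
extra prefixes (such as 'virbr') to skip more interfaces.
"""
import os
from typing import Iterable, List


def interfaces_to_check(names: Iterable[str],
                        extra_skip_prefixes: Iterable[str] = ()) -> List[str]:
    skip_prefixes = ("vnet",) + tuple(extra_skip_prefixes)
    selected = []
    for name in sorted(names):
        if ":" in name:                      # alias interface, e.g. eth0:1
            continue
        if name.startswith(skip_prefixes):   # vnet* plus any extra prefixes
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    sample = ["eth0", "eth0:1", "vnet0", "vnet3", "virbr0"]
    print(interfaces_to_check(sample))             # virbr0 is still checked
    print(interfaces_to_check(sample, ["virbr"]))  # virbr0 skipped as well
    # On a Linux host, real interface names can be listed from /sys/class/net.
    if os.path.isdir("/sys/class/net"):
        print(interfaces_to_check(os.listdir("/sys/class/net")))
```

Adding "virbr" would silence the virt1003 to virt1007 alerts, but as akosiaris notes, finding out why virbr0 exists on only some hosts may be the better fix than widening the skip list.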
[09:02:59] ori: you are the best :-]
[09:03:12] ori: did you get the cssmin corruption solved / figured out?
[09:04:20] hashar: no, it went away on its own. i did my usual thing of changing things before i completely understood them. there are a lot of configuration values that are set on the basis of the request host and i didn't handle all of them
[09:04:47] !log Jenkins: a bunch of jobs are not being triggered properly. Taking traces.
[09:04:52] Logged the message, Master
[09:05:15] ori: that would explain it. You might want to restart the bits varnish to clear out RL entries
[09:05:34] i ended up with something slightly kludgy, which is to add '$channel' (like the chrome release channels -- not a perfect analogy, i admit) as one of the string patterns that get replaced (with 'beta-hhvm' or 'beta' or 'stable')
[09:05:44] had to stop using bits, too, actually
[09:05:50] :-D
[09:05:56] the req-host rewriting for bits load.php urls needs a literal domain-name suffix (any regular expression syntax is escaped), and it only allows for a maximum of three dynamic parts that are passed to the application stack
[09:06:10] so given the suffix 'beta.wmflabs.org', you can have en.m.wikipedia.beta.wmflabs.org (the three dynamic parts being 'en', 'm', and 'wikipedia'). if i set the suffix to just 'wmflabs.org' then en.wikipedia.beta-hhvm.wmflabs.org would work but en.m would not
[09:06:28] akosiaris: OK, I'll make an attempt at understanding :)
[09:06:39] changing that means changing the vcl for prod too so i didn't want to touch that
[09:07:05] hashar: if you are wondering how i typed that so quickly it's because i copy-pasted from a chat with aaron earlier :P
[09:07:19] * hashar drops packets
[09:07:20] :D
[09:08:20] hashar: i think in hindsight you were right, though
[09:09:19] at this point the configurations for the hhvm-powered wiki and the php-powered ones are a little different -- not a lot, but enough to make performance comparisons meaningless
[09:09:45] so it would probably have been better to have the varnishes delegate to a separate set of apache instances
[09:11:08] (CR) Dzahn: [C: 2] "it's down, bye" [operations/dns] - https://gerrit.wikimedia.org/r/124901 (owner: Matanya)
[09:11:31] bye bye ms6
[09:11:39] !log DNS update - removing ms6
[09:11:43] Logged the message, Master
[09:15:39] ori: well you can still split them :-)
[09:16:09] ori: I think it would be nice to have Varnish act as a director between apaches and hhvm instances. Would let us load the hhvm cluster gradually and easily fall back to the apache/zend application servers
[09:16:21] hashar: yes, i think you're right
[09:16:34] i know you said it before but i was foolish and didn't listen :P
[09:16:35] we could even make it a beta feature which would set a cookie
[09:16:42] if varnish is able to route based on a cookie
[09:16:50] yes, it is, and yeah, we could
[09:16:52] matanya: what do you mean on 122338 in line 231
[09:16:52] and we will need to vary the cache, which has various implications
[09:16:59] matanya: what should i align there
[09:17:14] * matanya is looking
[09:17:29] hashar: i think i'll stick with having both interpreters on each apache until our quarterly review on the 15th, since getting it working was one of the targets rob set
[09:17:29] ori: don't blame yourself!!
[09:17:37] but afterwards probably split
[09:17:56] mutante: the arrows
[09:18:08] ori: the main goal was to get hhvm running on beta and I think it is fulfilled.
[09:18:15] matanya: they look aligned to me
[09:18:26] matanya: 230 and 231 are aligned
[09:18:31] ori: now the team can think about the target architecture and how to migrate :-]
[09:18:43] i don't remember what i meant there
[09:18:50] might be just a mistake
[09:18:56] ori: we will also need to investigate all the available hhvm settings and finely tune them. It is going to take a while
[09:19:00] hashar: it's going to be pretty huge
[09:19:03] yeah, i was just about to say
[09:19:04] matanya: ok! i thought maybe you meant that the whole file is 2-spaces :p
[09:19:07] there are a lot more knobs to twiddle
[09:19:30] maybe i did, mutante, hard to tell; i have a memory leak
[09:20:02] matanya: alright, no worries, thanks
[09:20:15] !log Jenkins: unpooling both labs slaves using the web interface and killing the Jenkins client running as jenkins-deploy. Will repool so the job can be reregistered properly {{bug|63760}}
[09:20:20] Logged the message, Master
[09:23:41] (CR) Springle: [C: 1] collector definition. CPU and Network collector enable. [operations/puppet] - https://gerrit.wikimedia.org/r/125104 (owner: Rush)
[09:23:55] (PS2) Rush: collector definition. CPU and Network collector enable. [operations/puppet] - https://gerrit.wikimedia.org/r/125104
[09:24:19] (CR) Rush: [C: 2] collector definition. CPU and Network collector enable. [operations/puppet] - https://gerrit.wikimedia.org/r/125104 (owner: Rush)
[09:28:13] (CR) Rush: [V: 2] "jenkins where are you?" [operations/puppet] - https://gerrit.wikimedia.org/r/125104 (owner: Rush)
[09:29:08] !log Jenkins: disabling Gearman client in https://integration.wikimedia.org/ci/configure and reenabling it
[09:29:13] Logged the message, Master
[09:32:06] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[09:33:13] (CR) Ori.livneh: "modules/graphite/lib/puppet/parser/functions/configparser_format.rb adds a configparser_format() function that you can use to turn a Puppe" (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/125104 (owner: Rush)
[09:38:50] (CR) Rush: "checking out the configparser_format function. If it seems good I will follow up with that in a bit." (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/125104 (owner: Rush)
[09:39:23] (PS1) Ori.livneh: Move HHVM extension blacklist below extract($globals) so it isn't simply clobbered [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125108
[09:39:28] !log Zuul processed its backlog. Had to disconnect/reconnect the labs slaves. There is some weird bug occurring :-(
[09:39:34] Logged the message, Master
[09:40:55] (CR) Ori.livneh: collector definition. CPU and Network collector enable. (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/125104 (owner: Rush)
[09:41:47] (CR) Ori.livneh: [C: 2] Move HHVM extension blacklist below extract($globals) so it isn't simply clobbered [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125108 (owner: Ori.livneh)
[09:41:58] !log ori updated /a/common to {{Gerrit|I107179a27}}: Move HHVM extension blacklist below extract($globals) so it isn't simply clobbered
[09:42:04] Logged the message, Master
[09:44:07] !log ori synchronized wmf-config/CommonSettings.php 'I107179a27: Move HHVM extension blacklist below extract($globals) so it isn't simply clobbered'
[09:44:12] Logged the message, Master
[09:50:13] (PS1) BBlack: remove dysfunctional DNS recursor from LVSes [operations/puppet] - https://gerrit.wikimedia.org/r/125113
[09:50:46] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[09:52:42] (CR) BBlack: [C: 2 V: 2] remove dysfunctional DNS recursor from LVSes [operations/puppet] - https://gerrit.wikimedia.org/r/125113 (owner: BBlack)
[09:53:07] (PS1) Alexandros Kosiaris: Have check_sysctl populated on all hosts [operations/puppet] - https://gerrit.wikimedia.org/r/125116
[09:53:09] (PS1) Alexandros Kosiaris: Set up a check for LVS rp_filter is disabled [operations/puppet] - https://gerrit.wikimedia.org/r/125117
[09:54:34] (CR) jenkins-bot: [V: -1] Set up a check for LVS rp_filter is disabled [operations/puppet] - https://gerrit.wikimedia.org/r/125117 (owner: Alexandros Kosiaris)
[09:56:55] (CR) Aklapper: [C: -1] "01tonythomas: Did you test the patch on a local Bugzilla test instance and did it work?" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/124140 (owner: 01tonythomas)
[09:58:23] (CR) Alexandros Kosiaris: [C: 2] Have check_sysctl populated on all hosts [operations/puppet] - https://gerrit.wikimedia.org/r/125116 (owner: Alexandros Kosiaris)
[09:59:36] (PS1) Springle: Enable all MariaDB block-based joins algorithms [operations/puppet] - https://gerrit.wikimedia.org/r/125120
[10:02:09] (PS2) Alexandros Kosiaris: Set up a check for LVS rp_filter is disabled [operations/puppet] - https://gerrit.wikimedia.org/r/125117
[10:06:11] (PS2) Dzahn: lint role/deployment [operations/puppet] - https://gerrit.wikimedia.org/r/122338
[10:13:00] (CR) Matanya: "on labs i got:" [operations/puppet] - https://gerrit.wikimedia.org/r/125116 (owner: Alexandros Kosiaris)
[10:13:17] ori: ^ this might interest you
[10:59:15] mutante: can i configure RT in labs?
[10:59:41] there is already an instance, isn't there?
[10:59:48] * yuvipanda remembers setting up a proxy for one such thing
[11:01:47] where, yuvipanda?
[11:01:59] matanya: ottomata or drdee set it up, IIRC
[11:02:04] don't remember too well :(
[11:02:15] I set up a proxy at rt.wmflabs.org but that no longer seems to be around
[11:02:18] * yuvipanda pokes wikitech
[11:03:36] back
[11:04:06] matanya: aha! project packaging, instance 'rt'. status: SHUTOFF
[11:04:17] matanya: so i guess you can ignore it and assume it doesn't exist
[11:04:39] i don't find any puppet role i can apply
[11:06:24] matanya: what's your wikitech username so I can add you to the project?
[11:06:30] matanya
[11:06:31] matanya.
[11:06:35] haha :)
[11:06:51] :)
[11:06:55] added
[11:07:11] thanks
[11:07:17] :)
[11:07:32] I know nothing of the project / instance, just the fact it exists :)
[11:09:04] the box is dead :)
[11:09:14] i'll wait for mutante; he will know, i guess
[11:14:45] matanya: :) ok!
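ori's explanation earlier in this log (09:05:56 to 09:06:10) of the bits load.php host rewriting can be checked with a small Python sketch, assuming the rule is exactly as he states it: a literal suffix (regex syntax escaped) preceded by at most three dynamic labels. `build_host_pattern` and `dynamic_parts` are hypothetical names; the production rewriting lives in Varnish VCL and is not reproduced here.

```python
import re
from typing import List, Optional

# Hypothetical sketch of the bits req-host rule as described in the log:
# a literal suffix preceded by at most three dot-free "dynamic" labels.


def build_host_pattern(suffix: str) -> "re.Pattern[str]":
    label = r"([^.]+)"
    # re.escape keeps the suffix literal ('.' no longer matches any char).
    return re.compile(
        rf"{label}(?:\.{label})?(?:\.{label})?\.{re.escape(suffix)}"
    )


def dynamic_parts(host: str, suffix: str) -> Optional[List[str]]:
    """Return the labels passed on to the application, or None if no match."""
    match = build_host_pattern(suffix).fullmatch(host)
    if match is None:
        return None
    return [part for part in match.groups() if part is not None]


if __name__ == "__main__":
    print(dynamic_parts("en.m.wikipedia.beta.wmflabs.org", "beta.wmflabs.org"))
    print(dynamic_parts("en.wikipedia.beta-hhvm.wmflabs.org", "wmflabs.org"))
    print(dynamic_parts("en.m.wikipedia.beta-hhvm.wmflabs.org", "wmflabs.org"))
```

Under these assumptions, the suffix 'beta.wmflabs.org' does not match the beta-hhvm hosts at all, and widening it to 'wmflabs.org' pushes en.m.wikipedia.beta-hhvm to four labels. That is consistent with ori's choice to stop using bits for beta-hhvm rather than change the prod VCL.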
[11:15:05] thanks for the help
[11:18:54] (CR) Dzahn: lint role/deployment (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/122338 (owner: Dzahn)
[11:19:03] matanya: what's dead
[11:19:07] (CR) Springle: [C: 2] Enable all MariaDB block-based joins algorithms [operations/puppet] - https://gerrit.wikimedia.org/r/125120 (owner: Springle)
[11:19:24] mutante: i wanted to develop stuff for rt, and test it in labs
[11:19:42] matanya: i don't know about that instance
[11:19:42] but can't find a way to configure an rt role in labs
[11:19:59] matanya: role/rt.pp?
[11:20:34] mutante: i meant using manage instance
[11:20:37] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[11:20:37] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[11:20:37] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[11:20:37] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[11:21:04] matanya: it needs to be added to "puppet groups" then
[11:21:13] hold on
[11:21:57] (CR) Alexandros Kosiaris: [C: 2] Set up a check for LVS rp_filter is disabled [operations/puppet] - https://gerrit.wikimedia.org/r/125117 (owner: Alexandros Kosiaris)
[11:23:46] matanya: sooo.. a) create a new project or have an existing one, b) be a project admin of that project, c) add any puppet class you want to "puppet groups", d) configure the instance and apply
[11:24:11] matanya: existing project?
[11:24:17] yes mutante
[11:24:18] there's definitely a role class specifically for creating an RT labs instance...
[11:24:21] * andrewbogott looks for name
[11:24:28] rt-128 instance in the puppet project
[11:25:28] andrewbogott: role/rt.pp, right?
[11:25:36] that's for prod I think
[11:25:46] matanya, I can add you to the rt project if you like...
[11:26:02] oh, nm I deleted it apparently :)
[11:26:24] yes, yuvipanda added me, but it is down
[11:26:26] role::rt::labs
[11:26:48] Um… differs from the prod role, possibly for good reasons or possibly for bad
[11:27:06] matanya, what's it called? I can't find it
[11:27:10] (the rt proj)
[11:27:29] also: https://gerrit.wikimedia.org/r/#/c/116064/
[11:27:29] project packaging, instance 'rt'
[11:27:36] oh, that's something else
[11:27:37] how is it packaging
[11:27:50] oh, i was sure it was merged
[11:27:56] * andrewbogott is about to vanish again as meeting starts.
[11:28:19] i'll delay this effort until that is merged. thanks andrewbogott and mutante
[11:28:30] matanya: um… until what's merged?
[11:28:50] the change linked above
[11:29:02] https://gerrit.wikimedia.org/r/#/c/116064/
[11:29:13] oh, ok, I'm way behind then
[11:29:18] anyway, gotta go
[11:31:05] i'd have to say to be a real test it should be the identical role
[11:31:09] not role:labs
[11:31:20] but yea, we gotta run, ttyl matanya
[11:31:40] thanks and bye
[11:31:57] PROBLEM - RAID on nescio is CRITICAL: Connection refused by host
[11:53:07] PROBLEM - RAID on brewster is CRITICAL: CRITICAL: Active: 2, Working: 2, Failed: 1, Spare: 0
[11:53:27] RECOVERY - RAID on nescio is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[11:54:07] ACKNOWLEDGEMENT - RAID on brewster is CRITICAL: CRITICAL: Active: 2, Working: 2, Failed: 1, Spare: 0 alexandros kosiaris well known. Brewster is to be decom soon
[11:56:17] (PS1) Alexandros Kosiaris: Correct check_sysctl source path [operations/puppet] - https://gerrit.wikimedia.org/r/125173
[11:57:27] (CR) Alexandros Kosiaris: [C: 2] Correct check_sysctl source path [operations/puppet] - https://gerrit.wikimedia.org/r/125173 (owner: Alexandros Kosiaris)
[11:57:39] (CR) Alexandros Kosiaris: [V: 2] Correct check_sysctl source path [operations/puppet] - https://gerrit.wikimedia.org/r/125173 (owner: Alexandros Kosiaris)
[11:59:55] (PS1) BBlack: add esams.wmnet to strontium puppet allow_from [operations/puppet] - https://gerrit.wikimedia.org/r/125174
[12:03:04] (CR) BBlack: [C: 2 V: 2] add esams.wmnet to strontium puppet allow_from [operations/puppet] - https://gerrit.wikimedia.org/r/125174 (owner: BBlack)
[12:18:23] akosiaris: heh. thanks for that, too :)
[12:20:01] ori: ?
[12:20:23] the check_sysctl fix
[12:20:35] (CR) Matanya: "fixed in: https://gerrit.wikimedia.org/r/125173" [operations/puppet] - https://gerrit.wikimedia.org/r/125116 (owner: Alexandros Kosiaris)
[12:21:27] ah, yeah. still waiting on the check on some machines but hopefully this chapter is almost closed :-)
[12:29:59] (PS1) Hashar: jenkins: logrotate main file on a daily basis [operations/puppet] - https://gerrit.wikimedia.org/r/125178
[12:30:26] akosiaris: if you have some spare cycles I could use https://gerrit.wikimedia.org/r/125178 :D Adjust logrotate for the Jenkins log file :]
[12:30:59] (please)
[12:31:01] 180?
[12:31:05] days
[12:31:08] instead of 52 weeks
[12:31:09] 6 months of logs?
[12:31:23] that already cuts it by half, but I am fine with fewer logs, maybe 60 days :]
[12:31:45] you tell me.
[12:31:48] how much is one day of logs?
[12:31:59] 600MB uncompressed maybe?
[12:32:08] ouch
[12:32:43] the compressed weekly logs are 140M
[12:32:58] so maybe 20M per day
[12:33:06] sold!
[12:33:22] 3.6G for 6 months.
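The retention arithmetic in the logrotate discussion above (140M of compressed logs per week, so about 20M per day) can be reproduced in a few lines of Python. Only the 140M weekly figure comes from the log; spreading it evenly across the seven days and using 1 GB = 1000 MB are simplifying assumptions.

```python
# Rough disk-usage estimate for daily-rotated, compressed Jenkins logs.
WEEKLY_COMPRESSED_MB = 140  # figure quoted at 12:32:43
DAYS_PER_WEEK = 7


def retained_mb(days_kept: int, weekly_mb: float = WEEKLY_COMPRESSED_MB) -> float:
    """Estimate MB kept on disk when `days_kept` daily archives are retained."""
    per_day_mb = weekly_mb / DAYS_PER_WEEK  # about 20 MB per day
    return per_day_mb * days_kept


if __name__ == "__main__":
    # 52 weekly rotations (364 days), the proposed 180 days, and 60 days.
    for days in (364, 180, 60):
        print(f"{days} days: ~{retained_mb(days) / 1000:.1f} GB")
```

This prints about 7.3 GB for the old 52-week rotation, 3.6 GB for 180 days and 1.2 GB for the 60 days hashar offered, which matches the "3.6G for 6 months" estimate and the "cut it by half" remark.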
[12:33:26] yeah I think I am ok with that
[12:33:34] (PS2) Hashar: jenkins: logrotate main file on a daily basis [operations/puppet] - https://gerrit.wikimedia.org/r/125178
[12:33:37] and I filed a bug upstream to reduce the amount of logging
[12:33:53] the main culprit is the Jenkins Gearman client plugin which has been written by openstack
[12:34:01] it is solved upstream \O/
[12:34:32] LGTM but put a mode and ensure parameter
[12:35:19] isn't ensure present always there?
[12:36:23]