[00:00:03] Krinkle: It's not a bug yet, but the message name from mwalker is right
[00:00:10] mw.org is working fine
[00:00:42] > return wfMsg('multimediaviewer-viewfile-link');
[00:00:42] <multimediaviewer-viewfile-link>
[00:00:53] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Thu Jun 12 00:00:48 UTC 2014
[00:01:12] Krinkle: Weird.
[00:01:41] marktraceur: is this message available on all wikis?
[00:01:49] or a subset or certain mwversion?
[00:01:51] Only wmf8
[00:01:53] ok
[00:01:56] I was testing a wikipedia
[00:02:01] mw.org works OK
[00:02:02] mwscript extensions/WikimediaMaintenance/refreshMessageBlobs.php --wiki="$wiki"
[00:02:06] commons's API works fine
[00:02:08] mediawikiwiki
[00:02:08] > return wfMsg('multimediaviewer-viewfile-link');
[00:02:08] View original file
[00:02:33] and it works from eval.php as well, if it were failing there, it would rule out any rl_cache or bits cache
[00:02:40] yeah
[00:03:09] I'm not super heartbroken about this
[00:03:13] messagecache is the same between mw and commons, verified via cli to confirm, but yeah, message works on both.
[00:03:24] marktraceur: If you invalidate the bits url but keep debug=false, is the message there?
[00:03:26] It has to be RL related
[00:03:27] Then you may need to touch a file in the module
[00:03:33] No touching yet
[00:03:51] (I mean, we could, but I'd rather find the cause in the few minutes we have)
[00:03:54] Krinkle: How do I invalidate the bits url?
[00:04:03] marktraceur: Did you find the url that has the message error on it?
[00:04:30] Uh, no?
[00:04:40] When you load your code on a page, there is a load.php url in the stack that will have the function, css text and json message blob
[00:04:45] Krinkle: in debug mode, this is the outdated script:
[00:04:49] https://bits.wikimedia.org/commons.wikimedia.org/load.php?debug=true&lang=en&modules=jquery.color%2CcolorUtil%2Cfullscreen%7Cmmv%7Cmmv.ThumbnailWidthCalculator%2Capi%2Clightboximage%2Clightboxinterface%2Cmodel%2Cperformance%2Cprovider%2Crouting%2Cui%7Cmmv.model.FileUsage%2CImage%2CLicense%2CRepo%2CTaskQueue%2CThumbnail%2CThumbnailWidth%2CUser%7Cmmv.ui.canvas%2CcanvasButtons%2Ccategories%2Cdescription%2CfileUsage%2CmetadataPanel%2Cpermission%2Cprogr
[00:04:49] *nod*
[00:05:02] in there you will most certainly find mw.load.implement( yourmodule, function(){}, css, and { key: ' not value' }
[00:05:07] !log aaron Synchronized php-1.24wmf8/includes/EditPage.php: e11d41dd366b039bff79e247368b6bff1245ea5e (duration: 00m 07s)
[00:05:12] Logged the message, Master
[00:05:56] tgr: What is the timestamp at the end of that URL? (It got cut off)
[00:05:57] url cut off, none of those modules contain multimediaviewer-viewfile-link
[00:06:05] qchris: ping? can I haz new repo? :) mediawiki/extensions/UrlShortener?
[00:06:23] it's in mmv.ui.canvasButtons, right?
[00:06:24] YuviPanda: Sure.
[00:06:31] https://bits.wikimedia.org/commons.wikimedia.org/load.php?debug=true&lang=en&modules=mmv.ui.canvasButtons
[00:06:32] YuviPanda: Are we gonna have both UrlShortener and ShortUrl?
[00:06:45] So it works there
[00:06:48] RoanKattouw: indeed. currently I'm calling it ShortUrlv5, but beyond the joke value that sounds even more confusing
[00:06:57] http://pastebin.com/raw.php?i=JEAHFF87
[00:07:03] Krinkle: Well yes. It's almost certainly an outdated Varnish cache entry
[00:07:12] this means we're most definitely hitting the race condition that is caused by scap and having live servers in rotation. The timestamped url got cached before the message was there I guess.
[00:07:14] RoanKattouw: I'm implementing whatever was defined in the RfC. And we can't fully get rid of the current shorturls since they are already deployed in a few places and shared too
[00:07:15] Yep
[00:07:16] &version=20140611T231430Z&*
[00:07:17] See
[00:07:23] That's almost an hour ago
[00:07:43] Actually, no it isn't a scap race condition. It didn't get bumped at all, and even if it was, message cache is synced before any thing else
[00:07:50] this means the module timestamp didn't get increased at all
[00:08:01] Also that URL WFM
[00:08:02] YuviPanda: Since I was about to ask the same thing as Ro-anKattouw ... UrlShortener it is?
[00:08:16] YuviPanda: OK, just asking
[00:08:18] marktraceur: touch a file in the mmv.ui.canvasButtons module and sync-file it
[00:08:30] I'm not deploying, I'm not even logged into the server
[00:08:34] qchris: can't think of another name that's pointing out that these are different and also that they do similar things
[00:08:39] mwalker: Are you still around?
[00:08:42] yepyep
[00:08:43] YuviPanda: k.
[00:08:49] mwalker: *
[00:08:51] Ahm
[00:08:52] Guys
[00:08:53] PROBLEM - MySQL Processlist on db1019 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 145 statistics
[00:09:01] RoanKattouw: indeed, that URL works now
[00:09:02] I don't know why, but tgr's link doesn't produce broken output for me
[00:09:06] I get the message just fine
[00:09:10] Is this even still a bug?
[00:09:19] heh; well; the clearMessageBlobs script is still running...
[00:09:20] Message is working
[00:09:24] Or is it "just" in browser cache
[00:09:28] On Commons
[00:09:28] So
[00:09:31] RoanKattouw: tgr's url doesn't contain the message, the url got cut off before the relevant modules query part
[00:09:33] Maybe it got fixed
[00:09:37] Commons itself works now
[00:09:38] YuviPanda: https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/UrlShortener
[00:09:40] mwalker, I usually just touch all extension's files and sync-dir
[00:09:50] Krinkle: He pastebinned the full thing
[00:09:52] qchris: woo! ty! :)
[00:09:53] RECOVERY - MySQL Processlist on db1019 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 0 statistics
[00:09:58] YuviPanda: yw.
[00:10:02] Yeah, but that's debug mode.
[00:10:04] MaxSem, that certainly is faster than this script
[00:10:05] so fixed by the clearMessageBlobs script?
[00:10:09] that's only cached for a few minutes
[00:10:23] No
[00:10:52] Krinkle: Same thing for debug=false
[00:10:54] it was already fixed before we investigated anything, and it's still broken for users not using debug mode.
[00:12:10] it is fixed in non-debug mode now
[00:12:42] the pastebinned url with debug=false in place has
[00:12:42] {"multimediaviewer-viewfile-link":"\u003Cmultimediaviewer-viewfile-link\u003E"}
[00:12:46] I checked around :45 I think, and it was broken in both modes then
[00:13:08] https://gist.githubusercontent.com/Krinkle/c9ddbfe7736fe1473b15/raw
[00:13:27] well as you said it has an old timestamp
[00:13:30] Hah indeed it does
[00:13:33] So yeah, old timestamp
[00:13:39] So we need to touch a file in that module
[00:13:41] even when bypassing browser cache, that one remains broken and will forever remain so until we update the module
[00:14:14] * Krinkle yields and goes to look into the commit that changes it to see why it didn't cause a change event
[00:15:46] ok -- so I'm going to let the script continue on its merry way; but I'm going to touch and sync a/common/php-1.24wmf8/extensions/MultimediaViewer/resources/mmv/ui/mmv.ui.canvasButtons.js
[00:16:04] Krinkle, ^ that sound OK? or do you want me to hold off until you finish your investigation?
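A minimal sketch of why touching a file helps, assuming (as the discussion suggests) that a module's version timestamp is derived from the newest modification time among its files. The names `module_version`, `touch` and `demo` are hypothetical and this is not ResourceLoader's actual code; the second timestamp is an arbitrary later time standing in for "touch and sync".

```python
import os
import tempfile
import time
from datetime import datetime, timezone


def module_version(paths):
    """Return a version string built from the newest mtime among the files."""
    newest = max(os.path.getmtime(p) for p in paths)
    stamp = datetime.fromtimestamp(newest, tz=timezone.utc)
    return stamp.strftime("%Y%m%dT%H%M%SZ")


def touch(path, when=None):
    """Set the file's access and modification time (default: now)."""
    when = time.time() if when is None else when
    os.utime(path, (when, when))


def demo():
    with tempfile.TemporaryDirectory() as d:
        js = os.path.join(d, "mmv.ui.canvasButtons.js")
        with open(js, "w") as f:
            f.write("// module script\n")
        # 2014-06-11T23:14:30Z, the stale version seen in the cached URL.
        touch(js, 1402528470)
        before = module_version([js])
        # Hypothetical later time; a newer mtime yields a new version string,
        # so clients request a URL that no cache has stored yet.
        touch(js, 1402532220)
        after = module_version([js])
        return before, after
```

Calling `demo()` returns the version string before and after the touch; only the second one differs from the value embedded in the stale URL.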
[00:16:15] Yep
[00:16:18] go ahead
[00:16:41] the fix is going to be the same either way, I'm doing postfix investigation already basically
[00:17:11] RoanKattouw: Maybe json messages don't have a good message updated timestamp
[00:17:14] !log mwalker Synchronized php-1.24wmf8/extensions/MultimediaViewer/resources/mmv/ui/mmv.ui.canvasButtons.js: poking cache for multimediaviewer messages (duration: 00m 04s)
[00:17:18] message and use was added in the same commit:
[00:17:19] https://github.com/wikimedia/mediawiki-extensions-MultimediaViewer/commit/b2039ad7eb8a994d78ab45761d8dae405e729ac9
[00:17:19] Logged the message, Master
[00:17:32] so the typical "add message key that was updated before last module update timestamp" doesn't apply
[00:17:35] they're both new
[00:17:51] (PS2) Ori.livneh: mw/apache 2.4 compat: remove DefaultType directive [operations/puppet] - https://gerrit.wikimedia.org/r/138891
[00:18:03] PROBLEM - Puppet freshness on palladium is CRITICAL: Last successful Puppet run was Wed 11 Jun 2014 18:16:14 UTC
[00:18:24] I think clearMessageBlobs is supposed to take care of this, but it's slow
[00:18:33] Maybe there was a race condition?
[00:18:47] soooo slow; it's still running
[00:18:56] it would take care of it by giving all messages globally the current timestamp
[00:18:57] I'm tempted to ctrl+c this
[00:19:04] I mean it sounds like there must have been, we generated a response with the new code and without the message
[00:19:06] Oh, oh, hold on!
[00:19:09] Remember
[00:19:22] There was a cached RL response with "messages":{"foo":""}}
[00:19:26] So there is no failure on RL's part!
[00:19:35] The l10n cache hadn't been rebuilt yet
[00:19:44] RL was trying to include the message, but wfMsg('foo') was failing
[00:19:50] I know
[00:19:59] mwalker: Kill it
[00:20:04] but scap does l10n update before rsyncing actual code
[00:20:18] so why would rl be doing wfMsg(newkey) already?
[00:20:25] !log clearMessageBlobs.php killed because we fixed the problem in a more different way
[00:20:30] Logged the message, Master
[00:20:59] it does l10n update for all caches, and only then start syncing to the first app server, right?
[00:21:33] *nods*
[00:21:37] but, it syncs code first
[00:21:39] then i18n
[00:21:56] wat?
[00:22:10] ah, the file cache
[00:22:13] wait; I'm misreading this
[00:22:36] http://pastebin.com/1pXq4mrx
[00:23:20] I think that's a regression in scap
[00:23:24] scap-rebuild-cdbs happens before sync_wikiversions, but I'm not sure sync_wikiversions is what deploys the code
[00:23:31] I think sync-common deploys the code
[00:23:35] I'm pretty sure it didn't used to start with rsync common/ and *then* do l10n update
[00:23:41] sync-common gets the code there, yeah
[00:23:54] wikiversion just updates the versions we point to
[00:24:10] so, for non-new-versions, sync-common is it
[00:24:20] I reckon that output is just lying
[00:24:48] if it would really rsync first and then generate new wikiversions.cdb and l10n cache, then those files wouldn't be synced
[00:25:14] yeah, not sure what that first step is, honestly
[00:25:22] lines 3 and 4
[00:25:23] to tin.eqiad.wmnet from tin
[00:25:35] ah that's /a/common to /usr/local/apache
[00:25:46] still doesn't make sense though
[00:27:12] RoanKattouw: Yeah, so it is what I thought. We no longer (or never did) rsync l10n cache to the app servers first. So while one single server will (aside from a very short window) have new code +new i18n or old code +new i18n, there can be an index.php request or startupmodule request handled by a new server, and generate a versioned url handled by an old server
[00:27:23] the classic rl race condition that we see from time to time
[00:27:27] Yeah
[00:27:38] still weird that it had the new js but old message though
[00:27:50] So what you're saying is we need to change our scripts to first sync the l10n cache, then the code?
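The race described at 00:27:12 can be illustrated with a toy model: an already updated server advertises a new versioned URL, a server that has not been updated yet answers the request for it, and the cache stores that stale answer under the new version. This is only a sketch under those assumptions; `AppServer`, `Cache` and the version labels are hypothetical and do not correspond to real MediaWiki, scap or Varnish code.

```python
KEY = "multimediaviewer-viewfile-link"


class AppServer:
    """A toy app server holding one version of the code and its messages."""

    def __init__(self, version, messages):
        self.version = version
        self.messages = messages

    def startup_url(self, module):
        # The startup response advertises a versioned URL for the module.
        return (module, self.version)

    def render(self, key):
        # A missing key renders as <key>, as seen earlier in the log.
        return {key: self.messages.get(key, "<%s>" % key)}


class Cache:
    """A toy HTTP cache keyed by the full versioned URL."""

    def __init__(self):
        self.store = {}

    def fetch(self, url, backend):
        if url not in self.store:
            self.store[url] = backend.render(KEY)
        return self.store[url]


def demo():
    new = AppServer("v2", {KEY: "View original file"})
    old = AppServer("v1", {})           # not synced yet
    cache = Cache()
    url = new.startup_url("mmv.ui.canvasButtons")
    first = cache.fetch(url, old)       # old server answers the new URL
    later = cache.fetch(url, new)       # cache hit: the stale blob is served
    return url, first, later
```

In this model the stale blob stays cached under the new version until the version changes again, which matches the observation that the broken response remained until a file in the module was touched.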
[00:27:51] do we put message cache somewhere central or per apache?
[00:28:02] well, that might cause the opposite issue
[00:28:08] IIRC it's built centrally, then synced to each Apache
[00:28:19] Krinkle: No it wouldn't, because of timestamps
[00:28:21] Right, but it's not like sql
[00:28:36] it's not accessing it from somewhere outside of the apache server
[00:28:42] No, it's local only
[00:28:44] (memcached shar on another host, or db slave)
[00:28:46] ah, wait
[00:28:47] Every Apache has its own local copy
[00:28:49] it is
[00:28:56] message *timestamps* are in sql
[00:29:04] Yeah.....
[00:29:07] Oh!
[00:29:14] No but message contents are there too
[00:29:18] So that should be fine, right?
[00:29:31] RoanKattouw: No
[00:29:33] message_blobs
[00:29:35] because of something else
[00:29:44] Or ahm, I guess it's called msg_resource
[00:29:47] RoanKattouw: remember module definition summary hash ?
[00:29:56] Yeaaaah?
[00:29:57] the memcached cache key used to be modulename + rlcontext hash
[00:30:15] with value { hash: md5(summary), timestamp: now() }
[00:30:24] we changed it to modulename + rlcontext hash + md5(summary)
[00:30:33] because during scap it keeps changing back and forth in memcached
[00:30:42] and which ever wins last is kept
[00:30:45] that can very well be an old one
[00:31:06] I suspect that might be happening in sql rl message blob
[00:31:40] depending on how we update the value
[00:31:54] do we always update the blob if we had to regenerate the response, or only if timestamp is higher
[00:32:07] i hope we already update so that we can rollback
[00:32:20] but that does mean it can be subject to this
[00:34:06] I forget how this all works exactly
[00:35:36] (PS1) MaxSem: Disable PageImages on Wikibooks and Wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139048 (https://bugzilla.wikimedia.org/66455)
[00:43:43] RECOVERY - Puppet freshness on palladium is OK: puppet ran at Thu Jun 12 00:43:39 UTC 2014
[00:44:05] !log ran "puppetca -s palladium.eqiad.wmnet" on palladium to get agent running again, someone borked/regenerated the key there 6 hours ago?
[00:44:10] Logged the message, Master
[00:52:53] PROBLEM - Disk space on analytics1020 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/g 73191 MB (3% inode=99%):
[00:59:55] (CR) BBlack: [C: 1] Replace references to /etc/apache2/wmf symlink with its link target [operations/apache-config] - https://gerrit.wikimedia.org/r/138858 (owner: Ori.livneh)
[01:00:02] (CR) BBlack: [C: 1] mediawiki/apache: load all.conf from canonical path rather than symlink [operations/puppet] - https://gerrit.wikimedia.org/r/138877 (owner: Ori.livneh)
[01:00:23] (CR) BBlack: [C: 1] 2.4 compat: load mod_filter for AddOutputFilterByType [operations/apache-config] - https://gerrit.wikimedia.org/r/138889 (owner: Ori.livneh)
[01:01:08] (CR) Kaldari: [C: 1] "Looks good. Feel free to merge when ready to deploy." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139048 (https://bugzilla.wikimedia.org/66455) (owner: MaxSem)
[01:18:37] (CR) BBlack: "As we discussed a bit in IRC, I'm a little wary of how this changes behavior for the existing apache 2.2 servers. I'm not even sure what " [operations/puppet] - https://gerrit.wikimedia.org/r/138891 (owner: Ori.livneh)
[01:30:09] (CR) Ori.livneh: "Fair point. I'll sample the logs to see if i can determine what requests that default is being applied to and how they might be affected." [operations/puppet] - https://gerrit.wikimedia.org/r/138891 (owner: Ori.livneh)
[01:33:23] (PS1) Krinkle: mwgrep: Add namespace prefix in output [operations/puppet] - https://gerrit.wikimedia.org/r/139059
[01:34:58] (CR) Ori.livneh: [C: 2] 2.4 compat: load mod_filter for AddOutputFilterByType [operations/apache-config] - https://gerrit.wikimedia.org/r/138889 (owner: Ori.livneh)
[01:35:12] (CR) Hoo man: [C: 1] "Makes sense :)" [operations/puppet] - https://gerrit.wikimedia.org/r/139059 (owner: Krinkle)
[01:47:15] !log graceful'd appservers for I0e66ee0a1: 2.4 compat: load mod_filter for AddOutputFilterByType
[01:47:23] Logged the message, Master
[02:13:04] (PS1) Tim Starling: .bashrc: fix error on machines without MW [operations/puppet] - https://gerrit.wikimedia.org/r/139060
[02:14:08] (CR) Tim Starling: [C: 2] .bashrc: fix error on machines without MW [operations/puppet] - https://gerrit.wikimedia.org/r/139060 (owner: Tim Starling)
[02:16:52] @info 10.64.16.27
[02:16:52] Krinkle: [10.64.16.27: s3 (DEFAULT)] db1038
[02:16:58] @replag s3
[02:16:58] Krinkle: [s3: aawiki] db1038: 0s, db1035: 0s, db1003: 0s, db1019: 0s, db1027: 0s
[02:20:03] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[02:26:14] Hm.. I don't know how many Redis servers we have, but I'm seeing 2 or 3 in the logs that get a lot of connection errors
[02:26:15] https://logstash.wikimedia.org/#dashboard/temp/XBWricGwS1Gr8yv3sqCAZg
[02:26:48] And while doing interface message clean up on liwiktionary on tin, I'm consistently getting a connection time out from the mysql link
[02:26:53] Caught exception DBConnectionError: DB connection error: Lost connection to MySQL server at 'reading authorization packet', system error: 0 (10.64.16.27)
[02:27:03] making an edit on that wiki seems to work fine though
[02:33:10] !log LocalisationUpdate completed (1.24wmf7) at 2014-06-12 02:32:07+00:00
[02:33:16] Logged the message, Master
[02:35:01] TimStarling: Might be interesting to look into ^^ g2g
[02:55:41] (PS1) Springle: depool db1051 for schema changes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139063
[02:56:14] (CR) Springle: [C: 2] depool db1051 for schema changes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139063 (owner: Springle)
[02:56:20] (Merged) jenkins-bot: depool db1051 for schema changes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139063 (owner: Springle)
[02:58:03] PROBLEM - Puppet freshness on mw1149 is CRITICAL: Last successful Puppet run was Wed 11 Jun 2014 23:57:00 UTC
[02:58:05] !log springle Synchronized wmf-config/db-eqiad.php: depool db1051 (duration: 01m 08s)
[02:58:09] Logged the message, Master
[03:01:03] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Thu 12 Jun 2014 00:00:48 UTC
[03:03:13] !log LocalisationUpdate completed (1.24wmf8) at 2014-06-12 03:02:09+00:00
[03:03:18] Logged the message, Master
[03:18:03] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Thu 12 Jun 2014 00:17:12 UTC
[03:25:49] (CR) Mattflaschen: "I'm not sure which part was taken to email, and which list/where." [operations/puppet] - https://gerrit.wikimedia.org/r/49678 (owner: Ottomata)
[03:30:53] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Thu Jun 12 03:30:46 UTC 2014
[03:40:14] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 12 03:39:07 UTC 2014 (duration 39m 6s)
[03:40:18] Logged the message, Master
[03:56:57] (CR) Ori.livneh: [C: 2] Replace references to /etc/apache2/wmf symlink with its link target [operations/apache-config] - https://gerrit.wikimedia.org/r/138858 (owner: Ori.livneh)
[04:16:13] (CR) MaxSem: "> However, if I recall correctly, views by logged in users always trigger a parse. I would appreciate it if someone could confirm." [operations/puppet] - https://gerrit.wikimedia.org/r/49678 (owner: Ottomata)
[04:16:53] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:17:43] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Thu Jun 12 04:17:34 UTC 2014
[04:17:43] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.003 second response time
[04:21:34] RECOVERY - Puppet freshness on mw1149 is OK: puppet ran at Thu Jun 12 04:21:32 UTC 2014
[04:25:45] (CR) MaxSem: "As of correlation, this is an interesting task, actually: observers sniffing HTTPS traffic know times each request was made, their durati" [operations/puppet] - https://gerrit.wikimedia.org/r/49678 (owner: Ottomata)
[04:27:03] (CR) Jforrester: [C: 1] "Now good to go." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/138345 (owner: Jforrester)
[04:41:59] (PS3) Ori.livneh: mediawiki/apache: load all.conf from canonical path rather than symlink [operations/puppet] - https://gerrit.wikimedia.org/r/138877
[04:42:01] (PS1) Ori.livneh: mediawiki: small clean-ups (wip) [operations/puppet] - https://gerrit.wikimedia.org/r/139065
[04:44:54] (PS4) Ori.livneh: mediawiki/apache: load all.conf from canonical path rather than symlink [operations/puppet] - https://gerrit.wikimedia.org/r/138877
[04:46:03] * ori will wait for euro-opsen to wake up before deploying that
[04:46:33] * YuviPanda should go to sleep at some point
[04:46:42] heh
[04:47:13] ori: got a minute?
[04:47:21] literally just
[04:47:27] ori: ah, then 'tis k :)
[04:47:38] i'll be back in an hour or so
[04:47:41] but you should sleep
[04:47:45] I should, yeah.
[04:47:46] should..
[04:48:00] * YuviPanda is writing an implementation of the URL Shortener RfC
[05:21:04] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[05:46:29] !log amslvs[1234]: stopping pybal
[05:46:33] Logged the message, Master
[05:52:17] !log cr1-esams/cr2-knams: dismantling amslvs BGP peerings
[05:52:21] Logged the message, Master
[05:54:39] <_joe_> paravoid: wow we're moving everything to the new load balancers?
[05:55:00] we already did, these were just left as backups
[05:55:21] <_joe_> oh ok
[06:00:27] _joe_: fwiw, replaced by lvs300[1234]
[06:00:35] bblack has been pulling all the work there
[06:00:45] <_joe_> yeah I noticed
[06:00:56] <_joe_> I just thought he was not 100% done yet
[06:01:04] <_joe_> more like 99% done
[06:01:06] it's 99%
[06:01:08] yeah
[06:01:15] he tried 3.13 yesterday
[06:01:32] <_joe_> ok, still it's a 99% that allows going to prod
[06:01:34] <_joe_> after they fixed the ipvs6 fiasco?
[06:01:39] yup
[06:01:48] <_joe_> that was just abysmal
[06:01:49] and 3.13 fixes XPS for bnx2x
[06:02:16] so I *think* what's left is rolling out 3.13 (and maybe trusty) to all of them and fix the python script to configure XPS
[06:02:21] he'd know better :)
[06:06:48] (PS1) Physikerwelt: Disable MathML rendering option [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139068 (https://bugzilla.wikimedia.org/66492)
[06:09:59] ori: awjr_away wait what's the name of the varnish role for vagrant we had?
[06:10:17] ?
[06:10:55] ori: hmm, at the zurich hackathon, didn't someone create a varnish role for mediawiki-vagrant?
[06:10:59] or am I confusing nginx with varnish?
[06:11:30] ottomata made the nginx module from operations/puppet a submodule and introduced it to vagrant
[06:11:37] and awjr added an SSL role on top of it
[06:11:46] ah, right. I... should sleep soon.
[06:11:49] but no varnish yet
[06:12:00] * YuviPanda needs to figure out some way of testing varnish, since he has to test purges
[06:12:25] it's extremely annoying to compile VMODs
[06:12:34] heh, thought so
[06:12:41] but I need to test just HTCP purges...
[06:12:54] what about labs?
[06:13:04] ori: yeah, that's where I'll end up doing it, most probably
[06:13:09] the varnish setup in labs is pretty true to production
[06:13:23] and while you can't completely set fire to it it's not the end of the world if it goes down for a minute or two
[06:13:24] ori: https://gerrit.wikimedia.org/r/#/c/139054/7 which is the beginning of a full implementation of https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortener
[06:13:57] so need to figure out a way to make sure that regular hits just hit varnish instead of apache, and also that negative cache results are purged when appropriate.
[06:14:56] why do you need to purge?
[06:15:57] ori: I want to cache negative hits as well. say I get /adf cached as 'sorry this does not exist', and sometime in the future a short url with that key *does* exist, so it should be purged from the cache to make sure that the redirect happens properly
[06:16:52] serve a synthetic response from vcl_miss
[06:17:56] ori: I haven't checked out VMODs at all, so that flew over my head a bit.
[06:18:01] https://github.com/wikimedia/operations-puppet/blob/production/templates/varnish/errorpage.inc.vcl.erb
[06:19:25] <_joe_> ori: I am struggling with mwprof/reporter - Its .gitreview is probably wrong? http://git.wikimedia.org/blob/operations%2Fsoftware%2Fmwprof%2Freporter.git/b26e2b2806b323b66a914e5cc0628467a6830b3b/.gitreview
[06:19:43] ori: hmm, ok. I think I'll need to look at these with a slightly more awake brain, so am off to sleep now.
[06:20:06] YuviPanda: after varnish looks for a suitable response object in its cache and fails to find one, right before it forwards the request to a backend, you have a chance to intervene
[06:20:31] you can decline to forward the request to a backend and serve a static response generated in varnish
[06:20:56] _joe_: sec, i'll look
[06:21:55] ori: ah, hmm. also this will be a Special page, so I guess by default it won't emit headers that make varnish cache it, and it's something I must specify
[06:22:52] <_joe_> ori: project points at mwprof rather than at mwprof/reporter - it seems wrong, but I may just be missing something. Take your time, not in a hurry
[06:23:08] _joe_: that's almost certainly just me being an idiot
[06:23:12] it sounds like a mistake
[06:23:16] i probably created one, then the other
[06:23:27] want me to fix?
[06:24:00] YuviPanda: but why do you need all the machinery of special pages when it's basically a 404?
[06:24:13] ori: not for the 'not found' case, but for the 'found' case.
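The negative-caching concern raised at 06:15:57 can be sketched in a few lines: a "does not exist" answer is cached like any other, so it has to be purged once the key is created. This is an illustrative in-memory model only; `NegativeCache` and `demo` are hypothetical names, and a real setup would do this in Varnish with HTCP purges rather than in Python.

```python
NOT_FOUND = object()  # sentinel marking a cached "does not exist" answer


class NegativeCache:
    """Caches both hits and misses from a lookup function."""

    def __init__(self, lookup):
        self._lookup = lookup      # backend: key -> target URL or None
        self._entries = {}
        self.backend_hits = 0

    def get(self, key):
        if key not in self._entries:
            self.backend_hits += 1
            target = self._lookup(key)
            self._entries[key] = NOT_FOUND if target is None else target
        entry = self._entries[key]
        return None if entry is NOT_FOUND else entry

    def purge(self, key):
        # Drop whatever is cached for this key, positive or negative.
        self._entries.pop(key, None)


def demo():
    short_urls = {}
    cache = NegativeCache(short_urls.get)
    first = cache.get("adf")     # miss, cached as "does not exist"
    short_urls["adf"] = "https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortener"
    stale = cache.get("adf")     # still the cached negative answer
    cache.purge("adf")           # the step a purge on creation would perform
    fresh = cache.get("adf")     # now resolves to the new target
    return first, stale, fresh, cache.backend_hits
```

Without the `purge` call the second lookup's negative answer would keep being served for as long as the entry lives in the cache.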
[06:24:32] but you don't need to purge those
[06:24:48] oh, you're wondering how to get varnish to cache it
[06:25:48] <_joe_> ori: no nevermind, I'll do that
[06:26:07] _joe_: thanks, sorry about that
[06:26:50] ori: yeah, how to get varnish to cache it
[06:41:20] akosiaris: manutius only left with ganglia_new::monitor::aggregator, where should that move to?
[06:45:48] netmon1001?
[06:46:02] I guess so
[06:49:55] (PS1) Matanya: netmon1001: add ganglia_new::monitor::aggregator [operations/puppet] - https://gerrit.wikimedia.org/r/139071
[06:57:53] PROBLEM - Disk space on analytics1020 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/k 72305 MB (3% inode=99%): /var/lib/hadoop/data/j 107784 MB (5% inode=99%):
[07:04:59] (PS1) Matanya: manutius: decom [operations/puppet] - https://gerrit.wikimedia.org/r/139072
[07:14:22] (PS1) Matanya: manutius: decom, left mgmt [operations/dns] - https://gerrit.wikimedia.org/r/139075
[07:18:04] (CR) Ori.livneh: [C: 2] mediawiki/apache: load all.conf from canonical path rather than symlink [operations/puppet] - https://gerrit.wikimedia.org/r/138877 (owner: Ori.livneh)
[07:25:53] RECOVERY - Disk space on analytics1020 is OK: DISK OK
[07:36:59] (PS1) Giuseppe Lavagetto: access-request: grant Bernd Sitzmann access [operations/puppet] - https://gerrit.wikimedia.org/r/139079
[07:45:40] (PS1) Faidon Liambotis: Allocate an IP for cobalt.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/139080
[07:48:15] (PS1) Faidon Liambotis: install-server: add server cobalt [operations/puppet] - https://gerrit.wikimedia.org/r/139081
[07:48:39] (CR) Faidon Liambotis: [C: 2] Allocate an IP for cobalt.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/139080 (owner: Faidon Liambotis)
[07:49:00] (CR) Faidon Liambotis: [C: 2] install-server: add server cobalt [operations/puppet] - https://gerrit.wikimedia.org/r/139081 (owner: Faidon Liambotis)
[07:49:36] (CR) Faidon Liambotis: [V: 2] install-server: add server cobalt [operations/puppet] - https://gerrit.wikimedia.org/r/139081 (owner: Faidon Liambotis)
[07:49:43] oh jenkins
[07:49:51] haha
[07:52:27] (CR) Faidon Liambotis: [C: -1] "Order of inclusion shouldn't matter on properly written manifests, this sounds like an awful hack to me..." [operations/puppet] - https://gerrit.wikimedia.org/r/138804 (owner: Hashar)
[07:54:38] paravoid: yeah that patch ( ^^^^ ) is awful :(
[07:54:56] <_joe_> hashar: also, does not do whatever you wanted to do
[07:55:13] it is all because gallium ends up calling mediawiki::packages and webserver::php5 which both rely on the same apache packages
[07:55:18] <_joe_> hashar: you could re-define packages defs in the parent classes in a better way
[07:55:19] and I am not sure how to fix it
[07:56:01] <_joe_> hashar: ideally, there would be a class like php5::packages that would be included by both
[07:56:02] _joe_: yup I replied to you about how I have no clue how the dependence works (i.e. Class -> Class )
[07:56:17] <_joe_> hashar: it is documented to work as you expect
[07:56:47] <_joe_> but in reality using "include" (which is encouraged by puppetlabs itself) defies that dependency chain
[07:57:09] <_joe_> it will NOT apply the class on the left hand side before the one on the right-hand side
[07:57:32] so with: Class['role::ci::slave'] -> Class['role::ci::website']
[07:57:38] that would apply role::ci::website first right?
[07:58:02] <_joe_> I discovered it while trying to find decent solution
[07:58:06] <_joe_> no
[07:58:15] ah NOT
[07:58:20] I am not paying attention
[07:58:25] <_joe_> it will just ensure the second is defined if the first one is defined as well
[07:58:31] <_joe_> maybe I wrote it wrong
[07:58:45] <_joe_> let me make an example
[07:59:27] honestly, let's just throw the class dependency idea to the bin and use something else :D
[08:00:10] I thought about either 1) adding if ! defined Package 2) creating a tiny wrapping class like apache::php5 to include
[08:00:29] but iirc we don't really want tiny wrapper all over the place. I had the case with some other packages definition previously
[08:00:42] <_joe_> hashar: http://paste.debian.net/104585/
[08:01:10] <_joe_> I bet you expected the output to be 'bar' before 'foo'
[08:01:13] <_joe_> and it's not
[08:01:21] <_joe_> I expected it as well
[08:01:59] <_joe_> the funny part is that if you declare class {'foo': } class {'bar': } in the node, it will behave as expected
[08:02:34] <_joe_> "a fractal of bad design"
[08:02:56] <_joe_> well, in the case of puppet, s/design/implementation/
[08:04:09] could it be that the notice{} are realized before?
[08:07:05] _joe_: since it does not work, I am wondering which other solution could be used
[08:07:18] I am out of ideas
[08:12:39] <_joe_> hashar: as I stated earlier, build a class that both classes could include
[08:20:22] <_joe_> hashar: do you know how can I search a change on gerrit by its subject?
[08:20:45] _joe_: I am not sure you can
[08:20:51] <_joe_> eww
[08:20:57] https://gerrit.wikimedia.org/r/Documentation/user-search.html might have some clue
[08:21:09] <_joe_> so I'm left with looking at the git log of site.pp
[08:21:12] <_joe_> shit
[08:21:27] but git log can probably grep
[08:22:03] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[08:22:58] _joe_: git log --grep // Limit the commits output to ones with log message that matches the specified regular expression
[08:23:11] <_joe_> hashar: yes I did that, thanks
[08:24:06] _joe_: in Gerrit that would be: message:'something'
[08:24:14] Changes that match MESSAGE arbitrary string in the commit message body.
[08:27:18] bah libapache2-mod-php5 is used all over the place :(
[08:27:40] <_joe_> ok thanks :)
[08:45:42] pffffff
[08:46:23] apache::mod::php provides a default php template bah
[08:56:12] (PS3) Hashar: contint: reduce duplication with mediawiki::packages [operations/puppet] - https://gerrit.wikimedia.org/r/138804
[09:00:09] (CR) Hashar: "I factored out the installation of packages apache2-mpm-prefork and libapache2-mod-php5 to a new php5::apache2packages class." [operations/puppet] - https://gerrit.wikimedia.org/r/138804 (owner: Hashar)
[09:00:23] I hate puppet sometimes
[09:05:24] <_joe_> hashar: sometimes? then you're still a puppet newbie
[09:05:37] <_joe_> :)
[09:05:53] I am
[09:05:57] I should get a course maybe
[09:06:07] <_joe_> not really
[09:06:28] <_joe_> the doc is good enough
[09:06:38] <_joe_> or, buy/download/whatever "pro puppet"
[09:06:48] <_joe_> dunno how up to date is it
[09:07:58] "How To Become a Puppet Master In A Week"
[09:08:16] oh I see puppet as the gate protecting protection. A gate I want to avoid approaching :]
[09:20:24] (PS2) Filippo Giunchedi: keep 5 days worth of diamond.log [operations/puppet] - https://gerrit.wikimedia.org/r/138789
[09:20:34] (CR) Filippo Giunchedi: [C: 2 V: 2] keep 5 days worth of diamond.log [operations/puppet] - https://gerrit.wikimedia.org/r/138789 (owner: Filippo Giunchedi)
[09:43:24] (CR) Filippo Giunchedi: "mhh the bug mentions a similar problem than what we had and the fact that having the default sample rate low for high-frequency metrics, " [operations/puppet] - https://gerrit.wikimedia.org/r/138574 (owner: Filippo Giunchedi)
[10:04:03] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Thu 12 Jun 2014 07:03:00 UTC
[10:06:59] (PS1) Giuseppe Lavagetto: mediawiki: re-include deployment users [operations/puppet] - https://gerrit.wikimedia.org/r/139089
[10:07:22] mediawiki? not appserver?
[10:08:11] <_joe_> it was renamed... I was in doubt
[10:08:18] <_joe_> yes, it's the appservers
[10:08:37] <_joe_> but now the roles are all 'mediawiki::something'
[10:08:48] <_joe_> btw not sure I included the correct group
[10:11:09] oh are they? :(
[10:11:24] I was hoping we'd finally make a distinction between mediawiki the software and our appserver stack
[10:11:27] oh well
[10:12:08] <_joe_> in other news, something turned puppet back on on the puppet-compiler02
[10:12:26] <_joe_> which made it not working again
[10:12:27] <_joe_> jesus.
[10:22:14] (CR) Tobias Gritschacher: [C: 1] "@Faidon: is this ok now and we can move forward? Or are there still some issues with the script from your side?" [operations/puppet] - https://gerrit.wikimedia.org/r/136095 (owner: Christopher Johnson (WMDE))
[10:33:10] (PS1) Giuseppe Lavagetto: profiler-to-carbon: set a timeout on the connection [operations/software/mwprof/reporter] - https://gerrit.wikimedia.org/r/139093
[10:33:23] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu Jun 12 10:33:16 UTC 2014
[10:33:59] (PS2) Giuseppe Lavagetto: profiler-to-carbon: set a timeout on the connection [operations/software/mwprof/reporter] - https://gerrit.wikimedia.org/r/139093
[10:47:40] (PS1) Nikerabbit: cxserver configuration for beta labs [operations/puppet] - https://gerrit.wikimedia.org/r/139095
[11:18:37] !log Gerrit: created mediawiki/services/cxserver/deploy repository for Nikerabbit and kart_
[11:18:43] Logged the message, Master
[11:19:02] whii
[11:23:03] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[11:47:43] _joe_: topic says you're on rt duty. Could you please help me with an alert around eventlogging on vanadium (I have no access there)?
[11:48:04] Could you check whether or not under /var/log/eventlogging
[11:48:35] There are files "all-events", "client-side-events", "server-side-events" getting written
[11:48:39] with current information?
[11:48:52] (Like tailing them and seeing a timestamp not older than 5 minutes) [11:51:07] ori: ^ You around? [11:52:05] apergos: Could you please help me check three files on vanadium? [11:52:52] sure [11:52:55] qchris: [11:53:07] ah I see, let me look [11:53:14] It's about /var/log/eventlogging [11:53:59] Could you please check whether or not "all-events", "client-side-events", "server-side-events" getting lines appended with recent timestamps, like not older than 5 minutes? [11:54:00] yes, I see it in the backread [11:54:02] just a sec [11:54:04] Thanks. [11:54:55] they all have a last modified of now [11:54:59] let me check the contents [11:57:19] yes, all of them have entries within the last minute [11:57:30] Thanks. [11:57:43] Wonderful. That helped a lot. [11:57:49] anything else I can look at over there/ [11:57:50] ? [11:57:51] +1 thanks apergos [11:57:52] <_joe_> qchris: sorry, I was at lunch [11:58:09] apergos: The rest we can handle on our end. Thanks. [11:58:14] ok, good luck [11:58:16] _joe_: no worries :-) [11:59:01] <_joe_> apergos: thanks for covering for me :) [11:59:07] sure [12:09:24] apergos: oh hi [12:09:29] apergos: have you seen the dataset1001 alert? [12:09:29] hey [12:09:40] I pinged you about it yesterday too, not sure if you saw that [12:09:41] yes, that's 1.5T free still [12:09:49] unless I cna't ad which is quite likely [12:10:06] or spell [12:10:07] oh hah [12:10:07] :P [12:10:24] I can spell but I can't type [12:10:34] or should that be tpye [12:10:41] :D [12:10:56] that alert probably needs fixin' [12:11:10] we need optional parameters for the alert, it's set at some default of % [12:17:15] (03CR) 10Hashar: "Some random clue related to Jenkins. 
Since that is closely following the parsoid manifests, there is a bit of code that should be factore" (0310 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/139095 (owner: 10Nikerabbit) [12:41:03] PROBLEM - etherpad_lite_process_running on zirconium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^node node_modules/ep_etherpad-lite/node/server.js [12:42:03] RECOVERY - etherpad_lite_process_running on zirconium is OK: PROCS OK: 1 process with regex args ^node node_modules/ep_etherpad-lite/node/server.js [12:56:21] (03PS2) 10Filippo Giunchedi: enable statsd reporting for swift proxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/138574 [13:00:26] (03CR) 10Nikerabbit: cxserver configuration for beta labs (036 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/139095 (owner: 10Nikerabbit) [13:03:25] (03PS1) 10Ottomata: Disable eventlogging -> kafka consumer (kafka producer) [operations/puppet] - 10https://gerrit.wikimedia.org/r/139101 [13:04:13] (03PS2) 10Nikerabbit: cxserver configuration for beta labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/139095 [13:05:02] (03CR) 10QChris: Disable eventlogging -> kafka consumer (kafka producer) (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/139101 (owner: 10Ottomata) [13:05:49] (03CR) 10Ottomata: Disable eventlogging -> kafka consumer (kafka producer) (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/139101 (owner: 10Ottomata) [13:05:51] (03CR) 10QChris: [C: 031] Disable eventlogging -> kafka consumer (kafka producer) [operations/puppet] - 10https://gerrit.wikimedia.org/r/139101 (owner: 10Ottomata) [13:07:39] (03PS2) 10Ottomata: Disable eventlogging -> kafka consumer (kafka producer) [operations/puppet] - 10https://gerrit.wikimedia.org/r/139101 [13:07:56] (03CR) 10Nikerabbit: "I did consider using a module, but then again modules do not have roles (as far as I know), so I was not sure how to split it." 
[operations/puppet] - https://gerrit.wikimedia.org/r/139095 (owner: Nikerabbit) [13:08:04] (CR) Ottomata: [C: 2 V: 2] Disable eventlogging -> kafka consumer (kafka producer) [operations/puppet] - https://gerrit.wikimedia.org/r/139101 (owner: Ottomata) [13:08:20] (CR) Giuseppe Lavagetto: [C: 1] enable statsd reporting for swift proxy [operations/puppet] - https://gerrit.wikimedia.org/r/138574 (owner: Filippo Giunchedi) [13:09:23] RECOVERY - Check status of defined EventLogging jobs on vanadium is OK: OK: All defined EventLogging jobs are runnning. [13:27:14] (CR) Hashar: "> I did consider using a module, but then again modules do not have roles (as far as I know), so I was not sure how to split it." [operations/puppet] - https://gerrit.wikimedia.org/r/139095 (owner: Nikerabbit) [13:27:22] (CR) Hashar: cxserver configuration for beta labs (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/139095 (owner: Nikerabbit) [13:39:49] !log enabling cp301[34] esams mobile frontends in pybal [13:39:55] Logged the message, Master [13:41:19] (CR) Hashar: mediawiki: small clean-ups (wip) (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/139065 (owner: Ori.livneh) [13:45:57] (CR) Hashar: cache: remove pointer to pmtpa for labs (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/138903 (owner: Matanya) [13:47:05] (PS3) Hashar: cache: remove pointer to pmtpa for labs [operations/puppet] - https://gerrit.wikimedia.org/r/138903 (owner: Matanya) [13:47:22] (CR) Hashar: [C: 1] "Came back to patchset 1. Sorry for the confusion."
[operations/puppet] - 10https://gerrit.wikimedia.org/r/138903 (owner: 10Matanya) [13:47:57] hashar: i see you and mutante are palying around with me :P [13:48:08] matanya: yeah it was late yesterday [13:48:12] I wasn't paying attention [13:48:19] no worries [13:55:13] (03PS3) 10Nikerabbit: cxserver configuration for beta labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/139095 [13:56:15] (03CR) 10Nikerabbit: cxserver configuration for beta labs (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/139095 (owner: 10Nikerabbit) [14:04:30] matanya: is puppet supposed to 'indent' with 2 spaces or 4 spaces? [14:04:45] aude: four spaces [14:04:49] ok, thanks [14:11:46] <_joe_> aude: https://wikitech.wikimedia.org/wiki/Puppet_coding [14:12:17] _joe_: thanks [14:12:53] <_joe_> aude: not everything there is valid nowadays, I guess [14:13:05] <_joe_> still, 3.5 is a good starting point [14:14:03] 3.5 spaces? [14:14:35] (03PS1) 10BBlack: turn on NTP for ulsfo LVS [operations/puppet] - 10https://gerrit.wikimedia.org/r/139109 [14:16:21] (03CR) 10BBlack: [C: 032 V: 032] turn on NTP for ulsfo LVS [operations/puppet] - 10https://gerrit.wikimedia.org/r/139109 (owner: 10BBlack) [14:17:43] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.003 second response time [14:17:52] is localisation update functional? 
https://bugzilla.wikimedia.org/show_bug.cgi?id=66524 [14:18:52] <_joe_> YuviPanda: I was referring to the section of the wiki doc [14:18:59] ah :) ok [14:19:38] <_joe_> YuviPanda: I worked in a place where php was indented by 3 spaces [14:19:47] oh god [14:19:58] <_joe_> because they had a fight about 4 vs 2 and the compromised on 3 :| [14:21:43] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.013 second response time [14:24:03] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC [14:34:31] _joe_: I used 3 spaces for a time, merely to detect indentations oddities [14:35:15] the reason we use tabs in mediawiki/core is that it let people specify their preferred indentation size ( 2, 3, 4, 8 or whatever) [14:35:24] though the rest of the world is using 4 spaces apparently [14:36:55] Linux uses 8 ... https://www.kernel.org/doc/Documentation/CodingStyle [14:38:13] hm, so I want to start doing some kafka failure tests, how can I tell icinga to chill out ahead of time? [14:38:49] do I have to do it for everyhost...? [14:39:26] tests on non production hosts ? [14:40:21] yes, the hosts are officially not production [14:40:24] at this time [14:40:35] and i need the amount of data that is coming through from varnishes to test thsi [14:40:45] ok, i see I can schedule downtime for the hosts in question [14:41:00] but there are a lot of varnishes...ah, maybe those ones won't trigger, i think faidon disable those alerts [14:50:14] anomie: I'll swat today [14:50:18] manybubbles: ok! [14:58:29] heading to cafe, back in a bit [15:00:04] manybubbles, anomie: The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140612T1500) [15:00:35] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Thu 12 Jun 2014 12:00:06 UTC [15:00:49] twkozlowski: around for swat deploy? 
[15:09:23] twkozlowski: yo! you are the only person on swat today [15:10:51] manybubbles: I've not had time to patch some stuff since last night :p [15:11:05] hehe [15:11:28] I have ~3 bugs IIRC assigned to me though so, depends if I can sneak away from the complaints to do someting [15:16:01] (03PS1) 10RobH: returning cobalt to spares pool as it has a bad disk [operations/dns] - 10https://gerrit.wikimedia.org/r/139117 [15:16:40] (03CR) 10RobH: [C: 032] returning cobalt to spares pool as it has a bad disk [operations/dns] - 10https://gerrit.wikimedia.org/r/139117 (owner: 10RobH) [15:22:30] manybubbles: Second time this week people haven't shown up for their SWAT. [15:22:51] anomie: twkozlowski's irc system isn't very good for pinging [15:23:06] its like it doesn't actually pop up for him [15:23:29] If I could eat I'd call it a loss and bump the ticket - but I'm too numb [15:25:27] anomie: does it even say anywhere that you have to be present for a swat? [15:25:47] anomie: i was not there for mine because i was under the impression that i don't have to [15:26:06] MatmaRex: https://wikitech.wikimedia.org/wiki/SWAT_deploys "The SWAT team will ping the relevant developers at the start of the window and when theirs is up; they MUST be available" [15:26:06] if i knew, i'd schedule it for a time when i'd be able to sit here [15:26:34] MatmaRex: yeah - its in there [15:26:53] technically we're supposed to review the patch in the hour before the window but we relaxed that one quite a bit I think. 
[15:27:01] but the you are here thing, that is required [15:27:29] "Raise account creation limit for eswiki outreach event" [15:27:47] i don't see how twkozlowski being around is going to affect this patch a littlest bit [15:28:09] that patch has a deadline [15:28:16] it was obviously scheduled for a swat today because it needs to be done before 14 june [15:28:26] (see the bug) [15:28:48] usually if you can't be around and it has a deadline, you poke someone else who can be and have them be present, I think [15:28:52] manybubbles: I review the patches in the morning before the SWAT, but I usually skip actually +1ing if there are no problems. [15:28:55] patches like this one are done routinely [15:29:27] yeah, I'm happy to do it if someone wants to "support" it [15:29:34] looks sane to me [15:29:35] in this case I'm not actually sure that you'd test it [15:29:38] https://gerrit.wikimedia.org/r/#/c/138902/2/wmf-config/throttle.php [15:29:45] (03CR) 10Manybubbles: [C: 032] Raise account creation limit for eswiki outreach event [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/138902 (https://bugzilla.wikimedia.org/66491) (owner: 10Odder) [15:29:56] (03Merged) 10jenkins-bot: Raise account creation limit for eswiki outreach event [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/138902 (https://bugzilla.wikimedia.org/66491) (owner: 10Odder) [15:29:59] hi manybubbles [15:30:02] (03PS1) 10RobH: new server polonium for mail use [operations/dns] - 10https://gerrit.wikimedia.org/r/139119 [15:30:03] looks like enough folks are arround for it [15:30:06] oh, hey, cool [15:30:15] :) [15:30:49] twkozlowski: it won't work [15:31:01] Great [15:31:04] !log manybubbles Synchronized wmf-config/throttle.php: SWAT: Raise account creation limit for eswiki outreach event (duration: 00m 05s) [15:31:08] Logged the message, Master [15:31:13] matanya: ? [15:31:23] twkozlowski: you need to explictly add login.wiki [15:31:30] since when? 
[15:31:40] since that was chad told me [15:31:43] (03CR) 10RobH: [C: 032] new server polonium for mail use [operations/dns] - 10https://gerrit.wikimedia.org/r/139119 (owner: 10RobH) [15:31:54] see the previous one? i did it and it didn't work [15:32:14] ^d: ^^^^^ [15:32:15] put that in the code comments please,if that's the case [15:32:37] not obvious to me otherwise [15:33:18] yeah [15:33:46] mw1151 didn't let me sync [15:33:53] matanya: is that known? [15:33:56] known [15:34:02] k. so just ignore it [15:34:03] bits, although thought it was coming back [15:34:10] suppose not [15:34:33] I don't know, i asked chad, and he said he thinks that is the reason [15:34:43] man I'm numb. fillings at 8:10 and I'm still numb [15:52:36] akosiaris: would appreciate thoughts re: https://gerrit.wikimedia.org/r/#/c/138769/ whenever you have the chance [15:53:24] ori: ok, wil ldo [15:53:30] will do* [15:53:58] I already love the commit message [15:54:06] It's especially handy for [15:54:06] cases like zirconium [15:54:16] sigh, that server is a PITA [15:54:53] i don't know how much you'll love the rest of the commit [15:54:58] it's an acquired taste, shall we say [15:55:13] (03PS1) 10Ottomata: Usign analytics1012 and analytics1022 as ganglia aggregators for analytics kafka cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/139124 [15:55:37] but i think it's a good approach [15:56:09] <_joe_> ori: sigh [15:56:39] <_joe_> ori: I think it's a clever workaround for one of the biggest limitations puppet has. [15:56:54] <_joe_> that or embed everything in ensure_resource [15:57:19] <_joe_> it's also horribly hackish :) [15:58:09] ori, why the <- package [15:58:25] does that declare the package at the same time as setting up the dependency? [15:58:26] for visual column alignment really [15:58:30] yes [15:58:32] crazy [15:58:44] you can just chain resources like that? [15:58:52] yes [15:59:10] but that way madness lies [15:59:20] why the empty apache::mod class? 
[15:59:36] (03CR) 10Ottomata: [C: 032 V: 032] Usign analytics1012 and analytics1022 as ganglia aggregators for analytics kafka cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/139124 (owner: 10Ottomata) [16:00:03] there's a weird puppet 2.x parser bug that can be tickled if you don't have a parent scope class [16:00:12] hm [16:00:22] parent scope class? [16:00:36] weird, the class names have nothing to do with scope though, right? its just convention [16:00:39] if you have foo::bar::buzz but no foo::bar [16:00:41] just autoloading [16:01:15] no, there's also name resolution [16:01:25] (03CR) 10Ottomata: apache: replace apache::mod::* hierarchy with simpler equivalent from MWV (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/138769 (owner: 10Ori.livneh) [16:01:38] yeah, for autoloading, but if all the classes are in the same file anyway [16:01:42] iunno :) [16:02:01] ah, maybe it wouldn't like it if the file didn't have the expected class name in it [16:02:08] iunno, cool whatever :/ [16:02:09] :) [16:02:10] <_joe_> I won't care about puppet 2.x bugs for much longer, ori [16:02:19] <_joe_> you have a completely new bunch from puppet 3.4 coming up :P [16:02:35] haha [16:02:46] heh [16:03:04] brb [16:03:22] <_joe_> ottomata: can I beg for a CR? https://gerrit.wikimedia.org/r/#/c/139079/ [16:03:31] <_joe_> it's an access request [16:06:01] (03CR) 10Ottomata: [C: 032] access-request: grant Bernd Sitzmann access [operations/puppet] - 10https://gerrit.wikimedia.org/r/139079 (owner: 10Giuseppe Lavagetto) [16:07:17] <_joe_> thanks! 
[16:07:38] (PS2) Giuseppe Lavagetto: access-request: grant Bernd Sitzmann access [operations/puppet] - https://gerrit.wikimedia.org/r/139079 [16:07:50] (CR) Giuseppe Lavagetto: [C: 2 V: 2] access-request: grant Bernd Sitzmann access [operations/puppet] - https://gerrit.wikimedia.org/r/139079 (owner: Giuseppe Lavagetto) [16:09:53] (PS1) Ottomata: Include Analytics Kafka cluster eqiad aggregator data sources [operations/puppet] - https://gerrit.wikimedia.org/r/139129 [16:10:13] (PS1) RobH: dns entries for new mail servers [operations/dns] - https://gerrit.wikimedia.org/r/139130 [16:10:32] (PS2) Ottomata: Include Analytics Kafka cluster eqiad aggregator data sources [operations/puppet] - https://gerrit.wikimedia.org/r/139129 [16:12:55] (CR) Ottomata: [C: 2 V: 2] Include Analytics Kafka cluster eqiad aggregator data sources [operations/puppet] - https://gerrit.wikimedia.org/r/139129 (owner: Ottomata) [16:14:15] (CR) RobH: [C: 2] dns entries for new mail servers [operations/dns] - https://gerrit.wikimedia.org/r/139130 (owner: RobH) [16:15:22] greg-g, i just added myself to the schedule at 9:30, need to continue deploying, will need about an hour, half an hour safety afterwards. [16:30:04] yurik: The time is nigh to deploy Wikipedia Zero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140612T1630) [16:30:19] (CR) Ori.livneh: apache: replace apache::mod::* hierarchy with simpler equivalent from MWV (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/138769 (owner: Ori.livneh) [16:31:00] (CR) Rush: "Technically this looks good. Daniel and I came across this pretty much at the same time and were confused as we couldn't find where we ha" [operations/puppet] - https://gerrit.wikimedia.org/r/139089 (owner: Giuseppe Lavagetto) [16:34:25] ganglia is so mysterious sometimes... [16:39:26] ori, how does graphite's mysteriousity compare to ganglias?
:p [16:39:37] i mainly haven't switched because of sunk costs :/ [16:39:49] sigh [16:39:58] it's not great, actually [16:40:17] partly this is my fault with txstatsd being a poor choice in hindsight and adding a layer of mysteriousity [16:40:29] i think jeff and chase are replacing it [16:40:38] <_joe_> ori: graphite is not mysterious at all :) [16:41:04] AHHHH NOT MYSTERIOUS [16:41:05] i found it [16:41:05] <_joe_> it's just not working with all the carbon-caches on one single host [16:41:10] right, just send a value twice in the interval of the smallest aggregation unit and.. ;) [16:41:10] need to tell jmx trans of new cluster IP [16:41:28] which I could use ganglia config var... [16:41:30] rather than hard coding [16:41:33] hahah [16:41:34] yeah [16:41:38] love that one [16:41:51] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Minor stuff. Otherwise it gets a +2 on my part. And altough it goes against puppet standards well it solves an ugly problem. Looking forwa" (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/138769 (owner: 10Ori.livneh) [16:42:15] greg-g, sorry was disconnecting. Any objections to my depl now? its in the schedule [16:43:15] (03PS1) 10Ottomata: Use new kafka cluster ganglia octet [operations/puppet] - 10https://gerrit.wikimedia.org/r/139132 [16:43:22] (03PS2) 10Ottomata: Use new kafka cluster ganglia octet [operations/puppet] - 10https://gerrit.wikimedia.org/r/139132 [16:43:46] (03CR) 10Ottomata: [C: 032 V: 032] Use new kafka cluster ganglia octet [operations/puppet] - 10https://gerrit.wikimedia.org/r/139132 (owner: 10Ottomata) [16:52:20] (03PS5) 10Ori.livneh: apache: replace apache::mod::* hierarchy with simpler equivalent from MWV [operations/puppet] - 10https://gerrit.wikimedia.org/r/138769 [16:53:35] PROBLEM - Disk space on nickel is CRITICAL: DISK CRITICAL - free space: /mnt/ganglia_tmp 103 MB (2% inode=76%): [16:54:39] ottomata: ^ ? 
[16:54:44] ganglia is in trouble [16:54:46] or will shortly be [16:54:57] interessitng [16:55:01] i am messing with it :p [16:55:53] akosiaris: amended [16:56:22] ori: kewl :-) [16:56:36] thanks for the review [16:57:25] chasemp: i need to revoke a key [16:57:26] thanks for that PS. I was wondering how to solve the same problem the other day [16:57:38] should I just make it be an empty array in data.yaml? [16:57:41] yurikR: if it's in the schedule, no :) [16:58:23] https://gerrit.wikimedia.org/r/#/c/138846/ is a two-line follow-up (include apache::mod::access_compat, include apache::mod::version in apache module's init.pp) to make things as seamlessly cross-compatible as possible [16:58:29] actually i should amend that to include mod_filter [16:58:33] yurikR: sorry, I was out this morning dealing with kid stuff, hope you still have time [16:58:43] greg-g, i added myself to the schedule this morning :) [16:58:46] doing depl right now [16:58:52] because on 2.4 mod_filter is needed for AddOutputFilterByType [16:59:04] yurikR: cool, just saw [16:59:10] (the added to sched) [17:00:03] (03PS1) 10Ottomata: Revoking Leila's ssh key [operations/puppet] - 10https://gerrit.wikimedia.org/r/139138 [17:03:43] ori, i just randomly saw this on nickel's syslog [17:03:43] got no answer from any [RCStream eqiad] datasource [17:03:48] not sure if it matters [17:04:21] yeah, they don't show up in ganglia [17:04:25] i've been meaning to look into that [17:04:31] does that require an iptables change? [17:04:47] !log yurik Synchronized php-1.24wmf7/extensions/: (no message) (duration: 01m 15s) [17:04:51] Logged the message, Master [17:05:11] (03PS1) 10coren: Labs: Traffic shaping [WIP] [operations/puppet] - 10https://gerrit.wikimedia.org/r/139139 [17:05:50] !log yurik's blank sync message could have been: Deploying new JsonConfig,ZeroBanner,ZeroPortal extensions (refactoring ZeroRatedMobileAccess ext) [17:05:55] Logged the message, Master [17:06:26] so. is localisation update functional? 
https://bugzilla.wikimedia.org/show_bug.cgi?id=66524 [17:06:29] !log yurik Synchronized php-1.24wmf8/extensions/: (no message) (duration: 01m 12s) [17:06:34] Logged the message, Master [17:06:42] i see in the SAL that it runs, but i also see the incorrect messages [17:07:30] MatmaRex: it has been, nothing has changed in scap/etc for over a week [17:07:37] (03PS2) 10Ottomata: Revoking Leila's ssh key [operations/puppet] - 10https://gerrit.wikimedia.org/r/139138 [17:07:52] but, people were complaining yesterday about messages ( marktraceur et al) [17:08:03] Yeah it was bad yesterday [17:08:15] I think mwalker|away ran a script that fixed it; Krinkle is not convinced [17:08:17] marktraceur: was there a bug reported out of it? or? [17:08:26] It's too unclear about what happened [17:08:32] * greg-g opens the bug from MatmaRex now [17:08:37] It might have been a caching issue, it might have been the script run [17:08:37] (03CR) 10Rush: [C: 031] "cool" [operations/puppet] - 10https://gerrit.wikimedia.org/r/139138 (owner: 10Ottomata) [17:08:44] marktraceur: ugh, gotcha [17:08:57] It might have been something totally different [17:09:04] (03CR) 10Ottomata: [C: 032 V: 032] Revoking Leila's ssh key [operations/puppet] - 10https://gerrit.wikimedia.org/r/139138 (owner: 10Ottomata) [17:09:31] marktraceur: but, mwalker also had issues during his deploy, unrelated to yours, or where they the same (I was ignoring swat yesterday until the end I couldn't parse the backscroll) [17:09:40] (03PS1) 10Reedy: Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139142 [17:09:42] (03PS1) 10Reedy: testwiki to 1.24wmf9 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139143 [17:09:44] (03PS1) 10Reedy: Wikipedias to 1.24wmf8 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139144 [17:09:46] (03PS1) 10Reedy: group0 to 1.24wmf9 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139145 [17:09:57] Oh, I don't know [17:10:03] 
(CR) Reedy: [C: 2] Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139142 (owner: Reedy) [17:10:09] (Merged) jenkins-bot: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139142 (owner: Reedy) [17:10:20] I'll bug him when he's not |away [17:11:18] (CR) Reedy: [C: 2] testwiki to 1.24wmf9 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139143 (owner: Reedy) [17:11:25] (Merged) jenkins-bot: testwiki to 1.24wmf9 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139143 (owner: Reedy) [17:12:05] jgage: you around? [17:12:11] Reedy, are you doing depl??? [17:12:18] i want to do some kafka failure stuff soon, watch some metrics, see what happens [17:12:23] (CR) CSteipp: "We do have a traffic analysis problem, and this wouldn't be the easiest way to correlate a user to traffic; however, traffic analysis is m" [operations/puppet] - https://gerrit.wikimedia.org/r/49678 (owner: Ottomata) [17:12:24] would be nice to have anohter pair of eyes to do this with me [17:12:40] yurikR: No, just initial staging [17:13:01] Reedy, ok, because i am doing some InitializeSetting changes [17:13:30] Reedy: re deployment, can you look at https://bugzilla.wikimedia.org/show_bug.cgi?id=66524 ?
some messages are non-localised [17:13:35] yurikR: he just preps an hour ish before the window [17:13:38] Reedy, and I just did sync-dir of 7&8 for all extensions [17:13:42] gotcha [17:13:45] thx greenhac [17:13:49] thx greg-g [17:13:59] (and naturally it always has to be something i touched recently) [17:14:31] (03PS1) 10Faidon Liambotis: install-server: add server polonium [operations/puppet] - 10https://gerrit.wikimedia.org/r/139146 [17:14:57] (03CR) 10Faidon Liambotis: [C: 032 V: 032] install-server: add server polonium [operations/puppet] - 10https://gerrit.wikimedia.org/r/139146 (owner: 10Faidon Liambotis) [17:15:28] (03PS2) 10Ori.livneh: apache: include mod_{filter,access_compat,version} by default [operations/puppet] - 10https://gerrit.wikimedia.org/r/138846 [17:18:38] (03PS1) 10Yurik: Switching zerowiki & ruwiki to ZeroBanner/Portal ext [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139147 [17:20:05] (03CR) 10Yurik: [C: 032] Switching zerowiki & ruwiki to ZeroBanner/Portal ext [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139147 (owner: 10Yurik) [17:20:12] (03Merged) 10jenkins-bot: Switching zerowiki & ruwiki to ZeroBanner/Portal ext [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139147 (owner: 10Yurik) [17:20:15] greg-g: aude: https://bugzilla.wikimedia.org/show_bug.cgi?id=66536 [17:22:40] !log yurik Synchronized wmf-config/InitialiseSettings.php: Attempting to enable new zero ext on zerowiki & ruwiki - take3 (duration: 01m 04s) [17:22:45] Logged the message, Master [17:23:34] aude: pm me if you have any other details on that [17:24:35] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC [17:28:55] PROBLEM - ElasticSearch health check on logstash1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.137 [17:29:35] PROBLEM - ElasticSearch health check on logstash1001 is CRITICAL: CRITICAL - Could not connect to server 
10.64.32.138 [17:30:10] (03PS1) 10Rush: admin enforcing an empty authorized_keys [operations/puppet] - 10https://gerrit.wikimedia.org/r/139152 [17:30:21] hm, bd808|BUFFER ^^ [17:30:48] (03PS1) 10Yurik: Enabling new Zero ext on all sites [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139153 [17:30:55] PROBLEM - ElasticSearch health check on logstash1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.137 [17:31:24] (03PS5) 10Withoutaname: Reduce string URLs to defined constant [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131914 (https://bugzilla.wikimedia.org/48618) [17:31:43] (03CR) 10Ottomata: [C: 031] admin enforcing an empty authorized_keys [operations/puppet] - 10https://gerrit.wikimedia.org/r/139152 (owner: 10Rush) [17:31:48] ottomata: he won't be back until monday, he's in the bahamas [17:31:53] hmmm [17:31:53] ok [17:32:46] (03CR) 10Rush: [C: 032] admin enforcing an empty authorized_keys [operations/puppet] - 10https://gerrit.wikimedia.org/r/139152 (owner: 10Rush) [17:32:51] (03CR) 10Yurik: [C: 032] Enabling new Zero ext on all sites [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139153 (owner: 10Yurik) [17:33:05] (03Merged) 10jenkins-bot: Enabling new Zero ext on all sites [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139153 (owner: 10Yurik) [17:34:50] !log yurik Synchronized wmf-config/InitialiseSettings.php: Enabling new zero ext on all wikis (duration: 01m 03s) [17:34:55] Logged the message, Master [17:39:12] looks like someone entered a logstash query that caused elasticsearch to OOM! 
[17:41:05] PROBLEM - ElasticSearch health check on logstash1003 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.136 [17:41:37] !log restarting elasticsearch on logstash servers [17:41:39] (03CR) 10Ori.livneh: [C: 04-2] "pending more research" [operations/puppet] - 10https://gerrit.wikimedia.org/r/138891 (owner: 10Ori.livneh) [17:41:42] Logged the message, Master [17:42:15] PROBLEM - ElasticSearch health check on logstash1001 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.138 [17:43:24] bblack, i just deployed the new zero ext, could you check that the netmapper still can get new data? [17:44:46] (03CR) 10Ori.livneh: mediawiki: small clean-ups (wip) (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/139065 (owner: 10Ori.livneh) [17:45:51] (03CR) 10Ori.livneh: mediawiki: small clean-ups (wip) (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/139065 (owner: 10Ori.livneh) [17:48:09] !log starting some kafka failure tests, I have scheduled downtime for some service checks in icinga, hopefully this will not be noisy [17:48:13] Logged the message, Master [17:49:52] !log disabling puppet on analytics1012 and analytics1022 [17:49:57] Logged the message, Master [17:51:21] yurikR: did you update carriers data as well? [17:51:39] bblack, i added the office ip to TEST [17:52:18] yurikR: on the first mobile cache I looked at, it received an update to the carriers data at :40 (3 mins before your IRC ping to me) [17:52:28] so if that was after the new zero ext, things seem to still be work [17:52:36] bblack, 198.73.209.2 ? [17:52:36] ing [17:52:59] (03PS1) 10Jforrester: Enable TemplateData GUI on Hebrew Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139155 (https://bugzilla.wikimedia.org/66538) [17:53:16] Reedy: https://bugzilla.wikimedia.org/show_bug.cgi?id=66538 <-- is this a change in wmf-config ? 
[17:53:16] yurikR: yes [17:53:20] "TEST": ["50.185.139.129", "54.200.159.9", "67.244.17.232", "67.244.17.232", "71.183.236.23", "72.14.179.17", "74.71.10.241", "76.189.119.107", "98.245.172.223", "162.243.129.61", "198.73.209.2", .... [17:53:24] James_F: Here? [17:53:28] bblack, awesome, netmapper works, thx [17:53:37] JohnLewis: Yes. [17:53:48] James_F: Did you get my ping last night :) [17:54:15] JohnLewis: Yes, but you left before I could respond. [17:54:24] Yeah sorry :p [17:54:30] bblack, what's the update freq on netmapper? [17:54:47] JohnLewis: It's still on ice because the enwiki community seem to want to start an RfC, and I don't want to interfere. [17:55:11] urg enwiki and RfCs. [17:55:43] yurikR: every 5 mins for portal -> disk files on cache nodes, and then the runtime VCL stuff checks those files every 89 seconds [17:56:11] so max 389 second delay, if you push a change at the worst possible time [17:56:48] (and the cron/vcl timers are at their worst possible offset, which varies over time) [17:56:52] bblack, want to make it faster? while all this dev is going, it would really be a quick noop for servers, and much less pain for us? [17:57:03] yurkR: not really, no :) [17:57:08] also, do you know what the office ip range is? [17:57:18] I originally didn't want it to be this fast, you're the one that pushed for 5 mins [17:57:33] * yurikR doesn't like waiting for servers :) [17:58:06] it just doesn't make sense in normal production. With all the emails and contracts that go into changing a carrier range, a few minutes here or there on the technical end after the decision is made means nothing. [17:58:25] eranroz: Scheduled for the SWAT this afternoon. [17:58:39] i think its 198.73.209.0/24 [17:58:49] I have no idea, and don't want to have an idea [17:59:05] mutante: do I send you an e-mail if I want you to check a broken feed for Planet? 
:-) [18:00:04] Reedy, greg-g: The time is nigh to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140612T1800) [18:01:04] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Thu 12 Jun 2014 12:00:06 UTC [18:05:06] greg-g, did you find a bug for yesterday? I got your irc message this morning... [18:05:10] (e.g. I dont know if Roan or Timo filed one) [18:05:30] mwalker: I haven't yet, no. [18:06:12] No bug was filed [18:06:36] !log reedy Started scap: testwiki to 1.24wmf9 and build l10n cache [18:06:41] Logged the message, Master [18:07:41] Krinkle: mwalker rock paper scissors on who'll file one? [18:08:59] greg-g: Mmm, great stuff coming up next week: https://wikitech.wikimedia.org/wiki/Deployments#Next_month [18:09:38] "June 17: redirect tablets to mobile" <= is there any place I can read up on this? [18:10:19] twkozlowski: /me is asking [18:11:38] twkozlowski: it's pretty clear though, no? things which identify as a tablet will get the mobile view instead of desktop view [18:14:41] mwalker: hah, Krinkle is now |detached, I see he didn't want to play RPS [18:14:46] twkozlowski: If you visit the site using a mobile device, you get redirected to the mobile site (unless you then choose to go back) [18:15:01] From June 17, tables will be considered mobile devices for that purpose [18:15:13] IIRC because mobile has built a better web interface for them [18:15:37] hey, Flow templating is hitting "Maximum function nesting level of '100' reached, aborting!". Setting xdebug.max_nesting_level to 200 in /etc/php5/apache2/php.ini fixes it in labs-vagrant. Is this going to be a problem in production? (does production even run xdebug?) 
[18:15:42] twkozlowski: 14:13 < Maryana> i've got a blog post draft here: https://meta.wikimedia.org/wiki/Wikimedia_Blog/Drafts/Tablet_announcement [18:15:51] 14:14 < Maryana> we'll be running a centralnotice banner on the day of the redirect & sending folks to the blog post :) [18:16:05] akosiaris: if you +1 that mod patch i can babysit it onto prod [18:16:17] Cool. I like the mobile app for tablets. [18:16:33] ori: thoughts on spagewmf's question? [18:16:41] * ori reads backscroll [18:16:55] prod doesn't run xdebug [18:17:01] s/tables/tablets/, oops :) [18:17:11] but the fact that you're hitting that limit is worrisome in and of itself [18:17:19] ebernhardson will know how to debug this [18:17:31] hmm? [18:17:35] "These changes are responsive, too, so it looks great whether you’re on a tablet, a phablet – or even the mobile site on your desktop computer." Cool :-) [18:17:50] ebernhardson: spagewmf's comment that 'Flow templating is hitting "Maximum function nesting level of '100' reached, aborting!"' [18:18:09] it hits the limit because its inside computer generating templating code, and its using recursion [18:18:34] * ori checks if hack does tail-call optimization ;) [18:18:45] ebernhardson: ok, as long as its known and you're accounting for it [18:18:54] i was just responding to "Is this going to be a problem in production?" [18:18:55] we could probably come up with a way to render posts nested within posts without doing recursion, but i think it would be messier [18:19:22] ebernhardson: https://github.com/shaunxcode/php-trampoline maybe [18:19:32] it's from xdebug (misleading error message!), so maybe that's not a factor in production. Is there a better file to modify in vagrant than /etc/php5/apache2/php.ini ? [18:19:52] /etc/php5/apache2/xdebug.ini? [18:19:54] if it's there [18:21:05] (PS1) Ottomata: Set up varnishkafka on varnish uploads [operations/puppet] - https://gerrit.wikimedia.org/r/139160 [18:22:15] mark: ^ objections?
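[editor's note] The 18:18 exchange above is about rendering posts nested within posts without recursion, so that rendering depth is not bounded by a call-nesting limit such as xdebug.max_nesting_level. A minimal sketch of that idea follows, in Python rather than Flow's PHP. The Post class, render_thread and deep_thread are hypothetical names made up for illustration; this is not Flow's data model or its templating code, and it uses a plain explicit stack rather than the trampoline library linked above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:
    # Hypothetical stand-in for a discussion post; not Flow's real model.
    text: str
    replies: List["Post"] = field(default_factory=list)


def render_thread(root: Post) -> str:
    """Render a post and all nested replies without recursion.

    An explicit stack replaces the call stack, so nesting depth is
    limited by available memory rather than by a call-depth setting.
    """
    lines = []
    stack = [(root, 0)]  # (post, nesting depth)
    while stack:
        post, depth = stack.pop()
        lines.append("  " * depth + post.text)
        # Push replies in reverse so the first reply is rendered first.
        for reply in reversed(post.replies):
            stack.append((reply, depth + 1))
    return "\n".join(lines)


def deep_thread(depth: int) -> Post:
    """Build a chain of replies nested `depth` levels below the root."""
    root = Post("post 0")
    current = root
    for i in range(1, depth + 1):
        child = Post(f"post {i}")
        current.replies.append(child)
        current = child
    return root


if __name__ == "__main__":
    thread = Post("a", [Post("b", [Post("c")]), Post("d")])
    print(render_thread(thread))
    # A thread far deeper than a 100-level nesting limit still renders.
    print(len(render_thread(deep_thread(5000)).splitlines()))
```

The trade-off matches the one raised in the channel: the loop is less direct to read than a recursive template, but depth stops being a configuration concern.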
[18:22:24] i'm trying to make kafka cry :) [18:22:45] ori: FYI no /etc/php5/apache2/xdebug.ini in labs-vagrant, creating it doesn't do the trick. Modifying /etc/php5/apache2/php.ini works. Thanks! [18:22:49] Did we deploy yet? [18:23:11] audephone: just to testwiki [18:23:23] k [18:23:24] audephone: well, inprogress [18:23:24] paravoid, objections?: https://gerrit.wikimedia.org/r/#/c/139160/ [18:23:52] I am confused what Day it is since its a 3 Day week for me [18:24:31] Unlikely wikibase causes problem but if so i can get Computer out [18:27:41] _joe_: can you please review https://gerrit.wikimedia.org/r/#/c/138903/ ? [18:27:57] audephone: :) ok [18:28:19] <_joe_> matanya: mmmh I'd be mostly off [18:28:44] yeah. sorry. forgot it is like 20:30 for you [18:28:52] i gotcha matanya [18:28:54] looks ok to me [18:29:00] <_joe_> yeah :) [18:29:01] those boxes are all gone anyway, ja [18:29:02] ? [18:29:04] thanks ottomata :) [18:29:10] yes, long gone [18:29:39] (PS1) Manybubbles: Drop all Cirrus content indexes down to 5 shards [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139163 [18:29:39] andrewbogott will second this for sure [18:29:55] (PS4) Ottomata: cache: remove pointer to pmtpa for labs [operations/puppet] - https://gerrit.wikimedia.org/r/138903 (owner: Matanya) [18:30:01] (CR) Ottomata: [C: 2 V: 2] cache: remove pointer to pmtpa for labs [operations/puppet] - https://gerrit.wikimedia.org/r/138903 (owner: Matanya) [18:30:19] done [18:30:26] thanks a lot [18:34:17] (PS2) Ottomata: Set up varnishkafka on varnish uploads [operations/puppet] - https://gerrit.wikimedia.org/r/139160 [18:35:58] ugh, my ssh session died [18:36:06] (CR) Ottomata: [C: 2 V: 2] Set up varnishkafka on varnish uploads [operations/puppet] - https://gerrit.wikimedia.org/r/139160 (owner: Ottomata) [18:36:15] Reedy: you aren't on a boat, are you?
:P [18:36:53] Nope [18:36:54] At home [18:37:12] !log reedy Started scap: 1.24wmf9 staging take 2... [18:37:16] Logged the message, Master [18:39:30] Looks like it'd nearly done anyway [18:44:57] yurikR, you accidentally wikipedia? [18:45:43] MaxSem, ? [18:45:45] (CR) Ottomata: [C: 1] apache: replace apache::mod::* hierarchy with simpler equivalent from MWV [operations/puppet] - https://gerrit.wikimedia.org/r/138769 (owner: Ori.livneh) [18:45:58] yurikR, now it thinks office is in zero [18:46:11] yeah, was dog fooding the office :) [18:46:22] already disabled, should go public in a bit [18:46:34] you killed all the contributory features, please disable it [18:46:58] waiting for netmapper to download the new list of ips [18:47:05] refreshes every 5 min [18:47:18] also, saw landing page instead of WP main page on someone's phone [18:47:39] MaxSem, sorry, almost done with the meeting, will check in 10 min? [18:48:42] (CR) Ori.livneh: [C: 2] apache: replace apache::mod::* hierarchy with simpler equivalent from MWV [operations/puppet] - https://gerrit.wikimedia.org/r/138769 (owner: Ori.livneh) [18:52:27] ottomata: it applied correctly [18:52:31] thanks [18:52:32] !log reedy Finished scap: 1.24wmf9 staging take 2...
(duration: 15m 20s) [18:52:38] Logged the message, Master [18:53:22] (finally) [18:56:10] (PS2) Reedy: Wikipedias to 1.24wmf8 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139144 [18:56:16] (CR) Reedy: [C: 2] Wikipedias to 1.24wmf8 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139144 (owner: Reedy) [18:56:25] (Merged) jenkins-bot: Wikipedias to 1.24wmf8 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139144 (owner: Reedy) [19:04:10] !log Dropping old GeoData tables from everywhere [19:04:15] Logged the message, Master [19:04:35] (PS1) Ori.livneh: delete webserver::apache2::rpaf (unused) [operations/puppet] - https://gerrit.wikimedia.org/r/139169 [19:04:56] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.24wmf8 [19:05:00] Logged the message, Master [19:06:19] (PS2) Reedy: group0 to 1.24wmf9 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139145 [19:06:24] (CR) Reedy: [C: 2] group0 to 1.24wmf9 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139145 (owner: Reedy) [19:06:31] (Merged) jenkins-bot: group0 to 1.24wmf9 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139145 (owner: Reedy) [19:08:46] greg-g: I only play rock paper scissors lizard spock. [19:09:16] Krinkle: :P [19:11:59] Krinkle|detached: https://bugzilla.wikimedia.org/show_bug.cgi?id=66538 is this going to wmf-config ? [19:14:11] * aude home [19:16:26] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: Fetching readonly [19:19:17] !log starting hadoop decom of analytics1018.
This node will become a Kafka broker [19:19:22] Logged the message, Master [19:19:56] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf9 [19:19:56] <_joe_> are you guys about to deploy, I suppose 10 minutes is not enough [19:20:00] Logged the message, Master [19:20:26] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: Fetching readonly [19:20:28] _joe_: ? [19:20:40] oh, the unmerged bit [19:20:43] yeah [19:20:46] <_joe_> greg-g: the alarm on unmerged changes fires up after 10 minutes [19:20:46] PROBLEM - Hadoop NodeManager on analytics1018 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [19:21:01] <_joe_> it's more than enough for ops to puppet-merge [19:21:11] <_joe_> we can tune that though [19:21:26] ACKNOWLEDGEMENT - Hadoop NodeManager on analytics1018 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager ottomata This node will become a kafka broker. 
[19:21:52] looking at logs generally, i see a bunch of exceptions for MathSource (Math extension) [19:22:13] * aude can't see log stash to see how prevalent [19:24:46] RECOVERY - Hadoop NodeManager on analytics1018 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [19:25:37] !log stopping puppet on an18 [19:25:43] Logged the message, Master [19:25:49] ughhhhh [19:27:46] Reedy: re the math errors aude mentions, we probably need to merge this: https://gerrit.wikimedia.org/r/#/c/138993/ [19:27:46] PROBLEM - Hadoop NodeManager on analytics1018 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [19:29:53] i don't know the math code well [19:30:03] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Thu Jun 12 19:30:00 UTC 2014 [19:30:23] i just see it does writeCache then writeToDatabase then says writing to database is not allowed [19:30:29] w/ exception [19:30:34] something is mixed up [19:31:47] Reedy: https://logstash.wikimedia.org/#dashboard/temp/J6KaT42ZTdSxRChlkrxf0A [19:32:16] and of course he's not online [19:33:07] code doesn't make sense to me [19:33:29] writeCache (then ifChanged -> writeToDatabase) [19:33:39] i don't see writeToCache or anything [19:34:55] Reedy: see also: https://gerrit.wikimedia.org/r/#/c/135521/ [19:37:05] (Abandoned) Yurik: Switching to ZeroBanner/Portal extension, disabling ZRMA [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139025 (owner: Yurik) [19:37:14] Reedy: if we can't make heads or tails of the math errors, let's just revert the math extension (again) to wmf7 (it looks like it was merged in wmf8 and 9). [19:37:29] +1 [19:38:08] I'm here [19:38:11] What are these Math errors?
[19:38:13] Just irc client suppressing pings [19:38:28] aude: PHP Warning: include() [function.include]: GC cache entry '/usr/local/apache/common-local/php-1.24wmf7/extensions/Math/Math.php' (dev=2049 ino=6178981) was on gc-list for 601 seconds in /usr/local/apache/common-local/php-1.24wmf8/extensions/Wikidata/vendor/composer/ClassLoader.php on line 378 ?? :) [19:38:43] RoanKattouw: dirty log search: https://logstash.wikimedia.org/#dashboard/temp/tcrZ0PkbR3GNGMyAZFCzyA [19:39:02] OK so math backend stuff [19:39:06] * RoanKattouw washes his hands [19:39:17] greg-g: You can ignore those apc stuff [19:39:43] RoanKattouw: :) [19:39:53] Reedy: k [19:39:54] huh [19:39:57] ok [19:41:13] PROBLEM - Varnishkafka log producer on cp3005 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka [19:48:40] yurikR2: Catchable fatal error: Argument 1 passed to JsonConfig\JCSingleton::getSettings() must be an instance of TitleValue, null given, called in /usr/local/apache/co [19:48:40] mmon-local/php-1.24wmf8/extensions/JsonConfig/includes/JCSingleton.php on line 359 and defined in /usr/local/apache/common-local/php-1.24wmf8/extensions/JsonConfig/incl [19:48:40] udes/JCSingleton.php on line 293 [19:50:22] Reedy, thx, looking [19:51:46] Reedy, i'm not seeing it in logstash [19:53:18] It's there in the apache syslogs... [19:53:27] Not seemingly on fluorine [19:55:18] Reedy: what's this one about: [19:55:18] DatabaseBase::sourceStream 10.64.16.27 1051 Unknown table 'geo_killlist_old' (10.64.16.27) DROP TABLE geo_killlist_old\n [19:55:34] Found it [19:55:47] greg-g: Blame MaxSem [19:55:51] :) [19:55:52] greg-g: What's the timestamp on it? 
[19:55:54] MaxSem: ^^ ;) [19:55:58] (PS3) Withoutaname: Delete ve.wikimedia.org and leave redirect [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131907 (https://bugzilla.wikimedia.org/55737) [19:56:06] 2014-06-12T19:08:26.000Z [19:56:14] yurikR2: http://p.defau.lt/?eVtTJt9VUvFvs_ZCmCvy7g [19:56:15] https://logstash.wikimedia.org/#dashboard/temp/ygAykqFxQKCm29-GNSJF0g [19:56:41] I think that might be a case of the DROP DATABASE was run on all wikis [19:56:47] Whether or not they had the table [19:57:06] yep [19:57:28] because your creation script created those tables indiscriminately on all wikis;) [19:57:35] *new wikis [19:57:53] I ended up doing that for a reason though [20:01:06] Reedy, judging by the Title code, there should be a warning right before that "Can't create a TitleValue for [[..." [20:01:31] what logs are you looking at? [20:02:57] reedy@fluorine:/a/mw-log$ grep JCSingleton fatal.log -A 25 [20:06:14] RECOVERY - Varnishkafka log producer on cp3005 is OK: PROCS OK: 1 process with command name varnishkafka [20:11:36] * greg-g takes a late lunch [20:13:21] (PS1) Aaron Schulz: Removed unused "forkcount" stuff from jobs-loop [operations/puppet] - https://gerrit.wikimedia.org/r/139191 [20:14:45] (PS12) Christopher Johnson (WMDE): Icinga: Check Dispatch command for Wikidata notification [operations/puppet] - https://gerrit.wikimedia.org/r/136095 [20:24:34] manybubbles: https://gerrit.wikimedia.org/r/#/c/137646/ [20:25:25] So what this does is that if I search for a term, and a page is in a category that includes this term in its name, I will now see the page on top of my search results?
[20:25:33] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC [20:25:47] twkozlowski: yes _but_ we reverted it from this week's roll out [20:26:00] we'll roll it out next week [20:26:10] because we're going to want to reindex everything next week any way [20:26:30] https://gerrit.wikimedia.org/r/#/c/139107/ [20:26:52] so you also reverted the revert, so what does it mean? :-P [20:26:55] twkozlowski: timing! [20:27:06] I reverted it so it wouldn't go out today [20:27:12] to group0 [20:27:22] and unreverted it after the branch so it'll go out next Thursday [20:27:34] (PS1) Christopher Johnson (WMDE): Icinga: IRC notification event handler and Wikidata configuration file [operations/puppet] - https://gerrit.wikimedia.org/r/139193 [20:27:45] I did that because of that deployment note - its best to batch things with similar deployment notes together [20:27:54] and we have one or two other things like it in review [20:27:57] oh cool [20:28:16] * twkozlowski bookmarking the patch for later use [20:28:35] one thing should pull results higher if the there is something in the lead in of the article [20:29:06] twkozlowski: right now you can use <> to limit the result to just things in a category - if you want [20:29:14] thats been there since lsearchd's time [20:29:18] (PS13) Christopher Johnson (WMDE): Icinga: Check Dispatch command for Wikidata notification [operations/puppet] - https://gerrit.wikimedia.org/r/136095 [20:29:38] Yeah, I know that. I just noticed this while going through Gerrit patches submitted this week [20:29:57] <^d> 137521 needs a rebase, then that one can go in for next week too. [20:30:34] !gerrit 137521 [20:30:43] aha -,- [20:31:04] (CR) Chad: [C: 1] "lgtm, feel free to merge in SWAT or when you planned it to go out."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139163 (owner: Manybubbles) [20:31:35] <^d> twkozlowski: https://gerrit.wikimedia.org/r/#/c/137521/ :) [20:31:55] <^d> It should help boost matches from the article lead. [20:32:07] Reedy, https://bugzilla.wikimedia.org/show_bug.cgi?id=66555 [20:32:30] this reminds me I wanted to test something but I don't remember what [20:32:32] TIL: URLs like https://gerrit.wikimedia.org/137521 do not work [20:32:48] Reedy, i will introduce a workaround, but content handler hook should really not even be called when users hit "v:" page [20:32:49] while URLs like https://bugzilla.wikimedia.org/60001 work okay [20:33:38] <^d> https://gerrit.wikimedia.org/r/137521 works. [20:33:47] <^d> /r/ is a legacy mistake. [20:33:57] * twkozlowski nods [20:35:40] rebasing [20:37:33] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Last successful Puppet run was Thu 12 Jun 2014 17:36:45 UTC [20:38:31] ^d: let me know if anything else needs a rebase when you review it please. I'm just kind of plowing ahead [20:38:44] <^d> I think that's the only one. [20:38:45] ^d: right now running tests for the rebase [20:38:48] seet [20:38:50] sweet [20:38:57] <^d> Insource looks ok against master. [20:39:03] ^d: you still can't get good error messages from insource? [20:39:24] (CR) Aude: Icinga: IRC notification event handler and Wikidata configuration file (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/139193 (owner: Christopher Johnson (WMDE)) [20:39:38] <^d> Lemme get the boost thing merged and then I'll pull it in again and hack at it. [20:41:45] Reedy, i fixed it with https://gerrit.wikimedia.org/r/#/c/139198/, do you want to depl it right away?
[20:42:57] oops [20:43:06] Reedy, sorry, spoke too fast, one sec [20:43:49] (CR) Chad: mwgrep: Add namespace prefix in output (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/139059 (owner: Krinkle) [20:45:40] (CR) Manybubbles: mwgrep: Add namespace prefix in output (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/139059 (owner: Krinkle) [20:45:49] (CR) Manybubbles: [C: 1] mwgrep: Add namespace prefix in output [operations/puppet] - https://gerrit.wikimedia.org/r/139059 (owner: Krinkle) [20:46:39] ^d: cool - I'm running the regressions on the rebased version. The rebase was just annoying enough that I want to make sure I didn't break something [20:47:17] * ^d nods [20:47:33] PROBLEM - Packetloss_Average on analytics1003 is CRITICAL: packet_loss_average CRITICAL: 9.1425542 [20:48:26] Reedy, ok, fixed https://gerrit.wikimedia.org/r/#/c/139199 - should i push it out? [20:48:33] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Thu 12 Jun 2014 17:48:13 UTC [20:49:27] greg-g, minor bug in prod, should i push it out today? (zerowiki and metawiki error on links to "v:") [20:49:36] yurikR2: confirmed fixed in beta? [20:49:42] looks simple enough, go for it [20:50:38] greg-g, http://zero.wikimedia.beta.wmflabs.org/wiki/v: - waiting for beta syncing [20:52:23] PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 13.1065471 [20:52:42] interesting... [20:53:26] ACKNOWLEDGEMENT - Hadoop NodeManager on analytics1018 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager ottomata This node will become a kafka broker. [20:54:48] ACKNOWLEDGEMENT - Puppet freshness on analytics1012 is CRITICAL: Last successful Puppet run was Thu 12 Jun 2014 17:36:45 UTC ottomata Puppet is down here as I do failure tests. Its ok!
[20:54:48] ACKNOWLEDGEMENT - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Thu 12 Jun 2014 17:48:13 UTC ottomata Puppet is down here as I do failure tests. Its ok! [20:56:23] RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 1.8361878 [20:57:18] (CR) Chad: mwgrep: Add namespace prefix in output (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/139059 (owner: Krinkle) [21:01:33] RECOVERY - Packetloss_Average on analytics1003 is OK: packet_loss_average OKAY: 0.8896939 [21:10:16] !log yurik Synchronized php-1.24wmf8/extensions/JsonConfig/: JsonConfig ext update, fixing bug 66555 (duration: 01m 04s) [21:10:21] Logged the message, Master [21:11:33] !log yurik Synchronized php-1.24wmf9/extensions/JsonConfig/: JsonConfig ext update, fixing bug 66555 (duration: 01m 03s) [21:11:37] Logged the message, Master [21:28:19] (PS1) Aaron Schulz: Removed reference to unused -v option in jobs-loop [operations/puppet] - https://gerrit.wikimedia.org/r/139208 [21:37:33] PROBLEM - Puppet freshness on tin is CRITICAL: Last successful Puppet run was Thu 12 Jun 2014 18:36:54 UTC [21:45:54] hm, i think tin is my change [21:45:55] fixing [22:25:33] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Last successful Puppet run was Thu 12 Jun 2014 19:24:26 UTC [22:26:05] ACKNOWLEDGEMENT - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC ori.livneh Most likely broken by Iecfa44c92 Ill fix. [22:30:53] !log ori Synchronized php-1.24wmf9/extensions/WikimediaEvents: Update WikimediaEvents for Ibd36da416 (duration: 00m 03s) [22:30:57] Logged the message, Master [22:31:25] !log ori Synchronized php-1.24wmf8/extensions/WikimediaEvents: Update WikimediaEvents for Ibd36da416 (duration: 00m 03s) [22:31:30] Logged the message, Master [22:33:49] MaxSem: hah!
i just saw that you have a couple of patches for today's swat as well :) [22:35:01] woot, fatal exception [22:35:04] [b379b4ff] 2014-06-12 22:34:25: Fatal exception of type MWException [22:35:08] https://pl.wikipedia.org/wiki/Element_elektroniczny_czynny [22:35:45] Bad news bears [22:36:10] (and reproducible after a refresh for me) [22:36:35] * ori looks [22:37:11] 2014-06-12 22:34:25 mw1183 plwiki: [b379b4ff] /wiki/Element_elektroniczny_czynny Exception from line 50 of /usr/local/apache/common-local/php-1.24wmf8/extensions/Math/MathSource.php: in math source mode no database caching should happen [22:37:33] Reedy: greg-g ^ [22:37:39] they were talking about something similar a while ago, IIRC [22:39:58] grrr [22:41:31] https://gerrit.wikimedia.org/r/#/c/135521/ [22:41:43] I may be totally off, but... ^^ [22:42:32] https://bugzilla.wikimedia.org/show_bug.cgi?id=66492 [22:42:35] ori: ^ [22:43:00] ori: last two comments on that bug [22:43:11] good lord [22:43:54] yeah [22:43:57] alright, let's see here [22:44:25] ori: if you want, feel free to revert the math extension to wmf7, as that new db stuff is in wmf8 and 9 [22:44:37] https://gerrit.wikimedia.org/r/#/c/139068/1/wmf-config/CommonSettings.php should do it [22:44:39] can i merge that? [22:44:59] yeah [22:45:13] (CR) Ori.livneh: [C: 2] Disable MathML rendering option [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139068 (https://bugzilla.wikimedia.org/66492) (owner: Physikerwelt) [22:45:24] (Merged) jenkins-bot: Disable MathML rendering option [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139068 (https://bugzilla.wikimedia.org/66492) (owner: Physikerwelt) [22:46:45] !log ori Synchronized wmf-config/CommonSettings.php: disable MW_MATH_MATHML until mathoid table is created (BUG 66492) (duration: 00m 04s) [22:46:50] Logged the message, Master [22:46:59] _joe_: Hm.. this is the first time I see that unmerged alert, interesting. Is the source code for that public?
[22:48:03] YuviPanda, ? [22:49:16] MaxSem: https://wikitech.wikimedia.org/wiki/Deployments#Near-term I was going to ask you to SWAT something for me, then I figured I'd do it myself, not a big deal, and then found your name in https://wikitech.wikimedia.org/wiki/Deployments#Near-term swatting two other things anyway. Just was a bit funny in context to my head at that time, nothing to see now [22:50:42] I'll take today's SWAT: :) [22:51:15] greg-g, ori: Are you guys still busy messing with math, or am I clear to start SWATting in a few minutes? [22:51:25] you're clear [22:52:24] thanks RoanKattouw [22:52:39] Cool [22:55:48] OK I have a 4:30 so I'm gonna start a little bit early [22:56:02] gogogogog [22:56:23] (CR) Catrope: [C: 2] Enable TemplateData GUI on Hebrew Wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139155 (https://bugzilla.wikimedia.org/66538) (owner: Jforrester) [22:56:31] (Merged) jenkins-bot: Enable TemplateData GUI on Hebrew Wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139155 (https://bugzilla.wikimedia.org/66538) (owner: Jforrester) [22:56:52] * YuviPanda waves [22:56:52] (CR) Catrope: [C: 2] Disable PageImages on Wikibooks and Wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139048 (https://bugzilla.wikimedia.org/66455) (owner: MaxSem) [22:56:57] (Merged) jenkins-bot: Disable PageImages on Wikibooks and Wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139048 (https://bugzilla.wikimedia.org/66455) (owner: MaxSem) [22:57:19] Krinkle: It's new [22:57:40] (CR) Catrope: [C: 2] Kill all vestiges of GeoData's Solr support [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135088 (owner: MaxSem) [22:57:49] (Merged) jenkins-bot: Kill all vestiges of GeoData's Solr support [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135088 (owner: MaxSem) [22:58:16] RoanKattouw: for my
patches, should I do a submodule bump after you merge them or they enough by themselves? [22:58:56] You do it [22:58:59] I'll +2 them right now [22:59:01] RoanKattouw: ok. [22:59:33] now I wait for jenkins [22:59:38] !log catrope Synchronized wmf-config/: (no message) (duration: 00m 04s) [22:59:43] Logged the message, Master [23:00:05] mwalker, ori, MaxSem: The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140612T2300) [23:00:28] (PS1) Withoutaname: Initialize some settings for wikimania2015wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139279 (https://bugzilla.wikimedia.org/66370) [23:00:33] YuviPanda: Jenkins is done, ping me the links when you're done [23:00:41] RoanKattouw: yup, fetching now [23:02:27] PROBLEM - Apache HTTP on mw1151 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 50706 bytes in 0.024 second response time [23:03:35] RoanKattouw: second patch https://gerrit.wikimedia.org/r/139283 [23:03:55] YuviPanda: The second one is against the wrong branch [23:04:05] RoanKattouw: yeah, muscle memory and ctrl-r memory [23:05:11] !log Purging PageImages data from Wikibooks and Wikisource [23:05:16] Logged the message, Master [23:05:42] * RoanKattouw takes a quick break while Jenkins hasn't even queued any of those commits yet [23:06:01] stupid jenkins [23:07:27] !log integration-slave1003 is failing npm-test builds due to a cache corruption (filed as https://github.com/npm/npm/issues/5472). Manually cleared /mnt/home/jenkins-deploy/.npm/async on integration-slave1003.eqiad.wmflabs for now. [23:07:32] Logged the message, Master [23:07:40] RoanKattouw: James_F: ^ [23:10:05] hey, somehow the broken mw1151 is serving requests [23:10:42] it's bad because it has obsolete MW version [23:11:20] Krinkle: is jenkins stuck again?
[23:11:25] No [23:11:29] not that I know anyway [23:11:51] icinga-wm: PROBLEM - Apache HTTP on mw1151 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 50706 bytes in 0.024 second response time [23:11:57] MaxSem: also, it is broken [23:12:41] Krinkle: Cool, thanks. [23:12:51] YuviPanda: there is nothing queued. If it isn't running that means it's not going to be happening [23:12:52] link? [23:13:27] Krinkle: nvm, it *just* merged https://gerrit.wikimedia.org/r/#/c/139284/ [23:13:28] ok [23:14:06] * YuviPanda awaits RoanKattouw [23:16:07] the bits app servers are not in use atm [23:16:24] load.php reqs are forwarded to the general app server pool [23:17:04] but load on that host does seem to have increased [23:17:07] let's see what puppet.log says [23:17:45] !log catrope Synchronized php-1.24wmf8/extensions/VisualEditor: (no message) (duration: 00m 04s) [23:17:49] Logged the message, Master [23:17:50] !log catrope Synchronized php-1.24wmf8/extensions/MobileFrontend: (no message) (duration: 00m 05s) [23:17:55] Logged the message, Master [23:17:55] !log catrope Synchronized php-1.24wmf9/extensions/MobileFrontend: (no message) (duration: 00m 04s) [23:18:00] Logged the message, Master [23:18:21] RoanKattouw: shall I test now? 
[23:18:56] mw1151 does not appear in pybal config, not disabled or enabled, it's just not in there [23:19:53] PHP Warning: require_once(/usr/local/apache/common-local/php-1.24wmf6/extensions/JsonConfig/JsonConfig.php) [function.require-once]: failed to open stream: No such file or directory in /usr/local/apache/common-local/wmf-config/mobile.php on line 44 [23:19:55] the bits app servers are not behind pybal [23:19:58] That's all mw1151 is doing at the moment [23:20:02] they were load-balanced by varnish [23:20:09] gotcha [23:20:12] the only requests that are coming in are the varnish health checks [23:20:15] bits apache, yep [23:20:22] i think because the backends are still defined for varnish [23:20:27] even though varnish is not otherwise routing requests [23:20:28] let's see [23:20:46] PHP Fatal error: require_once() [function.require]: Failed opening required '/usr/local/apache/common-local/php-1.24wmf6/extensions/JsonConfig/JsonConfig.php' (include_path='/usr/local/apache/common-local/php-1.24wmf6/extensions/TimedMediaHandler/handlers/OggHandler/PEAR/File_Ogg:/usr/local/apache/common-local/php-1.24wmf6:/usr/local/lib/php:/usr/share/php') in /u [23:20:46] sr/local/apache/common-local/wmf-config/mobile.php on line 44 [23:20:50] Weird [23:21:28] Have fun! [23:22:29] RoanKattouw_away: ty [23:23:26] RoanKattouw_away: hmm, I see the changes take effect on testwiki but not enwiki? [23:23:42] * YuviPanda looks for any other deployer [23:23:46] (PS1) Withoutaname: Apache settings for wikimania2015wiki [operations/apache-config] - https://gerrit.wikimedia.org/r/139288 (https://bugzilla.wikimedia.org/66370) [23:25:27] RoanKattouw_away: mw1151(10.64.16.131,,80) 10 probe Sick 0/3 [23:25:33] bblack: you around? [23:25:44] or mutante?
[23:26:50] i'll send a note to the ops list [23:26:59] it's not an emergency but it needs to be looked at [23:27:22] it looks like the VCL has not been reloaded since filippo updated the list of bits app server backends [23:27:48] because varnishadm backend.list is still showing the eqiad bits app servers [23:29:03] i'll run sync-common on mw1151 so it doesn't limp [23:30:44] ori: can you look at why the change RoanKattouw_away swatted seems to have taken effect on testwiki but not on enwiki, or point me to whom I should poke? [23:31:13] maybe he forgot to do a submodule update on tin [23:31:19] test and enwiki are on different branches [23:31:31] yeah, so that's highly possible [23:31:34] let me check mw.org [23:31:46] I did see !log catrope Synchronized php-1.24wmf8/extensions/MobileFrontend: (no message) (duration: 00m 05s) [23:32:01] right, it's on mw.org as well, so possibly a submodule update miss [23:32:05] * YuviPanda doesn't have tin access [23:32:24] what is the change you are expecting to see? [23:32:34] ori: Special:Tags should have 'mobile web edit' [23:32:44] what's the commit, i mean [23:33:23] RECOVERY - Apache HTTP on mw1151 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.134 second response time [23:33:33] ori: https://gerrit.wikimedia.org/r/139281 for wmf8 [23:33:50] ori: https://gerrit.wikimedia.org/r/#/c/139284/ for wmf9 [23:34:14] yeah, he forgot to update the submodule [23:34:20] i'll sync it now [23:34:23] RoanKattouw_away: ^ [23:34:24] ori: cool! [23:34:34] !log ran sync-common on mw1151 [23:34:39] Logged the message, Master [23:35:06] !log ori Synchronized php-1.24wmf8/extensions/MobileFrontend: Re-syncing after submodule update (duration: 00m 06s) [23:35:11] Logged the message, Master [23:35:24] how's it looking now? [23:35:50] ^ YuviPanda [23:35:51] ori: yup, seeing it now. cool, thanks! 
:D [23:35:55] np [23:36:07] brain melting [23:37:01] * YuviPanda gives ori some ice cream [23:44:54] (PS1) Dzahn: remove ekrem's public IP and mgmt [operations/dns] - https://gerrit.wikimedia.org/r/139291 [23:48:47] (CR) Dzahn: [C: 1] "i guess so, this is what was on manutius at least" [operations/puppet] - https://gerrit.wikimedia.org/r/139071 (owner: Matanya) [23:58:10] (CR) Dzahn: [C: 1] "yea, so, there was nothing including mortals when we added yaml here: https://gerrit.wikimedia.org/r/#/c/136150/1/manifests/site.pp , BU" [operations/puppet] - https://gerrit.wikimedia.org/r/139089 (owner: Giuseppe Lavagetto) [23:58:51] mutante: ah, you're right [23:58:54] mutante: that's the issue [23:59:05] ori: ^ back in 98f3808af6dfc38 it still included mortals [23:59:11] not sure when it disappeared [23:59:22] it explains why it got into an inconsistent state [23:59:30] because roan is root [23:59:42] so his sync-dirs went through, but not scaps earlier in the day [23:59:48] yea, and only on mw1151 because that was reinstalled