[00:00:36] hmm, well basically i'm trying to figure out whats changing the /srv/mediawiki permissions on me. puppet seems like the most likely candidate, but its turning into a real pain to debug :P [00:07:15] ebernhardson: If it is Puppet, it should log whenever it changes anything, either in /var/log/puppet* or interactively with "sudo puppetd -tv". So I would change the permissions of /srv/mediawiki, then run "sudo puppetd -tv" and see if there's a line about that. [00:07:35] ebernhardson: I thought role::mediawiki-install::labs left /srv/mediawiki alone. New files get created with the user's umask which defaults to 022 ; I don't know if it was different on pmtpa instances [00:23:07] spagewmf: There's no reason it should be (different), it's the same version of puppet running from the same manifest. [00:31:50] Coren: thanks. We're having more trouble updating /srv/mediawiki on our new eqiad labs instance and aren't sure why. I just did g+s on all subdirectories, I think it helped. [00:32:34] Well, g+s will just propagate the group down (which may be what you want); but if you have something resetting permission that won't help in the long run. [00:32:39] !log deployment-prep Converted deployment-graphite to use local puppet & salt masters [00:32:42] Logged the message, Master [01:31:07] Hi, I finished the migration and restarted the webservice, but nothing is showing on http://tools.wmflabs.org/robin/ - does anyone have any idea what the problem is? [01:48:13] SPQRobin: That gives for me "No webservice"?! [01:52:32] (03PS1) 10Tim Landscheidt: Fix Lintian errors in misctools man pages [labs/toollabs] - 10https://gerrit.wikimedia.org/r/122628 [01:54:35] scfc_de: yeah, that's what I mean with "nothing" :P [02:01:02] SPQRobin: did you run webservice start? [02:01:18] Betacommand: I did [02:01:39] when I do it again, it says "webservice already running", so... [02:01:49] and trying to stop it doesn't work [02:02:42] SPQRobin: you are doing this as the tool correct? 
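The debugging approach suggested above (reset the permissions, run a one-off agent pass, see whether puppet reverts them) can be sketched with a small snapshot helper; `perm_snapshot` is a name made up here for illustration, and `puppetd -tv` is the puppet 2.x agent invocation quoted in the conversation:

```shell
# perm_snapshot: record mode/owner/group of every path under a tree,
# so diffing two snapshots taken around a "sudo puppetd -tv" run shows
# exactly what that puppet pass touched.
perm_snapshot() {
    find "$1" -exec stat -c '%a %U:%G %n' {} + | sort -k3
}

# Usage sketch (on the instance itself):
#   perm_snapshot /srv/mediawiki > /tmp/before
#   sudo puppetd -tv 2>&1 | grep -i /srv/mediawiki   # one-off agent run
#   perm_snapshot /srv/mediawiki > /tmp/after
#   diff /tmp/before /tmp/after
# Past runs are also logged under /var/log/puppet*, as noted above.
```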
[02:02:51] SPQRobin: Your job is in "E" state; "error: can't open output file "/data/project/robin/error.log": Perm" [02:03:03] scfc_de: ah [02:03:06] Betacommand: yes [02:03:27] Which I don't understand ... [02:03:36] heh [02:03:55] scfc_de: what are the permissions for that file? [02:05:04] SPQRobin: Ah, you started webservice as robin. You need to start it as tools.robin ("become robin" => "webservice start"). [02:05:13] Users don't have webservices. [02:06:05] scfc_de: well, the tool is named identical to my user (both robin; I know, not really a good idea) [02:06:42] so I did do "become robin" & "webservice start" [02:07:23] but I did also play around with file permissions a while ago.. maybe there's the problem [02:08:04] SPQRobin: not sure you can have a tool with the same name as a user [02:08:25] well, it did work before the migration [02:09:04] maybe I should just start again under a new tool name [02:09:12] SPQRobin: you where probably exploiting a bug that got fixed [02:09:41] No, that should work. [02:11:15] SPQRobin: Look at "qstat -j 157058". It says "owner: robin", so there is a *very* high probability that you didn't "become robin" first :-). Have you tried again? [02:12:25] I did do "become robin"; my commands are under "tools.robin@tools-login" [02:12:51] Ah, I see what happened: You tried to "webservice start" as robin (which launches the job as "lighttpd-robin" as well), and then as tools.robin, but the latter saw that a job "lighttpd-robin" was already running, so it didn't start a new one. [02:13:12] As robin, you need to "qdel -j 157058", then "become robin" and "webservice start". [02:13:51] scfc_de: do you know how to block access to a tools webspace based off user agents? [02:14:44] With lighttpd no; but there should be some docs for that. [02:15:11] Ive see a TweetmemeBot spidering my tools [02:15:26] http://redmine.lighttpd.net/projects/1/wiki/Docs_Configuration, "match on useragent" looks alright. 
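Following the "match on useragent" pointer in the lighttpd docs just linked, blocking the bot for a single tool could look like this; this assumes the Tool Labs webservice merges a `~/.lighttpd.conf` from the tool's home directory into its generated configuration, which may be set up differently in eqiad:

```
# ~/.lighttpd.conf -- extra directives for this tool's webserver.
# Deny any request whose User-Agent header matches the spidering bot.
$HTTP["useragent"] =~ "TweetmemeBot" {
    url.access-deny = ( "" )
}
```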
[02:15:45] scfc_de: that worked; thanks a lot! [02:16:21] In pmtpa, we blocked TweetmemeBot globally in tools-webproxy, but I haven't caught up how that is set up in eqiad. [04:37:48] Good morning all [04:37:55] * Beetstra looks at Coren .. any news? [06:53:18] Betacommand: why? the bot is smart enough, just try !fingerprint [07:45:05] does anyone have any idea who maintains the http://ganglia.wmflabs.org/ interface? It is dead :D [08:05:00] hashar: " [08:05:00] I'm stumped, but matanya has offered to work on this. andrew (talk) 15:56, 17 March 2014 (UTC) [08:05:05] quote from Andrew [08:05:15] ah nice [08:05:17] thanks :-] [08:05:31] it is a bit of a pity since wikimedia got a contractor to handle that [08:05:34] https://wikitech.wikimedia.org/wiki/Nova_Resource:Ganglia [08:05:37] says petan based on the logs [08:05:43] would have expected the project / instance to have been movable easily [08:05:55] contractor for just ganglia? [08:06:03] yeah I think so [08:06:03] petan says what? [08:06:35] petan: the questions was who was working on ganglia project in the past [08:06:43] and i just saw your name on project SAL [08:06:45] I was one of many :P [08:07:04] I basically just fixed the OOM issues that resulted in random crashes [08:07:10] blames not enough logging by the others [08:07:24] gotcha, petan [08:07:38] hashar: i didnt realize we have a contractor [08:07:48] that was a while ago [08:08:53] hashar: poke me later about this, didn't even have time to look at [08:09:24] matanya: was just wondering whether someone knew ganglia is dead :-D [08:09:33] i have [08:15:41] hashar: can you recommend who will know about https://gerrit.wikimedia.org/r/#/c/74591/11/manifests/misc/maintenance.pp ? [08:18:01] matanya: what are the stats? (usage of apache vs. 
mwdeploy) [08:18:26] let me grep [08:19:19] 12 apache 7 mwdeploy mutante [08:20:02] i'd just make the patch with apache as user, add comment to it linking to the revert [08:20:10] and poke people for gerrit reviews [08:20:25] would have used apache because the other "purge" script does it [08:21:06] i did that [08:21:14] my concerns are the logging issue [08:21:51] i have no clue where is the logging happenning, i guess from mw-deployment-vars.sh some where [08:21:59] but not familiar with it [08:23:08] what do you mean, isn't it just changing the path to logfile? [08:24:07] no, can;t [08:24:16] it comes from templates/misc/mw-deployment-vars.erb [08:24:36] but i dont know if other things rely on that [08:24:55] try Reedy [08:24:59] it seems not, but who knows [08:25:13] i think i'll push, and wait for reviews [08:25:20] rely on the logfile that doesn't exist yet? sorry i dont get the whole context yet [08:25:26] but yea, do that [08:31:40] !log deployment-prep MediaWiki config paths tweaks for Math {{bug|63331}} and Captchas {{bug|63342}} [08:31:43] Logged the message, Master [08:41:37] Is icinga (at http://icinga.wmflabs.org/ ) also (longer) dead or just moved to another url? [08:49:47] se4598: try nagios.wmflabs.org :) [08:51:45] matanya: wasn't nagios the old system where nagios company asked why nagios.wikimedia.org redirects to icinga? ;P [08:51:54] but there also: Connect to 208.80.153.249 on port 80 ... 
failed Error 110: Connection timed out [08:51:58] that is surprising indeed [08:52:10] my point is all monitoring seems to be dead [08:52:22] remembers how much work it was for Leslie and others to switch prod from Nagios to Icinga [08:52:35] and what se4598 said [08:52:37] about the naming [08:54:22] AFAIR was icinga.wmflabs.org already a long time before migration to eqiad a 502 Bad Gateway [08:54:35] yes mutante i was abit involved in that effort [08:55:01] se4598: yes [08:55:36] you could ping Damianz [08:55:55] Damianz: can we ?:) [09:09:24] do we have or need a bug for icinga and ganglia? [09:12:30] se4598: i'd say need.. [09:13:01] labs migration people just said to please make 'em [09:17:17] se4598: if in doubt, fill a bug :-] [09:17:24] at worth someone will close it [09:17:40] will do, currently busy fighting vandalism :) [09:20:58] as long as it's not vandalism on bug reports:) [09:21:04] thanks [09:29:45] we are not able to access our instance wikidata-builder3 since yesterday.. [09:30:03] https://www.irccloud.com/pastebin/zaBSjG2U [09:30:35] should we consider it lost? [09:30:38] Tobi_WMDE: a bug in bugzilla will do, or poke andrewbogott_afk or Coren [09:39:02] Coren: do you have an idea? [09:45:59] se4598: What are you fighting vandalism with? [09:47:26] most definitely not the wiki software itself, as logical as that could be [09:47:43] gry: wiki software suck in fighting vandals [09:47:50] that can be fixed [09:47:55] I hope he uses huggle as he is one of its developers [09:47:59] gry: so fix it. [09:48:09] I would, if I knew the language; I don't [09:48:21] petan: I am ;) [09:48:29] I know the language, which makes me think it can't be fixed [09:48:29] gry: JavaScript? [09:48:36] no, wikis aren't written in js [09:48:44] Oh, you mean the core? [09:49:01] yes [09:49:13] How would that work? 
[09:49:23] huggle is a browser desinged to fight vandalism, mediawiki is software that is executed on server and then rendered in a browser, it by design is not possible to be ever that powerful as huggle [09:49:52] it's like you wanted to create a java interpretor written in java that would be faster than its c version [09:49:55] it never will be [09:50:50] a930913: in some way that any contributor can do his stuff without installing software in browser or computer [09:50:50] Huggle doesn't max CPU, so speed isn't an issue. [09:51:12] no, I don't really have an issue with these tools performance [09:51:17] gry: What do userscripts count as? [09:51:33] gry: yes that could be possible, but again it would be slow and ineffective, there is already twinkle for this afaik [09:51:33] as something you can put into core and maintain properly [09:52:23] I'm not sure how putting something into JS and loading it when he enters a vandal-fight mode could slow things [09:52:33] I just can't imagine someone would be spending time to create some js based tool that would be heavily multithreaded connected to multiple external resources and doing all the stuff what huggle does [09:52:43] possibly [09:53:21] maybe they want to run it on node.js? [09:53:25] it would slow things because software written in JS is slower than C++, that's why all these 3d games are written in C++ and not in JS [09:53:49] that's why linux kernel is written in c and not JS :P [09:54:09] it would be funny idea to create self-interpreted byte compiled JS kernel though [09:54:21] hm. they have parsoid which does many things at once, what is it written in? 
[09:54:32] JS and it's horribly slow [09:54:42] I installed parsoid to one of my servers, it just suck [09:55:01] it consumes tons of resources to do a little work [09:55:48] I think that the approach of WYSIWIG editor was much better than VisualEditor [09:56:31] that thing was really fast, stable and didn't require any 3rd software to be installed on server, like parsoid [10:06:19] petan: V8 is fast enough to run a Huggle. The main lag would be in IO anyway. [10:06:49] Remember, until recently huggle was in C# ;) [10:06:53] maybe but I see a little point in that [10:07:07] a930913: hg2 (which I use) is in Visual Basic [10:07:28] I see that it sounds cool that "user doesn't need to install anything, but open a webpage" but on other hand, typing one command in terminal in order to install huggle isn't that hard either [10:07:39] you need to install browser anyway :P [10:08:11] petan: A big barrier for for using huggle is the need to download, install and hand credentials over to software. [10:08:48] petan: Using wikipedia within wikipedia makes a lot more sense. [10:09:32] ok, but all that JS, wouldn't you need to download that too hm? the traffic of webpage based SW would be higher though, so in the end, you would end up downloading much bigger data [10:09:48] downloading and installing huggle is really very easy, its debian package has few kb only [10:10:04] petan: There's downloading a file and "downloading" a webpage. [10:10:09] regarding credentials: yes that suck, but unless wmf fix that I can't do anything [10:10:16] The webpage is transparent to the user. [10:10:32] is it? 
:P [10:11:04] more and more powerful browsers will be, more potential they will have to destroy target computers with some JS based viruses [10:11:21] I doubt that everyone is reading and understands the source code of JS they load in their browsers [10:11:24] petan: When you click a link on a webpage and are brought to another webpage, almost everyone doesn't even realise that the download has occured. [10:11:36] maybe but it did [10:12:20] petan: If anything, the source of the JS is transparent, where the compiled code given for huggle is not. [10:12:36] I wouldn't be that sure, this is only case of windows [10:12:52] ubuntu packages are built by launchpad which is service provided by canonical [10:12:53] petan: A majority OS. [10:12:59] they are just that transparent [10:13:40] you can view source code of any binary that was built there [10:13:57] Anyway, until recently huggle was windows only, with a hack for other OS's. [10:14:19] petan: My point was the source code has little effect on the end user. [10:14:22] yes, that has changed though [10:14:42] define "little effect" [10:15:01] petan: Most users don't check source. [10:15:26] yes, that makes it irrelevant whether they run byte compiled code (faster) or interpreted code [10:15:46] they don't care anyway [10:15:54] because as you said, they don't check the source [10:16:07] petan: That was in response to 11:11 < petan> more and more powerful browsers will be, more potential they will have to destroy target computers with some JS based viruses [10:16:27] As for speed, as I said, it's generally fast enough. [10:16:51] really? on your PC maybe [10:17:02] I have 1.1ghz 2gb ram netbook [10:17:07] that thing can barely run firefox [10:17:12] the latest version I mean [10:17:15] petan: My PC is a benchmark for slow ;p [10:17:24] it's slow mastodon eating lot of resources [10:17:28] Well yeah, FF is a beast. 
[10:17:31] not all people on world have latest computesr [10:17:33] * computers [10:18:10] I can't even run facebook on that PC, but huggle, hell that run so fast and well [10:18:23] facebook is web based, huggle is natively running [10:18:39] I could run 10 huggles simultaneously on that box [10:18:51] but that box can't handle 1 facebook window [10:19:03] that is my proof that webbased stuff is slow and resource expensive [10:19:55] one day people will have boxes with 20 cores and 100gb of ram, and they will be happily running the same software as they run now, just it would be web based [10:19:59] petan: You seem to be confusing memory leaks with programs. [10:20:20] what, are you saying there is memory leak in firefox? [10:20:49] **** yes. I switched to chrome because of that. [10:20:57] It's just bloat. [10:21:01] I have chrome too [10:21:10] it uses just as much ram as firefox [10:21:20] literally, I see no difference between these 2 [10:21:36] just firefox is IMHO more community based, while chrome is pretty much all about google [10:21:39] FF has so much more overhead. [10:21:48] chrome too [10:22:02] that's all the JS bloat wrapped all around these browsers [10:22:02] That does not logic.. [10:25:41] petan: The most computationally expensive thing in much of the browser stuff is the rendering of the page. [11:00:45] petan: i built huggle 3 on my machine, works good, and looks nice, thanks [11:00:55] any plans for supporting more wikis? [11:02:54] you are welcome :P [11:03:02] yes, huggle 3 supports any wiki, even non wmf [11:03:16] one guy already installed it and had it working on some private wiki [11:05:05] a930913: indeed, fortunatelly huggle uses api, which don't need to be rendered, and pages it display are mostly pre-parsed diffs only [11:16:22] petan: Rendering in both applications should be similar. 
[11:17:03] a930913: I don't think so, huggle only needs to render few lines of html, while web browser based application would need to render whole its interface that would be only html + JS etc [11:18:03] the interface itself changes a lot [11:18:28] just have a look how often the queue is changed, the lists with information about edit etc [11:20:32] petan: how do i choose a non-listed project ? [11:21:06] matanya: wmf or non-wmf? [11:21:11] wmf [11:21:41] well, most easy is probably to open a bugzilla ticket requesting to enable huggle on that project [11:21:57] the project needs to have huggle configuration page, warning templates and all that [11:22:19] then it just needs to be added to Huggle [11:22:27] petan: Fair point, but the queue still needs to be rendered in any language. [11:22:28] * https://meta.wikimedia.org/wiki/Meta:Huggle/List [11:22:58] a930913: rendering it in native (on linux GTK) library is much, much faster, then rendering using html [11:23:16] html needs to be parsed, binary code doesn't [11:24:14] petan: Also if done right, a DocumentFragments helps change only deltas. [11:24:55] maybe, but I am not going to do that, but you have my permission to rewrite huggle to JS :P [11:25:05] good luck :) [11:26:08] petan: As I said last time, I already have translated the core, just needs a bit more bugfix :) [11:26:18] Do I get to call it huggle though? :3 [11:26:43] that's up to you [11:27:57] So I won't get shouted at for infringing "trade" mark? 
:p [11:31:07] thanks petan [11:38:22] done for today: hg2 still works like a charm http://imgur.com/W3hlP2J petan, sorry for not using hg3 ;) [11:51:29] mutante, matanya: created bugs for icinga and ganglia: bugs 63361 and 63362 [11:51:55] great,thx [11:51:55] thanks, coren and andrewbogott_afk will probably look at it [12:48:48] after 'finish-migration render-tests': [12:48:51] tools.render-tests@tools-login:~$ mysql [12:48:51] ERROR 1045 (28000): Access denied for user 'rendertests'@'10.68.16.7' (using password: YES) [12:48:54] what am i missing? [12:49:03] have the credentials changed? [12:56:58] JohannesK_WMDE: was the tool created while you was logged in with the shell. If yes, restart your connection [12:57:22] ups, misread [12:57:28] se4598: no, it's been migrated... [13:04:44] JohannesK_WMDE: seems you are missing the conf file. try $ sql local [13:05:13] !mysql [13:05:24] !mysql is https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Shared_Resources/MySQL_queries [13:05:25] Key was added [13:06:01] tools.render-tests@tools-login:~$ mysql local [13:06:01] ERROR 1045 (28000): Access denied for user 'rendertests'@'10.68.16.7' (using password: YES) [13:06:18] .my.cnf exists if that's what you mean hedonil... [13:06:21] JohannesK_WMDE: just sql not mysql [13:06:37] tools.render-tests@tools-login:~$ sql local [13:06:37] ERROR 1045 (28000): Access denied for user 'rendertests'@'10.68.16.7' (using password: YES) [13:06:44] sql is a wrapper script, mysql is the binary [13:06:47] !log deployment-prep Applying role::beta::natfix on deployment-upload.eqiad.wmflabs . 
Might let it access images from commons.wikimedia.beta.wmflabs.org ( ex: http://upload.beta.wmflabs.org/wikipedia/commons/thumb/4/43/Feed-icon.svg/16px-Feed-icon.svg.png yields: Error retrieving thumbnail from scaling server: couldn't connect to host commons.wikimedia.beta.wmflabs.org ) [13:06:50] Logged the message, Master [13:07:02] hedonil: i know [13:07:21] JohannesK_WMDE: can you open replica.my.cnf and read the content? [13:07:49] JohannesK_WMDE: not mysql, but "sql local" [13:07:52] hedonil: yep... [13:07:57] se4598: yes, see above [13:08:51] try $ mysql --defaults-file=~/replica.my.cnf -h tools-db [13:09:38] sorry my irc connection apparently dropped and i doesn't saw the messages (reading now from bot-log) [13:12:27] hedonil: that works... apparently .my.cnf is bad/broken/whatever [13:12:55] JohannesK_WMDE: toss it to the bin ;) [13:13:00] tools.render-tests@tools-login:~$ cat .my.cnf [13:13:00] [client] [13:13:00] host=tools-db [13:13:00] user=rendertests [13:13:00] database=rendertests [13:13:01] password=[...] [13:13:32] hmmmm... [13:13:34] That is a strange password - you sure it is correct ;-) [13:14:13] Beetstra: sure! that's the one i always use for everything! ;) [13:14:43] :-D [13:15:32] seriously though... i didn't realize the database names and user names changed that much after the move. so we have to use the sXXXX__blah DB names for everything? [13:16:14] JohannesK_WMDE: yep. http://tools.wmflabs.org/tools-info/?dblist=tools-db [13:16:20] JohannesK_WMDE - changed that much: some even disappeared!! [13:16:32] Beetstra: lol [13:16:52] great [13:17:14] Beetstra: coren made your db going to stealth mode ;) [13:17:30] hedonil .. Iḿ not exactly laughing yet - may have to spend time to again regenerate the db since I .. forgot to dump the structure of it after the last build/redesign :-( [13:18:21] Beetstra: Sorry, my timezone doesn't really allow me to respond to you around 1am my time most of the time. 
:-) I'm going to restore your DB from a pickled instance in an hour or two. [13:18:40] OK, I hope to see it tomorrow then. [13:18:44] Thanks Coren [13:18:45] we have to move again soon. this is all very exciting and fun [13:18:54] JohannesK_WMDE: Move again? [13:19:14] YES, please Coren, schedule another move. Split all tools into their own instances or something like that [13:19:40] Please announce it today (and hope that people don't notice it is the 32nd of March) [13:19:44] Coren: yeah, TS to labs, then the equiad move, then ... move somewhere else just for the fun of it [13:20:02] and yes, today would be the right day to announce it Coren [13:20:11] Beetstra: I think Coren is going to recover your missing asset [13:20:39] hedonil, Coren just mentioned that, indeed [13:20:40] Given how much pain everyone has had to go through already, I don't think it'd be the right time for playing April fool. Srsly. :-) [13:20:47] Beetstra: Everything will be ok in the end. If it's not ok, it's not the end :-D [13:21:19] hedonil - I have spam fighters from all over the world complaining, asking when I bring my tools back online [13:21:29] did not hear any spammers complain, though .. strange [13:21:54] Beetstra: they know... [13:22:11] Coren: Can you send round a panicked, back-up-your-work-quickly email to labs-l? :p [13:24:06] OK, see you tomorrow [13:24:52] a930913: I'm all for good fun in general but I really don't think it'd be a good move to annoy people about Labs now. :-) [13:27:04] Though backing up is always a good idea, and with no Labs snapshots at the moment, it's mandatory :-). [13:29:19] scfc_de: Is there an easy command to snapshot your own stuff? [13:30:27] a930913: rm -rf ? [13:30:27] !log deployment-prep upgrading labs Elasticsearch to 1.1.0 [13:30:30] Logged the message, Master [13:31:20] !log deployment-prep deployment-upload is rejecting connection on port 80. 
Applying role::beta::uploadservice from {{gerrit|122786}} [13:31:23] Logged the message, Master [13:31:45] hedonil: Thanks, running now. [13:31:50] a930913: ;) [13:31:56] hedonil: ;) [13:32:48] !log deployment-prep Thumbs access more or less fixed [13:32:51] Logged the message, Master [13:33:03] On the more positive side of things, I note that Tool Labs has been rock solid in its new home. The pain wasn't for nothing. :-) [13:33:38] Coren: With your professional experience, at what point should we expect the creaking to start? :p [13:35:30] Coren: congrats! [13:36:00] a930913: "ssh -e none scfc-test@tools-login.wmflabs.org 'cd .. && tar jc $USER' > scfc-test.tar.bz2" [13:36:27] Coren: but you have provide a kind of 'legacy corner' with gluster [13:37:19] Coren: to remember - and just in case have a cuplrit that can be blamed randomly [13:37:21] lol [13:38:31] scfc_de: That's just files, not cron or db, etc., right? [13:44:57] a930913: Yes, just the home directory. That suits me, as I always generate/back up crontabs or DBs from/to disk. [13:47:08] a930913: I'm hoping "never"; the move to eqiad allowed us to do "right" some of the things that were hacked up in pmtpa and there is no plan to stop continual maintenance. [13:47:42] Also, I *finally* get to start working of feature requests again. [14:07:37] hashar: deployment-elastic04 won't even let me in! [14:08:13] :-( [14:08:31] manybubbles: same for me [14:08:49] have you attempted rebooting it ? [14:09:00] let me give it a shot [14:09:05] we fucked up the puppet master yesterday might have applied some wrong configuration to it [14:09:06] :( [14:09:55] manybubbles: if all fails you get to grab ops to fix it up [14:13:17] hashar, ganglia was running on a Lucid instance. There were only a couple of Lucid boxes still running in labs and they mostly didn't survive the move due to not supporting the NFS client that we require. [14:13:55] andrewbogott: ah nice [14:14:24] andrewbogott: was it using any puppet classes ? 
I will be more than happy to attempt to resurrect the service on a Precise instance [14:15:44] hashar: please do! I started work on the 'aggregator' instance, it should have the necessary classes applied. [14:16:17] I just made you a project admin :) [14:16:35] What actually happens in these office meetings for labs? [14:18:52] andrewbogott: thanks [14:19:15] manybubbles: seems elasticsearch04 is reachable again. [14:19:25] yeah! [14:19:26] I'm on it [14:19:40] manybubbles: note that we have our puppetmaster (deployment-salt) with patches cherry picked in /var/lib/git/operations/puppet [14:19:54] I figured it was something like that [14:20:06] can those be on a branch or something? [14:20:09] a930913: You're talking about the toolserver migration meeting? If you're already migrated then it probably won't be interesting :) If you have issues that are preventing you from moving away from toolserver then the office hour is a good opportunity to make sure your blocker is addressed. [14:21:46] andrewbogott: You know, I completely missed the toolserver part :p I just say migration and thought it was about the labs migration XD [14:21:53] s/say/saw/ [14:22:44] hm... why Update_time = NULL o_O [14:22:50] i am surprised [14:24:11] is there a way to enable? [14:24:55] Steinsplitter: I guess nobody has a clue what you are talking about. [14:27:16] SELECT UPDATE_TIME FROM information_schema.tables WHERE TABLE_SCHEMA = 'userdb' [14:27:21] output is NULL... [14:27:31] strange... normally should be a timestamp [14:29:35] andrewbogott: first step for ganglia: get it installed on hosts in eqiad and point to the new IP address :-D https://gerrit.wikimedia.org/r/122790 [14:30:19] scfc_de: do you have any idea why that is switched off on the labsdbs? [14:32:19] !log deployment-prep completed upgrade to Elasticsearch 1.1.0 and fixed deployment-elastic04. [14:32:21] Logged the message, Master [14:32:50] Steinsplitter: No.
[14:33:26] springle: Is "SELECT UPDATE_TIME FROM information_schema.tables WHERE TABLE_SCHEMA = 'userdb'" being NULL a conscious decision? [14:34:04] !log ganglia aggregator puppet.conf had some HTML for certname entries. Fixed all occurrences to use i-000002a4.eqiad.wmflabs [14:34:06] Logged the message, Master [14:34:30] Steinsplitter: Which server? [14:35:12] scfc_de: the one for the user databases [14:35:22] s51291 O_O [14:35:53] hashar, it has to be done by ip? [14:35:58] andrewbogott: no idea [14:36:06] andrewbogott: merely reusing the same logic that was used previously [14:36:10] yeah, ok. [14:36:14] Best to just remove the pmtpa entry [14:36:16] one bug at a time :-] [14:36:40] Steinsplitter: did that ever work on labs? [14:36:46] Steinsplitter: http://comments.gmane.org/gmane.comp.db.mysql.general/112102 [14:36:54] Steinsplitter: tools-db or the one on the replicas? [14:37:06] ... Beginning with MySQL 5.7.2 [14:37:10] Coren, any way to check server-status via the terminal? xtools is stuck again, and I want to check out why, but the server-status page isn't loading as well. [14:37:24] scfc_de: tools-db [14:37:30] hashar: does 'one bug at a time' refer to pulling out pmtpa code as well? [14:37:54] na just the ip / hostname [14:37:59] I have no idea how ganglia use that information [14:39:54] !cyberpowerresponse [14:39:54] and I say what I need, get no response and realize I just wasted the effort of typing what I need. [14:40:12] Steinsplitter: Did hedonil's link answer your question? [14:40:23] This is exactly why I ping before I talk. petan [14:40:52] Cyberpower678: I hope the effort of typing two sentences hasn't hurt you too much. [14:41:28] scfc_de, I lost about 2 minutes of my life. :p [14:41:33] It's tragic [14:41:36] scfc_de: oh yes :) thy [14:41:42] hedonil: thx for the link :)
If you need anything, say petan: , saying just "ping" is totaly useless [14:42:45] !cyberpowerresponse [14:42:45] and I say what I need, get no response and realize I just wasted the effort of typing what I need. [14:42:51] !cyberpowerresponse | petan [14:42:51] petan: and I say what I need, get no response and realize I just wasted the effort of typing what I need. [14:43:28] Cyberpower678: but I have a special window in my irc client which keeps track of these, so I can read it later, if you just say petan: ping all I see in that window is that you just "needed something" [14:44:09] petan, then I leave IRC, when you want to answer, and I end up having to ask again. :p [14:44:36] yes, you end up having to ask again if you just say "petan: ping" because your question will never be logged [14:44:55] if you say petan: I need this bla balbfd g then I will know it because it will be logged [14:45:03] it's like sending an e-mail "Hi are you there?" [14:45:06] use memoserv [14:45:12] memoserv is crap [14:45:17] :) [14:45:25] petan, not really. It works for me. :p [14:45:35] everytime I sent a message to anyone using memoserv they never read it [14:45:51] * Cyberpower678 reads all memos [14:45:57] I read no memos [14:46:50] * YuviPanda left yesterday since petan and Cyberpower678 seem to be talking random stuff and putting random personla opinions in wm-bot, back the next day and it is still going on... [14:47:26] btw I am pretty sure if I used memoserv it would be like "you have 1 new memo from Cyberpower678", then I would do /msg memoserv read and response would be "petan: ping" [14:47:48] andrewbogott: well for some reason our package ganglia-webfrontend does not provide any PHP files to set up the ganglia website :D [14:48:05] yep, bits are definitely missing. [14:48:06] petan, umm....no [14:48:16] YuviPanda: we moved from wm-bot to memoserv, so shush [14:48:23] alright. [14:48:25] andrewbogott: ganglia-webfrontend_3.5.0-wm1_all.deb is only 10kbytes ... 
I guess something screwed up when we build it [14:48:44] that's weird, how did it ever work in that case? [14:49:02] maybe there was a local deb repo in that project? [14:49:23] Or maybe 10kb is all it takes [14:52:16] andrewbogott: http://aufflick.com/blog/2005/11/10/six-stages-of-debugging [14:58:08] Cyberpower678: No; by definition you have to be able to connect to the webserver for this. But xtools is always stuck for the same reason: too many long requests end up clogging all the workers. [15:01:25] Coren, I want to know which requests are causing it, so I can restrict the tool for the time being. [15:01:53] access.log shows me every request but not the ones that are hanging. [15:07:46] hashar: I asked you a while back about multicast in labs - do you know if we plan to upgrade to an openstack that has it working "properly"? [15:08:31] manybubbles: I have zero idea. I think Leslie / Ryan wanted it eventually. [15:08:36] manybubbles: Do you know what openstack version/service would support that? [15:08:41] no idea:( [15:09:10] We did just upgrade things in eqiad. But the network setup is much the same as before. [15:16:04] andrewbogott: looks like it requires GRE tunnels.... starts talking about networking stuff I've let leak out of my head years ago [15:16:22] http://assafmuller.wordpress.com/2013/10/14/gre-tunnels-in-openstack-neutron/ [15:16:25] manybubbles: ok, so that's probably a Neutron thing :( [15:16:36] Might be a while, we tried to set up Neutron for this last migration and it was a MESS [15:16:50] yeah [15:16:55] ah [15:17:14] They don't really support our current use case. So we'd have to design an entirely new network topo [15:17:35] which I am not up for, and we're down to half a network engineeer atm :( [15:22:48] Cyberpower678: Do some logging on your own? [15:23:25] Yea. 
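The "do some logging on your own" suggestion can start from the access log the tool already writes; a minimal sketch, assuming the Apache-style common log format that lighttpd emits by default, with `top_requests` a made-up helper name:

```shell
# top_requests: tally the most frequently hit request paths in an
# access log. In common log format, field 2 when splitting on '"'
# is the request line ("METHOD /path HTTP/x.y"), so r[2] is the path.
top_requests() {
    awk -F'"' '{ split($2, r, " "); print r[2] }' "$1" \
        | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Usage sketch:
#   top_requests ~/access.log 20
```

The paths that dominate this tally are the first suspects for the long-running requests clogging the workers; timing them individually (or logging durations from the tool itself) narrows it down further.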
[15:34:20] (PS1) Tim Landscheidt: Package webservice [labs/toollabs] - https://gerrit.wikimedia.org/r/122841 [15:34:25] Cyberpower678: Well, there are a few ways around that; you could have the tool itself log some metrics - or indeed you can look at the access log to see where things slow down. [15:42:27] matanya: Re https://wikitech.wikimedia.org/w/index.php?title=Tool_Labs/Migration_to_eqiad&diff=0&oldid=108232, do you want those tools to be deleted? [15:42:39] they can scfc_de [15:44:07] matanya: Could you then go to https://wikitech.wikimedia.org/wiki/Special:NovaServiceGroup and "Remove service group"? I'll clean up the disk afterwards. [15:52:10] scfc_de: You must be a member of the projectadmin role in project tools to perform this action. [15:57:39] !log deployment-prep shutting down all pmtpa instances [15:57:43] Logged the message, dummy [15:57:47] matanya: Oh! Well, then, eh, ignore my silly plea and let me do it all :-). [15:57:56] :) [15:59:59] bd808: did you restart the 'logstash' instance in pmtpa [15:59:59] ? [16:00:25] um… dang, this script did more than I expected. [16:00:44] Coren, you restarted a couple of instances in tools, and I just now shut them down (collateral damage). Is that messing with you? [16:00:54] Sorry if I just ruined a copy in progress :( [16:01:12] andrewbogott: You have. Your sense of horrible timing is impressive. :-P [16:01:20] Not a catastrophe, just a minor annoyance. [16:01:24] sorry [16:01:30] :-) Poop occurs. [16:01:33] So... [16:01:44] andrewbogott: I restarted logstash in pmtpa yesterday in an attempt to be able to ssh into it. You can kill it with fire. I'm done with it. [16:01:46] I'm also about to shut down wikitech, maybe you should restart the things you need before I do that! [16:01:52] bd808: ok, thanks [16:02:12] Coren: eta for https://bugzilla.wikimedia.org/show_bug.cgi?id=62387?
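Coren's suggestion above, mining the access log to see what is clogging the workers, can be sketched with standard tools. Everything below (the scratch directory, the sample log entries, the paths, the bot name) is hypothetical; only the usual combined log format, where field 7 is the request path, is assumed:

```shell
# Hypothetical sketch: tally requests per path in a combined-format
# access.log to spot what is hammering the webservice.
cd "$(mktemp -d)"
cat > access.log <<'EOF'
1.2.3.4 - - [01/Apr/2014:15:00:01 +0000] "GET /xtools/pcount HTTP/1.1" 200 512 "-" "TweetmemeBot"
1.2.3.4 - - [01/Apr/2014:15:00:02 +0000] "GET /xtools/pcount HTTP/1.1" 200 614 "-" "TweetmemeBot"
5.6.7.8 - - [01/Apr/2014:15:00:09 +0000] "GET /xtools/ec HTTP/1.1" 200 128 "-" "Mozilla/5.0"
EOF
# Field 7 of the combined log format is the request path; count and rank them.
awk '{print $7}' access.log | sort | uniq -c | sort -rn
```

The same pipeline with `$NF` instead of `$7` would rank user agents rather than paths.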
[16:02:16] andrewbogott: I'm almost done, I'm making a last-bit-of-paranoia dump of some dbs and I was going to blow those instances away. [16:02:43] Coren, I have a script I'm running periodically that shuts down everything in pmtpa. I'll ask you before I run it next... [16:02:53] But, meanwhile… let me know when you're clear of wikitech? [16:03:09] gifti: I'm working on requests this week. Unless there is some unknown complexity it should be done in a few days at most. [16:03:36] andrewbogott: I just rebooted the instances, so I'm okay. [16:04:22] ok [16:04:58] meh [16:05:58] gifti: Perhaps even today with a bit o'luck. [16:34:00] hi Coren [16:34:03] can you check https://bugzilla.wikimedia.org/show_bug.cgi?id=54164 ? [16:36:16] I seem to have forgotten the alternate views. [16:38:32] Coren: thanks [16:41:12] Coren: and I left a message in this channel a while ago, that I can't receive mails sent to tools.tools@tools.wmflabs.org . [16:41:13] playing with this is just for fun :) if you have other things to do, do them first :p [16:46:35] liangent: Yeah, I really don't care about that particular edge case you are messing with for the sake of messing with an edge case. :-) [16:48:36] Coren: as long as it doesn't eat mails sent to other users [16:50:14] * andrewbogott drums fingers impatiently [16:54:36] anyone around has experience with running a bot on a Windows system? [16:54:48] I am an admin in Heb Voy and I need to use a bot to be able to do some more advanced things within a shorter period of time [17:07:46] can anyone help? [17:08:14] Coren: I got redirected to virt1000.wikimedia.org from wikitech.wikimedia.org ? [17:08:35] liangent: note the topic [17:09:39] andrewbogott: ok .. this brokenness looks strange [17:11:49] liangent: yes, i did as well, but not anymore [17:11:53] wfm now [17:12:10] besides it is readonly, db-side [17:33:33] liangent: behaving now?
[17:34:36] andrewbogott: yeah [17:37:38] (PS1) BryanDavis: Passwords needed by role::graphite [labs/private] - https://gerrit.wikimedia.org/r/122860 [17:39:56] (CR) BryanDavis: [C: 2 V: 2] Passwords needed by role::graphite [labs/private] - https://gerrit.wikimedia.org/r/122860 (owner: BryanDavis) [17:40:23] -| The tools are dead; long live the tools! :P [17:43:09] Coren (or someone else with mediawiki admin skills) : when/if you have a minute, could use help with bulk page deletion on wikitech [17:44:31] andrewbogott: After the office hours, I'm okay. [17:44:39] oh yeah, that! [17:44:40] * andrewbogott tunes in [18:08:49] Coren: how do I hit tools.wmflabs.org from inside tools.wmflabs.org again? [18:08:52] tools-webproxy? [18:09:15] Coren: hmm, and https://tools-webproxy doesn't work at all :( [18:09:36] YuviPanda: It should. What happens when you try? [18:09:53] Coren: I mean, *https* tools-webproxy. [18:09:58] Coren: so ssl cert [18:10:05] tools.wp-signpost@tools-login:~/code/signpost/WPSignpost/api$ curl https://tools-webproxy/ [18:10:06] curl: (51) SSL peer certificate or SSH remote key was not OK [18:10:19] Ah, well yeah -- the cert name will be 'tools.wmflabs.org' [18:10:33] But you needn't use SSL internally. Nothing else does. [18:10:34] Coren: http works [18:10:34] Coren: yeah [18:10:45] Coren: true. wonder if there's any way around it at all [18:10:52] Around what? [18:12:37] Coren: the entire 'tools.wmflabs.org can not be reached from inside' [18:13:07] YuviPanda: nova-network limitation; public IPs are simply entirely unreachable from within the network. [18:13:24] sigh. is it something they are gonna fix or is it 'by design'? [18:13:52] Coren, for a bulk deletion, what works best? I can do a wildcard in the page name, or an SMW search… (I can probably use the latter to get all the page names into a file, if that's what we need.)
I want to clobber https://wikitech.wikimedia.org/wiki/Nova_Resource:*.pmtpa.wmflabs [18:14:01] right now our usage stats are /way/ wrong :) [18:14:31] YuviPanda: I think it's "nasty side effect of how they decided to do things" [18:14:33] andrewbogott: I just tried to use [[Special:NovaProxy]] to create a new web proxy in the deployment-prep project and got "Successfully added graphite-beta entry for IP address 208.80.155.156. Failed to create new proxy graphite-beta.wmflabs.org.". Possibly related the existing logstash-beta.wmflabs.org proxy (which is working) is not showing up in the proxy list for the project. [18:15:13] andrewbogott: SMW search is probably your best bet [18:15:26] bd808: I'll try, just a second. Might be a side-effect of the wikitech switch just now. [18:15:30] Although, honestly, I don't remember having ever had to do mass deletion. :-) [18:15:37] andrewbogott: Thanks [18:15:46] Coren: Oh, ok -- if you haven't done it then I can just rtfm [18:17:06] bd808: yeah, happening to me too. I'll investigate [18:21:23] bd808: try it now? [18:21:53] andrewbogott: Worked! Also I read your email and logged out and back into wikitech [18:22:13] bd808: cool. There was just a firewall rule keeping wikitech from reaching the proxy. [18:22:15] I can see both proxies now in the list [18:22:24] We'll probably find lots of those in the next couple of hours :( [18:23:26] YuviPanda: In pmtpa, we had an alias in /etc/hosts for tools.wmflabs.org. [18:23:38] scfc_de: yeah, we should probably do that here too, perhaps [18:23:42] scfc_de: was it puppetized? [18:37:49] YuviPanda: No. /etc/hosts is (ATM) manually synchronized from /data/project/.system/hosts. [18:47:33] wasn't there the idea of secure repositories for bot passwords? [18:49:47] gifti: I'm not sure what you mean; any passwords can be stored in a file with limited permissions at need. Or is there a specific use-case you have in mind where that is impractical? 
[18:50:53] hm, i'm not sure [18:51:19] i want to git-ify my tool and i thought i could store code and passwords in different repos [18:52:11] gifti: I wouldn't put any passwords in a repo; save it locally on your box for backup and on Labs in a file with those limited permissions. [18:52:26] gifti: generally things like that are handled by having a dummy password in source control or puppet, and then using a local 'real' password that doesn't get overwritten by puppet. [19:03:31] mutante|away: se4598: Apparently someone turned the instance off - I blame andrewbogott. Booting up atm [19:03:51] what instance? [19:03:59] icinga [19:04:13] it might not have been you, but it was shut off and I wanted to see if you were here for something else [19:04:22] in what project? [19:04:29] icinga [19:04:33] ah, ok. [19:04:37] I don't think i shut it down, but maybe :) [19:05:14] hmm it's not exactly booting very well... maybe just slow [19:05:49] Anyway... do you have a wikitech/nova controller etc box like there was in pmtpa? Want to test some saltmaster/keystone changes now that group acl support has been merged for the next release. [19:06:15] I don't have a test box right now :( The ones from pmtpa died during migration [19:07:10] :( Hmm... ok. IIRC they don't really build that well, can give it a try though [19:08:42] Yeah, will require a lot of by-hand work to make new ones. [19:08:49] Should be puppetized, but it's hard [19:09:34] You basically have to do all the keystone/nova stuff by hand if I remember right [19:12:44] and all the mediawiki stuff [19:12:48] basically everything :) [19:13:08] I think I know the answer to this but … is there a way to add a security group to an instance that has already been created? [19:13:15] I only need keystone and salt though ;) [19:13:38] Though mediawiki would be good... can't be worse than configuring bios + raid on dozens of boxes [19:13:43] bd808: there isn't. You can add rules to existing groups though.
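The dummy-password advice gifti got above can be sketched roughly like this. All names and the credential below are hypothetical; the pattern is just "real secret in a mode-600, git-ignored file, placeholder in the repo":

```shell
# Hypothetical sketch of keeping a bot password out of a git repo.
cd "$(mktemp -d)" && git init -q .

printf 'dummy\n' > password.example    # committed placeholder
printf 's3cret\n' > password.txt       # real credential, never committed
chmod 600 password.txt                 # readable only by the tool account
echo 'password.txt' >> .gitignore      # git never picks up the real file

# The tool reads the real file at runtime instead of hard-coding it:
BOT_PASSWORD=$(cat password.txt)
```

`git check-ignore password.txt` is a quick way to confirm the real file is actually excluded before the first commit.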
[19:13:52] * bd808 grumbles [19:14:06] That's what I thought the answer was [19:14:29] Thank goodness for puppet roles to set everything up :) [19:20:57] !log deployment-prep Deleting and re-creating deployment-graphite because I forgot to add the web security group [19:20:59] Logged the message, Master [19:24:02] Coren: can you block access based off UA? [19:24:12] https://www.google.com/about/careers/search/#!t=jo&jid=32155002 < Anyone fancy a new job [19:25:34] Betacommand: Not reliably; but lighttpd should allow you to do so: http://redmine.lighttpd.net/projects/1/wiki/Docs_Configuration has some relevant examples. [19:26:12] Coren: the reason I was asking was that my tools are being spidered [19:27:51] notably TweetmemeBot [19:28:38] Betacommand: Should be fairly simple to deny access on $HTTP["useragent"] =~ "TweetmemeBot" [19:29:06] Coren: where is that set at? [19:29:16] * Damianz wonders what sort of memes Betacommand has that are worth tweeting [19:29:24] I originally had a restrictive robots.txt but then some users complained that they were /not/ being spidered. :-) [19:30:11] Coren: can you go ahead and block spiders for my tools? [19:30:36] Betacommand: In your tool's .lighttpd.conf. You probably want a stanza that looks like https://tools.wmflabs.org/paste/view/1d82b739 [19:31:04] * Damianz remembers when Coren blocked his own scripts from hitting his own scripts [19:32:55] Betacommand: You mean in robots.txt? I can block your tool root, but the really ugly bots don't even check. [19:33:51] where is the right place to ask about 'git review -s' returning an error? [19:33:53] Coren: ugly bots I can screw with [19:33:55] Hell, in general, the well-behaved bots that obey robots.txt are the ones one generally don't mind. :-) [19:34:02] Coren: thanks [19:34:41] Any news on ganglia's return? [19:34:44] Coren: I have some reports that probably shouldn't be indexed [19:34:51] (or a bug I can track?)
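The paste Coren links above isn't reproduced in the log, but a `.lighttpd.conf` stanza along the lines he describes might look like the sketch below. Treat it as an assumption-laden sketch, not the actual paste contents; it relies on lighttpd's mod_access module being loaded, and the bot name is just the one from the conversation:

```
# Sketch only (not the actual paste): deny any request whose
# User-Agent header matches TweetmemeBot. Requires mod_access.
$HTTP["useragent"] =~ "TweetmemeBot" {
    url.access-deny = ( "" )
}
```

The empty string in `url.access-deny` matches every URL, so the condition alone decides who is blocked; a path prefix could be listed instead to block the bot from only part of the tool.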
[19:35:41] Betacommand: You might want to add rel="nofollow" to generated links to those; this way the well-behaved bots wouldn't anyways. [19:36:10] Coren: those links come from wikipedia [19:36:24] which already have nofollow :P [19:36:56] Indeed, so the bots following them are known to be misbehaving to begin with. :-P [19:37:02] gifti: What error? [19:37:15] Coren google does that :) [19:37:47] Tsk, tsk. [19:39:14] Coren: I think it's just for the WMF though [19:39:55] scfc_de: https://tools.wmflabs.org/giftbot/error.txt [19:40:33] User-agent: * [19:40:33] Disallow: /betacommand-dev/ [19:47:46] gifti: That isn't on Labs, I suppose? Then you could take the vanilla commit-msg from gerrit.wikimedia.org; cf. https://www.mediawiki.org/wiki/Gerrit/Tutorial#Commit_Hook_and_Change-ID, "scp -p -P 29418 ...". [19:49:48] !log deployment-prep Converted deployment-graphite.eqiad.wmflabs to use local puppet & salt masters [19:49:50] Logged the message, Master [19:50:40] :-] [19:51:12] bd808: I am trying to resuscitate the labs-wide ganglia setup [19:51:35] hashar: Anything I can do to help? [19:51:54] Krinkle: you mean something like https://bugzilla.wikimedia.org/show_bug.cgi?id=63362 ? [19:53:09] mh, now there's at least a fatal error at ganglia instead of a blank dir listing. Anyone worked on that? [19:53:25] !log ganglia adding bd808 as a project admin [19:53:27] Logged the message, Master [19:53:42] Oh noes! Moar power! [19:53:57] hashar: you are currently reinstalling it? [19:54:41] bd808: instance is aggregator.eqiad.wmflabs. There is a puppet class installed but ganglia-web is missing (the PHP files are not provided by our ubuntu package, see ops list) [19:55:13] !log ganglia cloned https://github.com/ganglia/ganglia-web.git to /usr/share/ganglia-webfrontend and checked out tag 3.5.12. Total unpuppetized hack for the win [19:55:16] Logged the message, Master [19:55:29] se4598: yeah I need ganglia to provide metrics for beta.
[19:56:49] added you as cc on the bug [20:00:24] Coren: two questions, any update on changing the default grid output location, and how is your cron update going? [20:01:28] Betacommand: I've delayed the forcible change to allow people more time to switch by hand; both are operational for a while still. [20:01:30] se4598: thx [20:01:47] Betacommand: As for the jsub defaults file, I hope to be able to do that this week. [20:06:20] !log ganglia something more or less working at http://ganglia.wmflabs.org/latest/ [20:06:22] Logged the message, Master [20:12:42] hashar: can I copy your comments for ganglia here into the bug entry or do you leave a note? [20:13:03] se4598: will leave a note tomorrow [20:13:21] ok [20:14:05] hey Coren. my meetings are done now so poke me whenever :) [20:14:28] Will do shortly. [20:20:16] bd808: interestingly ganglia can write to a graphite backend :] [20:21:04] I'm close to getting graphite running in beta. Not quite right yet, but close [20:21:41] hashar: i saw you worked on populating beta wikis using dumps, where does that stand? [20:22:06] matanya: we have populated nothing [20:22:24] time or technical issues? [20:22:27] matanya: basically what we did is export the base from pmtpa and reimport them on the eqiad db (done by springle our DBA) [20:22:51] and dropped a few unused ones such as the wikivoyage imports we did when the project migrated under wikimedia umbrella [20:23:07] but in tampa they were populating? [20:24:45] !log ganglia manually installed php5-cli so I can lint php files :D [20:24:47] Logged the message, Master [20:35:32] * YuviPanda gently pokes Coren [20:36:58] YuviPanda: All yours. [20:37:10] Coren: :) so, wassup? [20:39:10] Coren: where is the migration standing? how long to go still? [20:40:23] matanya: Minus ~ a week? It's been over for a bit. [20:41:15] Coren: thanks, i'm asking because it blocks your time to work on irc migration [20:43:09] Ah!
:-) Migration proper is done, but I have some high priority bugzillas to take care of. I'll be able to start on the IRC thing later this week though. [20:51:07] * se4598 sees that ganglia says that ganglia is a bit overloaded: Current Load Avg (15, 5, 1m): 1529%, 1483%, 1422% [20:58:02] for some reason the PHP of ganglia cannot write to /tmp :-( [20:58:04] cumbersome [21:00:13] * Damianz thinks gitpuppet is borked [21:03:57] Damianz: ? [21:04:03] * andrewbogott doesn't know what gitpuppet is [21:04:23] …and yet I suspect this is my fault [21:04:24] err: /Stage[main]/Puppetmaster::Gitpuppet/File[/home/gitpuppet]/owner: change from root to gitpuppet failed: Failed to set owner to '997': Invalid argument - /home/gitpuppet [21:04:38] I think this one is probably your fault ;D [21:04:57] Ironically puppetmaster => false means 'install puppet master' apparently [21:05:04] Where is that happening? [21:05:22] That should be a production-only thing, as far as I know :( [21:05:37] !log ganglia fixing /tmp permissions back to stat() 1777 [21:05:39] Logged the message, Master [21:06:18] That could explain why, though it looks a lot like the other local puppet labs thing.... suspect there should be a switch somewhere to handle that [21:06:32] it's on wikitech-dev if you want to poke... I'm half looking at installing this while watching tv [21:06:49] today I learned: uncompressing a .deb in /tmp would reset /tmp permission to rwxr-xr-x making it unwritable [21:07:25] I suspect bacula-fd is also a production-only thing, even though in production backup::client is included outside of nova stuff [21:07:34] Oh… yeah, the actual wikitech is also a puppet master. Best to just turn that off for labs. [21:07:45] I don't think there's much percentage in making it work. [21:07:54] bacula-fd, never heard of it [21:08:00] bacula backup client [21:08:34] It should... in theory... be off, but it seems to just ignore the option. Tbh though the puppet groups interface is bollocks...
you can have the same var in multiple groups, which means changing it doesn't change it... sometimes. [21:08:39] * Damianz stabs interface [21:09:45] andrewbogott: got something working on http://ganglia.wmflabs.org/latest/ :-] [21:09:46] I think I just need to fix some dependencies so when puppet master is false nova stuff doesn't depend on puppet packages and that should fix things. [21:09:47] Man, today is my day for running simple commands that turn out to take 40 minutes to run. [21:09:52] hashar: great! [21:10:04] andrewbogott: I have added you as a reviewer to two changes. One is labs only and applied manually https://gerrit.wikimedia.org/r/#/c/123040/ [21:10:39] andrewbogott: the follow-up https://gerrit.wikimedia.org/r/#/c/123044/ might impact production. It changes a ganglia setting (graphdir) to be absolute instead of relative. I have no access to the prod box to confirm the dir exists [21:10:55] 40 min is far ;) The other week we ran a mysql alter that took 4 hours per replication layer in a 5 layer deep replication chain... that was not a fun day [21:10:59] s/far/fast/ [21:11:21] gerrit doesn't really seem to load so much [21:11:57] hashar: 'dwoo'? [21:12:11] andrewbogott: seems to be a template engine for php [21:12:17] ok [21:12:19] it is [21:12:32] andrewbogott: the dwoo dir is created but not the compiled / cache dirs so I am using puppet to fix them [21:13:00] Does ganglia come with free exploits now? [21:13:16] ["ganglia_version"] = "wmflabs" [21:13:23] shouldn't the version be, like, a version number? [21:13:24] yeah no clue what it is for [21:13:26] Or is it arbitrary? [21:13:35] 'check $config[graphdir] should not be a relative path' [21:15:07] andrewbogott: apparently it is generated by a Makefile [21:15:57] andrewbogott: should be generated using "make version.php" [21:16:14] hashar: So, regarding that static path… where should I look in prod to make sure it's correct?
[21:16:35] andrewbogott: on nickel.wikimedia.org [21:17:01] graph.d/ is shipped by the ganglia-webfrontend package [21:17:11] well on our setup by some manually extracted tarball or git clone [21:17:28] on nickel: ls /usr/share/ganglia-webfrontend/graph.d [21:17:36] nickel doesn't have /usr/share/ganglia-webfrontend [21:17:46] ahh great [21:18:04] that is under some other path like /srv/org/wikimedia/ganglia so [21:18:33] # ls /srv/org/wikimedia/ [21:18:33] ganglia-web-3.5.10 ganglia-web-3.5.7 ganglia-web-conf ganglia-web-latest [21:19:00] found a better way [21:19:10] $conf['graphdir']=__DIR__ .'/graph.d'; [21:19:17] __DIR__ would be the directory of conf.php [21:19:26] and graph.d is in the same dir as conf.php [21:19:44] that seems better [21:20:40] i love reviews [21:20:56] * Damianz reviews hashar's choice of shirts [21:21:08] Damianz: usually match them with my shoes [21:21:28] andrewbogott: https://gerrit.wikimedia.org/r/#/c/123044/2/files/ganglia/conf.php,unified [21:21:54] Explain to me how __DIR__ works in this context? It's some magic puppet thing? [21:22:06] ah it is a magic PHP keyword [21:22:09] That's a good idea.. all my shoes are the same though... thinking about it, so are all my shirts [21:22:16] Oh, it's a php file [21:22:17] nm [21:22:32] I would rewrite it in python but I am a bit lazy tonight :-] [21:22:48] thing to check in production is whether graphs are still rendering :-D [21:23:00] the graphdir setting points to a bunch of graph generators [21:23:25] like varnish_report.php [21:24:37] added reference to the bug report, sorry [21:28:57] labs ganglia is showing me some very boring graphs [21:29:02] but it loads!
[21:29:53] Graphs work though ;P [21:30:01] we will probably need a stronger instance [21:30:14] the apache error log is also full of ERROR: opening '/mnt/ganglia_tmp/rrds.pmtpa/zotero/__SummaryInfo__/load_one.rrd': No such file or directory [21:31:04] hashar: yeah, I saw that when I was working… also noticed that restarting the ganglia service took, like, 7 hours. [21:31:09] Elastic search is quite spammy network wise, considering the frontend traffic [21:31:24] andrewbogott: yeah something is blocking [21:31:25] but you've made a lot more progress than I did! [21:31:44] hehe [21:31:52] working 12 hours per day helps a lot [21:32:55] the CPU is burning on tools-login, please have a look at it … -> htop [21:32:55] I was wondering about that… hope you're planning to sleep at some point! [21:33:30] andrewbogott: I take naps during the day, usually during lunch time [21:33:35] * hedonil wonders if hashar has a part-time job ;) [21:33:47] hedonil: I have a part time life :-D [21:33:54] but hey, I am never there on week-ends! [21:34:09] Lives are overrated [21:34:48] + I found out a bug in Jenkins yeah! [21:35:04] Euku: It's what happens when people run cpu intensive things on bastion hosts [21:35:19] hashar: At least it's not a bug in zuul.... deadlocks ftl [21:36:11] * Damianz wonders when you'd be decompressing pages xml on tools-login for 16 hours and think it's a good idea [21:36:32] Damianz: one of the jobs is running for 29h!? should an admin kill this? [21:36:53] Euku: I killed them [21:37:03] thx [21:37:09] The 29hours one isn't really a problem... someone just left an editor open [21:38:02] and then nano needs 100% cpu? [21:38:47] !log deployment-prep Removed the Zuul triggers that updated beta cluster in PMTPA {{gerrit|123100}}. [21:38:49] Logged the message, Master [21:39:00] I'd be willing to bet if you killed the other one then it wouldn't use 100% cpu... just wants some cycles and there is none... 
% is silly [21:39:14] \_ /usr/sbin/gmond -c /etc/ganglia/gmond.conf [21:41:37] !log ganglia Ganglia is graphing!!!!!!!!!!!!!!!!!!!!!! \O/ [21:41:39] Logged the message, Master [21:42:46] ganglia is also rather slow [21:44:45] moar power needed here [21:46:56] I am off, have a good night everyone [21:50:17] * hedonil pokes andrewbogott to replace the virt hosts Cisco UCS C250 M1 / 2x Intel Xeon CPU X5650 @ 2.67GHz (12 cores) [21:50:26] with IBM System x3950 X6 Intel Xeon E7-8800 v2 @3.2 GHz (80 cores) [21:51:00] hedonil: the ciscos were free! [21:51:35] andrewbogott: I heard money is no concern to WMF :) [21:55:14] hedonil: Whoever told you that was obviously on crack. :-) [21:55:29] lol [21:58:36] * hedonil remembers some of those statements from WMDE concerning ol' toolserver (disaster) [22:25:07] Coren: FYI The last Puppet run was at Mon Mar 31 16:31:27 UTC 2014 (1792 minutes ago) [22:25:16] -dev [22:28:03] Is it better to request an un-mothballing or a new project? [22:31:13] hedonil, baidu is attempting to hammer supercount. :p [22:31:35] CP678: I thought you FIXED that! [22:31:59] hedonil, I did. I didn't say it was succeeding. [22:32:15] CP678: :P [22:32:22] It's constantly hitting the rate limit. [22:32:24] huh_: If you're puppetized, a new project is better. [22:32:37] Well, new instances. The project /itself/ is still there. [22:33:59] hedonil: Huh. I got bit by the "instance name is html" silliness. [22:34:50] * hedonil was thinking about CP and baidu while reading the novel Command Authority [22:37:39] Coren, any idea why/how that happened? [22:37:43] What is misc::ircecho? [22:37:44] I'm guessing the curl line works properly now? [22:37:54] https://doc.wikimedia.org/puppet/classes/__site__/misc/ircecho.html no docs... [22:38:30] andrewbogott: It does; I had to edit /etc/puppet/puppet.conf to remove the 500 html that replaced the cert name [22:38:40] weird, ok. [22:42:16] money is no concern to WMF?
If that was true I wouldn't have to find the javascript to hide the annoying 'give me money' banners every year :P [22:42:52] huh_: It reads a file and spits the contents into an irc channel, it also doesn't work via wikitech (you'd need to wrap it in a class) [22:43:05] Basically designed for log -> irc spam [22:44:08] Damianz: ircecho_infile is the file path then? [22:44:31] It was - it now takes a hash, which you can't set in wikitech AFAIK. [22:48:34] hedonil, do you have experience with JS? [22:49:02] CP678: li'l [22:49:59] hedonil, can you have a look at the edit counter source and possibly tell me what the hell is causing the dates to misalign with the graph in every browser but Firefox? It's driving me mad. [22:50:04] I can't find it. [22:50:34] * hedonil looks [22:56:15] CP678: Hmm. I tried with chromium & iceweasel, looks aligned to me (if you mean the monthly chart graphics) [22:56:30] hedonil, really? [22:56:42] Might be my hacky fix I placed in there. [22:56:48] CP678: yep [22:57:09] * CP678 doesn't like the hacky fix, but doesn't know what else to do. :-( [22:57:39] hedonil, try IE chrome Firefox and Safari [22:58:20] * hedonil has to fire up the battery of VM's [22:58:27] :O [22:58:38] You don't have them all on your browser. [22:59:28] CP678: in fact I had to leave 'paranoid' mode and switch to JS enabled for that :P [23:09:16] CP678: Ah, now I spot what you mean: IE 11: fail, Chrome 33: fail, FF 28: ok [23:11:41] hedonil, see. WHY???? [23:11:46] I can't figure it out. [23:13:04] * hedonil gets some coffee and looks [23:36:09] hedonil, any ideas? [23:37:21] CP678: Hmm. Just looking in the raphael.g.axis routine [23:37:55] Same here. The parameters being fed into it should be correct. [23:39:31] CP678: but debugger says nothing like unknown functions/commands (special to FF) [23:41:06] hedonil, we should continue via pm [23:48:04] CP678: back again ;)