[01:31:02] 3Wikimedia Labs / 3deployment-prep (beta): Thumbnail 404s get cached - 10https://bugzilla.wikimedia.org/67056 (10Tisza Gergő) [09:09:11] !ping [09:09:11] !pong [10:00:46] 3Tool Labs tools / 3[other]: Fatal error: Allowed memory size exhausted - 10https://bugzilla.wikimedia.org/67606#c1 (10Andre Klapper) Nobody will see this bug report if you don't CC the tool's maintainer(s) or at least mention the tool name in the bug summary. ;) [10:02:16] 3Tool Labs tools / 3[other]: [pathoschild/catanalysis] Fatal error: Allowed memory size exhausted - 10https://bugzilla.wikimedia.org/67606 (10Andre Klapper) [10:03:17] 3Tool Labs tools / 3[other]: [pathoschild/catanalysis] Fatal error: Allowed memory size exhausted - 10https://bugzilla.wikimedia.org/67606#c2 (10Andre Klapper) CC'ed maintainer but excluded from bugmail due to account settings; you might want to ping the maintainer on her/his talk page about this... [10:06:46] !og integration rebased puppetmaster [10:06:59] bah [11:36:06] !log graphite set storage-schema on diamond-collector to collect only tools, betalabs and graphite stats [11:36:08] Logged the message, Master [12:37:54] !log graphite reduced metrics count from 65k to 25k, monitoring io performance [12:37:55] Logged the message, Master [12:54:48] 3Wikimedia Labs / 3tools: Provide namespace IDs and names in the databases similar to toolserver.namespace - 10https://bugzilla.wikimedia.org/48625#c39 (10Silke Meyer (WMDE)) 5REOP>3RESO/FIX No complaints in almost a month. Closing the ticket. [13:13:10] deal labs folks [13:13:17] er... starting over [13:13:19] dear labs folks, [13:13:24] I am bombarding labsdb [13:13:38] (testing performance for new wikimetrics functionality) [13:13:42] please blame me if stuff burns [13:13:52] We shall. BURN THE WITCH! [13:13:56] :) [13:19:31] <^demon|food> Coren: I'm not a witch, I'm not a witch. They dressed me up like this! 
[13:37:01] 3Tool Labs tools / 3[other]: [pathoschild/catanalysis] Fatal error: Allowed memory size exhausted - 10https://bugzilla.wikimedia.org/67606#c3 (10Helder) Hm... I tried to find the a specific component for this, but since I didn't find one I just used the generic "other" component. Isn't the text of the bug d... [13:39:25] !log tools tools-login: rm -f /var/log/exim4/paniclog ("daemon: fork of queue-runner process failed: Cannot allocate memory") [13:39:28] Logged the message, Master [13:52:02] 3Tool Labs tools / 3[other]: merl tools (tracking) - 10https://bugzilla.wikimedia.org/67556#c1 (10Andre Klapper) Is this about changes in merl tools itself (because that's what this ticket is filed under, though merl tools does not have a Bugzilla component), or about Tools infrastructure on Labs? Or both mi... [14:10:18] 3Wikimedia Labs / 3Infrastructure: LAMP instance becomes 404 a few hours after spawn (reproducible) - 10https://bugzilla.wikimedia.org/54059#c9 (10Andrew Bogott) This may require a doc update, or it may have been resolved by this: commit 203610a1dd12fe0bb1ca236dab54dc51e71bfb6b Author: Ori Livneh YuviPanda: ping? [14:38:13] Coren: I was at the Wikimania Open Data Hack the other day, and there was a cool project shown, which wasn't hosted on the labs. So I asked, and apparently it's in ASP.NET and C#, so I was wondering, on behalf of the maker, if the labs could support it. [14:42:13] andrewbogott: pong [14:42:15] andrewbogott: didn't get time to look at that patch, sadly [14:42:17] a930913: we have Mono support, I believe. [14:42:19] a930913: not sure if anyone's used that tho [14:43:06] scfc_de: btw, http://tools.wmflabs.org/giraffe/index.html#dashboard=ToolLabs+Basics&timeFrame=1h is working again, in a reduced capacity. No metrics for anything other than toollabs and betalabs [14:44:24] YuviPanda: no worries… mostly I was checking in about if you had any monitoring things to delegate to dogeydogey (who is not here, I note. Hm.) 
[14:45:06] andrewbogott: atm, I can't think of any :( Mostly fighting the fact that even our biggest labs instance isn't remotely close to big enough for graphite for all of labs. [14:45:19] andrewbogott: I can actually think of a smallish thing, let me file a bug [14:45:35] great, thanks [14:45:42] a930913: petan's wmbot is developed in C# (Mono), so if the app can be compiled with Mono and all the other bits fall into place, it shouldn't be a major problem. [14:46:15] a930913: (And I assume petan would be very supportive of that :-).) [14:46:18] andrewbogott: https://bugzilla.wikimedia.org/show_bug.cgi?id=67673 [14:46:24] 3Wikimedia Labs: MinimalPuppetAgent should collect number of puppet failures in last run - 10https://bugzilla.wikimedia.org/67673 (10Yuvi Panda) 3NEW p:3Unprio s:3normal a:3None Since otherwise we don't actually know if the last run succeeded or failed. [14:47:01] andrewbogott: it's minimal python work, but should be a start [14:47:04] YuviPanda: I don't know what 'minimalpuppetagent' is, so probably dogeydogey doesn't either :) [14:47:21] andrewbogott: :D I'll add more details to the bug [14:48:46] 3Wikimedia Labs: MinimalPuppetAgent should collect number of puppet failures in last run - 10https://bugzilla.wikimedia.org/67673#c1 (10Yuvi Panda) MinimalPuppetAgentCollector is a collector in the diamond module that collects (a small number) of stats about the previous puppet run and relays it to our graphit... [14:48:47] andrewbogott: better? [14:49:15] yep, thanks [14:49:27] YuviPanda: Does Graphite support a multi-layered structure? I. e., four (or +) aggregator instances that crunch the incoming data and deliver aggregated data to another instance? [14:49:49] * a930913 looks at petan. [14:49:49] scfc_de: yeah, that was my other alternative. [14:49:55] @notify petan [14:49:55] This user is now online in #huggle. I'll let you know when they show some activity (talk, etc.) [14:50:10] scfc_de: You mentioned a FUSE module the other day? 
[14:51:04] andrewbogott: merge a couple of small patches for me? :) https://gerrit.wikimedia.org/r/#/c/141588/ and https://gerrit.wikimedia.org/r/#/c/132238/ and https://gerrit.wikimedia.org/r/#/c/120347/ [14:55:13] !log tools cleaned out old diamond archive logs on tools-webproxy [14:55:15] a930913: There are some pages on enwp; look for ... wikifs or wikipediafs. I don't know if they (still) work out of the box, though. [14:55:16] Logged the message, Master [14:55:40] scfc_de: btw, diamond will no longer write archive.log files, and so not as much clogging up. only errors will be logged [14:56:23] a930913: We do have Mono support, and I think there's a tool or two that uses it. [14:57:10] YuviPanda: Hooray! :-) [14:57:16] scfc_de: :) [15:00:39] scfc_de: Oh, that does Wikipedia locally, whereas I need external, inside Wikipedia. [15:01:21] But mounting Wikipedia is probably a very useful thing... [15:01:53] Mind only for simple reading and writing though. [15:03:15] a930913: Eh? I understood that you want to develop JavaScript in Wikipedia, but with the luxury of GitHub's environment. So I would mount x.wikipedia.org/wiki/User:y/z.js locally and use the editor/Git/etc. of my choice. [15:03:39] * YuviPanda previously built a small vim plugin (since lost) that would make :mw write to wikipedia [15:04:17] scfc_de: Oh right, I already made a bash script to do that kind of thing. [15:04:22] Well, Emacs has that with hexmode's wikisomething-mode ... [15:04:50] The reason it's not on github, is because I don't git ;) [15:04:54] (Which could be improved.) [15:05:54] a930913: I'm impartial to that :-). I just don't like online editors, and FUSE (or wiki*-mode) is a nice way around that for JavaScript. [15:08:31] I must say though, the built in JS editor is pretty darn awesome. [15:11:25] scfc_de: btw, have you checked out pdsh? rather useful to run commands across the cluster :) [15:11:38] a930913: It's pretty impressive, but a ... 
media discontinuity (?; "Medienbruch") to my normal environment. [15:12:26] YuviPanda: I'm using it for quite some time now :-). Though I have to throttle with -f because otherwise the bastion doesn't like me anymore :-). Getting ControlMode right is not that easy for me. [15:12:52] scfc_de: ah :) [15:23:05] * bd808 does a little dance because puppet hasn't had a failure in beta for 12+ hours [15:23:20] just you wait :) [15:23:53] Oh they will be back, but I finally figured out the problem that was killing all runs on the apache boxen for weeks [15:24:13] :) [15:25:50] !log graphite only accept metrics from tools, betalabs and graphite. The center seems to hold for now, let's see how long that lasts [15:25:52] Logged the message, Master [15:28:27] YuviPanda: So your graphite box is having a hard time keeping up with IOPS? [15:28:32] bd808: yeaaaah [15:28:54] At $DAYJOB-1 we put graphite on multiple ssd drives for that reason. [15:29:00] bd808: pretty much. got it down from 65k metrics to 25k, and it still has some slow-writes and iowait is about 17% (down from 45%) [15:29:09] bd808: right, but I dunno how easy it is to get real hardware for labs stuff [15:29:15] * bd808 nods [15:29:39] bd808: might have to write diamond::decommission and have it apply to everything outside of tools and betalabs [15:30:21] bd808: since right now there's a lot of network traffic that goes nowhere [15:30:34] What is the labs hardware stack? HP blade servers? [15:31:04] bd808: unsure, andrewbogott ^? [15:31:08] Or Cisco? Seems like I heard it was some kind of blade donation. [15:34:30] bd808: I think for labs' purposes, even having them just be spinning rust disks but real disks would be 'good enough' [15:35:15] It always depends on the number of metrics and rate of update. Scaling graphite is an adventure. [15:35:39] heh, yeah [15:35:51] But I think it's easier than scaling RRD/cacti stuff [15:35:53] I might also increase update time from 1m to 2m, which I'm guessing should fix any issues.
[15:36:03] YuviPanda: It might be possible to give graphite a misc box. [15:36:09] Coren: wooo! [15:36:15] Coren: that'll be AWESOME. [15:36:17] YuviPanda: Ask Rob. [15:36:22] Coren: H or la? [15:36:24] H [15:36:39] If there's a misc server to be had, he'd be the one knowing. [15:36:45] Coren: alright, I'll wait for SF to wake up [15:37:00] Coren: I suppose that'll need to be put into labsnet so we can send metrics freely [15:37:07] YuviPanda: likely. [15:37:10] right [15:37:21] RobH seems active atm. [15:37:33] YuviPanda: I would suggest filing an RT ticket with rationale because that's the first thing rob.h will ask for :) [15:37:39] bd808: ah, right. [15:37:43] let me do that [15:42:14] YuviPanda: I think you can have someone with salt privileges run a command on all instances in Labs, if you need to disable diamond on non-{Tools,etc.} boxes. [15:42:32] scfc_de: right, but decommission is easier, and I don't think salt works across all boxes [15:47:42] bd808, YuviPanda: https://wikitech.wikimedia.org/wiki/Cisco_UCS_C250_M1 [15:47:54] That's the compute nodes. The other bits are various other dells. [15:49:41] andrewbogott: ah, cool [15:50:13] andrewbogott: unrelated, but can you tell me where I can get the config of tungsten? [15:50:19] as in, physical specs [15:50:50] andrewbogott: Thanks. All I know about the UCS stack is that my last employer standardized on it for their hardware. [15:51:01] YuviPanda: I suspect that that's recorded someplace but I don't know where. Robh would be the best one to ask. [15:51:04] ah [15:51:28] bd808: those systems were a donation (actually a loan I think) so when we expand labs in Texas we'll probably use different hardware. [15:51:52] If nothing else because Ops are constantly surprised and annoyed that the admin console is different for just those boxes :) [15:52:13] andrewbogott: haha [15:52:30] andrewbogott: PM for a moment?
[15:52:43] We had a lot of firmware issues with ours initially as I recall but they eventually settled out to be pretty stable. [15:53:35] My involvement in the hardware stack ended at making proposals for cpu/ram/iops needs which is how I like it. [15:54:18] 3Wikimedia Labs / 3Infrastructure: set apt to Wikimedia mirror instead of http://nova.clouds.archive.ubuntu.com/ubuntu/ - 10https://bugzilla.wikimedia.org/66121#c1 (10scott.leea) Unless I'm doing this wrong, it seems to work. I just had to replace the URLs and then apt-get update -- I tested an nginx install... [15:57:17] 3Wikimedia Labs / 3Infrastructure: LAMP instance becomes 404 a few hours after spawn (reproducible) - 10https://bugzilla.wikimedia.org/54059#c10 (10scott.leea) Tried this out and it keeps displaying the Apache starter page. If you remove /etc/apache2/sites-available/000-default.conf you have to make sure to... [16:00:12] hi [16:00:16] YuviPanda: Does a diamond collector run on an individual node or on the aggregator? [16:00:31] andrewbogott: individual nodes. [16:00:48] cool. So, dogeydogey, in that case you can hack on that tool right on your test VM. [16:00:58] which, in case it's not obvious: you have sudo on that box. [16:03:32] 3Wikimedia Labs / 3Infrastructure: LAMP instance becomes 404 a few hours after spawn (reproducible) - 10https://bugzilla.wikimedia.org/54059#c11 (10Andrew Bogott) Scott, can you try forcing a few puppet runs and a reboot, and verify that Apache survives both of the above? If so then this bug can be closed.... [16:05:34] dogeydogey: I don't know much about how to monitor the output of the collector, maybe YuviPanda can suggest a way to run it locally so you can see what it's doing without coordinating with the aggregator... [16:13:15] andrewbogott: dogeydogey so, there's not really an easy local way short of installing diamond. I usually just test it on a server. [16:13:15] andrewbogott err maybe this is over my head? 
dunno how to proceed [16:13:19] oh [16:14:16] YuviPanda so I install Puppet and just start running modules from operations/puppet? [16:14:26] ah, hmm. locally you mean? [16:14:30] yes [16:14:33] dogeydogey: oh, which bug are you thinking of? [16:14:35] the labs one? [16:14:39] https://bugzilla.wikimedia.org/show_bug.cgi?id=67673#c1 [16:15:24] dogeydogey: right, I think the easiest setup is to 1. read about diamond (https://github.com/BrightcoveOS/Diamond) 2. hack on it in an instance itself (I assume you have access to one?) 3. submit final patch [16:16:08] YuviPanda ah, so diamond isn't working correctly and you want me to test it and play with it until it works, is that right? [16:16:40] dogeydogey: no, it's working correctly, but one of our custom collectors (minimalpuppetagent.py, in the diamond module) isn't collecting one particular metric that it should be. So that bug is about adding that particular feature [16:16:57] Coren: do you want me to write up the ops@ email? [16:25:17] 3Wikimedia Labs / 3Infrastructure: set apt to Wikimedia mirror instead of http://nova.clouds.archive.ubuntu.com/ubuntu/ - 10https://bugzilla.wikimedia.org/66121#c2 (10Antoine "hashar" Musso) Yup that would do. We need a puppet patch to adjust the sources.list on all labs instances. [16:29:08] YuviPanda: If you want. [16:30:05] Coren: yeah, let me get on it. ty [16:34:05] Coren: I hate to nag but any update on bug 54054 [16:35:21] Betacommand: But you do it so well! :-) I have an experimental version I should be able to push, lemme build the package and let you try it. [16:36:18] Coren: I hate when things fall through the cracks [16:36:32] 3Wikimedia Labs / 3Infrastructure: LAMP instance becomes 404 a few hours after spawn (reproducible) - 10https://bugzilla.wikimedia.org/54059#c12 (10scott.leea) Forced a Puppet run, restarted Apache, and even rebooted the instance. Survived.
[16:36:57] Coren: and it's annoying to have $HOME fill up with log files [16:37:39] yuvipanda hm, yeah i might not be suited for this -- it's this, right? http://git.wikimedia.org/blob/operations%2Fpuppet.git/45d8ebffdf5af2f13defdda1448205fb6ee890f4/modules%2Fdiamond%2Ffiles%2Fcollector%2Fminimalpuppetagent.py [16:38:09] dogeydogey: yeah, that is right [16:39:09] err yeah sorry, i don't know what to do [16:39:27] :( ok [16:39:29] andrewbogott: ^ [16:40:53] Betacommand: What's wrong with using options/a wrapper? [16:42:16] YuviPanda: can't he work on the collector on a given instance without having to set up a server as well? [16:42:27] andrewbogott: indeed, no need to setup a server [16:42:33] andrewbogott: the collectors should be in /usr/share/diamond [16:42:40] so it is modify + 'service diamond restart' [16:43:45] Betacommand: Check the version on tools-dev; it should add the contents of ~/.jsubrc as options. [16:47:52] if you are making a collector what ends up in the log in /var/log/diamond is precisely what gets sent to statsd [16:48:03] the logger is just another output for the same mechanism essentially [16:48:16] anyways, just a thought on collector debugging [16:48:54] chasemp: btw, since we killed archiveloghandler nothing ends up in /var/log/diamond, only errors :) [16:49:08] chasemp: so you've to do self.log.debug/error for it to show up while debugging [16:50:10] easier to start diamond in the foreground then and do runonce for a collector maybe :) [16:50:21] chasemp: ah, that as well, yea [16:50:30] chasemp: my workflow is to edit locally, apply on puppetmaster, test, and repeat [16:50:31] so [16:50:38] hmm, now that I think about it, not very efficient, eh [16:51:01] 3Wikimedia Labs / 3Infrastructure: set apt to Wikimedia mirror instead of http://nova.clouds.archive.ubuntu.com/ubuntu/ - 10https://bugzilla.wikimedia.org/66121#c3 (10Andrew Bogott) I don't think sources.list comes from puppet, though.
At least, I just dropped a test line into mine and it persisted over a p... [16:51:05] I really find it strange here how we test all things with a puppetmaster even if they are not puppetmaster related functionality [16:51:27] chasemp: heh, yeah. Terrible for getting volunteers involved... [16:51:40] doing ./localrun isn't cheating or anything of the kind to me, though it is less tolerant of some of our weird manifest organization [16:51:55] right, which is why I have my workflow, I think [16:52:03] output of puppet agent -tv is often useful [16:52:16] you can do --test with localrun too :) [16:52:35] chasemp: wait, are you talking about something specifically called localrun? :) [16:52:38] yes [16:52:41] oh? [16:52:45] shortcut script in the puppet repo [16:52:50] oh [16:52:51] for local testing / debugging [16:52:53] didn't realize [16:52:54] lol [16:52:59] by local do you mean 'in my machine'? [16:53:02] 3Wikimedia Labs / 3Infrastructure: set apt to Wikimedia mirror instead of http://nova.clouds.archive.ubuntu.com/ubuntu/ - 10https://bugzilla.wikimedia.org/66121#c4 (10Antoine "hashar" Musso) I guess sources.list is the default coming with whatever ubuntu package. We could get puppet to override it entirely... [16:53:28] damn it's --verbose locally and --debug [16:53:35] stupid puppet and their mismatched options [16:53:43] heh [16:53:45] yes, local puppet testing on a box [16:53:50] nice! [16:53:53] I shall investigate [16:54:20] will also lint your uncommitted git files with lint arg [16:54:23] I think that still works [16:54:29] I admit, I'm not as good about it as I once was [16:54:47] anyways, outside of collected resources or oddly arranged manifests it is the same thing [16:55:18] basically, unless you are testing puppetmaster specific behavior one isn't really needed. my last gig we did all module development this way [16:56:22] chasemp: nice!
[16:56:47] 3Wikimedia Labs / 3Infrastructure: set apt to Wikimedia mirror instead of http://nova.clouds.archive.ubuntu.com/ubuntu/ - 10https://bugzilla.wikimedia.org/66121#c5 (10Tim Landscheidt) I think we need both (amend the image + puppet file) so that new instances are built faster and old instances get updated as... [17:02:01] scfc_de: Meetbot's minutes are apparently broken, quoth sumanah [17:02:10] CC hashar [17:02:24] ...who isn't here [17:02:41] bd808: how do I log something to the deployment-prep labs page from here? [17:03:21] greg-g: bang log deployment-prep [17:03:25] thank you sir [17:03:49] !log deployment-prep Added John F. Lewis to the project after his NDA was signed by Mark (RT 7722) [17:03:51] Logged the message, Master [17:04:08] !log tools webservice start on tools.meetbot since it seemed down [17:04:10] Logged the message, Master [17:04:29] marktraceur: fixed. [17:05:02] Well, that was fast ... :-) [17:05:12] scfc_de: just needed a webservice start [17:05:22] scfc_de: plus I was participating in the -office hour [17:06:47] 3Tool Labs tools / 3[other]: [pathoschild/catanalysis] Fatal error: Allowed memory size exhausted - 10https://bugzilla.wikimedia.org/67606#c4 (10Andre Klapper) (In reply to Helder from comment #3) > Isn't the text of the bug description (comment 0) also used to provide > search results? That depends how you... [17:07:38] Huzzah ty YuviPanda [17:07:55] marktraceur: yw :) if you want to, you can remove your name from there via wikitech [17:08:17] Meh [17:26:29] !log tools tools-tcl-test: Rebooted because system said so [17:26:31] Logged the message, Master [17:29:55] tcl-test? [17:30:58] scfc_de: we should probably kill tcl-test after asking Coren, think it was for testing tcl packages [17:31:14] gifti: No idea, I just came across it. [17:31:18] 3Wikimedia Labs / 3tools: grid default output location - 10https://bugzilla.wikimedia.org/54054#c2 (10Marc A.
Pelletier) There is an experimental version of jsub on tools-dev that will prepend arguments found in ~/.jsubrc to the command line. Please test and see if it works. [17:31:19] mhm [17:31:58] YuviPanda: Oy, I didn't even remember this existing. [17:32:34] right, so let's delete it all! [17:33:20] Coren: is the solution in 54054 instead of 62156 or additional? [17:34:36] gifti: Its meant to be an additional mechanism, but can easily be tweaked to support both. [17:41:41] hi! i have some troubles accessing databases. ive read this: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Database_access but its still not clear what i should write in a php code to access a certain database. could anyone help? [17:43:49] sanyi4: Basically, you need to read username/password from replica.my.cnf (you can use code à la https://wiki.toolserver.org/view/Database_access#PHP for that) and then connect to (for example) enwiki.labsdb for the database enwiki_p. [17:45:00] i tried to, but i couldnt open replica.my.cnf [17:45:23] permission denied [17:46:59] sanyi4: What's the tool account's name? [17:47:48] YuviPanda how do i restart the task scheduler for diamond? [17:48:35] looking at logs and it stopped due to an error [17:49:06] @scfc_de: the link you gave me may be useful, i will try to read replica.my.cnf with php. [17:49:40] @scfc_de: the name of the tool is: lonelylinks [17:54:45] dogeydogey: service diamond restart? [17:55:16] ahaha...r ight [17:57:00] !log tools tools-exec-08, tools-exec-09, tools-webgrid-02, tools-webgrid-03: Removed stale Puppet lock files and reran manually [17:57:02] 3Wikimedia Labs / 3tools: Install xsltproc - 10https://bugzilla.wikimedia.org/66962 (10Tim Landscheidt) 5PATC>3RESO/FIX [17:57:02] Logged the message, Master [18:03:54] YuviPanda okay error has gone away, all I see in the diamond.log is task scheduler [18:04:07] so how do i send more info here? 
[18:04:20] 3Tool Labs tools / 3[other]: Migrate http://toolserver.org/~dispenser/* to Tool Labs - 10https://bugzilla.wikimedia.org/66868 (10Dispenser) [18:04:20] 3Wikimedia Labs / 3tools: Provide namespace IDs and names in the databases similar to toolserver.namespace - 10https://bugzilla.wikimedia.org/48625 (10Dispenser) [18:04:40] dogeydogey: self.log.error('message') [18:06:28] scfc_de: the code you suggested doesn't seem to function. [18:09:15] !log tools tools-webgrid-03, tools-webgrid-04: killall -TERM gmond (bug #64216) [18:09:18] Logged the message, Master [18:09:41] sanyi4: Your error.log says: "Can't connect to MySQL server on 'enwiki-p.rrdb.toolserver.org'". I said you need to connect to enwiki.labsdb. [18:10:56] YuviPanda so the error seems to be associated with this line: time_since = int(time.time()) - int(summary['time']['last_run']) [18:10:57] !paste [18:10:57] http://tools.wmflabs.org/paste/ [18:11:21] no need for them to be ints [18:12:00] or wrapped [18:16:23] scfc_de: and /.my.cnf needs to be changed to /replica.my.cnf too? [18:18:36] sanyi4: Yes, of course. [18:20:23] dogeydogey: '-' is not a valid operation on a string? unless you mean it's a float already [18:20:34] but float to int shouldn't be a problem it just chops off the trail [18:20:42] dogeydogey: oh? do submit a fix if it works out? [18:24:37] oh nevermind yeah i'm being dumb [18:27:18] paste the exception I'm sure we can help [18:27:51] the hash is probably wrong [18:29:25] http://pastie.org/9368877 [18:31:05] dogeydogey: you can read the file itself to check [18:31:18] how do i check that hash? [18:31:28] i would like to see the output [18:31:56] dogeydogey: self.log.error() should also output [18:32:01] to /var/log/diamond/diamond.log [18:43:44] nevermind, it seems to be outputting numbers [18:43:48] not errors [18:43:50] :q [18:44:30] dogeydogey: oh?
maybe the error was from long ago [18:44:42] well i provisioned this instance today [18:44:51] do '> /var/log/diamond && service diamond restart' [18:44:56] to start fresh [18:45:04] diamond.log even [18:45:37] YuviPanda: Are you still going to send the email to ops-l? I will if you don't. [18:45:39] btw that error probably means that whatever is supposed to return for summary isn't [18:45:48] so when you try to access the dunder method as a dict [18:45:54] it says this object doesn't have it [18:46:00] Coren: yeah, I'm writing it up now, should be out in a bit. Got distracted momentarily by food and things. [18:46:04] so it's confusing, but basically summary isn't what you think it is [18:47:53] YuviPanda okay so the error is gone, what else am I supposed to do here? [18:48:36] dogeydogey: there's a status field for number of puppet failures in the last run [18:48:44] dogeydogey: so add that as a metric? [18:49:55] oh got it! [18:50:20] scfc_de: many thanks!!! it works!!! [18:50:29] YuviPanda there's a resources failure line and an events failure line, does it matter which? [18:50:45] dogeydogey: hmm, I think events? [18:50:49] I'm unsure [18:50:59] ah got it, haha sorry for so many questions but thanks for letting me work on this [18:51:05] dogeydogey: :D it's ok! [18:51:47] 3Wikimedia Labs / 3wikitech-interface: Enable irc feed for wikitech.wikimedia.org site - 10https://bugzilla.wikimedia.org/34685#c10 (10Krinkle) While wikitech doesn't run on the main apaches (nor even a dedicated cluster of apaches), it does run inside the production cluster (not isolated inside the labs sub... [19:35:38] Coren: almost done with the email. Ended up being pretty long.
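Diamond collectors subclass diamond.collector.Collector and report each value with self.publish(name, value); the failure count dogeydogey is asked to add lives in Puppet's last run summary (last_run_summary.yaml). A minimal sketch of the derivation being discussed, with the summary passed in as an already-parsed dict; the function and metric names are illustrative, not taken from minimalpuppetagent.py:

```python
import time


def puppet_metrics(summary, now=None):
    """Derive metric name -> value pairs from a parsed puppet run summary.

    In a real Diamond collector each pair would be sent with
    self.publish(name, value) instead of being returned.
    """
    now = int(time.time() if now is None else now)
    return {
        # seconds since the last completed puppet run
        'time_since_last_run': now - int(summary['time']['last_run']),
        # the two failure counters mentioned above; non-zero means trouble
        'failed_events': int(summary['events'].get('failure', 0)),
        'failed_resources': int(summary['resources'].get('failed', 0)),
    }


if __name__ == '__main__':
    # stand-in for yaml.safe_load() of last_run_summary.yaml
    sample = {
        'time': {'last_run': 1404830000},
        'events': {'failure': 1, 'success': 12},
        'resources': {'failed': 1, 'total': 150},
    }
    for name, value in sorted(puppet_metrics(sample, now=1404830060).items()):
        print(name, value)
```

Keeping the int() casts around both operands, as in the original line, is harmless: the subtraction works on floats too, and the casts just guard against string values coming out of the YAML.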
writing TL;DR: now [19:39:42] Coren: sent [19:42:12] Coren: do respond if I missed something [19:52:17] !log tools tools-exec-wmt, tools-shadow: Removed stale Puppet lock files and reran manually (handy: "sudo find /var/lib/puppet/state -maxdepth 1 -type f -name agent_catalog_run.lock -ls -ok rm -f \{\} \; -exec sudo puppet agent apply -tv \;") [19:52:20] Logged the message, Master [20:10:00] How long is [[Special:OAuthConsumerRegistration]] suppose to take to get approved? [20:10:55] Dispenser: Until someone with approval rights remembers to look :/ [20:11:17] hm YuviPanda: proxy question that you guys might already know the answer to [20:11:34] so to serve files for dashboard consumption, ideally we'd just add CORS to the servers doing the serving [20:12:00] milimetric: right, and the proxy should just send any CORS header through [20:12:03] I've tried to add 'Header set Access-Control-Allow-Origin "*"' in the proper spot in apache, but it doesn't seem to be working [20:12:08] ok [20:12:21] i'll debug more and if i'm certain it's getting lost in the proxy, I'll come back [20:12:24] ok! [20:12:44] Dispenser: I pinged one of the approvers in another channel to take a look. [20:13:18] thanks [20:18:09] Dispenser: I just approved your OAuth app. Sorry for the delay. [20:24:55] csteipp: Is there a public list of all approved apps? [20:25:38] scfc_de: https://www.mediawiki.org/wiki/Special:OAuthListConsumers [20:26:23] csteipp: Thanks. [20:34:45] (03CR) 10Jforrester: "Ping." [labs/tools/pywikibugs] - 10https://gerrit.wikimedia.org/r/143238 (owner: 10Jforrester) [20:36:14] (03CR) 10Yuvipanda: [C: 04-2] "Yeah, having RL and Editing move to -VE doesn't sound right. I'd be ok with moving them there while also putting them in -dev, though." 
[labs/tools/pywikibugs] - 10https://gerrit.wikimedia.org/r/143238 (owner: 10Jforrester) [20:37:14] YuviPanda: thanks for the pep talk, it was just user error, proxy works perfectly fine [20:37:21] milimetric: :D \o/ [20:37:38] (03CR) 10Yuvipanda: [C: 031] "LGTM, but needs to have the dependency on the previous patch removed." [labs/tools/pywikibugs] - 10https://gerrit.wikimedia.org/r/143239 (owner: 10Jforrester) [20:38:47] 3Wikimedia Labs / 3Infrastructure: set apt to Wikimedia mirror instead of http://nova.clouds.archive.ubuntu.com/ubuntu/ - 10https://bugzilla.wikimedia.org/66121#c6 (10Tim Landscheidt) On second thought and a look at /etc/apt -- shouldn't apt-get already prefer the WMF repo due to /etc/apt/preferences.d/wikim... [20:44:02] i just created a database according to this: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Steps_to_create_a_user_database_on_tools-db -- how do i access it with php, i.e. what host, username, password i have to use? [20:46:32] sanyi4: Use the username and password from replica.my.cnf and "tools-db" as host (or "tools.labsdb"). [20:47:06] ok, thanks! [20:51:35] is that the username and password that i get with "cat replica.my.cnf | grep user" and "... password" in putty? [20:52:19] sanyi4: Yes (though the file is only three lines long :-)). [20:53:32] that may be but i can't open it with winscp [20:56:37] sanyi4: There's something special with winscp that I always forget; you have to tell it to not check the permissions itself, but just try to access /data/project/lonelylinks/replica.my.cnf. But what I meant was: If you're in Putty, "cat replica.my.cnf" should only output three lines -- that shouldn't be so screen-filling that you'd need grep :-).
ln -s ~/replica.my.cnf ~/.my.cnf [20:57:16] solves everything [20:57:32] !log tools tools-exec-gift: Puppet hangs due to "apt-get update" not finishing in time; manual runs of the latter take forever [20:57:34] Logged the message, Master [20:58:34] scfc_de: is that related to the job load? [20:58:35] oh, sorry, i didn't know that "cat" is for outputting a whole file. now i understand. thanks. [20:59:28] sanyi4: If you want more of a "file viewer", under Linux often "less" is used. "q" to quit. [21:04:14] (03CR) 10Jforrester: "Given that Flow magically appears in both without being in this config file, your advice is welcome." [labs/tools/pywikibugs] - 10https://gerrit.wikimedia.org/r/143238 (owner: 10Jforrester) [21:04:42] gifti: Ah! The obvious answer I didn't even think about. Yes, a load of >> 150 sounds like another and bigger problem. [21:09:13] (03CR) 10Jforrester: "Also, renaming the IRC channel to -editing is a chore we're studiously trying to avoid. :-)" [labs/tools/pywikibugs] - 10https://gerrit.wikimedia.org/r/143238 (owner: 10Jforrester) [21:09:17] gifti: Did you just stop all jobs there? [21:09:46] yes, they do unneeded things, it's a test run anyway [21:10:43] k [21:10:58] "Finished catalog run in 102.22 seconds" => Sounds a lot better :-). [21:11:53] the array job will return on 1st and 16th of every month and run for 2-3 days, will that be a problem, scfc_de? [21:15:02] I haven't looked deeper into array jobs and the specifics of tools-exec-gift, but SGE should be configured in a way that a host is never overloaded in such a way. As it's the 8th and the load was >> 150, something's clearly broken. [21:15:30] that wasn't a scheduled run [21:15:58] i need more nodes :D [21:16:59] something is weird, the next job has been waiting for 8 minutes now [21:17:20] If there's a special requirement to run on the 1st and 16th for 2-3 days, maybe. But the grid should run the jobs when there are resources available and not just all at once.
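The recipe sanyi4 was given above (read the credentials out of replica.my.cnf, then connect to tools-db / tools.labsdb, or enwiki.labsdb for the replicas) can be sketched like this. The sketch is in Python rather than PHP, and the actual connect call is left as a comment because it depends on which MySQL client library the tool uses; the file path and the stand-in credentials are illustrative:

```python
import configparser
import tempfile


def replica_credentials(path):
    """Return (user, password) from the [client] section of a .my.cnf file."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    # values in these files are sometimes single-quoted; strip the quotes
    return (cfg['client']['user'].strip("'\""),
            cfg['client']['password'].strip("'\""))


if __name__ == '__main__':
    # stand-in file with the same three-line shape as replica.my.cnf
    with tempfile.NamedTemporaryFile('w', suffix='.cnf', delete=False) as f:
        f.write("[client]\nuser='u12345'\npassword='secret'\n")
    user, password = replica_credentials(f.name)
    print(user, password)
    # with a MySQL client library (not shown) one would then do roughly:
    #   connect(host='tools-db', user=user, passwd=password,
    #           db=user + '__mydb')
```

The `ln -s ~/replica.my.cnf ~/.my.cnf` trick above achieves the same thing for command-line clients: `mysql` reads `~/.my.cnf` automatically, so no explicit credential handling is needed there.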
hm [21:26:09] i think i found a referential integrity violation in enwiki's database [21:26:31] there's a bunch of templatelinks entries where tl_from is 3608776, but there's no page with that id [21:30:21] jackmcbarn: I think referential integrity is second to speed at Wikipedia. [21:30:43] what does it do? just leave templatelinks behind when pages go away? [21:32:12] (also id 160860) [21:35:53] I think (and you might find more authoritative answers in #mediawiki :-)) that MediaWiki updates templatelinks via the job queue, i. e. it at least defers that until a later time, and if the update shouldn't happen, that's not the end of the world, because users can force another update with a null-edit. [22:24:51] (03CR) 10Legoktm: [C: 04-1] "Flow bugs only appear in -corefeatures. You're probably confusing it with grrrit-wm, which sends to both (3 actually) channels." [labs/tools/pywikibugs] - 10https://gerrit.wikimedia.org/r/143238 (owner: 10Jforrester) [22:29:57] (03PS1) 10Legoktm: Send core bugs related to multimedia to #wikimedia-dev [labs/tools/pywikibugs] - 10https://gerrit.wikimedia.org/r/144838 [22:30:00] (03CR) 10Jforrester: "Oh, maybe." [labs/tools/pywikibugs] - 10https://gerrit.wikimedia.org/r/143238 (owner: 10Jforrester) [22:30:39] (03CR) 10Jforrester: [C: 04-1] "This hides some of the major multimedia bugs from the Multimedia team. This is a bad move." [labs/tools/pywikibugs] - 10https://gerrit.wikimedia.org/r/144838 (owner: 10Legoktm) [22:34:27] do we have a tool to see a tool's outgoing network traffic? [22:35:58] Coren: In an effort to make project deletion actually delete projects, I've added a tool to labstore1001 called 'archive-project-volumes' which makes tarballs of stale volumes and moves them out of the way. [22:36:28] I haven't given it free rein yet, but apparently there are 67 dirs in /srv/projects that don't have a corresponding ldap entry.
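The kind of orphan jackmcbarn describes (templatelinks rows whose tl_from has no matching page row) can be found with a LEFT JOIN against page. A small self-contained illustration, using an in-memory SQLite database in place of the real enwiki_p replica and with the two tables reduced to just the relevant columns:

```python
import sqlite3

# toy versions of MediaWiki's page and templatelinks tables,
# seeded with one valid link and one orphaned one
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY);
    CREATE TABLE templatelinks (tl_from INTEGER, tl_title TEXT);
    INSERT INTO page VALUES (1), (2);
    INSERT INTO templatelinks VALUES (1, 'Infobox'), (3608776, 'Infobox');
""")

# rows whose tl_from has no page: the LEFT JOIN leaves page_id NULL there
orphans = conn.execute("""
    SELECT DISTINCT tl_from
    FROM templatelinks
    LEFT JOIN page ON page.page_id = templatelinks.tl_from
    WHERE page.page_id IS NULL
""").fetchall()
print(orphans)
```

The same SELECT, run against enwiki_p on the replicas, is how one would list all such ids rather than spotting them one at a time.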
[22:37:24] I am pretty sure that everything in that list is an orphan and that there are no special cases, but… perhaps you'd best double-check for me before I start clobbering things. The list is /srv/project/volumes-without-projects [22:42:03] I can't connect to beta labs over https. I think this used to work, [22:42:46] e.g. https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page . Maybe it never worked, but it's in my Location bar history :) [22:50:06] (03Abandoned) 10Legoktm: Send core bugs related to multimedia to #wikimedia-dev [labs/tools/pywikibugs] - 10https://gerrit.wikimedia.org/r/144838 (owner: 10Legoktm) [23:07:31] gifti: I think petan installed recently ... something to that effect, but I'm unclear about the details. [23:08:03] that … helps a lot ;) [23:37:01] !log deployment-prep Updated Kibana to 0afda49 (latest upstream head) [23:37:04] Logged the message, Master [23:45:04] 3Tool Labs tools / 3[other]: merl tools (tracking) - 10https://bugzilla.wikimedia.org/67556 (10merl) p:5Unprio>3Lowest s:5normal>3enhanc a:3merl