[00:19:16] So I am not entirely sure if my crontab is doing anything? [00:24:47] I tried configuring to run a bot through crontab and my bot wasn't editing. Then I ran the script manually and it worked just fine. [00:30:01] ...oh [00:30:03] Well there we go [00:32:49] Now that I realized there is an error log, I see this error a lot: [00:33:00] /data/project/projanalysis/bin/python3: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.17' not found (required by /data/project/projanalysis/bin/python3) [00:33:40] When I run it in the virtual environment it seems fine, but I guess not from crontab. [00:46:20] harej: what's the crontab entry? [00:52:43] harej: did you build the virtualenv on the same OS (trusty/precise) as the one it's running on? [00:53:35] I use trusty. I don't know for certain the virtualenv was built on trusty. [00:55:13] harej: what is the crontab command? [00:56:09] Betacommand I will look it up when I'm at [00:56:12] My computer [00:57:03] harej: OK, I know I have to use the -cwd command to jsub otherwise stuff gets confused [00:58:08] I was using jstart but then I re-read the docs and changed to jsub a few minutes ago since it seemed more appropriate [00:58:16] These are scripts with defined beginnings and ends [01:00:52] harej: here is an example of what I use: [01:00:54] jsub -mem 600M -N NFCC9 python /data/project/betacommand-dev/svn_copy/sql_nfcc9.py [01:52:36] Betacommand: my crontab is: [01:52:46] https://www.irccloud.com/pastebin/Wp3OUHhi [01:53:07] (it was originally jstart but then i changed it to jsub) [01:56:34] harej: I think you're doing it wrong, you shouldn't have your own copy of python, just a virt environment [01:58:58] if i'm not mistaken it's just a symbolic link to the regular python [02:01:14] harej: see https://wikitech.wikimedia.org/wiki/Setting_up_Flask_cgi_app_as_a_tool#Set_up_virtualenv [02:01:32] you would need something similar for your virtualenv [02:04:25] I mean I *have* a virtual environment.
Which is how I've been able to install modules. [02:05:11] harej: for some reason when submitting it, the environment is lost [02:05:16] legoktm: poke [02:06:16] harej: if you don't get help here, drop an email to the list about the proper way to use virtenv. It's probably something simple, I just don't use it so I am unsure [02:10:41] 6Labs, 10Beta-Cluster, 5Patch-For-Review, 15User-Bd808-Test: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1281434 (10bd808) [03:25:52] Betacommand: hey [03:27:49] legoktm: my crontab is being a jerk :( [03:29:24] harej: I have to go to dinner now, but this is what one of my jobs looks like: http://fpaste.org/221349/43148774/raw/ [04:10:43] harej: hey! also do ‘-l release=trusty’? [04:10:56] ...that may make a huge difference! [04:11:06] did that work? [04:11:12] I haven't done it yet [04:11:17] ah :) [04:11:27] by default it runs on precise [04:11:38] Whereas I use trusty. [04:11:50] That is why it kvetches about the software versioning being off. [04:12:15] Also, it's currently complaining about my job names. Is there any way to kill those off so that I can have these new things use those names? [04:12:31] * harej refers to: [Wed May 13 03:55:04 2015] there is a job named 'wpx_load_configuration' already active [04:12:31] what is it complaining about [04:12:34] oh [04:12:42] qdel wpx_load_configuration [04:12:46] unless someone *else* is using that name [04:13:02] there [04:14:01] New crontab. Looks good to you? https://www.irccloud.com/pastebin/k0zj49QP [04:15:14] harej: try it! :) [04:15:25] * harej waits another 10 minutes [04:16:15] 6Labs, 10Labs-Infrastructure: Input/Output errors in a /home directory - https://phabricator.wikimedia.org/T47609#1281480 (10yuvipanda) 5Open>3Resolved a:3yuvipanda Closing. [04:33:37] yuvipanda: It works! Thank you! [04:33:49] yw! [04:33:52] we should document that somewhere...
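For reference, a sketch of the kind of crontab entry this exchange converges on. The flags, job name, and python path are the ones that appear in the log; the schedule and script filename are made up for illustration:

```shell
# Hypothetical Tools crontab entry; schedule and script name are
# illustrative. -l release=trusty is the fix from the log: the grid
# defaults to precise, which mismatches a virtualenv built on trusty
# (hence the GLIBC_2.17 error above).
#   -mem 600M  : memory request, as in Betacommand's example
#   -N <name>  : grid job name; must not collide with a running job
0 * * * * jsub -l release=trusty -mem 600M -N wpx_load_configuration /data/project/projanalysis/bin/python3 some_script.py
```

A stale job still holding the name can be cleared with `qdel wpx_load_configuration`, as in the log.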
[04:34:03] if there’s a section on virtualenvs / using python [06:45:33] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1206 is CRITICAL 50.00% of data above the critical threshold [0.0] [07:10:33] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [0.0] [07:37:16] Are there any guides for turning a Mediawiki-vagrant setup into a (reasonably secure) production server? [07:37:35] Depends what you mean by "production" [07:37:42] I used Labs-vagrant to set up an instance but I'm not sure it should be made public as-is. [07:38:36] Coren: The instance is going to be used as a spam honeypot in order to learn from patterns that bots and spammers use. I want it to be spammed, but I'd prefer not having any security loopholes. [07:38:54] Coren: Admin passwords and such, maybe. [07:39:55] There's no particular reason why you couldn't do that as-is. Labs doesn't have very good performance for wikis in the general case though I expect that wouldn't be a concern in your case. [07:40:24] Coren: As-is meaning just the out of the box Labs-vagrant instance? [07:40:30] After changing the admin password, of course. [07:47:54] 10Tool-Labs: Clean up huge logs on toollabs - https://phabricator.wikimedia.org/T98652#1281595 (10coren) Actually, unlinking a file via NFS should not be any harder on the server than doing it directly - the real thing to watch out for is to make sure it is not currently opened by any process. Something that /c... [07:49:05] Yeah, though you may well want to use the real database for your db backend. [07:55:16] Coren: labs vagrant uses mysql by default [07:55:21] Also hi :) [08:01:45] yuvipanda: Hi to you too. Wait, isn't it like ungodly late PDT? [08:02:05] I'm in bed yes [08:02:08] :) [08:02:26] I ended up late in the office again...
[08:02:59] I feel like I need to put in 1.5x the time when in office to feel 1x productive [08:03:57] Coren: catchpoint has been mostly happy since the nfs resync finished [08:03:58] So that's good [08:04:16] yuvipanda: I saw; which bodes well for the move to raid10 [08:04:16] Coren: do you have an eta for the rsync backup? [08:04:42] yuvipanda: It's in the second maps project now - last time I had to do an rsync it took about 20h past the maps. [08:04:56] Coren: what happened to labstore1002 switch plans? [08:05:04] Are we going to do raid switch first or? [08:05:38] yuvipanda: Raid switch first - debugging the hardware issue with 1002 is complicated because we're not sure how much/whether the shared shelves are interfering. [08:05:53] Aah ok [08:06:19] I'm working off Mark's game plan atm, and it's pretty sane. [08:06:25] Ok [08:07:14] service manifests have been a success I think - I'll announce them later this week [08:07:37] Nobody had complained about webservice not running in a while [08:07:38] We haven't managed to isolate what exactly the issue with 1002 is beyond "it's not an obvious electrical issue", but digging further means we almost certainly want to disconnect the shelves first. [08:07:47] Right [08:08:45] Besides, switching the live service to a server that passes POST about 50% of the time doesn't seem like a good idea to me. :-) [08:09:44] Yeah, service manifests have been a rather good success. Have you played with the not-webservice side yet? [08:10:05] Not yet. I am going to tackle cron next [08:10:10] Crontabs next that is [08:10:11] * Coren nods. [08:10:22] Right now I'm unifying our webservice code [08:10:50] It is currently a bunch of python / bash scripts in ops/puppet [08:11:04] I'm making that a nice python module in tools - webservice package [08:11:18] I added you as reviewer too if you wanna look :) [08:11:44] This will end up rewriting lighttpd-starter and tomcat-starter too I think [08:12:20] Yeah, I'll give the whole thing a look.
[08:13:07] I haven't really started rewriting the lighttpd stuff yet [08:13:24] Mostly trying to get the structure right to begin with [08:13:36] And can move on from there. [08:15:24] And now I shall sleep. Good night [08:20:41] Coren: So after changing the Admin passwd, it's safe to make the wiki publicly accessible and ready to receive spam? [08:20:59] Coren: After all the warnings on several pages, it just seems odd that there's only one step to making it reasonably secure. [08:22:08] polybuildr: Well, it's all about the relative use case. Since this is intended to be a honeypot, I'm working from the presumption that you can just wipe and reset it at need and that none of the data is precious. :-) [08:22:42] polybuildr: Most of the normal hardening steps you'd want to take are to reduce the impact of spammers, after all. :-) [08:23:34] polybuildr: But you'll have to monitor it carefully; you don't have infinite resources and the spammers might as well. [08:26:09] Coren: Thanks! :D [08:26:58] Coren: Also, if they do use their infinite resources, what happens to the instance? The DB cannot make any new inserts and maybe the network IO is all taken up by spammers, right? [08:27:57] polybuildr: More likely, your instance will crumble under the load before that. Mediawiki doesn't scale all that well without a caching infrastructure. [08:28:21] Coren: And vagrant doesn't come out of the box with one, does it? [08:28:54] polybuildr: I don't think it does. IIRC, it's fairly easy to set up a memcached for it (yuvipanda would be able to tell you for sure), but not by default. [08:29:58] Coren: Well, I'm in no hurry to start catching spam. I'll ask him later. :) Thanks again! [08:30:15] polybuildr: But to be honest, if your instance ends up being hammered I'd rather have /it/ crumble than successfully hammer on the db. :-) [08:31:01] Coren: Ha, okay. :P It's an m1.small, though, so I suppose it will crumble far before it affects Labs. [08:31:27] No doubt it will.
[08:38:13] PROBLEM - Puppet staleness on tools-shadow is CRITICAL 100.00% of data above the critical threshold [43200.0] [09:02:12] Coren: here? [09:02:28] mark: Yep [09:02:32] 10:04:40 yuvipanda: It's in the second maps project now - last time I had to do an rsync it took about 20h past the maps. [09:02:40] seems like it's going to take weeks, rather? [09:03:00] It's almost past the worst - both map projects together have some 20m files [09:03:11] it's at 2.5 TB now [09:03:20] Yes, a LOT of very small files. [09:03:29] i doubt it's gonna get much quicker [09:03:36] When I did the switch to the thin volumes, about 90% of the copy was maps [09:03:55] It took less than a day to do the rest [09:04:05] but how much time will the rest of the maps take? [09:05:48] Hm. It's in the second project, but it /does/ look like it just started. :-( It's got 48 million out of 118m total done [09:06:11] it's gonna take a long time [09:06:15] * Coren thinks. [09:06:22] A tarball might be faster for maps. [09:06:27] why? [09:06:47] Fewer open-read-close cycles on small files. [09:06:55] (during copy) [09:07:05] i'm not sure why? [09:07:19] it still needs to open/read/close them all :) [09:07:37] Well, the total will be the same, but building a tarball reads it all unconditionally, without a bunch of stat() calls to check whether it has to [09:07:54] And all locally. [09:07:54] it always has to, the destination is empty [09:08:25] Sure, but rsync doesn't know that. It still sends every file's stat() to the receiving end, which then queues the transfer. [09:08:44] so I actually tried something different yesterday [09:08:48] I don't think rsync optimizes for that [09:08:52] i allocated a new 41t thin volume on the destination [09:09:00] and then copied the volume block device over [09:09:03] which is sparse of course [09:09:17] so I compressed the sparse blocks to speed up the transfer for that [09:09:17] Oh? I never tried that. How does that fare?
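Coren's tarball idea above — one streaming pass over the tree, with no per-file stat exchange between sender and receiver as in rsync — would look something like the following. This is a hypothetical sketch, demonstrated here with temp directories; in the real case the extracting `tar` would run over ssh on the destination host, and the paths are made up:

```shell
# Sketch of the tar alternative: read every file unconditionally and
# stream a single archive to the receiver. Demonstrated locally with
# throwaway directories (real use: pipe through ssh to the far host).
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/maps"
echo "tile" > "$src/maps/0-0-0.png"

# One pipeline: create the archive on stdout, extract it on stdin.
tar -C "$src" -cf - . | tar -C "$dst" -xpf -

ls "$dst/maps"    # prints: 0-0-0.png
```

The trade-off discussed in the log applies: the copy is sequential and cheap on the server, but it cannot be restarted partway like rsync can.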
[09:09:25] it went at about 30 MB/s on idle bandwidth [09:09:28] on average [09:09:34] and it was writing it sparse to the thin volume [09:09:47] ionice -c idle pv -eprab /dev/mapper/store-backup | pigz -cf - | pv -q -L 50m | ssh -o Compression=no -o CompressionLevel=0 root@labstore2001.codfw.wmnet 'unpigz -c | dd of=/dev/mapper/store-ls1001_now conv=sparse' [09:09:48] Ooo. That's the part I wouldn't have expected. That's very nice actually. [09:09:49] is what I used [09:10:03] it was still gonna take 350 hours [09:10:09] despite being a fair bit faster than the current rsync [09:10:29] Hm. 350 hours would be /longer/ than the current rsync, wouldn't it? [09:10:35] i doubt it [09:10:42] It's got 48m of 118m files done in... 70 hours or so [09:10:49] that's why I'm saying that I think your rsync estimate is wrong ;) [09:11:22] You're correct that it's wrong - I was expecting alphabetical directories in project/maps so I thought that part was almost done. [09:11:25] it was less impacting on labs nfs actually [09:11:33] because it's more sequential reads [09:11:46] but some drawbacks, very hard to restart that process of course [09:11:50] and i'm not 100% sure it will work [09:11:53] can only tell at the end really [09:12:01] so rather than wait a week and then find that it failed... I canceled it again [09:12:15] Yeah - the sequential reads will be easier on the server for sure. I think it'll transfer a lot more data though. [09:12:28] it shouldn't much [09:12:33] Those filesystems have a lot of files shorter than 1 block in length. [09:12:34] just because there's no discard/fstrim there's some bloat [09:13:57] Not even counting the unused blocks - if you copy at block resolution you're going to increase the size of a lot of files to the next block size. On average half a block per file. [09:15:08] mark: We can try your technique with a different filesystem. [09:15:25] mark: create a small filesystem, put a couple 100 files in it, and try that.
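mark's one-liner above, reflowed with a comment per stage. The command itself is unchanged (same flags, device paths, and host as pasted); only the layout differs:

```shell
# The block-device copy pipeline from the log, split per stage:
#  - ionice -c idle : read with idle I/O priority so NFS clients win
#  - pv -eprab      : stream the source LV with progress/rate/ETA
#  - pigz -cf -     : parallel gzip; long runs of zeros (the sparse
#                     blocks) compress to almost nothing
#  - pv -q -L 50m   : cap throughput at ~50 MB/s
#  - far end        : unpigz decompresses; dd conv=sparse re-creates
#                     holes instead of writing zero blocks
ionice -c idle pv -eprab /dev/mapper/store-backup \
  | pigz -cf - \
  | pv -q -L 50m \
  | ssh -o Compression=no -o CompressionLevel=0 root@labstore2001.codfw.wmnet \
      'unpigz -c | dd of=/dev/mapper/store-ls1001_now conv=sparse'
```

Note that ssh-level compression is explicitly disabled; compressing once with pigz and sending the result raw avoids double-compressing the stream.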
[09:15:26] so after fstrim there are no unused blocks in theory [09:15:28] but didn't want to run that [09:15:29] yeah [09:15:44] but anyway, you were not around, decided to stop it [09:16:30] also reading from the old project volume [09:16:36] goes at about 700 MB/s [09:16:41] because no client load on those disks [09:17:15] Yeah, not likely to be that fast on the live one. [09:17:32] no, caps out at about 30 MB/s before it starts to have an impact [09:18:23] http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=cpu_report&c=Labs+NFS+cluster+eqiad&h=labstore1001.eqiad.wmnet&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=ALLGROUPS [09:18:39] But right now the copy is 48/118 after 103 hours (just checked) [09:18:49] https://tools.wmflabs.org/meta/ seems to need a restart? [09:18:53] i paused that rsync while I was playing yesterday [09:18:59] (STOP/CONT) [09:19:17] 48 what? [09:19:18] mark: do you know how long you stopped it, roughly? [09:19:22] a few hours [09:19:30] 48 million files out of 118 [09:19:37] check the ganglia graph [09:19:47] approx 5 hours I think [09:20:01] cpu load was higher with my metric, but iowait lower [09:20:08] mark: 5h isn't big enough to impact the average all that much. [09:20:22] s/metric/method/ [09:20:29] It looks like rsync manages about half a million files / hour [09:21:55] = ~220-240h for the whole thing. [09:22:05] let's wait for that then [09:22:11] perhaps we can do the copy to local eqiad using the other method [09:22:16] from codfw [09:22:18] that'll be quick then [09:22:25] wire speed i'm sure [09:22:41] also it will be less fragmented after the rsync [09:22:48] Since codfw is entirely unloaded, I'm pretty sure we can do a sequential read fast enough to fill the wire.
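The ETA arithmetic above can be checked with the figures quoted in the log (48M of 118M files after ~103 hours). The "~220-240h" figure comes from rounding the rate down to half a million files an hour (118 / 0.5 = 236h); the exact division gives a slightly longer total:

```shell
# Worked version of the rsync ETA estimate, using the log's numbers.
files_done=48000000
files_total=118000000
hours=103
rate=$(( files_done / hours ))                     # files per hour
eta_total=$(( files_total / rate ))                # whole copy at that rate
eta_left=$(( (files_total - files_done) / rate ))  # time still to go
echo "rate=${rate}/h total=${eta_total}h left=${eta_left}h"
# prints: rate=466019/h total=253h left=150h
```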
easily [09:23:04] just need 41 TB of thinly allocated storage somewhere [09:23:55] or less actually [09:24:01] we can shrink that volume to closer to actual size for the copy [09:24:07] so we don't need to copy those sparse blocks at all [09:24:12] we can just eliminate them, even easier [09:24:31] gzip minimizes them but it's still not terribly efficient [09:24:49] Well, we can shrink the thin pool to the actual on-disk size [09:24:55] that's what I just said [09:25:06] so yeah let's do that [09:25:37] do any of these boxes have 10G? [09:25:54] Networking? No. [09:28:37] Best we could do is bind two 1g together [09:28:58] wouldn't help for a single stream [09:29:31] If there's no other traffic, true. [09:29:46] and there wouldn't be [09:34:34] 10Tool-Labs: Get rid of toolwatcher, use skeleton homedirs instead - https://phabricator.wikimedia.org/T91235#1281773 (10coren) pam_mkhomedir is only invoked when a user logs in for the first time which - by design - can never happen for service users/groups. The user is never actually //created// in the labs projec... [09:38:06] 10Tool-Labs: Grid engine "swallows" quotation marks (double and single quotation marks) and does not recognize pages at cs.wiki (more at "Additional Information") - https://phabricator.wikimedia.org/T74092#1281789 (10coren) @wesalius: Have you tried the console_encoding change yet? [09:48:30] 10Tool-Labs: Cannot start java processes using the grid engine - https://phabricator.wikimedia.org/T69588#1281795 (10coren) The `java/lang/NoClassDefFoundError` exception means that a class that was available at compile time is not available at runtime. If you get the error when on the grid but not interactivel... [09:51:24] 6Labs, 10Labs-Infrastructure: korma.wmflabs.org server not reachable - https://phabricator.wikimedia.org/T98470#1281799 (10Acs) The instance is up now since yesterday. Any clues about the problem?
Cheers [09:58:41] 6Labs, 10Labs-Infrastructure, 6operations: Migrate Labs NFS storage from RAID6 to RAID10 - https://phabricator.wikimedia.org/T96063#1281802 (10mark) This seems reasonable yes. Let's move ahead with this after the new backups have finished (codfw + eqiad). [10:20:31] PROBLEM - Puppet staleness on tools-exec-catscan is CRITICAL 100.00% of data above the critical threshold [43200.0] [10:22:20] yuvipanda: Are you the one that disabled puppet on tools-exec-catscan? [10:24:15] PROBLEM - Puppet staleness on tools-exec-cyberbot is CRITICAL 100.00% of data above the critical threshold [43200.0] [10:31:52] PROBLEM - Puppet staleness on tools-exec-15 is CRITICAL 100.00% of data above the critical threshold [43200.0] [10:32:16] PROBLEM - Puppet staleness on tools-exec-gift is CRITICAL 100.00% of data above the critical threshold [43200.0] [10:33:12] PROBLEM - Puppet staleness on tools-exec-wmt is CRITICAL 100.00% of data above the critical threshold [43200.0] [10:33:32] shinken-wm: FWIW, I can't access Wikipedia at all. [10:36:08] PROBLEM - Puppet staleness on tools-exec-08 is CRITICAL 100.00% of data above the critical threshold [43200.0] [10:36:48] PROBLEM - Puppet staleness on tools-exec-14 is CRITICAL 100.00% of data above the critical threshold [43200.0] [10:37:02] PROBLEM - Puppet staleness on tools-exec-07 is CRITICAL 100.00% of data above the critical threshold [43200.0] [10:43:13] .quit [11:34:45] * Coren greatly desires to murder opendj. [12:49:07] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/GuZ-MPG was created, changed by GuZ-MPG link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/GuZ-MPG edit summary: Created page with "{{Tools Access Request |Justification=test tools for display of -omics data from wikidata |Completed=false |User Name=GuZ-MPG }}" [12:57:28] Coren: did you see my tool link? [13:04:47] Betacommand: No, was traveling. Give again?
[13:06:16] Coren: need to add the filters but http://tools.wmflabs.org/betacommand-dev/cgi-bin/sge_status.py is the basic framework [13:06:40] * Coren looks [13:10:49] Coren: thoughts? [13:11:56] Well, it looks like it works fine, but I can't really gauge whether there's a marked improvement yet. [13:12:10] 6Labs: Reinstall db1009 from zero - https://phabricator.wikimedia.org/T98958#1282071 (10jcrespo) 3NEW a:3jcrespo [13:13:21] Coren: Not sure I would call it an improved version, just a different way of looking at the same data. All of the columns are sortable so you can view/group by different things [13:15:08] 6Labs: Reinstall db1009 from zero - https://phabricator.wikimedia.org/T98958#1282079 (10jcrespo) [13:16:41] Coren: and the memory value sorting actually handles the B/MB/GB correctly, and so does the time sorting for CPU [13:17:14] Yeah, sorty tables made with dumb javascript are not quite as useful. [13:19:21] Coren: had to use css hidden values to get the sorting to work [13:22:16] 6Labs: Reinstall db1009 from zero - https://phabricator.wikimedia.org/T98958#1282099 (10Springle) - add an `m5-master.eqiad.wmnet` CNAME to dns [13:24:16] 6Labs: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1282100 (10Springle) Note that services should connect to a CNAME, `m5-master.eqiad.wmnet`. Check if we need any special network/vlan rules (think not, but since it is labs-related, ask @faidon) [13:25:22] 6Labs: Reinstall db1009.eqiad from zero - https://phabricator.wikimedia.org/T98958#1282118 (10jcrespo) [13:35:38] 6Labs: Reinstall db1009.eqiad from zero - https://phabricator.wikimedia.org/T98958#1282123 (10jcrespo) [13:43:36] 6Labs: Reinstall db1009.eqiad from zero - https://phabricator.wikimedia.org/T98958#1282124 (10jcrespo) I believe I do not have permissions to schedule the host as down in icinga.
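The "css hidden values" trick Betacommand mentions above amounts to emitting a hidden numeric sort key next to each human-readable B/MB/GB value, so a plain JavaScript table sorter orders the column numerically. A minimal sketch, assuming single-letter or B/KB/MB/GB suffixes; the function name and CSS class are hypothetical, not from the tool's actual code:

```shell
# Convert a human-readable memory value into bytes so it can serve as
# a hidden sort key. Suffix handling is illustrative.
to_bytes() {
  case $1 in
    *GB|*G) n=${1%G*}; echo $(( n * 1024 ** 3 )) ;;
    *MB|*M) n=${1%M*}; echo $(( n * 1024 ** 2 )) ;;
    *KB|*K) n=${1%K*}; echo $(( n * 1024 )) ;;
    *B)     echo "${1%B}" ;;
    *)      echo "$1" ;;   # already a plain number of bytes
  esac
}

# Example table cell: the hidden span carries the numeric key, the
# visible text stays human-readable.
printf '<td><span class="hidden">%s</span>%s</td>\n' "$(to_bytes 600M)" "600M"
```

With cells like this, a dumb string-or-number sorter can sort on the hidden prefix and the B/MB/GB mix still orders correctly.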
[13:52:37] 6Labs: Reinstall db1009.eqiad from zero - https://phabricator.wikimedia.org/T98958#1282125 (10jcrespo) [13:52:46] Anyone know how to configure squid? [13:53:32] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 33.33% of data above the critical threshold [0.0] [14:09:14] 6Labs: Reinstall db1009.eqiad from zero - https://phabricator.wikimedia.org/T98958#1282141 (10jcrespo) [14:23:31] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0] [14:35:15] 6Labs: Reinstall db1009.eqiad from zero - https://phabricator.wikimedia.org/T98958#1282196 (10jcrespo) [14:36:05] 6Labs: Reinstall db1009.eqiad from zero - https://phabricator.wikimedia.org/T98958#1282071 (10jcrespo) I now have permissions, Icinga notified downtime scheduled. [14:59:35] 6Labs: allow routing between labs instances and public labs ips - https://phabricator.wikimedia.org/T96924#1282237 (10Andrew) Alex, are there any obvious next steps here? Unfortunately pdns doesn't support split horizon, so the alternative to fixing this is ugly, requiring two different dns servers. [15:11:51] 10MediaWiki-extensions-OpenStackManager: cannot create foo.foo.wmflabs.org domain - https://phabricator.wikimedia.org/T56664#1282243 (10Andrew) p:5Triage>3Low [15:14:26] 6Labs: OpenStack manager extension improvements or replacement - https://phabricator.wikimedia.org/T85613#1282251 (10Andrew) 5Open>3Invalid Horizon is under way; I'm not sure this bug is useful anymore. 
[15:16:17] 6Labs: Add a second pdns/mysql server - https://phabricator.wikimedia.org/T94865#1282257 (10Andrew) p:5Triage>3Normal [15:16:40] 6Labs: Make a fact for project_id on labs instances - https://phabricator.wikimedia.org/T93684#1282267 (10Andrew) p:5Triage>3Normal [15:17:31] 6Labs: Test Ceph for instance storage - https://phabricator.wikimedia.org/T90364#1282273 (10Andrew) p:5Triage>3Normal [15:19:01] 6Labs, 10Labs-Infrastructure: provide bastion redundancy via DNS round robin - https://phabricator.wikimedia.org/T59834#1282277 (10Andrew) p:5Triage>3Normal [15:19:29] 6Labs, 10Labs-Infrastructure, 5Patch-For-Review: Public IPs not being updated from OpenStack Nova plugin - https://phabricator.wikimedia.org/T52620#1282281 (10Andrew) 5Open>3stalled p:5Triage>3Normal [16:07:17] 6Labs: Reinstall db1009.eqiad from zero - https://phabricator.wikimedia.org/T98958#1282460 (10jcrespo) [16:10:03] valhallasw: thanks for adding me to the git reviewers list for tools :) [16:10:25] valhallasw: also look at the webservice patch again? 
[16:10:32] yuvipanda: will do, later [16:10:51] first grading students' homework [16:10:54] valhallasw: thanks [16:10:56] Haha [16:11:02] also shinken shinken shinken :> [16:53:49] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 80.00% of data above the critical threshold [0.0] [16:53:53] PROBLEM - Puppet failure on tools-webgrid-generic-1401 is CRITICAL 90.00% of data above the critical threshold [0.0] [16:54:04] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1404 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:54:30] PROBLEM - Puppet failure on tools-static-01 is CRITICAL 80.00% of data above the critical threshold [0.0] [16:54:32] PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL 80.00% of data above the critical threshold [0.0] [16:54:35] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 50.00% of data above the critical threshold [0.0] [16:54:40] PROBLEM - Puppet failure on tools-services-02 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:54:41] PROBLEM - Puppet failure on tools-exec-1213 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:54:44] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1205 is CRITICAL 50.00% of data above the critical threshold [0.0] [16:54:46] PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:54:47] PROBLEM - Puppet failure on tools-exec-1208 is CRITICAL 55.56% of data above the critical threshold [0.0] [16:54:51] PROBLEM - Puppet failure on tools-webgrid-generic-1403 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:54:51] PROBLEM - Puppet failure on tools-exec-1203 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:54:59] PROBLEM - Puppet failure on tools-services-01 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:54:59] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1204 is CRITICAL 100.00% of data above the 
critical threshold [0.0] [16:55:00] those are almost certainly my fault and should clear shortly [16:55:05] PROBLEM - Puppet failure on tools-exec-1407 is CRITICAL 80.00% of data above the critical threshold [0.0] [16:55:05] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:55:09] PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:55:11] PROBLEM - Puppet failure on tools-submit is CRITICAL 100.00% of data above the critical threshold [0.0] [16:55:40] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1405 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:55:41] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 50.00% of data above the critical threshold [0.0] [16:55:44] PROBLEM - Puppet failure on tools-trusty is CRITICAL 60.00% of data above the critical threshold [0.0] [16:56:06] PROBLEM - Puppet failure on tools-exec-1403 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:56:09] PROBLEM - Puppet failure on tools-exec-1408 is CRITICAL 66.67% of data above the critical threshold [0.0] [16:56:11] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:56:17] PROBLEM - Puppet failure on tools-static-02 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:56:17] PROBLEM - Puppet failure on tools-exec-1205 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:56:21] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1409 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:56:33] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1206 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:56:39] PROBLEM - Puppet failure on tools-exec-1402 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:56:53] PROBLEM - Puppet failure on tools-exec-1406 is CRITICAL 
100.00% of data above the critical threshold [0.0] [16:57:12] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:57:24] PROBLEM - Puppet failure on tools-redis is CRITICAL 100.00% of data above the critical threshold [0.0] [16:57:26] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1201 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:57:34] PROBLEM - Puppet failure on tools-exec-1410 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:57:34] PROBLEM - Puppet failure on tools-master is CRITICAL 100.00% of data above the critical threshold [0.0] [16:57:42] PROBLEM - Puppet failure on tools-redis-slave is CRITICAL 100.00% of data above the critical threshold [0.0] [16:57:44] PROBLEM - Puppet failure on tools-webgrid-generic-1402 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:57:45] PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL 80.00% of data above the critical threshold [0.0] [16:57:53] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:57:53] andrewbogott: You don't have permission to access /production/catalog/i-00000bd3.eqiad.wmflabs on this server [16:57:56] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1408 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:57:57] PROBLEM - Puppet failure on tools-checker-01 is CRITICAL 50.00% of data above the critical threshold [0.0] [16:58:03] PROBLEM - Puppet failure on tools-exec-1214 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:07] andrewbogott: if that's the issue you're expecting: great! 
[16:58:16] PROBLEM - Puppet failure on tools-checker-02 is CRITICAL 70.00% of data above the critical threshold [0.0] [16:58:16] PROBLEM - Puppet failure on tools-webgrid-generic-1404 is CRITICAL 44.44% of data above the critical threshold [0.0] [16:58:22] PROBLEM - Puppet failure on tools-exec-1211 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:22] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1401 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:22] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:24] PROBLEM - Puppet failure on tools-exec-1201 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:26] PROBLEM - Puppet failure on tools-exec-1206 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:34] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1207 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:34] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:35] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1209 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:36] PROBLEM - Puppet failure on tools-exec-1404 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:36] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:38] valhallasw: is that happening right this second? Because it’s resolved for me everywhere that I’ve looked… [16:58:38] PROBLEM - Puppet failure on tools-exec-1409 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:39] I should write a smarter puppet reporting bot... 
[16:58:42] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:58:53] andrewbogott: that's one of the recent ones being reported here [16:58:57] PROBLEM - Puppet failure on tools-exec-1219 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:59:01] PROBLEM - Puppet failure on tools-exec-1209 is CRITICAL 60.00% of data above the critical threshold [0.0] [16:59:05] PROBLEM - Puppet failure on tools-exec-1217 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:59:07] PROBLEM - Puppet failure on tools-exec-1210 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:59:09] andrewbogott: let me do a manual run [16:59:15] thanks [16:59:31] PROBLEM - Puppet failure on tools-webproxy-02 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:59:42] andrewbogott: manual run looks OK [16:59:58] cool, thanks for checking. [17:00:11] the reporting chain isn't very fast, it seems [17:00:28] PROBLEM - Puppet failure on tools-submit is CRITICAL 100.00% of data above the critical threshold [0.0] [17:00:49] puppet run fails -> last puppet run status file gets read by diamond every N minutes -> gets pushed to graphite -> shinken reads that every K minutes [17:01:18] PROBLEM - Puppet failure on tools-exec-1218 is CRITICAL 100.00% of data above the critical threshold [0.0] [17:01:18] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1202 is CRITICAL 100.00% of data above the critical threshold [0.0] [17:01:22] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1210 is CRITICAL 50.00% of data above the critical threshold [0.0] [17:01:26] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL 100.00% of data above the critical threshold [0.0] [17:01:39] yeah, quite the lag. 
[17:02:58] RECOVERY - Puppet failure on tools-checker-01 is OK Less than 1.00% above the threshold [0.0] [17:03:16] RECOVERY - Puppet failure on tools-webgrid-generic-1404 is OK Less than 1.00% above the threshold [0.0] [17:03:48] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [0.0] [17:03:52] RECOVERY - Puppet failure on tools-webgrid-generic-1401 is OK Less than 1.00% above the threshold [0.0] [17:04:06] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1404 is OK Less than 1.00% above the threshold [0.0] [17:04:26] RECOVERY - Puppet failure on tools-static-01 is OK Less than 1.00% above the threshold [0.0] [17:04:32] RECOVERY - Puppet failure on tools-exec-1215 is OK Less than 1.00% above the threshold [0.0] [17:06:21] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1210 is OK Less than 1.00% above the threshold [0.0] [17:08:12] RECOVERY - Puppet failure on tools-checker-02 is OK Less than 1.00% above the threshold [0.0] [17:08:20] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1401 is OK Less than 1.00% above the threshold [0.0] [17:09:02] RECOVERY - Puppet failure on tools-exec-1209 is OK Less than 1.00% above the threshold [0.0] [17:09:04] RECOVERY - Puppet failure on tools-exec-1217 is OK Less than 1.00% above the threshold [0.0] [17:09:42] RECOVERY - Puppet failure on tools-services-02 is OK Less than 1.00% above the threshold [0.0] [17:09:45] RECOVERY - Puppet failure on tools-exec-1207 is OK Less than 1.00% above the threshold [0.0] [17:10:01] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1204 is OK Less than 1.00% above the threshold [0.0] [17:10:03] RECOVERY - Puppet failure on tools-services-01 is OK Less than 1.00% above the threshold [0.0] [17:10:09] RECOVERY - Puppet failure on tools-submit is OK Less than 1.00% above the threshold [0.0] [17:10:09] RECOVERY - Puppet failure on tools-exec-1204 is OK Less than 1.00% above the threshold [0.0] [17:11:17] RECOVERY - Puppet failure on 
tools-exec-1205 is OK Less than 1.00% above the threshold [0.0] [17:11:19] RECOVERY - Puppet failure on tools-static-02 is OK Less than 1.00% above the threshold [0.0] [17:11:22] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1409 is OK Less than 1.00% above the threshold [0.0] [17:11:32] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [0.0] [17:11:37] RECOVERY - Puppet failure on tools-exec-1402 is OK Less than 1.00% above the threshold [0.0] [17:11:53] RECOVERY - Puppet failure on tools-exec-1406 is OK Less than 1.00% above the threshold [0.0] [17:12:16] RECOVERY - Puppet failure on tools-exec-1401 is OK Less than 1.00% above the threshold [0.0] [17:12:26] RECOVERY - Puppet failure on tools-redis is OK Less than 1.00% above the threshold [0.0] [17:12:48] RECOVERY - Puppet failure on tools-webgrid-generic-1402 is OK Less than 1.00% above the threshold [0.0] [17:13:04] RECOVERY - Puppet failure on tools-exec-1214 is OK Less than 1.00% above the threshold [0.0] [17:13:23] RECOVERY - Puppet failure on tools-exec-1216 is OK Less than 1.00% above the threshold [0.0] [17:13:23] RECOVERY - Puppet failure on tools-exec-1201 is OK Less than 1.00% above the threshold [0.0] [17:13:37] RECOVERY - Puppet failure on tools-exec-1404 is OK Less than 1.00% above the threshold [0.0] [17:13:37] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1403 is OK Less than 1.00% above the threshold [0.0] [17:13:57] RECOVERY - Puppet failure on tools-exec-1219 is OK Less than 1.00% above the threshold [0.0] [17:14:33] RECOVERY - Puppet failure on tools-webproxy-02 is OK Less than 1.00% above the threshold [0.0] [17:14:41] RECOVERY - Puppet failure on tools-exec-1213 is OK Less than 1.00% above the threshold [0.0] [17:14:47] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/GuZ-MPG was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=158661 edit summary: [17:14:52] 
RECOVERY - Puppet failure on tools-exec-1203 is OK Less than 1.00% above the threshold [0.0] [17:15:59] 6Labs: Reinstall db1009.eqiad from zero - https://phabricator.wikimedia.org/T98958#1282687 (10jcrespo) [17:16:07] RECOVERY - Puppet failure on tools-exec-1403 is OK Less than 1.00% above the threshold [0.0] [17:16:09] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [0.0] [17:16:15] RECOVERY - Puppet failure on tools-exec-1218 is OK Less than 1.00% above the threshold [0.0] [17:16:21] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1202 is OK Less than 1.00% above the threshold [0.0] [17:16:29] RECOVERY - Puppet failure on tools-webproxy-01 is OK Less than 1.00% above the threshold [0.0] [17:17:27] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1201 is OK Less than 1.00% above the threshold [0.0] [17:17:31] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [17:17:40] RECOVERY - Puppet failure on tools-redis-slave is OK Less than 1.00% above the threshold [0.0] [17:17:51] 6Labs: Reinstall db1009.eqiad from zero - https://phabricator.wikimedia.org/T98958#1282071 (10jcrespo) Server seems to have been correctly installed, and puppet and salt are ok. But disk partitioning has to be verified and extra configuration has to be applied via puppet. Icinga notifications have been enabled f... 
[17:17:54] RECOVERY - Puppet failure on tools-exec-1405 is OK Less than 1.00% above the threshold [0.0] [17:17:59] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1408 is OK Less than 1.00% above the threshold [0.0] [17:18:22] RECOVERY - Puppet failure on tools-exec-1211 is OK Less than 1.00% above the threshold [0.0] [17:18:26] RECOVERY - Puppet failure on tools-exec-1206 is OK Less than 1.00% above the threshold [0.0] [17:18:28] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1209 is OK Less than 1.00% above the threshold [0.0] [17:18:34] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1207 is OK Less than 1.00% above the threshold [0.0] [17:19:09] RECOVERY - Puppet failure on tools-exec-1210 is OK Less than 1.00% above the threshold [0.0] [17:19:49] RECOVERY - Puppet failure on tools-webgrid-generic-1403 is OK Less than 1.00% above the threshold [0.0] [17:20:01] RECOVERY - Puppet failure on tools-exec-1407 is OK Less than 1.00% above the threshold [0.0] [17:20:03] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK Less than 1.00% above the threshold [0.0] [17:20:43] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0] [17:20:43] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1405 is OK Less than 1.00% above the threshold [0.0] [17:20:47] RECOVERY - Puppet failure on tools-trusty is OK Less than 1.00% above the threshold [0.0] [17:21:09] RECOVERY - Puppet failure on tools-exec-1408 is OK Less than 1.00% above the threshold [0.0] [17:22:33] RECOVERY - Puppet failure on tools-exec-1410 is OK Less than 1.00% above the threshold [0.0] [17:22:45] RECOVERY - Puppet failure on tools-bastion-02 is OK Less than 1.00% above the threshold [0.0] [17:22:53] what’s that, shinken? 
[17:23:34] RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0] [17:23:36] RECOVERY - Puppet failure on tools-exec-1409 is OK Less than 1.00% above the threshold [0.0] [17:23:44] RECOVERY - Puppet failure on tools-exec-1202 is OK Less than 1.00% above the threshold [0.0] [17:24:46] RECOVERY - Puppet failure on tools-exec-1208 is OK Less than 1.00% above the threshold [0.0] [17:24:46] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1205 is OK Less than 1.00% above the threshold [0.0] [17:44:32] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0] [17:44:37] 6Labs: allow routing between labs instances and public labs ips - https://phabricator.wikimedia.org/T96924#1282781 (10akosiaris) After some conversation on IRC, we tried unsettings dmz_cidr. The 2 rules I mentioned above where indeed cleared, but various problems cropped up as VMs started using public IPs to acc... [17:55:22] 6Labs: allow routing between labs instances and public labs ips - https://phabricator.wikimedia.org/T96924#1282836 (10Andrew) https://gerrit.wikimedia.org/r/#/c/210720/ removes other labs instances (private IPs) from the dmz. That should break fairly few things. Labs security rules that are exclusive to 10.0.0... [18:03:37] hi guys [18:04:05] I tried virtual env as a solution to run some modules i cannot install in tool labs as i would do normally [18:04:20] the thing is that i am installing module scipy [18:04:32] and i'm finding many errors during the installation "pip install scipy" [18:04:51] What errors are you getting? [18:04:58] Can you pastebin them? 
[18:05:08] later, when i run a different module (reverse-geocoding) which uses scipy, it gives an error [18:05:33] Traceback (most recent call last): [18:05:33] File "cira_selection.py", line 12, in <module> [18:05:34] import reverse_geocoder as rg [18:05:35] File "/home/marcmiquel/venv/local/lib/python2.7/site-packages/reverse_geocoder/__init__.py", line 10, in <module> [18:05:37] from scipy.spatial import cKDTree as KDTree [18:05:40] ImportError: No module named scipy.spatial [18:05:44] well, this is the trigger error. it doesn't find spatial [18:05:52] but the problem is in the scipy installation i guess [18:05:53] Go to phabricator.wikimedia.org/paste [18:05:58] Paste your error there [18:06:03] And give us the link? [18:06:20] Pasting directly on IRC makes it very difficult to read and it strips indents [18:06:29] yes [18:07:23] yuvipanda: where can i paste it in phabricator? [18:07:42] Phabricator.wikimedia.org/paste [18:08:18] My bus is here now. Brb [18:09:49] ok [18:09:52] here it goes: [18:09:54] https://phabricator.wikimedia.org/P647 [18:27:21] 10Tool-Labs-tools-Quentinv57's-tools: Global Sysop Statistics are not working - https://phabricator.wikimedia.org/T72185#1282958 (10-jem-) 5Open>3Resolved [18:27:30] yuvipanda: so... puppet-compiler only works if there are host directives in puppet. Would it make sense to have that kind of configuration for tools as well? It has the advantage of having less config in wikitech, but I think you mentioned there was another way to do that coming soon(tm) [18:27:58] marcmiquel: scipy.spatial; let me check [18:28:22] valhallasw: i'm fighting to install sth called blas and lapack, which are necessary for scipy.
[18:28:32] marcmiquel: yeah, don't [18:28:58] marcmiquel: that's something that will cost you a day or two to get to work [18:29:04] :S [18:29:19] marcmiquel: however, we *should* just have scipy.spatial available [18:29:46] from scipy.spatial import cKDTree as KDTree works for me [18:30:04] marcmiquel: could you re-create the virtualenv with virtualenv --system-site-packages? [18:30:21] marcmiquel: deactivate && cd .. && virtualenv --system-site-packages venv && source venv/bin/activate [18:31:25] aha, doing it [18:31:40] it reinstalled pip and easy_tools [18:32:12] *nod*. It also changes the settings so it will use system python packages as well [18:32:20] which includes the system-wide scipy install [18:33:03] mmm [18:33:06] it seems it works! [18:33:37] now i have regular code bugs :) [18:33:47] great. we skipped the scipy problems [18:33:51] thanks valhallasw [18:34:00] you're welcome! [18:37:26] 10Tool-Labs-tools-Quentinv57's-tools: Global Sysop Statistics are not working - https://phabricator.wikimedia.org/T72185#1282975 (10-jem-) I think I have solved this now. There were two problems with the "_p" at the end of the database names, I guess because of the change from the toolserver to the toollabs meta... [18:53:10] hi, i need help with running a http server on tools labs [18:53:40] when i queue a job using "jstart -q webgrid-generic ./httpserver.sh" [18:53:53] and check its status using "qstat" [18:54:31] the status never changes from qw to r [18:54:50] I am following the instructions here : https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Creating_a_new_Tool_account [18:55:04] and this is what I am trying to do : http://wiki.languagetool.org/http-server [19:00:22] ankita-ks: chances are it's not going to work like that. 
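The scipy fix above (recreating the virtualenv with `--system-site-packages` so it can see the system-wide scipy install) can be verified before the real import blows up. A minimal sketch of such a check, not part of marcmiquel's actual script:

```python
import importlib.util

def module_available(name):
    """Return True if a top-level module can be imported in the
    current environment, without actually importing it.

    Handy for confirming that a virtualenv created with
    --system-site-packages really sees system packages (e.g. scipy)
    before code like `from scipy.spatial import cKDTree` fails."""
    return importlib.util.find_spec(name) is not None

# e.g. module_available("scipy") should be True inside a
# --system-site-packages venv on a host with python-scipy installed
```

Running this from inside the activated venv distinguishes "scipy is not visible at all" from "scipy is visible but broken", which are the two failure modes discussed in the channel.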
The proxy will send urls in the format //request to the webserver [19:01:17] ankita-ks: I'm also not sure what you're trying to do exactly [19:02:15] ankita-ks: I assume the goal is not to have languagetool open for all over the network [19:54:40] yuvipanda: there? [19:57:05] valhallasw: Okay, I was pointed to yuvipanda, but I could just ask you. To make a Labs-vagrant instance (reasonably) production ready, apart from changing the password, what else needs to be done? [19:57:17] It's supposed to be a spam honeypot, so no spam stopping hardening measures required. [19:57:41] polybuildr: I'm not sure, to be honest. I'd check for other default accounts [19:58:53] valhallasw: hmm, okay. And should I add a layer of memcached or something? [19:59:54] polybuildr: Check out the simple_performant role [20:00:17] it sets up some basic performance tuning things [20:00:34] including http cache headers [20:01:06] mw-vagrant already has redis cache for most of the php side things out of the box [20:01:26] but you can enable the memcached role too if you like that better [20:01:27] bd808: Oh, there's redis out of the box? Cool. [20:01:36] No, that's okay. [20:01:39] Some level of caching, that's all. [20:02:31] polybuildr: $wgMainCacheType = 'redis'; $wgSessionCacheType = 'redis'; $wgSessionsInObjectCache = true; are default [20:03:01] bd808: I see. And it also comes with redis installed. Okay. [20:03:06] *nod* [20:03:36] I can't seem to find any documentation for the simple_performant role, btw. [20:03:52] Gerrit commits and comments instead. :P [20:05:22] bd808: I could just enable it, of course. list-roles does list it. [20:06:05] oh. yeah I forget that the hacks I built for labs-vagrant don't give you all the fun stuff [20:06:36] bd808: what is labs-vagrant ? 
[20:06:39] with a normal vagrant setup you can do `vagrant roles info simple_performant` to get a description [20:07:14] hashar: it's a puppet role for labs plus a simple wrapper script that lets you use the puppet config from mediawiki-vagrant on a labs host [20:07:31] ohhhhhhhhh [20:07:36] hashar: https://wikitech.wikimedia.org/wiki/Help:Labs-vagrant [20:07:56] I need to hold an Antoine-Bryan hackathon [20:08:04] the great yuvipanda started it and then I sort of took over maintaining it [20:08:04] bd808: I'll just enable it then. :P [20:08:41] hashar: the cooler thing I've been working on is vagrant managing lxc containers in labs [20:09:07] for the last two weeks or so I tried to use vagrant to provision images for labs [20:09:13] came to a dead end though [20:09:14] https://phabricator.wikimedia.org/T90892 and https://gerrit.wikimedia.org/r/#/c/193665/ [20:10:09] I'd love to see a Vagrant provider that lets Vagrant on my laptop control a VM in labs [20:10:11] I will reach out to Dan next week [20:10:54] valhallasw: doing shinken now [20:11:09] hashar: my Lyon plans are mw-vagrant and psr3 logging so we can chat there. Maybe over that dinner we talked about :) [20:11:22] sure thing! [20:11:45] bd808: Error: Duplicate declaration ... already declared in file /vagrant/ ... /simple_performant.pp:67; cannot redeclare at /vagrant/ ... labs_initial_content.pp:22 [20:12:00] On running a provision. [20:12:02] * bd808 settles in for 5 hours of Lyonnaise food, Belgian beer and geek talk [20:12:15] polybuildr: I'll look [20:12:18] Does it conflict with something that I need to disable first? [20:12:43] valhallasw: re: port opening, there’s this huge, massive thread about it from elsewhere, and this is the best we can do for now. I’ll link you to it in a bit. [20:12:52] valhallasw: also: it’s what is currently being used as well :) [20:13:18] ok [20:13:37] polybuildr: doh.
the role that labs-vagrant gives you by default and simple_performant are fighting over controlling robots.txt [20:13:56] valhallasw: https://phabricator.wikimedia.org/T93046 [20:14:01] bd808: Yeah, the file was robots :P So now? [20:14:18] polybuildr: local hack for you, comment out lines 63-67 in puppet/modules/role/manifests/simple_performant.pp [20:14:26] I'll work on a real fix [20:14:26] bd808: will do! [20:16:21] yuvipanda: mmhm. [20:18:50] valhallasw: replied to some of the others, and I’m going to look at the subprocess interface now. [20:22:45] polybuildr: You could test https://gerrit.wikimedia.org/r/#/c/210770 via cherry-pick and see if it fixes the Puppet problem [20:26:22] bd808: Unfortunately, I have absolutely no idea how gerrit and a labs-vagrant instance mix. :P [20:28:20] A cherry-pick is actually pretty easy. On the patch set there is a "Download" section in gerrit. Click the "cherry-pick" and "Anonymous HTTP" options and the text it shows will be the exact git command to paste into your shell on the VM if your $PWD == /srv/vagrant [20:28:49] you would need to revert your local changes to the simple_performant.pp file first [20:31:49] 6Labs: allow routing between labs instances and public labs ips - https://phabricator.wikimedia.org/T96924#1283365 (10Andrew) I have a script which scans for every security group and adds an identical rule for the floating ip range 208.80.155.128/25 for anything that passes this: if rule['Range'] == "10.0.0.0/8... [20:32:18] yuvipanda or Coren: can you read the last three or four comments on ^ and check my work? [20:35:01] andrewbogott: looking [20:55:10] bd808: Ouch, I forgot that /vagrant was just a git repo. Tried a cherry pick, still got an error. [20:56:52] hashar: How much of an issue is it to rename a Gerrit git repo?
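The anonymous-HTTP cherry-pick bd808 walks through above fetches a ref of the form `refs/changes/XX/NNNNNN/P`, where `XX` is the last two digits of the change number (gerrit's ref sharding scheme). A small sketch of how that ref is derived; the change number is the one from this conversation, while the patchset number and repo URL in the comment are assumptions for illustration:

```python
def gerrit_change_ref(change, patchset):
    """Build the ref gerrit serves for a given change/patchset.

    Gerrit shards change refs by the last two digits of the change
    number, e.g. change 210770 patchset 1 -> refs/changes/70/210770/1."""
    return "refs/changes/%02d/%d/%d" % (change % 100, change, patchset)

# The command gerrit's "Download" box generates would then look roughly
# like this (repo path and patchset number are assumed, not from the log):
#   git fetch https://gerrit.wikimedia.org/r/mediawiki/vagrant \
#       refs/changes/70/210770/1 && git cherry-pick FETCH_HEAD
```

Knowing the scheme makes it easier to sanity-check the copied command before pasting it into the shell on the VM.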
[20:57:21] polybuildr: usually, it's create the new repo, push the whole history of the old repo into it [20:57:30] disable/readonly-ify the old one [20:57:41] what reedy said [20:57:53] Reedy: So not too time/effort consuming, right? [20:57:57] polybuildr: there is no rename mechanism in Gerrit. We had a patch for it but upstream refused it [20:57:59] iirc [20:58:00] not really [20:58:08] depends on how fast your connection is ;D [20:58:11] Ha, okay. [20:58:12] and how big the repo [20:58:12] potentially that can be renamed doing a bunch of SQL queries [20:58:26] And how do I request for a repo? [20:58:27] OpenStack do rename them every Friday evening :] [20:58:34] Phabricator git-and-gerrit [20:58:42] i think [20:58:46] no, it's still on-wiki [20:58:58] polybuildr: https://www.mediawiki.org/wiki/Git/New_repositories/Requests [20:59:06] legoktm: thanks! [20:59:08] ^demon|lunch: polybuildr above wondered whether we do rename repositories. Should be doable via SQL :} [20:59:16] on wiki .. 
and I thought we had a task to move all that mess to a Phabricator project [21:00:28] some of us still prefer to use wikis ;) [21:01:15] hashar: legoktm: we made it so that asking for a new repo is on wiki but asking to change permissions on one is on phab [21:01:19] there is a project-creator project at https://phabricator.wikimedia.org/tag/project-creators/ [21:01:24] and https://phabricator.wikimedia.org/tag/repository-ownership-requests/ [21:01:25] for additional fun [21:01:29] * hashar is confused and gives up [21:01:37] mutante: ahhh [21:01:43] mutante: yeah that is rather consistent :] [21:01:58] hashar: yea, i was wondering the same thing yesterday and ended up with the same links :) [21:02:01] if we had it in phab we could file sub tasks to add rights and ci config [21:02:02] though [21:02:17] and ideally have a single change in puppet that creates repo / set rights and add CI conf :] [21:02:39] yea, i would like that too [21:08:14] !log deployment-prep updated OCG to version c7c75e5b03ad9096571dc6dbfcb7022c924ccb4f [21:08:20] Logged the message, Master [21:12:43] Who got labs-morebots to say ", Master" after everything? :P [21:16:57] <^demon|lunch> hashar: Rename in gerrit? No. [21:17:32] polybuildr: if it doesn't recognise you, it says 'master' [21:17:51] yeah, or it says 'dummy' [21:17:56] if it does recognize you [21:18:03] andrewbogott: it looks ok to me, assuming the ranges are correct... [21:18:08] valhallasw: yuvipanda: ... [21:18:10] Sure. [21:18:19] polybuildr: not kidding, lookup any !logs from andrewbogott [21:18:28] No, but seriously. Does anyone remember who configured it to say that? :P [21:18:43] no :P [21:18:46] I think Domas wrote it originally [21:18:57] at some point an entry was added for Leslie too [21:19:09] (I think it says mistress of the networks or something like that) [21:20:45] ^demon|lunch: OpenStack does such renaming on a weekly basis.
But I understand we don't want to invest more in Gerrit :] [21:20:55] polybuildr: it's really really old [21:21:13] as in: two pep8 changes and a semi-rewrite old [21:21:23] it needs love [21:21:32] and ircecho needs love for deduplicating alerts [21:21:34] I gave it love, but nobody +2'ed it :P [21:21:42] <^demon|lunch> hashar: I'm guessing they have a process. [21:21:52] it needs love in the way of it needs to be somewhat destroyed and redone :P [21:22:19] polybuildr: all the way back to the initial commit: https://github.com/wikimedia/operations-debs-adminbot/commit/50bc81456a8b712a8d8afe7316e0b4e8c920c3e1 [21:22:32] ^demon|lunch: indeed http://ci.openstack.org/gerrit.html#renaming-a-project :D [21:23:44] ^demon|lunch: some sql and file renaming while Gerrit is off + reindexing. Doesn't look to be too much of a hassle [21:25:58] valhallasw: Where are the actual titles?! I just keep finding title_map = {"example": "your exampleness"} [21:26:24] polybuildr: no clue [21:27:06] valhallasw: I even cloned it to grep through. Can't find a thing. "Master", yes. But the custom titles. Those were the ones I was looking for. :P [21:27:43] yuvipanda: Okay, I thought you were kidding about "dummy". My sincere apologies for not taking you seriously. :P [21:28:16] polybuildr: :D welcome to wikimedia, it is a silly place [21:28:41] yuvipanda: I've seen little instances over the last 5 months, they've been pretty awesome. :P [21:28:51] :) [21:33:38] 6Labs: allow routing between labs instances and public labs ips - https://phabricator.wikimedia.org/T96924#1283605 (10scfc) For Tools, in `modules/dynamicproxy/manifests/init.pp` we (will) limit some functionality to `ferm`'s `$INTERNAL` hosts. Could `modules/base/templates/firewall/defs.labs.erb` (?) be amende...
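The ", Master" suffix being hunted for above comes from a nick-to-title lookup in the adminbot source; the only concrete clue in the log is the placeholder `title_map = {"example": "your exampleness"}`. A minimal sketch of that behavior, with a made-up map entry (the Leslie entry is paraphrased from the conversation, not copied from the real source):

```python
# Hypothetical entries; the real map is in operations-debs-adminbot.
# The log only shows the placeholder {"example": "your exampleness"}.
title_map = {"lesliecarr": "Mistress of the networks"}  # assumed entry

def ack_message(nick):
    """How a morebots-style logger might pick its acknowledgement:
    a custom title for recognized nicks, 'Master' for everyone else."""
    return "Logged the message, %s" % title_map.get(nick, "Master")
```

This matches the observed behavior in the channel: `!log` entries from unmapped users get "Logged the message, Master".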
[21:39:44] (03PS1) 10Sitic: Refactor backend [labs/tools/crosswatch] - 10https://gerrit.wikimedia.org/r/210815 (https://phabricator.wikimedia.org/T97900) [21:40:42] (03CR) 10Sitic: [C: 032 V: 032] Refactor backend [labs/tools/crosswatch] - 10https://gerrit.wikimedia.org/r/210815 (https://phabricator.wikimedia.org/T97900) (owner: 10Sitic) [22:09:11] is there a role for labs that contains java? [22:20:09] SMalyshev: I don't see any that add a jvm explicitly [22:20:36] I just did `git grep java|grep package` [22:20:51] bd808: ok, thanks. I install with apt, but I wondered if there's already a role so I could do it easier [22:21:49] it's implicit with several other packages I guess (Elasticsearch, Logstash, ...) [22:22:53] ah. a search for 'jdk' has hits [22:23:05] but none of them look really generic [22:23:35] SMalyshev: ::contint::packages is the most generic but it pulls in a ton of other stuff [22:23:56] ok, I'll keep doing apt then :) thanks [22:33:46] (03PS1) 10Sitic: Added README.md and .gitreview file [labs/tools/crosswatch] - 10https://gerrit.wikimedia.org/r/210822 (https://phabricator.wikimedia.org/T97899) [22:34:16] (03CR) 10Sitic: [C: 032 V: 032] Added README.md and .gitreview file [labs/tools/crosswatch] - 10https://gerrit.wikimedia.org/r/210822 (https://phabricator.wikimedia.org/T97899) (owner: 10Sitic) [22:37:40] 10Tool-Labs: Setup an easy to use logrotate based system for rotating tools logs - https://phabricator.wikimedia.org/T68623#1283723 (10valhallasw) logrotate can truncate instead of move-and-replace, which SGE seems to be able to cope with. However, Yuvi is planning to work on logstash-like solutions soon(tm), so...