[03:18:41] I am seeing an issue for COIBot, it is just reporting that it is collecting links, but not processing them. I am wondering whether someone is utilising most of the cycles of the server and COIBot may just be backing off because of it
[03:19:32] eg. [14:11] Saved a backlog edit file - 226 + 3837 (+ 165).
[03:19:32] [14:11] Saved a backlog edit file - 226 + 3837 + 165.
[03:19:32] [14:11] Saved a backlog edit file - 162 + 2290 + 74.
[05:24:50] petan: http://ganglia.wmflabs.org/latest/?c=bots&h=bots-nr1&m=load_one&r=hour&s=by%20name&hc=4&mc=2 is it possible to add more resources to nr1? dirk's (beetstra) bots are needed for steward work and it looks like he's maxing out the CPU on the instance
[05:25:04] Damianz: ^
[05:53:31] legoktm: Annoyingly enough, Openstack doesn't allow resizing instances.
[05:53:58] ugh
[05:54:20] so dirk would have to move to a new instance?
[05:54:59] legoktm: If he's run out, yeah. That's part of the reason why I insisted on using a compute grid on my design. :-)
[05:55:22] Openstack is very good at some things, not so hot on others.
[05:55:56] My understanding is that in order to enable resizing, you have to enable *live* resizing which is not so hot reliability-wise.
[05:59:24] Do you know if he's got multiple bots working in parallel or is it just one big tool?
[05:59:43] it's multiple perl programs that work together i think.
[06:00:29] Hm. If they work together through the filesystem, it might actually be advantageous for him to move to the new project. Well, as long as he doesn't use the replicas.
[06:00:40] replicas?
[06:00:45] Wait, what am I saying. Of course he doesn't if he's here already.
[06:00:49] * Coren facepalms.
[06:01:02] Oh right I had a question for you.
[06:01:04] I'm still in the "from TS" mindframe. :-)
[06:01:18] For replication, are the dbs still going to be split up on multiple servers?
[06:01:30] Or is there any chance we'll be able to do joins across any db?
[06:02:07] legoktm: I honestly don't know at this time, but I think Asher hopes to be able to present one cluster.
[06:03:11] Err, so they would all be on one server? I'm not really familiar with all the terminology
[06:03:31] Yeah.
[06:03:48] Might also be workaround-able with federated databases depending on the kind of query.
[06:04:55] But nothing I say here is trustworthy. Asher is at the "buying hardward" phase atm I believe. :-)
[06:04:57] That would be awesome :)
[06:05:03] hardware*
[06:05:07] ah ok
[06:05:28] I got a database report request that would require doing that, so I'll let them know it's somewhat possible
[06:05:39] Or might be in the future
[06:05:56] legoktm: BTW, there are now user/tool DBs available on tools-
[06:06:33] Ok I haven't had time to look into using tools yet, I'm using the dbs on bots-bsql01 right now
[06:07:11] * Coren tries to think of a suitable bribe to entice you. :-P
[06:10:19] Heh, I'm currently importing all of itwiki's persondata into a db
[06:10:25] 500k articles.
[06:10:29] Pretty interesting results.
[06:44:19] Coren: To go back to the replicas thing, I thought I understood Asher that he (and Peter Youngmeister) don't do anything of that matter but that falls in Ryan's and Andrew's hands. Has this "line" been redrawn?
[07:44:41] legoktm yes he needs to move to different box :P
[07:45:11] :/
[07:45:27] did you look into testing the job queue stuff btw?
[07:45:40] yes, going to do some initial setup today
[07:46:02] awesome
[08:43:43] !log bots added DGideas
[08:43:50] Logged the message, Master
[11:06:08] Hello to everybody!!
[11:07:07] Can someone explain to me why I am constantly getting "svn: Can't open file '/data/project/DrTrigonBot/.svn/lock': Read-only file system" when trying to update the local svn repo from bots4?
[11:10:53] ummm
[11:10:58] let me see
[11:12:13] weird
[11:12:20] i can do it in my home directory
[11:12:23] petan: ^
[11:14:46] the command was "svn up /data/project/DrTrigonBot/" started from my home on bots4 ... any idea?
[11:15:21] well i tried
[11:15:30] legoktm@bots-4:/data/project/DrTrigonBot$ sudo touch test
[11:15:30] touch: cannot touch `test': Read-only file system
[11:15:50] and
[11:15:51] legoktm@bots-4:/data/project/legoktm$ sudo touch test
[11:15:52] which works
[11:16:09] so what have I done wrong? ;))
[11:16:21] no idea
[11:16:44] it worked until about 2 days (maybe 1 week) ago...
[11:18:02] might it be related to "*** System restart required ***" reported from bastion?
[12:03:17] !log bots petrb: removed nr2 (no users were using it)
[12:03:19] Logged the message, Master
[13:58:55] hmm, Remember my login on this browser (for a maximum of 7 days) -> is there a reason why we don't have it at 180 days like the rest of the wikis?
[14:02:42] I think because it's ldap authentication?
[14:31:58] Likely because it's storing tokens from multiple other services that have expiries set
[17:35:58] @labs-info bots-analytics
[17:35:58] [Name bots-analytics doesn't exist but resolves to I-00000533] I-00000533 is Nova Instance with name: bots-analytics, host: virt8, IP: 10.4.1.44 of type: m1.small, with number of CPUs: 1, RAM of this size: 2048M, member of project: bots, size of storage: 30 and with image ID: ubuntu-12.04-precise
[17:36:11] what is it for?
[19:54:24] addrawr you have over 1000 processes on bnr1 :P
[19:59:53] @notify Damianz
[19:59:53] This user is now online in #wikimedia-tech so I will let you know when they show some activity (talk etc)
[20:00:26] * Damianz looks around
[20:00:36] yay
[20:00:44] you know we wanted to move cbot
[20:00:46] to bsql
[20:00:47] :D
[20:00:54] what about doing it now?
:P
[20:00:54] mhm
[20:01:06] I was going to do it when I finish lunch
[20:01:10] sure
[20:01:20] you know it's 20
[20:01:24] late for lunch :P
[20:01:24] mhm
[20:01:30] I can help you :D
[20:01:32] got up at like 1
[20:01:36] with lunch and bot
[20:01:37] :D
[20:03:56] Damianz is it possible to copy db, and launch the bot on like 20 minutes old snapshot without problems? or does it need to be synced?
[20:04:05] because I would like to do it online
[20:04:12] without having to put cb offline
[20:04:15] no
[20:04:23] it uses auto increment for revert ids
[20:04:24] meh
[20:04:29] ok
[20:04:41] it may take a while for us to export / import it
[20:06:03] lol you know there is not a single session in db now?
[20:06:05] Reedy: hi i have seen you deployed E:RSS to mediawiki.org .
[20:06:07] of your bo
[20:06:08] t
[20:06:44] Reedy: pls see http://www.mediawiki.org/wiki/Rss#Example
[20:07:14] and pls add http://blog.wikimedia.org/feed/ to the whitelist - at least, I suggest to do this
[20:08:04] petan: Yeah I stopped the bot... core and relay don't touch the db
[20:08:11] ok
[20:08:13] It's doing a dump/import now...
[20:08:28] oh right...
[20:08:31] how big is that db?
[20:09:13] 710 threads lol
[20:09:14] :D
[20:09:35] it's not that big
[20:09:39] ok
[20:09:46] only like a million rows of data
[20:09:55] beetstra's db was like 2 days of import :D
[20:09:58] no idea why
[20:10:21] We don't have that many indexes either... and the indexes are all ids anyway so numbers
[20:11:08] I'd love to re-write a chunk of this and make a wholy shit awesome interface for full life cycle management... but I don't want to go behind the C's back as it's still technically their bot
[20:11:36] who's C
[20:11:46] Cobi?
[20:12:03] Cobi/Crispy
[20:13:42] aha
[20:15:13] The bot pretty much works fine (apart from when it OOMs the box and when the TS is too far out of sync so the api call fails)... but the interfaces suck, like review uses more bw than we have and report just sucks...
would love to make it queue based so everything is checked in real time when possible and a few seconds/min later if not... most humans catch the mins later bit but a lot of it isn't
[20:19:49] !log created -bnr2 feel free to use, it though it's a part of pre-setup for grid... (grid of 1 box really suck)
[20:19:50] created is not a valid project.
[20:19:58] !log bots created -bnr2 feel free to use, it though it's a part of pre-setup for grid... (grid of 1 box really suck)
[20:19:59] Logged the message, Master
[20:20:16] lol comma :D
[20:29:35] imports are slow
[20:30:15] 2169 root 20 0 0 0 0 S 4 0.0 1:12.12 [btrfs-endio-wri]
[20:30:18] lol
[20:36:46] * Damianz is not convinced by the speed of petan's new server
[20:37:01] it's not really mine lol
[20:37:09] you know IO was never good on labs
[20:37:24] also until I boot kernel 3.8 you can't expect extra performance from btrfs
[20:37:48] 25% iowait
[20:37:57] * Damianz wishes a slow and painful death on someone
[20:37:58] I don't think it's because of btrfs
[20:38:11] it's just a slow hardware
[20:38:25] meh, hardware is the same speed
[20:38:28] :D
[20:38:35] yes but it's slow on all servers
[20:38:36] it's what you do with it that makes it slow or fast
[20:39:00] I'd like to make it faster but that requires reboot
[20:39:01] :/
[20:39:33] I don't think we could have any faster server - no matter of instance size, the IO is same slow
[20:39:42] my import hasn't even got to reporting the first progress yet :(
[20:39:47] it's only like 180mb
[20:40:18] Ryan_Lane: Can haz ssds and real hardware for databases?
[20:40:38] Damianz can but with no direct access :(
[20:40:38] 512kB 0:02:03 [ 0B/s] [> ] 0% ETA 12:16:04
[20:40:41] avg-cpu: %user %nice %system %iowait %steal %idle
[20:40:42] Is just funny
[20:40:44] 0.48 0.00 0.70 22.81 0.00 76.01
[20:40:49] lol
[20:41:25] I guess I'll go back to writing the case for using puppet and saltstack at work while this imports
[20:41:35] heh
[20:41:46] that's why i wanted to make it online
[20:42:32] Well I was going to do all the db servers online - take the dump, import it, slave off it, then like 5 seconds downtime while replication caught up and ip swap
[20:42:35] But labs makes that hard
[20:42:48] mhm
[20:45:40] I guess about 2 hours before someone posts 'is cb down' :D
[20:48:35] heh :P
[20:48:40] people like it being down
[20:48:47] they can collect more score
[20:49:04] meh
[20:49:16] humans should embrace bots
[20:49:23] :P
[20:49:42] I'd rather everyone use something like Stiki and pile the info into a database that's freely open for people to use with tools and bots.
[20:49:48] The world would be a much better place
[20:50:40] http://www.commitlogsfromlastnight.com/
[20:50:56] I bet I can find some mine :D
[20:53:44] lol
[20:54:22] I need to clean up my gh repos and make the code pretty... think I'm gonna re-jiggle my cv to be around that as it's the direction I'd prefer to head
[21:05:18] 1.5MB 0:22:12 [ 867B/s] [> ] 0% ETA 44:02:14
[21:05:21] 44hours!?!?!?
[21:05:30] * Damianz might restart this in screen now
[21:11:12] Hello all!
[21:11:57] ...coming back to my "Read-only file system" issue on bots4... has somebody any idea about this?
[21:18:33] ...nobody? no idea? ...
[21:22:02] Where are you getting a read only fs message?
[21:22:22] Ryan_Lane: have you time for a code-review on my two new OpenID patches, I moved some code to a new static function showOpenIDIdentifier() - hope, that this is ok
[21:22:51] Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/52776/11/SpecialOpenID.body.php
[21:23:27] nothing serious.
but this code is neede in three places
[21:23:33] needed
[21:23:46] Damianz: when trying to "svn up /data/project/DrTrigonBot/"
[21:27:04] Hmm... it's not readonly
[21:27:58] ...so what have I done wrong?
[21:29:11] hmm
[21:29:15] * Damianz jabs Ryan_Lane in the ribs
[21:29:47] It seems that your directory thinks it's read only... the rest of that fs isn't read only... smells like the filesystem is being fucky
[21:30:40] Do not know if it is related but bastion (not bots4) reported "*** System restart required ***" on login...
[21:31:07] hmm
[21:31:09] ok there's 10 folders
[21:31:24] that shouldn't be related.. I'll reboot it anyway as that will probably fix this issue
[21:31:54] !log bots rebooted bots-4 weird read only fs issues (for 15 folders) and has updates
[21:31:56] Logged the message, Master
[21:32:45] Damianz: ...so I'll stay tuned and wait on your command (master ;) ...
[21:33:38] ok.. so the read only issue is fixed... but now it's getting i/o errors
[21:34:34] is Reedy a bot ? he does not answer me
[21:34:39] ;-)
[21:34:49] She's a bot.
[21:35:49] DrTrigon: So basically, gluster is pooched a little and your .svn folder is just throwing i/o errors everywhere :(
[21:36:30] * Damianz tickles andrewbogott_afk paravoid Coren and the other one he can't remember the nick of
[21:37:14] Damianz: Was this my mistake? What to do in order to solve it? How to proceed? ...
[21:37:55] DrTrigon: Nah, it's probably due to gluster crashing on the storage nodes last week... either Ryan_Lane can fix it or possibly you could remove the directory and re-create it (checkout the code again) which might fix it
[21:38:34] ...so just wait... ;))
[21:38:38] ??
[21:38:54] The other .svn dirs seem fine, just the one in the root gets i/o errors so everything fails...
not really sure how well svn takes to fluxing about with its internal dir tbh
[21:38:54] probably best
[21:39:09] everything else seems to be working, just a single dir not (which shouldn't affect running the bot, just updating it).
[21:43:08] Damianz: yes looks like the bot is able to run and log to file... so will the other issue be solved (by you or somebody else) and/or do I have to take some action too?
[21:43:42] I'll file a bug against Ryan_Lane for it, he might fix it today or maybe tomorrow or possibly Monday depending what he's doing this weekend
[21:44:51] ...ooo that's ok no rush! ;) the question was more regarding the bug report, if this is filed and will finally be solved everything is fine for me... ;))) THANKS AND GREETINGS!!
[21:49:01] if you can recreate the files, I'd recommend deleting and recreating
[21:49:06] [bz] (NEW - created by: Damian Z, priority: Unprioritized - major) [Bug 45945] Gluster sucks - https://bugzilla.wikimedia.org/show_bug.cgi?id=45945
[21:49:11] I've added quorum support, so split-brain won't occur again
[21:49:39] now things will turn read-only in failure situations
[21:50:18] wm-bot: OH HEY
[21:50:18] Hi Damianz, there is some error, I am a stupid bot and I am not intelligent enough to hold a conversation with you :-)
[21:50:32] Ah - that explains the read-onlyness rather than failover
[21:50:54] oh. it's doing that ow?
[21:50:55] *now
[21:51:12] yes
[21:51:14] where?
[21:51:15] Ryan_Lane: so I'll try to recreate them... ;))
[21:51:19] bots-4
[21:51:29] /data/project/DrTrigonBot
[21:51:36] Ryan_Lane: or shall I wait...?
[21:51:37] ah.
I wonder if that's due to there already being a split-brain
[21:51:37] pywikipedia/.svn
[21:51:42] DrTrigon: go for it
[21:52:03] I rebooted the instance like 5min ago so it's a clean mount - dmesg was reporting rpc lock failures though (probably keys since that's normally nfs)
[21:52:13] that's nfs
[21:52:25] no need to lock on a read-only filesystem ;)
[21:52:38] Ryan_Lane: I mean if it is not needed I would prefer not doing so...
[21:52:39] And that is how the world ends for databases.
[21:53:05] DrTrigon: well, it's pretty time-consuming to fix split-brains
[21:53:17] Ryan_Lane: Also - is local storage slow (like 20-35% iowait, 1.34kB/s slow)?
[21:53:18] ....ok ... I see... ;))
[21:53:22] Or should I blame petan?
[21:53:24] if it's just pulling it again from svn, it's likely easier
[21:53:52] Yeah I have to recreate all symlinks which I do not remember right now actually...
[21:54:16] ln -s /data/project/DrTrigonBot/pywikipedia/logs/ trunk ?
[21:54:46] I think there were more...
[21:54:50] symlinks?
[21:54:55] why the use of symlinks?
[21:55:04] To make fixing splitbrains harder :)
[21:55:10] Damianz: https://ganglia.wikimedia.org/latest/?c=Virtualization%20cluster%20pmtpa&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[21:55:23] no real wait-io
[21:55:23] ganglia works? awesome
[21:55:24] to have the logs accessible from web...
[21:55:29] there's a bit on virt8
[21:55:32] DrTrigon: ah
[21:56:02] If you delete/re-checkout that dir the symlinks should be fine since they should be in public_html a dir above
[21:56:07] unless you're symlinking to a symlink
[21:56:40] I wonder if this is btrfs... because 180mb at 1.15kB/s (according to pv) is like 2 and a bit days =\ was much faster on the other mysql instance
[21:56:54] maybe?
[21:57:04] I haven't tried btrfs in labs
[21:57:21] * Damianz looks at petan
[21:58:08] I think 'Gluster sucks' was a perfect summary
[21:58:16] * Damianz looks at mzmcbride
[22:20:36] Damianz: How to delete if there are IO errors??
[22:21:05] Try and delete the parent, atm it just thinks the meta data is missing
[22:22:33] Damianz: ...and if this does not work?
[22:24:21] cry
[22:25:25] Damianz: I mean if i try "rm -rf ..." then this will not work either since the IO errors are in the sub-dirs...
[22:25:50] 1sec
[22:28:50] Hmm yeah this is kinda foobar'd
[22:29:16] * Damianz wonders if Ryan_Lane can remove the files from the storage nodes so they disappear from the client
[22:29:54] Damianz: so I am crying... ;))
[22:30:13] Drink more Whisky
[22:30:45] ...depends on the exact problem... whisky might not be the best drug...
[22:30:46] ;))
[22:31:07] If Whisky doesn't solve the problem, Gin
[22:33:02] port wine solves it for sure
[22:36:14] .... I have to ask my grandma... ;))
[22:40:06] Damianz: as expected... does not work :(
[23:08:38] ...so till soon for a second try... ;) greetings!
[23:34:15] Ryan_Lane: Could you delete "/data/project/DrTrigonBot" for me, because of IO errors I can't do a "rm -rf" ...
[23:34:24] (please?)
[23:38:12] DrTrigon: I cleared the dir for you... try re-checking out now
[23:39:02] and by cleared I mean I just moved it and made a new one... since that just changes the meta data on the parent dir and doesn't need to go anywhere near the files in it... yay filesystems
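
The last message describes clearing the directory by moving it aside and creating a new one, instead of deleting its contents. A minimal sketch of that idea follows. It is not the command that was actually run on bots-4: the helper name `replace_dir` and the `.broken-<timestamp>` suffix are hypothetical, and the demo runs on a throwaway temporary directory. The point it illustrates is that a rename only changes the entry in the parent directory, so the unreadable files inside are never touched.

```python
import os
import tempfile
import time


def replace_dir(path):
    """Move `path` aside and create an empty directory in its place.

    Hypothetical helper: os.rename() only rewrites the entry in the parent
    directory, so files inside `path` that throw I/O errors are never read.
    Returns the location the old directory was moved to.
    """
    path = os.path.abspath(path)
    aside = "%s.broken-%d" % (path, int(time.time()))
    os.rename(path, aside)  # metadata-only change on the parent directory
    os.mkdir(path)          # fresh, empty directory for a new checkout
    return aside


if __name__ == "__main__":
    # Demonstrate on a temporary directory, not on a real project path.
    root = tempfile.mkdtemp()
    project = os.path.join(root, "DrTrigonBot")
    os.makedirs(os.path.join(project, ".svn"))
    moved = replace_dir(project)
    print("old contents kept at:", moved)
    print("new directory is empty:", os.listdir(project) == [])
```

After a step like this, a fresh checkout into the new empty directory would be the natural follow-up, and the moved-aside copy can be dealt with later on the storage side.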