[00:03:43] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 157 processes
[00:08:42] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 149 processes
[00:37:32] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 21% free memory
[00:41:42] PROBLEM Free ram is now: WARNING on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: Warning: 15% free memory
[00:41:53] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[00:50:34] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[00:56:34] RECOVERY Free ram is now: OK on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: OK: 23% free memory
[01:09:52] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[01:18:02] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output
[01:22:54] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[02:02:43] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 157 processes
[02:07:42] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 149 processes
[02:22:13] PROBLEM Total processes is now: WARNING on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS WARNING: 200 processes
[02:28:02] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed
[02:31:33] PROBLEM Disk Space is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[02:36:33] RECOVERY Disk Space is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: DISK OK
[02:52:54] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[03:48:02] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to popen() failed
[03:49:32] PROBLEM Disk Space is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed
[03:49:52] PROBLEM dpkg-check is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output
[03:51:22] PROBLEM Current Load is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
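For context: the aggregator2 errors above ("NRPE: Call to fork() failed", "Call to popen() failed", "Could not complete SSL handshake") are what Nagios reports when the NRPE agent on the monitored host is too starved of memory to even spawn a check process, which matches the 9% free RAM warnings surrounding them. A minimal sketch of how such a check is invoked by hand from the monitoring host; the plugin path follows a stock Debian/Ubuntu layout, and the remote command name "check_ram" is an assumption, not taken from this log:

    # Run the remote check manually from the monitoring host:
    # -H names the target instance, -c the command defined in its nrpe.cfg
    # ("check_ram" is a hypothetical name for the free-memory check).
    /usr/lib/nagios/plugins/check_nrpe -H 10.4.0.193 -c check_ram
    # A healthy reply matches the log lines above, e.g. "Warning: 9% free memory".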
[03:54:52] RECOVERY dpkg-check is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: All packages OK
[03:56:13] RECOVERY Current Load is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: OK - load average: 0.15, 0.32, 0.31
[04:08:02] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 10% free memory
[04:22:14] PROBLEM Total processes is now: CRITICAL on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS CRITICAL: 201 processes
[04:27:12] PROBLEM Total processes is now: WARNING on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS WARNING: 200 processes
[04:39:52] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[04:40:33] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 20% free memory
[04:44:33] PROBLEM Free ram is now: WARNING on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: Warning: 15% free memory
[04:48:33] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[04:52:53] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[04:54:43] PROBLEM Free ram is now: CRITICAL on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:59:33] PROBLEM Free ram is now: WARNING on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: Warning: 7% free memory
[06:27:15] PROBLEM Total processes is now: CRITICAL on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS CRITICAL: 203 processes
[06:29:55] PROBLEM Total processes is now: WARNING on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS WARNING: 152 processes
[06:32:44] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 155 processes
[06:48:43] PROBLEM Free ram is now: CRITICAL on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:12] PROBLEM Total processes is now: WARNING on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS WARNING: 198 processes
[06:52:42] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 150 processes
[06:53:32] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[06:59:52] RECOVERY Total processes is now: OK on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS OK: 148 processes
[07:05:42] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 151 processes
[07:35:22] PROBLEM Free ram is now: CRITICAL on build-precise1.pmtpa.wmflabs 10.4.0.173 output: Connection refused by host
[07:35:22] PROBLEM dpkg-check is now: CRITICAL on build-precise1.pmtpa.wmflabs 10.4.0.173 output: Connection refused by host
[07:35:42] PROBLEM Disk Space is now: CRITICAL on build-precise1.pmtpa.wmflabs 10.4.0.173 output: Connection refused by host
[07:36:12] PROBLEM SSH is now: CRITICAL on build-precise1.pmtpa.wmflabs 10.4.0.173 output: Connection refused
[07:36:32] PROBLEM Current Load is now: CRITICAL on build-precise1.pmtpa.wmflabs 10.4.0.173 output: Connection refused by host
[07:36:52] PROBLEM Total processes is now: CRITICAL on build-precise1.pmtpa.wmflabs 10.4.0.173 output: Connection refused by host
[07:40:23] RECOVERY Free ram is now: OK on build-precise1.pmtpa.wmflabs 10.4.0.173 output: OK: 91% free memory
[07:40:23] RECOVERY dpkg-check is now: OK on build-precise1.pmtpa.wmflabs 10.4.0.173 output: All packages OK
[07:40:43] RECOVERY Disk Space is now: OK on build-precise1.pmtpa.wmflabs 10.4.0.173 output: DISK OK
[07:41:13] RECOVERY SSH is now: OK on build-precise1.pmtpa.wmflabs 10.4.0.173 output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[07:41:33] RECOVERY Current Load is now: OK on build-precise1.pmtpa.wmflabs 10.4.0.173 output: OK - load average: 0.02, 0.06, 0.04
[07:41:53] RECOVERY Total processes is now: OK on build-precise1.pmtpa.wmflabs 10.4.0.173 output: PROCS OK: 92 processes
[08:37:53] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[08:38:33] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 21% free memory
[08:50:53] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[09:06:32] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[09:13:22] !tunnel
[09:13:22] ssh -f user@bastion.wmflabs.org -L :server: -N Example for sftp "ssh chewbacca@bastion.wmflabs.org -L 6000:bots-1:22 -N" will open bots-1:22 as localhost:6000
[09:17:00] pff
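For context: the !tunnel factoid above describes SSH local port forwarding through the labs bastion. A minimal sketch using the factoid's own example values; substitute your own username, local port, and target instance:

    # Expose bots-1's SSH port (22) on localhost:6000 via the bastion.
    # -L maps localport:target:targetport; -N opens no remote shell;
    # adding -f would background the tunnel once it is established.
    ssh chewbacca@bastion.wmflabs.org -L 6000:bots-1:22 -N
    # In another terminal, use the tunnel, e.g. for sftp:
    sftp -P 6000 chewbacca@localhost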
[09:18:52] !log deployment-prep Beta is broken in some random and creative ways AGAIN. /home on bastion is corrupted, some instances do not let us connect anymore, apache docroot disappeared.
[09:18:53] Logged the message, Master
[09:18:55] I am fed up with beta
[09:24:33] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 151 processes
[09:24:46] !log deployment-prep upgrading / rebooting all instances
[09:24:47] Logged the message, Master
[09:25:33] PROBLEM dpkg-check is now: CRITICAL on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: DPKG CRITICAL dpkg reports broken packages
[09:25:43] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 150 processes
[09:25:53] PROBLEM dpkg-check is now: CRITICAL on deployment-varnish-t3.pmtpa.wmflabs 10.4.1.83 output: DPKG CRITICAL dpkg reports broken packages
[09:26:33] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner08.pmtpa.wmflabs 10.4.1.30 output: Connection refused by host
[09:26:43] PROBLEM dpkg-check is now: CRITICAL on deployment-bastion.pmtpa.wmflabs 10.4.0.58 output: DPKG CRITICAL dpkg reports broken packages
[09:27:24] PROBLEM dpkg-check is now: CRITICAL on deployment-cache-mobile01.pmtpa.wmflabs 10.4.1.82 output: DPKG CRITICAL dpkg reports broken packages
[09:29:33] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 137 processes
[09:30:34] PROBLEM dpkg-check is now: CRITICAL on deployment-cache-bits03.pmtpa.wmflabs 10.4.0.51 output: DPKG CRITICAL dpkg reports broken packages
[09:30:54] RECOVERY dpkg-check is now: OK on deployment-varnish-t3.pmtpa.wmflabs 10.4.1.83 output: All packages OK
[09:31:34] RECOVERY dpkg-check is now: OK on deployment-jobrunner08.pmtpa.wmflabs 10.4.1.30 output: All packages OK
[09:31:44] RECOVERY dpkg-check is now: OK on deployment-bastion.pmtpa.wmflabs 10.4.0.58 output: All packages OK
[09:31:44] PROBLEM dpkg-check is now: CRITICAL on deployment-cache-upload04.pmtpa.wmflabs 10.4.0.220 output: DPKG CRITICAL dpkg reports broken packages
[09:40:32] RECOVERY dpkg-check is now: OK on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: All packages OK
[09:42:22] RECOVERY dpkg-check is now: OK on deployment-cache-mobile01.pmtpa.wmflabs 10.4.1.82 output: All packages OK
[10:54:36] !log bots petrb: installing python-twisted and php5-curl
[10:54:38] Logged the message, Master
[10:54:43] !log bots petrb: on bnr1
[10:54:44] Logged the message, Master
[10:55:43] WTF
[11:02:49] :O
[11:03:58] !log bots root: because of gluster failure on bots-bnr1 it's needed to kill all processes that access the fs, which is gifti 10091 f.c.m (gifti)java valhallasw 13912 ..c.. (valhallasw)bash valhallasw 14202 ..c.. (valhallasw)file killing them now
[11:03:59] Logged the message, Master
[11:05:12] !log bots root: bots-bnr1 - process 14202 is stuck waiting for IO, unable to kill it, machine needs to be rebooted
[11:05:14] Logged the message, Master
[11:05:34] !log bots root: rebooting bots-bnr1
[11:05:36] Logged the message, Master
[11:09:06] :D
[11:09:14] it's totally fucked
[11:09:18] :/
[11:09:25] that gluster suck
[11:12:30] i think gluster likes having special moments
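For context: the 11:03 !log above pastes fuser-style output (USER PID ACCESS COMMAND) while hunting down the processes keeping the failed Gluster volume busy. A minimal sketch of that procedure; the mount point /data/project is an assumption. As the 11:05 log shows, a process stuck in uninterruptible I/O wait survives even SIGKILL, at which point rebooting is the only way out:

    # List processes holding files open on the mount:
    fuser -vm /data/project
    # Kill them all (-k sends SIGKILL by default):
    fuser -km /data/project
    # Anything left in D state (uninterruptible I/O wait) cannot be killed:
    ps -eo pid,stat,comm | awk '$2 ~ /D/'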
[11:31:35] * Ryan_Lane grumbles
[11:31:57] morning Ryan_Lane
[11:32:06] labs are kind of fucked
[11:32:08] I'm fixing gluster as we speak
[11:32:10] yep
[11:32:22] well, projects that rely on gluster currently are
[11:32:25] ok
[11:32:37] I'm pretty sure I know the fix and it shouldn't take an amazingly long time to fix it
[11:32:51] I'm also taking this opportunity to upgrade the gluster boxes to precise
[11:33:09] petan: hi!
[11:33:12] hi
[11:33:15] the gluster people don't really like supporting lucid
[11:33:34] python-twisted doesn't seem to be working on bots-bnr1
[11:34:07] rschen7754 there is more that doesn't work on that box
[11:34:12] ok
[11:34:16] let me check it all
[11:34:23] problems related to gluster
[11:34:33] oh, ok
[11:34:36] all my .aptitude stuff in $HOME was fucked
[11:35:13] rschen7754 what is wrong with it
[11:35:23] it's acting like it's not installed
[11:35:48] !log bots root: reinstalling python-twisted on bots-bnr1
[11:35:50] Logged the message, Master
[11:38:17] ok, try now
[11:41:19] Connection to bots-nr1.pmtpa.wmflabs closed.
[11:41:44] Unable to create and initialize directory '/home/rschen7754'.
[11:41:55] gluster issues
[11:42:00] I'm working on it now
[11:42:09] i can try again later… trying to get an assignment done
[11:42:11] thanks!
[11:42:18] should hopefully be fixed in a few hours
[11:42:25] (really hopefully before then)
[11:45:13] rschen7754 did you have problems on bnr1 or nr1
[11:45:13] though train wifi is an awesome thing, it's not very good for latency :D
[11:45:30] bots-nr1
[11:45:34] oh
[11:45:37] you said bnr
[11:45:41] no more memcached segfaults since downgrading
[11:45:43] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 165 processes
[11:45:44] there is no python-twisted on nr1
[11:47:33] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 153 processes
[11:48:23] !log wikidata-dev wikidata-dev-9: Making cronjob restart memcached every night. Often a simple restart helps to get rid of the lock error message.
[11:48:25] Logged the message, Master
[11:49:33] oh, oops...
[11:50:22] Silke_WMDE_: so, you'll be helping organize tool labs, right?
[11:50:23] i think i better take a look when i'm more awake… sorry about that
[11:50:36] Ryan_Lane: Yes.
[11:50:47] \o/
[11:50:52] :)
[11:51:12] Silke_WMDE_: should we get a meeting scheduled between you, I and sumana?
[11:51:34] Yes, that would be cool!
[11:51:48] are you going to be doing project management?
[11:52:00] yes
[11:52:03] we could probably discuss bots right now
[11:52:23] there's quite a few immediate tasks that should be prioritized in that project
[11:52:39] I don't know a lot about bots yet
[11:52:55] I've been organizing tasks on the project page
[11:53:04] well, labs project page, not bots
[11:53:19] petan: got it going, sorry for the confusion
[11:53:22] though it may make sense to have project tasks listed on project pages?
[11:53:31] !resource bots
[11:53:31] https://labsconsole.wikimedia.org/wiki/Nova_Resource:bots
[11:53:38] thx
[11:53:42] it's 4 am here and i didn't realize bnr-1 and nr-1 weren't the same :P
[11:54:02] * Silke_WMDE_ is reading
[11:54:05] I'm not totally sure where it would be best to organize stuff
[11:54:21] labsconsole makes a lot of sense for a lot of reasons, but it's not historically where we do so
[11:54:22] RECOVERY Current Load is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: OK - load average: 0.20, 0.20, 0.08
[11:54:22] RECOVERY Disk Space is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: DISK OK
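For context: the 11:48 !log above sets up a nightly memcached restart on wikidata-dev-9 to clear a recurring lock error. A minimal sketch of such a cron entry; the 04:00 schedule and the file location are assumptions:

    # /etc/cron.d/restart-memcached (hypothetical) -- restart nightly at 04:00.
    # Fields: minute hour day-of-month month day-of-week user command
    0 4 * * * root /usr/sbin/service memcached restart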
[11:54:58] where was it historically? mediawiki.org? meta?
[11:55:08] mediawiki.org
[11:55:13] we stopped using meta a long time ago
[11:55:18] ok
[11:55:33] mediawiki.org isn't really a great place for project docs either :D
[11:55:37] * Ryan_Lane shrugs
[11:55:42] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 97 processes
[11:55:52] In my opinion labsconsole is a good place. Could be linked on mediawiki.org just in case people look for it there.
[11:55:54] we have too many wikis
[11:55:55] docs.wikimedia.org would be cool :D
[11:56:04] petan: yeah
[11:56:15] merging wikitech and labsconsole and using the wikitech name would work too
[11:56:23] or that
[11:56:41] there's plans for it, but it always gets deprioritized
[11:56:43] we really badly need to hire people
[11:56:44] :(
[11:56:51] hire for what :D
[11:56:56] ops
[11:56:58] I have lot of time over weekends :D
[11:57:10] heh
[11:57:16] you should apply for a full time position ;)
[11:57:24] we do have contractor positions available, though
[11:59:00] mm full time... I can imagine that even if I am unsure if I meet all the requirements, I will see how the project I am employed by now will continue :D
[11:59:24] Silke_WMDE_: I'm pretty sure the tool labs contractor position is close to being filled
[11:59:29] at least I really want to learn puppet before thinking of working for wmf
[11:59:44] petan: work with Damianz on the bots puppetization
[11:59:47] Ryan_Lane: yay
[12:00:23] I think I'm going to try to merge his change in next week. I've been absent in bots long enough, I guess :D
[12:00:42] talking about puppet: The workflow for people who have a bot is not clear to me, yet. Is every bot author expected to deal with puppet to run a puppetized labs instance?
[12:01:31] I'd prefer to have a community of people who would help puppetize things for any project
[12:01:49] but realistically I expect the tool labs contractor to do this
[12:01:58] I see.
[12:02:07] we can't expect bot authors to use puppet
[12:02:18] yeah, it's quite a ...
[12:02:23] *dictionary*
[12:02:31] hurdle
[12:02:42] it needs to be as simple as possible
[12:03:04] right now it requires too much effort for everyone
[12:03:09] ssh bots-4
[12:03:17] >.<
[12:03:21] it also puts quite a burden on petan and Damianz, I'm sure
[12:03:57] But basically: Would everyone running a bot/moving a bot from toolserver to labs have instances like those we have for Wikidata?
[12:04:08] Ryan_Lane: Needs a few tweaks, just not had time
[12:04:28] Really all the servers need re-deploying to that puppet standard with omg ram
[12:04:38] nah, I think that's too difficult for people, too
[12:04:55] I'd prefer bots be managed from a central place and be "scheduled" on instances
[12:05:15] Like on toolserver, Would be perfect in my opinion
[12:05:31] indeed. it would be nice if it was similar to toolserver
[12:05:40] yes
[12:05:45] different bots have different use cases
[12:05:52] Damianz: yep
[12:05:58] some will have their own instance
[12:06:00] Ryan_Lane: Don't even tempt me to apply for a full time position, I'd drive you crazy :P
[12:06:01] and that's fine
[12:06:12] Damianz: hey, I told you ages ago to ;)
[12:06:27] you just got a new job, though
[12:06:36] Yeah :(
[12:06:46] A bunch of clean puppeted instances with noroot, a 'submit' server and PBS
[12:07:00] PBS?
[12:07:15] yeah, what's a PBS?
[12:07:19] addshore some of bots are daemon like
[12:07:23] running all time
[12:07:27] like wm-bot
[12:07:34] PBS is of no use there
[12:07:34] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 148 processes
[12:07:36] I'm too expensive to do wmf full time anyway :P Do miss contract work somewhat though, this would be a cool project to work on... oh well, I've got crazy enterprise stuff to fix.
[12:07:40] true
[12:07:47] + these MUST be always on same instance
[12:07:51] Damianz: heh
[12:08:01] PBS (portable batch system) http://en.wikipedia.org/wiki/Portable_Batch_System
[12:08:10] yeah
[12:08:36] petan why must they always run on the same instance though?
[12:08:53] ok labstore4 wtf are you doing :(
[12:08:54] because for example wm-bot consist of multiple processes that are connected to each other
[12:09:06] all these processes need to run on 1 machine
[12:09:21] I'd love to look at using zeromq for bots usage reporting/management
[12:09:28] hmm, fair, I imagine there is some way that you can pick an instance though, not looked into it
[12:10:06] there's a number of ways we can handle job running
[12:10:28] for sure this is a job for the tool labs contractor
[12:10:56] Shame every time you hire a labs guy they become a 'fix ceph, fix varnish, fix this, fix that' in production guy
[12:11:01] We still love paravoid though
[12:11:02] I am still looking for the big picture. Some bots did already migrate, other probably could. I don't know how many bots require db replication. (Actually, I don't have an overview over the variety of existing bots.)
[12:11:03] * Ryan_Lane sighs
[12:11:14] Damianz: yeah. well, production gets priority
[12:11:56] I think a big plus to bots would be DB replication, i know a fair few people that would have migrated if it was already in place
[12:12:05] booo, short team benefits
[12:12:17] addshore: yes, that's supposed to be happening this month
[12:12:23] :)
[12:12:35] DB replication has really been blocked because magg... thingy has been busy doing prod stuff, but now a replica should be easy
[12:12:50] In theory it should be better than labs, it's just another slave for switches etc rather than out-of-team
[12:12:55] honestly until we stabilize gluster we're not ready for mass movement
[12:13:10] it's a major blocker right now
[12:15:00] will Gluster ever work faster than now?
[12:15:07] yes
[12:15:22] in fact, I hope to make it 3x faster this week or next
[12:15:39] --nodelay?:P
[12:15:50] I'd do it now if I wasn't scared we'd saturate our network node
[12:16:03] MaxSem: switch the instances to using virtio for networking
[12:16:21] it's still going to be slow, but it should hopefully at least be bareable
[12:16:46] "bareable":)
[12:17:06] all distributed network filesystems are slow
[12:21:44] Ryan_Lane: What are good next steps that can be done from my side for bots?
[12:22:07] Do we have a list of bots?
[12:22:14] Silke_WMDE_: we should start entering bugs and organizing them into projects
[12:22:31] only what's documented on the resource page
[12:22:37] ok
[12:23:01] I guess we really need the contractor before we can start
[12:23:36] * Silke_WMDE_ is looking at a list on the toolserver wiki
[12:23:40] you could have a look through users crontabs to see what bots are running?
[12:24:22] https://wiki.toolserver.org/view/List_of_Wikimedia_bots and https://wiki.toolserver.org/view/IRC_bots
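For context: PBS, linked above at 12:08, is a batch scheduler, i.e. the "submit server" model Damianz proposes for one-shot bot runs, as opposed to always-on daemons like wm-bot whose processes must stay together on one machine. A minimal sketch of a PBS job script and its submission; the resource values and file names are hypothetical:

    #!/bin/bash
    #PBS -N mybot              # job name (hypothetical)
    #PBS -l nodes=1            # one node is enough for a typical bot run
    #PBS -l mem=256mb
    #PBS -o mybot.out
    #PBS -e mybot.err
    cd "$PBS_O_WORKDIR"        # PBS starts jobs in $HOME by default
    python mybot.py

    # Submitted and inspected from the submit host:
    #   qsub mybot.pbs
    #   qstat            # list queued/running jobs
    #   qdel <job_id>    # cancel one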
[12:26:36] Ryan_Lane: For a meeting with Sumana - what times would be good for you?
[12:26:55] (and why are you awake at all? ;P )
[12:27:12] probably easiest is morning in SF
[12:27:17] Silke_WMDE_: I'm in europe
[12:27:25] oh!
[12:27:26] fosdem?
[12:27:47] yep
[12:27:56] and I'm in paris for a couple of days
[12:28:02] cool!
[12:28:02] I'm riding the train there now, in fact
[12:28:40] And Sumana is in NY, right?
[12:29:30] yep
[12:29:52] sumanah, were you summoned by Silke_WMDE_ ?
[12:29:57] :)
[12:30:09] hi Platonides - no! Should I have been? :)
[12:30:13] LOL
[12:30:14] happy Monday Platonides
[12:30:15] hi sumanah
[12:30:21] to you too, sumanah
[12:30:26] Hello Silke_WMDE_!
[12:30:43] Ryan_Lane, as the matter of fact I also have questions related to tools lab
[12:30:44] Ryan_Lane and I were talking about scheduling a meeting
[12:30:53] MaxSem: sure, what's up?
[12:31:32] if I understood it right, there will be a native (as in non-virtualized) shared copy of Wikipedia DB?
[12:31:43] yes
[12:31:51] cool
[12:33:03] that's planned for this month
[12:33:18] Silke_WMDE_: cool, yes, meeting, sounds good
[12:33:30] are there plans for other hi-performance services? several people who do mapping-related stuff are fearing that they will not be able to migrate from Toolserver as they require a copy of OSM
[12:33:38] I have a few emails I aim to respond to today and yours is one (or 5) of them
[12:34:00] MaxSem, also mention that they use postgreSQL
[12:34:18] yup, OSM = PG + PostGIS
[12:34:33] MaxSem: OSM will be in production, not labs
[12:34:47] sumanah Ryan_Lane Something like 10 a.m. in SF, 1 p.m. in NY and 7 p.m. in Berlin.
[12:34:47] I know;)
[12:35:01] but Toolserver currently hosts its own copy
[12:35:05] Ryan_Lane: When will you be back to work/to SF?
[12:35:15] I fly back on the 8th
[12:35:19] which requires more storage than our instances can provide (gluster is outta the question) and requires some performance
[12:35:44] MaxSem: right, so labs users will use the production version
[12:36:15] Ryan_Lane, so if we will be able to give them this access that would be awesome
[12:36:33] it's just using tiles, right?
[12:36:51] no, I'm talking about Postgres access
[12:37:10] we'll have to look at replicated databases for that
[12:37:38] rendering tiles on TS/Labs is insane enough that nobody's doing it, apparently:)
[12:37:39] Silke_WMDE_: that time of day sounds good to me!
[12:40:23] PROBLEM Total processes is now: WARNING on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: PROCS WARNING: 184 processes
[12:40:53] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[12:41:24] !log deployment-prep -dbdump : stopping udp2log, starting udp2log-mw
[12:41:26] Logged the message, Master
[12:41:35] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 21% free memory
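For context: "OSM = PG + PostGIS" above means that hosting OpenStreetMap data amounts to a PostgreSQL database with the PostGIS spatial extension, loaded from an OSM dump; that storage and I/O load is why it stays in production rather than on Gluster-backed instances. A minimal sketch of what such a setup involves; the database name and dump file are hypothetical, and using osm2pgsql as the loader is an assumption:

    # Create a spatially-enabled database and import an OSM extract:
    createdb osm
    psql -d osm -c "CREATE EXTENSION postgis;"
    osm2pgsql -d osm planet.osm.pbf   # a full-planet import takes many hours and hundreds of GB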
[12:45:06] is there a tool labs spec somewhere for looking/adding suggestion?
[12:46:31] there's a couple pages that DaB started
[12:46:40] something like "features needed in tool labs"
[12:46:48] and "features wanted in tool labs"
[12:46:54] silly to have two pages, I think
[12:47:43] Ryan_Lane: I agree that we don't need to have 2 pages
[12:47:43] * hashar sends Ryan_Lane to bed
[12:47:52] wait MaxSem, I'll send them
[12:47:55] hashar: I'm in europe ;)
[12:47:59] oh
[12:48:05] hashar: wait, where is he? the middle of the Atlantic?
[12:48:24] http://www.mediawiki.org/wiki/Wikimedia_Labs/Toolserver_features_wanted and http://www.mediawiki.org/wiki/Wikimedia_Labs/Toolserver_features_needed_in_Tool_Labs
[12:48:40] Silke_WMDE_: if you want to do the honors of combining those into 1 page, that would be great
[12:48:49] hehe
[12:48:53] OK!
[12:49:19] More than great
[12:49:34] some organization to the pages would be good too
[12:49:35] Silke_WMDE_, thanks
[12:49:58] Also, I'll send a mail to toolserver-l with a call for additions (though Dab said it should be complete)
[12:49:59] also please go ahead & update the references on https://www.mediawiki.org/wiki/Wikimedia_Labs
[12:50:00] maybe bugs added and listed with it :)
[12:50:22] RECOVERY Total processes is now: OK on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: PROCS OK: 150 processes
[12:52:59] !log deployment-prep rebasing /srv/deployment/mediawiki/common
[12:53:01] Logged the message, Master
[12:53:08] Change on mediawiki a page Wikimedia Labs/Toolserver features wanted in Tool Labs was modified, changed by MaxSem link https://www.mediawiki.org/w/index.php?diff=640492 edit summary: [+293] /* Labs wide (not only bots / tools), but available for all projects */ +cmt
[12:54:32] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[12:55:00] ok. back on in a bit. train is arriving
[12:55:06] enjoy paris!
[12:55:32] Silke_WMDE_: thanks :)
[12:57:34] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 154 processes
[12:58:54] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[13:00:32] Change on mediawiki a page Wikimedia Labs/Toolserver features needed in Tool Labs was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=640497 edit summary: [+18422] Combined features needed and features wanted to one single page
[13:00:43] !log deployment-prep rebooting apache32 (locked / can't login)
[13:00:43] Logged the message, Master
[13:01:13] MaxSem: I united the content of both pages now.
[13:01:21] cheers
[13:01:27] If you add more stuff, please use http://www.mediawiki.org/wiki/Wikimedia_Labs/Toolserver_features_needed_in_Tool_Labs
[13:01:27] Thank you Silke_WMDE_
[13:01:38] Silke_WMDE_: mind if I also send a note about that to the Labs mailing list?
[13:02:23] Change on mediawiki a page Wikimedia Labs/Toolserver features wanted in Tool Labs was modified, changed by MaxSem link https://www.mediawiki.org/w/index.php?diff=640498 edit summary: [-18378] Redirected page to [[Wikimedia Labs/Toolserver features needed in Tool Labs]]
[13:02:24] Go ahead!
[13:03:59] !log deployment-prep REVERTED GIT-DEPLOY!!!!! rm /data/project/apache/common-local (symlink) and restored backup: mv /data/project/apache/common-local.pre-git-deploy /data/project/apache/common-local
[13:04:01] Logged the message, Master
[13:04:59] thanks for the redirect!
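For context: the 13:03 !log above records the git-deploy revert in prose; written out as the shell commands it describes (paths exactly as given in the log entry):

    # Drop the git-deploy symlink and restore the backed-up docroot:
    rm /data/project/apache/common-local
    mv /data/project/apache/common-local.pre-git-deploy /data/project/apache/common-local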
[13:05:26] !log deployment-prep refreshing /home/wikipedia/common from latest master (no more newdeploy branch)
[13:05:28] Logged the message, Master
[13:06:32] !log deployment-prep multiversion/refreshWikiversionsCDB
[13:06:34] Logged the message, Master
[13:07:21] Change on mediawiki a page Wikimedia Labs was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=640504 edit summary: [+0] /* Tool Labs */ Updated link to "Toolserver features needed/wanted in Tool Labs"
[13:07:54] !log deployment-prep starting apache2 on apache32 and apache33
[13:07:55] Logged the message, Master
[13:09:39] matthiasmullie: you will get more people to read your issue there :-]
[13:09:55] alright
[13:09:56] matthiasmullie: what are the symptoms ?
[13:10:07] mlitn@bastion1:~$ ssh ee-prototype
[13:10:08] If you are having access problems, please see: https://labsconsole.wikimedia.org/wiki/Access#Accessing_public_and_private_instances
[13:10:09] Creating directory '/home/mlitn'.
[13:10:10] Unable to create and initialize directory '/home/mlitn'.
[13:10:11] ...
[13:10:13] Connection to ee-prototype closed.
[13:10:14] mlitn@bastion1:~$ ssh ee-prototype.pmtpa.wmflabs
[13:10:38] I guess the homedir is screwed
[13:10:44] have you tried rebooting the instance ?
[13:10:55] Change on mediawiki a page Wikimedia Labs was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=640510 edit summary: [+35] /* Proposals */ updated link to Features needed/wanted
[13:11:02] I haven't, let's see
[13:12:07] of course you already managed to log in that instance haven't you?
[13:12:48] Rebooted instance 3a2e614d-6678-444e-ac7a-ff3105469fdd.
[13:12:52] still same symptoms though
[13:13:07] yes, I have been able to login to that instance in the past (last week it worked)
[13:19:13] !log deployment-prep the infamous beta auto updater is back in action on deployment-bastion
[13:19:15] Logged the message, Master
[13:24:14] !log deployment-prep manually updating extensions to make sure the beta autoupdater works properly
[13:24:16] Logged the message, Master
[13:24:20] matthiasmullie: still broken ?
[13:24:34] matthiasmullie: sometimes the console log gives some clues. Such as the LDAP not being reachable
[13:25:31] mlitn@ee-prototype:~$
[13:25:33] I'm in :)
[13:25:42] yay
[13:25:51] thanks!
[13:30:15] matthiasmullie: if in doubt, reboot :-]
[13:30:27] that worked well for me over the past year
[13:30:43] pff
[13:30:50] annnd ganglia does not work anymore in labs :(
[13:30:52] http://ganglia.wmflabs.org/latest/
[13:30:55] There was an error collecting ganglia data (127.0.0.1:8654): fsockopen error: Connection refused
[13:31:09] how did I ever get to a point where rebooting a failing system is not my first reflex :)
[13:31:52] when it still doesn't work after 2 reboots ?
[13:35:47] !log deployment-prep Applying role::memcached to apache32 and apache33
[13:35:50] Logged the message, Master
[13:36:07] I am so dumb
[13:36:10] puppet is broken, can't apply
[13:36:12] abab
[13:37:45] http://thedailywtf.com/Articles/Routers,-Routers,-Everywhere.aspx
[13:41:37] MaxSem: I think I got beta back in a good shape :-D
[13:41:46] whee
[13:42:03] MaxSem: have you made a change to mediawiki-config to support the en.wikipedia.m.beta.wmflabs.org urls ?
[13:42:11] or should I go ahead and hack it ? :D
[13:42:17] hashar, I did
[13:42:25] it should be in master
[13:42:33] however I don't see MF on beta
[13:42:36] """ The HR staff were paid near-minimum wage rates, and like time-share salesmen, they were paid primarily on commission."""
[13:42:39] big mistakes :-D
[13:42:55] then people just care about hiring "someone" :D
[13:43:05] MaxSem: ahh that is not enabled properly
[13:43:13] I was referring to "I'd reboot it again!" And if that didn't fix it? "I'd reboot again!"
[13:44:09] zoo hmm wmf-config/mobile.php has a safeguard which prevents its code from running on labs
[13:44:16] guess we want to refactor everything :-]
[13:44:26] derp
[13:44:31] I thought I removed it
[13:47:36] hashar, https://gerrit.wikimedia.org/r/47400
[13:49:01] MaxSem: you are the boss! ;-]
[13:50:40] ahh it shows up on http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version now
[13:51:38] http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Greco-Turkish_War_(1919%E2%80%931922)&mobileaction=toggle_view_mobile
[13:51:58] I'm getting Permission Denied (Public key)... anybody know what's going on?
[13:52:08] I uploaded my Pubkey to NovaKey
[13:52:08] Vacation9: have you tried rebooting your instance ?
[13:52:10] needs a localisation update
[13:52:23] hashar: Connecting to Bastion I mean
[13:52:41] MaxSem: IIRC the l10n update happens automatically after the new code has been fetched
[13:52:55] I don't think my key is updating, my known_hosts only has two keys while there are three in NovaKey
[13:52:58] I see
[13:53:16] PHP Warning: require(/mnt/srv/deployment/mediawiki/common/1.21wmf8/../wmf-config/wgConf.php): failed to open stream: No such file or directory in /data/project/apache/common-local/wmf-config/CommonSettings.php on line 149
[13:53:17] bohh
[13:54:29] So no idea what's going on? I think there may be an issue with updating ssh public keys, but not sure... anybody willing to add my key in my .ssh?
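For context: for a "Permission denied (publickey)" failure like Vacation9's above, the usual first step is to see which keys the client actually offers; note that ~/.ssh/known_hosts stores host keys, not the user keys NovaKey manages, so comparing those two counts proves nothing. A minimal sketch; the key filename and username are placeholders:

    # Verbose mode shows each identity as it is offered and rejected:
    ssh -v yourname@bastion.wmflabs.org
    # Offer one specific private key explicitly:
    ssh -i ~/.ssh/labs_key yourname@bastion.wmflabs.org
    # Compare this fingerprint against the public key uploaded to NovaKey:
    ssh-keygen -lf ~/.ssh/labs_key.pub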
[13:54:45] MaxSem: ah I found out
[13:56:22] mwversionsinuse --extended --withdb
[13:56:23] 1.21wmf8=aawiki
[13:56:24] seriously
[13:57:13] PROBLEM Free ram is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: Warning: 12% free memory
[13:57:53] PROBLEM dpkg-check is now: CRITICAL on wikidata-client-test.pmtpa.wmflabs 10.4.1.3 output: DPKG CRITICAL dpkg reports broken packages
[13:59:15] so
[13:59:31] we could have a VERY SIMPLE KISS system to figure out the versions
[13:59:39] like a simple .ini file under /etc
[13:59:50] instead we end up with half a dozen scripts
[14:02:52] RECOVERY dpkg-check is now: OK on wikidata-client-test.pmtpa.wmflabs 10.4.1.3 output: All packages OK
[14:07:45] !log deployment-prep -bastion removing role::deployment::deployment_servers::labs
[14:07:47] Logged the message, Master
[14:09:42] !log deployment-prep -bastion applying misc::deployment::scap_scripts
[14:09:44] Logged the message, Master
[14:12:03] PHP Notice: Undefined variable: urlprotocol in /data/project/apache/common-local/wmf-config/filebackend-labs.php on line 35
[14:12:03] PHP Notice: Undefined variable: urlprotocol in /data/project/apache/common-local/wmf-config/filebackend-labs.php on line 37
[14:12:05] PHP Notice: Undefined variable: urlprotocol in /data/project/apache/common-local/wmf-config/filebackend-labs.php on line 40
[14:12:06] PHP Notice: Undefined variable: urlprotocol in /data/project/apache/common-local/wmf-config/filebackend-labs.php on line 41
[14:12:07] PHP Notice: Undefined variable: wmgMFRemotePostFeedbackUsername in /data/project/apache/common-local/wmf-config/mobile.php on line 11
[14:12:08] PHP Notice: Undefined variable: wmgMFRemotePostFeedbackPassword in /data/project/apache/common-local/wmf-config/mobile.php on line 12
[14:12:11] everything is screwed up
[14:17:30] I can't remember how to update the l10n cache
[14:21:55] you may want to be careful using project storage right now
[14:22:07] PHP Warning: wfMkdirParents: failed to mkdir "/home/wikipedia/common/php-master/cache/l10n" mode 0777 in /data/project/apache/common-local/php-master/includes/GlobalFunctions.php on line 2583
[14:22:16] I am slooowly progressing :-D
[14:23:40] Rebuilding ab...
[14:23:42] ok
[14:23:44] I am done for today
[14:23:54] MaxSem: the l10n cache is rebuilding
[14:24:01] that will be completed later tonight
[14:24:02] cool
[14:24:31] !log deployment-prep Started the over long l10n cache rebuild in a screen on deployment-bastion
[14:24:33] Logged the message, Master
[14:24:36] that takes a few hours to generate :-]
[14:24:42] PROBLEM Free ram is now: CRITICAL on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: CHECK_NRPE: Socket timeout after 10 seconds.
[14:25:54] maybe I am exaggerating a bit. It already generated the second cache file
[14:25:54] :D
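For context: the l10n (localisation) cache being rebuilt above is regenerated with MediaWiki's standard rebuildLocalisationCache.php maintenance script; rebuilding every language at once is what makes it take hours. A minimal sketch; the wiki path and the thread count are assumptions:

    # From the wiki's MediaWiki root (path hypothetical for this beta setup):
    cd /data/project/apache/common-local/php-master
    # As in the 14:24 !log, run it inside screen so it survives logout:
    screen -S l10n php maintenance/rebuildLocalisationCache.php --force --threads=4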
[14:29:32] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[14:29:46] Silke_WMDE_: the organization of the features pages is already so much better now, thanks!
[14:30:09] <^demon> Oh wow, Ryan's up before me. Not used to that.
[14:30:12] :D That's the easy part.
[14:30:18] ^demon: I'm in paris
[14:30:34] <^demon> I figured somewhere in Europe :)
[14:30:59] I'm in a cool city, and yet here I am fixing a crappy filesystem
[14:31:42] hashar: you're having sudo issues ;)
[14:31:51] possibly
[14:32:04] <^demon> Only time I was in Paris, it was for like 4 days and I got terribly sick in the last day going home.
[14:32:07] the whole beta setup is a hack
[14:32:15] scripts attempt to run everything as mwdeploy
[14:32:24] and mwdeploy is supposed to be able to sudo as … mwdeploy :-]
[14:32:49] Ryan_Lane: and you came to paris without telling me ? :D
[14:32:59] hashar: I just showed up :)
[14:33:31] where are you working?
[14:33:35] hotel? :(
[14:33:38] yep
[14:33:47] I'm in the latin quarter
[14:33:56] so cliché
[14:33:58] ;-D
[14:34:07] :D
[14:34:48] you can try out a coworking place known as "la cantine"
[14:34:53] http://lacantine.org/blog/un-espace-de-coworking
[14:34:54] eating croissant, too?
[14:34:59] ;)
[14:35:13] shared space, cheap price
[14:35:24] lot of freelancer and web people there
[14:35:25] or
[14:35:34] you can go squat the Wikimedia France office :-]
[14:35:37] Silke_WMDE_: heh. no. not hungry just yet :)
[14:35:59] hashar: I may do that tomorrow
[14:36:02] I'm not in town long
[14:36:13] or grab a train and come see me haha
[14:36:26] where are you?
[14:38:11] well I am in Nantes
[14:38:19] but that is a bit far from Paris I am afraid
[14:38:24] like 2 hours and a half of train
[14:40:09] anyway this week is a bit busy after work : /
[14:41:21] * Silke_WMDE_ is going mad about a well hidden typo
[14:46:31] Silke_WMDE_: in a puppet manifest ?
[14:49:06] yep
[14:49:06] Silke_WMDE_: upload the change so we can help you :-]
[14:49:07] you should be able to install puppet on your local comp to run "puppet parser validate"
[14:49:07] my strategy is commenting out everything :p
[14:49:07] something like: sudo gem install puppet
[14:49:08] also Jenkins runs puppet parser validate on your change :-]
[14:49:08] puppet parser validate!? sounds cool!
[14:49:08] I have my editor run it after I save a .pp file :-]
[14:49:09] oh, or that one > vim-puppet - syntax highlighting for puppet manifests in vim
[14:49:09] yeah that one helps a lot
[14:49:21] Silke_WMDE_: and definitely get https://github.com/scrooloose/syntastic
[14:51:27] Silke_WMDE_: http://imgur.com/0KJLqFp
[14:51:37] I should write a doc page somewhere
[14:52:03] :D
[14:52:31] if you are curious, my vimrc is at https://github.com/hashar/alix
[14:52:47] and the two plugins in https://github.com/hashar/alix/tree/master/vim
[14:52:55] (the links point to the github repos)
[14:56:51] hashar: thx!
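For context: the exchange above recommends syntax-checking puppet manifests locally instead of hunting typos by commenting everything out. A minimal sketch of the workflow hashar describes; the manifest name is a placeholder:

    # Install puppet locally (as suggested above), then validate a manifest:
    sudo gem install puppet
    puppet parser validate manifests/site.pp
    # vim users get the same check on every save by pairing syntastic
    # (https://github.com/scrooloose/syntastic) with vim-puppet highlighting.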