[00:03:43] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 157 processes
[00:08:42] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 149 processes
[00:37:32] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 21% free memory
[00:41:42] PROBLEM Free ram is now: WARNING on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: Warning: 15% free memory
[00:41:53] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[00:50:34] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[00:56:34] RECOVERY Free ram is now: OK on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: OK: 23% free memory
[01:09:52] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[01:18:02] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output
[01:22:54] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[02:02:43] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 157 processes
[02:07:42] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 149 processes
[02:22:13] PROBLEM Total processes is now: WARNING on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS WARNING: 200 processes
[02:28:02] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed
[02:31:33] PROBLEM Disk Space is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[02:36:33] RECOVERY Disk Space is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: DISK OK
[02:52:54] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[03:48:02] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to popen() failed
[03:49:32] PROBLEM Disk Space is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed
[03:49:52] PROBLEM dpkg-check is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output
[03:51:22] PROBLEM Current Load is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
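For context: the aggregator2 errors above ("NRPE: Call to fork() failed", "Call to popen() failed", "Could not complete SSL handshake") are what Nagios reports when the NRPE agent on the monitored host is too starved of memory to even spawn a check process, which matches the 9% free RAM warnings surrounding them. A minimal sketch of how such a check is invoked by hand from the monitoring host; the plugin path follows a stock Debian/Ubuntu layout, and the remote command name "check_ram" is an assumption, not taken from this log:

    # Run the remote check manually from the monitoring host:
    # -H names the target instance, -c the command defined in its nrpe.cfg
    # ("check_ram" is a hypothetical name for the free-memory check).
    /usr/lib/nagios/plugins/check_nrpe -H 10.4.0.193 -c check_ram
    # A healthy reply matches the log lines above, e.g. "Warning: 9% free memory".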
[03:54:52] RECOVERY dpkg-check is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: All packages OK
[03:56:13] RECOVERY Current Load is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: OK - load average: 0.15, 0.32, 0.31
[04:08:02] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 10% free memory
[04:22:14] PROBLEM Total processes is now: CRITICAL on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS CRITICAL: 201 processes
[04:27:12] PROBLEM Total processes is now: WARNING on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS WARNING: 200 processes
[04:39:52] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[04:40:33] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 20% free memory
[04:44:33] PROBLEM Free ram is now: WARNING on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: Warning: 15% free memory
[04:48:33] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[04:52:53] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[04:54:43] PROBLEM Free ram is now: CRITICAL on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:59:33] PROBLEM Free ram is now: WARNING on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: Warning: 7% free memory
[06:27:15] PROBLEM Total processes is now: CRITICAL on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS CRITICAL: 203 processes
[06:29:55] PROBLEM Total processes is now: WARNING on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS WARNING: 152 processes
[06:32:44] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 155 processes
[06:48:43] PROBLEM Free ram is now: CRITICAL on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:12] PROBLEM Total processes is now: WARNING on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS WARNING: 198 processes
[06:52:42] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 150 processes
[06:53:32] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[06:59:52] RECOVERY Total processes is now: OK on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS OK: 148 processes
[07:05:42] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 151 processes
[07:35:22] PROBLEM Free ram is now: CRITICAL on build-precise1.pmtpa.wmflabs 10.4.0.173 output: Connection refused by host
[07:35:22] PROBLEM dpkg-check is now: CRITICAL on build-precise1.pmtpa.wmflabs 10.4.0.173 output: Connection refused by host
[07:35:42] PROBLEM Disk Space is now: CRITICAL on build-precise1.pmtpa.wmflabs 10.4.0.173 output: Connection refused by host
[07:36:12] PROBLEM SSH is now: CRITICAL on build-precise1.pmtpa.wmflabs 10.4.0.173 output: Connection refused
[07:36:32] PROBLEM Current Load is now: CRITICAL on build-precise1.pmtpa.wmflabs 10.4.0.173 output: Connection refused by host
[07:36:52] PROBLEM Total processes is now: CRITICAL on build-precise1.pmtpa.wmflabs 10.4.0.173 output: Connection refused by host
[07:40:23] RECOVERY Free ram is now: OK on build-precise1.pmtpa.wmflabs 10.4.0.173 output: OK: 91% free memory
[07:40:23] RECOVERY dpkg-check is now: OK on build-precise1.pmtpa.wmflabs 10.4.0.173 output: All packages OK
[07:40:43] RECOVERY Disk Space is now: OK on build-precise1.pmtpa.wmflabs 10.4.0.173 output: DISK OK
[07:41:13] RECOVERY SSH is now: OK on build-precise1.pmtpa.wmflabs 10.4.0.173 output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[07:41:33] RECOVERY Current Load is now: OK on build-precise1.pmtpa.wmflabs 10.4.0.173 output: OK - load average: 0.02, 0.06, 0.04
[07:41:53] RECOVERY Total processes is now: OK on build-precise1.pmtpa.wmflabs 10.4.0.173 output: PROCS OK: 92 processes
[08:37:53] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[08:38:33] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 21% free memory
[08:50:53] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[09:06:32] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[09:13:22] !tunnel
[09:13:22] ssh -f user@bastion.wmflabs.org -L :server: -N Example for sftp "ssh chewbacca@bastion.wmflabs.org -L 6000:bots-1:22 -N" will open bots-1:22 as localhost:6000
[09:17:00] pff
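For context: the !tunnel factoid above describes SSH local port forwarding through the labs bastion. A minimal sketch using the factoid's own example values; substitute your own username, local port, and target instance:

    # Expose bots-1's SSH port (22) on localhost:6000 via the bastion.
    # -L maps localport:target:targetport; -N opens no remote shell;
    # adding -f would background the tunnel once it is established.
    ssh chewbacca@bastion.wmflabs.org -L 6000:bots-1:22 -N
    # In another terminal, use the tunnel, e.g. for sftp:
    sftp -P 6000 chewbacca@localhost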
[09:18:52] !log deployment-prep Beta is broken in some random and creative ways AGAIN. /home on bastion is corrupted, some instances do not let us connect anymore, apache docroot disappeared.
[09:18:53] Logged the message, Master
[09:18:55] I am fed up with beta
[09:24:33] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 151 processes
[09:24:46] !log deployment-prep upgrading / rebooting all instances
[09:24:47] Logged the message, Master
[09:25:33] PROBLEM dpkg-check is now: CRITICAL on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: DPKG CRITICAL dpkg reports broken packages
[09:25:43] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 150 processes
[09:25:53] PROBLEM dpkg-check is now: CRITICAL on deployment-varnish-t3.pmtpa.wmflabs 10.4.1.83 output: DPKG CRITICAL dpkg reports broken packages
[09:26:33] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner08.pmtpa.wmflabs 10.4.1.30 output: Connection refused by host
[09:26:43] PROBLEM dpkg-check is now: CRITICAL on deployment-bastion.pmtpa.wmflabs 10.4.0.58 output: DPKG CRITICAL dpkg reports broken packages
[09:27:24] PROBLEM dpkg-check is now: CRITICAL on deployment-cache-mobile01.pmtpa.wmflabs 10.4.1.82 output: DPKG CRITICAL dpkg reports broken packages
[09:29:33] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 137 processes
[09:30:34] PROBLEM dpkg-check is now: CRITICAL on deployment-cache-bits03.pmtpa.wmflabs 10.4.0.51 output: DPKG CRITICAL dpkg reports broken packages
[09:30:54] RECOVERY dpkg-check is now: OK on deployment-varnish-t3.pmtpa.wmflabs 10.4.1.83 output: All packages OK
[09:31:34] RECOVERY dpkg-check is now: OK on deployment-jobrunner08.pmtpa.wmflabs 10.4.1.30 output: All packages OK
[09:31:44] RECOVERY dpkg-check is now: OK on deployment-bastion.pmtpa.wmflabs 10.4.0.58 output: All packages OK
[09:31:44] PROBLEM dpkg-check is now: CRITICAL on deployment-cache-upload04.pmtpa.wmflabs 10.4.0.220 output: DPKG CRITICAL dpkg reports broken packages
[09:40:32] RECOVERY dpkg-check is now: OK on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: All packages OK
[09:42:22] RECOVERY dpkg-check is now: OK on deployment-cache-mobile01.pmtpa.wmflabs 10.4.1.82 output: All packages OK
[10:54:36] !log bots petrb: installing python-twisted and php5-curl
[10:54:38] Logged the message, Master
[10:54:43] !log bots petrb: on bnr1
[10:54:44] Logged the message, Master
[10:55:43] WTF
[11:02:49] :O
[11:03:58] !log bots root: because of gluster failure on bots-bnr1 it's needed to kill all processes that access the fs, which is gifti 10091 f.c.m (gifti)java valhallasw 13912 ..c.. (valhallasw)bash valhallasw 14202 ..c.. (valhallasw)file killing them now
[11:03:59] Logged the message, Master
[11:05:12] !log bots root: bots-bnr1 - process 14202 is stuck waiting for IO, unable to kill it, machine needs to be rebooted
[11:05:14] Logged the message, Master
[11:05:34] !log bots root: rebooting bots-bnr1
[11:05:36] Logged the message, Master
[11:09:06] :D
[11:09:14] it's totally fucked
[11:09:18] :/
[11:09:25] that gluster suck
[11:12:30] i think gluster likes having special moments
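For context: the 11:03 !log above pastes fuser-style output (USER PID ACCESS COMMAND) while hunting down the processes keeping the failed Gluster volume busy. A minimal sketch of that procedure; the mount point /data/project is an assumption. As the 11:05 log shows, a process stuck in uninterruptible I/O wait survives even SIGKILL, at which point rebooting is the only way out:

    # List processes holding files open on the mount:
    fuser -vm /data/project
    # Kill them all (-k sends SIGKILL by default):
    fuser -km /data/project
    # Anything left in D state (uninterruptible I/O wait) cannot be killed:
    ps -eo pid,stat,comm | awk '$2 ~ /D/'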
[11:31:35] * Ryan_Lane grumbles
[11:31:57] morning Ryan_Lane
[11:32:06] labs are kind of fucked
[11:32:08] I'm fixing gluster as we speak
[11:32:10] yep
[11:32:22] well, projects that rely on gluster currently are
[11:32:25] ok
[11:32:37] I'm pretty sure I know the fix and it shouldn't take an amazingly long time to fix it
[11:32:51] I'm also taking this opportunity to upgrade the gluster boxes to precise
[11:33:09] petan: hi!
[11:33:12] hi
[11:33:15] the gluster people don't really like supporting lucid
[11:33:34] python-twisted doesn't seem to be working on bots-bnr1
[11:34:07] rschen7754 there is more that doesn't work on that box
[11:34:12] ok
[11:34:16] let me check it all
[11:34:23] problems related to gluster
[11:34:33] oh, ok
[11:34:36] all my .aptitude stuff in $HOME was fucked
[11:35:13] rschen7754 what is wrong with it
[11:35:23] it's acting like it's not installed
[11:35:48] !log bots root: reinstalling python-twisted on bots-bnr1
[11:35:50] Logged the message, Master
[11:38:17] ok, try now
[11:41:19] Connection to bots-nr1.pmtpa.wmflabs closed.
[11:41:44] Unable to create and initialize directory '/home/rschen7754'.
[11:41:55] gluster issues
[11:42:00] I'm working on it now
[11:42:09] i can try again later… trying to get an assignment done
[11:42:11] thanks!
[11:42:18] should hopefully be fixed in a few hours
[11:42:25] (really hopefully before then)
[11:45:13] rschen7754 did you have problems on bnr1 or nr1
[11:45:13] though train wifi is an awesome thing, it's not very good for latency :D
[11:45:30] bots-nr1
[11:45:34] oh
[11:45:37] you said bnr
[11:45:41] no more memcached segfaults since downgrading
[11:45:43] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 165 processes
[11:45:44] there is no python-twisted on nr1
[11:47:33] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 153 processes
[11:48:23] !log wikidata-dev wikidata-dev-9: Making cronjob restart memcached every night. Often a simple restart helps to get rid of the lock error message.
[11:48:25] Logged the message, Master
[11:49:33] oh, oops...
[11:50:22] Silke_WMDE_: so, you'll be helping organize tool labs, right?
[11:50:23] i think i better take a look when i'm more awake… sorry about that
[11:50:36] Ryan_Lane: Yes.
[11:50:47] \o/
[11:50:52] :)
[11:51:12] Silke_WMDE_: should we get a meeting scheduled between you, I and sumana?
[11:51:34] Yes, that would be cool!
[11:51:48] are you going to be doing project management?
[11:52:00] yes
[11:52:03] we could probably discuss bots right now
[11:52:23] there's quite a few immediate tasks that should be prioritized in that project
[11:52:39] I don't know a lot about bots yet
[11:52:55] I've been organizing tasks on the project page
[11:53:04] well, labs project page, not bots
[11:53:19] petan: got it going, sorry for the confusion
[11:53:22] though it may make sense to have project tasks listed on project pages?
[11:53:31] !resource bots
[11:53:31] https://labsconsole.wikimedia.org/wiki/Nova_Resource:bots
[11:53:38] thx
[11:53:42] it's 4 am here and i didn't realize bnr-1 and nr-1 weren't the same :P
[11:54:02] * Silke_WMDE_ is reading
[11:54:05] I'm not totally sure where it would be best to organize stuff
[11:54:21] labsconsole makes a lot of sense for a lot of reasons, but it's not historically where we do so
[11:54:22] RECOVERY Current Load is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: OK - load average: 0.20, 0.20, 0.08
[11:54:22] RECOVERY Disk Space is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: DISK OK
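For context: the 11:48 !log above sets up a nightly memcached restart on wikidata-dev-9 to clear a recurring lock error. A minimal sketch of such a cron entry; the 04:00 schedule and the file location are assumptions:

    # /etc/cron.d/restart-memcached (hypothetical) -- restart nightly at 04:00.
    # Fields: minute hour day-of-month month day-of-week user command
    0 4 * * * root /usr/sbin/service memcached restart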
[11:54:58] where was it historically? mediawiki.org? meta?
[11:55:08] mediawiki.org
[11:55:13] we stopped using meta a long time ago
[11:55:18] ok
[11:55:33] mediawiki.org isn't really a great place for project docs either :D
[11:55:37] * Ryan_Lane shrugs
[11:55:42] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 97 processes
[11:55:52] In my opinion labsconsole is a good place. Could be linked on mediawiki.org just in case people look for it there.
[11:55:54] we have too many wikis
[11:55:55] docs.wikimedia.org would be cool :D
[11:56:04] petan: yeah
[11:56:15] merging wikitech and labsconsole and using the wikitech name would work too
[11:56:23] or that
[11:56:41] there's plans for it, but it always gets deprioritized
[11:56:43] we really badly need to hire people
[11:56:44] :(
[11:56:51] hire for what :D
[11:56:56] ops
[11:56:58] I have lot of time over weekends :D
[11:57:10] heh
[11:57:16] you should apply for a full time position ;)
[11:57:24] we do have contractor positions available, though
[11:59:00] mm full time... I can imagine that even if I am unsure if I meet all the requirements, I will see how the project I am employed by now will continue :D
[11:59:24] Silke_WMDE_: I'm pretty sure the tool labs contractor position is close to being filled
[11:59:29] at least I really want to learn puppet before thinking of working for wmf
[11:59:44] petan: work with Damianz on the bots puppetization
[11:59:47] Ryan_Lane: yay
[12:00:23] I think I'm going to try to merge his change in next week. I've been absent in bots long enough, I guess :D
[12:00:42] talking about puppet: The workflow for people who have a bot is not clear to me, yet. Is every bot author expected to deal with puppet to run a puppetized labs instance?
[12:01:31] I'd prefer to have a community of people who would help puppetize things for any project
[12:01:49] but realistically I expect the tool labs contractor to do this
[12:01:58] I see.
[12:02:07] we can't expect bot authors to use puppet
[12:02:18] yeah, it's quite a ...
[12:02:23] *dictionary*
[12:02:31] hurdle
[12:02:42] it needs to be as simple as possible
[12:03:04] right now it requires too much effort for everyone
[12:03:09] ssh bots-4
[12:03:17] >.<
[12:03:21] it also puts quite a burden on petan and Damianz, I'm sure
[12:03:57] But basically: Would everyone running a bot/moving a bot from toolserver to labs have instances like those we have for Wikidata?
[12:04:08] Ryan_Lane: Needs a few tweaks, just not had time
[12:04:28] Really all the servers need re-deploying to that puppet standard with omg ram
[12:04:38] nah, I think that's too difficult for people, too
[12:04:55] I'd prefer bots be managed from a central place and be "scheduled" on instances
[12:05:15] Like on toolserver, Would be perfect in my opinion
[12:05:31] indeed. it would be nice if it was similar to toolserver
[12:05:40] yes
[12:05:45] different bots have different use cases
[12:05:52] Damianz: yep
[12:05:58] some will have their own instance
[12:06:00] Ryan_Lane: Don't even tempt me to apply for a full time position, I'd drive you crazy :P
[12:06:01] and that's fine
[12:06:12] Damianz: hey, I told you ages ago to ;)
[12:06:27] you just got a new job, though
[12:06:36] Yeah :(
[12:06:46] A bunch of clean puppeted instances with noroot, a 'submit' server and PBS
[12:07:00] PBS?
[12:07:15] yeah, what's a PBS?
[12:07:19] addshore some of bots are daemon like
[12:07:23] running all time
[12:07:27] like wm-bot
[12:07:34] PBS is of no use there
[12:07:34] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 148 processes
[12:07:36] I'm too expensive to do wmf full time anyway :P Do miss contract work somewhat though, this would be a cool project to work on... oh well, I've got crazy enterprise stuff to fix.
[12:07:40] true
[12:07:47] + these MUST be always on same instance
[12:07:51] Damianz: heh
[12:08:01] PBS (portable batch system) http://en.wikipedia.org/wiki/Portable_Batch_System
[12:08:10] yeah
[12:08:36] petan why must they always run on the same instance though?
[12:08:53] ok labstore4 wtf are you doing :(
[12:08:54] because for example wm-bot consist of multiple processes that are connected to each other
[12:09:06] all these processes need to run on 1 machine
[12:09:21] I'd love to look at using zeromq for bots usage reporting/management
[12:09:28] hmm, fair, I imagine there is some way that you can pick an instance though, not looked into it
[12:10:06] there's a number of ways we can handle job running
[12:10:28] for sure this is a job for the tool labs contractor
[12:10:56] Shame every time you hire a labs guy they become a 'fix ceph, fix varnish, fix this, fix that' in production guy
[12:11:01] We still love paravoid though
[12:11:02] I am still looking for the big picture. Some bots did already migrate, other probably could. I don't know how many bots require db replication. (Actually, I don't have an overview over the variety of existing bots.)
[12:11:03] * Ryan_Lane sighs
[12:11:14] Damianz: yeah. well, production gets priority
[12:11:56] I think a big plus to bots would be DB replication, i know a fair few people that would have migrated if it was already in place
[12:12:05] booo, short team benefits
[12:12:17] addshore: yes, that's supposed to be happening this month
[12:12:23] :)
[12:12:35] DB replication has really been blocked because magg... thingy has been busy doing prod stuff, but now a replica should be easy
[12:12:50] In theory it should be better than labs, it's just another slave for switches etc rather than out-of-team
[12:12:55] honestly until we stabilize gluster we're not ready for mass movement
[12:13:10] it's a major blocker right now
[12:15:00] will Gluster ever work faster than now?
[12:15:07] yes
[12:15:22] in fact, I hope to make it 3x faster this week or next
[12:15:39] --nodelay?:P
[12:15:50] I'd do it now if I wasn't scared we'd saturate our network node
[12:16:03] MaxSem: switch the instances to using virtio for networking
[12:16:21] it's still going to be slow, but it should hopefully at least be bareable
[12:16:46] "bareable":)
[12:17:06] all distributed network filesystems are slow
[12:21:44] Ryan_Lane: What are good next steps that can be done from my side for bots?
[12:22:07] Do we have a list of bots?
[12:22:14] Silke_WMDE_: we should start entering bugs and organizing them into projects
[12:22:31] only what's documented on the resource page
[12:22:37] ok
[12:23:01] I guess we really need the contractor before we can start
[12:23:36] * Silke_WMDE_ is looking at a list on the toolserver wiki
[12:23:40] you could have a look through users crontabs to see what bots are running?
[12:24:22] https://wiki.toolserver.org/view/List_of_Wikimedia_bots and https://wiki.toolserver.org/view/IRC_bots
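For context: PBS, linked above at 12:08, is a batch scheduler, i.e. the "submit server" model Damianz proposes for one-shot bot runs, as opposed to always-on daemons like wm-bot whose processes must stay together on one machine. A minimal sketch of a PBS job script and its submission; the resource values and file names are hypothetical:

    #!/bin/bash
    #PBS -N mybot              # job name (hypothetical)
    #PBS -l nodes=1            # one node is enough for a typical bot run
    #PBS -l mem=256mb
    #PBS -o mybot.out
    #PBS -e mybot.err
    cd "$PBS_O_WORKDIR"        # PBS starts jobs in $HOME by default
    python mybot.py

    # Submitted and inspected from the submit host:
    #   qsub mybot.pbs
    #   qstat            # list queued/running jobs
    #   qdel <job_id>    # cancel one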
[12:26:36] Ryan_Lane: For a meeting with Sumana - what times would be good for you?
[12:26:55] (and why are you awake at all? ;P )
[12:27:12] probably easiest is morning in SF
[12:27:17] Silke_WMDE_: I'm in europe
[12:27:25] oh!
[12:27:26] fosdem?
[12:27:47] yep
[12:27:56] and I'm in paris for a couple of days
[12:28:02] cool!
[12:28:02] I'm riding the train there now, in fact
[12:28:40] And Sumana is in NY, right?
[12:29:30] yep
[12:29:52] sumanah, were you summoned by Silke_WMDE_ ?
[12:29:57] :)
[12:30:09] hi Platonides - no! Should I have been? :)
[12:30:13] LOL
[12:30:14] happy Monday Platonides
[12:30:15] hi sumanah
[12:30:21] to you too, sumanah
[12:30:26] Hello Silke_WMDE_!
[12:30:43] Ryan_Lane, as the matter of fact I also have questions related to tools lab
[12:30:44] Ryan_Lane and I were talking about scheduling a meeting
[12:30:53] MaxSem: sure, what's up?
[12:31:32] if I understood it right, there will be a native (as in non-virtualized) shared copy of Wikipedia DB?
[12:31:43] yes
[12:31:51] cool
[12:33:03] that's planned for this month
[12:33:18] Silke_WMDE_: cool, yes, meeting, sounds good
[12:33:30] are there plans for other hi-performance services? several people who do mapping-related stuff are fearing that they will not be able to migrate from Toolserver as they require a copy of OSM
[12:33:38] I have a few emails I aim to respond to today and yours is one (or 5) of them
[12:34:00] MaxSem, also mention that they use postgreSQL
[12:34:18] yup, OSM = PG + PostGIS
[12:34:33] MaxSem: OSM will be in production, not labs
[12:34:47] sumanah Ryan_Lane Something like 10 a.m. in SF, 1 p.m. in NY and 7 p.m. in Berlin.
[12:34:47] I know;)
[12:35:01] but Toolserver currently hosts its own copy
[12:35:05] Ryan_Lane: When will you be back to work/to SF?
[12:35:15] I fly back on the 8th
[12:35:19] which requires more storage than our instances can provide (gluster is outta the question) and requires some performance
[12:35:44] MaxSem: right, so labs users will use the production version
[12:36:15] Ryan_Lane, so if we will be able to give them this access that would be awesome
[12:36:33] it's just using tiles, right?
[12:36:51] no, I'm talking about Postgres access
[12:37:10] we'll have to look at replicated databases for that
[12:37:38] rendering tiles on TS/Labs is insane enough that nobody's doing it, apparently:)
[12:37:39] Silke_WMDE_: that time of day sounds good to me!
[12:40:23] PROBLEM Total processes is now: WARNING on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: PROCS WARNING: 184 processes
[12:40:53] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[12:41:24] !log deployment-prep -dbdump : stopping udp2log, starting udp2log-mw
[12:41:26] Logged the message, Master
[12:41:35] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 21% free memory
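For context: "OSM = PG + PostGIS" above means that hosting OpenStreetMap data amounts to a PostgreSQL database with the PostGIS spatial extension, loaded from an OSM dump; that storage and I/O load is why it stays in production rather than on Gluster-backed instances. A minimal sketch of what such a setup involves; the database name and dump file are hypothetical, and using osm2pgsql as the loader is an assumption:

    # Create a spatially-enabled database and import an OSM extract:
    createdb osm
    psql -d osm -c "CREATE EXTENSION postgis;"
    osm2pgsql -d osm planet.osm.pbf   # a full-planet import takes many hours and hundreds of GB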
[12:45:06] is there a tool labs spec somewhere for looking/adding suggestion?
[12:46:31] there's a couple pages that DaB started
[12:46:40] something like "features needed in tool labs"
[12:46:48] and "features wanted in tool labs"
[12:46:54] silly to have two pages, I think
[12:47:43] Ryan_Lane: I agree that we don't need to have 2 pages
[12:47:43] * hashar sends Ryan_Lane to bed
[12:47:52] wait MaxSem, I'll send them
[12:47:55] hashar: I'm in europe ;)
[12:47:59] oh
[12:48:05] hashar: wait, where is he? the middle of the Atlantic?
[12:48:24] http://www.mediawiki.org/wiki/Wikimedia_Labs/Toolserver_features_wanted and http://www.mediawiki.org/wiki/Wikimedia_Labs/Toolserver_features_needed_in_Tool_Labs
[12:48:40] Silke_WMDE_: if you want to do the honors of combining those into 1 page, that would be great
[12:48:49] hehe
[12:48:53] OK!
[12:49:19] More than great
[12:49:34] some organization to the pages would be good too
[12:49:35] Silke_WMDE_, thanks
[12:49:58] Also, I'll send a mail to toolserver-l with a call for additions (though Dab said it should be complete)
[12:49:59] also please go ahead & update the references on https://www.mediawiki.org/wiki/Wikimedia_Labs
[12:50:00] maybe bugs added and listed with it :)
[12:50:22] RECOVERY Total processes is now: OK on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: PROCS OK: 150 processes
[12:52:59] !log deployment-prep rebasing /srv/deployment/mediawiki/common
[12:53:01] Logged the message, Master
[12:53:08] Change on mediawiki a page Wikimedia Labs/Toolserver features wanted in Tool Labs was modified, changed by MaxSem link https://www.mediawiki.org/w/index.php?diff=640492 edit summary: [+293] /* Labs wide (not only bots / tools), but available for all projects */ +cmt
[12:54:32] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[12:55:00] ok. back on in a bit. train is arriving
[12:55:06] enjoy paris!
[12:55:32] Silke_WMDE_: thanks :)
[12:57:34] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 154 processes
[12:58:54] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[13:00:32] Change on mediawiki a page Wikimedia Labs/Toolserver features needed in Tool Labs was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=640497 edit summary: [+18422] Combined features needed and features wanted to one single page
[13:00:43] !log deployment-prep rebooting apache32 (locked / can't login)
[13:00:43] Logged the message, Master
[13:01:13] MaxSem: I united the content of both pages now.
[13:01:21] cheers
[13:01:27] If you add more stuff, please use http://www.mediawiki.org/wiki/Wikimedia_Labs/Toolserver_features_needed_in_Tool_Labs
[13:01:27] Thank you Silke_WMDE_
[13:01:38] Silke_WMDE_: mind if I also send a note about that to the Labs mailing list?
[13:02:23] Change on mediawiki a page Wikimedia Labs/Toolserver features wanted in Tool Labs was modified, changed by MaxSem link https://www.mediawiki.org/w/index.php?diff=640498 edit summary: [-18378] Redirected page to [[Wikimedia Labs/Toolserver features needed in Tool Labs]]
[13:02:24] Go ahead!
[13:03:59] !log deployment-prep REVERTED GIT-DEPLOY!!!!! rm /data/project/apache/common-local (symlink) and restored backup: mv /data/project/apache/common-local.pre-git-deploy /data/project/apache/common-local
[13:04:01] Logged the message, Master
[13:04:59] thanks for the redirect!
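For context: the 13:03 !log above records the git-deploy revert in prose; written out as the shell commands it describes (paths exactly as given in the log entry):

    # Drop the git-deploy symlink and restore the backed-up docroot:
    rm /data/project/apache/common-local
    mv /data/project/apache/common-local.pre-git-deploy /data/project/apache/common-local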
[13:05:26] !log deployment-prep refreshing /home/wikipedia/common from latest master (no more newdeploy branch)
[13:05:28] Logged the message, Master
[13:06:32] !log deployment-prep multiversion/refreshWikiversionsCDB
[13:06:34] Logged the message, Master
[13:07:21] Change on mediawiki a page Wikimedia Labs was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=640504 edit summary: [+0] /* Tool Labs */ Updated link to "Toolserver features needed/wanted in Tool Labs"
[13:07:54] !log deployment-prep starting apache2 on apache32 and apache33
[13:07:55] Logged the message, Master
[13:09:39] matthiasmullie: you will get more people to read your issue there :-]
[13:09:55] alright
[13:09:56] matthiasmullie: what are the symptoms ?
[13:10:07] mlitn@bastion1:~$ ssh ee-prototype
[13:10:08] If you are having access problems, please see: https://labsconsole.wikimedia.org/wiki/Access#Accessing_public_and_private_instances
[13:10:09] Creating directory '/home/mlitn'.
[13:10:10] Unable to create and initialize directory '/home/mlitn'.
[13:10:11] ...
[13:10:13] Connection to ee-prototype closed.
[13:10:14] mlitn@bastion1:~$ ssh ee-prototype.pmtpa.wmflabs
[13:10:38] I guess the homedir is screwed
[13:10:44] have you tried rebooting the instance ?
[13:10:55] Change on mediawiki a page Wikimedia Labs was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=640510 edit summary: [+35] /* Proposals */ updated link to Features needed/wanted
[13:11:02] I haven't, let's see
[13:12:07] of course you already managed to log in that instance haven't you?
[13:12:48] Rebooted instance 3a2e614d-6678-444e-ac7a-ff3105469fdd.
[13:12:52] still same symptoms though
[13:13:07] yes, I have been able to login to that instance in the past (last week it worked)
[13:19:13] !log deployment-prep the infamous beta auto updater is back in action on deployment-bastion
[13:19:15] Logged the message, Master
[13:24:14] !log deployment-prep manually updating extensions to make sure the beta autoupdater works properly
[13:24:16] Logged the message, Master
[13:24:20] matthiasmullie: still broken ?
[13:24:34] matthiasmullie: sometimes the console log gives some clues. Such as the LDAP not being reachable
[13:25:31] mlitn@ee-prototype:~$
[13:25:33] I'm in :)
[13:25:42] yay
[13:25:51] thanks!
[13:30:15] matthiasmullie: if in doubt, reboot :-]
[13:30:27] that worked well for me over the past year
[13:30:43] pff
[13:30:50] annnd ganglia does not work anymore in labs :(
[13:30:52] http://ganglia.wmflabs.org/latest/
[13:30:55] There was an error collecting ganglia data (127.0.0.1:8654): fsockopen error: Connection refused
[13:31:09] how did I ever get to a point where rebooting a failing system is not my first reflex :)
[13:31:52] when it still doesn't work after 2 reboots ?
[13:35:47] !log deployment-prep Applying role::memcached to apache32 and apache33
[13:35:50] Logged the message, Master
[13:36:07] I am so dumb
[13:36:10] puppet is broken, can't apply
[13:36:12] abab
[13:37:45] http://thedailywtf.com/Articles/Routers,-Routers,-Everywhere.aspx
[13:41:37] MaxSem: I think I got beta back in a good shape :-D
[13:41:46] whee
[13:42:03] MaxSem: have you made a change to mediawiki-config to support the en.wikipedia.m.beta.wmflabs.org urls ?
[13:42:11] or should I go ahead and hack it ? :D
[13:42:17] hashar, I did
[13:42:25] it should be in master
[13:42:33] however I don't see MF on beta
[13:42:36] """ The HR staff were paid near-minimum wage rates, and like time-share salesmen, they were paid primarily on commission."""
[13:42:39] big mistakes :-D
[13:42:55] then people just care about hiring "someone" :D
[13:43:05] MaxSem: ahh that is not enabled properly
[13:43:13] I was referring to "I'd reboot it again!" And if that didn't fix it? "I'd reboot again!"
[13:44:09] zoo hmm wmf-config/mobile.php has a safeguard which prevents its code from running on labs
[13:44:16] guess we want to refactor everything :-]
[13:44:26] derp
[13:44:31] I thought I removed it
[13:47:36] hashar, https://gerrit.wikimedia.org/r/47400
[13:49:01] MaxSem: you are the boss! ;-]
[13:50:40] ahh it shows up on http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version now
[13:51:38] http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Greco-Turkish_War_(1919%E2%80%931922)&mobileaction=toggle_view_mobile
[13:51:58] I'm getting Permission Denied (Public key)... anybody know what's going on?
[13:52:08] I uploaded my Pubkey to NovaKey
[13:52:08] Vacation9: have you tried rebooting your instance ?
[13:52:10] needs a localisation update
[13:52:23] hashar: Connecting to Bastion I mean
[13:52:41] MaxSem: IIRC the l10n update happens automatically after the new code has been fetched
[13:52:55] I don't think my key is updating, my known_hosts only has two keys while there are three in NovaKey
[13:52:58] I see
[13:53:16] PHP Warning: require(/mnt/srv/deployment/mediawiki/common/1.21wmf8/../wmf-config/wgConf.php): failed to open stream: No such file or directory in /data/project/apache/common-local/wmf-config/CommonSettings.php on line 149
[13:53:17] bohh
[13:54:29] So no idea what's going on? I think there may be an issue with updating ssh public keys, but not sure... anybody willing to add my key in my .ssh?
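For context: for a "Permission denied (publickey)" failure like Vacation9's above, the usual first step is to see which keys the client actually offers; note that ~/.ssh/known_hosts stores host keys, not the user keys NovaKey manages, so comparing those two counts proves nothing. A minimal sketch; the key filename and username are placeholders:

    # Verbose mode shows each identity as it is offered and rejected:
    ssh -v yourname@bastion.wmflabs.org
    # Offer one specific private key explicitly:
    ssh -i ~/.ssh/labs_key yourname@bastion.wmflabs.org
    # Compare this fingerprint against the public key uploaded to NovaKey:
    ssh-keygen -lf ~/.ssh/labs_key.pub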
[13:54:45] MaxSem: ah I found out
[13:56:22] mwversionsinuse --extended --withdb
[13:56:23] 1.21wmf8=aawiki
[13:56:24] seriously
[13:57:13] PROBLEM Free ram is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: Warning: 12% free memory
[13:57:53] PROBLEM dpkg-check is now: CRITICAL on wikidata-client-test.pmtpa.wmflabs 10.4.1.3 output: DPKG CRITICAL dpkg reports broken packages
[13:59:15] so
[13:59:31] we could have a VERY SIMPLE KISS system to figure out the versions
[13:59:39] like a simple .ini file under /etc
[13:59:50] instead we end up with half a dozen scripts
[14:02:52] RECOVERY dpkg-check is now: OK on wikidata-client-test.pmtpa.wmflabs 10.4.1.3 output: All packages OK
[14:07:45] !log deployment-prep -bastion removing role::deployment::deployment_servers::labs
[14:07:47] Logged the message, Master
[14:09:42] !log deployment-prep -bastion applying misc::deployment::scap_scripts
[14:09:44] Logged the message, Master
[14:12:03] PHP Notice: Undefined variable: urlprotocol in /data/project/apache/common-local/wmf-config/filebackend-labs.php on line 35
[14:12:03] PHP Notice: Undefined variable: urlprotocol in /data/project/apache/common-local/wmf-config/filebackend-labs.php on line 37
[14:12:05] PHP Notice: Undefined variable: urlprotocol in /data/project/apache/common-local/wmf-config/filebackend-labs.php on line 40
[14:12:06] PHP Notice: Undefined variable: urlprotocol in /data/project/apache/common-local/wmf-config/filebackend-labs.php on line 41
[14:12:07] PHP Notice: Undefined variable: wmgMFRemotePostFeedbackUsername in /data/project/apache/common-local/wmf-config/mobile.php on line 11
[14:12:08] PHP Notice: Undefined variable: wmgMFRemotePostFeedbackPassword in /data/project/apache/common-local/wmf-config/mobile.php on line 12
[14:12:11] everything is screwed up
[14:17:30] I can't remember how to update the l10n cache
[14:21:55] you may want to be careful using project storage right now
[14:22:07] PHP Warning: wfMkdirParents: failed to mkdir "/home/wikipedia/common/php-master/cache/l10n" mode 0777 in /data/project/apache/common-local/php-master/includes/GlobalFunctions.php on line 2583
[14:22:16] I am slooowly progressing :-D
[14:23:40] Rebuilding ab...
[14:23:42] ok
[14:23:44] I am done for today
[14:23:54] MaxSem: the l10n cache is rebuilding
[14:24:01] that will be completed later tonight
[14:24:02] cool
[14:24:31] !log deployment-prep Started the over long l10n cache rebuild in a screen on deployment-bastion
[14:24:33] Logged the message, Master
[14:24:36] that takes a few hours to generate :-]
[14:24:42] PROBLEM Free ram is now: CRITICAL on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: CHECK_NRPE: Socket timeout after 10 seconds.
[14:25:54] maybe I am exaggerating a bit. It already generated the second cache file
[14:25:54] :D
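For context: the l10n (localisation) cache being rebuilt above is regenerated with MediaWiki's standard rebuildLocalisationCache.php maintenance script; rebuilding every language at once is what makes it take hours. A minimal sketch; the wiki path and the thread count are assumptions:

    # From the wiki's MediaWiki root (path hypothetical for this beta setup):
    cd /data/project/apache/common-local/php-master
    # As in the 14:24 !log, run it inside screen so it survives logout:
    screen -S l10n php maintenance/rebuildLocalisationCache.php --force --threads=4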
[14:29:32] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 17% free memory
[14:29:46] Silke_WMDE_: the organization of the features pages is already so much better now, thanks!
[14:30:09] <^demon> Oh wow, Ryan's up before me. Not used to that.
[14:30:12] :D That's the easy part.
[14:30:18] ^demon: I'm in paris
[14:30:34] <^demon> I figured somewhere in Europe :)
[14:30:59] I'm in a cool city, and yet here I am fixing a crappy filesystem
[14:31:42] hashar: you're having sudo issues ;)
[14:31:51] possibly
[14:32:04] <^demon> Only time I was in Paris, it was for like 4 days and I got terribly sick in the last day going home.
[14:32:07] the whole beta setup is a hack
[14:32:15] scripts attempt to run everything as mwdeploy
[14:32:24] and mwdeploy is supposed to be able to sudo as … mwdeploy :-]
[14:32:49] Ryan_Lane: and you came to paris without telling me ? :D
[14:32:59] hashar: I just showed up :)
[14:33:31] where are you working?
[14:33:35] hotel? :(
[14:33:38] yep
[14:33:47] I'm in the latin quarter
[14:33:56] so cliché
[14:33:58] ;-D
[14:34:07] :D
[14:34:48] you can try out a coworking place known as "la cantine"
[14:34:53] http://lacantine.org/blog/un-espace-de-coworking
[14:34:54] eating croissant, too?
[14:34:59] ;)
[14:35:13] shared space, cheap price
[14:35:24] lot of freelancer and web people there
[14:35:25] or
[14:35:34] you can go squat the Wikimedia France office :-]
[14:35:37] Silke_WMDE_: heh. no. not hungry just yet :)
[14:35:59] hashar: I may do that tomorrow
[14:36:02] I'm not in town long
[14:36:13] or grab a train and come see me haha
[14:36:26] where are you?
[14:38:11] well I am in Nantes
[14:38:19] but that is a bit far from Paris I am afraid
[14:38:24] like 2 hours and a half of train
[14:40:09] anyway this week is a bit busy after work : /
[14:41:21] * Silke_WMDE_ is going mad about a well hidden typo
[14:46:31] Silke_WMDE_: in a puppet manifest ?
[14:49:06] yep
[14:49:06] Silke_WMDE_: upload the change so we can help you :-]
[14:49:07] you should be able to install puppet on your local comp to run "puppet parser validate"
[14:49:07] my strategy is commenting out everything :p
[14:49:07] something like: sudo gem install puppet
[14:49:08] also Jenkins runs puppet parser validate on your change :-]
[14:49:08] puppet parser validate!? sounds cool!
[14:49:08] I have my editor run it after I save a .pp file :-]
[14:49:09] oh, or that one > vim-puppet - syntax highlighting for puppet manifests in vim
[14:49:09] yeah that one helps a lot
[14:49:21] Silke_WMDE_: and definitely get https://github.com/scrooloose/syntastic
[14:51:27] Silke_WMDE_: http://imgur.com/0KJLqFp
[14:51:37] I should write a doc page somewhere
[14:52:03] :D
[14:52:31] if you are curious, my vimrc is at https://github.com/hashar/alix
[14:52:47] and the two plugins in https://github.com/hashar/alix/tree/master/vim
[14:52:55] (the links point to the github repos)
[14:56:51] hashar: thx!
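For context: the exchange above recommends syntax-checking puppet manifests locally instead of hunting typos by commenting everything out. A minimal sketch of the workflow hashar describes; the manifest name is a placeholder:

    # Install puppet locally (as suggested above), then validate a manifest:
    sudo gem install puppet
    puppet parser validate manifests/site.pp
    # vim users get the same check on every save by pairing syntastic
    # (https://github.com/scrooloose/syntastic) with vim-puppet highlighting.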