[00:06:09] Hm… anyone around who understands the job queue? This is failing and I don't know how to debug it: https://gerrit.wikimedia.org/r/#/c/119537/
[00:06:46] (That's meant to solve one of several dns problems that labs has been having)
[00:08:16] andrewbogott: what's the actual problem?
[00:08:30] hoo|away: best I can tell, the job never runs at all.
[00:09:09] andrewbogott: mh... if this is for wikitech you might be able to run the job by hand using the runJobs maint. script
[00:09:33] like halt the real job queue executor for wikitech and run them manually
[00:09:40] I don't think it's a problem with the queue; other jobs that land in the queue run just fine.
[00:09:50] I don't really know how wikitech is set up... probably much smaller scale than production or beta
[00:10:37] Yeah... do you have error logs for the job executor?
[00:10:55] If so you might want to check those... if not, running them by hand is also an option
[00:10:57] I would love an error log! Do you know where/how I can get one?
[00:11:10] depends on how wikitech runs these
[00:11:18] in production they'd be on terbium
[00:11:27] but I have no idea where wikitech runs these
[00:12:36] where it runs the jobs, you mean? I'm sure they just run locally on the same host.
[00:13:21] mh, this doesn't look puppetized
[00:13:56] wikitech is fairly seat-of-the-pants at the moment.
[00:14:28] mh... is it running in a labs instance?
[00:15:18] no, on virt0
[00:15:39] ah right... it's the openstack manager thing
[00:15:45] I suspect that there's just some obvious mistake or typo in my code, and I just don't know about it on account of not having any logs.
[00:16:07] mh, I can't ssh into virt0
[00:16:28] you might want to look at user apache's crons or so
[00:16:44] somewhere there must be a hint about runJobs
[00:17:25] you mean, about where the logfiles are?
[00:18:41] yep
[00:18:57] or it might even be enough to halt the original runner and run it by hand once
[00:19:19] if you execute the jobs interactively you'll also see exceptions and fatals and stuff
[00:19:28] * hoo|away loves remote diagnosing
[00:19:49] Ah, that makes sense. Ok, let's see if I can suspend the queue...
[00:20:00] I don't have an apache user. I have www-data but it doesn't have a crontab.
[00:20:11] mh
[00:20:30] maybe one instance is running atm... that could give you the user
[00:20:44] pgrep runJobs
[00:22:18] nope, nothing
[00:23:00] mh... look into /var/log and look for something going on there?
[00:23:16] find /var/log -iname '*job*' or so
[00:23:55] hi
[00:23:57] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Revision_history.2C_Edits_by_user_is_503
[00:24:01] Is there any reason to think that it's logging at all? In my experience most mediawiki stuff doesn't log except when explicitly configured to do so
[00:24:28] andrewbogott: Well, the hope was that somebody made it log... it's not logging on its own
[00:25:02] Can anyone give advice here https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Revision_history.2C_Edits_by_user_is_503 [I am πr^2], please?
[00:27:32] andrewbogott: If you manually add me to virt0 I might be able to have a further look but from here I'm a bit out of ideas
[00:28:19] except for totally insane stuff like triggering a loop and then waiting for a process to get caught inside
[00:28:41] hoo|away: I can't add you… I'm going to see if I can figure out how to stop the queue entirely… that way I'll be able to see if my code is getting added at all
[00:29:56] makes sense...
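[Editor's note: a minimal sketch of the manual route suggested above — halt the normal runner, then execute jobs by hand with MediaWiki's stock runJobs maintenance script so exceptions and fatals print to the terminal. The path and the "labswiki" wiki id are assumptions about how wikitech is laid out.]

    # Run pending jobs interactively instead of via the normal runner:
    cd /srv/mediawiki/maintenance
    php runJobs.php --wiki=labswiki --maxjobs=10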
you can also view the job queue length via the api
[00:30:03] and there's a maint. script for that
[00:30:17] even one that lists jobs by type AFAIR
[00:30:31] Hm, interesting, $wgJobRunRate = 0;
[00:30:33] class admins::labs :P
[00:30:36] So it must be run by a cron, someplace...
[00:30:41] yup
[00:37:03] huh: as you noted: the webservice for usersearch isn't running
[00:37:15] Did I give correct advice?
[00:37:20] Any idea why it would fail?
[00:37:33] huh: log says last started: 2014-03-19 23:49:21: (log.c.166) server started
[00:37:40] hmm
[00:37:43] huh: and failed
[00:37:54] !newweb
[00:37:54] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb
[00:37:57] Would it be in the logs?
[00:38:41] nope ...
[00:38:45] huh: you should create a .lighttpd.conf file and add debug.log-request-handling = "enable" for
[00:39:03] a more verbose error log
[00:39:27] I don't have access, should I pass the info to the maintainer?
[00:39:32] hoo|away: ok, I've confirmed that my code is not adding a job at all.
[00:39:42] huh: yes
[00:40:02] $job->insert();
[00:40:08] oh, I don't think you're supposed to do that
[00:40:20] https://gerrit.wikimedia.org/r/#/c/119537/
[00:40:32] hedonil: thank you
[00:40:36] When I used insert() the code immediately errored out. I replaced it with the more proper singleton thingy...
[00:41:11] huh: yw
[00:41:21] $jobQueueGroup = JobQueueGroup::singleton();
[00:41:21] $jobQueueGroup->push( $job );
[00:42:21] andrewbogott: ^ that should do the magci
[00:42:24] * magic
[00:42:37] hm, I've lost track of my patch somewhere. I thought I was doing that
[00:43:26] return JobQueueGroup::singleton()->push( $this );
[00:43:43] that's what Job::insert does... so it should also work (although it's deprecated and bad style)
[01:02:56] Coren: I just recreated the databases... seemed easier than bugging you further, so now everything should be good on my end
[01:06:43] hoo|away: I have an error message! \o/
[01:06:55] thanks for your help, should be able to sort this from here.
[01:07:19] ah, great :)
[02:29:02] have we had any luck with restoring the logs and/or restoring the cron tabs?
[02:30:52] Coren >_>
[08:45:10] hello
[09:39:09] !log deployment-prep convert all remaining hosts but db1 to use the local puppet and salt masters
[09:39:12] Logged the message, Master
[09:49:01] springle: I guess you created the deployment-db1 on the beta cluster labs project
[09:49:09] springle: seems something went wrong during the instance creation :-(
[09:49:51] hashar: yes, was about to ask you about it
[09:50:03] puppet doesn't run and config page won't load
[09:50:16] springle: I tried rebooting it
[09:50:24] but I think something weirder happened
[09:50:27] although it did load initially right after setup
[09:50:28] might want to create a new one :]
[09:51:05] ah I am connected to it at least
[09:51:54] yes it's booting and running fine
[09:52:03] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find node 'i-00000205.eqiad.wmflabs'; cannot compile
[09:52:03] :D
[09:52:07] but puppet can't find a profile for the hostname
[09:52:10] yup
[09:52:18] and wikitech says the instance does not exist
[09:52:19] bah
[09:52:56] springle: let's delete it?
[09:53:01] sure
[09:53:42] Created instance i-00000220 with image "ubuntu-12.04-precise" and hostname i-00000220.eqiad.wmflabs.
[09:53:46] same hostname deployment-db1
[09:55:08] springle: also the labs instances do not have all their disk space allocated. We need to apply some puppet class on them
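[Editor's note: pulling together the job-queue fix discussed at [00:40]–[00:43] above: the then-recommended way to enqueue a MediaWiki job, versus the deprecated Job::insert(). The job class and its parameters below are hypothetical placeholders, not the actual code under review.]

    <?php
    // Hypothetical job, for illustration only.
    $title = Title::newFromText( 'Some page' );
    $job = new UpdateDnsJob( $title, array( 'host' => 'i-00000220' ) );

    // Recommended: push through the queue group singleton.
    JobQueueGroup::singleton()->push( $job );

    // Deprecated equivalent -- Job::insert() just delegates to the above:
    // $job->insert();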
[09:55:29] ah role::labs::lvm::mnt
[09:55:38] yep
[09:55:40] which creates an LVM logical volume under /mnt
[09:55:48] also, how is the db data pulled from production?
[09:55:56] it is not
[09:56:04] the beta cluster databases are independent from production
[09:56:26] well not real time, but someone must export/import
[09:56:31] two years ago, someone did an export of some pages and imported them manually to populate a bunch of pages
[09:56:37] ah
[09:56:41] and we never bothered adding more pages :]
[09:56:53] revision timestamps seemed newer than that
[09:57:00] there might be a script to sync some wikipages, but that is done using the Mediawiki API
[09:58:03] the rev timestamps are updated because we have automatic browser tests doing edits on some of the wikis
[09:58:08] for example testing VisualEditor
[09:58:32] ah right
[09:58:39] deployment-db1 running firstboot.sh \O/
[09:58:49] who chooses what data we import this time?
[09:59:11] I thought we could export the current DB and reimport them in eqiad
[09:59:12] as is
[09:59:48] nature's call, will be back soon
[10:01:23] hashar: https://tendril.wikimedia.org/report/clusters
[10:01:52] lot of disk needed, and 16G on the large vm might struggle with a full dataset
[10:05:48] * hashar discovers tendril
[10:06:03] springle: well we do not import the full production databases :°
[10:06:10] the beta cluster is merely a staging area for code
[10:06:31] the full databases are replicated on some other database slaves for consumption by labs tools, but that is unrelated to beta
[10:07:04] the current set is on ssh deployment-sql.pmtpa.wmflabs mysql root password is in /root/secret
[10:07:28] 53GB on /mnt/db
[10:07:59] all of that in a huge ibdata1 file
[10:08:11] maybe we can just rsync that :]
[10:08:35] you don't want new data and schema? :)
[10:08:56] !log deployment-prep applying role::labs::lvm::mnt on deployment-db1 to provide additional disk space on /mnt
[10:08:59] Logged the message, Master
[10:09:14] springle: do you mean starting with fresh dbs?
[10:09:40] the schemas are updated continuously using MediaWiki maintenance/update.php which applies the sql patches on all dbs
[10:10:27] yes, fresh data + schema cross check
[10:11:07] that might be a good idea :-]
[10:11:51] though we will have to redo all the user / groups configurations
[10:11:52] hence question: how to choose what we whittle away to reduce terabytes to 53G
[10:12:02] ah
[10:12:31] do you think this is too much to bite off right now?
[10:12:39] probably
[10:12:42] ok
[10:12:56] again the beta cluster is unrelated to the production db
[10:13:06] the wikis there have been created totally empty
[10:13:15] and we never imported anything from the prod db
[10:13:25] it just needs some data, not specific data. gotcha
[10:13:31] yeah
[10:13:35] sorry for the confusion
[10:13:54] np at all
[10:13:59] the ibdata1 file can probably be shrunk somehow
[10:14:05] we never ran any db maintenance script on it
[10:14:12] only by dumping and reloading
[10:14:26] ibdata is recalcitrant
[10:16:15] deployment-db1 seems ready now
[10:16:36] with /mnt having 139GB :-]
[10:16:43] plenty
[10:17:37] ah and I found out the beta cluster wikidata database is on a different host deployment-sql02 :D
[10:20:10] shall we put wikidata on deployment-db1, or is the separation necessary?
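[Editor's note: roughly what the role class discussed at [09:55:29] and applied at [10:08:56] looks like, based on the operations/puppet layout of the time; treat the exact module and resource names as assumptions.]

    # role::labs::lvm::mnt: allocate the instance's remaining disk as an
    # LVM logical volume and mount it at /mnt.
    class role::labs::lvm::mnt {
        include labs_lvm
        labs_lvm::volume { 'second-local-disk':
            mountat => '/mnt',
        }
    }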
[10:20:50] we can put everything on a single master
[10:20:54] will be easier to handle I guess
[10:24:45] and I found out simplewiki has been imported from the production one, thus it is huge
[10:24:49] should probably clean it up :-]
[10:29:43] hashar: looking at the deployment-db1 configure page. how does Special:NovaInstance choose puppet classes to display?
[10:30:50] ie, if I add a new core::db::beta or something, how to make it available for assignment to an instance?
[10:32:33] * springle starts grepping OpenStackManager source
[10:34:53] ah
[10:35:00] yeah that is cumbersome
[10:35:03] you have to add the class to the project
[10:35:17] [Manage Puppet Group] in the sidebar or https://wikitech.wikimedia.org/wiki/Special:NovaPuppetGroup
[10:36:11] thanks
[10:36:58] !log deployment-prep Cleaning up simplewiki by deleting most pages in the main namespace. Would free up some disk space. deleteBatch.php is running in a screen on deployment-bastion.pmtpa.wmflabs
[10:37:01] Logged the message, Master
[10:37:20] brb
[10:37:33] I should probably drop it and recreate it from scratch hehe
[10:48:50] !log deployment-prep Stopped the simplewiki script. Would need to recreate the db from scratch instead
[10:48:53] Logged the message, Master
[11:10:08] anyone happen to know how i can resolve an exim: insufficient disk space issue?
[11:12:54] I am off to attend a coding dojo. Will be back in roughly 2 hours
[11:19:11] it looks like php mail is broken because of exim: insufficient disk space. not sure yet how to fix it
[11:21:34] is there a path to see ganglia metrics for eqiad labs?
[11:22:08] ChrisJ_WMDE: unfortunately not yet
[11:22:43] ok, thanks.
[13:06:57] Coren: Seen https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Revision_history.2C_Edits_by_user_is_503 ?
[13:09:13] Coren, ping
[13:22:16] * Coren arrives. Ta-da!
[13:22:25] * Coren reads scrollback.
[13:33:13] anomie: From what I can see, Sigma has a working .lighttpd.conf for the tool that is only waiting on a maintainer to deploy (or for new maintainers to be added).
[13:34:02] Coren: I just thought you might want to be aware of/respond to the criticism of Tool Labs in there.
[13:35:53] anomie: It's not entirely unjustified; the migration /was/ annoying and troublesome -- it just was necessary. Then again, that same thread also showcases one of the primary advantages of the setup (easy collaboration between maintainers). That said, I'll add a little note there.
[13:36:44] Coren: True. Although the migration was heavily announced on labs-l, and I think I saw it on wikitech-l a few times too.
[13:37:11] So the "How was I supposed to know about it" criticism is somewhat misplaced.
[13:37:47] Yeah, I don't think we failed to announce it and warn about it repeatedly, but to be fair even if you knew about it, it was an annoyance for maintainers.
[13:40:16] Coren: the migration wasn't the problem; the fact that the new environment is different from the previous one is the problem
[13:40:31] the webservices weren't required before
[13:40:49] you could just upload your stuff to public_html and everything worked
[13:40:52] Coren, ping
[13:41:00] petan: OTOH, the changes seem to me to be for the better, getting rid of cruft that never worked that well.
[13:41:13] anomie: sure
[13:41:30] anomie: I am just pointing out what seems annoying to users
[13:41:36] Cyberpower678: Why not just ask what you want, and he'll respond when he gets the chance.
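[Editor's note: the simplewiki cleanup logged at [10:36:58] uses a stock MediaWiki maintenance script; a sketch of the invocation, where the title list and deletion reason are illustrative.]

    # deleteBatch.php reads page titles, one per line, and deletes them;
    # -i sleeps between deletions to keep the load down.
    php maintenance/deleteBatch.php --wiki=simplewiki \
        -r "beta cluster cleanup" -i 5 titles-to-delete.txt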
[13:41:50] people don't like things that change and stop being backward compatible
[13:42:22] Cyberpower678: Don't ask to ask. Just ask.
[13:42:30] +1
[13:43:25] petan: All of those changes were required for stability and requested new functionality; and none of them came as a surprise as they have been announced months in advance. The lighttpd-based setup has been marked as the future for some six months. :-)
[13:43:45] Coren: I know
[13:44:21] but you can't expect people to read announcements
[13:44:23] they never will
[13:56:54] Coren: wasn't the webservice supposed to auto start in eqiad?
[13:57:11] Betacommand, no. You have to start it.
[13:57:22] Betacommand, webservice start
[13:57:34] Cyberpower678: that's not what I was told
[13:58:00] That's what I've been told and have done.
[13:58:04] become tool
[13:58:09] webservice start
[13:58:14] Cyberpower678: I know how to do it
[13:59:38] Cyberpower678: one of the main changes between tampa and eqiad was the new web system, the proxy was supposed to detect if the service was running and start it if it wasn't, since the shared servers in tampa were killed
[13:59:47] Betacommand: That's one of the "requested new functionality" bits. I don't want to do that before the dust from the migration settles a bit and maintainers have had a chance to look at their tools first.
[14:00:29] Coren: would make things much easier for those who were depending on the shared webserver
[14:01:06] Hell, I would still be if the old apaches were not slower than a snail
[14:02:17] Betacommand: To a point. It also would have caused a number of subtle issues on tools that are not being actively maintained (which is surprisingly many).
[14:02:55] Betacommand: Efficiency is one of the reasons for the switch, but the bigger one is that per-tool lighttpd means that one misbehaving tool cannot bring the others down (as occurred with the shared apaches at regular intervals)
[14:03:06] Coren: most tool owners write the tool, get it set up and let it be unless there is a problem
[14:03:53] Hell, I've got tools that I have been running for years that I haven't touched since 2009
[14:04:19] Also, the webservice scheme allows /other/ daemons than lighttpd to be used; hence the impending arrival of tomcat.
[14:22:57] anyone know how i can add disk space to our instance / drive?
[14:24:32] dan-nl: there is a role class for it which would create an LVM logical volume with all the disk space and mount it at /mnt
[14:24:51] dan-nl: role::labs::lvm::mnt
[14:25:25] Coren: have you ever managed to connect to the serial console of an instance?
[14:25:40] hashar: thanks, unfortunately i have no idea yet what any of that means :(
[14:25:44] Coren: I got two instances locked up because they can't mount an entry from /etc/fstab . The console asks to press S to continue
[14:25:51] hashar: No, although I've used some trickery to send keypresses before.
[14:25:59] Coren: ohhh
[14:26:25] mutante tried yesterday and could not reach the console for some reason :(
[14:26:39] so I guess it is the virsh send-key
[14:26:43] I need to find the key codes
[14:26:45] hashar: Yeah, the console is wonky, but the "keyboard" works for specific keypresses.
[14:27:15] hashar: Yeah, note that you want the /linux/ keycode, not the hardware keyboard scancode.
[14:27:49] do we have any clue why the console is broken?
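[Editor's note: what the keypress trickery above looks like with libvirt. virsh send-key defaults to the linux codeset, so KEY_S is the Linux input-event name for "s"; <domain> stands for the libvirt domain name or UUID on the compute node.]

    # Press "s" inside the stuck guest (skips the failing fstab mount):
    virsh send-key <domain> KEY_S

    # Keycodes passed in one call are pressed simultaneously, e.g.:
    virsh send-key <domain> KEY_LEFTCTRL KEY_LEFTALT KEY_DELETE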
[14:30:11] Coren: a little reminder about the crontab, if you'd have the time
[14:31:14] hashar: Not really; and honestly we didn't spend all that many brain cycles on it given how rarely it would end up being necessary.
[14:31:29] Coren: makes sense
[14:31:33] (And also, the instances not having passwords makes its usefulness even more limited)
[14:32:03] so I could use your fingers to press S on the deployment-cache-mobile03.eqiad.wmflabs instance.
[14:32:03] Should be sshing on compute node virt1002 then issue: virsh send-key aa3c3550-96a7-4f20-a1ab-c88c01a8e5e9 KEY_S
[14:32:29] I would do it myself but apparently don't have access on virt** nodes :-(
[14:32:47] fluff: Look in your home. Also remember to /not/ restore that to your user account. :-)
[14:33:49] dan-nl: are you an admin on the labs project?
[14:34:14] hashar: Sent.
[14:34:18] hashar: yes
[14:35:17] dan-nl: on your instance configuration page you can check the role::labs::lvm::mnt puppet class
[14:36:28] Coren: stupid console still asking to press S to skip the failing mount :-(
[14:36:50] hashar: I might need to use the i-0000xxxx node name. What is it?
[14:36:51] I am pissed off
[14:37:03] i-00000080.eqiad.wmflabs
[14:37:12] hashar: k, adding it now
[14:37:34] dan-nl: once applied, connect to the instance, then manually run puppet using: sudo puppetd -tv
[14:37:54] dan-nl: that should apply the class and thus create a logical volume with all disk space and mount it at /mnt/
[14:38:06] hashar: How's that?
[14:38:09] k, running that now
[14:38:22] Coren: sorry, forgot to give you the context
[14:38:43] hashar: No, I mean, I did it again with the i- number; how's the result?
[14:38:48] * hashar https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=consoleoutput&project=deployment-prep&instanceid=aa3c3550-96a7-4f20-a1ab-c88c01a8e5e9&region=eqiad
[14:38:51] hashar: The context I understand. :-)
[14:38:54] Coren: still stalled :(
[14:39:18] I should download the image, boot it on my computer, press S, pause the instance and upload it back to labs :D
[14:39:44] hashar: Hm. Doesn't seem to work right in eqiad then. :-( But why don't you just remove the fstab entry through puppet?
[14:40:03] Coren: the instance is not booting :-(
[14:40:09] Coren: thanks
[14:40:38] hashar: It's trying to mount the NFS shares so /clearly/ puppet has run on it.
[14:40:43] Coren: Is the migration now finished? No moving parts, maintainers can reinstate crontabs and other stuff without fearing to be overridden?
[14:40:55] scfc_de: For all but the last two tools yes.
[14:41:11] scfc_de: (And those might be *finally* finished now, I haven't yet checked today)
[14:41:17] Coren: yeah it did run. But the varnish class I applied on it inserted a /srv/vdb /dev/vdb entry in /etc/fstab. And in eqiad there is no /dev/vdb so the instance is locked..
[14:41:30] hashar: Ah, poop.
[14:41:36] * Coren tries something else.
[14:41:44] Coren: sorry, should I give you more context?
[14:43:21] hashar: It looks like the "console" doesn't have an input device at all.
[14:43:32] hashar: So, I'm afraid, SOL.
[14:43:39] SOL ?
[14:44:04] hashar: i get a puppet error. maybe because there's not enough drive space to carry out the operation?
[14:44:06] err: /Stage[main]/Labs_lvm/Exec[create-volume-group]/returns: change from notrun to 0 failed: /usr/local/sbin/make-instance-vg '/dev/vda' returned 1 instead of one of [0] at /etc/puppet/modules/labs_lvm/manifests/init.pp:33
[14:45:12] i think i'll just delete some pages and their images to free up drive space for now ...
[14:45:36] dan-nl: ah sorry, was it on eqiad or pmtpa?
[14:45:50] eqiad
[14:46:14] MaxSem: May I mark the 'mobile' project as fully migrated?
[14:46:48] dan-nl: well I have no clue. It "should" work :D
[14:47:09] dan-nl: maybe running the command manually would give more details. Aka /usr/local/sbin/make-instance-vg '/dev/vda'
[14:47:18] andrewbogott, yeah - I asked others who care about their instances to respond on ML, no reply = don't care
[14:47:52] MaxSem: ok, thanks
[14:47:52] halfak: Shit Outta Luck.
[14:47:58] MaxSem: is mobile-sms one of yours?
[14:48:03] Coren: can we mount the instance disk somehow to edit the faulty /etc/fstab entry ?
[14:48:24] andrewbogott, no. poke yurik or dr0ptp4kt
[14:49:45] MaxSem: done, thank you.
[14:49:53] hashar: I can do that, give me a minute to re-learn how...
[14:50:12] hashar: I take it that just building a fresh one is out of the question?
[14:50:56] andrewbogott: well it would probably take a day or so to rebuild the two failing instances
[14:51:08] ok, lemme see what I can do.
[14:51:17] so if they can be fixed in an hour, it is worth the investment :-}
[14:51:28] names/ids/project/etc?
[14:51:41] * hashar both on deployment-prep
[14:52:00] first https://wikitech.wikimedia.org/wiki/Nova_Resource:I-00000080.eqiad.wmflabs deployment-cache-mobile03 aa3c3550-96a7-4f20-a1ab-c88c01a8e5e9
[14:52:12] andrewbogott: FYI, I am still working on a "true" fix but I found a suitable workaround for the readonly NFS mount when it happens: make sure that the shares are /unmounted/ and do an 'exportfs -r' on labstore1001. That flushes the ACLs /if/ they aren't mounted.
[14:52:19] second is https://wikitech.wikimedia.org/wiki/Nova_Resource:I-00000103.eqiad.wmflabs deployment-cache-upload01 f2264b5b-afe4-4c1d-89f7-591464a39858 ( on virt1004 )
[14:52:25] Coren, ok, noted!
[14:54:10] andrewbogott: ah found "Mounting an instance disk" at https://wikitech.wikimedia.org/wiki/OpenStack#Mounting_an_instance.27s_disk
[14:57:49] hashar: OK… /dev/vdb? I should just remove that line?
[14:58:00] This thing will re-run puppet as soon as it boots, so you'll have to account for that.
[15:00:36] um… hashar, still there?
[15:02:16] welp, I'm going to go make some breakfast
[15:02:21] back soon
[15:15:27] yeah, back. sorry, went down to grab a coffee
[15:19:25] so.
[15:19:36] "/dev/vdb? I should just remove that line?"
[15:20:07] andrewbogott: yes :-]
[15:20:22] should have added a mount { '/dev/vdb': ensure => absent } or something like that
[15:21:08] hashar, try to reboot and see what happens
[15:21:23] um… the first one, cache-mobile03
[15:21:35] ah, rebooted both
[15:23:37] mobile03 is in state SHUTOFF (rebooting)
[15:24:03] that seems good, so far...
[15:24:08] HURRAHHHHHHH
[15:24:42] did it come up? Can you ssh?
[15:24:58] yeah I am on it!
[15:25:00] wonderful
[15:25:10] ok, I'll do the second one
[15:26:55] ok, you can reboot that one too
[15:27:01] Um… hm.
[15:27:04] well, try it.
[15:27:29] ok
[15:27:48] rebooting
[15:30:24] !log deployment-prep deployment-cache-upload01.eqiad.wmflabs and deployment-cache-mobile03.eqiad.wmflabs recovered!! /dev/vdb does not exist on eqiad which caused the instances to be stalled.
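[Editor's note: a rough sketch of the "Mounting an instance's disk" procedure linked at [14:54:10] and used for the recovery above, run on the compute node while the instance is stopped; the nova image path, nbd device and partition number are assumptions.]

    # Attach the instance's disk image as a network block device and mount it:
    modprobe nbd
    qemu-nbd -c /dev/nbd0 /var/lib/nova/instances/<uuid>/disk
    mount /dev/nbd0p1 /mnt/rescue

    # Drop the faulty /dev/vdb line from fstab, then detach cleanly:
    sed -i '\|/dev/vdb|d' /mnt/rescue/etc/fstab
    umount /mnt/rescue
    qemu-nbd -d /dev/nbd0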
[15:30:27] Logged the message, Master
[15:30:49] !log deployment-prep migrated deployment-cache-upload01.eqiad.wmflabs and deployment-cache-mobile03.eqiad.wmflabs to use the salt/puppetmaster deployment-salt.eqiad.wmflabs.
[15:30:51] Logged the message, Master
[15:30:58] andrewbogott: Coren: thank you very much to both of you !!
[15:31:05] np
[15:31:39] andrewbogott: have you used the instructions to mount an instance disk which are at https://wikitech.wikimedia.org/wiki/OpenStack#Mounting_an_instance.27s_disk ?
[15:31:45] yep
[15:31:54] great
[15:31:58] hashar, is the migration of deployment-prep mostly going fine? Do you need help with anything else?
[15:32:11] besides the crazy puppet hacks I got to do
[15:32:12] yeah
[15:32:29] springle created an instance for the databases this morning and we talked a bit about how to migrate the data
[15:32:29] ok
[15:32:37] there are a bunch of puppet oddities floating around though
[15:32:43] but overall it progresses well
[15:32:50] should write a report to labs list maybe
[15:33:06] and bd808 created a puppetmaster / salt master instance for beta
[15:33:14] much like you did back in november on pmtpa
[15:33:18] though I never followed up on that
[15:33:53] Tpt_: Are you around?
[15:34:02] andrewbogott: yes?
[15:34:02] hashar: Yeah, I think you'll be happy with a project-local puppetmaster.
[15:34:13] You're still having access problems for your migrated instance, right?
[15:34:24] wikisource-tools, wikisource-dev?
[15:34:36] andrewbogott: yes
[15:34:48] hashar: Are there more instances you need to mess with today? Sam took over the deploy \o/ so I have unexpected free time
[15:34:56] Tpt_: let's sort that out now. I'm catching up...
[15:35:44] bd808: just in time. I think I have migrated all instances to use the deployment-salt instance as a puppet master
[15:36:03] bd808: that is a bit cumbersome since we have to check the box on multiple web pages on wikitech and fill in the fingerprints.
[15:36:17] Yeah. It's a pain
[15:36:20] andrewbogott: "ssh -A tpt@bastion-eqiad.wmflabs.org" works fine but when I'm logged into bastion-eqiad "ssh wsexport.eqiad.wmflabs" returns "Permission denied (publickey)."
[15:36:28] bd808: and thanks for the "" puppetca sign --all && salt-key --accept-all --yes "" I would never have figured it out
[15:36:35] I'd like to set up a cron job to autosign the keys
[15:36:49] Tpt_: what about wsexport? Is that one working?
[15:36:58] Oh, sorry, you just said that
[15:37:10] So -- it works for me. So something is probably amiss on your end. Are you able to access other labs instances?
[15:37:15] Besides bastion, I mean?
[15:37:43] hashar: I started on a script to update the puppet git repo: /home/bd808/git-sync-upstream on deployment-salt
[15:37:49] andrewbogott: "ssh wikisource-dev.eqiad.wmflabs" doesn't work either
[15:38:08] andrewbogott: But I've no problems with Tool labs
[15:38:18] bd808: mind if we switch to private message to avoid spamming this place ?
[15:38:34] Tpt_: OK, but you're accessing toollabs in a different way, right? What I mean is, does key forwarding to a bastion and then on to another instance...
[15:38:39] does that work, or has it ever?
[15:38:52] I can access wsexport just fine, so I'm pretty sure the issue is on your end. (For that instance, at least)
[15:39:09] andrewbogott: I had no problem with pmtpa bastion and instances
[15:39:32] andrewbogott: For Tools labs, I use tools-login.wmflabs.org
[15:40:44] Tpt_: ok, please ssh -A bastion-eqiad.wmflabs.org
[15:40:47] and then ssh-add -l
[15:41:02] what does it say?
[15:41:22] andrewbogott: "The agent has no identities."
[15:41:32] ok, so you aren't actually forwarding a key. That's the problem.
[15:41:35] Or, at least /a/ problem.
[15:41:40] Are you on linux, mac, windows?
[15:41:52] mac OS 10.9
[15:42:04] With default ssh
[15:44:01] Tpt_: The docs for key forwarding are here: https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_instances_using_agent_forwarding
[15:44:13] Does that ring a bell?
[15:44:20] I can talk you through it if this is unfamiliar.
[15:45:32] andrewbogott: Yes. I've just done the instructions and it works fine now. Thanks a lot ! :-)
[15:45:43] Tpt_: Cool! Is the other instance working as well?
[15:46:04] yes
[15:46:26] great. Go ahead and close that bug, then, if everything is working.
[15:47:17] !log deployment-prep fixed salt-minion service on deployment-cache-upload01 and deployment-cache-mobile03 by deleting /etc/salt/pki/minion/minion_master.pub
[15:47:19] Logged the message, Master
[15:48:41] * Coren lunches.
[15:48:43] hi, please install git review on labs
[15:48:51] http://www.mediawiki.org/wiki/Gerrit/git-review
[15:48:53] I need this
[15:48:54] andrewbogott: I've closed the bug and updated "Labs Eqiad Migration/Progress"
[15:49:03] thanks!
[15:49:33] Amir1: Are you talking about tools, or a private labs project?
[15:49:43] andrewbogott: tool
[15:49:49] *tools
[15:49:50] Ah, ok, in that case best to file a bug.
[15:49:58] Seems like a reasonable thing to want, I'm surprised it's not already there.
[15:50:20] I was already there but now it's not (I think it's because of the migration)
[15:50:30] *it was
[15:50:33] Yeah, could be it wasn't puppetized properly.
[15:51:39] hey guys, can anybody tell me how I can convert a .c file to a .exe file which was written in codeblocks ?
[15:52:12] andrewbogott: I hope bugging Coren makes things faster
[15:52:18] :DD
[15:53:48] hey guys a little help would be useful for me.. as u guys r the boss.
[15:54:32] Pratyya: I think you may be in the wrong room… .exe is a windows executable (usually) and labs is entirely linux based.
[15:55:47] owww. I'm using windows.
[15:57:00] Pratyya: convert .c to .exe = compiling .. on Windows you'd use Visual Studio or something
[15:58:38] paravoid, mutante, petan, Damianz, scfc_de, would one of you like to testify on behalf of the labs 'nagios' project?
[15:58:48] hi
[15:59:02] hi!
[15:59:07] whatever
[15:59:33] petan, have you been replaced with a chatbot?
[15:59:48] no but my irc client is good :P
[15:59:49] Or does 'whatever' sum up your attachment to the nagios project?
[15:59:54] andrewbogott: I don't need the nagios project for testing, I can abuse Toolsbeta for that.
[16:00:15] I'm not clear on if nagios project is for nagios testing or was an attempt at labs monitoring
[16:00:19] https://bugzilla.wikimedia.org/show_bug.cgi?id=62871
[16:00:24] mutante: I'm asking about pre compile
[16:00:32] andrewbogott: no, it's not for testing, it's production like :P
[16:00:39] andrewbogott: I think it is currently hosting icinga.wmflabs.org.
[16:00:45] andrewbogott: it's where icinga.wmflabs.org lives
[16:00:48] ah, that could be important then.
[16:00:51] andrewbogott: no :)
[16:01:19] It has three instances: nagios-main, nagios-dev, icinga.
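[Editor's note: a sketch of the agent-forwarding setup from the Help:Access page linked at [15:44:01], which resolved Tpt_'s "agent has no identities" problem; the key filename and username are illustrative.]

    # Load the key into the local agent, then confirm it is listed:
    ssh-add ~/.ssh/id_rsa
    ssh-add -l

    # ~/.ssh/config: forward the agent when going through the bastion.
    Host bastion-eqiad.wmflabs.org
        ForwardAgent yes
        User tpt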
[16:01:32] "I'm not clear on if nagios project is for nagios testing or was an attempt at labs monitoring" [16:01:33] Which should I migrate vs. which should I scrap? [16:01:37] exactly that ^ andrewbogott [16:02:20] petan, scfc_de : I would be very happy if you fix this bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=62871 [16:02:26] thank you :) [16:02:51] Amir1: Will do. [16:03:14] Amir1: it's needs to be added to gerrit, which is painful and annoying (+ anyone can do that), then it needs to be merged by someone from ops, which takes ages [16:03:22] Amir1: so I am not really sure I can help you much :( [16:03:26] petan, so, shall I mothball that project pending you having time to revive and update it for eqiad? [16:03:46] andrewbogott: I will start migrating it right away [16:04:00] oh, great. thank you! [16:04:14] Will you sign on on the progress page so I know not to mess with it? [16:04:23] mhm [16:04:25] petan: oh I thought it's as easy as "sudo pip install git-review", So Okay, I'll wait, thank you again [16:04:30] @search wikitech [16:04:30] Results (Found 32): pxe, wikitech, mobile-cache, tooldocs, rq, proxy, replicateddb, add, sudo, docs, tools-admin, wm-bot2, wm-bot3, access, reboot, beta, wm-bot4, wikitech-putty, todo, tdb, mediawiki-instance, toolsvslabs, mediawiki, labs-putty, sal, newweb, self, queue, accessproblems, tools-request, migration, toolsmigration, [16:04:36] !migration [16:04:36] https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration_Progress [16:04:42] andrewbogott: this page? [16:04:47] that one! [16:04:50] ok [16:05:03] Amir1: no, if it was that easy, life would be great [16:05:10] :P [16:05:28] :D [16:07:26] Amir1: I'm guessing you want it on tools-login, not on the exec nodes? [16:07:38] That is -- bots aren't using it, just you using it interactively, yes? [16:08:12] andrewbogott: yes [16:09:14] andrewbogott: it needs to be in puppet anyway, "sudo apt-get" is strongly forbidden on tools project, should you use it, C oren will delete you [16:09:31] petan: https://gerrit.wikimedia.org/r/119771 [16:09:52] that's the right way [16:10:42] one day I will make a tool "tools-apt-get install blah" which will then insert the manifest for "blah" and submit a change to gerrit [16:10:46] andrewbogott: I prefer dev_environ; cf. https://gerrit.wikimedia.org/r/#/c/118595/. [16:11:18] scfc_de: Ah, you're right -- I missed that. [16:13:05] "joe", it just doesn't give up:) [16:13:36] Is joe controversial? It's just a text editor, isn't it? [16:13:38] andrewbogott: i think the git-review version from package will work but also be outdated [16:13:43] I'm guessing, it 's not exactly googlable. [16:13:50] they recommend the one from pip ..(i hate pip too) [16:14:18] andrewbogott: it just has a long history of trying to get into prod and/or labs base tools [16:14:28] and then there were the editor wars:) [16:15:01] mutante: Recommended, yes (do you know any software project that doesn't praise the latest greatest? :-)), but are there any blockers? [16:15:48] no guys, just try it [16:15:50] scfc_de: not to worry, I am definitely not installing anything with pip! [16:16:00] I only use the upstream ubuntu version, it works fine. [16:16:11] Amir1: give us 30 minutes or so for the change to apply, then you should have git-review. [16:16:16] i'm more amused because i got the exact same discussion in the past [16:16:22] go ahead [16:16:42] andrewbogott: thank you so much [16:42:57] andrewbogott: I have some problems migrating icinga [16:43:07] petan: OK, what? 
[16:43:19] andrewbogott: I made this new instance "icinga" like an hour ago and I still can't ssh there
[16:43:29] petan, check your security groups?
[16:43:40] ssh is disabled by default now?
[16:46:23] no, but...
[16:46:28] I copied the security groups over from pmtpa
[16:46:37] which means they probably don't allow ssh from the eqiad bastion
[16:48:09] petan: also be warned that sometimes having multiple security rules for a given port causes weird behavior. I usually just set up 10.0.0.0/8 for ssh in eqiad and wipe the other rules.
[16:48:12] 22 22 tcp 0.0.0.0/0
[16:48:15] But, anyway… was that it?
[16:48:36] andrewbogott: I am getting Permission denied (publickey).
[16:48:46] I am 98.6436% positive it's not firewall
[16:51:50] petan: I have to go in a minute, sorry.
[16:51:57] I have to go as well :)
[16:51:59] no problem
[16:52:07] I'm not sure what's happening. We do have a race condition that sometimes prevents user keys from getting mounted on the first attempt.
[16:52:09] So a reboot might help.
[16:52:20] But, my root key doesn't work either, which is weird.
[16:52:29] maybe puppet failure?
[16:52:34] I'd suggest a reboot and/or starting a new instance, and then bugging me more if this becomes a pattern.
[16:52:38] Dunno, the syslog looks ok.
[16:52:43] ok
[16:53:17] oh, wait!
[16:53:24] petan, there's one thing -- having a project named 'nagios' is problematic
[16:53:47] I can't remember the details, it's something to do with a name conflict with a default 'nagios' user.
[16:54:10] So if you can stand it, maybe we can create you a new project (labs-nagios?) and you can migrate to that?
[16:57:00] andrewbogott_afk: or maybe "icinga"?
[16:57:08] we aren't using nagios anyway on there
[17:07:39] +1 for icinga.
[17:08:22] (Or "monitoring", if we want to include Ganglia in that.)
[17:33:23] (03PS1) 10Dzahn: add bugzilla/modifications repo to ops channel [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/119784
[17:55:38] What happened to tools-dev.wmflabs.org ?
[17:57:39] multichill: Coren: old dns entry (pmtpa) tools-dev.wmflabs.org. 3570 IN A 208.80.153.163
[17:58:53] hedonil: Happen to know the new ip?
[17:59:07] multichill: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Overview
[18:01:28] hi! im new to tools labs. im wondering how to access and manage databases. could someone help?
[18:14:37] hedonil: Still possible to connect to the old stepstone?
[18:15:18] multichill: you mean the old dc in pmtpa?
[18:15:24] yes
[18:15:55] multichill: no. it's locked down.
[18:16:37] @seen addshore
[18:16:37] petan: Last time I saw addshore they were quitting the network with reason: Quit: Connection closed for inactivity N/A at 3/19/2014 5:54:39 PM (1d21m58s ago)
[18:25:59] !ping
[18:25:59] !pong
[18:26:01] ok
[18:27:05] hedonil: So where can i find my data? For example p50380g50831__stations_p tools-db
[18:27:59] multichill: what's your tool's name?
[18:28:31] For example "railways"
[18:28:39] But I had/have some more
[18:30:23] hedonil:
[18:30:39] multichill: if it's not in the current database list of tools-db https://tools.wmflabs.org/tools-info/?dblist=tools-db it may have not been copied yet
[18:31:56] multichill: maybe it wasn't in 'standard naming' format. Occurred several times before. You have to poke Coren to check/copy it.
[18:33:10] Standard? Huh, that isn't the kind of name I would make up myself. I probably used the manual to set that
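[Editor's note: back at [16:48:09], the security-group advice corresponds to something like the following with the nova CLI of the period; the group name "default" is an assumption.]

    # Allow ssh (port 22) from the labs-internal 10.0.0.0/8 range:
    nova secgroup-add-rule default tcp 22 22 10.0.0.0/8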
[18:33:26] multichill: not my words ;)
[18:34:24] user='p50380g50831' <- mysql username
[18:34:48] So looks like ___p
[18:35:13] hedonil: I see it still listed at https://wikitech.wikimedia.org/wiki/Special:NovaServiceGroup
[18:35:23] multichill: the copy tool checked the replica.my.cnf for tools databases. if there has been an old naming like .my.cnf the copy process might have missed it.
[18:36:37] !log tools Pointed tools-dev.wmflabs.org at tools-dev.eqiad.wmflabs; cf. [[Bugzilla:62883]]
[18:36:41] Logged the message, Master
[18:37:08] scfc_de: You might want to consider adding the missing rdns entries too
[18:39:10] multichill: I don't think they're handled manually; either they exist or not :-). (The DNS change for tools-dev may take up to an hour to propagate.)
[18:39:26] multichill: poke Coren with a list of all missing databases.
[18:39:48] scfc_de: how come that dns did change back again?
[18:40:19] hey, my project on tools-labs isn't working. So I `ssh tools-login.wmflabs.org` and get "Host key verification failed."
[18:40:44] presumably this is due to the move, but I can't find any mention of it on the tools labs pages
[18:40:49] spagewmf: there are new ssh fingerprints
[18:41:17] spagewmf: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Overview
[18:42:01] scfc_de: http://git.wikimedia.org/blob/operations%2Fdns.git/9780578451acd4964cfa401b03fa269dcfd31f84/templates%2F155.80.208.in-addr.arpa
[18:43:13] Good old bind ? :-)
[18:43:42] multichill: at least bind files
[18:44:22] hedonil: Looks like no project databases ended up at https://tools.wmflabs.org/tools-info/?dblist=tools-db
[18:44:39] I do see them at the S*.labsdb, but I think nothing changed for those
[18:45:20] petan: icinga is good. As long as there isn't a default 'icinga' user anywhere in puppet :)
[18:45:58] multichill: yep. replica databases didn't change. only tools-db
[18:47:22] hedonil: thanks, I added a note at the top of https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Overview
[18:47:39] spagewmf: 'k
[18:49:20] andrewbogott: ok let me know when you prepare it
[18:49:30] I'll do it right now. one minute...
[18:49:33] ok
[18:50:36] ok, all yours.
[18:50:52] I'm going to add 'icinga' to the progress list as 'migrated'. You can move the icinga project there as well once you've salvaged all you need.
[18:51:06] And, thanks for being flexible. I could probably sort out the name collision problem but it would take a while.
[18:53:06] hedonil: Any idea where I can find the database naming convention I should use?
[18:54:16] http://tools.wmflabs.org/styleguide/desktop/ is failing ; looking at http://tools.wmflabs.org/?list , a bunch of other tool URLs are also getting "is not currently serviced". What should we do?
[18:54:44] multichill: your database naming convention was correct, p50380g50831__stations_p was perfectly right (old style), but maybe your replica.cnf wasn't.
[18:54:59] spagewmf: simply type: $ webservice start
[18:55:48] spagewmf: unless you had no cgi configured, this will start the tool's webserver and do the trick.
[18:56:32] multichill: 208.80.155.130 resolves to tools-login.wmflabs.org without an explicit DNS entry, so I assume in an hour that will work for tools-dev as well.
[18:56:45] hedonil: Don't know *if* it rolled back as I never tested it.
[18:56:57] hedonil, this level of simplicity is unacceptable! :) Many thanks, styleguide working
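[Editor's note: how tool database credentials and naming fit together on Tool Labs, for readers following the exchange above; the tool and database names are the examples used in the log.]

    # Credentials live in the tool's replica.my.cnf; tools-db is the
    # shared user-database server:
    become railways
    mysql --defaults-file=$HOME/replica.my.cnf -h tools-db

    # User databases are named <mysql-user>__<dbname> (with a _p suffix
    # when public), e.g. p50380g50831__stations_p.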
[18:57:07] scfc_de: I was talking about *reverse* dns
[18:57:23] And that didn't resolve
[18:57:36] scfc_de: i did, worked on monday (last time I checked this)
[18:57:55] spagewmf: writing a migration faq right now
[18:59:02] spagewmf: feel free to leave a sign if everything works fine https://wikitech.wikimedia.org/wiki/Tool_Labs/Migration_to_eqiad
[18:59:48] scfc_de: >nslookup 208.80.155.130 ns0.wikimedia.org -> NXDOMAIN
[19:00:05] multichill: I was as well, and "host 208.80.155.130" => "tools-login.wmflabs.org" on my machine, but that's just my local resolver being clever?! RDNS is certainly very nice to have, but that would have to integrate with the OpenStack wiki interface, otherwise it would just be chaos.
[19:00:58] No, apparently my all-in-one router says so. Interesting.
[19:01:42] Does openstack add (forward) dns entries now? If so, you can also add the reverse ones
[19:01:54] hedonil++ BTW, the first page I visited was ...Tools/Help ; the relationship between it and Tools and Tools/Overview is unclear
[19:02:12] You could even delegate 208.80.155.128/25 to a dummy zone
[19:02:44] spagewmf: Overview is new for eqiad. it will be more prominent after the migration
[19:03:41] spagewmf: but the ssl key thingy has also been mentioned in http://lists.wikimedia.org/pipermail/labs-l/2014-March/002241.html
[19:04:40] andrewbogott: Can you enlighten us about OpenStack and reverse DNS? Is there an open bug or RT, or is it impossible?
[19:05:29] scfc_de: at the moment our DNS doesn't have anything to do with OpenStack. The web interface inserts entries into ldap
[19:05:50] well, wait, that's not totally true… I guess OpenStack does dns for the shortnames of hosts.
[19:06:03] Anyway, I maybe don't know enough about DNS to answer this question. What is the problem?
[19:06:37] scfc_de: Labs seems to have its own dns servers at labs-ns0.wikimedia.org and labs-ns1
[19:06:55] You could also have a zone for the public /25
[19:07:09] DNS outside of Labs resolves tools-login.wmflabs.org to 208.80.155.130, but there's no record for the other way round.
[19:07:13] andrewbogott: ^
[19:07:21] Ah, I see.
[19:07:30] Delegate the 208.80.155.128/25 once to the labs-ns*
[19:07:34] That's definitely an ldap/pdns thing. If you want to research it and make a bug with a specific suggestion I'll consider it.
[19:07:55] k
[19:07:57] But the current dns setup is fairly rickety and we're hoping to replace it with a proper OpenStack service sometime this year, so I won't spend a lot of time adding features to the existing setup.
[19:08:10] andrewbogott: labs-ns0, what kind of server is that? Powerdns?
[19:08:27] (I just worked on a dns bug yesterday from about 9AM to midnight so my view is a bit jaundiced just now)
[19:08:33] yes, pdns backed with ldap.
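[Editor's note: a quick way to reproduce the forward/reverse asymmetry being debugged above, querying the labs nameserver directly so local resolvers can't "be clever".]

    # Forward lookup (works):
    dig +short tools-login.wmflabs.org @labs-ns0.wikimedia.org

    # Reverse (PTR) lookup (NXDOMAIN at the time of this log):
    dig +short -x 208.80.155.130 @labs-ns0.wikimedia.org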
[19:11:21] labs-ns0 is currently an alias pointing to virt0
[19:11:25] and labs-ns1 to virt1000
[19:11:39] andrewbogott: http://www.ietf.org/rfc/rfc4183.txt :P
[19:12:43] So you would have to create 255-128.155.80.208.in-addr.arpa on labs-ns0 and fill it like you fill the forward zone (put PTR instead of A)
[19:44:32] !petan-build
[19:44:32] make -j `getconf _NPROCESSORS_ONLN` deb-pkg LOCALVERSION=-custom
[19:52:11] Last time I checked (recently) dns support in openstack still sucked, but writing your own hacky hooks to update pdns is too easy :D
[20:25:53] hashar: I have no idea what most of that email means :p
[20:29:27] hello a930913 :-]
[20:29:52] well that is mostly an internal email except it is on a public list
[20:30:17] a930913: the Beta cluster is the wikimedia cluster built on labs. We are using it to test out code before deploying it in production
[20:30:25] and
[20:30:47] we are migrating it from the pmtpa datacenter to the eqiad datacenter just like all the labs projects
[20:35:35] (03CR) 10Hashar: [C: 031] add bugzilla/modifications repo to ops channel [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/119784 (owner: 10Dzahn)
[20:38:40] Okay. I'm gonna try to get May set up so she can ssh into my labs instance.
[20:38:51] First, I assume, she needs to make an account on wikitech, yes?
[20:38:54] after that, then what?
[20:39:02] hey jorm!
[20:39:13] jorm: she needs to be 'approved' first. This usually happens quickly (I can do it)
[20:39:20] jorm: then she needs to set up an ssh key in preferences on wikitech
[20:39:55] jorm: then you can add her to your project via https://wikitech.wikimedia.org/wiki/Special:NovaProject
[20:40:10] jorm: she *should* be able to ssh in after that
[20:41:02] YuviPanda: we have a grrrit conf change for you https://gerrit.wikimedia.org/r/#/c/119784/ :] from mutante
[20:41:06] jorm: do poke me after she's made an account, I can approve immediately (I'll be around for another hour)
[20:41:10] hashar: yeah, was just about to go into that :)
[20:41:17] So, create an account first, get approved, create an ssh key, put it in preferences, i add her to the project.
[20:41:30] jorm: yup
[20:41:47] hashar: https://gerrit.wikimedia.org/r/#/c/112311/ - would be nice if you or mutante can rebase over that (should be trivial)
[20:42:34] YuviPanda: done. she is violetto
[20:42:40] jorm: moment
[20:42:50] YuviPanda: or merge the trivial change then rebase the rewrite on top of it :-]
[20:43:43] jorm: approved.
[20:44:11] is there a page that documents this? how labs wants ssh keys and such?
[20:44:31] jorm: yeah. moment.
[20:44:44] jorm: https://wikitech.wikimedia.org/wiki/Help:Access
[20:49:08] (03CR) 10Yuvipanda: [C: 04-1] "Doesn't work :(" [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/112311 (owner: 10Adamw)
[20:49:33] (03PS3) 10Yuvipanda: grrrit: Fix check in repo_config for "repos" [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/118028 (owner: 10AzaToth)
[20:49:39] (03CR) 10Yuvipanda: [C: 032 V: 032] grrrit: Fix check in repo_config for "repos" [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/118028 (owner: 10AzaToth)
[20:50:02] (03CR) 10Yuvipanda: [C: 032 V: 032] add Extension:FundraisingChart; notify wikimedia-dev about FR commits [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/119643 (owner: 10Adamw)
[20:50:27] (03PS2) 10Yuvipanda: add bugzilla/modifications repo to ops channel [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/119784 (owner: 10Dzahn)
[20:50:49] (03CR) 10Yuvipanda: [C: 032 V: 032] add bugzilla/modifications repo to ops channel [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/119784 (owner: 10Dzahn)
[20:50:53] hashar: ^ merged :)
[20:50:59] hashar: let me deply
[20:51:04] great :-]
[20:51:15] !log deployment-prep Creating deployment-jobrunner01 and 02 in eqiad.
[20:51:18] Logged the message, Master
[20:51:40] jorm: I'm here to help with May as well, if things aren't sorted yet.
[20:51:55] ah
[20:51:56] interesting
[20:51:59] error: server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none while accessing https://git.wikimedia.org/git/labs/tools/grrrit.git/info/refs
[20:51:59] fatal: HTTP request failed
[20:52:07] Coren: andrewbogott ^
[20:52:25] it can fetch from github fine
[20:52:28] YuviPanda: I need more context for that
[20:52:47] (03PS1) 10Dzahn: add fake passwords for bugzilla to fix puppet [labs/private] - 10https://gerrit.wikimedia.org/r/119871
[20:52:52] andrewbogott: Coren ah, so I did a 'git fetch gerrit', to URL gerrit https://git.wikimedia.org/git/labs/tools/grrrit.git (fetch)
[20:52:59] andrewbogott: and it errored out with ssl errors.
[20:53:13] but fetching github over ssl worked.
[20:53:37] andrewbogott: this is on tools
[20:54:05] YuviPanda: ok, but the issue is with the gerrit cert, right?
[20:54:08] Like, the production gerrit?
[20:54:16] (03CR) 10Dzahn: [C: 032] add fake passwords for bugzilla to fix puppet [labs/private] - 10https://gerrit.wikimedia.org/r/119871 (owner: 10Dzahn)
[20:54:31] andrewbogott: well, fetching works for me from my local machine.
[20:54:35] andrewbogott: and yes, production gerrit cert.
[20:54:37] !log wikimania-support Updated wikimania-scholarships to cb2ef4c
[20:54:38] YuviPanda: deply?
[20:54:39] Logged the message, Master
[20:55:20] AzaToth: deploy i mean
[20:55:23] ツ
[20:55:27] !log deployment-prep deleting deployment-jobrunner02 , let's start with a single instance for now
[20:55:29] Logged the message, Master
[20:58:29] (03CR) 10Dzahn: [V: 032] add fake passwords for bugzilla to fix puppet [labs/private] - 10https://gerrit.wikimedia.org/r/119871 (owner: 10Dzahn)
[21:00:21] !log deployment-prep migrate jobrunner01.eqiad.wmflabs to self puppet/salt masters
[21:00:24] Logged the message, Master
[21:03:56] bd808: I somehow screwed up puppet on deployment-scap :-D
[21:04:18] hashar: shame on you :p
[21:04:32] Oh it was a wreck anyway.
[21:04:40] bd808: http://paste.openstack.org/show/73947/ :D
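[Editor's note: a generic way to dig into the certificate-verification failure reported at [20:51:59], comparing what the server presents against the same CA bundle git was using; nothing tool-specific is assumed.]

    # Show the verification result against git's CA bundle:
    openssl s_client -connect git.wikimedia.org:443 \
        -CAfile /etc/ssl/certs/ca-certificates.crt </dev/null
    # Inspect the "Verify return code:" line at the end of the output,
    # and compare with a host that works, e.g. github.com:443.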
[21:05:01] I think I'll just delete that instance and start over
[21:05:18] I had a local puppetmaster there that was just a big bag of hacks
[21:05:50] deployment-tin sounds like a better name anyway :)
[21:14:24] did you know there's a .plumbing TLD?
[21:14:30] bd808: or deployment-marionette :-D
[21:14:53] * hashar registers mario.plumbing
[21:16:43] bah
[21:16:50] bd808: deployment-salt died :(
[21:17:07] err -scap
[21:17:09] oh you deleted it
[21:17:10] handy!
[21:17:44] Yeah I nuked that poor guy. I'll start over tonight on the new deployment-tin instance
[21:17:52] bd808: on beta the -bastion is more or less the equivalent of tin
[21:18:12] i.e. -bastion is where folks run mwscript and jenkins updates the code / runs l10n etc
[21:18:18] I could do it all there I suppose
[21:18:34] I was worried about breaking the pmtpa setup before
[21:18:41] would expect -bastion to be the machine from which we run scap
[21:18:41] Actually I still am.
[21:18:48] break eqiad! :-]
[21:23:54] * Coren is back for a bit, will be back for real after dinner (~1h from now)
[21:24:58] how is it possible to have puppetmaster::self enabled, run puppet, but it doesn't care about changes in /etc/puppet/ ?
[21:25:30] is instance "boogs" self hosted or not?
[21:26:28] What's happened to the tools server? My application is returning "No webservice", and attempting to log in to tools-login.wmflabs.org throws up an ssh warning message.
[21:26:57] JMarkOckerbloom: https://wikitech.wikimedia.org/wiki/Tool_Labs/Migration_to_eqiad
[21:28:34] Does anyone know about the pubsubhubbub project? It has an eqiad instance but no one has taken ownership on the progress page...
[21:29:35] thanks for the beta work hashy
[21:29:58] * Damianz finishes reading email and debates between moving the monitoring instance and doing work
[21:30:22] !log deployment-prep manually installing timidity-daemon on jobrunner01.eqiad so puppet can stop it and stop whining
[21:30:25] Logged the message, Master
[21:33:58] Thanks! Reset ssh key, logged back into tools-login.
[21:34:47] Then did become ftl (looks like everything's there); finish-migration ftl; webservice restart.
[21:35:14] But I'm still getting the "no webservice" message.
[21:35:37] JMarkOckerbloom: type $ qstat
[21:36:18] Seeing a job there ( lighttpd-f tools.ftl ). Is this something that I have to wait to complete, or unwedge somehow?
[21:36:48] JMarkOckerbloom: should be in state (r) = running
[21:36:59] it's in state qw.
[21:42:02] Job shows submit/start time of 10 minutes ago. Not sure what's holding it up.
[21:43:07] JMarkOckerbloom: Hmmm, seems it's hanging. try $ webservice stop and restart it again.
[21:45:35] Restarted. It's a Perl script; Perl seems to be in the place it was before, as does the library directory. I do a "use lib" of a couple of directories that don't exist on this machine (so it'll look both in the place this installation uses and the place another installation uses), but that shouldn't hang it, I wouldn't think.
[21:47:13] OK, it's now into r state, and my error message has changed to "Four hundred and four!"
[21:47:25] (03PS1) 10Tim Landscheidt: become: Add --help option [labs/toollabs] - 10https://gerrit.wikimedia.org/r/119882
[21:47:45] JMarkOckerbloom: fine. If you used apache and .htaccess before, you have to tweak this configuration in the new .lighttpd.conf
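[Editor's note: for reference alongside the qstat exchange above — the web service runs as a Grid Engine job, so the state codes are Grid Engine's.]

    # List your tool's jobs (run as the tool user, after "become <tool>"):
    qstat
    # ...or every job on the grid:
    qstat -u \*
    # Common states: qw = queued/waiting (not yet scheduled), r = running.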
[21:47:50] !newweb
[21:47:50] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb
[21:49:26] JMarkOckerbloom: everything is being served from public_html, so either symlink or url-rewrite in lighttpd.conf
[21:50:34] I don't see a .htaccess file in my directory. Looking into how to set up a lighttpd.conf file.
[21:51:13] (03PS2) 10Tim Landscheidt: become: Add --help option [labs/toollabs] - 10https://gerrit.wikimedia.org/r/119882
[21:53:57] JMarkOckerbloom: maybe your .lighttpd.conf needs a config for perl cgi
[21:54:04] JMarkOckerbloom: http://www.cyberciti.biz/tips/lighttpd-howto-setup-cgi-bin-access-for-perl-programs.html
[22:05:20] Ah, got it working!
[22:05:43] For the record, what worked was moving cgi-bin under public_html, and then putting this in .lighttpd.conf:
[22:05:51] $HTTP["url"] =~ "^/ftl/cgi-bin" {
[22:05:59] cgi.assign = ( "" => "/usr/bin/perl" )
[22:06:05] }
[22:06:13] JMarkOckerbloom: great. I'll put it in the docs!
[22:06:57] Thanks so much for your help! Do I need to record the migration somewhere?
[22:07:30] JMarkOckerbloom: if you want you can add your sign here https://wikitech.wikimedia.org/wiki/Tool_Labs/Migration_to_eqiad#Migration_status.2Fnotes
[22:09:41] !log deployment-prep Migrated videoscaler01 to use self salt/puppet masters.
[22:09:44] Logged the message, Master
[22:09:50] bd808: two more instances created tonight \O/
[22:10:12] hashar: You're a champ!
[22:10:22] merely applying the procedure :-]
[22:12:35] One other quick policy question: My app sets a cookie that expires at end of session, but it might be nice to have it persist longer. Can I put in a 30-day expire time, or does that require review? (Looks like the regular WP signin cookie lasts that long.)
[22:13:30] !log rebased deployment-salt git puppet repo 2f2d17e..4179490
[22:13:31] rebased is not a valid project.
[22:14:04] JMarkOckerbloom: my longest-living cookie lives 3000 days. there's no limit for that.
[22:15:39] oh, okay. wasn't sure if there was a privacy policy against long-term tool cookies. I think 30 days would suffice for my tool.
[22:16:39] (what's the best reference for the policies the tools should follow? I know about the "no logging IP addresses" rule, but right now I don't get the IP address to work with in the first place.)
[22:17:53] bd808: thank you :-] Have a good afternoon, I am off.
[22:18:10] * bd808 waves hashar to bed
[22:20:19] JMarkOckerbloom: as you noted, the no-ip policy for tools is pointless as you can't access the raw logfiles at all. https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Web_logs. For the rest:
[22:20:21] https://www.mediawiki.org/wiki/Wikimedia_Labs/Agreement_to_disclosure_of_personally_identifiable_information
[22:25:29] thanks. I gather some tools can get approval for using IP addresses, though (a couple OCLC-related apps get them). I'd be using them for targeting redirects, and not logging or retaining them in any way.
[22:26:17] JMarkOckerbloom: best to ask Coren about this, he should be back in an hour or so.
[22:26:36] OK, maybe I'll try logging in later then. Thanks!
[22:26:46] JMarkOckerbloom: yw
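[Editor's note: consolidating the lighttpd advice from this log into one sketch of a tool's ~/.lighttpd.conf; the snippet is merged into the per-tool lighttpd configuration, and "ftl" is the example tool above.]

    # More verbose request logging, as suggested at [00:38:45]:
    debug.log-request-handling = "enable"

    # Perl CGI under the tool's URL prefix, per the working config above;
    # the scripts live in public_html/cgi-bin:
    $HTTP["url"] =~ "^/ftl/cgi-bin" {
        cgi.assign = ( "" => "/usr/bin/perl" )
    }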
[22:59:06] Hrm [22:59:08] I don't know the details though. [22:59:40] andrewbogott: apropos of ^^, I useta could "ssh -A bastion.wmflabs.org" and from there ssh to e.g. deployment-bastion, but the deployment-foo hosts no longer recognize my ssh rdwrer [23:00:06] "useta could" :))) [23:00:15] Permission denied (publickey) [23:00:20] That's probably because you're trying to connect to instances that have been shut down [23:00:22] maybe. [23:00:24] rdwrer: I am a child of the [23:00:32] rdwrer: I am a child of the South [23:00:38] *nod* I'm a fan [23:01:06] I would look at the instance list but am not on the beta project [23:01:09] hm, no, betalabs-bastion seems to still be there. Lemme see if it works for me [23:02:46] andrewbogott: I think internal DNS routes correctly to new hosts on eqiad eg "deployment-bastion", but I'm guessing the ssh stuff didn't come across [23:03:52] chrismcmahon: I can log into both deployment-bastion.eqiad.wmflabs and deployment-bastion.pmtpa.wmflabs with my personal (non-root) key. So I don't know what's happening on your end. [23:04:09] Do you have reason to think your key is forwarded properly? Can you access other labs instances? [23:04:24] andrewbogott: hmm, me neither. not my biggest priority right now, but can you help out rdwrer ? [23:06:14] Maybe… rdwrer what is your exact question? [23:06:29] andrewbogott: I want to see why my config change didn't take effect [23:06:59] https://gerrit.wikimedia.org/r/119886 [23:07:05] I mean… what is it that you are doing that used to work but now doesn't? [23:07:21] https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update/2254/ should have deployed it [23:07:29] andrewbogott: Well, deploying config changes to betalabs [23:07:36] But it may be that there's an issue with how I configured it [23:07:39] But I can't tell [23:07:44] Oh -- I don't have anything to do with betalabs, other than providing the framework that it runs on. [23:07:54] Because I'm not on the beta project and chrismcmahon isn't either for some reason [23:08:04] So adding me to betalabs would be an acceptable first step [23:09:28] rdwrer, did you change your handle? This is confusing :) [23:09:46] what is your wikitech name? [23:10:06] andrewbogott: marktraceur [23:10:15] LIFE IS CONFUSING [23:10:17] THEN YOU DIE [23:10:48] ok, added you to the project. But I can't offer more guidance than that, you'll have to locate a deployment-prep admin (e.g. hashar) [23:12:48] That's fine [23:16:14] Definitely not able to SSH to deployment-tin.eqiad.wmflabs [23:16:24] But sshing to the IP addresses from bastion2 seems to work fine [23:16:36] rdwrer: deployment-tin is probably broken [23:16:50] I just started building it [23:17:00] Ah. [23:17:03] you really want to go to deploymnet-bastion.pmtpa [23:17:06] OK [23:17:11] except spelled right [23:17:20] That one seems to work [23:17:35] That's the "real" beta still [23:19:45] rdwrer: If you do need to get into an eqiad instance, don't forget to use bastion-eqiad.wmflabs.org as the bastion into labs [23:19:59] I am [23:20:16] And it looks like my config patch got in there [23:22:21] Seems like the apaches didn't get it though [23:22:43] Oh, wait, yeah they did [23:22:44] WTF. [23:23:16] bd808: The logs go to deployment-fluoride? 
[23:23:45] deployment-bastion in /data/project/logs [23:25:56] Coren: despite or due to your iron principles ;) , if you want to do a good deed on this issue: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Revision_history.2C_Edits_by_user_is_503 [23:26:48] Coren: you should check the webservice status, I think it's simply stuck in qw state (as seen with other tools before) [23:27:30] Coren: just copying the files w/o tweaking worked out of the box https://tools.wmflabs.org/newwebtest/index.html [23:27:55] hedonil: I'm not sure I understand what you mean? [23:28:53] Coren: scottywong did a webservice start, but it never started to work. [23:29:24] Coren: it's https://tools.wmflabs.org/usersearch/ [23:31:02] hedonil: Is it still stuck? I understand that it solved itself somehow. [23:31:26] scfc_de: it's still no webservice [23:31:31] Maybe the issue is I need to stick the variables in the -labs versions of the config too [23:32:19] I thought both of them got run, guess not [23:32:24] hedonil: My understanding is that Σ will look at it? [23:33:35] hedonil: What tool is this/ [23:33:45] Coren: I read he was added to the maintainers crew, but no action took place since then (as far as I can see) [23:33:59] Coren: it' s https://tools.wmflabs.org/usersearch/ [23:34:26] hedonil: I don't see any job in qw ("qstat -u \*"). [23:34:29] hedonil: There is, indeed, no running webservice. [23:38:29] Still not seeing it, goddamn. [23:41:26] Oh, lol, because the patch isn't merged. [23:45:18] rdwrer: yeah, merging would probably help [23:45:28] Indeed [23:45:59] !log deployment-prep Converted deployment-tin to use local puppet & salt masters [23:46:01] Logged the message, Master [23:46:51] !log deployment-prep Mounted secondary disk as /var/lib/elasticsearch on deployment-logstash1 [23:46:56] Logged the message, Master