[05:37:04] rschen7754: yes [05:37:17] petan: i'm having problems migrating stewardbots [05:37:33] /usr/local/bin/migrate-tool: line 32: ...MIGRATE.STATUS: Permission denied [05:37:56] let me check [05:39:24] which user are you running the script from? [05:39:55] rschen7754 [05:43:44] rschen7754: I guess this is some nfs bug, I know one cheat [05:43:49] ok [05:43:56] rschen7754: type "sg local-stewardbots bash" [05:44:01] rschen7754: then try again [05:44:04] ok [05:44:31] no error this time, let's see if it works [05:45:36] I think it will [05:45:58] ok, looks like it worked [05:45:59] thanks [05:46:00] ! [05:46:00] There are multiple keys, refine your input: !log, $realm, $site, *, :), ?, {, access, accessproblems, account, account-questions, accountreq, add, addresses, addshore, afk, airport-centre, alert, amend, ask, awstats, bang, bastion, be, beta, bible, blehlogging, blueprint-dns, BOM, BOM2, BOManswer, borg, bot, bots, botsdocs, broken, bug, bz, channels, chmod, cloak, cmds, console, cookies, coren, Coren, cp, credentials, cs, Cyberpower, Cyberpower678, cyberpowerresponse, damianz, damianz's-reset, db, del, demon, dependency, deployment-beta-docs-1, deployment-prep, derp, derpie, doc, docs, domain, dumb, enwp, epad, es, etherpad, evil, excuse, explain, extension, failure, false, FastLizard4, fff, filemoves.py, flow, FORTRAN, forwarding, freenode, gc, gerrit, gerritsearch, gerrit-wm, ghsh, git, git-puppet, gitweb, google, group, grrrit-wm, hashar, help, helpmebot, herpie, hexmode, hodor, home, htmllogs, hyperon, icinga, ident, IE6, info, initial-login, instance, instance-json, instancelist, instanceproject, ip, is, jenkins, jira, keys, labs, labsconf, labsconsole, labsconsole.wiki, labs-home-wm, labs-l, labs-morebots, labs-nagios-wm, labs-project, labs-putty, labstore3, labswiki, leslie's-reset, lighttpd, limitations, link, linux, load, load-all, log, logs, logsearch, mac, magic, mäh, mail, manage-projects, mediawiki, mediawiki-instance, meh, memo, migration, 
mob, mobile-cache, monitor, morebots, msys-git, mw, nagios, nagios.wmflabs.org, nagios-fix, namespaces, nc, newgrp, newlabs, newlabs2, newlabs-rl, new-labsuser, new-ldapuser, newweb, night, nocloakonjoin, nova-resource, op_on_duty, openstack-manager, origin/test, os-change, osm, osm-bug, paf, pageant, pang, password, pastebin, pathconflict, pc, perl, petan, petan..., petan:ping, petan-build, petan-forgot, ping, pl, po*of, pong, poof, poofing, port-forwarding, project-access, project-discuss, projects, proxy, pung, puppetmaster::self, puppetmasterself, puppet-variables, putty, pxe, pypi, python, pythonguy, pythonwalkthrough, queue, quilt, ragesoss, rb, reboot, redis, reedy-spam-hour, relay, remove, replicateddb, report, requests, resource, revision, rights, rq, rt, rules, Ryan, Ryan_Lane, ryanland, sal, SAL, say, screenfix, search, searchlog, security, security-groups, seen, self, sexytime, shellrequests, single-node-mediawiki, snapshits, socks-proxy, somethingtorelax, srv, ssh, sshkey, start, stats, status, Steinsplitter, StoneB, stucked, sudo, sudo-policies, sudo-policy, sumanah, svn, t13, taskinfo, tdb, Technical_13, terminology, test, Thehelpfulone, tmh, todo, tooldocs, tools-admin, toolsbeta, tools-bug, tools-equiad, toolsmigration, tools-request, toolsvslabs, tools-web, trout, tunnel, tygs, unicorn, valhallasw, venue, vim, vmem, we, whatIwant, whitespace, whyismypackagegone:'(, wiki, wikitech, wikitech-putty, wikiversity-sandbox, windows, wl, wm-bot, wm-bot2, wm-bot3, wm-bot4, wmflabs, xy, you, [05:46:03] ... [05:52:57] !! is hello :) [05:52:58] Key was added [05:53:00] ! 
[05:53:00] There are multiple keys, refine your input: !, !log, $realm, $site, *, :), ?, {, access, accessproblems, account, account-questions, accountreq, add, addresses, addshore, afk, airport-centre, alert, amend, ask, awstats, bang, bastion, be, beta, bible, blehlogging, blueprint-dns, BOM, BOM2, BOManswer, borg, bot, bots, botsdocs, broken, bug, bz, channels, chmod, cloak, cmds, console, cookies, coren, Coren, cp, credentials, cs, Cyberpower, Cyberpower678, cyberpowerresponse, damianz, damianz's-reset, db, del, demon, dependency, deployment-beta-docs-1, deployment-prep, derp, derpie, doc, docs, domain, dumb, enwp, epad, es, etherpad, evil, excuse, explain, extension, failure, false, FastLizard4, fff, filemoves.py, flow, FORTRAN, forwarding, freenode, gc, gerrit, gerritsearch, gerrit-wm, ghsh, git, git-puppet, gitweb, google, group, grrrit-wm, hashar, help, helpmebot, herpie, hexmode, hodor, home, htmllogs, hyperon, icinga, ident, IE6, info, initial-login, instance, instance-json, instancelist, instanceproject, ip, is, jenkins, jira, keys, labs, labsconf, labsconsole, labsconsole.wiki, labs-home-wm, labs-l, labs-morebots, labs-nagios-wm, labs-project, labs-putty, labstore3, labswiki, leslie's-reset, lighttpd, limitations, link, linux, load, load-all, log, logs, logsearch, mac, magic, mäh, mail, manage-projects, mediawiki, mediawiki-instance, meh, memo, migration, mob, mobile-cache, monitor, morebots, msys-git, mw, nagios, nagios.wmflabs.org, nagios-fix, namespaces, nc, newgrp, newlabs, newlabs2, newlabs-rl, new-labsuser, new-ldapuser, newweb, night, nocloakonjoin, nova-resource, op_on_duty, openstack-manager, origin/test, os-change, osm, osm-bug, paf, pageant, pang, password, pastebin, pathconflict, pc, perl, petan, petan..., petan:ping, petan-build, petan-forgot, ping, pl, po*of, pong, poof, poofing, port-forwarding, project-access, project-discuss, projects, proxy, pung, puppetmaster::self, puppetmasterself, puppet-variables, putty, pxe, pypi, python, 
pythonguy, pythonwalkthrough, queue, quilt, ragesoss, rb, reboot, redis, reedy-spam-hour, relay, remove, replicateddb, report, requests, resource, revision, rights, rq, rt, rules, Ryan, Ryan_Lane, ryanland, sal, SAL, say, screenfix, search, searchlog, security, security-groups, seen, self, sexytime, shellrequests, single-node-mediawiki, snapshits, socks-proxy, somethingtorelax, srv, ssh, sshkey, start, stats, status, Steinsplitter, StoneB, stucked, sudo, sudo-policies, sudo-policy, sumanah, svn, t13, taskinfo, tdb, Technical_13, terminology, test, Thehelpfulone, tmh, todo, tooldocs, tools-admin, toolsbeta, tools-bug, tools-equiad, toolsmigration, tools-request, toolsvslabs, tools-web, trout, tunnel, tygs, unicorn, valhallasw, venue, vim, vmem, we, whatIwant, whitespace, whyismypackagegone:'(, wiki, wikitech, wikitech-putty, wikiversity-sandbox, windows, wl, wm-bot, wm-bot2, wm-bot3, wm-bot4, wmflabs, xy, you, [05:53:03] meh [05:53:08] ! is hello :) [05:53:09] Key was added [05:53:10] meh [05:53:15] . [05:53:16] ! [05:53:16] hello :) [05:54:29] !Coren [05:54:29] Coren is dead. petan killed him. He now roams about as a zombie. 
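An aside on petan's `sg local-stewardbots bash` workaround near the top of the log: `sg` runs a command (or shell) with a different *primary* group, which helps when an NFS server with this kind of bug only honours the primary gid for permission checks. A minimal sketch; `local-stewardbots` is the group from the log, so this demo substitutes the invoking user's own group to stay runnable anywhere:

```shell
# sg <group> -c <command> runs <command> with <group> as the primary gid
# (the user must already be a member of <group>).
# In the log: "sg local-stewardbots bash", then re-run migrate-tool.
grp="$(id -gn)"        # stand-in for local-stewardbots
sg "$grp" -c 'id -gn'  # the command now sees $grp as its primary group
```

`newgrp` does the same for an interactive shell; `sg` is the scriptable variant.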
[05:54:36] :o [05:54:46] lol [05:54:51] @infobot-detail Coren [05:54:52] Info for Coren: this key was created at 6/17/2013 6:06:56 PM by Cyberpower678, this key was displayed 11 time(s), last time at 12/5/2013 2:06:13 PM (101.15:48:38.0162220 ago) this key is normal [05:55:32] hey what [05:55:45] this is a bug meh [05:55:52] I just displayed that key [05:55:59] !ping [05:56:00] !pong [05:56:04] @infobot-detail ping [05:56:04] Info for ping: this key was created at 8/26/2013 10:25:56 AM by T13|sleeps, this key was displayed 103 time(s), last time at 2/21/2014 10:51:37 AM (23.19:04:27.2278470 ago) this key is normal [05:56:11] !ping [05:56:11] !pong [05:56:13] @infobot-detail ping [05:56:13] Info for ping: this key was created at 8/26/2013 10:25:56 AM by T13|sleeps, this key was displayed 103 time(s), last time at 2/21/2014 10:51:37 AM (23.19:04:36.4093870 ago) this key is normal [07:19:29] a930913: woo! are you using websockets?! [10:32:49] Anyone around that could allocate a project another IP for me? :) [11:06:22] YuviPanda: SSEs. [11:07:54] Coren, yt? [11:46:33] Dropdowns are empty for me at https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=create&project=dumps&region=eqiad [12:33:20] Nemo_bis, yep - I had the same complaint. guess we'll have to wait for andrewbogott_afk or Coren to come back online [12:34:32] ok [13:45:12] @notify Coren [13:45:12] This user is now online in #wikimedia-labs. I'll let you know when they show some activity (talk, etc.) [13:51:15] hello [13:51:22] anyone has any clue why DNS is dead ? [13:51:35] wondering whether the issue is known or not [13:51:58] "Failed to create new proxy hatjitsu.wmflabs.org." [13:52:13] Coren: ^ [13:52:29] No clue. I just got here. [13:53:00] Good morning Coren! [13:53:11] filed the useless bug https://bugzilla.wikimedia.org/show_bug.cgi?id=62731 to track the DNS being dead :] [13:53:18] DNS seems quite dead indeed.
[13:53:18] can't access http://wikidata-jenkins.wmflabs.org/ci [13:53:27] * Coren goes to check. [13:53:28] still in tampa (being moved) [13:53:28] the labs-ns0 and labs-ns1 yield nothing [13:55:23] Restarting the pdns fixed 'em, at least. [13:55:36] MaxSem: Try to create the instance again, see if that was the root cause? [13:56:03] Coren: works for me now [13:56:43] Coren: ever fixing that bug [13:56:46] ? [13:57:13] Coren, the instance's already here [13:57:18] zhuyifei1999_: As I told you, bugzilla squishing takes place /after/ migration (which begins its last phase today) [13:57:26] (it's not a newly created one) [13:57:46] Coren, now works, thanks [13:58:05] Coren: I have not received any messages about that [13:58:24] zhuyifei1999_: I told you when you asked me the first time. :-) [13:58:27] Coren: how do i mark tools as moved? (manually) [13:58:59] aude: You can put a note in the table on https://wikitech.wikimedia.org/wiki/Tool_Labs/Migration_to_eqiad [13:59:07] ok [14:00:02] Coren: thank you :-] [14:14:49] Coren: how can I use /data/scratch [14:15:04] it doesn't let me write there [14:16:29] petan: Ah, huh. I probably should have turned on writing to it shouldn't I? :-) Hang on. [14:16:41] Coren: that explains why nothing is there [14:16:42] :P [14:17:07] Coren: maybe just create folder tmp with 1777 there [14:17:07] petan: You should be able to write there now. Suggestion: treat it like a /tmp; use a subdirectory. [14:17:56] petan: /data/scratch itself is sticky. [14:18:50] nobody nogroup ? [14:18:51] :D [14:18:57] petan: When I said "treat it like a /tmp" I didn't mean "create a /tmp in it". :-) It doesn't harm, it's just pointless. :-) [14:19:02] what filesystem is there, fat? :D [14:19:21] Root squash [14:19:47] Root has no magic powers on that filesystem. [14:19:55] it's a bit like committing tool suicide! [14:20:01] aha so how would I create a folder there that normal user can use?
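Coren's suggestion above (treat /data/scratch like a /tmp; use a subdirectory) relies on the sticky bit: in a mode-1777 directory anyone may create entries, but only an entry's owner can delete or rename it. A sketch with a throwaway directory standing in for /data/scratch; the subdirectory name `mytool` is made up:

```shell
scratch="$(mktemp -d)"        # stand-in for /data/scratch
chmod 1777 "$scratch"         # world-writable + sticky, like /tmp
mkdir -p "$scratch/mytool"    # your own working area inside it
stat -c '%a' "$scratch"       # prints 1777
```

Because root is squashed on the share, even root cannot override these permissions from a client, which is why the per-user subdirectory approach is the practical one.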
[14:20:10] * aude start new db for my tool in eqiad with new data [14:20:16] petan: /data/scratch itself is sticky. [14:33:03] today? o.O [14:33:17] Coren: whole pmtpa? [14:34:03] Ah, yes, that message is ambiguous [14:36:30] speaking of scratch: err: /Stage[main]/Role::Labs::Instance/File[/data/scratch]/mode: change from 1777 to 0755 failed: failed to set mode 1777 on /data/scratch: Operation not permitted - /data/scratch [14:37:10] hashar: Hm. puppet seems to be overspecified. Fixing. [14:39:15] hashar: https://gerrit.wikimedia.org/r/119053 [14:40:28] Actually, even owner and group are overspecified in this case. [14:40:30] Coren: :-] [14:40:49] so the NFS server handles the permissions right? [14:41:39] hashar: Well, that's where they'd be adjusted from once the mount has taken place. The ensure=>directory is only useful to create the mountpoint before the first mount. [14:42:18] So pretty much only on first boot or in case of someone doing a rmdir by accident (after having unmounted the filesystem, which'd take some doing). [14:45:51] hashar: Merged. That should make puppet stfu. [14:46:11] !log deployment-prep manually purging all commonswiki archived files (on beta of course) [14:46:13] Logged the message, Master [14:46:22] Coren: trying :] [14:47:30] Coren: the beta apaches are almost passing puppet \O/ Gotta setup Apache now. [14:49:39] what? reboot? [14:50:07] Danny_B: where [14:50:13] Danny_B: It's more final than just "reboot". It's the last reboot. [14:50:31] and how can i connect then? where will my data be? [14:50:49] don't reboot until it's solved pls [14:50:52] o/~ Con te partiro o/~ [14:51:02] ?? [14:51:16] Danny_B: ... where have you been the last couple of months? Have you not migrated your tools yet? [14:52:11] nope, didn't even know to do so. give me some time to back up then [14:52:41] Danny_B: There have been warnings for a couple months, and instructions for two weeks. Don't worry -- your tools will just end up in the batch migration.
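The puppet fix discussed above (gerrit change 119053) amounts to no longer managing mode/owner/group on the NFS mountpoint: root is squashed on the share, so every puppet run's attempt to re-assert mode 1777 failed with "Operation not permitted". A sketch of the resulting resource; the exact shape of the real change is assumed, not quoted:

```puppet
# Only the mountpoint's existence is managed. ensure => directory is
# needed just to create the path before the first mount (or after an
# accidental rmdir on an unmounted filesystem).
file { '/data/scratch':
    ensure => directory,
    # mode/owner/group intentionally unmanaged: the NFS server is
    # authoritative for permissions once the share is mounted.
}
```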
[14:52:44] Coren: given the amount of migrated tools, I guess that more than 50% of tool labs didn't know [14:53:02] It's too late to do a manual migration now. [14:53:16] i have not seen any warning in motd [14:53:24] again :-/ [14:53:26] Danny_B: labs-l [14:53:36] It's been announced over and over since January. [14:53:43] although we made a deal that such important stuff WILL be in motd [14:54:04] so my ~ will be moved somewhere? [14:55:11] Danny_B: Yeah, to the eqiad mirror. You should be able to login there already, but the contents of your home will get there in the automatic copy which might take an hour or two. [14:55:33] Well, eqiad isn't a mirror anymore. :-) [14:55:56] sigh, too many maillists with too many mails. important things should be always noticed on prominent places such as motd [14:56:15] seems you killed several ppl's irc ;-) [14:56:44] DangSunM|cloud: That seems unlikely; unless they had IRC clients on tools-dev (which, honestly, they probably shouldn't). :-) [14:57:09] Coren: of course they had :P [14:57:16] does everything on a labs instance need to be puppetized for it to survive migration to eqiad? [14:57:18] lot of people like to run irssi in screen [14:58:24] Coren: why they shouldn't? [14:58:38] chippy: General labs instances? No, the actual image is copied, but there may be issues when doing that that will need fixing. If everything is in puppet, it's usually simplest to just recreate a new one rather than use a migrated image. [14:58:39] i actually ran irc for recent changes there [14:58:56] ahh I see, thank you Coren [14:59:10] Danny_B: I mean interactive processes; bots are obviously okay by design. :-) [15:01:50] Danny_B: If you were doing things interactively, lemme force-migrate you by hand. What's your shell account name again?
[15:02:01] !log deployment-prep Starting copying /data/project from pmtpa to eqiad [15:02:02] Logged the message, Master [15:02:04] (with rsync) [15:03:31] Ahh seems like the final party has just begun [15:03:54] Coren: could you pls update https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/tools-login.wmflabs.org [15:04:08] Coren: and add the key for tools-dev [15:04:27] Coren: as long as my full ~ will be migrated, i'm ok with batch [15:04:29] Ah, yes. Now is a good time to do that. :-) [15:04:34] thanks for care [15:04:50] Danny_B: It will, tools as well. [15:05:44] Webservice down? [15:11:19] a930913: No, but the domain name was just repointed; you may be suffering cache. [15:11:33] hedonil: {{done}} [15:11:52] fine [15:17:49] Coren: Ah, no more pmtpa proxying? [15:17:56] * Coren nods. [15:18:59] hello there. [15:19:28] i have trouble logging in to tools-login.wmflabs.org [15:19:44] apparently my ssh-key is gone. [15:20:00] Coren: 208.80.153.201 is old or new? [15:20:42] a930913: Old one. eqiad is 208.80.155. [15:21:45] johl: There are likely to be issues while DNS timeouts propagate. Try tools-login-eqiad.wmflabs.org directly for the next hour or so. [15:23:18] Coren: Ah, I'll just use tools-eqiad.wmflabs for now then. [15:28:04] Coren: thank you. now I'm logged in and became 'file-reuse' (name of my tool). public_html is empty, all the files are gone. [15:28:30] Coren: did a git clone, still won't turn up on http://tools.wmflabs.org/file-reuse/ [15:29:00] johl: If you did not migrate your tools manually during the migration period, it'll be migrated in today's batch. [15:29:35] Coren: thank you! [15:30:14] !log deployment-prep repacking all mwcore extension repositories on the bastion [15:31:07] johl: in eqiad, webservice start [15:31:21] * aude had to do that after i finished migrating [15:32:10] Coren: i guess it's too late for people to migrate with the migrate script? [15:32:38] aude: Indeed.
[15:32:42] ok [15:32:53] It's all being batched up for copy now. [15:34:14] Coren, was this the same dns failure or a new one? [15:35:05] andrewbogott: Different one; both pdns were stuck unable to talk to LDAP, but LDAP was happy; it took a restart of the pdnses, but they came back without issue. [15:35:14] hm [15:35:25] was there any reason to think that opendj was restarted? [15:36:07] andrewbogott: Not by my doing, but I'm pretty sure that was it. I haven't investigated much further since the issue was solved and I have a full plate already. [15:36:14] ok [15:36:36] looks like grrrrt is gone :/ [15:36:42] Coren: I've got yet another issue in eqiad. /var is a 2G partition on the new image and it keeps getting full on my deployment-scap host that is acting as a self-hosted puppetmaster. `apt-get clean` got me ~500M back for now, but a 2G /var seems small especially compared to a 7.5G / [15:38:08] Coren: A housemate just informed me that my emails to her have been getting spam-trapped… I emailed you a couple of times over the weekend, did you get them? (I don't need you to respond immediately, just concerned about the filtering thing.) [15:39:19] bd808: ... having /var on the same partition as / is begging for trouble; and 2G of logs is a /lot/. I think it's the self-hosted puppet that stuffs itself somewhere under /var that causes issues? [15:39:55] Did I change something, or does cron send output to our personal emails now? [15:40:44] a930913: Local mail delivery has been disabled entirely as part of the legal requirements for being able to receive email @tools.wmflabs.org. You were already receiving that email, it just wasn't forwarded to you before. [15:40:54] andrewbogott: Some projects like Wikivoyage were missing from https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration/Progress and had to be added manually; could you make OpenStack produce a list so that it can be cross-checked?
[15:41:05] Well, it was forwarded to the .forward instead of to the wikitech address [15:41:19] scfc_de: Hm, yes, I think so. [15:41:22] weird [15:41:30] valhallasw: Yes, that's true. [15:41:50] valhallasw: Also part of the legal thingies (forward must be to verified addresses) [15:42:12] Coren: Oh yes, of course. [15:42:29] Though honestly, cron spam is a symptom that you should probably look at your crontab and make it quiet. :-) [15:43:00] * Coren tries to understand why some users' homes are so effin big. [15:43:02] Coren: Self-hosted puppet does the git checkout to /var/lib/git and that's ~100M for me right now. [15:43:19] Coren: :p >/dev/null, right? [15:43:19] andrewbogott: I noticed some other differences between Nova instances for example and SMW on-wiki data (I think Andre Klapper's bug report), so it might be worth the while to look at re-syncing that. [15:43:24] It actually looks like my culprit is the l10nupdate role [15:43:42] scfc_de: are you interested/able to do the cross-check? Here's a complete list: https://dpaste.de/nspN [15:43:47] a930913: That's the big hammer of stfu, yes. :-) [15:43:55] andrewbogott: I'll take a look. [15:43:59] thank you! [15:44:02] bd808: Yeah, I was about to say that 100M seemed manageable. [15:44:29] Coren: /var/lib/l10nupdate is 928M o_O. I'll look into that. Full mediawiki + extensions checkout [15:44:35] Coren: Should there be anything in the jsub return that I should be listening to? :p [15:44:53] bd808: Yow! That should probably live in project space instead. [15:44:59] a930913: depends. If the grid is broken, it will spew error messages [15:45:18] a930913: Not on stdout; you might want to just add -quiet though -- that'll make it quiet when everything is fine. [15:45:44] (Which is what most people want in cron) [15:45:55] Coren: <3 [15:46:25] Well, the stderr messages typically are also not very informative.
'Segmentation Fault' 'Out of Memory' 'Use of uninitialized value $prog in scalar chomp at /usr/bin/jsub line 112.' 'Unable to run job: unable to send message to qmaster using port 6444 on host "tools-master.pmtpa.wmflabs": got send error.' [15:46:56] valhallasw: No, but their /presence/ is. [15:47:00] it would be great to separate 'well, stuff is broken, will probably be fixed by coren' messages from the 'I made a booboo' messages [15:47:21] Moderately. There's nothing I can do to fix the issue, except waiting, typically. [15:47:37] but it's good to know a job did not run, in general [15:49:20] * Coren groans. [15:49:27] Seriously? 38G homes? [15:50:43] Does that include the directories of users who did migrate-user? [15:51:00] scfc_de: ... no; I'm talking about *one* home. [15:51:24] Uh. [15:51:35] Hmm resolves fine. web doesn't (still 208.80.153.201, even from console) [15:51:43] Coren: well, actually, one of the emails was about nfs not working with lucid… on second thought I am sort of blocked by that one :( [15:51:50] hedonil: The joys of DNS TTL [15:52:15] andrewbogott: ... yeah, that one I /did/ get; I'm still boggling over how in hell we'll make that work. [15:52:25] Oh, it's hard, huh? :( [15:52:28] andrewbogott: How many lucids do you still have? [15:52:35] I was hoping it'd just be a simple flag. [15:52:49] andrewbogott: 4 /really/ doesn't work the same as 3. [15:52:50] Coren: Probably not very many. I'll skip that one for now and see if I hit any others. [15:53:19] andrewbogott: We might have to work around the few lucids by setting up a small fileserver just for them until they upgrade. [15:53:28] eek, ok. [15:53:34] With luck none of them will turn out to matter... [15:53:53] andrewbogott: Alternately, by giving them local disk to replace the share. [15:53:57] yep [15:54:08] For now "It's hard" is a perfectly useful answer. [15:54:26] andrewbogott: But yeah, AFAICT that's the only email I got from you over the weekend.
I'll check my junk folder a bit later. [15:54:42] thanks. There should've been another one about sql & security groups [15:55:04] but that one really isn't a rush [15:56:27] hashar: is the labs 'gerrit' project yours? Or used by you? [15:56:44] andrewbogott: I guess it belongs to ^d [15:56:54] ^d: ? [15:57:25] * andrewbogott is not looking forward to the torches and pitchforks when he starts turning things off [15:57:27] <^d> Tis mine. [15:57:44] andrewbogott: Missing from the wiki page are the projects: catgraph, datadog, deployment-prep, global-admin, language, logstash, megacron, multimedia, scrumbugz, social-tools, wikistats. [15:57:45] <^d> There's only one instance. I'm setting up a replacement in eqiad but didn't finish Friday. [15:58:31] ^d, so, two questions... [15:58:42] 1) Can you please annotate https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration_Progress accordingly? [15:59:20] <^d> Sure. [16:00:27] scfc_de: logstash is on Labs_Eqiad_Migration/Progress. I'm working on it this week. [16:00:32] 2) Um… what can I do/could I have done to impress on you the importance of doing that? Because I'm worried that there are 50 more projects like that that people care about but they haven't said anything to me yet... [16:01:33] <^d> I knew it was important and I was hoping to have it done last week :( [16:01:36] scfc_de: Do you have time to insert entries for those on the progress page? (If not I can do it in a bit) [16:01:54] ^d: Ah, ok. As long as you knew the game was afoot I'm not so worried. [16:02:05] Mostly just fretting about people being surprised when I stop things. [16:02:32] <^d> The hard deadline is thursday, right? [16:03:08] ^d: That's the deadline for projects that have notes attached. [16:03:29] For unmarked/unclaimed projects (which yours was until a minute ago) the deadline was theoretically this morning.
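Returning to the cron-noise thread from earlier (Coren's ">/dev/null" and "-quiet" advice around 15:42-15:45): a quiet crontab entry for a grid job might look like the following. The schedule, job name, and script are placeholders; `-quiet` suppresses jsub's informational chatter so it only writes when something goes wrong, and the redirect drops routine stdout while stderr still gets mailed:

```
# m h  dom mon dow   command
*/10 * *   *   *     jsub -quiet -N mybot python mybot.py > /dev/null
```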
[16:03:47] * ^d nods [16:03:49] But it'll take me ages to handle them so projects later in the alphabet are safer :) [16:04:19] <^d> Whew, I've got [a-f] in front of me :P [16:05:04] yep [16:07:25] andrewbogott: I'll do that. [16:07:30] thanks! [16:08:08] Coren: tools-login, tools.wmflabs.org are fine now. still 208.80.153.163 (TTL 3600) ;) [16:08:20] bd808: I searched for "Nova Resource"; logstash's entry was "Nova_Resource" (and just checked: It's the only one). [16:08:35] * bd808 nods [16:09:14] hedonil: Yeah, default TTL is 1H [16:09:15] I added it entirely to that page. It was missed by whatever process created the initial content [16:09:31] !log deployment-prep Started transfer of files/thumbs from pmtpa to eqiad. rsync is running in a screen process on deployment-bastion.pmtpa.wmflabs [16:10:01] Coren: I'm gonna need an "apache" user on the eqiad NFS server. The beta cluster files being uploaded belong to apache:apache :-] [16:10:25] ... wait what? [16:10:36] or maybe some better system :-] [16:10:57] Define "uploaded" in context, and why do you preserve that user? [16:10:59] bd808: Yep, that's the underlying problem. Just wanted to explain why my search failed. [16:11:05] files uploaded on beta cluster are stored under /data/project/upload7 , the file hierarchy and files are created by MediaWiki running in Apache as apache:apache. [16:11:26] so Apache would need write access on the shared directory [16:11:42] hashar: ... apache on ubuntu is supposed to run as www-data [16:12:54] Coren: I guess our apache config is slightly different so. [16:13:46] Bleh. Lemme think on that a minute. The obvious workaround is to make the upload dir a+rwx,+t (which is no less secure) [16:14:06] And doesn't require adding yet another non-standard system user to the entire labs cluster.
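Coren's workaround above, "make the upload dir a+rwx,+t", in concrete terms: world-writable plus the sticky bit (octal 1777), so Apache can write regardless of what its user is called, while the sticky bit keeps users from removing each other's files; over the root-squashed NFS share, new files then land as nobody:nogroup. Sketch with a throwaway directory in place of /data/project/upload7:

```shell
upload="$(mktemp -d)"     # stand-in for /data/project/upload7
chmod a+rwx,+t "$upload"  # the "a+rwx,+t" from the log == octal 1777
stat -c '%A' "$upload"    # prints drwxrwxrwt -- note the trailing 't'
```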
[16:14:16] ahh [16:14:49] you are right, the root dir /data/project/upload7 on pmtpa is rwxrwxrwx [16:15:42] Right; that'll cause the uploaded files to belong to nobody:nogroup which is actually even better than a random system user. [16:16:40] :-] [16:17:18] bah my eqiad mount is now readonly :-( [16:17:19] labstore.svc.eqiad.wmnet:/project/deployment-prep/project [16:17:38] The "apache" user is quite likely something that has been carried forward in prod from older versions of ubuntu. We definitely have active prod config that runs as the user "apache" rather than "www-data". [16:18:33] Ewwww [16:18:40] hashar: New instance? [16:18:46] nop [16:19:14] deployment-bastion.eqiad.wmflabs [16:19:16] hashar: ... what? Lemme check, because that's /not/ supposed to happen. [16:19:17] created a while ago [16:19:26] define "a while"? [16:19:42] not sure, uptime is 12 days [16:19:54] Labs down? [16:19:58] ... dafu [16:20:43] Gloria: No. As announced on labs-l for months, in detail two weeks ago, and with a final warning last week, today is final migration-to-eqiad day. Unmigrated stuff is being shuffled around. [16:21:00] though beta is lagging out :-( [16:21:25] Coren: Not subscribed to labs-l. No idea why you think I would be. [16:21:27] Gloria: Speaking of; you have 38G of dumps in your home. Can I blow those away for migration? That'll make your home get to eqiad all that much faster. :-) [16:21:33] Gloria: i don't know if grrrrt is in eqiad yet [16:21:42] if not, then it's batch migrated today [16:22:00] Gloria: Because this is where labs-related announcements go? [16:22:03] Coren: Hmm, can't look myself, but probably fine to kill those, yeah. If they look like typical dumps. [16:22:16] Coren: I've mostly avoided Labs. [16:22:20] Gloria: Yeah, it's two enwiki dumps, one about 5 months old the other 3 or so. [16:22:30] Feel free to delete. [16:22:45] Anyway, I only noticed because grrrit-wm is now down. [16:23:03] Gloria: KK. 
Your home should be in eqiad in ~30 min or so, judging by the current progress. [16:23:11] Not sure that's a show-stopper, but it is kind of disruptive to development. :-) [16:23:23] Same with lolrrit-wm (sp?)'s home? [16:23:28] Gloria: no one moved it [16:23:35] Oh. [16:23:38] it will be auto migrated and someone has to start it [16:23:44] Gloria: Tools are migrated after user homes; but that'll be in a few hours. [16:23:50] (worst case, it's in git) [16:24:01] I think someone has to unpack it. [16:24:04] And then start it. [16:24:09] I read something about tarballs. [16:24:16] Gloria: hmmm [16:24:31] Guess no grrrit-wm today, then. Oh well. Good luck with the migration! [16:24:37] problem with a shared tool, i assumed someone else did it [16:24:43] * aude would have migrated it [16:24:50] Hah. [16:24:53] Diffused responsibility. [16:24:57] Coren: the old friend 'missing trailing slash' seems to be back again: https://bugzilla.wikimedia.org/show_bug.cgi?id=59926 [16:25:39] hedonil: Please to show me a URL showing that symptom [16:26:05] Coren: I am off for now, will be back after dinner though :] [16:26:11] Coren: http://tools.wmflabs.org/tools-info vs. http://tools.wmflabs.org/tools-info/ [16:26:41] hashar: That's not that bug. Apparently, there is something in the tool proper. [16:27:06] hashar: That bug would have redirected to an internal IP (and failed) not give a 500 [16:27:19] That's hedonil, not hashar :-). [16:27:29] D'oh autocomplete fail. [16:27:48] Coren: but worked some hours ago ... [16:28:18] hedonil: Possibly the pmtpa proxy's added indirection made sure the no-slash URL never reached the actual tool. [16:28:45] hedonil: Thus hiding the bug. [16:29:07] hedonil: I'll see what I can do to fake it up after I migrate the rest of the stuff. [16:29:38] Coren: 'k. [16:29:56] hashar: As far as I can tell, this is the "mount too early" bug, it just never was rebooted since. Okay to reboot that box?
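On "The joys of DNS TTL" above (the zone's default TTL is 1H, i.e. 3600 seconds): a resolver that cached the old tools.wmflabs.org record just before the cutover may keep answering with the pmtpa IP until the TTL runs out. The arithmetic, with a placeholder switch time:

```shell
ttl=3600                            # TTL 3600, as hedonil notes in the log
switch="$(date -u -d '15:11' +%s)"  # assumed moment of the DNS repoint
expiry=$(( switch + ttl ))
# Worst case, caches keep serving the old 208.80.153.x address until:
date -u -d "@$expiry" +%H:%M        # prints 16:11
```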
[16:30:53] * Coren will have to find some trickery to put in puppet to sequence the first boot. [16:32:05] valhallasw: YuviPanda: This is what I've been able to do (so far) with your help ^_^ https://tools.wmflabs.org/cluestuff/stats.html [16:32:14] hedonil: Besides, if tools-info relies on the pmtpa login box being reachable, then it can no longer work. [16:32:43] Coren: no it doesn't. ;) [16:33:12] Coren: yeah I can reboot it [16:34:13] a930913: nice! [16:35:01] a930913: although you might want to adjust the typical sample spacing -- all the lows are 0 [16:35:12] andrewbogott: Added the projects to the list with https://wikitech.wikimedia.org/w/index.php?title=Labs_Eqiad_Migration/Progress&diff=104717&oldid=104704. Deployment-Prep, Global-Admin and Social-Tools are red links, i. e. no wiki pages about these projects whatsoever. [16:35:26] thanks! [16:35:35] a930913: it looks to me as if the 'per minute' values are actually '10 second integration time, times 6', or something like that [16:35:47] Coren: sorry can't follow up on deployment-bastion.eqiad.wmflabs having /data/project readonly. I must rush back home. Will be back later after dinner. [16:39:14] valhallasw: Yeah, it's buckets of 2 seconds. Per minutes are the average of the circular buffer, times 30. (Because average of 2s buckets * 30 = 60s.) [16:41:41] Not sure what to do about the lows of 0, if I increase the bucket size, you lose the novelty of graph speed. [16:58:24] I haven't managed to convert .htaccess to lighttpd config [16:59:27] where should I put the lighttpd configuration file? public_html or root directory /data/project/mytoolname ? [17:00:26] do I have to set up something else to make url.rewrite-once work? [17:03:01] please help me! [17:07:22] Ricordisamoa: In the tool's home, outside public_html [17:07:24] Ricordisamoa: Mine is in my tool home. [17:10:14] Coren: can you stick another public IP on the scrumbugz project for eqiad for me?
:) [17:11:32] a930913: sweet :D [17:12:08] addshore: {{done}} [17:12:21] cheers Coren ! [17:16:09] Damianz: Page creations don't make it through to the CBNG feed, do they? [17:27:54] hi Coren [17:28:01] Coren: > become lolrrit-wm [17:28:03] Coren: > sudo: sorry, a password is required to run sudo [17:28:09] * Coren waves at YuviPanda [17:28:25] YuviPanda: Did you migrate the tool from pmtpa before today? [17:28:35] Coren: uh, I... [17:28:36] uh [17:28:37] no [17:28:41] is the tool lost? [17:28:58] I've been waaay behind on my labs-l [17:28:59] Heh. No, but now it's batched for the big bundle of copy; that'll take most of today. [17:29:10] ah [17:29:30] RoanKattouw: ^ [17:29:38] RoanKattouw: should we just let it go through there? :) [17:29:48] YuviPanda: Catch up to labs-l. It's not that high volume and you'll know wth is going on. :-) [17:29:49] Coren, andrewbogott: I'm getting "Permission denied (publickey)" when trying to authenticate to both pmtpa instances in the logstash project. Is that expected? [17:30:01] Coren: yeah, I'll do it before I go to sleep [17:30:03] bd808: nope [17:30:15] bd808: what's an instance name? [17:30:39] YuviPanda, RoanKattouw: Nothing you can do about it now anyways; it's too late to do manual migrations now. [17:30:41] Coren: wget times out from console Connecting to tools.wmflabs.org (tools.wmflabs.org)|208.80.155.131|:80... [17:30:43] andrewbogott: logstash.pmtpa.wmflabs and logstash-puppet.pmtpa.wmflabs [17:30:47] The batch copy has already started. [17:31:01] Coren: heh. I could setup a local instance on my laptop, but considering the internet I am on... [17:31:04] hedonil: Are you trying to do this from /within/ labs? [17:31:09] Coren: I'll just email wikitech-l then [17:31:14] andrewbogott: Getting into either of them would help me finish migrating [17:31:29] Coren from tools-login. also worked before ;) [17:31:39] hm, puppet is disabled on logstash [17:32:13] andrewbogott: blerg.
I was probably playing with something and forgot to re-enable [17:33:08] where to put the .lighttpd.conf file? [17:33:28] bd808: I can connect to logstash-puppet just fine. I'm going to turn puppet back on on logstash... [17:33:35] hedonil: No, it didn't. You can't reach external IPs from within the virtual network (nova-network limitation). Unless, like, someone added something in /etc/hosts to fake it. You should be using the actual instance name (tools-webproxy) when you want to talk to an instance. [17:33:55] RoanKattouw: I emailed wikitech-l [17:34:01] Thanks [17:34:13] andrewbogott: Cool. I can get into logstash-puppet now as well. [17:34:13] hedonil: If you absolutely need it, I could add something to the hosts file, but that's brittle. [17:34:30] Ricordisamoa_: It the tool's home. [17:34:36] In* [17:34:51] Coren: that pulled the status from a tools webpage. and really it definitly worked until 14:00 [17:35:09] Coren: /data/project/toolname or /data/project/toolname/public_html ? [17:35:28] hedonil: Oh! "trickery". You were accessing /the other datacenter's/ public IP. [17:35:59] hedonil: That IP doesn't exist anymore. Indeed, that *host* doesn't exist anymore -- there isn't a webserver in pmtpa tools at all anymore. [17:36:32] Coren: ?? it says Connecting to tools.wmflabs.org (tools.wmflabs.org)|208.80.155.131|:80. this is eqiad [17:36:48] hedonil: So yeah. Yes, but before a few hours ago "tools.wmflabs.org" pointed to pmtpa. [17:37:00] Not anymore. [17:37:23] Coren: ah ok, I know what you mean [17:37:46] Coren: /data/project/toolname or /data/project/toolname/public_html ? [17:37:46] hedonil: It worked while you were talking to pmtpa. :-) [17:38:04] Ricordisamoa_: /data/project/toolname -- the tool's home, not its public_html [17:38:11] thx [17:38:27] Coren: yeah got it. going outside and been proxied back to the 'inside' [17:38:35] Right. [17:39:00] The program 'irssi' is currently not installed. 
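[Editor's note: per Coren's answer, the lighttpd config file belongs in the tool's home directory, not in public_html. A minimal sketch of such a file; the tool name and the rewrite pattern are hypothetical, and lighttpd's `url.rewrite-once` syntax from mod_rewrite is assumed:]

```
# /data/project/mytoolname/.lighttpd.conf  -- tool home, NOT public_html.
# Hypothetical rule: map clean URLs onto an index.php dispatcher,
# roughly what a converted .htaccess RewriteRule would do.
url.rewrite-once = (
    "^/mytoolname/([a-z]+)$" => "/mytoolname/index.php?page=$1"
)
```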
To run 'irssi' please ask your administrator to install the package 'irssi' [17:39:25] Coren, my dear administrator, I'm asking you ^^ ;-) [17:39:36] mmm, bouncers? :P [17:39:36] Danny_B: That was never officially installed since it's not in puppet; someone cheated. :-) [17:39:55] Danny_B: But also, why are you running an interactive IRC client in tools? [17:40:29] because it's a tool? [17:41:22] Danny_B: ... what? How is an IRC client a tool? [17:41:59] monitoring tool [17:42:26] obviously rc channels are read only, thus it is not that "interactive" [17:43:09] Coren: Re user directories, if I didn't use migrate-user, will /home@eqiad be overwritten by /home@pmtpa, will it be a subdirectory, only available on request, ...? [17:43:26] YuviPanda: i would have moved it if realised [17:43:46] Danny_B: I guess I'm missing context. You're running an irssi on the rc channels to... look at it on occasion? [17:44:44] Coren: i have some filters & hilits there for patrolling [17:44:44] scfc_de: If you have not done a manual migration before today, the contents of your pmtpa home will be moved on top of your eqiad one. Actually, /has been/ because I see your home is already done. [17:45:26] Danny_B: Why don't you use that from your own computer? [17:45:45] scfc_de: does my computer run 24/7? ;-) [17:46:23] Coren, ping [17:46:54] Danny_B: So essentially, you're logging the RC channels? [17:47:08] Danny_B: Neither does an irssi you are running on tools, unless you run in it a screen, which is specifically not allowed. You're welcome to set a bouncer up though, if you want. (Arguably, a bouncer tool for users' convenience would be an even better idea) [17:47:21] Coren: Indeed, my prompt changed to "scfc@tools-login.pmtpa" :-). Okay, cleaning up, then. [17:48:21] Coren, is it reasonable to assume that since the instances are moved to eqiad, that instances other than tools can have access to the replication DBs? [17:48:33] not allowed, since when? 
i particularly asked you about that at last hackathon and you ok'd that [17:48:41] Coren: tools-webproxy did the trick. thx [17:48:52] Cyberpower678: That has been the case in pmtpa as well. [17:48:57] Cyberpower678: They already could, with some trickery. That trickery will no longer be necessary in a couple of days as we finish the rest of migration though. [17:49:21] aude: heh. diffusion of responsibility, and bystander effect, and all that [17:49:32] scfc_de not really from what I've been hearing. [17:49:42] Coren, trickery aside. [17:49:49] of course [17:50:04] YuviPanda: yep [17:50:11] Danny_B: I clearly misunderstood your request, probably thinking you were talking about a bot that logged or a bouncer (both of which would be quite okay). [17:50:17] Cyberpower678: What have you been hearing? [17:50:50] scfc_de`, that accounts.wmflabs.org has no access at all to replication. [17:51:07] Which has been bugging stwalkerster and me for a bit. [17:52:03] scfc_de`, also Coren says that at current only tools has access to it and that the only way to get access to it elsewhere was to SSH into tools. [17:52:30] Essentially tunnel a connection to the DB/ [17:53:14] Cyberpower678: I never said that. In fact, several other projects have used the replicas. [17:53:28] Cyberpower678: The only thing it needed was some iptables and a /etc/hosts [17:53:29] Coren, I swear you said that. [17:53:43] or was that stwalkerster? [17:53:49] * Cyberpower678 doesn't remember. [17:53:49] I [17:54:03] I'm pretty sure I would have never said something that's the opposite of reality. :-) [17:54:12] <^d> Hmm, I can't ssh to an eqiad instance. [17:54:19] Worst I could have said is "It's annoying and a little hackish" :-) [17:54:20] <^d> I can hit bastion fine, and the instance worked fine friday. [17:54:28] <^d> Now I'm getting "no route to host" [17:54:55] ^d: "no route to host" means one of (a) the instance is down or (b) the instance is seriously hosed. 
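[Editor's note: the "some iptables and a /etc/hosts" trickery Coren mentions for reaching the replicas from non-tools projects is not spelled out in the log. A heavily hypothetical sketch of its shape; every hostname and address below is invented:]

```
# HYPOTHETICAL sketch only -- real replica hostnames/IPs are not in the log.
# Fake the replica's DNS name locally:
echo '192.0.2.10  enwiki.labsdb' >> /etc/hosts
# Reroute MySQL traffic for that fake address to a reachable endpoint:
iptables -t nat -A OUTPUT -p tcp -d 192.0.2.10 --dport 3306 \
    -j DNAT --to-destination 192.0.2.20:3306
```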
[17:55:06] <^d> Well damn :( [17:55:06] Coren, will you let me know when other instances can access the replicas, without the trickery, especially accounts? [17:55:41] <^d> Coren: Wikitech claims its up but it's not responding to pings. [17:55:48] Cyberpower678: Sure; it's on my short-term list after migration. It requires moving hardware around in the DC which is a bit complicated, but it's high on my priority list. [17:56:02] :-D [17:56:07] ^d: Give it a swift reboot in the pants; the kernel might have gone boink. [17:56:10] Coren, thank you. [17:56:25] <^d> reboooooooting [17:56:26] bd808: do you need to access logstash as well? I might need to reboot it... [17:57:07] andrewbogott: All the important bits are on the shared drive so I'm good for the moment. [17:57:13] great [18:00:36] Coren: Should [[Special:NovaServiceGroup]] still be listing local-toolnames? [18:02:01] jarry1250___: Web interface hasn't been updated to show the new style yet. Simply mentally s/local-/$projectname./ :-) [18:02:32] Coren: Okey dokie. I thought that might be why group permissions (still) aren't working [18:02:59] andrewbogott: deployment-logstash.eqiad instance in deployment-prep project is not letting me authenticate and shows `Can't contact LDAP server: Connection refused (uri="ldap://virt0.wikimedia.org:389")` in console output. Reboot didn't help. [18:03:20] jarry1250___: Actually, they should. Do you have a specific failing example in mind? [18:03:30] <^d> Coren: Reboot finished, wikitech's saying "ACTIVE" again, still no respond to ping. [18:03:33] This is an instance I just created a couple of hours ago. First time I tried to login [18:03:33] bd808: is that an instance you made or I made? [18:03:43] ^d: Nothing interesting on the console? [18:03:47] Coren: http://tools.wmflabs.org/wikicup/test.php [18:04:07] andrewbogott: I made it about 2-3 hours ago [18:04:24] <^d> Coren: Some complaints about LDAP and DNS? 
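[Editor's note: Coren's "mentally s/local-/$projectname./" for the not-yet-updated [[Special:NovaServiceGroup]] listing can be written as a literal one-liner; the project name "tools" and the group name are examples:]

```shell
# Old-style service group name as still shown by the web interface,
# rewritten to the new style (project name "tools" assumed).
echo 'local-wikicup' | sed 's/^local-/tools./'
# -> tools.wikicup
```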
[18:05:34] <^d> Here's some of the console: http://p.defau.lt/?Vvy2Z8abpjqqwzJpa6aPvw [18:06:13] bd808: I'm in a call now, sorry -- can look in a bit [18:06:32] andrewbogott: Cool. It looks like ^d is having similar issues [18:06:34] which, btw coren, time for meeting... [18:06:53] andrewbogott: I'm trying, but that flash thing is completely broken for me. [18:06:59] ok :( [18:07:05] Coren, for me it worked in ff and not in chrome [18:54:44] a930913: hmyeah, that makes sense. Maybe try making a histogram of the gap between events [18:56:36] valhallasw: I'm not sure I follow. [18:58:02] a930913: woah, I just saw https://tools.wmflabs.org/cluestuff/stats.html [18:58:05] a930913: pretty cool :D [18:58:13] a930913: glad I could be of use :) [18:58:25] a930913: based on the distribution of distances between events, you can choose an integration time that's better suited than 1 minute (which is just an arbitrary choice) [18:58:53] YuviPanda: "17:11 < YuviPanda> a930913: sweet :D" ? :p [18:58:56] a930913: also do note that you've to make sure you are managing your server side connections carefully. it is easy for them to blow out of control [18:59:05] a930913: it was sweet I could help, but I didn't actually check :P [18:59:14] YuviPanda: :D [18:59:21] a930913: :D [19:00:23] YuviPanda: Blow out of control? [19:00:35] a930913: 'use up too many resources and get killed' :) [19:02:23] Hehe, I'll guess I'll cross that bridge when I get to it :p [19:03:50] a930913: :) [19:04:32] * Coren groans. [19:05:02] 3 tools over 100G, 12 over 10G. [19:05:06] Coren: http://tools.wmflabs.org/delinker/ 404? error in copyying :O [19:05:30] Steinsplitter: Not even begun to copy yet. [19:09:21] Too late for that. [19:13:42] logstash.eqiad and deployment-logstash.eqiad both having LDAP connectivity problems that are keeping me from authenticating. 
Log from last reboot of deployment-logstash: http://p.defau.lt/?BQwKSsoXjOrZPqqtwFBqiA [19:14:02] These are both instances that were created today [19:36:09] bd808|LUNCH: logstash.eqiad doesn't even respond to ping, does it? [19:37:00] andrewbogott: Apparently not [19:37:43] andrewbogott: "Mar 17 19:37:19 logstash dhclient: No DHCPOFFERS received." [19:37:55] Firewall madness? [19:38:17] Oh I bet I haven't fixed the logstash project firewall rules in eqiad [19:38:55] bd808: That's definitely part of it, but maybe not all... [19:45:06] andrewbogott: I fixed up the security groups. I'll reboot and see if that helps. icmp was allowed from 0.0.0.0 though so...? [19:45:18] yeah, weird [19:49:59] Coren: is the umask on eqiad different from pmtpa? New directories don't have g+rw? [19:51:18] andrewbogott: I just brought up a new instance (bd808-test) in the logstash project and it seems to be normal. I'm just going to kill the logstash instance and try creating it again. [19:51:32] bd808: that's what I would do! [19:51:50] Great minds think alike [19:52:29] Nettrom: There shouldn't be any difference. What user/tool do you mean? [19:53:09] scfc_de: I created a couple of directories as my "suggestbot" tool, noticed they were group read-only [19:55:35] easy to fix, of course [19:55:56] ^d: was that joking or real (re: don't like bots in channel):) [19:56:17] Nettrom: I see that as well; hmmm. [19:57:20] <^d> mutante: Both! [19:57:41] Nettrom: Ah, okay, there's an open bug about that: https://bugzilla.wikimedia.org/show_bug.cgi?id=46468 [19:58:15] ^d: fair:) is grrrritt-wm just labs migration related then.. (guess) [19:59:08] reee [19:59:44] !ping [19:59:44] !pong [19:59:46] ok [20:00:15] scfc_de: thanks for finding that! I'll set the tool's umask myself in the meantime [20:00:17] Coren: have you managed to find out my read-only file system user issue ? 
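[Editor's note: Nettrom's interim workaround — setting the tool's umask before creating directories — looks roughly like this; the directory name is an example, and bug 46468 above tracks the underlying issue:]

```shell
# With umask 002, new directories get mode 775 (group-writable),
# instead of the group read-only directories Nettrom observed.
umask 002
mkdir -p suggestbot-data       # example directory name
stat -c '%a' suggestbot-data   # -> 775
```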
[20:00:53] Coren: it is the mount of labstore.svc.eqiad.wmnet:/project/deployment-prep/project [20:00:57] hashar: Exactly as I have said: it was the boot-too-fast bug combined with the instance having never been rebooted to fix the issue. [20:01:17] though I rebooted it and it still has the issue [20:01:51] hashar: What? I was on it earlier and it didn't. Lemme see. [20:02:16] o_O [20:02:46] :-D [20:03:01] havent looked on other instances though [20:04:04] hashar: No, wait, that makes no sense. hang on for a sec. [20:04:13] from another instance it works fine 10.68.16.74 deployment-apache01.eqiad.wmflabs works [20:14:16] Coren: whats the tools webserver IP ? [20:15:15] Betacommand: 208.80.155.131 [20:15:32] Not that I knew that without doing a dns lookup. :-) [20:16:03] Coren: Im having DNS issues [20:18:37] * YuviPanda makes Coren a free public DNS server [20:23:23] hopes that doesn't include public mail relay [20:25:33] hashar: Yeah, definitely the same issue though also more complicated than I had hoped. There is *something* that caches negative ACL hits, but I'll be damned if I know what. [20:25:43] hashar: The problem self-corrected as the cache timed out. [20:26:41] Coren: that is most probably local to the instance [20:27:15] Coren: hmm no, because rebooting the instance does not fix the issue. So maybe the NFS server maintain that ACL cache per instance? [20:27:57] hashar: If so, that's a new, mysterious and completely undesirable feature of some sort. Maybe it's mountd that does that; I'll have to research this further. [20:29:30] Coren: at least I now know it times out after X hours :] [20:30:14] hashar: It's still [bleep]ing stupid and annoying. [20:32:05] Coren: hi [20:32:24] I'm trying to set up iptables rules on wikimetrics1.eqiad to match those on wikimetrics.pmtpa [20:32:56] milimetric: The very same tables should work verbatim. [20:33:07] ok, cool [20:33:09] thank you! [20:33:18] no problem. 
[20:33:32] * Coren likes it when the problems are /that/ simple. :-) [20:36:12] a930913: o.0 [20:50:17] Hey Coren, is now a good time to bug you for a new labs project in eqiad? [20:51:49] csteipp: Heh. Not really. Well, I mean you can bug me now if you want but I'm unlikely to do anything about it for a couple days. :-) [20:52:16] Coren: No problem, I imagined that might be the case. Not urgent, I'll bug you next week. [20:57:46] Coren: Is there any progress in the mass copying? I can't see any ;) [20:59:56] is there some kind of disk quota per project? [21:01:21] hashar: There is a default quota of 300GB per project, [21:01:28] mutante: were you ever an active user of the 'ceph' project in labs? Or did you just create it for Anth1y? [21:01:28] says https://wikitech.wikimedia.org/wiki/Help:Shared_storage#Quota [21:01:32] mutante: thanks. Sounds large enough [21:01:48] andrewbogott: just create in that case, afair [21:01:51] 'k [21:01:59] andrewbogott: what i really want to move is "wikistats" and ..maybe... bugzilla [21:02:14] i can start right now :p [21:02:32] mutante: aklapper has a pending bug for the bugzilla project… https://bugzilla.wikimedia.org/show_bug.cgi?id=62658 [21:02:56] As for wikistats, please make a note on the progress page if you're going to migrate it… https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration/Progress [21:04:12] andrewbogott: yep, already a tab i have:) [21:04:43] thx [21:06:30] hedonil: I don't expect any of this is going to be visible; there is no implicit "finish-migration" step. [21:06:44] hedonil: That said, all users homes but one have been migrated. [21:06:58] And tools are in progress. [21:07:42] Coren: I have sharp eyes ;) but yep. that's what I expected - stiil busy with user's home [21:09:45] and just hope that the 3 topscorer-tools with ~450 GB are not a zillion of 1kb files ;) [21:11:34] hedonil: why, what's the relevant block size? [21:11:42] is it worse than 4k? 
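[Editor's note: hedonil's worry about "a zillion of 1kb files" is about per-file block overhead during the copy. A back-of-envelope calculation; the 4 kB block size comes from SamB's guess above, and the file count is invented:]

```shell
files=1000000   # hypothetical count of 1 kB files
blk=4           # assumed filesystem block size in kB (SamB's "4k")
sz=1            # file size in kB
# Each file wastes (blk - sz) kB of its last block; integer math only.
echo "$(( files * (blk - sz) / 1024 / 1024 )) GiB wasted"
# -> 2 GiB wasted
```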
[21:13:14] !log deployment-prep manually installing timidity-daemon to unblock puppet on deployment-bastion.eqiad.wmflabs Puppet change pending in Gerrit [21:13:29] wm-bot: pong [21:13:30] Hi hasharMeeting, there is some error, I am a stupid bot and I am not intelligent enough to hold a conversation with you :-) [21:13:34] logstash-labs-wm: ping [21:13:40] bah whatever [21:14:09] !ping [21:14:09] !pong [21:21:41] SamB: a zillion of 4kB ones might be as ugly as 1kB ones [21:43:52] andrewbogott, Coren: Could one of you look at deployment-logstash1001.eqiad in the deployment-prep project and see if you can figure out why it isn't getting answers to DHCPDISCOVER? [21:44:05] * bd808 is stumped [21:44:44] bd808: can you log in? [21:44:54] andrewbogott: I can't even ping it [21:46:32] bd808: Probably best to scrap it and start a new one, unless you've already tried that. [21:46:44] My project filespace is empty; I'm guessing something awful/important/sweeping is in progress and the situation's temporary? [21:46:49] I'm happy to look, but me scrapping will be quicker. [21:46:54] This is the 3rd try today, but I can try one more [21:47:00] Oh, huh. [21:47:33] maybe you broke the dhcp service :D [21:47:35] bd808: well, the security groups are all wrong, of course. I will fix that [21:47:36] tb__: You should subscribe to (and read) labs-l [21:47:57] Coren: Ta. [21:48:52] andrewbogott: Thanks. I looked at the security groups and was a little scared to mess with them. [21:49:16] bd808: mostly you just need to make sure that port 22 is set to 10.0.0.0/8 [21:50:00] I'm logged into several other instances in that project, but yeah that should get cleaned up [21:52:14] bd808: I just built a fresh instance there and it came up fine [21:52:21] Not that that helps really [21:52:40] andrewbogott: Maybe it's just that instances with "logstash" in the host name are cursed :) [21:52:47] probably! 
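[Editor's note: andrewbogott's "port 22 is set to 10.0.0.0/8" security-group fix would have been applied with the era's OpenStack tooling. The exact command form and the "default" group name below are assumptions, not taken from the log:]

```
# Assumed legacy python-novaclient syntax; "default" group is a guess.
nova secgroup-add-rule default tcp 22 22 10.0.0.0/8
```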
[21:53:02] I'll try yet again and hope that it works :) [21:53:18] ok [21:53:25] sorry I don't have a better idea [22:01:45] andrewbogott: I got one to work! deployment-logstash1 is alive and well. Thanks for the "just keep trying" encouragement [22:01:56] :/ [22:11:52] !log deployment-prep root@i-00000390:/data/project# rsync --verbose -rlptD --delete --exclude '*/thumbs/*' upload7 root@10.68.16.58:/data/project [22:11:59] pff [22:13:10] I am always surprised at how fast rsync is [22:16:50] Then again, you'd expect it to be. It's been designed and tweaked for years with that specific objective of 'do as little as possible as quickly as it can'. :-) [22:17:21] can't beat that :-] [22:17:51] anyway, I transferred most of the beta cluster files from pmtpa to eqiad [22:17:52] got puppet passing on almost all instances [22:18:15] and springle should be creating the MariaDB instances this week :] [22:19:33] hasharMeeting: Near-success! [22:20:06] maybe :] [22:20:56] is stuff moved yet? [22:21:03] aude: no [22:21:20] I managed to get puppet to run on the beta cluster bastion and apaches a few seconds ago :] [22:22:10] aude: I will definitely announce whenever it is ready for testing. Would be done by editing our local /etc/hosts :] [22:22:16] hasharMeeting: i mean tools [22:22:21] ah sorry [22:22:27] * aude would like to get grrrrt back :) [22:22:41] * aude do look forward to the new beta cluster also [22:22:49] Coren: do you have one minute? [22:23:04] Steinsplitter: Sure, what's up? [22:23:08] hasharMeeting: I'm making progress on a logstash instance in beta that uses the puppet classes! [22:23:15] \O/ [22:23:32] bd808: the syslog / udp2log on eqiad should/is deployment-bastion.eqiad.wmflabs [22:23:36] it might even work right now [22:24:03] Coren: becous ther is no docu and i can't access to the old cluster. can you pleas copy the jobs for delinker to the new systen [22:24:48] Steinsplitter: I'm not sure I understand your question. 
[22:25:21] Coren: ther is a cronjob for delinker on the old cluster. can you pleas give me the code. it is not written in the doku. [22:25:58] Well, first off, you'll need to 'finish-migration delinker'. That'll restore the crontab; but stuff in it will be commented out. Then just edit your crontab to reenable it. (crontab -e) [22:26:28] (The first command from your user account; the latter from the tool itself) [22:26:45] k, thx :) [22:26:48] There will be an email with reminders once migration is complete on labs-l [22:27:17] * aude eagerly waits [22:29:05] aude: If you read the previous one, there's no new stuff in it. Simply end the migration like described in there except for the parts that had to be done in pmtpa. :-) [22:31:20] how do i login to tools eqiad [22:31:22] tools-login-eqiad.wmflabs.org ? [22:31:43] aude: That still works, but now tools-login also points there. [22:32:45] (In fact, pmtpa is no longer accesible to endusers at all) [22:33:36] i get permission denied via bastion.wmflabs.org [22:33:42] * aude won't trouble you now, though [22:34:35] aude: Proxy through bastion-equiad.wmflabs.org [22:34:50] Well, (a) you don't need to go through a bastion to reach tools-login, and (b) you'd want to use bastion-eqiad if you did. :-) [22:34:53] ok [22:36:09] i do bastion for other labs stuff, so always did that way [22:37:51] ok, it works :) [22:38:48] "sudo: sorry, a password is required to run sudo" [22:38:57] ok, i'll try tomorrow [22:39:20] why does http://tools.wmflabs.org/earwigbot/ work but not http://tools.wmflabs.org/earwigbot ? [22:39:25] I don't remember this happening before [22:40:46] i think it's the " the webserver has temporarily lost its mind" [22:41:49] No, it's the funky too-many-proxy setup being not quite right and being a little anal about enforcing 3986 [22:42:18] I'll add rewriterules to work around it soon. [22:43:09] I have a few instances at eqiad I'm trying to use, can't seem to hit either from private-bastian host. 
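[Editor's note: Coren's bastion advice above (tools-login needs no bastion; for everything else use bastion-eqiad) is usually captured once in ~/.ssh/config. A sketch, where the Host pattern and username are placeholders; only bastion-eqiad.wmflabs.org is named in the log:]

```
# ~/.ssh/config -- hypothetical stanza.
Host *.eqiad.wmflabs
    User         youruser
    ProxyCommand ssh -W %h:%p youruser@bastion-eqiad.wmflabs.org
```

With this in place, `ssh someinstance.eqiad.wmflabs` transparently hops through the bastion.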
unsure if related to chicanery currently happening for labs or not. [22:43:19] rush-puppet-testing1 is one of them [22:43:27] I was on it friday when I finally figured things out [22:43:41] seems unavailable and reboot through console has no effect [22:44:09] chasemp: Check your project's security groups. Migrated projects usually still have the overly restrictive rules that limit to pmtpa. [22:44:33] I believe this was eqiad from the get go but I'll look [22:44:34] they should have already been created in eqiad [22:44:39] what he said:) [22:45:13] I made test-adminspp-overhaul w/ the suspicion the first instance was somehow weird fresh today and still bunk for it too [22:45:32] chasemp: Anything interesting on the console? [22:45:57] *sighs* For the record hasteur != hashar. I don't do things with core Wikimedia puppet configuration. [22:46:00] failure in name resolution [22:46:22] or 'could not get request certificate: getaddrinfo: temporary failure in name resolution' [22:46:56] but I've tried connectivity by ip from the bastion and still nothing [22:47:15] instance is: rush-puppet-testing1 ip: rush-puppet-testing1 [22:47:28] I can wait on this, but wanted to bring it up in case it's a symptom of something else [22:47:35] considering all the moving and shaking going on today [22:47:54] ip 10.68.16.109 [22:50:00] Coren: the tools sems migrated, but dosen't work. mabye i need to wayt until all tools ar migrated? [22:50:25] Steinsplitter: No. What 'doesn't work' exactly? [22:51:18] chasemp: Today's shaking is limited to tools; tomorrow I believe andrew starts the general labs stuff. [22:51:54] Coren: in that case I don't get it, but I can wait until wednesday for you to school me. My instinct is something isn't right tho. [22:52:30] Coren: the reply to check if i run it on the new cluster and to chck if the migration is done (the standard thing) [22:52:40] wit steinsplitter works perfectly, but delinker not. 
[22:52:42] *h [22:53:00] but "steinsplitter" i have moved a week (?) ago :/ [22:53:07] chasemp: You're almost certainly right; but honestly today is a bad day for anything that requires in-depth investigation from me. If you can comfortably wait a day or two, that simplifies things a lot. [22:53:20] Steinsplitter: You still haven't told be what doesn't work. [22:53:25] me* [22:53:37] Coren: no problem, I will circle back later [22:53:52] finish-migration delinker [22:54:01] (sorry for my english :P) [22:54:58] That tool doesn't seem to be migrated yet. [22:54:58] Make sure you ran 'migrate-tool delinker' in pmtpa [22:54:58] and that it has completed. [22:54:58] If you have, wait a few minutes and try again. [22:57:04] Aaah! That helps. [22:59:59] Steinsplitter: That's because you can only do it once, and it looks like it was already done. (admitedly, the error message there is confusing) [23:01:32] Steinsplitter: As far as I can tell, your tool is completely and correctly migrated. You might want to reenable entries in your crontab and start a webservice (if your tool has a web interface). [23:01:43] Coren: but why the crons are not copyed? [23:01:53] Steinsplitter: They are. Commented out. [23:02:16] Steinsplitter: Edit your crontab to uncomment the entries you want to run. [23:03:02] Wait, no they weren't. [23:03:09] yes, this is the problem. [23:03:11] ... why in blazes. [23:03:41] yes [23:04:01] Aw, crap. I invoked the wrong script and crontabs weren't copied. [23:04:14] No worries, it's a simple enough fix. :-) [23:04:52] Annoying, but simple. [23:06:51] :) [23:07:01] But it /will/ have to wait until complete migration; otherwise I risk breaking things. Sorry about that. [23:07:17] np :) [23:07:26] * Coren facepalms. [23:07:33] I *knew* things were going too well. [23:15:49] Coren: job 40719 was executed on the tomcat host again [23:16:14] Coren: Is there supposed to be an LDAP user matching each project name? 
I'm trying to get role::deployment::deployment_servers::labs working and it wants to make a directory that is owned by "${::instanceproject}". I'm not sure if this is a bug in the role or in LDAP for the "deployment-prep" project. [23:16:35] but when I ran the job manually this afternoon, the job ran on tools-exec-03, so it might have something to do with the timing (many jobs starting around 2300 UTC?) [23:17:45] bd808: There is a /group/ named project-$projectname [23:17:53] bd808: But there isn't supposed to be a user. [23:18:37] valhallasw: I don't think that's a particularily busy time. 0h UTC has a little rush, so does 3h, but I never noticed anything around 23h. [23:19:22] Coren: Ok. I'll call it a bug in the role then. :) [23:20:47] Coren: Could you create an LDAP user named [23:21:00] Coren: Could you create an LDAP user named "trebuchet" that I can change the role to use? [23:22:02] bd808: Hm. Ostensibly yes, but having a puppet role that creates stuff in a shared directory seems a little iffy in general. Can you give me a bit more context? [23:22:22] I mean, unless by definition it can only run on exactly one instance in a project. [23:23:07] Coren: the old databases on tools-db with names like pXXXXXgXXXXX__*, these are gone? I'm confused. [23:23:17] Coren: That is the case actually. This is the role that manages the trebuchet deploy master [23:23:26] * bd808 didn't write or design it :) [23:23:26] Earwig: They get renames to sXXXXX__ (which match the new username) [23:23:33] I still can't find them for some reason [23:23:48] Earwig: Was your tool forcibly migrated today? [23:23:53] I guess so [23:24:13] Ah. Then it's normal: database copies come last and will start a bit later. [23:24:22] okay, right. my mistake then. [23:24:30] I'll just update config files and wait. [23:24:44] What's the tool name? [23:24:47] earwigbot [23:25:21] Earwig: It's about 1/4 down the list, so it should arrive relatively soon. 
[23:25:25] got it [23:26:41] Coren: Here's the role in question: https://git.wikimedia.org/blob/operations%2Fpuppet.git/5404f3470b63b51c1c15b8301c94e107d5e63f2e/manifests%2Frole%2Fdeployment.pp#L246 [23:27:00] bd808: No, that's okay. It's a reasonable system user to create. [23:27:27] I just want to avoid creating too many odd project-specific global users but trebuchet seems like a good candidate. [23:27:50] Cool. That's the username from prod too. [23:28:39] I would have just set it to "root" but I'm sure there's something un-good about that option. [23:30:38] bd808: uid=604(trebuchet) gid=604(trebuchet) groups=604(trebuchet) [23:30:47] Coren: Thanks! [23:38:34] Coren: is there something wrong with commonswiki replication ? [23:39:43] Betacommand: Not that I know of, but migrations that weren't done manually are not complete for the most part. [23:39:53] Betacommand: At least, those that have crontabs or databases. [23:39:58] Coren: 19.5 hour lag [23:40:08] Oh, replication. [23:40:17] * Coren read 'migration'. [23:40:48] Betacommand: I don't know. Probably nothing more than an asocial tool holding a lock. I'll take a look at it later. [23:41:13] Coren: No problem just took a look at my replag tool and noticed it [23:58:58] Coren: Hi, are you there? [23:59:26] petan: are you there? [23:59:49] Amir1: I'm around.
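[Editor's note: an alternative to the LDAP user Coren created for bd808 would have been a local system user declared by the role itself. A hypothetical Puppet sketch; uid/gid 604 come from Coren's `id` output above, but this is not the actual role::deployment code from operations/puppet:]

```puppet
# Hypothetical -- not the real manifests/role/deployment.pp.
group { 'trebuchet':
  ensure => present,
  gid    => 604,
}
user { 'trebuchet':
  ensure => present,
  uid    => 604,
  gid    => 604,
  system => true,
}
```

The LDAP route was chosen so the same uid resolves on every instance in the project, which a per-instance local user would not guarantee.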