[00:10:43] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Infrastructure: CI rejecting every patch to OpenStackManager - https://phabricator.wikimedia.org/T125008#1972188 (Andrew) NEW a:hashar
[00:13:54] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Infrastructure: CI rejecting every patch to OpenStackManager - https://phabricator.wikimedia.org/T125008#1972220 (Reedy) https://gerrit.wikimedia.org/r/#/c/266942/ should be merged, but it shows a bigger problem. A dependency on LdapAuthentic...
[00:35:15] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Infrastructure: CI rejecting every patch to OpenStackManager - https://phabricator.wikimedia.org/T125008#1972364 (scfc)
[00:35:17] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Config: ApiDocumentationTest failure: Undefined property: AuthPlugin::$boundAs - https://phabricator.wikimedia.org/T124613#1972365 (scfc)
[00:38:41] YuviPanda: Connecting to the tool labs databases via MySQL Workbench doesn't seem to work any more :(
[00:39:22] need more information kaldari :) how are you connecting, what credentials you are using, how are you tunneling through it...
[00:39:36] there was also https://phabricator.wikimedia.org/T122658
[00:39:39] I'm using the settings recommended at https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Configuring_MySQL_Workbench
[00:39:41] which might have relevant information
[00:40:32] kaldari: so what happens when it doesn't work? does it not connect at all?
[00:41:01] Authentication error, unhandled exception caught in tunnel manager
[00:41:12] It used to work fine though
[00:41:26] are you able to ssh to tools-login.wmflabs.org?
[00:42:09] sure. I ssh to it every day
[00:42:52] YuviPanda: Anyway, just wanted to see if you knew something was broken. I can troubleshoot it on my end
[00:42:59] kaldari: nope, nothing I can see is broken...
[00:43:02] am trying a small tunnel
[00:43:38] YuviPanda: I was trying to use mySQL Workbench because I could use mysqldump
[00:43:57] could=couldn't
[00:45:07] so
[00:45:09] with
[00:45:11] > yuvipanda@picard:~/code/quarry$ ssh -L 3306:enwiki.labsdb:3306 tools-login.wmflabs.org
[00:45:15] I can run mysql from my local client
[00:46:48] yuvipanda@picard:~/code/quarry$ mysqldump -h 127.0.0.1 -d enwiki_p > test
[00:46:51] mysqldump: Got error: 1044: Access denied for user 's51275'@'%' to database 'enwiki_p' when using LOCK TABLES
[00:46:53] kaldari: yeah, I can't either
[00:48:13] kaldari: what're you trying to do with mysqldump, btw?
[00:48:21] hmm. ssh -L 3306:enwiki.labsdb:3306 tools-login.wmflabs.org doesn't work for me
[00:48:45] so if I add --skip-lock-tables
[00:48:49] to mysqldump it works for me
[00:48:54] kaldari: what happens when you do it?
[00:49:39] YuviPanda: I was trying to get a CSV copy of the externallinks table from a small wiki for the Internet Archive to use. They want to try feeding our external links data into their archiving script. But it looks like I can get that table in SQL from https://dumps.wikimedia.org/backup-index.html :)
[00:50:19] :D
[00:50:21] ok
[00:50:37] kaldari: since I've a working copy, I can also just give you the dump if you want
[00:50:39] wel
[00:50:41] well
[00:50:44] working mysqldump
[00:51:04] YuviPanda: I'll worry about tunneling later. Thanks for looking into it for me though!
[00:51:09] ok!
[00:51:11] np
[03:39:42] Labs, Labs-Infrastructure: tools-worker-1002 locked up - https://phabricator.wikimedia.org/T125039#1972864 (yuvipanda) NEW
[03:41:34] Labs, Beta-Cluster-Infrastructure, operations: Duplicate IP address DNS entry - https://phabricator.wikimedia.org/T125040#1972870 (mobrovac) NEW
[08:05:04] Hello, I am regularly receiving error messages in e-mail since two days from Cron Daemon about my scheduled jobs.
[08:05:31] "/bin/sh: 1: cannot create /dev/null" Has anyone else such an experience?
[08:05:54] ": Permission denied"
[08:20:17] ato_: tool labs?
[08:21:56] tools-submit has udev mounted
[08:22:08] udev on /dev type devtmpfs (rw,mode=0755)'
[08:24:11] yes
[08:25:09] I don't have such emails
[08:25:56] earlier I did not have too.
[08:26:01] what's the job?
[08:26:10] (I mean the command)
[08:26:10] All my jobs
[08:26:15] ugh
[08:26:59] My jobs are things like follows...
[08:27:04] 45 4 * * * jsub -once -N sign-of-life -mem 512m -o public_html/log/sign-of-life.txt -j y scripts/sign-of-life-core.sh > /dev/null
[08:28:15] On the site https://wikitech.wikimedia.org/wiki/Help:Cron is written: A VERY important note: PLEASE add the "> /dev/null" ...
[08:28:45] hmm 20:30 YuviPanda: switched over cron host to tools-cron-01, manually copied all old cron files from tools-submit to tools-cron-01
[08:28:57] 2016-01-25
[08:29:05] about two days ago
[08:30:46] exactly
[08:31:11] which has udev mounted as well... udev on /dev type devtmpfs (rw,mode=0755)
[08:31:54] same on tools-cron-02
[08:33:40] weird
[08:34:57] ato_: you'd better ask Yuvi or file a bug.
[08:35:19] Ok. I will do when I see him.
[08:37:50] oh, does the mail say which host it is from?
[08:38:21] in the title there's like Cron
[08:42:06] It say nothing else
[08:42:29] ?
[08:42:38] The title is: ""
[08:42:39] the title of the email
[08:42:55] that's extremely weird
[08:43:02] The title is: "Cron Daemon - Cron jsub -once -N szubcsonk -mem 512m -o public_html/log/szubcsonk.txt -j y scripts/szubcsonk-core.sh > /dev/null"
[08:43:33] then the host is tools-cron-01
[08:43:39] Yes
[08:45:44] * zhuyifei1999_ has no idea why
[08:48:04] zhuyifei1999_: please file a bug\
[08:48:28] eh, sorry, ato_ , please file a bug
[08:49:08] ugh valhallasw`cloud you always tell me to file a bug :P
[08:49:23] zhuyifei1999_: yes. Please always file bugs :D
[08:49:54] it makes it much easier for us to track what's going wrong
[08:50:06] because anything which happens in irc is basically lost
[08:50:49] fine whatever
[09:06:19] Labs, Tool-Labs: tools-cron-01 - /bin/sh: 1: cannot create /dev/null - : Permission denied - https://phabricator.wikimedia.org/T125053#1973408 (Ato_01) p:Triage>Normal
[09:08:15] I have successfully done my first file :)
[09:17:43] Labs, Tool-Labs: tools-cron-01 - /bin/sh: 1: cannot create /dev/null - : Permission denied - https://phabricator.wikimedia.org/T125053#1973428 (zhuyifei1999) For the record, in [[https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL|SAL]], 2016-01-25 20:30 @YuviPanda: switched over cron host to tool...
[09:45:02] Labs, Tool-Labs: tools-cron-01 - /bin/sh: 1: cannot create /dev/null - : Permission denied - https://phabricator.wikimedia.org/T125053#1973491 (valhallasw) Odd: ``` valhallasw@tools-cron-01:~$ ls -l /dev/null crw-rw-rw- 1 root root 1, 3 Jan 21 19:44 /dev/null valhallasw@tools-cron-01:~$ echo "boo" > /dev...
[12:47:11] is emails from labs stuck somewhere ? I am not getting any emails :(
[13:50:25] tonythomas: please file a bug ;-)
[13:50:35] valhallasw`cloud: okey. doing that now
[13:51:12] please include as much information as possible (sender? what address it was sent to (something@tools.wmflabs.org?), etc
[13:51:53] valhallasw`cloud: okey. in that case, I will have to check the logs
[13:55:32] tonythomas: I don't see anything obvious in the exim log
[13:55:59] valhallasw`cloud: hmm. even quim was tellign me the same thing
[13:56:11] 'the same thing'?
[13:56:24] btw, do we need to have some other special security policy ( like port 25 ) open to send mails from labs instance vagrant ?
[13:56:40] 17:03 Hm. I cannot confirm my email address because I'm not receiving anyconformation code from http://newsletter-test.wmflabs.org/wiki/Special:ConfirmEmail
[13:57:06] oh, we're talking about general labs instances
[13:58:03] valhallasw`cloud: well. I am talking about mediawiki labs instance
[13:58:07] inside vagrant
[13:58:29] Yeah. I don't think outgoing port 25 is blocked, but there might not be a mail server running on your host.
[13:58:40] valhallasw`cloud: exactly
[13:58:43] exim
[13:58:43] -bash: exim: command not found
[13:59:00] should I enable that role in wikitech or something ??
[13:59:31] No idea.
[14:00:12] it might get tough if I try configuring exim on my own though, should know the IP of mail servers
[14:35:40] valhallasw`cloud, so I'm unable to tunnel into labs all of a sudden.
[14:36:18] hmm same
[14:36:46] breaks after debug3: send_pubkey_test
[14:36:46] debug2: we sent a publickey packet, wait for reply
[14:37:59] FATAL ERROR: Server unexpectedly closed network connection
[14:38:45] Hello!
[14:38:59] * zhuyifei1999_ is writing a phab ticket
[14:43:02] YuviPanda, ^^^
[14:43:02] I can log via ssh but can't use scp, i'm running scp in a local terminal. It always fail the verbose mode tel "file doesn't exist" or "permission denied".
[14:43:02] wait, it just worked a few seconds ago
[14:43:02] but logging out hangs forever o.O
[14:43:03] The tunnel is a bit slow.
[14:43:03] now works somehow
[14:43:13] Cyberpower678: which host?
[14:43:22] also, please file a bug in phab so we can track it.
[14:43:23] Youni what files?
[14:43:44] valhallasw`cloud: everything seemed to recover last minute
[14:44:13] typicaly i'm trying to upload index.html for tools.vocabulary-index
[14:44:34] Youni: make sure group has rights to write in that directory
[14:45:28] before that, ssh connections to instances (whether or not passing through the labs bastion) just hangs and get disconnected
[14:47:38] valhallasw`cloud, it's working again. Must've been a momentary outage.
[14:48:20] ok i saw a few help about it, let me confirm i have to give the write access to the group named "my user" in the directory "my instance"... i'll try this.
[14:48:45] Youni: no. https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Handling_permissions
[14:49:04] you need to give write access to the group tools.
[14:51:20] ok i'm going slowly, i'm checking all that...
[14:53:36] Youni: if you're using only tool labs yet reading docs about 'directory "my instance"', you're most likely reading the wrong docs
[14:54:28] i'm on the link https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Handling_permissions
[14:55:16] k
[14:57:36] is it normal that i can't scp anyfile in /home/y-verciti/ ?
[14:59:05] i'm currently failing here :
[14:59:07] scp -v ~/public_html/index.html y-verciti@bastion.wmflabs.org:~/newfile
[15:10:02] Well i updated the rights of vocabulary-index and i'm abble to cp a file from /home/y-verciti/ to /data/project/vocabulary-index/
[15:16:07] Youni: first, you shouldn't have a public_html directory unless you're on a tool account
[15:16:39] second, why are you creating files on bastion.wmflabs.org ?
[15:17:03] I can't scp in y-verciti/ and i can't in /data/project/vocabulary-index from my local term
[15:17:48] Labs, Tool-Labs: tools-cron-01 - /bin/sh: 1: cannot create /dev/null - : Permission denied - https://phabricator.wikimedia.org/T125053#1974101 (scfc) The dump at `/data/project/.system/crontabs` has no `\r`s, so the line endings must have been introduced between: ``` scfc@tools-cron-01:~$ sudo fgrep -l '...
[15:18:05] Youni: I'm not sure what ~ does at the remote end of scp. Try scp -v ~/public_html/index.html y-verciti@bastion.wmflabs.org:./newfile instead?
[15:18:53] i'm doing somthing wrong? yesterday i try a few test around public.html end test files!
[15:19:18] i'll give you the return with full path...
[15:19:33] Youni: oh!
[15:19:37] you're scping to bastion
[15:19:40] not to tools-login
[15:19:57] use tools-login.wmflabs.org instead of bastion.wmflabs.org
[15:21:16] moment
[15:22:24] when production docs, labs docs, and tool labs docs are on the same wiki, readers get confused
[15:22:39] * zhuyifei1999_ don
[15:23:05] '/me don't understand why labconsole and wikitech got merged
[15:23:38] WONDERFUL
[15:26:22] Sorry for all, and congratulations, i put my first file on the y-verciti directory. I hope the rest will be more easy. I didn't know the scp command. this is done ;-)
[15:48:48] Labs, Tool-Labs, Shinken: Lots of hosts' services missing from Shinken - https://phabricator.wikimedia.org/T123271#1974185 (scfc) a:scfc The problem is due to the move of the application of `role::labs::instance` to all instances from the LDAP records to `manifests/site.pp`. `shinkengen` copies the...
[15:55:28] Labs, Beta-Cluster-Infrastructure, operations: Duplicate IP address DNS entry - https://phabricator.wikimedia.org/T125040#1974191 (hashar) integration-t102108-jessie-new2 is an old one that got deleted but apparently is still registered in LDAP :(
[15:59:25] Labs, Labs-Infrastructure, Datasets-Archiving, Datasets-General-or-Unknown, Wikidata: [Bug] Wikidata JSON dumps gets deleted after every new Wikidata dump - https://phabricator.wikimedia.org/T107226#1974205 (Hydriz) Putting it in `/public/dumps/wikibase` made more sense to me as it uses the na...
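The T125053 comments above point at stray `\r` line endings in a crontab: with a trailing carriage return, the shell treats the redirect target as a file literally named `/dev/null\r`, which matches the split "cannot create /dev/null" / ": Permission denied" mail. A minimal sketch of how such lines could be spotted, assuming the crontab text is already loaded as a string (the function names are hypothetical and not part of any Tool Labs tooling):

```python
def lines_with_carriage_return(crontab_text):
    """Return (line_number, line) pairs for lines that end in a carriage return."""
    hits = []
    for number, line in enumerate(crontab_text.split("\n"), start=1):
        if line.endswith("\r"):
            # Report the line without the offending character so it prints cleanly.
            hits.append((number, line.rstrip("\r")))
    return hits


def normalize_line_endings(crontab_text):
    """Convert Windows-style CRLF line endings to plain LF."""
    return crontab_text.replace("\r\n", "\n")


if __name__ == "__main__":
    # Hypothetical two-line crontab; only the first line carries a CRLF ending.
    sample = "45 4 * * * job-one > /dev/null\r\n0 5 * * * job-two > /dev/null\n"
    for number, line in lines_with_carriage_return(sample):
        print(f"line {number} ends with \\r: {line}")
```

Running the normalized text back through `lines_with_carriage_return` should return an empty list.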
[16:03:12] PROBLEM - Puppet failure on tools-exec-1206 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:03:26] PROBLEM - Puppet staleness on tools-grid-master is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[16:03:26] PROBLEM - Puppet failure on tools-exec-1203 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:03:38] PROBLEM - Puppet failure on tools-exec-1220 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:03:38] PROBLEM - Puppet failure on tools-precise-dev is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:13:29] Tool-Labs-tools-Other, Community-Tech, Community-Wishlist-Survey, Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#1974235 (NiharikaKohli) >>! In T120497#1966721, @Egedda wrote: > Hello everyone. > Hopefully we will be able to host our code repository at wikimedias gerrit-s...
[16:13:29] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:18:10] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 777205 bytes in 3.626 second response time
[16:55:00] Labs-Other-Projects: Creating new messages via e-mail - https://phabricator.wikimedia.org/T125098#1974344 (AdHuikeshoven) NEW
[16:56:35] Labs-Other-Projects: Set up reply via email support - https://phabricator.wikimedia.org/T125099#1974356 (AdHuikeshoven) NEW
[17:03:48] Labs, Tool-Labs: tools-cron-01 - /bin/sh: 1: cannot create /dev/null - : Permission denied - https://phabricator.wikimedia.org/T125053#1974374 (Ato_01) Thank you all! The problem, I did is solved. I think I used the text editor of WinSCP and it was not the best choice. :D
[17:04:37] Labs, Tool-Labs: tools-cron-01 - /bin/sh: 1: cannot create /dev/null - : Permission denied - https://phabricator.wikimedia.org/T125053#1974377 (Ato_01) Open>Resolved
[17:17:21] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Gedda was created, changed by Gedda link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Gedda edit summary: Created page with "{{Tools Access Request |Justification=Project manager for the pageview Stats tool https://phabricator.wikimedia.org/T120497 |Completed=false |User Name=Gedda }}"
[17:22:30] Labs-Other-Projects: Problem creating an account at https://discourse.wmflabs.org/ - https://phabricator.wikimedia.org/T125107#1974477 (AdHuikeshoven) NEW
[17:25:07] Labs-Other-Projects: Succesful pilot of Discourse on https://discourse.wmflabs.org/ as an alternative to wikimedia-l mailinglist - https://phabricator.wikimedia.org/T124690#1974493 (AdHuikeshoven)
[17:30:32] Tool-Labs-tools-Other, Community-Tech, Community-Wishlist-Survey, Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#1974520 (Egedda) >>! In T120497#1974235, @NiharikaKohli wrote: > > Hi @Egedda. I think http://tools.wmflabs.org might be what you're looking for, not Gerrit....
[17:34:38] Reedy: Currently here?
[17:34:49] Yes, but packing up to leave
[17:35:57] Can you add an answer at
[17:36:07] T119829#1947675? (Not urgent)
[17:36:42] Labs, Beta-Cluster-Infrastructure, operations: Duplicate IP address DNS entry - https://phabricator.wikimedia.org/T125040#1974545 (Andrew) Open>Resolved a:Andrew It's in designate, not ldap. But, yes, looks like we leaked one; I've cleaned it up.
[17:37:44] Luke081515: It can be removed from any and all
[17:37:53] ok
[17:38:00] I will start in a few minutes. Thanks :)
[17:54:13] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Infrastructure: CI rejecting every patch to OpenStackManager - https://phabricator.wikimedia.org/T125008#1974658 (Andrew) duplicate>Open I'm re-opening... there are several issues in a row that CI is rejecting. And of course fixing thos...
[17:56:09] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Config, Patch-For-Review: ApiDocumentationTest failure: Undefined property: AuthPlugin::$boundAs - https://phabricator.wikimedia.org/T124613#1974691 (Andrew) Surely the CI environment can stub out calls and make those ldap calls return a re...
[17:57:45] Gerrit-Patch-Uploader, Collaboration-Team-Backlog, Flow: Flow is breaking OAuth login - https://phabricator.wikimedia.org/T60705#1974827 (Luke081515)
[17:59:53] Labs, Tool-Labs: Evaluate installing timezone aware cron - https://phabricator.wikimedia.org/T48170#1975195 (Luke081515)
[18:00:06] Labs, Tool-Labs: Install libmediawiki-api-perl - https://phabricator.wikimedia.org/T48105#1975237 (Luke081515)
[18:00:32] Labs, Tool-Labs, operations: Relax restrictions on .htaccess - https://phabricator.wikimedia.org/T48003#1975308 (Luke081515)
[18:01:17] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Gedda was modified, changed by Gedda link https://wikitech.wikimedia.org/w/index.php?diff=276332 edit summary:
[18:01:44] Wikibugs: wikibugs - throttle output, don't get kicked for flooding - https://phabricator.wikimedia.org/T112032#1975471 (Luke081515) Resolved>Open T119829: 17:59 -!- wikibugs [tools.wiki@wikimedia/bot/pywikibugs] has quit [Excess Flood] He leaved after a big job....
[18:02:19] * Luke081515 should use smaller jobs next time
[18:04:17] Luke081515: meh, it's fine
[18:04:40] Luke081515: although I'm not sure why you're removing wikibugs?
[18:04:45] valhallasw`cloud: Why does wikibugs notify, if only a suscriber change?
[18:04:50] https://phabricator.wikimedia.org/T119829
[18:05:24] (Not removing all tasks at once, just a little piece)
[18:05:37] Luke081515: I would just not do that? it spams everyone for no good reason
[18:06:16] and why wikibugs reports it: in this case probably because your changes triggered a Herald project assignment
[18:06:40] One job is already running. I can wait till I submit the next jobs, but I can not abort the current job :(
[18:06:50] this is a disadvantage of bulk edits
[18:09:38] :(
[18:12:01] the bulk will stop in 1-2 minutes
[18:13:53] This was the last kick now
[18:14:05] oh
[18:14:11] the job finished before
[18:14:13] Wikibugs: wikibugs - throttle output, don't get kicked for flooding - https://phabricator.wikimedia.org/T112032#1977570 (valhallasw) I think freenode probably has two sets of limits, e.g. max x messages per 10 seconds, and max 5x messages per 5 minutes. Your bulk change might cause it to hit the latter.
[18:26:47] valhallasw`cloud: Can you take a look at wikibugs? He did not rejoined the releng channel :-/ (Sorry for let him flooding... trying to avoid this next time)
[18:27:05] Luke081515: wikibugs only joins once something should be reported
[18:27:13] ah, ok
[18:27:14] -labs is the exception to this rule
[18:30:30] tonythomas: Did you figure out your MediaWiki-Vagrant email problem? MediaWiki-Vagrant sets up a mailer that traps all email and delivers it to the local "vagrant' user inside the VM. I'm pretty sure that doesn't behave any differently in a Labs LXC guest.
[18:39:23] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Infrastructure: CI rejecting every patch to OpenStackManager - https://phabricator.wikimedia.org/T125008#1978973 (scfc) When I look at https://integration.wikimedia.org/ci/job/mwext-testextension-php53/941/consoleText, the only two errors that...
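The T112032 comment above guesses that the IRC server enforces two limits at once: a short burst window and a longer sustained window. A rough sketch of a sender-side throttle built on that guess follows. The class name, the `(max_messages, window_seconds)` pairs and the numbers in the usage example are all assumptions for illustration; they are not freenode's real thresholds and this is not how wikibugs is implemented.

```python
from collections import deque


class MultiWindowLimiter:
    """Allow a message only if every (max_messages, window_seconds) limit has room."""

    def __init__(self, limits):
        self.limits = list(limits)
        self.longest_window = max(window for _, window in self.limits)
        self.sent = deque()  # timestamps of messages that were allowed

    def allow(self, now):
        # Forget timestamps that no window can still see.
        while self.sent and now - self.sent[0] >= self.longest_window:
            self.sent.popleft()
        for max_messages, window in self.limits:
            recent = sum(1 for stamp in self.sent if now - stamp < window)
            if recent >= max_messages:
                return False  # this window is full; caller should queue or delay
        self.sent.append(now)
        return True


if __name__ == "__main__":
    # Hypothetical limits: 2 messages per 10 seconds and 3 messages per 300 seconds.
    limiter = MultiWindowLimiter([(2, 10), (3, 300)])
    for moment in (0, 1, 2, 11, 30, 300):
        print(moment, limiter.allow(moment))
```

Timestamps are passed in explicitly so the behaviour can be checked without sleeping; a real bot would pass `time.monotonic()` and queue rejected messages rather than drop them.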
[18:48:09] MediaWiki-extensions-OpenStackManager, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team: Update OpenStackManager to use AuthManager - https://phabricator.wikimedia.org/T110461#1979012 (bd808)
[18:48:11] MediaWiki-extensions-OpenStackManager, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team: Update OpenStackManager to use AuthManager - https://phabricator.wikimedia.org/T110288#1979014 (bd808)
[18:48:16] Labs, Labs-Infrastructure, Tool-Labs, MediaWiki-extensions-SemanticForms, Patch-For-Review: https://wikitech.wikimedia.org/wiki/Special:FormEdit/Tools_Access_Request down - https://phabricator.wikimedia.org/T123583#1979018 (Nikerabbit)
[18:49:52] YuviPanda: is there a phab task somewhere for killing off OSM?
[18:50:01] (or am I making that up?)
[18:50:19] Labs-Other-Projects: Problem creating an account at https://discourse.wmflabs.org/ - https://phabricator.wikimedia.org/T125107#1979024 (Steko) Ideally discourse.wmflabs.org should be working with Wikimedia single-signon (SSO) credentials, apart from the specific problem listed here. Detailed instructions are...
[18:54:14] there is one for getting horizon going which is kind of the flip side
[18:54:22] should be assigend to andrew I think?
[18:56:54] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Gedda was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=276389 edit summary:
[19:04:00] Labs-Other-Projects: Problem creating an account at https://discourse.wmflabs.org/ - https://phabricator.wikimedia.org/T125107#1979083 (AdHuikeshoven) >>! In T125107#1979024, @Steko wrote: > Ideally discourse.wmflabs.org should be working with Wikimedia single-signon (SSO) credentials, apart from the specifi...
[19:11:47] bd808: yeah, not exactly 'kill of OSM'
[19:14:24] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/TerraCodes was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=276454 edit summary:
[19:16:28] Labs, Tool-Labs, Patch-For-Review: GridEngine down due to bdb issues - https://phabricator.wikimedia.org/T122638#1979152 (yuvipanda) Open>Resolved a:yuvipanda Sooo... we announced downtime and messed around. Chase and Valhallasw found that dumping and reloading the queue bdb had no effect, bu...
[19:16:49] Labs, Tool-Labs, Patch-For-Review: GridEngine down due to bdb issues - https://phabricator.wikimedia.org/T122638#1979155 (yuvipanda) a:yuvipanda>None
[19:17:01] chasemp: puppet sitll disabled on tools-grid-master
[19:17:20] my bad that can be enabled, I'll take care of it
[19:17:37] chasemp: kk thanks :D
[19:27:25] \o/
[19:27:37] YuviPanda: in retrospect, we should have seen it was the config database
[19:27:47] it always had been complaining about USER keys rather than jobs
[19:29:32] yeah
[19:29:50] but that one is not hanging on our heads anymore!
[19:29:52] but
[19:29:58] chasemp: tools-worker-1002 died yesterday
[19:30:10] saw that but not why yet
[19:30:14] did you reboot it?
[19:30:15] https://phabricator.wikimedia.org/T125039
[19:30:17] no
[19:30:19] I just depooled it
[19:30:20] from k8s
[19:30:28] ok let me try to take a look today
[19:30:33] any idea what time it died?
[19:30:34] chasemp: thanks
[19:30:42] chasemp: *at least* 24h ago
[19:30:47] don't have a firm timeline
[19:30:53] I can look at k8s logs
[19:33:19] if you could that would be cool
[19:33:32] I'm redlly wondering if it has been stuck since the nfs fallout from teh wikimedia redirect
[19:33:36] if so then no mystery
[19:35:25] > Jan 25 15:11:41 tools-k8s-master-01 kube-apiserver[1282]: E0125 15:11:41.075571 1282 errors.go:62] apiserver received an error that is not an unversioned.S
[19:35:25] Labs, Tool-Labs, wikitech.wikimedia.org, Documentation: Create a wiki documentation page for each tool - https://phabricator.wikimedia.org/T122865#1979203 (bd808) Any objection by the watchers here to my general proposal from {T123429} to make a nice place for such tool docs but not automatically c...
[19:35:25] Labs, Labs-Infrastructure: tools-worker-1002 locked up - https://phabricator.wikimedia.org/T125039#1979206 (yuvipanda) First error from Jan 25 15:11:41 UTC, when it timed out trying to reach 1002.
[19:35:26] chasemp: I still feel like the wikimedia redirect and nfs outage were only co-incidently related
[19:35:26] chasemp: we should also write an incident report for that
[19:35:29] at least, not causal :)
[19:35:35] yeah let's do that but I think it was pretty related
[19:36:04] Labs, Tool-Labs, wikitech.wikimedia.org, Documentation: Create a wiki documentation page for each tool - https://phabricator.wikimedia.org/T122865#1979208 (Matthewrbowker) As a developer, I **support** creation of a namespace and **oppose** automatic creation of pages for the reasons stated above.
[19:36:15] ok
[19:36:56] I'm not up for testing the theory though atm :)
[19:37:22] heh
[19:37:34] * YuviPanda is working from uc berkeley today
[19:37:49] I'm going to separate the flannel etcd cluster from the k8s one today
[19:37:52] different security domains
[19:54:55] valhallasw`cloud: btw, in trying to build in-house debian/ubuntu images, I spent a few days fighting with the upstream bash scripts and then eventually just rewrote it to python
[19:54:57] https://github.com/wikimedia/operations-docker-images-debian
[19:55:04] next step will be to try get upstream to accept the rewrite
[19:55:05] or port
[19:59:37] YuviPanda: ah, nice
[19:59:57] so this allows easily customizable base debian (and soon ubuntu) images
[20:00:06] the wikimedia-prod one already works fine and is useful
[20:00:20] is there a reason why we want debian/ubuntu specifically? iirc there was a 'tiny docker linux' as well
[20:00:43] or just because we use debian everywhere?
[20:00:46] also, why not just jessie?
[20:07:04] valhallasw`cloud: it is just jessie
[20:07:18] valhallasw`cloud: and for why we want ubuntu because we've to recreate the current exec_environs for legacy stuff
[20:07:43] new stuff should all be just jessie, but if we say 'yeah, only php 5.6 from now on!' all the fucktons of stuff we have on precise are going to stay that way
[20:08:14] valhallasw`cloud: yeah, alpine linux, but I don't think we care about it being tiny
[20:09:39] YuviPanda: do containers share memory at all? because they are completely seperated from the main system I would guess they don't (after all, when is a shared library the same then?)
[20:10:02] but yeah, I suppose we need precise webgrid hosts on k8s as well
[20:11:50] valhallasw`cloud: they do, they're just linux processes ultimately
[20:12:36] valhallasw`cloud: container is just a linux process with cgroups and namespaces around
[20:12:49] although
[20:12:51] now that you meantion it
[20:12:57] that's a good point and maybe they don't
[20:12:59] oh
[20:13:01] but
[20:13:03] if all of the processes
[20:13:08] are running from the same base image
[20:13:12] they probably should totally be able to
[20:13:15] needs testing
[20:15:27] YuviPanda: yeah I don't know how shared libs work in linux to be sure
[20:17:47] mm, it might be fine if the soname is the same
[20:45:44] I just figured out nfsiostat is a damn liar so now I'm going to take a minute and come back to it :)
[20:48:24] Labs, Tool-Labs: tools-cron-01 - /bin/sh: 1: cannot create /dev/null - : Permission denied - https://phabricator.wikimedia.org/T125053#1979472 (scfc) >>! In T125053#1974101, @scfc wrote: > […] > However I always assumed that `cron` feeds the line to a shell which then splits it into words, etc., and I exp...
[20:55:54] Labs: Figure out what to do about servicegrouphomedirpattern - https://phabricator.wikimedia.org/T125002#1979505 (Andrew) Service groups currently exist in tools, tools-beta, bots, catgraph and snuggle. In tools, tools-beta and bots the the default homedir is set to /data/project/. In catgraph and...
[20:58:44] Reedy or anomie, can I get a +1 for https://gerrit.wikimedia.org/r/#/c/266942/?
[21:04:48] hashar: If you’re still working, can I get a hand with https://phabricator.wikimedia.org/T124613
[21:04:49] ?
[21:12:20] andrewbogott: oh
[21:12:58] andrewbogott: I dont have any good solution for that one :-:
[21:13:11] Is it a new test? Or an old test that just started breaking?
[21:13:19] the ApiDocumentationTest invokes the api methods
[21:13:26] and execute the code
[21:13:34] but openstackmanager api entry point attempt to connect to a ldap backend :/
[21:13:48] ApiDocumentationTest is a test from mediawiki/core
[21:14:04] it is in core so we have it to run on all extensions automagically
[21:14:19] kind of our "please adhere to mediawiki conventions in your extensions"
[21:14:19] What changed to break everything?
[21:14:33] so what changed is the ApiDocumentationTest got introduced in mediawiki
[21:14:43] when you send a patch for openstack manager, CI clones MediaWiki as well
[21:14:56] and run a set of tests inside mediawiki
[21:15:17] so a change made to mediawiki such as introducing ApiDocumentationTest test, can well break an extension even if the extension received no change
[21:16:14] So maybe the fix is to alter ApiDocumentationTest and prevent the test in this case
[21:16:37] I am not sure how we could dynamically skip/ignore that test
[21:16:38] I guess I’ll look at that :/
[21:16:49] the easiest would be to revert the change in mediawiki
[21:19:39] what I tried to explain in my comment, is that would need a way to inject a different ldap backend
[21:20:22] yeah
[21:21:06] or define a stub for $wgAuth
[21:22:12] I’ll read the test and see what I can do
[21:22:34] So many digressions...
[21:25:56] :(
[21:25:58] at least anomie is aware of it
[21:29:42] andrewbogott: For https://gerrit.wikimedia.org/r/#/c/266942/, I think you will need the check for $wgAuth instanceof LdapAuthenticationPlugin after all.
[21:30:31] anomie: why?
[21:31:02] andrewbogott: The test failure you're getting on PS3 is because it's trying to call a method on $wgAuth that only exists on LdapAuthenticationPlugin.
[21:31:12] ah, I see. ok...
[21:31:30] so that makes me think the test is wrong, not so much the code :)
[21:32:18] The other option would be to have CI check out LdapAuthentication in order to run the tests for OpenStackManager, if it's not doing that already.
[21:32:47] oh
[21:32:50] worth a try
[21:34:29] https://gerrit.wikimedia.org/r/267147
[21:34:35] though I am not sure how $wgAuth would be set
[21:35:54] hashar: it looks like 'Call to undefined method’ is catchable in php7 but not in earlier versions :(
[21:36:45] with hhvm
[21:36:56] you can even redefine a built-in function iirc
[21:37:06] CI doesn’t run the tests on hhvm though, does it?
[21:37:19] something like: facebook_redefine_function( 'ldap_list', function () { return array(); } );
[21:45:13] ok, so, including LdapAuthentication didn’t help
[21:45:41] anomie: you are my hero
[21:45:59] anomie: I would have to send you goodies at home :-}
[21:46:05] andrewbogott: got a patch
[21:46:48] link?
[21:46:51] YuviPanda: Everything's on kilo already, right?
[21:47:00] * ostriches still sees juno files lying about puppet
[21:47:24] ostriches: yep, kilo
[21:47:34] but I always keep n-1 around so I know what to look at when writing the n+1 config
[21:47:59] writing the doc
[21:48:13] Er, nvm then
[21:48:23] Also, I see `class openstack ($version = 'juno'){}`
[21:48:25] :)
[21:48:56] well, it’s overridden in hiera, I hope
[21:49:09] it is we checked before :)
[21:51:29] andrewbogott: it is running at https://gerrit.wikimedia.org/r/#/c/267152/
[21:51:45] ah
[21:54:38] I still kind of hate that we’re hacking the tested code rather than fixing the test. Not that I know how to fix the test.
[21:55:26] now that fails with "PHP Fatal error: Call to undefined function ldap_bind() in /mnt/jenkins-workspace/workspace/mwext-testextension-php53/src/extensions/LdapAuthentication/LdapAuthentication.php on line 176" ..
[21:55:56] and of course, the test suite passes on my machine
[22:16:38] andrewbogott: sorry it is too late for me :-/
[22:16:56] at least https://gerrit.wikimedia.org/r/#/c/267152/ does work for me
[22:17:03] hashar: ok. I’ll just stare daggers at anomie until he fixes it :)
[22:17:04] $wgAuth is properly set to LdapAuthentication on my local machine
[22:18:14] hashar: Does the CI infrastructure have PHP's ldap extension installed?
[22:18:35] (probably package php5-ldap)
[22:18:52] oh
[22:19:00] yeah I was thinking about that
[22:19:42] un php5-ldap
[22:19:58] but my machine has it
[22:22:56] andrewbogott: how do we get php5-ldap on silver? it is not in puppet.git apparently
[22:24:15] anomie: you are, as usual, the most helpful hacker around :}
[22:25:33] ok, I’ll write a puppet patch...
[22:25:43] oh, wait, I misread.
[22:25:47] I don’t know! I’ll look
[22:25:55] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Config, Patch-For-Review: ApiDocumentationTest failure: Undefined property: AuthPlugin::$boundAs - https://phabricator.wikimedia.org/T124613#1979874 (hashar) So to get it fixed we need: [x] LdapAuthentication injected as a dependency by CI...
[22:27:20] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Infrastructure: CI slaves need package php5-ldap for OpenStackManager/LdapAuthentication - https://phabricator.wikimedia.org/T125158#1979880 (hashar) NEW
[22:27:34] andrewbogott: I have just manually installed it
[22:27:41] and filed a task to remember to puppetize it
[22:28:14] it is puppetized on silver, in openstack_manager.pp
[22:29:54] hey it passed!
[22:30:16] andrewbogott: https://gerrit.wikimedia.org/r/267165 is the puppet related stuff
[22:31:08] anomie: thank you very much
[22:31:22] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Infrastructure, Patch-For-Review: CI slaves need package php5-ldap for OpenStackManager/LdapAuthentication - https://phabricator.wikimedia.org/T125158#1979909 (hashar) a:hashar
[22:31:28] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Config, Patch-For-Review: ApiDocumentationTest failure: Undefined property: AuthPlugin::$boundAs - https://phabricator.wikimedia.org/T124613#1979911 (hashar) a:hashar
[22:32:26] all set I guess
[22:32:44] the php5-ldap is installed on precise nodes which run the Zend PHP5.3 jobs
[22:33:06] anomie: will we still need https://gerrit.wikimedia.org/r/#/c/266942/ after this?
[22:33:12] the puppet patch clarifies it https://gerrit.wikimedia.org/r/267165
[22:33:54] quite happy we haven't had to revert the mediawiki ApiDocumentationTest
[22:33:56] andrewbogott: It depends on whether the tests start passing without it ;)
[22:34:12] pass without :-}
[22:35:04] it is terrible
[22:35:10] since it is only for the wmf ci context
[22:35:26] so locally it would still fail whenever you get OpenStackManager / LdapAuthentication loaded
[22:37:51] https://gerrit.wikimedia.org/r/#/c/266940/
[22:37:55] passes!
[22:38:00] \O/
[22:38:05] so
[22:38:05] So I’m unblocked. Thanks y'all
[22:38:17] so...
[22:38:31] you can tell who is the smartest guy in the room: anomie, who in 3 lines has a 100% success rate in unblocking the whole mess
[22:38:46] andrewbogott: then you can just 'recheck' all patches
[22:38:51] well, he also /caused/ the mess :p
[22:39:11] in good faith
[22:39:31] despite a lot of attention, the patch got reverted because some extensions were lagged out
[22:39:39] and apparently OSM has not been detected
[22:40:10] OSM is the pointiest of the corner cases.
[22:40:34] I have rechecked all of them
[22:40:41] sleep time now
[22:40:49] have a good afternoon!
[22:41:18] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Infrastructure: CI rejecting every patch to OpenStackManager - https://phabricator.wikimedia.org/T125008#1979972 (Andrew) Open>Resolved @scfc, you're right. And, in any case, this is fixed!
[22:41:30] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Config, Patch-For-Review: ApiDocumentationTest failure: Undefined property: AuthPlugin::$boundAs - https://phabricator.wikimedia.org/T124613#1979974 (Andrew) Open>Resolved