[00:05:25] !help
[00:05:25] kdhingra307: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[00:06:11] kdhingra307: What's up?
[00:07:20] was trying to access vps but I was denied access by remote
[00:07:38] I have set up ssh public keys and config properly
[00:08:16] but my account does not belong to the shell group (mentioned in the wiki guide to connecting to vps)
[00:09:30] can you help, bstorm_?
[00:09:52] What is your project name?
[00:11:59] I am working in search/MjoLniR; I am a GSoC student and need a VPS for analysis of my models
[00:12:36] Ah ok. Have you requested the VPS project yet? https://phabricator.wikimedia.org/project/view/2875/
[00:13:00] Or is it an existing project that you should have access to?
[00:13:57] ebernhardson has already configured it, I guess
[00:14:10] \msg bstorm_
[00:15:18] though its subdomain is clickmodel.eqiad.wmflabs
[00:16:40] !help Hello, Amitie 10g once again here. I'm running the WebArchiveBOT backend. When configuring the pages per query to 1000, the message "SQLSTATE[HY000] [2006] MySQL server has gone away" is printed, but when configuring it to 100, there is enough time to avoid the connection closing. I use PDO to connect to the MariaDB server.
[00:16:40] Amitie_10g: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[00:17:52] kdhingra307: what is your ldap name (or name on wikitech)?
[00:18:10] kdhingra2210 on wikitech and kdhingra as shell name
[00:18:17] ok
[00:19:30] The relevant part of the code is in https://github.com/Amitie10g/WebArchiveBOT/blob/master/bin/class.php at line 498 (the function), line 500 (the connection), and between lines 549 and 572 (populating the query and running PDO::exec())
[00:20:43] What is the time limit for connections to the MariaDB server (using PDO)?
[00:23:02] Amitie_10g: there is a long-running query killer.
[00:23:15] kdhingra307: did you follow these steps: https://wikitech.wikimedia.org/wiki/Help:Access#Prerequisites
[00:23:26] And can you access the main toolforge login
[00:23:35] yes
[00:23:55] Hmm. I don't see your account home directory
[00:24:39] I see your account in ldap
[00:25:25] So you can log into login.tools.wmflabs.org?
[00:26:05] nope I cannot
[00:26:13] Hmm.
[00:26:14] it yields "connection closed" at port 22
[00:27:36] So, would it be more efficient to run PDO::query() on every iteration instead of creating a big query and running it all at once?
[00:29:54] Depends what you're doing, what you're querying, how many rows you're trying to return, how intensive/performant your query is
[00:31:53] The query is an INSERT INTO statement with 1000 fields (now configured as 100).
[00:33:02] 1000 fields or 1000 rows?
[00:33:17] Rows
[00:34:25] kdhingra307: It would likely be best to make sure that works. You need to be able to proxy through the bastion server before you can get to a VPS instance.
[00:34:52] It looks like your setup isn't entirely complete, from all indications
[00:35:20] I did put my public key into my wikitech account
[00:35:39] I did modify .ssh/config and it's loading the config very well
[00:36:01] i am proxying it through primary.bastion.wmflabs.org
[00:36:05] Ok
[00:36:33] Did you do this: https://wikitech.wikimedia.org/wiki/Help:Create_a_Wikimedia_developer_account#VPS_and_General_Users
[00:36:54] Just trying to find the missing piece here.
[00:37:07] It's more important that you can log into anything than proxying and such
[00:37:16] I moved the code to run PDO::exec() on every iteration. So, would making several exec() calls with one row each across thousands of iterations be more efficient and less problematic than one huge query with thousands of row insertions at once?
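The batching trade-off discussed above (one giant multi-row INSERT that trips the query killer vs. one exec() per row) has a middle ground: insert in moderate chunks, so each statement stays within the server's limits while avoiding per-row round-trip overhead. A minimal sketch of the chunking logic, written in Python rather than the bot's PHP, with a hypothetical `execute` callback standing in for PDO::exec() (the table and column names are made up for illustration):

```python
# Sketch: split rows into moderately sized multi-row INSERT statements,
# instead of one per row or one giant statement for everything.
# `execute` is a hypothetical stand-in for the real database call
# (PDO::exec() in the WebArchiveBOT code); here it just records the SQL.

def chunked(rows, size):
    """Yield successive chunks of at most `size` rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def batched_insert(execute, table, columns, rows, batch_size=100):
    """Build one parameterized multi-row INSERT per chunk of rows."""
    placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    for chunk in chunked(rows, batch_size):
        sql = "INSERT INTO {} ({}) VALUES {}".format(
            table, ", ".join(columns), ", ".join([placeholder] * len(chunk)))
        params = [value for row in chunk for value in row]
        execute(sql, params)

# Example: 250 rows in batches of 100 produce 3 statements.
statements = []
batched_insert(lambda sql, params: statements.append((sql, params)),
               "archived_pages", ["title", "url"],
               [("t%d" % i, "u%d" % i) for i in range(250)])
```

The batch size then becomes a single tuning knob: small enough that no statement runs long enough to be killed, large enough to amortize the statement overhead.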
[00:37:53] kdhingra307: I might suggest you create a task so we can poke around more, just in case some script didn't run or there is some additional step needed
[00:38:20] bstorm_ yes, I have created a proper wikitech account
[00:38:36] I have not been given permission in my wikitech account to do ssh
[00:38:40] Can you create a task for investigating this? https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[00:39:18] yep I will, thanks
[00:40:52] 👍🏻
[00:55:37] Also, I'm getting SlowTimer [5170ms] at curl, but opening the given URL in my browser, the page loads almost immediately.
[00:56:08] I'm running HHVM
[01:36:43] I could consider the first issue resolved, but I still see the SlowTimer as a serious issue
[01:53:39] ffs https://tools.wmflabs.org/csp-report/search?fs=www.hampel-auctions.com
[07:39:27] !log tools.dibot qdel-ed all jobs except lighttpd of dibot and disabled crontab (`sed 's/^/#/'`) T195834
[07:39:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.dibot/SAL
[07:39:30] T195834: mono-based bot hangs after mono version upgrade - https://phabricator.wikimedia.org/T195834
[09:54:07] !help
[09:54:07] kdhingra307: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[10:25:37] kdhingra307: yes?
[10:49:31] zhuyifei1999_: I was trying to ssh vcs but I can't
[10:49:39] vcs?
[10:50:02] clickmodel.eqiad.wmflabs
[10:50:02] what's the hostname?
[10:51:06] are you a member of the 'search' project?
[10:51:14] yeapp
[10:51:17] (because the FQDN is clickmodel.search.eqiad.wmflabs)
[10:51:29] what is the error you are getting?
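The crontab-disable trick logged above (`sed 's/^/#/'`) works by prefixing every line with `#`, which turns each cron entry into a comment without deleting it, so the crontab can later be restored by stripping the prefix. The same transform, sketched in Python (the cron entries below are hypothetical examples, not from the dibot tool):

```python
# Comment out every line of a crontab dump, the same effect as
# piping it through `sed 's/^/#/'`.
def disable_crontab(text):
    return "".join("#" + line for line in text.splitlines(keepends=True))

# Hypothetical crontab contents for illustration.
crontab = "0 * * * * /usr/bin/jsub myjob\n30 2 * * * /usr/bin/backup\n"
disabled = disable_crontab(crontab)
```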
[10:51:39] permission error
[10:52:01] I tried this (clickmodel.search.eqiad.wmflabs) too
[10:52:31] It's giving me this error: `ssh_exchange_identification: Connection closed by remote host`
[10:52:48] but I can ssh to primary.bastion.wmflabs.org without any problem
[10:53:28] could you add `-vvv` onto the ssh command and paste the logs?
[11:05:34] zhuyifei1999_: yep, just a sec
[11:07:46] !log wikidata-dev wikidata-constraints update MediaWiki and extensions to current master
[11:07:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[11:08:54] OpenSSH_7.6p1, LibreSSL 2.6.2
[11:08:55] debug1: Reading configuration data /Users/karan/.ssh/config
[11:08:55] debug1: /Users/karan/.ssh/config line 1: Applying options for *
[11:08:56] debug1: /Users/karan/.ssh/config line 2: Deprecated option "useroaming"
[11:08:58] debug1: /Users/karan/.ssh/config line 4: Applying options for *.eqiad.wmflabs
[11:09:00] debug1: /Users/karan/.ssh/config line 7: Applying options for *.wmflabs
[11:09:02] debug1: /Users/karan/.ssh/config line 10: Applying options for *
[11:09:04] debug3: kex names ok: [curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256]
[11:09:07] debug1: Reading configuration data /etc/ssh/ssh_config
[11:09:08] debug1: /etc/ssh/ssh_config line 48: Applying options for *
[11:09:10] debug1: Executing proxy command: exec ssh -a -W clickmodel.search.eqiad.wmflabs:22 kdhingra@primary.bastion.wmflabs.org
[11:09:12] debug1: permanently_drop_suid: 501
[11:09:14] debug1: identity file wikimedia type 0
[11:09:16] debug1: key_load_public: No such file or directory
[11:09:18] debug1: identity file wikimedia-cert type -1
[11:09:20] debug1: Local version string SSH-2.0-OpenSSH_7.6
[11:09:22] kdhingra@primary.bastion.wmflabs.org: Permission denied (publickey).
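The `Permission denied (publickey)` on the ProxyCommand hop above is the telltale sign diagnosed later in this conversation: the bastion connection spawned by the proxy command is not presenting a key. A hypothetical `.ssh/config` sketch that gives both hops the key (the username and bastion hostname are from this conversation; the key path `~/.ssh/wikimedia` is an assumption based on the `identity file wikimedia` debug line, and ProxyJump requires OpenSSH 7.3 or newer):

```
# Sketch only -- adapt hostname patterns and key path to your setup.
Host primary.bastion.wmflabs.org
    User kdhingra
    IdentityFile ~/.ssh/wikimedia

Host *.eqiad.wmflabs
    User kdhingra
    IdentityFile ~/.ssh/wikimedia
    ProxyJump primary.bastion.wmflabs.org
```

Because IdentityFile is set per-host in the config, the jump connection to the bastion also uses the key, which is exactly what the bare `ssh -a -W ...` proxy command in the debug output was missing.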
[11:09:24] ssh_exchange_identification: Connection closed by remote host
[11:09:26] zhuyifei1999_: please find the logs above
[11:10:25] hmm
[11:10:39] it looks like the proxycommand failed
[11:11:15] you can ssh kdhingra@primary.bastion.wmflabs.org directly, right?
[11:11:25] yeah I can
[11:12:15] did you specify a key explicitly with ssh -i when you ssh kdhingra@primary.bastion.wmflabs.org?
[11:12:27] yes, in both cases I specified the key
[11:12:54] ok, so the issue is that when the proxycommand works, it's like there's an ssh tunnel
[11:13:16] the command in which the tunnel is set up is `ssh -a -W clickmodel.search.eqiad.wmflabs:22 kdhingra@primary.bastion.wmflabs.org`
[11:13:20] the key is missing there
[11:13:28] ohhk
[11:13:54] thanks, gotcha
[11:13:55] I would recommend adding the 'which key to use' to your .ssh/config
[11:13:56] perhaps using the ProxyJump option would be easier
[11:14:15] (but yeah, also configure IdentityFile in the config)
[11:14:19] I made a mistake there
[11:15:00] * zhuyifei1999_ has a 59-line .ssh/config
[17:31:36] hmm I'm getting this
[17:31:36] Failed resources (up to 3 shown): File[/var/lib/prometheus/node.d]
[17:32:21] Error: Could not find user prometheus
[17:32:21] Error: /Stage[main]/Prometheus::Node_exporter/File[/var/lib/prometheus/node.d]/owner: change from 14736 to prometheus failed: Could not find user prometheus
[17:32:25] !help ^^
[17:32:25] paladox: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[17:32:31] I believe arturo is addressing it, paladox
[17:32:37] thanks chasemp
[17:32:55] yeah, thanks paladox
[17:38:49] !log tools Added grid engine quota to limit user debenben to 2 concurrent jobs (T196486)
[17:38:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:38:52] T196486: Concurrent generated jobs from a single user overloaded grid engine -
https://phabricator.wikimedia.org/T196486
[17:39:24] !log tools T196137 clush: delete `prometheus` user and re-create it locally. Then, chown prometheus dirs
[17:39:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:39:26] T196137: toolforge: prometheus issue is filling up email queue - https://phabricator.wikimedia.org/T196137
[18:02:51] !log tools Forced puppet run on tools-bastion-03 to re-enable logins by dubenben (T196486)
[18:02:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:02:54] T196486: Concurrent generated jobs from a single user overloaded grid engine - https://phabricator.wikimedia.org/T196486
[19:09:28] !help I caused some problems to the grid lately, see https://phabricator.wikimedia.org/T196486
[19:09:28] Debenben: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[19:09:44] I was told that I could restart the jobs
[19:10:00] however it appears that they were deleted from the job queue
[19:10:04] Debenben: That was me
[19:10:59] The queueing was working well, but I saw a torrent of job failures come in, so I became concerned. I think the failures were legitimately queueing nicely, but I jumped in and killed the jobs when I got the spam again.
[19:11:38] So you didn't absorb a huge amount of grid resources at any given time, but the jobs chew through and die really quickly now.
[19:12:53] I might not have needed to delete them all from the queue, since they were queuing up. Sorry about that. I was just concerned by the flood of failure emails. You probably do need to check the jobs, though.
[19:14:01] OK, can you give an example of a job that failed?
I just checked some randomly and they appeared to be ok
[19:14:22] `06/05/2018 14:33:47 [4470:1402]: execvlp(/var/spool/gridengine/execd/tools-webgrid-generic-1404/job_scripts/6893051, "/var/spool/gridengine/execd/tools-webgrid-generic-1404/job_scripts/6893051") failed: Exec format error`
[19:14:43] That was job 6893051
[19:15:33] Also 6902576, which doesn't give any useful error message? `Shepherd error:
[19:15:33] 06/05/2018 18:53:00 [600:30554]: exit_status of epilog = 255`
[19:24:23] I think overall, the reason for the issue is the exit status. Of course, generally a non-zero exit is a failure. I'm not sure what the issue is there.
[19:24:36] bstorm_: iirc the epilog script for webservices tries to deregister the webservice with the proxy
[19:24:56] but I have no clue how that was ever configured
[19:25:18] and SGE also has a habit of just doing random stuff when things don't completely go the way they are supposed to :/
[19:25:36] Debenben: should this be running on web nodes? It looks like generic awk?
[19:26:18] That may be the issue, actually. Jobs going to the wrong queue
[19:26:52] Since these don't look like web stuff at all, if there is a queue-level epilog script, it wouldn't apply here
[19:26:56] There's a subtlety with submitting with jsub vs with qsub. jsub automatically chooses the right queue, and iirc qsub does not
[19:27:10] qsub lets you do many things you probably don't want to
[19:27:14] :)
[19:27:53] bstorm_: you are completely right about the epilog being queue-level... https://github.com/wikimedia/puppet/blob/e959321aa620b77403cc9379db2e86080323c6e8/modules/toollabs/templates/gridengine/queue-webgrid.erb
[19:28:40] so if another job is submitted there, /usr/local/bin/portreleaser will be called, and that will probably get confused. However, that should only happen afterwards, and shouldn't really affect the result of the job itself. It might put the queue in an error state, though
[19:28:58] Yup, so that seems to be the issue.
The jobs need to go to the right queue, which is `task` from the look of the code.
[19:29:35] I'll add that to the user's talk page, since they have logged out of the channel
[19:29:46] also, is the job a raw `awk` command? (i.e., not in a script?)
[19:31:19] that could explain the `Exec format error` -- I think SGE tries to copy scripts, but that tends to fail with binaries. The "-b y" flag to qsub handles that, and jsub handles it magically
[19:31:51] sorry, I was interrupted
[19:32:25] you are right, it is just awk, so it shouldn't run on a web note
[19:32:27] node
[19:33:30] for job number 6893051 I don't have any logs; I am not sure if that was my job, or if it just didn't produce any logs
[19:33:34] the other,
[19:34:23] 6902576, I get a "port unregistration failed!" in the error log
[19:34:32] however, otherwise everything appears to be fine
[19:34:53] Debenben: are you using qsub directly, or our jsub wrapper?
[19:35:06] I used qsub
[19:35:14] do you have access to my home?
[19:35:22] that may cause some of the issues, actually
[19:35:57] it is the script called mathextract.sh
[19:36:03] you should try using "jsub" in place of "qsub". It should be a transparent replacement
[19:38:56] Yup. If you do use qsub, make sure you use `-q task` for these so it goes to the right queue for the job. "jsub" will, however, likely fix the problem without digging. jsub sets a few good, sane defaults. It will also accept other options that you would normally pass to qsub.
[19:40:17] I resubmitted job 6902576 with jsub now
[19:40:35] should I try to submit the others as well?
[19:43:39] Debenben: maybe do a subset? Like just the a* jobs or something?
[19:43:49] ok
[19:50:55] Not getting any alerts :)
[19:52:08] the "port unregistration failed" message also disappeared
[19:52:46] should I add -q task, or was that automatically chosen?
[19:53:00] jsub automatically will chose that queue by default
[19:53:14] s/chose/choose/
[19:53:35] ok, can I restart the rest with jsub?
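The two submission paths discussed above can be summarised as a command sketch (the script name is the user's; the exact flags are an illustration based on this conversation, not a complete qsub invocation):

```
# With qsub, the queue and binary handling must be spelled out;
# omitting -q lets jobs land on web nodes, where the queue-level
# epilog (portreleaser) then fails:
qsub -q task mathextract.sh

# jsub picks the task queue and sane defaults automatically, and
# passes through other qsub-style options:
jsub mathextract.sh
```

This matches the diagnosis in the conversation: the failures were not from the jobs themselves but from the webgrid queue's epilog script running against non-web jobs.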
[19:53:44] I think it will work fine, really.
[19:53:50] ok
[19:54:30] On the web queues, there are scripts that run to connect things to web resources. Those were failing because this isn't a web job. Using jsub, it correctly picks the task queue.
[20:00:29] o/ bd808
[20:00:46] so we want to merge all those k8s related project requests into 1? :)
[20:01:07] addshore: o hi. I'm in a pile of meetings now, but yeah, maybe?
[20:01:20] okay!
[20:01:30] * addshore will comment on one of the tickets with some quota suggestion or something
[20:01:51] * valhallasw`cloud is off to read & sleep, g'night all
[20:07:48] o/ valhallasw`cloud
[20:07:50] sleep well
[20:15:04] Hi everyone. I'm getting "500 internal server error" trying to log into paws.wmflabs.org. Who should I ask about that?
[20:15:24] notconfusing: what is your username?
[20:15:45] Wiki oauth username is "maximilianklein"
[20:15:53] nvm I'm getting that as well, let me check
[20:16:25] thanks, chicocvenancio
[20:19:01] seems to be related to the toolsdb outage
[20:19:21] s/outage/maintenance
[20:19:58] https://www.irccloud.com/pastebin/p8gFEFAP/
[20:24:04] !log tools.paws restarting hub container to attempt to fix sql session
[20:24:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.paws/SAL
[20:24:40] notconfusing: ^ indeed that fixed it
[20:26:38] yes, it did indeed. thank you so much, chicocvenancio
[21:27:49] bd808: hare I updated https://phabricator.wikimedia.org/T196094 with a better quota for the 3 projects to be combined
[21:28:29] addshore: any reason to use k8s 1.6 instead of a more updated version there?
[21:29:00] hmm, where do I say I'll be using k8s 1.6?
[21:29:52] "using the same puppetized k8s setup as is used for the tools k8s cluster" maybe I'm misreading something
[21:30:09] ahh, I didn't realize that it was only using 1.6
[21:30:20] In which case it will be based on it, perhaps, but not the same
[21:30:40] cool
[21:30:45] 1.10 ftw
[21:36:38] addshore: looks fine to me. Should I withdraw the other two proposals?
[21:37:01] perhaps :)
[21:41:09] Hola. Quick question(ish) thing: I was told that the docs at https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_instances_with_ProxyCommand_ssh_option_(recommended) may be slightly off because they don't use the actual FQDN in the SSH connection instructions. Didn't want to actually change the docs, though, since I don't know if that's actually a correct statement....
[22:01:14] I don't see anything in there that is wrong. It's not exactly like my config, but using wildcards is definitely part of it.
[22:01:14] hi, can anyone point me to information about using PAWS with a virtualenv? I'm trying to use a mysqlclient that isn't in the default environment. thanks in advance
[22:01:51] notconfusing: not supported at this time, unfortunately
[22:02:11] but, you can start a terminal and install anything you want with pip
[22:02:33] (it will be a single environment for your whole server, though)
[22:02:38] bstorm_: Thanks. The specific question was about whether `ssh deployment-webperf11.eqiad.wmflabs` was specific enough, or if it needed to be `ssh deployment-webperf11.deployment-prep.wmflabs.org`. If the former (which is what's documented) is good enough, it works for me. Less typing, too :-)
[22:03:20] @chico, I see, let me try that; even if it's just one global environment that I can add packages to, that'd be great
[22:04:50] Ahh. That's an interesting one. marlier, when I ssh, I've found that in some cases you can get away with leaving out the project name, and I have NO IDEA WHY yet.
However, yeah, it is technically supposed to include the VPS project name in the FQDN
[22:06:14] notconfusing: I don't quite remember where pip installs the local packages to, but I think the env doesn't survive a restart
[22:06:38] one way I get around that is to have a cell in the notebook install the packages
[22:06:55] `!pip install package` works in the notebooks
[22:07:05] Oh wait, you also added ".org" at the end, marlier. That will only work at all in some cases.
[22:07:07] right, that's what I'm doing now.
[22:07:29] bstorm_: the ".org" was just a typo
[22:07:30] Sorry.
[22:07:48] Hrm, and the second version doesn't have the eqiad as well, but that's probably the same thing
[22:08:10] So for VPS, you usually need the project name after the hostname, before eqiad.wmflabs
[22:08:27] Sometimes it works without, and I don't know why
[22:08:36] Because usually it won't and shouldn't
[22:08:51] all of the DNS names are created both with and without the project included
[22:08:58] because legacy, mostly
[22:09:09] originally there were no project names in dns
[22:09:09] Ahhh ok
[22:09:23] and when we introduced that, we kept making the old name style too
[22:09:33] which ... we should probably think about sunsetting
[22:09:36] That helps :) Since I cannot always make it work without the project in all cases.
[22:09:56] @chicocvenancio, I'm trying to do `!pip install mysqlclient` which is giving the error "OSError: mysql_config not found"; stackoverflow says that can be fixed by `apt install libmysql-dev`, which as you can see is not quite an option here either
[22:10:43] my end goal is to connect to the database replicas with 'sqlalchemy' so I can feed sql results into python pandas (if there's another way around this).
[22:12:48] isn't python-mysqldb enough?
[22:14:31] * notconfusing waves at halfak
[22:15:18] @platonides, maybe, let me check; it seems that pandas wants sqlalchemy, and sqlalchemy wants python-mysqlclient, but I will double-check this assumption
[22:26:25] SUCCESS. I fixed my own problem. The solution is to use the connection string "mysql+pymysql://{etc}"
[22:26:35] is there somewhere I should document this trick?
[22:26:39] notconfusing, don't feed me to python pandas! THAT SOUNDS HORRIFYING!
[22:27:12] sql, welcome to the jungle
[22:35:11] notconfusing: It doesn't look like we have a single place to look for PAWS docs today, unfortunately. There is https://www.mediawiki.org/wiki/PAWS which seems to be a stub page, and https://www.mediawiki.org/wiki/Manual:Pywikibot/PAWS which is very focused on the initial use-case of PAWS, which was running pywikibot scripts
[22:35:45] the https://www.mediawiki.org/wiki/PAWS page is maybe the right place to start building better end-user help
[22:36:00] great, I'll flesh out https://www.mediawiki.org/wiki/PAWS as I struggle through myself
[22:45:07] notconfusing: sorry for not responding; great that you found an alternative
[22:45:59] we could add libmysql-dev to the user server if necessary
[22:46:20] this works fine for now, thanks for all your great help
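The connection-string trick that solved the PAWS problem above works because the `mysql+pymysql://` scheme tells SQLAlchemy to use the pure-Python `pymysql` driver, so the C `mysqlclient` (and `libmysql-dev`) is never needed. A sketch of building such a URL; the username, password, host, and database below are placeholders, not real replica credentials:

```python
# Build a SQLAlchemy URL that selects the pure-Python pymysql driver
# instead of the C mysqlclient, so nothing needs compiling on PAWS.
from urllib.parse import quote_plus

def replica_url(user, password, host, database):
    """mysql+pymysql://user:pass@host/dbname, with the password
    percent-encoded in case it contains special characters."""
    return "mysql+pymysql://{}:{}@{}/{}".format(
        user, quote_plus(password), host, database)

# Placeholder credentials and host for illustration only.
url = replica_url("u12345", "p@ss/word", "replica.db.example", "enwiki_p")
# With sqlalchemy installed, this URL would be passed to
# sqlalchemy.create_engine(url), whose engine can then be handed
# to pandas.read_sql() to feed query results into a DataFrame.
```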