[00:06:21] 10Tool-Labs-tools-Other, 6Community-Tech, 7Community-Wishlist-Survey, 7Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2033723 (10kaldari) @MusikAnimal: Would you like some help tackling the issues at https://github.com/MusikAnimal/pageviews/issues? [00:14:52] 10Tool-Labs-tools-Other, 6Community-Tech, 7Community-Wishlist-Survey, 7Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2033763 (10MusikAnimal) @kaldari Sure! The only problem is I'm in the middle of a big cleanup. When this first started I just wanted to get something out that wo... [00:24:46] chasemp: I'm documenting it on the ticket [00:25:08] chasemp: but yeah, mount_nfs: false on project hiera, mount_nfs: true on the instances that need it, and am setting up clush to umount /data/project everywhere [00:25:16] Negative24: sure! [00:25:54] yuvipanda: how are you hosting and distributing k8s? hyperkube? [00:25:59] nope [00:26:02] just puppet roles [00:26:10] in case unseen https://phabricator.wikimedia.org/T127066 [00:29:18] yuvipanda: are these the roles? https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/tools.pp [00:31:06] Negative24: yeah [00:31:09] Negative24: and the k8s module [00:31:14] yep [00:31:16] Negative24: I don't expect it to be generally useful outside WMF tho [00:31:33] ok, I'll take a look [00:32:10] how do you get outside access to services? 
[00:32:23] Negative24: search for kube2proxy.pp [00:32:27] we have a thing :D [00:32:34] ah [00:33:20] because they say you need google cloud to allocate an external ip for loadbalancer services [00:36:48] Negative24: right [00:36:58] Negative24: yeah, and we just have L7 routing via kube2dynproxy [00:37:04] Negative24: there's an upstream effort that uses haproxy for it [00:37:59] yuvipanda: great [00:38:13] I assume they want to use haproxy because it can run universally [00:38:28] I wonder how they do it in google cloud [00:39:04] It's interesting with the k8s docs right now because its so new [00:41:12] yeah [00:43:07] chasemp: do you have the clush command you were using handy? [00:43:09] I keep losing it [00:48:44] 6Labs, 6Phabricator: Git broken on phabricator labs machines - https://phabricator.wikimedia.org/T127139#2033843 (10Negative24) 3NEW [01:11:08] Krenair: can you reach deployment-parsoid05? [01:11:13] andrewbogott: ^ (if you're still here) [01:12:44] I can ping it, yuvipanda [01:12:57] I can ssh in [01:13:21] Krenair: hmm, I can't [01:13:30] Krenair: can you try doing 'sudo umount /data/project'? [01:13:50] you can't ping or you can't ssh? [01:13:56] Krenair: can't ssh [01:13:56] salt from deployment-salt works for me as well [01:14:02] interesting. even as root? [01:14:11] yuvipanda@deployment-sca01:/data/project$ ssh deployment-parsoid05.eqiad.wmflabs [01:14:13] Permission denied (publickey). [01:14:15] y [01:14:16] Creating directory '/home/krenair'. 
[01:14:16] Linux deployment-parsoid05 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64 [01:14:17] Krenair: yeah [01:14:21] interesting [01:14:38] oh hmm [01:14:42] it isn't using the right ssh key for some reason [01:14:58] Krenair: baaah [01:15:00] I'm an utter idiot [01:15:03] I just noticed [01:15:08] I was trying to ssh from deployment-sca01 [01:15:10] not my local [01:15:12] ignore me [01:15:54] :D [01:16:33] Krenair: I can't get to deployment-bastion tho [01:16:36] I guess that's been deleted [01:16:43] that's gone [01:16:45] use deployment-tin [01:16:52] ah [01:16:54] ok [01:17:02] Krenair: it should probably be deleted [01:17:06] since it shows up on the lists still [01:17:11] 10Tool-Labs-tools-Other, 6Community-Tech, 7Community-Wishlist-Survey, 7Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2033921 (10kaldari) @MusikAnimal: Cool, created T127143 so we can track it as part of Community Tech work. [01:17:13] oh? thought it was gone for some reason [01:17:13] hm [01:17:21] I still see it [01:17:31] I assumed it was deleted because you said you couldn't get to it [01:18:13] it's apparently running though [01:19:00] yeah [01:20:04] destination host unreachable [01:20:09] yet, nova thinks it's running... [01:21:24] yeah [01:21:24] 6Labs, 10Beta-Cluster-Infrastructure: Disable /data/project for instances in deployment-prep that do not need it - https://phabricator.wikimedia.org/T125624#2033926 (10yuvipanda) I've tried to unmount it from the following instances: ``` deployment-analytics03.eqiad.wmflabs deployment-analytics02.eqiad.wmflab... 
[01:21:27] could try rebooting it [01:24:07] all the console log shows is the login screen [01:27:00] 6Labs, 10Labs-Infrastructure, 6Operations: labservices1001 ran out of disk space - https://phabricator.wikimedia.org/T126572#2033935 (10Stashbot) {nav icon=file, name=Mentioned in SAL, href=https://tools.wmflabs.org/sal/log/AVLs1l6w-0X0Il_jxsDV} [2016-02-17T01:26:57Z] labservices1001 - out of disk... [01:28:44] 6Labs, 10Labs-Infrastructure, 6Operations: labservices1001 ran out of disk space - https://phabricator.wikimedia.org/T126572#2033936 (10Dzahn) /dev/md0 9.1G 6.3G 2.3G 74% / ..but it will come back [01:29:25] labservices1001 was out of disk space [01:29:44] i imagine this may have caused issues for users [01:30:22] i made some space by moving designate-mdns logs to /srv/var . ticket updated [02:25:20] 6Labs, 6Operations: Manual creation of labs account - https://phabricator.wikimedia.org/T125172#2034049 (10Krenair) a:3Krenair We really need to find new LDAP admins who will actually process things. I'll go ahead and try to fix your account given what I know and my existing permissions. [02:29:37] 6Labs, 6Operations: Manual creation of labs account - https://phabricator.wikimedia.org/T125172#2034052 (10Krenair) a:5Krenair>3Cobi Okay, I followed Ryan's instructions, but a) from a clone of the puppet repo on terbium, so I can actually run the script and b) using wikitech's LDAP credentials instead of... [02:50:01] 6Labs, 10Beta-Cluster-Infrastructure: Disable /data/project for instances in deployment-prep that do not need it - https://phabricator.wikimedia.org/T125624#2034059 (10yuvipanda) Everything except deployment-parsoid05 and deployment-sca01 has been handled. 
Parsoid seems to be writing logs to NFS >_> [03:02:05] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: Disable /data/project for instances in deployment-prep that do not need it - https://phabricator.wikimedia.org/T125624#2034076 (10yuvipanda) Disabled on deployment-parsoid05, logs are on `/var/log/parsoid` now. [03:07:05] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: Disable /data/project for instances in deployment-prep that do not need it - https://phabricator.wikimedia.org/T125624#2034078 (10yuvipanda) And an umount -f does the trick in deployment-sca01! [03:14:10] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: Disable /data/project for instances in deployment-prep that do not need it - https://phabricator.wikimedia.org/T125624#2034080 (10yuvipanda) So all instances that do not need NFS do not have NFS anymore! Woo! :D If someone is building a new server tha... [03:15:12] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: Disable /data/project for instances in deployment-prep that do not need it - https://phabricator.wikimedia.org/T125624#2034081 (10yuvipanda) 5Open>3Resolved w00t! [03:15:21] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: Disable /data/project for instances in deployment-prep that do not need it - https://phabricator.wikimedia.org/T125624#2034083 (10yuvipanda) p:5Triage>3Normal [07:37:08] 10Tool-Labs-tools-Other, 6TCB-Team, 7German-Community-Wishlist: "pageviews" tool link at bottom creates invalid HTML syntax (due to missing HTML encoding of characters like '&') - https://phabricator.wikimedia.org/T126975#2034418 (10PatoLogic) @MusikAnimal: you reproduced the issue correctly, but you say "I... 
[07:44:09] 10Tool-Labs-tools-Other, 6TCB-Team, 7German-Community-Wishlist: "pageviews" tool link at bottom creates invalid HTML syntax (due to missing HTML encoding of characters like '&') - https://phabricator.wikimedia.org/T126975#2034446 (10PatoLogic) @Raymond: sorry, as of 2016-02-17, 8:43 CET, I DO see the effect... [07:55:06] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: New labs instance fails running `block-for-project-export` before running mount - https://phabricator.wikimedia.org/T126568#2034488 (10yuvipanda) p:5Unbreak!>3Normal Is this still happening? I see that nfs-exports daemon is running fine on labstore... [07:57:09] 10Tool-Labs-tools-Other, 6TCB-Team, 7German-Community-Wishlist: "pageviews" tool link at bottom creates invalid HTML syntax (due to missing HTML encoding of characters like '&') - https://phabricator.wikimedia.org/T126975#2034492 (10PatoLogic) this issue may be closed now. Thanks everybody. [07:58:51] 10Tool-Labs-tools-Other, 6TCB-Team, 7German-Community-Wishlist: "pageviews" tool link at bottom creates invalid HTML syntax (due to missing HTML encoding of characters like '&') - https://phabricator.wikimedia.org/T126975#2034495 (10Raymond) 5Open>3Resolved a:3Raymond [08:08:58] 6Labs, 10Tool-Labs, 10DBA: Copy user database to c1 - https://phabricator.wikimedia.org/T127107#2034499 (10jcrespo) It is not possible to recover user databases from c2. While usually I recover them manually, in this case there was an unrecoverable disk failure. This was a conscious decision- these are repli...
[08:12:39] PROBLEM - Puppet staleness on tools-worker-1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [43200.0] [09:08:45] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: New labs instance fails running `block-for-project-export` before running mount - https://phabricator.wikimedia.org/T126568#2034590 (10hashar) a:3chasemp I am assuming @chasemp fixed the script that generate the NFS export list by simply raising the... [09:08:53] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: New labs instance fails running `block-for-project-export` before running mount - https://phabricator.wikimedia.org/T126568#2034592 (10hashar) 5Open>3Resolved [09:32:48] 6Labs, 10DBA: Some databases cannot be backed up/replicated on toolsdb - https://phabricator.wikimedia.org/T127164#2034626 (10jcrespo) 3NEW a:3jcrespo [09:44:54] 6Labs, 10DBA: Some databases cannot be backed up/replicated on toolsdb - https://phabricator.wikimedia.org/T127164#2034651 (10jcrespo) I may add here all tables in s52721__pagecount_stats_p. Performance is not a perfect science. [10:11:51] 6Labs, 10DBA: Some databases cannot be backed up/replicated on toolsdb - https://phabricator.wikimedia.org/T127164#2034699 (10Stigmj) Thanks for the notice. My DB's are possible to recover in case of a complete wipeout, but that would take several months to do. What is the issue with my DB's? Are they too big... [10:19:30] 6Labs, 10DBA: Some databases cannot be backed up/replicated on toolsdb - https://phabricator.wikimedia.org/T127164#2034709 (10jcrespo) s52721__pagecount_stats_p is currently being replicated, but for some time it was on the verge of being filtered out. Right now I do not see any issue with it, but it showed me... [12:54:12] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: Disable /data/project for instances in deployment-prep that do not need it - https://phabricator.wikimedia.org/T125624#2034919 (10hashar) Thank you @yuvipanda ! 
[12:54:16] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: Disable /data/project for instances in deployment-prep that do not need it - https://phabricator.wikimedia.org/T125624#2034928 (10hashar) [12:54:18] 6Labs, 10Beta-Cluster-Infrastructure: Completely remove Beta Cluster dependency on NFS - https://phabricator.wikimedia.org/T102953#2034927 (10hashar) [12:54:29] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: Disable /data/project for instances in deployment-prep that do not need it - https://phabricator.wikimedia.org/T125624#1992944 (10hashar) I marked it again as blocked on the root task {T102953}, it is part of it. [12:54:48] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: Disable /data/project for instances in deployment-prep that do not need it - https://phabricator.wikimedia.org/T125624#2034931 (10hashar) And Swift is {T64835} itself a child task of the root task T102953. [13:01:45] what does it take to make an instance read it' [13:01:59] 6Labs, 10Labs-Infrastructure: Get HA db support for labs services - https://phabricator.wikimedia.org/T126251#2034948 (10jcrespo) How badly would it be, in case of an emergency, point to codfw? How sensitive is it regarding latency (only for a few hours)? I have some available host there, and it would provide... [13:02:01] read it's classes from hiera instead of the puppet groups in wikitech [13:02:31] e.g. why isn't phabricator::deployment::source on https://wikitech.wikimedia.org/wiki/Hiera:Phabricator applied to deploy.phabricator.eqiad.wmflabs [13:03:02] (I had to add it to the wikitech puppet groups and enable it with a checkbox ...) [14:14:03] twentyafterfour: I believe we don't do class application via hiera that way, only parameterization. afaik what you did is canonical [14:40:53] hmm... 
I could swear yuvi wrote an enc that used hiera [14:46:53] (03PS1) 10Youni Verciti: Initial check-in [labs/tools/vocabulary-index] - 10https://gerrit.wikimedia.org/r/271268 [15:02:26] 6Labs: Bingbot scraping tools? - https://phabricator.wikimedia.org/T127066#2035209 (10Magnus) I don't know what's going on, but many of my tools keep crashing, including catscan2, autolist, glamtools, all of which are of not insignificant value to the community. Looking at my overview tool https://tools.wmflabs.... [15:05:41] 6Labs: Bingbot scraping tools? - https://phabricator.wikimedia.org/T127066#2035212 (10valhallasw) If there's a lot of open connections, that's probably {T104799}. Could you check if there are indeed a large number of child php processes? [15:17:08] could someone restart qa-morebots ( https://wikitech.wikimedia.org/wiki/Morebots )? [15:26:25] 6Labs: Bingbot scraping tools? - https://phabricator.wikimedia.org/T127066#2035259 (10Magnus) Lots of active connections: https://tools.wmflabs.org/catscan2/server-statistics Not sure which server to check for php processes. I do have some code in PHP to force-close the connection: header("Connection: close");... [15:32:07] 6Labs: Bingbot scraping tools? - https://phabricator.wikimedia.org/T127066#2035280 (10valhallasw) Sorry, that's indeed not entirely obvious. ``` $ qstat -u 'tools.catscan2' -xml ... webgrid-lighttpd@tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs ... $ssh tools-webgrid-lighttpd-1... [15:39:01] 6Labs: Bingbot scraping tools? - https://phabricator.wikimedia.org/T127066#2035306 (10Magnus) I keep restarting it, because it becomes unresponsive. Restarted two times since my last comment here... [15:43:07] 6Labs: Bingbot scraping tools? - https://phabricator.wikimedia.org/T127066#2035326 (10Magnus) And four minutes later, back up to 24 active requests. 
[16:17:40] 6Labs, 6Operations, 10ops-eqiad: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2035452 (10Cmjohnson) Problem I am having is figuring out which disk is /dev/sdc [16:21:07] 6Labs: Bingbot scraping tools? - https://phabricator.wikimedia.org/T127066#2035471 (10valhallasw) It does not seem to be reflected in the number of php processes, but: ``` fastcgi.active-requests: 226 (...) fastcgi.requests: 233 ``` does suggest it's related to requests hanging/waiting in a fcgi request. [17:17:09] 6Labs, 10Tool-Labs: provide a more strict robots.txt at Tool Labs - https://phabricator.wikimedia.org/T127206#2035680 (10Bugreporter) 3NEW [17:17:35] hi [17:17:50] I have a tool that someone made for me on WMFLabs, and it seems to have crashed [17:17:55] could someone restart it for me [17:18:05] https://tools.wmflabs.org/shortnames/ [17:59:14] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Cobi was created, changed by Cobi link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Cobi edit summary: Created page with "{{Tools Access Request |Justification=ClueBot NG, ClueBot III, account projects |Completed=false |User Name=Cobi }}" [18:10:13] 10Tool-Labs-tools-Other, 6Community-Tech, 7Community-Wishlist-Survey, 7Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2035925 (10DannyH) [18:31:12] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Cobi was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=313679 edit summary: [18:40:18] bd808: Congratulations concerning the phabricator stashbot. I think this is a very good idea [18:41:02] hey, Luke081515? [18:42:34] Dragonfly6-7: What's up? [18:42:49] Luke081515 - I've got a tool that someone made for me on WMFLabs [18:42:53] and it has crashed? [18:42:57] could you (or someone) restart it? [18:43:00] https://tools.wmflabs.org/shortnames/ [18:43:10] I'm not a labs admin. 
But which tool is it? [18:43:13] ah, ok [18:43:43] Mdann52 or Coren can restart it [18:43:51] or a labs admin. But I'm not one [18:43:52] previously I've had Coren do it, but he's been AFK for nearly 20 hours [18:44:11] maybe yuvipanda can help? (I don't know if he is currently here) [18:45:18] 6Labs, 10Tool-Labs, 10DBA: Copy user database to c1 - https://phabricator.wikimedia.org/T127107#2036021 (10Superyetkin) 5Open>3Resolved Thanks for the information. I have just created a new database on c1. [18:47:14] does mdann use any other names? [18:47:19] on IRC, I mean [18:54:56] Dragonfly6-7: please create bugs for these things [18:57:53] valhallasw`cloud - you're assuming a skillset not in evidence. [18:58:24] If you can handle IRC you can handle phabricator ;-) [18:59:09] we should have a person creating tickets from irc bug reports :) [19:00:06] Dragonfly6-7: also, as far as I can see, the tool is running -- it's just not showing any output. [19:00:23] valhallasw`cloud - it should be showing several hundred, if not thousand, filenames. [19:01:40] any idea why that would be? [19:02:56] 10Tool-Labs-tools-Other: https://tools.wmflabs.org/shortnames/ down - https://phabricator.wikimedia.org/T127220#2036116 (10valhallasw) 3NEW [19:03:11] Dragonfly6-7: no. Again, create bugs, because that allows the right people to be notified. [19:03:34] thank you [19:10:50] 6Labs, 10Tool-Labs: Cluebot writes massive logs that are making labstore run out of space - https://phabricator.wikimedia.org/T127222#2036169 (10yuvipanda) 3NEW [19:16:04] 6Labs, 10Tool-Labs: Cluebot writes massive logs that are making labstore run out of space - https://phabricator.wikimedia.org/T127222#2036233 (10yuvipanda) ``` root@labstore1001:/srv/project/tools/project/cluebot/logs# ls -lsh total 1.2T 757M -rw-rw---- 1 51109 51109 757M Feb 17 19:15 cbng_bot.err 221M -rw-rw-... [19:29:39] valhallasw`cloud: I suppose /dev/null works for -e with jsub [19:32:02] yuvipanda: /dev/logstash! 
:> [19:32:18] valhallasw`cloud: yeah, making tickets and starting hardware procurement for those soon [19:32:28] valhallasw`cloud: but in the meantime cluebot's error log is going to dev/null now [19:32:34] I just have symlinked the err file to /dev/null [19:32:37] let's hope that works :) [19:32:41] but I suspect it won't [19:32:43] and I'll have to restart [19:32:45] the jobs [19:48:13] 6Labs, 10Beta-Cluster-Infrastructure: Soft mount remaining NFS mounts on deployment-prep - https://phabricator.wikimedia.org/T127224#2036325 (10yuvipanda) 3NEW [19:59:34] 6Labs, 10Beta-Cluster-Infrastructure: Soft mount remaining NFS mounts on deployment-prep - https://phabricator.wikimedia.org/T127224#2036347 (10chasemp) p:5Triage>3Normal [20:00:03] 6Labs, 10Tool-Labs: Cluebot writes massive logs that are making labstore run out of space - https://phabricator.wikimedia.org/T127222#2036349 (10chasemp) p:5Triage>3High [20:01:28] 6Labs, 10Tool-Labs: Cluebot writes massive logs that are making labstore run out of space - https://phabricator.wikimedia.org/T127222#2036169 (10chasemp) It's growing at a rate of >100G a day, that is unsustainable [20:04:51] 6Labs, 10Tool-Labs: Cluebot writes massive logs that are making labstore run out of space - https://phabricator.wikimedia.org/T127222#2036359 (10yuvipanda) I've symlinked the log file to /dev/null, let's see how that goes! [20:06:43] yuvipanda: it will probably hang the bot [20:06:54] yuvipanda: unless you actually changed the links under /proc [20:07:11] SGE does not reopen files [20:07:21] 6Labs, 10Tool-Labs: Cluebot writes massive logs that are making labstore run out of space - https://phabricator.wikimedia.org/T127222#2036377 (10yuvipanda) I also restarted all the restartable jobs so that this change can take effect. 
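The point made above, that symlinking the error log to /dev/null has no effect until the jobs are restarted because SGE does not reopen files, comes down to how open file descriptors behave: a process keeps writing to the file it already opened, whatever the path now points at. A minimal Python sketch of that behaviour, assuming a POSIX system; the file names `bot.err` and `bot.err.old` are hypothetical stand-ins and this is not the actual cluebot or SGE setup:

```python
import os
import tempfile


def demo_symlink_does_not_redirect_open_fd():
    """Show that pointing a log path at /dev/null does not affect a
    writer that already has the old file open."""
    workdir = tempfile.mkdtemp()
    log_path = os.path.join(workdir, "bot.err")        # hypothetical log name
    moved_path = os.path.join(workdir, "bot.err.old")  # hypothetical

    # The long-running "job" opens its error log once and keeps it open.
    writer = open(log_path, "w")
    writer.write("before\n")
    writer.flush()

    # An admin moves the file aside and symlinks the path to /dev/null.
    os.rename(log_path, moved_path)
    os.symlink(os.devnull, log_path)

    # The running job still holds the old inode, so this write lands there.
    writer.write("after\n")
    writer.flush()
    writer.close()

    with open(moved_path) as f:
        old_contents = f.read()

    # A process that opens the path afresh (a restarted job) hits /dev/null.
    with open(log_path, "w") as fresh:
        fresh.write("discarded\n")
    with open(log_path) as f:
        new_contents = f.read()

    return old_contents, new_contents
```

Here the already-open writer keeps filling the old file, and only a writer that opens the path again sees /dev/null, which matches the need to restart the jobs for the symlink to take effect.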
[20:07:31] valhallasw`cloud: i qmod -rj'd them [20:07:39] yuvipanda: ah, should be OK then [20:08:11] * yuvipanda nods [20:08:20] valhallasw`cloud: also http://graphite.wikimedia.org/render/?width=948&height=576&_salt=1455738188.847&target=servers.labstore1001.nfsd.clients happened yesterday :D [20:08:54] that's a nice 15% drop [20:08:57] what did you kill? :P [20:09:11] (I guess about 100 of those 250 are tools?) [20:09:34] 6Labs, 10Beta-Cluster-Infrastructure, 5Patch-For-Review: Disable /data/project for instances in deployment-prep that do not need it - https://phabricator.wikimedia.org/T125624#2036392 (10yuvipanda) Positive side of this is this graph, showing the NFS client drops: http://graphite.wikimedia.org/render/?width=... [20:09:45] valhallasw`cloud: deployment-prep [20:09:47] https://phabricator.wikimedia.org/T125624 [20:09:50] ah [20:24:50] hey, if there were any puppet fails on deployment-prep machines, would i see them here from shinken-wm ? [20:25:19] mutante: I think shinken-wm only reports tools hosts at the moment [20:25:30] mutante: but they might be visible under http://shinken.wmflabs.org/dashboard [20:25:37] and you should be spammed with e-mails as well [20:25:58] thanks, checking the dashboard [20:26:04] i made a change to the puppet roles for beta [20:26:13] i cherry-picked it on the deployment-puppetmaster [20:26:36] it looks ok on deployment-mediawiki01 f.e. [20:26:37] ah. You'll only get emails if puppet fails for > 24 hours, I think [20:26:59] would i only get those emails if i was an admin in deployment-prep? 
[20:27:11] *nod* [20:27:49] ok, i think i'm not one, so far things look good to me [20:33:19] 6Labs, 10Labs-Infrastructure, 10Tool-Labs, 10DBA: tools.ptwikis throttled for abusing labs db replica resources - https://phabricator.wikimedia.org/T127228#2036468 (10jcrespo) [20:34:30] valhallasw`cloud, let's be especially careful these days, as we have reduced redundancy [20:34:52] * valhallasw`cloud nods [20:35:43] apparently role::beta::bastion is not used by any instance [20:35:50] is beta bastion not a thing anymore? [20:38:47] jynus: I think we discussed the toolserver query killer at some point, but I think the conclusion was 'just checking for replag isn't a good way to work against over-use'? In any case, it's at https://github.com/valhallasw/ts-puppet/blob/master/modules/database/files/kill-queries [20:42:49] it is difficult to get it right [20:43:05] because there are several factors [20:43:12] many small queries [20:43:22] 1 long query [20:43:29] locks blocking replication [20:43:32] memory usage [20:43:34] latency [20:43:57] *nod* [20:44:13] we have something in production that is very conservative, and still we cannot get it right [20:44:46] and of course, there is the social component [20:44:51] I think part of the usefulness of the TS method is that it allows you to specify a max runtime for a query. I don't know how many people actually used it in practice, though. [20:45:28] I think we should establish some rules [20:45:34] but not me, the users [20:45:50] decide how to proceed, and I will implement it [20:46:04] also there are some plans for a slow and a fast server [20:46:25] but needs thinking [20:46:59] * jynus disconnects [20:46:59] right, a fast server for direct web frontend use and a slow server for more time-consuming analytics stuff, I guess. [20:47:17] exactly that [20:48:16] I think the biggest win might be to prevent long-running queries spawned from a web request.
I'm not sure if that's what happened for those ptwiki queries, but I wouldn't be surprised [20:48:26] with longer limits for OLAP and more strict ones for the OLTP [20:48:32] user tries to visit webpage, doesn't work, F5's a few times, asks someone else if it works, etc [20:49:16] we could put a proxy in front [20:49:29] run explain, redirect it :-) [20:49:42] but too much overhead [20:50:57] I would just kill all queries originating from webgrid hosts that run longer than 30 seconds ;-) [21:04:53] how to find out which (if any) instances use manifests/misc/limn.pp and module/limn/ in general? [21:05:06] if i use "watroles" i don't see them, is that because it's not a real role? [21:05:24] or is it actually not used, then we want to delete it [21:07:33] mutante: check ops/puppet to see if it's used in any roles/other classes, then work back up that tree, and see if any of those classes are in watroles... not great :( [21:09:43] so misc::limn seems unused, statistics seems to use statistics::limn, so they seem unused [21:09:59] valhallasw`cloud: i just saw that, it seems statistics::limn is separate right [21:10:06] and then there is geowiki::job::limn .. uhmm [21:10:09] I think so, yes [21:10:24] modules/statistics/manifests/limn/ [21:10:40] maybe there is just one way to find out in this case :) [21:10:55] it's been sitting there since August 2015 [21:11:13] waiting for someone to confirm if it can be killed or not [21:11:33] last one was "It is supposed to die, but there's no other easy way to set up a dashboard from ad-hoc graphs. And it doesn't look like we'll get that prioritized anytime soon. So until then, the limn module is pretty useful." [21:13:55] mutante: limn is fun. you should talk to milimetric about it. [21:14:05] it's a self hosted puppetmaster instance with local commits on site.pp [21:15:20] mutante: yes, limn is still waiting to die, but we're much closer to killing it now [21:15:28] by the end of the quarter, almost for sure [21:15:44] ok!
thank you, that sounds good [21:15:45] 6Labs, 10Tool-Labs: Cluebot writes massive logs that are making labstore run out of space - https://phabricator.wikimedia.org/T127222#2036625 (10Luke081515) Adding Rich as maintainer of that tool... [21:15:49] is it possible to allow a 'service group' user to log in via ssh? I have set up the keys via puppet but I get this error: [21:15:51] phab-04 sshd[13435]: fatal: Access denied for user phab-deploy by PAM account configuration [preauth] [21:15:55] (from auth.log) [21:18:25] twentyafterfour: i think it would have to exist in LDAP [21:18:31] twentyafterfour: if you add the keys to /ssh/userkeys/phab.something, it might work [21:18:41] twentyafterfour: also, the user is phab.deploy, I think? [21:19:11] twentyafterfour: do you mean like a totall local user not via NSS? [21:19:16] totally even [21:19:24] and yuvi was working in a pretty method ('accept keys of any user in the user group') [21:23:29] twentyafterfour:tw [21:23:50] twentyafterfour: I would recommend against using service groups outside of tools, I'd like to kill it for all non-tools projects [21:24:40] it's a 'service group' user added via wikitech [21:25:18] yuvipanda: I thought labs couldn't support regular local users (I was under the impression they had to be added that way) [21:25:46] twentyafterfour: you can add 'regular local users' via puppet [21:26:11] twentyafterfour: you only need them to exist in LDAP if they are touching NFS [21:26:18] ok [21:26:20] so that the id will be the same between the instances and the nfs server [21:26:21] cool [21:26:23] and that's no longer teh case so [21:29:21] next question: why can't labs instances access phabricator git repos? [21:29:35] previously I think we had to use the proxy which is no longer available? [21:30:25] do you maybe just need to open a firewall hole via security groups? [21:30:45] yuvipanda, is there a mailserver that PHP can use. 
I want to use the mail function [21:30:57] 6Labs, 10Tool-Labs: Cluebot writes massive logs that are making labstore run out of space - https://phabricator.wikimedia.org/T127222#2036686 (10RichSmith) I'll look in to chilling down the log files a little bit... [21:30:59] twentyafterfour: good question, I think you might need a firewall hole in the labs <-> prod firewall [21:31:17] twentyafterfour: where is it located? you mean apt.wikimedia.rog ? [21:31:23] Cyberpower678: not sure what php does, I know that the 'mail' commandline thing works. [21:31:38] Cyberpower678: we don't have an smtp thing setup is all I know :) [21:34:10] How do I run a PHP one liner in the CLI? [21:34:17] yuvipanda, ^ [21:34:40] Cyberpower678: what have you tried doing already? [21:34:46] Nothing yet [21:34:59] I want to see if the mail function works [21:35:00] then try googling the question, I'm sure there'll be answers there :) [21:35:03] php file.php ? [21:35:32] mutante, The command to execute in PHP is literally in the CLI command [21:35:38] the mail command should open a socket by itself and use whatever the "mail" command uses too, afaik [21:36:58] Cyberpower678: i'm not sure i understand that last one. [21:37:24] but if the mail command in bash works, the php mail command should also work [21:38:28] so i'd first try to just echo "foo" | mail ... [21:38:48] mutante: git-ssh.wikimedia.org port 22 [21:39:19] twentyafterfour: oh right, yea, labs cant talk to production in general for security [21:39:21] mutante, I know I've been given one line codes to execute in the CLI itself. Something like "php twentyafterfour: it will need an explicit hole on network gear as yuvi said [21:39:44] or another workaround [21:42:01] Cyberpower678: echo "" | php [21:43:00] Could not open input file: [21:43:28] replace ; with a space [21:44:13] or just put it in a file and php foo.php [21:45:32] Yay it works [21:45:43] could someone restart qa-morebots ( https://wikitech.wikimedia.org/wiki/Morebots )? 
[21:47:49] Feb 17 21:46:53 phab-04 sshd[22101]: error: AuthorizedKeysCommand /usr/sbin/ssh-key-ldap-lookup returned status 1 [21:47:51] Cyberpower678: cool [21:47:51] Feb 17 21:46:53 phab-04 sshd[22101]: pam_access(sshd:account): access denied for user `phab-deploy' from `deploy.phabricator.eqiad.wmflabs' [21:47:56] jzerebecki: yes, i'll do it [21:48:02] PHP supports mail on labs [21:48:06] so I removed the user from wikitech and added it via puppet, still no dicer [21:48:18] * Cyberpower678 tests something more advanced [21:49:25] tools.morebots@tools-bastion-01:~$ jstart -N qamorebot /usr/lib/adminbot/adminlogbot.py --config ./confs/qa-logbot.py [21:49:28] Your job 3547556 ("qamorebot") has been submitted [21:49:31] jzerebecki: [21:49:40] chasemp: soft mounts seem to work, but needs manual unmount /remount [21:49:49] I'll go through and do it for all the beta cluster hosts shortly. [21:49:53] already done on deployment-mediawiki03 [21:51:00] mutante: worked. thx. [21:51:23] cool, np [21:52:13] yuvipanda: cool [21:52:24] (yuvi does mount -a not do it?) [21:53:08] chasemp: didn't try it, because it needs to unmount first, since the default is already mounted as hard [21:53:13] chasemp: puppet did a mount -o remount [21:53:14] and that seems to fail [21:53:31] ok interesting I'll tinker w/ it when I get a chance [22:00:07] 6Labs, 10Tool-Labs: Cluebot writes massive logs that are making labstore run out of space - https://phabricator.wikimedia.org/T127222#2036752 (10DamianZaremba) I replied to the mailing list about this, in reply to the email about NFS running out of space. Cleared down the (at the time) 1.5T log file. Currentl... [22:17:45] please restart meetbot, it vanished https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Tools/meetbot [22:19:53] 6Labs, 10DBA: Some databases cannot be backed up/replicated on toolsdb - https://phabricator.wikimedia.org/T127164#2036825 (10Kolossos) I can recreate all data inside templatetiger, in the case of a disaster. 
[22:20:24] mutante, sweet. I can spoof the from address. :D [22:21:41] Cyberpower678: yea, normal for email :p [22:46:11] 6Labs, 10DBA: Some databases cannot be backed up/replicated on toolsdb - https://phabricator.wikimedia.org/T127164#2036868 (10APPER) I can recreate s51412__data.dewiki_templatedata if needed. It will take some time but it would be possible. [23:56:59] !log added Ppchelko to the list of members [23:57:00] added is not a valid project. [23:57:11] !log deployment-prep added Ppchelko to the list of members [23:57:11] Please !log in #wikimedia-releng for beta cluster SAL [23:57:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL, Master [23:58:14] I like that it warns but still does it :) [23:59:18] I love how you can now mention a ticket number in a log line and it actually gets appended to that ticket