[05:42:06] Labs, Tool-Labs: Linkwatcher spawns many processes without parent - https://phabricator.wikimedia.org/T123121#1939529 (Beetstra) The bot is still eating away its (old) backlog, which goes slowly. Bot seems to operate fine now with way less processes. Still it uses 200-250% of processor power, which seem...
[08:26:27] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1939569 (doctaxon) Here the next problem again (tools.taxonbot): ``` JSUB_OPTIONS=-once -j y -quiet -v LC_ALL=en_US.UTF-8 -mem 1g 0 0 * * * jsub -once -j y -quiet -v LC_ALL=en_...
[08:56:51] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1939584 (doctaxon) Any possible opinions for solution of the problem (not sure about practicability): * Install a second cron and run cron1 and cron2, maybe run 2 cpu hosts independently, sel...
[12:31:02] Labs, Tool-Labs: Delete "toolserver" tool - https://phabricator.wikimedia.org/T116389#1939680 (valhallasw)
[12:31:04] Labs, Tool-Labs, Tracking: Toolserver migration to Tools (tracking) - https://phabricator.wikimedia.org/T60788#1939681 (valhallasw)
[12:40:22] Tool-Labs-tools-Other, Tracking: Toolserver.org tools that have not been migrated (tracking) - https://phabricator.wikimedia.org/T60865#1939693 (valhallasw)
[12:40:24] Tool-Labs-tools-Other: Migrate https://toolserver.org/~bawolff/en-wn-editor-stats.php to Tool Labs - https://phabricator.wikimedia.org/T60867#1939691 (valhallasw) Open>declined Closing as declined, then.
[12:50:15] Labs, wikitech.wikimedia.org: Promote @valhallasw to contentadmin on wikitech - https://phabricator.wikimedia.org/T123032#1939700 (valhallasw) Open>Resolved a:valhallasw
[12:56:19] Labs, Tool-Labs: toolserver.org uses 302 redirects instead of 301 - https://phabricator.wikimedia.org/T123861#1939702 (valhallasw) NEW
[15:20:15] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1939774 (Luke081515) @doctaxon What if you tell cron to start 5 minutes later?
[15:22:42] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1939776 (doctaxon) Sorry, the job has to start at midnight, 00:00 UTC
[15:25:36] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1939778 (Luke081515) Starting the job 5 minutes earlier and adding a five-minute sleep in your language might help?
[16:18:06] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1939859 (valhallasw) >>! In T123186#1939336, @scfc wrote: > The thread at http://lists.arthurdejong.org/nss-pam-ldapd-users/2013/msg00001.html deals with "error reading from client: Timer expi...
[16:21:01] Do we allow users to directly change their LDAP data? (I could not find docs on it on wikitech. A gerrit user asked me to change his username.)
[16:21:17] Is there any page I could send him to that covers the needed steps?
[16:21:29] Or would we prefer him to sign up afresh?
[16:21:42] qchris_: fresh signup is the best option
[16:21:47] ldap renames are painful
[16:21:53] :-)
[16:21:57] Ok. Thanks.
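The "start five minutes early, then sleep" idea from the T123186 comments above could look roughly like the sketch below. It is only an illustration: the helper names `seconds_until_midnight_utc` and `run_at_midnight`, the 10-minute safety cap, and the assumption that cron fires the script at 23:55 UTC are all hypothetical and not taken from the task. Nothing in the log confirms that this works around the cron problem.

```python
import time
from datetime import datetime, timedelta, timezone


def seconds_until_midnight_utc(now):
    """Return the seconds from `now` (timezone-aware) to the next 00:00 UTC.

    If `now` is exactly midnight, the result is a full day (86400.0).
    """
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()


def run_at_midnight(job, max_wait=600, now=None, sleep=time.sleep):
    """Sleep until 00:00 UTC, then call `job()`.

    Hypothetical helper: assumes cron started us a few minutes early
    (for example at 23:55 UTC). If the wait would exceed `max_wait`
    seconds, we assume we were started late and run immediately.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    delay = seconds_until_midnight_utc(now)
    if delay <= max_wait:
        sleep(delay)
    return job()
```

The `now` and `sleep` parameters exist only so the timing logic can be checked without actually waiting.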
[16:40:02] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1939867 (valhallasw) As the errors also happen during the day: ``` Jan 17 16:30:10 tools-submit nslcd[1033]: [bc73ee] error reading from client: Timer expired Jan 17 16:30:10 tools-submit nslc...
[17:54:47] valhallasw`cloud: should we start building a new cron box?
[17:56:20] YuviPanda: Yeah not a bad idea
[17:56:46] Although debugging why the load is so crazily high would also be good
[17:57:17] Also new box is a good time to write down recovery of crontab steps
[17:58:09] No time today though
[18:01:55] valhallasw`cloud: probably hanging on contacting LDAP
[18:02:02] valhallasw`cloud: yeah, for me neither (today)
[18:02:11] valhallasw`cloud: probably can file a bug tho
[18:02:19] YuviPanda: right, but then a new host might not help
[18:02:30] then it's maybe more useful to move bigbrother to a secondary host?
[18:02:55] we should do both
[18:02:58] move bigbrother to services
[18:03:01] and cron to its own
[18:04:41] could you take a look at tools.giftbot's qstat? there's something seriously wrong/broken (jobs waiting for a long time)
[18:05:49] or is this a general issue?
[18:06:00] ugh
[18:06:02] 'array jobs'
[18:06:11] I've no idea how those work either...
[18:06:21] * YuviPanda checks if this is a general issue first
[18:06:35] nope
[18:06:41] just giftbot
[18:11:06] interesting
[18:11:09] > 01/17/2016 18:09:59|schedu|tools-grid-master|E|unable to find job 2421143 from the scheduler order package
[18:11:31] valhallasw`cloud: didn't you run into this earlier?
[18:17:26] I just tried submitting a new job
[18:17:28] worked fine too
[18:17:30] hmm
[18:17:35] gifti: can you try deleting and resubmitting them?
[18:18:56] yeah
[18:20:15] thanks
[18:21:01] hi! i'd like to request a username change at LDAP/Gerrit, anyone able and willing ?
[18:23:30] lfschenone: I think general consensus might be just to create a new username, renames too hard atm
[18:23:49] :-/
[18:24:10] i have considerable history associated with my username .. any chance of renames becoming easier in the future ?
[18:24:43] https://phabricator.wikimedia.org/T85913 has a lot of history
[18:24:55] and there's at least one person willing to try to do the rename ( ostriches )
[18:25:00] maybe comment there and see how it goes?
[18:26:26] thanks!
[18:26:57] np! and sorry it is messy!
[18:27:50] * ostriches feels the ping
[18:31:58] gifti: unfortunately I've to run now :( I'll check back later in the evening... sorry!
[18:32:18] np
[18:33:45] gifti: check qstat -j , that should give scheduler info
[18:34:16] ah, right
[18:35:52] but how would you explain job 2366019?
[18:37:02] Labs, Tool-Labs: Migrate tools-submit to tools-cron-01/-02 - https://phabricator.wikimedia.org/T123873#1939979 (yuvipanda) NEW
[18:37:17] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1939987 (yuvipanda)
[18:37:19] Labs, Tool-Labs: Migrate tools-submit to tools-cron-01/-02 - https://phabricator.wikimedia.org/T123873#1939988 (yuvipanda)
[18:39:20] gifti: on mobile, will check in 20 mins
[18:49:43] gifti: "job dropped because of user limitations"
[18:50:11] not sure if that's the actual issue
[18:50:24] it's supposed to be a set of 200 jobs, right? job-array tasks: 1-200:1
[18:51:37] that's right
[18:51:39] gifti: and 121 of those are currently running
[18:51:51] ah.
[18:52:05] we set maxujobs last week
[18:52:30] to 128, probably
[18:52:35] duh
[18:53:19] (the goal was to enable job submission from exec hosts without killing SGE in the case of a recursive job creation scenario)
[18:53:39] ah, that's nice
[18:53:42] so let me just up that to 1024 for now
[18:54:00] i could now drop the use of jlocal
[18:54:43] gifti: would 1024 be enough for you?
[18:54:52] more than enough
[18:56:10] Labs, Tool-Labs: Make gridengine exec hosts also submit hosts - https://phabricator.wikimedia.org/T123270#1939990 (valhallasw) @giftpflanze actually needs more than this because of their array jobs. I'm setting this to 1000 now, which should be enough even for extreme use cases. It might also still be eno...
[18:57:02] gifti: yep, works now, except for queue instance "giftbot@tools-exec-gift.eqiad.wmflabs" dropped because it is overloaded: np_load_short=1.570000 (no load adjustment) >= 1.50
[18:57:10] but that's probably expected?
[18:57:52] probably?
[18:57:55] idk
[18:58:23] SGE also tries to not put more jobs on a host that's already using 100% cpu :-)
[18:58:30] which is what that warning is
[18:59:01] it should submit more tasks soon enough
[18:59:23] ah, there we go
[18:59:31] Yeah, now waiting on np_load_avg=2.162500 (= 1.485000 + 0.50 * 2.710000 with nproc=2) >= 2.00
[19:00:02] thank you for finding the obstacle and fixing it!
[19:00:12] (1.5 is current cpu load, 2.71 is a measure of 'recently started tasks')
[19:00:53] You're welcome (although I'm also responsible for causing it ;-))
[19:01:19] ^^
[19:03:27] Labs: labstore2001 disk space WARNING - https://phabricator.wikimedia.org/T123874#1939994 (Andrew) NEW
[19:38:06] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1940015 (scfc) I tried to trigger the warnings with: ``` #!/usr/bin/python3 import grp import multiprocessing import pwd import time def f(x): return pwd.getpwnam(x) if __name__ == '__...
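The scheduler message quoted at 18:59:31 can be checked by hand. The figures only add up if the adjustment term is divided by the number of processors: 1.485 + 0.50 * 2.71 / 2 = 2.1625. The sketch below reproduces that arithmetic. The formula is inferred from the numbers in this one log line rather than taken from the Grid Engine documentation, and the function and parameter names are made up for illustration.

```python
def adjusted_load(np_load, adjustment, recent_jobs_measure, nproc):
    """Load value the scheduler compares against its threshold.

    Assumed formula (inferred from the log line, not from SGE docs):
    per-CPU load plus the adjustment factor times the 'recently
    started tasks' measure, spread over `nproc` processors.
    """
    return np_load + adjustment * recent_jobs_measure / nproc


def is_overloaded(load_value, threshold):
    """A queue instance is skipped once its load reaches the threshold."""
    return load_value >= threshold


# Values quoted at 18:59:31: current load 1.485, factor 0.50,
# recent-task measure 2.71, nproc=2, threshold 2.00.
value = adjusted_load(1.485, 0.50, 2.71, 2)
print(round(value, 4), is_overloaded(value, 2.00))

# Values quoted at 18:57:02: np_load_short=1.57 with no load
# adjustment, threshold 1.50.
print(is_overloaded(1.57, 1.50))
```

Both checks report the host as overloaded, matching the two messages: the first threshold is hit by raw load alone, the second only after the adjustment for recently started tasks is added.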
[20:07:44] Tool-Labs-tools-Global-user-contributions, Easy, JavaScript: GUC counter doesn't increment - https://phabricator.wikimedia.org/T123879#1940058 (Danny_B) NEW
[20:52:24] Labs, Tool-Labs: Delete "toolserver" tool - https://phabricator.wikimedia.org/T116389#1940092 (Nemo_bis)
[20:52:26] Labs, Tool-Labs, Tracking: Toolserver migration to Tools (tracking) - https://phabricator.wikimedia.org/T60788#1940093 (Nemo_bis)
[21:47:45] help, i am being spammed by bigbrother
[22:56:10] Labs, Labs-Infrastructure, Tool-Labs: Backports are enabled in new Trusty instances - https://phabricator.wikimedia.org/T123890#1940220 (scfc) NEW a:Andrew
[22:57:31] Labs, Labs-Infrastructure, Tool-Labs: Backports are enabled in new Trusty instances - https://phabricator.wikimedia.org/T123890#1940230 (scfc)
[22:57:42] Labs, Tool-Labs, Patch-For-Review: Reduce amount of Tools-local packages - https://phabricator.wikimedia.org/T91874#1940229 (scfc)