[04:32:28] Just got a failed grid submission email
[04:32:51] https://pastebin.com/nEDF5bRL
[04:59:04] Betacommand: qstat fails with the same error
[05:02:24] stalkword incoming...
[05:02:52] !help tools-grid-master.tools.eqiad.wmflabs port 6444 is closed, breaks jsub, qstat, jstop, etc...
[05:02:53] zhuyifei1999_: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[05:03:18] zhuyifei1999_: looking...
[05:03:34] thanks
[05:07:55] !log tools "service gridengine-master restart" on tools-grid-master
[05:08:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[05:08:01] fwiw, my earliest email regarding a crontab jsub failure is around 49 mins ago, 4:06 AM UTC
[05:08:04] zhuyifei1999_: better?
[05:08:39] * 04:17
[05:08:57] andrewbogott: uh no
[05:09:08] oh yeah, it crashed again
[05:09:11] it was up for a minute :(
[05:09:16] tools.zhuyifei1999-test@tools-bastion-02:~$ qstat
[05:09:16] error: commlib error: got select error (Connection refused)
[05:09:16] error: unable to send message to qmaster using port 6444 on host "tools-grid-master.tools.eqiad.wmflabs": got send error
[05:10:45] oh great :(
[05:10:46] segfault at 68 ip 000000000049d85a sp 00007f8890cfec40 error 6 in sge_qmaster[400000+2a3000]
[05:17:01] andrewbogott: fyi https://phabricator.wikimedia.org/T183216#3847101
[05:17:55] (I UBN-ed it)
[05:32:43] zhuyifei1999_: thanks. It's not at all clear what's going on but we're working on it.
[05:51:07] zhuyifei1999_: we still don't know what happened but I think it's better
[05:51:12] k
[05:51:16] thanks
[13:07:54] Hey cloud im getting from cron daemon connection refused errors
[13:08:12] https://www.irccloud.com/pastebin/VxaDirE4
[13:08:29] Any idea why?
[13:08:41] This is for tools.zppixbot
[13:11:17] Zppix: yesterday we had some issues with gridengine
[13:11:47] Around 11:20pm UTC-6?
[13:12:09] probably yes
[13:12:24] So i can disregard?
[13:12:36] I think a fix was found by andrewbogott
[13:12:48] this error is in realtime?
[13:12:53] I mean, right now?
[13:12:56] No
[13:13:04] Its 7am right now
[13:13:42] oh right, so your cron should be able to work in the next run
[13:14:16] If i get any other errors ill say something, is there a ticket?
[13:14:27] I'm not sure, let me check
[13:17:06] arturo: if you cant find one its not a big deal i just wanted to see if i couldnt read up on something.
[13:17:11] Zppix: it seems there is no ticket, feel free to open one in phabricator if you find any issue
[13:17:33] Of course
[13:17:38] Thank you!
[13:17:45] thanks you :-)
[13:22:13] Zppix: https://phabricator.wikimedia.org/T183216
[13:22:27] Ah
[13:22:29] Thanks
[14:00:50] !log rcm CAC: Doing vagrant-git update
[14:00:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[14:01:39] !log rcm Tin: Jenkins update
[14:01:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[14:03:08] !log rcm Neon: Package updates
[14:03:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[14:05:50] !log rcm Neon: Rebooting
[14:05:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[14:09:29] !log rcm Oxygen: Package updates
[14:09:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[16:14:01] I have just reloaded wikireplica proxies with noop configuration, ping me if you see something weird (e.g. connections failing, etc.)
[17:00:09] !log deployment-prep Disable ORES UI for beta wikidatawiki, T183266
[17:00:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[17:00:19] T183266: wikidata.beta.wmflabs.org/wiki/Special:RecentChanges InvalidArgumentException No model available for [goodfaith] - https://phabricator.wikimedia.org/T183266
[18:11:29] !log deployment-prep Update beta ORES service to f109792
[18:11:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[18:38:00] !log tools rebooting tools-paws-master-01
[18:38:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:10:40] !log deployment-prep (Earlier today) Depooled deployment-db04, it needs fixing after replication broke badly. It's out of sync with deployment-db03, where I manually fixed inconsistencies
[20:10:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[21:26:55] !log reboot tools-paws-master-01
[21:26:55] chasemp: Unknown project "reboot"
[21:27:02] !log tools reboot tools-paws-master-01
[21:27:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:11:29] !log paws Killed tiller pod that was in crashloopbackoff
[22:11:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[22:13:04] bd808: can you check citation bot jobs its not loading for me but doesnt give me any error
[22:14:16] Zppix: what would I be checking for? Do you have a link to docs on that tool?
[22:20:29] No
[22:20:37] Its fine now
[22:20:46] I guess it just had a hiccup
[22:25:24] bd808: so about the vagrant reports - do you want a separate phab task for each issue? also, what I tag them with?
[22:26:16] SMalyshev: sub tasks of T181353 would be awesome. And I guess break them up however makes sense to you?
[22:26:17] T181353: [EPIC] Migrate base image to Debian Stretch - https://phabricator.wikimedia.org/T181353
[22:27:12] I'm planning on working on compat for more roles over the break next week
[22:27:31] hopefully I'll be able to get most things I understand working
[22:28:55] would be glad to test, I use vagrant every day with cirrus & wikidata
[22:30:24] SMalyshev: cool! glad to have the help
[22:31:08] I should also fix some pecl-yaml bugs over the break I guess. The ones you assigned to me are pretty stale. :/
[22:31:26] I need to advertise for a new maintainer for that
[22:58:42] is quarry broken? https://tools.wmflabs.org/quarry/login is 502 for me
[22:59:56] MaxSem, you want https://quarry.wmflabs.org/
[23:00:13] bleh
[23:00:20] (The other thing is a Toolforge project, currently stalled)
[23:00:22] danke, quiddity