[00:56:27] YuviPanda: so uh, yesterday I filed https://phabricator.wikimedia.org/T140711 about not being able to log into extdist-02, and today I'm finding myself unable to get into integration-aptly01...but other labs instances are fine, so I'm kinda stumped
[00:58:05] andrewbogott, chasemp? ^
[01:02:23] PROBLEM - SSH on tools-worker-1004 is CRITICAL: Server answer
[01:06:04] Labs-project-Phabricator, Tracking: Bugs at security options at phabricator labs - https://phabricator.wikimedia.org/T117665#2478315 (Danny_B)
[01:09:36] I think integration-aptly01 might just be down? integration-jessie-lego-test01 is having trouble connecting to it
[01:17:31] legoktm: that sounds like a down instance to me -- what does horizon say about them?
[01:18:04] andrewbogott: https://horizon.wikimedia.org/project/instances/4dc57853-2e94-4db5-b869-1f549288a368/ status: active
[01:18:29] legoktm: look at the log
[01:18:52] integration-aptly01 login: [923040.088317] INFO: task jbd2/vda3-8:111 blocked for more than 120 seconds.
[01:19:02] then a bunch of
[01:19:03] [923160.115432] INFO: task nscd:10188 blocked for more than 120 seconds.
[01:19:03] [923160.116223] Not tainted 4.4.0-1-amd64 #1
[01:19:03] [923160.117705] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[01:19:21] so it's stuck?
[01:19:37] yeah, looks like it's having deep-down kernel problems
[01:19:42] A reboot will probably fix it
[01:21:01] should I do "soft reboot" or "hard reboot"?
[01:21:24] hard
[01:21:36] I mean -- it probably doesn't matter, but hard won't do any harm. Might take a few seconds longer
[01:21:56] * legoktm does
[01:24:14] andrewbogott: aaand I'm in. thank you very much :)))
[01:24:30] I didn't do anything :)
[01:24:51] you told me what to do :P
[01:30:57] did labstore1005 get reinstalled ?
[01:37:22] RECOVERY - SSH on tools-worker-1004 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[01:58:27] Wikimedia-Labs-General, Tracking: Labs bugs that were resolved but are waiting for ops to merge them - https://phabricator.wikimedia.org/T59301#2478498 (Danny_B)
[02:00:27] Tool-Labs: Install ufraw-batch - https://phabricator.wikimedia.org/T59008#2478503 (Danny_B)
[02:00:29] Tool-Labs: Install socat - https://phabricator.wikimedia.org/T59005#2478504 (Danny_B)
[02:00:31] Wikimedia-Labs-General, Tracking: Labs bugs that were resolved but are waiting for ops to merge them - https://phabricator.wikimedia.org/T59301#2478501 (Danny_B)
[02:00:33] Tool-Labs: Install rrdtool - https://phabricator.wikimedia.org/T59004#2478505 (Danny_B)
[02:00:39] Wikimedia-Labs-General: Labs bugs that were resolved but are waiting for ops to merge them - https://phabricator.wikimedia.org/T59301#619679 (Danny_B)
[02:00:53] Wikimedia-Labs-General: Labs bugs that were resolved but are waiting for ops to merge them - https://phabricator.wikimedia.org/T59301#619679 (Danny_B) Resolved>Invalid
[03:01:13] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Omerfarukdemir was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=766135 edit summary:
[03:40:21] Labs, Labs-Infrastructure, Patch-For-Review: Support reverse dns for public labs IPs - https://phabricator.wikimedia.org/T104521#2478589 (AlexMonk-WMF) The above is no longer the problem, now there's some designate internal errors happening when I try to create domains under a new 128-25.155.80.208.i...
[06:44:50] PROBLEM - Puppet run on tools-worker-1002 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:19:47] RECOVERY - Puppet run on tools-worker-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:31:44] Labs, Tool-Labs: Offer Korean Locales "ko_KR.euckr" and "ko_KR.utf8" on Tool Labs - https://phabricator.wikimedia.org/T130532#2479026 (Ykhwong) How long will it take for my request to be processed? Please let me know if you need any additional information.
[08:38:26] Is there a way using the API to get a list of all blocks instead of just the currently effective block?
[08:43:13] TParis: you mean like an API version of the block log?
[08:43:46] Pretty much. I'm rewriting UTRS and I need to know in the case of a reblock of an existing block, what the original block was for
[08:48:24] PROBLEM - SSH on tools-worker-1004 is CRITICAL: Server answer
[08:53:19] TParis: https://en.wikipedia.org/w/api.php?action=query&list=logevents&letitle=User:Tom29739&letype=block
[08:53:44] Or: https://en.wikipedia.org/w/api.php?action=query&list=logevents&letitle=User:Tom29739&leaction=block/block for only blocks
[08:53:58] Or: https://en.wikipedia.org/w/api.php?action=query&list=logevents&letitle=User:Tom29739&leaction=block/reblock for only reblocks
[08:54:09] block/unblock for unblocks
[08:54:33] tom.... <3
[08:54:46] Thank you, I'd been fiddling with that for a while
[08:54:58] It took me ages to find.
[08:59:54] very nice, it specifies if it was a block, reblock, or an unblock, that's going to be super helpful
[09:53:43] Hi
[09:53:48] I am slightly disappointed
[09:54:14] access to tools-static is being somewhat stubborn in performance terms recently
[09:54:21] Anyone else had this?
[09:54:34] (Trying to work out whether it's a local peculiarity.)
[09:56:03] https://en.wikisource.org/w/index.php?title=Page:The_Imperial_Gazetteer_of_India_-_Volume_6_(2nd_edition).pdf/4&action=edit&redlink=1 for example is taking a lot longer to load than expected
[09:56:21] (and tools-static.wmflabs.org is where it's getting stuck most of the time.)
[10:03:07] I've noticed
[10:03:46] ShakespeareFan00: it's not just me then?
[10:03:50] Nope
[10:03:59] I get it when doing stuff at wikisource
[10:06:38] ShakespeareFan00: is this returning 500's for you too? https://tools.wmflabs.org/static-browser/
[10:07:39] tom29739: yep
[10:09:26] It's been like that for weeks
[10:09:41] Any timescale on it being fixed?
[10:09:56] I've mentioned it in here about 5 times now and have been ignored.
[10:10:11] Time to submit a Phab ticket?
[10:10:28] In my instance it's making it a LOT harder to contribute to
[10:10:30] a project
[10:10:33] tools-static should be different.
[10:10:55] Although I'm not entirely sure why Wikisource (an official WMF project) should be needing labs anyway
[10:11:00] What confuses me is that production shouldn't be loading from tools
[10:11:05] Especially for something that should be core functionality
[10:11:14] @tom29739: Quite
[10:11:38] What exactly is it loading on tools-static?
[10:12:30] I don't know
[10:12:34] Possibly thumbnailing
[10:12:40] Or it's some kind of script
[10:12:54] You'd have to look into it at Wikisource
[10:13:21] RECOVERY - SSH on tools-worker-1004 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[10:19:22] PROBLEM - SSH on tools-worker-1004 is CRITICAL: Server answer
[10:22:30] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Gammawave was created, changed by Gammawave link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Gammawave edit summary: Created page with "{{Tools Access Request |Justification=Figuring out how many musicians there are in a city. |Completed=false |User Name=Gammawave }}"
[10:56:50] tom29739 tools-static is unrelated to static-browser.
[10:57:27] static-browser is maintained by ireas, and serves tools.wmflabs.org/static
[10:57:36] which is entirely different from tools-static.wmflabs.org, which just allows arbitrary users to serve static files faster
[10:57:47] Oh
[10:57:57] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web#Static_file_server
[10:58:17] one such service is cdnjs, which can be browsed at tools.wmflabs.org/cdnjs
[10:58:40] that* is puppetized, and complaints about that should go to me.
[10:59:15] That makes sense now
[10:59:31] Thanks for explaining it to me
[10:59:54] yw
[11:02:27] (PS1) Jcrespo: Add fake prometheus mysql password [labs/private] - https://gerrit.wikimedia.org/r/299969 (https://phabricator.wikimedia.org/T128185)
[11:03:19] (CR) Jcrespo: [C: 2 V: 2] Add fake prometheus mysql password [labs/private] - https://gerrit.wikimedia.org/r/299969 (https://phabricator.wikimedia.org/T128185) (owner: Jcrespo)
[11:11:55] Labs, Labs-Kubernetes, Tool-Labs: etcd hosts hanging with kernel hang - https://phabricator.wikimedia.org/T140256#2479484 (yuvipanda) Hmm, happened with similar symptoms on tools-worker-1004, which has no etcd.
[11:14:24] RECOVERY - SSH on tools-worker-1004 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[11:14:45] !log tools rebooted tools-worker-1004
[11:14:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[11:15:16] Wikibugs: Print events in closed tasks in grey - https://phabricator.wikimedia.org/T140881#2479496 (Danny_B)
[11:18:47] Wikibugs: Don't display User-* tags - https://phabricator.wikimedia.org/T140883#2479529 (Danny_B)
[11:21:57] PROBLEM - Puppet staleness on tools-worker-1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[11:26:59] RECOVERY - Puppet staleness on tools-worker-1004 is OK: OK: Less than 1.00% above the threshold [3600.0]
[12:28:24] (PS1) Youni Verciti: Rev. 0.8 more functions [labs/tools/fr-wikiversity-ns] - https://gerrit.wikimedia.org/r/299991
[12:32:52] Labs, Labs-Kubernetes, Tool-Labs: Add a diamond collector for kubernetes usage stats - https://phabricator.wikimedia.org/T140887#2479718 (yuvipanda)
[13:40:15] (PS1) Yuvipanda: Add new secrets for grafana labs / prod instances [labs/private] - https://gerrit.wikimedia.org/r/300007 (https://phabricator.wikimedia.org/T120295)
[13:41:44] (PS2) Yuvipanda: Add new secrets for grafana labs / prod instances [labs/private] - https://gerrit.wikimedia.org/r/300007 (https://phabricator.wikimedia.org/T120295)
[13:41:52] (CR) Yuvipanda: [C: 2 V: 2] Add new secrets for grafana labs / prod instances [labs/private] - https://gerrit.wikimedia.org/r/300007 (https://phabricator.wikimedia.org/T120295) (owner: Yuvipanda)
[14:34:38] Labs, Graphite, Patch-For-Review: Setup "official labs grafana" instance - https://phabricator.wikimedia.org/T120295#2480152 (yuvipanda) a:yuvipanda This is very close to being done now!
[14:42:05] Labs, Tool-Labs: Webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#2480167 (yuvipanda) p:Unbreak!>Normal
[14:44:01] Labs, Operations: Move labs graphite to graphite-labs.wikimedia.org - https://phabricator.wikimedia.org/T140899#2480171 (yuvipanda)
[15:15:42] Labs, Tracking: Labs project quota increases (Tracking) - https://phabricator.wikimedia.org/T140904#2480369 (yuvipanda)
[15:32:30] If a labs admin is here: Would be better to kill the MerlBot jobs and disable them in the crontab, till the owner returns. MerlBot's scripts are using lots of resources, which makes sense, because the bot does a lot, but currently (because https is now required) the bot does nothing (https://de.wikipedia.org/wiki/Spezial:Beitr%C3%A4ge/MerlBot?uselang=en), so it makes more sense that he doesn't use
[15:32:36] resources for no results
[15:33:21] Luke081515: you may be right that it's the sane thing to do to clean up now that it's been neutered; there are a few tasks for this ongoing, can you comment to that effect there?
[15:34:16] chasemp: which effect do you mean?
[15:34:52] the phrasing means, can you comment that you think this is a good idea
[15:36:37] I think it is a good idea to prevent merlbot's jobs from starting, till Merl is back, and can enable these jobs again himself. Without Merl the bot can't get fixed, I think he is currently the only one with access to the source
[15:36:52] so disabling the crontab entries with # would be useful
[15:38:29] the second thing I want to request is to disable the sum_disc job of asurabot. This job is spamming (https://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:DrTrigon&action=history&uselang=en), the owner has not been reachable since September, and a restart did not help
[15:40:13] sure both seem reasonable but merlbot has a huge amount of history and coordination and if you want to propose we outright shut it down now the right place is here https://phabricator.wikimedia.org/T121279
[15:40:27] so all of the relevant folks see
[15:40:44] on the second I'm not too familiar, could you make a task so we can at least persist something for if they return?
[15:45:48] hm, no wikibugs?
[15:45:58] Labs, Tool-Labs, Patch-For-Review: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2480508 (Luke081515) MerlBot is now down, he is not making edits anymore. How can we resolve the actual situation? I think we should: * a) Add...
[15:47:41] Labs, Tool-Labs: Disable the sumdisc job of AsuraBot - https://phabricator.wikimedia.org/T140909#2480514 (Luke081515)
[15:47:46] chasemp: second one: ^
[15:53:20] Labs, Tool-Labs: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2480541 (bd808)
[15:57:51] Labs, Tool-Labs: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2480556 (bd808) >>! In T121279#2480508, @Luke081515 wrote: > MerlBot is now down, he is not making edits anymore. How can we resolve the actual situation? I think we...
[15:58:56] bd808: so you prefer b)?
[15:59:23] Tool-Labs-tools-Pageviews: Do not show log scale when all values are below 10 - https://phabricator.wikimedia.org/T140910#2480564 (MusikAnimal)
[15:59:24] :)
[16:00:31] Luke081515: well I don't prefer (a) at least
[16:00:46] and what about b)?
[16:00:47] I'm fine with shutting it down if nothing at all is working
[16:00:59] I'm not sure how to evaluate that
[16:01:12] makes sense for merlbot actually. merl said once, that all of his scripts are running about 64h a day
[16:01:31] with lots of DB-usage
[16:02:00] yeah... I'd actually love to see the source to figure out what the giant pile of jobs there are actually doing
[16:02:46] Merl did show up to post one comment a month ago -- https://phabricator.wikimedia.org/T121279#2392449
[16:02:59] but nothing I've seen since
[16:03:48] I haven't gotten any response from Birgit since wikimania either
[16:10:34] someone yesterday mentioned that his edits had stopped w/ the merge of the relevant changeset, but I don't know if there are other things his stuff does
[16:11:11] my impression is no, that it's all on-wiki work and very visible both in action and absence
[16:21:20] Labs, Tool-Labs: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2480749 (bd808) Here's my current proposal for "resolving" this task: * Close this ticket with a status of "Declined" since we weren't able to rescue the bot before...
[16:26:18] Labs, Tracking: Labs project quota increases (Tracking) - https://phabricator.wikimedia.org/T140904#2480848 (Andrew)
[16:36:03] Labs, Tool-Labs: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2480918 (BBlack) >>! In T121279#2480556, @bd808 wrote: > I don't think that we can reasonably re-introduce an exception for Merlbot. I agree as well. Keep in mind...
[17:01:07] Labs, Tool-Labs: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2481026 (doctaxon) I can replace MerlBot's bot actions in a very short time - I am working on this right now.
[17:08:32] Labs, Tool-Labs: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#1874462 (tom29739) Java 1.8 is supported on Kubernetes in Tool Labs. If the SGE scripts could be converted or replaced with Kubernetes equivalents, then MerlBot coul...
[17:19:58] Labs, Tool-Labs: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2481079 (bd808) >>! In T121279#2481043, @tom29739 wrote: > Java 1.8 is supported on Kubernetes in Tool Labs. If the SGE scripts could be converted or replaced with K...
[17:20:02] Labs, Tool-Labs: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2481080 (Bmueller) @doctaxon, this is great news! If we (wmde #tcb-team) can support you with code reviews please let us know!
[17:21:37] Labs, Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster - https://phabricator.wikimedia.org/T132689#2481082 (mmodell) a:mmodell>None I'm not sure if my patch fix...
[17:23:27] Labs, Tool-Labs: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2481089 (doctaxon) Thank you, @Bmueller ... but it's not written in Java. I'll replace the bot jobs in another script language.
[17:47:02] Tool-Labs-tools-Other: merlbot cron jobs disabled due to HTTP POST errors - https://phabricator.wikimedia.org/T140925#2481163 (bd808)
[17:48:37] Labs, Tracking: Labs project quota increases (Tracking) - https://phabricator.wikimedia.org/T140904#2481179 (chasemp) p:Triage>Normal
[17:49:51] Tool-Labs-tools-Other, Tracking: merl tools (tracking) - https://phabricator.wikimedia.org/T69556#2481191 (bd808)
[17:49:54] Labs, Tool-Labs: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2481187 (bd808) Open>declined >>! In T121279#2480749, @bd808 wrote: > Here's my current proposal for "resolving" this task: > * Close this ticket with a stat...
[17:57:04] Tool-Labs-tools-Other: merlbot cron jobs disabled due to HTTP POST errors - https://phabricator.wikimedia.org/T140925#2481210 (bd808) A typical error message from a failing job is: ``` WARNING: unexcpected Status code: 403 (Insecure Request Forbidden - use HTTPS - https://lists.wikimedia.org/pipermail/mediaw...
[17:57:20] Tool-Labs-tools-Other: merlbot cron jobs disabled due to HTTP POST errors - https://phabricator.wikimedia.org/T140925#2481212 (bd808) p:Lowest>High
[17:57:32] Tool-Labs-tools-Other: merlbot cron jobs disabled due to HTTP POST errors - https://phabricator.wikimedia.org/T140925#2481214 (bd808) a:Merl
[18:10:01] Labs, Tracking: Labs project quota increases (Tracking) - https://phabricator.wikimedia.org/T140904#2481242 (chasemp)
[18:12:26] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#2481247 (chasemp)
[18:14:05] Labs: Two small instances: for WikiToLearn development - https://phabricator.wikimedia.org/T115282#1720069 (chasemp) >>! In T115282#1771743, @Toma.luca95 wrote: > It is possible have a proxy for *.wikitolearn.org/*.wiki2learn.org domains and subdomains with websocket support? > I can't find admin page where...
[18:15:48] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#2481256 (chasemp)
[18:15:50] Labs, whatcanidoforwikimedia.org: Project wcidfwm (What can I do for wikimedia) - https://phabricator.wikimedia.org/T115092#2481254 (chasemp) Open>declined if I understand this issue this was resolved in other ways
[18:22:39] Labs, Tool-Labs: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2481261 (bd808) >>! In T121279#2481026, @doctaxon wrote: > I can replace MerlBot's bot actions in a very short time - I am working on this right now. @doctaxon let...
[18:26:05] Labs, Tool-Labs: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2481269 (doctaxon) Thank you, @bd808!
[18:35:23] Labs, Horizon: Investigate (and probably disable) 'rebuild instance' option - https://phabricator.wikimedia.org/T140259#2481303 (Andrew) This /almost/ works. The breakdown seems to be with puppet and salt certs -- we don't get a proper notice of the instance being destroyed and recreated, so the new ins...
[21:26:58] !log tools rebooting tools-k8s-etcd-01
[21:27:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, dummy
[21:27:15] tx andrewbogott, there is a task we should note this on somewhere I'll look
[21:28:15] Labs, Labs-Kubernetes, Tool-Labs: etcd hosts hanging with kernel hang - https://phabricator.wikimedia.org/T140256#2482092 (chasemp) Resolved>Open tools-k8s-etcd-01 seems the same again today. No time to look into deeply atm so we are rebooting it to kick the can down the road.
[21:29:22] ok, tools-k8s-etcd-01.eqiad.wmflabs is back
[21:30:10] check cleared too
[21:36:39] Labs, wikitech.wikimedia.org, Upstream, Wikimedia-log-errors: PHP array to string conversion on wikitech in SMW 1.8.x - https://phabricator.wikimedia.org/T124235#2482115 (Dereckson) #upstream -- Reported at https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/1743
[21:46:02] Wikibugs, Patch-For-Review: Do not notify #Trash tasks to IRC - https://phabricator.wikimedia.org/T140426#2482143 (Danny_B) Resolved>Open Doesn't work as expected... Reopened per @Legoktm's wish on IRC.
[22:23:40] Labs, Tool-Labs: Webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#1718509 (Gorthian) This has been failing over and over yesterday and today. It is intermittent. As a frequent user of dplbot, I have to say that the solution has **not** been found yet.