[00:39:39] !log tools T217280 depooled and rebooted tools-sgewebgrid-lighttpd-0902
[00:39:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[00:39:43] T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280
[01:45:06] Phabricator is not loading any tickets on my workboards.
[01:45:11] Is there an issue?
[01:45:47] Cyberpower678, I haven't seen this on other workboards, can you link to a broken one?
[01:46:35] Krenair: It's back now. It's been down for the last 5 minutes though. Weird. Phabricator itself loaded, but the workboards were empty.
[01:46:58] Did you check the JS console when this happened?
[01:47:57] Is it possible someone else came along and changed the default filter to something without any tasks?
[01:49:37] Krenair: these are my project boards. I don't think anyone would have touched. But no, I did not touch the JS filter.
[01:49:43] *console
[02:09:10] !log tools T217280 depooled and rebooted tools-sgewebgrid-lighttpd-0924
[02:09:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[02:09:14] T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280
[02:31:04] !log tools T217280 depooled and rebooted tools-sgeexec-0908 since it had no jobs but very high load from an NFS event that was no longer happening
[02:31:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[02:31:08] T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280
[03:09:55] !log tools T217280 depooled and rebooted 15 other nodes. Entire stretch grid is in a good state for now.
[03:09:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[03:09:58] T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280
[06:51:13] hare: if i might bring back an idea from the toolserver era regarding T218975: users were asked to offset the minute by the first letter of the toolname (e.g. my tool tools.giftbot runs non-urgent jobs at the 7th minute of the hour, urgent ones at the 0th or 1st). this is not the nice automated solution but it also doesn't require writing one.
[06:51:14] T218975: Create a job scheduling lottery for non-fussy Toolforge use cases - https://phabricator.wikimedia.org/T218975
[06:51:56] that is an interesting idea
[06:52:14] if we could get people to agree to it as a social convention, we wouldn't have to write any code :D
[08:48:22] so i noticed that the old tiles server has a separate /var/log...
[08:48:23] /dev/mapper/vd-logfile--disk 7.8G 1.3G 6.1G 18% /var/log
[08:48:30] how do i do that for new instances ?
[08:48:49] i think i saw doc. on this somewhere on wikitech, but can't find it anymore.
[09:56:48] thedj: something like this? https://wikitech.wikimedia.org/wiki/Help:Adding_Disk_Space
[09:59:04] !help
[09:59:04] noam: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[09:59:19] noam: hey! how can I help you?
[10:00:06] I'm trying to create a wikitech account but when trying to access [[special:Create Account]] I get Permission denied
[10:02:38] noam: due to a bug, wikitech account creation is disabled right now
[10:03:35] sorry :-/
[10:03:52] do you know for how long that should last?
[10:04:48] noam: I don't have an estimate right now. But I know there are several people actively working on it
[10:05:35] Thank you for the quick response.
I'll try again within a week
[10:05:43] ok, thanks
[10:16:09] arturo: i think this was done manually...
[10:16:12] Created *after* executing '/sbin/lvcreate -L 8G -n logfile-disk vd'
[10:16:15] 2014... ;)
[10:16:36] :-)
[10:17:13] anyway.. isolating logs seems like a good idea here..
[10:17:45] 4,5GB of apache access logs and 500MB of render logging per day on tiles1 ;)
[10:18:08] I believe isolating log storage (and /var in general) is a common best practice
[13:59:03] !log toolsbeta create VMs arturo-sgeexec-sssd-test-[12] for testing T218126
[13:59:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:59:08] T218126: LDAP: try how sssd works with our servers - https://phabricator.wikimedia.org/T218126
[15:11:18] thedj: the separate /var/log volume was a non-optional feature of all instances in the 2013-2014 era. We stopped doing that because it was very problematic for m1.small instances (took a significant portion of the disk quota) and generally confusing to users with larger instances ("where is my quota being used?").
[15:12:09] there are building blocks in puppet still that could be used to make it an optional feature again, but I honestly do not think it would see much use
[15:15:28] !help is there a way to run a webservice with a version of node newer than 6?
[15:15:29] Sorry, you are not authorized to perform this
[15:16:29] Gaelan: yes
[15:16:31] Gaelan: the Stretch job grid has nodejs 8.11.1.
[15:16:49] !help test
[15:16:49] chicocvenancio: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[15:17:00] There's nvm on the gridengine system. Also I believe the newer grid nodes use 8.11
[15:17:04] Yeah, that :)
[15:17:22] Was this expected behavior for wm-bot?
[15:17:30] chicocvenancio: it's the "!help is" that causes the error message.
That's the syntax for adding a new key
[15:17:42] Ahhh, makes sense
[15:18:19] Thanks for the help y'all
[15:18:28] Gaelan: I have a new nodejs 10 container for Kubernetes, but it is not live for use yet
[15:18:45] Gaelan: you can also use nvm and bring your own node
[15:18:45] Yeah I saw that and was trying to see if I could get k8s to use a custom container name
[15:18:56] that will probably happen in about a week or so. It requires a new release of the `webservice` package to roll out
[15:48:15] bd808: looks like there are issues with tools-static again
[15:48:36] >input fromat example farsi to phone with setting/fa/IR>
[15:49:05] musikanimal: yuck. I'll take a look
[15:49:11] ty :)
[15:56:58] !log tools Rebooting tools-static-12
[15:57:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:58:08] musikanimal: fixed I think. Looks like the same class of NFS error that keeps breaking Stretch grid nodes
[15:58:30] yup, looks good. Thank you!
[16:04:58] From item 4 of the toolforge rules: "the job grid or Kubernetes should not be used for anything that runs for more than a few seconds or consumes large amounts of resources"
[16:04:59] typo?
[16:07:01] Gaelan: I would agree – I think the “not” should have been dropped in https://wikitech.wikimedia.org/w/index.php?title=Help:Toolforge/Rules&diff=1767249&oldid=1767149
[16:11:33] i have corrected it
[16:12:21] !log tools cleared errored out stretch grid queues
[16:12:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:16:55] !log tools switching all instances to use ldap-ro.eqiad.wikimedia.org as both primary and secondary ldap server
[17:16:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:51:49] !log git deploy new session plugin to gerrit-test3 T218739
[17:51:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL
[17:59:44] o/
[18:00:56] is there a way to run webservices with Python 3.5? There seems to be a discrepancy between the Python version on bastion (3.5) and the Python version in kubernetes python webservices (3.4).
[18:01:03] !log tools.trusty-tools Disable daily nag emails. Anyone who is not aware at this point is unlikely to suddenly notice the 20th warning email.
[18:01:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.trusty-tools/SAL
[18:01:56] pintoch: we have a python3.5 Kubernetes container built, but it is not exposed for use yet. I expect to make it available in the first week of April
[18:01:56] pintoch: not as far as I know – make sure to run all `pip install` etc. from within a Kubernetes shell
[18:02:14] bd808: ooooh, is there a Phabricator task to subscribe to?
[18:02:43] Lucas_WMDE: I kind of just stealthed it into gerrit last week without a task ;)
[18:02:56] heh, okay :)
[18:03:00] https://gerrit.wikimedia.org/r/#/c/operations/software/tools-webservice/+/496265/
[18:03:49] so is that based on Stretch instead of Jessie, or does it get the newer Python some other way?
[18:03:50] hey guys! has something changed on the DBs?
two tools that I use aren't working, with GlobalContribs returning an SQL error: "Warning: parse_ini_file(/replica.my.cnf): failed to open stream: No such file or directory in /data/project/guc/labs-tools-guc/src/Settings.php on line 35" and "Error: MySQL login data not found at"
[18:03:55] and ipcheck doesn't load the wikis to login
[18:04:14] bd808, Lucas_WMDE: ok great, thanks!
[18:04:35] Tks4Fish: "parse_ini_file(/replica.my.cnf)" looks like it is missing the initial part of the file path?
[18:04:53] don't know, I just use it :P
[18:05:00] here's an example: https://tools.wmflabs.org/guc/?src=hr&by=date&user=123.123.123.123
[18:05:14] and here no wikis are displayed to login: https://tools.wmflabs.org/ipcheck/splash.php
[18:06:01] Krinkle: guc errors -- "Warning: parse_ini_file(/replica.my.cnf): failed to open stream: No such file or directory in /data/project/guc/labs-tools-guc/src/Settings.php on line 35"
[18:07:25] bd808: Ha, dejavu.
[18:07:36] Tks4Fish: it looks like musikanimal and SQL are the maintainers of ipcheck -- https://toolsadmin.wikimedia.org/tools/id/ipcheck
[18:07:36] bd808: last time that happened it was because the posix user id was broken
[18:07:47] It's doing posix php user group home thingy to get the home directory
[18:07:57] Krinkle: ah, yeah that could be LDAP barfing :/
[18:08:10] Hi, could anyone look at these two pages and tell me if they are for Toollabs users or for Toollabs admins? https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Monitoring and https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Nodes
[18:08:43] okay, thanks bd808 :)
[18:08:52] Dvorapa: anything under a /Admin/... is probably platform maintainer docs
[18:09:13] bd808: i've got the code separated so that the replica code is generic for several tools, hence I can't easily pass it a hardcoded or relative path
[18:09:17] I'll try later maybe
[18:09:33] maybe restart it, feel free to do so, I'll do it when I'm back as well.
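A rough sketch of the failure mode described above, for readers following along: if the account lookup yields no home directory, joining the empty result with `replica.my.cnf` gives exactly the bare `/replica.my.cnf` path from the warning. This is Python for illustration only (guc itself is PHP, and its actual code is not shown in this log); `lookup_home` and `replica_cnf_path` are hypothetical names, and `/data/project/guc` as the tool's home is an assumption based on the path in the error message.

```python
import os
import pwd


def lookup_home(uid=None):
    """Return the home directory for uid (default: current user), or "" if the lookup fails."""
    if uid is None:
        uid = os.getuid()
    try:
        return pwd.getpwuid(uid).pw_dir
    except KeyError:
        # The user database (for example an LDAP-backed one) has no entry for this uid.
        return ""


def replica_cnf_path(home):
    """Build the path to replica.my.cnf, refusing to fall back to the filesystem root."""
    if not home:
        raise RuntimeError("home directory lookup failed; refusing to use /replica.my.cnf")
    return os.path.join(home, "replica.my.cnf")


# Naive concatenation with an empty home reproduces the path from the warning.
naive_path = "" + "/replica.my.cnf"

# Assumed tool home, taken from the directory prefix in the error message.
good_path = replica_cnf_path("/data/project/guc")
```

Failing loudly on an empty home directory would not fix the LDAP problem, but it would make the error message point at the lookup rather than at a missing file.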
[18:09:45] bd808: the weird thing is that it looks like it is not for labs at all (but is categorized for labs)
[18:11:05] Dvorapa: What is "labs"? ;) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Monitoring is very much about the Cloud VPS systems, but is at least currently admin facing docs
[18:11:58] bd808: I mean Toolforge (sorry, I'm still used to Toollabs)
[18:12:45] https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Nodes is a bit out of date and really also mostly admin facing. It's content that some advanced Toolforge users might care about, but probably not super useful generally
[18:13:55] bd808: I see. It was moved from /Admin/Nodes to /Nodes. And the first link is not about Toolforge either, right?
[18:16:25] Dvorapa: it's not specifically about Toolforge, no. The monitoring systems described there are used by Toolforge, but not specific to Toolforge. And again, mostly admin facing
[18:19:44] bd808: And this one: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Monitoring It seems also like admin stuff, but also not prefixed by /Admin/
[18:19:48] ?
[18:23:14] Could someone put the logs in the topic
[19:09:37] Dvorapa: yeah. that one is Admin stuff for sure
[20:05:38] (or we could just amend "More details and logs at ...Help:IRC" )
[20:09:50] we needs url shorteners that we trust :/
[20:12:33] it just so happens that i've written one... but there's no inherent access control or anything :)
[20:13:35] someday™ we will get the one at w.wiki working
[20:13:53] Wikimedia Cloud Services (wikitech.wikimedia.org) | Ask questions here, but please provide links and context. Use "!help" if nobody responds | Service Status: OK!
| login.tools.wmflabs.org is now Debian Stretch | More details and channel logs at https://wikitech.wikimedia.org/wiki/Help:IRC | Code of Conduct applies: https://www.mediawiki.org/wiki/Code_of_Conduct
[20:14:03] gah
[20:14:54] I'll pull the Stretch section out on Monday when I start shutting down the Trusty grid
[20:15:00] * quiddity stops poking at topic, and looks for lunch
[20:15:04] * bd808 should send a note about that
[20:24:25] +
[20:31:02] bd808: Thank you for your help. I think there is no admin-only stuff in https://wikitech.wikimedia.org/wiki/Category:Toolforge left (except the /Admin page itself)
[20:32:39] Dvorapa: Thanks for working on that. Wikitech doesn't get as much love as it needs for making the content pages more easily discoverable. :)
[20:33:07] and thank you too for following up your complaints about tech docs with some help in improving them!
[20:33:52] no problem
[20:34:48] Only the last three chapters of Toolforge for beginners are still blank
[20:35:47] And after that I would like to solve T134495
[20:35:48] T134495: Create a "my first Pywikibot bot" tutorial for Toolforge - https://phabricator.wikimedia.org/T134495
[20:43:17] Trusty grid will shut down on 2019-03-25 -- https://lists.wikimedia.org/pipermail/cloud-announce/2019-March/000150.html
[20:52:51] !log tools.heritage Starting Stretch migration
[20:52:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[21:40:43] !log tools.heritage Manually starting harvesting job to ensure migration (T216364) worked
[21:40:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[21:40:47] T216364: Upgrade heritage to Strech - https://phabricator.wikimedia.org/T216364
[21:42:42] Krinkle: around?
[21:59:10] bd808: Looks like job 957144 is stuck now. Didn't respond to qdel.
[22:01:56] !log tools.anomiebot Force deleted job 957144
[22:01:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.anomiebot/SAL
[22:02:02] Thanks
[22:02:45] akoopal: I might be.
[22:03:20] Krinkle: I get errors on guc
[22:03:40] Warning: parse_ini_file(/replica.my.cnf): failed to open stream: No such file or directory in /data/project/guc/labs-tools-guc/src/Settings.php on line 35
[22:03:55] akoopal: yeah, looks like ldap has issues with finding the user's home directory.
[22:04:01] Not sure what I can do about it, but I've restarted it just in case.
[22:04:24] that seems to have worked
[22:04:32] k :) We'll see for how long, though.
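
Going back to the 06:51 suggestion about offsetting cron minutes by the first letter of the tool name, the convention could be sketched roughly like this. The mapping a -> 1 through z -> 26 is inferred from the single example given (giftbot at the 7th minute) and is an assumption; `minute_offset`, `hourly_cron_line`, the fallback to minute 0, and the job path are all hypothetical.

```python
def minute_offset(tool_name):
    """Map a tool name's first letter to a minute of the hour (a -> 1, ..., z -> 26)."""
    first = tool_name[:1].lower()
    if "a" <= first <= "z":
        return ord(first) - ord("a") + 1
    # Assumption: names that do not start with an ASCII letter fall back to minute 0.
    return 0


def hourly_cron_line(tool_name, command):
    """Format an hourly crontab entry that fires at the tool's offset minute."""
    return f"{minute_offset(tool_name)} * * * * {command}"


# Placeholder command path, for illustration only.
print(hourly_cron_line("giftbot", "/path/to/non-urgent-job"))
```

With only 26 letters this spreads jobs over less than half of the hour, so it is a social convention rather than a real load balancer, which matches how it was described in the channel.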