[00:40:16] !log admin Restarted maintain-dbusers on labstore1004. Process hung up on failed LDAP connection.
[00:40:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[01:47:43] what is the Stretch IP range? I have to reconfigure a tool that uses OAuth, the tool is not working after migration because I had configured the trusty IP range
[01:50:09] I think it is 172.16.0.0/21
[01:52:18] chicocvenancio: it is working, thanks!
[02:26:12] !log tools Deleted shutdown Trusty grid nodes tools-webgrid-lighttpd-14{20,21,22,24,25,26,27,28} (T217152)
[02:26:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[02:26:16] T217152: Monitor and scale in the Trusty grid - https://phabricator.wikimedia.org/T217152
[02:34:28] !log tools Disassociated floating IPs and deleted shutdown Trusty grid nodes tools-exec-14{33,34,35,36,37,38,39,40,41,42} (T217152)
[02:34:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[02:34:32] T217152: Monitor and scale in the Trusty grid - https://phabricator.wikimedia.org/T217152
[02:40:36] !log tools.trusty-deprecation Restarted stuck webservice
[02:40:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.trusty-deprecation/SAL
[06:12:39] !log tools.legobot migrated to stretch grid
[06:12:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.legobot/SAL
[06:13:04] okay, one tool left
[06:43:39] I honestly have no idea how dbreps works right now
[13:54:46] !log paws created `paws.wmflabs.org` subdomain under `paws` project (T211096)
[13:54:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[14:05:21] hmm stashbot didn't update the phab task
[14:06:02] I think the parentheses threw it off
[14:07:16] !log paws created `paws.wmflabs.org` subdomain under `paws` project T211096
[14:07:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
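A tool whose OAuth consumer is restricted to an IP range has to list the range its requests now come from. A minimal sketch for checking an address against such a range with Python's standard `ipaddress` module; the 172.16.0.0/21 block is the one offered above (with an "I think"), so treat it as an assumption to confirm, and the sample addresses are hypothetical.

```python
import ipaddress

# Range quoted in the conversation above; treat it as an assumption and verify it.
STRETCH_GRID_RANGE = ipaddress.ip_network("172.16.0.0/21")


def in_allowed_range(address, network=STRETCH_GRID_RANGE):
    """Return True if the address string lies inside the given network."""
    return ipaddress.ip_address(address) in network


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    for addr in ("172.16.3.25", "172.16.8.1", "10.68.16.5"):
        print(addr, in_allowed_range(addr))
```

A /21 spans 2048 addresses, 172.16.0.0 through 172.16.7.255, so an address such as 172.16.8.1 falls just outside it.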
[14:09:30] Ha, gtirloni, that's a restricted task
[14:09:36] ah!!!
[14:09:44] crap, ok. mystery solved :) thanks
[14:12:39] !log paws created `paws.wmflabs.org` subdomain under `paws` project (T211096)
[14:12:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[14:12:44] T211096: PAWS: Upgrade Kubernetes - https://phabricator.wikimedia.org/T211096
[14:12:48] there you go
[14:45:12] !log sunset deployment-maps03
[14:45:13] mateusbs17: Unknown project "sunset"
[14:45:58] !log tools.deployment-prep sunset deployment-maps03
[14:45:58] mateusbs17: Unknown project "tools.deployment-prep"
[14:46:12] !log deployment-prep sunset deployment-maps03
[14:46:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[14:48:30] Cyberpower678: you recently had a hidden disk storage leak due to a log or something ?
[14:48:46] Cyberpower678: i'm experiencing something similar, but can't seem to find it. any tips ?
[14:49:19] zhuyifei1999_ helped a lot on that as well
[14:49:27] i can't find deleted files in lsof that seem to explain this.
[14:50:45] My time is kind of scarce this week, but if you still have troubles on Saturday I can take a look
[14:58:55] hmm. let me kill all tmux and old ssh sessions.. see if that fixes it..
[14:59:34] How likely is it that https://tools.wmflabs.org/not-in-the-other-language broke during trusty deprecation?
[15:02:36] i tried restarting rsyslog, apache and my renderd (the most logical causes for this leak), tmux and systemd-journald...
[15:03:01] weird.
[15:04:00] hare: seems more like an ldap bug
[15:04:02] parse_ini_file(/data/project//replica.my.cnf): failed to open stream: No such file or directory
[15:04:18] hare: well that's no good ;)
[15:05:09] Yeah there is a common pattern of using ldap to get userhome, and when that fails we get that cryptic message
[15:05:43] s/userhome/toolhome/
[15:06:50] I think(?) the ldap lookup is only in tool startup, so a restart might help
[15:10:32] hare: either Magnus or a Toolforge root needs to restart it
[15:12:56] !log tools.not-in-the-other-language Restarted webservice. Trying to clear database connection errors which appear to have been caused by LDAP lookup failure.
[15:12:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.not-in-the-other-language/SAL
[15:13:45] hare, chicocvenancio: seems to be working as expected again after restart
[15:13:54] stupid LDAP bugs :/
[15:15:51] :)
[15:21:05] bd808: small note, i just noticed that grafana info from the old cluster instances started running.... which made me notice that apparently all historic data from the old instances is gone ?? is that intentional ?
[15:21:35] https://grafana-labs.wikimedia.org/dashboard/db/labs-project-board?orgId=1&from=1546300800000&to=now&var-project=maps&var-server=maps-tiles1&var-server=maps-tiles2&var-server=maps-tiles3
[15:21:43] thedj: no, not intentional, but it happened
[15:22:20] k. just wanted to make sure people were aware.
[15:23:09] it has happened across pretty much all data in the Cloud VPS specific graphite server
[15:27:50] !log tools Copied all crontab files still on tools-cron-01 to each tool's $HOME/crontab.trusty.save
[15:27:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:32:31] thedj: I'm kinda here (at work so not really 'continuously' available)
[15:34:02] is the issue like du saying a lot of space available and df says it's full?
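The cryptic `parse_ini_file(/data/project//replica.my.cnf)` error above comes from building a path out of a lookup result that came back empty. A defensive alternative is to validate the lookup before using it, so an LDAP hiccup produces a clear error instead of a malformed path. A minimal Python sketch, assuming the tool account's home directory is resolved through the passwd database (which, on LDAP-backed hosts, is where an LDAP failure would surface); `ToolHomeLookupError` and `replica_cnf_path` are hypothetical names, and treating a bare `/data/project` as invalid is an assumption based on the error message.

```python
import os
import pwd
from typing import Callable, Optional


class ToolHomeLookupError(RuntimeError):
    """Raised when the tool's home directory cannot be determined."""


def replica_cnf_path(uid: Optional[int] = None,
                     getpwuid: Callable = pwd.getpwuid) -> str:
    """Return <tool home>/replica.my.cnf, refusing to build a malformed path."""
    if uid is None:
        uid = os.geteuid()
    try:
        home = getpwuid(uid).pw_dir
    except KeyError as exc:
        # No passwd entry at all, e.g. the directory lookup failed outright.
        raise ToolHomeLookupError(f"no account found for uid {uid}") from exc
    if not home or home.rstrip("/") in ("", "/data/project"):
        # An empty or truncated home is what yields /data/project//replica.my.cnf.
        raise ToolHomeLookupError(f"unusable home directory {home!r} for uid {uid}")
    return os.path.join(home, "replica.my.cnf")
```

If, as suggested above, the lookup only happens at tool startup, failing loudly here at least makes "restart the webservice" the obvious next step when reading the error log.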
[17:42:50] !log toolsbeta Preparing to shutdown beta Trusty job grid
[17:42:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[17:44:40] 🎉
[17:45:04] !log git running apt full-upgrade on gerrit-mysql, gerrit-test, gerrit-test3, jenkins-slave-01, and puppet-paladox
[17:45:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL
[17:50:04] !log wikidata-dev wikidata-shex add SPARQL endpoint to $wgWBSchemaShExSimpleUrl (T218886)
[17:50:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[17:50:08] T218886: Improve shex-simple tool for WikibaseSchema use - https://phabricator.wikimedia.org/T218886
[17:52:22] !log wikidata-dev wikidata-shex mwscript maintenance/purgeList.php --namespace 640 # purge Schema pages after config change (T218886)
[17:52:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[18:00:12] !log toolsbeta All Trusty instances shutdown and now in process of deleting
[18:00:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[18:02:52] !log wikidata-dev wikidata-shex add &manifest=[] to $wgWBSchemaShExSimpleUrl (T218886)
[18:02:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[18:02:56] T218886: Improve shex-simple tool for WikibaseSchema use - https://phabricator.wikimedia.org/T218886
[18:03:12] !log wikidata-dev wikidata-shex mwscript maintenance/purgeList.php --namespace 640 # purge Schema pages after config change (T218886)
[18:03:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[18:06:00] grr, I think I’ve been purging the wrong cache
[18:08:03] !log wikidata-dev wikidata-shex for i in {1..26}; do printf 'Schema:O%d\n' "$i"; done | mwscript purgePage.php --skip-exists-check # purge Schema pages after config change (T218886)
[18:08:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[18:08:06] T218886: Improve shex-simple tool for WikibaseSchema use - https://phabricator.wikimedia.org/T218886
[18:25:45] hi, is anyone here?
[18:26:29] I'm having trouble moving files from trusty to stretch (i.e. I… don't know how to do that)
[18:27:22] jc86035: moving from where to where?
[18:27:32] jc86035: the files are present in both systems thanks to the NFS storage
[18:27:52] all that needs to move is crontabs and active jobs
[18:30:43] oh
[18:30:51] never mind then
[18:35:46] zhuyifei1999_: exactly
[18:39:31] !log tools Shutdown tools-cron-01.tools.eqiad.wmflabs (T217152)
[18:39:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:39:35] T217152: Monitor and scale in the Trusty grid - https://phabricator.wikimedia.org/T217152
[18:42:25] zhuyifei1999_: for instance, i rebooted during the weekend and it freed up 12GB in df -h
[18:43:55] !log tools icinga downtime tools-checker for 24h due to trusty grid shutdown
[18:43:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:46:56] !log tools All Trusty job grid queues marked as disabled. This should stop all new Trusty job submissions.
[18:46:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:49:03] !log tools All jobs still running on the Trusty job grid force deleted.
[18:49:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:53:09] thedj: what does sudo lsof say?
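When `du` reports far less usage than `df`, a likely cause is a file that was deleted while a process still holds it open: its blocks are only released when the last descriptor closes, which is also why a reboot appears to free the space. `sudo lsof +L1` lists such files directly. The sketch below does the equivalent by hand on Linux, walking `/proc/<pid>/fd` and picking out descriptors whose link target ends in ` (deleted)`. It needs root to see other users' processes, and the output format is only illustrative.

```python
import os


def deleted_open_files(proc_root="/proc"):
    """Yield (pid, fd, target, size_bytes) for open files that were unlinked."""
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # permission denied, or the process already exited
        for fd in fds:
            fd_path = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(fd_path)
                if not target.endswith(" (deleted)") or target.startswith("/memfd:"):
                    continue  # still linked, or an anonymous in-memory file
                size = os.stat(fd_path).st_size  # stat follows the fd to the open file
            except OSError:
                continue
            yield int(pid), int(fd), target, size


if __name__ == "__main__":
    biggest = sorted(deleted_open_files(), key=lambda item: item[3], reverse=True)
    for pid, fd, target, size in biggest[:20]:
        print(f"{size / 2**20:10.1f} MiB  pid={pid:<7} fd={fd:<4} {target}")
```

If this comes back empty, as it eventually did here, the space is held by ordinary (if very large) files, and `du` on the log directories is the next place to look.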
[18:53:10] !log tools Shutdown tools-grid-shadow (T217152)
[18:53:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:53:13] T217152: Monitor and scale in the Trusty grid - https://phabricator.wikimedia.org/T217152
[18:53:39] !log tools Shutdown tools-grid-master (T217152)
[18:53:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:03:15] bd808: Did you make a backup of the active crontabs?
[19:03:20] If not, could you please do that?
[19:03:47] multichill: yes, I did.
[19:04:09] They are in $HOME/crontab.trusty.save for each tool
[19:04:26] (and also still backed up in an NFS share as well)
[19:04:39] First tool I tried to migrate failed and was busy doing other things
[19:05:06] I updated https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation#Move_a_cron_job as well
[19:05:13] Good to hear
[19:05:23] trusty finally going down
[19:06:23] we were having 'use jessie, don't use trusty' since yuvi time ;)
[19:07:45] Quite a list of tools that will break tonight: https://tools.wmflabs.org/trusty-tools/
[19:08:30] bd808: Can you fix https://tools.wmflabs.org/trusty-tools/ in time? Otherwise it will just be empty in a couple of days
[19:08:49] too bad for them. they had a long time to migrate
[19:09:53] multichill: saw https://tools.wmflabs.org/precise-tools/ ?
[19:10:15] ^ probably will end up like this soon
[19:10:27] GnuTLS: A TLS packet with unexpected length was received.
[19:10:27] Unable to establish SSL connection.
[19:10:48] multichill: I saved a snapshot of that page that I will put up as a static page in a bit
[19:11:14] !log tools Shutdown tools-webgrid-lighttpd-14* (T217152)
[19:11:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:11:17] T217152: Monitor and scale in the Trusty grid - https://phabricator.wikimedia.org/T217152
[19:11:36] /data/project/archive-things/apps/bin/wget: error while loading shared libraries: libgnutls.so.26: cannot open shared object file: No such file or directory
[19:11:40] how do I fix this?
[19:11:58] I use a newer version of wget because it has the --retry-on-http-error flag
[19:12:06] rebuild it
[19:12:23] jc86035: use the wget installed in the system or rebuild your custom version to work on Stretch
[19:12:35] okay, will do
[19:12:48] !log tools Shutdown tools-webgrid-generic-14* (T217152)
[19:12:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:13:18] jc86035: you might want to double check and see if the wget we get from upstream on Debian Stretch has the feature you built the custom version for previously
[19:13:37] Thanks bd808. I have no clue what httpd daemon to use. Too many options and last one I tried crashed. Which one should I use for simple files and php?
[19:13:57] bd808: I did, the version in Stretch is 1.18, the flag was added in 1.19.something
[19:14:31] multichill: I would try `webservice --backend=kubernetes php7.2 start` as the new "default" type
[19:14:58] sometime next week that will actually become the default for `webservice start` with no other args
[19:15:04] Isn't there a stupid option so you don't bother me when you are replacing php7.2? ;-)
[19:15:34] Or -lazy-i-ll-have-a-look-when-it-broke option? ;-)
[19:15:38] php7.2 will be with us for at least another year :)
[19:16:01] probably 2 more years in reality...
[19:17:31] Would be an interesting option for future migrations.
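If `--retry-on-http-error` is the only reason for the custom `wget` build, one alternative (not something suggested in the conversation) is to do the retrying in the job's own script, which removes the dependency on a locally compiled binary and its shared libraries. A minimal Python sketch using only the standard library; the set of retryable status codes, the attempt count, and the backoff schedule are illustrative assumptions, not wget's behaviour.

```python
import time
import urllib.error
import urllib.request

# Illustrative choice of "transient" statuses; adjust to the remote server.
RETRY_STATUSES = frozenset({429, 500, 502, 503, 504})


def fetch_with_retry(url, attempts=5, base_delay=1.0,
                     retry_statuses=RETRY_STATUSES,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the body of `url`, retrying on selected HTTP status codes.

    Other HTTP errors, and the last failed attempt, are re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code not in retry_statuses or attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ... by default
    raise ValueError("attempts must be at least 1")
```

`opener` and `sleep` are parameters only so the retry logic can be exercised without network access; in normal use the defaults (`urllib.request.urlopen` and `time.sleep`) apply.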
[19:17:31] Just give an option to stay with the current version and announce the date of shift
[19:17:47] (and then do we get debian buster?)
[19:18:34] zhuyifei1999_: buster will probably start showing up in Kubernetes in 2-4 months. I will not speculate when or if it may be added to the grid engine
[19:18:46] nice
[19:19:12] !log tools Shutdown tools-exec-140* (T217152)
[19:19:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:19:15] T217152: Monitor and scale in the Trusty grid - https://phabricator.wikimedia.org/T217152
[19:19:42] configure: error: Package requirements (gnutls) were not met:
[19:19:42] No package 'gnutls' found
[19:19:42] Consider adjusting the PKG_CONFIG_PATH environment variable if you
[19:19:43] installed software in a non-standard prefix.
[19:19:43] Alternatively, you may set the environment variables GNUTLS_CFLAGS
[19:19:44] and GNUTLS_LIBS to avoid the need to call pkg-config.
[19:19:44] See the pkg-config man page for more details.
[19:19:47] bd808: https://tools.wmflabs.org/multichill/ <- right 502 Bad Gateway again
[19:20:54] multichill: anything in the error.log?
[19:21:28] 502 usually means either the service has not started completely yet or that it is crashing hard as soon as it starts
[19:22:09] jc86035: is it expecting libgnutls28-dev or something?
[19:22:33] bd808, multichill: it looks like it’s complaining about a duplicate 'txt' entry in .lighttpd.conf
[19:22:45] (the error.log is world-readable btw, not sure if that’s good ^^)
[19:22:47] zhuyifei1999_: I don't know. I don't remember if this happened last time I tried to build wget
[19:22:58] "checking for GNUTLS... no"
[19:23:11] Lucas_WMDE, multichill: sounds like https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation#Lighttpd_crashes_on_startup_with_message_%22parser_failed_somehow_near_here:_(EOL)%22
[19:23:40] sure does
[19:24:25] I moved my .lighttpd.conf to another name and restarted it to rule that out
[19:24:28] * bd808 doesn't actually expect people to read the FAQ but loves having canned answers
[19:25:09] I can’t find any documentation on how to override the '.txt' entry though
[19:25:22] mimetype.assign[".txt"] = "text/plain;charset=UTF-8"?
[19:25:33] that's the new default
[19:25:39] ok ^^
[19:25:43] then just drop it I guess
[19:25:44] and yes, there is no way to override now
[19:26:10] Hmm, looks like I hit some caching issue too
[19:26:12] you can only replace the whole array?
[19:27:08] Lucas_WMDE: I actually don't think the Stretch version of lighttpd will let you do that. It will let you add new mappings to the array, but not change existing ones.
[19:27:33] I think that extends to not letting you replace the entire config array
[19:28:22] does anyone think they know what I'm doing wrong?
[19:28:46] Lucas_WMDE: It's not awesome at all, but most of the usage I have seen was adding the utf-8 encoding hint, so we just made that default in the basic config. -- https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Lighttpd#Default_configuration
[19:28:56] I think it's expecting libgnutls28-dev but that's not installed
[19:31:01] !log tools Shutdown tools-bastion-0{2,3} (T217152)
[19:31:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:31:04] T217152: Monitor and scale in the Trusty grid - https://phabricator.wikimedia.org/T217152
[19:32:24] zhuyifei1999_: should I install an older version of wget or something like that?
[19:32:59] I think the best you can do is file a ticket on getting libgnutls28-dev installed
[19:33:00] (Are you sure it's the version? It just says "No package 'gnutls' found")
[19:33:23] !log tools Shutdown tools-exec-141* (T217152)
[19:33:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:33:39] libgnutls28-dev is the only version of libgnutls-dev installable on stretch afaict
[19:34:23] bd808: So the manual at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Lighttpd#Header,_mimetype,_character_encoding,_error_handler is incorrect?
[19:34:43] I can confirm that libgnutls-dev is missing at least from the Stretch bastions which likely means it's nowhere on the Stretch grid
[19:35:05] multichill: yes. that has not been updated for Stretch's changes yet
[19:35:30] edits welcome!
[19:35:46] zhuyifei1999_: should it be a subtask of T55704?
[19:35:58] T55704: Packages to be added to toollabs puppet - https://phabricator.wikimedia.org/T55704
[19:36:01] So much for canned answers!
[19:36:19] jc86035: yes
[19:36:28] So how do I get the server to send .sql as plain text instead of some binary type?
[19:36:44] multichill: If I could delete all the docs going back to 2005 that are mostly wrong and/or misguided I would.
[19:38:13] multichill: You should be able to add new types, so if there is no '.sql' mapping in https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Lighttpd#Default_configuration then you can add it
[19:38:36] ".sql" => "application/x-sql",
[19:39:01] yeah. you're going to be stuck with that then for now.
[19:39:12] we can fix the shared defaults though!
[19:39:28] And I have server.dir-listing = "disable" set to enabled
[19:39:30] That will require a patch to the webservice package
[19:40:09] Isn't the whole point of everyone having their own instance of a webservice that we're flexible with configuration?
[19:40:54] zhuyifei1999_: I've filed T219219
[19:40:54] T219219: Please add libgnutls-dev - https://phabricator.wikimedia.org/T219219
[19:41:17] multichill: the upstream software changed. Not a whole lot I can do about that
[19:41:57] Can I override the default configuration for a tool?
[19:42:44] multichill: not today, no. That would need to be somehow added to the webservice command.
[19:43:09] which is something that obviously could be done, but not today.
[19:43:30] And I would like to hear more than one person ask for it honestly before adding the complexity
[19:43:33] multichill: patches to how `webservice` works welcome :)
[19:44:24] If the config format were simpler then we could try to implement our own merge, but I don't think that will actually be possible
[19:45:20] so the option is probably something more like adding a cli argument that says "use my $HOME/.lighttpd.conf and not the system default"
[19:45:59] but that's got a lot of potential issues for long term support too honestly
[19:46:09] I just have directory listings enabled and push out .txt and .sql as plain text unicode so it's easy to download
[19:46:45] The most basic functionality of a webserver, serving some files and that doesn't work after migration.
[19:47:11] !log tools depooling tools-worker-1025.tools.eqiad.wmflabs because it's not responding and showing insane load
[19:47:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:47:47] That's not what I expected to break. Wasting time and this is negative energy.
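Because the Stretch lighttpd will not start when `.lighttpd.conf` re-declares an extension that the shared default configuration already maps (the duplicate `.txt` entry above), a quick pre-flight check can save a round trip through `error.log`. A rough sketch: it pulls the `".ext" =>` keys out of `mimetype.assign` blocks with regular expressions rather than a real lighttpd parser, so unusual formatting can slip past it. Only `.txt` is confirmed as a default in this conversation; `DEFAULT_MIME_KEYS` is an assumption to fill in from the published default configuration.

```python
import re

# Extensions assumed to be mapped by the shared default config.
# Only ".txt" is confirmed above; extend from the published defaults.
DEFAULT_MIME_KEYS = {".txt"}

# A mimetype.assign block using "=" or "+=" (non-greedy up to the first ")").
BLOCK_RE = re.compile(r"mimetype\.assign\s*\+?=\s*\((.*?)\)", re.DOTALL)
# One '".ext" =>' key inside such a block.
KEY_RE = re.compile(r'"(\.[^"]+)"\s*=>')


def duplicate_mime_keys(config_text, defaults=DEFAULT_MIME_KEYS):
    """Return the sorted extensions in config_text that clash with the defaults."""
    # Drop comments naively; a '#' inside a quoted string would confuse this.
    text = "\n".join(line.split("#", 1)[0] for line in config_text.splitlines())
    keys = set()
    for block in BLOCK_RE.findall(text):
        keys.update(KEY_RE.findall(block))
    return sorted(keys & set(defaults))
```

Run against the contents of a tool's `.lighttpd.conf`, a result of `['.txt']` points at the entry to delete, while a new key such as `.sql` is not reported, in line with the advice that new mappings can still be added.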
[19:47:47] Good night
[19:48:09] multichill: I get that you are frustrated today, but showing up on the last day of a 3 month migration to vent is not helping fix things
[19:51:44] !log tools Shutdown tools-exec-142* (T217152)
[19:51:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:51:47] T217152: Monitor and scale in the Trusty grid - https://phabricator.wikimedia.org/T217152
[19:59:36] !log tools Shutdown tools-exec-143* (T217152)
[19:59:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:59:39] T217152: Monitor and scale in the Trusty grid - https://phabricator.wikimedia.org/T217152
[20:29:19] thedj: still on?
[20:30:35] My disk leak came from out and err logs flooding the disk despite being directed to /dev/null
[20:31:00] This is because the crontab uses a different CLI protocol.
[20:35:02] !log tools rebooted tools-worker-1025 and tools-worker-1021
[20:35:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:37:41] For the syntax I was using. My solution was to instruct the crontab to use a different shell
[20:42:34] !log tools Deleted tools-bastion-0{2,3} (T217152)
[20:42:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:42:37] T217152: Monitor and scale in the Trusty grid - https://phabricator.wikimedia.org/T217152
[20:43:15] !log tools Deleted tools-cron-01 (T217152)
[20:43:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:45:01] bd808: Agree, that's why I left irc and instead worked on stuff
[20:46:24] !log tools.universalviewer Linked public_html -> src and webservice --backend=kubernetes php7.2 start . Symlink /data/project/universalviewer/src/ seems to break stuff
[20:46:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.universalviewer/SAL
[20:46:55] https://tools.wmflabs.org/universalviewer/#?manifest=https%3A%2F%2Fwww.nga.gov%2Fapi%2Fv1%2Fiiif%2Fpresentation%2Fmanifest.json%3FcultObj%3Aid%3D26=26&c=0&m=0&s=0&cv=0&xywh=-3874%2C-1%2C17558%2C9812
[20:47:57] !log tools Deleted tools-exec-140* (T217152)
[20:48:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:48:01] T217152: Monitor and scale in the Trusty grid - https://phabricator.wikimedia.org/T217152
[20:48:44] !log tools Deleted tools-exec-141* (T217152)
[20:48:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:49:11] !log tools Deleted tools-exec-142* (T217152)
[20:49:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:49:31] !log tools Deleted tools-exec-143* (T217152)
[20:49:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:51:16] !log tools.wlmtrafo Not sure what the tool is doing, but webservice is running again
[20:51:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wlmtrafo/SAL
[20:51:54] !log tools Deleted tools-webgrid-generic-14* (T217152)
[20:51:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:52:02] !log tools rebooting tools-package-builder-02 due to lots of hung /usr/bin/lsof +c 15 -nXd DEL processes
[20:52:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:52:06] Hello! I'm working on moving a VM from eqiad to eqiad1-r. With identical credentials (replica.my.cnf - that works on old eqiad) I get a "mysqli_connect(): (HY000/1045): Access denied for user 'p50380g50921'@'10.64.37.14'" on eqiad1-r. Do I need to make changes? I'm accessing using the xxwiki.labsdb hostnames
[20:52:47] andrewbogott this is regarding your email about maps-wma1
[20:53:10] uh
[20:53:22] if you're in eqiad1-r why does mysql think you're connecting from a 10.64 address?
[20:53:29] something isn't right there...
[20:54:01] ew
[20:54:06] krenair@bastion-eqiad1-01:~$ host 10.64.37.14
[20:54:07] 14.37.64.10.in-addr.arpa domain name pointer dbproxy1010.eqiad.wmnet.
[20:55:15] that username looks weird
[20:55:49] !log tools reboot tools-sgewebgrid-generic-0903 to clear up some issues
[20:55:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:56:35] might be one of the early ones?
[20:57:18] I assume at some point mysql grants were updated to let people use it from the new region, did that apply to such usernames?
[20:57:31] yeah, that's an ancient set of creds
[20:58:54] bd808: Got most stuff fixed. Symlink and web service seem to have changed. Symlinks outside of homedir seem to have stopped working. Any thoughts on that?
[20:59:21] multichill: hmmm.. like a symlink that goes into another tool's $HOME?
[20:59:45] Yeah and one was going to something shared
[21:00:29] the shared one might be about differences in how the mounts are done in the Kubernetes containers
[21:00:30] See for example ~universalviewer with the public_html and public_html2
[21:00:57] I recall reading something about it.
[21:04:05] multichill: to debug a bit, you should be able to use `webservice --backend=kubernetes php7.2 shell` to get an interactive shell attached to a running Kubernetes pod with all the same mounts and basic config that the running lighttpd process would see
[21:04:53] I can spend some time looking into this today as well if you would like to create a phab task and describe the desired outcomes and where possible current bugs
[21:06:24] dschwen: I'm wondering if we should just generate new database credentials for that problem... I'm not sure if I have all the right super powers needed to debug it on the server side.
[21:06:59] That sounds good. Nothing too bad broken. I'll just poke around later this week. Might switch it to the grid for a moment to see if that fixes it
[21:18:43] !log tools Deleted tools-webgrid-lighttpd-14* (T217152)
[21:18:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:18:46] T217152: Monitor and scale in the Trusty grid - https://phabricator.wikimedia.org/T217152
[21:19:38] !log tools Deleted tools-grid-{master,shadow} (T217152)
[21:19:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:20:37] dschwen: I agree with bd808 that generating new creds is the best approach.
[21:21:43] !log tools All Trusty grid engine hosts shutdown and deleted (T217152)
[21:21:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:21:49] * bd808 does a little dance
[21:23:01] Fixed the last problem for now bd808. Switching from Kubernetes to grid helped :-)
[21:24:23] bd808: nice work :)
[21:25:05] multichill: there are certainly mount differences between the grid and kubernetes. Many of them were announced so long ago that I don't remember off the top of my head what they are anymore :/
[21:25:25] Good job with the move
[21:26:21] Might be worth doing a " webservice --backend=gridengine lighttpd start " for the tools that had a webservice running?
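To narrow down which links under a tool's web root break on Kubernetes, one can resolve every symlink and flag targets that are missing or that fall outside the paths the container is expected to see. A minimal sketch; `ASSUMED_VISIBLE_PREFIXES` is a hypothetical placeholder, since the conversation only says the grid and Kubernetes mounts differ without listing them. Running it inside `webservice --backend=kubernetes php7.2 shell`, as suggested above, shows directly which targets fail to resolve there.

```python
import os
from pathlib import Path

# Hypothetical: directories assumed to be visible inside the webservice container.
ASSUMED_VISIBLE_PREFIXES = ("/data/project/universalviewer",)


def suspicious_links(root, visible_prefixes=ASSUMED_VISIBLE_PREFIXES):
    """Yield (link, resolved_target) for symlinks at or under root whose target
    does not exist or lies outside every visible prefix."""
    prefixes = [Path(p).resolve() for p in visible_prefixes]
    candidates = [Path(root)]
    # os.walk lists symlinked directories in dirnames but does not descend into them.
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(Path(dirpath) / name for name in dirnames + filenames)
    for link in candidates:
        if not link.is_symlink():
            continue
        target = link.resolve()  # non-strict: dangling links still resolve to a path
        inside = any(target == p or p in target.parents for p in prefixes)
        if not target.exists() or not inside:
            yield link, target
```

A link into another tool's home or a shared directory is flagged whenever that location is missing from the prefix list, which matches the kind of link described as broken above.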
[21:27:37] multichill: others may disagree, but I think that any webservice or cronjob that has not been moved by the maintainers should stay down until either a) an active user community emerges to complain or b) the maintainers revive them
[21:28:02] I fundamentally believe that many of the 300+ tools that went dark today have been long abandoned
[21:29:02] in the case of (a) I would want to put the tool up for adoption and set a deadline for finding new maintainers before starting the service back up
[21:29:09] dschwen: what's your tool's name?
[21:29:15] We call that the "piep" system in Dutch
[21:29:39] https://tools.wmflabs.org/trusty-tools/ just went down so I can't see what tools are in the list
[21:29:53] working on it :)
[21:42:01] gtirloni: it's not a tool, this is on vps. migrating maps-wma1 to maps-wma
[21:42:30] is this advice still up to date? https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation#SSH_login-stretch.tools.wmflabs.org_fails_with_'Permission_denied_(publickey)'
[21:42:33] dschwen: it would help if you can give me a command to try that will produce that error message. (don't paste the password here though, of course, just tell me where to find it)
[21:42:59] probably not HaeB
[21:43:04] ...asking because I'm getting this error, which seems to be related to the one described on that page : "$ ssh login.tools.wmflabs.org
[21:43:04] z@login.tools.wmflabs.org: Permission denied (publickey,hostbased)."
[21:43:04] it has login-trusty references
[21:44:08] you are logging in as tbayer, HaeB ?
[21:44:18] yes
[21:44:55] your key in LDAP is 2048-bit RSA
[21:45:19] so the DSA or RSA<1024 issue mentioned there is likely not the issue
[21:46:51] HaeB, if you run `ssh-add -l` do you see an entry like '2048 SHA256:8NunY0ckDblPyCiSKjcawWSbOuwCKvd+chhmYB7cMyQ'?
[21:47:09] z@login.tools.wmflabs.org "you are logging in as tbayer ?"
[21:47:12] ehm.. ^
[21:47:18] sure ?
[21:47:50] thedj: oh, so it's sending the wrong login name? ;)
[21:47:53] :D
[21:47:59] I should've noticed that one
[21:48:23] there's no uid=z account in LDAP :)
[21:48:24] Krenair: yes, that entry is there
[21:48:48] multichill: https://tools.wmflabs.org/trusty-tools/ now has a static snapshot of the data from just before the shutdown
[21:49:50] zhuyifei1999_: it says that apache has lots of very big (but current) logfiles open, but that's about it..
[21:50:11] can I see?
[21:50:24] like paste the stuff here?
[21:53:42] i'll paste on phab
[21:55:29] Excellent bd808!
[21:58:23] zhuyifei1999_: https://phabricator.wikimedia.org/P8267
[21:58:57] looking
[22:00:10] it's a very busy system atm ;)
[22:01:23] it doesn't seem to have any deleted files indeed
[22:01:39] brb
[22:02:19] bd808: Congratulations re. Trusty.
[22:03:07] thanks James_F :)
[22:04:24] back
[22:04:45] thedj: how fast does disk space fill up after reboot?
[22:06:59] !log tools.admin Updated to 1d61965 and migrated to php7.2 runtime
[22:07:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL
[22:07:28] zhuyifei1999_: gigs a day
[22:08:17] i rebooted on saturday i think, and the 12G is almost gone again
[22:10:17] could it be cleanups that normally happen on reboot?
[22:10:36] things in /tmp maybe?
[22:11:02] du -sh doesn't show anything in /tmp...
[22:11:49] thedj: but a reboot with no other changes frees disk?
[22:11:53] yup
[22:12:13] at least it did last time...
[22:12:17] 2 times
[22:12:26] i haven't restarted it today yet
[22:16:04] actually...
[22:16:16] maybe i found it..
[22:18:34] * chicocvenancio waits anxiously
[22:19:23] it's an rsyslog problem i think...
[22:22:31] ok.. duh
[22:22:41] i fixed it last time..
[22:23:07] but now i have same wasted logspace, but it actually goes into /var/log
[22:23:55] right. systemd-journal sucks...
[22:24:34] (sorry, I got preempted)
[22:24:51] hmm, maybe i should change the logrotate strategy on that renderd log or something...
[22:26:20] so that likely caused the deleted file hangup last time. I figured i had restored my changes to original before rebooting last time, but i hadn't and now i didn't have a deleted file, but an actual humongous file...
[22:27:53] renderd.log per day 2GB.. apache access log. 3.8GB per day...
[22:27:59] f. me...
[22:29:04] Congrats on the trust grid shut down \o/
[22:29:08] *trusty
[22:29:20] * halfak builds an AI and calls it "trust grid"
[22:29:22] mwahahaha
[22:31:02] thedj: yeah, you probably need to set up size-based rotation for your access and debug logs on those boxes. They will get a massive amount of traffic (mostly from random sites on the internet that are using you for free map tiles)
[22:32:18] it's a big milestone indeed, I feel bstorm_ and bd808 deserve special praise here, congrats
[22:32:23] I don't remember what we set up for the access log rotation on the shared http proxy but I do know we had to tune it several times to keep the disk from filling up
[22:32:31] gtirloni: agreed !
[22:32:41] yeah congrats bd808 and bstorm_
[22:32:54] all credit to bstorm_ for the hard parts! She built the new grid and that was a massive pile of work
[22:33:05] bd808: i just disabled delaycompress and set it to keep 3 days
[22:33:12] Mostly I just nagged people and then stole the fun job of turning things off
[22:33:36] will probably switch to some sort of hour based rotation i guess.
[22:35:08] "maxsize 2000M" is apparently what we settled on in the logrotate config for project-proxy
[22:35:30] ah per rotation ?
[22:36:12] yeah. And at the moment "rotate 1" so we only have the live log and the prior file
[22:37:53] Aww thanks. :) Hopefully, it will be easier when we upgrade to buster in the future. Amazingly, there are already gridengine packages for buster.
[22:40:48] would buster hosts be able to join with a stretch cluster bstorm_?
[22:41:00] we will have to test to find out
[22:41:00] andrewbogott, had to leave for a meeting (sorry). Credentials are on the maps-wma VM under ~dschwen/replica.my.conf
[22:41:02] mysql --defaults-extra-file=replica.my.cnf -h enwiki.labsdb
[22:41:13] They currently use the same version of gridengine, but that doesn't guarantee anything in my mind
[22:41:34] However, it was the gridengine version spread that required the parallel grids, so maybe
[22:41:56] actually strike that. that doesn't work on the old cluster either
[22:42:10] How do I recreate my credentials then?
[22:43:10] bd808: do you know offhand how to help with ^ ?
[22:43:16] dschwen: that simplifies things :)
[22:43:54] dschwen: the recommended way is for you to create a new tool expressly for the purpose of getting the db credentials and then copy the replica.my.cnf that is generated for it off to your vms as needed
[22:44:25] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Connecting_to_the_wiki_replicas_from_other_Cloud_VPS_projects
[22:44:59] Ok, thx
[22:45:22] If I have a tool with working credentials can I use those?
[22:46:58] i don't see why not, other than separation of concerns..
[22:47:00] dschwen: you can, but if that tool is not related to wma you are probably better off making a new one to take the credentials from
[22:47:12] I have a wma tool
[22:47:28] mostly so that you don't end up confused about concurrent connection limits
[22:47:32] dschwen: can you actually cd to your ~ on that host?
[22:47:39] (because I can't, even as root)
[22:48:16] andrewbogott: I can as him
[22:48:31] you need to sudo -u user yeah
[22:48:38] hm, me too
[22:48:42] yeah
[22:48:56] so just weird permissions I guess
[22:49:01] It's likely rootsquashed
[22:49:06] I wonder why you guys do that (having user directories inaccessible to super user)
[22:49:07] yup
[22:49:13] that's not interfering with the web service I hope? Can /it/ read the creds?
[22:49:13] ah
[22:49:15] NFS
[22:49:17] nfs option
[22:49:26] The creds are definitely non-working
[22:49:29] I have a copy of the creds where I need them
[22:49:35] I cannot find the user referenced in the creds anywhere
[22:49:46] readable by www-data (outside of the served directory of course)
[22:49:57] It's clearly formatted like a typical replica.my.conf...I just don't seem to find that username in the database
[22:50:14] FYI, I just overwrote that cred file
[22:50:34] Ah good :)
[22:50:39] The new one looks more likely to work
[22:50:43] it does
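MySQL's "Access denied" message names the account it received, so a quick way to see which account a copied `replica.my.cnf` will actually present is to read the `[client]` section that `mysql --defaults-extra-file=...` uses. A minimal sketch with `configparser`, assuming the usual `[client]` section with `user` and `password` keys whose values may be wrapped in quotes; `read_replica_credentials` is a hypothetical helper, not part of any Toolforge tooling.

```python
import configparser


def _unquote(value):
    """Strip one pair of matching single or double quotes, if present."""
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def read_replica_credentials(path):
    """Return (user, password) from the [client] section of a MySQL option file."""
    parser = configparser.ConfigParser(interpolation=None)  # passwords may contain '%'
    with open(path, encoding="utf-8") as fh:
        parser.read_file(fh)
    client = parser["client"]
    return _unquote(client["user"]), _unquote(client["password"])
```

Printing only the user part, e.g. `print(read_replica_credentials("replica.my.cnf")[0])`, keeps the password out of terminals and chat, and comparing it with the account in the error message shows whether the file on the VM is the one you expect.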