[00:20:31] thedj: I updated the proxies. https://tiles.wmflabs.org/ is live and serving a default debian apache2 landing page
[00:21:31] I recreated the 3 {a,b,c}.tiles.wmflabs.org proxies too, but they obviously still have the TLS cert problem
[00:22:15] !log maps Switched *.tiles.wmflabs.org proxies to point to http://172.16.5.154:80 (T217992)
[00:22:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL
[00:22:21] T217992: Unable to delete "tiles.wmflabs.org" proxy entry via horizon - https://phabricator.wikimedia.org/T217992
[01:13:42] !log project-proxy Deleted dangling backend records in /etc/dynamicproxy-api/data.db (T218064)
[01:13:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[01:13:46] T218064: dynamicproxy data contains domains ending in '.' which can not be deleted via Horizon or cli tools - https://phabricator.wikimedia.org/T218064
[03:00:07] !log project-proxy Fixed domains with trailing dot (T218064)
[03:00:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[03:00:15] T218064: dynamicproxy data contains domains ending in '.' which can not be deleted via Horizon or cli tools - https://phabricator.wikimedia.org/T218064
[03:21:41] !log project-proxy Removed redis sets with no record in the backing database (T133554)
[03:21:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[03:21:44] T133554: Switch dynamicproxy to point back to IP rather than domain names - https://phabricator.wikimedia.org/T133554
[03:45:20] hello, looks like all of VPS is returning a 404?
[03:46:06] musikanimal: I think I may have just fixed that
[03:46:37] !log project-proxy Restarted uwsgi-invisible-unicorn on proxy-01
[03:46:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[03:46:51] yes ha, all is well now! thank you
[03:47:24] I apparently managed to delete all the redis cache for the proxy :/
[03:47:33] but I reloaded it from the backing database
[03:49:17] and now I see what I did horribly wrong, so I can fix it!
[07:35:38] T218072
[07:35:39] T218072: Please install locale information on (kubernetes) webserver - https://phabricator.wikimedia.org/T218072
[07:36:03] I think there are some more affected by the missing locales
[08:13:34] bd808: i fell asleep last night ;) But it's running now.
[09:45:19] Something going on in Toolforge... after become: "groups: cannot find name for group ID 50062"
[09:51:35] jem: https://phabricator.wikimedia.org/T217838 -> https://phabricator.wikimedia.org/T217280 -> https://phabricator.wikimedia.org/T130593
[09:53:33] Ok, thanks, mutante :)
[09:56:26] yw, jem
[10:51:38] hey mutante, in which TZ are you? :-)
[10:51:43] germany?
[10:52:41] arturo: yes, that's right
[10:53:04] cool
[11:16:12] Hi again, my tool spellcheck is stalled and "webservice stop" doesn't stop it, even with --backend, any help?
[11:18:38] mutante maybe?
[11:22:54] jem: sorry, i dont know about webservice
[11:23:14] Ok, thanks anyway
[11:23:29] jem: perhaps this? https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation#'webservice_stop'_says_service_is_not_running,_but_'webservice_start'_says_service_is_running
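
(Aside: for a stalled webservice like the spellcheck case being discussed here, a rough first check is to ask each backend what it thinks is running. The commands below are standard Toolforge ones that also appear later in this log; which backend actually holds the stuck process is an assumption, not something established in the conversation.)

    $ become spellcheck        # switch to the tool account
    $ webservice status        # what the webservice wrapper believes is running
    $ qstat                    # a grid-backend webservice shows up here as a job
    $ kubectl get pods         # a kubernetes-backend webservice shows up here as a pod
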
[11:24:25] (assuming you aren't using trusty)
[11:24:37] No, I'm not
[11:25:06] But "stop" just tries, it doesn't say it's not running
[11:25:23] Anyway I'll try that
[11:25:35] cool, let us know the results
[11:28:11] arturo: No changes, stop doesn't stop it and start says it's running
[11:28:48] I can start and stop it in kubernetes but I guess that's not a good solution
[11:29:39] ok, at this point I would suggest you open a phab task with all the details so we can investigate and others may share info (and reuse it)
[11:32:42] Ok... but I'm a little busy these days so I was hoping I could avoid that
[11:33:08] Anyway thanks and I'll do it in a few hours if there is no solution
[13:18:13] !log openstack T216497 create cloudnet-stretch-test-01 instance for testing puppet code
[13:18:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[13:18:16] T216497: CloudVPS: workaround archival of jessie-backports repo - https://phabricator.wikimedia.org/T216497
[13:48:31] How can a cron job be moved to stretch in Windows using Putty?
[13:49:41] Adithya: https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation#Move_a_cron_job
[13:50:06] Actually, it's not for Windows
[13:50:43] Please read the page http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-cloud/20190311.txt
[13:52:01] In that, please read from the time [17:13:15]
[14:14:51] I have a doubt
[14:14:57] 3116810 0.30000 linter_cou tools.firefl r 03/12/2019 14:00:17 task@tools-exec-1409.eqiad.wmf
[14:15:13] This is the output I get after executing the qstat command
[14:15:40] From this, when I move it to stretch, what actually do I need to type along with the command jstart?
[14:19:55] stretch webgrid webservice #348739 stuck and can't kill, could an admin kill
[14:22:47] bamyers99: I've killed it and started the webservice again
[14:23:10] thanks
[14:24:28] Adithya: assuming you want to run the `linter_counts_dump.sh` script, you'd type: `jstart /data/project/fireflytools/linter_counts_dump.sh`
[14:29:47] Ok. Let me try it
[14:41:26] Currently, as seen from the page https://tools.wmflabs.org/trusty-tools/u/adithyak1997 the job has been converted from Running to Last seen
[14:41:49] Now, how can I confirm whether the task has started in Stretch?
[14:42:03] qstat command doesn't work
[15:03:07] Adithya: qstat would work if it's executed on a stretch bastion
[15:03:17] Ok
[15:03:52] To be frank, I was searching for you in Phabricator, on this page: https://phabricator.wikimedia.org/T209780
[15:04:01] trusty bastions can only communicate with trusty grid. stretch bastions can only communicate with stretch grid
[15:04:37] Ok
[15:04:54] But in Kubernetes, there is some command to check that, right?
[15:05:01] Something starting with pods
[15:06:05] you bean to check 'jobs' running on k8s?
[15:06:15] that's `kubectl get pods`
[15:06:20] *mean
[15:06:38] Yes
[15:06:42] that works on both trusty and stretch bastions
[15:07:09] Ok
[15:09:09] Currently, I know from the link https://tools.wmflabs.org/sge-jobs/ I am able to see that my tool has migrated to stretch
[15:09:19] k
[15:09:24] But the updates are not taking place in https://tools.wmflabs.org/fireflytools/linter/enwiki
[15:09:45] The reason I say so is that in the toolpage, the timestamp is not changing
[15:09:50] did you check the job logs?
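
(Aside: the "check the job logs" suggestion that follows usually amounts to something like the sketch below, using the fireflytools job discussed above. The exact log file names are an assumption: jsub/jstart write a <jobname>.out and <jobname>.err file in the tool's home directory, named after the submitted script.)

    $ become fireflytools
    $ jstart /data/project/fireflytools/linter_counts_dump.sh   # submit the job
    $ qstat                            # job state: qw = waiting in queue, r = running
    $ ls -l ~/*.out ~/*.err            # per-job stdout/stderr logs in the tool's home
    $ tail ~/linter_counts_dump*.err   # look for errors from this particular job
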
[15:09:59] No
[15:10:04] then do that
[15:10:07] How can that be checked?
[15:10:09] and check qstat
[15:10:29] .{out,err}
[15:10:36] https://tools.wmflabs.org/sge-jobs/tool/fireflytools
[15:10:43] .out for stdout, .err for stderr
[15:10:43] Won't this help?
[15:10:53] no. `$ qstat`
[15:11:17] submit it, check status, and check logs
[15:11:30] But qstat is not showing anything
[15:11:42] It just proceeds to the next command line
[15:12:17] Adithya: qstat will only show things while your cron jobs are actively running.
[15:12:23] then it is not running. submit it
[15:17:22] Using jstart?
[15:19:36] yes
[15:19:48] jstart linter_counts.py?
[15:20:43] Now qstat has one job
[15:22:58] Now, the job appeared on https://tools.wmflabs.org/sge-status/
[15:29:25] But I am facing an issue
[15:30:13] It was the job linter_counts_dump that I ran earlier using the jstart command
[15:30:36] But now, when I give the command jstart linter_counts_dump.sh, it's not working
[15:47:46] I would like to know how I can start the job linter_counts_dump.sh using jstart
[17:08:38] But the values are not getting updated in https://tools.wmflabs.org/fireflytools/linter/enwiki
[17:08:53] It worked smoothly before moving to Stretch
[17:10:14] !log wikilabels uploaded new database config (clouddb1002) to staging
[17:10:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL
[17:16:20] I have migrated my tool onto stretch and the updates are happening there. Link: https://tools.wmflabs.org/sge-jobs/tool/fireflytools. The last seen is also getting updated
[17:17:41] Adithya: I see 5 jobs running as tools.fireflytools on the Stretch job grid. If the jobs are not doing what you expect them to do then you should check your error logs. I do see that the current jobs are running under the continuous queue rather than as the run-once-and-stop description in the crontab file. Maybe this is part of your problem?
[17:18:28] I actually need linter_counts_error_dump.sh to be run continuously
[17:18:37] !log wikilabels restarted uwsgi-wikilabels-web on staging
[17:18:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL
[17:19:23] But I am not able to do that using the jstart command
[17:21:47] 0,30 * * * * jsub -quiet -once /data/project/fireflytools/linter_counts_dump.sh
[17:22:03] This is the format in which each of the jobs is given in crontab
[17:22:33] But I think the way in which non-continuous jobs need to be executed is different
[17:24:17] Now, I have deleted all the other jobs using jdel. Now, what needs to be done?
[17:32:16] Adithya: the syntax for using the job grid has not changed from Trusty to Stretch
[17:32:40] Ok
[17:33:25] But what about the continuous as well as the non-continuous ones?
[17:35:22] Adithya: there is no change in how to start the jobs. It looks to me like you started many jobs manually using `jstart` which puts the jobs in the continuous queue and wraps them in a shell script that restarts them when they exit with an error code. This may be blocking your cron tasks from starting.
[17:36:07] the `-once` flag in cron will check for a running job with the same name and not start the new job if so
[17:36:39] bd808: did the cron issue from yesterday get sorted, i got a spam of cron error emails yesterday and i havent received any since and i was wondering if i have to resubmit my cron jobs or if i have to personally take action on anything
[17:37:39] Zppix: As far as I know the grid is working as expected again. There was a failure of the Stretch grid master yesterday which lasted about an hour or so as far as I know.
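
(Aside: the difference bd808 describes above between the cron entries and the manually started jobs can be summarised roughly as below. The crontab line and the jstart command are the ones quoted earlier in this log; the comments are an interpretation of his explanation, not official documentation.)

    # run-once task, resubmitted by cron every 30 minutes; -once skips the submission
    # when a job with the same name is already on the grid:
    0,30 * * * * jsub -quiet -once /data/project/fireflytools/linter_counts_dump.sh

    # continuous job: submitted once by hand, restarted by the grid if it exits with an
    # error code, and its presence makes the -once cron submissions above do nothing:
    jstart /data/project/fireflytools/linter_counts_dump.sh
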
[17:39:12] bd808: do i need to resubmit my cron jobs or anything, or do i have to check?
[17:40:02] Zppix: you'll have to check I guess. I don't know how I would know what should or should not have run in your tool
[17:40:41] How actually do I need to start the jobs?
[17:40:53] bd808: well i guess i was asking if other tools had jobs that didn't get resubmitted, i guess i wasn't clear but i'll check nevertheless
[17:42:00] Adithya: At this point I really can only refer you to the docs on wikitech -- https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid -- without specific questions I don't really know how to help you. I do not know how your tool is supposed to work.
[17:43:25] Ok
[18:25:55] {{bd808}} I think I have found the issue
[18:26:34] The problem I think is that the virtualenv needs to be rebuilt as mentioned in https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation#Rebuild_virtualenv_for_python_users
[18:32:05] What does status qw mean?
[18:32:39] *In the qstat menu
[18:33:00] queue wait
[18:33:12] Ok
[18:33:14] I believe it usually means either the grid is overloaded or your criteria cannot be fulfilled at the moment
[18:38:06] Ok
[19:10:03] so.. i have root:root owned files on the nfs partition of maps.. but root has no permissions to modify anything on the nfs (this is a safety mode of some nfs things apparently)... anyone got suggestions how to take over the ownership of those files ?
[19:13:52] root_squash?
[19:14:48] if it's root-owned and root_squashed, I dunno if there's anything the client can do about it
[19:17:10] yeah probably...
[19:22:26] If you have permission in the directory, you can delete those files. If you need the content, copy the content to a different file before removing
[19:23:04] I have one such file: ...DATA.crontab … but I do not care about that one
[19:23:15] root-owned + root_squashed = no permission
[19:23:47] or any possibility to obtain permission at all, even if you are root
[19:24:43] either it should be chowned on the nfs host or made non-root_squash
[19:24:49] yeah i tried deleting. didn't work
[19:38:53] thedj: yuck. the root-squash may be a new thing. If we fix the ownership from the nfs server side would that correct the problem for you?
[19:49:01] bd808: yup
[19:49:31] thedj: make a ticket and assign to me with the changes you need and I'll figure out how to get the perms fixed
[19:49:34] www-data:www-data will probably be good and ownership of that can be claimed via sudo -u www-data
[19:50:06] https://phabricator.wikimedia.org/T218145
[19:53:31] bd808: If you have a few seconds, please take a look at https://phabricator.wikimedia.org/T218072
[19:55:00] i'll be so glad if all this image and region moving is over ;)
[19:57:11] The end is near :-)
[20:11:31] It'll never be 100% over; every OS has a limited shelf-life
[20:12:01] Which is why everyone needs to document and manage their setups so they can be rebuilt and updated as needed :)
[20:19:36] thedj: looking at the files in the maps share now. There is a lot of really old looking junk in directories like /data/project/warper/tmp
[20:20:34] yeah i don't know much about warper though.
[20:22:03] for me ./styles and ./tiles are most important
[20:22:18] ok. I'll start with them
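
(Aside: "chowned on the nfs host" means running something like the following on the NFS server itself, where root is not squashed. The server-side path is a made-up placeholder; only the target ownership, www-data:www-data, and the ionice'd find+chown approach come from this conversation.)

    # on the NFS server, not on a CloudVPS client; <maps-share> is hypothetical
    ionice -c3 find /srv/<maps-share>/tiles -user root -exec chown www-data:www-data {} +
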
[20:33:24] !log maps Changing files owned by root to www-data:www-data in /data/project/{styles,tiles} from NFS server (T218145)
[20:33:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL
[20:33:28] T218145: maps: take back root owned files/dirs from root_squash protected nfs - https://phabricator.wikimedia.org/T218145
[20:38:38] thedj: styles was small, so it's fixed. I'm running a heavily ionice'd find+chown on tiles now
[20:58:28] bd808: hi! Is tools-login now using login-stretch?
[20:58:38] rather, if it redirects you
[20:58:46] or shall we still use login-stretch?
[20:58:47] hauskatze: yes
[20:58:52] olá
[20:58:59] olá!
[20:59:00] https://lists.wikimedia.org/pipermail/cloud-announce/2019-March/000142.html
[20:59:29] hauskatze: both names now point to the same IP address
[20:59:50] bd808: I knew I read something, but I wasn't sure where ;)
[20:59:52] the "official" DNS name is login.tools.wmflabs.org
[20:59:53] too many emails
[21:00:22] I'll test ssh -a @login.tools.wmflabs.org next time then
[21:01:05] heh - maurelio@login-stretch.tools.wmflabs.org: Permission denied (publickey,hostbased).
[21:01:55] using login.tools.etc worked though
[21:02:11] weird
[21:04:02] hauskatze: "error: AuthorizedKeysCommand /usr/sbin/ssh-key-ldap-lookup maurelio failed, status 1" I think you got bitten by a transient LDAP lookup failure
[21:04:43] I think I heard something about LDAP being a bit unstable these days, right?
[21:05:06] yeah, we are having some growing pains with LDAP and the new Stretch hosts
[22:26:39] bd808: sorry to disturb again. If I want a table added to the replicas, is it just adding the table name to maintain-views, or are other files involved as well?
[22:28:04] thedj, I run the warper
[22:28:20] there is some junk there, lemme clear it out
[22:28:35] hauskatze: "it depends" is the answer. If something is not seen in the wiki replicas today then it probably has to be reviewed for security implications, added to the replication process on the sanitarium server, replicated to the wiki replica hosts, and then exposed to the users via maintain-views
[22:29:35] bd808: I'm talking about a non-private-data table of course
[22:29:40] chippy: if there are a lot of files to delete in warper, it may be easier for you to figure out what they are and then file a phab task for someone on my team to clean them up from the NFS server directly.
[22:30:00] I was wondering about the process from a gerrit standpoint
[22:30:03] hauskatze: they are all private until they have been reviewed ;) but some reviews are easier than others
[22:30:32] well yes, private as to contain personal or nonpublic information such as email addresses or IPs, etc :)
[22:32:30] bd808, I was able to remove most of them in /tmp; data/project/uploads is needed and active though
[22:35:26] chippy: cool. there was some really old stuff in there :) I saw various state files from 2015
[22:35:32] yeah!
[22:35:49] hauskatze, so for this type of table that contains nothing that needs partial redaction, it should be a 'fullview' entry for the purposes of maintain-views IIRC
[22:36:08] that said, as bd808 wrote, it's far more complicated than just uploading a patch against maintain-views.yaml
[22:36:23] the underlying table is likely to not actually be getting replicated yet for starters
[22:36:38] and I don't think it's possible to tell whether that is the case without being ops
[22:36:46] and certainly to fix that would need ops
[22:37:24] but also as bryan wrote, I think the procedure now requires a more formal review than just a regular engineer checking it?
[22:37:38] I just looked at a task in which I could see more or less the process. Although it ends with a patch against maintain-views, indeed we need ops to see what is being replicated, add it, WMF Legal and Security, etc.
[22:38:08] the long and short is basically that it's not possible to do this process by yourself as a volunteer; make a task
[22:39:10] Sure, thanks.
[22:40:02] you're asking the right questions hauskatze
[22:40:25] some tables would need more complicated entries for maintain-views than just a fullview entry, to set up partial redaction and stuff
[22:40:50] I actually have a #documentation task to document whatever the process is, but have not done so yet :)
[22:41:13] unfortunately the answer is it's a foundation multi-team bureaucracy
[22:41:25] with some behind-the-scenes stuff
[22:41:28] some review
[22:41:32] etc.
[22:41:58] so the answer is not ideal, but what comes out the other end does seem to be fairly safe
[23:09:42] do deployment-prep puppet run logs go anywhere? /var/log/puppet is empty, and logstash-beta.wmflabs.org seems to have very specific things and not the entire run output
[23:10:43] that's a good question and I should know the answer
[23:10:55] usually I just run puppet via cumin and read all the results :)
[23:11:11] if there is a central place I'd love to know about it too
[23:11:22] well, the problem is puppet already ran after i changed some hiera in horizon, and i want to know what it actually did :)
[23:12:59] ah :)
[23:13:31] ebernhardson: at one point they went into the beta cluster logstash, but I don't know if that is still in place or not
[23:14:00] ebernhardson, oh, what about /var/log/puppet.log?
[23:14:05] for a single-instance thing
[23:14:19] they should be on the local puppet master too I think? That's where I used to collect them from
[23:14:20] I know /var/log/puppet is an empty directory but this .log file has stuff
[23:14:23] at least on -deploy01
[23:15:37] bd808, hmm I don't think the master keeps them but I could be wrong
[23:18:34] puppetmaster::logstash is the manifest that tells a puppetmaster to log to logstash. and role::logstash::puppetreports sets up the ELK cluster side
[23:18:58] ... this is beginning to ring a bell
[23:19:55] oh right
[23:19:59] oooooohhhh
[23:20:00] oh no
[23:20:15] role::beta::puppetmaster is applied on deployment-puppetmaster03 and it tries to apply puppetmaster::logstash
[23:20:17] root@deployment-puppetmaster03:/etc/puppet# grep reports puppet.conf
[23:20:17] reports = puppetdb
[23:20:17] reports = store,logstash
[23:20:17] root@deployment-puppetmaster03:/etc/puppet#
[23:20:36] knew I had a bad feeling about this.
[23:21:47] I bet it sees puppetdb and stops looking
[23:23:51] could be :/
[23:26:58] Krenair: indeed, puppet.log seems to have what i was looking for. Thanks! i should have thought to look there too
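
(Aside: the reports setting in puppet.conf takes a single comma-separated list of report processors, so only one of the two duplicate lines shown above can actually take effect. A likely fix, untested here, would be to collapse them into one line on the puppetmaster:)

    reports = puppetdb,store,logstash
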
[23:27:12] no worries
[23:27:21] the /var/log/puppet thing threw me off at first too
[23:27:49] bd808, does tools-puppetmaster also have this problem?
[23:28:03] IIRC there was some puppetdb going on there too?
[23:28:09] i suppose i never really looked in /var/log, and just accepted the tab completion
[23:28:23] Krenair: I don't think we have either the puppetdb or ELK stuff there at the moment
[23:28:26] oh, ok
[23:29:00] puppetdb is on the wish list. b.storm_ set it up in toolsbeta a few months ago
[23:29:13] there is a puppetdb host in tools
[23:29:41] doesn't look like puppetmaster has been configured to use it from a glance at openstack-browser
[23:33:10] -> T218175
[23:33:10] T218175: puppetmaster config in deployment-prep may be inadvertently breaking store,logstash reports? - https://phabricator.wikimedia.org/T218175
[23:44:32] !help I’m working on a tool to edit pages, and I tried to do the right thing to protect against edit conflicts, but apparently it’s still overwriting conflicting edits
[23:44:32] lucaswerkmeister: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[23:44:41] does anyone know of recent bugs in MediaWiki in this area?
[23:44:56] (there are some old ones on Phabricator, but they seem more complicated than what I’m experiencing, e. g. involving three different users)
[23:45:20] and if not, what would be the best place to follow up? I’m not sure if Phabricator or Discourse
[23:45:30] (preferably not IRC because I want to go to bed soonish :P )
[23:45:33] lucaswerkmeister: that sounds more like a general MediaWiki Action API question than a Toolforge question (which is what !+help summons)
[23:45:48] hm, true
[23:45:53] so, #wikimedia-tech?
[23:46:26] Discourse might get you some help, or yeah #wikimedia-tech or #mediawiki
[23:46:45] or do what I would do and just ask t.gr or a.nomie ;)
[23:47:42] hehe
[23:47:53] I’ll ask on #wikimedia-tech and otherwise fall back to Discourse, thanks
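
(Aside: on the edit-conflict question at the end, the usual Action API safeguard is to send both basetimestamp and starttimestamp with action=edit so that MediaWiki can answer with an editconflict error instead of silently overwriting. A rough sketch of the relevant request parameters, with placeholder values; this is general API knowledge, not a diagnosis of the specific tool above:)

    action=edit
    title=<page being edited>
    text=<new wikitext>
    token=<csrf token>
    basetimestamp=<timestamp of the revision the tool loaded>
    starttimestamp=<timestamp recorded when the tool started editing>
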