[00:18:48] (03CR) 1020after4: [C: 032] Wikimedia-cloud: Adjust for renamed tags [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/363287 (https://phabricator.wikimedia.org/T167244) (owner: 10BryanDavis) [00:19:12] (03Merged) 10jenkins-bot: Wikimedia-cloud: Adjust for renamed tags [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/363287 (https://phabricator.wikimedia.org/T167244) (owner: 10BryanDavis) [00:19:20] (03CR) 10jenkins-bot: Wikimedia-cloud: Adjust for renamed tags [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/363287 (https://phabricator.wikimedia.org/T167244) (owner: 10BryanDavis) [00:19:44] !log tools.wikibugs Updated channels.yaml to: ca957aaf87fe64aa3a06e0d885c3fdd255c52475 Wikimedia-cloud: Adjust for renamed tags [00:19:44] 10cloud-services-team (Kanban), 10Project-Admins, 10Patch-For-Review, 10User-bd808: Rename and update Cloud Services Phabricator projects - https://phabricator.wikimedia.org/T167244#3405744 (10bd808) [00:40:27] 10Data-Services, 10cloud-services-team, 10wikitech.wikimedia.org, 10DBA: move wikitech and labstestwiki to s3 (needs discussion) - https://phabricator.wikimedia.org/T167973#3405749 (10bd808) [00:40:32] 10Data-Services, 10DBA, 10MediaWiki-extensions-Babel, 10Security-Team, 10WMF-Legal: Replicate babel db table on Labs - https://phabricator.wikimedia.org/T160713#3405750 (10bd808) [00:40:34] 10Data-Services, 10DBA, 10Epic: Labs database replica drift - https://phabricator.wikimedia.org/T138967#3405751 (10bd808) [00:40:37] 10Data-Services, 10DBA: Expose ar_content_format and ar_content_model columns of archive table on Labs replicas - https://phabricator.wikimedia.org/T89741#3405752 (10bd808) [00:46:47] PROBLEM - Puppet errors on tools-exec-1416 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [00:46:51] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:01:03] !help Hello is anyone able to restore a backup 
of a folder? [01:01:03] SigmaWP: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team [01:07:00] SigmaWP: in what context? [01:07:26] Labs [01:07:29] Cloud services [01:07:37] Seems that a folder on a tool's gone missing [01:19:58] SigmaWP: which tool? I'm outside atm, but can check in a bit [01:20:17] sigma [01:26:46] RECOVERY - Puppet errors on tools-exec-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [01:39:26] SigmaWP: Your data in the tool directory went missing? [01:40:55] madhuvishy: Files, but not directories, that sit directly in ~/cherry/cherryhtml/, have vanished [01:51:52] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [03:42:53] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [03:49:10] PROBLEM - Puppet errors on tools-exec-1435 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [04:19:10] RECOVERY - Puppet errors on tools-exec-1435 is OK: OK: Less than 1.00% above the threshold [0.0] [04:22:53] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [05:23:31] madhuvishy: afk let me know when you get a chance to look at it [05:26:17] !help [05:26:17] fajne_farita_: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team [05:41:42] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Optimize edit count queries in XTools - https://phabricator.wikimedia.org/T163284#3405823 (10Samwilson) A bit more rejigging and things are improving (I think). Now with a cold cache https://xtools.wmflabs.org/ec/fr.wikipedia.org/Kaldari is: > Executed in 3.23... 
[07:04:53] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Verify all routes work between new and old xtools - https://phabricator.wikimedia.org/T165612#3405964 (10Samwilson) a:03Samwilson The old routes are listed in T163283, and I've checked that we've got everything in that list covered at the moment. Do we want... [07:38:24] PROBLEM - Free space - all mounts on tools-worker-1020 is CRITICAL: CRITICAL: tools.tools-worker-1020.diskspace.root.byte_percentfree (<100.00%) [07:57:01] 10Cloud-Services: Run non-interactive commands on labs kubernetes webservices - https://phabricator.wikimedia.org/T169695#3406083 (10Tarrow) [07:57:41] is there any plan to have a xenial image on labs? [08:21:17] gilles: afaik no [08:23:49] the plan seems to be trusty -> jessie -> stretch. [08:30:46] 10Cloud-Services, 10Tracking, 10User-bd808: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#3406177 (10Fnielsen) [08:43:50] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [09:21:07] 10Tool-Labs-tools-Xtools: Commons' upload count incorrect in Edit Counter - https://phabricator.wikimedia.org/T169705#3406317 (10Samwilson) [09:46:03] 10Cloud-Services, 10Toolforge, 10Monumental, 10Privacy: Monumental imports css from fonts.googleapis.com - https://phabricator.wikimedia.org/T168786#3406380 (10Yarl) a:03Yarl [09:52:02] 10Cloud-Services, 10Graphite, 10Operations, 10Patch-For-Review, 10User-fgiunchedi: Move labs 'instances' data to graphite labs - https://phabricator.wikimedia.org/T143405#3406402 (10fgiunchedi) @chasemp @bd808 no problem! thanks for working on it :D In terms of rethinking, I don't know exactly what was... [10:20:58] 10Cloud-Services, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: Sunset of WDQ - https://phabricator.wikimedia.org/T153439#3406527 (10Esc3300) [10:44:59] 10Cloud-Services: Problem with kubernetes not being able to read ~/.kube/config file. 
- https://phabricator.wikimedia.org/T169715#3406620 (10Fnielsen) [10:45:27] PROBLEM - Puppet errors on tools-bastion-03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [10:46:26] 10Cloud-Services: Problem with kubernetes not being able to read ~/.kube/config file. - https://phabricator.wikimedia.org/T169715#3406608 (10Fnielsen) I see a related task here https://phabricator.wikimedia.org/T133999 [11:23:54] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [11:25:26] RECOVERY - Puppet errors on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0] [11:48:19] 10Cloud-Services, 10Tools-Kubernetes: Problem with kubernetes not being able to read ~/.kube/config file. - https://phabricator.wikimedia.org/T169715#3406861 (10zhuyifei1999) [11:49:55] bd808: Herald edited projects, added Cloud-Services; removed Toolforge. · View Herald Transcript <= I think there's something wrong here [11:50:40] 10Cloud-Services, 10Tools-Kubernetes: Problem with kubernetes not being able to read ~/.kube/config file. - https://phabricator.wikimedia.org/T169715#3406871 (10Fnielsen) I have managed to get it going with `--backend=gridengine uwsgi-plain`, but I suppose it would perhaps be nice to have the kubernetes issue... [11:59:59] 10Cloud-Services, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: Sunset of WDQ - https://phabricator.wikimedia.org/T153439#3406969 (10Magnus) @Multichill AFAICT it's only wdq-mm-01 left in the wdq-mm project. Switched off since April 6. [12:00:33] 10cloud-services-team (Kanban), 10Project-Admins, 10Patch-For-Review, 10User-bd808: Rename and update Cloud Services Phabricator projects - https://phabricator.wikimedia.org/T167244#3321877 (10zhuyifei1999) > Herald edited projects, added Cloud-Services; removed Toolforge. · View Herald Transcript H28 sho... 
[12:15:21] (03PS1) 10Alexandros Kosiaris: Remove unused ores_password [labs/private] - 10https://gerrit.wikimedia.org/r/363333 [12:16:04] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Remove unused ores_password [labs/private] - 10https://gerrit.wikimedia.org/r/363333 (owner: 10Alexandros Kosiaris) [12:21:19] 10Cloud-Services: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407049 (10MaxBioHazard) [12:32:35] (03PS1) 10Alexandros Kosiaris: Add profile::ores::web::redis_password: [labs/private] - 10https://gerrit.wikimedia.org/r/363336 [12:34:27] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Add profile::ores::web::redis_password: [labs/private] - 10https://gerrit.wikimedia.org/r/363336 (owner: 10Alexandros Kosiaris) [12:44:52] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [13:17:20] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [13:24:52] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [13:45:53] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [13:51:26] PROBLEM - Puppet errors on tools-bastion-03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [13:57:06] 10Toolforge, 10Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407408 (10MaxBioHazard) [13:57:21] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [14:01:59] 10Toolforge, 10Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407425 (10zhuyifei1999) There was [[https://wm-bot.wmflabs.org/logs/%23wikimedia-cloud/20170705.txt|a possibly related discussion on irc]]: ```lang=irc [01:01:03] !help Hello is anyone... 
[14:10:30] 10Toolforge, 10Tools-Kubernetes: Problem with kubernetes not being able to read ~/.kube/config file. - https://phabricator.wikimedia.org/T169715#3407471 (10zhuyifei1999) [14:12:15] 10Cloud-Services, 10DBA, 10Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3407481 (10Marostegui) I thought I would update with the latest news from this task: db1102 is the new sanitarium 3 running multi-instance with... [14:20:53] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [14:21:20] 10Toolforge, 10Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407525 (10zhuyifei1999) p:05Triage>03Unbreak! Fatemi (?) contacted @Ladsgroup of data in `/data/project/fatemi/temp` gone missing, which he then asked in #tool-labs-standards-committee mailin... [14:25:25] PROBLEM - Puppet errors on tools-exec-1411 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [14:26:26] RECOVERY - Puppet errors on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0] [14:28:56] 10Toolforge, 10Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407539 (10zhuyifei1999) For the record: {P5678} All of the affected directories have chmod either 2777 or 0777. I don't think the maintainers themselves did this, as @MaxBioHazard did not expect... [14:35:50] SigmaWP zhuyifei1999_ thanks for the phab task, Sorry, I got back really late yesterday, I'll follow up now. 
[14:35:59] thanks
[14:36:52] I would be really afraid it there is an NFS exploit
[14:38:15] but the most logical explanation would be that the folders are chmod-ed 777 in the first place and someone, either accidentally or maliciously, tried to rm -rf the whole /data/project
[14:38:36] *afraid if there
[14:41:21] Right
[14:41:43] We also had a report of missing stuff in /data/scratch
[14:42:09] shit :/
[14:46:32] madhuvishy: zhuyifei1999_ my question is missing since when I guess
[14:46:51] I'm not sure when the last proof of life is on the few cases, it could be newly discovered from data lost a month+ ago
[14:46:59] that's the last mass operation I can think of admin wise
[14:47:10] all three reports I got are today
[14:47:31] and yesterday, adjusting some timezones
[14:47:46] I mean more like, they noticed missing data today, when was it last known good?
[14:47:53] that's the thing I'm most curious about
[14:47:54] but the max said, "On July 2 all of my bots, located on labs, stopped working"
[14:47:59] hm
[14:48:23] well, having three reports on the same day can't be a co-incidence
[14:48:42] I wouldn't think so
[14:48:50] but I have no explanation atm
[14:49:03] the /data/scratch one was on 3rd
[14:49:08] at least the report
[14:49:18] data scratch being so entirely separate makes it all the more mysterious
[14:49:31] (Still not saying they are definitely related)
[14:49:31] yes
[14:50:22] madhuvishy: can you look back at the pastes from some accidental removal a few weeks or months ago and see if any of these paths are implicated?
[14:51:02] chasemp: sure
[14:51:06] zhuyifei1999_: madhuvishy is going to look at what limited backups we have to see the state there
[14:51:23] k
[14:52:33] cloud-services-team (Kanban), Project-Admins, Patch-For-Review, User-bd808: Rename and update Cloud Services Phabricator projects - https://phabricator.wikimedia.org/T167244#3407576 (bd808) >>! In T167244#3406973, @zhuyifei1999 wrote: >> Herald edited projects, added Cloud-Services; removed Toolf...
[14:52:37] PROBLEM - Puppet errors on tools-prometheus-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[14:52:45] at least afaict my stuffs haven't been affected (but I have way too many stuffs to check all of them)
[14:54:42] chasemp: from a week ago
[14:54:45] https://www.irccloud.com/pastebin/XQbibx0r/
[14:55:30] is it chmod 777?
[14:55:50] which looks right i think, currently there's only
[14:55:54] https://www.irccloud.com/pastebin/iIPvpqUz/
[14:56:07] zhuyifei1999_: https://phabricator.wikimedia.org/T169736#3407582
[14:56:12] it looks varied?
[14:56:15] do I misunderstand the report?
[14:56:41] 163907909 -rw-r--r-- 1 mbh wikidev 6656 Feb 2 2016 currevents_remove.exe
[14:56:48] tools.mbh vs mbh
[14:56:52] in the ownership
[14:57:23] madhuvishy: yeah people do that all the time
[14:57:50] they set chmod g+w so owners can transfer files with things like sftp
[14:58:12] zhuyifei1999_: does that directory contents conflict with the task description?
[14:58:23] but they often forget or ignore `take`
[14:58:47] chasemp: it might be so; they might have restored some of the files
[14:59:00] zhuyifei1999_: not based on those modtimes I think
[14:59:19] in theory this directory was emptied according to them?
[14:59:20] 163907915 -rw-r--r-- 1 mbh wikidev 10240 Dec 6 2016 pats_awarding.exe
[14:59:29] I think restoring from backups resets the modtimes right?
[14:59:54] it depends on the backup process
[15:00:04] but I know we didn't do any restore here atm
[15:00:08] and they didn't note it on task
[15:00:09] we have to ask them i suppose
[15:00:12] which would be whacky
[15:00:27] Toolforge, Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407603 (chasemp) p:Unbreak!>Normal
[15:01:01] i guess i thought maybe those files aren't visible to them on ls because of the ownership
[15:01:13] Toolforge, Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407049 (chasemp) @MaxBioHazard did you do any restore here since this report?
[15:01:18] madhuvishy: or they checked the wrong thing in a panic or who knows
[15:01:24] chasemp: ya maybe
[15:01:42] atm I think this is coincidence in regards to other things
[15:01:43] chasemp: the sigma case seems legit, all the files are in the backup, but not in tools
[15:02:13] madhuvishy: can you respond to that and offer to restore and explain it's a very very unlikely thing for us to do on an individual basis but in this case we certainly can
[15:02:19] and the backup is only a week old
[15:02:37] yeah idk what's up there but I'm not leaning towards foul play atm
[15:02:45] ya i wonder what happened though
[15:02:48] scratch I'm disregarding as it's never reliable
[15:02:49] I'll note that `stat /mnt/nfs/labstore-secondary-tools-project/mbh/bots/orphane.files.exe` shows "Modify: 2016-12-22 13:02:05.000000000 +0000 Change: 2017-07-05 12:36:40.887601270 +0000"
[15:03:00] right
[15:03:19] zhuyifei1999_: nice call yeah, I wonder if that means they did restore things
[15:03:23] hard to tell
[15:03:27] I'm not 100% sure, but the change time may indicate the actual create time
[15:03:41] zhuyifei1999_: can you note that on-task?
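The Modify/Change gap quoted from `stat` above can be reproduced with a small sketch. It is an illustration only, assuming a Linux filesystem where `st_ctime` is the inode change time rather than a true creation time; the file name `restored.exe`, the helper names and the one-day slack are made up for the example. Writing a file and then setting an old modification time on it (as a copy that preserves timestamps would) leaves an old Modify next to a fresh Change.

```python
import os
import tempfile
import time

DAY = 24 * 3600


def times_suggest_rewrite(path, slack_seconds=DAY):
    """Return True when the inode change time is much newer than the mtime."""
    st = os.stat(path)
    return (st.st_ctime - st.st_mtime) > slack_seconds


def demo():
    """Create a file, then backdate its mtime as a timestamp-preserving copy would."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "restored.exe")  # hypothetical file name
        with open(path, "wb") as fh:
            fh.write(b"payload")
        fresh = times_suggest_rewrite(path)

        # Backdate atime/mtime by 200 days; the kernel still stamps ctime with "now".
        old = time.time() - 200 * DAY
        os.utime(path, (old, old))
        backdated = times_suggest_rewrite(path)
    return fresh, backdated


if __name__ == "__main__":
    print(demo())
```

A large gap only shows that the inode was touched recently (created, chmod-ed, chown-ed or rewritten); on its own it does not prove a restore.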
[15:03:52] k [15:04:07] thanks you zhuyifei1999_, what a weird week [15:05:28] RECOVERY - Puppet errors on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [15:06:42] SigmaWP: fyi, I found your files existing in our week old backup, we're not sure atm what's going on, but can restore this one off , can you open a phab task, and I can note progress there? [15:07:58] PROBLEM - Puppet errors on tools-prometheus-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:08:01] 10Toolforge, 10Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407609 (10zhuyifei1999) `stat` on some still-live files, eg. `/mnt/nfs/labstore-secondary-tools-project/mbh/bots/currevents_remove.exe`, shows: ``` File: ‘/mnt/nfs/labstore-secondary-tools-proj... [15:11:28] 10Toolforge, 10Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407622 (10MaxBioHazard) Path /mnt/nfs/labstore-secondary-tools-project/mbh/bots is an alias for /data/project/mbh/bots I restored all files from my PC several hours ago. [15:11:50] 10Toolforge, 10Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407624 (10MaxBioHazard) p:05Normal>03Unbreak! [15:13:25] 10Cloud-Services: Files (Not directories) in Tools /data/project/sigma/cherry/cherryhtml reported missing - https://phabricator.wikimedia.org/T169756#3407626 (10madhuvishy) [15:13:37] SigmaWP: nvm ^ [15:15:51] 10Cloud-Services: Files (Not directories) in Tools /data/project/sigma/cherry/cherryhtml reported missing - https://phabricator.wikimedia.org/T169756#3407626 (10zhuyifei1999) See also T169736. 
[15:16:21] 10Cloud-Services: Files (Not directories) in Tools /data/project/sigma/cherry/cherryhtml reported missing - https://phabricator.wikimedia.org/T169756#3407643 (10madhuvishy) [15:19:04] 10Toolforge, 10Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407648 (10MaxBioHazard) See also https://ru.wikipedia.org/wiki/%D0%9E%D0%B1%D1%81%D1%83%D0%B6%D0%B4%D0%B5%D0%BD%D0%B8%D0%B5_%D1%83%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA%D0%B0:Haffman#.D0.90.D... [15:19:41] 10Toolforge, 10Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407650 (10madhuvishy) State of the world in the week old codfw backup: ``` root@labstore2003:/srv/backup/tools/shared/tools/project/mbh/bots# ls -l total 5068 -rw-r--r-- 1 10604 wikidev 10240... [15:21:17] https://phabricator.wikimedia.org/T169736#3407648 <= google translate of the linked talk page also says july 2 [15:21:49] 10Cloud-Services, 10Toolforge, 10cloud-services-team (Kanban), 10Patch-For-Review, 10User-bd808: Update maintain-kubeusers to allow tool's to write to $HOME/.kube - https://phabricator.wikimedia.org/T165875#3407654 (10chasemp) [15:21:51] 10Toolforge, 10Tools-Kubernetes: Problem with kubernetes not being able to read ~/.kube/config file. - https://phabricator.wikimedia.org/T169715#3407656 (10chasemp) [15:23:00] Sunday, nothing administrative happened then [15:24:44] I haven't even logged into labstore since our outage thursday fwiw [15:25:00] I've only looked at load :) [15:25:49] and we have no cleanup scripts or anything [15:28:25] 10Toolforge, 10Tools-Kubernetes, 10User-bd808: Problem with kubernetes not being able to read ~/.kube/config file. - https://phabricator.wikimedia.org/T169715#3407704 (10bd808) 05duplicate>03Resolved a:03bd808 This was caused by errors I introduced in the `maintain-kubeusers` script in {rOPUPaedd882} w... 
[15:37:07] Toolforge, Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407730 (MaxBioHazard) @madhuvishy, could you restore this?
[15:38:12] madhuvishy: can you check what was the chmod in the backup? if it wasn't 777 then uh
[15:38:44] zhuyifei1999_: is it not in my paste?
[15:39:01] the chmod of the directory
[15:39:04] doesn't look like 777
[15:39:04] aah
[15:39:07] sorry
[15:39:09] one sec
[15:39:24] to delete files you need write permissions of the directory
[15:39:31] drwxrwxrwx 3 52321 52321 4096 Jun 14 13:26 bots
[15:39:38] oh well
[15:39:39] yeah this is 777
[15:39:56] someone must have attempted to rm -rf everything
[15:40:33] bleh
[15:40:39] I'm kind of wondering if that is what happened. We have some action accounting but it won't tell us the command args.
[15:40:57] we should email the list with some best practices and a warning
[15:40:58] which makes perfect sense, considering you can't rm a subdirectory without 1. write permissions for the directory itself and 2. removing all the files in the subdirectory
[15:41:07] true
[15:41:18] but for singfle files #2 is not necessary
[15:41:22] *single
[15:41:41] the last time I looked there were lots of 1777 tool dirs :/
[15:42:00] probably at least in part done by folks who were having troubles with scp
[15:42:01] oh well they are all broken :/
[15:42:06] drwxrwsrwx 7 51469 51469 4096 Jun 22 22:33 cherryhtml
[15:42:06] too
[15:42:11] 777
[15:42:46] madhuvishy, gotcha
[15:42:58] well, crap
[15:43:05] If we catch anyone rm'ing purposefully that seems like solid grounds for a ban
[15:43:22] Oh definitely
[15:43:23] * madhuvishy goes to look at https://grafana-admin.wikimedia.org/dashboard/db/labstore-nfs-directory-sizes
[15:43:36] odds are good it was accidental imo
[15:43:38] no significant drops in data sizes - but these files are small
[15:43:44] aa much as hostile
[15:43:49] as
[15:45:30] yeah. it could easily be a poorly written cleanup script
[15:45:44] or a fumbled manual attempt
[15:45:57] oh umm
[15:45:59] chasemp:
[15:46:03] * bd808 has a `sudo rm -rf .*` story of his own
[15:46:04] /dev/drbd4 ext4 8.0T 3.5T 4.2T 46% /srv/tools
[15:46:10] that seems
[15:46:14] Wrong
[15:46:43] we were at at least 57%
[15:46:52] that's a lot of missing data
[15:47:01] last week backup - /dev/mapper/backup-tools--project--backup ext4 8.0T 4.4T 3.2T 58% /srv/backup/tools
[15:47:33] right
[15:47:38] question for labs: is there a way to get more than 160GB of disk space for an instance? and is it possible to grow the existing partition without losing the data?
[15:47:56] Partitions aren't resizable
[15:48:24] well, we are going to have to write up some best practices for this and maybe do a round of restore for folks if we can
[15:48:32] Maybe?
[15:48:45] At least we have working backups
[15:49:07] yeah, there's one that ran yesterday, so the week old snapshot is saving us rn
[15:49:49] schana: larger custom instances can be requested. We don't have support for resizing existing instances.
[15:50:13] yesterday's backup is /dev/mapper/backup-tools--project ext4 8.0T 3.5T 4.2T 46% /srv/backup/tools
[15:50:39] bd808: using T140904?
[15:50:39] T140904: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904
[15:51:01] schana: yes :)
[15:51:04] schana: yes, to get a larger custom instance you will need to file a phabricator task and explain in detail why you need more resources, how many you need, and how this benefits the movement.
[15:51:21] will do, thanks
[15:51:49] chasemp: might be time for a bigger tracking task though
[15:54:44] Toolforge, Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3407843 (zhuyifei1999) p:Unbreak!>Triage https://wm-bot.wmflabs.org/logs/%23wikimedia-cloud/20170705.txt: @madhuvishy checked the backups and the directories seems originally chmodded 77...
[15:55:38] the number of o+w directories in the tools project is pretty scary
[15:57:32] bad practice. some people don't care about permission stuffs until things get broken :(
[16:01:58] bd808: 1777 is okay. you can only write to a 1777 directory, but not delete files belonging to someone else
[16:02:12] there is a scary amount of rm * in bash_history for people logged in july 2
[16:03:30] any rm with -rf?
[16:05:30] or -fr, {-r,-R,--recursive} {-f,--force}, {-f,--force} {-r,-R,--recursive}
[16:06:42] one instance of rm -rf *
[16:06:44] yes
[16:11:50] so, can you restore my folder from backup?
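The distinction drawn above between plain o+w directories and 1777 ones can be sketched as a small audit. This is a minimal illustration, not the check the admins ran: the function names are made up for the example, and it assumes a POSIX filesystem. It walks a tree and reports directories that are world-writable without the sticky (restricted deletion) bit, which are the ones where any user can remove other people's files.

```python
import os
import stat


def is_risky_dir_mode(mode):
    """True for a mode that is world-writable and lacks the sticky bit."""
    return bool(mode & stat.S_IWOTH) and not (mode & stat.S_ISVTX)


def find_risky_dirs(root):
    """Return sorted paths of subdirectories under root with a risky mode."""
    risky = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat so symlinks to directories are skipped
            if stat.S_ISDIR(st.st_mode) and is_risky_dir_mode(st.st_mode):
                risky.append(full)
    return sorted(risky)


if __name__ == "__main__":
    # Scan the current directory; point this at a tool's home to audit it.
    for path in find_risky_dirs("."):
        print(path)
```

Under these assumptions 0777 and 2777 directories are flagged, while 1777 and 0755 are not.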
[16:12:36] mbh: yes probably, we are trying to decide on whether to do a default mass restore or let people ask atm as restoring from history has its own issues
[16:12:56] but it seems to have only affected those with permissive perms afaict
[16:13:17] Cloud-Services, Recommendation-API: Request custom instance for recommendation-api labs project - https://phabricator.wikimedia.org/T169766#3407923 (schana)
[16:13:54] Cloud-Services, Recommendation-API: Request custom instance for recommendation-api labs project - https://phabricator.wikimedia.org/T169766#3407941 (schana)
[16:16:00] maybe I set 777 rights to this folder 3 years ago, when I was inexperienced
[16:16:51] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[16:18:20] zhuyifei1999_: I'm thinking it would be best either way to couple a notice with a brief bit of best practice, we have an idea now of count on expected affected users and it's like 146 or so tools that show up with sudo find . -perm -o+w -type d
[16:18:28] also, i don't know about 0777, 1777, 2777 and other x777 notations, only know about 3-number notation
[16:18:43] Tool-Labs-tools-Xtools, Community-Tech: Internal Server Error from new articleinfo interface in XTools - https://phabricator.wikimedia.org/T169767#3407955 (kaldari)
[16:18:50] I'll make a task and maybe we can collab on what to announce, seems like we'll have folks attention for this email and best to put it all in one place
[16:18:52] chasemp: that's very uh oh
[16:18:57] k
[16:19:19] zhuyifei1999_: yeah not wonderful but at least not a mystery now
[16:19:21] mbh: basically, they are octals
[16:19:46] Tool-Labs-tools-Xtools, Community-Tech: Internal Server Error from new articleinfo interface in XTools - https://phabricator.wikimedia.org/T169767#3407955 (kaldari) p:Triage>High
[16:19:48] think of them in binary
[16:19:57] i know
[16:19:58] o0 is b000
[16:20:01] ok
[16:20:20] so the last digit is sticky flag
[16:20:27] or restricted deletion
[16:21:16] zhuyifei1999_: I'm not comfortable atm exposing the user since it looks like from their history they were cleaning up their own things and this was a mishap, at least to me
[16:21:19] but
[16:21:21] Jul 2 08:12:30 logged in
[16:21:21] Jul 2 09:30:00 logged out
[16:21:21] Jul 2 16:56:59 logged in
[16:21:23] Jul 2 19:16:20 logged out
[16:21:40] bummer
[16:22:14] I don't know if sticky (applies to files) has any use in the modern world, but restricted deletion (applies to directories) prevents deletion of files within that directory when you do not own the files, even if you have write access to the directory
[16:23:01] @chasemp what means "atm"?
[16:23:09] the second to last (2) is setgroupid
[16:23:14] mbh: ah sorry "at the moment"
[16:24:17] i know only https://en.wikipedia.org/wiki/Automated_teller_machine
[16:24:41] heh, I'm deep in irc shorthand mentally by this point
[16:24:57] mbh: in all seriousness, sorry this happened but thank you for the good report and follow up
[16:25:10] when set on directories, iirc, it makes newly created files within that directory belong (group) to the group of the directory
[16:25:11] really helpful even if the situation sucks
[16:26:11] when set on executable files, **security breach unless you know what you are doing**
[16:26:27] when set on non-executable files, I don't think it does anything
[16:26:45] the first (4) is setuserid
[16:27:09] when set on directories, I never tried this...
[16:27:20] when set on executable files, **security breach unless you know what you are doing**
[16:27:24] when set on non-executable files, I don't think it does anything
[16:27:55] anyways, `man chmod` or https://linux.die.net/man/1/chmod will be helpful
[16:28:59] chasemp: k
[16:30:50] I'm not a Linux user, the only place when I use Linux is Labs. I'm ordinary Windows user
[16:31:37] mbh: adding to that, setuid is how sudo works (tools-dev's sudo is 4755), so be really careful when setting that bit to executable files
[16:31:44] um okay
[16:32:52] oh and `sudo` is how `become` works
[16:33:09] (setuid is also how `take` works, fwiw)
[16:33:34] mbh you should be able to run linux without a vm in windows 10.
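The four-digit octal notation explained above (leading digit: 4 setuid, 2 setgid, 1 sticky / restricted deletion) can be decoded with Python's standard `stat` module. This is a small sketch for illustration; the helper names `special_bits` and `as_ls_string` are invented here, and the sample modes are the ones mentioned in the discussion.

```python
import stat

# Leading octal digit of a four-digit mode, highest bit first.
SPECIAL_BITS = (
    (stat.S_ISUID, "setuid (4)"),
    (stat.S_ISGID, "setgid (2)"),
    (stat.S_ISVTX, "sticky / restricted deletion (1)"),
)


def special_bits(mode):
    """List the special bits set in a numeric mode such as 0o2777."""
    return [label for bit, label in SPECIAL_BITS if mode & bit]


def as_ls_string(mode, is_dir=True):
    """Render the mode the way `ls -l` prints it, e.g. drwxrwsrwx."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)


if __name__ == "__main__":
    for mode, is_dir in ((0o0777, True), (0o1777, True), (0o2777, True), (0o4755, False)):
        print(oct(mode), as_ls_string(mode, is_dir), special_bits(mode))
```

With this, 2777 renders as `drwxrwsrwx`, the same string `ls` printed for `cherryhtml` in the backup, and 1777 renders as `drwxrwxrwt`.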
[16:34:53] i know
[16:35:03] Toolforge, Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3408100 (chasemp)
[16:35:39] Cloud-Services: Files (Not directories) in Tools /data/project/sigma/cherry/cherryhtml reported missing - https://phabricator.wikimedia.org/T169756#3408119 (chasemp)
[16:37:04] zhuyifei1999_: https://phabricator.wikimedia.org/T169774
[16:37:42] It is uncomfortable that Labs require 2 accounts: for me and for bot. Often I need to change permissions of files, for example that read nohup.out from main account (initially it has read permission only for bot account)
[16:38:15] mbh: just read it as tool account
[16:38:59] chasemp: in what context though?
[16:39:05] i do it from WinSCP application, where I logged in as main account
[16:39:23] zhuyifei1999_: don't understand the question?
[16:39:46] I mean which directory was `rm -rf *` executed from?
[16:40:21] I'm wondering because scratch and project are kind of separate
[16:40:55] zhuyifei1999_: agreed, and it's unclear from what I have atm
[16:41:21] mbh: talf -f is so much better. (I might be biased though)
[16:41:42] *tail -f
[16:42:10] zhuyifei1999_: we don't backup scratch either so it's difficult to determine if what's missing has the same perm issues
[16:42:39] mbh: you might be interested in https://phabricator.wikimedia.org/T113979
[16:43:07] that hopefully will allow direct login to tool account, but it's like distant future
[16:44:07] chasemp: well, I think everyone using scratch should expect their data in scratch to be scratched
[16:44:36] agreed
[16:46:08] scratch wasn't a case of subdirectory files i think, .gz in /data/scratch/wdqs went missing, and that's not 777
[16:46:32] we should just treat that as a one-off separate case, and forget for now i think
[16:49:50] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Verify all routes work between new and old xtools - https://phabricator.wikimedia.org/T165612#3408181 (Matthewrbowker) >>! In T165612#3405964, @Samwilson wrote: > Do we want to set up some monitoring of 404s so that we can track if there are popular broken i...
[17:15:30] Toolforge, Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3408291 (chasemp) @MaxBioHazard how do you want to handle restore here? Should we overwrite the restored files or put restored files in another directory or?
[17:16:21] Toolforge, Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3408292 (MaxBioHazard) Overwrite.
[17:18:22] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[17:44:54] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Verify all routes work between new and old xtools - https://phabricator.wikimedia.org/T165612#3408385 (MusikAnimal) I assume we'll have the redirect functionality done via some code we add to the old XTools, and not by lighttpd settings or whatever (which ma...
[17:51:52] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [17:52:26] PROBLEM - Puppet errors on tools-bastion-03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:56:44] !log git upgrading puppet-paladox3 to stretch from jessie using https://linuxconfig.org/how-to-upgrade-debian-8-jessie-to-debian-9-stretch [17:56:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL [17:58:22] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [18:16:48] Cloud-Services, Tracking, User-bd808: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#3408511 (chasemp) [18:27:25] Cloud-Services, MediaWiki-extensions-EducationProgram, Patch-For-Review, Security: ep_courses.course_token column should not be publically exposed on labs - https://phabricator.wikimedia.org/T169661#3405017 (chasemp) it seems this has been done filtered at the sanitarium level and https://gerrit.... [18:27:25] RECOVERY - Puppet errors on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0] [18:30:28] PROBLEM - Puppet errors on tools-exec-1434 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [18:32:50] Cloud-Services, MediaWiki-extensions-EducationProgram, Patch-For-Review, Security: ep_courses.course_token column should not be publically exposed on labs - https://phabricator.wikimedia.org/T169661#3408580 (chasemp) >>! In T169661#3408548, @chasemp wrote: > it seems this has been done filtered a... 
[18:37:59] !log git restarting gerrit-test3 weird gerrit.service issues with limited helpful logs [18:38:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL [18:39:22] Cloud-Services, LabsDB-Auditor: Add more logging to labsdb-auditor - https://phabricator.wikimedia.org/T78726#3408610 (chasemp) Open>declined [18:39:37] Cloud-Services, LabsDB-Auditor: Verify that public views are defined the way they should be (matching whitelist/greylist) - https://phabricator.wikimedia.org/T85473#3408613 (chasemp) Open>declined [18:44:45] Data-Services, DBA: Expose ar_content_format and ar_content_model columns of archive table on Labs replicas - https://phabricator.wikimedia.org/T89741#1044220 (chasemp) @Umherirrender could you put up a patch to https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/role/tem... [18:47:56] Cloud-Services, Tracking, User-bd808: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#3408645 (chasemp) [19:04:42] Cloud-Services, DBA: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3399766 (chasemp) >>! In T169488#3403344, @Marostegui wrote: > The reason I suggested to use the script because it was pointed out to me that... [19:06:07] Cloud-Services: Redirect wdq.wmflabs.org to query.wikidata.org - https://phabricator.wikimedia.org/T169653#3404860 (chasemp) @bd808 thank you! > > What is needed now is to remove the proxy from the wdq-mm project and add it to the redirects project. > @Multichill is this something you can handle? > One... 
[19:06:47] Cloud-Services: Run non-interactive commands on labs kubernetes webservices - https://phabricator.wikimedia.org/T169695#3408778 (chasemp) p:Triage>Normal @Tarrow for now we are in a holding pattern on kubernetes features we can support but I think the idea is solid when we start gaining more traction. [19:08:20] Cloud-Services, DC-Ops, Operations: labstore1005 A PCIe link training failure error on boot - https://phabricator.wikimedia.org/T169286#3408795 (chasemp) @Cmjohnson I ping'd the wrong chris before :) As of this moment labstore1005 is the standby, if you have time to look at this it would be great ch... [19:08:22] Cloud-Services, DBA: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3408797 (Marostegui) It is not super common, but if a wiki is moved to deleted.dblist, then our check_private_data script will complain and we... [19:10:31] RECOVERY - Puppet errors on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0] [19:10:35] Cloud-VPS, Tools: Restarting tools after NFS issues - https://phabricator.wikimedia.org/T169210#3408801 (chasemp) Somewhere on wikitech we have an equivalent procedure to https://wikitech.wikimedia.org/wiki/Portal:Tool_Labs/Admin#Restarting_all_webservices for restarting all webservices running on k8s an... [19:12:25] Cloud-Services, cloud-services-team (Kanban), Patch-For-Review, User-bd808: `maintain-meta_p --all-databases` timeout on labsdb1009 contacting uk.wikimedia.org - https://phabricator.wikimedia.org/T168436#3364444 (chasemp) @bd808 can we close this? 
[19:12:58] Cloud-Services, Cloud-VPS, cloud-services-team (Kanban), Operations: Puppet CA: virt1000.wikimedia.org' will expire on 2017-08-15 - https://phabricator.wikimedia.org/T168110#3408839 (chasemp) a:Andrew [19:14:35] Cloud-Services, Cloud-VPS, Continuous-Integration-Infrastructure, Nodepool, and 2 others: Lower rate of Nodepool requests to OpenStack API - https://phabricator.wikimedia.org/T167803#3408840 (chasemp) Open>Resolved >>! In T167803#3378947, @hashar wrote: > Keeping it open for monitoring. T... [19:15:31] Cloud-Services, Cloud-VPS, Toolforge, User-fgiunchedi: Rollout prometheus-node-exporter 0.14 in labs - https://phabricator.wikimedia.org/T166561#3408844 (chasemp) Did this ever happen? Mostly just curious :) [19:16:42] Cloud-Services, cloud-services-team (Kanban), Patch-For-Review, User-bd808: `maintain-meta_p --all-databases` timeout on labsdb1009 contacting uk.wikimedia.org - https://phabricator.wikimedia.org/T168436#3408849 (bd808) Open>Resolved Verified by running `sudo /usr/local/sbin/maintain-meta... [19:16:44] Cloud-Services, Toolforge, cloud-services-team (Kanban), Patch-For-Review, User-bd808: Update maintain-kubeusers to allow tool's to write to $HOME/.kube - https://phabricator.wikimedia.org/T165875#3408852 (chasemp) @bd808 is this resolved then along with T169715? [19:17:52] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [19:19:03] Data-Services, MediaWiki-extensions-EducationProgram, Patch-For-Review, Security: ep_courses.course_token column should not be publically exposed on labs - https://phabricator.wikimedia.org/T169661#3408861 (bd808) [19:19:06] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Optimize edit count queries in XTools - https://phabricator.wikimedia.org/T163284#3408862 (kaldari) >But that seems too good, so perhaps the DB server is doing some caching too? 
Yeah, probably still had some of the queries cached in the database. I'll try r... [19:21:06] Cloud-Services, Tracking, User-bd808: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#3408879 (Paladox) [19:23:16] Cloud-Services, Toolforge, cloud-services-team (Kanban), Patch-For-Review, User-bd808: Update maintain-kubeusers to allow tool's to write to $HOME/.kube - https://phabricator.wikimedia.org/T165875#3408881 (bd808) >>! In T165875#3408852, @chasemp wrote: > @bd808 is this resolved then along wit... [19:29:44] chasemp: I'm doing the maintain-meta runs for T169488 [19:29:45] T169488: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488 [19:32:00] Data-Services, DBA: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3408913 (bd808) Open>Resolved a:jcrespo I ran `sudo /usr/local/sbin/maintain-meta_p --all-databases --debug` on: * labsdb1001 * lab... 
[19:34:51] bd808: cool [19:34:57] I forgot about that [19:35:31] I did one on 1009 to test the dblist change and then just did it across the rest [19:36:13] bd808: fyi the mediawiki-config dir is updated via puppet run as it's a system level lib or treated that way [19:36:35] so on a pupept frozen host runs may not have the effects with old data intended for maintain-* [19:36:38] general fyi [19:36:48] been thinking about you need puppet run ability there to be effective [19:37:12] *nod* I thought briefly about fixing the script to use other means to get that data but was too lazy [19:37:38] it must have caught the config change before being frozen because it worked as hoped [19:37:42] bd808: I think there are pitfalls each way [19:37:45] cool [19:38:07] if that really is the system level /usr/local/lib/mediawiki-config then it seems bad manners for any one script to manage it [19:38:12] and puppet is the correct means [19:38:12] those dblist files can be pulled from noc over https instead of needing a local clone [19:38:35] that's fair [19:38:46] current status is mostly a failure to reimagine from previous thinking really [19:38:47] that /usr/local/lib/mediawiki-config is a hack just for these scripts [19:39:10] I mean yes but that location and general symantics it's dangerous to treat that checkout as single purpose imo [19:39:23] better than the full MW tree that is on the dumps servers for the same reason though [19:39:26] either that should be more explicitly per script, or for this group of scripts [19:39:29] or something idk [19:39:38] sure [19:39:55] I think that when you disable puppet this not getting updated is teh /right thing/ atm [19:39:59] that's the inverse [19:40:20] switching this means it's blind to we all expect a system to be in status and now we get into multiple layers of state management [19:40:22] anyhoo [19:40:49] s/status/stasis [19:41:03] If I was writing the script I would just fetch from https://noc.wikimedia.org/conf/ as needed 
[19:41:31] it also currently does not fully support the dblist format which may cause issues at some point [19:41:33] maintain-meta_p does more bizarre things [19:41:36] right [19:41:44] it's a mess of epic proportions [19:41:58] this dblist would not work for instance -- https://noc.wikimedia.org/conf/group1.dblist [19:42:12] I think teh last commit there is something like [19:42:22] "make this minimally viable to get out of current deadlock" [19:42:33] :) [19:42:46] it does seem to work for the things it works for [19:44:04] so much of this setup is buried in hysterical raisons it's slow progress upwards [19:44:24] there was a period of >1y where it was just ignored afaict until rescue [19:46:34] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3409004 (10Papaul) [19:48:13] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3379828 (10Papaul) a:05Papaul>03RobH @Robh can you please setup the network port for me? Once done you can assign the task back to me for me to pr... [19:48:37] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3409009 (10Papaul) [19:48:45] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3379794 (10Papaul) @Robh can you please setup the network port for me? Once done you can assign the task back to me for me to proceed with the instal... 
[19:49:00] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3409011 (10Papaul) a:05Papaul>03RobH [19:49:44] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3409013 (10Papaul) [19:49:57] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3379811 (10Papaul) a:05Papaul>03RobH @Robh can you please setup the network port for me? Once done you can assign the task back to me for me to p... [19:50:06] chasemp: yeah. I've sort of aware of the mess that is the replica config. Waaayy back in my mw-core PM job I stepped into it a bit trying to find someone to review the whitelist/blacklist files. [19:50:40] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3409016 (10Papaul) [19:50:59] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3379777 (10Papaul) a:05Papaul>03RobH @Robh can you please setup the network port for me? Once done you can assign the task back to me for me to procee... [19:54:25] !help [19:54:25] Haffman: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team [19:54:39] Haffman: what's you question? [19:54:47] Haffman: please ask a question :) [19:54:58] bd808: that must be ancient history; i had no idea you were even the mw-core product manager [19:55:28] FY2014/2015 Q3 [19:55:43] I have done all the things. ;) [19:55:48] https://www.mediawiki.org/wiki/User:BDavis_(WMF)#Past_Projects [19:57:17] harej: this one was epic! 
Manager of a team for 3 weeks -- https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_API_Team [19:57:45] was that the re-org right before the other, bigger re-org? [19:57:58] yeah [19:58:00] A strange thing happened to my tool on wmflabs.org around 2nd of July. Yesterday I logged in and saw that all folders in public_html have been moved into to_move folder. And, more importantly, a folder with a tool that is most often visited was missing. [19:59:02] chasemp: ^ that sounds like our g+w issue [19:59:26] Haffman: https://phabricator.wikimedia.org/T169774 [19:59:34] seems related [19:59:52] Haffman: can you make a subtask there explaining and we can look at restoring data [20:00:14] it's not a thing we normally do and we usually say we explicitly won't but it seems like we can and so should at the moment [20:01:03] 10Toolforge: Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3409079 (10bd808) [20:03:58] thanks :) [20:07:38] 10Toolforge: Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3408084 (10Haffman) >>! In T169774#3408132, @chasemp wrote: > Does anyone know how to contact this user? > > https://phabricator.wikimedia.org/T169736#3407648 I'm here. It would be great to find a backu... [20:13:33] !help [20:13:33] fajne_farita: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team [20:13:47] fajne_farita: what's up? [20:14:21] cannot login to wikitech [20:14:50] I am using PuTTY [20:14:53] fajne_farita: can you describe your problem in more detail? what's your username, what's the error it throws? [20:15:34] first, do i need to use shell name or username? [20:15:39] what commands are you running to login to wikitech using putty? 
[20:15:59] wikitech is wikitech.wikimedia.org/wiki/Main_Page [20:16:11] where you'd login with username [20:16:36] if you are trying to log in to tools/vps(labs) instances, you'd use your shellname [20:16:56] ok. [20:17:49] fajne_farita: what is your shell name? :) [20:18:46] so, i am using my shell name, and putty attempts "Authenticating with public key "rsa-key-20170704"" and throws a "fatal error: server unexpectedly closed network connection" [20:19:52] fajne_farita: what is the shell name you are using? [20:20:10] fajne [20:20:34] fajne_farita: what host are you trying to authenticate to? [20:21:41] i am trying bastion, as the instructions suggest [20:22:00] bastion.wmflabs.org [20:22:03] ok [20:24:43] 10Cloud-Services: Redirect wdq.wmflabs.org to query.wikidata.org - https://phabricator.wikimedia.org/T169653#3409196 (10Multichill) @Magnus can you do the horizon steps? [20:24:57] bd808: so I think fajne_farita is not in teh bastion project [20:25:08] possibly from https://phabricator.wikimedia.org/T165337 [20:25:08] ? [20:25:14] are they in any projects? [20:25:18] but attempts to add via wikitech crapped out on me [20:25:28] no... [20:25:30] id fajne [20:25:31] uid=17439(fajne) gid=500(wikidev) groups=500(wikidev) [20:25:43] then they won't be in the bastion project for sure [20:25:50] it's not fajne_farita. just "fajne" [20:26:16] the shell group/bastion is supposed to be added automatically when a user is added to another OpenStack project [20:26:20] bd808: I'm unclear on https://phabricator.wikimedia.org/T165337 and if this is a known broken thing [20:26:40] but my instinct is some refactoring has resulted in new users having issues [20:26:59] oh wow( [20:27:02] OWM adds the permission when a user is added to a project [20:27:14] could they still request access to Tools via https://toolsadmin.wikimedia.org/auth/login/?next=/tools/membership/apply and get started there? 
[20:27:17] but yeas we have seen a couple blow up which may be a Striker bug [20:27:19] I know that's sideways [20:28:07] fajne_farita: did you create your account via toolsadmin.wikimedia.org? [20:29:21] if they are not a member of tools or any other VPS project then not being in bastion is normal. If they are a new tools member added from Striker and they are not in bastion too then yes this what that bug is worried about [20:29:36] o/ [20:29:50] get fajne_farita [20:29:56] I created my account through https://wikitech.wikimedia.org/w/index.php?title=Special:CreateAccount&returnto=Help%3AGetting+Started [20:30:00] *hey [20:30:22] fajne_farita: I think what we are saying is that membership in bastion is only checked when you have joined a project [20:30:30] which means current status is possibly normal [20:30:38] and I'm wondering what project you are looking to join [20:30:46] ORES! [20:30:50] * halfak is on it [20:31:08] what's the wikitech username? [20:31:19] fajne. thanks, halfak. [20:31:33] https://wikitech.wikimedia.org/wiki/User:Fajne [20:31:36] oh, wikitech - Timakova [20:31:37] bd808: thanks for explaining I didn't realize bastion membershipo was at that stage [20:31:48] OK cool [20:32:21] bd808: I'm wondering if our instructions need some love at https://wikitech.wikimedia.org/wiki/Help:Getting_Started#Creating_an_account_2 [20:32:29] chasemp: it's kind of a twisted maze, but yes. This is a buffer against giving free compute resources to anyone who creates an account. 
[20:32:35] * chasemp nods [20:32:38] I'm sure they do :) [20:32:40] I like it [20:33:18] fajne_farita: all appears well to me now [20:33:28] fajne_farita: try logging into one of the ores-staging instances :) [20:34:00] you can see the list at https://tools.wmflabs.org/openstack-browser/project/ores-staging [20:34:17] instructions say: "You can test your key by attempting to connect to the bastion VPS project: ssh -v bastion.wmflabs.org After you are authenticated, the session will automatically be closed." Maybe, this is what happened? [20:38:26] OK. fajne_farita is now part of ore-staging [20:38:31] *ores-staging [20:39:02] chasemp, fajne_farita: a small attempt to make to prerequisites more clear -- https://wikitech.wikimedia.org/w/index.php?title=Help:Getting_Started&diff=1763725&oldid=1762226 [20:39:58] chasemp, looks good [20:41:52] bd808: small bump https://wikitech.wikimedia.org/w/index.php?title=Help%3AGetting_Started&type=revision&diff=1763726&oldid=1763725 [20:41:55] fajne_farita: awesome [20:41:56] halfak, what host name should i use? [20:42:26] chasemp: nice. that data is cached for a bit, but its a good check [20:42:36] fajne_farita, ores-compute-01.eqiad.wmflabs [20:43:22] 10Cloud-VPS, 10ORES, 10Scoring-platform-team: Set up larger ores-compute instance - https://phabricator.wikimedia.org/T169809#3409254 (10Halfak) [20:43:41] halfak to the rescue [20:44:47] :D [20:45:00] 10Data-Services, 10DBA, 10MediaWiki-extensions-Babel, 10Security-Team, 10WMF-Legal: Replicate babel db table on Labs - https://phabricator.wikimedia.org/T160713#3409285 (10APalmer_WMF) Approved by Legal. Thanks, all. 
[20:48:21] 10Cloud-VPS, 10ORES, 10Scoring-platform-team: Increase quote for ores-staging to 48 GB - https://phabricator.wikimedia.org/T169811#3409290 (10Halfak) [20:52:27] 10Cloud-VPS, 10ORES, 10Scoring-platform-team: Set up larger ores-compute instance - https://phabricator.wikimedia.org/T169809#3409318 (10Halfak) [20:52:46] 10Cloud-VPS, 10ORES, 10Scoring-platform-team: Set up larger ores-compute instance - https://phabricator.wikimedia.org/T169809#3409254 (10Halfak) [20:53:10] OK I think I have this task right: https://phabricator.wikimedia.org/T169811 [20:53:56] Not sure if tagged exactly right: ^ [20:54:34] brb [21:01:05] 10Toolforge: Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3409357 (10chasemp) >>! In T169774#3409107, @Haffman wrote: >>>! In T169774#3408132, @chasemp wrote: >> Does anyone know how to contact this user? >> >> https://phabricator.wikimedia.org/T169736#3407648... [21:02:05] 10Toolforge: Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3409362 (10Haffman) >>! In T169774#3409357, @chasemp wrote: >>>! In T169774#3409107, @Haffman wrote: >>>>! In T169774#3408132, @chasemp wrote: >>> Does anyone know how to contact this user? >>> >>> https... [21:05:30] 10Tools: Third party resources loaded from "communityguidelines" tool - https://phabricator.wikimedia.org/T169334#3409363 (10Samtar) @bd808 resolving now.. [21:09:10] 10Tools: Third party resources loaded from "communityguidelines" tool - https://phabricator.wikimedia.org/T169334#3409369 (10Samtar) 05Open>03Resolved Changed to `` - thanks for the report @Nemo_bis [21:09:49] 10Toolforge: Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3409377 (10chasemp) > > h2bot We are still working out what to do here but fyi paths found via `sudo find . 
-perm -o+w -type d` are in P5684 [21:12:52] halfak: we track them currently as subtasks of https://phabricator.wikimedia.org/T140904 [21:13:11] thanks halfak I think we may not get to that until the prescribed time of tue morn fyi, I'll try to sanity check before then but it's a weird week already [21:13:15] you can see other quota requests for examples (linked as subtasks there) [21:16:48] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3409414 (10RobH) a:05RobH>03Papaul In the future please list out the full network port info, so I don't have to go hunting =] Example: ge-1/0/17... [21:16:57] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3409416 (10RobH) [21:17:08] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3409417 (10RobH) [21:17:36] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3379828 (10RobH) Network port setup, in the future, please list the full network port info. Example: ge-1/0/16 is actually asw-c-codfw:ge-1/0/16 [21:17:46] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3409420 (10RobH) a:05RobH>03Papaul [21:17:51] 10Toolforge: Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3409421 (10chasemp) [21:22:53] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [21:24:45] back! [21:25:21] thanks madhuvishy & chasemp. Will make things look right and no problem waiting until y'all have time. 
[21:25:51] halfak: thank you <3 [21:26:31] 10Cloud-VPS, 10ORES, 10Scoring-platform-team: Increase quota for ores-staging to 48 GB - https://phabricator.wikimedia.org/T169811#3409441 (10Halfak) [21:26:44] 10Cloud-VPS, 10ORES, 10Scoring-platform-team: Increase quota for ores-staging to 48 GB - https://phabricator.wikimedia.org/T169811#3409290 (10Halfak) [21:26:46] 10Cloud-Services, 10Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#3409442 (10Halfak) [21:27:10] 10Cloud-VPS, 10ORES, 10Scoring-platform-team: Request increase quota for ores-staging to 48GB RAM - https://phabricator.wikimedia.org/T169811#3409290 (10Halfak) [21:28:27] 10Cloud-VPS, 10ORES, 10Scoring-platform-team: Request increase quota for ores-staging to 48GB RAM - https://phabricator.wikimedia.org/T169811#3409290 (10Halfak) [21:33:10] 10Toolforge: Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3409478 (10chasemp) Current status: best guess this effected about 128 Tools, and I'm trying to gather how much data we are talking here size wise. [21:36:26] 10Toolforge: Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3409484 (10chasemp) >>! In T169774#3409478, @chasemp wrote: > Current status: best guess this effected about 128 Tools, and I'm trying to gather how much data we are talking here size wise. aswnbot autol... 
[21:43:52] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:50:23] 10Toolforge: Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3409528 (10chasemp) [22:00:23] 10Toolforge: Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3409561 (10chasemp) At this moment our strategy is probably going to be to restore these files to somewhere user accessible as the size and side effects of trying to do unnecessary or unwelcome restore he... [22:01:51] bd808: i noticed a template was out of date with the name change so i fixed it thoughts? https://wikitech.wikimedia.org/w/index.php?title=Template:LabsDocumentationPage&oldid=1763734 [22:02:58] tags? Classy [22:03:04] thanks. I haven't started the big on wiki renaming yet [22:03:13] no problem [22:03:15] that template is gross too for some other reasons [22:03:38] its applied inconsitently and eats a lot of screen space [22:07:02] !log tools.meetbot Stopped and started bot because 2 irc connections were open [22:07:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.meetbot/SAL [22:11:39] 10Data-Services, 10DBA, 10MediaWiki-extensions-Babel: Replicate babel db table on Labs - https://phabricator.wikimedia.org/T160713#3409602 (10bd808) p:05Triage>03Normal Approved by Legal and Security. The table needs to be added to maintain-views.yaml and then run in all the appropriate places. 
[22:12:49] Cloud-Services, cloud-services-team (Kanban): Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3409607 (chasemp) [22:16:20] Data-Services, cloud-services-team (Kanban), DBA, MediaWiki-extensions-Babel, Patch-For-Review: Replicate babel db table on Labs - https://phabricator.wikimedia.org/T160713#3409619 (bd808) [22:32:36] cloud-services-team (Kanban), wikitech.wikimedia.org: Add `wikitech-grep` to puppet - https://phabricator.wikimedia.org/T169820#3409654 (bd808) [22:39:18] Toolforge: Homedir for user cosmiclattes is very large (>60G) - https://phabricator.wikimedia.org/T169283#3409690 (Quiddity) [22:40:23] Cloud-Services: Homedir for user cosmiclattes is very large (>60G) - https://phabricator.wikimedia.org/T169283#3393452 (Quiddity) [22:42:40] Toolforge: Homedir for user cosmiclattes is very large (>60G) - https://phabricator.wikimedia.org/T169283#3409694 (bd808) [23:10:11] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3409821 (kaldari) p:Low>High [23:10:22] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Write unit tests for Xtools - https://phabricator.wikimedia.org/T165400#3409823 (kaldari) p:Normal>Low [23:11:29] Cloud-Services, Toolforge, cloud-services-team (Kanban), Patch-For-Review, User-bd808: Update maintain-kubeusers to allow tool's to write to $HOME/.kube - https://phabricator.wikimedia.org/T165875#3409830 (bd808) Open>Resolved >>! In T165875#3289712, @bd808 wrote: > I'll clean up the... 
[23:13:09] Toolforge, VPS-Projects, cloud-services-team (Kanban): Deprecate DSA (ssh-dss) SSH keys for Labs users - https://phabricator.wikimedia.org/T168433#3409836 (bd808) [23:13:26] Cloud-VPS, Toolforge, cloud-services-team (Kanban): Deprecate DSA (ssh-dss) SSH keys for Labs users - https://phabricator.wikimedia.org/T168433#3364362 (bd808) [23:13:27] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Address all the TODOs in the new XTools interface - https://phabricator.wikimedia.org/T169829#3409853 (kaldari) [23:14:03] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Address all the TODOs in the new XTools interface - https://phabricator.wikimedia.org/T169829#3409866 (kaldari) p:Triage>High [23:19:24] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Address all the TODOs in the new XTools interface - https://phabricator.wikimedia.org/T169829#3409885 (kaldari) [23:20:03] Tool-Labs-tools-Xtools, Community-Tech: Give visual feedback while Editcounter is thinking - https://phabricator.wikimedia.org/T169831#3409899 (DannyH) [23:20:25] Tool-Labs-tools-Xtools, Community-Tech: Give visual feedback while Editcounter is thinking - https://phabricator.wikimedia.org/T169831#3409915 (DannyH) p:Triage>High [23:23:13] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Address all the TODOs in the new XTools interface - https://phabricator.wikimedia.org/T169829#3409919 (kaldari) [23:23:58] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Commons' upload count incorrect in Edit Counter - https://phabricator.wikimedia.org/T169705#3409922 (Samwilson) [23:25:01] Tool-Labs-tools-Xtools, Community-Tech: Give visual feedback while Editcounter is thinking - https://phabricator.wikimedia.org/T169831#3409923 (kaldari) [23:25:25] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Give visual feedback while Editcounter is thinking - https://phabricator.wikimedia.org/T169831#3409899 (kaldari) [23:26:11] Tool-Labs-tools-Xtools, 
10Community-Tech: Commons' upload count incorrect in Edit Counter - https://phabricator.wikimedia.org/T169705#3409926 (10Samwilson) [23:29:22] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Internal Server Error from new articleinfo interface in XTools - https://phabricator.wikimedia.org/T169767#3409948 (10kaldari) [23:32:40] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Commons' upload count incorrect in Edit Counter - https://phabricator.wikimedia.org/T169705#3409969 (10kaldari) p:05Triage>03High [23:32:59] 10Cloud-Services, 10Tracking, 10User-bd808: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#3409985 (10bd808) [23:38:01] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Internal Server Error from new articleinfo interface in XTools - https://phabricator.wikimedia.org/T169767#3410029 (10MusikAnimal) 05Open>03Resolved a:03MusikAnimal Just needed to supply credentials for the tools-db database, which is where the checkwiki... [23:44:22] !help [23:44:22] fajne_farita: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team [23:44:38] fajne_farita: what's your question? [23:45:26] 10Data-Services, 10Toolforge, 10cloud-services-team (Kanban): Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3410050 (10bd808) [23:47:34] hi Bryan! i already discussed this issue here earlier today - cannot login into an ores-staging instance i am part of [23:47:59] oh. I thought we got you fixed up. [23:48:15] nope. i had to run and now got back to it [23:48:38] followed these instructions https://wikitech.wikimedia.org/wiki/Help:Access_to_instances_with_PuTTY_and_WinSCP#How_to_set_up_PuTTY_for_proxying_through_bastion.wmflabs.org_to_your_instance [23:48:49] and it didn't help [23:49:33] Ok. I do see that you are in the bastion project now so at least that problem is fixed [23:49:52] yep. 
i did login to bastion, no problem [23:50:05] ok. that's a good start [23:50:13] so we know that your ssh key is working [23:50:18] now i try to login to this instance through bastion if i get it right [23:50:47] *i'm trying [23:51:14] ok. So you have PuTTY and PLink installed locally? [23:51:20] bd808: Screenshots with pmtpa in them ;-) [23:51:29] RainbowSprinkles: yeah they are a bit crusty [23:51:44] shoud plink.exe be in the same folder with putty? [23:51:57] bd808: I'm not a huge fan of screenshots of software for that reason :p [23:52:38] fajne_farita: I think it has to be somewhere in your command search path *or* you need to use a full path to it when you setup the proxy command [23:53:02] * bd808 is a bit rusty on windows unfortunately [23:54:26] fajne_farita: this tutorial I just googled may or may not be more explanatory -- http://mikelococo.com/2008/01/multihop-ssh/ [23:54:49] they are using "shell plink.exe intermediate.proxy.host -l username -agent -nc %host:%port" [23:55:13] your "intermediate.proxy.host" would be "bastion.wmflabs.org" [23:55:24] and "username" is "fajne" [23:55:45] hm.. [23:57:04] nope. server unexpectedly closed nw connectn [23:57:33] fajne_farita: which VM are you trying to connect into? I can go look for error logs [23:58:06] ores-compute-01.eqiad.wmflabs [23:58:41] ores-compute-01.ores.eqiad.wmflabs [23:58:49] you missed ores in there [23:59:01] hm.. not me) [23:59:34] fajne_farita: I have a clue. That host won't let me ssh in with my root key either