[00:00:00] that makes me think that the server needs some help. [00:00:28] halfak: are you around? I can get into ores-compute-01.ores.eqiad.wmflabs with my root key and its not letting fajne_farita in either [00:01:17] paladox: the hostname you gave is canonical, but it will work without the project name too. That's a quirk of our DNS setup [00:01:30] oh [00:01:30] halfak is not here [00:01:46] yep, nothing helps [00:02:18] fajne_farita: I think we are either going to need halfak to look at that host or wait for one of the other cloud services roots to try with their keys. [00:02:31] * bd808 checks some other hosts in that project [00:02:50] ok, i can wait [00:03:15] fajne_farita: I can get into this host -- ores-lb-02.ores.eqiad.wmflabs -- maybe you can try and see if that works for you too [00:03:15] meanwhile, can you recommend a decent unix emulator for win? [00:03:40] *linux emulator? [00:03:50] paladox: can you help fajne_farita find useful tools for his windows client? [00:04:01] Yep [00:04:17] there's the git client that he can use to debug the ssh problems [00:04:48] or if he uses windows 10. he can open a bash console and try that. [00:04:49] fajne_farita ^^ [00:05:46] fajne_farita: when you say "linux emulator" are you looking for a local VM for development or better tools for connecting to linux hosts? [00:06:07] For local VMs VirtualBox usually works pretty well [00:07:01] !log ores bd808's root key is rejected by ores-compute-01.ores.eqiad.wmflabs. Puppet busted there? [00:07:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [00:08:12] bd808: ores-lb-02.ores.eqiad.wmflabs host does not exist [00:08:14] bd808 i have a puppet checker on there. [00:08:23] and it shows no puppet errors [00:08:50] "OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures" [00:09:27] fajne_farita: did you setup that connection to use a plink.exe proxy command? 
[00:09:55] The DNS names will only resolve once you are inside our network by connecting to the bastion [00:09:59] yep. just plink.exe, not shell plink.exe though [00:10:06] hmmm... [00:11:02] maybe try selecting the "Do DNS name lookup at proxy end: Yes" option too? [00:11:27] That should be on the Connection > Proxy screen I think [00:12:31] bd808: i am talking about something like https://www.cygwin.com/ but maybe more robust [00:13:22] *nod* Windows 10 has this stuff built in or at least installable I think. [00:13:45] I used to use cygwin many years ago with windows NT 4.5 :) [00:14:27] fajne_farita windows 10 has built in support for bash [00:14:27] fajne_farita: paladox is a windows power user and may be able to point you to some things [00:14:30] aka ubuntu [00:14:45] yep. [00:14:53] i'll try it [00:15:17] https://msdn.microsoft.com/en-gb/commandline/wsl/install_guide [00:15:21] fajne_farita ^^ [00:16:29] thanks [00:17:14] but still, this is what putty puts: Starting local proxy command: plink.exe bastion.wmflabs.org -l fajne -agent -nc ores-lb-02.ores.eqiad.wmflabs:22 proxy: FATAL ERROR: Disconnected: No supported authentication methods available (server sent: publickey) [00:17:36] maybe he key is the key? [00:18:07] did you add your key [00:18:11] through wikitech? [00:18:12] ok. let me check the server log [00:18:15] i provided my private key in the auth [00:18:34] and i added my public to my gerrit acc [00:18:35] ah, using git may be easier for you. [00:18:43] by using git you gain the ssh command [00:18:47] ah [00:18:52] where else should i send my pubkey? [00:18:55] fajne_farita you need to add the key to wikitech [00:19:07] it's there [00:19:08] https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-openstack [00:19:38] paladox: he tested ssh directly to the bastion and it worked so that part should be ok [00:19:44] ah ok [00:19:50] yep. [00:19:57] can he ssh into any other instances in the ores project? 
[00:20:23] this is the secon instance that we are trying, to no avail [00:20:45] try ores-web-01. [00:20:59] woops i meant ores-redis-01 [00:21:39] ores-redis-01.ores.eqiad.wmflabs ? [00:21:49] yep please. [00:22:35] same message [00:22:39] hmm [00:22:47] is there a way for you to do -vvv [00:22:48] ? [00:23:16] ok. i'll try Bash like every normal person here [00:23:30] ok [00:23:34] thank you for being with me!) [00:23:40] I'm not certain that your proxy setup is working. I only see one auth for you on bastion at Wed Jul 5 23:07 [00:23:44] https://github.com/git-for-windows/git/releases [00:23:54] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [00:23:58] it should count a login each time you hop through the server [00:24:21] i see [00:25:03] and in ~/.ssh/config (which you can do on the git client) [00:25:08] you should do something like [00:25:09] Host gerrit-test [00:25:09] ProxyCommand ssh -a -W %h:%p paladox@primary.bastion.wmflabs.org [00:25:10] UseRoaming no [00:25:10] User paladox [00:25:23] replacing my username and the host with your username and the host you want [00:25:30] then connect like ssh [00:25:41] paladox: use pastes or some bot will kick you for flooding. :) [00:25:55] yep woops. thanks. Sorry. [00:29:46] actually you can run opnesuse and another linux varient sidebar side along side ubuntu. without using a vm. just remeber do not run rm -rf / otherwise you will break your windows install. [00:29:54] Im pushing for them to support debian. [00:34:03] * paladox has to go as it's 01.33am [00:35:03] paladox: nice @ "dont run rm -rf /" i chuckled... [00:35:10] :) [00:35:14] paladox: reminded me of that ticket... 
[00:35:16] and good night [00:35:25] thanks and you too :) [00:36:53] 10Cloud-Services, 10Tracking, 10User-bd808: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#3410144 (10bd808) 05Open>03Resolved a:03bd808 Replaced by #cloud-vps-quota-requests [00:38:52] 10Cloud-Services, 10Tracking, 10User-bd808: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#3410151 (10bd808) [00:40:30] PROBLEM - Puppet errors on tools-exec-1417 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [00:50:49] 10Cloud-VPS, 10Tools: Restarting tools after NFS issues - https://phabricator.wikimedia.org/T169210#3410163 (10yuvipanda) When I was doing it, I'd just do some shell scripting to delete all the pods in all namespaces that aren't paws. k8s will start them back up. [00:51:09] 10Cloud-VPS, 10Tools: Restarting tools after NFS issues - https://phabricator.wikimedia.org/T169210#3410164 (10yuvipanda) You can get a list of all pods with `kubectl get --all-namespaces pods` and then do bash magic from there. [00:54:35] 10Cloud-VPS, 10Tools: Restarting tools after NFS issues - https://phabricator.wikimedia.org/T169210#3410166 (10Jeff_G) All tools should have predefined documented ways to recover from the return to service of formerly failed dependencies, the more automated the better. For manually initiated scripts, the peopl... [00:55:52] 10Toolforge, 10Tools: Restarting tools after NFS issues - https://phabricator.wikimedia.org/T169210#3410169 (10bd808) [00:58:27] 10Toolforge, 10Tools: Restarting tools after NFS issues - https://phabricator.wikimedia.org/T169210#3410187 (10bd808) >>! In T169210#3410163, @yuvipanda wrote: > When I was doing it, I'd just do some shell scripting to delete all the pods in all namespaces that aren't paws. k8s will start them back up. >>! In... 
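Note on the T169210 comments above: the "bash magic" on top of `kubectl get --all-namespaces pods` could also be sketched in Python. This is only an illustration, not the script yuvipanda actually used. The function name `pod_delete_commands`, the sample listing, and the assumption that the excluded namespace is literally named `paws` are all hypothetical. The sketch only builds the delete commands and does not run them.

```python
def pod_delete_commands(kubectl_output, skip_namespaces=("paws",)):
    """Turn `kubectl get --all-namespaces pods` text into delete commands.

    Assumes the usual table layout: a header row, then one pod per line
    with the namespace in column 1 and the pod name in column 2.
    """
    commands = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        namespace, name = fields[0], fields[1]
        if namespace in skip_namespaces:
            continue  # leave pods in excluded namespaces alone
        commands.append(["kubectl", "delete", "pod", name, "--namespace", namespace])
    return commands


# Hypothetical listing, made up for illustration only.
sample_listing = """\
NAMESPACE   NAME            READY   STATUS    RESTARTS   AGE
paws        hub-1234        1/1     Running   0          3d
toolone     toolone-abcd    1/1     Running   2          5d
tooltwo     tooltwo-efgh    0/1     Error     7          5d
"""

commands = pod_delete_commands(sample_listing)
```

Each entry in `commands` is an argument list that could be passed to `subprocess.run` after review. Anyone adapting this would probably want to add system namespaces to `skip_namespaces` as well.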
[00:58:40] 10Quarry: Provide a way to hyperlink Quarry results/output - https://phabricator.wikimedia.org/T74874#3410189 (10MZMcBride) [01:10:34] RECOVERY - Puppet errors on tools-exec-1417 is OK: OK: Less than 1.00% above the threshold [0.0] [01:33:59] 10Data-Services, 10Toolforge, 10cloud-services-team (Kanban): Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3410226 (10zhuyifei1999) `483G ./shared/tools/project/.shared` is supposed to be okay since it's 3777, unless the user deleted their own files, but for... [01:44:52] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [01:51:36] paladox: rm -rf / <= you forgot --no-preserve-root? :P [01:52:39] (I used to play with this command in a VM) [02:05:37] PROBLEM - Puppet staleness on tools-worker-1020 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [02:08:53] chasemp: I can see the two files even when logged out [02:09:16] it seems to be "attached to" the task [02:30:28] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Address all the TODOs in the new XTools interface - https://phabricator.wikimedia.org/T169829#3409853 (10MusikAnimal) For the block info: https://github.com/x-tools/xtools-rebirth/commit/29b94da87ceb444d24708e153276b9694fa578c4 Currently running on prod: http... 
[02:31:16] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Address all the TODOs in the new XTools interface - https://phabricator.wikimedia.org/T169829#3410260 (10MusikAnimal) This accounts for all the visible TODO's that I'm aware of [03:19:51] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [03:49:13] PROBLEM - Puppet errors on tools-exec-1438 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [04:14:15] RECOVERY - Puppet errors on tools-exec-1438 is OK: OK: Less than 1.00% above the threshold [0.0] [04:32:45] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Commons' upload count incorrect in Edit Counter - https://phabricator.wikimedia.org/T169705#3410397 (10Samwilson) It was looking for the same user ID as the main project, so in some cases was showing *something* but in lots there was no match. Fixed now, see h... [04:38:53] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Give visual feedback while Editcounter is thinking - https://phabricator.wikimedia.org/T169831#3410417 (10Samwilson) a:03Samwilson [04:45:54] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [05:20:53] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [05:28:23] 10Data-Services, 10DBA: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3410469 (10Marostegui) Our sanitarium host (db1069) got replication broken with: ``` Error 'Table 'ukwikimedia.site_stats' doesn't exist' on quer... 
[05:41:53] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [06:11:03] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Internal Server Error from new articleinfo interface in XTools - https://phabricator.wikimedia.org/T169767#3410489 (10kaldari) @MusikAnimal: Can you make sure that is documented in the configuration instructions? Is this something that `composer install` shoul... [06:18:54] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Internal Server Error from new articleinfo interface in XTools - https://phabricator.wikimedia.org/T169767#3410501 (10Samwilson) `composer install` doesn't prompt for labs-only parameters. Although, is this really a labs-only thing? 3rd party users could also... [06:21:52] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [06:23:35] 10Tool-Labs-tools-Pageviews: Totals for Edits and Editors are not updated when pages are removed from the analysis - https://phabricator.wikimedia.org/T169848#3410511 (10Larske) [06:31:57] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Internal Server Error from new articleinfo interface in XTools - https://phabricator.wikimedia.org/T169767#3410542 (10kaldari) > composer install doesn't prompt for labs-only parameters. Although, is this really a labs-only thing? 3rd party users could also be... [06:36:38] 10Tool-Labs-tools-Pageviews: Totals for Edits and Editors are not updated when pages are removed from the analysis - https://phabricator.wikimedia.org/T169848#3410549 (10Larske) [07:12:25] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Verify all routes work between new and old xtools - https://phabricator.wikimedia.org/T165612#3271583 (10kaldari) The smart redirecting for `./autoblock/index.php` and `./echo/index.php` sounds like overkill (and potentially confusing to the user). How about j... 
[07:12:29] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Internal Server Error from new articleinfo interface in XTools - https://phabricator.wikimedia.org/T169767#3410587 (10Samwilson) (The readthedocs can sometimes lag I think. It's working now at https://xtools.readthedocs.io/en/latest/configuration.html ) The `... [07:26:25] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Internal Server Error from new articleinfo interface in XTools - https://phabricator.wikimedia.org/T169767#3410607 (10kaldari) 05Resolved>03Open http://xtools.wmflabs.org/articleinfo/en.wikipedia.org/Thomas%20Jefferson still gives the same database error f... [07:30:01] 10Data-Services, 10DBA, 10Wikimedia-maintenance-script-run: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3410626 (10jcrespo) 05Resolved>03Open CC @bd808 ^ Maybe maintenance was not updated, but something else... [07:42:03] 10cloud-services-team, 10DBA, 10Operations: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3410634 (10jcrespo) I would do labsdb1004 first, which is the slave for the toolsdb, and labsdb1005- I didn't want to pressure you because I knew you had other concerns. I would say Tue... 
[08:46:52] 10Tools: [stalktoy] sixxs.net WHOIS link is down, requesting replacement - https://phabricator.wikimedia.org/T169854#3410727 (10MarcoAurelio) [08:49:17] 10Tools: [stalktoy] sixxs.net WHOIS link is down, requesting replacement - https://phabricator.wikimedia.org/T169854#3410739 (10MarcoAurelio) (if someone has an account on GitHub and could port this report over https://github.com/Pathoschild/Wikimedia-contrib/issues it'd be good) [09:18:03] 10Cloud-Services, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: Sunset of WDQ - https://phabricator.wikimedia.org/T153439#3410792 (10Magnus) [09:45:29] 10cloud-services-team (Kanban), 10wikitech.wikimedia.org: Add `wikitech-grep` to puppet - https://phabricator.wikimedia.org/T169820#3409654 (10Legoktm) Is Special:Search really not sufficient? [09:46:27] 10Toolforge, 10Tools: Someone deleted folder with my bots on Tool Labs - https://phabricator.wikimedia.org/T169736#3410854 (10MaxBioHazard) Not restored yet. [09:49:21] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [09:51:09] 10Cloud-Services, 10Graphite, 10Operations, 10Patch-For-Review, 10User-fgiunchedi: Move labs 'instances' data to graphite labs - https://phabricator.wikimedia.org/T143405#3410877 (10fgiunchedi) 05Open>03Resolved a:03fgiunchedi I've deleted the `instances` directory for real from graphite machines,... 
[10:37:21] 10Cloud-Services, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: Sunset of WDQ - https://phabricator.wikimedia.org/T153439#3410990 (10Magnus) [10:54:21] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [11:30:48] !log wikispeech Deploy latest from Git master: c766368, 5abcb1f, 63af3be (T149091) [11:30:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikispeech/SAL [11:30:51] T149091: Segment by tags - https://phabricator.wikimedia.org/T149091 [11:57:10] zhuyifei1999_ oh, microsoft have a warnning that says doint do rm -rf / [11:57:34] lol [12:16:23] mbh: afaik, this is the first time a restoration is being done, so please be patient [12:34:00] 10Tools, 10Commons: Zoomviewer is down - https://phabricator.wikimedia.org/T169864#3411280 (10zhuyifei1999) [12:36:28] 10Tools, 10Commons: Zoomviewer is down - https://phabricator.wikimedia.org/T169864#3411054 (10zhuyifei1999) Maintainer has been contacted [[https://commons.wikimedia.org/wiki/User_talk:Dschwen#Zoomviewer|via talk page]] on July 3rd. [12:47:06] !log git upgrade puppet-phabricator to stretch from jessie using https://linuxconfig.org/how-to-upgrade-debian-8-jessie-to-debian-9-stretch [12:47:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL [12:54:53] (03Draft1) 10Paladox: Add apt update check to puppet-phabricator [labs/icinga2] - 10https://gerrit.wikimedia.org/r/363586 [12:54:55] (03PS2) 10Paladox: Add apt update check to puppet-phabricator [labs/icinga2] - 10https://gerrit.wikimedia.org/r/363586 [12:54:58] (03CR) 10Paladox: [V: 032 C: 032] Add apt update check to puppet-phabricator [labs/icinga2] - 10https://gerrit.wikimedia.org/r/363586 (owner: 10Paladox) [14:02:22] 10cloud-services-team (Kanban), 10wikitech.wikimedia.org: Add `wikitech-grep` to puppet - https://phabricator.wikimedia.org/T169820#3411660 (10bd808) >>! 
In T169820#3410852, @Legoktm wrote: > Is Special:Search really not sufficient? In theory, [[https://www.mediawiki.org/wiki/Help:CirrusSearch#Regular_express... [14:07:22] 10Cloud-Services, 10Cloud-VPS, 10Toolforge, 10User-fgiunchedi: Rollout prometheus-node-exporter 0.14 in labs - https://phabricator.wikimedia.org/T166561#3411673 (10fgiunchedi) No it didn't happen, though if someone wants to pick it up please do! [14:33:41] 10Cloud-VPS, 10ORES, 10Scoring-platform-team: Set up larger ores-compute instance - https://phabricator.wikimedia.org/T169809#3411767 (10Halfak) a:03Halfak [15:09:09] 10Tools: [stalktoy] sixxs.net WHOIS link is down, requesting replacement - https://phabricator.wikimedia.org/T169854#3411935 (10Pathoschild) 05Open>03Resolved a:03Pathoschild I switched to a different service. Thanks for pointing that out! [15:13:55] 10VPS-Projects, 10Math, 10Release-Engineering-Team (Kanban): Instances in math project show high system CPU usage - https://phabricator.wikimedia.org/T160824#3411953 (10hashar) 05Open>03Resolved a:03hashar That has been fixed around June 21th when all labvirt / instances have been rebooted. [15:49:07] 10Cloud-Services, 10Toolforge: Automatically restarting job for tools.sbot - https://phabricator.wikimedia.org/T168206#3412016 (10zhuyifei1999) 05Open>03Resolved >>! In T168206#3360942, @zhuyifei1999 wrote: > A root needs to send it a SIGCONT or SIGKILL. Gone during reboot. [16:15:35] What is the proper name for the public db views in labs now that everything is renamed cloud? 
[16:16:26] bawolff: it's up for discussion, I think we have suggested wikireplicas but nothings settled [16:16:50] bawolff: https://wikitech.wikimedia.org/wiki/User_talk:BryanDavis/Rebranding_Cloud_Services_products#Labsdb_needs_a_new_name [16:16:52] ok, so if I continue to call it labsdb public views nobody is going to get mad at me :P [16:17:13] heh, nope but please think of a better name :D [16:17:22] I could just continue on calling it toolserver database, and then I'm two degrees of being wrong [16:18:03] hipster labsdb bawolff [16:18:09] madhuvishy and I are trying to see if "wiki replicas" will stick [16:18:10] lol [16:18:31] wikireplicas is dangerously close to DB_REPLICA [16:18:51] which is a term that 0.004% of our users would know [16:18:52] since db slaves is apparently politically incorrect... [16:18:56] atm I think when folks say "the replicas" they generally mean the labsdb replica cluster [16:19:01] that's just IMO [16:19:14] so taking out the labs staleness and going wikireplicas seems fine [16:19:19] since that practice will probably continue anyhow [16:19:20] but also [16:19:24] there is no good name here [16:19:25] the big point of renaming is to make things more clear for folks who are not insiders [16:19:36] and we use labsdb for things /not/ replicas atm so it's a spiral of bad naming [16:19:41] that^ [16:19:45] confusing insiders is not a goal, but we can live with it [16:19:54] The roe of names is to make sense to developers, and confuse users, not the other way around :P [16:20:06] databaseoid [16:20:28] mediawikidbreplicaOMGBBQ1 [16:20:37] clouddb [16:20:56] that gets ugly becuse labsdb1004/1005 are also DB's but not replicas [16:21:02] hosted by the cloud services team [16:21:11] even "database" is an insider term in many respects [16:21:17] so are 1006/1007 [16:22:13] Give everything we provide a UUID [16:22:13] public wiki replicas (PWR) [16:22:14] "big wiki data" :D [16:22:29] oooh PoWeR [16:22:39] The 
{8ed86469-85b8-4dab-bb61-63dd124f481a} service [16:23:02] pfff alphanumeric, not sexy enough -- add emoji UUID and I'm in [16:23:30] the desired end goal is that a one page A4 sheet of the names and 5-10 word descriptions actually tells people what we have that they can use [16:23:33] Hmm, has anyone invented base-emoiji encoding yet [16:23:51] bd808: agreed [16:24:06] and the answer is https://github.com/pfrazee/base-emoji yes [16:24:11] https://www.npmjs.com/package/base-emoji [16:24:17] heh [16:24:21] sam google answer [16:24:53] base emoji ssh and gpg key fingerprints might actually be useful [16:26:05] can't use their mapping. It does not include :unicorn face: [16:27:03] bd808: actually, that's not a bad idea for hashes [16:28:02] 2000 years later and hieroglyphics are looking smart [16:28:47] pictogram based languages are widely used today. think of map legends and traffic signs [16:29:30] you can get around the transit system in most parts of the world without being literate in the local language [16:33:37] human brains are surprisingly good at information density when it's done well :) [16:43:39] Well, in any case, I started https://wikitech.wikimedia.org/wiki/Labsdb_redaction (rough draft) to document the redaction part, because its been rather confusing to me [16:53:47] bd808: that reminds me: how much work would it be for my toolsdb database to do joins against the replicas? [16:54:23] infinite. 
joining across database servers doesn't work [16:55:02] it is possible today to make dbs owned by a tool that live on the same server as a wiki's replica [16:55:14] the future of that ability is uncertain though [16:55:23] it makes maintaining the replicas difficult [16:55:56] having a mix of canonical and non-canonical data in the server complicates replication and restore processes [16:56:30] you would be much better served figuring out how to "join" in you application code but taking data from both sides and matching it up [16:56:39] *by taking [17:01:56] so basically, have two separate queries, and then reconcile them with python or something? [17:02:08] yeah that would be the typical method [17:02:26] it kind of depends on what you are actually wanting to join for. [17:02:59] often you would have a "driver query" that gets you some data that you then use to fetch additional data from another source [17:03:55] like "select all the things that match foo in dbA"; for each result: "select some detail from dbB" [17:04:35] bd808: so right now I basically duplicate data in my app, which ultimately causes maintenance issues. I may need to do a refactor so that I can get away with not relying on that duplicated data anymore, instead directly querying the canonical source [17:05:33] *nod* duping data is always a problem [17:05:44] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3412471 (10Papaul) asw-d-codfw:ge-1/0/13 [17:05:57] this is data related to programs or something similar on-wiki? [17:06:37] when you search this stuff in the app to you usually start from the duplicated data side or from the local data side? 
[17:06:43] *do you [17:25:19] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Commons' upload count incorrect in Edit Counter - https://phabricator.wikimedia.org/T169705#3412571 (10kaldari) 05Open>03Resolved [17:35:46] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3412608 (10RobH) a:05RobH>03Papaul I've gone ahead and setup the following: robh@asw-b-codfw# show | compare [edit interfaces interface-range vlan-la... [17:41:01] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3412639 (10RobH) Confirmed with @chasemp that the instances vlan is indeed where we want this. Once the OS install is done, assign back to me to enable th... [17:42:43] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Internal Server Error from new articleinfo interface in XTools - https://phabricator.wikimedia.org/T169767#3412642 (10Matthewrbowker) defaults.yml was my idea. In fact, most of the configuration setup is mine - I'll document it better. Might be a good idea... 
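Note on the join discussion above (16:53 to 17:06): the "two separate queries, then reconcile them" approach bd808 describes can be sketched roughly as follows. This is a minimal illustration, assuming each query result has already been fetched as a list of dicts. The function name `join_in_app`, the key `page_id`, and the sample rows are hypothetical and not taken from harej's tool.

```python
from collections import defaultdict


def join_in_app(driver_rows, detail_rows, key):
    """Match rows from two separate queries on a shared key.

    Works like an inner join done in application code: index one
    result set by the key, then look up each row of the other.
    """
    index = defaultdict(list)
    for row in detail_rows:
        index[row[key]].append(row)

    joined = []
    for row in driver_rows:
        for match in index.get(row[key], []):
            # On a column name clash the driver row's value wins.
            joined.append({**match, **row})
    return joined


# Hypothetical results: one from a tool's own database, one from a wiki replica.
tool_rows = [
    {"page_id": 1, "note": "needs review"},
    {"page_id": 3, "note": "done"},
]
replica_rows = [
    {"page_id": 1, "page_title": "Foo"},
    {"page_id": 2, "page_title": "Bar"},
]

result = join_in_app(tool_rows, replica_rows, "page_id")
```

Here only `page_id` 1 appears on both sides, so `result` holds a single merged row. For the "driver query" pattern, the second query would normally be restricted to the keys returned by the first, so that only the needed rows are fetched from the other server.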
[17:44:21] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3412643 (10RobH) a:05RobH>03Papaul network port setup done [17:44:32] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3412645 (10RobH) [17:47:53] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:49:33] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Inconsistent tool name format in new XTools - https://phabricator.wikimedia.org/T169913#3412653 (10kaldari) [17:49:43] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Inconsistent tool name format in new XTools - https://phabricator.wikimedia.org/T169913#3412665 (10kaldari) p:05Triage>03Normal [18:21:12] 10Data-Services, 10DBA, 10Patch-For-Review, 10Wikimedia-maintenance-script-run: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3412750 (10bd808) >>! In T169488#3410469, @Marostegui wrote: > Our sanitarium host (d... [18:22:03] 10Data-Services, 10cloud-services-team (Kanban), 10DBA, 10Patch-For-Review, and 2 others: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3412759 (10bd808) a:05jcrespo>03bd808 [18:22:54] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [18:25:03] Got a question is merging 2 tools to 1 labs instance a viable reason to request a labs project? [18:26:18] Not really? [18:26:42] What benefit do you get from having a VPS instance instead of just two tools on the toolforge? 
[18:27:29] harej: not having to logout of one tool to switch [18:28:23] So the thing with having a VPS is that you are responsible for conducting all maintenance. It doesn't seem like a reasonable cost for the convenience of not needing to toggle between tools. [18:29:54] Ok i was just curious [18:33:56] Zppix: if you need to switch often, just open two terminals? [18:34:25] valhallasw`cloud: i know i just didnt want to anyway i was just curious i wasnt actually going to request anything [18:35:33] GNU screen! [18:36:18] random much? [18:37:04] Zppix: GNU screen is a computer program for switching between multiple virtual terminals in a single terminal [18:37:35] oh [18:37:41] (shows how much i know about terminals [18:37:57] Although the cool kids all use tmux these days (Or so I'm told. /me not cool) [18:39:48] bawolff: i've used tmux once on tools to make my bot run without being killed when idle kick occoured (because i didnt have the knowledge of using job grid or k8s then [19:00:52] (03PS1) 10Zppix: Add additional info about Icinga itself and provide a link to the orginial download. [labs/icinga2] - 10https://gerrit.wikimedia.org/r/363650 [19:01:33] (03PS2) 10Zppix: Add additional info about Icinga itself and provide a link to the orginial download. [labs/icinga2] - 10https://gerrit.wikimedia.org/r/363650 [19:02:45] (03CR) 10Paladox: [V: 032 C: 032] Add additional info about Icinga itself and provide a link to the orginial download. [labs/icinga2] - 10https://gerrit.wikimedia.org/r/363650 (owner: 10Zppix) [19:04:59] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Inconsistent tool name format in new XTools - https://phabricator.wikimedia.org/T169913#3412653 (10Matthewrbowker) All of the above is simply a throwback from the old XTools. I would go for Title Case. [19:07:02] +1 for tmux. 
it can be configured to use the same keyboard shortcuts as Screen but is more actively developed and better documented [19:07:19] (03PS1) 10Zppix: Add license from original source to comply with license requirements. [labs/icinga2] - 10https://gerrit.wikimedia.org/r/363652 [19:09:19] 10Data-Services, 10cloud-services-team (Kanban), 10DBA, 10MediaWiki-extensions-Babel, 10Patch-For-Review: Replicate babel db table on Labs - https://phabricator.wikimedia.org/T160713#3108521 (10Andrew) This is done on most labsdbs. Remaining steps: 1) Figure out why puppet is disabled on labsdb1009 and... [19:13:54] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [19:15:19] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Inconsistent tool name format in new XTools - https://phabricator.wikimedia.org/T169913#3412653 (10MusikAnimal) I also prefer Title Case. As part of this task we might want to also make the internal routing names consistent too, e.g. [[ https://github.com/x-to... [19:19:17] (03PS2) 10Paladox: Add license from original source to comply with license requirements. [labs/icinga2] - 10https://gerrit.wikimedia.org/r/363652 (owner: 10Zppix) [19:19:23] (03CR) 10Paladox: [V: 032 C: 032] Add license from original source to comply with license requirements. [labs/icinga2] - 10https://gerrit.wikimedia.org/r/363652 (owner: 10Zppix) [19:27:47] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Internal Server Error from new articleinfo interface in XTools - https://phabricator.wikimedia.org/T169767#3413072 (10MusikAnimal) >>! In T169767#3410542, @kaldari wrote: >> composer install doesn't prompt for labs-only parameters. Although, is this really a l... [19:28:19] bd808: if I wanted to link to a landing page for Toolforge, what would I link to? 
[19:28:58] today -- https://wikitech.wikimedia.org/wiki/Portal:Tool_Labs [19:29:13] that will be redirected when I get to updating wikitech content [19:29:30] hopefully I will make the page a bit less ugly then too [19:29:41] it suffered some in the last content refactoring [19:34:17] 10cloud-services-team, 10Operations, 10Upstream: New anti-stackclash (4.9.25-1~bpo8+3 ) kernel super bad for NFS - https://phabricator.wikimedia.org/T169290#3393693 (10chasemp) >>! In T169290#3399875, @MoritzMuehlenhoff wrote: > Which NFS services/processes caused this? Summarizing from IRC for poserity :)... [19:39:13] 10cloud-services-team (Kanban), 10User-bd808: Design and submit Wikimania poster art for "FLOSS best practices for bots and tools" - https://phabricator.wikimedia.org/T169919#3413118 (10bd808) [20:16:34] 10Data-Services, 10Toolforge, 10cloud-services-team (Kanban): Toolforge data loss for permissive data July 2 2017 - https://phabricator.wikimedia.org/T169774#3413266 (10chasemp) ```#!/bin/bash cd /srv/backup/tools/shared/tools/project # 566f5ecc3bf0f3bdbd90a609cf966f70 /home/rush/tools-permissive-files wc... [20:53:12] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3413393 (10Papaul) [20:54:03] 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3379828 (10Papaul) a:05Papaul>03chasemp @chasemp This is complete, you can take over. Thanks. 
[20:54:31] Cloud-VPS, Operations, ops-codfw, Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3413398 (Papaul) [20:54:45] Cloud-VPS, Operations, ops-codfw, Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3379794 (Papaul) a:Papaul>chasemp @chasemp This is complete, you can take over. Thanks. [20:56:11] Cloud-VPS, Operations, ops-codfw, Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3413401 (Papaul) [20:56:15] PROBLEM - Puppet errors on tools-worker-1018 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:56:35] Cloud-VPS, Operations, ops-codfw, Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3379811 (Papaul) a:Papaul>chasemp @chasemp This is complete, you can take over. Thanks. [21:01:54] cloud-services-team (Kanban): Design and submit Wikimania poster art for "What is Cloud Services?" - https://phabricator.wikimedia.org/T169927#3413432 (bd808) [21:02:15] cloud-services-team (FY2017-18), Goal: Program 10 Outcome 3: Outreach - https://phabricator.wikimedia.org/T166406#3413445 (bd808) [21:02:17] cloud-services-team (Kanban): Design and submit Wikimania poster art for "What is Cloud Services?" - https://phabricator.wikimedia.org/T169927#3413444 (bd808) [21:25:23] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:36:17] RECOVERY - Puppet errors on tools-worker-1018 is OK: OK: Less than 1.00% above the threshold [0.0] [21:48:47] cloud-services-team, DBA, Operations: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3413600 (madhuvishy) @jcrespo, okay, I'll do the announcements.
@Halfak We are proposing labsdb1004 reboot (wikilabels db server) for Tuesday 11 July at 1400 UTC. Would that work for... [21:55:19] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [21:58:29] cloud-services-team, DBA, Operations, Scoring-platform-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3413650 (Halfak) Yup! That works! [21:58:34] cloud-services-team, DBA, Operations, Scoring-platform-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3413651 (Zppix) [22:15:20] Cloud-Services, Quarry: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452#3413729 (Milimetric) Thanks @yuvipanda, looks promising. [22:23:53] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [22:26:54] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Internal Server Error from new articleinfo interface in XTools - https://phabricator.wikimedia.org/T169767#3413759 (Samwilson) >>! In T169767#3412642, @Matthewrbowker wrote: > I don't believe that `composer install` actually touches parameters.yml if there a... [22:30:08] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [22:44:52] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [22:57:08] tarrow: Hi! Do you actively maintain the tool fatameh? [22:57:26] madhuvishy: Yes! Is it causing problems? [22:58:13] tarrow: it seems like its IO and CPU utilization is super high; the k8s node it is running on has been complaining of High IOWait on and off for 3 days now. What does it do :) [22:59:15] It is a django app that uses OAuth to help people make wikidata items.
[22:59:36] aah [22:59:46] this is from iotop a few minutes ago [22:59:49] https://www.irccloud.com/pastebin/hab1rQ0D/ [23:00:00] It is probably because I wrote it following a very basic intro to OAuth guide and didn't think about having a job queue etc. [23:00:16] I would guess that it's become popular and is now hammering resources [23:00:35] yeah it seems to be reading a ton from disk/nfs [23:00:45] Oh [23:00:51] it shouldn't be... [23:01:15] My only thought is that it is writing to the uwsgi.ini log [23:01:21] aah [23:01:32] -rw-r----- 1 tools.fatameh tools.fatameh 197105181 Jul 6 23:01 uwsgi.log [23:01:36] you may be right [23:01:48] I think that happens automatically from webservices though right? [23:01:49] it's about 188M rn [23:01:56] hehe, that's too buig [23:02:00] big* [23:02:18] yeah I think so [23:03:43] Cool, I've moved it and I'll gzip up the old one. Do you think the IO wait is because it is large or... it is large because there is so much writing to the logs? [23:04:40] I'm not sure exactly - the read rates are high, from the iotop numbers at least. Does it have other data it reads from /data/project or /home? [23:07:15] tarrow: it looks like it is using an sqlite db too. If it is using sessions in db then that could cause a ton of io [23:07:18] Probably a little, there is an sqlitedb (but it should be absolutely tiny). I really need to migrate it to [23:07:22] maria [23:07:43] hehe, yes bd808. We passed each other [23:08:14] Ok, I'll take it down for a bit while I migrate it to mariadb [23:08:26] sorry to cause problems :/ [23:08:26] cool. shout if you need help [23:08:59] tarrow: no need to be sorry. something is always causing problems, we just noticed this one [23:09:11] :) [23:09:27] and you being responsive makes us very happy [23:09:35] totally :) [23:10:07] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [23:10:11] It was lucky timing :). I was just about to sleep...
[23:10:57] tarrow: it's not that urgent :) let us know if you need help! good night :) [23:19:52] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0]
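Editor's note on the uwsgi.log exchange above: the 197105181-byte file in the directory listing is what was described as "about 188M". The check that surfaced it can be sketched in Python as below. This is a minimal sketch, not the tooling used in the channel: the helper names and the 100 MiB threshold are hypothetical, and it assumes the directory being scanned is readable by the caller.

```python
import os

MIB = 1024 * 1024  # bytes per mebibyte


def bytes_to_mib(size_bytes):
    """Convert a size in bytes to mebibytes."""
    return size_bytes / MIB


def find_large_files(root, threshold_mib=100.0):
    """Return (path, size in MiB) for files under root larger than threshold_mib.

    The threshold is an illustrative default, not a Toolforge policy.
    Results are sorted largest first.
    """
    large = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            size_mib = bytes_to_mib(size)
            if size_mib > threshold_mib:
                large.append((path, size_mib))
    large.sort(key=lambda item: item[1], reverse=True)
    return large


if __name__ == "__main__":
    # Size of uwsgi.log as shown in the listing quoted in the log.
    print(f"uwsgi.log: {bytes_to_mib(197105181):.1f} MiB")
    # Scan the current directory for anything over the threshold.
    for path, size_mib in find_large_files("."):
        print(f"{size_mib:8.1f} MiB  {path}")
```

Run from a tool's home directory, this would list oversized files such as a growing web server log. It only reports sizes, so it does not by itself answer the question raised in the log of whether the IO wait came from the log writes or from the sqlite database.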