[00:04:54] Data-Services, cloud-services-team (Kanban), User-bd808: Promote initial use of new Wiki Replica servers - https://phabricator.wikimedia.org/T172704#3633841 (MusikAnimal) @bd808 I take it we should no longer rely on the `wikireplica-web` and `wikireplica-analytics` host names?
[00:05:21] bd808: RE: new replicas 1) awesome! 2) Looks like the 'sql' tool doesn't support this hostname format yet.
[00:05:34] Krinkle: no, it does not
[00:05:35] Would make connecting easier without mysql -h --default-file=.. hardcoding
[00:05:44] also, should its default be updated to one of web/analytics?
[00:06:05] it pretty much needs to be rewritten from scratch :)
[00:06:19] it's an ugly little bash script
[00:06:24] bd808: Would it be okay if I switch guc to the new web replicas?
[00:06:54] sure! They are 100% ready for use
[00:06:55] I don't know the stats, but I imagine it's a fairly large consumer of replicas from web perspective right now. Not sure if that's too soon, or whether you'd like that.
[00:07:40] From a few quick spot checks looks like the new slaves respond about the same time for most queries, some queries slower right now (less hot cache I guess?) but repeat queries are always faster on the new slaves for me (with warm cache). yay :)
[00:07:53] Data-Services, cloud-services-team (Kanban), User-bd808: Promote initial use of new Wiki Replica servers - https://phabricator.wikimedia.org/T172704#3633845 (bd808) >>! In T172704#3633841, @MusikAnimal wrote: > @bd808 I take it we should no longer rely on the `wikireplica-web` and `wikireplica-analyt...
[00:08:05] the 2.5x RAM should help with cache :)
[00:08:37] Tool-Global-user-contributions: Switch GUC to use new Wiki Replicas - https://phabricator.wikimedia.org/T176686#3633849 (Krinkle)
[00:08:44] Tool-Global-user-contributions: Switch GUC to use new Wiki Replicas - https://phabricator.wikimedia.org/T176686#3633862 (Krinkle) p:Triage>Normal a:Krinkle
[00:12:32] Toolforge: Update `sql` command to use new wiki replica servers - https://phabricator.wikimedia.org/T176688#3633887 (bd808)
[00:12:43] Toolforge: Update `sql` command to use new wiki replica servers - https://phabricator.wikimedia.org/T176688#3633899 (bd808)
[00:12:45] Data-Services, cloud-services-team (FY2017-18), DBA, Goal: Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#3633900 (bd808)
[00:15:03] Krinkle: you should make a pretty toolinfo block for guc too -- https://toolsadmin.wikimedia.org/tools/id/guc
[00:16:28] for anyone wondering if they have a tool that uses the deprecated user owned tables on c1 and c3, there's a tool now that will show you -- https://tools.wmflabs.org/tool-db-usage/
[00:21:57] bd808: cool tool, found another table I need to delete :)
[00:22:10] bd808: can Quarry query databases on tools.labsdb?
[00:22:35] Nettrom: I honestly don't know if it can or not.
[00:24:01] if it does it would have to have some magic for parsing the USE statement at the top of a query
[00:27:39] bd808: so for something like https://tools.wmflabs.org/pageviews that is cross-wiki, is it better to choose shard or is `enwiki.web.db.svc.eqiad.wmflabs` okay too (since it appears to work even for other wikis)?
[00:27:50] Yeah, I don’t know either. Not sure if I want to push to have some of SuggestBot’s databases in datasets_p, or just move them to tools.labsdb. Either way it’ll have to wait for a bit, so I’ll think about it and get in touch if anything’s needed.
[00:29:02] musikanimal: under the hood, the wikidb names like enwiki map to the shard (s1 in that case)
[00:29:23] so it's really not much difference either way
[00:29:50] *except* if you hold connections open and use lots of wikis then you should probably use the shard names
[00:30:36] today there are 7 shards and 900+ wikidb names
[00:30:55] so 7 vs 900 is a big difference in active connections
[00:31:05] I see
[00:31:23] shard 8 is coming in Q2 :)
[00:31:25] well Pageviews runs very quick and fast queries, so I guess there it doesn't matter
[00:31:26] wikidata is going to get its own shard
[00:31:38] oh?
[00:31:58] it's a DBA+techops+WMCS Q2 goal
[00:32:20] on wiki nobody should notice, but some tools will care
[00:32:55] yeah, so Pageviews works on wikidata too for instance. If I use `enwiki.web.db.svc.eqiad.wmflabs` is that always going to work? since you say it resolves to s1
[00:33:05] yes
[00:33:13] ok cool :)
[00:34:07] and what do you recommend for XTools? It *can* do long, expensive queries, so I figured the analytics server was better there
[00:35:02] it may be better in the long run, but "it depends"
[00:35:20] hehe alright
[00:35:45] right now there is really no difference between *.web and *.analytics
[00:35:56] I'll go with analytics for now I guess. And here it is also safe to use `enwiki` over a shard?
[00:36:15] yeah, it's the same in both sub-domains
[00:36:49] all the $LANG$FAMILY dbnames are DNS CNAMEs pointing to the shard that hosts that wiki
[00:37:05] I see
[00:37:24] so even if I connect to enwiki, it's going to resolve to the proper shard based on what wiki I query?
[00:37:27] you can read the crazy script I wrote to manage them -- https://gerrit.wikimedia.org/r/#/c/378739/
[00:38:28] no, if you connect to "enwiki.*.db.svc.eqiad.wmflabs" then you will be connected to the "s1" host
[00:38:43] nothing is looking at the contents of the query
[00:38:50] or any USE statement
[00:39:06] yeah, sorry if I don't understand. Because looking at https://wikitech.wikimedia.org/wiki/MariaDB#Shards it seems I'd need to connect to a different shard if I wanted to query say, Commons
[00:39:35] to be future proof, yes.
[00:39:59] today all hosts have all tables available. That is not guaranteed to be the case in the future though
[00:40:11] oh boy
[00:40:29] so if say we find that 30% of queries are made to enwiki we could put enwiki on its own
[00:41:31] so if you have a tool that does cross-wiki lookups you really should open a different db connection for each shard that you are talking to
[00:41:57] or be prepared to do that quickly the day that things stop working "accidentally"
[00:42:16] ouch, I get why that is but it will make the code a bit more complicated
[00:43:00] I don't know your code, but it shouldn't be that tricky really
[00:43:20] and sometimes we might want to JOIN or something cross-wiki. Would that still be possible?
[00:43:39] as long as it's in the same shard
[00:43:54] right
[00:44:00] joining enwiki to commons only works by accident today
[00:45:01] got it
[00:45:16] so for now, nothing is set in stone
[00:45:48] we reserve the right to change how the backend servers work with very little or no notice :)
[00:45:52] haha
[00:45:56] understood
[00:46:08] in practice we will try not to surprise people
[00:46:10] my final question -- is it *faster* to use s4 over s1 if I want to only query Commons, for example?
[00:46:27] no difference today
[00:46:48] alrighty
[00:47:03] since today s[1-7].web all point at the same physical box
[00:47:23] okay, all I need to know I guess
[00:48:00] I will use `enwiki` for now but leave a comment that it isn't future proof
[00:48:16] maybe there will be a machine-readable resource somewhere that maps out the db names to the correct shard?
[00:53:02] I could query that and cache it. Then if you wanted to move say `dewiki` to its own shard, you could first have it live in both the old and new shard, and update that machine-readable endpoint in advance. That way once dewiki is no longer available on the old shard, the tool is already automatically updated to connect to the right place
[00:53:05] if that makes sense
[00:53:23] but I assume this sort of thing won't happen super often so I probably shouldn't worry about it
[00:53:52] just an idea!
[00:54:25] musikanimal: there is a db that gives the mappings and data files on noc
[00:54:43] oh well there ya go haha
[00:55:31] the "meta_p" database gives the mappings -- https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Metadata_database
[00:56:04] and the files on noc -- https://noc.wikimedia.org/conf/s1.dblist
[00:56:19] https://noc.wikimedia.org/conf/s2.dblist -- etc
[00:58:24] beautiful, thank you!
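Editor's sketch of the "one connection per shard" advice from the exchange above. This is a minimal illustration, not code from any of the tools discussed. Assumptions: the per-shard dblist files hold one wiki db name per line (the `#` comment handling is purely defensive), the shard host names follow the `s1.web.db.svc.eqiad.wmflabs` pattern inferred from the `s[1-7].web` remark, and `SAMPLE_DBLISTS` is hypothetical stand-in data containing only the two placements mentioned in the conversation (enwiki on s1, Commons on s4). All function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical stand-in for the contents of the per-shard dblist files.
# Only the two placements mentioned in the chat are included.
SAMPLE_DBLISTS = {
    "s1": "enwiki\n",
    "s4": "commonswiki\n",
}


def parse_dblist(text):
    """Return the db names in a dblist body (assumed: one name per line)."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            names.append(line)
    return names


def build_shard_map(dblists):
    """Turn {shard: dblist text} into {dbname: shard}."""
    shard_map = {}
    for shard, text in dblists.items():
        for dbname in parse_dblist(text):
            shard_map[dbname] = shard
    return shard_map


def shard_host(shard, cluster="web"):
    """Build a shard host name; the pattern is an assumption, see above."""
    return "{}.{}.db.svc.eqiad.wmflabs".format(shard, cluster)


def plan_connections(wikis, shard_map, cluster="web"):
    """Group wikis by shard host so a tool opens one connection per shard."""
    by_host = defaultdict(list)
    for dbname in wikis:
        by_host[shard_host(shard_map[dbname], cluster)].append(dbname)
    return dict(by_host)


if __name__ == "__main__":
    shard_map = build_shard_map(SAMPLE_DBLISTS)
    for host, dbs in plan_connections(["enwiki", "commonswiki"], shard_map).items():
        print(host, dbs)
```

A tool could rebuild and cache the map periodically, from the dblist files or from meta_p, so that a wiki moving to a new shard is picked up without a code change. A cross-wiki JOIN would only be attempted when both wikis land under the same host key.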
[01:01:36] Data-Services, DBA, Epic: Labs database replica drift - https://phabricator.wikimedia.org/T138967#3633963 (bd808)
[01:19:32] Data-Services, Quarry: Switch Quarry to use *.analytics.db.svc.eqiad.wmflabs as replica database host - https://phabricator.wikimedia.org/T176694#3633999 (bd808)
[01:19:45] Data-Services, cloud-services-team (FY2017-18), DBA, Goal: Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#3634014 (bd808)
[01:19:47] Data-Services, Quarry: Switch Quarry to use *.analytics.db.svc.eqiad.wmflabs as replica database host - https://phabricator.wikimedia.org/T176694#3634013 (bd808)
[01:45:46] PROBLEM - Puppet errors on tools-exec-1412 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[02:55:45] RECOVERY - Puppet errors on tools-exec-1412 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:41:21] PROBLEM - Puppet errors on tools-exec-1409 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[03:44:32] Data-Services, Quarry: Switch Quarry to use *.analytics.db.svc.eqiad.wmflabs as replica database host - https://phabricator.wikimedia.org/T176694#3633999 (zhuyifei1999) I donno if the config.yaml is puppetized somewhere (probably not), but this change should do the switch: ``` $ sed 's/enwiki.labsdb/enwi...
[03:50:50] bd808: should I do the switch /right now/ or should I do a bit more waiting?
[03:50:54] I mean quarry
[03:51:43] zhuyifei1999_: well....
[03:52:45] I have an email from halfak that I think says the breakage should be minimal
[03:53:18] there will be some loss of functionality, but I'm not sure how to warn people about that
[03:53:23] I guess that's a 'right now' then
[03:53:45] It's basically just a config setting, correct?
[03:54:18] I think so
[03:54:32] unless it's puppetised somewhere.
[03:54:47] but considering it contains oauth secrets that's unlikely
[03:55:08] also the uwsgi and celery workers have to be restarted afaik
[03:55:51] https://quarry.wmflabs.org/query/runs/all no query seems actually running atm so that should be straightforward
[03:56:31] it may be possible to tune the query killer to let things run longer too
[03:59:47] !log quarry Switching REPLICA_HOST from 'enwiki.labsdb' to 'enwiki.analytics.db.svc.eqiad.wmflabs' T176694 (Executing `sudo -- sudo -u quarry sed -i 's/enwiki.labsdb/enwiki.analytics.db.svc.eqiad.wmflabs/' /srv/quarry/quarry/config.yaml` on all hosts)
[03:59:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[03:59:51] T176694: Switch Quarry to use *.analytics.db.svc.eqiad.wmflabs as replica database host - https://phabricator.wikimedia.org/T176694
[04:02:18] zhuyifei1999@quarry-main-01:~$ sudo service uwsgi status
[04:02:18] * which one?
[04:02:20] o.O
[04:02:27] no systemctl either
[04:02:46] sudo service uwsgi-SOMETHING
[04:03:10] probably uwsgi-quarry
[04:03:39] uwsgi-quarry: unrecognized service
[04:03:46] argh /me reads puppet
[04:04:37] uwsgi-quarry-web ?
[04:04:43] uwsgi-quarry-web start/running, process 22212
[04:04:45] yeah
[04:08:20] okay gotta restart the services.
fingers crossed
[04:08:48] !log quarry Restarting service 'uwsgi-quarry-web' on quarry-main-01, 'celery-quarry-worker' on quarry-runner-01 & quarry-runner-02 T176694
[04:08:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[04:08:53] T176694: Switch Quarry to use *.analytics.db.svc.eqiad.wmflabs as replica database host - https://phabricator.wikimedia.org/T176694
[04:10:42] Data-Services, cloud-services-team (FY2017-18), DBA, Goal: Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#3634209 (zhuyifei1999)
[04:10:44] Data-Services, Quarry: Switch Quarry to use *.analytics.db.svc.eqiad.wmflabs as replica database host - https://phabricator.wikimedia.org/T176694#3634206 (zhuyifei1999) Open>Resolved a:zhuyifei1999 LGTM
[04:13:43] bd808: thanks :)
[04:17:49] thank you zhuyifei1999_ :)
[04:17:59] uh np
[04:18:57] just wondering, how much faster did the queries run?
[04:19:50] no timing data, so no way to tell from the gui :/
[04:20:47] I might be able to come up with a way to store the timing data, but idk where to display them
[04:22:14] It could be put where the "Resultset (N rows)" is. Something like "Resultset (N rows; 00:00)"
[04:22:40] that's populated by the library that does the tables
[04:22:52] ah
[04:22:56] and there might be multiple resultsets
[04:23:27] I was looking at the code a tiny bit today but I really haven't looked at it much
[04:23:42] ok
[04:24:20] it's https://phabricator.wikimedia.org/T71264, so if you have any ideas feel free to comment
[04:24:26] I poked around a bit after someone asked if it could query toolsdb
[04:25:01] Toolforge, Outreachy (Round-15): Outreachy - webservice microtask for Mridu_Bhatnagar - https://phabricator.wikimedia.org/T176018#3634210 (Mridu_Bhatnagar) @srishakatux Hello! Was relocating hence, wasn't able to continue since a couple of days. From today going to start again. :) Thanks.
[04:25:01] ^ is a no :(
[04:25:10] which led me to wonder about T169452 :)
[04:25:11] T169452: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452
[04:26:04] ... which I simply don't have time to invest time on :(
[04:26:27] there was some other similar tool that yuvi dropped a link to in irc a while ago too
[04:27:05] yeah, it would probably be a real project to do the fixes needed for Redash
[04:27:47] * zhuyifei1999_ thinks about GSoC :P
[04:28:35] yeah, if someone had all the things needed figured out then a GSoC/Outreach student might be able to make good progress
[04:29:06] someday™ Cloud Services will have more energy to put towards Quarry & PAWS
[04:29:16] lol
[04:36:19] RECOVERY - Puppet errors on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:59:17] (PS1) BryanDavis: sql: Update to allow connecting to new cluster [labs/toollabs] - https://gerrit.wikimedia.org/r/380684 (https://phabricator.wikimedia.org/T176688)
[05:12:52] (Abandoned) BryanDavis: Update git repo link [labs/toollabs] - https://gerrit.wikimedia.org/r/358039 (owner: Framawiki)
[05:13:39] (Abandoned) BryanDavis: Switch list.php to proxymanager's new API [labs/toollabs] - https://gerrit.wikimedia.org/r/268343 (owner: Tim Landscheidt)
[05:14:21] (PS3) BryanDavis: WIP: Don't ignore fchdir()'s errors [labs/toollabs] - https://gerrit.wikimedia.org/r/331227 (owner: Tim Landscheidt)
[05:15:02] (CR) jerkins-bot: [V: -1] WIP: Don't ignore fchdir()'s errors [labs/toollabs] - https://gerrit.wikimedia.org/r/331227 (owner: Tim Landscheidt)
[05:36:40] Data-Services, cloud-services-team (Kanban), DBA, User-bd808: Create and announce timeline for shutting down labsdb100[13] - https://phabricator.wikimedia.org/T175086#3582412 (Marostegui) >>! In T175086#3633776, @bd808 wrote: > We also need to choose a date sometime in October to perform the outs...
[05:38:00] (CR) BryanDavis: [C: -1] WIP: Don't ignore fchdir()'s errors (1 comment) [labs/toollabs] - https://gerrit.wikimedia.org/r/331227 (owner: Tim Landscheidt)
[06:01:44] Data-Services, cloud-services-team (Kanban), DBA, User-bd808: Create and announce timeline for shutting down labsdb100[13] - https://phabricator.wikimedia.org/T175086#3634372 (bd808) >>! In T175086#3634296, @Marostegui wrote: > My question to that would be...if one of them doesn't come back, are...
[06:03:00] Data-Services, cloud-services-team (Kanban), DBA, User-bd808: Create and announce timeline for shutting down labsdb100[13] - https://phabricator.wikimedia.org/T175086#3634375 (Marostegui) >>! In T175086#3634372, @bd808 wrote: >>>! In T175086#3634296, @Marostegui wrote: >> My question to that woul...
[06:25:18] cloud-services-team, DBA, Patch-For-Review: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3634410 (Marostegui) What should we do with this task? is it all good now?
[06:40:41] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1427 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[06:45:10] hi!
[06:45:46] PROBLEM - Puppet errors on tools-exec-1416 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[07:20:43] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1427 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:25:46] RECOVERY - Puppet errors on tools-exec-1416 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:02:24] cloud-services-team, DBA, Patch-For-Review: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3634575 (hashar) >>! In T175002#3579784, @chasemp wrote: > I'm wondering if this is related: > > https://phabricator.wikimedia.org/T170492#3...
[08:36:56] (PS2) Lokal Profil: [WIP]Restructure missing_commonscat_links Statistics [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380060 (https://phabricator.wikimedia.org/T176528)
[08:37:52] (PS3) Lokal Profil: [WIP]Restructure missing_commonscat_links Statistics [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380060 (https://phabricator.wikimedia.org/T176528)
[09:17:20] is it possible to continue using the old db servers?
[09:18:44] (PS1) Lokal Profil: Fix misspelt parameter in lu_lb [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380708 (https://phabricator.wikimedia.org/T174556)
[09:21:52] (CR) Lokal Profil: "I did a test harvest and there are no hits for the old spelling." [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380708 (https://phabricator.wikimedia.org/T174556) (owner: Lokal Profil)
[09:24:56] annika, the old servers are not being removed... yet
[09:25:07] however, they can fail at any moment
[09:25:16] they are 6 years old
[09:25:22] and have no redundancy
[09:25:31] we already lost 1 of the 3 we used to have
[09:32:00] (CR) Jean-Frédéric: [C: 2] Fix misspelt parameter in lu_lb [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380708 (https://phabricator.wikimedia.org/T174556) (owner: Lokal Profil)
[09:33:15] (Merged) jenkins-bot: Fix misspelt parameter in lu_lb [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380708 (https://phabricator.wikimedia.org/T174556) (owner: Lokal Profil)
[09:34:10] (CR) jenkins-bot: Fix misspelt parameter in lu_lb [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380708 (https://phabricator.wikimedia.org/T174556) (owner: Lokal Profil)
[09:42:33] cloud-services-team, Wikimedia-Hackathon-2018-Organization, Developer-Relations (Jul-Sep 2017): Featured Projects related to Wikimedia Cloud Services and/or Technical Operations? - https://phabricator.wikimedia.org/T170242#3634806 (Aklapper) Open>declined Makes sense.
Thanks for the explanation!
[09:48:32] jynus: if i understood that right, you cannot join user databases and replicas on the new servers anymore?
[09:50:49] (CR) Jean-Frédéric: [C: 2] Add default instructions to top of unused images reports [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380058 (owner: Lokal Profil)
[09:52:25] !log tools.heritage Deploy latest from Git master: 263ccee (T174556)
[09:52:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[09:52:29] T174556: Harvester misses field in lu_(lb) - https://phabricator.wikimedia.org/T174556
[10:09:01] PAWS: PAWS - Redirect loop detected - https://phabricator.wikimedia.org/T175454#3593792 (MisterSynergy) Same here: “Start My Server” leads to a redirect, and after three cycles it reports “500 : Internal Server Error”
[11:01:13] (CR) Jean-Frédéric: [C: 1] Add default instructions to top of unused images reports [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380058 (owner: Lokal Profil)
[11:01:15] (CR) Jean-Frédéric: [C: 2] Add default instructions to top of unused images reports [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380058 (owner: Lokal Profil)
[11:04:47] (CR) Jean-Frédéric: [C: 2] Group unused images per source page [labs/tools/heritage] - https://gerrit.wikimedia.org/r/379141 (https://phabricator.wikimedia.org/T117327) (owner: Lokal Profil)
[11:06:40] (Merged) jenkins-bot: Group unused images per source page [labs/tools/heritage] - https://gerrit.wikimedia.org/r/379141 (https://phabricator.wikimedia.org/T117327) (owner: Lokal Profil)
[11:07:38] (CR) jenkins-bot: Group unused images per source page [labs/tools/heritage] - https://gerrit.wikimedia.org/r/379141 (https://phabricator.wikimedia.org/T117327) (owner: Lokal Profil)
[11:12:59] !log tools.heritage Deploy latest from Git master: 2828a0f (T117327)
[11:13:03] Logged the message at
https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[11:13:03] T117327: Sort and group "Unused images" and "Missing commons category links" by the page name - https://phabricator.wikimedia.org/T117327
[11:17:27] (PS4) Jean-Frédéric: Add default instructions to top of unused images reports [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380058 (owner: Lokal Profil)
[11:55:58] (Abandoned) Jean-Frédéric: Guard against empty totals when making statistics [labs/tools/heritage] - https://gerrit.wikimedia.org/r/379974 (https://phabricator.wikimedia.org/T176528) (owner: Jean-Frédéric)
[12:34:15] PROBLEM - Puppet errors on tools-exec-1441 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[12:47:48] cloud-services-team (Kanban), Operations: puppet ca_server confusion - https://phabricator.wikimedia.org/T176437#3635161 (akosiaris) It's also used in revocation checks which can and will happen by masters and not just agents, in order to verify the agent is authorized to connect to them and obtain the c...
[13:09:12] RECOVERY - Puppet errors on tools-exec-1441 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:33:41] Tool-stewardbots: hat-web-tool/projects: listing incorrect number of users on some projects - https://phabricator.wikimedia.org/T176578#3635265 (MarcoAurelio) @bd808 Something that can be fixed with what was announced at {J70}?
[13:33:49] Tool-stewardbots, Need-volunteer, WorkType-Maintenance: Outdated MySQL handling for hat-web-tool@stewardbots - https://phabricator.wikimedia.org/T156545#3635267 (MarcoAurelio) @bd808 Something that can be fixed with what was announced at {J70}?
[13:47:50] !log tools.stewardbots Updated stewardbots to 2f72f0d16eb2
[13:47:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[13:58:37] cloud-services-team, DBA, Patch-For-Review: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3635438 (chasemp) Open>Resolved >>! In T175002#3634410, @Marostegui wrote: > What should we do with this task? is it all good now? C...
[14:03:08] (PS1) MarcoAurelio: Update MySQL DB connection to use analytics.db.svc.eqiad.wmflabs [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380748 (https://phabricator.wikimedia.org/T176578)
[14:04:23] (CR) MarcoAurelio: [C: 2] Update MySQL DB connection to use analytics.db.svc.eqiad.wmflabs [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380748 (https://phabricator.wikimedia.org/T176578) (owner: MarcoAurelio)
[14:04:47] (Merged) jenkins-bot: Update MySQL DB connection to use analytics.db.svc.eqiad.wmflabs [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380748 (https://phabricator.wikimedia.org/T176578) (owner: MarcoAurelio)
[14:09:44] !log tools.stewardbots Updated stewardbots to 261fda1
[14:09:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[14:13:19] Tool-stewardbots, Patch-For-Review: hat-web-tool/projects: listing incorrect number of users on some projects - https://phabricator.wikimedia.org/T176578#3635509 (MarcoAurelio) https://github.com/wikimedia/labs-tools-stewardbots/commit/261fda1c58556c21e479ca6dab1a7a4abdc09672 didn't break the tool (that'...
[14:16:24] Hey, one of my Dashboards running from a Labs instance (wikidataconcepts) cannot connect to MySql on 'tools.labsdb:
[14:16:24] Failed to connect to database: Error: Can't connect to MySQL server on 'tools.labsdb' (111)
[14:16:41] Does anyone know whether the things are running smoothly on tools.labsdb?
[14:19:35] nope, they are not
[14:19:39] we are trying to put them up
[14:20:41] bd808: Is there a way to get to the new replicas with the command line 'sql' command, or do I need to connect manually?
[14:20:45] jynus: Thanks a lot. Just wanted to make sure.
[14:21:42] !log graphite deleting all instances. Filippo confirms they're unused, and puppet is broken on every single one.
[14:21:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Graphite/SAL
[14:25:46] !log services deleting docker-testing01; it's broken and gwicke says it is unused
[14:25:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Services/SAL
[14:28:04] GoranSM, should be up
[14:28:41] !help I'd like to use a specific module on labs/forge, but it uses scons to build itself, so is installing it possible?
[14:28:41] DatGuy: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[14:28:51] module in question is https://github.com/escaped/pyexiv2
[14:30:47] DatGuy: I don't know what 'scons' is, but the standard answer is: on toolforge you can generally install whatever python things you need in a venv if you use the kubernetes backend.
[14:31:01] On a VM of course you can install whatever you want, as long as it's OSS compliant.
[14:31:09] Tool-stewardbots, Patch-For-Review, User-MarcoAurelio: hat-web-tool/projects: listing incorrect number of users on some projects - https://phabricator.wikimedia.org/T176578#3635562 (MarcoAurelio) Open>Resolved a:MarcoAurelio I've updated the hosts to the new ones, and after consulting wit...
[14:31:25] Tool-stewardbots, User-MarcoAurelio: hat-web-tool/projects: listing incorrect number of users on some projects - https://phabricator.wikimedia.org/T176578#3635566 (MarcoAurelio)
[14:32:10] Hm… all I find is https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes — bd808 do you have a doc link for DatGuy re installing local python packages on k8s?
[14:33:52] `scons` as in http://www.scons.org/ fwiw, so its looking more like a cloud VPS
[14:34:49] I mean, if scons is a "new" make/build tool set it would have the ability i suspect to specify binary path and permissions
[14:35:54] * TNT|away is instantly dubious when something is self-described as `next-generation * tool`
[14:37:25] as are we all :)
[14:39:07] anomie: I have some patches up for /usr/bin/sql, but not generally deployed yet. You can try using ~bd808/sql and see if that works for you for now. Check its --help.
[14:40:13] bd808: I'm going to merge https://gerrit.wikimedia.org/r/#/c/380318/ fyi if you see odd behavior on statics
[14:40:24] bd808: Permission denied. I figured out the manual command to do it anyway, then I decided to leave trying to convert AnomieBOT until I have more time to decide what to do about that one slow query.
[14:40:30] DatGuy: there is info on wikitech about using virtualenv and pip inside a kubernetes container -- https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Python_.28uWSGI.29
[14:41:39] anomie: fair enough. We are not planning on turning off the older cluster for a couple of months (if we can keep the hardware alive that long)
[14:42:18] chasemp: cool. We aren't sure it will fix the problem, but it probably won't hurt. I did check that our nginx supports the syntax already
[14:42:34] * chasemp nods, nice -- I figured :)
[14:44:09] heh. for those wondering, scons is "next generation" like Star Trek: TNG.
It's a circa 2000 build tool made by some folks who got sick of autoconf/automake
[14:45:47] the page looked straight early 00's too
[14:59:27] cloud-services-team (Kanban): CamelCase vs. VPS instance naming - https://phabricator.wikimedia.org/T176757#3635689 (Andrew)
[15:17:23] (PS1) Lokal Profil: Remove non-url registrant_url [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380769
[15:18:23] !log suggestbot Deleted unused databases from c1.labsdb & c3.labsdb
[15:18:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Suggestbot/SAL
[15:18:38] Tool-stewardbots, User-MarcoAurelio: hat-web-tool/projects: listing incorrect number of users on some projects - https://phabricator.wikimedia.org/T176578#3635812 (MarcoAurelio) Resolved>Open https://phabricator.wikimedia.org/diffusion/TSTW/browse/master/hat-web-tool/projects.php does not take in...
[15:18:42] (CR) Lokal Profil: "Discovered as part of https://phabricator.wikimedia.org/T176112" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380769 (owner: Lokal Profil)
[15:19:20] Tool-stewardbots, Need-volunteer, WorkType-Maintenance: Outdated MySQL handling for hat-web-tool@stewardbots - https://phabricator.wikimedia.org/T156545#3635817 (MarcoAurelio)
[15:19:23] Tool-stewardbots, User-MarcoAurelio: hat-web-tool/projects: listing incorrect number of users on some projects - https://phabricator.wikimedia.org/T176578#3635816 (MarcoAurelio)
[15:19:55] Tool-stewardbots, User-MarcoAurelio: hat-web-tool/projects: listing incorrect number of users on some projects (not listening to ug_expiry) - https://phabricator.wikimedia.org/T176578#3630217 (MarcoAurelio)
[15:20:09] Tool-stewardbots: hat-web-tool/projects: listing incorrect number of users on some projects (not listening to ug_expiry) - https://phabricator.wikimedia.org/T176578#3630217 (MarcoAurelio) a:MarcoAurelio>None
[15:52:59] bd808, I'm going through that right now, but have an error
when doing python2 -m virtualenv. "No module named virtualenv"
[15:53:24] Cloud-VPS (Project-requests): Request creation of webperf VPS project - https://phabricator.wikimedia.org/T176597#3630730 (bd808) +1, let's give this a shot
[15:55:54] never mind, had to enter the python2 shell specifically instead of python
[16:08:44] Cloud-VPS (Project-requests): Request creation of webperf VPS project - https://phabricator.wikimedia.org/T176597#3630730 (chasemp) +1'd
[16:14:59] !log gitblit rebooting 'test' instance as it is unreachable
[16:15:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Gitblit/SAL
[16:15:37] Cloud-VPS (Project-requests): Request creation of webperf VPS project - https://phabricator.wikimedia.org/T176597#3636025 (madhuvishy) Open>Resolved a:madhuvishy Project https://wikitech.wikimedia.org/wiki/Nova_Resource:Webperf created with User phedenskog as projectadmin
[16:16:55] !log gitblit it first sounds like "what, but we don't use gitblit anymore" but then.. https://phabricator.wikimedia.org/T138986 is still open
[16:16:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Gitblit/SAL
[16:23:19] tgr|away: Having emailed you those patches, can I now delete sentry-alpha.sentry.eqiad.wmflabs ?
[16:23:58] andrewbogott: sure, thanks!
[16:24:06] great, thanks
[16:26:21] Cloud-VPS (Project-requests): Request creation of webperf VPS project - https://phabricator.wikimedia.org/T176597#3636072 (Gilles) @Peter if you can add me to the project, that'd be great
[16:34:19] Toolforge, Outreachy (Round-15): Outreachy - webservice microtask for Mridu_Bhatnagar - https://phabricator.wikimedia.org/T176018#3636106 (Mridu_Bhatnagar) a:Mridu_Bhatnagar
[16:44:28] Hey! I was going through scripts/webservice. It has a method call to log_command_invocation. And the definition of log_command_invocation takes 2 arguments.
commandname and commandline [16:44:56] can someone please give me an example of commandname and commandline. [16:47:54] figured it out. [16:51:11] 10Data-Services, 10cloud-services-team (Kanban), 10User-bd808: Promote initial use of new Wiki Replica servers - https://phabricator.wikimedia.org/T172704#3636163 (10bd808) [16:51:13] 10Data-Services, 10cloud-services-team (Kanban), 10Patch-For-Review, 10User-bd808: Define naming scheme for connecting to new wiki replica cluster - https://phabricator.wikimedia.org/T174860#3636161 (10bd808) 05Open>03Resolved Management tool documented at https://wikitech.wikimedia.org/wiki/Portal:Dat... [16:59:45] Great, I didn't kill tools-bastion-05 with my greps for '.labsdb' [17:00:33] (03CR) 10Merlijn van Deen: [C: 032] sql: Update to allow connecting to new cluster [labs/toollabs] - 10https://gerrit.wikimedia.org/r/380684 (https://phabricator.wikimedia.org/T176688) (owner: 10BryanDavis) [17:01:19] (03Merged) 10jenkins-bot: sql: Update to allow connecting to new cluster [labs/toollabs] - 10https://gerrit.wikimedia.org/r/380684 (https://phabricator.wikimedia.org/T176688) (owner: 10BryanDavis) [17:35:42] zhuyifei1999_: not killing the bastion is good :) [17:48:03] (03PS116) 10Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - 10https://gerrit.wikimedia.org/r/241296 [17:54:48] (03CR) 10Ricordisamoa: [C: 04-2] "PS116 adds Rollup-based build" [labs/tools/wikidata-slicer] - 10https://gerrit.wikimedia.org/r/241296 (owner: 10Ricordisamoa) [18:45:28] inside tools-webservice/toollabs/common/el.py there is a method named manifest. The doc string for that method mentions presence of service.manifest file. Wanted to know what does service.manifest file contain? 
[18:47:56] (PS117) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[18:49:05] Mridu: it has a "backend:" key, a "version:", and depending on the backend other keys that can be used to recreate the equivalent `webservice --backend=X ...` call that launched the service
[18:54:13] okay. Where is this file present though? Or it gets created when we launch the service?
[19:01:30] (CR) Ricordisamoa: [C: -2] "PS117 splits mixinClass into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[19:02:22] ps117 omg
[19:05:58] PROBLEM - Puppet errors on tools-worker-1020 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[19:27:05] cloud-services-team (Kanban): CamelCase vs. VPS instance naming - https://phabricator.wikimedia.org/T176757#3636596 (Andrew) I've confirmed that nova detects name collisions between 'camelcase' and 'CamelCase'. So this isn't especially urgent. There's still a potential race if the users get really luck and...
[19:45:58] RECOVERY - Puppet errors on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:51:01] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[19:53:27] why am I blanking on the web tool for managing VPS projects?
[19:53:33] not toolsadmin...
[19:53:50] horizon
[19:54:04] that's it
[20:21:02] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:30:41] PROBLEM - Puppet errors on tools-exec-1408 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:33:50] Tool-stewardbots, Need-volunteer, WorkType-Maintenance: SULWatcher: avoid logging account creations more than once - https://phabricator.wikimedia.org/T156546#3637077 (MarcoAurelio) p:Triage>High
[20:37:05] (CR) Lokal Profil: [C: 2] Add default instructions to top of unused images reports [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380058 (owner: Lokal Profil)
[20:37:36] (CR) Lokal Profil: [C: 2] "just trying to re-trigger the gate-and-submit jobs" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380058 (owner: Lokal Profil)
[20:37:42] bd808: Hm.. so I mistakenly thought GUC hardcodes .labsdb as suffix, but that code is unused. It uses meta_p.wiki.slice as-is.
[20:38:34] There isn't a column for just the prefix right now (I think). And I'd rather not try to do a strtr on .labsdb
[20:42:56] (Merged) jenkins-bot: Add default instructions to top of unused images reports [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380058 (owner: Lokal Profil)
[20:50:04] (CR) jenkins-bot: Add default instructions to top of unused images reports [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380058 (owner: Lokal Profil)
[20:51:27] (PS1) Krinkle: Centralise database hostname handling [labs/tools/guc] - https://gerrit.wikimedia.org/r/380873
[20:51:34] (CR) Krinkle: [C: 2] Centralise database hostname handling [labs/tools/guc] - https://gerrit.wikimedia.org/r/380873 (owner: Krinkle)
[20:52:46] (Merged) jenkins-bot: Centralise database hostname handling [labs/tools/guc] - https://gerrit.wikimedia.org/r/380873 (owner: Krinkle)
[21:03:00] Toolforge, Outreachy (Round-15): Outreachy - webservice microtask for Mridu_Bhatnagar - https://phabricator.wikimedia.org/T176018#3637222 (srishakatux) @Mridu_Bhatnagar no worries, I was just checking! :)
[21:04:49] hi, I'm trying to build a node server on the wiki cloud. I've used this guide https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#node.js_web_services. but when I print the port, it is empty (undefined). so it doesn't listen to anything. do I need to get a port number and set it?
[21:05:42] RECOVERY - Puppet errors on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:06:48] neta: hmmm... in your node process there is no PORT environment variable
[21:06:59] (that was a ?)
[21:07:09] bd808: nope
[21:07:37] do you mind if I try to look inside the container? I'd need the tool name to do that
[21:08:01] bd808: sure. fb-translate-bot
[21:10:46] I’m having some trouble with mediawiki-vagrant, HHVM is aggressively caching the bytecode and I can’t figure out how to purge the cache after changing PHP sources.
[21:11:02] awight: restart hhvm
[21:12:15] That doesn’t seem to do it
[21:12:35] but k I will keep doing that and assume that something else is to blame for the stickiness
[21:13:07] awight: it really should "just work" :/
[21:13:19] lol yes I remember it working very well
[21:13:34] I’ll get back to you if I learn anything...
[21:14:48] neta: I'm looking inside the running container and I don't see the variable I would expect...
[21:15:32] I’m running “service hhvm restart”, then I reload the Special:Version page and my extension treeish is several months behind the actual checked-out version of the code.
[21:15:49] bd808: is there a way for you to check what my port is supposed to be?
[21:16:12] awight: do a purgespecialchanges.php
[21:16:16] awight: do a purgespecialpages*
[21:17:00] neta: I'm going to restart the service and see if anything changes
[21:17:31] Zppix: That sounds neat - I see nothing by that name in maintenance/, is that where I should be looking?
[21:17:32] bd808: OK
[21:17:43] awight: let me check the file name
[21:18:46] mwscript purgePage.php wiki, perhaps
[21:20:00] no. maybe updateSpecialPages
[21:20:53] well, that was cool but didn’t solve it.
[21:21:12] bd808: aha - I’m on MacOS and think I’m not using NFS to share files.
[21:21:14] awight: i dont remember/know maybe i am imagining files that never existed :P
[21:21:21] will try a vagrant reload?
[21:22:50] harrr, that didn’t work either.
[21:23:12] Toolforge: node.js webservice not seeing PORT in env - https://phabricator.wikimedia.org/T176812#3637319 (bd808)
[21:23:44] neta: ^ I filed a task ... I have a meeting now, but I'll try to look a bit more later
[21:23:59] Tools, cloud-services-team, Community-Liaisons (Oct-Dec 2017): Find and promote tools and their authors - https://phabricator.wikimedia.org/T176677#3637331 (Qgil)
[21:24:41] bd808: thanks a lot!
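Editor's sketch (not part of the log): the symptom neta describes above, a web service that listens on nothing because `PORT` is missing from its environment, is easier to diagnose when the service fails loudly at startup instead of carrying on with an empty value. The conversation is about node.js; Python is used here only for illustration, and `resolve_port` is a hypothetical helper, not part of Toolforge or the `webservice` command.

```python
import os
from typing import Mapping, Optional


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the TCP port named by the PORT variable, or raise a clear error.

    `env` defaults to the real process environment; passing a plain dict
    makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    raw = env.get("PORT")
    if not raw:
        # This is the situation in the log: the variable is simply absent.
        raise RuntimeError("PORT is not set in the environment")
    try:
        port = int(raw)
    except ValueError:
        raise RuntimeError(f"PORT is not an integer: {raw!r}") from None
    if not 0 < port < 65536:
        raise RuntimeError(f"PORT is out of range: {port}")
    return port
```

Calling `resolve_port()` once at startup, before binding a socket, turns a silent "doesn't listen to anything" into an immediate error message that names the missing variable.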
[21:25:14] Toolforge: node.js webservice not seeing PORT in env - https://phabricator.wikimedia.org/T176812#3637333 (bd808) Not matching the expectations from https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#node.js_web_services
[21:29:19] bd808: /me unpalms face. Turns out, the code was being reloaded just fine, but the tree-ish display in Special:Version is wonky.
[21:29:32] \o/
[21:29:38] I proved to myself by adding an author
[21:30:03] Good news is that it forced me to turn on NFS mode which will make me happy in the long un.
[21:30:05] *run
[21:30:08] some jerk wrote that code... (git blame -> bd808)
[21:30:45] bd808: hey some jerk had to at some point xD
[22:00:43] (PS118) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[22:17:42] (CR) Ricordisamoa: [C: -2] "PS118 splits ValidationRule into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[22:20:42] Krinkle: hmm... I don't know if we are ready to change meta_p.slice quite yet or not...
[22:21:00] I wonder how many things that would break?
[22:21:54] really the only tools that will end up having problems are the ones that are using user dbs on c1/c3 today. I bet those would not be using the slice column to pick a db
[22:37:59] (PS119) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[22:40:30] (CR) Ricordisamoa: [C: -2] "PS119 splits ForbiddenCharactersValidationRule into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[22:47:20] (PS120) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[22:51:23] (CR) Ricordisamoa: [C: -2] "PS120 splits MaxLengthValidationRule into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[22:57:52] Tool-Global-user-contributions: guc giving an error - https://phabricator.wikimedia.org/T176823#3637607 (Akoopal)
[23:01:43] (PS121) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[23:09:30] !log tools.heritage Deploy latest from Git master: 8248ff4 (T176200)
[23:09:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[23:09:35] T176200: Add usage instructions to "unused images" reports - https://phabricator.wikimedia.org/T176200
[23:11:44] (CR) Ricordisamoa: [C: -2] "PS121 splits DraggableElement into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[23:11:55] (PS2) Jean-Frédéric: Remove non-url registrant_url for Tunisia in French (tn_fr) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380769 (owner: Lokal Profil)
[23:12:24] (CR) Jean-Frédéric: [C: 2] Remove non-url registrant_url for Tunisia in French (tn_fr) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380769 (owner: Lokal Profil)
[23:13:36] (PS122) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[23:17:38] (Merged) jenkins-bot: Remove non-url registrant_url for Tunisia in French (tn_fr) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380769 (owner: Lokal Profil)
[23:21:52] !log tools.heritage Deploy latest from Git master: 0dddaf5
[23:21:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[23:30:12] (CR) jenkins-bot: Remove non-url registrant_url for Tunisia in French (tn_fr) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380769 (owner: Lokal Profil)
[23:33:32] (CR) Ricordisamoa: [C: -2] "PS122 splits EditableDraggableElement into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[23:35:25] (PS123) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[23:40:25] (CR) Ricordisamoa: [C: -2] "PS123 splits Section into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[23:44:25] (PS124) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[23:48:36] (CR) Ricordisamoa: [C: -2] "PS124 splits SingleItemSection into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[23:50:05] (PS125) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[23:54:40] (CR) Ricordisamoa: [C: -2] "PS125 splits MultiItemSection into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
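Editor's sketch (not part of the log): the 20:37-20:38 and 22:20-22:21 exchanges concern tools that read the `slice` column of `meta_p.wiki` as-is, where the value still carries the legacy `.labsdb` suffix. One way a tool could derive a new-style hostname without waiting for the column to change is sketched below. Everything here is an assumption made for illustration: that slice values look like `s1.labsdb`, that the `<name>.<cluster>.db.svc.eqiad.wmflabs` pattern seen earlier in the log for `enwiki.web.db.svc.eqiad.wmflabs` also applies to shard names, and that the cluster labels are `web` and `analytics`. `replica_host` is a hypothetical helper, not GUC's actual code.

```python
LEGACY_SUFFIX = ".labsdb"               # assumed form of meta_p.wiki.slice values
KNOWN_CLUSTERS = ("web", "analytics")   # assumed cluster labels


def replica_host(slice_value: str, cluster: str = "web") -> str:
    """Map a legacy slice value such as 's1.labsdb' to a new-style hostname.

    Only a trailing '.labsdb' is removed, so a value that already lacks
    the suffix passes through unchanged.
    """
    if cluster not in KNOWN_CLUSTERS:
        raise ValueError(f"unknown cluster: {cluster!r}")
    prefix = slice_value
    if prefix.endswith(LEGACY_SUFFIX):
        prefix = prefix[: -len(LEGACY_SUFFIX)]
    if not prefix:
        raise ValueError("empty slice value")
    return f"{prefix}.{cluster}.db.svc.eqiad.wmflabs"
```

Stripping only a trailing suffix, rather than replacing the substring wherever it appears, keeps the helper harmless if the column is later changed to hold the bare prefix.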