[06:03:08] join
[06:03:15] help
[06:03:37] !help
[06:03:37] hsync7_: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[06:09:17] With what, hsync7_?
[06:12:27] with how to make a tool on toolforge
[06:12:39] RF1dle
[06:13:46] with making a tool on toolforge, RF1dle
[06:14:09] I don't understand, but I don't use toolforge
[06:15:16] what do you use to make tools if not toolforge, RF1dle?
[06:18:07] I don't make tools - I'm a botop that runs bots locally on demand.
[06:18:23] I don't actually use Wikimedia sites, I use other wikis
[06:41:39] !help with toolforge
[06:41:39] hsync7_: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[06:47:18] ?
[07:11:34] hey hsync7_
[07:12:35] hey arturo
[07:13:02] did you take a look at wikitech? there should be plenty of useful info there
[07:14:49] arturo, I saw that but couldn't get through it at first sight
[07:15:09] I'll try again there at wikitech, arturo
[07:16:51] ok
[07:16:58] let me find some links for you
[07:18:12] perhaps you can start here, hsync7_: https://wikitech.wikimedia.org/wiki/About_Toolforge
[08:45:50] Hello. I'd like to know how long the "actor migration" from the revision table is going to last. :)
[09:16:21] mmecor, hi, could you be a bit more specific about what you mean by migration?
[09:16:47] in some way, migration has been going on for many years!
[09:17:13] if you mean "when will things be stable", probably very soon
[09:18:28] you can subscribe to https://phabricator.wikimedia.org/T188327 for progress tracking
[09:30:06] !log wikidata-dev wikidata-mud created instance
[09:30:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[09:35:41] !log wikidata-dev wikibase-mud created instance
[09:35:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[09:35:46] (wrong instance name in the earlier message)
[09:52:13] how long does it usually take until you can ssh into a new Cloud VPS instance?
[09:52:22] jynus: what I mean is that I found that rev_user and rev_user_text are not there anymore, and I read that there is a migration towards rev_actor.
[09:52:29] I created one ca. 20 minutes ago, according to the log output it finished booting, but it still rejects my public key
[09:53:02] mmecor: correct
[09:53:04] i'm running some scripts constantly and i'd need to change them. i'd like to know when the situation will be stable.
[09:53:57] so the user text and id are not going to come back
[09:54:23] but the same information is still in the db
[09:54:39] there is a wiki page and/or an email with the suggested plan, let me search for it
[09:55:10] ok
[09:55:19] mmecor: here is a detailed manual: https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_Replicas
[09:55:42] the summary is you can use the _compat tables for a quick fix
[09:55:53] but a proper fix is suggested for better performance
[09:56:57] just to be clear, this is not an arbitrary change cloud has decided; it has changed in production, and because the wiki replicas are a replica of production, they have to change too
[09:58:40] but are _compat tables as efficient as the original ones?
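Editor's note: to make the "quick fix" and "proper fix" mentioned above concrete, here is a minimal SQL sketch. It assumes you are connected to one of the replica wiki databases (e.g. enwiki_p), that the compat view is named revision_compat, and it uses a placeholder page ID; the exact view names are documented on the wikitech page linked above, so treat the ones here as illustrative only.

```sql
-- Quick fix (assumed view name): the _compat view re-exposes rev_user and
-- rev_user_text, but it is a view doing a join under the hood, so it can be slower.
SELECT rev_id, rev_user, rev_user_text
FROM revision_compat
WHERE rev_page = 12345;          -- 12345 is a placeholder page ID

-- Proper fix: query the new schema directly and do the actor join yourself.
SELECT r.rev_id, a.actor_user, a.actor_name
FROM revision r
JOIN actor a ON a.actor_id = r.rev_actor
WHERE r.rev_page = 12345;

-- In the actor table, actor_user is NULL for anonymous (IP) edits, so the old
-- "rev_user = 0" style of filtering becomes a NULL check on the joined row:
SELECT r.rev_id, a.actor_name AS ip
FROM revision r
JOIN actor a ON a.actor_id = r.rev_actor
WHERE r.rev_page = 12345
  AND a.actor_user IS NULL;
```

The Wiki Replicas also provide specialized actor and comment views (mentioned near the end of this log) that may perform better than joining the full actor view; check the linked documentation for what is actually available.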
[09:59:03] i'm retrieving data calculations based on the entire table and i'm afraid the English revision table is huge
[09:59:39] mmecor: it depends on the query; definitely, if you have the time, I would suggest you adapt to the new schema
[09:59:48] i'd change my code but i'd like to know when the change will be effective for all the wiki replicas
[10:00:03] I understand
[10:00:18] i'll adapt, yes. i'd like to know when the process will be done entirely.
[10:00:37] so the change is live now, meaning that the views already have the column available and have dropped the old one
[10:00:57] the change to production has already been done
[10:01:17] but I think the only thing missing is to drop the columns (on production)
[10:01:56] the _compat is not a real table, just a view that does a join under the hood
[10:02:09] does that help?
[10:02:26] *the change is live NOW
[10:03:14] in other words, since a few days ago, actor is the canonical place to search for user information (but of course the user table still exists for registered users)
[10:30:36] !log wikidata-dev wikibase-mud deleted instance (SSH didn't work, we found another server we could use)
[10:30:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[11:04:50] jynus: thanks. i haven't checked it properly, but if you tell me that the new columns are effective in all wikis, it means that i can already change the code.
[11:05:02] yes, they are
[11:05:04] which is what i wanted to know :) because I retrieve from all of them.
[11:05:17] sorry, I didn't understand your question at first
[11:05:53] migration for us is a long process, "the fields being there" is only a small part of it!
[11:06:16] :-)
[11:07:04] :-)
[13:44:23] jynus: rev_actor = user_id?
[13:44:35] not exactly
[13:45:38] please read the documentation at https://www.mediawiki.org/wiki/Manual:Actor_table
[13:45:56] ahm
[13:46:09] a second table.
[13:46:11] ok
[13:46:29] yes, but it holds both registered and anonymous users
[13:46:38] while the user table holds only registered accounts
[13:46:47] rev_actor is equivalent to actor_id
[13:46:52] and rev_actor is unique
[13:47:08] rev_actor refers to an actor_id in the actor table
[13:47:35] ok. clear now.
[13:48:48] i hope the table joins are no less efficient
[13:48:54] i'll have to try
[13:49:06] thanks again jynus
[15:23:12] I don't seem to be able to SSH to new instances, even within projects where I can connect to existing ones. Is there something obvious I might be missing?
[15:24:16] I had the same problem earlier today
[15:25:02] Were you able to solve the problem?
[15:25:26] no, eventually the person who needed the instance found another cloud hosting opportunity and I deleted the instance again :/
[15:25:40] Ah, ok. I'll create a Phab task then.
[15:28:34] https://phabricator.wikimedia.org/T225307
[15:28:50] thanks
[15:29:07] oops, we had an edit conflict :)
[15:29:16] Ha, yep, sorry!
[15:29:25] I'll correct
[15:29:28] ok
[15:29:38] I was about to fix it too, that would have been funny :D
[15:29:48] Haha I did worry :D
[15:31:05] arturo: I remember you were working on auth-y things with sssd, could that affect Cloud VPS instances or is it Toolforge only?
[15:31:15] (just a wild guess, but the only idea I have)
[15:44:35] Hi Samwalton9 and Lucas_WMDE, do you happen to have one of the new instances online still? I'd like to take a look at the console log for any metadata/puppet errors.
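Editor's aside, circling back to jynus's point above that _compat is "just a view that does a join under the hood": the sketch below shows roughly what such a view could look like. It is not the actual definition used on the Wiki Replicas, only an illustration of why the compat path costs an extra join per row compared to the old physical columns.

```sql
-- Hypothetical sketch of a compat view; the real Wiki Replicas definition
-- is maintained by the cloud-services/DBA teams and may differ.
CREATE VIEW revision_compat AS
SELECT r.rev_id,
       r.rev_page,
       r.rev_timestamp,
       COALESCE(a.actor_user, 0) AS rev_user,      -- old convention: 0 meant an anonymous edit
       a.actor_name              AS rev_user_text  -- username, or IP address for anonymous edits
FROM revision r
JOIN actor a ON a.actor_id = r.rev_actor;
```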
[15:45:04] I removed mine now, but I can quickly create one if that would be helpful
[15:45:51] Lucas_WMDE: sssd is only running in toolforge AFAIK
[15:45:54] Oh, actually I just tried to connect to one and it appears to be working now...
[15:46:25] hashtags-staging.hashtags if you want to look at logs
[15:48:56] well, that's good news, but it makes it difficult to find what the problem was earlier :)
[15:49:22] I'll take a look. I also created a new instance a few minutes ago and didn't have any SSH connection issues.
[15:51:19] jeh: mine's gone, sorry
[15:56:05] no problem. Do you recall how long after creating the VM you tried connecting to it? Wondering if maybe puppet was taking longer than usual to configure the host, delaying the ability to SSH in.
[15:59:31] That's entirely possible... or puppet got stuck/crashed for some reason
[15:59:43] Samwalton9: nothing interesting in the hashtags-staging logs. I'll keep my eye out for anything that may have caused the issue earlier; if it happens again please let us know
[16:05:45] Well, I couldn't connect to one of the instances I created yesterday, so I left it alone and tried again today; still couldn't connect. Unfortunately I deleted that one now. Will update if I get any more issues. Thanks :)
[16:06:19] jynus: the only drawback i found in this whole actor thing in the revision table is that if i want to retrieve rev_ids massively and filter the anonymous edits, i need to do the inner join and that is considerably slower.
[16:14:17] i miss the rev_user in order to filter anonymous edits.
[16:19:30] Lucas_WMDE, jeh, there are a million things that can go wrong with creating a new VM, some of them interesting :) Our test service suggests that VM creation is mostly working, but if you reproduce the issue please point me (or someone) to the broken VM. Sorry for the inconvenience!
[16:19:45] * andrewbogott does a manual test in the meantime
[16:20:32] will do next time, thanks :)
[16:25:15] jeh: fyi there's a job running on cloudcontrol1003 called 'nova-fullstack' that creates a new VM every five minutes, validates that dns/puppet/ssh are working properly, and then deletes it. If any step fails it leaks the VM, and then it dies and alerts if some number of VMs (5? 7?) are leaked.
[16:25:34] You can see the number of leaks at any given point with
[16:25:34] OS_PROJECT_ID=admin-monitoring openstack server list
[16:25:59] (and I delete them by hand periodically if there isn't a torrent of failures)
[16:26:40] ok, good to know. thanks!
[16:27:10] * andrewbogott just created three fresh VMs by hand and they all came up just fine
[16:28:07] Lucas_WMDE: by chance, was your new VM in a project that has a local puppetmaster set with a project-wide setting?
[16:31:31] andrewbogott: I wouldn't know about that, but the project was wikidata-dev
[16:31:38] * andrewbogott looks
[16:33:26] did the vm name start with 'wikibase'?
[16:33:35] yes
[16:33:49] wikibase-mud (I logged it to our SAL too)
[16:33:51] ok, so VMs that have that name automatically get role::wikibase applied
[16:34:01] so if that role is broken in some way (such that puppet can't compile it)
[16:34:06] that would prevent anything from working
[16:34:11] hm, okay
[16:34:24] I don't remember seeing any errors in the log, but I didn't look very closely either
[16:34:28] I'll create another instance
[16:34:45] try creating one with a different name and then applying that role by hand after it's reachable; see if that's the issue
[16:35:54] !log wikidata-dev wikibase-test-T225307 testing T225307 with an instance whose name starts with wikibase-
[16:35:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[16:35:57] T225307: Can't SSH to new instances - https://phabricator.wikimedia.org/T225307
[16:36:05] !log wikidata-dev other-test-T225307 testing T225307 with an instance whose name does not start with wikibase-
[16:36:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[16:37:55] so far I'm getting permission denied on both instances
[16:38:00] but they're probably not fully ready yet
[16:42:13] okay, you're right, other-test-T225307 works
[16:42:13] T225307: Can't SSH to new instances - https://phabricator.wikimedia.org/T225307
[16:42:47] Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, Could not find data item profile::microsites::wikibase::server_name in any Hiera data file and no default supplied at /etc/puppet/modules/profile/manifests/microsites/wikibase.pp:3:18 on node wikibase-test-t225307.wikidata-dev.eqiad.wmflabs
[16:43:16] Lucas_WMDE: of course it could just be lucky timing :)
[16:43:30] Oh wait, nope, that looks like you found the issue with role::wikibase
[16:47:00] andrewbogott: https://phabricator.wikimedia.org/T225312 does that task look about right?
[16:47:17] I don't know much about Puppet, so perhaps I used some terminology in nonsensical ways :)
[16:48:39] I commented
[16:50:32] !log wikidata-dev other-test-T225307 deleted instance (no longer needed for T225312)
[16:50:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[16:50:36] T225312: role::wikibase in wikidata-dev Cloud VPS project broken - https://phabricator.wikimedia.org/T225312
[16:50:36] T225307: Can't SSH to new instances - https://phabricator.wikimedia.org/T225307
[16:50:57] ok, thanks
[16:56:40] andrewbogott: would those args be set in the "hiera config" section of the "prefix puppet" horizon page?
[16:56:44] the one that's currently just "{}"?
[16:56:52] Lucas_WMDE: yep!
[16:56:57] ok
[16:57:02] any syntax example I can look at? :)
[16:57:23] It's just key:value
[16:57:31] Or key:"string value"
[16:57:46] okay, I'll try
[16:59:24] and reboot to apply the change, I assume
[16:59:52] !log wikidata-dev wikibase-test-T225307 edited hiera config and rebooted (investigating T225312)
[16:59:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[16:59:55] T225312: role::wikibase in wikidata-dev Cloud VPS project broken (⇒ can't SSH into wikibase-* instances) - https://phabricator.wikimedia.org/T225312
[16:59:56] T225307: Can't SSH to new instances - https://phabricator.wikimedia.org/T225307
[17:02:21] hm, that doesn't seem to have done much
[17:04:10] I tried setting the hiera config of just the one instance, but if it's already so broken that it doesn't get new Puppet runs then I guess that doesn't help
[17:04:27] let's set the default hiera config for the prefix thingy and create another instance afterwards
[17:05:16] !log wikidata-dev wikibase-test-T225307 deleted instance, no longer needed for testing T225312
[17:05:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[17:05:19] T225312: role::wikibase in wikidata-dev Cloud VPS project broken (⇒ can't SSH into wikibase-* instances) - https://phabricator.wikimedia.org/T225312
[17:05:20] T225307: Can't SSH to new instances - https://phabricator.wikimedia.org/T225307
[17:05:54] Lucas_WMDE: no need to reboot, just "sudo puppet agent -tv"
[17:06:05] !log wikidata-dev set default hiera config for role::wikibase prefix
[17:06:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[17:06:19] andrewbogott: and how do I do that without SSH access? :)
[17:06:47] !log wikidata-dev set default hiera config for role::wikibase prefix (T225312)
[17:06:49] ah, I thought you were testing this on the already-accessible VM
[17:06:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[17:07:32] !log wikidata-dev wikidata-test-T225312 created instance
[17:07:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[17:07:42] no, that one is already deleted
[17:13:26] yaaaay, it's working
[17:13:38] cool!
[17:14:12] let's add a proxy to figure out where these config variables actually ended up
[17:15:20] !log wikidata-dev wikibase-test-T225312 added proxy
[17:15:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[17:15:23] T225312: role::wikibase in wikidata-dev Cloud VPS project broken (⇒ can't SSH into wikibase-* instances) - https://phabricator.wikimedia.org/T225312
[17:17:02] okay, https://wikibase-test-t225312.wmflabs.org/ gives me an nginx 404
[17:17:09] which is odd given the instance is running an apache o_O
[17:17:52] Lucas_WMDE: 404 means it reached your instance (otherwise you would get a different error).
[17:19:34] but nginx isn't even installed on the instance
[17:19:49] and apache's access.log is empty
[17:19:55] I'll try an SSH proxy instead
[17:20:12] yup, now I see the apache page
[17:21:02] !log wikidata-dev wikibase-test-T225312 sudo a2ensite 50-wikiba-se.conf
[17:21:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[17:21:05] T225312: role::wikibase in wikidata-dev Cloud VPS project broken (⇒ can't SSH into wikibase-* instances) - https://phabricator.wikimedia.org/T225312
[17:21:43] oh, that one only applies if the ServerName is what I put in the hiera config (server_name.example)
[17:21:52] not sure how this role is supposed to work, tbh
[17:22:05] but I think that's enough information to close the task at least
[17:22:15] and I'll document what I know on wikitech
[17:22:19] thanks a lot andrewbogott!
[17:28:01] you bet!
[17:34:59] !log wikidata-dev wikibase-test-T225312 deleted instance (test over)
[17:35:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[17:35:01] T225312: role::wikibase in wikidata-dev Cloud VPS project broken (⇒ can't SSH into wikibase-* instances) - https://phabricator.wikimedia.org/T225312
[20:43:22] !help Replag is ridiculous on web.db
[20:43:22] RF1dle: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[20:43:37] +30 mins on most shards
[20:44:05] ^ I can't read
[20:44:10] RF1dle: it's catching up. only s8 is that far behind right now
[20:44:15] give it 5 minutes and I bet it's clear
[20:44:40] ridiculous lag is 2+ days :)
[20:44:59] bd808: s8 is definitely bad; the rest are ~10 mins, worse than earlier, but s8 is improving from what I can see.
[20:48:07] That may be due to ongoing DB compression tasks. I wonder where that is now...
[20:50:11] 35 minute lag is not ideal, but it is really not horrible.
[20:50:30] no, it is not the compression, we only compress on offline dbs
[20:50:33] back in the olden days lag was routinely a week. We've come a long way in the last 4 years
[20:50:41] it is CPU at 100% due to queries
[20:51:15] Oh, ouch
[20:51:17] but we have 2 services precisely so that users can choose!
[20:52:11] If we had had more room in the next budget I would have put it into more db servers :)
[20:52:30] we're investing in storage first this time around
[20:52:54] Is there a more informative report than /Replag on tools where you can see history and stuff?
[20:53:15] bd808: no: people want real-time, instant analytics and < 0.01 latency with limited hardware
[20:53:30] if you cannot do that, how do you call yourself an engineer?
[20:53:39] :-P
[20:54:13] It is normally pretty good - that's why I pointed out how high it is.
[20:54:40] RF1dle: I don't think we have a dashboard that shows replag over time.
[20:54:47] bd808: we do
[20:54:54] oh? awesome
[20:55:02] https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=labsdb1009&var-port=9104
[20:55:10] that is for web
[20:56:09] and this is the 20 CPUs maxed out: https://grafana.wikimedia.org/d/000000274/prometheus-machine-stats?orgId=1&var-server=labsdb1009&var-datasource=eqiad%20prometheus%2Fops&from=1559336153068&to=1559940893068
[20:56:22] RF1dle: so if you look at https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=labsdb1009&var-port=9104&panelId=6&fullscreen&from=now-90d&to=now -- you can see that there is a bit of a bump in the last week, but it's not completely outside of normal
[20:59:24] That's helpful
[21:06:06] CPU seems to be better now, it will recover. My guess is some web app or cron job creating overload for some time
[21:07:26] I wouldn't doubt that it's related to the comment & actor changes. I need to add a bit to the docs bstorm_ wrote today about the new specialized views for those tables and get that info out to folks.
[21:07:56] thanks for the info you gave, jynus :)
[21:08:32] and I'm really surprised you didn't attribute your snarky answer to Abraham Lincoln ;)
[21:08:39] That history definitely points to the actor and comment changes.
[21:08:59] no surprise, of course
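Editor's note on the replag discussion above: besides the Grafana dashboards jynus linked, lag can also be checked from SQL on the replicas themselves. A minimal sketch, assuming the heartbeat_p.heartbeat view is available on the Wiki Replicas and exposes shard and lag columns (the Toolforge database help pages on wikitech document the exact schema):

```sql
-- Approximate replication lag per shard, highest first.
-- heartbeat_p.heartbeat is an assumption here; verify the view name and
-- columns against the wikitech documentation before relying on it.
SELECT shard, lag
FROM heartbeat_p.heartbeat
ORDER BY lag DESC;
```

Sorting by lag descending puts the most delayed shard (s8 in the discussion above) at the top.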