[06:45:05] Hi. I was going through the api.php docs and was a little confused about the difference between the https://login.wikimedia.org/w/api.php?action=help&modules=query%2Ballpages and https://www.mediawiki.org/w/api.php?action=help&modules=query%2Ballpages pages.
[06:45:05] basically login.wikimedia and wikimedia.
[06:45:05] I had been using the wikimedia page; some error in the code directed me towards the former page.
[09:18:27] tanny411: login.wikimedia.org is used for central auth only, so probably the api is somehow limited compared to mediawiki's. You can try asking on some of the other channels though for more details
[10:33:02] !log admin purging rbd snapshots for image fc6fb78b-4515-4dcc-8254-591b9fe01762 (T270478)
[10:33:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:33:05] T270478: [ceph] Make sure rbd snapshots are being cleaned up - https://phabricator.wikimedia.org/T270478
[10:42:25] !log puppet-diffs updated facts from the tools project: `PUPPET_MASTER="tools-puppetmaster-02.eqiad.wmflabs" modules/puppet_compiler/files/compiler-update-facts`
[10:42:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet-diffs/SAL
[10:52:33] !log toolsbeta live-hacking local puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/650470 (T267966)
[10:52:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:52:38] T267966: Add more k8s-etcd nodes to the cluster on tools project (and investigate performance issues) - https://phabricator.wikimedia.org/T267966
[16:13:05] !log admin setting autoscale to 'off' for both ceph pools (eqiad1-compute and eqiad1-glance-images) because we like how things are set and the autoscaler does not
[16:13:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:21:04] !log admin removing dangling rbd snapshots (for backups on cloudvirt1024) (T270478)
[16:21:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:21:08] T270478: [ceph] Make sure rbd snapshots are being cleaned up - https://phabricator.wikimedia.org/T270478
[16:43:12] Is a Cloud VPS writing to the filesystem still considered slow, or did that stop being a problem once we stopped over-relying on the bad shared storage solution whose name I forgot?
[16:45:05] Namely, writing a lot of small files
[16:45:31] hare: it depends a bit on context. Writing to an nfs shared mount is and continues to be bad
[16:45:36] local storage should be ok
[16:46:08] there are throttles on default VMs which should prevent you from stealing iops from other VMs.
[16:47:52] !log admin finished cleaning up the dangling snapshots from cloudvirt1024, freed ~12% of the capacity (T270478)
[16:47:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:47:56] T270478: [ceph] Make sure rbd snapshots are being cleaned up - https://phabricator.wikimedia.org/T270478
[16:51:34] !log admin removing dangling rbd snapshots (for backups on cloudvirt1023) (T270478)
[16:51:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:54:19] !log admin finished cleaning up the dangling snapshots from cloudvirt1023 (T270478)
[16:54:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:54:23] T270478: [ceph] Make sure rbd snapshots are being cleaned up - https://phabricator.wikimedia.org/T270478
[16:55:36] !log admin removing dangling rbd snapshots (for backups on cloudvirt1022) (T270478)
[16:55:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:56:57] !log admin finished cleaning up the dangling snapshots from cloudvirt1022 (T270478)
[16:56:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:58:44] !log admin removing dangling rbd snapshots (for backups on cloudvirt1021) (T270478)
[16:58:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:00:40] !log admin finished cleaning up the dangling snapshots from cloudvirt1021 (T270478)
[17:00:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:00:48] T270478: [ceph] Make sure rbd snapshots are being cleaned up - https://phabricator.wikimedia.org/T270478
[17:04:28] bd808 after having raised the ulimit, we are no longer hitting these limits, but we suspect we are hitting something else now.
[17:04:36] Can you shed some light on this?
[17:05:12] Skynet: you are going to need to provide a lot more details.
[17:05:18] !log admin removing dangling rbd snapshots (for backups on cloudvirt1025) (T270478)
[17:05:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:06:19] I'm sure hare has been discussing with you about connection and file socket limits on a VM.
[17:06:19] !log admin finished cleaning up the dangling snapshots from cloudvirt1025 (T270478)
[17:06:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:06:22] T270478: [ceph] Make sure rbd snapshots are being cleaned up - https://phabricator.wikimedia.org/T270478
[17:06:26] bd808 ^^^
[17:06:59] IABot has been suffering from high levels of false positives. As it turns out, many requests aren't making it to the TCP stack
[17:07:47] You suggested raising the ulimit, which we have done, and we noticed the problem drop considerably, but it's still hitting a limit, just not the ulimit
[17:08:57] !log admin removing dangling rbd snapshots (for backups on cloudvirt1026) (T270478)
[17:08:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:09:29] !log admin finished cleaning up the dangling snapshots from cloudvirt1026 (T270478)
[17:09:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:13:07] Skynet: you seem to be asking me to randomly guess what the next bottleneck in your application is. I can't really do that level of magic. I might be able to help given a concrete problem, but I don't have the level of understanding of your code and deployment to give a solution without y'all doing the work to at least find a potential bottleneck.
[17:14:03] Actually, I'm wondering what kind of throttles the Cloud VMs might be set up with that could cause TCP requests to be aborted before even opening.
[17:14:18] none that I know of
[17:14:37] You suggested ulimit, but I'm wondering if the VM itself might be getting throttled since it's on a shared cluster?
[17:14:57] throttled how?
[17:16:10] there is certainly time sharing on the underlying physical hardware, but nothing that would stop the guest kernel from managing its networking layer
[17:16:57] bd808 okay, thank you. Just needed to know if I was hitting a limit being imposed by the host on the guest.
[17:17:49] what is the design of your service? what level of parallelism are you targeting? what metrics are you tracking to determine if you are operating within the constraints of your instance?
[17:18:38] None currently. Before a few days ago, I thought I was suffering from a problem with the code itself.
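(The "raising the ulimit" step discussed above can also be done from inside a process. A minimal Python sketch, my illustration rather than anything from IABot itself; it assumes a Unix host, where the stdlib `resource` module exposes the same per-process open-files limit that `ulimit -n` shows:)

```python
import resource

def raise_nofile_soft_limit():
    """Return (soft, hard) after raising the soft open-files limit to the
    hard limit -- roughly what "raising the ulimit" amounts to for a
    single process, done in-process instead of via the shell's ulimit -n."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and soft < hard:
        # An unprivileged process may raise its own soft limit,
        # but only up to the hard limit.
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

print(raise_nofile_soft_limit())
```

(Raising the hard limit itself, as bd808's suggestion presumably involved, requires root or an `/etc/security/limits.conf` change; that part is outside what an unprivileged process can do.)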
[17:18:38] It sounds vaguely to me like you are experiencing growth in usage and that is leading to performance issues
[17:19:04] Yea. As we expand to more wikis, the bot is checking dead links with more concurrency.
[17:19:21] Unknowingly, we hit kernel limits. :p
[17:31:31] incidentally, thanks Skynet for working on this very important service our communities love :)
[17:32:03] oh it's you Cyberpower :D
[17:35:52] Nemo_bis yes it is. :D
[17:36:24] And I'm glad to do it.
[18:00:32] Skynet: do you use NFS in this service?
[18:01:17] throttles exist mostly for hosts that use that, but they don't block initiation of connections per se. It's more that it can cause retransmits and things that slow it down
[18:02:05] I would be very surprised if they caused what you describe
[18:12:37] bstorm: Not really. IABoy'
[18:12:51] *IABot's VM uses the local virtual disk
[18:15:18] I mean if the host it is on is NFS connected at all
[18:15:28] If it is not, then NFS throttles are not applied
[18:37:09] !log tools set profile::wmcs::kubeadm::etcd_latency_ms: 15 T267966
[18:37:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:37:16] T267966: Add more k8s-etcd nodes to the cluster on tools project (and investigate performance issues) - https://phabricator.wikimedia.org/T267966
[19:29:46] hi - on enwiki a number of block notices send people to whatismyip.com to find out their IP address, e.g. https://en.wikipedia.org/wiki/Template:Colocationwebhost
[19:30:13] but that's a private website that serves ads, which seems like a suboptimal solution
[19:30:41] is there a Wikimedia Cloud tool that returns IP addresses? if not, is there any possibility we could build one? see also https://en.wikipedia.org/wiki/Template_talk:Colocationwebhost#Developing_a_new_tool_to_display_IP_addresses
[20:09:32] L235: Toolforge does not allow the tools hosted there to see the real ip address of the visiting web browser, as a privacy requirement. Cloud VPS projects can ask for an exemption to ip address hiding, so a what's-my-ip type service could in theory be built there.
[20:10:11] That makes sense
[20:10:27] What kind of approval is needed for the exemption?
[20:12:21] It would require review by the WMCS team in their weekly meeting. Mostly we would want an assurance from a trusted Wikimedian that they would properly handle the IP address information as PII (personally identifiable information).
[20:14:29] Cloud VPS (and by extension Toolforge) is only IPv4 reachable, so if IPv6 information was desired it would not currently be possible
[20:15:54] there are a lot of IP display services on the internet. It seems reasonable that there would already be a privacy-respecting one somewhere.
[20:28:55] L235: it should be pretty straightforward to have a gadget show people their IP addresses
[20:29:24] https://en.wikipedia.org/w/api.php?action=query&meta=userinfo&callback=
[20:41:28] legoktm: any suggestions what I should do next if I want to pursue an improvement to that template and others?
[20:43:40] L235: I would ask someone to write a bit of JS that uses the API endpoint I linked to display the IP address wherever you'd like it
[20:43:59] and that should be a default-on gadget?
[20:44:55] yeah. I'd think it would be 10-15 lines of code
[20:46:34] !log admin setting pg and pgp number to 4096 for eqiad1-compute as joachim thinks 8192 might be too much T270305
[20:46:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[20:46:38] T270305: Ceph performance tuning - https://phabricator.wikimedia.org/T270305
[20:47:36] legoktm: it doesn't get the IP when you're logged in, I just tried
[20:48:03] visiting https://en.wikipedia.org/w/api.php?action=query&meta=userinfo&callback= ?
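(The meta=userinfo endpoint legoktm linked returns its JSON wrapped in a callback function when requested as JSONP. As a sketch of the unwrapping step the proposed gadget would need — in Python rather than the gadget's JS, and with an invented helper name and sample payload; the exact wrapper shape MediaWiki emits is an assumption here:)

```python
import json
import re

def unwrap_jsonp(payload: str) -> dict:
    """Strip a JSONP wrapper like '/**/cb({...})' down to the JSON body.
    Sketch only: real gadget code would let the browser's JSONP machinery
    do this, and MediaWiki's exact wrapper may differ."""
    match = re.search(r'\((.*)\)\s*;?\s*$', payload, re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP payload")
    return json.loads(match.group(1))

# Invented sample resembling a logged-out meta=userinfo response,
# with a documentation-range placeholder IP:
sample = '/**/cb({"query": {"userinfo": {"id": 0, "name": "192.0.2.1", "anon": ""}}})'
info = unwrap_jsonp(sample)["query"]["userinfo"]
print(info["name"])  # for an anonymous request, "name" is the IP address
```
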
[20:49:02] https://usercontent.irccloud-cdn.com/file/Td47t0iA/image.png
[20:49:06] legoktm: ^
[20:49:49] you have to make the request over JSONP, which adds the callback= parameter and does the magic :)
[20:50:19] see https://www.mediawiki.org/wiki/API:Cross-site_requests#GET_request
[20:50:41] ah, well, clearly I am in over my head haha
[20:50:43] the API processes all JSONP requests as logged out, which is why it'll reveal your IP even if you're logged in
[20:51:28] I volunteer enterprisey to do it
[20:51:44] enterprisey gets volunteered to do a lot of things heh
[20:52:04] at the virtual WCNA this year he coded a script live in 21 minutes
[20:52:27] to show notifications as browser notifications
[20:55:48] the reward for a job well done is 3 more jobs
[20:56:25] so true
[21:12:55] made a cloud VPS instance and waited overnight but can't SSH to it. can SSH to other existing cloud instances. console log showed it finished the install. hmmm
[21:13:07] just delete and try again?
[21:37:10] mutante: that's the first thing to try! If it doesn't work the second time around, ping me and I'll look
[21:37:33] 95% of the time the issue is "there is a pre-set puppet config running on that VM which fails, so puppet never gets as far as setting up logins"
[21:37:39] that's something you should be able to see in the log
[21:45:52] andrewbogott: thanks, I will try it again and make sure it has only "insetup"
[21:47:36] in the console log I can see it ran puppet after I put "insetup" on it though
[21:47:52] and then it shows an automatic root login
[21:50:11] what's the fqdn?
[21:59:53] andrewbogott: doc1002.devtools. but the rest not so sure
[22:00:02] I deleted and recreated
[22:00:21] 172.16.1.5
[22:00:24] should be at doc1002.devtools.eqiad1.wikimedia.cloud
[22:00:29] looks like it's not up yet
[22:00:45] The last Puppet run was at Fri Dec 18 21:53:32 UTC 2020 (0 minutes ago).
[22:01:10] ok, let's just wait a little longer
[22:01:39] but there is nothing happening after this in the console
[22:04:17] mutante: wfm both as user and root
[22:04:55] it doesn't for me, while it works for other instances in the same project, using the same config... wut
[22:06:37] show me your ssh line?
[22:09:01] https://phabricator.wikimedia.org/P13607
[22:09:08] first is fail, second works
[22:09:38] same proxy line...
[22:10:09] same SSH versions.. uhm
[22:12:39] my cloud root key gets into doc1002.devtools.eqiad1.wikimedia.cloud with no issues
[22:14:39] mutante: doc1002.devtools is clearly inserting some kind of magic to qualify the domain
[22:14:44] and it's probably using .eqiad.wmflabs
[22:14:46] I can directly ssh dzahn@restricted.bastion.wmcloud.org
[22:14:49] which is not a valid tld for newly created hosts
[22:14:54] but the difference seems to be in DNS
[22:14:59] mutante: the more interesting question is actually how the second one works. Neither of those partial FQDNs resolves on restricted.bastion
[22:15:00] dzahn@bastion-restricted-eqiad1-01:~$ host deploy-1002
[22:15:00] deploy-1002.eqiad.wmflabs has address 172.16.0.238
[22:15:03] works ^
[22:15:08] dzahn@bastion-restricted-eqiad1-01:~$ host doc1002
[22:15:08] Host doc1002 not found: 3(NXDOMAIN)
[22:15:11] ^ doesn't
[22:15:43] for new hosts a simple hostname is not considered specific enough to resolve
[22:15:56] doc1002.devtools should resolve on a bastion
[22:17:07] hm… maybe not
[22:17:09] andrewbogott: at least on bastion-restricted-eqiad1-01.bastion.eqiad.wmflabs the /etc/resolv.conf does not have eqiad1.wikimedia.cloud as a search domain.
[22:17:16] And honestly I think that's correct
[22:17:30] folks should use fqdns
[22:17:32] in any case I'm pretty sure this will work for mutante if you use the fqdn
[22:18:07] Host deploy-1002.devtools not found: 3(NXDOMAIN)
[22:21:14] ok, I gotta change the entire ssh config for this change and find another way to write down my host names
[22:23:27] mutante: here is some (maybe?) useful context https://lists.wikimedia.org/pipermail/cloud-announce/2020-August/000307.html
[22:24:04] https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances#ProxyJump_(recommended) may be useful as well
[22:24:07] andrewbogott: confirmed working and on doc1002 now. this did the trick for me:
[22:24:14] ProxyCommand ssh -W %h.eqiad1.wikimedia.cloud:%p dzahn@restricted.bastion.wmcloud.org
[22:24:30] added the eqiad1 suffix there; it was just %h before
[22:24:42] thanks for the links and help
[22:26:24] the docs above are basically what I used before though and still just use %h
[22:29:13] or not, but it's different ways to get there and I don't have to memorize the instance names. either way ok now
[22:29:37] %h is whatever you type after ssh, and above that we tell folks to use fqdns
[22:42:34] I'm having issues sshing into a new instance today
[22:42:57] (`ssh -v skins.skins.eqiad1.wikimedia.cloud` )
[22:43:16] it seems to be rejecting my public key but it's listed in https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-openstack
[22:45:25] Jdlrobson: maybe the same issue I had? you can ssh into old instances but not new ones?
[22:45:32] yep
[22:45:34] exactly
[22:45:43] see the immediate backlog
[22:45:56] my global root key is not working there either Jdlrobson
[22:46:22] It works for me now when using a FQDN (in the ProxyCommand)
[22:46:40] mutante: skins.skins.eqiad1.wikimedia.cloud works for you?
[22:47:31] Jdlrobson: there is no project named skins according to openstack-browser
[22:47:35] bd808: No, I am talking about my instance (that was also new)
[22:47:42] 😱
[22:48:28] right, skins.reading-web-staging.eqiad1.wikimedia.cloud is what I need
[22:48:33] thanks for clearing that up
[22:48:39] sure! :)
[22:48:42] how can I make this something readable, e.g. ssh skins.wmflabs.org
[22:49:01] I'm useless when it comes to ssh
[22:49:36] you can set up some remapping rules in your ~/.ssh/config, but I'm not sure what is not readable about the fqdn
[22:50:45] and it should be pretty easy to set up hostname tab completion based on your known_hosts file, so you really just need to type `ssh skins.` to get there
[22:58:55] Jdlrobson: one line just "Host skins.reading", another line "ProxyCommand ssh -W %h.eqiad1.wikimedia.cloud:%p jdlrobson@bastion.wmcloud.org", and then "ssh skins.reading"
[23:53:24] !log admin truncated haproxy.log.1 on cloudcontrol1003
[23:53:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
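(mutante's Host/ProxyCommand suggestion above, written out as an ~/.ssh/config stanza. Note this is a sketch with one assumption of mine: since ssh substitutes the name typed on the command line for %h, I have used the instance's full short name so that the appended suffix yields the real FQDN skins.reading-web-staging.eqiad1.wikimedia.cloud; mutante's literal "Host skins.reading" would expand to a different name.)

```
# ~/.ssh/config sketch: jump through the bastion, appending the
# eqiad1.wikimedia.cloud suffix to whatever host name is typed.
Host skins.reading-web-staging
    ProxyCommand ssh -W %h.eqiad1.wikimedia.cloud:%p jdlrobson@bastion.wmcloud.org
```

With that in place, `ssh skins.reading-web-staging` reaches the instance via the bastion.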