[13:33:55] !log tools remove /etc/init.d/rsyslog on tools-worker-XXXX nodes so the rsyslog deb prerm script doesn't prevent the package from being updated
[13:33:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:04:29] Hey folks. Do web requests to, say ores.wmflabs.org, pass through varnish. We're getting hit with a massive request load and I want to dig into the user-agents to find out if they gave us some contact info,
[14:25:20] halfak: they don't pass through the production varnish clusters for sure. I don't think they would pass through the deployment-prep varnishes in wmcs either.
[14:25:39] Thanks bblack
[14:25:54] I think it's more or less straight into whatever software you're running, although there might be some lightweight https revproxy involved somewhere
[14:26:04] I wonder if I'd have any other means of looking up the (probably good-faith applied) user-agents for these requests.
[14:26:19] yeah, there's a proxy that strips incoming IP and user-agent for privacy/security reasons.
[14:26:22] does ores itself not have logs of request headers and client ips, etc?
[14:26:26] before it hits my instances.
[14:26:27] ah, I see! :)
[14:27:57] https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy
[14:28:37] not a lot there which helps with this, though
[14:28:41] but I think that's the proxy we're talking about
[14:58:57] bblack, what do you think of putting wmflabs.org behind icinga so that we have request logs internally?
[14:59:05] *varnish
[14:59:09] Sorry I had icinga on the brain
[15:00:13] !log git Reset password for user Halfak per IRC request
[15:00:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL
[15:03:34] halfak: if you mean the production varnishes and all their tie-ins to e.g. analytics outputs, the answer is basically no.
There's a pretty big, intentional wall between production and cloud services at these kinds of layers that we like to maintain :) The only other public varnishes I know of within the WMCS/WMFLabs sphere are the deployment-prep test instances which handle e.g. https://en.wikipe
[15:03:40] dia.beta.wmflabs.org/ , but that's a specific and limited use for deployment testing stuff for MediaWiki and related.
[15:05:03] halfak: if it were me, I'd probably pursue asking WMCS about whether you can get logs, analysis, maybe even filtering/caching when necc, etc from the existing nova project-proxy stuff or something along those lines.
[15:05:49] unbreaking my line-broken link above: https://en.wikipedia.beta.wmflabs.org/ :)
[15:05:55] Makes sense. Thanks for your thoughts on the subject bblack :)
[15:06:37] np! :)
[15:07:03] I think my main question is: We like to tell API users to give us a useful user-agent so that we can track them down in case something is going weird. Where can I get access to the user-agents they provide when something does go wrong? Or alternatively, should I suggest they do something different to let us know who they are and how to get in contact with them?
[15:07:39] You could just do a floating ip and have traffic directed to the instance
[15:07:50] without using the web proxy
[15:08:00] so the url is like ..wmflabs.org
[15:11:07] paladox, hmm. That's interesting. I thought the proxy was required to strip these details for security/privacy.
[15:11:27] But even if that were the case, I would need a good way to store those logs.
[15:11:38] Speaking of which, I might have a rotating log of this stuff somewhere
[15:11:39] * halfak digs.
[15:12:40] Yeah. No IP or user-agent :|
[15:13:02] Oh no. There is a user-agent
[15:13:03] Hmm
[15:14:19] Yeah. I only get "-" for all requests
[15:23:53] halfak: I am looking at the nginx reverse proxy code for domainproxy (the proxy for *.wmflabs.org) and I do not see it stripping user-agent.
We do hide the origin IP from the downstream, but not UA.
[15:49:53] halfak i'm not entirely sure maybe wait for the cloud team to respond
[19:17:36] halfak, we can look at the request logs on the project-proxy/domainproxy servers if need be
[19:18:23] Hey! Just saw bd808's earlier comment. I should double-check that UAs get removed.
[19:18:29] Maybe I'd misunderstood that.
[19:18:30] To get access yourself you'd need to make a request and wait a week
[19:18:35] Thanks Krenair. I'll report back.
[19:18:52] it counts as a special infrastructure project for which people need NDAs etc
[19:19:02] Right. makes sense.
[19:22:49] Aha! I am getting some UAs.
[19:22:51] Nice.
[19:25:10] Yup. Looks like we got hammered by someone with the UA of "-"
[19:25:12] Dang.
[19:25:54] Let's say we wanted to figure out which IP/IP-range was hitting us and block it if necessary. What would be the best way to do that?
[19:25:58] Krenair, ^
[19:26:18] .* deny-from-all :P
[19:26:22] lol
[19:26:31] {{done}}
[19:26:35] ;)
[19:27:29] Could we skip the proxy and handle the IPs ourselves or does that violate something (possibilities, good practice, etc.)?
[19:27:55] halfak, someone with access to the proxy can dig through the access logs
[19:28:44] theoretically you can ask for a floating IP (if you don't already have one available in your quota) and handle whatever traffic you like (within the scope of the rules ofc)
[19:29:40] Gotcha. We have a rate limiting strategy based on IP address that I could implement if we got IPs with the requests.
[19:29:42] We do have a place to put a list of IP addresses banned from getting through the proxy
[19:30:04] That rate limiting strategy might be the best long term solution.
[19:30:06] added really recently: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/479041/
[19:30:40] Good to know. Thanks.
[19:30:50] maybe.
on the other hand we AIUI were historically running low on public IP addresses in the old region with a /25 and when we moved to neutron got a /24
[19:31:18] I don't know if wanting to filter on public IPs is gonna be enough to get a floating IP, maybe
[19:31:57] halfak, of course with a floating IP you will have to handle getting your own certs. shouldn't be a problem but you'll need to handle it
[19:32:07] Damn.
[19:32:26] * halfak adds that to the list of things he doesn't want to have to think about or maintain.
[19:33:48] no new plain-HTTP things pls
[19:36:51] halfak: you might chat with musikanimal. He has fought some of the same battles with xtools.
[19:37:22] Oh yes. I love learning from other people's headaches. So nice to avoid them myself :)
[19:37:27] Thanks bd808
[19:37:44] There is really no perfect fix in Cloud VPS. If we pass origin ips to the projects then they have toxic data to deal with (IP address is PII in Foundation thinking)
[19:38:01] and by not passing IP blocking options are limited
[19:38:01] Right. That makes sense.
[19:38:25] What do you think about rate limiting potential in the proxy as a configurable thing?
[19:38:39] E.g. max_connections_per_ip or something like that?
[19:39:53] that you can configure on a per-domain basis?
[19:40:15] and yeah musikanimal is a good suggestion, I recall digging through proxy logs for him
[19:40:47] Krenair, right. Per-subdomain or whatever
[19:51:27] I'm not entirely sure if nginx lets us do that, if it does I guess it becomes a question of whether it's worth it
[19:51:44] might be easy to set up an instance with a floating IP and a simple http-01 lets encrypt process
[19:54:53] I wonder why they determined IPs to be PII cause in reality it is public information
[19:54:53] * hauskater learns about floating ips
[19:55:23] nginx has some rate limit primitives built-in, but I've never used them.
The tricky bit in the project-proxy case would be figuring out how to expose control to the proxy creator
[19:58:00] Zppix: it's a sticky question no matter who is deciding. IP addresses are needed at the network level, but they are also "personal" in that most folks have no control of the information and it can easily be shown as a way to "identify" a human or small group of humans
[19:59:18] bd808: i guess, maybe its just because I don't mind people finding out my ip as much, that i just dont see the point, however i am against just having it out there for easily viewing but for logs and such like this I see no issue with it *shrugs* but i guess the lawyers/staff that made up that policy know what they are doing
[20:00:26] Zppix: the logs are the toxic data bit. What keeps a log file from being public? Only the control of that data and a lack of exploits that exfiltrate it from the protected space to a public space
[20:01:52] Collecting PII is not banned, but once you have PII on your hands you have a responsibility to maintain control of the data and actively manage its retention/deletion
[20:03:15] I like the analogy of PII & toxic waste. Both are the by-products of desired activities but both represent a long term risk that requires active management
[20:04:03] and there's a lot of different "you"s involved in the chain of custody there, too. If you're handing off PII to someone else, you have some responsibility to make sure they're taking it seriously as well, etc
[20:07:36] as for the whole debate about who decides on what basis that IPs are in fact PII ...
well that's a deep topic that surely has points of contention, but a convenient reference point is that the ECJ ruled it to be so:
[20:07:43] https://www.enterprisetimes.co.uk/2016/10/20/ecj-rules-ip-address-is-pii/
[20:08:12] so that ties directly into any relevant EU legal definitions of PII, etc
[20:33:13] !log tools Killed 2 /usr/bin/unattended-upgrade procs on tools-sgeexec-0923 that seemed stuck
[20:33:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:35:16] !log tools Manually running `/usr/bin/python3 /usr/bin/unattended-upgrade` on tools-sgeexec-0923
[20:35:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:26:49] !log tools Rebooting tools-sgeexec-0923 after lots of messing about with a broken update-initramfs build
[21:26:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:43:05] !log tools `sudo exec-manage repool tools-sgeexec-0923.tools.eqiad.wmflabs`
[21:43:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
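
Editor's note on the earlier user-agent question (finding which UA is behind a request spike, such as the "-" agent found around 19:25): one way to tally user-agents from an access log is sketched below. This is only a sketch under assumptions. It assumes the log uses nginx's standard "combined" format, which may not match what the project-proxy or an individual instance actually writes. The sample lines, the `/example` path, `ExampleBot`, and the function name `count_user_agents` are all made up for illustration, and the IPs are documentation-range addresses.

```python
import re
from collections import Counter

# Assumption: nginx "combined" access log format. The real proxy's
# log format may differ, in which case this pattern needs adjusting.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def count_user_agents(lines):
    """Count requests per user-agent string, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("user_agent")] += 1
    return counts


# Hypothetical sample data, not taken from any real log.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Jan/2019:14:00:01 +0000] "GET /example HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.10 - - [10/Jan/2019:14:00:02 +0000] "GET /example HTTP/1.1" 200 512 "-" "-"',
    '198.51.100.7 - - [10/Jan/2019:14:00:03 +0000] "GET /example HTTP/1.1" 200 512 "-" '
    '"ExampleBot/1.0 (contact@example.org)"',
    "not a log line",
]

if __name__ == "__main__":
    # Print user-agents from most to least frequent.
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
        print(hits, agent)
```

With a real log file, the same function can be fed an open file handle, since it only iterates over lines. Note that any log carrying client IPs is the "toxic data" discussed above, so retention and access still need to be managed.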