[11:12:59] Hello. I have a question regarding https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use.
[11:13:28] I have a script (Convenient Discussions) hosted on GitHub that I need to deploy to wikis when a new version is released. I'm looking for a way to do it automatically, and Toolforge is one of the solutions that comes to mind, but that would require running a job that would periodically check the repository for updates, which is not very convenient.
[11:13:36] A more viable solution seems to be GitHub Actions (https://docs.github.com/en/actions), but there is a problem — the IPs that GitHub uses for its network activity are blocked on Wikimedia wikis as open proxies.
[11:13:43] There are two ways out of it: 1) Ask for an IP block exemption right. 2) Use Toolforge as a proxy for GitHub Actions.
[11:13:55] 1 would require asking for an exemption on each wiki that has the IPs blocked locally (a global exemption doesn't override that). Not very convenient for a developer who wants to automate things.
[11:14:02] 2 seems perfect as it is a general solution that other script creators could use. But there is a problem — https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use currently states:
[11:14:09] "Prohibited Uses" - "Using Wikimedia Cloud Services as a network proxy: Do not use Wikimedia Cloud Services servers or projects to proxy or relay traffic for other servers. Examples of such activities include running Tor nodes, peer-to-peer network services, or VPNs to other networks. In other words, all network connections must originate from or terminate at Wikimedia Cloud Services."
[11:14:13] So, using Toolforge as a proxy for GitHub Actions technically violates this rule. All the examples given ("Tor nodes, peer-to-peer network services, or VPNs") seem irrelevant to my use case, but still, "all network connections must originate from or terminate at Wikimedia Cloud Services" — the network connection originates at GitHub and terminates at Commons.
[11:14:16] So, should I abandon the idea of using Toolforge in such a way, or does this paragraph not apply to my case, or is there something else that could be done?
[11:14:39] For the former too... Github uses a loooot of different IPs
[11:14:58] I'm not sure how a cron looking for git repo changes and then saving the update is necessarily "not very convenient"
[11:15:34] Hi again
[11:16:03] "Github uses a loooot of different IPs" — yeah, I'm the guy who posted the issue to Phab yesterday :-)
[11:20:40] "how a cron looking for git repo changes" — well, seeing the changes (almost) immediately after a release would be more convenient. Otherwise, I would still have to run a manual update if I want to see the new version in action as quickly as possible.
[11:49:45] With cron, I would also need to create an intermediate layer that would either do the whole build phase (and there is no user-friendly interface on Toolforge for that, but GitHub Actions was created specifically for that purpose), or take the files that I have already built and committed to GitHub (in the "dist" folder). That would result in repository clutter (a "dist" folder is usually not committed to the repository) and unnecessary steps.
[11:56:29] yurik (https://www.mediawiki.org/wiki/User:Yurik_(WMF)) initially came up with the "Toolforge as a proxy for GitHub Actions" idea, but we checked the Cloud Services rules and found what seems to be a contradiction, so here I am.
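[Editor's note: for context, the cron-style approach discussed above (a scheduled Toolforge job that polls the GitHub releases API and copies the built file to a wiki page) could look roughly like the following. This is a minimal sketch, not an existing tool: the repository slug, target page, state file, and authentication are placeholder assumptions, and a real job would need a proper bot-password login and CSRF token fetch via the MediaWiki API.]

    import requests

    REPO = "example-org/convenient-discussions"        # placeholder repository slug
    API = "https://commons.wikimedia.org/w/api.php"    # target wiki's API endpoint
    PAGE = "User:ExampleBot/convenientDiscussions.js"  # placeholder target page
    STATE_FILE = "last_release.txt"                    # remembers the last deployed tag

    def latest_release():
        # GitHub's "latest release" endpoint; returns the tag name and release assets.
        r = requests.get(f"https://api.github.com/repos/{REPO}/releases/latest", timeout=30)
        r.raise_for_status()
        return r.json()

    def last_seen():
        try:
            with open(STATE_FILE) as f:
                return f.read().strip()
        except FileNotFoundError:
            return None

    def main():
        release = latest_release()
        tag = release["tag_name"]
        if tag == last_seen():
            return  # nothing new since the previous run
        # Assumes the built bundle is attached to the release as its first asset.
        asset_url = release["assets"][0]["browser_download_url"]
        text = requests.get(asset_url, timeout=30).text
        session = requests.Session()
        # A real job would log in with a bot password here and fetch a CSRF token
        # (action=query&meta=tokens&type=csrf); omitted in this sketch.
        token = "PLACEHOLDER_CSRF_TOKEN"
        session.post(API, data={
            "action": "edit", "title": PAGE, "text": text,
            "summary": f"Update to {tag}", "token": token, "format": "json",
        }, timeout=30)
        with open(STATE_FILE, "w") as f:
            f.write(tag)

    if __name__ == "__main__":
        main()

[Run on a schedule (e.g. a Toolforge cron/jobs-framework entry); the webhook variant discussed later in the log avoids the polling delay.]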
[12:00:23] [telegram] Getting IPBE is not an option? Only a few wikis proactively block open proxies; mostly you should be fine with a few local flags and a global one
[12:30:06] You could use a GitHub webhook to ping an endpoint running in labs that then triggers the changes
[12:50:08] I guess that could work too. But what's the fundamental difference between this and just proxying the request? The difference seems superficial. I could just as well connect to Toolforge via SSH and run a command that would publish something on a wiki. Toolforge is still used as a proxy in essence.
[12:52:15] I'm just not sure that these use cases are what https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use means when it says "Using Wikimedia Cloud Services as a network proxy".
[13:03:32] So, technically, if Toolforge performs some task on a schedule (looks for changes in a remote repo, for example), then the network connection originates at Toolforge and there is no contradiction with the rules. If Toolforge receives some request from GitHub Actions (or even my local computer (?)) and updates some wiki page according to the request, then the network connection doesn't originate at Toolforge and there is a contradiction with the rules.
[13:04:15] But the essence of these two cases doesn't seem very different.
[13:55:07] Should I ask in some other place besides this chat?
[14:24:30] hi, heads-up re: T259143 as tomorrow EU morning I'll be upgrading the production Grafana, ok if I do grafana-labs too?
[14:24:31] T259143: Upgrade to Grafana 7 - https://phabricator.wikimedia.org/T259143
[14:25:31] godog: assuming that they're the same version currently, then yes please!
[14:26:44] jwbth: a phabricator task or email post would get you a more long-lasting record of the discussion.
[14:26:55] andrewbogott: AFAIK they are, yeah, ok will do!
[14:27:05] thanks!
[14:42:58] andrewbogott: Not sure there is a task here to post to Phabricator. Maybe https://discourse-mediawiki.wmflabs.org/?
[14:44:40] It would be perfect if I could reach out to people who can clarify Cloud Services' Terms of Use authoritatively.
[14:45:19] jwbth: I don't think anyone on staff follows the discourse forums, so the mailing list is a better bet
[14:45:31] s/staff/wmcs staff/
[14:45:59] OK, thanks
[15:24:19] !log tools Rebuilding all Docker containers to pick up newest versions of installed packages
[15:24:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:13:49] !log tools.dewikivpncheck Edited live Deployment and deploy.yaml to update Image name
[16:13:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.dewikivpncheck/SAL
[16:50:38] !log wikilabels u_wikilabels=> update campaign set active = false where id = 81
[16:50:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL
[17:00:14] need a bit of a clue on how to set environment variables in the kube webservice
[17:00:40] \o @bd808 if you're free
[17:02:22] we really don't have anything fancy for that today. I have done it mostly using something in my application layer like https://pypi.org/project/python-dotenv/ or a homegrown work-alike
[17:03:16] it can be done other ways if you are running a custom deployment rather than using `webservice`. A web search can probably help you figure that out
[17:04:13] my custom deployments via shell scripts are fine bc I can set them in the script
[17:04:26] is there a guide to writing custom webservices on toolforge?
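[Editor's note: a minimal sketch of the application-layer approach bd808 mentions above, using python-dotenv. The .env location and variable names here are only examples.]

    import os
    from dotenv import load_dotenv

    # Reads key=value pairs from a .env file (kept out of version control) into
    # os.environ, so the rest of the app only ever looks at environment variables.
    load_dotenv()

    DB_PASSWORD = os.environ["TOOL_DB_PASSWORD"]        # example variable name
    DEBUG = os.environ.get("TOOL_DEBUG", "0") == "1"    # optional, with a default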
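[Editor's note: and a minimal sketch of the webhook idea suggested at 12:30, as a small Flask endpoint on Toolforge that GitHub pings on each release. The route, the secret variable name, and the deployment step are placeholder assumptions; GitHub does sign webhook deliveries with an X-Hub-Signature-256 header, which is what the check below verifies.]

    import hashlib
    import hmac
    import os

    from flask import Flask, abort, request

    app = Flask(__name__)
    SECRET = os.environ.get("WEBHOOK_SECRET", "").encode()  # placeholder secret name

    def signature_ok(payload: bytes, header: str) -> bool:
        # GitHub signs each delivery with HMAC-SHA256 of the body using the shared secret.
        expected = "sha256=" + hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, header or "")

    @app.route("/github-webhook", methods=["POST"])
    def github_webhook():
        if not signature_ok(request.get_data(), request.headers.get("X-Hub-Signature-256")):
            abort(403)
        if request.headers.get("X-GitHub-Event") == "release":
            # The actual deployment step (fetch the release asset, edit the wiki page)
            # is left out; it would be the same logic as the polling sketch above.
            pass
        return "", 204

    if __name__ == "__main__":
        app.run()

[Whether receiving such a ping still counts as "proxying" under the Terms of use is exactly the open question raised above, so this is purely an illustration of the mechanics.]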
[17:05:55] qedk: https://wikitech.wikimedia.org/wiki/Help:Toolforge/How_to -- there are some tutorials there
[17:08:57] will probably use a config file, way too many config files :P
[17:09:09] thanks @bd808
[17:09:53] bd808: thanks for the STS headers on the proxy, but it looks like you're always adding a header no matter what is already present (such as an existing STS header). Is it possible to tweak the config so it only adds an STS header if one is not already present?
[17:10:12] I don't know the correct place to bring this up, but the flask on toolforge guide recommends using a file for app config, while the official flask docs describe that behaviour as finicky (and it doesn't work on a lot of systems)
[17:11:52] stwalkerster: maybe... right now the HSTS header is sent from outside of the proxied service block in the nginx config. We would have to change that, I think, to allow the proxied backend to set its own header instead
[17:12:03] qedk: it works on toolforge :)
[17:14:11] bd808: gotcha, for now I'll just stop publishing an HSTS header and defer to the global one, though I'd prefer to have a longer header set than a day. :)
[17:14:20] yeah, it does :P my local flask env doesn't even go into debug mode without an environment variable set beforehand (outside the program)
[17:15:17] i remember having a conversation about this on phab and I think bd808 had some idea on how to deal with multiple hsts headers
[17:18:51] it might be possible with something like this -- https://stackoverflow.com/questions/31017524/how-to-add-headers-in-nginx-only-sometimes -- but I'm not sure how we would keep track of whether we have switched to TLS already or not in practice.
[17:19:17] it's hacks all the way down at this point ;)
[17:21:11] I know Apache has a "setIfEmpty" option when setting headers; though I know that's not helpful for nginx. It would surprise me if there wasn't a way to do it though.
[17:22:12] eh, I've stopped publishing the 6mo header for now, if there's an easy way to allow custom ones, I'd appreciate it. I'm guessing the 1-day expiry will be increased to something more sizable at some point in the future anyway?
[17:22:15] we should confront nginx with their marketing in that case ;)
[17:22:28] :D
[17:23:13] nginx is a nice http server to work with, but the way we are abusing it for our domain proxy service makes some things challenging
[17:29:30] a graphical representation: https://xkcd.com/2347/
[17:53:45] bd808: bump on the license thing
[20:28:09] did you delete ores instances a couple hours ago? icinga monitoring is all red for ORES
[20:28:17] "name or service not known" though
[20:29:08] * halfak perks up.
[20:29:41] I don't know anything about ORES instances getting deleted.
[20:31:05] halfak: click the image on https://phabricator.wikimedia.org/T260732
[20:31:17] see how it monitors the multiple backends
[20:31:24] -04 -05 and -06
[20:31:33] but since a couple of hours ago these have all been unknown
[20:32:00] (the comments are unrelated since they are from January)
[20:32:45] Looks like the instances are still recognized by horizon.
[20:33:21] I can log into ores-web-04
[20:33:41] uwsgi is running
[20:34:04] I can access ORES via localhost:8080
[20:34:10] ok, good. then something else must have happened. maybe a change in networking ACLs between prod icinga and wmcs
[20:35:11] Aha. Seems likely.
[20:43:51] halfak: i feel like maybe it is because the wmflabs.org domain is being replaced with wmcloud
[20:44:00] all these are trying to check ores.wmflabs.org
[20:44:12] That domain still works for me.
[20:44:41] hmm. yea. confirmed on the monitoring server
[20:44:47] tries to debug
[20:45:10] mutante: I bet it is the new TLS redirect
[20:45:35] mutante: https://wikitech.wikimedia.org/wiki/News/HTTPS_enforcement_at_shared_proxy -- that went live today
[20:45:38] bd808: hmm. ok. "name or service not known" sounds DNSish though?
[20:45:45] looks
[20:50:43] mutante: I'm pretty sure we have seen in the past that icinga checks using http:// do not follow redirects to https://
[21:33:27] bd808: yea, but it uses '-f follow' and gives the same error when using https. the timing definitely matches the novaproxy change though.
[21:53:04] mutante, bd808: I left some notes and a patch on the task... it's not clear to me how this *ever* worked...
[21:57:03] it worked because there is "proxy_pass http://oresweb;" in the nginx config
[21:57:45] it isn't a simple check_http check where the host and URL are the same. it checks 4 times, always the same host, but with a different URL which includes the backend node name
[21:58:01] There's not though.
[21:58:27] the nginx we'll be hitting here is the labs novaproxy.
[21:59:03] which won't know anything about ORES, beyond a mapping somewhere saying to send `Host: ores.wmflabs.org` stuff to a particular backend
[22:00:44] though come to think of it we won't even be getting that far
[22:01:09] if my understanding of the nagios commands etc. is right, this thing is actually trying to do a DNS lookup on `oresweb`, which is going to fail
[22:01:19] i.e. it's not actually attempting to send anything to ores.wmflabs.org
[22:05:27] the strings "oresweb", or rather "oresweb/node/ores-web-05", "oresweb/node/ores-web-04" etc., in the check_command are not host names. the hostname is ores.wmflabs.org.
[22:07:50] it's being used as a hostname by this command
[22:10:47] check_ores_workers $HOSTADDRESS$ '$ARG1$'
[22:10:53] what you are seeing is $ARG1
[22:11:27] when that is translated to check_http, the $HOSTADDRESS is -H and $ARG1 is -u
[22:12:40] check_http -f follow -H "$host" -I "$host" -A "${user_agent}" -u "http://${urlhost}/v3/scores/fakewiki/${timestamp}/"
[22:13:07] ^ the $host part in this is unchanged. it's about the -u part, $urlhost
[22:36:51] yes I know
[22:37:05] that's the idea
[22:37:26] oresweb is not valid in that -u, it's not an existing DNS name
[22:47:56] i don't think DNS matters. as long as there is a ServerName in the server config and the client sends the same name in the Host: header and it matches, the server should serve it
[22:50:38] it can never get to ores without it being ores.wmflabs.org
[22:50:53] regardless of whatever ores put in their backend nginx configs
[22:51:08] but it's all irrelevant
[22:51:17] because the client never gets even that far
[22:51:27] $ /usr/lib/nagios/plugins/check_http -f follow -H "ores.wmflabs.org" -I "ores.wmflabs.org" -A "wmf-icinga/something (root@wikimedia.org)" -u "http://oresweb/v3/scores/fakewiki/$(/bin/date +%s)/"
[22:51:27] Name or service not known
[22:51:27] HTTP CRITICAL - Unable to open TCP socket
[22:51:37] it does not recognise oresweb
[22:53:50] the name does not exist
[22:56:16] it does not have to exist, you can ask for http://DOESNOTEXIST and it will still work
[22:56:24] the difference is the -f follow
[22:57:02] the DNS lookup is for what comes after -H
[22:57:26] /usr/lib/nagios/plugins/check_http -H "ores.wmflabs.org" -I "ores.wmflabs.org" -A "wmf-icinga/something" -u "http://DOESNOTEXIST/v3/scores/fakewiki/$(/bin/date +%s)/"
[22:57:29] HTTP OK: HTTP/1.1 301 Moved Permanently
[22:58:10] all that it does with the -u string is to match it against the server names in the webserver config
[22:58:48] it's like using curl with some IP and then sending the "Host: foo" header. it could be any string
[22:59:14] try removing the "-f follow" part though and it makes things work
[22:59:27] which is due to the new protocol redirect somehow
[23:02:12] the real fix should be just adding -S for https
[23:02:22] that becomes a 200 now
[23:18:44] <mutante> all that it does with the -u string is to match it against the server names in the webserver config
[23:19:13] This can't be the whole story, as a 'Host: oresweb' header won't get your request routed to an ORES server that knows what to do with that
[23:19:36] it does, that's the -H $HOSTADDRESS$ part
[23:21:19] I think maybe this 'oresweb' bit is ignored if we specify -S, but is required to be changed to ores.wmflabs.org if we don't specify -S?
[23:22:12] $ /usr/lib/nagios/plugins/check_http -f follow -H "ores.wmflabs.org" -A "wmf-icinga/something (root@wikimedia.org)" -u "http://jgbhklkmjhbjgvuyhiojknbhyghijhbgvuyhjnhbgvfc/v3/scores/fakewiki/$(/bin/date +%s)/" -S
[23:22:12] HTTP OK: HTTP/1.1 200 OK - 1023 bytes in 0.036 second response time |time=0.035618s;;;0.000000 size=1023B;;;0
[23:23:37] I don't fully understand this separate -S option though
[23:23:44] it means to use HTTPS
[23:23:48] well yeah but
[23:24:11] like what does it not understand about the schema in the URL?
[23:25:23] I think maybe our use of -u is confusing everything here
[23:25:30] krenair@labs-bootstrapvz-jessie:~$ /usr/lib/nagios/plugins/check_http -f follow -H "ores.wmflabs.org" -A "wmf-icinga/something (root@wikimedia.org)" -u "/v3/scores/fakewiki/$(/bin/date +%s)/" -S
[23:25:31] HTTP OK: HTTP/1.1 200 OK - 1019 bytes in 0.078 second response time |time=0.078062s;;;0.000000 size=1019B;;;0
[23:25:31] krenair@labs-bootstrapvz-jessie:~$ /usr/lib/nagios/plugins/check_http -f follow -H "ores.wmflabs.org" -A "wmf-icinga/something (root@wikimedia.org)" -u "/v3/scores/fakewiki/$(/bin/date +%s)/"
[23:25:31] HTTP OK: HTTP/1.1 200 OK - 1019 bytes in 0.091 second response time |time=0.090772s;;;0.000000 size=1019B;;;0
[23:25:38] if you leave -u as just the path
[23:27:28] which seems to be what the --help suggests - ` -u, --url=PATH` and `URL to GET or POST (default: /)`, sort of
[23:29:05] whole thing feels weird
[23:45:47] mutante: by the way your patch added a -S inside the -u string
[23:53:20] Krenair: thanks, i fixed it. and that setup is like this on purpose. it is to check each individual backend behind a proxy. so it will ask the same host 4 times but each time send a different URL, and the backend name is part of that URL, and the (reverse) proxy knows about the state of the backends. so if we dropped that we would not be checking each backend
[23:54:30] icinga is green again now at https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=ores.wmflabs.org
[23:55:07] so one check has "ores-web-04", the next "ores-web-05" etc
[23:55:11] yeah that is a thing, but it's also irrelevant to this, it's just another bit of the path
[23:55:37] krenair@labs-bootstrapvz-jessie:~$ /usr/lib/nagios/plugins/check_http -f follow -H "ores.wmflabs.org" -A "wmf-icinga/something (root@wikimedia.org)" -u "/node/ores-web-04/v3/scores/fakewiki/$(/bin/date +%s)/"
[23:55:38] HTTP OK: HTTP/1.1 200 OK - 1023 bytes in 0.076 second response time |time=0.076059s;;;0.000000 size=1023B;;;0
[23:55:38] it's not irrelevant because nginx has a reverse_proxy setup
[23:55:38] etc. etc.
[23:56:05] look, it used to work. the only thing that changed is the https redirect, we are telling it to now use https and.. it is fixed
[23:56:57] it is irrelevant to this.
[23:57:35] the exact path we give it is not the problem here
[23:57:40] i identified the issue, explained why it's broken, added the fix, and fixed it
[23:58:06] it was not related to a missing DNS record
[23:58:21] as evidenced by it working again as it did before
[23:58:40] I think it was.
[23:59:25] "Name or service not known"? what else is going to give that?