[00:29:04] imagines a world where icinga detects wikibugs quitting and uses eventhandlers to restart it
[00:29:19] but that would require prod talking to cloud
[00:29:33] and then i think again we should use a dedicated ganeti prod vm for the important bots
[00:36:16] serviceops, Gerrit, Operations, Patch-For-Review: Convert Gerrit to use H2 as the database - https://phabricator.wikimedia.org/T211139 (Dzahn) say something random to make wikibugs rejoin a channel
[02:03:10] serviceops, Analytics, Analytics-Kanban, EventBus, and 3 others: Set up LVS for eventgate-main on port 32192 - https://phabricator.wikimedia.org/T222899 (Dzahn) https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=lvs2003&service=PyBal+IPVS+diff+check https://icinga.wikimedia.org/...
[06:22:53] serviceops, Beta-Cluster-Infrastructure, Editing-team, Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (Joe) >>! In T220235#5176630, @Ottomata wrote: > Could we use image version: latest in beta hiera? And somehow pull d...
[06:43:00] serviceops, Beta-Cluster-Infrastructure, Editing-team, Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (Krenair) Lets put together a list of all the services we need to set up as containers within beta (either because stu...
[06:50:04] serviceops, Beta-Cluster-Infrastructure, Editing-team, Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (Joe) >>! In T220235#5179225, @Krenair wrote: > Lets put together a list of all the services we need to set up as cont...
[06:54:25] serviceops, Beta-Cluster-Infrastructure, Editing-team, Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (Krenair) Alright, I guess in that case I'll create a VM for citoid later and try getting the container for that runni...
[07:50:16] serviceops, Beta-Cluster-Infrastructure, Editing-team, Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (akosiaris) >>! In T220235#5179194, @Joe wrote: >>>! In T220235#5176630, @Ottomata wrote: >> Could we use image versio...
[07:57:34] serviceops, Analytics, Analytics-EventLogging, Analytics-Kanban, and 4 others: Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (akosiaris)
[07:57:43] serviceops, Analytics, Analytics-Kanban, EventBus, and 3 others: Set up LVS for eventgate-main on port 32192 - https://phabricator.wikimedia.org/T222899 (akosiaris) Open→Resolved >>! In T222899#5179002, @Dzahn wrote: > https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=lv...
[09:32:08] serviceops, Operations, Prod-Kubernetes, Kubernetes, User-fsero: placeholder task for migration problems - https://phabricator.wikimedia.org/T222210 (hashar) I can confirm that instances are now able to fetch new containers immediately after they have been published. So that solves it for me....
[10:02:12] serviceops, Operations, Release Pipeline, Release-Engineering-Team, and 5 others: Introduce wikidata termbox SSR to kubernetes - https://phabricator.wikimedia.org/T220402 (akosiaris) > With respect to the end point checks it would be great to hear what we are trying to achieve with them. Our serv...
[10:13:08] serviceops, Operations, Release Pipeline, Release-Engineering-Team, and 5 others: Introduce wikidata termbox SSR to kubernetes - https://phabricator.wikimedia.org/T220402 (akosiaris) >>! In T220402#5177862, @Pablo-WMDE wrote: > Hi @akosiaris - thanks for getting back to us. > >> sending a Host:...
[11:23:59] serviceops, Operations, Release Pipeline, Release-Engineering-Team, and 5 others: Introduce wikidata termbox SSR to kubernetes - https://phabricator.wikimedia.org/T220402 (Pablo-WMDE) Hi @akosiaris, thanks for taking the time to explain the way the `Host` header is intended to be used. If I unde...
[11:55:58] serviceops, Beta-Cluster-Infrastructure, Patch-For-Review: Puppet broken on VMs in deployment-prep - https://phabricator.wikimedia.org/T221654 (Krenair) deleted deployment-mathoid
[12:00:17] <_joe_> Krenair: \o/
[12:00:36] there's still sca
[12:00:44] gotta figure out the service_def entry for citoid
[12:01:12] am wondering what the mathoid one was based on
[12:03:59] serviceops, Operations, Release Pipeline, Release-Engineering-Team, and 5 others: Introduce wikidata termbox SSR to kubernetes - https://phabricator.wikimedia.org/T220402 (akosiaris) >>! In T220402#5180031, @Pablo-WMDE wrote: > Hi @akosiaris, > > thanks for taking the time to explain the way the...
[12:04:16] maybe part of modules/service/templates/node/config.yaml.erb
[12:06:44] root@deployment-sca01:/etc/citoid/config.yaml looks like what I want
[12:07:10] wonder what the version should be
[13:14:24] <_joe_> Krenair: sorry, I was busy
[13:15:12] <_joe_> so for mathoid I extracted the configmap from kubernetes
[13:15:23] <_joe_> but what's on sca01 should work too, I think
[13:15:41] 'docker ps' on -docker-mathoid shows the image is from docker-registry.wikimedia.org/wikimedia/mediawiki-services-mathoid:2019-04-24-175758-production
[13:15:56] I wonder if I can get a list of everything under /wikimedia
[13:16:21] <_joe_> it will be easier once we install portus
[13:16:35] <_joe_> but I defer to fsero a suggestion on how to extract that info
[13:17:12] where can I find prod's current image used for citoid?
[13:17:24] <_joe_> heh, kubernetes :)
[13:17:35] <_joe_> it's still not available in a git repo
[13:17:36] ... can I get that from outside?
[13:17:39] <_joe_> that will change soon
[13:17:46] <_joe_> not right now, no
[13:18:17] <_joe_> it's one of the goals for my team this quarter to make deployments easier to track from the outside
[13:19:20] ok
[13:19:28] mind telling me what the current citoid image used in prod is?
[13:20:30] and is there a task for the deployment transparency thing?
[13:20:55] serviceops, Beta-Cluster-Infrastructure, Editing-team, Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (Ottomata) eventgate-main is up in both beta in via service::docker and in prod k8s. Both eventgate-analytics and eve...
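On the 13:15:56 question about listing everything under /wikimedia: a minimal sketch, assuming the registry answers the standard Docker Registry HTTP API v2 catalog endpoint (`/v2/_catalog`) and that it is reachable from where you run this. The log confirms neither. That endpoint pages its results, so a page-size parameter `n` matters. The helper names are made up for illustration, and nothing below touches the network.

```python
import json
from typing import List
from urllib.parse import urlencode

# Host taken from the image names quoted in the log; whether its catalog
# endpoint is open to outside clients is an assumption.
REGISTRY = "https://docker-registry.wikimedia.org"


def catalog_url(registry: str, n: int) -> str:
    """Build a v2 catalog URL that asks for up to n repositories per page."""
    return f"{registry}/v2/_catalog?{urlencode({'n': n})}"


def repos_in_namespace(catalog_body: str, namespace: str) -> List[str]:
    """Keep only the repositories under one namespace, e.g. "/wikimedia".

    catalog_body is the JSON text of a catalog response, which has the
    shape {"repositories": ["a/b", "c/d", ...]}.
    """
    repos = json.loads(catalog_body).get("repositories", [])
    prefix = namespace.strip("/") + "/"
    return sorted(r for r in repos if r.startswith(prefix))
```

Fetching `catalog_url(REGISTRY, 1000)` with any HTTP client and passing the response text to `repos_in_namespace(body, "/wikimedia")` would give the list asked about, if the assumptions hold. The value 1000 is arbitrary.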
[13:21:50] <_joe_> Krenair: https://phabricator.wikimedia.org/T212130
[13:22:00] Krenair: we don't have a registry frontend yet
[13:22:14] <_joe_> as for citoid, fsero can help you I think
[13:22:15] but thcipriani made a bash script that publishes a catalog in a web format https://tools.wmflabs.org/dockerregistry/
[13:22:42] <_joe_> I need to write a report from an interview before it fades from memory
[13:23:16] Krenair: image: docker-registry.wikimedia.org/wikimedia/mediawiki-services-citoid:2019-04-01-104952-production
[13:23:19] that's the one
[13:24:49] interesting, I already managed to extract some of the data shown at thcipriani's tool but all the important stuff was missing
[13:25:23] oh right
[13:25:31] implicit limit on number of results, set n parameter
[13:26:03] ok so this is not so bad
[13:26:51] also about current production image
[13:27:03] given that images are labeled as TIMESTAMP-production
[13:27:14] if you get the most recent one usually you would be right
[13:27:20] Well.
[13:27:26] and we will work on making it more transparent
[13:27:36] You said 2019-04-01-104952-production is the one running.
[13:28:02] Problem is, most recent one labeled TIMESTAMP-production is 2019-04-11-182015-production...
[13:28:24] so... usually does not seem to include this particular case :)
[13:37:30] That's why I said usually and not always :)
[14:03:38] anyone looking into free space on mwmaint1002? it's down to 2% on /
[14:05:37] <_joe_> well /root has 62 GB in it
[14:05:44] /home is 32GB and /root has home-mwmaint2001 that is 63G
[14:05:54] <_joe_> yep
[14:06:14] <_joe_> I think they should be removed
[14:06:18] <_joe_> but let's wait for mutante
[14:06:41] <_joe_> I just recovered a couple GBs running apt-get clean
[14:09:26] thanks
[14:09:40] and clearly our grafana dashboard is not covering things properly
[14:09:41] https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&panelId=6&fullscreen&orgId=1&var-server=mwmaint1002&var-datasource=eqiad%20prometheus%2Fops&var-cluster=misc
[14:09:51] cc cdanis ;)
[14:10:13] https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&panelId=12&fullscreen&orgId=1&var-server=mwmaint1002&var-datasource=eqiad%20prometheus%2Fops&var-cluster=misc
[14:10:34] fsero: I'm here
[14:10:43] so I got some idea around blubber and all
[14:11:01] the names are not clear but 'disk utilization' is "% of time the device had an iop in the queue"
[14:11:09] I can already start making a patch
[14:11:13] which is itself a not-great metric, but it is what is there
[14:12:08] but getting docs around kubernetes/docker/helm etc and other docker/k8s related stuff is hard.
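On the 13:27 to 13:37 exchange about TIMESTAMP-production tags: a small sketch of the "take the most recent one" heuristic, and of why it only usually works. The tag format is inferred from the two citoid tags quoted in the log. The function names are hypothetical, and the plain `latest` entry in the sample list is made up to show that non-matching tags are ignored.

```python
import re
from datetime import datetime
from typing import List, Optional

# Matches tags such as 2019-04-01-104952-production (format inferred from the log).
TAG_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}-\d{6})-production$")


def tag_time(tag: str) -> Optional[datetime]:
    """Parse the timestamp out of a TIMESTAMP-production tag, or return None."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y-%m-%d-%H%M%S")
    except ValueError:  # digits in the right shape but not a real date
        return None


def newest_production_tag(tags: List[str]) -> Optional[str]:
    """Return the tag with the latest timestamp, ignoring everything else."""
    dated = [(tag_time(t), t) for t in tags]
    dated = [(ts, t) for ts, t in dated if ts is not None]
    return max(dated)[1] if dated else None


tags = ["2019-04-01-104952-production", "2019-04-11-182015-production", "latest"]
running = "2019-04-01-104952-production"  # what was reported as deployed
newest = newest_production_tag(tags)
print(newest, newest == running)  # 2019-04-11-182015-production False
```

The newest published tag and the deployed tag differ here, which is the case raised at 13:28:02: the registry records what was built, not what is running.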
[14:12:56] onimisionipe: we are working on some documentation
[14:13:07] For reference you can take a look into https://github.com/wikimedia/citoid/tree/master/.pipeline
[14:13:22] For an example of a nodejs running on the pipeline
[14:13:39] cdanis: ack, layer 8 issue ;)
[14:13:42] thx
[14:14:56] _joe_: ack to wait for mutante, just FYI there was some large addition today between 11:38 and 12:33, see:
[14:14:59] https://grafana.wikimedia.org/d/000000377/host-overview?panelId=12&fullscreen&orgId=1&var-server=mwmaint1002&var-datasource=eqiad%20prometheus%2Fops&var-cluster=misc&from=1557827791901&to=1557841402055
[14:15:13] from 86% to 98% space used
[14:19:49] there's a couple people who if they cleaned up we could get back 40g instantly
[14:20:08] sf tz, I'll poke a few folks later
[14:32:27] fsero: thanks!
[15:10:13] serviceops, Operations, Beta-Feature, MW-1.34-notes (1.34.0-wmf.6; 2019-05-21), and 2 others: Remove php7 beta feature - https://phabricator.wikimedia.org/T219128 (Jdforrester-WMF) For simplicity, we should land https://gerrit.wikimedia.org/r/508177 whenever, and then just let the removal of the...
[15:32:55] serviceops, Wikimedia-Site-requests, Patch-For-Review, Performance-Team (Radar): Enlarging the default thumb size on Dutch Wikipedia - https://phabricator.wikimedia.org/T215106 (Gilles) Yes, having the thumbnails you wanted primed on a dedicated page visited once works just fine.
[15:44:02] serviceops, Operations, Traffic, Wikidata, and 4 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (WMDE-leszek) @BBlack @Dzahn: I have passed the topic of domain ownership transfer to the C-level ranks here at WMDE, and I have to inform that...
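On the mwmaint1002 disk-space thread (/root at 62 GB, /home at 32GB, / going from 86% to 98% used): a rough sketch of how the per-directory numbers could be gathered, similar in spirit to `du`. The function names are illustrative, and this is not how the figures in the log were produced. `percent_used` divides used by total bytes, so it can differ slightly from the Use% that `df` prints.

```python
import os
import shutil
from typing import Dict


def dir_sizes(root: str) -> Dict[str, int]:
    """Sum regular-file sizes (bytes) under each immediate subdirectory of root."""
    sizes: Dict[str, int] = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        # os.walk does not descend into symlinked directories by default.
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    if not os.path.islink(path):
                        total += os.path.getsize(path)
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        sizes[entry.name] = total
    return sizes


def percent_used(path: str) -> float:
    """Percentage of the filesystem holding path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def biggest(root: str, count: int = 5) -> list:
    """The count largest subdirectories of root as (name, bytes), largest first."""
    ranked = sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:count]
```

Calling `biggest("/home")` and `percent_used("/")` on the host would show who holds the space and how close the root filesystem is to full. Both need read access to the directories involved.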
[17:11:50] serviceops, Operations: SRE FY2019 Q4 goal: complete the transition to PHP7 - https://phabricator.wikimedia.org/T219127 (Reedy)
[17:47:33] <_joe_> guillom is reporting issues with citoid, looking
[18:05:39] apparently you can create a group with just emojis (https://gerrit.git.wmflabs.org/r/admin/groups/18) :P
[18:29:43] did a small amount of cleanup on mwmaint1002/2001 and got back 40 gb on both
[18:29:49] should hold us for awhile :-)
[19:20:38] I am asking in case someone is still around, but is there any policy regarding adding a new namespace to the docker registry?
[19:21:24] for the developers local tooling brennen created a bunch of images in releng/dev-images.git and we could use to have them published to a namespace
[19:21:47] something different than the poorly named /releng/ namespace used for the CI image in integration/config.git
[19:35:12] ^^^ will follow up on that tomorrow :)
[19:36:03] thanks hashar.
[19:37:14] brennen: but please do get a task ;-]
[19:57:17] serviceops, Analytics, Analytics-EventLogging, Analytics-Kanban, and 4 others: Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (Nuria)
[19:57:21] serviceops, Analytics, Analytics-Kanban, EventBus, and 2 others: Use new eventgate chart release analytics for eventgate-analytics service. - https://phabricator.wikimedia.org/T222962 (Nuria) Open→Resolved
[20:11:48] serviceops, Analytics, Analytics-EventLogging, Analytics-Kanban, and 4 others: Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (Nuria) Open→Resolved
[20:14:57] serviceops, dev-images: Create credentials and add a puppet secret for publishing dev-images with docker-pkg in /dev/ namespace - https://phabricator.wikimedia.org/T223329 (brennen)
[20:33:52] serviceops, MediaWiki-General-or-Unknown, Operations, Core Platform Team (PHP7 (TEC4)), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (WDoranWMF)
[21:05:53] serviceops, Beta-Cluster-Infrastructure, Editing-team, Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (Krenair) `krenair@deployment-docker-citoid01:~$ curl 'http://localhost:1970/api?search=10.1038%2Fng.590&format=mediaw...
[21:15:10] serviceops, Beta-Cluster-Infrastructure, Editing-team, Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (Krenair) hold on a minute, that zotero instance does not exist, why does this work?
[21:36:33] serviceops, Analytics, EventBus, Release Pipeline, Services (watching): Modern Event Platform: Stream Intake Service: Documentation - https://phabricator.wikimedia.org/T219332 (Nuria) Open→Resolved
[21:50:19] Okay.
[21:50:38] I'm wondering if Citoid in beta is actually getting the config we give to it via hiera
[21:50:59] The config makes it into /etc/mediawiki-services-citoid/config.yaml and that gets mounted inside the container
[21:51:24] But if I `docker exec -it /bin/bash` I start in /srv/service
[21:52:04] And there's a config.yaml in there
[21:52:33] is there anything to make the Citoid service actually go look at /etc/mediawiki-services-citoid/config.yaml ?
[22:35:53] wonder if I need some special network rule somewhere to let citoid talk to zotero
[22:36:00] wonder where the logs are being sent
[23:04:19] Logs can be seen at 'docker logs '
[23:08:06] I don't suppose there's a way for me to edit the source and make it restart the service using the new source
[23:23:11] Citoid does not log a whole lot of useful information about Zotero failures but I figured out how to make the same request as it does (I think) and got http 500 from zotero
[23:23:44] meanwhile the zotero logs show it failed to recognise the tls cert for https://journals.plos.org
[23:35:04] https://phabricator.wikimedia.org/P8525
[23:40:04] It doesn't seem to be able to handle google's cert either
[23:40:13] and there is no /etc/certs dir
[23:40:39] er, /etc/ssl/certs, or /etc/ssl for that matter
[23:44:08] what's the point in having a network connection in 2019 if you don't recognise any CA certs
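On the missing /etc/ssl/certs finding: a hedged sketch of a check that reports which CA certificate locations actually exist in an environment. It assumes a Python interpreter is available where the check runs, which the log does not confirm for the zotero container. It only covers the locations an OpenSSL-based client would look at by default, and whether Zotero consults those paths is also an assumption. The function names are made up for illustration.

```python
import os
import ssl
from typing import Iterable, List


def existing_ca_locations(candidates: Iterable[str]) -> List[str]:
    """Return the candidates that are files, or directories with something in them."""
    found = []
    for path in candidates:
        try:
            if os.path.isfile(path):
                found.append(path)
            elif os.path.isdir(path) and os.listdir(path):
                found.append(path)
        except OSError:
            pass  # unreadable location: treat it as unusable
    return found


def default_ca_candidates() -> List[str]:
    """Locations OpenSSL was built to use, plus the directory named in the log."""
    paths = ssl.get_default_verify_paths()
    candidates = [paths.openssl_cafile, paths.openssl_capath, "/etc/ssl/certs"]
    # dict.fromkeys drops duplicates while keeping order.
    return [p for p in dict.fromkeys(candidates) if p]


def report() -> str:
    """One-line summary suitable for pasting into a task or chat."""
    found = existing_ca_locations(default_ca_candidates())
    return "CA locations found: " + (", ".join(found) if found else "none")
```

A result of "none" from `report()` inside the container would match what the log describes: with no CA bundle present, every HTTPS certificate fails verification, not just the one for journals.plos.org.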