[04:17:45] serviceops, Operations, Core Platform Team Backlog (Later), Patch-For-Review, Services (next): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (Tgr)
[06:43:52] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (jijiki)
[07:05:34] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` thumbor1003.eqiad.wmnet ` The log can be found in...
[07:08:31] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin2001.codfw.wmnet for hosts: ` thumbor2003.codfw.wmnet ` The log can be found in...
[08:03:35] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['thumbor2003.codfw.wmnet'] ` and were **ALL** successful.
[08:28:03] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin2001.codfw.wmnet for hosts: ` thumbor2004.codfw.wmnet ` The log can be found in...
[08:43:35] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (jijiki)
[08:44:48] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['thumbor1003.eqiad.wmnet'] ` and were **ALL** successful.
[08:46:56] !log Pooling thumbor1003
[09:04:09] jijiki: relog in #operations
[09:04:26] *sigh*
[09:04:32] thanks fab
[09:20:38] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['thumbor2004.codfw.wmnet'] ` and were **ALL** successful.
[12:17:11] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (jijiki)
[12:18:56] serviceops, Operations, Thumbor, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (jijiki)
[12:19:50] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[12:20:00] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (jijiki) Open→Resolved All servers have been upgraded to stretch, next episode on T216815 🍾
[12:20:11] serviceops, Citoid: JSTOR is blocking citoid IPs - https://phabricator.wikimedia.org/T216456 (Mvolz)
[12:21:54] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (jijiki)
[12:22:00] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[12:22:05] serviceops, Multimedia, Operations, Thumbor, and 2 others: Deploy 3d2png to thumbor servers (stretch) - https://phabricator.wikimedia.org/T216494 (jijiki) Open→Resolved a:jijiki Thanks to @gilles and @mobrovac, we can close this.
[12:23:09] serviceops, Operations, Thumbor, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (jijiki)
[12:23:29] serviceops, Operations, Thumbor, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (jijiki)
[12:23:36] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Investigate systemd hardening to replace Firejail for Thumbor - https://phabricator.wikimedia.org/T212941 (jijiki)
[12:23:39] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[12:23:47] serviceops, Operations, Thumbor, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (jijiki)
[12:23:53] serviceops, Operations, Thumbor, Wikimedia-Logstash, User-jijiki: Stream Thumbor logs to logstash - https://phabricator.wikimedia.org/T212946 (jijiki)
[12:24:04] serviceops, Operations, Thumbor, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (jijiki)
[12:24:52] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[12:26:12] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[12:31:23] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki) Open→Resolved a:jijiki All servers have been upgraded to stretch, next episode on T216815 🍾
[12:37:20] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (Gilles)
[12:38:30] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (Gilles)
[12:52:11] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[12:52:54] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[13:09:24] serviceops, Operations, Thumbor: Use image testing for Thumbor upgrades - https://phabricator.wikimedia.org/T217133 (jijiki) p:Triage→Low
[13:09:49] serviceops, Operations, Thumbor, User-jijiki: Use image testing for Thumbor upgrades - https://phabricator.wikimedia.org/T217133 (jijiki)
[14:36:43] akosiaris: I need to also merge a change that updates the index/ .tgz file?
[14:36:44] right?
[14:37:59] ottomata: I am guessing you're asking about the charts? yes
[14:38:05] ya
[14:38:14] * akosiaris should automate that
[14:38:49] ya i am pretty sure there was a way to do it so that it updated the timestamps for only one chart...looking
[14:42:04] mehhh can't find it...thought I did that once..
[14:47:55] akosiaris: https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/493032
[14:51:35] ottomata: merged
[14:51:59] danke!
[14:52:29] i should git pull on deploy1001?
[14:52:34] btw, I see dirty changes there
[14:52:43] in create_new_service.sh
[14:53:09] I should delete that thing, it's confusing. The repo in deploy1001 is not used
[14:53:22] if you want to rush things, just run puppet on releases1001
[14:53:36] but otherwise, in about 30mins everything will have coalesced
[14:55:50] ahh right ok
[14:56:32] akosiaris: ok i'm going to deploy this version to staging, and if all is ok there, then to production clusters, ok?
[14:56:41] sure
[15:00:11] hm, i've run puppet on both releases1001 and on deploy1001
[15:00:20] releases did a git pull and i see the new chart version there
[15:00:36] cool
[15:00:42] deploy1001 changed some ownership of a chart in /etc/helm/cache/archive/ but that's it
[15:00:58] i don't see 0.0.5.tgz in /etc/helm/cache/archive/ tho, not sure if that is needed
[15:01:12] but, upgrade didn't go to 0.0.5
[15:01:20] it went to 0.0.4 (we never actually deployed that)
[15:01:32] stable/eventgate-analytics 0.0.5 eventgate-analytics receives JSON events over HTTP, valid...
[15:01:37] how'd you see that?
[15:01:44] scap-helm mathoid search eventgate
[15:01:51] so it's available for installing
[15:01:58] hm ok
[15:02:02] mathoid being fully irrelevant here
[15:02:02] trying upgrade again
[15:02:20] ok! that worked!
[15:03:14] :-)
[15:14:40] serviceops, Operations, Performance-Team (Radar), User-Elukey, User-jijiki: Test different growth factors for memcached (prep step for upgrade to newer versions) - https://phabricator.wikimedia.org/T217020 (elukey)
[15:16:46] so, the release tag I pushed yesterday doesn't have a docker image built
[15:16:53] don't know why
[15:17:03] jenkins UI is really hard to use, trying to find some job ...
[15:18:23] oo i think i found it..
[15:22:46] thcipriani: yt?
[15:22:48] https://integration.wikimedia.org/ci/me/my-views/view/All/job/service-pipeline-test-and-publish/30/console
[15:22:51] Failed to read 'application/yaml' config from request body. Error: json: cannot unmarshal number into Go struct field RunsConfig.environment of type string
[15:23:32] i can repro
[15:23:36] works fine on the CLI
[15:23:41] but the blubberoid api doesn't work
[15:24:20] ottomata: I did notice that earlier. I played with it locally a bit: it's referring to "BUILD_LIBRDKAFKA: 0". It thinks the 0 is an int; if you do "BUILD_LIBRDKAFKA: '0'" it worked for me locally.
[15:24:29] ah hm
[15:24:53] may need to do the same with UV_THREADPOO_SIZE
[15:25:05] *UV_THREADPOOL_SIZE
[15:25:07] great
[15:25:11] it just didn't make it that far :)
[15:25:12] that works, will commit, thank you.
[15:25:26] thcipriani: is there a way to get notified by email when this job completes or fails?
[15:25:27] for my repo?
[15:27:16] there's really not much of a feedback mechanism in place for pushing tags, but that's a good idea for a method of notification. Falls under this task: https://phabricator.wikimedia.org/T177868
[15:27:42] currently finishing up better notification for patches/postmerge
[15:28:21] k thanks
[15:28:30] tags triggering an email seems like a reasonable next thing to implement. Would it just send to the tag author?
[15:29:35] could be fine, but maybe it'd be more flexible to configure the alert emails also in .pipeline/
[15:29:46] since yall are considering making that config more flexible in general
[15:31:01] yes indeed
[16:13:20] akosiaris: to install in production
[16:13:23] should -n be 'production
[16:13:24] '/
[16:13:24] ?
[16:13:27] e.g.
[16:13:39] helm install --name production
[16:13:50] scap-helm eventgate-analytics install -n production stable/eventgate-analytics ...
[16:13:58] yeah that's correct
[16:14:01] k
[16:16:01] hm ok since i have slightly different values for eqiad/codfw
[16:16:05] i have 2 different values files
[16:16:19] i should do CLUSTER=eqiad and -f eventgate-analytics-eqiad-values.yaml
[16:16:24] etc. for codfw
[16:16:25] ?
[16:17:00] e.g.
[16:17:01] CLUSTER=eqiad scap-helm eventgate-analytics install -n production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics
[16:28:18] serviceops, Operations, Prod-Kubernetes, Kubernetes, and 2 others: Make swift containers for docker registry cross replicated. - https://phabricator.wikimedia.org/T214289 (fsero) It seems there are some issues on the swift side regarding container-real-synchronization. I'll hold this for now and...
[16:29:11] ottomata: yep, it seems a good idea to keep values per dc
[16:29:27] where are you storing those values files?
[16:30:32] serviceops, Operations, Prod-Kubernetes, Kubernetes, and 2 others: improve docker registry architecture - https://phabricator.wikimedia.org/T209271 (fsero)
[16:30:35] serviceops, Operations, Prod-Kubernetes, Kubernetes, and 2 others: Make swift containers for docker registry cross replicated. - https://phabricator.wikimedia.org/T214289 (fsero) Open→Stalled
[16:31:24] ottomata: we are working on that related task https://phabricator.wikimedia.org/T212130
[16:31:32] serviceops, Operations, Prod-Kubernetes, Kubernetes, and 2 others: Make swift containers for docker registry cross replicated. - https://phabricator.wikimedia.org/T214289 (CDanis) I'm happy to help in the future, although it will also be a learning exercise for me :)
[16:31:54] fsero: they are in /srv/scap-helm/eventgate
[16:31:57] in deploy1001
[16:32:03] great
[16:32:18] it's pretty brittle, you have to remember to edit them to change the main_app.version every time you have a new image version
[16:32:34] or if there are any other chart template value changes needed
[16:32:55] serviceops, Prod-Kubernetes: Helm packages deployment tool, at least for cluster applications. - https://phabricator.wikimedia.org/T212130 (fsero) we are going to pick helmfile for now as it seems to have a slightly wider community. I'll work on creating a suitable package for us for helmfile
[16:33:26] you can override it during the helm install part
[16:33:43] yeah, but then on the CLI it is also a bit brittle.
[16:33:46] and keep in mind that the end goal is to keep those files under source control
[16:33:48] ya
[16:33:50] that'd be good
[16:34:00] also, it might be nice if scap-helm could wrap some defaults a little better, e.g. remembering when e.g. stable/eventgate-analytics is needed
[16:34:08] or forgetting to put CLUSTER=staging with -n staging
[16:34:30] or to use -n eventgate-analytics when using kubectl
[16:34:39] or doing e.g. KUBECONFIG=/etc/kubernetes/eventgate-analytics-staging.config
[16:36:51] i'm not convinced that we should keep scap-helm tbh, i think it was a quick and dirty wrapper to move forward
[16:37:17] aye
[16:37:20] and ideally using helmfile you should only need to interact with git for deployment
[16:37:20] maybe something even better
[16:37:31] debugging and scaling up is another story
[16:37:44] serviceops, Release Pipeline (Blubber), Release-Engineering-Team (Kanban): Add k8s credentials for Blubberoid continuous deployment - https://phabricator.wikimedia.org/T217147 (thcipriani)
[16:38:11] serviceops, Release Pipeline (Blubber), Release-Engineering-Team (Kanban): Add k8s credentials for Blubberoid continuous deployment - https://phabricator.wikimedia.org/T217147 (thcipriani) a:LarsWirzenius→None
[16:38:20] it'd be nice if there was a wrapper for which you only had to provide cluster and chart, and all other commands would be generated for you, e.g. getting logs, execing a shell, upgrading/installing a chart, etc.
[16:38:42] but anyway!
[16:38:43] this is awesome
[16:38:53] deployed in prod clusters!
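Editor's note: the wrapper idea floated at 16:38:20 (provide only cluster and chart, have the commands generated) could be sketched roughly as below. This is a minimal illustration, not an existing tool. The function names are hypothetical, and the values-file and KUBECONFIG naming patterns are assumptions generalized from the single examples quoted in the chat (`eventgate-analytics-eqiad-values.yaml` in `/srv/scap-helm/eventgate`, and `/etc/kubernetes/eventgate-analytics-staging.config`). The sketch only builds command strings; it runs nothing.

```python
import shlex


def helm_install_command(service, cluster, release, values_dir):
    """Build (not run) a scap-helm install invocation like the one in the chat.

    Assumption: values files are named <service>-<cluster>-values.yaml,
    generalized from the one eqiad example in the log.
    """
    values_file = f"{values_dir}/{service}-{cluster}-values.yaml"
    env = {"CLUSTER": cluster}
    argv = ["scap-helm", service, "install", "-n", release,
            "-f", values_file, f"stable/{service}"]
    return env, argv


def kubectl_command(service, cluster, *args):
    """Build (not run) a kubectl invocation with namespace and KUBECONFIG set.

    Assumption: kubeconfig files are named <service>-<cluster>.config,
    generalized from the one staging example in the log.
    """
    env = {"KUBECONFIG": f"/etc/kubernetes/{service}-{cluster}.config"}
    argv = ["kubectl", "-n", service, *args]
    return env, argv


def render(env, argv):
    """Render environment assignments plus argv as one shell command line."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    print(render(*helm_install_command(
        "eventgate-analytics", "eqiad", "production", "/srv/scap-helm/eventgate")))
    print(render(*kubectl_command("eventgate-analytics", "staging", "get", "pods")))
```

Returning the environment and argument list separately, instead of one string, would let a real wrapper pass them straight to a process launcher without re-parsing.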
[16:38:59] akosiaris: fsero i'm going to merge DNS change
[16:39:08] would like also to merge puppet discovery change too
[16:39:36] there are a couple of TODO: on the dns change
[16:39:56] yes but those can't be done until the puppet change is merged
[16:39:58] chicken egg problem
[16:40:09] see top of file https://gerrit.wikimedia.org/r/#/c/operations/dns/+/491860/5/utils/mock_etc/discovery-geo-resources
[16:40:18] Do not add anything to this file unless you've *first* made the
[16:40:18] # corresponding edit to hieradata/common/discovery.yaml , merged that, and
[16:40:18] # deployed it to all authdns servers via the puppet agent!
[16:42:13] akosiaris: ack
[16:46:49] ok , dns merged and authdns-update done.
[16:47:44] akosiaris: fsero what needs done to merge puppet stuff?
[16:53:45] ottomata: do you mind if we do that tomorrow EU morning with fsero?
[16:54:04] it's about time we create that runbook about how to instantiate a new LVS service
[16:54:12] yep i would like to do it to get properly onboarded akosiaris
[16:54:16] ottomata: sorry
[16:54:18] :)
[16:54:34] ottomata: I can promise you will probably wake up to a working LVS service :P
[16:55:45] akosiaris: ya that's fine!
[16:55:50] sounds great :)
[16:59:29] jijiki: you too ^ btw
[16:59:49] about time you create your first LVS service. So you can tell us as well what sucks in the process ;)
[17:14:36] serviceops, Operations, Prod-Kubernetes, Kubernetes, and 2 others: Make swift containers for docker registry cross replicated. - https://phabricator.wikimedia.org/T214289 (fsero) With the help of @CDanis now PCC looks happy, @fgiunchedi is good for merge if you think so too.
[17:14:51] serviceops, Operations, Prod-Kubernetes, Kubernetes, and 2 others: improve docker registry architecture - https://phabricator.wikimedia.org/T209271 (fsero)
[17:14:59] serviceops, Operations, Prod-Kubernetes, Kubernetes, and 2 others: Make swift containers for docker registry cross replicated. - https://phabricator.wikimedia.org/T214289 (fsero) Stalled→Open
[17:26:20] lol
[17:38:57] akosiaris: are yall going to make a new patch tomorrow, or use mine?
[17:39:09] there is a port mistake in mine, maybe I just fix and yall can reference?
[19:03:55] ottomata: use yours. We'll possibly update it though, no worries
[19:04:56] k, i updated the port mistake
[19:04:59] should be good to go
[22:48:03] ottomata: you just got an almost automatic dashboard under https://grafana.wikimedia.org/d/POYzU8rmz/eventgate-analytics?refresh=1m&orgId=1
[22:48:35] woooowww coooool
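Editor's note: the Blubberoid error discussed between 15:22 and 15:25 came from an environment value written as a bare `0`, which a YAML parser reads as a number where a string was expected. A pre-commit check for that could look roughly like the sketch below. It is an illustration only: the function names are hypothetical, the dict stands in for an already-parsed `environment` mapping, and the `UV_THREADPOOL_SIZE` value is a placeholder because the log does not give it.

```python
def non_string_env(environment):
    """Return the sorted names of variables whose parsed value is not a string."""
    return sorted(name for name, value in environment.items()
                  if not isinstance(value, str))


def quote_env(environment):
    """Return a copy with every non-string value converted to its string form.

    This mirrors the fix from the chat: writing '0' instead of 0.
    Note: str() is adequate for integers; booleans would need their own rule.
    """
    return {name: value if isinstance(value, str) else str(value)
            for name, value in environment.items()}


if __name__ == "__main__":
    # What a YAML parser would hand back for unquoted numeric values.
    parsed = {
        "BUILD_LIBRDKAFKA": 0,       # value from the chat
        "UV_THREADPOOL_SIZE": 128,   # placeholder value, not from the chat
    }
    print("needs quoting:", non_string_env(parsed))
    print("fixed:", quote_env(parsed))
```

Running such a check before pushing a tag would surface the problem locally, since per the chat there was no notification when the tag-triggered job failed.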