[03:53:13] !log repooling cp5001 - T231262
[03:53:38] arg... wrong channel :)
[07:45:32] hello, i am back from vacation
[07:51:08] <_joe_> hi mutante
[07:51:12] <_joe_> I guess in europe?
[07:51:52] _joe_: yes, i am still here right now
[07:52:07] adding services proxy to wikitech as the first thing
[07:53:43] i hope there wasn't too many stressful days for you
[07:55:27] <_joe_> well
[07:55:30] <_joe_> next q?
[07:55:31] <_joe_> :P
[07:55:40] <_joe_> it was frankly exhausting
[07:56:02] <_joe_> anyways, can you take a look at icinga? there are a few alerts that should have to do with us.
[07:58:07] yea, in a second. just watching one more puppet run on labweb
[08:11:00] i saw 2 x PHP opcache health on mwdebug2*. they seemed common before i was gone already. i did a reload of php7.2-fpm service to empty the cache
[08:11:12] the other (unhandled) ones did not look very specific to us so far
[08:11:22] analytics/cloud/netbox
[08:15:23] <_joe_> yep just those two then
[08:15:29] <_joe_> that should go away once scap is fixed
[08:17:14] alright
[08:28:36] keyholder is not armed on deployment servers..but somebody has ACKed that
[08:28:46] do you know about that?
[08:29:08] oh heh, 34 and 46 days ago though, so no rush apparently :p
[09:09:22] <_joe_> no I don't know the details
[09:41:49] serviceops, Operations, Patch-For-Review: upgrade and rename krypton & create its codfw equivalent - https://phabricator.wikimedia.org/T224247 (Dzahn) p:Normal→High
[10:18:02] serviceops, Operations, Patch-For-Review: upgrade and rename krypton & create its codfw equivalent - https://phabricator.wikimedia.org/T224247 (Dzahn)
[14:48:24] serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): "worker died, restarting" mobileapps issue - https://phabricator.wikimedia.org/T229286 (Mholloway) @mobrovac @Pchelolo are we still pregenerating responses for the definit...
[15:02:04] serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): "worker died, restarting" mobileapps issue - https://phabricator.wikimedia.org/T229286 (Pchelolo) Stopping regeneration === stopping storage. However, looking into grafana...
[17:03:32] serviceops, Mobile-Content-Service, Page Content Service, Product-Infrastructure-Team-Backlog (Kanban): Stop pregenerating and storing /page/definition responses - https://phabricator.wikimedia.org/T231361 (Mholloway)
[17:03:48] serviceops, Mobile-Content-Service, Page Content Service, Product-Infrastructure-Team-Backlog: Stop pregenerating and storing /page/definition responses - https://phabricator.wikimedia.org/T231361 (Mholloway)
[17:05:04] serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): "worker died, restarting" mobileapps issue - https://phabricator.wikimedia.org/T229286 (Mholloway) >>! In T229286#5442034, @Pchelolo wrote: > Could you please create a sub...
[17:06:48] serviceops, Core Platform Team, Mobile-Content-Service, Page Content Service, Product-Infrastructure-Team-Backlog: Stop pregenerating and storing /page/definition responses - https://phabricator.wikimedia.org/T231361 (Pchelolo)
[17:07:11] serviceops, Mobile-Content-Service, Page Content Service, Product-Infrastructure-Team-Backlog, Core Platform Team Workboards (Clinic Duty Team): Stop pregenerating and storing /page/definition responses - https://phabricator.wikimedia.org/T231361 (Pchelolo)
[17:26:55] serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): "worker died, restarting" mobileapps issue - https://phabricator.wikimedia.org/T229286 (Mholloway) It looks like the rate of worker deaths approximately doubled for some u...
[19:55:52] 12:20:48 Hi! I'm getting this weird error whenever I try to pull down an image from docker-registry.wikimedia.org: `Error response from daemon: received unexpected HTTP status: 502 connect failed`
[19:55:54] 12:21:22 Any ideas? thcipriani
[19:56:03] cc others in serviceops who might know ^ :)
[19:56:15] (tyler's out for lunch right now)
[19:59:52] I saw one of those in a CI run earlier today, a recheck worked though
[20:06:06] cdanis: I rechecked a few times and I'm still getting the same error
[21:14:46] cdanis: we narrowed the problem to pulling the manifest from the registry; e.g., curl https://docker-registry.wikimedia.org/v2/wikimedia/mediawiki-services-kask/manifests/v1.0.3 is a 50x for clarakosi. I don't have access to whatever machine the registry is running on, are there any logs to check that might point out what's happening?
[21:16:21] I don't think there are any in logstash. I need to leave for the evening but I can take a look tomorrow, if someone in EUTZ hasn't gotten to it before then. can you file a task tagged operations and serviceops thcipriani ?
[21:16:44] cdanis: yep, I can do that, thanks
[21:17:21] er, well, clarakosi would you file ^ since you can give a little more info/copy-and-paste actual error output?
[21:18:12] thcipriani sure
[21:19:35] serviceops, Operations, WMF-Legal: Move old transparency report pages to historical URLs and setup redirect - https://phabricator.wikimedia.org/T230638 (Varnent) As I understand it - Legal would like the existing microsite located at transparency.wikimedia.org to be relocated to transparency.wikimedi...
[21:30:09] serviceops, Operations: Error pulling image from docker registry - https://phabricator.wikimedia.org/T231388 (Clarakosi)
[21:30:13] serviceops, Operations, WMF-Legal: Move old transparency report pages to historical URLs and setup redirect - https://phabricator.wikimedia.org/T230638 (BBlack) @Varnent: For the redirects: just the main https://transparency.wikimedia.org/ URL? Or also the sub-pages like https://transparency.wikimed...
[21:30:24] serviceops, Operations, WMF-Legal: Move old transparency report pages to historical URLs and setup redirect - https://phabricator.wikimedia.org/T230638 (BBlack) Stalled→Open
[21:34:33] serviceops, Operations, Traffic: Error pulling image from docker registry - https://phabricator.wikimedia.org/T231388 (Jdforrester-WMF) Tagging in Traffic; this is the server (cp1075) running ATS not Varnish, right?
[21:41:44] serviceops, Operations, Traffic: Error pulling image from docker registry - https://phabricator.wikimedia.org/T231388 (BBlack) a:ema Assigning to @ema to investigate (yes, this is the live test server for ATS backends for these servers). Most likely the problem is specific to ATS<->docker-regist...
[21:44:51] serviceops, Operations, Traffic: Error pulling image from docker registry - https://phabricator.wikimedia.org/T231388 (greg) p:Triage→Unbreak! This is blocking CI runs.
[21:47:12] serviceops, Operations, Traffic: Error pulling image from docker registry - https://phabricator.wikimedia.org/T231388 (ayounsi) Note that it's breaking Jenkins on the Puppet repo (goes straight to -1). https://integration.wikimedia.org/ci/job/operations-puppet-tests-stretch-docker/20234/console
[21:51:59] serviceops, Operations, Traffic: Error pulling image from docker registry - https://phabricator.wikimedia.org/T231388 (BBlack) Depooled cp1075 `ats-be` service via confctl, can someone retry and confirm mitigated?
[21:58:45] serviceops, Operations, Traffic: Error pulling image from docker registry - https://phabricator.wikimedia.org/T231388 (Clarakosi) >>! In T231388#5443941, @BBlack wrote: > Depooled cp1075 `ats-be` service via confctl, can someone retry and confirm mitigated? It works!
[21:59:39] serviceops, Operations, Traffic: Error pulling image from docker registry - https://phabricator.wikimedia.org/T231388 (BBlack) Please leave this open for now so @ema can look at a more-permanent fixup tomorrow!
[22:00:31] serviceops, Operations, Traffic: Error pulling image from docker registry - https://phabricator.wikimedia.org/T231388 (Jdforrester-WMF) p:Unbreak!→Normal De-prioritising.
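Note on the manifest check discussed at 21:14:46: the curl against the registry's /v2/.../manifests/... path can be repeated from a script when retesting after a change such as the depool. What follows is only a rough sketch, not anything the channel actually ran. The function names (manifest_url, classify_status, fetch_status) are hypothetical, and the Accept header is an assumption about which manifest media type the registry should be asked for. The point is to separate a 5xx coming from the registry or a proxy in front of it from an ordinary 404 or auth failure.

```python
import urllib.error
import urllib.request

# Assumed media type for a schema 2 image manifest (not taken from the log).
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def manifest_url(registry, repository, reference):
    """Build the registry v2 manifest URL for an image tag or digest."""
    return f"https://{registry}/v2/{repository}/manifests/{reference}"


def classify_status(status):
    """Map an HTTP status code to a coarse diagnosis label (hypothetical labels)."""
    if 200 <= status < 300:
        return "ok"
    if status in (401, 403):
        return "auth"
    if status == 404:
        return "missing"
    if 500 <= status < 600:
        # Covers the 502 seen in the log; could be the registry or a proxy layer.
        return "server-or-proxy-error"
    return "other"


def fetch_status(url, timeout=10.0):
    """Return the HTTP status for a GET on url, including error statuses."""
    request = urllib.request.Request(url, headers={"Accept": MANIFEST_V2})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is what matters here.
        return err.code
```

For the image from the log one would call fetch_status(manifest_url("docker-registry.wikimedia.org", "wikimedia/mediawiki-services-kask", "v1.0.3")) and pass the result to classify_status. Network failures other than HTTP errors (DNS, timeouts) are deliberately left to propagate.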