[08:05:41] serviceops, Operations, ops-codfw: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (Volans)
[10:39:56] serviceops, Operations, ops-codfw: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (jijiki) ` [Tue Oct 6 06:28:23 2020] ata2.00: failed command: READ FPDMA QUEUED [Tue Oct 6 06:28:23 2020] ata2.00: cmd 60/80:00:00:a9:f7/00:00:03:00:00/40 tag 0 ncq dma 65536 in...
[10:51:25] serviceops, Operations, ops-codfw: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (jijiki) According to netbox, this server is still under warranty. @Papaul the server is set as inactive, let me know if you need anything from me, thank you!
[11:07:14] serviceops, Discovery-Search, Maps, Product-Infrastructure-Team-Backlog: [OSM] Install imposm3 in Maps master - https://phabricator.wikimedia.org/T238753 (MSantos)
[12:25:24] I have a restbase instance that lacks the clusters and listeners that all other nodes in the cluster have. It has a define for the local restbase listener and no other services - any idea what might be causing that? it's more or less the same as the others but hasn't been added to conftool
[12:25:31] restbase2009 in this case
[12:27:53] in the envoy services proxy, I should clarify
[12:33:24] hnowlan: I don't see it in hieradata/role/codfw/restbase/production.yaml
[12:33:31] like if you do git grep restbase2010
[12:34:01] unless my local copy is outdated :D
[12:34:36] I'm afraid it is :D
[12:35:03] that list only gets used for setting up the ratelimit service these days, and even that is of questionable utility
[12:36:58] * volans goes back in his corner then :)
[13:11:05] hnowlan: Guess you already fixed it? Was just looking and the clusters/listeners seem to be there now
[13:24:26] jayme: nope, they're defined in /etc/envoy/clusters.d but not in envoy.yaml
[13:24:46] oh, I see
[13:27:47] hnowlan: did you try running the build_config script by hand?
[13:31:33] hnowlan: that host also has an old envoy version
[13:31:55] /usr/local/sbin/build-envoy-config --configdir /etc/envoy/ -d fails with
[13:32:03] error initializing configuration '/tmp/.envoyconfig/envoy.yaml': Didn't find a registered implementation for name: 'envoy.filters.listener.tls_inspector'
[13:32:20] I guess that's because of envoy 1.13 being installed there
[13:33:25] but maybe the "build-envoy-config" script failing should raise an error during puppet agent run?
[13:40:34] serviceops, Kubernetes, Patch-For-Review: Support TLS for service-to-service communication in k8s staging - https://phabricator.wikimedia.org/T260917 (Joe) >>! In T260917#6518755, @JMeybohm wrote: > I could use a pair of eyes on https://gerrit.wikimedia.org/r/q/bug:T260917 > The PCC full diff (https:...
[13:50:23] jayme: oh interesting. This host was down for a period of time so I guess it missed some updates at some point
[13:51:07] hnowlan: unfortunately you need to go 1.15.1 now or grab the 1.14 deb from apt1001 manually
[13:51:58] /home/jayme/envoy_1.14/envoyproxy_1.14.4-1_amd64.deb
[13:52:52] 1.15 I'm just running first test with citoid, so it's probably better to go with 1.14 for now
[13:54:29] looks like that fixed it! Thanks
[13:55:29] interesting, it didn't update debmonitor as I would have expected
[13:55:30] yw
[13:55:40] hnowlan: how did you install it?
[13:55:53] were there any debmonitor-related logs in the stdout/err?
[13:56:02] dpkg -X ; cp :P
[13:56:16] * jayme hides
[13:56:56] dpkg -i should be covered
[13:57:02] as it's all done via apt/dpkg hooks
[13:57:51] no, dpkg -i isn't covered, the hook doesn't get called for it, when using dpkg -i, a manual debmonitor-client is needed (or waiting for the daily cron)
[14:00:26] Dpkg::Pre-Install-Pkgs doesn't get called?
[14:00:47] yeah I did a `dpkg -i` :s
[14:03:34] moritzm: ^^
[14:05:59] no, I don't think this ever worked for the case of "dpkg -i"
[14:06:36] I also remember that we debugged this when this was added and found some missing feature on the hook side of dpkg/apt
[14:07:05] yeah could be, I don't recall all the details to be honest
[14:07:11] anyway the daily cron will fix it anyway
[14:08:28] ack, and I have a TODO to make this a systemd timer at some point
[14:09:42] splayed timer :)
[14:09:57] don't want to DDoS debmonitor :D
[14:14:45] !log installed envoyproxy 1.15.1-2 on mwdebug1001
[14:14:51] yeah...
[14:37:51] serviceops, Operations, Patch-For-Review, Platform Team Initiatives (API Gateway): Separate mediawiki latency metrics by endpoint - https://phabricator.wikimedia.org/T263727 (colewhite)
[14:38:39] serviceops, Operations, observability, Patch-For-Review, Platform Team Initiatives (API Gateway): mtail 3.0.0-rc35 doesn't support the histogram type in -oneshot mode. - https://phabricator.wikimedia.org/T263728 (colewhite) Open→Resolved Patched mtail rolling out to the fleet this mor...
[15:08:10] serviceops, Parsing-Team, Parsoid: parsoid apt repo rolled back breaks updates - https://phabricator.wikimedia.org/T264546 (ssastry) Parsoid/JS support ends with 1.31 LTS (which I believe is June 2021). So, we need something for 9 months. Are you saying we should release a new deb with 0.11.1? Or ar...
[15:10:32] serviceops, Parsing-Team, Parsoid: parsoid apt repo rolled back breaks updates - https://phabricator.wikimedia.org/T264546 (MoritzMuehlenhoff) >>! In T264546#6521840, @ssastry wrote: > Are you saying we should release a new deb with 0.11.1? Or are you saying you can bump the version in the index to 0...
[16:46:17] serviceops, Kubernetes, Patch-For-Review: Support TLS for service-to-service communication in k8s staging - https://phabricator.wikimedia.org/T260917 (JMeybohm) >>! In T260917#6521438, @Joe wrote: > It also lacks the file for staging/eqiad zotero completely. Are you sure it's all defined correctly in...
[18:45:11] serviceops, Operations, ops-codfw: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (wiki_willy) a:Papaul
[23:37:20] serviceops, DBA, Operations: Hourly read spikes against s8 resulting in occasional user-visible latency & error spikes - https://phabricator.wikimedia.org/T264821 (CDanis)