[07:40:36] serviceops, MW-on-K8s, Operations: Create the base container images for running MediaWiki in a production environment - https://phabricator.wikimedia.org/T265324 (Joe)
[07:49:53] serviceops, MW-on-K8s, Operations: Create a basic helm chart to test MediaWiki on kubernetes - https://phabricator.wikimedia.org/T265327 (Joe) p:Triage→High
[09:49:38] serviceops, Operations, Patch-For-Review: Upgrade the MediaWiki appservers to debian buster, icu63 - https://phabricator.wikimedia.org/T264991 (JMeybohm) p:Triage→Medium
[09:53:43] serviceops, MW-on-K8s, Operations: Create the base container images for running MediaWiki in a production environment - https://phabricator.wikimedia.org/T265324 (JMeybohm) p:Triage→Medium
[10:07:42] serviceops, Operations, Wikidata: Hourly read spikes against s8 resulting in occasional user-visible latency & error spikes - https://phabricator.wikimedia.org/T264821 (JMeybohm) p:Triage→Medium
[11:25:00] serviceops, GrowthExperiments-NewcomerTasks, Operations, Product-Infrastructure-Team-Backlog, Growth-Team (Current Sprint): Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (kostajh)
[12:27:56] serviceops, Product-Infrastructure-Team-Backlog, Push-Notification-Service: High latency on push notification service initialization - https://phabricator.wikimedia.org/T265258 (JMeybohm) A simple restart of the pods in codfw (without any actual change) triggered the same behavior, so I will deploy t...
[12:42:50] serviceops, Operations, ops-codfw: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (Papaul) p:Triage→Medium
[13:23:23] serviceops: Build envoy-build-tools image locally - https://phabricator.wikimedia.org/T265357 (JMeybohm) p:Triage→Medium
[13:28:02] serviceops, CX-cxserver, Language-Team (Language-2020-October-December), Release-Engineering-Team (Pipeline): Migrate apertium to the deployment pipeline - https://phabricator.wikimedia.org/T255672 (KartikMistry) Update: We've https://gerrit.wikimedia.org/r/admin/repos/mediawiki/services/apertium...
[13:49:43] serviceops, Prod-Kubernetes, Release Pipeline, Patch-For-Review: Refactor our helmfile.d dir structure for services - https://phabricator.wikimedia.org/T258572 (JMeybohm) @Ottomata do you have the chance to migrate eventgate-analytics-external, eventgate-logging-external and eventstreams in near...
[15:55:27] is there anything akin to a contract for services running in k8s? as in what upper limits of resources a service can/should use per pod/service. If not, any general opinions on that kind of thing?
[16:08:23] hnowlan: there def. is an upper limit on how big a pod can be, as that's basically the resources available on our biggest node type. Although it will probably not get scheduled then as well
[16:31:05] so in the past I've seen rules of thumb for larger scales than our own that were like
[16:31:23] * don't reserve more than 1/2 of the maximum available along any resource axis
[16:31:43] serviceops, Product-Infrastructure-Team-Backlog, Push-Notification-Service, User-jijiki: High latency on push notification service initialization - https://phabricator.wikimedia.org/T265258 (jijiki)
[16:31:57] * if you run many pods, they need to come close to the aggregate ratios between resources of the underlying machines -- most notably CPU cores / GB of RAM
[16:37:44] that makes sense. it'd be cool if there was a one stop shop for that to show users, but then again are we really adding *that* many services on a regular basis etc
[16:38:06] background being I was talking to a team who have a service with a 16GB memory footprint and I figure that's not gonna fly
[16:39:06] it depends! if they need a small number of those pods, it might be okay
[16:52:01] serviceops, Operations, Growth-Team (Current Sprint), Patch-For-Review, User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (nettrom_WMF) >>! In T252391#6536174, @kostajh wrote: > Hmm, I spoke too soon. We rely on the `wgWMEUnderstandingFirstDay` bei...
[16:56:43] <_joe_> hnowlan: it might be just a matter of asking for the budget for it
[16:57:03] <_joe_> we never got around to solving the issue of resource allocation policies, maybe it's time we do?
[16:57:56] <_joe_> hnowlan: also what team, and what service? please include someone from serviceops in the discussions early on :)
[17:05:14] _joe_: it's a bit of an overlapping set of teams but it's coming from research. Things are entirely preliminary, I'll keep you all in the loop of course
[17:17:32] serviceops, Operations, Growth-Team (Current Sprint), Patch-For-Review, User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (Tgr) >>! In T252391#6536174, @kostajh wrote: > I think instead of checking to see if `wgWMEUnderstandingFirstDay` is true, we...
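The two rules of thumb from the 16:31 messages (reserve at most half of the largest node along any resource axis, and keep the pod's CPU-to-RAM ratio close to that of the underlying machines) can be sketched as a small check. This is only an illustration: the `Resources` type, both function names, and the node and pod sizes are hypothetical and do not come from the channel or from any real cluster. Only the 16GB pod memory figure echoes the example mentioned at 16:38.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical container for the two resource axes discussed above."""
    cpu_cores: float
    memory_gb: float


def within_half_of_node(pod: Resources, largest_node: Resources) -> bool:
    """Rule 1: reserve at most half of the largest node along every axis."""
    return (
        pod.cpu_cores <= largest_node.cpu_cores / 2
        and pod.memory_gb <= largest_node.memory_gb / 2
    )


def ratio_skew(pod: Resources, node: Resources) -> float:
    """Rule 2: compare the pod's GB of RAM per core with the node's.

    Returns 1.0 for a perfect match; above 1 the pod is memory-heavy
    relative to the node, below 1 it is CPU-heavy.
    """
    pod_ratio = pod.memory_gb / pod.cpu_cores
    node_ratio = node.memory_gb / node.cpu_cores
    return pod_ratio / node_ratio


# Assumed sizes, for illustration only.
node = Resources(cpu_cores=48, memory_gb=128)
pod = Resources(cpu_cores=2, memory_gb=16)

print(within_half_of_node(pod, node))
print(round(ratio_skew(pod, node), 2))
```

With these assumed numbers the pod passes the first rule but is memory-heavy relative to the node, which matches the "it depends" answer: a few such pods may fit, while many of them would strand CPU on the nodes they land on.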
[17:57:37] serviceops, Operations, ops-codfw: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (Papaul) @jijiki I will request a disk replacement
[18:34:42] this should resolve one more little step towards mediawiki on buster: https://gerrit.wikimedia.org/r/c/operations/puppet/+/633275
[19:23:31] serviceops, Operations, Growth-Team (Current Sprint), Patch-For-Review, User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (kostajh) >>! In T252391#6539461, @nettrom_WMF wrote: >>>! In T252391#6536174, @kostajh wrote: >> Hmm, I spoke too soon. We re...
[19:23:45] serviceops, Operations, ops-codfw: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (jijiki) Thank you!
[19:50:03] serviceops, Growth-Structured-Tasks, Growth-Team, Release-Engineering-Team: Move mwaddlink-query from github to gerrit - https://phabricator.wikimedia.org/T261403 (kostajh) > if you help me with requesting a gerrit-repo that would be great (no experience with that yet). also Pinging #release-eng...
[19:51:42] serviceops, Growth-Structured-Tasks, Growth-Team, Release-Engineering-Team: Move mwaddlink-query from github to gerrit - https://phabricator.wikimedia.org/T261403 (Dzahn) >>! In T261403#6538868, @MGerlach wrote: > if you help me with requesting a gerrit-repo that would be great (no experience wit...
[21:14:15] serviceops, Operations, ops-codfw: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (Dzahn) ` 21:08 <+icinga-wm> PROBLEM - Ensure local MW versions match expected deployment on mw2279 is CRITICAL: CRITICAL: 956 mismatched wikiversions https://wikitech.wikimedia....
[21:41:26] serviceops, Operations, ops-codfw: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (Papaul) Create Dispatch: Success You have successfully submitted request SR1039679642.
[23:05:58] serviceops, Growth-Structured-Tasks, Growth-Team, Release-Engineering-Team: Move mwaddlink-query from github to gerrit - https://phabricator.wikimedia.org/T261403 (thcipriani) >>! In T261403#6540180, @kostajh wrote: >> if you help me with requesting a gerrit-repo that would be great (no experienc...