[08:39:53] serviceops, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team), Language-Team (Language-2019-October-December), and 4 others: Preemptive refresh in getMultiWithSetCallback() and getMultiWithUnionSetCallback() pollutes cache - https://phabricator.wikimedia.org/T235188 (Nikerabbit)...
[08:44:04] serviceops, Operations, HHVM, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (jijiki)
[09:05:02] serviceops, Operations: Kubernetes workers frequent oom-killer in action - https://phabricator.wikimedia.org/T237198 (Joe) Open→Invalid a:Joe So: - kubernetes{1,2}00{5,6} are specialized nodes that only run kask for sessions, that's why you don't see ooms there. - The OOM killer doesn't only...
[09:14:16] serviceops, Operations: Kubernetes hosts raid check make facter fail - https://phabricator.wikimedia.org/T237197 (MoritzMuehlenhoff) p:Triage→Normal
[09:30:37] serviceops, Operations, HHVM, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (jijiki)
[09:31:54] serviceops, Operations, Prod-Kubernetes, Release Pipeline, Release-Engineering-Team: Allow testing of feature-flag-protected features in deployment-charts CI - https://phabricator.wikimedia.org/T236899 (Joe) Open→Resolved The CI is far from perfect, but it catches the most mundane iss...
[09:57:35] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) Status at the moment: ` == jobs_with_all_failures (6) == an-master1002.eqiad.wmnet-Monthly-1st-Mon-production-hadoop-namen...
[10:13:57] serviceops, Operations, Kubernetes: Add TLS termination to services running on kubernetes - https://phabricator.wikimedia.org/T235411 (Joe)
[10:24:29] serviceops, Operations, Kubernetes: Collect metrics from envoy where it is enabled on k8s - https://phabricator.wikimedia.org/T237234 (Joe)
[10:29:25] serviceops, Operations, Packaging: Build and upload envoy 1.12.0 package. - https://phabricator.wikimedia.org/T237235 (Joe)
[10:55:57] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) * `an-master1002.eqiad.wmnet-Monthly-1st-Mon-production-hadoop-namenode-backup`: connectivity issue bacula client: ` Nov 04 0...
[11:04:37] serviceops, Operations: Upgrade to PHP 7.2.24 - https://phabricator.wikimedia.org/T237239 (MoritzMuehlenhoff)
[11:06:36] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) @elukey @Ottomata Re: matomo1001, is there a reason not to have daily incrementals? If the reason is that it generates a full...
[11:07:33] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo)
[11:07:56] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo)
[11:42:02] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (elukey) >>! In T236406#5630677, @jcrespo wrote: > @elukey @Ottomata Re: matomo1001, is there a reason not to have daily incrementals?...
[11:47:26] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) @elukey If this helps, I can try generating manually an incremental, for a better informed decision about storage size (it sh...
[11:55:26] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (elukey) I am all for simplifying and standardizing confs, so no opposition about incremental. Only one question - what would it change...
[11:56:42] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) ` # check_bacula.py matomo1001.eqiad.wmnet-Weekly-Wed-production-mysql-srv-backups 2019-10-30 02:05:43: type: F, status: T, b...
[11:58:50] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) >>! In T236406#5630885, @elukey wrote: > I am all for simplifying and standardizing confs, so no opposition about incremental...
[12:00:12] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) Let me find where they are configured and I will send you a patch- later feel free to ping me on IRC and I will show you how...
[12:17:45] i propose in the serviceops meeting today we discuss the mw/k8s/thumbor hardware order
[12:18:13] akosiaris: effie: _joe_: ^
[12:20:03] keep in mind we are in velocity
[12:20:30] but yes, there are some discrepancies between some things that we should discuss
[12:41:34] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) @elukey I am sorry, after looking closely to the policies, I mistakenly assumed the schedule was wrong. I will abandon patchi...
[13:40:06] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) The matomo false alert is now correctly gone, only the 6 issues due to the 3 tickets above (T236406#5630631) left: ` All fail...
[13:40:20] serviceops, Operations: Kubernetes workers frequent oom-killer in action - https://phabricator.wikimedia.org/T237198 (akosiaris) As @Joe said, that's expected. It's how misbehaving services are killed in order to recover. Here's also a breakdown in case anyone is interested ` kubectl get pods --all-name...
[13:41:16] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (elukey) Nice thanks! Just pushed the new rules to the routers, so in theory an-master1002 and analytics1029 should go away now! Let me...
[13:43:51] serviceops, DBA, Operations, Goal, Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (akosiaris)
[13:50:15] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) Forcing a manual run on the 2 above for validation.
[14:31:47] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) :-) ` All failures: 4 (bromine, ...), Fresh: 90 jobs ` Unsubbing elukey and Otto to prevent unwanted spam (feel free to resub...
[16:05:12] serviceops, Operations: Upgrade to PHP 7.2.24 - https://phabricator.wikimedia.org/T237239 (MoritzMuehlenhoff) p:Triage→Normal
[16:05:52] serviceops, Operations, Packaging: Build and upload envoy 1.12.0 package. - https://phabricator.wikimedia.org/T237235 (MoritzMuehlenhoff) p:Triage→Normal
[16:35:27] serviceops, Operations: Upgrade to PHP 7.2.24 - https://phabricator.wikimedia.org/T237239 (Jdforrester-WMF)
[16:35:34] serviceops, Operations: Upgrade to PHP 7.2.24 - https://phabricator.wikimedia.org/T237239 (Jdforrester-WMF)
[18:37:05] serviceops, Core Platform Team, Performance-Team, Scap, and 5 others: Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (Jdforrester-WMF)
[18:53:19] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (Dzahn)
[20:00:41] serviceops, Release-Engineering-Team, Wikimedia-General-or-Unknown: "No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php." - https://phabricator.wikimedia.org/T237305 (Krinkle)
[20:00:53] serviceops, Release-Engineering-Team, Wikimedia-General-or-Unknown: Some servers CLI fatal "No localisation cache found for English." - https://phabricator.wikimedia.org/T237305 (Krinkle)
[20:23:15] serviceops, Release-Engineering-Team, Wikimedia-General-or-Unknown: Some servers CLI fatal "No localisation cache found for English." - https://phabricator.wikimedia.org/T237305 (Reedy) On `deploy1001` - All wikis should fail, or none should. They're all on the same MW version `lines=10 reedy@deploy1...
[20:31:39] serviceops, Release-Engineering-Team, Wikimedia-General-or-Unknown: Some wikis on CLI deploy1001 fatal with "No localisation cache found for English." - https://phabricator.wikimedia.org/T237305 (Reedy)
[21:04:12] serviceops, Release-Engineering-Team, Wikimedia-General-or-Unknown: Some wikis on CLI deploy1001 fatal with "No localisation cache found for English." - https://phabricator.wikimedia.org/T237305 (Krinkle) (On mobile) Do the files exist on disk? Are they different on mwmaint in terms of content or mo...
[21:07:17] mutante, FYI about https://phabricator.wikimedia.org/T237304 .. looks like some missing packages.
[21:10:08] ok, looking! any news about beta deploy?
[21:21:33] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) Down to 2: ` All failures: 2 (cloudweb2001-dev, ...), Fresh: 90 jobs ` Which should be fixed when cloud patch is reviewed an...
[21:25:04] serviceops, Operations: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (Dzahn)
[21:32:27] the math rendering packages were removed on Oct 1st in https://gerrit.wikimedia.org/r/c/operations/puppet/+/540154
[21:32:31] that included ploticus
[21:32:48] so the 2 test servers had that because they were installed before Oct 1st
[21:40:45] serviceops, Parsoid-PHP, Patch-For-Review: EasyTimeline extension shell error - https://phabricator.wikimedia.org/T237304 (Dzahn)
[21:52:22] mutante, reg. beta not yet .. we'll probably test that tomorrow.
[21:58:39] ack
[21:59:17] nice to see the traffic ticket resolved
[22:05:50] serviceops, Operations, Parsoid-PHP: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (ssastry) Later next month, we are also going to relax some resource limit constraints which we imported from Parsoid/JS (and which we have wanted to relax for a while now but never got ar...