[10:35:11] serviceops, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team), Language-Team (Language-2019-October-December), and 4 others: Some revisions' contents are incorrect in the cache - wrong contents shown in history & diffs - https://phabricator.wikimedia.org/T235188 (Nikerabbit) The...
[10:37:47] serviceops, Operations, Traffic, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (ema)
[10:59:00] serviceops, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team), Language-Team (Language-2019-October-December), and 4 others: Some revisions' contents are incorrect in the cache - wrong contents shown in history & diffs - https://phabricator.wikimedia.org/T235188 (Joe) >>! In T23...
[10:59:20] <_joe_> akosiaris, effie ^^ we need to tackle that :/
[11:00:11] let me read, we are on a roll with caching issues lately
[11:00:21] inside and out of the apps
[11:01:24] serviceops, Arc-Lamp, Performance-Team: Resolve arclamp disk exhaustion problem (Oct 2019) - https://phabricator.wikimedia.org/T235455 (akosiaris) > Increase disk space on the webperf*002 Ganeti VMs? – Was previously denied, at T199853. I don't think that's true, from the looks of it, not only it wa...
[11:05:28] _joe_: translatewiki?
[11:05:32] what am I missing?
[11:06:12] <_joe_> you're missing that the cache corruption was happening on actual wikis using the translate extension
[11:06:57] <_joe_> so the corruption is in all cache layers in production
[12:45:56] serviceops, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team), Language-Team (Language-2019-October-December), and 4 others: Some revisions' contents are incorrect in the cache - wrong contents shown in history & diffs - https://phabricator.wikimedia.org/T235188 (Trizek-WMF) Any...
[14:19:51] serviceops, Deployments, Release-Engineering-Team, Performance-Team (Radar): Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (Krinkle)
[14:22:08] serviceops, Arc-Lamp, Performance-Team: Resolve arclamp disk exhaustion problem (Oct 2019) - https://phabricator.wikimedia.org/T235455 (Krinkle) >>! In T235455#5602334, @akosiaris wrote: >> Increase disk space on the webperf*002 Ganeti VMs? – Was previously denied, at T199853. > > I don't think that...
[14:48:07] serviceops, Arc-Lamp, Performance-Team: Resolve arclamp disk exhaustion problem (Oct 2019) - https://phabricator.wikimedia.org/T235455 (akosiaris) > This task is about getting our retention back from 45d to 90d. Ok, let's add another 150GB to achieve that.
[14:50:56] serviceops, Arc-Lamp, Performance-Team: Resolve arclamp disk exhaustion problem (Oct 2019) - https://phabricator.wikimedia.org/T235455 (akosiaris) > Thu Oct 24 14:50:11 2019 Growing disk 1 of instance 'webperf2002.codfw.wmnet' by 150.0G to 300.0G and > Thu Oct 24 14:49:03 2019 Growing disk 1 of in...
[14:51:01] mark rlazarus akosiaris apergos if I am not done with removing hhvm packages from prod until the meeting
[14:51:07] I may miss it
[14:51:20] I need to finish it today
[14:51:23] ah ha
[15:08:58] serviceops, Deployments, Release-Engineering-Team, Performance-Team (Radar): Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (Krinkle) I don't know off-hand whether the mtime comes from the deployment server (set when the last git command sav...
[15:14:21] serviceops, Mobile-Content-Service, Page Content Service, Product-Infrastructure-Team-Backlog, and 2 others: Resolve service instability due to excessive event loop blockage since starting PCS response pregeneration - https://phabricator.wikimedia.org/T229286 (JoeWalsh) This issue is now resolved...
[15:22:43] serviceops, Operations, observability, Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (jijiki) I will take a look tomorrow, sorry for delaying this
[15:22:58] serviceops, Operations, observability, Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (jijiki) a:jijiki
[15:32:15] effie: guessing you'll skip it?
[15:41:26] mark: yeah I am doing eqiad now
[15:57:03] oh the meeting finished earlier?
[15:57:07] I joined and I was all alone
[15:57:09] :)
[15:57:30] serviceops, Operations, Phabricator, hardware-requests, Release-Engineering-Team (Development services): The phabricator server, WMF7426, was given to us temporarily, we would like to make it permanent - https://phabricator.wikimedia.org/T232887 (mark) I'm a bit confused; as far as I know the...
[15:59:23] serviceops, Operations, HHVM, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (jijiki)
[16:02:26] yeah, we were light on participants and also agenda :)
[16:02:40] serviceops, Operations, Wikimedia-General-or-Unknown, Performance-Team (Radar): Investigate recurrent GET latency spikes on MediaWiki appservers (Oct 16) - https://phabricator.wikimedia.org/T235872 (Krinkle)
[16:03:06] serviceops, Operations, Performance-Team (Radar): Increased POST latency for MW app servers (Oct 2019) - https://phabricator.wikimedia.org/T235755 (Krinkle)
[16:10:29] so next week we do the status reporting on thursday
[16:24:12] serviceops, DBA, Operations, Goal: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo)
[16:24:21] serviceops, DBA, Operations, Goal: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) p:Triage→High
[16:25:18] serviceops, DBA, Operations, Goal: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo)
[16:26:46] serviceops, DBA, Operations, Goal: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (akosiaris)
[16:27:17] serviceops, DBA, Operations, Goal: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo)
[16:28:56] serviceops, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team), Language-Team (Language-2019-October-December), and 4 others: Some revisions' contents are incorrect in the cache - wrong contents shown in history & diffs - https://phabricator.wikimedia.org/T235188 (Krinkle) >>! In...
[16:29:07] serviceops, DBA, Operations, Goal: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) So because of buster clients and jessie storage daemons cannot talk to each other, we will have to alter slightly the upgrade strategy. Several opt...
[17:05:48] serviceops, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team), Language-Team (Language-2019-October-December), and 4 others: Preemptive refresh in getMultiWithSetCallback() and getMultiWithUnionSetCallback() pollutes cache - https://phabricator.wikimedia.org/T235188 (aaron)
[19:08:41] serviceops, Core Platform Team, Performance-Team, Scap, and 5 others: Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (ArielGlenn) Hey @Jdforrester-WMF I'm looking around at the last patchset and not really understanding where the d...
[21:32:09] a lot of cron spam from hhvm-needs-restart because /usr/local/bin/hhvm-needs-restart: not found
[21:32:25] goes to remove some crons
[21:52:39] merges https://gerrit.wikimedia.org/r/c/operations/puppet/+/545950
[21:56:51] serviceops, Operations, HHVM, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (Dzahn) merged the above because we were getting cron spam from appservers with "/usr/local/bin/hhvm-needs-restart: not found"