[08:39:00] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (jijiki)
[08:40:07] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (jijiki) p:Triage→Normal
[08:54:09] serviceops, Operations: Appservers rising GET latency might have triggered LVS pages - https://phabricator.wikimedia.org/T238973 (fgiunchedi)
[08:54:13] serviceops, Operations, Traffic: Increased latency in appservers - 22 Nov 2019 - https://phabricator.wikimedia.org/T238939 (fgiunchedi)
[08:54:35] serviceops, Operations: Appservers rising GET latency might have triggered LVS pages - https://phabricator.wikimedia.org/T238973 (fgiunchedi) The cause was indeed appservers latency, resolving in favor of T238939
[08:55:20] serviceops, Operations: Appservers rising GET latency might have triggered LVS pages - https://phabricator.wikimedia.org/T238973 (Joe) I find it hard to believe this is the case. Text-lb checks request a cached url, so the backend latency should not matter.
[09:45:38] serviceops, Operations: Appservers rising GET latency might have triggered LVS pages - https://phabricator.wikimedia.org/T238973 (fgiunchedi) duplicate→Open >>! In T238973#5688257, @Joe wrote: > I find it hard to believe this is the case. Text-lb checks request a cached url, so the backend latenc...
[11:50:00] serviceops, Operations, Prod-Kubernetes, Pybal, Traffic: Proposal: simplify set up of a new load-balanced service on kubernetes - https://phabricator.wikimedia.org/T238909 (akosiaris)
[12:40:52] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (jijiki)
[13:24:59] serviceops, DC-Ops, Operations, ops-eqiad: mw1239 memory errors - https://phabricator.wikimedia.org/T227867 (Cmjohnson) mw1239 is well out of warranty and is over 5 years old. Historically we decom these host at this stage in their life.
We also have a several new MW servers waiting to be racke...
[13:26:56] serviceops, DC-Ops, Operations, ops-eqiad: mw1239 memory errors - https://phabricator.wikimedia.org/T227867 (MoritzMuehlenhoff) @Cmjohnson mw1239 will be decommed soon via https://phabricator.wikimedia.org/T239054, we can close this task.
[13:47:21] serviceops, Operations, observability, Patch-For-Review: dropped packets to conf1004/5/6 2379/tcp - https://phabricator.wikimedia.org/T238791 (fgiunchedi) Open→Resolved a:fgiunchedi Fixed!
[14:07:28] FYI the etcd cloud project will be shutdown next week if not claimed ;)
[14:15:11] * apergos pokes _joe_
[14:21:35] <_joe_> I know, it should die, I don't use it since 3 years
[14:38:55] ack
[16:07:14] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (WDoranWMF) Thanks @jijiki!
[16:31:13] coming !
[16:31:42] o/ _joe_ you around? can you help merge the lvs and discovery patches in https://phabricator.wikimedia.org/T236386
[16:32:25] or, maybe we can schedule some time tomorrow my morning to do it?
[16:32:34] or wednesday my morning?
[16:38:12] <_joe_> tomorrow is better
[16:38:18] <_joe_> now I have meetings until 7 pm
[16:40:38] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (jijiki)
[16:50:03] ok, wed might actually be better for me, ok if i find a time on your cal for then?
[16:58:47] <_joe_> sure
[17:01:45] ok done danke
[17:12:25] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (WDoranWMF) @jijiki Do you know when the rollout will be complete to all prod?
[17:15:25] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (jijiki) @WDoranWMF Today after the SRE meeting, I will roll out to production. We had some minor issues with our api servers this morn...
[17:29:49] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (elukey) In hiera we have 4 codfw mw hosts acting as proxy for mcrouter: ` codfw: A: host: 10.192.0.61 # mw2235, A3 port: 11214 ssl: true B: host: 10.192.16.5...
[17:32:14] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (elukey)
[17:38:36] serviceops, Core Platform Team, Performance-Team, Scap, and 5 others: Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (Jdforrester-WMF)
[17:42:10] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (jijiki)
[17:45:12] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (jijiki)
[18:02:39] serviceops, Operations, HHVM, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (jijiki)
[18:02:57] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (jijiki)
[18:03:00] serviceops, Operations, HHVM, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (jijiki)
[18:05:26] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (jijiki)
[18:30:37] addshore: are you stalking a certain wiki that i just edited? :-P
[18:31:00] apparently not closely enough :P
[18:31:35] lolol
[18:32:29] which was it?
[20:23:18] i moved mw1298 to the correct section in conftool (jobrunner) so it matches what it is in site.pp
[20:23:36] i saw it as jobrunner/videoscaler as it should and then pooled it
[20:23:42] except the weight is set to 0
[20:25:41] docs seem to say this should only happen after "confctl drain"
[20:30:49] there's no default weight set anymore on new nodes, since there's no longer the idea of service-level settings in confctl
[20:33:36] cdanis: oh. so does it mean i can ignore the weight setting?
[20:33:48] and there is only pool or depool
[20:33:51] mutante: no, you will still have to set it to a value that makes sense given the weights of other nodes
[20:33:52] or inactive
[20:34:05] mutante: it just means that new nodes no longer automatically come with a per-service default weight
[20:34:19] cdanis: How do i change the weight? I could not find that in the conftool page
[20:36:23] so this will show for a given host: confctl --quiet select name=mw1298.eqiad.wmnet get
[20:36:58] and judging by the output of `confctl --quiet select cluster=jobrunner get` (and same for cluster=videoscaler) it looks like the default weight is 10
[20:37:10] yea, i got that part
[20:37:20] so you should be able to replace `get` with `set/weight=10` I think
[20:37:21] set/weight=10 is needed now?
[20:37:24] in the first one
[20:37:28] the name=mw1298
[20:37:54] yep:) works. thanks!
[20:38:03] last time i did this it was automatically 10
[20:38:14] yeah, I'll fix the docs some
[20:38:34] cool. thank you
[20:59:07] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (jijiki) Version 1.10.0 is live, please mark this as resolved if everything works as expected.
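The 20:23 to 20:38 exchange comes down to this: a newly pooled node now starts at weight 0, so you look up the weights of its peers and then set a matching weight by host name. Below is a minimal Python sketch of that decision. It assumes the peer weights have already been read by hand from `confctl --quiet select cluster=jobrunner get`. The helper names and the mw1296/mw1297 entries are hypothetical, and choosing the most common non-zero weight is one reasonable heuristic, not something conftool itself does.

```python
from collections import Counter


def suggest_weight(peer_weights):
    """Return the most common non-zero weight among a node's peers.

    Zeros are ignored because 0 is what a freshly pooled node now gets.
    (Hypothetical helper, not part of conftool.)
    """
    nonzero = [w for w in peer_weights.values() if w > 0]
    if not nonzero:
        raise ValueError("no weighted peers to copy a weight from")
    weight, _count = Counter(nonzero).most_common(1)[0]
    return weight


def confctl_set_weight(host, weight):
    """Build the command from the log: select the host by name, then set/weight."""
    return f"confctl --quiet select name={host} set/weight={weight}"


if __name__ == "__main__":
    # Hypothetical weights, as one might copy them from
    # `confctl --quiet select cluster=jobrunner get`.
    peers = {
        "mw1296.eqiad.wmnet": 10,
        "mw1297.eqiad.wmnet": 10,
        "mw1298.eqiad.wmnet": 0,  # the newly pooled node
    }
    print(confctl_set_weight("mw1298.eqiad.wmnet", suggest_weight(peers)))
```

Run as-is, this prints the same command that worked in the channel: `confctl --quiet select name=mw1298.eqiad.wmnet set/weight=10`.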
[21:01:50] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (jijiki)
[21:21:23] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (jijiki) Currently:
| Rack | A5 | A6 | A7 | B6 | B7 | C6 | D4 | D5 |
| mw servers | 6 | 6 | 17 | 21 | 6 | 30 | 6 (decom) | 30 (decom) |
We will decommission 36 servers from...
[22:45:44] serviceops, Operations: upgrade and rename krypton & create its codfw equivalent - https://phabricator.wikimedia.org/T224247 (Dzahn) Open→Resolved
[22:59:24] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Prod compare endpoint missing offset object (with from & to keys) on diff items - https://phabricator.wikimedia.org/T238846 (Tsevener) Open→Resolved
[22:59:28] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (Tsevener)
[23:00:11] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (Tsevener) Open→Resolved
[23:00:21] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (Tsevener) Looking good in Prod, thanks everyone!
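The 21:21:23 comment on T236437 appears to list mw servers per eqiad rack, with the D4 and D5 counts marked "(decom)". Reading the flattened table that way (an assumption, since the comment is truncated), a quick tally confirms that those two racks account for the 36 servers slated for decommissioning:

```python
# Server counts per eqiad rack, transcribed from the T236437 comment
# (assumes the table pairs each rack header with the count below it).
racks = {"A5": 6, "A6": 6, "A7": 17, "B6": 21, "B7": 6, "C6": 30, "D4": 6, "D5": 30}
decom_racks = {"D4", "D5"}  # the racks marked "(decom)"

total = sum(racks.values())
decom = sum(n for rack, n in racks.items() if rack in decom_racks)
print(total, decom, total - decom)  # 122 36 86
```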
[23:15:44] serviceops, ChangeProp, Operations, Release Pipeline, and 4 others: Migrate cpjobqueue to kubernetes - https://phabricator.wikimedia.org/T220399 (holger.knust) a:holger.knust
[23:16:10] serviceops, ChangeProp, Release Pipeline, Release-Engineering-Team-TODO, and 2 others: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (holger.knust) a:holger.knust
[23:40:26] serviceops, DC-Ops, Operations, ops-eqiad: mw1239 memory errors - https://phabricator.wikimedia.org/T227867 (Dzahn) Open→Declined
[23:52:44] _joe_, mutante checking in on the memory bump patch ... looks like it is blocked on a puppet patch. something that can go out tomorrow?