[06:05:18] <_joe_> so we have 10G racks that can be used for memcached as well?
[14:21:48] Hey akosiaris o/ - do you have some time to discuss https://phabricator.wikimedia.org/T266766 today?
[14:22:18] sure yeah, maybe I can focus on something for a while
[14:22:21] * akosiaris reading
[14:23:09] * jayme espressoing
[14:34:59] Now that I read calico stuff, it turns out that the current calico versions are not currently tested against kubernetes 1.19. So we might need to stick to 1.18 anyways
[14:55:58] jayme: we can probably do what https://phabricator.wikimedia.org/T266766 talks about. It's not really like we win much by that building process
[14:56:08] I've left a comment already
[14:56:14] jayme: which calico stuff?
[14:57:19] akosiaris: generics...trying to understand what we would need to get/build/package there. https://docs.projectcalico.org/getting-started/kubernetes/requirements says Calico v3.16 is tested against 1.6-1.18
[15:00:38] might be worth the time savings to use the official release builds for calico as well. Unfortunately I did not even find checksums for the tarball they release :-(
[15:06:08] can't find anything about 1.19 so yeah, maybe stick to 1.18
[15:06:45] I found https://github.com/kubernetes/kubernetes/issues/94640 where it seems like it did not work, but it's marked as fixed now
[15:07:07] ah no, as closed, but not fixed
[15:07:08] heh
[15:11:50] hm..yeah. Most likely calico will release a k8s 1.19 tested version as soon as we have updated the clusters to 1.18 :)
[15:15:07] _joe_: can you please look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/637708
[15:18:53] <_joe_> effie: +1, but... for monday, right?
[15:19:13] for mc2036 we were thinking about today
[15:20:19] yeah, my take is this is fine to do on a Friday -- it's low blast-radius, and if anything goes wrong it'll certainly be right now and not over the weekend
[15:20:42] if you'd prefer that we wait, we can
[15:26:41] _joe_: thoughts?
[15:27:10] <_joe_> oh for just codfw, yeah go on
[15:27:34] the patch will kill the shard though on both DCs
[15:27:40] but I take it it is ok?
[15:27:40] <_joe_> yeah
[15:27:42] cool
[15:28:05] 👍
[15:28:38] <_joe_> yeah in codfw it's cool to do today I think
[15:28:47] <_joe_> it will just stop replication for that redis shard
[16:22:08] serviceops, Release-Engineering-Team, Patch-For-Review: replace production deployment servers - https://phabricator.wikimedia.org/T265963 (Cmjohnson)
[16:43:11] serviceops, MW-on-K8s, Operations, observability: Logging options for apache httpd in k8s - https://phabricator.wikimedia.org/T265876 (akosiaris) Couple of points > We create a directory on the k8s node that works as a hostpath in all apache containers, and we make apache write its logs there, w...
[17:23:10] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Build kubernetes 1.19 - https://phabricator.wikimedia.org/T266766 (JMeybohm) Unfortunately, the current version of calico is not tested against kubernetes 1.19.x yet (https://docs.projectcalico.org/getting-started/kubernetes/requirements#...
[17:23:23] serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` scandium.eqi...
[17:33:19] serviceops, Prod-Kubernetes, Kubernetes: Build calico 3.16 - https://phabricator.wikimedia.org/T266893 (JMeybohm)
[17:33:58] serviceops, Prod-Kubernetes, Kubernetes: Build calico 3.16 - https://phabricator.wikimedia.org/T266893 (JMeybohm)
[17:34:01] serviceops, Operations, Prod-Kubernetes, Kubernetes, User-fsero: Upgrade Calico - https://phabricator.wikimedia.org/T207804 (JMeybohm)
[17:39:08] serviceops, Prod-Kubernetes, Kubernetes: Decide if we want to stick with etcd datastore - https://phabricator.wikimedia.org/T266895 (JMeybohm)
[17:39:33] serviceops, Operations, Growth-Team (Current Sprint), Patch-For-Review, and 2 others: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin2001.codfw.wmnet for hosts: ` mc2036.codfw.wmnet ` The...
[17:44:38] serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['scandium.eqiad.wmnet'] ` and were **ALL** successful.
[17:52:05] serviceops, Operations, Performance-Team, Traffic, Performance Issue: Very long response time on frwiki main page - https://phabricator.wikimedia.org/T266865 (CDanis) This isn't limited to just esams; it is in fact happening across all cache clusters. All of my requests took at least 19 seco...
[18:37:00] serviceops, Operations, Performance-Team, Traffic, Performance Issue: Very long response time on frwiki main page - https://phabricator.wikimedia.org/T266865 (jijiki) p:High→Unbreak!
[18:45:14] serviceops, Operations, Growth-Team (Current Sprint), Patch-For-Review, and 2 others: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2036.codfw.wmnet'] ` Of which those **FAILED**: ` ['mc2036.codfw.wmn...
[18:46:19] serviceops, Operations, Performance-Team, Traffic, and 2 others: Very long response time on frwiki main page - https://phabricator.wikimedia.org/T266865 (jijiki)
[18:56:00] serviceops, Operations, Performance-Team, Traffic, and 2 others: Very long response time on frwiki main page - https://phabricator.wikimedia.org/T266865 (CDanis) Open→Resolved a:CDanis Approx 23:00 on 28 Oct, the size of the featured feed for frwiki started to become too large to be s...
[19:07:02] serviceops, Operations, Performance-Team, Traffic, and 2 others: Very long response time on frwiki main page - https://phabricator.wikimedia.org/T266865 (Legoktm) >>! In T266865#6592372, @CDanis wrote: > Long ago, frwiki's default feed length (in days) was set to 60, well above the default of 10....
[20:41:02] serviceops, Operations, Growth-Team (Current Sprint), Patch-For-Review, and 2 others: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin2001.codfw.wmnet for hosts: ` mc2036.codfw.wmnet ` The...
[20:41:06] serviceops, Operations, Growth-Team (Current Sprint), Patch-For-Review, and 2 others: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2036.codfw.wmnet'] ` Of which those **FAILED**: ` ['mc2036.codfw.wmn...
[20:44:21] serviceops, Operations, Growth-Team (Current Sprint), Patch-For-Review, and 2 others: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin2001.codfw.wmnet for hosts: ` mc2036.codfw.wmnet ` The...
[20:58:43] serviceops, DC-Ops, Operations, ops-eqiad, Patch-For-Review: eqiad: Physical Moves for MediaWiki Servers - https://phabricator.wikimedia.org/T266164 (Dzahn) a:Cmjohnson→Dzahn
[21:10:47] serviceops, Operations, Growth-Team (Current Sprint), Patch-For-Review, and 2 others: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2036.codfw.wmnet'] ` and were **ALL** successful.
[21:17:37] serviceops, Operations, Growth-Team (Current Sprint), Patch-For-Review, and 2 others: Reimage one memcached shard per DC to Buster - https://phabricator.wikimedia.org/T252391 (jijiki)
[21:20:51] serviceops, Operations, Growth-Team (Current Sprint), Patch-For-Review, and 2 others: Reimage one memcached shard per DC to Buster - https://phabricator.wikimedia.org/T252391 (jijiki) We removed `shard18` from `redis.yaml` so to be able to avoid installing redis-server on this server pair (mc1036...
[22:17:22] serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (Dzahn) scandium has been reimaged. It is now just an mw appserver plus: git clone of parsoid repo, nginx for test s...
[22:18:53] serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (Dzahn) Open→Resolved @ssastry @Muehlenhoff Let me know if you see anything missing. Claiming resolved for now.
[23:48:29] regarding the Wikidata microsites, I added apache config and updated the TLS cert:
[23:48:32] https://phabricator.wikimedia.org/T266702#6592812
[23:49:28] now it should just need the ATS part to make it public (as the ticket says, that needs discussion) but I wanted to unblock as much as possible
[23:49:54] it already works to use httpbb (with a manual test file) on miscweb* to get query.wikidata.org