[07:06:41] serviceops, Dumps-Generation, SRE-tools, IPv6: Some Service Operations clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271142 (ArielGlenn) >>! In T271142#6957884, @crusnov wrote: >>>! In T271142#6957794, @crusnov wrote: >> Sounds good, if you'd like to ping me on IRC...
[09:20:56] serviceops, MW-on-K8s, Patch-For-Review: Define the size of a pod for mediawiki in terms of resource usage - https://phabricator.wikimedia.org/T278220 (jijiki) At first it looks like a mw1410 (api same h/w as mw1412) performs slightly better in p50, but that was happening from before, so basically I...
[12:32:40] serviceops, MW-on-K8s, Patch-For-Review: Define the size of a pod for mediawiki in terms of resource usage - https://phabricator.wikimedia.org/T278220 (akosiaris) I think so too. I've worked up a new version of the patch that no longer touches needlessly all hosts having mcrouter and memcached insta...
[12:44:46] serviceops, MediaWiki-Page-derived-data, Sustainability: Add rate limiting to the jobqueue vidoscalers - https://phabricator.wikimedia.org/T278945 (jbond) p:Triage→Medium
[12:50:33] serviceops, MediaWiki-Page-derived-data, Sustainability (Incident Followup): Add rate limiting to the jobqueue vidoscalers - https://phabricator.wikimedia.org/T278945 (jbond)
[12:51:19] serviceops, SRE, Sustainability (Incident Followup): Add alerting for Memcached timeout errors - https://phabricator.wikimedia.org/T278946 (jbond) p:Triage→Medium
[13:11:34] serviceops, Dumps-Generation, SRE-tools, IPv6: Some Service Operations clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271142 (akosiaris) >>! In T271142#6957011, @crusnov wrote: >>>! In T271142#6955077, @akosiaris wrote: >> Hi! >> >> TL;DR: Aside from snapshot and du...
[13:24:10] serviceops, SRE, Parsoid (Tracking): Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (jijiki)
[14:14:41] serviceops, Product-Infrastructure-Team-Backlog, Push-Notification-Service: Allow `push-notifications` service to accept production environment flag for APNS requests - https://phabricator.wikimedia.org/T274456 (jijiki) >>! In T274456#6957406, @Dmantena wrote: >>> If we can't access that actual produ...
[14:19:15] serviceops, Prod-Kubernetes, Patch-For-Review, User-fsero: Set up PodSecurityPolicies in clusters - https://phabricator.wikimedia.org/T228967 (akosiaris)
[14:20:37] serviceops, Analytics-Radar, Cassandra, ContentTranslation, and 9 others: Rebuild all blubber build docker images running on kubernetes - https://phabricator.wikimedia.org/T274262 (akosiaris) Open→Resolved All action items have been addressed, many many thanks to everyone who contributed....
[15:05:20] serviceops, MW-on-K8s: Define the size of a pod for mediawiki in terms of resource usage - https://phabricator.wikimedia.org/T278220 (akosiaris) p:Triage→Medium Change merged and shepherd into production. https://grafana-rw.wikimedia.org/d/0VjCCwwGk/mediawiki-server-clusters-utilization?orgId=1 n...
[15:26:58] serviceops, Dumps-Generation, SRE-tools, IPv6: Some Service Operations clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271142 (crusnov)
[15:32:33] serviceops, Dumps-Generation, SRE-tools, IPv6: Some Service Operations clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271142 (crusnov) >>! In T271142#6960492, @akosiaris wrote: >> rdb looks straight forward. > > Yes it does. ferm configuration is just opening up the...
[20:05:34] serviceops, SRE: bring 35 new mediawiki appserver in codfw into production (mw2377 - mw2402) - https://phabricator.wikimedia.org/T278396 (Dzahn) codfw: number of appservers ("apaches"): 49 number of API appservers ("api"): 54 number of jobrunners/videoscalers ("jobrunner"): 18 eqiad: number of app...
[20:14:51] serviceops, SRE: bring 35 new mediawiki appserver in codfw into production, rack A3 (mw2377 - mw2402) - https://phabricator.wikimedia.org/T278396 (Dzahn)
[20:20:32] ^ figuring out how to divide the new hardware between different roles
[20:20:54] I am going to add: 4 more jobrunners, 12 app and 8 API servers
[20:21:26] this will let us finish decom of 4 more jobrunners on old hardware and bring the numbers to 61/62 vs 63/63 in eqiad for the rest
[20:21:40] feel free to check me pls
[20:53:42] serviceops, DBA, Phabricator, User-brennen: Phabricator intermittently slow; db connection failures to m3-master.eqiad.wmnet with "Temporary failure in name resolution" - https://phabricator.wikimedia.org/T279013 (brennen)
[20:56:50] serviceops, DBA, Phabricator, User-brennen: Phabricator intermittently slow; db connection failures to m3-master.eqiad.wmnet with "Temporary failure in name resolution" - https://phabricator.wikimedia.org/T279013 (Reedy) Isn't this why in MW land we tend to use IP addresses rather than hostnames...
[21:05:09] serviceops, DBA, Phabricator, User-brennen: Phabricator intermittently slow; db connection failures to m3-master.eqiad.wmnet with "Temporary failure in name resolution" - https://phabricator.wikimedia.org/T279013 (Legoktm) For reference: ` $ host m3-master.eqiad.wmnet m3-master.eqiad.wmnet is an...
[21:05:38] serviceops, DBA, Phabricator, User-brennen: Phabricator intermittently slow; db connection failures to m3-master.eqiad.wmnet with "Temporary failure in name resolution" - https://phabricator.wikimedia.org/T279013 (mmodell) @reedy: perhaps? but we've had it configured that way forever.
[21:06:29] mutante: that seems reasonable to me
[21:07:16] legoktm: thanks!
[21:07:41] and re: phab, I can do a DNS lookup on commandline (without PHP) constantly and it never fails... tried in a loop
[21:07:45] serviceops, DBA, Phabricator, User-brennen: Phabricator intermittently slow; db connection failures to m3-master.eqiad.wmnet with "Temporary failure in name resolution" - https://phabricator.wikimedia.org/T279013 (Reedy) Just because something has worked fine for a long time, doesn't mean it alwa...
[21:09:20] serviceops, DBA, Phabricator, User-brennen: Phabricator intermittently slow; db connection failures to m3-master.eqiad.wmnet with "Temporary failure in name resolution" - https://phabricator.wikimedia.org/T279013 (mmodell) I'm certainly not against doing it that way, I presume we could have puppe...
[21:14:46] re: phab, think we might be getting a bunch of bot traffic from cloud vps too, name resolution could be a red herring.
[21:25:29] serviceops, DBA, Phabricator, Patch-For-Review, User-brennen: Phabricator intermittently slow; db connection failures to m3-master.eqiad.wmnet with "Temporary failure in name resolution" - https://phabricator.wikimedia.org/T279013 (CDanis) I'm not sure if we tend to use IP addresses directly...