[03:26:25] PROBLEM - Host db1140 is DOWN: PING CRITICAL - Packet loss = 100%
[04:00:49] PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration
[04:02:31] RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 0.006 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[04:45:47] (CR) Andrew Bogott: "retest" [puppet] - https://gerrit.wikimedia.org/r/589741 (owner: Andrew Bogott)
[04:47:20] (CR) Andrew Bogott: "recheck" [puppet] - https://gerrit.wikimedia.org/r/589741 (owner: Andrew Bogott)
[04:57:32] (CR) Andrew Bogott: "confirmed no-op in https://puppet-compiler.wmflabs.org/compiler1002/22046/" [puppet] - https://gerrit.wikimedia.org/r/589856 (owner: Andrew Bogott)
[05:51:19] Operations, DBA: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (Marostegui)
[05:51:57] !log Power back on db1140 T250602
[05:52:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:52:03] T250602: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602
[06:00:24] Operations, ops-eqiad, DBA, DC-Ops: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (Marostegui) p:Triage→High a:jcrespo This server is fully broken apparently, it is not powering ON :-( - I have tried multiple combinations: ` hpiLO-> power on status=0 s...
[07:30:27] marostegui: hola! is it ok to ack db1140 in icinga?
(double checking)
[07:42:00] oh yes
[07:42:07] i forgot to do that
[07:42:16] thanks elukey
[07:44:31] ACKNOWLEDGEMENT - Host db1140 is DOWN: PING CRITICAL - Packet loss = 100% Marostegui T250602
[07:46:07] <3
[07:47:23] (PS1) Marostegui: db1140: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/590255 (https://phabricator.wikimedia.org/T250602)
[07:50:24] (CR) Marostegui: [C: +2] db1140: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/590255 (https://phabricator.wikimedia.org/T250602) (owner: Marostegui)
[08:40:23] PROBLEM - Stale file for node-exporter textfile in codfw on icinga1001 is CRITICAL: cluster=cache_text file=vhtcpd.prom instance=cp2029:9100 job=node site=codfw https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile
[08:48:49] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:50:29] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3050 is OK: HTTP OK: HTTP/1.0 200 OK - 22720 bytes in 0.269 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[10:20:37] PROBLEM - Stale file for node-exporter textfile in esams on icinga1001 is CRITICAL: cluster=cache_text file=vhtcpd.prom instance=cp3050:9100 job=node site=esams https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile
[10:34:44] Operations, MediaWiki-General, serviceops, Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (matej_suchanek)
[10:49:19] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060
is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[10:52:53] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22720 bytes in 0.263 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[11:02:13] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[11:09:29] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is OK: HTTP OK: HTTP/1.0 200 OK - 22718 bytes in 0.258 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[11:19:37] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 54 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:25:27] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 50 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:05:11] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[13:16:05] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is OK: HTTP OK: HTTP/1.0 200 OK - 22719 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[13:35:01] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 52 probes of 554 (alerts on 50) -
https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:40:51] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 46 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:49:19] (PS3) ArielGlenn: check bz2 page content files for existence before running command batch [dumps] - https://gerrit.wikimedia.org/r/589032 (https://phabricator.wikimedia.org/T250260)
[13:50:19] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 56 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[14:01:59] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 50 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[14:05:11] (PS1) Ppchelko: EventBus: Switch to namespaced class names [mediawiki-config] - https://gerrit.wikimedia.org/r/590437
[14:07:27] (CR) Ppchelko: [C: -2] "Can't be merged until If99d909ca5c0074eb1484708d08116da49a6b19a is completely deployed" [mediawiki-config] - https://gerrit.wikimedia.org/r/590437 (owner: Ppchelko)
[15:23:53] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:25:31] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is OK: HTTP OK: HTTP/1.0 200 OK - 22715 bytes in 0.260 second response
time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:34:17] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:46:57] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is OK: HTTP OK: HTTP/1.0 200 OK - 22712 bytes in 0.259 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[16:05:07] !log rolling restart of ats-tls in text@esams - T249335
[16:05:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:15] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[16:07:12] (Abandoned) Krinkle: hieradata: Move beta 'cache::app_directors' from Horizon to Puppet [puppet] - https://gerrit.wikimedia.org/r/530771 (https://phabricator.wikimedia.org/T158837) (owner: Krinkle)
[16:11:28] (PS2) Krinkle: hieradata: Add 'performance.wikimedia.beta.wmflabs.org' routing [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[16:13:47] (CR) Reedy: [C: +2] labs: Move RB traffic to new stretch host [mediawiki-config] - https://gerrit.wikimedia.org/r/589912 (https://phabricator.wikimedia.org/T250574) (owner: Alex Monk)
[16:14:54] (Merged) jenkins-bot: labs: Move RB traffic to new stretch host [mediawiki-config] - https://gerrit.wikimedia.org/r/589912 (https://phabricator.wikimedia.org/T250574) (owner: Alex Monk)
[16:15:10] (PS3) Krinkle: hieradata: Add 'performance.wikimedia.beta.wmflabs.org' routing [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[16:16:37] thanks Reedy
[16:17:18] (PS4) Krinkle: hieradata: Add 'performance.wikimedia.beta.wmflabs.org' routing [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[16:17:32]
https://gerrit.wikimedia.org/r/operations/mediawiki-config/':
[16:17:42] fatal: unable to access 'https://gerrit.wikimedia.org/r/operations/mediawiki-config/': The requested URL returned error: 502
[16:17:44] gj gerrit
[16:18:04] T246763
[16:18:05] T246763: Jenkins job failing intermittently due to Gerrit HTTP 502 errors when cloning repos - https://phabricator.wikimedia.org/T246763
[16:18:32] Krenair: do you happen to know what I need to do nowadays to make ^ work?
[16:18:49] I've tried a few variations based on rudimentary imitation of what other things do
[16:19:01] cherry-picked and re-ran puppet
[16:19:13] !log reedy@deploy1001 Synchronized wmf-config/LabsServices.php: labs: Move RB traffic to new stretch host (duration: 01m 11s)
[16:19:16] afraid not, sorry
[16:19:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:22] haven't run into that myself
[16:20:07] Krinkle, oh sorry are you talking about the gerrit error or some routing for stuff in beta?
[16:20:23] routing stuff
[16:22:09] ah "cache::alternate_domains: {}" is applied locally via Horizon
[16:22:13] I guess that wins
[16:23:13] and 'mapping_rules' as well
[16:27:40] (PS5) Krinkle: hieradata: Add 'performance.wikimedia.beta.wmflabs.org' routing [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[16:28:07] yeah we should really delete all the puppet.git hieradata so we have one place for it
[16:28:49] I'd prefer it live in puppet, so that it's not as easily forgotten about. at least for my team it would allow us to keep it together and CR the same way etc.
[16:29:47] well, it's not going in that direction
[16:29:53] puppet.git has too restrictive ACL
[16:31:33] I don't have merge access there either, I don't see that as a problem though. I don't plan to maintain stuff through Horizon if there's another way without obvious downside.
[16:31:43] at least for stuff that can be shared with prod
[16:32:37] is there a task about this? isn't it good to converge?
[16:36:05] Krinkle, well there's a task about general puppet management, that mentions this
[16:36:48] (PS6) Krinkle: hieradata: Add 'performance.wikimedia.beta.wmflabs.org' routing [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[16:36:58] https://phabricator.wikimedia.org/T161675
[16:39:13] oh, I guess that yaml file is old/unused
[16:39:17] and le_subjects is no longer used either?
[16:41:30] (PS7) Krinkle: hieradata: Add 'performance.wikimedia.beta.wmflabs.org' routing [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[16:46:45] well, https://performance.wikimedia.beta.wmflabs.org/ is now timing out instead of serving "Domain not served"
[16:46:51] I guess that's progress?
[16:47:16] Krinkle, I think le_subjects was for the old Let's Encrypt puppetisation designed for single-server setups?
[16:47:55] Krinkle, which yaml file is old/unused?
[16:48:01] text05
[16:48:11] yeah I deleted that host
[16:48:29] upload04 too, but like over a year ago I think
[16:48:55] Krinkle, and yes, specifically in that file - that's a legacy from before we used acme-chief in deployment-prep
[16:50:08] looking at your new domain now
[16:50:37] root@deployment-cache-text06:/etc/trafficserver# grep performance.wikimedia * -r
[16:50:37] remap.config:map http://performance.wikimedia.beta.wmflabs.org http://deployment-webperf11.deployment-prep.eqiad.wmflabs
[16:51:05] Krinkle, probably not the source of the immediate issue (domain not served), but I can't seem to connect from cache-text06 to webperf11
[16:51:28] (TCP port 80)
[16:51:38] it no longer serves domain not served for me, add query buster to bypass cache
[16:51:44] so yeah, looking at access rules now
[16:52:01] it's accessible by webproxy already
[16:52:06] https://performance-beta.wmflabs.org/
[16:52:09] (has been for years)
[16:52:13] so I figured it was accessible enough
[16:53:17] will either be ferm or an openstack security group
[16:53:21] maybe both
[16:53:40] root@deployment-webperf11:~# iptables -L | grep dpt:http
[16:53:40] ACCEPT tcp -- proxy-01.project-proxy.eqiad1.wikimedia.cloud anywhere tcp dpt:http
[16:53:40] ACCEPT tcp -- proxy-02.project-proxy.eqiad1.wikimedia.cloud anywhere tcp dpt:http
[16:53:40] root@deployment-webperf11:~#
[16:53:53] that's probably what lets the cloud vps generic proxy through but not our ATS machine
[16:54:56] Krinkle, try setting `cache_hosts` in hiera to include those two but also deployment-cache-text06
[16:58:11] oh there's also prefixpuppet in Horizon
[16:58:36] yeah
[16:58:57] btw you do know about the cloud/instance-puppet.git repo right?
[16:58:59] (PS8) Krinkle: hieradata: Add 'performance.wikimedia.beta.wmflabs.org' routing [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[16:59:42] Krenair: Haven't heard of it.
But I assume that's where the public git commits are stored from Horizon?
[16:59:49] Yeah, I've seen that
[17:03:26] yeah
[17:03:46] https://performance.wikimedia.beta.wmflabs.org/?bus2346
[17:03:47] woo
[17:03:52] yay
[17:03:57] thanks Krenair
[17:04:12] np
[17:06:37] (PS9) Krinkle: hieradata: Add 'performance.wikimedia.beta.wmflabs.org' routing [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[17:07:06] (CR) Krinkle: "I've applied the text06 changes via Horizon instead:" [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837) (owner: Krinkle)
[17:07:54] Krenair: the 'mapping_rules' rules, where do you think that should live?
[17:08:02] project-wide horizon, prefix puppet or instance?
[17:08:15] I can't tell whether it's meant to apply to upload or not
[17:08:24] seems not?
[17:08:41] also, given these only have 1 instance, do you prefer it be on the instance or on the prefix?
[17:09:01] I tend not to worry about these things too much
[17:09:40] it's profile::trafficserver::backend::mapping_rules: so
[17:10:09] it's probably not gonna do any harm to leave it in project-wide hiera, but I think we should maybe avoid adding lots of clutter to project hiera
[17:10:22] prefix puppet sounds ideal
[17:10:26] oh my, it's alphabetizing my properties
[17:10:40] OK, so that's why it was the way it was
[17:10:55] mm I think somewhere there's some code that loads it as YAML and some other code that writes it as YAML
[17:11:01] Bryan mentioned something like that
[17:11:07] or was it and.rew
[17:11:16] yeah, I thought maybe per-instance helped you with migrations if e.g.
next ats/varnish needs to be different
[17:11:24] but I suppose in that case you can copy in/out as needed
[17:11:30] what we might actually want is keep it as-is but validate that it's valid YAML
[17:11:37] well
[17:12:04] one thing I do say is to be very careful about which puppet *classes* get added project/prefix-wide
[17:12:13] generally I always apply them on a per-instance basis
[17:12:46] right
[17:12:49] this is because applying a new class is often a bit of a bumpy process
[17:13:04] if you do it at a project-wide/prefix-wide level, labs cloud init stuff will attempt to apply them at first boot
[17:13:30] which can break the ability to create new instances (that actually work and normal people can log into) in that project/prefix
[17:13:43] hiera though should be fine
[17:13:55] I didn't know classes could be applied by prefix
[17:13:59] anyway, yeah sounds good to me
[17:14:00] yeah.
[17:14:01] prefix is probably better for migrations
[17:14:13] generally if I come along and want to replace an instance it'll be because of OS version
[17:14:21] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:14:23] (distro version, not openstack)
[17:14:31] and I'll want it to otherwise act the same
[17:14:45] if I need to change something in the new one I can set hieradata specifically for the new instance and leave the prefix be
[17:14:51] until we've cleaned away the old instances
[17:15:43] (well you could keep it after, but y'know, tidying)
[17:15:59] overall though deployment-prep needs a big hiera cleanup, but it needs other things more
[17:16:05] (PS10) Krinkle: hieradata: Add 'performance.wikimedia.beta.wmflabs.org' routing [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[17:17:28] (PS11) Krinkle: hieradata: Include cache-text in Beta Cluster
'cache_hosts' [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[17:17:53] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22694 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:18:02] (CR) Krinkle: hieradata: Include cache-text in Beta Cluster 'cache_hosts' (1 comment) [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837) (owner: Krinkle)
[17:18:59] Krenair: I suppose 'cache_hosts' is useful to keep here given it supports comments and commit summaries?
[17:19:18] overall though I'm on board with putting stuff in Horizon I guess, so long as we're consistent for a given domain/area (e.g. all caching/ats stuff)
[17:19:38] eg. either all project/prefix/host in puppet or in horizon
[17:20:26] yeah I would like comment support at least before moving everything
[17:22:08] (PS12) Krinkle: hieradata: Include cache-text in Beta Cluster 'cache_hosts' [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[17:22:10] (PS1) Krinkle: hieradata: Remove obsolete deployment-prep overrides [puppet] - https://gerrit.wikimedia.org/r/590530
[17:22:32] bd808, do we have a task somewhere about preserving formatting of YAML in the horizon hiera editor?
[17:22:51] I recall probably either you or and.rew mentioning us parsing and then dumping which loses that stuff
[17:23:17] I did a search but nothing jumps out to me as this
[17:23:26] Krenair: I don't remember if anyone made a feature request about that or not. Probably not.
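Editor's note on the "parsing and then dumping" behaviour mentioned at 17:22:51, and the "keep it as-is but validate" idea from 17:11:30. The sketch below is only a rough illustration and is not the Horizon code. It assumes a toy parser (`parse_flat`, hypothetical) that understands nothing but flat `key: value` lines and `#` comments, standing in for a real YAML library. The key names come from the log; the values and the comment are made up. `save_roundtrip` shows why comments and key order disappear when text goes through a dict and a sorting dumper, while `save_validated` parses only to catch errors and stores the original text.

```python
# Toy stand-in for a YAML parser: flat "key: value" lines and "#" comments only.
# This is NOT the Horizon implementation and NOT a real YAML parser.


def parse_flat(text):
    """Parse flat 'key: value' lines into a dict; raise ValueError on bad lines."""
    data = {}
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # comments and blank lines never reach the dict
        key, sep, value = stripped.partition(": ")
        if not sep or not key:
            raise ValueError(f"line {number}: expected 'key: value', got {line!r}")
        data[key] = value
    return data


def dump_flat(data):
    """Serialise the dict with sorted keys, the way a sorting dumper would."""
    return "".join(f"{key}: {data[key]}\n" for key in sorted(data))


def save_roundtrip(text):
    """Parse, then dump: comments and the author's key order are lost."""
    return dump_flat(parse_flat(text))


def save_validated(text):
    """Parse only to validate, then keep the author's text byte for byte."""
    parse_flat(text)  # raises ValueError if the input is malformed
    return text


# Hypothetical hiera snippet: key names appear in the log, values are invented.
EXAMPLE = (
    "# example comment that should survive an edit\n"
    "profile::trafficserver::backend::mapping_rules: []\n"
    "cache::alternate_domains: {}\n"
)

if __name__ == "__main__":
    print(save_roundtrip(EXAMPLE))   # sorted keys, comment gone
    print(save_validated(EXAMPLE))   # identical to the input
```

The validate-only path keeps comments at the cost of the automatic sorting, which is the trade-off raised later in the channel.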
[17:23:36] we did talk about it on irc though :)
[17:23:46] :D
[17:23:49] ok
[17:24:00] I guess I'll do that now and maybe look at the code
[17:24:48] we would need to split things up so that we validate using the pyyaml parser, but stop round-tripping the data through python data structures or something
[17:25:45] it becomes a more ugly problem when you consider maintaining support for the form based workflow that builds the yaml for you
[17:26:07] probably just ugly and not impossible though
[17:26:26] (PS2) Krinkle: hieradata: Remove obsolete deployment-prep overrides [puppet] - https://gerrit.wikimedia.org/r/590530
[17:26:28] wasn't the form workflow the puppet classes thing?
[17:26:28] (PS13) Krinkle: hieradata: Include cache-text in Beta Cluster 'cache_hosts' [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[17:26:36] i.e. not hiera
[17:26:51] opened https://phabricator.wikimedia.org/T250622
[17:27:25] OK, the above two are now cherry-picked and re-ran puppet on the various hosts and stuff still works :)
[17:27:29] Puppet, Horizon: Preserve formatting etc. in horizon hiera editor - https://phabricator.wikimedia.org/T250622 (Krenair)
[17:27:34] Krenair: yes, but the puppet classes thing is also hiera
[17:27:41] hm
[17:27:46] interesting
[17:27:55] oh yes with the parameters right?
[17:28:01] *nod*
[17:28:25] and the classes themselves are hiera too, but we kind of hide that
[17:28:29] Puppet, Horizon: Preserve formatting etc. in horizon hiera editor - https://phabricator.wikimedia.org/T250622 (Krenair)
[17:30:01] also found https://phabricator.wikimedia.org/T241999 and will make a task for providing a commit summary
[17:31:43] Puppet, Horizon: Allow providing a commit message for hieradata changes - https://phabricator.wikimedia.org/T250623 (Krenair)
[17:34:13] For what it's worth, once I realised there was automatic sorting/formatting, I have been abusing/enjoying it.
I can just paste stuff at the bottom and not worry about new lines or where it "should" go.
[17:34:24] sorting usually puts it where I'd manually place it
[17:34:30] but yeah comments would be super useful
[17:34:35] which is probably a difficult pair
[18:18:16] (CR) Andrew Bogott: [C: +2] Openstack nova: remove spiceproxy code and config [puppet] - https://gerrit.wikimedia.org/r/589856 (owner: Andrew Bogott)
[18:21:16] (CR) Andrew Bogott: [C: +2] nova.conf: remove cc_host config [puppet] - https://gerrit.wikimedia.org/r/589857 (owner: Andrew Bogott)
[18:23:29] (PS9) Andrew Bogott: nova common: replace nova_controller and nova_controller_standby [puppet] - https://gerrit.wikimedia.org/r/589858 (https://phabricator.wikimedia.org/T249941)
[18:24:45] (CR) Andrew Bogott: "mysql:root@localhost [keystone]> select * from token;" [puppet] - https://gerrit.wikimedia.org/r/589877 (https://phabricator.wikimedia.org/T243418) (owner: Andrew Bogott)
[18:27:25] (CR) Andrew Bogott: [C: +2] "pcc shows resource changes but no config changes" [puppet] - https://gerrit.wikimedia.org/r/589858 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[18:33:59] (PS1) Andrew Bogott: Nova project spreadcheck: run on all openstack_controllers [puppet] - https://gerrit.wikimedia.org/r/590587 (https://phabricator.wikimedia.org/T249941)
[18:34:01] (PS1) Andrew Bogott: keystone::fernet_keys profile: use openstack_controllers from hiera [puppet] - https://gerrit.wikimedia.org/r/590588 (https://phabricator.wikimedia.org/T249941)
[18:34:03] (PS1) Andrew Bogott: Keystone service: stop using nova_controller and nova_controller_standby [puppet] - https://gerrit.wikimedia.org/r/590589 (https://phabricator.wikimedia.org/T249941)
[18:39:04] (CR) Andrew Bogott: [C: +2] "pcc results at https://puppet-compiler.wmflabs.org/compiler1003/22050/" [puppet] - https://gerrit.wikimedia.org/r/590587
(https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[18:41:25] (CR) Andrew Bogott: "pcc results at https://puppet-compiler.wmflabs.org/compiler1003/22051/" [puppet] - https://gerrit.wikimedia.org/r/590588 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[18:42:44] (CR) Andrew Bogott: "https://puppet-compiler.wmflabs.org/compiler1003/22051/" [puppet] - https://gerrit.wikimedia.org/r/590588 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[18:43:50] (CR) Andrew Bogott: [C: +2] keystone::fernet_keys profile: use openstack_controllers from hiera [puppet] - https://gerrit.wikimedia.org/r/590588 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[18:52:04] (CR) Andrew Bogott: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/22052/" [puppet] - https://gerrit.wikimedia.org/r/590589 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[19:13:45] (CR) ArielGlenn: [C: +2] remove extraneous private/public wik checks [dumps] - https://gerrit.wikimedia.org/r/585534 (https://phabricator.wikimedia.org/T249508) (owner: ArielGlenn)
[19:14:15] (CR) ArielGlenn: [C: +2] remove capability to dump private tables [dumps] - https://gerrit.wikimedia.org/r/585535 (https://phabricator.wikimedia.org/T249508) (owner: ArielGlenn)
[19:14:56] (CR) ArielGlenn: [C: +2] read special files from directory with the correct date [dumps] - https://gerrit.wikimedia.org/r/589004 (https://phabricator.wikimedia.org/T249508) (owner: ArielGlenn)
[19:15:37] (CR) ArielGlenn: [C: +2] unit test for checking content of index.html file for a wiki dump run [dumps] - https://gerrit.wikimedia.org/r/589005 (https://phabricator.wikimedia.org/T249477) (owner: ArielGlenn)
[19:19:02] (PS1) ArielGlenn: We don't dump private tables so remove them from the dumps config [puppet] - https://gerrit.wikimedia.org/r/590622
(https://phabricator.wikimedia.org/T249508)
[19:20:46] (CR) ArielGlenn: [C: +2] We don't dump private tables so remove them from the dumps config [puppet] - https://gerrit.wikimedia.org/r/590622 (https://phabricator.wikimedia.org/T249508) (owner: ArielGlenn)
[19:24:58] (CR) ArielGlenn: [C: +2] unit test for private/public table type handling [dumps] - https://gerrit.wikimedia.org/r/589006 (https://phabricator.wikimedia.org/T249508) (owner: ArielGlenn)
[19:26:17] this is all prep for a deployment tomorrow morning (I have a very short window in which to deploy, if it's not the weekend and not the middle of a dump run)
[19:26:22] spam will be done shortly
[19:26:31] (CR) ArielGlenn: [C: +2] for 7z production in batches, skip files that exist at beginning of each batch [dumps] - https://gerrit.wikimedia.org/r/565301 (https://phabricator.wikimedia.org/T250260) (owner: ArielGlenn)
[19:26:56] (CR) ArielGlenn: [C: +2] add convenience bash script that runs all unit tests [dumps] - https://gerrit.wikimedia.org/r/589008 (owner: ArielGlenn)
[19:27:49] (CR) ArielGlenn: [C: +2] fix an annoying typo in the test modules docs [dumps] - https://gerrit.wikimedia.org/r/589010 (owner: ArielGlenn)
[19:28:19] (CR) ArielGlenn: [C: +2] check bz2 page content files for existence before running command batch [dumps] - https://gerrit.wikimedia.org/r/589032 (https://phabricator.wikimedia.org/T250260) (owner: ArielGlenn)
[19:28:33] done for today, tomorrow these will be deployed
[20:30:17] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:32:07] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:36:25] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[20:43:43] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.0 200 OK - 22705 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[20:47:12] (CR) Jhedden: [C: +1] openstack::keystone::cleanup: remove all timers [puppet] - https://gerrit.wikimedia.org/r/589877 (https://phabricator.wikimedia.org/T243418) (owner: Andrew Bogott)
[20:47:46] (CR) Jhedden: [C: +1] Keystone: remove openstack::keystone::cleanup [puppet] - https://gerrit.wikimedia.org/r/589876 (https://phabricator.wikimedia.org/T243418) (owner: Andrew Bogott)
[20:58:38] (Abandoned) Reedy: Update EventBus classes [mediawiki-config] - https://gerrit.wikimedia.org/r/589878 (owner: Reedy)
[20:59:06] (CR) Reedy: "I'd done this in https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/589878/ but just abandoned it" [mediawiki-config] - https://gerrit.wikimedia.org/r/590437 (owner: Ppchelko)
[21:50:41] (PS1) BryanDavis: legacy ingress: propagate query string to toolforge domain [software/tools-webservice] - https://gerrit.wikimedia.org/r/590671 (https://phabricator.wikimedia.org/T250625)