[00:00:30] (03PS22) 10CRusnov: customscripts/interface_automation.py: Add Interface and IP Importer [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/588036 (https://phabricator.wikimedia.org/T244153) [00:01:07] (03CR) 10jerkins-bot: [V: 04-1] customscripts/interface_automation.py: Add Interface and IP Importer [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/588036 (https://phabricator.wikimedia.org/T244153) (owner: 10CRusnov) [00:02:17] (03PS23) 10CRusnov: customscripts/interface_automation.py: Add Interface and IP Importer [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/588036 (https://phabricator.wikimedia.org/T244153) [00:02:44] (03CR) 10jerkins-bot: [V: 04-1] customscripts/interface_automation.py: Add Interface and IP Importer [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/588036 (https://phabricator.wikimedia.org/T244153) (owner: 10CRusnov) [00:03:51] (03PS24) 10CRusnov: customscripts/interface_automation.py: Add Interface and IP Importer [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/588036 (https://phabricator.wikimedia.org/T244153) [00:08:16] (03CR) 10Dzahn: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/621257 (owner: 10Jbond) [00:09:47] 10Operations, 10SRE-Access-Requests: edtadros is in nda group, but registered with a WMF account - https://phabricator.wikimedia.org/T260070 (10Dzahn) [00:11:30] (03PS1) 10Papaul: Add kuberbetes2017 MAC address and to netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/621616 (https://phabricator.wikimedia.org/T258745) [00:19:58] (03CR) 10Papaul: [C: 03+2] Add kuberbetes2017 MAC address and to netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/621616 (https://phabricator.wikimedia.org/T258745) (owner: 10Papaul) [00:30:52] 10Operations, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install kubernetes2017.codfw.wmnet - https://phabricator.wikimedia.org/T258745 (10Papaul) [00:44:15] (03PS1) 10Bstorm: wikireplicas: 
refactor to eliminate confusing "labsdb" naming [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) [00:44:29] (03CR) 10jerkins-bot: [V: 04-1] wikireplicas: refactor to eliminate confusing "labsdb" naming [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm) [00:46:39] (03CR) 10Bstorm: "This is hardly done yet, but it is pretty close and reviews are welcome!" [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm) [00:52:31] (03PS1) 10Papaul: Add kubernetes2017 to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/621619 (https://phabricator.wikimedia.org/T258745) [00:53:04] (03PS2) 10Bstorm: wikireplicas: refactor to eliminate confusing "labsdb" naming [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) [00:54:04] (03CR) 10Papaul: [C: 03+2] Add kubernetes2017 to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/621619 (https://phabricator.wikimedia.org/T258745) (owner: 10Papaul) [01:00:01] 10Operations, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install kubernetes2017.codfw.wmnet - https://phabricator.wikimedia.org/T258745 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` kubernetes2017.codfw.wmnet ` The log... 
[01:12:13] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install kubernetes2017.codfw.wmnet - https://phabricator.wikimedia.org/T258745 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['kubernetes2017.codfw.wmnet'] ` Of which those **FAILED**: ` ['kubernetes2017.codfw.wmnet'] ` [01:13:57] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [01:13:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:15:58] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [01:16:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:20:07] 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10Papaul) [01:24:14] PROBLEM - PHP7 rendering on mw1344 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1310 bytes in 0.053 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [01:24:34] 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul) @Marostegui i can only get you 1 server at A1 [01:25:24] PROBLEM - Apache HTTP on mw1344 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1310 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers [01:28:12] 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul) [01:31:23] 10Operations, 10Maps, 10Traffic, 10Wiki-Loves-Monuments (2020): maps.wikilovesmonuments.org returns a HTTP 429 error (let it access varnish maps_domains) - https://phabricator.wikimedia.org/T260520 (10Zache) Confirming, the map seems to work, thank you for 
everybody involved! [01:35:10] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install kubernetes2017.codfw.wmnet - https://phabricator.wikimedia.org/T258745 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` kubernetes2017.codfw.wmnet ` The log can be found in `/var/... [01:47:24] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install kubernetes2017.codfw.wmnet - https://phabricator.wikimedia.org/T258745 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['kubernetes2017.codfw.wmnet'] ` Of which those **FAILED**: ` ['kubernetes2017.codfw.wmnet'] ` [01:49:07] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [01:49:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:51:15] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [01:51:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:18:39] 10Operations, 10serviceops, 10Platform Team Workboards (Clinic Duty Team): PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (10tstarling) >>! In T260330#6399137, @Joe wrote: > * The service will be executed as user nobody I would suggest using a new system user... 
[02:35:08] (03PS1) 10Ppchelko: Provide a message for 404 response body [deployment-charts] - 10https://gerrit.wikimedia.org/r/621620 (https://phabricator.wikimedia.org/T260795) [02:41:28] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 572 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [02:47:26] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 49 probes of 572 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [03:47:10] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 74 probes of 571 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [03:49:14] 04Critical Alert for device cr3-eqsin.wikimedia.org - Primary outbound port utilisation over 80% #page [03:50:22] PROBLEM - Ensure traffic_manager binds on 443 and responds to HTTP requests on cp5002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [03:50:58] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp5002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [03:52:32] PROBLEM - Ensure traffic_server is running for instance tls on cp5002 is CRITICAL: PROCS CRITICAL: 0 processes with args /srv/trafficserver/tls/bin/traffic_server -M --run-root=/srv/trafficserver/tls/runroot.yaml --httpport 443 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [03:53:40] PROBLEM - LibreNMS has a critical alert #page on icinga1001 is 
CRITICAL: Primary outbound port utilisation over 80% #page (cr3-eqsin.wikimedia.org) https://bit.ly/wmf-librenms [03:54:08] 👋 [03:55:31] evening 👋 [03:56:01] . [03:56:45] I suspect this is again already over [03:56:58] yeah, looks from https://grafana.wikimedia.org/d/000000479/frontend-traffic?orgId=1 like a blip that's recovered [03:57:03] So do we need to bring any traffic folks in? (its outside most of their hours) [03:57:09] eqsin-only again [03:57:22] it's gonna be the same thing of a push notification, I am quite sure [03:58:04] yeah, confirmed looking at who got the page. bblack was the only one on there but its also nearing end of his paging hours [03:58:15] robh: we don't need to escalate to traffic [03:58:25] cool, just wanted to know before it got too late for brandon =] [03:58:35] reschedules icinga check to try again [03:58:48] he's also on vacation this week, glad we don't need to bother him [03:58:56] heh, same ASes as the other day [03:59:08] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 49 probes of 571 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [03:59:13] good call [03:59:14] 04̶C̶r̶i̶t̶i̶c̶a̶l Device cr3-eqsin.wikimedia.org recovered from Primary outbound port utilisation over 80% #page [03:59:24] RECOVERY - LibreNMS has a critical alert #page on icinga1001 is OK: OK: zero critical LibreNMS alerts https://bit.ly/wmf-librenms [03:59:29] yep [03:59:37] acked victorops [03:59:44] thanks mutante [04:00:32] in a little bit we can look at turnilo to see what image it was this time ;) [04:00:39] anyway, cheers all, thanks [04:00:49] should we also say resolved? 
[04:01:05] yeah, resolved [04:01:15] I'm going to check out if you don't need anything else here, but cdanis you know how to reach me [04:01:18] it just did by itself [04:01:25] thanks all [04:01:32] alright, cya then [04:15:26] 10Operations, 10Performance-Team, 10observability, 10Patch-For-Review: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837 (10Krinkle) 05Open→03Resolved [04:36:43] (03PS1) 10KartikMistry: Publish: Fix broken wikidata linking [extensions/ContentTranslation] (wmf/1.36.0-wmf.5) - 10https://gerrit.wikimedia.org/r/621497 (https://phabricator.wikimedia.org/T249458) [04:39:02] 10Operations, 10Traffic, 10Platform Team Initiatives (API Gateway), 10Platform Team Sprints Board (Sprint 1), and 2 others: Client Developer has a cookie-free API call - https://phabricator.wikimedia.org/T258748 (10Nuria) [05:25:08] PROBLEM - Check systemd state on ores1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:39:00] PROBLEM - ores_workers_running on ores1005 is CRITICAL: PROCS CRITICAL: 0 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES [05:47:08] RECOVERY - Check systemd state on ores1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:49:04] RECOVERY - ores_workers_running on ores1005 is OK: PROCS OK: 91 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES [05:56:20] (03CR) 10Jcrespo: [C: 03+1] "YOLO" [software/dbtree] - 10https://gerrit.wikimedia.org/r/621529 (https://phabricator.wikimedia.org/T260876) (owner: 10Jcrespo) [06:00:13] (03CR) 10Jcrespo: [V: 03+2 C: 03+2] dbtree: Implement use_index parsing and apply it to QPS query [software/dbtree] - 10https://gerrit.wikimedia.org/r/621529 (https://phabricator.wikimedia.org/T260876) (owner: 10Jcrespo) [06:02:33] (03PS10) 10Jcrespo: mariadb: Setup section->port assignment on puppet [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) [06:04:36] 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10jcrespo) @Papaul (manuel is on vacations until Monday), what about 1 on A1 and 2 on A6? Same row but it looks like it could fit it? 
[06:05:15] (03PS4) 10Jcrespo: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - 10https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) [06:21:42] 10Operations, 10serviceops, 10Platform Team Workboards (Clinic Duty Team): PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (10tstarling) [06:30:07] (03CR) 10Jcrespo: "@Kormat" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [06:36:44] 10Operations, 10serviceops, 10Platform Team Workboards (Clinic Duty Team): PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (10tstarling) I'm reconsidering the layout abstraction I had planned, with inputs/ and outputs/ subdirectories under the working directory.... [06:39:43] (03PS1) 10Jcrespo: Port use_index implementation from dbtree to tendril and apply to tree [software/tendril] - 10https://gerrit.wikimedia.org/r/621625 (https://phabricator.wikimedia.org/T260876) [06:48:30] (03PS2) 10Jcrespo: Port use_index implementation from dbtree to tendril and apply to tree [software/tendril] - 10https://gerrit.wikimedia.org/r/621625 (https://phabricator.wikimedia.org/T260876) [06:58:26] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 571 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200821T0700) [07:04:22] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 49 probes of 571 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [07:19:14] (03PS1) 10Jcrespo: use_index: Fix missing leading whitespace before USE [software/dbtree] - 10https://gerrit.wikimedia.org/r/621677 (https://phabricator.wikimedia.org/T260876) [07:20:00] (03PS3) 10Jcrespo: Port use_index implementation from dbtree to tendril and apply to tree [software/tendril] - 10https://gerrit.wikimedia.org/r/621625 (https://phabricator.wikimedia.org/T260876) [07:25:19] (03PS4) 10Jcrespo: Port use_index implementation from dbtree to tendril and apply to tree [software/tendril] - 10https://gerrit.wikimedia.org/r/621625 (https://phabricator.wikimedia.org/T260876) [07:25:53] XioNoX: I just replied to your URL shortener task [07:26:10] RhinosF1: thanks, about to reply, I wasn't clear [07:26:27] (03PS2) 10Jcrespo: use_index: Fix missing leading whitespace before USE and wrong argument [software/dbtree] - 10https://gerrit.wikimedia.org/r/621677 (https://phabricator.wikimedia.org/T260876) [07:26:31] XioNoX: np, happy to help! 
[07:27:39] RhinosF1: replied [07:28:37] 10Operations, 10Gerrit, 10User-Kormat: gerrit.wm.o/r/changes/ has leading garbage in the output - https://phabricator.wikimedia.org/T259333 (10Mainframe98) [07:29:27] (03PS3) 10Jcrespo: use_index: Fix missing leading whitespace before USE and wrong argument [software/dbtree] - 10https://gerrit.wikimedia.org/r/621677 (https://phabricator.wikimedia.org/T260876) [07:31:21] (03PS4) 10Jcrespo: use_index: Fix missing leading whitespace before USE and wrong argument [software/dbtree] - 10https://gerrit.wikimedia.org/r/621677 (https://phabricator.wikimedia.org/T260876) [07:31:47] (03PS5) 10Jcrespo: Port use_index implementation from dbtree to tendril and apply to tree [software/tendril] - 10https://gerrit.wikimedia.org/r/621625 (https://phabricator.wikimedia.org/T260876) [07:32:27] (03CR) 10Jcrespo: [V: 03+2 C: 03+2] use_index: Fix missing leading whitespace before USE and wrong argument [software/dbtree] - 10https://gerrit.wikimedia.org/r/621677 (https://phabricator.wikimedia.org/T260876) (owner: 10Jcrespo) [07:33:13] XioNoX: https://meta.wikimedia.org/w/api.php?action=help&modules=shortenurl seems to exist, so if you can make a valid POST request and paste the output, it should work. I'm not sure how it's built though for you to integrate that somewhere. [07:33:50] yeah I don't know either, thus the task :) [07:34:43] I'll quote that on the task [07:34:45] (03CR) 10Jcrespo: [V: 03+2 C: 03+2] Port use_index implementation from dbtree to tendril and apply to tree [software/tendril] - 10https://gerrit.wikimedia.org/r/621625 (https://phabricator.wikimedia.org/T260876) (owner: 10Jcrespo) [07:37:44] XioNoX: summed up on https://phabricator.wikimedia.org/T233336. If it's built in Python, there's a chance I can help but otherwise less so. [07:38:46] thanks! 
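For readers following the URL shortener exchange above, the "valid POST request" could look roughly like the sketch below. It is only an illustration under stated assumptions: the endpoint and the `shortenurl` module name come from the help URL quoted in the log, while the `url` parameter, the `shortenurl`/`shorturl` response keys, the User-Agent string, and the helper names `build_shorten_request` and `shorten` are assumptions made for this example and should be checked against the API help page before use.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the help URL quoted in the log.
API_URL = "https://meta.wikimedia.org/w/api.php"


def build_shorten_request(long_url, api_url=API_URL):
    """Build (but do not send) a POST request for action=shortenurl.

    Hypothetical helper: the 'url' parameter name is an assumption.
    """
    data = urllib.parse.urlencode(
        {"action": "shortenurl", "format": "json", "url": long_url}
    ).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=data,
        # Placeholder User-Agent for this sketch only.
        headers={"User-Agent": "shortenurl-sketch/0.1 (example)"},
        method="POST",
    )


def shorten(long_url):
    """Send the request and return the short URL.

    The response keys used here are assumed, not confirmed by the log.
    """
    with urllib.request.urlopen(build_shorten_request(long_url), timeout=10) as resp:
        payload = json.load(resp)
    if "error" in payload:
        raise RuntimeError(payload["error"].get("info", "unknown API error"))
    return payload["shortenurl"]["shorturl"]
```

Building the request separately from sending it keeps the payload inspectable without touching the network, which is convenient when pasting the request into a task for review.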
[07:42:12] (03PS17) 10Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 [07:44:52] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (owner: 10Ryan Kemper) [07:59:28] (03PS18) 10Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 [08:01:44] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 571 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [08:01:47] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (owner: 10Ryan Kemper) [08:07:42] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 571 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [08:07:51] (03PS19) 10Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 [08:09:53] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (owner: 10Ryan Kemper) [08:17:34] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 53 probes of 571 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [08:29:28] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 571 (alerts on 50) - 
https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [08:31:25] (03PS20) 10Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 [08:32:42] (03CR) 10Kormat: mariadb: Setup section->port assignment on puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/620722 (https://phabricator.wikimedia.org/T257033) (owner: 10Jcrespo) [08:33:37] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (owner: 10Ryan Kemper) [08:34:57] (03PS21) 10Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 [08:37:53] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (owner: 10Ryan Kemper) [08:38:49] 10Operations, 10ops-codfw, 10DBA, 10User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (10Kormat) 05Open→03Resolved No further issues seen with db2125, so i'm going to resolve this task. [08:47:22] jynus: dbprov2003 looks angry on icinga [08:47:34] 10Operations, 10ops-codfw, 10DC-Ops: (Need By:2020-08-17) label/setup/install pki2001 - https://phabricator.wikimedia.org/T259825 (10jbond) >>! In T259825#6401517, @Papaul wrote: > Please provide a specific partition/raid configuration for 4x4TB disks. Once I have the information i will proceed with the inst... 
[08:48:07] let me see [08:48:34] (03PS1) 10Kormat: pontoon/mariadb104-test: Move to new eqiad1.wikimedia.cloud domain [puppet] - 10https://gerrit.wikimedia.org/r/621685 (https://phabricator.wikimedia.org/T259256) [08:49:01] (03PS22) 10Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 [08:49:29] (03CR) 10Kormat: [C: 03+2] pontoon/mariadb104-test: Move to new eqiad1.wikimedia.cloud domain [puppet] - 10https://gerrit.wikimedia.org/r/621685 (https://phabricator.wikimedia.org/T259256) (owner: 10Kormat) [08:51:40] 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (10jbond) 05Open→03Resolved >>! In T214605#6401575, @Dzahn wrote: > This ticket actually sounds like it was completed but is still open. ? Yes your right thanks, closing [08:53:55] (03CR) 10Jbond: [C: 03+2] "LGTM will merge and thanks" [puppet] - 10https://gerrit.wikimedia.org/r/621563 (https://phabricator.wikimedia.org/T260930) (owner: 10Hashar) [08:54:02] jbond42: thx ;) [08:54:10] :) np [08:54:36] which comes back from February and went unnoticed probably because we never had to build a new instance on CI since then hehe [08:55:11] yes had the same thought stream when i saw the PS :) [08:55:13] merged now [08:56:36] \o/ [08:58:37] (03CR) 10JMeybohm: [C: 04-1] "Looks good. 
Couple of small changes and questions, though 😊" (036 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/621605 (https://phabricator.wikimedia.org/T256973) (owner: 10Effie Mouzeli) [08:58:47] now I am going to try to tweak some apache config bah [08:59:08] RECOVERY - Host dbprov2003 is UP: PING OK - Packet loss = 0%, RTA = 33.46 ms [09:01:32] !log ayounsi@cumin1001 START - Cookbook sre.network.prepare-upgrade [09:01:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:02:13] !log ayounsi@cumin1001 START - Cookbook sre.network.prepare-upgrade [09:02:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:05:56] (03PS1) 10Mvolz: Revert "Update Zotero to b0a30f98c" [deployment-charts] - 10https://gerrit.wikimedia.org/r/621501 [09:06:07] (03CR) 10Mvolz: [C: 03+2] Revert "Update Zotero to b0a30f98c" [deployment-charts] - 10https://gerrit.wikimedia.org/r/621501 (owner: 10Mvolz) [09:07:23] (03Merged) 10jenkins-bot: Revert "Update Zotero to b0a30f98c" [deployment-charts] - 10https://gerrit.wikimedia.org/r/621501 (owner: 10Mvolz) [09:09:06] 10Operations, 10netops: Upgrade Junos on asw2-esams - https://phabricator.wikimedia.org/T252631 (10ayounsi) a:05ayounsi→03None [09:13:14] 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10jcrespo) 05Resolved→03Open Hey, Papaul, just to discard intruders on dc or other major hardware issues, could you maybe accidentally have pressed th... [09:13:53] (03PS1) 10Mvolz: Update zotero to 2020-08-07-190051-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/621686 [09:15:59] 10Operations, 10SRE-Access-Requests: edtadros is in nda group, but registered with a WMF account - https://phabricator.wikimedia.org/T260070 (10jbond) 05Open→03Resolved a:03jbond @Edtadros I have moved you to the correct group, closing this ticket please reopen if you see any issues. 
[09:16:50] 10Operations, 10netops, 10observability: LibreNMS monitoring glitch caused paging - https://phabricator.wikimedia.org/T252630 (10ayounsi) 05Open→03Resolved a:03ayounsi With the mitigation and the task to upgrade the router it's fine to close that one. [09:19:47] (03PS1) 10Kormat: pontoon: Use a symlink for /etc/puppet/hieradata/pontoon [puppet] - 10https://gerrit.wikimedia.org/r/621688 [09:27:08] (03CR) 10Kormat: "This is ready now." [software/transferpy] - 10https://gerrit.wikimedia.org/r/619959 (https://phabricator.wikimedia.org/T259516) (owner: 10Kormat) [09:29:33] 10Operations, 10Gerrit-Privilege-Requests, 10User-Kormat: Request for Gerrit Managers permissions - https://phabricator.wikimedia.org/T260342 (10Kormat) 05Open→03Resolved a:03Kormat Ahh. I wasn't aware of the limitations of Gerrit Managers. Now that i look at the members of `ldap/gerritadmin`, there ar... [09:30:27] (03PS1) 10Jbond: pcc.py: fix comment about gerrit magic XSSI prefix [puppet] - 10https://gerrit.wikimedia.org/r/621690 (https://phabricator.wikimedia.org/T259333) [09:30:49] (03CR) 10jerkins-bot: [V: 04-1] pcc.py: fix comment about gerrit magic XSSI prefix [puppet] - 10https://gerrit.wikimedia.org/r/621690 (https://phabricator.wikimedia.org/T259333) (owner: 10Jbond) [09:32:30] (03PS2) 10Jbond: pcc.py: fix comment about gerrit magic XSSI prefix [puppet] - 10https://gerrit.wikimedia.org/r/621690 (https://phabricator.wikimedia.org/T259333) [09:33:38] (03CR) 10Jcrespo: [C: 03+2] Move RemoteExecution library to wmfmariadbpy [software/transferpy] - 10https://gerrit.wikimedia.org/r/619959 (https://phabricator.wikimedia.org/T259516) (owner: 10Kormat) [09:34:19] 10Operations, 10DBA, 10Patch-For-Review, 10User-Kormat: DBA python layout - https://phabricator.wikimedia.org/T259516 (10Kormat) [09:34:45] (03CR) 10Jcrespo: "I will create a new release soon." 
[software/transferpy] - 10https://gerrit.wikimedia.org/r/619959 (https://phabricator.wikimedia.org/T259516) (owner: 10Kormat) [09:34:51] (03CR) 10Kormat: [C: 03+1] pcc.py: fix comment about gerrit magic XSSI prefix [puppet] - 10https://gerrit.wikimedia.org/r/621690 (https://phabricator.wikimedia.org/T259333) (owner: 10Jbond) [09:35:42] (03CR) 10Jbond: [C: 03+2] pcc.py: fix comment about gerrit magic XSSI prefix [puppet] - 10https://gerrit.wikimedia.org/r/621690 (https://phabricator.wikimedia.org/T259333) (owner: 10Jbond) [09:39:38] 10Operations, 10Traffic, 10Patch-For-Review: Enable DNSSEC validation in Wikidough - https://phabricator.wikimedia.org/T259816 (10jbond) > Seems to be working. Can you please confirm as well before I mark this as resolved in case I am missing something? LGTM thanks [09:40:20] (03PS1) 10Jcrespo: Fix references to now removed transferpy.RemoteExecution module [software/transferpy] - 10https://gerrit.wikimedia.org/r/621692 (https://phabricator.wikimedia.org/T259516) [09:41:32] jynus: oh, damn. i didn't check the docs [09:42:01] ah, no. i didn't want to touch that file because it looked like the student's report [09:42:14] yeah, but it breaks ci [09:42:17] :-( [09:42:26] seriously? uff. [09:42:39] docs are generated from modules, which are referenced [09:42:50] and now they don't exist, so sphinx fails to compile [09:43:00] if you can have a look, I've sent a patch [09:43:22] (03CR) 10Kormat: Fix references to now removed transferpy.RemoteExecution module (031 comment) [software/transferpy] - 10https://gerrit.wikimedia.org/r/621692 (https://phabricator.wikimedia.org/T259516) (owner: 10Jcrespo) [09:43:34] we can give him credit on the module in wmfmariadbpy [09:43:54] don't comment, I have assigned it to you :-D [09:44:21] just amend and deploy [09:44:48] jynus: kormat: then if the doc is broken by a patch, surely CI should catch it when the patchset is proposed? 
[09:45:09] hashar: it was caught, as it is a post-merge check [09:45:14] yeah [09:45:21] but well, I would rather have it caught before the merge! :] [09:45:25] sure [09:45:53] (03PS1) 10Vgutierrez: Update debian/control [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621693 [09:45:53] I'm not sure how feasible it is to compile it on every patch uploaded [09:46:15] (03CR) 10jerkins-bot: [V: 04-1] Update debian/control [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621693 (owner: 10Vgutierrez) [09:46:17] and that would also cause the build to fail on warnings [09:46:50] [tox:jenkins] [09:46:50] envlist = flake8, unit [09:46:52] was there a warning before? [09:46:56] I guess it is all about adding sphinx env to the list [09:47:19] Warning, treated as error: [09:47:19] autodoc: failed to import module 'CuminExecution' from module 'transferpy.RemoteExecution' [09:47:41] and I see "Wikimedia Jenkins lacks MariaDB" [09:47:58] turns out we have another tox image which has mysql/mariadb :] [09:48:17] though I have no idea which version is shipped [09:50:18] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [09:53:47] (03PS2) 10Vgutierrez: Update debian/control [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621693 (https://phabricator.wikimedia.org/T260702) [09:53:49] (03PS1) 10Vgutierrez: Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) [09:54:10] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [09:54:11] (03CR) 10jerkins-bot: [V: 04-1] Update debian/control [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621693 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [09:54:14] (03CR) 10jerkins-bot: [V: 04-1] Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [09:55:58] !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0) [09:56:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:34] (03CR) 10Jbond: "LGTM but as its python calling python wonder if there is a more pythonic way to do this? (honestly don't know if my suggestion is better o" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621343 (https://phabricator.wikimedia.org/T260389) (owner: 10Bstorm) [09:56:40] !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0) [09:56:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:57:05] jynus: kormat: https://gerrit.wikimedia.org/r/c/integration/config/+/621695 that will have tox with mysqld started :] [09:58:21] (03PS1) 10Hashar: Run sphinx pre merge in CI [software/transferpy] - 10https://gerrit.wikimedia.org/r/621696 [09:59:03] (03CR) 10jerkins-bot: [V: 04-1] Run sphinx pre merge in CI [software/transferpy] - 10https://gerrit.wikimedia.org/r/621696 (owner: 10Hashar) [10:01:04] (03CR) 10Hashar: "That would make the CI job slightly slower since it would now have to generate the doc. But it is probably still fast enough." 
[software/transferpy] - 10https://gerrit.wikimedia.org/r/621696 (owner: 10Hashar) [10:06:25] 10Operations, 10LDAP-Access-Requests, 10Release-Engineering-Team: Request to be a gerrit manger - https://phabricator.wikimedia.org/T260974 (10Ladsgroup) [10:08:41] (03CR) 10Jcrespo: "Ok with the general idea unless it turns out it is extremely slow. Let's wait for Kormat to amend and merge the other patch and see how it" [software/transferpy] - 10https://gerrit.wikimedia.org/r/621696 (owner: 10Hashar) [10:08:43] 10Operations, 10LDAP-Access-Requests, 10Release-Engineering-Team: Request to be a gerrit manager - https://phabricator.wikimedia.org/T260974 (10Majavah) [10:15:12] jynus: does sphinx work for you on your CR? https://phabricator.wikimedia.org/P12310 is what i get [10:15:44] kormat: we may need more changes [10:15:54] will check it soon, I am supporting hashar [10:15:58] jynus: i can't get it to build before the move of remoteexecution either [10:16:00] ok [10:16:11] 10Operations, 10LDAP-Access-Requests, 10Release-Engineering-Team: Request to be a gerrit manager - https://phabricator.wikimedia.org/T260974 (10MarcoAurelio) General note only: > Helping with CoC cases (as chair of the committee) on blocking and unblocking users in case it's needed. As far as I can see, and... 
[10:16:36] kormat: it did work for me, I checked it for the student that it was ok [10:16:55] (before the lastest changes) [10:20:57] (03CR) 10Jcrespo: [C: 03+2] doc: prepare /srv/doc as the new destination [puppet] - 10https://gerrit.wikimedia.org/r/620389 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar) [10:26:27] (03PS1) 10Hashar: doc: add a placeholder in /srv/doc for Bacula [puppet] - 10https://gerrit.wikimedia.org/r/621700 [10:27:30] (03CR) 10jerkins-bot: [V: 04-1] doc: add a placeholder in /srv/doc for Bacula [puppet] - 10https://gerrit.wikimedia.org/r/621700 (owner: 10Hashar) [10:29:14] (03PS2) 10Hashar: doc: add a placeholder in /srv/doc for Bacula [puppet] - 10https://gerrit.wikimedia.org/r/621700 [10:30:00] (03PS3) 10Hashar: doc: add a placeholder in /srv/doc for Bacula [puppet] - 10https://gerrit.wikimedia.org/r/621700 [10:31:24] (03CR) 10Jcrespo: [C: 03+2] doc: add a placeholder in /srv/doc for Bacula [puppet] - 10https://gerrit.wikimedia.org/r/621700 (owner: 10Hashar) [10:36:54] (03PS1) 10Jbond: cookbook sre.puppet.renew-cert: add cookbook to renew a puppet cert [cookbooks] - 10https://gerrit.wikimedia.org/r/621701 (https://phabricator.wikimedia.org/T260110) [10:37:16] kormat: having a look at sphinx now [10:37:55] (03CR) 10jerkins-bot: [V: 04-1] cookbook sre.puppet.renew-cert: add cookbook to renew a puppet cert [cookbooks] - 10https://gerrit.wikimedia.org/r/621701 (https://phabricator.wikimedia.org/T260110) (owner: 10Jbond) [10:41:35] (03PS2) 10Jbond: cookbook sre.puppet.renew-cert: add cookbook to renew a puppet cert [cookbooks] - 10https://gerrit.wikimedia.org/r/621701 (https://phabricator.wikimedia.org/T260110) [10:44:04] kormat: it fails to me locally because I don't have the wmfmariadbpy packages installed or available [10:44:53] jynus: that.. shouldn't happen. setup.py references the wmfmariadbpy git repo directly [10:45:19] have you tried nuking .tox and building it fresh? 
[10:46:31] oh, no, it is something else [10:47:18] /media/home/jynus/transferpy/transferpy/doc/gsoc2020.rst:14:Inline interpreted text or phrase reference start-string without end-string. [10:47:26] but I think that is expected [10:47:33] without your patch [10:47:45] let me apply it [10:48:43] yep, 2 patches ago: sphinx: commands succeeded [10:48:55] and I saw them being compiled [10:50:09] let me try with HEAD +1 [10:50:54] oh, you didn't send the ` fix, let me do it real quick [10:52:19] (03PS2) 10Jcrespo: Fix references to now removed transferpy.RemoteExecution module [software/transferpy] - 10https://gerrit.wikimedia.org/r/621692 (https://phabricator.wikimedia.org/T259516) [10:53:22] i didn't send the fix because i can't get the docs to build at all [10:53:25] autodoc: failed to import module 'CuminExecution' from module 'transferpy.RemoteExecution'; the following exception was raised [10:53:31] No module named 'transferpy.RemoteExecution.CuminExecution' [10:53:35] so need more fixes [10:54:54] (03CR) 10Jcrespo: "Still failing, needs more fixes:" [software/transferpy] - 10https://gerrit.wikimedia.org/r/621692 (https://phabricator.wikimedia.org/T259516) (owner: 10Jcrespo) [10:58:03] aaagh [10:58:26] jynus: the problem was that `transferpy/doc/transferpy/` is ignored by git, but sphinx doesn't regenerate it if it exists :/ [10:58:37] so `git status` never told me there was stale data [10:58:54] yeah, I was suspecting that [10:59:34] I made the same mistake [10:59:37] git-ignoring stale data like that is a bad idea. [10:59:41] yeah [10:59:59] that cost me 1h [11:00:14] well, or doing it if you have a well configured cleaning process [11:00:45] so we can merge the patch as is, as long as fixing dependencies [11:00:53] correct?
[11:01:02] we will handle the building process separatelly [11:01:10] sphinx: commands succeeded [11:01:25] on my side, no error [11:01:42] waiting your ok to +2 [11:04:09] I am going to merge, as it is broken anyway, we can improve the workflow later [11:04:46] (03CR) 10Jcrespo: [C: 03+2] Fix references to now removed transferpy.RemoteExecution module [software/transferpy] - 10https://gerrit.wikimedia.org/r/621692 (https://phabricator.wikimedia.org/T259516) (owner: 10Jcrespo) [11:05:14] (03Merged) 10jenkins-bot: Fix references to now removed transferpy.RemoteExecution module [software/transferpy] - 10https://gerrit.wikimedia.org/r/621692 (https://phabricator.wikimedia.org/T259516) (owner: 10Jcrespo) [11:05:50] https://gerrit.wikimedia.org/r/c/operations/software/transferpy/+/621696 should help [11:06:09] (03CR) 10Jcrespo: "recheck" [software/transferpy] - 10https://gerrit.wikimedia.org/r/621696 (owner: 10Hashar) [11:06:33] (03PS2) 10Jcrespo: Run sphinx pre merge in CI [software/transferpy] - 10https://gerrit.wikimedia.org/r/621696 (owner: 10Hashar) [11:07:32] ^I am guessing that will fail until CI conf is altered [11:07:51] (03PS1) 10Kormat: Don't git-ignore transferpy/doc/transferpy/ [software/transferpy] - 10https://gerrit.wikimedia.org/r/621704 [11:07:55] it actually succeeded [11:08:20] (03CR) 10Jcrespo: [C: 03+1] Don't git-ignore transferpy/doc/transferpy/ [software/transferpy] - 10https://gerrit.wikimedia.org/r/621704 (owner: 10Kormat) [11:13:57] not against 621704, but I think 621696 is a better response (we can do both) [11:14:08] (03CR) 10Jcrespo: [C: 03+2] Run sphinx pre merge in CI [software/transferpy] - 10https://gerrit.wikimedia.org/r/621696 (owner: 10Hashar) [11:14:34] (03PS2) 10Jcrespo: Don't git-ignore transferpy/doc/transferpy/ [software/transferpy] - 10https://gerrit.wikimedia.org/r/621704 (owner: 10Kormat) [11:14:56] let's see if it is significantly slower to test [11:15:20] nah, it is ok [11:15:57] going for lunch, leave you with that +1ed 
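Note on the stale-docs problem discussed above: one possible form of the "well configured cleaning process" is a small pre-build step that deletes the generated directory so sphinx has to recreate it. This is only a sketch, not what the transferpy repo actually does. The helper name clear_generated_docs and the stub file name are hypothetical; only the path transferpy/doc/transferpy/ comes from the conversation.

```python
import shutil
import tempfile
from pathlib import Path


def clear_generated_docs(generated_dir: Path) -> bool:
    """Delete a generated-docs directory so the next build recreates it.

    Returns True if a directory was removed, False if it was already absent.
    (Hypothetical helper; not part of transferpy.)
    """
    if generated_dir.is_dir():
        shutil.rmtree(generated_dir)
        return True
    return False


if __name__ == "__main__":
    # Demonstrate on a throwaway tree rather than a real checkout.
    with tempfile.TemporaryDirectory() as tmp:
        stale = Path(tmp) / "transferpy" / "doc" / "transferpy"
        stale.mkdir(parents=True)
        # Hypothetical leftover stub from an earlier build.
        (stale / "old_module.rst").write_text("stale stub\n")
        print(clear_generated_docs(stale))  # directory existed and was removed
        print(clear_generated_docs(stale))  # nothing left to remove
```

Running something like this before the sphinx step would make a git-ignored output directory harmless, since leftovers from an earlier module layout never survive into the next build. Un-ignoring the directory (621704) and building docs in CI (621696) address the same problem from the other side.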
[11:23:34] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 238, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [11:24:02] PROBLEM - OSPF status on cr2-codfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [11:29:36] (03PS1) 10Hashar: doc: set WMF_DOC_PATH environment variable [puppet] - 10https://gerrit.wikimedia.org/r/621707 (https://phabricator.wikimedia.org/T149924) [11:30:36] (03CR) 10Hashar: "Then in our php code instead of retrieving $_SERVER['DOCUMENT_ROOT'] we would use getenv('WMF_DOC_PATH')." [puppet] - 10https://gerrit.wikimedia.org/r/621707 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar) [11:31:24] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 240, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [11:31:50] RECOVERY - OSPF status on cr2-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [11:43:19] (03PS1) 10ZPapierski: Enable file query events for WCQS [puppet] - 10https://gerrit.wikimedia.org/r/621710 (https://phabricator.wikimedia.org/T259637) [11:44:18] (03CR) 10jerkins-bot: [V: 04-1] Enable file query events for WCQS [puppet] - 10https://gerrit.wikimedia.org/r/621710 (https://phabricator.wikimedia.org/T259637) (owner: 10ZPapierski) [11:59:59] 10Operations, 10Traffic, 10Patch-For-Review: Enable DNSSEC validation in Wikidough - https://phabricator.wikimedia.org/T259816 (10ssingh) 05Open→03Resolved [12:00:02] 10Operations, 10Traffic, 10Patch-For-Review: Deploy Wikidough: Experimental DNS-over-HTTPS (DoH) public resolver - https://phabricator.wikimedia.org/T252132 (10ssingh) [12:14:52] (03CR) 10Kormat: [C: 03+2] Don't git-ignore transferpy/doc/transferpy/ 
[software/transferpy] - 10https://gerrit.wikimedia.org/r/621704 (owner: 10Kormat) [12:15:38] (03Merged) 10jenkins-bot: Don't git-ignore transferpy/doc/transferpy/ [software/transferpy] - 10https://gerrit.wikimedia.org/r/621704 (owner: 10Kormat) [12:18:24] (03CR) 10Gehel: [C: 04-1] "minor comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621710 (https://phabricator.wikimedia.org/T259637) (owner: 10ZPapierski) [12:40:24] (03PS2) 10ZPapierski: Enable file query events for WCQS [puppet] - 10https://gerrit.wikimedia.org/r/621710 (https://phabricator.wikimedia.org/T259637) [12:41:29] (03CR) 10jerkins-bot: [V: 04-1] Enable file query events for WCQS [puppet] - 10https://gerrit.wikimedia.org/r/621710 (https://phabricator.wikimedia.org/T259637) (owner: 10ZPapierski) [12:43:21] (03PS3) 10ZPapierski: Enable file query events for WCQS [puppet] - 10https://gerrit.wikimedia.org/r/621710 (https://phabricator.wikimedia.org/T259637) [12:47:18] (03CR) 10Gehel: [C: 04-1] Enable file query events for WCQS (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621710 (https://phabricator.wikimedia.org/T259637) (owner: 10ZPapierski) [12:48:16] (03CR) 10Gehel: [C: 04-1] "minor issue with log file location." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621710 (https://phabricator.wikimedia.org/T259637) (owner: 10ZPapierski) [12:52:19] (03PS4) 10ZPapierski: Enable file query events for WCQS [puppet] - 10https://gerrit.wikimedia.org/r/621710 (https://phabricator.wikimedia.org/T259637) [12:52:53] (03PS5) 10ZPapierski: Enable file query events for WCQS [puppet] - 10https://gerrit.wikimedia.org/r/621710 (https://phabricator.wikimedia.org/T259637) [13:10:49] (03CR) 10ZPapierski: "NOOP for production servers - https://puppet-compiler.wmflabs.org/compiler1003/24600/" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/621710 (https://phabricator.wikimedia.org/T259637) (owner: 10ZPapierski) [13:24:28] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [13:25:12] !log jayme@cumin1001 conftool action : set/pooled=False; selector: dnsdisc=termbox,name=codfw [13:25:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:32] !log jayme@cumin1001 conftool action : set/pooled=True; selector: dnsdisc=termbox,name=codfw [13:25:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:22] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [13:30:42] 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10Papaul) dbprov2003 is in D4 an i do not recall working in D4 yesterday when on site. i worked in D2 and C3. the only action taken in D4 yesterday was to... 
[13:35:14] (03PS6) 10ZPapierski: Add a weekly reload job for wcqs data reload [puppet] - 10https://gerrit.wikimedia.org/r/619289 (https://phabricator.wikimedia.org/T251515) [13:46:35] (03PS1) 10Ppchelko: Include certs into annotations for api-gateway [deployment-charts] - 10https://gerrit.wikimedia.org/r/621716 [14:23:31] (03CR) 10Ppchelko: [C: 03+1] Update zotero to 2020-08-07-190051-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/621686 (owner: 10Mvolz) [14:23:44] (03PS1) 10Papaul: Add pki2001 to netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/621719 (https://phabricator.wikimedia.org/T259825) [14:28:53] (03PS3) 10Effie Mouzeli: helmfile: add values for staging environment [deployment-charts] - 10https://gerrit.wikimedia.org/r/621605 (https://phabricator.wikimedia.org/T256973) [14:29:35] (03PS4) 10Effie Mouzeli: helmfile: add values for staging environment [deployment-charts] - 10https://gerrit.wikimedia.org/r/621605 (https://phabricator.wikimedia.org/T256973) [14:30:40] (03CR) 10Papaul: [C: 03+2] Add pki2001 to netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/621719 (https://phabricator.wikimedia.org/T259825) (owner: 10Papaul) [14:31:42] (03PS1) 10JMeybohm: sre.k8s.service-route: New cookbook to pool/depool k8s servies [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) [14:32:34] (03CR) 10jerkins-bot: [V: 04-1] sre.k8s.service-route: New cookbook to pool/depool k8s servies [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) (owner: 10JMeybohm) [14:32:38] (03PS2) 10JMeybohm: sre.k8s.service-route: New cookbook to pool/depool k8s servies [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) [14:33:32] (03CR) 10jerkins-bot: [V: 04-1] sre.k8s.service-route: New cookbook to pool/depool k8s servies [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) 
(owner: 10JMeybohm) [14:41:20] (03PS1) 10Papaul: Add pki2001 to site.pp and fixed kubernetes2017 typo [puppet] - 10https://gerrit.wikimedia.org/r/621723 (https://phabricator.wikimedia.org/T259825) [14:41:31] (03CR) 10jerkins-bot: [V: 04-1] Add pki2001 to site.pp and fixed kubernetes2017 typo [puppet] - 10https://gerrit.wikimedia.org/r/621723 (https://phabricator.wikimedia.org/T259825) (owner: 10Papaul) [14:46:34] (03PS1) 10Kormat: Add 'black' formatter support. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/621724 [14:51:55] (03CR) 10Effie Mouzeli: "I hope it looks better now:)" (036 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/621605 (https://phabricator.wikimedia.org/T256973) (owner: 10Effie Mouzeli) [15:06:31] (03CR) 10Dzahn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/621723 (https://phabricator.wikimedia.org/T259825) (owner: 10Papaul) [15:08:39] (03CR) 10Dzahn: [C: 03+2] Add pki2001 to site.pp and fixed kubernetes2017 typo [puppet] - 10https://gerrit.wikimedia.org/r/621723 (https://phabricator.wikimedia.org/T259825) (owner: 10Papaul) [15:11:24] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 568 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [15:15:56] (03CR) 10Bstorm: cumin: for new wmcs. 
prefix for cookbooks, grant access to wmcs-admins (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621343 (https://phabricator.wikimedia.org/T260389) (owner: 10Bstorm) [15:23:09] 10Operations, 10Traffic: Don't set cookies for api.wikimedia.org at the caching layer - https://phabricator.wikimedia.org/T260943 (10colewhite) p:05Triage→03Medium [15:23:20] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 49 probes of 568 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [15:23:26] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:25:20] (03CR) 10Dzahn: [C: 03+2] doc: set WMF_DOC_PATH environment variable [puppet] - 10https://gerrit.wikimedia.org/r/621707 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar) [15:25:22] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:26:04] 10Operations, 10ops-codfw: Degraded RAID on backup2001 - https://phabricator.wikimedia.org/T260927 (10colewhite) [15:26:08] 10Operations, 10ops-codfw: backup2001 RAID controller failure, unable to post 2020-08-19 - https://phabricator.wikimedia.org/T260764 (10colewhite) [15:26:54] 10Operations, 10ops-codfw: Degraded RAID on backup2001 - https://phabricator.wikimedia.org/T260927 (10colewhite) 05Open→03Declined Superseded by parent task. 
[15:27:00] 10Operations, 10ops-codfw: backup2001 RAID controller failure, unable to post 2020-08-19 - https://phabricator.wikimedia.org/T260764 (10colewhite) [15:45:03] (03PS2) 10Kormat: Add 'black' formatter support. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/621724 [15:45:05] (03PS1) 10Kormat: Run 'black' against setup.py and wmfmariadbpy [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/621753 [15:48:10] (03PS1) 10Kormat: Run 'black' in CI against wmfmariadbpy. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/621754 [15:50:36] (03PS3) 10Bstorm: wikireplicas: refactor to eliminate confusing "labsdb" naming [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) [15:52:00] (03PS2) 10Kormat: Run 'black' against setup.py and wmfmariadbpy [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/621753 [15:52:02] (03PS2) 10Kormat: Run 'black' in CI against wmfmariadbpy. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/621754 [15:57:08] (03PS1) 10Dzahn: webperf: add data types to profiles [puppet] - 10https://gerrit.wikimedia.org/r/621756 [16:09:30] (03PS1) 10Dzahn: zuul: add data types to profiles [puppet] - 10https://gerrit.wikimedia.org/r/621758 [16:14:38] (03PS1) 10Dzahn: prometheus: replace hiera() with lookup(), add data types for node lookups [puppet] - 10https://gerrit.wikimedia.org/r/621759 [16:15:54] !log zpapierski@deploy1001 Started deploy [search/mjolnir/deploy@c80e2e7]: .. [16:15:55] !log zpapierski@deploy1001 deploy aborted: .. (duration: 00m 01s) [16:15:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:15:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:16:24] !log zpapierski@deploy1001 Started deploy [search/mjolnir/deploy@c80e2e7]: .. 
redeploy after theory verification [16:16:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:13] !log zpapierski@deploy1001 Finished deploy [search/mjolnir/deploy@c80e2e7]: .. redeploy after theory verification (duration: 00m 50s) [16:17:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:20:33] (03PS1) 10Andrew Bogott: wmcs: amend the check-flavor-aggregates test to send an email [puppet] - 10https://gerrit.wikimedia.org/r/621760 (https://phabricator.wikimedia.org/T259542) [16:24:06] (03PS2) 10Dzahn: prometheus: replace hiera() with lookup(), add data types for node lookups [puppet] - 10https://gerrit.wikimedia.org/r/621759 [16:25:08] (03CR) 10jerkins-bot: [V: 04-1] prometheus: replace hiera() with lookup(), add data types for node lookups [puppet] - 10https://gerrit.wikimedia.org/r/621759 (owner: 10Dzahn) [16:25:36] 10Operations, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install kubernetes2017.codfw.wmnet - https://phabricator.wikimedia.org/T258745 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` kubernetes2017.codfw.wmnet ` The log... 
[16:25:42] (03CR) 10Andrew Bogott: [C: 03+2] wmcs: amend the check-flavor-aggregates test to send an email [puppet] - 10https://gerrit.wikimedia.org/r/621760 (https://phabricator.wikimedia.org/T259542) (owner: 10Andrew Bogott) [16:29:45] (03CR) 10Clarakosi: [C: 03+2] Provide a message for 404 response body [deployment-charts] - 10https://gerrit.wikimedia.org/r/621620 (https://phabricator.wikimedia.org/T260795) (owner: 10Ppchelko) [16:30:50] (03Merged) 10jenkins-bot: Provide a message for 404 response body [deployment-charts] - 10https://gerrit.wikimedia.org/r/621620 (https://phabricator.wikimedia.org/T260795) (owner: 10Ppchelko) [16:31:37] 10Operations, 10ops-codfw, 10DC-Ops: (Need By:2020-08-17) label/setup/install pki2001 - https://phabricator.wikimedia.org/T259825 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` pki2001.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimag... [16:34:52] (03PS1) 10Hashar: Run integration tests on CI [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/621762 [16:35:37] (03CR) 10jerkins-bot: [V: 04-1] Run integration tests on CI [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/621762 (owner: 10Hashar) [16:39:32] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [16:39:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:41:33] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [16:41:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:51:36] (03PS1) 10Ebernhardson: Increase weight of grants and research in metawiki search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621767 (https://phabricator.wikimedia.org/T260569) [16:52:03] (03PS1) 10Andrew Bogott: wmcs check_flavor_aggregates.py: return CRITICAL rather than WARNING [puppet] - 10https://gerrit.wikimedia.org/r/621768 (https://phabricator.wikimedia.org/T259542) [16:52:54] (03CR) 10Andrew 
Bogott: [C: 03+2] wmcs check_flavor_aggregates.py: return CRITICAL rather than WARNING [puppet] - 10https://gerrit.wikimedia.org/r/621768 (https://phabricator.wikimedia.org/T259542) (owner: 10Andrew Bogott) [16:53:12] mutante: thank you for the addition of WMF_DOC_PATH env variable in Apache :] [16:53:13] (03CR) 10Hashar: "Seems like it fails because of:" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/621762 (owner: 10Hashar) [16:54:33] I am off, have a good week-end everyone [16:54:54] hashar: no problem, have a good weekend too [16:56:21] (03PS1) 10Dzahn: prometheus: add more data types to all exporters [puppet] - 10https://gerrit.wikimedia.org/r/621770 [16:56:23] (03PS1) 10Dzahn: prometheus: replace hiera() with lookup() in several exporters [puppet] - 10https://gerrit.wikimedia.org/r/621771 [16:57:27] (03PS4) 10Bstorm: wikireplicas: refactor to eliminate confusing "labsdb" naming [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) [16:58:44] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [16:58:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:59:01] (03PS5) 10Bstorm: wikireplicas: refactor to eliminate confusing "labsdb" naming [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) [17:00:36] 10Operations, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install kubernetes2017.codfw.wmnet - https://phabricator.wikimedia.org/T258745 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['kubernetes2017.codfw.wmnet'] ` and were **ALL** successful. 
[17:00:49] 10Operations, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install kubernetes2017.codfw.wmnet - https://phabricator.wikimedia.org/T258745 (10Papaul) [17:00:51] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [17:00:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:01:13] 10Operations, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install kubernetes2017.codfw.wmnet - https://phabricator.wikimedia.org/T258745 (10Papaul) 05Open→03Resolved Complete [17:03:47] (03CR) 10Andrew Bogott: [C: 03+1] "This seems like a perfectly reasonable idea. Actually tracking all the renames for codereview is probably beyond my skills but we can trus" [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm) [17:14:40] 10Operations, 10ops-codfw, 10DC-Ops: (Need By:2020-08-17) label/setup/install pki2001 - https://phabricator.wikimedia.org/T259825 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['pki2001.codfw.wmnet'] ` and were **ALL** successful. 
[17:14:57] 10Operations, 10ops-codfw, 10DC-Ops: (Need By:2020-08-17) label/setup/install pki2001 - https://phabricator.wikimedia.org/T259825 (10Papaul) [17:15:32] 10Operations, 10ops-codfw, 10DC-Ops: (Need By:2020-08-17) label/setup/install pki2001 - https://phabricator.wikimedia.org/T259825 (10Papaul) 05Open→03Resolved complete [17:21:16] 10Operations, 10ops-codfw, 10DC-Ops, 10Maps: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (10Papaul) [17:22:51] 10Operations, 10ops-codfw, 10DC-Ops, 10Maps: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (10Papaul) [17:39:29] !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox [17:39:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:42:27] 10Operations, 10serviceops: Reproduce opcache corruptions in production - https://phabricator.wikimedia.org/T261009 (10jijiki) [17:43:55] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [17:43:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:45:19] (03PS1) 10BryanDavis: Make `webservice shell` scriptable [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) [17:46:19] 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (2020-08-15) rack/setup/install dbprov1003.eqiad.wmnet - https://phabricator.wikimedia.org/T258750 (10Cmjohnson) [17:47:24] RECOVERY - MegaRAID on backup2001 is OK: OK: optimal, 2 logical, 24 physical https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [17:50:35] (03PS1) 10Bstorm: wikireplicas: add new location for these fake passwords [labs/private] - 10https://gerrit.wikimedia.org/r/621777 (https://phabricator.wikimedia.org/T260843) [17:51:38] 10Operations, 10ops-codfw, 10DC-Ops, 10Maps: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (10Papaul) 
[17:57:15] (03PS1) 10Ebernhardson: admin: Update ebernhardson home files [puppet] - 10https://gerrit.wikimedia.org/r/621778 [17:57:17] (03PS1) 10Dzahn: openstack: replace hiera() with lookup() in several places [puppet] - 10https://gerrit.wikimedia.org/r/621779 [17:57:37] (03CR) 10Bstorm: [V: 03+2 C: 03+2] wikireplicas: add new location for these fake passwords [labs/private] - 10https://gerrit.wikimedia.org/r/621777 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm) [18:00:26] (03CR) 10Bstorm: [C: 04-1] "Marking -1 until I can get PCC to pass 😊" [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm) [18:01:03] 10Operations, 10Domains, 10Traffic: Change of nameservers for Wikimedia.org.tr - https://phabricator.wikimedia.org/T259792 (10CRoslof) Usually, only chapters are allowed to use wikimedia.[ccTLD] domain names. We may be able to make an exception in this case, however. One question I have first, though, is whe... [18:01:24] (03CR) 10BryanDavis: "One side effect of this change is that concurrent executions of `webservice shell` is no longer possible. 
In the prior implementation mult" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) (owner: 10BryanDavis) [18:05:08] (03PS1) 10Bstorm: wikireplicas: fix the path for the new fake pw locations [labs/private] - 10https://gerrit.wikimedia.org/r/621780 [18:05:44] (03CR) 10Bstorm: [V: 03+2 C: 03+2] wikireplicas: fix the path for the new fake pw locations [labs/private] - 10https://gerrit.wikimedia.org/r/621780 (owner: 10Bstorm) [18:13:05] (03PS2) 10BryanDavis: Make `webservice shell` scriptable [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) [18:13:07] (03PS1) 10Bstorm: wikireplicas: another small correction to the path for the index script [labs/private] - 10https://gerrit.wikimedia.org/r/621781 [18:14:16] (03CR) 10Bstorm: [V: 03+2 C: 03+2] wikireplicas: another small correction to the path for the index script [labs/private] - 10https://gerrit.wikimedia.org/r/621781 (owner: 10Bstorm) [18:18:20] (03PS3) 10BryanDavis: Make `webservice shell` scriptable [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) [18:29:52] (03PS6) 10Bstorm: wikireplicas: refactor to eliminate confusing "labsdb" naming [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) [18:29:55] (03PS1) 10Dzahn: decom mw2135 through mw2214 [puppet] - 10https://gerrit.wikimedia.org/r/621783 (https://phabricator.wikimedia.org/T260654) [18:31:22] (03PS4) 10BryanDavis: Make `webservice shell` scriptable [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) [18:32:33] (03PS5) 10BryanDavis: Make `webservice shell` scriptable [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) [18:34:01] 10Operations, 10Puppet, 10User-jbond: Audit /etc/apt directories - 
https://phabricator.wikimedia.org/T214605 (10Pppery) [18:34:19] (03CR) 10Bstorm: "Ok, PCC is now happy with the setup. https://puppet-compiler.wmflabs.org/compiler1002/24606/" [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm) [18:38:37] (03CR) 10Bstorm: "I think I'll wait for a +1 from a DBA before merge. This should have no actual affect on the servers, and I could deploy it by first disab" [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm) [18:42:16] (03PS1) 10Dzahn: decom mw2135 through mw2214 [dns] - 10https://gerrit.wikimedia.org/r/621786 (https://phabricator.wikimedia.org/T260654) [18:53:37] (03CR) 10Ryan Kemper: [C: 03+2] admin: Update ebernhardson home files [puppet] - 10https://gerrit.wikimedia.org/r/621778 (owner: 10Ebernhardson) [19:07:29] (03CR) 10Ryan Kemper: [C: 03+2] Enable file query events for WCQS [puppet] - 10https://gerrit.wikimedia.org/r/621710 (https://phabricator.wikimedia.org/T259637) (owner: 10ZPapierski) [19:10:04] (03CR) 10BryanDavis: "Tested live in the webserviceng tool using the process described at https://phabricator.wikimedia.org/T220053#6403000." [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) (owner: 10BryanDavis) [19:13:58] (03CR) 10Ryan Kemper: "Change looks good, I couldn't find any instances of the string `wcqs-beta.wmflabs.org` left." 
[puppet] - 10https://gerrit.wikimedia.org/r/615810 (owner: 10ZPapierski) [19:15:01] (03CR) 10Ryan Kemper: "(for next week) do we need to do more investigation here to figure out the wdqs reload failing, or are you comfortable proceeding with mer" [puppet] - 10https://gerrit.wikimedia.org/r/619289 (https://phabricator.wikimedia.org/T251515) (owner: 10ZPapierski) [19:55:20] (03PS1) 10Dzahn: planet: fix redirect to meta page, stop appending request URI path [puppet] - 10https://gerrit.wikimedia.org/r/621802 (https://phabricator.wikimedia.org/T257840) [20:01:30] (03CR) 10Dzahn: [C: 03+2] planet: fix redirect to meta page, stop appending request URI path [puppet] - 10https://gerrit.wikimedia.org/r/621802 (https://phabricator.wikimedia.org/T257840) (owner: 10Dzahn) [20:05:09] 10Operations, 10Traffic, 10Patch-For-Review: fix planet.wm.org redirect nitpick (was: missing from planet.discovery.wmnet Subject Alternative Name) - https://phabricator.wikimedia.org/T257840 (10Dzahn) 05Open→03Resolved >>! In T257840#6311364, @ema wrote: > Nitpick: https://planet.wikimedia.org/test shou... [20:45:30] (03CR) 10Bstorm: Make `webservice shell` scriptable (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) (owner: 10BryanDavis) [20:46:39] (03CR) 10Bstorm: "This seems like it will make a lot of future users less confused! 
😊" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) (owner: 10BryanDavis) [21:01:08] (03CR) 10BryanDavis: Make `webservice shell` scriptable (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) (owner: 10BryanDavis) [21:03:44] (03PS1) 10Andrew Bogott: standard_packages: remove purge of 'at' package [puppet] - 10https://gerrit.wikimedia.org/r/621816 [21:03:54] (03PS6) 10BryanDavis: Make `webservice shell` scriptable [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) [21:04:44] (03CR) 10Bstorm: Make `webservice shell` scriptable (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) (owner: 10BryanDavis) [21:09:56] (03CR) 10BryanDavis: "I have noticed in testing that it is possible for kubectl to get confused/interrupted and not clean up the pod. So far I have triggered th" (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) (owner: 10BryanDavis) [21:48:58] (03PS1) 10Dzahn: wikistats (cloud): replace wmflabs.org with wmcloud.org and update README [puppet] - 10https://gerrit.wikimedia.org/r/621832 (https://phabricator.wikimedia.org/T260739) [21:52:13] (03CR) 10Dzahn: [C: 03+2] "comments and README-only" [puppet] - 10https://gerrit.wikimedia.org/r/621832 (https://phabricator.wikimedia.org/T260739) (owner: 10Dzahn) [22:20:57] 10Operations, 10Wikimedia-Mailing-lists: unreadable mailing list description - https://phabricator.wikimedia.org/T261031 (10Aftabuzzaman) [23:29:05] (03PS1) 10Andrew Bogott: cloud-vps: improve flavor monitoring [puppet] - 10https://gerrit.wikimedia.org/r/621840 [23:30:47] (03PS2) 10Andrew Bogott: cloud-vps: improve flavor monitoring [puppet] - 10https://gerrit.wikimedia.org/r/621840