[00:03:16] Operations, ops-codfw, DC-Ops, SRE-swift-storage, Patch-For-Review: (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes) - https://phabricator.wikimedia.org/T260188 (Papaul)
[00:13:32] Operations, Mail: Create Group Aliases for itservices@ - https://phabricator.wikimedia.org/T259727 (Dzahn) @HMarcus Thank you very much for the detailed response. I understand the difference between those requests now. There was some misunderstanding because the term "alias" is used differently in differ...
[00:17:47] Operations: incident 20170323-wikibase did not trigger Icinga paging - https://phabricator.wikimedia.org/T161528 (Dzahn) Well, the action item would be "ensure wikibase alerts are sending pages".
[00:18:49] Operations: incident 20170323-wikibase did not trigger Icinga paging - https://phabricator.wikimedia.org/T161528 (Dzahn) I am not sure what else it should be tagged with. Icinga and alerting seems pretty relevant to observability.
[00:22:43] Operations: incident 20170323-wikibase did not trigger Icinga paging - https://phabricator.wikimedia.org/T161528 (Dzahn) alternative better action item: "ensure SRE gets paged if more than X number of application servers are returning "500 Internal Server Error".
[00:23:24] Operations, Icinga: incident 20170323-wikibase did not trigger Icinga paging - https://phabricator.wikimedia.org/T161528 (Dzahn)
[00:24:45] Operations, Icinga: incident 20170323-wikibase did not trigger Icinga paging - https://phabricator.wikimedia.org/T161528 (Dzahn) I thought Icinga alerting was a core part of observability work, so a bit confused here.
[00:26:32] Operations, Wikimedia-Mailing-lists: Disable google code in mailinglists - https://phabricator.wikimedia.org/T261084 (Dzahn) @jijiki No, "rmlist" will not delete archives unless it is explicitly "rmlist -a" and that is not recommended unless we have special reasons that it really needs to be deleted.
[01:10:30] Operations, ops-codfw, DC-Ops, SRE-swift-storage, Patch-For-Review: (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes) - https://phabricator.wikimedia.org/T260188 (Papaul) The install is failing at the first puppet run; because the server doesn't have the role...
[01:10:45] (Restored) Papaul: Add ms-be2057 to site.pp with role insetup [puppet] - https://gerrit.wikimedia.org/r/622908 (https://phabricator.wikimedia.org/T260188) (owner: Papaul)
[01:10:51] (CR) Papaul: [C: +2] Add ms-be2057 to site.pp with role insetup [puppet] - https://gerrit.wikimedia.org/r/622908 (https://phabricator.wikimedia.org/T260188) (owner: Papaul)
[01:11:51] Operations, SRE-Access-Requests: Requesting access researchers, statistics-privatedata-users, and analytics-privatedata-users, nda for AndrewKuznetsov - https://phabricator.wikimedia.org/T254939 (BGerdemann) @jbond , FYI we are extending Andrew's contract through Sep 15, 2020. Please continue access unt...
[01:12:12] (PS2) Papaul: Add ms-be2057 to site.pp with role insetup [puppet] - https://gerrit.wikimedia.org/r/622908 (https://phabricator.wikimedia.org/T260188)
[01:12:21] (CR) Papaul: [V: +2 C: +2] Add ms-be2057 to site.pp with role insetup [puppet] - https://gerrit.wikimedia.org/r/622908 (https://phabricator.wikimedia.org/T260188) (owner: Papaul)
[01:23:02] Operations, ops-codfw, DC-Ops, SRE-swift-storage, Patch-For-Review: (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes) - https://phabricator.wikimedia.org/T260188 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['ms-be2057.codfw.wmnet'] ` and were **A...
[01:26:13] Operations, ops-codfw, DC-Ops, SRE-swift-storage, Patch-For-Review: (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes) - https://phabricator.wikimedia.org/T260188 (Papaul)
[01:27:32] Operations, ops-codfw, DC-Ops, SRE-swift-storage, Patch-For-Review: (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes) - https://phabricator.wikimedia.org/T260188 (Papaul) Open→Resolved @fgiunchedi All yours have fun
[01:40:11] Operations, MediaWiki-extensions-Score, Security-Team, Wikimedia-General-or-Unknown, and 3 others: Extension:Score / Lilypond is disabled on all wikis - https://phabricator.wikimedia.org/T257066 (kaldari) Resolved→Open Reopening per T257091
[03:08:39] (CR) Cwhite: [C: +1] prometheus: minimal default alerts for Prometheus [puppet] - https://gerrit.wikimedia.org/r/622557 (https://phabricator.wikimedia.org/T258948) (owner: Filippo Giunchedi)
[03:16:49] (CR) Cwhite: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/622558 (https://phabricator.wikimedia.org/T258948) (owner: Filippo Giunchedi)
[03:17:41] (CR) Cwhite: [C: +1] prometheus: move beta to its own profile [puppet] - https://gerrit.wikimedia.org/r/622561 (https://phabricator.wikimedia.org/T258948) (owner: Filippo Giunchedi)
[04:50:42] (PS1) Marostegui: db1091: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/622919
[04:52:06] (CR) Marostegui: [C: +2] db1091: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/622919 (owner: Marostegui)
[05:19:29] Operations, TechCom-RFC, serviceops, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (tstarling)
[05:50:07] !log restart db2098
[05:50:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:51:43] (CR) ArielGlenn: "> 
Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/622342 (https://phabricator.wikimedia.org/T261204) (owner: DCausse)
[06:07:18] !log restart db2099
[06:07:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:12:45] PROBLEM - PHP7 rendering on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[06:14:23] RECOVERY - PHP7 rendering on mwdebug1001 is OK: HTTP OK: HTTP/1.1 302 Found - 649 bytes in 6.028 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[06:15:14] (PS1) Tulsi Bhagat: Edit Repo Config [mediawiki-config] (refs/meta/config) - https://gerrit.wikimedia.org/r/622776
[06:18:10] (Abandoned) Tulsi Bhagat: Edit Repo Config [mediawiki-config] (refs/meta/config) - https://gerrit.wikimedia.org/r/622776 (owner: Tulsi Bhagat)
[06:28:14] !log restart db2100
[06:28:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:29:27] PROBLEM - PHP7 rendering on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[06:31:17] RECOVERY - PHP7 rendering on mwdebug1001 is OK: HTTP OK: HTTP/1.1 302 Found - 648 bytes in 8.323 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[06:32:54] (CR) Elukey: [C: +1] mediawiki: replace mw2196 with mw2336 as mcrouter proxy [puppet] - https://gerrit.wikimedia.org/r/622900 (https://phabricator.wikimedia.org/T260654) (owner: Dzahn)
[06:34:28] mmm the Zayo link between codfw and ulsfo is down
[06:35:03] ah Chris (of course!) 
already sent an email :)
[06:36:17] but no update in hours
[06:36:19] lovely
[06:40:03] ok sent an email, let's see if somebody answers
[06:47:04] !log restart db2102
[06:47:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:09] PROBLEM - Apache HTTP on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[06:53:59] RECOVERY - Apache HTTP on mwdebug1001 is OK: HTTP OK: HTTP/1.1 302 Found - 635 bytes in 7.905 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[06:54:01] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=mysql-test site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:55:51] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:58:13] (PS23) ZPapierski: Multiple instances of msearch_daemon [puppet] - https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200828T0700)
[07:01:15] RECOVERY - Check systemd state on mwdebug1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:03:57] RECOVERY - Check the last execution of php7.2-fpm_check_restart on mwdebug1001 is OK: OK: Status of the systemd unit php7.2-fpm_check_restart https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:05:55] (PS1) Muehlenhoff: Extend access for kuz [puppet] - https://gerrit.wikimedia.org/r/622924
[07:06:22] Operations, CAS-SSO: idp.wikimedia.org asking twice for YubiKey - https://phabricator.wikimedia.org/T258029 (MoritzMuehlenhoff) p:Triage→Medium
[07:06:43] Operations, CAS-SSO: idp.wikimedia.org asking twice for YubiKey - https://phabricator.wikimedia.org/T258029 (MoritzMuehlenhoff)
[07:07:22] !log Warm up parsercache in codfw - T260042
[07:07:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:26] T260042: Compare a few tables per section before the switchover - https://phabricator.wikimedia.org/T260042
[07:08:42] (CR) Elukey: [C: +1] "From my limited understanding it seems sound!" 
[puppet] - https://gerrit.wikimedia.org/r/622826 (owner: Kormat)
[07:10:17] !log restart db2139
[07:10:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:14:23] PROBLEM - PHP7 rendering on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[07:14:34] ^ me
[07:14:41] sigh downtime expired
[07:16:05] RECOVERY - PHP7 rendering on mwdebug1001 is OK: HTTP OK: HTTP/1.1 302 Found - 648 bytes in 0.097 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[07:17:05] (PS1) Elukey: install_server: add custom reuse recipe for Kafka Jumbo [puppet] - https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123)
[07:18:50] (CR) Muehlenhoff: [C: +2] Extend access for kuz [puppet] - https://gerrit.wikimedia.org/r/622924 (owner: Muehlenhoff)
[07:27:10] (CR) Elukey: [C: +1] prometheus: switch over to buster kafkamon hosts [puppet] - https://gerrit.wikimedia.org/r/622836 (https://phabricator.wikimedia.org/T252773) (owner: Herron)
[07:40:59] !log restart backup2001,backup1002
[07:41:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:59] (CR) Muehlenhoff: "Let's go with a minimal fix here which simply uses /run/nagios instead of /var/run/nagios." [puppet] - https://gerrit.wikimedia.org/r/621967 (https://phabricator.wikimedia.org/T252990) (owner: Southparkfan)
[08:15:46] Operations, Goal: FY2020-2021 Q1 DC switchover and switchback - https://phabricator.wikimedia.org/T243314 (ema)
[08:15:58] Operations, Traffic, Goal: Verify ATS handling of DNS TTLs - https://phabricator.wikimedia.org/T261312 (ema) Open→Resolved >>! In T261312#6412856, @Volans wrote: > @ema as one of the requester for this test thanks a lot for the effort. It looks like we're in good shape here. Thank you for ra...
[08:19:26] (CR) Kormat: [C: +2] install_server: Add reuse-parts-test.cfg [puppet] - https://gerrit.wikimedia.org/r/622826 (owner: Kormat)
[08:22:41] !log enabling replication from db2112 to db1083 (s1) T243373
[08:22:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:45] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373
[08:22:56] Operations, Wikidata, Wikidata-Termbox, serviceops, User-Addshore: Plan to scale up termbox service to be able to render the termbox for desktop pageviews - https://phabricator.wikimedia.org/T261486 (Addshore)
[08:31:13] (CR) Arturo Borrero Gonzalez: [C: +1] "I don't think we use openstack queens anywhere, but anyway this should be safe to merge. I would say try PCC before merging just in case t" [puppet] - https://gerrit.wikimedia.org/r/622874 (https://phabricator.wikimedia.org/T218423) (owner: Bstorm)
[08:31:42] (PS3) Vgutierrez: Add Origin and Description headers for every debian patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621014 (https://phabricator.wikimedia.org/T260702)
[08:31:44] (PS3) Vgutierrez: Remove unused patches [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621015 (https://phabricator.wikimedia.org/T260702)
[08:31:46] (PS3) Vgutierrez: Remove unnecessary patches for Varnish 6 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621265 (https://phabricator.wikimedia.org/T260702)
[08:31:48] (PS4) Vgutierrez: Update 0003-vsm-perms.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621284 (https://phabricator.wikimedia.org/T260702)
[08:31:50] (PS3) Vgutierrez: Update 0005-stats-shortlived.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621532 (https://phabricator.wikimedia.org/T260702)
[08:31:52] (PS5) Vgutierrez: Update debian/control [debs/varnish4] (debian-wmf) - 
https://gerrit.wikimedia.org/r/621693 (https://phabricator.wikimedia.org/T260702)
[08:31:54] (PS4) Vgutierrez: Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702)
[08:31:56] (PS1) Vgutierrez: Add 0006-bump-api-soname [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622964 (https://phabricator.wikimedia.org/T260702)
[08:31:58] (PS1) Vgutierrez: Bump libvarnishapi SONAME [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622965 (https://phabricator.wikimedia.org/T260702)
[08:32:09] (CR) jerkins-bot: [V: -1] Add Origin and Description headers for every debian patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621014 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[08:32:36] (CR) jerkins-bot: [V: -1] Remove unused patches [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621015 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[08:32:38] (CR) jerkins-bot: [V: -1] Remove unnecessary patches for Varnish 6 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621265 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[08:32:45] (CR) jerkins-bot: [V: -1] Update 0003-vsm-perms.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621284 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[08:32:48] (CR) jerkins-bot: [V: -1] Update 0005-stats-shortlived.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621532 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[08:32:56] (CR) jerkins-bot: [V: -1] Update debian/control [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621693 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[08:33:06] (CR) jerkins-bot: [V: -1] Add 0006-bump-api-soname [debs/varnish4] (debian-wmf) - 
https://gerrit.wikimedia.org/r/622964 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[08:33:17] (CR) jerkins-bot: [V: -1] Bump libvarnishapi SONAME [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622965 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[08:33:27] vgutierrez: please don't upset jenkins
[08:33:59] Operations, Traffic: Varnish 6.0 needs a SONAME version bump - https://phabricator.wikimedia.org/T261487 (ema)
[08:34:03] PROBLEM - Rate of JVM GC Old generation-s runs - logstash1010-production-logstash-eqiad on logstash1010 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1010&panelId=37
[08:34:10] elukey: gotta love when you empathise with the bot and not with the human ;P
[08:34:27] Operations, Traffic: Varnish 6.0 needs a SONAME version bump - https://phabricator.wikimedia.org/T261487 (ema)
[08:34:30] Operations, Traffic, Patch-For-Review: Analyze custom varnish 5.1 patches considering the migration to varnish 6 - https://phabricator.wikimedia.org/T260702 (ema)
[08:34:38] Operations, Traffic: Varnish 6.0 needs a SONAME version bump - https://phabricator.wikimedia.org/T261487 (ema) p:Triage→Medium
[08:35:11] vgutierrez: ahahahh
[08:35:41] (PS2) Vgutierrez: Add 0006-bump-api-soname [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622964 (https://phabricator.wikimedia.org/T261487)
[08:35:43] (PS6) Vgutierrez: Update debian/control [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621693 (https://phabricator.wikimedia.org/T260702)
[08:35:46] (PS2) Vgutierrez: Bump libvarnishapi SONAME [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622965 (https://phabricator.wikimedia.org/T261487)
[08:35:48] 
(PS5) Vgutierrez: Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702)
[08:36:20] (CR) jerkins-bot: [V: -1] Add 0006-bump-api-soname [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622964 (https://phabricator.wikimedia.org/T261487) (owner: Vgutierrez)
[08:36:30] (CR) jerkins-bot: [V: -1] Bump libvarnishapi SONAME [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622965 (https://phabricator.wikimedia.org/T261487) (owner: Vgutierrez)
[08:43:17] Operations: FY2020-2021 Q1 eqiad -> codfw switchover - https://phabricator.wikimedia.org/T243316 (Marostegui)
[08:47:23] (CR) JMeybohm: [C: +2] Update patch: Detect kubeconfig as known argument in plugin invocations [debs/helm] - https://gerrit.wikimedia.org/r/620890 (owner: JMeybohm)
[08:48:11] (PS9) JMeybohm: sre.discovery: Refactor [cookbooks] - https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663)
[08:49:22] (PS3) Vgutierrez: Bump libvarnishapi SONAME [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622965 (https://phabricator.wikimedia.org/T261487)
[08:49:24] (PS6) Vgutierrez: Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702)
[08:49:57] (CR) jerkins-bot: [V: -1] Bump libvarnishapi SONAME [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622965 (https://phabricator.wikimedia.org/T261487) (owner: Vgutierrez)
[08:51:56] (Merged) jenkins-bot: Update patch: Detect kubeconfig as known argument in plugin invocations [debs/helm] - https://gerrit.wikimedia.org/r/620890 (owner: JMeybohm)
[08:55:44] (PS7) Vgutierrez: Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702)
[08:55:46] (PS1) Vgutierrez: Package vcstool.py [debs/varnish4] 
(debian-wmf) - https://gerrit.wikimedia.org/r/622967 (https://phabricator.wikimedia.org/T260702)
[08:56:51] (PS1) Ema: Update versioned dependency on libvarnishapi-dev [software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/622969 (https://phabricator.wikimedia.org/T261487)
[08:58:39] (CR) Legoktm: [C: -1] [WIP] maps: block 3rd parties with 403, even hits (2 comments) [puppet] - https://gerrit.wikimedia.org/r/570156 (https://phabricator.wikimedia.org/T244278) (owner: BBlack)
[08:59:22] (CR) Ema: [C: +1] Bump libvarnishapi SONAME [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622965 (https://phabricator.wikimedia.org/T261487) (owner: Vgutierrez)
[09:09:01] (PS1) Jcrespo: mariadb-backups: productionize backup stats and check database grants [puppet] - https://gerrit.wikimedia.org/r/622970 (https://phabricator.wikimedia.org/T260686)
[09:09:04] (CR) jerkins-bot: [V: -1] Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[09:09:14] (CR) jerkins-bot: [V: -1] Package vcstool.py [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622967 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[09:09:36] Operations, Traffic, Patch-For-Review: Varnish 6.0 needs a SONAME version bump - https://phabricator.wikimedia.org/T261487 (ema)
[09:09:45] (CR) jerkins-bot: [V: -1] Update versioned dependency on libvarnishapi-dev [software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/622969 (https://phabricator.wikimedia.org/T261487) (owner: Ema)
[09:10:58] (CR) Jcrespo: "This should close T260686" [puppet] - https://gerrit.wikimedia.org/r/622970 (https://phabricator.wikimedia.org/T260686) (owner: Jcrespo)
[09:14:14] (CR) Marostegui: mariadb-backups: productionize backup stats and check database grants (1 comment) [puppet] - 
https://gerrit.wikimedia.org/r/622970 (https://phabricator.wikimedia.org/T260686) (owner: Jcrespo)
[09:17:32] (CR) Muehlenhoff: [C: +2] Switch openstack::serverpackages::rocky::stretch to component/ceph [puppet] - https://gerrit.wikimedia.org/r/622340 (https://phabricator.wikimedia.org/T256877) (owner: Muehlenhoff)
[09:18:10] (CR) Jcrespo: mariadb-backups: productionize backup stats and check database grants (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622970 (https://phabricator.wikimedia.org/T260686) (owner: Jcrespo)
[09:19:18] !log imported helm_2.16.9-3 to buster-wikimedia, stretch-wikimedia, jessie-wikimedia
[09:19:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:18] (CR) jerkins-bot: [V: -1] Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[09:22:49] (PS2) Jcrespo: mariadb-backups: productionize backup stats and check database grants [puppet] - https://gerrit.wikimedia.org/r/622970 (https://phabricator.wikimedia.org/T260686)
[09:23:46] (CR) Marostegui: [C: +1] mariadb-backups: productionize backup stats and check database grants (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622970 (https://phabricator.wikimedia.org/T260686) (owner: Jcrespo)
[09:25:29] (CR) Jcrespo: [C: +2] mariadb-backups: productionize backup stats and check database grants [puppet] - https://gerrit.wikimedia.org/r/622970 (https://phabricator.wikimedia.org/T260686) (owner: Jcrespo)
[09:25:36] (PS1) Muehlenhoff: Revert "Switch openstack::serverpackages::rocky::stretch to component/ceph" [puppet] - https://gerrit.wikimedia.org/r/622971
[09:26:56] (PS1) Kormat: mariadb: Create profile::mariadb::packages_wmf [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972)
[09:27:00] (CR) Muehlenhoff: [C: +2] Revert "Switch 
openstack::serverpackages::rocky::stretch to component/ceph" [puppet] - https://gerrit.wikimedia.org/r/622971 (owner: Muehlenhoff)
[09:30:52] (CR) Arturo Borrero Gonzalez: [C: +2] wmcs: expose metricsinfra alert manager [puppet] - https://gerrit.wikimedia.org/r/622858 (owner: BryanDavis)
[09:32:22] (PS2) Ema: Package vcstool.py [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622967 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[09:32:24] (PS8) Ema: Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[09:32:26] (PS1) Ema: Use libvarnishapi2 instead of libvarnishapi1 in override [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622973 (https://phabricator.wikimedia.org/T261487)
[09:33:05] (CR) jerkins-bot: [V: -1] Package vcstool.py [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622967 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[09:33:20] (CR) jerkins-bot: [V: -1] Use libvarnishapi2 instead of libvarnishapi1 in override [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622973 (https://phabricator.wikimedia.org/T261487) (owner: Ema)
[09:34:03] (PS2) Kormat: mariadb: Create profile::mariadb::packages_wmf [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972)
[09:43:32] (CR) Kormat: [C: -1] install_server: add custom reuse recipe for Kafka Jumbo (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123) (owner: Elukey)
[09:45:27] (CR) Kormat: "PCC run is mostly a no-op: https://puppet-compiler.wmflabs.org/compiler1002/24774/" [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[09:46:01] (CR) Elukey: install_server: add custom reuse recipe for Kafka Jumbo 
(1 comment) [puppet] - https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123) (owner: Elukey)
[09:46:59] (PS1) Muehlenhoff: Remove apt::pin for ceph packages in openstack/rocky/stretch [puppet] - https://gerrit.wikimedia.org/r/622974 (https://phabricator.wikimedia.org/T256877)
[09:47:05] (PS2) Elukey: install_server: add custom reuse recipe for Kafka Jumbo [puppet] - https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123)
[09:47:11] elukey: lol @ "That was a surprise! I am glad you found it, now I can include it :P"
[09:47:46] :D
[09:48:04] !log updated helm to 2.16.9-3 on chartmuseum*, contint*, deploy*
[09:48:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:13] (CR) Kormat: install_server: add custom reuse recipe for Kafka Jumbo (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123) (owner: Elukey)
[09:52:17] (CR) Elukey: install_server: add custom reuse recipe for Kafka Jumbo (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123) (owner: Elukey)
[09:53:10] kormat: I am clearly not capable of writing anything before 11 AM
[09:53:17] (CR) Marostegui: "You might need to review some of the my.cnf.erb as we have ifs there to set the basedir there too. 
So that probably needs checking too" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[09:53:40] elukey: you and me both :)
[09:53:53] (PS3) Elukey: install_server: add custom reuse recipe for Kafka Jumbo [puppet] - https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123)
[09:55:26] (CR) Muehlenhoff: [C: +1] "LGTM" [software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/622969 (https://phabricator.wikimedia.org/T261487) (owner: Ema)
[09:56:54] (CR) Elukey: [C: +1] Update versioned dependency on libvarnishapi-dev [software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/622969 (https://phabricator.wikimedia.org/T261487) (owner: Ema)
[09:58:05] (CR) Kormat: "> You might need to review some of the my.cnf.erb as we have ifs there to set the basedir there too. So that probably needs checking too" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[09:58:20] (PS3) Kormat: mariadb: Create profile::mariadb::packages_wmf [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972)
[09:58:54] (CR) Kormat: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[09:59:20] (CR) jerkins-bot: [V: -1] Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[09:59:22] (CR) Marostegui: "> Patch Set 2:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[10:01:18] (CR) Kormat: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[10:03:49] (CR) Marostegui: "> 
Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[10:04:09] (PS9) Ema: Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[10:04:11] (PS1) Ema: Work around a breaking change in GNU make 4.3 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622975 (https://phabricator.wikimedia.org/T260702)
[10:04:54] (CR) Kormat: [C: -2] "> Regarding the patch in general, let's not submit today as it is friday, let's do it after the DC switchover and with puppet disabled I w" [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[10:04:59] (CR) jerkins-bot: [V: -1] Work around a breaking change in GNU make 4.3 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/622975 (https://phabricator.wikimedia.org/T260702) (owner: Ema)
[10:05:03] Operations: Upgrade debmonitor to Buster - https://phabricator.wikimedia.org/T261489 (MoritzMuehlenhoff)
[10:05:12] (CR) Jcrespo: "-1 As per style guide:" [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[10:10:20] (PS1) Zoranzoki21: Enable sitenotice on mobile for closed wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/622777 (https://phabricator.wikimedia.org/T261357)
[10:10:51] Operations, Analytics, Analytics-Kanban, netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (fdans) @CDanis that makes sense. In that case what we propose is adding an intermediate data augmentation step to add these dimensions about 6-7 h...
[10:13:00] Operations, Product-Infrastructure-Team-Backlog, Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (Abbe98) Aren't the referenced patch blocking access to all services on maps.wikimedia.org and not only osm-intl? This issue and the deprecation mess...
[10:14:35] (PS2) Zoranzoki21: Enable sitenotice on mobile for closed wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/622777 (https://phabricator.wikimedia.org/T261357)
[10:19:26] (CR) Kormat: [C: -2] "This is stalled for the moment while i try to figure out smaller approaches instead." [puppet] - https://gerrit.wikimedia.org/r/622578 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[10:20:31] Operations, vm-requests: eqiad/codfw: 1 VM request for debmonitor - https://phabricator.wikimedia.org/T261492 (MoritzMuehlenhoff)
[10:20:44] Operations, vm-requests: eqiad/codfw: 1 VM request for debmonitor - https://phabricator.wikimedia.org/T261492 (MoritzMuehlenhoff) p:Triage→Medium a:MoritzMuehlenhoff
[10:23:25] (CR) jerkins-bot: [V: -1] Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[10:28:35] (CR) Kormat: install_server: add custom reuse recipe for Kafka Jumbo (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123) (owner: Elukey)
[10:29:25] (PS10) JMeybohm: sre.discovery: Refactor [cookbooks] - https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663)
[10:30:06] (CR) Jcrespo: "For more context, quoting again from the style guide:" [puppet] - https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[10:30:45] (CR) JMeybohm: sre.discovery: Refactor (6 comments) [cookbooks] - https://gerrit.wikimedia.org/r/621721 
(https://phabricator.wikimedia.org/T260663) (owner: 10JMeybohm) [10:30:54] (03CR) 10Elukey: install_server: add custom reuse recipe for Kafka Jumbo (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123) (owner: 10Elukey) [10:32:30] elukey: "Partman is so flexible and manageable that I am sure it will not cause any problem" 😆 [10:32:40] (03CR) 10ZPapierski: "https://puppet-compiler.wmflabs.org/compiler1002/24775/. Tested on a custom instance - launching the instances work, unfortunately I canno" [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) (owner: 10ZPapierski) [10:32:54] (03CR) 10Elukey: install_server: add custom reuse recipe for Kafka Jumbo (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123) (owner: 10Elukey) [10:35:03] 10Operations, 10vm-requests: eqiad/codfw: 1 VM request for debmonitor - https://phabricator.wikimedia.org/T261492 (10jbond) Looks good [10:36:03] 10Operations: FY2020-2021 Q1 eqiad -> codfw switchover - https://phabricator.wikimedia.org/T243316 (10Marostegui) [10:36:06] kormat: it is, isn't that your experience too? 
:D [10:36:45] 10Operations, 10ops-eqiad, 10netops: eqiad row D switch fabric recabling - https://phabricator.wikimedia.org/T256112 (10Marostegui) [10:36:47] 10Operations: FY2020-2021 Q1 eqiad -> codfw switchover - https://phabricator.wikimedia.org/T243316 (10Marostegui) [10:36:50] 10Operations, 10Traffic, 10netops, 10Patch-For-Review: eqiad row D switch upgrade - https://phabricator.wikimedia.org/T172459 (10Marostegui) [10:48:22] (03PS1) 10Muehlenhoff: Add DNS records for debmonitor1002/2002 [dns] - 10https://gerrit.wikimedia.org/r/622977 (https://phabricator.wikimedia.org/T261492) [10:52:18] (03CR) 10Kormat: install_server: add custom reuse recipe for Kafka Jumbo (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123) (owner: 10Elukey) [11:00:07] (03CR) 10Muehlenhoff: [C: 03+2] Add DNS records for debmonitor1002/2002 [dns] - 10https://gerrit.wikimedia.org/r/622977 (https://phabricator.wikimedia.org/T261492) (owner: 10Muehlenhoff) [11:06:05] (03PS10) 10Ema: Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [11:06:07] (03PS1) 10Ema: Set debhelper compatibility level to 10 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 [11:07:00] (03CR) 10jerkins-bot: [V: 04-1] Set debhelper compatibility level to 10 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 (owner: 10Ema) [11:07:03] (03CR) 10Muehlenhoff: "All caches are on Buster, so you can just as well use 12?" 
[debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 (owner: 10Ema) [11:25:25] (03CR) 10jerkins-bot: [V: 04-1] Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [11:27:30] !log jmm@cumin2001 START - Cookbook sre.ganeti.makevm [11:27:30] !log jmm@cumin2001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) [11:27:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:27:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:27:59] !log jmm@cumin2001 START - Cookbook sre.ganeti.makevm [11:28:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:29:26] (03PS1) 10Jbond: sqlite: add sqlite module [puppet] - 10https://gerrit.wikimedia.org/r/622979 [11:37:27] 10Operations: FY2020-2021 Q1 codfw -> eqiad switchback - https://phabricator.wikimedia.org/T243318 (10Marostegui) [11:37:56] !log jmm@cumin2001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [11:37:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:38:32] (03CR) 10Muehlenhoff: "Looks good to me, one comment inline." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621485 (https://phabricator.wikimedia.org/T260883) (owner: 10Filippo Giunchedi) [11:40:24] !log jmm@cumin2001 START - Cookbook sre.ganeti.makevm [11:40:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:44:53] (03CR) 10Ema: "recheck" [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [11:48:45] (03CR) 10Hnowlan: [C: 03+2] api-gateway: Restrict unauthenticated write HTTP methods, permit read HTTP methods [deployment-charts] - 10https://gerrit.wikimedia.org/r/613650 (https://phabricator.wikimedia.org/T256769) (owner: 10Hnowlan) [11:49:57] (03Merged) 10jenkins-bot: api-gateway: Restrict unauthenticated write HTTP methods, permit read HTTP methods [deployment-charts] - 10https://gerrit.wikimedia.org/r/613650 (https://phabricator.wikimedia.org/T256769) (owner: 10Hnowlan) [11:50:17] !log jmm@cumin2001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [11:50:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:55:32] (03PS2) 10Ema: Set debhelper compatibility level to 12 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 [11:55:57] (03PS11) 10Ema: Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [11:56:08] (03CR) 10jerkins-bot: [V: 04-1] Set debhelper compatibility level to 12 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 (owner: 10Ema) [11:56:51] (03CR) 10Ema: "> Patch Set 1:" [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 (owner: 10Ema) [12:00:44] (03CR) 10Muehlenhoff: Set debhelper compatibility level to 12 (031 comment) [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 (owner: 10Ema) [12:12:12] (03PS1) 10Muehlenhoff: Add debmonitor1002/2002 to puppet [puppet] - 
10https://gerrit.wikimedia.org/r/622984 (https://phabricator.wikimedia.org/T261492) [12:15:03] (03CR) 10jerkins-bot: [V: 04-1] Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [12:16:43] (03CR) 10Muehlenhoff: [C: 03+2] Add debmonitor1002/2002 to puppet [puppet] - 10https://gerrit.wikimedia.org/r/622984 (https://phabricator.wikimedia.org/T261492) (owner: 10Muehlenhoff) [12:20:13] (03PS3) 10Ema: Set debhelper compatibility level to 12 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 [12:20:15] (03PS12) 10Ema: Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [12:21:03] (03CR) 10jerkins-bot: [V: 04-1] Set debhelper compatibility level to 12 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 (owner: 10Ema) [12:21:08] (03CR) 10Ema: Set debhelper compatibility level to 12 (031 comment) [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 (owner: 10Ema) [12:28:38] 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10CDanis) Yes, it would. There's two use cases here: * DoS attack analysis, for which real-time is essential. Here, the augmented data would be hel... [12:32:47] 10Operations, 10Discovery, 10Discovery-Search, 10Elasticsearch: Elasticsearch errors about BulkShardRequest - https://phabricator.wikimedia.org/T167091 (10Gehel) 05Open→03Declined Context lost from 3 years ago, closing it for now. 
[12:43:19] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 (owner: 10Ema) [12:46:17] !log installing debmonitor2002 T261492 [12:46:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:22] T261492: eqiad/codfw: 1 VM request for debmonitor - https://phabricator.wikimedia.org/T261492 [12:47:18] (03CR) 10jerkins-bot: [V: 04-1] Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [12:53:06] (03PS4) 10Vgutierrez: Remove unnecessary patches for Varnish 6 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621265 (https://phabricator.wikimedia.org/T260702) [12:53:08] (03PS5) 10Vgutierrez: Update 0003-vsm-perms.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621284 (https://phabricator.wikimedia.org/T260702) [12:53:10] (03PS4) 10Vgutierrez: Update 0005-stats-shortlived.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621532 (https://phabricator.wikimedia.org/T260702) [12:53:12] (03PS3) 10Vgutierrez: Add 0006-bump-api-soname [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622964 (https://phabricator.wikimedia.org/T261487) [12:53:14] (03PS7) 10Vgutierrez: Update debian/control [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621693 (https://phabricator.wikimedia.org/T260702) [12:53:16] (03PS4) 10Vgutierrez: Bump libvarnishapi SONAME [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622965 (https://phabricator.wikimedia.org/T261487) [12:53:18] (03PS2) 10Vgutierrez: Use libvarnishapi2 instead of libvarnishapi1 in override [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622973 (https://phabricator.wikimedia.org/T261487) (owner: 10Ema) [12:53:20] (03PS3) 10Vgutierrez: Package vcstool.py [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622967 
(https://phabricator.wikimedia.org/T260702) [12:53:22] (03PS2) 10Vgutierrez: Work around a breaking change in GNU make 4.3 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622975 (https://phabricator.wikimedia.org/T260702) (owner: 10Ema) [12:53:24] (03PS4) 10Vgutierrez: Set debhelper compatibility level to 12 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 (owner: 10Ema) [12:53:26] (03PS13) 10Vgutierrez: Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) [12:54:07] (03PS4) 10Elukey: install_server: add custom reuse recipe for Kafka Jumbo [puppet] - 10https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123) [12:54:30] (03CR) 10Elukey: install_server: add custom reuse recipe for Kafka Jumbo (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123) (owner: 10Elukey) [12:54:43] (03CR) 10jerkins-bot: [V: 04-1] Remove unnecessary patches for Varnish 6 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621265 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [12:54:45] (03CR) 10jerkins-bot: [V: 04-1] Update 0003-vsm-perms.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621284 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [12:54:54] (03CR) 10jerkins-bot: [V: 04-1] Add 0006-bump-api-soname [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622964 (https://phabricator.wikimedia.org/T261487) (owner: 10Vgutierrez) [12:54:56] (03CR) 10jerkins-bot: [V: 04-1] Update 0005-stats-shortlived.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621532 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [12:55:00] (03CR) 10jerkins-bot: [V: 04-1] Bump libvarnishapi SONAME [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622965 
(https://phabricator.wikimedia.org/T261487) (owner: 10Vgutierrez) [12:55:04] (03CR) 10jerkins-bot: [V: 04-1] Update debian/control [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621693 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [12:55:11] (03CR) 10jerkins-bot: [V: 04-1] Use libvarnishapi2 instead of libvarnishapi1 in override [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622973 (https://phabricator.wikimedia.org/T261487) (owner: 10Ema) [12:55:17] (03CR) 10jerkins-bot: [V: 04-1] Package vcstool.py [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622967 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [12:55:27] (03CR) 10jerkins-bot: [V: 04-1] Work around a breaking change in GNU make 4.3 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622975 (https://phabricator.wikimedia.org/T260702) (owner: 10Ema) [12:55:33] (03CR) 10Kormat: [C: 03+1] "LGTM! Let me know how it goes when you test it." 
[puppet] - 10https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123) (owner: 10Elukey) [12:55:37] (03CR) 10jerkins-bot: [V: 04-1] Set debhelper compatibility level to 12 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/622978 (owner: 10Ema) [12:56:07] !log installing debmonitor1002 T261492 [12:56:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:11] T261492: eqiad/codfw: 1 VM request for debmonitor - https://phabricator.wikimedia.org/T261492 [12:57:25] 10Operations: "In other projects" is big and not style aligned in Italian projects - https://phabricator.wikimedia.org/T261497 (10Ferdi2005) [13:02:52] (03CR) 10Elukey: [C: 03+2] install_server: add custom reuse recipe for Kafka Jumbo [puppet] - 10https://gerrit.wikimedia.org/r/622925 (https://phabricator.wikimedia.org/T255123) (owner: 10Elukey) [13:07:59] !log stop kafka on kafka-jumbo1006 and reimage to buster [13:08:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:13:44] (03PS4) 10Kormat: mariadb: Create profile::mariadb::packages_wmf [puppet] - 10https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) [13:14:06] (03PS13) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 (https://phabricator.wikimedia.org/T256972) [13:16:54] (03CR) 10Kormat: [C: 04-2] "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/622578 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [13:18:04] (03PS3) 10Itamar Givon: Add `wmgWikibaseClientMainEntitySourceName` to InitialiseSettings.phpg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622612 (https://phabricator.wikimedia.org/T258060) [13:19:25] (03PS4) 10Itamar Givon: Add `wmgWikibaseClientMainEntitySourceName` to InitialiseSettings.phpg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622612 (https://phabricator.wikimedia.org/T258060) [13:21:59] (03CR) 10jerkins-bot: 
[V: 04-1] Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: 10Vgutierrez) [13:22:19] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1003 is CRITICAL: 96 ge 10 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1003 [13:22:28] yep this is due to my reimage [13:23:04] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime [13:23:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:36] (03PS1) 10Itamar Givon: Use `wmgWikibaseClientMainEntitySourceName` instead of `wmgWikibaseClientLocalEntitySourceName` in Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622993 (https://phabricator.wikimedia.org/T258060) [13:25:15] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [13:25:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:02] (03PS1) 10Itamar Givon: Remove wmgWikibaseClientLocalEntitySourceName from InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622994 (https://phabricator.wikimedia.org/T258060) [13:28:29] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1005 is CRITICAL: 94 ge 10 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1005 [13:28:29] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1002 is CRITICAL: 106 ge 10 https://wikitech.wikimedia.org/wiki/Kafka/Administration 
https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1002 [13:28:40] 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10fdans) Thanks for clarifying. A correction from my end: the extra dimensions would actually take significantly less then 6 hours since they would... [13:30:06] (03PS5) 10Itamar Givon: Add `wmgWikibaseClientMainEntitySourceName` to InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622612 (https://phabricator.wikimedia.org/T258060) [13:30:08] (03PS2) 10Itamar Givon: Use `wmgWikibaseClientMainEntitySourceName` instead of `wmgWikibaseClientLocalEntitySourceName` in Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622993 (https://phabricator.wikimedia.org/T258060) [13:30:10] (03PS2) 10Itamar Givon: Remove `wmgWikibaseClientLocalEntitySourceName` from InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622994 (https://phabricator.wikimedia.org/T258060) [13:33:00] lol [13:33:04] oops... wrong window :) [13:34:50] 10Operations, 10Release-Engineering-Team, 10Scap, 10serviceops, 10Platform Team Workboards (Clinic Duty Team): Deployment infrastructure for PHP microservices - https://phabricator.wikimedia.org/T261369 (10thcipriani) Scap3 was built as a general deployment tool, and should be able to deploy PHP code (I... 
[13:35:12] !log hashar@deploy1001 Started deploy [integration/docroot@65ec92c]: noop, sync up for README.md [13:35:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:20] !log hashar@deploy1001 Finished deploy [integration/docroot@65ec92c]: noop, sync up for README.md (duration: 00m 07s) [13:35:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:52] (03CR) 10Jcrespo: "There is one additional thing, this removes a functionality (which is not mentioned on the commit message) that I used to use quite a lot " [puppet] - 10https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [13:36:45] (03CR) 10Clarakosi: [C: 03+1] Install OAuthRateLimiter extension I: Add i18n [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622896 (https://phabricator.wikimedia.org/T258423) (owner: 10Ppchelko) [13:40:32] 10Operations, 10Release-Engineering-Team, 10Scap, 10serviceops, 10Platform Team Workboards (Clinic Duty Team): Deployment infrastructure for PHP microservices - https://phabricator.wikimedia.org/T261369 (10Joe) @thcipriani this is a service that will have to be deployed to kubernetes. So I think the actu... [13:42:42] (03CR) 10Clarakosi: [C: 03+1] Install OAuthRateLimiter III: Install where enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622898 (https://phabricator.wikimedia.org/T258423) (owner: 10Ppchelko) [13:44:44] (03CR) 10Clarakosi: [C: 03+1] Install OAuthRateLimiter extension II: Add flag to IS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622897 (https://phabricator.wikimedia.org/T258423) (owner: 10Ppchelko) [13:44:55] (03PS1) 10Kormat: mariadb: Allow overriding of wmf-mariadb version in hiera [puppet] - 10https://gerrit.wikimedia.org/r/622995 (https://phabricator.wikimedia.org/T256972) [13:45:45] (03CR) 10Jcrespo: "Thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/622995 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [13:45:49] (03CR) 10Kormat: [C: 04-2] "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [13:46:58] (03PS1) 10Hnowlan: api-gateway: fix config indentation [deployment-charts] - 10https://gerrit.wikimedia.org/r/622996 (https://phabricator.wikimedia.org/T256769) [13:48:38] 10Operations, 10Release-Engineering-Team, 10serviceops, 10Platform Team Workboards (Clinic Duty Team), 10Release Pipeline (Blubber): Deployment infrastructure for PHP microservices - https://phabricator.wikimedia.org/T261369 (10Joe) [13:48:51] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1001 is CRITICAL: 71 ge 10 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1001 [13:48:57] (03CR) 10Jcrespo: [C: 03+1] "I had some discussions and I belive this is just a step towards a deeper refactoring, which needs some sacrifices. 
I wasn't fully aware of" [puppet] - 10https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [13:49:31] (03CR) 10Jbond: "I had a chat with kormat and Jynus and will try to summarise here (please both correct me if im wrong)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [13:52:05] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1004 is CRITICAL: 35 ge 10 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1004 [13:55:32] (03PS2) 10Jbond: sqlite: add sqlite module [puppet] - 10https://gerrit.wikimedia.org/r/622979 [13:55:35] (03CR) 10Marostegui: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [13:57:12] (03CR) 10Jcrespo: [C: 03+1] "> As soon as m5 master failover is done, you'll get that backup testing host that was promised :)" [puppet] - 10https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [13:57:56] (03PS1) 10Elukey: install_server: set reuse-part.cfg as partman recipe for kafka-jumbo [puppet] - 10https://gerrit.wikimedia.org/r/622998 (https://phabricator.wikimedia.org/T255123) [13:58:38] 10Operations, 10vm-requests: eqiad/codfw: 1 VM request for debmonitor - https://phabricator.wikimedia.org/T261492 (10MoritzMuehlenhoff) 05Open→03Resolved debmonitor1002 and debmonitor2002 have been created and installed. 
[13:58:55] (03CR) 10Elukey: [C: 03+2] install_server: set reuse-part.cfg as partman recipe for kafka-jumbo [puppet] - 10https://gerrit.wikimedia.org/r/622998 (https://phabricator.wikimedia.org/T255123) (owner: 10Elukey) [13:59:01] (03CR) 10Kormat: [C: 03+1] install_server: set reuse-part.cfg as partman recipe for kafka-jumbo [puppet] - 10https://gerrit.wikimedia.org/r/622998 (https://phabricator.wikimedia.org/T255123) (owner: 10Elukey) [14:04:05] RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1001 is OK: (C)10 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1001 [14:04:40] all the others should recover soon [14:04:45] RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1002 is OK: (C)10 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1002 [14:05:35] RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1004 is OK: (C)10 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1004 [14:06:19] RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1005 is OK: (C)10 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1005 [14:11:00] (03CR) 10Ottomata: [C: 03+1] admin: Add razzi to users and add to analytics groups [puppet] - 
10https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T261443) (owner: 10Razzi) [14:13:03] RECOVERY - Kafka Broker Under Replicated Partitions on kafka-jumbo1003 is OK: (C)10 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1003 [14:20:38] (03PS5) 10Kormat: mariadb: Create profile::mariadb::packages_wmf [puppet] - 10https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) [14:21:29] (03PS2) 10Kormat: mariadb: Allow overriding of wmf-mariadb version in hiera [puppet] - 10https://gerrit.wikimedia.org/r/622995 (https://phabricator.wikimedia.org/T256972) [14:22:36] (03CR) 10Kormat: [C: 04-2] mariadb: Create profile::mariadb::packages_wmf (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [14:22:38] !log installing Java security updates on kafka/main and Logstash(5) clusters [14:22:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:24] (03CR) 10Jbond: "given this a quick once over and it looks like a great improvement to me. ultimately i think we want all/most of the mariadb module name " (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/622578 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [14:29:29] jbond42: 💜 [14:29:58] * marostegui kormat....👀 [14:30:26] kormat: re the purge thing you can see here https://phabricator.wikimedia.org/T214605 for some really hacky scripts to check if there are any unmanaged files in a dir [14:31:20] jbond42: ack. i'll need to take a look to make sure that unmanaged files aren't used for.. anything. 
[14:31:22] (03CR) 10Jbond: [C: 03+1] "obviously wait until monday but +1 from me" [puppet] - 10https://gerrit.wikimedia.org/r/622972 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [14:32:04] hopefully there are no unmanaged files :D lol [14:32:21] jbond42: i completely agree :) [14:32:41] (03PS5) 10Ottomata: admin: Add razzi to users and add to analytics-privatedata-users group [puppet] - 10https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T261443) (owner: 10Razzi) [14:32:57] i'm once again missing ansible, where i could run it with --check so i'd see what it would remove [14:33:59] (03CR) 10Ottomata: [C: 03+1] "Ah! We realized today that analytics-admins does have some sudo privileges, so we need to get approval from SRE before we add that. analy" [puppet] - 10https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T261443) (owner: 10Razzi) [14:34:10] (03PS6) 10Ottomata: admin: Add razzi to users and add to analytics-privatedata-users group [puppet] - 10https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T261443) (owner: 10Razzi) [14:35:11] jbond42: oh, even better. the dir doesn't exist on half the hosts, aaand it's empty on all the others [14:35:39] that's perfect :) [14:36:12] fyi very hacky but you could do something like cumin '*' ' puppet apply --noop -e "file {'/tmp/foo': ensure => file, purge => true, recurse => true}"' [14:36:36] * kormat shudders [14:36:46] :D a friday treat for you ;) [14:39:00] correction: when i look at the _correct_ directory. it only exists on multiinstance hosts, and it only contains managed files. [14:39:06] so your purge suggestion is perfect. 
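The check discussed above (are there any unmanaged files in a directory before purging is enabled?) can be sketched in a few lines of Python. This is only an illustration and not the scripts referenced in T214605: the function name `find_unmanaged`, the file names, and the idea of passing the managed paths in as a set are all assumptions made for the example.

```python
import os
import tempfile


def find_unmanaged(directory, managed):
    """Return sorted relative paths of files under `directory` not in `managed`.

    `managed` is a set of paths relative to `directory` (hypothetical input;
    in practice it would come from the configuration management catalog).
    A missing directory yields an empty list, since a purge could not
    remove anything there.
    """
    if not os.path.isdir(directory):
        return []
    unmanaged = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), directory)
            if rel not in managed:
                unmanaged.append(rel)
    return sorted(unmanaged)


if __name__ == "__main__":
    # Hypothetical demo data: two managed files and one stray file.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("s1.cnf", "s2.cnf", "stray.bak"):
            with open(os.path.join(tmp, name), "w") as fh:
                fh.write("")
        print(find_unmanaged(tmp, {"s1.cnf", "s2.cnf"}))
```

An empty result corresponds to the situation kormat describes: the directory either does not exist or contains only managed files, so enabling a purge would remove nothing.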
[14:40:52] (03PS7) 10Ottomata: admin: Add razzi to users and add to analytics groups [puppet] - 10https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T261443) (owner: 10Razzi) [14:41:13] (03PS14) 10Kormat: mariadb: Add profile::mariadb::common [puppet] - 10https://gerrit.wikimedia.org/r/622578 (https://phabricator.wikimedia.org/T256972) [14:41:31] (03CR) 10Ottomata: "> Ah! We realized today that analytics-admins does have some sudo privileges, so we need to get approval from SRE before we add that." [puppet] - 10https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T261443) (owner: 10Razzi) [14:42:27] (03PS8) 10Ottomata: admin: Add razzi to users and add to analytics groups [puppet] - 10https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T261443) (owner: 10Razzi) [14:43:55] (03CR) 10Jbond: [C: 03+2] sqlite: add sqlite module [puppet] - 10https://gerrit.wikimedia.org/r/622979 (owner: 10Jbond) [14:43:57] (03CR) 10Ottomata: [C: 03+2] admin: Add razzi to users and add to analytics groups [puppet] - 10https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T261443) (owner: 10Razzi) [14:46:53] jbond42: [14:46:56] E: failed to lock, another puppet-merge running on this host? [14:46:57] locking process tree: systemd---sshd---sshd---sshd(jbond)---sudo(root)---puppet-merge---python3(gitpuppet) [14:46:58] :) [14:47:06] you got a prompt waiting for a 'y' ? 
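The "failed to lock" error above is the usual symptom of a tool that takes an exclusive lock and refuses to run while another holder has it. The sketch below shows that general pattern with a non-blocking `flock`; it is an assumption for illustration only and does not reproduce how puppet-merge actually implements its locking. The helper name `try_lock` and the lock file name are hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns the open file object holding the lock, or None if another
    holder already has it. Closing the returned file releases the lock.
    """
    fh = open(path, "a")  # append mode: create if missing, never truncate
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh


if __name__ == "__main__":
    # Hypothetical lock file in a scratch directory.
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "example-merge.lock")
        first = try_lock(lock_path)   # plays the run waiting at a prompt
        second = try_lock(lock_path)  # plays the run that fails to lock
        print("first acquired:", first is not None)
        print("second acquired:", second is not None)
        if first is not None:
            first.close()
```

The second attempt fails for as long as the first holder keeps its file open, which matches the situation in the log where one run sits at a confirmation prompt while another tries to start.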
[14:47:40] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to production shell and wmf ldap access for Razzi Abuissa - https://phabricator.wikimedia.org/T261443 (10Ottomata) [14:48:45] ottomata: sorry merging yours as well [14:51:16] danke [14:56:01] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to production shell and wmf ldap access for Razzi Abuissa - https://phabricator.wikimedia.org/T261443 (10Ottomata) LDAP wmf addition done: ` [@mwmaint1002:/home/otto] $ ldapsearch -x cn=wmf | grep razzi memb... [15:18:54] 10Operations, 10Maps, 10Traffic, 10Wiki-Loves-Monuments (2020): wikimedia.pl returns a HTTP 429 error (let it access varnish maps_domains) - https://phabricator.wikimedia.org/T261506 (10TOR) [15:19:42] 10Operations, 10Product-Infrastructure-Data, 10Epic, 10Goal, 10Patch-For-Review: automatically collect network error reports from users' browsers (Network Error Logging API) - https://phabricator.wikimedia.org/T257527 (10CDanis) [15:24:46] (03CR) 10Ebernhardson: Multiple instances of msearch_daemon (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) (owner: 10ZPapierski) [15:25:40] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10AntiCompositeNumber) There are some uses of maps.wikimedia.org by websites that are definitely movement-affiliated but may not be hosted on Wikimedi... 
[15:27:23] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10AntiCompositeNumber) [15:35:07] (03CR) 10Elukey: Multiple instances of msearch_daemon (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) (owner: 10ZPapierski) [15:58:40] elukey: thanks for the +1 on the mcrouter proxy change. was it really just 'yaml engineering' and to merge that change or was there some other action needed to change the proxy? [16:03:12] mutante: np! the yaml file gets picked up and reloaded by mcrouter automatically without the need of a restart (via inotify), so it is just a matter of letting puppet to do the heavy lift :) [16:03:49] elukey: perfect! thank you. i'll do it later today and then all the old hardware is removed from codfw before the switch [16:03:56] super [16:06:58] !log starting one more live test of the data center switchover automation, no production impact is expected but there will be some SAL noise [16:07:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:12] FYI: ^ and I'll be watching this channel in case of trouble [16:07:46] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.00-disable-puppet [16:07:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:49] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0) [16:07:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:08:24] (03CR) 10Ppchelko: [C: 03+2] api-gateway: fix config indentation [deployment-charts] - 10https://gerrit.wikimedia.org/r/622996 (https://phabricator.wikimedia.org/T256769) (owner: 10Hnowlan) [16:08:30] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl [16:08:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:08:40] !log 
rzl@cumin1001 END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99) [16:08:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:09:31] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl [16:09:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:09:37] (03Merged) 10jenkins-bot: api-gateway: fix config indentation [deployment-charts] - 10https://gerrit.wikimedia.org/r/622996 (https://phabricator.wikimedia.org/T256769) (owner: 10Hnowlan) [16:09:38] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0) [16:09:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:11:49] (03PS1) 10Hnowlan: api-gateway: Collect metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/623012 (https://phabricator.wikimedia.org/T254910) [16:12:12] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.00-warmup-caches [16:12:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:13:38] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0) [16:13:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:16:05] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET [16:16:45] (03CR) 10Ppchelko: [C: 04-1] "LGTM, some questions inlined" (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/623012 (https://phabricator.wikimedia.org/T254910) (owner: 10Hnowlan) [16:17:57] RECOVERY - High average GET latency for mw requests on appserver in codfw 
on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET [16:19:09] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance [16:19:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:19:26] (03CR) 10Dzahn: "@hashar the -1 seems to be outdated based on the assumption this was related to the move of charts to chartmuseum" [puppet] - 10https://gerrit.wikimedia.org/r/621090 (https://phabricator.wikimedia.org/T260742) (owner: 10Dzahn) [16:19:29] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) [16:19:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:22:13] 10Puppet, 10Beta-Cluster-Infrastructure: Puppet Failures on deployment-cache-upload06 - https://phabricator.wikimedia.org/T261513 (10nskaggs) [16:24:48] (03CR) 10Hnowlan: api-gateway: Collect metrics (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/623012 (https://phabricator.wikimedia.org/T254910) (owner: 10Hnowlan) [16:28:07] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.02-set-readonly [16:28:08] !log rzl@cumin1001 [DRY-RUN] MediaWiki read-only period starts at: 2020-08-28 16:28:07.882663 [16:28:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:19] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0) [16:28:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:24] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly [16:28:26] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:55] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0) [16:28:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:00] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki [16:29:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:09] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0) [16:29:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:13] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions [16:29:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:15] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0) [16:29:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:19] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite [16:29:21] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0) [16:29:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:24] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.07-set-readwrite [16:29:24] !log rzl@cumin1001 [DRY-RUN] MediaWiki read-only period ends at: 2020-08-28 16:29:24.432463 [16:29:24] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0) [16:29:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:31] (03CR) 10Ppchelko: api-gateway: Collect metrics (031 comment) 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/623012 (https://phabricator.wikimedia.org/T254910) (owner: 10Hnowlan) [16:33:03] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.08-restore-ttl [16:33:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:13] !log rzl@cumin1001 END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99) [16:33:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:16] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.08-restore-ttl [16:33:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:36] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0) [16:33:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:34:03] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.08-start-maintenance [16:34:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:35:23] (03CR) 10Ppchelko: "Oh I see now. We can only scrape one of them.. Hm.. Yeah, I guess envoy metrics are more important then ratelimit metrics, let's fix them " [deployment-charts] - 10https://gerrit.wikimedia.org/r/623012 (https://phabricator.wikimedia.org/T254910) (owner: 10Hnowlan) [16:35:32] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0) [16:35:33] (03PS1) 10Dzahn: microsites/misc_apps: add data types for host names [puppet] - 10https://gerrit.wikimedia.org/r/623016 [16:35:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:35:38] (03CR) 10Ppchelko: "Maybe enable it in staging right away?" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/623012 (https://phabricator.wikimedia.org/T254910) (owner: 10Hnowlan) [16:35:55] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.08-update-tendril [16:35:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:36:11] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0) [16:36:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:53] (03CR) 10Hnowlan: api-gateway: Collect metrics (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/623012 (https://phabricator.wikimedia.org/T254910) (owner: 10Hnowlan) [16:40:15] (03PS1) 10Dzahn: icinga::ircbot:: add data types [puppet] - 10https://gerrit.wikimedia.org/r/623017 [16:40:25] !log switchdc live test complete [16:40:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:42:09] (03CR) 10Bstorm: [C: 03+2] shared-storage: add specific NFS volume monitoring for cleanups [puppet] - 10https://gerrit.wikimedia.org/r/622877 (https://phabricator.wikimedia.org/T261335) (owner: 10Bstorm) [16:42:25] (03CR) 10Ppchelko: api-gateway: Collect metrics (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/623012 (https://phabricator.wikimedia.org/T254910) (owner: 10Hnowlan) [16:46:50] rzl: nice! 
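The cookbook START/END pairs logged during the switchdc live test above can be summarized mechanically. The following is a minimal sketch, assuming the `!log` line format seen in this channel; `summarize` and `read_only_seconds` are hypothetical helper names for illustration, not part of any Wikimedia tooling.

```python
import re
from datetime import datetime

# Assumed line shape (taken from the log above):
#   [16:08:40] !log rzl@cumin1001 END (FAIL) - Cookbook <name> (exit_code=99)
LOG_RE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\] !log \S+ "
    r"(?P<event>START|END \((?P<result>PASS|FAIL)\)) - "
    r"Cookbook (?P<name>\S+)"
)


def summarize(lines):
    """Return (cookbook, result, seconds) for every START that has a matching END."""
    started = {}
    runs = []
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue  # not a cookbook START/END line
        t = datetime.strptime(m["time"], "%H:%M:%S")
        if m["event"] == "START":
            started[m["name"]] = t
        elif m["name"] in started:
            delta = t - started.pop(m["name"])
            runs.append((m["name"], m["result"], int(delta.total_seconds())))
    return runs


def read_only_seconds(start, end):
    """Length of the read-only window from the two [DRY-RUN] timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


if __name__ == "__main__":
    sample = [
        "[16:08:30] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl",
        "[16:08:40] !log rzl@cumin1001 END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)",
        "[16:09:31] !log rzl@cumin1001 START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl",
        "[16:09:38] !log rzl@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)",
    ]
    for name, result, seconds in summarize(sample):
        print(name, result, seconds)
    print(read_only_seconds("2020-08-28 16:28:07.882663", "2020-08-28 16:29:24.432463"))
```

With the two [DRY-RUN] timestamps from this test, the read-only window comes to roughly 76.5 seconds. The sketch keys runs by cookbook name, so a retried step such as 00-reduce-ttl shows up as two separate runs, one FAIL and one PASS.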
[16:47:05] (03PS1) 10Dzahn: ircecho: split server parameter into FQDN and port, add data types [puppet] - 10https://gerrit.wikimedia.org/r/623018 [16:47:10] still a couple of things to chase down :) but we're getting close [16:47:27] (03CR) 10Dzahn: [C: 03+2] mediawiki: replace mw2196 with mw2336 as mcrouter proxy [puppet] - 10https://gerrit.wikimedia.org/r/622900 (https://phabricator.wikimedia.org/T260654) (owner: 10Dzahn) [16:48:27] (03CR) 10jerkins-bot: [V: 04-1] ircecho: split server parameter into FQDN and port, add data types [puppet] - 10https://gerrit.wikimedia.org/r/623018 (owner: 10Dzahn) [16:52:47] RECOVERY - Rate of JVM GC Old generation-s runs - logstash1010-production-logstash-eqiad on logstash1010 is OK: (C)100 gt (W)80 gt 78.31 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1010&panelId=37 [16:54:37] (03PS2) 10Dzahn: ircecho: split server var into FQDN and port, data types, hiera->lookup [puppet] - 10https://gerrit.wikimedia.org/r/623018 [16:56:35] (03PS1) 10Dzahn: site: remove mw2196.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/623022 (https://phabricator.wikimedia.org/T260654) [17:03:05] (03PS1) 10Dzahn: OTRS: add data types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/623023 [17:04:04] (03CR) 10jerkins-bot: [V: 04-1] OTRS: add data types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/623023 (owner: 10Dzahn) [17:05:38] (03PS1) 10Dzahn: wikimania_scholarships: add data types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/623024 [17:15:33] (03PS1) 10Dzahn: base: remove override and conditionals for rasdaemon install [puppet] - 10https://gerrit.wikimedia.org/r/623027 (https://phabricator.wikimedia.org/T205396) [17:15:47] (03CR) 10DannyS712: [C: 03+1] Install OAuthRateLimiter extension I: Add i18n [mediawiki-config] 
- 10https://gerrit.wikimedia.org/r/622896 (https://phabricator.wikimedia.org/T258423) (owner: 10Ppchelko) [17:15:58] (03CR) 10DannyS712: [C: 03+1] Install OAuthRateLimiter extension II: Add flag to IS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622897 (https://phabricator.wikimedia.org/T258423) (owner: 10Ppchelko) [17:16:05] (03CR) 10Jdlrobson: [C: 03+1] Enable sitenotice on mobile for closed wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622777 (https://phabricator.wikimedia.org/T261357) (owner: 10Zoranzoki21) [17:17:30] (03PS2) 10Hnowlan: api-gateway: Collect metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/623012 (https://phabricator.wikimedia.org/T254910) [17:17:51] (03CR) 10DannyS712: Install OAuthRateLimiter III: Install where enabled (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622898 (https://phabricator.wikimedia.org/T258423) (owner: 10Ppchelko) [17:18:19] (03CR) 10DannyS712: Install OAuthRateLimiter extension IV: Enable on labs (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622899 (owner: 10Ppchelko) [17:37:54] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission [17:37:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:39:08] !log shutting down mw2196 [17:39:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:39:21] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [17:39:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:39:28] 10Operations, 10serviceops, 10Patch-For-Review: Decommission mw2135-mw2147, mw2187-mw2199, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw2196.codfw.wmnet` - mw2196.codfw.wmnet (*... 
[17:39:48] (03CR) 10Dzahn: [C: 03+2] site: remove mw2196.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/623022 (https://phabricator.wikimedia.org/T260654) (owner: 10Dzahn) [17:41:34] 10Operations, 10serviceops, 10Patch-For-Review: Decommission mw2135-mw2147, mw2187-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (10Dzahn) [17:43:01] 10Operations, 10Icinga: incident 20170323-wikibase did not trigger Icinga paging - https://phabricator.wikimedia.org/T161528 (10Dzahn) 05Open→03Declined [17:43:55] 10Operations, 10serviceops, 10Patch-For-Review: Decommission mw2135-mw2147, mw2187-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (10Dzahn) [17:43:57] 10Operations: FY2020-2021 Q1 eqiad -> codfw switchover - https://phabricator.wikimedia.org/T243316 (10Dzahn) [17:45:37] 10Operations, 10Wikidata, 10Wikidata-Termbox, 10serviceops, 10User-Addshore: Plan to scale up termbox service to be able to render the termbox for desktop pageviews - https://phabricator.wikimedia.org/T261486 (10jijiki) p:05Triage→03Medium [17:45:41] 10Operations, 10serviceops, 10Patch-For-Review: Decommission mw2135-mw2147, mw2187-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (10Dzahn) [17:45:43] 10Operations, 10serviceops: move all 86 new codfw appservers into production (mw2[291-2377].codfw.wmnet) - https://phabricator.wikimedia.org/T247021 (10Dzahn) [17:49:00] (03PS1) 10Dzahn: remove mw2135-mw2147 and mw2187-mw2214 [dns] - 10https://gerrit.wikimedia.org/r/623031 (https://phabricator.wikimedia.org/T260654) [17:51:04] (03PS2) 10Dzahn: remove mw2135-mw2147 and mw2187-mw2214 [dns] - 10https://gerrit.wikimedia.org/r/623031 (https://phabricator.wikimedia.org/T260654) [18:03:35] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10Dzahn) Isn't this already the case? 
I mean we just recently had to add wikilovesmonuments to the regex of allowed domains to let it use m... [18:06:12] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10Dzahn) It seems like this statement: > only support requests from the Wikimedia cluster is in conflict with this statement: > Cloud S... [18:08:09] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10JMinor) @AntiCompositeNumber our intention was to create a clear, maintainable, line around what domains would be supported. We'd really l... [18:08:19] (03CR) 10Krinkle: [WIP] maps: block 3rd parties with 403, even hits (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/570156 (https://phabricator.wikimedia.org/T244278) (owner: 10BBlack) [18:09:38] (03CR) 10Bstorm: "PCC for some physical machines https://puppet-compiler.wmflabs.org/compiler1003/24781/" [puppet] - 10https://gerrit.wikimedia.org/r/622874 (https://phabricator.wikimedia.org/T218423) (owner: 10Bstorm) [18:10:15] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10CDanis) My two cents: I suspect it's merely the case that the language in the announcement was somewhat imprecise. That's understandable... [18:10:30] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10JMinor) > " from the Wikimedia cluster" is a naive Product person's use of that term. I apologize if that imprecision is causing misunders... 
[18:16:27] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10BBlack) Based on the email to maps-l linked at the top, it's not clear to me that cases like wikilovesmonuments and wikimedia.pl were mean... [18:18:34] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/24785/miscweb1002.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/623024 (owner: 10Dzahn) [18:21:53] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10BBlack) //[Removed my earlier comment because I had failed to refresh this ticket and catch up on all the other recent traffic, which chan... [18:24:37] (03PS1) 10Andrew Bogott: Openstack Nova compute: hack around a bug with threading on Buster [puppet] - 10https://gerrit.wikimedia.org/r/623033 (https://phabricator.wikimedia.org/T261463) [18:25:03] (03CR) 10jerkins-bot: [V: 04-1] Openstack Nova compute: hack around a bug with threading on Buster [puppet] - 10https://gerrit.wikimedia.org/r/623033 (https://phabricator.wikimedia.org/T261463) (owner: 10Andrew Bogott) [18:31:18] (03PS2) 10Andrew Bogott: Openstack Nova compute: hack around a bug with threading on Buster [puppet] - 10https://gerrit.wikimedia.org/r/623033 (https://phabricator.wikimedia.org/T261463) [18:33:11] (03CR) 10Andrew Bogott: [C: 03+2] Openstack Nova compute: hack around a bug with threading on Buster [puppet] - 10https://gerrit.wikimedia.org/r/623033 (https://phabricator.wikimedia.org/T261463) (owner: 10Andrew Bogott) [18:35:53] (03CR) 10Bstorm: [C: 03+2] cloud-vps: Add python3 client packages in cloud [puppet] - 10https://gerrit.wikimedia.org/r/622874 (https://phabricator.wikimedia.org/T218423) (owner: 10Bstorm) [18:37:15] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving 
to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10JMinor) > "So if "limit maps serving to Wikimedia hosted sites only" wasn't already the status quo then why would we have to add to this... [18:43:21] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10CDanis) @JMinor @Elitre Although some of this is already mentioned upwards in this task, here's a summary of the community objections I'm... [18:46:43] (03PS1) 10Dzahn: wikimania_scholarships: split udp2log host into FQDN and port, alignments [puppet] - 10https://gerrit.wikimedia.org/r/623036 [19:03:22] (03CR) 10Papaul: [V: 03+1] remove mw2135-mw2147 and mw2187-mw2214 [dns] - 10https://gerrit.wikimedia.org/r/623031 (https://phabricator.wikimedia.org/T260654) (owner: 10Dzahn) [19:04:17] (03CR) 10Dzahn: [C: 03+2] remove mw2135-mw2147 and mw2187-mw2214 [dns] - 10https://gerrit.wikimedia.org/r/623031 (https://phabricator.wikimedia.org/T260654) (owner: 10Dzahn) [19:11:53] !log rebooting cloudvirt1006. 
It's a spare, unused system but showing a bus error and icinga alerts; not worth saving if it needs saving [19:11:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:17:19] 10Operations, 10ops-codfw, 10serviceops: decommission mw2135-mw2147, mw2187-mw2214 - physical / datacenter part - https://phabricator.wikimedia.org/T261524 (10Dzahn) [19:19:54] (03PS1) 10Bstorm: cloudcumin: remove openstack packages that are already now in all vms [puppet] - 10https://gerrit.wikimedia.org/r/623047 (https://phabricator.wikimedia.org/T218423) [19:21:45] 10Operations, 10serviceops, 10Patch-For-Review: Decommission mw2135-mw2147, mw2187-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (10Dzahn) 05Open→03Resolved [19:21:47] 10Operations, 10serviceops: move all 86 new codfw appservers into production (mw2[291-2377].codfw.wmnet) - https://phabricator.wikimedia.org/T247021 (10Dzahn) [19:21:49] 10Operations: FY2020-2021 Q1 eqiad -> codfw switchover - https://phabricator.wikimedia.org/T243316 (10Dzahn) [19:22:11] 10Operations, 10serviceops, 10Patch-For-Review: Decommission mw2135-mw2147, mw2187-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (10Dzahn) All steps that need to be done by us as the service owner are done. 40 hosts removed from repos, shutdown and wiped bootloader by decom cookboo... [19:22:25] (03CR) 10Bstorm: "Git grep says that this is the only place that doesn't use "require_package" that mentions python3_keystoneauth1" [puppet] - 10https://gerrit.wikimedia.org/r/623047 (https://phabricator.wikimedia.org/T218423) (owner: 10Bstorm) [19:24:49] (03CR) 10Bstorm: [C: 03+2] cloudcumin: remove openstack packages that are already now in all vms [puppet] - 10https://gerrit.wikimedia.org/r/623047 (https://phabricator.wikimedia.org/T218423) (owner: 10Bstorm) [19:26:05] (03CR) 10RhinosF1: "Is this still needed?" 
[dns] - 10https://gerrit.wikimedia.org/r/621786 (https://phabricator.wikimedia.org/T260654) (owner: 10Dzahn) [19:27:32] 10Operations, 10ops-codfw, 10serviceops: decommission mw2135-mw2147, mw2187-mw2214 - physical / datacenter part - https://phabricator.wikimedia.org/T261524 (10Dzahn) The shorter version to say this is: "all R420's in codfw that have an mw* name and are already in state 'decommissioning' in netbox' The tota... [19:29:53] (03CR) 10Dzahn: "oh, thanks! that is a duplicate indeed. I forgot about that. But it was also missing one server." [dns] - 10https://gerrit.wikimedia.org/r/621786 (https://phabricator.wikimedia.org/T260654) (owner: 10Dzahn) [19:30:17] (03Abandoned) 10Dzahn: decom mw2135 through mw2214 [dns] - 10https://gerrit.wikimedia.org/r/621786 (https://phabricator.wikimedia.org/T260654) (owner: 10Dzahn) [19:30:57] (03CR) 10RhinosF1: "Thanks for the quick check!" [dns] - 10https://gerrit.wikimedia.org/r/621786 (https://phabricator.wikimedia.org/T260654) (owner: 10Dzahn) [19:36:15] (03PS1) 10Bearloga: wgEventStreams: Stream for MEP-iOS pilot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/623048 (https://phabricator.wikimedia.org/T260382) [19:36:50] (03PS2) 10Dzahn: wikimania_scholarships: split udp2log host into FQDN and port, alignments [puppet] - 10https://gerrit.wikimedia.org/r/623036 [19:54:19] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/24789/miscweb1002.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/623036 (owner: 10Dzahn) [19:54:22] (03CR) 10Dzahn: [C: 03+2] wikimania_scholarships: split udp2log host into FQDN and port, alignments [puppet] - 10https://gerrit.wikimedia.org/r/623036 (owner: 10Dzahn) [19:55:13] (03CR) 10Bearloga: "Scheduled for Monday's (Aug 31st) morning backport window on https://wikitech.wikimedia.org/wiki/Deployments" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/623048 (https://phabricator.wikimedia.org/T260382) (owner: 10Bearloga) [20:05:57] (03PS2) 
10Dzahn: OTRS: add data types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/623023 [20:20:33] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10Multichill) I noticed the announcement on the maps-l list and I also noticed https://twitter.com/krmaher/status/1299203640188690434 where... [20:43:47] 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10User-Smalyshev, 10cloud-services-team (Kanban): Provide a way to have test servers on real hardware, isolated from production for Wikidata Query Service - https://phabricator.wikimedia.org/T206636 (10Andrew) I would like to delete the flavors named t2... [20:55:16] (03PS3) 10Dzahn: OTRS: add data types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/623023 [21:06:44] (03PS1) 10Andrew Bogott: wmcs admin scripts: added wmcs-flavorusage [puppet] - 10https://gerrit.wikimedia.org/r/623062 [21:09:12] (03CR) 10Andrew Bogott: [C: 03+2] wmcs admin scripts: added wmcs-flavorusage [puppet] - 10https://gerrit.wikimedia.org/r/623062 (owner: 10Andrew Bogott) [21:19:18] (03PS4) 10Dzahn: OTRS: add data types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/623023 [21:32:21] (03PS5) 10Dzahn: OTRS: add data types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/623023 [21:49:33] (03PS6) 10Dzahn: OTRS: add data types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/623023 [21:51:47] 10Operations, 10DNS, 10Traffic: Configure subdomain foundation.wikimedia.org to enable *:foundation.wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T261531 (10Krinkle) [21:53:13] !log `sudo systemctl reload nginx.service` on `cloudelastic100[5,6].wikimedia.org` to try to resolve certificate warning issues [21:53:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:54:30] (03PS7) 10Dzahn: OTRS: add data types to parameters [puppet] - 
10https://gerrit.wikimedia.org/r/623023 [21:56:53] ryankemper: you can go to Icinga web UI, select the services and run command "reschedule next service" check and click ok. then Icinga should let you know within seconds instead of minutes or hours [21:57:08] if you have the privileges for it ..but if not we can fix that [21:57:40] (03PS1) 10Dave Pifke: [WIP] arclamp: serve SVGs from Swift [puppet] - 10https://gerrit.wikimedia.org/r/623068 (https://phabricator.wikimedia.org/T244776) [21:57:45] mutante: I do have privileges for it. I went to do exactly that a couple minutes ago but the alerts were already resolved [21:57:52] Did you happen to push the button? [21:58:18] ryankemper: no, wasn't me this time [21:58:25] so it is OK and the reload fixed it? [21:58:31] or it was already OK before the reload [21:59:08] if that was a CRIT and not just a WARN we would see it on IRC, but IRC does not get the warnings, btw [21:59:44] you can also click to see history of a check and timestamp of last change [22:01:20] mutante: I didn't check icinga right before doing the reload, but the reload had to have been what fixed it [22:01:43] I'll see if I can find the alert history...the fact that icinga is a SPA (or something similar) does confuse me so we'll see how that goes [22:04:02] ryankemper: oooh.. SPA as in "single page app" you mean? i had to check [22:04:11] you know that's the single sign on tool's fault [22:04:21] or Icinga's because it's using framesets [22:04:47] try to go manually to https://icinga.wikimedia.org/icinga/ [22:04:49] Yup that's what I meant, but did discover it must be a frame thing cause when I hover over a link I see a different URL but the URL for my actual tab never changes [22:04:52] 10Operations, 10InternetArchiveBot, 10Traffic: Support TLSv1.3 in IABot - https://phabricator.wikimedia.org/T251414 (10Cyberpower678) 05Open→03Declined This is not something I believe I have control over. 
[22:04:58] despite the "Landing page" that idp.wm.org sends you to [22:05:15] Anyway I found https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cloudelastic1006&service=Elasticsearch+HTTPS+for+cloudelastic-chi-eqiad [22:05:18] this is happening since Icinga switched to the single-sign-on solution [22:06:37] Not sure where to find check frequency but just looking at next scheduled vs last check time, it looks like it might check every 60s [22:06:45] ryankemper: I recommend .. go to the Search form on the left and type the start of a server name, like "elastic" and hit enter. Then break out of the frameset by opening the part on the right in a new tab. that gets you an URL like https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=elastic [22:07:03] so you can also open that right away status.cgi?search_string=foo [22:07:22] now you have a single page with all the related hosts and services that you can scroll down [22:07:27] and see anything not green [22:07:29] thanks, that will save a lot of time [22:10:00] so if you search for "cert" in that page .. or "SSL" we get the "SSL OK - Certificate search.svc.eqiad.wmnet" but that's not the right ones. next to them you see they have not changed since 380d [22:11:03] but under cloudelastic1005 .. 
there are the SSL OK - Certificate cloudelastic.wikimedia.org and they have changed to OK 15 to 16 min ago [22:13:50] I actually don't see cloudelastic searching `cert` nor `SSL` [22:14:17] (But as I did earlier, going to the host manually I was able to see the ssl service checks) [22:14:21] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=elastic&scroll=9353 [22:14:53] Oh I misunderstood [22:15:01] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=cloudelastic [22:15:06] Thought you meant search icinga for cert or ssl, not search within the page having searched elastic [22:15:09] this is better for cloudelastic [22:15:11] got it [22:15:59] so about the reload of the service by puppet. first step is to find where the nginx gets installed in puppet [22:16:09] i see it uses profile::elasticsearch::cirrus [22:16:42] but not immediately where nginx gets included [22:18:00] then you'll want something like 'notify' to make it reload when a file changes https://www.puppetcookbook.com/posts/restart-a-service-when-a-file-changes.html [22:18:39] or use our systemd::service and its restart parameter [22:21:55] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [22:23:47] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [22:25:06] and "acme_chief::cert" has a parameter for 'puppet_svc' that many places use but maybe not cloudelastic [22:29:07] (03PS2) 10Dave Pifke: [WIP] arclamp: serve SVGs from Swift [puppet] - 10https://gerrit.wikimedia.org/r/623068 (https://phabricator.wikimedia.org/T244776) [22:31:25] (03PS3) 10Dave Pifke: [WIP] arclamp: serve SVGs from Swift [puppet] - 10https://gerrit.wikimedia.org/r/623068 (https://phabricator.wikimedia.org/T244776) [22:34:46] (03CR) 10Dave Pifke: "This works as expected in beta. pcc output: https://puppet-compiler.wmflabs.org/compiler1002/24800/" [puppet] - 10https://gerrit.wikimedia.org/r/622904 (https://phabricator.wikimedia.org/T244776) (owner: 10Dave Pifke) [22:38:38] 10Operations, 10DNS, 10Traffic: Configure subdomain foundation.wikimedia.org to enable *:foundation.wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T261531 (10Krinkle) Based on the [Matrix docs](https://github.com/matrix-org/synapse/blob/v1.19.1/docs/delegate.md#srv-dns-record-delegation) a... [22:46:44] (03PS4) 10Dave Pifke: [WIP] arclamp: serve SVGs from Swift [puppet] - 10https://gerrit.wikimedia.org/r/623068 (https://phabricator.wikimedia.org/T244776) [22:54:47] (03PS8) 10Dzahn: OTRS: add data types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/623023 [23:00:22] (03CR) 10Dzahn: [V: 03+1 C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/24802/mendelevium.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/623023 (owner: 10Dzahn) [23:05:26] (03PS2) 10Dzahn: icinga::ircbot:: add data types and hiera()->lookup() [puppet] - 10https://gerrit.wikimedia.org/r/623017 [23:06:17] (03CR) 10Dave Pifke: "No worries. I'll try to sort out the statsd stuff between now and then." 
[puppet] - 10https://gerrit.wikimedia.org/r/601429 (https://phabricator.wikimedia.org/T229584) (owner: 10Dave Pifke) [23:07:39] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/24804/" [puppet] - 10https://gerrit.wikimedia.org/r/623017 (owner: 10Dzahn) [23:08:56] 10Operations, 10DNS, 10Traffic: Configure subdomain foundation.wikimedia.org to enable *:foundation.wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T261531 (10bcampbell) I heard back from the vendor regarding DNS and the rep said "I have not found the DNS way for delegation in our internal... [23:12:21] (03CR) 10Dzahn: "complete noop on icinga1001/icinga2001/alert1001 - bot is happy" [puppet] - 10https://gerrit.wikimedia.org/r/623017 (owner: 10Dzahn) [23:19:43] (03PS2) 10Dzahn: microsites/misc_apps: add data types for host names [puppet] - 10https://gerrit.wikimedia.org/r/623016 [23:20:48] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/24805/" [puppet] - 10https://gerrit.wikimedia.org/r/623016 (owner: 10Dzahn) [23:21:34] (03PS3) 10Dzahn: microsites/misc_apps: add data types for host names [puppet] - 10https://gerrit.wikimedia.org/r/623016 [23:26:02] (03CR) 10Dzahn: "noop on miscweb1002 as expected per compiler" [puppet] - 10https://gerrit.wikimedia.org/r/623016 (owner: 10Dzahn) [23:27:48] (03CR) 10Dzahn: "> Patch Set 1: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/621364 (owner: 10Dzahn) [23:30:20] (03CR) 10Dzahn: [V: 03+1 C: 03+1] "also checked the instance names in deployment-prep. nothing found. I'll still merge it after the weekend though." [puppet] - 10https://gerrit.wikimedia.org/r/621364 (owner: 10Dzahn) [23:37:46] (03PS1) 10Dzahn: mediawiki::deployment::server: add data types, hiera->lookup [puppet] - 10https://gerrit.wikimedia.org/r/623076 [23:42:42] 10Operations, 10InternetArchiveBot, 10Traffic: Support TLSv1.3 in IABot - https://phabricator.wikimedia.org/T251414 (10Krenair) >>! 
In T251414#6420161, @Cyberpower678 wrote: > This is not something I believe I have control over. Could you be more specific? What challenges do you see in implementing this? [23:47:56] (03PS1) 10Dzahn: k8s::deployment_server: hiera() -> lookup() [puppet] - 10https://gerrit.wikimedia.org/r/623077