[00:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for Evening SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200110T0000).
[00:00:04] RoanKattouw: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:02:17] I'll do my own SWAT
[00:14:36] (CR) Dzahn: "Hi, this is very nice. Is this for running in wmcs or prod or both?" [puppet] - https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[00:16:43] (CR) Dzahn: Initial puppetization of codesearch (1 comment) [puppet] - https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[00:33:03] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:35:09] !log restart prometheus on prometheus2004, enabling debug log
[00:35:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:42:27] (PS3) Dzahn: Initial puppetization of codesearch [puppet] - https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[00:43:59] PROBLEM - Prometheus prometheus2004/ops restarted: beware possible monitoring artifacts on prometheus2004 is CRITICAL: instance=127.0.0.1:9900 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=codfw+prometheus/ops
[00:44:25] (CR) Dzahn: Initial puppetization of codesearch (4 comments) [puppet] - https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[00:44:27] (CR) jerkins-bot: [V: -1] Initial puppetization of codesearch [puppet] - https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[00:45:08] !log catrope@deploy1001 Synchronized php-1.35.0-wmf.14/extensions/GrowthExperiments/: Expose tasktype/topic API parameter info (T240512) (duration: 01m 01s)
[00:45:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:45:11] T240512: Newcomer tasks: Morelike backend for topic matching - https://phabricator.wikimedia.org/T240512
[00:48:34] (PS4) Dzahn: Initial puppetization of codesearch [puppet] - https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[00:49:38] (CR) Dzahn: "I added role and profile class (the role should then be applied to an instance), moved the parameter to Hiera, moved the systemd unit file" [puppet] - https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[00:50:15] (PS1) Dave Pifke: Add
forward DNS for WebPageRelay hosts [dns] - https://gerrit.wikimedia.org/r/563320 (https://phabricator.wikimedia.org/T242398)
[00:50:17] (CR) Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [dns] - https://gerrit.wikimedia.org/r/563320 (https://phabricator.wikimedia.org/T242398) (owner: Dave Pifke)
[00:50:41] (CR) jerkins-bot: [V: -1] Initial puppetization of codesearch [puppet] - https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[01:05:37] RECOVERY - Prometheus prometheus2004/ops restarted: beware possible monitoring artifacts on prometheus2004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=codfw+prometheus/ops
[01:13:43] (PS5) Dzahn: Initial puppetization of codesearch [puppet] - https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[01:22:09] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:27:10] (CR) Legoktm: "> Patch Set 2:" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[01:27:34] mutante: thank you for helping fix it :)
[01:30:16] (PS1) Papaul: DNS: Add mgmt and production DNS for es202[0-5] [dns] - https://gerrit.wikimedia.org/r/563323
[01:30:40] (CR) jerkins-bot: [V: -1] DNS: Add mgmt and production DNS for es202[0-5] [dns] - https://gerrit.wikimedia.org/r/563323 (owner: Papaul)
[01:34:29] PROBLEM - Prometheus prometheus2004/ops restarted: beware possible monitoring artifacts on prometheus2004 is CRITICAL: instance=127.0.0.1:9900 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=codfw+prometheus/ops
[01:36:58] (PS2) Papaul: DNS: Add mgmt and production DNS for es202[0-5] [dns] - https://gerrit.wikimedia.org/r/563323
[01:56:05] RECOVERY - Prometheus prometheus2004/ops restarted: beware possible monitoring artifacts on prometheus2004 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=codfw+prometheus/ops
[02:11:24] !log jhuneidi@deploy1001 Pruned MediaWiki: 1.35.0-wmf.10 (duration: 04m 13s)
[02:11:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:37:46] (CR) CDanis: [C: +2] Add forward DNS for WebPageRelay hosts [dns] - https://gerrit.wikimedia.org/r/563320 (https://phabricator.wikimedia.org/T242398) (owner: Dave Pifke)
[02:41:33] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:41:59] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 81, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:01:37] Operations, DNS, Performance-Team, Traffic, and 2 others: Add DNS for WebPageRelay hosts - https://phabricator.wikimedia.org/T242398 (Krinkle)
[03:22:17] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:24:03] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:24:33] Operations, Phabricator, Traffic, serviceops, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (mmodell) @Dzahn perhaps we should try just re-enabling aphlict as i...
[04:30:38] Operations, SRE-Access-Requests: Replace SSH key for cchen - https://phabricator.wikimedia.org/T242407 (cchen)
[05:11:38] Operations, Phabricator, Traffic, serviceops, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (mmodell) So it's worth pointing out that there are two types of con...
[05:30:00] Operations, ops-codfw, DBA, Patch-For-Review: (Needed By 31st January) codfw: rack/setup/install es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (Papaul) es2021 has some issues I look into it later Critical Thu 09 Jan 2020 23:24:33 System BIOS has halted. Normal Thu 09 Jan 20...
[05:57:51] Operations, ops-codfw: (No Need By Date Provided) codfw: rack/setup/install puppetmaster2003.codfw.wmnet - https://phabricator.wikimedia.org/T239732 (Papaul)
[06:00:07] Operations, ops-codfw: (No Need By Date Provided) codfw: rack/setup/install puppetmaster2003.codfw.wmnet - https://phabricator.wikimedia.org/T239732 (Papaul)
[06:18:48] (PS3) Marostegui: DNS: Add mgmt and production DNS for es202[0-5] [dns] - https://gerrit.wikimedia.org/r/563323 (owner: Papaul)
[06:20:21] (CR) Marostegui: [C: +2] DNS: Add mgmt and production DNS for es202[0-5] [dns] - https://gerrit.wikimedia.org/r/563323 (owner: Papaul)
[06:21:05] Operations, ops-codfw, DBA, Patch-For-Review: (Needed By 31st January) codfw: rack/setup/install es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (Marostegui)
[06:22:03] Operations, ops-eqiad, DBA: Degraded RAID on db1100 - https://phabricator.wikimedia.org/T241506 (Marostegui) >>! In T241506#5791183, @Jclark-ctr wrote: > @Marostegui Drive has arrives Please PM me on IRC so we can get this swapped I have messaged you, 8AM EST is a bit late for me, so let's schedul...
[06:22:33] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 133, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:23:47] PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:37:01] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 135, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:38:11] RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:40:59] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:41:43] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 83, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:43:53] PROBLEM - snapshot of x1 in eqiad on db1115 is CRITICAL: snapshot for x1 at eqiad taken more than 4 days ago: Most recent backup 2020-01-06 06:12:55 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[07:24:37] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:47:12] (CR) Ema: [C: +1] lvs: Add ncredir esams configuration [puppet] - https://gerrit.wikimedia.org/r/563214 (https://phabricator.wikimedia.org/T242321) (owner: Vgutierrez)
[08:10:05] (CR) Muehlenhoff:
[C: +1] "I haven't tested this, but patch-wise looks good to me" [puppet] - https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941) (owner: Jbond)
[08:13:22] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941) (owner: Jbond)
[08:15:19] (CR) Muehlenhoff: "This only affects apt::pin which isn't used unless a non-standard apt priority is used. Plus, the tor role must not be used in Cloud VPS t" [puppet] - https://gerrit.wikimedia.org/r/563208 (owner: Muehlenhoff)
[08:17:38] (PS2) Muehlenhoff: Deprecate raid1-lvm-ext4-srv-dualboot.cfg [puppet] - https://gerrit.wikimedia.org/r/561852 (https://phabricator.wikimedia.org/T156955)
[08:24:35] !log cp3062: varnish-frontend-restart to clear things up after child crash the past days
[08:24:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:16] (PS2) Effie Mouzeli: mtail: Remove hhvm handler [puppet] - https://gerrit.wikimedia.org/r/563206
[08:25:42] (CR) Muehlenhoff: [C: +2] Deprecate raid1-lvm-ext4-srv-dualboot.cfg [puppet] - https://gerrit.wikimedia.org/r/561852 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[08:26:31] RECOVERY - Varnish frontend child restarted on cp3062 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp3062&var-datasource=esams+prometheus/ops
[08:27:18] (CR) jerkins-bot: [V: -1] mtail: Remove hhvm handler [puppet] - https://gerrit.wikimedia.org/r/563206 (owner: Effie Mouzeli)
[08:28:46] (PS3) Effie Mouzeli: mtail: Remove hhvm handler [puppet] - https://gerrit.wikimedia.org/r/563206
[08:31:12] (PS1) Muehlenhoff: Switch ORES to standard partman recipes [puppet] - https://gerrit.wikimedia.org/r/563374 (https://phabricator.wikimedia.org/T156955)
[08:31:25] (CR)
Giuseppe Lavagetto: [C: +1] mtail: Remove hhvm handler [puppet] - https://gerrit.wikimedia.org/r/563206 (owner: Effie Mouzeli)
[08:46:20] (CR) Vgutierrez: [C: +2] lvs: Add ncredir esams configuration [puppet] - https://gerrit.wikimedia.org/r/563214 (https://phabricator.wikimedia.org/T242321) (owner: Vgutierrez)
[08:48:50] !log vgutierrez@puppetmaster1001 conftool action : set/pooled=yes; selector: service=nginx,name=ncredir3001.esams.wmnet
[08:48:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:48:55] !log vgutierrez@puppetmaster1001 conftool action : set/pooled=yes; selector: service=nginx,name=ncredir3002.esams.wmnet
[08:48:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:17] (PS1) Elukey: Add role to mc-gp200[1-3] [puppet] - https://gerrit.wikimedia.org/r/563381 (https://phabricator.wikimedia.org/T239249)
[08:51:43] !log restarting pybal on lvs3007 - T242321
[08:51:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:46] T242321: Provide non-canonical-redirect service from every datacenter - https://phabricator.wikimedia.org/T242321
[08:53:25] (CR) jerkins-bot: [V: -1] Add role to mc-gp200[1-3] [puppet] - https://gerrit.wikimedia.org/r/563381 (https://phabricator.wikimedia.org/T239249) (owner: Elukey)
[08:53:45] (CR) Muehlenhoff: "David, please create a separate SSH key for production (to replace this one added here).
We have two administrative domains to log into (p" [puppet] - https://gerrit.wikimedia.org/r/562947 (https://phabricator.wikimedia.org/T242189) (owner: Dzahn)
[08:54:14] (CR) Filippo Giunchedi: [C: +1] Switch ORES to standard partman recipes [puppet] - https://gerrit.wikimedia.org/r/563374 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[08:55:22] !log restarting pybal on lvs3005 (high-traffic1) - T242321
[08:55:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:40] (CR) Elukey: "> Main test build failed." [puppet] - https://gerrit.wikimedia.org/r/563381 (https://phabricator.wikimedia.org/T239249) (owner: Elukey)
[08:59:22] !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db2088:3312', diff saved to https://phabricator.wikimedia.org/P10112 and previous config saved to /var/cache/conftool/dbconfig/20200110-085921-marostegui.json
[08:59:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depoool db2091:3312', diff saved to https://phabricator.wikimedia.org/P10113 and previous config saved to /var/cache/conftool/dbconfig/20200110-090143-marostegui.json
[09:01:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:02:11] !log Remove revision partitions from db2091:3312
[09:02:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:19] (CR) Muehlenhoff: [C: +1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/563381 (https://phabricator.wikimedia.org/T239249) (owner: Elukey)
[09:07:48] (PS1) Vgutierrez: Pool esams for ncredir service [dns] - https://gerrit.wikimedia.org/r/563382 (https://phabricator.wikimedia.org/T242321)
[09:12:10] (PS4) Vgutierrez: ganeti: Add esams, ulsfo and eqsin clusters and rows [software/spicerack] - https://gerrit.wikimedia.org/r/563132
[09:16:43] (CR) jerkins-bot: [V: -1] ganeti:
Add esams, ulsfo and eqsin clusters and rows [software/spicerack] - https://gerrit.wikimedia.org/r/563132 (owner: Vgutierrez)
[09:26:50] Operations, Discovery, Traffic, Wikidata, and 2 others: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165 (Gehel) Thanks @BBlack, this now works as expected.
[09:28:53] Operations, Traffic, Wikidata, Wikidata-Query-Service, and 2 others: LDF service does not Vary responses by Accept, sending incorrect cached responses to clients - https://phabricator.wikimedia.org/T232006 (Gehel) a:Mstyles Investigation on this was blocked by T237165 (which is now resolved)....
[09:31:18] (CR) Muehlenhoff: [V: +2 C: +2] Bump CAS to 6.1.3 [software/cas-overlay-template] - https://gerrit.wikimedia.org/r/563140 (owner: Muehlenhoff)
[09:31:20] (PS5) Vgutierrez: ganeti: Add esams, ulsfo and eqsin clusters and rows [software/spicerack] - https://gerrit.wikimedia.org/r/563132
[09:31:22] (PS1) Vgutierrez: my-py: Get rid of no longer needed # type: ignore annotations [software/spicerack] - https://gerrit.wikimedia.org/r/563392
[09:35:27] Operations, Traffic: varnish parent unable to send signals to child - https://phabricator.wikimedia.org/T242411 (ema)
[09:35:44] (PS2) Vgutierrez: mypy: Get rid of no longer needed # type: ignore annotations [software/spicerack] - https://gerrit.wikimedia.org/r/563392
[09:35:46] (PS6) Vgutierrez: ganeti: Add esams, ulsfo and eqsin clusters and rows [software/spicerack] - https://gerrit.wikimedia.org/r/563132
[09:36:06] (CR) Volans: [C: +2] "Thanks for the fix!"
[software/spicerack] - https://gerrit.wikimedia.org/r/563392 (owner: Vgutierrez)
[09:41:35] (CR) Volans: [C: +1] "LGTM, unfortunately we need to change the cookbook makevm too as we had a convention of group names 'row_$ROW_NAME' that doesn't fit the n" [software/spicerack] - https://gerrit.wikimedia.org/r/563132 (owner: Vgutierrez)
[09:41:38] (Merged) jenkins-bot: mypy: Get rid of no longer needed # type: ignore annotations [software/spicerack] - https://gerrit.wikimedia.org/r/563392 (owner: Vgutierrez)
[09:47:18] Operations, serviceops: Migrate debug proxies to Stretch/Buster - https://phabricator.wikimedia.org/T224567 (MoritzMuehlenhoff) a:MoritzMuehlenhoff @ema, @Vgutierrez, @BBlack With ATS handling the X-Wikimedia-Header and the ATS backend work completed these should be good to go. I've checked the NGIN...
[09:47:49] (CR) Vgutierrez: "> Patch Set 6: Code-Review+1" [software/spicerack] - https://gerrit.wikimedia.org/r/563132 (owner: Vgutierrez)
[09:52:20] (PS8) Alexandros Kosiaris: mathoid: Support canary functionality [deployment-charts] - https://gerrit.wikimedia.org/r/469662
[09:52:23] Operations, serviceops: Migrate debug proxies to Stretch/Buster - https://phabricator.wikimedia.org/T224567 (ema) >>! In T224567#5792079, @MoritzMuehlenhoff wrote: > Can you confirm there's no other further pending work/tests which would make debug proxies needed again? Then I'd drop them from our enviro...
[09:52:25] (PS1) Alexandros Kosiaris: mathoid: Rely on kubernetes 1.12 [deployment-charts] - https://gerrit.wikimedia.org/r/563397
[10:05:34] (PS1) Vgutierrez: Add ncredir400[12] DNS records [dns] - https://gerrit.wikimedia.org/r/563401 (https://phabricator.wikimedia.org/T242321)
[10:10:08] (PS2) Vgutierrez: Add ncredir400[12] DNS records [dns] - https://gerrit.wikimedia.org/r/563401 (https://phabricator.wikimedia.org/T242321)
[10:16:39] Puppet: Peculiar puppet agent error for apt::pin change - https://phabricator.wikimedia.org/T242383 (jbond) What server was this so i can attempt to recreate?
[10:17:38] (CR) Jbond: [C: +2] etcd: add parameter type checking and clean up [puppet] - https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941) (owner: Jbond)
[10:17:51] (PS9) Jbond: etcd: add parameter type checking and clean up [puppet] - https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941)
[10:18:10] (PS8) Jbond: etcd: enable ssl validation [puppet] - https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941)
[10:18:23] (PS9) Jbond: etcd: add cert parameter to enable client auth [puppet] - https://gerrit.wikimedia.org/r/561818 (https://phabricator.wikimedia.org/T240941)
[10:18:30] (PS5) Jbond: etcd: remove username/password [puppet] - https://gerrit.wikimedia.org/r/561819 (https://phabricator.wikimedia.org/T240941)
[10:21:20] !log rename Ganeti group for eqsin from "default" to "row_1" T228099
[10:21:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:23] T228099: rack/setup/install ganeti500[123].eqsin.wmnet - https://phabricator.wikimedia.org/T228099
[10:24:20] !log rename Ganeti group for esams from "default" to "row_OE" T236216
[10:24:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:24:29] T236216: rack/setup/install ganeti300[123] -
https://phabricator.wikimedia.org/T236216
[10:25:10] (PS9) Alexandros Kosiaris: mathoid: Support canary functionality [deployment-charts] - https://gerrit.wikimedia.org/r/469662
[10:25:50] (PS7) Vgutierrez: ganeti: Add esams, ulsfo and eqsin clusters and rows [software/spicerack] - https://gerrit.wikimedia.org/r/563132
[10:29:38] (PS1) Jbond: apt:::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/563408
[10:32:11] (PS1) Alexandros Kosiaris: mathoid: Test canary functionality in codfw [deployment-charts] - https://gerrit.wikimedia.org/r/563409
[10:32:47] (PS1) Tarrow: Termbox chart 0.0.4 [deployment-charts] - https://gerrit.wikimedia.org/r/563410
[10:34:07] (CR) Alexandros Kosiaris: [C: +2] mathoid: Rely on kubernetes 1.12 [deployment-charts] - https://gerrit.wikimedia.org/r/563397 (owner: Alexandros Kosiaris)
[10:34:16] (CR) Alexandros Kosiaris: [C: +2] "Tested in minikube, seems to work fine" [deployment-charts] - https://gerrit.wikimedia.org/r/563397 (owner: Alexandros Kosiaris)
[10:34:20] (PS2) Jbond: apt:::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/563408
[10:34:31] (Merged) jenkins-bot: mathoid: Rely on kubernetes 1.12 [deployment-charts] - https://gerrit.wikimedia.org/r/563397 (owner: Alexandros Kosiaris)
[10:35:12] (PS1) Tarrow: Docs: Add information on updating a chart [deployment-charts] - https://gerrit.wikimedia.org/r/563412
[10:35:33] (CR) Alexandros Kosiaris: "Tested with 2 different canary releases in minikube, works fine.
Will be tested in codfw as well" [deployment-charts] - https://gerrit.wikimedia.org/r/469662 (owner: Alexandros Kosiaris)
[10:36:35] (CR) Alexandros Kosiaris: [C: +2] mathoid: Support canary functionality [deployment-charts] - https://gerrit.wikimedia.org/r/469662 (owner: Alexandros Kosiaris)
[10:36:37] (CR) jerkins-bot: [V: -1] apt:::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/563408 (owner: Jbond)
[10:36:52] (Merged) jenkins-bot: mathoid: Support canary functionality [deployment-charts] - https://gerrit.wikimedia.org/r/469662 (owner: Alexandros Kosiaris)
[10:36:59] (CR) Tarrow: "This change is ready for review." (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/563412 (owner: Tarrow)
[10:37:10] (CR) Alexandros Kosiaris: [C: +2] mathoid: Test canary functionality in codfw [deployment-charts] - https://gerrit.wikimedia.org/r/563409 (owner: Alexandros Kosiaris)
[10:37:38] (Merged) jenkins-bot: mathoid: Test canary functionality in codfw [deployment-charts] - https://gerrit.wikimedia.org/r/563409 (owner: Alexandros Kosiaris)
[10:37:50] !log akosiaris@cumin1001 conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mathoid
[10:37:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:09] !log depool mathoid codfw in preparation for testing canary support in the mathoid helm chart
[10:38:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:20] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
[10:40:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:41:27] (PS2) Tarrow: Termbox chart 0.0.4 [deployment-charts] - https://gerrit.wikimedia.org/r/563410
[10:43:03] (PS3) Tarrow: Termbox chart 0.0.4 [deployment-charts] - https://gerrit.wikimedia.org/r/563410
[10:51:04] !log akosiaris@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
[10:51:04] !log akosiaris@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'canary' .
[10:51:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:00] (PS1) Elukey: oozie: add spark conf directory in oozie-site.xml [puppet] - https://gerrit.wikimedia.org/r/563414 (https://phabricator.wikimedia.org/T240934)
[10:52:11] (PS1) Jbond: root access: add my key to the root authorized_keys file on wmcs [labs/private] - https://gerrit.wikimedia.org/r/563415
[10:54:30] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1003/20304/" [puppet] - https://gerrit.wikimedia.org/r/563414 (https://phabricator.wikimedia.org/T240934) (owner: Elukey)
[10:56:26] !log akosiaris@cumin1001 conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mathoid
[10:56:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:39] !log repool mathoid codfw for testing canary support in the mathoid helm chart
[10:56:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:50] up to now everything looks good
[11:01:25] and traffic to canary indeed is in the ~6% range
[11:01:40] at least per grafana...
[11:03:12] (PS1) Ammarpad: Add `Tutoriel` namespace for French Wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/563418 (https://phabricator.wikimedia.org/T242102)
[11:03:37] I'll leave it like that for a while
[11:18:01] (CR) Andrew Bogott: [C: +1] "I have no objection but someone who isn't on vacation should verify identity &c." [labs/private] - https://gerrit.wikimedia.org/r/563415 (owner: Jbond)
[11:18:10] (PS3) Jbond: apt:::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/563408
[11:20:19] (CR) jerkins-bot: [V: -1] apt:::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/563408 (owner: Jbond)
[11:20:27] RECOVERY - snapshot of x1 in eqiad on db1115 is OK: snapshot for x1 at eqiad taken less than 4 days ago and larger than 90 GB: Last one 2020-01-10 10:36:29 from db1140.eqiad.wmnet:3320 (143 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[11:21:36] (PS4) Jbond: apt:::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/563408
[11:31:04] Operations, Traffic: varnish-fe crashes due to "Error in munmap(): Cannot allocate memory" - https://phabricator.wikimedia.org/T242417 (ema)
[11:31:10] Operations, Traffic: varnish-fe crashes due to "Error in munmap(): Cannot allocate memory" - https://phabricator.wikimedia.org/T242417 (ema) p:Triage→High
[11:33:20] Operations, Traffic: varnish-fe crashes due to "Error in munmap(): Cannot allocate memory" - https://phabricator.wikimedia.org/T242417 (ema)
[11:42:01] (PS5) Jbond: apt:::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/563408
[11:46:59] (PS6) Jbond: apt:::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/563408
[11:47:46] * Urbanecm stashes at mwdebug1001
[11:48:41] (PS7) Jbond: apt:::pin: allow callers to
override the notify resource [puppet] - https://gerrit.wikimedia.org/r/563408
[11:48:48] !log stop/mask nginx on hassium/hassaleh T224567
[11:48:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:48:52] T224567: Migrate debug proxies to Stretch/Buster - https://phabricator.wikimedia.org/T224567
[11:50:20] (PS8) Jbond: apt:::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/563408 (https://phabricator.wikimedia.org/T242383)
[11:51:43] Puppet, Patch-For-Review: Peculiar puppet agent error for apt::pin change - https://phabricator.wikimedia.org/T242383 (jbond) Looks like this is caused by [[ https://tickets.puppetlabs.com/browse/PUP-8067 | PUP-8067 ]], i have created a [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/563408 | new...
[11:52:04] * Urbanecm is done with stashing
[12:00:24] * Urbanecm again stashes at mwdebug1001
[12:02:13] (CR) Muehlenhoff: [C: +1] "Nice detective work!" [puppet] - https://gerrit.wikimedia.org/r/563408 (https://phabricator.wikimedia.org/T242383) (owner: Jbond)
[12:04:16] * Urbanecm done with stashing
[12:09:14] ACKNOWLEDGEMENT - Memory correctable errors -EDAC- on mw1239 is CRITICAL: 14 ge 4 Muehlenhoff This host is pending decommission along with mw1221-mw1258 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1239&var-datasource=eqiad+prometheus/ops
[12:19:12] (PS1) Filippo Giunchedi: varnish: format log consumer stdout as cee+json [puppet] - https://gerrit.wikimedia.org/r/563430 (https://phabricator.wikimedia.org/T227108)
[12:20:07] (CR) jerkins-bot: [V: -1] varnish: format log consumer stdout as cee+json [puppet] - https://gerrit.wikimedia.org/r/563430 (https://phabricator.wikimedia.org/T227108) (owner: Filippo Giunchedi)
[12:21:23] (PS2) Filippo Giunchedi: varnish: format log consumer stdout as
cee+json [puppet] - https://gerrit.wikimedia.org/r/563430 (https://phabricator.wikimedia.org/T227108)
[12:22:15] (CR) Filippo Giunchedi: "Sample output when running under systemd/journald:" [puppet] - https://gerrit.wikimedia.org/r/563430 (https://phabricator.wikimedia.org/T227108) (owner: Filippo Giunchedi)
[12:23:37] (PS3) Muehlenhoff: Switch cergen to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561814
[12:24:08] (CR) jerkins-bot: [V: -1] Switch cergen to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561814 (owner: Muehlenhoff)
[12:26:58] Operations, observability: Make status.wikimedia.org a "status entry page"? - https://phabricator.wikimedia.org/T242367 (Aklapper) This task is the very same request as T199816#4506872 which makes this a duplicate of T202061.
[12:27:08] Operations, observability: Make status.wikimedia.org a "status entry page"? - https://phabricator.wikimedia.org/T242367 (Aklapper)
[12:27:13] Operations, observability: Implement an accurate and easy to understand status page for all wikis - https://phabricator.wikimedia.org/T202061 (Aklapper)
[12:33:18] (CR) Muehlenhoff: "recheck" [puppet] - https://gerrit.wikimedia.org/r/561814 (owner: Muehlenhoff)
[12:35:11] (CR) Muehlenhoff: [C: +2] Switch cergen to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561814 (owner: Muehlenhoff)
[12:39:55] Operations, Phabricator, Release-Engineering-Team-TODO, Traffic, Release-Engineering-Team (Development services): Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (kostajh) Could we please get an update on the timeframe for this?
[12:40:03] 10Operations, 10Phabricator, 10Traffic, 10serviceops, 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (10akosiaris) > I really can't think of any reason for it to be using... [12:41:36] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/20307/" [puppet] - 10https://gerrit.wikimedia.org/r/562226 (owner: 10Muehlenhoff) [12:45:26] (03PS2) 10Muehlenhoff: spicerack: Switch to apt::package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/561871 [12:47:35] (03CR) 10Muehlenhoff: [C: 03+2] spicerack: Switch to apt::package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/561871 (owner: 10Muehlenhoff) [12:50:05] PROBLEM - Check systemd state on idp2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:51:55] RECOVERY - Check systemd state on idp2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:54:30] (03PS1) 10Muehlenhoff: Switch Thumbor to apt::package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/563432 [12:54:50] (03CR) 10Joal: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/563414 (https://phabricator.wikimedia.org/T240934) (owner: 10Elukey) [13:03:37] (03CR) 10Alexandros Kosiaris: [C: 04-1] "LGTM, but we will first need to create namespace on the clusters" [deployment-charts] - 10https://gerrit.wikimedia.org/r/563211 (https://phabricator.wikimedia.org/T233629) (owner: 10Ottomata) [13:06:02] (03CR) 10Alexandros Kosiaris: [C: 03+2] Docs: Add information on updating a chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/563412 (owner: 10Tarrow) [13:06:05] (03PS2) 10Alexandros Kosiaris: Docs: Add information on updating a chart [deployment-charts] - 
10https://gerrit.wikimedia.org/r/563412 (owner: 10Tarrow) [13:07:33] (03CR) 10Alexandros Kosiaris: [C: 03+2] "Hmm, got an error when delete the release" [deployment-charts] - 10https://gerrit.wikimedia.org/r/469662 (owner: 10Alexandros Kosiaris) [13:13:04] ACKNOWLEDGEMENT - Varnish frontend child restarted on cp3054 is CRITICAL: 2 ge 2 Ema Cause under investigation https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp3054&var-datasource=esams+prometheus/ops [13:16:43] (03PS1) 10Muehlenhoff: Fix condition for conditional application of apt::pin [puppet] - 10https://gerrit.wikimedia.org/r/563435 [13:20:44] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/20309/" [puppet] - 10https://gerrit.wikimedia.org/r/563435 (owner: 10Muehlenhoff) [13:26:33] (03PS1) 10Tarrow: Pin termbox chart versions at 0.0.3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/563438 [13:31:05] 10Operations, 10Citoid, 10SRE-Access-Requests: Revoke access Citoid/Zotero production servers for MVOLZ - https://phabricator.wikimedia.org/T242427 (10Mvolz) [13:32:56] (03CR) 10Tarrow: "This change is ready for review." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/563438 (owner: 10Tarrow) [13:33:04] 10Operations, 10Citoid, 10SRE-Access-Requests: Revoke access Citoid/Zotero production servers for MVOLZ - https://phabricator.wikimedia.org/T242427 (10Mvolz) 05Open→03Declined [13:33:07] 10Operations, 10Citoid, 10SRE-Access-Requests: Requesting access to Citoid/Zotero production servers for MVOLZ - https://phabricator.wikimedia.org/T213269 (10Mvolz) [13:43:45] (03CR) 10Jakob: [C: 03+1] Pin termbox chart versions at 0.0.3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/563438 (owner: 10Tarrow) [13:53:20] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1100 - https://phabricator.wikimedia.org/T241506 (10Jclark-ctr) slot appears to be 0 as discussed on irc we will change monday [13:55:39] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1100 - https://phabricator.wikimedia.org/T241506 (10Marostegui) @jclark-ctr I think I calculated wrongly the converstion UTC and EST, if you are around the DC now, please change the disk :-) [13:57:43] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1100 - https://phabricator.wikimedia.org/T241506 (10Marostegui) >>! 
In T241506#5792554, @Jclark-ctr wrote: > slot appears to be 0 as discussed on irc we will change monday Yep, looks like it from my side too [14:00:46] (03CR) 10Ayounsi: "Adding Mark as well for the administrative -2/+2" [homer/public] - 10https://gerrit.wikimedia.org/r/562698 (owner: 10Ayounsi) [14:14:49] (03PS17) 10Jbond: stunnel: add stunnel module and update rsync to use it [puppet] - 10https://gerrit.wikimedia.org/r/558133 [14:16:25] (03PS18) 10Jbond: stunnel: add stunnel module and update rsync to use it [puppet] - 10https://gerrit.wikimedia.org/r/558133 [14:17:51] (03PS19) 10Jbond: stunnel: add stunnel module and update rsync to use it [puppet] - 10https://gerrit.wikimedia.org/r/558133 [14:19:58] (03CR) 10jerkins-bot: [V: 04-1] stunnel: add stunnel module and update rsync to use it [puppet] - 10https://gerrit.wikimedia.org/r/558133 (owner: 10Jbond) [14:24:27] (03PS20) 10Jbond: stunnel: add stunnel module and update rsync to use it [puppet] - 10https://gerrit.wikimedia.org/r/558133 [14:30:30] (03CR) 10Elukey: [C: 03+2] oozie: add spark conf directory in oozie-site.xml [puppet] - 10https://gerrit.wikimedia.org/r/563414 (https://phabricator.wikimedia.org/T240934) (owner: 10Elukey) [14:33:04] (03CR) 10Alexandros Kosiaris: [C: 03+1] Pin termbox chart versions at 0.0.3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/563438 (owner: 10Tarrow) [14:36:00] (03PS21) 10Jbond: stunnel: add stunnel module and update rsync to use it [puppet] - 10https://gerrit.wikimedia.org/r/558133 [14:41:06] 10Operations, 10Traffic: varnish parent unable to send signals to child - https://phabricator.wikimedia.org/T242411 (10ema) p:05Triage→03Normal [14:50:07] (03PS1) 10Reedy: Undeploy ParsoidBatchAPI [mediawiki-config] - 10https://gerrit.wikimedia.org/r/563461 (https://phabricator.wikimedia.org/T242430) [15:06:40] (03CR) 10Muehlenhoff: [C: 04-1] "Needs 563435 first" [puppet] - 10https://gerrit.wikimedia.org/r/563432 (owner: 10Muehlenhoff) [15:09:09] (03CR) 10Jbond: [C: 03+2] 
apt:::pin: allow callers to override the notify resource [puppet] - 10https://gerrit.wikimedia.org/r/563408 (https://phabricator.wikimedia.org/T242383) (owner: 10Jbond) [15:17:04] (03PS1) 10Muehlenhoff: Remove support for < Buster from Phabricator class [puppet] - 10https://gerrit.wikimedia.org/r/563469 [15:19:08] 10Puppet: Peculiar puppet agent error for apt::pin change - https://phabricator.wikimedia.org/T242383 (10Bstorm) The new version is looking good! [15:27:04] (03PS1) 10Muehlenhoff: gerrit: Remove support for Java older than 11 [puppet] - 10https://gerrit.wikimedia.org/r/563472 [15:31:54] 10Operations: CAS SSO: failed u2f registration - https://phabricator.wikimedia.org/T242438 (10jbond) [15:32:13] 10Operations, 10User-jbond: CAS SSO: failed u2f registration - https://phabricator.wikimedia.org/T242438 (10jbond) p:05Triage→03Normal [15:33:14] 10Puppet: Peculiar puppet agent error for apt::pin change - https://phabricator.wikimedia.org/T242383 (10jbond) 05Open→03Resolved great, ill close this then :) [15:36:09] 10Operations, 10ops-eqiad, 10serviceops: (No Need By Date Provided) rack/setup/install kubernetes10[07-14].eqiad.wmnet - https://phabricator.wikimedia.org/T241850 (10jijiki) [15:38:10] (03PS1) 10Muehlenhoff: Switch component-pyall to apt::package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/563474 [15:38:35] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/563435 (owner: 10Muehlenhoff) [15:39:09] (03CR) 10Muehlenhoff: [C: 03+2] Fix condition for conditional application of apt::pin [puppet] - 10https://gerrit.wikimedia.org/r/563435 (owner: 10Muehlenhoff) [15:49:27] (03PS1) 10Muehlenhoff: jenkins: Switch to apt::package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/563478 [15:50:10] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work), 10Patch-For-Review: Requesting access to Search Platform Service for Zbyszko Papierski - https://phabricator.wikimedia.org/T242341 (10Nuria) Approved 
on my end [15:51:02] 10Operations, 10User-jbond: CAS SSO: failed u2f registration - https://phabricator.wikimedia.org/T242438 (10jbond) Looking at the code it only accepts [[ https://github.com/Yubico/java-u2flib-server/blob/master/u2flib-server-core/src/main/java/com/yubico/u2f/crypto/BouncyCastleCrypto.java#L42 | SHA256withECDSA... [15:51:05] (03CR) 10Paladox: [C: 04-1] "We are not on a gerrit version that supports java 11 yet." [puppet] - 10https://gerrit.wikimedia.org/r/563472 (owner: 10Muehlenhoff) [15:58:40] (03PS1) 10Giuseppe Lavagetto: Add a registryctl command-line utility [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/563482 [15:59:35] (03CR) 10jerkins-bot: [V: 04-1] Add a registryctl command-line utility [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/563482 (owner: 10Giuseppe Lavagetto) [16:01:30] (03PS1) 10Cwhite: nagios: add VO and OG to sms contact group [puppet] - 10https://gerrit.wikimedia.org/r/563483 (https://phabricator.wikimedia.org/T236075) [16:09:05] (03CR) 10BryanDavis: [C: 03+1] "+1 for intent. I have not attempted to verify the key with jbond." 
[labs/private] - 10https://gerrit.wikimedia.org/r/563415 (owner: 10Jbond) [16:16:36] (03CR) 10Jhedden: [C: 03+2] root access: add my key to the root authorized_keys file on wmcs [labs/private] - 10https://gerrit.wikimedia.org/r/563415 (owner: 10Jbond) [16:17:25] (03CR) 10Jhedden: [V: 03+2 C: 03+2] root access: add my key to the root authorized_keys file on wmcs [labs/private] - 10https://gerrit.wikimedia.org/r/563415 (owner: 10Jbond) [16:28:26] (03PS2) 10Giuseppe Lavagetto: Add a registryctl command-line utility [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/563482 [16:39:01] (03PS4) 10Effie Mouzeli: mtail: Remove hhvm handler [puppet] - 10https://gerrit.wikimedia.org/r/563206 [16:39:43] (03CR) 10Filippo Giunchedi: [C: 03+1] nagios: add VO and OG to sms contact group [puppet] - 10https://gerrit.wikimedia.org/r/563483 (https://phabricator.wikimedia.org/T236075) (owner: 10Cwhite) [16:40:52] (03CR) 10Effie Mouzeli: [C: 03+1] "<3" [puppet] - 10https://gerrit.wikimedia.org/r/563432 (owner: 10Muehlenhoff) [16:48:43] (03PS1) 10Giuseppe Lavagetto: docker-report: remove failing images, fix logging [puppet] - 10https://gerrit.wikimedia.org/r/563490 [16:50:00] (03CR) 10Jforrester: [C: 04-1] "Not yet, per Subbu." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/563461 (https://phabricator.wikimedia.org/T242430) (owner: 10Reedy) [16:54:28] 10Operations, 10Wikimedia-Mailing-lists, 10Release-Engineering-Team-TODO (201911), 10User-zeljkofilipin: Close QA mailing list - https://phabricator.wikimedia.org/T237383 (10zeljkofilipin) 05Resolved→03Open @Jdforrester-WMF and I have talked over e-mail. Mailing list home page (https://lists.wikimedia.... 
[16:54:52] 10Operations, 10Wikimedia-Mailing-lists, 10Release-Engineering-Team-TODO (201911), 10User-zeljkofilipin: Close QA mailing list - https://phabricator.wikimedia.org/T237383 (10zeljkofilipin) [16:55:41] (03CR) 10Giuseppe Lavagetto: [C: 03+2] docker-report: remove failing images, fix logging [puppet] - 10https://gerrit.wikimedia.org/r/563490 (owner: 10Giuseppe Lavagetto) [17:03:16] (03CR) 10Cwhite: [C: 03+2] nagios: add VO and OG to sms contact group [puppet] - 10https://gerrit.wikimedia.org/r/563483 (https://phabricator.wikimedia.org/T236075) (owner: 10Cwhite) [17:05:49] 10Operations, 10ops-codfw, 10DBA: (Needed By 31st January) codfw: rack/setup/install es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (10Papaul) [17:09:15] 10Operations, 10Wikimedia-Mailing-lists, 10Release-Engineering-Team-TODO (201911), 10User-zeljkofilipin: Close QA mailing list - https://phabricator.wikimedia.org/T237383 (10Jdforrester-WMF) Only SRE can. [17:12:00] (03PS1) 10Papaul: DHCP: Add MAC address entries for es202[02345] [puppet] - 10https://gerrit.wikimedia.org/r/563496 [17:13:02] (03CR) 10jerkins-bot: [V: 04-1] DHCP: Add MAC address entries for es202[02345] [puppet] - 10https://gerrit.wikimedia.org/r/563496 (owner: 10Papaul) [17:14:05] 10Operations, 10ops-codfw, 10DBA: (Needed By 31st January) codfw: rack/setup/install es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (10Papaul) [17:23:00] 10Operations, 10ops-codfw, 10DBA: (Needed By 31st January) codfw: rack/setup/install es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (10Papaul) ` papaul@asw-a-codfw# show | compare [edit interfaces interface-range vlan-private1-a-codfw] member xe-4/0/20 { ... } + member ge-3/0/27... 
[17:23:41] 10Operations, 10ops-codfw, 10DBA: (Needed By 31st January) codfw: rack/setup/install es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (10Papaul) [17:26:41] (03PS1) 10Cwhite: bugfix and cleanup [debs/prometheus-swagger-exporter] - 10https://gerrit.wikimedia.org/r/563497 [17:28:16] (03CR) 10Cwhite: [C: 03+2] bugfix and cleanup [debs/prometheus-swagger-exporter] - 10https://gerrit.wikimedia.org/r/563497 (owner: 10Cwhite) [17:41:25] 10Operations, 10Continuous-Integration-Infrastructure: Package python3.8 for stretch-wikimedia pyall component - https://phabricator.wikimedia.org/T241195 (10faidon) [17:43:04] 10Operations, 10Discovery, 10Recommendation-API, 10Wikidata, and 3 others: flapping monitoring for recommendation_api on scb - https://phabricator.wikimedia.org/T178445 (10Eevans) [17:52:05] (03PS1) 10Elukey: wikistats: serve the v2 version of the website by default [puppet] - 10https://gerrit.wikimedia.org/r/563508 (https://phabricator.wikimedia.org/T237752) [17:55:38] (03PS2) 10Elukey: wikistats: serve the v2 version of the website by default [puppet] - 10https://gerrit.wikimedia.org/r/563508 (https://phabricator.wikimedia.org/T237752) [17:57:57] (03CR) 10Elukey: "Fran the patch is still WIP but please check if you see anything weird and/or that we didn't discuss." 
[puppet] - 10https://gerrit.wikimedia.org/r/563508 (https://phabricator.wikimedia.org/T237752) (owner: 10Elukey) [18:01:47] (03PS1) 10Cwhite: enable fetching debian version from debian changelog [debs/prometheus-swagger-exporter] - 10https://gerrit.wikimedia.org/r/563514 [18:03:07] (03PS2) 10Papaul: DHCP: Add MAC address entries for es202[02345] [puppet] - 10https://gerrit.wikimedia.org/r/563496 (https://phabricator.wikimedia.org/T241336) [18:06:27] (03PS2) 10Cwhite: enable fetching debian version from debian changelog [debs/prometheus-swagger-exporter] - 10https://gerrit.wikimedia.org/r/563514 [18:12:43] RECOVERY - Check systemd state on boron is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:13:28] (03CR) 10BryanDavis: [C: 03+2] Apply black formatting and make the webservice frontend pass flake8 [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/562996 (https://phabricator.wikimedia.org/T236202) (owner: 10Bstorm) [18:18:27] (03PS1) 10Elukey: Remove spark-specific options from Hadoop Test's Yarn config [puppet] - 10https://gerrit.wikimedia.org/r/563521 (https://phabricator.wikimedia.org/T240934) [18:19:17] 10Operations, 10SRE-Access-Requests: Replace SSH key for cchen - https://phabricator.wikimedia.org/T242407 (10cchen) Hi @Dzahn , i see you are on the clinic duty this week. do i need to assign the ticker to you? 
[18:21:06] (03CR) 10BryanDavis: [C: 03+1] kubernetes: persist the cpu and mem args in service manifests [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563003 (https://phabricator.wikimedia.org/T236202) (owner: 10Bstorm) [18:27:30] (03CR) 10Elukey: [V: 03+2 C: 03+2] Remove spark-specific options from Hadoop Test's Yarn config [puppet] - 10https://gerrit.wikimedia.org/r/563521 (https://phabricator.wikimedia.org/T240934) (owner: 10Elukey) [18:29:11] !log Restarted zuul on contint1001; no logs since 2020-01-10 17:55:28,452 [18:29:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:45] (03CR) 10BryanDavis: Apply black formatting and make the webservice frontend pass flake8 [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/562996 (https://phabricator.wikimedia.org/T236202) (owner: 10Bstorm) [18:31:47] (03CR) 10BryanDavis: [C: 03+2] Apply black formatting and make the webservice frontend pass flake8 [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/562996 (https://phabricator.wikimedia.org/T236202) (owner: 10Bstorm) [18:32:06] (03CR) 10Cwhite: [V: 03+2 C: 03+2] enable fetching debian version from debian changelog [debs/prometheus-swagger-exporter] - 10https://gerrit.wikimedia.org/r/563514 (owner: 10Cwhite) [18:32:18] (03Merged) 10jenkins-bot: Apply black formatting and make the webservice frontend pass flake8 [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/562996 (https://phabricator.wikimedia.org/T236202) (owner: 10Bstorm) [18:32:39] (03PS7) 10Jforrester: Variant configuration: Replace symfony/yaml with spyc [mediawiki-config] - 10https://gerrit.wikimedia.org/r/554967 [18:33:03] (03PS3) 10Jforrester: Variant configuration: Read and write variant config from conf-dir, not /tmp [mediawiki-config] - 10https://gerrit.wikimedia.org/r/554977 [18:33:07] (03CR) 10Jforrester: [C: 04-2] Variant configuration: Read and write variant config from conf-dir, not /tmp [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/554977 (owner: 10Jforrester) [18:40:13] (03CR) 10Dzahn: [C: 03+2] admins: add clarakosi to deploy-service for RESTBase deployment [puppet] - 10https://gerrit.wikimedia.org/r/562661 (https://phabricator.wikimedia.org/T242152) (owner: 10Dzahn) [18:45:05] (03PS3) 10Dzahn: admins: add clarakosi to deploy-service for RESTBase deployment [puppet] - 10https://gerrit.wikimedia.org/r/562661 (https://phabricator.wikimedia.org/T242152) [18:46:15] 10Operations, 10SRE-Access-Requests: Replace SSH key for cchen - https://phabricator.wikimedia.org/T242407 (10Dzahn) a:03Dzahn [18:48:15] (03CR) 10Dzahn: [C: 03+2] admins: add clarakosi to deploy-service for RESTBase deployment [puppet] - 10https://gerrit.wikimedia.org/r/562661 (https://phabricator.wikimedia.org/T242152) (owner: 10Dzahn) [18:49:55] (03PS3) 10BryanDavis: Improve Python 3 compatibility [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/561262 (owner: 10Legoktm) [18:51:53] 10Operations, 10SRE-Access-Requests: Replace SSH key for cchen - https://phabricator.wikimedia.org/T242407 (10Dzahn) Hi @cchen, i got it. Could we confirm really quick it's you via a second channel? For example could you leave a message on office wiki or invite me to Google Meet or send me an SMS from your num... [18:53:26] (03CR) 10jerkins-bot: [V: 04-1] Improve Python 3 compatibility [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/561262 (owner: 10Legoktm) [18:53:29] !log welcome new (restbase) service deployer Clara Andrew-Wani (T242152) [18:53:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:54:29] (03PS2) 10Gehel: admin: added user zpapierski [puppet] - 10https://gerrit.wikimedia.org/r/563261 (https://phabricator.wikimedia.org/T242341) (owner: 10ZPapierski) [18:54:31] (03CR) 10Clarakosi: "@Dzahn: Thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/562661 (https://phabricator.wikimedia.org/T242152) (owner: 10Dzahn) [18:56:40] (03CR) 10Gehel: [C: 03+2] admin: added user zpapierski [puppet] - 10https://gerrit.wikimedia.org/r/563261 (https://phabricator.wikimedia.org/T242341) (owner: 10ZPapierski) [18:59:04] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to RESTBase for clarakosi - https://phabricator.wikimedia.org/T242152 (10Dzahn) Hi @Clarakosi your user has now been created on the [[ https://wikitech.wikimedia.org/wiki/Deployment_server | deployment server ]]. Check out the wiki... [18:59:11] 10Operations, 10SRE-Access-Requests: Replace SSH key for cchen - https://phabricator.wikimedia.org/T242407 (10cchen) hi @Dzahn, just sent via SMS! Thank you! [19:00:45] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to RESTBase for clarakosi - https://phabricator.wikimedia.org/T242152 (10Dzahn) 05Open→03Resolved Feel free to reopen if any problems arise or ping us on IRC. [19:01:23] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work), 10Patch-For-Review: Requesting access to Search Platform Service for Zbyszko Papierski - https://phabricator.wikimedia.org/T242341 (10Gehel) 05Open→03Resolved a:03Gehel [19:03:15] 10Operations, 10SRE-Access-Requests: Replace SSH key for cchen - https://phabricator.wikimedia.org/T242407 (10Dzahn) Thanks Connie. Received :) Will upload the change to replace your key. 
[19:07:51] (03PS1) 10Dzahn: admin: replace SSH key for cchen [puppet] - 10https://gerrit.wikimedia.org/r/563544 (https://phabricator.wikimedia.org/T242407) [19:09:15] (03CR) 10Dzahn: [C: 03+2] "https://phabricator.wikimedia.org/T242407#5793287" [puppet] - 10https://gerrit.wikimedia.org/r/563544 (https://phabricator.wikimedia.org/T242407) (owner: 10Dzahn) [19:09:41] (03CR) 10Jhedden: [C: 03+1] kubernetes: persist the cpu and mem args in service manifests [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563003 (https://phabricator.wikimedia.org/T236202) (owner: 10Bstorm) [19:10:20] (03PS2) 10Dzahn: admin: replace SSH key for cchen [puppet] - 10https://gerrit.wikimedia.org/r/563544 (https://phabricator.wikimedia.org/T242407) [19:12:14] (03PS1) 10Ottomata: eventgate - bump image versions to use latest schema repo versions [deployment-charts] - 10https://gerrit.wikimedia.org/r/563545 (https://phabricator.wikimedia.org/T240985) [19:12:25] 10Operations, 10SRE-Access-Requests: Requesting access to RESTBase for clarakosi - https://phabricator.wikimedia.org/T242152 (10Clarakosi) > Check out the wiki page on SSH config for production access and try if you can ssh to deploy1001.eqiad.wmnet. It works! Thanks for getting this done. [19:12:54] (03CR) 10Ottomata: [C: 03+2] eventgate - bump image versions to use latest schema repo versions [deployment-charts] - 10https://gerrit.wikimedia.org/r/563545 (https://phabricator.wikimedia.org/T240985) (owner: 10Ottomata) [19:13:25] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' . 
[19:13:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:24] (03PS1) 10Dzahn: gerrit: replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/563546 [19:18:57] (03CR) 10Bstorm: "> Patch Set 3: Verified-1" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/561262 (owner: 10Legoktm) [19:19:26] (03CR) 10Bstorm: [C: 03+2] kubernetes: persist the cpu and mem args in service manifests [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563003 (https://phabricator.wikimedia.org/T236202) (owner: 10Bstorm) [19:24:26] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Replace SSH key for cchen - https://phabricator.wikimedia.org/T242407 (10Dzahn) Hi @cchen i merged the change and ran puppet on stat* and bast* and i see your key is updated on stat1004. It should all work with the new key now. Best, Daniel [19:24:32] (03PS1) 10Dzahn: releases: replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/563548 [19:24:52] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Replace SSH key for cchen - https://phabricator.wikimedia.org/T242407 (10Dzahn) 05Open→03Resolved [19:25:06] 10Operations, 10SRE-Access-Requests: Replace SSH key for cchen - https://phabricator.wikimedia.org/T242407 (10Dzahn) p:05Triage→03High [19:26:51] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to EventLogging data for knissen - https://phabricator.wikimedia.org/T241838 (10Dzahn) @Nuria is this approved? 
[19:28:26] (03CR) 10Bstorm: "> Patch Set 3:" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/561262 (owner: 10Legoktm) [19:29:17] (03PS4) 10Bstorm: Improve Python 3 compatibility [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/561262 (owner: 10Legoktm) [19:29:57] (03PS1) 10Dzahn: aptrepo: replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/563551 [19:33:37] (03PS1) 10Dzahn: microsites: replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/563552 [19:35:00] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@e141941]: repair model upload in bulk daemon [19:35:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:36:22] (03PS1) 10Dzahn: smokeping: replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/563555 [19:38:40] (03PS1) 10Dzahn: planet: replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/563556 [19:40:02] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@e141941]: repair model upload in bulk daemon (duration: 05m 02s) [19:40:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:40:48] !log restart mjolnir-kafka-bulk-daemon across eqiad and codfw search clusters [19:40:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:40:57] (03CR) 10Bstorm: "12:30:12 py27-flake8: commands succeeded" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/561262 (owner: 10Legoktm) [19:41:51] (03PS1) 10Dzahn: admins: add Hugh Nowlan to ldap_only_admins (wmf) [puppet] - 10https://gerrit.wikimedia.org/r/563557 (https://phabricator.wikimedia.org/T242309) [19:42:50] (03CR) 10Bstorm: "It is worth noting that pyyaml appears to have decided to not support 3.4 between patch set 1 and patch set 3. 
More reasons to get rid of" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/561262 (owner: 10Legoktm) [19:42:56] !log restarting blazegraph on wdqs1005 [19:42:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:43:05] PROBLEM - Query Service HTTP Port on wdqs1005 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 380 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [19:43:57] (03CR) 10Dzahn: [C: 03+2] admins: add Hugh Nowlan to ldap_only_admins (wmf) [puppet] - 10https://gerrit.wikimedia.org/r/563557 (https://phabricator.wikimedia.org/T242309) (owner: 10Dzahn) [19:44:30] (03CR) 10Bstorm: [C: 03+2] Improve Python 3 compatibility [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/561262 (owner: 10Legoktm) [19:44:51] RECOVERY - Query Service HTTP Port on wdqs1005 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.024 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [19:47:00] !log LDAP - add Hugh Nowlan to "wmf" group (T242309) [19:47:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:47:03] T242309: Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 [19:48:08] !log LDAP - add Zbyszko Papierski to "wmf" group (T242341) [19:48:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:50:23] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Requesting access to Search Platform Service for Zbyszko Papierski - https://phabricator.wikimedia.org/T242341 (10Dzahn) added Zbyszko to the "wmf" LDAP group. @Zbyszko This gives you access to a bunch of web-based logins: https://wiki... [19:51:07] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' . 
[19:51:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:51:27] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10serviceops-radar, and 2 others: Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10Dzahn) @hnowlan The LDAP group gave you access to a bunch of web-based logins now: See https://wikitech.wikimedia.org/wiki/LDAP/Groups#wmf_group [19:52:16] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' . [19:52:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:28] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10serviceops-radar, and 2 others: Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10Dzahn) [19:53:05] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10serviceops-radar, and 2 others: Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10Dzahn) [19:54:30] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' . [19:54:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:54:57] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10serviceops-radar, and 2 others: Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10Dzahn) @hnowlan One more thing we'll need for the "pwstore" part will be a GPG key. If you already have one or want to create one you can go a... [19:55:59] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [19:56:35] !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' . 
[19:56:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:57:46] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [20:00:57] 10Operations, 10DC-Ops, 10decommission: decommission hassium.eqiad.wmnet - https://phabricator.wikimedia.org/T242456 (10Dzahn) [20:01:40] 10Operations, 10DC-Ops, 10decommission: decommission hassaleh.codfw.wmnet - https://phabricator.wikimedia.org/T242457 (10Dzahn) [20:02:12] 10Operations, 10serviceops: Migrate debug proxies to Stretch/Buster - https://phabricator.wikimedia.org/T224567 (10Dzahn) [20:02:14] 10Operations, 10DC-Ops, 10decommission: decommission hassaleh.codfw.wmnet - https://phabricator.wikimedia.org/T242457 (10Dzahn) [20:02:16] 10Operations, 10DC-Ops, 10decommission: decommission hassium.eqiad.wmnet - https://phabricator.wikimedia.org/T242456 (10Dzahn) [20:02:26] 10Operations, 10serviceops: decom debug proxies (was: Migrate debug proxies to Stretch/Buster) - https://phabricator.wikimedia.org/T224567 (10Dzahn) [20:03:22] 10Operations, 10serviceops: decom debug proxies (was: Migrate debug proxies to Stretch/Buster) - https://phabricator.wikimedia.org/T224567 (10Dzahn) created decom subtasks via [[ https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Reclaim_to_Spares_OR_Decommission | Lifecycle decom form ]] [20:03:37] !log drop legacy Parsoid/JS storage keyspaces, production env -- T242344 [20:03:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:03:40] T242344: Remove Parsoid-JS tables from Cassandra - https://phabricator.wikimedia.org/T242344 [20:07:33] (03PS1) 10Dzahn: install_server: remove hassaleh and hassium [puppet] - 10https://gerrit.wikimedia.org/r/563566 (https://phabricator.wikimedia.org/T242456) [20:08:42] (03PS1) 10Dzahn: site: remove hassaleh and hassium [puppet] - 
10https://gerrit.wikimedia.org/r/563567 (https://phabricator.wikimedia.org/T242456) [20:08:54] 10Operations, 10SRE-Access-Requests: Replace SSH key for cchen - https://phabricator.wikimedia.org/T242407 (10cchen) Thank you @Dzahn !! [20:10:15] PROBLEM - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/html/{title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase [20:10:25] PROBLEM - Restrouter LVS codfw on restrouter.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/page/html/{title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase [20:10:48] clarakosi: is that a deploy? [20:11:33] mutante: huh? [20:12:11] clarakosi: i was wondering about the icinga alerts above [20:12:45] (03PS1) 10Dzahn: remove hassium.eqiad.wmnet and hassleh.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/563570 (https://phabricator.wikimedia.org/T242456) [20:13:04] because it's restbase related [20:13:19] mutante: yeah.... [20:13:20] ohhh ok. Our last deploy was yesterday but I'll look into [20:13:21] urandom: [20:13:47] Pchelolo: ah! [20:13:56] mutante: so the 'restrouter' is not used. we forgot about it... [20:14:00] clarakosi: thanks! [20:14:07] clarakosi did it. [20:14:12] Pchelolo: ok. "not used" sounds good to me [20:14:31] so, it is broken indeed, but it's no problem. I'll ack the alert for now [20:14:37] urandom: 😫 [20:15:05] Pchelolo: so...we need to do deploy to k8s? [20:15:17] I don't think I understand what state that's all currently in [20:15:25] urandom: me neither... 
[20:15:33] and by "think", I mean, I have no earthly clue [20:16:04] Pchelolo: thanks for ACKing [20:16:37] (03PS2) 10Dzahn: remove hassium.eqiad.wmnet and hassaleh.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/563570 (https://phabricator.wikimedia.org/T242456) [20:17:09] ACKNOWLEDGEMENT - Restrouter LVS codfw on restrouter.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/page/html/{title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200) ppchelko RESTRouter is currently unused. We have removed some of the Cassandra tables it was relying on. - The acknowledgement expires at: 2020-01-13 20:16:27. https://wikitech.w [20:17:09] /RESTBase [20:17:09] ACKNOWLEDGEMENT - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/html/{title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200) ppchelko RESTRouter is currently unused. We have removed some of the Cassandra tables it was relying on. - The acknowledgement expires at: 2020-01-13 20:16:27. 
https://wikitech.w [20:17:09] /RESTBase [20:20:10] 10Operations, 10Traffic, 10Wikidata, 10Wikidata-Query-Service, and 2 others: LDF service does not Vary responses by Accept, sending incorrect cached responses to clients - https://phabricator.wikimedia.org/T232006 (10Mstyles) @gehel I think we can consider this closed unless someone is able to reproduce [20:22:45] (03CR) 10Dzahn: DHCP: Add MAC address entries for es202[02345] (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/563496 (https://phabricator.wikimedia.org/T241336) (owner: 10Papaul) [20:24:11] (03CR) 10Dzahn: "ack, the cloud issue is not a blocker, just says 'grafana' in there" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/563208 (owner: 10Muehlenhoff) [20:27:23] (03CR) 10Krinkle: [C: 04-1] "Unable to test on Beta Cluster because puppet is broken currently there unrelatedly I think:" [puppet] - 10https://gerrit.wikimedia.org/r/559262 (https://phabricator.wikimedia.org/T241097) (owner: 10Krinkle) [20:27:29] (03PS3) 10Dzahn: DHCP: Add MAC address entries for es202[02345] [puppet] - 10https://gerrit.wikimedia.org/r/563496 (https://phabricator.wikimedia.org/T241336) (owner: 10Papaul) [20:27:50] (03PS4) 10Dzahn: DHCP: Add MAC address entries for es202[02345] [puppet] - 10https://gerrit.wikimedia.org/r/563496 (https://phabricator.wikimedia.org/T241336) (owner: 10Papaul) [20:29:09] (03CR) 10Dzahn: [C: 03+2] DHCP: Add MAC address entries for es202[02345] [puppet] - 10https://gerrit.wikimedia.org/r/563496 (https://phabricator.wikimedia.org/T241336) (owner: 10Papaul) [20:29:54] !log cloudmetrics100[12] schedule downtime until Feb 28 2020 on prometheus check T242460 [20:29:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:57] T242460: Fix cloudmetrics icinga prometheus check - https://phabricator.wikimedia.org/T242460 [20:38:39] (03CR) 10Krinkle: "This seems to have broken the abilty for the puppet agent to run on Beta Cluster nodes. 
For example:" [puppet] - 10https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941) (owner: 10Jbond) [20:45:18] !log cloudcontrol200[13]-dev schedule downtime until Feb 28 2020 on systemd service check T242462 [20:45:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:45:21] T242462: cloudcontrol200[13]-dev linux bridge agent errors - https://phabricator.wikimedia.org/T242462 [20:46:30] (03CR) 10Dzahn: etcd: add parameter type checking and clean up (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941) (owner: 10Jbond) [20:50:11] (03CR) 10Dzahn: "As Paladox says, i think we need to wait for Gerrit 2.16. cc: Thcipriani" [puppet] - 10https://gerrit.wikimedia.org/r/563472 (owner: 10Muehlenhoff) [20:55:01] (03PS1) 10Catrope: [beta] Enable topics for suggested edits in beta (except cawiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/563580 [20:55:12] (03CR) 10Catrope: [C: 03+2] [beta] Enable topics for suggested edits in beta (except cawiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/563580 (owner: 10Catrope) [20:56:12] (03Merged) 10jenkins-bot: [beta] Enable topics for suggested edits in beta (except cawiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/563580 (owner: 10Catrope) [21:03:06] (03PS1) 10Catrope: [beta] Remove wgGEHomepageSuggestedEditsRequiresOptIn override on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/563582 [21:03:17] (03CR) 10Catrope: [C: 03+2] [beta] Remove wgGEHomepageSuggestedEditsRequiresOptIn override on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/563582 (owner: 10Catrope) [21:04:26] (03Merged) 10jenkins-bot: [beta] Remove wgGEHomepageSuggestedEditsRequiresOptIn override on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/563582 (owner: 10Catrope) [21:14:33] (03CR) 10Thcipriani: [C: 04-1] "> As Paladox says, i think we need to wait for Gerrit 2.16. 
cc:" [puppet] - 10https://gerrit.wikimedia.org/r/563472 (owner: 10Muehlenhoff) [21:16:46] 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10Traffic, 10Release-Engineering-Team (Development services): Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10srodlund) @kostajh We're in progress with the tech blog. It still ne... [21:23:12] (03PS1) 10Krinkle: gerrit: Link r123 to SVN CodeReview instead of Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/563584 [21:24:32] (03PS2) 10Krinkle: gerrit: Link r123 to SVN CodeReview instead of Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/563584 [21:25:24] (03PS3) 10Krinkle: gerrit: Link r123 to SVN CodeReview instead of Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/563584 [21:33:46] (03PS1) 10Bstorm: py2to3: fix the with_metaclass declaration [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563586 [21:54:29] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/20316/" [puppet] - 10https://gerrit.wikimedia.org/r/563556 (owner: 10Dzahn) [21:55:03] (03PS2) 10Dzahn: planet: replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/563556 [21:57:34] (03CR) 10Thcipriani: [C: 03+1] "mw.org page does look like it has more useful content, lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/563584 (owner: 10Krinkle) [21:57:53] PROBLEM - Check systemd state on ms-be1035 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:58:09] PROBLEM - Disk space on ms-be1035 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sda4 is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be1035&var-datasource=eqiad+prometheus/ops [21:58:09] PROBLEM - MD RAID on ms-be1035 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [21:58:10] ACKNOWLEDGEMENT - MD RAID on ms-be1035 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T242471 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [21:58:12] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1035 - https://phabricator.wikimedia.org/T242471 (10ops-monitoring-bot) [21:58:51] PROBLEM - very high load average likely xfs on ms-be1035 is CRITICAL: CRITICAL - load average: 213.93, 259.45, 177.90 https://wikitech.wikimedia.org/wiki/Swift [21:59:38] (03PS2) 10Dzahn: smokeping: replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/563555 [22:10:26] (03PS3) 10Mstyles: Add new MLR models [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559614 (https://phabricator.wikimedia.org/T219534) [22:10:32] godog: ^ do we need to take ms-be1035 out of the swift ring or sometihng? 
[22:11:07] (i think in the past those were not super urgent and ticket was already auto-created though) [22:12:16] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/20317/" [puppet] - 10https://gerrit.wikimedia.org/r/563555 (owner: 10Dzahn) [22:13:09] RECOVERY - very high load average likely xfs on ms-be1035 is OK: OK - load average: 2.44, 20.51, 78.41 https://wikitech.wikimedia.org/wiki/Swift [22:13:23] ah, was hoping for that [22:16:35] (03CR) 10Dzahn: gerrit: replace hiera() with lookup() (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/563546 (owner: 10Dzahn) [22:16:50] (03CR) 10Dzahn: [C: 04-1] gerrit: replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/563546 (owner: 10Dzahn) [22:17:59] (03PS2) 10Dzahn: aptrepo: replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/563551 [22:18:34] 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1013 - https://phabricator.wikimedia.org/T242472 (10ops-monitoring-bot) [22:20:33] (03CR) 10Dzahn: [C: 03+2] "noop https://puppet-compiler.wmflabs.org/compiler1003/20318/" [puppet] - 10https://gerrit.wikimedia.org/r/563551 (owner: 10Dzahn) [22:25:36] 10Operations, 10LDAP-Access-Requests: Allow LDAP access to superset dashboards for Moushira Elamrawy - https://phabricator.wikimedia.org/T242000 (10Dzahn) a:05Dzahn→03None Thanks @Moushira please let us know with a quick comment here once you have one later. 
[22:25:42] (03PS1) 10Bstorm: k8s: Set default requests for the new cluster [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563592 (https://phabricator.wikimedia.org/T236202) [22:26:01] 10Operations, 10LDAP-Access-Requests: Allow LDAP access to superset dashboards for Moushira Elamrawy - https://phabricator.wikimedia.org/T242000 (10Dzahn) p:05Normal→03Low [22:26:09] 10Operations, 10ops-eqiad, 10cloud-services-team (Hardware): Degraded RAID on cloudvirt1013 - https://phabricator.wikimedia.org/T242472 (10JHedden) [22:27:04] (03CR) 10RLazarus: "Looks good! Mostly just minor quibbles." (0314 comments) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/563482 (owner: 10Giuseppe Lavagetto) [22:27:20] 10Operations, 10ops-eqiad, 10cloud-services-team (Hardware): Degraded RAID on cloudvirt1013 - https://phabricator.wikimedia.org/T242472 (10JHedden) p:05Triage→03High [22:27:39] (03CR) 10BryanDavis: [C: 03+2] py2to3: fix the with_metaclass declaration [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563586 (owner: 10Bstorm) [22:27:49] PROBLEM - Check systemd state on ms-be1026 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [22:29:49] (03CR) 10Bstorm: [V: 03+2] py2to3: fix the with_metaclass declaration [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563586 (owner: 10Bstorm) [22:30:13] ms-be1026 does not actually have failed units ... [22:32:35] ah, it does. 
but not "session of user debmonitor", not swift [22:33:19] RECOVERY - Check systemd state on ms-be1026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [22:33:46] !log ms-be1026 sudo systemctl reset-failed (failed Session 372989 of user debmonitor) [22:33:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:35:02] (03CR) 10Dzahn: [C: 04-1] "https://puppet-compiler.wmflabs.org/compiler1003/20319/torrelay1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/563208 (owner: 10Muehlenhoff) [22:38:24] (03CR) 10BryanDavis: k8s: Set default requests for the new cluster (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563592 (https://phabricator.wikimedia.org/T236202) (owner: 10Bstorm) [22:40:29] (03CR) 10Bstorm: k8s: Set default requests for the new cluster (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563592 (https://phabricator.wikimedia.org/T236202) (owner: 10Bstorm) [22:44:41] 10Operations, 10ops-eqiad, 10cloud-services-team (Hardware): Degraded RAID on cloudvirt1013 - https://phabricator.wikimedia.org/T242472 (10JHedden) Multiple hardware errors reported for this host T241313 [22:45:08] 10Operations, 10ops-eqiad, 10SRE-swift-storage: Degraded RAID on ms-be1035 - https://phabricator.wikimedia.org/T242471 (10Peachey88) [22:45:26] (03CR) 10Paladox: [C: 03+1] gerrit: Link r123 to SVN CodeReview instead of Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/563584 (owner: 10Krinkle) [22:45:31] (03CR) 10Dzahn: [C: 03+2] gerrit: Link r123 to SVN CodeReview instead of Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/563584 (owner: 10Krinkle) [22:47:59] (03CR) 10Peachey88: "How does this play in with the plans to un-deploy E:CodeReview?" 
[puppet] - 10https://gerrit.wikimedia.org/r/563584 (owner: 10Krinkle) [22:49:40] (03CR) 10Krinkle: "See commit message :)" [puppet] - 10https://gerrit.wikimedia.org/r/563584 (owner: 10Krinkle) [22:50:27] (03PS6) 10Dzahn: Initial puppetization of codesearch [puppet] - 10https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: 10Legoktm) [22:55:57] (03CR) 10Dzahn: [C: 03+2] "not used yet, cloud-only, testing it on instance codesearch-buster in devtools, currently stretch" [puppet] - 10https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: 10Legoktm) [23:02:08] (03PS1) 10Dzahn: codesearch: fix systemd template name for hound_proxy service [puppet] - 10https://gerrit.wikimedia.org/r/563599 (https://phabricator.wikimedia.org/T242319) [23:03:35] (03CR) 10Dzahn: [C: 03+2] codesearch: fix systemd template name for hound_proxy service [puppet] - 10https://gerrit.wikimedia.org/r/563599 (https://phabricator.wikimedia.org/T242319) (owner: 10Dzahn) [23:05:43] (03CR) 10Dzahn: "one follow-up for the template name: https://gerrit.wikimedia.org/r/c/operations/puppet/+/563599" [puppet] - 10https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: 10Legoktm) [23:08:01] (03PS1) 10Papaul: DHCP: Testing Buster on es2024 [puppet] - 10https://gerrit.wikimedia.org/r/563600 (https://phabricator.wikimedia.org/T241336) [23:09:22] (03CR) 10Papaul: [C: 03+2] DHCP: Testing Buster on es2024 [puppet] - 10https://gerrit.wikimedia.org/r/563600 (https://phabricator.wikimedia.org/T241336) (owner: 10Papaul) [23:15:06] (03PS1) 10Dzahn: codesearch: create system group, fix system user membership [puppet] - 10https://gerrit.wikimedia.org/r/563602 (https://phabricator.wikimedia.org/T242319) [23:15:51] (03CR) 10jerkins-bot: [V: 04-1] codesearch: create system group, fix system user membership [puppet] - 10https://gerrit.wikimedia.org/r/563602 (https://phabricator.wikimedia.org/T242319) (owner: 10Dzahn) 
[23:17:21] (03PS2) 10Dzahn: codesearch: create system group, fix system user membership [puppet] - 10https://gerrit.wikimedia.org/r/563602 (https://phabricator.wikimedia.org/T242319) [23:18:05] (03PS8) 10BryanDavis: Make Kubernetes the default backend and warn when guessing [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/443190 (https://phabricator.wikimedia.org/T154504) (owner: 10Nehajha) [23:18:07] (03PS2) 10BryanDavis: kubernetes: Set php7.3 as the default type [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/496564 [23:18:09] (03PS2) 10BryanDavis: Report error messages on stderr [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/496565 [23:18:11] (03PS2) 10BryanDavis: Remove lighttpd-precise handling [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/496566 [23:18:13] (03PS2) 10BryanDavis: Improve support for extra_args [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/496567 [23:18:15] (03PS1) 10BryanDavis: Remove long unused pykube submodule [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563603 [23:18:17] (03PS1) 10BryanDavis: Add Black for formatting and some flake8 add-ons [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563604 [23:18:19] (03PS1) 10BryanDavis: Rename internal "toollabs" package to "toolforge" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563605 [23:18:53] (03CR) 10jerkins-bot: [V: 04-1] Rename internal "toollabs" package to "toolforge" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563605 (owner: 10BryanDavis) [23:19:54] (03PS1) 10Volans: binary packages: optimize queries [software/debmonitor] - 10https://gerrit.wikimedia.org/r/563606 [23:20:18] (03CR) 10Dzahn: [C: 03+2] codesearch: create system group, fix system user membership [puppet] - 10https://gerrit.wikimedia.org/r/563602 (https://phabricator.wikimedia.org/T242319) (owner: 10Dzahn) [23:21:22] (03CR) 10Dzahn: [C: 04-1] "a couple follow-ups on this branch 
https://gerrit.wikimedia.org/r/q/topic:%22gerrit-test%22+(status:open%20OR%20status:merged)" [puppet] - 10https://gerrit.wikimedia.org/r/562587 (https://phabricator.wikimedia.org/T239151) (owner: 10Herron) [23:27:13] (03PS2) 10Bstorm: k8s: Set default requests for the new cluster [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563592 (https://phabricator.wikimedia.org/T236202) [23:29:53] (03PS1) 10Dzahn: codesearch: fix typo in user groups attribute [puppet] - 10https://gerrit.wikimedia.org/r/563607 (https://phabricator.wikimedia.org/T242319) [23:30:07] (03PS2) 10BryanDavis: Remove long unused pykube submodule [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563603 [23:30:09] (03PS2) 10BryanDavis: Add Black for formatting and some flake8 add-ons [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563604 [23:30:11] (03PS9) 10BryanDavis: Make Kubernetes the default backend and warn when guessing [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/443190 (https://phabricator.wikimedia.org/T154504) (owner: 10Nehajha) [23:30:13] (03PS3) 10BryanDavis: kubernetes: Set php7.3 as the default type [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/496564 [23:30:15] (03PS3) 10BryanDavis: Report error messages on stderr [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/496565 [23:30:17] (03PS3) 10BryanDavis: Remove lighttpd-precise handling [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/496566 [23:30:19] (03PS3) 10BryanDavis: Improve support for extra_args [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/496567 [23:30:21] (03PS2) 10BryanDavis: Rename internal "toollabs" package to "toolforge" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563605 [23:30:23] (03CR) 10jerkins-bot: [V: 04-1] codesearch: fix typo in user groups attribute [puppet] - 10https://gerrit.wikimedia.org/r/563607 (https://phabricator.wikimedia.org/T242319) (owner: 10Dzahn) [23:30:44] 
(03CR) 10jerkins-bot: [V: 04-1] Remove long unused pykube submodule [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563603 (owner: 10BryanDavis) [23:30:52] (03PS2) 10Dzahn: codesearch: fix typo in user groups attribute [puppet] - 10https://gerrit.wikimedia.org/r/563607 (https://phabricator.wikimedia.org/T242319) [23:31:32] (03CR) 10BryanDavis: "recheck" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563603 (owner: 10BryanDavis) [23:31:55] (03CR) 10Dzahn: [C: 03+2] codesearch: fix typo in user groups attribute [puppet] - 10https://gerrit.wikimedia.org/r/563607 (https://phabricator.wikimedia.org/T242319) (owner: 10Dzahn) [23:34:56] (03CR) 10Dzahn: "Error: Found 1 dependency cycle:" [puppet] - 10https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: 10Legoktm) [23:36:12] (03CR) 10Bstorm: "I had wondered about this. We use an old packaged pykube, don't we?" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563603 (owner: 10BryanDavis) [23:37:49] (03CR) 10Bstorm: [C: 03+2] Remove long unused pykube submodule [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563603 (owner: 10BryanDavis) [23:37:51] (03CR) 10BryanDavis: "> I had wondered about this. 
We use an old packaged pykube, don't" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563603 (owner: 10BryanDavis) [23:38:58] (03Merged) 10jenkins-bot: Remove long unused pykube submodule [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563603 (owner: 10BryanDavis) [23:39:30] (03CR) 10BryanDavis: [C: 03+1] k8s: Set default requests for the new cluster [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563592 (https://phabricator.wikimedia.org/T236202) (owner: 10Bstorm) [23:43:24] 10Operations, 10serviceops: package requirements for upgrading deployment_servers to buster - https://phabricator.wikimedia.org/T242480 (10Dzahn) [23:43:43] 10Operations, 10Performance-Team, 10Traffic: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478 (10Krinkle) [23:44:24] 10Operations, 10Performance-Team, 10Traffic: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478 (10Krinkle) p:05Triage→03High [23:44:41] 10Operations, 10ops-codfw, 10DBA: Missing Netowrk drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (10Papaul) [23:45:07] 10Operations, 10Performance-Team, 10Traffic: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478 (10Krinkle) I recall that in the pre-ATS setup, we explicitly configured the interaction between applayer and traffic to not request compressed re... 
[23:45:22] 10Operations, 10Performance-Team, 10Traffic: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478 (10Krinkle) [23:45:28] (03PS1) 10Bstorm: Buster image uses the toolforge prefix [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/563610 [23:47:38] (03PS2) 10Gergő Tisza: [DNM until June 15] Revert "Invalidate CommonsMetadata cache for entries affected by T222935" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/509914 [23:49:38] 10Operations, 10ops-codfw, 10DBA: Missing Netowrk drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (10Papaul) [23:50:28] (03CR) 10BryanDavis: [C: 03+2] Buster image uses the toolforge prefix [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/563610 (owner: 10Bstorm) [23:50:56] (03Merged) 10jenkins-bot: Buster image uses the toolforge prefix [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/563610 (owner: 10Bstorm) [23:51:06] 10Operations, 10ops-codfw, 10DBA: Missing Netowrk drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (10Papaul) [23:53:45] (03PS1) 10Dzahn: codesearch: fix dependency cycle over clone directories [puppet] - 10https://gerrit.wikimedia.org/r/563612 (https://phabricator.wikimedia.org/T242319) [23:55:14] (03CR) 10Dzahn: [C: 03+2] codesearch: fix dependency cycle over clone directories [puppet] - 10https://gerrit.wikimedia.org/r/563612 (https://phabricator.wikimedia.org/T242319) (owner: 10Dzahn) [23:55:46] (03PS2) 10Dzahn: codesearch: fix dependency cycle over clone directories [puppet] - 10https://gerrit.wikimedia.org/r/563612 (https://phabricator.wikimedia.org/T242319) [23:55:55] (03PS1) 10Gergő Tisza: Newcomer tasks: use remote search for cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/563613 [23:55:57] (03CR) 10Bstorm: [C: 03+2] Add Black for 
formatting and some flake8 add-ons [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/563604 (owner: 10BryanDavis) [23:56:20] (03CR) 10Jforrester: "Oops. :-)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/509914 (owner: 10Gergő Tisza) [23:57:21] (03PS2) 10Catrope: [beta] Newcomer tasks: use remote search for cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/563613 (owner: 10Gergő Tisza) [23:57:27] (03CR) 10Catrope: [C: 03+2] [beta] Newcomer tasks: use remote search for cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/563613 (owner: 10Gergő Tisza) [23:58:39] (03Merged) 10jenkins-bot: [beta] Newcomer tasks: use remote search for cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/563613 (owner: 10Gergő Tisza) [23:59:00] 10Operations, 10serviceops: package requirements for upgrading deployment_servers to buster - https://phabricator.wikimedia.org/T242480 (10Jdforrester-WMF)