[00:00:04] RoanKattouw, Niharika, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do Evening SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200228T0000).
[00:00:04] Jdlrobson: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:23] (Abandoned) saper: Wikistats v2 need no symbolic link [puppet] - https://gerrit.wikimedia.org/r/564739 (https://phabricator.wikimedia.org/T237752) (owner: saper)
[00:00:28] (PS2) Jdlrobson: Drop legacy main page special casing on select projects [mediawiki-config] - https://gerrit.wikimedia.org/r/575376 (https://phabricator.wikimedia.org/T32405)
[00:00:50] i am here. anyone around to swat?
[00:01:08] i need the swatter to also run `composer buildDBLists` on my patch as I can't seem to run it locally
[00:01:54] (CR) jerkins-bot: [V: -1] Drop legacy main page special casing on select projects [mediawiki-config] - https://gerrit.wikimedia.org/r/575376 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[00:04:20] (PS3) Jdlrobson: Drop legacy main page special casing on select projects [mediawiki-config] - https://gerrit.wikimedia.org/r/575376 (https://phabricator.wikimedia.org/T32405)
[00:04:37] (PS5) Jforrester: Parsoid: Use the version of Parsoid in $IP/vendor [mediawiki-config] - https://gerrit.wikimedia.org/r/575336 (https://phabricator.wikimedia.org/T240055)
[00:05:34] (CR) Jforrester: [C: -1] "Needs a checkout of Parsoid on scandium /srv/parsoid-testing and an ln -s applied from vendor/wikimedia/parsoid to it." [mediawiki-config] - https://gerrit.wikimedia.org/r/575336 (https://phabricator.wikimedia.org/T240055) (owner: Jforrester)
[00:05:47] (CR) jerkins-bot: [V: -1] Drop legacy main page special casing on select projects [mediawiki-config] - https://gerrit.wikimedia.org/r/575376 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[00:07:27] (PS4) Jdlrobson: Drop legacy main page special casing on select projects [mediawiki-config] - https://gerrit.wikimedia.org/r/575376 (https://phabricator.wikimedia.org/T32405)
[00:07:34] anyone around?
[00:08:39] (CR) jerkins-bot: [V: -1] Drop legacy main page special casing on select projects [mediawiki-config] - https://gerrit.wikimedia.org/r/575376 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[00:17:28] (PS1) Dzahn: site: add second batch of new eqiad appservers as spares, by rack [puppet] - https://gerrit.wikimedia.org/r/575382 (https://phabricator.wikimedia.org/T241849)
[00:31:39] (PS5) Jdlrobson: Drop legacy main page special casing on select projects [mediawiki-config] - https://gerrit.wikimedia.org/r/575376 (https://phabricator.wikimedia.org/T32405)
[00:31:44] (CR) jerkins-bot: [V: -1] Drop legacy main page special casing on select projects [mediawiki-config] - https://gerrit.wikimedia.org/r/575376 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[00:32:01] Operations, Performance-Team, serviceops, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (aaron) >>! In T240684#5893528, @Joe wrote: > A few notes: > - We cannot really worry too much about stale keys over failovers - we...
[00:34:36] Operations, Performance-Team, serviceops, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (aaron) >>! In T240684#5839106, @elukey wrote: > Couple of random thoughts: > > * we should check the diff between our mcrouter ver...
[00:36:37] Operations, wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (bd808)
[00:37:18] Operations, Traffic, Wikimedia-General-or-Unknown, User-DannyS712: Pages whose title ends with semicolon (;) are intermittently inaccessible - https://phabricator.wikimedia.org/T238285 (matmarex) @bblack @ema @Vgutierrez Did you get a response on this from ATS developers? Did we file an issue or...
[00:39:04] (PS1) Dzahn: parsoid: change cluster of parsoid-test machines from misc to parsoid [puppet] - https://gerrit.wikimedia.org/r/575383
[00:41:52] (PS2) Dzahn: parsoid: change cluster of parsoid-test machines from misc to parsoid [puppet] - https://gerrit.wikimedia.org/r/575383
[00:43:01] (CR) Dzahn: "this is what this does:" [puppet] - https://gerrit.wikimedia.org/r/575383 (owner: Dzahn)
[00:47:29] (PS6) Jdlrobson: Drop legacy main page special casing on select projects [mediawiki-config] - https://gerrit.wikimedia.org/r/575376 (https://phabricator.wikimedia.org/T32405)
[00:47:39] (CR) Jforrester: "Does being in the "parsoid" cluster mean that the load balancer will think it's something it can route production requests to? That would " [puppet] - https://gerrit.wikimedia.org/r/575383 (owner: Dzahn)
[00:47:43] (CR) Cwhite: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/575220 (https://phabricator.wikimedia.org/T246327) (owner: Jbond)
[00:47:58] (CR) Jforrester: "(From the MW side this is great.)" [puppet] - https://gerrit.wikimedia.org/r/575383 (owner: Dzahn)
[00:49:04] (CR) Dzahn: [C: +2] parsoid: change cluster of parsoid-test machines from misc to parsoid [puppet] - https://gerrit.wikimedia.org/r/575383 (owner: Dzahn)
[00:57:21] (PS6) Jforrester: Parsoid: Use the version of Parsoid in $IP/vendor [mediawiki-config] - https://gerrit.wikimedia.org/r/575336 (https://phabricator.wikimedia.org/T240055)
[01:01:04] (CR) Dzahn: Parsoid: Use the version of Parsoid in $IP/vendor (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/575336 (https://phabricator.wikimedia.org/T240055) (owner: Jforrester)
[01:01:52] (CR) Dzahn: Parsoid: Use the version of Parsoid in $IP/vendor (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/575336 (https://phabricator.wikimedia.org/T240055) (owner: Jforrester)
[01:05:01] !log Running mwscript emptyUserGroup.php --wiki=labswiki shell for T196466
[01:05:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:05:07] T196466: Remove 'shell user' right on wikitech - https://phabricator.wikimedia.org/T196466
[01:10:01] (PS1) Dzahn: upgrade cscott, arlolra from parsoid-test-admins to parsoid-test-roots [puppet] - https://gerrit.wikimedia.org/r/575386
[01:11:41] (PS1) Jforrester: [wikitech] Remove the 'shell' user right from assignment and rights lists [mediawiki-config] - https://gerrit.wikimedia.org/r/575387 (https://phabricator.wikimedia.org/T196466)
[01:13:59] (CR) Jforrester: [C: +2] [wikitech] Remove the 'shell' user right from assignment and rights lists [mediawiki-config] - https://gerrit.wikimedia.org/r/575387 (https://phabricator.wikimedia.org/T196466) (owner: Jforrester)
[01:15:00] (Merged) jenkins-bot: [wikitech] Remove the 'shell' user right from assignment and rights lists [mediawiki-config] - https://gerrit.wikimedia.org/r/575387 (https://phabricator.wikimedia.org/T196466) (owner: Jforrester)
[01:15:42] !log dzahn@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[01:15:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:16:44] Operations, Product-Infrastructure-Team-Backlog, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Michael Holloway - https://phabricator.wikimedia.org/T246019 (Dzahn) >>! In T246019#5912778, @Mholloway wrote: > Side note: Is it possible to update my shell username simply by c...
[01:19:30] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T196466 [wikitech] Remove the 'shell' user right from assignment and rights lists (duration: 00m 58s)
[01:19:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:19:36] T196466: Remove 'shell user' right on wikitech - https://phabricator.wikimedia.org/T196466
[01:21:32] (PS1) Dzahn: admins: add mholloway to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/575388 (https://phabricator.wikimedia.org/T246019)
[01:22:58] Operations, Product-Infrastructure-Team-Backlog, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Michael Holloway - https://phabricator.wikimedia.org/T246019 (Dzahn) @Mholloway We can ignore the checkboxes about making an SSH key since you already have...
[01:24:48] (PS2) Dzahn: admins: add mholloway to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/575388 (https://phabricator.wikimedia.org/T246019)
[01:25:33] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 00m 56s)
[01:25:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:29:05] Operations, Product-Infrastructure-Team-Backlog, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Michael Holloway - https://phabricator.wikimedia.org/T246019 (Dzahn)
[01:39:31] Operations, serviceops, wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (Jdforrester-WMF)
[01:42:00] (PS1) Jforrester: [wikitech] Drop the 'cloudadmin' user group, no longer used and empty [mediawiki-config] - https://gerrit.wikimedia.org/r/575390 (https://phabricator.wikimedia.org/T237890)
[01:42:02] (PS1) Jforrester: [wikitech] Drop the 'flood' user group, no longer used and empty [mediawiki-config] - https://gerrit.wikimedia.org/r/575391 (https://phabricator.wikimedia.org/T237890)
[01:42:17] Operations, Performance-Team, serviceops, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (aaron) >>! In T240684#5891129, @jijiki wrote: > **Other comments** > * It is not completely predictable how fast a mcrouter will fa...
[01:43:59] Operations, serviceops, wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (Jdforrester-WMF) `ldap` is already installed in CI, so no need for RelEng to change things from our end.
[01:46:22] (PS1) Aaron Schulz: Remove references to obsolete rpc/RunJobs.php endpoint [puppet] - https://gerrit.wikimedia.org/r/575392 (https://phabricator.wikimedia.org/T175146)
[01:46:54] (PS1) Dzahn: admins: adding arlolra to deployers [puppet] - https://gerrit.wikimedia.org/r/575393 (https://phabricator.wikimedia.org/T245877)
[01:50:16] (CR) Dzahn: "[scandium:/etc/apache2/conf-available] $ grep SERVERGROUP 50-wikimedia-cluster.conf" [puppet] - https://gerrit.wikimedia.org/r/575383 (owner: Dzahn)
[01:50:46] (CR) Dzahn: Parsoid: Use the version of Parsoid in $IP/vendor (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/575336 (https://phabricator.wikimedia.org/T240055) (owner: Jforrester)
[01:50:51] (CR) Jforrester: [C: +1] "LGTM." [mediawiki-config] - https://gerrit.wikimedia.org/r/575376 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[01:53:32] Operations, Parsoid-PHP, SRE-Access-Requests, serviceops, Patch-For-Review: Give all members of the Parsing team production `deployment` access - https://phabricator.wikimedia.org/T245877 (Dzahn) @cscott @ssastry Actually you already have deployment access. So this is just about adding arlol...
[01:54:53] Operations, Parsoid-PHP, SRE-Access-Requests, serviceops, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn)
[01:55:52] (CR) Dzahn: [C: +2] admins: adding arlolra to deployers [puppet] - https://gerrit.wikimedia.org/r/575393 (https://phabricator.wikimedia.org/T245877) (owner: Dzahn)
[01:58:40] (CR) Dzahn: "[deploy1001:~] $ id arlolra" [puppet] - https://gerrit.wikimedia.org/r/575393 (https://phabricator.wikimedia.org/T245877) (owner: Dzahn)
[01:59:59] Operations, Parsoid-PHP, SRE-Access-Requests, serviceops, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) @Arlolra Has been added to deployers. This would solve this ticket n...
[02:00:45] Operations, Parsoid-PHP, SRE-Access-Requests, serviceops, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) a:Sbailey
[02:02:45] (PS2) Aaron Schulz: Remove references to obsolete rpc/RunJobs.php endpoint [puppet] - https://gerrit.wikimedia.org/r/575392 (https://phabricator.wikimedia.org/T175146)
[02:03:05] (PS1) Dzahn: DHCP: update MAC address for apt2001 once again [puppet] - https://gerrit.wikimedia.org/r/575394
[02:04:46] (CR) Dzahn: [C: +2] DHCP: update MAC address for apt2001 once again [puppet] - https://gerrit.wikimedia.org/r/575394 (owner: Dzahn)
[02:14:50] (PS3) Aaron Schulz: Use DBO_DEFAULT for extension1 since it is not for key/value blob storage [mediawiki-config] - https://gerrit.wikimedia.org/r/525977
[02:27:27] Operations, Security-Team, User-jbond: Determine any impacts to SRE from OIT's planned move to JumpCloud for LDAP - https://phabricator.wikimedia.org/T244792 (HMarcus) @MoritzMuehlenhoff @chasemp Yes, we will plan on providing notice six weeks before the product goes live (assuming it passes the othe...
[02:46:28] (CR) Ppchelko: "looks good. We're working on larger changes of this right now, references inline. Adding Joe and Hugh for visibility." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/575392 (https://phabricator.wikimedia.org/T175146) (owner: Aaron Schulz)
[02:53:38] (CR) JJMC89: "T237890#5925460" [mediawiki-config] - https://gerrit.wikimedia.org/r/575391 (https://phabricator.wikimedia.org/T237890) (owner: Jforrester)
[03:50:18] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[03:51:40] ACKNOWLEDGEMENT - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project andrew bogott Ill fix this. Its surely something to do with host aggregates. https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[03:52:31] RECOVERY - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is OK: 0 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[03:52:33] Operations, ops-codfw, Traffic: (Need by: TBD) rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560 (Papaul) a:BBlack→Papaul
[03:54:16] Operations, ops-codfw, Traffic: (Need by: TBD) rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560 (Papaul) Firmware upgrade on lvs2008 Before BIOS Version 1.3.7 iDRAC Firmware Version 3.15.17.15 After BIOS Version 2.4.8 iDRAC Firmware Version 4.00.00
[04:15:27] PROBLEM - rpki grafana alert on icinga1001 is CRITICAL: CRITICAL: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is alerting: RRDP status alert. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[04:43:59] RECOVERY - rpki grafana alert on icinga1001 is OK: OK: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is not alerting. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[05:15:45] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[05:17:57] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[05:53:41] RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All
[05:58:36] (PS2) Dzahn: installserver: add apt2001 to fail over servers for APT repo sync [puppet] - https://gerrit.wikimedia.org/r/575327
[05:58:40] !log apt2001 - signed puppet cert, initial run after OS install, rsyncing repo data, not in use yet
[05:58:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:18] (CR) Dzahn: [C: +2] installserver: add apt2001 to fail over servers for APT repo sync [puppet] - https://gerrit.wikimedia.org/r/575327 (owner: Dzahn)
[06:04:05] !log rsyncing APT repo and firmware data from install1002 to apt2001
[06:04:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:07:56] (Abandoned) Dzahn: install_server: allow rsyncing from active to replacement servers [puppet] - https://gerrit.wikimedia.org/r/569691 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[06:09:01] (PS2) Dzahn: site: add installserver::light role on new install servers [puppet] - https://gerrit.wikimedia.org/r/572394 (https://phabricator.wikimedia.org/T224576)
[06:09:11] (PS1) Dzahn: switch apt.wikimedia.org from install1002 to apt1001 [dns] - https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576)
[06:14:48] (PS1) Dzahn: access_new_install: remove superfluous lint-ignore and FIXME [puppet] - https://gerrit.wikimedia.org/r/575405
[06:25:37] !log marostegui@cumin1001 dbctl commit (dc=all): '75% of original weight to db1084 - T245621', diff saved to https://phabricator.wikimedia.org/P10549 and previous config saved to /var/cache/conftool/dbconfig/20200228-062536-marostegui.json
[06:25:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:42] T245621: db1084 crashed due to BBU failure - https://phabricator.wikimedia.org/T245621
[06:32:25] PROBLEM - rpki grafana alert on icinga1001 is CRITICAL: CRITICAL: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is alerting: RRDP status alert. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[06:40:37] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1084 - T245621', diff saved to https://phabricator.wikimedia.org/P10550 and previous config saved to /var/cache/conftool/dbconfig/20200228-064037-marostegui.json
[06:40:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:42] T245621: db1084 crashed due to BBU failure - https://phabricator.wikimedia.org/T245621
[06:41:11] Operations, DBA: db1084 crashed due to BBU failure - https://phabricator.wikimedia.org/T245621 (Marostegui) Open→Resolved Host fully repooled Thanks everyone!
[06:49:58] marostegui: it always warms my heart to see those !logs go by
[06:51:56] hahaha
[06:54:40] (CR) Muehlenhoff: [C: +1] "Patch looks good, but best to first test some package upgrades / apt-get update from a single server (like a codfw mw) by pointing apt.wik" [dns] - https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[07:03:22] Operations, Security-Team, User-jbond: Determine any impacts to SRE from OIT's planned move to JumpCloud for LDAP - https://phabricator.wikimedia.org/T244792 (MoritzMuehlenhoff) >>! In T244792#5925452, @HMarcus wrote: > Yes, we will plan on providing notice six weeks before the product goes live (ass...
[07:09:59] Operations, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): Ferm rules for labstore1004/1005 NFS hosts - https://phabricator.wikimedia.org/T165136 (MoritzMuehlenhoff) Awesome, thanks!
[07:10:41] PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 74, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:11:25] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 62, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:15:18] ACKNOWLEDGEMENT - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 62, down: 1, dormant: 0, excluded: 0, unused: 0: CDanis Planned maintenance PWIC105517 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:15:18] ACKNOWLEDGEMENT - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 74, down: 1, dormant: 0, excluded: 0, unused: 0: CDanis Planned maintenance PWIC105517 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:28:09] (PS1) ArielGlenn: Don't die if no real name is present for a user in LDAP cross-account check [puppet] - https://gerrit.wikimedia.org/r/575463
[07:28:53] (CR) Muehlenhoff: "See https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/575141/" [puppet] - https://gerrit.wikimedia.org/r/575463 (owner: ArielGlenn)
[07:31:00] !log installing gnutls28 bugfix update from Buster point release
[07:31:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:38:52] !log START warming wikidata term cache on db1126 for Q6-8 million T219123 (pass1 today)
[07:38:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:38:58] T219123: Migrate to and read from new store for item terms - https://phabricator.wikimedia.org/T219123
[07:40:48] (PS1) Muehlenhoff: Add Cumin alias for memcached gutter [puppet] - https://gerrit.wikimedia.org/r/575465
[07:41:13] (Abandoned) Muehlenhoff: Don't die if no real name is present for a user in LDAP cross-account check [puppet] - https://gerrit.wikimedia.org/r/575463 (owner: ArielGlenn)
[07:49:37] (CR) Muehlenhoff: [C: +2] Add Cumin alias for memcached gutter [puppet] - https://gerrit.wikimedia.org/r/575465 (owner: Muehlenhoff)
[08:00:04] Deploy window NO DEPLOYS (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200228T0800)
[08:05:35] !log installing systemd bugfix update from Buster point release
[08:05:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:33] !log START warming wikidata term cache on db1126 for Q6-8 million T219123 (pass2 today) (pass1 just finished)
[08:12:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:37] T219123: Migrate to and read from new store for item terms - https://phabricator.wikimedia.org/T219123
[08:22:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1087 to move labs hosts back under it', diff saved to https://phabricator.wikimedia.org/P10551 and previous config saved to /var/cache/conftool/dbconfig/20200228-082213-marostegui.json
[08:22:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:40] !log Stop db1087 and db2079 in sync
[08:22:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:17] Operations: Integrate Buster 10.3 point update - https://phabricator.wikimedia.org/T244693 (MoritzMuehlenhoff)
[08:24:28] (PS1) Giuseppe Lavagetto: Do not update the globals cache file while opcache needs regeneration [mediawiki-config] - https://gerrit.wikimedia.org/r/575469 (https://phabricator.wikimedia.org/T236104)
[08:24:51] !log installing cups updates from buster point release
[08:24:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:10] (PS1) Elukey: Move import_mediawiki_dumps timers from stat1007 to an-launcher1001 [puppet] - https://gerrit.wikimedia.org/r/575470 (https://phabricator.wikimedia.org/T243934)
[08:30:35] !log installing mariadb-10.3 update from buster point release (just client-side libs and tools, no mysqlds)
[08:30:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:13] !llog pool mw1262
[08:31:44] (PS2) Elukey: Move import_mediawiki_dumps timers from stat1007 to an-launcher1001 [puppet] - https://gerrit.wikimedia.org/r/575470 (https://phabricator.wikimedia.org/T243934)
[08:36:39] Operations, observability: Have monitoring of updatequerypages cronjobs - https://phabricator.wikimedia.org/T246097 (fgiunchedi) If we're moving those to systemd timers then abstractions in puppet will take care of setting up monitoring too
[08:38:23] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1003/21144/" [puppet] - https://gerrit.wikimedia.org/r/575470 (https://phabricator.wikimedia.org/T243934) (owner: Elukey)
[08:44:24] !log END warming wikidata term cache on db1126 for Q6-8 million T219123 (pass2 today)
[08:44:24] !log installing openssh updates from buster point release
[08:44:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:29] T219123: Migrate to and read from new store for item terms - https://phabricator.wikimedia.org/T219123
[08:44:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:35] (PS1) Muehlenhoff: Add Cumin alias for KDCs [puppet] - https://gerrit.wikimedia.org/r/575472
[09:13:28] RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 76, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:13:40] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 64, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:21:29] !log removed leftover labs prometheus target files from ops at prometheus1003, prometheus1004
[09:21:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:50] (CR) Elukey: [C: +2] Move import_mediawiki_dumps timers from stat1007 to an-launcher1001 [puppet] - https://gerrit.wikimedia.org/r/575470 (https://phabricator.wikimedia.org/T243934) (owner: Elukey)
[09:23:04] (CR) Muehlenhoff: [C: +2] Add Cumin alias for KDCs [puppet] - https://gerrit.wikimedia.org/r/575472 (owner: Muehlenhoff)
[09:24:53] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1087 after moving labs hosts back under it', diff saved to https://phabricator.wikimedia.org/P10553 and previous config saved to /var/cache/conftool/dbconfig/20200228-092453-marostegui.json
[09:24:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1087 into vslow,dump as it was there originally', diff saved to https://phabricator.wikimedia.org/P10554 and previous config saved to /var/cache/conftool/dbconfig/20200228-092631-marostegui.json
[09:26:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:03] (PS1) Effie Mouzeli: hieradata: send mw1276's apache logs to logstash [puppet] - https://gerrit.wikimedia.org/r/575474 (https://phabricator.wikimedia.org/T244472)
[09:29:10] Operations: Integrate Buster 10.3 point update - https://phabricator.wikimedia.org/T244693 (MoritzMuehlenhoff)
[09:29:25] Operations: Integrate Buster 10.3 point update - https://phabricator.wikimedia.org/T244693 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff This is complete
[09:30:48] Operations: Integrate Stretch 9.12 point update - https://phabricator.wikimedia.org/T244695 (MoritzMuehlenhoff)
[09:36:53] !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove db1101:3318 from vslow,dump', diff saved to https://phabricator.wikimedia.org/P10555 and previous config saved to /var/cache/conftool/dbconfig/20200228-093653-marostegui.json
[09:36:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:39] (CR) Effie Mouzeli: [C: +2] hieradata: send mw1276's apache logs to logstash [puppet] - https://gerrit.wikimedia.org/r/575474 (https://phabricator.wikimedia.org/T244472) (owner: Effie Mouzeli)
[09:50:17] (PS1) Elukey: Move import_wikidata_entities_dumps timers to an-launcher1001 [puppet] - https://gerrit.wikimedia.org/r/575476 (https://phabricator.wikimedia.org/T243934)
[09:54:19] (PS1) Marostegui: db1087: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/575477 (https://phabricator.wikimedia.org/T232446)
[09:55:36] (CR) Marostegui: [C: +2] db1087: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/575477 (https://phabricator.wikimedia.org/T232446) (owner: Marostegui)
[09:57:34] (PS1) Gergő Tisza: Switch Newcomer Tasks topic search to ORES-based on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/575478
[09:57:52] (CR) Alexandros Kosiaris: [C: -1] "A variety of inline comments, but overall this doesn't look in bad shape. I 'll get onto creating the namespaces/tokens for k8s" (10 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/557090 (https://phabricator.wikimedia.org/T238830) (owner: MSantos)
[09:58:58] (PS1) Jcrespo: prometheus-mysqld-exporter: Fix port of multisource host [puppet] - https://gerrit.wikimedia.org/r/575479
[09:59:44] !log starting rolling restart of elasticsearch/eqiad for JVM upgrade
[09:59:46] !log gehel@cumin1001 START - Cookbook sre.elasticsearch.rolling-restart
[09:59:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:00] (CR) Marostegui: [C: +1] prometheus-mysqld-exporter: Fix port of multisource host [puppet] - https://gerrit.wikimedia.org/r/575479 (owner: Jcrespo)
[10:01:50] (PS2) Jcrespo: prometheus-mysqld-exporter: Fix port of multisource host [puppet] - https://gerrit.wikimedia.org/r/575479
[10:01:59] !log gehel@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
[10:02:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:38] (PS2) Elukey: Move import_wikidata_entities_dumps timers to an-launcher1001 [puppet] - https://gerrit.wikimedia.org/r/575476 (https://phabricator.wikimedia.org/T243934)
[10:09:00] (CR) Alexandros Kosiaris: [C: -1] "I don't see yet how a new chart would help here. Have we tried the approach that andrew and me suggested about passing the configuration o" [deployment-charts] - https://gerrit.wikimedia.org/r/575108 (https://phabricator.wikimedia.org/T220399) (owner: Holger Knust)
[10:09:06] (PS1) Marostegui: mariadb: Move db1114 to s8 [puppet] - https://gerrit.wikimedia.org/r/575482 (https://phabricator.wikimedia.org/T242702)
[10:10:06] (PS1) Jcrespo: mariadb: Add example percona configuration for a core host [puppet] - https://gerrit.wikimedia.org/r/575483 (https://phabricator.wikimedia.org/T193224)
[10:10:20] (CR) Jcrespo: [C: +2] prometheus-mysqld-exporter: Fix port of multisource host [puppet] - https://gerrit.wikimedia.org/r/575479 (owner: Jcrespo)
[10:10:46] (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/21146/" [puppet] - https://gerrit.wikimedia.org/r/575476 (https://phabricator.wikimedia.org/T243934) (owner: Elukey)
[10:11:20] (CR) Marostegui: mariadb: Add example percona configuration for a core host (1 comment) [puppet] - https://gerrit.wikimedia.org/r/575483 (https://phabricator.wikimedia.org/T193224) (owner: Jcrespo)
[10:13:35] !log gehel@cumin1001 START - Cookbook sre.elasticsearch.rolling-restart
[10:13:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:21] (CR) Marostegui: [C: +2] mariadb: Move db1114 to s8 [puppet] - https://gerrit.wikimedia.org/r/575482 (https://phabricator.wikimedia.org/T242702) (owner: Marostegui)
[10:15:45] !log gehel@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
[10:15:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:44] Operations, Pybal, Traffic: Minor fixes in pybal checks - https://phabricator.wikimedia.org/T246431 (jijiki)
[10:21:50] (PS2) ArielGlenn: Move kowiki to xml dumps bigwikis list with appropriate settings [puppet] - https://gerrit.wikimedia.org/r/573528 (https://phabricator.wikimedia.org/T245721)
[10:22:13] (PS1) Muehlenhoff: Create two roles for the initial setup of a server [puppet] - https://gerrit.wikimedia.org/r/575485
[10:23:23] (PS1) Marostegui: Revert "mariadb: Move db1114 to s8" [puppet] - https://gerrit.wikimedia.org/r/575486
[10:24:03] (CR) ArielGlenn: [C: +2] Move kowiki to xml dumps bigwikis list with appropriate settings [puppet] - https://gerrit.wikimedia.org/r/573528 (https://phabricator.wikimedia.org/T245721) (owner: ArielGlenn)
[10:25:19] (CR) Marostegui: [C: +2] Revert "mariadb: Move db1114 to s8" [puppet] - https://gerrit.wikimedia.org/r/575486 (owner: Marostegui)
[10:27:29] !log gehel@cumin1001 START - Cookbook sre.elasticsearch.rolling-restart
[10:27:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:39] (PS2) Jcrespo: mariadb: Add example percona configuration for a core host [puppet] - https://gerrit.wikimedia.org/r/575483 (https://phabricator.wikimedia.org/T193224)
[10:39:41] (PS1) Jcrespo: prometheus: Update ops file names for labs mysqld exporter scrapping [puppet] - https://gerrit.wikimedia.org/r/575487
[10:41:05] (PS1) Elukey: role::analytics_cluster::launcher: add statistics xml dataset mounts [puppet] - https://gerrit.wikimedia.org/r/575488 (https://phabricator.wikimedia.org/T243934)
[10:43:12] (CR) Jcrespo: [C: +2] prometheus: Update ops file names for labs mysqld exporter scrapping [puppet] - https://gerrit.wikimedia.org/r/575487 (owner: Jcrespo)
[10:43:25] (PS2) Jcrespo: prometheus: Update ops file names for labs mysqld exporter scrapping [puppet] - https://gerrit.wikimedia.org/r/575487
[10:43:44] (CR) Marostegui: [C: +1] "Thanks for getting this fixed" [puppet] - https://gerrit.wikimedia.org/r/575487 (owner: Jcrespo)
[10:43:47] (CR) Jcrespo: [C: +2] mariadb: Add example percona configuration for a core host [puppet] - https://gerrit.wikimedia.org/r/575483 (https://phabricator.wikimedia.org/T193224) (owner: Jcrespo)
[10:44:01] (CR) Elukey: [C: +2] role::analytics_cluster::launcher: add statistics xml dataset mounts [puppet] - https://gerrit.wikimedia.org/r/575488 (https://phabricator.wikimedia.org/T243934) (owner: Elukey)
[10:44:39] jynus: feel free to merge my stuff
[10:44:53] I didn't proceed with yours
[10:44:54] done
[10:45:12] well, not finished yet
[10:45:28] now it finished :-)
[10:45:48] thanks!
[10:49:01] (PS1) Muehlenhoff: Remove role::prometheus::k8s in favour of including the profile [puppet] - https://gerrit.wikimedia.org/r/575491
[10:49:39] (PS1) Elukey: profile::statistics::dataset_mount: add an-launcher1001 [puppet] - https://gerrit.wikimedia.org/r/575492
[10:51:41] Have no idea what is wrong with yahoo/aol but I'm getting like 50 or 60 emails with yahoo/aol mailman errors
[10:52:06] (Mailing list commons-l)
[10:53:01] !log labsdb1009-12 prometheus metrics restored after 90 minutes of unscheduled unavailability
[10:53:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:53:10] (CR) Elukey: [C: +2] profile::statistics::dataset_mount: add an-launcher1001 [puppet] - https://gerrit.wikimedia.org/r/575492 (owner: Elukey)
[10:53:38] that was for mysqld-exporter, not host metrics
[10:56:29] (CR) Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/21148/" [puppet] - https://gerrit.wikimedia.org/r/575491 (owner: Muehlenhoff)
[10:58:25] Operations, DBA: Disable/remove unused features on Tendril - https://phabricator.wikimedia.org/T231185 (jcrespo)
[10:58:29] Operations, observability, Availability, Goal, Patch-For-Review: Setup bacula backup monitoring - https://phabricator.wikimedia.org/T234900 (jcrespo)
[10:58:31] Operations: Migrate dbmonitor hosts to Stretch/Buster - https://phabricator.wikimedia.org/T224589 (jcrespo)
[10:58:33] Operations: Add favicon to icinga and tendril -
https://phabricator.wikimedia.org/T204110 (10jcrespo) [10:58:39] 10Operations, 10DBA, 10observability: Display lag on grafana (prometheus) and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master - https://phabricator.wikimedia.org/T141968 (10jcrespo) [10:58:42] 10Operations, 10DBA, 10Traffic, 10Patch-For-Review: Framework to transfer files over the LAN - https://phabricator.wikimedia.org/T156462 (10jcrespo) [10:58:49] 10Operations, 10DBA, 10Privacy Engineering, 10Traffic, and 4 others: dbtree loads third party resources (from jquery.com and google.com) - https://phabricator.wikimedia.org/T96499 (10jcrespo) [10:59:56] 10Operations, 10Performance-Team, 10serviceops, 10Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (10aaron) >>! In T240684#5917524, @jijiki wrote: > > Secondly, regardless of when we use the gutter pool, do you thing we need to cle... [11:03:12] (03PS1) 10Muehlenhoff: Fix system role names for restbase [puppet] - 10https://gerrit.wikimedia.org/r/575494 [11:04:33] !log gehel@cumin1001 END (ERROR) - Cookbook sre.elasticsearch.rolling-restart (exit_code=97) [11:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:09:55] (03PS1) 10Jcrespo: mysql: Fix mysql server configuration for the percona flavour [puppet] - 10https://gerrit.wikimedia.org/r/575496 (https://phabricator.wikimedia.org/T193224) [11:10:19] (03CR) 10Jcrespo: "Sorry, I missed your comment on previous patch." 
[puppet] - 10https://gerrit.wikimedia.org/r/575496 (https://phabricator.wikimedia.org/T193224) (owner: 10Jcrespo) [11:11:25] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/21149/" [puppet] - 10https://gerrit.wikimedia.org/r/575494 (owner: 10Muehlenhoff) [11:12:37] (03CR) 10Filippo Giunchedi: "There's a role for each prometheus "instance" like analytics, presumably the system::role invocation there triggers the same false positiv" [puppet] - 10https://gerrit.wikimedia.org/r/575491 (owner: 10Muehlenhoff) [11:20:27] revi: not sure what the error is - but perhaps mailman tries to send an email on behalf of @yahoo.com address, which fails, because yahoo's policies prohibit anyone but yahoo to send mails from their addresses. [11:21:04] it was something similar to https://phab.wiki/238780 [11:22:19] then I would blame T232417 and it might well be on yahoo's side [11:22:19] T232417: mass Yahoo / AOL bounces mailman - https://phabricator.wikimedia.org/T232417 [11:24:58] (03CR) 10Muehlenhoff: "In the last run of the script only this one was triggered, could simply be caused by the order puppetdb returned the results. Followup pat" [puppet] - 10https://gerrit.wikimedia.org/r/575491 (owner: 10Muehlenhoff) [11:29:57] (03CR) 10Jcrespo: mysql: Fix mysql server configuration for the percona flavour (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/575496 (https://phabricator.wikimedia.org/T193224) (owner: 10Jcrespo) [11:35:00] (03CR) 10Alexandros Kosiaris: [C: 03+1] changeprop: add hierdata k8s entries [puppet] - 10https://gerrit.wikimedia.org/r/574811 (https://phabricator.wikimedia.org/T213193) (owner: 10Hnowlan) [11:35:48] (03CR) 10Alexandros Kosiaris: [C: 03+1] Admin: Add changeprop namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/574719 (https://phabricator.wikimedia.org/T213193) (owner: 10Hnowlan) [11:52:46] does anyone know if watchlist editing was touched by mw deployments this week? 
[11:53:27] I saw a reocurrence of T208003, but only once
[11:53:28] T208003: WatchedItemStore::addWatchBatchForUser does not have outer scope. - https://phabricator.wikimedia.org/T208003
[11:53:29] (PS1) Giuseppe Lavagetto: prometheus::ops: collect envoy stats from all servers [puppet] - https://gerrit.wikimedia.org/r/575504
[11:53:52] (CR) Hnowlan: [C: +2] changeprop: add hierdata k8s entries [puppet] - https://gerrit.wikimedia.org/r/574811 (https://phabricator.wikimedia.org/T213193) (owner: Hnowlan)
[11:54:25] (CR) Alexandros Kosiaris: [C: +2] Admin: Add changeprop namespace [deployment-charts] - https://gerrit.wikimedia.org/r/574719 (https://phabricator.wikimedia.org/T213193) (owner: Hnowlan)
[11:54:30] nah, it happens here and there before this week, so not deployment related
[11:55:12] (PS8) Alexandros Kosiaris: Admin: Add changeprop namespace [deployment-charts] - https://gerrit.wikimedia.org/r/574719 (https://phabricator.wikimedia.org/T213193) (owner: Hnowlan)
[11:55:14] (CR) Alexandros Kosiaris: [V: +2 C: +2] Admin: Add changeprop namespace [deployment-charts] - https://gerrit.wikimedia.org/r/574719 (https://phabricator.wikimedia.org/T213193) (owner: Hnowlan)
[11:57:27] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
[11:57:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:06:40] !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
[12:06:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:19] (CR) MarcoAurelio: [C: -1] "Per discussion in task. This group is meant to be empty most of the time as it's for temporary use." [mediawiki-config] - https://gerrit.wikimedia.org/r/575391 (https://phabricator.wikimedia.org/T237890) (owner: Jforrester)
[12:11:28] !log akosiaris@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
[12:11:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:44] (PS7) Holger Knust: changeprop: New helmfiles for deployment [deployment-charts] - https://gerrit.wikimedia.org/r/574094 (https://phabricator.wikimedia.org/T213193)
[12:29:22] (CR) Holger Knust: "Added Redis keys. Should I add the cpjobqueue files to this change or do we want a separate change for those?" [deployment-charts] - https://gerrit.wikimedia.org/r/574094 (https://phabricator.wikimedia.org/T213193) (owner: Holger Knust)
[12:48:59] Operations, Beta-Cluster-Infrastructure, observability, serviceops, Patch-For-Review: Stream a subset of mediawiki apache logs to logstash - https://phabricator.wikimedia.org/T244472 (jijiki)
[12:49:36] Operations, Beta-Cluster-Infrastructure, observability, serviceops, Patch-For-Review: Stream a subset of mediawiki apache logs to logstash - https://phabricator.wikimedia.org/T244472 (jijiki) Open→Resolved Thank you @herron and @fgiunchedi for your help!
[13:01:47] (CR) Alexandros Kosiaris: [C: +1] "> Should I add the cpjobqueue files to this change or do we want a separate change for those?" [deployment-charts] - https://gerrit.wikimedia.org/r/574094 (https://phabricator.wikimedia.org/T213193) (owner: Holger Knust)
[13:32:41] !log Reset idrac from db1114
[13:32:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:02] Operations, ops-eqiad, DC-Ops: db1114 IPMI unreachable - https://phabricator.wikimedia.org/T246441 (Marostegui)
[13:48:32] (CR) Krinkle: Do not update the globals cache file while opcache needs regeneration (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/575469 (https://phabricator.wikimedia.org/T236104) (owner: Giuseppe Lavagetto)
[13:52:07] (CR) Filippo Giunchedi: prometheus::ops: collect envoy stats from all servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/575504 (owner: Giuseppe Lavagetto)
[13:55:31] (CR) Filippo Giunchedi: [C: +1] "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/575491 (owner: Muehlenhoff)
[13:58:09] !log gehel@cumin1001 START - Cookbook sre.elasticsearch.rolling-restart
[13:58:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:09] !log elukey@deploy1001 Started deploy [analytics/refinery@2db36f4]: Fix refinery-drop-older-than script
[14:01:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:39] Operations, ops-eqiad, DC-Ops: db1114 IPMI unreachable - https://phabricator.wikimedia.org/T246441 (Marostegui) Open→Resolved a:Marostegui Looks like `racadm racreset hard` did the trick: ` root@cumin1001:~# sudo ipmitool -I lanplus -H db1114.mgmt.eqiad.wmnet -U root -E chassis power stat...
[14:06:03] Operations, ops-eqiad, DC-Ops: db1114 IPMI unreachable - https://phabricator.wikimedia.org/T246441 (jcrespo) Done, the remote password had gone async from the local ipmi one, setting the password again via idrac made it work again: ` sudo ipmitool -I lanplus -H db1114.mgmt.eqiad.wmnet -U root -E ch...
[14:08:19] (CR) Muehlenhoff: "> That's fair, let's go ahead and see if this fixes things or the other roles need fixing as well" [puppet] - https://gerrit.wikimedia.org/r/575491 (owner: Muehlenhoff)
[14:09:56] (CR) Holger Knust: "> There is one assumption in the above, it's that we won't need" [deployment-charts] - https://gerrit.wikimedia.org/r/575108 (https://phabricator.wikimedia.org/T220399) (owner: Holger Knust)
[14:10:36] !log marostegui@cumin1001 dbctl commit (dc=all): 'Increase weight from 100 to 300', diff saved to https://phabricator.wikimedia.org/P10558 and previous config saved to /var/cache/conftool/dbconfig/20200228-141035-marostegui.json
[14:10:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:10] !log elukey@deploy1001 Finished deploy [analytics/refinery@2db36f4]: Fix refinery-drop-older-than script (duration: 14m 01s)
[14:15:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:49] !log gehel@cumin1001 END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
[14:16:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:08] !log rolling restart of elasticsearch/eqiad for JVM upgrade completed
[14:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:44] (PS1) Muehlenhoff: Add new DNS entries for logstash-next plus the CAS counter parts [dns] - https://gerrit.wikimedia.org/r/575530
[14:40:45] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:41:08] Operations, Scap, serviceops: Make canary wait time configurable - https://phabricator.wikimedia.org/T217924 (jijiki) Great we merged this patch! Do we have a plan of how we will communicate this to the deployers when we release scap as well as how to test that it is all good in production? Thahnk you!
[14:42:31] Operations, Wikidata, Wikidata-Query-Service, User-Smalyshev, cloud-services-team (Kanban): Provide a way to have test servers on real hardware, isolated from production for Wikidata Query Service - https://phabricator.wikimedia.org/T206636 (Andrew)
[14:42:57] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:48:10] PROBLEM - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v2/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[14:49:06] RECOVERY - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[14:53:22] (PS1) Muehlenhoff: Add logstash-next IDP service [puppet] - https://gerrit.wikimedia.org/r/575536
[15:05:52] Operations, ORES, Scoring-platform-team (Current): ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (Halfak) I ran a test with gunicorn on my laptop and I was able to replicate this behavior. I started the se...
[15:10:57] (PS1) Andrew Bogott: Nova scheduler: disable the scheduling pool filter [puppet] - https://gerrit.wikimedia.org/r/575540 (https://phabricator.wikimedia.org/T226731)
[15:10:59] (PS1) Andrew Bogott: nova: remove the custom scheduler pool filter [puppet] - https://gerrit.wikimedia.org/r/575541 (https://phabricator.wikimedia.org/T226731)
[15:11:12] (CR) Herron: [C: +1] Add logstash-next IDP service [puppet] - https://gerrit.wikimedia.org/r/575536 (owner: Muehlenhoff)
[15:14:59] (PS1) Jgreen: add fran1001.frack.eqiad.wmnet [dns] - https://gerrit.wikimedia.org/r/575543 (https://phabricator.wikimedia.org/T245554)
[15:15:20] (CR) Andrew Bogott: [C: +2] Nova scheduler: disable the scheduling pool filter [puppet] - https://gerrit.wikimedia.org/r/575540 (https://phabricator.wikimedia.org/T226731) (owner: Andrew Bogott)
[15:19:07] (CR) Jgreen: [C: +2] add fran1001.frack.eqiad.wmnet [dns] - https://gerrit.wikimedia.org/r/575543 (https://phabricator.wikimedia.org/T245554) (owner: Jgreen)
[15:20:36] (CR) Giuseppe Lavagetto: [C: +1] "Two minor comments you can ignore or followup upon later, but LGTM" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/574862 (https://phabricator.wikimedia.org/T234854) (owner: Herron)
[15:21:43] Operations, ORES, Scoring-platform-team (Current): ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (Halfak) I was able to replicate the behavior with a very simple Flask application. app.py: ` $ cat app.py...
[15:21:56] Operations, ops-eqiad, DC-Ops, Patch-For-Review: (Need by: ASAP) rack/setup/install fran1001 - https://phabricator.wikimedia.org/T245554 (Jgreen)
[15:22:26] !log elukey@deploy1001 Started deploy [analytics/refinery@28fa2fc]: fix for refinery-drop-older-than - part 2
[15:22:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:23:33] Operations, Parsoid-PHP, SRE-Access-Requests, serviceops, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Arlolra) >>! In T245877#5925447, @Dzahn wrote: > @Arlolra Has been added to...
[15:24:32] (CR) Herron: [C: +2] "thanks!" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/574862 (https://phabricator.wikimedia.org/T234854) (owner: Herron)
[15:24:39] !log Stop replication on db1077 from db1111 (its master) - T246447
[15:24:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:44] T246447: Move db1111 from test-s4 to s8 - https://phabricator.wikimedia.org/T246447
[15:24:57] (PS2) Herron: add kibana-next service records [dns] - https://gerrit.wikimedia.org/r/574861
[15:25:50] (CR) Herron: [C: +2] add kibana-next service records [dns] - https://gerrit.wikimedia.org/r/574861 (owner: Herron)
[15:25:50] Operations, ops-eqiad, fundraising-tech-ops: (Need by: ASAP) rack/setup/install frnetmon1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T232137 (Jgreen) @cmjohnson @Jclark-ctr could someone do the final bits on this task so we can finish the deploy? This host arrived nearly 6 months ago!
[15:30:28] (PS1) Marostegui: mariadb: Move db1111 into s8 [puppet] - https://gerrit.wikimedia.org/r/575550 (https://phabricator.wikimedia.org/T246447)
[15:34:10] (CR) Giuseppe Lavagetto: Do not update the globals cache file while opcache needs regeneration (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/575469 (https://phabricator.wikimedia.org/T236104) (owner: Giuseppe Lavagetto)
[15:34:24] (CR) Marostegui: [C: +2] mariadb: Move db1111 into s8 [puppet] - https://gerrit.wikimedia.org/r/575550 (https://phabricator.wikimedia.org/T246447) (owner: Marostegui)
[15:36:06] !log elukey@deploy1001 Finished deploy [analytics/refinery@28fa2fc]: fix for refinery-drop-older-than - part 2 (duration: 13m 40s)
[15:36:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:39:16] !log installing libperl4-corelibs-perl updates from Stretch point release
[15:39:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:35] Operations: Integrate Stretch 9.12 point update - https://phabricator.wikimedia.org/T244695 (MoritzMuehlenhoff)
[15:42:33] PROBLEM - Confd template for /srv/config-master/pybal/eqiad/kibana on puppetmaster2001 is CRITICAL: Compilation of file /srv/config-master/pybal/eqiad/kibana is broken https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[15:43:23] PROBLEM - Confd template for /srv/config-master/pybal/codfw/kibana on puppetmaster2001 is CRITICAL: Compilation of file /srv/config-master/pybal/codfw/kibana is broken https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[15:45:55] herron: fyi ^
[15:46:42] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:46:45] rlazarus: thx
[15:49:02] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:51:31] Jdlrobson: hi, NearbyPages just migrated to gerrit as requested
[15:51:50] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime
[15:51:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:52:05] (Abandoned) Effie Mouzeli: install_server: use raid1-gpt-lvm-ext4-srv.cfg recipe for mw* [puppet] - https://gerrit.wikimedia.org/r/553095 (https://phabricator.wikimedia.org/T156955) (owner: Effie Mouzeli)
[15:54:02] <_joe_> herron: I'm running puppet on 1001 so I can see what changes
[15:54:09] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[15:54:10] <_joe_> I think I know where I messed up
[15:54:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:12] ok
[15:54:24] <_joe_> can you please check nothing happens on a lvs server?
[15:54:37] <_joe_> this is absolutely not impacting production right now
[15:54:59] sure I'll disable puppet on lvs* then?
[15:55:05] Operations, MediaWiki-General, observability, serviceops: MediaWiki Prometheus support - https://phabricator.wikimedia.org/T240685 (colewhite) As I think about it more, it's the wire format being wholly incompatible with Prometheus format. In order to make it work, StatsD requires a lot of confi...
[15:55:52] disabled puppet
[15:56:00] <_joe_> herron: run it on lvs1016
[15:56:05] ok
[15:56:13] <_joe_> sorry, this is codfw right?
[15:56:17] <_joe_> 2006 then
[15:56:21] PROBLEM - Confd template for /srv/config-master/pybal/eqiad/kibana on puppetmaster1001 is CRITICAL: Compilation of file /srv/config-master/pybal/eqiad/kibana is broken https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[15:56:24] <_joe_> but I found my error
[15:56:27] codfw is in a state of flux, the numbers aren't like they used to be
[15:56:43] <_joe_> bblack: what do you mean?
[15:56:46] <_joe_> oh lvs servers
[15:56:47] vgutierrez: knows the current state of affairs, but lvs2001-6 are in the middle of being replaced by lvs2007-2010
[15:57:08] <_joe_> well they still have the puppet classes applied, right?
[15:57:33] yes, I just don't know off the top of my head which are in which traffic-classes and/or live and/or decommed
[15:57:38] <_joe_> herron: I think I know what your poblem is
[15:57:42] I thin k2003/6 are decommed already or soon-to-ne
[15:57:43] <_joe_> Feb 28 15:57:22 puppetmaster1001 confd[14925]: 2020-02-28T15:57:22Z puppetmaster1001 /usr/bin/confd[14925]: ERROR "updating error mtime on /var/run/confd-template/.kibana-next261629901.err\nfailed linting '/usr/local/bin/pybal-eval-check /srv/config-master/pybal/eqiad/.kibana-next261629901' with 1 (0.021476984024s) [invalid]: server pool cannot be empty!\n\n"
[15:57:52] and 2009/10 are online
[15:58:14] (10 being a universal backup like lvs1016, and 09 being line lvs1015)
[15:58:25] bblack: lvs2003 and lvs2006 are already decom
[16:00:26] indeed
[16:00:44] (PS1) Marostegui: db1111: Reimage db1111 as buster [puppet] - https://gerrit.wikimedia.org/r/575561 (https://phabricator.wikimedia.org/T246447)
[16:01:57] lsvs2003 and lvs2006 have been decommed, I'm waiting on lvs2007 and lvs2008 to get online to get rid of the others
[16:02:13] besides that, lvs @ ulsfo are already running buster, same as lvs5003
[16:02:35] and that's the picture reported here: https://phabricator.wikimedia.org/T245984
[16:03:34] PROBLEM - Confd template for /srv/config-master/pybal/eqiad/kibana-next on puppetmaster2001 is CRITICAL: File not found: /srv/config-master/pybal/eqiad/kibana-next https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[16:04:17] Operations, Parsoid-PHP, SRE-Access-Requests, serviceops, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Sbailey) Greg, do you really want me to post my public key in this form? or...
[16:04:20] Operations, Scap, serviceops: Make canary wait time configurable - https://phabricator.wikimedia.org/T217924 (LarsWirzenius) I had a discussion with Tyler just now. We plan the following: * add to docs/ in the source tree; this ends up in doc.wikimedia.org * add to debian/changelog in the source tre...
[16:05:09] (PS1) Andrew Bogott: nfs: remove dumps and scratch mounts for the fastcci project. [puppet] - https://gerrit.wikimedia.org/r/575562 (https://phabricator.wikimedia.org/T208404)
[16:05:20] !log oblivian@puppetmaster1001 conftool action : set/pooled=yes:weight=1; selector: cluster=kibana,service=kibana-next
[16:05:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:48] _joe_: as a quick mapping, lvs2003 has been replaced by lvs2009 and lvs2006 by lvs2010 (and this is actually playing the role of universal secondary LVS @ codfw)
[16:09:58] Operations, Scap, serviceops: Make canary wait time configurable - https://phabricator.wikimedia.org/T217924 (jijiki) >>! In T217924#5926912, @LarsWirzenius wrote: > I had a discussion with Tyler just now. We plan the following: > > * add to docs/ in the source tree; this ends up in doc.wikimedia.or...
[16:10:14] (CR) Marostegui: [C: +2] db1111: Reimage db1111 as buster [puppet] - https://gerrit.wikimedia.org/r/575561 (https://phabricator.wikimedia.org/T246447) (owner: Marostegui)
[16:15:32] Operations, Parsoid-PHP, SRE-Access-Requests, serviceops, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Sbailey) Subbu says yes to here it is: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABA...
[16:15:57] (CR) Andrew Bogott: [C: +2] nfs: remove dumps and scratch mounts for the fastcci project. [puppet] - https://gerrit.wikimedia.org/r/575562 (https://phabricator.wikimedia.org/T208404) (owner: Andrew Bogott)
[16:16:18] (PS1) Papaul: DHCP: Add MAC address for lvs200[7-8] [puppet] - https://gerrit.wikimedia.org/r/575565 (https://phabricator.wikimedia.org/T196560)
[16:17:43] PROBLEM - Confd template for /srv/config-master/pybal/codfw/kibana on puppetmaster1001 is CRITICAL: Compilation of file /srv/config-master/pybal/codfw/kibana is broken https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[16:22:52] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime
[16:22:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:09] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:25:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:54] RECOVERY - Confd template for /srv/config-master/pybal/eqiad/kibana on puppetmaster2001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[16:26:54] RECOVERY - Confd template for /srv/config-master/pybal/codfw/kibana on puppetmaster2001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[16:26:54] RECOVERY - Confd template for /srv/config-master/pybal/eqiad/kibana-next on puppetmaster2001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[16:30:14] RECOVERY - Confd template for /srv/config-master/pybal/codfw/kibana on puppetmaster1001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[16:30:14] RECOVERY - Confd template for /srv/config-master/pybal/eqiad/kibana on puppetmaster1001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[16:30:39] Operations, Security-Team, User-jbond: Determine any impacts to SRE from OIT's planned move to JumpCloud for LDAP - https://phabricator.wikimedia.org/T244792 (HMarcus) We are already running a fully functional sandbox environment (free for up to 10 users) that we have been using for testing. I will s...
[16:32:16] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[16:33:21] (CR) Papaul: [C: +1] site: add second batch of new eqiad appservers as spares, by rack [puppet] - https://gerrit.wikimedia.org/r/575382 (https://phabricator.wikimedia.org/T241849) (owner: Dzahn)
[16:34:12] (PS1) Marostegui: mariadb: Place db1111 into s8 eqiad [puppet] - https://gerrit.wikimedia.org/r/575566
[16:35:45] (CR) Marostegui: [C: +2] mariadb: Place db1111 into s8 eqiad [puppet] - https://gerrit.wikimedia.org/r/575566 (owner: Marostegui)
[16:36:03] Operations, ORES, Scoring-platform-team (Current): ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (Halfak) @ACraze and I were discussing this and we were wondering if maybe it's time to try threads. In some s...
[16:38:39] Operations, MediaWiki-General, observability, serviceops: MediaWiki Prometheus support - https://phabricator.wikimedia.org/T240685 (colewhite) In response to @Joe's concerns: > What happens if redis is overwhelmed/down? How can we control timeouts? The library has a configurable timeout with a...
[16:39:32] Operations, serviceops, Patch-For-Review, Performance-Team (Radar), and 2 others: Upgrade memcached for Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (jijiki) Open→Stalled This is stalled until we completely stop using redis and we have put the gutter pool in production
[16:39:34] Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (jijiki)
[16:39:36] Operations, serviceops: Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T244852 (jijiki)
[16:42:18] Operations, ops-codfw, Traffic, Patch-For-Review: (Need by: TBD) rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560 (Papaul)
[16:44:16] Operations, ops-codfw, Traffic: (OoW) lvs2006 crashed into (what it seems) an unrecoverable state - https://phabricator.wikimedia.org/T209337 (Papaul) Stalled→Declined Server is decommissioned in T246329
[16:44:57] Operations, ops-codfw, DC-Ops, Traffic: (OoW) lvs2006 Embedded Flash/SD-CARD iLO errors - https://phabricator.wikimedia.org/T192082 (Papaul) Open→Declined Server is decommissioned in T246329
[16:49:40] Operations, ORES, Scoring-platform-team (Current): ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (Halfak) We also talked about https://medium.com/@pgjones/quart-a-asyncio-alternative-to-flask-32666ae2abb0 - h...
[16:49:48] Operations, ops-eqiad, fundraising-tech-ops: (Need by: ASAP) rack/setup/install frnetmon1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T232137 (Jclark-ctr)
[16:50:34] Operations, ops-eqiad, fundraising-tech-ops: (Need by: ASAP) rack/setup/install frnetmon1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T232137 (Jclark-ctr) configured bios set password @Jgreen
[16:54:14] Operations, ops-codfw, fundraising-tech-ops, Patch-For-Review: (Need by: TBD) rack/setup/install frpm2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242269 (Papaul) a:Papaul
[17:01:31] (PS4) SBassett: Deployment group audit [puppet] - https://gerrit.wikimedia.org/r/574869 (https://phabricator.wikimedia.org/T237696)
[17:01:57] (CR) jerkins-bot: [V: -1] Deployment group audit [puppet] - https://gerrit.wikimedia.org/r/574869 (https://phabricator.wikimedia.org/T237696) (owner: SBassett)
[17:05:34] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:06:03] (CR) MarcoAurelio: "Shouldn't removed members (people which previously had access but now they ain't having it) be added to the 'absent' group on top of the f" [puppet] - https://gerrit.wikimedia.org/r/574869 (https://phabricator.wikimedia.org/T237696) (owner: SBassett)
[17:06:43] (PS5) SBassett: Deployment group audit [puppet] - https://gerrit.wikimedia.org/r/574869 (https://phabricator.wikimedia.org/T237696)
[17:07:36] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:07:40] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[17:08:55] (CR) SBassett: "> Shouldn't removed members (people which previously had access but now they ain't having it) be added to the 'absent' group on top of the" [puppet] - https://gerrit.wikimedia.org/r/574869 (https://phabricator.wikimedia.org/T237696) (owner: SBassett)
[17:11:38] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[17:15:52] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[17:15:56] (PS2) Dzahn: site: add second batch of new eqiad appservers as spares, by rack [puppet] - https://gerrit.wikimedia.org/r/575382 (https://phabricator.wikimedia.org/T241849)
[17:20:16] (PS1) CRusnov: Edit Project Config [software/netbox] (refs/meta/config) - https://gerrit.wikimedia.org/r/575572
[17:20:28] (PS1) Elukey: profile::analytics::refinery::job::refine: fix MobileWebMainMenuClickTracking blacklist [puppet] - https://gerrit.wikimedia.org/r/575573
[17:21:06] (Abandoned) CRusnov: Edit Project Config [software/netbox] (refs/meta/config) - https://gerrit.wikimedia.org/r/575572 (owner: CRusnov)
[17:22:17] Operations, ops-eqiad, fundraising-tech-ops: (Need by: ASAP) rack/setup/install frnetmon1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T232137 (RobH) So I'm not familiar with the frack vlans and bonding setup for interfaces. However, if someone can point out a server this should duplicate...
[17:22:54] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:23:39] (CR) jerkins-bot: [V: -1] profile::analytics::refinery::job::refine: fix MobileWebMainMenuClickTracking blacklist [puppet] - https://gerrit.wikimedia.org/r/575573 (owner: Elukey)
[17:25:04] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:25:15] (PS2) Elukey: profile::analytics::refinery::job::refine: fix el blacklist [puppet] - https://gerrit.wikimedia.org/r/575573
[17:32:14] RECOVERY - rpki grafana alert on icinga1001 is OK: OK: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is not alerting. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[17:33:25] Operations, ops-eqiad, fundraising-tech-ops: (Need by: ASAP) rack/setup/install frnetmon1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T232137 (RobH) Ok, updated the switch (Thanks Arzhel) and put things into the admin vlan to match dns that was already setup. This should be good to from...
[17:33:34] Operations, ops-eqiad, fundraising-tech-ops: (Need by: ASAP) rack/setup/install frnetmon1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T232137 (RobH)
[17:35:40] Operations, ops-eqiad, fundraising-tech-ops: (Need by: ASAP) rack/setup/install frnetmon1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T232137 (Jclark-ctr) a:Jgreen
[17:40:57] Operations, Core Platform Team, DC-Ops, serviceops: Rename wtp* servers to parsoid* (Parsoid PHP servers) - https://phabricator.wikimedia.org/T245888 (jijiki)
[17:42:41] Operations, Core Platform Team, DC-Ops, serviceops: Rename wtp* servers to parsoid* (Parsoid PHP servers) - https://phabricator.wikimedia.org/T245888 (Dzahn) let's rename the ticket to parse* to match what is in DNS and the wiki page, ok?
[17:52:48] (PS1) CRusnov: Update Netbox to v2.7.8 [software/netbox-deploy] - https://gerrit.wikimedia.org/r/575580
[17:53:02] (CR) CRusnov: "This change is ready for review." [software/netbox-deploy] - https://gerrit.wikimedia.org/r/575580 (owner: CRusnov)
[17:57:01] (PS1) ArielGlenn: cleanup of page content dumps run() [dumps] - https://gerrit.wikimedia.org/r/575581 (https://phabricator.wikimedia.org/T246465)
[18:00:09] (PS1) ArielGlenn: fix up file list methods [dumps] - https://gerrit.wikimedia.org/r/575584 (https://phabricator.wikimedia.org/T246465)
[18:00:43] why yes I am going to push through about 7 more commits that I realized have been stacking up... but one at a time to let jenkins whine if it wants to
[18:02:49] (PS1) ArielGlenn: convert all file list methods to use common args [dumps] - https://gerrit.wikimedia.org/r/575585 (https://phabricator.wikimedia.org/T246465)
[18:03:07] Operations, Core Platform Team, DC-Ops, serviceops: Rename wtp* servers to parse* (Parsoid PHP servers) - https://phabricator.wikimedia.org/T245888 (jijiki)
[18:03:19] Operations, Core Platform Team, DC-Ops, serviceops: Rename wtp* servers to parse* (Parsoid PHP servers) - https://phabricator.wikimedia.org/T245888 (jijiki) done, sigh
[18:03:40] (PS1) ArielGlenn: move StubProvider out to its own module [dumps] - https://gerrit.wikimedia.org/r/575586 (https://phabricator.wikimedia.org/T246465)
[18:04:59] (PS1) ArielGlenn: move some dfname/pagerange munging methods to their own class [dumps] - https://gerrit.wikimedia.org/r/575587 (https://phabricator.wikimedia.org/T246465)
[18:06:53] (PS1) ArielGlenn: move some output file listing methods to their own module [dumps] - https://gerrit.wikimedia.org/r/575588 (https://phabricator.wikimedia.org/T246465)
[18:07:12] (CR) jerkins-bot: [V: -1] move some output file listing methods to their own module [dumps] - https://gerrit.wikimedia.org/r/575588 (https://phabricator.wikimedia.org/T246465) (owner: ArielGlenn)
[18:08:28] PROBLEM - BFD status on cr3-knams is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[18:08:38] PROBLEM - BFD status on cr1-eqiad is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[18:09:16] PROBLEM - OSPF status on cr3-knams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[18:10:02] PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 4/6 UP : OSPFv3: 4/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[18:10:08] PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[18:10:33] (PS2) ArielGlenn: move some output file listing methods to their own module [dumps] - https://gerrit.wikimedia.org/r/575588 (https://phabricator.wikimedia.org/T246465)
[18:13:13] (PS1) ArielGlenn: use only jobFileLister instance methods in other modules [dumps] - https://gerrit.wikimedia.org/r/575589 (https://phabricator.wikimedia.org/T246465)
[18:14:25] (PS1) ArielGlenn: add some unit tests for prefetch arg generation [dumps] - https://gerrit.wikimedia.org/r/575591 (https://phabricator.wikimedia.org/T246465)
[18:15:06] (CR) jerkins-bot: [V: -1] add some unit tests for prefetch arg generation [dumps] - https://gerrit.wikimedia.org/r/575591 (https://phabricator.wikimedia.org/T246465) (owner: ArielGlenn)
[18:18:38] (PS3) Dwisehaupt: Add IPs for new frack hosts: civi2001, frpm2001 [dns] - https://gerrit.wikimedia.org/r/574097 (https://phabricator.wikimedia.org/T242270)
[18:21:19] Operations, ops-eqiad, DC-Ops: (Need by: ASAP) rack/setup/install new GPU host - https://phabricator.wikimedia.org/T246472 (RobH) p:Triage→High
[18:21:43] (PS2) ArielGlenn: add some unit tests for prefetch arg generation [dumps] - https://gerrit.wikimedia.org/r/575591 (https://phabricator.wikimedia.org/T246465)
[18:21:48] Operations, ops-eqiad, DC-Ops: (Need by: ASAP) rack/setup/install new GPU host - https://phabricator.wikimedia.org/T246472 (RobH)
[18:23:15] Operations, ops-eqiad, DC-Ops: (Need by: ASAP) rack/setup/install new GPU host - https://phabricator.wikimedia.org/T246472 (RobH)
[18:23:36] that ends my gerrit spam for today. :-P
[18:24:00] Operations, ops-eqiad, DC-Ops: (Need by: ASAP) rack/setup/install new GPU host - https://phabricator.wikimedia.org/T246472 (RobH) @elukey, Can you provide the info in the 'Hostname / Racking / Installation Details' section in the task body, and then reassign this from yourself to @Jclark-ctr? Thank...
[18:24:36] (CR) Dzahn: [C: +2] Add IPs for new frack hosts: civi2001, frpm2001 [dns] - https://gerrit.wikimedia.org/r/574097 (https://phabricator.wikimedia.org/T242270) (owner: Dwisehaupt)
[18:27:02] Operations, Core Platform Team, DC-Ops, serviceops: Rename wtp* servers to parse* (Parsoid PHP servers) - https://phabricator.wikimedia.org/T245888 (Dzahn) linking ticket and gerrit change that adds the _new_ parse* servers to go with this: T243112 https://gerrit.wikimedia.org/r/c/operations/pup...
[18:32:27] (PS3) Dzahn: site: add second batch of new eqiad appservers as spares, by rack [puppet] - https://gerrit.wikimedia.org/r/575382 (https://phabricator.wikimedia.org/T241849)
[18:37:27] (PS4) Dzahn: site: add second batch of new eqiad appservers as spares, by rack [puppet] - https://gerrit.wikimedia.org/r/575382 (https://phabricator.wikimedia.org/T241849)
[18:41:36] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:43:28] (PS1) Andrew Bogott: profile::wmcs::nfs::backup::primary::base: include a standard firewall [puppet] - https://gerrit.wikimedia.org/r/575594 (https://phabricator.wikimedia.org/T245808)
[18:45:44] (CR) Jforrester: [C: +1] "This feels like it would (a) work and (b) not be totally terrible in impacts." [mediawiki-config] - https://gerrit.wikimedia.org/r/575469 (https://phabricator.wikimedia.org/T236104) (owner: Giuseppe Lavagetto)
[18:47:54] !log milimetric@deploy1001 Started deploy [analytics/refinery@0fc392f]: Hotfix: going back to a safe version of geo udf
[18:47:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:12] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:51:34] RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[18:51:44] RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[18:52:18] RECOVERY - BFD status on cr3-knams is OK: OK: UP: 8 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[18:52:18] RECOVERY - BFD status on cr1-eqiad is OK: OK: UP: 11 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[18:53:04] RECOVERY - OSPF status on cr3-knams is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[18:59:07] (PS1) Herron: check_confd_template: glob fixup and add detail to alerts [puppet] - https://gerrit.wikimedia.org/r/575598
[19:01:01] !log milimetric@deploy1001 Finished deploy [analytics/refinery@0fc392f]: Hotfix: going back to a safe version of geo udf (duration: 13m 06s)
[19:01:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:01:14] !log milimetric@deploy1001 Started deploy [analytics/refinery@0fc392f] (thin): Hotfix: going back to a safe version of geo udf
[19:01:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:01:21] !log milimetric@deploy1001 Finished deploy [analytics/refinery@0fc392f] (thin): Hotfix: going back to a safe version of geo udf (duration: 00m 07s)
[19:01:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:03:04] Operations, ops-eqiad, DC-Ops: (Need by: ASAP) rack/setup/install stat1008 - https://phabricator.wikimedia.org/T246472 (RobH)
[19:05:40] (CR) Dzahn: [C: +2] site: add second batch of new eqiad appservers as spares, by rack [puppet] - https://gerrit.wikimedia.org/r/575382 (https://phabricator.wikimedia.org/T241849) (owner: Dzahn)
[19:06:12] Operations, ops-eqiad, DC-Ops: (Need by: ASAP) rack/setup/install stat1008 - https://phabricator.wikimedia.org/T246472 (RobH)
[19:10:49] Operations, Parsoid-PHP, SRE-Access-Requests, serviceops, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) Thanks @Sbailey yes, that's correct. We actually want it on the ticke...
[19:11:03] Operations, Parsoid-PHP, SRE-Access-Requests, serviceops, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) a:Sbailey→Dzahn
[19:15:12] (PS1) Dzahn: admins: upgrade Shannon Bailey from ldap_only to shell, parsoid-test-root [puppet] - https://gerrit.wikimedia.org/r/575599 (https://phabricator.wikimedia.org/T245877)
[19:19:40] Operations, ops-codfw, fundraising-tech-ops: (Need by: TBD) codfw: rack/setup/install 3 new payments server for frack - https://phabricator.wikimedia.org/T244169 (Jgreen)
[19:33:44] (CR) Papaul: [C: +2] DHCP: Add MAC address for lvs200[7-8] [puppet] - https://gerrit.wikimedia.org/r/575565 (https://phabricator.wikimedia.org/T196560) (owner: Papaul)
[19:33:57] (PS2) Papaul: DHCP: Add MAC address for lvs200[7-8] [puppet] - https://gerrit.wikimedia.org/r/575565 (https://phabricator.wikimedia.org/T196560)
[19:34:02] (CR) Papaul: [V: +2 C: +2] DHCP: Add MAC address for lvs200[7-8] [puppet] - https://gerrit.wikimedia.org/r/575565 (https://phabricator.wikimedia.org/T196560) (owner: Papaul)
[19:39:23] (PS1) CRusnov: netbox: Add framework for exposing scripts to internal services [puppet] - https://gerrit.wikimedia.org/r/575603
[19:40:23] (CR) jerkins-bot: [V: -1] netbox: Add framework for exposing scripts to internal services [puppet] - https://gerrit.wikimedia.org/r/575603 (owner: CRusnov)
[19:42:56] Operations, ops-codfw, fundraising-tech-ops: (Need by: TBD) codfw: rack/setup/install 3 new payments server for frack - https://phabricator.wikimedia.org/T244169 (Jgreen)
[19:49:02] (PS2) CRusnov: netbox: Add framework for exposing scripts to internal services [puppet] - https://gerrit.wikimedia.org/r/575603 (https://phabricator.wikimedia.org/T243927)
[19:52:03] (CR) jerkins-bot: [V: -1] netbox: Add framework for exposing scripts to internal services [puppet] - https://gerrit.wikimedia.org/r/575603 (https://phabricator.wikimedia.org/T243927) (owner: CRusnov)
[19:52:24] Operations, ops-codfw, Traffic, netops: switch port configuration for lvs200[7-10] - https://phabricator.wikimedia.org/T196946 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` lvs2008.codfw.wmnet ` The log can be found in `/var/log/wmf-auto...
[19:56:02] (PS3) CRusnov: netbox: Add framework for exposing scripts to internal services [puppet] - https://gerrit.wikimedia.org/r/575603 (https://phabricator.wikimedia.org/T243927)
[20:04:43] (CR) Dzahn: [C: +2] "confirmed L3 has been signed, requested by manager (Subbu), confirmed manager and full time employee status on corp-LDAP, only test server" [puppet] - https://gerrit.wikimedia.org/r/575599 (https://phabricator.wikimedia.org/T245877) (owner: Dzahn)
[20:14:36] Operations, Jade, TechCom, Core Platform Team Legacy (Watching / External), and 4 others: Deploy pilot of Jade to a small set of wikis. - https://phabricator.wikimedia.org/T183381 (Halfak)
[20:21:30] Operations, Parsoid-PHP, SRE-Access-Requests, serviceops, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) Open→Resolved @Sbailey You now have root access to the parsoi...
[20:22:28] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:22:39] Operations, Parsoid-PHP, SRE-Access-Requests, serviceops, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) @Sbailey Here you see an example SSH config for jumping via bastion h...
[20:23:16] jouncebot: now
[20:23:16] For the next 11 hour(s) and 36 minute(s): NO DEPLOYS (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200228T0800)
[20:24:40] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:24:44] fatals spike without deploy but going down and within last 24hour view there were larger ones
[20:26:56] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime
[20:26:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:27:46] (Abandoned) Dzahn: admins: add Shannon Bailey to parsoid-test groups, upgrade to shell user [puppet] - https://gerrit.wikimedia.org/r/575097 (https://phabricator.wikimedia.org/T245877) (owner: Dzahn)
[20:29:11] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[20:29:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:24] (PS1) Sharvaniharan: Enabling depicts count [mediawiki-config] - https://gerrit.wikimedia.org/r/575611
[20:34:37] Operations, ops-codfw, Traffic, netops: switch port configuration for lvs200[7-10] - https://phabricator.wikimedia.org/T196946 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['lvs2008.codfw.wmnet'] ` and were **ALL** successful.
[20:35:41] Operations, ops-codfw, Traffic, netops: switch port configuration for lvs200[7-10] - https://phabricator.wikimedia.org/T196946 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` lvs2007.codfw.wmnet ` The log can be found in `/var/log/wmf-auto...
[20:37:09] (PS1) Holger Knust: kafka-dev: Updated API endpoint and added required selector [deployment-charts] - https://gerrit.wikimedia.org/r/575612
[20:37:46] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:38:46] PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 4/6 UP : OSPFv3: 4/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[20:39:05] (PS2) Dzahn: upgrade cscott, arlolra from parsoid-test-admins to parsoid-test-roots [puppet] - https://gerrit.wikimedia.org/r/575386
[20:39:06] PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[20:39:30] PROBLEM - BFD status on cr1-eqiad is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[20:39:42] PROBLEM - BFD status on cr3-knams is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[20:39:58] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:40:26] PROBLEM - OSPF status on cr3-knams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[20:40:30] (PS1) Herron: logstash: filter-syslog: match http_method on DATA [puppet] - https://gerrit.wikimedia.org/r/575614
[20:42:36] (CR) Holger Knust: "This is prep for https://phabricator.wikimedia.org/T245803 "Make changeprop chart depend on Kafka-dev for minikube". The API changed in v1" [deployment-charts] - https://gerrit.wikimedia.org/r/575612 (owner: Holger Knust)
[20:43:23] (CR) Effie Mouzeli: [C: +1] "That will cover our weird looking pybal test" [puppet] - https://gerrit.wikimedia.org/r/575614 (owner: Herron)
[20:43:40] Operations, ops-codfw, Traffic, netops: switch port configuration for lvs200[7-10] - https://phabricator.wikimedia.org/T196946 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['lvs2007.codfw.wmnet'] ` Of which those **FAILED**: ` ['lvs2007.codfw.wmnet'] `
[20:45:17] Operations, ops-codfw, Traffic: (Need by: TBD) rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560 (Papaul) @Vgutierrez lvs2008 is ready for service will can on lvs2007 on Monday
[20:47:01] (CR) Dzahn: [C: +2] upgrade cscott, arlolra from parsoid-test-admins to parsoid-test-roots [puppet] - https://gerrit.wikimedia.org/r/575386 (owner: Dzahn)
[20:48:48] Operations, Parsoid-PHP, SRE-Access-Requests, serviceops, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) @arlolra and @cscott You also have root on parsoid::testing (scandium...
[20:54:01] Operations, Performance-Team, serviceops, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (jijiki) >>! In T240684#5925272, @aaron wrote: > Yeah, the eviction ("tko") delay should be low to avoid prolonging DB traffic spike...
[21:08:31] Operations, ops-codfw, fundraising-tech-ops: (Need by: TBD) codfw: rack/setup/install 3 new payments server for frack - https://phabricator.wikimedia.org/T244169 (Jgreen) We're seeing an odd warning/error about bond0/eno2 on the new payments2003: [ 13.843817] bond0: invalid new link 3 on slave eno...
[21:10:18] RECOVERY - BFD status on cr3-knams is OK: OK: UP: 8 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[21:10:38] RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[21:10:52] RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[21:11:16] RECOVERY - OSPF status on cr3-knams is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[21:11:56] Operations, ops-codfw, fundraising-tech-ops: new payments2003 bonded ethernet network error/warning - https://phabricator.wikimedia.org/T246492 (Jgreen)
[21:11:56] RECOVERY - BFD status on cr1-eqiad is OK: OK: UP: 11 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[21:24:21] (CR) Ppchelko: "Ok, I've been poking this for a while and I tend to agree with Holger - the 2 installations are different enough that splitting them into " [deployment-charts] - https://gerrit.wikimedia.org/r/575108 (https://phabricator.wikimedia.org/T220399) (owner: Holger Knust)
[21:25:16] Operations, ops-codfw, fundraising-tech-ops: new payments2003 bonded ethernet network error/warning - https://phabricator.wikimedia.org/T246492 (Jgreen) ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio) tg3.c:v3.137 (May 11, 2014) tg3 0000:04:00.0 eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express)...
[21:25:35] (CR) Ppchelko: "> Patch Set 1:" [deployment-charts] - https://gerrit.wikimedia.org/r/575612 (owner: Holger Knust)
[21:31:32] !log using planet1001 to manually hack APT sources to test new apt1001.wikimedia.org
[21:31:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:36:04] (PS18) Effie Mouzeli: mcrouter: add gutter pool servers in configuration [puppet] - https://gerrit.wikimedia.org/r/569541 (https://phabricator.wikimedia.org/T213089)
[21:39:03] Operations, Performance-Team, serviceops, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (jijiki) @aaron Please have a look at https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/569541/ and let us know if it reflects...
[21:39:10] (CR) jerkins-bot: [V: -1] mcrouter: add gutter pool servers in configuration [puppet] - https://gerrit.wikimedia.org/r/569541 (https://phabricator.wikimedia.org/T213089) (owner: Effie Mouzeli)
[21:42:30] RECOVERY - Check systemd state on logstash1010 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:51:01] Operations, Wikidata, Wikidata-Query-Service: WDQS Categories update lag alert - https://phabricator.wikimedia.org/T246497 (ayounsi) p:Triage→Medium
[21:58:00] (PS1) Holger Knust: changeprop: Add Kafka subchart [deployment-charts] - https://gerrit.wikimedia.org/r/575620 (https://phabricator.wikimedia.org/T245803)
[21:58:11] Operations, ops-eqiad, serviceops, Patch-For-Review: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (Dzahn) p:Medium→High Raising priority because the Needed-by date has arrived . Could we have a status update @Cmjohnson ? Is...
[21:58:16] (CR) jerkins-bot: [V: -1] changeprop: Add Kafka subchart [deployment-charts] - https://gerrit.wikimedia.org/r/575620 (https://phabricator.wikimedia.org/T245803) (owner: Holger Knust)
[22:00:55] (PS2) Holger Knust: changeprop: Add Kafka subchart [deployment-charts] - https://gerrit.wikimedia.org/r/575620 (https://phabricator.wikimedia.org/T245803)
[22:01:04] Operations, Patch-For-Review: Upgrade install servers to Buster - https://phabricator.wikimedia.org/T224576 (Dzahn)
[22:01:10] (CR) jerkins-bot: [V: -1] changeprop: Add Kafka subchart [deployment-charts] - https://gerrit.wikimedia.org/r/575620 (https://phabricator.wikimedia.org/T245803) (owner: Holger Knust)
[22:03:47] (PS3) Holger Knust: changeprop: Add Kafka subchart [deployment-charts] - https://gerrit.wikimedia.org/r/575620 (https://phabricator.wikimedia.org/T245803)
[22:10:29] (CR) Bstorm: [C: +2] toolforge-kubernetes: shut down the old maintain-kubeusers [puppet] - https://gerrit.wikimedia.org/r/575322 (https://phabricator.wikimedia.org/T214513) (owner: Bstorm)
[22:13:43] wmf-config/InitialiseSettings.php | 6301 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------- 
[22:13:45] wat
[22:19:37] errr?
[22:20:20] 12 files changed, 5736 insertions(+), 2349 deletions(-)
[22:20:22] * Reedy eyes James_F
[22:20:36] What?
[22:21:06] Oh, the merger of wgLogo, wgLogos, wgMinervaVectorLogo, and wgVectorPrintLogo all into one? You're welcome.
[22:21:21] But production should be all fine with the changes, right?
[22:21:33] heh
[22:22:07] if you're asking me I am absolutely not going to answer... just peeked in before winding down for bed, the usual
[22:22:13] :-D
[22:22:13] midnight-30
[22:22:45] paged in 3...2...1...
[22:23:04] * James_F grins.
[22:25:02] (CR) Ppchelko: [C: -1] "1. Given that you reference 0.0.5 version of Kafka-dev chart, this depends on I752de2bedef6660ba13525ff5ba2f6b3da3dba2b. Please include it" (6 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/575620 (https://phabricator.wikimedia.org/T245803) (owner: Holger Knust)
[22:32:48] hope you all have a quiet rest of the day and weekend!
[22:33:14] (PS2) Holger Knust: kafka-dev: Updated API endpoint and added required selector [deployment-charts] - https://gerrit.wikimedia.org/r/575612 (https://phabricator.wikimedia.org/T246501)
[22:34:11] (CR) Holger Knust: "Added Task number T246501" [deployment-charts] - https://gerrit.wikimedia.org/r/575612 (https://phabricator.wikimedia.org/T246501) (owner: Holger Knust)
[22:34:58] (PS1) Herron: lvs: kibana-next: promote from "service_setup" to "lvs_setup" [puppet] - https://gerrit.wikimedia.org/r/575631 (https://phabricator.wikimedia.org/T234854)
[22:38:03] apergos: same to you
[22:40:43] (CR) Dzahn: "Testing in browser is a bit tricky because they enforce https and HSTS is enabled and the cert just has the apt.wm.org host name on it.. b" [dns] - https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[22:42:38] (CR) Herron: [C: -2] "-2 until coordinating with traffic" [puppet] - https://gerrit.wikimedia.org/r/575631 (https://phabricator.wikimedia.org/T234854) (owner: Herron)
[22:43:26] (CR) Herron: [C: -2] "https://puppet-compiler.wmflabs.org/compiler1003/21156/" [puppet] - https://gerrit.wikimedia.org/r/575631 (https://phabricator.wikimedia.org/T234854) (owner: Herron)
[22:44:49] (CR) Dzahn: "The following NEW packages will be installed:" [dns] - https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[22:45:12] PROBLEM - Check systemd state on planet1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:45:56] ACKNOWLEDGEMENT - Check systemd state on planet1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn testing new APT repo https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:47:22] RECOVERY - Check systemd state on planet1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:47:35] (CR) Dzahn: [C: +1] "seems good to me, but if you also want to do some tests that would be appreciated" [dns] - https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[22:51:05] (CR) Dzahn: [C: +1] "well.. that was about the http part and using it in sources lists... there is still also testing reprepro to import packages and it looks " [dns] - https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[23:05:13] (PS4) Holger Knust: changeprop: Add Kafka subchart [deployment-charts] - https://gerrit.wikimedia.org/r/575620 (https://phabricator.wikimedia.org/T245803)
[23:09:48] (PS5) Holger Knust: changeprop: Add Kafka subchart [deployment-charts] - https://gerrit.wikimedia.org/r/575620 (https://phabricator.wikimedia.org/T245803)
[23:09:56] (CR) jerkins-bot: [V: -1] changeprop: Add Kafka subchart [deployment-charts] - https://gerrit.wikimedia.org/r/575620 (https://phabricator.wikimedia.org/T245803) (owner: Holger Knust)
[23:10:59] (PS4) CRusnov: netbox: Add framework for exposing scripts to internal services [puppet] - https://gerrit.wikimedia.org/r/575603 (https://phabricator.wikimedia.org/T243927)
[23:11:09] (PS1) Bstorm: sonofgridengine: prepare for new domain name [puppet] - https://gerrit.wikimedia.org/r/575637 (https://phabricator.wikimedia.org/T245572)
[23:12:18] (PS3) Holger Knust: kafka-dev: Updated API endpoint and added required selector [deployment-charts] - https://gerrit.wikimedia.org/r/575612 (https://phabricator.wikimedia.org/T246501)
[23:13:09] (PS1) Dzahn: aptrepo: puppetize REPREPRO_BASE_DIR env variable [puppet] - https://gerrit.wikimedia.org/r/575638 (https://phabricator.wikimedia.org/T224576)
[23:13:35] (CR) Bstorm: "Since the lines are so simple, I really think a file_line resource should do it for this. I played with the collectors and other ideas, bu" [puppet] - https://gerrit.wikimedia.org/r/575637 (https://phabricator.wikimedia.org/T245572) (owner: Bstorm)
[23:14:19] (CR) jerkins-bot: [V: -1] aptrepo: puppetize REPREPRO_BASE_DIR env variable [puppet] - https://gerrit.wikimedia.org/r/575638 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[23:14:46] (CR) Holger Knust: "Addressed items identified" (6 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/575620 (https://phabricator.wikimedia.org/T245803) (owner: Holger Knust)
[23:17:55] (PS2) Dzahn: aptrepo: puppetize REPREPRO_BASE_DIR env variable [puppet] - https://gerrit.wikimedia.org/r/575638 (https://phabricator.wikimedia.org/T224576)
[23:19:04] (CR) jerkins-bot: [V: -1] aptrepo: puppetize REPREPRO_BASE_DIR env variable [puppet] - https://gerrit.wikimedia.org/r/575638 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[23:20:45] (PS3) Dzahn: aptrepo: puppetize REPREPRO_BASE_DIR env variable [puppet] - https://gerrit.wikimedia.org/r/575638 (https://phabricator.wikimedia.org/T224576)
[23:22:59] (CR) Dzahn: [C: +2] access_new_install: remove superfluous lint-ignore and FIXME [puppet] - https://gerrit.wikimedia.org/r/575405 (owner: Dzahn)
[23:23:35] (CR) jerkins-bot: [V: -1] aptrepo: puppetize REPREPRO_BASE_DIR env variable [puppet] - https://gerrit.wikimedia.org/r/575638 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[23:26:42] (CR) CRusnov: "compiler output: 
https://puppet-compiler.wmflabs.org/compiler1003/21157/" [puppet] - 10https://gerrit.wikimedia.org/r/575603 (https://phabricator.wikimedia.org/T243927) (owner: 10CRusnov) [23:30:09] (03CR) 10Dzahn: "jclark shared the dcops planning sheet and it confirms the list of servers in this change as well. thanks" [puppet] - 10https://gerrit.wikimedia.org/r/575382 (https://phabricator.wikimedia.org/T241849) (owner: 10Dzahn) [23:37:06] (03PS4) 10Dzahn: aptrepo: puppetize REPREPRO_BASE_DIR env variable [puppet] - 10https://gerrit.wikimedia.org/r/575638 (https://phabricator.wikimedia.org/T224576) [23:38:04] (03CR) 10Jhedden: [C: 03+1] sonofgridengine: prepare for new domain name [puppet] - 10https://gerrit.wikimedia.org/r/575637 (https://phabricator.wikimedia.org/T245572) (owner: 10Bstorm) [23:38:07] (03PS1) 10Andrew Bogott: grafana: prevent Anonymous viewers from editing user settings [puppet] - 10https://gerrit.wikimedia.org/r/575641 (https://phabricator.wikimedia.org/T246502) [23:38:20] (03CR) 10jerkins-bot: [V: 04-1] aptrepo: puppetize REPREPRO_BASE_DIR env variable [puppet] - 10https://gerrit.wikimedia.org/r/575638 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [23:41:02] (03PS1) 10Andrew Bogott: grafana-labs: remove auto-login feature [puppet] - 10https://gerrit.wikimedia.org/r/575642 (https://phabricator.wikimedia.org/T246502) [23:42:41] (03PS2) 10Gergő Tisza: Switch Newcomer Tasks topic search to ORES-based on the beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575478 [23:44:03] (03CR) 10Dzahn: "why am i getting " Unknown resource type: 'file_line', we use that all over the place ?!" [puppet] - 10https://gerrit.wikimedia.org/r/575638 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [23:44:46] (03CR) 10Andrew Bogott: [C: 04-2] "this turns out to block people w/out a cookie and ask them to log in. So it's no good as it is." 
[puppet] - 10https://gerrit.wikimedia.org/r/575642 (https://phabricator.wikimedia.org/T246502) (owner: 10Andrew Bogott) [23:47:30] 10Operations, 10ops-eqiad, 10DC-Ops: audit/rebalance power in a5-eqiad - https://phabricator.wikimedia.org/T245655 (10Papaul) I looked into this a little bit, ps1-a5-eqiad has some value setup under Health and in the high column https://librenms.wikimedia.org/device/device=41/tab=edit/section=health/ Line,... [23:48:32] (03CR) 10Gergő Tisza: [C: 03+2] "Deploying (beta-only)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575478 (owner: 10Gergő Tisza) [23:49:47] (03Merged) 10jenkins-bot: Switch Newcomer Tasks topic search to ORES-based on the beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575478 (owner: 10Gergő Tisza) [23:50:28] (03CR) 10Dzahn: [C: 03+1] "after adding the BASEDIR line in /root/.bashrc manually:" [dns] - 10https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [23:50:47] (03CR) 10Andrew Bogott: [C: 03+2] grafana: prevent Anonymous viewers from editing user settings [puppet] - 10https://gerrit.wikimedia.org/r/575641 (https://phabricator.wikimedia.org/T246502) (owner: 10Andrew Bogott) [23:52:13] (03PS5) 10CRusnov: tox: Support DNS_INCLUDE_DIR and generated DNS [dns] - 10https://gerrit.wikimedia.org/r/569340 (https://phabricator.wikimedia.org/T243362) [23:52:15] (03CR) 10Bstorm: [C: 03+2] sonofgridengine: prepare for new domain name [puppet] - 10https://gerrit.wikimedia.org/r/575637 (https://phabricator.wikimedia.org/T245572) (owner: 10Bstorm) [23:53:51] (03CR) 10CRusnov: "Notably, PS4 changes the flow slightly as we discussed on IRC, essentially inserting the generated zones from whatever the specified path " [dns] - 10https://gerrit.wikimedia.org/r/569340 (https://phabricator.wikimedia.org/T243362) (owner: 10CRusnov) [23:58:01] (03PS1) 10CRusnov: templates/wmnet: Remove most mgmt.eqiad entries to test generated zones [dns] - 
10https://gerrit.wikimedia.org/r/575650 [23:59:19] (03CR) 10CRusnov: "Also notable:" [dns] - 10https://gerrit.wikimedia.org/r/569340 (https://phabricator.wikimedia.org/T243362) (owner: 10CRusnov) [23:59:43] (03Abandoned) 10Bstorm: sonofgridengine: accomodate the new domain name [puppet] - 10https://gerrit.wikimedia.org/r/574885 (https://phabricator.wikimedia.org/T245572) (owner: 10Bstorm)