[00:09:00] (PS1) Dzahn: mediawiki:maintenance: switch translationnotifications to PHP 7.2 [puppet] - https://gerrit.wikimedia.org/r/528296 (https://phabricator.wikimedia.org/T195392)
[00:21:29] (PS1) CRusnov: netbox: Fix parameter that didnt get passed [puppet] - https://gerrit.wikimedia.org/r/528299
[00:22:21] (CR) jerkins-bot: [V: -1] netbox: Fix parameter that didnt get passed [puppet] - https://gerrit.wikimedia.org/r/528299 (owner: CRusnov)
[00:24:45] (CR) Smalyshev: Add L and M to allowed statement starts (1 comment) [puppet] - https://gerrit.wikimedia.org/r/526755 (owner: Smalyshev)
[00:34:23] (PS3) CRusnov: netbox: Fix parameter that didnt get passed [puppet] - https://gerrit.wikimedia.org/r/528285
[00:35:03] (Abandoned) CRusnov: netbox: Fix parameter that didnt get passed [puppet] - https://gerrit.wikimedia.org/r/528299 (owner: CRusnov)
[00:37:07] (PS4) CRusnov: netbox: Fix parameter that didnt get passed [puppet] - https://gerrit.wikimedia.org/r/528285
[00:41:38] (CR) CRusnov: [V: +2 C: +2] "Self merging due to production errors." [puppet] - https://gerrit.wikimedia.org/r/528285 (owner: CRusnov)
[00:41:49] (PS5) CRusnov: netbox: Fix parameter that didnt get passed [puppet] - https://gerrit.wikimedia.org/r/528285
[00:51:17] RECOVERY - puppet last run on netmon1002 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[00:52:52] (PS1) CRusnov: netbox: fix swift url [puppet] - https://gerrit.wikimedia.org/r/528301
[01:11:29] (PS10) Holger Knust: table-properties: Initial commit [software/cassandra-table-properties] - https://gerrit.wikimedia.org/r/524921 (https://phabricator.wikimedia.org/T220246)
[01:11:41] RECOVERY - puppet last run on netmon2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[01:13:11] (PS7) Viztor: Update HD logo for en.ws and mul.ws [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769)
[01:22:25] (CR) Viztor: "> Patch Set 5: Code-Review-1" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[01:26:15] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[01:29:19] PROBLEM - WDQS Categories update lag on wdqs1008 is CRITICAL: CRITICAL - wdqs categories lag: 18 days, 20:29:17.389366 https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[01:29:19] PROBLEM - WDQS Categories update lag on wdqs1006 is CRITICAL: CRITICAL - wdqs categories lag: 17 days, 20:29:17.407308 https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[01:59:53] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[02:05:42] !log Creating local accounts for Community Tech bot on every Wikipedia
[02:05:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:38:01] PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 21656704 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:43:49] (PS1) DannyS712: Update wmflabs.org redirect target [puppet] - https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896)
[02:44:15] (CR) jerkins-bot: [V: -1] Update wmflabs.org redirect target [puppet] - https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896) (owner: DannyS712)
[02:44:23] (PS11) Holger Knust: table-properties: Initial commit [software/cassandra-table-properties] - https://gerrit.wikimedia.org/r/524921 (https://phabricator.wikimedia.org/T220246)
[02:44:40] (PS2) DannyS712: Update wmflabs.org redirect target [puppet] - https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896)
[02:45:10] (PS3) DannyS712: Update wmflabs.org redirect target [puppet] - https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896)
[02:45:29] (CR) jerkins-bot: [V: -1] table-properties: Initial commit [software/cassandra-table-properties] - https://gerrit.wikimedia.org/r/524921 (https://phabricator.wikimedia.org/T220246) (owner: Holger Knust)
[02:46:05] PROBLEM - puppet last run on lvs5003 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[02:47:17] PROBLEM - Apache HTTP on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[02:47:49] RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 139264 and 39 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:48:25] (PS1) Herron: logstash: manage currently unmanaged log4j2.properties file [puppet] - https://gerrit.wikimedia.org/r/528305 (https://phabricator.wikimedia.org/T166107)
[02:48:27] (PS1) Herron: logstash: rotate logstash plain logs with log4j2 [puppet] - https://gerrit.wikimedia.org/r/528306 (https://phabricator.wikimedia.org/T166107)
[02:48:47] RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.033 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[02:48:50] (CR) jerkins-bot: [V: -1] logstash: manage currently unmanaged log4j2.properties file [puppet] - https://gerrit.wikimedia.org/r/528305 (https://phabricator.wikimedia.org/T166107) (owner: Herron)
[02:49:31] PROBLEM - Widespread puppet agent failures- no resources reported on icinga1001 is CRITICAL: site=eqsin https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[02:50:22] (PS2) Herron: logstash: manage currently unmanaged log4j2.properties file [puppet] - https://gerrit.wikimedia.org/r/528305 (https://phabricator.wikimedia.org/T166107)
[02:57:35] (PS12) Holger Knust: table-properties: Initial commit [software/cassandra-table-properties] - https://gerrit.wikimedia.org/r/524921 (https://phabricator.wikimedia.org/T220246)
[03:14:09] RECOVERY - puppet last run on lvs5003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[03:18:31] RECOVERY - Widespread puppet agent failures- no resources reported on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[03:29:33] PROBLEM - snapshot of s6 in codfw on db1115 is CRITICAL: snapshot for s6 at codfw taken more than 4 days ago: Most recent backup 2019-08-02 03:26:15 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[03:58:09] !log start importing group[12] to cloudelastic from mwmaint1002
[03:58:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:03:20] (PS6) Vgutierrez: prometheus: Collect ncredir nginx metrics [puppet] - https://gerrit.wikimedia.org/r/524409 (https://phabricator.wikimedia.org/T228382)
[04:03:39] (CR) jerkins-bot: [V: -1] prometheus: Collect ncredir nginx metrics [puppet] - https://gerrit.wikimedia.org/r/524409 (https://phabricator.wikimedia.org/T228382) (owner: Vgutierrez)
[04:04:05] (PS7) Vgutierrez: prometheus: Collect ncredir nginx metrics [puppet] - https://gerrit.wikimedia.org/r/524409 (https://phabricator.wikimedia.org/T228382)
[04:06:22] (CR) Vgutierrez: [C: +2] prometheus: Collect ncredir nginx metrics [puppet] - https://gerrit.wikimedia.org/r/524409 (https://phabricator.wikimedia.org/T228382) (owner: Vgutierrez)
[04:27:36] (PS1) Vgutierrez: ncredir: Allow prometheus nodes to reach mtail port [puppet] - https://gerrit.wikimedia.org/r/528309 (https://phabricator.wikimedia.org/T228382)
[04:29:53] (CR) Vgutierrez: [C: +2] "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler1001/17736/" [puppet] - https://gerrit.wikimedia.org/r/528309 (https://phabricator.wikimedia.org/T228382) (owner: Vgutierrez)
[05:01:40] (PS1) Marostegui: Revert "dbproxy1011: Depool labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/528311
[05:03:40] (PS2) Marostegui: Revert "dbproxy1011: Depool labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/528311
[05:04:42] ACKNOWLEDGEMENT - snapshot of s6 in codfw on db1115 is CRITICAL: snapshot for s6 at codfw taken more than 4 days ago: Most recent backup 2019-08-02 03:26:15 Marostegui checking https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[05:04:48] (CR) Marostegui: [C: +2] Revert "dbproxy1011: Depool labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/528311 (owner: Marostegui)
[05:06:39] !log Reload haproxy on dbproxy1011 to repool labsdb1010 T222978
[05:06:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:06:49] T222978: Compress and defragment tables on labsdb hosts - https://phabricator.wikimedia.org/T222978
[05:34:29] !log Restart wikibugs
[05:34:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:38:20] Operations, ops-eqiad, netops: (Need By: Sept 30) upgrade msw1-eqiad from EX4200 to EX4300 - https://phabricator.wikimedia.org/T225121 (Papaul) @Cmjohnson I put together a "How to" at the link below on how to upgrade the switch. Please let me know if you have any questions. https://wikitech.wikimedi...
[05:38:22] Operations, Traffic: Provide prometheus metrics for the ncredir service - https://phabricator.wikimedia.org/T228382 (Vgutierrez) Open→Resolved
[05:38:27] Operations, Traffic, Goal, HTTPS, Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548 (Vgutierrez)
[05:39:58] Operations, Domains, Traffic, Wikimedia-Apache-configuration: en-wp.org certificate error - https://phabricator.wikimedia.org/T190244 (Vgutierrez) Open→Resolved a:Vgutierrez This has been solved by the deploy of the ncredir service (T133548).
[05:42:36] (PS1) Marostegui: db-eqiad,db-codfw.php: Add db2127 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/528313 (https://phabricator.wikimedia.org/T228969)
[05:43:25] (PS1) Marostegui: db2127: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/528315 (https://phabricator.wikimedia.org/T228969)
[05:43:50] (PS2) Marostegui: db2127: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/528315 (https://phabricator.wikimedia.org/T228969)
[05:46:07] (PS3) Marostegui: db2127: Enable notifications, add to conftool [puppet] - https://gerrit.wikimedia.org/r/528315 (https://phabricator.wikimedia.org/T228969)
[05:46:18] (PS4) Marostegui: db2127: Enable notifications, add to conftool [puppet] - https://gerrit.wikimedia.org/r/528315 (https://phabricator.wikimedia.org/T228969)
[05:46:43] (CR) Vgutierrez: [C: +1] db-eqiad,db-codfw.php: Add db2127 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/528313 (https://phabricator.wikimedia.org/T228969) (owner: Marostegui)
[05:47:29] (CR) Marostegui: [C: +2] db2127: Enable notifications, add to conftool [puppet] - https://gerrit.wikimedia.org/r/528315 (https://phabricator.wikimedia.org/T228969) (owner: Marostegui)
[05:47:50] (CR) ArielGlenn: [C: +1] db-eqiad,db-codfw.php: Add db2127 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/528313 (https://phabricator.wikimedia.org/T228969) (owner: Marostegui)
[05:47:55] (CR) Marostegui: [C: +2] db-eqiad,db-codfw.php: Add db2127 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/528313 (https://phabricator.wikimedia.org/T228969) (owner: Marostegui)
[05:48:50] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Add db2127 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/528313 (https://phabricator.wikimedia.org/T228969) (owner: Marostegui)
[05:49:04] (CR) jenkins-bot: db-eqiad,db-codfw.php: Add db2127 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/528313 (https://phabricator.wikimedia.org/T228969) (owner: Marostegui)
[05:49:58] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Provision db2127 into s3 T228969 (duration: 00m 48s)
[05:50:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:06] T228969: Productionize db21[21-30} - https://phabricator.wikimedia.org/T228969
[05:53:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db2127 into s3 T228969', diff saved to https://phabricator.wikimedia.org/P8868 and previous config saved to /var/cache/conftool/dbconfig/20190806-055357-marostegui.json
[05:54:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:11] cdanis: ^ :)
[05:59:04] !log urbanecm@deploy1001 Synchronized php-1.34.0-wmf.16/extensions/CheckUser: Fix T229893 (duration: 00m 47s)
[05:59:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:24] (CR) Urbanecm: [C: +1] "Ok, makes sense. LGTM then!" [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[06:07:12] (CR) jerkins-bot: [V: -1] Update HD logo for en.ws and mul.ws [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[06:12:17] (PS1) Vgutierrez: ncredir: Let ncredir take over wikimedia.com and linked DNS zones [dns] - https://gerrit.wikimedia.org/r/528316 (https://phabricator.wikimedia.org/T133548)
[06:13:19] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 54, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:14:57] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 56, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:22:49] Operations, DBA, wikitech.wikimedia.org, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133 - https://phabricator.wikimedia.org/T229657 (Marostegui)
[06:31:40] Operations, DBA, wikitech.wikimedia.org, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133 - https://phabricator.wikimedia.org/T229657 (Marostegui) Announcement about wikitech sent to wikitech-l and operations list: https://lists.wikimedia.org/pipermail/wikitech-l/2019-...
[06:32:02] (CR) Giuseppe Lavagetto: envoyproxy: create module, add tls terminator definition (5 comments) [puppet] - https://gerrit.wikimedia.org/r/526110 (owner: Giuseppe Lavagetto)
[06:32:16] (CR) Giuseppe Lavagetto: envoyproxy: create module, add tls terminator definition (1 comment) [puppet] - https://gerrit.wikimedia.org/r/526110 (owner: Giuseppe Lavagetto)
[06:39:55] (PS1) Marostegui: mariadb: Provision db2130 into s1 [puppet] - https://gerrit.wikimedia.org/r/528319 (https://phabricator.wikimedia.org/T228969)
[06:41:27] (PS2) Marostegui: mariadb: Provision db2130 into s1 [puppet] - https://gerrit.wikimedia.org/r/528319 (https://phabricator.wikimedia.org/T228969)
[06:50:59] (CR) Marostegui: [C: +2] mariadb: Provision db2130 into s1 [puppet] - https://gerrit.wikimedia.org/r/528319 (https://phabricator.wikimedia.org/T228969) (owner: Marostegui)
[06:52:56] Operations, LDAP: Create an LDAP replica in codfw (using LVS) - https://phabricator.wikimedia.org/T227778 (MoritzMuehlenhoff) Open→Resolved This is completed and all services not requiring writes have been switched over.
[06:52:58] Operations, LDAP: Migrate web services using LDAP authentication towards the readonly LDAP replicas - https://phabricator.wikimedia.org/T227650 (MoritzMuehlenhoff)
[06:53:59] Operations, LuaSandbox: Build and deploy php-luasandbox 3.0.1 to Wikimedia wikis - https://phabricator.wikimedia.org/T187673 (MoritzMuehlenhoff)
[06:54:21] Operations, LuaSandbox: Build and deploy php-luasandbox 3.0.1 to Wikimedia wikis - https://phabricator.wikimedia.org/T187673 (MoritzMuehlenhoff) a:MoritzMuehlenhoff→None This can wait until HHVM is undeployed, removing myself for now
[06:55:47] Operations, DBA, Patch-For-Review: Firewall configurations for database hosts - https://phabricator.wikimedia.org/T104699 (MoritzMuehlenhoff) Open→Resolved This is complete.
[06:55:58] PROBLEM - puppet last run on graphite1004 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:56:32] (CR) Viztor: "> Patch Set 7: Verified-1" [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[06:57:18] (PS1) Vgutierrez: Let ncredir take care of wikimediacommons non canonical domains [dns] - https://gerrit.wikimedia.org/r/528320 (https://phabricator.wikimedia.org/T133548)
[06:57:50] (CR) Gehel: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/528306 (https://phabricator.wikimedia.org/T166107) (owner: Herron)
[06:58:24] (PS1) Vgutierrez: nc_redirects.dat: Re-enable wikimediacommons rules [puppet] - https://gerrit.wikimedia.org/r/528321 (https://phabricator.wikimedia.org/T133548)
[07:01:31] (CR) Vgutierrez: [C: +2] nc_redirects.dat: Re-enable wikimediacommons rules [puppet] - https://gerrit.wikimedia.org/r/528321 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[07:01:38] (PS2) Vgutierrez: nc_redirects.dat: Re-enable wikimediacommons rules [puppet] - https://gerrit.wikimedia.org/r/528321 (https://phabricator.wikimedia.org/T133548)
[07:05:25] (CR) Urbanecm: [C: +1] "explained inline" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[07:06:06] (CR) Vgutierrez: [C: +2] Let ncredir take care of wikimediacommons non canonical domains [dns] - https://gerrit.wikimedia.org/r/528320 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[07:07:41] Operations, Traffic, Discovery-Search (Current work): Can't reach cloudelastic.wikimedia.org via IPv6 - https://phabricator.wikimedia.org/T229861 (Mathew.onipe) Open→Resolved
[07:08:58] (PS8) Viztor: Update HD logo for en.ws and mul.ws [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769)
[07:09:57] (CR) Muehlenhoff: cassandra: rolling restart cookbook (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/528133 (owner: Jbond)
[07:10:47] !log pool maps1001. Postgres init complete - T229788
[07:10:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:57] T229788: postgresql replication issues on maps1001 - https://phabricator.wikimedia.org/T229788
[07:11:07] (PS1) Vgutierrez: nc_redirects.dat: Re-enable voyagewiki.(org|com) rules [puppet] - https://gerrit.wikimedia.org/r/528345 (https://phabricator.wikimedia.org/T133548)
[07:11:48] RECOVERY - snapshot of s6 in codfw on db1115 is OK: snapshot for s6 at codfw taken less than 4 days ago and larger than 90 GB: Last one 2019-08-06 06:17:51 from db2097.codfw.wmnet:3316 (501 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[07:11:52] (CR) Viztor: "> Patch Set 7:" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[07:11:55] \o/
[07:12:01] (PS1) Vgutierrez: Let ncredir take care of voyagewiki.com and voyagewiki.org [dns] - https://gerrit.wikimedia.org/r/528368 (https://phabricator.wikimedia.org/T133548)
[07:13:26] (CR) Vgutierrez: [C: +2] nc_redirects.dat: Re-enable voyagewiki.(org|com) rules [puppet] - https://gerrit.wikimedia.org/r/528345 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[07:13:28] (CR) Viztor: [C: +1] "> Patch Set 8:" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[07:13:36] (PS2) Vgutierrez: nc_redirects.dat: Re-enable voyagewiki.(org|com) rules [puppet] - https://gerrit.wikimedia.org/r/528345 (https://phabricator.wikimedia.org/T133548)
[07:16:50] (CR) Viztor: "> Patch Set 8: Code-Review+1" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[07:18:42] Operations, Maps: postgresql replication issues on maps1001 - https://phabricator.wikimedia.org/T229788 (Mathew.onipe) Open→Resolved a:Mathew.onipe Postgres reinitialization was performed to bring this slave back up. I'll close this task for now and investigate more if it re-occurs.
[07:19:01] (CR) Vgutierrez: [C: +2] Let ncredir take care of voyagewiki.com and voyagewiki.org [dns] - https://gerrit.wikimedia.org/r/528368 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[07:22:51] RECOVERY - puppet last run on graphite1004 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:24:06] Operations, Elasticsearch, Traffic, Discovery-Search (Current work): Icinga check defined from LVS configuration for cloudelastic are borked - https://phabricator.wikimedia.org/T229621 (Mathew.onipe)
[07:27:17] Operations, ops-codfw: SSH to mw2269.mgmt not working - https://phabricator.wikimedia.org/T227548 (jijiki) Open→Resolved @Papaul closing, sorry this slipped through the cracks.
[07:27:22] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool for firmware and BIOS upgrade T228732', diff saved to https://phabricator.wikimedia.org/P8869 and previous config saved to /var/cache/conftool/dbconfig/20190806-072720-marostegui.json
[07:27:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:32] T228732: Upgrade db1100 firmware and BIOS - https://phabricator.wikimedia.org/T228732
[07:27:56] !log Stop MySQL on db1100 before powering the host off - T228732
[07:28:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:29:58] Operations, vm-requests: eqiad/codfw: One VM for Failoid - https://phabricator.wikimedia.org/T229903 (MoritzMuehlenhoff)
[07:30:25] Operations, vm-requests: eqiad/codfw: One VM for Failoid - https://phabricator.wikimedia.org/T229903 (MoritzMuehlenhoff) p:Triage→Normal a:MoritzMuehlenhoff
[07:31:14] Operations, ops-eqiad, DBA: Upgrade db1100 firmware and BIOS - https://phabricator.wikimedia.org/T228732 (Marostegui) @Cmjohnson db1100 is now OFF. Once you are done leave it ON and will take care of MySQL and the rest of things Thanks!
[07:35:16] Operations, vm-requests: eqiad/codfw: One VM for Failoid - https://phabricator.wikimedia.org/T229903 (akosiaris) LGTM. Naming wise I 'd say let's do failoid{1,2}001.(eqiad|codfw).wmnet instead of the less obvious tureis/roentgenium that we have now.
[07:37:17] (CR) Urbanecm: "> Patch Set 8:" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[07:37:20] (CR) Urbanecm: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[07:37:42] (CR) Urbanecm: "> Patch Set 8:" [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[07:37:44] (PS5) Mathew.onipe: cloudelastic: remove ocsp_proxy [puppet] - https://gerrit.wikimedia.org/r/511381 (https://phabricator.wikimedia.org/T223519)
[07:38:46] Operations, vm-requests: eqiad/codfw: One VM for Failoid - https://phabricator.wikimedia.org/T229903 (MoritzMuehlenhoff) >>! In T229903#5395435, @akosiaris wrote: > LGTM. Naming wise I 'd say let's do failoid{1,2}001.(eqiad|codfw).wmnet instead of the less obvious tureis/roentgenium that we have now. Ac...
[07:39:25] (CR) Urbanecm: [C: +1] "LGTM for both me and jenkins!" [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[07:40:33] (CR) Urbanecm: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/527773 (owner: Viztor)
[07:41:43] (CR) Urbanecm: [C: +1] "> Patch Set 8:" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: Viztor)
[07:41:57] (CR) Urbanecm: [C: +1] "LGTM!" [mediawiki-config] - https://gerrit.wikimedia.org/r/527773 (owner: Viztor)
[07:44:00] (Abandoned) Mathew.onipe: elasticsearch: split plugin into base and cirrus [puppet] - https://gerrit.wikimedia.org/r/499785 (https://phabricator.wikimedia.org/T214921) (owner: Mathew.onipe)
[07:44:46] (PS2) Muehlenhoff: role::mediawiki::common: Remove support for Jessie [puppet] - https://gerrit.wikimedia.org/r/525532
[07:44:56] (PS8) Viztor: Add hd variations for zhwikiource project logo [mediawiki-config] - https://gerrit.wikimedia.org/r/527773
[07:47:34] (PS1) Alexandros Kosiaris: Add resources stanza to prometheus-metrics-exporter [deployment-charts] - https://gerrit.wikimedia.org/r/528390 (https://phabricator.wikimedia.org/T228837)
[07:48:34] (CR) Muehlenhoff: [C: +2] role::mediawiki::common: Remove support for Jessie [puppet] - https://gerrit.wikimedia.org/r/525532 (owner: Muehlenhoff)
[07:52:43] (PS2) Muehlenhoff: profile::mediawiki::nutcracker: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/525229
[07:57:51] (CR) Muehlenhoff: [C: +2] profile::mediawiki::nutcracker: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/525229 (owner: Muehlenhoff)
[07:58:59] (PS1) Muehlenhoff: Icinga: Remove support for Jessie [puppet] - https://gerrit.wikimedia.org/r/528391
[07:59:03] (PS2) Alexandros Kosiaris: Add resources stanza to prometheus-metrics-exporter [deployment-charts] - https://gerrit.wikimedia.org/r/528390 (https://phabricator.wikimedia.org/T228837)
[08:03:33] Operations, ops-eqiad, DC-Ops: b3-eqiad pdu refresh - https://phabricator.wikimedia.org/T227539 (Marostegui)
[08:05:12] Operations, ops-eqiad, DC-Ops: b3-eqiad pdu refresh - https://phabricator.wikimedia.org/T227539 (Marostegui) db1104 is s8 primary master, we'd probably need to failover this host if we are not confident this host can be swapped over without downtime. @mark @faidon what do you guys thing? another pos...
[08:06:01] (CR) Alexandros Kosiaris: [C: +2] kubernetes: expand alert description [puppet] - https://gerrit.wikimedia.org/r/528143 (https://phabricator.wikimedia.org/T229262) (owner: Filippo Giunchedi)
[08:06:09] Operations, ops-eqiad, DC-Ops: a1-eqiad pdu refresh - https://phabricator.wikimedia.org/T226782 (Marostegui)
[08:06:10] (PS2) Alexandros Kosiaris: kubernetes: expand alert description [puppet] - https://gerrit.wikimedia.org/r/528143 (https://phabricator.wikimedia.org/T229262) (owner: Filippo Giunchedi)
[08:06:15] (CR) Alexandros Kosiaris: [V: +2 C: +2] kubernetes: expand alert description [puppet] - https://gerrit.wikimedia.org/r/528143 (https://phabricator.wikimedia.org/T229262) (owner: Filippo Giunchedi)
[08:07:18] Operations, ops-eqiad, DC-Ops: a8-eqiad pdu refresh - https://phabricator.wikimedia.org/T227133 (Marostegui)
[08:08:43] (CR) Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/17739/" [puppet] - https://gerrit.wikimedia.org/r/528391 (owner: Muehlenhoff)
[08:14:43] (PS6) Gehel: cloudelastic: remove ocsp_proxy [puppet] - https://gerrit.wikimedia.org/r/511381 (https://phabricator.wikimedia.org/T223519) (owner: Mathew.onipe)
[08:14:51] (CR) Gehel: [C: +2] cloudelastic: remove ocsp_proxy [puppet] - https://gerrit.wikimedia.org/r/511381 (https://phabricator.wikimedia.org/T223519) (owner: Mathew.onipe)
[08:15:58] (PS3) Filippo Giunchedi: base: stop per-host puppet critical when master has issues [puppet] - https://gerrit.wikimedia.org/r/528087 (https://phabricator.wikimedia.org/T229262)
[08:16:00] (PS2) Filippo Giunchedi: prometheus: stop polling varnish on upload backend [puppet] - https://gerrit.wikimedia.org/r/528142
[08:16:09] (CR) Filippo Giunchedi: prometheus: stop polling varnish on upload backend (1 comment) [puppet] - https://gerrit.wikimedia.org/r/528142 (owner: Filippo Giunchedi)
[08:17:02] (CR) Filippo Giunchedi: [C: +2] toil: add rsyslog TLS remedy [puppet] - https://gerrit.wikimedia.org/r/520207 (https://phabricator.wikimedia.org/T199406) (owner: Filippo Giunchedi)
[08:17:09] (PS7) Filippo Giunchedi: toil: add rsyslog TLS remedy [puppet] - https://gerrit.wikimedia.org/r/520207 (https://phabricator.wikimedia.org/T199406)
[08:18:43] (CR) Alexandros Kosiaris: [V: +2 C: +2] "Tested in minikube, works fine, merging and upgrading staging and then production" [deployment-charts] - https://gerrit.wikimedia.org/r/528390 (https://phabricator.wikimedia.org/T228837) (owner: Alexandros Kosiaris)
[08:19:30] (CR) Gehel: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/528305 (https://phabricator.wikimedia.org/T166107) (owner: Herron)
[08:21:29] (CR) Gehel: [C: +1] logstash: rotate logstash plain logs with log4j2 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/528306 (https://phabricator.wikimedia.org/T166107) (owner: Herron)
[08:22:41] !log @ helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
[08:22:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:51] (PS1) Vgutierrez: ncredir: Handle more domains to the non canonical redirect service [dns] - https://gerrit.wikimedia.org/r/528394 (https://phabricator.wikimedia.org/T133548)
[08:27:03] (PS1) Filippo Giunchedi: toil: rsyslog_tls_remedy log to journald [puppet] - https://gerrit.wikimedia.org/r/528395 (https://phabricator.wikimedia.org/T199406)
[08:28:11] (CR) Filippo Giunchedi: [C: +1] logstash: manage currently unmanaged log4j2.properties file [puppet] - https://gerrit.wikimedia.org/r/528305 (https://phabricator.wikimedia.org/T166107) (owner: Herron)
[08:28:18] (CR) Filippo Giunchedi: [C: +1] logstash: rotate logstash plain logs with log4j2 [puppet] - https://gerrit.wikimedia.org/r/528306 (https://phabricator.wikimedia.org/T166107) (owner: Herron)
[08:28:57] (CR) Filippo Giunchedi: [C: +2] toil: rsyslog_tls_remedy log to journald [puppet] - https://gerrit.wikimedia.org/r/528395 (https://phabricator.wikimedia.org/T199406) (owner: Filippo Giunchedi)
[08:29:17] (PS1) Alexandros Kosiaris: mathoid: Fix metrics exporter livenessProbe [deployment-charts] - https://gerrit.wikimedia.org/r/528396
[08:31:28] (PS1) Vgutierrez: ncredir: Introduce non-canonical-redirect-5 [puppet] - https://gerrit.wikimedia.org/r/528397 (https://phabricator.wikimedia.org/T133548)
[08:31:30] (PS1) Vgutierrez: nc_redirects.dat: Add rules to support non-canonical-redirect-5 [puppet] - https://gerrit.wikimedia.org/r/528398 (https://phabricator.wikimedia.org/T133548)
[08:35:27] (CR) Filippo Giunchedi: base: stop per-host puppet critical when master has issues (1 comment) [puppet] - https://gerrit.wikimedia.org/r/528087 (https://phabricator.wikimedia.org/T229262) (owner: Filippo Giunchedi)
[08:35:35] (PS4) Filippo Giunchedi: base: stop per-host puppet critical when master has issues [puppet] - https://gerrit.wikimedia.org/r/528087 (https://phabricator.wikimedia.org/T229262)
[08:36:12] (CR) Filippo Giunchedi: [C: +2] base: stop per-host puppet critical when master has issues [puppet] - https://gerrit.wikimedia.org/r/528087 (https://phabricator.wikimedia.org/T229262) (owner: Filippo Giunchedi)
[08:37:45] (PS1) Elukey: role::analytics_test_cluster::hadoop::worker: set spark.executorEnv.LD_LIBRARY_PATH [puppet] - https://gerrit.wikimedia.org/r/528400 (https://phabricator.wikimedia.org/T226698)
[08:38:11] (CR) jerkins-bot: [V: -1] role::analytics_test_cluster::hadoop::worker: set spark.executorEnv.LD_LIBRARY_PATH [puppet] - https://gerrit.wikimedia.org/r/528400 (https://phabricator.wikimedia.org/T226698) (owner: Elukey)
[08:39:11] !log Add db2130 to tendril and zarcillo T228969
[08:39:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:20] (PS1) Alexandros Kosiaris: mathoid: Align limitranges/resourcequotas in staging [deployment-charts] - https://gerrit.wikimedia.org/r/528401 (https://phabricator.wikimedia.org/T228837)
[08:39:20] T228969: Productionize db21[21-30} - https://phabricator.wikimedia.org/T228969
[08:39:43] (PS2) Elukey: Set spark.executorEnv.LD_LIBRARY_PATH in the Hadoop test cluster's workers [puppet] - https://gerrit.wikimedia.org/r/528400 (https://phabricator.wikimedia.org/T226698)
[08:41:08] (PS2) Alexandros Kosiaris: mathoid: Align limitranges/resourcequotas in staging [deployment-charts] - https://gerrit.wikimedia.org/r/528401 (https://phabricator.wikimedia.org/T228837)
[08:41:10] (PS2) Alexandros Kosiaris: mathoid: Fix metrics exporter livenessProbe [deployment-charts] - https://gerrit.wikimedia.org/r/528396
[08:41:27] (CR) Filippo Giunchedi: "> Patch Set 2: Verified-1 Code-Review-1" [puppet] - https://gerrit.wikimedia.org/r/512486 (owner: Filippo Giunchedi)
[08:41:35] (Abandoned) Filippo Giunchedi: thumbor: add ban lists on client ip and path re [puppet] - https://gerrit.wikimedia.org/r/512486 (owner: Filippo Giunchedi)
[08:42:11] (CR) Vgutierrez: [C: +2] ncredir: Let ncredir take over wikimedia.com and linked DNS zones [dns] - https://gerrit.wikimedia.org/r/528316 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[08:42:41] (PS2) Vgutierrez: ncredir: Let ncredir take over wikimedia.com and linked DNS zones [dns] - https://gerrit.wikimedia.org/r/528316 (https://phabricator.wikimedia.org/T133548)
[08:42:55] (CR) Alexandros Kosiaris: [V: +2 C: +2] mathoid: Align limitranges/resourcequotas in staging [deployment-charts] - https://gerrit.wikimedia.org/r/528401 (https://phabricator.wikimedia.org/T228837) (owner: Alexandros Kosiaris)
[08:42:59] (CR) Alexandros Kosiaris: [V: +2 C: +2] mathoid: Fix metrics exporter livenessProbe [deployment-charts] - https://gerrit.wikimedia.org/r/528396 (owner: Alexandros Kosiaris)
[08:44:03] (CR) Filippo Giunchedi: "Late to the review party, but thanks for this!" [puppet] - https://gerrit.wikimedia.org/r/528204 (owner: Bstorm)
[08:50:54] (PS1) Alexandros Kosiaris: Add TILLER_NAMESPACE to .hfenv [puppet] - https://gerrit.wikimedia.org/r/528403
[08:52:53] !log @ helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
[08:53:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:53:43] (03PS3) 10Elukey: Set spark.executorEnv.LD_LIBRARY_PATH in the Hadoop test cluster's workers [puppet] - 10https://gerrit.wikimedia.org/r/528400 (https://phabricator.wikimedia.org/T226698) [08:53:53] (03CR) 10Viztor: "> Patch Set 8:" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: 10Viztor) [08:54:38] (03CR) 10Elukey: [C: 03+2] Set spark.executorEnv.LD_LIBRARY_PATH in the Hadoop test cluster's workers [puppet] - 10https://gerrit.wikimedia.org/r/528400 (https://phabricator.wikimedia.org/T226698) (owner: 10Elukey) [08:56:26] 10Operations, 10serviceops, 10Performance-Team (Radar), 10User-jijiki: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 (10jijiki) We switched mw1270 to PHP7 but we came across the following issues * a 10% increase in the median a... 
[08:58:12] (03CR) 10Urbanecm: [C: 03+1] "> Patch Set 8:" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: 10Viztor) [08:59:45] (03CR) 10Alexandros Kosiaris: [C: 03+2] "PCC happy at https://puppet-compiler.wmflabs.org/compiler1002/17741/ with the minor exception of the admin token, which is not however use" [puppet] - 10https://gerrit.wikimedia.org/r/528403 (owner: 10Alexandros Kosiaris) [08:59:45] (03PS2) 10Alexandros Kosiaris: Add TILLER_NAMESPACE to .hfenv [puppet] - 10https://gerrit.wikimedia.org/r/528403 [08:59:47] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Add TILLER_NAMESPACE to .hfenv [puppet] - 10https://gerrit.wikimedia.org/r/528403 (owner: 10Alexandros Kosiaris) [09:06:01] (03PS1) 10Alexandros Kosiaris: mathoid: Partialy revert 269abb124130e0f [deployment-charts] - 10https://gerrit.wikimedia.org/r/528404 (https://phabricator.wikimedia.org/T228837) [09:07:15] (03CR) 10Ema: [C: 03+1] prometheus: stop polling varnish on upload backend [puppet] - 10https://gerrit.wikimedia.org/r/528142 (owner: 10Filippo Giunchedi) [09:08:36] (03PS3) 10Filippo Giunchedi: prometheus: stop polling varnish on upload backend [puppet] - 10https://gerrit.wikimedia.org/r/528142 [09:09:16] (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: stop polling varnish on upload backend [puppet] - 10https://gerrit.wikimedia.org/r/528142 (owner: 10Filippo Giunchedi) [09:11:20] (03PS1) 10Ladsgroup: Disable EntitySchema in production wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528405 (https://phabricator.wikimedia.org/T229904) [09:11:32] (03CR) 10Ema: [C: 03+1] nc_redirects.dat: Add rules to support non-canonical-redirect-5 [puppet] - 10https://gerrit.wikimedia.org/r/528398 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:17:26] (03PS2) 10Vgutierrez: ncredir: Introduce non-canonical-redirect-5 [puppet] - 10https://gerrit.wikimedia.org/r/528397 
(https://phabricator.wikimedia.org/T133548) [09:17:28] (03PS2) 10Vgutierrez: nc_redirects.dat: Add rules to support non-canonical-redirect-5 [puppet] - 10https://gerrit.wikimedia.org/r/528398 (https://phabricator.wikimedia.org/T133548) [09:19:15] (03CR) 10Vgutierrez: [C: 03+2] ncredir: Introduce non-canonical-redirect-5 [puppet] - 10https://gerrit.wikimedia.org/r/528397 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:19:30] (03PS3) 10Vgutierrez: ncredir: Introduce non-canonical-redirect-5 [puppet] - 10https://gerrit.wikimedia.org/r/528397 (https://phabricator.wikimedia.org/T133548) [09:20:42] (03PS1) 10Elukey: Revert "Set spark.executorEnv.LD_LIBRARY_PATH in the Hadoop test cluster's workers" [puppet] - 10https://gerrit.wikimedia.org/r/528408 [09:21:06] (03PS2) 10Elukey: Revert "Set spark.executorEnv.LD_LIBRARY_PATH in the Hadoop test cluster's workers" [puppet] - 10https://gerrit.wikimedia.org/r/528408 [09:21:28] (03CR) 10Elukey: [V: 03+2 C: 03+2] "Not needed, it seems that only the client needs to set it." [puppet] - 10https://gerrit.wikimedia.org/r/528408 (owner: 10Elukey) [09:25:29] (03CR) 10Vgutierrez: [C: 03+2] nc_redirects.dat: Add rules to support non-canonical-redirect-5 [puppet] - 10https://gerrit.wikimedia.org/r/528398 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:25:38] (03PS3) 10Vgutierrez: nc_redirects.dat: Add rules to support non-canonical-redirect-5 [puppet] - 10https://gerrit.wikimedia.org/r/528398 (https://phabricator.wikimedia.org/T133548) [09:27:03] PROBLEM - puppet last run on ncredir1001 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Exec[nginx-reload] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:27:27] ^^ checking [09:29:12] (03CR) 10Gehel: [C: 03+1] Add L and M to allowed statement starts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526755 (owner: 10Smalyshev) [09:31:23] (03CR) 10Muehlenhoff: [C: 03+1] "Great work! A few nits inline, but this looks good to merge" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/527591 (owner: 10Jbond) [09:31:57] glad I didn't completely break the puppet check! [09:32:16] hmm that check is missing the #page' [09:32:23] oh nope, that doesn't page [09:32:28] forget it :) [09:32:37] RECOVERY - puppet last run on ncredir1001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [09:32:56] hahah [09:33:40] and the ncredir glitch was a race condition.. nginx reloading before the new cert has been deployed in the server [09:34:51] (03CR) 10Vgutierrez: [C: 03+2] ncredir: Handle more domains to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/528394 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:34:56] (03PS2) 10Vgutierrez: ncredir: Handle more domains to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/528394 (https://phabricator.wikimedia.org/T133548) [09:35:16] (03CR) 10Volans: "Comments inline" (034 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 (owner: 10Jbond) [09:41:30] (03CR) 10Effie Mouzeli: [C: 03+1] "Just a question, other wise lgtm" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/528176 (owner: 10Giuseppe Lavagetto) [09:41:32] (03PS1) 10Alexandros Kosiaris: Revert "k8s, cache: disabling codfw services for k8s cluster recreation" [puppet] - 10https://gerrit.wikimedia.org/r/528409 (https://phabricator.wikimedia.org/T228837) [09:42:21] (03CR) 10Muehlenhoff: cassandra: rolling 
restart cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 (owner: 10Jbond) [09:42:34] (03CR) 10Legoktm: [C: 03+1] Update wmflabs.org redirect target [puppet] - 10https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896) (owner: 10DannyS712) [09:47:09] (03PS1) 10Filippo Giunchedi: icinga: add /alerts shortcut for faster ack'ing [puppet] - 10https://gerrit.wikimedia.org/r/528410 (https://phabricator.wikimedia.org/T228379) [09:49:37] (03PS1) 10Marostegui: install_server: Do not reimage db2131 [puppet] - 10https://gerrit.wikimedia.org/r/528411 (https://phabricator.wikimedia.org/T228969) [09:49:50] 10Operations, 10PDF-Rendering, 10Proton, 10Reading-Infrastructure-Team-Backlog, and 2 others: PDF renderer needs better CJK font - https://phabricator.wikimedia.org/T226633 (10MoritzMuehlenhoff) [09:50:11] (03PS2) 10Marostegui: install_server: Do not reimage db2131 [puppet] - 10https://gerrit.wikimedia.org/r/528411 (https://phabricator.wikimedia.org/T228969) [09:50:25] 10Operations, 10PDF-Rendering, 10Proton, 10Reading-Infrastructure-Team-Backlog, and 2 others: PDF renderer needs better CJK font - https://phabricator.wikimedia.org/T226633 (10MoritzMuehlenhoff) >>! In T226633#5395538, @Viztor wrote: > It would be best if we can do that, Noto/Source Serif CJK is the most c... [09:50:49] (03CR) 10Marostegui: [C: 03+2] install_server: Do not reimage db2131 [puppet] - 10https://gerrit.wikimedia.org/r/528411 (https://phabricator.wikimedia.org/T228969) (owner: 10Marostegui) [09:51:54] (03CR) 10Alexandros Kosiaris: [C: 04-1] "LGTM, minor inline comments." (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/526679 (https://phabricator.wikimedia.org/T229287) (owner: 10MSantos) [09:53:58] 10Operations, 10serviceops, 10PHP 7.2 support: Don't monitor HHVM on PHP7 only servers - https://phabricator.wikimedia.org/T228643 (10jijiki) 05Open→03Invalid @Dzahn you are right, I am marking this as invalid. 
[09:54:00] (03PS1) 10Marostegui: db2035: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/528413 (https://phabricator.wikimedia.org/T229784) [09:54:01] 10Operations, 10serviceops, 10Performance-Team (Radar), 10User-jijiki: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 (10jijiki) [09:55:26] (03CR) 10Marostegui: [C: 03+2] db2035: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/528413 (https://phabricator.wikimedia.org/T229784) (owner: 10Marostegui) [09:55:49] 10Operations, 10ops-eqiad, 10DC-Ops, 10media-storage: ms-be1040 - disk issues - https://phabricator.wikimedia.org/T229880 (10fgiunchedi) It looks like to me the host came back up with the wrong disk ordering, (sdc should be sdb), I'll try rebooting the host [09:57:51] 10Operations, 10ops-eqiad, 10DC-Ops: b6-eqiad pdu refresh - https://phabricator.wikimedia.org/T227541 (10RobH) [09:58:59] !log filippo@cumin1001 START - Cookbook sre.hosts.downtime [09:58:59] !log filippo@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [09:59:07] !log filippo@cumin1001 START - Cookbook sre.hosts.downtime [09:59:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:59:08] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [09:59:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:59:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:59:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:05:18] 10Operations, 10ops-eqiad, 10DC-Ops, 10media-storage: ms-be1040 - disk issues - https://phabricator.wikimedia.org/T229880 (10fgiunchedi) 05Open→03Resolved Looks like we're back, and this problem is an instance of {T163673} ` ms-be1040:~$ pat Warning: Downgrading to PSON for future requests Info: Using... 
[10:06:20] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [10:06:21] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [10:06:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:06:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:07:27] !log rebooting etherpad1001 to pick up MDS-enabled qemu [10:07:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:09:01] RECOVERY - puppet last run on ms-be1040 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:10:05] (03PS2) 10Ema: ATS: unify common trafficserver settings [puppet] - 10https://gerrit.wikimedia.org/r/528104 (https://phabricator.wikimedia.org/T227432) [10:10:15] (03PS4) 10Effie Mouzeli: profile::mediawiki::hhvm: default php to php7 on stretch [puppet] - 10https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T189295) (owner: 10Giuseppe Lavagetto) [10:12:52] 10Operations, 10ops-eqiad: (OoW) Degraded RAID on analytics1032 - https://phabricator.wikimedia.org/T227940 (10elukey) 05Open→03Resolved The alert should not fire again (I hope), I have disabled it via Icinga UI. Closing :) [10:13:01] 10Operations, 10ops-eqiad: (OoW) Degraded RAID on analytics1039 - https://phabricator.wikimedia.org/T226599 (10elukey) 05Open→03Resolved The alert should not fire again (I hope), I have disabled it via Icinga UI. 
Closing :) [10:16:07] (03CR) 10Ema: "pcc says noop https://puppet-compiler.wmflabs.org/compiler1001/17744/" [puppet] - 10https://gerrit.wikimedia.org/r/528104 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [10:17:53] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/528104 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [10:19:56] 10Operations, 10ops-codfw, 10DBA, 10DC-Ops, 10decommission: Decommission db2035 - https://phabricator.wikimedia.org/T229784 (10Marostegui) [10:21:47] (03CR) 10Ema: [C: 03+2] ATS: unify common trafficserver settings [puppet] - 10https://gerrit.wikimedia.org/r/528104 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [10:22:42] PROBLEM - Host mw1280 is DOWN: PING CRITICAL - Packet loss = 100% [10:23:34] RECOVERY - Host mw1280 is UP: PING OK - Packet loss = 16%, RTA = 0.17 ms [10:24:47] (03PS3) 10Giuseppe Lavagetto: envoyproxy: create module, add tls terminator definition [puppet] - 10https://gerrit.wikimedia.org/r/526110 [10:29:38] (03PS1) 10Filippo Giunchedi: swift: 'nobarrier' for xfs has been removed in linux 4.19 [puppet] - 10https://gerrit.wikimedia.org/r/528419 (https://phabricator.wikimedia.org/T229911) [10:30:09] (03PS5) 10Jbond: cassandra: rolling restart cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 [10:30:25] (03CR) 10Jbond: "Thanks all for the review comments inline" (035 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 (owner: 10Jbond) [10:32:19] (03CR) 10Ema: [C: 03+1] "One typo, other than that LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526110 (owner: 10Giuseppe Lavagetto) [10:35:38] (03CR) 10Muehlenhoff: cassandra: rolling restart cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 (owner: 10Jbond) [10:39:00] 10Operations, 10SRE-Access-Requests: Requesting access to Puppet for Viztor[S] - https://phabricator.wikimedia.org/T229894 (10Aklapper) 05Open→03Stalled @Viztor: Please see 
https://wikitech.wikimedia.org/wiki/Production_shell_access section 2.2 for missing information here, and then provide missing informa... [10:39:05] (03PS3) 10Mathew.onipe: Cassandra nodetool repair cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/517377 (https://phabricator.wikimedia.org/T225694) [10:40:10] (03CR) 10Mathew.onipe: Cassandra nodetool repair cookbook (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/517377 (https://phabricator.wikimedia.org/T225694) (owner: 10Mathew.onipe) [10:40:16] 10Operations, 10Discovery, 10Traffic, 10WMDE-Analytics-Engineering, and 3 others: Allow access to wdqs.svc.eqiad.wmnet on port 8888 - https://phabricator.wikimedia.org/T176875 (10elukey) Changed the following: (Cc: @ayounsi ) ` elukey@re0.cr2-eqiad# show | compare [edit firewall family inet filter analyti... [10:43:18] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, two comments inline, good to merge from my PoV." (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 (owner: 10Jbond) [10:43:41] 10Operations, 10Discovery, 10Traffic, 10WMDE-Analytics-Engineering, and 3 others: Allow access to wdqs.svc.eqiad.wmnet on port 8888 - https://phabricator.wikimedia.org/T176875 (10elukey) Adding @WMDE-leszek and @Ladsgroup since afaics they were/are working on this :) [10:47:12] (03CR) 10Filippo Giunchedi: "CC'd DBAs as it might affect them too" [puppet] - 10https://gerrit.wikimedia.org/r/528419 (https://phabricator.wikimedia.org/T229911) (owner: 10Filippo Giunchedi) [10:48:30] (03CR) 10Urbanecm: [C: 03+1] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/527773 (owner: 10Viztor) [10:49:11] (03PS6) 10Jbond: cassandra: rolling restart cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 [10:49:19] (03CR) 10Jbond: cassandra: rolling restart cookbook (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 (owner: 10Jbond) [10:49:29] (03CR) 10Marostegui: "Thanks - I will take a look!" 
[puppet] - 10https://gerrit.wikimedia.org/r/528419 (https://phabricator.wikimedia.org/T229911) (owner: 10Filippo Giunchedi) [10:49:52] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [10:49:53] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [10:49:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:50:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:50:26] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] Add L and M to allowed statement starts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526755 (owner: 10Smalyshev) [10:52:32] !log rebooting install2002 to pick up MDS-enabled qemu [10:52:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:20] (03PS7) 10Jbond: cassandra: rolling restart cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 [10:54:27] (03CR) 10Volans: [C: 04-1] "One small detail to fix, the other are nits" (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 (owner: 10Jbond) [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190806T1100). [11:00:04] Viztor_ and Amir1: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:17] Viztor: Around? 
[11:00:22] o/ [11:00:28] (03PS5) 10Effie Mouzeli: profile::mediawiki::hhvm: default php to php7 on stretch [puppet] - 10https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T195392) (owner: 10Giuseppe Lavagetto) [11:00:30] o/ [11:00:45] Amir1, feel free to start with your patch [11:01:24] Sure [11:02:02] (03CR) 10Ladsgroup: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528405 (https://phabricator.wikimedia.org/T229904) (owner: 10Ladsgroup) [11:02:09] (03PS2) 10Ladsgroup: Disable EntitySchema in production wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528405 (https://phabricator.wikimedia.org/T229904) [11:02:19] (03CR) 10Ladsgroup: [C: 03+2] Disable EntitySchema in production wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528405 (https://phabricator.wikimedia.org/T229904) (owner: 10Ladsgroup) [11:02:27] (03CR) 10Marostegui: "Created https://phabricator.wikimedia.org/T229915 in our backlog" [puppet] - 10https://gerrit.wikimedia.org/r/528419 (https://phabricator.wikimedia.org/T229911) (owner: 10Filippo Giunchedi) [11:03:17] (03CR) 10Urbanecm: [C: 03+1] "> Patch Set 8:" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: 10Viztor) [11:05:10] (03PS6) 10Effie Mouzeli: profile::mediawiki::hhvm: default php to php7 on stretch [puppet] - 10https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T195392) (owner: 10Giuseppe Lavagetto) [11:05:25] (03Merged) 10jenkins-bot: Disable EntitySchema in production wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528405 (https://phabricator.wikimedia.org/T229904) (owner: 10Ladsgroup) [11:06:10] 10Operations, 10Traffic: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10ema) p:05Triage→03Normal a:03ema [11:06:33] (03CR) 10jenkins-bot: Disable EntitySchema in production wikidata [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/528405 (https://phabricator.wikimedia.org/T229904) (owner: 10Ladsgroup) [11:07:07] (03PS8) 10Jbond: cassandra: rolling restart cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 [11:07:36] (03CR) 10Volans: [C: 04-1] "Some minor things to fix inline" (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/517377 (https://phabricator.wikimedia.org/T225694) (owner: 10Mathew.onipe) [11:08:30] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:528405|Disable EntitySchema in production wikidata ]] (duration: 00m 48s) [11:08:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:09:19] Urbanecm: I'm done [11:09:45] thanks. I'd like Viztor to be around, to teach them how a SWAT window looks like, and since the patch is not urgent, let's skip it [11:09:48] (03CR) 10Volans: [C: 03+1] "LGTM, if you didn't try the status_cmd via cumin I'd suggest to, to make sure it works as expected due to the various encapsulations ;)" [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 (owner: 10Jbond) [11:10:33] (03CR) 10Aklapper: [C: 03+1] Update wmflabs.org redirect target [puppet] - 10https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896) (owner: 10DannyS712) [11:11:12] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [11:11:13] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [11:11:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:11:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:11:33] !log rebooting install1002 to pick up MDS-enabled qemu [11:11:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:59] 10Operations, 10Traffic: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10ema) [11:13:00] (03CR) 10Jbond: "Thanks command tested merging now" (033 comments) [cookbooks] - 
10https://gerrit.wikimedia.org/r/528133 (owner: 10Jbond) [11:13:02] (03CR) 10Jbond: [C: 03+2] cassandra: rolling restart cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 (owner: 10Jbond) [11:13:04] jouncebot: now [11:13:04] (03PS9) 10Jbond: cassandra: rolling restart cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/528133 [11:13:04] For the next 0 hour(s) and 46 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190806T1100) [11:13:17] Urbanecm Hey! Would you be available now? [11:13:30] Daimona, certainly! [11:13:42] (03CR) 10Volans: [C: 03+1] "LGTM" [software/conftool] - 10https://gerrit.wikimedia.org/r/527564 (owner: 10Giuseppe Lavagetto) [11:13:46] 10Operations, 10Traffic: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10mark) Approved for access. [11:13:57] Cool! Given your last comment, I believe it could really be a coincidence, although it's still weird that the errors didn't disappear. Wanna try again now? [11:14:10] (03CR) 10Volans: [C: 03+1] "LGTM" [software/conftool] - 10https://gerrit.wikimedia.org/r/527565 (owner: 10Giuseppe Lavagetto) [11:14:17] Sure [11:14:24] I'm going to merge the backport now [11:14:27] and let's try what happens [11:14:47] https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/528425/ [11:14:59] Just plz gimme one minute to set up logstash etc. [11:15:04] thanks [11:15:16] sure [11:15:37] 10Operations, 10Cassandra: Create a cassandra.service which subsumes casandra-{a,b,c} services using PartsOf=cassandra.service - https://phabricator.wikimedia.org/T229916 (10jbond) [11:16:06] You can V+2 on gerrit in the meanwhile [11:16:33] done [11:16:52] And I'm ready, so whenever you want :) [11:17:28] I'm ready as well [11:17:50] fetched on mwdebug1002, since you tested it locally, you'd be probably able to test it there as well [11:18:10] Sure, can I test there already? 
[11:18:22] yes, the patch should be on mwdebug1002 [11:19:02] Confirming that it seems to work there, just like it does locally [11:19:16] While it doesn't out of mwdebug1002, so green light [11:20:03] okay, syncing [11:20:34] PROBLEM - puppet last run on stat1006 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 8 minutes ago with 4 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config],Exec[git_pull_mediawiki/event-schemas],Exec[git_pull_statistics_mediawiki],Exec[git_pull_analytics/reportupdater] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [11:21:08] Ty [11:21:14] happy to help [11:21:38] !log urbanecm@deploy1001 Synchronized php-1.34.0-wmf.16/extensions/AbuseFilter/: SWAT: 8cc96db: Better handling of DNONE (T214674, T228677) (duration: 00m 48s) [11:21:42] synced [11:21:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:52] T228677: use of get_matches function returns "Requesting array item of non-array" - https://phabricator.wikimedia.org/T228677 [11:21:53] T214674: Short circuit fails with assignments - https://phabricator.wikimedia.org/T214674 [11:22:02] Daimona, ^^ [11:22:09] Thanks, checking [11:22:09] watching logs, but would appreciate if you do the same :) [11:22:28] Yeah I have like 5 tabs [11:22:32] 10Operations, 10Traffic: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10ema) [11:22:34] Tests on-wiki work as expected [11:22:46] thanks a lot [11:23:18] Parser errors seem to have dropped, while we got another spike for slow filters, but let it sit there for 5 minutes [11:23:39] sure [11:25:39] (03Abandoned) 10Elukey: role::analytics_cluster::turnilo: add TLS proxy [puppet] - 10https://gerrit.wikimedia.org/r/524259 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [11:25:45] (03Abandoned) 10Elukey: role::analytics_cluster::webserver: add TLS proxy [puppet] - 10https://gerrit.wikimedia.org/r/524258 
(https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [11:25:46] Looks much better now, look at the drop https://logstash.wikimedia.org/goto/7ff2a202b79d6b690665f7dc6291c8ea [11:25:53] (03Abandoned) 10Elukey: role::analytics_cluster::superset: add TLS proxy [puppet] - 10https://gerrit.wikimedia.org/r/524255 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [11:26:07] (03Abandoned) 10Elukey: [WIP] profile::netbox: allow /metrics to be polled via http [puppet] - 10https://gerrit.wikimedia.org/r/527601 (owner: 10Elukey) [11:26:34] And the slow-filters spike is also gone, now I think it's definitely related, but it's just some temporary problem - nothing serious I hope [11:26:44] Daimona, I agree with you [11:26:56] don't know why it didn't work last time [11:27:07] anyway, glad it works now :) [11:28:57] (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.0.26 [software/spicerack] - 10https://gerrit.wikimedia.org/r/528428 [11:29:33] Daimona, anything else I may help with now? [11:29:35] (03PS6) 10Jbond: apereo_cas: Initial module [puppet] - 10https://gerrit.wikimedia.org/r/527591 [11:31:18] Well, I think we're done :) [11:31:20] Thanks a lot! [11:31:28] 10Operations, 10Traffic: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10ssingh) [11:31:51] 10Operations, 10Traffic: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10ema) [11:31:52] Happy to help! 
[11:31:55] !Log EU SWAT done [11:36:38] I don’t think that works with an uppercase L [11:36:46] (03CR) 10Urbanecm: [C: 03+1] Update wmflabs.org redirect target [puppet] - 10https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896) (owner: 10DannyS712) [11:36:55] !log EU SWAT done [11:37:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:04] Lucas_WMDE, thanks, I was wondering what's happening :) [11:37:14] (03CR) 10jerkins-bot: [V: 04-1] apereo_cas: Initial module [puppet] - 10https://gerrit.wikimedia.org/r/527591 (owner: 10Jbond) [11:41:27] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, one nit." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/528419 (https://phabricator.wikimedia.org/T229911) (owner: 10Filippo Giunchedi) [11:42:56] RECOVERY - puppet last run on stat1006 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [11:43:02] (03PS7) 10Jbond: apereo_cas: Initial module [puppet] - 10https://gerrit.wikimedia.org/r/527591 [11:43:24] (03PS2) 10Alexandros Kosiaris: calico: add all kafka-main hosts to k8s eventgate policy [puppet] - 10https://gerrit.wikimedia.org/r/528275 (https://phabricator.wikimedia.org/T225005) (owner: 10Herron) [11:43:30] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] calico: add all kafka-main hosts to k8s eventgate policy [puppet] - 10https://gerrit.wikimedia.org/r/528275 (https://phabricator.wikimedia.org/T225005) (owner: 10Herron) [11:44:48] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.0.26 [software/spicerack] - 10https://gerrit.wikimedia.org/r/528428 (owner: 10Volans) [11:46:20] (03PS1) 10Alexandros Kosiaris: calico: add all kafka-main hosts to k8s eventgate policy [deployment-charts] - 10https://gerrit.wikimedia.org/r/528432 (https://phabricator.wikimedia.org/T225005) [11:46:25] (03CR) 10jerkins-bot: [V: 04-1] apereo_cas: Initial module [puppet] - 
10https://gerrit.wikimedia.org/r/527591 (owner: 10Jbond) [11:47:13] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] "Applied. Also copied to https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/528432 which is where this file is going to live" [puppet] - 10https://gerrit.wikimedia.org/r/528275 (https://phabricator.wikimedia.org/T225005) (owner: 10Herron) [11:47:54] (03PS2) 10Alexandros Kosiaris: calico: add all kafka-main hosts to k8s eventgate policy [deployment-charts] - 10https://gerrit.wikimedia.org/r/528432 (https://phabricator.wikimedia.org/T225005) [11:48:00] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] calico: add all kafka-main hosts to k8s eventgate policy [deployment-charts] - 10https://gerrit.wikimedia.org/r/528432 (https://phabricator.wikimedia.org/T225005) (owner: 10Alexandros Kosiaris) [11:48:52] (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.26 [software/spicerack] - 10https://gerrit.wikimedia.org/r/528428 (owner: 10Volans) [11:49:35] !log akosiaris@ helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' . [11:49:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:55] (03CR) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.26 [software/spicerack] - 10https://gerrit.wikimedia.org/r/528428 (owner: 10Volans) [11:49:57] !log akosiaris@ helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' . 
[11:50:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:53] (03PS2) 10Paladox: gerrit: replication: exclude some projects [puppet] - 10https://gerrit.wikimedia.org/r/528276 (owner: 10Thcipriani) [11:51:58] (03CR) 10Jbond: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/527591 (owner: 10Jbond) [11:52:29] (03PS4) 10Mathew.onipe: Cassandra nodetool repair cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/517377 (https://phabricator.wikimedia.org/T225694) [11:52:49] (03CR) 10Mathew.onipe: Cassandra nodetool repair cookbook (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/517377 (https://phabricator.wikimedia.org/T225694) (owner: 10Mathew.onipe) [11:53:04] volans: ^ [11:53:45] (03PS1) 10Paladox: Gerrit: Switch 'mirror' back on for the GitHub remote [puppet] - 10https://gerrit.wikimedia.org/r/528433 [11:53:58] (03CR) 10Mathew.onipe: "> Patch Set 4:" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/517377 (https://phabricator.wikimedia.org/T225694) (owner: 10Mathew.onipe) [11:54:33] (03PS2) 10Paladox: Gerrit: Switch 'mirror' back on for the GitHub remote [puppet] - 10https://gerrit.wikimedia.org/r/528433 [11:54:42] (03PS2) 10Alexandros Kosiaris: mathoid: Partialy revert 269abb124130e0f [deployment-charts] - 10https://gerrit.wikimedia.org/r/528404 (https://phabricator.wikimedia.org/T228837) [11:54:52] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] mathoid: Partialy revert 269abb124130e0f [deployment-charts] - 10https://gerrit.wikimedia.org/r/528404 (https://phabricator.wikimedia.org/T228837) (owner: 10Alexandros Kosiaris) [11:55:14] (03CR) 10jerkins-bot: [V: 04-1] apereo_cas: Initial module [puppet] - 10https://gerrit.wikimedia.org/r/527591 (owner: 10Jbond) [11:58:18] (03PS2) 10Elukey: profile::cache::kafka::alerts: move alarms to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/526611 (https://phabricator.wikimedia.org/T229357) [11:58:57] (03PS3) 10Paladox: Gerrit: Switch 'mirror' back on for 
the GitHub remote [puppet] - 10https://gerrit.wikimedia.org/r/528433 [11:59:03] (03CR) 10Paladox: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/528433 (owner: 10Paladox) [11:59:08] PROBLEM - Unmerged changes on repository puppet on labpuppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [11:59:23] 10Operations, 10Discovery, 10Traffic, 10WMDE-Analytics-Engineering, and 3 others: Allow access to wdqs.svc.eqiad.wmnet on port 8888 - https://phabricator.wikimedia.org/T176875 (10Gehel) At the moment, we have a [[ https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/wdqs/gui.pp#L24... [12:00:14] PROBLEM - Unmerged changes on repository puppet on labpuppetmaster1002 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [12:00:41] 10Operations, 10SRE-Access-Requests, 10cloud-services-team (Kanban): SRE: root access for Hieu Pham, SRE @ WMCS - https://phabricator.wikimedia.org/T229833 (10Phamhi) My SSH public key: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHJ+tdR854tJNFXK14dxJFrURxnEpk1rE+KkEUPeXrrZ hpham@wikimedia.org(prod) [12:02:43] (03CR) 10Paladox: [C: 04-1] gerrit: replication: exclude some projects (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/528276 (owner: 10Thcipriani) [12:03:02] (03PS1) 10Arturo Borrero Gonzalez: admin: add Hieu Pham credentials [puppet] - 10https://gerrit.wikimedia.org/r/528434 (https://phabricator.wikimedia.org/T229833) [12:04:07] (03PS3) 10Elukey: profile::cache::kafka::alerts: move alarms to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/526611 (https://phabricator.wikimedia.org/T229357) [12:05:31] !log akosiaris@ helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 
'calico-policy-controller' . [12:05:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:06:16] !log akosiaris@ helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' . [12:06:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:31] (03PS1) 10Ema: ATS: add {upload,maps}_domain to text_ats settings [puppet] - 10https://gerrit.wikimedia.org/r/528436 (https://phabricator.wikimedia.org/T207340) [12:07:44] (03PS1) 10Muehlenhoff: Remove obsolete/unsupported nobarrier option from three partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/528437 (https://phabricator.wikimedia.org/T229915) [12:08:58] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/528434 (https://phabricator.wikimedia.org/T229833) (owner: 10Arturo Borrero Gonzalez) [12:09:28] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] admin: add Hieu Pham credentials [puppet] - 10https://gerrit.wikimedia.org/r/528434 (https://phabricator.wikimedia.org/T229833) (owner: 10Arturo Borrero Gonzalez) [12:10:25] (03PS4) 10Elukey: profile::cache::kafka::alerts: move alarms to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/526611 (https://phabricator.wikimedia.org/T229357) [12:11:30] RECOVERY - Unmerged changes on repository puppet on labpuppetmaster1002 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [12:11:58] RECOVERY - Unmerged changes on repository puppet on labpuppetmaster1001 is OK: No changes to merge. 
https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [12:13:21] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/17747/icinga1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/526611 (https://phabricator.wikimedia.org/T229357) (owner: 10Elukey) [12:14:06] (03CR) 10Marostegui: [C: 03+1] Remove obsolete/unsupported nobarrier option from three partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/528437 (https://phabricator.wikimedia.org/T229915) (owner: 10Muehlenhoff) [12:14:41] (03PS2) 10Ema: ATS: add {upload,maps}_domain to text_ats settings [puppet] - 10https://gerrit.wikimedia.org/r/528436 (https://phabricator.wikimedia.org/T207340) [12:14:43] (03PS1) 10Ema: ATS: add prometheus::varnishkafka_exporter::config [puppet] - 10https://gerrit.wikimedia.org/r/528439 (https://phabricator.wikimedia.org/T196066) [12:14:47] (03PS1) 10Ema: ATS: add profile::base::nameservers [puppet] - 10https://gerrit.wikimedia.org/r/528440 (https://phabricator.wikimedia.org/T228190) [12:15:42] (03PS8) 10Jbond: apereo_cas: Initial module [puppet] - 10https://gerrit.wikimedia.org/r/527591 [12:16:08] (03CR) 10Ema: [C: 03+2] ATS: add {upload,maps}_domain to text_ats settings [puppet] - 10https://gerrit.wikimedia.org/r/528436 (https://phabricator.wikimedia.org/T207340) (owner: 10Ema) [12:16:20] (03CR) 10Marostegui: "For context: read backlog on #wikimedia-sre channel :)" [puppet] - 10https://gerrit.wikimedia.org/r/528433 (owner: 10Paladox) [12:22:01] (03PS9) 10Jbond: apereo_cas: Initial module [puppet] - 10https://gerrit.wikimedia.org/r/527591 [12:24:24] (03CR) 10Jbond: [C: 03+2] apereo_cas: Initial module [puppet] - 10https://gerrit.wikimedia.org/r/527591 (owner: 10Jbond) [12:25:58] (03PS2) 10Filippo Giunchedi: swift: 'nobarrier' for xfs has been removed in linux 4.19 [puppet] - 10https://gerrit.wikimedia.org/r/528419 (https://phabricator.wikimedia.org/T229911) [12:26:10] (03CR) 10Filippo Giunchedi: "Thanks!" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/528419 (https://phabricator.wikimedia.org/T229911) (owner: 10Filippo Giunchedi) [12:26:46] (03CR) 10Filippo Giunchedi: "Interesting! Thanks for tracking this down" [puppet] - 10https://gerrit.wikimedia.org/r/528437 (https://phabricator.wikimedia.org/T229915) (owner: 10Muehlenhoff) [12:28:41] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM! Thanks Luca" [puppet] - 10https://gerrit.wikimedia.org/r/526611 (https://phabricator.wikimedia.org/T229357) (owner: 10Elukey) [12:29:24] 10Operations, 10SRE-tools, 10serviceops-radar, 10Patch-For-Review, and 3 others: Convert makevm to spicerack cookbook - https://phabricator.wikimedia.org/T203963 (10elukey) Can we close this? [12:30:04] (03CR) 10Muehlenhoff: [C: 03+1] swift: 'nobarrier' for xfs has been removed in linux 4.19 [puppet] - 10https://gerrit.wikimedia.org/r/528419 (https://phabricator.wikimedia.org/T229911) (owner: 10Filippo Giunchedi) [12:30:19] !log roll restart cassandra on aqs for openjdk-8 upgrades [12:30:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:31:30] (03PS1) 10Volans: Upstream release v0.0.26 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/528443 [12:34:23] 10Operations, 10SRE-tools, 10serviceops-radar, 10Patch-For-Review, and 3 others: Convert makevm to spicerack cookbook - https://phabricator.wikimedia.org/T203963 (10MoritzMuehlenhoff) From my PoV yes, I've used this multiple times successfully to create Ganeti instances, all further enhancesments can be do... 
[12:35:45] (03PS2) 10Muehlenhoff: Remove obsolete/unsupported nobarrier option from three partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/528437 (https://phabricator.wikimedia.org/T229915) [12:36:52] (03PS3) 10Filippo Giunchedi: swift: 'nobarrier' for xfs has been removed in linux 4.19 [puppet] - 10https://gerrit.wikimedia.org/r/528419 (https://phabricator.wikimedia.org/T229911) [12:37:34] (03CR) 10Filippo Giunchedi: [C: 03+2] swift: 'nobarrier' for xfs has been removed in linux 4.19 [puppet] - 10https://gerrit.wikimedia.org/r/528419 (https://phabricator.wikimedia.org/T229911) (owner: 10Filippo Giunchedi) [12:38:18] (03CR) 10Volans: [C: 03+2] Upstream release v0.0.26 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/528443 (owner: 10Volans) [12:38:45] jbond42: merging your patch too [12:42:21] (03Merged) 10jenkins-bot: Upstream release v0.0.26 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/528443 (owner: 10Volans) [12:43:37] (03PS1) 10Elukey: role::aqs: update druid configuration with new MW snapshot [puppet] - 10https://gerrit.wikimedia.org/r/528445 [12:45:07] (03PS1) 10Krinkle: CommonSettings: Clean up wmf-config caching code [no-op] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528446 (https://phabricator.wikimedia.org/T217830) [12:45:09] (03PS1) 10Krinkle: CommonSettings: Store mtime inside wmf-config cache file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528447 (https://phabricator.wikimedia.org/T217830) [12:45:23] godog: thanks [12:45:39] sorry, got distracted [12:45:53] (03CR) 10Krinkle: "Untested." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528446 (https://phabricator.wikimedia.org/T217830) (owner: 10Krinkle) [12:46:08] (03CR) 10Krinkle: "Untested."
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/528447 (https://phabricator.wikimedia.org/T217830) (owner: 10Krinkle) [12:46:10] 10Operations, 10SRE-tools, 10serviceops-radar, 10Patch-For-Review, and 3 others: Convert makevm to spicerack cookbook - https://phabricator.wikimedia.org/T203963 (10elukey) 05Open→03Resolved Same for me, please re-open if necessary! [12:46:13] 10Operations, 10SRE-tools, 10User-Joe, 10User-jijiki: Spicerack cookbooks TODO list - https://phabricator.wikimedia.org/T203943 (10elukey) [12:48:24] (03PS1) 10Alexandros Kosiaris: mathoid: Take tiller into account as well [deployment-charts] - 10https://gerrit.wikimedia.org/r/528451 [12:50:31] jbond42: np, also looks like an unrelated change to check_puppetrun slipped into the review btw [12:51:15] (03PS2) 10Krinkle: CommonSettings: Store mtime inside wmf-config cache file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528447 (https://phabricator.wikimedia.org/T217830) [12:51:23] godog: yes, rubocop was complaining about something; not sure why it got checked with my change or why it didn't get picked up originally, but figured it was easy to just fix it :) [12:54:49] jbond42: interesting! I wonder why rubocop from CI didn't pick up on it, it should have AIUI [12:56:13] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] mathoid: Take tiller into account as well [deployment-charts] - 10https://gerrit.wikimedia.org/r/528451 (owner: 10Alexandros Kosiaris) [12:56:35] (03PS1) 10Arturo Borrero Gonzalez: nagios: add contacts for new SRE @ WMCS: Hieu Pham [puppet] - 10https://gerrit.wikimedia.org/r/528454 (https://phabricator.wikimedia.org/T228942) [12:56:42] godog: yes, it should have; not sure why it didn't.
for reference this is the run that failed on me https://integration.wikimedia.org/ci/job/operations-puppet-tests-stretch-docker/18715/console [13:01:09] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] nagios: add contacts for new SRE @ WMCS: Hieu Pham [puppet] - 10https://gerrit.wikimedia.org/r/528454 (https://phabricator.wikimedia.org/T228942) (owner: 10Arturo Borrero Gonzalez) [13:05:28] (03PS3) 10Muehlenhoff: Remove obsolete/unsupported nobarrier option from three partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/528437 (https://phabricator.wikimedia.org/T229915) [13:05:37] (03PS1) 10Arturo Borrero Gonzalez: icinga: add permissions for new SRE @ WMCS: Hieu Pham [puppet] - 10https://gerrit.wikimedia.org/r/528457 (https://phabricator.wikimedia.org/T228942) [13:07:57] (03CR) 10CDanis: [C: 03+1] Icinga: Remove support for Jessie [puppet] - 10https://gerrit.wikimedia.org/r/528391 (owner: 10Muehlenhoff) [13:08:01] (03PS1) 10Pmiazga: Enable AMC on all wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528458 (https://phabricator.wikimedia.org/T228916) [13:08:18] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] icinga: add permissions for new SRE @ WMCS: Hieu Pham [puppet] - 10https://gerrit.wikimedia.org/r/528457 (https://phabricator.wikimedia.org/T228942) (owner: 10Arturo Borrero Gonzalez) [13:09:29] (03CR) 10CDanis: [C: 03+1] "LG!"
[puppet] - 10https://gerrit.wikimedia.org/r/528410 (https://phabricator.wikimedia.org/T228379) (owner: 10Filippo Giunchedi) [13:11:23] (03PS3) 10Jhedden: toolschecker: Ensure webservice is fully stopped [puppet] - 10https://gerrit.wikimedia.org/r/528292 (https://phabricator.wikimedia.org/T221301) [13:12:35] (03PS2) 10Ema: ATS: add prometheus::varnishkafka_exporter::config [puppet] - 10https://gerrit.wikimedia.org/r/528439 (https://phabricator.wikimedia.org/T196066) [13:12:36] (03CR) 10Jhedden: [C: 03+2] toolschecker: Ensure webservice is fully stopped [puppet] - 10https://gerrit.wikimedia.org/r/528292 (https://phabricator.wikimedia.org/T221301) (owner: 10Jhedden) [13:12:54] jbond42: indeed, bizarre, I don't have time to dig into it further now tho :| [13:14:37] (03PS3) 10Ema: ATS: add prometheus::varnishkafka_exporter::config [puppet] - 10https://gerrit.wikimedia.org/r/528439 (https://phabricator.wikimedia.org/T196066) [13:15:54] (03PS1) 10Elukey: Update yarn.wikimedia.org.crt after regeneration [puppet] - 10https://gerrit.wikimedia.org/r/528461 [13:16:16] (03PS2) 10Filippo Giunchedi: icinga: add /alerts shortcut for faster ack'ing [puppet] - 10https://gerrit.wikimedia.org/r/528410 (https://phabricator.wikimedia.org/T228379) [13:16:18] (03PS1) 10Filippo Giunchedi: monitoring::host: rename critical to paging [puppet] - 10https://gerrit.wikimedia.org/r/528462 (https://phabricator.wikimedia.org/T228379) [13:16:20] (03PS1) 10Filippo Giunchedi: monitoring::service rename critical to paging [puppet] - 10https://gerrit.wikimedia.org/r/528463 (https://phabricator.wikimedia.org/T228379) [13:16:24] (03CR) 10Ema: [C: 03+2] ATS: add prometheus::varnishkafka_exporter::config [puppet] - 10https://gerrit.wikimedia.org/r/528439 (https://phabricator.wikimedia.org/T196066) (owner: 10Ema) [13:16:54] (03PS2) 10Elukey: Update yarn.wikimedia.org.crt after regeneration [puppet] - 10https://gerrit.wikimedia.org/r/528461 [13:16:56] (03CR) 10Krinkle: "Does this mean a third-party origin
can fetch asserts from people.wikimedia.org and inspect its headers? If so, that might be a problem if" [puppet] - 10https://gerrit.wikimedia.org/r/522991 (https://phabricator.wikimedia.org/T224068) (owner: 10Gergő Tisza) [13:17:03] (03PS2) 10Ema: ATS: add profile::base::nameservers [puppet] - 10https://gerrit.wikimedia.org/r/528440 (https://phabricator.wikimedia.org/T228190) [13:17:07] (03CR) 10Filippo Giunchedi: "Put WIP by mistake" [puppet] - 10https://gerrit.wikimedia.org/r/528410 (https://phabricator.wikimedia.org/T228379) (owner: 10Filippo Giunchedi) [13:17:29] (03PS2) 10Filippo Giunchedi: monitoring::host: rename critical to paging [puppet] - 10https://gerrit.wikimedia.org/r/528462 (https://phabricator.wikimedia.org/T228379) [13:17:31] (03PS2) 10Filippo Giunchedi: monitoring::service rename critical to paging [puppet] - 10https://gerrit.wikimedia.org/r/528463 (https://phabricator.wikimedia.org/T228379) [13:18:09] (03CR) 10Krinkle: "Ah, right, that's controlled by Access-Control-Allow-Credentials. Might be worth an inline comment to further indidate this should not all" [puppet] - 10https://gerrit.wikimedia.org/r/522991 (https://phabricator.wikimedia.org/T224068) (owner: 10Gergő Tisza) [13:18:16] (03CR) 10Elukey: [V: 03+2 C: 03+2] Update yarn.wikimedia.org.crt after regeneration [puppet] - 10https://gerrit.wikimedia.org/r/528461 (owner: 10Elukey) [13:19:04] puppet-merge has a queue of 3 commits now [13:19:08] elukey: you can go ahead and puppet-merge my change whenever [13:19:13] ack! [13:19:24] what is the IRC name of Jhedden? [13:19:36] jeh [13:19:46] merging now [13:19:47] h! 
[13:19:49] (03CR) 10Krinkle: "(or even explicitly unset it?)" [puppet] - 10https://gerrit.wikimedia.org/r/522991 (https://phabricator.wikimedia.org/T224068) (owner: 10Gergő Tisza) [13:19:49] hello :) [13:19:59] (03CR) 10jerkins-bot: [V: 04-1] monitoring::service rename critical to paging [puppet] - 10https://gerrit.wikimedia.org/r/528463 (https://phabricator.wikimedia.org/T228379) (owner: 10Filippo Giunchedi) [13:20:00] all right thanks! [13:20:36] mine is safe to merge anytime, looks like there's multiple things in queue [13:21:06] that time of the day where we got to take tickets to be in line [13:21:25] ema: safe to merge `ATS: add prometheus::varnishkafka_exporter::config`? [13:21:45] nm, I see your message above [13:21:59] jeh: yeah you can proceed! [13:22:03] elukey: OK to merge yours as well? [13:22:40] yep! [13:23:52] merged, thanks :) [13:25:25] 10Operations, 10Analytics, 10Core Platform Team Legacy (Watching / External), 10Patch-For-Review, and 2 others: Replace and expand codfw kafka main hosts (kafka200[123]) with kafka-main200[12345] - https://phabricator.wikimedia.org/T225005 (10Ottomata) In addition to the steps in https://phabricator.wikime... [13:26:35] (03PS1) 10Volans: debian: re-add the tests directory in the package [software/conftool] - 10https://gerrit.wikimedia.org/r/528467 [13:27:49] (03PS4) 10Muehlenhoff: Remove obsolete/unsupported nobarrier option from three partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/528437 (https://phabricator.wikimedia.org/T229915) [13:28:04] (03CR) 10Herron: [C: 03+1] "Looks great! 
Much easier to understand" [puppet] - 10https://gerrit.wikimedia.org/r/528462 (https://phabricator.wikimedia.org/T228379) (owner: 10Filippo Giunchedi) [13:29:10] (03CR) 10Muehlenhoff: [C: 03+2] Remove obsolete/unsupported nobarrier option from three partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/528437 (https://phabricator.wikimedia.org/T229915) (owner: 10Muehlenhoff) [13:29:36] RECOVERY - WDQS Categories update lag on wdqs1006 is OK: OK - wdqs categories lag: 1 day, 8:29:33.957011 https://wikitech.wikimedia.org/wiki/Wikidata_query_service [13:29:36] RECOVERY - WDQS Categories update lag on wdqs1008 is OK: OK - wdqs categories lag: 8:29:34.006896 https://wikitech.wikimedia.org/wiki/Wikidata_query_service [13:30:18] (03CR) 10CDanis: [C: 03+2] debian: re-add the tests directory in the package [software/conftool] - 10https://gerrit.wikimedia.org/r/528467 (owner: 10Volans) [13:32:51] (03Merged) 10jenkins-bot: debian: re-add the tests directory in the package [software/conftool] - 10https://gerrit.wikimedia.org/r/528467 (owner: 10Volans) [13:33:01] (03PS3) 10Ema: ATS: add profile::base::nameservers [puppet] - 10https://gerrit.wikimedia.org/r/528440 (https://phabricator.wikimedia.org/T228190) [13:34:57] (03PS1) 10Volans: Bump debian release [software/conftool] - 10https://gerrit.wikimedia.org/r/528469 [13:36:39] (03CR) 10Ema: [C: 03+2] ATS: add profile::base::nameservers [puppet] - 10https://gerrit.wikimedia.org/r/528440 (https://phabricator.wikimedia.org/T228190) (owner: 10Ema) [13:39:41] (03PS2) 10Volans: Bump debian release [software/conftool] - 10https://gerrit.wikimedia.org/r/528469 [13:41:33] (03CR) 10Volans: [C: 03+1] "LGTM cookbook wise. I'll leave it to the Cassandra experts for the underlying logic and if any sleep is needed between nodes." [cookbooks] - 10https://gerrit.wikimedia.org/r/517377 (https://phabricator.wikimedia.org/T225694) (owner: 10Mathew.onipe) [13:41:35] (03CR) 10Herron: [C: 03+1] "Good idea!" 
[puppet] - 10https://gerrit.wikimedia.org/r/528410 (https://phabricator.wikimedia.org/T228379) (owner: 10Filippo Giunchedi) [13:42:15] (03CR) 10CDanis: [C: 03+2] Bump debian release [software/conftool] - 10https://gerrit.wikimedia.org/r/528469 (owner: 10Volans) [13:44:46] (03PS1) 10Jbond: confserver: enable ipv6 mapped address on the conf200* servers. [puppet] - 10https://gerrit.wikimedia.org/r/528475 [13:44:48] (03PS1) 10Jbond: conf servers: remove old IPv6 SLAAC addresses [puppet] - 10https://gerrit.wikimedia.org/r/528476 (https://phabricator.wikimedia.org/T102099) [13:46:00] (03PS2) 10Jbond: confserver: enable ipv6 mapped address on the conf200* servers. [puppet] - 10https://gerrit.wikimedia.org/r/528475 (https://phabricator.wikimedia.org/T102099) [13:48:39] (03Merged) 10jenkins-bot: Bump debian release [software/conftool] - 10https://gerrit.wikimedia.org/r/528469 (owner: 10Volans) [13:51:07] (03PS3) 10Jbond: confserver: enable ipv6 mapped address on the conf200* servers. [puppet] - 10https://gerrit.wikimedia.org/r/528475 (https://phabricator.wikimedia.org/T102099) [13:51:09] (03CR) 10Paladox: [C: 04-1] gerrit: replication: exclude some projects (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/528276 (owner: 10Thcipriani) [13:56:07] (03CR) 10Gehel: [C: 04-1] "It now looks trivial enough :)" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/517377 (https://phabricator.wikimedia.org/T225694) (owner: 10Mathew.onipe) [13:59:36] (03PS3) 10Filippo Giunchedi: icinga: add /alerts shortcut for faster ack'ing [puppet] - 10https://gerrit.wikimedia.org/r/528410 (https://phabricator.wikimedia.org/T228379) [13:59:47] (03CR) 10Filippo Giunchedi: [C: 03+2] icinga: add /alerts shortcut for faster ack'ing [puppet] - 10https://gerrit.wikimedia.org/r/528410 (https://phabricator.wikimedia.org/T228379) (owner: 10Filippo Giunchedi) [14:02:37] (03PS5) 10Mathew.onipe: Cassandra nodetool repair cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/517377 
(https://phabricator.wikimedia.org/T225694) [14:03:03] (03CR) 10Mathew.onipe: Cassandra nodetool repair cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/517377 (https://phabricator.wikimedia.org/T225694) (owner: 10Mathew.onipe) [14:04:21] (03PS1) 10Jbond: conf servers: add AAAA and ipv6 PTR records for conf200* servers [dns] - 10https://gerrit.wikimedia.org/r/528479 (https://phabricator.wikimedia.org/T102099) [14:04:46] (03CR) 10Andrew Bogott: [C: 03+1] Update wmflabs.org redirect target [puppet] - 10https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896) (owner: 10DannyS712) [14:05:05] (03CR) 10jerkins-bot: [V: 04-1] conf servers: add AAAA and ipv6 PTR records for conf200* servers [dns] - 10https://gerrit.wikimedia.org/r/528479 (https://phabricator.wikimedia.org/T102099) (owner: 10Jbond) [14:06:43] (03CR) 10Gergő Tisza: "In general I don't think you can trigger a CORS request and inspect its headers. You can inspect the response headers, but people.wm.org c" [puppet] - 10https://gerrit.wikimedia.org/r/522991 (https://phabricator.wikimedia.org/T224068) (owner: 10Gergő Tisza) [14:07:07] (03CR) 10Bstorm: Update wmflabs.org redirect target (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896) (owner: 10DannyS712) [14:11:50] (03PS2) 10Jbond: conf servers: add AAAA and ipv6 PTR records for conf200* servers [dns] - 10https://gerrit.wikimedia.org/r/528479 (https://phabricator.wikimedia.org/T102099) [14:12:00] (03CR) 10Bstorm: Update wmflabs.org redirect target (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896) (owner: 10DannyS712) [14:13:15] (03PS4) 10Jbond: confserver: enable ipv6 mapped address on the conf200* servers. 
[puppet] - 10https://gerrit.wikimedia.org/r/528475 (https://phabricator.wikimedia.org/T102099) [14:13:46] (03PS3) 10MSantos: First version of the wikifeeds chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/526679 (https://phabricator.wikimedia.org/T229287) [14:13:58] 10Operations, 10Traffic: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10Jdforrester-WMF) [14:14:21] (03CR) 10MSantos: First version of the wikifeeds chart (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/526679 (https://phabricator.wikimedia.org/T229287) (owner: 10MSantos) [14:16:01] 10Operations, 10Readers-Web-Backlog, 10Traffic: [Bug] iPadOS 13 shows the desktop version of Safari with a broken layout - https://phabricator.wikimedia.org/T229875 (10Jdlrobson) a:05Jdlrobson→03phuedx We shouldn't add ` (03CR) 10Elukey: [C: 03+2] Remove Spark RPC auth/encryption settings from Hadoop test [puppet] - 10https://gerrit.wikimedia.org/r/528483 (https://phabricator.wikimedia.org/T226698) (owner: 10Elukey) [14:22:09] (03CR) 10Effie Mouzeli: "https://puppet-compiler.wmflabs.org/compiler1002/17748/mwmaint1002.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T195392) (owner: 10Giuseppe Lavagetto) [14:24:42] (03CR) 10Giuseppe Lavagetto: [C: 04-1] profile::mediawiki::hhvm: default php to php7 on stretch (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T195392) (owner: 10Giuseppe Lavagetto) [14:29:49] (03PS1) 10Muehlenhoff: Initial stub role for the IDP (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/528487 [14:31:37] 10Operations, 10ops-eqiad, 10DBA: Upgrade db1100 firmware and BIOS - https://phabricator.wikimedia.org/T228732 (10Cmjohnson) 05Open→03Resolved @marostegui The f/w and bios update is complete. 
[14:31:40] 10Operations, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10Cmjohnson) [14:32:02] (03CR) 10jerkins-bot: [V: 04-1] Initial stub role for the IDP (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/528487 (owner: 10Muehlenhoff) [14:34:16] (03PS1) 10Jbond: puppetmaster1003: move kafka and analytics host to puppetmaster1003 [puppet] - 10https://gerrit.wikimedia.org/r/528488 [14:36:21] 10Operations, 10ops-eqiad, 10DBA: Upgrade db1100 firmware and BIOS - https://phabricator.wikimedia.org/T228732 (10Marostegui) Thanks - I will start MySQL now and take it from there [14:36:36] !log Start mysql on db1100 after on-site maintenance - T228732 [14:36:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:44] T228732: Upgrade db1100 firmware and BIOS - https://phabricator.wikimedia.org/T228732 [14:37:32] !log ✔️ cdanis@install1002.wikimedia.org ~/conftool-1.1.4-2 🕥 sudo -E reprepro -C main include jessie-wikimedia conftool_1.1.4-2+deb8u1_amd64.changes [14:37:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:38:31] 10Operations, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10Marostegui) [14:39:46] (03CR) 10Jbond: "saw this go by so i thought i would take a quick look :)" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/528487 (owner: 10Muehlenhoff) [14:40:01] (03CR) 10Jbond: [C: 03+2] puppetmaster1003: move kafka and analytics host to puppetmaster1003 [puppet] - 10https://gerrit.wikimedia.org/r/528488 (owner: 10Jbond) [14:42:00] (03PS1) 10Elukey: role::piwik: add TLS proxy [puppet] - 10https://gerrit.wikimedia.org/r/528490 (https://phabricator.wikimedia.org/T227860) [14:42:18] (03PS3) 10Herron: logstash: manage currently unmanaged log4j2.properties file [puppet] - 10https://gerrit.wikimedia.org/r/528305 (https://phabricator.wikimedia.org/T166107) 
[14:44:39] (03CR) 10Herron: [C: 03+2] logstash: manage currently unmanaged log4j2.properties file [puppet] - 10https://gerrit.wikimedia.org/r/528305 (https://phabricator.wikimedia.org/T166107) (owner: 10Herron) [14:44:50] !log ✔️ cdanis@install1002.wikimedia.org ~/conftool-1.1.4-2 🕥☕ sudo -E reprepro -C main include stretch-wikimedia conftool_1.1.4-2_amd64.changes [14:44:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:03] (03PS2) 10Elukey: role::piwik: add TLS proxy [puppet] - 10https://gerrit.wikimedia.org/r/528490 (https://phabricator.wikimedia.org/T227860) [14:45:05] !log ✔️ cdanis@install1002.wikimedia.org ~/conftool-1.1.4-2 🕥☕ sudo -E reprepro -C main include buster-wikimedia conftool_1.1.4-2+deb10u1_amd64.changes [14:45:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:46:04] (03PS1) 10Mathew.onipe: lvs: isolate cloudelastic icinga check [puppet] - 10https://gerrit.wikimedia.org/r/528491 (https://phabricator.wikimedia.org/T229621) [14:47:34] 10Operations, 10Traffic: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10Marostegui) We are getting emails to root@ with: sukhe@wikimedia.org (generated from root@wikimedia.org) Address sukhe@wikimedia.org does not exist [14:49:39] (03CR) 10Elukey: [C: 03+2] role::piwik: add TLS proxy [puppet] - 10https://gerrit.wikimedia.org/r/528490 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [14:49:48] (03PS2) 10Muehlenhoff: Initial stub role for the IDP (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/528487 [14:49:54] (03CR) 10Muehlenhoff: Initial stub role for the IDP (WIP) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/528487 (owner: 10Muehlenhoff) [14:49:56] 10Operations, 10serviceops: tmpreaper possible race condition - https://phabricator.wikimedia.org/T151304 (10Andrew) From the man page: ` Unless your machine is one with lots of relatively untrusted users, such as an ISP or school, you don't need 
this program; `find ... -exec rm ...' wor... [14:50:23] !log ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-08-06-conftool.yaml -s mw-canary [14:50:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:44] (03PS1) 10Ottomata: Install pyall package on analytics cluster [puppet] - 10https://gerrit.wikimedia.org/r/528492 (https://phabricator.wikimedia.org/T229347) [14:52:19] (03CR) 10jerkins-bot: [V: 04-1] Initial stub role for the IDP (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/528487 (owner: 10Muehlenhoff) [14:52:48] !log restarting logstash service on logstash1007 to pick up puppet managed log4j2 config [14:52:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:09] 10Operations, 10Readers-Web-Backlog, 10Traffic: [Bug] iPadOS 13 shows the desktop version of Safari with a broken layout - https://phabricator.wikimedia.org/T229875 (10ovasileva) a:05phuedx→03Jdlrobson [14:54:45] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [14:54:46] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:54:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:00] !log ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo cumin -p99 -b100 'A:all' 'apt-get update' [14:55:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:10] (03PS1) 10Elukey: profile::piwik::webserver: fix profile import [puppet] - 10https://gerrit.wikimedia.org/r/528493 [14:55:14] (03PS3) 10Filippo Giunchedi: monitoring::service rename critical to paging [puppet] - 10https://gerrit.wikimedia.org/r/528463 (https://phabricator.wikimedia.org/T228379) [14:55:58] !log rebooting mwlog1001 for kernel update [14:56:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:47] (03CR) 10Elukey: [C: 03+2] profile::piwik::webserver: fix 
profile import [puppet] - 10https://gerrit.wikimedia.org/r/528493 (owner: 10Elukey) [14:57:02] 10Operations, 10Release Pipeline, 10Maps (Kartotherian), 10Patch-For-Review: Create blubberfile for deploying kartotherian into docker environment. - https://phabricator.wikimedia.org/T223275 (10Mathew.onipe) @MSantos Thank you! [15:00:32] 10Operations, 10ops-codfw, 10DBA: (2019-08-31)rack/setup/install db2131.codfw.wmnet - https://phabricator.wikimedia.org/T229251 (10Papaul) a:05Papaul→03Marostegui Fixed [15:04:38] !log ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-08-06-conftool.yaml -s all [15:04:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:30] * bd808 is not sure about the new emojis in the cumin output [15:05:55] bd808: it's not cumin, it's just cdanis [15:05:58] 10Operations, 10Analytics, 10Analytics-Kanban, 10Traffic, and 2 others: TLS certificates for Analytics origin servers - https://phabricator.wikimedia.org/T227860 (10elukey) [15:06:01] oh, that's just cdanis showing off :) [15:06:06] yeah it's his prompt [15:06:12] 10Operations, 10ops-eqiad: (OoW) Degraded RAID on analytics1039 - https://phabricator.wikimedia.org/T226599 (10wiki_willy) Thanks @elukey [15:06:18] I copy & paste my shell prompt for my log statements; it encodes a lot of state and also it's very easy [15:06:28] I wanted actually to ask you to add a filter to the !-log bot to filter them bd808 :-P [15:06:42] if you know how the state is encoded… :P [15:06:50] 10Operations, 10ops-eqiad: (OoW) Degraded RAID on analytics1032 - https://phabricator.wikimedia.org/T227940 (10wiki_willy) @elukey , thank you [15:06:57] I could write an irssi script and a web browser userscript for alternative renderings of the emojis [15:07:09] I am thinking something like: [U+2714 HEAVY CHECK MARK] cdanis@cumin1001.eqiad.wmnet ~ [U+1F55A CLOCK FACE ELEVEN OCLOCK][U+2615 HOT BEVERAGE] sudo cumin -p99 -b100 'A:all' 'apt-get update' [15:07:18] volans: heh. 
patches welcome...https://gerrit.wikimedia.org/r/#/admin/projects/labs/tools/stashbot [15:07:56] yeah the SAL page looks weird with those emojis [15:08:23] it looks modern and exciting [15:09:33] (in seriousness, if people are deeply bothered by this, please let me know and I can do something else) [15:09:57] jokes apart we were chatting with _j.oe_ before to maybe make a wrapper that !-logs any command and then executes it [15:10:13] that could live on the cluster-mgmt hosts [15:10:15] at least [15:10:17] that is something I was surprised didn't exist already tbh [15:10:27] but have not yet written myself ;) [15:11:37] <_joe_> the plus there is [15:11:51] <_joe_> every typo you make in a shell will be tweeted with your username attached [15:11:55] <_joe_> :P [15:12:17] cdanis: I would say I am deeply bothered just to see what the 'do something else' is [15:12:37] jijiki: in that case, see a few lines above ;) [15:12:59] oh I thought there was a 3rd something else [15:13:05] !log installing bind9 security updates (client-side tools/libs only) for jessie [15:13:06] withdrawn :p [15:13:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:23] <_joe_> we should make the wrapper log only when successful [15:13:26] <_joe_> :P [15:13:29] logit vi /tmp/script.sh && logit /tmp/script.sh # <- hiding the ugly details so I don't get shamed on IRC [15:15:14] _joe_: rotfl [15:18:05] PROBLEM - puppet last run on dubnium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[dnsutils] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [15:28:22] (03PS1) 10Alexandros Kosiaris: staging: Bump all LimitRanges and ResourceQuotas [deployment-charts] - 10https://gerrit.wikimedia.org/r/528494 (https://phabricator.wikimedia.org/T228837) [15:28:24] (03PS1) 10Alexandros Kosiaris: codfw: Bump all LimitRanges and ResourceQuotas [deployment-charts] - 10https://gerrit.wikimedia.org/r/528495 (https://phabricator.wikimedia.org/T228837) [15:32:06] (03PS2) 10Alexandros Kosiaris: staging: Bump all LimitRanges and ResourceQuotas [deployment-charts] - 10https://gerrit.wikimedia.org/r/528494 (https://phabricator.wikimedia.org/T228837) [15:32:08] (03PS2) 10Alexandros Kosiaris: codfw: Bump all LimitRanges and ResourceQuotas [deployment-charts] - 10https://gerrit.wikimedia.org/r/528495 (https://phabricator.wikimedia.org/T228837) [15:34:02] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] staging: Bump all LimitRanges and ResourceQuotas [deployment-charts] - 10https://gerrit.wikimedia.org/r/528494 (https://phabricator.wikimedia.org/T228837) (owner: 10Alexandros Kosiaris) [15:44:53] RECOVERY - puppet last run on dubnium is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [15:44:54] (03Abandoned) 10EBernhardson: Add swift read credentials for mjolnir [puppet] - 10https://gerrit.wikimedia.org/r/524625 (https://phabricator.wikimedia.org/T227364) (owner: 10EBernhardson) [15:45:17] (03PS2) 10Gehel: Change mjolnir_bulk_daemon kafka topics [puppet] - 10https://gerrit.wikimedia.org/r/528190 (https://phabricator.wikimedia.org/T227364) (owner: 10EBernhardson) [15:45:56] (03Abandoned) 10EBernhardson: Add swift analytics_mjolnir dummy account key [labs/private] - 10https://gerrit.wikimedia.org/r/524624 (https://phabricator.wikimedia.org/T227364) (owner: 10EBernhardson) [15:47:17] (03CR) 10Gehel: [C: 03+2] Change 
mjolnir_bulk_daemon kafka topics [puppet] - 10https://gerrit.wikimedia.org/r/528190 (https://phabricator.wikimedia.org/T227364) (owner: 10EBernhardson) [15:48:29] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] "Staging patch applied all over with no problems, moving on to codfw" [deployment-charts] - 10https://gerrit.wikimedia.org/r/528495 (https://phabricator.wikimedia.org/T228837) (owner: 10Alexandros Kosiaris) [15:53:20] (03PS1) 10Volans: Upstream release v0.0.26 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/528502 [15:53:50] (03PS1) 10EBernhardson: Turn on cloudelastic writes for group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528503 (https://phabricator.wikimedia.org/T220625) [15:53:52] (03PS2) 10Volans: Upstream release v0.0.26 (take 2) [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/528502 [15:57:37] (03PS2) 10Ottomata: Install python3.7 package on analytics cluster [puppet] - 10https://gerrit.wikimedia.org/r/528492 (https://phabricator.wikimedia.org/T229347) [16:00:04] godog and _joe_: #bothumor My software never has bugs. It just develops random features. Rise for Puppet SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190806T1600). [16:00:04] tgr: A patch you scheduled for Puppet SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
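The wrapper floated in the 15:09-15:13 exchange (run a command and !log it, with _joe_'s suggestion to log only when the command succeeds) could look roughly like the sketch below. This is only an illustration, not an existing tool: the names `logit` and `format_log_line`, the `user@host command` message layout, and the `send_log` callback (standing in for whatever would actually deliver the message to the channel) are all assumptions.

```python
import getpass
import shlex
import socket
import subprocess
from typing import Callable, Optional, Sequence


def format_log_line(user: str, host: str, argv: Sequence[str]) -> str:
    """Build a '!log user@host command' message (layout is an assumption)."""
    return f"!log {user}@{host} {shlex.join(argv)}"


def logit(
    argv: Sequence[str],
    send_log: Callable[[str], object] = print,
    run: Callable[[list], int] = subprocess.call,
    user: Optional[str] = None,
    host: Optional[str] = None,
) -> int:
    """Run argv and announce it through send_log only if it exits with 0.

    send_log is a hypothetical hook; by default the message is just printed.
    """
    exit_code = run(list(argv))
    if exit_code == 0:
        send_log(
            format_log_line(
                user or getpass.getuser(),
                host or socket.gethostname(),
                argv,
            )
        )
    return exit_code
```

Calling `logit(["sudo", "cumin", "A:all", "apt-get update"])` would run the command and, on a zero exit status, hand one message to `send_log`. Logging after the run rather than before is a design choice taken from the "log only when successful" remark; it means failed or mistyped commands never reach the log.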
[16:00:15] o/ [16:00:37] tgr: I'll take a look [16:01:11] (03CR) 10Volans: [C: 03+2] Upstream release v0.0.26 (take 2) [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/528502 (owner: 10Volans) [16:01:45] PROBLEM - PyBal backends health check on lvs2006 is CRITICAL: PYBAL CRITICAL - CRITICAL - blubberoid_8748: Servers kubernetes2006.codfw.wmnet, kubernetes2003.codfw.wmnet, kubernetes2005.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [16:01:53] PROBLEM - LVS HTTP IPv4 on blubberoid.svc.codfw.wmnet is CRITICAL: connect to address 10.2.1.31 and port 8748: Connection refused https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [16:02:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [16:03:03] PROBLEM - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - CRITICAL - blubberoid_8748: Servers kubernetes2002.codfw.wmnet, kubernetes2006.codfw.wmnet, kubernetes2005.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [16:03:24] 10Operations, 10Discovery-Search: Prometheus not collecting cloudelastic metrics - https://phabricator.wikimedia.org/T229937 (10EBernhardson) [16:03:33] mhh sec looking at the mw exceptions [16:04:05] 10Operations, 10Discovery-Search: Prometheus not collecting cloudelastic metrics - https://phabricator.wikimedia.org/T229937 (10EBernhardson) [16:04:35] ebernhardson: [{exception_id}] {exception_url} RuntimeException from line 86 of /srv/mediawiki/php-1.34.0-wmf.16/extensions/CirrusSearch/includes/Job/JobTraits.php: Received cirrusSearchElasticaWrite job for an unwritable cluster cloudelastic. 
[16:04:43] (03CR) 10Alexandros Kosiaris: [C: 04-1] First version of the wikifeeds chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/526679 (https://phabricator.wikimedia.org/T229287) (owner: 10MSantos) [16:04:57] (03PS1) 10Jbond: GeoIP: the version of geoipupdate in buster has difference in the config [puppet] - 10https://gerrit.wikimedia.org/r/528505 [16:04:57] ebernhardson: known ? a bunch of those exceptions [16:05:03] godog: expected. For some reason we don't maintain a whitelist of non-private wikis [16:05:12] godog: so the maintenance script runs on all wikis, and bails on private ones [16:05:50] (03PS1) 10Isarra: Enable Related Article cards in Timeless across all projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528506 (https://phabricator.wikimedia.org/T181242) [16:06:05] ebernhardson: ah thanks, afaik there's the list of private wikis though that could be used (?) [16:06:10] anyways, back to puppet swat [16:06:38] godog: right, but it goes the opposite way. I suppose i could string together some bash to do set intersections but didn't... sorry to be a distraction [16:07:00] (03Merged) 10jenkins-bot: Upstream release v0.0.26 (take 2) [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/528502 (owner: 10Volans) [16:07:19] (03PS2) 10Jbond: GeoIP: the version of geoipupdate in buster has difference in the config [puppet] - 10https://gerrit.wikimedia.org/r/528505 [16:07:23] ebernhardson: no worries at all, talking out loud for the most part :) [16:08:00] tgr: ok to go ahead even with the unanswered question/comment ? 
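The idea from the 16:05-16:06 exchange (derive the non-private wikis from the full list and the private list, so the maintenance script need not run on private wikis and bail) amounts to a set difference. A minimal sketch, assuming both lists are plain text with one wiki database name per line and `#` comments; the function names and the sample wiki names in the usage note are illustrative, not taken from the log:

```python
from typing import Iterable, List


def parse_dblist(lines: Iterable[str]) -> List[str]:
    """Return wiki names from list-file lines, skipping blanks and '#' comments."""
    names = []
    for line in lines:
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


def non_private(all_wikis: Iterable[str], private_wikis: Iterable[str]) -> List[str]:
    """Return all_wikis minus private_wikis, preserving the original order."""
    private = set(private_wikis)
    return [wiki for wiki in all_wikis if wiki not in private]
```

For example, `non_private(["enwiki", "officewiki", "dewiki"], ["officewiki"])` keeps `enwiki` and `dewiki`. Keeping the order of the full list (rather than returning a set) makes the result usable as a stable iteration order for a per-wiki script.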
[16:08:34] godog: good to go IMO, I can follow up later if needed but it's not a functional change [16:08:47] kk [16:08:47] !log akosiaris@puppetmaster1001 conftool action : set/pooled=false; selector: dnsdisc=citoid,name=codfw [16:08:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:08:54] (03CR) 10Filippo Giunchedi: [C: 03+2] Allow CORS access to publichtml (people.wikimedia.org) [puppet] - 10https://gerrit.wikimedia.org/r/522991 (https://phabricator.wikimedia.org/T224068) (owner: 10Gergő Tisza) [16:09:02] (03PS3) 10Filippo Giunchedi: Allow CORS access to publichtml (people.wikimedia.org) [puppet] - 10https://gerrit.wikimedia.org/r/522991 (https://phabricator.wikimedia.org/T224068) (owner: 10Gergő Tisza) [16:09:13] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1004 is OK: OK: Less than 70.00% above the threshold [25.0] https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [16:09:34] akosiaris: blubberoid is you ? 
[16:12:43] (03CR) 10Ottomata: [C: 03+2] Install python3.7 package on analytics cluster [puppet] - 10https://gerrit.wikimedia.org/r/528492 (https://phabricator.wikimedia.org/T229347) (owner: 10Ottomata) [16:12:52] (03PS3) 10Ottomata: Install python3.7 package on analytics cluster [puppet] - 10https://gerrit.wikimedia.org/r/528492 (https://phabricator.wikimedia.org/T229347) [16:12:55] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Install python3.7 package on analytics cluster [puppet] - 10https://gerrit.wikimedia.org/r/528492 (https://phabricator.wikimedia.org/T229347) (owner: 10Ottomata) [16:13:37] (03PS1) 10Alexandros Kosiaris: Fixup limitranges for citoid,cxserver [deployment-charts] - 10https://gerrit.wikimedia.org/r/528508 (https://phabricator.wikimedia.org/T228837) [16:13:46] godog: yup, and it's harmless [16:14:17] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Fixup limitranges for citoid,cxserver [deployment-charts] - 10https://gerrit.wikimedia.org/r/528508 (https://phabricator.wikimedia.org/T228837) (owner: 10Alexandros Kosiaris) [16:15:01] tgr: {{done}} there might be varnish caching results btw [16:15:05] akosiaris: ack, thanks [16:15:35] if puppet swat is over, i'm going to sneak in a regular config swat this morning. 
only affects job runners [16:15:55] (03CR) 10EBernhardson: [C: 03+2] Turn on cloudelastic writes for group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528503 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [16:16:12] (03CR) 10Jbond: [C: 03+2] GeoIP: the version of geoipupdate in buster has difference in the config [puppet] - 10https://gerrit.wikimedia.org/r/528505 (owner: 10Jbond) [16:16:22] (03PS3) 10Jbond: GeoIP: the version of geoipupdate in buster has difference in the config [puppet] - 10https://gerrit.wikimedia.org/r/528505 [16:16:38] ebernhardson: it is yeah [16:16:59] (03Merged) 10jenkins-bot: Turn on cloudelastic writes for group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528503 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [16:17:15] (03CR) 10jenkins-bot: Turn on cloudelastic writes for group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528503 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [16:18:37] PROBLEM - puppet last run on analytics1058 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:19:56] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T220625: Turn on cloudelastic writes for group1 (duration: 00m 47s) [16:20:03] PROBLEM - puppet last run on notebook1003 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. 
Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:20:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:20:04] T220625: Initialize CirrusSearch on cloudelastic - https://phabricator.wikimedia.org/T220625 [16:20:05] PROBLEM - puppet last run on analytics1072 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:20:06] (03PS1) 10Ottomata: Declare python3.7 package explicity to avoid circular dependency [puppet] - 10https://gerrit.wikimedia.org/r/528509 (https://phabricator.wikimedia.org/T229347) [16:20:55] PROBLEM - puppet last run on an-worker1089 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:20:55] PROBLEM - puppet last run on analytics1042 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:21:37] PROBLEM - puppet last run on analytics1062 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:21:53] (03CR) 10Ottomata: [C: 03+1] Switch updateBetaFeaturesUserCounts job to eventgate. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/528209 (https://phabricator.wikimedia.org/T228705) (owner: 10Ppchelko) [16:21:59] PROBLEM - puppet last run on analytics1057 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:22:11] PROBLEM - puppet last run on analytics1070 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:22:13] PROBLEM - puppet last run on stat1007 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:22:13] (03PS1) 10Jbond: geoipupdate: correct configuration stanza AccountID instead of UserID [puppet] - 10https://gerrit.wikimedia.org/r/528510 [16:22:21] ^^^^ workers are me [16:22:22] am fixing [16:22:25] circular dep in puppet. [16:22:31] (03CR) 10Ottomata: [C: 03+2] Declare python3.7 package explicity to avoid circular dependency [puppet] - 10https://gerrit.wikimedia.org/r/528509 (https://phabricator.wikimedia.org/T229347) (owner: 10Ottomata) [16:23:09] PROBLEM - puppet last run on analytics1056 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. 
Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:24:05] (03PS2) 10Jbond: geoipupdate: correct configuration stanza AccountID instead of UserID [puppet] - 10https://gerrit.wikimedia.org/r/528510 [16:24:07] ah yeah, 'agent failed' alerts immediately now, will need tweaking to wait a little bit now [16:24:50] (03PS3) 10Jbond: geoipupdate: correct configuration stanza AccountID instead of UserID [puppet] - 10https://gerrit.wikimedia.org/r/528510 [16:25:21] PROBLEM - puppet last run on analytics1060 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:25:23] PROBLEM - puppet last run on analytics1063 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:25:23] PROBLEM - puppet last run on an-worker1086 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:25:23] PROBLEM - puppet last run on analytics1049 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:25:55] PROBLEM - puppet last run on an-coord1001 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. 
Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:25:59] PROBLEM - puppet last run on puppetmaster1003 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[geoipupdate] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:26:04] (03CR) 10Jbond: [C: 03+2] geoipupdate: correct configuration stanza AccountID instead of UserID [puppet] - 10https://gerrit.wikimedia.org/r/528510 (owner: 10Jbond) [16:26:11] PROBLEM - puppet last run on an-worker1087 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:26:28] huh, i thought puppet always ran apt-get update... [16:26:30] (03PS1) 10Alexandros Kosiaris: blubberoid/sessionstore: Bump requests/limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/528513 (https://phabricator.wikimedia.org/T228837) [16:27:17] PROBLEM - puppet last run on an-worker1083 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:28:29] PROBLEM - puppet last run on an-worker1080 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:28:39] PROBLEM - puppet last run on analytics1046 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 9 minutes ago with 2 failures. 
Failed resources (up to 3 shown): File[/etc/apt/sources.list.d/component-pyall.list],Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:29:03] RECOVERY - puppet last run on analytics1058 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:29:07] (03PS1) 10Alexandros Kosiaris: Revert Add resources stanza to prometheus-metrics-exporter [deployment-charts] - 10https://gerrit.wikimedia.org/r/528515 (https://phabricator.wikimedia.org/T228837) [16:29:47] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] blubberoid/sessionstore: Bump requests/limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/528513 (https://phabricator.wikimedia.org/T228837) (owner: 10Alexandros Kosiaris) [16:30:03] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Revert Add resources stanza to prometheus-metrics-exporter [deployment-charts] - 10https://gerrit.wikimedia.org/r/528515 (https://phabricator.wikimedia.org/T228837) (owner: 10Alexandros Kosiaris) [16:31:19] RECOVERY - puppet last run on puppetmaster1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:31:34] !log @ helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' . [16:31:35] PROBLEM - puppet last run on analytics1065 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 30 seconds ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:31:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:31:47] PROBLEM - puppet last run on stat1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 45 seconds ago with 1 failures. 
Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:31:47] PROBLEM - puppet last run on notebook1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:31:57] PROBLEM - puppet last run on an-worker1095 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:31:59] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 46 seconds ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:32:07] PROBLEM - puppet last run on analytics1052 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:32:20] ottomata: only when puppet runs from cron [16:32:21] PROBLEM - puppet last run on analytics1059 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:32:27] !log @ helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' . [16:32:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:39] PROBLEM - puppet last run on analytics1077 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. 
Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:32:43] PROBLEM - puppet last run on analytics1069 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:32:51] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [16:32:53] PROBLEM - puppet last run on an-worker1079 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:33:03] !log @ helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' . [16:33:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:13] !log @ helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' . [16:33:15] PROBLEM - puppet last run on analytics1066 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:33:15] PROBLEM - puppet last run on an-worker1088 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:33:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:25] PROBLEM - puppet last run on an-worker1090 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. 
Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:33:37] !log @ helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' . [16:33:43] PROBLEM - puppet last run on an-worker1082 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:33:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:34:07] PROBLEM - puppet last run on an-worker1081 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:34:09] PROBLEM - puppet last run on analytics1074 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:34:41] PROBLEM - puppet last run on an-worker1091 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:34:45] PROBLEM - puppet last run on an-worker1092 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:34:49] PROBLEM - puppet last run on analytics1051 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:35:02] godog: REALLY! [16:35:04] huh. 
[16:35:41] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1004 is OK: OK: Less than 70.00% above the threshold [25.0] https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [16:35:59] PROBLEM - puppet last run on analytics1054 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:35:59] RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:36:11] PROBLEM - puppet last run on analytics1075 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:36:20] 10Operations, 10SRE-Access-Requests: Requesting access to Puppet for Viztor[S] - https://phabricator.wikimedia.org/T229894 (10Viztor) >>! In T229894#5395772, @Aklapper wrote: > @Viztor: Please see https://wikitech.wikimedia.org/wiki/Production_shell_access section 2.2 for missing information here, and then pro... [16:36:23] !log @ helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' . [16:36:25] PROBLEM - puppet last run on analytics1076 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:36:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:36:51] RECOVERY - puppet last run on analytics1042 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:37:17] PROBLEM - puppet last run on analytics1043 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:37:23] RECOVERY - puppet last run on an-worker1095 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:37:52] !log @ helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' . [16:37:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:27] RECOVERY - LVS HTTP IPv4 on blubberoid.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 7832 bytes in 0.074 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [16:38:43] RECOVERY - puppet last run on an-worker1088 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:38:43] RECOVERY - puppet last run on analytics1066 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:39:11] RECOVERY - puppet last run on an-worker1082 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:39:13] RECOVERY - puppet last run on analytics1056 is OK: OK: Puppet is currently enabled, last run 19 seconds 
ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:39:19] RECOVERY - puppet last run on an-worker1080 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:39:32] !log @ helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' . [16:39:33] RECOVERY - puppet last run on analytics1046 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:39:37] RECOVERY - puppet last run on an-worker1081 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:39:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:39:41] RECOVERY - puppet last run on analytics1074 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:40:03] RECOVERY - PyBal backends health check on lvs2006 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:40:15] RECOVERY - puppet last run on an-worker1091 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:40:19] RECOVERY - puppet last run on an-worker1092 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:40:21] RECOVERY - puppet last run on analytics1051 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:40:36] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T220625: Re-sync enable group1 on cloudelastic, job runners are 
claiming its not enabled while app servers are sending jobs (duration: 00m 47s) [16:40:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:40:45] T220625: Initialize CirrusSearch on cloudelastic - https://phabricator.wikimedia.org/T220625 [16:41:05] RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:41:31] RECOVERY - puppet last run on analytics1054 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:41:37] RECOVERY - puppet last run on analytics1072 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:41:43] RECOVERY - puppet last run on analytics1060 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:41:45] RECOVERY - puppet last run on analytics1063 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:41:45] RECOVERY - puppet last run on analytics1075 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:41:47] RECOVERY - puppet last run on an-worker1086 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:41:47] RECOVERY - puppet last run on analytics1049 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:41:59] RECOVERY - puppet last run on analytics1076 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures 
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:42:19] RECOVERY - puppet last run on an-coord1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:42:27] RECOVERY - puppet last run on an-worker1089 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:42:33] RECOVERY - puppet last run on an-worker1087 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:42:37] RECOVERY - puppet last run on analytics1065 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:42:49] RECOVERY - puppet last run on stat1006 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:42:49] RECOVERY - puppet last run on notebook1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:42:53] RECOVERY - puppet last run on analytics1043 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:43:01] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:43:11] RECOVERY - puppet last run on analytics1052 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:43:13] RECOVERY - puppet last run on analytics1062 is OK: OK: Puppet is currently enabled, last run 4 minutes 
ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:43:23] RECOVERY - puppet last run on analytics1059 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:43:24] !log akosiaris@puppetmaster1001 conftool action : set/pooled=false; selector: dnsdisc=eventgate-analytics,name=codfw [16:43:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:35] RECOVERY - puppet last run on analytics1057 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:43:41] RECOVERY - puppet last run on an-worker1083 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:43:43] RECOVERY - puppet last run on analytics1077 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:43:45] RECOVERY - puppet last run on analytics1069 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:43:47] RECOVERY - puppet last run on analytics1070 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:43:51] RECOVERY - puppet last run on stat1007 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:43:55] RECOVERY - puppet last run on an-worker1079 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:44:29] RECOVERY - puppet last run on an-worker1090 is OK: OK: Puppet is 
currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:45:12] (03CR) 10Krinkle: "> You mean like" [puppet] - 10https://gerrit.wikimedia.org/r/522991 (https://phabricator.wikimedia.org/T224068) (owner: 10Gergő Tisza) [16:47:17] !log akosiaris@puppetmaster1001 conftool action : set/pooled=false; selector: dnsdisc=mathoid,name=codfw [16:47:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:48:37] !log @ helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'analytics' . [16:48:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:03] !log akosiaris@puppetmaster1001 conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=codfw [16:50:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:16] !log akosiaris@puppetmaster1001 conftool action : set/pooled=true; selector: dnsdisc=citoid,name=codfw [16:50:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:27] !log cutting branch for 1.34.0-wmf.17 [16:50:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:51:08] godog: works, thanks! [16:51:24] tgr: yw [16:52:03] (03CR) 10BBlack: [C: 03+1] "This looks like the right way to go about this for now, to me, but I haven't tried to validate anything on a technical level (e.g. compile" [puppet] - 10https://gerrit.wikimedia.org/r/528491 (https://phabricator.wikimedia.org/T229621) (owner: 10Mathew.onipe) [16:52:34] !log @ helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' . [16:52:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:54:32] !log akosiaris@puppetmaster1001 conftool action : set/pooled=true; selector: dnsdisc=mathoid,name=codfw [16:54:33] PROBLEM - puppet last run on mw2142 is CRITICAL: CRITICAL: Puppet has 2 failures. 
Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz],File[/usr/share/GeoIP/GeoIPCity.dat.test] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [16:54:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:57:15] PROBLEM - puppet last run on mw2139 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 7 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz],File[/usr/share/GeoIP/GeoIP2-City.mmdb.test] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [17:00:05] cscott, arlolra, subbu, halfak, and accraze: I, the Bot under the Fountain, allow thee, The Deployer, to do Services – Graphoid / Parsoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190806T1700). [17:00:17] no parsoid deploy today [17:03:53] 10Operations, 10SRE-Access-Requests: Requesting access to Puppet for Viztor[S] - https://phabricator.wikimedia.org/T229894 (10Aklapper) Thanks. It's still unclear to me which [specific server](https://wikitech.wikimedia.org/wiki/Puppet#Puppetmaster) you need access to and especially why, but SRE can judge that... [17:07:33] (03PS1) 10Jbond: puppetmaster::frontend: update web conf to use RewriteRules instead of proxypass [puppet] - 10https://gerrit.wikimedia.org/r/528521 (https://phabricator.wikimedia.org/T228657) [17:11:23] (03PS2) 10Jbond: puppetmaster::frontend: update web conf to use RewriteRules instead of proxypass [puppet] - 10https://gerrit.wikimedia.org/r/528521 (https://phabricator.wikimedia.org/T228657) [17:14:42] !log uploaded spicerack_0.0.26-1_amd64.deb to apt.wikimedia.org stretch-wikimedia [17:14:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:17:29] We're looking to do an ORES deploy. [17:17:51] Here we go! [17:18:44] (03CR) 10Ori.livneh: "OK, what do you want to do with this?" 
[puppet] - 10https://gerrit.wikimedia.org/r/511751 (owner: 10Ori.livneh) [17:20:05] !log accraze@deploy1001 Started deploy [ores/deploy@d08fa62]: T229848 [17:20:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:20:14] T229848: ORES deployment, Early August 2019 - https://phabricator.wikimedia.org/T229848 [17:20:24] (03PS2) 10BBlack: anycast recdns: enable for codfw clients [puppet] - 10https://gerrit.wikimedia.org/r/526788 (https://phabricator.wikimedia.org/T228190) [17:20:26] (03PS1) 10BBlack: anycast recdns: config for many eqiad canaries [puppet] - 10https://gerrit.wikimedia.org/r/528524 (https://phabricator.wikimedia.org/T228190) [17:20:28] (03PS1) 10BBlack: anycast recdns: enable globally [puppet] - 10https://gerrit.wikimedia.org/r/528525 (https://phabricator.wikimedia.org/T228190) [17:20:40] accraze, https://grafana.wikimedia.org/d/HIRrxQ6mk/ores?refresh=1m&orgId=1 [17:20:40] (03PS1) 10Ladsgroup: mediawiki: Introduce startupregistrystats.pp to record RL modules registry [puppet] - 10https://gerrit.wikimedia.org/r/528526 (https://phabricator.wikimedia.org/T229836) [17:22:29] RECOVERY - puppet last run on mw2142 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [17:24:37] (03PS2) 10BBlack: anycast recdns: config for many eqiad canaries [puppet] - 10https://gerrit.wikimedia.org/r/528524 (https://phabricator.wikimedia.org/T228190) [17:24:39] (03PS3) 10BBlack: anycast recdns: enable for codfw clients [puppet] - 10https://gerrit.wikimedia.org/r/526788 (https://phabricator.wikimedia.org/T228190) [17:24:41] (03PS2) 10BBlack: anycast recdns: enable globally [puppet] - 10https://gerrit.wikimedia.org/r/528525 (https://phabricator.wikimedia.org/T228190) [17:24:45] (03PS1) 10CDanis: noc: fetch dbconfig from etcd to local disk [puppet] - 10https://gerrit.wikimedia.org/r/528527 [17:25:13] RECOVERY - puppet last run on mw2139 is OK: OK: Puppet is 
currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [17:28:01] (03PS2) 10CDanis: noc: fetch dbconfig from etcd to local disk [puppet] - 10https://gerrit.wikimedia.org/r/528527 [17:30:19] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: Introduce startupregistrystats.pp to record RL modules registry [puppet] - 10https://gerrit.wikimedia.org/r/528526 (https://phabricator.wikimedia.org/T229836) (owner: 10Ladsgroup) [17:30:50] (03CR) 10jerkins-bot: [V: 04-1] anycast recdns: enable globally [puppet] - 10https://gerrit.wikimedia.org/r/528525 (https://phabricator.wikimedia.org/T228190) (owner: 10BBlack) [17:31:01] (03CR) 10jerkins-bot: [V: 04-1] noc: fetch dbconfig from etcd to local disk [puppet] - 10https://gerrit.wikimedia.org/r/528527 (owner: 10CDanis) [17:32:55] (03CR) 10Krinkle: "Worth making explicit somewhere that this will be "publicly viewable on the web" or some such. This is obvious from the Apache config, but" [puppet] - 10https://gerrit.wikimedia.org/r/528527 (owner: 10CDanis) [17:35:27] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 57 probes of 447 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [17:36:21] (03PS3) 10CDanis: noc: fetch dbconfig from etcd to local disk [puppet] - 10https://gerrit.wikimedia.org/r/528527 [17:37:26] !log accraze@deploy1001 Finished deploy [ores/deploy@d08fa62]: T229848 (duration: 17m 21s) [17:37:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:37:33] T229848: ORES deployment, Early August 2019 - https://phabricator.wikimedia.org/T229848 [17:38:38] (03CR) 10jerkins-bot: [V: 04-1] noc: fetch dbconfig from etcd to local disk [puppet] - 10https://gerrit.wikimedia.org/r/528527 (owner: 10CDanis) [17:39:26] (03PS4) 10CDanis: noc: fetch dbconfig from etcd to local disk [puppet] - 
10https://gerrit.wikimedia.org/r/528527 [17:40:08] (03PS4) 10CRusnov: netbox: Add configuration and timers for csv dumps [puppet] - 10https://gerrit.wikimedia.org/r/521313 [17:40:13] PROBLEM - Mediawiki Cirrussearch update rate - codfw on icinga1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [17:40:42] (03CR) 10jerkins-bot: [V: 04-1] noc: fetch dbconfig from etcd to local disk [puppet] - 10https://gerrit.wikimedia.org/r/528527 (owner: 10CDanis) [17:41:01] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 20 probes of 447 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [17:42:02] (03PS5) 10CRusnov: netbox: Add configuration and timers for csv dumps [puppet] - 10https://gerrit.wikimedia.org/r/521313 [17:42:13] (03CR) 10Cwhite: [C: 03+1] profile::cache::kafka::alerts: move alarms to prometheus (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/526611 (https://phabricator.wikimedia.org/T229357) (owner: 10Elukey) [17:42:55] PROBLEM - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is CRITICAL: CRITICAL: 30.00% of data under the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [17:43:25] (03CR) 10CRusnov: "Finally getting back to this. New revs and replies inline." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/521313 (owner: 10CRusnov) [17:44:19] 10Operations, 10Elasticsearch, 10Traffic, 10Discovery-Search (Current work), 10Patch-For-Review: Icinga check defined from LVS configuration for cloudelastic are borked - https://phabricator.wikimedia.org/T229621 (10Mathew.onipe) a:03Mathew.onipe [17:44:26] (03PS5) 10CDanis: noc: fetch dbconfig from etcd to local disk [puppet] - 10https://gerrit.wikimedia.org/r/528527 [17:45:14] (03PS1) 10Brennen Bearnes: Group0 to 1.34.0-wmf.17 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528530 [17:47:10] (03CR) 10Cwhite: [C: 03+1] Icinga: Remove support for Jessie [puppet] - 10https://gerrit.wikimedia.org/r/528391 (owner: 10Muehlenhoff) [17:48:00] (03PS6) 10CDanis: noc: fetch dbconfig from etcd to local disk [puppet] - 10https://gerrit.wikimedia.org/r/528527 (https://phabricator.wikimedia.org/T229631) [17:48:42] (03CR) 10Cwhite: [C: 03+1] logstash: rotate logstash plain logs with log4j2 [puppet] - 10https://gerrit.wikimedia.org/r/528306 (https://phabricator.wikimedia.org/T166107) (owner: 10Herron) [17:49:42] (03CR) 10Krinkle: noc: fetch dbconfig from etcd to local disk (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/528527 (https://phabricator.wikimedia.org/T229631) (owner: 10CDanis) [17:49:54] (03Abandoned) 10Bstorm: sssd: Add some new images to test sssd in containers [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/527258 (https://phabricator.wikimedia.org/T229058) (owner: 10Bstorm) [17:50:11] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/528301 (owner: 10CRusnov) [17:51:42] (03CR) 10CRusnov: [C: 03+2] netbox: fix swift url [puppet] - 10https://gerrit.wikimedia.org/r/528301 (owner: 10CRusnov) [17:51:44] (03PS2) 10CRusnov: netbox: fix swift url [puppet] - 10https://gerrit.wikimedia.org/r/528301 [17:53:53] (03PS7) 10CDanis: noc: fetch dbconfig from etcd to local disk [puppet] - 
10https://gerrit.wikimedia.org/r/528527 (https://phabricator.wikimedia.org/T229631) [17:54:01] (03PS2) 10Ladsgroup: mediawiki: Introduce startupregistrystats.pp to record RL modules registry [puppet] - 10https://gerrit.wikimedia.org/r/528526 (https://phabricator.wikimedia.org/T229836) [17:54:07] (03Abandoned) 10Jforrester: Require an 8-byte new password for all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/479571 (https://phabricator.wikimedia.org/T211622) (owner: 10Jforrester) [17:54:37] PROBLEM - Widespread puppet agent failures- no resources reported on icinga1001 is CRITICAL: site=eqsin https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [17:54:42] (03Abandoned) 10Jforrester: Require that passwords are not in any common list for privileged groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/479573 (owner: 10Jforrester) [17:55:38] (03PS4) 10Jforrester: Enforce a 10-byte password for privileged users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/479570 (https://phabricator.wikimedia.org/T208246) [17:55:40] (03PS4) 10Jforrester: Require that passwords are not in the most common 100k list for all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/479574 (https://phabricator.wikimedia.org/T151425) [17:58:46] 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Wikidata-Query-Service-Sprint: blazegraph journal on wdqs1005 has doubled in space - https://phabricator.wikimedia.org/T229876 (10Smalyshev) p:05Triage→03Normal @Gehel I think for now we need to reload the DB from other server and repool i... 
[17:59:01] 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Wikidata-Query-Service-Sprint: blazegraph journal on wdqs1005 has doubled in space - https://phabricator.wikimedia.org/T229876 (10Smalyshev) a:03Gehel [17:59:40] 10Operations, 10ops-codfw: (OoW) wtp2019 shows error messages in the racadm getsel's output - https://phabricator.wikimedia.org/T221572 (10Papaul) Instructions: The System Event Log contains information about the managed system. To sort the log by column, click a column header. Clear Log Save As Tue A... [18:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190806T1800) [18:01:55] (03PS3) 10Jforrester: Stop setting wgGraphIsTrusted to the default; never varied [mediawiki-config] - 10https://gerrit.wikimedia.org/r/522536 [18:04:04] (03CR) 10Jforrester: [C: 04-2] "Rebased. Still waiting on the Anti-Harassment Team for deployment." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/479570 (https://phabricator.wikimedia.org/T208246) (owner: 10Jforrester) [18:04:16] (03CR) 10Jforrester: "Rebased. Still waiting on the Anti-Harassment Team for deployment." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/479574 (https://phabricator.wikimedia.org/T151425) (owner: 10Jforrester) [18:04:43] (03PS1) 10CRusnov: netbox: redirect swagger doc requests to official docs [puppet] - 10https://gerrit.wikimedia.org/r/528531 [18:05:28] RECOVERY - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [18:05:50] (03CR) 10CDanis: "PCC looks good: https://puppet-compiler.wmflabs.org/compiler1001/17757/mwmaint1002.eqiad.wmnet/" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/528527 (https://phabricator.wikimedia.org/T229631) (owner: 10CDanis) [18:06:02] RECOVERY - Mediawiki Cirrussearch update rate - codfw on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [18:08:16] Hi I am back [18:08:34] Ooops wrong channel, it should be stewards [18:09:08] (03CR) 10Jforrester: [C: 03+1] "Good to deploy whenever." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/522536 (owner: 10Jforrester) [18:09:50] 10Operations, 10netbox: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack) - https://phabricator.wikimedia.org/T209182 (10crusnov) 05Open→03Resolved Okay after some finagling, uploading (and downloading) images should work. A particularity of swift storage is th... 
[18:13:42] !log brennen@deploy1001 Pruned MediaWiki: 1.34.0-wmf.14 [keeping static files] (duration: 08m 28s) [18:13:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:02] (03PS4) 10Krinkle: Stop setting wgGraphIsTrusted (no longer used) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/522536 (owner: 10Jforrester) [18:17:05] (03CR) 10Krinkle: [C: 03+1] Stop setting wgGraphIsTrusted (no longer used) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/522536 (owner: 10Jforrester) [18:17:39] (03CR) 10Krinkle: [C: 03+1] "Adding commit for future log perusal." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/522536 (owner: 10Jforrester) [18:18:32] (03PS1) 10Bstorm: icinga: add wmcs-team-email for email-only alerts [puppet] - 10https://gerrit.wikimedia.org/r/528533 (https://phabricator.wikimedia.org/T229884) [18:19:00] 10Operations, 10SRE-Access-Requests: Requesting access to Puppet for Viztor[S] - https://phabricator.wikimedia.org/T229894 (10Viztor) >>! In T229894#5396816, @Aklapper wrote: > Thanks. It's still unclear to me which [specific server](https://wikitech.wikimedia.org/wiki/Puppet#Puppetmaster) you need access to a... [18:19:39] !log brennen@deploy1001 Started scap: testwiki to php-1.34.0-wmf.17 and rebuild l10n cache [18:19:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:00] (03PS2) 10Bstorm: icinga: add wmcs-team-email for email-only alerts [puppet] - 10https://gerrit.wikimedia.org/r/528533 (https://phabricator.wikimedia.org/T229884) [18:22:26] RECOVERY - Widespread puppet agent failures- no resources reported on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [18:30:51] (PS1) Bstorm: icinga: switch up bstorm user a bit [puppet] - https://gerrit.wikimedia.org/r/528535 (https://phabricator.wikimedia.org/T229884) [18:38:41] !log brennen@deploy1001 Finished scap: testwiki to php-1.34.0-wmf.17 and rebuild l10n cache (duration: 19m 02s) [18:38:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:42:19] (PS1) Sbisson: Enable and configure ORES damaging and goodfaith on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/528537 (https://phabricator.wikimedia.org/T225562) [19:00:04] brennen: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for MediaWiki train - American version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190806T1900). [19:00:42] (CR) Brennen Bearnes: [C: +2] Group0 to 1.34.0-wmf.17 [mediawiki-config] - https://gerrit.wikimedia.org/r/528530 (owner: Brennen Bearnes) [19:01:38] (Merged) jenkins-bot: Group0 to 1.34.0-wmf.17 [mediawiki-config] - https://gerrit.wikimedia.org/r/528530 (owner: Brennen Bearnes) [19:01:51] (CR) jenkins-bot: Group0 to 1.34.0-wmf.17 [mediawiki-config] - https://gerrit.wikimedia.org/r/528530 (owner: Brennen Bearnes) [19:06:30] (CR) Jhedden: [C: +1] icinga: add wmcs-team-email for email-only alerts [puppet] - https://gerrit.wikimedia.org/r/528533 (https://phabricator.wikimedia.org/T229884) (owner: Bstorm) [19:07:12] brennen: whenever you're done with train could you ping me? I want to restart gerrit.
[19:08:15] thcipriani: ack [19:11:06] (PS3) Thcipriani: gerrit: replication: exclude some projects [puppet] - https://gerrit.wikimedia.org/r/528276 [19:11:29] (PS4) Paladox: Gerrit: Switch 'mirror' back on for the GitHub remote [puppet] - https://gerrit.wikimedia.org/r/528433 [19:11:35] (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/528433 (owner: Paladox) [19:11:48] (CR) Paladox: [C: -1] "We think we've identified the cause so holding this." [puppet] - https://gerrit.wikimedia.org/r/528433 (owner: Paladox) [19:12:19] !log brennen@deploy1001 rebuilt and synchronized wikiversions files: Group0 to 1.34.0-wmf.17 [19:12:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:17:34] (CR) Jhedden: [C: +1] icinga: switch up bstorm user a bit [puppet] - https://gerrit.wikimedia.org/r/528535 (https://phabricator.wikimedia.org/T229884) (owner: Bstorm) [19:18:23] thcipriani: i... think things look basically ok. [19:19:12] brennen: nice, kudos :) [19:22:25] !log gerrit restart on cobalt [19:22:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:24:03] (PS1) Sbisson: lvwiki damaging model adjustment [mediawiki-config] - https://gerrit.wikimedia.org/r/528546 (https://phabricator.wikimedia.org/T221871) [19:24:09] thcipriani \o/ [19:24:11] works! [19:24:13] so it's a bug [19:24:23] [3b59b00d] push git@github.com:wikimedia/wikimedia-textcat-demo [19:25:08] hrm, replication config reload bug: fun :) [19:25:22] yeh [19:30:46] PROBLEM - puppet last run on db2094 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [19:31:58] PROBLEM - puppet last run on an-coord1001 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 8 minutes ago with 2 failures.
Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas],Exec[git_pull_operations/mediawiki-config] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [19:33:37] thcipriani i see repos being updated :) [19:33:46] (on github) [19:37:28] 10Operations, 10Discovery-Search: Prometheus not collecting cloudelastic metrics - https://phabricator.wikimedia.org/T229937 (10EBernhardson) Looked into this a little bit (on cloudelastic1001.wikimedia.org), no solution yet: * Verified with tcpdump over 10 minutes that nothing is calling port 9100 . Tcpdump... [19:42:53] !log depooled wtp2019 ( to assist papaul with T221572 ) [19:43:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:43:02] T221572: (OoW) wtp2019 shows error messages in the racadm getsel's output - https://phabricator.wikimedia.org/T221572 [19:44:01] (03PS2) 10Herron: logstash: rotate logstash plain logs with log4j2 [puppet] - 10https://gerrit.wikimedia.org/r/528306 (https://phabricator.wikimedia.org/T166107) [19:46:08] 10Operations, 10ops-codfw: (OoW) wtp2019 shows error messages in the racadm getsel's output - https://phabricator.wikimedia.org/T221572 (10ssastry) Depooled the server just now (logged in SAL). Tailing /srv/log/parsoid/main.log shows traffic has stopped. Grafana graphs is still showing traffic, but maybe the... [19:48:46] 10Operations, 10ops-codfw: (OoW) wtp2019 shows error messages in the racadm getsel's output - https://phabricator.wikimedia.org/T221572 (10ssastry) Oh, I was looking at the cluster graphs. the wtp2019 graph does indeed show zero traffic to the host now. 
[19:50:39] !log disabling puppet on logstash collectors for rolling deploy of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528306/ T166107 [19:50:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:50:47] T166107: logrotate and logstash logs does not play well together - https://phabricator.wikimedia.org/T166107 [19:50:55] 10Operations, 10Jade, 10Scoring-platform-team, 10TechCom, and 4 others: Deploy Jade extension MVP to production - https://phabricator.wikimedia.org/T183381 (10Halfak) [19:52:32] !log shutting down wtp2019 for firmware upgrade [19:52:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:44] (03CR) 10Herron: [C: 03+2] logstash: rotate logstash plain logs with log4j2 [puppet] - 10https://gerrit.wikimedia.org/r/528306 (https://phabricator.wikimedia.org/T166107) (owner: 10Herron) [19:54:16] PROBLEM - Host wtp2019 is DOWN: PING CRITICAL - Packet loss = 100% [19:54:20] RECOVERY - puppet last run on an-coord1001 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [19:58:44] RECOVERY - puppet last run on db2094 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [19:59:54] RECOVERY - Host wtp2019 is UP: PING OK - Packet loss = 0%, RTA = 36.22 ms [20:00:02] (03PS1) 10EBernhardson: Define cloudelastic as a cluster in hieradata [puppet] - 10https://gerrit.wikimedia.org/r/528554 (https://phabricator.wikimedia.org/T229937) [20:00:22] 10Operations, 10Discovery-Search, 10Patch-For-Review: Prometheus not collecting cloudelastic metrics - https://phabricator.wikimedia.org/T229937 (10EBernhardson) Think i found it: Cumin::Selector is defined with cluster="cloudelastic", site="eqiad" Prometheus::Class_config (and friends) use the puppetlib `g... 
[20:00:36] 10Operations, 10Discovery-Search, 10Patch-For-Review: Prometheus not collecting cloudelastic metrics - https://phabricator.wikimedia.org/T229937 (10EBernhardson) a:03EBernhardson [20:00:51] 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: Prometheus not collecting cloudelastic metrics - https://phabricator.wikimedia.org/T229937 (10EBernhardson) [20:14:57] 10Operations, 10ops-codfw: (OoW) wtp2019 shows error messages in the racadm getsel's output - https://phabricator.wikimedia.org/T221572 (10Papaul) @ssastry upgrade complete, I have no more errors showing in the IDRAC log, I am leaving the task open until next week then will resolve it if no errors. The serv... [20:17:20] (03PS2) 10Bstorm: icinga: switch up bstorm user a bit [puppet] - 10https://gerrit.wikimedia.org/r/528535 (https://phabricator.wikimedia.org/T229884) [20:17:58] !log repooled wtp2019 ( after papaul finished upgrade as part of T221572 ) [20:18:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:07] T221572: (OoW) wtp2019 shows error messages in the racadm getsel's output - https://phabricator.wikimedia.org/T221572 [20:18:46] (03CR) 10Bstorm: [C: 03+2] icinga: switch up bstorm user a bit [puppet] - 10https://gerrit.wikimedia.org/r/528535 (https://phabricator.wikimedia.org/T229884) (owner: 10Bstorm) [20:20:04] 10Operations, 10ops-codfw: (OoW) wtp2019 shows error messages in the racadm getsel's output - https://phabricator.wikimedia.org/T221572 (10ssastry) After upgrade, verified code version ` ssastry@wtp2019:~$ curl http://localhost:8000/_version {"name":"parsoid","version":"0.10.0+git","sha":"7232dfff04a305db11f6c... 
[20:20:20] (03PS3) 10Bstorm: icinga: add wmcs-team-email for email-only alerts [puppet] - 10https://gerrit.wikimedia.org/r/528533 (https://phabricator.wikimedia.org/T229884) [20:22:20] 10Operations, 10Jade, 10Scoring-platform-team, 10TechCom, and 4 others: Deploy Jade extension MVP to production - https://phabricator.wikimedia.org/T183381 (10Halfak) [20:22:50] (03CR) 10Bstorm: [C: 03+2] icinga: add wmcs-team-email for email-only alerts [puppet] - 10https://gerrit.wikimedia.org/r/528533 (https://phabricator.wikimedia.org/T229884) (owner: 10Bstorm) [20:43:47] 10Operations, 10Jade, 10Scoring-platform-team, 10TechCom, and 4 others: Deploy Jade extension MVP to production - https://phabricator.wikimedia.org/T183381 (10Halfak) [20:44:37] (03PS13) 10Holger Knust: table-properties: Initial commit [software/cassandra-table-properties] - 10https://gerrit.wikimedia.org/r/524921 (https://phabricator.wikimedia.org/T220246) [20:45:14] 10Operations, 10Jade, 10Scoring-platform-team, 10TechCom, and 4 others: Deploy Jade extension MVP to production - https://phabricator.wikimedia.org/T183381 (10Halfak) [20:59:28] PROBLEM - Check correctness of the icinga configuration on icinga1001 is CRITICAL: Icinga configuration contains errors https://wikitech.wikimedia.org/wiki/Icinga [20:59:49] 👀 [21:02:57] bstorm_: did you just merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/528533 ? [21:03:16] Aug 6 20:46:40 icinga1001 icinga[30039]: Error: Could not find any contact matching 'aborrero-email' (config file '/etc/icinga/objects/contactgroups.cfg', starting on line 77) [21:03:17] A little while ago, yes? [21:03:38] Icinga can't load the new configuration (and is still running using the old one) because of that [21:03:43] A shoot. I may have made a mistake in the config on srv/private...fixing [21:05:12] No, that looks right. Perhaps it is having some chicken/egg thing? 
[21:06:33] That contact is definitely in contacts.cfg [21:06:59] and has been for a while [21:07:06] yeah, you are right, that is strange [21:07:07] (03CR) 10Dzahn: [C: 03+2] gerrit: replication: exclude some projects [puppet] - 10https://gerrit.wikimedia.org/r/528276 (owner: 10Thcipriani) [21:07:08] As in for like an hour before I did that change [21:07:16] (03PS4) 10Dzahn: gerrit: replication: exclude some projects [puppet] - 10https://gerrit.wikimedia.org/r/528276 (owner: 10Thcipriani) [21:07:39] bstorm_: i think you had bstorm-email vs bstorm-wmcs [21:07:48] ? [21:07:56] Both are valid [21:08:54] hmm, ok. looking as well [21:10:29] running puppet to see what it does [21:10:44] I just did, on icinga1001, it didn't want to do anything [21:10:53] Ok [21:10:58] checking the validity of the config file still shows the same error though: sudo /usr/sbin/icinga -v /etc/icinga/icinga.cfg [21:11:01] Error: Could not find any contact matching 'aborrero-email' (config file '/etc/icinga/objects/contactgroups.cfg', starting on line 77) [21:11:04] Error: Could not expand member contacts specified in contactgroup (config file '/etc/icinga/objects/contactgroups.cfg', starting on line 77) [21:11:04] Could not find any contact matching 'aborrero-email' [21:11:06] Error processing object config files! [21:11:08] Why that one? [21:11:09] but it *is* there. [21:11:09] it's not yours [21:11:13] I know. [21:11:17] But it's there [21:11:29] Which makes me start looking for a curly brace or some such nonsense [21:13:07] none of the -email users show up in https://icinga.wikimedia.org/cgi-bin/icinga/config.cgi?type=contacts [21:13:20] but the -wmcs users do? [21:13:22] So perhaps it didn't re-read those or like them for some reason. [21:13:29] something about the use of the template? [21:13:48] I have no idea re: the tempate bit. 
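[editor's note] The "Could not find any contact matching 'aborrero-email'" error above means a contactgroup lists a member for which Icinga registered no contact object. In Nagios-style object definitions a contact is identified by its `contact_name` directive, while a bare `name` directive only names a template. The following is a minimal sketch of how such a mismatch could be spotted offline, assuming simple `define TYPE { ... }` blocks. The helper names (`parse_objects`, `missing_members`), the sample contacts and the e-mail addresses are hypothetical, and the parsing is far cruder than what `icinga -v` does, so it is an illustration and not a substitute for the real check.

```python
import re

# Simplified matcher for Nagios/Icinga 1.x style "define TYPE { ... }" blocks.
# Assumption: no nested braces and no '}' inside comments.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Return a list of (object_type, {directive: value}) tuples."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop trailing ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, directives))
    return objects


def missing_members(text):
    """List (contactgroup, member) pairs whose member has no contact_name."""
    objects = parse_objects(text)
    known = {
        d["contact_name"]
        for t, d in objects
        if t == "contact" and "contact_name" in d
    }
    missing = []
    for t, d in objects:
        if t != "contactgroup":
            continue
        for member in d.get("members", "").split(","):
            member = member.strip()
            if member and member not in known:
                missing.append((d.get("contactgroup_name", "?"), member))
    return missing


# Hypothetical sample: the first contact only has 'name', so it is not
# registered under that identifier and the group member cannot be resolved.
SAMPLE = """
define contact {
    name            aborrero-email   ; template-style directive only
    email           someone@example.org
}
define contact {
    contact_name    bstorm-wmcs
    email           other@example.org
}
define contactgroup {
    contactgroup_name   wmcs-team-email
    members             aborrero-email, bstorm-wmcs
}
"""

print(missing_members(SAMPLE))
```

On a real host the authoritative check remains the one quoted in the log, `sudo /usr/sbin/icinga -v /etc/icinga/icinga.cfg`. A script like this would only help narrow down which definition to look at.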
[21:13:52] *template [21:14:18] I'll try reverting this change and see if the setup picks up the users (or if they need to be part of the templates or some reason?) [21:14:19] also confirmed it is there on the actual icinga server, not just in private repo... [21:14:25] yeah [21:15:10] do the -email users need an alias? [21:15:13] my contact has a contact_name [21:15:16] that's the only difference I can see at a glance [21:15:18] his contact has a name [21:15:20] but not a contact_name [21:15:22] (03PS1) 10Bstorm: Revert "icinga: add wmcs-team-email for email-only alerts" [puppet] - 10https://gerrit.wikimedia.org/r/528575 [21:15:44] oof [21:15:46] I bet it is that [21:15:55] check what happens if you give him contact_name [21:16:30] Will do before merging the revert [21:16:46] [icinga1001:/etc/icinga] $ sudo grep name objects/contacts.cfg | grep -v contact_name [21:16:48] I think 'name' is not actually a meaningful thing, only 'contact_name' https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/objectdefinitions.html [21:16:49] ^ [21:16:51] that hits all the -email users [21:16:54] but no others [21:17:12] I copy and pasted mine to make the others, so if I made the mistake in mine... [21:17:16] bstorm, aborreoro, hpam, andrew, jhedden [21:17:51] want me to fix in private repo? [21:17:55] already in the file [21:18:12] oh, not the only one :) [21:18:44] I see what it is. I copied too much from the template 😉 [21:18:50] I'm fixing it now [21:18:56] cool [21:19:37] you can check like this: [icinga1001:/etc/icinga] $ sudo icinga -v /etc/icinga/icinga.cfg [21:20:37] thx :) [21:23:55] looks good now! [21:24:02] Aug 6 21:23:55 icinga1001 icinga: Event loop started... [21:24:21] Yup :) [21:24:26] and now the email users are in https://icinga.wikimedia.org/cgi-bin/icinga/config.cgi?type=contacts [21:24:27] Sorry about that! [21:24:27] nice [21:24:42] I copied and pasted one line I should not have from my template [21:24:45] np! 
that was a silly one [21:24:48] In the templates, there's a "name" field [21:24:57] no worries, i'm glad somebody is working on sms for sub groups [21:25:07] i had the same point about 2 users just recently [21:25:39] (03Abandoned) 10Bstorm: Revert "icinga: add wmcs-team-email for email-only alerts" [puppet] - 10https://gerrit.wikimedia.org/r/528575 (owner: 10Bstorm) [21:28:29] !log @ helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' . [21:28:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:30:54] RECOVERY - Check correctness of the icinga configuration on icinga1001 is OK: Icinga configuration is correct https://wikitech.wikimedia.org/wiki/Icinga [21:31:07] bstorm_: that's that same command ^ [21:31:23] Fair enough :) [21:34:09] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@860fb33]: Deploy latest mjolnir daemon to handle bulk imports via swift [21:34:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:35:59] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@860fb33]: Deploy latest mjolnir daemon to handle bulk imports via swift (duration: 01m 50s) [21:36:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:36:41] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@8e513f6]: Deploy latest mjolnir daemon to handle bulk imports via swift [21:36:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:43:14] (03PS1) 10Bstorm: icinga: Set the WMCS host alerts to go only to WMCS [puppet] - 10https://gerrit.wikimedia.org/r/528581 (https://phabricator.wikimedia.org/T229884) [21:47:49] (03PS2) 10Bstorm: icinga: Set the WMCS host alerts to go only to WMCS [puppet] - 10https://gerrit.wikimedia.org/r/528581 (https://phabricator.wikimedia.org/T229884) [21:53:17] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@8e513f6]: Deploy latest mjolnir 
daemon to handle bulk imports via swift (duration: 16m 35s) [21:53:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:56:59] (03PS1) 10Ssingh: Add Sukhbir Singh (sukhe) to the ops group [puppet] - 10https://gerrit.wikimedia.org/r/528585 [22:01:20] (03CR) 10Bstorm: "Overall, every host we use has to get the contactgroups setting with wmcs-team in order for this to work. There may be more roles and suc" [puppet] - 10https://gerrit.wikimedia.org/r/528581 (https://phabricator.wikimedia.org/T229884) (owner: 10Bstorm) [22:07:12] (03CR) 10Dzahn: "seems good, but is there maybe a ticket to link to this?" [puppet] - 10https://gerrit.wikimedia.org/r/528585 (owner: 10Ssingh) [22:08:39] (03CR) 10Ssingh: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/528585 (owner: 10Ssingh) [22:10:45] (03PS2) 10Dzahn: Add Sukhbir Singh (sukhe) to the ops group [puppet] - 10https://gerrit.wikimedia.org/r/528585 (https://phabricator.wikimedia.org/T229860) (owner: 10Ssingh) [22:11:39] (03CR) 10Dzahn: [C: 03+1] "thanks! looks good to me. +1. i took the liberty to edit your commit message wiki style to show the way we usually link to tickets. 
That w" [puppet] - 10https://gerrit.wikimedia.org/r/528585 (https://phabricator.wikimedia.org/T229860) (owner: 10Ssingh) [22:15:52] 10Operations, 10Traffic, 10Patch-For-Review: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10Dzahn) [22:16:26] (03PS1) 10Cwhite: icinga: disable autocomplete.js in icinga search text input [puppet] - 10https://gerrit.wikimedia.org/r/528586 [22:17:38] (03CR) 10jerkins-bot: [V: 04-1] icinga: disable autocomplete.js in icinga search text input [puppet] - 10https://gerrit.wikimedia.org/r/528586 (owner: 10Cwhite) [22:18:58] (03PS1) 10Ottomata: Install libpython3.7 with python3.7 on analytics nodes [puppet] - 10https://gerrit.wikimedia.org/r/528588 (https://phabricator.wikimedia.org/T229347) [22:19:10] 10Operations, 10Traffic, 10Patch-For-Review: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10Dzahn) [x] Add to "Ops vendor maintenance" Calendar (added with 'all email as it arrives'). This means more email about maintenance at data centers and also web access to https://grou... [22:19:15] mutante: thanks! [22:19:41] sukhe: welcome to ops [22:19:49] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@9e95ab4]: Deploy latest mjolnir daemon to handle bulk imports via swift [22:19:51] (but we don't call it ops anymore) :P [22:20:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:20:07] what's the new name? I see "ops" everywhere! 
[22:20:21] SRE and sub teams of SRE [22:20:41] ha [22:20:58] https://www.mediawiki.org/wiki/Wikimedia_Site_Reliability_Engineering https://wikitech.wikimedia.org/wiki/SRE [22:21:25] I thought ops was a (sub)team in SRE [22:21:30] also the new "Director of Operations" has nothing to do with "formerly known as "Tech Ops" [22:21:43] but the job posting "Operations Engineer" still does :P [22:22:21] sukhe: Service Operations is a subteam of SRE but also "ops" is the historical term of all of SRE [22:23:27] (03PS3) 10Bstorm: icinga: Set the WMCS host alerts to go only to WMCS [puppet] - 10https://gerrit.wikimedia.org/r/528581 (https://phabricator.wikimedia.org/T229884) [22:24:10] 10Operations, 10Traffic, 10Patch-For-Review: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10ssingh) [22:25:18] I see [22:25:24] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@9e95ab4]: Deploy latest mjolnir daemon to handle bulk imports via swift (duration: 05m 35s) [22:25:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:26:24] 10Operations, 10Traffic, 10Patch-For-Review: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10Dzahn) [22:26:47] sukhe: i added you to the group with 'all email' and checked that second box. that should be the same thing [22:27:01] sukhe: that should let you see that special inbox in web [22:27:25] yes, it was supposed to inherit privileges from the sre@wikimedia.org (Tech Ops, haha) group [22:27:36] but it seems like it doesn't work that way with google groups in groups [22:27:43] so we had to manually add for onboarding anyways [22:28:04] (03CR) 10Ottomata: [C: 03+2] Install libpython3.7 with python3.7 on analytics nodes [puppet] - 10https://gerrit.wikimedia.org/r/528588 (https://phabricator.wikimedia.org/T229347) (owner: 10Ottomata) [22:30:17] mutante: I see the group now! 
[22:30:34] sukhe: ok, cool [22:30:59] that is mostly useful if you see something in IRC here that looks like networking issues. like a lot of hosts going down at once [22:31:21] then you can check in that group if any of the data center providers recently announced maintenance work [22:31:46] well, not just hosts.. also stuff like transit link down [22:33:08] !log restart mjolnir-kafka-daemon across all elasticsearch servers [22:33:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:35:59] (03CR) 10Bstorm: "This is a pretty pervasive sampling of the host roles this covers: https://puppet-compiler.wmflabs.org/compiler1001/17765/" [puppet] - 10https://gerrit.wikimedia.org/r/528581 (https://phabricator.wikimedia.org/T229884) (owner: 10Bstorm) [22:37:30] (03PS4) 10Bstorm: icinga: Set the WMCS host alerts to go only to WMCS [puppet] - 10https://gerrit.wikimedia.org/r/528581 (https://phabricator.wikimedia.org/T229884) [22:45:02] (03CR) 10Bstorm: "Patch set 4 takes a slightly different approach for the wiki replicas and dumps servers, since those are a bit more collaborative. It use" [puppet] - 10https://gerrit.wikimedia.org/r/528581 (https://phabricator.wikimedia.org/T229884) (owner: 10Bstorm) [22:45:58] (03CR) 10Bstorm: "That should be everything. Please help me find anything I may have missed or let me know if there is anything that should instead be 'wmc" [puppet] - 10https://gerrit.wikimedia.org/r/528581 (https://phabricator.wikimedia.org/T229884) (owner: 10Bstorm) [22:46:05] (03CR) 10Jhedden: [C: 03+1] "Looks great! nice job on this." 
[puppet] - 10https://gerrit.wikimedia.org/r/528581 (https://phabricator.wikimedia.org/T229884) (owner: 10Bstorm) [22:47:50] (03PS1) 10Subramanya Sastry: WIP: Add conditional loading of Parsoid/PHP as an extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528591 (https://phabricator.wikimedia.org/T229354) [22:52:54] (03CR) 10Bstorm: "> Patch Set 4: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/528581 (https://phabricator.wikimedia.org/T229884) (owner: 10Bstorm) [22:53:46] (03CR) 10Bstorm: "To be clear, I need to do another patch to capture additional services. This covered more of them than I actually thought it would, but t" [puppet] - 10https://gerrit.wikimedia.org/r/528581 (https://phabricator.wikimedia.org/T229884) (owner: 10Bstorm) [22:59:20] (03CR) 10jerkins-bot: [V: 04-1] WIP: Add conditional loading of Parsoid/PHP as an extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528591 (https://phabricator.wikimedia.org/T229354) (owner: 10Subramanya Sastry) [23:00:05] MaxSem, RoanKattouw, and Niharika: #bothumor My software never has bugs. It just develops random features. Rise for Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190806T2300). [23:00:05] Viztor_, Pchelolo, and RoanKattouw: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:03:22] 10Operations, 10Wikimedia-Mailing-lists: New Mailing lists for AzWiki sysops - https://phabricator.wikimedia.org/T228542 (10Dzahn) Hi, regarding the user lists, i think there was a misunderstand about "primary admin" and "secondary admin". It did not mean that you should list all wiki admins and their individu... 
[23:03:25] (03PS4) 10DannyS712: Update wmflabs.org redirect target [puppet] - 10https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896) [23:03:52] I'm here, and I'll do the swat [23:04:07] thanks RoanKattouw [23:04:17] (03PS5) 10DannyS712: Update wmflabs.org redirect target [puppet] - 10https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896) [23:05:11] (03CR) 10Bstorm: [C: 03+1] Update wmflabs.org redirect target [puppet] - 10https://gerrit.wikimedia.org/r/528304 (https://phabricator.wikimedia.org/T229896) (owner: 10DannyS712) [23:06:09] (03PS2) 10Catrope: Switch updateBetaFeaturesUserCounts job to eventgate. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528209 (https://phabricator.wikimedia.org/T228705) (owner: 10Ppchelko) [23:06:17] (03CR) 10Catrope: [C: 03+2] Switch updateBetaFeaturesUserCounts job to eventgate. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528209 (https://phabricator.wikimedia.org/T228705) (owner: 10Ppchelko) [23:07:34] (03Merged) 10jenkins-bot: Switch updateBetaFeaturesUserCounts job to eventgate. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528209 (https://phabricator.wikimedia.org/T228705) (owner: 10Ppchelko) [23:07:49] (03CR) 10jenkins-bot: Switch updateBetaFeaturesUserCounts job to eventgate. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528209 (https://phabricator.wikimedia.org/T228705) (owner: 10Ppchelko) [23:08:41] Pchelolo: Your patch is on mwdebug1002, in case it can be tested there [23:08:51] If not, let me know and I'll just deploy it cluster-wide [23:08:51] it can, gimme 5 mins [23:08:55] OK [23:11:34] (03CR) 10Cwhite: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/528586 (owner: 10Cwhite) [23:12:29] How should I deploy? [23:12:37] (03CR) 10jerkins-bot: [V: 04-1] icinga: disable autocomplete.js in icinga search text input [puppet] - 10https://gerrit.wikimedia.org/r/528586 (owner: 10Cwhite) [23:13:07] Is evening SWAT here? 
[23:13:58] viztor_: Hello! Yes, I can do yours after Pchelolo finishes testing his patch [23:14:22] sweet. mine should be pretty straight forward, so no hurries. [23:15:04] (03PS2) 10Cwhite: icinga: disable autocomplete.js in icinga search text input [puppet] - 10https://gerrit.wikimedia.org/r/528586 [23:17:00] 10Operations, 10SRE-Access-Requests, 10Traffic, 10Patch-For-Review: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10Dzahn) [23:17:45] RoanKattouw: ok, it works :) [23:18:10] (03PS9) 10Catrope: Update HD logo for en.ws and mul.ws [mediawiki-config] - 10https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: 10Viztor) [23:18:17] (03CR) 10Catrope: [C: 03+2] Update HD logo for en.ws and mul.ws [mediawiki-config] - 10https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: 10Viztor) [23:19:21] (03Merged) 10jenkins-bot: Update HD logo for en.ws and mul.ws [mediawiki-config] - 10https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: 10Viztor) [23:19:25] eh, my irc client keeps dropping me. [23:19:36] (03CR) 10jenkins-bot: Update HD logo for en.ws and mul.ws [mediawiki-config] - 10https://gerrit.wikimedia.org/r/527922 (https://phabricator.wikimedia.org/T229769) (owner: 10Viztor) [23:19:39] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Switch updateBetaFeaturesUserCounts job to eventgate (T228705) (duration: 00m 57s) [23:19:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:19:47] T228705: Migrate JobQueue to eventgate - https://phabricator.wikimedia.org/T228705 [23:21:06] RoanKattouw [23:21:11] so is it done like that? 
[23:23:17] Almost done, give me a minute [23:23:20] Then I'll ask you to test [23:24:31] <3 [23:24:46] !log catrope@deploy1001 Synchronized static/images/project-logos/: Update HD logos for enwikisource and sourceswiki (T229769) (duration: 00m 56s) [23:24:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:24:54] T229769: Set up HD logo for mul.Wikisource and en.Wikisource - https://phabricator.wikimedia.org/T229769 [23:25:54] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Update HD logos for enwikisource and sourceswiki (T229769) (duration: 00m 55s) [23:26:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:26:52] viztor_: OK, should be deployed now, please test [23:29:40] viztor_: hi, got a question later about your access request. [23:30:06] RoanKattouw, the image appears to be cropped. [23:30:36] Which one, and on which wiki? [23:31:49] probably both [23:32:31] the css is written to have width 135px [23:32:38] and height automatically adjusted [23:33:41] which only work for logo of 135 * 155px size or shorter logos but not slimmer logos [23:34:16] I don't see it [23:35:02] The logos are 125x155 px and the box they fit in is 160x160 it looks like [23:36:07] I'm not seeing any cropping [23:36:14] !log phabricator - added ssingh to acl*sre-team (group 29), WMF-NDA-requests (group 974) and WMF-NDA (group 61) (T229860) [23:36:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:23] T229860: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 [23:36:39] sukhe: ^ moar ticket access for you now [23:36:56] that gives you private phabricator tickets [23:37:28] 10Operations, 10SRE-Access-Requests, 10Traffic, 10Patch-For-Review: SRE Onboarding for Sukhbir Singh - https://phabricator.wikimedia.org/T229860 (10Dzahn) [23:38:10] RoanKattouw I've updated the images but seems the pr is closed [23:38:23] should I open a new one or is 
there a way to reopen it? [23:38:30] Yeah Gerrit won't let you update an already-merged commit, you have to create a new change [23:38:44] I'm not seeing anything wrong though, could you screenshot what you see? [23:40:48] 10Operations, 10SRE-Access-Requests: Requesting access to Puppet for Viztor[S] - https://phabricator.wikimedia.org/T229894 (10Dzahn) Hi @Viztor is the reason for requesting this just that you want a change deployed for T226633? (It seems we need to improve on how to get fonts installed T228591) [23:41:53] viz_: are you asking about gerrit or github? [23:43:30] (03PS1) 10Viztor: Fix issue with HD mul.ws and en.ws logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528597 [23:44:06] mutante gerrit here. [23:44:16] if it's closed as "abandoned" you can hit restored. if it's closed as merged you can click "revert" (if that's what you want) or make a new one [23:44:25] new one might be easier [23:44:45] Revert is probably better for the log but I already created a new one so [23:45:06] I don't think I can revert though since I don't have access to write to the master branch? 
[23:45:35] RoanKattouw I've uploaded a patch [23:45:36] You can propose a revert [23:45:37] in this case it just means "gerrit, create a new patch that will revert it once somebody merges it" but without having to manually create it [23:45:45] by clicking that button in the UI [23:45:45] In the same way you can propose a new patch [23:46:23] (03PS2) 10Viztor: Fix issue with HD mul.ws and en.ws logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528597 [23:46:42] (03PS3) 10Catrope: Fix issue with HD mul.ws and en.ws logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528597 (https://phabricator.wikimedia.org/T229769) (owner: 10Viztor) [23:46:53] (03CR) 10Catrope: [C: 03+2] Fix issue with HD mul.ws and en.ws logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528597 (https://phabricator.wikimedia.org/T229769) (owner: 10Viztor) [23:47:46] (03Merged) 10jenkins-bot: Fix issue with HD mul.ws and en.ws logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528597 (https://phabricator.wikimedia.org/T229769) (owner: 10Viztor) [23:48:03] (03CR) 10jenkins-bot: Fix issue with HD mul.ws and en.ws logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528597 (https://phabricator.wikimedia.org/T229769) (owner: 10Viztor) [23:49:22] !log catrope@deploy1001 Synchronized php-1.34.0-wmf.16/extensions/Flow/includes/Import/OptInController.php: Unbreak disabling of Flow beta feature (T229795) (duration: 00m 56s) [23:49:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:30] T229795: It is not possible to deactivate Structured Discussions using the Beta feature on some wikis - https://phabricator.wikimedia.org/T229795 [23:50:17] !log catrope@deploy1001 Synchronized php-1.34.0-wmf.17/extensions/Flow/includes/Import/OptInController.php: Unbreak disabling of Flow beta feature (T229795) (duration: 00m 55s) [23:50:24] I see [23:50:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:30] !log 
catrope@deploy1001 Synchronized static/images/project-logos/: Update HD logos for enwikisource and sourceswiki (T229769) (duration: 00m 56s) [23:51:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:38] T229769: Set up HD logo for mul.Wikisource and en.Wikisource - https://phabricator.wikimedia.org/T229769 [23:51:54] That sounds like double the trouble, though could keep the log bit cleaner I suppose [23:52:22] viz_: OK, try now [23:52:28] I've merged and deployed your second patch [23:52:39] Yes, it worked [23:53:09] Thanks RoanKattouw <3 [23:53:54] Alright, so that's the SWAT done! Thanks everyone [23:56:18] (03PS5) 10Dzahn: gerrit: replication: exclude some projects [puppet] - 10https://gerrit.wikimedia.org/r/528276 (owner: 10Thcipriani) [23:59:09] mutante What is your question for that access request? [23:59:58] viz_: is the reason to request that just that you want those fonts installed? https://phabricator.wikimedia.org/T229894#5398158