[00:37:25] PROBLEM - HHVM rendering on mw2220 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:38:25] RECOVERY - HHVM rendering on mw2220 is OK: HTTP OK: HTTP/1.1 200 OK - 77491 bytes in 0.414 second response time [03:32:09] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 927.33 seconds [03:59:57] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 232.43 seconds [04:02:49] PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:18:10] RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [05:40:00] (03PS1) 10KartikMistry: cxserver: Disable Youdao MT until service is back [puppet] - 10https://gerrit.wikimedia.org/r/475688 [05:58:27] !log kartik@deploy1001 Started deploy [cxserver/deploy@d2173ca]: Update cxserver to 218a543 [05:58:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:02:48] !log kartik@deploy1001 Finished deploy [cxserver/deploy@d2173ca]: Update cxserver to 218a543 (duration: 04m 21s) [06:02:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:12:17] (03PS1) 10Vgutierrez: certcentral: Provide TLS certificates for archiva [puppet] - 10https://gerrit.wikimedia.org/r/475691 (https://phabricator.wikimedia.org/T207050) [06:12:19] (03PS1) 10Vgutierrez: archiva: Deploy certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/475692 (https://phabricator.wikimedia.org/T207050) [06:14:35] (03PS1) 10Marostegui: db-eqiad.php: Depool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475693 (https://phabricator.wikimedia.org/T86339) [06:16:28] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475693 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) 
[06:17:32] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475693 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [06:21:46] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1077 - T86339 (duration: 00m 50s) [06:21:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:21:50] T86339: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 [06:21:58] !log Deploy schema change on db1077 (s3 sanitarium master) with replication - T86339 [06:22:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:22:23] !log Reload haproxy on dbproxy1005 [06:22:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:25:38] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1077" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475694 [06:26:45] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1077" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475694 (owner: 10Marostegui) [06:27:46] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1077" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475694 (owner: 10Marostegui) [06:28:46] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1077 - T86339 (duration: 00m 46s) [06:28:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:28:50] T86339: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 [06:29:14] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475693 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [06:29:16] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1077" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475694 (owner: 10Marostegui) [06:36:02] !log Deploy schema change on s3 master (db1075) - T86339 
[06:36:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:36:06] T86339: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 [06:46:07] !log Deploy schema change on db2067 - T86338 [06:46:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:46:11] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [06:48:55] (03CR) 10Vgutierrez: [C: 032] certcentral: Provide TLS certificates for archiva [puppet] - 10https://gerrit.wikimedia.org/r/475691 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [06:51:51] 10Operations, 10Citoid, 10Patch-For-Review, 10Service-deployment-requests, and 3 others: Deploy translation-server-v2 - https://phabricator.wikimedia.org/T201611 (10akosiaris) >>! In T201611#4772018, @Krenair wrote: > @Akosiaris: I think your last paste is slightly broken (most of the URL got muddled with... [06:54:07] (03CR) 10Alexandros Kosiaris: [C: 031] "jenkins votes -1 because of modules/stdlib/spec/fixtures/test/manifests/deftype.pp:2 WARNING variable not enclosed in {}" [puppet] - 10https://gerrit.wikimedia.org/r/475258 (owner: 10Dzahn) [06:54:14] (03PS2) 10Alexandros Kosiaris: upgrade puppet stdlib from 4.16.0 to 4.19.0 [puppet] - 10https://gerrit.wikimedia.org/r/475258 (owner: 10Dzahn) [06:54:42] (03CR) 10Alexandros Kosiaris: [C: 032] upgrade puppet stdlib from 4.16.0 to 4.19.0 [puppet] - 10https://gerrit.wikimedia.org/r/475258 (owner: 10Dzahn) [06:54:54] (03CR) 10jerkins-bot: [V: 04-1] upgrade puppet stdlib from 4.16.0 to 4.19.0 [puppet] - 10https://gerrit.wikimedia.org/r/475258 (owner: 10Dzahn) [06:55:35] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] upgrade puppet stdlib from 4.16.0 to 4.19.0 [puppet] - 10https://gerrit.wikimedia.org/r/475258 (owner: 10Dzahn) [06:55:57] (03PS1) 10Marostegui: db-eqiad.php: Depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475695 
(https://phabricator.wikimedia.org/T86338) [06:57:48] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475695 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [06:58:51] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475695 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [06:59:58] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1088 - T86338 (duration: 00m 46s) [07:00:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:00:03] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [07:00:09] !log Deploy schema change on db1088 - T86338 [07:00:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:02:13] (03PS1) 10Elukey: profile::hadoop::master: raise RM heap alarms and fix dashboard links [puppet] - 10https://gerrit.wikimedia.org/r/475697 [07:05:23] (03PS1) 10Elukey: Fix alarm's dashboard links in hadoop profiles [puppet] - 10https://gerrit.wikimedia.org/r/475698 [07:05:54] (03PS2) 10Elukey: profile::hadoop::master: raise RM heap alarms and fix dashboard links [puppet] - 10https://gerrit.wikimedia.org/r/475697 [07:06:29] (03PS2) 10Elukey: Fix alarm's dashboard links in hadoop profiles [puppet] - 10https://gerrit.wikimedia.org/r/475698 [07:07:07] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475699 [07:08:14] (03CR) 10Elukey: [C: 032] profile::hadoop::master: raise RM heap alarms and fix dashboard links [puppet] - 10https://gerrit.wikimedia.org/r/475697 (owner: 10Elukey) [07:08:24] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475699 (owner: 10Marostegui) [07:08:28] (03CR) 10Elukey: [C: 032] Fix alarm's dashboard links in hadoop profiles [puppet] - 
10https://gerrit.wikimedia.org/r/475698 (owner: 10Elukey) [07:09:25] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475699 (owner: 10Marostegui) [07:09:28] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475695 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [07:09:38] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475699 (owner: 10Marostegui) [07:10:23] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1088 - T86338 (duration: 00m 46s) [07:10:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:10:27] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [07:18:23] (03PS1) 10Elukey: profile::hadoop::spark2: rename parameter to be more generic [puppet] - 10https://gerrit.wikimedia.org/r/475706 [07:19:46] (03PS1) 10Ema: ATS: add remap rules for Beta cluster [puppet] - 10https://gerrit.wikimedia.org/r/475707 [07:21:15] (03CR) 10Ema: [C: 032] ATS: add remap rules for Beta cluster [puppet] - 10https://gerrit.wikimedia.org/r/475707 (owner: 10Ema) [07:22:09] (03CR) 10Elukey: [C: 032] profile::hadoop::spark2: rename parameter to be more generic [puppet] - 10https://gerrit.wikimedia.org/r/475706 (owner: 10Elukey) [07:22:15] (03PS2) 10Elukey: profile::hadoop::spark2: rename parameter to be more generic [puppet] - 10https://gerrit.wikimedia.org/r/475706 [07:22:34] (03CR) 10Giuseppe Lavagetto: "As expected there are a few things to fix -apparently more in the private repo than anywhere else:" [puppet] - 10https://gerrit.wikimedia.org/r/475500 (owner: 10Giuseppe Lavagetto) [07:32:10] (03CR) 10Alexandros Kosiaris: [C: 04-1] "minor nitpicks, but premise looks fine to me" (038 comments) [puppet] - 10https://gerrit.wikimedia.org/r/475457 (https://phabricator.wikimedia.org/T209573) 
(owner: 10Giuseppe Lavagetto) [07:35:01] (03PS1) 10Elukey: memcached: add -R 200 to mc1021 [puppet] - 10https://gerrit.wikimedia.org/r/475708 (https://phabricator.wikimedia.org/T208844) [07:36:25] (03CR) 10Elukey: [C: 032] memcached: add -R 200 to mc1021 [puppet] - 10https://gerrit.wikimedia.org/r/475708 (https://phabricator.wikimedia.org/T208844) (owner: 10Elukey) [07:36:58] !log restart memcached on mc1021 (cache wipe) to add -R 200 - T208844 [07:37:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:37:03] T208844: Apply -R 200 to all the memcached mw object cache instances running in eqiad/codfw - https://phabricator.wikimedia.org/T208844 [07:37:30] long way to 1036 :P [07:37:58] PROBLEM - Logstash rate of ingestion percent change compared to yesterday on icinga1001 is CRITICAL: 147.9 ge 130 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1panelId=2fullscreen [07:38:53] this could very well be memcached errors due to the restart above, should clear in a bit if this is the case [07:38:53] (03PS2) 10Ema: ATS: path normalization [puppet] - 10https://gerrit.wikimedia.org/r/475508 (https://phabricator.wikimedia.org/T210295) [07:42:47] (03CR) 10Ema: [C: 032] ATS: path normalization [puppet] - 10https://gerrit.wikimedia.org/r/475508 (https://phabricator.wikimedia.org/T210295) (owner: 10Ema) [07:45:48] yep seems so [07:45:52] cc: godog --^ [07:46:06] let me know if I have to take precautions for the next times [07:48:31] (03PS1) 10Muehlenhoff: Fix absent entry [puppet] - 10https://gerrit.wikimedia.org/r/475709 [07:49:34] 10Operations, 10Traffic, 10Patch-For-Review: ATS path normalization - https://phabricator.wikimedia.org/T210295 (10ema) Looking at the logs on cp1071 and other cp-ats hosts, it seems that the patch above is working as expected: ` modified path: /api/rest_v1/page/mobile-sections/Wikipedia%3AWikiProject_Spide... [07:49:44] elukey: ack, thanks for the heads up! 
[07:51:55] (03CR) 10Muehlenhoff: [C: 032] Fix absent entry [puppet] - 10https://gerrit.wikimedia.org/r/475709 (owner: 10Muehlenhoff) [07:52:13] (03CR) 10Filippo Giunchedi: [C: 031] Remove Diamond from Ganeti hosts [puppet] - 10https://gerrit.wikimedia.org/r/475456 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [08:10:33] !log installing pixman security updates [08:10:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:11:53] (03PS1) 10Marostegui: db-eqiad.php: Depool db1096:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475710 (https://phabricator.wikimedia.org/T202167) [08:12:46] (03PS1) 10Muehlenhoff: Add library hint for pixman [puppet] - 10https://gerrit.wikimedia.org/r/475711 [08:14:18] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1096:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475710 (https://phabricator.wikimedia.org/T202167) (owner: 10Marostegui) [08:15:22] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1096:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475710 (https://phabricator.wikimedia.org/T202167) (owner: 10Marostegui) [08:16:29] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1096:3316 - T202167 (duration: 00m 47s) [08:16:31] !log Deploy schema change on db1096:3316 - T202167 [08:16:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:16:32] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [08:16:32] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1096:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475710 (https://phabricator.wikimedia.org/T202167) (owner: 10Marostegui) [08:16:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:16:47] (03CR) 10Muehlenhoff: [C: 032] Add library hint for pixman [puppet] - 10https://gerrit.wikimedia.org/r/475711 (owner: 10Muehlenhoff) [08:19:18] (03PS1) 10Marostegui: Revert 
"db-eqiad.php: Depool db1096:3316" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475712 [08:20:31] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1096:3316" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475712 (owner: 10Marostegui) [08:21:32] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1096:3316" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475712 (owner: 10Marostegui) [08:22:27] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1096:3316 - T202167 (duration: 00m 46s) [08:22:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:22:30] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [08:25:44] (03PS1) 10Vgutierrez: acme_requests: Handle TCP/HTTPS errors [software/certcentral] - 10https://gerrit.wikimedia.org/r/475713 (https://phabricator.wikimedia.org/T209980) [08:29:53] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1096:3316" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475712 (owner: 10Marostegui) [08:30:59] (03PS1) 10Alexandros Kosiaris: varnish: use $domain_networks instead of $all_networks [puppet] - 10https://gerrit.wikimedia.org/r/475714 [08:31:21] (03CR) 10Alexandros Kosiaris: [C: 031] Remove Diamond from Ganeti hosts [puppet] - 10https://gerrit.wikimedia.org/r/475456 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [08:34:35] (03CR) 10Alexandros Kosiaris: [C: 04-1] network::constants: Include cloud private range in all_networks (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475150 (owner: 10Alex Monk) [08:35:31] !log removed labvirt1015 from debmonitor DB (got renamed to cloudvirt1015) [08:35:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:48:18] (03CR) 10Ema: [C: 031] wmflib: make the role() function store a path in $::_role [puppet] - 10https://gerrit.wikimedia.org/r/475498 (owner: 10Giuseppe Lavagetto) [08:48:19] (03CR) 10Ema: [C: 
031] hiera: remove the role backend in production [puppet] - 10https://gerrit.wikimedia.org/r/475499 (owner: 10Giuseppe Lavagetto) [08:48:21] (03CR) 10Ema: [C: 031] hiera: fix the hierarchical order of lookups [puppet] - 10https://gerrit.wikimedia.org/r/475500 (owner: 10Giuseppe Lavagetto) [08:49:26] 10Operations, 10MediaWiki-Cache, 10Performance-Team (Radar), 10User-Elukey: mcrouter does not remove a memcached shard from consistent hashing when timeouts happen - https://phabricator.wikimedia.org/T208934 (10elukey) >>! In T208934#4763555, @aaron wrote: > It definitely seems like something worth doing.... [08:51:18] (03CR) 10Giuseppe Lavagetto: mediawiki::appserver: add php monitoring (038 comments) [puppet] - 10https://gerrit.wikimedia.org/r/475457 (https://phabricator.wikimedia.org/T209573) (owner: 10Giuseppe Lavagetto) [08:51:44] (03PS2) 10Giuseppe Lavagetto: mediawiki::appserver: add php monitoring [puppet] - 10https://gerrit.wikimedia.org/r/475457 (https://phabricator.wikimedia.org/T209573) [08:56:40] (03PS3) 10Muehlenhoff: Remove Diamond from Ganeti hosts [puppet] - 10https://gerrit.wikimedia.org/r/475456 (https://phabricator.wikimedia.org/T183454) [08:57:35] (03CR) 10Muehlenhoff: [C: 032] Remove Diamond from Ganeti hosts [puppet] - 10https://gerrit.wikimedia.org/r/475456 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [09:05:29] 10Operations, 10Discovery-Wikidata-Query-Service-Sprint: make wdqs-updater heap size configurable from puppet - https://phabricator.wikimedia.org/T210290 (10Gehel) p:05Triage>03High [09:12:02] PROBLEM - Check systemd state on ganeti2008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [09:12:30] PROBLEM - puppet last run on ganeti2008 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 3 minutes ago with 3 failures. 
Failed resources (up to 3 shown): Package[diamond],Package[python-diamond] [09:14:08] (03PS1) 10Gehel: wdqs: allow configuring wdqs-updater heap size [puppet] - 10https://gerrit.wikimedia.org/r/475717 (https://phabricator.wikimedia.org/T210290) [09:14:46] (03CR) 10jerkins-bot: [V: 04-1] wdqs: allow configuring wdqs-updater heap size [puppet] - 10https://gerrit.wikimedia.org/r/475717 (https://phabricator.wikimedia.org/T210290) (owner: 10Gehel) [09:16:36] RECOVERY - Check systemd state on ganeti2008 is OK: OK - running: The system is fully operational [09:17:37] RECOVERY - puppet last run on ganeti2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:17:54] (03PS2) 10Gehel: wdqs: allow configuring wdqs-updater heap size [puppet] - 10https://gerrit.wikimedia.org/r/475717 (https://phabricator.wikimedia.org/T210290) [09:30:20] PROBLEM - Check systemd state on ganeti1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [09:30:26] PROBLEM - puppet last run on ganeti1005 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 3 minutes ago with 3 failures. 
Failed resources (up to 3 shown): Package[diamond],Package[python-diamond] [09:38:06] RECOVERY - Logstash rate of ingestion percent change compared to yesterday on icinga1001 is OK: (C)130 ge (W)110 ge 85.32 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1panelId=2fullscreen [09:39:39] (03CR) 10Seb35: extdist: Switch to Python 3 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475579 (https://phabricator.wikimedia.org/T210312) (owner: 10Legoktm) [09:40:08] 10Operations, 10monitoring: Icinga downtime script should fail on the passive hosts - https://phabricator.wikimedia.org/T210380 (10Volans) [09:40:16] 10Operations, 10monitoring: Icinga downtime script should fail on the passive hosts - https://phabricator.wikimedia.org/T210380 (10Volans) p:05Triage>03Normal [09:44:12] (03CR) 10Fsero: [C: 032] hiera: fix the hierarchical order of lookups (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475500 (owner: 10Giuseppe Lavagetto) [09:44:39] !log performing schema change on s6 master - db1061 for jawiki (T85757) [09:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:44:43] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [09:45:10] RECOVERY - Check systemd state on ganeti1005 is OK: OK - running: The system is fully operational [09:45:52] RECOVERY - puppet last run on ganeti1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:47:13] (03PS1) 10Elukey: base::firewall::defs: update comments about analytics backups [puppet] - 10https://gerrit.wikimedia.org/r/475719 (https://phabricator.wikimedia.org/T204943) [09:48:09] (03CR) 10Elukey: [C: 032] base::firewall::defs: update comments about analytics backups [puppet] - 10https://gerrit.wikimedia.org/r/475719 (https://phabricator.wikimedia.org/T204943) (owner: 10Elukey) [09:51:54] !log performing schema change on s6 master - db1061 for ruwiki (T85757) [09:51:56] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:51:59] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [09:58:12] !log performing schema change on s6 master - db1061 for frwiki (T85757) [09:58:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:58:16] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [10:07:43] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10jcrespo) [10:11:40] hashar: Hi, I'm unable to be around during SWAT today, is there any chance you can arrange for https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/475363/ to be handled for https://phabricator.wikimedia.org/T210296 please? [10:12:20] (at least, the european mid-day swat. if not I can probably request it in the 'morning' swat) [10:16:33] Krenair: what is that -staging.php suffix? [10:16:56] I thought for beta we all had them with -labs [10:17:31] 670fdf7862 590 (Chad Horohoe 2017-10-31 10:36:19 -| require "$wmfConfigDir/reverse-proxy-staging.php"; [10:17:32] bah [10:17:49] Krenair: I am going to deploy it right now [10:17:52] it is just for beta [10:19:26] (03CR) 10Hashar: [C: 032] deployment-prep: Update IPs for Varnish [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475363 (https://phabricator.wikimedia.org/T208101) (owner: 10Alex Monk) [10:20:33] (03Merged) 10jenkins-bot: deployment-prep: Update IPs for Varnish [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475363 (https://phabricator.wikimedia.org/T208101) (owner: 10Alex Monk) [10:21:24] Krenair: done. 
Beta will self update at some point eventually :) [10:21:25] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10jcrespo) * We are still waiting on almost all details from Todd. * L3 is signed by both Ryan and Joe. * Need cl... [10:23:56] 10Operations, 10SRE-Access-Requests: access request for Jeena Huneidi (deployment, conint-admins, contint-docker) - https://phabricator.wikimedia.org/T210027 (10jcrespo) a:03jcrespo Assuming this is approved today, I will be deploying the access soon afterwards. [10:24:03] hashar, cool, thank you [10:25:01] (03CR) 10Volans: [C: 031] "Code looks good to me, just one nit inline, no need to review again if changing just that ;)" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [10:25:36] 10Operations, 10Analytics, 10SRE-Access-Requests: Allow access to Data Lake/Hive for Niharika - https://phabricator.wikimedia.org/T210022 (10jcrespo) a:03jcrespo [10:26:41] 10Operations, 10Analytics, 10SRE-Access-Requests: Allow access to Data Lake/Hive for Niharika - https://phabricator.wikimedia.org/T210022 (10jcrespo) [10:30:04] (03CR) 10jenkins-bot: deployment-prep: Update IPs for Varnish [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475363 (https://phabricator.wikimedia.org/T208101) (owner: 10Alex Monk) [10:30:04] jan_drewniak: #bothumor My software never has bugs. It just develops random features. Rise for Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181126T1030). 
[10:31:05] 10Operations, 10SRE-Access-Requests, 10WMDE-Analytics-Engineering, 10Graphite, 10User-Addshore: Requesting access to graphite hosts for addshore - https://phabricator.wikimedia.org/T208750 (10jcrespo) Quick question, Filippo, you mention "allowing certain groups" do you know of some in particular, or wou... [10:31:56] 10Operations, 10ChangeProp, 10RESTBase, 10Traffic, and 3 others: ATS path normalization - https://phabricator.wikimedia.org/T210295 (10mobrovac) [10:32:20] Krenair: you are welcome :) [10:32:54] !log rebooting boron [10:32:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:34:58] (03PS1) 10Fsero: initial debianization of docker distribution 2.7 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) [10:35:00] (03CR) 10Gehel: "> Patch Set 24: Code-Review+1" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [10:35:37] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475721 (https://phabricator.wikimedia.org/T128546) [10:37:24] (03CR) 10Fsero: "I am not a DD, and not used to debian packaging so if i'm doing something wrong please tell me :)" [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) (owner: 10Fsero) [10:41:11] (03PS2) 10Fsero: initial debianization of docker distribution 2.7 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) [10:41:30] 10Operations, 10SRE-Access-Requests: Requesting access to Jupyter notebook / analytics-privatedata-users for jgleeson - https://phabricator.wikimedia.org/T208432 (10jcrespo) a:05XenoRyet>03mepps @mepps @DStrine Can you approve the request? 
Please comment in, and unassign yourself. Once this has no assignee... [10:43:00] (03CR) 10Jdrewniak: [C: 032] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475721 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:44:03] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475721 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:44:17] (03CR) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475721 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:46:58] (03CR) 10Gehel: [C: 04-1] "Few comments inline" (034 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [10:55:13] hey ops, I was just about to do a Wikimedia/portals deploy, but I noticed this in the diff when pulling the mediawiki-config repo `wmf-config/reverse-proxy-staging.php | 4 ++--` [10:55:32] I'm not sure what that is... [10:55:34] (03CR) 10Gehel: [C: 031] "LGTM, but waiting on feedback from the WMCS team before moving forward with this." [puppet] - 10https://gerrit.wikimedia.org/r/475093 (https://phabricator.wikimedia.org/T206639) (owner: 10Mathew.onipe) [10:56:36] jan_drewniak: hashar had the same question earlier, not sure what the answer was [10:58:21] Looks like it was already deployed, so I should be ok. 
Not sure why it's showing up in the diff [11:00:35] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:475721|Bumping portals to master (T128546)]] (duration: 00m 47s) [11:00:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:40] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [11:01:22] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:475721|Bumping portals to master (T128546)]] (duration: 00m 46s) [11:01:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:06:44] (03CR) 10Alexandros Kosiaris: [C: 04-1] "is there a task for this ?" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475688 (owner: 10KartikMistry) [11:06:55] jan_drewniak: I have merged a patch earlier. It is for the deployment-prep project [11:07:09] jan_drewniak: though I had pulled it on the prod deployment host, or at least should have [11:08:03] jan_drewniak: oh no my bad. I have fetched it but forgot to rebase :/ It is safe to be synced on production, the file is only loaded when running on labs realm (== beta cluster) [11:09:00] hashar: ok thanks!
I figured it was something like that [11:10:38] (03CR) 10Alexandros Kosiaris: [C: 04-1] "1 minor comment, rest LGTM" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/475457 (https://phabricator.wikimedia.org/T209573) (owner: 10Giuseppe Lavagetto) [11:11:13] (03PS1) 10Ema: ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) [11:18:46] (03CR) 10KartikMistry: cxserver: Disable Youdao MT until service is back (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475688 (owner: 10KartikMistry) [11:21:05] (03CR) 10Hashar: initial debianization of docker distribution 2.7 (032 comments) [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) (owner: 10Fsero) [11:24:11] akosiaris: no need of Puppet config removal for https://gerrit.wikimedia.org/r/475688, right? [11:24:50] kart_: em, I still haven't understood what is going on [11:25:06] is the service broken ? [11:25:09] or not ? 
[11:27:31] (03PS3) 10Fsero: initial debianization of docker distribution 2.7 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) [11:31:08] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "I left some comments on the patch but I have a more general question about vendoring:" (035 comments) [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) (owner: 10Fsero) [11:32:16] (03PS2) 10Ema: ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) [11:34:04] (03CR) 10jerkins-bot: [V: 04-1] ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) (owner: 10Ema) [11:41:30] !log icinga downtime cloudcontrol1004 for some systemd slice tests [11:41:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:45:03] akosiaris: service down; we will work on it once we get update from them. [11:45:25] so need to disable temporary. I guess I can do it via cxserver/deploy only. 
[11:46:29] (03PS4) 10Fsero: initial debianization of docker distribution 2.7 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) [11:46:46] kart_: if it's down and we want to disable it, then you need that puppet patch and a deploy for the configuration to be distributed to all hosts [11:48:59] (03PS3) 10Filippo Giunchedi: WIP rsyslog: udp input json_lines shim [puppet] - 10https://gerrit.wikimedia.org/r/475352 (https://phabricator.wikimedia.org/T205851) [11:49:26] (03PS1) 10Ema: ts-lua: add logic to detect when to run tests [puppet] - 10https://gerrit.wikimedia.org/r/475725 [11:50:16] (03CR) 10Fsero: "Re vendoring, you are right i added that patch because some vendor dependencies are broken on master but since i'm building this from the " (033 comments) [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) (owner: 10Fsero) [11:52:31] (03PS5) 10Fsero: initial debianization of docker distribution 2.7 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) [11:53:23] (03PS3) 10Ema: ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) [11:53:52] akosiaris: in that case, can you deploy that? 
[11:54:02] kart_: sure [11:54:19] (03CR) 10jerkins-bot: [V: 04-1] ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) (owner: 10Ema) [11:54:31] (03CR) 10Alexandros Kosiaris: [C: 032] cxserver: Disable Youdao MT until service is back [puppet] - 10https://gerrit.wikimedia.org/r/475688 (owner: 10KartikMistry) [11:54:38] (03PS2) 10Alexandros Kosiaris: cxserver: Disable Youdao MT until service is back [puppet] - 10https://gerrit.wikimedia.org/r/475688 (owner: 10KartikMistry) [11:54:40] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] cxserver: Disable Youdao MT until service is back [puppet] - 10https://gerrit.wikimedia.org/r/475688 (owner: 10KartikMistry) [11:55:10] (03PS4) 10Ema: ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) [11:56:05] (03CR) 10jerkins-bot: [V: 04-1] ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) (owner: 10Ema) [11:56:26] akosiaris: thanks. I'll update cxserver now. [11:56:29] !log kartik@deploy1001 Started deploy [cxserver/deploy@da41227]: Disable Youdao MT until service is back [11:56:31] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10Afandian) Sorry for the confusion, I'm struggling to understand how many different types of accounts I need and... 
[11:56:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:59:46] (03PS6) 10Fsero: initial debianization of docker distribution 2.7 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) [12:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181126T1200). [12:00:04] Zoranzoki21: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:30] !log kartik@deploy1001 Finished deploy [cxserver/deploy@da41227]: Disable Youdao MT until service is back (duration: 04m 01s) [12:00:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:44] Zoranzoki21: around for SWAT? [12:03:08] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10Lauren.maggio) Thank you for helping with this! I will reach out to Todd re outstanding items If possible,... 
[12:04:43] <_joe_> !log uploaded php-excimer for component thirdparty/php72 to stretch-wikimedia T205059 [12:04:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:04:47] T205059: Excimer: new profiler for PHP - https://phabricator.wikimedia.org/T205059 [12:06:36] (03PS2) 10Ema: tslua: add logic to detect when to run tests [puppet] - 10https://gerrit.wikimedia.org/r/475725 [12:08:31] (03PS5) 10Ema: ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) [12:09:52] (03Abandoned) 10Alexandros Kosiaris: docker-registry: Allow image layers up to 3g to be pushed [puppet] - 10https://gerrit.wikimedia.org/r/455829 (https://phabricator.wikimedia.org/T200722) (owner: 10Alexandros Kosiaris) [12:15:21] (03PS6) 10Ema: ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) [12:15:56] (03PS2) 10Alexandros Kosiaris: coal: Remove redundant uwsgi::app parameter [puppet] - 10https://gerrit.wikimedia.org/r/426010 (https://phabricator.wikimedia.org/T192102) [12:15:58] (03PS2) 10Alexandros Kosiaris: encapi: Remove redundant uwsgi::app parameter [puppet] - 10https://gerrit.wikimedia.org/r/426011 (https://phabricator.wikimedia.org/T192102) [12:16:00] (03PS2) 10Alexandros Kosiaris: dynamicproxy: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426012 (https://phabricator.wikimedia.org/T192102) [12:16:02] (03PS2) 10Alexandros Kosiaris: graphite::web: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426013 (https://phabricator.wikimedia.org/T192102) [12:16:04] (03PS2) 10Alexandros Kosiaris: ifttt: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426014 (https://phabricator.wikimedia.org/T192102) [12:16:06] (03PS2) 10Alexandros Kosiaris: quarry::web: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426015 
(https://phabricator.wikimedia.org/T192102) [12:16:08] (03PS2) 10Alexandros Kosiaris: service::uwsgi: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426016 (https://phabricator.wikimedia.org/T192102) [12:16:10] (03PS2) 10Alexandros Kosiaris: wikilabels::web: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426017 (https://phabricator.wikimedia.org/T192102) [12:16:12] (03PS2) 10Alexandros Kosiaris: wikimetrics::web: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426018 (https://phabricator.wikimedia.org/T192102) [12:16:52] (03CR) 10Alexandros Kosiaris: [C: 032] coal: Remove redundant uwsgi::app parameter [puppet] - 10https://gerrit.wikimedia.org/r/426010 (https://phabricator.wikimedia.org/T192102) (owner: 10Alexandros Kosiaris) [12:19:18] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10jcrespo) Sorry for the confusion, @Afandian. You need to register an account on https://wikitech.wikimedia.org... 
[12:21:09] (03PS7) 10Fsero: initial debianization of docker distribution 2.7 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) [12:21:38] (03CR) 10Alexandros Kosiaris: [C: 032] "https://puppet-compiler.wmflabs.org/compiler1002/13704/ says noop, merging" [puppet] - 10https://gerrit.wikimedia.org/r/426011 (https://phabricator.wikimedia.org/T192102) (owner: 10Alexandros Kosiaris) [12:22:49] (03PS3) 10Giuseppe Lavagetto: mediawiki::appserver: add php monitoring [puppet] - 10https://gerrit.wikimedia.org/r/475457 (https://phabricator.wikimedia.org/T209573) [12:25:12] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10jcrespo) > If possible, can we move forward at least on Ryan Steinberg's access? I will schedule it for approv... [12:25:13] (03CR) 10Giuseppe Lavagetto: mediawiki::appserver: add php monitoring (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475457 (https://phabricator.wikimedia.org/T209573) (owner: 10Giuseppe Lavagetto) [12:29:19] (03PS3) 10Alexandros Kosiaris: wikilabels::web: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426017 (https://phabricator.wikimedia.org/T192102) [12:29:32] (03CR) 10Alexandros Kosiaris: [C: 032] wikilabels::web: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426017 (https://phabricator.wikimedia.org/T192102) (owner: 10Alexandros Kosiaris) [12:30:09] (03PS1) 10Giuseppe Lavagetto: Add configuration for credentials of the php admin site [labs/private] - 10https://gerrit.wikimedia.org/r/475729 [12:30:54] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Add configuration for credentials of the php admin site [labs/private] - 10https://gerrit.wikimedia.org/r/475729 (owner: 10Giuseppe Lavagetto) [12:36:49] (03CR) 10Giuseppe Lavagetto: [C: 032] 
mediawiki::appserver: add php monitoring [puppet] - 10https://gerrit.wikimedia.org/r/475457 (https://phabricator.wikimedia.org/T209573) (owner: 10Giuseppe Lavagetto) [12:37:17] (03PS4) 10Giuseppe Lavagetto: mediawiki::appserver: add php monitoring [puppet] - 10https://gerrit.wikimedia.org/r/475457 (https://phabricator.wikimedia.org/T209573) [12:38:27] (03CR) 10Alexandros Kosiaris: [C: 032] "kill -USR1 on novaproxy-01 says" [puppet] - 10https://gerrit.wikimedia.org/r/426012 (https://phabricator.wikimedia.org/T192102) (owner: 10Alexandros Kosiaris) [12:38:35] (03PS3) 10Alexandros Kosiaris: dynamicproxy: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426012 (https://phabricator.wikimedia.org/T192102) [12:38:37] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] dynamicproxy: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426012 (https://phabricator.wikimedia.org/T192102) (owner: 10Alexandros Kosiaris) [12:41:41] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::php::monitoring: fix php-admin vhost [puppet] - 10https://gerrit.wikimedia.org/r/475731 [12:42:44] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::mediawiki::php::monitoring: fix php-admin vhost [puppet] - 10https://gerrit.wikimedia.org/r/475731 (owner: 10Giuseppe Lavagetto) [12:46:03] (03CR) 10Alexandros Kosiaris: [C: 032] "kill -USR1 on graphite1003 says" [puppet] - 10https://gerrit.wikimedia.org/r/426013 (https://phabricator.wikimedia.org/T192102) (owner: 10Alexandros Kosiaris) [12:46:11] (03PS3) 10Alexandros Kosiaris: graphite::web: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426013 (https://phabricator.wikimedia.org/T192102) [12:46:26] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] graphite::web: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426013 (https://phabricator.wikimedia.org/T192102) (owner: 10Alexandros Kosiaris) [12:48:29] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: 
access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10jcrespo) [12:50:42] <_joe_> mw1269 when it alerts is me [12:51:27] PROBLEM - Apache HTTP on mw1269 is CRITICAL: connect to address 10.64.0.64 and port 80: Connection refused [12:51:49] PROBLEM - HHVM rendering on mw1269 is CRITICAL: connect to address 10.64.0.64 and port 80: Connection refused [12:51:51] PROBLEM - Check systemd state on mw1269 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [12:56:03] <_joe_> it's already depooled fwiw [12:56:27] RECOVERY - Apache HTTP on mw1269 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.071 second response time [12:56:51] RECOVERY - HHVM rendering on mw1269 is OK: HTTP OK: HTTP/1.1 200 OK - 77478 bytes in 0.150 second response time [12:56:55] RECOVERY - Check systemd state on mw1269 is OK: OK - running: The system is fully operational [13:08:43] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::php::monitoring: disable auth on -free methods [puppet] - 10https://gerrit.wikimedia.org/r/475733 [13:09:09] (03PS4) 10Muehlenhoff: Script to generate service principals/keytabs (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/470566 [13:09:12] (03CR) 10Hashar: [C: 031] tslua: add logic to detect when to run tests [puppet] - 10https://gerrit.wikimedia.org/r/475725 (owner: 10Ema) [13:10:02] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::mediawiki::php::monitoring: disable auth on -free methods [puppet] - 10https://gerrit.wikimedia.org/r/475733 (owner: 10Giuseppe Lavagetto) [13:12:19] PROBLEM - puppet last run on kafka1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:15:51] (03PS3) 10Hashar: ci: add some gated extensions to git cache [puppet] - 10https://gerrit.wikimedia.org/r/440539 (https://phabricator.wikimedia.org/T197469) [13:24:02] (03CR) 10Zfilipin: "Not deployed during EU SWAT today because the developer was not available on IRC." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475367 (https://phabricator.wikimedia.org/T210171) (owner: 10Zoranzoki21) [13:25:51] 10Operations, 10Discovery-Wikidata-Query-Service-Sprint, 10Patch-For-Review: wdqs-updater crashing on all wdqs servers - https://phabricator.wikimedia.org/T210235 (10Gehel) p:05Triage>03High The immediate issue is solved. I'd like a review from @Smalyshev to see if we have a good way to prevent this from... [13:26:16] 10Operations, 10Discovery-Wikidata-Query-Service-Sprint, 10Patch-For-Review: wdqs-updater crashing on all wdqs servers - https://phabricator.wikimedia.org/T210235 (10Gehel) a:03Gehel [13:26:29] 10Operations, 10Discovery-Wikidata-Query-Service-Sprint, 10Patch-For-Review: make wdqs-updater heap size configurable from puppet - https://phabricator.wikimedia.org/T210290 (10Gehel) a:03Gehel [13:27:30] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10Afandian) @jcrespo The account name is "Joe Wass". [13:33:18] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10jcrespo) Thanks, @Afandian - this is unrelated to this request, but let me suggest to link it to your Phabricat... 
[13:33:38] !log rebooting netmon1003 [13:33:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:30] !log rebooting releases2001 [13:36:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:37:31] RECOVERY - puppet last run on kafka1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:43:30] (03PS3) 10Alexandros Kosiaris: ifttt: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426014 (https://phabricator.wikimedia.org/T192102) [13:43:35] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] ifttt: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426014 (https://phabricator.wikimedia.org/T192102) (owner: 10Alexandros Kosiaris) [13:47:20] !log rebooting releases1001 [13:47:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:46] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10jcrespo) [13:50:09] (03CR) 10Alexandros Kosiaris: [C: 032] "kill -USR1 on query-web-01 in labs says" [puppet] - 10https://gerrit.wikimedia.org/r/426015 (https://phabricator.wikimedia.org/T192102) (owner: 10Alexandros Kosiaris) [13:50:17] (03PS3) 10Alexandros Kosiaris: quarry::web: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426015 (https://phabricator.wikimedia.org/T192102) [13:50:20] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] quarry::web: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426015 (https://phabricator.wikimedia.org/T192102) (owner: 10Alexandros Kosiaris) [13:52:09] (03PS1) 10Hashar: tests: replace deprecated assertEquals [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475734 [13:52:58] (03CR) 10jerkins-bot: [V: 04-1] tests: replace deprecated assertEquals [docker-images/docker-pkg] - 
10https://gerrit.wikimedia.org/r/475734 (owner: 10Hashar) [13:54:06] (03PS2) 10Hashar: tests: replace deprecated assertEquals [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475734 [13:57:58] (03PS1) 10Jcrespo: mariadb: Add Niharika to analytics-privatedata-users group [puppet] - 10https://gerrit.wikimedia.org/r/475736 (https://phabricator.wikimedia.org/T210022) [14:00:53] jynus: mariadb? :) [14:01:25] (03CR) 10Jcrespo: "I would appreciate a +1 to confirm this fulfills the request." [puppet] - 10https://gerrit.wikimedia.org/r/475736 (https://phabricator.wikimedia.org/T210022) (owner: 10Jcrespo) [14:02:04] (03PS2) 10Jcrespo: admin: Add Niharika to analytics-privatedata-users group [puppet] - 10https://gerrit.wikimedia.org/r/475736 (https://phabricator.wikimedia.org/T210022) [14:04:08] elukey: just a mind slip, corrected on PS2 [14:04:23] I use it all the time on mediawiki-config for pools and depools [14:04:47] yep yep I figured, it was nice to see it popping up in my IRC feed :) [14:06:00] you can vote +1 or -1, even if I didn't explicitly asked you [14:06:12] I just added the people that were directly involved [14:06:44] a -1 for those mistakes is greatly appreciated [14:07:10] as it prevents (maybe not in this case) major issues [14:07:17] (03CR) 10Alexandros Kosiaris: [C: 032] "kill -USR1 on wikimetrics-01 says" [puppet] - 10https://gerrit.wikimedia.org/r/426018 (https://phabricator.wikimedia.org/T192102) (owner: 10Alexandros Kosiaris) [14:07:24] (03PS3) 10Alexandros Kosiaris: wikimetrics::web: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426018 (https://phabricator.wikimedia.org/T192102) [14:07:29] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] wikimetrics::web: Stop using --autoload in uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/426018 (https://phabricator.wikimedia.org/T192102) (owner: 10Alexandros Kosiaris) [14:09:18] jynus: sure sure I'd have done it in cae, but it was clearly a very minor issue due to work 
habits :) [14:10:31] (03Abandoned) 10Alexandros Kosiaris: uwsgi: Remove uwsgi from service name [puppet] - 10https://gerrit.wikimedia.org/r/291751 (owner: 10Alexandros Kosiaris) [14:11:27] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, nit inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475725 (owner: 10Ema) [14:12:04] (03Abandoned) 10Alexandros Kosiaris: service::node: change logrotate parameters [puppet] - 10https://gerrit.wikimedia.org/r/348717 (owner: 10Alexandros Kosiaris) [14:14:53] (03PS1) 10Banyek: mariadb: depool db1074 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475739 (https://phabricator.wikimedia.org/T85757) [14:15:07] (03PS1) 10Banyek: mariadb: depool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475740 (https://phabricator.wikimedia.org/T85757) [14:15:25] (03PS1) 10Banyek: mariadb: depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475741 (https://phabricator.wikimedia.org/T85757) [14:15:34] (03PS1) 10Banyek: mariadb: depool db1103 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475742 (https://phabricator.wikimedia.org/T85757) [14:15:48] (03PS1) 10Banyek: mariadb: depool db1105 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475743 (https://phabricator.wikimedia.org/T85757) [14:16:01] (03PS1) 10Banyek: mariadb: depool db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475744 (https://phabricator.wikimedia.org/T85757) [14:16:20] (03CR) 10Filippo Giunchedi: "Since this package exists in debian already I think it'd be easier to review your changes if the review contained changes wrt the existing" [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) (owner: 10Fsero) [14:16:22] (03CR) 10jerkins-bot: [V: 04-1] mariadb: depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475741 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:17:45] (03PS2) 10Banyek: 
mariadb: depool db1105 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475743 (https://phabricator.wikimedia.org/T85757) [14:19:08] (03PS2) 10Banyek: mariadb: depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475741 (https://phabricator.wikimedia.org/T85757) [14:20:08] (03CR) 10jerkins-bot: [V: 04-1] mariadb: depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475741 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:22:30] (03PS3) 10Jayprakash12345: Enable NewUserMessage Extension on tcy.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474409 (https://phabricator.wikimedia.org/T209432) [14:22:50] (03PS3) 10Banyek: mariadb: depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475741 (https://phabricator.wikimedia.org/T85757) [14:24:26] (03CR) 10Volans: [C: 031] "I think we're there, LGTM! Thanks for the patience :)" [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: 10Mathew.onipe) [14:26:05] (03CR) 10Marostegui: [C: 031] mariadb: depool db1074 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475739 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:26:58] (03CR) 10Marostegui: [C: 04-1] mariadb: depool db1105 (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475743 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:27:21] (03CR) 10Marostegui: mariadb: depool db1074 (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475739 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:27:42] (03CR) 10Marostegui: mariadb: depool db1122 (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475744 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:27:57] (03PS1) 10DCausse: [cirrus] Use normal config for labswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475745 
(https://phabricator.wikimedia.org/T198352) [14:27:59] (03PS1) 10DCausse: [cirrus] multi-instance: add cirrussearch-big-indices.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475746 (https://phabricator.wikimedia.org/T210381) [14:28:01] (03PS1) 10DCausse: [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) [14:28:03] (03PS1) 10DCausse: [cirrus] switch to explicit config in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 (https://phabricator.wikimedia.org/T210381) [14:28:05] (03PS1) 10DCausse: [cirrus] prepare multi-instance services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475749 (https://phabricator.wikimedia.org/T210381) [14:28:07] (03PS1) 10DCausse: [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) [14:28:09] (03PS1) 10DCausse: [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 [14:28:11] (03PS1) 10DCausse: [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) [14:28:19] (03PS3) 10Banyek: mariadb: depool db1105 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475743 (https://phabricator.wikimedia.org/T85757) [14:29:04] (03PS1) 10Gehel: elasticsearch: configure LVS endpoint for new codfw clusters [puppet] - 10https://gerrit.wikimedia.org/r/475753 (https://phabricator.wikimedia.org/T207195) [14:29:06] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [14:29:15] dcausse: ^ looks like we're working on the same things :) [14:29:18] :) [14:29:21] (03CR) 10Marostegui: "Try to give more 
context in the commit body: something like alter table is normally enough. Because right now the title and the body are t" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475740 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:29:23] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] prepare multi-instance services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475749 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [14:29:26] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] switch to explicit config in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [14:29:44] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [14:30:00] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 (owner: 10DCausse) [14:30:06] (03PS5) 10Muehlenhoff: Script to generate service principals/keytabs (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/470566 [14:30:44] (03CR) 10Gehel: [cirrus] Start using psi&omega in codfw (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 (owner: 10DCausse) [14:30:45] (03CR) 10Marostegui: "Try to put something more meaningful on the commit body" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475743 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:30:47] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [14:31:53] (03CR) 10Marostegui: "Same as the other comments: Try to put something different on the commit body than the commit title :-)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475742 
(https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:32:15] (03PS3) 10Ema: tslua: add logic to detect when to run tests [puppet] - 10https://gerrit.wikimedia.org/r/475725 [14:35:22] (03CR) 10Ema: [C: 032] tslua: add logic to detect when to run tests [puppet] - 10https://gerrit.wikimedia.org/r/475725 (owner: 10Ema) [14:36:26] (03PS7) 10Ema: ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) [14:36:33] (03PS2) 10Banyek: mariadb: depool db1074 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475739 (https://phabricator.wikimedia.org/T85757) [14:37:17] (03CR) 10jerkins-bot: [V: 04-1] ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) (owner: 10Ema) [14:37:26] (03PS2) 10Banyek: mariadb: depool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475740 (https://phabricator.wikimedia.org/T85757) [14:38:13] (03PS4) 10Banyek: mariadb: depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475741 (https://phabricator.wikimedia.org/T85757) [14:38:47] (03PS8) 10Ema: ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) [14:39:09] (03PS2) 10Banyek: mariadb: depool db1103 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475742 (https://phabricator.wikimedia.org/T85757) [14:39:58] (03PS4) 10Banyek: mariadb: depool db1105 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475743 (https://phabricator.wikimedia.org/T85757) [14:40:11] (03CR) 10Ema: [C: 032] ATS: path normalization tests [puppet] - 10https://gerrit.wikimedia.org/r/475723 (https://phabricator.wikimedia.org/T210295) (owner: 10Ema) [14:40:34] (03CR) 10Fsero: "@fgiunchedi you are right, i'll do that" [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) (owner: 10Fsero) 
[14:40:59] (03PS2) 10Banyek: mariadb: depool db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475744 (https://phabricator.wikimedia.org/T85757) [14:41:59] (03CR) 10Marostegui: [C: 031] mariadb: depool db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475744 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:42:10] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::php::monitoring: add missing newline [puppet] - 10https://gerrit.wikimedia.org/r/475755 [14:42:12] (03Abandoned) 10Fsero: initial debianization of docker distribution 2.7 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475720 (https://phabricator.wikimedia.org/T210071) (owner: 10Fsero) [14:42:22] (03CR) 10Marostegui: [C: 031] mariadb: depool db1105 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475743 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:43:19] (03CR) 10Marostegui: [C: 031] mariadb: depool db1103 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475742 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:43:58] (03CR) 10Marostegui: [C: 04-1] mariadb: depool db1090:3312 (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475741 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:44:10] (03PS1) 10Ema: ATS normalize-path: stop logging [puppet] - 10https://gerrit.wikimedia.org/r/475756 (https://phabricator.wikimedia.org/T210295) [14:44:40] (03CR) 10Marostegui: [C: 031] mariadb: depool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475740 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:44:56] (03CR) 10Marostegui: [C: 031] mariadb: depool db1074 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475739 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:45:26] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::mediawiki::php::monitoring: add missing newline [puppet] - 10https://gerrit.wikimedia.org/r/475755 (owner: 
10Giuseppe Lavagetto) [14:45:43] (03PS2) 10Ema: ATS normalize-path: stop logging [puppet] - 10https://gerrit.wikimedia.org/r/475756 (https://phabricator.wikimedia.org/T210295) [14:46:55] (03CR) 10Ema: [C: 032] ATS normalize-path: stop logging [puppet] - 10https://gerrit.wikimedia.org/r/475756 (https://phabricator.wikimedia.org/T210295) (owner: 10Ema) [14:47:12] (03PS5) 10Banyek: mariadb: depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475741 (https://phabricator.wikimedia.org/T85757) [14:47:58] (03PS1) 10Vgutierrez: archiva: Use certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/475757 (https://phabricator.wikimedia.org/T207050) [14:50:44] (03CR) 10Marostegui: [C: 031] mariadb: depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475741 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [14:50:51] (03CR) 10Elukey: [C: 031] archiva: Deploy certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/475692 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [14:51:35] (03CR) 10Vgutierrez: [C: 032] archiva: Use certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/475757 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [14:52:12] (03CR) 10DCausse: [C: 031] elasticsearch_cluster: Added multi-cluster/multi-instance support [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: 10Mathew.onipe) [14:52:14] (03CR) 10Vgutierrez: [C: 032] archiva: Deploy certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/475692 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [14:52:16] (03PS1) 10Hashar: cli: allow INFO logging to stdout [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475758 [14:52:19] (03PS2) 10Vgutierrez: archiva: Deploy certcentral managed TLS certificate [puppet] - 
10https://gerrit.wikimedia.org/r/475692 (https://phabricator.wikimedia.org/T207050) [14:55:06] 10Operations, 10Traffic: ATS production-ready as a backend cache layer - https://phabricator.wikimedia.org/T207048 (10ema) [14:55:10] 10Operations, 10Traffic, 10Patch-For-Review: ATS: log inspection at runtime - https://phabricator.wikimedia.org/T204225 (10ema) 05Open>03Resolved [14:56:47] (03PS1) 10WMDE-Fisch: Make AdvancedSearch default on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475759 (https://phabricator.wikimedia.org/T207639) [14:59:27] (03PS1) 10Andrew Bogott: Horizon: move more projects to eqiad1 [puppet] - 10https://gerrit.wikimedia.org/r/475760 (https://phabricator.wikimedia.org/T204745) [15:00:53] (03CR) 10Andrew Bogott: [C: 032] Horizon: move more projects to eqiad1 [puppet] - 10https://gerrit.wikimedia.org/r/475760 (https://phabricator.wikimedia.org/T204745) (owner: 10Andrew Bogott) [15:03:24] 10Operations, 10Citoid, 10Patch-For-Review, 10Service-deployment-requests, and 4 others: Deploy translation-server-v2 - https://phabricator.wikimedia.org/T201611 (10marcella) @Mvolz can you work with @Ryasmeen to identify whether there are any specific QA needs we should resolve? 
[15:03:43] (03CR) 10Elukey: [C: 031] archiva: Use certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/475757 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [15:04:27] (03PS1) 10Anomie: Set comment migration stage to new in Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475761 (https://phabricator.wikimedia.org/T166733) [15:05:02] (03PS2) 10Anomie: Set comment migration stage to new in Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475761 (https://phabricator.wikimedia.org/T166733) [15:05:58] (03CR) 10Anomie: [C: 032] "Deploying config change" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475761 (https://phabricator.wikimedia.org/T166733) (owner: 10Anomie) [15:06:19] (03PS1) 10Vgutierrez: certcentral: Notified puppet_svc on certificate file changes [puppet] - 10https://gerrit.wikimedia.org/r/475762 (https://phabricator.wikimedia.org/T207050) [15:07:04] (03Merged) 10jenkins-bot: Set comment migration stage to new in Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475761 (https://phabricator.wikimedia.org/T166733) (owner: 10Anomie) [15:11:23] (03CR) 10jenkins-bot: Set comment migration stage to new in Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475761 (https://phabricator.wikimedia.org/T166733) (owner: 10Anomie) [15:14:32] (03CR) 10Vgutierrez: [C: 032] archiva: Use certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/475757 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [15:14:39] (03PS2) 10Vgutierrez: archiva: Use certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/475757 (https://phabricator.wikimedia.org/T207050) [15:15:06] (03CR) 10DCausse: [cirrus] Start using psi&omega in codfw (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 (owner: 10DCausse) [15:18:41] (03PS1) 10Fsero: Importing docker-registry debianization from buster for 
2.6.2 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475764 [15:19:39] (03PS1) 10Vgutierrez: archiva: Remove old LE puppetization [puppet] - 10https://gerrit.wikimedia.org/r/475765 (https://phabricator.wikimedia.org/T207050) [15:20:14] (03PS2) 10Fsero: Importing docker-registry debianization from buster for 2.6.2 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475764 (https://phabricator.wikimedia.org/T210071) [15:22:25] (03PS1) 10Anomie: Set ActorTableSchemaMigrationStage => write-both/read-old on group 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475767 (https://phabricator.wikimedia.org/T188327) [15:22:44] (03CR) 10Anomie: [C: 032] "Deploying config change" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475767 (https://phabricator.wikimedia.org/T188327) (owner: 10Anomie) [15:23:28] (03CR) 10Elukey: [C: 031] archiva: Remove old LE puppetization [puppet] - 10https://gerrit.wikimedia.org/r/475765 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [15:23:47] (03Merged) 10jenkins-bot: Set ActorTableSchemaMigrationStage => write-both/read-old on group 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475767 (https://phabricator.wikimedia.org/T188327) (owner: 10Anomie) [15:23:58] (03CR) 10Vgutierrez: [C: 032] archiva: Remove old LE puppetization [puppet] - 10https://gerrit.wikimedia.org/r/475765 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [15:24:44] (03PS2) 10Vgutierrez: archiva: Remove old LE puppetization [puppet] - 10https://gerrit.wikimedia.org/r/475765 (https://phabricator.wikimedia.org/T207050) [15:24:46] (03CR) 10jenkins-bot: Set ActorTableSchemaMigrationStage => write-both/read-old on group 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475767 (https://phabricator.wikimedia.org/T188327) (owner: 10Anomie) [15:24:49] !log anomie@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to 
write-both/read-old on group 1 (T188327) (duration: 00m 46s) [15:24:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:53] T188327: Deploy refactored actor storage - https://phabricator.wikimedia.org/T188327 [15:25:05] 10Operations, 10cloud-services-team (Kanban): Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 (10Bstorm) So far, we haven't had a smooth maintenance on those two. They will break toolforge for at least a bit. Otherwise, there is a failover mechanism. They are both NFS and web server... [15:26:13] (03PS1) 10Giuseppe Lavagetto: php: upgrade deployment servers, maintenance servers to php 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/475768 (https://phabricator.wikimedia.org/T208433) [15:26:15] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::php: install excimer on newer versions of php [puppet] - 10https://gerrit.wikimedia.org/r/475769 (https://phabricator.wikimedia.org/T205059) [15:26:17] (03PS3) 10Fsero: Importing docker-registry debianization from buster for 2.6.2 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475764 (https://phabricator.wikimedia.org/T210071) [15:26:19] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::jobrunner: convert to use class httpd [puppet] - 10https://gerrit.wikimedia.org/r/475770 [15:26:31] !log ppchelko@deploy1001 Started deploy [changeprop/deploy@b97e8eb]: Temporary stop emitting revision-score events for schema change T197000 [15:26:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:26:34] T197000: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 [15:27:14] PROBLEM - MariaDB Slave SQL: s2 on db1095 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1146, Errmsg: Error Table enwiktionary.actor doesnt exist on query. Default database: enwiktionary. 
[Query snipped] [15:27:51] !log ppchelko@deploy1001 Finished deploy [changeprop/deploy@b97e8eb]: Temporary stop emitting revision-score events for schema change T197000 (duration: 01m 21s) [15:27:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:05] PROBLEM - MariaDB Slave SQL: s2 on db1122 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1146, Errmsg: Error Table enwiktionary.actor doesnt exist on query. Default database: enwiktionary. [Query snipped] [15:28:27] (03CR) 10Fsero: "as suggested over https://gerrit.wikimedia.org/r/#/c/operations/debs/docker-distribution/+/475720/ importing first the original 2.6.2 pack" [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475764 (https://phabricator.wikimedia.org/T210071) (owner: 10Fsero) [15:28:29] <_joe_> uh what's going on? [15:28:49] I guess that table was not created everywhere? [15:28:54] I guess not [15:29:02] I'm guessing the "temporary stop" is already in response to what caused the replication stop, at a glance [15:29:11] * volans around but about to enter a meeting [15:29:13] <_joe_> marostegui: are you on top of it? [15:29:14] ping if needed ofc [15:29:26] _joe_: I guess so [15:29:26] is there some deploy? [15:29:33] • 15:24 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-old on group 1 (https://phabricator.wikimedia.org/T188327) (duration: 00m 46s) [15:29:54] <_joe_> do we need to revert that? [15:30:00] I think they already are [15:30:12] 3. Turn the feature flag to "write both, read old". See if stuff breaks. 
that would be: yes [15:30:15] 15:26 <+logmsgbot> !log ppchelko@deploy1001 Started deploy [changeprop/deploy@b97e8eb]: Temporary stop emitting revision-score events for schema change T197000 [15:30:25] but that is not possible, if table creation was done on the master [15:30:31] very weird [15:30:33] <_joe_> bblack: that has nothing to do with it [15:30:42] (03PS1) 10Marostegui: db-eqiad.php: Depool db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475771 [15:30:45] <_joe_> that's for json schema changes :) [15:30:46] `actor` table creation was T188299, BTW. [15:30:46] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299 [15:30:50] ok [15:31:14] (03PS1) 10Daimona Eaytoy: Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772 [15:31:18] so we have 2 hosts breaking only? [15:31:24] that is the weird part [15:31:30] (03CR) 10Marostegui: [V: 032 C: 032] db-eqiad.php: Depool db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475771 (owner: 10Marostegui) [15:31:35] (03CR) 10jerkins-bot: [V: 04-1] Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772 (owner: 10Daimona Eaytoy) [15:32:17] thanks for the depool [15:32:22] (03CR) 10Jforrester: "Presumably we can now drop the Beta-specific over-ride? 
It'll inherit the cluster-wide config, which we can't remove until MW core is swit" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475761 (https://phabricator.wikimedia.org/T166733) (owner: 10Anomie) [15:32:22] there were still 50 connections open there [15:32:35] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1122 - actor table missing (duration: 00m 46s) [15:32:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:13] there are no filters, unlike db1095, I guess [15:33:32] 10Operations, 10Traffic, 10Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (10Vgutierrez) [15:33:33] So db1122 and db1095 are the only ones [15:33:38] I just checked across all the hosts in s2 [15:33:40] how is that possible? [15:33:46] switchover? [15:33:51] another s8 situation? [15:34:10] can anomie be around, we definitely need him [15:34:28] jynus: I'm here, but in a meeting at the moment. [15:34:35] it would be "normal" to accidentally miss a table creation [15:34:49] but I don't know how that could happen on some hosts (most) but not others [15:35:23] <_joe_> ditto [15:35:41] _joe_: and unlike us, they don't have root to write in read only [15:35:59] so unless there is a bug in replication >_< [15:36:18] so I see 2 options, a replication issue like s8 or the table was dropped afterwards [15:36:18] <_joe_> missing a table completely? [15:36:31] <_joe_> was this the first time anyone wrote to that table? [15:36:38] probably [15:36:46] I am seeing something interesting [15:36:51] marostegui: good or bad? 
[15:36:55] "good" [15:36:55] PROBLEM - MariaDB Slave Lag: s2 on db1122 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 728.87 seconds [15:37:13] thanks icinga, we know [15:37:14] PROBLEM - MariaDB Slave Lag: s2 on db1095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 748.63 seconds [15:37:28] the lag alert is delayed a bit [15:37:32] So, looks like that table wasn't created via the normal table creation but it was part of a schema change: https://gerrit.wikimedia.org/r/#/c/380669/48/maintenance/archives/patch-actor-table.sql [15:37:40] And I am not seeing that table on any db on that host [15:37:42] so maybe I missed it [15:37:45] (I did that schema change) [15:37:53] I see, ok, that is good news [15:37:55] However, I do see the other changes [15:37:59] So the alters are there [15:38:01] but not the table [15:38:01] because it would mean just a mistake [15:38:04] let me confirm it [15:38:15] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475771 (owner: 10Marostegui) [15:38:17] rather than a scary bug on consistency [15:38:27] 💯 [15:38:49] yeah, the table is not present on any DB on db1122 [15:39:16] <_joe_> wow [15:39:21] <_joe_> ok that looks like an error [15:39:26] yeah, my error [15:39:34] I am confirming that the rest of the changes are there [15:39:34] <_joe_> I'm relieved tbh [15:39:37] human error = easily fixable (whew) [15:39:46] <_joe_> yep [15:39:49] marostegui: sorry to ask you this [15:39:58] but could you own this while there is a meeting [15:39:59] I need to check the whole host [15:40:01] Yes [15:40:06] so at least one can attend [15:40:10] Yeah, I will own this [15:40:14] I am going to silence this host [15:40:19] It needs full review [15:40:32] And same for db1095 [15:41:18] <_joe_> can you afford to put them out of rotation? 
[15:41:23] yep [15:41:26] db1095 isn't a core host [15:41:28] only db1122 [15:41:46] I think I will re-clone db1122 to be on the safe side, tomorrow [15:41:58] I was looking for db1095 in the config and didn't see it, yeah [15:41:58] (03PS2) 10Daimona Eaytoy: Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772 [15:42:18] (03CR) 10jerkins-bot: [V: 04-1] Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772 (owner: 10Daimona Eaytoy) [15:42:58] (03PS1) 10Marostegui: db1095, db1122: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/475774 [15:44:02] (03CR) 10Marostegui: [C: 032] db1095, db1122: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/475774 (owner: 10Marostegui) [15:44:06] (03PS3) 10Daimona Eaytoy: Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772 [15:44:26] (03CR) 10jerkins-bot: [V: 04-1] Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772 (owner: 10Daimona Eaytoy) [15:44:28] (03PS6) 10Muehlenhoff: Script to generate service principals/keytabs (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/470566 [15:44:51] 10Operations, 10SRE-Access-Requests, 10WMDE-Analytics-Engineering, 10Graphite, 10User-Addshore: Requesting access to graphite hosts for addshore - https://phabricator.wikimedia.org/T208750 (10fgiunchedi) >>! In T208750#4773283, @jcrespo wrote: > Quick question, Filippo, you mention "allowing certain grou... 
[15:45:14] (03CR) 10Alex Monk: [C: 031] "no idea how we missed this" [puppet] - 10https://gerrit.wikimedia.org/r/475762 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [15:46:51] (03CR) 10Alex Monk: [C: 032] acme_requests: Handle TCP/HTTPS errors [software/certcentral] - 10https://gerrit.wikimedia.org/r/475713 (https://phabricator.wikimedia.org/T209980) (owner: 10Vgutierrez) [15:46:51] apergos: db1095 is a backup host [15:46:59] but not part of the mediawiki rotation [15:47:04] ah ha. thanks, mystery solved! [15:47:40] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM" [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475764 (https://phabricator.wikimedia.org/T210071) (owner: 10Fsero) [15:47:43] marostegui, jynus: It looks like db1122 was created and pooled 25-26 April, and that schema change was being done on s2 26 April - 3 May. So it probably just got left off the checklist in T188299#4160112 due to the bad timing. [15:47:44] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299 [15:48:24] anomie: Yep, I am checking now, and the whole schema change is missing on db1122 (I am going to check db1095 now), looks like that host was added to the pool right in the middle of the schema changes [15:48:28] PROBLEM - Host ms-be2047.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:48:38] !log ppchelko@deploy1001 Started deploy [changeprop/deploy@77be2c6]: Change schema for revision-score events and start emitting again T197000 [15:48:40] (03Merged) 10jenkins-bot: acme_requests: Handle TCP/HTTPS errors [software/certcentral] - 10https://gerrit.wikimedia.org/r/475713 (https://phabricator.wikimedia.org/T209980) (owner: 10Vgutierrez) [15:48:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:48:42] T197000: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 [15:49:04] So yes, race 
condition there [15:49:21] I will reclone those two hosts [15:50:02] !log ppchelko@deploy1001 Finished deploy [changeprop/deploy@77be2c6]: Change schema for revision-score events and start emitting again T197000 (duration: 01m 24s) [15:50:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:50:54] (03CR) 10Fsero: [C: 032] Importing docker-registry debianization from buster for 2.6.2 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475764 (https://phabricator.wikimedia.org/T210071) (owner: 10Fsero) [15:51:13] (03CR) 10jenkins-bot: acme_requests: Handle TCP/HTTPS errors [software/certcentral] - 10https://gerrit.wikimedia.org/r/475713 (https://phabricator.wikimedia.org/T209980) (owner: 10Vgutierrez) [15:51:16] (03PS2) 10Muehlenhoff: Add mapped IPv6 to labmon1002 [puppet] - 10https://gerrit.wikimedia.org/r/475461 [15:52:19] (03CR) 10Muehlenhoff: [C: 032] Add mapped IPv6 to labmon1002 [puppet] - 10https://gerrit.wikimedia.org/r/475461 (owner: 10Muehlenhoff) [15:56:48] (03PS1) 10Marostegui: db-eqiad.php: Depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475777 [15:57:51] !log Stop MySQL on db1122 - it will be recloned [15:57:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:58:10] (03PS1) 10Ottomata: Refine revision-score after schema improvements [puppet] - 10https://gerrit.wikimedia.org/r/475778 (https://phabricator.wikimedia.org/T197000) [15:58:30] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475777 (owner: 10Marostegui) [15:59:26] (03PS2) 10Cmjohnson: Adding mgmt dns for new servers p1007-10 [dns] - 10https://gerrit.wikimedia.org/r/470849 (https://phabricator.wikimedia.org/T207258) [16:00:03] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475777 (owner: 10Marostegui) [16:00:27] 10Operations, 10Traffic: 
Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (10ema) p:05Triage>03Normal [16:00:56] !log Stop MySQL on db1090:3312 to clone db1122 [16:00:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:01:06] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1090:3312 to clone db1122 (duration: 00m 46s) [16:01:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:01:30] (03CR) 10Cmjohnson: [C: 032] Adding mgmt dns for new servers p1007-10 [dns] - 10https://gerrit.wikimedia.org/r/470849 (https://phabricator.wikimedia.org/T207258) (owner: 10Cmjohnson) [16:01:32] 10Operations, 10DBA, 10monitoring, 10Patch-For-Review: Better mysql monitoring for number of connections and processlist strange patterns - https://phabricator.wikimedia.org/T112473 (10Dzahn) [16:03:24] 10Operations, 10Traffic: ATS production-ready as a backend cache layer - https://phabricator.wikimedia.org/T207048 (10ema) [16:04:17] 10Operations, 10Traffic, 10Patch-For-Review: ATS backend-side request-mangling - https://phabricator.wikimedia.org/T209021 (10ema) [16:04:23] 10Operations, 10ChangeProp, 10RESTBase, 10Traffic, and 3 others: ATS path normalization - https://phabricator.wikimedia.org/T210295 (10ema) 05Open>03Resolved Deployed and working fine. Closing. [16:05:09] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475777 (owner: 10Marostegui) [16:05:51] (03PS1) 10Hashar: Log docker build output [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475779 [16:06:39] (03CR) 10jerkins-bot: [V: 04-1] Log docker build output [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475779 (owner: 10Hashar) [16:06:54] (03CR) 10Anomie: "Beta Cluster picks up the prod configuration without the override, doesn't it? So removing it now would drop back to WRITE_NEW." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/475761 (https://phabricator.wikimedia.org/T166733) (owner: 10Anomie) [16:07:39] <_joe_> hashar: <3 will look at that patch [16:08:56] (03CR) 10Ottomata: [C: 032] Refine revision-score after schema improvements [puppet] - 10https://gerrit.wikimedia.org/r/475778 (https://phabricator.wikimedia.org/T197000) (owner: 10Ottomata) [16:09:07] (03CR) 10Hashar: "Tried with 'docker-pkg --info --select=*someimage*' which dramatically helps figuring out what is failing :)" [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475779 (owner: 10Hashar) [16:10:25] (03PS2) 10Thcipriani: Add example systemd service file [software/keyholder] - 10https://gerrit.wikimedia.org/r/473270 [16:11:33] (03CR) 10Giuseppe Lavagetto: [C: 032] php: upgrade deployment servers, maintenance servers to php 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/475768 (https://phabricator.wikimedia.org/T208433) (owner: 10Giuseppe Lavagetto) [16:11:43] (03PS2) 10Giuseppe Lavagetto: php: upgrade deployment servers, maintenance servers to php 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/475768 (https://phabricator.wikimedia.org/T208433) [16:13:26] (03PS2) 10Hashar: Log docker build output [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475779 [16:13:48] (03CR) 10Hashar: "PS2 to fix flake8 and adjust a test." [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475779 (owner: 10Hashar) [16:15:46] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10DarTar) >>! In T209298#4773520, @jcrespo wrote: >> If possible, can we move forward at least on Ryan Steinberg'... 
[16:16:48] (03CR) 10Thcipriani: Add example systemd service file (035 comments) [software/keyholder] - 10https://gerrit.wikimedia.org/r/473270 (owner: 10Thcipriani) [16:18:24] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10DarTar) Regarding "a reference for the signed MOU/NDA", the process we've followed so far is to use the tracke... [16:20:26] (03PS2) 10DCausse: [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) [16:20:27] (03PS2) 10DCausse: [cirrus] switch to explicit config in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 (https://phabricator.wikimedia.org/T210381) [16:20:30] (03PS2) 10DCausse: [cirrus] prepare multi-instance services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475749 (https://phabricator.wikimedia.org/T210381) [16:20:31] (03PS2) 10DCausse: [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) [16:20:34] (03PS2) 10DCausse: [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 [16:20:36] (03PS2) 10DCausse: [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) [16:21:12] (03PS2) 10Alexandros Kosiaris: First draft of a graphoid helm chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/434475 [16:21:23] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:21:34] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] switch to explicit config in production 
services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:21:55] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] prepare multi-instance services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475749 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:22:03] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:22:25] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 (owner: 10DCausse) [16:22:46] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:27:18] (03PS3) 10DCausse: [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) [16:27:20] (03PS3) 10DCausse: [cirrus] switch to explicit config in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 (https://phabricator.wikimedia.org/T210381) [16:27:22] (03PS3) 10DCausse: [cirrus] prepare multi-instance services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475749 (https://phabricator.wikimedia.org/T210381) [16:27:24] (03PS3) 10DCausse: [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) [16:27:26] (03PS3) 10DCausse: [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 [16:27:28] (03PS3) 10DCausse: [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) [16:27:53] 
10Operations: Provide an option menu when booting via PXE - https://phabricator.wikimedia.org/T191018 (10Aklapper) [16:28:19] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:28:32] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] switch to explicit config in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:29:03] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] prepare multi-instance services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475749 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:29:06] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:29:16] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 (owner: 10DCausse) [16:29:45] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:32:32] (03CR) 10Krinkle: "Sorry for the delay, but while I'm unable to find it in written pages, I believe we have a policy against non-internal git dependencies in" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475345 (https://phabricator.wikimedia.org/T210141) (owner: 10Gilles) [16:33:20] (03PS4) 10DCausse: [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) [16:33:22] (03PS4) 10DCausse: [cirrus] switch to explicit config in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 
(https://phabricator.wikimedia.org/T210381) [16:33:24] (03PS4) 10DCausse: [cirrus] prepare multi-instance services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475749 (https://phabricator.wikimedia.org/T210381) [16:33:26] (03PS4) 10DCausse: [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) [16:33:28] (03PS4) 10DCausse: [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 [16:33:30] (03PS4) 10DCausse: [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) [16:34:43] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] switch to explicit config in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:34:52] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:34:53] sigh... 
sorry for the spam [16:35:01] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:35:03] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] prepare multi-instance services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475749 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:35:27] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 (owner: 10DCausse) [16:35:43] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [16:37:06] (03PS2) 10GTirloni: toolforge: Increase WebServiceMonitor sleep time to 60s [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/475355 (https://phabricator.wikimedia.org/T210190) [16:40:02] (03CR) 10Dzahn: "thanks for these merges :))" [puppet] - 10https://gerrit.wikimedia.org/r/475258 (owner: 10Dzahn) [16:41:12] (03CR) 10Vgutierrez: [C: 032] certcentral: Notified puppet_svc on certificate file changes [puppet] - 10https://gerrit.wikimedia.org/r/475762 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [16:41:20] (03PS2) 10Vgutierrez: certcentral: Notified puppet_svc on certificate file changes [puppet] - 10https://gerrit.wikimedia.org/r/475762 (https://phabricator.wikimedia.org/T207050) [16:44:30] (03CR) 10Ottomata: "Great, In general I agree. I'd be for changing all if not most properties to work this way too, like we do with druid, etc. 
This is just" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/474113 (owner: 10Elukey) [16:44:41] PROBLEM - Disk space on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [16:44:59] PROBLEM - Check systemd state on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [16:45:23] PROBLEM - dhclient process on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [16:45:23] PROBLEM - MD RAID on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [16:45:27] PROBLEM - configured eth on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [16:45:37] hello notebook1003 [16:45:39] PROBLEM - DPKG on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [16:46:11] RECOVERY - Check systemd state on notebook1003 is OK: OK - running: The system is fully operational [16:46:35] RECOVERY - dhclient process on notebook1003 is OK: PROCS OK: 0 processes with command name dhclient [16:46:35] RECOVERY - MD RAID on notebook1003 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [16:46:39] RECOVERY - configured eth on notebook1003 is OK: OK - interfaces up [16:46:46] (just restarted the nrpe server) [16:46:49] RECOVERY - DPKG on notebook1003 is OK: All packages OK [16:46:59] (OOM party) [16:47:03] RECOVERY - Disk space on notebook1003 is OK: DISK OK [16:48:35] (03PS1) 10Giuseppe Lavagetto: prometheus: gather metrics for php, php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/475787 [16:49:23] (03CR) 10jerkins-bot: [V: 04-1] prometheus: gather metrics for php, php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/475787 (owner: 10Giuseppe Lavagetto) [16:49:28] (03CR) 10GTirloni: [C: 032] toolforge: Increase WebServiceMonitor sleep time to 60s [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/475355 (https://phabricator.wikimedia.org/T210190) (owner: 10GTirloni) 
[16:50:00] (03CR) 10Filippo Giunchedi: [C: 031] prometheus: gather metrics for php, php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/475787 (owner: 10Giuseppe Lavagetto) [16:50:38] (03CR) 10Thcipriani: [C: 031] "Agree: disk space looks fine to add 30MB to /srv on the integration docker boxes." [puppet] - 10https://gerrit.wikimedia.org/r/440539 (https://phabricator.wikimedia.org/T197469) (owner: 10Hashar) [16:54:05] (03PS2) 10Giuseppe Lavagetto: prometheus: gather metrics for php, php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/475787 (https://phabricator.wikimedia.org/T209573) [16:55:05] (03CR) 10Giuseppe Lavagetto: [C: 032] prometheus: gather metrics for php, php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/475787 (https://phabricator.wikimedia.org/T209573) (owner: 10Giuseppe Lavagetto) [16:55:20] (03PS2) 10Daimona Eaytoy: Clarify docs for AbuseFilter emergency threshold [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475337 [16:56:17] PROBLEM - DPKG on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [16:56:30] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (10Cmjohnson) [16:56:31] PROBLEM - Disk space on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [16:56:49] PROBLEM - Check systemd state on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [16:57:11] PROBLEM - dhclient process on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [16:57:12] (03CR) 10Herron: "Adding comments while WIP so apologies in advance if these are already on your roadmap!" 
(035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/475352 (https://phabricator.wikimedia.org/T205851) (owner: 10Filippo Giunchedi) [16:57:13] PROBLEM - MD RAID on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [16:57:17] PROBLEM - configured eth on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [16:58:29] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (10Cmjohnson) a:05Cmjohnson>03RobH Rob, Can you complete the installs of pc1008-pc1010. The server used for pc1007 arrived DOA and a ticket with Dell needs to be submitt... [16:58:55] PROBLEM - puppet last run on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [17:00:32] jan_drewniak: do you know anything about T210401 ? [17:00:33] T210401: www.wikipedia.org says 0+ articles for all languages - https://phabricator.wikimedia.org/T210401 [17:01:03] 10Operations: SRE quarterly goal: Ability to serve a fraction of the production traffic from PHP7 - https://phabricator.wikimedia.org/T206336 (10Joe) [17:01:27] 10Operations, 10Patch-For-Review, 10User-Joe: Gather metrics from php-fpm - https://phabricator.wikimedia.org/T209573 (10Joe) 05Open>03Resolved [17:01:29] 10Operations: SRE quarterly goal: Ability to serve a fraction of the production traffic from PHP7 - https://phabricator.wikimedia.org/T206336 (10Joe) [17:01:30] gehel: oh dear. I do now! [17:01:34] 10Operations, 10Patch-For-Review, 10User-Joe: Package and install php 7.2 in place of php 7.0 - https://phabricator.wikimedia.org/T208433 (10Joe) 05Open>03Resolved [17:01:56] jan_drewniak: :) [17:02:14] jan_drewniak: can you look into it? Ping here if you need help. 
[17:02:21] PROBLEM - Check the NTP synchronisation status of timesyncd on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [17:02:57] gehel: thanks, I'll look into it now, can't believe I didn't notice that... [17:05:05] (03PS5) 10DCausse: [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) [17:05:06] (03PS5) 10DCausse: [cirrus] switch to explicit config in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 (https://phabricator.wikimedia.org/T210381) [17:05:08] (03PS5) 10DCausse: [cirrus] prepare multi-instance services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475749 (https://phabricator.wikimedia.org/T210381) [17:05:11] (03PS5) 10DCausse: [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) [17:05:12] (03PS5) 10DCausse: [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 [17:05:14] (03PS5) 10DCausse: [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) [17:05:15] I'm gonna make a revert-patch and add it to swat for now [17:05:58] jan_drewniak: I think this qualifies for a UBN revert without waiting for SWAT (my 2 cents) [17:06:46] 10Operations, 10ops-codfw, 10DBA, 10decommission, 10Patch-For-Review: Decommission parsercache hosts: pc2004 pc2005 pc2006 - https://phabricator.wikimedia.org/T209858 (10RobH) p:05Normal>03High [17:06:50] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:07:11] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in codfw [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/475751 (owner: 10DCausse) [17:07:14] 10Operations, 10ops-codfw, 10DBA, 10decommission, 10Patch-For-Review: Decommission parsercache hosts: pc2004 pc2005 pc2006 - https://phabricator.wikimedia.org/T209858 (10RobH) This is high priority due to return to Farnam in December. I'll get these ready for onsite wipe ASAP. [17:07:43] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:08:44] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (10Marostegui) p:05Normal>03High Setting up to high as we need to get the old ones out before the leasing deadline [17:10:59] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): Update label and switch to rename labvirt1015 to cloudvirt1015 - https://phabricator.wikimedia.org/T209622 (10Cmjohnson) 05Open>03Resolved [17:11:03] (03PS1) 10Marostegui: db-eqiad.php: Repool db1122 and db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475789 [17:11:38] jan_drewniak: Hey, some reports on Twitter that the www.wikipedia.org portal is showing wikis as having 0+ articles, but I can't replicate. Might be broken by your sync a couple of hours ago? [17:12:09] James_F: seems to be already tracked on T210401 [17:12:10] T210401: www.wikipedia.org says 0+ articles for all languages - https://phabricator.wikimedia.org/T210401 [17:12:11] James_F: yeah, just noticed that. Reverting... [17:12:30] Aha, thanks! [17:12:35] gehel: I'm having trouble creating the revert patch... [17:12:40] Both gehel and jan_drewniak. 
[17:12:41] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): Relabel labvirt1016.eqiad.wmnet as cloudvirt1016.eqiad.wmnet - https://phabricator.wikimedia.org/T209427 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson [17:13:21] I tried `git revert 52fa9f23d9cbebe28bc4731ba142ec61b5dc5a56` but when I try to commit it says it's an empty commit [17:13:49] jan_drewniak: stupid question, but did you try through the gerrit UI ? [17:14:11] jan_drewniak: I'm missing a lot of context here, which repo is it? Do you have a link to the problematic patch? [17:14:13] gehel: ah there's a button! [17:14:19] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1122 and db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475789 (owner: 10Marostegui) [17:14:22] (03PS1) 10Marostegui: db1122: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/475790 [17:14:49] (03PS6) 10DCausse: [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) [17:14:51] (03PS6) 10DCausse: [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 [17:14:53] (03PS6) 10DCausse: [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) [17:14:55] (03PS1) 10Jdrewniak: Revert "Bumping portals to master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475791 (https://phabricator.wikimedia.org/T210401) [17:15:01] (03CR) 10Marostegui: [C: 032] db1122: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/475790 (owner: 10Marostegui) [17:15:16] gehel: Ok I clicked revert, got this patch: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/475791/ [17:15:28] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1122 and db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475789 (owner: 10Marostegui) [17:15:47] 
jan_drewniak: looking [17:16:14] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:16:19] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 (owner: 10DCausse) [17:16:27] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1090:3312 and db1122 (duration: 00m 46s) [17:16:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:16:43] RECOVERY - Check systemd state on notebook1003 is OK: OK - running: The system is fully operational [17:16:59] jan_drewniak: it does look like it is a revert of that previous CR [17:16:59] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:17:05] RECOVERY - dhclient process on notebook1003 is OK: PROCS OK: 0 processes with command name dhclient [17:17:07] RECOVERY - MD RAID on notebook1003 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [17:17:09] RECOVERY - configured eth on notebook1003 is OK: OK - interfaces up [17:17:21] (03PS1) 10Fsero: buster package modified to customize it for WMF and for build 2.7 [debs/docker-distribution] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/475792 (https://phabricator.wikimedia.org/T210071) [17:17:21] RECOVERY - DPKG on notebook1003 is OK: All packages OK [17:17:36] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1122 and db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475789 (owner: 10Marostegui) [17:17:53] gehel: ok, so I'm good to +2 and deploy that now right? 
[17:18:03] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475793 [17:18:11] jan_drewniak: yep [17:18:25] (03CR) 10Jdrewniak: [C: 032] Revert "Bumping portals to master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475791 (https://phabricator.wikimedia.org/T210401) (owner: 10Jdrewniak) [17:18:39] (03CR) 10Gehel: [C: 031] "LGTM, this reverts the problematic portals deployment." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475791 (https://phabricator.wikimedia.org/T210401) (owner: 10Jdrewniak) [17:19:35] RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:19:46] 10Operations, 10Research: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10bmansurov) [17:20:00] (03Merged) 10jenkins-bot: Revert "Bumping portals to master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475791 (https://phabricator.wikimedia.org/T210401) (owner: 10Jdrewniak) [17:21:02] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): Relabel labvirt1017.eqiad.wmnet as cloudvirt1017.eqiad.wmnet - https://phabricator.wikimedia.org/T208945 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson [17:22:12] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:475791|Reverting: Bumping portals to master (T128546)]] (duration: 00m 46s) [17:22:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:22:15] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [17:22:18] RECOVERY - Disk space on notebook1003 is OK: DISK OK [17:22:59] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:475791|Reverting: Bumping portals to master (T128546)]] (duration: 00m 46s) [17:23:01] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:16] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): labvirt1007 predicted raid failure - https://phabricator.wikimedia.org/T209861 (10Cmjohnson) a:03faidon This server is out of warranty, we would need to order new disks but that needs to be approved by @Faidon. [17:23:19] 10Operations, 10ops-codfw, 10Services (watching): rack/setup/install restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T209615 (10RobH) @papaul, As discussed in irc, this is replacing leased hardware and should be treated with the highest priority. Anything that is a task for non lease hardw... [17:23:53] OK fixed now! [17:24:04] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475793 (owner: 10Marostegui) [17:24:10] ACKNOWLEDGEMENT - Host ms-be2047 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T209395 [17:24:10] ACKNOWLEDGEMENT - Host ms-be2047.mgmt is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T209395 [17:24:13] gehel: thanks for the support! [17:24:40] * gehel denies any knowledge or involvement :) [17:24:45] jan_drewniak: thanks for fixing this! [17:24:48] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): labvirt1007 predicted raid failure - https://phabricator.wikimedia.org/T209861 (10faidon) Sure sounds fine, but @Cmjohnson please file a #procurement request so that we can proceed with that purchase :) [17:25:06] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475793 (owner: 10Marostegui) [17:25:37] 10Operations, 10Traffic: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (10Joe) I think we need to solve somehow T194031 before this will be feasible. 
We could as well keep using the puppet CA for everything, but I feel we're at the point where we really need a real PKI solution. [17:26:07] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic for db1122 (duration: 00m 45s) [17:26:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:26:53] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475795 [17:27:57] 10Operations, 10ops-eqiad, 10DC-Ops: icinga1001 mysterious reboots - https://phabricator.wikimedia.org/T210108 (10Cmjohnson) I do not see anything in the logs that would tell me where to start. It appears to be working correctly [17:29:53] 10Operations, 10Operations-Software-Development: Develop and deploy at least three Netbox reports to assist with data correctness and consistency - https://phabricator.wikimedia.org/T205899 (10crusnov) Sounds good. [17:31:55] (03CR) 10jenkins-bot: Revert "Bumping portals to master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475791 (https://phabricator.wikimedia.org/T210401) (owner: 10Jdrewniak) [17:31:57] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475793 (owner: 10Marostegui) [17:32:22] RECOVERY - Check the NTP synchronisation status of timesyncd on notebook1003 is OK: OK: synced at Mon 2018-11-26 17:32:21 UTC. [17:32:32] 10Operations, 10ops-eqiad: Degraded RAID on labcontrol1001 - https://phabricator.wikimedia.org/T209829 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson This server is coming up on 5 years old, I can replace a disk but your manual check does not show any failed disk and icinga does not know show a degraded r... 
[17:33:27] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475795 (owner: 10Marostegui) [17:34:20] (03PS7) 10DCausse: [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) [17:34:22] (03PS7) 10DCausse: [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 [17:34:24] (03PS7) 10DCausse: [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) [17:34:28] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475795 (owner: 10Marostegui) [17:34:46] 10Operations, 10ops-eqiad: Degraded RAID on cloudelastic1003 - https://phabricator.wikimedia.org/T209408 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson Icinga does not show a degraded RAID, and a manual check confirms that the RAID has not failed. Personalities : [raid10] [linear] [multipath] [raid0] [raid1]... 
[17:35:21] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:35:28] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic for db1122 (duration: 00m 46s) [17:35:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:36:05] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 (owner: 10DCausse) [17:36:14] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:36:24] 10Operations, 10Release Pipeline, 10Release-Engineering-Team, 10Epic, 10Services (watching): Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901 (10greg) [17:36:27] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475796 [17:42:07] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475796 (owner: 10Marostegui) [17:42:51] 10Operations, 10ORES, 10Scoring-platform-team: [Epic] Deploy ORES in kubernetes cluster - https://phabricator.wikimedia.org/T182331 (10Ladsgroup) [17:42:55] 10Operations, 10Release Pipeline, 10Release-Engineering-Team, 10Epic, 10Services (watching): Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901 (10Ladsgroup) [17:43:40] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475796 (owner: 10Marostegui) [17:44:18] 10Operations, 10ops-eqiad: Missing rack face/position for 2 eqiad devices - https://phabricator.wikimedia.org/T209073 
(10Cmjohnson) 05Open>03Resolved Fixed [17:44:34] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic for db1122 (duration: 00m 46s) [17:44:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:45:24] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475797 [17:45:27] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475795 (owner: 10Marostegui) [17:45:29] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475796 (owner: 10Marostegui) [17:49:59] (03PS8) 10DCausse: [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) [17:50:01] (03PS8) 10DCausse: [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 [17:50:03] (03PS8) 10DCausse: [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) [17:50:59] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Fully repool db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475797 (owner: 10Marostegui) [17:51:01] (03CR) 10CRusnov: "Okay testing results!" 
[software/cumin] - 10https://gerrit.wikimedia.org/r/474087 (https://phabricator.wikimedia.org/T207037) (owner: 10CRusnov) [17:51:07] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:51:43] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 (owner: 10DCausse) [17:51:52] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:52:04] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475797 (owner: 10Marostegui) [17:52:56] 10Operations, 10ops-eqsin, 10Traffic: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (10BBlack) Update from SRE meeting today - memtest was successful, and we're asked to put it back in production and see if the error happens again or not. Re-pooling! 
[17:53:03] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1122 (duration: 00m 46s) [17:53:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:53:57] !log re-pooling cp5001 - T199675 [17:54:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:54:01] T199675: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 [17:54:05] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=cp5001.eqsin.wmnet [17:54:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:54:33] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [17:54:35] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Ship peopleweb apache2 error logs to ELK - https://phabricator.wikimedia.org/T209860 (10herron) 05Open>03Resolved [17:55:58] (03PS9) 10DCausse: [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) [17:56:00] (03PS9) 10DCausse: [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 [17:56:03] (03PS9) 10DCausse: [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) [17:56:05] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10jcrespo) [17:57:08] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Ship peopleweb apache2 error logs to ELK - https://phabricator.wikimedia.org/T209860 (10Dzahn) @herron please note that rutherfordium is being replaced by people1001 as we speak (jessie -> stretch). 
Currently in progress. The new host already has the rol... [17:57:16] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:57:53] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10jcrespo) yes, no problem- I brought these up for Ops approval and some people pointed out that your signoff was... [17:58:06] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:58:08] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 (owner: 10DCausse) [17:58:36] 10Operations, 10Discovery-Wikidata-Query-Service-Sprint, 10Patch-For-Review: wdqs-updater crashing on all wdqs servers - https://phabricator.wikimedia.org/T210235 (10Smalyshev) This is weird, bind shouldn't consume that much space... I'll need to look into the code in detail, but something looks wrong there.... [17:58:56] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475797 (owner: 10Marostegui) [17:59:44] 10Operations, 10Discovery-Wikidata-Query-Service-Sprint, 10Patch-For-Review: wdqs-updater crashing on all wdqs servers - https://phabricator.wikimedia.org/T210235 (10Smalyshev) a:05Gehel>03Smalyshev [18:00:04] gehel and onimisionipe: Time to snap out of that daydream and deploy Wikidata Query Service weekly deploy. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181126T1800). [18:00:35] here here.. 
[18:00:51] 10Operations, 10SRE-Access-Requests: access request for Jeena Huneidi (deployment, conint-admins, contint-docker) - https://phabricator.wikimedia.org/T210027 (10jcrespo) This was approved with no objections, please allow me a few hours before deployment as it is the end of my day- this and T210028 should be do... [18:01:38] (03PS2) 10Dzahn: remove icinga-old.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/474392 (https://phabricator.wikimedia.org/T209738) [18:02:15] (03CR) 10Dzahn: [C: 032] remove icinga-old.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/474392 (https://phabricator.wikimedia.org/T209738) (owner: 10Dzahn) [18:02:20] (03PS10) 10DCausse: [cirrus] Start using cirrus multi-instance config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) [18:02:22] (03PS10) 10DCausse: [cirrus] Start using psi&omega in codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475751 [18:02:24] (03PS10) 10DCausse: [cirrus] Start using psi&omega in eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475752 (https://phabricator.wikimedia.org/T210381) [18:03:25] 10Operations, 10SRE-Access-Requests: access request for Jeena Huneidi (deployment, conint-admins, contint-docker) - https://phabricator.wikimedia.org/T210027 (10greg) Thanks Jaime! [18:04:37] 10Operations, 10Traffic, 10Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (10Dzahn) [18:04:38] !log onimisionipe@deploy1001 Started deploy [wdqs/wdqs@8bfea2b]: GUI updates and updater with dump and revision logging. [18:04:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:26] 10Operations, 10Traffic, 10Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (10Dzahn) icinga-old has been removed from DNS and was only temporary. 
T209738 https://gerrit.wikimedia.org/r/#/c/operations/dns/+/474392/ so it does... [18:06:23] (03PS7) 10Dzahn: icinga: remove einsteinium as an alerting_host [puppet] - 10https://gerrit.wikimedia.org/r/473278 (https://phabricator.wikimedia.org/T202782) [18:08:10] 10Operations, 10Traffic, 10Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (10Krenair) [18:08:30] 10Operations, 10Traffic, 10Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (10Krenair) [18:11:41] 10Operations, 10SRE-Access-Requests, 10WMDE-Analytics-Engineering, 10Graphite, 10User-Addshore: Requesting access to graphite hosts for addshore - https://phabricator.wikimedia.org/T208750 (10jcrespo) a:03jcrespo @addshore, @fgiunchedi's Plan was approved on today's SRE meeting. The following was not... [18:13:33] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Ship peopleweb apache2 error logs to ELK - https://phabricator.wikimedia.org/T209860 (10herron) >>! In T209860#4774746, @Dzahn wrote: > @herron please note that rutherfordium is being replaced by people1001 as we speak (jessie -> stretch). Currently in... 
[18:14:44] (03CR) 10Dzahn: [C: 032] "https://puppet-compiler.wmflabs.org/compiler1002/13712/" [puppet] - 10https://gerrit.wikimedia.org/r/473278 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [18:15:07] (03PS1) 10Jforrester: Make it possible to configure enhanced RC/Watchlist default state per wiki, Part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475804 [18:15:09] (03PS1) 10Jforrester: Make it possible to configure enhanced RC/Watchlist default state per wiki, Part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475805 [18:15:11] (03PS1) 10Jforrester: [Beta Cluster] Make enhanced RC the default on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475806 [18:15:51] !log onimisionipe@deploy1001 Finished deploy [wdqs/wdqs@8bfea2b]: GUI updates and updater with dump and revision logging. (duration: 11m 13s) [18:15:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:35] my change i just merged removes einsteinium as a monitoring host.. so it will affect ferm on everything [18:16:53] but only removing the old host that should of course not be used anymore [18:17:26] also removes it from NRPE allowed hosts [18:17:34] (03CR) 10Smalyshev: wdqs: allow configuring wdqs-updater heap size (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475717 (https://phabricator.wikimedia.org/T210290) (owner: 10Gehel) [18:17:39] (03PS3) 10Herron: rsyslog: ship logs with tag 'icinga' to ELK [puppet] - 10https://gerrit.wikimedia.org/r/474982 (https://phabricator.wikimedia.org/T7) [18:18:15] so nagios-nrpe-server gets restarted across the board. 
in the past we have seen some alerts if that happens at the wrong moment (races) [18:19:33] (03CR) 10Herron: [C: 032] rsyslog: ship logs with tag 'icinga' to ELK [puppet] - 10https://gerrit.wikimedia.org/r/474982 (https://phabricator.wikimedia.org/T7) (owner: 10Herron) [18:20:17] that would manifest as NRPE checks like DISK or DPKG alerting but with "connection refused" rather than a normal alert message, which recovers right after [18:24:34] 10Operations, 10Icinga, 10decommission, 10monitoring, 10Patch-For-Review: decom einsteinium - https://phabricator.wikimedia.org/T209738 (10Dzahn) - removed alerting_host role from einsteinium - removed einsteinium from network::constants - removed from whitelisted hosts on lists server - removed from nag... [18:24:56] I just saw a soft error on a RAID check that disappeared on the next check [18:25:00] is that related? [18:25:18] jynus: yes, that is that [18:25:26] 10Operations, 10Wikidata, 10Wikidata-Query-Service: WDQS puppet/hiera configs are too distributed - https://phabricator.wikimedia.org/T210431 (10Smalyshev) [18:25:36] since RAID checks are run via NRPE, it's due to the restart of nagios-nrpe-server [18:25:54] which is because einsteinium is removed as an allowed_host to connect to everything [18:26:19] it is now a "role(spare)" as of a minute ago [18:27:02] (03PS1) 10Brian Wolff: Remove unblockself rights everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475808 (https://phabricator.wikimedia.org/T150826) [18:27:09] slightly scary part is that touching that means ferm and service restart on all hosts, * [18:28:49] anyways, half of the half hour is already over and it's fine :) [18:30:26] 10Operations, 10Icinga, 10decommission, 10monitoring, 10Patch-For-Review: decom einsteinium - https://phabricator.wikimedia.org/T209738 (10Dzahn) [18:33:34] (03PS1) 10RobH: pc1009-pc1010 install params [puppet] - 10https://gerrit.wikimedia.org/r/475810 (https://phabricator.wikimedia.org/T207258) [18:33:51] 10Operations, 
10ops-eqiad, 10Analytics, 10User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (10jcrespo) Yes, RAID 10 with 256K stripe is our default setup, sorry for not specifying it. Only on very specific setup we will not want that (parsercaches or o... [18:34:19] (03CR) 10MusikAnimal: [C: 031] Remove unblockself rights everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475808 (https://phabricator.wikimedia.org/T150826) (owner: 10Brian Wolff) [18:34:24] (03CR) 10jerkins-bot: [V: 04-1] pc1009-pc1010 install params [puppet] - 10https://gerrit.wikimedia.org/r/475810 (https://phabricator.wikimedia.org/T207258) (owner: 10RobH) [18:35:15] 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (10jcrespo) [18:35:22] hrmm [18:35:28] (03PS1) 10Cmjohnson: Adding mgmt dns for dbstore1003-5 [dns] - 10https://gerrit.wikimedia.org/r/475812 (https://phabricator.wikimedia.org/T209620) [18:35:54] https://integration.wikimedia.org/ci/job/operations-puppet-tests-stretch-docker/1357/console makes no sense to me [18:36:19] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (10jcrespo) [18:36:25] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (10Cmjohnson) Thanks @jcrespo! I didn't want to assume anything. [18:37:16] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (10jcrespo) We are cool! Better to ask than make you work twice :-D [18:38:13] robh: there is "pc10010" instead of 'pc1010" [18:38:18] ahhh [18:38:22] not sure if that explains the error message [18:38:44] ? [18:38:50] where is that at? 
[18:38:57] in site.pp [18:38:59] not the DHCP part [18:39:08] oh ,site.pp [18:39:10] maybe that is what happens if DNS lookup fails for the host [18:39:41] but it doesnt say anything obvious besides the DHCP test started and then stuff about augeas [18:39:51] (03PS2) 10RobH: pc1009-pc1010 install params [puppet] - 10https://gerrit.wikimedia.org/r/475810 (https://phabricator.wikimedia.org/T207258) [18:39:52] that seems to be unrelated.. so still a bit weird [18:40:00] yeah resubmitted and thx for noticing [18:40:14] the failure message would be nice to point to what line in the file [18:40:45] likely not due to it compiling the config out of those so it has no idea. [18:40:58] (03PS3) 10RobH: pc1009-pc1010 install params [puppet] - 10https://gerrit.wikimedia.org/r/475810 (https://phabricator.wikimedia.org/T207258) [18:41:43] robh: oh, i see it now. the relevant line in that wall of text: [18:41:43] !log re-activate BGP to AS41692 on cr2-esams [18:41:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:41:45] 18:34:11 Typo found! [18:42:01] yeah but it doesnt say where ;D [18:42:11] but, its ok now [18:42:18] mutante: so that was it! [18:42:25] site.pp invalid dns entry lookup [18:42:34] which, i didnt know would happen lik ethat since i've not introduced that typo. [18:42:37] huh [18:42:40] robh: it says "Typo found!" because the regex in the file "typos" [18:42:41] (? detected there are 3 digits and not 4 [18:42:54] heh [18:42:56] smart rules [18:43:06] (03CR) 10RobH: [C: 032] pc1009-pc1010 install params [puppet] - 10https://gerrit.wikimedia.org/r/475810 (https://phabricator.wikimedia.org/T207258) (owner: 10RobH) [18:43:16] 10Operations, 10Release Pipeline, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), 10Services (watching): Create Graphoid .pipeline files - https://phabricator.wikimedia.org/T203092 (10thcipriani) Created docker-registry.wikimedia.org/wikimedia/mediawiki-services-graphoid:20181126180555-productio... 
[18:43:30] ci saves me from myself [18:43:32] ;D [18:43:40] it works:) [18:44:12] !log ppchelko@deploy1001 Started deploy [changeprop/deploy@c89bff5]: Prepared for ORES error response T197000 [18:44:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:44:15] T197000: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 [18:45:25] !log ppchelko@deploy1001 Finished deploy [changeprop/deploy@c89bff5]: Prepared for ORES error response T197000 (duration: 01m 13s) [18:45:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:45:42] PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:46:54] !log removed allowed sender addresses from AQL (mail2SMS gateway) portal: @einsteinium @tegmen addresses T208824 T209738 [18:46:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:46:59] T209738: decom einsteinium - https://phabricator.wikimedia.org/T209738 [18:46:59] T208824: rename tegmen to icinga2001 and reinstall it with stretch - https://phabricator.wikimedia.org/T208824 [18:47:55] robh: i am in AQL portal, i removed einsteinium and tegmen .. from registered addresses that are allowed to send, as in icinga@einsteium , icinga@tegmen [18:48:22] robh: but wasn't there also a second place where we allowed the IP addresses to send.. 
pretty sure it was both [18:48:39] also checking that puppet alert on icinga1001 [18:48:48] (it's fine) [18:50:43] RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:50:46] !log stopping icinga service on einsteinium, is a role(spare) now T209738 [18:50:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:59] (03PS1) 10EBernhardson: Search profiles for wbsearchentities AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475817 (https://phabricator.wikimedia.org/T209402) [18:53:54] (03PS1) 10Ppchelko: Revert "Disable public revision-score events until we figure out a good schema" [puppet] - 10https://gerrit.wikimedia.org/r/475818 [18:54:34] (03CR) 10Bstorm: "I am curious if any of these changes looks problematic to Alex:" [puppet] - 10https://gerrit.wikimedia.org/r/475093 (https://phabricator.wikimedia.org/T206639) (owner: 10Mathew.onipe) [18:54:48] (03CR) 10jerkins-bot: [V: 04-1] Revert "Disable public revision-score events until we figure out a good schema" [puppet] - 10https://gerrit.wikimedia.org/r/475818 (owner: 10Ppchelko) [18:56:06] (03PS3) 10Dzahn: decom einsteinium remove from netboot and DHCP [puppet] - 10https://gerrit.wikimedia.org/r/474390 (https://phabricator.wikimedia.org/T209738) [18:56:33] (03CR) 10jerkins-bot: [V: 04-1] decom einsteinium remove from netboot and DHCP [puppet] - 10https://gerrit.wikimedia.org/r/474390 (https://phabricator.wikimedia.org/T209738) (owner: 10Dzahn) [18:58:23] 10Operations, 10Release Pipeline, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), 10Services (watching): Create Graphoid .pipeline files - https://phabricator.wikimedia.org/T203092 (10mobrovac) >>! In T203092#4774968, @thcipriani wrote: > The image complains about `Error: Cannot find module 'bun... 
[18:58:53] 10Operations, 10Citoid, 10Patch-For-Review, 10Services (watching), 10VisualEditor (Current work): Transition citoid to use Zotero's translation-server-v2 - https://phabricator.wikimedia.org/T197242 (10Mvolz) As per chat, this is scheduled for deploy Monday Dec 03. [18:58:59] 10Operations, 10Release Pipeline, 10Core Platform Team Backlog (Watching / External), 10Release-Engineering-Team (Kanban), 10Services (watching): Create Graphoid .pipeline files - https://phabricator.wikimedia.org/T203092 (10mobrovac) [19:00:04] Deploy window Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181126T1900) [19:00:05] Jayprakash12345, kostajh, and ebernhardson: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:13] i'm here [19:00:17] \o [19:04:28] (03PS1) 10RobH: production dns for pc1008-1010 [dns] - 10https://gerrit.wikimedia.org/r/475820 (https://phabricator.wikimedia.org/T207258) [19:05:02] (03CR) 10RobH: [C: 032] Adding mgmt dns for dbstore1003-5 [dns] - 10https://gerrit.wikimedia.org/r/475812 (https://phabricator.wikimedia.org/T209620) (owner: 10Cmjohnson) [19:05:44] (03PS2) 10RobH: production dns for pc1008-1010 [dns] - 10https://gerrit.wikimedia.org/r/475820 (https://phabricator.wikimedia.org/T207258) [19:05:48] i suppose i can ship swat today [19:07:04] Jayprakash12345: around for deploy? [19:07:15] (03CR) 10RobH: [C: 032] production dns for pc1008-1010 [dns] - 10https://gerrit.wikimedia.org/r/475820 (https://phabricator.wikimedia.org/T207258) (owner: 10RobH) [19:07:17] yes [19:08:03] ebernhardson: thanks. once it's on a debug server, I'll be working with Nettrom on verifying a few events on testwiki [19:08:05] Jayprakash12345: can you check that patch from the mwdebug hosts? 
[19:08:23] (03PS4) 10EBernhardson: Enable NewUserMessage Extension on tcy.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474409 (https://phabricator.wikimedia.org/T209432) (owner: 10Jayprakash12345) [19:08:24] yes [19:08:41] (03CR) 10EBernhardson: [C: 032] Enable NewUserMessage Extension on tcy.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474409 (https://phabricator.wikimedia.org/T209432) (owner: 10Jayprakash12345) [19:08:58] (03PS2) 10Ppchelko: Revert "Disable public revision-score events until we figure out a good schema" [puppet] - 10https://gerrit.wikimedia.org/r/475818 [19:09:41] Let me where I need to check 1001, 1002, 2001 or 2002 [19:09:49] (03Merged) 10jenkins-bot: Enable NewUserMessage Extension on tcy.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474409 (https://phabricator.wikimedia.org/T209432) (owner: 10Jayprakash12345) [19:09:57] Jayprakash12345: will be 1001 in a sec [19:10:36] Jayprakash12345: rsync'd to 1001 now [19:10:42] or pulled, whatever we call it now :) [19:12:53] kostajh: you're now synced to 1001 as well [19:13:01] ebernhardson: ok, checking now [19:14:02] https://tcy.wikipedia.org/wiki/ಬಳಕೆದಾರೆ_ಪಾತೆರ:Indic-TechCom works fine, Can you go ahead to run the bot. [19:14:25] Jayprakash12345: bot? you mean go ahead and deploy? [19:14:46] Yeah [19:16:14] ebernhardson: Run the stashbot is another phrase, sorry about that. 
[19:16:16] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT T209432 Enable NewUserMessage extension on tcy.wikipedia (duration: 00m 48s) [19:16:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:16:20] T209432: Enable Extension:NewUserMessage on tcy.wikipedia - https://phabricator.wikimedia.org/T209432 [19:16:36] 10Operations, 10Operations-Software-Development: Develop and deploy at least three Netbox reports to assist with data correctness and consistency - https://phabricator.wikimedia.org/T205899 (10crusnov) a:03crusnov [19:16:42] Jayprakash12345: should be deployed now [19:17:21] (03CR) 10EBernhardson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475817 (https://phabricator.wikimedia.org/T209402) (owner: 10EBernhardson) [19:17:39] ebernhardson: Yes [19:18:28] (03Merged) 10jenkins-bot: Search profiles for wbsearchentities AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475817 (https://phabricator.wikimedia.org/T209402) (owner: 10EBernhardson) [19:18:48] (03CR) 10jenkins-bot: Enable NewUserMessage Extension on tcy.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474409 (https://phabricator.wikimedia.org/T209432) (owner: 10Jayprakash12345) [19:18:50] (03CR) 10jenkins-bot: Search profiles for wbsearchentities AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475817 (https://phabricator.wikimedia.org/T209402) (owner: 10EBernhardson) [19:21:22] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [19:22:09] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (10Cmjohnson) [19:22:34] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install 
dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (10Cmjohnson) a:05Cmjohnson>03RobH @robh these are ready to for installs. [19:23:25] ebernhardson: all looks good, please proceed [19:24:04] !log cloudvirt1019 is going down to check something for HPE [19:24:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:24:51] !log remove IP from blacklist on Amsterdam routers - T201411 [19:24:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:25:01] kostajh: going out now [19:25:06] !log ebernhardson@deploy1001 Synchronized php-1.33.0-wmf.4/extensions/WikimediaEvents/includes/: SWAT T210003 T210004 EditorJourney: Adjust DeferredUpdates usage (duration: 00m 46s) [19:25:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:25:16] T210004: Some events are out of order - https://phabricator.wikimedia.org/T210004 [19:25:17] T210003: Deferred update causes "Retrieving hash salt for user ID failed" - https://phabricator.wikimedia.org/T210003 [19:26:33] ebernhardson: thank you [19:26:39] 10Operations, 10ops-eqiad, 10DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (10RobH) [19:26:43] kostajh: np [19:27:12] PROBLEM - Host cloudvirt1019 is DOWN: PING CRITICAL - Packet loss = 100% [19:27:42] that pinged. [19:27:42] anyone looking at cloudvirt1019? just got paged [19:27:46] yes that's me [19:27:46] paged even [19:27:48] cmjohnson1: that's you [19:27:50] yeah ok [19:28:00] sorry i forgot to disable checks [19:28:05] why is a single host sending a page =P [19:28:35] robh, because any one cloudvirt going down is a problem? [19:29:02] enough to page everyone on the sre team? if so ok. [19:29:08] I don't know that [19:29:09] just seems odd for paging for a single host. [19:29:18] ACKNOWLEDGEMENT - Host cloudvirt1019 is DOWN: PING CRITICAL - Packet loss = 100% andrew bogott Chris is working on the drive controller again. 
[19:30:26] to me it seems worth paging a member of cloud services if any cloudvirt host goes down [19:30:38] (unless that host currently holds no instances for some reason) [19:30:44] <_joe_> robh: it was expressly asked by the cloud team, the alternative they had was to page on any alert [19:30:53] good enough [19:30:58] <_joe_> we're paging everyone as they get paged for our alerts [19:31:05] <_joe_> it widens coverage [19:32:15] (03PS2) 10Muehlenhoff: Remove tweaks to use Linux 4.14 on backup2001 [puppet] - 10https://gerrit.wikimedia.org/r/465434 (https://phabricator.wikimedia.org/T196477) [19:33:04] cloudvirt is openstack VMs? will the instances running there get reassigned to a good host? [19:33:11] yes it's openstack VMs [19:33:17] they won't get auto-migrated AFAIK [19:33:24] cdanis: cloudvirt1019 and 1020 have had hardware problems for months [19:33:29] (03CR) 10Muehlenhoff: [C: 032] Remove tweaks to use Linux 4.14 on backup2001 [puppet] - 10https://gerrit.wikimedia.org/r/465434 (https://phabricator.wikimedia.org/T196477) (owner: 10Muehlenhoff) [19:33:31] so, nothing important is scheduled there [19:33:42] !log ebernhardson@deploy1001 Synchronized wmf-config/WikibaseSearchSettings.php: SWAT T209402 Search profiles for wbsearchentities AB test (duration: 00m 46s) [19:33:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:46] T209402: A/B testing plan for wbsearchentities, context=item - https://phabricator.wikimedia.org/T209402 [19:33:54] 10Operations, 10DBA, 10Patch-For-Review, 10User-Banyek: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383 (10RobH) Please note that pc1008, pc1009, and pc1010 are ready for #dba team to take them over. OS is installed and run... 
[19:33:59] cmjohnson1 has spent about 500 hours working with HP to get them to stop alerting :( [19:34:04] oof [19:34:09] I don't know if this is the right place to report this, but this page throws an error when trying to view the page history. [19:34:11] https://meta.wikimedia.org/w/index.php?title=Why_there_will_always_be_debate_in_this_project&action=history [19:34:11] I'm just curious from an alerting perspective. if the service will recover on its own in the case of one unexpected failure, I'm not sure that's worth a page? [19:34:12] RECOVERY - Host cloudvirt1019 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [19:34:20] 10Operations, 10ops-eqiad, 10DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (10RobH) [19:34:45] cdanis: we don't have live migration currently. VMs are stored locally on virt hosts, so if a virt host goes down it causes a major outage — dozens of user VMs shut down. [19:34:50] Kb03, please can you file a ticket with a copy of the error including the hash at the beginning? [19:34:59] ahh, understood; ty [19:35:45] Kb03, tag it #wikimedia-production-error and #mediawiki-general-or-unknown maybe [19:35:59] Thanks, will do! [19:47:46] 10Operations, 10SRE-Access-Requests: Requesting access to Jupyter notebook / analytics-privatedata-users for jgleeson - https://phabricator.wikimedia.org/T208432 (10mepps) a:05mepps>03None @jcrespo I just returned from maternity leave and I can approve this request. [19:55:11] 10Operations, 10ops-eqiad, 10DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (10RobH) a:05RobH>03Cmjohnson Assigning back to Chris for the followup to repair pc1007. 
The other servers have been handed off to #dba team for use via task T208383 [19:58:22] (03PS3) 10Gehel: Enable dumping RDF on test & internal [puppet] - 10https://gerrit.wikimedia.org/r/475243 (https://phabricator.wikimedia.org/T210044) (owner: 10Smalyshev) [19:59:12] (03CR) 10Gehel: [C: 032] Enable dumping RDF on test & internal [puppet] - 10https://gerrit.wikimedia.org/r/475243 (https://phabricator.wikimedia.org/T210044) (owner: 10Smalyshev) [19:59:27] SMalyshev: ^ [19:59:51] gehel: cool, thanks! [20:00:19] gehel: will you be around for the next hour or so? [20:00:26] in case if something is not right there and we'd need to revert [20:00:30] yep, not too far at least [20:00:38] ok, thanks [20:02:31] 10Operations, 10Traffic, 10netops, 10Goal: Increase network capacity (2018-19 Q2 Goal) - https://phabricator.wikimedia.org/T207668 (10ayounsi) [20:03:03] 10Operations, 10netops: Rack/Setup new codfw QFX5100 10G switch - https://phabricator.wikimedia.org/T197147 (10ayounsi) [20:03:05] 10Operations, 10ops-codfw, 10netops: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (10ayounsi) [20:03:43] 10Operations, 10ops-codfw, 10netops: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (10ayounsi) [20:05:12] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [20:05:33] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [20:14:11] !log ebernhardson@deploy1001 Synchronized php-1.33.0-wmf.4/extensions/Wikibase/repo/includes/Search/Elastic/EntitySearchElastic.php: SWAT T209402 Make wbsearchentities tie-breaker configurable (duration: 00m 47s) 
[20:14:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:15] T209402: A/B testing plan for wbsearchentities, context=item - https://phabricator.wikimedia.org/T209402 [20:15:45] ok SWAT is complete [20:22:09] (03CR) 10Mobrovac: [C: 031] Revert "Disable public revision-score events until we figure out a good schema" [puppet] - 10https://gerrit.wikimedia.org/r/475818 (owner: 10Ppchelko) [20:25:16] 10Operations, 10hardware-requests, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Kanban (Doing), 10User-Eevans: Hardware for session storage service - https://phabricator.wikimedia.org/T206017 (10Papaul) [20:32:21] (03CR) 10Gilles: [C: 04-1] "I'm fine with just copying it over. Plus, we might have to adapt it to hit upload.* for images anyway." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475345 (https://phabricator.wikimedia.org/T210141) (owner: 10Gilles) [20:41:31] 10Operations, 10Research: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10bmansurov) @Dzahn I've updated the description with acceptance criteria. Let me know if it doesn't make sense. Also, how about we pair tomorrow at your convenience? I was thinking of a Hango... 
[20:42:34] 10Operations, 10ops-codfw, 10netops: codfw row A recable and add QFX - https://phabricator.wikimedia.org/T210447 (10ayounsi) p:05Triage>03Normal [20:43:06] 10Operations, 10ops-codfw, 10netops: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (10ayounsi) [20:43:13] 10Operations, 10ops-codfw, 10netops, 10Patch-For-Review: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (10ayounsi) [20:43:40] 10Operations, 10ops-codfw, 10netops: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (10ayounsi) [20:50:22] (03PS4) 10Bmansurov: Labs: enable the reader trust survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475105 (https://phabricator.wikimedia.org/T209882) [20:53:48] (03PS5) 10Bmansurov: Labs: enable the reader trust survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475105 (https://phabricator.wikimedia.org/T209882) [20:55:42] 10Operations, 10Release Pipeline, 10Core Platform Team Backlog (Watching / External), 10Release-Engineering-Team (Kanban), 10Services (watching): Create Graphoid .pipeline files - https://phabricator.wikimedia.org/T203092 (10thcipriani) >>! In T203092#4775052, @mobrovac wrote: >>>! In T203092#4774968, @t... [20:58:59] gehel: do you know who owns kafka setup? is it analytics? [21:00:05] cscott, arlolra, subbu, bearND, halfak, and Amir1: It is that lovely time of the day again! You are hereby commanded to deploy Services – Parsoid / Citoid / Mobileapps / ORES / …. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181126T2100). [21:03:33] 10Operations: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (10ayounsi) [21:04:07] no parsoid deploy today. 
[21:06:07] 10Operations, 10Cloud-Services, 10Mail, 10Patch-For-Review, 10User-herron: Create a Cloud VPS SMTP smarthost - https://phabricator.wikimedia.org/T41785 (10herron) 05Open>03Resolved I'll transition this to resolved now that we've had mail clients migrated for some time, but if any follow up is needed... [21:07:01] 10Operations: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (10ayounsi) [21:09:01] 10Operations, 10ops-codfw: rack/setup/install elastic201[6-9], elastic202[0-9] and elastic203[0-3] - https://phabricator.wikimedia.org/T210450 (10Papaul) p:05Triage>03Normal [21:09:36] SMalyshev: to answer your question above: officially it is SRE [21:09:50] analytics is trying to offload the ownership to them...although I still do a lot of work with it [21:10:19] ottomata: ok, anybody personally that I'll need to bug next time? [21:11:44] in SRE? no i dunno who've they've put on that :p [21:11:50] you can always ping me too [21:11:55] 10Operations, 10Analytics, 10EventBus, 10Wikidata, 10Wikidata-Query-Service: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10Smalyshev) [21:12:04] but which SRE opsen to bug might be a question for Faidon... [21:12:05] not sure [21:12:18] 10Operations, 10Analytics, 10EventBus, 10Wikidata, 10Wikidata-Query-Service: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10Smalyshev) p:05Triage>03Unbreak! 
[21:14:08] 10Operations, 10ops-codfw, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Backlog (Watching / External), and 2 others: rack/setup/install sessionstore200[123].codfw.wmnet - https://phabricator.wikimedia.org/T209389 (10Papaul) [21:15:47] 10Operations, 10Analytics, 10EventBus, 10Wikidata, and 2 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10Smalyshev) [21:22:21] (03PS1) 10Herron: rsyslog:input:file add multiline handling and ship gerrit logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/475840 (https://phabricator.wikimedia.org/T141324) [21:23:55] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [21:28:19] 10Operations, 10Analytics, 10EventBus, 10Wikidata, and 2 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10Pchelolo) Yes, [[ https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/475011/ | the fix ]] has not been deployed yet. Asked to put it on Euro Mid... [21:31:50] (03PS7) 10Dzahn: icinga: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782) [21:31:53] (03PS1) 10Hashar: Support docker build pull and cache [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475843 (https://phabricator.wikimedia.org/T210438) [21:32:50] (03CR) 10jerkins-bot: [V: 04-1] icinga: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [21:34:16] (03CR) 10Dzahn: "paladox, krenair.. will this .. break shinken? 
i really hope not because the reason we have the 3 monitoring modules including the "commo" [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [21:35:07] 10Operations, 10Analytics, 10EventBus, 10WMF-JobQueue, and 5 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10mobrovac) p:05Unbreak!>03High Will be done in the EU SWAT window on 2018-11-27. Lowering the priority as it is a known issue and does not... [21:36:35] (03PS1) 10GTirloni: cloudvps: Fix Puppet alerts [puppet] - 10https://gerrit.wikimedia.org/r/475875 (https://phabricator.wikimedia.org/T210432) [21:36:45] (03PS8) 10Dzahn: icinga: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782) [21:37:05] 10Operations, 10Analytics, 10EventBus, 10WMF-JobQueue, and 5 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10Smalyshev) It does cause pretty severe breakage - all delete updates are missing from WDQS and people are complaining (in fact, have been comp... [21:37:40] (03CR) 10jerkins-bot: [V: 04-1] icinga: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [21:37:55] (03PS2) 10GTirloni: cloudvps: Fix Puppet alerts [puppet] - 10https://gerrit.wikimedia.org/r/475875 (https://phabricator.wikimedia.org/T210432) [21:43:26] (03PS3) 10Zoranzoki21: Delete 'Импортировано' namespace from ru.wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475367 (https://phabricator.wikimedia.org/T210171) [21:43:47] (03PS1) 10Dzahn: profile::icinga: stop using mysql module, rm jessie support [puppet] - 10https://gerrit.wikimedia.org/r/475876 (https://phabricator.wikimedia.org/T209738) [21:46:22] (03CR) 10Andrew Bogott: [C: 031] "Thank you for working on this!" 
[puppet] - 10https://gerrit.wikimedia.org/r/475875 (https://phabricator.wikimedia.org/T210432) (owner: 10GTirloni) [21:46:54] 10Operations, 10Wikimedia-Logstash: Ship prometheus logs to ELK - https://phabricator.wikimedia.org/T210455 (10herron) p:05Triage>03Normal [21:47:16] 10Operations, 10Wikimedia-Logstash: Ship prometheus logs to ELK - https://phabricator.wikimedia.org/T210455 (10herron) [21:47:19] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [21:47:40] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [21:49:11] (03PS1) 10Herron: logstash: ship prometheus logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/475879 (https://phabricator.wikimedia.org/T210455) [21:50:05] (03CR) 10Dzahn: [C: 032] "Compilation results for icinga1001.wikimedia.org: no change" [puppet] - 10https://gerrit.wikimedia.org/r/475876 (https://phabricator.wikimedia.org/T209738) (owner: 10Dzahn) [21:50:59] 10Operations, 10ops-codfw, 10netops: codfw row B recable and add QFX - https://phabricator.wikimedia.org/T210456 (10ayounsi) p:05Triage>03Normal [21:51:09] 10Operations, 10ops-codfw, 10netops: codfw row B recable and add QFX - https://phabricator.wikimedia.org/T210456 (10ayounsi) a:05ayounsi>03Papaul [21:51:16] 10Operations, 10ops-codfw, 10netops: codfw row A recable and add QFX - https://phabricator.wikimedia.org/T210447 (10ayounsi) a:05ayounsi>03Papaul [21:51:38] 10Operations, 10ops-codfw, 10netops: codfw row B recable and add QFX - https://phabricator.wikimedia.org/T210456 (10ayounsi) [21:51:41] 10Operations, 10ops-codfw, 10netops: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (10ayounsi) [21:52:03] 
10Operations, 10ops-codfw, 10netops: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (10ayounsi) [21:52:05] (03CR) 10Alex Monk: "I think it's the nagios_common stuff where this gets problematic. icinga should probably be okay. Give it a go?" [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [21:53:42] (03CR) 10Hashar: "I have done a single change for both pull and cache since they are very similar: options passed to "docker build"." [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475843 (https://phabricator.wikimedia.org/T210438) (owner: 10Hashar) [21:58:23] (03PS1) 10Dzahn: icinga/NSCA (passive checks): remove jessie support hacks [puppet] - 10https://gerrit.wikimedia.org/r/475881 (https://phabricator.wikimedia.org/T202782) [21:59:09] (03PS1) 10Jdlrobson: Set wgMinervaSchemaMainMenuClickTrackingSampleRate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475882 (https://phabricator.wikimedia.org/T205008) [22:00:04] bawolff and Reedy: #bothumor I ♥ Unicode. All rise for Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181126T2200). [22:00:11] \o/ [22:00:34] (03CR) 10Dzahn: "thanks! i am splitting this up into multiple smaller ones.
started with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/475876/ ne" [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [22:01:16] (03PS2) 10Jdlrobson: Set wgMinervaSchemaMainMenuClickTrackingSampleRate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475882 (https://phabricator.wikimedia.org/T205008) [22:03:39] 10Operations, 10Wikimedia-Logstash: Ship PuppetDB logs to ELK - https://phabricator.wikimedia.org/T210458 (10herron) p:05Triage>03Normal [22:03:58] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [22:04:15] 10Operations, 10Wikimedia-Logstash: Ship PuppetDB logs to ELK - https://phabricator.wikimedia.org/T210458 (10herron) [22:04:20] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [22:05:34] (03PS3) 10GTirloni: cloudvps: Fix Puppet alerts [puppet] - 10https://gerrit.wikimedia.org/r/475875 (https://phabricator.wikimedia.org/T210432) [22:06:06] (03CR) 10Ottomata: [C: 032] admin: Add Niharika to analytics-privatedata-users group [puppet] - 10https://gerrit.wikimedia.org/r/475736 (https://phabricator.wikimedia.org/T210022) (owner: 10Jcrespo) [22:06:12] (03PS3) 10Ottomata: admin: Add Niharika to analytics-privatedata-users group [puppet] - 10https://gerrit.wikimedia.org/r/475736 (https://phabricator.wikimedia.org/T210022) (owner: 10Jcrespo) [22:06:14] (03CR) 10Ottomata: [V: 032 C: 032] admin: Add Niharika to analytics-privatedata-users group [puppet] - 10https://gerrit.wikimedia.org/r/475736 (https://phabricator.wikimedia.org/T210022) (owner: 10Jcrespo) [22:06:18] (03PS1) 10RobH: settting dbstore100[5-67].eqiad.wmnet production dns entries [dns] - 
10https://gerrit.wikimedia.org/r/475883 (https://phabricator.wikimedia.org/T209620) [22:06:27] (03CR) 10jenkins-bot: [V: 04-1] cloudvps: Fix Puppet alerts [puppet] - 10https://gerrit.wikimedia.org/r/475875 (https://phabricator.wikimedia.org/T210432) (owner: 10GTirloni) [22:07:24] (03CR) 10RobH: [C: 032] settting dbstore100[5-67].eqiad.wmnet production dns entries [dns] - 10https://gerrit.wikimedia.org/r/475883 (https://phabricator.wikimedia.org/T209620) (owner: 10RobH) [22:07:48] (03PS4) 10GTirloni: cloudvps: Fix Puppet alerts [puppet] - 10https://gerrit.wikimedia.org/r/475875 (https://phabricator.wikimedia.org/T210432) [22:07:55] (03PS1) 10Herron: puppetdb: ship puppetdb logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/475884 (https://phabricator.wikimedia.org/T210458) [22:08:43] (03CR) 10Paladox: [C: 031] rsyslog:input:file add multiline handling and ship gerrit logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/475840 (https://phabricator.wikimedia.org/T141324) (owner: 10Herron) [22:08:57] (03PS5) 10GTirloni: cloudvps: Fix Puppet alerts [puppet] - 10https://gerrit.wikimedia.org/r/475875 (https://phabricator.wikimedia.org/T210432) [22:09:55] (03PS1) 10Hashar: tox: allow passing options to pytest environments [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475885 [22:11:18] (03CR) 10GTirloni: [C: 032] cloudvps: Fix Puppet alerts [puppet] - 10https://gerrit.wikimedia.org/r/475875 (https://phabricator.wikimedia.org/T210432) (owner: 10GTirloni) [22:12:00] (03CR) 10SBassett: [C: 032] "Looks sane, discussed w/ Brian in IRC." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/475808 (https://phabricator.wikimedia.org/T150826) (owner: 10Brian Wolff) [22:12:52] (03PS1) 10RobH: dbstore100[345].eqiad.wmnet base isntall params [puppet] - 10https://gerrit.wikimedia.org/r/475888 (https://phabricator.wikimedia.org/T209620) [22:13:24] (03PS2) 10RobH: dbstore100[345].eqiad.wmnet base isntall params [puppet] - 10https://gerrit.wikimedia.org/r/475888 (https://phabricator.wikimedia.org/T209620) [22:16:47] (03CR) 10RobH: [C: 032] dbstore100[345].eqiad.wmnet base isntall params [puppet] - 10https://gerrit.wikimedia.org/r/475888 (https://phabricator.wikimedia.org/T209620) (owner: 10RobH) [22:18:54] (03PS2) 10Brian Wolff: Remove unblockself rights everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475808 (https://phabricator.wikimedia.org/T150826) [22:23:12] (03CR) 10Dzahn: [C: 032] icinga/NSCA (passive checks): remove jessie support hacks [puppet] - 10https://gerrit.wikimedia.org/r/475881 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [22:23:20] (03PS2) 10Dzahn: icinga/NSCA (passive checks): remove jessie support hacks [puppet] - 10https://gerrit.wikimedia.org/r/475881 (https://phabricator.wikimedia.org/T202782) [22:25:29] (03PS1) 10Herron: swift: ship logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/475898 (https://phabricator.wikimedia.org/T63780) [22:25:40] (03CR) 10Dzahn: [C: 032] "https://puppet-compiler.wmflabs.org/compiler1002/13715/icinga1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/475881 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [22:28:29] (03CR) 10jenkins-bot: Remove unblockself rights everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475808 (https://phabricator.wikimedia.org/T150826) (owner: 10Brian Wolff) [22:29:06] 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (10RobH) [22:30:31] Deploying 
https://gerrit.wikimedia.org/r/475808/ now... [22:31:47] (03PS1) 10Dzahn: icinga::web: do not use PHP anymore [puppet] - 10https://gerrit.wikimedia.org/r/475901 (https://phabricator.wikimedia.org/T208257) [22:32:57] 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (10RobH) a:05RobH>03Cmjohnson @Cmjohnson dbstore1004 shows production network cable issue? Switch shows it is admin enabled but no link: ge-1/0/13 u... [22:33:20] 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (10RobH) dbstore1003 & dbstore1005 are fully installed and now online, standing by with role:spare applied. [22:33:30] (03PS2) 10Herron: puppetdb: ship puppetdb logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/475884 (https://phabricator.wikimedia.org/T210458) [22:34:02] 10Operations, 10Research: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Dzahn) Yep, we had an IRC chat and i already know more and we will continue tomorrow. 
[22:34:27] (03CR) 10Herron: [C: 032] puppetdb: ship puppetdb logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/475884 (https://phabricator.wikimedia.org/T210458) (owner: 10Herron) [22:39:22] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [22:39:41] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [22:39:58] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Onboard at least 10 new non-sensitive log producers to the logging pipeline - https://phabricator.wikimedia.org/T205852 (10herron) [22:44:24] Umm, whats with the exceptions on enwiki currently? [W-x3FArAADwAAHIf9UEAAAAS] /wiki/Main_Page ErrorException from line 302 of /srv/mediawiki/php-1.33.0-wmf.4/includes/Message.php: [22:44:27] Is that a known issue? [22:45:17] 10Operations, 10Research: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Dzahn) [22:45:18] Seems to be memcached unserialize errors [22:45:46] with the sidebar [22:48:39] It seems to be reporting 30,000 such errors [22:48:54] like every 10 minutes [22:49:11] 10Operations, 10Mail: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10Krenair) [22:50:44] greg-g: Do you happen to know what those errors are about? [22:54:15] It looks like a bad memcached key. Maybe the key should just be removed to force a recache [22:54:29] if it's an unserialize error, why is it only happening every 10 minutes? [22:54:41] no, no clue [22:55:02] is the key listed on the error? 
[22:55:36] 10Operations, 10Graphite, 10Patch-For-Review, 10Performance-Team (Radar): Improve graphite failover - https://phabricator.wikimedia.org/T88997 (10hashar) Dropping Zuul. Filippo proposed a nice fixup (statsite). Nodepool is gone. I am unsubscribing. Thank you @godog! [22:55:37] I think maybe the key is used to regernate another key that maybe only lasts ten minutes or something [22:55:41] seems to be related to the sidebar [22:56:13] 10Operations, 10Mail: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10Dzahn) @bcampbell gary@ is indeed in the config on our side, it is an alias for box6699@, along with pat@. box6699@ itself is an alias for: mdennis@, archive01@. mdennis@ is a Google... [22:56:27] maybe [22:56:41] can you share the available info? [22:57:28] (03PS2) 10Andrew Bogott: Make labnodepool1001.eqiad.wmnet a spare system [puppet] - 10https://gerrit.wikimedia.org/r/473838 (https://phabricator.wikimedia.org/T209642) (owner: 10Hashar) [22:57:33] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10Dzahn) [22:57:45] Platonides: https://dpaste.de/5sWr [22:58:29] (03CR) 10Andrew Bogott: [C: 032] Make labnodepool1001.eqiad.wmnet a spare system [puppet] - 10https://gerrit.wikimedia.org/r/473838 (https://phabricator.wikimedia.org/T209642) (owner: 10Hashar) [22:58:47] looking [22:58:50] In the last 4 hours, enwiki had 1,120,514 php notices related to this [22:59:02] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10Dzahn) Let's clean it up all at once and also do something with pat@ what about box6699@ in general. and what about the OTRS queue "archive01". Added Legal. ` 282 ## L... 
[23:00:03] bawolff, someone came in with a weird exception on a history page earlier, I told them to open a ticket but I'm not sure they did [23:00:33] Krenair: I don't think this is triggering exceptions, just a bunch of php notices about invalid unserialize and invalid foreach [23:01:19] it's complaining about r:46 [23:01:24] which should be an object reference.. [23:01:33] 10Operations, 10Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144 (10Dzahn) [23:02:15] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10Dzahn) [23:03:36] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10Krenair) Interesting, I've never noticed that queue before. Earliest record I've found is Cbrown1023 changing it around October 2010. [23:07:07] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10Dzahn) @bcampbell btw, if you are wondering about those aliases again, we have an automatic cron job that sends the entire alias file to officeit@wikimedia.org once per... [23:07:57] Is this something that should stop me and sbassett from deploying the thing we were deploying, or can we safely ignore it, do you think? [23:11:43] what were you deploying? 
[23:12:13] (03PS1) 10BBlack: Depool ulsfo, overlapping circuit maint [dns] - 10https://gerrit.wikimedia.org/r/475909 [23:12:30] Just a rights change (rm unblockself from sysop group) [23:12:38] Not anything that could remotely be related to the error [23:12:51] Platonides: something unrelated to https://includes/Message.php: https://gerrit.wikimedia.org/r/475808/ [23:13:13] * includes/Message.php [23:13:39] (03PS1) 10Ayounsi: Disable traffic to ulsfo for providers maintenance [dns] - 10https://gerrit.wikimedia.org/r/475910 [23:13:41] (03CR) 10Hashar: "Jenkins has a ZeroMQ publisher that was consumed by Nodepool. Now that Nodepool is disabled, we no more need the publisher and hence the I" [puppet] - 10https://gerrit.wikimedia.org/r/473846 (https://phabricator.wikimedia.org/T209361) (owner: 10Hashar) [23:14:11] (03CR) 10BBlack: [C: 031] Disable traffic to ulsfo for providers maintenance [dns] - 10https://gerrit.wikimedia.org/r/475910 (owner: 10Ayounsi) [23:14:35] (03CR) 10Ayounsi: [C: 032] Disable traffic to ulsfo for providers maintenance [dns] - 10https://gerrit.wikimedia.org/r/475910 (owner: 10Ayounsi) [23:14:37] (03Abandoned) 10BBlack: Depool ulsfo, overlapping circuit maint [dns] - 10https://gerrit.wikimedia.org/r/475909 (owner: 10BBlack) [23:15:19] !log depool ulsfo for transport providers maintenance [23:15:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:16:23] LGTM [23:16:52] was reading T150826 [23:16:53] T150826: Remove unblockself right on wikimedia wikis - https://phabricator.wikimedia.org/T150826 [23:17:06] and nicely, it links to T210192 [23:17:28] Ok, sounds good. 
Lets do this [23:24:22] !log sbassett@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Remove unblockself right everywhere 2046d8df3 T150826 (duration: 00m 47s) [23:24:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:24:26] T150826: Remove unblockself right on wikimedia wikis - https://phabricator.wikimedia.org/T150826 [23:24:53] 10Operations, 10Analytics, 10SRE-Access-Requests, 10Patch-For-Review: Allow access to Data Lake/Hive for Niharika - https://phabricator.wikimedia.org/T210022 (10Niharika) 05Open>03Resolved Thanks all! [23:25:23] (03PS2) 10Cwhite: mw_rc_irc: ensure diamond::collector absent [puppet] - 10https://gerrit.wikimedia.org/r/475009 (https://phabricator.wikimedia.org/T183454) [23:27:16] (03CR) 10Cwhite: [C: 032] mw_rc_irc: ensure diamond::collector absent [puppet] - 10https://gerrit.wikimedia.org/r/475009 (https://phabricator.wikimedia.org/T183454) (owner: 10Cwhite) [23:29:10] (03PS4) 10Zoranzoki21: Delete 'Импортировано' namespace from ru.wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475367 (https://phabricator.wikimedia.org/T210171) [23:34:56] 10Operations, 10ops-codfw, 10netops: codfw row D recable and add QFX - https://phabricator.wikimedia.org/T210467 (10ayounsi) p:05Triage>03Normal [23:35:10] 10Operations, 10ops-codfw, 10netops: codfw row D recable and add QFX - https://phabricator.wikimedia.org/T210467 (10ayounsi) [23:35:12] 10Operations, 10ops-codfw, 10netops: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (10ayounsi) [23:35:27] 10Operations, 10ops-codfw, 10netops: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (10ayounsi) [23:42:31] 10Operations, 10Analytics, 10EventBus, 10WMF-JobQueue, and 5 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10Smalyshev) To make the problem worse, looks 
like breakage started more than 30 days ago - which means we don't have a record of the old delete... [23:43:08] (03PS1) 10MaxSem: Enable SVGs in page language everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475919 (https://phabricator.wikimedia.org/T208899) [23:44:05] (03PS1) 10Thcipriani: Scap: upgrade to 3.8.9-1 [puppet] - 10https://gerrit.wikimedia.org/r/475920 (https://phabricator.wikimedia.org/T210469) [23:46:02] 10Operations, 10Release-Engineering-Team, 10Scap, 10Patch-For-Review: Update Debian Package for Scap to 3.8.9-1 - https://phabricator.wikimedia.org/T210469 (10thcipriani) [23:48:45] !log temporarily disabling puppet on stat1007 to copy over eventbus validation logs [23:48:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:50:55] !log temporarily disabling puppet on stat1007 to copy over eventbus validation logs (not using stat1007 after all) [23:50:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log