[00:02:35] (CR) Ebe123: [C: -1] "Almost! :)" (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/478570 (owner: Robingan7)
[00:26:16] (PS2) Krinkle: Fix bad namespace number for yuewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/477856 (https://phabricator.wikimedia.org/T205546) (owner: Urbanecm)
[00:26:33] (CR) Krinkle: [C: 2] Fix bad namespace number for yuewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/477856 (https://phabricator.wikimedia.org/T205546) (owner: Urbanecm)
[00:31:39] (PS10) Krinkle: Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: Urbanecm)
[00:32:09] (PS11) Krinkle: tests: Assert that extra namespaces have correspondent talk namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: Urbanecm)
[00:39:35] (PS3) Krinkle: Fix bad namespace number for yuewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/477856 (https://phabricator.wikimedia.org/T205546) (owner: Urbanecm)
[00:39:53] (CR) Krinkle: [C: 2] Fix bad namespace number for yuewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/477856 (https://phabricator.wikimedia.org/T205546) (owner: Urbanecm)
[00:40:56] (Merged) jenkins-bot: Fix bad namespace number for yuewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/477856 (https://phabricator.wikimedia.org/T205546) (owner: Urbanecm)
[00:45:22] (CR) Ebe123: [C: -1] Upload some new logos (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/478570 (owner: Robingan7)
[00:46:48] (CR) Ebe123: [C: -1] "* Please rename wikitest to testwiki." [mediawiki-config] - https://gerrit.wikimedia.org/r/478498 (owner: Robingan7)
[00:48:30] (CR) jenkins-bot: Fix bad namespace number for yuewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/477856 (https://phabricator.wikimedia.org/T205546) (owner: Urbanecm)
[00:55:21] (CR) Krinkle: [C: 2] tests: Assert that extra namespaces have correspondent talk namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: Urbanecm)
[00:55:25] (CR) Krinkle: [C: 2] "Thanks!" [mediawiki-config] - https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: Urbanecm)
[00:55:53] !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Ic07ff9acfbe17 - T211529, T205546 (duration: 00m 47s)
[00:55:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:55:58] T211529: ApiQuerySiteinfo.php: PHP Notice: Undefined index: 103 - https://phabricator.wikimedia.org/T211529
[00:55:59] T205546: Create Wiktionary Cantonese - https://phabricator.wikimedia.org/T205546
[00:56:25] (CR) jerkins-bot: [V: -1] tests: Assert that extra namespaces have correspondent talk namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: Urbanecm)
[01:07:22] (PS10) Krinkle: mediawiki/hhvm: Move fatal-error.php to Puppet [puppet] - https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114)
[01:08:14] (PS3) Krinkle: [WIP] errorpages: Remove unused hhvm-fatal-error.php [mediawiki-config] - https://gerrit.wikimedia.org/r/412829 (https://phabricator.wikimedia.org/T113114)
[01:08:19] (PS4) Krinkle: [WIP] errorpages: Remove unused hhvm-fatal-error.php [mediawiki-config] - https://gerrit.wikimedia.org/r/412829 (https://phabricator.wikimedia.org/T113114)
[01:08:32] (PS5) Krinkle: errorpages: Remove unused hhvm-fatal-error.php file [mediawiki-config] - https://gerrit.wikimedia.org/r/412829 (https://phabricator.wikimedia.org/T113114)
[01:08:40] (PS6) Krinkle: errorpages: Remove unused hhvm-fatal-error.php file [mediawiki-config] - https://gerrit.wikimedia.org/r/412829 (https://phabricator.wikimedia.org/T113114)
[01:09:02] (PS12) Krinkle: tests: Assert that extra namespaces have correspondent talk namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: Urbanecm)
[01:28:51] (PS11) Krinkle: mediawiki/hhvm: Move fatal-error.php to Puppet [puppet] - https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114)
[01:31:57] (PS1) Robingan7: Add several logos [mediawiki-config] - https://gerrit.wikimedia.org/r/478571
[02:09:31] (PS12) Krinkle: mediawiki/hhvm: Move fatal-error.php to Puppet [puppet] - https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114)
[02:18:13] (PS1) Krinkle: errorpages: Remove unused php-fatal-error.html file [mediawiki-config] - https://gerrit.wikimedia.org/r/478575 (https://phabricator.wikimedia.org/T113114)
[02:18:49] (PS2) Krinkle: errorpages: Remove unused php-fatal-error.html file [mediawiki-config] - https://gerrit.wikimedia.org/r/478575 (https://phabricator.wikimedia.org/T113114)
[02:19:37] (PS3) Krinkle: errorpages: Remove unused php-fatal-error.html file [mediawiki-config] - https://gerrit.wikimedia.org/r/478575 (https://phabricator.wikimedia.org/T113114)
[02:20:04] (PS13) Krinkle: mediawiki: Move hhvm-fatal-error.php to Puppet [puppet] - https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114)
[02:24:32] Operations, MediaWiki-ResourceLoader, Performance-Team, Traffic: Investigate source of 404 Not Found responses from load.php - https://phabricator.wikimedia.org/T202479 (Krinkle) Open>stalled a:Krinkle>None @ema @BBlack This outcome of this task should be for the mtail/varnishrls...
[02:24:45] Operations, Performance-Team, Traffic: Investigate source of 404 Not Found responses from load.php - https://phabricator.wikimedia.org/T202479 (Krinkle)
[02:37:32] PROBLEM - cassandra-a SSL 10.192.32.137:7001 on restbase2004 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[02:38:06] PROBLEM - cassandra-a CQL 10.192.32.137:9042 on restbase2004 is CRITICAL: connect to address 10.192.32.137 and port 9042: Connection refused
[02:53:33] !log decommissioning cassandra-b, restbase2004 -- T210843
[02:53:34] Operations, ops-codfw, Core Platform Team, Services (doing), and 2 others: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 (Eevans)
[02:53:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:53:37] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843
[03:29:50] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 885.57 seconds
[03:44:28] PROBLEM - MegaRAID on db1063 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[03:44:40] Operations, ops-eqiad: Degraded RAID on db1063 - https://phabricator.wikimedia.org/T211537 (ops-monitoring-bot)
[03:57:22] (Abandoned) Chad: Revert "Remove unblockself rights everywhere" [mediawiki-config] - https://gerrit.wikimedia.org/r/475947 (owner: Chad)
[03:58:28] (Abandoned) Chad: Setup apache vhost on scap proxies as well [puppet] - https://gerrit.wikimedia.org/r/344221 (https://phabricator.wikimedia.org/T147938) (owner: Chad)
[03:58:31] (Abandoned) Chad: Beta: Cron to update wmf-config every 3 minutes [puppet] - https://gerrit.wikimedia.org/r/414893 (owner: Chad)
[03:58:48] (Abandoned) Chad: WIP: Initial crappy implementation of Github repo creation [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/422429 (owner: Chad)
[03:58:55] (Abandoned) Chad: Gerrit: Further clean up file ownership [puppet] - https://gerrit.wikimedia.org/r/423796 (owner: Chad)
[03:59:03] (Abandoned) Chad: Gerrit: Run directly from deployment location [puppet] - https://gerrit.wikimedia.org/r/423801 (owner: Chad)
[03:59:28] (Abandoned) Chad: mwdeploy: Ensure home directory exists on all machines [puppet] - https://gerrit.wikimedia.org/r/427188 (owner: Chad)
[03:59:35] (Abandoned) Chad: Greatly simplify svn.wikimedia.org redirects [puppet] - https://gerrit.wikimedia.org/r/429449 (owner: Chad)
[03:59:39] (Abandoned) Chad: Apache redirects: rewrite all WMF URLs to https [puppet] - https://gerrit.wikimedia.org/r/429452 (owner: Chad)
[03:59:44] (Abandoned) Chad: Gerrit: Preemptively set Gerrit elasticsearch config [puppet] - https://gerrit.wikimedia.org/r/431664 (owner: Chad)
[04:20:35] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 219.33 seconds
[05:55:29] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[05:57:55] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[06:17:59] (PS1) Marostegui: dbproxy1010: Depool labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/478582 (https://phabricator.wikimedia.org/T86338)
[06:18:55] (PS1) Marostegui: db-eqiad.php: Depool db1121 [mediawiki-config] - https://gerrit.wikimedia.org/r/478583 (https://phabricator.wikimedia.org/T86338)
[06:20:06] (CR) Marostegui: [C: 2] dbproxy1010: Depool labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/478582 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[06:21:44] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1121 [mediawiki-config] - https://gerrit.wikimedia.org/r/478583 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[06:22:48] (Merged) jenkins-bot: db-eqiad.php: Depool db1121 [mediawiki-config] - https://gerrit.wikimedia.org/r/478583 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[06:23:12] !log Reload haproxy on dbproxy1010 to depool labsdb1010 - T86338
[06:23:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:23:17] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[06:24:42] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1121 T86338 T202167 (duration: 00m 49s)
[06:24:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:24:47] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167
[06:25:25] !log Deploy schema change on db1121 with replication (this will generate lag on labs) - T86338 T202167
[06:25:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:26:41] (CR) jenkins-bot: db-eqiad.php: Depool db1121 [mediawiki-config] - https://gerrit.wikimedia.org/r/478583 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui)
[06:28:56] ACKNOWLEDGEMENT - MegaRAID on db1063 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Marostegui T211537 - The acknowledgement expires at: 2018-12-13 06:28:42.
[06:29:23] PROBLEM - puppet last run on an-worker1084 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/share/ca-certificates/DigiCert_High_Assurance_CA-3.crt]
[06:30:51] PROBLEM - puppet last run on aqs1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt-upgrade-activity]
[06:31:29] PROBLEM - puppet last run on mw1319 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled]
[06:32:56] Operations, ops-eqiad, DBA: Degraded RAID on db1063 - https://phabricator.wikimedia.org/T211537 (Marostegui) p:Triage>High a:Cmjohnson @Cmjohnson I am setting this to high priority because there is one failed disk and another one with smart errors (on a different SPAN). Let's **replace on...
[06:35:53] Operations, DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[06:45:44] (CR) Giuseppe Lavagetto: [C: 2] hiera: remove the role backend in production [puppet] - https://gerrit.wikimedia.org/r/475499 (owner: Giuseppe Lavagetto)
[06:45:54] (PS3) Giuseppe Lavagetto: hiera: remove the role backend in production [puppet] - https://gerrit.wikimedia.org/r/475499
[06:47:11] !log Stop slave on s4 on labsdb1011
[06:47:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:50:34] <_joe_> !log disabled puppet across the fleet for merge of hiera change
[06:50:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:54] <_joe_> !log running puppet on the puppetmasters in codfw, twice, then restarting apache to ensure cleanup of any cache
[06:52:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:53:19] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1121" [mediawiki-config] - https://gerrit.wikimedia.org/r/478586
[07:01:15] (PS1) Giuseppe Lavagetto: Revert "hiera: remove the role backend in production" [puppet] - https://gerrit.wikimedia.org/r/478587
[07:01:58] (CR) jerkins-bot: [V: -1] Revert "hiera: remove the role backend in production" [puppet] - https://gerrit.wikimedia.org/r/478587 (owner: Giuseppe Lavagetto)
[07:04:56] <_joe_> lol
[07:05:36] (CR) Giuseppe Lavagetto: [V: 2 C: 2] Revert "hiera: remove the role backend in production" [puppet] - https://gerrit.wikimedia.org/r/478587 (owner: Giuseppe Lavagetto)
[07:18:04] <_joe_> !log reenabling puppet given my changes were useless
[07:18:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:21:23] PROBLEM - puppet last run on an-worker1084 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/share/ca-certificates/DigiCert_High_Assurance_CA-3.crt]
[07:22:48] (PS1) Giuseppe Lavagetto: profile::mediawiki::php: install php-tideways, php-mongodb [puppet] - https://gerrit.wikimedia.org/r/478594 (https://phabricator.wikimedia.org/T206152)
[07:22:50] (PS2) Zoranzoki21: Add http://idb.ub.uni-tuebingen.de/digitue to the wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/478466 (https://phabricator.wikimedia.org/T211466)
[07:22:51] PROBLEM - puppet last run on aqs1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt-upgrade-activity]
[07:22:54] (PS2) Zoranzoki21: Remove FlaggedRevs for ptwikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/478465 (https://phabricator.wikimedia.org/T211433)
[07:23:31] PROBLEM - puppet last run on mw1319 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled]
[07:25:01] ^ that gets fixed with a second puppet run - I just tried
[07:26:33] RECOVERY - puppet last run on an-worker1084 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[07:26:59] did the same with aqs1005, all good
[07:27:18] (CR) Elukey: [C: 2] Add AAAA records for analytics103* eqiad hosts [dns] - https://gerrit.wikimedia.org/r/478220 (owner: Elukey)
[07:28:03] RECOVERY - puppet last run on aqs1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:28:41] RECOVERY - puppet last run on mw1319 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:43:24] (CR) Elukey: [C: 2] profile::cache::kafka::alerts: set more sensitive thresholds [puppet] - https://gerrit.wikimedia.org/r/478210 (https://phabricator.wikimedia.org/T210939) (owner: Elukey)
[07:43:31] (PS3) Elukey: profile::cache::kafka::alerts: set more sensitive thresholds [puppet] - https://gerrit.wikimedia.org/r/478210 (https://phabricator.wikimedia.org/T210939)
[07:54:06] Operations, Puppet, puppet-compiler: Cleanup the puppetmaster module so that we stop breaking expectations (and the puppet compiler) - https://phabricator.wikimedia.org/T211547 (Joe)
[07:59:59] (CR) Muehlenhoff: [C: 1] remove diamond::collector reference from role::labs::nfs::secondary [puppet] - https://gerrit.wikimedia.org/r/478371 (https://phabricator.wikimedia.org/T183454) (owner: Cwhite)
[08:23:23] (CR) Elukey: [C: 2] Add AAAA records for analytics104* eqiad hosts [dns] - https://gerrit.wikimedia.org/r/478221 (owner: Elukey)
[08:23:35] !log final round of weight addition to new ms-be codfw hosts - T209395
[08:23:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:39] T209395: rack/setup/install new ms-be servers ms-be204[4-9] ,ms-be2050 - https://phabricator.wikimedia.org/T209395
[08:24:01] (PS1) Marostegui: Revert "dbproxy1010: Depool labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/478609
[08:24:46] (PS2) Marostegui: Revert "dbproxy1010: Depool labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/478609
[08:27:23] (PS1) Joal: Bump AQS druid datasource to new 2018-11 snapshot [puppet] - https://gerrit.wikimedia.org/r/478610
[08:27:30] elukey: --^ :)
[08:28:54] (CR) Elukey: [C: 2] Bump AQS druid datasource to new 2018-11 snapshot [puppet] - https://gerrit.wikimedia.org/r/478610 (owner: Joal)
[08:33:53] (CR) Filippo Giunchedi: "afaict the role 'bastion' isn't used/present, changing 'cluster' in bastionhost ::pop and ::general should work tho" [puppet] - https://gerrit.wikimedia.org/r/478372 (https://phabricator.wikimedia.org/T210486) (owner: Cwhite)
[08:34:22] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1121" [mediawiki-config] - https://gerrit.wikimedia.org/r/478586 (owner: Marostegui)
[08:35:26] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1121" [mediawiki-config] - https://gerrit.wikimedia.org/r/478586 (owner: Marostegui)
[08:36:29] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1121 T86338 T202167 (duration: 00m 51s)
[08:36:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:34] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[08:36:34] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167
[08:38:39] (CR) Filippo Giunchedi: [C: 1] "LGTM! I have a concern we might spam prometheus with metrics though we'll see when/if we get there." [puppet] - https://gerrit.wikimedia.org/r/478225 (https://phabricator.wikimedia.org/T209863) (owner: CDanis)
[08:39:06] !log roll restart of aqs on aqs100* to pick up new Druid backend settings
[08:39:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:08] (CR) Filippo Giunchedi: [C: 1] wmcs: add prometheus-memcached-exporter [puppet] - https://gerrit.wikimedia.org/r/477620 (https://phabricator.wikimedia.org/T147326) (owner: Cwhite)
[08:43:58] (CR) Elukey: [C: 2] Add AAAA records for analytics10[567]* eqiad hosts [dns] - https://gerrit.wikimedia.org/r/478222 (owner: Elukey)
[08:45:08] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1121" [mediawiki-config] - https://gerrit.wikimedia.org/r/478586 (owner: Marostegui)
[08:46:30] Operations, User-jijiki: Create a mediawiki::cronjob define - https://phabricator.wikimedia.org/T211250 (fgiunchedi) I recommend sending cronjobs output to logstash (as well as files?), when cronjobs are logging to syslog you can opt-in via `./modules/profile/files/rsyslog/lookup_table_output.json`
[08:52:06] Operations, User-jijiki: Create a mediawiki::cronjob define - https://phabricator.wikimedia.org/T211250 (elukey) We (analytics) have been trying to move away from crons in favor of systemd timers, adding some automation in `profile::analytics::systemd_timer`. It shouldn't need too much work to be genera...
[08:53:16] (PS4) Muehlenhoff: Add kerberos puppet wrapper [puppet] - https://gerrit.wikimedia.org/r/477987
[08:55:54] (PS3) Marostegui: Revert "dbproxy1010: Depool labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/478609
[08:56:43] (CR) Marostegui: [C: 2] Revert "dbproxy1010: Depool labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/478609 (owner: Marostegui)
[08:57:43] !log Repool labsdb1010 - T86338
[08:57:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:47] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[08:58:02] !log installing chromium security updates on proton*
[08:58:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:58:22] (CR) Elukey: Add kerberos puppet wrapper (1 comment) [puppet] - https://gerrit.wikimedia.org/r/477987 (owner: Muehlenhoff)
[08:59:54] (PS1) Marostegui: dbproxy1010: Depool labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/478616 (https://phabricator.wikimedia.org/T202167)
[09:00:50] (CR) Marostegui: [C: 2] dbproxy1010: Depool labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/478616 (https://phabricator.wikimedia.org/T202167) (owner: Marostegui)
[09:01:04] (PS5) Muehlenhoff: Add kerberos puppet wrapper [puppet] - https://gerrit.wikimedia.org/r/477987
[09:01:57] !log Depool labsdb1011 - T86338
[09:02:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:47] (PS1) Filippo Giunchedi: Revert "LabsServices: ship logs locally" [mediawiki-config] - https://gerrit.wikimedia.org/r/478617 (https://phabricator.wikimedia.org/T205851)
[09:07:40] !log Deploy schema change on s8 codfw master with replication (db2045) - lag will be generated on codfw - T202167 T86338
[09:07:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:45] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[09:07:45] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167
[09:12:07] (PS6) Muehlenhoff: Add kerberos puppet wrapper [puppet] - https://gerrit.wikimedia.org/r/477987
[09:13:55] (CR) Elukey: [C: 1] Add kerberos puppet wrapper [puppet] - https://gerrit.wikimedia.org/r/477987 (owner: Muehlenhoff)
[09:19:09] (CR) Filippo Giunchedi: [C: 2] Revert "LabsServices: ship logs locally" [mediawiki-config] - https://gerrit.wikimedia.org/r/478617 (https://phabricator.wikimedia.org/T205851) (owner: Filippo Giunchedi)
[09:29:05] (CR) jenkins-bot: Revert "LabsServices: ship logs locally" [mediawiki-config] - https://gerrit.wikimedia.org/r/478617 (https://phabricator.wikimedia.org/T205851) (owner: Filippo Giunchedi)
[09:36:35] (PS1) Filippo Giunchedi: logging: special-case CeeFormatter when sending logstash to localhost [mediawiki-config] - https://gerrit.wikimedia.org/r/478621 (https://phabricator.wikimedia.org/T211124)
[09:41:07] (PS1) Volans: comments: uniform and add missing Ganeti comments [dns] - https://gerrit.wikimedia.org/r/478622 (https://phabricator.wikimedia.org/T182028)
[09:47:28] (PS9) Filippo Giunchedi: rsyslog: add UDP localhost compatibility endpoint [puppet] - https://gerrit.wikimedia.org/r/475352 (https://phabricator.wikimedia.org/T205851)
[09:47:30] (PS4) Filippo Giunchedi: logstash: add new logging kafka consumer [puppet] - https://gerrit.wikimedia.org/r/476472 (https://phabricator.wikimedia.org/T205851)
[09:47:32] (PS4) Filippo Giunchedi: logstash: copy 'severity' into 'level' where needed [puppet] - https://gerrit.wikimedia.org/r/476473 (https://phabricator.wikimedia.org/T205851)
[09:47:43] (CR) Volans: [C: 1] "LGTM, two nitpicks inline, no need to re-review." (4 comments) [software/netbox-reports] - https://gerrit.wikimedia.org/r/478458 (https://phabricator.wikimedia.org/T205899) (owner: CRusnov)
[09:51:23] (CR) Volans: [C: -1] "It seems a too old version to me." (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/477958 (owner: Mathew.onipe)
[09:53:23] (PS1) Elukey: Add two new HDFS journalnodes to the Analytics Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/478623 (https://phabricator.wikimedia.org/T209929)
[10:09:21] (PS1) Muehlenhoff: Enable Kerberos wrapper for cdh::hadoop::directory [puppet/cdh] - https://gerrit.wikimedia.org/r/478625
[10:20:42] (PS1) Odder: Add localised logos for the Chewa Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/478628 (https://phabricator.wikimedia.org/T211570)
[10:21:49] Operations, Prod-Kubernetes, Continuous-Integration-Infrastructure (shipyard), Kubernetes, Patch-For-Review: debianize docker-registry 2.7.0-rc0 and upload in stretch-wikimedia - https://phabricator.wikimedia.org/T210071 (fselles)
[10:22:07] Operations, Prod-Kubernetes, Continuous-Integration-Infrastructure (shipyard), Kubernetes: improve docker registry architecture - https://phabricator.wikimedia.org/T209271 (fselles)
[10:22:15] Operations, User-jijiki: Create a mediawiki::cronjob define - https://phabricator.wikimedia.org/T211250 (jijiki) @elukey @fgiunchedi we should definitely take into account logging to logstash and using systemd timers, tx!
[10:22:30] Operations, Patch-For-Review, cloud-services-team (Kanban): Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 (aborrero) Open>Resolved Thanks @Bstorm and @GTirloni you both did most of the heavy work :-) I'm closing the task now as done, since `labstore1003.eqiad.wmnet...
[10:23:34] Operations, Thumbor, Performance-Team (Radar), User-jijiki: Assess Thumbor upgrade options - https://phabricator.wikimedia.org/T209886 (jijiki)
[10:24:28] Operations, Citoid, Prod-Kubernetes, Core Platform Team Backlog (Watching / External), Services (watching): Citoid automated monitoring times out due to Zotero v2 - https://phabricator.wikimedia.org/T211411 (fselles) a:fselles
[10:25:01] Operations, Citoid, Prod-Kubernetes, Core Platform Team Backlog (Watching / External), Services (watching): Citoid automated monitoring times out due to Zotero v2 - https://phabricator.wikimedia.org/T211411 (Mvolz) As a first pass maybe we should update Zotero? It's a few months old now. A...
[10:25:05] (PS1) Marostegui: Revert "dbproxy1010: Depool labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/478629
[10:26:42] (PS1) Michael Große: Perform even more PHP constraint checks before falling back [mediawiki-config] - https://gerrit.wikimedia.org/r/478630 (https://phabricator.wikimedia.org/T209504)
[10:27:11] Operations, Thumbor, Performance-Team (Radar), User-jijiki: Assess Thumbor upgrade options - https://phabricator.wikimedia.org/T209886 (jijiki) 2018-12-05 12:53 jijiki: uploaded python-thumbor-community-core_0.4.0-1+deb9u1 to stretch-wikimedia 2018-12-05 16:27 jijiki: uploaded python-thumbor-wiki...
[10:27:30] Operations, Thumbor, Performance-Team (Radar), User-jijiki: Assess Thumbor upgrade options - https://phabricator.wikimedia.org/T209886 (jijiki)
[10:30:16] jan_drewniak: #bothumor I � Unicode. All rise for Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181210T1030).
[10:31:47] (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/478631 (https://phabricator.wikimedia.org/T128546)
[10:31:58] (CR) jerkins-bot: [V: -1] Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/478631 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:33:09] (PS1) Odder: Add localised logos for the Chewa Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/478632 (https://phabricator.wikimedia.org/T211570)
[10:33:17] (CR) Marostegui: [C: 2] Revert "dbproxy1010: Depool labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/478629 (owner: Marostegui)
[10:33:54] (CR) jerkins-bot: [V: -1] Add localised logos for the Chewa Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/478632 (https://phabricator.wikimedia.org/T211570) (owner: Odder)
[10:35:38] !log Repool labsdb1011 T86338
[10:35:41] (Abandoned) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/478631 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:35:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:35:42] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338
[10:35:53] (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/478633 (https://phabricator.wikimedia.org/T128546)
[10:36:49] !log Deploy schema change on dbstore1002:s8 T86338 T202167
[10:36:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:36:53] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167
[10:38:03] (CR) Jdrewniak: [C: 2] Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/478633 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:38:48] (CR) Zoranzoki21: [C: 1] "This change looks good, but see comment and fix it, so you can get +2 from Jenkins." (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/478632 (https://phabricator.wikimedia.org/T211570) (owner: Odder)
[10:38:58] !log Deploy schema change on db1116:3318 T86338 T202167
[10:39:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:06] (Merged) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/478633 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:41:40] (PS2) Odder: Add localised logos for the Chewa Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/478632 (https://phabricator.wikimedia.org/T211570)
[10:41:52] PROBLEM - cassandra-b SSL 10.192.32.138:7001 on restbase2004 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[10:41:53] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:478633| Bumping portals to master (T128546)]] (duration: 00m 52s)
[10:41:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:41:56] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[10:42:30] PROBLEM - cassandra-b CQL 10.192.32.138:9042 on restbase2004 is CRITICAL: connect to address 10.192.32.138 and port 9042: Connection refused
[10:42:40] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:478633| Bumping portals to master (T128546)]] (duration: 00m 46s)
[10:42:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:54] (CR) Amire80: [C: 1] "Thank you!" [mediawiki-config] - https://gerrit.wikimedia.org/r/478628 (https://phabricator.wikimedia.org/T211570) (owner: Odder)
[10:43:43] (CR) Zoranzoki21: [C: 1] "Yes :) LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/478632 (https://phabricator.wikimedia.org/T211570) (owner: Odder)
[10:49:01] (CR) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/478633 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[11:04:48] !log decommissioning cassandra-c, restbase2004 -- T210843
[11:04:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:04:52] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843
[11:12:30] (PS1) Giuseppe Lavagetto: role::beta: introduce docker_services [puppet] - https://gerrit.wikimedia.org/r/478637
[11:13:00] (CR) Giuseppe Lavagetto: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/478637 (owner: Giuseppe Lavagetto)
[11:15:21] (CR) Alexandros Kosiaris: [C: -1] admins: add new group for proton admins (2 comments) [puppet] - https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) (owner: Dzahn)
[11:16:33] Operations, Research, Patch-For-Review, User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (Marostegui) Why not using mwmaint1002 for the import? I just checked and it can reach m2-master fine.
[11:23:41] (03PS1) 10Ema: trafficserver (8.0.1-1wm1) stretch-wikimedia; urgency=medium [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/478640 (https://phabricator.wikimedia.org/T207048) [11:23:56] (03CR) 10Fsero: [C: 04-1] "have some questions about some details, overall looks OK to me :)" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/478637 (owner: 10Giuseppe Lavagetto) [11:24:23] the -1 _joe_ is just to get my questions answered as long as they are i will change my vote [11:26:32] (03CR) 10Muehlenhoff: admins: add new group for proton admins (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [11:32:36] (03CR) 10Alexandros Kosiaris: [C: 031] "Took a quick look, seems fine to me" [dns] - 10https://gerrit.wikimedia.org/r/478622 (https://phabricator.wikimedia.org/T182028) (owner: 10Volans) [11:33:24] PROBLEM - Check systemd state on ms-be1028 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [11:34:09] (03CR) 10Urbanecm: [C: 04-1] Add logos. 
(037 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478498 (owner: 10Robingan7) [11:34:22] (03CR) 10Urbanecm: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478571 (owner: 10Robingan7) [11:34:26] (03CR) 10Urbanecm: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478570 (owner: 10Robingan7) [11:34:31] (03CR) 10Urbanecm: [C: 04-1] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478498 (owner: 10Robingan7) [11:35:29] (03CR) 10jerkins-bot: [V: 04-1] Add several logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478571 (owner: 10Robingan7) [11:35:39] (03CR) 10jerkins-bot: [V: 04-1] Upload some new logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478570 (owner: 10Robingan7) [11:46:28] (03PS2) 10Muehlenhoff: Make Kerberos configurable for cdh::hadoop::namenode::primary [puppet/cdh] - 10https://gerrit.wikimedia.org/r/478625 [11:48:50] (03PS1) 10Fsero: Added fsero as icinga contact [puppet] - 10https://gerrit.wikimedia.org/r/478644 (https://phabricator.wikimedia.org/T208715) [11:49:30] (03CR) 10Fsero: "The last bit of my onboarding :)" [puppet] - 10https://gerrit.wikimedia.org/r/478644 (https://phabricator.wikimedia.org/T208715) (owner: 10Fsero) [11:50:49] (03CR) 10Giuseppe Lavagetto: role::beta: introduce docker_services (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/478637 (owner: 10Giuseppe Lavagetto) [11:51:08] (03PS2) 10Giuseppe Lavagetto: role::beta: introduce docker_services [puppet] - 10https://gerrit.wikimedia.org/r/478637 [11:54:36] 10Operations, 10Cloud-Services, 10Cloud-VPS, 10IPv6: Enable ipv6 on labs - https://phabricator.wikimedia.org/T37947 (10Lucas_Werkmeister_WMDE) [11:58:09] 10Operations, 10ops-eqiad: eqiad: Re-connect cage cameras - https://phabricator.wikimedia.org/T207965 (10faidon) @Cmjohnson all of the ports show as "physical link down", could you have a look? Thanks! 
[12:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the European Mid-day SWAT(Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181210T1200). [12:00:04] CFisch_WMDE, hoo, Urbanecm, and Zoranzoki21: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:41] ` If you break AND fix the wikis, you will be rewarded with a sticker.` D: [12:01:57] (03CR) 10Fsero: [C: 031] "nit: add the comment to remember to clean up the docker command, otherwise LGTM" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/478637 (owner: 10Giuseppe Lavagetto) [12:03:28] I can SWAT today [12:03:57] CFisch_WMDE, hoo, Urbanecm, and Zoranzoki21: around for SWAT? [12:04:02] yes [12:04:22] CFisch_NA, hoo: you are deployers, right? [12:04:25] (03CR) 10Alexandros Kosiaris: [C: 031] Added fsero as icinga contact (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478644 (https://phabricator.wikimedia.org/T208715) (owner: 10Fsero) [12:04:35] Yeah [12:04:38] feel free to deploy your patches yourselves [12:04:43] or do you want me to do it? 
[12:04:55] zeljkof: ah almost forgot [12:05:07] I'm not yet deployer [12:05:17] so please go ahead [12:05:34] hoo: you can go first while I get ready [12:05:39] Will do :) [12:06:09] (03CR) 10Hoo man: [C: 032] Remove the "wikibase-debug" log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478196 (https://phabricator.wikimedia.org/T207850) (owner: 10Hoo man) [12:09:52] hoo: you'll probably have to rebase 478196, gerrit says "merge conflict" [12:10:05] :S [12:10:25] RECOVERY - Check systemd state on ms-be1028 is OK: OK - running: The system is fully operational [12:10:30] (03PS2) 10Hoo man: Remove the "wikibase-debug" log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478196 (https://phabricator.wikimedia.org/T207850) [12:10:39] (03CR) 10Hoo man: [C: 032] Remove the "wikibase-debug" log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478196 (https://phabricator.wikimedia.org/T207850) (owner: 10Hoo man) [12:10:41] CFisch_NA: 477798 has -1 from jenkins :/ [12:11:43] (03Merged) 10jenkins-bot: Remove the "wikibase-debug" log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478196 (https://phabricator.wikimedia.org/T207850) (owner: 10Hoo man) [12:11:47] oh o.O [12:11:55] (03PS2) 10WMDE-Fisch: Set FileImporter config help location [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477798 (https://phabricator.wikimedia.org/T199108) [12:11:57] (03CR) 10Zfilipin: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477798 (https://phabricator.wikimedia.org/T199108) (owner: 10WMDE-Fisch) [12:12:11] hehe [12:12:15] CFisch_NA: looks like it's caused by a CI problem, rechecking [12:13:24] !log hoo@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Remove the "wikibase-debug" log channel (T207850) (duration: 00m 47s) [12:13:26] CFisch_NA: all good, no problems with PS2 [12:13:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:13:29] T207850: Consolidate the log groups used 
within Wikibase & Wikibase extensions. - https://phabricator.wikimedia.org/T207850 [12:13:36] I'm done [12:13:45] hoo: great, I'll continue with SWAT [12:14:01] CFisch_NA: please stand by, I'll let you know when the patch is at mwdebug [12:14:17] cool thanks [12:14:38] (03CR) 10Fsero: [C: 032] Added fsero as icinga contact [puppet] - 10https://gerrit.wikimedia.org/r/478644 (https://phabricator.wikimedia.org/T208715) (owner: 10Fsero) [12:14:55] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477798 (https://phabricator.wikimedia.org/T199108) (owner: 10WMDE-Fisch) [12:15:57] (03Merged) 10jenkins-bot: Set FileImporter config help location [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477798 (https://phabricator.wikimedia.org/T199108) (owner: 10WMDE-Fisch) [12:17:51] CFisch_NA: the patch is at mwdebug1002 [12:18:25] zeljkof: thanks should be fine :-) [12:18:56] (03CR) 10jenkins-bot: Remove the "wikibase-debug" log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478196 (https://phabricator.wikimedia.org/T207850) (owner: 10Hoo man) [12:18:57] CFisch_NA: ok to deploy? [12:18:58] (03CR) 10jenkins-bot: Set FileImporter config help location [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477798 (https://phabricator.wikimedia.org/T199108) (owner: 10WMDE-Fisch) [12:19:09] zeljkof: Yes, please. 
[12:20:24] CFisch_NA: ok [12:20:34] !log running puppet agent on icinga to add fsero [12:20:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:28] !log zfilipin@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:477798|Set FileImporter config help location (T199108)]] (duration: 00m 47s) [12:21:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:36] T199108: Block all imports if no configuration file for a source wiki exists - https://phabricator.wikimedia.org/T199108 [12:21:48] CFisch_NA: deployed, please test and thanks for deploying with #releng ;) [12:22:10] Urbanecm, and Zoranzoki21: around for swat? [12:22:17] sure [12:22:27] it's nice to see zeljkof doing EU SWAT again [12:22:35] things are getting back to normal +; [12:22:39] ;) [12:22:48] Urbanecm: I was on team offsite all last week :) [12:23:05] I know :) [12:23:11] I'm not blaming you of course [12:23:30] Urbanecm: hm, 477856 is already merged [12:23:39] I see [12:23:43] I'm just removing it [12:23:53] {{done}} [12:25:43] !log installing imagemagick security update for jessie [12:25:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:25:50] Urbanecm: 478476 is also merged :D [12:25:57] well, that was quick :) [12:26:11] hmm, didn't know :D [12:26:44] Zoranzoki21: last call for SWAT [12:27:16] zeljkof, I'll take over for his patches if he won't show [12:27:31] Urbanecm: ok, great, looks like he's not around [12:27:36] 10Operations, 10Thumbor, 10Performance-Team (Radar), 10User-jijiki: Assess Thumbor upgrade options - https://phabricator.wikimedia.org/T209886 (10Gilles) Looking good! On Beta thumbnails are generated by the Thumbor instance running on deployment-imagescaler01.deployment-prep.eqiad.wmflabs and deployment-... 
[12:27:56] (03CR) 10Ema: [C: 031] "LGTM and to pcc https://puppet-compiler.wmflabs.org/compiler1002/13878/" [puppet] - 10https://gerrit.wikimedia.org/r/478016 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [12:28:09] 10Operations, 10Patch-For-Review: Onboard Fabián Sellés Rosa to SRE - https://phabricator.wikimedia.org/T208715 (10fselles) [12:28:17] 10Operations, 10Patch-For-Review: Onboard Fabián Sellés Rosa to SRE - https://phabricator.wikimedia.org/T208715 (10fselles) 05Open>03Resolved [12:28:40] Hi [12:28:47] I am lating for SWAT? [12:28:50] Zoranzoki21: just in time for swat :) cc Urbanecm [12:29:08] ok, so looks I have nothing to deploy :( [12:29:17] Zoranzoki21: last minute :) Urbanecm said he'll test your changes if you're not around, but there you are [12:29:28] :D [12:29:39] Urbanecm: Be happy :) [12:29:44] :) [12:29:57] I just realized changes I scheduled were merged in the meanwhile :) [12:30:04] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478465 (https://phabricator.wikimedia.org/T211433) (owner: 10Zoranzoki21) [12:30:06] Ok. Let's go in new adventure zeljkof :) [12:30:14] adventure time! [12:30:32] Urbanecm: that's a good problem to have :) [12:31:07] (03Merged) 10jenkins-bot: Remove FlaggedRevs for ptwikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478465 (https://phabricator.wikimedia.org/T211433) (owner: 10Zoranzoki21) [12:31:51] (03CR) 10jenkins-bot: Remove FlaggedRevs for ptwikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478465 (https://phabricator.wikimedia.org/T211433) (owner: 10Zoranzoki21) [12:32:16] Zoranzoki21: 478465 is at mwdebug1002 [12:33:00] zeljkof: Ok.. [12:33:27] testing [12:34:49] Should be ok.. 
zeljkof: Check logs [12:35:11] zeljkof: let me know if the SWAT ends early by any chance, I'll deploy a small thing of my own if it does [12:35:41] gilles: two more patches, should be done soon [12:35:59] Zoranzoki21: logs look fine, deploying [12:36:42] zeljkof: Ok.. Second patch for CopyDomains no need testing, so you can push it directly [12:36:51] Zoranzoki21: ok [12:37:00] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a response was received [12:37:10] (03PS1) 10Gilles: Oversample performance survey on specific ruwiki articles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478656 (https://phabricator.wikimedia.org/T197607) [12:37:26] !log zfilipin@deploy1001 Synchronized dblists/flaggedrevs.dblist: SWAT: [[gerrit:478465|Remove FlaggedRevs for ptwikipedia (T211433)]] (duration: 00m 46s) [12:37:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:37:30] T211433: Remove FlaggedRevs for ptwikipedia - https://phabricator.wikimedia.org/T211433 [12:37:45] Zoranzoki21: 478465 is deployed, please test [12:37:49] Sure [12:37:58] (03PS2) 10Muehlenhoff: Remove Diamond from DNS roles [puppet] - 10https://gerrit.wikimedia.org/r/478016 (https://phabricator.wikimedia.org/T183454) [12:38:00] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy [12:38:07] Ok [12:38:11] hey guys, when SWAT is done, I have a tiny unbreak-soon re: donation banners I'd like to revert if that's ok: https://files.slack.com/files-pri/T024KLHS4-FEQPZ2CJ3/image.png [12:38:28] re:wikipedia.org portal [12:38:45] jan_drewniak: sure, please coordinate with gilles, he has something too [12:38:50] jan_drewniak: you go first [12:38:58] (03PS3) 10Zfilipin: Add http://idb.ub.uni-tuebingen.de/digitue to the wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478466 
(https://phabricator.wikimedia.org/T211466) (owner: 10Zoranzoki21) [12:39:52] gilles: ok thanks! I'll wait until SWAT is done of course [12:40:18] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478466 (https://phabricator.wikimedia.org/T211466) (owner: 10Zoranzoki21) [12:40:28] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received [12:41:21] (03Merged) 10jenkins-bot: Add http://idb.ub.uni-tuebingen.de/digitue to the wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478466 (https://phabricator.wikimedia.org/T211466) (owner: 10Zoranzoki21) [12:41:28] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [12:42:38] (03CR) 10Ema: [C: 032] trafficserver (8.0.1-1wm1) stretch-wikimedia; urgency=medium [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/478640 (https://phabricator.wikimedia.org/T207048) (owner: 10Ema) [12:42:50] (03PS1) 10Jdrewniak: Revert "Bumping portals to master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478659 [12:43:30] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:478466|Add http://idb.ub.uni-tuebingen.de/digitue to the wgCopyUploadsDomains (T211466)]] (duration: 00m 47s) [12:43:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:43:35] T211466: Please add http://idb.ub.uni-tuebingen.de/digitue to the wgCopyUploadsDomains whitelist of Wikimedia Commons - https://phabricator.wikimedia.org/T211466 [12:43:44] Zoranzoki21: 478466 is deployed [12:43:57] Yes, I saw [12:44:01] Thanks! [12:44:01] jan_drewniak, gilles: I'm done, SWAT is yours [12:44:18] Zoranzoki21: thanks for deploying with #releng :) [12:44:51] yw [12:44:51] zeljkof: ok thanks! gilles: I'll start mine, it's pretty quick. 
[12:45:04] (03CR) 10jenkins-bot: Add http://idb.ub.uni-tuebingen.de/digitue to the wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478466 (https://phabricator.wikimedia.org/T211466) (owner: 10Zoranzoki21) [12:45:14] (03CR) 10Jdrewniak: [C: 032] Revert "Bumping portals to master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478659 (owner: 10Jdrewniak) [12:46:19] (03Merged) 10jenkins-bot: Revert "Bumping portals to master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478659 (owner: 10Jdrewniak) [12:46:20] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received [12:46:24] (03CR) 10Muehlenhoff: [C: 032] Remove Diamond from DNS roles [puppet] - 10https://gerrit.wikimedia.org/r/478016 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [12:47:26] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [12:48:45] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:478659| Bumping portals to master (T128546)]] (duration: 00m 46s) [12:48:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:49] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [12:49:31] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:478659| Bumping portals to master (T128546)]] (duration: 00m 46s) [12:49:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:51:28] gilles: ok, I'm done, it's all yours [12:51:32] thanks [12:51:39] (03PS2) 10Gilles: Oversample performance survey on specific ruwiki articles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478656 (https://phabricator.wikimedia.org/T197607) [12:53:18] (03CR) 10Gilles: [C: 032] Oversample performance survey on specific ruwiki articles 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/478656 (https://phabricator.wikimedia.org/T197607) (owner: 10Gilles) [12:54:24] (03Merged) 10jenkins-bot: Oversample performance survey on specific ruwiki articles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478656 (https://phabricator.wikimedia.org/T197607) (owner: 10Gilles) [12:56:34] !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T187299 T197607 Oversample performance survey on specific ruwiki articles (duration: 00m 46s) [12:56:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:38] T197607: Add ability to oversample specific pages - https://phabricator.wikimedia.org/T197607 [12:57:05] and I'm done [12:57:51] (03CR) 10jenkins-bot: Revert "Bumping portals to master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478659 (owner: 10Jdrewniak) [12:57:53] (03CR) 10jenkins-bot: Oversample performance survey on specific ruwiki articles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478656 (https://phabricator.wikimedia.org/T197607) (owner: 10Gilles) [13:07:26] 10Operations, 10DNS, 10Operations-Software-Development, 10Traffic, 10Patch-For-Review: DNS repo: add CI checks for obvious configuration errors - https://phabricator.wikimedia.org/T182028 (10Volans) [13:08:32] (03PS1) 10Filippo Giunchedi: wmnet: remove unused ms-fe.esams.wmnet [dns] - 10https://gerrit.wikimedia.org/r/478663 (https://phabricator.wikimedia.org/T182028) [13:10:31] (03CR) 10Filippo Giunchedi: [C: 031] wmnet: remove unused ms-fe.esams.wmnet [dns] - 10https://gerrit.wikimedia.org/r/478663 (https://phabricator.wikimedia.org/T182028) (owner: 10Filippo Giunchedi) [13:17:21] 10Operations, 10ops-codfw, 10Core Platform Team, 10Services (doing), and 2 others: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 (10Eevans) [13:21:11] 10Operations, 10ops-eqiad: eqiad: Re-connect cage cameras - 
https://phabricator.wikimedia.org/T207965 (10Cmjohnson) @faidon the cameras are connected, I am wondering if these ports have PoE? [13:24:49] (03CR) 10Filippo Giunchedi: [C: 032] wmnet: remove unused ms-fe.esams.wmnet [dns] - 10https://gerrit.wikimedia.org/r/478663 (https://phabricator.wikimedia.org/T182028) (owner: 10Filippo Giunchedi) [13:27:09] (03CR) 10Filippo Giunchedi: "Straw man attempt, if you can think of better strategies please LMK" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478621 (https://phabricator.wikimedia.org/T211124) (owner: 10Filippo Giunchedi) [13:28:16] 10Operations, 10Release Pipeline: blubber template for nodejs should allow defining configuration files to copy to the container - https://phabricator.wikimedia.org/T211580 (10Joe) [13:29:44] 10Operations, 10ops-eqiad: eqiad: Re-connect cage cameras - https://phabricator.wikimedia.org/T207965 (10faidon) They don't, these aren't PoE switches. I didn't know these cameras required PoE. So, two options I suppose: - Use PoE injectors - Hook them up to (old) EX4200s. Are we using any of them for mgmt swi... [13:34:17] !log trafficserver 8.0.1-1wm1 uploaded to stretch-wikimedia T207048 [13:34:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:23] T207048: ATS production-ready as a backend cache layer - https://phabricator.wikimedia.org/T207048 [13:35:47] (03PS13) 10CDanis: kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 (https://phabricator.wikimedia.org/T209863) [13:36:40] (03CR) 10CDanis: [C: 032] "> LGTM! 
I have a concern we might spam prometheus with metrics though" [puppet] - 10https://gerrit.wikimedia.org/r/478225 (https://phabricator.wikimedia.org/T209863) (owner: 10CDanis) [13:38:53] 10Operations, 10MediaWiki-Debug-Logger, 10Performance-Team, 10Patch-For-Review: Set up request profiling for PHP 7 - https://phabricator.wikimedia.org/T206152 (10Joe) I did some benchmarks , using the same setup I used for T206341, with tideways enabled and disabled. I could not notice any clear trend besi... [13:40:36] 10Operations, 10Core Platform Team Backlog (Watching / External), 10Services (watching), 10User-fgiunchedi: Put restbase201[3-8] into conftool and LVS - https://phabricator.wikimedia.org/T211416 (10fgiunchedi) >>! In T211416#4805907, @mobrovac wrote: > They are independent, though. Cassandra doesn't go int... [13:45:55] this is strange. I submitted https://gerrit.wikimedia.org/r/478225, ran puppet-merge, ran sudo puppet agent --test on wezen.codfw, saw it update /etc/mtail/kernel.mtail, but puppet did not seem to notify the running mtail process of the change in any way... [13:47:38] mtail process there has been running since October, has nothing in its logs about reloading config files (NB I'm not sure whether to expect that or not); Puppet logs there indicate it changed file content but doesn't say anything about notifying anything [13:49:11] (03CR) 10Lucas Werkmeister (WMDE): "This change is ready for review." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478630 (https://phabricator.wikimedia.org/T209504) (owner: 10Michael Große) [13:51:35] (03CR) 10Lucas Werkmeister (WMDE): "Oops, sorry, I didn’t mean to remove the WIP status from this! Silly Gerrit…" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478630 (https://phabricator.wikimedia.org/T209504) (owner: 10Michael Große) [13:51:47] godog: it looks like... syslog::centralserver doesn't actually fill in the 'notify' section of mtail programs with anything when it instantiates them? 
[13:52:24] cdanis: mtail does pick up its programs on change iirc, it didn't do that? [13:52:29] it did not [13:52:38] and looking at the puppet files, I'm honestly not sure how it works [13:52:52] program.pp has a notify => $notify stanza -- although there's no obvious mechanism where $notify is filled in, as it is not part of the define mtail::program(...) up top [13:53:10] jouncebot: next [13:53:10] In 4 hour(s) and 6 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181210T1800) [13:53:23] and centralserver.pp, which instantiates the programs like kernel.mtail, does not set notify to anything either [13:54:18] I see notify explicitly filled in by some other programs -- e.g. the varnishmtail ones [13:55:54] (03PS1) 10Marostegui: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478668 (https://phabricator.wikimedia.org/T202167) [13:57:48] advice from someone who knows puppet better than I (read: at all) as to if the current configuration should work or not, and if not, how best to fix it, would be appreciated ;) [13:58:03] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478668 (https://phabricator.wikimedia.org/T202167) (owner: 10Marostegui) [13:59:08] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478668 (https://phabricator.wikimedia.org/T202167) (owner: 10Marostegui) [13:59:48] so notify is one of puppet metaparameters, I'm not sure at this point if being a metaparameter you are required to declare it in the define [14:00:29] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1099 T86338 T202167 (duration: 00m 46s) [14:00:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:34] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 
[14:00:34] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [14:00:35] yeah i have no idea either; on one hand I see ensure there; otoh it's there to provide a default [14:00:36] !log Deploy schema change db1099:3318 T86338 T202167 [14:00:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:01:57] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478668 (https://phabricator.wikimedia.org/T202167) (owner: 10Marostegui) [14:02:53] godog: how do you feel about program.pp providing a default notify = Service['mtail']? [14:04:05] cdanis: yup, sounds good to me! please verify that mtail isn't supposed to reload its programs otherwise we should report it as a bug [14:07:10] 10Operations, 10Release Pipeline: blubber template for nodejs should allow defining configuration files to copy to the container - https://phabricator.wikimedia.org/T211580 (10mobrovac) FYI, we can also have an alternative mount point for the config file, and can tell `service-runner` where to look for it by s... [14:07:28] (03PS1) 10CDanis: mtail::program notify Service['mtail'] by default [puppet] - 10https://gerrit.wikimedia.org/r/478669 [14:08:05] (03CR) 10jerkins-bot: [V: 04-1] mtail::program notify Service['mtail'] by default [puppet] - 10https://gerrit.wikimedia.org/r/478669 (owner: 10CDanis) [14:09:01] (03CR) 10CDanis: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/478669 (owner: 10CDanis) [14:09:26] (03CR) 10jerkins-bot: [V: 04-1] mtail::program notify Service['mtail'] by default [puppet] - 10https://gerrit.wikimedia.org/r/478669 (owner: 10CDanis) [14:10:16] can someone tell me what jenkins is upset about in https://integration.wikimedia.org/ci/job/operations-puppet-tests-stretch-docker/2459/console ? 
[14:12:03] (03PS1) 10Filippo Giunchedi: conftool: add restbase10[3-8] [puppet] - 10https://gerrit.wikimedia.org/r/478672 (https://phabricator.wikimedia.org/T211416) [14:12:05] cdanis: at first look it seems it failed to find a test log [14:12:05] (03PS1) 10Filippo Giunchedi: hieradata: add restbase10[3-8] to restbase [puppet] - 10https://gerrit.wikimedia.org/r/478673 (https://phabricator.wikimedia.org/T211416) [14:12:09] in /srv/workspace/puppet/.tox/log/* [14:12:30] Build step 'Execute shell' marked build as failure [14:14:35] (03CR) 10Mobrovac: [C: 04-1] admins: add new group for proton admins (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [14:15:16] I don't understand how my change touches anything that could produce a failure like that though :) [14:15:22] (03CR) 10Mobrovac: conftool: add restbase10[3-8] (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478672 (https://phabricator.wikimedia.org/T211416) (owner: 10Filippo Giunchedi) [14:15:58] (03CR) 10Mobrovac: hieradata: add restbase10[3-8] to restbase (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478673 (https://phabricator.wikimedia.org/T211416) (owner: 10Filippo Giunchedi) [14:16:32] godog: ^ :P [14:17:12] cdanis: : [14:17:14] 14:08:02 notify is a metaparam; this value will inherit to all contained resources in the mtail::program definition [14:17:17] I think it’s that [14:17:24] that's what I _want_ though ;) [14:18:02] https://puppet-compiler.wmflabs.org/compiler1002/13879/ puppet-compiler is happy with my change but I guess jenkins is running a separate linting step that might effectively have that warning as an error? [14:19:09] <_joe_> cdanis: yes, that's what puppet-lint complains about [14:19:30] <_joe_> cdanis: lemme look at your change [14:20:06] <_joe_> cdanis: oh ok, so I assume you should do something like [14:20:16] <_joe_> $notify_to = ... 
[14:20:38] (03PS1) 10Volans: labmon1002: add missing PTR for IPv6 [dns] - 10https://gerrit.wikimedia.org/r/478675 [14:20:47] <_joe_> and then add notify => $notify_to to the actual file resource [14:20:57] <_joe_> that you want to notify about [14:20:59] hm [14:21:07] <_joe_> that's to do things like puppet-lint likes them [14:21:18] <_joe_> the alternative is to ignore the linting rule I guess [14:21:31] is stuff like that common? I don't see any hits for notify_to or ensure_to in our puppet already [14:21:57] <_joe_> frankly, I don't think it's very common to make what you want to notify to configurable [14:22:26] <_joe_> cdanis: and I was sure what you wrote was a syntax error in puppet :P [14:22:37] I have zero frame of reference for what is and isn't common or acceptable syntax in puppet [14:22:56] <_joe_> ok lemme comment on the patch :) [14:23:00] (03PS2) 10Bearloga: shiny_server: change gfortran/g++ dep [puppet] - 10https://gerrit.wikimedia.org/r/478252 [14:23:17] _joe_: [14:23:32] how does one usually puppetize "a set of user-specified config files that, when changed, should notify a common service"? [14:23:32] hi! :) if you could please +2 that'd be super helpful [14:25:27] brb, this is definitely a second-coffee kind of morning [14:25:37] <_joe_> cdanis: oh ok eheh [14:25:58] <_joe_> bearloga: If I wasn't working on three things at the same time already, maybe :) [14:26:26] _joe_: fair! :D sorry!!! I'll ask someone else :) [14:26:27] <_joe_> you can ask someone else (also, having a task linked to your commit might make it easier to review) [14:27:04] (03CR) 10Faidon Liambotis: "Let's add the "ticket" custom field to the list of coherency checks we're making, as that's often missing, lacking, wrong etc." 
[software/netbox-reports] - 10https://gerrit.wikimedia.org/r/478458 (https://phabricator.wikimedia.org/T205899) (owner: 10CRusnov) [14:29:06] (03CR) 10Rush: [C: 04-1] "Reiterating: Do we have any concerns about private/security related things being stored in swift?" [puppet] - 10https://gerrit.wikimedia.org/r/432528 (https://phabricator.wikimedia.org/T182085) (owner: 1020after4) [14:30:37] 10Operations, 10Product-Analytics: Upload shiny-server .deb to our Stretch apt repository - https://phabricator.wikimedia.org/T168967 (10mpopov) So @Gehel told me we might actually be fine. I'm testing out this theory but ran into a problem with the current puppet config for [[ https://github.com/wikimedia/pup... [14:30:50] (03PS3) 10Bearloga: shiny_server: change gfortran/g++ dep [puppet] - 10https://gerrit.wikimedia.org/r/478252 (https://phabricator.wikimedia.org/T168967) [14:33:21] 10Operations, 10Product-Analytics, 10Patch-For-Review: Upload shiny-server .deb to our Stretch apt repository - https://phabricator.wikimedia.org/T168967 (10mpopov) a:03mpopov [14:35:56] (03PS1) 10BBlack: cache_text: Vary for PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/478680 (https://phabricator.wikimedia.org/T206339) [14:37:32] (03CR) 10Michael Große: "This change is ready for review." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478630 (https://phabricator.wikimedia.org/T209504) (owner: 10Michael Große) [14:37:45] (03CR) 10Giuseppe Lavagetto: mtail::program notify Service['mtail'] by default (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478669 (owner: 10CDanis) [14:39:35] 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10Ottomata) AH a task! I missed that. Making now. 
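The suggestion from _joe_ above (a separate `$notify_to` parameter handed to the file resource's `notify`, rather than redeclaring the `notify` metaparameter on the define) might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual `program.pp`: only `mtail::program`, `$notify_to`, `Service['mtail']`, `Service['varnishmtail']` and `/etc/mtail/kernel.mtail` appear in the channel. The parameter list, the path layout, and the `source` URL are hypothetical.

```puppet
# Hypothetical sketch of the define discussed above; parameter names other
# than $notify_to, and the file path layout, are assumptions.
define mtail::program (
    $ensure    = 'present',
    $source    = undef,
    # Resource to refresh when the program file changes. A caller that runs a
    # different daemon could pass Service['varnishmtail'] here instead.
    $notify_to = Service['mtail'],
) {
    file { "/etc/mtail/${title}":
        ensure => $ensure,
        source => $source,
        # notify is set on the file itself, so the define does not redeclare
        # the notify metaparameter that puppet-lint warned about.
        notify => $notify_to,
    }
}

# Example declaration; the source URL is an assumption.
mtail::program { 'kernel.mtail':
    source => 'puppet:///modules/mtail/programs/kernel.mtail',
}
```

The alternative raised in the channel is to leave the define alone and pass `notify` explicitly wherever a program is declared.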
[14:41:54] 10Operations, 10Puppet, 10puppet-compiler: Cleanup the puppetmaster module so that we stop breaking expectations (and the puppet compiler) - https://phabricator.wikimedia.org/T211547 (10herron) Since we are arguably due for another puppet upgrade, and puppet 5 will be in buster, should we remove `puppet_majo... [14:42:47] <_joe_> cdanis: see my comment on the patch when you're back [14:42:56] yepyep [14:43:08] I did not know of the spaceship operator [14:43:24] <_joe_> cdanis: it's a bit absurd but it's powerful [14:43:27] I think I will just add an explicit notify to the users who do not presently have it -- most do [14:44:13] <_joe_> cdanis: uhm at line https://gerrit.wikimedia.org/r/c/operations/puppet/+/478669/1/modules/mtail/manifests/program.pp#53 [14:44:19] <_joe_> you already notify the service [14:44:22] I really want to say "all mtail::programs with ensure present should notify Service['mtail'] by default, or a service they specified themselves" -- there are a bunch of mtail::programs that need to notify Service['varnishmtail'] for whatever reason [14:44:37] <_joe_> ok [14:44:50] <_joe_> I see so it must change between servers [14:44:58] <_joe_> uhm [14:45:00] it sometimes does and sometimes doesn't, I'm not sure why. [14:45:30] <_joe_> ok, so just add them explicitly. I know almost zero about our mtail setups sadly right now [14:45:33] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478686 [14:45:34] ok [14:46:46] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478686 (owner: 10Marostegui) [14:47:26] (03PS2) 10Filippo Giunchedi: conftool: add restbase20[3-8] [puppet] - 10https://gerrit.wikimedia.org/r/478672 (https://phabricator.wikimedia.org/T211416) [14:47:27] mobrovac: doh, of course! 
fixed [14:47:28] (03PS2) 10Filippo Giunchedi: hieradata: add restbase20[3-8] to restbase [puppet] - 10https://gerrit.wikimedia.org/r/478673 (https://phabricator.wikimedia.org/T211416) [14:47:56] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478686 (owner: 10Marostegui) [14:48:28] (03PS1) 10CDanis: syslog::centralserver: notify mtail when programs change [puppet] - 10https://gerrit.wikimedia.org/r/478687 (https://phabricator.wikimedia.org/T209863) [14:48:30] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Ottomata) > checked on stat1007 for iptables / ferm rules, expecting it to be puppetized. But to my surprise i found none at all. I thought everything... [14:48:55] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1099 T86338 T202167 (duration: 00m 47s) [14:49:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:49:01] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [14:49:02] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [14:49:16] (03CR) 10CDanis: [C: 032] syslog::centralserver: notify mtail when programs change [puppet] - 10https://gerrit.wikimedia.org/r/478687 (https://phabricator.wikimedia.org/T209863) (owner: 10CDanis) [14:49:20] (03CR) 10Nikerabbit: [C: 031] "+1 for consistency." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/375616 (https://phabricator.wikimedia.org/T117845) (owner: 10Fomafix) [14:49:38] now I believe I need to go to all syslog::centralserver hosts and restart mtail by hand [14:50:29] <_joe_> !log uploading php-mongodb 1.5.3 to stretch-wikimedia thirdparty/php72 T206152 [14:50:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:32] T206152: Set up request profiling for PHP 7 - https://phabricator.wikimedia.org/T206152 [14:51:50] (03PS1) 10Marostegui: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478689 (https://phabricator.wikimedia.org/T86338) [14:52:43] cdanis: or use cumin ;) 'O:syslog::centralserver', batch and sleep as needed ;) [14:52:49] 10Operations, 10Scap, 10User-jijiki: Introduce state to Scap - https://phabricator.wikimedia.org/T209881 (10thcipriani) This sounds like a workable plan. I anticipate that this will be a medium-sized project given the current state of scap. That is, I think this can be accomplished in a quarter or so given a... 
[14:52:59] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478686 (owner: 10Marostegui) [14:53:14] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478689 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [14:53:55] (03CR) 10Ottomata: Make Kerberos configurable for cdh::hadoop::namenode::primary (032 comments) [puppet/cdh] - 10https://gerrit.wikimedia.org/r/478625 (owner: 10Muehlenhoff) [14:54:23] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478689 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [14:55:29] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1101:3318 T86338 T202167 (duration: 00m 46s) [14:55:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:34] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [14:55:35] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [14:55:37] !log Deploy schema change db1101:3318 T86338 T202167 [14:55:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:48] looking at wezen.codfw.wmnet, /srv/syslog/*.log are 0-length files... is that expected? those are the files mtail is attempting to tail [14:58:33] 10Operations, 10ops-eqiad: eqiad: Re-connect cage cameras - https://phabricator.wikimedia.org/T207965 (10Cmjohnson) I agree that cameras are a better fit for the mgmt network. The old switches are still in the racks for row A-C, I removed them from row D awhile ago. I know that we have been talking about us... 
[14:59:52] RECOVERY - Memory correctable errors -EDAC- on wtp2020 is OK: (C)4 ge (W)2 ge 0 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2020&var-datasource=codfw%2520prometheus%252Fops [15:05:24] !log anomie@deploy1001 Synchronized php-1.33.0-wmf.6/includes/user/User.php: Backport fix for T210621 (duration: 00m 46s) [15:05:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:29] T210621: Internal api error: CannotCreateActorException - https://phabricator.wikimedia.org/T210621 [15:05:54] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478689 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [15:08:29] 10Operations: mtail seems broken on syslog::centralserver installations - https://phabricator.wikimedia.org/T211596 (10CDanis) [15:08:51] herron: I assigned to you just because I figured you probably have the most state on whatever happened? [15:09:26] cdanis: ok will have a look [15:10:06] 10Operations, 10User-CDanis: mtail seems broken on syslog::centralserver installations - https://phabricator.wikimedia.org/T211596 (10CDanis) [15:10:16] thanks :) [15:11:03] 10Operations, 10Developer-Advocacy, 10Gerrit: Remove port 29418 from cloning process - https://phabricator.wikimedia.org/T37611 (10fgiunchedi) a:05fgiunchedi>03None Unassigning as I'm not going to work on this [15:13:05] !log depooling db1097:3315 on a schema change - T85757 [15:13:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:09] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [15:13:15] (03CR) 10Banyek: [C: 032] mariadb: depool db1097:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477590 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [15:16:35] (03PS2) 10Banyek: mariadb: depool db1097:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477590 
(https://phabricator.wikimedia.org/T85757) [15:21:23] (03PS2) 10Giuseppe Lavagetto: profile::mediawiki::php: install php-tideways, php-mongodb [puppet] - 10https://gerrit.wikimedia.org/r/478594 (https://phabricator.wikimedia.org/T206152) [15:21:25] (03PS2) 10Giuseppe Lavagetto: profile::mediawiki::php: install excimer on newer versions of php [puppet] - 10https://gerrit.wikimedia.org/r/475769 (https://phabricator.wikimedia.org/T205059) [15:22:07] (03PS3) 10Andrew Bogott: Remove all remaining wdq_mm references [puppet] - 10https://gerrit.wikimedia.org/r/463325 (owner: 10Alex Monk) [15:23:25] (03CR) 10Andrew Bogott: [C: 032] Remove all remaining wdq_mm references [puppet] - 10https://gerrit.wikimedia.org/r/463325 (owner: 10Alex Monk) [15:24:45] !log banyek@deploy1001 Synchronized wmf-config/db-eqiad.php: T85757: depool db1097:3315 (duration: 00m 46s) [15:24:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:49] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [15:25:46] (03PS3) 10Andrew Bogott: labs puppetmaster: Remove old promethium baremetal stuff [puppet] - 10https://gerrit.wikimedia.org/r/470101 (owner: 10Alex Monk) [15:26:17] (03PS2) 10BBlack: cache_text: Vary for PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/478680 (https://phabricator.wikimedia.org/T206339) [15:27:45] (03CR) 10Andrew Bogott: [C: 032] "Thanks for the cleanup!" 
[puppet] - 10https://gerrit.wikimedia.org/r/470101 (owner: 10Alex Monk) [15:28:39] (03CR) 10Muehlenhoff: [C: 031] "Looks good and my bad; when labmon* were ipv6-enabled (for consistent Prometheus ferm rules), I've added a PTR for labmon1001, but missed " [dns] - 10https://gerrit.wikimedia.org/r/478675 (owner: 10Volans) [15:28:47] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::mediawiki::php: install excimer on newer versions of php [puppet] - 10https://gerrit.wikimedia.org/r/475769 (https://phabricator.wikimedia.org/T205059) (owner: 10Giuseppe Lavagetto) [15:28:49] (03CR) 10Andrew Bogott: [C: 032] labmon1002: add missing PTR for IPv6 [dns] - 10https://gerrit.wikimedia.org/r/478675 (owner: 10Volans) [15:28:57] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::mediawiki::php: install php-tideways, php-mongodb [puppet] - 10https://gerrit.wikimedia.org/r/478594 (https://phabricator.wikimedia.org/T206152) (owner: 10Giuseppe Lavagetto) [15:29:27] (03PS3) 10Giuseppe Lavagetto: profile::mediawiki::php: install php-tideways, php-mongodb [puppet] - 10https://gerrit.wikimedia.org/r/478594 (https://phabricator.wikimedia.org/T206152) [15:29:34] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] profile::mediawiki::php: install php-tideways, php-mongodb [puppet] - 10https://gerrit.wikimedia.org/r/478594 (https://phabricator.wikimedia.org/T206152) (owner: 10Giuseppe Lavagetto) [15:31:02] !log repooling db1097:3315 after schema change - T85757 [15:31:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:31:05] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [15:31:14] (03PS1) 10Banyek: Revert "mariadb: depool db1097:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478695 [15:31:30] (03CR) 10jenkins-bot: mariadb: depool db1097:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477590 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [15:32:30] (03CR) 10Elukey: Make Kerberos configurable 
for cdh::hadoop::namenode::primary (031 comment) [puppet/cdh] - 10https://gerrit.wikimedia.org/r/478625 (owner: 10Muehlenhoff) [15:34:04] (03CR) 10Banyek: [C: 032] Revert "mariadb: depool db1097:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478695 (owner: 10Banyek) [15:35:08] (03Merged) 10jenkins-bot: Revert "mariadb: depool db1097:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478695 (owner: 10Banyek) [15:36:39] !log banyek@deploy1001 Synchronized wmf-config/db-eqiad.php: T85757: repool db1097:3315 (duration: 00m 45s) [15:36:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:43] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [15:36:47] (03CR) 10Muehlenhoff: Make Kerberos configurable for cdh::hadoop::namenode::primary (032 comments) [puppet/cdh] - 10https://gerrit.wikimedia.org/r/478625 (owner: 10Muehlenhoff) [15:41:45] 10Operations, 10User-CDanis: mtail seems broken on syslog::centralserver installations - https://phabricator.wikimedia.org/T211596 (10herron) > I don't know what the intended configuration is here. I'm going to guess that previously, rsyslogd wrote directly to /srv/syslog/syslog.log and friends, and log rotati... 
[15:42:42] RECOVERY - IPMI Sensor Status on elastic2051 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK [15:43:08] (03PS3) 10Giuseppe Lavagetto: profile::mediawiki::php: install excimer on newer versions of php [puppet] - 10https://gerrit.wikimedia.org/r/475769 (https://phabricator.wikimedia.org/T205059) [15:44:33] (03CR) 10jenkins-bot: Revert "mariadb: depool db1097:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478695 (owner: 10Banyek) [15:45:25] 10Operations, 10MediaWiki-Cache, 10Performance-Team (Radar), 10User-Elukey, 10Wikimedia-production-error: Mcrouter periodically reports soft TKOs for mc[1,2]035 leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) [15:45:43] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478697 [15:45:45] 10Operations, 10MediaWiki-Cache, 10Performance-Team (Radar), 10User-Elukey, 10Wikimedia-production-error: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) [15:46:47] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478697 (owner: 10Marostegui) [15:47:50] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478697 (owner: 10Marostegui) [15:48:50] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1101:3318 T86338 T202167 (duration: 00m 46s) [15:48:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:48:55] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [15:48:56] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [15:50:40] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Can't we just apply the change everywhere?" 
[puppet] - 10https://gerrit.wikimedia.org/r/478198 (https://phabricator.wikimedia.org/T209489) (owner: 10Elukey) [15:52:03] (03CR) 10Elukey: "Yes there is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/472099/ ready to go, I wanted to be conservative, but I guess I'll clo" [puppet] - 10https://gerrit.wikimedia.org/r/478198 (https://phabricator.wikimedia.org/T209489) (owner: 10Elukey) [15:52:07] (03Abandoned) 10Elukey: Apply interface::rps to mc1022 [puppet] - 10https://gerrit.wikimedia.org/r/478198 (https://phabricator.wikimedia.org/T209489) (owner: 10Elukey) [15:56:52] (03PS1) 10Andrew Bogott: nova: update comments explaining about cloudvirts and the scheduler pool [puppet] - 10https://gerrit.wikimedia.org/r/478699 [16:01:09] (03CR) 10Andrew Bogott: [C: 032] nova: update comments explaining about cloudvirts and the scheduler pool [puppet] - 10https://gerrit.wikimedia.org/r/478699 (owner: 10Andrew Bogott) [16:01:35] (03PS1) 10ArielGlenn: convert dump scripts to python3 [dumps] - 10https://gerrit.wikimedia.org/r/478702 (https://phabricator.wikimedia.org/T210989) [16:01:57] (03CR) 10jerkins-bot: [V: 04-1] convert dump scripts to python3 [dumps] - 10https://gerrit.wikimedia.org/r/478702 (https://phabricator.wikimedia.org/T210989) (owner: 10ArielGlenn) [16:03:10] !log installing PHP updates on netmon1002 [16:03:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:04:06] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478697 (owner: 10Marostegui) [16:14:56] (03PS1) 10Herron: centrallog: change receiver logrotate to systemd enabled postrotate [puppet] - 10https://gerrit.wikimedia.org/r/478706 (https://phabricator.wikimedia.org/T211596) [16:14:59] (03PS1) 10Mathew.onipe: admin: added joewalsh to researchers group [puppet] - 10https://gerrit.wikimedia.org/r/478707 (https://phabricator.wikimedia.org/T211115) [16:15:15] 10Operations, 10ops-codfw: ms-be2047 spontaneous 
reboots - https://phabricator.wikimedia.org/T209921 (10Papaul) Running Stress test on the system [16:15:56] 10Operations, 10Performance-Team, 10monitoring, 10User-CDanis: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 (10CDanis) I've heard of no issues with Grafana 5, and will be upgrading today. [ ] copy current grafana DB to the new `grafana1001` host one last time [ ] spot-check several d... [16:17:34] (03PS3) 10BBlack: cache_text: Vary for PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/478680 (https://phabricator.wikimedia.org/T206339) [16:18:58] (03CR) 10Muehlenhoff: admins: add new group for proton admins (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [16:19:21] (03PS1) 10ArielGlenn: query checking scripts for auditing WikiExporter (dumps) queries [software] - 10https://gerrit.wikimedia.org/r/478708 (https://phabricator.wikimedia.org/T207628) [16:20:04] (03CR) 10jerkins-bot: [V: 04-1] query checking scripts for auditing WikiExporter (dumps) queries [software] - 10https://gerrit.wikimedia.org/r/478708 (https://phabricator.wikimedia.org/T207628) (owner: 10ArielGlenn) [16:20:31] 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Deprovision Diamond collectors no longer in use - https://phabricator.wikimedia.org/T183454 (10colewhite) [16:22:43] 10Operations, 10ops-codfw, 10netops: codfw row B recable and add QFX - https://phabricator.wikimedia.org/T210456 (10Papaul) The switch is connected to port 48 on scs-a1-codfw [16:23:10] 10Operations, 10ops-codfw, 10netops: codfw row B recable and add QFX - https://phabricator.wikimedia.org/T210456 (10Papaul) [16:24:26] 10Operations, 10Patch-For-Review, 10User-CDanis: mtail seems broken on syslog::centralserver installations - https://phabricator.wikimedia.org/T211596 (10CDanis) [16:24:31] 10Operations, 10monitoring, 10Patch-For-Review, 10User-CDanis: graph server temperature metrics - 
https://phabricator.wikimedia.org/T209863 (10CDanis) [16:28:31] (03PS2) 10Imarlier: config: move wgMFNoindexPages to InitialiseSettings-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476060 (https://phabricator.wikimedia.org/T206497) [16:28:51] 10Operations, 10ops-codfw, 10netops: codfw row B recable and add QFX - https://phabricator.wikimedia.org/T210456 (10Papaul) [16:29:19] 10Operations, 10monitoring: Graphite1001 disk usage at 96% - https://phabricator.wikimedia.org/T207040 (10fgiunchedi) 05Open>03Resolved Resolving, we're onto new graphite hardware now with more resources. [16:30:45] (03CR) 10Ema: cache_text: Vary for PHP7 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478680 (https://phabricator.wikimedia.org/T206339) (owner: 10BBlack) [16:32:21] 10Operations, 10ops-codfw, 10netops: codfw row B recable and add QFX - https://phabricator.wikimedia.org/T210456 (10Papaul) [16:34:21] 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: Setup elasticsearch on new codfw servers - https://phabricator.wikimedia.org/T210265 (10Mathew.onipe) [16:38:45] (03PS1) 10Takidelfin: HD Logos: Add 1.5x and 2x variants of fr and fy wikibooks and fr wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478709 (https://phabricator.wikimedia.org/T150618) [16:43:30] 10Operations, 10Wikimedia-Logstash: Investigate approaches to ingest sensitive log producers - https://phabricator.wikimedia.org/T205855 (10herron) [16:44:18] (03PS1) 10Elukey: Fix mgmt PTRs for an-master1002 [dns] - 10https://gerrit.wikimedia.org/r/478710 [16:48:40] (03CR) 10Volans: [C: 031] "LGTM, thanks for fixing it" [dns] - 10https://gerrit.wikimedia.org/r/478710 (owner: 10Elukey) [16:48:46] (03PS1) 10Takidelfin: HD Logos: Add fr and fy wikibooks and fr wiiknews variants to InitaliseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478712 (https://phabricator.wikimedia.org/T150618) [16:48:59] 10Operations, 10Patch-For-Review, 10User-CDanis: mtail seems 
broken on syslog::centralserver installations - https://phabricator.wikimedia.org/T211596 (10CDanis) Cool, looks like mtail is happy now as well: ` cdanis@wezen.codfw.wmnet ~ % curl -s localhost:3903/metrics | head # TYPE cpu_throttled counter #... [16:50:19] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1002/13881/" [puppet] - 10https://gerrit.wikimedia.org/r/478706 (https://phabricator.wikimedia.org/T211596) (owner: 10Herron) [16:50:21] (03CR) 10CDanis: [C: 032] centrallog: change receiver logrotate to systemd enabled postrotate [puppet] - 10https://gerrit.wikimedia.org/r/478706 (https://phabricator.wikimedia.org/T211596) (owner: 10Herron) [16:50:27] (03CR) 10Papaul: [C: 031] Fix mgmt PTRs for an-master1002 [dns] - 10https://gerrit.wikimedia.org/r/478710 (owner: 10Elukey) [16:50:29] (03PS4) 10Ema: cache_text: Vary for PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/478680 (https://phabricator.wikimedia.org/T206339) (owner: 10BBlack) [16:50:37] (03PS2) 10GTirloni: remove diamond::collector reference from role::labs::nfs::secondary [puppet] - 10https://gerrit.wikimedia.org/r/478371 (https://phabricator.wikimedia.org/T183454) (owner: 10Cwhite) [16:51:04] 10Operations, 10Patch-For-Review, 10User-CDanis: mtail seems broken on syslog::centralserver installations - https://phabricator.wikimedia.org/T211596 (10herron) With 478706 merged let's check back on this in 24 hours [16:51:22] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received [16:51:42] (03CR) 10GTirloni: [C: 032] remove diamond::collector reference from role::labs::nfs::secondary [puppet] - 10https://gerrit.wikimedia.org/r/478371 (https://phabricator.wikimedia.org/T183454) (owner: 10Cwhite) [16:52:11] (03PS1) 10Elukey: Remove PTR for analytics1009 (host not in site.pp) [dns] - 10https://gerrit.wikimedia.org/r/478714 [16:52:28] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are 
healthy [16:55:29] (03PS2) 10Takidelfin: HD Logos: Add 1.5x and 2x variants of fr and fy wikibooks and fr wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478709 (https://phabricator.wikimedia.org/T150618) [16:55:31] (03PS2) 10Takidelfin: HD Logos: Add fr and fy wikibooks and fr wiiknews variants to InitaliseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478712 (https://phabricator.wikimedia.org/T150618) [16:55:39] (03CR) 10Volans: [C: 031] "LGTM, racktables have it in the decom rack (it dates back to 2011), see also T84555" [dns] - 10https://gerrit.wikimedia.org/r/478714 (owner: 10Elukey) [16:56:24] (03CR) 10jerkins-bot: [V: 04-1] HD Logos: Add fr and fy wikibooks and fr wiiknews variants to InitaliseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478712 (https://phabricator.wikimedia.org/T150618) (owner: 10Takidelfin) [16:56:43] (03PS3) 10Takidelfin: HD Logos: Add fr and fy wikibooks and fr wiiknews variants to InitaliseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478712 (https://phabricator.wikimedia.org/T150618) [16:57:33] (03PS1) 10Mathew.onipe: admins: add user toddleroux [puppet] - 10https://gerrit.wikimedia.org/r/478717 (https://phabricator.wikimedia.org/T209298) [16:58:32] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic, and 2 others: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10jijiki) [16:58:39] (03PS3) 10GTirloni: wmcs: add prometheus-memcached-exporter [puppet] - 10https://gerrit.wikimedia.org/r/477620 (https://phabricator.wikimedia.org/T147326) (owner: 10Cwhite) [16:58:47] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic, and 2 others: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10Mathew.onipe) [16:58:57] (03PS1) 10Elukey: Add wmfXXXX mgmt PTR record for analytics1069 [dns] - 
10https://gerrit.wikimedia.org/r/478718 [17:00:26] (03CR) 10Dzahn: "> This could be done in a separate commit when the full list of people has been approved, imo." [puppet] - 10https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [17:00:59] (03PS1) 10Elukey: Fix PTR record for an-master1001's mgmt entry [dns] - 10https://gerrit.wikimedia.org/r/478720 [17:02:05] (03CR) 10Dzahn: "i will remove the puppet part" [puppet] - 10https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [17:02:12] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received [17:03:18] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [17:03:48] <_joe_> sigh [17:09:40] (03CR) 10GTirloni: [C: 032] wmcs: add prometheus-memcached-exporter [puppet] - 10https://gerrit.wikimedia.org/r/477620 (https://phabricator.wikimedia.org/T147326) (owner: 10Cwhite) [17:10:33] (03PS5) 10Ema: cache_text: Vary for PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/478680 (https://phabricator.wikimedia.org/T206339) (owner: 10BBlack) [17:15:10] 10Operations, 10Release-Engineering-Team, 10SRE-Access-Requests, 10Patch-For-Review, 10User-jijiki: Requesting access to deployment for Christoph Jauera (WMDE-Fisch) - https://phabricator.wikimedia.org/T211014 (10jijiki) [17:17:03] 10Operations, 10Analytics, 10Performance-Team, 10Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (10Anomie) > server I note that with X-Wikimedia-Debug it seems you have to specify a backend, so this wouldn't be terribly useful there eit... 
[17:18:31] 10Operations, 10ops-codfw, 10decommission, 10Discovery-Search (Current work): Decommission elastic2001-2024 - https://phabricator.wikimedia.org/T211023 (10RobH) [17:18:38] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10User-jijiki: Requesting access to deployment for Christoph Jauera (WMDE-Fisch) - https://phabricator.wikimedia.org/T211014 (10greg) Approved from our side. [17:23:48] (03PS1) 10Jforrester: Disable ParserMigration now that Raggett has been dropped from MW, Part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478724 (https://phabricator.wikimedia.org/T211527) [17:24:33] jouncebot: next [17:24:33] In 0 hour(s) and 35 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181210T1800) [17:24:51] OK, I'm going to deploy ParserMigration disablement right now. [17:25:10] (03CR) 10Jforrester: [C: 032] Disable ParserMigration now that Raggett has been dropped from MW, Part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478724 (https://phabricator.wikimedia.org/T211527) (owner: 10Jforrester) [17:26:14] (03Merged) 10jenkins-bot: Disable ParserMigration now that Raggett has been dropped from MW, Part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478724 (https://phabricator.wikimedia.org/T211527) (owner: 10Jforrester) [17:29:39] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: T211527 Hot-deploy Disable ParserMigration now that Raggett has been dropped (duration: 00m 47s) [17:29:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:29:51] T211527: Notice: Undefined variable: wgTidyConf in /srv/mediawiki/wmf-config/CommonSettings.php on line 3672 - https://phabricator.wikimedia.org/T211527 [17:32:52] (03PS1) 10Jforrester: Uninstall the ParserMigration extension, Part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478726 [17:32:54] (03PS1) 
10Jforrester: Uninstall the ParserMigration extension, Part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478727 [17:32:56] (03PS1) 10Jforrester: Uninstall the ParserMigration extension, Part III [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478728 [17:34:44] (03CR) 10Jforrester: [C: 04-2] "Emergency hot-fix for T211527 landed; leaving to the Parsing team to make the call as to whether they want this remediated or removed." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478726 (owner: 10Jforrester) [17:36:04] (03CR) 10jenkins-bot: Disable ParserMigration now that Raggett has been dropped from MW, Part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478724 (https://phabricator.wikimedia.org/T211527) (owner: 10Jforrester) [17:38:16] (03PS2) 10Joewalsh: admin: added joewalsh to researchers group [puppet] - 10https://gerrit.wikimedia.org/r/478707 (https://phabricator.wikimedia.org/T211115) (owner: 10Mathew.onipe) [17:40:21] (Conch released, sorry.) [17:46:39] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Ottomata) @MoritzMuehlenhoff wherever Baho's import job runs, the user running it (either him or some system user) will need access to a file with the... [17:50:08] 10Operations, 10Analytics, 10SRE-Access-Requests, 10Patch-For-Review: Grant fdans permissions to deploy AQS in prod, and accessing the aqs hosts - https://phabricator.wikimedia.org/T211095 (10Nuria) Approved. [17:55:06] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Banyek) A quick recap on today's meeting: - we'll have an import in every month or in every quarter, this is tbd, but it will happen continuously, but n... 
[17:58:48] !log restarting mysql instance on labsdb1004 to restore replication filters to the original state [17:58:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:31] !log restarting mysql instance on labsdb1004 to restore replication filters to the original state - T211210 [17:59:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:34] T211210: labsdb1004 replication broken for linkwatcher_linklog table - https://phabricator.wikimedia.org/T211210 [18:00:04] gehel and onimisionipe: Your horoscope predicts another unfortunate Wikidata Query Service weekly deploy deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181210T1800). [18:00:13] here here [18:00:31] I will be nice with Zuul too :) [18:01:24] 10Operations, 10Core Platform Team Backlog (Next), 10Patch-For-Review, 10Services (next): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (10Nuria) [18:05:35] !log onimisionipe@deploy1001 Started deploy [wdqs/wdqs@dcde39f]: GUI Update [18:05:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:57] 10Operations, 10Community-Tech, 10MediaWiki-extensions-PageAssessments, 10Performance, 10User-Banyek: Issues with purgeUnusedProjects.php cron job on mwmaint1002 (Fri Oct 26) - https://phabricator.wikimedia.org/T208231 (10kaldari) @Banyek - Thanks for the ping. I don't think anything is unexpected here.... 
[18:06:04] (03CR) 10CRusnov: "> Patch Set 2:" [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/478458 (https://phabricator.wikimedia.org/T205899) (owner: 10CRusnov) [18:06:50] (03PS2) 10Elukey: admin: add fdans to deploy-aqs [puppet] - 10https://gerrit.wikimedia.org/r/477524 (https://phabricator.wikimedia.org/T211095) [18:06:55] 10Operations, 10ops-eqiad, 10DBA, 10User-Marostegui: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (10RobH) p:05Triage>03Normal [18:07:19] (03CR) 10Elukey: [C: 032] Fix mgmt PTRs for an-master1002 [dns] - 10https://gerrit.wikimedia.org/r/478710 (owner: 10Elukey) [18:07:30] (03CR) 10Elukey: [C: 032] Remove PTR for analytics1009 (host not in site.pp) [dns] - 10https://gerrit.wikimedia.org/r/478714 (owner: 10Elukey) [18:08:04] 10Operations, 10Proton, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10Jhernandez) Approved on my side too. [18:08:10] (03CR) 10Elukey: [C: 032] Add wmfXXXX mgmt PTR record for analytics1069 [dns] - 10https://gerrit.wikimedia.org/r/478718 (owner: 10Elukey) [18:08:31] (03CR) 10Elukey: [C: 032] Fix PTR record for an-master1001's mgmt entry [dns] - 10https://gerrit.wikimedia.org/r/478720 (owner: 10Elukey) [18:08:34] 10Operations, 10Community-Tech, 10MediaWiki-extensions-PageAssessments, 10Performance, 10User-Banyek: Issues with purgeUnusedProjects.php cron job on mwmaint1002 (Fri Oct 26) - https://phabricator.wikimedia.org/T208231 (10Banyek) I think we should adjust the slow timer in a way of not to alert if the sc... 
[18:10:03] (03CR) 10Elukey: [C: 032] "Approved by the SRE team meeting" [puppet] - 10https://gerrit.wikimedia.org/r/477524 (https://phabricator.wikimedia.org/T211095) (owner: 10Elukey) [18:10:59] (03PS1) 10Robingan7: Revise images [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478738 [18:11:03] 10Operations, 10ops-eqiad, 10DBA, 10User-Marostegui: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (10RobH) So, to figure out the racking plan: db1061: s6 master : C3 db1062: s7 master : D4 db1063: m1 master : C5 db1064: x1 slave : D1 db1065: m5 master : D1 db1066:... [18:12:50] (03CR) 10CRusnov: [C: 031] "lgtm" [software/spicerack] - 10https://gerrit.wikimedia.org/r/477707 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [18:15:06] !log onimisionipe@deploy1001 Finished deploy [wdqs/wdqs@dcde39f]: GUI Update (duration: 09m 31s) [18:15:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:22] (03CR) 10ArielGlenn: "Code is for Python 3.5, CI runs 3.4. See https://phabricator.wikimedia.org/T191764 about that. Will try to work around this." [dumps] - 10https://gerrit.wikimedia.org/r/478702 (https://phabricator.wikimedia.org/T210989) (owner: 10ArielGlenn) [18:30:06] 10Operations, 10Performance-Team, 10monitoring, 10User-CDanis: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 (10CDanis) Thanks to @ema who found a bug in 5.4.0 -- the tag filter UI seems quite broken. Reported upstream: https://github.com/grafana/grafana/issues/14437 For now this can... 
[18:39:28] PROBLEM - cassandra-c SSL 10.192.32.139:7001 on restbase2004 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [18:39:38] PROBLEM - cassandra-c CQL 10.192.32.139:9042 on restbase2004 is CRITICAL: connect to address 10.192.32.139 and port 9042: Connection refused [18:39:49] (03Abandoned) 10Dduvall: Support a literal body for POST requests in `fetch_url` [software/service-checker] - 10https://gerrit.wikimedia.org/r/461457 (owner: 10Dduvall) [18:40:29] (03CR) 10Mathew.onipe: [C: 031] wdqs: collect JMX metrics from ConcurrentHttpRequestsFilter [puppet] - 10https://gerrit.wikimedia.org/r/463511 (https://phabricator.wikimedia.org/T204364) (owner: 10Gehel) [18:42:52] (03PS7) 10CRusnov: Add reports deployment to netbox profile [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) [18:43:06] (03PS2) 10Dzahn: admins: add new group for proton admins [puppet] - 10https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) [18:44:57] (03PS8) 10CRusnov: Add reports deployment to netbox profile [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) [18:45:48] (03CR) 10CRusnov: "Done and Done." (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) (owner: 10CRusnov) [18:50:44] (03PS3) 10Dzahn: admins: add new group for proton admins [puppet] - 10https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) [18:51:51] (03CR) 10Dzahn: [C: 031] "was approved "pending that Daniel removes the puppet part" which i did. 
adding people in a second change" [puppet] - 10https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [18:52:24] (03CR) 10Volans: Add reports deployment to netbox profile (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) (owner: 10CRusnov) [18:52:30] (03CR) 10Mathew.onipe: Make dumps dir tagged as in-wdqs-data-dir (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478374 (https://phabricator.wikimedia.org/T211462) (owner: 10Smalyshev) [18:55:00] (03CR) 10Mobrovac: [C: 031] admins: add new group for proton admins [puppet] - 10https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [18:56:42] (03CR) 10Smalyshev: Make dumps dir tagged as in-wdqs-data-dir (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478374 (https://phabricator.wikimedia.org/T211462) (owner: 10Smalyshev) [18:57:38] (03CR) 10Smalyshev: "Do we still need this one?" 
[puppet] - 10https://gerrit.wikimedia.org/r/463511 (https://phabricator.wikimedia.org/T204364) (owner: 10Gehel) [18:59:55] (03PS9) 10CRusnov: Add reports deployment to netbox profile [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) [19:00:54] (03CR) 10Mathew.onipe: [C: 031] Make dumps dir tagged as in-wdqs-data-dir (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478374 (https://phabricator.wikimedia.org/T211462) (owner: 10Smalyshev) [19:01:23] jouncebot next [19:01:23] In 1 hour(s) and 58 minute(s): Services – Parsoid / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181210T2100) [19:01:27] jouncebot now [19:01:27] For the next 0 hour(s) and 58 minute(s): Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181210T1900) [19:03:13] (03CR) 10Muehlenhoff: [C: 031] "The Prometheus exporter is now deployed on all the servers of the role:" [puppet] - 10https://gerrit.wikimedia.org/r/469250 (https://phabricator.wikimedia.org/T183454) (owner: 10Cwhite) [19:03:17] (03CR) 10CRusnov: Add reports deployment to netbox profile (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) (owner: 10CRusnov) [19:04:36] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:05:25] (03CR) 10Muehlenhoff: "FWIW, this duplicates the existing" [puppet] - 10https://gerrit.wikimedia.org/r/429221 (https://phabricator.wikimedia.org/T183454) (owner: 10Filippo Giunchedi) [19:05:46] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.916 second response time [19:06:21] (03PS10) 10CRusnov: Add reports deployment to netbox profile [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) [19:08:47] (03CR) 10Muehlenhoff: [C: 031] "Looks good to go once the two dependant patches are merged" [puppet] - 
10https://gerrit.wikimedia.org/r/466907 (https://phabricator.wikimedia.org/T183454) (owner: 10Cwhite) [19:09:24] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:11:08] (03CR) 10Gehel: [C: 032] Make dumps dir tagged as in-wdqs-data-dir [puppet] - 10https://gerrit.wikimedia.org/r/478374 (https://phabricator.wikimedia.org/T211462) (owner: 10Smalyshev) [19:13:00] (03PS2) 10Gehel: Make dumps dir tagged as in-wdqs-data-dir [puppet] - 10https://gerrit.wikimedia.org/r/478374 (https://phabricator.wikimedia.org/T211462) (owner: 10Smalyshev) [19:14:25] SMalyshev: ^ (I just happened to be around) [19:14:34] gehel: thanks! :) [19:16:10] (03PS11) 10CRusnov: Add reports deployment to netbox profile [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) [19:20:43] 10Operations, 10Analytics, 10Performance-Team, 10Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (10Krinkle) >>! In T210484#4811185, @Anomie wrote: >> server > > I note that with X-Wikimedia-Debug it seems you have to specify a backend,... 
[19:21:56] (03PS1) 10Ottomata: Add hieradata/labs/cloud-analytics/common.yaml [puppet] - 10https://gerrit.wikimedia.org/r/478746 (https://phabricator.wikimedia.org/T204951) [19:24:08] 10Operations, 10Traffic, 10Continuous-Integration-Infrastructure (Slipway), 10User-ArielGlenn: CI jobs for authdns linting need to run on Stretch - https://phabricator.wikimedia.org/T205439 (10ArielGlenn) [19:25:21] (03PS2) 10Ottomata: Add hieradata/labs/cloud-analytics/common.yaml [puppet] - 10https://gerrit.wikimedia.org/r/478746 (https://phabricator.wikimedia.org/T204951) [19:26:52] (03CR) 10Ottomata: [C: 032] Add hieradata/labs/cloud-analytics/common.yaml [puppet] - 10https://gerrit.wikimedia.org/r/478746 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [19:28:40] (03CR) 10Cwhite: "> afaict the role 'bastion' isn't used/present, changing 'cluster' in" [puppet] - 10https://gerrit.wikimedia.org/r/478372 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [19:32:24] (03PS1) 10Ottomata: Add cloud-analytics zookeeper settings [puppet] - 10https://gerrit.wikimedia.org/r/478748 (https://phabricator.wikimedia.org/T204951) [19:32:59] (03CR) 10jerkins-bot: [V: 04-1] Add cloud-analytics zookeeper settings [puppet] - 10https://gerrit.wikimedia.org/r/478748 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [19:34:36] (03PS2) 10Ottomata: Add cloud-analytics zookeeper settings [puppet] - 10https://gerrit.wikimedia.org/r/478748 (https://phabricator.wikimedia.org/T204951) [19:35:08] (03CR) 10Urbanecm: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478709 (https://phabricator.wikimedia.org/T150618) (owner: 10Takidelfin) [19:35:13] (03CR) 10Ottomata: [C: 032] Add cloud-analytics zookeeper settings [puppet] - 10https://gerrit.wikimedia.org/r/478748 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [19:35:19] (03CR) 10Urbanecm: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478712 
(https://phabricator.wikimedia.org/T150618) (owner: 10Takidelfin) [19:37:00] 10Operations, 10ops-codfw, 10netops: codfw row B recable and add QFX - https://phabricator.wikimedia.org/T210456 (10ayounsi) [19:37:06] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 4.110 second response time [19:38:16] PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS1299/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [19:39:26] RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 24, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [19:40:48] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:42:50] 10Operations, 10Operations-Software-Development, 10Patch-For-Review: Develop and deploy at least three Netbox reports to assist with data correctness and consistency - https://phabricator.wikimedia.org/T205899 (10Volans) [19:44:20] (03PS1) 10Ottomata: Update zookeeper package version for cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478751 (https://phabricator.wikimedia.org/T204951) [19:44:37] (03CR) 10Ottomata: [V: 032 C: 032] Update zookeeper package version for cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478751 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [19:45:09] (03CR) 10Cwhite: "> > afaict the role 'bastion' isn't used/present, changing 'cluster'" [puppet] - 10https://gerrit.wikimedia.org/r/478372 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [19:45:32] (03PS2) 10Cwhite: add bastion cluster definition [puppet] - 10https://gerrit.wikimedia.org/r/478372 (https://phabricator.wikimedia.org/T210486) [19:48:29] (03PS3) 10CRusnov: Add "Coherence" check [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/478458 (https://phabricator.wikimedia.org/T205899) [19:49:51] (03CR) 10CRusnov: "This changeset adds the ticket checker. 
It also does not currently filter on any dates, which we should decide on how we want done and for" [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/478458 (https://phabricator.wikimedia.org/T205899) (owner: 10CRusnov) [19:50:23] (03PS3) 10Cwhite: add bastion cluster definition [puppet] - 10https://gerrit.wikimedia.org/r/478372 (https://phabricator.wikimedia.org/T210486) [19:51:20] (03PS4) 10Cwhite: add bastion cluster definition [puppet] - 10https://gerrit.wikimedia.org/r/478372 (https://phabricator.wikimedia.org/T210486) [19:52:27] (03PS1) 10Ottomata: Allow configuration of $hadoop_var_directory in profile::hadoop::commmon [puppet] - 10https://gerrit.wikimedia.org/r/478755 (https://phabricator.wikimedia.org/T204951) [19:56:17] 10Operations, 10Performance-Team, 10monitoring, 10User-CDanis: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 (10CDanis) This bug may have been fixed in 5.4.1. Going to grab that version into `wikimedia-stretch`. [19:56:50] (03CR) 10Ottomata: [C: 032] "No op!" 
[puppet] - 10https://gerrit.wikimedia.org/r/478755 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [19:58:09] !log T210416: updating grafana to 5.4.1 in stretch-wikimedia: reprepro --restrict grafana update stretch-wikimedia [19:58:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:58:13] T210416: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 [20:01:36] (03PS1) 10Ottomata: Use subdir of hadoop data path for datanode_mounts in cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478758 (https://phabricator.wikimedia.org/T204951) [20:01:57] (03CR) 10Ottomata: [V: 032 C: 032] Use subdir of hadoop data path for datanode_mounts in cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478758 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [20:02:22] 10Operations, 10Performance-Team, 10monitoring, 10User-CDanis: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 (10CDanis) One part of the bug is fixed. The other (typing in the tag filter dropdown box) is not. Proceeding as discussed with @ema . [20:04:53] (03PS1) 10CDanis: Revert "grafana1001: answer for grafana-beta.wikimedia.org" [puppet] - 10https://gerrit.wikimedia.org/r/478763 (https://phabricator.wikimedia.org/T210416) [20:06:37] (03PS1) 10CDanis: Switch grafana.wikimedia.org to point to grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/478765 (https://phabricator.wikimedia.org/T210416) [20:08:42] 10Operations, 10DBA, 10Gerrit, 10Release-Engineering-Team (Next): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532 (10Paladox) ReviewDB has now been removed upstream. 
[20:10:20] (03CR) 10Cwhite: [C: 032] add bastion cluster definition [puppet] - 10https://gerrit.wikimedia.org/r/478372 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [20:10:29] (03PS5) 10Cwhite: add bastion cluster definition [puppet] - 10https://gerrit.wikimedia.org/r/478372 (https://phabricator.wikimedia.org/T210486) [20:11:21] (03CR) 10CDanis: [C: 032] Revert "grafana1001: answer for grafana-beta.wikimedia.org" [puppet] - 10https://gerrit.wikimedia.org/r/478763 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [20:11:42] (03CR) 10CDanis: [C: 032] Switch grafana.wikimedia.org to point to grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/478765 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [20:11:49] (03PS2) 10CDanis: Switch grafana.wikimedia.org to point to grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/478765 (https://phabricator.wikimedia.org/T210416) [20:12:05] (03PS6) 10Cwhite: add bastion cluster definition [puppet] - 10https://gerrit.wikimedia.org/r/478372 (https://phabricator.wikimedia.org/T210486) [20:12:42] 10Operations, 10ops-codfw, 10Core Platform Team, 10Services (doing), and 2 others: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 (10Eevans) [20:13:35] !log decommissioning cassandra-a, restbase2005 -- T210843 [20:13:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:13:39] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 [20:14:31] (03PS1) 10Ottomata: Set monitoring_enabled to true for cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478767 (https://phabricator.wikimedia.org/T204951) [20:14:33] (03PS1) 10CDanis: Revert "Revert "grafana1001: answer for grafana-beta.wikimedia.org"" [puppet] - 10https://gerrit.wikimedia.org/r/478768 [20:15:01] (03PS2) 10CDanis: Revert "Revert "grafana1001: answer for grafana-beta.wikimedia.org"" [puppet] - 
10https://gerrit.wikimedia.org/r/478768 [20:15:33] (03PS2) 10Ottomata: Set monitoring_enabled to true for cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478767 (https://phabricator.wikimedia.org/T204951) [20:15:35] (03CR) 10CDanis: [C: 032] Revert "Revert "grafana1001: answer for grafana-beta.wikimedia.org"" [puppet] - 10https://gerrit.wikimedia.org/r/478768 (owner: 10CDanis) [20:15:53] (03CR) 10Ottomata: [C: 032] Set monitoring_enabled to true for cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478767 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [20:15:58] (03PS3) 10Ottomata: Set monitoring_enabled to true for cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478767 (https://phabricator.wikimedia.org/T204951) [20:16:04] (03CR) 10Ottomata: [V: 032 C: 032] Set monitoring_enabled to true for cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478767 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [20:20:07] !log T210416: setting grafana.wikimedia.org (currently served by krypton) to read-only and copying to grafana1001 (serving grafana-beta) [20:20:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:11] T210416: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 [20:24:52] jouncebot: next [20:24:52] In 0 hour(s) and 35 minute(s): Services – Parsoid / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181210T2100) [20:24:59] OK, I've got a quick one. [20:25:32] (03CR) 10Jforrester: [C: 032] "Parsing team say go." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/478726 (owner: 10Jforrester) [20:25:37] (03CR) 10Jforrester: [C: 032] Uninstall the ParserMigration extension, Part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478727 (owner: 10Jforrester) [20:25:40] (03CR) 10Jforrester: [C: 032] Uninstall the ParserMigration extension, Part III [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478728 (owner: 10Jforrester) [20:25:54] !log messing with ulsfo power for 103.02.23 tower b, shouldnt disrupt anything T209101 [20:25:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:25:58] T209101: ulsfo: install new PDUs in racks / phase out APC loaner PDU use - https://phabricator.wikimedia.org/T209101 [20:26:18] !log T210416: switching grafana.wikimedia.org to point to grafana1001.eqiad.wmnet [20:26:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:26:21] (03PS1) 10Ottomata: Wrap package zookeeper in if !defined block [puppet/zookeeper] - 10https://gerrit.wikimedia.org/r/478770 [20:26:22] T210416: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 [20:26:37] (03PS4) 10Dzahn: admins: add new group for proton admins [puppet] - 10https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) [20:26:41] (03Merged) 10jenkins-bot: Uninstall the ParserMigration extension, Part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478726 (owner: 10Jforrester) [20:26:45] (03Merged) 10jenkins-bot: Uninstall the ParserMigration extension, Part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478727 (owner: 10Jforrester) [20:26:47] (03CR) 10Ottomata: [V: 032 C: 032] Wrap package zookeeper in if !defined block [puppet/zookeeper] - 10https://gerrit.wikimedia.org/r/478770 (owner: 10Ottomata) [20:26:49] (03CR) 10Dzahn: [C: 032] admins: add new group for proton admins [puppet] - 10https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) (owner: 
10Dzahn) [20:26:51] (03Merged) 10jenkins-bot: Uninstall the ParserMigration extension, Part III [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478728 (owner: 10Jforrester) [20:27:09] (03Merged) 10jenkins-bot: Wrap package zookeeper in if !defined block [puppet/zookeeper] - 10https://gerrit.wikimedia.org/r/478770 (owner: 10Ottomata) [20:27:23] (03PS1) 10CDanis: grafana1001 answers for grafana.wikimedia.org (the default) [puppet] - 10https://gerrit.wikimedia.org/r/478771 (https://phabricator.wikimedia.org/T210416) [20:27:34] (03CR) 10CDanis: [C: 032] grafana1001 answers for grafana.wikimedia.org (the default) [puppet] - 10https://gerrit.wikimedia.org/r/478771 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [20:27:45] (03PS1) 10Ottomata: Bump zookeeper submodule version [puppet] - 10https://gerrit.wikimedia.org/r/478772 (https://phabricator.wikimedia.org/T204951) [20:28:06] (03PS2) 10CDanis: grafana1001 answers for grafana.wikimedia.org (the default) [puppet] - 10https://gerrit.wikimedia.org/r/478771 (https://phabricator.wikimedia.org/T210416) [20:28:09] (03CR) 10CDanis: [V: 032 C: 032] grafana1001 answers for grafana.wikimedia.org (the default) [puppet] - 10https://gerrit.wikimedia.org/r/478771 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [20:28:31] (03PS3) 10CDanis: Switch grafana.wikimedia.org to point to grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/478765 (https://phabricator.wikimedia.org/T210416) [20:29:06] (03CR) 10Ottomata: [C: 032] Bump zookeeper submodule version [puppet] - 10https://gerrit.wikimedia.org/r/478772 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [20:29:13] (03PS2) 10Ottomata: Bump zookeeper submodule version [puppet] - 10https://gerrit.wikimedia.org/r/478772 (https://phabricator.wikimedia.org/T204951) [20:29:16] (03CR) 10Ottomata: [V: 032 C: 032] Bump zookeeper submodule version [puppet] - 10https://gerrit.wikimedia.org/r/478772 (https://phabricator.wikimedia.org/T204951) (owner: 
10Ottomata) [20:29:37] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: Uninstall the ParserMigration extension, Part I I338a3d8a87fd (duration: 00m 47s) [20:29:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:55] (03PS4) 10CDanis: Switch grafana.wikimedia.org to point to grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/478765 (https://phabricator.wikimedia.org/T210416) [20:30:40] cdanis: am merging [20:30:43] your change [20:30:49] ok go ahead [20:30:54] i will merge my next change immediately after [20:30:57] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Uninstall the ParserMigration extension, Part II I1f7266f55a (duration: 00m 46s) [20:30:58] k! [20:30:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:31:49] please let me know when done ottomata, I see a TODO in puppet-merge to add locking [20:32:35] (03CR) 10jenkins-bot: Uninstall the ParserMigration extension, Part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478726 (owner: 10Jforrester) [20:32:37] (03CR) 10jenkins-bot: Uninstall the ParserMigration extension, Part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478727 (owner: 10Jforrester) [20:32:37] !log jforrester@deploy1001 Synchronized wmf-config/extension-list: Uninstall the ParserMigration extension, Part III I332939809 (duration: 00m 46s) [20:32:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:32:39] (03CR) 10jenkins-bot: Uninstall the ParserMigration extension, Part III [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478728 (owner: 10Jforrester) [20:33:04] OK, conch released. [20:33:47] :) [20:33:48] cdanis: am done [20:33:50] proceed! 
[20:33:55] (03PS1) 10Cwhite: hiera: add trafficserver cluster definition [puppet] - 10https://gerrit.wikimedia.org/r/478774 (https://phabricator.wikimedia.org/T210486) [20:35:44] !log T210416: grafana.wikimedia.org switch to point to grafana1001.eqiad.wmnet (running grafana 5.4.1) [20:35:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:35:47] T210416: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 [20:36:23] woooo [20:36:48] (03PS1) 10Dzahn: admins: add to proton-admins: pmiazga, bsitzmann, mholloway, mbsantos, tgr [puppet] - 10https://gerrit.wikimedia.org/r/478776 (https://phabricator.wikimedia.org/T211382) [20:43:47] (03PS2) 10Dzahn: admins: add to proton-admins: pmiazga, bsitzmann, mholloway, mbsantos, tgr [puppet] - 10https://gerrit.wikimedia.org/r/478776 (https://phabricator.wikimedia.org/T211382) [20:43:54] (03CR) 10Imarlier: [C: 032] config: move wgMFNoindexPages to InitialiseSettings-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476060 (https://phabricator.wikimedia.org/T206497) (owner: 10Imarlier) [20:44:22] (03CR) 10Dzahn: [C: 032] "continued with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/478776/ which is the actual access requests that gives permissions" [puppet] - 10https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [20:45:32] PROBLEM - MariaDB Slave Lag: m3 on db2078 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 568.19 seconds [20:45:42] PROBLEM - MariaDB Slave Lag: m3 on db2042 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 575.24 seconds [20:45:55] (03PS3) 10Dzahn: admins: add to proton-admins: pmiazga, bsitzmann, mholloway, mbsantos, tgr [puppet] - 10https://gerrit.wikimedia.org/r/478776 (https://phabricator.wikimedia.org/T211382) [20:48:11] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a 
response was received [20:48:21] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10User-jijiki: Requesting access to `researchers` group for joewalsh - https://phabricator.wikimedia.org/T211115 (10Dzahn) approved in SRE-2018-12-10#Access_Requests [20:48:54] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10User-jijiki: Requesting access to deployment for Christoph Jauera (WMDE-Fisch) - https://phabricator.wikimedia.org/T211014 (10Dzahn) approved in SRE meeting SRE-2018-12-10#Access_Requests [20:49:05] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy [20:49:46] 10Operations, 10Analytics, 10SRE-Access-Requests, 10Patch-For-Review: Grant fdans permissions to deploy AQS in prod, and accessing the aqs hosts - https://phabricator.wikimedia.org/T211095 (10Dzahn) approved in SRE-2018-12-10#Access_Requests pending manager approval which is now done [20:51:00] (03CR) 10Mholloway: [C: 031] admins: add to proton-admins: pmiazga, bsitzmann, mholloway, mbsantos, tgr [puppet] - 10https://gerrit.wikimedia.org/r/478776 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [20:51:37] (03CR) 10Mobrovac: [C: 04-1] admins: add to proton-admins: pmiazga, bsitzmann, mholloway, mbsantos, tgr (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478776 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [20:52:06] 10Operations, 10ops-ulsfo: ulsfo: install new PDUs in racks / phase out APC loaner PDU use - https://phabricator.wikimedia.org/T209101 (10RobH) Ok, good news, the new PDUs will fit just fine in the racks, as long as we remove our cable managers. {F27481371} {F27481372} As one can see, the deeper 1U cable man... 
[20:52:24] (03PS1) 10Ottomata: Attempt to get around prometheus jmx exporter race condition on new cluster [puppet] - 10https://gerrit.wikimedia.org/r/478778 (https://phabricator.wikimedia.org/T204951) [20:53:57] (03PS2) 10Ottomata: Attempt to get around prometheus jmx exporter race condition on new cluster [puppet] - 10https://gerrit.wikimedia.org/r/478778 (https://phabricator.wikimedia.org/T204951) [20:56:01] (03CR) 10Ottomata: [C: 032] "https://puppet-compiler.wmflabs.org/compiler1002/13884/an-master1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/478778 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [20:57:51] 10Operations, 10monitoring, 10Patch-For-Review, 10Performance-Team (Radar): Provision >= 50% of statsd/Graphite-only metrics in Prometheus - https://phabricator.wikimedia.org/T205870 (10Imarlier) [20:57:55] PROBLEM - ensure kvm processes are running on cloudvirt1023 is CRITICAL: PROCS CRITICAL: 96 processes with regex args qemu-system-x86_64 [20:58:04] 10Operations, 10ops-ulsfo: ulsfo: install new PDUs in racks / phase out APC loaner PDU use - https://phabricator.wikimedia.org/T209101 (10RobH) [21:00:05] cscott, arlolra, subbu, bearND, halfak, and Amir1: (Dis)respected human, time to deploy Services – Parsoid / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181210T2100). Please do the needful. 
[21:00:05] RECOVERY - ensure kvm processes are running on cloudvirt1023 is OK: PROCS OK: 95 processes with regex args qemu-system-x86_64 [21:01:12] 10Operations, 10monitoring, 10Patch-For-Review, 10Performance-Team (Radar), 10User-CDanis: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 (10Imarlier) [21:01:17] I want to deploy something for ores [21:03:43] (03PS1) 10Ottomata: Undo last change and temporarily disable JMX prometheus in cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478779 (https://phabricator.wikimedia.org/T204951) [21:04:26] ores rev to rollback in case needed: 9b9ba06265c9191a0087ecdf25fbef712c642953 [21:05:18] (03CR) 10Ottomata: [C: 032] Undo last change and temporarily disable JMX prometheus in cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478779 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [21:05:21] !log ladsgroup@deploy1001 Started deploy [ores/deploy@03b9c98]: Add celery4 configs back to the deploy repo [21:05:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:06:06] (03PS12) 10CRusnov: Add reports deployment to netbox profile [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) [21:06:21] Shiny fancy grafana [21:06:31] 10Operations, 10DBA, 10Performance-Team: Increase parsercache keys TTL from 22 days back to 30 days - https://phabricator.wikimedia.org/T210992 (10Imarlier) a:03aaron @aaron to provide feedback, will assign back once he has. [21:07:01] 10Operations, 10Analytics, 10SRE-Access-Requests, 10Patch-For-Review: Grant fdans permissions to deploy AQS in prod, and accessing the aqs hosts - https://phabricator.wikimedia.org/T211095 (10Dzahn) @fdans said: > There are two things I'd like to do .. - "Deploy AQS with scap from deployment.eqiad.wmnet"... 
[21:07:14] 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: Setup elasticsearch on new codfw servers - https://phabricator.wikimedia.org/T210265 (10debt) 05Open>03Resolved [21:07:25] 10Operations, 10Analytics, 10Performance-Team, 10Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (10Gilles) a:03Gilles [21:07:52] 10Operations, 10Analytics, 10SRE-Access-Requests, 10Patch-For-Review: Grant fdans permissions to deploy AQS in prod, and accessing the aqs hosts - https://phabricator.wikimedia.org/T211095 (10Dzahn) 05Open>03Resolved a:03Dzahn [21:10:31] 10Operations, 10ops-ulsfo: ulsfo: install new PDUs in racks / phase out APC loaner PDU use - https://phabricator.wikimedia.org/T209101 (10RobH) So some bad news: I only ordered enough brackets for half the PDUs. T211632 has been created to order the other half. However, this will NOT block the migration to... [21:10:32] (03PS13) 10CRusnov: Add reports deployment to netbox profile [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) [21:13:03] (03CR) 10Volans: [C: 031] "LGTM, compiler too seems happy:" [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) (owner: 10CRusnov) [21:13:32] 10Operations, 10Discovery-Search (Current work), 10Epic, 10Patch-For-Review: Migrate elasticsearch scripts to spicerack cookbooks - https://phabricator.wikimedia.org/T202885 (10debt) [21:13:37] 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: Write cookbooks to support spicerack's elasticsearch multi cluster/instance - https://phabricator.wikimedia.org/T207919 (10debt) 05Open>03Resolved [21:14:49] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@9f4b567]: More internal promisification and other performance tweaks (T202642) [21:14:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:14:52] T202642: Investigate 
how to fix the performance problems caused by CPU bound work on the MCS services - https://phabricator.wikimedia.org/T202642 [21:17:13] (03PS1) 10Ottomata: Use either /usr/lib/zookeeper or /usr/share/zookeeper [puppet/cdh] - 10https://gerrit.wikimedia.org/r/478781 [21:17:26] (03PS2) 10Ottomata: Use either /usr/lib/zookeeper or /usr/share/zookeeper [puppet/cdh] - 10https://gerrit.wikimedia.org/r/478781 [21:17:56] !log arlolra@deploy1001 Started deploy [parsoid/deploy@dc9b3a1]: Updating Parsoid to 19560da [21:17:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:18:17] (03PS14) 10CRusnov: Add reports deployment to netbox profile [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) [21:19:06] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@9f4b567]: More internal promisification and other performance tweaks (T202642) (duration: 04m 17s) [21:19:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:19:15] (03CR) 10CRusnov: [C: 032] Add reports deployment to netbox profile [puppet] - 10https://gerrit.wikimedia.org/r/477845 (https://phabricator.wikimedia.org/T205899) (owner: 10CRusnov) [21:19:40] (03PS3) 10Ottomata: Use either /usr/lib/zookeeper or /usr/share/zookeeper [puppet/cdh] - 10https://gerrit.wikimedia.org/r/478781 [21:20:15] (03CR) 10Ottomata: [C: 032] Use either /usr/lib/zookeeper or /usr/share/zookeeper [puppet/cdh] - 10https://gerrit.wikimedia.org/r/478781 (owner: 10Ottomata) [21:20:45] !log ladsgroup@deploy1001 Finished deploy [ores/deploy@03b9c98]: Add celery4 configs back to the deploy repo (duration: 15m 25s) [21:20:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:21:04] (03PS1) 10Ottomata: Bump cdh submodule to vary zkCli.sh path [puppet] - 10https://gerrit.wikimedia.org/r/478782 (https://phabricator.wikimedia.org/T204951) [21:24:10] (03CR) 10Ottomata: [C: 032] Bump cdh submodule to vary zkCli.sh path 
[puppet] - 10https://gerrit.wikimedia.org/r/478782 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [21:24:20] (03PS2) 10Ottomata: Bump cdh submodule to vary zkCli.sh path [puppet] - 10https://gerrit.wikimedia.org/r/478782 (https://phabricator.wikimedia.org/T204951) [21:24:22] (03CR) 10Ottomata: [V: 032 C: 032] Bump cdh submodule to vary zkCli.sh path [puppet] - 10https://gerrit.wikimedia.org/r/478782 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [21:27:01] PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:14] 10Operations, 10ops-ulsfo, 10Traffic: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327 (10RobH) [21:28:16] 10Operations, 10ops-ulsfo, 10Traffic, 10Patch-For-Review: setup bast4002/WMF7218 - https://phabricator.wikimedia.org/T179050 (10RobH) 05Open>03Resolved [21:28:19] (03PS1) 10Ottomata: Set necessary PATH in hadoop-hdfs-zkfc-init [puppet/cdh] - 10https://gerrit.wikimedia.org/r/478783 [21:28:39] netbox is known, chaomodus is looking into it ^^^ [21:29:09] (03CR) 10Ottomata: [C: 032] Set necessary PATH in hadoop-hdfs-zkfc-init [puppet/cdh] - 10https://gerrit.wikimedia.org/r/478783 (owner: 10Ottomata) [21:29:11] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@dc9b3a1]: Updating Parsoid to 19560da (duration: 11m 15s) [21:29:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:30:10] 10Operations, 10ops-ulsfo, 10Traffic, 10decommission: decommission/replace bast4001.wikimedia.org - https://phabricator.wikimedia.org/T178592 (10RobH) [21:30:23] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Dzahn) >>! In T208622#4811324, @Banyek wrote: > and the password issue too. We have a previous case where we solved "give access to a mysql db to a pu... 
[21:30:51] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [21:30:52] (03PS1) 10Ottomata: cdh submodule bump https://gerrit.wikimedia.org/r/#/c/operations/puppet/cdh/+/478783/ [puppet] - 10https://gerrit.wikimedia.org/r/478784 [21:31:02] (03PS2) 10Ottomata: cdh submodule bump https://gerrit.wikimedia.org/r/#/c/operations/puppet/cdh/+/478783/ [puppet] - 10https://gerrit.wikimedia.org/r/478784 [21:31:44] (03CR) 10jerkins-bot: [V: 04-1] cdh submodule bump https://gerrit.wikimedia.org/r/#/c/operations/puppet/cdh/+/478783/ [puppet] - 10https://gerrit.wikimedia.org/r/478784 (owner: 10Ottomata) [21:32:26] (03CR) 10Ottomata: [V: 032 C: 032] cdh submodule bump https://gerrit.wikimedia.org/r/#/c/operations/puppet/cdh/+/478783/ [puppet] - 10https://gerrit.wikimedia.org/r/478784 (owner: 10Ottomata) [21:32:45] (03PS1) 10RobH: decommission bast4001 [puppet] - 10https://gerrit.wikimedia.org/r/478785 (https://phabricator.wikimedia.org/T178592) [21:33:13] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational [21:33:35] (03PS2) 10RobH: decommission bast4001 [puppet] - 10https://gerrit.wikimedia.org/r/478785 (https://phabricator.wikimedia.org/T178592) [21:33:37] (03PS5) 10Cwhite: incorporating first round of feedback [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/477360 (https://phabricator.wikimedia.org/T208066) [21:33:59] RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 346 bytes in 0.012 second response time [21:34:44] (03CR) 10RobH: [C: 032] decommission bast4001 [puppet] - 10https://gerrit.wikimedia.org/r/478785 (https://phabricator.wikimedia.org/T178592) (owner: 10RobH) [21:35:12] should be fixed now :3 [21:35:23] dunno what exactly happened, it restarted fine after breaking itself [21:37:16] 10Operations, 10ops-ulsfo, 10decommission: decommission/replace bast4001.wikimedia.org - 
https://phabricator.wikimedia.org/T178592 (10RobH) [21:39:06] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Dzahn) P.S. One issue with my suggestion above, don't use mysql::config::client but mariadb::config or something else in mariadb:: . We want to get rid... [21:39:08] 10Operations, 10ops-ulsfo, 10decommission: decommission/replace bast4001.wikimedia.org - https://phabricator.wikimedia.org/T178592 (10ops-monitoring-bot) wmf-decommission-host was executed by robh for bast4001.wikimedia.org and performed the following actions: - Revoked Puppet certificate - Removed from Pupp... [21:39:33] (03PS1) 10Andrew Bogott: Neutron: allow VMs to access the neutron API [puppet] - 10https://gerrit.wikimedia.org/r/478786 (https://phabricator.wikimedia.org/T211391) [21:40:04] (03CR) 10SBassett: [C: 031] "These all make sense to add IMO." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478467 (owner: 10Gergő Tisza) [21:40:06] 10Operations, 10ops-ulsfo, 10decommission: decommission/replace bast4001.wikimedia.org - https://phabricator.wikimedia.org/T178592 (10RobH) a:03RobH [21:40:22] (03CR) 10jerkins-bot: [V: 04-1] Neutron: allow VMs to access the neutron API [puppet] - 10https://gerrit.wikimedia.org/r/478786 (https://phabricator.wikimedia.org/T211391) (owner: 10Andrew Bogott) [21:40:40] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Ottomata) Right, the 'researchers' group has no association with the Research team. Its just a bunch of people who get access to 'research' MySQL inst... 
[21:42:07] (03PS2) 10Andrew Bogott: Neutron: allow VMs to access the neutron API [puppet] - 10https://gerrit.wikimedia.org/r/478786 (https://phabricator.wikimedia.org/T211391) [21:42:09] (03PS1) 10Effie Mouzeli: admin: Add Greta Doci to ldap_only group [puppet] - 10https://gerrit.wikimedia.org/r/478787 (https://phabricator.wikimedia.org/T211126) [21:43:21] PROBLEM - Host mw1272 is DOWN: PING CRITICAL - Packet loss = 100% [21:51:12] (03PS1) 10CRusnov: Fix typo in oldhardware report. [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/478788 [21:53:01] (03PS1) 10Ottomata: Move cloud-analytics zookeeper to ca-conf-* [puppet] - 10https://gerrit.wikimedia.org/r/478789 (https://phabricator.wikimedia.org/T204951) [21:53:40] (03PS1) 10Effie Mouzeli: admin: Add wmde-fisch to deployment group [puppet] - 10https://gerrit.wikimedia.org/r/478790 (https://phabricator.wikimedia.org/T211014) [21:53:45] (03CR) 10CRusnov: [V: 032 C: 032] "Quick typo fix." [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/478788 (owner: 10CRusnov) [21:53:50] (03PS3) 10Imarlier: config: move wgMFNoindexPages to InitialiseSettings-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476060 (https://phabricator.wikimedia.org/T206497) [21:53:54] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.794 second response time [21:53:57] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Dzahn) >>! In T208622#4812114, @Ottomata wrote: > Perhaps a good group name would be `research-admins`? I was thinking `research-team` to make a point... 
[21:53:59] (03CR) 10Ottomata: [C: 032] Move cloud-analytics zookeeper to ca-conf-* [puppet] - 10https://gerrit.wikimedia.org/r/478789 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [21:57:14] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:58:21] (03CR) 10Imarlier: config: move wgMFNoindexPages to InitialiseSettings-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476060 (https://phabricator.wikimedia.org/T206497) (owner: 10Imarlier) [21:58:26] (03CR) 10Imarlier: [C: 032] config: move wgMFNoindexPages to InitialiseSettings-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476060 (https://phabricator.wikimedia.org/T206497) (owner: 10Imarlier) [21:59:31] (03Merged) 10jenkins-bot: config: move wgMFNoindexPages to InitialiseSettings-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476060 (https://phabricator.wikimedia.org/T206497) (owner: 10Imarlier) [22:00:04] bawolff and Reedy: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181210T2200). 
[22:02:46] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.502 second response time [22:05:28] (03CR) 10Dzahn: [C: 032] admin: Add wmde-fisch to deployment group [puppet] - 10https://gerrit.wikimedia.org/r/478790 (https://phabricator.wikimedia.org/T211014) (owner: 10Effie Mouzeli) [22:06:09] (03PS2) 10Dzahn: admin: Add Greta Doci to ldap_only group [puppet] - 10https://gerrit.wikimedia.org/r/478787 (https://phabricator.wikimedia.org/T211126) (owner: 10Effie Mouzeli) [22:06:14] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:08:18] (03CR) 10Dzahn: [C: 032] "uidNumber: 20496" [puppet] - 10https://gerrit.wikimedia.org/r/478787 (https://phabricator.wikimedia.org/T211126) (owner: 10Effie Mouzeli) [22:09:12] (03PS2) 10Dzahn: admin: Add wmde-fisch to deployment group [puppet] - 10https://gerrit.wikimedia.org/r/478790 (https://phabricator.wikimedia.org/T211014) (owner: 10Effie Mouzeli) [22:11:17] (03CR) 10jenkins-bot: config: move wgMFNoindexPages to InitialiseSettings-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476060 (https://phabricator.wikimedia.org/T206497) (owner: 10Imarlier) [22:13:26] !log Welcome new Mediawiki deployer Christoph 'WMDE-Fisch' Jauera (T211014) [22:13:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:13:30] T211014: Requesting access to deployment for Christoph Jauera (WMDE-Fisch) - https://phabricator.wikimedia.org/T211014 [22:22:16] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10User-jijiki: Requesting access to deployment for Christoph Jauera (WMDE-Fisch) - https://phabricator.wikimedia.org/T211014 (10Dzahn) 05Open>03Resolved a:03Dzahn On `deployment1001.eqiad.wmne... 
[22:24:59] (03PS3) 10Dzahn: admin: added joewalsh to researchers group [puppet] - 10https://gerrit.wikimedia.org/r/478707 (https://phabricator.wikimedia.org/T211115) (owner: 10Mathew.onipe) [22:29:46] RECOVERY - MariaDB Slave Lag: m3 on db2078 is OK: OK slave_sql_lag Replication lag: 0.15 seconds [22:30:14] RECOVERY - MariaDB Slave Lag: m3 on db2042 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [22:31:32] (03CR) 10Dzahn: [C: 031] "thanks Matt. looks all good. https://puppet-compiler.wmflabs.org/compiler1002/13886/ will merge it" [puppet] - 10https://gerrit.wikimedia.org/r/478707 (https://phabricator.wikimedia.org/T211115) (owner: 10Mathew.onipe) [22:31:47] (03PS1) 10Ottomata: Set yarn and hadoop heapsize for cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478798 (https://phabricator.wikimedia.org/T204951) [22:32:26] (03CR) 10Ottomata: [V: 032 C: 032] Set yarn and hadoop heapsize for cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478798 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [22:32:32] (03PS2) 10Ottomata: Set yarn and hadoop heapsize for cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478798 (https://phabricator.wikimedia.org/T204951) [22:32:35] (03CR) 10Ottomata: [V: 032 C: 032] Set yarn and hadoop heapsize for cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478798 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [22:35:24] (03PS6) 10Cwhite: incorporating first round of feedback [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/477360 (https://phabricator.wikimedia.org/T208066) [22:36:30] (03CR) 10Cwhite: [C: 032] incorporating first round of feedback [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/477360 (https://phabricator.wikimedia.org/T208066) (owner: 10Cwhite) [22:37:25] 10Operations, 10Traffic, 10netops: IPv6 ~20ms higher ping than IPv4 to gerrit - https://phabricator.wikimedia.org/T211079 (10ayounsi) Talked to Faidon last 
week, we agreed that a mechanism to ignore AS paths learned from the route servers would be a useful thing to have and not only a hotfix for this issue.... [22:37:44] (03PS1) 10Catrope: Enable emails for certain notification types by default on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478799 (https://phabricator.wikimedia.org/T211620) [22:37:59] 10Operations, 10Traffic, 10netops: IPv6 ~20ms higher ping than IPv4 to gerrit - https://phabricator.wikimedia.org/T211079 (10ayounsi) a:03ayounsi [22:40:39] (03CR) 10Cwhite: [C: 032] memcached: remove memcached diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/469250 (https://phabricator.wikimedia.org/T183454) (owner: 10Cwhite) [22:40:46] (03PS2) 10Cwhite: memcached: remove memcached diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/469250 (https://phabricator.wikimedia.org/T183454) [22:42:30] (03PS1) 10Ottomata: Set Xmx for datanode and nodemanager in cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478800 (https://phabricator.wikimedia.org/T204951) [22:42:59] (03CR) 10Ottomata: [V: 032 C: 032] Set Xmx for datanode and nodemanager in cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478800 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [22:43:07] (03PS2) 10Ottomata: Set Xmx for datanode and nodemanager in cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478800 (https://phabricator.wikimedia.org/T204951) [22:43:13] (03CR) 10Ottomata: [V: 032 C: 032] Set Xmx for datanode and nodemanager in cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478800 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [22:44:53] 10Operations, 10ops-ulsfo, 10decommission: decommission/replace bast4001.wikimedia.org - https://phabricator.wikimedia.org/T178592 (10RobH) wipe is in progress via usb live image boot, it'll take 24-48 hours to complete, so I'll just check it when I'm onsite next. 
[22:45:42] (03CR) 10Dzahn: [C: 032] admin: added joewalsh to researchers group [puppet] - 10https://gerrit.wikimedia.org/r/478707 (https://phabricator.wikimedia.org/T211115) (owner: 10Mathew.onipe) [22:45:50] (03PS4) 10Dzahn: admin: added joewalsh to researchers group [puppet] - 10https://gerrit.wikimedia.org/r/478707 (https://phabricator.wikimedia.org/T211115) (owner: 10Mathew.onipe) [22:46:28] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10bcampbell) We can set these aliases up on our end. Just let me know when to do so. I imagine you'll remove the aliases from your end and we'll add said aliases to our end? [22:49:08] (03PS5) 10Cwhite: role, profile: install, run, and collect icinga exporter metrics [puppet] - 10https://gerrit.wikimedia.org/r/476431 (https://phabricator.wikimedia.org/T208066) [22:49:52] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10Dzahn) Thanks @bcampbell ! Actually it's usually the other way around, you can create these any time and let us know when done and then we delete our side and it should... 
[22:51:45] (03CR) 10Cwhite: [C: 032] role, profile: install, run, and collect icinga exporter metrics [puppet] - 10https://gerrit.wikimedia.org/r/476431 (https://phabricator.wikimedia.org/T208066) (owner: 10Cwhite) [22:55:40] (03PS1) 10Ottomata: Reenable hadoop prometheus jmx exporters in cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478801 (https://phabricator.wikimedia.org/T204951) [22:56:25] (03CR) 10Ottomata: [C: 032] Reenable hadoop prometheus jmx exporters in cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478801 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [22:56:31] (03PS2) 10Ottomata: Reenable hadoop prometheus jmx exporters in cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478801 (https://phabricator.wikimedia.org/T204951) [22:56:33] (03CR) 10Ottomata: [V: 032 C: 032] Reenable hadoop prometheus jmx exporters in cloud-analytics [puppet] - 10https://gerrit.wikimedia.org/r/478801 (https://phabricator.wikimedia.org/T204951) (owner: 10Ottomata) [22:56:54] 10Operations, 10Traffic, 10netops: Free up 185.15.59.0/24 - https://phabricator.wikimedia.org/T211254 (10ayounsi) [23:00:07] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10User-jijiki: Requesting access to `researchers` group for joewalsh - https://phabricator.wikimedia.org/T211115 (10Dzahn) 05Open>03Resolved a:03Dzahn @JoeWalsh This is now approved and granted. I see you had shell access in the past that was d... [23:00:55] hello! I am here to deploy my changes for the Google Code-in task [23:02:21] Oh shoot I'm an hour early.. sorry about that [23:02:30] jouncebot, next [23:02:30] In 0 hour(s) and 57 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181211T0000) [23:05:48] 10Operations, 10Traffic, 10netops: Free up 185.15.59.0/24 - https://phabricator.wikimedia.org/T211254 (10ayounsi) Added some context in the task description. 
In addition, 185.15.58.0/24 is currently reserved as a 2nd anycast range (since T98006 I think), which goes against the idea of segregating the whole... [23:06:23] (03CR) 10Dzahn: admins: add to proton-admins: pmiazga, bsitzmann, mholloway, mbsantos, tgr (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478776 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [23:06:29] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to trustandsafety@ - https://phabricator.wikimedia.org/T210464 (10bcampbell) @Dzahn makes sense, thanks. I'll let you know. @Jalexander it looks like you requested OIT to rename trustandsafety@ to tsops@ on 8/1/18. I can confirm that... [23:06:59] (03PS4) 10Dzahn: admins: add to proton-admins: pmiazga, bsitzmann, mholloway, mbsantos, tgr [puppet] - 10https://gerrit.wikimedia.org/r/478776 (https://phabricator.wikimedia.org/T211382) [23:07:14] 10Operations, 10monitoring, 10Patch-For-Review, 10Performance-Team (Radar), 10User-CDanis: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 (10CDanis) 05Open>03Resolved [23:07:36] 10Operations, 10monitoring, 10Patch-For-Review, 10Performance-Team (Radar), 10User-CDanis: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 (10CDanis) Need to create some other tasks to track work that should be done with new 5.x features but marking this as done :) [23:07:59] 10Operations, 10Proton, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10Dzahn) a:03Dzahn [23:08:15] 10Operations, 10Proton, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10Dzahn) p:05Triage>03High [23:09:39] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic, and 2 others: access to analytics-privatedata-users for @toddleroux, @Afandian, & 
@RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10Dzahn) a:05toddleroux>03Dzahn [23:15:23] (03PS2) 10Dzahn: admins: add user toddleroux [puppet] - 10https://gerrit.wikimedia.org/r/478717 (https://phabricator.wikimedia.org/T209298) (owner: 10Mathew.onipe) [23:16:38] (03CR) 10Dzahn: [C: 032] admins: add user toddleroux [puppet] - 10https://gerrit.wikimedia.org/r/478717 (https://phabricator.wikimedia.org/T209298) (owner: 10Mathew.onipe) [23:20:38] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic, and 2 others: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10Dzahn) [23:21:57] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic, and 2 others: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10Dzahn) [23:24:32] (03PS1) 10Dzahn: admins: add toddleroux to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/478802 (https://phabricator.wikimedia.org/T209298) [23:25:23] (03CR) 10Dzahn: [C: 032] "followed by https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/478802/" [puppet] - 10https://gerrit.wikimedia.org/r/478717 (https://phabricator.wikimedia.org/T209298) (owner: 10Mathew.onipe) [23:26:27] (03CR) 10Dzahn: [C: 032] "same ticket as "ryanmax, afandian2", just finishing it by adding the last of 3 users" [puppet] - 10https://gerrit.wikimedia.org/r/478802 (https://phabricator.wikimedia.org/T209298) (owner: 10Dzahn) [23:27:15] (03PS1) 10Cwhite: decode status file as ISO-8859-1 [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/478803 (https://phabricator.wikimedia.org/T208066) [23:27:47] (03CR) 10Cwhite: [C: 032] decode status file as ISO-8859-1 [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/478803 (https://phabricator.wikimedia.org/T208066) (owner: 10Cwhite) [23:30:35] 10Operations, 10Research-Programs, 
10SRE-Access-Requests, 10Epic, and 2 others: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10Dzahn) @toddleroux Your user has been created now. For example i ran puppet and i see it on the h... [23:31:34] 10Operations, 10Traffic, 10netops: IPv6 ~20ms higher ping than IPv4 to gerrit - https://phabricator.wikimedia.org/T211079 (10faidon) - It's been a while, but I believe an import statement in the neighbor block overrides the parent one in its entirety, and does not supplement it, so we'd have to repeat the wh... [23:33:08] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic, and 2 others: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10Dzahn) @Afandian Is your access working meanwhile? If not, please add details which host you are... [23:33:51] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic, and 2 others: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10Miriam) @Dzahn thank you so much! [23:34:16] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic, and 2 others: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10Dzahn) p:05High>03Normal Everything should be done, lowering prio from High to Normal, just w... 
[23:37:54] 10Operations, 10Icinga, 10fundraising-tech-ops: Errors connecting to icinga2001.wikimedia.org - https://phabricator.wikimedia.org/T211641 (10Dzahn) [23:38:59] 10Operations, 10Icinga, 10fundraising-tech-ops: frack / passive icinga checks: Errors connecting to icinga2001.wikimedia.org - https://phabricator.wikimedia.org/T211641 (10Dzahn) [23:39:53] 10Operations, 10ops-ulsfo, 10decommission: decommission/replace bast4001.wikimedia.org - https://phabricator.wikimedia.org/T178592 (10RobH) So, this is on asw2-ulsfo:ge-2/0/12 ` robh@asw2-ulsfo# show | compare [edit interfaces interface-range vlan-public1-ulsfo] - member ge-2/0/12; ` There wasn't a d... [23:40:04] 10Operations, 10Icinga, 10fundraising-tech-ops: frack / passive icinga checks: Errors connecting to icinga2001.wikimedia.org - https://phabricator.wikimedia.org/T211641 (10Dzahn) p:05Triage>03Normal normal prio, the production checks are working, it's about the standby host dropping packets from send_nsc... [23:40:16] 10Operations, 10ops-ulsfo, 10decommission: decommission/replace bast4001.wikimedia.org - https://phabricator.wikimedia.org/T178592 (10RobH) [23:47:50] PROBLEM - ensure kvm processes are running on cloudvirt1023 is CRITICAL: PROCS CRITICAL: 96 processes with regex args qemu-system-x86_64 [23:48:45] andrewbogott ^^ [23:49:02] RECOVERY - ensure kvm processes are running on cloudvirt1023 is OK: PROCS OK: 95 processes with regex args qemu-system-x86_64 [23:49:25] (03CR) 10Volans: [C: 031] "LGTM, see inline for a comment." 
(031 comment) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/478458 (https://phabricator.wikimedia.org/T205899) (owner: 10CRusnov) [23:49:28] (03PS1) 10Dmaza: Increase default minimum password length on multiple group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478807 (https://phabricator.wikimedia.org/T208246) [23:51:40] yeah, it's a false alarm but I don't know why it's happening [23:51:56] !log silencing the kvm process count alert on cloudvirt1023 until I can figure out why it's misfiring [23:51:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log