[00:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: It is that lovely time of the day again! You are hereby commanded to deploy Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181207T0000). [00:00:04] ebernhardson: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [00:02:38] just me? i can ship it [00:02:48] 10Operations, 10Traffic, 10Browser-Support-Apple-Safari, 10Upstream: Fix broken referer categorization for visits from Safari browsers - https://phabricator.wikimedia.org/T154702 (10Tbayer) I added an entry about this to the log at https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_h... [00:03:19] (03CR) 10EBernhardson: [C: 032] Turn off wbsearchentities test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478103 (https://phabricator.wikimedia.org/T209402) (owner: 10EBernhardson) [00:04:26] (03Merged) 10jenkins-bot: Turn off wbsearchentities test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478103 (https://phabricator.wikimedia.org/T209402) (owner: 10EBernhardson) [00:10:14] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:478103|turn off wbsearchentities ab test]] T209402 (duration: 00m 47s) [00:10:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:10:17] T209402: A/B testing plan for wbsearchentities, context=item - https://phabricator.wikimedia.org/T209402 [00:10:54] (03PS3) 10Dzahn: ci::master: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/453554 [00:14:49] (03PS4) 10Dzahn: ci::master: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/453554 [00:14:58] (03CR) 10jenkins-bot: Turn off wbsearchentities test [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/478103 (https://phabricator.wikimedia.org/T209402) (owner: 10EBernhardson) [00:16:33] (03CR) 10Dzahn: [C: 031] "works now! https://puppet-compiler.wmflabs.org/compiler1002/13863/" [puppet] - 10https://gerrit.wikimedia.org/r/453554 (owner: 10Dzahn) [00:19:48] (03PS5) 10Dzahn: ci::master: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/453554 [00:28:19] (03PS1) 10Dzahn: ci::httpd: add support for stretch/PHP 7.0 [puppet] - 10https://gerrit.wikimedia.org/r/478125 [00:34:58] (03PS3) 10Cwhite: incorporating first round of feedback [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/477360 (https://phabricator.wikimedia.org/T208066) [00:36:55] (03PS4) 10Cwhite: incorporating first round of feedback [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/477360 (https://phabricator.wikimedia.org/T208066) [00:45:47] (03PS1) 10Mforns: Adjust params for Analytics data_purge EventLoggingSanitization job [puppet] - 10https://gerrit.wikimedia.org/r/478129 (https://phabricator.wikimedia.org/T202429) [00:46:40] (03CR) 10jerkins-bot: [V: 04-1] Adjust params for Analytics data_purge EventLoggingSanitization job [puppet] - 10https://gerrit.wikimedia.org/r/478129 (https://phabricator.wikimedia.org/T202429) (owner: 10Mforns) [00:47:41] !log done troubleshooting bird bfd on dns2001/cr1-codfw [00:47:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:47:07] PROBLEM - Check systemd state on pc2007 is CRITICAL: connect to address 10.192.0.104 port 5666: Connection refused [01:47:07] PROBLEM - puppet last run on pc2007 is CRITICAL: connect to address 10.192.0.104 port 5666: Connection refused [01:47:11] PROBLEM - Check size of conntrack table on pc2007 is CRITICAL: connect to address 10.192.0.104 port 5666: Connection refused [01:47:14] PROBLEM - MariaDB disk space on pc2007 is CRITICAL: connect to address 10.192.0.104 port 5666: Connection refused [01:47:21] PROBLEM 
- Check whether ferm is active by checking the default input chain on pc2007 is CRITICAL: connect to address 10.192.0.104 port 5666: Connection refused [01:47:22] PROBLEM - mysqld processes on pc2007 is CRITICAL: connect to address 10.192.0.104 port 5666: Connection refused [01:47:31] PROBLEM - MariaDB Slave IO: pc1 on pc2007 is CRITICAL: connect to address 10.192.0.104 port 5666: Connection refused [01:47:39] PROBLEM - MariaDB Slave SQL: pc1 on pc2007 is CRITICAL: connect to address 10.192.0.104 port 5666: Connection refused [01:47:43] PROBLEM - dhclient process on pc2007 is CRITICAL: connect to address 10.192.0.104 port 5666: Connection refused [01:48:49] RECOVERY - MariaDB Slave SQL: pc1 on pc2007 is OK: OK slave_sql_state Slave_SQL_Running: Yes [01:48:53] RECOVERY - dhclient process on pc2007 is OK: PROCS OK: 0 processes with command name dhclient [01:49:31] RECOVERY - Check systemd state on pc2007 is OK: OK - running: The system is fully operational [01:49:37] RECOVERY - Check size of conntrack table on pc2007 is OK: OK: nf_conntrack is 0 % full [01:49:38] RECOVERY - MariaDB disk space on pc2007 is OK: DISK OK [01:49:45] RECOVERY - Check whether ferm is active by checking the default input chain on pc2007 is OK: OK ferm input default policy is set [01:49:48] RECOVERY - mysqld processes on pc2007 is OK: PROCS OK: 1 process with command name mysqld [01:49:55] RECOVERY - MariaDB Slave IO: pc1 on pc2007 is OK: OK slave_io_state Slave_IO_Running: Yes [01:52:19] RECOVERY - puppet last run on pc2007 is OK: OK: Puppet is currently enabled, last run 29 minutes ago with 0 failures [01:59:26] I am not sure what happened there, some stuff in syslog that looks like a networking blip? 
[02:11:45] agreed, nothing obvious and looks like network was gone for a short time [02:22:05] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install codfw logstash elasticsearch storage servers - https://phabricator.wikimedia.org/T211065 (10Papaul) @fgiunchedi the installation is complaining about not finding any swap partition. ──────────────────────┤ [!!] Partition disks ├──────────────... [02:23:29] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install codfw logstash elasticsearch storage servers - https://phabricator.wikimedia.org/T211065 (10Papaul) [02:29:09] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is CRITICAL: cluster=cache_text site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [02:30:21] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [02:35:59] (03PS1) 10Dzahn: change IP for logstash2001, 10.192.0.104 is already in use [dns] - 10https://gerrit.wikimedia.org/r/478132 (https://phabricator.wikimedia.org/T211065) [02:37:55] (03PS1) 10Dzahn: add missing PTR for pc2007 [dns] - 10https://gerrit.wikimedia.org/r/478133 [02:42:59] (03PS2) 10Dzahn: change IP for logstash2001, 10.192.0.104 is already in use [dns] - 10https://gerrit.wikimedia.org/r/478132 (https://phabricator.wikimedia.org/T211065) [02:43:10] (03CR) 10Dzahn: [C: 032] change IP for logstash2001, 10.192.0.104 is already in use [dns] - 10https://gerrit.wikimedia.org/r/478132 (https://phabricator.wikimedia.org/T211065) (owner: 10Dzahn) [02:45:02] (03CR) 10Papaul: [C: 032] change IP for logstash2001, 10.192.0.104 is already in use [dns] - 10https://gerrit.wikimedia.org/r/478132 (https://phabricator.wikimedia.org/T211065) (owner: 10Dzahn) [02:47:30] (03PS2) 10Dzahn: add missing PTR for pc2007 [dns] - 
10https://gerrit.wikimedia.org/r/478133 [02:48:42] (03CR) 10Papaul: [C: 032] add missing PTR for pc2007 [dns] - 10https://gerrit.wikimedia.org/r/478133 (owner: 10Dzahn) [03:34:13] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 891.05 seconds [03:46:48] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [03:56:23] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [04:09:07] PROBLEM - puppet last run on scb1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:16:07] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 293.58 seconds [04:17:22] (03PS3) 10Tim Starling: Refactor profiler.php and X-Wikimedia-Debug parsing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477939 [04:17:24] (03PS2) 10Tim Starling: Class wrapper for ProductionServices.php etc. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/477956 [04:17:26] (03PS2) 10Tim Starling: Put profiler hostnames in ProductionServices.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477957 [04:17:28] (03PS1) 10Tim Starling: Excimer and Tideways support [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478137 [04:18:24] (03CR) 10jerkins-bot: [V: 04-1] Refactor profiler.php and X-Wikimedia-Debug parsing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477939 (owner: 10Tim Starling) [04:19:34] (03CR) 10jerkins-bot: [V: 04-1] Excimer and Tideways support [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478137 (owner: 10Tim Starling) [04:24:37] PROBLEM - HHVM rendering on mw1345 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:25:39] RECOVERY - HHVM rendering on mw1345 is OK: HTTP OK: HTTP/1.1 200 OK - 82296 bytes in 0.100 second response time [04:30:30] (03PS2) 10Tim Starling: Excimer and Tideways support [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478137 [04:32:16] (03CR) 10jerkins-bot: [V: 04-1] Excimer and Tideways support [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478137 (owner: 10Tim Starling) [04:35:01] RECOVERY - puppet last run on scb1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [04:35:28] (03PS3) 10Tim Starling: Excimer and Tideways support [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478137 [04:37:14] (03CR) 10jerkins-bot: [V: 04-1] Excimer and Tideways support [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478137 (owner: 10Tim Starling) [04:37:41] (03CR) 10Tim Starling: "The Excimer part of this is tested. 
I made an auto_prepend_file on my local wiki which required WMF's profiler.php, and then invoked wmfSe" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478137 (owner: 10Tim Starling) [04:51:29] (03PS1) 10Tulsi Bhagat: Enable 'flood' user group at ne.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478138 [05:02:01] 10Operations, 10Cloud-Services: Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) - https://phabricator.wikimedia.org/T211271 (10bd808) [05:26:07] 10Operations, 10Cloud-Services: Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) - https://phabricator.wikimedia.org/T211271 (10bd808) ` $ sudo -i systemctl status apache2.service ● apache2.service - The Apache HTTP Server Loaded: loaded (/lib/systemd/system... [05:26:27] 10Operations, 10Cloud-Services, 10cloud-services-team (Kanban): Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) - https://phabricator.wikimedia.org/T211271 (10bd808) [05:28:29] (03PS2) 10Tulsi Bhagat: Enable 'flood' user group at ne.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478138 (https://phabricator.wikimedia.org/T211181) [05:32:20] (03PS3) 10Tulsi Bhagat: Enable 'flood' user group at ne.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478138 (https://phabricator.wikimedia.org/T211181) [06:09:42] (03CR) 10Muehlenhoff: profile::statistics::private: allow labsdb to push nginx logs (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/478022 (https://phabricator.wikimedia.org/T211330) (owner: 10Elukey) [06:11:52] (03CR) 10Muehlenhoff: [C: 031] service::node: add the 'use_nodejs10' parameter [puppet] - 10https://gerrit.wikimedia.org/r/477475 (https://phabricator.wikimedia.org/T210704) (owner: 10Elukey) [06:15:56] 10Operations, 10netops: pc2007.codfw.wmnet network blip? - https://phabricator.wikimedia.org/T211405 (10Marostegui) [06:19:29] 10Operations, 10netops: pc2007.codfw.wmnet network blip? 
- https://phabricator.wikimedia.org/T211405 (10MoritzMuehlenhoff) See IRC/internal channel, during the setup of logstash2001.codfw.wmnet it accidentally reused the 10.192.0.104 A record. [06:21:17] 10Operations, 10netops: pc2007.codfw.wmnet network blip? - https://phabricator.wikimedia.org/T211405 (10Marostegui) 05Open>03Resolved >>! In T211405#4805188, @MoritzMuehlenhoff wrote: > See IRC/internal channel, during the setup of logstash2001.codfw.wmnet it accidentally reused the 10.192.0.104 A record.... [06:25:59] RECOVERY - MariaDB Slave SQL: s8 on db1124 is OK: OK slave_sql_state Slave_SQL_Running: Yes [06:27:25] (03CR) 10BPirkle: [C: 031] "Tested the described configuration on my local php-fpm 7.2, with the changes from the patchset. Looks good to me, and also makes local te" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478021 (https://phabricator.wikimedia.org/T211184) (owner: 10Giuseppe Lavagetto) [06:28:21] PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.009 second response time [06:28:23] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [06:28:53] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled] [06:29:20] <_joe_> uhm netbox went down? [06:31:03] PROBLEM - puppet last run on analytics1061 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/share/ca-certificates/DigiCert_SHA2_High_Assurance_Server_CA.crt] [06:31:27] PROBLEM - puppet last run on mw1307 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/usr/local/share/bash/puppet-common.sh] [06:36:36] logrotate triggered a reload and now netbox bails on logsocket_plugin.so after the involved restart [06:37:53] RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.547 second response time [06:37:55] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational [06:39:11] 10Operations, 10ops-eqiad, 10DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (10Marostegui) Thanks for the heads up @Cmjohnson! [06:54:47] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [06:57:01] RECOVERY - puppet last run on analytics1061 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:21] RECOVERY - puppet last run on mw1307 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:59:33] 10Operations, 10Mail: Wikipedia.org DMARC "rua" and "ruf" email addresses need verification - https://phabricator.wikimedia.org/T211401 (10Reedy) [07:00:54] 10Operations, 10Mail: Domains of most projects do not have DMARC policy - https://phabricator.wikimedia.org/T211403 (10Reedy) [07:01:26] (03CR) 10Elukey: profile::statistics::private: allow labsdb to push nginx logs (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/478022 (https://phabricator.wikimedia.org/T211330) (owner: 10Elukey) [07:01:53] (03PS1) 10Marostegui: db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478139 [07:02:14] 10Operations, 10Mail: More restrictive DMARC policy for the wikimedia.org domain - https://phabricator.wikimedia.org/T211404 (10Reedy) [07:02:58] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478139 (owner: 10Marostegui) [07:04:03] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1084 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/478139 (owner: 10Marostegui) [07:05:17] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 49s) [07:05:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:10:15] (03PS1) 10Muehlenhoff: Update MOU data for nathante [puppet] - 10https://gerrit.wikimedia.org/r/478140 [07:10:17] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478139 (owner: 10Marostegui) [07:11:24] !log Stop MySQL on db1084 for mysql and kernel upgrade [07:11:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:15:11] (03CR) 10Muehlenhoff: [C: 032] Update MOU data for nathante [puppet] - 10https://gerrit.wikimedia.org/r/478140 (owner: 10Muehlenhoff) [07:15:52] 10Operations, 10MediaWiki-Debug-Logger, 10Performance-Team: Set up request profiling for PHP 7 - https://phabricator.wikimedia.org/T206152 (10Joe) >>! In T206152#4804916, @tstarling wrote: > Please install tideways, but it should only be enabled in php.ini on the debug servers, since it will cause a performa... 
[07:19:48] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478142 [07:20:25] (03PS1) 10Tulsi Bhagat: Namespace configuration on shnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478143 [07:21:01] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478142 (owner: 10Marostegui) [07:22:04] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478142 (owner: 10Marostegui) [07:22:55] (03PS3) 10Elukey: profile::statistics::private: allow labstore to push nginx logs [puppet] - 10https://gerrit.wikimedia.org/r/478022 (https://phabricator.wikimedia.org/T211330) [07:23:03] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1084 (duration: 00m 46s) [07:23:04] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478142 (owner: 10Marostegui) [07:23:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:25:31] RECOVERY - MariaDB Slave Lag: s8 on db1124 is OK: OK slave_sql_lag Replication lag: 0.45 seconds [07:30:25] (03PS2) 10Tulsi Bhagat: Namespace configuration on shnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478143 (https://phabricator.wikimedia.org/T210699) [07:38:39] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478144 [07:39:50] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478144 (owner: 10Marostegui) [07:41:02] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478144 (owner: 10Marostegui) [07:42:13] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic for db1084 (duration: 
00m 46s) [07:42:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:44:19] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478145 [07:48:52] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478144 (owner: 10Marostegui) [07:50:27] !log decommissioning cassandra-c, restbase2001 -- T210843 [07:50:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:50:36] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 [07:50:59] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/13865/" [puppet] - 10https://gerrit.wikimedia.org/r/478022 (https://phabricator.wikimedia.org/T211330) (owner: 10Elukey) [07:51:06] (03PS4) 10Elukey: profile::statistics::private: allow labstore to push nginx logs [puppet] - 10https://gerrit.wikimedia.org/r/478022 (https://phabricator.wikimedia.org/T211330) [07:53:06] 10Operations, 10ops-codfw, 10Core Platform Team, 10Services (doing), and 2 others: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 (10fgiunchedi) [07:56:03] (03CR) 10Filippo Giunchedi: [C: 031] Use webp -exact option on Stretch [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/477796 (https://phabricator.wikimedia.org/T170817) (owner: 10Gilles) [07:57:03] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478145 (owner: 10Marostegui) [07:58:05] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478145 (owner: 10Marostegui) [07:59:00] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic for db1084 (duration: 00m 46s) [07:59:02] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:01:42] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478145 (owner: 10Marostegui) [08:10:33] PROBLEM - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 622.73 seconds [08:10:59] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 615.52 seconds [08:11:42] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478147 [08:12:55] RECOVERY - MariaDB Slave Lag: s3 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 185.39 seconds [08:13:01] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Fully repool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478147 (owner: 10Marostegui) [08:13:50] 10Operations, 10Mail: More restrictive DMARC policy for the wikimedia.org domain - https://phabricator.wikimedia.org/T211404 (10Reedy) >There were already many tasks related to this problem, but all were closed without actually solving the problem: T57559, T136468, T144390, T210384, etc. I don't think that's... [08:14:04] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478147 (owner: 10Marostegui) [08:14:49] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478147 (owner: 10Marostegui) [08:14:58] 10Operations, 10DBA, 10User-Banyek: Issues with purgeUnusedProjects.php cron job on mwmaint1002 (Fri Oct 26) - https://phabricator.wikimedia.org/T208231 (10Marostegui) >>! In T208231#4803855, @Banyek wrote: > i'd like to add the owner of the script as a subscriber, but I don't know how to find who is it gi... 
[08:15:02] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1084 (duration: 00m 47s) [08:15:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:17:09] (03PS1) 10Marostegui: db-eqiad.php: Depool db1096:3315 db1096:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478148 [08:18:05] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 234.94 seconds [08:18:50] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1096:3315 db1096:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478148 (owner: 10Marostegui) [08:19:55] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1096:3315 db1096:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478148 (owner: 10Marostegui) [08:21:04] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1096:3315, db1096:3316 for kernel and mysql upgrade (duration: 00m 46s) [08:21:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:21:07] !log Stop MySQL on db1096:3315,3316 for kernel and mysql upgrade [08:21:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:26:28] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-extensions-PageAssessments, and 2 others: Issues with purgeUnusedProjects.php cron job on mwmaint1002 (Fri Oct 26) - https://phabricator.wikimedia.org/T208231 (10Reedy) [08:26:57] 10Operations, 10Community-Tech, 10MediaWiki-extensions-PageAssessments, 10Performance, 10User-Banyek: Issues with purgeUnusedProjects.php cron job on mwmaint1002 (Fri Oct 26) - https://phabricator.wikimedia.org/T208231 (10Marostegui) [08:27:48] (03CR) 10Elukey: [C: 032] profile::statistics::private: allow labstore to push nginx logs [puppet] - 10https://gerrit.wikimedia.org/r/478022 (https://phabricator.wikimedia.org/T211330) (owner: 10Elukey) [08:27:55] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1096:3315 db1096:3316 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/478148 (owner: 10Marostegui) [08:29:58] (03PS1) 10Filippo Giunchedi: install_server: fix logstash partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/478149 (https://phabricator.wikimedia.org/T211065) [08:30:13] (03CR) 10Filippo Giunchedi: [C: 032] install_server: fix logstash partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/478149 (https://phabricator.wikimedia.org/T211065) (owner: 10Filippo Giunchedi) [08:30:23] (03PS2) 10Filippo Giunchedi: install_server: fix logstash partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/478149 (https://phabricator.wikimedia.org/T211065) [08:32:26] 10Operations, 10Security-Team, 10Wikimedia-Site-requests: Enable csp-report-only mode everywhere - https://phabricator.wikimedia.org/T207900 (10Schnark) I'm pretty sure this message is defined by the browser. For me in a German Firefox it is: ` Content Security Policy: Die Einstellungen der Seite haben das... [08:32:26] 10Operations: puppet (systemd::service) attempts to start manually masked units - https://phabricator.wikimedia.org/T211027 (10fgiunchedi) [08:33:29] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478150 [08:34:36] 10Operations: puppet (systemd::service) attempts to start manually masked units - https://phabricator.wikimedia.org/T211027 (10fgiunchedi) >>! In T211027#4800887, @Dzahn wrote: > https://tickets.puppetlabs.com/browse/PUP-1253 > > https://github.com/puppetlabs/puppet/pull/3141 > > "If a service is masked, it is... [08:36:45] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Slowly repool db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478150 (owner: 10Marostegui) [08:37:04] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install codfw logstash elasticsearch storage servers - https://phabricator.wikimedia.org/T211065 (10fgiunchedi) >>! 
In T211065#4805036, @Papaul wrote: > @fgiunchedi the installation is complaining about not finding any swap partition. > > ───────────... [08:37:48] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478150 (owner: 10Marostegui) [08:39:14] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1096:3315, db1096:3316 after kernel and mysql upgrade (duration: 00m 46s) [08:39:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:39:58] (03PS2) 10Giuseppe Lavagetto: hiera: remove the role backend in production [puppet] - 10https://gerrit.wikimedia.org/r/475499 [08:40:54] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478150 (owner: 10Marostegui) [08:42:35] (03PS2) 10Elukey: Move remaining stat1005 references to stat1007 [puppet] - 10https://gerrit.wikimedia.org/r/478020 (https://phabricator.wikimedia.org/T205846) [08:45:33] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/13867/" [puppet] - 10https://gerrit.wikimedia.org/r/478020 (https://phabricator.wikimedia.org/T205846) (owner: 10Elukey) [08:48:27] 10Operations, 10Community-Tech, 10MediaWiki-extensions-PageAssessments, 10Performance, 10User-Banyek: Issues with purgeUnusedProjects.php cron job on mwmaint1002 (Fri Oct 26) - https://phabricator.wikimedia.org/T208231 (10Banyek) @kaldari if you need any help for further debugging this, you can ask me [08:59:29] 10Operations, 10MediaWiki-Logging, 10Wikimedia-Logstash, 10Patch-For-Review: Move mediawiki to new logging infrastructure - https://phabricator.wikimedia.org/T211124 (10fgiunchedi) >>! In T211124#4798535, @bd808 wrote: >>>! In T211124#4798013, @fgiunchedi wrote: >> I've looked briefly at how to implement p... 
[08:59:51] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478153 [09:03:07] (03PS2) 10Ema: ATS: Collapsed Forwarding support [puppet] - 10https://gerrit.wikimedia.org/r/478003 (https://phabricator.wikimedia.org/T207048) [09:04:14] (03CR) 10Ema: [C: 032] ATS: Collapsed Forwarding support [puppet] - 10https://gerrit.wikimedia.org/r/478003 (https://phabricator.wikimedia.org/T207048) (owner: 10Ema) [09:08:03] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478153 (owner: 10Marostegui) [09:09:08] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478153 (owner: 10Marostegui) [09:09:37] 10Operations, 10Wikimedia-Logstash, 10service-runner, 10Core Platform Team Backlog (Next), 10Services (next): Move service-runner to new logging infrastructure - https://phabricator.wikimedia.org/T211125 (10fgiunchedi) >>! In T211125#4801464, @Pchelolo wrote: > So, currently we only support sending to sy... [09:10:24] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1096:3315, db1096:3316 after kernel and mysql upgrade (duration: 00m 46s) [09:10:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:12:23] (03PS1) 10Ema: ATS: set http.wait_for_cache [puppet] - 10https://gerrit.wikimedia.org/r/478155 (https://phabricator.wikimedia.org/T207048) [09:19:52] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478153 (owner: 10Marostegui) [09:23:35] 10Operations, 10netops: pc2007.codfw.wmnet network blip? 
- https://phabricator.wikimedia.org/T211405 (10Peachey88) [09:26:03] (03PS2) 10Shreyasminocha: Add HD logos for 3 projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478034 [09:29:30] (03PS3) 10Shreyasminocha: Add HD logos for 3 projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478034 (https://phabricator.wikimedia.org/T150618) [09:29:34] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478157 [09:30:25] (03PS3) 10Shreyasminocha: Update settings to include new HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478036 (https://phabricator.wikimedia.org/T150618) [09:31:54] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Fully repool db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478157 (owner: 10Marostegui) [09:33:03] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478157 (owner: 10Marostegui) [09:33:08] (03PS2) 10Elukey: Add missing AAAA records to analytics hosts [dns] - 10https://gerrit.wikimedia.org/r/467710 (owner: 10Volans) [09:33:37] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478157 (owner: 10Marostegui) [09:33:39] (03CR) 10jerkins-bot: [V: 04-1] Add missing AAAA records to analytics hosts [dns] - 10https://gerrit.wikimedia.org/r/467710 (owner: 10Volans) [09:34:07] yeah there you go, Riccardo getting -1s [09:34:08] :P [09:34:15] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1096:3315, db1096:3316 after kernel and mysql upgrade (duration: 00m 46s) [09:34:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:35:43] (03PS3) 10Elukey: Add missing AAAA records to analytics hosts [dns] - 10https://gerrit.wikimedia.org/r/467710 (owner: 10Volans) [09:38:37] (03CR) 10Elukey: [C: 032] "Checked all the IPs, looks good!" 
[dns] - 10https://gerrit.wikimedia.org/r/467710 (owner: 10Volans) [09:40:51] !log importing back linkwatcher_linklog into database s51230__linkwatcher on host labsdb1004.eqiad.wmnet. - T211210 [09:40:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:40:55] T211210: labsdb1004 replication broken for linkwatcher_linklog table - https://phabricator.wikimedia.org/T211210 [09:46:13] (03CR) 10Elukey: "Volans: question - do you have a way to generate these changes from the linter errors? If so there may be new hosts missing AAAA (an-worke" [dns] - 10https://gerrit.wikimedia.org/r/467705 (owner: 10Volans) [09:48:12] (03CR) 10Filippo Giunchedi: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/477620 (https://phabricator.wikimedia.org/T147326) (owner: 10Cwhite) [09:50:57] (03PS1) 10Ema: ATS: configuration settings for named pipe logging [puppet] - 10https://gerrit.wikimedia.org/r/478158 (https://phabricator.wikimedia.org/T204225) [10:00:07] (03CR) 10Elukey: [C: 031] "One nit for an error message, the rest looks good!" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/477987 (owner: 10Muehlenhoff) [10:05:31] (03PS5) 10Thifranc: puppet:Reduce cronspam from modules/mediawiki/ [puppet] - 10https://gerrit.wikimedia.org/r/470877 (https://phabricator.wikimedia.org/T150375) [10:07:35] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10mobrovac) Once all the usernames are known in the task description, your respective managers need to explicitly approve each of the individual... 
[10:21:34] (03CR) 10MarcoAurelio: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478138 (https://phabricator.wikimedia.org/T211181) (owner: 10Tulsi Bhagat) [10:22:20] (03CR) 10MarcoAurelio: [C: 031] Namespace configuration on shnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478143 (https://phabricator.wikimedia.org/T210699) (owner: 10Tulsi Bhagat) [10:27:34] PROBLEM - MariaDB disk space on labsdb1004 is CRITICAL: DISK CRITICAL - free space: / 484 MB (5% inode=87%) [10:28:16] sad_trombone.wav, checking host dashboard [10:28:54] banyek marostegui ^ [10:29:09] PROBLEM - Disk space on labsdb1004 is CRITICAL: DISK CRITICAL - free space: / 218 MB (2% inode=87%) [10:29:15] tx [10:29:19] I stop the import [10:29:27] I have stopped it [10:29:58] thanks! yeah looks like writing to the wrong fs? [10:32:09] probably, yes [10:32:41] 10Operations, 10Citoid, 10Prod-Kubernetes, 10Core Platform Team Backlog (Watching / External), 10Services (watching): Citoid automated monitoring times out due to Zotero v2 - https://phabricator.wikimedia.org/T211411 (10mobrovac) p:05Triage>03High [10:34:44] RECOVERY - MariaDB disk space on labsdb1004 is OK: DISK OK [10:35:07] RECOVERY - Disk space on labsdb1004 is OK: DISK OK [10:36:54] godog: yes it was because I started not in a larger fs [10:39:20] ack [10:41:24] (03CR) 10MarcoAurelio: [C: 031] "Apparently this needs a rebase now." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/460731 (owner: 10Legoktm) [10:47:13] (03CR) 10Ema: [C: 032] ATS: set http.wait_for_cache [puppet] - 10https://gerrit.wikimedia.org/r/478155 (https://phabricator.wikimedia.org/T207048) (owner: 10Ema) [10:47:27] (03CR) 10Ema: [C: 032] ATS: configuration settings for named pipe logging [puppet] - 10https://gerrit.wikimedia.org/r/478158 (https://phabricator.wikimedia.org/T204225) (owner: 10Ema) [10:54:05] PROBLEM - Apache HTTP on mw1282 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.002 second response time [10:54:12] (03CR) 10GTirloni: [C: 032] prometheus: add directory size collector [puppet] - 10https://gerrit.wikimedia.org/r/477937 (https://phabricator.wikimedia.org/T211094) (owner: 10Cwhite) [10:54:39] PROBLEM - HHVM rendering on mw1282 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time [10:54:44] (03PS6) 10GTirloni: prometheus: add directory size collector [puppet] - 10https://gerrit.wikimedia.org/r/477937 (https://phabricator.wikimedia.org/T211094) (owner: 10Cwhite) [10:54:57] PROBLEM - Nginx local proxy to apache on mw1282 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.010 second response time [10:55:29] !log mobrovac@deploy1001 Started deploy [citoid/deploy@6b36331]: Add an explicit check for Zotero [10:55:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:55:53] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 82215 bytes in 1.746 second response time [10:56:09] RECOVERY - Nginx local proxy to apache on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.051 second response time [10:56:20] (03CR) 10Volans: "> Patch Set 1:" [dns] - 10https://gerrit.wikimedia.org/r/467705 (owner: 10Volans) [10:56:29] RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.079 second response time 
[10:58:12] !log mobrovac@deploy1001 Finished deploy [citoid/deploy@6b36331]: Add an explicit check for Zotero (duration: 02m 42s) [10:58:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:17] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (Ensure Zotero is working) is CRITICAL: Test Ensure Zotero is working returned the unexpected status 400 (expecting: 200) [10:58:37] known ^ [10:59:53] (03CR) 10Filippo Giunchedi: prometheus: add directory size collector (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/477937 (https://phabricator.wikimedia.org/T211094) (owner: 10Cwhite) [11:00:43] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [11:03:38] !log mobrovac@deploy1001 Started deploy [citoid/deploy@269c9c7]: Add an explicit check for Zotero [11:03:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:49] (03PS4) 10Shreyasminocha: Add HD logos for 3 projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478034 (https://phabricator.wikimedia.org/T150618) [11:07:35] (03PS4) 10Shreyasminocha: Update settings to include new HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478036 (https://phabricator.wikimedia.org/T150618) [11:09:27] 10Operations, 10Mail: More restrictive DMARC policy for the wikimedia.org domain - https://phabricator.wikimedia.org/T211404 (10putnik) >>! In T211404#4805307, @Reedy wrote: > I don't think that's completely fair. Some received no further responses, so it's hard to do anything about them Some were marked as du... 
[11:09:57] !log mobrovac@deploy1001 Finished deploy [citoid/deploy@269c9c7]: Add an explicit check for Zotero (duration: 06m 19s) [11:09:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:26] (03PS1) 10GTirloni: prometheus: fix/adjust directory size collector [puppet] - 10https://gerrit.wikimedia.org/r/478168 (https://phabricator.wikimedia.org/T211094) [11:13:53] (03CR) 10GTirloni: [C: 032] prometheus: fix/adjust directory size collector [puppet] - 10https://gerrit.wikimedia.org/r/478168 (https://phabricator.wikimedia.org/T211094) (owner: 10GTirloni) [11:15:42] !log mobrovac@deploy1001 Started deploy [restbase/deploy@31c44e8]: Fix: Encode recommendation api title [11:15:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:13] (03PS5) 10Shreyasminocha: Add HD logos for 3 projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478034 (https://phabricator.wikimedia.org/T150618) [11:19:31] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@31c44e8]: Fix: Encode recommendation api title (duration: 03m 49s) [11:19:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:20:20] (03PS1) 10Volans: linter: on duplicate names check if private [dns] - 10https://gerrit.wikimedia.org/r/478171 (https://phabricator.wikimedia.org/T182028) [11:20:41] !log rolling upgrade of nginx on swift frontends [11:20:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:06] (03PS6) 10Shreyasminocha: Add HD logos for 3 projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478034 (https://phabricator.wikimedia.org/T150618) [11:22:09] !log mobrovac@deploy1001 Started deploy [restbase/deploy@9e4af13]: Fix: Encode recommendation api title [11:22:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:24:12] (03PS5) 10Shreyasminocha: Update settings to include new HD logos [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/478036 (https://phabricator.wikimedia.org/T150618) [11:26:03] (03CR) 10Urbanecm: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478034 (https://phabricator.wikimedia.org/T150618) (owner: 10Shreyasminocha) [11:26:29] 10Operations, 10Operations-Software-Development, 10Patch-For-Review: DNS repo: add CI checks for obvious configuration errors - https://phabricator.wikimedia.org/T182028 (10Volans) FYI I've added this small section to the docs for running the script: https://wikitech.wikimedia.org/wiki/DNS#Linting_the_zone_f... [11:41:07] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@9e4af13]: Fix: Encode recommendation api title (duration: 18m 58s) [11:41:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:42:14] !log mobrovac@deploy1001 Started deploy [restbase/deploy@9e4af13]: Fix: Encode recommendation api title [11:42:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:42:34] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@9e4af13]: Fix: Encode recommendation api title (duration: 00m 21s) [11:42:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:43:55] !log mobrovac@deploy1001 Started deploy [restbase/deploy@44e0955]: Fix: Encode recommendation api title [11:43:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:45:56] 10Operations, 10Core Platform Team Backlog (Watching / External), 10Services (watching): Put restbase201[3-8] into conftool and LVS - https://phabricator.wikimedia.org/T211416 (10mobrovac) p:05Triage>03High [11:54:21] 10Operations, 10cloud-services-team, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Port DirectorySize diamond collector to a Prometheus exporter - https://phabricator.wikimedia.org/T211094 (10GTirloni) This collector takes 5-10min to run. Since it's only used for capacity planning, I've adjusted t... 
[11:56:29] PROBLEM - Docker registry HTTPS interface on darmstadtium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:57:33] RECOVERY - Docker registry HTTPS interface on darmstadtium is OK: HTTP OK: HTTP/1.1 200 OK - 2460 bytes in 0.472 second response time [12:01:20] (03PS1) 10BBlack: Remove various dead cp4005-20 DNS entries [dns] - 10https://gerrit.wikimedia.org/r/478176 (https://phabricator.wikimedia.org/T167377) [12:02:49] (03CR) 10BBlack: "@RobH is it ok to kill all of these, or does it screw up some process or tracking on the final open ticket?" [dns] - 10https://gerrit.wikimedia.org/r/478176 (https://phabricator.wikimedia.org/T167377) (owner: 10BBlack) [12:05:06] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@44e0955]: Fix: Encode recommendation api title (duration: 21m 11s) [12:05:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:15] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received [12:09:33] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [12:18:35] !log installing nodejs security updates on stat/notebook [12:18:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:19:24] (03CR) 10GTirloni: "I wasn't aware of T200984 but I seem to have had the same thoughts in this review." [puppet] - 10https://gerrit.wikimedia.org/r/477937 (https://phabricator.wikimedia.org/T211094) (owner: 10Cwhite) [12:22:45] 10Operations, 10ops-codfw: Time on new servers different from time on puppetmaster1001 - https://phabricator.wikimedia.org/T211170 (10MoritzMuehlenhoff) elastic2050 now has a role assigned and thus the current time, is there a server among the 27 you installed which is still in the pristine condition right aft... 
[12:24:43] PROBLEM - MediaWiki memcached error rate on graphite1004 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1panelId=1fullscreen [12:27:07] RECOVERY - MediaWiki memcached error rate on graphite1004 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1panelId=1fullscreen [12:31:43] 10Operations, 10DBA, 10StructuredDiscussions, 10Growth-Team (Current Sprint), and 2 others: Setup separate logical External Store for Flow in production - https://phabricator.wikimedia.org/T107610 (10Banyek) Sorry for the late answer @Catrope, The bullets you provided seems good to me, I'd say we could ta... [12:44:53] (03CR) 10Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler1002/13866/ this is a new compilation from today. I am inclined to deploy this on monday." [puppet] - 10https://gerrit.wikimedia.org/r/475499 (owner: 10Giuseppe Lavagetto) [12:56:55] 10Operations, 10monitoring, 10Patch-For-Review, 10User-CDanis: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 (10CDanis) [13:14:39] 10Operations, 10Core Platform Team Backlog (Watching / External), 10Services (watching): Put restbase201[3-8] into conftool and LVS - https://phabricator.wikimedia.org/T211416 (10fgiunchedi) [13:14:41] 10Operations, 10ops-codfw, 10Patch-For-Review, 10Services (watching), 10User-fgiunchedi: rack/setup/install restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T209615 (10fgiunchedi) [13:16:26] 10Operations, 10ops-eqsin: update PDUs for eqsin (asset tag and other info) - https://phabricator.wikimedia.org/T211368 (10faidon) Can we add procurement task and purchase date immediately? It doesn't sound like there is an immediate blocker to this. 
[13:24:31] 10Operations, 10monitoring, 10Patch-For-Review, 10User-CDanis: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 (10CDanis) [13:24:51] 10Operations, 10ops-eqsin, 10Traffic: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (10BBlack) 05Open>03Resolved No new EDAC errors reported since repooling, all we can do is assume it's ok for now I think. [13:37:38] !log rolling reboot of scb in codfw (along with nodejs update) [13:37:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:55:12] (03PS11) 10Paladox: httpd::mpm: Also remove mod_php for 7.0 and 7.2 if not prefork [puppet] - 10https://gerrit.wikimedia.org/r/477587 (https://phabricator.wikimedia.org/T208257) [13:55:21] 10Operations, 10Core Platform Team Backlog (Watching / External), 10Services (watching), 10User-fgiunchedi: Put restbase201[3-8] into conftool and LVS - https://phabricator.wikimedia.org/T211416 (10fgiunchedi) Indeed, FWIW I tend to treat restbase and cassandra separate so this will be done as soon as the... [14:01:24] 10Operations, 10Core Platform Team Backlog (Watching / External), 10Services (watching), 10User-fgiunchedi: Put restbase201[3-8] into conftool and LVS - https://phabricator.wikimedia.org/T211416 (10mobrovac) They are independent, though. Cassandra doesn't go into LVS at all, and RESTBase is fully functiona... 
[14:01:32] (03PS2) 10Mforns: Adjust params for Analytics data_purge EventLoggingSanitization job [puppet] - 10https://gerrit.wikimedia.org/r/478129 (https://phabricator.wikimedia.org/T202429) [14:01:55] (03CR) 10Mforns: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/478129 (https://phabricator.wikimedia.org/T202429) (owner: 10Mforns) [14:02:24] (03CR) 10jerkins-bot: [V: 04-1] Adjust params for Analytics data_purge EventLoggingSanitization job [puppet] - 10https://gerrit.wikimedia.org/r/478129 (https://phabricator.wikimedia.org/T202429) (owner: 10Mforns) [14:05:37] (03PS3) 10Mforns: Adjust params for Analytics data_purge EventLoggingSanitization job [puppet] - 10https://gerrit.wikimedia.org/r/478129 (https://phabricator.wikimedia.org/T202429) [14:05:39] (03CR) 10Mforns: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/478129 (https://phabricator.wikimedia.org/T202429) (owner: 10Mforns) [14:05:42] 10Operations, 10ops-codfw: Time on new servers different from time on puppetmaster1001 - https://phabricator.wikimedia.org/T211170 (10Papaul) @MoritzMuehlenhoff no there is no server for now which is still in pristine condition right after the OS install. But I am about to install logstatsh200[1-3] once done,... [14:18:08] 10Operations, 10Gadgets, 10MediaWiki-Cache, 10MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), and 4 others: Mcrouter periodically reports soft TKOs for mc[1,2]035 leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) Today another mediawiki alert from ~12:24 to ~12:27 UTC. @Ni... 
[14:21:11] !log more weight to new ms-be codfw hosts - T209395 [14:21:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:14] T209395: rack/setup/install new ms-be servers ms-be204[4-9] ,ms-be2050 - https://phabricator.wikimedia.org/T209395 [14:21:21] (03PS1) 10Hoo man: Remove the "wikibase-debug" log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478196 (https://phabricator.wikimedia.org/T207850) [14:32:45] (03PS1) 10Elukey: Apply interface::rps to mc1022 [puppet] - 10https://gerrit.wikimedia.org/r/478198 (https://phabricator.wikimedia.org/T209489) [14:37:17] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10Mholloway) Ping @Jhernandez ^ [14:42:14] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/13868/" [puppet] - 10https://gerrit.wikimedia.org/r/478198 (https://phabricator.wikimedia.org/T209489) (owner: 10Elukey) [14:45:55] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10Dzahn) Are you requesting the admin group "sc-admins" ? ` sc-admins: description: General service cluster admins - sc(a|b) gid: 77... 
[14:46:26] (03PS1) 10Alexandros Kosiaris: baseimages: Add a default LC_ALL C.UTF-8 locale [puppet] - 10https://gerrit.wikimedia.org/r/478200 (https://phabricator.wikimedia.org/T210260) [14:48:03] 10Operations, 10ops-codfw, 10Core Platform Team, 10Services (doing), and 2 others: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 (10Eevans) [14:51:01] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10Dzahn) please see https://wikitech.wikimedia.org/wiki/Production_shell_access#New_users for the required steps for the process we'll also ne... [14:52:56] !log decommissioning cassandra-a, restbase2002 -- T210843 [14:52:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:59] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 [14:55:12] (03PS1) 10Muehlenhoff: Point ntp alias for codfw to dns2001 [dns] - 10https://gerrit.wikimedia.org/r/478202 (https://phabricator.wikimedia.org/T211170) [14:57:33] 10Operations, 10Wikimedia-Logstash, 10service-runner, 10Core Platform Team Backlog (Next), 10Services (next): Move service-runner to new logging infrastructure - https://phabricator.wikimedia.org/T211125 (10Pchelolo) There's not that much logging happening in beta services, I would guess we should start... 
[14:57:36] (03PS2) 10Muehlenhoff: Point ntp alias for codfw to dns2001 [dns] - 10https://gerrit.wikimedia.org/r/478202 (https://phabricator.wikimedia.org/T211170) [14:57:55] (03PS8) 10Mark Bergsma: Ensure that depool threshold is being honored on new/updated configs [debs/pybal] - 10https://gerrit.wikimedia.org/r/443967 (https://phabricator.wikimedia.org/T184715) (owner: 10Vgutierrez) [14:57:57] (03PS3) 10Mark Bergsma: Call _updateServerMetrics from _serverInitDone [debs/pybal] - 10https://gerrit.wikimedia.org/r/477794 [14:57:59] (03PS1) 10Mark Bergsma: Expand Coordinator.resultUp behavior on first monitor check result [debs/pybal] - 10https://gerrit.wikimedia.org/r/478203 [14:59:02] (03CR) 10Muehlenhoff: [C: 032] Point ntp alias for codfw to dns2001 [dns] - 10https://gerrit.wikimedia.org/r/478202 (https://phabricator.wikimedia.org/T211170) (owner: 10Muehlenhoff) [14:59:21] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10jijiki) Those requests will be discussed on the 17th Dec SRE meeting, provided we have all information :) [15:03:28] 10Operations, 10ops-codfw, 10Patch-For-Review: Time on new servers different from time on puppetmaster1001 - https://phabricator.wikimedia.org/T211170 (10MoritzMuehlenhoff) 05Open>03Resolved During the initial installation d-i runs the clock-setup component, which synchronises the system clock using rdate....
[15:04:55] (03PS2) 10Volans: validator: on duplicate names check if private [dns] - 10https://gerrit.wikimedia.org/r/478171 (https://phabricator.wikimedia.org/T182028) [15:04:57] (03PS1) 10Volans: validator: fix mgmt detection, be deterministic [dns] - 10https://gerrit.wikimedia.org/r/478205 (https://phabricator.wikimedia.org/T182028) [15:05:34] (03PS3) 10Andrew Bogott: Openstack: monitor nova and kvm on cloudvirt hosts [puppet] - 10https://gerrit.wikimedia.org/r/478113 (https://phabricator.wikimedia.org/T211388) [15:07:40] (03CR) 10Andrew Bogott: [C: 032] Openstack: monitor nova and kvm on cloudvirt hosts [puppet] - 10https://gerrit.wikimedia.org/r/478113 (https://phabricator.wikimedia.org/T211388) (owner: 10Andrew Bogott) [15:08:39] PROBLEM - DPKG on kubestage1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [15:08:51] PROBLEM - Check systemd state on kubestage1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [15:09:59] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install codfw logstash elasticsearch storage servers - https://phabricator.wikimedia.org/T211065 (10Papaul) [15:13:29] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/477620 (https://phabricator.wikimedia.org/T147326) (owner: 10Cwhite) [15:14:10] (03CR) 10Andrew Bogott: [C: 031] wmcs: add prometheus-memcached-exporter [puppet] - 10https://gerrit.wikimedia.org/r/477620 (https://phabricator.wikimedia.org/T147326) (owner: 10Cwhite) [15:16:56] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Banyek) >>! In T208622#4804549, @Ottomata wrote: > @Banyek another Q: Can we add permissions to the recommendationapi user on m2-master to be able to... 
[15:20:06] (03CR) 10BBlack: [C: 031] validator: on duplicate names check if private [dns] - 10https://gerrit.wikimedia.org/r/478171 (https://phabricator.wikimedia.org/T182028) (owner: 10Volans) [15:20:20] (03CR) 10BBlack: [C: 031] validator: fix mgmt detection, be deterministic [dns] - 10https://gerrit.wikimedia.org/r/478205 (https://phabricator.wikimedia.org/T182028) (owner: 10Volans) [15:22:24] RECOVERY - DPKG on kubestage1002 is OK: All packages OK [15:22:53] (03CR) 10Volans: [C: 032] validator: on duplicate names check if private [dns] - 10https://gerrit.wikimedia.org/r/478171 (https://phabricator.wikimedia.org/T182028) (owner: 10Volans) [15:23:29] (03PS3) 10Volans: validator: on duplicate names check if private [dns] - 10https://gerrit.wikimedia.org/r/478171 (https://phabricator.wikimedia.org/T182028) [15:24:18] (03PS1) 10BBlack: lvs300x: add v6 forward records [dns] - 10https://gerrit.wikimedia.org/r/478207 [15:24:50] (03PS2) 10Volans: validator: fix mgmt detection, be deterministic [dns] - 10https://gerrit.wikimedia.org/r/478205 (https://phabricator.wikimedia.org/T182028) [15:25:49] (03CR) 10Volans: [C: 032] validator: fix mgmt detection, be deterministic [dns] - 10https://gerrit.wikimedia.org/r/478205 (https://phabricator.wikimedia.org/T182028) (owner: 10Volans) [15:33:10] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic, 10Patch-For-Review: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10jijiki) >>! In T209298#4799902, @Afandian wrote: > It appears I supplied a key in the... [15:33:50] RECOVERY - Check systemd state on kubestage1002 is OK: OK - running: The system is fully operational [15:34:02] PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:35:18] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Marostegui) Is this going to be a one time import? How much data will be imported? [15:36:21] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic, 10Patch-For-Review: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10jijiki) @toddleroux please provide the missing information, we would really like to have... [15:36:59] !log rebooting sarin/neodymium [15:37:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:37:07] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10User-jijiki: Requesting access to deployment for Christoph Jauera (WMDE-Fisch) - https://phabricator.wikimedia.org/T211014 (10jijiki) [15:38:33] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10mobrovac) >>! In T211382#4806010, @Dzahn wrote: > Are you requesting the admin group "sc-admins" ? No. The scope of this ticket is for Proton... 
[15:39:35] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Banyek) ignore https://phabricator.wikimedia.org/T208622#4803814 I mis-read something [15:41:09] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install codfw logstash elasticsearch storage servers - https://phabricator.wikimedia.org/T211065 (10Papaul) [15:41:37] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install codfw logstash elasticsearch storage servers - https://phabricator.wikimedia.org/T211065 (10Papaul) a:05Papaul>03fgiunchedi @fgiunchedi all yours [15:41:50] (03PS1) 10Elukey: profile::cache::kafka::alerts: set more sensitive thresholds [puppet] - 10https://gerrit.wikimedia.org/r/478210 (https://phabricator.wikimedia.org/T210939) [15:43:03] (03PS1) 10Ema: ATS: check when a restart is required [puppet] - 10https://gerrit.wikimedia.org/r/478211 (https://phabricator.wikimedia.org/T204209) [15:43:35] (03PS1) 10Faidon Liambotis: Cleanup a few old mgmt entries [dns] - 10https://gerrit.wikimedia.org/r/478212 [15:43:58] (03CR) 10Faidon Liambotis: [C: 032] Cleanup a few old mgmt entries [dns] - 10https://gerrit.wikimedia.org/r/478212 (owner: 10Faidon Liambotis) [15:44:53] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10pmiazga) @phuedx could you approve my request? @Jhernandez can you approve the request for @bearND @Mholloway @MSantos and @Tgr? 
[15:44:54] (03CR) 10Filippo Giunchedi: "> Patch Set 6:" [puppet] - 10https://gerrit.wikimedia.org/r/477937 (https://phabricator.wikimedia.org/T211094) (owner: 10Cwhite) [15:45:28] (03PS2) 10BBlack: Remove various dead cp4005-20 DNS entries [dns] - 10https://gerrit.wikimedia.org/r/478176 (https://phabricator.wikimedia.org/T167377) [15:45:32] (03CR) 10BBlack: [C: 032] Remove various dead cp4005-20 DNS entries [dns] - 10https://gerrit.wikimedia.org/r/478176 (https://phabricator.wikimedia.org/T167377) (owner: 10BBlack) [15:46:14] (03PS2) 10BBlack: lvs300x: add v6 forward records [dns] - 10https://gerrit.wikimedia.org/r/478207 [15:46:31] (03CR) 10BBlack: [C: 032] lvs300x: add v6 forward records [dns] - 10https://gerrit.wikimedia.org/r/478207 (owner: 10BBlack) [15:48:29] (03CR) 10Ema: [C: 032] ATS: check when a restart is required [puppet] - 10https://gerrit.wikimedia.org/r/478211 (https://phabricator.wikimedia.org/T204209) (owner: 10Ema) [15:49:27] RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:49:32] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10phuedx) Approved. [15:52:52] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Banyek) I propose a quick talk with @bmansurov and @Ottomata to clarify a few questions on monday [15:56:15] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10bearND) [15:57:56] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install codfw logstash elasticsearch storage servers - https://phabricator.wikimedia.org/T211065 (10herron) a:05fgiunchedi>03herron Awesome! Thanks much @Papaul! 
I'll be working on these along with @fgiunchedi and will get started on service c... [16:00:27] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10bearND) Shouldn't @Jdforrester-WMF be added, too? [16:01:47] (03PS2) 10Elukey: profile::cache::kafka::alerts: set more sensitive thresholds [puppet] - 10https://gerrit.wikimedia.org/r/478210 (https://phabricator.wikimedia.org/T210939) [16:02:01] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10pmiazga) @bearND - per https://phabricator.wikimedia.org/T210652#4787383 looks like no, James is working on multimedia. [16:05:33] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10bmansurov) @Marostegui > Is this going to be a one time import? Maybe a 4-5 time import, maybe less. For now we have all the data needed in MySQL (tha... [16:06:00] (03PS1) 10Effie Mouzeli: admin: Add Michael Grosse to ldap_only group [puppet] - 10https://gerrit.wikimedia.org/r/478215 (https://phabricator.wikimedia.org/T211128) [16:06:17] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10bmansurov) @Banyek OK, Monday sounds good. Feel free to send a calendar invitation or let me know your preferred time and I'll send it myself. 
[16:13:58] (03PS2) 10Elukey: Add missing AAAA records for analytics eqiad hosts [dns] - 10https://gerrit.wikimedia.org/r/467705 (owner: 10Volans) [16:26:28] (03PS3) 10Elukey: Add missing AAAA records for some analytics eqiad hosts [dns] - 10https://gerrit.wikimedia.org/r/467705 (owner: 10Volans) [16:27:38] 10Operations, 10LDAP-Access-Requests: Add Michael Grosse to 'wmde' LDAP group - https://phabricator.wikimedia.org/T208722 (10MoritzMuehlenhoff) >>! In T208722#4728744, @RStallman-legalteam wrote: > Michael's NDA is now signed and on file. Feel free to proceed w/ access. Thanks! @RStallman-legalteam : Can you... [16:30:48] (03PS4) 10Elukey: Add missing AAAA records for some analytics eqiad hosts [dns] - 10https://gerrit.wikimedia.org/r/467705 (owner: 10Volans) [16:35:42] (03CR) 10Elukey: [C: 032] Add missing AAAA records for some analytics eqiad hosts [dns] - 10https://gerrit.wikimedia.org/r/467705 (owner: 10Volans) [16:36:41] elukey: <3 [16:37:03] volans: did a little batch, prepping for the other ones that will be merged on monday :) [16:37:29] thanks! 
[16:37:32] (03CR) 10Dzahn: [C: 031] admin: Add Michael Grosse to ldap_only group [puppet] - 10https://gerrit.wikimedia.org/r/478215 (https://phabricator.wikimedia.org/T211128) (owner: 10Effie Mouzeli) [16:40:24] (03PS1) 10Elukey: Add AAAA records for analytics103* eqiad hosts [dns] - 10https://gerrit.wikimedia.org/r/478220 [16:40:26] (03PS1) 10Elukey: Add AAA records for analytics104* eqiad hosts [dns] - 10https://gerrit.wikimedia.org/r/478221 [16:40:28] (03PS1) 10Elukey: Add AAAA records for analytics10[567]* eqiad hosts [dns] - 10https://gerrit.wikimedia.org/r/478222 [16:41:17] (03PS2) 10Elukey: Add AAAA records for analytics104* eqiad hosts [dns] - 10https://gerrit.wikimedia.org/r/478221 [16:41:19] (03PS2) 10Elukey: Add AAAA records for analytics10[567]* eqiad hosts [dns] - 10https://gerrit.wikimedia.org/r/478222 [16:41:32] volans: --^ [16:41:42] wow [16:47:55] 10Operations, 10cloud-services-team, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Port DirectorySize diamond collector to a Prometheus exporter - https://phabricator.wikimedia.org/T211094 (10colewhite) @GTirloni glad to help! [16:48:17] 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Deprovision Diamond collectors no longer in use - https://phabricator.wikimedia.org/T183454 (10colewhite) [16:48:23] 10Operations, 10cloud-services-team, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Port DirectorySize diamond collector to a Prometheus exporter - https://phabricator.wikimedia.org/T211094 (10colewhite) 05Open>03Resolved [16:59:22] Does anyone know if Etherpad is backed up these days? My understanding is that Etherpad isn't backed up, but a lot of team records are stored there (for better or worse) and I would regret it if they disappeared. [16:59:55] (I tried to embark on a project to move our team's notes to office wiki, but the formatting couldn't really be converted well.) 
[17:00:13] (03PS1) 10CDanis: WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 [17:00:47] (03CR) 10jerkins-bot: [V: 04-1] WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 (owner: 10CDanis) [17:01:21] harej: I am pretty sure that etherpad is not backed up, and is generally regarded as insecure (obviously) and disposable. Copying them to a wiki is the way to save things. Probably you could just apply some big blockquotes on the wiki page if you don't want to deal with formatting. [17:01:31] mutante may correct me re: backups but I'd be shocked [17:02:27] harej: it's also worth remembering that at any time a random internet stranger can open your etherpad, hit select-all and then [17:02:30] definitely we officially say "don't rely on this, if you need a backup you must copy to wiki" [17:02:39] * mutante peeks into Bacula console though [17:03:24] re "insecure" [17:03:29] copying to officewiki will not resolve this [17:03:36] once data is in etherpad it is public forever [17:04:05] I can't recall ever using Etherpad to document private stuff. (It's obviously a bad choice for that as you said.) [17:04:07] Krenair: that's correct, for certain values of 'insecure'. On officewiki it's at least secure against anonymous and un-auditable editing. 
[17:04:24] (03PS2) 10CDanis: WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 [17:04:59] (03CR) 10jerkins-bot: [V: 04-1] WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 (owner: 10CDanis) [17:05:41] maybe 'secure' isn't the right word for that [17:06:04] whatever the adjective form of "data integrity" is [17:12:36] PROBLEM - toolschecker: Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out - string OK not found on http://checker.tools.wmflabs.org:80/dumps - 356 bytes in 60.011 second response time [17:13:05] 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10Andrew) *bump* [17:14:26] RECOVERY - toolschecker: Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 22.309 second response time [17:15:50] <_joe_> !log uploading php-tideways (rebuilt with php 7.2 support) to stretch-wikimedia thirdparty/php72 T206152 [17:15:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:15:54] T206152: Set up request profiling for PHP 7 - https://phabricator.wikimedia.org/T206152 [17:16:55] <_joe_> harej: to know if etherpad is backed up [17:17:05] <_joe_> ask if the database is :) [17:17:09] !log T207377 rebooted labstore1007 for kernel upgrades [17:17:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:17:12] <_joe_> so a question for our dbas [17:17:12] T207377: Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 [17:17:30] _joe_: Good point. Are they also in here? 
[17:17:48] <_joe_> harej: yes, but #wikimedia-databases is their den [17:17:50] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 (10Bstorm) [17:17:58] <_joe_> also they might all be offline at this time on friday [17:18:34] Perhaps. [17:33:02] (03PS3) 10CDanis: WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 [17:33:38] (03CR) 10jerkins-bot: [V: 04-1] WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 (owner: 10CDanis) [17:37:17] 10Operations, 10cloud-services-team, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Port DirectorySize diamond collector to a Prometheus exporter - https://phabricator.wikimedia.org/T211094 (10MoritzMuehlenhoff) @colewhite Could you followup with a patch to absent the collector from puppet? Other t... [17:40:35] 10Operations, 10LDAP-Access-Requests: Add Michael Grosse to 'wmde' LDAP group - https://phabricator.wikimedia.org/T208722 (10RStallman-legalteam) Done! Thanks. 
[17:46:38] PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a response was received [17:47:42] RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy [17:47:44] !log rebooting logstash1006 for security updates [17:47:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:49:40] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received [17:53:08] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [17:58:27] (03PS4) 10CDanis: WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 [18:08:10] (03PS5) 10CDanis: WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 [18:08:20] 10Operations, 10ops-eqiad: eqiad: Re-connect cage cameras - https://phabricator.wikimedia.org/T207965 (10Cmjohnson) a:05Cmjohnson>03Papaul Thanks @papaul for helping with this...below is the correct port assignments for each camera. Please assign back to me after switch port update Thanks! Camera Row A F... [18:20:13] 10Operations, 10Packaging: Upgrade php5-json .deb to at least 1.3.8 - https://phabricator.wikimedia.org/T160101 (10Jdforrester-WMF) We don't use PHP 5.x anywhere anymore. Is this still relevant or can we close? 
[18:20:16] (03PS6) 10CDanis: WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 [18:20:49] (03CR) 10jerkins-bot: [V: 04-1] WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 (owner: 10CDanis) [18:22:43] (03PS7) 10CDanis: WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 [18:23:16] (03CR) 10jerkins-bot: [V: 04-1] WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 (owner: 10CDanis) [18:23:56] (03PS8) 10CDanis: WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 [18:24:58] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Marostegui) >>! In T208622#4806242, @bmansurov wrote: > @Marostegui >> Is this going to be a one time import? > Maybe a 4-5 time import, maybe less. F... [18:32:36] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10Mholloway) [18:32:38] harej: got distracted earlier. just wanted to double confirm there is no backup, also not bacula. that stuff should really move to a wiki. it doesn't necessarily all have to be office wiki. any wiki gives us history and edit log separate from it being public or not [18:33:03] and if it was on etherpad it was already public. 
so i would actually say it's more for meta wiki [18:33:14] Thank you for confirming [18:36:07] (03PS9) 10CDanis: WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 [18:36:43] (03CR) 10jerkins-bot: [V: 04-1] WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 (owner: 10CDanis) [18:37:39] (03PS10) 10CDanis: WIP kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 [18:41:43] (03CR) 10Muehlenhoff: [C: 031] "Rachel added him to the NDA sheet." [puppet] - 10https://gerrit.wikimedia.org/r/478215 (https://phabricator.wikimedia.org/T211128) (owner: 10Effie Mouzeli) [18:43:30] RECOVERY - Long running screen/tmux on logstash1006 is OK: OK: No SCREEN or tmux processes detected. [18:51:06] (03PS11) 10CDanis: kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 (https://phabricator.wikimedia.org/T209863) [18:51:45] (03CR) 10jerkins-bot: [V: 04-1] kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 (https://phabricator.wikimedia.org/T209863) (owner: 10CDanis) [18:52:43] (03PS12) 10CDanis: kernel.mtail needs hostnames [puppet] - 10https://gerrit.wikimedia.org/r/478225 (https://phabricator.wikimedia.org/T209863) [18:59:48] PROBLEM - HHVM rendering on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:00:50] RECOVERY - HHVM rendering on mw1234 is OK: HTTP OK: HTTP/1.1 200 OK - 81810 bytes in 0.154 second response time [19:04:21] I see an hhvm restart in mw1234's syslog a few minutes before that, I think? [19:04:51] it is also one of the perpetually-overheating machines [19:06:42] oh, and HHVM failed to shut down gracefully. 
no idea if any of this is 'normal' [19:08:20] 10Operations, 10hardware-requests, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Kanban (Doing), 10User-Eevans: Hardware for session storage service - https://phabricator.wikimedia.org/T206017 (10Cmjohnson) [19:14:40] 10Operations, 10SRE-Access-Requests: Requesting access to RESOURCE for USER[S] - https://phabricator.wikimedia.org/T211445 (10toddleroux) [19:16:31] 10Operations, 10SRE-Access-Requests: Requesting access to RESOURCE for USER[S] - https://phabricator.wikimedia.org/T211445 (10toddleroux) 05Open>03Invalid [19:16:47] 10Operations, 10ops-eqiad: eqiad: Re-connect cage cameras - https://phabricator.wikimedia.org/T207965 (10Papaul) ` papaul@asw2-a-eqiad# run show interfaces ge-1/0/2 descriptions Interface Admin Link Description ge-1/0/2 up down cam1-a-b-eqiad.eqiad.wmnet papaul@asw2-a-eqi... [19:17:05] 10Operations, 10ops-eqiad: eqiad: Re-connect cage cameras - https://phabricator.wikimedia.org/T207965 (10Papaul) a:05Papaul>03Cmjohnson [19:21:24] 10Operations, 10Research-Programs, 10SRE-Access-Requests, 10Epic, 10Patch-For-Review: access to analytics-privatedata-users for @toddleroux, @Afandian, & @RyanSteinberg - https://phabricator.wikimedia.org/T209298 (10toddleroux) Sorry for the delay on my end. 1. L3 is now signed. 2. my wikitech username... 
[19:55:38] 10Operations, 10monitoring, 10Patch-For-Review, 10User-CDanis: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 (10CDanis) [20:10:43] (03PS1) 10Bearloga: shiny_server: change gfortran/g++ dep [puppet] - 10https://gerrit.wikimedia.org/r/478252 [21:09:02] (03PS2) 10Dzahn: admin: Add Michael Grosse to ldap_only group [puppet] - 10https://gerrit.wikimedia.org/r/478215 (https://phabricator.wikimedia.org/T211128) (owner: 10Effie Mouzeli) [21:09:26] (03CR) 10Dzahn: [C: 032] admin: Add Michael Grosse to ldap_only group [puppet] - 10https://gerrit.wikimedia.org/r/478215 (https://phabricator.wikimedia.org/T211128) (owner: 10Effie Mouzeli) [21:12:22] (03CR) 10Dzahn: [C: 031] puppet:Reduce cronspam from modules/mediawiki/ [puppet] - 10https://gerrit.wikimedia.org/r/470877 (https://phabricator.wikimedia.org/T150375) (owner: 10Thifranc) [21:35:16] (03CR) 10Urbanecm: "Such pages shouldn't be in the DB as of now:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477856 (https://phabricator.wikimedia.org/T205546) (owner: 10Urbanecm) [21:35:21] 10Operations, 10Proton, 10SRE-Access-Requests: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10MSantos) [21:47:39] (03PS1) 10Urbanecm: Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) [21:48:29] (03CR) 10jerkins-bot: [V: 04-1] Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: 10Urbanecm) [21:50:13] 10Operations, 10Performance-Team, 10monitoring, 10User-CDanis: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 (10Peter) a:05Peter>03None I've checked the WebPageTest and WebPageReplay dashboards and they look ok. 
There's some GUI fine-tuning I need to do: The new version always disp... [21:50:53] (03PS2) 10Urbanecm: Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) [21:51:39] (03CR) 10jerkins-bot: [V: 04-1] Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: 10Urbanecm) [21:53:27] 10Operations, 10Performance-Team, 10monitoring, 10User-CDanis: Upgrade grafana to 5.x - https://phabricator.wikimedia.org/T210416 (10CDanis) a:03CDanis [21:53:33] (03PS3) 10Urbanecm: Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) [21:54:21] (03CR) 10jerkins-bot: [V: 04-1] Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: 10Urbanecm) [21:55:50] (03PS4) 10Urbanecm: Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) [21:56:37] (03CR) 10jerkins-bot: [V: 04-1] Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: 10Urbanecm) [21:57:22] (03PS5) 10Urbanecm: Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) [21:57:45] (03PS6) 10Urbanecm: Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) [21:58:48] (03CR) 10jerkins-bot: [V: 04-1] Check if all extra 
namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: 10Urbanecm) [21:59:21] (03PS7) 10Urbanecm: Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) [22:00:15] (03CR) 10jerkins-bot: [V: 04-1] Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: 10Urbanecm) [22:00:55] (03PS8) 10Urbanecm: Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) [22:01:58] (03CR) 10jerkins-bot: [V: 04-1] Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: 10Urbanecm) [22:03:21] (03PS9) 10Urbanecm: Check if all extra namespaces have correspondent talk namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) [22:05:12] 10Operations, 10SRE-Access-Requests: Can't connect to production with mbsantos user - https://phabricator.wikimedia.org/T211455 (10MSantos) [22:06:17] 10Operations, 10SRE-Access-Requests: Can't connect to production with mbsantos user - https://phabricator.wikimedia.org/T211455 (10MSantos) @Dzahn could this be related to {T211382} somehow? [22:12:51] (03PS3) 10MarcoAurelio: Add NS_PROJECT localised name for tt.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477979 (https://phabricator.wikimedia.org/T211312) [22:15:16] 10Operations, 10SRE-Access-Requests: Can't connect to production with mbsantos user - https://phabricator.wikimedia.org/T211455 (10MSantos) 05Open>03Invalid After reboot, the problem disappeared. 
[22:26:59] (03CR) 10RobH: [C: 031] Remove various dead cp4005-20 DNS entries [dns] - 10https://gerrit.wikimedia.org/r/478176 (https://phabricator.wikimedia.org/T167377) (owner: 10BBlack) [22:27:58] (03CR) 10RobH: [C: 031] "Normally we don't remove mgmt dns while they are racked, so they are accessible. However, that isn't the case in ulsfo, since we moved ra" [dns] - 10https://gerrit.wikimedia.org/r/478176 (https://phabricator.wikimedia.org/T167377) (owner: 10BBlack) [22:34:04] (03CR) 10CRusnov: [V: 032 C: 032] Add an old hardware report [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/477716 (https://phabricator.wikimedia.org/T205899) (owner: 10CRusnov) [22:43:30] 10Operations, 10MediaWiki-Logging, 10Wikimedia-Logstash, 10MW-1.33-notes (1.33.0-wmf.8; 2018-12-11), 10Patch-For-Review: Move mediawiki to new logging infrastructure - https://phabricator.wikimedia.org/T211124 (10bd808) >>! In T211124#4805389, @fgiunchedi wrote: > Thanks @bd808 for the context/insight, I... [23:41:38] 10Operations, 10ops-codfw, 10Core Platform Team, 10Services (doing), and 2 others: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 (10Eevans) [23:41:57] !log decommissioning cassandra-b, restbase2002 -- T210843 [23:42:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:42:01] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 [23:53:42] (03PS1) 10Cwhite: remove directorysize diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/478368 (https://phabricator.wikimedia.org/T183454) [23:55:59] 10Operations, 10Research, 10Patch-For-Review, 10User-Banyek: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Dzahn) >>! In T208622#4806576, @Marostegui wrote: > However, if there is a firewall in between it must be for a reason. So probably this needs good res... 
[23:58:20] (03PS1) 10Cwhite: remove diamond::collector reference from role::labs::nfs::secondary [puppet] - 10https://gerrit.wikimedia.org/r/478371 (https://phabricator.wikimedia.org/T183454)