[00:01:57] addshore: https://phabricator.wikimedia.org/T210953#4792678 [00:02:27] legoktm: ack, can do, didn't want to do that on a Sunday :/ [00:02:47] I think it's the safer option [00:03:13] shall we? [00:03:36] * addshore looks at other action types [00:03:40] please [00:03:56] and I'll be around if you need help (or if you want me to do it) [00:04:08] well, I should be able to check all of the actions tbh [00:04:16] the issue with the wikibase one is the edit one extended the view one [00:04:24] the default for action is fine [00:04:48] it's still a significant breaking change [00:09:07] legoktm: I just checked through all of the other actions, and as long as SpecialPageAction is only for read-only stuff it should all be fine [00:10:02] addshore: I mean, I'm going to revert the core patch either way [00:10:14] addshore: up to you which one you want to deploy to fix the UBN [00:10:21] legoktm: I'll let you do the revert then :) [00:10:33] and I'll head to bed, as it is already 1:10am here! :D [00:10:51] addshore: uh, how do I test that the bug is fixed once I finish deploying? [00:10:59] pm :) [00:13:21] legoktm: enjoy! thanks for being around! :) [00:13:28] <3 [00:13:51] I'm still jet lagged from getting back from SF, hence why I'm still up [00:13:57] but now to force myself to sleep [00:16:55] legoktm: fyi requiresUnblock has been around since 2011 [00:17:17] but yes, I guess the behaviour change is kind of breaking, albeit unexpectedly [00:34:40] !log legoktm@deploy1001 Synchronized php-1.33.0-wmf.6/includes/Title.php: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/477182 (duration: 00m 52s) [00:34:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:53:12] PROBLEM - puppet last run on stat1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures.
Failed resources (up to 3 shown): Exec[cdh::hadoop::directory /user/spark] [02:56:32] PROBLEM - DPKG on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused [02:56:36] PROBLEM - dhclient process on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused [02:56:46] PROBLEM - configured eth on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused [02:56:54] PROBLEM - Disk space on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused [03:05:44] PROBLEM - Check systemd state on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused [03:07:36] PROBLEM - MD RAID on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused [03:11:48] PROBLEM - Check the NTP synchronisation status of timesyncd on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused [03:17:06] RECOVERY - MD RAID on stat1007 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [03:17:10] RECOVERY - DPKG on stat1007 is OK: All packages OK [03:17:14] RECOVERY - dhclient process on stat1007 is OK: PROCS OK: 0 processes with command name dhclient [03:17:26] RECOVERY - configured eth on stat1007 is OK: OK - interfaces up [03:17:34] RECOVERY - Check systemd state on stat1007 is OK: OK - running: The system is fully operational [03:17:38] RECOVERY - Disk space on stat1007 is OK: DISK OK [03:18:54] RECOVERY - puppet last run on stat1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [03:21:23] (PS3) Zoranzoki21: Upload HD logos for multiple projects [mediawiki-config] - https://gerrit.wikimedia.org/r/477136 (https://phabricator.wikimedia.org/T150618) [03:23:22] (PS4) Zoranzoki21: Use HD logos in InitialiseSettings.php for multiple projects [mediawiki-config] - https://gerrit.wikimedia.org/r/477139 (https://phabricator.wikimedia.org/T150618) [03:35:16] PROBLEM - MariaDB
Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 971.39 seconds [03:42:01] RECOVERY - Check the NTP synchronisation status of timesyncd on stat1007 is OK: OK: synced at Mon 2018-12-03 03:41:58 UTC. [04:13:20] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 205.46 seconds [05:10:40] Operations, ops-codfw, Patch-For-Review: rack/setup/install elastic203[7-9], elastic204[0-9], elastic205[0-4] - https://phabricator.wikimedia.org/T210450 (Papaul) [05:15:27] Operations, ops-codfw: ms-be2047 spontaneous reboots - https://phabricator.wikimedia.org/T209921 (Papaul) Replaced all the parts that was shipped to me by Dell (main board, RAID controller, RAID controller interposer board.SAS cable) swapped CPU1 with CPU2 we have the same problem on the server. I email... [06:12:25] (PS1) Marostegui: db-eqiad.php: Depool db1076 [mediawiki-config] - https://gerrit.wikimedia.org/r/477187 (https://phabricator.wikimedia.org/T86338) [06:14:02] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1076 [mediawiki-config] - https://gerrit.wikimedia.org/r/477187 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui) [06:15:05] (Merged) jenkins-bot: db-eqiad.php: Depool db1076 [mediawiki-config] - https://gerrit.wikimedia.org/r/477187 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui) [06:16:19] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1076 T86338 T202167 (duration: 00m 50s) [06:16:22] !log Deploy schema change db1076 T86338 T202167 [06:16:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:16:24] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [06:16:25] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [06:16:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:16:56] (PS1)
Marostegui: Revert "db-eqiad.php: Depool db1076" [mediawiki-config] - https://gerrit.wikimedia.org/r/477188 [06:19:37] (CR) jenkins-bot: db-eqiad.php: Depool db1076 [mediawiki-config] - https://gerrit.wikimedia.org/r/477187 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui) [06:28:21] PROBLEM - puppet last run on mw1319 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ImageMagick-6/policy.xml] [06:29:27] PROBLEM - puppet last run on ms-be1027 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/rsyslog.d/40-swift.conf] [06:29:37] (PS1) Marostegui: pc1004,1005,1006: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/477190 (https://phabricator.wikimedia.org/T210969) [06:31:26] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1076" [mediawiki-config] - https://gerrit.wikimedia.org/r/477188 (owner: Marostegui) [06:32:29] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1076" [mediawiki-config] - https://gerrit.wikimedia.org/r/477188 (owner: Marostegui) [06:32:43] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1076" [mediawiki-config] - https://gerrit.wikimedia.org/r/477188 (owner: Marostegui) [06:33:31] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1076 T86338 T202167 (duration: 00m 47s) [06:33:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:33:36] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [06:33:37] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [06:34:09] (PS1) Marostegui: db-eqiad.php: Depool db1122 [mediawiki-config] - https://gerrit.wikimedia.org/r/477191 (https://phabricator.wikimedia.org/T86338) [06:34:12] (CR) Marostegui: [C: 2] pc1004,1005,1006: Disable
notifications [puppet] - https://gerrit.wikimedia.org/r/477190 (https://phabricator.wikimedia.org/T210969) (owner: Marostegui) [06:36:07] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1122 [mediawiki-config] - https://gerrit.wikimedia.org/r/477191 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui) [06:36:29] PROBLEM - puppet last run on labmon1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 10 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/rsyslog.d/40-prometheus.conf] [06:37:08] (Merged) jenkins-bot: db-eqiad.php: Depool db1122 [mediawiki-config] - https://gerrit.wikimedia.org/r/477191 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui) [06:38:29] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1122 T86338 T202167 (duration: 00m 48s) [06:38:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:38:39] !log Deploy schema change db1122 T86338 T202167 [06:38:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:38:43] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [06:38:43] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [06:44:21] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1122" [mediawiki-config] - https://gerrit.wikimedia.org/r/477193 [06:44:23] (PS1) Marostegui: mariadb: Decommission pc1004,pc1005 and pc1006 [puppet] - https://gerrit.wikimedia.org/r/477192 (https://phabricator.wikimedia.org/T210969) [06:45:05] (CR) jenkins-bot: db-eqiad.php: Depool db1122 [mediawiki-config] - https://gerrit.wikimedia.org/r/477191 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui) [06:48:19] (CR) Marostegui: "https://puppet-compiler.wmflabs.org/compiler1002/13812/" [puppet] - https://gerrit.wikimedia.org/r/477192 (https://phabricator.wikimedia.org/T210969) (owner: Marostegui)
[06:48:26] (CR) Marostegui: [C: 2] mariadb: Decommission pc1004,pc1005 and pc1006 [puppet] - https://gerrit.wikimedia.org/r/477192 (https://phabricator.wikimedia.org/T210969) (owner: Marostegui) [06:52:50] !log Remove pc1004, pc1005 and pc1006 from tendril and zarcillo - T210969 [06:52:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:52:54] T210969: Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet - https://phabricator.wikimedia.org/T210969 [06:56:49] RECOVERY - puppet last run on labmon1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:47] RECOVERY - puppet last run on mw1319 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:59:55] RECOVERY - puppet last run on ms-be1027 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:07:08] (PS2) Giuseppe Lavagetto: profile::mediawiki::php: add opcache tuning for php-fpm [puppet] - https://gerrit.wikimedia.org/r/476499 (https://phabricator.wikimedia.org/T206341) [07:08:08] (CR) Giuseppe Lavagetto: [C: 2] profile::mediawiki::php: add opcache tuning for php-fpm [puppet] - https://gerrit.wikimedia.org/r/476499 (https://phabricator.wikimedia.org/T206341) (owner: Giuseppe Lavagetto) [07:09:13] !log Stop MySQL on pc1004, pc1005 and pc1006 as they will be decommissioned - T210969 [07:09:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:09:16] T210969: Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet - https://phabricator.wikimedia.org/T210969 [07:12:01] PROBLEM - Zookeeper Server on druid1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg [07:12:17] PROBLEM - Check systemd state on druid1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or
more units failed. [07:12:45] Operations, ops-eqiad, DBA, decommission, Patch-For-Review: Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet - https://phabricator.wikimedia.org/T210969 (Marostegui) a:Marostegui>RobH pc1004, pc1005 and pc1006 are now fully ready for #dc-ops to... [07:13:27] Operations, ops-eqiad, DBA, decommission, Patch-For-Review: Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet - https://phabricator.wikimedia.org/T210969 (Marostegui) Priority is high like {T209858} because these have a hard deadline on the lease expiration [07:14:09] Operations, ops-eqiad, DBA, decommission, Patch-For-Review: Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet - https://phabricator.wikimedia.org/T210969 (Marostegui) [07:15:06] So random question, I was debugging something else - is it normal for a single request to load.php on mwdebug1002 to make 934 db queries?
(Seems to be fetching each message individually) https://logstash.wikimedia.org/goto/fed367b79a70c72e728b243534b1b930 [07:15:13] Because naively that seems pretty excessive to me [07:16:01] checking druid1001 [07:20:17] RECOVERY - Zookeeper Server on druid1001 is OK: PROCS OK: 1 process with command name java, args org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg [07:20:26] root partition full due to logs :( [07:20:31] RECOVERY - Check systemd state on druid1001 is OK: OK - running: The system is fully operational [07:20:37] RECOVERY - Disk space on druid1001 is OK: DISK OK [07:21:04] (CR) Giuseppe Lavagetto: profile::mediawiki::php: tune php-fpm parameters (1 comment) [puppet] - https://gerrit.wikimedia.org/r/476500 (https://phabricator.wikimedia.org/T206341) (owner: Giuseppe Lavagetto) [07:22:03] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1122" [mediawiki-config] - https://gerrit.wikimedia.org/r/477193 (owner: Marostegui) [07:23:04] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1122" [mediawiki-config] - https://gerrit.wikimedia.org/r/477193 (owner: Marostegui) [07:23:39] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1122" [mediawiki-config] - https://gerrit.wikimedia.org/r/477193 (owner: Marostegui) [07:27:02] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1122 T86338 T202167 (duration: 00m 47s) [07:27:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:27:06] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [07:27:07] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [07:28:09] (PS1) Marostegui: db-eqiad.php: Depool db1074 [mediawiki-config] - https://gerrit.wikimedia.org/r/477208 (https://phabricator.wikimedia.org/T86338) [07:28:27] (PS2) Giuseppe Lavagetto: profile::mediawiki::php: tune php-fpm parameters [puppet] -
https://gerrit.wikimedia.org/r/476500 (https://phabricator.wikimedia.org/T206341) [07:28:29] (PS2) Giuseppe Lavagetto: profile::mediawiki::php: armonize settings with HHVM [puppet] - https://gerrit.wikimedia.org/r/476501 [07:28:31] (PS2) Giuseppe Lavagetto: mediawiki: configure php-fpm logging [puppet] - https://gerrit.wikimedia.org/r/476502 [07:29:24] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1074 [mediawiki-config] - https://gerrit.wikimedia.org/r/477208 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui) [07:29:26] (CR) Giuseppe Lavagetto: [C: 2] profile::mediawiki::php: tune php-fpm parameters [puppet] - https://gerrit.wikimedia.org/r/476500 (https://phabricator.wikimedia.org/T206341) (owner: Giuseppe Lavagetto) [07:29:52] (CR) Giuseppe Lavagetto: [C: 2] profile::mediawiki::php: armonize settings with HHVM [puppet] - https://gerrit.wikimedia.org/r/476501 (owner: Giuseppe Lavagetto) [07:31:01] (Merged) jenkins-bot: db-eqiad.php: Depool db1074 [mediawiki-config] - https://gerrit.wikimedia.org/r/477208 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui) [07:31:22] <_joe_> puppet failures are mine [07:31:53] (PS2) Elukey: Correct escape chars of EL sanitization in analytics data_purge.pp [puppet] - https://gerrit.wikimedia.org/r/476886 (https://phabricator.wikimedia.org/T202429) (owner: Mforns) [07:32:04] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1074 T86338 T202167 (duration: 00m 46s) [07:32:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:32:08] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [07:32:09] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [07:32:48] !log Deploy schema change db1074 with replication (lag will appear on labs) T86338 T202167 [07:32:52] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:32:56] (PS1) Giuseppe Lavagetto: profile::mediawiki::php: fix function call [puppet] - https://gerrit.wikimedia.org/r/477209 [07:33:36] (CR) Giuseppe Lavagetto: [V: 2 C: 2] profile::mediawiki::php: fix function call [puppet] - https://gerrit.wikimedia.org/r/477209 (owner: Giuseppe Lavagetto) [07:33:39] PROBLEM - puppet last run on mw1242 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:33:57] PROBLEM - puppet last run on mw2290 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:34:05] PROBLEM - puppet last run on mw1261 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:34:29] PROBLEM - puppet last run on mw2177 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:34:57] PROBLEM - puppet last run on mw2252 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:35:11] PROBLEM - puppet last run on mw1345 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:35:29] PROBLEM - puppet last run on mw1274 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:35:56] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1074" [mediawiki-config] - https://gerrit.wikimedia.org/r/477210 [07:35:59] (CR) Elukey: hadoop::ui: migrate from apache to httpd module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/474832 (owner: Dzahn) [07:36:03] PROBLEM - puppet last run on mw1339 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:36:03] PROBLEM - puppet last run on mw1322 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [07:36:07] PROBLEM - puppet last run on mw2231 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:36:16] (CR) Elukey: [C: 1] Absent Redis Diamond collector on mc* servers [puppet] - https://gerrit.wikimedia.org/r/476883 (https://phabricator.wikimedia.org/T183454) (owner: Muehlenhoff) [07:36:23] PROBLEM - puppet last run on mw2251 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:36:25] (CR) Elukey: [C: 2] Correct escape chars of EL sanitization in analytics data_purge.pp [puppet] - https://gerrit.wikimedia.org/r/476886 (https://phabricator.wikimedia.org/T202429) (owner: Mforns) [07:36:26] <_joe_> as I said, that was my fault [07:36:27] PROBLEM - puppet last run on mw2271 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:36:33] (PS3) Elukey: Correct escape chars of EL sanitization in analytics data_purge.pp [puppet] - https://gerrit.wikimedia.org/r/476886 (https://phabricator.wikimedia.org/T202429) (owner: Mforns) [07:36:37] <_joe_> but it should recover soon-ish [07:36:38] (CR) jenkins-bot: db-eqiad.php: Depool db1074 [mediawiki-config] - https://gerrit.wikimedia.org/r/477208 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui) [07:37:01] PROBLEM - puppet last run on mw2176 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:37:05] PROBLEM - puppet last run on mw2233 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:37:07] PROBLEM - puppet last run on mw2201 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:37:07] PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [07:37:19] PROBLEM - puppet last run on mw1332 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:37:27] PROBLEM - puppet last run on mw1263 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:37:31] PROBLEM - php7.2-fpm service on mw1255 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:37:33] PROBLEM - puppet last run on mw1279 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:37:35] PROBLEM - php7.2-fpm service on mw1261 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:37:39] PROBLEM - PHP7 rendering on mw1245 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.001 second response time [07:37:41] PROBLEM - PHP7 rendering on mw1255 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.001 second response time [07:37:45] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:37:57] PROBLEM - Check systemd state on mw2286 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:37:57] PROBLEM - php7.2-fpm service on mw2286 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:37:59] PROBLEM - puppet last run on mw1284 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:38:01] PROBLEM - PHP7 rendering on mw2208 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. 
not found on http://en.wikipedia.org:80/wiki/Main_Page - 1309 bytes in 0.077 second response time [07:38:05] PROBLEM - PHP7 rendering on mw1261 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.002 second response time [07:38:09] <_joe_> that's my fault people ^^ [07:38:21] PROBLEM - puppet last run on mw1287 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:38:25] PROBLEM - php7.2-fpm service on mw2262 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:38:26] (PS1) Giuseppe Lavagetto: profile::mediawiki::php: fix reference to max children [puppet] - https://gerrit.wikimedia.org/r/477211 [07:38:27] PROBLEM - Check systemd state on mw1255 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:38:31] PROBLEM - Check systemd state on mw1261 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:38:33] PROBLEM - PHP7 rendering on mw2286 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.073 second response time [07:38:33] PROBLEM - php7.2-fpm service on mw1226 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:38:35] PROBLEM - PHP7 rendering on mw2146 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.074 second response time [07:38:35] PROBLEM - PHP7 rendering on mw2262 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7.
not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.073 second response time [07:38:37] PROBLEM - php7.2-fpm service on mw1245 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:38:39] PROBLEM - puppet last run on mw2218 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:38:39] PROBLEM - Check systemd state on mw2146 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:38:42] (CR) Giuseppe Lavagetto: [V: 2 C: 2] profile::mediawiki::php: fix reference to max children [puppet] - https://gerrit.wikimedia.org/r/477211 (owner: Giuseppe Lavagetto) [07:38:42] PROBLEM - Check systemd state on mw2262 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:38:43] PROBLEM - Check systemd state on mw1226 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:38:45] PROBLEM - PHP7 rendering on mw2236 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.074 second response time [07:38:47] PROBLEM - puppet last run on mw1276 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:38:49] PROBLEM - php7.2-fpm service on mw2208 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:38:49] PROBLEM - PHP7 rendering on mw1226 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.001 second response time [07:38:51] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [07:38:54] (PS2) Giuseppe Lavagetto: profile::mediawiki::php: fix reference to max children [puppet] - https://gerrit.wikimedia.org/r/477211 [07:38:57] PROBLEM - php7.2-fpm service on mw2146 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:38:57] PROBLEM - Check systemd state on mw2208 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:38:57] PROBLEM - Check systemd state on mw2236 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:38:58] (CR) Giuseppe Lavagetto: [V: 2 C: 2] profile::mediawiki::php: fix reference to max children [puppet] - https://gerrit.wikimedia.org/r/477211 (owner: Giuseppe Lavagetto) [07:38:59] PROBLEM - Check systemd state on mw1245 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:39:09] PROBLEM - PHP7 rendering on mw1222 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.001 second response time [07:39:11] PROBLEM - php7.2-fpm service on mw2236 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:39:12] PROBLEM - puppet last run on mw2253 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:39:13] PROBLEM - puppet last run on mw2186 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:39:13] PROBLEM - PHP7 rendering on mw1221 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7.
not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.001 second response time [07:39:15] RECOVERY - puppet last run on mw1261 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [07:39:15] PROBLEM - PHP7 rendering on mw2226 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.073 second response time [07:39:15] PROBLEM - puppet last run on mw2214 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:39:21] PROBLEM - php7.2-fpm service on mw1221 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:39:21] PROBLEM - puppet last run on mw2220 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:39:23] PROBLEM - Check systemd state on mw1221 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:39:23] (PS4) Elukey: superset: convert from apache to httpd module [puppet] - https://gerrit.wikimedia.org/r/476907 (owner: Dzahn) [07:39:27] PROBLEM - Check systemd state on mw2203 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:39:31] PROBLEM - Check systemd state on mw2193 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:39:35] PROBLEM - php7.2-fpm service on mw2193 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:39:35] PROBLEM - php7.2-fpm service on mw1277 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:39:39] PROBLEM - php7.2-fpm service on mw1222 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:39:41] PROBLEM - PHP7 rendering on mw2203 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7.
not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.074 second response time [07:39:43] PROBLEM - PHP7 rendering on mw2283 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.073 second response time [07:39:45] PROBLEM - Check systemd state on mw1222 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:39:55] PROBLEM - PHP7 rendering on mw2193 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.073 second response time [07:40:01] PROBLEM - Check systemd state on mw2283 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:40:02] (CR) Elukey: [C: 2] superset: convert from apache to httpd module [puppet] - https://gerrit.wikimedia.org/r/476907 (owner: Dzahn) [07:40:05] PROBLEM - php7.2-fpm service on mw2203 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:40:11] PROBLEM - php7.2-fpm service on mw2226 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:40:17] PROBLEM - PHP7 rendering on mw1277 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.002 second response time [07:40:19] PROBLEM - PHP7 rendering on mw2189 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.074 second response time [07:40:25] PROBLEM - php7.2-fpm service on mw1224 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:40:27] PROBLEM - Check systemd state on mw2174 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[07:40:29] PROBLEM - php7.2-fpm service on mw2283 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:40:29] PROBLEM - Check systemd state on mw2226 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:40:33] PROBLEM - PHP7 rendering on mw2174 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1309 bytes in 0.074 second response time [07:40:45] PROBLEM - Check systemd state on mw1277 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:40:57] PROBLEM - PHP7 rendering on mw2230 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.073 second response time [07:40:57] PROBLEM - PHP7 rendering on mw2185 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.074 second response time [07:40:59] PROBLEM - Check systemd state on mw2185 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:41:03] PROBLEM - php7.2-fpm service on mw1267 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:41:05] PROBLEM - Check systemd state on mw2230 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:41:05] PROBLEM - PHP7 rendering on mw1267 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.001 second response time [07:41:07] PROBLEM - PHP7 rendering on mw1224 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. 
not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.001 second response time [07:41:11] PROBLEM - php7.2-fpm service on mw1331 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:41:11] PROBLEM - php7.2-fpm service on mw2174 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:41:15] PROBLEM - Check systemd state on mw2189 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:41:15] PROBLEM - php7.2-fpm service on mw2189 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:41:15] PROBLEM - Check systemd state on mw1224 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:41:22] PROBLEM - PHP7 rendering on mw2275 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.073 second response time [07:41:22] PROBLEM - Check systemd state on mw2215 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:41:25] PROBLEM - PHP7 rendering on mw2238 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1309 bytes in 0.075 second response time [07:41:25] PROBLEM - PHP7 rendering on mw2215 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1309 bytes in 0.078 second response time [07:41:25] PROBLEM - php7.2-fpm service on mw2238 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:41:25] PROBLEM - Check systemd state on mw1331 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:41:25] PROBLEM - PHP7 rendering on mw1251 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. 
not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.002 second response time [07:41:25] PROBLEM - Check systemd state on mw1267 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:41:27] PROBLEM - puppet last run on mw2138 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:41:29] PROBLEM - PHP7 rendering on mw1234 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.001 second response time [07:41:29] PROBLEM - PHP7 rendering on mw1331 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.001 second response time [07:41:31] PROBLEM - Check systemd state on mw2275 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:41:35] (PS1) Giuseppe Lavagetto: profile::mediawiki::php: fix typo [puppet] - https://gerrit.wikimedia.org/r/477212 [07:41:35] PROBLEM - php7.2-fpm service on mw2185 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:41:37] PROBLEM - Check systemd state on mw2238 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:41:39] PROBLEM - Check systemd state on mw1256 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:41:41] PROBLEM - Check systemd state on mw1251 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[07:41:42] PROBLEM - php7.2-fpm service on mw2230 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:41:49] (CR) Giuseppe Lavagetto: [V: 2 C: 2] profile::mediawiki::php: fix typo [puppet] - https://gerrit.wikimedia.org/r/477212 (owner: Giuseppe Lavagetto) [07:41:57] PROBLEM - php7.2-fpm service on mw1234 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:42:01] PROBLEM - php7.2-fpm service on mw2275 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:42:05] PROBLEM - Check systemd state on mw1234 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:42:09] PROBLEM - PHP7 rendering on mw1313 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.001 second response time [07:42:13] PROBLEM - PHP7 rendering on mw1235 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.001 second response time [07:42:15] PROBLEM - Check systemd state on mw1313 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:42:21] PROBLEM - php7.2-fpm service on mw2215 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:42:23] PROBLEM - PHP7 rendering on mw1256 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. 
not found on http://en.wikipedia.org:80/wiki/Main_Page - 1308 bytes in 0.001 second response time [07:42:25] PROBLEM - php7.2-fpm service on mw1256 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:42:25] PROBLEM - php7.2-fpm service on mw1251 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:42:39] PROBLEM - php7.2-fpm service on mw1235 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:42:45] PROBLEM - php7.2-fpm service on mw1313 is CRITICAL: CRITICAL - Expecting active but unit php7.2-fpm is failed [07:42:53] PROBLEM - Check systemd state on mw1235 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:43:09] PROBLEM - puppet last run on mw1231 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:43:15] PROBLEM - puppet last run on mw2190 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:43:29] PROBLEM - puppet last run on mw2145 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:43:35] PROBLEM - puppet last run on mw1315 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:43:35] RECOVERY - php7.2-fpm service on mw1261 is OK: OK - php7.2-fpm is active [07:44:05] RECOVERY - PHP7 rendering on mw1261 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.162 second response time [07:44:05] PROBLEM - puppet last run on mw2191 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:44:31] RECOVERY - Check systemd state on mw1261 is OK: OK - running: The system is fully operational [07:44:31] PROBLEM - puppet last run on mw1228 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:45:01] PROBLEM - puppet last run on mw1285 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [07:45:37] PROBLEM - puppet last run on mw1225 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:45:47] PROBLEM - puppet last run on mw2178 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:45:49] RECOVERY - PHP7 rendering on mw2146 is OK: HTTP OK: HTTP/1.1 200 OK - 80637 bytes in 1.304 second response time [07:45:55] RECOVERY - Check systemd state on mw2146 is OK: OK - running: The system is fully operational [07:45:57] PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:46:11] RECOVERY - php7.2-fpm service on mw2146 is OK: OK - php7.2-fpm is active [07:46:39] RECOVERY - puppet last run on mw2138 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [07:47:15] RECOVERY - Check systemd state on mw1224 is OK: OK - running: The system is fully operational [07:47:25] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/477136 (https://phabricator.wikimedia.org/T150618) (owner: Zoranzoki21) [07:47:37] RECOVERY - php7.2-fpm service on mw2185 is OK: OK - php7.2-fpm is active [07:47:37] RECOVERY - php7.2-fpm service on mw1224 is OK: OK - php7.2-fpm is active [07:47:39] RECOVERY - Check systemd state on mw1256 is OK: OK - running: The system is fully operational [07:47:53] PROBLEM - puppet last run on mw1321 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [07:47:57] RECOVERY - puppet last run on mw1279 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [07:48:03] RECOVERY - Check systemd state on mw1255 is OK: OK - running: The system is fully operational [07:48:09] RECOVERY - PHP7 rendering on mw2286 is OK: HTTP OK: HTTP/1.1 200 OK - 80637 bytes in 1.275 second response time [07:48:09] RECOVERY - PHP7 rendering on mw2185 is OK: HTTP OK: HTTP/1.1 200 OK - 80635 bytes in 0.318 second response time [07:48:11] RECOVERY - Check systemd state on mw2185 is OK: OK - running: The system is fully operational [07:48:13] RECOVERY - php7.2-fpm service on mw1245 is OK: OK - php7.2-fpm is active [07:48:17] RECOVERY - php7.2-fpm service on mw1255 is OK: OK - php7.2-fpm is active [07:48:19] RECOVERY - puppet last run on mw1284 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:48:19] RECOVERY - PHP7 rendering on mw1224 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.853 second response time [07:48:23] RECOVERY - PHP7 rendering on mw1256 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.988 second response time [07:48:23] RECOVERY - php7.2-fpm service on mw1256 is OK: OK - php7.2-fpm is active [07:48:27] RECOVERY - PHP7 rendering on mw1245 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.958 second response time [07:48:27] RECOVERY - PHP7 rendering on mw1255 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.943 second response time [07:48:27] RECOVERY - puppet last run on mw2190 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:48:35] RECOVERY - Check systemd state on mw1245 is OK: OK - running: The system is fully operational [07:48:35] RECOVERY - Check systemd state on mw2215 is OK: OK - running: The system is fully operational [07:48:35] RECOVERY - PHP7 rendering on mw2215 is OK: HTTP OK: HTTP/1.1 200 OK - 80635 bytes in 0.319 second response time [07:48:41] RECOVERY - puppet last 
run on mw2145 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:48:43] RECOVERY - puppet last run on mw1287 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:48:44] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/477139 (https://phabricator.wikimedia.org/T150618) (owner: Zoranzoki21) [07:48:45] RECOVERY - Check systemd state on mw2286 is OK: OK - running: The system is fully operational [07:48:45] RECOVERY - puppet last run on mw1315 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:48:47] RECOVERY - php7.2-fpm service on mw2286 is OK: OK - php7.2-fpm is active [07:49:09] RECOVERY - Check systemd state on mw2193 is OK: OK - running: The system is fully operational [07:49:11] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [07:49:13] RECOVERY - php7.2-fpm service on mw2193 is OK: OK - php7.2-fpm is active [07:49:15] RECOVERY - php7.2-fpm service on mw2262 is OK: OK - php7.2-fpm is active [07:49:17] RECOVERY - puppet last run on mw2191 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:49:21] RECOVERY - PHP7 rendering on mw1313 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.976 second response time [07:49:23] RECOVERY - PHP7 rendering on mw2262 is OK: HTTP OK: HTTP/1.1 200 OK - 80637 bytes in 1.295 second response time [07:49:25] RECOVERY - Check systemd state on mw1313 is OK: OK - running: The system is fully operational [07:49:27] (PS2) Ema: ATS: do not add X-Forwarded-For [puppet] - https://gerrit.wikimedia.org/r/476828 (https://phabricator.wikimedia.org/T207048) [07:49:29] RECOVERY - Check systemd state on mw2262 is OK: OK - running: The system is fully operational [07:49:29] RECOVERY - puppet last run on mw2290 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:49:31] RECOVERY - PHP7 
rendering on mw2193 is OK: HTTP OK: HTTP/1.1 200 OK - 80635 bytes in 0.318 second response time [07:49:33] RECOVERY - php7.2-fpm service on mw2215 is OK: OK - php7.2-fpm is active [07:49:35] RECOVERY - puppet last run on mw2186 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:49:39] RECOVERY - Check systemd state on mw2283 is OK: OK - running: The system is fully operational [07:49:41] RECOVERY - php7.2-fpm service on mw2203 is OK: OK - php7.2-fpm is active [07:49:55] RECOVERY - php7.2-fpm service on mw1313 is OK: OK - php7.2-fpm is active [07:50:05] RECOVERY - php7.2-fpm service on mw2283 is OK: OK - php7.2-fpm is active [07:50:11] RECOVERY - puppet last run on mw1285 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [07:50:15] RECOVERY - Check systemd state on mw2203 is OK: OK - running: The system is fully operational [07:50:19] RECOVERY - Check systemd state on mw1277 is OK: OK - running: The system is fully operational [07:50:21] RECOVERY - php7.2-fpm service on mw1277 is OK: OK - php7.2-fpm is active [07:50:29] RECOVERY - PHP7 rendering on mw2203 is OK: HTTP OK: HTTP/1.1 200 OK - 80637 bytes in 1.384 second response time [07:50:29] RECOVERY - PHP7 rendering on mw2283 is OK: HTTP OK: HTTP/1.1 200 OK - 80635 bytes in 0.336 second response time [07:50:31] RECOVERY - puppet last run on mw2252 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [07:50:45] RECOVERY - puppet last run on mw1345 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [07:50:45] RECOVERY - php7.2-fpm service on mw1331 is OK: OK - php7.2-fpm is active [07:50:46] (CR) Ema: [C: 2] ATS: do not add X-Forwarded-For [puppet] - https://gerrit.wikimedia.org/r/476828 (https://phabricator.wikimedia.org/T207048) (owner: Ema) [07:50:47] RECOVERY - php7.2-fpm service on mw2174 is OK: OK - php7.2-fpm is active [07:50:49] RECOVERY - puppet last run on mw1225 is OK: OK: Puppet is 
currently enabled, last run 2 minutes ago with 0 failures [07:50:49] RECOVERY - php7.2-fpm service on mw2208 is OK: OK - php7.2-fpm is active [07:50:54] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1074" [mediawiki-config] - https://gerrit.wikimedia.org/r/477210 (owner: Marostegui) [07:50:57] RECOVERY - PHP7 rendering on mw2238 is OK: HTTP OK: HTTP/1.1 200 OK - 80635 bytes in 0.310 second response time [07:50:57] RECOVERY - Check systemd state on mw2208 is OK: OK - running: The system is fully operational [07:50:57] RECOVERY - puppet last run on mw2178 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:50:59] RECOVERY - Check systemd state on mw1331 is OK: OK - running: The system is fully operational [07:50:59] RECOVERY - php7.2-fpm service on mw2238 is OK: OK - php7.2-fpm is active [07:50:59] RECOVERY - php7.2-fpm service on mw1235 is OK: OK - php7.2-fpm is active [07:51:01] RECOVERY - PHP7 rendering on mw1331 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.978 second response time [07:51:05] RECOVERY - PHP7 rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.862 second response time [07:51:07] RECOVERY - puppet last run on mw1274 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [07:51:09] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:51:11] RECOVERY - PHP7 rendering on mw2208 is OK: HTTP OK: HTTP/1.1 200 OK - 80637 bytes in 1.197 second response time [07:51:11] RECOVERY - Check systemd state on mw2238 is OK: OK - running: The system is fully operational [07:51:11] RECOVERY - php7.2-fpm service on mw2236 is OK: OK - php7.2-fpm is active [07:51:15] RECOVERY - Check systemd state on mw1235 is OK: OK - running: The system is fully operational [07:51:15] RECOVERY - Check systemd state on mw1251 is OK: OK - running: The system is fully operational [07:51:15] RECOVERY - Check systemd state on 
mw2174 is OK: OK - running: The system is fully operational [07:51:16] (PS2) Ema: varnish: do not allow X-ATS-Debug to be set from the outside [puppet] - https://gerrit.wikimedia.org/r/476814 (https://phabricator.wikimedia.org/T207048) [07:51:17] RECOVERY - php7.2-fpm service on mw2230 is OK: OK - php7.2-fpm is active [07:51:19] RECOVERY - PHP7 rendering on mw2174 is OK: HTTP OK: HTTP/1.1 200 OK - 80635 bytes in 0.321 second response time [07:51:37] RECOVERY - php7.2-fpm service on mw2275 is OK: OK - php7.2-fpm is active [07:51:39] RECOVERY - puppet last run on mw1339 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:51:39] RECOVERY - php7.2-fpm service on mw1222 is OK: OK - php7.2-fpm is active [07:51:43] RECOVERY - puppet last run on mw2231 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:51:45] RECOVERY - PHP7 rendering on mw2230 is OK: HTTP OK: HTTP/1.1 200 OK - 80637 bytes in 1.215 second response time [07:51:45] RECOVERY - Check systemd state on mw1222 is OK: OK - running: The system is fully operational [07:51:45] RECOVERY - PHP7 rendering on mw1235 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.145 second response time [07:51:51] RECOVERY - Check systemd state on mw2230 is OK: OK - running: The system is fully operational [07:51:55] RECOVERY - PHP7 rendering on mw2236 is OK: HTTP OK: HTTP/1.1 200 OK - 80635 bytes in 0.338 second response time [07:51:56] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1074" [mediawiki-config] - https://gerrit.wikimedia.org/r/477210 (owner: Marostegui) [07:51:57] RECOVERY - php7.2-fpm service on mw1251 is OK: OK - php7.2-fpm is active [07:51:57] RECOVERY - puppet last run on mw2251 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:52:01] RECOVERY - puppet last run on mw2271 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [07:52:07] RECOVERY - PHP7 rendering on mw2275 is OK: 
HTTP OK: HTTP/1.1 200 OK - 80635 bytes in 0.322 second response time [07:52:09] RECOVERY - Check systemd state on mw2236 is OK: OK - running: The system is fully operational [07:52:11] RECOVERY - PHP7 rendering on mw1251 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.978 second response time [07:52:13] RECOVERY - PHP7 rendering on mw1234 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.138 second response time [07:52:17] RECOVERY - Check systemd state on mw2275 is OK: OK - running: The system is fully operational [07:52:19] RECOVERY - PHP7 rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 80634 bytes in 1.131 second response time [07:52:19] (CR) Ema: [C: 2] varnish: do not allow X-ATS-Debug to be set from the outside [puppet] - https://gerrit.wikimedia.org/r/476814 (https://phabricator.wikimedia.org/T207048) (owner: Ema) [07:52:37] RECOVERY - puppet last run on mw2176 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:52:41] RECOVERY - puppet last run on mw2233 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:52:41] RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:52:41] RECOVERY - php7.2-fpm service on mw1234 is OK: OK - php7.2-fpm is active [07:52:42] RECOVERY - puppet last run on mw2201 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:52:49] RECOVERY - Check systemd state on mw1234 is OK: OK - running: The system is fully operational [07:52:55] RECOVERY - puppet last run on mw1332 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:52:55] RECOVERY - php7.2-fpm service on mw1226 is OK: OK - php7.2-fpm is active [07:53:01] RECOVERY - php7.2-fpm service on mw1267 is OK: OK - php7.2-fpm is active [07:53:01] RECOVERY - puppet last run on mw1263 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:53:01] RECOVERY - PHP7 
rendering on mw1267 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.986 second response time [07:53:03] RECOVERY - Check systemd state on mw1226 is OK: OK - running: The system is fully operational [07:53:04] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1074 T86338 T202167 (duration: 00m 47s) [07:53:05] RECOVERY - puppet last run on mw1321 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:53:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:53:11] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [07:53:11] RECOVERY - PHP7 rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.169 second response time [07:53:11] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [07:53:15] RECOVERY - Check systemd state on mw2189 is OK: OK - running: The system is fully operational [07:53:15] RECOVERY - php7.2-fpm service on mw2189 is OK: OK - php7.2-fpm is active [07:53:18] (PS3) Giuseppe Lavagetto: mediawiki: configure php-fpm logging [puppet] - https://gerrit.wikimedia.org/r/476502 [07:53:19] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:53:23] RECOVERY - Check systemd state on mw1267 is OK: OK - running: The system is fully operational [07:53:25] RECOVERY - php7.2-fpm service on mw2226 is OK: OK - php7.2-fpm is active [07:53:31] RECOVERY - PHP7 rendering on mw2189 is OK: HTTP OK: HTTP/1.1 200 OK - 80637 bytes in 1.270 second response time [07:53:31] RECOVERY - puppet last run on mw1231 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [07:53:33] RECOVERY - PHP7 rendering on mw1221 is OK: HTTP OK: HTTP/1.1 200 OK - 80627 bytes in 0.151 second response time [07:53:37] RECOVERY - PHP7 rendering on mw2226 is OK: HTTP OK: HTTP/1.1 200 OK - 80637 bytes in 1.295 second response time 
[07:53:41] RECOVERY - Check systemd state on mw2226 is OK: OK - running: The system is fully operational [07:53:43] RECOVERY - php7.2-fpm service on mw1221 is OK: OK - php7.2-fpm is active [07:53:45] RECOVERY - Check systemd state on mw1221 is OK: OK - running: The system is fully operational [07:54:02] (CR) Ema: [C: 2] ATS: add SystemTap probe for uncacheable responses [puppet] - https://gerrit.wikimedia.org/r/476820 (https://phabricator.wikimedia.org/T207048) (owner: Ema) [07:54:09] (PS3) Ema: ATS: add SystemTap probe for uncacheable responses [puppet] - https://gerrit.wikimedia.org/r/476820 (https://phabricator.wikimedia.org/T207048) [07:54:11] RECOVERY - puppet last run on mw2218 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:54:17] RECOVERY - puppet last run on mw1276 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [07:54:21] RECOVERY - puppet last run on mw1242 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:54:45] RECOVERY - puppet last run on mw2253 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:54:49] RECOVERY - puppet last run on mw2214 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:54:53] RECOVERY - puppet last run on mw1228 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:54:55] RECOVERY - puppet last run on mw2220 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:55:07] RECOVERY - puppet last run on mw2177 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:56:51] RECOVERY - puppet last run on mw1322 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [08:02:15] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1074" [mediawiki-config] - https://gerrit.wikimedia.org/r/477210 (owner: Marostegui) [08:03:02] (PS2) Filippo Giunchedi: 
Partman: added 3SSD JBOD config for restbase201[3-8] [puppet] - https://gerrit.wikimedia.org/r/476912 (https://phabricator.wikimedia.org/T210863) (owner: Eevans) [08:03:44] (PS3) Filippo Giunchedi: install_server: add 3SSD JBOD config for restbase201[3-8] [puppet] - https://gerrit.wikimedia.org/r/476912 (https://phabricator.wikimedia.org/T210863) (owner: Eevans) [08:04:24] (CR) Filippo Giunchedi: [C: 2] install_server: add 3SSD JBOD config for restbase201[3-8] [puppet] - https://gerrit.wikimedia.org/r/476912 (https://phabricator.wikimedia.org/T210863) (owner: Eevans) [08:04:31] (PS4) Filippo Giunchedi: install_server: add 3SSD JBOD config for restbase201[3-8] [puppet] - https://gerrit.wikimedia.org/r/476912 (https://phabricator.wikimedia.org/T210863) (owner: Eevans) [08:05:38] (PS2) Filippo Giunchedi: hieradata: reconfigure restbase2013 for 3-SSD JBOD [puppet] - https://gerrit.wikimedia.org/r/476915 (https://phabricator.wikimedia.org/T210863) (owner: Eevans) [08:05:44] (CR) Filippo Giunchedi: [C: 2] hieradata: reconfigure restbase2013 for 3-SSD JBOD [puppet] - https://gerrit.wikimedia.org/r/476915 (https://phabricator.wikimedia.org/T210863) (owner: Eevans) [08:07:54] !log Deploy schema change on s2 master (db1066) T86338 T202167 [08:07:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:07:58] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [08:07:58] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [08:09:33] Operations, ops-codfw, Patch-For-Review, Services (watching), and 2 others: Reconfigure hardware and reimage restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T210863 (ops-monitoring-bot) Script wmf-auto-reimage was launched by filippo on cumin1001.eqiad.wmnet for hosts: ` restbase... 
[08:13:45] !log rearmed keyholders on netmon1002/netmon2001 [08:13:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:16:25] ACKNOWLEDGEMENT - Device not healthy -SMART- on db1063 is CRITICAL: cluster=mysql device=megaraid,7 instance=db1063:9100 job=node site=eqiad Banyek T210976 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1063&var-datasource=eqiad%2520prometheus%252Fops [08:19:07] (PS3) Muehlenhoff: Absent Redis Diamond collector on mc* servers [puppet] - https://gerrit.wikimedia.org/r/476883 (https://phabricator.wikimedia.org/T183454) [08:28:09] RECOVERY - Keyholder SSH agent on netmon2001 is OK: OK: Keyholder is armed with all configured keys. [08:30:36] (CR) Giuseppe Lavagetto: [C: 2] tox: allow passing options to pytest environments [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/475885 (owner: Hashar) [08:30:40] !log Deploy schema change on s5 codfw master (db2052) with replication, lag will be generated on codfw T86338 T202167 [08:30:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:30:45] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [08:30:45] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [08:31:19] (CR) jenkins-bot: tox: allow passing options to pytest environments [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/475885 (owner: Hashar) [08:32:09] RECOVERY - Keyholder SSH agent on netmon1002 is OK: OK: Keyholder is armed with all configured keys. 
[08:32:51] (PS4) Giuseppe Lavagetto: mediawiki: configure php-fpm logging [puppet] - https://gerrit.wikimedia.org/r/476502 [08:32:59] !log restarted keyholder agents/proxies on netmon1002/netmon2001 to pick up removal of netbox key [08:33:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:36:32] (CR) Muehlenhoff: [C: 2] Absent Redis Diamond collector on mc* servers [puppet] - https://gerrit.wikimedia.org/r/476883 (https://phabricator.wikimedia.org/T183454) (owner: Muehlenhoff) [08:36:51] (PS5) Giuseppe Lavagetto: mediawiki: configure php-fpm logging [puppet] - https://gerrit.wikimedia.org/r/476502 [08:42:03] (CR) Giuseppe Lavagetto: [C: 2] "https://puppet-compiler.wmflabs.org/compiler1002/13815/mw1261.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/476502 (owner: Giuseppe Lavagetto) [08:42:31] (PS6) Giuseppe Lavagetto: mediawiki: configure php-fpm logging [puppet] - https://gerrit.wikimedia.org/r/476502 [08:43:10] !log depooling db1074 - T85757 [08:43:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:43:14] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [08:43:46] (CR) Banyek: [C: 2] mariadb: depool db1074 [mediawiki-config] - https://gerrit.wikimedia.org/r/475739 (https://phabricator.wikimedia.org/T85757) (owner: Banyek) [08:43:54] (PS3) Banyek: mariadb: depool db1074 [mediawiki-config] - https://gerrit.wikimedia.org/r/475739 (https://phabricator.wikimedia.org/T85757) [08:43:58] (CR) Banyek: [V: 2 C: 2] mariadb: depool db1074 [mediawiki-config] - https://gerrit.wikimedia.org/r/475739 (https://phabricator.wikimedia.org/T85757) (owner: Banyek) [08:44:16] !log bootstrap cassandra-a on restbase2013 - T209615 [08:44:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:44:20] T209615: rack/setup/install restbase201[3-8].codfw.wmnet - 
https://phabricator.wikimedia.org/T209615 [08:44:44] <_joe_> banyek: I don't think it's a good idea to V+2 changes unless you're in an emergency [08:44:52] (PS1) Filippo Giunchedi: hieradata: use 3 ssd as jbod for restbase201[3-8] [puppet] - https://gerrit.wikimedia.org/r/477214 (https://phabricator.wikimedia.org/T210863) [08:45:00] <_joe_> banyek: esp in repositories with gate-and-submit jobs [08:45:27] (Merged) jenkins-bot: mariadb: depool db1074 [mediawiki-config] - https://gerrit.wikimedia.org/r/475739 (https://phabricator.wikimedia.org/T85757) (owner: Banyek) [08:45:45] _joe_: patch was already checked and prepared days before [08:45:53] <_joe_> banyek: doesn't matter [08:46:09] (CR) Filippo Giunchedi: [C: 2] hieradata: use 3 ssd as jbod for restbase201[3-8] [puppet] - https://gerrit.wikimedia.org/r/477214 (https://phabricator.wikimedia.org/T210863) (owner: Filippo Giunchedi) [08:46:16] ok, sorry [08:46:18] (PS2) Filippo Giunchedi: hieradata: use 3 ssd as jbod for restbase201[3-8] [puppet] - https://gerrit.wikimedia.org/r/477214 (https://phabricator.wikimedia.org/T210863) [08:46:21] <_joe_> banyek: things might have changed in the repository in the meanwhile [08:47:11] <_joe_> so unless it's an emergency, or a simple rebase in a FF-only repository like ops/puppet, don't V+2 changes [08:48:42] Operations, ops-codfw, Patch-For-Review, Services (watching), User-fgiunchedi: rack/setup/install restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T209615 (fgiunchedi) [08:48:48] Operations, ops-codfw, Patch-For-Review, Services (watching), and 2 others: Reconfigure hardware and reimage restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T210863 (fgiunchedi) Open>Resolved This is completed, thanks @Papaul and all involved. 
[08:49:30] !log banyek@deploy1001 Synchronized wmf-config/db-eqiad.php: T85757: depool db1074 (duration: 00m 48s) [08:49:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:49:33] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [08:50:20] _joe_: aye [08:50:45] !log stopping replication on db1074 - T85757 [08:50:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:53:38] (03CR) 10jenkins-bot: mariadb: depool db1074 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475739 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [08:55:30] (03CR) 10Gehel: [C: 031] osm::planet_sync: change osm log dir permission (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/477118 (https://phabricator.wikimedia.org/T210940) (owner: 10Mathew.onipe) [08:55:38] (03PS2) 10Gehel: osm::planet_sync: change osm log dir permission [puppet] - 10https://gerrit.wikimedia.org/r/477118 (https://phabricator.wikimedia.org/T210940) (owner: 10Mathew.onipe) [08:57:50] (03CR) 10Gehel: [C: 032] osm::planet_sync: change osm log dir permission [puppet] - 10https://gerrit.wikimedia.org/r/477118 (https://phabricator.wikimedia.org/T210940) (owner: 10Mathew.onipe) [08:58:12] 10Operations, 10ops-codfw, 10Patch-For-Review, 10Services (watching), and 2 others: Reconfigure hardware and reimage restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T210863 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['restbase2013.codfw.wmnet'] ` Of which those **FAILED**:... [09:01:04] 10Operations, 10ops-codfw, 10Patch-For-Review, 10Services (watching), and 2 others: Reconfigure hardware and reimage restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T210863 (10fgiunchedi) >>! In T210863#4793101, @ops-monitoring-bot wrote: > Completed auto-reimage of hosts: > ` > ['restbase... 
[09:05:11] (03PS4) 10Arturo Borrero Gonzalez: openstack: dnsleaks.py: respect PTR records for .svc.eqiad.wmflabs too [puppet] - 10https://gerrit.wikimedia.org/r/476859 [09:06:22] (03PS1) 10Filippo Giunchedi: hieradata: add instances for restbase201[4-8] [puppet] - 10https://gerrit.wikimedia.org/r/477216 (https://phabricator.wikimedia.org/T209615) [09:07:01] (03CR) 10Arturo Borrero Gonzalez: [C: 032] openstack: dnsleaks.py: respect PTR records for .svc.eqiad.wmflabs too [puppet] - 10https://gerrit.wikimedia.org/r/476859 (owner: 10Arturo Borrero Gonzalez) [09:10:34] 10Operations, 10Maps, 10Discovery-Search (Current work), 10Patch-For-Review: Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) - https://phabricator.wikimedia.org/T210940 (10Gehel) @Mathew.onipe patch deployed, can you validate that it works before moving the... [09:12:32] 10Operations, 10cloud-services-team (Kanban): WMCS-related dashboards using Diamond metrics - https://phabricator.wikimedia.org/T210850 (10MoritzMuehlenhoff) As for spotting remaining Diamond metrics, https://phabricator.wikimedia.org/P7680 contains a Paste with remaining Diamond metric references (based on a... [09:14:40] 10Operations, 10ops-codfw: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T210984 (10ops-monitoring-bot) [09:15:49] RECOVERY - MD RAID on restbase2014 is OK: OK: Active: 9, Working: 9, Failed: 0, Spare: 0 [09:16:09] 10Operations, 10DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (10Banyek) [09:16:19] 10Operations, 10DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (10Banyek) db1063 `name: Adapter #0 Virtual Drive: 0 (Target Id: 0) RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0 State: Optimal Number Of Drives per span: 2 Number of Spans: 6 Cur... 
[09:16:38] !log repooling db1074 [09:16:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:16:41] RECOVERY - MD RAID on restbase2017 is OK: OK: Active: 9, Working: 9, Failed: 0, Spare: 0 [09:16:42] !log repooling db1074 - T85757 [09:16:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:16:45] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [09:17:30] (03PS1) 10Banyek: Revert "mariadb: depool db1074" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477219 [09:19:38] (03CR) 10Banyek: [C: 032] Revert "mariadb: depool db1074" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477219 (owner: 10Banyek) [09:20:45] (03Merged) 10jenkins-bot: Revert "mariadb: depool db1074" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477219 (owner: 10Banyek) [09:22:10] !log banyek@deploy1001 Synchronized wmf-config/db-eqiad.php: T85757: repool db1074 (duration: 00m 47s) [09:22:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:22:13] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [09:27:25] !log executing schema change on db1066 (s2 master) - T85757 [09:27:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:27:28] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [09:28:15] (03PS1) 10Ema: ATS: MediaWiki and RestBASE request mangling [puppet] - 10https://gerrit.wikimedia.org/r/477221 (https://phabricator.wikimedia.org/T209021) [09:29:11] RECOVERY - MD RAID on restbase2015 is OK: OK: Active: 9, Working: 9, Failed: 0, Spare: 0 [09:30:12] (03PS1) 10ArielGlenn: convert snapshot/dumps python scripts in puppet to python3 [puppet] - 10https://gerrit.wikimedia.org/r/477222 (https://phabricator.wikimedia.org/T210980) [09:31:38] (03CR) 10jenkins-bot: Revert "mariadb: depool db1074" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/477219 (owner: 10Banyek) [09:32:07] 10Operations, 10Traffic, 10Patch-For-Review: ATS backend-side request-mangling - https://phabricator.wikimedia.org/T209021 (10ema) [09:32:37] (03CR) 10Ema: [C: 032] ATS: MediaWiki and RestBASE request mangling [puppet] - 10https://gerrit.wikimedia.org/r/477221 (https://phabricator.wikimedia.org/T209021) (owner: 10Ema) [09:33:35] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, and 2 others: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10Marostegui) >>! In T200297#4787352, @awight wrote: > @Marostegui Hello! I've added a few summary columns and indexes to the lin... [09:37:43] 10Operations, 10Patch-For-Review, 10User-Marostegui: Audit "misc" cluster hosts - https://phabricator.wikimedia.org/T210486 (10jijiki) p:05Triage>03Normal @colewhite @fgiunchedi should we add a checklist of actions need to be done in order to consider this task as "Resolved?" [09:43:18] 10Operations, 10ops-codfw: Degraded RAID on restbase2018 - https://phabricator.wikimedia.org/T210990 (10ops-monitoring-bot) [09:44:53] (03PS1) 10Marostegui: db-eqiad.php: Depool db1096:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477223 (https://phabricator.wikimedia.org/T86338) [09:45:54] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1096:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477223 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [09:46:54] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1096:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477223 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [09:48:02] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1096:3315 T86338 T202167 (duration: 00m 47s) [09:48:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:48:06] T86338: Dropping page.page_counter on wmf databases - 
https://phabricator.wikimedia.org/T86338 [09:48:07] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [09:48:08] !log Deploy schema change on db1096:3315 T86338 T202167 [09:48:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:57:32] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1096:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477223 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [09:59:38] (03PS1) 10ArielGlenn: start conversion to python3 [dumps] (python3) - 10https://gerrit.wikimedia.org/r/477227 (https://phabricator.wikimedia.org/T210989) [09:59:57] (03CR) 10jerkins-bot: [V: 04-1] start conversion to python3 [dumps] (python3) - 10https://gerrit.wikimedia.org/r/477227 (https://phabricator.wikimedia.org/T210989) (owner: 10ArielGlenn) [10:00:58] RECOVERY - MD RAID on restbase2018 is OK: OK: Active: 9, Working: 9, Failed: 0, Spare: 0 [10:01:17] 10Operations, 10Elasticsearch, 10Discovery-Search (Current work): Fix prometheus elasticsearch exporter to show all the metrics - https://phabricator.wikimedia.org/T210592 (10dcausse) [10:01:59] (03CR) 10Filippo Giunchedi: [C: 031] logstash: ship zookeeper logs to ELK [puppet] - 10https://gerrit.wikimedia.org/r/476977 (https://phabricator.wikimedia.org/T63789) (owner: 10Herron) [10:02:04] 10Operations, 10Toolforge, 10cloud-services-team, 10monitoring, 10User-fgiunchedi: Deprecate Diamond collectors in Tool Labs / Tool Forge - https://phabricator.wikimedia.org/T210991 (10MoritzMuehlenhoff) p:05Triage>03Normal [10:02:54] PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:04:16] 10Operations, 10cloud-services-team, 10monitoring, 10User-fgiunchedi: Deprecate Diamond collectors in Cloud VPS - https://phabricator.wikimedia.org/T210993 (10MoritzMuehlenhoff) p:05Triage>03Normal [10:05:31] 10Operations, 10DBA, 10Performance-Team: Increase parsercache keys TTL from 22 days back to 30 days - https://phabricator.wikimedia.org/T210992 (10Marostegui) [10:05:57] 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Deprovision Diamond collectors no longer in use - https://phabricator.wikimedia.org/T183454 (10MoritzMuehlenhoff) [10:07:39] 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Deprovision Diamond collectors no longer in use - https://phabricator.wikimedia.org/T183454 (10MoritzMuehlenhoff) [10:08:31] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, eyeballing both code and PCC changes. After merging please apply to one host at a time JIC" [puppet] - 10https://gerrit.wikimedia.org/r/476916 (owner: 10Dzahn) [10:08:35] 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Deprovision Diamond collectors no longer in use - https://phabricator.wikimedia.org/T183454 (10MoritzMuehlenhoff) [10:10:54] (03CR) 10Filippo Giunchedi: [C: 031] admin: Add addshore to graphite-admins; allow _graphite commands [puppet] - 10https://gerrit.wikimedia.org/r/476558 (https://phabricator.wikimedia.org/T208750) (owner: 10Jcrespo) [10:12:03] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1096:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477229 [10:16:28] (03PS1) 10Muehlenhoff: Remove Redis collector [puppet] - 10https://gerrit.wikimedia.org/r/477231 (https://phabricator.wikimedia.org/T183454) [10:17:33] (03PS1) 10Urbanecm: Create two extra namespaces on yuewiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477232 (https://phabricator.wikimedia.org/T205546) [10:18:14] RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is 
currently enabled, last run 2 minutes ago with 0 failures [10:20:26] (03PS2) 10Urbanecm: Revert "Milestone logo for atjwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449445 (https://phabricator.wikimedia.org/T200713) [10:23:13] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1096:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477229 (owner: 10Marostegui) [10:23:56] (03PS1) 10Muehlenhoff: Remove labstore::monitoring::nfsd [puppet] - 10https://gerrit.wikimedia.org/r/477234 [10:24:19] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1096:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477229 (owner: 10Marostegui) [10:25:26] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1096:3315 T86338 T202167 (duration: 00m 45s) [10:25:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:25:30] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [10:25:31] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [10:27:33] (03PS1) 10Marostegui: db-eqiad.php: Depool db1097:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477235 (https://phabricator.wikimedia.org/T86338) [10:28:44] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1097:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477235 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [10:29:44] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1097:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477235 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [10:30:04] jan_drewniak: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181203T1030). 
[10:30:24] PROBLEM - Disk space on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [10:30:52] PROBLEM - Check systemd state on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [10:30:57] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1097:3315 T86338 T202167 (duration: 00m 46s) [10:31:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:31:01] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [10:31:02] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [10:31:08] PROBLEM - DPKG on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [10:31:37] !log Deploy schema change on db1097:3315 T86338 T202167 [10:31:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:33:00] PROBLEM - MD RAID on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [10:33:14] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477237 (https://phabricator.wikimedia.org/T128546) [10:34:17] checking notebook1003 [10:34:32] PROBLEM - puppet last run on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused [10:34:38] !log installing nodejs security updates on scb2001 [10:34:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:16] RECOVERY - MD RAID on notebook1003 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [10:35:20] RECOVERY - Check systemd state on notebook1003 is OK: OK - running: The system is fully operational [10:35:39] RECOVERY - DPKG on notebook1003 is OK: All packages OK [10:35:41] (03CR) 10Jdrewniak: [C: 032] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477237 (https://phabricator.wikimedia.org/T128546) (owner: 
10Jdrewniak) [10:35:46] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1096:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477229 (owner: 10Marostegui) [10:35:48] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1097:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477235 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [10:35:53] had to restart nagios checker, OOM happened before and left things in a weird state [10:36:00] RECOVERY - Disk space on notebook1003 is OK: DISK OK [10:36:42] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477237 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:39:05] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:477237| Bumping portals to master (T128546)]] (duration: 00m 47s) [10:39:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:09] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [10:39:42] RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 22 minutes ago with 0 failures [10:39:52] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:477237| Bumping portals to master (T128546)]] (duration: 00m 46s) [10:39:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:48:49] (03CR) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477237 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:51:52] (03CR) 10Filippo Giunchedi: [C: 031] Remove Redis collector [puppet] - 10https://gerrit.wikimedia.org/r/477231 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [10:52:18] !log rolling upgrade of scb in codfw to nodejs security update [10:52:19] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:55:46] (03PS1) 10Ema: ATS: do not cache cookie responses [puppet] - 10https://gerrit.wikimedia.org/r/477240 (https://phabricator.wikimedia.org/T209021) [10:56:14] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1097:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477241 [10:57:29] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1097:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477241 (owner: 10Marostegui) [10:58:32] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1097:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477241 (owner: 10Marostegui) [10:59:31] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1097:3315 T86338 T202167 (duration: 00m 47s) [10:59:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:36] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [10:59:36] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [11:01:30] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1097:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477241 (owner: 10Marostegui) [11:02:13] (03PS2) 10Ema: ATS: do not cache cookie responses [puppet] - 10https://gerrit.wikimedia.org/r/477240 (https://phabricator.wikimedia.org/T209021) [11:02:26] !log installing nodejs security updates on stat/notebook hosts [11:02:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:44] 10Operations, 10Maps, 10Discovery-Search (Current work), 10Patch-For-Review: Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) - https://phabricator.wikimedia.org/T210940 (10Mathew.onipe) @gehel I don't seem to have root access on maps servers (strange). I tho... 
[11:06:11] (03CR) 10Muehlenhoff: [C: 032] Remove Redis collector [puppet] - 10https://gerrit.wikimedia.org/r/477231 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [11:08:59] (03PS3) 10Ema: ATS: do not cache cookie responses [puppet] - 10https://gerrit.wikimedia.org/r/477240 (https://phabricator.wikimedia.org/T209021) [11:10:06] (03CR) 10Ema: [C: 032] ATS: do not cache cookie responses [puppet] - 10https://gerrit.wikimedia.org/r/477240 (https://phabricator.wikimedia.org/T209021) (owner: 10Ema) [11:18:28] 10Operations, 10Math, 10Patch-For-Review: Clean up artifacts from LaTeX based math rendering - https://phabricator.wikimedia.org/T195847 (10fgiunchedi) Is cleaning up swift `global-math-render.*` containers in scope for this? afaik with mathoid now these containers shouldn't be used anymore? [11:23:12] 10Operations, 10Traffic, 10Wikimedia-General-or-Unknown, 10media-storage: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (10fgiunchedi) I can indeed reproduce the problem when fetching e.g. https://upload... [11:29:30] 10Operations, 10ops-codfw: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T210984 (10fgiunchedi) 05Open>03Invalid reimage [11:30:02] 10Operations, 10ops-codfw: Degraded RAID on restbase2018 - https://phabricator.wikimedia.org/T210990 (10fgiunchedi) 05Open>03Invalid reimage [11:35:28] 10Operations, 10Traffic, 10Wikimedia-General-or-Unknown, 10media-storage: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (10akosiaris) I can reproduce it as well. Received sizes and execution times are not... [11:37:02] (03CR) 10Volans: "I know I'm late for this, still leaving some comments." 
(0310 comments) [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/471298 (https://phabricator.wikimedia.org/T208066) (owner: 10Cwhite) [11:38:42] (03CR) 10Effie Mouzeli: admin: Add addshore to graphite-admins; allow _graphite commands (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/476558 (https://phabricator.wikimedia.org/T208750) (owner: 10Jcrespo) [11:39:04] !log more weight to new ms-be codfw hosts - T209395 [11:39:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:39:08] T209395: rack/setup/install new ms-be servers ms-be204[4-9] ,ms-be2050 - https://phabricator.wikimedia.org/T209395 [11:44:24] (03CR) 10Filippo Giunchedi: [C: 031] admin: Add addshore to graphite-admins; allow _graphite commands (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/476558 (https://phabricator.wikimedia.org/T208750) (owner: 10Jcrespo) [11:46:36] (03CR) 10Mathew.onipe: [C: 031] admin: Add addshore to graphite-admins; allow _graphite commands [puppet] - 10https://gerrit.wikimedia.org/r/476558 (https://phabricator.wikimedia.org/T208750) (owner: 10Jcrespo) [11:57:56] I’m online for the EU SWAT, but I’ll have to be offline for a minute around 12:00 UTC, sorry [11:58:07] but I should be able to deploy Michael_WMDE’s and my config patch [11:58:15] it’ll just take me a minute to respond to jouncebot :) [12:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I, the Bot under the Fountain, allow thee, The Deployer, to do European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181203T1200). [12:00:05] Lucas_WMDE, Michael_WMDE, Urbanecm, and Zoranzoki21: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[12:00:10] * Urbanecm waves [12:01:23] o/ [12:01:47] hi [12:01:58] since I’m first in the list, should I start deploying my patch? [12:02:00] or do I wait for a SWAT conductor? [12:04:00] AFAIK it is usually whoever is doing SWAT that deploys patches [12:04:18] I thought if the patch author is a deployer they do the deployment themselves? [12:06:02] Lucas_WMDE: could totally be! [12:06:02] Lucas_WMDE, they can, if they want to [12:06:06] Personally I think you can start deploying, but I'm not an authority on that [12:06:27] (03PS2) 10Lucas Werkmeister (WMDE): Don’t send SPARQL prefixes in WikibaseQualityConstraints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476267 (https://phabricator.wikimedia.org/T204317) [12:06:28] I can start rebasing my patch at least [12:07:19] (03PS1) 10Ema: ATS: log cache results and backend URL [puppet] - 10https://gerrit.wikimedia.org/r/477246 [12:07:36] addshore says I can go ahead :) [12:07:51] (03CR) 10Lucas Werkmeister (WMDE): [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476267 (https://phabricator.wikimedia.org/T204317) (owner: 10Lucas Werkmeister (WMDE)) [12:07:52] o/ [12:07:57] * addshore will watch [12:09:06] 10Operations, 10Patch-For-Review, 10User-Marostegui: Audit "misc" cluster hosts - https://phabricator.wikimedia.org/T210486 (10fgiunchedi) [12:09:19] (03Merged) 10jenkins-bot: Don’t send SPARQL prefixes in WikibaseQualityConstraints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476267 (https://phabricator.wikimedia.org/T204317) (owner: 10Lucas Werkmeister (WMDE)) [12:09:32] 10Operations, 10Patch-For-Review, 10User-Marostegui: Audit "misc" cluster hosts - https://phabricator.wikimedia.org/T210486 (10fgiunchedi) >>! In T210486#4793292, @jijiki wrote: > @colewhite @fgiunchedi should we add a checklist of actions need to be done in order to consider this task as "Resolved?" Sounds... 
[12:10:42] Lucas_WMDE: your patch is on mwdebug1002, please test ;) [12:10:44] (03CR) 10Effie Mouzeli: [C: 032] admin: Add addshore to graphite-admins; allow _graphite commands (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/476558 (https://phabricator.wikimedia.org/T208750) (owner: 10Jcrespo) [12:11:05] (03PS3) 10Effie Mouzeli: admin: Add addshore to graphite-admins; allow _graphite commands [puppet] - 10https://gerrit.wikimedia.org/r/476558 (https://phabricator.wikimedia.org/T208750) (owner: 10Jcrespo) [12:11:44] (03CR) 10Alexandros Kosiaris: [C: 032] "PCC fine per https://puppet-compiler.wmflabs.org/compiler1002/13818/scb1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/476266 (https://phabricator.wikimedia.org/T210578) (owner: 10KartikMistry) [12:12:14] there’s not much I can do to test the change, but it’s at least not broken anything [12:12:16] going ahead [12:13:33] :) [12:13:45] kart_: merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/476266/ [12:13:53] (03PS3) 10Alexandros Kosiaris: cxserver: Added update Youdao config [puppet] - 10https://gerrit.wikimedia.org/r/476266 (https://phabricator.wikimedia.org/T210578) (owner: 10KartikMistry) [12:14:20] when my change is done, I can deploy Michael_WMDE’s change (unless someone else really wants to do it?) [12:14:25] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:476267| Don’t send SPARQL prefixes in WikibaseQualityConstraints (T204317)]] (duration: 00m 49s) [12:14:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:29] Lucas_WMDE: go ahead! 
:) [12:14:31] T204317: Don’t send SPARQL prefixes with WikibaseQualityConstraints queries - https://phabricator.wikimedia.org/T204317 [12:14:34] ready [12:14:42] ack [12:14:48] (03PS3) 10Lucas Werkmeister (WMDE): Perform more PHP constraint checks before falling back [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476822 (https://phabricator.wikimedia.org/T209504) (owner: 10Michael Große) [12:15:10] (03CR) 10Lucas Werkmeister (WMDE): [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476822 (https://phabricator.wikimedia.org/T209504) (owner: 10Michael Große) [12:15:17] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/476431 (https://phabricator.wikimedia.org/T208066) (owner: 10Cwhite) [12:15:24] 10Operations, 10SRE-Access-Requests, 10WMDE-Analytics-Engineering, 10Graphite, and 2 others: Requesting access to graphite hosts for addshore - https://phabricator.wikimedia.org/T208750 (10jijiki) [12:15:57] 10Operations, 10SRE-Access-Requests, 10WMDE-Analytics-Engineering, 10Graphite, and 2 others: Requesting access to graphite hosts for addshore - https://phabricator.wikimedia.org/T208750 (10jijiki) @Addshore Please ensure that your access on graphite hosts is alright [12:16:24] (03CR) 10Filippo Giunchedi: [C: 031] hiera: add cluster definition to spare role [puppet] - 10https://gerrit.wikimedia.org/r/476396 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [12:16:39] (03Merged) 10jenkins-bot: Perform more PHP constraint checks before falling back [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476822 (https://phabricator.wikimedia.org/T209504) (owner: 10Michael Große) [12:17:20] Michael_WMDE: your change is on mwdebug1002, can you test it? [12:17:24] testing... 
[12:18:05] (03CR) 10jenkins-bot: Don’t send SPARQL prefixes in WikibaseQualityConstraints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476267 (https://phabricator.wikimedia.org/T204317) (owner: 10Lucas Werkmeister (WMDE)) [12:18:07] (03CR) 10jenkins-bot: Perform more PHP constraint checks before falling back [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476822 (https://phabricator.wikimedia.org/T209504) (owner: 10Michael Große) [12:18:49] looks not broken to me, which is all I can test right now. Will watch how the statistics dashboards change in the next days [12:18:56] :) [12:19:00] good enough for me, going ahead [12:19:04] (03CR) 10Filippo Giunchedi: [C: 031] "Thanks for the rationale/explanation in the commit message, LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/444230 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [12:19:53] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:476822|Perform more PHP constraint checks before falling back (T209504)]] (duration: 00m 48s) [12:19:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:19:58] T209504: Perform more constraint type checks in PHP before falling back to SPARQL - https://phabricator.wikimedia.org/T209504 [12:20:18] okay, I think I’m done [12:20:21] Urbanecm: the stage is yours :) [12:20:37] I'm not a deployer, can you deploy my changes, please? [12:20:41] oh [12:20:49] I’m not sure if I should do that or leave it to addshore [12:21:05] (03PS4) 10Lucas Werkmeister (WMDE): Change sitename of shnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473670 (https://phabricator.wikimedia.org/T206777) (owner: 10Urbanecm) [12:21:16] addshore: is it okay if I continue deploying even though I’m not part of the SWAT team? [12:21:29] Lucas_WMDE: sure :) [12:21:32] if you want to! [12:21:35] alright [12:21:35] but dont feel you have to! 
[12:21:38] yeah, let’s get some more practice in :) [12:21:48] (03CR) 10Lucas Werkmeister (WMDE): [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473670 (https://phabricator.wikimedia.org/T206777) (owner: 10Urbanecm) [12:21:53] just make sure you are fine with each patch individually :) [12:21:56] Urbanecm: going ahead with shnwiki now [12:21:58] akosiaris: Dah. I just saw this. We've yet to merge client code. If it doesn't give any warnings, it is fine. [12:21:59] ack [12:22:05] Lucas_WMDE: you can also join the swat team ;) [12:22:20] kart_: ok [12:22:49] (03Merged) 10jenkins-bot: Change sitename of shnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473670 (https://phabricator.wikimedia.org/T206777) (owner: 10Urbanecm) [12:23:10] Hi, I am here. Sorry for lating. zeljkof: You SWAT`ing? [12:23:23] !log bootstrap cassandra-b on restbase2013 - T209615 [12:23:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:23:26] T209615: rack/setup/install restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T209615 [12:23:48] Zoranzoki21: I’m currently deploying Urbanecm’s first patch [12:23:56] Urbanecm: it’s on mwdebug1002 now [12:24:02] ack, testing [12:25:16] PROBLEM - cassandra-c service on restbase2013 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed [12:25:26] PROBLEM - cassandra-b CQL 10.192.16.83:9042 on restbase2013 is CRITICAL: connect to address 10.192.16.83 and port 9042: Connection refused [12:25:46] PROBLEM - Check systemd state on restbase2013 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [12:25:58] PROBLEM - cassandra-c CQL 10.192.16.84:9042 on restbase2013 is CRITICAL: connect to address 10.192.16.84 and port 9042: Connection refused [12:26:01] Lucas_WMDE, are you sure it's on mwdebug1002? 
[12:26:06] PROBLEM - Restbase root url on restbase2013 is CRITICAL: connect to address 10.192.16.80 and port 7231: Connection refused [12:26:06] PROBLEM - cassandra-c SSL 10.192.16.84:7001 on restbase2013 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [12:26:17] ah [12:26:20] no, forgot the git rebase [12:26:22] sorry :) [12:26:34] try again? [12:27:05] thanks, working, please continue with deploying [12:28:08] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:473670|Change sitename of shnwiki (T206777)]] (duration: 00m 47s) [12:28:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:28:13] T206777: Create Wikipedia Shan - https://phabricator.wikimedia.org/T206777 [12:28:21] Zoranzoki21: should I continue with SWAT or do you want to take over? [12:28:38] (03PS3) 10Lucas Werkmeister (WMDE): Close internalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/468823 (https://phabricator.wikimedia.org/T205584) (owner: 10Urbanecm) [12:28:56] Zoranzoki21: You can continue with SWAT. 
I have free time [12:29:16] Lucas_WMDE, Zoranzoki21 is not a deployer, FYI [12:29:23] oh, okay, sorry [12:29:27] I misunderstood the message [12:29:30] then I’ll just continue, yeah [12:29:59] (03CR) 10Lucas Werkmeister (WMDE): [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/468823 (https://phabricator.wikimedia.org/T205584) (owner: 10Urbanecm) [12:30:08] Oh, I thinked to you told something like: ''Should I continue with SWAT per list of patches or you want to I deploy your first?'' [12:30:16] :D [12:30:27] no, I thought you were a member of the SWAT team, sorry [12:30:33] because I’m not one, but apparently I’m running the SWAT now [12:30:36] (03CR) 10jenkins-bot: Change sitename of shnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473670 (https://phabricator.wikimedia.org/T206777) (owner: 10Urbanecm) [12:30:38] (with addshore supervising) [12:30:40] (03PS2) 10Ema: ATS: log cache results and backend URL [puppet] - 10https://gerrit.wikimedia.org/r/477246 [12:30:43] o/ [12:31:03] (03Merged) 10jenkins-bot: Close internalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/468823 (https://phabricator.wikimedia.org/T205584) (owner: 10Urbanecm) [12:31:17] (03CR) 10jenkins-bot: Close internalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/468823 (https://phabricator.wikimedia.org/T205584) (owner: 10Urbanecm) [12:31:32] (03CR) 10Ema: [C: 032] ATS: log cache results and backend URL [puppet] - 10https://gerrit.wikimedia.org/r/477246 (owner: 10Ema) [12:31:33] Urbanecm: should be on mwdebug1002 now [12:32:22] Unable to check, as I don't have access to the wiki, but as the wiki isn't down or something, I suppose it works [12:33:01] alright, I’ll go ahead then [12:33:09] (03PS4) 10Alexandros Kosiaris: cxserver: Added update Youdao config [puppet] - 10https://gerrit.wikimedia.org/r/476266 (https://phabricator.wikimedia.org/T210578) (owner: 10KartikMistry) [12:33:12] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] cxserver: Added 
update Youdao config [puppet] - 10https://gerrit.wikimedia.org/r/476266 (https://phabricator.wikimedia.org/T210578) (owner: 10KartikMistry) [12:34:23] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:468823|Close internalwiki (T205584)]] (duration: 00m 46s) [12:34:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:27] T205584: Close internal.wikimedia.org - https://phabricator.wikimedia.org/T205584 [12:34:42] Zoranzoki21: I think we’re ready to go ahead with your changes now [12:34:53] hopefully there should be enough time left in the SWAT [12:35:01] but would it be bad if the last one didn’t make it? [12:35:06] Lucas_WMDE: Ok. First patch should be uploading of logos, which can be moved directly at production [12:35:09] (03PS4) 10Lucas Werkmeister (WMDE): Upload HD logos for multiple projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477136 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:35:31] Lucas_WMDE, there shouldn't be any problem, the logo will just be on production with noone wanting them [12:35:40] sounds good [12:35:51] (I also think you can skip mwdebug part for all those patches) [12:36:00] addshore: to deploy https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/477136, I’ll use `sync-file static/images/project-logos/`, correct? 
[12:36:06] *looks* [12:36:18] (03CR) 10Lucas Werkmeister (WMDE): [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477136 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:36:20] yup, that will be fine [12:36:27] ok thanks [12:36:49] yeah, at least the first one makes no sense on mwdebug1002 [12:36:55] will check the others when we get to them [12:37:08] !log installing nodejs security updates on scb1001 [12:37:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:37:21] (03Merged) 10jenkins-bot: Upload HD logos for multiple projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477136 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:37:39] godog: looks like I can get onto the graphite boxes now :) [12:38:07] just to confirm, to delete a metric I just have to remove the correct .wsp files? (before I go and give it a go) [12:38:49] (03PS3) 10Lucas Werkmeister (WMDE): Cleaning of wgLogoHD [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477138 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:39:19] !log lucaswerkmeister-wmde@deploy1001 Synchronized static/images/project-logos/: SWAT: [[gerrit:477136|Upload HD logos for multiple projects (T150618)]] (duration: 00m 47s) [12:39:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:22] T150618: Provide HD logos for all projects - https://phabricator.wikimedia.org/T150618 [12:39:24] as it's SWAT time, can somebody run createAndPromote.php for me and T204477? 
[12:39:24] T204477: Create punjabi.wikimedia.org for Punjabi Wikimedians User Group - https://phabricator.wikimedia.org/T204477 [12:39:44] 10Operations, 10Citoid, 10Services (watching), 10VisualEditor (Current work): Decreased internationalisation of automatic citations as a result of switch to new translation-server - https://phabricator.wikimedia.org/T210806 (10Mvolz) IMO it's relatively minor, and also not a new issue; it's in the patch su... [12:40:13] (03CR) 10Lucas Werkmeister (WMDE): [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477138 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:41:04] (03CR) 10Volans: "LGTM, just a last minute nitpick (my fault) and a question inline" (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [12:41:18] (03Merged) 10jenkins-bot: Cleaning of wgLogoHD [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477138 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:41:18] Zoranzoki21: I’d prefer to test the cleanup patch on mwdebug just to be sure, I’ll let you know once it’s ready [12:41:59] Zoranzoki21: it’s on mwdebug1002, can you quickly check that nothing is broken? [12:42:09] Lucas_WMDE: Ok. Can you check logs too? 
[12:42:16] yup [12:42:51] (03CR) 10jenkins-bot: Upload HD logos for multiple projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477136 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:42:53] (03CR) 10jenkins-bot: Cleaning of wgLogoHD [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477138 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:43:21] Lucas_WMDE: For me looks good [12:43:29] I don’t see anything either, going ahead [12:44:20] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:477138|Cleaning of wgLogoHD (T150618)]] (duration: 00m 46s) [12:44:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:44:23] T150618: Provide HD logos for all projects - https://phabricator.wikimedia.org/T150618 [12:44:32] okay, onto the third change [12:44:42] (03PS5) 10Lucas Werkmeister (WMDE): Use HD logos in InitialiseSettings.php for multiple projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477139 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:45:16] (03CR) 10Lucas Werkmeister (WMDE): [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477139 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:46:28] (03Merged) 10jenkins-bot: Use HD logos in InitialiseSettings.php for multiple projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477139 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:47:01] Zoranzoki21: should be on mwdebug1002 now, please test [12:47:17] Lucas_WMDE: Ok.. 
Testing [12:47:46] uh oh https://screenshotscdn.firefoxusercontent.com/images/7686c9a0-8878-4b60-bee0-687d9851ab26.png [12:48:30] Lucas_WMDE: Oh I saw it [12:49:06] Lucas_WMDE: Let`me rollback change [12:49:37] (03PS1) 10Zoranzoki21: Revert "Use HD logos in InitialiseSettings.php for multiple projects" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477255 [12:49:47] ok [12:50:02] (that screenshot will expire in 14 days, apparently :( ) [12:50:47] (03PS2) 10Zoranzoki21: Revert "Use HD logos in InitialiseSettings.php for multiple projects" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477255 (https://phabricator.wikimedia.org/T150618) [12:50:57] i see a bit of missing logo! :P [12:50:59] Lucas_WMDE: I fixed commit message [12:51:04] ok thanks [12:51:18] (03CR) 10Lucas Werkmeister (WMDE): [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477255 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:51:56] (03PS1) 10Volans: administrative: fix Reason's signature [software/spicerack] - 10https://gerrit.wikimedia.org/r/477256 (https://phabricator.wikimedia.org/T205884) [12:52:24] (03Merged) 10jenkins-bot: Revert "Use HD logos in InitialiseSettings.php for multiple projects" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477255 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:52:49] 10Operations, 10SRE-Access-Requests, 10WMDE-Analytics-Engineering, 10Graphite, and 2 others: Requesting access to graphite hosts for addshore - https://phabricator.wikimedia.org/T208750 (10Addshore) 05Open>03Resolved >>! In T208750#4793808, @jijiki wrote: > @Addshore Please ensure that your access on g... [12:52:51] okay, revert deployed to mwdebug1002 [12:53:08] addshore: I don’t need to do a scap-sync now, right? 
[12:53:09] 10Operations, 10Traffic, 10Wikidata, 10wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10Addshore) Poke as it is now 1 or 2 weeks since the last movement here. @BBlack I just cced you on the patch so it appears in your review queue.... [12:53:38] Lucas_WMDE: if you didn't sync it out then in theory no :) [12:53:43] okay [12:54:05] then I think we’re done with patches, and the logos will have to be fixed [12:54:11] but we don’t need to revert the uploads, they should be harmless [12:54:27] * Urbanecm wonders why the logo is cuted... [12:54:55] Lucas_WMDE: You no need to revert uploads [12:55:01] Urbanecm: we have five minutes left, I could try to run the maintenance script… [12:55:14] but I don’t see from https://phabricator.wikimedia.org/T204477 yet why it’s necessary [12:55:32] It's a fishbowl wiki, that means it's detached from SUL [12:55:34] (03CR) 10jenkins-bot: Use HD logos in InitialiseSettings.php for multiple projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477139 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:55:36] (03CR) 10jenkins-bot: Revert "Use HD logos in InitialiseSettings.php for multiple projects" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477255 (https://phabricator.wikimedia.org/T150618) (owner: 10Zoranzoki21) [12:55:50] Lucas_WMDE, and as it's fishbowl wiki, only logged in people can edit [12:55:57] and as there's no account, nobody can login :D [12:56:40] Lucas_WMDE, for the purpose of running the script, the new account should be called "Satdeep Gill" and it should be bureaucrat and sysop [12:56:41] ah, I see [12:57:31] but I’m not comfortable with running this, sorry [12:57:44] too unfamiliar with createAndPromote + SUL [12:57:49] and too little time left for me [12:57:54] Ok then [12:57:56] perhaps addshore wants to do it? 
[12:58:00] or next SWAT [13:00:11] 10Operations, 10DBA, 10Availability (MediaWiki-MultiDC), 10Performance-Team (Radar): Investigate solutions for MySQL connection pooling - https://phabricator.wikimedia.org/T196378 (10jijiki) p:05Triage>03Normal [13:01:08] okay, the window is over, sorry [13:01:12] !log EU SWAT done [13:01:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:22] we got through 7 patches (8 if you count the revert), I’m happy with that [13:02:04] (03PS1) 10Alexandros Kosiaris: mathoid: Remove from scb conftool data [puppet] - 10https://gerrit.wikimedia.org/r/477258 [13:02:05] No problem, the limit is 6 actually :) [13:02:06] (03PS1) 10Alexandros Kosiaris: mathoid: Remove LVS from scb, repo, and users [puppet] - 10https://gerrit.wikimedia.org/r/477259 [13:02:08] (03PS1) 10Alexandros Kosiaris: Remove the mathoid profile [puppet] - 10https://gerrit.wikimedia.org/r/477260 [13:02:15] yes, I’m aware :) [13:02:30] I hope it wasn’t too bad that I didn’t reject the 7th change [13:02:45] You did well, thanks for deploying my changes [13:03:00] 👍 [13:03:03] woo! [13:03:08] thanks, you’re welcome :) [13:04:20] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time [13:05:35] (y) [13:10:41] 10Operations: Upgrade Ganeti clusters to 2.15.2-7+deb9u3 - https://phabricator.wikimedia.org/T210289 (10jijiki) p:05Triage>03Normal [13:11:34] is the job queue limited for any number of operations? i've changed the template and only ~3k out of ~22k pages got updated and nothing happened further for more than 15 hrs... [13:12:16] 10Operations, 10Math, 10Patch-For-Review: Clean up artifacts from LaTeX based math rendering - https://phabricator.wikimedia.org/T195847 (10Physikerwelt) @fgiunchedi, unfortunately, I don't understand your question. From my perspective, the next step is to create a ticket for > Remove LaTeX and texvcv(check... 
[13:13:33] (03PS1) 10Ema: ATS: set cache.cache_responses_to_cookies to 1 [puppet] - 10https://gerrit.wikimedia.org/r/477262 (https://phabricator.wikimedia.org/T209021) [13:21:28] 10Operations, 10ORES, 10Scoring-platform-team, 10Release Pipeline (Blubber): Build blubber file for ORES - https://phabricator.wikimedia.org/T210268 (10jijiki) p:05Triage>03Normal [13:22:15] 10Operations, 10ORES, 10Scoring-platform-team, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Backlog): Blubber should be able to make multi docker files per repo - https://phabricator.wikimedia.org/T210267 (10jijiki) p:05Triage>03Normal [13:22:57] 10Operations, 10ORES, 10Scoring-platform-team (Current): Investigate memory usage of ORES in kubernetes - https://phabricator.wikimedia.org/T210264 (10jijiki) p:05Triage>03Normal [13:23:05] (03CR) 10Alexandros Kosiaris: [C: 032] mathoid: Remove from scb conftool data [puppet] - 10https://gerrit.wikimedia.org/r/477258 (owner: 10Alexandros Kosiaris) [13:24:47] 10Operations, 10Traffic, 10Privacy: Disable WMF-Last-Access cookies for wmfusercontent.org - https://phabricator.wikimedia.org/T210167 (10jijiki) p:05Triage>03Normal [13:25:00] (03PS2) 10Alexandros Kosiaris: mathoid: Remove from scb LVS, repo, and users [puppet] - 10https://gerrit.wikimedia.org/r/477259 [13:25:02] (03PS2) 10Alexandros Kosiaris: Remove the mathoid profile [puppet] - 10https://gerrit.wikimedia.org/r/477260 [13:26:26] 10Operations, 10ops-eqiad, 10DC-Ops: icinga1001 mysterious reboots - https://phabricator.wikimedia.org/T210108 (10jijiki) p:05Triage>03Low @Cmjohnson @cwhite @Dzahn Has the host rebooted mysteriously again? If not, do you think we should close it? 
[13:27:27] 10Operations, 10media-storage: Ingest swift access logs for thumbnail/original analysis - https://phabricator.wikimedia.org/T209810 (10jijiki) p:05Triage>03Normal [13:27:41] 10Operations, 10Traffic: INMARSAT geolocates to the UK, leading to requests going to esams - https://phabricator.wikimedia.org/T209785 (10jijiki) p:05Triage>03Low @Reedy if you disagree with the priority I set, feel free to change it:) [13:31:08] (03CR) 10Gehel: [C: 031] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/477256 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [13:32:58] (03PS2) 10Ema: ATS: set cache_responses_to_cookies to 1 [puppet] - 10https://gerrit.wikimedia.org/r/477262 (https://phabricator.wikimedia.org/T209021) [13:33:55] 10Operations, 10Icinga, 10Scoring-platform-team: Add ahalfaker to ORES-related icinga contacts - https://phabricator.wikimedia.org/T210742 (10jijiki) p:05Triage>03Normal [13:34:32] (03CR) 10Gehel: elasticsearch: cookbook for multi-cluster services rolling restart (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [13:34:41] (03CR) 10Ema: [C: 032] ATS: set cache_responses_to_cookies to 1 [puppet] - 10https://gerrit.wikimedia.org/r/477262 (https://phabricator.wikimedia.org/T209021) (owner: 10Ema) [13:35:08] 10Operations: Filter potentially harmful PostScript commands in Commons upload/thumbor - https://phabricator.wikimedia.org/T210833 (10jijiki) p:05Triage>03Normal [13:35:46] (03PS3) 10Alexandros Kosiaris: mathoid: Remove from scb LVS, repo, and users [puppet] - 10https://gerrit.wikimedia.org/r/477259 [13:35:48] (03PS3) 10Alexandros Kosiaris: Remove the mathoid profile [puppet] - 10https://gerrit.wikimedia.org/r/477260 [13:36:09] (03CR) 10Alexandros Kosiaris: [C: 032] mathoid: Remove from scb LVS, repo, and users [puppet] - 10https://gerrit.wikimedia.org/r/477259 (owner: 10Alexandros Kosiaris) [13:36:19] (03CR) 10Alexandros Kosiaris: 
[C: 032] Remove the mathoid profile [puppet] - 10https://gerrit.wikimedia.org/r/477260 (owner: 10Alexandros Kosiaris) [13:38:35] addshore: that's right, from graphite1004 and graphite2003 [13:40:14] 10Operations, 10Traffic, 10Wikimedia-General-or-Unknown, 10media-storage: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (10jijiki) p:05Triage>03High Should we merge this with T190988 or vice versa? [13:42:28] (03PS1) 10Marostegui: db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477267 (https://phabricator.wikimedia.org/T86338) [13:43:34] 10Operations, 10Puppet, 10ORES, 10Scoring-platform-team, 10Wikimedia-Incident: Logrotate should restart services when more people are around - https://phabricator.wikimedia.org/T210720 (10jijiki) p:05Triage>03Normal @Ladsgroup feel free to mark this as "Resolved" if you feel we don't have other options. [13:44:29] 10Operations, 10Epic, 10cloud-services-team (Kanban): CloudVPS: our ideal future model - https://phabricator.wikimedia.org/T209460 (10aborrero) [13:44:36] 10Operations, 10Cloud-Services, 10netops, 10Patch-For-Review: Renumber cloud-instance-transport1-b-eqiad to public IPs - https://phabricator.wikimedia.org/T207663 (10aborrero) [13:44:40] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477267 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [13:44:42] godog: ack, on both machines! 
[13:44:57] 10Operations, 10netops, 10Patch-For-Review, 10cloud-services-team (Kanban): Renumber cloud-instance-transport1-b-eqiad to public IPs - https://phabricator.wikimedia.org/T207663 (10aborrero) [13:45:13] godog: I'm going to give some of the docs a tiny bit of love along the way [13:45:13] 10Operations, 10netops, 10Patch-For-Review, 10cloud-services-team (Kanban): Renumber cloud-instance-transport1-b-eqiad to public IPs - https://phabricator.wikimedia.org/T207663 (10aborrero) p:05Normal>03Low [13:45:30] PROBLEM - Check systemd state on scb1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [13:45:34] PROBLEM - Check systemd state on scb2006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [13:45:44] PROBLEM - Check systemd state on scb1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [13:45:48] PROBLEM - Check systemd state on scb2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [13:45:48] PROBLEM - Check systemd state on scb1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [13:45:48] PROBLEM - Check systemd state on scb2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [13:45:49] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477267 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [13:46:02] PROBLEM - Check systemd state on scb2005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[13:46:04] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477267 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [13:46:16] PROBLEM - Check systemd state on scb2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [13:46:18] godog: I noticed the docs had different hosts listed, how can one tell which hosts are the "master" graphite servers? [13:46:20] addshore: for sure, that'd be much needed, thanks! feel free to reach out with more questions etc [13:46:48] addshore: ATM from site.pp in puppet [13:46:52] ack! [13:46:58] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1082 T86338 T202167 (duration: 00m 47s) [13:47:00] (03PS4) 10Mathew.onipe: elasticsearch: add new elastic2037-elastic2054 [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) [13:47:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:02] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [13:47:02] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [13:47:04] which also reminds me I need to clean that up too [13:47:23] !log Deploy schema change on db1082 (sanitarium master) with replication, lag will be generated on labs (s5) T86338 T202167 [13:47:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:48:33] (03PS5) 10Mathew.onipe: elasticsearch: add new elastic2037-elastic2054 [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) [13:48:54] godog: right, so I see 1001 and 1004 listed there but i only need to delete things from 1004? it doesnt matter if I leave them on 1001? 
[13:49:35] addshore: yeah 1001 isn't serving traffic atm, a spare host waiting decom [13:49:36] (03CR) 10Mathew.onipe: [C: 031] administrative: fix Reason's signature [software/spicerack] - 10https://gerrit.wikimedia.org/r/477256 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [13:49:53] godog: how can I tell that bit? [13:50:48] hieradata i guess? puppet\hieradata\role\eqiad\graphite\production.yaml [13:51:12] (03CR) 10Volans: [C: 032] administrative: fix Reason's signature [software/spicerack] - 10https://gerrit.wikimedia.org/r/477256 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [13:51:30] RECOVERY - Check systemd state on scb2002 is OK: OK - running: The system is fully operational [13:51:32] addshore: that's for the read path yeah, also hieradata/role/common/cache/text.yaml for the varnish configuration [13:51:51] thanks! [13:51:58] RECOVERY - Check systemd state on scb1001 is OK: OK - running: The system is fully operational [13:51:59] addshore: the write path is in modules/role/manifests/graphite/production.pp in the carbon-c-relay configuration, see metrics get written to 1004 and 2003 [13:52:01] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477268 [13:52:02] RECOVERY - Check systemd state on scb2003 is OK: OK - running: The system is fully operational [13:52:04] RECOVERY - Check systemd state on scb1002 is OK: OK - running: The system is fully operational [13:52:08] RECOVERY - Check systemd state on scb2001 is OK: OK - running: The system is fully operational [13:52:16] RECOVERY - Check systemd state on scb2005 is OK: OK - running: The system is fully operational [13:52:17] (03Merged) 10jenkins-bot: administrative: fix Reason's signature [software/spicerack] - 10https://gerrit.wikimedia.org/r/477256 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [13:53:44] 10Operations, 10ops-codfw, 10Core Platform Team, 10Services (doing), and 2 others: Reshape 
RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 (10Eevans) [13:54:10] RECOVERY - Check systemd state on scb2006 is OK: OK - running: The system is fully operational [13:55:59] RECOVERY - Check systemd state on scb1004 is OK: OK - running: The system is fully operational [13:56:18] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install elastic203[7-9], elastic204[0-9], elastic205[0-4] - https://phabricator.wikimedia.org/T210450 (10Papaul) @Gehel The first 8 servers (elastic2037-elastic2044) are ready. The only thing left is the first puppet run. When running puppet agent on... [13:56:55] (03PS6) 10Mathew.onipe: elasticsearch: add new elastic2037-elastic2054 [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) [13:58:42] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install elastic203[7-9], elastic204[0-9], elastic205[0-4] - https://phabricator.wikimedia.org/T210450 (10Gehel) @Papaul it looks like elastic2037-39 already have entries as `role(elasticsearch::cirrus)` in site.pp and elastic2040-44 don't have any ent... 
[13:59:59] (03PS1) 10Filippo Giunchedi: site: use spare::system for old graphite hosts [puppet] - 10https://gerrit.wikimedia.org/r/477269 (https://phabricator.wikimedia.org/T199321) [14:00:34] (03CR) 10jerkins-bot: [V: 04-1] site: use spare::system for old graphite hosts [puppet] - 10https://gerrit.wikimedia.org/r/477269 (https://phabricator.wikimedia.org/T199321) (owner: 10Filippo Giunchedi) [14:01:00] 10Operations, 10Maps: Cronspam from maps* hosts - https://phabricator.wikimedia.org/T211009 (10jijiki) p:05Triage>03Normal [14:02:14] (03PS2) 10Filippo Giunchedi: site: use spare::system for old graphite hosts [puppet] - 10https://gerrit.wikimedia.org/r/477269 (https://phabricator.wikimedia.org/T199321) [14:04:02] (03PS1) 10Gehel: elasticsearch: add entries in site.pp for new elasticsearch nodes [puppet] - 10https://gerrit.wikimedia.org/r/477272 [14:04:52] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477268 (owner: 10Marostegui) [14:05:43] (03CR) 10Volans: elasticsearch: cookbook for multi-cluster services rolling restart (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [14:05:48] (03PS2) 10Gehel: elasticsearch: add entries in site.pp for new elasticsearch nodes [puppet] - 10https://gerrit.wikimedia.org/r/477272 (https://phabricator.wikimedia.org/T210450) [14:06:01] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477268 (owner: 10Marostegui) [14:06:42] (03PS1) 10Arturo Borrero Gonzalez: openstack: dnsleaks.py: a PTR entry may have several records [puppet] - 10https://gerrit.wikimedia.org/r/477273 (https://phabricator.wikimedia.org/T202886) [14:07:04] (03CR) 10Mathew.onipe: [C: 031] elasticsearch: add entries in site.pp for new elasticsearch nodes [puppet] - 10https://gerrit.wikimedia.org/r/477272 
(https://phabricator.wikimedia.org/T210450) (owner: 10Gehel) [14:07:14] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1082 T86338 T202167 (duration: 00m 46s) [14:07:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:19] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [14:07:19] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [14:07:37] (03CR) 10Gehel: elasticsearch: cookbook for multi-cluster services rolling restart (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [14:08:06] (03CR) 10Gehel: [C: 032] elasticsearch: add entries in site.pp for new elasticsearch nodes [puppet] - 10https://gerrit.wikimedia.org/r/477272 (https://phabricator.wikimedia.org/T210450) (owner: 10Gehel) [14:08:14] (03PS3) 10Gehel: elasticsearch: add entries in site.pp for new elasticsearch nodes [puppet] - 10https://gerrit.wikimedia.org/r/477272 (https://phabricator.wikimedia.org/T210450) [14:09:24] (03CR) 10Filippo Giunchedi: [C: 032] site: use spare::system for old graphite hosts [puppet] - 10https://gerrit.wikimedia.org/r/477269 (https://phabricator.wikimedia.org/T199321) (owner: 10Filippo Giunchedi) [14:09:31] (03PS3) 10Filippo Giunchedi: site: use spare::system for old graphite hosts [puppet] - 10https://gerrit.wikimedia.org/r/477269 (https://phabricator.wikimedia.org/T199321) [14:09:33] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477268 (owner: 10Marostegui) [14:10:57] (03PS2) 10Muehlenhoff: Remove sarin/neodymium from grant/mysql root hosts [puppet] - 10https://gerrit.wikimedia.org/r/466833 [14:12:02] (03PS2) 10Arturo Borrero Gonzalez: openstack: dnsleaks.py: a PTR entry may have several records [puppet] - 10https://gerrit.wikimedia.org/r/477273 
(https://phabricator.wikimedia.org/T202886) [14:12:54] (03CR) 10Arturo Borrero Gonzalez: [C: 032] openstack: dnsleaks.py: a PTR entry may have several records [puppet] - 10https://gerrit.wikimedia.org/r/477273 (https://phabricator.wikimedia.org/T202886) (owner: 10Arturo Borrero Gonzalez) [14:13:45] (03PS1) 10Marostegui: db-eqiad.php: Depool db1113:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477275 (https://phabricator.wikimedia.org/T86338) [14:16:54] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1113:3315 T86338 T202167 (duration: 00m 46s) [14:16:57] !log Deploy schema change on db1113:3315 T86338 T202167 [14:16:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:16:58] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [14:16:59] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [14:17:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:04] (03CR) 10Alex Monk: "This looks okay but remember that if you have one in-addr zone with a PTR for a svc and a PTR for a non-svc, the non-svc will not get clea" [puppet] - 10https://gerrit.wikimedia.org/r/477273 (https://phabricator.wikimedia.org/T202886) (owner: 10Arturo Borrero Gonzalez) [14:18:17] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1113:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477275 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [14:18:20] (03PS4) 10Filippo Giunchedi: site: use spare::system for old graphite hosts [puppet] - 10https://gerrit.wikimedia.org/r/477269 (https://phabricator.wikimedia.org/T199321) [14:20:31] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1113:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477275 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [14:21:39] (03PS7) 10Mathew.onipe: elasticsearch: add new elastic2037-elastic2054 
[puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) [14:21:41] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: add new elastic2037-elastic2054 [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) (owner: 10Mathew.onipe) [14:21:47] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1113:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477275 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [14:22:14] (03PS8) 10Mathew.onipe: elasticsearch: add new elastic2037-elastic2054 [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) [14:24:41] 10Operations, 10Traffic, 10Patch-For-Review: ATS backend-side request-mangling - https://phabricator.wikimedia.org/T209021 (10ema) [14:25:27] (03CR) 10Volans: elasticsearch: cookbook for multi-cluster services rolling restart (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [14:25:46] 10Operations, 10SRE-Access-Requests: Requesting access to deployment for Christoph Jauera (WMDE-Fisch) - https://phabricator.wikimedia.org/T211014 (10WMDE-Fisch) [14:26:19] PROBLEM - carbon-cache@d service on graphite1003 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@d is inactive [14:26:29] PROBLEM - carbon-local-relay service on graphite1003 is CRITICAL: CRITICAL - Expecting active but unit carbon-local-relay is failed [14:26:49] PROBLEM - carbon-cache@b service on graphite1003 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@b is inactive [14:26:49] PROBLEM - carbon-cache@a service on graphite1003 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@a is inactive [14:26:53] PROBLEM - carbon-cache@h service on graphite1003 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@h is inactive [14:27:01] PROBLEM - graphite.wikimedia.org on graphite1003 is CRITICAL: HTTP CRITICAL: 
HTTP/1.1 502 Bad Gateway - 398 bytes in 0.001 second response time [14:27:07] PROBLEM - Check systemd state on graphite1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:27:09] PROBLEM - carbon-cache@g service on graphite1003 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@g is inactive [14:27:18] PROBLEM - carbon-cache@e service on graphite1003 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@e is inactive [14:27:19] PROBLEM - carbon-cache@f service on graphite1003 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@f is inactive [14:27:19] PROBLEM - carbon-cache@c service on graphite1003 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@c is inactive [14:27:32] godog: any WIP on graphite1003? [14:29:00] * volans having a look [14:29:02] volans: yes, thanks I'll silence [14:29:06] ah ok [14:29:08] thx [14:30:03] moved to spare but icinga didn't get the memo yet [14:31:04] eheheh [14:32:48] (03PS9) 10Mathew.onipe: elasticsearch: add new elastic2037-elastic2054 [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) [14:33:09] 10Operations, 10SRE-Access-Requests: Requesting access to deployment for Christoph Jauera (WMDE-Fisch) - https://phabricator.wikimedia.org/T211014 (10WMDE-Fisch) Pinging @Tobi_WMDE_SW for engineering manager approval. 
[14:33:20] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1113:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477278 [14:39:51] (03PS1) 10Mathew.onipe: spicerack: add dateutil dependency [puppet] - 10https://gerrit.wikimedia.org/r/477281 (https://phabricator.wikimedia.org/T207919) [14:40:51] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install elastic203[7-9], elastic204[0-9], elastic205[0-4] - https://phabricator.wikimedia.org/T210450 (10Papaul) @Gehel all first 8 servers are all yours [14:42:50] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install elastic203[7-9], elastic204[0-9], elastic205[0-4] - https://phabricator.wikimedia.org/T210450 (10Papaul) [14:43:13] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/477281 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [14:43:54] (03CR) 10BBlack: [C: 04-1] "Mostly looks good, but this version of the zonefile should have ns[012].wikimedia.org as the nameservers on the NS line. 
Outside parties " [dns] - 10https://gerrit.wikimedia.org/r/473543 (https://phabricator.wikimedia.org/T99531) (owner: 10Addshore) [14:45:54] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1113:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477278 (owner: 10Marostegui) [14:47:28] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1113:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477278 (owner: 10Marostegui) [14:48:50] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1113:3315 T86338 T202167 (duration: 00m 47s) [14:48:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:55] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [14:48:56] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [14:50:24] (03PS1) 10Marostegui: db-eqiad.php: Depool db1100 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477282 (https://phabricator.wikimedia.org/T86338) [14:50:35] (03PS1) 10Muehlenhoff: Create a role for the initial installation [puppet] - 10https://gerrit.wikimedia.org/r/477283 [14:51:42] (03PS28) 10Mathew.onipe: elasticsearch: cookbook for multi-cluster services rolling restart [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) [14:51:45] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1100 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477282 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [14:52:54] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1100 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477282 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [14:53:22] (03CR) 10Mathew.onipe: elasticsearch: cookbook for multi-cluster services rolling restart (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) 
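The database entries in this stretch of the log (db1082, db1113:3315, db1100) repeat one cycle per replica: depool the host in wmf-config/db-eqiad.php, deploy the schema change on it, then revert the depool commit to repool it. The following is only a minimal sketch of that cycle, not the tooling actually used here; the Pool class, its method names and the apply_change callback are hypothetical stand-ins invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical stand-in for the pooled-replica list in db-eqiad.php."""
    pooled: set
    log: list = field(default_factory=list)

    def depool(self, host):
        # Stop sending traffic to the host before touching its schema.
        self.pooled.discard(host)
        self.log.append(f"Depool {host}")

    def repool(self, host):
        # Equivalent of reverting the depool commit and syncing it out.
        self.pooled.add(host)
        self.log.append(f"Repool {host}")


def rolling_schema_change(pool, hosts, apply_change):
    """Run apply_change on one depooled host at a time.

    If apply_change raises, the host stays depooled and the error
    propagates, so a half-migrated replica never goes back into rotation.
    """
    for host in hosts:
        pool.depool(host)
        apply_change(host)
        pool.log.append(f"Deploy schema change on {host}")
        pool.repool(host)
    return pool.log
```

Handling one host per iteration keeps every other replica serving traffic while the change runs, which matches the one-at-a-time ordering of the !log entries above.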
[14:54:01] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1100 T86338 T202167 (duration: 00m 48s) [14:54:03] !log Deploy schema change on db1100 T86338 T202167 [14:54:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:06] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [14:54:06] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [14:54:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:17] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1100" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477284 [14:56:16] (03CR) 10Gehel: spicerack: add dateutil dependency (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/477281 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [14:57:20] 10Operations, 10Traffic, 10Patch-For-Review: ATS production-ready as a backend cache layer - https://phabricator.wikimedia.org/T207048 (10ema) [14:57:25] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1113:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477278 (owner: 10Marostegui) [14:57:28] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1100 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477282 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [15:02:20] (03PS10) 10Mathew.onipe: elasticsearch: add new elastic2037-elastic2044 [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) [15:04:56] (03CR) 10Gehel: [C: 04-1] elasticsearch: add new elastic2037-elastic2044 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) (owner: 10Mathew.onipe) [15:05:06] onimisionipe: ^ [15:05:19] (03PS2) 10Mathew.onipe: spicerack: add dateutil dependency [puppet] - 10https://gerrit.wikimedia.org/r/477281 (https://phabricator.wikimedia.org/T207919) [15:07:44] (03CR) 
10Volans: [C: 032] "We're done, thanks! LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [15:11:20] (03PS11) 10Mathew.onipe: elasticsearch: add new elastic2037-elastic2044 [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) [15:11:42] (03CR) 10Mathew.onipe: spicerack: add dateutil dependency (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/477281 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [15:12:18] (03CR) 10Mathew.onipe: elasticsearch: add new elastic2037-elastic2044 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) (owner: 10Mathew.onipe) [15:13:16] (03PS1) 10CDanis: add grafana1001 host in row C, which appears to have more free capacity in Ganeti [dns] - 10https://gerrit.wikimedia.org/r/477286 (https://phabricator.wikimedia.org/T210416) [15:14:02] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 239, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [15:14:26] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1100" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477284 (owner: 10Marostegui) [15:15:27] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1100" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477284 (owner: 10Marostegui) [15:16:34] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1100 T86338 T202167 (duration: 00m 46s) [15:16:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:16:38] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [15:16:39] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [15:18:32] 10Operations, 10Maps, 10SRE-Access-Requests, 
10Discovery-Search (Current work): Add Matt(onimisionipe) to maps-root - https://phabricator.wikimedia.org/T211020 (10Mathew.onipe) [15:18:40] 10Operations, 10Maps, 10SRE-Access-Requests, 10Discovery-Search (Current work): Add Matt(onimisionipe) to maps-root - https://phabricator.wikimedia.org/T211020 (10Mathew.onipe) p:05Triage>03Normal [15:20:34] (03PS1) 10Filippo Giunchedi: site: add new restbase codfw hardware [puppet] - 10https://gerrit.wikimedia.org/r/477288 (https://phabricator.wikimedia.org/T209615) [15:20:37] 10Operations, 10Maps, 10SRE-Access-Requests, 10Discovery-Search (Current work): Add Matt(onimisionipe) to maps-root - https://phabricator.wikimedia.org/T211020 (10Gehel) Note that the `maps-root` group does not exist yet. It is obvious from the name that we want members of that group to have full root acce... [15:20:49] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 241, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [15:21:30] 10Operations, 10Maps: Cronspam from maps* hosts - https://phabricator.wikimedia.org/T211009 (10Gehel) a:03Mathew.onipe [15:21:41] (03PS1) 10Ladsgroup: ores: Use json for result serializer [puppet] - 10https://gerrit.wikimedia.org/r/477289 (https://phabricator.wikimedia.org/T206333) [15:22:24] (03PS12) 10Gehel: elasticsearch: add new elastic2037-elastic2044 [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) (owner: 10Mathew.onipe) [15:22:30] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1100" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477284 (owner: 10Marostegui) [15:22:45] 10Operations, 10Maps: Cronspam from maps* hosts - https://phabricator.wikimedia.org/T211009 (10Mathew.onipe) [15:22:52] 10Operations, 10Maps, 10Discovery-Search (Current work), 10Patch-For-Review: Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) - 
https://phabricator.wikimedia.org/T210940 (10Mathew.onipe) [15:23:24] !log start configuration of elastic2037-2044 (new servers) - T210265 [15:23:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:28] T210265: Setup elasticsearch on new codfw servers - https://phabricator.wikimedia.org/T210265 [15:23:28] (03CR) 10Filippo Giunchedi: [C: 032] site: add new restbase codfw hardware [puppet] - 10https://gerrit.wikimedia.org/r/477288 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [15:23:45] (03CR) 10Volans: "minor nitpick inline, look good otherwise" (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/477286 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [15:23:48] (03PS13) 10Gehel: elasticsearch: add new elastic2037-elastic2044 [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) (owner: 10Mathew.onipe) [15:25:03] (03CR) 10Gehel: [C: 032] elasticsearch: add new elastic2037-elastic2044 [puppet] - 10https://gerrit.wikimedia.org/r/475942 (https://phabricator.wikimedia.org/T210265) (owner: 10Mathew.onipe) [15:25:28] (03PS2) 10CDanis: add grafana1001 host in row C, which appears to have more free capacity in Ganeti [dns] - 10https://gerrit.wikimedia.org/r/477286 (https://phabricator.wikimedia.org/T210416) [15:25:44] (03CR) 10CDanis: add grafana1001 host in row C, which appears to have more free capacity in Ganeti (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/477286 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [15:26:10] (03CR) 10Volans: [C: 031] "LGTM" [dns] - 10https://gerrit.wikimedia.org/r/477286 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [15:26:51] (03CR) 10CDanis: [C: 032] add grafana1001 host in row C, which appears to have more free capacity in Ganeti [dns] - 10https://gerrit.wikimedia.org/r/477286 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [15:27:45] (03CR) 10Filippo Giunchedi: [C: 
032] hieradata: add instances for restbase201[4-8] [puppet] - 10https://gerrit.wikimedia.org/r/477216 (https://phabricator.wikimedia.org/T209615) (owner: 10Filippo Giunchedi) [15:28:47] (03PS2) 10Filippo Giunchedi: hieradata: add instances for restbase201[4-8] [puppet] - 10https://gerrit.wikimedia.org/r/477216 (https://phabricator.wikimedia.org/T209615) [15:35:45] 10Operations, 10ops-codfw, 10netops: codfw row A recable and add QFX - https://phabricator.wikimedia.org/T210447 (10Papaul) [15:40:28] 10Operations, 10ops-codfw, 10netops: codfw row A recable and add QFX - https://phabricator.wikimedia.org/T210447 (10Papaul) [15:42:15] 10Operations, 10ops-codfw: ms-be2047 spontaneous reboots - https://phabricator.wikimedia.org/T209921 (10Papaul) update from Dell Can you clear the log from the IDRAC, boot into the Life Cycle controller and run diagnostics. I need this to provide to my team lead for review. [15:45:12] 10Operations, 10Discovery-Search (Current work): Decommission elastic2001-2024 - https://phabricator.wikimedia.org/T211023 (10Gehel) [15:45:40] 10Operations, 10Maps, 10SRE-Access-Requests, 10Discovery-Search (Current work): Create maps-root group and add Matt(onimisionipe) to maps-root - https://phabricator.wikimedia.org/T211020 (10Mathew.onipe) [15:48:21] (03PS1) 10CDanis: Add DHCP and autoinstall options for grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/477293 (https://phabricator.wikimedia.org/T210416) [15:56:08] (03PS1) 10Mathew.onipe: admin: add create maps-roots and add onimisionipe(Matt) to it [puppet] - 10https://gerrit.wikimedia.org/r/477294 (https://phabricator.wikimedia.org/T211020) [16:01:04] PROBLEM - Check systemd state on restbase2016 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[16:02:05] that's me ^ expected [16:04:46] 10Operations, 10Traffic, 10Wikimedia-General-or-Unknown, 10media-storage: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (10BBlack) They seem different, as T190988 is about faulty uploads (which I presume... [16:08:43] 10Operations: puppet (systemd::service) attempts to start masked units - https://phabricator.wikimedia.org/T211027 (10fgiunchedi) [16:13:14] o/ bblack [16:13:40] when you say it should have the wmf nameservers on the ns line, do you mean lines 12-14 and also line 2? [16:19:03] (03PS2) 10CDanis: Add DHCP and autoinstall options for grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/477293 (https://phabricator.wikimedia.org/T210416) [16:19:44] (03CR) 10jerkins-bot: [V: 04-1] Add DHCP and autoinstall options for grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/477293 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [16:20:32] PROBLEM - cassandra-c SSL 10.192.32.175:7001 on restbase2016 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [16:22:20] PROBLEM - cassandra-c service on restbase2016 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed [16:23:25] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM! 
see inline too, feel free to merge once the extra space is gone" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/477293 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [16:24:15] (03PS3) 10CDanis: Add DHCP and autoinstall options for grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/477293 (https://phabricator.wikimedia.org/T210416) [16:24:55] (03CR) 10CDanis: Add DHCP and autoinstall options for grafana1001 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/477293 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [16:25:38] (03PS4) 10CDanis: Add DHCP and autoinstall options for grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/477293 (https://phabricator.wikimedia.org/T210416) [16:26:12] PROBLEM - restbase endpoints health on restbase2016 is CRITICAL: /en.wikipedia.org/v1/page/title/{title}{/revision} (Get rev by title from storage) timed out before a response was received [16:26:58] (03PS1) 10Mathew.onipe: maps: add maps-roots to maps hieradata [puppet] - 10https://gerrit.wikimedia.org/r/477298 (https://phabricator.wikimedia.org/T211020) [16:27:15] (03CR) 10CDanis: [C: 032] Add DHCP and autoinstall options for grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/477293 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [16:27:16] RECOVERY - restbase endpoints health on restbase2016 is OK: All endpoints are healthy [16:28:27] !log poweroff ms-be2021 for battery replacement - T208269 [16:28:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:36] bblack: in other words, the SOA line as well as the NS lines [16:29:46] PROBLEM - cassandra-a CQL 10.192.32.108:9042 on restbase2016 is CRITICAL: connect to address 10.192.32.108 and port 9042: Connection refused [16:29:51] papaul: ms-be2021 is powering off [16:30:00] (03CR) 10Addshore: "Should the SOA line also be changed to:?" 
[dns] - 10https://gerrit.wikimedia.org/r/473543 (https://phabricator.wikimedia.org/T99531) (owner: 10Addshore) [16:30:20] sorry for the restbase/cassandra spam, silencing [16:31:50] 10Operations, 10monitoring, 10netops, 10Patch-For-Review: Add virtual chassis port status alerting - https://phabricator.wikimedia.org/T201097 (10ayounsi) 05Open>03Resolved a:03ayounsi Done. Runbook at https://wikitech.wikimedia.org/wiki/Network_monitoring#VCP_status [16:35:24] 10Operations, 10ops-codfw, 10Patch-For-Review, 10Services (watching), 10User-fgiunchedi: rack/setup/install restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T209615 (10fgiunchedi) All hosts had their first puppet run done, and restbase2013 is bootstrapping cassandra instances. On the rema... [16:36:04] (03PS2) 10Addshore: Add wikiba.se [dns] - 10https://gerrit.wikimedia.org/r/473543 (https://phabricator.wikimedia.org/T99531) [16:40:00] (03PS3) 10Addshore: Add wikiba.se [dns] - 10https://gerrit.wikimedia.org/r/473543 (https://phabricator.wikimedia.org/T99531) [16:40:48] (03PS2) 10Addshore: Define a new 'Wikibase' log channel to use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474185 (https://phabricator.wikimedia.org/T207850) [16:42:33] (03CR) 10BBlack: [C: 031] "Yeah that looks about right! 
Will poke at this again a little later in the day" [dns] - 10https://gerrit.wikimedia.org/r/473543 (https://phabricator.wikimedia.org/T99531) (owner: 10Addshore) [16:43:14] 10Operations, 10Patch-For-Review, 10User-fgiunchedi: Return graphite100[13] to spares pool (or decom) - https://phabricator.wikimedia.org/T209357 (10colewhite) [16:43:16] 10Operations, 10ops-codfw, 10monitoring: Decom graphite2002 - https://phabricator.wikimedia.org/T200210 (10colewhite) [16:43:27] 10Operations, 10ops-codfw, 10monitoring: Decom graphite2001 - https://phabricator.wikimedia.org/T200209 (10colewhite) [16:44:18] 10Operations, 10Discovery, 10Maps-Sprint, 10Maps (Kartographer), and 2 others: nodejs 6.11 - https://phabricator.wikimedia.org/T170548 (10bd808) [16:44:21] (03PS2) 10Ladsgroup: ores: Accept json for result serializer [puppet] - 10https://gerrit.wikimedia.org/r/477289 (https://phabricator.wikimedia.org/T206333) [16:44:23] (03PS1) 10Ladsgroup: ores: Change result serializer to json [puppet] - 10https://gerrit.wikimedia.org/r/477302 (https://phabricator.wikimedia.org/T206333) [16:44:46] RECOVERY - cassandra-b CQL 10.192.16.83:9042 on restbase2013 is OK: TCP OK - 0.036 second response time on 10.192.16.83 port 9042 [16:46:56] (03PS3) 10Alexandros Kosiaris: ores: Accept json for result serializer [puppet] - 10https://gerrit.wikimedia.org/r/477289 (https://phabricator.wikimedia.org/T206333) (owner: 10Ladsgroup) [16:47:00] (03CR) 10Alexandros Kosiaris: [C: 032] ores: Accept json for result serializer [puppet] - 10https://gerrit.wikimedia.org/r/477289 (https://phabricator.wikimedia.org/T206333) (owner: 10Ladsgroup) [16:48:11] bblack: thanks for looking at the patch, if we get this moving I could probably have the dns switched over as early as tomorrow [16:48:32] not sure if we have to wait for the cert on the wmf side though (right now we don't even serve https on that domain...) 
[16:49:54] RECOVERY - HP RAID on ms-be2021 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK [16:50:10] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T208096 (10Papaul) a:05Papaul>03fgiunchedi @fgiunchedi All looks good on the system. You can resolve the task when finished double checking. [16:59:16] addshore: yeah after some meetings, etc, I can merge it up later today. The certificates will have to wait until the new year before we're ready with a bunch of other dependencies (changes in our infra). But we can do that part all on our side without coordination, and then come back and coordinate the actual switch of server IPs [16:59:40] ack, so no switch over before the certs are in place? [17:00:26] right. Once we have the DNS switched over like this, we can issue the certs (next quarter) without changing the server IPs. Then we can test and validate everything and coordinate switching the IPs over to our infra with a working cert already in place. [17:00:42] (because we use DNS TXT record validation with LE to issue the certs, not in-band over HTTP(S)) [17:09:19] addshore: re-reading, I think I misunderstood the meanings of "switch over" in our exchange above! [17:09:40] PROBLEM - Host ms-be2047 is DOWN: PING CRITICAL - Packet loss = 100% [17:09:41] !log T207377 reboot labstore1006 for upgrades [17:09:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:45] T207377: Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 [17:10:42] addshore: yes, we'll switch DNS hosting ahead of the cert (as in, shortly after the ops/dns change is merged). 
But the IPs provided by our DNS will stay pointing at the existing non-WMF webservers until later after we've configured and deployed certificates, and done some final checking and handoff stuff, during the next quarter. [17:12:54] bblack: ack! [17:22:36] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 (10Bstorm) [17:30:02] 10Operations, 10Analytics, 10Performance-Team, 10Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (10fdans) Analytics needs x-analytics in every request, not only in debugging ones but we don't need to include it in the response headers. W... [17:33:02] (03PS2) 10Elukey: Allow analytics-admins to restart daemons with systemctl [puppet] - 10https://gerrit.wikimedia.org/r/475984 [17:34:13] (03CR) 10EBernhardson: [C: 031] [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:36:28] (03CR) 10EBernhardson: [C: 031] [cirrus] prepare multi-instance services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475749 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:41:22] (03CR) 10EBernhardson: [C: 031] [cirrus] Add temp clusters but still write to the old ones [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475750 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:45:52] 10Operations, 10Analytics, 10Security-Team, 10WMF-Legal, 10Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (10chasemp) I want to acknowledge a few things: - @Legoktm I appreciate that you feel strongly about this - The use of exfat is not any sort of... 
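bblack's point at 17:00:42 is that the certificates are validated through a DNS TXT record rather than over HTTP(S), which is why DNS hosting for wikiba.se can move first while the web server IPs stay where they are. A rough sketch of how the TXT record for an ACME DNS-01 challenge is derived, following the computation in RFC 8555, is below. The token and account thumbprint are made-up placeholder values, and nothing in the log says how WMF's own issuance tooling implements this step.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_record(domain: str, token: str, account_thumbprint: str):
    """Return (record_name, record_value) for an ACME DNS-01 challenge.

    token comes from the CA's challenge object; account_thumbprint is the
    base64url JWK thumbprint of the ACME account key. Both are supplied by
    the caller here; this sketch does not talk to any CA.
    """
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    name, value = dns01_txt_record(
        "wikiba.se", "example-token", "example-thumbprint"
    )
    print(f'{name}. IN TXT "{value}"')
```

Because the CA only needs to read that TXT record from the authoritative nameservers, whoever hosts the zone can obtain a certificate without serving any traffic for the domain, which fits the ordering described above: DNS hosting first, certificates next quarter, server IPs last.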
[17:47:01] (03CR) 10EBernhardson: [C: 031] [cirrus] Start using replica group settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476272 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:55:12] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban, and 2 others: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (10RobH) >>! In T207194#4789624, @Ottomata wrote: > Hm. They are cattle, but it would probably be nice if the whole node doesn't go down... [17:56:48] (03CR) 10EBernhardson: [C: 031] [cirrus] Cleanup transitional states (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476273 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [17:58:56] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban, and 2 others: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (10RobH) Updated from IRC chat with Otto: These should have identical networking vlan setup as the cloudvirts. So we'll have to add the... [18:00:04] gehel and onimisionipe: My dear minions, it's time we take the moon! Just kidding. Time for Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181203T1800). 
[18:00:53] no deployment today :) [18:02:35] (03CR) 10DCausse: [cirrus] Cleanup transitional states (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476273 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [18:05:49] (03PS1) 10CDanis: Fix MAC address for grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/477311 (https://phabricator.wikimedia.org/T210416) [18:06:19] !log bootstrap cassandra-c on restbase2013 - T209615 [18:06:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:06:23] T209615: rack/setup/install restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T209615 [18:06:28] RECOVERY - cassandra-c service on restbase2013 is OK: OK - cassandra-c is active [18:06:36] RECOVERY - cassandra-c SSL 10.192.16.84:7001 on restbase2013 is OK: SSL OK - Certificate restbase2013-c valid until 2020-11-29 09:26:06 +0000 (expires in 726 days) [18:06:52] RECOVERY - Check systemd state on restbase2013 is OK: OK - running: The system is fully operational [18:08:15] !log add elastic2037 to cirrus eqiad (new server) - T210265 [18:08:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:08:18] T210265: Setup elasticsearch on new codfw servers - https://phabricator.wikimedia.org/T210265 [18:09:28] (03CR) 10Arturo Borrero Gonzalez: [C: 032] "> This looks okay but remember that if you have one in-addr zone with" [puppet] - 10https://gerrit.wikimedia.org/r/477273 (https://phabricator.wikimedia.org/T202886) (owner: 10Arturo Borrero Gonzalez) [18:11:21] (03CR) 10Filippo Giunchedi: [C: 032] Fix MAC address for grafana1001 [puppet] - 10https://gerrit.wikimedia.org/r/477311 (https://phabricator.wikimedia.org/T210416) (owner: 10CDanis) [18:16:25] (03PS3) 10Elukey: Allow analytics-admins to restart daemons with systemctl [puppet] - 10https://gerrit.wikimedia.org/r/475984 [18:18:43] (03CR) 10Dzahn: [C: 031] "confirmed with: dig MX wikiba.se @ns.udag.de ; dig wikiba.se @ns.udag.de ;dig 
www.wikiba.se @ns.udag.de" [dns] - 10https://gerrit.wikimedia.org/r/473543 (https://phabricator.wikimedia.org/T99531) (owner: 10Addshore) [18:19:48] (03CR) 10Elukey: [C: 031] Allow pull based rsync between stat & notebook boxes only [puppet] - 10https://gerrit.wikimedia.org/r/476920 (https://phabricator.wikimedia.org/T205157) (owner: 10Ottomata) [18:19:57] (03CR) 10Elukey: [C: 032] Allow analytics-admins to restart daemons with systemctl [puppet] - 10https://gerrit.wikimedia.org/r/475984 (owner: 10Elukey) [18:22:47] (03CR) 10Dzahn: [C: 031] "** meeting result: Approved (not sure about that comma and double check the UID has not been used twice)" [puppet] - 10https://gerrit.wikimedia.org/r/477294 (https://phabricator.wikimedia.org/T211020) (owner: 10Mathew.onipe) [18:23:18] (03CR) 10Dzahn: [C: 031] "s/uid/gid" [puppet] - 10https://gerrit.wikimedia.org/r/477294 (https://phabricator.wikimedia.org/T211020) (owner: 10Mathew.onipe) [18:25:35] (03CR) 10Dzahn: [C: 031] "fwiw, merging this change doesn't give actual access, the follow-up https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/477298/ will" [puppet] - 10https://gerrit.wikimedia.org/r/477294 (https://phabricator.wikimedia.org/T211020) (owner: 10Mathew.onipe) [18:26:54] (03CR) 10Dzahn: [C: 04-1] "a typo. "maps-root" vs. 
"maps-roots" but otherwise good and needs to go after https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/47729" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/477298 (https://phabricator.wikimedia.org/T211020) (owner: 10Mathew.onipe) [18:29:51] (03PS1) 10Gehel: elasticsearch: create base data dir [puppet] - 10https://gerrit.wikimedia.org/r/477314 [18:31:48] (03PS21) 10Paladox: phabricator: Add support for php-fpm in stretch [puppet] - 10https://gerrit.wikimedia.org/r/476985 [18:32:04] (03PS22) 10Paladox: phabricator: Add support for php-fpm in stretch [puppet] - 10https://gerrit.wikimedia.org/r/476985 [18:35:03] (03PS2) 10Mathew.onipe: maps: add maps-roots to maps hieradata [puppet] - 10https://gerrit.wikimedia.org/r/477298 (https://phabricator.wikimedia.org/T211020) [18:38:11] paladox got phabricator running with php-fpm ,PHP7 and using _joe_'s PHP module https://phab.wmflabs.org/ | https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/476985/22/modules/profile/manifests/phabricator/main.pp [18:38:37] PROBLEM - Long running screen/tmux on logstash1006 is CRITICAL: CRIT: Long running SCREEN process. (user: root PID: 31639, 2174115s 1728000s). 
[18:39:17] (03PS1) 10RobH: setting production dns for cloudvirtan100[1-5]] [dns] - 10https://gerrit.wikimedia.org/r/477317 (https://phabricator.wikimedia.org/T207194) [18:39:20] (03PS2) 10Anomie: Avoid putting Message objects in sidebar cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476280 (https://phabricator.wikimedia.org/T210528) [18:39:37] (03CR) 10Anomie: [C: 032] "Deploying "config" change" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476280 (https://phabricator.wikimedia.org/T210528) (owner: 10Anomie) [18:39:41] (03PS1) 10Elukey: admin: correct path for systemctl in analytics-admins rules [puppet] - 10https://gerrit.wikimedia.org/r/477318 [18:39:59] <_joe_> mutante: yeah I need to take a look at parameters, but I didn't get to it today [18:40:02] (03PS23) 10Paladox: phabricator: Add support for php-fpm in stretch [puppet] - 10https://gerrit.wikimedia.org/r/476985 [18:40:03] <_joe_> I'll do it tomorrow [18:40:14] _joe_: :) cool [18:40:17] (03CR) 10Elukey: [C: 032] admin: correct path for systemctl in analytics-admins rules [puppet] - 10https://gerrit.wikimedia.org/r/477318 (owner: 10Elukey) [18:40:45] (03Merged) 10jenkins-bot: Avoid putting Message objects in sidebar cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476280 (https://phabricator.wikimedia.org/T210528) (owner: 10Anomie) [18:40:54] _joe_ seems your vars from https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/476500/ work in labs! [18:41:15] https://phabricator.wikimedia.org/P7880 [18:41:55] <_joe_> paladox: yeah if you want me to take a better look, I'll do it tomorrow morning [18:42:06] (03CR) 10RobH: [C: 032] setting production dns for cloudvirtan100[1-5]] [dns] - 10https://gerrit.wikimedia.org/r/477317 (https://phabricator.wikimedia.org/T207194) (owner: 10RobH) [18:42:07] <_joe_> today I was killed by php-fpm's bugs with logging :/ [18:42:19] _joe_ ok, thanks! 
[18:42:21] !log anomie@deploy1001 Synchronized wmf-config/CommonSettings.php: Updating SkinBuildSidebar hook function for T210528 (duration: 00m 47s) [18:42:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:42:26] T210528: PHP/HHVM serialization incompatibility in some situations when using Serializable - https://phabricator.wikimedia.org/T210528 [18:48:07] (03PS2) 10GTirloni: Remove labstore::monitoring::nfsd [puppet] - 10https://gerrit.wikimedia.org/r/477234 (owner: 10Muehlenhoff) [18:49:23] (03PS6) 10MarcoAurelio: Close chairwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/443585 (https://phabricator.wikimedia.org/T184961) [18:50:14] (03CR) 10GTirloni: [C: 032] Remove labstore::monitoring::nfsd [puppet] - 10https://gerrit.wikimedia.org/r/477234 (owner: 10Muehlenhoff) [18:51:24] (03CR) 10jenkins-bot: Avoid putting Message objects in sidebar cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476280 (https://phabricator.wikimedia.org/T210528) (owner: 10Anomie) [18:52:12] (03CR) 10Cwhite: [C: 032] hiera: add cluster definition to spare role [puppet] - 10https://gerrit.wikimedia.org/r/476396 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [18:52:19] (03PS4) 10Cwhite: hiera: add cluster definition to spare role [puppet] - 10https://gerrit.wikimedia.org/r/476396 (https://phabricator.wikimedia.org/T210486) [18:52:53] (03PS1) 10RobH: adding ipv6 for cloudvirtan hosts [dns] - 10https://gerrit.wikimedia.org/r/477322 (https://phabricator.wikimedia.org/T207194) [18:59:36] I'll SWAT given that I'm half the window. 
[19:00:00] (03CR) 10RobH: [C: 032] adding ipv6 for cloudvirtan hosts [dns] - 10https://gerrit.wikimedia.org/r/477322 (https://phabricator.wikimedia.org/T207194) (owner: 10RobH) [19:00:04] Deploy window Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181203T1900) [19:00:04] dcausse, niedzielski, and James_F: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:14] o/ [19:00:20] niedzielski: Kk, merging now. [19:00:29] o/ [19:00:34] thanks James_F ! [19:00:41] (03PS9) 10Jforrester: [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [19:00:49] (03CR) 10Jforrester: [C: 032] "SWATage." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [19:01:37] James_F: mind if I add a (slightly belated) patch? [19:01:43] RoanKattouw: Go for it. [19:01:52] (03Merged) 10jenkins-bot: [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [19:02:10] dcausse: For 475747, presumably I need to sync CommonSettings first and then CirrusSearch-production? [19:02:42] (Ideally it should have been multiple patches so I don't have to worry. ;-)) [19:02:44] James_F: order is not important but I'd prefer to do a quick check on mwdebug1002 if you don't mind [19:02:53] Oh, of course. 
[19:03:19] 10Operations, 10ops-codfw, 10Core Platform Team, 10Services (doing), and 2 others: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 (10Eevans) [19:03:47] James_F: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Echo/+/477325 ; will add to the wiki page [19:03:51] dcausse: live on mwdebug1002 too. [19:03:56] thanks, checking [19:04:02] RoanKattouw: Ta. [19:04:38] (03CR) 10jenkins-bot: [cirrus] Allow configuration arrays in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475747 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [19:04:39] niedzielski: The MF patch has failed CI for the hhvm task :-( [19:06:02] Oh, just the normal npm flakiness. [19:06:21] yeah, i think npm install failed [19:06:29] James_F: would you give it another go? [19:06:38] Once it reports back I'll try again. [19:06:41] James_F: sounds good, you can ship the files in any order you want [19:06:51] dcausse: OK, going live now. 
[19:08:06] (03PS9) 10Jforrester: [cirrus] switch to explicit config in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [19:08:08] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT T210381 I2ae162f5 Part I (duration: 00m 46s) [19:08:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:08:18] T210381: Update mw-config to use the psi&omega elastic clusters in codfw - https://phabricator.wikimedia.org/T210381 [19:09:14] !log jforrester@deploy1001 Synchronized wmf-config/CirrusSearch-production.php: SWAT T210381 I2ae162f5 Part II (duration: 00m 47s) [19:09:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:09:20] (03CR) 10Jforrester: [C: 032] [cirrus] switch to explicit config in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [19:11:02] (03Merged) 10jenkins-bot: [cirrus] switch to explicit config in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [19:11:30] dcausse: Second patch live on mwdebug1002. [19:11:35] checking, thanks [19:12:26] 10Operations, 10ops-codfw, 10netops: codfw row A recable and add QFX - https://phabricator.wikimedia.org/T210447 (10ayounsi) [19:12:29] (03PS2) 10Jforrester: Make it possible to configure enhanced RC/Watchlist default state per wiki, Part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475804 [19:12:36] PROBLEM - Host mw2156.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [19:14:06] James_F: all good to me [19:14:15] Excellent. Deploying now. 
[19:14:40] 10Operations, 10Gadgets, 10MediaWiki-Cache, 10MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), and 4 others: Mcrouter periodically reports soft TKOs for mc[1,2]035 leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) >>! In T203786#4792584, @aaron wrote: > To clarify, the $use... [19:15:17] (03CR) 10Jforrester: [C: 032] Make it possible to configure enhanced RC/Watchlist default state per wiki, Part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475804 (owner: 10Jforrester) [19:15:32] niedzielski: It's still not even failed yet. :_( [19:15:36] !log jforrester@deploy1001 Synchronized wmf-config/ProductionServices.php: SWAT T210381 I73c7596818b Actual config (duration: 00m 46s) [19:15:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:42] T210381: Update mw-config to use the psi&omega elastic clusters in codfw - https://phabricator.wikimedia.org/T210381 [19:15:50] dcausse: Live everywhere. Thank you for flying the SWATage skies. [19:16:03] James_F: great, thanks for the deploy! :) [19:16:14] Any time. [19:16:16] (03Merged) 10jenkins-bot: Make it possible to configure enhanced RC/Watchlist default state per wiki, Part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475804 (owner: 10Jforrester) [19:16:49] James_F: any reason I shouldn't kill the Jenkins job? [19:16:54] https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-php71-docker/1122/ [19:17:13] actually, looks like it's wrapping up now [19:17:17] niedzielski: Won't help much. [19:17:20] Yeah, that. 
[19:17:47] (03PS1) 10RobH: setting cloudvirtan install params [puppet] - 10https://gerrit.wikimedia.org/r/477327 (https://phabricator.wikimedia.org/T207194) [19:17:52] (03CR) 10jenkins-bot: [cirrus] switch to explicit config in production services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475748 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse) [19:17:54] (03CR) 10jenkins-bot: Make it possible to configure enhanced RC/Watchlist default state per wiki, Part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475804 (owner: 10Jforrester) [19:18:16] (03PS2) 10RobH: setting cloudvirtan install params [puppet] - 10https://gerrit.wikimedia.org/r/477327 (https://phabricator.wikimedia.org/T207194) [19:18:18] (03CR) 10Mathew.onipe: maps: add maps-roots to maps hieradata (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/477298 (https://phabricator.wikimedia.org/T211020) (owner: 10Mathew.onipe) [19:18:58] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Errors trying to fetch RDF from Wikidata - https://phabricator.wikimedia.org/T207718 (10Imarlier) @BBlack @ema Couple of questions for you about Nginx: - Do we have nginx configured to handle a specific number of requests on a given wo... [19:19:02] RoanKattouw: On debug1002. Please test. [19:24:22] Got an HTTP 500, looking at logstash for why [19:24:56] Looks like it was just a timeout [19:25:09] OK, going live. 
[19:25:12] Tried it again and it loaded quickly, and the bug is fixed [19:25:58] !log jforrester@deploy1001 Synchronized php-1.33.0-wmf.6/extensions/Echo/modules/styles/mw.echo.ui.PaginationWidget.less: SWAT T210487 I914b94515 (duration: 00m 47s) [19:26:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:02] T210487: Navigation controls are out of place in the Notification page - https://phabricator.wikimedia.org/T210487 [19:26:09] (03PS3) 10RobH: setting cloudvirtan install params [puppet] - 10https://gerrit.wikimedia.org/r/477327 (https://phabricator.wikimedia.org/T207194) [19:26:57] (03PS2) 10Jforrester: Make it possible to configure enhanced RC/Watchlist default state per wiki, Part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475805 [19:27:03] (03CR) 10Jforrester: [C: 032] Make it possible to configure enhanced RC/Watchlist default state per wiki, Part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475805 (owner: 10Jforrester) [19:27:32] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT Ic2787309e59e IS part of setting enhanced RC (duration: 00m 47s) [19:27:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:27:45] (03PS4) 10RobH: setting cloudvirtan install params [puppet] - 10https://gerrit.wikimedia.org/r/477327 (https://phabricator.wikimedia.org/T207194) [19:28:35] (03Merged) 10jenkins-bot: Make it possible to configure enhanced RC/Watchlist default state per wiki, Part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475805 (owner: 10Jforrester) [19:28:37] (03CR) 10RobH: [C: 032] setting cloudvirtan install params [puppet] - 10https://gerrit.wikimedia.org/r/477327 (https://phabricator.wikimedia.org/T207194) (owner: 10RobH) [19:30:09] (03PS2) 10Jforrester: [Beta Cluster] Make enhanced RC the default on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475806 [19:30:14] (03CR) 10Jforrester: [C: 032] [Beta Cluster] 
Make enhanced RC the default on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475806 (owner: 10Jforrester) [19:30:34] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT I3b906e8b1 CS part of setting enhanced RC (duration: 00m 46s) [19:30:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:31:07] (03CR) 10jenkins-bot: Make it possible to configure enhanced RC/Watchlist default state per wiki, Part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475805 (owner: 10Jforrester) [19:31:17] (03Merged) 10jenkins-bot: [Beta Cluster] Make enhanced RC the default on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475806 (owner: 10Jforrester) [19:31:30] (03CR) 10jenkins-bot: [Beta Cluster] Make enhanced RC the default on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475806 (owner: 10Jforrester) [19:31:30] niedzielski: Aha, finally merged. [19:31:37] \o/ [19:32:06] niedzielski: Live on mwdebug1002. [19:32:35] James_F: i'm testing it now. i should have feedback pretty quick [19:33:45] RECOVERY - Host mw2156.mgmt is UP: PING OK - Packet loss = 0%, RTA = 39.00 ms [19:35:52] eek, this should not cause this error: [19:35:52] Request from 73.181.4.128 via cp1077 cp1077, Varnish XID 530433812 [19:35:53] Error: 500, Internal Server Error at Mon, 03 Dec 2018 19:35:20 GMT [19:36:41] that error aside, which i can no longer reproduce, it seems to be working. let me check out logstash [19:37:28] niedzielski: I've seen that in testing generally. Don't think it's related. [19:38:02] i see info logs in logstash but not the error :/ [19:38:16] James_F: thanks, looks good to me. [19:38:19] It's coming from cp* not mw* so… [19:38:22] OK, syncing. 
[19:39:12] !log jforrester@deploy1001 Synchronized php-1.33.0-wmf.6/extensions/MobileFrontend/resources/mobile.toc/TableOfContents.js: SWAT T210869 Fix Table of contents rendering (duration: 00m 47s) [19:39:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:39:16] T210869: [SWAT] Regression: Table of contents is empty - https://phabricator.wikimedia.org/T210869 [19:39:33] OK, that's SWAT done. Conch free. [19:40:12] thank you James_F ! [19:40:24] Any time. :-) [19:42:43] i see the changes elsewhere now [19:44:16] 10Operations, 10JADE, 10TechCom, 10Epic, and 3 others: Deploy JADE extension to production - https://phabricator.wikimedia.org/T183381 (10awight) [19:47:51] Hi there -- I'm not authorized to acknowledge alerts in Icinga, so I can't make this stop: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=icinga1001&service=https%3A%2F%2Fgrafana.wikimedia.org%2Fdashboard%2Fdb%2Fwebpagereplay-desktop-alerts+grafana+alert [19:48:24] It can be acknowledged, with the note "Fundraising banner has been re-enabled after not showing during the weekend: http://wpt.wmftest.org/video/compare.php?tests=181203_Y3_G4,181203_3Y_DG" [19:51:50] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Errors trying to fetch RDF from Wikidata - https://phabricator.wikimedia.org/T207718 (10Imarlier) @Smalyshev Another thought: why not just disable pooling, and have the client close each connection after each request? [19:52:27] marlier: ok, doing [19:53:00] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Errors trying to fetch RDF from Wikidata - https://phabricator.wikimedia.org/T207718 (10Smalyshev) @lmarlier wouldn't that be slower? But I could try that too I guess. [19:53:09] herron: thanks. 
Would if I could :-/ [19:55:38] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Errors trying to fetch RDF from Wikidata - https://phabricator.wikimedia.org/T207718 (10Imarlier) @Smalyshev Yes, it would be slower, but it would also be diagnostic -- if persistent connections are disabled and the errors stop, we can... [19:56:18] (03PS1) 10RobH: fix cloudvirtan100* entries in dhcp [puppet] - 10https://gerrit.wikimedia.org/r/477336 [19:56:30] (03PS2) 10RobH: fix cloudvirtan100* entries in dhcp [puppet] - 10https://gerrit.wikimedia.org/r/477336 [19:56:44] (03CR) 10RobH: [C: 032] fix cloudvirtan100* entries in dhcp [puppet] - 10https://gerrit.wikimedia.org/r/477336 (owner: 10RobH) [19:57:37] marlier: ok done [19:58:06] !log ppchelko@deploy1001 Started deploy [changeprop/deploy@867c571]: TEMP: stop production of revision-scor events for schema change [19:58:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:59:19] !log ppchelko@deploy1001 Finished deploy [changeprop/deploy@867c571]: TEMP: stop production of revision-scor events for schema change (duration: 01m 13s) [19:59:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:07] marlier: I was also looking through the configs too to see about your icinga access. 
I don’t see you in there but if you wanted to spin up an access request task we could review/discuss at the next sre meeting [20:02:44] Depends how annoying it is to ack alerts for me about 4 times a year :-) [20:03:31] haha fair point [20:05:15] !log push firewall change to pfw3-eqiad - T211028 [20:05:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:05:22] !log onimisionipe@deploy1001 Started deploy [wdqs/wdqs@0c94e5f]: New GUI and updater build [20:05:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:07:20] 10Operations, 10Analytics, 10Security-Team, 10WMF-Legal, 10Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (10JBennett) > 2) With respect to the WMF charter and the values and manifestation thereof, it seems the exception process and/or the bar for ea... [20:08:44] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban, 10User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (10RobH) [20:09:00] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban, 10User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (10RobH) Ok, I have these booting into the installer, but it dislikes something about the new recipe I made for them. I'm troubleshoot... 
[20:14:53] !log onimisionipe@deploy1001 Finished deploy [wdqs/wdqs@0c94e5f]: New GUI and updater build (duration: 09m 31s) [20:14:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:20] (03CR) 10Cwhite: [C: 032] initial commit (0310 comments) [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/471298 (https://phabricator.wikimedia.org/T208066) (owner: 10Cwhite) [20:19:25] !log ppchelko@deploy1001 Started deploy [changeprop/deploy@7470c85]: Start emitting revision-score events with new schema [20:19:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:39] !log ppchelko@deploy1001 Finished deploy [changeprop/deploy@7470c85]: Start emitting revision-score events with new schema (duration: 01m 13s) [20:20:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:28:47] (03PS2) 10Gehel: Enable RDF dumps everywhere [puppet] - 10https://gerrit.wikimedia.org/r/477111 (owner: 10Smalyshev) [20:30:15] (03CR) 10Gehel: [C: 032] Enable RDF dumps everywhere [puppet] - 10https://gerrit.wikimedia.org/r/477111 (owner: 10Smalyshev) [20:32:31] (03PS4) 10Dzahn: kibana: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/476916 [20:33:13] RECOVERY - cassandra-c CQL 10.192.16.84:9042 on restbase2013 is OK: TCP OK - 0.036 second response time on 10.192.16.84 port 9042 [20:48:04] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Errors trying to fetch RDF from Wikidata - https://phabricator.wikimedia.org/T207718 (10Imarlier) I've been running this in a tmux session on a few of the wdqs servers: `while :; do DSTAMP=$(date); CW=$(sudo netstat -anet | grep 208.8... [20:48:43] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Errors trying to fetch RDF from Wikidata - https://phabricator.wikimedia.org/T207718 (10Imarlier) @Smalyshev Guessing this should go back to you for followup? 
[20:49:39] 10Operations, 10ops-codfw, 10Core Platform Team, 10Services (doing), and 2 others: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 (10Eevans) [20:50:03] (03PS1) 10CDanis: Change cdanis shell to zsh, and add a bunch of dotfiles [puppet] - 10https://gerrit.wikimedia.org/r/477352 [20:50:16] 10Operations, 10ops-codfw, 10DBA, 10decommission, 10Patch-For-Review: Decommission parsercache hosts: pc2004 pc2005 pc2006 - https://phabricator.wikimedia.org/T209858 (10RobH) [20:51:00] (03CR) 10CDanis: [C: 032] Change cdanis shell to zsh, and add a bunch of dotfiles [puppet] - 10https://gerrit.wikimedia.org/r/477352 (owner: 10CDanis) [20:51:41] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Errors trying to fetch RDF from Wikidata - https://phabricator.wikimedia.org/T207718 (10Smalyshev) Thanks, I'll try to play with the connection pooling and see what happens and report here. [20:55:59] PROBLEM - puppet last run on mw2170 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 2 minutes ago with 4 failures. Failed resources (up to 3 shown): File[/home/cdanis/.gitconfig],File[/home/cdanis/.tmux.conf],File[/home/cdanis/.zshenv],File[/home/cdanis/.zshrc] [20:56:13] aw jeez [20:56:21] 10Operations, 10ops-codfw, 10DBA, 10decommission, 10Patch-For-Review: Decommission parsercache hosts: pc2004 pc2005 pc2006 - https://phabricator.wikimedia.org/T209858 (10ops-monitoring-bot) wmf-decommission-host was executed by robh for pc2004.codfw.wmnet and performed the following actions: - Revoked Pu... [20:56:34] 10Operations, 10ops-codfw, 10DBA, 10decommission, 10Patch-For-Review: Decommission parsercache hosts: pc2004 pc2005 pc2006 - https://phabricator.wikimedia.org/T209858 (10ops-monitoring-bot) wmf-decommission-host was executed by robh for pc2005.codfw.wmnet and performed the following actions: - Revoked Pu... 
[20:56:45] 10Operations, 10ops-codfw, 10DBA, 10decommission, 10Patch-For-Review: Decommission parsercache hosts: pc2004 pc2005 pc2006 - https://phabricator.wikimedia.org/T209858 (10ops-monitoring-bot) wmf-decommission-host was executed by robh for pc2006.codfw.wmnet and performed the following actions: - Revoked Pu... [20:57:30] (03PS1) 10CDanis: Revert "Change cdanis shell to zsh, and add a bunch of dotfiles" Seems to make puppet fail on some hosts. [puppet] - 10https://gerrit.wikimedia.org/r/477356 [20:58:06] (03CR) 10jerkins-bot: [V: 04-1] Revert "Change cdanis shell to zsh, and add a bunch of dotfiles" Seems to make puppet fail on some hosts. [puppet] - 10https://gerrit.wikimedia.org/r/477356 (owner: 10CDanis) [20:58:51] (03PS2) 10CDanis: Revert "Change cdanis shell to zsh, and add a bunch of dotfiles" [puppet] - 10https://gerrit.wikimedia.org/r/477356 [20:59:04] cdanis: zsh not installed everywhere? [20:59:18] I was told it was part of the base install [20:59:22] cdanis i think it may resolve on it's own. [20:59:49] what was the puppet error? [21:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: I, the Bot under the Fountain, allow thee, The Deployer, to do Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181203T2100). [21:00:07] PROBLEM - puppet last run on db2060 is CRITICAL: CRITICAL: Puppet has 5 failures. Last run 7 minutes ago with 5 failures. 
Failed resources (up to 3 shown): File[/home/cdanis/.gitconfig],File[/home/cdanis/.screenrc],File[/home/cdanis/.tmux.conf],File[/home/cdanis/.zshenv] [21:00:12] !log bootstrapping restbase2014-a -- T210843 [21:00:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:00:16] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 [21:00:43] cdanis: maybe worth trying to run puppet a second time and see if that changes it [21:00:59] yeah, I'm looking at mw2170, the first failure [21:01:01] if it was a dependency thing [21:02:16] 10Operations, 10ops-codfw, 10DBA, 10decommission, 10Patch-For-Review: Decommission parsercache hosts: pc2004 pc2005 pc2006 - https://phabricator.wikimedia.org/T209858 (10RobH) Switch ports for later removal once unracked: pc2004 asw-b-codfw:ge-5/0/35 pc2005 asw-c-codfw:ge-5/0/3 pc2006 asw-d-codfw:ge-... [21:02:25] AIUI modules/base/manifests/standard_packages.pp means that zsh should be everywhere, though [21:02:46] I reran puppet by hand on mw2170 and it worked with no error message [21:03:00] 10Operations, 10ops-codfw, 10DBA, 10decommission, 10Patch-For-Review: Decommission parsercache hosts: pc2004 pc2005 pc2006 - https://phabricator.wikimedia.org/T209858 (10RobH) [21:03:26] aha, yea. 
sounds like it's just about the right order of things [21:03:26] it doesn’t look like the errors relate much to zsh [21:03:40] it happens that stuff works but only after 2 runs [21:03:52] wonder if /home/cdanis doesn’t exist at first and is causing issues [21:03:59] as paladox said we have seen similar before when people changed dot files [21:04:11] hm [21:04:51] picked another host that hadn't seen a puppet run yet at random (mw2150), worked fine there -- but also i logged in, so my homedir definitely existed [21:05:20] alright I guess I will not revert [21:05:31] let’s see… with cumin you could target a host and test where you haven’t logged in yet [21:06:02] do the puppet failure messages make it into logstash or anything? [21:06:19] RECOVERY - puppet last run on mw2170 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [21:06:20] they would be in syslog or /var/log/puppet.log [21:06:36] puppet-agent[38231]: Could not set 'file' on ensure: Error 404 on SERVER: {"message":"Not Found: Could not find file_content modules/admin/home/cdanis/.gitconfig","issue_kind":"RESOURCE_NOT_FOUND"} [21:06:38] not yet, puppet logs are blocked by splitting “sensitive” and “non-sensitive” logs into their own indices with different acls [21:06:44] (03PS1) 10RobH: remove pc200[456] from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/477357 (https://phabricator.wikimedia.org/T209858) [21:06:51] which is on the docket for this quarter goals (this month) [21:07:10] that's very odd mutante [21:07:25] <_joe_> it's not so odd [21:07:36] <_joe_> it's puppet's usual race condition when adding files [21:08:01] you are only making me more scared, _joe_ [21:08:04] (03PS1) 10Bstorm: sonofgridengine: build a correct shadow_masters file [puppet] - 10https://gerrit.wikimedia.org/r/477358 (https://phabricator.wikimedia.org/T200557) [21:08:08] (03PS1) 10RobH: decom pc200[456] production dns entries [dns] - 10https://gerrit.wikimedia.org/r/477359 
(https://phabricator.wikimedia.org/T209858) [21:08:17] (03CR) 10RobH: [C: 032] remove pc200[456] from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/477357 (https://phabricator.wikimedia.org/T209858) (owner: 10RobH) [21:08:39] <_joe_> cdanis: rightfully so! [21:09:01] (03CR) 10RobH: [C: 032] decom pc200[456] production dns entries [dns] - 10https://gerrit.wikimedia.org/r/477359 (https://phabricator.wikimedia.org/T209858) (owner: 10RobH) [21:09:08] (03CR) 10jerkins-bot: [V: 04-1] sonofgridengine: build a correct shadow_masters file [puppet] - 10https://gerrit.wikimedia.org/r/477358 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm) [21:09:19] yea, it just fixed it on the next run. puppet-agent[41010]: (/Stage[main]/Admin/Admin::Hashuser[cdanis]/Admin::User[cdanis]/File[/home/cdanis/.gitconfig]/ensure) defined content as '{md5} .. ACK [21:09:22] yeah [21:09:23] <_joe_> so we determined that sometimes files can be present on the host where you compile the catalog but not on the one that serves the file data (which is always a puppetmaster frontend) [21:09:37] <_joe_> so if you puppet-merge in eqiad, you should see errors typically in codfw [21:09:47] working as expected, then [21:10:20] (03PS1) 10Cwhite: incorporating first round of feedback [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/477360 (https://phabricator.wikimedia.org/T208066) [21:10:25] !log catrope@deploy1001 Synchronized php-1.33.0-wmf.6/extensions/WikimediaEvents/includes/WikimediaEventsHooks.php: Fix ChangesListFilters validation errors (duration: 00m 49s) [21:10:29] <_joe_> not-working as expected [21:10:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:10:50] <_joe_> let's say it's a known, and typically innocuous, race condition [21:11:19] you dont have to revert.. 
you could use cumin to run puppet on the affected hosts or just wait [21:11:24] 10Operations, 10ops-codfw, 10DBA, 10decommission, 10Patch-For-Review: Decommission parsercache hosts: pc2004 pc2005 pc2006 - https://phabricator.wikimedia.org/T209858 (10RobH) [21:12:00] it's only been two hosts so far, and others have run puppet successfully since [21:13:04] yep, "only 2 out of hundreds" also shows the race isn't super common.. it's just the scale of a change to * [21:13:27] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Errors trying to fetch RDF from Wikidata - https://phabricator.wikimedia.org/T207718 (10Imarlier) a:05Imarlier>03Smalyshev [21:13:33] (03PS2) 10Cwhite: incorporating first round of feedback [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/477360 (https://phabricator.wikimedia.org/T208066) [21:14:00] 10Operations, 10MediaWiki-Debug-Logger, 10Performance-Team: Set up request profiling for PHP 7 - https://phabricator.wikimedia.org/T206152 (10Joe) `php-tideways` is packaged in debian stretch, and although we will need to rebuild it (to add php 7.2 support), it should be pretty straightforward. AIUI, it shou... [21:15:41] RECOVERY - puppet last run on db2060 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:15:57] 10Operations, 10ops-codfw, 10DBA, 10decommission: Decommission parsercache hosts: pc2004 pc2005 pc2006 - https://phabricator.wikimedia.org/T209858 (10RobH) [21:16:32] (03PS5) 10Dzahn: kibana: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/476916 [21:16:35] 10Operations, 10ops-codfw, 10DBA, 10decommission: Decommission parsercache hosts: pc2004 pc2005 pc2006 - https://phabricator.wikimedia.org/T209858 (10RobH) a:05RobH>03Papaul @Papaul, This is now ready for you to take over, wipe disks, and set aside for lease return this month (December 2018). This is... 
[21:17:49] !log push firewall change to pfw3-eqiad - T211028 [21:17:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:18:07] 10Operations, 10ops-codfw, 10DBA, 10decommission: Decommission parsercache hosts: pc2004 pc2005 pc2006 (Dec 2018 lease return) - https://phabricator.wikimedia.org/T209858 (10RobH) [21:18:22] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission, 10cloud-services-team (Kanban): decommission labvirt101[01].eqiad.wmnet (Dec 2018 lease return) - https://phabricator.wikimedia.org/T210735 (10RobH) [21:18:24] (03PS2) 10Bstorm: sonofgridengine: build a correct shadow_masters file [puppet] - 10https://gerrit.wikimedia.org/r/477358 (https://phabricator.wikimedia.org/T200557) [21:22:10] (03PS1) 10Sbisson: Configure a sample welcome survey experiment for kowiki in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477365 (https://phabricator.wikimedia.org/T207717) [21:24:28] !log temp. disabling puppet on logstash1007 and logstash1008 to carefully deploy gerrit:476916 [21:24:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:24:44] (03CR) 10Catrope: [C: 032] Configure a sample welcome survey experiment for kowiki in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477365 (https://phabricator.wikimedia.org/T207717) (owner: 10Sbisson) [21:24:53] (03CR) 10Dzahn: [C: 032] "https://puppet-compiler.wmflabs.org/compiler1002/13823/" [puppet] - 10https://gerrit.wikimedia.org/r/476916 (owner: 10Dzahn) [21:26:24] 10Operations, 10ops-eqiad, 10DBA, 10decommission, 10Patch-For-Review: Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet - https://phabricator.wikimedia.org/T210969 (10ops-monitoring-bot) wmf-decommission-host was executed by robh for pc1004.eqiad.wmnet and performed... 
[21:26:35] Operations, ops-eqiad, DBA, decommission, Patch-For-Review: Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet - https://phabricator.wikimedia.org/T210969 (ops-monitoring-bot) wmf-decommission-host was executed by robh for pc1005.eqiad.wmnet and performed...
[21:26:57] (Merged) jenkins-bot: Configure a sample welcome survey experiment for kowiki in labs [mediawiki-config] - https://gerrit.wikimedia.org/r/477365 (https://phabricator.wikimedia.org/T207717) (owner: Sbisson)
[21:27:01] Operations, ops-eqiad, DBA, decommission, Patch-For-Review: Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet - https://phabricator.wikimedia.org/T210969 (ops-monitoring-bot) wmf-decommission-host was executed by robh for pc1006.eqiad.wmnet and performed...
[21:28:16] (PS1) Cwhite: Add prometheus cluster definition [puppet] - https://gerrit.wikimedia.org/r/477366 (https://phabricator.wikimedia.org/T210486)
[21:28:44] wait what.. i compiled my change and saw the changes i expected and now i merge in production and .. nothing happens? heh
[21:30:46] (CR) jenkins-bot: Configure a sample welcome survey experiment for kowiki in labs [mediawiki-config] - https://gerrit.wikimedia.org/r/477365 (https://phabricator.wikimedia.org/T207717) (owner: Sbisson)
[21:30:52] no, actually noop is right.. only resource names changed
[21:31:15] (PS2) Cwhite: Add prometheus cluster definition [puppet] - https://gerrit.wikimedia.org/r/477366 (https://phabricator.wikimedia.org/T210486)
[21:32:54] (PS1) Cwhite: Add graphite cluster definition [puppet] - https://gerrit.wikimedia.org/r/477367 (https://phabricator.wikimedia.org/T210486)
[21:34:59] Operations, ops-eqiad, DBA, decommission, Patch-For-Review: Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet - https://phabricator.wikimedia.org/T210969 (RobH) Switch ports noted for @cmjohnson to clear their descriptions once they are unracked: pc1004...
[21:36:39] Operations, ops-eqiad, DBA, decommission: Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet - https://phabricator.wikimedia.org/T210969 (RobH)
[21:36:45] (PS3) Bstorm: sonofgridengine: build a correct shadow_masters file [puppet] - https://gerrit.wikimedia.org/r/477358 (https://phabricator.wikimedia.org/T200557)
[21:37:33] Operations, ops-eqiad, DBA, decommission: Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet - https://phabricator.wikimedia.org/T210969 (RobH)
[21:37:38] Operations, ops-eqiad, DBA, decommission: Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet - https://phabricator.wikimedia.org/T210969 (RobH) a:RobH>Cmjohnson
[21:38:25] (CR) Bstorm: [C: 2] sonofgridengine: build a correct shadow_masters file [puppet] - https://gerrit.wikimedia.org/r/477358 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm)
[21:38:59] (CR) Dzahn: [C: 2] "yep, applied one-by-one and there was no visible change at all during puppet runs, only changes to resource names" [puppet] - https://gerrit.wikimedia.org/r/476916 (owner: Dzahn)
[21:42:52] (PS1) Smalyshev: Stop RDF dumps [puppet] - https://gerrit.wikimedia.org/r/477410 (https://phabricator.wikimedia.org/T210044)
[21:43:23] (CR) jerkins-bot: [V: -1] Stop RDF dumps [puppet] - https://gerrit.wikimedia.org/r/477410 (https://phabricator.wikimedia.org/T210044) (owner: Smalyshev)
[21:44:09] (PS2) Smalyshev: Stop RDF dumps [puppet] - https://gerrit.wikimedia.org/r/477410 (https://phabricator.wikimedia.org/T210044)
[21:45:59] (PS3) Dzahn: puppetmaster: convert from apache to httpd module [puppet] - https://gerrit.wikimedia.org/r/451821
[21:47:44] (CR) Gehel: [C: 2] Stop RDF dumps [puppet] - https://gerrit.wikimedia.org/r/477410 (https://phabricator.wikimedia.org/T210044) (owner: Smalyshev)
[21:50:15] i think puppetmasters are the only hosts where we run httpd twice on 2 separate ports. and the httpd class might have an issue with that. not sure how to avoid duplicate declaration issues yet
[21:50:52] otherwise we are getting really close to not using apache module anymore. soon just mediawiki and puppetmaster are left
[21:51:56] (and simplelamp in cloud vps.. uhm)
[21:57:39] (PS1) Bstorm: sonofgridengine: remove pointless dependency [puppet] - https://gerrit.wikimedia.org/r/477413 (https://phabricator.wikimedia.org/T200557)
[21:58:43] (CR) jerkins-bot: [V: -1] sonofgridengine: remove pointless dependency [puppet] - https://gerrit.wikimedia.org/r/477413 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm)
[21:59:47] (PS2) Bstorm: sonofgridengine: remove pointless dependency [puppet] - https://gerrit.wikimedia.org/r/477413 (https://phabricator.wikimedia.org/T200557)
[22:00:07] bawolff and Reedy: (Dis)respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181203T2200). Please do the needful.
[22:04:45] cdanis: FYI given it's in topic, when (not if ;) ) a puppet change causes puppet to fail on many hosts, once fixed, you can run this:
[22:04:48] https://wikitech.wikimedia.org/wiki/Cumin#Run_Puppet_only_if_last_run_failed
[22:04:59] haha ty volans
[22:05:00] not for this case, but thought to mention it as it's in topic :)
[22:05:18] that is nice, cumin is good stuff
[22:09:06] Operations, ops-codfw, Patch-For-Review: rack/setup/install elastic203[7-9], elastic204[0-9], elastic205[0-4] - https://phabricator.wikimedia.org/T210450 (Papaul)
[22:13:35] Operations, ops-eqiad, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): rack/setup/install cloudstore1008 & cloudstore1009 - https://phabricator.wikimedia.org/T193655 (bd808)
[22:14:04] Operations, ops-eqiad, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): rack/setup/install cloudstore1008 & cloudstore1009 - https://phabricator.wikimedia.org/T193655 (bd808)
[22:15:48] (CR) Bstorm: [C: 2] sonofgridengine: remove pointless dependency [puppet] - https://gerrit.wikimedia.org/r/477413 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm)
[22:15:50] Operations, Traffic, Wikimedia-General-or-Unknown, media-storage: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (Danmichaelo) This also affects CropTool (hosted on Tool Labs), I'm getting report...
[22:16:11] (PS1) RobH: update to cloudvirtan100* dns [dns] - https://gerrit.wikimedia.org/r/477414
[22:16:34] (CR) RobH: [C: 2] update to cloudvirtan100* dns [dns] - https://gerrit.wikimedia.org/r/477414 (owner: RobH)
[22:26:36] (PS5) Ladsgroup: redis: Add redis::sentinel class [puppet] - https://gerrit.wikimedia.org/r/476891 (https://phabricator.wikimedia.org/T210580)
[22:26:38] (PS1) Ladsgroup: Add sentinel profile and role [puppet] - https://gerrit.wikimedia.org/r/477415 (https://phabricator.wikimedia.org/T210580)
[22:27:35] (CR) jerkins-bot: [V: -1] Add sentinel profile and role [puppet] - https://gerrit.wikimedia.org/r/477415 (https://phabricator.wikimedia.org/T210580) (owner: Ladsgroup)
[22:31:52] (PS2) Ladsgroup: Add sentinel profile and role [puppet] - https://gerrit.wikimedia.org/r/477415 (https://phabricator.wikimedia.org/T210580)
[22:32:50] (CR) jerkins-bot: [V: -1] Add sentinel profile and role [puppet] - https://gerrit.wikimedia.org/r/477415 (https://phabricator.wikimedia.org/T210580) (owner: Ladsgroup)
[22:33:44] (PS3) Ladsgroup: Add sentinel profile and role [puppet] - https://gerrit.wikimedia.org/r/477415 (https://phabricator.wikimedia.org/T210580)
[22:34:19] We're deploying a security thing
[22:34:28] (CR) jerkins-bot: [V: -1] Add sentinel profile and role [puppet] - https://gerrit.wikimedia.org/r/477415 (https://phabricator.wikimedia.org/T210580) (owner: Ladsgroup)
[22:34:41] Operations, ops-codfw, Patch-For-Review: rack/setup/install elastic203[7-9], elastic204[0-9], elastic205[0-4] - https://phabricator.wikimedia.org/T210450 (Papaul)
[22:35:08] (PS4) Ladsgroup: Add sentinel profile and role [puppet] - https://gerrit.wikimedia.org/r/477415 (https://phabricator.wikimedia.org/T210580)
[22:36:08] (CR) jerkins-bot: [V: -1] Add sentinel profile and role [puppet] - https://gerrit.wikimedia.org/r/477415 (https://phabricator.wikimedia.org/T210580) (owner: Ladsgroup)
[22:38:21] Operations, Wikimedia-Logstash: Procure and provision Logging pipeline hardware in multiple datacenters - https://phabricator.wikimedia.org/T205850 (Papaul)
[22:41:48] (PS5) Ladsgroup: Add sentinel profile and role [puppet] - https://gerrit.wikimedia.org/r/477415 (https://phabricator.wikimedia.org/T210580)
[22:44:23] Operations, Puppet, ORES, Scoring-platform-team, Wikimedia-Incident: Logrotate should restart services when more people are around - https://phabricator.wikimedia.org/T210720 (Ladsgroup) >>! In T210720#4784938, @akosiaris wrote: > So we should make a better job of surfacing and fixing the iss...
[22:45:55] Operations, ops-codfw: rack/setup/install codfw logstash elasticsearch storage servers - https://phabricator.wikimedia.org/T211065 (Papaul)
[22:45:57] (PS1) Gergő Tisza: EventLogging Logstash filter: move useful fields out of event [puppet] - https://gerrit.wikimedia.org/r/477419
[22:46:21] Operations, ops-codfw: rack/setup/install codfw logstash elasticsearch storage servers - https://phabricator.wikimedia.org/T211065 (Papaul) p:Triage>High
[22:47:17] (PS2) Gergő Tisza: EventLogging Logstash filter: move useful fields out of event [puppet] - https://gerrit.wikimedia.org/r/477419 (https://phabricator.wikimedia.org/T205437)
[22:52:30] Operations, ops-codfw: rack/setup/install codfw logstash elasticsearch storage servers - https://phabricator.wikimedia.org/T211065 (Papaul)
[22:56:27] !log sbassett@deploy1001 Synchronized php-1.33.0-wmf.6/extensions/AbuseFilter/includes/api/ApiQueryAbuseLog.php: Deploy security fix for T210329 (duration: 00m 47s)
[22:56:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:36] (CR) Awight: redis: Add redis::sentinel class (1 comment) [puppet] - https://gerrit.wikimedia.org/r/476891 (https://phabricator.wikimedia.org/T210580) (owner: Ladsgroup)
[23:00:58] (CR) Ladsgroup: "This change is ready for review." (4 comments) [puppet] - https://gerrit.wikimedia.org/r/476891 (https://phabricator.wikimedia.org/T210580) (owner: Ladsgroup)
[23:02:28] (PS1) Bstorm: sonofgridengine: file_line corrections [puppet] - https://gerrit.wikimedia.org/r/477423 (https://phabricator.wikimedia.org/T200557)
[23:04:12] (CR) Bstorm: [C: 2] sonofgridengine: file_line corrections [puppet] - https://gerrit.wikimedia.org/r/477423 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm)
[23:08:49] (PS1) BBlack: Revert "cache: stop using nhw admission policy" [puppet] - https://gerrit.wikimedia.org/r/477424 (https://phabricator.wikimedia.org/T144187)
[23:09:28] (CR) BBlack: [C: 2] Revert "cache: stop using nhw admission policy" [puppet] - https://gerrit.wikimedia.org/r/477424 (https://phabricator.wikimedia.org/T144187) (owner: BBlack)
[23:13:47] PROBLEM - Disk space on notebook1004 is CRITICAL: DISK CRITICAL - free space: / 1545 MB (3% inode=94%)
[23:21:18] jouncebot: now
[23:21:18] For the next 0 hour(s) and 38 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181203T2200)
[23:21:21] jouncebot: next
[23:21:21] In 0 hour(s) and 38 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181204T0000)
[23:21:30] sbassett: are you still deploying stuff?
[23:21:36] Operations, Traffic, Wikimedia-General-or-Unknown, media-storage, Patch-For-Review: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (BBlack) I think the patch reverted above was at fault. Wha...
[23:21:51] legoktm: No, we're done. Just backporting in gerrit now.
[23:22:00] sbassett: ok, is it alright if I sync some patches out?
[23:23:12] Yep, that should be fine.
[23:24:26] thanks
[23:25:26] (PS1) Smalyshev: Enable SPARQL logging to a separate file [puppet] - https://gerrit.wikimedia.org/r/477429 (https://phabricator.wikimedia.org/T210044)
[23:26:32] (CR) jerkins-bot: [V: -1] Enable SPARQL logging to a separate file [puppet] - https://gerrit.wikimedia.org/r/477429 (https://phabricator.wikimedia.org/T210044) (owner: Smalyshev)
[23:31:34] (PS2) Smalyshev: Enable SPARQL logging to a separate file [puppet] - https://gerrit.wikimedia.org/r/477429 (https://phabricator.wikimedia.org/T210044)
[23:31:43] (PS1) Bstorm: sonofgridengine: remove yet another bad dependency [puppet] - https://gerrit.wikimedia.org/r/477432
[23:36:53] (CR) Bstorm: [C: 2] sonofgridengine: remove yet another bad dependency [puppet] - https://gerrit.wikimedia.org/r/477432 (owner: Bstorm)
[23:39:08] (PS4) BBlack: Add wikiba.se [dns] - https://gerrit.wikimedia.org/r/473543 (https://phabricator.wikimedia.org/T99531) (owner: Addshore)
[23:39:36] (CR) BBlack: [C: 2] Add wikiba.se [dns] - https://gerrit.wikimedia.org/r/473543 (https://phabricator.wikimedia.org/T99531) (owner: Addshore)
[23:47:49] PROBLEM - HHVM rendering on mwdebug1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:50:05] RECOVERY - HHVM rendering on mwdebug1002 is OK: HTTP OK: HTTP/1.1 200 OK - 80614 bytes in 1.530 second response time
[23:51:36] !log legoktm@deploy1001 Synchronized php-1.33.0-wmf.6/includes/Linker.php: Restore old HTML structure for history section links (T165189 part 1) (duration: 00m 47s)
[23:51:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:51:40] T165189: "→" link to page section on History page can be hard to click, should be larger somehow - https://phabricator.wikimedia.org/T165189
[23:52:20] (PS1) Papaul: DNS: Add mgmt and production DNS for elastic2045 - elastic2054 [dns] - https://gerrit.wikimedia.org/r/477436 (https://phabricator.wikimedia.org/T210450)
[23:52:45] Operations, Traffic, Wikimedia-General-or-Unknown, media-storage, Patch-For-Review: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (BBlack) Open>Resolved a:BBlack I can't reproduc...
[23:53:23] !log legoktm@deploy1001 Synchronized php-1.33.0-wmf.6/resources/src/mediawiki.legacy/: Restore gray coloring for autocomments (T165189 part 2) (duration: 00m 47s)
[23:53:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:54:50] !log legoktm@deploy1001 Synchronized php-1.33.0-wmf.6/tests/: for completeness (duration: 00m 58s)
[23:54:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:59:03] Operations, ops-eqiad, DC-Ops, decommission, cloud-services-team (Kanban): decommission labvirt101[01].eqiad.wmnet (Dec 2018 lease return) - https://phabricator.wikimedia.org/T210735 (RobH)