[00:20:27] RECOVERY - Host cp5001 is UP: PING OK - Packet loss = 0%, RTA = 245.48 ms
[00:20:31] RECOVERY - IPsec on cp2026 is OK: Strongswan OK - 64 ESP OK
[00:20:31] RECOVERY - IPsec on cp1082 is OK: Strongswan OK - 72 ESP OK
[00:20:31] RECOVERY - IPsec on cp2020 is OK: Strongswan OK - 64 ESP OK
[00:20:31] RECOVERY - IPsec on cp1078 is OK: Strongswan OK - 72 ESP OK
[00:20:35] RECOVERY - IPsec on cp1086 is OK: Strongswan OK - 72 ESP OK
[00:20:35] RECOVERY - IPsec on cp2024 is OK: Strongswan OK - 64 ESP OK
[00:20:47] RECOVERY - IPsec on cp2005 is OK: Strongswan OK - 64 ESP OK
[00:20:49] RECOVERY - IPsec on cp1084 is OK: Strongswan OK - 72 ESP OK
[00:20:51] RECOVERY - IPsec on cp1088 is OK: Strongswan OK - 72 ESP OK
[00:20:57] RECOVERY - IPsec on cp2002 is OK: Strongswan OK - 64 ESP OK
[00:20:59] RECOVERY - IPsec on cp2011 is OK: Strongswan OK - 64 ESP OK
[00:20:59] RECOVERY - IPsec on cp2018 is OK: Strongswan OK - 64 ESP OK
[00:21:01] RECOVERY - IPsec on cp1080 is OK: Strongswan OK - 72 ESP OK
[00:21:03] RECOVERY - IPsec on cp1090 is OK: Strongswan OK - 72 ESP OK
[00:21:03] RECOVERY - IPsec on cp2014 is OK: Strongswan OK - 64 ESP OK
[00:21:17] RECOVERY - IPsec on cp2022 is OK: Strongswan OK - 64 ESP OK
[00:21:17] RECOVERY - IPsec on cp1076 is OK: Strongswan OK - 72 ESP OK
[00:21:27] RECOVERY - IPsec on cp2017 is OK: Strongswan OK - 64 ESP OK
[00:21:27] RECOVERY - IPsec on cp2025 is OK: Strongswan OK - 64 ESP OK
[00:21:29] RECOVERY - IPsec on cp2008 is OK: Strongswan OK - 64 ESP OK
[01:22:44] (PS3) Dzahn: update puppet stdlib to 4.16.0 [puppet] - https://gerrit.wikimedia.org/r/474334
[01:23:38] (CR) jerkins-bot: [V: -1] update puppet stdlib to 4.16.0 [puppet] - https://gerrit.wikimedia.org/r/474334 (owner: Dzahn)
[01:35:08] (PS1) Dzahn: upgrade puppet stdlib from 4.16.0 to 4.19.0 [puppet] - https://gerrit.wikimedia.org/r/475258
[01:35:46] (CR) jerkins-bot: [V: -1] upgrade puppet stdlib from 4.16.0 to 4.19.0 [puppet] - https://gerrit.wikimedia.org/r/475258 (owner: Dzahn)
[01:39:02] (PS1) Dzahn: upgrade puppet stdlib from 4.19.0 to 4.22.0 [puppet] - https://gerrit.wikimedia.org/r/475259
[01:39:04] (PS1) Dzahn: upgrade puppet stdlib from 4.22.0 to 4.24.0 [puppet] - https://gerrit.wikimedia.org/r/475260
[01:39:06] (PS1) Dzahn: upgrade puppet stdlib from 4.24.0 to 4.25.1 [puppet] - https://gerrit.wikimedia.org/r/475261
[01:40:20] (PS4) Dzahn: upgrade puppet stdlib from 4.15.0 to 4.16.0 [puppet] - https://gerrit.wikimedia.org/r/474334
[01:40:24] (CR) jerkins-bot: [V: -1] upgrade puppet stdlib from 4.19.0 to 4.22.0 [puppet] - https://gerrit.wikimedia.org/r/475259 (owner: Dzahn)
[01:40:57] (CR) jerkins-bot: [V: -1] upgrade puppet stdlib from 4.22.0 to 4.24.0 [puppet] - https://gerrit.wikimedia.org/r/475260 (owner: Dzahn)
[01:41:23] (CR) jerkins-bot: [V: -1] upgrade puppet stdlib from 4.24.0 to 4.25.1 [puppet] - https://gerrit.wikimedia.org/r/475261 (owner: Dzahn)
[01:41:28] (CR) jerkins-bot: [V: -1] upgrade puppet stdlib from 4.15.0 to 4.16.0 [puppet] - https://gerrit.wikimedia.org/r/474334 (owner: Dzahn)
[01:42:14] (CR) Dzahn: "Thank you Alex! That was very helpful. Based on your comments I added (dependent) patches for the following steps:" [puppet] - https://gerrit.wikimedia.org/r/474334 (owner: Dzahn)
[01:42:47] PROBLEM - MariaDB Slave Lag: m3 on db2078 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 593.35 seconds
[01:43:17] PROBLEM - MariaDB Slave Lag: m3 on db2042 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 618.03 seconds
[01:49:29] PROBLEM - HHVM jobrunner on mw1309 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[01:50:37] RECOVERY - HHVM jobrunner on mw1309 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.003 second response time
[01:50:59] PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 27927 MB (5% inode=99%)
[02:03:33] RECOVERY - Disk space on elastic1025 is OK: DISK OK
[02:18:11] PROBLEM - MD RAID on stat1005 is CRITICAL: connect to address 10.64.53.30 port 5666: Connection refused
[02:19:07] PROBLEM - Check systemd state on stat1005 is CRITICAL: connect to address 10.64.53.30 port 5666: Connection refused
[02:19:07] PROBLEM - DPKG on stat1005 is CRITICAL: connect to address 10.64.53.30 port 5666: Connection refused
[02:19:07] PROBLEM - Disk space on stat1005 is CRITICAL: connect to address 10.64.53.30 port 5666: Connection refused
[02:19:15] PROBLEM - dhclient process on stat1005 is CRITICAL: connect to address 10.64.53.30 port 5666: Connection refused
[02:19:16] PROBLEM - configured eth on stat1005 is CRITICAL: connect to address 10.64.53.30 port 5666: Connection refused
[02:21:13] PROBLEM - puppet last run on stat1005 is CRITICAL: connect to address 10.64.53.30 port 5666: Connection refused
[02:24:08] hmm.. there is "[Engineering] Upcoming move of users from stat1005 to stat1007" but i dont think that should be down yet
[02:24:15] looks on mgmt
[02:24:23] oh, actually i can SSH to it normally
[02:24:40] also has a "do not use this server" banner.. ok
[02:25:15] !log stat1005 - started nagios-nrpe-server
[02:25:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:25:59] RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational
[02:25:59] RECOVERY - DPKG on stat1005 is OK: All packages OK
[02:26:07] RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient
[02:26:07] RECOVERY - configured eth on stat1005 is OK: OK - interfaces up
[02:26:13] RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[02:26:21] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 30 minutes ago with 0 failures
[02:27:02] ok.. off
[02:41:33] PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 25094 MB (5% inode=99%)
[02:59:48] RECOVERY - Disk space on elastic1025 is OK: DISK OK
[03:15:21] RECOVERY - MariaDB Slave Lag: m3 on db2042 is OK: OK slave_sql_lag Replication lag: 3.26 seconds
[03:16:01] RECOVERY - MariaDB Slave Lag: m3 on db2078 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[03:18:59] (CR) Mathew.onipe: elasticsearch_cluster: multi-cluster/multi-instance support (17 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: Mathew.onipe)
[03:24:34] (PS24) Mathew.onipe: elasticsearch_cluster: Added multi-cluster/multi-instance support [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918)
[03:30:29] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 834.43 seconds
[03:37:21] PROBLEM - Check systemd state on wdqs1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[03:38:54] (CR) Mathew.onipe: elasticsearch_cluster: Added multi-cluster/multi-instance support (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: Mathew.onipe)
[03:51:14] (CR) Mathew.onipe: elasticsearch: cookbook for multi-cluster services rolling restart (8 comments) [cookbooks] - https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: Mathew.onipe)
[03:53:31] (PS22) Mathew.onipe: elasticsearch: cookbook for multi-cluster services rolling restart [cookbooks] - https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919)
[03:54:37] RECOVERY - Check systemd state on wdqs1010 is OK: OK - running: The system is fully operational
[03:58:01] PROBLEM - Check systemd state on wdqs1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:04:54] (PS23) Mathew.onipe: elasticsearch: cookbook for multi-cluster services rolling restart [cookbooks] - https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919)
[04:10:47] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 279.91 seconds
[04:24:29] RECOVERY - Check systemd state on wdqs1010 is OK: OK - running: The system is fully operational
[04:27:53] PROBLEM - Check systemd state on wdqs1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:54:19] RECOVERY - Check systemd state on wdqs1010 is OK: OK - running: The system is fully operational
[04:57:45] PROBLEM - Check systemd state on wdqs1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:17:09] RECOVERY - Check systemd state on wdqs1010 is OK: OK - running: The system is fully operational
[06:21:51] PROBLEM - Check systemd state on ms-be2021 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:23:37] Operations, ops-codfw, DBA: Degraded RAID on db2044 - https://phabricator.wikimedia.org/T210049 (Marostegui) Open>Resolved The disk got rebuilt but now it shows predictive failure: ` root@db2044:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380264...
[06:23:45] Operations, DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[06:24:23] Operations, DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui) db2044 got its disk replaced but came up with predictive failure (T210049#4767169)
[06:32:33] RECOVERY - Disk space on stat1005 is OK: DISK OK
[06:44:29] (PS1) Marostegui: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/475279 (https://phabricator.wikimedia.org/T86339)
[06:45:58] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/475279 (https://phabricator.wikimedia.org/T86339) (owner: Marostegui)
[06:47:06] (Merged) jenkins-bot: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/475279 (https://phabricator.wikimedia.org/T86339) (owner: Marostegui)
[06:47:58] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/475281
[06:48:11] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1079 - T86339 (duration: 00m 51s)
[06:48:13] !log Deploy schema change on db1079 (sanitarium master) with replication - T86339
[06:48:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:15] T86339: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339
[06:48:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:53] * elukey thanks marostegui for the daily alter tables
[06:49:09] \o\ |o| /o/
[06:49:43] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/475281 (owner: Marostegui)
[06:50:27] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/475281 (owner: Marostegui)
[06:50:45] (CR) jenkins-bot: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/475279 (https://phabricator.wikimedia.org/T86339) (owner: Marostegui)
[06:50:47] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/475281 (owner: Marostegui)
[06:51:41] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1079 - T86339 (duration: 00m 46s)
[06:51:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:38] (PS1) Marostegui: db-eqiad.php: Depool db1086 [mediawiki-config] - https://gerrit.wikimedia.org/r/475282 (https://phabricator.wikimedia.org/T86339)
[06:53:45] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1086 [mediawiki-config] - https://gerrit.wikimedia.org/r/475282 (https://phabricator.wikimedia.org/T86339) (owner: Marostegui)
[06:54:08] (PS1) Ema: check_long_procs: fix shellcheck warnings [puppet] - https://gerrit.wikimedia.org/r/475283
[06:54:45] (Merged) jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - https://gerrit.wikimedia.org/r/475282 (https://phabricator.wikimedia.org/T86339) (owner: Marostegui)
[06:55:21] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - https://gerrit.wikimedia.org/r/475284
[06:55:47] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1086 - T86339 (duration: 00m 46s)
[06:55:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:55:50] !log Deploy schema change on db1086 - T86339
[06:55:50] T86339: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339
[06:55:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:56:21] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - https://gerrit.wikimedia.org/r/475284 (owner: Marostegui)
[06:57:28] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - https://gerrit.wikimedia.org/r/475284 (owner: Marostegui)
[06:58:27] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1086 - T86339 (duration: 00m 46s)
[06:58:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:03:06] (CR) 20after4: [C: 1] phabricator: ship apache error logs to ELK [puppet] - https://gerrit.wikimedia.org/r/474988 (https://phabricator.wikimedia.org/T141895) (owner: Herron)
[07:03:22] (CR) 20after4: [C: 1] Scap prep should use latest MediaWiki version [mediawiki-config] - https://gerrit.wikimedia.org/r/474971 (owner: Thcipriani)
[07:03:37] (CR) 20after4: [C: 1] Install docker on releases-jenkins [puppet] - https://gerrit.wikimedia.org/r/474825 (https://phabricator.wikimedia.org/T208529) (owner: Thcipriani)
[07:03:48] (CR) jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - https://gerrit.wikimedia.org/r/475282 (https://phabricator.wikimedia.org/T86339) (owner: Marostegui)
[07:03:50] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - https://gerrit.wikimedia.org/r/475284 (owner: Marostegui)
[07:03:58] (CR) 20after4: [C: 1] phabricator: add data types to all parameters [puppet] - https://gerrit.wikimedia.org/r/471325 (owner: Dzahn)
[07:18:41] (PS1) Ema: ATS: add profile::trafficserver::nrpe_monitor_script [puppet] - https://gerrit.wikimedia.org/r/475287 (https://phabricator.wikimedia.org/T204209)
[07:25:43] Operations, ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210121 (ops-monitoring-bot)
[07:40:24] ACKNOWLEDGEMENT - PyBal backends health check on lvs2010 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 Ema lvs2010 is not ready to run pybal yet
[07:40:24] ACKNOWLEDGEMENT - pybal on lvs2010 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal Ema lvs2010 is not ready to run pybal yet
[07:54:02] (PS2) Muehlenhoff: Disable Diamond on Graphite hosts [puppet] - https://gerrit.wikimedia.org/r/474922 (https://phabricator.wikimedia.org/T183454)
[07:57:01] <_joe_> moritzm: are you doing something in codfw?
[07:57:13] <_joe_> we have like 15 appservers with HHVM down
[07:59:19] no, nothing
[07:59:56] down as in service not running or depooled?
[08:03:17] (PS2) Muehlenhoff: Remove Diamond from restbase servers [puppet] - https://gerrit.wikimedia.org/r/474930 (https://phabricator.wikimedia.org/T183454)
[08:03:38] <_joe_> they went down collectively
[08:03:50] <_joe_> like 20 of them, for a few minutes
[08:04:04] <_joe_> I guess some network hiccup at this point, I'm verifying
[08:04:13] ack, seems most likely
[08:05:03] Operations, DBA, JADE, TechCom-RFC, Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (kchapman) TechCom hosted an IRC meeting on this today: * Log: https://tools.wmflabs.org/meetbot/wikimedia...
[08:06:19] (PS1) Marostegui: db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/475290 (https://phabricator.wikimedia.org/T86339)
[08:06:24] <_joe_> moritzm: yes, it seems so
[08:08:47] (PS2) Ema: ATS: add profile::trafficserver::nrpe_monitor_script [puppet] - https://gerrit.wikimedia.org/r/475287 (https://phabricator.wikimedia.org/T204209)
[08:08:49] (PS1) Ema: ATS: add check_trafficserver_log_fifo [puppet] - https://gerrit.wikimedia.org/r/475291 (https://phabricator.wikimedia.org/T204209)
[08:10:36] (CR) Muehlenhoff: [C: 1] "Right, I created that exporter about a year ago via T182196. I think we had a dashboard in the past, I probably used that to track whether" [puppet] - https://gerrit.wikimedia.org/r/475009 (https://phabricator.wikimedia.org/T183454) (owner: Cwhite)
[08:10:49] Operations, DBA, JADE, TechCom-RFC, Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (awight) Points I took away from the TechCom meeting, which should be addressed before deployment: * The s...
[08:12:42] (CR) Muehlenhoff: "Looks good, but let's amend the patch to also disable Diamond for the mw_rc_irc role, with the removal of the IRCDStatsCollector it's good" [puppet] - https://gerrit.wikimedia.org/r/475010 (https://phabricator.wikimedia.org/T183454) (owner: Cwhite)
[08:14:24] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/475290 (https://phabricator.wikimedia.org/T86339) (owner: Marostegui)
[08:15:30] (Merged) jenkins-bot: db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/475290 (https://phabricator.wikimedia.org/T86339) (owner: Marostegui)
[08:16:02] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1094" [mediawiki-config] - https://gerrit.wikimedia.org/r/475292
[08:16:14] (CR) Vgutierrez: [C: 2] certcentral: Provide bare minimum icinga checks [puppet] - https://gerrit.wikimedia.org/r/475049 (https://phabricator.wikimedia.org/T207294) (owner: Vgutierrez)
[08:16:26] (PS9) Vgutierrez: certcentral: Provide bare minimum icinga checks [puppet] - https://gerrit.wikimedia.org/r/475049 (https://phabricator.wikimedia.org/T207294)
[08:16:36] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1094 - T86339 (duration: 00m 49s)
[08:16:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:41] T86339: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339
[08:17:08] !log Deploy schema change on db1094 - T86339
[08:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:29] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1094" [mediawiki-config] - https://gerrit.wikimedia.org/r/475292 (owner: Marostegui)
[08:18:33] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1094" [mediawiki-config] - https://gerrit.wikimedia.org/r/475292 (owner: Marostegui)
[08:19:37] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1094 - T86339 (duration: 00m 46s)
[08:19:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:48] !log Deploy schema change on db1062 (s7 master) - T86339
[08:19:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:36] !log Deploy schema change on s1 codfw master (db2048) with replication - T86339
[08:21:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:49] (CR) jenkins-bot: db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/475290 (https://phabricator.wikimedia.org/T86339) (owner: Marostegui)
[08:21:53] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1094" [mediawiki-config] - https://gerrit.wikimedia.org/r/475292 (owner: Marostegui)
[08:26:13] Operations, ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T208096 (fgiunchedi)
[08:26:15] Operations, ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210121 (fgiunchedi)
[08:29:19] (PS1) Muehlenhoff: Record new MOU date for Robert West [puppet] - https://gerrit.wikimedia.org/r/475293
[08:32:30] (CR) Muehlenhoff: [C: 2] Record new MOU date for Robert West [puppet] - https://gerrit.wikimedia.org/r/475293 (owner: Muehlenhoff)
[08:37:18] (CR) Filippo Giunchedi: [C: 2] logstash: add 'level' normalization rules [puppet] - https://gerrit.wikimedia.org/r/475110 (https://phabricator.wikimedia.org/T143733) (owner: Filippo Giunchedi)
[08:37:26] (PS2) Filippo Giunchedi: logstash: add 'level' normalization rules [puppet] - https://gerrit.wikimedia.org/r/475110 (https://phabricator.wikimedia.org/T143733)
[08:39:02] moritzm: merged your change too
[08:44:49] PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 28749 MB (5% inode=99%)
[08:45:57] RECOVERY - Disk space on elastic1025 is OK: DISK OK
[08:46:19] ack, thanks
[08:46:26] (PS3) Ema: ATS: add profile::trafficserver::nrpe_monitor_script [puppet] - https://gerrit.wikimedia.org/r/475287 (https://phabricator.wikimedia.org/T204209)
[08:47:39] (CR) Ema: [C: 2] ATS: add profile::trafficserver::nrpe_monitor_script [puppet] - https://gerrit.wikimedia.org/r/475287 (https://phabricator.wikimedia.org/T204209) (owner: Ema)
[08:48:00] (PS2) Ema: ATS: add check_trafficserver_log_fifo [puppet] - https://gerrit.wikimedia.org/r/475291 (https://phabricator.wikimedia.org/T204209)
[08:49:00] (CR) Ema: [C: 2] ATS: add check_trafficserver_log_fifo [puppet] - https://gerrit.wikimedia.org/r/475291 (https://phabricator.wikimedia.org/T204209) (owner: Ema)
[08:50:09] (PS5) Filippo Giunchedi: logstash: rename 'severity' syslog field if present [puppet] - https://gerrit.wikimedia.org/r/475104 (https://phabricator.wikimedia.org/T143733)
[08:57:16] (CR) Filippo Giunchedi: [C: 2] logstash: rename 'severity' syslog field if present [puppet] - https://gerrit.wikimedia.org/r/475104 (https://phabricator.wikimedia.org/T143733) (owner: Filippo Giunchedi)
[08:57:28] (PS6) Filippo Giunchedi: logstash: rename 'severity' syslog field if present [puppet] - https://gerrit.wikimedia.org/r/475104 (https://phabricator.wikimedia.org/T143733)
[09:03:28] (PS1) Ema: ATS: pass fifo filename to check_trafficserver_log_fifo [puppet] - https://gerrit.wikimedia.org/r/475294 (https://phabricator.wikimedia.org/T204209)
[09:06:22] !log installing jasper security updates
[09:06:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:08:17] (PS2) Ema: ATS: pass fifo filename to check_trafficserver_log_fifo [puppet] - https://gerrit.wikimedia.org/r/475294 (https://phabricator.wikimedia.org/T204209)
[09:09:59] (CR) Ema: "recheck" [debs/prometheus-php-fpm-exporter] - https://gerrit.wikimedia.org/r/475075 (owner: Giuseppe Lavagetto)
[09:12:21] (PS3) Ema: ATS: pass fifo filename to check_trafficserver_log_fifo [puppet] - https://gerrit.wikimedia.org/r/475294 (https://phabricator.wikimedia.org/T204209)
[09:17:06] (PS2) ArielGlenn: add redirects of various zh-yue projects to yue [puppet] - https://gerrit.wikimedia.org/r/474901 (https://phabricator.wikimedia.org/T209693)
[09:17:23] (PS4) Ema: ATS: pass fifo filename to check_trafficserver_log_fifo [puppet] - https://gerrit.wikimedia.org/r/475294 (https://phabricator.wikimedia.org/T204209)
[09:20:49] !log installing ruby-rack security updates
[09:20:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:24] (CR) ArielGlenn: [C: 2] add redirects of various zh-yue projects to yue [puppet] - https://gerrit.wikimedia.org/r/474901 (https://phabricator.wikimedia.org/T209693) (owner: ArielGlenn)
[09:24:55] !log installing ruby-l18n security updates
[09:24:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:58] Operations, Wikimedia-Apache-configuration, Patch-For-Review: Redirect from zh-yue.wiktionary.org is not working properly - https://phabricator.wikimedia.org/T209693 (ArielGlenn) ` $ curl -D someheaders.txt -H 'X-Wikimedia-Debug: backend=mwdebug1002.eqiad.wmnet' 'https://zh-yue.wiktionary.org/wiki/Pa...
[09:32:47] (PS2) Filippo Giunchedi: phabricator: ship apache error logs to ELK [puppet] - https://gerrit.wikimedia.org/r/474988 (https://phabricator.wikimedia.org/T141895) (owner: Herron)
[09:33:32] (CR) Filippo Giunchedi: [C: 2] phabricator: ship apache error logs to ELK [puppet] - https://gerrit.wikimedia.org/r/474988 (https://phabricator.wikimedia.org/T141895) (owner: Herron)
[09:35:00] (PS1) Volans: netbox: allow read-only access to wmf ldap group [puppet] - https://gerrit.wikimedia.org/r/475295 (https://phabricator.wikimedia.org/T208267)
[09:36:36] (PS5) Ema: ATS: pass fifo filename to check_trafficserver_log_fifo [puppet] - https://gerrit.wikimedia.org/r/475294 (https://phabricator.wikimedia.org/T204209)
[09:38:59] (CR) Ema: [C: 2] ATS: pass fifo filename to check_trafficserver_log_fifo [puppet] - https://gerrit.wikimedia.org/r/475294 (https://phabricator.wikimedia.org/T204209) (owner: Ema)
[09:49:11] !log stop and upgrade dbstore2001
[09:49:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:24] !log bounce rsyslog on wezen, tls listener timeout on icinga
[09:54:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:51] (PS2) Mathew.onipe: maps: added use_proxy flag to set proxy [puppet] - https://gerrit.wikimedia.org/r/475092 (https://phabricator.wikimedia.org/T209570)
[10:00:53] (PS2) Mathew.onipe: profile::maps::osm_master: change osmupdater and osmimporter auth method to peer [puppet] - https://gerrit.wikimedia.org/r/475093 (https://phabricator.wikimedia.org/T206639)
[10:01:20] PROBLEM - rsyslog TLS listener on port 6514 on lithium is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer
[10:01:26] !log bounce rsyslog on lithium, tls listener timeout on icinga
[10:01:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:54] (CR) jerkins-bot: [V: -1] profile::maps::osm_master: change osmupdater and osmimporter auth method to peer [puppet] - https://gerrit.wikimedia.org/r/475093 (https://phabricator.wikimedia.org/T206639) (owner: Mathew.onipe)
[10:05:50] (CR) Mathew.onipe: "PCC seem good: https://puppet-compiler.wmflabs.org/compiler1002/13652/labsdb1006.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/475092 (https://phabricator.wikimedia.org/T209570) (owner: Mathew.onipe)
[10:07:38] (PS3) Mathew.onipe: profile::maps::osm_master: change osmupdater and osmimporter auth method to peer [puppet] - https://gerrit.wikimedia.org/r/475093 (https://phabricator.wikimedia.org/T206639)
[10:13:01] (CR) DCausse: [C: 1] elasticsearch_cluster: Added multi-cluster/multi-instance support (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: Mathew.onipe)
[10:13:34] (PS3) Jcrespo: mariadb: remove section s2 from dbstore2002 [puppet] - https://gerrit.wikimedia.org/r/475089 (https://phabricator.wikimedia.org/T208320) (owner: Banyek)
[10:17:34] (CR) Gehel: [C: -1] "PCC fails, but I don't understand why: https://puppet-compiler.wmflabs.org/compiler1002/13653/" [puppet] - https://gerrit.wikimedia.org/r/475241 (https://phabricator.wikimedia.org/T210044) (owner: Smalyshev)
[10:18:21] hashar: I've tried a 'recheck' on https://gerrit.wikimedia.org/r/#/c/operations/debs/prometheus-php-fpm-exporter/+/475075/ but nothing happened even though you've merged https://gerrit.wikimedia.org/r/#/c/integration/config/+/475081/
[10:18:41] hashar: any idea on what might be wrong?
[10:21:59] !log upgrading and restarting dbstore1001
[10:22:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:00] PROBLEM - rsyslog TLS listener on port 6514 on lithium is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[10:24:20] RECOVERY - rsyslog TLS listener on port 6514 on lithium is OK: SSL OK - Certificate lithium.eqiad.wmnet valid until 2021-10-23 19:09:29 +0000 (expires in 1066 days)
[10:25:39] (PS4) Gehel: Enable dumping RDF data for debugging purposes [puppet] - https://gerrit.wikimedia.org/r/475241 (https://phabricator.wikimedia.org/T210044) (owner: Smalyshev)
[10:26:03] (CR) Gehel: Enable dumping RDF data for debugging purposes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/475241 (https://phabricator.wikimedia.org/T210044) (owner: Smalyshev)
[10:26:48] (CR) jerkins-bot: [V: -1] Enable dumping RDF data for debugging purposes [puppet] - https://gerrit.wikimedia.org/r/475241 (https://phabricator.wikimedia.org/T210044) (owner: Smalyshev)
[10:29:34] (CR) Gehel: [C: 1] "PCC now looks good: https://puppet-compiler.wmflabs.org/compiler1002/13654/" [puppet] - https://gerrit.wikimedia.org/r/475241 (https://phabricator.wikimedia.org/T210044) (owner: Smalyshev)
[10:36:57] (PS5) Gehel: Enable dumping RDF data for debugging purposes [puppet] - https://gerrit.wikimedia.org/r/475241 (https://phabricator.wikimedia.org/T210044) (owner: Smalyshev)
[10:43:46] Operations, DNS, Traffic, User-revi: wikidata.org lacks SPF record - https://phabricator.wikimedia.org/T210134 (revi)
[10:44:51] (PS2) Alexandros Kosiaris: Introduce zoterov2 LVS endpoint [puppet] - https://gerrit.wikimedia.org/r/473733 (https://phabricator.wikimedia.org/T201611)
[10:44:53] (PS1) Alexandros Kosiaris: Introduce zoterov2 LVS conftool data [puppet] - https://gerrit.wikimedia.org/r/475303 (https://phabricator.wikimedia.org/T201611)
[10:44:55] (PS1) Alexandros Kosiaris: Add zoterov2 LVS ip block and realserver config [puppet] - https://gerrit.wikimedia.org/r/475304 (https://phabricator.wikimedia.org/T201611)
[10:54:29] (CR) Volans: [C: 1] "LGTM, 2 minor nitpicks inline. Feel free to merge as is or without follow up review." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/475241 (https://phabricator.wikimedia.org/T210044) (owner: Smalyshev)
[10:56:50] Operations, SRE-Access-Requests, WMDE-Analytics-Engineering, Graphite, User-Addshore: Requesting access to graphite hosts for addshore - https://phabricator.wikimedia.org/T208750 (fgiunchedi) a:fgiunchedi>None I've talked to @Addshore to clarify a bit the work involved and looks good...
[10:58:56] (Abandoned) Mforns: Send an alert email when EventLoggingSanitization job fails [puppet] - https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429) (owner: Mforns)
[11:00:37] Operations, DNS, Traffic, Wikidata, User-revi: wikidata.org lacks SPF record - https://phabricator.wikimedia.org/T210134 (Addshore)
[11:01:03] (PS2) Elukey: Add RefineMonitor to EventLoggingSanitization analytics refinery job [puppet] - https://gerrit.wikimedia.org/r/475231 (https://phabricator.wikimedia.org/T202429) (owner: Mforns)
[11:08:05] (PS1) Filippo Giunchedi: toil: cleanup failed systemd scope units [puppet] - https://gerrit.wikimedia.org/r/475306 (https://phabricator.wikimedia.org/T199911)
[11:18:44] (CR) Muehlenhoff: [C: 1] "Thanks! Looks good to me. This could be restricted to sessions triggered by the debmonitor user, but given that it's limited to "*.scope" " [puppet] - https://gerrit.wikimedia.org/r/475306 (https://phabricator.wikimedia.org/T199911) (owner: Filippo Giunchedi)
[11:23:36] (CR) Elukey: [C: 2] Add RefineMonitor to EventLoggingSanitization analytics refinery job [puppet] - https://gerrit.wikimedia.org/r/475231 (https://phabricator.wikimedia.org/T202429) (owner: Mforns)
[11:42:36] Operations, ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210140 (ops-monitoring-bot)
[11:52:23] (PS2) Alexandros Kosiaris: Introduce zoterov2 LVS conftool data [puppet] - https://gerrit.wikimedia.org/r/475303 (https://phabricator.wikimedia.org/T201611)
[11:52:25] (PS2) Alexandros Kosiaris: Add zoterov2 LVS ip block and realserver config [puppet] - https://gerrit.wikimedia.org/r/475304 (https://phabricator.wikimedia.org/T201611)
[11:52:28] (PS3) Alexandros Kosiaris: Introduce zoterov2 LVS endpoint [puppet] - https://gerrit.wikimedia.org/r/473733 (https://phabricator.wikimedia.org/T201611)
[11:52:38] (CR) Alexandros Kosiaris: [C: 2] Introduce zoterov2 LVS conftool data [puppet] - https://gerrit.wikimedia.org/r/475303 (https://phabricator.wikimedia.org/T201611) (owner: Alexandros Kosiaris)
[11:52:48] (CR) Alexandros Kosiaris: [C: 2] Add zoterov2 LVS ip block and realserver config [puppet] - https://gerrit.wikimedia.org/r/475304 (https://phabricator.wikimedia.org/T201611) (owner: Alexandros Kosiaris)
[12:19:26] !log upgrading, deleting at and restarting dbstore2002
[12:19:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:43] (PS1) Ema: fifo-log-demux: restart upon writer restart [puppet] - https://gerrit.wikimedia.org/r/475313 (https://phabricator.wikimedia.org/T204225)
[12:42:36] (CR) Jcrespo: [C: 2] mariadb: remove section s2 from dbstore2002 [puppet] - https://gerrit.wikimedia.org/r/475089 (https://phabricator.wikimedia.org/T208320) (owner: Banyek)
[12:42:47] (PS4) Jcrespo: mariadb: remove section s2 from dbstore2002 [puppet] - https://gerrit.wikimedia.org/r/475089 (https://phabricator.wikimedia.org/T208320) (owner: Banyek)
[13:13:37] (PS1) Marostegui: db-eqiad.php: Depool db1119 [mediawiki-config] - https://gerrit.wikimedia.org/r/475317 (https://phabricator.wikimedia.org/T86339)
[13:15:04] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1119 [mediawiki-config] - https://gerrit.wikimedia.org/r/475317 (https://phabricator.wikimedia.org/T86339) (owner: Marostegui)
[13:16:10] (Merged) jenkins-bot: db-eqiad.php: Depool db1119 [mediawiki-config] - https://gerrit.wikimedia.org/r/475317 (https://phabricator.wikimedia.org/T86339) (owner: Marostegui)
[13:17:13] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - https://gerrit.wikimedia.org/r/475318
[13:17:18] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1119 - T86339 (duration: 00m 47s)
[13:17:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:23] T86339: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339
[13:19:10] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - https://gerrit.wikimedia.org/r/475318 (owner: Marostegui)
[13:20:11] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - https://gerrit.wikimedia.org/r/475318 (owner: Marostegui)
[13:20:37] (CR) jenkins-bot: db-eqiad.php: Depool db1119 [mediawiki-config] - https://gerrit.wikimedia.org/r/475317 (https://phabricator.wikimedia.org/T86339) (owner: Marostegui)
[13:20:39] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1119" [mediawiki-config] - https://gerrit.wikimedia.org/r/475318 (owner: Marostegui)
[13:21:08] !log marostegui@deploy1001 Synchronized
wmf-config/db-eqiad.php: Repool db1119 - T86339 (duration: 00m 46s) [13:21:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:21:35] (03PS1) 10Muehlenhoff: Remove shell access for deskana [puppet] - 10https://gerrit.wikimedia.org/r/475319 [13:21:50] (03PS1) 10Marostegui: db-eqiad.php: Depool db1114 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475320 (https://phabricator.wikimedia.org/T86339) [13:23:37] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1114 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475320 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [13:24:19] (03CR) 10Muehlenhoff: [C: 032] Remove shell access for deskana [puppet] - 10https://gerrit.wikimedia.org/r/475319 (owner: 10Muehlenhoff) [13:25:12] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1114 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475320 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [13:25:35] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1114" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475321 [13:26:09] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1114 - T86339 (duration: 00m 43s) [13:26:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:13] T86339: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 [13:27:01] (03PS1) 10Phuedx: eventlogging: Tag error log lines with the schema [puppet] - 10https://gerrit.wikimedia.org/r/475322 (https://phabricator.wikimedia.org/T205437) [13:27:57] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1114" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475321 (owner: 10Marostegui) [13:28:59] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1114" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475321 (owner: 10Marostegui) [13:30:04] (03PS6) 10Gehel: Enable dumping RDF data for debugging purposes [puppet] - 
10https://gerrit.wikimedia.org/r/475241 (https://phabricator.wikimedia.org/T210044) (owner: 10Smalyshev) [13:30:35] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1114 - T86339 (duration: 00m 45s) [13:30:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:02] 10Operations: Canaries canaries canaries - https://phabricator.wikimedia.org/T210143 (10jijiki) [13:32:44] (03PS1) 10Marostegui: db-eqiad.php: Depool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475323 (https://phabricator.wikimedia.org/T86339) [13:33:26] 10Operations, 10Anti-Harassment, 10DBA: Error Unknown column ipb_sitewide in field list on query - https://phabricator.wikimedia.org/T208462 (10Marostegui) 05Open>03Resolved [13:33:41] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1114 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475320 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [13:33:43] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1114" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475321 (owner: 10Marostegui) [13:34:09] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475323 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [13:35:11] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475323 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [13:35:13] 10Operations, 10Scap: Canaries canaries canaries - https://phabricator.wikimedia.org/T210143 (10jijiki) p:05Triage>03Normal [13:35:35] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475324 [13:35:43] 10Operations, 10Scap, 10User-jijiki: Introduce state to Scap - https://phabricator.wikimedia.org/T209881 (10jijiki) [13:35:46] 10Operations, 10Scap: Canaries canaries canaries - https://phabricator.wikimedia.org/T210143 
(10jijiki) [13:36:11] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1089 - T86339 (duration: 00m 46s) [13:36:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:16] T86339: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 [13:36:44] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475324 (owner: 10Marostegui) [13:36:53] (03CR) 10Vgutierrez: [C: 032] netbox: Deploy the TLS certificate managed by certcentral [puppet] - 10https://gerrit.wikimedia.org/r/474941 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [13:36:58] (03PS2) 10Vgutierrez: netbox: Deploy the TLS certificate managed by certcentral [puppet] - 10https://gerrit.wikimedia.org/r/474941 (https://phabricator.wikimedia.org/T207050) [13:37:45] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475324 (owner: 10Marostegui) [13:38:50] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1089 - T86339 (duration: 00m 45s) [13:38:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:16] (03PS1) 10Marostegui: db-eqiad.php: Depool db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475325 (https://phabricator.wikimedia.org/T86339) [13:41:24] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475325 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [13:42:44] (03PS1) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [13:42:55] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475325 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) 
[13:43:13] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475327 [13:43:35] (03CR) 10jerkins-bot: [V: 04-1] openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [13:43:51] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1083 - T86339 (duration: 00m 46s) [13:43:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:43:54] T86339: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 [13:44:20] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475327 (owner: 10Marostegui) [13:45:20] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475327 (owner: 10Marostegui) [13:45:43] !log Deploy schema change on s1 eqiad hosts T86339 [13:45:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:46:17] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1083 - T86339 (duration: 00m 46s) [13:46:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:46:32] (03PS1) 10Marostegui: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475328 (https://phabricator.wikimedia.org/T86339) [13:47:20] (03PS1) 10Vgutierrez: netbox: Use certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/475329 (https://phabricator.wikimedia.org/T207050) [13:47:53] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475328 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [13:49:23] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/475328 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [13:50:21] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1080 - T86339 (duration: 00m 46s) [13:50:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:24] T86339: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 [13:50:26] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475330 [13:51:21] (03PS7) 10Gehel: Enable dumping RDF data for debugging purposes [puppet] - 10https://gerrit.wikimedia.org/r/475241 (https://phabricator.wikimedia.org/T210044) (owner: 10Smalyshev) [13:51:45] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475330 (owner: 10Marostegui) [13:52:15] (03CR) 10Gehel: Enable dumping RDF data for debugging purposes (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/475241 (https://phabricator.wikimedia.org/T210044) (owner: 10Smalyshev) [13:52:29] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475323 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [13:52:31] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475324 (owner: 10Marostegui) [13:52:33] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475325 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [13:52:35] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475327 (owner: 10Marostegui) [13:52:37] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475328 (https://phabricator.wikimedia.org/T86339) (owner: 10Marostegui) [13:52:45] 
(03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475330 (owner: 10Marostegui) [13:52:53] !log marostegui@deploy1001 sync-file aborted: Depool db1080 - T86339 (duration: 00m 00s) [13:52:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:58] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475330 (owner: 10Marostegui) [13:53:40] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1080 - T86339 (duration: 00m 45s) [13:53:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:01:46] PROBLEM - Check systemd state on dbstore1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:02:53] checking dbstore1001 [14:03:38] a prometheus daemon that shouldn't start, tried to, disabling it [14:04:04] RECOVERY - Check systemd state on dbstore1001 is OK: OK - running: The system is fully operational [14:07:22] (03PS1) 10Jcrespo: mariadb: Remove dbstore2002:s2 from prometheus monitoring [puppet] - 10https://gerrit.wikimedia.org/r/475331 [14:09:14] (03PS1) 10Gehel: wdqs: options should all be passed as arrays [puppet] - 10https://gerrit.wikimedia.org/r/475332 [14:10:11] (03PS8) 10Gehel: Enable dumping RDF data for debugging purposes [puppet] - 10https://gerrit.wikimedia.org/r/475241 (https://phabricator.wikimedia.org/T210044) (owner: 10Smalyshev) [14:10:13] (03CR) 10jerkins-bot: [V: 04-1] wdqs: options should all be passed as arrays [puppet] - 10https://gerrit.wikimedia.org/r/475332 (owner: 10Gehel) [14:11:40] (03CR) 10Gehel: [C: 032] Enable dumping RDF data for debugging purposes [puppet] - 10https://gerrit.wikimedia.org/r/475241 (https://phabricator.wikimedia.org/T210044) (owner: 10Smalyshev) [14:12:16] (03PS2) 10Jcrespo: mariadb: Remove dbstore2002:s2 from prometheus monitoring [puppet] - 
10https://gerrit.wikimedia.org/r/475331 [14:16:57] (03CR) 10Jcrespo: [C: 032] mariadb: Remove dbstore2002:s2 from prometheus monitoring [puppet] - 10https://gerrit.wikimedia.org/r/475331 (owner: 10Jcrespo) [14:19:22] (03PS2) 10Phuedx: eventlogging: Tag error log lines with the schema [puppet] - 10https://gerrit.wikimedia.org/r/475322 (https://phabricator.wikimedia.org/T205437) [14:45:33] (03CR) 10Muehlenhoff: netbox: allow read-only access to wmf ldap group (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475295 (https://phabricator.wikimedia.org/T208267) (owner: 10Volans) [14:48:50] (03PS1) 10Vgutierrez: netbox: Get rid of the old LE puppetization [puppet] - 10https://gerrit.wikimedia.org/r/475335 (https://phabricator.wikimedia.org/T207050) [14:50:06] !log akosiaris@deploy1001 scap-helm zotero install --name production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad] [14:50:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:13] (03PS2) 10Gehel: wdqs: options should all be passed as arrays [puppet] - 10https://gerrit.wikimedia.org/r/475332 [14:54:54] (03CR) 10Volans: "reply inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475295 (https://phabricator.wikimedia.org/T208267) (owner: 10Volans) [14:55:45] (03CR) 10Volans: "more details" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475295 (https://phabricator.wikimedia.org/T208267) (owner: 10Volans) [14:58:37] (03CR) 10Alexandros Kosiaris: [C: 032] "jenkins-bot votes -1 for a typo that isn't in our code and hence should be fixed upstream so I 'll remove it's -1" [puppet] - 10https://gerrit.wikimedia.org/r/474334 (owner: 10Dzahn) [14:58:50] (03PS5) 10Alexandros Kosiaris: upgrade puppet stdlib from 4.15.0 to 4.16.0 [puppet] - 10https://gerrit.wikimedia.org/r/474334 (owner: 10Dzahn) [14:58:56] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] upgrade puppet stdlib from 4.15.0 to 4.16.0 [puppet] - 
10https://gerrit.wikimedia.org/r/474334 (owner: 10Dzahn) [15:03:47] (03CR) 10Gehel: elasticsearch_cluster: Added multi-cluster/multi-instance support (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: 10Mathew.onipe) [15:06:22] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] "Let's play it safe here and wait a couple of days between merges. Overcautious probably but doesn't really hurt" [puppet] - 10https://gerrit.wikimedia.org/r/474334 (owner: 10Dzahn) [15:07:07] (03CR) 10Vgutierrez: [C: 032] netbox: Use certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/475329 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [15:07:17] (03PS2) 10Vgutierrez: netbox: Use certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/475329 (https://phabricator.wikimedia.org/T207050) [15:12:05] (03PS1) 10Daimona Eaytoy: Clarify docs for AbuseFilter emergency threshold [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475337 [15:15:45] (03CR) 10Muehlenhoff: [C: 031] "Thanks, looks fine, then." 
[puppet] - 10https://gerrit.wikimedia.org/r/475295 (https://phabricator.wikimedia.org/T208267) (owner: 10Volans) [15:22:56] !log akosiaris@deploy1001 scap-helm zotero install --name production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad] [15:22:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:22:57] !log akosiaris@deploy1001 scap-helm zotero cluster eqiad completed [15:22:57] !log akosiaris@deploy1001 scap-helm zotero finished [15:22:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:04] !log akosiaris@deploy1001 scap-helm zotero install --name production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: codfw] [15:23:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:06] !log akosiaris@deploy1001 scap-helm zotero cluster codfw completed [15:23:06] !log akosiaris@deploy1001 scap-helm zotero finished [15:23:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:54] (03PS3) 10Gehel: wdqs: options should all be passed as arrays [puppet] - 10https://gerrit.wikimedia.org/r/475332 [15:33:15] (03PS2) 10Ema: Build -dbgsym packages [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/475108 [15:33:17] (03PS1) 10Ema: 8.0.0-1wm3: 0010-logs-to-pipe.patch, -dbgsym packages [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/475339 [15:36:13] (03CR) 10Volans: [C: 031] "I'm ok to have this as a workaround. 
The 1h cron will basically clear that up automatically during weekends and low-presence times, while " (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475306 (https://phabricator.wikimedia.org/T199911) (owner: 10Filippo Giunchedi) [15:39:07] (03PS4) 10Gehel: wdqs: options should all be passed as arrays [puppet] - 10https://gerrit.wikimedia.org/r/475332 [15:41:26] (03CR) 10Gehel: "PCC looks happy: https://puppet-compiler.wmflabs.org/compiler1002/13666/" [puppet] - 10https://gerrit.wikimedia.org/r/475332 (owner: 10Gehel) [15:43:33] (03PS2) 10Muehlenhoff: Script to generate service principals/keytabs (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/470566 [15:47:37] 10Operations, 10Research: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10Dzahn) @bmansurov I can help with puppetization. Just need to know a bit more what you actually need. Let's chat about that after the holidays or feel free to update with more details here,... [15:49:43] (03PS5) 10Gehel: wdqs: options should all be passed as arrays [puppet] - 10https://gerrit.wikimedia.org/r/475332 [15:51:48] (03CR) 10Ema: [C: 032] Build -dbgsym packages [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/475108 (owner: 10Ema) [15:51:56] (03CR) 10Ema: [C: 032] 8.0.0-1wm3: 0010-logs-to-pipe.patch, -dbgsym packages [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/475339 (owner: 10Ema) [15:53:33] (03CR) 10Volans: [C: 031] "LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475332 (owner: 10Gehel) [15:54:10] RECOVERY - Check systemd state on ms-be2021 is OK: OK - running: The system is fully operational [15:54:25] (03PS1) 10Elukey: profile::analytics::refinery::job::data_purge: add new parameters [puppet] - 10https://gerrit.wikimedia.org/r/475340 [15:54:52] RECOVERY - Check systemd state on ms-be2034 is OK: OK - running: The system is fully operational [15:55:23] (03PS6) 10Gehel: wdqs: options should all be passed as arrays [puppet] - 
10https://gerrit.wikimedia.org/r/475332 [15:55:27] (03CR) 10Gehel: wdqs: options should all be passed as arrays (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475332 (owner: 10Gehel) [15:56:37] (03PS2) 10Filippo Giunchedi: toil: cleanup failed systemd scope units [puppet] - 10https://gerrit.wikimedia.org/r/475306 (https://phabricator.wikimedia.org/T199911) [15:56:42] (03CR) 10Gehel: [C: 032] wdqs: options should all be passed as arrays [puppet] - 10https://gerrit.wikimedia.org/r/475332 (owner: 10Gehel) [15:57:48] (03CR) 10Filippo Giunchedi: "Thanks for the reviews! Simplified the systemd invocation to fix a bug in case there were no scope unit failed." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475306 (https://phabricator.wikimedia.org/T199911) (owner: 10Filippo Giunchedi) [16:03:59] (03CR) 10Alexandros Kosiaris: [C: 04-1] deployment-prep: move lists of cache nodes out of labs.yaml hiera (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475225 (owner: 10Alex Monk) [16:05:43] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/475306 (https://phabricator.wikimedia.org/T199911) (owner: 10Filippo Giunchedi) [16:06:44] !log trafficserver_8.0.0-1wm3 uploaded to stretch-wikimedia [16:06:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:47] (03CR) 10Alex Monk: deployment-prep: move lists of cache nodes out of labs.yaml hiera (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475225 (owner: 10Alex Monk) [16:07:15] (03PS2) 10Alex Monk: deployment-prep: move lists of cache nodes out of labs.yaml hiera [puppet] - 10https://gerrit.wikimedia.org/r/475225 [16:09:58] (03CR) 10Alexandros Kosiaris: [C: 04-1] "We should probably alter the wikimedia varnish ACL to use the $domain_networks, and not $all_networks." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475150 (owner: 10Alex Monk) [16:10:55] (03CR) 10Alex Monk: network::constants: Include cloud private range in all_networks (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475150 (owner: 10Alex Monk) [16:11:12] (03CR) 10Vgutierrez: [C: 032] netbox: Get rid of the old LE puppetization [puppet] - 10https://gerrit.wikimedia.org/r/475335 (https://phabricator.wikimedia.org/T207050) (owner: 10Vgutierrez) [16:11:21] (03PS2) 10Vgutierrez: netbox: Get rid of the old LE puppetization [puppet] - 10https://gerrit.wikimedia.org/r/475335 (https://phabricator.wikimedia.org/T207050) [16:11:34] (03PS2) 10Elukey: profile::analytics::refinery::job::data_purge: add new parameters [puppet] - 10https://gerrit.wikimedia.org/r/475340 [16:12:11] (03CR) 10jerkins-bot: [V: 04-1] profile::analytics::refinery::job::data_purge: add new parameters [puppet] - 10https://gerrit.wikimedia.org/r/475340 (owner: 10Elukey) [16:14:29] (03PS3) 10Elukey: profile::analytics::refinery::job::data_purge: add new parameters [puppet] - 10https://gerrit.wikimedia.org/r/475340 [16:14:47] 17:12:07 fatal: protocol error: bad pack header [16:14:49] mmmm [16:15:28] (03CR) 10Elukey: [C: 032] profile::analytics::refinery::job::data_purge: add new parameters [puppet] - 10https://gerrit.wikimedia.org/r/475340 (owner: 10Elukey) [16:18:54] 10Operations, 10Traffic, 10Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (10Vgutierrez) [16:21:32] (03PS1) 10Jcrespo: mariadb: Depool es1016 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475343 [16:22:59] (03CR) 10Gehel: [C: 031] "PCC looks good: https://puppet-compiler.wmflabs.org/compiler1002/13669/" [puppet] - 10https://gerrit.wikimedia.org/r/475092 (https://phabricator.wikimedia.org/T209570) (owner: 10Mathew.onipe) [16:23:29] (03PS3) 10Gehel: maps: added use_proxy flag to set proxy [puppet] - 
10https://gerrit.wikimedia.org/r/475092 (https://phabricator.wikimedia.org/T209570) (owner: 10Mathew.onipe) [16:24:40] (03CR) 10Gehel: [C: 032] maps: added use_proxy flag to set proxy [puppet] - 10https://gerrit.wikimedia.org/r/475092 (https://phabricator.wikimedia.org/T209570) (owner: 10Mathew.onipe) [16:27:43] (03PS2) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [16:28:54] (03CR) 10jerkins-bot: [V: 04-1] openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [16:31:43] (03CR) 10Jcrespo: [C: 032] mariadb: Depool es1016 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475343 (owner: 10Jcrespo) [16:32:59] (03PS2) 10Ema: fifo-log-demux: restart upon writer restart, remove cargo cult [puppet] - 10https://gerrit.wikimedia.org/r/475313 (https://phabricator.wikimedia.org/T204225) [16:33:10] (03Merged) 10jenkins-bot: mariadb: Depool es1016 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475343 (owner: 10Jcrespo) [16:34:30] (03CR) 10Alexandros Kosiaris: [C: 04-1] network::constants: Include cloud private range in all_networks (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475150 (owner: 10Alex Monk) [16:35:01] (03CR) 10Ema: [C: 032] fifo-log-demux: restart upon writer restart, remove cargo cult [puppet] - 10https://gerrit.wikimedia.org/r/475313 (https://phabricator.wikimedia.org/T204225) (owner: 10Ema) [16:35:12] (03PS3) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [16:35:34] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool es1016 (duration: 00m 47s) [16:35:36] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:36:05] (03CR) 10jerkins-bot: [V: 04-1] openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [16:36:07] (03PS1) 10Gilles: Add HTTP/2 priorities test to speed tests [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475345 (https://phabricator.wikimedia.org/T210141) [16:36:35] (03CR) 10Alex Monk: network::constants: Include cloud private range in all_networks (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/475150 (owner: 10Alex Monk) [16:36:51] (03CR) 10Gehel: [C: 04-1] "PCC is failing: https://puppet-compiler.wmflabs.org/compiler1002/13674/" [puppet] - 10https://gerrit.wikimedia.org/r/475093 (https://phabricator.wikimedia.org/T206639) (owner: 10Mathew.onipe) [16:37:00] (03CR) 10Arturo Borrero Gonzalez: "Compilation test: https://integration.wikimedia.org/ci/view/operations/job/operations-puppet-catalog-compiler/13675/console" [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [16:37:25] (03CR) 10Gehel: [C: 04-1] "Adding Brook since labsdb1006 is impacted" [puppet] - 10https://gerrit.wikimedia.org/r/475093 (https://phabricator.wikimedia.org/T206639) (owner: 10Mathew.onipe) [16:38:13] (03PS4) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [16:39:28] (03CR) 10jerkins-bot: [V: 04-1] openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [16:41:31] (03PS5) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [16:42:21] (03CR) 
10jerkins-bot: [V: 04-1] openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [16:42:26] 10Operations, 10Traffic, 10Privacy: Disable WMF-Last-Access cookies for wmfusercontent.org - https://phabricator.wikimedia.org/T210167 (10Krinkle) [16:44:04] 10Operations, 10Scoring-platform-team (Current), 10User-Ladsgroup: Spec out migrating ORES to kubernetes - https://phabricator.wikimedia.org/T210109 (10Ladsgroup) I found this: {T182331} [16:44:18] (03CR) 10jenkins-bot: mariadb: Depool es1016 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475343 (owner: 10Jcrespo) [16:44:29] 10Operations, 10Traffic, 10Privacy: Disable WMF-Last-Access cookies for wmfusercontent.org - https://phabricator.wikimedia.org/T210167 (10Krinkle) This also relates to T202479, in that it touches on the larger problem of not having an established way to detect in Varnish whether the request is for a wiki or... 
[16:46:00] (03PS6) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [16:47:13] (03PS4) 10Mathew.onipe: profile::maps::osm_master: change osmupdater and osmimporter auth method to peer [puppet] - 10https://gerrit.wikimedia.org/r/475093 (https://phabricator.wikimedia.org/T206639) [16:47:22] (03CR) 10Arturo Borrero Gonzalez: "Preliminary compiler check: https://integration.wikimedia.org/ci/view/operations/job/operations-puppet-catalog-compiler/13676/console" [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [16:48:14] (03CR) 10Mathew.onipe: "> Patch Set 3: Code-Review-1" [puppet] - 10https://gerrit.wikimedia.org/r/475093 (https://phabricator.wikimedia.org/T206639) (owner: 10Mathew.onipe) [16:48:22] PROBLEM - Check systemd state on ms-be2038 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:49:36] !log upgrading, deleting at and restarting es1016 [16:49:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:49:50] !log upgrading, and restarting es1016 (but not deleting, that was a mistake) [16:49:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:52:31] 10Operations, 10Continuous-Integration-Infrastructure (shipyard), 10Kubernetes: set up a test node with new version, Redis as cache, a new Swift container and export metrics over Fraphana - https://phabricator.wikimedia.org/T210076 (10fselles) i've got some metrics at least. ` # HELP go_gc_duration_seconds... [16:53:20] (03PS2) 10Gehel: wdqs: run test queries periodically on wdqs test servers [puppet] - 10https://gerrit.wikimedia.org/r/474285 (https://phabricator.wikimedia.org/T207665) [16:53:34] (03CR) 10Ladsgroup: "I think we should restore this as we are on ores* nodes. 
We can fix the number" [puppet] - 10https://gerrit.wikimedia.org/r/396055 (https://phabricator.wikimedia.org/T182249) (owner: 10Awight) [16:53:43] (03CR) 10jerkins-bot: [V: 04-1] wdqs: run test queries periodically on wdqs test servers [puppet] - 10https://gerrit.wikimedia.org/r/474285 (https://phabricator.wikimedia.org/T207665) (owner: 10Gehel) [16:56:45] (03PS3) 10Gehel: wdqs: run test queries periodically on wdqs test servers [puppet] - 10https://gerrit.wikimedia.org/r/474285 (https://phabricator.wikimedia.org/T207665) [16:57:33] (03CR) 10jerkins-bot: [V: 04-1] wdqs: run test queries periodically on wdqs test servers [puppet] - 10https://gerrit.wikimedia.org/r/474285 (https://phabricator.wikimedia.org/T207665) (owner: 10Gehel) [16:58:56] (03CR) 10Mathew.onipe: "PCC look Ok: https://puppet-compiler.wmflabs.org/compiler1002/13677/" [puppet] - 10https://gerrit.wikimedia.org/r/475093 (https://phabricator.wikimedia.org/T206639) (owner: 10Mathew.onipe) [17:01:42] (03PS7) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [17:02:31] (03CR) 10jerkins-bot: [V: 04-1] openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [17:03:53] (03CR) 10Arturo Borrero Gonzalez: "Another compilation test: https://integration.wikimedia.org/ci/view/operations/job/operations-puppet-catalog-compiler/13678/console" [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [17:08:29] (03PS8) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [17:08:37] (03PS1) 10Jcrespo: Revert "mariadb: Depool es1016 for maintenance" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/475347 [17:09:28] (03CR) 10jerkins-bot: [V: 04-1] openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [17:11:29] (03PS1) 10Filippo Giunchedi: swift: cleanup stale scope sessions [puppet] - 10https://gerrit.wikimedia.org/r/475348 (https://phabricator.wikimedia.org/T199911) [17:12:07] (03CR) 10jerkins-bot: [V: 04-1] swift: cleanup stale scope sessions [puppet] - 10https://gerrit.wikimedia.org/r/475348 (https://phabricator.wikimedia.org/T199911) (owner: 10Filippo Giunchedi) [17:12:57] (03PS9) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [17:15:19] (03CR) 10Arturo Borrero Gonzalez: "Compilation test: https://integration.wikimedia.org/ci/view/operations/job/operations-puppet-catalog-compiler/13680/console" [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [17:16:07] (03CR) 10Filippo Giunchedi: [C: 032] "PCC https://puppet-compiler.wmflabs.org/compiler1002/13679/" [puppet] - 10https://gerrit.wikimedia.org/r/475348 (https://phabricator.wikimedia.org/T199911) (owner: 10Filippo Giunchedi) [17:16:31] (03CR) 10Filippo Giunchedi: [C: 032] toil: cleanup failed systemd scope units [puppet] - 10https://gerrit.wikimedia.org/r/475306 (https://phabricator.wikimedia.org/T199911) (owner: 10Filippo Giunchedi) [17:16:50] (03PS3) 10Filippo Giunchedi: toil: cleanup failed systemd scope units [puppet] - 10https://gerrit.wikimedia.org/r/475306 (https://phabricator.wikimedia.org/T199911) [17:18:42] (03PS2) 10Filippo Giunchedi: swift: cleanup stale scope sessions [puppet] - 10https://gerrit.wikimedia.org/r/475348 (https://phabricator.wikimedia.org/T199911) [17:18:51] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] swift: cleanup stale 
scope sessions [puppet] - 10https://gerrit.wikimedia.org/r/475348 (https://phabricator.wikimedia.org/T199911) (owner: 10Filippo Giunchedi) [17:20:19] (03PS4) 10Gehel: wdqs: run test queries periodically on wdqs test servers [puppet] - 10https://gerrit.wikimedia.org/r/474285 (https://phabricator.wikimedia.org/T207665) [17:20:51] (03CR) 10jerkins-bot: [V: 04-1] wdqs: run test queries periodically on wdqs test servers [puppet] - 10https://gerrit.wikimedia.org/r/474285 (https://phabricator.wikimedia.org/T207665) (owner: 10Gehel) [17:22:58] (03PS5) 10Gehel: wdqs: run test queries periodically on wdqs test servers [puppet] - 10https://gerrit.wikimedia.org/r/474285 (https://phabricator.wikimedia.org/T207665) [17:26:22] PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is CRITICAL: 55.51 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [17:29:34] (03PS10) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [17:32:06] RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is OK: (C)60 le (W)70 le 72.37 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [17:39:04] (03PS11) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [17:42:50] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool es1016 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475347 (owner: 10Jcrespo) [17:43:38] PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is CRITICAL: 53.23 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [17:43:54] (03Merged) 10jenkins-bot: Revert "mariadb: Depool es1016 for maintenance" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/475347 (owner: 10Jcrespo) [17:46:21] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool es1016 (duration: 00m 46s) [17:46:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:46:40] (03CR) 10Arturo Borrero Gonzalez: "Another test https://integration.wikimedia.org/ci/view/operations/job/operations-puppet-catalog-compiler/13682/console" [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [17:49:07] (03CR) 10jenkins-bot: Revert "mariadb: Depool es1016 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475347 (owner: 10Jcrespo) [17:49:24] RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is OK: (C)60 le (W)70 le 76.12 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [17:50:36] RECOVERY - Check systemd state on ms-be2038 is OK: OK - running: The system is fully operational [17:51:32] (03PS12) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [18:03:33] (03CR) 10Arturo Borrero Gonzalez: "Another compilation check: https://integration.wikimedia.org/ci/view/operations/job/operations-puppet-catalog-compiler/13683/" [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [18:05:29] (03PS1) 10Filippo Giunchedi: WIP rsyslog: udp input json_lines shim [puppet] - 10https://gerrit.wikimedia.org/r/475352 (https://phabricator.wikimedia.org/T205851) [18:06:18] (03CR) 10jerkins-bot: [V: 04-1] WIP rsyslog: udp input json_lines shim [puppet] - 10https://gerrit.wikimedia.org/r/475352 (https://phabricator.wikimedia.org/T205851) (owner: 10Filippo Giunchedi) [18:30:47] (03PS13) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and 
pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [18:34:17] (03PS14) 10Arturo Borrero Gonzalez: openstack: rearrange repos, packages and pinnings [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) [18:37:59] !log disable puppet in all CloudVPS HW servers to test a patch (T209948) [18:38:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:38:02] T209948: CloudVPS: puppet organization for openstack server/client packages, repos and pinning - https://phabricator.wikimedia.org/T209948 [18:45:31] !log enable puppet in all CloudVPS HW servers [18:45:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:46:01] !log installing openjpeg2 security updates [18:46:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:53] (03CR) 10Arturo Borrero Gonzalez: "last compilation attempt: https://integration.wikimedia.org/ci/view/operations/job/operations-puppet-catalog-compiler/13687/console" [puppet] - 10https://gerrit.wikimedia.org/r/475326 (https://phabricator.wikimedia.org/T209948) (owner: 10Arturo Borrero Gonzalez) [18:47:56] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:48:14] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210178 (10ops-monitoring-bot) [18:49:10] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:49:31] 10Operations, 10Research: Import recommendations into production database - https://phabricator.wikimedia.org/T208622 (10bmansurov) @Dzahn, thanks! OK, Iʼll ping you on Monday. 
[19:01:05] !log installing uriparser security updates [19:01:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:36] PROBLEM - HHVM jobrunner on mw1301 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:36] RECOVERY - HHVM jobrunner on mw1301 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time [19:17:06] (03PS1) 10GTirloni: toolforge: Increase WebServiceMonitor sleep time to 60s [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/475355 (https://phabricator.wikimedia.org/T210190) [19:18:42] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210193 (10ops-monitoring-bot) [19:41:31] (03PS6) 10Gehel: wdqs: run test queries periodically on wdqs test servers [puppet] - 10https://gerrit.wikimedia.org/r/474285 (https://phabricator.wikimedia.org/T207665) [19:45:49] (03CR) 10Gehel: [C: 032] wdqs: run test queries periodically on wdqs test servers [puppet] - 10https://gerrit.wikimedia.org/r/474285 (https://phabricator.wikimedia.org/T207665) (owner: 10Gehel) [19:49:09] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210199 (10ops-monitoring-bot) [19:50:19] (03PS1) 10Gehel: wdqs: fix ensure on cron to run tests [puppet] - 10https://gerrit.wikimedia.org/r/475360 [19:51:12] (03CR) 10Gehel: [C: 032] wdqs: fix ensure on cron to run tests [puppet] - 10https://gerrit.wikimedia.org/r/475360 (owner: 10Gehel) [19:51:37] PROBLEM - puppet last run on wdqs2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:52:09] PROBLEM - puppet last run on wdqs1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:54:05] PROBLEM - puppet last run on wdqs1005 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [19:57:17] RECOVERY - puppet last run on wdqs1009 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [19:58:55] PROBLEM - puppet last run on wdqs1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:59:11] RECOVERY - puppet last run on wdqs1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:09:22] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210202 (10ops-monitoring-bot) [20:17:17] RECOVERY - puppet last run on wdqs2001 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [20:19:37] (03PS1) 10Alex Monk: deployment-prep: Update IPs for Varnish [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475363 (https://phabricator.wikimedia.org/T208101) [20:24:39] RECOVERY - puppet last run on wdqs1008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:29:36] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210203 (10ops-monitoring-bot) [20:32:55] PROBLEM - HHVM jobrunner on mw1336 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.005 second response time [20:34:05] RECOVERY - HHVM jobrunner on mw1336 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time [20:39:52] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210202 (10Zoranzoki21) Looks same as T210203, so closing as duplicate. 
[20:40:03] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210202 (10Zoranzoki21) [20:40:05] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210203 (10Zoranzoki21) [20:40:40] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210199 (10Zoranzoki21) [20:40:42] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210203 (10Zoranzoki21) [20:41:21] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210140 (10Zoranzoki21) [20:41:24] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210203 (10Zoranzoki21) [20:41:54] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210178 (10Zoranzoki21) [20:41:56] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210203 (10Zoranzoki21) [20:42:11] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210193 (10Zoranzoki21) [20:42:13] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210203 (10Zoranzoki21) [20:43:19] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T210203 (10Zoranzoki21) [20:43:21] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T208096 (10Zoranzoki21) [21:05:52] (03PS1) 10Zoranzoki21: Delete 'Импортировано' namespace from ru.wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475367 (https://phabricator.wikimedia.org/T210171) [21:07:15] I'll disable the raid handler on ms-be2021 [21:08:38] !log disable raid handler for ms-be2021 - T208096 [21:08:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:08:44] T208096: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T208096 [21:49:05] 
anyone knows a tool for searching article contents in an XML dump? [21:49:09] apergos maybe? [21:49:21] uh [21:49:32] you can use the multistream dumps best, with their index [21:49:45] there's bound to be stuff out there that's better than the crap tool I wrote long ago [21:49:50] I mean, get all titles which contain the word foobar [21:50:01] hm no idea tbh [21:50:08] I think I made a tool for that some time ago [21:50:10] can't you just download the alltitles file [21:50:13] and grep that? [21:50:14] and it wouldn't be hard to do [21:50:18] we dump that every day [21:50:28] no, I'm not explaining myself correctly [21:50:29] or all titles in main namespace, I think we dump that every day too [21:50:50] get all titles of articles whose content contain the word foobar [21:51:00] the content contains it [21:51:02] eh [21:51:17] nope, no idea, that's not something I've fooled with or looked into [21:51:26] but there must be stuff around, you'd think [21:51:30] it wouldn't be hard to code [21:51:39] but if there's someone already there... :P [21:51:56] *something [21:52:28] do we have a repo for xml tools? [21:53:08] nothing general, no [21:53:17] any tool would have been written outside of wmf [21:53:33] the only dump related ones are the crap ones I wrote or mqdumper and such, that are in a wmf repo [21:54:41] how big is this xml file (how many pages)? [21:55:04] maybe you want to build an elastic-style index or something with the contents of the articles? 
dunno [21:57:20] https://en.wikipedia.org/wiki/Wikipedia:Database_download I'm looking at stuff here, no idea if any of it's useful [21:57:27] there's some offline reader sort of stuff etc in here [21:59:33] https://github.com/lemire/IndexWikipedia if you have to roll your own here's a lucene-based 'build an index' thingie [21:59:36] meh [21:59:47] midnight for me, going to sign off, let me know what you turn up [22:02:13] ok, good night apergos [22:12:15] 10Operations, 10Release-Engineering-Team (Backlog): Keyholder phab repo duplicate work - https://phabricator.wikimedia.org/T203003 (10hashar) I guess we can close rKEYHOLDER. Seems to me keyholder code will be moved out of operations/puppet to `operations/software/keyholder` where development has been occurrin... [23:09:02] (03CR) 10Zhuyifei1999: [C: 031] "LGTM" [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/475355 (https://phabricator.wikimedia.org/T210190) (owner: 10GTirloni) [23:09:44] Platonides: the parsing team has/had a dump grep script...I can't seem to find it right now though [23:10:20] legoktm: I already made a quick one [23:10:54] 34 lines :P [23:14:14] :D nice [23:31:55] PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:40:51] 10Operations, 10Certcentral, 10Traffic, 10Patch-For-Review: Create and deploy a centralized letsencrypt service - https://phabricator.wikimedia.org/T194962 (10Krenair) @vgutierrez: are we done with this task? 
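Editor's note on the dump-search question above ("get all titles of articles whose content contain the word foobar"): the quick 34-line script mentioned in the channel is not shown in the log, so the following is only a minimal sketch of one way to do it, not that script. It assumes a standard MediaWiki XML export with page, title and text elements; the function name titles_containing, the helper local_name and the SAMPLE data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def titles_containing(stream, needle):
    """Yield titles of pages whose text contains `needle` (plain substring match).

    `stream` is a binary file-like object holding the XML export. The dump is
    parsed incrementally, so the whole file is never loaded at once.
    """
    title, matched = None, False
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            if needle in (elem.text or ""):
                matched = True
        elif name == "page":
            if matched:
                yield title
            title, matched = None, False
            elem.clear()  # drop the finished page's children to limit memory use


# Tiny made-up sample standing in for a real dump (hypothetical data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>Alpha</title><revision><text>this has foobar in it</text></revision></page>
<page><title>Beta</title><revision><text>nothing to see</text></revision></page>
</mediawiki>"""

matches = list(titles_containing(io.BytesIO(SAMPLE), "foobar"))
```

For a compressed dump, the same function should work on a stream opened with bz2.open(path, "rb"), where path is whichever dump file was downloaded. A substring scan like this is linear in the dump size on every query; for repeated queries, building an index as suggested in the channel would likely be the better trade-off.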
[23:41:40] 10Operations, 10Certcentral, 10Traffic, 10Patch-For-Review: Create and deploy a centralized letsencrypt service - https://phabricator.wikimedia.org/T194962 (10Krenair) [23:41:43] 10Operations, 10Traffic, 10HTTPS, 10Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548 (10Krenair) [23:41:50] 10Operations, 10Certcentral, 10Traffic, 10Patch-For-Review: Create and deploy a centralized letsencrypt service - https://phabricator.wikimedia.org/T194962 (10Krenair) [23:41:56] 10Operations, 10Traffic, 10HTTPS, 10Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548 (10Krenair) [23:46:19] 10Operations, 10Certcentral, 10Traffic, 10Goal, 10Patch-For-Review: Deploy a scalable service for ACME (LetsEncrypt) certificate management - https://phabricator.wikimedia.org/T199711 (10Krenair) [23:46:21] 10Operations, 10Certcentral, 10Traffic: certcentral: challenge checking on *all* pooled backend hosts - https://phabricator.wikimedia.org/T203396 (10Krenair) [23:46:29] 10Operations, 10Certcentral, 10Traffic: certcentral: Provide script for certificate revocation - https://phabricator.wikimedia.org/T203423 (10Krenair) [23:46:31] 10Operations, 10Certcentral, 10Traffic, 10Goal, 10Patch-For-Review: Deploy a scalable service for ACME (LetsEncrypt) certificate management - https://phabricator.wikimedia.org/T199711 (10Krenair) [23:47:05] 10Operations, 10Certcentral, 10Traffic, 10Goal, 10Patch-For-Review: Deploy a scalable service for ACME (LetsEncrypt) certificate management - https://phabricator.wikimedia.org/T199711 (10Krenair) @Vgutierrez: I'm thinking we should close this and open a new task about improving our certcentral setup to t... 
[23:47:26] 10Operations, 10Certcentral, 10Traffic, 10Goal, 10Patch-For-Review: Deploy a scalable service for ACME (LetsEncrypt) certificate management - https://phabricator.wikimedia.org/T199711 (10Krenair) [23:51:33] 10Operations, 10Certcentral, 10Traffic, 10Goal, 10Patch-For-Review: Deploy a scalable service for ACME (LetsEncrypt) certificate management - https://phabricator.wikimedia.org/T199711 (10Krenair) [23:51:44] 10Operations, 10Certcentral, 10Traffic, 10Goal, 10Patch-For-Review: Deploy a scalable service for ACME (LetsEncrypt) certificate management - https://phabricator.wikimedia.org/T199711 (10Krenair) [23:53:08] 10Operations, 10Certcentral, 10Traffic, 10Goal, 10Patch-For-Review: Deploy a scalable service for ACME (LetsEncrypt) certificate management - https://phabricator.wikimedia.org/T199711 (10Krenair) [23:53:12] 10Operations, 10Traffic, 10HTTPS, 10Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548 (10Krenair) [23:53:16] 10Operations, 10Certcentral, 10Traffic, 10Patch-For-Review: Create and deploy a centralized letsencrypt service - https://phabricator.wikimedia.org/T194962 (10Krenair) 05Open>03Resolved I'm just boldly marking this as resolved but feel free to revert if you disagree [23:55:07] 10Operations, 10Certcentral, 10Traffic, 10Goal, 10Patch-For-Review: Deploy a scalable service for ACME (LetsEncrypt) certificate management - https://phabricator.wikimedia.org/T199711 (10Krenair) 05Open>03Resolved I've rearranged the structure of these tasks to be logical and this has no more open su... 
[23:56:54] 10Operations, 10Certcentral, 10Traffic, 10Goal, 10Patch-For-Review: Deploy a scalable service for ACME (LetsEncrypt) certificate management - https://phabricator.wikimedia.org/T199711 (10Krenair) [23:56:56] 10Operations, 10Certcentral, 10Traffic: certcentral: delay deployment of renewed certs to wait out skewed client clocks - https://phabricator.wikimedia.org/T204997 (10Krenair) [23:57:18] 10Operations, 10Certcentral, 10Traffic, 10Goal, 10Patch-For-Review: Deploy a scalable service for ACME (LetsEncrypt) certificate management - https://phabricator.wikimedia.org/T199711 (10Krenair) [23:57:21] 10Operations, 10Certcentral, 10Traffic: Integrate certspotter with certcentral to avoid certspotter notifying us on legitimate certs generated by our certcentral boxes - https://phabricator.wikimedia.org/T204994 (10Krenair) [23:59:26] 10Operations, 10Traffic: Update certspotter - https://phabricator.wikimedia.org/T204993 (10Krenair) @faidon: Is this now done?