[00:47:48] PROBLEM - Host labnet1001.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[00:47:48] PROBLEM - Host labcontrol1002.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[00:52:58] RECOVERY - Host labnet1001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.16 ms
[00:52:58] RECOVERY - Host labcontrol1002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 2.07 ms
[01:18:12] (CR) TerraCodes: "> Is there a SWAT deployer to deploy this patch? If this patch is no" [mediawiki-config] - https://gerrit.wikimedia.org/r/406487 (https://phabricator.wikimedia.org/T184866) (owner: 星耀晨曦)
[01:45:18] PROBLEM - HHVM jobrunner on mw1335 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[01:46:18] RECOVERY - HHVM jobrunner on mw1335 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[02:38:39] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.20) (duration: 11m 40s)
[02:38:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:26:58] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 851.97 seconds
[04:00:08] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 225.66 seconds
[05:59:31] (CR) 星耀晨曦: "> > Is there a SWAT deployer to deploy this patch? If this patch is" [mediawiki-config] - https://gerrit.wikimedia.org/r/406487 (https://phabricator.wikimedia.org/T184866) (owner: 星耀晨曦)
[06:05:30] I get an error while using VFS on Commons (copy-paste of the message not working): "API request failed..."
[06:05:49] here https://commons.wikimedia.org/wiki/Special:Contributions/Kanhaiya_sao
[06:11:57] https://phabricator.wikimedia.org/T187016
[06:27:16] I am not the only one ^
[06:40:38] !log Drop dewiki database from s8 servers - T184599
[06:40:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:52] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[06:47:28] (PS1) Marostegui: db-eqiad.php: Depool db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409732 (https://phabricator.wikimedia.org/T184599)
[06:50:00] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409732 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[06:51:41] (Merged) jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409732 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[06:51:55] (CR) jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409732 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[06:53:13] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1109 - T184599 (duration: 00m 56s)
[06:53:25] !log Reboot db1109 to pick up new kernel
[06:53:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:53:26] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[06:53:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:26] (PS1) Marostegui: db-eqiad.php: Slowly repool db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409735
[07:02:43] (CR) Marostegui: [C: 2] db-eqiad.php: Slowly repool db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409735 (owner: Marostegui)
[07:04:16] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409735 (owner: Marostegui)
[07:05:27] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1109 - T184599 (duration: 00m 55s)
[07:05:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:05:40] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[07:07:12] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409735 (owner: Marostegui)
[07:11:15] (PS1) Marostegui: db-eqiad.php: Increase weight for db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409736
[07:15:48] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: Traceback (most recent call last)
[07:16:39] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: Traceback (most recent call last)
[07:16:39] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: Traceback (most recent call last)
[07:17:39] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: Traceback (most recent call last)
[07:17:45] (CR) Marostegui: [C: 2] db-eqiad.php: Increase weight for db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409736 (owner: Marostegui)
[07:17:58] PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: Traceback (most recent call last)
[07:18:18] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: Traceback (most recent call last)
[07:19:23] (Merged) jenkins-bot: db-eqiad.php: Increase weight for db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409736 (owner: Marostegui)
[07:19:33] (CR) jenkins-bot: db-eqiad.php: Increase weight for db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409736 (owner: Marostegui)
[07:20:48] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1109 - T184599 (duration: 00m 55s)
[07:21:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:21:01] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[07:25:58] PROBLEM - Disk space on analytics1029 is CRITICAL: DISK CRITICAL - /sys/kernel/debug/tracing is not accessible: Permission denied
[07:28:03] (PS1) Marostegui: db-eqiad.php: Increase weight for db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409738
[07:32:14] (CR) Marostegui: [C: 2] db-eqiad.php: Increase weight for db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409738 (owner: Marostegui)
[07:33:18] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 0 probes of 303 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[07:33:21] I am checking an1029 :)
[07:33:45] (Merged) jenkins-bot: db-eqiad.php: Increase weight for db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409738 (owner: Marostegui)
[07:35:16] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1109 - T184599 (duration: 00m 55s)
[07:35:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:29] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[07:35:48] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 1 probes of 307 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map
[07:36:39] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 10 probes of 288 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[07:36:42] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 11 probes of 289 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[07:37:02] (CR) jenkins-bot: db-eqiad.php: Increase weight for db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/409738 (owner: Marostegui)
[07:37:39] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 11 probes of 288 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[07:37:58] RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 0 probes of 306 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map
[07:43:37] (PS1) Urbanecm: Require 7 days & 10 edits for autoconfirmed at zhwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/409743 (https://phabricator.wikimedia.org/T187018)
[07:44:06] (PS1) Marostegui: db-eqiad.php: Fully repool db1109, depool db1104 [mediawiki-config] - https://gerrit.wikimedia.org/r/409744 (https://phabricator.wikimedia.org/T184599)
[07:45:11] (PS1) Urbanecm: Add suppressredirect to autoconfirmed at zhwikt [mediawiki-config] - https://gerrit.wikimedia.org/r/409745
[07:46:43] (CR) Marostegui: [C: 2] db-eqiad.php: Fully repool db1109, depool db1104 [mediawiki-config] - https://gerrit.wikimedia.org/r/409744 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[07:46:55] (PS1) Chad: Moving Sentry to CommonSettings/extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/409750
[07:46:59] !log installing exim security updates on remaining hosts
[07:47:09] RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational
[07:47:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:10] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1109, depool db1104 [mediawiki-config] - https://gerrit.wikimedia.org/r/409744 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[07:48:20] (CR) jenkins-bot: db-eqiad.php: Fully repool db1109, depool db1104 [mediawiki-config] - https://gerrit.wikimedia.org/r/409744 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[07:49:15] Operations, hardware-requests: Replace spinning disks with SSDs in conf1004-6.eqiad.wmnet - https://phabricator.wikimedia.org/T187022#3962142 (Joe)
[07:49:23] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1109, depool db1104 - T184599 (duration: 00m 55s)
[07:49:28] Operations, hardware-requests: Replace spinning disks with SSDs in conf1004-6.eqiad.wmnet - https://phabricator.wikimedia.org/T187022#3962154 (Joe) p:Triage>Normal a:RobH
[07:49:33] (PS1) Urbanecm: Enable flood flag at zhwikt [mediawiki-config] - https://gerrit.wikimedia.org/r/409752 (https://phabricator.wikimedia.org/T187018)
[07:49:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:38] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[07:50:28] (PS2) Urbanecm: Add suppressredirect to autoconfirmed at zhwikt [mediawiki-config] - https://gerrit.wikimedia.org/r/409745 (https://phabricator.wikimedia.org/T187018)
[07:51:13] (CR) jerkins-bot: [V: -1] Enable flood flag at zhwikt [mediawiki-config] - https://gerrit.wikimedia.org/r/409752 (https://phabricator.wikimedia.org/T187018) (owner: Urbanecm)
[07:54:24] (PS1) Marostegui: db-eqiad.php: Repool db1104, depool db1101:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/409753 (https://phabricator.wikimedia.org/T184599)
[07:55:38] (PS2) Urbanecm: Enable flood flag at zhwikt [mediawiki-config] - https://gerrit.wikimedia.org/r/409752 (https://phabricator.wikimedia.org/T187018)
[08:01:14] !log Upgrading CI Jenkins plugins
[08:01:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:01:39] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1104, depool db1101:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/409753 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:03:12] (Merged) jenkins-bot: db-eqiad.php: Repool db1104, depool db1101:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/409753 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:03:26] (CR) jenkins-bot: db-eqiad.php: Repool db1104, depool db1101:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/409753 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:05:20] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1104, depool db1101:3318 - T184599 (duration: 00m 55s)
[08:05:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:34] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[08:08:54] (PS1) Marostegui: db-eqiad.php: Repool db1101:3318, depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/409760 (https://phabricator.wikimedia.org/T184599)
[08:11:13] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1101:3318, depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/409760 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:13:22] (Merged) jenkins-bot: db-eqiad.php: Repool db1101:3318, depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/409760 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:13:35] (CR) jenkins-bot: db-eqiad.php: Repool db1101:3318, depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/409760 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:14:46] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1101:3318, depool db1099:3318 - T184599 (duration: 00m 55s)
[08:14:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:14:58] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[08:20:43] (PS1) Marostegui: db-eqiad.php: Repool db1099:3318, depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/409761 (https://phabricator.wikimedia.org/T184599)
[08:22:37] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1099:3318, depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/409761 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:24:16] (Merged) jenkins-bot: db-eqiad.php: Repool db1099:3318, depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/409761 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:25:29] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1099:3318, depool db1092 - T184599 (duration: 00m 55s)
[08:25:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:42] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[08:26:50] (CR) jenkins-bot: db-eqiad.php: Repool db1099:3318, depool db1092 [mediawiki-config] - https://gerrit.wikimedia.org/r/409761 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:28:13] (PS1) Marostegui: db-eqiad.php: Repool db1092, depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/409811 (https://phabricator.wikimedia.org/T184599)
[08:29:49] PROBLEM - puppet last run on mw2109 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[exim4-config]
[08:29:58] PROBLEM - puppet last run on mw2183 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[exim4-config],Package[exim4-daemon-light]
[08:30:36] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1092, depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/409811 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:30:49] PROBLEM - puppet last run on mw2135 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[exim4-config],Package[exim4-daemon-light]
[08:31:49] ^those are harmless
[08:31:58] PROBLEM - puppet last run on mw2195 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[exim4-config]
[08:32:06] morning moritzm :)
[08:32:51] (Merged) jenkins-bot: db-eqiad.php: Repool db1092, depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/409811 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:33:00] (CR) jenkins-bot: db-eqiad.php: Repool db1092, depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/409811 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:34:22] morning :-)
[08:34:34] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1092, depool db1087 - T184599 (duration: 00m 55s)
[08:34:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:47] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[08:34:48] PROBLEM - puppet last run on mw2221 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[exim4-config]
[08:34:49] RECOVERY - puppet last run on mw2109 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[08:34:58] RECOVERY - puppet last run on mw2183 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[08:35:48] RECOVERY - puppet last run on mw2135 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:36:09] (PS1) Marostegui: db-eqiad.php: Repool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/409812 (https://phabricator.wikimedia.org/T184599)
[08:36:29] !log Reboot db1087 to pick new kernel
[08:36:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:50] * elukey looks for marostegui's alter tables
[08:36:58] RECOVERY - puppet last run on mw2195 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[08:37:08] PROBLEM - Disk space on krypton is CRITICAL: DISK CRITICAL - /var/spool/exim4/db is not accessible: Permission denied
[08:37:12] elukey: they are coming as soon as I finish my current task in a sec
[08:37:16] hahahhaa
[08:37:20] <3
[08:37:47] marostegui: I'd also start the week with the following question: cumin?
[08:38:08] Yeah, we all love CUMIN
[08:38:38] PROBLEM - puppet last run on radon is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[exim4-config],Package[exim4-daemon-light]
[08:39:39] RECOVERY - puppet last run on mw2221 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[08:40:25] yup.. I discovered cumin during volans talk in FOSDEM... pretty cool tool :)
[08:40:43] vgutierrez!!! o/ welcome!
[08:41:21] thx marostegui :D
[08:43:29] RECOVERY - puppet last run on radon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:44:04] vgutierrez: welcome Valentin!
[08:44:09] RECOVERY - Disk space on krypton is OK: DISK OK
[08:44:29] hi moritzm :D
[08:45:25] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/409812 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:46:51] hi vgutierrez :D
[08:48:58] (Merged) jenkins-bot: db-eqiad.php: Repool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/409812 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:49:09] (CR) jenkins-bot: db-eqiad.php: Repool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/409812 (https://phabricator.wikimedia.org/T184599) (owner: Marostegui)
[08:50:18] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1087 - T184599 (duration: 00m 55s)
[08:50:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:31] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[08:52:49] hey hey vgutierrez
[08:57:15] !log installing glibc security updates on trusty (harmless in our environment; CVE-2018-1000001 is non-exploitable due to disabled unprivileged user name spaces)
[08:57:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:58:41] (PS1) Marostegui: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/409815 (https://phabricator.wikimedia.org/T162807)
[09:02:44] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/409815 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[09:04:20] (Merged) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/409815 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[09:05:28] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1089 - T162807 (duration: 00m 55s)
[09:05:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:41] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[09:05:41] !log Stop replication in sync on db1089 and db2048 - T162807
[09:05:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:01] (CR) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/409815 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[09:07:29] (PS1) Muehlenhoff: Add library hint for glibc [puppet] - https://gerrit.wikimedia.org/r/409817
[09:14:32] (CR) Muehlenhoff: [C: 2] Add library hint for glibc [puppet] - https://gerrit.wikimedia.org/r/409817 (owner: Muehlenhoff)
[09:19:21] (CR) Alexandros Kosiaris: [C: 2] Add runtime dependency to pkg_resources [software/service-checker] - https://gerrit.wikimedia.org/r/409374 (owner: Alexandros Kosiaris)
[09:19:23] !log Deploy schema change on s5 - T185128 T153182
[09:19:29] elukey: ^
[09:19:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:39] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182
[09:19:39] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128
[09:21:12] (PS1) Filippo Giunchedi: prometheus: tweak varnish aggregation rules [puppet] - https://gerrit.wikimedia.org/r/409820 (https://phabricator.wikimedia.org/T177195)
[09:21:44] Operations, ops-eqiad, User-Eevans: Degraded RAID on restbase-dev1006 - https://phabricator.wikimedia.org/T185494#3962449 (faidon) a:RobH
[09:23:48] PROBLEM - Disk space on stat1004 is CRITICAL: DISK CRITICAL - free space: / 197 MB (0% inode=92%)
[09:23:54] on it --^
[09:24:48] RECOVERY - Disk space on stat1004 is OK: DISK OK
[09:25:49] !log install swift stretch updates on ms-be eqiad - T177739
[09:26:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:03] T177739: Integrate stretch 9.2 point release - https://phabricator.wikimedia.org/T177739
[09:26:38] (CR) Volans: [C: 1] "LGTM! I hope to be able to work on the "spinoff" soon-ish to make tasks like this one much simpler and cleaner to write." [puppet] - https://gerrit.wikimedia.org/r/409047 (https://phabricator.wikimedia.org/T168529) (owner: Ema)
[09:28:34] (PS2) Filippo Giunchedi: prometheus: tweak varnish aggregation rules [puppet] - https://gerrit.wikimedia.org/r/409820 (https://phabricator.wikimedia.org/T177195)
[09:28:40] (CR) Filippo Giunchedi: [C: 2] prometheus: tweak varnish aggregation rules [puppet] - https://gerrit.wikimedia.org/r/409820 (https://phabricator.wikimedia.org/T177195) (owner: Filippo Giunchedi)
[09:29:53] !log installing libdatetime-timezone-perl SUA update
[09:30:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:48] PROBLEM - Check systemd state on ms-be1027 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:34:08] PROBLEM - Check systemd state on ms-be1025 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:34:25] that's me ^
[09:35:48] RECOVERY - Check systemd state on ms-be1027 is OK: OK - running: The system is fully operational
[09:36:58] PROBLEM - Check systemd state on ms-be1022 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:38:08] RECOVERY - Check systemd state on ms-be1025 is OK: OK - running: The system is fully operational
[09:39:58] RECOVERY - Check systemd state on ms-be1022 is OK: OK - running: The system is fully operational
[09:41:09] (PS1) Marostegui: db-eqiad.php: Depool db1067 [mediawiki-config] - https://gerrit.wikimedia.org/r/409824 (https://phabricator.wikimedia.org/T162807)
[09:42:44] Operations, ops-eqiad: Degraded RAID on ms-be1018 - https://phabricator.wikimedia.org/T186988#3962491 (fgiunchedi) a:Cmjohnson @Cmjohnson looks like the BBU battery, similar to T171183 and T166777
[09:43:30] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1067 [mediawiki-config] - https://gerrit.wikimedia.org/r/409824 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[09:45:04] (Merged) jenkins-bot: db-eqiad.php: Depool db1067 [mediawiki-config] - https://gerrit.wikimedia.org/r/409824 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[09:45:56] Operations, Traffic: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3962498 (ema)
[09:46:24] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1067 - T162807 (duration: 00m 56s)
[09:46:34] !log Stop replication in sync on db1089 and db1067 - T162807
[09:46:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:38] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[09:46:42] Operations, Traffic: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3962510 (ema) p:Triage>Normal
[09:46:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:03] (CR) jenkins-bot: db-eqiad.php: Depool db1067 [mediawiki-config] - https://gerrit.wikimedia.org/r/409824 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[09:47:58] Operations, Ops-Access-Requests, Traffic: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3962498 (ema)
[09:50:29] Operations, Patch-For-Review, Prometheus-metrics-monitoring: prometheus: ganglia-gen and outdated Ganglia:cluster resource name - https://phabricator.wikimedia.org/T186918#3962525 (fgiunchedi) p:Triage>Normal
[09:50:49] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1067" [mediawiki-config] - https://gerrit.wikimedia.org/r/409827
[09:51:01] !log reboot mw1302 (hhvm defunct processes, hungs registered in dmesg, very high load)
[09:51:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:53:48] RECOVERY - HHVM jobrunner on mw1302 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[09:54:18] RECOVERY - Nginx local proxy to apache on mw1302 is OK: HTTP OK: HTTP/1.1 200 OK - 245 bytes in 0.007 second response time
[09:54:30] (PS7) Muehlenhoff: Remove access for myself [puppet] - https://gerrit.wikimedia.org/r/407577 (https://phabricator.wikimedia.org/T186289) (owner: Yuvipanda)
[09:54:55] Operations, Ops-Access-Requests, Traffic: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3962548 (ema)
[09:55:00] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1067" [mediawiki-config] - https://gerrit.wikimedia.org/r/409827 (owner: Marostegui)
[09:56:20] (CR) Muehlenhoff: [C: 2] Remove access for myself [puppet] - https://gerrit.wikimedia.org/r/407577 (https://phabricator.wikimedia.org/T186289) (owner: Yuvipanda)
[09:56:33] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1067" [mediawiki-config] - https://gerrit.wikimedia.org/r/409827 (owner: Marostegui)
[09:56:58] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1067" [mediawiki-config] - https://gerrit.wikimedia.org/r/409827 (owner: Marostegui)
[09:57:00] (PS2) Ema: lvs: don't bind prometheus-node-exporter on INADDR_ANY [puppet] - https://gerrit.wikimedia.org/r/409338 (https://phabricator.wikimedia.org/T176182)
[09:57:22] (CR) Ema: [V: 2 C: 2] lvs: don't bind prometheus-node-exporter on INADDR_ANY [puppet] - https://gerrit.wikimedia.org/r/409338 (https://phabricator.wikimedia.org/T176182) (owner: Ema)
[09:57:38] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1067 - T162807 (duration: 00m 55s)
[09:57:48] (CR) Volans: "See a couple of questions inline" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/406535 (https://phabricator.wikimedia.org/T185862) (owner: Dzahn)
[09:57:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:51] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[09:59:30] PROBLEM - DPKG on ms-be1032 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[10:00:08] (PS1) Marostegui: db-eqiad.php: Depool db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/409828 (https://phabricator.wikimedia.org/T162807)
[10:00:20] RECOVERY - DPKG on ms-be1032 is OK: All packages OK
[10:02:34] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/409828 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[10:03:17] Operations, media-storage: xfs_db blocked / timeout on ms-be2023 - https://phabricator.wikimedia.org/T185298#3962559 (fgiunchedi) p:Triage>Normal
[10:03:37] Operations, Wikidata: Badges not displaying on trwiki - https://phabricator.wikimedia.org/T186815#3962560 (fgiunchedi) p:Triage>Normal
[10:04:18] Operations, Continuous-Integration-Infrastructure, Packaging, Zuul: Upload new zuul and jenkins-debian-glue packages to apt.wikimedia.org - https://phabricator.wikimedia.org/T186786#3962573 (fgiunchedi) p:Triage>Normal
[10:04:38] (CR) Volans: "See inline" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/409443 (owner: Herron)
[10:04:40] Operations, ORES, Scoring-platform-team: Clean up redundant ORES celery_workers defaults - https://phabricator.wikimedia.org/T186734#3962586 (fgiunchedi) p:Triage>Low
[10:04:49] (Merged) jenkins-bot: db-eqiad.php: Depool db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/409828 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[10:04:55] Operations, Maps-Sprint, Traffic: Decide on Cache-Control headers for map tiles - https://phabricator.wikimedia.org/T186732#3962587 (fgiunchedi) p:Triage>Normal
[10:06:15] Operations, Cassandra, RESTBase-Cassandra, Services (doing), and 2 others: Upload cassandra package(s) to wikimedia apt repository - https://phabricator.wikimedia.org/T186619#3962609 (fgiunchedi) p:Triage>Normal
[10:06:29] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1066 - T162807 (duration: 00m 55s)
[10:06:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:46] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[10:06:51] Operations, docker-pkg: Allow selecting which images to build - https://phabricator.wikimedia.org/T186416#3962611 (fgiunchedi) p:Triage>Normal
[10:07:02] !log Stop replication in sync on db1089 and db1066 - T162807
[10:07:03] (CR) jenkins-bot: db-eqiad.php: Depool db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/409828 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[10:07:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:14] (PS6) Ema: wmf-upgrade-varnish: initial release [puppet] - https://gerrit.wikimedia.org/r/409047 (https://phabricator.wikimedia.org/T168529)
[10:07:33] (CR) Ema: [V: 2 C: 2] wmf-upgrade-varnish: initial release [puppet] - https://gerrit.wikimedia.org/r/409047 (https://phabricator.wikimedia.org/T168529) (owner: Ema)
[10:08:40] PROBLEM - DPKG on ms-be1031 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[10:09:40] RECOVERY - DPKG on ms-be1031 is OK: All packages OK
[10:10:52] (PS1) Filippo Giunchedi: admin: revoke access for shrlak [puppet] - https://gerrit.wikimedia.org/r/409832 (https://phabricator.wikimedia.org/T186614)
[10:13:00] Operations, monitoring, Patch-For-Review: Evaluate Grafana's LDAP group options and deprecate grafana-admin if possible - https://phabricator.wikimedia.org/T170150#3962628 (akosiaris) Unfortunately, this aint gonna happen today. I 've had no time to test the migration yet and it would irresponsible t...
[10:17:38] (CR) Muehlenhoff: [C: -1] admin: revoke access for shrlak (1 comment) [puppet] - https://gerrit.wikimedia.org/r/409832 (https://phabricator.wikimedia.org/T186614) (owner: Filippo Giunchedi)
[10:18:20] PROBLEM - DPKG on ms-be1033 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[10:19:20] RECOVERY - DPKG on ms-be1033 is OK: All packages OK
[10:21:49] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1066" [mediawiki-config] - https://gerrit.wikimedia.org/r/409833
[10:21:59] (CR) Filippo Giunchedi: admin: revoke access for shrlak (1 comment) [puppet] - https://gerrit.wikimedia.org/r/409832 (https://phabricator.wikimedia.org/T186614) (owner: Filippo Giunchedi)
[10:22:46] (PS2) Filippo Giunchedi: admin: revoke access for shrlak [puppet] - https://gerrit.wikimedia.org/r/409832 (https://phabricator.wikimedia.org/T186614)
[10:22:53] Operations, Wikidata: Badges not displaying on trwiki - https://phabricator.wikimedia.org/T186815#3956134 (jcrespo) This is maybe a defect on the deployment of the badges extension/Wikidata or a breakage on apache redirects. However, I have open other languages, that show the image on the browser, and al...
[10:23:20] PROBLEM - DPKG on ms-be1034 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[10:24:20] RECOVERY - DPKG on ms-be1034 is OK: All packages OK
[10:25:30] Operations, Wikidata: Badges not displaying on trwiki - https://phabricator.wikimedia.org/T186815#3962684 (jcrespo)
[10:27:50] PROBLEM - DPKG on ms-be1036 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[10:28:50] RECOVERY - DPKG on ms-be1036 is OK: All packages OK
[10:32:48] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1066" [mediawiki-config] - https://gerrit.wikimedia.org/r/409833 (owner: Marostegui)
[10:34:15] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1066" [mediawiki-config] - https://gerrit.wikimedia.org/r/409833 (owner: Marostegui)
[10:36:47] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1066 - T162807 (duration: 00m 55s)
[10:36:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:00] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[10:37:01] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1066" [mediawiki-config] - https://gerrit.wikimedia.org/r/409833 (owner: Marostegui)
[10:41:51] Operations, Datasets-General-or-Unknown, Dumps-Generation, hardware-requests: Give misc dump crons their own host - https://phabricator.wikimedia.org/T181936#3962761 (ArielGlenn) @hoo, hopefully you are back and recovered from the various trips and things, could you please give more detail on the...
[10:44:47] (PS1) Muehlenhoff: Remove absented users from Icinga config [puppet] - https://gerrit.wikimedia.org/r/409837
[10:49:34] (CR) Filippo Giunchedi: "LGTM, would be nice to move these to a template + variable" [puppet] - https://gerrit.wikimedia.org/r/409837 (owner: Muehlenhoff)
[10:54:00] (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/409840 (https://phabricator.wikimedia.org/T128546)
[10:54:11] PROBLEM - DPKG on ms-be1035 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[10:55:11] RECOVERY - DPKG on ms-be1035 is OK: All packages OK
[10:56:10] (PS2) Muehlenhoff: Remove absented users from Icinga config [puppet] - https://gerrit.wikimedia.org/r/409837
[10:58:01] (CR) Muehlenhoff: [C: 2] Remove absented users from Icinga config [puppet] - https://gerrit.wikimedia.org/r/409837 (owner: Muehlenhoff)
[11:00:05] jan_drewniak: It is that lovely time of the day again! You are hereby commanded to deploy Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180212T1100).
[11:00:05] No GERRIT patches in the queue for this window AFAICS.
[11:01:43] (CR) Elukey: [V: 2 C: 2] Initial packaging [debs/prometheus-burrow-exporter] (debian) - https://gerrit.wikimedia.org/r/409310 (https://phabricator.wikimedia.org/T180442) (owner: Elukey)
[11:03:57] (CR) Jdrewniak: [C: 2] Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/409840 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[11:06:03] PROBLEM - DPKG on ms-be1029 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[11:06:46] (CR) Hashar: [C: 1] "deployment-mediawiki07 (no idea what it is for) is using Stretch so seems we will need the same package for Stretch as well and remember " [puppet] - https://gerrit.wikimedia.org/r/409018 (owner: Muehlenhoff)
[11:07:03] RECOVERY - DPKG on ms-be1029 is OK: All packages OK
[11:07:50] (Merged) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/409840 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[11:08:01] (CR) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/409840 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[11:08:29] (CR) Volans: "Nice to have it in Python! Thanks for taking care of it. See few comments/questions inline." (13 comments) [puppet] - https://gerrit.wikimedia.org/r/409054 (https://phabricator.wikimedia.org/T181410) (owner: Filippo Giunchedi)
[11:10:00] (CR) Muehlenhoff: "Thanks. This is only necessary on jessie, stretch already has the modern ICU from the start." [puppet] - https://gerrit.wikimedia.org/r/409018 (owner: Muehlenhoff)
[11:11:03] PROBLEM - DPKG on ms-be1037 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[11:12:03] RECOVERY - DPKG on ms-be1037 is OK: All packages OK
[11:14:18] (PS3) Muehlenhoff: Add apt configuration to switch deployment-prep to the ICU57-enabled HHVM build [puppet] - https://gerrit.wikimedia.org/r/409018
[11:14:24] !log jdrewniak@tin Synchronized portals/prod/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:402805|Bumping portals to master (T128546)]] (duration: 00m 57s)
[11:14:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:14:37] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[11:15:22] !log jdrewniak@tin Synchronized portals: Wikimedia Portals Update: [[gerrit:402805|Bumping portals to master (T128546)]] (duration: 00m 58s)
[11:15:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:15:57] (CR) Muehlenhoff: [C: 2] Add apt configuration to switch deployment-prep to the ICU57-enabled HHVM build [puppet] - https://gerrit.wikimedia.org/r/409018 (owner: Muehlenhoff)
[11:21:58] (PS1) Filippo Giunchedi: prometheus: fix aggregation rules for ops [puppet] - https://gerrit.wikimedia.org/r/409843
[11:22:35] (CR) Filippo Giunchedi: [C: 2] prometheus: fix aggregation rules for ops [puppet] - https://gerrit.wikimedia.org/r/409843 (owner: Filippo Giunchedi)
[11:22:42] (PS2) Filippo Giunchedi: prometheus: fix aggregation rules for ops [puppet] - https://gerrit.wikimedia.org/r/409843
[11:28:20] (PS1) Vgutierrez: Add vgutierrez shell account in ops [puppet] - https://gerrit.wikimedia.org/r/409844 (https://phabricator.wikimedia.org/T187035)
[11:38:52] (PS1) Ema: pybal::monitoring: use host IP address in check_pybal_ipvs_diff [puppet] - https://gerrit.wikimedia.org/r/409846 (https://phabricator.wikimedia.org/T176182)
[11:39:12] (CR) Giuseppe Lavagetto: [C: 2] Add Python 3 support [software/conftool] - https://gerrit.wikimedia.org/r/387544 (owner: Volans)
[11:43:36] (CR) Ema: "All good according to PCC: https://puppet-compiler.wmflabs.org/compiler02/9926/" [puppet] - https://gerrit.wikimedia.org/r/409846 (https://phabricator.wikimedia.org/T176182) (owner: Ema)
[11:43:47] (CR) Muehlenhoff: Add vgutierrez shell account in ops (1 comment) [puppet] - https://gerrit.wikimedia.org/r/409844 (https://phabricator.wikimedia.org/T187035) (owner: Vgutierrez)
[11:45:18] (CR) Giuseppe Lavagetto: [C: 2] cli.tool: drop the "find" interface [software/conftool] - https://gerrit.wikimedia.org/r/405301 (owner: Giuseppe Lavagetto)
[11:46:17] (PS1) Giuseppe Lavagetto: conftool::scripts: convert to using select instead of find [puppet] - https://gerrit.wikimedia.org/r/409850
[11:46:31] <_joe_> volans: ^^
[11:46:39] yeah, watching :D
[11:47:42] (CR) Filippo Giunchedi: [C: 1] pybal::monitoring: use host IP address in check_pybal_ipvs_diff [puppet] - https://gerrit.wikimedia.org/r/409846 (https://phabricator.wikimedia.org/T176182) (owner: Ema)
[11:49:07] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/409850 (owner: Giuseppe Lavagetto)
[11:49:13] (PS3) Giuseppe Lavagetto: Add preemptive validation. [software/conftool] - https://gerrit.wikimedia.org/r/405302 (https://phabricator.wikimedia.org/T185080)
[11:49:30] (CR) Ema: [C: 2] pybal::monitoring: use host IP address in check_pybal_ipvs_diff [puppet] - https://gerrit.wikimedia.org/r/409846 (https://phabricator.wikimedia.org/T176182) (owner: Ema)
[11:49:53] (CR) Volans: [C: 1] "LGTM" [software/conftool] - https://gerrit.wikimedia.org/r/405302 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[11:50:20] (CR) Giuseppe Lavagetto: [C: 2] Add preemptive validation. [software/conftool] - https://gerrit.wikimedia.org/r/405302 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[11:51:20] (Merged) jenkins-bot: Add preemptive validation. [software/conftool] - https://gerrit.wikimedia.org/r/405302 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[11:53:08] (CR) Muehlenhoff: [C: 1] admin: revoke access for shrlak [puppet] - https://gerrit.wikimedia.org/r/409832 (https://phabricator.wikimedia.org/T186614) (owner: Filippo Giunchedi)
[11:56:42] (CR) Giuseppe Lavagetto: Refactor conftool.action, add the edit action (1 comment) [software/conftool] - https://gerrit.wikimedia.org/r/405303 (owner: Giuseppe Lavagetto)
[11:57:36] (PS2) Vgutierrez: Add vgutierrez shell account in ops [puppet] - https://gerrit.wikimedia.org/r/409844 (https://phabricator.wikimedia.org/T187035)
[11:59:35] (PS18) ArielGlenn: [WIP] php7 manifests for mediawiki on stretch [puppet] - https://gerrit.wikimedia.org/r/394977
[12:00:11] (PS5) Giuseppe Lavagetto: Refactor conftool.action, add the edit action [software/conftool] - https://gerrit.wikimedia.org/r/405303 (https://phabricator.wikimedia.org/T185080)
[12:01:25] (CR) jerkins-bot: [V: -1] Refactor conftool.action, add the edit action [software/conftool] - https://gerrit.wikimedia.org/r/405303 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[12:01:43] (CR) Vgutierrez: Add vgutierrez shell account in ops (1 comment) [puppet] - https://gerrit.wikimedia.org/r/409844 (https://phabricator.wikimedia.org/T187035) (owner: Vgutierrez)
[12:01:45] (CR) Filippo Giunchedi: "I think this broke puppet on graphite1003 and graphite2002:" [puppet] - https://gerrit.wikimedia.org/r/409200 (owner: Dzahn)
[12:01:46] <_joe_> ahah I forgot to change the test
[12:02:02] eheheh
[12:03:22] !log upgrading jessie-based servers in deployment-prep/beta to the HHVM build using ICU 57 (component/icu57)
[12:03:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:09:12] Puppet, AbuseFilter, User-MarcoAurelio: Setup puppet cron to delete old data daily - https://phabricator.wikimedia.org/T187053#3962999 (MarcoAurelio) p:Triage>Normal
[12:16:56] Operations, LDAP-Access-Requests: ldap/ops membership for vgutierrez - https://phabricator.wikimedia.org/T187055#3963035 (Vgutierrez) p:Triage>Normal
[12:20:43] (PS6) Giuseppe Lavagetto: Refactor conftool.action, add the edit action [software/conftool] - https://gerrit.wikimedia.org/r/405303 (https://phabricator.wikimedia.org/T185080)
[12:22:53] (CR) Volans: [C: 1] "LGTM" [software/conftool] - https://gerrit.wikimedia.org/r/405303 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[12:23:50] (CR) Giuseppe Lavagetto: [C: 2] Refactor conftool.action, add the edit action [software/conftool] - https://gerrit.wikimedia.org/r/405303 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[12:26:21] (CR) Alexandros Kosiaris: [C: -1] "I think I 've fixed this this time around correctly. Version 0.1.4-2 has python(3)-pkg-sources in Depends:" (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/405205 (https://phabricator.wikimedia.org/T184220) (owner: Dduvall)
[12:28:38] (CR) Alexandros Kosiaris: [C: -1] otrs: apache -> httpd module (2 comments) [puppet] - https://gerrit.wikimedia.org/r/409462 (owner: Dzahn)
[12:30:39] Puppet, AbuseFilter, User-MarcoAurelio: Setup puppet cron to delete old data daily - https://phabricator.wikimedia.org/T187053#3963141 (MarcoAurelio) I see that https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/mediawiki/manifests/maintenance/purge_abusefilter.pp is...
[12:35:19] Puppet, AbuseFilter, User-MarcoAurelio: Setup puppet cron to delete old data daily - https://phabricator.wikimedia.org/T187053#3963143 (MarcoAurelio) Weird: https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/profile/manifests/mediawiki/maintenance.pp;9bd6328c0f4499da...
[12:38:39] (PS1) Alexandros Kosiaris: Specify the ops-staff-group correctly [puppet] - https://gerrit.wikimedia.org/r/409871
[12:38:41] (PS1) Alexandros Kosiaris: Remove the default ops-staff-group [puppet] - https://gerrit.wikimedia.org/r/409872
[12:41:04] (PS2) Alexandros Kosiaris: Remove the now defunct ops-staff-group [puppet] - https://gerrit.wikimedia.org/r/409872
[12:43:24] PROBLEM - HHVM rendering on mw2207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:44:23] RECOVERY - HHVM rendering on mw2207 is OK: HTTP OK: HTTP/1.1 200 OK - 80299 bytes in 0.293 second response time
[12:45:46] Operations, Beta-Cluster-Infrastructure: Remove video scaler instances from deployment-prep - https://phabricator.wikimedia.org/T187063#3963166 (MoritzMuehlenhoff)
[12:45:49] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current): Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3963176 (akosiaris) >>! In T171851#3959677, @Halfak wrote: > I found this in our deploy repo. > > {P6677} > > Not sure what is going on as th...
[12:52:25] (PS4) Giuseppe Lavagetto: Safely load yaml files [software/conftool] - https://gerrit.wikimedia.org/r/408290 (https://phabricator.wikimedia.org/T185080)
[12:53:46] (CR) jerkins-bot: [V: -1] Safely load yaml files [software/conftool] - https://gerrit.wikimedia.org/r/408290 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[12:54:26] <_joe_> uhm
[12:56:59] (PS5) Giuseppe Lavagetto: Safely load yaml files [software/conftool] - https://gerrit.wikimedia.org/r/408290 (https://phabricator.wikimedia.org/T185080)
[13:00:04] (PS1) Muehlenhoff: Add a component for Cassandra 3.11 packages for stretch-wikimedia [puppet] - https://gerrit.wikimedia.org/r/409877 (https://phabricator.wikimedia.org/T186619)
[13:06:56] Puppet, AbuseFilter, User-MarcoAurelio: Setup puppet cron to delete old data daily - https://phabricator.wikimedia.org/T187053#3962999 (Reedy) Something isn't right for sure... Running `select * from abuse_filter_log where afl_ip <> "" ORDER BY afl_id limit 1;` on `enwiki` gives a row of with `afl_t...
[13:11:01] !log Deploy schema change on db2084 and db2075 - T185128 T153182
[13:11:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:16] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182
[13:11:16] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128
[13:12:33] (PS2) Arturo Borrero Gonzalez: apt: apt-upgrade: add switch for the node name output [puppet] - https://gerrit.wikimedia.org/r/409323 (https://phabricator.wikimedia.org/T181647)
[13:13:03] PROBLEM - HHVM rendering on mw2245 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:13:53] RECOVERY - HHVM rendering on mw2245 is OK: HTTP OK: HTTP/1.1 200 OK - 80343 bytes in 1.079 second response time
[13:30:23] Puppet, Beta-Cluster-Infrastructure, Wikidata, User-Addshore: mediawiki::maintenance::wikidata should not run crons for testwikidatawiki when used on labs / a testwikidatawiki doesnt exist - https://phabricator.wikimedia.org/T173357#3963295 (Addshore)
[13:37:19] (PS1) Marostegui: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/409880 (https://phabricator.wikimedia.org/T162807)
[13:39:20] Operations, HHVM, User-Elukey: Provide a forward port of ICU 52 for stretch / Investigate best ICU update strategy - https://phabricator.wikimedia.org/T177498#3963329 (MoritzMuehlenhoff) Beta/deployment-prep has been upgraded to an HHVM build using ICU 57.
[13:39:37] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/409880 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[13:40:33] Operations: Update ICU version to 55.1 - https://phabricator.wikimedia.org/T143931#3963343 (MoritzMuehlenhoff) Beta has been upgraded to ICU 57, we'll also upgrade production to that version at (no timeline established yet).
[13:41:16] (Merged) jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/409880 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[13:42:30] (CR) jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/409880 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[13:42:56] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 - T162807 (duration: 01m 06s)
[13:43:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:43:10] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[13:53:10] Operations, ops-codfw, Cloud-VPS: Connect labtestvirt2003 eth1 and eth2 interface(s) to switch fabric - https://phabricator.wikimedia.org/T183167#3963382 (chasemp) >>! In T183167#3950688, @chasemp wrote: > We sorted things out in real time and the definitive is: > > ```labtestvirt2001:eth0 = ge-5/0/...
[13:58:33] RECOVERY - Disk space on analytics1029 is OK: DISK OK
[14:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for European Mid-day SWAT(Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180212T1400).
[14:00:04] Jhs and Addshore: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:23] I can SWAT today
[14:00:56] addshore: want to deploy your patch yourself?
[14:01:08] I can do
[14:01:14] it needs a full scap, so I'll go last
[14:01:29] addshore: go ahead, since jhs is not around
[14:01:35] ack
[14:01:47] +2ed
[14:02:27] db1073 down?
[14:02:44] Operations, User-fgiunchedi: Integrate stretch 9.2 point release - https://phabricator.wikimedia.org/T177739#3668407 (fgiunchedi) Swift fully rolled out in eqiad/codfw
[14:03:19] seems overloaded, checking
[14:04:29] ongoing issues there from mediawiki perspective too
[14:05:10] jynus: want me to hold off on my sync (only i18n messages) or?
[14:05:25] if you were doing something, please pause for a minute
[14:05:34] even if unrelated, better not complicate stuff
[14:05:50] Yup, unrelated but ill pause, give me a ping once youve finished investigating
[14:05:58] I will
[14:07:12] there are 2 hosts that are under maintenance, that could create issues
[14:07:39] marostegui^
[14:07:48] but not sure if the core of the issue
[14:08:07] (CR) Zfilipin: [C: 1] Set category collation for nowikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/406022 (https://phabricator.wikimedia.org/T185630) (owner: Jon Harald Søby)
[14:08:17] it seems a blockage on show slave status, which could be due to a long running write
[14:08:17] zeljkof, :)
[14:08:27] (CR) Filippo Giunchedi: [C: 1] Add a component for Cassandra 3.11 packages for stretch-wikimedia [puppet] - https://gerrit.wikimedia.org/r/409877 (https://phabricator.wikimedia.org/T186619) (owner: Muehlenhoff)
[14:08:56] Operations, User-fgiunchedi: Integrate stretch 9.2 point release - https://phabricator.wikimedia.org/T177739#3963435 (MoritzMuehlenhoff) Open>Resolved Fully rolled out now.
[14:09:00] Jhs: swat is on hold because of unrelated problems, I'm reviewing your commits
[14:09:03] the contention started at 13:45 aprox
[14:09:17] (CR) Muehlenhoff: [C: 2] Add a component for Cassandra 3.11 packages for stretch-wikimedia [puppet] - https://gerrit.wikimedia.org/r/409877 (https://phabricator.wikimedia.org/T186619) (owner: Muehlenhoff)
[14:09:23] zeljkof, ok, thx
[14:09:43] https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&from=1518440977366&to=1518444577366&var-dc=eqiad%20prometheus%2Fops&var-server=db1073&var-port=9104
[14:09:48] seems solved now
[14:10:14] addshore: all yours
[14:10:20] thanks jynus!
[14:10:21] we are back to normal
[14:10:22] continuing
[14:10:45] there is, however, some long running transactions I do not like
[14:12:13] error: insufficient permission for adding an object to repository database .git/objects
[14:12:14] bah
[14:13:44] addshore: wait, what?!
[14:13:55] during my git rebase
[14:14:05] error: insufficient permission for adding an object to repository database .git/objects
[14:14:05] fatal: failed to write commit object
[14:14:16] um, how do you resolve that?
[14:14:32] should we ping ops?
[14:14:57] (Abandoned) Ottomata: Ensure specific librdkafka version for changeprop and eventstreams [puppet] - https://gerrit.wikimedia.org/r/404540 (https://phabricator.wikimedia.org/T176126) (owner: Ottomata)
[14:15:08] (CR) Zfilipin: [C: 1] Add 3 namespaces to wawiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/405258 (https://phabricator.wikimedia.org/T185289) (owner: Jon Harald Søby)
[14:16:11] I'll take a look on tin, addshore zeljkof
[14:16:18] thanks godog
[14:16:21] thanks godog
[14:16:25] I was just about to ping you :)
[14:17:27] addshore: can you paste more output from the error? I wanted to understand which directory that is
[14:17:49] which parent directory that is
[14:17:59] /srv/mediawiki-staging/php-1.31.0-wmf.20
[14:18:56] ugghhh yeah some directories there don't have g+w
[14:19:01] It looks like it has stopped half way through the rebase :)
[14:20:17] !log grant group write for wikidev on tin on /srv/mediawiki-staging/php-1.31.0-wmf.20/.git
[14:20:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:45] addshore: can you try aborting the rebase and do it again? should work
[14:21:00] godog: looks good :)
[14:21:07] (PS3) Zfilipin: Add 3 namespaces to wawiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/405258 (https://phabricator.wikimedia.org/T185289) (owner: Jon Harald Søby)
[14:21:33] sweet
[14:21:41] Right, sync time! :)
[14:21:44] Thanks!
[14:22:14] np, my guess would be sth didn't go according to plan when adding wmf.20
[14:22:21] !log otto@tin Started deploy [eventlogging/analytics@01d5761]: T186833
[14:22:26] !log otto@tin Finished deploy [eventlogging/analytics@01d5761]: T186833 (duration: 00m 04s)
[14:22:26] (PS3) Vgutierrez: Add vgutierrez shell account [puppet] - https://gerrit.wikimedia.org/r/409844 (https://phabricator.wikimedia.org/T187035)
[14:22:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:35] T186833: Include X-Client-IP in EventLogging data and geocode during Hive JSON Refinement - https://phabricator.wikimedia.org/T186833
[14:22:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:29] !log addshore@tin Started scap: T186612 [[gerrit:409063]] TwoColConflict wmf.20 (Remove hint and link from twoColConflict-beta-feature-description)
[14:24:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:43] T186612: TwoColConflict change feedback link in beta section - https://phabricator.wikimedia.org/T186612
[14:24:43] zeljkof: lets see how quick this full scap is :D
[14:24:50] not done one in a while
[14:24:55] addshore: should be quick
[14:31:11] i18n cache done, now syncing :)
[14:31:37] Operations, Scap: insufficient permission for adding an object to repository database .git/objects - https://phabricator.wikimedia.org/T187076#3963515 (fgiunchedi) p:Triage>Normal I'll hold back fixing naos too to leave time #release-engineering-team to inspect the situation.
[14:31:52] Operations, Scap: Deploy error: insufficient permission for adding an object to repository database .git/objects - https://phabricator.wikimedia.org/T187076#3963518 (fgiunchedi)
[14:32:01] filed as ^ zeljkof addshore
[14:32:15] also thcipriani ^
[14:32:16] thanks godog
[14:32:24] np! joys of clinic duty
[14:37:43] (PS2) Zfilipin: Set category collation for nowikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/406022 (https://phabricator.wikimedia.org/T185630) (owner: Jon Harald Søby)
[14:38:38] !log uploading cassandra 3.11.0-wmf5 to component/cassandra311 for stretch-wikimedia/apt.wikimedia.org (T186619)
[14:38:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:54] T186619: Upload cassandra package(s) to wikimedia apt repository - https://phabricator.wikimedia.org/T186619
[14:40:16] addshore: still syncing?
[14:40:19] (PS6) Giuseppe Lavagetto: Safely load yaml files [software/conftool] - https://gerrit.wikimedia.org/r/408290 (https://phabricator.wikimedia.org/T185080)
[14:40:20] yup
[14:40:21] (PS5) Giuseppe Lavagetto: Add support for jsonschema-based entities [software/conftool] - https://gerrit.wikimedia.org/r/408585 (https://phabricator.wikimedia.org/T185080)
[14:40:21] 15 left
[14:40:23] 12
[14:40:28] 6
[14:40:40] T-6
[14:40:50] oh, scap-cdb-rebuild next! :P
[14:41:08] Scap should output the total number of stages and how far through it is too!
[14:41:27] scap-cdb-rebuild = 6 of n stages
[14:41:29] Operations, Cassandra, RESTBase-Cassandra, Patch-For-Review, and 3 others: Upload cassandra package(s) to wikimedia apt repository - https://phabricator.wikimedia.org/T186619#3963538 (MoritzMuehlenhoff) Open>Resolved a:MoritzMuehlenhoff Uploaded to apt.wikimedia.org. To add it to a se...
[14:42:17] Operations, Cassandra, RESTBase-Cassandra, Patch-For-Review, and 3 others: Upload cassandra package(s) to wikimedia apt repository - https://phabricator.wikimedia.org/T186619#3963542 (Eevans)
[14:42:33] PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 50.57, 32.07, 27.30
[14:42:39] Jhs: please stand by, your patch is important to us
[14:42:54] you are next, addshore should be done soon
[14:44:19] !log andrew@tin Started deploy [horizon/deploy@de72527]: just checking that this still doesn't work
[14:44:23] !log andrew@tin Finished deploy [horizon/deploy@de72527]: just checking that this still doesn't work (duration: 00m 04s)
[14:44:25] !log addshore@tin Finished scap: T186612 [[gerrit:409063]] TwoColConflict wmf.20 (Remove hint and link from twoColConflict-beta-feature-description) (duration: 19m 56s)
[14:44:30] zeljkof: ^^
[14:44:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:39] addshore: done?
[14:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:49] yup
[14:44:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:55] T186612: TwoColConflict change feedback link in beta section - https://phabricator.wikimedia.org/T186612
[14:45:02] addshore: ok, taking over swat
[14:45:06] Jhs: still around for swat?
[14:45:08] enjoy!
[14:45:23] (CR) Thiemo Kreuz (WMDE): [C: 1] Add federation-related configs for clients [mediawiki-config] - https://gerrit.wikimedia.org/r/409622 (https://phabricator.wikimedia.org/T186955) (owner: Ladsgroup)
[14:46:11] I'll depool mw1227, looks like it is overheating/not well
[14:46:52] godog: It was complaining about that a day ago too btw
[14:47:16] !log filippo@neodymium conftool action : set/pooled=no; selector: name=mw1227.eqiad.wmnet
[14:47:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:38] Wiki13: easy to believe, I bet it didn't happen overnight
[14:48:15] nah, it probably has been complaining about it overnight as well
[14:48:34] Jhs: your patches will not be deployed if you are not around to test
[14:48:55] err, yeah bad wording! I meant "begin" instead of "happen"
[14:50:38] (PS2) Ottomata: EventLogging: emit X-Client-IP and parse as `ip` field [puppet] - https://gerrit.wikimedia.org/r/409354 (https://phabricator.wikimedia.org/T186833)
[14:50:54] (CR) Ottomata: [V: 2 C: 2] EventLogging: emit X-Client-IP and parse as `ip` field [puppet] - https://gerrit.wikimedia.org/r/409354 (https://phabricator.wikimedia.org/T186833) (owner: Ottomata)
[14:51:02] looks like jhs is not around, closing swat window
[14:51:09] !log EU SWAT finished
[14:51:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:33] !log emitting IP field from varnishkafka-eventlogging instance T186833
[14:51:40] godog: https://wm-bot.wmflabs.org/logs/%23wikimedia-operations/20180211.txt at the bottom
[14:51:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:47] T186833: Include X-Client-IP in EventLogging data and geocode during Hive JSON Refinement - https://phabricator.wikimedia.org/T186833
[14:52:33] RECOVERY - High CPU load on API appserver on mw1227 is OK: OK - load average: 8.28, 17.71, 23.40
[14:52:42] Wiki13: sweet! thanks
[14:52:58] no problem :)
[14:54:10] !log upload prometheus-burrow-exporter 0.0.4 on jessie/stretch-wikimedia
[14:54:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:55:25] Puppet, AbuseFilter, User-MarcoAurelio: Setup puppet cron to delete old data daily - https://phabricator.wikimedia.org/T187053#3963588 (Reedy) Looks like @Huji should've fixed it in https://github.com/wikimedia/mediawiki-extensions-AbuseFilter/commit/8ca391c8e0912438495cb6eb390b22ed123a9434 for T1869...
[14:55:31] Operations, ops-eqiad: Heating alerts for mw servers in eqiad - https://phabricator.wikimedia.org/T149287#2747695 (fgiunchedi) mw1227 has been alerting over the weekend of high load, I depooled it and noticed it was on the list of machines with temperature overheating as well, so likely related.
[14:56:59] (CR) Volans: [C: 1] "LGTM" [software/conftool] - https://gerrit.wikimedia.org/r/408290 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[14:57:53] aww, i missed it. thought it wasn't happening. tomorrow's another day then :)
[15:00:58] !log roll-upgrade thumbor to 1.12 - T186500 T186594 T186492
[15:01:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:12] T186500: Thumbor failing on some SVGs with "ValueError: invalid literal for int() with base 10" - https://phabricator.wikimedia.org/T186500
[15:01:13] T186594: Log time when request makes it to Thumbor - https://phabricator.wikimedia.org/T186594
[15:01:13] T186492: Improve Thumbor error logging - https://phabricator.wikimedia.org/T186492
[15:02:58] !log reedy@tin Synchronized php-1.31.0-wmf.20/extensions/AbuseFilter/maintenance/: Fix maintenance scripts (duration: 00m 56s)
[15:03:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:02] Operations, Wikimedia-General-or-Unknown: Re-consider ` >/dev/null 2>&1` as output of many cron'd MW maintenance scripts - https://phabricator.wikimedia.org/T187078#3963637 (Reedy)
[15:06:47] (PS3) Filippo Giunchedi: Improve Thumbor error logging [puppet] - https://gerrit.wikimedia.org/r/409319 (https://phabricator.wikimedia.org/T186492) (owner: Gilles)
[15:07:40] (CR) Filippo Giunchedi: [C: 2] Improve Thumbor error logging [puppet] - https://gerrit.wikimedia.org/r/409319 (https://phabricator.wikimedia.org/T186492) (owner: Gilles)
[15:12:14] Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3962498 (Vgutierrez) Public GPG key for pwstore access: ``` -----BEGIN PGP PUBLIC KEY BLOCK----- mQINBFqBcZoBEACfsj5/PYP3lHfihfGWVBDkp4GfB8JFTVTeUQS+r8YDh...
[15:19:38] (PS1) Muehlenhoff: Add library hint for libtasn [puppet] - https://gerrit.wikimedia.org/r/409911
[15:21:32] (PS1) Elukey: role::kafka::analytics::burrow: move to profile [puppet] - https://gerrit.wikimedia.org/r/409912 (https://phabricator.wikimedia.org/T180442)
[15:21:40] (PS1) BBlack: Add ntp.eqsin, fix ntp.ulsfo [dns] - https://gerrit.wikimedia.org/r/409913
[15:22:07] (PS1) BBlack: eqsin: switch installer NTP to local [puppet] - https://gerrit.wikimedia.org/r/409914
[15:22:13] (CR) Muehlenhoff: [C: 2] Add library hint for libtasn [puppet] - https://gerrit.wikimedia.org/r/409911 (owner: Muehlenhoff)
[15:22:15] (CR) BBlack: [C: 2] Add ntp.eqsin, fix ntp.ulsfo [dns] - https://gerrit.wikimedia.org/r/409913 (owner: BBlack)
[15:22:51] (PS2) BBlack: eqsin: switch installer NTP to local [puppet] - https://gerrit.wikimedia.org/r/409914
[15:23:03] !log installing libtasn security updates
[15:23:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:36] (CR) BBlack: [C: 2] eqsin: switch installer NTP to local [puppet] - https://gerrit.wikimedia.org/r/409914 (owner: BBlack)
[15:26:17] (CR) Volans: [C: -1] "Nice! One quick thing to fix. I've done only a quick pass on the tests." (3 comments) [software/conftool] - https://gerrit.wikimedia.org/r/408585 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[15:28:49] !log Stop replication in sync on db1089 and db1105:3311 - T162807
[15:29:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:08] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[15:29:54] (PS1) Eevans: cassandra: add instance ID to list of custom logstash fields [puppet] - https://gerrit.wikimedia.org/r/409916 (https://phabricator.wikimedia.org/T130862)
[15:30:12] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/409917
[15:30:49] (CR) Addshore: Add edit and create rate limit for wikidatawiki (4 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/408629 (https://phabricator.wikimedia.org/T184948) (owner: Ladsgroup)
[15:32:10] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/409917 (owner: Marostegui)
[15:33:18] (CR) Addshore: Add edit and create rate limit for wikidatawiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/408629 (https://phabricator.wikimedia.org/T184948) (owner: Ladsgroup)
[15:33:49] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/409917 (owner: Marostegui)
[15:34:02] (PS1) BBlack: lvs500[123] dhcp macaddrs [puppet] - https://gerrit.wikimedia.org/r/409919
[15:34:18] (CR) BBlack: [V: 2 C: 2] lvs500[123] dhcp macaddrs [puppet] - https://gerrit.wikimedia.org/r/409919 (owner: BBlack)
[15:35:25] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 - T162807 (duration: 00m 55s)
[15:35:37] (CR) Addshore: [C: -1] Add edit and create rate limit for wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/408629 (https://phabricator.wikimedia.org/T184948) (owner: Ladsgroup)
[15:35:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:40] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[15:35:45] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - https://gerrit.wikimedia.org/r/409920
[15:37:01] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/409917 (owner: Marostegui)
[15:37:15] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - https://gerrit.wikimedia.org/r/409920
[15:38:40] (PS1) Ema: icinga: add check_established_connections plugin [puppet] - https://gerrit.wikimedia.org/r/409921 (https://phabricator.wikimedia.org/T170847)
[15:38:42] (PS1) Ema: pybal: check established TCP connections to etcd [puppet] - https://gerrit.wikimedia.org/r/409922 (https://phabricator.wikimedia.org/T170847)
[15:40:17] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - https://gerrit.wikimedia.org/r/409920 (owner: Marostegui)
[15:42:23] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - https://gerrit.wikimedia.org/r/409920 (owner: Marostegui)
[15:42:36] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - https://gerrit.wikimedia.org/r/409920 (owner: Marostegui)
[15:43:31] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1089 - T162807 (duration: 00m 55s)
[15:43:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:45] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[15:52:28] Operations, ops-eqiad, Analytics-Kanban: dbstore1002 possibly MEMORY issues - https://phabricator.wikimedia.org/T183771#3963851 (Nuria) Open>Resolved
[16:01:21] (PS1) Andrew Bogott: labtestn doesn't have a labweb host currently [puppet] - https://gerrit.wikimedia.org/r/409927 (https://phabricator.wikimedia.org/T187084)
[16:01:48] (CR) jerkins-bot: [V: -1] labtestn doesn't have a labweb host currently [puppet] - https://gerrit.wikimedia.org/r/409927 (https://phabricator.wikimedia.org/T187084) (owner: Andrew Bogott)
[16:01:57] (PS3) Matthias Mullie: Enable 3D on commons [mediawiki-config] - https://gerrit.wikimedia.org/r/403680 (https://phabricator.wikimedia.org/T184728)
[16:01:59] (PS1) Matthias Mullie: Enable 3D on commons [mediawiki-config] - https://gerrit.wikimedia.org/r/409928 (https://phabricator.wikimedia.org/T184728)
[16:02:41] (PS2) Andrew Bogott: labtestn: use the normal labtest labweb for firewall purposes [puppet] - https://gerrit.wikimedia.org/r/409927 (https://phabricator.wikimedia.org/T187084)
[16:03:01] (CR) jerkins-bot: [V: -1] labtestn: use the normal labtest labweb for firewall purposes [puppet] - https://gerrit.wikimedia.org/r/409927 (https://phabricator.wikimedia.org/T187084) (owner: Andrew Bogott)
[16:03:59] (PS3) Andrew Bogott: labtestn: use the normal labtest labweb for firewall purposes [puppet] - https://gerrit.wikimedia.org/r/409927 (https://phabricator.wikimedia.org/T187084)
[16:04:57] (CR) Andrew Bogott: [C: 2] labtestn: use the normal labtest labweb for firewall purposes [puppet] - https://gerrit.wikimedia.org/r/409927 (https://phabricator.wikimedia.org/T187084) (owner: Andrew Bogott)
[16:06:34] (PS4) Gilles: Proxy public wiki thumb.php requests through Thumbor [mediawiki-config] - https://gerrit.wikimedia.org/r/407611 (https://phabricator.wikimedia.org/T169144)
[16:08:36] Operations, Scap: Deploy error: insufficient permission for adding an object to repository database .git/objects - https://phabricator.wikimedia.org/T187076#3963501 (demon) This usually happens for one of two reasons # A root user has come along and stolen ownership to root. This should't happen often in...
[16:10:21] Operations, Electron-PDFs, Proton, Readers-Web-Backlog, and 4 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#3963941 (Niedzielski) I've submitted a [[ https://gerrit.wikimedia.org/r/409115 | WIP patch ]] and commented on known incomplete parts.
[16:15:10] (PS1) Milimetric: Blacklist Print schema, pushing too many events [puppet] - https://gerrit.wikimedia.org/r/409930
[16:16:32] !log andrew@tin Started deploy [horizon/deploy@de72527]: scap debugging run
[16:16:34] (PS1) Rush: keystone: add missing for installed packages [puppet] - https://gerrit.wikimedia.org/r/409931
[16:16:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:56] !log andrew@tin Finished deploy [horizon/deploy@de72527]: scap debugging run (duration: 00m 24s)
[16:17:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:17:11] (PS2) Rush: openstack: labtest codify mysql settings [puppet] - https://gerrit.wikimedia.org/r/408793
[16:17:48] (PS2) Rush: keystone: add missing for installed packages [puppet] - https://gerrit.wikimedia.org/r/409931
[16:20:32] (PS1) BBlack: webproxy: add eqsin networks to ACL [puppet] - https://gerrit.wikimedia.org/r/409933
[16:21:28] (CR) BBlack: [C: 2] webproxy: add eqsin networks to ACL [puppet] - https://gerrit.wikimedia.org/r/409933 (owner: BBlack)
[16:21:30] (CR) Rush: [C: 2] openstack: labtest codify mysql settings [puppet] - https://gerrit.wikimedia.org/r/408793 (owner: Rush)
[16:22:37] (PS3) Rush: openstack: labtest codify mysql settings [puppet] - https://gerrit.wikimedia.org/r/408793
[16:26:09] !log andrew@tin Started deploy [horizon/deploy@4d1bdeb]: updating requirements.txt
[16:26:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:56] Operations, Traffic: Migrate dns caches to stretch - https://phabricator.wikimedia.org/T187090#3964044 (BBlack) p:Triage>Normal
[16:27:13] !log andrew@tin Finished deploy [horizon/deploy@4d1bdeb]: updating requirements.txt (duration: 01m 04s)
[16:27:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:58] (PS3) Rush: keystone: add missing for installed packages [puppet] - https://gerrit.wikimedia.org/r/409931
[16:29:53] Operations, LDAP-Access-Requests: ldap/ops membership for vgutierrez - https://phabricator.wikimedia.org/T187055#3963035 (RobH) This typically isn't added until AFTER the user's shell account is created, or we'll just have to do two updates to admins module. Once the shell user is made, this is 2 second...
[16:30:25] Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3964083 (RobH)
[16:33:22] (PS4) Rush: keystone: add missing for installed packages [puppet] - https://gerrit.wikimedia.org/r/409931
[16:34:16] (CR) Rush: [C: 2] keystone: add missing for installed packages [puppet] - https://gerrit.wikimedia.org/r/409931 (owner: Rush)
[16:34:25] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current): Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3964104 (awight) @akosiaris Do you know whether that group_size change will apply to rollback as well? We want the rollback to be as fast as po...
[16:39:48] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current): Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3964154 (akosiaris) Yes it will. Which is why I am experimenting already with `fetch_batch_size` [1] [1] https://github.com/wikimedia/scap/blob...
[16:42:25] (PS1) Rush: keystone: pair down installed package list [puppet] - https://gerrit.wikimedia.org/r/409940
[16:43:29] (CR) Rush: [C: 2] keystone: pair down installed package list [puppet] - https://gerrit.wikimedia.org/r/409940 (owner: Rush)
[16:45:02] (PS1) Gilles: Whitelist new Thumbor-Request-Date header in Swift [puppet] - https://gerrit.wikimedia.org/r/409942 (https://phabricator.wikimedia.org/T186594)
[16:47:21] (PS1) Rush: keystone: websockify package comes with spiceproxy [puppet] - https://gerrit.wikimedia.org/r/409944
[16:47:34] Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3962498 (RobH) So there is typically a shell request task, and then the ops team approves the access to root and ops groups. We went ahead and just had ou...
[16:47:57] (CR) Rush: [C: 2] keystone: websockify package comes with spiceproxy [puppet] - https://gerrit.wikimedia.org/r/409944 (owner: Rush)
[16:51:53] PROBLEM - puppet last run on labcontrol1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:52:51] !log demon@tin Synchronized php-1.31.0-wmf.20/extensions/VisualEditor/ApiVisualEditor.php: T186934 (duration: 00m 57s)
[16:53:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:04] T186934: Undefined variable: bodyOnly in ApiVisualEditor - https://phabricator.wikimedia.org/T186934
[16:54:37] (PS5) Dduvall: Add service-checker image used to test service images [docker-images/production-images] - https://gerrit.wikimedia.org/r/405205 (https://phabricator.wikimedia.org/T184220)
[16:56:06] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=dns5001.wikimedia.org
[16:56:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:56:57] (PS1) Rush: keystone: ldap-util can come from multiple places [puppet] - https://gerrit.wikimedia.org/r/409947
[16:57:54] (CR) Rush: [C: 2] keystone: ldap-util can come from multiple places [puppet] - https://gerrit.wikimedia.org/r/409947 (owner: Rush)
[16:58:19] (CR) Alexandros Kosiaris: [C: 2] Add service-checker image used to test service images [docker-images/production-images] - https://gerrit.wikimedia.org/r/405205 (https://phabricator.wikimedia.org/T184220) (owner: Dduvall)
[16:59:02] (CR) Chad: "The lack of relative URLs was the whole reason I had to add baseUrl to begin with ;-)" [puppet] - https://gerrit.wikimedia.org/r/409211 (https://phabricator.wikimedia.org/T184116) (owner: Chad)
[16:59:14] marxarelli: I am removing your -1 on https://gerrit.wikimedia.org/r/#/c/405205/ and merging
[17:00:01] marxarelli: I 've also started a build that should push the image soon
[17:00:11] akosiaris: ah. thank you!
[17:00:28] cmjohnson1: would it work for you to do https://phabricator.wikimedia.org/T186534 poolcounter1002 tomorrow ?
[17:01:07] (PS1) Rush: keystone: ldap-util issues with differing roles [puppet] - https://gerrit.wikimedia.org/r/409949
[17:01:39] (CR) Rush: [C: 2] keystone: ldap-util issues with differing roles [puppet] - https://gerrit.wikimedia.org/r/409949 (owner: Rush)
[17:01:50] (CR) Alexandros Kosiaris: [V: 2 C: 2] Add service-checker image used to test service images [docker-images/production-images] - https://gerrit.wikimedia.org/r/405205 (https://phabricator.wikimedia.org/T184220) (owner: Dduvall)
[17:01:58] Hi all. I currently have a question about SWAT deployment. If I submit a patch, is the patch schedule by either one of the deployers or by the patch owner?
[17:02:44] When you "submit" a patch for SWAT... You put it in one of the windows for deployment
[17:03:35] By the patch owner?
[17:04:03] Doesn't have to be, no
[17:04:24] godog: yes
[17:05:09] cmjohnson1: ok thanks!
[17:05:32] My situation, https://gerrit.wikimedia.org/r/#/c/406487/
[17:05:38] How do I do that, I'm not a SWAT deployer
[17:06:53] RECOVERY - puppet last run on labcontrol1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[17:07:06] razesoldier: you add it to Deployments page on wikitech
[17:07:25] razesoldier: explained at https://wikitech.wikimedia.org/wiki/Deployments
[17:08:21] Any one window?
[17:08:59] one where you'll be able to be present online in this channel during the deployment
[17:09:00] one that is free, and someone could help test it, etc
[17:09:04] (PS3) Dzahn: Decom: Remove mgmt DNS entries for db201[6-9],db2023 and db202[8-9] [dns] - https://gerrit.wikimedia.org/r/407173 (owner: Papaul)
[17:09:25] !log andrew@tin Started deploy [horizon/deploy@01021b4]: rolling out new dashboards
[17:09:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:52] I understand, thanks you
[17:10:19] !log andrew@tin Finished deploy [horizon/deploy@01021b4]: rolling out new dashboards (duration: 00m 54s)
[17:10:20] razesoldier: it has to be Evening, European or Morning SWAT
[17:10:26] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/install tendril2001 - https://phabricator.wikimedia.org/T186123#3964293 (Papaul) @jcrespo @Marostegui Hello this has been already a week since last week I have no update if we have to keep the name or not on this system. if you have time...
[17:10:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:10:32] other slots are for other kinds of deployment
[17:10:54] https://wikitech.wikimedia.org/wiki/SWAT_deploys#How_to_submit_a_patch_for_SWAT
[17:11:43] (CR) Dzahn: [C: 2] Decom: Remove mgmt DNS entries for db201[6-9],db2023 and db202[8-9] [dns] - https://gerrit.wikimedia.org/r/407173 (owner: Papaul)
[17:11:50] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/install tendril2001 - https://phabricator.wikimedia.org/T186123#3964305 (jcrespo) @Papaul as you may have heard, we are in a kind of an emergency right now busy on fixing other stuff, this will have to be delayed.
[17:11:54] (PS2) Ottomata: Blacklist Print schema, pushing too many events [puppet] - https://gerrit.wikimedia.org/r/409930 (owner: Milimetric)
[17:12:03] (CR) Ottomata: [V: 2 C: 2] Blacklist Print schema, pushing too many events [puppet] - https://gerrit.wikimedia.org/r/409930 (owner: Milimetric)
[17:13:25] Operations, ops-codfw, DBA, hardware-requests: Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029 - https://phabricator.wikimedia.org/T184090#3964308 (RobH)
[17:13:33] Operations, ops-codfw, DBA, hardware-requests: Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029 - https://phabricator.wikimedia.org/T184090#3872109 (RobH) Open>Resolved
[17:15:44] Operations, ops-codfw, DBA, hardware-requests: Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029 - https://phabricator.wikimedia.org/T184090#3964327 (RobH)
[17:16:04] (PS1) Niedzielski: WIP: Hygiene: remove pdfrender and electron-render services [puppet] - https://gerrit.wikimedia.org/r/409952 (https://phabricator.wikimedia.org/T186748)
[17:17:21] {{ircnick}} template irc-nickname to write my name?
[17:17:43] (PS1) Gilles: Avoid default 60s nginx proxy timeouts for thumbor [puppet] - https://gerrit.wikimedia.org/r/409954 (https://phabricator.wikimedia.org/T185466)
[17:18:22] !log home dirs on stat1004 moved to /srv/home (/home symlinks to it)
[17:18:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:56] !log demon@tin Pruned MediaWiki: 1.31.0-wmf.17 [keeping static files] (duration: 02m 08s)
[17:22:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:22:49] Operations, Goal, Patch-For-Review, User-fgiunchedi, cloud-services-team (Kanban): Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3964359 (fgiunchedi)
[17:29:38] ACKNOWLEDGEMENT - Host lvs5001.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Brandon Black eqsin mgmt unreachable at this time, still bringing site up
[17:29:38] ACKNOWLEDGEMENT - Host lvs5002.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Brandon Black eqsin mgmt unreachable at this time, still bringing site up
[17:29:38] ACKNOWLEDGEMENT - Host lvs5003.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Brandon Black eqsin mgmt unreachable at this time, still bringing site up
[17:31:06] Operations, Electron-PDFs, Proton, Readers-Web-Backlog, and 4 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#3964397 (phuedx) > Chromium, a dependency of the project, is packaged by Puppeteer and as an outcome of T178189 and T178570, Puppeteer's in...
[17:31:27] (PS1) Alexandros Kosiaris: Add the ORES cluster to wikimedia_clusters [puppet] - https://gerrit.wikimedia.org/r/409962 (https://phabricator.wikimedia.org/T171851)
[17:32:09] (CR) Alexandros Kosiaris: [C: 2] Add the ORES cluster to wikimedia_clusters [puppet] - https://gerrit.wikimedia.org/r/409962 (https://phabricator.wikimedia.org/T171851) (owner: Alexandros Kosiaris)
[17:34:17] !log akosiaris@tin Started deploy [ores/deploy@f7e23f4]: T171851
[17:34:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:34:32] T171851: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851
[17:34:57] !log added thumborUrl to PrivateSettings.php on labs, in preparation for https://gerrit.wikimedia.org/r/#/c/407611/
[17:35:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:35:49] Operations, Phabricator, Release-Engineering-Team, User-Elukey: Phabricator down due to "Failed to `proc_open()`: proc_open() expects parameter 2 to be array" - https://phabricator.wikimedia.org/T186620#3964413 (mmodell)
[17:35:53] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3964414 (mmodell)
[17:39:35] Operations, Phabricator, Release-Engineering-Team, User-Elukey: Phabricator down due to "Failed to `proc_open()`: proc_open() expects parameter 2 to be array" - https://phabricator.wikimedia.org/T186620#3949613 (mmodell) I think this is probably the same root cause as T182832: that one is an unso...
[17:41:00] Operations, Phabricator, Release-Engineering-Team, User-Elukey: Phabricator down due to "Failed to `proc_open()`: proc_open() expects parameter 2 to be array" - https://phabricator.wikimedia.org/T186620#3964442 (mmodell) I've been away for 2 weeks due to our team offsite, travel and vacation time...
[17:42:40] RECOVERY - Router interfaces on cr1-eqsin is OK: OK: host 103.102.166.129, interfaces up: 63, down: 0, dormant: 0, excluded: 0, unused: 0
[17:42:41] Operations, Phabricator, Release-Engineering-Team, User-Elukey: Phabricator down due to "Failed to `proc_open()`: proc_open() expects parameter 2 to be array" - https://phabricator.wikimedia.org/T186620#3964444 (elukey) >>! In T186620#3964442, @mmodell wrote: > I've been away for 2 weeks due to o...
[17:44:21] Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3962498 (Dzahn) added to WMF-NDA-Requests group on Phabricator https://phabricator.wikimedia.org/project/members/974/
[17:45:25] !log andrew@tin Started deploy [horizon/deploy@01021b4]: rolling out new dashboards again
[17:45:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:42] !log andrew@tin Finished deploy [horizon/deploy@01021b4]: rolling out new dashboards again (duration: 00m 17s)
[17:45:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:46:40] Operations, Phabricator, Release-Engineering-Team, User-Elukey: Phabricator down due to "Failed to `proc_open()`: proc_open() expects parameter 2 to be array" - https://phabricator.wikimedia.org/T186620#3964452 (Paladox) I think upstream test with php 7.1 now (even though php 5.2+ is supported) s...
[17:47:35] !log akosiaris@tin Finished deploy [ores/deploy@f7e23f4]: T171851 (duration: 13m 18s)
[17:47:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:47:49] T171851: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851
[17:47:53] Puppet, AbuseFilter, User-MarcoAurelio: Setup puppet cron to delete old data daily - https://phabricator.wikimedia.org/T187053#3964454 (MarcoAurelio) @Reedy But the script has been running in prod, right? It was not so long ago when we added the `wfRequireExtension ('AbuseFilter');` to this, and that...
[17:48:34] Operations, Electron-PDFs, Proton, Readers-Web-Backlog, and 4 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#3964456 (Niedzielski)
[17:50:13] Puppet, AbuseFilter, User-MarcoAurelio: Setup puppet cron to delete old data daily - https://phabricator.wikimedia.org/T187053#3964476 (Reedy) https://gerrit.wikimedia.org/r/#/c/326723/ December 2016.. It's been there a long time `20160916011613` would suggest it has been broken for 3 months before...
[17:51:10] Puppet, AbuseFilter, User-MarcoAurelio: Setup puppet cron to delete old data daily - https://phabricator.wikimedia.org/T187053#3964481 (Reedy) ``` reedy@tin:~$ mwscript extensions/AbuseFilter/maintenance/purgeOldLogIPData.php testwiki Purging old IP Address data from abuse_filter_log... 200 400 600 8...
[17:53:53] Puppet, AbuseFilter, User-MarcoAurelio: Setup puppet cron to delete old data daily - https://phabricator.wikimedia.org/T187053#3964489 (MarcoAurelio) >>! In T187053#3964476, @Reedy wrote: > https://gerrit.wikimedia.org/r/#/c/326723/ > > December 2016.. It's been there a long time > > `2016091601161...
[17:54:29] Reedy: just took a coffee and it seems I need another one ;-)
[17:58:24] Operations, Electron-PDFs, Proton, Readers-Web-Backlog, and 4 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#3964511 (Niedzielski) Thanks @phuedx and sorry for the confusion. I've revised this task and the related patchsets. > PUPPETEER_SKIP_CHROM...
[18:00:00] !log akosiaris@tin Started deploy [ores/deploy@f7e23f4]: T171851
[18:00:06] gehel: Dear deployers, time to do the Wikidata Query Service weekly deploy deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180212T1800).
[18:00:07] No GERRIT patches in the queue for this window AFAICS.
[18:00:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:00:15] T171851: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851
[18:01:24] (CR) Dzahn: otrs: apache -> httpd module (2 comments) [puppet] - https://gerrit.wikimedia.org/r/409462 (owner: Dzahn)
[18:04:36] jouncebot: new WDQS GUI comping up
[18:05:49] (PS1) Dzahn: graphite: add httpd to production role, not just primary [puppet] - https://gerrit.wikimedia.org/r/409978
[18:07:01] (CR) Alexandros Kosiaris: [C: -1] otrs: apache -> httpd module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/409462 (owner: Dzahn)
[18:07:35] !log gehel@tin Started deploy [wdqs/wdqs@b6bd483]: new WDQS GUI
[18:07:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:09:28] !log gehel@tin Finished deploy [wdqs/wdqs@b6bd483]: new WDQS GUI (duration: 01m 53s)
[18:09:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:05] (PS2) Dzahn: graphite: add httpd to production role, not just primary [puppet] - https://gerrit.wikimedia.org/r/409978
[18:10:07] SMalyshev: wdqs deployment completed, tests are green
[18:10:13] (PS1) Volans: Backend: allow to extract random subset of hosts [software/cumin] - https://gerrit.wikimedia.org/r/409980 (https://phabricator.wikimedia.org/T186818)
[18:10:35] gehel: thank you!
[18:11:00] SMalyshev: my pleasure, as always!
[18:11:29] (PS3) Dzahn: graphite: add httpd to production role, not just primary [puppet] - https://gerrit.wikimedia.org/r/409978
[18:12:30] !log akosiaris@tin Finished deploy [ores/deploy@f7e23f4]: T171851 (duration: 12m 30s)
[18:12:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:43] T171851: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851
[18:13:13] (CR) Dzahn: [C: 2] graphite: add httpd to production role, not just primary [puppet] - https://gerrit.wikimedia.org/r/409978 (owner: Dzahn)
[18:14:32] (CR) Dzahn: [C: 2] "yes it did.. thanks for pointing it out. fixed with https://gerrit.wikimedia.org/r/#/c/409978/" [puppet] - https://gerrit.wikimedia.org/r/409200 (owner: Dzahn)
[18:14:50] RECOVERY - puppet last run on graphite1003 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[18:17:20] Operations, Puppet: Setup some alert mechanism when some 'critical' cron jobs fail - https://phabricator.wikimedia.org/T187101#3964579 (MarcoAurelio)
[18:17:40] RECOVERY - puppet last run on graphite2002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[18:18:28] (CR) Dzahn: [C: 2] Gerrit: Proxy gitiles through gerrit.wikimedia.org/g/ [puppet] - https://gerrit.wikimedia.org/r/409211 (https://phabricator.wikimedia.org/T184116) (owner: Chad)
[18:19:57] (CR) Dzahn: ">Should be rebased on top of I7d4a83f, requires that." [puppet] - https://gerrit.wikimedia.org/r/409385 (owner: Paladox)
[18:22:53] (CR) Dzahn: [C: 1] "has https://gerrit.wikimedia.org/r/#/c/409385/ as parent but needs to be merged first" [puppet] - https://gerrit.wikimedia.org/r/409211 (https://phabricator.wikimedia.org/T184116) (owner: Chad)
[18:23:10] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: ores1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=ores', 'service=ores'])
[18:23:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:23:40] PROBLEM - puppet last run on graphite1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:27:53] !log akosiaris@tin Started deploy [ores/deploy@f7e23f4]: T171851
[18:28:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:28:06] T171851: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851
[18:29:28] Puppet, Cloud-Services: Retire and remove module labs_debrepo - https://phabricator.wikimedia.org/T153612#3964625 (Multichill)
[18:29:54] Puppet, Cloud-Services: Retire and remove module labs_debrepo - https://phabricator.wikimedia.org/T153612#2885673 (Multichill) @scfc do your thing :-)
[18:33:34] (PS1) Rush: keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992
[18:34:11] (CR) jerkins-bot: [V: -1] keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992 (owner: Rush)
[18:34:39] !log akosiaris@tin Finished deploy [ores/deploy@f7e23f4]: T171851 (duration: 06m 47s)
[18:34:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:34:52] T171851: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851
[18:35:55] !log akosiaris@tin Started deploy [ores/deploy@f7e23f4]: T171851
[18:36:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:36:50] (PS2) Rush: keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992
[18:37:25] (CR) jerkins-bot: [V: -1] keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992 (owner: Rush)
[18:39:51] (PS1) Herron: add puppetdb role to puppetdb[12]001 servers [puppet] - https://gerrit.wikimedia.org/r/409995 (https://phabricator.wikimedia.org/T185499)
[18:40:12] (PS1) Niedzielski: New: add chromium_render service [puppet] - https://gerrit.wikimedia.org/r/409996 (https://phabricator.wikimedia.org/T178166)
[18:41:40] PROBLEM - puppet last run on graphite2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:48:08] !log akosiaris@tin Finished deploy [ores/deploy@f7e23f4]: T171851 (duration: 12m 14s)
[18:48:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:22] (PS1) BBlack: eqsin: use LVS recdns for normal hosts/installer [puppet] - https://gerrit.wikimedia.org/r/409999 (https://phabricator.wikimedia.org/T156027)
[18:48:22] T171851: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851
[18:49:31] (CR) BBlack: [C: 2] eqsin: use LVS recdns for normal hosts/installer [puppet] - https://gerrit.wikimedia.org/r/409999 (https://phabricator.wikimedia.org/T156027) (owner: BBlack)
[18:50:20] !log akosiaris@tin Started deploy [ores/deploy@f7e23f4]: T171851
[18:50:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:58] Operations, Ops-Access-Requests: Access request: #mediawiki_security for Quiddity - https://phabricator.wikimedia.org/T187108#3964766 (Quiddity)
[18:57:59] !log akosiaris@tin Finished deploy [ores/deploy@f7e23f4]: T171851 (duration: 07m 39s)
[18:58:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:58:13] T171851: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851
[18:59:49] (PS3) Rush: keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992
[19:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor I � Unicode. All rise for Morning SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180212T1900).
[19:00:04] tgr, Niharika, and Gilles: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[19:00:18] o/
[19:00:26] (CR) jerkins-bot: [V: -1] keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992 (owner: Rush)
[19:00:31] Hey. I can SWAT.
[19:01:05] thanks
[19:01:27] o/
[19:01:45] (PS4) Rush: keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992
[19:02:01] (CR) Niharika29: [C: 2] "SWAT." [mediawiki-config] - https://gerrit.wikimedia.org/r/409171 (https://phabricator.wikimedia.org/T45086) (owner: Gergő Tisza)
[19:03:26] (CR) Chad: "Iff5fe547 has to land first." [puppet] - https://gerrit.wikimedia.org/r/409211 (https://phabricator.wikimedia.org/T184116) (owner: Chad)
[19:03:42] (CR) Chad: "Which is the parent." [puppet] - https://gerrit.wikimedia.org/r/409211 (https://phabricator.wikimedia.org/T184116) (owner: Chad)
[19:04:40] (Merged) jenkins-bot: Stop PHP errors from going to the hhvm channel [mediawiki-config] - https://gerrit.wikimedia.org/r/409171 (https://phabricator.wikimedia.org/T45086) (owner: Gergő Tisza)
[19:05:31] tgr: Labs-only, do you want to test?
[19:05:50] It's on mwdebug1002.
[19:05:59] Niharika: it's an error reporting change, so I'd have to break labs :)
[19:06:18] I'll just check if it works after it's deployed
[19:06:55] That's what they're for. ;)
[19:07:17] (CR) jenkins-bot: Stop PHP errors from going to the hhvm channel [mediawiki-config] - https://gerrit.wikimedia.org/r/409171 (https://phabricator.wikimedia.org/T45086) (owner: Gergő Tisza)
[19:07:24] (CR) Niharika29: [C: 2] "SWAT." [mediawiki-config] - https://gerrit.wikimedia.org/r/407611 (https://phabricator.wikimedia.org/T169144) (owner: Gilles)
[19:07:47] !log niharika29@tin Synchronized wmf-config/InitialiseSettings-labs.php: Stop PHP errors from going to the hhvm channel T45086 (duration: 00m 56s)
[19:07:50] tgr: Done.
[19:07:56] thx!
[19:08:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:08:01] T45086: Capture PHP warnings with stacktraces in MediaWiki and save to logstash - https://phabricator.wikimedia.org/T45086
[19:11:45] (Merged) jenkins-bot: Proxy public wiki thumb.php requests through Thumbor [mediawiki-config] - https://gerrit.wikimedia.org/r/407611 (https://phabricator.wikimedia.org/T169144) (owner: Gilles)
[19:11:59] (CR) jenkins-bot: Proxy public wiki thumb.php requests through Thumbor [mediawiki-config] - https://gerrit.wikimedia.org/r/407611 (https://phabricator.wikimedia.org/T169144) (owner: Gilles)
[19:12:27] !log niharika29@tin Synchronized php-1.31.0-wmf.20/extensions/PageAssessments/: Fix 500 error with PageAssessments API T185037 (duration: 00m 56s)
[19:12:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:12:42] T185037: Fatal error: Class undefined: PageAssessmentsBody - https://phabricator.wikimedia.org/T185037
[19:13:04] !log andrew@tin Started deploy [horizon/deploy@01021b4]: trying another force
[19:13:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:13:20] !log andrew@tin Finished deploy [horizon/deploy@01021b4]: trying another force (duration: 00m 17s)
[19:13:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:13:32] gilles: Your patch is on mwdebug1002.
[19:13:42] Niharika: thanks, testing
[19:14:04] (PS1) Rush: labtest: add rabbit_cleanup_pass dummy [labs/private] - https://gerrit.wikimedia.org/r/410008
[19:14:11] (PS5) Rush: keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992
[19:14:12] no_justification: hi, question, which gitile related change should be merged first, second and last please?
[19:14:44] Cookie, then proxy it, then swap the baseUrl
[19:14:44] (CR) Rush: [V: 2 C: 2] labtest: add rabbit_cleanup_pass dummy [labs/private] - https://gerrit.wikimedia.org/r/410008 (owner: Rush)
[19:14:56] (it'll work proxied, just the links point to wrong place)
[19:15:06] If we swap links before we proxy it, they'll 404
[19:15:16] Ok thanks
[19:15:18] So
[19:15:21] https://gerrit.wikimedia.org/r/#/c/409216/ first
[19:15:26] mutante: ^^
[19:15:37] Then https://gerrit.wikimedia.org/r/#/c/409211/
[19:15:44] Niharika: works fine
[19:15:48] Then https://gerrit.wikimedia.org/r/#/c/409385/
[19:15:51] Ack.
[19:16:00] That's the order they're in gerrit already
[19:16:09] The parent of 409211 is 409216
[19:16:14] (PS3) Paladox: Gerrit: Set cookie path to / [puppet] - https://gerrit.wikimedia.org/r/409216 (owner: Chad)
[19:16:28] (PS4) Paladox: Gerrit: Proxy gitiles through gerrit.wikimedia.org/g/ [puppet] - https://gerrit.wikimedia.org/r/409211 (https://phabricator.wikimedia.org/T184116) (owner: Chad)
[19:16:41] (PS5) Paladox: Gerrit: Set gerrit.baseUrl in gitiles.config [puppet] - https://gerrit.wikimedia.org/r/409385
[19:17:03] Yep
[19:17:26] !log niharika29@tin Synchronized wmf-config/filebackend.php: Proxy public wiki thumb.php requests through Thumbor T169144 (duration: 00m 55s)
[19:17:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:38] gilles: Synced.
[19:17:38] T169144: Serve thumb.php requests with Thumbor - https://phabricator.wikimedia.org/T169144
[19:17:54] (CR) Paladox: "> >Should be rebased on top of I7d4a83f, requires that." [puppet] - https://gerrit.wikimedia.org/r/409385 (owner: Paladox)
[19:18:06] And that's all for today's episode.
[19:20:28] (CR) Paladox: [C: 1] "This one is first." [puppet] - https://gerrit.wikimedia.org/r/409216 (owner: Chad)
[19:20:37] (CR) Paladox: [C: 1] "This is second" [puppet] - https://gerrit.wikimedia.org/r/409211 (https://phabricator.wikimedia.org/T184116) (owner: Chad)
[19:20:49] (CR) Paladox: "This is last." [puppet] - https://gerrit.wikimedia.org/r/409385 (owner: Paladox)
[19:21:27] (PS6) Rush: keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992
[19:22:04] (CR) jerkins-bot: [V: -1] keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992 (owner: Rush)
[19:24:37] (PS1) Rush: admin_token dummy for labtestn [labs/private] - https://gerrit.wikimedia.org/r/410013
[19:25:13] /go team
[19:25:16] grrr
[19:25:20] (CR) Rush: [V: 2 C: 2] admin_token dummy for labtestn [labs/private] - https://gerrit.wikimedia.org/r/410013 (owner: Rush)
[19:25:31] Niharika: works fine in prod, seem like thumb.php requests fail on beta, but that's not worth reverting for, I'll fix it now or tomorrow. thumb.php requests are highly unusual, usually crafted manually (mediawiki never points to those)
[19:26:05] it's very likely to be an issue with beta configuration/thumbor packages, not with the config change, anyway
[19:27:23] gilles: Okay, cool.
[19:27:43] I can deploy the fix if you get it up within the next 30 minutes.
[19:27:47] (PS7) Rush: keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992
[19:29:13] I don't think I can figure it out that quickly, too many moving parts. and most likely it'll be in the beta realm of things, where I can fix it without a deployment
[19:29:49] Yeah.
[19:32:49] (PS8) Rush: keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992
[19:37:54] (PS9) Rush: keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992
[19:42:16] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3964968 (mmodell) >>! In T182832#3945200, @elukey wro...
[19:43:20] !log andrew@tin Started deploy [horizon/deploy@8cf0c3c]: updating with sudo dashboard fixes
[19:43:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:44:26] !log andrew@tin Finished deploy [horizon/deploy@8cf0c3c]: updating with sudo dashboard fixes (duration: 01m 06s)
[19:44:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:45:56] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3964973 (mmodell) It seems that something has changed...
[19:47:51] !log andrew@tin Started deploy [horizon/deploy@8cf0c3c]: updating with sudo dashboard fixes -- take two
[19:47:55] !log andrew@tin Finished deploy [horizon/deploy@8cf0c3c]: updating with sudo dashboard fixes -- take two (duration: 00m 03s)
[19:48:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:50:01] !log andrew@tin Started deploy [horizon/deploy@8cf0c3c]: updating with sudo dashboard fixes -- take two
[19:50:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:50:45] !log andrew@tin Finished deploy [horizon/deploy@8cf0c3c]: updating with sudo dashboard fixes -- take two (duration: 00m 45s)
[19:50:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:52:09] !log ppchelko@tin Started deploy [restbase/deploy@b257b4f]: Support batching in the reading lists API
[19:52:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:55:02] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3965012 (mmodell) >>! In T182832#3948659, @Paladox wr...
[19:57:09] !log andrew@tin Started deploy [horizon/deploy@9d73005]: fixes to post-isntall checks
[19:57:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:57:24] (CR) Rush: [C: 2] keystone: bootstrapping framework for install [puppet] - https://gerrit.wikimedia.org/r/409992 (owner: Rush)
[19:57:34] Operations, Phabricator, Release-Engineering-Team, User-Elukey: Phabricator down due to "Failed to `proc_open()`: proc_open() expects parameter 2 to be array" - https://phabricator.wikimedia.org/T186620#3965021 (mmodell) The php stack trace points to phabricator setup checks which should only hap...
[19:58:10] !log andrew@tin Finished deploy [horizon/deploy@9d73005]: fixes to post-isntall checks (duration: 01m 01s)
[19:58:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:11] Operations, Phabricator, Release-Engineering-Team, User-Elukey: Phabricator down due to "Failed to `proc_open()`: proc_open() expects parameter 2 to be array" - https://phabricator.wikimedia.org/T186620#3965024 (mmodell) I guess it must be related to the parent task - it's in ExecFuture which is...
[20:00:12] !log andrew@tin Started deploy [horizon/deploy@1fcd9ff]: fixes to post-install checks
[20:00:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:35] (PS1) Rush: keystone: move bootstrap policy.json file [puppet] - https://gerrit.wikimedia.org/r/410025
[20:01:12] (CR) Rush: [C: 2] keystone: move bootstrap policy.json file [puppet] - https://gerrit.wikimedia.org/r/410025 (owner: Rush)
[20:01:14] !log andrew@tin Finished deploy [horizon/deploy@1fcd9ff]: fixes to post-install checks (duration: 01m 02s)
[20:01:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:30] PROBLEM - High CPU load on API appserver on mw1233 is CRITICAL: CRITICAL - load average: 40.62, 36.06, 31.32
[20:07:15] !log andrew@tin Started deploy [horizon/deploy@cba66d2]: more submodule tinkering
[20:07:19] !log ppchelko@tin Finished deploy [restbase/deploy@b257b4f]: Support batching in the reading lists API (duration: 15m 10s)
[20:07:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:07:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:30] !log andrew@tin Finished deploy [horizon/deploy@cba66d2]: more submodule tinkering (duration: 01m 15s)
[20:08:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:53] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3965043 (mmodell) >>! In T182832#3947150, @elukey wro...
[20:10:39] (PS1) Andrew Bogott: horizon source deploy: allow scap to restart apache2 [puppet] - https://gerrit.wikimedia.org/r/410031
[20:11:13] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3965045 (Paladox) @mmodell option one would at least...
[20:12:25] (PS3) Krinkle: webperf: Re-use expected result by reference to simplify fixture [puppet] - https://gerrit.wikimedia.org/r/404045
[20:12:35] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3965049 (mmodell) @Paladox: I don't think stretch upg...
[20:12:59] !log demon@tin Synchronized php-1.31.0-wmf.20/extensions/Flow/includes/Model/UUID.php: T186909 (duration: 00m 56s)
[20:13:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:13:12] T186909: Unknown input type to UUID class: array - https://phabricator.wikimedia.org/T186909
[20:14:02] (CR) Thcipriani: [C: 1] "scap::target should handle adding the sudoers rule after this change." [puppet] - https://gerrit.wikimedia.org/r/410031 (owner: Andrew Bogott)
[20:14:14] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3965060 (Paladox) @mmodell though the old php 5.x is...
[20:14:36] (CR) Krinkle: [C: 1] Whitelist new Thumbor-Request-Date header in Swift [puppet] - https://gerrit.wikimedia.org/r/409942 (https://phabricator.wikimedia.org/T186594) (owner: Gilles)
[20:14:52] (CR) Andrew Bogott: [C: 2] horizon source deploy: allow scap to restart apache2 [puppet] - https://gerrit.wikimedia.org/r/410031 (owner: Andrew Bogott)
[20:14:55] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3965061 (mmodell) We don't need php 5.x for phabricat...
[20:16:24] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3965065 (Paladox) Yep. We should upgrade to php 7.1 a...
[20:17:41] !log andrew@tin Started deploy [horizon/deploy@c009388]: updating puppet dashboard
[20:17:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:21:03] !log andrew@tin Finished deploy [horizon/deploy@c009388]: updating puppet dashboard (duration: 03m 22s)
[20:21:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:59:46] (PS1) Herron: WIP: puppetdbquery: upgrade to 3.0.1 [puppet] - https://gerrit.wikimedia.org/r/410050
[20:59:49] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3965156 (Dzahn) >>! In T182832#3965043, @mmodell wrot...
[21:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: I, the Bot under the Fountain, allow thee, The Deployer, to do Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180212T2100).
[21:00:04] No GERRIT patches in the queue for this window AFAICS.
[21:00:13] (CR) jerkins-bot: [V: -1] WIP: puppetdbquery: upgrade to 3.0.1 [puppet] - https://gerrit.wikimedia.org/r/410050 (owner: Herron)
[21:00:15] (CR) Herron: [C: -2] "work in progress" [puppet] - https://gerrit.wikimedia.org/r/410050 (owner: Herron)
[21:00:29] !log mholloway-shell@tin Started deploy [mobileapps/deploy@0639c31]: Update mobileapps to f14bdd5
[21:00:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:47] no parsoid deploy today
[21:02:27] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3965163 (Paladox) @Dzahn phabricator does not support...
[21:03:15] Nothing for ORES
[21:06:15] !log mholloway-shell@tin Finished deploy [mobileapps/deploy@0639c31]: Update mobileapps to f14bdd5 (duration: 05m 46s)
[21:06:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:10:45] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3965181 (mmodell) Yeah unfortunately 7.0 dropped a fe...
[21:16:30] PROBLEM - haproxy failover on dbproxy1005 is CRITICAL: CRITICAL check_failover servers up 2 down 1
[21:16:50] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1004 is CRITICAL: CRITICAL - Expecting active but unit maintain-dbusers is failed
[21:17:01] PROBLEM - Check systemd state on labstore1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[21:24:09] Operations, DBA, Patch-For-Review, codfw-rollout: [RFC] improve parsercache replication and sharding handling - https://phabricator.wikimedia.org/T133523#3965226 (Krinkle)
[21:24:29] !log andrew@tin Started deploy [horizon/deploy@2f70002]: updating several submodules, probably breaking static content
[21:24:34] Operations, DBA, Patch-For-Review, codfw-rollout: [RFC] improve parsercache replication and sharding handling - https://phabricator.wikimedia.org/T133523#2234475 (Krinkle) @jcrespo Thanks, I'll untag our team for now then. Let me know if there's anything we can do.
[21:24:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:27:47] !log andrew@tin Finished deploy [horizon/deploy@2f70002]: updating several submodules, probably breaking static content (duration: 03m 18s)
[21:27:51] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1004 is OK: OK - maintain-dbusers is active
[21:28:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:28:01] RECOVERY - Check systemd state on labstore1004 is OK: OK - running: The system is fully operational
[21:51:10] (PS1) Andrew Bogott: wmcs puppetmaster: allow labweb to access apis [puppet] - https://gerrit.wikimedia.org/r/410067
[21:56:22] (PS2) Andrew Bogott: wmcs puppetmaster: allow labweb to access apis [puppet] - https://gerrit.wikimedia.org/r/410067
[22:00:04] bawolff and Reedy: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Weekly Security deployment window . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180212T2200).
[22:00:04] No GERRIT patches in the queue for this window AFAICS.
[22:01:35] (CR) Andrew Bogott: [C: 2] wmcs puppetmaster: allow labweb to access apis [puppet] - https://gerrit.wikimedia.org/r/410067 (owner: Andrew Bogott)
[22:03:59] (PS1) Andrew Bogott: Move labs VMs back to the default environment. [puppet] - https://gerrit.wikimedia.org/r/410069
[22:05:01] (PS2) Andrew Bogott: Move WMCS VMs back to the default environment. [puppet] - https://gerrit.wikimedia.org/r/410069
[22:05:59] (CR) Paladox: [C: 1] "thanks." [puppet] - https://gerrit.wikimedia.org/r/410069 (owner: Andrew Bogott)
[22:18:13] (PS1) Rush: keystone: fix type on 'bootstrap' file directory [puppet] - https://gerrit.wikimedia.org/r/410070
[22:18:58] (PS2) Rush: keystone: fix type on 'bootstrap' file directory [puppet] - https://gerrit.wikimedia.org/r/410070
[22:20:46] (CR) Rush: [C: 2] keystone: fix type on 'bootstrap' file directory [puppet] - https://gerrit.wikimedia.org/r/410070 (owner: Rush)
[22:22:24] Operations, Wikimedia-General-or-Unknown: Re-consider ` >/dev/null 2>&1` as output of many cron'd MW maintenance scripts - https://phabricator.wikimedia.org/T187078#3963637 (MarcoAurelio) Somewhat related T187101. When an script breaks and that script is critical (ie: deals with private data to comply wi...
[22:25:00] (PS6) Dzahn: otrs: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409462
[22:30:56] I wonder if for the next round of https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/mediawiki/manifests/maintenance/purge_abusefilter.pp;cacc0b5224994e40df6c23bc7b8781305061ed57$4 we could actually store the log ?
[22:31:37] re-running after fixing it, not running since 2016
[22:32:20] PROBLEM - Hadoop NodeManager on analytics1058 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[22:33:30] PROBLEM - Hadoop NodeManager on analytics1031 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[22:34:21] PROBLEM - Disk space on analytics1029 is CRITICAL: DISK CRITICAL - free space: / 1378 MB (2% inode=97%)
[22:34:28] Hauskatze: sure, we can just change the command line and replace the /dev/null ..
[22:34:41] i saw a task by Reedy today
[22:34:45] that is about this thing in general
[22:35:00] see https://phabricator.wikimedia.org/T187078
[22:35:36] I was wondering if I could submit a patch for it, but I'm not sure if just replacing /dev/null 2>&1 to purge_abusefilter.log would be enough or you want to store the file elsewhere
[22:35:40] PROBLEM - Disk space on analytics1055 is CRITICAL: DISK CRITICAL - free space: / 1987 MB (3% inode=97%)
[22:35:41] PROBLEM - Disk space on analytics1045 is CRITICAL: DISK CRITICAL - free space: / 1609 MB (3% inode=97%)
[22:35:56] (PS1) Rush: openstack: novaobserver.sh comments for use/function [puppet] - https://gerrit.wikimedia.org/r/410071
[22:36:03] one sec, looking on maintenance server
[22:36:30] Hauskatze: > /ver/log/mediawiki/purge_abusefilter.log please
[22:36:35] ok!
[22:36:37] /var/log/mediawiki/ sorry
[22:37:11] (CR) Rush: [C: 2] openstack: novaobserver.sh comments for use/function [puppet] - https://gerrit.wikimedia.org/r/410071 (owner: Rush)
[22:37:14] /var/log/mediawiki/purge_abusefilter_20180213.log
[22:37:21] (PS2) Rush: openstack: novaobserver.sh comments for use/function [puppet] - https://gerrit.wikimedia.org/r/410071
[22:37:46] (PS3) Rush: openstack: novaobserver.sh comments for use/function [puppet] - https://gerrit.wikimedia.org/r/410071
[22:37:57] Operations, Wikimedia-General-or-Unknown: Re-consider ` >/dev/null 2>&1` as output of many cron'd MW maintenance scripts - https://phabricator.wikimedia.org/T187078#3963637 (Dzahn) 17:30 < Hauskatze> I wonder if for the next round of https://phabricator.wikimedia.org/source/operations-puppet/browse/produ...
[22:38:30] Operations, Wikimedia-General-or-Unknown: Re-consider ` >/dev/null 2>&1` as output of many cron'd MW maintenance scripts - https://phabricator.wikimedia.org/T187078#3965395 (MarcoAurelio) Patch incoming.
[22:45:52] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3965408 (Dzahn) There is T151070 (Experiment with php...
[22:46:34] (PS1) MarcoAurelio: mediawiki: log next run of purge_abusefilter.pp [puppet] - https://gerrit.wikimedia.org/r/410072 (https://phabricator.wikimedia.org/T187078)
[22:46:49] Operations, Phabricator: Upload php7.1 to apt.wm.org - https://phabricator.wikimedia.org/T160714#3108551 (Dzahn) This issue is coming back on T182832
[22:48:46] mutante: patch uploaded
[22:49:01] sorry for the delay
[22:50:22] Hauskatze: do you really want to revert it tomorrow?
[22:50:33] like just a single run
[22:50:36] or keep the log
[22:51:07] i ask this way because of the date in the log file name
[22:51:10] mutante: I think Reedy wanted something more complex like log rotating or something, but on a second though, until that happens, let's keep the log
[22:51:15] let me amend
[22:51:33] just drop the date and yea, we need to rotate it but we have a few days for that :)
[22:52:08] Operations, Phabricator, Release-Engineering-Team: Add support for stretch in the phabricator puppet class - https://phabricator.wikimedia.org/T187127#3965428 (Paladox)
[22:52:10] (PS2) MarcoAurelio: mediawiki: log next run of purge_abusefilter.pp [puppet] - https://gerrit.wikimedia.org/r/410072 (https://phabricator.wikimedia.org/T187078)
[22:52:37] done, is the syntax correct >/var... or > var (with space?)
[22:52:48] (CR) Dzahn: [C: 2] mediawiki: log next run of purge_abusefilter.pp [puppet] - https://gerrit.wikimedia.org/r/410072 (https://phabricator.wikimedia.org/T187078) (owner: MarcoAurelio)
[22:53:16] my second patch to ops/puppet :)
[22:53:28] both work, as long as it starts with /
[22:53:34] heh :)
[22:53:58] good good
[22:54:14] I'll ask around tomorrow to see if someone could check the logs :)
[22:54:47] Operations, Phabricator, Release-Engineering-Team: Add support for stretch in the phabricator puppet class - https://phabricator.wikimedia.org/T187127#3965444 (Paladox) p:Triage>High Needs to be high so we can try to resolve T182832 and T186620.
[22:54:49] Dzahn was quick
[22:54:50] RECOVERY - High CPU load on API appserver on mw1233 is OK: OK - load average: 20.13, 22.18, 23.77
[22:55:05] Hauskatze: too quick :p
[22:55:09] we should have kept the 2>&1
[22:55:18] you want both stdout and stderr
[22:55:32] oh wait, you're dzahn
[22:55:33] lol
[22:55:39] haha
[22:55:47] fix incoming I guess
[22:56:09] yea
[22:56:14] I'm on it
[22:56:48] ah, ok :)
[22:58:39] Operations, Traffic, Patch-For-Review: Renew unified certificates 2017 - https://phabricator.wikimedia.org/T178173#3965468 (RobH)
[22:59:42] (PS1) MarcoAurelio: mediawiki: fix for I3544d91a [puppet] - https://gerrit.wikimedia.org/r/410074
[23:00:46] done
[23:01:27] Hauskatze, wait you didn't know mutante == dzahn?
[23:02:03] Krenair: if I did, I forgot
[23:02:27] though I can't blame anyone, I don't use my Wiki name as irc nick
[23:02:54] avoids me a lot of pings :)
[23:04:03] Hauskatze: i'll be back soon and merge.. had to go afk
[23:04:35] no probs
[23:05:20] Hauskatze: And that's why I rotate my nick every few months ;-)
[23:05:23] Avoids a lot of pings!
[23:05:28] heh
[23:07:54] (PS2) MarcoAurelio: Log accessing private abusefilter details [mediawiki-config] - https://gerrit.wikimedia.org/r/409445 (https://phabricator.wikimedia.org/T160357)
[23:08:41] RECOVERY - Disk space on analytics1055 is OK: DISK OK
[23:08:50] RECOVERY - Disk space on analytics1045 is OK: DISK OK
[23:09:09] !log cleaned up tmp files on all analytics hadoop worker nodes, job filling up tmp
[23:09:09] (PS1) Niharika29: Make it explicit that extension1 contains Echo databases [mediawiki-config] - https://gerrit.wikimedia.org/r/410079
[23:09:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:09:30] RECOVERY - Disk space on analytics1029 is OK: DISK OK
[23:11:27] Puppet, AbuseFilter, User-MarcoAurelio: Setup puppet cron to delete old data daily - https://phabricator.wikimedia.org/T187053#3965494 (MarcoAurelio) Open>Invalid Given that the cron was already there and was the script the one that was broken, I'm closing this.
[23:12:02] (CR) Gergő Tisza: "It's counting the rows where the first two columns of the index are a specified value, so MariaDB just needs to walk the B-tree and count " [mediawiki-config] - https://gerrit.wikimedia.org/r/409712 (https://phabricator.wikimedia.org/T186296) (owner: Gergő Tisza)
[23:12:21] RECOVERY - Hadoop NodeManager on analytics1058 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[23:13:40] RECOVERY - Hadoop NodeManager on analytics1031 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[23:13:47] !log manual restart of Yarn Node Managers on analytics1058/31 (failed due to root partition filled up for the issue logged before)
[23:13:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:20:27] Operations, Wikidata: Badges not displaying on trwiki - https://phabricator.wikimedia.org/T186815#3965553 (Ladsgroup) This happened because we removed the bundle of Wikidata from the production and thus the addresses got lost. Where these addresses have used? in [[https://tr.wikipedia.org/wiki/MediaWiki:...
[23:21:40] jouncebot: next
[23:21:40] In 0 hour(s) and 38 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180213T0000)
[23:27:57] (CR) Dzahn: [C: 2] mediawiki: fix for I3544d91a [puppet] - https://gerrit.wikimedia.org/r/410074 (owner: MarcoAurelio)
[23:32:43] !log terbium,wasat: touch /var/log/mediawwiki/purge_abusefilter.log ; set owner/permissions like other logfiles
[23:32:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:33:16] mediaWWiki :)
[23:33:46] * Hauskatze zZzZ
[23:46:30] (CR) Chad: [V: 2 C: 2] Adding reviewers plugin [software/gerrit] - https://gerrit.wikimedia.org/r/409363 (owner: Chad)
[23:46:56] !log demon@tin Started deploy [gerrit/gerrit@6adde70]: reviewers plugin
[23:47:08] !log demon@tin Finished deploy [gerrit/gerrit@6adde70]: reviewers plugin (duration: 00m 12s)
[23:47:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:47:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:47:35] no_justification :)
[23:48:18] no_justification https://gerrit.wikimedia.org/r/#/x/reviewers/p/3d2png
[23:48:30] * paladox finds a way to enable it for all registered users
[23:49:04] Project owners have to approve it since it goes into refs/meta/config
[23:49:35] no_justification not if we do it globally to allow it, but let project owners decide weather to close it.
[23:49:49] Operations, Beta-Cluster-Infrastructure: Remove video scaler instances from deployment-prep - https://phabricator.wikimedia.org/T187063#3965674 (brion) I'm not using them for anything; should be clear to wipe them as long as the live servers serving the video scaler queues are not affected. :D
[23:49:57] How would we allow merging of some files on that branch but not others?
[23:50:02] (we don't do file-level permissions...)
[23:50:12] oh, i see what you mean
[23:50:59] Yeah. Might be an issue :\
[23:52:24] no_justification: is it a good time to log out all the users?:)
[23:53:39] Hmm it could use a seperate branch refs/meta/reviewers.
[23:55:49] mutante: Is there ever a good time?
[23:55:50] :p
[23:56:35] Operations, Phabricator: Upload php7.1 to apt.wm.org - https://phabricator.wikimedia.org/T160714#3965680 (mmodell) @MoritzMuehlenhoff: There is at least one 3rd party PHP 7.1 package available [[[ https://packages.sury.org/php/ | 1 ]]]. Could we not use their source packages to build our own binaries? I...
[23:56:47] no_justification: maybe metrics meeting :)
[23:57:26] jouncebot: next
[23:57:26] In 0 hour(s) and 2 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180213T0000)
[23:57:32] eh, not now :) hehe
[23:57:32] no_justification i've filled https://bugs.chromium.org/p/gerrit/issues/detail?id=8365
[23:58:41] (CR) Chad: [C: 2] Moving Sentry to CommonSettings/extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/409750 (owner: Chad)