[00:02:42] guess I can get the parserfunctions change out while we wait on zuul
[00:08:59] !log thcipriani@deploy1001 Synchronized php-1.32.0-wmf.15/extensions/ParserFunctions/includes/ExtParserFunctions.php: SWAT: [[gerrit:449634|Remove & from $mwDefault variable assignment]] T200772 (duration: 00m 57s)
[00:09:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:09:03] T200772: ParserIntegrationTest::testParse (Only variables should be assigned by reference) - https://phabricator.wikimedia.org/T200772
[00:11:09] brion: ok, TimedMediaHandler update for wmf.15 live on mwdebug1002, check please
[00:11:50] looking
[00:13:01] thcipriani: ++good
[00:13:11] awesome, thanks for checking, going live
[00:13:17] \o/
[00:15:40] !log thcipriani@deploy1001 Synchronized php-1.32.0-wmf.15/extensions/TimedMediaHandler/WebVideoTranscode/WebVideoTranscode.php: SWAT: [[gerrit:449622|Workaround for job queue reporting 0 length]] (T200813) (duration: 00m 56s)
[00:15:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:15:44] T200813: JobQueueGroup::singleton()->getQueueSizes() returns 0 for all queues in production - https://phabricator.wikimedia.org/T200813
[00:15:46] ^ brion all live
[00:15:51] yayyy
[00:16:26] (CR) jenkins-bot: Enable $wgCiteResponsiveReferences for Meta-Wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/449277 (https://phabricator.wikimedia.org/T200707) (owner: MarcoAurelio)
[00:18:22] Krenair: that's right. https://phabricator.wikimedia.org/T174431 has been closed today
[00:19:27] there is still https://phabricator.wikimedia.org/T176370 though
[00:19:48] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team, Jenkins, Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (thcipriani) >>! In T192561#4467456, @Krenair wrote: > Luckily deploy01 is almost there? deploy01 is...
[00:22:45] (PS2) Dzahn: Beta: Make deployment-deploy01 main deploy server [puppet] - https://gerrit.wikimedia.org/r/449521 (https://phabricator.wikimedia.org/T192561) (owner: Thcipriani)
[00:23:31] (CR) Dzahn: [C: 2] "already cherry-picked and active server per https://phabricator.wikimedia.org/T192561#4467532" [puppet] - https://gerrit.wikimedia.org/r/449521 (https://phabricator.wikimedia.org/T192561) (owner: Thcipriani)
[00:39:09] (PS1) Thcipriani: Scap: update logstash_checker.py mwdeploy query [puppet] - https://gerrit.wikimedia.org/r/449639
[00:48:18] (CR) Dzahn: [C: 2] "we did break deployment-prep (sorry) but it should be ok now: https://phabricator.wikimedia.org/T192561#4467069" [puppet] - https://gerrit.wikimedia.org/r/449219 (owner: Muehlenhoff)
[00:50:01] (CR) Dzahn: [C: 2] "--> "Beta: Make deployment-deploy01 main deploy server" has been merged" [puppet] - https://gerrit.wikimedia.org/r/449219 (owner: Muehlenhoff)
[00:52:12] (PS2) Dzahn: ores::redis: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/449631
[01:01:08] Operations, User-herron: Improve visibility of incoming operations tasks - https://phabricator.wikimedia.org/T197624 (Dzahn) Open>Resolved a:Dzahn I would like it if we could have a new task status besides Resolved/Open/etc. That status that i feel is missing would be "New", a state before "O...
[01:01:26] Operations, User-herron: Improve visibility of incoming operations tasks - https://phabricator.wikimedia.org/T197624 (Dzahn) Resolved>Open
[01:02:05] Operations, User-herron: Improve visibility of incoming operations tasks - https://phabricator.wikimedia.org/T197624 (Dzahn) a:Dzahn>None (closed by accident, oops)
[01:04:46] (CR) Dzahn: [C: 2] ores::redis: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/449631 (owner: Dzahn)
[01:17:49] Operations, LDAP-Access-Requests, Graphite, User-Addshore: Give Bmueller grafana-admin access - https://phabricator.wikimedia.org/T199965 (Dzahn) A comment on this (that doesn't affect the actual access request). grafana-admin doesn't exist anymore since just now. (--> T170150). But the same gr...
[01:32:13] @brion it's back to error 500 now
[01:32:23] https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/Jupiter_diagram.svg/5000px-Jupiter_diagram.svg.png
[01:32:48] aaaaaaaaaa: yep I got a sample recorded to check logs later if necessary :)
[01:36:38] any idea if i'll be able to download it in the near future? :)
[01:37:50] not likely, no
[01:38:25] poop
[01:38:27] might need to be investigated :)
[01:38:55] any thing i can follow to keep updated? i don't wanna try to keep up with this IRC
[01:59:57] lemme file a task real quick
[02:04:00] aaaaaaaaaa: https://phabricator.wikimedia.org/T200866
[02:19:07] (CR) Alex Monk: "No worries, this one has been expected for a while." [puppet] - https://gerrit.wikimedia.org/r/449219 (owner: Muehlenhoff)
[02:29:03] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team, Jenkins, Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Krenair) >>! In T192561#4467532, @thcipriani wrote: > Probably ought to make a deploy02 and then shu...
[02:34:19] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.14) (duration: 14m 25s)
[02:34:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:54:17] PROBLEM - Memory correctable errors -EDAC- on mw2157 is CRITICAL: 9.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw2157&var-datasource=codfw%2520prometheus%252Fops
[03:06:48] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.15) (duration: 14m 45s)
[03:06:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:12:09] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team, Jenkins, Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Krenair) I think I misunderstood something about the npm thing and I don't think my patch for it wor...
[03:15:43] (PS1) Alex Monk: deployment-prep: Set up deployment-deploy02 as deployment-mira stretch replacement [puppet] - https://gerrit.wikimedia.org/r/449643 (https://phabricator.wikimedia.org/T192561)
[03:16:28] (CR) jerkins-bot: [V: -1] deployment-prep: Set up deployment-deploy02 as deployment-mira stretch replacement [puppet] - https://gerrit.wikimedia.org/r/449643 (https://phabricator.wikimedia.org/T192561) (owner: Alex Monk)
[03:16:55] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team, Jenkins, Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Krenair) thcipriani, do we also need to do something about the apt source pointing at depl...
[03:17:12] !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Wed Aug 1 03:17:12 UTC 2018 (duration 10m 24s)
[03:17:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:17:44] (PS2) Alex Monk: beta: Set up deployment-deploy02 as deployment-mira replacement [puppet] - https://gerrit.wikimedia.org/r/449643 (https://phabricator.wikimedia.org/T192561)
[03:20:16] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team, Jenkins, Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Krenair) Also need to figure out what to do about `hieradata/labs/deployment-prep/host/deployment-ti...
[03:26:57] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 950.21 seconds
[03:41:47] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 158.92 seconds
[04:17:57] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team, Jenkins, Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Dzahn) Move the content to hieradata/labs/deployment-prep-host/common.yaml and delete ./host/deploym...
[04:19:41] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team, Jenkins, Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Dzahn) >>! In T192561#4467794, @Krenair wrote: > It looks like we probably need to copy /srv/packag...
[04:23:51] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team, Jenkins, Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Dzahn) >>! In T192561#4467858, @Dzahn wrote: > Move the content to hieradata/labs/deployment-prep-ho...
[04:24:40] (PS1) Tim Starling: Enable MCR migration stage "write both, read old" (the default) on remaining wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/449649 (https://phabricator.wikimedia.org/T197816)
[04:29:28] (CR) Giuseppe Lavagetto: [C: 1] Use mcrouter for cache reads for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/449603 (https://phabricator.wikimedia.org/T198239) (owner: Aaron Schulz)
[04:29:37] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[04:29:45] (CR) Giuseppe Lavagetto: [C: 1] Use mcrouter for cache reads on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/449604 (https://phabricator.wikimedia.org/T198239) (owner: Aaron Schulz)
[04:30:22] (CR) Giuseppe Lavagetto: [C: 1] Only do cache writes to mcrouter for all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/449605 (https://phabricator.wikimedia.org/T198239) (owner: Aaron Schulz)
[04:30:47] RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[04:30:47] (CR) Giuseppe Lavagetto: [C: 1] Allow broadcasted mcrouter cache operations for purges [mediawiki-config] - https://gerrit.wikimedia.org/r/449606 (https://phabricator.wikimedia.org/T198239) (owner: Aaron Schulz)
[04:40:56] (PS2) Giuseppe Lavagetto: profile::mediawiki::hhvm: add auto_prepend_file everywhere [puppet] - https://gerrit.wikimedia.org/r/440822 (https://phabricator.wikimedia.org/T180183)
[04:41:03] (PS1) Marostegui: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/449651
[04:42:48] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1101:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/449651 (owner: Marostegui)
[04:42:51] (PS1) Marostegui: db1089: Change binlog to ROW [puppet] - https://gerrit.wikimedia.org/r/449652 (https://phabricator.wikimedia.org/T199861)
[04:44:02] (CR) Marostegui: [C: 2] db1089: Change binlog to ROW [puppet] - https://gerrit.wikimedia.org/r/449652 (https://phabricator.wikimedia.org/T199861) (owner: Marostegui)
[04:44:05] (Merged) jenkins-bot: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/449651 (owner: Marostegui)
[04:44:19] (CR) jenkins-bot: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/449651 (owner: Marostegui)
[04:45:51] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1101:3318 (duration: 01m 06s)
[04:45:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:46:05] !log Deploy schema change on db1101:3318 T144010 T51190 T199368
[04:46:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:46:11] T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190
[04:46:11] T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010
[04:46:11] T199368: Convert UNIQUE INDEX
to PK in Production - https://phabricator.wikimedia.org/T199368
[04:46:33] (CR) Giuseppe Lavagetto: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/11939/" [puppet] - https://gerrit.wikimedia.org/r/440822 (https://phabricator.wikimedia.org/T180183) (owner: Giuseppe Lavagetto)
[04:46:49] (PS3) Giuseppe Lavagetto: profile::mediawiki::hhvm: add auto_prepend_file everywhere [puppet] - https://gerrit.wikimedia.org/r/440822 (https://phabricator.wikimedia.org/T180183)
[04:47:51] !log Stop MySQL on db1052 to copy its content to dbstore1001 - https://phabricator.wikimedia.org/T199861
[04:47:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:48:03] (PS2) Giuseppe Lavagetto: profile::mediawiki::hhvm: enable TC garbage collection everywhere [puppet] - https://gerrit.wikimedia.org/r/440823 (https://phabricator.wikimedia.org/T103886)
[04:52:37] (CR) Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler02/11940/" [puppet] - https://gerrit.wikimedia.org/r/440823 (https://phabricator.wikimedia.org/T103886) (owner: Giuseppe Lavagetto)
[04:52:40] (CR) Giuseppe Lavagetto: [C: 2] profile::mediawiki::hhvm: enable TC garbage collection everywhere [puppet] - https://gerrit.wikimedia.org/r/440823 (https://phabricator.wikimedia.org/T103886) (owner: Giuseppe Lavagetto)
[04:56:31] (PS1) Marostegui: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/449653 (https://phabricator.wikimedia.org/T199861)
[04:58:59] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/449653 (https://phabricator.wikimedia.org/T199861) (owner: Marostegui)
[05:00:12] (Merged) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/449653 (https://phabricator.wikimedia.org/T199861) (owner: Marostegui)
[05:00:25] (CR) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] -
https://gerrit.wikimedia.org/r/449653 (https://phabricator.wikimedia.org/T199861) (owner: Marostegui)
[05:01:25] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1089 (duration: 00m 57s)
[05:01:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:02:43] !log Stop MySQL on db1089 for a binlog format change (and also upgrade kernel)
[05:02:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:05:59] (PS1) Marostegui: db-eqiad.php: Slowly repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/449655
[05:12:06] PROBLEM - MariaDB Slave Lag: s6 on dbstore2001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 307.79 seconds
[05:12:07] (CR) Marostegui: [C: 2] db-eqiad.php: Slowly repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/449655 (owner: Marostegui)
[05:13:30] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/449655 (owner: Marostegui)
[05:14:40] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1089 (duration: 00m 57s)
[05:14:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:16:06] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/449655 (owner: Marostegui)
[05:21:56] PROBLEM - HHVM jobrunner on mw1307 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[05:23:06] RECOVERY - HHVM jobrunner on mw1307 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.006 second response time
[05:34:26] PROBLEM - HHVM jobrunner on mw1303 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.005 second response time
[05:35:27] RECOVERY - HHVM jobrunner on mw1303 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[05:50:17] PROBLEM - HHVM jobrunner on mw1306 is CRITICAL:
HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.042 second response time
[05:51:26] RECOVERY - HHVM jobrunner on mw1306 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.004 second response time
[05:51:53] (PS1) Muehlenhoff: Remove access for pnorman [puppet] - https://gerrit.wikimedia.org/r/449656
[05:54:16] (CR) Muehlenhoff: [C: 2] Remove access for pnorman [puppet] - https://gerrit.wikimedia.org/r/449656 (owner: Muehlenhoff)
[05:56:56] PROBLEM - HHVM jobrunner on mw1305 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.003 second response time
[05:57:57] RECOVERY - HHVM jobrunner on mw1305 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.004 second response time
[06:07:36] PROBLEM - HHVM jobrunner on mw1300 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[06:08:46] RECOVERY - HHVM jobrunner on mw1300 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[06:11:02] !log seeing spike in errors on video scalers
[06:11:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:11:41] !log stopping requeueTranscodes.php for now
[06:11:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:17:41] <_joe_> brion: which kind of errors?
[06:17:57] <_joe_> it might be due to us distributing a new hhvm config
[06:19:03] looking at logs
[06:19:06] _joe_: some signal 15s (SIGTERM I think)
[06:19:15] others no listed signal, just exitcode -1
[06:19:24] <_joe_> brion: yes, that comes from hhvm restarting
[06:19:25] <_joe_> sorry
[06:19:32] <_joe_> videos of course will fail
[06:19:35] aha fun :D
[06:19:42] ok i'll improve the retry logic in future ;)
[06:20:51] thanks :D
[06:21:44] hmmmmm yeah i can re-run the batch command and it should just requeue them.
sweet
[06:22:49] !log restarting transcode batch job (errors believed caused by hhvm config restarts)
[06:22:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:03] <_joe_> brion: I think changeprop has its own retry logic
[06:26:10] i think tmh assumes a retry would likely fail again and so just errors out until someone manually requeues it. i can probably add a retry-at-least-once though
[06:27:25] ok it's late for me, i'm backing away from the computer :D
[06:28:46] PROBLEM - puppet last run on mw1323 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/share/ca-certificates/DigiCert_High_Assurance_CA-3.crt]
[06:29:36] PROBLEM - puppet last run on analytics1071 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/default/ferm],File[/usr/local/bin/puppet-enabled]
[06:29:36] PROBLEM - puppet last run on ms-be1027 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled]
[06:32:27] PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-puppet-agent-stats]
[06:32:46] <_joe_> uhm puppetmaster failure at 6:30 I would say
[06:32:47] PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/vim/vimrc.local]
[06:32:47] PROBLEM - puppet last run on mw1319 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures.
Failed resources (up to 3 shown): File[/etc/apache2/sites-available/09-wikimedia.conf]
[06:32:50] <_joe_> blame logrotate
[06:33:27] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[06:34:37] PROBLEM - HHVM jobrunner on mw1301 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[06:35:02] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - https://gerrit.wikimedia.org/r/449658
[06:35:30] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - https://gerrit.wikimedia.org/r/449658
[06:35:39] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/449659
[06:35:46] RECOVERY - HHVM jobrunner on mw1301 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.008 second response time
[06:35:47] PROBLEM - HHVM jobrunner on mw1296 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.034 second response time
[06:36:56] RECOVERY - HHVM jobrunner on mw1296 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[06:37:10] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - https://gerrit.wikimedia.org/r/449658 (owner: Marostegui)
[06:37:17] PROBLEM - HHVM jobrunner on mw1334 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[06:38:19] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - https://gerrit.wikimedia.org/r/449658 (owner: Marostegui)
[06:38:26] RECOVERY - HHVM jobrunner on mw1334 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.003 second response time
[06:38:44] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1089 [mediawiki-config] -
https://gerrit.wikimedia.org/r/449659 (owner: Marostegui)
[06:38:49] (PS2) Marostegui: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/449659
[06:39:33] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1101:3318 (duration: 00m 55s)
[06:39:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:43] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic db1089 (duration: 00m 55s)
[06:41:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:22] (PS1) Marostegui: db-eqiad.php: Depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/449660
[06:44:17] PROBLEM - HHVM jobrunner on mw1311 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[06:44:42] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/449660 (owner: Marostegui)
[06:45:27] RECOVERY - HHVM jobrunner on mw1311 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[06:46:00] (Merged) jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/449660 (owner: Marostegui)
[06:46:29] (PS7) Giuseppe Lavagetto: mediawiki: use compile_redirects as a function [puppet] - https://gerrit.wikimedia.org/r/357733 (owner: Faidon Liambotis)
[06:46:31] (PS1) Giuseppe Lavagetto: mediawiki::web::site: backport changes from mediawiki_exp [puppet] - https://gerrit.wikimedia.org/r/449661
[06:47:12] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1087 (duration: 00m 55s)
[06:47:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:53] !log Deploy schema change on db1087 with replication, this will generate lag on labs:s8 T144010 T51190 T199368
[06:47:59] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:00] T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190
[06:48:00] T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010
[06:48:01] T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368
[06:51:44] Operations: Add email addresses for new techcom members to techcom@wikimedia.org - https://phabricator.wikimedia.org/T200799 (Joe) Open>Resolved
[06:53:10] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - https://gerrit.wikimedia.org/r/449658 (owner: Marostegui)
[06:53:12] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/449659 (owner: Marostegui)
[06:53:14] (CR) jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/449660 (owner: Marostegui)
[06:55:06] RECOVERY - puppet last run on analytics1071 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:55:43] (PS1) Nikerabbit: Fix config for $wgLocalisationUpdateRepositories [mediawiki-config] - https://gerrit.wikimedia.org/r/449663 (https://phabricator.wikimedia.org/T148965)
[06:57:17] PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:58:02] !log elukey@deploy1001 Started deploy [eventlogging/analytics@762ca2b]: Deploy https://gerrit.wikimedia.org/r/#/c/eventlogging/+/449422/
[06:58:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:06] RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:58:08] !log elukey@deploy1001 Finished deploy [eventlogging/analytics@762ca2b]: Deploy https://gerrit.wikimedia.org/r/#/c/eventlogging/+/449422/ (duration: 00m 07s)
[06:58:11] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:17] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 75606 bytes in 0.092 second response time
[06:58:26] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:58:26] RECOVERY - puppet last run on mw1319 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:59:07] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:27] RECOVERY - puppet last run on mw1323 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[06:59:29] !log restart eventlogging on eventlog1002 to pick up new logging settings
[06:59:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:56] PROBLEM - Nginx local proxy to apache on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:00:07] RECOVERY - puppet last run on ms-be1027 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:00:56] RECOVERY - Nginx local proxy to apache on mw1287 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.038 second response time
[07:01:30] (PS14) Elukey: Refactor Hadoop code to allow more than one cluster in Prod [puppet] - https://gerrit.wikimedia.org/r/447813 (https://phabricator.wikimedia.org/T167790)
[07:07:24] (PS1) Marostegui: db-eqiad,db-codfw.php: Remove db1052 [mediawiki-config] - https://gerrit.wikimedia.org/r/449665 (https://phabricator.wikimedia.org/T199861)
[07:08:51] (CR) Marostegui: [C: 2] db-eqiad,db-codfw.php: Remove db1052 [mediawiki-config] - https://gerrit.wikimedia.org/r/449665 (https://phabricator.wikimedia.org/T199861) (owner: Marostegui)
[07:09:28] (CR) KartikMistry: [C: 1] Fix config for $wgLocalisationUpdateRepositories [mediawiki-config] - https://gerrit.wikimedia.org/r/449663
(https://phabricator.wikimedia.org/T148965) (owner: Nikerabbit)
[07:09:48] !log Remove db1052 from tendril - T199861
[07:09:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:52] T199861: Decommission db1052 - https://phabricator.wikimedia.org/T199861
[07:10:08] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Remove db1052 [mediawiki-config] - https://gerrit.wikimedia.org/r/449665 (https://phabricator.wikimedia.org/T199861) (owner: Marostegui)
[07:10:47] RECOVERY - MariaDB Slave Lag: s6 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 35.24 seconds
[07:11:32] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db1052 from config as it will be decommissioned - T199861 (duration: 00m 56s)
[07:11:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:35] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Remove db1052 from config as it will be decommissioned - T199861 (duration: 00m 55s)
[07:12:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:15] (PS1) Marostegui: mariadb: Set db1052 to spare [puppet] - https://gerrit.wikimedia.org/r/449666 (https://phabricator.wikimedia.org/T199861)
[07:15:27] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0
[07:16:06] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 124, down: 1, dormant: 0, excluded: 0, unused: 0
[07:16:23] (CR) Marostegui: "https://puppet-compiler.wmflabs.org/compiler02/11942/" [puppet] - https://gerrit.wikimedia.org/r/449666 (https://phabricator.wikimedia.org/T199861) (owner: Marostegui)
[07:16:30] (CR) Marostegui: [C: 2] mariadb: Set db1052 to spare [puppet] - https://gerrit.wikimedia.org/r/449666 (https://phabricator.wikimedia.org/T199861) (owner: Marostegui)
[07:23:27] Operations,
ops-eqiad, DBA, decommission, Patch-For-Review: Decommission db1052 - https://phabricator.wikimedia.org/T199861 (Marostegui) a:Marostegui>RobH db1052 is now ready for DCOps to finish its decommissioning - assigning it to @RobH db1052 was a great s1 master but now it needs...
[07:25:39] (CR) jenkins-bot: db-eqiad,db-codfw.php: Remove db1052 [mediawiki-config] - https://gerrit.wikimedia.org/r/449665 (https://phabricator.wikimedia.org/T199861) (owner: Marostegui)
[07:37:57] (PS15) Elukey: Refactor Hadoop code to allow more than one cluster in Prod [puppet] - https://gerrit.wikimedia.org/r/447813 (https://phabricator.wikimedia.org/T167790)
[07:41:51] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown, SEO: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (TheDJ) So apparently Google detects JS based redirects these days. I also suspect...
[07:43:48] (CR) Elukey: [C: 2] "Forgot about flerovium, furud, stat100[4,5] but they now looks good https://puppet-compiler.wmflabs.org/compiler02/11944/" [puppet] - https://gerrit.wikimedia.org/r/447813 (https://phabricator.wikimedia.org/T167790) (owner: Elukey)
[07:59:05] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - https://gerrit.wikimedia.org/r/449671
[07:59:37] !log restart hadoop-yarn-nodemanager on analytics10[28-30] to test new memory settings
[07:59:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:29] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - https://gerrit.wikimedia.org/r/449671
[08:14:45] (CR) Mobrovac: [C: 1] Scap: update logstash_checker.py mwdeploy query [puppet] - https://gerrit.wikimedia.org/r/449639 (owner: Thcipriani)
[08:15:13] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1087" [mediawiki-config] -
https://gerrit.wikimedia.org/r/449671 (owner: Marostegui)
[08:16:05] PROBLEM - mysqld processes on db1117 is CRITICAL: PROCS CRITICAL: 3 processes with command name mysqld
[08:16:34] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - https://gerrit.wikimedia.org/r/449671 (owner: Marostegui)
[08:16:37] !log Stop MySQL on db2078
[08:16:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:46] (CR) Amire80: "There are some replies at the discussion now:" [mediawiki-config] - https://gerrit.wikimedia.org/r/448553 (https://phabricator.wikimedia.org/T200522) (owner: Amire80)
[08:16:48] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - https://gerrit.wikimedia.org/r/449671 (owner: Marostegui)
[08:17:47] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1087 (duration: 00m 56s)
[08:17:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:21] !log start of ladsgroup@mwmaint1001:~$ foreachwikiindblist s8 populateChangeTagDef.php --sleep 3 (T193873)
[08:21:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:25] T193873: Run maintenance script to populate change_tag_def on WMF production (all wikis) - https://phabricator.wikimedia.org/T193873
[08:22:07] !log start of ladsgroup@mwmaint1001:~$ foreachwikiindblist s1 populateChangeTagDef.php --sleep 2 (T193873)
[08:22:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:03] (PS1) Muehlenhoff: Remove jessie-specific Puppet code from mediawiki classes [puppet] - https://gerrit.wikimedia.org/r/449672
[08:23:05] (PS1) Muehlenhoff: Inline mediawiki::packages::math [puppet] - https://gerrit.wikimedia.org/r/449673
[08:23:07] (PS1) Muehlenhoff: Remove jessie-specific code from mediawiki::packages::tex, cleanups [puppet] - https://gerrit.wikimedia.org/r/449674
[08:23:09] (PS1)
10Muehlenhoff: Inline mediawiki::packages::tex [puppet] - 10https://gerrit.wikimedia.org/r/449675 [08:23:43] (03CR) 10jerkins-bot: [V: 04-1] Remove jessie-specific Puppet code from mediawiki classes [puppet] - 10https://gerrit.wikimedia.org/r/449672 (owner: 10Muehlenhoff) [08:23:48] (03CR) 10jerkins-bot: [V: 04-1] Inline mediawiki::packages::math [puppet] - 10https://gerrit.wikimedia.org/r/449673 (owner: 10Muehlenhoff) [08:24:02] 10Operations, 10User-herron: Improve visibility of incoming operations tasks - https://phabricator.wikimedia.org/T197624 (10Aklapper) >>! In T197624#4467581, @Dzahn wrote: > I would like it if we could have a new task status besides Resolved/Open/etc. That status that i feel is missing would be "New", a state... [08:24:46] (03CR) 10jerkins-bot: [V: 04-1] Inline mediawiki::packages::tex [puppet] - 10https://gerrit.wikimedia.org/r/449675 (owner: 10Muehlenhoff) [08:25:09] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449677 [08:25:52] 10Operations, 10Mathoid, 10SCB, 10Services (watching): remove mathoid from scb - https://phabricator.wikimedia.org/T200832 (10mobrovac) I'd hold off with this for the time being. @akosiaris what do you think? 
[08:26:13] (03PS2) 10Muehlenhoff: Remove jessie-specific Puppet code from mediawiki classes [puppet] - 10https://gerrit.wikimedia.org/r/449672 [08:26:51] (03CR) 10jerkins-bot: [V: 04-1] Remove jessie-specific Puppet code from mediawiki classes [puppet] - 10https://gerrit.wikimedia.org/r/449672 (owner: 10Muehlenhoff) [08:26:58] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Fully repool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449677 (owner: 10Marostegui) [08:28:21] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449677 (owner: 10Marostegui) [08:28:44] (03PS3) 10Muehlenhoff: Remove jessie-specific Puppet code from Mediawiki classes [puppet] - 10https://gerrit.wikimedia.org/r/449672 [08:29:16] (03CR) 10jerkins-bot: [V: 04-1] Remove jessie-specific Puppet code from Mediawiki classes [puppet] - 10https://gerrit.wikimedia.org/r/449672 (owner: 10Muehlenhoff) [08:31:12] (03PS4) 10Muehlenhoff: Remove Jessie-specific Puppet code from Mediawiki classes [puppet] - 10https://gerrit.wikimedia.org/r/449672 [08:32:04] (03PS1) 10Marostegui: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449678 [08:32:52] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449677 (owner: 10Marostegui) [08:33:54] (03PS2) 10Muehlenhoff: Inline mediawiki::packages::math [puppet] - 10https://gerrit.wikimedia.org/r/449673 [08:34:40] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449678 (owner: 10Marostegui) [08:35:56] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449678 (owner: 10Marostegui) [08:37:09] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 55s) [08:37:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:37:15] PROBLEM - 
toolschecker: Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/dumps - 288 bytes in 0.034 second response time [08:37:24] !log Deploy schema change on db1109:3318 T144010 T51190 T199368 [08:37:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:37:31] T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 [08:37:31] T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 [08:37:32] T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 [08:38:19] !log restart hadoop-yarn-nodemanager on analytics10[31-77] to apply the new memory settings [08:38:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:40:36] RECOVERY - toolschecker: Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.042 second response time [08:41:05] 10Operations, 10Mathoid, 10SCB, 10Services (watching): remove mathoid from scb - https://phabricator.wikimedia.org/T200832 (10akosiaris) >>! In T200832#4468216, @mobrovac wrote: > I'd hold off with this for the time being. @akosiaris what do you think? Overall with the push to move all SCB services to kub... [08:49:22] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449678 (owner: 10Marostegui) [08:51:26] PROBLEM - Check systemd state on analytics1051 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[08:51:36] PROBLEM - Hadoop NodeManager on analytics1051 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [08:52:07] checking -^ [08:52:46] RECOVERY - Hadoop NodeManager on analytics1051 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [08:53:55] RECOVERY - Check systemd state on analytics1051 is OK: OK - running: The system is fully operational [09:01:18] (03PS1) 10Jcrespo: db backup statistics: Initial implementation of the backup stats [puppet] - 10https://gerrit.wikimedia.org/r/449681 (https://phabricator.wikimedia.org/T198987) [09:02:01] (03CR) 10jerkins-bot: [V: 04-1] db backup statistics: Initial implementation of the backup stats [puppet] - 10https://gerrit.wikimedia.org/r/449681 (https://phabricator.wikimedia.org/T198987) (owner: 10Jcrespo) [09:05:12] (03CR) 10Daniel Kinzler: [C: 031] "yes, please" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449649 (https://phabricator.wikimedia.org/T197816) (owner: 10Tim Starling) [09:10:21] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449683 [09:11:40] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449683 (owner: 10Marostegui) [09:13:31] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449683 (owner: 10Marostegui) [09:14:38] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1109 (duration: 00m 56s) [09:14:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:14:53] (03PS1) 10Marostegui: db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449686 [09:15:17] (03PS2) 10Jcrespo: db backup statistics: Initial implementation of the backup stats [puppet] - 10https://gerrit.wikimedia.org/r/449681 
(https://phabricator.wikimedia.org/T198987) [09:15:59] (03CR) 10jerkins-bot: [V: 04-1] db backup statistics: Initial implementation of the backup stats [puppet] - 10https://gerrit.wikimedia.org/r/449681 (https://phabricator.wikimedia.org/T198987) (owner: 10Jcrespo) [09:16:37] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449686 (owner: 10Marostegui) [09:17:45] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449686 (owner: 10Marostegui) [09:18:55] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1092 (duration: 00m 55s) [09:18:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:19:03] !log Deploy schema change on db1092 T144010 T51190 T199368 [09:19:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:19:09] T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 [09:19:10] T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 [09:19:10] T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 [09:20:28] (03CR) 10Jcrespo: "We need to fix some issues and enable tls as this will require cross-dc communication." [puppet] - 10https://gerrit.wikimedia.org/r/449681 (https://phabricator.wikimedia.org/T198987) (owner: 10Jcrespo) [09:22:01] (03CR) 10Ladsgroup: "PageTriage is not yet enabled on enwiki. At the end it doesn't matter though." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/449437 (https://phabricator.wikimedia.org/T199357) (owner: 10Sbisson) [09:22:52] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449683 (owner: 10Marostegui) [09:22:56] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449686 (owner: 10Marostegui) [09:24:19] 10Operations, 10Wikimedia-General-or-Unknown: Wrong umask when deploying from screen - https://phabricator.wikimedia.org/T200690 (10zeljkofilipin) This is what the train and swat documentation says, created mostly by @thcipriani, does it need updating? https://wikitech.wikimedia.org/wiki/Heterogeneous_deploym... [09:28:36] RECOVERY - mysqld processes on db1117 is OK: PROCS OK: 4 processes with command name mysqld [09:36:07] !log installing clamav security updates [09:36:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:39:30] !log Run community_metrics.sh on phab1001 [09:39:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:41:46] !log cp3044 (upload) repooled after upgrade to stretch T200445 [09:41:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:41:50] T200445: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 [09:41:53] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449687 [09:42:52] (03PS3) 10Jcrespo: db backup statistics: Initial implementation of the backup stats [puppet] - 10https://gerrit.wikimedia.org/r/449681 (https://phabricator.wikimedia.org/T198987) [09:43:21] (03PS1) 10Marostegui: db1117: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/449688 [09:43:34] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449687 (owner: 10Marostegui) [09:43:36] (03CR) 
10jerkins-bot: [V: 04-1] db backup statistics: Initial implementation of the backup stats [puppet] - 10https://gerrit.wikimedia.org/r/449681 (https://phabricator.wikimedia.org/T198987) (owner: 10Jcrespo) [09:44:56] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449687 (owner: 10Marostegui) [09:46:01] (03CR) 10Marostegui: [C: 032] db1117: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/449688 (owner: 10Marostegui) [09:46:15] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1092 (duration: 00m 55s) [09:46:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:47:18] 10Operations, 10Wikimedia-General-or-Unknown: Wrong umask when deploying from screen - https://phabricator.wikimedia.org/T200690 (10hashar) Using ssh interactively we get the proper behavior: ``` $ ssh deploy1001.eqiad.wmnet hashar@deploy1001:~$ umask 0002 ``` But running umask directly: ``` $ ssh deploy1001... 
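An aside on the umask discussion in T200690 just above: the snippet below is a minimal local sketch, not something run on deploy1001, of why the umask value matters for files a deploy creates. The value 0002 is the one quoted from the interactive session; 0022 is only an assumed stand-in for a more restrictive non-interactive default (the log truncates the actual value), and `mode_under_umask` is a hypothetical helper name.

```python
import os
import stat
import tempfile


def mode_under_umask(mask: int) -> int:
    """Create a probe file requesting 0o666 under `mask`; return its permission bits."""
    old = os.umask(mask)  # os.umask sets the new mask and returns the previous one
    try:
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "probe")
            # The kernel applies (requested mode & ~umask) at creation time.
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # always restore the caller's umask


if __name__ == "__main__":
    # 0o002 is the value quoted in the task; 0o022 is an assumption for comparison.
    for mask in (0o002, 0o022):
        print(f"umask {mask:04o} -> file mode {mode_under_umask(mask):04o}")
```

Under 0002 the probe file stays group-writable; under the assumed 0022 it does not, which is the kind of difference that would bite when several deployers share the same files.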
[09:50:15] !log Deploy schema change on db2048 (s1 codfw master) with replication, this will generate lag on codfw s1 T144010 T51190 T199368 [09:50:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:50:23] T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 [09:50:23] T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 [09:50:23] T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 [09:50:38] !log installing wireshark security updates [09:50:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:53:16] (03PS4) 10Jcrespo: db backup statistics: Initial implementation of the backup stats [puppet] - 10https://gerrit.wikimedia.org/r/449681 (https://phabricator.wikimedia.org/T198987) [09:54:39] (03PS1) 10Marostegui: Revert "dbproxy100{3,8}: Point m3 secondary to codfw" [puppet] - 10https://gerrit.wikimedia.org/r/449690 [09:54:45] (03PS2) 10Marostegui: Revert "dbproxy100{3,8}: Point m3 secondary to codfw" [puppet] - 10https://gerrit.wikimedia.org/r/449690 [09:56:47] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449687 (owner: 10Marostegui) [10:06:07] 10Operations, 10Cloud-VPS, 10cloud-services-team: labvirt1009 has high CPU, disk I/O and skyrocketted load - https://phabricator.wikimedia.org/T200888 (10hashar) [10:06:25] 10Operations, 10Cloud-VPS, 10cloud-services-team: labvirt1009 has high CPU, disk I/O and skyrocketted load - https://phabricator.wikimedia.org/T200888 (10hashar) p:05Triage>03High [10:07:29] 10Operations, 10Wikimedia-General-or-Unknown: Wrong umask when deploying from screen - https://phabricator.wikimedia.org/T200690 (10Tgr) >>! In T200690#4467351, @Dzahn wrote: > You mean deploy1001.eqiad.wmnet, right (vs. mwdeploy) right? Probably, yeah. I actually use `deployment.eqiad.wmnet`. 
[10:11:23] (03CR) 10ArielGlenn: "mediawiki::packages::fonts gets included a lot of places and a lot of hosts that have it are still running jessie:" [puppet] - 10https://gerrit.wikimedia.org/r/449672 (owner: 10Muehlenhoff) [10:13:45] 10Operations, 10Wikimedia-General-or-Unknown: Wrong umask when deploying from screen - https://phabricator.wikimedia.org/T200690 (10Tgr) So yeah, apparently SSH uses a non-login shell when you give it a command to execute, and there is no easy way around it; you have to do things like `ssh deploy1001.eqiad.wmn... [10:17:17] 10Operations, 10DBA, 10JADE, 10Scoring-platform-team, 10TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10daniel) To me it still seems the easiest solution would be to put this on a separate wiki. That way, we can observe... [10:18:55] (03PS5) 10Muehlenhoff: Remove Jessie-specific Puppet code from Mediawiki math class [puppet] - 10https://gerrit.wikimedia.org/r/449672 [10:19:11] (03CR) 10Muehlenhoff: "Right, I've reverted that hunk from the patch until Thumbor and Graphoid are migrated to stretch." 
[puppet] - 10https://gerrit.wikimedia.org/r/449672 (owner: 10Muehlenhoff) [10:27:54] (03CR) 10Ladsgroup: [C: 031] CleanupParent for draftquality model when PageTriage is used [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449437 (https://phabricator.wikimedia.org/T199357) (owner: 10Sbisson) [10:36:19] (03PS2) 10Volans: Initial structure [software/spicerack] - 10https://gerrit.wikimedia.org/r/448046 (https://phabricator.wikimedia.org/T199079) [10:36:21] (03PS3) 10Volans: Add common base utility modules [software/spicerack] - 10https://gerrit.wikimedia.org/r/448047 (https://phabricator.wikimedia.org/T199079) [10:36:35] (03CR) 10Volans: "Thanks for the reviews, replies inline, code updated" (036 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/448047 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [10:37:45] (03CR) 10jerkins-bot: [V: 04-1] Add common base utility modules [software/spicerack] - 10https://gerrit.wikimedia.org/r/448047 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [10:38:14] (03CR) 10KartikMistry: [C: 031] Enable SandboxLink on the isiZulu Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/448553 (https://phabricator.wikimedia.org/T200522) (owner: 10Amire80) [10:39:31] (03CR) 10Marostegui: [C: 032] Revert "dbproxy100{3,8}: Point m3 secondary to codfw" [puppet] - 10https://gerrit.wikimedia.org/r/449690 (owner: 10Marostegui) [10:40:36] 10Operations, 10Analytics, 10hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345 (10elukey) Any update on this one? 
[10:40:57] !log Reload haproxy on dbproxy1008 and dbproxy1003 [10:41:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:41:29] 10Operations, 10Analytics, 10hardware-requests: eqiad: (2) hardware refresh for analytics1003 - https://phabricator.wikimedia.org/T198685 (10elukey) Quick check in to see if we can get the quotes for this host during the next couple of weeks. [10:46:35] (03CR) 10ArielGlenn: "The only thing left now is that integration-slave-jessie-1001,2,3,4.integration.eqiad.wmflabs have the math packages (included in mediawik" [puppet] - 10https://gerrit.wikimedia.org/r/449672 (owner: 10Muehlenhoff) [10:54:10] (03PS1) 10Giuseppe Lavagetto: profile::dumps::generation::worker::common: add mcrouter [puppet] - 10https://gerrit.wikimedia.org/r/449694 [10:54:15] <_joe_> elukey, ema [10:54:17] <_joe_> ^^ [10:54:30] <_joe_> this should solve the issues with memcached errors [10:54:42] <_joe_> apergos: you too ofc :) [10:54:58] ah gooooood [10:54:59] thank you [10:59:04] <_joe_> apergos: https://puppet-compiler.wmflabs.org/compiler02/11945/snapshot1008.eqiad.wmnet/ seems ok [10:59:48] I was just looking at the manifest itself [11:00:05] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: How many deployers does it take to do European Mid-day SWAT(Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180801T1100). [11:00:05] Nikerabbit, aharoni, and stephanebisson: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:15] o/ [11:00:19] I can SWAT today [11:00:28] yo zeljkof, how's it going [11:00:38] all work no fun :D [11:00:42] (03CR) 10ArielGlenn: [C: 031] "Thanks for finding/fixing this." 
[puppet] - 10https://gerrit.wikimedia.org/r/449694 (owner: 10Giuseppe Lavagetto) [11:00:43] \o [11:00:46] I just added a last minute patch, sorry but it's an UBN, I hope it can make it [11:00:51] I'm just glad I'm no longer running train ;) [11:01:10] jan_drewniak: sure, UBN have priority, want to deploy it first? [11:01:39] Hi [11:01:43] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::dumps::generation::worker::common: add mcrouter [puppet] - 10https://gerrit.wikimedia.org/r/449694 (owner: 10Giuseppe Lavagetto) [11:01:57] I only have ~30 minutes today, but I can do my patch some other time too [11:02:07] @zeljkof I haven't actually done a deploy like this before, can you deploy it for me? It's a back port [11:02:48] jan_drewniak: sure, but we have extensive docs these days if you want to try ;) https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers [11:04:34] Nikerabbit: you're a deployer, want to deploy your own change? [11:04:44] you can go first if you're in a hurry [11:04:55] @zeljkof thanks for pointing me to there! how about I do it next time :D promise [11:05:26] jan_drewniak: no problem :) I'll deploy, take a look at the docs, we've updated them significantly [11:05:28] zeljkof: oh, I haven't deployed in a few years and wasn't mentally prepared for that :D [11:05:56] Nikerabbit: no problem, I'll deploy, but we have nice docs if you want to try :) https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers [11:06:20] zeljkof: I'll make a note to prepare to deploy myself next time [11:07:00] Nikerabbit: sure, it's good that every team has somebody that can deploy, #releng can not guarantee that somebody will be there for swat every time [11:07:41] stephanebisson: you're a deployer too, right? want to deploy your own changes today? 
[11:08:22] I'll start with Nikerabbit and aharoni, please stand by, I'll ping you when your changes are at mwdebug1002 for testing, let me know if you need help testing there, there are docs [11:08:46] zeljkof: my test plan is just to deploy and wait until LocalisationUpdate runs again and see if warnings persist (it does not affect web requests in any way) [11:09:00] zeljkof: I don't think I have access. I've deployed maps but I don't know how to swat. I'll look into it in the near future though. [11:09:06] zeljkof: thanks. Mine is a super-simple one-line configuration change. [11:09:27] Enabling a small extension on a small wiki. [11:10:03] stephanebisson: this says you have :) https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/admin/data/data.yaml$62 [11:10:37] stephanebisson: I'll deploy, but take a look at the docs for a possible future deploy :) https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers [11:10:58] Nikerabbit: ok, so I just deploy your commit, no need for testing at mwdebug? [11:10:59] zeljkof: I get the message ;) [11:11:27] zeljkof: famous last words, but no need to test at mwdebug [11:11:27] stephanebisson: I'll repeat my usual mantra ;) it's good that every team has somebody that can deploy, #releng can not guarantee that somebody will be there for swat every time [11:11:49] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 126, down: 0, dormant: 0, excluded: 0, unused: 0 [11:11:54] Nikerabbit: I'll deploy there first anyway, check the logs quickly, but ping you when it's at production [11:12:02] zeljkof: +1 [11:12:19] Nikerabbit: and scap checks the logs automatically anyway [11:12:19] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0 [11:12:44] zeljkof: there's a new scap swat command by the way? 
[11:13:11] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449663 (https://phabricator.wikimedia.org/T148965) (owner: 10Nikerabbit) [11:13:13] Hauskatze: which one? [11:14:09] jan_drewniak: your entry to deployments calendar does not look good, check it, for example the link is not there :) [11:14:22] zeljkof: saw some stuff being added at requirements.txt with that summary "New scap swat command" [11:14:26] I'll check it anyway [11:14:28] (03Merged) 10jenkins-bot: Fix config for $wgLocalisationUpdateRepositories [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449663 (https://phabricator.wikimedia.org/T148965) (owner: 10Nikerabbit) [11:14:35] Hauskatze: did not notice that [11:15:09] jan_drewniak: this is it? [11:15:11] https://gerrit.wikimedia.org/r/c/mediawiki/skins/MinervaNeue/+/449647 [11:15:12] @zeljkof baaa sorry! was added in a hurry, link is https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/449647/ but I'll update [11:15:27] bah, commit is 2 years old and doesn't do justice to its name [11:16:13] jan_drewniak: and hm, the commit is not merged into master yet, that's usually the case [11:16:30] jan_drewniak: it isn't a backport from master to a branch if it isn't in master yet ;) [11:17:33] !log installing openjpeg2 security updates on jessie [11:17:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:18] (03CR) 10jenkins-bot: Fix config for $wgLocalisationUpdateRepositories [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449663 (https://phabricator.wikimedia.org/T148965) (owner: 10Nikerabbit) [11:20:39] !log zfilipin@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:449663|Fix config for $wgLocalisationUpdateRepositories (T148965)]] (duration: 00m 57s) [11:20:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:20:44] T148965: PHP Notice "Undefined index: skin" from extensions/LocalisationUpdate/Updater.php - 
https://phabricator.wikimedia.org/T148965 [11:21:02] Nikerabbit: it's deployed, please check relevant logs and thanks for deploying with #releng ;) [11:21:10] zeljkof: https://github.com/wikimedia/operations-mediawiki-config/commit/0da4f050082a493d3e0ffa51baba027f6ef6fdbe for when you have time :) [11:21:10] aharoni: you're next, please stand by [11:21:19] I'm here [11:21:26] Hauskatze: so, when I'm retired? ;) [11:21:43] zeljkof: excellent :) [11:21:53] call me when that happens, we can go play bingo :P [11:22:56] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/448553 (https://phabricator.wikimedia.org/T200522) (owner: 10Amire80) [11:24:10] (03CR) 10Zfilipin: Enable SandboxLink on the isiZulu Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/448553 (https://phabricator.wikimedia.org/T200522) (owner: 10Amire80) [11:24:15] (03PS2) 10Zfilipin: Enable SandboxLink on the isiZulu Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/448553 (https://phabricator.wikimedia.org/T200522) (owner: 10Amire80) [11:24:22] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/448553 (https://phabricator.wikimedia.org/T200522) (owner: 10Amire80) [11:25:56] (03Merged) 10jenkins-bot: Enable SandboxLink on the isiZulu Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/448553 (https://phabricator.wikimedia.org/T200522) (owner: 10Amire80) [11:26:19] zeljkof: ready to test? [11:26:55] (03CR) 10Muehlenhoff: "Yeah, but that's a bug in the contint class, it has a different scope and should not include the mediawiki classes. We already fixed that " [puppet] - 10https://gerrit.wikimedia.org/r/449672 (owner: 10Muehlenhoff) [11:27:22] aharoni: it's at mwdebug1002, please test and let me know if I can deploy it [11:27:31] Nikerabbit: are you there? can you test please? 
[11:27:59] yep [11:28:43] can confirm that it works (and that the label seems to be untranslated) [11:29:10] aharoni, Nikerabbit: ok to deploy? [11:29:14] (03PS1) 10Jcrespo: mariadb: Depool db1092 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449697 [11:29:16] +2 from me [11:29:26] ok, deploying [11:29:49] thanks Nikerabbit [11:30:24] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:448553|Enable SandboxLink on the isiZulu Wikipedia (T200522)]] (duration: 00m 56s) [11:30:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:28] T200522: Enable SandboxLink on the isiZulu Wikipedia - https://phabricator.wikimedia.org/T200522 [11:30:50] aharoni, Nikerabbit: it's deployed, please test and thanks for deploying with #releng! ;) [11:30:56] Nikerabbit: I pointed them to the right place for translation. In these languages there is a frequent problem of missing terminology. I'm working with them on this slowly. [11:31:06] zeljkof: it works! thank you. [11:31:51] (03CR) 10Marostegui: [C: 031] mariadb: Depool db1092 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449697 (owner: 10Jcrespo) [11:31:56] (03PS2) 10Jcrespo: mariadb: Depool db1092 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449697 [11:32:36] (03PS1) 10Marostegui: multiinstance.pp: Page based on the number of processess [puppet] - 10https://gerrit.wikimedia.org/r/449698 (https://phabricator.wikimedia.org/T200509) [11:32:45] stephanebisson: I can deploy 449437 while waiting for 449647 to merge, sounds good? [11:34:10] (03PS3) 10Zfilipin: CleanupParent for draftquality model when PageTriage is used [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449437 (https://phabricator.wikimedia.org/T199357) (owner: 10Sbisson) [11:34:54] zeljkof: sure. There's nothing to test for 449437. It just needs to be in place for when some code is deployed later today or tomorrow. 
[11:35:03] jan_drewniak: 449647 is merged, I'll deploy it now since it's UBN cc stephanebisson [11:35:21] sounds good! [11:35:31] stephanebisson: you are next after all, the UBN patch got merged, I'll ping you [11:36:06] jan_drewniak: just please update deployment calendar [11:36:44] (03CR) 10jenkins-bot: Enable SandboxLink on the isiZulu Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/448553 (https://phabricator.wikimedia.org/T200522) (owner: 10Amire80) [11:37:35] zeljkof: that at least I can do :P [11:41:00] jan_drewniak: 449647 is at mwdebug1002, please test and let me know if I can deploy it [11:44:23] stephanebisson: so I can deploy 449437 directly, no need for testing at mwdebug? [11:44:40] zeljkof: yup it's deployable [11:44:49] jan_drewniak: ok, deploying [11:45:12] zeljkof: Right, 449437 can be deployed directly [11:45:53] !log zfilipin@deploy1001 Synchronized php-1.32.0-wmf.15/skins/MinervaNeue/: SWAT: [[gerrit:449647|Restore page issues (T200867)]] (duration: 00m 57s) [11:45:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:45:57] T200867: Uncaught TypeError: Cannot read property 'getLeadSectionElement' of undefined - https://phabricator.wikimedia.org/T200867 [11:46:43] jan_drewniak: it's deployed, please test and thanks for deploying with #releng ;) [11:46:54] stephanebisson: ok, will ping you once it's deployed [11:46:57] zeljkof: thank you! [11:48:17] stephanebisson: I'll be able to deploy the config change, but we will run out of time for backports, are they urgent? can they be moved to another window? later today or tomorrow? 
[11:49:19] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/449437 (https://phabricator.wikimedia.org/T199357) (owner: Sbisson)
[11:49:50] zeljkof: the backport to wmf.15 is kinda urgent since it's broken on enwiki but I guess it can be moved to the next window in ~4 hours
[11:50:36] (Merged) jenkins-bot: CleanupParent for draftquality model when PageTriage is used [mediawiki-config] - https://gerrit.wikimedia.org/r/449437 (https://phabricator.wikimedia.org/T199357) (owner: Sbisson)
[11:51:01] stephanebisson: if it's urgent, I can deploy it, but if it can wait 4 hours, please move it
[11:51:21] I'll move them both
[11:51:34] stephanebisson: thanks!
[11:52:00] The swat windows now perfectly line up with my breakfast, lunch, and dinner... it's great ;)
[11:53:51] (CR) jenkins-bot: CleanupParent for draftquality model when PageTriage is used [mediawiki-config] - https://gerrit.wikimedia.org/r/449437 (https://phabricator.wikimedia.org/T199357) (owner: Sbisson)
[11:54:17] !log zfilipin@deploy1001 Synchronized wmf-config/: SWAT: [[gerrit:449437|CleanupParent for draftquality model when PageTriage is used (T199357)]] (duration: 00m 56s)
[11:54:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:21] T199357: New Pages Feed: score draftquality on most recent revision - https://phabricator.wikimedia.org/T199357
[11:54:52] stephanebisson: it's deployed, please check relevant logs and thanks for deploying with #releng :)
[11:55:59] !log EU SWAT finished
[11:56:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180801T1200)
[12:05:12] Operations, vm-requests: eqiad: (1) VM request for Archiva - https://phabricator.wikimedia.org/T200895 (elukey)
[12:05:25] Operations, vm-requests: eqiad: (1) VM request for Archiva - https://phabricator.wikimedia.org/T200895 (elukey)
[12:08:09] (PS1) Marostegui: db-eqiad.php: Depool db1099:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/449701
[12:10:29] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1099:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/449701 (owner: Marostegui)
[12:11:47] (Merged) jenkins-bot: db-eqiad.php: Depool db1099:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/449701 (owner: Marostegui)
[12:13:19] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1099:3311 (duration: 00m 56s)
[12:13:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:16:01] !log Deploy schema change on db1099:3311 T144010 T51190 T199368
[12:16:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:16:08] T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190
[12:16:09] T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010
[12:16:09] T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368
[12:16:15] !log installing fuse security updates
[12:16:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:24:36] (PS1) Muehlenhoff: Add library hint for fuse [puppet] - https://gerrit.wikimedia.org/r/449702
[12:25:16] Operations, Cloud-VPS, cloud-services-team: labvirt1009 has high CPU, disk I/O and skyrocketted load - https://phabricator.wikimedia.org/T200888 (hashar)
[12:25:50] (CR) Muehlenhoff: [C: 2] Add library hint for fuse [puppet] - https://gerrit.wikimedia.org/r/449702 (owner: Muehlenhoff)
[12:26:31] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/449703
[12:27:34] (PS4) Volans: Add common base utility modules [software/spicerack] - https://gerrit.wikimedia.org/r/448047 (https://phabricator.wikimedia.org/T199079)
[12:27:36] (CR) jenkins-bot: db-eqiad.php: Depool db1099:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/449701 (owner: Marostegui)
[12:27:42] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/449703 (owner: Marostegui)
[12:28:58] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/449703 (owner: Marostegui)
[12:30:02] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1099:3311 (duration: 00m 55s)
[12:30:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:30:31] (PS1) Marostegui: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/449705
[12:32:14] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/449705 (owner: Marostegui)
[12:33:51] (Merged) jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/449705 (owner: Marostegui)
[12:35:17] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 (duration: 00m 55s)
[12:35:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:35:20] !log Deploy schema change on db1105:3311 T144010 T51190 T199368
[12:35:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:35:27] T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190
[12:35:27] T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010
[12:35:27] T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368
[12:37:14] (CR) Jcrespo: [C: 1] multiinstance.pp: Page based on the number of processess [puppet] - https://gerrit.wikimedia.org/r/449698 (https://phabricator.wikimedia.org/T200509) (owner: Marostegui)
[16:46:39] (PS1) Elukey: Import upstream version 2.2.3 [debs/archiva] - https://gerrit.wikimedia.org/r/449755 (https://phabricator.wikimedia.org/T192639)
[16:46:41] (PS1) Sbisson: Fix a typo in ORES models config [mediawiki-config] - https://gerrit.wikimedia.org/r/449756
[16:48:17] (CR) Ladsgroup: [C: 1] Fix a typo in ORES models config [mediawiki-config] - https://gerrit.wikimedia.org/r/449756 (owner: Sbisson)
[16:51:51] !log Finished deploying refinery using scap, then refinery-deploy-to-hdfs
[16:51:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:52:27] team, this deployment should prevent further EL sanitization alarms, sorry for the noise these days
[16:53:07] thannks
[16:53:08] mforns:
[16:53:19] :]
[16:53:37] PROBLEM - puppet last run on db1107 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[16:57:36] PROBLEM - puppet last run on analytics1070 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/R/biocLite.R]
[17:01:04] Question - if I wanted to generate a file in production (presumably an mwmaint host, as the generation is done by a mediawiki maintenance script), and then cause that file to be moved somewhere that it would be publicly available via HTTP, is there prior art that I can piggyback on?
[17:01:15] Specifically this is for sitemaps
[17:01:54] Operations, ops-eqsin, Traffic: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (RobH) Update from email thread: I've started to arrange for the Dell Onsite Engineer to visit next Monday, August 6th. We'll need to ensure cp5001 is still offline this Friday in...
[17:02:41] PROBLEM - puppet last run on labvirt1013 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt2xml]
[17:05:13] (PS1) Ayounsi: Extend cp to cp ipsec MTU 1450 to codfw [puppet] - https://gerrit.wikimedia.org/r/449760
[17:07:39] RECOVERY - puppet last run on labvirt1013 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[17:11:08] RECOVERY - puppet last run on cp3044 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[17:11:10] (CR) Filippo Giunchedi: [C: -1] ircecho: Support auth over irc (1 comment) [puppet] - https://gerrit.wikimedia.org/r/405594 (owner: Paladox)
[17:11:37] (CR) BBlack: [C: 1] Extend cp to cp ipsec MTU 1450 to codfw [puppet] - https://gerrit.wikimedia.org/r/449760 (owner: Ayounsi)
[17:12:25] (CR) Ayounsi: "https://puppet-compiler.wmflabs.org/compiler02/11951/" [puppet] - https://gerrit.wikimedia.org/r/449760 (owner: Ayounsi)
[17:12:33] (CR) Ayounsi: [C: 2] Extend cp to cp ipsec MTU 1450 to codfw [puppet] - https://gerrit.wikimedia.org/r/449760 (owner: Ayounsi)
[17:14:19] (PS1) Dzahn: graphite: delete duplicate role(graphite::primary) [puppet] - https://gerrit.wikimedia.org/r/449763
[17:18:38] (CR) Dzahn: "https://puppet-compiler.wmflabs.org/compiler03/11952/graphite1003.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/448779 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[17:18:45] (CR) Aaron Schulz: [C: 2] Use mcrouter for cache reads for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/449603 (https://phabricator.wikimedia.org/T198239) (owner: Aaron Schulz)
[17:19:41] !log enable puppet on all cp2* servers after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/449760 - T195365
[17:19:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:19:46] T195365: cp intermittent IPsec MTU issue - https://phabricator.wikimedia.org/T195365
[17:20:04] (Merged) jenkins-bot: Use mcrouter for cache reads for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/449603 (https://phabricator.wikimedia.org/T198239) (owner: Aaron Schulz)
[17:22:36] !log aaron@deploy1001 Synchronized wmf-config/mc.php: Use mcrouter for cache reads for test wikis (duration: 00m 55s)
[17:22:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:22:58] RECOVERY - puppet last run on analytics1070 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:24:09] RECOVERY - puppet last run on db1107 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:33:12] (CR) jenkins-bot: Use mcrouter for cache reads for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/449603 (https://phabricator.wikimedia.org/T198239) (owner: Aaron Schulz)
[17:34:14] (CR) Marostegui: "Mmmm we only want core multininstance paging, not misc or sanitarium, why not just modify: modules/profile/manifests/mariadb/core/multiins" [puppet] - https://gerrit.wikimedia.org/r/449711 (https://phabricator.wikimedia.org/T200509) (owner: Marostegui)
[17:34:22] (PS1) Dduvall: jenkins: add workspacesDir system property [puppet] - https://gerrit.wikimedia.org/r/449769
[17:35:05] (CR) jerkins-bot: [V: -1] jenkins: add workspacesDir system property [puppet] - https://gerrit.wikimedia.org/r/449769 (owner: Dduvall)
[17:35:59] (PS2) Dduvall: jenkins: add workspacesDir system property [puppet] - https://gerrit.wikimedia.org/r/449769
[17:36:41] (CR) jerkins-bot: [V: -1] jenkins: add workspacesDir system property [puppet] - https://gerrit.wikimedia.org/r/449769 (owner: Dduvall)
[17:37:47] (PS3) Dduvall: jenkins: add workspacesDir system property [puppet] - https://gerrit.wikimedia.org/r/449769
[17:38:46] Operations, Wikimedia-Mailing-lists: Mailing list for Knowledge Integrity program - https://phabricator.wikimedia.org/T200924 (Samwalton9)
[17:39:11] Operations, Wikimedia-Mailing-lists: Mailing list for Knowledge Integrity program - https://phabricator.wikimedia.org/T200924 (Samwalton9)
[17:39:44] (PS6) Dzahn: jenkins: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/434538 (https://phabricator.wikimedia.org/T194724)
[17:41:57] (CR) Alex Monk: labs-ip-alias-dump: Update to work with pdns-recursor v4.x (2 comments) [puppet] - https://gerrit.wikimedia.org/r/449627 (https://phabricator.wikimedia.org/T200294) (owner: Andrew Bogott)
[17:42:06] Operations, Analytics, Analytics-EventLogging, EventBus, and 2 others: RFC: Modern Event Platform - Choose Schema Tech - https://phabricator.wikimedia.org/T198256 (kchapman) Reminder there is a meeting today August 1st at 2pm PST(22:00 UTC, 23:00 CET) in #wikimediaoffice
[17:42:23] !log mobrovac@deploy1001 Started deploy [restbase/deploy@4c9966c] (dev-cluster): Multi-content bucket: delete old revisions when a new render happens
[17:42:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:43:34] (CR) Dzahn: jenkins: add workspacesDir system property (1 comment) [puppet] - https://gerrit.wikimedia.org/r/449769 (owner: Dduvall)
[17:44:50] (CR) Dzahn: jenkins: add workspacesDir system property (1 comment) [puppet] - https://gerrit.wikimedia.org/r/449769 (owner: Dduvall)
[17:45:14] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@4c9966c] (dev-cluster): Multi-content bucket: delete old revisions when a new render happens (duration: 02m 52s)
[17:45:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:34] !log mobrovac@deploy1001 Started deploy [restbase/deploy@4c9966c]: Multi-content bucket: delete old revisions when a new render happens
[17:45:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:46:07] (CR) Dduvall: jenkins: add workspacesDir system property (1 comment) [puppet] - https://gerrit.wikimedia.org/r/449769 (owner: Dduvall)
[17:46:47] (CR) Marostegui: "> Mmmm we only want core multininstance paging, not misc or" [puppet] - https://gerrit.wikimedia.org/r/449711 (https://phabricator.wikimedia.org/T200509) (owner: Marostegui)
[17:47:06] (PS4) Dduvall: jenkins: add workspacesDir system property [puppet] - https://gerrit.wikimedia.org/r/449769
[17:47:51] Operations, Availability (MediaWiki-MultiDC), Patch-For-Review, Performance-Team (Radar): Deploy mcrouter to production as a wancache backend - https://phabricator.wikimedia.org/T192370 (Krinkle)
[17:48:17] (CR) Dduvall: jenkins: add workspacesDir system property (1 comment) [puppet] - https://gerrit.wikimedia.org/r/449769 (owner: Dduvall)
[17:48:32] eh.. i haven't seen _this_ in compiler output before: CRITICAL: Build run failed: Unsupported Gerrit project: operations/mediawiki-config .. heh
[17:48:57] mutante, huh. dependency possibly?
[17:49:38] Krenair: i made a typo in the gerrit change number to test.. my bad
[17:49:53] ah :)
[17:51:34] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/11954/contint1001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/434538 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[17:57:57] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@4c9966c]: Multi-content bucket: delete old revisions when a new render happens (duration: 12m 23s)
[17:58:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:14] !log mobrovac@deploy1001 Started deploy [restbase/deploy@4c9966c]: Multi-content bucket: delete old revisions when a new render happens, take #2
[17:58:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:01:39] (CR) Krinkle: [WIP] Add php72 base and web images (1 comment) [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/449033 (https://phabricator.wikimedia.org/T188318) (owner: Legoktm)
[18:03:22] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@4c9966c]: Multi-content bucket: delete old revisions when a new render happens, take #2 (duration: 05m 08s)
[18:03:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:03:28] !log mobrovac@deploy1001 Started deploy [restbase/deploy@4c9966c]: Multi-content bucket: delete old revisions when a new render happens, take #3
[18:03:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:50] (PS15) ArielGlenn: Generate daily diffs for categories RDF [puppet] - https://gerrit.wikimedia.org/r/378355 (https://phabricator.wikimedia.org/T198356) (owner: Smalyshev)
[18:09:02] (CR) ArielGlenn: [C: 2] Generate daily diffs for categories RDF [puppet] - https://gerrit.wikimedia.org/r/378355 (https://phabricator.wikimedia.org/T198356) (owner: Smalyshev)
[18:09:56] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@4c9966c]: Multi-content bucket: delete old revisions when a new render happens, take #3 (duration: 06m 27s)
[18:09:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:09:59] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy
[18:10:14] !log mobrovac@deploy1001 Started deploy [restbase/deploy@4c9966c]: Multi-content bucket: delete old revisions when a new render happens, take #4
[18:10:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:35] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@4c9966c]: Multi-content bucket: delete old revisions when a new render happens, take #4 (duration: 03m 21s)
[18:13:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:49] (CR) Alex Monk: WIP: provide ACMEv2 support based on certbot/acme library (3 comments) [software/certcentral] - https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: Vgutierrez)
[18:32:00] Operations, Beta-Cluster-Infrastructure, procurement: Beta eswikibooks certificate issues - https://phabricator.wikimedia.org/T199387 (MarcoAurelio)
[18:35:25] (PS1) Ayounsi: cp to cp ipsec MTU change everywhere except eqiad [puppet] - https://gerrit.wikimedia.org/r/449787 (https://phabricator.wikimedia.org/T195365)
[18:40:11] (PS5) Dduvall: jenkins: add workspacesDir system property [puppet] - https://gerrit.wikimedia.org/r/449769
[18:42:03] (CR) Krinkle: [C: -1] Scap: update logstash_checker.py mwdeploy query (1 comment) [puppet] - https://gerrit.wikimedia.org/r/449639 (owner: Thcipriani)
[18:46:27] (CR) Ayounsi: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/11956/" [puppet] - https://gerrit.wikimedia.org/r/449787 (https://phabricator.wikimedia.org/T195365) (owner: Ayounsi)
[18:49:12] so jouncebot keeps being killed.. did it lose ability to identify with services.. or is it really a fake one
[18:49:18] tools.joun
[18:51:06] mutante: I'll go check the logs
[18:51:48] bd808: thank you
[18:52:01] Seems to be identified
[18:52:03] ?
[18:53:19] I should move it to the k8s cluster like stashbot
[18:54:37] the logs looks like 2 instances were running and fighting over the nick. Maybe a netsplit problem?
[18:55:03] hmm.. Nickname regained by services.. somebody else was identifying to release the nick
[18:55:24] but if that was just 2 instances.. ah! makes sense
[18:55:28] yeah, the bot does that itself actually
[18:56:04] and there it goes again :/
[18:56:14] * bd808 stares a grid status
[18:56:56] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown, SEO: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Imarlier) @fgiunchedi @Joe @Dzahn @herron - Casting a wide net here, as I'm not s...
[18:58:04] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown, SEO: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Imarlier) @BBlack Somewhat random, but does varnish have the ability to translate...
[18:59:08] jouncebot: now
[18:59:08] No deployments scheduled for the next 0 hour(s) and 0 minute(s)
[19:00:05] twentyafterfour: Your horoscope predicts another unfortunate MediaWiki train - Americas version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180801T1900).
[19:07:31] !log preparing to deploy 1.32.0-wmf.15 to group1 wikis. Gonna merge a couple of patches first to fix some logspam.
[19:07:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:14:17] (CR) Andrew Bogott: labs-ip-alias-dump: Update to work with pdns-recursor v4.x (2 comments) [puppet] - https://gerrit.wikimedia.org/r/449627 (https://phabricator.wikimedia.org/T200294) (owner: Andrew Bogott)
[19:20:31] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown, SEO: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Dzahn) >>! In T199252#4470132, @Imarlier wrote: > My current plan is to solve thi...
[19:36:07] Operations, ops-eqiad, DC-Ops, decommission: decom promethium/WMF3571 - https://phabricator.wikimedia.org/T191362 (Andrew) This host looks weird because it's on the wmcs vlan and uses the wmcs puppetmaster. I'm currently trying to confirm that it's no longer used so we can decom it (and I can ri...
[19:43:21] Operations, ops-eqiad, DC-Ops, decommission: decom promethium/WMF3571 - https://phabricator.wikimedia.org/T191362 (RobH) a:RobH>ssastry @ssastry: I'm assigning this to you for feedback, please confirm this host is no longer used and can be decommissioned.
[19:47:42] (PS1) Andrew Bogott: wmcs pdns: exclude metal resolver on all non-main deploys [puppet] - https://gerrit.wikimedia.org/r/449802 (https://phabricator.wikimedia.org/T200294)
[19:48:59] Operations, Wikimedia-General-or-Unknown: Wrong umask when deploying from screen - https://phabricator.wikimedia.org/T200690 (Dzahn) Yes, it's unrelated to screen and the -t. As you say it's because SSH uses a non-login shell: ``` ssh deploy1001.eqiad.wmnet 'umask' 0022 ``` Using /etc/bash.bashrc doe...
[19:53:05] !log twentyafterfour@deploy1001 Synchronized php-1.32.0-wmf.15/extensions/Collection/: Sync Change: https://gerrit.wikimedia.org/r/449796 Bug: T189636 unblocks: T191061 (duration: 00m 57s)
[19:53:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:12] T191061: 1.32.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T191061
[19:53:12] T189636: Undefined 'subtitle' and 'title' indexes in CollectionPageTemplate.php - https://phabricator.wikimedia.org/T189636
[19:55:05] (PS2) Andrew Bogott: wmcs pdns: exclude metal resolver on all non-main deploys [puppet] - https://gerrit.wikimedia.org/r/449802 (https://phabricator.wikimedia.org/T200294)
[19:56:00] (PS1) Dduvall: ci: Put Blubber back on Docker integration agents [puppet] - https://gerrit.wikimedia.org/r/449804
[19:58:02] (CR) Andrew Bogott: [C: 2] wmcs pdns: exclude metal resolver on all non-main deploys [puppet] - https://gerrit.wikimedia.org/r/449802 (https://phabricator.wikimedia.org/T200294) (owner: Andrew Bogott)
[20:00:05] cscott, arlolra, subbu, bearND, halfak, and Amir1: Time to snap out of that daydream and deploy Services – Parsoid / Citoid / Mobileapps / ORES / …. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180801T2000).
[20:00:14] Nothing for ORES today.
[20:04:04] (PS1) 20after4: group1 wikis to 1.32.0-wmf.15 refs T191061 [mediawiki-config] - https://gerrit.wikimedia.org/r/449810
[20:04:06] (CR) 20after4: [C: 2] group1 wikis to 1.32.0-wmf.15 refs T191061 [mediawiki-config] - https://gerrit.wikimedia.org/r/449810 (owner: 20after4)
[20:04:20] !log bsitzmann@deploy1001 Started deploy [mobileapps/deploy@c2448e0]: Update mobileapps to 282f368 (T200464 T200459)
[20:04:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:04:26] T200459: Bug: mobile-html should use only externally visible URIs for referencing site CSS - https://phabricator.wikimedia.org/T200459
[20:04:26] T200464: mobile-html CSP issues - https://phabricator.wikimedia.org/T200464
[20:06:47] (Merged) jenkins-bot: group1 wikis to 1.32.0-wmf.15 refs T191061 [mediawiki-config] - https://gerrit.wikimedia.org/r/449810 (owner: 20after4)
[20:10:02] !log bsitzmann@deploy1001 Finished deploy [mobileapps/deploy@c2448e0]: Update mobileapps to 282f368 (T200464 T200459) (duration: 05m 42s)
[20:10:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:08] T200459: Bug: mobile-html should use only externally visible URIs for referencing site CSS - https://phabricator.wikimedia.org/T200459
[20:10:08] T200464: mobile-html CSP issues - https://phabricator.wikimedia.org/T200464
[20:18:42] Operations, DBA, JADE, Scoring-platform-team, TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (awight) @daniel: The separate, single wiki alternative is still on my radar, IMO it's the only alternative which ca...
[20:23:40] !log twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.15 refs T191061
[20:23:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:23:55] T191061: 1.32.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T191061
[20:24:36] !log twentyafterfour@deploy1001 Synchronized php: group1 wikis to 1.32.0-wmf.15 refs T191061 (duration: 00m 55s)
[20:24:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:29:49] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown, SEO: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Krinkle) >>! In T199252#4470147, @Imarlier wrote: > @BBlack Somewhat random, but...
[20:32:11] (CR) jenkins-bot: group1 wikis to 1.32.0-wmf.15 refs T191061 [mediawiki-config] - https://gerrit.wikimedia.org/r/449810 (owner: 20after4)
[20:36:52] !log T191061 Finished deploying 1.32.0-wmf.15 to group1 wikis. Fatalmonitor is quiet and everything appears to be stable. See you tomorrow for group2.
[20:36:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:36:59] T191061: 1.32.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T191061
[20:42:37] (PS2) Thcipriani: Scap: update logstash_checker.py mwdeploy query [puppet] - https://gerrit.wikimedia.org/r/449639
[20:44:11] aww.. this is why my command works manually but not when run by cron: Syntax error: EOF in backquote substitution ..fun
[20:45:53] (CR) Ayounsi: [V: 2 C: 2] "Builds properly." [debs/python-anycast-healthchecker] - https://gerrit.wikimedia.org/r/397619 (owner: Ayounsi)
[20:46:10] (CR) Ayounsi: [V: 2 C: 2] "Builds properly." [debs/python-json-logger] - https://gerrit.wikimedia.org/r/397615 (owner: Ayounsi)
[20:51:05] (PS1) Dzahn: postgresql::backup: fix "EOF in backquote substitution" [puppet] - https://gerrit.wikimedia.org/r/449874 (https://phabricator.wikimedia.org/T190184)
[20:54:26] (PS2) Dzahn: postgresql::backup: fix "EOF in backquote substitution" [puppet] - https://gerrit.wikimedia.org/r/449874 (https://phabricator.wikimedia.org/T190184)
[20:55:23] (PS1) Andrew Bogott: labweb: add some AAAA ferm rules for labweb access [puppet] - https://gerrit.wikimedia.org/r/449875
[20:55:54] (CR) Dzahn: [C: 2] postgresql::backup: fix "EOF in backquote substitution" [puppet] - https://gerrit.wikimedia.org/r/449874 (https://phabricator.wikimedia.org/T190184) (owner: Dzahn)
[20:59:00] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team, Jenkins, Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (thcipriani) >>! In T192561#4467865, @Dzahn wrote: >>>! In T192561#4467858, @Dzahn wrote: >> Move the...
[21:00:48] (PS2) Andrew Bogott: labweb: add some AAAA ferm rules for labweb access [puppet] - https://gerrit.wikimedia.org/r/449875
[21:02:00] (CR) Andrew Bogott: [C: 2] labweb: add some AAAA ferm rules for labweb access [puppet] - https://gerrit.wikimedia.org/r/449875 (owner: Andrew Bogott)
[21:06:46] (PS3) MSantos: Set up cron task to regen low-zoom vector tiles [puppet] - https://gerrit.wikimedia.org/r/449719 (https://phabricator.wikimedia.org/T194787)
[21:08:29] PROBLEM - Check systemd state on labtestweb2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[21:21:49] PROBLEM - Check systemd state on labweb1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[21:23:46] (CR) Volans: [C: -1] "Nice work! Few comments here and there, most minor nitpicks or optional." (26 comments) [software/certcentral] - https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: Vgutierrez)
[21:24:09] PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack
[21:25:51] ACKNOWLEDGEMENT - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack andrew bogott These are all things Im currently working on :/
[21:25:51] ACKNOWLEDGEMENT - Check systemd state on labtestweb2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. andrew bogott These are all things Im currently working on :/
[21:25:52] ACKNOWLEDGEMENT - Check systemd state on labweb1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. andrew bogott These are all things Im currently working on :/
[21:29:49] (PS1) Andrew Bogott: labweb ferm: second attempt at getting the @resolve syntax right [puppet] - https://gerrit.wikimedia.org/r/449882
[21:30:26] (CR) jerkins-bot: [V: -1] labweb ferm: second attempt at getting the @resolve syntax right [puppet] - https://gerrit.wikimedia.org/r/449882 (owner: Andrew Bogott)
[21:31:28] (PS2) Andrew Bogott: labweb ferm: second attempt at getting the @resolve syntax right [puppet] - https://gerrit.wikimedia.org/r/449882
[21:32:30] (CR) Andrew Bogott: [C: 2] labweb ferm: second attempt at getting the @resolve syntax right [puppet] - https://gerrit.wikimedia.org/r/449882 (owner: Andrew Bogott)
[21:33:06] (CR) Legoktm: [C: -1] [WIP] Add php72 base and web images (1 comment) [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/449033 (https://phabricator.wikimedia.org/T188318) (owner: Legoktm)
[21:33:39] PROBLEM - Check systemd state on labweb1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[21:41:00] RECOVERY - Check systemd state on labweb1001 is OK: OK - running: The system is fully operational [21:41:04] (03PS1) 10Andrew Bogott: labweb ferm: one last pair of parentheses [puppet] - 10https://gerrit.wikimedia.org/r/449883 [21:42:20] 10Operations, 10Analytics, 10Analytics-EventLogging, 10EventBus, and 2 others: RFC: Modern Event Platform - Choose Schema Tech - https://phabricator.wikimedia.org/T198256 (10Ottomata) In todays RFC meeting, there was consensus to move forward with JSONSchema with strict policies about what is allowed (e.g.... [21:43:43] (03CR) 10Andrew Bogott: [C: 032] labweb ferm: one last pair of parentheses [puppet] - 10https://gerrit.wikimedia.org/r/449883 (owner: 10Andrew Bogott) [21:43:46] (03PS1) 10Andrew Bogott: nova fullstack: update image name for latest image [puppet] - 10https://gerrit.wikimedia.org/r/449884 [21:44:07] (03PS2) 10Andrew Bogott: nova fullstack: update image name for latest image [puppet] - 10https://gerrit.wikimedia.org/r/449884 [21:45:07] (03CR) 10Andrew Bogott: [C: 032] nova fullstack: update image name for latest image [puppet] - 10https://gerrit.wikimedia.org/r/449884 (owner: 10Andrew Bogott) [21:46:10] RECOVERY - Check systemd state on labweb1002 is OK: OK - running: The system is fully operational [21:46:50] RECOVERY - nova instance creation test on labnet1001 is OK: PROCS OK: 1 process with command name python, args nova-fullstack [21:47:13] godog: I wonder why https://grafana.wikimedia.org/dashboard/db/memcache?orgId=1 doesn't show "add" operations [21:50:47] (03PS1) 10Ayounsi: cp to cp ipsec MTU set to 1450 for all cp servers [puppet] - 10https://gerrit.wikimedia.org/r/449886 (https://phabricator.wikimedia.org/T195365) [21:52:30] (03CR) 10Volans: [C: 04-1] "Thanks a lot for taking care of this, really appreciated!" 
(039 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/449191 (owner: 10Gehel) [21:57:57] (03CR) 10Ayounsi: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/11960/" [puppet] - 10https://gerrit.wikimedia.org/r/449886 (https://phabricator.wikimedia.org/T195365) (owner: 10Ayounsi) [22:05:38] RECOVERY - Check systemd state on labtestweb2001 is OK: OK - running: The system is fully operational [22:09:41] 10Operations, 10decommission: Decommission notebook1001 - https://phabricator.wikimedia.org/T192103 (10RobH) [22:11:57] 10Operations, 10decommission: Decommission notebook1001 - https://phabricator.wikimedia.org/T192103 (10RobH) [22:15:31] (03PS1) 10RobH: decom of notebook101 prod dns [dns] - 10https://gerrit.wikimedia.org/r/449890 (https://phabricator.wikimedia.org/T192103) [22:16:56] (03PS1) 10Dzahn: etcd: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/449891 [22:16:58] (03PS1) 10Dzahn: elasticsearch: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/449892 [22:17:01] (03PS1) 10RobH: decom notebook1001 entries [puppet] - 10https://gerrit.wikimedia.org/r/449893 (https://phabricator.wikimedia.org/T192103) [22:17:06] (03CR) 10RobH: [C: 032] decom of notebook101 prod dns [dns] - 10https://gerrit.wikimedia.org/r/449890 (https://phabricator.wikimedia.org/T192103) (owner: 10RobH) [22:17:39] (03CR) 10RobH: [C: 032] decom notebook1001 entries [puppet] - 10https://gerrit.wikimedia.org/r/449893 (https://phabricator.wikimedia.org/T192103) (owner: 10RobH) [22:18:42] (03CR) 10Dzahn: "yea, this was more of a test what the script can do automatically and how smart it is. 
since we can already see it is quite limited as you" [puppet] - 10https://gerrit.wikimedia.org/r/441209 (owner: 10Dzahn) [22:19:35] 10Operations, 10ops-eqiad, 10decommission: Decommission notebook1001 - https://phabricator.wikimedia.org/T192103 (10RobH) a:03Cmjohnson [22:21:33] (03CR) 10Thcipriani: Scap: update logstash_checker.py mwdeploy query (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/449639 (owner: 10Thcipriani) [22:21:33] 10Operations, 10ops-eqsin, 10Traffic: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (10RobH) Just need to check with #traffic to ensure having this offline Friday-Monday is ok? [22:22:06] (03CR) 10BBlack: [C: 032] cp1075-99: further mkfs tweaks [puppet] - 10https://gerrit.wikimedia.org/r/449466 (https://phabricator.wikimedia.org/T195923) (owner: 10BBlack) [22:22:14] (03PS2) 10BBlack: cp1075-99: further mkfs tweaks [puppet] - 10https://gerrit.wikimedia.org/r/449466 (https://phabricator.wikimedia.org/T195923) [22:22:51] 10Operations, 10ops-eqsin, 10Traffic: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (10BBlack) It's already depooled, should be fine! [22:28:38] (03PS1) 10Brion VIBBER: Re-enable VP8 video transcodes to fix playback regression [mediawiki-config] - 10https://gerrit.wikimedia.org/r/449895 [22:37:29] (03CR) 10Volans: "Thanks for the fix, minor nitpick inline." 
(1 comment) [software/cumin] - https://gerrit.wikimedia.org/r/449224 (owner: Gehel)
[22:38:28] Operations, Deployments, HHVM, Performance-Team (Radar), and 2 others: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886 (Krinkle)
[22:39:08] (CR) Krinkle: [C: 1] Scap: update logstash_checker.py mwdeploy query (1 comment) [puppet] - https://gerrit.wikimedia.org/r/449639 (owner: Thcipriani)
[22:45:08] (PS4) Krinkle: services: Convert ProductionServices.php to static array file [mediawiki-config] - https://gerrit.wikimedia.org/r/443874
[22:46:37] (CR) Krinkle: [C: 2] services: Convert ProductionServices.php to static array file [mediawiki-config] - https://gerrit.wikimedia.org/r/443874 (owner: Krinkle)
[22:47:54] (Merged) jenkins-bot: services: Convert ProductionServices.php to static array file [mediawiki-config] - https://gerrit.wikimedia.org/r/443874 (owner: Krinkle)
[22:48:01] * Krinkle staging on mwdebug1002/deploy1001
[22:52:50] Operations, Domains, Traffic, WMF-Communications, wikimediafoundation.org: Update jobs.wikimedia.org - https://phabricator.wikimedia.org/T200951 (Varnent)
[22:57:09] !log krinkle@deploy1001 Synchronized wmf-config: I4db5d03f8af7 (step 1) (duration: 00m 57s)
[22:57:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:58:13] !log krinkle@deploy1001 Synchronized wmf-config: I4db5d03f8af7 (step 2) (duration: 00m 57s)
[22:58:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:00:05] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Evening SWAT (Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180801T2300).
[23:00:05] brion: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:14] yummmm stickers
[23:00:23] * Krinkle is done
[23:00:39] Although I was very confused by this in the scap sync-dir output
[23:00:41] > 22:57:44 Check 'Check endpoints for mw1261.eqiad.wmnet' failed: /wiki/{title} (Main Page) is CRITICAL: Test Main Page returned the unexpected status 503 (expecting: 200); /wiki/{title} (Special Version) is CRITICAL: Test Special Version returned the unexpected status 503 (expecting: 200); /w/api.php (Main Page pageprops) is CRITICAL: Test Main Page pageprops returned the unexpected status 503 (expecting: 200)
[23:01:03] Logstash says nothing bad about mw1261, though. And it continued regardless because only 1 failed.
[23:02:19] $ curl -I 'http://mw1261.eqiad.wmnet/wiki/Main_Page' -H 'Host: en.wikipedia.org'
[23:02:19] HTTP/1.1 200 OK
[23:02:19] Server: mw1261.eqiad.wmnet
[23:02:25] So looks like a fluke of sorts.
[23:04:34] weird. looks like it returned a 503 on that server a few times for some reason.
[23:04:55] Operations, Traffic, netops, Patch-For-Review: cp intermittent IPsec MTU issue - https://phabricator.wikimedia.org/T195365 (ayounsi) Open>Resolved This is done, the static routes with mtu lock did the trick, as expected. No more ICMP spikes confirmed on https://grafana.wikimedia.org/dashb...
[23:05:41] that looks like all the checks from https://en.wikipedia.org/spec.yaml
[23:08:22] anyway, I can SWAT
[23:08:57] :)
[23:09:44] (PS1) BBlack: cp nodes: use newer mke2fs on all stretch installs [puppet] - https://gerrit.wikimedia.org/r/449901
[23:12:12] brion: is your mwconfig change independent of the timedmediahandler changes? Fine if it goes out first?
[23:12:32] (PS2) Thcipriani: Re-enable VP8 video transcodes to fix playback regression [mediawiki-config] - https://gerrit.wikimedia.org/r/449895 (owner: Brion VIBBER)
[23:12:32] (CR) BBlack: [C: 2] cp nodes: use newer mke2fs on all stretch installs [puppet] - https://gerrit.wikimedia.org/r/449901 (owner: BBlack)
[23:13:02] thcipriani: yeah they're independent
[23:13:29] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/449895 (owner: Brion VIBBER)
[23:13:32] okie doke
[23:13:35] whee!
[23:14:42] (Merged) jenkins-bot: Re-enable VP8 video transcodes to fix playback regression [mediawiki-config] - https://gerrit.wikimedia.org/r/449895 (owner: Brion VIBBER)
[23:15:18] brion: ^ is live on mwdebug1002, check please
[23:15:44] thcipriani: it's a fix for job queue runners, so nothing to see on web :)
[23:15:50] k
[23:16:02] just roll em out :)
[23:16:06] * thcipriani does
[23:17:53] !log thcipriani@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:449895|Re-enable VP8 video transcodes to fix playback regression]] (duration: 00m 56s)
[23:17:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:18:02] ^ brion live now
[23:18:19] thcipriani: looks good! thanks
[23:21:09] thcipriani: Did a similar error show?
[23:21:19] Krinkle: nope, not that time
[23:21:22] k
[23:33:38] (CR) jenkins-bot: services: Convert ProductionServices.php to static array file [mediawiki-config] - https://gerrit.wikimedia.org/r/443874 (owner: Krinkle)
[23:33:41] (CR) jenkins-bot: Re-enable VP8 video transcodes to fix playback regression [mediawiki-config] - https://gerrit.wikimedia.org/r/449895 (owner: Brion VIBBER)
[23:34:31] brion: TimedMediaHandler changes on mwdebug1002, check please
[23:35:00] thcipriani: backend only again
[23:35:11] okie doke, going live
[23:35:17] woooooohoooooooo
[23:35:23] * brion puts on sunglasses and enjoys the ride
[23:37:12] !log thcipriani@deploy1001 Synchronized php-1.32.0-wmf.15/extensions/TimedMediaHandler/WebVideoTranscode/WebVideoTranscodeJob.php: SWAT: [[gerrit:449793|Work around transcode failures in newer ffmpeg]] (duration: 00m 57s)
[23:37:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:37:54] > Check 'Check endpoints for mwdebug1002.eqiad.wmnet' failed: /wiki/{title} (Main Page) timed out before a response was received; /wiki/{title} (Special Version) timed out before a response was received; /w/api.php (Main Page pageprops) timed out before a response was received
[23:37:56] hrm
[23:39:07] and then this sync it's fine
[23:39:22] !log thcipriani@deploy1001 Synchronized php-1.32.0-wmf.14/extensions/TimedMediaHandler/WebVideoTranscode/WebVideoTranscodeJob.php: SWAT: [[gerrit:449792|Work around transcode failures in newer ffmpeg]] (duration: 00m 56s)
[23:39:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:39:30] ^ brion all live!
[23:39:38] thanks!
[23:39:42] yw :)
[23:42:17] Operations, ops-eqsin, Traffic: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (RobH) Excellent, I'll continue coordinating with Dell support and Equinix to file the tasks for this repair.
[23:46:24] (PS6) Dduvall: jenkins: add workspacesDir system property [puppet] - https://gerrit.wikimedia.org/r/449769 (https://phabricator.wikimedia.org/T200953)
[23:51:49] (PS1) Reedy: Update Foundation urls [mediawiki-config] - https://gerrit.wikimedia.org/r/449909 (https://phabricator.wikimedia.org/T199812)