[00:04:16] (03PS2) 10Ladsgroup: Add wordmark for Persian Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423497 (https://phabricator.wikimedia.org/T191176) [00:04:28] (03CR) 10Ladsgroup: [C: 032] Add wordmark for Persian Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423497 (https://phabricator.wikimedia.org/T191176) (owner: 10Ladsgroup) [00:05:41] (03Merged) 10jenkins-bot: Add wordmark for Persian Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423497 (https://phabricator.wikimedia.org/T191176) (owner: 10Ladsgroup) [00:07:55] (03CR) 10jenkins-bot: Add wordmark for Persian Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423497 (https://phabricator.wikimedia.org/T191176) (owner: 10Ladsgroup) [00:08:23] (03PS1) 10Ladsgroup: Revert "Add wordmark for Persian Wikipedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423583 [00:08:50] (03CR) 10Ladsgroup: [C: 032] "It's too small :/ Will make a better version tomorrow." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423583 (owner: 10Ladsgroup) [00:10:03] (03Merged) 10jenkins-bot: Revert "Add wordmark for Persian Wikipedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423583 (owner: 10Ladsgroup) [00:11:14] !log Evening SWAT is done [00:11:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:12:20] (03CR) 10jenkins-bot: Revert "Add wordmark for Persian Wikipedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423583 (owner: 10Ladsgroup) [00:18:21] (03PS5) 10Dzahn: cassandra/icinga: make monitoring configurable, skip on dev [puppet] - 10https://gerrit.wikimedia.org/r/419339 (https://phabricator.wikimedia.org/T189050) [00:21:17] 10Operations, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Define turn-up process and scope for eqsin service to regional countries - https://phabricator.wikimedia.org/T189252#4099198 (10Krinkle) [00:57:05] Amir1: That SVG in 
https://gerrit.wikimedia.org/r/#/c/423497/ doesn't look like it's aligned with https://www.mediawiki.org/wiki/Manual:Coding_conventions/SVG [00:58:07] Volker_E: oh, good to know, I thought I checked it with SVGOMG [00:59:07] the Inkscape styles shouldn't be there [00:59:48] Amir1: I'd also recommend adding an XML declaration and a title element for accessibility reasons [01:02:38] noted [01:02:41] thanks :) [01:45:50] 10Operations, 10netops: Config discrepencies on network devices - https://phabricator.wikimedia.org/T189588#4099392 (10ayounsi) [02:44:10] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.27) (duration: 19m 03s) [02:44:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:18:04] !log Enable back gtid on db2035 - T191193 [05:18:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:18:12] T191193: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193 [05:19:20] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193#4099491 (10Marostegui) >>! In T191193#4097801, @Papaul wrote: > switch port information when ready to move db2039. This is just a note for when we are ready to do the move. > >... 
[05:31:27] (03PS1) 10Marostegui: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423602 (https://phabricator.wikimedia.org/T187089) [05:34:04] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423602 (https://phabricator.wikimedia.org/T187089) (owner: 10Marostegui) [05:35:17] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423602 (https://phabricator.wikimedia.org/T187089) (owner: 10Marostegui) [05:36:44] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1078 for alter table (duration: 00m 59s) [05:36:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:37:23] !log Deploy schema change on db1078 - s3 - T187089 T185128 T153182 [05:37:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:37:31] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089 [05:37:31] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182 [05:37:31] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128 [05:38:14] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423602 (https://phabricator.wikimedia.org/T187089) (owner: 10Marostegui) [06:14:00] (03PS1) 10Marostegui: db-codfw.php: Change db2035 rack. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423603 [06:14:16] (03PS1) 10Muehlenhoff: Extend access for jdcc [puppet] - 10https://gerrit.wikimedia.org/r/423604 [06:15:25] (03CR) 10Marostegui: [C: 032] db-codfw.php: Change db2035 rack. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/423603 (owner: 10Marostegui) [06:16:16] (03CR) 10Muehlenhoff: [C: 032] Extend access for jdcc [puppet] - 10https://gerrit.wikimedia.org/r/423604 (owner: 10Muehlenhoff) [06:16:38] (03Merged) 10jenkins-bot: db-codfw.php: Change db2035 rack. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423603 (owner: 10Marostegui) [06:18:09] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Change db2035 rack comment (duration: 00m 58s) [06:18:12] (03CR) 10jenkins-bot: db-codfw.php: Change db2035 rack. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423603 (owner: 10Marostegui) [06:18:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:33:14] (03PS1) 10Elukey: profile::kafka::mirror::alerts: fix monitoring query string [puppet] - 10https://gerrit.wikimedia.org/r/423605 [06:34:48] (03CR) 10Elukey: [C: 032] profile::kafka::mirror::alerts: fix monitoring query string [puppet] - 10https://gerrit.wikimedia.org/r/423605 (owner: 10Elukey) [06:42:34] RECOVERY - Check systemd state on kafka1023 is OK: OK - running: The system is fully operational [06:42:54] RECOVERY - Check systemd state on kafka1022 is OK: OK - running: The system is fully operational [06:43:11] !log execute systemctl reset-failed kafka-mirror-main-eqiad_to_jumbo-eqiad.service on kafka102[23] [06:43:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:58:31] (03PS1) 10Elukey: role::druid::analytics::public: enable prometheus zk monitoring [puppet] - 10https://gerrit.wikimedia.org/r/423606 (https://phabricator.wikimedia.org/T177460) [06:59:43] (03PS2) 10ArielGlenn: pylint and cleanup of runnerutils [dumps] - 10https://gerrit.wikimedia.org/r/423141 [07:05:10] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/10764/" [puppet] - 10https://gerrit.wikimedia.org/r/423606 (https://phabricator.wikimedia.org/T177460) (owner: 10Elukey) [07:08:50] (03PS1) 10Marostegui: Revert 
"db-eqiad.php: Depool db1078" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423609 [07:08:55] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423609 [07:10:27] (03Abandoned) 10Marostegui: Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423609 (owner: 10Marostegui) [07:10:37] !log Stop MySQL on db1078 for mariadb and kernel upgrade [07:10:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:12:03] !log upgrade and restart of labsdb1010 [07:12:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:13:39] (03CR) 10Elukey: [V: 032 C: 032] "Full no op https://puppet-compiler.wmflabs.org/compiler02/10765/" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/423156 (https://phabricator.wikimedia.org/T189051) (owner: 10Elukey) [07:14:54] PROBLEM - haproxy failover on dbproxy1011 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [07:15:04] See last log [07:18:00] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423610 [07:28:03] RECOVERY - haproxy failover on dbproxy1011 is OK: OK check_failover servers up 2 down 0 [07:29:11] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Slowly repool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423610 (owner: 10Marostegui) [07:31:07] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423610 (owner: 10Marostegui) [07:31:09] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423610 (owner: 10Marostegui) [07:32:25] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1078 with low weight (duration: 00m 58s) [07:32:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:36:11] (03PS1) 10Elukey: profile::hadoop::common: allow to 
enable/disable the HDFS trash [puppet] - 10https://gerrit.wikimedia.org/r/423613 (https://phabricator.wikimedia.org/T189051) [07:43:32] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/10766/ - only a whiteline change as expected." [puppet] - 10https://gerrit.wikimedia.org/r/423613 (https://phabricator.wikimedia.org/T189051) (owner: 10Elukey) [07:44:35] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423616 [07:46:07] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423616 (owner: 10Marostegui) [07:47:21] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423616 (owner: 10Marostegui) [07:48:11] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423616 (owner: 10Marostegui) [07:50:00] !log roll restart zookeeper on druid100[456] to enable prometheus monitoring [07:50:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:50:08] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1078 (duration: 00m 58s) [07:50:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:56:52] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423645 [07:58:20] (03Abandoned) 10Marostegui: db-eqiad,db-codfw.php: Proposal for moving hosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/418898 (https://phabricator.wikimedia.org/T183469) (owner: 10Marostegui) [07:58:33] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423645 (owner: 10Marostegui) [07:59:45] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1078 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/423645 (owner: 10Marostegui) [08:00:00] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423645 (owner: 10Marostegui) [08:00:55] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1078 (duration: 00m 58s) [08:01:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:01:21] !log uploaded HHVM 3.18.5-dfsg-1+wmf5+deb9u1 for stretch-security to apt.wikimedia.org [08:01:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:01:33] !log restart of druid-(overlord|middlemanager) on druid1004[456] as precautionary measure after zk restart [08:01:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:03:04] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423646 [08:07:03] (03PS8) 10Vgutierrez: mtail: Provide ttfb histogram for varnishbackend [puppet] - 10https://gerrit.wikimedia.org/r/422155 (https://phabricator.wikimedia.org/T184942) [08:09:42] (03CR) 10Vgutierrez: [C: 032] mtail: Provide ttfb histogram for varnishbackend [puppet] - 10https://gerrit.wikimedia.org/r/422155 (https://phabricator.wikimedia.org/T184942) (owner: 10Vgutierrez) [08:10:43] dunno why wikibugs doesn't show the "Merged" msg for my changes [08:11:04] vgutierrez: it never does for puppet changes [08:11:05] 10Operations, 10wikidiff2, 10Patch-For-Review, 10WMDE-QWERTY-Team-Board: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4099843 (10Lea_WMDE) Hi @MoritzMuehlenhoff, the patch is merged - we are ready for beta! Is there anything else you need from our side? 
[08:11:15] marostegui: oh, thx [08:11:23] :) [08:11:41] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423646 (owner: 10Marostegui) [08:12:54] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423646 (owner: 10Marostegui) [08:13:09] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423646 (owner: 10Marostegui) [08:14:07] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1078 (duration: 00m 58s) [08:14:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:14:33] 10Operations, 10wikidiff2, 10Patch-For-Review, 10WMDE-QWERTY-Team-Board: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4099848 (10MoritzMuehlenhoff) @Lea_WMDE: Seems fine from a quick glance, I'll look into building/updating beta tomorrow. [08:16:27] (03PS1) 10Marostegui: db-eqiad.php: Restore db1078 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423647 [08:18:06] 10Operations, 10wikidiff2, 10Patch-For-Review, 10WMDE-QWERTY-Team-Board: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4099852 (10Lea_WMDE) Awesome, thanks! 
[08:18:23] PROBLEM - HHVM rendering on mw2234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:19:13] RECOVERY - HHVM rendering on mw2234 is OK: HTTP OK: HTTP/1.1 200 OK - 74948 bytes in 0.305 second response time [08:23:50] (03CR) 10Volans: Puppetmaster: store reports also in puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/422907 (https://phabricator.wikimedia.org/T190918) (owner: 10Volans) [08:23:56] (03PS2) 10Volans: Puppetmaster: store reports also in puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/422907 (https://phabricator.wikimedia.org/T190918) [08:26:38] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Restore db1078 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423647 (owner: 10Marostegui) [08:28:39] (03Merged) 10jenkins-bot: db-eqiad.php: Restore db1078 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423647 (owner: 10Marostegui) [08:28:54] (03CR) 10jenkins-bot: db-eqiad.php: Restore db1078 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423647 (owner: 10Marostegui) [08:29:42] !log codfw-prod: more weight to ms-be204[0-3] - T189633 [08:29:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:29:48] T189633: rack/setup/install ms-be204[0-3] - https://phabricator.wikimedia.org/T189633 [08:29:50] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore original weight for db1078 (duration: 00m 58s) [08:29:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:31:15] (03PS1) 10Marostegui: db-eqiad.php: Depool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423649 (https://phabricator.wikimedia.org/T187089) [08:32:34] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-fgiunchedi: rack/setup/install ms-be204[0-3] - https://phabricator.wikimedia.org/T189633#4099929 (10fgiunchedi) [08:33:07] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1077 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/423649 (https://phabricator.wikimedia.org/T187089) (owner: 10Marostegui) [08:34:21] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423649 (https://phabricator.wikimedia.org/T187089) (owner: 10Marostegui) [08:35:37] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1077 for alter table (duration: 00m 59s) [08:35:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:36:01] !log Stop MySQL on db1077 for mysql and kernel upgrade [08:36:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:38:32] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423649 (https://phabricator.wikimedia.org/T187089) (owner: 10Marostegui) [08:39:33] PROBLEM - HHVM jobrunner on mw1293 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time [08:40:33] RECOVERY - HHVM jobrunner on mw1293 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.012 second response time [08:40:57] !log temporarily disabled puppet (and re-enabling it one-by-one) on all prod puppetmasters to deploy g/422907 - T190918 [08:41:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:41:03] T190918: Puppet: enable reports to puppetdb - https://phabricator.wikimedia.org/T190918 [08:41:24] (03CR) 10Volans: [C: 032] Puppetmaster: store reports also in puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/422907 (https://phabricator.wikimedia.org/T190918) (owner: 10Volans) [08:41:57] !log upgrading HHVM on video scalers [08:42:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:44:04] PROBLEM - HHVM jobrunner on mw1318 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time [08:45:04] RECOVERY - HHVM jobrunner on mw1318 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes 
in 0.015 second response time [08:46:04] !log Deploy schema change on db1077 - s3 - T187089 T185128 T153182 [08:46:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:46:11] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089 [08:46:14] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182 [08:46:14] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128 [08:55:45] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 4 others: Thumbor incorrectly normalizes .jpe and .jpeg into .jpg for Swift thumbnail storage - https://phabricator.wikimedia.org/T191028#4099963 (10Gilles) [09:08:58] (03PS2) 10Muehlenhoff: Update ssh public key for arlolra [puppet] - 10https://gerrit.wikimedia.org/r/423516 (owner: 10Arlolra) [09:09:31] (03CR) 10Muehlenhoff: [C: 032] Update ssh public key for arlolra [puppet] - 10https://gerrit.wikimedia.org/r/423516 (owner: 10Arlolra) [09:10:23] 10Operations, 10ops-codfw, 10DBA, 10DC-Ops, 10media-storage: msw-c6-codfw offline - https://phabricator.wikimedia.org/T191129#4100007 (10Marostegui) 05Open>03Resolved I think we can consider this resolved. Thanks guys! 
[09:14:04] (03PS4) 10Gilles: Upgrade to 1.16 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/419172 (https://phabricator.wikimedia.org/T186528) [09:23:00] (03PS1) 10Vgutierrez: Bump to version 1.15 [debs/pybal] - 10https://gerrit.wikimedia.org/r/423657 [09:24:25] (03PS1) 10Elukey: role::configcluster: enable prometheus zk monitoring in main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/423658 (https://phabricator.wikimedia.org/T177460) [09:26:08] (03CR) 10Filippo Giunchedi: [C: 032] Upgrade to 1.16 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/419172 (https://phabricator.wikimedia.org/T186528) (owner: 10Gilles) [09:27:09] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler02/10768/" [puppet] - 10https://gerrit.wikimedia.org/r/423658 (https://phabricator.wikimedia.org/T177460) (owner: 10Elukey) [09:29:33] (03PS2) 10Muehlenhoff: Update SSH key for dduvall [puppet] - 10https://gerrit.wikimedia.org/r/422962 (owner: 10Dduvall) [09:30:11] (03CR) 10Muehlenhoff: [C: 032] Update SSH key for dduvall [puppet] - 10https://gerrit.wikimedia.org/r/422962 (owner: 10Dduvall) [09:33:39] (03PS2) 10Elukey: role::configcluster: enable prometheus zk monitoring in main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/423658 (https://phabricator.wikimedia.org/T177460) [09:33:41] (03PS1) 10Elukey: role::prometheus::ops:: add jmx targets for role::configcluster [puppet] - 10https://gerrit.wikimedia.org/r/423659 (https://phabricator.wikimedia.org/T177460) [09:41:33] (03PS2) 10Muehlenhoff: Update Jon Robson's public key [puppet] - 10https://gerrit.wikimedia.org/r/423507 (owner: 10Jdlrobson) [09:43:23] (03PS2) 10Muehlenhoff: Update SSH key for nikerabbit [puppet] - 10https://gerrit.wikimedia.org/r/422932 (owner: 10Nikerabbit) [09:44:03] (03CR) 10Muehlenhoff: [C: 032] Update SSH key for nikerabbit [puppet] - 10https://gerrit.wikimedia.org/r/422932 (owner: 10Nikerabbit) [09:44:25] 10Operations, 10Wikimedia-Apache-configuration: 
Review/Merge/Deploy advisors.wikimedia.org apache vhost - https://phabricator.wikimedia.org/T190143#4100035 (10Reedy) 05Open>03Resolved a:03Dzahn [09:48:08] (03PS1) 10Daimona Eaytoy: Re-enable filter profiling on every wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423660 (https://phabricator.wikimedia.org/T191039) [09:48:15] (03PS3) 10Muehlenhoff: Update Jon Robson's public key [puppet] - 10https://gerrit.wikimedia.org/r/423507 (owner: 10Jdlrobson) [09:50:20] (03CR) 10Muehlenhoff: [C: 032] Update Jon Robson's public key [puppet] - 10https://gerrit.wikimedia.org/r/423507 (owner: 10Jdlrobson) [09:51:38] (03CR) 10Daimona Eaytoy: [C: 04-1] "Let's wait a bit in case there are opposing opinions." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423660 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [09:51:53] !log deploy thumbor 1.16 in codfw and eqiad - T186528 T179200 T189647 T191028 [09:52:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:04] T189647: Thumbor missing log extras when hitting ImageMagicException - https://phabricator.wikimedia.org/T189647 [09:52:04] T191028: Thumbor incorrectly normalizes .jpe and .jpeg into .jpg for Swift thumbnail storage - https://phabricator.wikimedia.org/T191028 [09:52:04] T186528: Thumbor Swift authentication support for videos - https://phabricator.wikimedia.org/T186528 [09:52:04] T179200: ImageMagick may return errors when it actually converted what it could - https://phabricator.wikimedia.org/T179200 [09:53:42] gilles: ^ codfw thumbor and thumbor1001 updated [09:57:02] godog: can't seem to be able to SSH into them from here. has there been a recent bastion change or something? [09:57:18] hangs for a while, then times out [09:57:31] gilles: which bastion are you going through? bast1001 was recently decom'd [09:57:53] ah yes that's it [09:57:58] which one should I use now? 
[09:58:20] in eqiad bast1002, though I usually use the bastion closer to me [09:58:24] IOW bast3002 [10:04:26] godog: I'm still seeing wikimedia-thumbor-version 1.15 on thumbor1001 [10:06:42] godog: "dpkg -l | grep thumbor" confirms thumbor1001 still has the old version. thumbor2001 has the new one, though [10:07:14] gilles: ah yes, my bad! upgraded now [10:09:02] 10Operations, 10cloud-services-team: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#4100120 (10faidon) >>! In T183937#4097563, @RobH wrote: > So there is an issue where trusty expects the os to be on eth0, and its on eth3. However, after discussion in IRC, @ayounsi pointed out t... [10:09:13] godog: lgtm, you can roll it out everywhere [10:09:26] gilles: kk [10:12:37] gilles: {{done}} [10:28:23] (03PS9) 10Arturo Borrero Gonzalez: wmcs: monitoring: rsync whisper files between mon servers [puppet] - 10https://gerrit.wikimedia.org/r/422389 (https://phabricator.wikimedia.org/T190512) [10:28:50] (03CR) 10jerkins-bot: [V: 04-1] wmcs: monitoring: rsync whisper files between mon servers [puppet] - 10https://gerrit.wikimedia.org/r/422389 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [10:31:03] (03PS10) 10Arturo Borrero Gonzalez: wmcs: monitoring: rsync whisper files between mon servers [puppet] - 10https://gerrit.wikimedia.org/r/422389 (https://phabricator.wikimedia.org/T190512) [10:31:25] (03CR) 10jerkins-bot: [V: 04-1] wmcs: monitoring: rsync whisper files between mon servers [puppet] - 10https://gerrit.wikimedia.org/r/422389 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [10:31:29] (03PS11) 10Arturo Borrero Gonzalez: wmcs: monitoring: rsync whisper files between mon servers [puppet] - 10https://gerrit.wikimedia.org/r/422389 (https://phabricator.wikimedia.org/T190512) [10:33:56] (03CR) 10Arturo Borrero Gonzalez: [C: 032] wmcs: monitoring: rsync whisper files between mon servers [puppet] - 
10https://gerrit.wikimedia.org/r/422389 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [10:37:53] PROBLEM - Check systemd state on labmon1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [10:40:11] (03PS1) 10Arturo Borrero Gonzalez: wmcs: monitoring: fix usage of puppet variable inside ferm @resolve [puppet] - 10https://gerrit.wikimedia.org/r/423665 (https://phabricator.wikimedia.org/T190512) [10:40:35] (03CR) 10jerkins-bot: [V: 04-1] wmcs: monitoring: fix usage of puppet variable inside ferm @resolve [puppet] - 10https://gerrit.wikimedia.org/r/423665 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [10:41:30] (03PS2) 10Arturo Borrero Gonzalez: wmcs: monitoring: fix usage of puppet variable inside ferm @resolve [puppet] - 10https://gerrit.wikimedia.org/r/423665 (https://phabricator.wikimedia.org/T190512) [10:42:19] (03CR) 10Arturo Borrero Gonzalez: [C: 032] wmcs: monitoring: fix usage of puppet variable inside ferm @resolve [puppet] - 10https://gerrit.wikimedia.org/r/423665 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [10:43:54] RECOVERY - Check systemd state on labmon1001 is OK: OK - running: The system is fully operational [10:51:27] (03PS1) 10Arturo Borrero Gonzalez: wmcs: monitoring: fix ssh userkey name [puppet] - 10https://gerrit.wikimedia.org/r/423666 (https://phabricator.wikimedia.org/T190512) [10:52:11] (03CR) 10Arturo Borrero Gonzalez: [C: 032] wmcs: monitoring: fix ssh userkey name [puppet] - 10https://gerrit.wikimedia.org/r/423666 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [11:06:37] !log installing libdatetime-timezone-perl update from Debian SUA [11:06:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:11:38] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423667 [11:13:30] (03CR) 
10Marostegui: [C: 032] db-eqiad.php: Slowly repool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423667 (owner: 10Marostegui) [11:14:55] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423667 (owner: 10Marostegui) [11:16:18] !log deploy thumbor 1.16 in codfw [11:16:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:18:07] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423667 (owner: 10Marostegui) [11:19:45] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1077 after alter table (duration: 00m 58s) [11:19:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:24:28] 10Operations, 10Thumbor, 10Patch-For-Review, 10Performance-Team (Radar), 10User-fgiunchedi: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817#4100217 (10Gilles) [11:26:42] jynus: Hi - Missed your email yesterday - I hope we've not killed the platform :( [11:27:12] joal: I had to kill some of the threads of the platform [11:27:22] to prevent a full outage [11:27:25] jynus: ok [11:27:27] it was either you or all [11:27:39] jynus: I'd rather be killed then :) [11:27:40] (yours was not the only one affected) [11:27:46] one question [11:27:57] sure [11:28:00] by any chance, do you have that on a cron starting on the 1st? 
[11:28:11] (which would make sense for monthly updates) [11:28:22] 1st or 2nd, etc [11:28:58] (03CR) 10Filippo Giunchedi: role::prometheus::ops:: add jmx targets for role::configcluster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/423659 (https://phabricator.wikimedia.org/T177460) (owner: 10Elukey) [11:29:08] jynus: we do yes [11:29:10] I think we have some overload with many scripts happening around that date, could it be scheduled later in the month (3rd-4th)? [11:29:27] I may send an email asking someone to delay those jobs because on the first we had a huge spike [11:29:44] I think it's those kinds of tasks + WLM contests, etc [11:29:52] plus some bad tools [11:29:53] jynus: feasible - We could also start with small wikis and end with big ones (instead of the opposite) [11:29:59] that would help, too [11:30:16] I think everyone wants to do those analyses at the same time [11:30:22] and now it is relatively calm [11:30:48] compared to Monday morning [11:30:51] jynus: We use the updates to compute wikistats2 updates, and users are usually eager to get new month results - But if it's a question of whether or not the system sustains the load, we'll do what we're told :) [11:30:55] sorry for the inconvenience [11:31:03] I know [11:31:07] jynus: I'm gonna check if any of our jobs died [11:31:22] but I prefer to fix dbstore1002 and have dedicated resources for you [11:31:38] if it was needed [11:32:00] or even propose an expansion of labsdbs [11:32:48] with 3 servers, we are quite constrained when 1 goes down, as happened this week [11:33:25] jynus: I'd love a labs expansion rather than db1002 update (useful for more people) [11:33:32] depending on the queries, if you make more but each one is faster [11:33:46] you could also do some of those to the "web" service [11:34:02] web is only a suggestion, for queries that take less than 300 s [11:34:39] jynus: only usage we have as of today is sqoop - queries too big for web I think [11:35:00] let's talk on 
future days, maybe at the beginning of next month we will not have such a huge load [11:35:39] or you could document some panic button that pauses, but does not kill, your jobs [11:36:51] jynus: we do not have such a button as of now - We probably should add one [11:36:56] Will talk to the team about that [11:37:07] Thanks again for having saved the infra :) [11:41:01] (03PS1) 10Arturo Borrero Gonzalez: wmcs: monitoring: fix rsync mechanisms [puppet] - 10https://gerrit.wikimedia.org/r/423671 (https://phabricator.wikimedia.org/T190512) [11:41:35] (03CR) 10jerkins-bot: [V: 04-1] wmcs: monitoring: fix rsync mechanisms [puppet] - 10https://gerrit.wikimedia.org/r/423671 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [11:43:01] (03PS2) 10Arturo Borrero Gonzalez: wmcs: monitoring: fix rsync mechanisms [puppet] - 10https://gerrit.wikimedia.org/r/423671 (https://phabricator.wikimedia.org/T190512) [11:43:30] (03CR) 10jerkins-bot: [V: 04-1] wmcs: monitoring: fix rsync mechanisms [puppet] - 10https://gerrit.wikimedia.org/r/423671 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [11:44:24] (03PS3) 10Arturo Borrero Gonzalez: wmcs: monitoring: fix rsync mechanisms [puppet] - 10https://gerrit.wikimedia.org/r/423671 (https://phabricator.wikimedia.org/T190512) [11:45:32] (03CR) 10Arturo Borrero Gonzalez: [C: 032] wmcs: monitoring: fix rsync mechanisms [puppet] - 10https://gerrit.wikimedia.org/r/423671 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [11:45:58] jynus: There have been some jobs failing from a sqoop perspective [11:46:29] jynus: We'll probably relaunch them either tonight or tomorrow, depending on when the other sqoop jobs have finished - Would that be ok? 
[11:47:38] jynus: mostly big wikis :( enwiki.revision, wikidatawiki.logging, wikidatawiki.revision, wikidatawiki.pagelinks, commonswiki.logging, commonswiki.revision, commonswiki.pagelinks [11:55:23] (03PS1) 10Arturo Borrero Gonzalez: wcms: monitoring: add more rssh configuration for _graphite user [puppet] - 10https://gerrit.wikimedia.org/r/423672 (https://phabricator.wikimedia.org/T190512) [11:56:17] (03CR) 10Arturo Borrero Gonzalez: [C: 032] wcms: monitoring: add more rssh configuration for _graphite user [puppet] - 10https://gerrit.wikimedia.org/r/423672 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [11:59:25] (03PS2) 10Muehlenhoff: mediawiki::packages::fonts: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/421930 [12:02:51] (03PS1) 10Vgutierrez: Fix dummy metrics implementation [debs/pybal] (1.15) - 10https://gerrit.wikimedia.org/r/423673 (https://phabricator.wikimedia.org/T190527) [12:03:52] (03PS1) 10Arturo Borrero Gonzalez: wcms: monitoring: use double quotes for puppet to understand \n in file content [puppet] - 10https://gerrit.wikimedia.org/r/423674 (https://phabricator.wikimedia.org/T190512) [12:04:39] (03CR) 10Arturo Borrero Gonzalez: [C: 032] wcms: monitoring: use double quotes for puppet to understand \n in file content [puppet] - 10https://gerrit.wikimedia.org/r/423674 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [12:06:45] (03CR) 10Muehlenhoff: [C: 032] mediawiki::packages::fonts: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/421930 (owner: 10Muehlenhoff) [12:06:50] (03PS3) 10Muehlenhoff: mediawiki::packages::fonts: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/421930 [12:07:11] (03PS3) 10Rush: openstack: neutron router l3-agent HA [puppet] - 10https://gerrit.wikimedia.org/r/423032 (https://phabricator.wikimedia.org/T188266) [12:07:41] (03CR) 10jerkins-bot: [V: 04-1] openstack: neutron router l3-agent HA [puppet] 
- 10https://gerrit.wikimedia.org/r/423032 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [12:08:49] (03PS4) 10Rush: openstack: neutron router l3-agent HA [puppet] - 10https://gerrit.wikimedia.org/r/423032 (https://phabricator.wikimedia.org/T188266) [12:11:32] (03PS5) 10Rush: openstack: neutron router l3-agent HA [puppet] - 10https://gerrit.wikimedia.org/r/423032 (https://phabricator.wikimedia.org/T188266) [12:12:08] (03CR) 10Rush: "labtestmetal2001.codfw.wmnet,labtestcontrol2001.codfw.wmnet,labtestvirt2003.codfw.wmnet,labtestneutron2001.codfw.wmnet,labtestneutron2002." [puppet] - 10https://gerrit.wikimedia.org/r/423032 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [12:13:05] (03PS2) 10Muehlenhoff: mediawiki::packages::fonts: Consistently use require_package [puppet] - 10https://gerrit.wikimedia.org/r/420670 [12:15:03] (03CR) 10jerkins-bot: [V: 04-1] openstack: neutron router l3-agent HA [puppet] - 10https://gerrit.wikimedia.org/r/423032 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [12:22:39] (03CR) 10Huji: [C: 031] Re-enable filter profiling on every wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423660 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [12:26:24] (03PS6) 10Rush: openstack: neutron router l3-agent HA [puppet] - 10https://gerrit.wikimedia.org/r/423032 (https://phabricator.wikimedia.org/T188266) [12:39:37] (03CR) 10Filippo Giunchedi: [C: 031] role::configcluster: enable prometheus zk monitoring in main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/423658 (https://phabricator.wikimedia.org/T177460) (owner: 10Elukey) [12:42:09] 10Operations, 10Traffic, 10media-storage, 10User-fgiunchedi: Swift invalid range requests causing 501s - https://phabricator.wikimedia.org/T183902#4100418 (10fgiunchedi) a:03fgiunchedi [12:46:49] joal: yes, now things are ok, we have the 3 servers [12:46:56] (03PS1) 10BBlack: eqsin: turn up FM GU KI MH MP PW TV UM [dns] - 
10https://gerrit.wikimedia.org/r/423680 (https://phabricator.wikimedia.org/T189252) [12:46:58] (03PS1) 10BBlack: eqsin: temporary test for AU [dns] - 10https://gerrit.wikimedia.org/r/423681 (https://phabricator.wikimedia.org/T189252) [12:48:24] jynus: noted - Will ping you before restarting jobs - Thanks [12:51:58] no need [12:52:13] issue was only at the start of the month [12:52:21] we will see if next month the same happens [12:52:44] jouncebot: next [12:52:45] In 0 hour(s) and 7 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180403T1300) [12:53:30] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423682 [12:55:12] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423682 (owner: 10Marostegui) [12:55:57] 10Puppet, 10Beta-Cluster-Infrastructure: Error: Could not find class role::kafka::jumbo::mirror for deployment-kafka0[45] - https://phabricator.wikimedia.org/T191154#4100436 (10MarcoAurelio) @greg should be able to fix @Ottomata's accesses there.
[12:56:25] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423682 (owner: 10Marostegui) [12:56:45] 10Operations, 10Pybal, 10Traffic, 10Patch-For-Review: Upgrade pybal-test instances to stretch - https://phabricator.wikimedia.org/T190993#4100439 (10Vgutierrez) [12:57:39] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1077 after alter table (duration: 00m 59s) [12:57:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:58:00] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423682 (owner: 10Marostegui) [13:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor I � Unicode. All rise for European Mid-day SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180403T1300). [13:00:04] No GERRIT patches in the queue for this window AFAICS. [13:02:11] no EU SWAT I guess then [13:02:27] 10Puppet, 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: deployment-prep down hosts - fix/remove? 
- https://phabricator.wikimedia.org/T191293#4100460 (10MarcoAurelio) [13:04:07] (03CR) 10BBlack: [C: 032] eqsin: turn up FM GU KI MH MP PW TV UM [dns] - 10https://gerrit.wikimedia.org/r/423680 (https://phabricator.wikimedia.org/T189252) (owner: 10BBlack) [13:04:39] (03CR) 10BBlack: [C: 032] eqsin: temporary test for AU [dns] - 10https://gerrit.wikimedia.org/r/423681 (https://phabricator.wikimedia.org/T189252) (owner: 10BBlack) [13:07:16] (03PS1) 10Ottomata: Use $mirror_name in produce rate alert [puppet] - 10https://gerrit.wikimedia.org/r/423685 (https://phabricator.wikimedia.org/T189464) [13:08:27] 10Puppet, 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: Long-lived cherry-picks on deployment-puppetmaster02.deployment-prep.equiad.wmflabs - https://phabricator.wikimedia.org/T191294#4100482 (10MarcoAurelio) [13:11:39] (03PS3) 10Elukey: role::configcluster: enable prometheus zk monitoring in main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/423658 (https://phabricator.wikimedia.org/T177460) [13:13:37] (03CR) 10Elukey: [C: 032] role::configcluster: enable prometheus zk monitoring in main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/423658 (https://phabricator.wikimedia.org/T177460) (owner: 10Elukey) [13:14:04] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423686 [13:15:33] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423686 (owner: 10Marostegui) [13:16:48] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423686 (owner: 10Marostegui) [13:18:07] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423686 (owner: 10Marostegui) [13:18:38] !log roll restart of zookeeper on conf200[123] to pick up prometheus monitoring settings [13:18:43] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:46] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1077 after alter table (duration: 00m 58s) [13:18:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:21:03] !log Reimport  s51541_sulwatcher.logging from master to slave - T191020 [13:21:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:21:08] (03PS1) 10Muehlenhoff: Enable icu57 component von jessie-based app servers [puppet] - 10https://gerrit.wikimedia.org/r/423687 [13:21:09] T191020: labsdb1004: s51541_sulwatcher.logging is out of sync - https://phabricator.wikimedia.org/T191020 [13:32:56] (03PS1) 10Rush: openstack: manage bridges and bridge mappings for l3 and compute [puppet] - 10https://gerrit.wikimedia.org/r/423690 (https://phabricator.wikimedia.org/T188266) [13:33:07] (03PS1) 10Marostegui: db-eqiad.php: Restore db1077 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423691 [13:33:29] (03CR) 10jerkins-bot: [V: 04-1] openstack: manage bridges and bridge mappings for l3 and compute [puppet] - 10https://gerrit.wikimedia.org/r/423690 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [13:34:37] (03PS2) 10Muehlenhoff: Enable base::service_auto_restart for cron [puppet] - 10https://gerrit.wikimedia.org/r/422391 (https://phabricator.wikimedia.org/T135991) [13:34:40] marostegui: FYI, I'm going to backport those two fixes for the "use of subqueries" error (T191116) instead of waiting for the train to get them, since it sounds like the 1.5 million instances are getting in the way of stuff. [13:34:41] T191116: Wikimedia\Rdbms\Database::tableName: use of subqueries is not supported this way. 
- https://phabricator.wikimedia.org/T191116 [13:34:57] (03PS2) 10Rush: openstack: manage bridges and bridge mappings for l3 and compute [puppet] - 10https://gerrit.wikimedia.org/r/423690 (https://phabricator.wikimedia.org/T188266) [13:35:04] anomie: thanks a lot :-) [13:35:30] (03CR) 10jerkins-bot: [V: 04-1] openstack: manage bridges and bridge mappings for l3 and compute [puppet] - 10https://gerrit.wikimedia.org/r/423690 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [13:35:39] (03PS2) 10Vgutierrez: Bump to version 1.15 [debs/pybal] - 10https://gerrit.wikimedia.org/r/423657 [13:36:21] (03PS3) 10Rush: openstack: manage bridges and bridge mappings for l3 and compute [puppet] - 10https://gerrit.wikimedia.org/r/423690 (https://phabricator.wikimedia.org/T188266) [13:36:31] (03CR) 10Vgutierrez: [C: 032] Bump to version 1.15 [debs/pybal] - 10https://gerrit.wikimedia.org/r/423657 (owner: 10Vgutierrez) [13:36:54] (03CR) 10jerkins-bot: [V: 04-1] openstack: manage bridges and bridge mappings for l3 and compute [puppet] - 10https://gerrit.wikimedia.org/r/423690 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [13:38:12] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Restore db1077 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423691 (owner: 10Marostegui) [13:38:49] (03CR) 10Muehlenhoff: [C: 032] Enable base::service_auto_restart for cron [puppet] - 10https://gerrit.wikimedia.org/r/422391 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [13:39:11] (03CR) 10Vgutierrez: [C: 032] Fix dummy metrics implementation [debs/pybal] (1.15) - 10https://gerrit.wikimedia.org/r/423673 (https://phabricator.wikimedia.org/T190527) (owner: 10Vgutierrez) [13:39:22] (03Merged) 10jenkins-bot: db-eqiad.php: Restore db1077 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423691 (owner: 10Marostegui) [13:39:37] (03CR) 10jenkins-bot: db-eqiad.php: Restore db1077 original weight [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/423691 (owner: 10Marostegui) [13:40:00] (03PS1) 10Vgutierrez: Bump to version 1.15 [debs/pybal] (1.15) - 10https://gerrit.wikimedia.org/r/423693 [13:40:39] (03PS2) 10Ottomata: Use $mirror_name in produce rate alert [puppet] - 10https://gerrit.wikimedia.org/r/423685 (https://phabricator.wikimedia.org/T189464) [13:41:18] (03CR) 10Ottomata: [C: 032] Use $mirror_name in produce rate alert [puppet] - 10https://gerrit.wikimedia.org/r/423685 (https://phabricator.wikimedia.org/T189464) (owner: 10Ottomata) [13:41:23] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore original weight for db1077 after alter table (duration: 00m 58s) [13:41:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:41:50] (03CR) 10Vgutierrez: [C: 032] Bump to version 1.15 [debs/pybal] (1.15) - 10https://gerrit.wikimedia.org/r/423693 (owner: 10Vgutierrez) [13:42:28] ottomata: ah sorry missed that :( [13:43:05] (03PS2) 10Elukey: role::prometheus::ops:: add jmx targets for zookeeper [puppet] - 10https://gerrit.wikimedia.org/r/423659 (https://phabricator.wikimedia.org/T177460) [13:43:35] (03PS3) 10Elukey: role::prometheus::ops:: add jmx targets for zookeeper [puppet] - 10https://gerrit.wikimedia.org/r/423659 (https://phabricator.wikimedia.org/T177460) [13:44:39] (03CR) 10Elukey: [C: 032] role::prometheus::ops:: add jmx targets for zookeeper [puppet] - 10https://gerrit.wikimedia.org/r/423659 (https://phabricator.wikimedia.org/T177460) (owner: 10Elukey) [13:44:41] (03PS1) 10Muehlenhoff: Update SSH key for Miriam Redi [puppet] - 10https://gerrit.wikimedia.org/r/423694 [13:46:09] (03PS1) 10Ottomata: Replicate job queue topics main -> jumbo [puppet] - 10https://gerrit.wikimedia.org/r/423695 (https://phabricator.wikimedia.org/T189464) [13:47:02] !log anomie@tin Synchronized php-1.31.0-wmf.27/includes/specials/SpecialWhatlinkshere.php: Backporting fix for T191116 ([[gerrit:423688]]) (duration: 00m 58s) [13:47:08] Logged the message 
at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:09] T191116: Wikimedia\Rdbms\Database::tableName: use of subqueries is not supported this way. - https://phabricator.wikimedia.org/T191116 [13:47:36] (03PS2) 10Muehlenhoff: Update SSH key for Miriam Redi [puppet] - 10https://gerrit.wikimedia.org/r/423694 [13:48:06] (03CR) 10Ottomata: [C: 032] Replicate job queue topics main -> jumbo [puppet] - 10https://gerrit.wikimedia.org/r/423695 (https://phabricator.wikimedia.org/T189464) (owner: 10Ottomata) [13:48:21] !log anomie@tin Synchronized php-1.31.0-wmf.27/extensions/intersection/DynamicPageList.hooks.php: Backporting fix for T191116 ([[gerrit:423689]]) (duration: 00m 58s) [13:48:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:18] 10Operations, 10Dumps-Generation, 10HHVM, 10Patch-For-Review: Convert snapshot hosts to use HHVM and trusty - https://phabricator.wikimedia.org/T94277#4100548 (10hoo) [13:49:23] (03CR) 10Muehlenhoff: [C: 032] Update SSH key for Miriam Redi [puppet] - 10https://gerrit.wikimedia.org/r/423694 (owner: 10Muehlenhoff) [13:49:28] (03PS3) 10Muehlenhoff: Update SSH key for Miriam Redi [puppet] - 10https://gerrit.wikimedia.org/r/423694 [13:49:35] (03PS4) 10Rush: openstack: manage bridges and bridge mappings for l3 and compute [puppet] - 10https://gerrit.wikimedia.org/r/423690 (https://phabricator.wikimedia.org/T188266) [13:49:57] 10Operations, 10Ops-Access-Requests, 10Analytics, 10Research, and 2 others: Restricting access for a collaboration nearing completion - https://phabricator.wikimedia.org/T189341#4100549 (10herron) p:05Triage>03Normal a:05DarTar>03herron [13:51:14] anomie: I can see the decrease in errors already! :) [13:51:16] marostegui: Done, and it seems to have turned off the firehose. If you happen to see any new instances of that warning after now (2018-04-03 13:49 UTC), please report them on the task so they can be fixed too. 
[13:51:51] anomie: Sure, I will keep an eye and if nothing comes back up, I will close the task tomorrow, does that sound ok? [13:51:58] Works for me. [13:52:10] Excellent! Thanks a lot :) [13:52:13] No problem [13:53:00] (03PS1) 10Vgutierrez: Release 1.15.3: Avoid having a hard requirement on prometheus-client [debs/pybal] - 10https://gerrit.wikimedia.org/r/423696 (https://phabricator.wikimedia.org/T190527) [13:53:10] (03PS1) 10Elukey: Revert "role::prometheus::ops:: add jmx targets for zookeeper" [puppet] - 10https://gerrit.wikimedia.org/r/423697 [13:53:48] (03CR) 10Elukey: "The list of nodes added to the target is more than zookeeper exporters, because for example on druid nodes we co-host druid with zookeeper" [puppet] - 10https://gerrit.wikimedia.org/r/423697 (owner: 10Elukey) [13:53:52] (03CR) 10Elukey: [C: 032] Revert "role::prometheus::ops:: add jmx targets for zookeeper" [puppet] - 10https://gerrit.wikimedia.org/r/423697 (owner: 10Elukey) [13:53:57] (03PS2) 10Elukey: Revert "role::prometheus::ops:: add jmx targets for zookeeper" [puppet] - 10https://gerrit.wikimedia.org/r/423697 [13:53:59] (03PS1) 10Vgutierrez: Release 1.15.3: Avoid having a hard requirement on prometheus-client [debs/pybal] (1.15) - 10https://gerrit.wikimedia.org/r/423698 (https://phabricator.wikimedia.org/T190527) [13:59:41] (03PS1) 10Vgutierrez: Release 1.15.3: Avoid having a hard requirement on prometheus-client [debs/pybal] (1.15-stretch) - 10https://gerrit.wikimedia.org/r/423699 (https://phabricator.wikimedia.org/T190527) [14:05:42] (03CR) 10Alexandros Kosiaris: Enable icu57 component von jessie-based app servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/423687 (owner: 10Muehlenhoff) [14:10:05] (03PS2) 10Muehlenhoff: Enable icu57 component for jessie-based app servers [puppet] - 10https://gerrit.wikimedia.org/r/423687 [14:10:07] (03CR) 10Muehlenhoff: Enable icu57 component for jessie-based app servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/423687 
(owner: 10Muehlenhoff) [14:10:51] 10Operations, 10Operations-Software-Development, 10Goal: Release and deploy Debmonitor (patch management software) [Technology Goal 2017-18_Q4] - https://phabricator.wikimedia.org/T191298#4100613 (10Volans) [14:11:40] 10Operations, 10Operations-Software-Development, 10Goal: Release and deploy Debmonitor (patch management software) [Technology Goal 2017-18_Q4] - https://phabricator.wikimedia.org/T191298#4100625 (10Volans) [14:11:46] 10Operations, 10Operations-Software-Development, 10Continuous-Integration-Infrastructure (shipyard), 10Patch-For-Review: New tool to track package updates/status for hosts and images (debmonitor) - https://phabricator.wikimedia.org/T167504#4100624 (10Volans) [14:12:18] (03PS1) 10Elukey: role::configcluster: enable zookeeper prometheus monitoring in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/423703 (https://phabricator.wikimedia.org/T177460) [14:13:16] 10Operations, 10Operations-Software-Development: Debmonitor: deploy it in production - https://phabricator.wikimedia.org/T191299#4100627 (10Volans) p:05Triage>03Normal [14:15:01] (03CR) 10Alexandros Kosiaris: [C: 031] Enable icu57 component for jessie-based app servers [puppet] - 10https://gerrit.wikimedia.org/r/423687 (owner: 10Muehlenhoff) [14:15:28] 10Operations, 10Operations-Software-Development: Debmonitor: deploy the agent across the fleet - https://phabricator.wikimedia.org/T191300#4100643 (10Volans) p:05Triage>03Normal [14:15:42] 10Operations, 10Operations-Software-Development: Debmonitor: deploy the service in production - https://phabricator.wikimedia.org/T191299#4100667 (10Volans) [14:17:38] 10Operations, 10Operations-Software-Development, 10Goal: Release and deploy Debmonitor (patch management software) [Technology Goal 2017-18_Q4] - https://phabricator.wikimedia.org/T191298#4100613 (10Volans) [14:17:40] 10Operations, 10Operations-Software-Development: Debmonitor: deploy the service in production - 
https://phabricator.wikimedia.org/T191299#4100627 (10Volans) [14:19:34] 10Operations, 10Operations-Software-Development, 10Goal: Release and deploy Debmonitor (patch management software) [Technology Goal 2017-18_Q4] - https://phabricator.wikimedia.org/T191298#4100677 (10Volans) p:05Triage>03Normal [14:20:12] (03PS2) 10Mobrovac: Switch high traffic jobs to kafka for all wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423512 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko) [14:21:32] taking over tin for 10 mins [14:23:10] (03CR) 10Mobrovac: [C: 032] Switch high traffic jobs to kafka for all wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423512 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko) [14:23:26] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/10774/ - Turned out to enable monitoring for zookeeper in druid too, that is really good :)" [puppet] - 10https://gerrit.wikimedia.org/r/423703 (https://phabricator.wikimedia.org/T177460) (owner: 10Elukey) [14:30:24] oh come on, jenkins [14:31:15] (03CR) 10jerkins-bot: [V: 04-1] Switch high traffic jobs to kafka for all wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423512 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko) [14:31:27] (03PS5) 10Rush: openstack: manage bridges and bridge mappings for l3 and compute [puppet] - 10https://gerrit.wikimedia.org/r/423690 (https://phabricator.wikimedia.org/T188266) [14:32:18] (03CR) 10Mobrovac: [C: 032] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423512 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko) [14:32:22] (03CR) 10Rush: [C: 032] openstack: manage bridges and bridge mappings for l3 and compute [puppet] - 10https://gerrit.wikimedia.org/r/423690 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [14:38:07] (03CR) 10jenkins-bot: Switch high traffic jobs to kafka for all wikis. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/423512 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko) [14:38:29] PROBLEM - Host labtestvirt2003 is DOWN: PING CRITICAL - Packet loss = 100% [14:38:33] !log ppchelko@tin Started deploy [cpjobqueue/deploy@60a2292]: Switch all high traffic jobs to kafka T190327 [14:38:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:38:39] T190327: FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus - https://phabricator.wikimedia.org/T190327 [14:39:13] !log mobrovac@tin Synchronized wmf-config/jobqueue.php: Switch the remaining high-traffic jobs for all wikis, file 1/2 - T190327 (duration: 00m 59s) [14:39:17] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@60a2292]: Switch all high traffic jobs to kafka T190327 (duration: 00m 44s) [14:39:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:38] RECOVERY - Host labtestvirt2003 is UP: PING OK - Packet loss = 0%, RTA = 36.10 ms [14:40:53] !log mobrovac@tin Synchronized wmf-config/InitialiseSettings.php: Switch the remaining high-traffic jobs for all wikis, file 2/2 - T190327 (duration: 00m 59s) [14:40:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:07] ok, we are done [14:43:16] (03PS1) 10Elukey: role::prometheus::ops: add zookeeper jmx configuration [puppet] - 10https://gerrit.wikimedia.org/r/423706 (https://phabricator.wikimedia.org/T177460) [14:43:44] (03PS3) 10Gehel: Make 'style' and 'storage id' available to maps services [puppet] - 10https://gerrit.wikimedia.org/r/422239 (https://phabricator.wikimedia.org/T112948) (owner: 10Sbisson) [14:43:55] (03CR) 10Elukey: [C: 032] role::prometheus::ops: add zookeeper jmx configuration [puppet] - 10https://gerrit.wikimedia.org/r/423706 (https://phabricator.wikimedia.org/T177460) (owner: 10Elukey) 
[14:45:00] (03PS1) 10Rush: openstack: set ha as default mode for l3-agents [puppet] - 10https://gerrit.wikimedia.org/r/423707 (https://phabricator.wikimedia.org/T188266) [14:49:28] PROBLEM - Host labtestneutron2002 is DOWN: PING CRITICAL - Packet loss = 100% [14:50:15] (03PS1) 10Ppchelko: Clean up the config for high traffic jobs. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423710 (https://phabricator.wikimedia.org/T190327) [14:50:28] RECOVERY - Host labtestneutron2002 is UP: PING OK - Packet loss = 0%, RTA = 36.99 ms [14:50:38] PROBLEM - Host labtestneutron2001 is DOWN: PING CRITICAL - Packet loss = 100% [14:51:28] RECOVERY - Host labtestneutron2001 is UP: PING OK - Packet loss = 0%, RTA = 36.16 ms [14:54:33] (03PS1) 10Herron: rm mtizzoni,panisson,paolotti,ciro from analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/423711 (https://phabricator.wikimedia.org/T189341) [14:55:27] 10Operations, 10Proton, 10Readers-Web-Backlog, 10Services (watching): Choose a server for the chromium-render service - https://phabricator.wikimedia.org/T187821#4100785 (10Niedzielski) Hey @mobrovac. We're making final plans for the current quarter. Any idea when we should expect a place for the Proton pr... 
[14:57:18] RECOVERY - Check systemd state on labtestmetal2001 is OK: OK - running: The system is fully operational [14:58:17] ^ me [14:59:39] (03PS2) 10Rush: openstack: set ha as default mode for l3-agents [puppet] - 10https://gerrit.wikimedia.org/r/423707 (https://phabricator.wikimedia.org/T188266) [15:00:10] PROBLEM - Host labtestmetal2001 is DOWN: PING CRITICAL - Packet loss = 100% [15:00:22] (03CR) 10Rush: [C: 032] openstack: set ha as default mode for l3-agents [puppet] - 10https://gerrit.wikimedia.org/r/423707 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [15:00:59] RECOVERY - Host labtestmetal2001 is UP: PING OK - Packet loss = 0%, RTA = 36.17 ms [15:04:53] (03PS4) 10Gehel: Make 'style' and 'storage id' available to maps services [puppet] - 10https://gerrit.wikimedia.org/r/422239 (https://phabricator.wikimedia.org/T112948) (owner: 10Sbisson) [15:06:18] 10Operations, 10Ops-Access-Requests, 10Analytics, 10Research, and 3 others: Restricting access for a collaboration nearing completion - https://phabricator.wikimedia.org/T189341#4100822 (10herron) @DarTar @Ottomata could you please review patch 423711? It appears these users are already members of `statis... [15:08:33] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/423711 (https://phabricator.wikimedia.org/T189341) (owner: 10Herron) [15:09:10] (03CR) 10Gehel: [C: 032] "Looks good. This only introduces new vars. They will be used only when the config will be generated by scap templates. 
This should functio" [puppet] - 10https://gerrit.wikimedia.org/r/422239 (https://phabricator.wikimedia.org/T112948) (owner: 10Sbisson) [15:09:26] !log depool ms-fe2005 to test rewrite.py - T183902 [15:09:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:33] T183902: Swift invalid range requests causing 501s - https://phabricator.wikimedia.org/T183902 [15:20:38] (03PS2) 10ArielGlenn: split out prefetch code to its own module [dumps] - 10https://gerrit.wikimedia.org/r/423143 [15:20:57] (03PS3) 10ArielGlenn: break class for handling the list of dump jobs out into its own module [dumps] - 10https://gerrit.wikimedia.org/r/423155 [15:23:57] 10Operations, 10Ops-Access-Requests: Access to stat100x and notebook1003.eqiad.wmnet - https://phabricator.wikimedia.org/T191308#4100865 (10Jonas) [15:23:59] 10Operations, 10Ops-Access-Requests: Access to stat100x and notebook1003.eqiad.wmnet - https://phabricator.wikimedia.org/T191309#4100877 (10Jonas) [15:31:14] (03CR) 10ArielGlenn: [C: 032] pylint and cleanup of runnerutils [dumps] - 10https://gerrit.wikimedia.org/r/423141 (owner: 10ArielGlenn) [15:31:46] (03CR) 10ArielGlenn: [C: 032] split out prefetch code to its own module [dumps] - 10https://gerrit.wikimedia.org/r/423143 (owner: 10ArielGlenn) [15:32:35] (03CR) 10ArielGlenn: [C: 032] break class for handling the list of dump jobs out into its own module [dumps] - 10https://gerrit.wikimedia.org/r/423155 (owner: 10ArielGlenn) [15:33:16] 10Operations, 10Ops-Access-Requests: Access to stat100x and notebook1003.eqiad.wmnet - https://phabricator.wikimedia.org/T191308#4100934 (10Dzahn) [15:33:19] 10Operations, 10Ops-Access-Requests: Access to stat100x and notebook1003.eqiad.wmnet - https://phabricator.wikimedia.org/T191309#4100936 (10Dzahn) [15:34:00] 10Operations, 10hardware-requests: Hardware request for Graphite - https://phabricator.wikimedia.org/T191312#4100937 (10fgiunchedi) [15:35:23] 10Operations, 10Ops-Access-Requests: Access to stat100x and 
notebook1003.eqiad.wmnet - https://phabricator.wikimedia.org/T191308#4100865 (10Dzahn) Hi Jonas, I don't see any details regarding notebook/stat hosts in the linked ticket. Is it the right one? [15:36:29] (03PS1) 10Muehlenhoff: Enable base::service_auto_restart for Parsoid services on ruthenium [puppet] - 10https://gerrit.wikimedia.org/r/423718 (https://phabricator.wikimedia.org/T135991) [15:36:57] (03CR) 10jerkins-bot: [V: 04-1] Enable base::service_auto_restart for Parsoid services on ruthenium [puppet] - 10https://gerrit.wikimedia.org/r/423718 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [15:38:21] 10Operations, 10vm-requests: eqiad: 1 VM %request for ping offload - https://phabricator.wikimedia.org/T190243#4100966 (10Dzahn) 05Open>03Resolved a:03Dzahn [ganeti1004:~] $ sudo gnt-instance info ping1001.eqiad.wmnet - Instance name: ping1001.eqiad.wmnet UUID: e20fcbf7-b66f-4fca-96b1-97f615224a3b Se... [15:38:27] (03PS2) 10Muehlenhoff: Enable base::service_auto_restart for Parsoid services on ruthenium [puppet] - 10https://gerrit.wikimedia.org/r/423718 (https://phabricator.wikimedia.org/T135991) [15:38:51] (03CR) 10jerkins-bot: [V: 04-1] Enable base::service_auto_restart for Parsoid services on ruthenium [puppet] - 10https://gerrit.wikimedia.org/r/423718 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [15:39:07] !log roll restart of zookeeper on conf100[123] to pick up prometheus monitoring [15:39:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:35] (03PS3) 10Muehlenhoff: Enable base::service_auto_restart for Parsoid services on ruthenium [puppet] - 10https://gerrit.wikimedia.org/r/423718 (https://phabricator.wikimedia.org/T135991) [15:50:58] 10Operations, 10vm-requests: eqiad: 1 VM %request for ping offload - https://phabricator.wikimedia.org/T190243#4101020 (10Krinkle) [15:54:32] 10Operations, 10hardware-requests: Hardware request for Graphite - 
https://phabricator.wikimedia.org/T191312#4101027 (10fgiunchedi) a:03RobH [15:59:28] (03PS1) 10Elukey: role::prometheus::ops: rename cluster label for zookeeper [puppet] - 10https://gerrit.wikimedia.org/r/423720 (https://phabricator.wikimedia.org/T177460) [16:00:04] godog, moritzm, and _joe_: (Dis)respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180403T1600). Please do the needful. [16:00:04] No GERRIT patches in the queue for this window AFAICS. [16:00:13] \o/ [16:01:26] (03PS1) 10Gehel: maps: remove sources.yaml [puppet] - 10https://gerrit.wikimedia.org/r/423721 (https://phabricator.wikimedia.org/T112948) [16:01:28] (03PS1) 10Gehel: maps: cleanup of sources.yaml code [puppet] - 10https://gerrit.wikimedia.org/r/423722 (https://phabricator.wikimedia.org/T112948) [16:02:10] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/10776/prometheus1003.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/423720 (https://phabricator.wikimedia.org/T177460) (owner: 10Elukey) [16:02:49] (03CR) 10Gehel: [C: 04-1] "This should only be merged once related patches to kartotherian / tilerator deploy repo are deployed in production (https://gerrit.wikimed" [puppet] - 10https://gerrit.wikimedia.org/r/423722 (https://phabricator.wikimedia.org/T112948) (owner: 10Gehel) [16:02:58] (03CR) 10Gehel: [C: 04-1] "This should only be merged once related patches to kartotherian / tilerator deploy repo are deployed in production (https://gerrit.wikimed" [puppet] - 10https://gerrit.wikimedia.org/r/423721 (https://phabricator.wikimedia.org/T112948) (owner: 10Gehel) [16:03:06] (03PS1) 10Andrew Bogott: bootstrapvz: try to speed up VM creation in labtest [puppet] - 10https://gerrit.wikimedia.org/r/423723 [16:03:18] gehel: o/ [16:03:33] elukey: \o ! [16:03:40] all good?? 
[16:03:44] 10Operations, 10cloud-services-team: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#4101051 (10RobH) >>! In T183937#4100120, @faidon wrote: >>>! In T183937#4097563, @RobH wrote: >> So there is an issue where trusty expects the os to be on eth0, and its on eth3. However, after di... [16:03:51] (03CR) 10Andrew Bogott: [C: 032] bootstrapvz: try to speed up VM creation in labtest [puppet] - 10https://gerrit.wikimedia.org/r/423723 (owner: 10Andrew Bogott) [16:04:35] !log ariel@tin Started deploy [dumps/dumps@77dc467]: split up some large modules, prep work for prefetch changes [16:04:39] !log ariel@tin Finished deploy [dumps/dumps@77dc467]: split up some large modules, prep work for prefetch changes (duration: 00m 04s) [16:04:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:04:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:05:23] 10Operations, 10cloud-services-team: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#4101058 (10RobH) a:05RobH>03Cmjohnson @Cmjohnson: as pinged in IRC, please go to labvirt1021 and remove the cat5 connections, and plug the two 10G connections into the new 10G switch in that r... 
[16:10:23] !log rebooting restbase-dev1004 - T186751 [16:10:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:10:29] T186751: Reset RESTBase dev environment - https://phabricator.wikimedia.org/T186751 [16:10:51] (03PS1) 10Dzahn: install_server: add Icinga monitoring for TFTP service [puppet] - 10https://gerrit.wikimedia.org/r/423725 (https://phabricator.wikimedia.org/T190439) [16:11:21] (03CR) 10jerkins-bot: [V: 04-1] install_server: add Icinga monitoring for TFTP service [puppet] - 10https://gerrit.wikimedia.org/r/423725 (https://phabricator.wikimedia.org/T190439) (owner: 10Dzahn) [16:11:44] (03CR) 10Subramanya Sastry: [C: 031] Enable base::service_auto_restart for Parsoid services on ruthenium [puppet] - 10https://gerrit.wikimedia.org/r/423718 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [16:11:53] 10Operations, 10Analytics, 10Code-Stewardship-Reviews, 10Tools, 10Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#4101074 (10Nuria) Analytics agrees to be stewards of this service once it is migrated to be on top of akafka stream, cc... 
[16:12:29] PROBLEM - Host restbase-dev1004 is DOWN: PING CRITICAL - Packet loss = 100% [16:13:29] RECOVERY - Host restbase-dev1004 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms [16:16:02] 10Operations, 10Services, 10Graphite: Cassandra Graphite metrics space usage audit and cleanup - https://phabricator.wikimedia.org/T191315#4101084 (10fgiunchedi) [16:18:27] (03PS1) 10Arturo Borrero Gonzalez: wcms: monitoring: delay cron running time window [puppet] - 10https://gerrit.wikimedia.org/r/423726 (https://phabricator.wikimedia.org/T190512) [16:18:34] (03PS1) 10Madhuvishy: nfs: Stop exporting dumps from labstore1003 [puppet] - 10https://gerrit.wikimedia.org/r/423727 (https://phabricator.wikimedia.org/T188643) [16:18:37] (03PS2) 10Dzahn: install_server: add Icinga monitoring for TFTP service [puppet] - 10https://gerrit.wikimedia.org/r/423725 (https://phabricator.wikimedia.org/T190439) [16:18:50] (03PS2) 10Arturo Borrero Gonzalez: wmcs: monitoring: delay cron running time window [puppet] - 10https://gerrit.wikimedia.org/r/423726 (https://phabricator.wikimedia.org/T190512) [16:19:07] (03CR) 10jerkins-bot: [V: 04-1] install_server: add Icinga monitoring for TFTP service [puppet] - 10https://gerrit.wikimedia.org/r/423725 (https://phabricator.wikimedia.org/T190439) (owner: 10Dzahn) [16:20:53] (03PS1) 10Madhuvishy: nfsclient: Cleanup absented dumps mount from labstore1003 [puppet] - 10https://gerrit.wikimedia.org/r/423728 (https://phabricator.wikimedia.org/T188643) [16:21:34] (03CR) 10Arturo Borrero Gonzalez: [C: 032] wmcs: monitoring: delay cron running time window [puppet] - 10https://gerrit.wikimedia.org/r/423726 (https://phabricator.wikimedia.org/T190512) (owner: 10Arturo Borrero Gonzalez) [16:22:09] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] 
https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [16:23:09] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5 [16:23:47] (03PS3) 10Dzahn: install_server: add Icinga monitoring for TFTP service [puppet] - 10https://gerrit.wikimedia.org/r/423725 (https://phabricator.wikimedia.org/T190439) [16:24:20] (03CR) 10jerkins-bot: [V: 04-1] install_server: add Icinga monitoring for TFTP service [puppet] - 10https://gerrit.wikimedia.org/r/423725 (https://phabricator.wikimedia.org/T190439) (owner: 10Dzahn) [16:24:37] (03PS1) 10Bstorm: wiki replicas: set the index script to run without args in expected ways [puppet] - 10https://gerrit.wikimedia.org/r/423730 (https://phabricator.wikimedia.org/T181650) [16:25:20] that 5xx spike alerted above, it's brief but it's 504s [16:25:31] they were focused through ulsfo, but also minorly hit the other sites as well [16:25:42] (03PS1) 10Madhuvishy: dumps: Turn off cron that rsyncs to labstore1003 [puppet] - 10https://gerrit.wikimedia.org/r/423731 (https://phabricator.wikimedia.org/T188643) [16:25:45] 504s logged there tend to be applayer-generated... [16:26:57] the 504s were all for /api/rest_v1/metrics/pageviews [16:27:05] ah [16:27:12] malformed request maybe [16:27:29] the 504s were all for /api/rest_v1/metrics/pageviews/per-article actually [16:27:37] after that it's a fairly diverse set of wikinames and page titles [16:27:38] I think it is a known issue, I would ping analytics [16:27:50] so they can assess the impact [16:28:10] oh hmmm [16:28:15] ? [16:28:18] not so diverse. 
lots of article titles, but all itwiki [16:28:26] "/api/rest_v1/metrics/pageviews/per-article/it.wikipedia/all-access/all-agents/Environment_%28biophysical%29/daily/2015082100/2015082100" [16:28:37] lots of these, all itwiki, different articles [16:28:43] (03CR) 10Dzahn: [V: 032 C: 032] install_server: add Icinga monitoring for TFTP service [puppet] - 10https://gerrit.wikimedia.org/r/423725 (https://phabricator.wikimedia.org/T190439) (owner: 10Dzahn) [16:28:44] ~67K failures [16:28:47] (03PS4) 10Dzahn: install_server: add Icinga monitoring for TFTP service [puppet] - 10https://gerrit.wikimedia.org/r/423725 (https://phabricator.wikimedia.org/T190439) [16:29:09] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [16:29:17] (03CR) 10jerkins-bot: [V: 04-1] install_server: add Icinga monitoring for TFTP service [puppet] - 10https://gerrit.wikimedia.org/r/423725 (https://phabricator.wikimedia.org/T190439) (owner: 10Dzahn) [16:29:31] (that's ~67K failures all in a 1-2 minute window) [16:29:56] mostly apparently from EC2 IPs [16:30:22] (03PS1) 10Madhuvishy: dumps: Clean up code that rsyncs to labstore1003 [puppet] - 10https://gerrit.wikimedia.org/r/423732 (https://phabricator.wikimedia.org/T188643) [16:30:35] all same user-agent too: python-requests/2.18.4 [16:30:44] bblack: ouch.. it might be a recurrence of https://phabricator.wikimedia.org/T190213 [16:30:58] (03CR) 10jerkins-bot: [V: 04-1] dumps: Clean up code that rsyncs to labstore1003 [puppet] - 10https://gerrit.wikimedia.org/r/423732 (https://phabricator.wikimedia.org/T188643) (owner: 10Madhuvishy) [16:31:00] (03CR) 10Dzahn: [C: 032] "@Alex just adding you to show you that jenkins -1 issue here.. i dont get it.
it is about the spec file but afaict the class _does_ contai" [puppet] - 10https://gerrit.wikimedia.org/r/423725 (https://phabricator.wikimedia.org/T190439) (owner: 10Dzahn) [16:31:03] I knew I had heard about it before [16:31:09] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5 [16:31:42] (03CR) 10Dzahn: [V: 032 C: 032] install_server: add Icinga monitoring for TFTP service [puppet] - 10https://gerrit.wikimedia.org/r/423725 (https://phabricator.wikimedia.org/T190439) (owner: 10Dzahn) [16:31:45] yeah, that answered a 404 to me [16:31:55] what bblack pasted [16:32:28] report the new occurrence there and see if we can do something to help [16:32:52] (03PS2) 10Madhuvishy: dumps: Clean up code that rsyncs to labstore1003 [puppet] - 10https://gerrit.wikimedia.org/r/423732 (https://phabricator.wikimedia.org/T188643) [16:33:21] !log rebooting restbase-dev1006 - T186751 [16:33:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:27] T186751: Reset RESTBase dev environment - https://phabricator.wikimedia.org/T186751 [16:33:29] ah this is the thing that led to the time_wait discussion I got trolled into Friday? [16:33:29] checking https://logstash.wikimedia.org/app/kibana#/dashboard/restbase and it seems the same [16:33:33] yeah [16:33:36] exactly [16:33:49] not the trolled part :) [16:34:36] PROBLEM - Host restbase-dev1006 is DOWN: PING CRITICAL - Packet loss = 100% [16:35:40] 10Operations, 10monitoring, 10Patch-For-Review: add tftpd monitoring - https://phabricator.wikimedia.org/T190439#4101191 (10Dzahn) p:05Triage>03Normal [16:35:57] I think that we should proceed with https://gerrit.wikimedia.org/r/#/c/421901/ [16:36:06] and roll it out to some restbase nodes first [16:36:09] thoughts?
[16:36:28] RECOVERY - Host restbase-dev1006 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [16:36:45] 10Operations, 10monitoring, 10Patch-For-Review: add tftpd monitoring - https://phabricator.wikimedia.org/T190439#4073739 (10Dzahn) https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=FTP [16:39:23] (03CR) 10BBlack: [C: 031] "+1 for functional concept, not actual puppet-coding details" [puppet] - 10https://gerrit.wikimedia.org/r/421901 (https://phabricator.wikimedia.org/T190213) (owner: 10Elukey) [16:39:38] elukey: ^ yes, I think it can only help, even if it may not be enough [16:39:39] (03PS1) 10Madhuvishy: dumps: Remove stat1005|6 from nfs clients for dataset1001 [puppet] - 10https://gerrit.wikimedia.org/r/423733 (https://phabricator.wikimedia.org/T188644) [16:41:49] bblack: thanks, going to have a chat with services but I agree that we'd need more [16:46:49] 10Operations, 10Analytics, 10Code-Stewardship-Reviews, 10Tools, 10Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#4101234 (10greg) >>! In T185319#4101074, @Nuria wrote: > once it is migrated to be on top of akafka stream Great! But w... [16:48:29] 10Operations, 10Analytics, 10Code-Stewardship-Reviews, 10Tools, 10Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#4101252 (10Nuria) One of the analytics engineers. [16:48:55] 10Operations, 10monitoring, 10Patch-For-Review: add tftpd monitoring - https://phabricator.wikimedia.org/T190439#4101255 (10Dzahn) 05Open>03Resolved done! 
see the link above [16:52:43] (03PS1) 10Chad: Adding empty motd.config [software/gerrit] (stable-2.14) - 10https://gerrit.wikimedia.org/r/423734 [16:52:54] (03CR) 10Chad: [C: 032] Adding empty motd.config [software/gerrit] (stable-2.14) - 10https://gerrit.wikimedia.org/r/423734 (owner: 10Chad) [16:52:56] (03CR) 10Chad: [V: 032 C: 032] Adding empty motd.config [software/gerrit] (stable-2.14) - 10https://gerrit.wikimedia.org/r/423734 (owner: 10Chad) [16:53:23] !log demon@tin Started deploy [gerrit/gerrit@aa1a1a0]: no-op, pushing empty motd.config file [16:53:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:53:35] !log demon@tin Finished deploy [gerrit/gerrit@aa1a1a0]: no-op, pushing empty motd.config file (duration: 00m 11s) [16:53:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:56:32] 10Operations, 10Services, 10Graphite: Cassandra Graphite metrics space usage audit and cleanup - https://phabricator.wikimedia.org/T191315#4101084 (10Eevans) FWIW: The RESTBase cluster has been disabled for some time. I just disabled the RESTBase Dev cluster as well. [16:57:00] (03PS4) 10Zoranzoki21: Enable on ku.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423188 (https://phabricator.wikimedia.org/T190944) [16:57:59] (03PS1) 10Chad: Gerrit: symlink in motd.config from deploy repo [puppet] - 10https://gerrit.wikimedia.org/r/423735 [16:58:15] paladox: ^^ [16:58:24] :) [16:58:33] (03CR) 10Paladox: [C: 031] Gerrit: symlink in motd.config from deploy repo [puppet] - 10https://gerrit.wikimedia.org/r/423735 (owner: 10Chad) [17:00:04] cscott, arlolra, subbu, halfak, and Amir1: Time to snap out of that daydream and deploy Services – Graphoid / Parsoid / Citoid / ORES. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180403T1700). [17:00:04] No GERRIT patches in the queue for this window AFAICS. 
[17:05:45] (03PS6) 10Elukey: profile::restbase: add sysctl settings to improve tcp performance [puppet] - 10https://gerrit.wikimedia.org/r/421901 (https://phabricator.wikimedia.org/T190213) [17:05:59] 10Operations, 10Analytics, 10Code-Stewardship-Reviews, 10Tools, 10Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#4101328 (10greg) 05Open>03Resolved a:03Nuria Awesome! https://www.mediawiki.org/w/index.php?title=Developers%2FMai... [17:08:17] !log manually set net.ipv4.tcp_tw_reuse=1 on restbase2001 as test for T190213 [17:08:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:13:29] (03PS3) 10Jcrespo: labsdb: Reduce the sleep timeouts of clients to prevent connection hogging [puppet] - 10https://gerrit.wikimedia.org/r/423494 [17:13:31] (03PS1) 10Jcrespo: dbproxy-labsdb: set labsdb1009 as the main web, the others analytics [puppet] - 10https://gerrit.wikimedia.org/r/423741 (https://phabricator.wikimedia.org/T191149) [17:13:46] (03PS2) 10Bstorm: wiki replicas: set the index script to run without args in expected ways [puppet] - 10https://gerrit.wikimedia.org/r/423730 (https://phabricator.wikimedia.org/T181650) [17:15:17] (03CR) 10Jcrespo: [C: 032] dbproxy-labsdb: set labsdb1009 as the main web, the others analytics [puppet] - 10https://gerrit.wikimedia.org/r/423741 (https://phabricator.wikimedia.org/T191149) (owner: 10Jcrespo) [17:15:24] (03PS2) 10Jcrespo: dbproxy-labsdb: set labsdb1009 as the main web, the others analytics [puppet] - 10https://gerrit.wikimedia.org/r/423741 (https://phabricator.wikimedia.org/T191149) [17:16:47] 10Operations, 10Datasets-General-or-Unknown, 10monitoring, 10Patch-For-Review: NFS on dataset1001 overloaded, high load on the hosts that mount it - https://phabricator.wikimedia.org/T169680#4101358 (10Dzahn) 05Open>03stalled [17:17:07] 10Operations, 10Datasets-General-or-Unknown, 10monitoring, 10Patch-For-Review: NFS on dataset1001 overloaded, 
high load on the hosts that mount it - https://phabricator.wikimedia.org/T169680#3405612 (10Dzahn) https://gerrit.wikimedia.org/r/#/c/363548/ [17:18:28] !log reloading labsdb proxy configuration [17:18:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:19:01] mm, I did something wrong [17:20:17] 10Operations, 10Ops-Access-Requests: Requesting access to stats machines for Lucas Werkmeister - https://phabricator.wikimedia.org/T190415#4101368 (10Lucas_Werkmeister_WMDE) > Can this not be inferred from wiki databases data? I don’t quite understand what you mean, sorry… in this case, I would distinguish be... [17:20:26] typo [17:23:41] 10Operations, 10monitoring: Disk space checks complaining on docker build hosts when building containers - https://phabricator.wikimedia.org/T179271#4101378 (10Dzahn) I believe this has already been resolved meanwhile. See the "-i ..docker" exclusions in these checks: ( --ignore-ereg-partition) ``` hierada... [17:23:50] 10Operations, 10monitoring: Disk space checks complaining on docker build hosts when building containers - https://phabricator.wikimedia.org/T179271#4101379 (10Dzahn) 05Open>03Resolved a:03Dzahn [17:24:05] !log upgrading HHVM on labweb* [17:24:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:24:12] !log sbisson@tin Started deploy [tilerator/deploy@03add2d]: Deploying tilerator i18n to maps-test* [17:24:15] (03CR) 10Bstorm: [C: 032] wiki replicas: set the index script to run without args in expected ways [puppet] - 10https://gerrit.wikimedia.org/r/423730 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [17:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:25:22] !log awight@tin Started deploy [ores/deploy@7701cee]: ORES versioned virtualenv, T181071 [17:25:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:25:28] T181071: Cache ORES virtualenv within versioned source - 
https://phabricator.wikimedia.org/T181071 [17:25:46] !log awight@tin Finished deploy [ores/deploy@7701cee]: ORES versioned virtualenv, T181071 (duration: 00m 27s) [17:25:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:26:23] (03PS1) 10Jcrespo: dbproxy: Fix typo on labsdb1010 configuration [puppet] - 10https://gerrit.wikimedia.org/r/423744 [17:26:38] (03PS2) 10Jcrespo: dbproxy: Fix typo on labsdb1010 configuration [puppet] - 10https://gerrit.wikimedia.org/r/423744 [17:27:05] (03CR) 10Jcrespo: [C: 032] dbproxy: Fix typo on labsdb1010 configuration [puppet] - 10https://gerrit.wikimedia.org/r/423744 (owner: 10Jcrespo) [17:27:27] !log awight@tin Started deploy [ores/deploy@7701cee]: ORES versioned virtualenv, T181071 [17:27:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:28:21] !log sbisson@tin Finished deploy [tilerator/deploy@03add2d]: Deploying tilerator i18n to maps-test* (duration: 04m 09s) [17:28:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:28:30] (03PS1) 10Giuseppe Lavagetto: Upgrade the repack script. 
[debs/mcrouter] - 10https://gerrit.wikimedia.org/r/423745 [17:28:34] (03PS1) 10Giuseppe Lavagetto: New upstream version 0.36.0 [debs/mcrouter] - 10https://gerrit.wikimedia.org/r/423746 [17:28:36] (03PS1) 10Giuseppe Lavagetto: New upstream version 0.37.0 [debs/mcrouter] - 10https://gerrit.wikimedia.org/r/423747 [17:28:38] (03PS1) 10Giuseppe Lavagetto: Upgrade to 0.37.0 [debs/mcrouter] - 10https://gerrit.wikimedia.org/r/423748 [17:28:42] 10Operations, 10Scoring-platform-team: Remove deprecated hosts from ORES scap config - https://phabricator.wikimedia.org/T191321#4101428 (10awight) [17:28:54] !log awight@tin Finished deploy [ores/deploy@7701cee]: ORES versioned virtualenv, T181071 (duration: 01m 27s) [17:29:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:35:33] !log sbisson@tin Started deploy [tilerator/deploy@8e68cb8]: Deploying tilerator i18n to maps-test* (take 2) [17:35:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:35:58] !log sbisson@tin Finished deploy [tilerator/deploy@8e68cb8]: Deploying tilerator i18n to maps-test* (take 2) (duration: 00m 25s) [17:36:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:37:46] (03PS3) 10Bstorm: wiki replicas: set the index script to run without args in expected ways [puppet] - 10https://gerrit.wikimedia.org/r/423730 (https://phabricator.wikimedia.org/T181650) [17:37:55] (03CR) 10Bstorm: [V: 032 C: 032] wiki replicas: set the index script to run without args in expected ways [puppet] - 10https://gerrit.wikimedia.org/r/423730 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [17:38:50] 10Operations, 10netops: Config discrepencies on network devices - https://phabricator.wikimedia.org/T189588#4101468 (10ayounsi) [17:40:01] !log manually set net.ipv4.tcp_tw_reuse=1 on restbase1007 as test for T190213 [17:40:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:41:40] 10Operations, 
10Ops-Access-Requests: Requesting access to stats machines for Lucas Werkmeister - https://phabricator.wikimedia.org/T190415#4101473 (10Nuria) @Lucas_Werkmeister_WMDE I think we need to understand a bit more what is what you are doing to recommend, Can you set up a meeting with someone from ana... [17:41:52] (03PS1) 10RobH: labvirt1021 mac address update [puppet] - 10https://gerrit.wikimedia.org/r/423751 [17:42:11] (03PS2) 10RobH: labvirt1021 mac address update [puppet] - 10https://gerrit.wikimedia.org/r/423751 [17:42:41] (03CR) 10jerkins-bot: [V: 04-1] labvirt1021 mac address update [puppet] - 10https://gerrit.wikimedia.org/r/423751 (owner: 10RobH) [17:43:16] hrmm [17:43:17] 17:42:39 3) install_server::tftp_server should contain File[/etc/default/atftpd] with mode => "0444", owner => "root" and group => "root" [17:43:26] that is interesting i didnt touch those settings [17:44:39] robh: i ran into it as well [17:44:45] it's not just your change [17:44:53] Do we know the root cause? [17:44:56] i looked at the spec file.. and i didnt understand it [17:45:08] because it says what that class should have.. and when i look at it .. 
it has that [17:45:50] cause is modules/install_server/spec/classes/install_server_tftp_server_spec.rb [17:45:51] 17:42:39 rspec ./spec/classes/install_server_tftp_server_spec.rb:4 # install_server::tftp_server should compile into a catalogue without dependency cycles [17:45:51] 17:42:39 rspec ./spec/classes/install_server_tftp_server_spec.rb:5 # install_server::tftp_server should contain Package[atftpd] with ensure => "present" [17:45:52] 17:42:39 rspec ./spec/classes/install_server_tftp_server_spec.rb:7 # install_server::tftp_server should contain File[/etc/default/atftpd] with mode => "0444", owner => "root" and group => "root" [17:45:53] 17:42:39 rspec ./spec/classes/install_server_tftp_server_spec.rb:15 # install_server::tftp_server should contain File[/srv/tftpboot] with mode => "0444", owner => "root", group => "root" and recurse => "remote" [17:45:56] =P [17:46:09] yea, now look at modules/install_server/spec/classes/install_server_tftp_server_spec.rb [17:46:43] that file has rules for modules/install_server/manifests/tftp_server.pp [17:46:54] yeah [17:46:59] says it sets the file and perms there [17:47:00] but it seems to be that ..the check should be successful [17:47:10] why it is not.. i dont know [17:47:13] I don't recall this last week [17:47:17] me neither [17:47:21] but i didnt make changes to install_server yesterday did you? [17:47:24] just wondering when this started. [17:47:26] i added Alex in a comment about that [17:47:38] so did you just +V yourself and override or just wait? [17:47:45] i mean, i rather not be blocked but i dont wanna break things. [17:47:58] yes, i did just +V to confirm [17:48:05] and there was no issue with my change and it worked [17:48:53] (03CR) 10RobH: [V: 032 C: 032] "overriding the -1 v from the failed check. discussion in irc with daniel shows he also had the issue, and has pinged alex about it."
[puppet] - 10https://gerrit.wikimedia.org/r/423751 (owner: 10RobH) [17:49:06] robh: i suggest you do that ^ but we also create a ticket... [17:49:18] yeah, ill file a task and link my patch in it if you do same [17:49:25] or will when done messing with labvirt [17:49:54] cool, i'll link my change [17:50:31] thx for chiming in, helps to know its not just me! [17:50:33] =] [17:51:45] robh: also, my change was to add .. Icinga monitoring for TFTP service [17:51:59] so if that is stopped or crashed.. we will now get an alert [17:58:04] ACKNOWLEDGEMENT - Host cp2022 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T191229 [17:59:04] !log restarting ferm on bromine [17:59:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:31] RECOVERY - Check whether ferm is active by checking the default input chain on bromine is OK: OK ferm input default policy is set [18:00:04] (03PS1) 10Bstorm: wiki replicas: make dry-run command the same as others and fix a typo [puppet] - 10https://gerrit.wikimedia.org/r/423754 (https://phabricator.wikimedia.org/T181650) [18:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180403T1800) [18:00:05] No GERRIT patches in the queue for this window AFAICS. [18:01:18] (03CR) 10Bstorm: [C: 032] wiki replicas: make dry-run command the same as others and fix a typo [puppet] - 10https://gerrit.wikimedia.org/r/423754 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [18:01:23] (03CR) 10Bstorm: [C: 032] "Quick merging small bugfix" [puppet] - 10https://gerrit.wikimedia.org/r/423754 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [18:01:57] 10Operations, 10Ops-Access-Requests: Access to stat100x and notebook1003.eqiad.wmnet - https://phabricator.wikimedia.org/T191308#4101537 (10Jonas) The public key is in the other ticket. 
If you are asking for the reason: I would like to run SQL and SPARK queries [18:03:51] !log elnath - fixing and re-enabling Icinga alert about screens, none are running, spare hosts should not have these [18:03:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:36] !log rhodium - closing idle screen session from maintenance work on puppetmasters [18:05:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:06:31] RECOVERY - Long running screen/tmux on rhodium is OK: OK: No SCREEN or tmux processes detected. [18:09:59] mutante: the failure of the tests is because of your change, you added the call to nrpe::monitor_service that is a define and you need to add it as a dependency in the tests, in the .fixtures.yml [18:13:26] !log upgrading restbase-dev1004-a to cassandra 3.11.2 - T186751 [18:13:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:33] T186751: Reset RESTBase dev environment - https://phabricator.wikimedia.org/T186751 [18:14:27] volans: ? ok.. is there a reason why i would have to do that only in this one specific module? [18:15:40] !log upgrading restbase-dev1004-b to cassandra 3.11.2 - T186751 [18:15:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:32] mutante: in the tests for this module, because you added a dependency on a define that is defined elsewhere [18:16:39] is it because using a define from another module should never happen?? [18:17:07] actually that should probably be in the profile, was the style guide check happy? [18:18:13] ah it didn't get to that check at all [18:18:21] i don't know because the errors just showed me other things. 
.yes that [18:18:39] !log upgrading restbase-dev1005-a to cassandra 3.11.2 - T186751 [18:18:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:18:45] T186751: Reset RESTBase dev environment - https://phabricator.wikimedia.org/T186751 [18:19:31] 10Operations, 10cloud-services-team, 10Patch-For-Review: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#4101606 (10RobH) [18:19:33] not sure i follow since "define from other module" will be every single one of those nrpe checks across the fleet [18:20:00] if it's just because it needs to be in the profile.. ok .. then i see [18:20:15] though that conversion would be blocking it [18:20:19] are two different things [18:20:28] !log upgrading restbase-dev1005-b to cassandra 3.11.2 - T186751 [18:20:30] the profile one is our style [18:20:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:55] the error is because the way the tests are done, you need for each module's test to specify its dependencies [18:20:56] (03PS1) 10Dzahn: Revert "install_server: add Icinga monitoring for TFTP service" [puppet] - 10https://gerrit.wikimedia.org/r/423755 [18:21:04] (03PS2) 10Dzahn: Revert "install_server: add Icinga monitoring for TFTP service" [puppet] - 10https://gerrit.wikimedia.org/r/423755 [18:21:11] if you see modules/jenkins/.fixtures.yml we add a dependency to the nrpe module [18:21:30] so I guess that there we're including something from the nrpe module somewhere, probably a check [18:21:57] this needs to be added only in the .fixtures.yml of the module that has unit tests and has also dependencies on other modules AFAIK [18:21:59] yea, but that's like any other NRPE check in any other module [18:22:09] (03CR) 10Dzahn: [C: 032] Revert "install_server: add Icinga monitoring for TFTP service" [puppet] - 10https://gerrit.wikimedia.org/r/423755 (owner: 10Dzahn) [18:22:16] but each module has its own tests [18:22:26] or none at all actually
[18:22:28] ok [18:23:02] in this case yours was the first dependency in the install_server module from the nrpe module [18:23:06] I guess [18:23:19] is the goal to have to do this for ALL changes in the future? seems quite some overhead [18:23:41] !log upgrading restbase-dev1006-a to cassandra 3.11.2 - T186751 [18:23:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:46] T186751: Reset RESTBase dev environment - https://phabricator.wikimedia.org/T186751 [18:23:57] mutante: is not a goal, is how the tests works AFAIK [18:24:15] 10Operations, 10monitoring, 10Patch-For-Review: add tftpd monitoring - https://phabricator.wikimedia.org/T190439#4101626 (10Dzahn) 05Resolved>03Open [18:24:34] mutante: see .fixtures.yml in https://puppet.com/blog/unit-testing-rspec-puppet-for-beginners [18:24:59] 10Operations, 10monitoring, 10Patch-For-Review: add tftpd monitoring - https://phabricator.wikimedia.org/T190439#4073739 (10Dzahn) reverted puppet change [18:25:37] !log upgrading restbase-dev1006-b to cassandra 3.11.2 - T186751 [18:25:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:25:50] but in this case the check should go in the profile IMHO, see also https://wikitech.wikimedia.org/wiki/Puppet_coding#WMF_Design_conventions [18:27:52] 10Operations, 10monitoring, 10Patch-For-Review: add tftpd monitoring - https://phabricator.wikimedia.org/T190439#4101632 (10Dzahn) 05Open>03stalled [18:31:12] (03PS1) 10Pnorman: Add osmium and osmborder to maps machines [puppet] - 10https://gerrit.wikimedia.org/r/423758 (https://phabricator.wikimedia.org/T191324) [18:32:16] (03PS1) 10Andrew Bogott: Labtest: move openstack version to Mitaka [puppet] - 10https://gerrit.wikimedia.org/r/423759 [18:33:15] (03CR) 10Andrew Bogott: [C: 032] Labtest: move openstack version to Mitaka [puppet] - 10https://gerrit.wikimedia.org/r/423759 (owner: 10Andrew Bogott) [18:33:52] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and 
replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4101652 (10Cmjohnson) [18:33:55] 10Operations, 10DBA, 10hardware-requests, 10Goal: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#4101653 (10Cmjohnson) [18:33:58] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1001 - https://phabricator.wikimedia.org/T190262#4101651 (10Cmjohnson) 05Open>03Resolved [18:35:49] 10Operations, 10cloud-services-team, 10Patch-For-Review: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#4101660 (10RobH) Ok, labvirt1021, which was moved to 10G and uses its eth0 for primary OS install, installed fine. I went ahead and have documented the issue, as shown on la... [18:36:25] 10Operations, 10ops-eqiad, 10netops: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960#4101662 (10Cmjohnson) Not sure where to put this but I removed to switch ports today - ge-2/0/0 { - description db1001; - disable; - } - ge-2/0/10 { - descr... 
[18:36:42] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1016 - https://phabricator.wikimedia.org/T190179#4101663 (10Cmjohnson) [18:36:56] 10Operations, 10DBA, 10Patch-For-Review: Switchover m1 master from db1016 to db1063 - https://phabricator.wikimedia.org/T189655#4101668 (10Cmjohnson) [18:36:59] 10Operations, 10DBA, 10hardware-requests, 10Goal: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#4101669 (10Cmjohnson) [18:37:02] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1016 - https://phabricator.wikimedia.org/T190179#4065400 (10Cmjohnson) 05Open>03Resolved [18:38:37] (03CR) 10Gehel: [C: 04-1] Add osmium and osmborder to maps machines (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/423758 (https://phabricator.wikimedia.org/T191324) (owner: 10Pnorman) [18:40:05] 10Operations, 10ops-eqiad, 10netops: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960#4101681 (10Cmjohnson) Also removed this - ge-2/0/15 { - description db1016; - disable; - } [18:40:32] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1016 - https://phabricator.wikimedia.org/T190179#4101682 (10Cmjohnson) - ge-2/0/0 { - description db1001; - disable; - } - ge-2/0/10 { - description db1011; - disable; - } - ge-2/0/... 
[18:40:52] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4101687 (10Cmjohnson) [18:40:55] 10Operations, 10DBA, 10hardware-requests, 10Goal: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#4101688 (10Cmjohnson) [18:40:58] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1011 - https://phabricator.wikimedia.org/T184703#4101685 (10Cmjohnson) 05Open>03Resolved - ge-2/0/0 { - description db1001; - disable; - } - ge-2/0/10 { - description db1011; - disa... [18:41:26] 10Operations, 10DBA, 10hardware-requests, 10Goal: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3235306 (10Cmjohnson) [18:41:29] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1030 - https://phabricator.wikimedia.org/T184397#4101689 (10Cmjohnson) 05Open>03Resolved [18:41:57] (03PS3) 10Dzahn: Gerrit: Simplify directory structure [puppet] - 10https://gerrit.wikimedia.org/r/422336 (owner: 10Chad) [18:44:27] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/10778/cobalt.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/422336 (owner: 10Chad) [18:44:29] (03CR) 10Dzahn: [C: 032] Gerrit: Simplify directory structure [puppet] - 10https://gerrit.wikimedia.org/r/422336 (owner: 10Chad) [18:44:49] (03PS2) 10Muehlenhoff: Update production SSH key for dereckson [puppet] - 10https://gerrit.wikimedia.org/r/419844 (owner: 10Dereckson) [18:45:24] (03CR) 10Muehlenhoff: [C: 032] Update production SSH key for dereckson [puppet] - 10https://gerrit.wikimedia.org/r/419844 (owner: 10Dereckson) [18:45:58] (03PS1) 10Dzahn: Revert "Gerrit: Simplify directory structure" [puppet] - 10https://gerrit.wikimedia.org/r/423763 [18:46:37] (03CR) 10Dzahn: [C: 032] "Invalid relationship: File[/var/lib/gerrit2/review_site/bin] { require => File[/var/lib/gerrit] 
}, because File[/var/lib/gerrit] doesn't s" [puppet] - 10https://gerrit.wikimedia.org/r/423763 (owner: 10Dzahn) [18:47:06] (03CR) 10Dzahn: [C: 032] "let's please test these on a VPS" [puppet] - 10https://gerrit.wikimedia.org/r/423763 (owner: 10Dzahn) [18:47:37] (03PS2) 10Dzahn: Revert "Gerrit: Simplify directory structure" [puppet] - 10https://gerrit.wikimedia.org/r/423763 [18:47:52] (03PS23) 10Paladox: Gerrit: Switch to the mariadb connector [puppet] - 10https://gerrit.wikimedia.org/r/384588 (https://phabricator.wikimedia.org/T176164) [18:47:54] (03PS24) 10Paladox: Gerrit: Switch to the mariadb connector [puppet] - 10https://gerrit.wikimedia.org/r/384588 (https://phabricator.wikimedia.org/T176164) [18:48:47] (03PS26) 10Paladox: Gerrit: Switch to the mariadb connector [puppet] - 10https://gerrit.wikimedia.org/r/384588 (https://phabricator.wikimedia.org/T176164) [18:49:17] (03PS27) 10Paladox: Gerrit: Switch to the mariadb connector [puppet] - 10https://gerrit.wikimedia.org/r/384588 (https://phabricator.wikimedia.org/T176164) [18:50:54] (03CR) 10Dzahn: [C: 032] "it broke puppet with an invalid relationship, that isn't even caught by compiling. can we please start to apply changes to a VPS before pr" [puppet] - 10https://gerrit.wikimedia.org/r/422336 (owner: 10Chad) [18:51:50] (03PS6) 10Dzahn: cassandra/icinga: make monitoring configurable, skip on dev [puppet] - 10https://gerrit.wikimedia.org/r/419339 (https://phabricator.wikimedia.org/T189050) [18:53:29] (03PS2) 10Gehel: Add osmium and osmborder to maps machines [puppet] - 10https://gerrit.wikimedia.org/r/423758 (https://phabricator.wikimedia.org/T191324) (owner: 10Pnorman) [18:54:40] (03CR) 10Gehel: [C: 032] "After discussion with Paul, it make sense to have those packages on all nodes which work with OSM data. So... 
LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/423758 (https://phabricator.wikimedia.org/T191324) (owner: 10Pnorman) [18:54:45] (03PS28) 10Paladox: Gerrit: Switch to the mariadb connector [puppet] - 10https://gerrit.wikimedia.org/r/384588 (https://phabricator.wikimedia.org/T176164) [19:00:04] twentyafterfour: It is that lovely time of the day again! You are hereby commanded to deploy MediaWiki train. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180403T1900). [19:00:04] No GERRIT patches in the queue for this window AFAICS. [19:01:10] no_justification wondering could you review https://gerrit.wikimedia.org/r/#/c/384588/ when you have a chance please? :) [19:06:41] 10Puppet, 10Beta-Cluster-Infrastructure: Error: Could not find class role::kafka::jumbo::mirror for deployment-kafka0[45] - https://phabricator.wikimedia.org/T191154#4095542 (10thcipriani) >>! In T191154#4097292, @Ottomata wrote: > Hm actually, I don't seem to have access to the deployment-prep project in Hori... 
[19:07:05] !log Preparing to deploy 1.31.0-wmf.28 refs T183967 [19:07:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:25] T183967: 1.31.0-wmf.28 deployment blockers - https://phabricator.wikimedia.org/T183967 [19:07:33] (03PS1) 10Andrew Bogott: nova: add mitaka version of dnsmasq-nova.conf.erb [puppet] - 10https://gerrit.wikimedia.org/r/423766 [19:08:59] (03CR) 10Andrew Bogott: [C: 032] nova: add mitaka version of dnsmasq-nova.conf.erb [puppet] - 10https://gerrit.wikimedia.org/r/423766 (owner: 10Andrew Bogott) [19:19:18] (03PS1) 10Andrew Bogott: nova: add api database for nova-network deploy [puppet] - 10https://gerrit.wikimedia.org/r/423767 [19:19:58] (03CR) 10Andrew Bogott: [C: 032] nova: add api database for nova-network deploy [puppet] - 10https://gerrit.wikimedia.org/r/423767 (owner: 10Andrew Bogott) [19:31:12] 10Operations, 10Patch-For-Review, 10Scoring-platform-team (Current): Remove deprecated hosts from ORES scap config - https://phabricator.wikimedia.org/T191321#4101950 (10awight) [19:45:15] (03CR) 10Chad: "Well, it wasn't tested yet :\" [puppet] - 10https://gerrit.wikimedia.org/r/422336 (owner: 10Chad) [19:46:33] !log restarting restbase-dev1004-{a,b} to enable patched cassandra 3.11.2 build - T186751 [19:46:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:46:45] T186751: Reset RESTBase dev environment - https://phabricator.wikimedia.org/T186751 [20:04:09] (03PS1) 1020after4: Add testwikis dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423775 [20:18:36] (03PS1) 1020after4: testwikis wikis to 1.31.0-wmf.28 refs T183967 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423777 [20:18:47] (03CR) 1020after4: [C: 032] testwikis wikis to 1.31.0-wmf.28 refs T183967 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423777 (owner: 1020after4) [20:19:26] 10Operations, 10cloud-services-team, 10Patch-For-Review: rack/setup/install labvirt102[12] - 
https://phabricator.wikimedia.org/T183937#4102062 (10RobH) So @ayounsi found this: https://help.ubuntu.com/community/Installation/Netboot#Multiple_Network_Interface_Note this seems to describe our issue. However,... [20:26:42] (03CR) 1020after4: [C: 032] Add testwikis dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423775 (owner: 1020after4) [20:28:11] (03Merged) 10jenkins-bot: Add testwikis dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423775 (owner: 1020after4) [20:28:13] (03Merged) 10jenkins-bot: testwikis wikis to 1.31.0-wmf.28 refs T183967 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423777 (owner: 1020after4) [20:28:26] (03CR) 10jenkins-bot: Add testwikis dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423775 (owner: 1020after4) [20:28:55] !log twentyafterfour@tin Started scap: testwikis wikis to 1.31.0-wmf.28 refs T183967 [20:29:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:08] T183967: 1.31.0-wmf.28 deployment blockers - https://phabricator.wikimedia.org/T183967 [20:29:30] 10Operations, 10cloud-services-team, 10Patch-For-Review: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#4102135 (10ayounsi) ``` lang=diff ayounsi@asw2-b-eqiad# show | compare [edit interfaces interface-range vlan-cloud-hosts1-b-eqiad] member xe-2/0/24 { ... } + member... 
[20:42:19] (03PS1) 10Ottomata: Bump request.timeout.ms and batch.size for main -> jumbo MirrorMaker [puppet] - 10https://gerrit.wikimedia.org/r/423781 (https://phabricator.wikimedia.org/T189464) [20:44:51] (03CR) 10Ottomata: [C: 032] Bump request.timeout.ms and batch.size for main -> jumbo MirrorMaker [puppet] - 10https://gerrit.wikimedia.org/r/423781 (https://phabricator.wikimedia.org/T189464) (owner: 10Ottomata) [20:56:33] (03PS1) 10Dzahn: installserver: convert tftp role to profile [puppet] - 10https://gerrit.wikimedia.org/r/423787 [21:09:00] 10Operations, 10cloud-services-team, 10Patch-For-Review: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#4102261 (10faidon) >>! In T183937#4102062, @RobH wrote: > So @ayounsi found this: https://help.ubuntu.com/community/Installation/Netboot#Multiple_Network_Interface_Note > >... [21:09:32] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decommission uranium/WMF3128 - https://phabricator.wikimedia.org/T191348#4102262 (10RobH) p:05Triage>03Normal [21:11:03] PROBLEM - High CPU load on API appserver on mw1231 is CRITICAL: CRITICAL - load average: 54.57, 29.47, 21.76 [21:12:19] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decommission uranium/WMF3128 - https://phabricator.wikimedia.org/T191348#4102297 (10RobH) [21:12:31] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decom vanadium/WMF3291 - https://phabricator.wikimedia.org/T191351#4102298 (10RobH) p:05Triage>03Normal [21:13:49] !log (re)starting restbase-dev1004-{a,b} (ooms), and enabling alternately patched cassandra 3.11.2 build - T186751 [21:13:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:14:02] T186751: Reset RESTBase dev environment - https://phabricator.wikimedia.org/T186751 [21:14:03] RECOVERY - High CPU load on API appserver on mw1231 is OK: OK - load average: 17.47, 23.64, 20.91 [21:15:33] !log twentyafterfour@tin Finished scap: testwikis wikis to 1.31.0-wmf.28 refs 
T183967 (duration: 46m 38s) [21:15:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:15:46] T183967: 1.31.0-wmf.28 deployment blockers - https://phabricator.wikimedia.org/T183967 [21:16:47] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decom zinc/WMF3298 - https://phabricator.wikimedia.org/T191352#4102317 (10RobH) p:05Triage>03Normal [21:21:06] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decom niobium/WMF3428 - https://phabricator.wikimedia.org/T191355#4102362 (10RobH) p:05Triage>03Normal [21:27:46] (03PS1) 1020after4: group0 wikis to 1.31.0-wmf.28 refs T183967 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423792 [21:27:52] 10Operations, 10Ops-Access-Requests, 10Scoring-platform-team: Request for "administrator" rights on beta cluster - https://phabricator.wikimedia.org/T191356#4102382 (10awight) [21:27:56] (03CR) 1020after4: [C: 032] group0 wikis to 1.31.0-wmf.28 refs T183967 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423792 (owner: 1020after4) [21:28:56] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decom silver/WMF3434 - https://phabricator.wikimedia.org/T191357#4102394 (10RobH) p:05Triage>03Normal [21:29:17] (03Merged) 10jenkins-bot: group0 wikis to 1.31.0-wmf.28 refs T183967 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423792 (owner: 1020after4) [21:30:38] !log twentyafterfour@tin rebuilt and synchronized wikiversions files: group0 wikis to 1.31.0-wmf.28 refs T183967 [21:30:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:30:51] T183967: 1.31.0-wmf.28 deployment blockers - https://phabricator.wikimedia.org/T183967 [21:33:29] 10Operations, 10Puppet: Plan Puppet 5 upgrade - https://phabricator.wikimedia.org/T184564#4102416 (10herron) An upgrade to Puppet 5 on paper is very similar in process/approach to our Puppet 4 upgrade where we upgraded Puppet masters first, then PuppetDB. The plan looks like: Prep: * Address deprecation w... 
[21:33:58] 10Operations, 10Puppet, 10Goal: Modernize Puppet Configuration Management (2017-18 Q3 Goal) - https://phabricator.wikimedia.org/T184561#4102419 (10herron) [21:34:47] (03PS1) 10Chad: Gerrit: Move all logging to /var/log [puppet] - 10https://gerrit.wikimedia.org/r/423794 [21:36:40] (03CR) 10Paladox: [C: 031] Gerrit: Move all logging to /var/log [puppet] - 10https://gerrit.wikimedia.org/r/423794 (owner: 10Chad) [21:37:39] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decom spare server caesium - https://phabricator.wikimedia.org/T191358#4102429 (10RobH) p:05Triage>03Normal [21:39:19] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decom spare server iodine - https://phabricator.wikimedia.org/T191359#4102444 (10RobH) p:05Triage>03Normal [21:43:00] 10Operations, 10Analytics, 10New-Readers, 10Traffic, and 2 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4102458 (10Nuria) [21:43:16] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decom spare server lawrencium/WMF3542 - https://phabricator.wikimedia.org/T191360#4102460 (10RobH) p:05Triage>03Normal [21:45:01] (03PS1) 10Chad: Gerrit: Further clean up file ownership [puppet] - 10https://gerrit.wikimedia.org/r/423796 [21:45:20] (03CR) 10Chad: "Can you test this in labs plz?" [puppet] - 10https://gerrit.wikimedia.org/r/423794 (owner: 10Chad) [21:48:13] (03CR) 10Paladox: [C: 031] "> Can you test this in labs plz?" 
[puppet] - 10https://gerrit.wikimedia.org/r/423794 (owner: 10Chad) [21:49:09] (03CR) 10Paladox: [C: 031] Gerrit: Further clean up file ownership [puppet] - 10https://gerrit.wikimedia.org/r/423796 (owner: 10Chad) [21:50:07] hi phab seems slow for me [21:50:15] twentyafterfour ^^ [21:52:08] everything else is fine for me so not my browser [21:52:49] 10Operations, 10Analytics, 10New-Readers, 10Traffic, and 2 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4102475 (10Nuria) >If we can get updates from them, can we repair the date/update things on our side retroactively? I Not likely as we would need the original re... [21:56:26] (03PS1) 10Chad: Gerrit: Remove symlink to gerrit.war from homedir [puppet] - 10https://gerrit.wikimedia.org/r/423800 [21:58:04] 10Operations, 10Phabricator: Phabricator is really slow - https://phabricator.wikimedia.org/T191361#4102487 (10Paladox) [21:58:16] 10Operations, 10Phabricator: Phabricator is really slow - https://phabricator.wikimedia.org/T191361#4102497 (10Paladox) p:05Triage>03High [21:58:49] paladox: it's working ok here [21:59:01] twentyafterfour see http://recordit.co/TT4Gcyai4k [21:59:11] it seems this happened before too [21:59:28] where it was working in one place and in another place it wouldn't [22:00:40] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decom promethium/WMF3571 - https://phabricator.wikimedia.org/T191362#4102499 (10RobH) p:05Triage>03Normal [22:01:50] doesn't seem that slow? [22:01:53] https://grafana.wikimedia.org/dashboard/db/phabricator?orgId=1&from=now%2Fd&to=now%2Fd [22:02:00] I don't see anything too crazy there [22:02:02] (03PS1) 10Chad: Gerrit: Run directly from deployment location [puppet] - 10https://gerrit.wikimedia.org/r/423801 [22:02:21] twentyafterfour it feels a lot slower than before. 
[22:02:32] (03PS2) 10Chad: Gerrit: Run directly from deployment location [puppet] - 10https://gerrit.wikimedia.org/r/423801 [22:02:40] search should be almost instantly [22:02:44] and browsing too [22:02:54] but then within 10-20mins it started slowing [22:02:56] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decom spare server nobelium/wmf4543 - https://phabricator.wikimedia.org/T191363#4102513 (10RobH) p:05Triage>03Normal [22:05:40] (03CR) 10Paladox: [C: 031] Gerrit: Run directly from deployment location [puppet] - 10https://gerrit.wikimedia.org/r/423801 (owner: 10Chad) [22:06:46] 10Operations, 10Phabricator: Phabricator is really slow - https://phabricator.wikimedia.org/T191361#4102530 (10Paladox) Possibly related T182832 [22:08:13] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decom spare server osmium/wmf4546 - https://phabricator.wikimedia.org/T191364#4102532 (10RobH) p:05Triage>03Normal [22:10:15] !log twentyafterfour@tin Pruned MediaWiki: 1.31.0-wmf.24 [keeping static files] (duration: 04m 18s) [22:10:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:10:57] jouncebot: next [22:11:05] In 0 hour(s) and 48 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180403T2300) [22:12:24] !log twentyafterfour@tin Pruned MediaWiki: 1.31.0-wmf.25 [keeping static files] (duration: 01m 55s) [22:12:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:13:28] mutante: I'm done if you need to deploy something? 
[22:14:10] !log Finished MediaWiki Train for group0, 1.31.0-wmf.28 refs T183967 [22:14:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:14:22] T183967: 1.31.0-wmf.28 deployment blockers - https://phabricator.wikimedia.org/T183967 [22:14:52] twentyafterfour: i just want to deploy the deployment server ;) [22:15:01] but i was too late, so next time [22:15:03] well I'm logging off of it right now [22:15:43] can i have a window on the calendar [22:15:43] 10Operations, 10Phabricator: Phabricator is really slow - https://phabricator.wikimedia.org/T191361#4102588 (10Paladox) Takes 3.13s to load. But it is very slow. Even submitting this comment is taking a few secs whereas before it was really fast. [22:15:51] like the next time you do this maybe [22:18:22] 10Operations, 10Phabricator: Phabricator is loading really slowly - https://phabricator.wikimedia.org/T191361#4102595 (10Paladox) [22:20:32] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#4102601 (10Dzahn) ``` [phab1001:~] $ sudo apache-status... [22:20:41] 10Operations, 10Phabricator: Phabricator is loading really slowly - https://phabricator.wikimedia.org/T191361#4102602 (10Paladox) [22:21:15] mutante: when would you like? [22:21:53] twentyafterfour: right before a normal train .. any day except Monday i guess [22:22:05] So tomorrow? [22:22:32] ok :) that works [22:22:44] twentyafterfour: totally separate question.. should we restart apache on phab really quick [22:22:56] paladox reports it's slow and i see workers again that are in G state [22:23:38] 96 requests currently being processed, 50 idle workers [22:23:39] two users in #wikipedia-en reported it being slow too but usable (i find it usable but very slow). [22:23:47] GWWGGGGGG.WWWWWGW.WGGWW._GGGG_.GGGGW_GWGGG.W_.WWGGGGGW_.__ .. 
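For reference, the scoreboard fragment pasted at 22:23:47 can be tallied per worker state. This is a minimal Python sketch, assuming the standard Apache mod_status scoreboard key (G = gracefully finishing, W = sending reply, _ = waiting for connection, . = open slot). The names `tally_scoreboard` and `EXCERPT` are illustrative and not part of any tool used in the channel. The excerpt is only the truncated fragment from the log, so its counts are not expected to match the "96 requests / 50 idle workers" totals quoted above.

```python
from collections import Counter

# Apache mod_status scoreboard key.
SCOREBOARD_KEY = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def tally_scoreboard(scoreboard: str) -> Counter:
    """Count workers per state, ignoring characters outside the key."""
    return Counter(ch for ch in scoreboard if ch in SCOREBOARD_KEY)


# Truncated fragment as pasted in the channel (the trailing ".." in the
# log is elision and is deliberately left out here).
EXCERPT = "GWWGGGGGG.WWWWWGW.WGGWW._GGGG_.GGGGW_GWGGG.W_.WWGGGGGW_.__"

if __name__ == "__main__":
    for state, count in tally_scoreboard(EXCERPT).most_common():
        print(f"{state} ({SCOREBOARD_KEY[state]}): {count}")
```

A tally like this makes it easier to compare the share of workers stuck in the G state before and after a restart, which is the symptom tracked in T182832.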
[22:24:15] i find it usable.. after all i commented on the ticket about it , heh [22:24:18] mutante: I don't think that is enough workers to cause any trouble but it won't hurt to restart and see if it helps [22:25:02] !log restarting Apache on phab1001 - T182832 [22:25:09] mutante: is an hour long enough for your deploy server switchover? there is already a slot reserved for "Pre MediaWiki train sanity break" at 2018-04-04 11:00 SF [22:25:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:25:10] T182832: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832 [22:25:19] thanks mutante that fixed it :) [22:25:21] mutante: so you can just take over that slot I think [22:25:29] paladox: :) ok, yw [22:25:33] :) [22:25:33] hmmm [22:26:01] 10Operations, 10Analytics, 10New-Readers, 10Traffic, and 2 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4102622 (10atgo) >>! In T187014#4102475, @Nuria wrote: >>If we can get updates from them, can we repair the date/update things on our side retroactively? I > Ano... [22:26:09] twentyafterfour: yes, that works. should be enough [22:26:29] this time we won't wait for an initial puppet run ... [22:26:35] which is the one thing that took so long [22:26:41] 10Operations, 10Phabricator: Phabricator is loading really slowly - https://phabricator.wikimedia.org/T191361#4102623 (10Paladox) 05Open>03Resolved restarting Apache on phab1001 - T182832 [22:27:03] mutante: done: https://wikitech.wikimedia.org/wiki/Deployments#Week_of_April_2nd [22:27:09] it should just a be merging a 'revert:revert:revert' change, running puppet and then watching a deploy hopefully go just normal [22:27:15] twentyafterfour: thanks! [22:27:55] you're welcome! 
:) [22:29:30] (03PS1) 10Dzahn: Revert "Revert "Gerrit: Simplify directory structure"" [puppet] - 10https://gerrit.wikimedia.org/r/423805 [22:30:22] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#4102628 (10Paladox) This happened again even with the cron... [22:32:21] (03CR) 10Dzahn: [C: 032] "nevermind, let me just revert:revert and then follow-up to fix that issue..it's cleaner that way" [puppet] - 10https://gerrit.wikimedia.org/r/422336 (owner: 10Chad) [22:33:00] 10Operations, 10Ops-Access-Requests: Access to stat100x and notebook1003.eqiad.wmnet for Jonas Kress - https://phabricator.wikimedia.org/T191308#4102662 (10Aklapper) [22:33:56] (03PS2) 10Dzahn: Revert "Revert "Gerrit: Simplify directory structure"" [puppet] - 10https://gerrit.wikimedia.org/r/423805 [22:34:08] (03CR) 10Dzahn: [C: 032] Revert "Revert "Gerrit: Simplify directory structure"" [puppet] - 10https://gerrit.wikimedia.org/r/423805 (owner: 10Dzahn) [22:34:54] !log cobalt - puppet disabled temporarily to apply fix to "simplify directory structure" change .. on gerrit2001 first [22:34:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:37:08] paladox: let's fix that invalid relationship [22:37:14] ok [22:37:29] i reverted the revert because it seemed cleaner this way.. instead of amending to a revert or something [22:37:38] looking at the issue on gerrit2001 now [22:37:46] while cobalt isn't changed [22:37:58] yep [22:38:03] File[/var/lib/gerrit2/review_site/bin] { require => File[/var/lib/gerrit] }, because File[/var/lib/gerrit] doesn't seem to be in the catalog [22:38:26] /var/lib/gerrit created by deploying, not by puppet.. right [22:38:26] oh [22:38:37] that does not exist [22:38:41] should be /var/lib/gerrit2 [22:38:50] hah! 
nice [22:39:12] do you wanna upload a change for that or should i [22:39:38] (03CR) 10Paladox: [C: 031] Gerrit: Simplify directory structure (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/422336 (owner: 10Chad) [22:39:44] mutante i can :) [22:40:39] 'k, thanks [22:40:51] (03Draft1) 10Paladox: Gerrit: Change /var/lib/gerrit -> /var/lib/gerrit2 [puppet] - 10https://gerrit.wikimedia.org/r/423806 [22:40:55] (03PS2) 10Paladox: Gerrit: Change /var/lib/gerrit -> /var/lib/gerrit2 [puppet] - 10https://gerrit.wikimedia.org/r/423806 [22:40:56] mutante ^^ [22:41:17] drafts ;) [22:41:43] (03CR) 10Dzahn: [C: 032] Gerrit: Change /var/lib/gerrit -> /var/lib/gerrit2 [puppet] - 10https://gerrit.wikimedia.org/r/423806 (owner: 10Paladox) [22:42:04] thanks :) [22:42:47] ok, one fixed, one new [22:42:52] oh [22:43:04] File[/var/lib/gerrit2/review_site/etc/replication.config] { require => File[/var/lib/gerrit2/review_site/etc] }, because File[/var/lib/gerrit2/review_site/etc] doesn't seem to be in the catalog [22:43:26] hmm /var/lib/gerrit2/review_site/etc [22:43:35] ah [22:43:40] it was removed in his change [22:43:53] * paladox knows what needs to be fixed [22:43:58] sounds good [22:45:25] (03Draft1) 10Paladox: Gerrit: Remove requiring /var/lib/gerrit2/review_site/etc [puppet] - 10https://gerrit.wikimedia.org/r/423807 [22:45:27] (03PS2) 10Paladox: Gerrit: Remove requiring /var/lib/gerrit2/review_site/etc [puppet] - 10https://gerrit.wikimedia.org/r/423807 [22:45:29] mutante ^^ [22:46:03] (03CR) 10Paladox: [C: 031] Gerrit: Simplify directory structure (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/422336 (owner: 10Chad) [22:47:31] (03CR) 10Dzahn: [C: 032] "confirmed! another follow-up to https://gerrit.wikimedia.org/r/#/c/422336/" [puppet] - 10https://gerrit.wikimedia.org/r/423807 (owner: 10Paladox) [22:47:52] thanks :) [22:49:15] sooo.. there are no more puppet errors now [22:49:21] but .. 
it's not nooop [22:49:27] :) [22:49:28] oh [22:49:57] some files in ./bin get owner changed 'root' to 'gerrit2' [22:50:08] (03CR) 10Paladox: [C: 031] "needs rebasing, merge conflicts" [puppet] - 10https://gerrit.wikimedia.org/r/423794 (owner: 10Chad) [22:50:18] and a lot of other files get more open permissions [22:52:12] I...wasn't going to do it that way any more [22:52:17] Hence the pile of things I wrote [22:52:18] But w/e [22:52:37] Also: my changes are all now merge conflict hell [22:54:51] no_justification so we should revert the changes? [22:54:54] you are saying you didnt want me to merge your change that was in my incoming review queue? [22:55:34] Whoops, I forgot to abandon that. [22:55:47] Whatever, I'll rebase all of it and fix it [22:56:12] ok [22:56:48] oh i have merge conflicts on the puppet master as i have your log changed cherry picked heh [22:59:43] 10Operations, 10Analytics, 10New-Readers, 10Traffic, and 2 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4102802 (10Nuria) @atgo I have checked data for january and march for Opera and i just see us receiving IP addresses of Opera's Proxy endpoints instead of Ip adr... [22:59:55] applying changes on coablt [22:59:57] cobalt [23:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Evening SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180403T2300). [23:00:04] Jdlrobson and Niharika: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:11] Hello, I can swat this evening. 
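The two "Invalid relationship" errors quoted at 22:38:03 and 22:43:04 have the same shape: a resource requires another resource that is not declared in the catalog. The following is a toy Python model of that check, not Puppet's implementation. The catalog dictionaries and the function `dangling_requires` are hypothetical, and the assumption that `File[/var/lib/gerrit2]` is declared comes only from the fact that pointing the require at it cleared the first error.

```python
from typing import Dict, List, Tuple


def dangling_requires(catalog: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (resource, missing_dependency) pairs.

    `catalog` maps each declared resource reference to the list of
    references it requires. A dependency is dangling when it is not
    itself a key of the catalog.
    """
    missing = []
    for resource, requires in catalog.items():
        for dep in requires:
            if dep not in catalog:
                missing.append((resource, dep))
    return missing


# Toy catalog mirroring the two errors from the log (assumed layout).
BROKEN = {
    "File[/var/lib/gerrit2]": [],
    "File[/var/lib/gerrit2/review_site/bin]": ["File[/var/lib/gerrit]"],
    "File[/var/lib/gerrit2/review_site/etc/replication.config]": [
        "File[/var/lib/gerrit2/review_site/etc]"
    ],
}

# Same catalog after the two follow-up changes (423806 and 423807):
# the first require points at gerrit2, the second require is removed.
FIXED = {
    "File[/var/lib/gerrit2]": [],
    "File[/var/lib/gerrit2/review_site/bin]": ["File[/var/lib/gerrit2]"],
    "File[/var/lib/gerrit2/review_site/etc/replication.config]": [],
}

if __name__ == "__main__":
    for resource, dep in dangling_requires(BROKEN):
        print(f"{resource} requires {dep}, which is not in the catalog")
    print("dangling after fixes:", dangling_requires(FIXED))
```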
[23:00:18] \o [23:00:48] (03PS2) 10Dereckson: Rollout VirtualPageViews (final stage) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423047 (https://phabricator.wikimedia.org/T189906) (owner: 10Jdlrobson) [23:02:22] (03PS2) 10Chad: Gerrit: Move all logging to /var/log [puppet] - 10https://gerrit.wikimedia.org/r/423794 [23:02:28] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423047 (https://phabricator.wikimedia.org/T189906) (owner: 10Jdlrobson) [23:03:43] (03Merged) 10jenkins-bot: Rollout VirtualPageViews (final stage) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423047 (https://phabricator.wikimedia.org/T189906) (owner: 10Jdlrobson) [23:03:53] (03Abandoned) 10Chad: Gerrit: Remove symlink to gerrit.war from homedir [puppet] - 10https://gerrit.wikimedia.org/r/423800 (owner: 10Chad) [23:05:05] jdlrobson: live on mwdebug1002 [23:05:16] no_justification this is the changed files https://phabricator.wikimedia.org/P6938 [23:05:26] (nothing bad) [23:06:03] I know [23:06:05] Dereckson: testng [23:06:52] Dereckson: you can sync! [23:06:59] (03PS2) 10Dereckson: Make a note about the loading order of GlobalPreferences and Echo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/422642 (https://phabricator.wikimedia.org/T190353) (owner: 10Samwilson) [23:07:08] jdlrobson: ok [23:08:56] (03CR) 10Chad: [C: 04-1] Gerrit: Run directly from deployment location (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/423801 (owner: 10Chad) [23:10:15] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Rollout VirtualPageViews (final stage) (T189906) (duration: 01m 19s) [23:10:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:10:22] T189906: Roll out VirtualPageViews to all Wikipedia wikis - https://phabricator.wikimedia.org/T189906 [23:10:44] Logs are quiet. 
[23:11:01] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/422642 (https://phabricator.wikimedia.org/T190353) (owner: 10Samwilson) [23:12:23] (03Merged) 10jenkins-bot: Make a note about the loading order of GlobalPreferences and Echo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/422642 (https://phabricator.wikimedia.org/T190353) (owner: 10Samwilson) [23:16:15] !log dereckson@tin Synchronized wmf-config/CommonSettings.php: Make a note about the loading order of GlobalPreferences and Echo ([[Gerrit:422642]]) (no-op) (duration: 01m 17s) [23:16:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:16:43] SWAT is done. [23:17:10] Dereckson: thank you! [23:17:31] You're welcome [23:32:08] (03PS1) 10Dereckson: Enforce 1. Echo 2. GlobalPreferences extensions load order [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423809 (https://phabricator.wikimedia.org/T190353) [23:35:36] Niharika: for the change you included in SWAT this evening, perhaps it would be better to really enforce it and throw a clear, easy to understand exception instead of relying only on documentation? [23:38:42] That to me sounds like Echo and GlobalPreferences should play nicer [23:38:48] Order should not matter. [23:40:57] (03CR) 10Samwilson: [C: 031] "That's a better idea! :) Thanks." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423809 (https://phabricator.wikimedia.org/T190353) (owner: 10Dereckson) [23:55:07] !log re-activating graceful-switchover on cr1-codfw - T189588 [23:55:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:55:13] T189588: Config discrepencies on network devices - https://phabricator.wikimedia.org/T189588 [23:56:13] 10Operations, 10netops: Config discrepencies on network devices - https://phabricator.wikimedia.org/T189588#4102889 (10ayounsi)
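The suggestion at 23:35:36, to enforce the Echo-then-GlobalPreferences load order with a clear exception, can be sketched as follows. This is an illustration in Python, not the content of change 423809 (the real configuration is PHP). `ExtensionLoadOrderError` and `assert_loaded_before` are hypothetical names, and the sketch assumes the check should fire only when both extensions are loaded and appear in the wrong order.

```python
from typing import Sequence


class ExtensionLoadOrderError(RuntimeError):
    """Raised when two extensions are loaded in an unsupported order."""


def assert_loaded_before(load_order: Sequence[str], first: str, second: str) -> None:
    """Raise a descriptive error if `second` was loaded ahead of `first`.

    Does nothing when either extension is absent, since this sketch
    enforces ordering only and makes no claim about hard dependencies.
    """
    if first not in load_order or second not in load_order:
        return
    order = list(load_order)
    if order.index(first) > order.index(second):
        raise ExtensionLoadOrderError(
            f"{first} must be loaded before {second}; "
            f"actual load order: {', '.join(order)}"
        )


if __name__ == "__main__":
    # Passes silently: Echo precedes GlobalPreferences.
    assert_loaded_before(["Echo", "GlobalPreferences"], "Echo", "GlobalPreferences")
    try:
        assert_loaded_before(["GlobalPreferences", "Echo"], "Echo", "GlobalPreferences")
    except ExtensionLoadOrderError as err:
        print(err)
```

Failing fast with the two extension names in the message is the design point of the suggestion: a misordered configuration is reported where it is loaded, which a documentation comment alone cannot do. The objection at 23:38:48 still stands, since removing the order dependency would make such a guard unnecessary.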