[00:04:15] (PS6) Yuvipanda: Write logs to the tool's homedir more securely [software/tools-manifest] - https://gerrit.wikimedia.org/r/202472
[00:04:18] ori: ^ much more elegant, I think
[00:04:51] (CR) Ori.livneh: [C: 1] "Looks good; haven't tested." [software/tools-manifest] - https://gerrit.wikimedia.org/r/202472 (owner: Yuvipanda)
[00:04:56] ori: ty
[00:05:26] (PS7) Yuvipanda: Write logs to the tool's homedir more securely [software/tools-manifest] - https://gerrit.wikimedia.org/r/202472
[00:08:03] (PS1) Aaron Schulz: Lowered "max lag" to 20 seconds [mediawiki-config] - https://gerrit.wikimedia.org/r/202626
[00:09:24] Coren: legoktm I’m going to go +2 all the open patchsets there now, all have had another +1 already, I think
[00:09:25] * YuviPanda does
[00:10:02] (CR) Yuvipanda: [C: 2] Better symlink race protection [software/tools-manifest] - https://gerrit.wikimedia.org/r/202297 (https://phabricator.wikimedia.org/T95210) (owner: Yuvipanda)
[00:10:10] (Merged) jenkins-bot: Better symlink race protection [software/tools-manifest] - https://gerrit.wikimedia.org/r/202297 (https://phabricator.wikimedia.org/T95210) (owner: Yuvipanda)
[00:10:13] (Merged) jenkins-bot: Validate tool accounts before accepting them [software/tools-manifest] - https://gerrit.wikimedia.org/r/202305 (owner: Yuvipanda)
[00:10:24] (CR) Yuvipanda: [C: 2] Send stats about webservices, manifests and errors! [software/tools-manifest] - https://gerrit.wikimedia.org/r/202318 (https://phabricator.wikimedia.org/T95256) (owner: Yuvipanda)
[00:10:34] (Merged) jenkins-bot: Send stats about webservices, manifests and errors!
[software/tools-manifest] - https://gerrit.wikimedia.org/r/202318 (https://phabricator.wikimedia.org/T95256) (owner: Yuvipanda)
[00:11:08] operations, RESTBase, Performance: Create a path entry point for the REST API under regular domains - https://phabricator.wikimedia.org/T95229#1188099 (GWicke)
[00:11:08] (CR) Yuvipanda: [C: 2] Rename package to tools.manifest [software/tools-manifest] - https://gerrit.wikimedia.org/r/202323 (owner: Yuvipanda)
[00:11:18] (CR) Yuvipanda: [C: 2] Handle webservice calls erroring out [software/tools-manifest] - https://gerrit.wikimedia.org/r/202342 (owner: Yuvipanda)
[00:11:27] (CR) Yuvipanda: [C: 2] Rename service monitor to web service monitor [software/tools-manifest] - https://gerrit.wikimedia.org/r/202343 (owner: Yuvipanda)
[00:11:36] (CR) Yuvipanda: [C: 2] Write logs to the tool's homedir more securely [software/tools-manifest] - https://gerrit.wikimedia.org/r/202472 (owner: Yuvipanda)
[00:11:41] (sorry)
[00:12:27] operations, RESTBase, Performance: Create a path entry point for the REST API under regular domains - https://phabricator.wikimedia.org/T95229#1183654 (GWicke)
[00:12:47] operations, RESTBase, Performance: Create a path entry point for the REST API under regular domains - https://phabricator.wikimedia.org/T95229#1183654 (GWicke)
[00:17:39] operations, RESTBase, Performance: Create a path entry point for the REST API under regular domains - https://phabricator.wikimedia.org/T95229#1188107 (GWicke)
[00:18:29] (CR) Yuvipanda: [C: 2] Add minimial setup.py & requirements.txt [software/tools-manifest] - https://gerrit.wikimedia.org/r/202320 (owner: Yuvipanda)
[00:18:51] (Merged) jenkins-bot: Add minimial setup.py & requirements.txt [software/tools-manifest] - https://gerrit.wikimedia.org/r/202320 (owner: Yuvipanda)
[00:18:53] (Merged) jenkins-bot: Rename package to tools.manifest [software/tools-manifest] -
https://gerrit.wikimedia.org/r/202323 (owner: Yuvipanda)
[00:18:55] (Merged) jenkins-bot: Handle webservice calls erroring out [software/tools-manifest] - https://gerrit.wikimedia.org/r/202342 (owner: Yuvipanda)
[00:18:57] (Merged) jenkins-bot: Rename service monitor to web service monitor [software/tools-manifest] - https://gerrit.wikimedia.org/r/202343 (owner: Yuvipanda)
[00:18:59] (Merged) jenkins-bot: Write logs to the tool's homedir more securely [software/tools-manifest] - https://gerrit.wikimedia.org/r/202472 (owner: Yuvipanda)
[00:19:47] Ops-Access-Requests, operations: Add Stephane Bisson to WMF LDAP - https://phabricator.wikimedia.org/T95374#1188109 (Mattflaschen) NEW a:yuvipanda
[00:20:04] Ops-Access-Requests, operations: Add Stephane Bisson to WMF LDAP - https://phabricator.wikimedia.org/T95374#1188120 (yuvipanda) What's his wikitech (LDAP) username?
[00:20:06] Ops-Access-Requests, operations: Add Stephane Bisson (sbisson) to WMF LDAP - https://phabricator.wikimedia.org/T95374#1188118 (Mattflaschen)
[00:20:13] superm401: hah :)
[00:20:46] YuviPanda, it's in a different order in the web interface, but I think that's Phabricator's reality distortion field.
[00:20:53] Ops-Access-Requests, operations: Add Stephane Bisson to WMF LDAP - https://phabricator.wikimedia.org/T95374#1188125 (SBisson) sbisson
[00:21:11] superm401: stephanebisson done.
[00:21:22] thanks!
[00:21:35] stephanebisson: can you verify?
[00:26:18] YuviPanda: out of 2 open changes I see in gerrit, I can +2 one of them and not the other one. Seem normal?
[00:26:21] (CR) Yuvipanda: [C: -1] "Nitpick!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/202467 (owner: 20after4)
[00:26:42] stephanebisson: which repo can you see +2 in and which one can you not?
[00:27:53] YuviPanda: both in mediawiki/vagrant
[00:28:07] stephanebisson: uh, that’s strange. you should be able to see +2 on both.
[00:28:09] try a hard refresh maybe?
[00:28:14] (PS1) 20after4: create /srv/patches directory on tin [puppet] - https://gerrit.wikimedia.org/r/202631 (https://phabricator.wikimedia.org/T95375)
[00:28:51] PROBLEM - puppet last run on wtp2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:29:10] (CR) Yuvipanda: create /srv/patches directory on tin (1 comment) [puppet] - https://gerrit.wikimedia.org/r/202631 (https://phabricator.wikimedia.org/T95375) (owner: 20after4)
[00:29:20] YuviPanda: full refresh, log out, log in -> same thing...
[00:29:22] stephanebisson: link?
[00:29:51] (PS6) 20after4: Add git-new-workdir to tin (via deployment::deployment_server) [puppet] - https://gerrit.wikimedia.org/r/202467
[00:29:53] can't: https://gerrit.wikimedia.org/r/#/c/202579/
[00:29:54] stephanebisson: check the dependencies section
[00:29:56] (PS1) Aaron Schulz: Set "recentchanges" query group [mediawiki-config] - https://gerrit.wikimedia.org/r/202633
[00:30:02] it might depend on another one
[00:30:12] (CR) 20after4: [C: 1] Add git-new-workdir to tin (via deployment::deployment_server) (1 comment) [puppet] - https://gerrit.wikimedia.org/r/202467 (owner: 20after4)
[00:30:15] can: https://gerrit.wikimedia.org/r/#/c/202592/
[00:30:52] (PS7) Yuvipanda: Add git-new-workdir to tin (via deployment::deployment_server) [puppet] - https://gerrit.wikimedia.org/r/202467 (owner: 20after4)
[00:31:01] (CR) Yuvipanda: [C: 2 V: 2] Add git-new-workdir to tin (via deployment::deployment_server) [puppet] - https://gerrit.wikimedia.org/r/202467 (owner: 20after4)
[00:31:15] it does have a dependency, on another one that is alreadymerged
[00:31:40] stephanebisson: strange >_> I don’t understand what’s happening.
[00:31:47] stephanebisson: I’m going to be lame and ask you to log out and back in
[00:31:54] (CR) Aaron Schulz: [C: 2] Removed unused var [mediawiki-config] - https://gerrit.wikimedia.org/r/202605 (owner: Aaron Schulz)
[00:32:01] (Merged) jenkins-bot: Removed unused var [mediawiki-config] - https://gerrit.wikimedia.org/r/202605 (owner: Aaron Schulz)
[00:32:04] stephanebisson: yea, then that's not the issue. it's odd
[00:32:05] YuviPanda: done that many times already
[00:32:10] stephanebisson: strange.
[00:32:23] (PS2) 20after4: create /srv/patches directory on tin [puppet] - https://gerrit.wikimedia.org/r/202631 (https://phabricator.wikimedia.org/T95375)
[00:32:33] stephanebisson: do you see a +2 on https://gerrit.wikimedia.org/r/#/c/191848/
[00:32:37] (random patch I picked up)
[00:32:49] YuviPanda: yes
[00:33:04] stephanebisson: hmm, ok. so the ‘wmf’ group did work, except for one patch
[00:33:09] not sure why that’s happening, sorry :(
[00:33:34] alright, not a problem in this case
[00:33:44] stephanebisson: :) cool. let me close the ticket
[00:33:57] YuviPanda: sure. Thanks!
[00:34:05] Ops-Access-Requests, operations: Add Stephane Bisson to WMF LDAP - https://phabricator.wikimedia.org/T95374#1188155 (yuvipanda) Open>Resolved Done
[00:34:05] (CR) 20after4: create /srv/patches directory on tin (1 comment) [puppet] - https://gerrit.wikimedia.org/r/202631 (https://phabricator.wikimedia.org/T95375) (owner: 20after4)
[00:34:49] twentyafterfour: ugh, hmm. I guess [0] is going to be wikidev
[00:35:04] springle: any reason db1057 doesn
[00:35:13] doesn't have more weight?
[00:35:57] is the RAM difference that bad
[00:36:02] YuviPanda: yes, I could add a second parameter that isn't an array, or just pass wikidev and get rid of the array that's passed to that class
[00:36:22] (I probably would need to refactor trebuchet to not as for an array, also...)
[00:36:24] twentyafterfour: nah, I think just a comment saying ‘FIXME: WHY IS THIS AN ARRAY AARGH, but this is ‘wikidev’” should be enough
[00:36:33] twentyafterfour: yeah, definitely don’t go down that rabbit hole now :P
[00:36:46] I mean, you can if you want to, but I’m not asking you to :P Just a comment should be good...
[00:37:06] (PS1) Dzahn: dumps: move hiera data to new location [puppet] - https://gerrit.wikimedia.org/r/202637
[00:37:21] (PS2) Dzahn: dumps: move hiera data to new location [puppet] - https://gerrit.wikimedia.org/r/202637
[00:38:26] (PS3) 20after4: create /srv/patches directory on tin [puppet] - https://gerrit.wikimedia.org/r/202631 (https://phabricator.wikimedia.org/T95375)
[00:40:03] (CR) Yuvipanda: [C: 2] create /srv/patches directory on tin [puppet] - https://gerrit.wikimedia.org/r/202631 (https://phabricator.wikimedia.org/T95375) (owner: 20after4)
[00:41:40] burritocat: enwiki is outgrowing db1057. it (and db1051) are still in s1 to keep the shard spread across enough racks
[00:42:22] outgrowing in terms of RAM or disk (or both?)
[00:42:53] twentyafterfour: thcipriani|afk any other patches you want me to poke at otday?
[00:42:54] *today
[00:43:41] YuviPanda: no I think I'm good, thanks for reviewing those for me :)
[00:43:45] I assume the former, though that would include db1055
[00:44:16] twentyafterfour: yw :)
[00:44:58] burritocat: RAM, plus the 96GB boxes are Dell R510s (12 core), and the 160GB boxes are R720s (20 core)
[00:45:21] ahh, right
[00:45:33] RECOVERY - puppet last run on wtp2001 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[00:45:36] is mariadb 10 actually good with 20 cores?
[00:45:37] during traffic spikes the R720s do help for concurrency
[00:46:03] "good" as in "fairly linear"
[00:47:06] springle: I still suggest the old 'drop database enwiki;' solution then all problems solve ;)
[00:48:01] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).
[00:48:30] burritocat: debatable. we see most benefit from more cores with the mariadb thread pool enabled, but at least part of that is not about sheer innodb scaling linearly, but about avoiding stalls
[00:49:00] JohnFLewis: :)
[00:49:31] Puppet: Write, publish and deploy puppet-lint plug-in for ensure attribute bareword check - https://phabricator.wikimedia.org/T95377#1188184 (scfc) NEW
[00:52:40] Puppet, operations, Patch-For-Review: Resource attributes are quoted inconsistently - https://phabricator.wikimedia.org/T91908#1188193 (scfc) >>! In T91908#1168849, @scfc wrote: > [[https://github.com/rodjek/puppet-lint/issues/412|Filed upstream]]; if someone is interested in implementing, a developer's...
[00:53:24] (PS2) Springle: Lowered "max lag" to 20 seconds [mediawiki-config] - https://gerrit.wikimedia.org/r/202626 (owner: Aaron Schulz)
[00:53:46] (CR) Springle: [C: 2] Lowered "max lag" to 20 seconds [mediawiki-config] - https://gerrit.wikimedia.org/r/202626 (owner: Aaron Schulz)
[00:53:50] (Merged) jenkins-bot: Lowered "max lag" to 20 seconds [mediawiki-config] - https://gerrit.wikimedia.org/r/202626 (owner: Aaron Schulz)
[00:54:41] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[00:55:08] !log springle Synchronized wmf-config/db-eqiad.php: reduce max lag to 20s, gerrit 202626 (duration: 00m 11s)
[00:55:16] Logged the message, Master
[01:02:52] bd808: LoggerFactory::getInstance( 'DBPerformance' ); fails with mwscript in vagrant for me :/
[01:03:00] odd since it's called in MediaWiki.php
[01:03:14] There's a namespace
[01:03:30] MediaWiki\Logger\LoggerFactory
[01:03:59] ah, "use MediaWiki\Logger\LoggerFactory;", hiding up there ;)
[01:04:10] sneaky me
[01:04:17] that's kind of annoying, oh well
[01:04:40] Today there is a stub class named MWLoggerFactory
[01:04:54] but I hope to kill it in the next couple of weeks
[01:06:35] operations, MediaWiki-extensions-Graph, Services, service-template-node, service-runner: Deploy graphoid service into production - https://phabricator.wikimedia.org/T90487#1188226 (Yurik)
[01:18:47] (PS1) Dzahn: phab: indentation fixes in role class [puppet] - https://gerrit.wikimedia.org/r/202642 (https://phabricator.wikimedia.org/T93645)
[01:19:52] (CR) John F. Lewis: [C: 1] phab: indentation fixes in role class [puppet] - https://gerrit.wikimedia.org/r/202642 (https://phabricator.wikimedia.org/T93645) (owner: Dzahn)
[01:23:01] operations, Patch-For-Review: Force https for archiva.wikimedia.org - https://phabricator.wikimedia.org/T88139#1188263 (Ottomata) @dzahn could you review this? https://gerrit.wikimedia.org/r/#/c/202474/11/manifests/role/archiva.pp I think I'm doing it ok? I do have install_certificate in there.
[01:29:12] PROBLEM - mailman I/O stats on sodium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=297.20 Read Requests/Sec=240.50 Write Requests/Sec=16.20 KBytes Read/Sec=5367.60 KBytes_Written/Sec=127.15
[01:35:52] RECOVERY - mailman I/O stats on sodium is OK: OK - I/O stats: Transfers/Sec=25.60 Read Requests/Sec=0.20 Write Requests/Sec=15.30 KBytes Read/Sec=1.20 KBytes_Written/Sec=70.40
[01:53:10] (PS1) Dzahn: mariadb: lint fixes in role class [puppet] - https://gerrit.wikimedia.org/r/202645 (https://phabricator.wikimedia.org/T93645)
[01:53:12] (PS1) Dzahn: labsdns: lint fixes [puppet] - https://gerrit.wikimedia.org/r/202646 (https://phabricator.wikimedia.org/T93645)
[01:53:14] (PS1) Dzahn: gerrit: lint fixes in role class [puppet] - https://gerrit.wikimedia.org/r/202647 (https://phabricator.wikimedia.org/T93645)
[01:56:13] (PS1) John F. Lewis: Increase I/O values for mailman check [puppet] - https://gerrit.wikimedia.org/r/202648
[01:59:05] (CR) Dzahn: [C: 2] "thanks for this. indeed the first values were based on averages of a much shorter time and this came from actual alert history now" [puppet] - https://gerrit.wikimedia.org/r/202648 (owner: John F. Lewis)
[02:07:31] (CR) Yuvipanda: [C: -1] "Can't we change this the other way around, and make it module/private?
I think this is a useful check, having made this mistake myself bec" [puppet] - https://gerrit.wikimedia.org/r/199154 (owner: Negative24)
[02:09:07] (Abandoned) Negative24: puppet-lint: Disable URL modules check [puppet] - https://gerrit.wikimedia.org/r/199154 (owner: Negative24)
[02:10:10] (Abandoned) Negative24: puppet-lint: Disable case default check [puppet] - https://gerrit.wikimedia.org/r/199545 (owner: Negative24)
[02:10:32] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333
[02:19:46] (PS1) Yuvipanda: Add a 'runner' helper script that can run any collector [software/tools-manifest] - https://gerrit.wikimedia.org/r/202650
[02:20:05] (CR) jenkins-bot: [V: -1] Add a 'runner' helper script that can run any collector [software/tools-manifest] - https://gerrit.wikimedia.org/r/202650 (owner: Yuvipanda)
[02:20:15] (CR) Dzahn: "thanks for leaving it enabled. we also recently added some exeptions to this. see https://gerrit.wikimedia.org/r/#/c/197759/" [puppet] - https://gerrit.wikimedia.org/r/199545 (owner: Negative24)
[02:20:21] (PS1) Dzahn: logstash: lint [puppet] - https://gerrit.wikimedia.org/r/202651
[02:20:23] (PS1) Dzahn: logging: lint [puppet] - https://gerrit.wikimedia.org/r/202652
[02:20:25] (PS1) Dzahn: various role classes: moar small lint fixes [puppet] - https://gerrit.wikimedia.org/r/202653 (https://phabricator.wikimedia.org/T93645)
[02:20:27] (PS1) Dzahn: dataset: lint [puppet] - https://gerrit.wikimedia.org/r/202654 (https://phabricator.wikimedia.org/T93645)
[02:20:29] (PS1) Dzahn: beta: lint [puppet] - https://gerrit.wikimedia.org/r/202655
[02:20:31] (PS1) Dzahn: backup: lint [puppet] - https://gerrit.wikimedia.org/r/202656
[02:22:33] (CR) Dzahn: "@Jan, yes, but i prefer a few exceptions over disabling it completely.
at least not https://gerrit.wikimedia.org/r/#/c/199545/" [puppet] - https://gerrit.wikimedia.org/r/197759 (https://phabricator.wikimedia.org/T87132) (owner: Tim Landscheidt)
[02:23:09] (PS2) Yuvipanda: Add a 'runner' helper script that can run any collector [software/tools-manifest] - https://gerrit.wikimedia.org/r/202650
[02:23:33] (CR) jenkins-bot: [V: -1] Add a 'runner' helper script that can run any collector [software/tools-manifest] - https://gerrit.wikimedia.org/r/202650 (owner: Yuvipanda)
[02:27:32] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[02:33:31] !log l10nupdate Synchronized php-1.25wmf23/cache/l10n: (no message) (duration: 08m 54s)
[02:33:40] Logged the message, Master
[02:40:13] !log LocalisationUpdate completed (1.25wmf23) at 2015-04-08 02:39:10+00:00
[02:40:20] Logged the message, Master
[02:44:36] (PS3) Yuvipanda: Add a 'runner' helper script that can run any collector [software/tools-manifest] - https://gerrit.wikimedia.org/r/202650
[02:44:50] (CR) Negative24: "I should really push my changes more (e.g. add reviewers) so that its on people's radar and this doesn't happen again." [puppet] - https://gerrit.wikimedia.org/r/199545 (owner: Negative24)
[02:48:08] * burritocat chuckles at https://www.mediawiki.org/wiki/Special:Code/MediaWiki/59993#c4885
[02:55:23] (CR) Dzahn: "thanks for your patches! if there is no reply on gerrit in a reasonable time feel free to ping more on IRC.
we can still fix the exception" [puppet] - https://gerrit.wikimedia.org/r/199545 (owner: Negative24)
[03:01:08] !log l10nupdate Synchronized php-1.25wmf24/cache/l10n: (no message) (duration: 06m 15s)
[03:01:16] Logged the message, Master
[03:01:58] (CR) Dzahn: "the scope of planet feeds is described as:" [puppet] - https://gerrit.wikimedia.org/r/202471 (owner: Nemo bis)
[03:05:56] !log LocalisationUpdate completed (1.25wmf24) at 2015-04-08 03:04:53+00:00
[03:06:01] Logged the message, Master
[03:07:09] (CR) John F. Lewis: [C: 1] "Seems relevant with the scope of from contributors. While not directly Wikimedia, the scope of the blog seems like it may be appealing for" [puppet] - https://gerrit.wikimedia.org/r/202471 (owner: Nemo bis)
[03:09:30] (PS4) Yuvipanda: Add a 'runner' helper script that can run any collector [software/tools-manifest] - https://gerrit.wikimedia.org/r/202650
[03:10:02] (CR) jenkins-bot: [V: -1] Add a 'runner' helper script that can run any collector [software/tools-manifest] - https://gerrit.wikimedia.org/r/202650 (owner: Yuvipanda)
[03:10:24] (CR) Dzahn: "do i put role::dumps::zim into role/zim.pp so 2 roles in one file, or do we want the role/dumps/zim.pp structure like in modules?" [puppet] - https://gerrit.wikimedia.org/r/200725 (https://phabricator.wikimedia.org/T94457) (owner: Dzahn)
[03:11:17] (PS5) Yuvipanda: Add a 'runner' helper script that can run any collector [software/tools-manifest] - https://gerrit.wikimedia.org/r/202650
[03:11:26] (CR) Negative24: "I will see about getting those default cases set. I had a half written patch in case this failed." [puppet] - https://gerrit.wikimedia.org/r/199545 (owner: Negative24)
[03:11:37] (CR) Dzahn: "i meant "class role::dumps::zim" in file role/dumps.pp or in role/dumps/zim.pp ?"
[puppet] - https://gerrit.wikimedia.org/r/200725 (https://phabricator.wikimedia.org/T94457) (owner: Dzahn)
[03:14:23] (PS6) Yuvipanda: Add a 'runner' helper script that can run any collector [software/tools-manifest] - https://gerrit.wikimedia.org/r/202650
[03:27:08] (PS6) BBlack: scale varnish->varnish backend weight for prod 2layer clusters [puppet] - https://gerrit.wikimedia.org/r/201714 (https://phabricator.wikimedia.org/T86663)
[03:29:59] !log disabling puppet on caches for weight scale deploy/test
[03:30:05] Logged the message, Master
[03:30:59] (CR) BBlack: [C: 2] scale varnish->varnish backend weight for prod 2layer clusters [puppet] - https://gerrit.wikimedia.org/r/201714 (https://phabricator.wikimedia.org/T86663) (owner: BBlack)
[03:34:44] !log re-enabling puppet on caches (weight scale looks good!)
[03:34:48] Logged the message, Master
[03:38:32] (PS1) BBlack: clean up cp30xx site.pp regexes [puppet] - https://gerrit.wikimedia.org/r/202662
[03:38:34] (PS1) BBlack: add site.pp cache roles for cp30[34]x [puppet] - https://gerrit.wikimedia.org/r/202663
[03:40:22] (PS2) BBlack: add site.pp cache roles for cp30[34]x [puppet] - https://gerrit.wikimedia.org/r/202663 (https://phabricator.wikimedia.org/T86663)
[03:40:38] (CR) BBlack: [C: 2] clean up cp30xx site.pp regexes [puppet] - https://gerrit.wikimedia.org/r/202662 (owner: BBlack)
[03:47:37] (CR) BBlack: [C: 2] add site.pp cache roles for cp30[34]x [puppet] - https://gerrit.wikimedia.org/r/202663 (https://phabricator.wikimedia.org/T86663) (owner: BBlack)
[03:48:44] (PS7) Yuvipanda: Add a 'runner' helper script that can run any collector [software/tools-manifest] - https://gerrit.wikimedia.org/r/202650
[03:48:46] (PS1) Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - https://gerrit.wikimedia.org/r/202664
[03:53:17] (PS1) 20after4: mwpatch - a simple patch queue using a directory tree
[mediawiki-config] - https://gerrit.wikimedia.org/r/202665 (https://phabricator.wikimedia.org/T95375)
[04:05:18] (PS1) BBlack: esams text: add cp303[12], kill amssq31-36 [puppet] - https://gerrit.wikimedia.org/r/202667
[04:06:16] (CR) BBlack: [C: 2] esams text: add cp303[12], kill amssq31-36 [puppet] - https://gerrit.wikimedia.org/r/202667 (owner: BBlack)
[04:09:28] PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL: CRITICAL: 1.69% of data above the critical threshold [1000.0]
[04:10:41] wth is "too many creates"?
[04:12:38] PROBLEM - puppet last run on mw1151 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:20:52] (PS1) BBlack: esams text: add cp304[01], kill amssq37-42 [puppet] - https://gerrit.wikimedia.org/r/202668
[04:21:24] (CR) BBlack: [C: 2 V: 2] esams text: add cp304[01], kill amssq37-42 [puppet] - https://gerrit.wikimedia.org/r/202668 (owner: BBlack)
[04:25:23] bast1001 ok?
[04:25:46] ah - ok now
[04:25:55] maybe my connection just freaked for a bit
[04:29:29] RECOVERY - puppet last run on mw1151 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[04:31:01] (PS1) BBlack: esams upload: add cp303[23] [puppet] - https://gerrit.wikimedia.org/r/202670
[04:31:34] (CR) BBlack: [C: 2 V: 2] esams upload: add cp303[23] [puppet] - https://gerrit.wikimedia.org/r/202670 (owner: BBlack)
[04:34:49] RECOVERY - mysqld processes on db2042 is OK: PROCS OK: 1 process with command name mysqld
[04:39:23] (PS1) BBlack: esams upload: add cp304[23] [puppet] - https://gerrit.wikimedia.org/r/202672
[04:39:38] (CR) BBlack: [C: 2 V: 2] esams upload: add cp304[23] [puppet] - https://gerrit.wikimedia.org/r/202672 (owner: BBlack)
[04:43:28] operations, HTTPS, HTTPS-by-default: Expand HTTP frontend clusters with new hardware - https://phabricator.wikimedia.org/T86663#1188456 (BBlack) Steps 1-3 above are complete.
[05:13:10] legoktm: uhm, I can’t seem to login to jawiki
[05:13:15] I wonder if this is SUL related?
[05:13:25] > Login error
[05:13:25] .*(?i:vip).* >
[05:14:33] umm what
[05:14:40] oh
[05:14:40] so
[05:14:48] autocreation triggers titleblacklist
[05:14:54] I'm actually disabling that tomorrow... :P
[05:14:58] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100%
[05:16:17] RECOVERY - Host mw2027 is UP: PING WARNING - Packet loss = 37%, RTA = 192.10 ms
[05:16:29] legoktm: I don’t really understand?
[05:16:52] YuviPanda: I can't really do anything about that, you might be able to find a sysop or steward in #wikimedia-stewards to remove the offending rule
[05:17:12] YuviPanda: basically any account that matches that regex can't be created
[05:17:12] well, I don’t really care :D Just wanted to make sure it’s kinown
[05:17:13] wait
[05:17:15] oh
[05:17:19] ‘vip’?
[05:17:20] hahaaaah
[05:17:22] ahhahahahah
[05:17:23] hahahahah
[05:17:24] lol
[05:17:25] right
[05:17:25] ok
[05:17:27] :P
[05:17:50] and SUL autocreation obeys that
[05:17:51] so
[05:17:52] :|
[05:18:10] but it's getting turned off tomorrow, and it's going to be managed globally on meta instead
[05:25:08] RECOVERY - carbon-cache too many creates on graphite1001 is OK: OK: Less than 1.00% above the threshold [500.0]
[05:31:36] (PS4) Andrew Bogott: Have sink create ldap host entries.
[puppet] - https://gerrit.wikimedia.org/r/202582
[05:45:19] ACKNOWLEDGEMENT - Host mw2128 is DOWN: PING CRITICAL - Packet loss = 100% Faidon Liambotis T95264
[05:52:17] (PS1) Faidon Liambotis: graphite: fix carbon-relay queue check [puppet] - https://gerrit.wikimedia.org/r/202677
[05:52:54] godog
[05:52:58] ^^ :)
[06:03:19] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 8 06:02:15 UTC 2015 (duration 2m 14s)
[06:03:25] Logged the message, Master
[06:06:11] operations: setup/deploy ganeti2001-2006 - https://phabricator.wikimedia.org/T94042#1188501 (faidon)
[06:06:31] ACKNOWLEDGEMENT - puppet last run on ganeti1003 is CRITICAL: CRITICAL: Puppet has 3 failures Faidon Liambotis T94042
[06:06:32] ACKNOWLEDGEMENT - puppet last run on ganeti2001 is CRITICAL: CRITICAL: Puppet has 3 failures Faidon Liambotis T94042
[06:06:32] ACKNOWLEDGEMENT - puppet last run on ganeti2002 is CRITICAL: CRITICAL: Puppet has 3 failures Faidon Liambotis T94042
[06:07:09] (CR) Faidon Liambotis: [C: 2] graphite: fix carbon-relay queue check [puppet] - https://gerrit.wikimedia.org/r/202677 (owner: Faidon Liambotis)
[06:07:44] springle: ping?
[06:08:42] ACKNOWLEDGEMENT - Disk space on dataset1001 is CRITICAL: DISK CRITICAL - free space: /data 622459 MB (1% inode=99%): Faidon Liambotis T84148
[06:09:36] (PS2) Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - https://gerrit.wikimedia.org/r/202664
[06:24:33] (PS2) Giuseppe Lavagetto: snapshot: fixup for I1863679d and Icc9aba2 [puppet] - https://gerrit.wikimedia.org/r/202406
[06:28:29] (PS3) Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - https://gerrit.wikimedia.org/r/202664
[06:29:06] (CR) Giuseppe Lavagetto: [C: 2] snapshot: fixup for I1863679d and Icc9aba2 [puppet] - https://gerrit.wikimedia.org/r/202406 (owner: Giuseppe Lavagetto)
[06:29:48] PROBLEM - puppet last run on subra is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:21] (PS2) Giuseppe Lavagetto: standard: include admin wherever needed [puppet] - https://gerrit.wikimedia.org/r/202407 (https://phabricator.wikimedia.org/T86774)
[06:30:29] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:39] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:48] PROBLEM - puppet last run on elastic1030 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:59] <_joe_> ah!
passenger o'clock is back to its usual time
[06:31:18] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:18] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:33:14] (PS4) Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - https://gerrit.wikimedia.org/r/202664
[06:33:38] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:59] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:28] PROBLEM - puppet last run on cp3035 is CRITICAL: CRITICAL: puppet fail
[06:34:47] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:35:09] PROBLEM - puppet last run on mw2013 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:37:07] PROBLEM - puppet last run on mw2127 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:38:08] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:38:17] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:38:41] <_joe_> mmh this is going on a bit too long
[06:40:29] paravoid: pong
[06:42:44] springle: hi
[06:43:00] springle: db2042 is lagging behind but I guess you know about that
[06:43:15] springle: and db1003 has a disk space alert for about a month now
[06:43:20] (PS4) Ori.livneh: Add a script for storing NavTiming metrics using RRD [puppet] - https://gerrit.wikimedia.org/r/202362
[06:44:31] (CR) Ori.livneh: [C: 2] "well, let's see" [puppet] - https://gerrit.wikimedia.org/r/202362 (owner: Ori.livneh)
[06:44:55] paravoid: hmm, i thought i had silenced db1003, but apparently not.
sorry
[06:45:16] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[06:45:31] acked, rather
[06:45:46] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[06:46:20] <_joe_> hey springle
[06:46:25] RECOVERY - puppet last run on subra is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:29] <_joe_> good... afternoon?
[06:46:42] right :) hi
[06:46:46] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:46:46] RECOVERY - puppet last run on mw2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:46] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:47:06] RECOVERY - puppet last run on mw2127 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[06:47:16] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[06:47:16] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[06:47:26] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:47:27] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[06:47:36] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:45] PROBLEM - HHVM rendering on mw1081 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.014 second response time
[06:47:46] RECOVERY - puppet last run on elastic1030 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:47:56] PROBLEM - Apache HTTP on mw1081 is CRITICAL: HTTP
CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.013 second response time [06:48:36] PROBLEM - HHVM processes on mw1081 is CRITICAL: PROCS CRITICAL: 0 processes with command name hhvm [06:48:42] <_joe_> interesting [06:48:50] <_joe_> hhvm pre-start process (11934) terminated with status 1 [06:49:46] RECOVERY - Apache HTTP on mw1081 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.076 second response time [06:50:25] RECOVERY - HHVM processes on mw1081 is OK: PROCS OK: 25 processes with command name hhvm [06:50:57] RECOVERY - puppet last run on cp3035 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [06:50:58] springle: so, what should I ack db1003 with? [06:51:16] RECOVERY - HHVM rendering on mw1081 is OK: HTTP OK: HTTP/1.1 200 OK - 67178 bytes in 0.146 second response time [06:52:55] paravoid: nothing. i just cleaned the pupet stored config, which should update neon shortly. db1003 (02-10) are all but decommissioned. it need not still be replicating [06:53:08] did you also disable puppet? [06:53:20] a puppet run will reintroduce it to stored configs [06:53:31] but without the maraidb checks, right? [06:53:46] this isn't a mariadb check, it's a disk space check :) [06:53:50] included from standard [06:54:04] haha [06:54:06] ok [06:54:16] so what's the plan for them? to get decommissioned? [06:54:21] * springle kills the datadir [06:54:22] yes [06:54:27] is there a task for that? 
[06:54:41] nope [06:54:45] I can just ack the alert with the task number [06:54:48] oh heh [06:55:45] gerrit seems so fast now [06:56:00] so much better when few people are up [06:59:25] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: puppet fail [07:03:11] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:07:06] (03PS1) 10Springle: assign db2041 to s2 [puppet] - 10https://gerrit.wikimedia.org/r/202686 [07:08:16] (03CR) 10Springle: [C: 032] assign db2041 to s2 [puppet] - 10https://gerrit.wikimedia.org/r/202686 (owner: 10Springle) [07:12:43] springle: do we have per-db parallel replication yet? [07:13:04] * burritocat is curious of that status of things since the last update on T85266 [07:15:21] PROBLEM - puppet last run on db2041 is CRITICAL: CRITICAL: Puppet has 3 failures [07:15:40] not at that level; prod masters are still 5.5 [07:15:48] (03PS3) 10Giuseppe Lavagetto: standard: include admin wherever needed [puppet] - 10https://gerrit.wikimedia.org/r/202407 (https://phabricator.wikimedia.org/T86774) [07:20:42] RECOVERY - puppet last run on db2041 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:25:36] PROBLEM - mysqld processes on db2041 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [07:25:50] offs [07:27:08] ACKNOWLEDGEMENT - mysqld processes on db2041 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld Sean Pringle yes. good icinga. have a treat. [07:30:38] mr grrrit missed my patch [07:43:03] * RGW bucket sharding: RGW can now shard the bucket index for large buckets across, improving performance for very large buckets. 
[07:43:07] ceph 0.94 [07:50:11] (03PS4) 10Giuseppe Lavagetto: standard: include admin wherever needed [puppet] - 10https://gerrit.wikimedia.org/r/202407 (https://phabricator.wikimedia.org/T86774) [07:54:16] (03PS5) 10Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 [07:55:10] (03PS6) 10Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 [07:57:13] (03PS8) 10Yuvipanda: Add a 'runner' helper script that can run any collector [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202650 [07:57:15] (03PS7) 10Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 [07:58:19] (03CR) 10Alexandros Kosiaris: [C: 032] [English Planet] Add Timo Tijhof (second blog) [puppet] - 10https://gerrit.wikimedia.org/r/202471 (owner: 10Nemo bis) [08:00:31] 6operations, 5Patch-For-Review: failed icinga/graphite login for Moritz - https://phabricator.wikimedia.org/T94729#1188807 (10MoritzMuehlenhoff) My login to Icinga is working fine, this was caused by the umlaut in my original user name (https://phabricator.wikimedia.org/T94717#1186082) When I log into graphit... [08:02:00] (03CR) 10Alexandros Kosiaris: "Since this seemed to be a "maybe", I merged it anyway just based on the number of supporters and Daniel's reasoning. 
Worse case scenario, " [puppet] - 10https://gerrit.wikimedia.org/r/202471 (owner: 10Nemo bis) [08:12:45] (03PS8) 10Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 [08:18:04] akosiaris: can you merge, https://gerrit.wikimedia.org/r/202340 (cx, beta) [08:18:09] thanks :) [08:19:21] YuviPanda: I started with, https://gerrit.wikimedia.org/r/#/c/202689/ to add more 'wikis' in beta as per, https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Add_a_wiki and will imrove document as needed. [08:19:27] feel free to comment. [08:19:49] kart_: sure. just curious, which language is 'da' ? [08:19:49] I’ve no idea about the entire mediawiki side of anything, sorry :( You need -releng [08:20:13] akosiaris: Danish [08:20:25] (03CR) 10Alexandros Kosiaris: [C: 032] Beta: Enable sv-da MT in Apertium [puppet] - 10https://gerrit.wikimedia.org/r/202340 (owner: 10KartikMistry) [08:20:44] YuviPanda: no problem :) will poke hashar later. [08:21:19] kart_: have you thought about moving the cx configuration out of puppet and in the software repo ? [08:21:44] it will help decouple ops from the update process of cx [08:22:13] akosiaris: I think 'da' is legacy [08:22:28] > [08:22:29] ? [08:22:31] akosiaris: doable. [08:22:31] you can look them up in mediawiki/core language file named Name.php [08:22:39] oh.. thanks hashar [08:22:46] I did not know that [08:23:08] (03PS9) 10Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 [08:23:19] hashar: https://da.wikipedia.org/wiki/Forside really? [08:25:34] akosiaris: you mean operations/software/.. or inside cxserver? [08:25:46] akosiaris: It will be helpful for Beta atleast. [08:26:45] paravoid: +1 ! thanks [08:27:39] (03CR) 10Yuvipanda: "Do we really, really, REALLY have to write this in PHP?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/202665 (https://phabricator.wikimedia.org/T95375) (owner: 1020after4) [08:27:50] 'da' => 'dansk', # Danish [08:28:04] I think we used to have 'dk' [08:29:02] so it isn't legacy :) [08:29:08] yeah sorry :( [08:29:55] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: puppet fail [08:30:10] kart_: inside cxserver [08:31:38] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [08:33:31] akosiaris: specially registry part! [08:34:25] :-) [08:37:57] (03CR) 10Giuseppe Lavagetto: "Have you guys considered using quilt to manage a patch queue?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202665 (https://phabricator.wikimedia.org/T95375) (owner: 1020after4) [08:39:41] (03CR) 10Giuseppe Lavagetto: [C: 031] tungsten: deprovision roles [puppet] - 10https://gerrit.wikimedia.org/r/202376 (https://phabricator.wikimedia.org/T90591) (owner: 10Filippo Giunchedi) [08:43:14] (03CR) 10Giuseppe Lavagetto: [C: 031] set cassandra vars in cassandra.yaml, not restbase [puppet] - 10https://gerrit.wikimedia.org/r/201401 (owner: 10Dzahn) [08:44:22] (03CR) 1020after4: "Yuvi: if you wanna write it in python, go for it. I'm still more comfortable/productive in PHP despite it's ugliness." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202665 (https://phabricator.wikimedia.org/T95375) (owner: 1020after4) [08:47:46] (03CR) 1020after4: "I really don't think any existing tool does what I need this to do - I just need a simple accounting of .patch files so that they can be a" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202665 (https://phabricator.wikimedia.org/T95375) (owner: 1020after4) [08:48:45] (03CR) 10Giuseppe Lavagetto: "I agree with hashar. No point in polluting our manifests, and we really want to move things into modules sooner than later." 
[puppet] - 10https://gerrit.wikimedia.org/r/198116 (https://phabricator.wikimedia.org/T87132) (owner: 10Tim Landscheidt) [08:54:01] (03CR) 10Filippo Giunchedi: Add a script for storing NavTiming metrics using RRD (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/202362 (owner: 10Ori.livneh) [08:57:30] (03PS2) 10Giuseppe Lavagetto: puppet: selector outside a resource [puppet] - 10https://gerrit.wikimedia.org/r/195533 (owner: 10Matanya) [08:57:41] (03PS10) 10Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 [09:01:27] ugh [09:01:45] (03CR) 10Giuseppe Lavagetto: [C: 032] puppet: selector outside a resource [puppet] - 10https://gerrit.wikimedia.org/r/195533 (owner: 10Matanya) [09:01:46] someone's still working? :P [09:02:19] who? [09:02:20] :P [09:02:55] 6operations, 6Commons, 6Multimedia: 220px of Fachada_e_lateral_da_Catedral_São_Sebastião_após_pintura,_Coronel_Fabriciano_MG.JPG not purging - https://phabricator.wikimedia.org/T95333#1188895 (10fgiunchedi) not sure there's a way to block a task after the fact anyway there doesn't seem to be a 220px file in... [09:05:35] (03PS11) 10Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 (https://phabricator.wikimedia.org/T95255) [09:06:07] 6operations, 6Commons, 6Multimedia: 220px of Fachada_e_lateral_da_Catedral_São_Sebastião_após_pintura,_Coronel_Fabriciano_MG.JPG not purging - https://phabricator.wikimedia.org/T95333#1188898 (10fgiunchedi) btw on the 220px I got `x-cache:cp1063 hit (26), cp3004 hit (8), cp3009 frontend miss (0)` and this on... [09:07:17] godog: have time for some debian packaging review / advice? [09:07:42] ^ is patch, I’ve added you as well [09:08:06] YuviPanda: ye maybe later today, let's say 50% change of having it reviewed when you wake up :D [09:08:22] godog: sweet :) [09:08:23] godog: ty! 
[09:08:30] np [09:08:37] godog: this is the first time I think I understood almost all the things (almost) that went into the debian/ :) [09:08:50] (for some definition of ‘understood’ and ‘all’ of course :)) [09:09:33] YuviPanda: cool! it is like starting to remember your cc number, meaning you've used it many times [09:09:44] :D [09:09:44] haha [09:10:07] so I started off with stdeb and stripped until it still worked + I understood :) [09:10:24] pybuilder seems to have negated much of the need for stdeb... [09:10:40] godog: hello! I went in a time sink since last week [09:11:01] godog: I think the Zuul debian package for Precise is ready by now. I even got it deployed :) https://gerrit.wikimedia.org/r/#/c/195272/ [09:11:16] hashar: cool! so it works as expected [09:11:27] godog: yeah I had to do some last minute change to it though [09:11:44] specially --version that was always reporting 2.0.0 when I really wanted the git sha 1 to show [09:12:43] anyway, zuul-cloner is workiing on Precise :) [09:12:53] have to upgrade the server now (gallium) [09:14:46] sweet [09:15:00] not today though [09:15:33] will get the Trusty package build and figure out an update of the puppet manifest to get rid of git clone && pip install . [09:15:34] :D [09:16:07] 6operations, 10MediaWiki-extensions-Graph, 6Services, 10service-template-node, 7service-runner: Deploy graphoid service into production - https://phabricator.wikimedia.org/T90487#1188907 (10mobrovac) [09:19:57] 6operations, 6Commons, 6Multimedia: 220px of Fachada_e_lateral_da_Catedral_São_Sebastião_após_pintura,_Coronel_Fabriciano_MG.JPG not purging - https://phabricator.wikimedia.org/T95333#1188908 (10fgiunchedi) also cc @bblack and @faidon as the timeline for the file at http://commons.wikimedia.org/wiki/File:Fac... 
[09:26:16] godog: in the same vien, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778601 has patches for python3-statsd which you last touched :) [09:31:14] YuviPanda: indeedly, I did that because I needed it :P [09:31:31] godog: :D yeah, and there’s a python3-statsd on tools now because I needed it :D [09:31:39] but would like to import it from ‘upstream’ instead :) [09:31:53] (03PS2) 10Filippo Giunchedi: tungsten: deprovision roles [puppet] - 10https://gerrit.wikimedia.org/r/202376 (https://phabricator.wikimedia.org/T90591) [09:32:00] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] tungsten: deprovision roles [puppet] - 10https://gerrit.wikimedia.org/r/202376 (https://phabricator.wikimedia.org/T90591) (owner: 10Filippo Giunchedi) [09:32:47] YuviPanda: ah cool, if you have the package already and it works by applying the patch I can sponsor an upload [09:37:57] godog: yup, I’ve applied the patch and built them, work fine [09:39:31] 10Ops-Access-Requests, 6operations, 6Services: Allow mobrovac to restart Zotero - https://phabricator.wikimedia.org/T95400#1188916 (10mobrovac) 3NEW [10:04:37] (03CR) 1020after4: "I take it back - it looks like ply would be a good fit for what I need to do: http://pythonhackers.com/p/gochist/ply" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202665 (https://phabricator.wikimedia.org/T95375) (owner: 1020after4) [10:06:04] (03CR) 1020after4: "https://github.com/rconradharris/ply" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202665 (https://phabricator.wikimedia.org/T95375) (owner: 1020after4) [10:27:36] 6operations, 6Phabricator, 10Wikimedia-Bugzilla: Sanitise a Bugzilla database dump - https://phabricator.wikimedia.org/T85141#1189025 (10Aklapper) >>! In T85141#1187711, @Dzahn wrote: > Addtionally, @aklapper provided me with a list of IDs (not sure how he created that) Query in old-bz for all tasks in the... 
[10:30:59] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: onboarding Moritz Muehlenhoff in ops - https://phabricator.wikimedia.org/T94717#1189027 (10MoritzMuehlenhoff) Gerrit login doesn't work for me: When logging in with my Wiki tech credentials, I only get "Cannot assign user name", possibly a remnant of the... [10:32:15] 20after4: https://git.wikimedia.org/blob/operations%2Fpuppet.git/eff2e014d3ad0563f87cd6d10c0d9b567dfdb221/modules%2Fphabricator%2Fdata%2Ffixed_settings.yaml#L11 possible to change to protocol relative url's? [10:33:53] Steinsplitter: Why protocol relative? [10:34:15] the links will always be served from https so the end result would be the same I think [10:34:26] ah [10:39:53] (03CR) 10Filippo Giunchedi: Add initial debian package & upstart script (033 comments) [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 (https://phabricator.wikimedia.org/T95255) (owner: 10Yuvipanda) [10:41:25] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: puppet fail [10:50:00] (03PS3) 10Filippo Giunchedi: statsite: new module [puppet] - 10https://gerrit.wikimedia.org/r/199599 (https://phabricator.wikimedia.org/T90111) [10:50:02] (03PS2) 10Filippo Giunchedi: statsdlb: replace txstatsd with statsite [puppet] - 10https://gerrit.wikimedia.org/r/199600 (https://phabricator.wikimedia.org/T90111) [10:50:04] (03PS1) 10Filippo Giunchedi: statsite: replace ::txstatsd class and role calls [puppet] - 10https://gerrit.wikimedia.org/r/202701 (https://phabricator.wikimedia.org/T90111) [10:58:55] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:15:49] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "A couple of small comments, on the code, that looks good to me as far as correctness is concerned." 
(033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/199599 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [11:18:08] 6operations, 10Analytics, 6Scrum-of-Scrums, 10Wikipedia-Android-App, and 4 others: Avoid cache fragmenting URLs for Share a Fact shares - https://phabricator.wikimedia.org/T90606#1189182 (10mark) [11:21:14] * aude needs to deploy bug fix for wikidata [11:21:20] that really shouldn't wait until swat [11:21:37] assume no one else is doing anything now? [11:22:07] it's a tiny but important change :) [11:23:01] 6operations, 6Commons, 6Multimedia: 220px of Fachada_e_lateral_da_Catedral_São_Sebastião_após_pintura,_Coronel_Fabriciano_MG.JPG not purging - https://phabricator.wikimedia.org/T95333#1189216 (10Aklapper) [11:23:51] * aude takes it as a "no" and proceeds [11:28:52] 6operations, 6Commons, 6Multimedia: 220px of Fachada_e_lateral_da_Catedral_São_Sebastião_após_pintura,_Coronel_Fabriciano_MG.JPG not purging - https://phabricator.wikimedia.org/T95333#1189233 (10Aklapper) p:5Triage>3High [11:35:41] and waaaaaaaaaaaaaaaaaits for jenkins [11:38:56] !log aude Synchronized php-1.25wmf24/extensions/Wikidata: Fix issue with edit links in diff view (duration: 00m 20s) [11:38:59] done [11:39:03] Logged the message, Master [11:42:27] (03PS1) 10Aklapper: Provide URLs for licenses mentioned in Phabricator footer [puppet] - 10https://gerrit.wikimedia.org/r/202708 (https://phabricator.wikimedia.org/T94946) [11:46:46] (03CR) 10Filippo Giunchedi: statsite: new module (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/199599 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [11:50:04] (03PS4) 10Filippo Giunchedi: statsite: new module [puppet] - 10https://gerrit.wikimedia.org/r/199599 (https://phabricator.wikimedia.org/T90111) [11:50:06] (03PS2) 10Filippo Giunchedi: statsite: replace ::txstatsd class and role calls [puppet] - 10https://gerrit.wikimedia.org/r/202701 (https://phabricator.wikimedia.org/T90111) 
[11:50:08] (03PS3) 10Filippo Giunchedi: statsdlb: replace txstatsd with statsite [puppet] - 10https://gerrit.wikimedia.org/r/199600 (https://phabricator.wikimedia.org/T90111) [11:53:57] andre__: doh! "edit blocked by tasks" [11:54:13] eh? [11:54:43] ah. "Edit 'Blocked by' tasks". I had problems parsing, like "This edit is blocked by tasks" or such :) [11:54:56] (English = no good) [11:55:43] andre__: yeah sorry that was lacking context too :) [11:57:01] too many tasks & too little brain = yes, it can take me a while to get the context :P [12:13:05] (03CR) 10Qgil: Provide URLs for licenses mentioned in Phabricator footer (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/202708 (https://phabricator.wikimedia.org/T94946) (owner: 10Aklapper) [12:13:34] (03PS1) 10Hashar: zuul: switch install to a Debian package [puppet] - 10https://gerrit.wikimedia.org/r/202714 (https://phabricator.wikimedia.org/T48552) [12:23:15] !log Killed Zuul :( [12:23:19] Logged the message, Master [12:27:34] "Sorry! This site is experiencing technical difficulties. [12:27:36] Try waiting a few minutes and reloading. [12:27:38] (Cannot access the database: Too many connections (10.64.32.30))" [12:27:44] at https://pt.wikipedia.org/w/index.php?title=Predefini%C3%A7%C3%A3o_Discuss%C3%A3o:Tabela_peri%C3%B3dica&action=edit&section=new [12:28:57] actually, this happens for https://pt.wikipedia.org/ too [12:29:16] noted [12:29:19] looking into it [12:30:45] * hoo also looking [12:30:56] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [500.0] [12:31:43] * aude looks [12:31:44] all of s2 wikis have the problem [12:31:51] 6operations, 6Commons, 6Multimedia: 220px of Fachada_e_lateral_da_Catedral_São_Sebastião_após_pintura,_Coronel_Fabriciano_MG.JPG not purging - https://phabricator.wikimedia.org/T95333#1189349 (10BBlack) I don't think it's directly related to the esams depool, although perhaps the excess swift load has induce...
[12:32:33] odd that it is just ptwiki apparently [12:32:45] and nlwiki [12:33:20] aude: no, it's all s2 wikis [12:33:24] for example thwiki [12:33:35] i see entries for BadMethodCallException from line 292 of /srv/mediawiki/php-1.25wmf24/includes/db/DatabaseMysqli.php: Call to a member function real_escape_string() on a non-object (boolean) [12:33:55] mw1021 [12:33:58] mainly [12:34:52] probably unrelated [12:35:14] PROBLEM - HHVM rendering on mw1114 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.002 second response time [12:35:45] PROBLEM - Apache HTTP on mw1114 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.004 second response time [12:35:55] PROBLEM - HHVM processes on mw1114 is CRITICAL: PROCS CRITICAL: 0 processes with command name hhvm [12:37:04] (03CR) 10Steinsplitter: "Please link directly to https://gnu.org/licenses/gpl.html. https://www.gnu.org/licenses/ contains multiple licenses. :-)" [puppet] - 10https://gerrit.wikimedia.org/r/202708 (https://phabricator.wikimedia.org/T94946) (owner: 10Aklapper) [12:37:46] anybody knows which phab task is for the current outage? like it.wikipedia.org [12:38:14] akosiaris: Did oyu do something? [12:38:40] Couple of long running queries on db1067 just went away [12:38:49] hoo: nope [12:39:01] mh [12:39:10] (03CR) 10Aklapper: "That idea was intentional as we don't explicitly mention GPL version 3 in the footer and I don't want to imply that we mean version 3 only" [puppet] - 10https://gerrit.wikimedia.org/r/202708 (https://phabricator.wikimedia.org/T94946) (owner: 10Aklapper) [12:39:12] Someone probably pt-killed them [12:39:15] * hoo didn't [12:39:23] most of s2 slaves have < 200 q but db1060 have like 4000+ [12:39:46] and no more [12:39:48] zeljkof, I don't think we have a Phab task yet - we might have an incident project afterwards though [12:39:49] springle: around ? 
[12:39:51] Yeah, they pile up, because the servers were busy SiteStatsInit::articles queries [12:40:14] seems like db servers are ok now [12:40:16] andre__: ok, thanks [12:40:29] Yep, as told the queries just disappeared [12:40:36] between me seeing them and me trying to kill them [12:40:41] https://it.wikipedia.org/ works again for me now [12:41:04] yeah, all of s2 wikis should be fine now [12:41:11] mysql:wikiadmin@db1067 [information_schema]> SELECT * FROM processlist WHERE command != 'sleep'; [12:41:13] looks fine [12:41:34] that being said, I 'll have to restart a couple of HHVM processes [12:41:45] (03CR) 10Qgil: "Then we can link to https://www.gnu.org/licenses/old-licenses/gpl-2.0.html since that is the most used GPL license in our code." [puppet] - 10https://gerrit.wikimedia.org/r/202708 (https://phabricator.wikimedia.org/T94946) (owner: 10Aklapper) [12:43:15] !log Zuul is back and it is nasty [12:43:20] Logged the message, Master [12:44:27] (03PS2) 10Steinsplitter: Provide URLs for licenses mentioned in Phabricator footer [puppet] - 10https://gerrit.wikimedia.org/r/202708 (https://phabricator.wikimedia.org/T94946) (owner: 10Aklapper) [12:45:56] RECOVERY - HHVM rendering on mw1114 is OK: HTTP OK: HTTP/1.1 200 OK - 67256 bytes in 0.409 second response time [12:46:28] (03CR) 10Steinsplitter: "changed the patch, linking now to gpl-2.0.html" [puppet] - 10https://gerrit.wikimedia.org/r/202708 (https://phabricator.wikimedia.org/T94946) (owner: 10Aklapper) [12:46:34] RECOVERY - Apache HTTP on mw1114 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.168 second response time [12:46:44] RECOVERY - HHVM processes on mw1114 is OK: PROCS OK: 25 processes with command name hhvm [12:47:13] !log restarted HHVM on mw1114 [12:47:18] Logged the message, Master [12:47:20] (03CR) 10Aklapper: [C: 031] "Thanks. Looks good to me." 
[puppet] - 10https://gerrit.wikimedia.org/r/202708 (https://phabricator.wikimedia.org/T94946) (owner: 10Aklapper) [12:47:59] FYI: The problem was also reported on https://lists.wikimedia.org/pipermail/wikitech-l/2015-April/081496.html [12:49:04] Does anyone want to do an incident report? [12:49:10] I have the queries that blocked the slaves [12:49:26] 6operations, 10ops-eqiad: dysprosium memory failure - https://phabricator.wikimedia.org/T95423#1189384 (10BBlack) 3NEW a:3Cmjohnson [12:49:31] (03CR) 10Qgil: [C: 031] Provide URLs for licenses mentioned in Phabricator footer [puppet] - 10https://gerrit.wikimedia.org/r/202708 (https://phabricator.wikimedia.org/T94946) (owner: 10Aklapper) [12:49:56] hoo: I 'd say yes [12:52:24] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [12:53:21] 6operations, 10ops-eqiad: Make dysprosium physical storage layout match other caches - https://phabricator.wikimedia.org/T95424#1189397 (10BBlack) 3NEW a:3Cmjohnson [12:53:58] Mh... maybe those queries were just a symptom and I missed the actual problem by a few seconds [12:56:55] akosiaris: hi [12:57:03] hoo: what did you see? [12:57:07] springle: hey... you just missed an outage [12:57:36] tendril shows a surge of SiteStatsInit::articles just now [12:57:44] on svwiki [12:59:37] springle: Yeah, saw those [12:59:44] but I think I was a few seconds to late for the party [12:59:53] https://tendril.wikimedia.org/report/slow_queries_checksum?checksum=a1ee41c25b4f21da593497574696358c&host=^db&user=wikiuser&schema=wik&hours=1 [13:00:43] springle: Did you invoke pt-kill in the end or did it just somehow solve itself? [13:01:11] * aude didn't know about tendril [13:01:13] nice [13:01:14] the slaves' event scheduler started killing stuff [13:01:36] similar to pt-kill, but doesn't require an open connection during max_connections [13:01:40] and automatic [13:01:57] Good thing... 
I was about to start killing them, but then it suddenly got better itself [13:04:22] SELECT /* SiteStatsInit::edits ... */ COUNT(*) FROM `revision` LIMIT 1 .... wtf? [13:04:38] SiteStatsInit::articles [13:04:42] no? [13:05:06] Ah yes [13:05:10] yeah that time. but couple hours ago, also that ^ [13:05:15] just looks nuts [13:05:40] I thought we only do the full count in the maint. script [13:05:54] And never during normal opeatrion [13:06:21] Also the articles query is a little stupid [13:06:37] COUNT(DISTINCT... is bad [13:06:43] SELECT /* SiteStatsInit::articles ... */ COUNT(DISTINCT page_id) FROM `page`, `pagelinks` WHERE page_namespace = '0' AND page_is_redirect = '0' AND (pl_from=page_id) LIMIT 1 [13:06:50] would be better done with EXISTS, in case anyone cares [13:07:34] all duplicates too [13:07:42] SELECT COUNT(*) FROM page WHERE page_namespace = 0 AND EXISTS(SELECT 1 FROM pagelinks WHERE pl_from = page_id); [13:07:59] In the end, those should never fire in production, I think [13:08:11] O_o [13:08:34] Nemo_bis: Wasn't that the query you asked me(?) about a few days ago? [13:08:44] And I said it was not fully optimized? :P [13:08:55] But wasn't it supposed to be over a week ago [13:09:12] Nemo_bis: That got hit of by normal users [13:09:15] IPs even AFAIR [13:09:28] How's possible? 
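(Editor's sketch, not part of the channel log.) The COUNT(DISTINCT ...) versus EXISTS point made at 13:06-13:07 can be tried on a toy dataset. The snippet below is a minimal illustration that assumes an in-memory SQLite database with a cut-down, hypothetical `page` / `pagelinks` schema; production runs on MySQL/MariaDB with the real MediaWiki tables, so this only compares results and says nothing about query plans or timings. The helper names `make_toy_db` and `count` are made up for the example. Note that the EXISTS query as typed in the channel leaves out the `page_is_redirect` filter, so it would not return the same number as the original query unless that condition is added back.

```python
import sqlite3

# Join form quoted in the log: every matching (page, pagelink) pair is
# produced first, then DISTINCT collapses the duplicates.
JOIN_QUERY = """
SELECT COUNT(DISTINCT page_id) FROM page, pagelinks
WHERE page_namespace = 0 AND page_is_redirect = 0 AND pl_from = page_id
"""

# EXISTS form with the redirect filter restored (an editorial assumption,
# so that both queries answer the same question).
EXISTS_QUERY = """
SELECT COUNT(*) FROM page
WHERE page_namespace = 0 AND page_is_redirect = 0
  AND EXISTS (SELECT 1 FROM pagelinks WHERE pl_from = page_id)
"""

# EXISTS form as typed in the channel, without the redirect filter.
CHAT_EXISTS_QUERY = """
SELECT COUNT(*) FROM page
WHERE page_namespace = 0
  AND EXISTS (SELECT 1 FROM pagelinks WHERE pl_from = page_id)
"""


def make_toy_db():
    """Build a tiny in-memory database (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_is_redirect INTEGER
        );
        CREATE TABLE pagelinks (pl_from INTEGER, pl_title TEXT);
        CREATE INDEX pl_from_idx ON pagelinks (pl_from);
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [
            (1, 0, 0),  # article with two outgoing links
            (2, 0, 0),  # article with no links
            (3, 0, 1),  # redirect in the main namespace, one link
            (4, 1, 0),  # talk page with one link
            (5, 0, 0),  # article with one link
        ],
    )
    conn.executemany(
        "INSERT INTO pagelinks VALUES (?, ?)",
        [(1, "A"), (1, "B"), (3, "A"), (4, "A"), (5, "C")],
    )
    return conn


def count(conn, query):
    """Run a single-value COUNT query and return the integer result."""
    return conn.execute(query).fetchone()[0]


if __name__ == "__main__":
    db = make_toy_db()
    print("join + DISTINCT:", count(db, JOIN_QUERY))
    print("EXISTS:", count(db, EXISTS_QUERY))
    print("EXISTS as typed in the channel:", count(db, CHAT_EXISTS_QUERY))
```

On this toy data the first two queries agree, while the variant without the redirect filter also counts the redirect (page 3). Whether EXISTS is actually cheaper on the production slaves depends on the optimizer and indexes there, which this sketch does not model.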
[13:09:37] Good question [13:10:03] Let's create a ticket and let someone reasearch [13:10:16] Looks like someone found a bug and DoS'ed the sites, either intentionally or not [13:10:37] No, wasn't a DDOS and it wasn't just on eperson [13:10:48] many different client IPs [13:10:51] yep [13:10:59] also search engine bots [13:11:05] a number of google and msnbots, but not enough to look like just a crawl event [13:11:30] unless they triggered the wtp100x hits indirectly [13:11:58] (03PS1) 10BBlack: add_ip6_mapped: enable token-based SLAAC for all jessie/trusty [puppet] - 10https://gerrit.wikimedia.org/r/202725 (https://phabricator.wikimedia.org/T94417) [13:12:52] 6operations, 3Interdatacenter-IPsec, 5Patch-For-Review: Fix ipv6 autoconf issues - https://phabricator.wikimedia.org/T94417#1189425 (10BBlack) [13:16:56] Can someone check /var/log/mediawiki/updateArticleCount.log just in case? [13:18:18] (03PS1) 10Dereckson: Added media.padil.gov.au to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202726 (https://phabricator.wikimedia.org/T95328) [13:18:31] hoo@terbium:~$ ls -l /var/log/mediawiki/updateArticleCount.log [13:18:31] -rw-rw-r-- 1 www-data www-data 134439 Mar 29 20:13 /var/log/mediawiki/updateArticleCount.log [13:18:32] Nemo_bis: ^ [13:18:42] But as told... we saw the IPs hitting these off [13:18:54] So can't be the maint. script [13:19:22] (03CR) 10Hashar: "Cherry picked on integration puppetmaster." [puppet] - 10https://gerrit.wikimedia.org/r/202714 (https://phabricator.wikimedia.org/T48552) (owner: 10Hashar) [13:19:25] SiteStatsUpdate ? [13:19:35] is that deferred (and not a job?) [13:19:44] aude: Should be only a job in production [13:19:52] * aude would hope so [13:20:03] I mean the maint. 
[13:20:12] ah [13:20:14] It should never get hit of during any normal operation [13:20:26] Usually you just add/subtract from the numbers [13:20:33] (03CR) 10Steinsplitter: [C: 031] "schould be ok :-)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202726 (https://phabricator.wikimedia.org/T95328) (owner: 10Dereckson) [13:23:04] seems possibly triggered on if ( !self::isSane( $row ) ) [13:23:18] in SiteStats [13:23:50] somehow if the row was false or such, then that might happen? [13:24:25] * aude not totally sure [13:24:42] Could be... we had that in the past [13:24:55] * aude vaguely remembers that [13:24:58] i think plwiki [13:25:10] Yeah, I once patched something related to that I think [13:25:35] https://phabricator.wikimedia.org/T95426 [13:26:15] can you guys update T95426 with any findings? [13:26:24] whenever you have such things :) [13:27:36] https://wikitech.wikimedia.org/wiki/Server_admin_log/Archive_24#February_23 [13:27:45] hoo: did you end up starting an outage report, or are we still todo? I can, if you'll chime in since I wasn't actually around [13:28:03] springle: No, I didn't start one, yet [13:28:44] aude: ooh now i recall.. vaguely [13:28:52] hoo: I was wondering if it could happen that something empties the site stats table and then everyone visiting Special:Statistics causes queries [13:29:05] i don't see an incident report for that [13:36:24] (03PS8) 10Mobrovac: parsoid: Remove parsoid beta role [puppet] - 10https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633) (owner: 10Yuvipanda) [13:36:49] aude: hoo: "site outage S2 plwiki 2014-02-23 18:32:32" email to ops@. 
looks like I did a report on this a year ago, but the mentioned gerrit changet sets suggest we fixed something [13:37:11] i remember something got fixed [13:37:14] https://gerrit.wikimedia.org/r/#/c/114994/ [13:37:32] (03CR) 10Mobrovac: "Rebased and tweaked parsoid package requirements (npm and build-essential are not needed)" [puppet] - 10https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633) (owner: 10Yuvipanda) [13:38:40] that's even my change, yes :P [13:38:56] :) [13:40:04] SiteStatsInit::doAllAndCommit should be a job or something [13:41:03] maybe for a small wiki, it is ok as deferred [13:41:09] not for us [13:42:24] PROBLEM - mailman I/O stats on sodium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=67.30 Read Requests/Sec=56.70 Write Requests/Sec=2.60 KBytes Read/Sec=457.20 KBytes_Written/Sec=86.00 [13:47:35] PROBLEM - mailman I/O stats on sodium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=95.90 Read Requests/Sec=110.50 Write Requests/Sec=4.60 KBytes Read/Sec=2405.20 KBytes_Written/Sec=37.20 [13:48:57] (03PS3) 10Alexandros Kosiaris: ssh: remove lucid special casing for authorized_keys_file [puppet] - 10https://gerrit.wikimedia.org/r/202394 [13:48:59] (03PS3) 10Alexandros Kosiaris: ssh: allow parameterization of authorized_keys [puppet] - 10https://gerrit.wikimedia.org/r/202392 [13:49:01] (03PS3) 10Alexandros Kosiaris: sodium: specify the position of authorized_keys_file [puppet] - 10https://gerrit.wikimedia.org/r/202393 [13:49:03] (03PS1) 10Alexandros Kosiaris: Specify ssh userkey policy for ganeti clusters [puppet] - 10https://gerrit.wikimedia.org/r/202730 [13:49:05] (03PS1) 10Alexandros Kosiaris: ssh::userkey: Allow a prefix to be specified for a key [puppet] - 10https://gerrit.wikimedia.org/r/202731 [13:50:41] _joe_: , yt? [13:50:44] hiera q for ya [13:51:22] _joe_: oh and while you are on hiera... I am thinking we should kill mainrole from nuyaml... 
we don't use it and it is making it difficult for me to debug sometimes [13:51:34] ottomata: I can probably help as well [13:51:35] ha [13:51:37] ok [13:51:38] so [13:51:53] sometimes, i use this defaults.pp pattern [13:52:06] where module class parameter defaults get initialized from that file [13:52:07] like [13:52:23] yeah I know the pattern [13:52:26] class foo { $bar = $::foo::defaults::bar [13:52:26] yeah [13:52:26] ::params [13:52:31] yeah, exactly [13:52:41] i like that because then I can set dynamic defaults [13:52:59] that I can't set if I were to put the value directly in the module class params [13:53:03] so, for example [13:53:09] right now, i'm doing an archiva nginx proxy [13:53:13] with a [13:53:18] $use_ssl parameter [13:53:24] in labs, I'd like this to default to false [13:53:29] but true in production [13:53:38] there are a few others as well that are dependent on $::realm [13:53:53] i'd prefer to avoid having to edit hiera params in one case or the other, if the default will suffice [13:54:13] so, i can do this like: [13:54:25] # Should we use and force SSL for this nginx archiva proxy? [13:54:25] $use_ssl = hiera('archiva::use_ssl', $::realm ? { [13:54:25] 'production' => true, [13:54:25] default => false, [13:54:25] }) [13:54:32] in the class itself [13:54:43] but that wouldn't be ideal, since now the default isn't in the parameter [13:54:52] and i'm doing the hiera lookup manually [13:55:05] or, i could make a defaults file like i usually do. [13:55:23] not sure which is better, i am maybe getting an impression that folks don't like my defaults.pp files? not sure. [13:55:49] you do know that class parameters are automatically looked up in hiera, right? [13:55:59] yes [13:56:07] that's why class param is better [13:56:09] just make it a class parameter defaulting to something you like and override in hiera for production [13:56:27] and no realm lookup, no hiera() calls, no nothing [13:56:41] prod should be default, IMHO :) [13:56:45] hm. i guess.
i was hoping to avoid having to edit hiera to set the sane default [13:57:03] bblack, except that there could be more dynamic cases in labs than in production [13:57:05] more places to override [13:57:20] they don't have a place to override for all of labs in one fell swoop? [13:57:26] ottomata: you are only overriding in code and not in hiera [13:57:28] per project, ja [13:57:32] (03PS1) 10Hoo man: Create Wikidata ttl dumps on Mondays 23:00:00 (UTC) [puppet] - 10https://gerrit.wikimedia.org/r/202734 [13:57:36] <_joe_> akosiaris: i fully agree re: mainrole [13:57:36] that case block is doing exactly that [13:57:46] <_joe_> ottomata: yes I'm here, sorry, deep in coding [13:57:51] akosiaris: only in code? [13:57:54] like in the role that includes this? [13:58:05] _joe_: its ok, akosiaris is helping [13:58:06] carry on [13:58:31] no I mean that your proposed solution with the case looking up $::realm is doing the config in code and not hiera [13:58:41] well (a) it would be nice if labs had a way to set labs-wide alternate hiera defaults from prod and (b) it would be nice to have labs holding those diffs rather than prod: it gives us a nice collection of "here's the data diffs to labs", which could be a target for reduction in many cases. [13:58:43] <_joe_> akosiaris: my idea is to swap it out in favor of a direct call to the role backend :) [13:58:55] _joe_: sounds fine to me [13:59:04] <_joe_> bblack: we have all that btw [13:59:08] ottomata: so you are not really gaining that much... [13:59:13] bblack: that is what hiera does afaik [13:59:14] <_joe_> bblack: hieradata/labs.yaml [13:59:24] 13:57 < ottomata> bblack, except that there could be more dynamic cases in labs than in production [13:59:27] 13:57 < ottomata> more places to override [13:59:29] ? [13:59:50] <_joe_> hiera does what we tell it to do [13:59:53] <_joe_> btw [13:59:55] <_joe_> :) [14:00:04] chasemp: Dear anthropoid, the time has come. 
Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150408T1400). [14:00:16] ok, so if that's not a valid reason not to, I still think prod's default should be the default default :) [14:00:37] hm. [14:00:39] ok. [14:00:41] labs.yaml. [14:00:46] ok, i'm for it. thanks guys! [14:01:26] <_joe_> yes the idea is [14:01:42] <_joe_> 1) make the default what most people will use, or production for a specific class [14:01:53] <_joe_> 2) override it per dc/per realm as needed [14:02:59] I just like the idea that labs suffers any burden of special hieradata deviations, because we probably want to target eliminating those deviations where possible in the long run. [14:03:25] PROBLEM - mailman I/O stats on sodium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=122.50 Read Requests/Sec=156.30 Write Requests/Sec=39.60 KBytes Read/Sec=2044.40 KBytes_Written/Sec=2363.45 [14:03:30] heheh, bblack, i think in this case i'm fine with that idea, because archiva is mostly a production thing [14:03:37] but in general i disagree with you [14:03:48] by favoring production, you are favoring special casing, rather than abstractions [14:04:18] I'm favoring the case that matters in the real world for production, and favoring that labs loses relevance for prod testing the more it deviates from what production does. [14:05:05] naw i'm not saying that labs shoudl be favored over production [14:05:30] i'm saying that modules should be developed in a realm/environment agnostic way [14:05:38] it makes them easier to test, and easier to use in more places [14:05:58] I'm just saying, if there's a default default built in somewhere, and labs/prod realm deviates on what that default default should be, pick prod. [14:06:09] either way, you have a default built in that doesn't work *somewhere* [14:06:42] aye, i'm doing that for archiva here, because i don't care much about archiva module. e.g. 
I am setting default $certificate_name = 'archiva.wikimedia.org' [14:06:50] but, in general, i don't like doing this [14:06:57] i would just as soon not give a default [14:07:32] not having a default at all (in the code) is reasonable too, then we can see the diff clearly by comparing the two sets of hieradata [14:08:01] I just don't like having a default that works for labs and not prod and ending up with a prod-specific override and relying-on-default in labs. [14:09:13] things like $cert_name will always be different anyways, but things like $use_ssl shouldn't be. that's something we want to target eliminating. [14:10:41] aye, bblack, makes sense to not favor labs over prod. [14:10:53] well, except if I enabled ssl for this in labs, i'm not so sure I could reach the instance without a tunnel [14:10:59] i don't think yuvi's labs proxy works with ssl [14:12:06] 6operations, 6Commons, 6Multimedia: 220px of Fachada_e_lateral_da_Catedral_São_Sebastião_após_pintura,_Coronel_Fabriciano_MG.JPG not purging - https://phabricator.wikimedia.org/T95333#1189499 (10fgiunchedi) indeed Brandon is correct and I jumped the gun, the file is on one of the handoff nodes: ``` root@ms-... [14:13:17] ottomata: yeah we already have lots of little deviations like that for pragmatic reasons. they're necessary in the present, but they make a good set of things to target fixing in the long run. [14:13:30] the fewer of them there are, the more-valid labs testing is wrt prod. [14:13:55] RECOVERY - mailman I/O stats on sodium is OK: OK - I/O stats: Transfers/Sec=23.30 Read Requests/Sec=1.00 Write Requests/Sec=9.50 KBytes Read/Sec=4.00 KBytes_Written/Sec=709.30 [14:14:19] _joe_: I've been meaning to ask. How can I test role lookups in the hiera CLI? [14:15:58] I usually do something like RUBYLIB="./modules/wmflib/lib/" hiera -d -c hiera.yaml "a::b::c" "::site=eqiad" "::hostname=blah" etc [14:17:51] springle, re that outage: is there a way to find out the contents of site_stats before that storm?
i.e. was it empty or was there some bogus value? [14:18:28] MaxSem: Sure, binlogs are your friend [14:18:51] And you have the state of 24h ago on the delayed-replication dbstore [14:20:34] well, I *don't*:P [14:20:50] because I'm travelling and have no server keys atm [14:20:51] <_joe_> akosiaris: mh "::_roles=['myrole']" I think [14:21:44] <_joe_> akosiaris: but lemme check [14:22:35] _joe_: you already pointed me in the right direction... ::_roles= sure uses the function [14:22:54] it's just the argument that is failing now... [14:22:57] reading code [14:23:33] <_joe_> topscope_var = '::_roles' roles = scope[topscope_var] [14:23:35] <_joe_> oh sorry [14:23:39] <_joe_> it's a hash, right [14:24:01] <_joe_> because that's the only puppet data structure that can be modified [14:24:40] <_joe_> so something like "::_roles = { 'myrole' => true }" [14:24:44] <_joe_> if that even works [14:28:14] <_joe_> akosiaris: if you make any progress, lemme know [14:30:36] _joe_: I did [14:30:39] roles = eval(scope[topscope_var]) [14:30:48] need to make the string a hash and it works [14:30:55] <_joe_> ugh that is horrible [14:30:56] <_joe_> :) [14:31:05] <_joe_> and won't work in prod [14:31:12] it doesn't have to [14:31:18] <_joe_> what cli did you use exactly [14:31:31] CHANGE=202393; RUBYLIB="/tmp/catalog-differ/akosiaris/$CHANGE/production/src/modules/wmflib/lib/" hiera -d -c /tmp/catalog-differ/akosiaris/$CHANGE/production/src/hiera.yaml "ssh::server::authorized_keys_file" "::site=codfw" "::fqdn=ganeti2001.codfw.wmnet" "::hostname=ganeti2001" "::_roles={'role::ganeti' => true} [14:31:32] (03PS1) 10Dereckson: Added ymt.adlibhosting.com to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202736 [14:31:35] helpful ?
:P [14:31:47] oh, I forgot one " at the end [14:31:51] but you get the idea [14:32:10] obviously evaling is a bad bad idea [14:32:28] I can probably write a patch to check the type of roles [14:32:41] and if it is a string to parse it as json and then turn it to a ruby object [14:32:44] way way way safer [14:32:51] <_joe_> I was about to suggest something like that [14:33:01] <_joe_> which key were you looking up there? [14:33:02] (03PS2) 10Dereckson: Added ymt.adlibhosting.com to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202736 (https://phabricator.wikimedia.org/T95418) [14:33:12] <_joe_> or better, what were you trying to do? [14:33:26] override ssh::server::authorized_keys_file for ganeti clusters [14:33:42] long story short, ganeti uses ssh and needs ssh root access [14:34:16] so to comply with our policy of /etc/ssh/userkeys/%u I patched up ssh::server to allow overriding AuthorizedKeysFile via hiera [14:34:26] and it works fine for example for sodium [14:34:35] <_joe_> ok got it [14:34:35] now I am pondering the best way to do it for ganeti clusters [14:34:55] one way is https://gerrit.wikimedia.org/r/#/c/202730/ [14:35:02] using regex.yaml [14:35:12] but I am wondering if I should use role/ganeti.yaml [14:35:19] can't make it work so far though [14:35:43] <_joe_> you should, do you have a patch?
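A minimal sketch of the safer alternative to eval described at 14:32 above: if the ::_roles value is a string, parse it as JSON and accept only a hash. The helper name parse_roles is hypothetical and is not taken from the wmflib hiera backend; the sketch assumes that the hiera CLI hands scope values over as strings while a real Puppet run supplies an actual hash.

```ruby
require 'json'

# Hypothetical helper (an assumption, not the actual backend code):
# turn the value of the '::_roles' scope variable into a Hash without eval.
def parse_roles(raw)
  # During a normal Puppet run the variable is assumed to already be a Hash.
  return raw if raw.is_a?(Hash)

  # An unset variable is treated as "no roles".
  return {} if raw.nil?

  # Anything that is neither a Hash nor a String is rejected outright.
  unless raw.is_a?(String)
    raise ArgumentError, "unsupported ::_roles value of type #{raw.class}"
  end

  begin
    # JSON.parse only builds data; unlike eval it cannot execute Ruby code.
    parsed = JSON.parse(raw)
  rescue JSON::ParserError => e
    raise ArgumentError, "::_roles is not valid JSON: #{e.message}"
  end

  # Valid JSON that is not an object (e.g. an array) is rejected as well.
  raise ArgumentError, '::_roles must be a JSON object' unless parsed.is_a?(Hash)

  parsed
end
```

With a helper like this, the CLI argument would have to be written as JSON, e.g. "::_roles={\"ganeti\": true}", and the Ruby hash-literal form used in the command above would be rejected instead of evaluated.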
[14:36:02] <_joe_> or, I can prepare one [14:37:09] <_joe_> akosiaris: I think I know why [14:37:15] <_joe_> it doesn't work for you [14:37:34] <_joe_> you should move include role::ganeti to the top of the node defs [14:37:39] <_joe_> and write it like [14:37:44] <_joe_> "role ganeti" [14:38:00] yeah that was the next step [14:38:07] after making sure hiera would actually find it [14:38:14] <_joe_> oh it would [14:38:15] <_joe_> :) [14:39:22] hmmm, now that I can debug it and see all the lookups, I am gonna say yes :-) [14:39:31] let's amend that change [14:39:53] hello:) when i have, as an example, class role::dumps::zim do i put that into role/dumps.pp so there can be multiple classes in that file, or do we want it to be like in modules and role/dumps/zim.pp with directories [14:40:03] (we are doing both of course) [14:40:30] mutante: go for a single file for now [14:40:30] <_joe_> mutante: ideally, we'd use the right form everywhere [14:40:41] (03PS1) 10Dereckson: Throttle rule for Editatón Ciencia y Tecnología en Chile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202740 (https://phabricator.wikimedia.org/T95302) [14:40:42] lol [14:40:44] :) hehe [14:41:05] seriously my take is for now a single file. modularization of roles is not very far down the road [14:41:11] (03PS1) 10Legoktm: Add fatal log group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202741 (https://phabricator.wikimedia.org/T89169) [14:41:16] and one extra class won't save us [14:42:15] ok. and actually only analytics has a sub directory so far [14:42:51] but.. i get reminded too because puppet-lint will tell us all those role classes are not in autoload format all the time [14:43:07] role is in bad format anyway [14:43:15] but i also dont want that check to be disabled globally [14:43:17] but as I said, modularizing it is not far [14:43:23] ok [14:46:53] _joe_: hieradata/role/common/role/ganeti.yaml ??? [14:47:02] ETOOMANYROLES ? 
[14:47:16] it's a bit funny to see role and then again role in there [14:47:33] <^d> it's roles all the way down [14:47:36] one more question, is zotero a service as in "service zotero start"? [14:47:44] (like citoid) [14:47:46] mutante: yes [14:47:49] ok, thx [14:49:05] (03CR) 10Dzahn: [C: 032] "just a start, it's not included anywhere yet" [puppet] - 10https://gerrit.wikimedia.org/r/200725 (https://phabricator.wikimedia.org/T94457) (owner: 10Dzahn) [14:49:24] _joe_: scratch that out. my bad passing "::_roles={'role::ganeti' => true} instead of "::_roles={'ganeti' => true} [14:50:44] (03PS1) 10MaxSem: WIP: Hierator puppetization [puppet] - 10https://gerrit.wikimedia.org/r/202743 [14:50:53] (03CR) 10Dzahn: [C: 032] "role class can be tested in labs this way, isn't applied on prod node yet" [puppet] - 10https://gerrit.wikimedia.org/r/200790 (https://phabricator.wikimedia.org/T94457) (owner: 10Dzahn) [14:51:09] (03CR) 10Manybubbles: [C: 031] Create Wikidata ttl dumps on Mondays 23:00:00 (UTC) [puppet] - 10https://gerrit.wikimedia.org/r/202734 (owner: 10Hoo man) [14:51:31] legoktm: Ping for SWAT in about 8.5 minutes [14:51:38] pong :) [14:52:34] (03CR) 10Dzahn: "if i put it in hieradata/common/dumps.yaml will role/dumps.pp find it without a further change?" 
[puppet] - 10https://gerrit.wikimedia.org/r/202637 (owner: 10Dzahn) [14:52:50] (03PS2) 10Alexandros Kosiaris: ssh::userkey: Allow a prefix to be specified for a key [puppet] - 10https://gerrit.wikimedia.org/r/202731 [14:52:52] (03PS2) 10Alexandros Kosiaris: Specify ssh userkey policy for ganeti clusters [puppet] - 10https://gerrit.wikimedia.org/r/202730 [14:52:54] (03PS4) 10Alexandros Kosiaris: ssh: remove lucid special casing for authorized_keys_file [puppet] - 10https://gerrit.wikimedia.org/r/202394 [14:52:56] (03PS4) 10Alexandros Kosiaris: ssh: allow parameterization of authorized_keys [puppet] - 10https://gerrit.wikimedia.org/r/202392 [14:52:58] (03PS4) 10Alexandros Kosiaris: sodium: specify the position of authorized_keys_file [puppet] - 10https://gerrit.wikimedia.org/r/202393 [14:53:26] legoktm: Eew, needs scap ;) [14:53:55] I don't mind doing it if you don't want to [14:54:12] I might just take you up on that one. [14:56:05] Hi. We've a throttle rule for an event this week-end: https://gerrit.wikimedia.org/r/#/c/202740/ [14:56:27] Dereckson: For SWAT? Add it to the deployments page, please [14:59:11] (03PS4) 10coren: WIP: Proper labs_storage class [puppet] - 10https://gerrit.wikimedia.org/r/199267 (https://phabricator.wikimedia.org/T85606) [14:59:19] <_joe_> akosiaris: sorry, coding, now interview [14:59:33] 7Blocked-on-Operations, 6operations, 5Patch-For-Review: Install nodejs, nginx and other dependencies on francium - https://phabricator.wikimedia.org/T94457#1189597 (10Dzahn) As a step forward i created class `role::dumps::zim` and `modules/dumps/manifests/zim.pp` which install nodejs, nodejs-legacy, libsqlit... [14:59:54] 6operations, 10ops-eqiad: Make dysprosium physical storage layout match other caches - https://phabricator.wikimedia.org/T95424#1189598 (10Cmjohnson) 5Open>3declined Disks have been swapped. Adding the 2 2.5" 250GB Disk to spares [15:00:04] manybubbles, anomie, ^d, thcipriani, marktraceur, anomie: Dear anthropoid, the time has come. 
Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150408T1500). [15:00:17] Ok, let's start with the config changes because they're fast. [15:00:37] 6operations, 10ops-eqiad: labnodepool1001 setup tasks: labels/ports/racktables - https://phabricator.wikimedia.org/T95048#1189602 (10Cmjohnson) ge-3/0/18 is the correct port. [15:00:41] (03CR) 10Anomie: [C: 032] Add fatal log group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202741 (https://phabricator.wikimedia.org/T89169) (owner: 10Legoktm) [15:00:48] 6operations, 3Continuous-Integration-Isolation: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1189604 (10Cmjohnson) [15:00:50] 6operations, 10ops-eqiad: labnodepool1001 setup tasks: labels/ports/racktables - https://phabricator.wikimedia.org/T95048#1189603 (10Cmjohnson) 5Open>3Resolved [15:01:11] 6operations, 6Commons, 6Multimedia: 220px of Fachada_e_lateral_da_Catedral_São_Sebastião_após_pintura,_Coronel_Fabriciano_MG.JPG not purging - https://phabricator.wikimedia.org/T95333#1189605 (10fgiunchedi) so the object-replicator attempted to rsync the 41724 partition and failed, given that we are also reb... [15:01:16] anomie: added [15:01:53] (03CR) 10coren: "This is nearing deployworthy." [puppet] - 10https://gerrit.wikimedia.org/r/199267 (https://phabricator.wikimedia.org/T85606) (owner: 10coren) [15:02:50] Ooh, Zuul is fancier. Grr, config change stuck behind a random core merge. [15:03:46] (03CR) 10Anomie: [V: 032] "Sigh. Jenkins passed the tests, but won't merge because a mediawiki/core merge is clogging the pipe." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/202741 (https://phabricator.wikimedia.org/T89169) (owner: 10Legoktm) [15:04:02] oops >.> [15:04:10] !log anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Add fatal log group [[gerrit:202741]] (duration: 00m 12s) [15:04:11] legoktm: ^ Test please [15:04:14] Logged the message, Master [15:04:33] (03PS3) 10Anomie: Add REL1_25 branches to ExtDist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202591 (owner: 10Chad) [15:04:42] That's what "Publish and submit" is for ;) [15:05:00] anomie: yay I see fatals! [15:05:03] I mean :P [15:05:19] (03CR) 10Anomie: [C: 032] Add REL1_25 branches to ExtDist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202591 (owner: 10Chad) [15:05:33] (03Merged) 10jenkins-bot: Add REL1_25 branches to ExtDist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202591 (owner: 10Chad) [15:05:48] (03PS1) 10Dzahn: admin: add admin group for zotero [puppet] - 10https://gerrit.wikimedia.org/r/202746 (https://phabricator.wikimedia.org/T95400) [15:06:09] legoktm: ... Huh. That's weird, because I actually forgot to pull the changed file before syncing it. [15:06:18] uhh [15:06:19] (03CR) 10GWicke: "fwiw, npm is still needed if we want to debug memory leaks in production. The normal procedure to do this is to" [puppet] - 10https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633) (owner: 10Yuvipanda) [15:06:30] ˙^d: Can I get a hand with a gerrit account? And/or can you refer me to whoever is the new person for that kind of thing? 
[15:06:31] !log anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Add fatal log group [[gerrit:202741]] (for real this time) (duration: 00m 13s) [15:06:34] Logged the message, Master [15:06:37] (03PS1) 10BBlack: sync cache nodelists to hieradata [puppet] - 10https://gerrit.wikimedia.org/r/202747 [15:06:47] legoktm: ^ There [15:06:56] hmm, I need to cause a fatal somehow [15:07:11] !log anomie Synchronized wmf-config/CommonSettings.php: SWAT: Add REL1_25 branches to ExtDist [[gerrit:202591]] (duration: 00m 11s) [15:07:13] legoktm: Here, I'll merge your other patch and maybe that'll do it ;) [15:07:13] ^d: hm, nevermind, I may have found docs! [15:07:14] (03PS2) 10BBlack: sync cache nodelists to hieradata [puppet] - 10https://gerrit.wikimedia.org/r/202747 [15:07:15] Logged the message, Master [15:07:18] >.< [15:07:33] (03CR) 10GWicke: "Also, build-essential as well." [puppet] - 10https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633) (owner: 10Yuvipanda) [15:07:47] (03PS2) 10Anomie: Throttle rule for Editatón Ciencia y Tecnología en Chile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202740 (https://phabricator.wikimedia.org/T95302) (owner: 10Dereckson) [15:08:14] Thank you. [15:08:14] (03CR) 10GWicke: [C: 04-1] parsoid: Remove parsoid beta role [puppet] - 10https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633) (owner: 10Yuvipanda) [15:08:15] anomie: extensiondistributor patch looks good [15:08:17] <^d> andrewbogott: Hm, whassup? [15:08:23] (03CR) 10John F. Lewis: [C: 031] "It's the usual sane patch which from what I can see is correct so... lgtm." 
[puppet] - 10https://gerrit.wikimedia.org/r/202746 (https://phabricator.wikimedia.org/T95400) (owner: 10Dzahn) [15:08:26] (03CR) 10Anomie: [C: 032] Throttle rule for Editatón Ciencia y Tecnología en Chile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202740 (https://phabricator.wikimedia.org/T95302) (owner: 10Dereckson) [15:08:28] (03CR) 10Mobrovac: admin: add admin group for zotero (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/202746 (https://phabricator.wikimedia.org/T95400) (owner: 10Dzahn) [15:08:30] (03Merged) 10jenkins-bot: Throttle rule for Editatón Ciencia y Tecnología en Chile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202740 (https://phabricator.wikimedia.org/T95302) (owner: 10Dereckson) [15:08:32] bah, the docs are wrong, the db it indicates don’t exist. [15:09:04] ^d: just a user rename. I renamed Moritz in ldap already, probably best to just erase his local gerrit account and let it create a new one. [15:09:05] !log anomie Synchronized wmf-config/throttle.php: SWAT: Throttle rule for Editatón Ciencia y Tecnología en Chile [[gerrit:202740]] (duration: 00m 11s) [15:09:06] Dereckson: ^ I suppose there's no way to check that. [15:09:09] Logged the message, Master [15:09:12] (03CR) 10BBlack: [C: 032] sync cache nodelists to hieradata [puppet] - 10https://gerrit.wikimedia.org/r/202747 (owner: 10BBlack) [15:09:30] legoktm: I just saw two fatals go into fatal.log, is that it? [15:09:39] yeah but the log is still useless :/ [15:09:40] #0 (): MWExceptionHandler::handleFatalError() [15:09:40] #1 {main} [15:09:43] hoo|away: ^ [15:09:43] ^d: I believe his new name is ‘Muehlenhoff’ and his old name was ‘Moritz Mühlenhoff' [15:09:52] which, the latter didn’t work due to umlaut [15:10:05] well, slightly better now that they're being logged [15:10:19] <^d> andrewbogott: If you erase the account he'll lose his existing history [15:10:20] Apart syntaxical basic tests for date or IP format, nope. 
[15:10:20] (03CR) 10Papaul: [C: 031] remove rbf* production dns [dns] - 10https://gerrit.wikimedia.org/r/202298 (https://phabricator.wikimedia.org/T95153) (owner: 10John F. Lewis) [15:10:22] patch is fine though [15:10:25] <^d> (comments, etc won't be associated) [15:10:41] (03CR) 10John F. Lewis: admin: add admin group for zotero (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/202746 (https://phabricator.wikimedia.org/T95400) (owner: 10Dzahn) [15:10:45] PROBLEM - HHVM rendering on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:10:50] (03CR) 10Dzahn: admin: add admin group for zotero (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/202746 (https://phabricator.wikimedia.org/T95400) (owner: 10Dzahn) [15:11:14] ^d: as far as I know he’s never logged in. [15:11:15] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:11:15] PROBLEM - Apache HTTP on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:11:21] But, whatever is easiest for you to do, you should do. [15:11:25] <^d> andrewbogott: In which case nbd :) [15:11:26] PROBLEM - HHVM rendering on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:11:29] 6operations, 10ops-codfw: setup and deploy mw2135 through mw2215 - https://phabricator.wikimedia.org/T86806#1189639 (10Papaul) @Joe and RobH can I resolve this task? [15:11:35] <^d> If he's never logged in then he can just login with the new ldap account [15:11:48] ^d: yes, except he can't. [15:11:51] (03CR) 10Dzahn: [C: 032] "just adds an empty group, adding a member will be separate because access request rules" [puppet] - 10https://gerrit.wikimedia.org/r/202746 (https://phabricator.wikimedia.org/T95400) (owner: 10Dzahn) [15:11:54] So maybe something else is happening [15:12:02] ^d: https://phabricator.wikimedia.org/T94717 [15:12:23] moritzm: ^ ? [15:12:26] <^d> Weird. [15:12:44] Actually, I think he probably did log in once but didn’t do anything. 
[15:12:54] PROBLEM - HHVM queue size on mw1198 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [80.0] [15:12:56] Now that I think about it — the checklist says that he had gerrit access but hadn’t tested +2 [15:13:21] ^d: I would DIY but the docs on https://wikitech.wikimedia.org/wiki/Renaming_users seem to be wrong? There’s no ‘reviewdb’ on that host. [15:13:37] <^d> It got moved? [15:13:52] <^d> That might be out of date, but the instructions generally speaking haven't changed [15:14:06] <^d> It's on whatever makes up m2 I think now [15:14:18] Ah, the old outdated docs issue :) [15:14:22] Is there some… way that I can locate a database if I know its name but not its host? [15:14:35] PROBLEM - HHVM queue size on mw1208 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [80.0] [15:15:00] (03PS1) 10Dzahn: access: add mobrovac to zotero-admins [puppet] - 10https://gerrit.wikimedia.org/r/202748 (https://phabricator.wikimedia.org/T95400) [15:15:28] <^d> andrewbogott: templates/mariadb/production-grants-m2.sql.erb points to 10.64.0.166? [15:15:35] PROBLEM - HHVM busy threads on mw1208 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [115.2] [15:15:43] moritzm: love the new nick, easier to see than jmm (or what it was before :p) [15:15:45] PROBLEM - HHVM busy threads on mw1198 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [115.2] [15:16:07] (03CR) 10Mobrovac: "@GWicke, if that's the case, then isn't it better to have heapdump as a dep directly? That way npm and build-essential would not be needed" [puppet] - 10https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633) (owner: 10Yuvipanda) [15:16:52] <^d> andrewbogott: Alternatively, you can connect to the DB via gerrit's ssh. `ssh -p 29418 gerrit.wikimedia.org gerrit gsql` [15:17:21] ^d: are you able/willing to update those docs? Then I’ll try to do this myself. [15:17:29] (03CR) 10Dzahn: "added an admin group for this. 
now this is the actual access request to keep it easy" [puppet] - 10https://gerrit.wikimedia.org/r/202748 (https://phabricator.wikimedia.org/T95400) (owner: 10Dzahn) [15:17:54] that gerrit link seems to work fine [15:18:08] <^d> andrewbogott: Yeah, I'll tweak the docs to suggest using gerrit ssh to connect to the DB [15:18:13] <^d> Then we don't have to worry about where the DB is [15:19:32] 6operations, 10ops-codfw: setup and deploy mw2135 through mw2215 - https://phabricator.wikimedia.org/T86806#1189649 (10RobH) 5Open>3Resolved They all ping online. So the installation part of this is done. So all the onsite work is done. [15:20:23] <^d> andrewbogott: https://wikitech.wikimedia.org/w/index.php?title=Renaming_users&diff=152744&oldid=128712 [15:20:54] andrewbogott: "I believe his new name is ‘Muehlenhoff’ and his old name was ‘Moritz Mühlenhoff'" is correct, I'm fine with losing the gerrit history [15:20:57] (03CR) 10Rush: [C: 04-1] "https://phabricator.wikimedia.org/T94946#1189655" [puppet] - 10https://gerrit.wikimedia.org/r/202708 (https://phabricator.wikimedia.org/T94946) (owner: 10Aklapper) [15:20:59] thx [15:21:13] JohnFLewis: you have to thank whoever preregisted jmm on nickserv :-) [15:21:49] anomie: can I do my scap now or are we still waiting on something? [15:22:01] legoktm: One patch ahead of you [15:22:01] (03PS2) 10Rush: phab: indentation fixes in role class [puppet] - 10https://gerrit.wikimedia.org/r/202642 (https://phabricator.wikimedia.org/T93645) (owner: 10Dzahn) [15:22:15] (03CR) 10Rush: [C: 032 V: 032] phab: indentation fixes in role class [puppet] - 10https://gerrit.wikimedia.org/r/202642 (https://phabricator.wikimedia.org/T93645) (owner: 10Dzahn) [15:22:17] andrewbogott: please ping me when I should re-test a login [15:22:21] moritzm: love how they used the account for two weeks and went 'this isn't for me' and left :D [15:22:38] omg this is annoying to do without arrow keys [15:23:28] moritzm: try now? 
[15:23:38] 6operations, 7Swift: rsync errors slowing down object-replicator - https://phabricator.wikimedia.org/T95429#1189667 (10fgiunchedi) 3NEW a:3fgiunchedi [15:24:04] (03PS12) 10Ottomata: Set up https with archiva certificate for archiva.wikmedia.org [puppet] - 10https://gerrit.wikimedia.org/r/202474 (https://phabricator.wikimedia.org/T88139) [15:24:05] <^d> andrewbogott: Also added a note about the caching at the end [15:24:13] <^d> (it doesn't require a restart, just not-stale-caches) [15:24:18] (03CR) 10GWicke: "The reasons for why we decided to no longer require heapdump by default are:" [puppet] - 10https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633) (owner: 10Yuvipanda) [15:24:27] moritzm: btw gerrit, i added you those https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet+branch:production+topic:sshprotocols,n,z [15:24:35] ^d: that matters! [15:24:53] !log anomie Synchronized php-1.25wmf24/includes/page/Article.php: SWAT: More debugging for [[phab:T92046]] ([[gerrit:202602]], [[gerrit:202603]]) (duration: 00m 13s) [15:24:54] anomie: ^ Test please [15:24:59] Logged the message, Master [15:25:03] just if you feel like it and i categorized it as security [15:25:24] (03PS1) 10Alexandros Kosiaris: hiera nuyaml: disable mainrole lookups [puppet] - 10https://gerrit.wikimedia.org/r/202749 [15:25:28] anomie: Editing isn't broken. Can't really test the error logging. [15:25:32] 6operations, 10ops-codfw: mw2128 not rebooting after network driver crash, blank console - https://phabricator.wikimedia.org/T95264#1189695 (10Papaul) After working with the Dell Engineer he came to the conclusion that the main board needs to be replaced. A tech will come and replace the main board on the syst... 
[15:25:33] legoktm: Ok, all yours [15:25:40] (03PS13) 10Ottomata: Set up https with archiva certificate for archiva.wikmedia.org [puppet] - 10https://gerrit.wikimedia.org/r/202474 (https://phabricator.wikimedia.org/T88139) [15:26:50] mutante: yep, I've checked these already and wrote them in a text file, but couldn't submit the status yet due to the failing gerrit login :-) will commit them later on [15:26:56] 6operations, 6Commons, 6Multimedia: 220px of Fachada_e_lateral_da_Catedral_São_Sebastião_após_pintura,_Coronel_Fabriciano_MG.JPG not purging - https://phabricator.wikimedia.org/T95333#1189708 (10fgiunchedi) 5Open>3Resolved a:3fgiunchedi I've manually removed the file from the handoff and purged it, I'm... [15:27:36] andrewbogott: works now, thank! [15:27:46] moritzm: want to try out your +2 while you’re in there? [15:27:56] (thanks ^d) [15:28:30] <^d> yw [15:28:37] (03CR) 10Andrew Bogott: limn: minor lint and Resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [15:28:50] moritzm: try that one ^ [15:29:26] hmm, where should the +2 appear? 
[15:29:52] (03CR) 10GWicke: "I've created https://phabricator.wikimedia.org/T95431 for the heapdump discussion to make sure that we have that info around in the future" [puppet] - 10https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633) (owner: 10Yuvipanda) [15:30:08] <^d> moritzm: When you do a review on a patch, you should see more voting options now [15:30:14] moritzm: click ‘add comment’ way on the bottom [15:30:31] Try to +2, and publish (but don’t submit) [15:30:35] 6operations, 10Parsoid, 7service-runner: Decide whether to install heapdump by default, or install npm & install on demand - https://phabricator.wikimedia.org/T95431#1189731 (10GWicke) [15:30:40] then see if it takes, or you get an angry black screen [15:30:56] 6operations, 10Parsoid, 7service-runner: Decide whether to install heapdump by default, or continue to install npm & install on demand - https://phabricator.wikimedia.org/T95431#1189734 (10GWicke) [15:31:07] ah, yes. when I click on "add comment" I get the +2 options [15:31:20] I'll be unavailable for 30min, but it seems to work [15:31:30] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: onboarding Moritz Muehlenhoff in ops - https://phabricator.wikimedia.org/T94717#1189735 (10Andrew) [15:32:10] moritzm: I don’t see your review, maybe you just didn’t publish. [15:32:34] mutante: what does ‘icinga user and permissions’ refer to? 
[15:32:51] (03PS14) 10Ottomata: Set up https with archiva certificate for archiva.wikmedia.org [puppet] - 10https://gerrit.wikimedia.org/r/202474 (https://phabricator.wikimedia.org/T88139) [15:34:59] andrewbogott: I think it means what you have, so the ability to acknowledge things, toggle service notifications as well as the standard Icinga contact stanzas [15:35:15] 6operations, 10Parsoid, 7service-runner: Decide whether to install heapdump by default, or continue to install npm & install on demand - https://phabricator.wikimedia.org/T95431#1189748 (10GWicke) [15:35:50] (03PS15) 10Ottomata: Set up https with archiva certificate for archiva.wikmedia.org [puppet] - 10https://gerrit.wikimedia.org/r/202474 (https://phabricator.wikimedia.org/T88139) [15:36:53] !log legoktm Started scap: Log promote to global renames in the global rename log https://gerrit.wikimedia.org/r/202742 [15:36:57] Logged the message, Master [15:37:17] :o new fancy scap logo? [15:37:27] (03PS16) 10Ottomata: Set up https with archiva certificate for archiva.wikmedia.org [puppet] - 10https://gerrit.wikimedia.org/r/202474 (https://phabricator.wikimedia.org/T88139) [15:37:36] (03CR) 10Ottomata: [C: 032 V: 032] Set up https with archiva certificate for archiva.wikmedia.org [puppet] - 10https://gerrit.wikimedia.org/r/202474 (https://phabricator.wikimedia.org/T88139) (owner: 10Ottomata) [15:40:21] (03PS1) 10Andrew Bogott: Rename Moritz to his new ldap name [puppet] - 10https://gerrit.wikimedia.org/r/202752 [15:40:51] (03PS1) 10Ottomata: Make sure archiva certificate is installed before refreshing nginx service [puppet] - 10https://gerrit.wikimedia.org/r/202753 [15:40:54] (03PS2) 10Andrew Bogott: Rename Moritz to his new ldap name [puppet] - 10https://gerrit.wikimedia.org/r/202752 (https://phabricator.wikimedia.org/T94717) [15:44:47] (03CR) 10Dzahn: "i believe it's Muehlenhoff, not Muelenhoff" [puppet] - 10https://gerrit.wikimedia.org/r/202752 (https://phabricator.wikimedia.org/T94717) (owner: 10Andrew 
Bogott) [15:46:11] dang [15:46:19] (03PS1) 10Dereckson: New WP and VP namespaces aliases on lv.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202755 (https://phabricator.wikimedia.org/T95106) [15:46:20] andrewbogott: hmm. i just checked on terbium [15:46:24] but i cant find it? [15:46:30] thanks for doing the rename, btw! [15:46:36] mutante: look for jmm [15:46:59] ah, and the cn is used for icinga, right [15:47:01] cn: Muehlenhoff [15:47:07] ok, will fix thanks [15:47:11] so that, yep, thank you too [15:48:00] (03CR) 10Ottomata: [C: 032] Make sure archiva certificate is installed before refreshing nginx service [puppet] - 10https://gerrit.wikimedia.org/r/202753 (owner: 10Ottomata) [15:48:22] (03PS3) 10Andrew Bogott: Rename Moritz to his new ldap name [puppet] - 10https://gerrit.wikimedia.org/r/202752 (https://phabricator.wikimedia.org/T94717) [15:48:55] (03CR) 10Muehlenhoff: [C: 032 V: 032] limn: minor lint and Resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [15:49:04] (03CR) 10Dzahn: [C: 031] "[terbium:~] $ ldaplist -l passwd jmm" [puppet] - 10https://gerrit.wikimedia.org/r/202752 (https://phabricator.wikimedia.org/T94717) (owner: 10Andrew Bogott) [15:49:17] 6operations, 5Patch-For-Review: Force https for archiva.wikimedia.org - https://phabricator.wikimedia.org/T88139#1189867 (10Ottomata) 5Open>3Resolved [15:49:20] (03PS1) 10Alexandros Kosiaris: Allow hiera role_backend to be debuggable via hiera CLI [puppet] - 10https://gerrit.wikimedia.org/r/202756 [15:49:27] andrewbogott: I'm back, I just did so now [15:49:45] moritzm: looks good! [15:49:50] How can I process https://phabricator.wikimedia.org/T15712 this issue now? [15:49:56] it passed it's security review [15:49:58] and you just merged stuff, nice! [15:50:17] moritzm: oh, but for the record — you’ll almost never want to tick the ‘verified’ boxes — those are usually handled by testbots. 
[15:50:17] cmjohnson1: mornin :) [15:50:18] bblack: https://gerrit.wikimedia.org/r/#/c/200732/ [15:50:22] (03CR) 10Alexandros Kosiaris: [C: 032] access: add mobrovac to zotero-admins [puppet] - 10https://gerrit.wikimedia.org/r/202748 (https://phabricator.wikimedia.org/T95400) (owner: 10Dzahn) [15:50:26] hi ottomata [15:50:32] moritzm: you know about puppet-merge on the master yet? [15:50:43] just saw it, i must have not ytped yes correctly bfore [15:50:46] mutante: he didn’t merge, just +2’d. [15:50:47] you can merge that [15:50:52] moritzm: and yea, what andrew said, give jenkins a chance to do the V part of it [15:50:55] cmjohnson1: https://phabricator.wikimedia.org/T95263 [15:50:56] ah [15:51:04] ottomata: merged yours as well [15:51:05] or i can right now [15:51:07] ok danke [15:51:12] ottomata: yep saw that [15:51:15] ok cool [15:51:18] thanks [15:51:23] andrewbogott: gotcha, cool, confirms the +2 part [15:51:26] (03CR) 10Andrew Bogott: [C: 032] Rename Moritz to his new ldap name [puppet] - 10https://gerrit.wikimedia.org/r/202752 (https://phabricator.wikimedia.org/T94717) (owner: 10Andrew Bogott) [15:51:32] (03PS3) 10Dereckson: Added ymt.adlibhosting.com to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202736 (https://phabricator.wikimedia.org/T95418) [15:51:32] andrewbogott, mutante: shall I undo +2 somehow? [15:51:40] haha mutante: Merge these changes? (yes/no)? ues [15:51:42] (03PS4) 10Dereckson: Added ymt.adlibhosting.com to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202736 (https://phabricator.wikimedia.org/T95418) [15:51:43] ottomata: more than likely that is a new mainboard [15:51:50] moritzm: doesn’t matter, I’m going to merge that shortly anyway [15:51:51] oof [15:51:54] nasty [15:52:02] andrewbogott: ok [15:52:06] arlolra: I looked at that a bit last week, but it seems unclear to me. 
The existing parameters on various clusters for retry5xx and related are a mixed bag that probably needs cleaning up in general... [15:52:08] moritzm: nah, it's ok [15:52:12] mutante: not yet, do you have a wiki reference? [15:52:31] 10Ops-Access-Requests, 6operations, 6Services, 5Patch-For-Review: Allow mobrovac to restart Zotero - https://phabricator.wikimedia.org/T95400#1189877 (10akosiaris) 5Open>3Resolved a:3akosiaris Change merged, @mobrovac you should now have access to restart zotero. Btw, when deploying new translators,... [15:52:39] actually, moritzm, why don’t you merge it as an exercise? [15:52:52] First get yourself a shell session on palladium.eqiad.wmnet [15:53:11] moritzm: so if you would actually merge stuff, which you didnt. then you would also have to go to the puppetmaster and "sudo puppet-merge" there. it's a sanity check a human has to do [15:53:26] otherwise you wouldnt see an effect on the nodes [15:54:41] 10Ops-Access-Requests, 6operations, 6Services, 5Patch-For-Review: Allow mobrovac to restart Zotero - https://phabricator.wikimedia.org/T95400#1189886 (10akosiaris) And I just realized I violated the policy of discussing sudo right escalation in meetings. It seemed so obvious and non controversial change to... [15:54:48] (03PS1) 10Andrew Bogott: Remove icinga rights for a couple of departed employees. [puppet] - 10https://gerrit.wikimedia.org/r/202759 [15:54:53] andrewbogott, mutante: I can do the merge , but need to a shop first which closes at 19h, I'll catch one of you later the evening, then? [15:55:03] bblack: ah, ok, thanks. wasn't sure if you saw it. 
this just makes it consistent with the others that have 'retry5xx' => 1, not positive it'll fix it but probably won't hurt either [15:55:08] moritzm: yeah, it doesn’t matter, you’ll get your chance eventually :) [15:55:21] k, will ping you later [15:55:30] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: onboarding Moritz Muehlenhoff in ops - https://phabricator.wikimedia.org/T94717#1189888 (10Andrew) [15:55:44] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: onboarding Moritz Muehlenhoff in ops - https://phabricator.wikimedia.org/T94717#1189891 (10Andrew) 5Open>3Resolved That, at last, is everything. [15:55:50] moritzm: yes, ttyl !:) [15:56:20] it will be already merged [15:56:21] (03CR) 10Andrew Bogott: "This needs a manual rebase, alas" [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [15:58:30] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: onboarding Moritz Muehlenhoff in ops - https://phabricator.wikimedia.org/T94717#1189900 (10Dzahn) gerrit +2 confirmed working icinga logins should be fixed now , by https://gerrit.wikimedia.org/r/#/c/202752/ thanks Andrew [15:59:21] !log legoktm Finished scap: Log promote to global renames in the global rename log https://gerrit.wikimedia.org/r/202742 (duration: 22m 27s) [15:59:24] Logged the message, Master [16:02:26] 10Ops-Access-Requests, 6operations, 6Services, 5Patch-For-Review: Allow mobrovac to restart Zotero - https://phabricator.wikimedia.org/T95400#1189912 (10Dzahn) We have a group and a member but I believe we forgot to use the group on nodes. Is that a Hiera edit? [16:02:32] 6operations, 7domains: .ua and .укр domain registration - https://phabricator.wikimedia.org/T95433#1189913 (10Base) 3NEW [16:07:31] 6operations, 5Patch-For-Review: failed icinga/graphite login for Moritz - https://phabricator.wikimedia.org/T94729#1189941 (10Dzahn) graphite: Yes, only the HTTP auth part is relevant. Just recently i was wondering the same thing. -> T93158 . 
so that's done icinga: this does have 2 parts, the HTTP auth and... [16:07:57] legoktm: ;( [16:08:03] * :( [16:08:56] 10Ops-Access-Requests, 6operations, 10Analytics-EventLogging, 5Patch-For-Review: Grant user 'tomasz' access to dbstore1002 for Event Logging data - https://phabricator.wikimedia.org/T95036#1189944 (10RobH) a:3mark [16:08:56] (03CR) 10Smalyshev: [C: 031] Create Wikidata ttl dumps on Mondays 23:00:00 (UTC) [puppet] - 10https://gerrit.wikimedia.org/r/202734 (owner: 10Hoo man) [16:09:21] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: onboarding Moritz Muehlenhoff in ops - https://phabricator.wikimedia.org/T94717#1189949 (10Dzahn) [16:09:23] 6operations, 5Patch-For-Review: failed icinga/graphite login for Moritz - https://phabricator.wikimedia.org/T94729#1189947 (10Dzahn) 5Open>3Resolved a:3Dzahn [16:09:29] (03PS1) 10Alexandros Kosiaris: Apply zotero-admin to SCA [puppet] - 10https://gerrit.wikimedia.org/r/202762 (https://phabricator.wikimedia.org/T95400) [16:10:15] akosiaris: I may be missing something but I didn't see manager approval for the access ticket? (I probably am missing something) [16:11:59] 10Ops-Access-Requests, 6operations, 6Services, 5Patch-For-Review: Allow mobrovac to restart Zotero - https://phabricator.wikimedia.org/T95400#1189993 (10akosiaris) 5Resolved>3stalled Indeed @Dzahn and thanks for noticing it. And I don't have to revert since due to that I 've effectively NOT violated th... [16:12:23] Ok, allocating a deployment server in codfw per https://phabricator.wikimedia.org/T91678 [16:12:31] (03PS2) 10ArielGlenn: Create Wikidata ttl dumps on Mondays 23:00:00 (UTC) [puppet] - 10https://gerrit.wikimedia.org/r/202734 (owner: 10Hoo man) [16:12:35] (03PS3) 10Hoo man: Create Wikidata ttl dumps on Mondays 23:00:00 (UTC) [puppet] - 10https://gerrit.wikimedia.org/r/202734 [16:12:41] I'm working with papaul on this, but we're going to do it all in public channel in case anyone wonders how this goes about happening. 
[16:13:12] papaul: So, I typically handle all the server allocation/hardware-requests [16:13:15] robh: ooh, interesting :) [16:13:21] but, it never hurts that others know how the hell this happens [16:13:32] (typically cmjohnson1 is pretty knowledgeable on what to do here) [16:13:50] So, first step, new deployment server should be similar to tin hardware specifications [16:14:09] unfortunately, getting this data isnt public, cuz you have to have the service tag =[ [16:14:14] mutante: ;-) [16:14:17] (from racktables, which isnt public) [16:14:19] akosiaris: heh:) [16:14:25] or login to tin and look directly [16:15:26] 6operations, 10hardware-requests, 3wikis-in-codfw: setup deployment server in codfw (tin equivalent) - https://phabricator.wikimedia.org/T91678#1190041 (10RobH) robh@tin:~$ sudo lshw -class disk *-disk description: SCSI Disk product: PERC H310 vendor: Winbond Electron... [16:17:56] and tin is using the h310 in raid config (ewwww) [16:17:56] but, it emans its just dual 500GB disks [16:17:56] means [16:17:57] (03CR) 10Smalyshev: [C: 031] Create Wikidata ttl dumps on Mondays 23:00:00 (UTC) [puppet] - 10https://gerrit.wikimedia.org/r/202734 (owner: 10Hoo man) [16:17:57] so, i have all our misc spares on https://wikitech.wikimedia.org/wiki/Server_Spares [16:17:57] YuviPanda: hi [16:17:58] so, lardner looks good for this. [16:18:06] YuviPanda: can you check if puppet in beta is latest? 
[16:18:13] 6operations, 10Analytics, 6Scrum-of-Scrums, 10Wikipedia-Android-App, and 4 others: Avoid cache fragmenting URLs for Share a Fact shares - https://phabricator.wikimedia.org/T90606#1190049 (10dr0ptp4kt) 5Open>3Resolved [16:18:38] i remove it off the spares page and note the phab task # in the edit summary [16:18:53] papaul: So the allocation of the actual system is not always quite so straightforward [16:24:14] ok [16:24:15] and it requires that i coordinate with mark for our roadmap and budget [16:24:15] YuviPanda: ie points to a17af5dc5e2de710e764fba5d5b6bb1ddca48b86 instead of HEAD [16:24:15] on this one, for instance, you can see that he had to specifically approve the allocation [16:24:15] but then he leaves it to my judgement on which spare system to snag [16:24:16] (if i get it wrong, someone will eventually notice ;) [16:24:17] Usually its pretty straightforward, as we have either a system to copy [16:24:19] or the request calls out how many cpu cores and memory, etc... [16:24:19] so, we'll use lardner, which is an old tampa server. So then I create the tickets for the onsite work to rename the system [16:24:19] and we have to pick out a star name for it [16:24:19] https://en.wikipedia.org/wiki/List_of_proper_names_of_stars [16:24:19] this is where we can make the devs who deploy hate us forever with a name like eiximenis ;D [16:24:20] (lets not do that, i picked tin for eqiad for ease of their typing) [16:24:21] if there are services with 2 nodes, use binary star systems /jk :) [16:24:21] legoktm: Any (easy) way to see the full logs? [16:24:21] mutante: frack snagged a constellation! [16:24:21] heh [16:24:21] robh: nice !:) [16:24:21] 6operations, 10RESTBase, 7Performance: Create a path entry point for the REST API under regular domains - https://phabricator.wikimedia.org/T95229#1190055 (10faidon) Sounds good to me. Would we unset Cookies in VCL as well, or does this mean that RESTBase will suddenly get potentially sensitive cookies (e.g.... 
[16:24:21] i thought it was cool too [16:24:21] so, mira [16:24:21] mira is a nice easy star name for a deployment system [16:24:22] where the devs who use it won't hate our guts for the spelling. [16:24:22] papaul: So im going to create an onsite ticket fo ryou detailing the onsite and dns steps to take [16:24:22] sounds good to me [16:24:22] give me a few minutes and i'll link them =] [16:24:22] robh: ok [16:24:22] I like ain for a simple one robh for that's not my job :) [16:25:57] (03PS4) 10ArielGlenn: Create Wikidata ttl dumps on Mondays 23:00:00 (UTC) [puppet] - 10https://gerrit.wikimedia.org/r/202734 (owner: 10Hoo man) [16:25:57] since we keep finding more and more exoplanets we can use stars and their planets for master/slave type setups [16:25:58] mutante: awesome idea [16:26:06] (03CR) 10ArielGlenn: [C: 032] Create Wikidata ttl dumps on Mondays 23:00:00 (UTC) [puppet] - 10https://gerrit.wikimedia.org/r/202734 (owner: 10Hoo man) [16:31:33] JohnFLewis: but the problem might be they have boring names only with numbers [16:31:34] mutante: go tell people to make them more interesting then [16:31:34] or just rename the Wikipedia pages ;) [16:31:35] JohnFLewis: you mean i should file a bug upstream a the International Astronomical Union ?:) [16:31:36] mutante: depends. do they use phab or bugzilla ;) [16:31:37] http://www.iau.org/public/themes/naming_exoplanets/ [16:31:38] "The IAU fully supports the involvement of the general public in the naming of astronomical objects" there you go:) [16:31:38] lovely [16:32:17] JohnFLewis: CC @Thierry http://www.iau.org/science/scientific_bodies/working_groups/209/ . 
ok , back to stuff:) [16:32:21] 6operations: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1190079 (10RobH) 3NEW a:3RobH [16:32:37] 6operations: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1190093 (10RobH) [16:32:39] 6operations, 10hardware-requests, 3wikis-in-codfw: setup deployment server in codfw (tin equivalent) - https://phabricator.wikimedia.org/T91678#1093119 (10RobH) [16:32:54] 6operations, 10hardware-requests, 3wikis-in-codfw: setup deployment server in codfw (tin equivalent) - https://phabricator.wikimedia.org/T91678#1190103 (10RobH) 5Open>3Resolved system mira/wmf5818 is assigned for this task. I've linked in task T95436 for the installation and deployment. [16:32:55] 6operations: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1190079 (10RobH) [16:33:02] 6operations, 7domains: .ua and .укр domain registration - https://phabricator.wikimedia.org/T95433#1190192 (10Krenair) Hmm. Don't chapters usually own their own domains? What would be done with these domains? 
wikipedia.org.ua sounds like something for #WMF-Legal to deal with [16:34:37] 6operations, 10ops-codfw: server mira setup/mgmt dns/label/port info - https://phabricator.wikimedia.org/T95437#1190305 (10RobH) 3NEW a:3Papaul [16:34:50] papaul: ok, https://phabricator.wikimedia.org/T95437 [16:34:56] so that is the dns for mgmt [16:35:00] and the onsite setup steps [16:36:16] (03PS1) 10Cmjohnson: Adding dns entries for labvirt1001-6 [dns] - 10https://gerrit.wikimedia.org/r/202765 [16:36:16] got it [16:36:16] take a look, and feel free to ask questions [16:36:16] ok [16:36:34] 6operations: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1190351 (10RobH) [16:37:10] robh: i see you created two phab ftask or the same server one to track down and one for setup [16:37:36] one is the setup for the entire system [16:37:39] https://phabricator.wikimedia.org/T95436 [16:37:41] that is mine [16:37:45] the other is the onsite tasks plus dns [16:42:37] https://phabricator.wikimedia.org/T95437 [16:42:37] which is a sub-task [16:42:38] ok [16:42:38] basically since an install has a few distinct parts, it seems easier to break the install up into sub-tasks [16:42:38] that make sense [16:42:39] right now it has the hands off software setup (dns/os/installer/etc), onsite tasks (label/port), and network task (vlan and port description on switch) [16:42:39] so depending on who does what, some things dont get subtasks [16:42:39] i tend to just do the network stuff as part of the main ticket, since i can do both [16:42:39] got it [16:42:39] but, when i didnt know how, i made sub-tasks/tickets for our network admins [16:42:39] robh: will start working on that [16:42:40] I also keep the hardware-request task distinct and on its own, and resolve it when done [16:42:40] so i can easily search phab history for server allocation requests [16:42:40] ok [16:42:41] (someone once asked why it didnt just travel over to the new projects) [16:42:41] robh: i have a 
code waiting for review can you please check that thanks [16:43:16] papaul: you do? [16:43:23] yes [16:43:25] the asset tag thing? [16:43:32] yes [16:43:35] papaul and rob: we also have this https://gerrit.wikimedia.org/r/#/c/202298/ it's about deleting rbf hosts [16:43:36] lemme take a gander [16:43:44] i started the decom and said in this case mgmt should also go [16:43:49] mutante i review that [16:43:57] because the entire service type "rbf" is not used anymore [16:44:00] papaul: thanks! [16:44:03] yw [16:44:06] (03CR) 10RobH: [C: 032] add asset tag mgmt info fro mw(2135-2214) [dns] - 10https://gerrit.wikimedia.org/r/200889 (owner: 10Papaul) [16:44:16] its merged and live now [16:44:20] thanks [16:44:27] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [24.0] [16:44:33] so you may wanna git pull on your local repo [16:44:37] before making those new changes ;D [16:44:47] (or you get the joy of rebase) [16:45:15] ok [16:48:58] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [35.0] [16:49:31] 6operations, 10RESTBase, 7Performance: Create a path entry point for the REST API under regular domains - https://phabricator.wikimedia.org/T95229#1190411 (10GWicke) > Would we unset Cookies in VCL as well, or does this mean that RESTBase will suddenly get potentially sensitive cookies (e.g. login)? That's o... [16:49:46] (03Abandoned) 10Dzahn: base: add nmap to standard packages [puppet] - 10https://gerrit.wikimedia.org/r/201725 (owner: 10Dzahn) [16:50:23] labs keeps going down and then up after about a minute...could someone look at it? [16:50:44] 6operations, 10RESTBase, 7Performance: Create a path entry point for the REST API under regular domains - https://phabricator.wikimedia.org/T95229#1190422 (10GWicke) [16:50:44] those labstore warnings don't look good... [16:51:57] Coren: You about? 
[16:52:03] andrewbogott_afk: is afk ;D [16:52:10] chasemp: cc'ed jalexander :) [16:52:29] I just unpacked a dump in a labs job, so could have been partly me [16:54:08] RECOVERY - Persistent high iowait on labstore1001 is OK: OK: Less than 50.00% above the threshold [25.0] [16:54:41] 6operations: Make sure that Anasuya has been removed from fdcsupport@ alias - https://phabricator.wikimedia.org/T95212#1190441 (10Dzahn) a:3Dzahn [16:57:21] hoo: heh [16:57:29] seems like it cleared, so yay [16:57:56] the iowaits are also new and maybe not quite fully adjusted (iirc) [16:58:02] for labstore that is [16:58:03] (03CR) 10John F. Lewis: [C: 031] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/202759 (owner: 10Andrew Bogott) [16:58:18] RECOVERY - High load for whatever reason on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0] [16:59:06] 6operations, 10RESTBase, 7Performance: Create a path entry point for the REST API under regular domains - https://phabricator.wikimedia.org/T95229#1190459 (10GWicke) [16:59:39] (03CR) 10John F. Lewis: [C: 031] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/202656 (owner: 10Dzahn) [17:00:24] (03PS1) 10Papaul: added mira mgmt dns entries [dns] - 10https://gerrit.wikimedia.org/r/202771 [17:00:29] robh: Am now. What up [17:00:37] Coren: disregard sorry [17:00:45] iowait on labstore when hoo unpacked something [17:00:50] so its likely needing tweaking [17:00:59] * Coren pointedly ignores robh. [17:01:02] ? [17:01:15] was there an email about this i missed and have thus insulted? wasnt intent =[ [17:01:26] Coren: < legoktm> labs keeps going down and then up after about a minute... < hoo> I just unpacked a dump in a labs job, so could have been partly me icinga-wm> RECOVERY - Persistent high iowait on labstore1001 is OK [17:01:35] No, no! I was just obeying your "disregard" directive. :-) [17:01:38] ahh [17:01:38] heh [17:02:01] * Coren looks at the graphs to make sure. 
[17:02:24] 6operations, 10RESTBase, 7Performance: Create a path entry point for the REST API under regular domains - https://phabricator.wikimedia.org/T95229#1190474 (10GWicke) [17:02:59] robh: Yeah, that was just "normal" pegging I/O for a while. Whatever hoo did was *big* [17:04:11] FYI: If you look at http://grafana.wikimedia.org/#/dashboard/db/labs-monitoring you can see what's up - note the huge spike in network fitting a long period of pegged I/O. [17:04:43] (03PS1) 10Ori.livneh: Fixes for rrd-navtiming [puppet] - 10https://gerrit.wikimedia.org/r/202772 [17:04:57] The huge iowait spike is right at the end and matches lots of writes getting flushed at the end of a long operation. [17:09:12] hoo: By the by, if you are unpacking a dump to work on it for a while but don't need to keep it around indefinitely, /data/scratch is a better place for that - doesn't share IO bandwidth with the general filesystems too so it's generally faster to boot. [17:11:01] Coren: Yeah, will do... will also probably clean out my home later on [17:14:04] 6operations, 10RESTBase, 7Performance: Set up a generic API base path to be used by action & REST APIs - https://phabricator.wikimedia.org/T95229#1190514 (10GWicke) [17:16:24] papaul: looks good, but you should have the task in the commit msg [17:16:32] hoo: Just keep in mind that /data/scratch is explicitly transient storage. It's not cleaned out and tends to be left alone but there are no long-term guarantees. Don't put anything in there you couldn't regenerate if you had to. 
:-) [17:16:37] (03PS2) 10RobH: added mira mgmt dns entries [dns] - 10https://gerrit.wikimedia.org/r/202771 (owner: 10Papaul) [17:16:48] 6operations, 10RESTBase, 7Performance: Set up a generic API base path to be used by action & REST APIs - https://phabricator.wikimedia.org/T95229#1190519 (10GWicke) [17:16:49] i just added via gerrit web interface ;D [17:17:25] (03CR) 10RobH: [C: 032] added mira mgmt dns entries [dns] - 10https://gerrit.wikimedia.org/r/202771 (owner: 10Papaul) [17:17:50] papaul: its merged, so you can use that IP for the mgmt [17:18:02] it is done testing now [17:18:06] cool [17:19:26] 6operations: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1190542 (10Papaul) [17:19:28] 6operations, 10ops-codfw: server mira setup/mgmt dns/label/port info - https://phabricator.wikimedia.org/T95437#1190540 (10Papaul) 5Open>3Resolved Dns entries complete racktables updated physical label in place mgmt setup complete Bios setup complete test complete mira 10.193.2.171 ge-5/0/13 [17:21:06] Robh: i am going to get me some food will be back [17:21:14] k [17:25:59] (03CR) 10Legoktm: "The current flake8 config doesn't cover scripts/runner because it doesn't end in .py...the current practice is to add a "flake8-bin" tox e" [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202650 (owner: 10Yuvipanda) [17:26:24] (03CR) 10Ori.livneh: [C: 032] Fixes for rrd-navtiming [puppet] - 10https://gerrit.wikimedia.org/r/202772 (owner: 10Ori.livneh) [17:31:25] 6operations: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1190602 (10RobH) [17:31:52] (03CR) 10coren: "Minor quibble about the choice of license." (031 comment) [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 (https://phabricator.wikimedia.org/T95255) (owner: 10Yuvipanda) [17:32:59] pretty sure the deployment system still has to be Ubuntu.... 
[17:33:14] 6operations, 10Deployment-Systems, 6Release-Engineering: Determine Trebuchet/git-deploy maintenance plan - https://phabricator.wikimedia.org/T85008#1190603 (10demon) >>! In T85008#964243, @Ryan_Lane wrote: > I'm more than happy to add WMF as maintainers. It's not unmaintained, but no one has been bugging me... [17:34:43] (03PS1) 10Ori.livneh: rrd-navtiming: pass `path` to update_rrds() [puppet] - 10https://gerrit.wikimedia.org/r/202779 [17:35:00] 6operations, 10ops-eqiad: check Temperature Alarm: asw-d-eqiad. - https://phabricator.wikimedia.org/T94997#1190610 (10faidon) 5Open>3Resolved These started appearing because I swapped routing engines to FPCs 4 & 5 the other day which increased their CPU and hence their temperature by 15-20C(!) That said,... [17:35:25] (03PS1) 10RobH: mira base install params [puppet] - 10https://gerrit.wikimedia.org/r/202780 [17:35:51] 6operations, 10Deployment-Systems, 6Release-Engineering: Determine Trebuchet/git-deploy maintenance plan - https://phabricator.wikimedia.org/T85008#1190612 (10greg) 5Open>3Resolved >>! In T85008#1190603, @demon wrote: >>>! In T85008#964243, @Ryan_Lane wrote: >> I'm more than happy to add WMF as maintaine... 
[17:38:38] (03PS1) 10RobH: setting mira production dns [dns] - 10https://gerrit.wikimedia.org/r/202781 [17:38:46] (03CR) 10RobH: [C: 032] mira base install params [puppet] - 10https://gerrit.wikimedia.org/r/202780 (owner: 10RobH) [17:38:58] (03PS2) 10Ori.livneh: rrd-navtiming: pass `path` to update_rrds() [puppet] - 10https://gerrit.wikimedia.org/r/202779 [17:39:07] (03CR) 10RobH: [C: 032] setting mira production dns [dns] - 10https://gerrit.wikimedia.org/r/202781 (owner: 10RobH) [17:43:09] (03CR) 10Ori.livneh: [C: 032] rrd-navtiming: pass `path` to update_rrds() [puppet] - 10https://gerrit.wikimedia.org/r/202779 (owner: 10Ori.livneh) [17:43:16] 10Ops-Access-Requests, 6operations, 6Services, 5Patch-For-Review: Allow mobrovac to restart Zotero - https://phabricator.wikimedia.org/T95400#1190644 (10RobLa-WMF) Approved [17:45:38] PROBLEM - puppet last run on mw2057 is CRITICAL: CRITICAL: puppet fail [17:51:08] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [17:58:38] 6operations, 7Varnish: Move bits traffic to text/mobile clusters - https://phabricator.wikimedia.org/T95448#1190727 (10BBlack) 3NEW [17:58:54] 6operations, 7Varnish: Move bits traffic to text/mobile clusters - https://phabricator.wikimedia.org/T95448#1190738 (10BBlack) [17:58:55] 6operations, 7Performance: Optimize prod's resource domains for SPDY/HTTP2 - https://phabricator.wikimedia.org/T94896#1175461 (10BBlack) [18:00:04] twentyafterfour, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150408T1800). Please do the needful. 
[18:00:47] PROBLEM - puppet last run on mw2120 is CRITICAL: CRITICAL: puppet fail [18:01:39] 6operations, 10procurement, 7Varnish: Purchase SSDs for legacy bits cache machines for re-use in other clusters - https://phabricator.wikimedia.org/T95449#1190742 (10BBlack) 3NEW [18:02:28] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [18:04:08] 6operations, 10procurement, 7Varnish: Purchase SSDs for legacy bits cache machines for re-use in other clusters - https://phabricator.wikimedia.org/T95449#1190751 (10BBlack) [18:04:47] RECOVERY - puppet last run on mw2057 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [18:05:56] (03PS2) 10MaxSem: WIP: Hierator puppetization [puppet] - 10https://gerrit.wikimedia.org/r/202743 [18:13:44] 6operations, 3Interdatacenter-IPsec, 7Monitoring, 5Patch-For-Review: Monitor IPsec status - https://phabricator.wikimedia.org/T92603#1190778 (10Gage) [18:15:40] 6operations, 3Interdatacenter-IPsec, 7Monitoring, 5Patch-For-Review: Monitor IPsec status - https://phabricator.wikimedia.org/T92603#1115786 (10Gage) Patch is submitted: https://gerrit.wikimedia.org/r/#/c/199787/ Instead of simply counting established Security Assocations or defining a monitor for each SA... [18:16:11] (03CR) 10Aaron Schulz: [C: 032] Set "recentchanges" query group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202633 (owner: 10Aaron Schulz) [18:17:57] RECOVERY - puppet last run on mw2120 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [18:19:37] (03CR) 10Cscott: [C: 031] "Works for me." [puppet] - 10https://gerrit.wikimedia.org/r/199952 (owner: 10Ori.livneh) [18:20:25] robh: my laptop crashed and I lost my backscroll. What did I miss? 
[18:20:48] nothing, labstore iowait alarms was all it was [18:21:07] and its possible need for tweaking, but it was awhile ago and marc also saw [18:22:16] (03PS3) 10MaxSem: WIP: Hierator puppetization [puppet] - 10https://gerrit.wikimedia.org/r/202743 [18:24:13] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 0 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [18:24:35] robh: my internet connection is flaky and also my laptop crashed and threw away the backscroll… what did I miss? [18:24:51] nothing, labstore iowait alarms was all it was [18:24:54] and its possible need for tweaking, but it was awhile ago and marc also saw [18:25:18] ah, great. [18:25:36] I’m having a complicated day, will be be on and off a lot [18:25:49] not ideal for clinic duty :( [18:26:05] 6operations, 6Mobile-Apps, 6Services: Deployment of Mobile App's service on the SCA cluster - https://phabricator.wikimedia.org/T92627#1191025 (10bearND) @mobrovac Any updates about this? When can we expect the service to be deployed on the SCA cluster? [18:26:13] dunno, sounds good for task triaging, just not for irc ;D [18:27:10] 6operations: Make sure that Anasuya has been removed from fdcsupport@ alias - https://phabricator.wikimedia.org/T95212#1191029 (10Dzahn) Hi, i removed her from the alias. This is how it looks now: fdcsupport: wolliff, klove Best, Daniel P.S. In the past we used to get (automated?) emails from Julie when... [18:27:30] 6operations: Make sure that Anasuya has been removed from fdcsupport@ alias - https://phabricator.wikimedia.org/T95212#1191038 (10Dzahn) 5Open>3Resolved [18:32:26] (03PS1) 10Southparkfan: Direct labsconsole.wm.o through Apache cluster [puppet] - 10https://gerrit.wikimedia.org/r/202788 (https://phabricator.wikimedia.org/T48544) [18:33:49] (03PS1) 10Thcipriani: Allow for new labs domain schema in ENC [puppet] - 10https://gerrit.wikimedia.org/r/202790 [18:34:02] (03CR) 10John F. 
Lewis: [C: 04-1] "Need to run refreshDomainRedirects (or whatever the shell script is)" [puppet] - 10https://gerrit.wikimedia.org/r/202788 (https://phabricator.wikimedia.org/T48544) (owner: 10Southparkfan) [18:34:29] 6operations, 10ops-eqiad, 10Analytics-Cluster: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1191043 (10Ottomata) p:5Triage>3High [18:34:35] 6operations, 10ops-eqiad, 10Analytics-Cluster: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1184828 (10Ottomata) p:5High>3Triage [18:35:21] (03PS2) 10Southparkfan: Direct labsconsole.wm.o through Apache cluster [puppet] - 10https://gerrit.wikimedia.org/r/202788 (https://phabricator.wikimedia.org/T48544) [18:35:48] 6operations, 10ops-eqiad, 10Analytics-Cluster: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1184828 (10Ottomata) p:5Triage>3High [18:36:04] (03PS1) 10Southparkfan: Direct labsconsole.wm.o through Apache cluster [dns] - 10https://gerrit.wikimedia.org/r/202791 (https://phabricator.wikimedia.org/T48544) [18:37:30] 6operations, 10ops-eqiad, 10Analytics-Cluster: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1191058 (10Cmjohnson) I did an initial look and we had this in the past with a couple of the R720's and a main board had to be swapped. I need to do some more testing before I contact Dell... [18:37:41] (03CR) 10John F. 
Lewis: [C: 031] "lgtm" [dns] - 10https://gerrit.wikimedia.org/r/202791 (https://phabricator.wikimedia.org/T48544) (owner: 10Southparkfan) [18:38:08] (03PS3) 10Southparkfan: Direct labsconsole.wm.o through Apache cluster [puppet] - 10https://gerrit.wikimedia.org/r/202788 (https://phabricator.wikimedia.org/T48544) [18:39:09] (03PS1) 1020after4: Remove 1.25wmf18-19 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202793 [18:39:11] (03PS1) 1020after4: Add 1.26wmf1 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202794 [18:39:13] (03PS1) 1020after4: Group0 to 1.26wmf1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202795 [18:39:39] (03CR) 10jenkins-bot: [V: 04-1] Allow for new labs domain schema in ENC [puppet] - 10https://gerrit.wikimedia.org/r/202790 (owner: 10Thcipriani) [18:39:43] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [18:40:19] (03CR) 1020after4: [C: 032] Remove 1.25wmf18-19 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202793 (owner: 1020after4) [18:40:29] (03CR) 1020after4: [C: 032] Add 1.26wmf1 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202794 (owner: 1020after4) [18:41:40] (03PS1) 10Ori.livneh: rrd-navtiming: fix step argument for archive defs [puppet] - 10https://gerrit.wikimedia.org/r/202796 [18:42:20] (03PS2) 10Ori.livneh: rrd-navtiming: fix step argument for archive defs [puppet] - 10https://gerrit.wikimedia.org/r/202796 [18:42:28] (03CR) 10Ori.livneh: [C: 032 V: 032] rrd-navtiming: fix step argument for archive defs [puppet] - 10https://gerrit.wikimedia.org/r/202796 (owner: 10Ori.livneh) [18:43:14] (03PS4) 10Southparkfan: Direct labsconsole.wm.o through Apache cluster [puppet] - 10https://gerrit.wikimedia.org/r/202788 (https://phabricator.wikimedia.org/T48544) [18:43:29] (03PS2) 10Thcipriani: Allow for new labs domain schema in ENC [puppet] - 10https://gerrit.wikimedia.org/r/202790 [18:44:09] (03CR) 10John F. 
Lewis: [C: 031] Direct labsconsole.wm.o through Apache cluster [puppet] - 10https://gerrit.wikimedia.org/r/202788 (https://phabricator.wikimedia.org/T48544) (owner: 10Southparkfan) [18:46:56] (03PS1) 10Dzahn: fix check_eth icinga plugin [puppet] - 10https://gerrit.wikimedia.org/r/202798 (https://phabricator.wikimedia.org/T92293) [18:48:59] (03CR) 10Dzahn: [C: 04-1] "WIP" [puppet] - 10https://gerrit.wikimedia.org/r/202798 (https://phabricator.wikimedia.org/T92293) (owner: 10Dzahn) [18:49:38] can someone help refresh caches on exim? (I have a ldap group record, but it must be negative caching) [18:49:47] on polonium [18:50:36] (03PS5) 10Southparkfan: Direct labsconsole.wm.o through Apache cluster [puppet] - 10https://gerrit.wikimedia.org/r/202788 (https://phabricator.wikimedia.org/T48554) [18:51:03] (03PS2) 10Southparkfan: Direct labsconsole.wm.o through Apache cluster [dns] - 10https://gerrit.wikimedia.org/r/202791 (https://phabricator.wikimedia.org/T48554) [18:55:03] (03PS2) 10Dzahn: fix check_eth icinga plugin [puppet] - 10https://gerrit.wikimedia.org/r/202798 (https://phabricator.wikimedia.org/T92293) [18:56:49] uhm. deployment is significantly delayed because the zuul queue is way backed up [19:08:00] 6operations, 7Icinga, 5Patch-For-Review: "NRPE: Unable to read output" should not be OK for "configured eth" check - https://phabricator.wikimedia.org/T92293#1191217 (10Dzahn) {F110158} [19:10:39] (03PS2) 10Cmjohnson: Adding dns entries for labvirt1001-6 [dns] - 10https://gerrit.wikimedia.org/r/202765 [19:11:54] eh [12:11:37] * #mediawiki_security: Cannot join channel (+i) - you must be invited [19:12:06] (03CR) 10Cmjohnson: [C: 032] Adding dns entries for labvirt1001-6 [dns] - 10https://gerrit.wikimedia.org/r/202765 (owner: 10Cmjohnson) [19:12:25] I'm signed in properly... 
[19:12:37] (03PS2) 10Aaron Schulz: Set "recentchanges" query group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202633 [19:12:37] ori: #wikimedia-releng [19:13:27] (03PS3) 10Dzahn: fix check_eth icinga plugin [puppet] - 10https://gerrit.wikimedia.org/r/202798 (https://phabricator.wikimedia.org/T92293) [19:15:27] 6operations, 6Phabricator, 10Wikimedia-Bugzilla: Sanitise a Bugzilla database dump - https://phabricator.wikimedia.org/T85141#1191236 (10Eloquence) @Slaporte is the tech liaison from legal now that Luis runs community. Daniel and/or Andre, my recommendation would be that you set up a quick time with him to w... [19:21:20] (03PS4) 10Dzahn: fix check_eth icinga plugin [puppet] - 10https://gerrit.wikimedia.org/r/202798 (https://phabricator.wikimedia.org/T92293) [19:22:17] (03CR) 10Dzahn: [C: 032] fix check_eth icinga plugin [puppet] - 10https://gerrit.wikimedia.org/r/202798 (https://phabricator.wikimedia.org/T92293) (owner: 10Dzahn) [19:25:35] (03Merged) 10jenkins-bot: Remove 1.25wmf18-19 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202793 (owner: 1020after4) [19:25:37] (03Merged) 10jenkins-bot: Add 1.26wmf1 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202794 (owner: 1020after4) [19:25:45] woot! 
[19:27:16] geez [19:30:30] (03PS1) 10Ori.livneh: rrd-navtiming: case fix [puppet] - 10https://gerrit.wikimedia.org/r/202835 [19:30:40] (03CR) 10Ori.livneh: [C: 032 V: 032] rrd-navtiming: case fix [puppet] - 10https://gerrit.wikimedia.org/r/202835 (owner: 10Ori.livneh) [19:30:59] 6operations, 7Icinga, 5Patch-For-Review: "NRPE: Unable to read output" should not be OK for "configured eth" check - https://phabricator.wikimedia.org/T92293#1191442 (10Dzahn) a:3Dzahn [19:35:10] (03PS12) 10Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 (https://phabricator.wikimedia.org/T95255) [19:40:21] (03PS13) 10Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 (https://phabricator.wikimedia.org/T95255) [19:41:10] 6operations: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1191463 (10RobH) [19:43:59] 6operations: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1191473 (10RobH) a:5RobH>3None So the puppet/salt keys are accepted and this is ready for service implementation. [19:44:22] 6operations, 7Icinga, 5Patch-For-Review: "NRPE: Unable to read output" should not be OK for "configured eth" check - https://phabricator.wikimedia.org/T92293#1191477 (10Dzahn) fixed. see the column on the right {F110175} [19:44:55] 6operations, 7Icinga, 5Patch-For-Review: "NRPE: Unable to read output" should not be OK for "configured eth" check - https://phabricator.wikimedia.org/T92293#1191479 (10Dzahn) 5Open>3Resolved [19:59:21] !log twentyafterfour Started scap: testwiki to php-1.26wmf1 and rebuild l10n cache [19:59:25] Logged the message, Master [20:00:04] gwicke, cscott, arlolra, subbu: Respected human, time to deploy Services – Parsoid / OCG / Citoid / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150408T2000). Please do the needful. 
[20:01:02] (03PS5) 10Dzahn: limn: minor lint and Resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [20:02:34] (03CR) 10jenkins-bot: [V: 04-1] limn: minor lint and Resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [20:07:26] (03PS6) 10Dzahn: limn: minor lint and Resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [20:08:23] (03CR) 10jenkins-bot: [V: 04-1] limn: minor lint and Resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [20:09:43] (03PS7) 10Dzahn: limn: minor lint and Resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [20:18:53] twentyafterfour: the train deploy is delayed I guess? [20:19:03] gwicke: it's scapping right now [20:19:16] sync-common: 86% (ok: 401; fail: 0; left: 64) [20:19:19] ah, k [20:19:25] wmf24 as well? [20:20:02] I haven't committed the wikiversions changes yet, just syncing for test of 1.26wmf1 [20:20:30] but scap syncs the files from both branches so... [20:20:46] anything on 1.25wmf24 that was new ... just got sync'd [20:21:03] k, mostly waiting for group2 to switch [20:22:09] when wikibugs quits with excess flood.. [20:22:55] oh, hm -- should i wait to deploy parsoid? [20:22:58] mutante: That's probably more because /it/ got flooded. [20:23:20] cscott: do you depend on VE being updated in group 2? [20:23:35] no [20:23:51] if anything there's a dependency on VE's group 0 update [20:24:09] (03CR) 10Andrew Bogott: [C: 04-1] "Thanks for doing this. I added a couple of inline comments." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/202790 (owner: 10Thcipriani) [20:24:30] cscott: how so? [20:24:32] I'm about to sync both of those updates... 
[20:24:36] (03PS1) 10Dereckson: Namespace configuration on ru.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202912 (https://phabricator.wikimedia.org/T95110) [20:24:59] !log twentyafterfour Finished scap: testwiki to php-1.26wmf1 and rebuild l10n cache (duration: 25m 38s) [20:25:00] VE group 0 is getting the new comment-parsing code. [20:25:02] Logged the message, Master [20:25:31] o/ logmsgbot [20:25:42] cscott: most of your clients will be on the old code of course, so parsoid should not depend on that to have happened [20:26:36] yes, it's rather the opposite -- it's best if parsoid is deployed before group 0 gets deployed. but that's only a minor concern. [20:27:06] you want me to wait for parsoid deploy? [20:27:27] twentyafterfour: it seems like we're going to do them more-or-less simultaneous, so i don't think you need to wait. [20:28:00] (03CR) 10Gage: [C: 031] limn: minor lint and Resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [20:28:08] (03CR) 10Dzahn: [C: 032] limn: minor lint and Resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [20:29:19] (03PS1) 1020after4: Wikipedias to 1.25wmf24 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202915 [20:30:18] (03CR) 1020after4: [C: 032] Wikipedias to 1.25wmf24 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202915 (owner: 1020after4) [20:31:06] (03CR) 10Dzahn: "labs only" [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [20:32:15] !log updated Parsoid to version a76bd8a3 [20:32:18] Logged the message, Master [20:38:54] twentyafterfour: okay, parsoid is updated. let me know when the group 0 wikis are finished with their deploy, i'd like to test the latest VE against the version of Parsoid I just deployed. 
[20:40:39] (03Merged) 10jenkins-bot: Wikipedias to 1.25wmf24 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202915 (owner: 1020after4) [20:43:05] 6operations, 7domains: .ua and .укр domain registration - https://phabricator.wikimedia.org/T95433#1191676 (10Andrew) p:5Triage>3Normal [20:43:47] (03Abandoned) 1020after4: Group0 to 1.26wmf1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202795 (owner: 1020after4) [20:44:24] (03PS1) 1020after4: Group0 to 1.26wmf1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202916 [20:46:36] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf24 [20:46:52] Logged the message, Master [20:46:52] (03CR) 1020after4: [C: 032] Group0 to 1.26wmf1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202916 (owner: 1020after4) [20:46:53] (03Merged) 10jenkins-bot: Group0 to 1.26wmf1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202916 (owner: 1020after4) [20:48:15] !log aaron Synchronized wmf-config/db-eqiad.php: Set "recentchanges" query group (duration: 00m 16s) [20:48:18] Logged the message, Master [20:49:25] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf1 [20:49:29] Logged the message, Master [20:49:30] cscott: group0 done [20:50:11] gwicke: all done if you were still waiting ... [20:51:05] !log twentyafterfour Purged l10n cache for 1.25wmf23 [20:51:12] Logged the message, Master [20:52:40] twentyafterfour: thank you! [20:52:53] /cc James_F [20:53:30] twentyafterfour: better than last week! :) [20:53:40] Bah. [20:53:48] gwicke: https://gerrit.wikimedia.org/r/#/c/200105/ but it needs rebasing. RoanKattouw/ [20:54:14] James_F: On it [20:54:21] Ta. [20:54:44] (03CR) 10Jforrester: "Now OK to go." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/200105 (https://phabricator.wikimedia.org/T90374) (owner: 10Jforrester) [20:57:08] (03PS3) 10Catrope: Make VisualEditor access RESTbase directly on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/200105 (https://phabricator.wikimedia.org/T90374) (owner: 10Jforrester) [20:57:12] James_F: ---^^ [20:57:20] ( gwicke ---^^ ) [20:57:28] (03CR) 10Jforrester: [C: 031] Make VisualEditor access RESTbase directly on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/200105 (https://phabricator.wikimedia.org/T90374) (owner: 10Jforrester) [20:57:46] (03CR) 10GWicke: [C: 032] Make VisualEditor access RESTbase directly on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/200105 (https://phabricator.wikimedia.org/T90374) (owner: 10Jforrester) [20:58:06] RoanKattouw, James_F: am happy to do the deploy as well [21:00:19] (once Jenkins has done its part) [21:00:30] gwicke: Please do [21:00:47] ori: hey! did you benchmark teafile / whisper before switching to rrd? [21:00:52] or did they not offer medians? [21:03:33] (03PS9) 10Yuvipanda: Add a 'runner' helper script that can run any collector [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202650 [21:03:35] (03PS14) 10Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 (https://phabricator.wikimedia.org/T95255) [21:03:38] legoktm: do you have an example for the tox8-bin stuff? [21:05:27] YuviPanda: https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/server/tox.ini#L29 [21:07:36] 6operations, 6Labs, 10hardware-requests: Replace virt1000 with a newer warrantied server - https://phabricator.wikimedia.org/T90626#1191765 (10RobH) a:5Andrew>3RobH Understood, so we won't replace virt1000 but instead have a warm(ish) standby for use. I'll steal this task back. 
[21:07:55] (03Merged) 10jenkins-bot: Make VisualEditor access RESTbase directly on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/200105 (https://phabricator.wikimedia.org/T90374) (owner: 10Jforrester) [21:08:41] (03PS10) 10Yuvipanda: Add a 'collector-runner' helper script that can run any collector [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202650 [21:08:42] legoktm: done [21:08:43] (03PS15) 10Yuvipanda: Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 (https://phabricator.wikimedia.org/T95255) [21:09:27] !log gwicke Synchronized wmf-config/InitialiseSettings.php: Make VisualEditor load HTML directly from rest.wikimedia.org on enwiki (duration: 00m 11s) [21:09:39] RoanKattouw, James_F|Away ^^ [21:09:46] Sweet [21:11:57] legoktm: can you +1? :) [21:12:50] RoanKattouw: looking good on enwiki [21:14:11] (03CR) 10Legoktm: [C: 031] Add a 'collector-runner' helper script that can run any collector [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202650 (owner: 10Yuvipanda) [21:14:19] parsoid deploy complete, by the way. 
and group 0 VE looks good to me, thanks twentyafterfour [21:14:29] (03CR) 10Yuvipanda: [C: 032] Add a 'collector-runner' helper script that can run any collector [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202650 (owner: 10Yuvipanda) [21:15:25] cscott: no prob, glad everything is looking good [21:15:45] (03CR) 10Yuvipanda: Add initial debian package & upstart script (033 comments) [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 (https://phabricator.wikimedia.org/T95255) (owner: 10Yuvipanda) [21:15:52] (03CR) 10Yuvipanda: [C: 032] Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 (https://phabricator.wikimedia.org/T95255) (owner: 10Yuvipanda) [21:16:03] 6operations, 10Incident-20150205-SiteOutage, 6MediaWiki-API-Team, 10MediaWiki-Debug-Logging, and 2 others: Decouple logging infrastructure failures from MediaWiki logging - https://phabricator.wikimedia.org/T88732#1191822 (10bd808) [21:16:09] (03Merged) 10jenkins-bot: Add a 'collector-runner' helper script that can run any collector [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202650 (owner: 10Yuvipanda) [21:17:25] (03Merged) 10jenkins-bot: Add initial debian package & upstart script [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/202664 (https://phabricator.wikimedia.org/T95255) (owner: 10Yuvipanda) [21:21:50] gwicke: If RB on enwiki is holding up nicely, let's add the next one (all Wikipedias) to the 4pm SWAT? [21:22:17] RoanKattouw: wfm [21:22:27] could you prepare the patch? [21:23:00] RoanKattouw: http://grafana.wikimedia.org/#/dashboard/db/visualeditor-load-save [21:23:10] mean load time now at ~700ms [21:24:17] if it stays that way it'll be another ~40% on top of the previous 40% [21:24:30] Will prepare [21:24:33] Wow [21:24:42] So 72%? [21:24:54] something on that order, yes [21:25:10] previous 30-day mean was ~1.96s [21:25:11] So sweet a graph that it doesn't even look real. 
:O [21:25:31] What does the increase in the serialization actually imply? [21:25:58] good question [21:26:20] gwicke: Holy crap that graph [21:26:41] That's an impressive decrease [21:26:51] especially in the 99% [21:26:51] But yeah I don't know what's going on with serialization [21:27:00] Even the mean increases sharply [21:27:54] maybe HHVM takes some time to reach steady state performance? [21:29:00] How is HHVM relevant? [21:29:22] What we just did shouldn't have changed anything about the serialization pipeline [21:29:25] saves are still passing through the PHP API, and the core scap only finished shortly before 2pm [21:29:30] Other than that it's now being fed v2 HTML instead of v1 HTML [21:29:32] Ooh I see [21:29:46] You're right, that serialization increase happened earlier [21:30:24] I could also imagine that some of the 99% change we saw initially on load was caused by HHVM restarts [21:30:40] Yeah [21:30:50] Maybe it wasn't such a good idea to do this so close to the deploy train [21:31:24] save times are trending down again [21:31:35] slightly at least [21:33:42] mean VE HTML load time now at 517ms [21:34:15] hopefully the stats aren't broken ;) [21:34:48] (03PS1) 10Andrew Bogott: For cert names, use the fqdn instead of the ec2id if use_dnsmasq is lowered. 
[puppet] - 10https://gerrit.wikimedia.org/r/202924 [21:34:54] Ahm, well there is https://phabricator.wikimedia.org/T95432 [21:35:11] Along with us unbreaking mobile VE a bit [21:35:22] But mobile VE traffic should be low enough that it shouldn't have that big an impact [21:36:47] the numbers fit with what we saw on mw.org, but it's still rather significant to see a 50+% drop from switching enwiki alone [21:38:25] RoanKattouw, re save timings: http://grafana.wikimedia.org/#/dashboard/db/restbase?panelId=9&fullscreen [21:38:32] (03PS1) 10Kaldari: Removing ® from mobile wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202926 (https://phabricator.wikimedia.org/T95007) [21:38:59] there was a slight increase just before 2pm, likely because parsoid's API requests slowed down after the train deploy [21:39:12] (03CR) 10jenkins-bot: [V: 04-1] For cert names, use the fqdn instead of the ec2id if use_dnsmasq is lowered. [puppet] - 10https://gerrit.wikimedia.org/r/202924 (owner: 10Andrew Bogott) [21:39:13] but now it looks back to normal [21:42:34] (03CR) 10Multichill: "Probably better to whitelist *.adlibhosting.com . They host collections of plenty of museums" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202736 (https://phabricator.wikimedia.org/T95418) (owner: 10Dereckson) [21:48:21] (03PS3) 10Catrope: Make VisualEditor access RESTbase directly on Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/200106 (https://phabricator.wikimedia.org/T90374) (owner: 10Jforrester) [21:48:56] (03PS2) 10Andrew Bogott: For cert names, use the fqdn instead of the ec2id if use_dnsmasq is lowered. [puppet] - 10https://gerrit.wikimedia.org/r/202924 [21:49:52] (03CR) 10jenkins-bot: [V: 04-1] For cert names, use the fqdn instead of the ec2id if use_dnsmasq is lowered. 
[puppet] - 10https://gerrit.wikimedia.org/r/202924 (owner: 10Andrew Bogott) [21:51:12] 6operations, 10MediaWiki-Debug-Logging, 10hardware-requests, 5Patch-For-Review: Fluorine needs bigger disks - https://phabricator.wikimedia.org/T92417#1191989 (10bd808) [21:58:58] (03CR) 10Dereckson: [C: 04-1] "Hold on, per comment 4." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202736 (https://phabricator.wikimedia.org/T95418) (owner: 10Dereckson) [22:05:22] 10Ops-Access-Requests, 6operations, 10Analytics: Grant Sati access to geowiki - https://phabricator.wikimedia.org/T95494#1192093 (10kevinator) 3NEW a:3Ottomata [22:09:23] 10Ops-Access-Requests, 6operations, 10Analytics: Grant Sati access to geowiki - https://phabricator.wikimedia.org/T95494#1192127 (10kevinator) @ottomata can you give us an ETA if this isn't something you can complete quickly? [22:16:45] (03CR) 10Greg Grossmeier: "Mukunda: It can't be merged because it needs a rebase or? (I don't see that indication off hand)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188388 (https://phabricator.wikimedia.org/T75905) (owner: 10Reedy) [22:17:13] (03CR) 10Greg Grossmeier: "oh, "Can merge: No"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188388 (https://phabricator.wikimedia.org/T75905) (owner: 10Reedy) [22:19:49] 6operations, 10MediaWiki-Debug-Logging, 6Release-Engineering, 6Security-Team, 5Patch-For-Review: Store unsampled API and XFF logs - https://phabricator.wikimedia.org/T88393#1192197 (10RobLa-WMF) [22:30:42] !log rbf1001,rbf1002 - stopping redis-server [22:30:47] Logged the message, Master [22:31:24] (it shouldn't be used, but being careful anyways before we kill them from DNS) [22:38:50] !log aaron Synchronized php-1.26wmf1/extensions/AbuseFilter: e0c99fa093f23f23310c77524a78adfd3017f79e (duration: 00m 12s) [22:38:53] Logged the message, Master [22:39:39] hmm, aaron? 
[22:41:57] (03CR) 10Kaldari: [C: 04-2] "Not yet approved by legal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202926 (https://phabricator.wikimedia.org/T95007) (owner: 10Kaldari) [22:42:55] !log rbf1001 - shutdown -h now (https://phabricator.wikimedia.org/T93006#1177448) [22:42:59] Logged the message, Master [22:44:24] 6operations, 10MediaWiki-JobQueue, 7Monitoring: Establish monitoring thresholds for job queue - https://phabricator.wikimedia.org/T79687#1192310 (10bd808) [22:44:28] 6operations, 10Beta-Cluster, 7HHVM: Convert work machines (tin, terbium) to Trusty and hhvm usage - https://phabricator.wikimedia.org/T87036#1192308 (10bd808) [22:48:55] * RoanKattouw claims SWAT today [22:52:19] !log rbf1002 - power down, gone from icinga, rbf200x revoke salt keys [22:52:24] Logged the message, Master [22:56:25] 6operations, 10hardware-requests, 5Patch-For-Review: Decom/repurpose rbf* hosts - https://phabricator.wikimedia.org/T95153#1192422 (10Dzahn) rbf1001/rbf1002: gone from icinga. disabled puppet and salt. stopped redis-server, shutdown servers shortly after rbf2001/2002: gone from icinga, revoked salt keys, re... [23:00:04] RoanKattouw, ^d, RoanKattouw: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150408T2300). Please do the needful. 
[23:00:18] o/ [23:00:23] * RoanKattouw claims [23:00:54] bah, someone broke the deployments page [23:00:58] legoktm did [23:01:00] Fixing [23:01:03] oops [23:01:04] sorry :/ [23:01:13] :) [23:01:32] ...and I don't know how to use my own software so I just undid my change [23:01:34] Fixing *that* now [23:03:03] (03CR) 10Catrope: [C: 032] Manage username blacklist (TitleBlacklist) from Meta-Wiki only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195623 (https://phabricator.wikimedia.org/T38939) (owner: 10Legoktm) [23:05:08] (03PS1) 10BBlack: T86663 4.1: pool cp3034; depool cp3015,amssq4[34] [puppet] - 10https://gerrit.wikimedia.org/r/202947 [23:05:53] (03CR) 10BBlack: [C: 032 V: 032] T86663 4.1: pool cp3034; depool cp3015,amssq4[34] [puppet] - 10https://gerrit.wikimedia.org/r/202947 (owner: 10BBlack) [23:06:43] bblack: https://gerrit.wikimedia.org/r/#/c/200222/ ? [23:07:50] (03CR) 10BBlack: [C: 031] decom cp3001,cp3002. keep mgmt [dns] - 10https://gerrit.wikimedia.org/r/200222 (https://phabricator.wikimedia.org/T94215) (owner: 10Dzahn) [23:09:24] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.114 second response time [23:09:24] RECOVERY - HHVM rendering on mw1198 is OK: HTTP OK: HTTP/1.1 200 OK - 67219 bytes in 0.389 second response time [23:09:31] !log mw1198 - restarted hhvm [23:09:36] Logged the message, Master [23:09:54] !log mw1208 - restarted hhvm [23:09:58] Logged the message, Master [23:10:34] RECOVERY - HHVM rendering on mw1208 is OK: HTTP OK: HTTP/1.1 200 OK - 67219 bytes in 0.650 second response time [23:10:53] RECOVERY - Apache HTTP on mw1208 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.078 second response time [23:15:04] legoktm: You here for your config change for SWAT? 
[23:15:13] yup [23:15:37] Not that it's gonna be deployed any time soon, there's a zend-phpunit job running for 18 mins and counting that's holding everything up [23:15:43] :| [23:15:52] actually that's the hhvm job >.> [23:15:58] ugh [23:16:01] (03Merged) 10jenkins-bot: Manage username blacklist (TitleBlacklist) from Meta-Wiki only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195623 (https://phabricator.wikimedia.org/T38939) (owner: 10Legoktm) [23:16:05] woot [23:16:08] OK there we go [23:16:08] :P [23:16:33] RECOVERY - HHVM busy threads on mw1198 is OK: OK: Less than 30.00% above the threshold [76.8] [23:17:23] RECOVERY - HHVM queue size on mw1198 is OK: OK: Less than 30.00% above the threshold [10.0] [23:17:23] RECOVERY - HHVM busy threads on mw1208 is OK: OK: Less than 30.00% above the threshold [76.8] [23:17:29] (03PS3) 10Thcipriani: Allow for new labs domain schema in ENC [puppet] - 10https://gerrit.wikimedia.org/r/202790 [23:17:36] !log catrope Synchronized wmf-config/CommonSettings.php: Manage username blacklist from metawiki only (duration: 00m 14s) [23:17:41] Logged the message, Master [23:17:59] (03CR) 10Catrope: [C: 032] Make VisualEditor access RESTbase directly on Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/200106 (https://phabricator.wikimedia.org/T90374) (owner: 10Jforrester) [23:18:01] yay :D [23:18:13] (03Merged) 10jenkins-bot: Make VisualEditor access RESTbase directly on Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/200106 (https://phabricator.wikimedia.org/T90374) (owner: 10Jforrester) [23:18:32] RECOVERY - HHVM queue size on mw1208 is OK: OK: Less than 30.00% above the threshold [10.0] [23:18:53] !log catrope Synchronized wmf-config/InitialiseSettings.php: Make VE access RB directly on Wikipedias (duration: 00m 12s) [23:18:56] Logged the message, Master [23:19:17] gwicke: ---^^ [23:20:19] RoanKattouw: confirmed [23:22:02] (03PS1) 10BBlack: T86663 4.1: shuffle roles, no repool yet [puppet] - 
10https://gerrit.wikimedia.org/r/202952 [23:22:14] (03CR) 10BBlack: [C: 032 V: 032] T86663 4.1: shuffle roles, no repool yet [puppet] - 10https://gerrit.wikimedia.org/r/202952 (owner: 10BBlack) [23:22:25] (03CR) 10GWicke: [C: 031] set cassandra vars in cassandra.yaml, not restbase [puppet] - 10https://gerrit.wikimedia.org/r/201401 (owner: 10Dzahn) [23:23:32] (03CR) 10Dzahn: [C: 032] set cassandra vars in cassandra.yaml, not restbase [puppet] - 10https://gerrit.wikimedia.org/r/201401 (owner: 10Dzahn) [23:23:56] RoanKattouw: yeehaw! [23:24:06] mutante: oops got yours merged [23:24:19] bblack: ok:) [23:24:24] i was just starting to wonder [23:25:11] checks on restbase1001 [23:26:01] looks good, noop, bbl [23:26:07] (03PS1) 10Ori.livneh: rrd-navtiming: fix logging [puppet] - 10https://gerrit.wikimedia.org/r/202955 [23:26:46] (03PS2) 10Ori.livneh: rrd-navtiming: fix logging [puppet] - 10https://gerrit.wikimedia.org/r/202955 [23:27:10] (03CR) 10Ori.livneh: [C: 032 V: 032] rrd-navtiming: fix logging [puppet] - 10https://gerrit.wikimedia.org/r/202955 (owner: 10Ori.livneh) [23:33:08] (03CR) 10Thcipriani: "Andrew: This code should work with ec2ids or hostnames (note the associatedDomain change in the ldap search)." 
[puppet] - 10https://gerrit.wikimedia.org/r/202790 (owner: 10Thcipriani) [23:38:25] !log catrope Synchronized php-1.25wmf24/includes/api/ApiParse.php: SWAT (duration: 00m 11s) [23:38:29] Logged the message, Master [23:38:57] !log catrope Synchronized php-1.25wmf24/extensions/VisualEditor/: SWAT (duration: 00m 14s) [23:39:00] Logged the message, Master [23:40:27] (03PS1) 10BBlack: T86663 4.1: repool cp3015 [puppet] - 10https://gerrit.wikimedia.org/r/202957 [23:40:39] (03CR) 10BBlack: [C: 032 V: 032] T86663 4.1: repool cp3015 [puppet] - 10https://gerrit.wikimedia.org/r/202957 (owner: 10BBlack) [23:43:48] 6operations, 10RESTBase, 7Performance: Set up a generic API base path to be used by action & REST APIs - https://phabricator.wikimedia.org/T95229#1192574 (10Catrope) >>! In T95229#1190411, @GWicke wrote: > The other security aspect is XSS, although this is not new as the HTML content has already been loaded... [23:45:06] (03PS1) 10BBlack: T86663 4.2: pool 3035; depool 3016,3012,amssq4[56] [puppet] - 10https://gerrit.wikimedia.org/r/202959 [23:45:23] !log catrope Synchronized php-1.26wmf1/extensions/VisualEditor/: SWAT (duration: 00m 12s) [23:45:28] Logged the message, Master [23:45:39] (03CR) 10BBlack: [C: 032 V: 032] T86663 4.2: pool 3035; depool 3016,3012,amssq4[56] [puppet] - 10https://gerrit.wikimedia.org/r/202959 (owner: 10BBlack) [23:45:45] !log catrope Synchronized php-1.26wmf1/includes/api/ApiParse.php: SWAT (duration: 00m 12s) [23:45:48] Logged the message, Master [23:49:07] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: puppet fail [23:52:09] 6operations: Requesting stat1003 access for mholloway - https://phabricator.wikimedia.org/T95506#1192615 (10Dbrant) 3NEW [23:52:37] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:52:55] 6operations: Requesting stat1003 access for mholloway - https://phabricator.wikimedia.org/T95506#1192623 (10yuvipanda) ccing @tfinc for manager 
approval. [23:53:11] 6operations: Requesting stat1003 access for mholloway - https://phabricator.wikimedia.org/T95506#1192625 (10Dbrant) @tfinc please approve when you get a chance. [23:53:56] (03PS1) 10BBlack: T86663 4.2: shuffle roles, no repool yet [puppet] - 10https://gerrit.wikimedia.org/r/202961 [23:55:03] (03CR) 10BBlack: [C: 032] T86663 4.2: shuffle roles, no repool yet [puppet] - 10https://gerrit.wikimedia.org/r/202961 (owner: 10BBlack)