[00:00:37] Krenair: When you're done I've got another commit for wmf21.
[00:01:29] Krinkle, one more I think
[00:03:49] (PS1) Alex Monk: Move be-x-old wgServer/wgCanonicalServer to be-tarask [mediawiki-config] - https://gerrit.wikimedia.org/r/235940 (https://phabricator.wikimedia.org/T11823)
[00:04:34] (PS1) Tim Landscheidt: Tools: Replace references to tools.wmflabs.org [puppet] - https://gerrit.wikimedia.org/r/235941 (https://phabricator.wikimedia.org/T87387)
[00:05:25] (CR) Alex Monk: [C: 2] Move be-x-old wgServer/wgCanonicalServer to be-tarask [mediawiki-config] - https://gerrit.wikimedia.org/r/235940 (https://phabricator.wikimedia.org/T11823) (owner: Alex Monk)
[00:05:33] (Merged) jenkins-bot: Move be-x-old wgServer/wgCanonicalServer to be-tarask [mediawiki-config] - https://gerrit.wikimedia.org/r/235940 (https://phabricator.wikimedia.org/T11823) (owner: Alex Monk)
[00:06:05] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235940/ (duration: 00m 11s)
[00:06:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:06:50] looks good
[00:07:11] guess we just need to get those redirects in now
[00:07:13] Krinkle, go for it
[00:09:01] (PS7) Dzahn: move mediawiki maintenance scripts to module [puppet] - https://gerrit.wikimedia.org/r/178873 (https://phabricator.wikimedia.org/T88597)
[00:11:16] Krenair: OK
[00:11:18] !log krinkle@tin Synchronized php-1.26wmf21/includes/resourceloader/ResourceLoader.php: I24f68e34a9fa4918 (duration: 00m 12s)
[00:11:22] (CR) Tim Landscheidt: [C: -1] "While this works nicely, I forgot to add another level of parameters, i. e.
not rely on urlproxy getting the right scope on web_domain, bu" [puppet] - https://gerrit.wikimedia.org/r/235941 (https://phabricator.wikimedia.org/T87387) (owner: Tim Landscheidt)
[00:11:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:12:04] (PS1) Alex Monk: Redirect be-x-old.wikipedia.org to be-tarask.wikipedia.org [puppet] - https://gerrit.wikimedia.org/r/235943 (https://phabricator.wikimedia.org/T11823)
[00:14:28] (PS8) Dzahn: move mediawiki maintenance scripts to module [puppet] - https://gerrit.wikimedia.org/r/178873 (https://phabricator.wikimedia.org/T88597)
[00:15:54] (CR) Krinkle: Redirect be-x-old.wikipedia.org to be-tarask.wikipedia.org (1 comment) [puppet] - https://gerrit.wikimedia.org/r/235943 (https://phabricator.wikimedia.org/T11823) (owner: Alex Monk)
[00:18:06] (CR) Krinkle: move mediawiki maintenance scripts to module (2 comments) [puppet] - https://gerrit.wikimedia.org/r/178873 (https://phabricator.wikimedia.org/T88597) (owner: Dzahn)
[00:20:01] (PS9) Dzahn: move mediawiki maintenance scripts to module [puppet] - https://gerrit.wikimedia.org/r/178873 (https://phabricator.wikimedia.org/T88597)
[00:22:26] legoktm: around?
[00:22:45] (PS10) Dzahn: move mediawiki maintenance scripts to module [puppet] - https://gerrit.wikimedia.org/r/178873 (https://phabricator.wikimedia.org/T88597)
[00:22:49] lfaraone: hey
[00:22:59] mutante: Could you merge https://gerrit.wikimedia.org/r/#/c/235222/ ?
[00:23:54] Soooooo who do I bug about " I get a MySQL database connection lost error when trying to revdel more than a few revs"?
[00:24:23] heh
[00:24:48] lfaraone: can you file a phabricator ticket under the MediaWiki-Revision-deletion project and I'll take a look in a bit?
[00:25:51] Thanks, legoktm :)
[00:36:46] (CR) Alex Monk: Redirect be-x-old.wikipedia.org to be-tarask.wikipedia.org (1 comment) [puppet] - https://gerrit.wikimedia.org/r/235943 (https://phabricator.wikimedia.org/T11823) (owner: Alex Monk)
[00:39:10] (PS2) Alex Monk: Redirect be-x-old.wikipedia.org to be-tarask.wikipedia.org [puppet] - https://gerrit.wikimedia.org/r/235943 (https://phabricator.wikimedia.org/T11823)
[00:43:44] operations, Wikimedia-Mailing-lists: rsync all configs and archives one more time - https://phabricator.wikimedia.org/T110129#1605091 (Dzahn) -W : real 117m40.660s --no-W:
[00:55:03] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/874/terbium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/178873 (https://phabricator.wikimedia.org/T88597) (owner: Dzahn)
[01:15:19] operations, Wikimedia-Mailing-lists: Evaluate lists with large moderation queues - https://phabricator.wikimedia.org/T110438#1605122 (Dzahn) a:Dzahn>None
[01:15:35] operations, Wikimedia-Mailing-lists: wikisk-l: Give the list an administrator - https://phabricator.wikimedia.org/T111054#1605123 (Dzahn) a:Dzahn>None
[01:15:45] operations, Wikimedia-Mailing-lists, Wiktionary: wiktionary-l: assign new moderators - https://phabricator.wikimedia.org/T110969#1605125 (Dzahn) a:Dzahn>None
[01:27:22] (PS1) Ori.livneh: hhvm: Disable fss.so on canary API and APP servers [puppet] - https://gerrit.wikimedia.org/r/235954
[01:30:13] (PS1) Dzahn: use https urls for miraheze updates [debs/wikistats] - https://gerrit.wikimedia.org/r/235955 (https://phabricator.wikimedia.org/T107398)
[01:30:48] (CR) Dzahn: [C: 2 V: 2] use https urls for miraheze updates [debs/wikistats] - https://gerrit.wikimedia.org/r/235955 (https://phabricator.wikimedia.org/T107398) (owner: Dzahn)
[01:30:54] (PS1) Krinkle: asset-check: Use mwLoadEvent hook instead of polling modules directly [puppet] - https://gerrit.wikimedia.org/r/235956
[01:31:50] (CR) Krinkle: "Won't work against production (halted indefinitely on the event that doesn't exist yet), but can be tested locally with the referenced Med" [puppet] - https://gerrit.wikimedia.org/r/235956 (owner: Krinkle)
[01:44:39] operations, Datasets-General-or-Unknown, HHVM: Convert snapshot hosts to use HHVM and trusty - https://phabricator.wikimedia.org/T94277#1605165 (Krenair) @ArielGlenn: What's the status of this? MediaWiki is not going to support PHP 5.3 forever, at some point these hosts are going to break.
[01:48:22] (PS1) Dzahn: add script to import miraheze wikis [debs/wikistats] - https://gerrit.wikimedia.org/r/235958 (https://phabricator.wikimedia.org/T107398)
[01:49:32] (CR) Jalexander: "Yes please re:hewiki, I can see why it happened (they just added all abusefilter options and they aren't documented well enough/marked off" [mediawiki-config] - https://gerrit.wikimedia.org/r/234910 (https://phabricator.wikimedia.org/T109755) (owner: Mjbmr)
[01:49:53] (CR) Jalexander: "I'm happy for that to be a different patch if we want to split it though (sorry for double comment)" [mediawiki-config] - https://gerrit.wikimedia.org/r/234910 (https://phabricator.wikimedia.org/T109755) (owner: Mjbmr)
[01:54:18] Jamesofur, mind making a separate task about revoking those rights?
[01:54:27] already created
[01:54:36] https://phabricator.wikimedia.org/T111439
[01:54:54] ah right
[01:55:05] I'll do it now
[01:55:19] mafk just volunteered
[01:55:24] you were a smidge behind :)
[01:56:05] I ain't going to fight with Krenair, but I can propose that I create the patch and he merges /me hides
[01:56:09] :)
[01:56:17] Yes but he's not on the server,is he?
[01:56:23] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: T111439 (duration: 00m 12s)
[01:56:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:56:43] for he I was thinking in you Krenair :)
[01:57:49] (PS1) Dzahn: wikistats: add cronjob for miraheze import script [puppet] - https://gerrit.wikimedia.org/r/235959 (https://phabricator.wikimedia.org/T107398)
[01:58:07] (PS1) Alex Monk: Revoke suppression-level rights from interface editors on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/235960 (https://phabricator.wikimedia.org/T111439)
[01:58:41] (CR) Alex Monk: [C: 2] "Already in production" [mediawiki-config] - https://gerrit.wikimedia.org/r/235960 (https://phabricator.wikimedia.org/T111439) (owner: Alex Monk)
[01:58:47] (Merged) jenkins-bot: Revoke suppression-level rights from interface editors on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/235960 (https://phabricator.wikimedia.org/T111439) (owner: Alex Monk)
[01:58:54] (CR) Dzahn: "how do i exclude private wikis?"
[debs/wikistats] - https://gerrit.wikimedia.org/r/235958 (https://phabricator.wikimedia.org/T107398) (owner: Dzahn)
[01:59:09] HAHA
[01:59:59] it's better that way, he can do it everything himself
[02:00:07] submit and merge
[02:01:12] I could argue that is, in fact, not the better way to do it ;) but I'm le tired
[02:01:14] :P
[02:01:53] petty change in this case
[02:01:59] yup :)
[02:03:02] time to sleep for me
[02:03:07] good night
[02:03:20] likewise Krenair
[02:04:08] sleep well
[02:04:46] (CR) Alex Monk: "if ( isset( $wiki['private'] ) ) {" [debs/wikistats] - https://gerrit.wikimedia.org/r/235958 (https://phabricator.wikimedia.org/T107398) (owner: Dzahn)
[02:04:54] PROBLEM - puppet last run on mw2191 is CRITICAL: CRITICAL: puppet fail
[02:05:00] yeah, let's hope I don't have another nightmare like yesterday
[02:23:27] !log l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 05m 21s)
[02:23:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:26:04] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-04 02:26:04+00:00
[02:26:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:28:25] (PS1) Krinkle: gdash: Remove 'frontend' dashboard [puppet] - https://gerrit.wikimedia.org/r/235961 (https://phabricator.wikimedia.org/T104365)
[02:29:54] operations, Services, Mobile-Content-Service, Monitoring, Patch-For-Review: Encoding issue in test_checker.py - https://phabricator.wikimedia.org/T111447#1605226 (bearND) Thanks, @ori and @Dzahn!
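Alex Monk's review comment above answers Dzahn's question by testing `isset( $wiki['private'] )` before importing a wiki. A minimal Python sketch of the same filter follows, assuming (hypothetically) that each wiki is a dict and that private wikis carry a `private` key; the field names and sample data are illustrative and are not taken from the miraheze feed or from the wikistats import script.

```python
def is_private(wiki):
    """Mirror PHP's isset(): true when the key exists and is not None."""
    return wiki.get("private") is not None


def public_wikis(wikis):
    """Return only the entries that are not flagged as private."""
    return [wiki for wiki in wikis if not is_private(wiki)]


# Hypothetical sample data; the real feed format is not shown in this log.
sample = [
    {"dbname": "examplewiki"},
    {"dbname": "secretwiki", "private": True},
    {"dbname": "nullwiki", "private": None},
]

print([wiki["dbname"] for wiki in public_wikis(sample)])
```

As with `isset()`, a key that is present but null counts as unset, so `nullwiki` is kept in this sketch.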
[02:30:55] RECOVERY - puppet last run on mw2191 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[02:33:28] operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1605232 (Krinkle)
[02:51:03] operations: Adapt all the things to localized Special: namespaces - https://phabricator.wikimedia.org/T105434#1605284 (Krinkle)
[03:06:06] (CR) Krinkle: [C: -1] "Depends on Ic0b1fb64ee in MediaWiki core." [puppet] - https://gerrit.wikimedia.org/r/235956 (owner: Krinkle)
[03:34:28] (CR) Papaul: [V: 1] wmnet: indentation fixes [dns] - https://gerrit.wikimedia.org/r/235928 (owner: Dzahn)
[03:59:51] operations, Wikimedia-Language-setup, Wikimedia-Site-Requests, Patch-For-Review: Rename "be-x-old" to "be-tarask" - https://phabricator.wikimedia.org/T11823#1605388 (Koavf) Thanks for everyone who has evidently resolved this issue. Can these Wikipedias be merged like the Cyrillic and Latin scripts...
[04:19:20] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Sep 4 04:19:20 UTC 2015 (duration 19m 19s)
[04:19:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:19:33] PROBLEM - puppet last run on cp3042 is CRITICAL: CRITICAL: puppet fail
[04:38:32] PROBLEM - puppet last run on mw2213 is CRITICAL: CRITICAL: puppet fail
[04:47:23] RECOVERY - puppet last run on cp3042 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:03:17] (PS1) Dzahn: Revert "mailman: ferm, allow rsync from sodium for migration" [puppet] - https://gerrit.wikimedia.org/r/235976
[05:04:23] RECOVERY - puppet last run on mw2213 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[05:09:32] (CR) Dzahn: [C: 2] Covert Polish listinfo template back to ISO 8859-2 [puppet] - https://gerrit.wikimedia.org/r/235911 (owner: saper)
[05:11:03] (CR) Dzahn: [C: -2] "reminder to self to revert this after the migration is over" [puppet] - https://gerrit.wikimedia.org/r/235976 (owner: Dzahn)
[05:16:04] PROBLEM - puppet last run on eventlog1001 is CRITICAL: CRITICAL: puppet fail
[05:24:45] (CR) Dzahn: [C: -1] "yea, this is good but we will merge it next Wednesday with the new mailman version, not before" [puppet] - https://gerrit.wikimedia.org/r/235384 (owner: John F. Lewis)
[05:29:43] PROBLEM - puppet last run on fermium is CRITICAL: CRITICAL: Puppet has 1 failures
[05:30:56] (PS4) Dzahn: varnish: lint fixes [puppet] - https://gerrit.wikimedia.org/r/211352
[05:32:49] (CR) Dzahn: "Error: /Stage[main]/Mailman::Webui/File[/etc/mailman/pl/listinfo.html]/content:..
failed: invalid byte sequence in UTF-8" [puppet] - https://gerrit.wikimedia.org/r/235911 (owner: saper)
[05:32:57] (PS1) Dzahn: Revert "Covert Polish listinfo template back to ISO 8859-2" [puppet] - https://gerrit.wikimedia.org/r/235980
[05:34:00] (PS2) Dzahn: Revert "Covert Polish listinfo template back to ISO 8859-2" [puppet] - https://gerrit.wikimedia.org/r/235980
[05:36:09] (CR) Dzahn: [C: 2] "breaks on jessie, does not on lucid" [puppet] - https://gerrit.wikimedia.org/r/235980 (owner: Dzahn)
[05:37:52] RECOVERY - puppet last run on fermium is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[05:41:08] (CR) Muehlenhoff: "This isn't ready to enable yet, it needs rules for incoming SSH connections from pybal" [puppet] - https://gerrit.wikimedia.org/r/223244 (https://phabricator.wikimedia.org/T104970) (owner: Dzahn)
[05:43:53] RECOVERY - puppet last run on eventlog1001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[05:47:20] Anything with Parsoid?
[05:47:49] It looks ContentTranslation can't publish article due to it.
[06:14:18] (PS1) Muehlenhoff: Restrict Hadoop access to the analytics network [puppet] - https://gerrit.wikimedia.org/r/235982
[06:23:54] (PS1) Muehlenhoff: Restrict access to analytics network for Hadoop master/standby [puppet] - https://gerrit.wikimedia.org/r/235983
[06:29:09] (PS1) Muehlenhoff: Use base::firewall on analytics1015 [puppet] - https://gerrit.wikimedia.org/r/235984
[06:31:32] PROBLEM - puppet last run on db2056 is CRITICAL: CRITICAL: puppet fail
[06:32:22] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:23] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:13] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:13] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:53] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:34:22] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:54:21] operations, Phabricator, Project-Creators, Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1605485 (Qgil) @ashley, you got it! And the periodical reminder: please familiarize yourself with https://www.mediawiki.org/wiki/Phabricator...
[07:00:44] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:01:03] RECOVERY - puppet last run on db2056 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[07:01:53] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[07:01:53] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[07:02:53] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:19:10] operations, Wikimedia-Git-or-Gerrit: Upgrade gerrit to latest 2.8.x (minor version upgrade) - https://phabricator.wikimedia.org/T65847#1605513 (Nemo_bis)
[07:25:42] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[07:27:13] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:38:31] (Abandoned) Alexandros Kosiaris: Provide LE with the right to stop/start apertium-apy [puppet] - https://gerrit.wikimedia.org/r/235461 (https://phabricator.wikimedia.org/T108678) (owner: Alexandros Kosiaris)
[07:39:32] (CR) Alexandros Kosiaris: [C: -1] "also need to apply the group to the corresponding hosts. Done via role::sca so file is hieradata/role/common/sca.yaml" [puppet] - https://gerrit.wikimedia.org/r/235854 (https://phabricator.wikimedia.org/T111360) (owner: Dzahn)
[07:41:54] godog: good morning!
Grafana might be able to graph metrics from Elastic Search queries :-} https://github.com/grafana/grafana/issues/1034#issuecomment-137467692
[07:50:24] !log cloning es3 mysql data from es1008 to es1019
[07:50:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:50:36] [ETA:8h]
[07:56:00] (PS1) Muehlenhoff: Define initial debdeploy server groups [puppet] - https://gerrit.wikimedia.org/r/235987
[07:57:39] (CR) Muehlenhoff: [C: 2 V: 2] Define initial debdeploy server groups [puppet] - https://gerrit.wikimedia.org/r/235987 (owner: Muehlenhoff)
[07:57:48] morning, is there a way to bypass varnish (with some http headers) for this kind of URLs: https://en.wikipedia.org/w/api.php?action=opensearch ?
[08:18:41] hashar: sweet! wasn't aware of that
[08:18:41] operations, RESTBase-Cassandra, hardware-requests: codfw 3x spares for cassandra encryption testing - https://phabricator.wikimedia.org/T111382#1605591 (fgiunchedi) >>! In T111382#1602899, @GWicke wrote: >> The recent restbase expansion order was entirely for eqiad. > > Umm, that surprises me. Those...
[08:21:44] (PS2) Ori.livneh: hhvm: Disable fss.so on MediaWiki canary servers [puppet] - https://gerrit.wikimedia.org/r/235954
[08:21:50] (PS3) Ori.livneh: hhvm: Disable fss.so on MediaWiki canary servers [puppet] - https://gerrit.wikimedia.org/r/235954
[08:26:41] dcausse: if you are logged in that should bypass varnish, i.e. session or token cookies
[08:27:16] godog: ok thanks will try
[08:28:31] which puppet version are we running?
[08:28:37] (and how do I check myself?)
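Returning to the Varnish question a few messages up: godog's suggestion is that a request carrying session or token cookies should not be served from cache. A minimal Python sketch of building such a request follows. The cookie name and value are hypothetical placeholders, nothing is sent over the network, and whether Varnish actually passes the request through depends on the cache configuration, which this log does not show.

```python
from urllib.request import Request

# Hypothetical cookie name and value; real session cookie names vary per wiki.
SESSION_COOKIE = "examplewikiSession=0123456789abcdef"


def build_uncached_request(url, cookie=SESSION_COOKIE):
    """Build (but do not send) a GET request carrying a session cookie."""
    request = Request(url)
    request.add_header("Cookie", cookie)
    return request


req = build_uncached_request(
    "https://en.wikipedia.org/w/api.php?action=opensearch"
)
print(req.full_url)
print(req.get_header("Cookie"))
```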
[08:30:03] saper: you can check from labs instances, puppet agent --version
[08:30:28] 3.7.2 on jessie btw
[08:31:58] 3.4.3 on tools-bastion-01
[08:32:16] which seems to be consistent with http://apt.wikimedia.org/wikimedia/pool/main/p/puppet/
[08:32:25] thank you
[08:32:47] no problem saper
[08:40:44] (PS1) Hashar: nodepool: send metrics to statsd [puppet] - https://gerrit.wikimedia.org/r/235989 (https://phabricator.wikimedia.org/T111496)
[08:41:46] (PS1) KartikMistry: CX: Disable ContentTranslation until publishing is fixed [mediawiki-config] - https://gerrit.wikimedia.org/r/235990
[08:42:38] (CR) Hashar: "Send them under `nodepool.` hierarchy." [puppet] - https://gerrit.wikimedia.org/r/235989 (https://phabricator.wikimedia.org/T111496) (owner: Hashar)
[08:47:02] PROBLEM - puppet last run on mw1081 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:56:52] (CR) Filippo Giunchedi: [C: -1] elasticsearch partman and autoinstall (3 comments) [puppet] - https://gerrit.wikimedia.org/r/235893 (owner: Rush)
[08:57:06] (CR) Filippo Giunchedi: "minor nit but LGTM, also do you have an example of metrics sent?"
(1 comment) [puppet] - https://gerrit.wikimedia.org/r/235989 (https://phabricator.wikimedia.org/T111496) (owner: Hashar)
[08:58:31] (CR) Alexandros Kosiaris: [C: 2 V: 2] Add Debian package for apertium-fr-ca [debs/contenttranslation/apertium-fr-ca] - https://gerrit.wikimedia.org/r/235418 (https://phabricator.wikimedia.org/T99637) (owner: KartikMistry)
[09:00:57] (CR) Filippo Giunchedi: gdash: Remove 'frontend' dashboard (1 comment) [puppet] - https://gerrit.wikimedia.org/r/235961 (https://phabricator.wikimedia.org/T104365) (owner: Krinkle)
[09:02:33] (CR) Filippo Giunchedi: [C: 1] admin: add a group for apertium admins [puppet] - https://gerrit.wikimedia.org/r/235851 (https://phabricator.wikimedia.org/T111360) (owner: Dzahn)
[09:06:47] (CR) Hashar: [C: -1] "Will amend commit summary with example metrics." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/235989 (https://phabricator.wikimedia.org/T111496) (owner: Hashar)
[09:07:22] (CR) Alexandros Kosiaris: [C: 2 V: 2] Added Debian package for apertium-eo-es [debs/contenttranslation/apertium-eo-es] - https://gerrit.wikimedia.org/r/235408 (https://phabricator.wikimedia.org/T102101) (owner: KartikMistry)
[09:09:55] (Abandoned) Muehlenhoff: Add ferm rules for role::mariadb::core [puppet] - https://gerrit.wikimedia.org/r/228804 (https://phabricator.wikimedia.org/T104699) (owner: Muehlenhoff)
[09:09:59] hashar: btw does default need to be quoted on https://gerrit.wikimedia.org/r/#/c/233936/2/manifests/role/grafana.pp ?
[09:10:02] (CR) Alexandros Kosiaris: [C: 2 V: 2] Added Debian package for apertium-eo-fr [debs/contenttranslation/apertium-eo-fr] - https://gerrit.wikimedia.org/r/235404 (https://phabricator.wikimedia.org/T102101) (owner: KartikMistry)
[09:10:19] godog: at least on my recentish puppet version yeah
[09:10:27] godog: I think a previous patchset failed as well
[09:10:42] https://gerrit.wikimedia.org/r/#/c/233936/1..2/manifests/role/grafana.pp
[09:10:45] quoted it
[09:10:47] (CR) Alexandros Kosiaris: [C: 2 V: 2] Add Debian package for apertium-eo-ca [debs/contenttranslation/apertium-eo-ca] - https://gerrit.wikimedia.org/r/235415 (https://phabricator.wikimedia.org/T102101) (owner: KartikMistry)
[09:11:03] godog: and puppet parser validate failed on ps1
[09:11:34] RECOVERY - puppet last run on mw1081 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[09:12:18] godog: nodepool generates too many metrics :-/
[09:12:46] akosiaris: thanks!
[09:12:55] hashar: how many? :D
[09:12:56] (CR) Alexandros Kosiaris: [C: 2 V: 2] Add Debian package for apertium-ca-it [debs/contenttranslation/apertium-ca-it] - https://gerrit.wikimedia.org/r/235410 (https://phabricator.wikimedia.org/T105582) (owner: KartikMistry)
[09:13:32] godog: it keeps track of number of builds for a given job.
So at least a metric per Jenkins job we are going to trigger with Nodepool
[09:13:37] potentially a shit ton of the
[09:13:40] m
[09:14:19] and there is no doc for their metrics so I am browsing http://graphite.openstack.org :-}
[09:16:09] !log uploaded to apt.wikimedia.org trusty-wikimedia: apertium-ca-it_0.1.1~r57554-1
[09:16:09] !log uploaded to apt.wikimedia.org trusty-wikimedia: apertium-eo-ca_0.9.1~r60655-1
[09:16:09] !log uploaded to apt.wikimedia.org trusty-wikimedia: apertium-eo-es_0.9.1~r60655-1
[09:16:09] !log uploaded to apt.wikimedia.org trusty-wikimedia: apertium-eo-fr_0.9.0~r28336-1
[09:16:09] !log uploaded to apt.wikimedia.org trusty-wikimedia: apertium-fr-ca_1.0.3~r61329-1
[09:16:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:16:18] kart_: ^
[09:16:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:16:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:16:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:16:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:18:35] hashar: yeah the per-job stats look like a whole lot
[09:18:56] godog: yeah that is a no/no :-}
[09:19:13] (PS3) Filippo Giunchedi: grafana: graphite is the default datasource [puppet] - https://gerrit.wikimedia.org/r/233936 (https://phabricator.wikimedia.org/T110317) (owner: Hashar)
[09:19:23] (CR) Filippo Giunchedi: [C: 2 V: 2] grafana: graphite is the default datasource [puppet] - https://gerrit.wikimedia.org/r/233936 (https://phabricator.wikimedia.org/T110317) (owner: Hashar)
[09:20:07] (CR) Hashar: "Thank Filippo to have suggested to give examples. I looked at Nodepool and on a job success it reports 8 metrics.
So the more jobs we ru" [puppet] - https://gerrit.wikimedia.org/r/235989 (https://phabricator.wikimedia.org/T111496) (owner: Hashar)
[09:20:38] (Abandoned) Hashar: nodepool: send metrics to statsd [puppet] - https://gerrit.wikimedia.org/r/235989 (https://phabricator.wikimedia.org/T111496) (owner: Hashar)
[09:21:04] (PS3) Tobias Gritschacher: phragile: Add role class [puppet] - https://gerrit.wikimedia.org/r/227466 (https://phabricator.wikimedia.org/T108803) (owner: WMDE-leszek)
[09:23:29] hashar: thanks! btw I'm not opposed in principle, your plan of poking upstream sounds good
[09:25:20] godog: I am sure it will hit us down the road at some point
[09:25:30] and we can live with out stats :-D So that is a very good catch!
[09:25:56] saves you from a nasty text message at 4 am regarding disk space being at 1% or something
[09:26:47] haha I hope it didn't come to that, but possible yeah!
[09:27:17] potentially we could set different retention strategy for different hierarchy
[09:33:16] operations, Graphite, Patch-For-Review: Grafana: singlestat / graph panels can not be edited - https://phabricator.wikimedia.org/T110317#1605820 (hashar) Open>Resolved Thank you very much, I managed to add a graph panel :-}
[09:33:34] godog: at least we can add graphs panel easily in Grafana :-}
[09:34:42] hashar: hehe no problem, sorry it's been sitting there for a while
[09:35:08] hashar: btw since we tweaked all retentions space isn't so bad, zuul takes ~16G
[09:35:36] I still owe you a patch to limit the metrics sent by Zuul :/
[09:36:47] yeah but don't worry too much about it, we're in a better place ATM
[09:39:47] (CR) Santhosh: [C: -1] "We have fix for the bugs" [mediawiki-config] - https://gerrit.wikimedia.org/r/235990 (owner: KartikMistry)
[09:49:44] operations, RESTBase, RESTBase-Cassandra: Cassandra inter-node encryption (TLS) - https://phabricator.wikimedia.org/T108953#1605919 (MoritzMuehlenhoff) >>!
In T108953#1597221, @fgiunchedi wrote: > thinking about it more, this scheme would make certs rollover harder since we'll need to have both new+ol...
[10:04:07] akosiaris: more thanks!
[10:13:43] operations, Services: Migrate SCA cluster to Jessie - https://phabricator.wikimedia.org/T96017#1606028 (akosiaris)
[10:14:39] operations, Services: Migrate SCA cluster to Jessie - https://phabricator.wikimedia.org/T96017#1206310 (akosiaris) Removed the `Blocked-on-Operations` tag as all prerequisites are done and we can start migrating services to it at leisure
[10:16:44] operations, Graphite: Remove graphite metrics trees gerrit.fab2 and gerrit.fab3 - https://phabricator.wikimedia.org/T110312#1606034 (fgiunchedi) {{done}} ``` root@graphite2001:/var/lib/carbon/whisper# rm -rf gerrit/fab2/ gerrit/fab3/ root@graphite2001:/var/lib/carbon/whisper# root@graphite1001:/var/lib/c...
[10:16:58] operations, Graphite: Remove graphite metrics trees gerrit.fab2 and gerrit.fab3 - https://phabricator.wikimedia.org/T110312#1606035 (fgiunchedi) Open>Resolved a:fgiunchedi
[10:18:55] !log kartik@tin Synchronized php-1.26wmf21/extensions/ContentTranslation/api/ApiContentTranslationPublish.php: php-1.26wmf21/extensions/ContentTranslation/extension.json T111490:Use the VirtualRESTService to configure CX (duration: 00m 12s)
[10:19:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:22:14] operations, Beta-Cluster, Graphite, Shinken: Delete specific deployment-prep graphite datapoints - https://phabricator.wikimedia.org/T104091#1606044 (fgiunchedi) Open>Resolved {{done}} ``` root@labmon1001:/var/lib/carbon/whisper/deployment-prep# rm -rfv deployment-elastic05/diskspace/_var dep...
[10:24:44] operations, Graphite, Monitoring: evaluate tessera dashboards - https://phabricator.wikimedia.org/T104366#1606052 (fgiunchedi) Open>declined given that grafana is actively maintained and with a bigger community we should use that in favor of tessera, thus declining this. note that tessera can stil...
[10:26:27] operations, Graphite: graphite2001 OOM and unresponsive - https://phabricator.wikimedia.org/T101572#1606068 (fgiunchedi) Open>stalled upstream issue hasn't seen a lot of activity, stalling
[10:31:13] (Abandoned) KartikMistry: CX: Disable ContentTranslation until publishing is fixed [mediawiki-config] - https://gerrit.wikimedia.org/r/235990 (owner: KartikMistry)
[10:43:00] (PS1) KartikMistry: apertium: Add new language pairs [puppet] - https://gerrit.wikimedia.org/r/236004
[10:43:51] (CR) jenkins-bot: [V: -1] apertium: Add new language pairs [puppet] - https://gerrit.wikimedia.org/r/236004 (owner: KartikMistry)
[10:52:20] (PS2) KartikMistry: apertium: Add new language pairs [puppet] - https://gerrit.wikimedia.org/r/236004
[10:56:59] (CR) Filippo Giunchedi: [C: -1] "some more comments around host filtering, LGTM overall" (3 comments) [tools/scap] - https://gerrit.wikimedia.org/r/235385 (owner: Thcipriani)
[11:30:43] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[11:34:33] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 38.46% of data above the critical threshold [500.0]
[11:34:53] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 6 below the confidence bounds
[11:37:09] Ops-Access-Requests, operations: Requesting access to elasticsearch-roots - https://phabricator.wikimedia.org/T111473#1606243 (Wwes) Approved
[11:42:40] (PS1) Muehlenhoff: Implement list-server-groups command [debs/debdeploy] -
https://gerrit.wikimedia.org/r/236009
[11:46:33] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:50:31] (CR) Muehlenhoff: [C: 2 V: 2] Implement list-server-groups command [debs/debdeploy] - https://gerrit.wikimedia.org/r/236009 (owner: Muehlenhoff)
[12:13:13] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[12:47:19] !log uploaded debdeploy 0.0.4 to carbon
[12:47:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:48:06] headsup: I'll restart the salt-master in about 5 minutes (minions will take up to 1-2 minutes to fully reconnect), if that's bad for anyone, please speak out
[12:54:14] assuming it comes back, no problem for me ;-)
[12:55:45] !log restarted salt-master on palladium
[12:55:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:56:36] jynus: all well, most minions have reconnected already
[13:09:03] (CR) Krinkle: gdash: Remove 'frontend' dashboard (1 comment) [puppet] - https://gerrit.wikimedia.org/r/235961 (https://phabricator.wikimedia.org/T104365) (owner: Krinkle)
[13:16:12] (CR) Alexandros Kosiaris: [C: 2] apertium: Add new language pairs [puppet] - https://gerrit.wikimedia.org/r/236004 (owner: KartikMistry)
[13:37:35] Krinkle: I guess just sth like "deprecated: see " would be enough re: https://gerrit.wikimedia.org/r/#/c/235961/
[13:37:43] (CR) Ottomata: [C: 1] Restrict Hadoop access to the analytics network [puppet] - https://gerrit.wikimedia.org/r/235982 (owner: Muehlenhoff)
[13:37:48] godog: Does it support HTML?
[13:39:24] Krinkle: no idea, just the pointer is fine I think
[13:39:32] godog: openstack wrote some YAML based DSL to generate Grafana dashboards :-D That let you make graph templates !
https://github.com/openstack-infra/grafyaml
[13:39:53] (CR) Ottomata: [C: 1] "I think this is fine, but let's be careful with it and log drops when we enable base::firewall" [puppet] - https://gerrit.wikimedia.org/r/235983 (owner: Muehlenhoff)
[13:40:00] sweet hashar !
[13:40:47] (CR) Ottomata: [C: 1] "This node is going to be a hive::server and oozie::server after it gets reinstalled." [puppet] - https://gerrit.wikimedia.org/r/235984 (owner: Muehlenhoff)
[13:41:33] godog: they are very early in their process. The dashboard is empty for now :/
[13:42:00] They're running a much newer version though
[13:43:28] operations, ops-codfw: Patch NTT @ eqdfw - https://phabricator.wikimedia.org/T111522#1606387 (faidon) NEW a:RobH
[13:45:53] is there a way to see the % of queries that hit the varnish cache for opensearch?
[13:46:32] (CR) Filippo Giunchedi: [C: 1] "LGTM, the commit message still references .scaprc btw" [tools/scap] - https://gerrit.wikimedia.org/r/224374 (owner: Thcipriani)
[14:00:20] (CR) Hashar: [C: 2 V: 2] "Build and deployed!" [debs/nodepool] (debian) - https://gerrit.wikimedia.org/r/235696 (https://phabricator.wikimedia.org/T107268) (owner: Hashar)
[14:26:48] (PS2) Dzahn: admin: add kartik to apertium-admins [puppet] - https://gerrit.wikimedia.org/r/235854 (https://phabricator.wikimedia.org/T111360)
[14:26:55] operations, hardware-requests: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1606521 (akosiaris) After the meeting ops had with analytics where the requirements were clarified, this was proposed. I support 3 different spare boxes on 3 rack rows (if possible) in eqiad s...
[14:27:22] godog: RE: https://phabricator.wikimedia.org/T111170#1605584 - even in very recently aggregated data points it's still not using the new logic I think
[14:27:31] "newly created whisper files" what does that mean?
[14:27:32] (03CR) 10Dzahn: "@akosiaris - done" [puppet] - 10https://gerrit.wikimedia.org/r/235854 (https://phabricator.wikimedia.org/T111360) (owner: 10Dzahn) [14:27:37] A metric name? Per day? Per week? [14:27:53] (03PS3) 10Dzahn: admin: add a group for apertium admins [puppet] - 10https://gerrit.wikimedia.org/r/235851 (https://phabricator.wikimedia.org/T111360) [14:28:31] (03CR) 10Dzahn: [C: 032] admin: add a group for apertium admins [puppet] - 10https://gerrit.wikimedia.org/r/235851 (https://phabricator.wikimedia.org/T111360) (owner: 10Dzahn) [14:29:05] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds [14:29:17] 6operations, 10hardware-requests: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1606525 (10akosiaris) [14:30:21] godog: basically, I'd like to see it working on recent data before we ditch history and re-run it retroactively [14:30:27] (03PS2) 10Muehlenhoff: Use base::firewall on analytics1015 [puppet] - 10https://gerrit.wikimedia.org/r/235984 [14:30:42] (03CR) 10Muehlenhoff: [C: 032 V: 032] Use base::firewall on analytics1015 [puppet] - 10https://gerrit.wikimedia.org/r/235984 (owner: 10Muehlenhoff) [14:31:25] hashar: jenkins upgrade on gallium can be done, if you want [14:31:36] (03PS2) 10Muehlenhoff: Restrict Hadoop access to the analytics network [puppet] - 10https://gerrit.wikimedia.org/r/235982 [14:32:06] (03CR) 10Muehlenhoff: [C: 032 V: 032] Restrict Hadoop access to the analytics network [puppet] - 10https://gerrit.wikimedia.org/r/235982 (owner: 10Muehlenhoff) [14:32:50] (03PS2) 10Hashar: statsd and systemd support [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/224390 (https://phabricator.wikimedia.org/T96867) [14:33:20] (03CR) 10Hashar: "systemd for the .deb package is being worked on at https://gerrit.wikimedia.org/r/#/c/224390/" [puppet] - 10https://gerrit.wikimedia.org/r/224102 
(https://phabricator.wikimedia.org/T96867) (owner: 10Hashar) [14:33:23] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 0 below the confidence bounds [14:33:52] (03PS1) 10Phedenskog: Collect missing Navigation Timing metrics [puppet] - 10https://gerrit.wikimedia.org/r/236024 (https://phabricator.wikimedia.org/T109756) [14:34:24] mutante: I will do it on monday :-} [14:35:37] hashar: ok:) less users then since US will have Labor Day [14:36:22] (03PS2) 10Muehlenhoff: Restrict access to analytics network for Hadoop master/standby [puppet] - 10https://gerrit.wikimedia.org/r/235983 [14:36:25] 6operations, 10Continuous-Integration-Infrastructure, 7Jenkins: Please refresh Jenkins package on apt.wikimedia.org to 1.609.3 - https://phabricator.wikimedia.org/T111327#1606548 (10hashar) Confirmed: ``` $ ssh gallium.wikimedia.org apt-cache policy jenkins jenkins: Installed: 1.609.1 Candidate: 1.609.3... [14:36:46] (03CR) 10Muehlenhoff: [C: 032 V: 032] Restrict access to analytics network for Hadoop master/standby [puppet] - 10https://gerrit.wikimedia.org/r/235983 (owner: 10Muehlenhoff) [14:44:34] mutante: I have scheduled Jenkins upgrade for monday at 8am UTC . Thank you ! 
[14:45:13] hashar: welcome [14:48:07] (03PS1) 10Amire80: Configure $wgBabelCategoryNames for the Hebrew Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236025 [14:48:25] (03CR) 10Krinkle: [C: 031] Collect missing Navigation Timing metrics [puppet] - 10https://gerrit.wikimedia.org/r/236024 (https://phabricator.wikimedia.org/T109756) (owner: 10Phedenskog) [14:49:02] (03PS2) 10Krinkle: Collect missing Navigation Timing metrics [puppet] - 10https://gerrit.wikimedia.org/r/236024 (https://phabricator.wikimedia.org/T109756) (owner: 10Phedenskog) [14:50:09] (03CR) 10Dzahn: [C: 032] wmnet: indentation fixes [dns] - 10https://gerrit.wikimedia.org/r/235928 (owner: 10Dzahn) [14:50:50] (03PS3) 10Hashar: statsd and systemd support [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/224390 (https://phabricator.wikimedia.org/T96867) [14:54:55] (03PS3) 10Phedenskog: Collect missing Navigation Timing metrics [puppet] - 10https://gerrit.wikimedia.org/r/236024 (https://phabricator.wikimedia.org/T109756) [14:55:29] Krinkle: ack, I'm in the middle of sth but will get back to it [14:55:43] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [14:56:13] k :) [14:56:32] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [14:57:43] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [14:58:33] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
[14:59:31] (03PS4) 10Phedenskog: Collect missing Navigation Timing metrics [puppet] - 10https://gerrit.wikimedia.org/r/236024 (https://phabricator.wikimedia.org/T109756) [15:00:18] 6operations, 10hardware-requests: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1606574 (10Ottomata) The boxes that can be slated for this currently are: - analytics1011 - analytics1016 - analytics1017 - analytics1019 - analytics1015 The first 4 in this list are still liv... [15:00:21] (03CR) 10jenkins-bot: [V: 04-1] Collect missing Navigation Timing metrics [puppet] - 10https://gerrit.wikimedia.org/r/236024 (https://phabricator.wikimedia.org/T109756) (owner: 10Phedenskog) [15:06:56] (03PS4) 10Hashar: statsd and systemd support [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/224390 (https://phabricator.wikimedia.org/T96867) [15:12:45] (03PS5) 10Hashar: 0.1.1-wmf3: statsd and systemd support [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/224390 (https://phabricator.wikimedia.org/T96867) [15:17:50] (03CR) 10Hashar: "Moritz, this is ready for review :-)" [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/224390 (https://phabricator.wikimedia.org/T96867) (owner: 10Hashar) [15:26:11] 6operations, 10vm-requests: Site: 1 VM request for OTRS - https://phabricator.wikimedia.org/T111532#1606606 (10akosiaris) 3NEW a:3akosiaris [15:34:51] (03PS1) 10Alexandros Kosiaris: Remove etcd100X from manifests [puppet] - 10https://gerrit.wikimedia.org/r/236030 (https://phabricator.wikimedia.org/T110030) [15:35:50] !log python varnishlog collector + gdb running on cp1052 for debugging T83580 [15:35:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:36:29] (03PS1) 10Alexandros Kosiaris: Remove etcd100X from DNS [dns] - 10https://gerrit.wikimedia.org/r/236031 (https://phabricator.wikimedia.org/T110030) [15:39:26] (03PS1) 10Jcrespo: Repool es1002, es1016; Depool es1004 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/236032 [15:39:59] 6operations, 10vm-requests: Site: 1 VM request for OTRS - https://phabricator.wikimedia.org/T111532#1606674 (10akosiaris) [15:40:16] (03CR) 10Jcrespo: [C: 032] Repool es1002, es1016; Depool es1004 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236032 (owner: 10Jcrespo) [15:40:58] (03PS1) 10Alexandros Kosiaris: Introduce mendelevium to the cluster [dns] - 10https://gerrit.wikimedia.org/r/236033 (https://phabricator.wikimedia.org/T111532) [15:42:05] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1002, es1016; Depool es1004 (duration: 00m 11s) [15:42:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:43:08] 6operations, 10vm-requests, 5Patch-For-Review: Site: 1 VM request for OTRS - https://phabricator.wikimedia.org/T111532#1606682 (10Krenair) [15:43:15] 6operations, 10OTRS, 10vm-requests, 5Patch-For-Review: Site: 1 VM request for OTRS - https://phabricator.wikimedia.org/T111532#1606684 (10Krenair) [15:46:34] (03PS2) 10Alexandros Kosiaris: Remove etcd100X from manifests [puppet] - 10https://gerrit.wikimedia.org/r/236030 (https://phabricator.wikimedia.org/T110030) [15:46:41] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Remove etcd100X from manifests [puppet] - 10https://gerrit.wikimedia.org/r/236030 (https://phabricator.wikimedia.org/T110030) (owner: 10Alexandros Kosiaris) [15:47:53] (03CR) 10Alexandros Kosiaris: [C: 032] Remove etcd100X from DNS [dns] - 10https://gerrit.wikimedia.org/r/236031 (https://phabricator.wikimedia.org/T110030) (owner: 10Alexandros Kosiaris) [15:49:31] 7Blocked-on-Operations, 5Patch-For-Review: Remove etcd100{1,2,3}.eqiad.wmnet from the fleet - https://phabricator.wikimedia.org/T110030#1606691 (10akosiaris) 5Open>3Resolved [15:49:39] 7Blocked-on-Operations, 5Patch-For-Review: Remove etcd100{1,2,3}.eqiad.wmnet from the fleet - https://phabricator.wikimedia.org/T110030#1567171 (10akosiaris) [15:54:38] 6operations, 10OTRS, 
10vm-requests, 5Patch-For-Review: Site: 1 VM request for OTRS - https://phabricator.wikimedia.org/T111532#1606696 (10akosiaris) [15:54:48] haha akosiaris [15:54:58] hive/hadoop webrequest log access [15:55:06] requires membership in analytics-privatedata-users [15:55:10] not statistics-privatedata-users [15:55:15] https://gerrit.wikimedia.org/r/#/c/233376/ [15:57:16] (03PS1) 10Alexandros Kosiaris: Introduce mendelevium to the cluster [puppet] - 10https://gerrit.wikimedia.org/r/236035 (https://phabricator.wikimedia.org/T111532) [15:57:36] ottomata: yeah that's why I was asking back then what is going on with those 2 groups [15:58:09] haha, i guess we didn't clear it up! i've since chagned the descriptions in data.yal [15:58:11] yaml [15:58:17] if you had read them as they are now [15:58:24] would you have put them in analytics-privatedata-users? [15:58:58] I think now it's clear [15:59:05] I would have put the user in the correct group [16:01:31] 6operations, 10OTRS, 10vm-requests, 5Patch-For-Review: Site: 1 VM request for OTRS - https://phabricator.wikimedia.org/T111532#1606702 (10akosiaris) p:5Triage>3High [16:01:56] akosiaris: you fixing or shall I? [16:03:40] ah, I had already fixed it for smalyshev https://phabricator.wikimedia.org/T110217. erik eluded me for some reason [16:03:49] I 'll fix it [16:06:28] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected [16:07:31] (03PS2) 10Krinkle: gdash: Remove 'frontend' dashboard [puppet] - 10https://gerrit.wikimedia.org/r/235961 (https://phabricator.wikimedia.org/T104365) [16:07:33] (03PS1) 10Alexandros Kosiaris: Move ebernhardson to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/236036 (https://phabricator.wikimedia.org/T109356) [16:09:22] ottomata: ^ looks ok I suppose ? 
[16:10:04] 6operations, 10OTRS, 10vm-requests, 5Patch-For-Review: EQIAD: 1 VM request for OTRS - https://phabricator.wikimedia.org/T111532#1606782 (10akosiaris) [16:10:53] (03CR) 10Alexandros Kosiaris: [C: 032] Move ebernhardson to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/236036 (https://phabricator.wikimedia.org/T109356) (owner: 10Alexandros Kosiaris) [16:11:18] !log updating firewall border ACLs and BGP border filters across all cr [16:11:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:17:08] ak yes +1 [16:17:19] akosiaris: +1 [16:25:14] 6operations, 10OTRS, 10vm-requests, 5Patch-For-Review: EQIAD: 1 VM request for OTRS - https://phabricator.wikimedia.org/T111532#1606819 (10pajz) Are there practical implications of this for administrative maintenance of OTRS? Would this affect the server name (iodine)? Does it require a re-configuration of... [16:26:05] (03CR) 10Dzahn: "nevermind, papaul did mgmt and i merged that, you are doing server IPs. 
no conflict" [dns] - 10https://gerrit.wikimedia.org/r/235906 (owner: 10Rush) [16:26:15] (03PS3) 10Rush: New addition elasticsearch20[0-1][0-9] [dns] - 10https://gerrit.wikimedia.org/r/235906 [16:27:02] 6operations, 10ops-codfw, 5Patch-For-Review: rack & initial setup of elastic2001-2024 - https://phabricator.wikimedia.org/T111080#1606820 (10chasemp) https://gerrit.wikimedia.org/r/#/c/235906/ [16:27:09] !log cloning es1 mysql data from es1004 to es1018 [ETA:16h] [16:27:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:27:35] ^and that will make 9/9 [16:29:05] (03PS5) 10BBlack: varnish: lint fixes [puppet] - 10https://gerrit.wikimedia.org/r/211352 (owner: 10Dzahn) [16:30:58] (03CR) 10BBlack: [C: 032] varnish: lint fixes [puppet] - 10https://gerrit.wikimedia.org/r/211352 (owner: 10Dzahn) [16:33:33] (03PS2) 10BBlack: GeoIP: specify lat/lon to two decimal places [puppet] - 10https://gerrit.wikimedia.org/r/235543 (owner: 10Ori.livneh) [16:36:35] 6operations, 6Performance-Team, 7Graphite, 5Patch-For-Review: "sum" aggregation broken in Graphite - https://phabricator.wikimedia.org/T111170#1606845 (10fgiunchedi) [16:36:49] Krinkle: let me know if https://phabricator.wikimedia.org/T111170#1606844 helps [16:38:54] (03CR) 10BBlack: "The code change is simple, and at %.2f, I think worst case (varies by position on globe, obviously) we'd be looking at a ~1.3km resolution" [puppet] - 10https://gerrit.wikimedia.org/r/235543 (owner: 10Ori.livneh) [16:40:29] godog: so at this point, I should get dashboards moved to grafana right? [16:41:08] 6operations, 10hardware-requests: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1606858 (10JAllemandou) [16:42:21] 6operations, 10hardware-requests: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1593063 (10JAllemandou) The machines @Ottomata describes have no SSDs --> @akosiaris: Is that a no-go ? 
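A rough sketch of the arithmetic behind the "%.2f" resolution remark above, assuming a spherical Earth with a mean radius of 6371 km. The function name `grid_cell_km` and the constant are illustrative, not taken from the patch. It estimates how large one 0.01-degree grid step is on the ground. It does not try to reproduce the exact ~1.3 km figure quoted in the review, only to show that two decimal places put the resolution on the order of a kilometre.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius, spherical Earth


def grid_cell_km(lat_deg, decimals=2):
    """Return (north_south_km, east_west_km) spanned by one rounding step.

    A coordinate printed with `decimals` places moves in steps of
    10**-decimals degrees; this converts one such step to kilometres.
    """
    step_deg = 10 ** -decimals
    km_per_deg = 2 * math.pi * EARTH_RADIUS_KM / 360
    north_south = step_deg * km_per_deg
    # Meridians converge toward the poles, so east-west steps shrink
    # with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


if __name__ == "__main__":
    for lat in (0.0, 45.0, 60.0):
        ns, ew = grid_cell_km(lat)
        print(f"lat {lat:5.1f}: {ns:.2f} km N-S, {ew:.2f} km E-W")
```

The north-south step is the same everywhere, while the east-west step is largest at the equator, which is presumably why the review notes that the worst case varies by position on the globe.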
[16:43:42] bblack: yeah, after playing with it a bit more grafana seems better if for no other reason that's actively maintained [16:44:29] (03CR) 10EBernhardson: "minor quibble, the commit message only encompases elasticsearch20{01..19} but the patch includes servers through 24." [dns] - 10https://gerrit.wikimedia.org/r/235906 (owner: 10Rush) [16:44:31] its also kind of addictive [16:44:39] can't stop tweaking dashboard [16:45:54] yeah I was looking at grafana v2 and we should able to fix that, it can load json dashboards from local fs too [16:46:10] that == dashboards can change at any time [16:46:10] nice [16:46:18] would be cool to be able to puppetize dashboards [16:46:55] we should have a competition for most useful and prettiest dashboard [16:47:54] 6operations, 10Beta-Cluster, 7Graphite, 7Shinken: Delete more specific deployment-prep graphite datapoints - https://phabricator.wikimedia.org/T111540#1606894 (10Krenair) 3NEW [16:48:01] the winner gets a rag puppet! [16:50:49] (03PS4) 10Rush: New addition elasticsearch20[0-2][0-9] [dns] - 10https://gerrit.wikimedia.org/r/235906 [16:52:39] 6operations, 10Datasets-General-or-Unknown, 7HHVM: Convert snapshot hosts to use HHVM and trusty - https://phabricator.wikimedia.org/T94277#1606911 (10demon) >>! In T94277#1605165, @Krenair wrote: > @ArielGlenn: What's the status of this? MediaWiki is not going to support PHP 5.3 forever, at some point these... 
[17:00:31] godog, it seems kind of suboptimal that we have to have production roots clean up after deleting labs instances, or even changing mounts on them :/ [17:00:36] but thanks for handling those tickets anyway [17:03:18] Krenair: I agree it is suboptimal :( for mount points specifically we might be able to get smarter since it'll be recurring I suspect, otoh processing those in batches every now and then isn't a lot of work either [17:03:29] (03PS1) 10Amire80: Add wmgBabelCategoryNames values for the Ladino Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236042 [17:12:20] (03PS3) 10Filippo Giunchedi: gdash: Remove 'frontend' dashboard [puppet] - 10https://gerrit.wikimedia.org/r/235961 (https://phabricator.wikimedia.org/T104365) (owner: 10Krinkle) [17:12:26] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] gdash: Remove 'frontend' dashboard [puppet] - 10https://gerrit.wikimedia.org/r/235961 (https://phabricator.wikimedia.org/T104365) (owner: 10Krinkle) [17:16:56] (03CR) 10Ori.livneh: "@BBlack: yep: https://github.com/wikimedia/mediawiki-extensions-CentralNotice/blob/master/resources/subscribing/ext.centralNotice.geoIP.js" [puppet] - 10https://gerrit.wikimedia.org/r/235543 (owner: 10Ori.livneh) [17:17:31] 6operations, 10netops: Filter outgoing BGP announcements on AS regex - https://phabricator.wikimedia.org/T83037#1607012 (10faidon) [17:19:10] (03PS4) 10Ori.livneh: hhvm: Disable fss.so on MediaWiki canary servers [puppet] - 10https://gerrit.wikimedia.org/r/235954 [17:19:19] (03CR) 10Ori.livneh: [C: 032 V: 032] hhvm: Disable fss.so on MediaWiki canary servers [puppet] - 10https://gerrit.wikimedia.org/r/235954 (owner: 10Ori.livneh) [17:19:26] (03CR) 10BBlack: [C: 031] GeoIP: specify lat/lon to two decimal places [puppet] - 10https://gerrit.wikimedia.org/r/235543 (owner: 10Ori.livneh) [17:19:52] bblack: i appreciate that merge [17:20:09] (03CR) 10Krinkle: [C: 031] GeoIP: specify lat/lon to two decimal places [puppet] - 
10https://gerrit.wikimedia.org/r/235543 (owner: 10Ori.livneh) [17:20:46] (03PS3) 10Ori.livneh: GeoIP: specify lat/lon to two decimal places [puppet] - 10https://gerrit.wikimedia.org/r/235543 [17:20:57] (03CR) 10Ori.livneh: [C: 032 V: 032] GeoIP: specify lat/lon to two decimal places [puppet] - 10https://gerrit.wikimedia.org/r/235543 (owner: 10Ori.livneh) [17:24:34] (03PS1) 10Greg Grossmeier: Revert "Disable anonymous page creation on swWiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236045 (https://phabricator.wikimedia.org/T44894) [17:24:37] (03CR) 10jenkins-bot: [V: 04-1] Revert "Disable anonymous page creation on swWiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236045 (https://phabricator.wikimedia.org/T44894) (owner: 10Greg Grossmeier) [17:25:55] ori: \o/ [17:26:06] paravoid: fss? [17:26:13] yes [17:26:16] yeah, looking good [17:26:22] ebernhardson did some awesome work [17:26:22] i dunno if its just me, but bast1001.wikimedia.org is resolving as ipv6, but port 22 just hangs forever (filtered perhaps?) [17:26:24] awesome [17:26:28] i can still connect to the ipv4 address for bast1001 though [17:26:41] ebernhardson: works from here [17:26:54] but I did change a lot of filters in our border routers today [17:26:58] odd, i'll just force ipv4 for now could be my isp doing wackiness [17:26:58] so let's please debug this [17:27:01] ok [17:27:10] what's your ipv6? [17:27:27] paravoid: 2601:648:8402:c015:bdeb:9ed9:234:82a4 [17:27:48] can you try to connect now? [17:28:52] paravoid: still just sitting there. 
i noticed the other connection i let sit for awhile did eventually connect though [17:29:07] but this one has been going >30s now and now connection yet [17:29:25] if you let it sit for a while it will fallback to ipv4 [17:29:33] ahh, ok [17:29:56] tcp6 0 1000 2620:0:861:2:208:80::22 2601:648:8402:c01:39256 ESTABLISHED 30957/sshd: ebernha [17:30:00] I do see that, though [17:30:07] FWIW, I can reach it over ipv6 fine from my linode bouncer [17:30:13] you're sending packets back and forth over that too [17:30:15] ControlMaster? [17:30:54] 17:29:56.605340 IP6 2620:0:861:2:208:80:154:149.22 > 2601:648:8402:c015:bdeb:9ed9:234:82a4.39256: Flags [P.], seq 0:100, ack 1, win 83, options [nop,nop,TS val 456897664 ecr 20655030], length 100 [17:30:58] 17:29:56.710298 IP6 2601:648:8402:c015:bdeb:9ed9:234:82a4.39256 > 2620:0:861:2:208:80:154:149.22: Flags [.], ack 100, win 1429, options [nop,nop,TS val 20863968 ecr 456897664,nop,nop,sack 1 {0:100}], length 0 [17:31:00] 6operations, 6Performance-Team, 7Graphite, 5Patch-For-Review: "sum" aggregation broken in Graphite - https://phabricator.wikimedia.org/T111170#1607084 (10Krinkle) @fgiunchedi Ah explains the behaviour I've been seeing. The settings are hardcoded when new whisper files are created. Whenever fresh data is ag... 
[17:31:02] 17:29:56.710339 IP6 2620:0:861:2:208:80:154:149.22 > 2601:648:8402:c015:bdeb:9ed9:234:82a4.39256: Flags [P.], seq 100:1000, ack 1, win 83, options [nop,nop,TS val 456897690 ecr 20863968], length 900 [17:31:06] 17:29:56.789521 IP6 2601:648:8402:c015:bdeb:9ed9:234:82a4.39256 > 2620:0:861:2:208:80:154:149.22: Flags [.], ack 1000, win 1424, options [nop,nop,TS val 20863988 ecr 456897690], length 0 [17:31:50] (03PS1) 10Ori.livneh: Disable fss.so on all HHVM servers [puppet] - 10https://gerrit.wikimedia.org/r/236046 (https://phabricator.wikimedia.org/T101418) [17:32:13] ori: not today please :) [17:32:25] paravoid: I was about to -1 it saying that :P [17:32:29] :P [17:33:11] (03CR) 10Ori.livneh: [C: 04-1] "Let's allow I6985a8128 to sit in prod over the weekend and push this out on Monday." [puppet] - 10https://gerrit.wikimedia.org/r/236046 (https://phabricator.wikimedia.org/T101418) (owner: 10Ori.livneh) [17:33:20] paravoid: the ControlMaster thing, its set to 'auto' in my .ssh/config [17:33:24] (for bast1001) [17:34:00] try looking for another ssh session with bast1001 [17:34:05] ps aux |grep ssh [17:34:18] (03CR) 10Ori.livneh: "Also note that this does not ensure => absent the hhvm-fss package; I'll do that with Salt post-deploy." [puppet] - 10https://gerrit.wikimedia.org/r/236046 (https://phabricator.wikimedia.org/T101418) (owner: 10Ori.livneh) [17:34:57] paravoid: i do have a connection open to terbium, which is proxies through bast1001. but i've never had issues opening multiple connections before [17:35:14] which, now that i think about it, means bast1001 worked fine ~ 2 hours ago when i opened that conection [17:35:25] !log Maps: dropped duplicate index on water_polygons [17:35:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:35:38] try "ssh -S none bast1001.wikimedia.org" [17:36:19] paravoid: connects immediatly and SSH_CLIENT is an ipv6 address. your a magician. 
now to learn what that all means :) [17:36:32] "-S none" means to not use the control master [17:36:59] your control master is hung or broken or something [17:37:23] interesting. well learning new useful things every day. thanks! [17:37:33] try logging out from terbium [17:37:46] (and possibly kill the master, if you have ControlPersist set) [17:37:53] and logging in again [17:38:33] yup just killing it did the trick. ssh got stuffed somehow [17:39:38] * ori likes https://github.com/ClockworkNet/cmc [17:46:29] ebernhardson: btw, since you're here, thanks for the fss replacement effort :) [17:46:33] this is pretty awesome work [17:47:01] paravoid: ended up being a lot more work than i envisioned, but i do like the end result :) [17:49:03] 6operations, 6Performance-Team, 7Graphite, 5Patch-For-Review: "sum" aggregation broken in Graphite - https://phabricator.wikimedia.org/T111170#1607167 (10fgiunchedi) >>! In T111170#1607084, @Krinkle wrote: > @fgiunchedi Ah explains the behaviour I've been seeing. The settings are hardcoded when new whisper... [17:52:35] 6operations, 7Graphite, 7Monitoring: grafana.wikimedia.org calls out to AWS - https://phabricator.wikimedia.org/T110484#1607236 (10Krinkle) [17:54:38] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100% [17:55:12] 6operations, 7Graphite: grafana access control - https://phabricator.wikimedia.org/T108546#1607436 (10Krinkle) > graphite.wikimedia.org requires a WMF account, while grafana.wikimedia.org is completely open and exposes the exact same set of metrics. graphite.wikimedia.org only requires a WMF account for its...
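The ControlMaster debugging steps from the exchange above can be sketched as a small helper that builds the relevant `ssh` command lines. The function name `control_master_commands` is hypothetical; the flags themselves (`-O check`, `-O exit`, `-S none`) are standard OpenSSH options. The helper only assembles the argument lists and does not run anything, so it is safe to try without network access.

```python
def control_master_commands(host):
    """Build ssh argv lists for diagnosing a stuck ControlMaster.

    Returns a dict with three entries:
      check  - ask the existing master process whether it is still alive
      bypass - open a fresh connection that ignores the control socket
      exit   - tell the master process to shut down
    """
    return {
        "check": ["ssh", "-O", "check", host],
        "bypass": ["ssh", "-S", "none", host],
        "exit": ["ssh", "-O", "exit", host],
    }


if __name__ == "__main__":
    # Print the commands rather than executing them.
    for name, argv in control_master_commands("bast1001.wikimedia.org").items():
        print(f"{name:7s} {' '.join(argv)}")
```

If the `bypass` form connects immediately while a normal `ssh` hangs, the shared master connection is the likely culprit, as it was here; `exit` (or killing the master process, as was done in the channel) clears it so the next login creates a new one.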
[17:55:26] YuviPanda: [spam] http://www.projectcalico.org might be interesting [17:55:58] RECOVERY - Host mw2027 is UP: PING OK - Packet loss = 0%, RTA = 37.63 ms [17:56:10] aude, feel free to reopen the ticket, I didn't want to make the bot operatpr feel bad [18:01:31] 6operations, 6Performance-Team, 7Graphite, 5Patch-For-Review: "sum" aggregation broken in Graphite - https://phabricator.wikimedia.org/T111170#1607685 (10ori) This should do it: ```lang=sh #!/usr/bin/env bash shopt -s globstar for wsp in /var/lib/carbon/whisper/**/sum.wsp; do if [[ $(/usr/bin/whisper-in... [18:01:46] 6operations, 6Discovery, 10Incident-20150825-Redis, 3Discovery-Cirrus-Sprint, and 2 others: Update Elasticsearch for missing updates from outage on 20150825 - https://phabricator.wikimedia.org/T110179#1607686 (10EBernhardson) 5Open>3Resolved [18:04:29] PROBLEM - puppet last run on francium is CRITICAL: CRITICAL: Puppet has 1 failures [18:04:31] 6operations, 6Performance-Team, 7Graphite, 5Patch-For-Review: "sum" aggregation broken in Graphite - https://phabricator.wikimedia.org/T111170#1607731 (10fgiunchedi) @ori that would, before running it we should check `whisper-set-aggregation-method` locks the whisper file or risk clashing with `carbon-cache` [18:04:52] 6operations, 6Phabricator: phabricator metrics script should use slave, not master - https://phabricator.wikimedia.org/T111547#1607732 (10Dzahn) 3NEW [18:05:27] jynus: the bot operator is just addshore :) [18:05:42] who also works for wmde now [18:06:00] !log krinkle@tin Synchronized php-1.26wmf21/extensions/WikimediaEvents/modules/ext.wikimediaEvents.statsd.js: Ib98988f67ef (duration: 00m 11s) [18:06:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:06:12] we want to take a look at the code, for sure [18:06:20] 6operations, 6Phabricator: phabricator metrics script should use slave, not master - https://phabricator.wikimedia.org/T111547#1607784 (10Dzahn) @aklapper we should adjust your 
script to use a slave instead of the master, per above [18:07:55] 6operations, 6Phabricator: phabricator metrics script should use slave, not master - https://phabricator.wikimedia.org/T111547#1607858 (10jcrespo) Reasons: 1) It will not affect production traffic, and 2) It will run faster! [18:08:16] 6operations, 6Performance-Team, 7Graphite, 5Patch-For-Review: "sum" aggregation broken in Graphite - https://phabricator.wikimedia.org/T111170#1607879 (10ori) >>! In T111170#1607731, @fgiunchedi wrote: > @ori that would, before running it we should check `whisper-set-aggregation-method` locks the whisper f... [18:09:41] (03CR) 10Greg Grossmeier: "Is it just me, or is this already removed?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236045 (https://phabricator.wikimedia.org/T44894) (owner: 10Greg Grossmeier) [18:10:16] legoktm: Reedy: can you sanity check me on this change: https://gerrit.wikimedia.org/r/#/c/236045/ [18:10:28] * legoktm looks [18:11:02] greg-g: it's still there.... [18:11:06] where? [18:11:23] oh... wait, nvm, I'm looking at my checkout with it applied [18:11:31] https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php :) [18:12:24] Reedy: nvm! [18:14:01] (03PS11) 10Thcipriani: Add service deploy via scap [tools/scap] - 10https://gerrit.wikimedia.org/r/224374 [18:14:54] 6operations, 6Performance-Team, 7Graphite, 5Patch-For-Review: "sum" aggregation broken in Graphite - https://phabricator.wikimedia.org/T111170#1608403 (10fgiunchedi) indeed, IIRC `carbon` sets `whisper.LOCK = True` explicitly if gets configured with locking writes, which we do [18:15:21] 6operations, 6Phabricator: phabricator metrics script should use slave, not master - https://phabricator.wikimedia.org/T111547#1608436 (10Dzahn) The phabricator puppet role class sets `$mysql_host = 'm3-master.eqiad.wmnet'` to be used for phab in general and then these metrics scripts just do `sql_host='<%= @m... 
[18:17:55] (03PS2) 10Greg Grossmeier: Revert "Disable anonymous page creation on swWiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236045 (https://phabricator.wikimedia.org/T44894) [18:18:37] (03CR) 10Greg Grossmeier: "Fixed :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236045 (https://phabricator.wikimedia.org/T44894) (owner: 10Greg Grossmeier) [18:18:43] 6operations, 6Phabricator, 7Database: phabricator metrics script should use slave, not master - https://phabricator.wikimedia.org/T111547#1608520 (10Dzahn) [18:20:05] 6operations, 6Performance-Team, 7Graphite, 5Patch-For-Review: "sum" aggregation broken in Graphite - https://phabricator.wikimedia.org/T111170#1608522 (10ori) Hacked this together, seems to work (but did not run it in prod): ```lang=python, name=update-aggr.py #!/usr/bin/env python # -*- coding: utf-8 -*-... [18:21:34] (03PS3) 10Jforrester: Revert "Disable anonymous page creation on swWiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236045 (https://phabricator.wikimedia.org/T44894) (owner: 10Greg Grossmeier) [18:26:56] 6operations, 10MediaWiki-extensions-GWToolset, 6Multimedia, 7Performance: Can Commons support a mass upload of 14 million files (1.5 TB)? - https://phabricator.wikimedia.org/T88758#1608680 (10Jdforrester-WMF) Ping on this – what's the status? [18:27:10] 6operations, 10MediaWiki-extensions-GWToolset, 6Multimedia, 7Performance: Can Commons support a mass upload of 14 million files (1.5 TB)? 
- https://phabricator.wikimedia.org/T88758#1608682 (10Jdforrester-WMF) a:3Harej [18:30:59] RECOVERY - puppet last run on francium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:34:58] (03PS1) 10Alex Monk: Change ukwikivoyage logo, take 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236063 (https://phabricator.wikimedia.org/T110370) [18:35:25] (03CR) 10Alex Monk: [C: 032] Change ukwikivoyage logo, take 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236063 (https://phabricator.wikimedia.org/T110370) (owner: 10Alex Monk) [18:35:31] (03Merged) 10jenkins-bot: Change ukwikivoyage logo, take 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236063 (https://phabricator.wikimedia.org/T110370) (owner: 10Alex Monk) [18:36:22] !log krenair@tin Synchronized w/static/images/project-logos/ukwikivoyage.png: https://gerrit.wikimedia.org/r/#/c/236063/ (duration: 00m 11s) [18:36:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:40:57] (03PS2) 10Rush: elasticsearch partman and autoinstall [puppet] - 10https://gerrit.wikimedia.org/r/235893 [18:41:48] PROBLEM - RAID on ms-be1010 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) [18:43:28] PROBLEM - Disk space on ms-be1010 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdf1 is not accessible: Input/output error [18:43:36] 7Blocked-on-Operations, 6operations, 10MediaWiki-extensions-GWToolset, 6Multimedia, 7Performance: Can Commons support a mass upload of 14 million files (1.5 TB)? - https://phabricator.wikimedia.org/T88758#1609690 (10ori) [18:45:04] 6operations, 10fundraising-tech-ops: build libanon package for trusty - https://phabricator.wikimedia.org/T110739#1609735 (10Krenair) [18:47:19] uh, is the ms-be1010 CRITICAL a CRITICAL thing? 
[18:51:40] 6operations, 10MediaWiki-extensions-TimedMediaHandler, 6Multimedia, 10Wikimedia-Video: Backport libtheora 1.2.0alpha package to Trusty - https://phabricator.wikimedia.org/T109207#1609897 (10Jdforrester-WMF) p:5Triage>3Normal [18:51:43] godog: ^ [18:53:37] 6operations, 10MediaWiki-extensions-TimedMediaHandler, 6Multimedia, 10Wikimedia-Video: Backport libtheora 1.2.0alpha package to Trusty - https://phabricator.wikimedia.org/T109207#1609912 (10brion) 5Open>3Resolved I believe this is done now, the updated packages are in place in trusty repo and show up i... [18:58:05] 6operations, 10ops-codfw: Patch NTT @ eqdfw - https://phabricator.wikimedia.org/T111522#1610132 (10faidon) 5Open>3Resolved Done! [18:58:05] 6operations, 10netops: Set up NTT transit @ eqdfw, eqord - https://phabricator.wikimedia.org/T111274#1610134 (10faidon) [18:59:08] 6operations, 10netops: Set up NTT transit @ eqdfw, eqord - https://phabricator.wikimedia.org/T111274#1599999 (10faidon) The DA1 cross-connect is completed and patched on our side (and NTT knows). We're still waiting for completion of the CH1 cross-connect. [19:00:57] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 1 failures [19:01:17] 6operations, 10ops-codfw: Patch NTT @ eqdfw - https://phabricator.wikimedia.org/T111522#1610220 (10Papaul) Closing this task the patches are done I will update the diagram later with the new patches cr1-eqdfw xe-0/0/1 cable ID 11401 to Equinix patch panel ID 20036827 port 3/4 cr1-eqdfw xe-1/1/0 cable ID 11400... 
[19:04:40] 6operations, 10fundraising-tech-ops: reformulate kafkatee package to work with Trusty - https://phabricator.wikimedia.org/T110591#1610291 (10Ottomata) Done https://gerrit.wikimedia.org/r/#/c/236066/ http://apt.wikimedia.org/wikimedia/pool/main/k/kafkatee/ [19:06:03] 6operations, 10ops-eqiad: ms-be1010.eqiad.wmnet: slot=5 dev=sdf failed - https://phabricator.wikimedia.org/T111553#1610294 (10fgiunchedi) 3NEW [19:06:16] 6operations, 10ops-eqiad: ms-be1010.eqiad.wmnet: slot=5 dev=sdf failed - https://phabricator.wikimedia.org/T111553#1610302 (10fgiunchedi) ``` 19008794.269908] end_request: I/O error, dev sdf, sector 962033072 [19008794.269915] sd 0:2:5:0: [sdf] Unhandled error code [19008794.269917] sd 0:2:5:0: [sdf] Result:... [19:06:32] chasemp greg-g ^ [19:06:45] is there a task update? [19:07:27] godog: ok thanks I was just about to ask in -sec [19:07:39] np [19:07:40] ACKNOWLEDGEMENT - Disk space on ms-be1010 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdf1 is not accessible: Input/output error Filippo Giunchedi T111553 [19:07:40] ACKNOWLEDGEMENT - RAID on ms-be1010 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) Filippo Giunchedi T111553 [19:07:40] ACKNOWLEDGEMENT - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 1 failures Filippo Giunchedi T111553 [19:07:41] <- off [19:17:13] 6operations, 10fundraising-tech-ops: package udp-filter for Trusty, for use on fundraising banner_logger - https://phabricator.wikimedia.org/T110592#1610336 (10Ottomata) Hm, Jeff, both udp-filter and libanon are installed from packages on stat1002, which is a Trusty box. I just tried it with both anonymizatio... [19:19:42] 6operations: Run assert check to verify the existence of certain texts in the footer - https://phabricator.wikimedia.org/T108081#1610357 (10chasemp) a:5chasemp>3ZhouZ >>! In T108081#1555654, @ZhouZ wrote: > Thanks Chase - this is great. And I guess automation has already done its job. > > I will have to di... [19:36:47] jynus! 
sorry for all the database log spam ;) [19:38:48] (03PS1) 10BBlack: Disable zerofetcher on text caches T111045 [puppet] - 10https://gerrit.wikimedia.org/r/236080 [19:40:17] (03CR) 10BBlack: [C: 032] Disable zerofetcher on text caches T111045 [puppet] - 10https://gerrit.wikimedia.org/r/236080 (owner: 10BBlack) [19:51:49] 6operations, 10MediaWiki-extensions-ZeroPortal, 10Traffic, 6Zero, 5Patch-For-Review: zerofetcher in production is getting throttled for API logins - https://phabricator.wikimedia.org/T111045#1610465 (10Krenair) Is this proxying requests for external clients? If so they should probably be added to the lis... [19:58:48] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [20:00:30] bblack: your change wasn't merged on strontium for some reason, fixed that just now [20:00:57] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [20:03:16] (03CR) 10Hashar: "I though the nodepool sudo would be enough but I was wrong:" [puppet] - 10https://gerrit.wikimedia.org/r/235742 (https://phabricator.wikimedia.org/T111374) (owner: 10Hashar) [20:09:55] 6operations, 10ContentTranslation-Deployments, 10ContentTranslation-cxserver, 10Parsoid, and 3 others: Decom parsoid-lb.eqiad.wikimedia.org entrypoint - https://phabricator.wikimedia.org/T110474#1610495 (10cscott) [20:34:16] (03PS1) 10Smalyshev: Add icinga monitoring for WDQS services [puppet] - 10https://gerrit.wikimedia.org/r/236189 [20:53:22] (03PS2) 10Smalyshev: Add icinga monitoring for WDQS services [puppet] - 10https://gerrit.wikimedia.org/r/236189 (https://phabricator.wikimedia.org/T103911) [21:12:27] (03CR) 10Dzahn: [C: 032] Add icinga monitoring for WDQS services [puppet] - 10https://gerrit.wikimedia.org/r/236189 (https://phabricator.wikimedia.org/T103911) (owner: 10Smalyshev) [21:13:37] (03PS1) 10GWicke: Disallow indexing for /api/ [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/236200 [21:14:05] (03PS2) 10GWicke: Disallow indexing for /api/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236200 (https://phabricator.wikimedia.org/T109023) [21:20:32] (03PS3) 10GWicke: Disallow indexing for /api/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236200 (https://phabricator.wikimedia.org/T109023) [21:23:46] PROBLEM - WDQS HTTP Port on wdqs1001 is CRITICAL: NRPE: Command check_WDQS not defined [21:26:47] these are brand new checks, i'm checking it [21:27:02] re: wdqs [21:28:27] mutante: something wrong there? [21:29:08] (03PS1) 10Rush: phab: use permissions for files on bot upload [puppet] - 10https://gerrit.wikimedia.org/r/236205 [21:29:18] SMalyshev: with the check itself yes, with the service, no [21:29:27] it currently thinks the nrpe command is not defined [21:29:41] i disabled notifications and taking a look [21:29:42] mutante: it's in https://gerrit.wikimedia.org/r/#/c/236189/2/modules/wdqs/manifests/monitor/services.pp [21:29:57] 6operations, 10Datasets-General-or-Unknown, 5Patch-For-Review: Add App Guidelines on Dumps Page - https://phabricator.wikimedia.org/T110742#1610683 (10Krenair) @VBaranetsky, are you commenting on whether there should be something at legal.html? Did you review https://gerrit.wikimedia.org/r/#/c/235208/ or the... [21:30:06] SMalyshev: NRPE: Command 'check_WDQS' not defined [21:30:21] mutante: yeah that's weird no such command in the config [21:30:41] maybe the name of the service is meaningful? [21:31:03] (03PS1) 10Mattflaschen: Add missing extra namespaces from prod config to labs version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236206 (https://phabricator.wikimedia.org/T111267) [21:31:03] it does not like the "-" ? [21:31:12] (03CR) 10OliverKeyes: "Legal has given the A-OK to this patch, just to be clear." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/235274 (https://phabricator.wikimedia.org/T76497) (owner: 10EBernhardson) [21:31:18] SMalyshev: let's use underscore [21:31:38] mutante: where? [21:31:50] so there are these configs: [21:31:57] check_WDQS-Internal-HTTP-endpoint.cfg [21:32:22] mutante: ahh, I see. So it uses the name to make the cfg. [21:32:23] and it says " Command 'check_WDQS' " [21:32:30] so i think the - [21:32:33] I'll eliminate the spaces in the names then and see what happens [21:33:55] so i ran one of the nrpe commands on wdqs1001, it works fine [21:33:56] (03PS1) 10Smalyshev: Remove spaces in check names [puppet] - 10https://gerrit.wikimedia.org/r/236208 [21:33:58] and they got created [21:34:05] it just tries the wrong name to find them [21:34:17] mutante: https://gerrit.wikimedia.org/r/#/c/236208/ ? [21:34:53] 6operations, 6Labs, 5Patch-For-Review: audit labs versus production ssh keys - https://phabricator.wikimedia.org/T108078#1610715 (10Krenair) @RobH: Any reason for this to remain open? [21:34:59] looks good, let me just edit the message a little bit [21:35:14] (03PS2) 10Dzahn: wdsq: monitoring, remove spaces in check names [puppet] - 10https://gerrit.wikimedia.org/r/236208 (owner: 10Smalyshev) [21:35:18] PROBLEM - Blazegraph Port on wdqs1002 is CRITICAL: NRPE: Command check_WDQS not defined [21:35:21] mutante: looking at https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?servicegroup=wdqs_eqiad&style=detail&nostatusheader all NRPE ones don't work [21:35:34] 6operations, 6Labs, 5Patch-For-Review: audit labs versus production ssh keys - https://phabricator.wikimedia.org/T108078#1610722 (10RobH) 5Open>3Resolved Nope! both sub-tasks resolved, resolving task.
[21:35:37] PROBLEM - Blazegraph process on wdqs1002 is CRITICAL: NRPE: Command check_WDQS not defined [21:35:41] SMalyshev: ack, they all have the same issue [21:35:51] and the ones on wdqs1002 literally just got added by puppet run [21:36:28] PROBLEM - Updater process on wdqs1002 is CRITICAL: NRPE: Command check_WDQS not defined [21:36:33] disabled notifications for the moment [21:37:05] (03CR) 10Dzahn: [C: 032] "NRPE: Command 'check_WDQS' not defined" [puppet] - 10https://gerrit.wikimedia.org/r/236208 (owner: 10Smalyshev) [21:37:36] ok, let's see if it better with _ [21:37:46] yea, running puppet on neon and looking [21:38:06] it can be a bit slow [21:38:19] (03CR) 10Platonides: Enable CirrusSearch per-user rate limiting (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235274 (https://phabricator.wikimedia.org/T76497) (owner: 10EBernhardson) [21:39:26] 6operations, 6Phabricator: Moving procurement from RT to Phabricator - https://phabricator.wikimedia.org/T93760#1610755 (10chasemp) [21:40:42] (03CR) 10Southparkfan: [C: 04-1] add script to import miraheze wikis (031 comment) [debs/wikistats] - 10https://gerrit.wikimedia.org/r/235958 (https://phabricator.wikimedia.org/T107398) (owner: 10Dzahn) [21:41:53] hmm... it still shows the same error... didn't update yet? [21:42:16] 6operations, 6Labs, 10wikitech.wikimedia.org: Determine whether wikitech should really depend on production search cluster - https://phabricator.wikimedia.org/T110987#1610756 (10chasemp) p:5Triage>3Normal iirc we have done other things to island off wikitech since it is the defacto place we check for inf... 
[21:43:08] (03CR) 10Deskana: Enable CirrusSearch per-user rate limiting (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235274 (https://phabricator.wikimedia.org/T76497) (owner: 10EBernhardson) [21:43:16] 6operations, 10ops-eqiad: ms-be1010.eqiad.wmnet: slot=5 dev=sdf failed - https://phabricator.wikimedia.org/T111553#1610760 (10chasemp) p:5Triage>3High [21:43:35] it created new nrpe config files on the wdqs hosts, so there are 2 versions each now. should delete the old stuff once it works [21:43:41] 6operations, 10ops-eqiad: ms-be1010.eqiad.wmnet: slot=5 dev=sdf failed - https://phabricator.wikimedia.org/T111553#1610763 (10RobH) a:3Cmjohnson [21:45:56] SMalyshev: i see the first one is fixed :) [21:46:29] mutante: ah, excellent. So I hope the rest will catch up then [21:46:46] re-enabling notifications [21:47:21] 6operations, 10ops-eqiad: ms-be1010.eqiad.wmnet: slot=5 dev=sdf failed - https://phabricator.wikimedia.org/T111553#1610779 (10RobH) The warranty on these ends on 2015-11-30, so we can still get this replaced. I've assigned it to Chris as the onsite, since it won't require ordering a placement for purchase, si... [21:47:35] RECOVERY - Blazegraph Port on wdqs1002 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9999 [21:47:54] RECOVERY - Blazegraph process on wdqs1002 is OK: PROCS OK: 1 process with UID = 998 (blazegraph), regex args ^java .* blazegraph-service-.*-dist.war [21:48:13] SMalyshev: ^ there they are. all green [21:48:15] mutante: thanks, looks good now! 
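The exchange above traces the "Command check_WDQS not defined" alerts to spaces in the check names, fixed by switching to underscores. A minimal sketch of one way such truncation can happen, assuming (the log does not confirm the exact mechanism) that the check description is pasted unquoted into a `check_nrpe` command line and split on whitespace. The helper names `nrpe_command_seen` and `safe_name` are hypothetical, and the host name is only illustrative:

```python
import shlex


def nrpe_command_seen(description: str, host: str = "wdqs1001") -> str:
    """Return the command name that follows -c after whitespace splitting.

    Assumption: the description is interpolated unquoted, so anything
    after the first space becomes separate arguments.
    """
    command_line = f"check_nrpe -H {host} -c check_{description}"
    args = shlex.split(command_line)
    return args[args.index("-c") + 1]


def safe_name(description: str) -> str:
    """Hypothetical helper: replace spaces with underscores, as in the fix."""
    return description.replace(" ", "_")


# With spaces, only the first word survives as the command name.
print(nrpe_command_seen("WDQS HTTP Port"))
# With underscores, the full name is passed through intact.
print(nrpe_command_seen(safe_name("WDQS HTTP Port")))
```

Under that assumption, every check whose description starts with "WDQS" would collapse to the same undefined `check_WDQS` command, which would explain one symptom in the log: the error text names `check_WDQS` rather than the full check name.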
[21:48:34] yep, the service group is 100% green [21:50:33] (03CR) 10Dzahn: "works now after mini follow-up fix that replaced spaces with underscore" [puppet] - 10https://gerrit.wikimedia.org/r/236189 (https://phabricator.wikimedia.org/T103911) (owner: 10Smalyshev) [21:53:54] PROBLEM - puppet last run on restbase1001 is CRITICAL: CRITICAL: Puppet last ran 3 days ago [21:55:18] !log bouncing Cassandra on restbase1001 to restore default GC settings [21:55:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:55:36] (03CR) 10MaxSem: [C: 031] Disallow indexing for /api/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236200 (https://phabricator.wikimedia.org/T109023) (owner: 10GWicke) [21:55:55] RECOVERY - puppet last run on restbase1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:58:01] got a weird problem - Java process periodically loses DNS: Aug 31 06:58:00 wdqs1002 bash[14108]: Caused by: java.net.UnknownHostException: www.wikidata.org [21:58:24] from CLI everything OK, after restarting it everything OK too... Anybody seen anything like that? [22:05:44] (03PS2) 10Dzahn: add script to import miraheze wikis [debs/wikistats] - 10https://gerrit.wikimedia.org/r/235958 (https://phabricator.wikimedia.org/T107398) [22:06:13] (03CR) 10Dzahn: add script to import miraheze wikis (031 comment) [debs/wikistats] - 10https://gerrit.wikimedia.org/r/235958 (https://phabricator.wikimedia.org/T107398) (owner: 10Dzahn) [22:07:05] (03CR) 10Dzahn: "@AlexMonk: thank you! added" [debs/wikistats] - 10https://gerrit.wikimedia.org/r/235958 (https://phabricator.wikimedia.org/T107398) (owner: 10Dzahn) [22:09:05] (03CR) 10Dzahn: [C: 032] "this is what was manually used for the initial import for now" [debs/wikistats] - 10https://gerrit.wikimedia.org/r/235958 (https://phabricator.wikimedia.org/T107398) (owner: 10Dzahn) [22:14:29] (03CR) 10CSteipp: [C: 031] "Whitespace cleanup. Otherwise, yes please add this." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/236200 (https://phabricator.wikimedia.org/T109023) (owner: 10GWicke) [22:18:41] (03PS2) 10Mattflaschen: Add missing extra namespaces from prod config to labs version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236206 (https://phabricator.wikimedia.org/T111267) [22:21:59] (03CR) 10Dzahn: [C: 031] "checked against racktables info. the networks and row info from there match. lgtm" [dns] - 10https://gerrit.wikimedia.org/r/235906 (owner: 10Rush) [22:33:29] (03PS1) 10Ori.livneh: Apply snapshot role on osmium [puppet] - 10https://gerrit.wikimedia.org/r/236220 [22:49:48] !log krenair@tin Synchronized php-1.26wmf21/extensions/Citoid: https://gerrit.wikimedia.org/r/#/c/236218/ and https://gerrit.wikimedia.org/r/#/c/236222/ (duration: 00m 12s) [22:49:49] James_F, ^ [22:49:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:49:53] Thanks Krenair. [22:50:23] This does funny things with i18n messages but I don't think it needs an l10n update [22:51:38] Krenair: LGTM. [23:00:08] (03PS4) 10GWicke: Disallow indexing for /api/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236200 (https://phabricator.wikimedia.org/T109023) [23:01:12] aude: So who would be the point person @ WMDE to approve/confirm other WMDE folks requesting access? [23:01:20] (03CR) 10GWicke: "Chris: Whitespace is fixed." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236200 (https://phabricator.wikimedia.org/T109023) (owner: 10GWicke) [23:02:33] James_F: ^ same question for you =] [23:02:37] regarding https://phabricator.wikimedia.org/T111204 [23:03:18] 10Ops-Access-Requests, 6operations: Requesting access to hadoop / hive (analytics-privatedata-users) for Addshore - https://phabricator.wikimedia.org/T111204#1611005 (10RobH) @Addshore: We'll also need you to review and sign L3. Additionally, I need to determine who should approve WMDE access requests (as we... 
[23:05:50] 6operations, 10MediaWiki-extensions-ZeroPortal, 10Traffic, 6Zero, 5Patch-For-Review: zerofetcher in production is getting throttled for API logins - https://phabricator.wikimedia.org/T111045#1611008 (10BBlack) No, the zerofetcher is just a custom script that runs on the caches and fetches zero-rating met... [23:07:47] robh: Wikidata relationship at WMF is Wes's world, I think – Deskana? [23:08:05] yea we get access requests like once every 4-7 months [23:08:06] (03CR) 10Legoktm: [C: 031] "Suggestion. Code looks fine" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236206 (https://phabricator.wikimedia.org/T111267) (owner: 10Mattflaschen) [23:08:12] just long enough apart for us all to forget or not be certain [23:08:28] * James_F nods. [23:08:32] also other than jan, most of them are pre-phabricator [23:08:43] and jan is personally known by me, so i just knew it was cool. [23:12:56] robh, isn't it supposed to be the person's manager? [23:13:05] well, wmde isnt wmf [23:13:07] rather than just one person in their entire organisation [23:13:09] so its wmf manager [23:13:28] I don't remember the policy saying that [23:13:32] so i dunno how to handle the wmde, it'd be best if there is a wmf sponsering manager [23:13:43] the policy says manager approval [23:13:53] so its implied i suppose [23:14:00] but any manager on earth doesnt count ;] [23:14:15] Krenair: Do you have a suggestion on how to handle it? [23:14:22] it's approval from the person's direct supervisor. I guess you can default to wmf for volunteers, but for wmde... [23:14:56] don't we treat all chapters like volunteers? [23:14:56] the other one mentioned is approval from project lead. to be honest I don't know if this is being consistently required at the moment... [23:15:01] no? 
[23:15:56] the project lead part is an attempt to phrase it so if someone in one team wants access to another teams servers, we try to check with the other team [23:16:07] unless its within the typical scope of the work of the person which is very difficult to determine [23:16:11] ah, interesting [23:16:18] so yea, i dunno how to better phrase it [23:16:27] most of the time its not an issue [23:16:27] I think in some cases it's not very clear what the appropriate project lead would be anyway [23:16:35] like, restricted? [23:16:40] which project would that fall under? [23:16:45] yea, its not clear at all [23:16:54] you could argue deployment would fall under releng and therefore greg-g [23:17:05] yep [23:17:41] im not sure its really an issue though [23:17:45] i mean, we could pull the project lead requirement [23:17:57] and then if any one request seems odd, we just put in a blocking question about it [23:18:09] as we can block for any reason we want during the 3 day wait and ask for clarification pretty much. [23:18:19] (well, any reason within reason ;) [23:18:58] looks like the engineering and operations managers can approve/deny for any reason they want [23:19:01] not clear normal staff can though [23:19:37] at least, for new shell users. escalations does not mention that part. I'd expect it to be the same... [23:19:45] hrmm?
[23:19:58] we're discussing https://wikitech.wikimedia.org/wiki/Requesting_shell_access greg-g [23:20:20] * greg-g will read backscroll in a bit unless I should now [23:20:28] s/I/he/ #stupid grammar in third-person [23:21:15] I was just using you and releng with deployment as an example of a theoretical server group -> team mapping, sorry greg-g [23:21:26] no prob :) [23:22:02] for some other groups I don't think it's necessarily clear who'd be the appropriate 'project lead' to approve [23:23:16] robh, that said, I have no idea how NDAs work in the case of chapter employees [23:23:31] (03CR) 10Papaul: [C: 04-1] "elastic201[3-8] are in row C so there are part of private1-c1 and belongs to the network 10.192.32.0 and not 10.192.1.0" [dns] - 10https://gerrit.wikimedia.org/r/235906 (owner: 10Rush) [23:23:53] yea i think we have enough eyes on things to drop the project lead thing [23:24:04] but otherwise indeed we shoudl map out leads someplace [23:24:07] wouldn't NDAs work in the same way as they do for volunteers though? [23:24:14] yes they should [23:24:17] well [23:24:20] if the person wants access to restricted access [23:24:37] but nda's arent covered in shell access [23:24:46] i always though that the process should be the same, just a different "L" document [23:24:53] or multiple.. shrug [23:24:56] they are owned by different groups [23:24:58] the only two types of NDAs which seems to exist and accepted by legal are volunteer and WMF-staff signed ones. so... [23:25:11] wait [23:25:17] i dont get why nda's are part of this conversation [23:25:29] JohnFLewis: are you taking us on a tangent? [23:25:33] =] [23:25:43] robh: no, just answering a question Krenair put out :) [23:25:43] oh no was Krenair [23:25:45] It's probably my fault for bringing it up [23:25:50] Krenair: it is! 
[23:25:52] ;] [23:25:52] :) [23:26:00] So Ops has ZERO control over NDAs or their process [23:26:14] thus, NDA review is ONLY an issue if someone asks for restricted data and is also a volunteer [23:26:25] our instructions should account for that if only to mention it though. [23:26:31] maybe L2 is the appropriate one for chapter employees? I don't know... I would ask wmf legal [23:26:42] Krenair: i would think it would be indeed [23:27:08] but i dunno if there is a chapter nda... if there is its the first ive ever heard of it [23:28:06] I don't really get why we're trying to distinguish chapter staff into a separate category [23:28:44] i dont think we should [23:28:48] they work for the chapter and volunteer for the Foundation, there's no legal connection between a chapter and the Foundation and people only exist in two categories, staff or volunteers [23:28:58] i just wanted to know who on wmf staff would be good to approve wmde requests =] [23:29:06] JohnFLewis: i agree [23:29:28] who? a manager at the WMF in my opinion :) [23:29:30] just for other volunteer requests, they typically have someone in wmf approving it [23:30:42] JohnFLewis, is there no legal connection? there's the affiliation agreements right?
I don't know what's in those [23:30:54] again, this is why I'd ask legal :p [23:30:55] (03PS3) 10Mattflaschen: Add missing extra namespaces from prod config to labs version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236206 (https://phabricator.wikimedia.org/T111267) [23:31:17] yeah best ask legal for clarification but they'd come back saying treat them as volunteer likely [23:31:32] (03CR) 10Mattflaschen: Add missing extra namespaces from prod config to labs version (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236206 (https://phabricator.wikimedia.org/T111267) (owner: 10Mattflaschen) [23:32:17] I doubt legal have gone 'chapters have control over the WMF process' as the chapter process now iirc is fairly distinct from active WMF involvement beside board approval from recommendations or so [23:32:33] James_F, robh: Yeah: Wes, Tomasz and myself are the primary people on the WMF side that speak to WMDE. [23:32:52] What's the context here? [23:32:54] Deskana: Which of you can sign-off that production server access is needed? [23:32:55] Deskana: should I assign the 'manager approval for shell' to you then? [23:33:06] https://phabricator.wikimedia.org/T111204 [23:33:11] Krenair: you are right htough [23:33:15] JohnFLewis, so at that point you end up with paid volunteers [23:33:19] i'll have to ask him to sign the L2 [23:33:34] Krenair: not really though [23:34:14] robh: It depends, what are you looking for from me? To approve it myself, or to find out about the process? [23:34:16] the chapter pays and employees the staff, not the Foundation, which they volunteer for [23:35:16] 10Ops-Access-Requests, 6operations: Requesting access to hadoop / hive (analytics-privatedata-users) for Addshore - https://phabricator.wikimedia.org/T111204#1611038 (10RobH) Also, @Krenair was kind enough to remind me that analytics-privatedata-users is indeed private data. So we need an NDA for @addshore.... 
[23:35:31] Deskana: well, someone has to as a manager look at the request and what the person wants [23:35:39] and decide if that person is trusted enough and should have said access [23:35:56] in the case of wmde, its typically if they are volunteering or being paid by wmde to work on a specific project [23:35:57] so how does offboarding work for chapter employees? [23:36:02] it's also tricky wording. as if they're paid volunteers, edits they make from chapter accounts would then theoretically be considered paid editing :) [23:36:18] Krenair: there is none unless chapter tells us, i was intentionally not bringing that up during our offboard conversation the other day ;] [23:36:23] heh [23:36:28] robh: I'm happy to do that. I've been working with them to set up dashboards for Wikidata, and they would need access to the request logs for that. [23:36:44] Deskana: perfect, please append your approval and state all of that in it please =] [23:36:53] robh: Will do. [23:36:55] thanks! [23:37:24] Deskana: is it ok for me to list you on our ops clinic page instructions for ops folks to triage those approvals to you for wmde shell requests? [23:37:36] if it changes to someone else, its on wikitech so easy to update = [23:37:37] =] [23:38:10] robh: Absolutely. [23:38:34] cool, thank you. https://wikitech.wikimedia.org/wiki/Ops_Clinic_Duty is the page im talking about [23:38:43] we have a quick reference on how to triage common requests [23:39:53] robh: I'll check with Addshore what his purpose for requesting the logs is before formally approving. [23:40:28] robh, interesting. you're saying NDA is required for analytics-privatedata-users, I had assumed it was required for any kind of server access because pretty much everything in production can hold something sensitive? 
[23:41:29] 10Ops-Access-Requests, 6operations: Requesting access to hadoop / hive (analytics-privatedata-users) for Addshore - https://phabricator.wikimedia.org/T111204#1611042 (10Deskana) @addshore I've been asked to approve this request from the WMF side, which I'm happy to do. But, we should get the intended purpose o... [23:41:47] if it changes to someone else, its on wikitech so easy to update = [23:41:48] hahaha [23:41:54] If only this was done in practice [23:46:44] (03CR) 10Ori.livneh: [C: 032] Apply snapshot role on osmium [puppet] - 10https://gerrit.wikimedia.org/r/236220 (owner: 10Ori.livneh) [23:47:59] 10Ops-Access-Requests, 6operations: Requesting access to hadoop / hive (analytics-privatedata-users) for Addshore - https://phabricator.wikimedia.org/T111204#1611046 (10Legoktm) I'm fairly sure addshore already has an NDA signed, because we had to jump through hoops to get him graphite access. [23:49:59] (03CR) 10Legoktm: [C: 031] Add missing extra namespaces from prod config to labs version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236206 (https://phabricator.wikimedia.org/T111267) (owner: 10Mattflaschen) [23:50:32] (03CR) 10Mattflaschen: [C: 032] Add missing extra namespaces from prod config to labs version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236206 (https://phabricator.wikimedia.org/T111267) (owner: 10Mattflaschen) [23:50:57] (03Merged) 10jenkins-bot: Add missing extra namespaces from prod config to labs version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236206 (https://phabricator.wikimedia.org/T111267) (owner: 10Mattflaschen) [23:51:40] 10Ops-Access-Requests, 6operations: Requesting access to hadoop / hive (analytics-privatedata-users) for Addshore - https://phabricator.wikimedia.org/T111204#1611052 (10Krenair) Yep: ```alex@alex-laptop:~$ ssh tools-login.wmflabs.org ldaplist -l group nda | grep addshore member: uid=addshore,ou=people,dc=wiki... 
[23:52:24] !log mattflaschen@tin Synchronized wmf-config/CommonSettings-labs.php: Beta-only change (duration: 00m 11s) [23:52:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:52:46] !log mattflaschen@tin Synchronized wmf-config/InitialiseSettings-labs.php: Beta-only change (duration: 00m 12s) [23:52:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:53:33] 10Ops-Access-Requests, 6operations: Requesting access to hadoop / hive (analytics-privatedata-users) for Addshore - https://phabricator.wikimedia.org/T111204#1611055 (10Legoktm) I found a copy of the signed NDA (with @Eloquence's approval) in my mailbox from February 2014 and can forward it on to whomever need... [23:53:43] oops, I'm behind [23:54:09] legoktm :) [23:54:24] PROBLEM - puppet last run on osmium is CRITICAL: CRITICAL: Puppet has 1 failures [23:55:12] chasemp, what's the plan for RT access-requests then? [23:55:32] ahaha [23:55:39] I just found a ton of shinken in my spam [23:55:48] someone is marking shinken emails as spam :P [23:56:04] ACKNOWLEDGEMENT - puppet last run on osmium is CRITICAL: CRITICAL: Puppet has 1 failures ori.livneh Ori testing snapshot role on Trusty [23:56:43] Krenair: do you mean importing the old tickets? [23:57:04] or an alternative [23:57:29] (03PS1) 10Ori.livneh: Add osmium to dataset1001 exports so it can be used as snapshot host [puppet] - 10https://gerrit.wikimedia.org/r/236230 [23:57:49] (03CR) 10Ori.livneh: [C: 032 V: 032] Add osmium to dataset1001 exports so it can be used as snapshot host [puppet] - 10https://gerrit.wikimedia.org/r/236230 (owner: 10Ori.livneh)