[00:01:55] (03CR) 10GWicke: "Ping!" [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/195483 (https://phabricator.wikimedia.org/T91617) (owner: 10GWicke) [00:05:22] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: / 934 MB (0% inode=94%): [00:06:37] andrewbogott_afk, Coren: I get a blank page on https://wikitech.wikimedia.org/wiki/Special:NovaInstance - I thought that was fixes? [00:06:45] s/fixes/fixed/ [00:07:23] MaxSem: The flaw that caused the storage of corrupted auth tokens should be fixed, but that doesn't mean you didn't already have one cached. [00:07:46] mmm, I specifically relogined:P [00:07:53] MaxSem: It may be a recurrence of the same issue if you get the problem again with a token that dates from /after/ the fix though. [00:08:19] also, just fatal out? :) [00:08:31] MaxSem: And you actually have the projects you want to see in the filter right? :-) That gets cleared alongside the session. [00:08:49] Fatal? [00:09:08] blank page means fatal [00:09:14] like, completely blank [00:09:21] Oh, you don't mean a list with no entries! [00:09:31] Hm. It wfm right now. Odd. [00:09:49] I'm not at home though, so I have limited debugging capabilities here. [00:10:46] !log catrope Finished scap: SWAT (duration: 30m 20s) [00:10:50] Logged the message, Master [00:11:16] (03CR) 10Jforrester: "> The principal use case is Wikipedia?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197473 (owner: 10Jforrester) [00:11:32] bd808: tgr: SWAT scap finally done now, please verify [00:12:33] PROBLEM - Check status of defined EventLogging jobs on vanadium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:13:42] RoanKattouw: Code is in the right place. 
l10update will test it all in ~3 hours [00:14:05] OK cool [00:16:09] RoanKattouw: verified, thanks [00:17:33] chasemp: small cleanup https://gerrit.wikimedia.org/r/#/c/197320/ [00:18:03] (i just saw it because of the default mail relay thing/workaround) [00:21:16] 6operations, 6MediaWiki-Core-Team, 6Multimedia, 6Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1130789 (10RobLa-WMF) [00:21:21] (03CR) 10Jforrester: [C: 032] RESTbase production enablement step 3 – frwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197470 (owner: 10Jforrester) [00:22:05] (03Merged) 10jenkins-bot: RESTbase production enablement step 3 – frwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197470 (owner: 10Jforrester) [00:22:05] 6operations, 7Graphite: logins on graphite - https://phabricator.wikimedia.org/T93158#1130790 (10Dzahn) 3NEW [00:23:49] !log catrope Synchronized wmf-config/InitialiseSettings.php: Enable RESTbase on frwiki (duration: 00m 06s) [00:23:56] Logged the message, Master [00:24:07] 6operations, 7Graphite: logins on graphite - https://phabricator.wikimedia.org/T93158#1130806 (10Dzahn) that said, for Coren it works. and: ``` # LDAP authentication AuthName "WMF Labs (use wiki login name not shell)" AuthType Basic AuthBasicProvider ldap AuthLDAPBindD... [00:26:17] RoanKattouw, frwiki was not done? [00:26:32] Oh it is now, lemme update [00:29:23] !log Set email for frwiki account "Sarcelles" to the one of the global account with the same name. [00:29:26] Logged the message, Master [00:36:00] Coren: question, seems that graphite is also kind of broken in prod, right? [00:36:13] Coren: maybe I am the last one to find out ... [00:37:05] nuria, what data are you looking at specifically? [00:37:24] Krenair: a bunch of alarms are being triggered [00:37:52] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning. [00:38:04] mmmm... 
[00:41:52] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running.
[00:43:25] (PS3) Tim Landscheidt: WIP: Ignore some warnings about case statements without default matches [puppet] - https://gerrit.wikimedia.org/r/197759 (https://phabricator.wikimedia.org/T87132)
[00:48:53] RECOVERY - Check status of defined EventLogging jobs on vanadium is OK: OK: All defined EventLogging jobs are runnning.
[00:49:23] RECOVERY - Disk space on vanadium is OK: DISK OK
[00:51:02] ACKNOWLEDGEMENT - Host cerium is DOWN: CRITICAL - Bogus ICMP: Port Unreachable (10.64.16.147) daniel_zahn firewalling - T92680
[00:51:02] ACKNOWLEDGEMENT - Host praseodymium is DOWN: CRITICAL - Bogus ICMP: Port Unreachable (10.64.16.149) daniel_zahn firewalling - T92680
[00:51:02] ACKNOWLEDGEMENT - Host xenon is DOWN: CRITICAL - Bogus ICMP: Port Unreachable (10.64.0.200) daniel_zahn firewalling - T92680
[00:52:49] (PS4) Tim Landscheidt: Ignore some warnings about case statements without default matches [puppet] - https://gerrit.wikimedia.org/r/197759 (https://phabricator.wikimedia.org/T87132)
[00:53:25] operations, RESTBase, RESTBase-Cassandra, Security: securing the RESTBase Cassandra cluster - https://phabricator.wikimedia.org/T92680#1130860 (Dzahn) We should open a hole for Icinga (neon) to still check if the hosts are up. Currently it filters ICMP: dzahn@neon:~$ ping 10.64.16.147 PING 10.64.1...
[00:55:54] (PS9) BBlack: sslcert: generate chained certs automatically [puppet] - https://gerrit.wikimedia.org/r/197341 (owner: Faidon Liambotis)
[00:55:56] (PS1) BBlack: OCSP support for install_certificate [puppet] - https://gerrit.wikimedia.org/r/197821
[00:56:34] ACKNOWLEDGEMENT - Host db2042 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn codfw - still to be installed
[00:56:34] ACKNOWLEDGEMENT - Host mw2003 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn codfw - still to be installed
[01:00:18] operations, RESTBase, RESTBase-Cassandra: Cassandra compaction is getting behind - https://phabricator.wikimedia.org/T93140#1130871 (Eevans) Compaction throughput is now set at 128 MB/s, but with only 2 compaction threads, actual throughput seems to be limited to about ~50MB/s. Pending compactions se...
[01:02:35] (PS1) Dzahn: add base::firewall on cassandra test hosts [puppet] - https://gerrit.wikimedia.org/r/197822 (https://phabricator.wikimedia.org/T92680)
[01:05:02] operations, RESTBase, RESTBase-Cassandra, Security, Patch-For-Review: securing the RESTBase Cassandra cluster - https://phabricator.wikimedia.org/T92680#1130903 (Dzahn) >>! In T92680#1129960, @BBlack wrote: > monitoring and bastion ssh are already account for in our puppet base::firewall stuff,...
[01:06:03] RECOVERY - Host cerium is UP: PING OK - Packet loss = 0%, RTA = 1.87 ms
[01:06:53] PROBLEM - puppet last run on mw2161 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:08:44] RECOVERY - Host praseodymium is UP: PING OK - Packet loss = 0%, RTA = 1.46 ms
[01:08:44] RECOVERY - Host xenon is UP: PING OK - Packet loss = 0%, RTA = 1.20 ms
[01:09:45] (PS6) Ori.livneh: Gzip .svg and .ico files on bits.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/113687 (https://bugzilla.wikimedia.org/61442) (owner: Brion VIBBER)
[01:09:48] (PS1) Ori.livneh: Enable gzip of ico and svg files on labs bits [puppet] - https://gerrit.wikimedia.org/r/197825
[01:09:48] operations, Wikimedia-Shop, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1130907 (Dzahn) shopify now claims: //You can only have one SSL on each account because it can only be associated to the primary domain. If you're planning on moving back to...
[01:12:13] PROBLEM - RAID on db1020 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[01:12:47] operations, RESTBase, RESTBase-Cassandra, Security, Patch-For-Review: securing the RESTBase Cassandra cluster - https://phabricator.wikimedia.org/T92680#1130910 (Eevans) > We should open a hole for Icinga (neon) to still check if the hosts are up. Currently it filters ICMP: Oh yes, good point....
[01:13:56] urandom: all of this is already handled by existing puppet code
[01:14:09] we use ferm and we've abstracted it to puppet definitions
[01:14:21] we already allow monitoring hosts, bastions for ssh etc.
[01:14:54] paravoid: you mean the script?
[01:15:36] I mean no need for iptables-save and allowing neon explicitly etc.
[01:15:39] I only wrote that to test with, and attached is to document [01:15:43] right [01:16:17] without any means to test puppet changes, I'm hoping someone who can will make those changes :) [01:16:32] will work up the patch, I mean [01:16:49] someone should but at the same time you should have a way to test puppet patches [01:16:57] are you familiar with the Labs infrastructure at all? [01:17:14] no, but I think we're going to work on setting something up [01:17:26] https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster [01:18:07] so basically you can spawn VMs as you please [01:18:14] manage your own puppet tree (or multiple of them) [01:18:32] we should create a cassandra test project to play with [01:18:51] the cassandra test cluster that we have now is unusual [01:19:08] as Labs doesn't offer a "bare metal" mode yet and gabriel wanted to operate on real hardware for performance reasons [01:19:09] (03CR) 10GWicke: [C: 031] Adjust RESTBase / Cassandra settings for deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/197662 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [01:19:35] so the test cluster is under "production" terms, hence all these restrictions [01:19:47] (it's also annoying as we get prod nagios alerts etc.) [01:19:59] I hope we can add bare metal Labs to the roadmap soon [01:20:00] urandom: so that first patch i uploaded , it will add the default drop policy and the default exceptions (bastion, monitoring) [01:20:13] then we just need to add a few holes for the actual cassandra things [01:20:30] mutante: ok [01:20:44] but in the meanwhile I think we can have a Cassandra setup on Labs (at least one!) 
that we can play with, even if it's not performant as prod [01:21:06] paravoid: we already have two actually [01:21:48] oh, nice :) [01:22:11] although, recently deleted the second in services as it wasn't on jessie [01:23:43] RECOVERY - puppet last run on mw2161 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [01:25:16] mutante: also, I don't know if that was just for demonstrative purposes or not, but we need those rules add to production as well [01:26:12] I probably confused matters by uploading that script with hard-coded users and hosts [01:27:47] urandom: we add the base::firewall to nodes and the holes to punch into it, we add to role classes. so we can open up ports in role::restbase and/or role::cassandra, test it on the test hosts without touching prod yet. then if it works, we add base::firewall to prod nodes too [01:28:13] i'm uploading an example [01:28:48] (03PS1) 10Dzahn: restbase: add ferm service for 7231/tcp [puppet] - 10https://gerrit.wikimedia.org/r/197830 (https://phabricator.wikimedia.org/T92680) [01:29:13] urandom: https://gerrit.wikimedia.org/r/#/c/197830/1/manifests/role/restbase.pp for example for 7231 [01:29:33] we should probably do this, mutante [01:29:41] maybe godog, tomorrow our morning [01:29:58] ok, yes [01:30:05] just wanted to explain how it works [01:30:14] 6operations, 10ops-eqiad: db1020 raid degraded - https://phabricator.wikimedia.org/T93166#1130930 (10Springle) 3NEW [01:30:52] PROBLEM - NTP on cerium is CRITICAL: NTP CRITICAL: No response from NTP server [01:31:01] mutante: gotcha [01:31:05] ACKNOWLEDGEMENT - RAID on db1020 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Sean Pringle T93166 [01:31:11] thanks! 
[01:31:34] it's gonna get more complicated than that if we want to only allow cassandra<->cassandra network access
[01:31:54] in any case, while firewalling is a good idea anyway, it's likely not enough :/
[01:32:18] let's just say it's not too hard to assume another host's IP address if you own a host
[01:33:23] PROBLEM - NTP on praseodymium is CRITICAL: NTP CRITICAL: No response from NTP server
[01:33:52] paravoid: then maybe this whole approach is a waste
[01:34:12] PROBLEM - NTP on xenon is CRITICAL: NTP CRITICAL: No response from NTP server
[01:34:22] (CR) Tim Landscheidt: [C: -1] "Further thinking: In register(port), port should be an integer, not a string." [puppet] - https://gerrit.wikimedia.org/r/197658 (https://phabricator.wikimedia.org/T91954) (owner: Tim Landscheidt)
[01:35:06] paravoid: because we were taking this route to avoid having to lock down each cassandra service individually
[01:36:19] operations, RESTBase, RESTBase-Cassandra, Security, Patch-For-Review: securing the RESTBase Cassandra cluster - https://phabricator.wikimedia.org/T92680#1130938 (faidon) While firewalling the Cassandra cluster (with ferm/puppet) is certainly a good idea by itself (and someone, probably @fgiunche...
[01:37:18] Why is it that people use if [ "x$VAR" = "x" ]; to test for null/empty string in Bash? Doesn't "$VAR" = "" work just as well without the "x", or the if [ -n "$VAR" ] -n operator.
[01:39:07] Krinkle: portability
[01:39:21] for non-bash/non-dash/non-modern shells in general
[01:39:36] "[" (i.e. "test" is a builtin in dash but wasn't in all shells
[01:39:48] /usr/bin/[ still exists
[01:40:04] paravoid: Hm.. I tried searching for it in the past but couldn't find anything, looks like google grammar has improved for '$' -> http://stackoverflow.com/a/174288/319266
[01:40:13] paravoid: Does this apply to [[ as well?
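Editorial aside: the empty-string tests being compared in this exchange can be tried side by side. The following is a minimal sketch, assuming a POSIX-compatible /bin/sh (dash and bash both qualify); the function names are made up for illustration and do not come from the log.

```shell
#!/bin/sh
# Three ways to ask "is this string empty?", all valid POSIX sh.
# The function names are hypothetical helpers for this comparison.

is_empty_x() {
    # Legacy "x hack": the prefix guarantees that neither operand is
    # empty or looks like an operator to an old test implementation.
    [ "x$1" = "x" ]
}

is_empty_eq() {
    # Plain comparison; the quotes keep the empty operand in place.
    [ "$1" = "" ]
}

is_empty_z() {
    # -z is true when the string has length zero.
    [ -z "$1" ]
}

# Run every check against a few values, including one that looks like
# an operator ("-n").
for value in "" "text" "-n" "x"; do
    for check in is_empty_x is_empty_eq is_empty_z; do
        if "$check" "$value"; then result=empty; else result=non-empty; fi
        printf '%s %s -> %s\n' "$check" "'$value'" "$result"
    done
done
```

In a current dash or bash the three checks should agree for every value; the x prefix is a guard for older test implementations rather than something a modern shell needs.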
[01:40:32] if you type [ "$VAR" = "" ], then this is passed as two arguments to /usr/bin/[, not three
[01:40:35] so it's invalid
[01:40:53] in bash (and dash), [ is a builtin, so this doesn't apply
[01:41:01] [[ is a bashism
[01:41:04] doesn't even exist in dash
[01:43:09] Ah, okay
[01:43:19] /bin/sh = dash, btw
[01:44:22] So if it uses /bin/env in the hashbang, then [[ can be used, or [ without "x" hack.
[01:44:33] And in an unspecified shell for portability one might use the x hack
[01:44:45] ?
[01:44:59] bin/bash *
[01:45:02] not env, sorry
[01:45:04] yes
[01:45:11] if you use /bin/bash you can use all the bashisms you want
[01:45:21] in expense for a tiny performance loss
[01:45:33] https://gerrit.wikimedia.org/r/#/c/197832/1/bin/ci-settings.sh
[01:45:41] (that resulted in a huge performance loss when booting systems with sysvinit, though, hence the switch of /bin/sh to dash)
[01:46:25] -z, presumably?
[01:46:31] Yeah, just noticed
[01:46:42] or ! -n, although -z is prettier :)
[01:46:48] zero, null
[01:46:54] both make sense as abbreviation
[01:46:59] but n is not zero
[01:47:06] not "n"ull
[01:51:37] (PS2) Gergő Tisza: [WIP] Make vbench more generic [puppet] - https://gerrit.wikimedia.org/r/197240 (https://phabricator.wikimedia.org/T92701)
[01:56:06] (PS1) Dzahn: add restbase hosts to network.pp [puppet] - https://gerrit.wikimedia.org/r/197839 (https://phabricator.wikimedia.org/T92680)
[01:57:28] (PS3) Gergő Tisza: [WIP] Make vbench more generic [puppet] - https://gerrit.wikimedia.org/r/197240 (https://phabricator.wikimedia.org/T92701)
[01:59:14] tgr: upstart file on osmium currently looks like this: https://dpaste.de/cNQy/raw
[02:04:08] (PS1) Dzahn: cassandra: add ferm rules [puppet] - https://gerrit.wikimedia.org/r/197840 (https://phabricator.wikimedia.org/T92680)
[02:11:54] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: / 1428 MB (1% inode=94%):
[02:13:41] (CR) Alex Monk: "I don't get why people have
started using this namespace alias if it never existed :/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195017 (owner: 10Mjbmr) [02:14:04] (03CR) 10Dzahn: wikistats: resource attributes quoting (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/195866 (https://phabricator.wikimedia.org/T91908) (owner: 10Matanya) [02:23:07] !log l10nupdate Synchronized php-1.25wmf21/cache/l10n: (no message) (duration: 00m 04s) [02:23:13] Logged the message, Master [02:24:17] !log LocalisationUpdate completed (1.25wmf21) at 2015-03-19 02:23:13+00:00 [02:24:20] Logged the message, Master [02:35:59] !log l10nupdate Synchronized php-1.25wmf22/cache/l10n: (no message) (duration: 00m 03s) [02:36:05] Logged the message, Master [02:37:06] !log LocalisationUpdate completed (1.25wmf22) at 2015-03-19 02:36:03+00:00 [02:37:10] Logged the message, Master [03:18:31] (03PS4) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-fr-es] - 10https://gerrit.wikimedia.org/r/195577 (https://phabricator.wikimedia.org/T92252) [03:20:25] (03PS2) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-mk-bg] - 10https://gerrit.wikimedia.org/r/195284 (https://phabricator.wikimedia.org/T89936) [03:22:43] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: / 4267 MB (3% inode=94%): [04:08:16] (03CR) 10Dzahn: [C: 031] puppet_compiler: resource attributes quoting and minor lints [puppet] - 10https://gerrit.wikimedia.org/r/195660 (owner: 10Matanya) [04:12:28] (03CR) 10Dzahn: [C: 031] zuul: lint [puppet] - 10https://gerrit.wikimedia.org/r/195769 (owner: 10Matanya) [04:19:50] !log manually globalized User:WeeJay [04:19:53] Logged the message, Master [04:22:45] (03CR) 10Dzahn: [C: 032] wikimania_scholarships: resource attributes quoting and minor lint [puppet] - 10https://gerrit.wikimedia.org/r/195864 (https://phabricator.wikimedia.org/T91908) (owner: 10Matanya) [04:25:09] (03PS3) 10KartikMistry: Added missing Build-Depends 
[debs/contenttranslation/apertium-hbs] - 10https://gerrit.wikimedia.org/r/197345 [04:26:10] (03CR) 10Dzahn: [C: 031] backup: resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195661 (owner: 10Matanya) [04:26:29] (03CR) 10Ori.livneh: [C: 032] "Labs-only" [puppet] - 10https://gerrit.wikimedia.org/r/197825 (owner: 10Ori.livneh) [04:26:31] (03PS3) 10KartikMistry: Added initial Debian package [debs/contenttranslation/apertium-hbs-eng] - 10https://gerrit.wikimedia.org/r/195232 [04:27:22] (03PS2) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-hbs-mkd] - 10https://gerrit.wikimedia.org/r/195264 (https://phabricator.wikimedia.org/T89936) [04:28:21] (03PS2) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-hbs-slv] - 10https://gerrit.wikimedia.org/r/195275 (https://phabricator.wikimedia.org/T89936) [04:31:17] (03CR) 10Dzahn: [C: 031] limn: minor lint and Resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [04:31:25] 6operations, 6Mobile-Apps, 6Services: Deployment of Mobile App's service on the SCA cluster - https://phabricator.wikimedia.org/T92627#1131034 (10bearND) @faidon I've transferred the repo to the wikimedia account: https://github.com/wikimedia/service-mobileapp-node. 
[04:33:46] (03CR) 10Dzahn: [C: 032] ""Labs-only" role::labs::extdist" [puppet] - 10https://gerrit.wikimedia.org/r/195743 (owner: 10Matanya) [04:36:54] (03PS3) 10Dzahn: haproxy: quoting / lint [puppet] - 10https://gerrit.wikimedia.org/r/195652 (owner: 10Matanya) [04:37:21] (03CR) 10Dzahn: [C: 031] haproxy: quoting / lint [puppet] - 10https://gerrit.wikimedia.org/r/195652 (owner: 10Matanya) [04:42:28] (03CR) 10Dzahn: [C: 031] "needs manual rebase, path conflict" [puppet] - 10https://gerrit.wikimedia.org/r/195533 (owner: 10Matanya) [04:43:32] (03PS2) 10Dzahn: mysql: selector outside a resource + 4 spaces [puppet] - 10https://gerrit.wikimedia.org/r/195518 (owner: 10Matanya) [04:54:51] (03CR) 10Springle: [C: 031] mysql: selector outside a resource + 4 spaces [puppet] - 10https://gerrit.wikimedia.org/r/195518 (owner: 10Matanya) [05:00:08] (03PS1) 10KartikMistry: config: Remove wgContentTranslationTranslateInTarget [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197854 [05:20:04] (03CR) 10Aaron Schulz: "Can't wait till these are DC local." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197495 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [06:04:51] (03PS2) 10KartikMistry: Beta: Enable more languages in Beta [puppet] - 10https://gerrit.wikimedia.org/r/197312 [06:29:13] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:23] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 2 failures [06:29:43] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:53] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:03] PROBLEM - puppet last run on mw2022 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:33] PROBLEM - puppet last run on mw2143 is CRITICAL: CRITICAL: Puppet has 2 failures [06:34:03] PROBLEM - Check status of defined EventLogging jobs on vanadium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:46:03] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [06:46:13] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [06:46:32] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [06:46:32] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:46:52] RECOVERY - puppet last run on mw2143 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [06:48:02] RECOVERY - puppet last run on mw2022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:14] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning. [07:02:12] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running. [07:10:19] <_joe_> noone around, eh? [07:14:06] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Mar 19 07:13:03 UTC 2015 (duration 13m 2s) [07:14:12] Logged the message, Master [08:12:07] good morning [08:52:08] hm [08:52:28] quick question: do i need a special right to add people to a gerrit group? [08:53:31] <_joe_> mobrovac: I think my skills at divination are better than my understanding of gerrit permissions [08:53:36] <_joe_> sorry :) [08:53:43] hehehe [08:53:54] np [08:54:20] my guess is the answer is "yes" since i can see the "add member" magic button but it's greyed out [08:54:47] which in general makes sense, but for groups/repos of which are the owner doesn't [08:57:14] (03CR) 10Alexandros Kosiaris: [C: 04-2] "Nothing special about restbase hosts to warrant a special place in that file. 
Let's not go around adding entries about services in that fi" [puppet] - 10https://gerrit.wikimedia.org/r/197839 (https://phabricator.wikimedia.org/T92680) (owner: 10Dzahn) [09:01:14] (03CR) 10Alexandros Kosiaris: [C: 04-1] cassandra: add ferm rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/197840 (https://phabricator.wikimedia.org/T92680) (owner: 10Dzahn) [09:02:26] (03CR) 10Giuseppe Lavagetto: [C: 04-2] "I agree with alex." [puppet] - 10https://gerrit.wikimedia.org/r/197839 (https://phabricator.wikimedia.org/T92680) (owner: 10Dzahn) [09:02:55] (03Abandoned) 10Giuseppe Lavagetto: labs::dns: remove duplicate monitoring [puppet] - 10https://gerrit.wikimedia.org/r/197513 (owner: 10Giuseppe Lavagetto) [09:03:18] (03PS2) 10Giuseppe Lavagetto: mediawiki: install fonts metric-compatible with Calibri and Cambria [puppet] - 10https://gerrit.wikimedia.org/r/196173 (https://phabricator.wikimedia.org/T84842) [09:04:09] (03CR) 10Alexandros Kosiaris: [C: 032] restbase: add ferm service for 7231/tcp [puppet] - 10https://gerrit.wikimedia.org/r/197830 (https://phabricator.wikimedia.org/T92680) (owner: 10Dzahn) [09:05:06] 6operations: Publish a full SVN dump - https://phabricator.wikimedia.org/T93179#1131217 (10Nemo_bis) 3NEW [09:06:01] (03CR) 10Alexandros Kosiaris: [C: 032] add base::firewall on cassandra test hosts [puppet] - 10https://gerrit.wikimedia.org/r/197822 (https://phabricator.wikimedia.org/T92680) (owner: 10Dzahn) [09:06:12] (03PS2) 10Alexandros Kosiaris: add base::firewall on cassandra test hosts [puppet] - 10https://gerrit.wikimedia.org/r/197822 (https://phabricator.wikimedia.org/T92680) (owner: 10Dzahn) [09:08:12] (03PS4) 10Giuseppe Lavagetto: mediawiki: add appserver cluster IPs in codfw [dns] - 10https://gerrit.wikimedia.org/r/195887 (https://phabricator.wikimedia.org/T92377) [09:08:22] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: add appserver cluster IPs in codfw [dns] - 10https://gerrit.wikimedia.org/r/195887 
(https://phabricator.wikimedia.org/T92377) (owner: 10Giuseppe Lavagetto) [09:10:34] (03CR) 10Gilles: [C: 031] mediawiki: install fonts metric-compatible with Calibri and Cambria [puppet] - 10https://gerrit.wikimedia.org/r/196173 (https://phabricator.wikimedia.org/T84842) (owner: 10Giuseppe Lavagetto) [09:11:12] (03CR) 10Alexandros Kosiaris: [C: 032] add base::firewall on cassandra test hosts [puppet] - 10https://gerrit.wikimedia.org/r/197822 (https://phabricator.wikimedia.org/T92680) (owner: 10Dzahn) [09:18:12] RECOVERY - NTP on praseodymium is OK: NTP OK: Offset -0.000746846199 secs [09:23:23] RECOVERY - NTP on cerium is OK: NTP OK: Offset -0.001393914223 secs [09:24:13] RECOVERY - NTP on xenon is OK: NTP OK: Offset -0.000617146492 secs [09:25:48] (03PS1) 10Giuseppe Lavagetto: dns: re-add formatting whitespace [dns] - 10https://gerrit.wikimedia.org/r/197871 [09:25:50] (03CR) 10jenkins-bot: [V: 04-1] dns: re-add formatting whitespace [dns] - 10https://gerrit.wikimedia.org/r/197871 (owner: 10Giuseppe Lavagetto) [09:26:21] <_joe_> I hate gerrit diffs [09:26:31] (03PS2) 10Giuseppe Lavagetto: dns: re-add formatting whitespace [dns] - 10https://gerrit.wikimedia.org/r/197871 [09:27:15] mobrovac: the answer is indeed yes [09:27:30] * matanya pokes hashar for mobrovac [09:27:46] (03CR) 10Giuseppe Lavagetto: [C: 032] dns: re-add formatting whitespace [dns] - 10https://gerrit.wikimedia.org/r/197871 (owner: 10Giuseppe Lavagetto) [09:31:11] * mobrovac likes making educated guesses :) [09:37:02] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: puppet fail [09:41:31] 6operations, 10ops-codfw: ms-be2009.codfw.wmnet: slot=10 dev=sdk failed - https://phabricator.wikimedia.org/T92833#1131277 (10fgiunchedi) 5Open>3Resolved a:3fgiunchedi back in service and resynched [09:42:07] (03PS2) 10Giuseppe Lavagetto: lvs: add loadbalancers for appservers, api and rendering [puppet] - 10https://gerrit.wikimedia.org/r/195899 (https://phabricator.wikimedia.org/T92377) [09:47:23] PROBLEM 
- HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0] [09:47:53] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0] [09:55:04] RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [09:55:43] _joe_: I usually download the pending patch from gerrit and use my local diff tool instead of the web diff :D [09:55:48] matanya: will contact him [09:59:43] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0] [10:00:23] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [10:22:23] PROBLEM - puppet last run on vanadium is CRITICAL: CRITICAL: Puppet last ran 4 hours ago [10:23:24] uh [10:23:41] * YuviPanda looks at puppet failures on vanadium [10:24:33] RECOVERY - Check status of defined EventLogging jobs on vanadium is OK: OK: All defined EventLogging jobs are runnning. [10:25:06] !log 50G of logs in /var/log/upstart/eventlogging_processor-client-side-events.log.1 [10:25:13] Logged the message, Master [10:28:33] PROBLEM - Check status of defined EventLogging jobs on vanadium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:30:23] !log sudo mv eventlogging_processor-client-side-events.log.1 /srv on vanadium, make space in / [10:30:28] Logged the message, Master [10:32:45] 6operations, 10Analytics, 10Analytics-EventLogging: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1131462 (10yuvipanda) 3NEW [10:33:14] 6operations, 10Analytics, 10Analytics-EventLogging: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1131469 (10yuvipanda) I am temporarily moving the biggest of the files (51G eventlogging_processor-client-side-events.log.1) on to /srv. Someone who knows m... 
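Editorial aside: one way to find the files that are filling a partition, as was done here for /var/log/upstart. This is a minimal sketch, assuming a POSIX shell with find, du, sort and head available; largest_files is a hypothetical helper name, not a tool mentioned in the log.

```shell
#!/bin/sh
# largest_files DIR [N]: list the N biggest regular files under DIR.
# Hypothetical helper; sizes are disk usage in KiB as reported by du -k.
largest_files() {
    dir=$1
    count=${2:-10}
    # -xdev keeps find on one filesystem, so separate mounts are skipped.
    # Errors such as unreadable directories are discarded.
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null |
        sort -rn |
        head -n "$count"
}
```

A call such as `largest_files /var/log/upstart 5` would then print the five largest files with their sizes, largest first.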
[10:42:19] (03CR) 10Filippo Giunchedi: [C: 031] lvs: add loadbalancers for appservers, api and rendering [puppet] - 10https://gerrit.wikimedia.org/r/195899 (https://phabricator.wikimedia.org/T92377) (owner: 10Giuseppe Lavagetto) [10:45:33] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, and 2 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1131481 (10hashar) 1/1 with Filippo is scheduled for Tuesday 3/24 9:30am CET. [10:51:23] RECOVERY - Check status of defined EventLogging jobs on vanadium is OK: OK: All defined EventLogging jobs are runnning. [10:59:32] 6operations, 10Analytics, 10Analytics-EventLogging: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1131505 (10yuvipanda) The biggest logs seem to be filled with validation errors about https://meta.wikimedia.org/wiki/Schema:Edit [10:59:58] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "let's wait to have a working mediawiki install in codfw to merge this." 
[puppet] - 10https://gerrit.wikimedia.org/r/195899 (https://phabricator.wikimedia.org/T92377) (owner: 10Giuseppe Lavagetto) [11:04:02] RECOVERY - puppet last run on vanadium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:02] RECOVERY - Disk space on vanadium is OK: DISK OK [11:09:41] 6operations, 10Analytics, 10Analytics-EventLogging: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1131534 (10yuvipanda) p:5Triage>3Normal [11:14:17] !log test-run tftpd-hpa on carbon vs atftpd [11:14:23] Logged the message, Master [11:14:35] ^ should have no impact, but let me know if provisioning fails in the meantime [11:28:58] (03PS1) 10Steinsplitter: Changing logo for huwikquote from std to url [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197883 (https://phabricator.wikimedia.org/T93176) [11:30:37] yuvipanda, can you merge this pls, the wiki has currently no logo or schould i wait for SWAT? [11:40:06] please someone have a look at this https related mediawiki config change https://gerrit.wikimedia.org/r/#/c/194856/ i'd like another set of eyes before i'll SWAT it [11:45:09] akosiaris: the protocol of url-downloader.wikimedia.org is http? [11:47:15] (03CR) 10JanZerebecki: [C: 04-1] "See inline." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197883 (https://phabricator.wikimedia.org/T93176) (owner: 10Steinsplitter) [11:47:54] mobrovac: yes, plus the CONNECT command as it is a forward proxy [11:48:22] oki [11:48:23] thnx [11:50:36] (03PS1) 10Mobrovac: Citoid: Set the protocol for the proxy [puppet] - 10https://gerrit.wikimedia.org/r/197884 (https://phabricator.wikimedia.org/T93157) [11:52:52] Steinsplitter: I think you should wait for SWA, yeah. 
[11:57:35] (03CR) 10Alexandros Kosiaris: [C: 032] Citoid: Set the protocol for the proxy [puppet] - 10https://gerrit.wikimedia.org/r/197884 (https://phabricator.wikimedia.org/T93157) (owner: 10Mobrovac) [11:58:23] 6operations, 10Analytics, 10Analytics-EventLogging: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1131591 (10yuvipanda) p:5Normal>3Unbreak! Free space is being gobbled up really fast still, and won't last more than a few hours. @nuria Is the eventl... [12:03:35] (03PS1) 10Giuseppe Lavagetto: scap: add codfw proxies [puppet] - 10https://gerrit.wikimedia.org/r/197885 [12:09:06] 6operations, 6Commons, 6Multimedia, 7HHVM, and 3 others: Create an HHVM 3.6.0 package, adding Tim's streaming patch - https://phabricator.wikimedia.org/T93194#1131609 (10Joe) 3NEW a:3Joe [12:14:04] 10Ops-Access-Requests, 6operations, 6Phabricator, 6Release-Engineering, 5Patch-For-Review: Mukunda needs sudo on iridium (phab host) - https://phabricator.wikimedia.org/T93151#1131619 (10Aklapper) p:5Triage>3Normal [12:14:07] (03CR) 10Alexandros Kosiaris: [C: 031] "seems fine to me, wherever you are ready :-)" [puppet] - 10https://gerrit.wikimedia.org/r/195899 (https://phabricator.wikimedia.org/T92377) (owner: 10Giuseppe Lavagetto) [12:29:02] (03CR) 10Mjbmr: "No one is willing to use this alias, I'm saying it has been used, do you get that?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195017 (owner: 10Mjbmr) [12:43:32] (03CR) 10Chad: [C: 032] poolcounter: add support for codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197495 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [12:43:40] <^d> _joe_: Next patch about to land ^ [12:43:45] (03Merged) 10jenkins-bot: poolcounter: add support for codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197495 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [12:43:53] <^d> (also, it's not even 6am, why am I awake?!?!?!?) 
[12:44:43] !log demon Synchronized wmf-config: poolcounter for codfw (duration: 00m 10s) [12:44:47] Logged the message, Master [12:45:00] <^d> mw1097 didn't get the sync :( [12:46:36] (03PS2) 10Steinsplitter: Changing logo for huwikquote from std to url [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197883 (https://phabricator.wikimedia.org/T93176) [12:46:44] <^d> _joe_: No errors that I'm seeing, you've got codfw poolcounter now :) [12:48:13] (03PS5) 10Faidon Liambotis: IPsec: big off switch [puppet] - 10https://gerrit.wikimedia.org/r/196498 (https://phabricator.wikimedia.org/T88536) (owner: 10Gage) [12:48:42] (03CR) 10Faidon Liambotis: [C: 032] IPsec: big off switch [puppet] - 10https://gerrit.wikimedia.org/r/196498 (https://phabricator.wikimedia.org/T88536) (owner: 10Gage) [12:50:31] (03CR) 10Chad: memcached: add configurations for codfw (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197496 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [12:51:23] <_joe_> ^d: ops, it was scheduled for swat today :) [12:51:26] (03PS4) 10Chad: proxy: add codfw networks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197497 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [12:51:38] <^d> _joe_: I'm being all rogue :p [12:52:02] <_joe_> ^d: thanks :) [12:52:03] <^d> (also I hadn't checked swat yet) [12:52:12] <_joe_> I'll amend the swat list then [12:52:28] <_joe_> ^d: I was at lunch btw :P [12:52:38] (03CR) 10Chad: [C: 032] proxy: add codfw networks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197497 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [12:52:42] (03Merged) 10jenkins-bot: proxy: add codfw networks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197497 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [12:53:18] !log demon Synchronized wmf-config/squid.php: codfw (duration: 00m 07s) [12:53:21] Logged the message, Master 
[12:53:47] (03CR) 10Giuseppe Lavagetto: memcached: add configurations for codfw (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197496 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [12:54:07] (03PS4) 10Chad: jobqueue: add configuration for codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197498 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [12:54:59] (03PS1) 10Faidon Liambotis: base: add tcpdump to standard-packages [puppet] - 10https://gerrit.wikimedia.org/r/197893 [12:55:04] (03CR) 10Alexandros Kosiaris: [C: 031] "Nice!" [puppet] - 10https://gerrit.wikimedia.org/r/197821 (owner: 10BBlack) [12:55:18] (03CR) 10Faidon Liambotis: [C: 032] base: add tcpdump to standard-packages [puppet] - 10https://gerrit.wikimedia.org/r/197893 (owner: 10Faidon Liambotis) [12:56:14] (03CR) 10Chad: [C: 032] jobqueue: add configuration for codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197498 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [12:56:31] <_joe_> ^d: logs are full of template eval errors [12:56:38] <^d> Yes we know. [12:56:58] <^d> I griped about it for about an hour yesterday [12:57:33] jenkins lagging behind [12:57:40] (03CR) 10Faidon Liambotis: [V: 032] base: add tcpdump to standard-packages [puppet] - 10https://gerrit.wikimedia.org/r/197893 (owner: 10Faidon Liambotis) [12:58:02] <^d> paravoid: Is it? Zuul looks ok. 
https://integration.wikimedia.org/zuul/ [12:58:24] ^d: https://gerrit.wikimedia.org/r/#/c/197893/ [12:58:37] (03Merged) 10jenkins-bot: jobqueue: add configuration for codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197498 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [12:58:43] PROBLEM - puppet last run on amssq39 is CRITICAL: CRITICAL: Puppet has 1 failures [12:58:52] <^d> paravoid: It's just waiting for those wikibase tests to finish [12:59:02] <^d> Same gate-and-submit queue [12:59:18] <^d> wawit, I was looking at the wrong thing [12:59:22] * ^d finds coffee [12:59:47] <_joe_> 2015-03-19 05:09:38 mw1249 mediawikiwiki: Redirect loop detected! [12:59:49] <_joe_> nice [12:59:58] <_joe_> of course no indication of the URL [13:00:03] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: Puppet has 1 failures [13:00:03] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: Puppet has 1 failures [13:00:40] <^d> _joe_: hhvm.log would be far too useful with urls. [13:00:42] <^d> :) [13:00:57] <^d> Part of the game is figuring out what caused it! [13:01:10] <_joe_> ^d: this was httperror.log [13:01:17] <^d> heh, gotcha [13:01:29] (those errors are temporary, they'll fix themselves) [13:01:32] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet has 1 failures [13:01:32] RECOVERY - puppet last run on amssq39 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [13:02:01] !log demon Synchronized wmf-config/jobqueue-codfw.php: codfw support (duration: 00m 05s) [13:02:02] (03CR) 10JanZerebecki: [C: 031] Changing logo for huwikquote from std to url [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197883 (https://phabricator.wikimedia.org/T93176) (owner: 10Steinsplitter) [13:02:04] Logged the message, Master [13:02:12] <^d> _joe_: All you've got left is memc and filebackend [13:02:14] <^d> :) [13:04:29] <_joe_> ^d: w00t thanks [13:04:57] <^d> yw. 
memc is probably fine once we finish the common/realm-specific split [13:05:03] <^d> filebackend I need to review again [13:05:31] <_joe_> filebackend needs a thorough review, and I need to add the relevant part in private [13:05:34] (03PS3) 10Chad: filebackend: add configuration for codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197499 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [13:05:42] <_joe_> so hold it up [13:05:53] <_joe_> I need to speak with aaron and filippo about it [13:06:16] <^d> Yeah, I was just rebasing since I'd undone most of your chain [13:08:19] 6operations: Purge > 90 days stat1002:/a/squid/archive/glam_nara - https://phabricator.wikimedia.org/T92340#1131699 (10Multichill) Would be a waste to loose this information. What kind of geo data do you have now? Something like ipaddresses? Would it be possible to replace the sensitive data with something not s... [13:16:53] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [13:17:12] (03CR) 10Faidon Liambotis: cassandra: add ferm rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/197840 (https://phabricator.wikimedia.org/T92680) (owner: 10Dzahn) [13:18:13] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:18:13] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [13:19:52] 6operations, 10Analytics, 10Analytics-EventLogging: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1131723 (10Milimetric) Thanks @YuviPanda for the emergency fix, I'm tagging our team and making this high importance. 
[13:20:08] 6operations, 10Analytics, 10Analytics-EventLogging, 6Analytics-Kanban: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1131724 (10Milimetric) [13:20:29] (03CR) 10Matanya: "I can abandon this if you wish." [puppet] - 10https://gerrit.wikimedia.org/r/195898 (owner: 10Matanya) [13:23:51] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Still failing with:" [debs/contenttranslation/apertium-fr-es] - 10https://gerrit.wikimedia.org/r/195577 (https://phabricator.wikimedia.org/T92252) (owner: 10KartikMistry) [13:26:22] 6operations, 10Analytics, 10Analytics-EventLogging, 6Analytics-Kanban: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1131736 (10yuvipanda) Cool :) I also copied logs_02_06_onward folder onto /srv to make space as well. [13:26:33] 6operations, 10Analytics, 10Analytics-EventLogging, 6Analytics-Kanban: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1131737 (10yuvipanda) Cool :) I also copied logs_02_06_onward folder onto /srv to make space as well. [13:27:12] (03CR) 10Hoo man: [C: 031] "Makes sense to me... keeping test2 on bits.wm.o (unlike test) also make sense (for testing caching or the like)." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197381 (https://phabricator.wikimedia.org/T92949) (owner: 10Aude) [13:37:39] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "LGTM" [debs/contenttranslation/apertium-hbs] - 10https://gerrit.wikimedia.org/r/197345 (owner: 10KartikMistry) [13:38:22] _joe_: btw, I rewrote my ENC as a http service... [13:38:22] _joe_: needs more cleanup and profiling, but hopefully I’ll be able to get it deployed on labs puppetmaster at some point soon. [13:38:22] and *then* I can get rid of hiera_include(‘classes’) [13:38:41] <_joe_> YuviPanda: ok, do you have a change for that? 
[13:39:47] !log uploaded apertium-hbs_0.5.0~r57197-2 on apt.wikimedia.org [13:39:49] kart_: ^ [13:39:52] Logged the message, Master [13:40:02] _joe_: yeah, finding... [13:40:03] * YuviPanda is on super high latency internet atm [13:40:27] _joe_: is Ie5f0d688d8385dd9a723f69798138f7b9e020b7f [13:40:37] <_joe_> YuviPanda: ok thanks [13:41:00] (gerrit is too slow for me atm) [13:41:17] _joe_: the python style could use some improvement, and I have to study ‘threading / async behavior in python’ a fair bit more before I will be comfortable merging that [13:41:26] <_joe_> YuviPanda: you also need an enc script that curls it :) [13:41:50] btw you know what caused a ton of latency for me yesterday ? Dropbox + my parents router [13:42:11] while dropbox was running, mtr was reporting latencies of up to 3 secs [13:42:22] because unlike most python webservices I’ve written, this one keeps state in itself... [13:42:22] _joe_: yeah, that too but that should be trivial. I will write that in Scala [13:42:27] RTT of 3 secs is plain ridiculous... [13:42:30] time to get some QoS on your router! [13:42:32] YuviPanda: :P [13:42:44] akosiaris: the cause of latency for me is my 3G provider seems shit, and I’m in an autorickshaw that’s careening through this city with no regard for laws of physics... [13:42:51] I am gonna write a bot to kick anyone talking about scala in this room :P [13:43:00] scalakick-bot [13:43:14] <_joe_> YuviPanda: I can take a look maybe, not now though I have to work on codfw for now [13:43:16] haha [13:43:18] toollabs would be perfect as a host [13:43:37] <_joe_> ^d: we've moved from 500 Internal server error to HTTP/1.1 500 MediaWiki exception thanks to your merges :) [13:43:53] <_joe_> that's actually a progress [13:44:11] <^d> Yay! [13:44:14] _joe_: yeah, that’s fine :) I have been talking to andrewbogott, when time coms / horizon comes we’ll just extend this, and get rid of the LDAP parts. 
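(Editor's sketch of the "enc script that curls it" idea from the exchange above. This is a guess at the shape only, not the actual change: the service URL, the /v1/nodes/ path, and all function names are hypothetical. The one part taken from Puppet itself is the ENC contract: the node name arrives as the first argument, YAML goes to stdout, and a non-zero exit signals failure.)

```python
"""Hypothetical Puppet ENC wrapper that asks an HTTP service about a node."""
import sys
import urllib.parse
import urllib.request

# Assumption: both the address and the URL layout are invented for illustration.
ENC_SERVICE = "http://localhost:8080"


def enc_url(node, base=ENC_SERVICE):
    """Build the lookup URL for one node, percent-escaping the node name."""
    return "{}/v1/nodes/{}".format(base.rstrip("/"), urllib.parse.quote(node, safe=""))


def fetch_enc(node, base=ENC_SERVICE, timeout=5):
    """Return the YAML text the service produced for this node."""
    with urllib.request.urlopen(enc_url(node, base), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def main(argv):
    """Entry point following the ENC calling convention; returns an exit code."""
    if len(argv) != 2:
        sys.stderr.write("usage: enc <node-name>\n")
        return 2
    try:
        sys.stdout.write(fetch_enc(argv[1]))
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        sys.stderr.write("ENC lookup failed: {}\n".format(exc))
        return 1
    return 0

# A real script would end with: sys.exit(main(sys.argv))
```

Keeping the wrapper this thin leaves all state in the HTTP service, which seems to be the point of the rewrite discussed above.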
[13:45:19] akosiaris: would you have some spare time to get a bunch of .deb packages pushed to our apt.wm.o repos ? [13:46:25] YuviPanda: So yeah; I've been giving some thought to the proxy thing and I see a clean and simple way to coalesce everything in redis in a way that should be pretty much restartproof for state. [13:46:42] YuviPanda: We'll sit down and talk it out once you switched continents. :-) [13:47:28] :) nice! :D [13:47:29] Coren: yeah, cool. feel free to jot down notes in the phab task, though. subconscious noodling on it might be useful [13:49:32] hashar: yeah, you are next on the list after I am done with kart_ [13:50:08] akosiaris: looking. Thanks! [13:50:08] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "LGTM" [debs/contenttranslation/apertium-mk-bg] - 10https://gerrit.wikimedia.org/r/195284 (https://phabricator.wikimedia.org/T89936) (owner: 10KartikMistry) [13:50:46] !log uploaded apertium-mk-bg_0.2.0~r49489-1 on apt.wikimedia.org [13:50:49] Logged the message, Master [13:52:03] akosiaris: can you merge, https://gerrit.wikimedia.org/r/#/c/197312 meanwhile? [13:55:15] 6operations, 10Analytics, 10Analytics-EventLogging, 6Analytics-Kanban: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1131817 (10Nuria) Sorry I did not send e-mail to team yesterday. The root cause is a huge volume of events that are invalid that we can... [13:55:54] (03CR) 10Ottomata: [C: 031] "I'd merge but Alex had the last comments. Alex?" 
[puppet/cassandra] - 10https://gerrit.wikimedia.org/r/195483 (https://phabricator.wikimedia.org/T91617) (owner: 10GWicke) [13:59:43] (03PS3) 10Matanya: nova: lint compute.pp [puppet] - 10https://gerrit.wikimedia.org/r/195535 [14:00:10] (03PS4) 10Giuseppe Lavagetto: memcached: add configurations for codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197496 (https://phabricator.wikimedia.org/T91754) [14:00:19] (03CR) 10Alexandros Kosiaris: [C: 04-1] Added initial Debian packaging (031 comment) [debs/contenttranslation/apertium-hbs-slv] - 10https://gerrit.wikimedia.org/r/195275 (https://phabricator.wikimedia.org/T89936) (owner: 10KartikMistry) [14:00:40] (03CR) 10jenkins-bot: [V: 04-1] nova: lint compute.pp [puppet] - 10https://gerrit.wikimedia.org/r/195535 (owner: 10Matanya) [14:01:46] (03PS4) 10Matanya: nova: lint compute.pp [puppet] - 10https://gerrit.wikimedia.org/r/195535 [14:01:47] how ironic [14:01:55] fail on lint in a lint patch [14:02:05] :) [14:02:22] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "missing labs config" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197499 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [14:02:30] (03PS3) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-hbs-slv] - 10https://gerrit.wikimedia.org/r/195275 (https://phabricator.wikimedia.org/T89936) [14:02:33] * matanya sits in the corner, ashamed. 
[14:02:44] akosiaris: wrap-and-sort fail :/ [14:03:57] <_joe_> matanya: ahah [14:05:29] !log uploaded apertium-hbs-eng_0.1.0~r57554-1 on apt.wikimedia.org [14:05:33] Logged the message, Master [14:05:40] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "LGTM" [debs/contenttranslation/apertium-hbs-eng] - 10https://gerrit.wikimedia.org/r/195232 (owner: 10KartikMistry) [14:05:49] (03CR) 10Chad: "Couple of inline nits, but looks functionally ok :)" (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197496 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [14:05:53] <^d> _joe_: ^ [14:08:13] (03CR) 10Alexandros Kosiaris: [C: 04-1] Added initial Debian packaging (031 comment) [debs/contenttranslation/apertium-hbs-mkd] - 10https://gerrit.wikimedia.org/r/195264 (https://phabricator.wikimedia.org/T89936) (owner: 10KartikMistry) [14:09:56] (03CR) 10Alexandros Kosiaris: [C: 032] Beta: Enable more languages in Beta [puppet] - 10https://gerrit.wikimedia.org/r/197312 (owner: 10KartikMistry) [14:10:53] (03CR) 10Giuseppe Lavagetto: memcached: add configurations for codfw (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197496 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [14:11:54] <^d> _joe_: Fair enough on having a default file, just making sure we both knew why :) [14:11:57] (03PS5) 10Giuseppe Lavagetto: memcached: add configurations for codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197496 (https://phabricator.wikimedia.org/T91754) [14:12:39] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "LGTM" [debs/contenttranslation/apertium-hbs-slv] - 10https://gerrit.wikimedia.org/r/195275 (https://phabricator.wikimedia.org/T89936) (owner: 10KartikMistry) [14:13:43] !log uploaded apertium-hbs-slv_0.5.0~r43858-1 on apt.wikimedia.org [14:13:49] Logged the message, Master [14:13:50] (03CR) 10Chad: [C: 032] memcached: add configurations for codfw [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/197496 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [14:13:56] (03Merged) 10jenkins-bot: memcached: add configurations for codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197496 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [14:14:41] !log demon Synchronized wmf-config/: memc for codfw (duration: 00m 08s) [14:14:44] Logged the message, Master [14:15:34] <_joe_> ^d: you're too fast for me :P [14:16:39] <^d> ;-) [14:18:05] 6operations, 10Analytics, 10Analytics-EventLogging, 6Analytics-Kanban: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1131852 (10Nuria) Ticket: https://phabricator.wikimedia.org/T93201 [14:18:21] <_joe_> thanks a lot btw [14:19:00] <^d> yw, glad to be of help [14:20:30] (03PS3) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-hbs-mkd] - 10https://gerrit.wikimedia.org/r/195264 (https://phabricator.wikimedia.org/T89936) [14:24:06] 6operations, 10ops-codfw, 7network, 3wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1131871 (10Joe) It works like a charm, thanks @RobH [14:24:40] 6operations, 10ops-eqiad: cp1047 down - https://phabricator.wikimedia.org/T88045#1131876 (10Cmjohnson) Returning the bad DIMM either USPS or FEDEX - Documenting for future reference as our shipments have been lost in the past. FEDEX Tracking # 9611918 2393026 47599924 or USPS 9202 3946 5301 2426 1088 14 [14:25:48] Is there any way to check if particular wiki exists and apply config accordingly [14:25:53] ^d: ^^ [14:26:15] for example if wiki exists in Beta, use this config, else this. [14:27:31] <^d> You just put the wiki's dbname in the config. [14:27:42] <^d> And if the wiki doesn't exist, it'll just be ignored. 
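(Editor's sketch of the answer above: per-wiki settings are keyed by dbname with a 'default' fallback, so an entry for a wiki that does not exist in a given environment is simply never looked up. The setting name, the wiki lists, and the helper functions are hypothetical; the real wmf-config also supports tags and groups, which are left out here.)

```python
# Hypothetical settings table, loosely shaped like wmf-config/InitialiseSettings.php.
SETTINGS = {
    "wgExampleFeature": {
        "default": False,
        "enwiki": True,
        "notyetcreatedwiki": True,  # harmless: ignored while this wiki does not exist
    },
}

# Assumption: the set of wikis that actually exist in this environment.
EXISTING_WIKIS = {"enwiki", "frwiki"}


def resolve(setting, dbname):
    """Return the per-wiki value if one is listed, else the default."""
    per_wiki = SETTINGS[setting]
    return per_wiki.get(dbname, per_wiki["default"])


def effective_config(setting):
    """Resolve a setting for every existing wiki; unknown dbnames never appear."""
    return {db: resolve(setting, db) for db in sorted(EXISTING_WIKIS)}


print(effective_config("wgExampleFeature"))
```

Because lookups are driven by the list of existing wikis, no explicit "does this wiki exist" check is needed in the config itself.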
[14:28:49] 6operations, 10ops-codfw, 7network, 3wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1131888 (10Joe) 5Open>3Resolved [14:29:58] aude: could you check if https://gerrit.wikimedia.org/r/#/c/194856/ is good to deploy? i'd like another set of eyes before signing up for SWAT. [14:30:38] jzerebecki: i am not knowledgable enough to +1 [14:30:42] maybe ask hoo [14:30:50] seems sane though [14:30:54] thx anyway [14:32:22] ^d: perhaps you?^^ [14:33:37] 6operations, 10ops-eqiad: db1020 raid degraded - https://phabricator.wikimedia.org/T93166#1131897 (10Cmjohnson) a:3Cmjohnson [14:37:37] (03PS2) 10Mobrovac: Adjust RESTBase / Cassandra settings for deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/197662 (https://phabricator.wikimedia.org/T91102) [14:39:09] (03PS3) 10Mobrovac: Adjust RESTBase / Cassandra settings for deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/197662 (https://phabricator.wikimedia.org/T91102) [14:40:09] 6operations, 10ops-eqiad: db1020 raid degraded - https://phabricator.wikimedia.org/T93166#1131933 (10Cmjohnson) replaced the drive with an on-site spare Enclosure Device ID: 32 Slot Number: 10 Drive's position: DiskGroup: 0, Span: 5, Arm: 0 Enclosure position: N/A Device Id: 10 WWN: 5000C5003240ED44 Sequence... [14:40:40] (03CR) 10Chad: [C: 031] Hide "prefershttps" preference on HSTS domains (ru): it has no effect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194856 (https://phabricator.wikimedia.org/T91352) (owner: 10Nemo bis) [14:42:26] (03CR) 10JanZerebecki: "Signed up for todays Morning SWAT." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194856 (https://phabricator.wikimedia.org/T91352) (owner: 10Nemo bis) [14:42:36] thx [14:42:55] (03CR) 10JanZerebecki: "Signed up for todays Morning SWAT." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/197883 (https://phabricator.wikimedia.org/T93176) (owner: 10Steinsplitter) [14:43:10] 6operations, 10RESTBase, 10RESTBase-Cassandra: Cassandra compaction is getting behind - https://phabricator.wikimedia.org/T93140#1131941 (10Eevans) After letting it run over night, it would seem that 1-4 are in OK shape, steadily trending down, but 5 and 6 (the two nodes involved in the recent bootstrap oper... [14:43:42] ^d: that's easy :D [14:46:22] 6operations: Provide dh-virtualenv 0.9 package on apt.wikimedia.org Precise and Trusty distributions - https://phabricator.wikimedia.org/T91631#1131954 (10akosiaris) a:3akosiaris [14:48:12] akosiaris: and similar is updating python-virtualenv on Precise https://phabricator.wikimedia.org/T92033 :D [14:48:58] 6operations: Backport python-virtualenv 1.11.4 from Trusty to Precise - https://phabricator.wikimedia.org/T92033#1131958 (10akosiaris) a:5hashar>3akosiaris [14:51:24] Krenair: Want to SWAT today, since it's Thursday and you have patches? [14:51:33] okay [14:51:45] * anomie leaves you to it then. [14:52:04] I haven't pinged the other SWATees yet, so you might want to do that. [14:52:09] good morning [14:52:14] * aude waves :) [14:53:20] Well I know jzerebecki is here [14:53:30] \o [14:53:34] aude, gwicke: ping, swat in a few minutes [14:54:01] (03PS2) 10KartikMistry: config: Remove wgContentTranslationTranslateInTarget for existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197854 [14:54:24] 6operations, 6Phabricator, 7domains: enable email for tickets in domains project? - https://phabricator.wikimedia.org/T88842#1131975 (10Aklapper) @Dzahn: Do you know if there is something left to do here? [14:55:14] gwicke, your patches are.... interesting. [14:55:25] (03CR) 10Glaisher: "Any issues with this?" 
[puppet] - 10https://gerrit.wikimedia.org/r/185474 (https://phabricator.wikimedia.org/T87039) (owner: 10Glaisher) [14:55:37] Krenair: don't blame me [14:55:47] You have three patches, and the last one basically reverts the previous two and does more changes [14:55:59] it's all Jame's patches [14:56:06] James' [14:56:15] Are you happy to deploy this to all wikipedias straight away? [14:56:34] Krenair: we are on it,ru,pl,pt,fr so far [14:56:38] and things are looking good [14:56:44] frwiki is the biggest VE user [14:56:51] Okay. And you think it can cope with the load etc.? [14:56:59] yeah [14:57:13] current request rate is trending up, but is barely over 1 req/s [14:57:28] peak throughput is about 10k req/s [14:57:32] so some headroom left [14:58:06] Krenair: let me prepare a single patch [14:58:11] I was about to ask for that :) [14:58:12] thanks [14:58:17] I'm happy with all the other patches. [14:59:26] https://test.wikidata.org/w/static-1.25wmf22/extensions/GuidedTour/modules/ext.guidedTour.lib/ext.guidedTour.lib.Step.js (e.g. looks good) [14:59:29] for my change [15:00:04] manybubbles, anomie, ^d, thcipriani, marktraceur, Krenair, gwicke: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150319T1500). 
[15:00:47] Yep looks fine, thanks aude [15:01:03] (03PS3) 10Alex Monk: Don't use bits for test.wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197381 (https://phabricator.wikimedia.org/T92949) (owner: 10Aude) [15:01:09] (03CR) 10Alex Monk: [C: 032] Don't use bits for test.wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197381 (https://phabricator.wikimedia.org/T92949) (owner: 10Aude) [15:01:14] (03Merged) 10jenkins-bot: Don't use bits for test.wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197381 (https://phabricator.wikimedia.org/T92949) (owner: 10Aude) [15:01:30] 6operations: Add Yana to contracts@ - https://phabricator.wikimedia.org/T91269#1131988 (10Aklapper) @Chip: Any progress here? [15:01:32] (03PS1) 10GWicke: Enable RESTBase for VisualEditor on all wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197907 [15:02:32] !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/197381/ (duration: 00m 06s) [15:02:33] aude, ^ [15:02:37] Logged the message, Master [15:02:38] checking [15:02:48] Krenair: https://gerrit.wikimedia.org/r/#/c/197907/ [15:02:59] updating the deployments page [15:03:01] looks ok [15:03:01] (03CR) 10Mobrovac: [C: 031] Enable RESTBase for VisualEditor on all wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197907 (owner: 10GWicke) [15:03:08] yep [15:03:13] <^d> Krenair: I already did _joe_'s stuff for codfw [15:03:19] ok [15:03:29] <^d> Oh it was removed, ok [15:03:33] (03PS2) 10Alex Monk: Use wikidata touch icon for test.wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197354 (https://phabricator.wikimedia.org/T92948) (owner: 10Aude) [15:03:38] (03CR) 10Alex Monk: [C: 032] Use wikidata touch icon for test.wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197354 (https://phabricator.wikimedia.org/T92948) (owner: 10Aude) [15:03:56] <^d> _joe_: speaking of, how can I test codfw from home? 
[15:04:31] <_joe_> ^d: you ssh into one of the mw servers in codfw, and curl -H to localhost [15:04:34] <_joe_> :P [15:04:42] <^d> no lvs yet? [15:04:49] 6operations, 10ops-eqiad, 5Patch-For-Review: Rack and set up ms-be1016-1018 - https://phabricator.wikimedia.org/T90922#1131995 (10fgiunchedi) giving uefi boot a try, pxelinux / syslinux uefi versions currently run into this: http://www.syslinux.org/archives/2015-February/023178.html after which it proceeds w... [15:05:02] <_joe_> ^d: lvs will be up once the enwiki main page will not respond with a 500 [15:05:12] <_joe_> and even after that, it will be an internal IP [15:05:15] 6operations: Backport python-virtualenv 1.11.4 from Trusty to Precise - https://phabricator.wikimedia.org/T92033#1132003 (10akosiaris) So, the only boxes that do have the package are: ``` {'hafnium.wikimedia.org': {'lsb_distrib_codename': 'precise'}} {'helium.eqiad.wmnet': {'lsb_distrib_codename': 'precise'}} {'... [15:05:22] <_joe_> no varnish there up to now [15:05:40] <_joe_> but you can create an ssh tunnel if you prefer [15:06:50] Why does jenkins queue up things to be merged like this? 
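(Editor's sketch of the "curl -H to localhost" suggestion above, assuming the header in question is a Host header naming the wiki to test; the log does not spell that out. The helper name, the hostname, and the path are illustrative. The request is only built here, not sent, since it is meant to run on the app server itself.)

```python
import urllib.request


def local_request(host, path="/", port=80):
    """Build a request to the local web server that poses as the given virtual host."""
    url = "http://localhost:{}{}".format(port, path)
    # urllib keeps an explicitly supplied Host header instead of deriving one from the URL.
    return urllib.request.Request(url, headers={"Host": host})


req = local_request("en.wikipedia.org", "/wiki/Main_Page")
print(req.full_url, req.get_header("Host"))

# To actually send it from the server under test:
#   urllib.request.urlopen(req, timeout=10)
```

An error response makes urlopen raise urllib.error.HTTPError, which is one quick way to notice the 500s mentioned in the discussion.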
[15:07:00] (03Merged) 10jenkins-bot: Use wikidata touch icon for test.wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197354 (https://phabricator.wikimedia.org/T92948) (owner: 10Aude) [15:07:13] !log uploaded python-virtualenv_1.11.4-1 on apt.wikimedia.org precise-wikimedia [15:07:17] Logged the message, Master [15:07:18] <^d> Krenair: For unrelated things I've yet to figure out [15:07:26] :( [15:07:28] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, and 2 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1132011 (10akosiaris) [15:07:29] 6operations: Backport python-virtualenv 1.11.4 from Trusty to Precise - https://phabricator.wikimedia.org/T92033#1132009 (10akosiaris) 5Open>3Resolved Done [15:07:39] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/197354/ (duration: 00m 08s) [15:07:41] aude, ^ [15:07:42] Logged the message, Master [15:07:47] * aude checks [15:08:03] looks good [15:08:43] RECOVERY - RAID on db1020 is OK: OK: optimal, 1 logical, 2 physical [15:09:00] thanks [15:09:08] (03PS2) 10Alex Monk: Enable RESTBase for VisualEditor on all wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197907 (owner: 10GWicke) [15:09:15] (03PS3) 10Alex Monk: Enable RESTBase for VisualEditor on all wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197907 (owner: 10GWicke) [15:09:24] (03CR) 10Alex Monk: [C: 032] Enable RESTBase for VisualEditor on all wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197907 (owner: 10GWicke) [15:09:29] (03Merged) 10jenkins-bot: Enable RESTBase for VisualEditor on all wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197907 (owner: 10GWicke) [15:10:25] gwicke, ready? 
[15:10:40] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/197907/ (duration: 00m 06s) [15:10:43] Logged the message, Master [15:11:06] works for me [15:11:09] Krenair: yes [15:12:24] yep, think everything is fine gwicke [15:12:30] any issues on the restbase side? [15:12:43] Krenair: the only thing I'm worried about is dirty diffs [15:12:51] checking the recent changes [15:12:56] ok [15:13:22] but looking good [15:13:57] VE request rate is around 1.5/s currently [15:14:37] yeah, I checked a few enwiki ve edits and they seem fine to me as well [15:14:44] happy to move on? [15:14:55] yup [15:15:03] jzerebecki, hey [15:15:10] Krenair: and thanks! [15:15:17] Krenair: pong [15:15:30] gwicke, you're welcome. don't forget about those group1 wikis please :) [15:16:01] Krenair: yes, maybe tonight [15:17:28] (03PS3) 10Alex Monk: Changing logo for huwikquote from std to url [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197883 (https://phabricator.wikimedia.org/T93176) (owner: 10Steinsplitter) [15:17:35] (03CR) 10Alex Monk: [C: 032] Changing logo for huwikquote from std to url [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197883 (https://phabricator.wikimedia.org/T93176) (owner: 10Steinsplitter) [15:17:40] (03Merged) 10jenkins-bot: Changing logo for huwikquote from std to url [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197883 (https://phabricator.wikimedia.org/T93176) (owner: 10Steinsplitter) [15:18:32] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/197883/ (duration: 00m 05s) [15:18:36] jzerebecki, ^ [15:18:37] Logged the message, Master [15:19:23] Krenair: works [15:19:41] jzerebecki, ok, just thinking about the ru hsts patch... [15:19:50] (03PS1) 10Eevans: overrid-able concurrent_compactors setting [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/197911 (https://phabricator.wikimedia.org/T93140) [15:20:09] Krenair: any questions? concerns? 
[15:20:56] jzerebecki, just wondering if it's in the public config anywhere already [15:21:04] or if this is something in the public puppet repo [15:21:55] Krenair: the config setting for setting the origin to https is in the config repo [15:22:02] although we do already have a patch approved to set the canonical server url just for russian projects [15:22:11] search for https://ru. [15:22:17] so I think it's fine to assume it's russian projects only [15:22:19] <_joe_> bbiab [15:23:03] Krenair: you could look at the private puppet repo [15:23:11] _joe_: RB is enabled for real now, you should re-try VE when you are back ;) [15:23:20] jzerebecki, I can't [15:23:38] Or at least I don't think I can? [15:24:09] Either way, if it is there I'm not going to risk going there and revealing whether it's there or not [15:25:16] yes you should not have permission for that, but what does it matter? if there are more wikis we don't know about we can still add them later. [15:25:35] okay, this patch and task has been open long enough for comment and it has +1 from people who know better than I. 
it's easy to simply un-hide the preference later [15:25:46] or apply it to more wikis [15:26:20] (03PS8) 10Alex Monk: Hide "prefershttps" preference on HSTS domains (ru): it has no effect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194856 (https://phabricator.wikimedia.org/T91352) (owner: 10Nemo bis) [15:26:28] (03CR) 10Alex Monk: [C: 032] Hide "prefershttps" preference on HSTS domains (ru): it has no effect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194856 (https://phabricator.wikimedia.org/T91352) (owner: 10Nemo bis) [15:26:35] (03Merged) 10jenkins-bot: Hide "prefershttps" preference on HSTS domains (ru): it has no effect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194856 (https://phabricator.wikimedia.org/T91352) (owner: 10Nemo bis) [15:27:10] !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/194856/ (duration: 00m 06s) [15:27:14] jzerebecki, ^ [15:27:18] Logged the message, Master [15:27:36] Krenair: works. pref is gone. thank you. [15:29:02] (03PS3) 10Alex Monk: Enable Citoid extension on all VisualEditor wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197074 (https://phabricator.wikimedia.org/T62768) (owner: 10Jforrester) [15:29:08] (03CR) 10Alex Monk: [C: 032] Enable Citoid extension on all VisualEditor wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197074 (https://phabricator.wikimedia.org/T62768) (owner: 10Jforrester) [15:29:18] (03Merged) 10jenkins-bot: Enable Citoid extension on all VisualEditor wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197074 (https://phabricator.wikimedia.org/T62768) (owner: 10Jforrester) [15:29:29] hmm... hang on a sec [15:29:53] I think that might need a default set. 
[15:29:56] will do that just in case [15:31:12] and now I wonder about the 'visualeditor' group :/ [15:32:21] (03CR) 10Mobrovac: [C: 04-1] "LGTM, minor templating issue" (031 comment) [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/197911 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [15:34:09] (03PS1) 10Eevans: increase compaction throughput and concurrency [puppet] - 10https://gerrit.wikimedia.org/r/197915 (https://phabricator.wikimedia.org/T93140) [15:34:10] ... yeah that's not going to work. [15:34:37] (03PS1) 10Alex Monk: Revert "Enable Citoid extension on all VisualEditor wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197917 [15:34:44] (03CR) 10Alex Monk: [C: 032] Revert "Enable Citoid extension on all VisualEditor wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197917 (owner: 10Alex Monk) [15:35:03] (03Merged) 10jenkins-bot: Revert "Enable Citoid extension on all VisualEditor wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197917 (owner: 10Alex Monk) [15:35:26] (didn't sync it any further than mw1017) [15:35:37] (03PS1) 10Nikerabbit: Add dedicated runner for MessageIndexRebuildJon [puppet] - 10https://gerrit.wikimedia.org/r/197919 (https://phabricator.wikimedia.org/T90704) [15:36:20] <_joe_> urandom: small gerrit trick, when you want to create two dependent changes, commit both changes on the same branch and do git review on it [15:36:27] <_joe_> they will be dependent on each other [15:36:52] _joe_: thanks! [15:36:59] _joe_: does that work for a submodule? 
[15:37:18] <_joe_> within the same gerrit project [15:37:22] (03PS1) 10Nikerabbit: $wgTranslateDelayedMessageIndexRebuild = true; [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197920 (https://phabricator.wikimedia.org/T90704) [15:37:33] <_joe_> so if you are modifying operations/puppet/cassandra [15:37:50] <_joe_> you need that to be independent [15:38:07] <_joe_> but then you can make the change to operations/puppet depend on the merge from the submodule [15:38:16] <_joe_> ugh, I hate submodules [15:38:36] _joe_: hrmm, is it too late to do that? [15:38:56] <_joe_> urandom: well I won't bother in this case [15:39:05] (03CR) 10Mobrovac: [C: 04-1] "Due to the submodule dep, the patchset should also include the ref update of the cassandra submodule once https://gerrit.wikimedia.org/r/#" [puppet] - 10https://gerrit.wikimedia.org/r/197915 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [15:39:19] <_joe_> or what marko just commented :) [15:39:20] (03CR) 10Alex Monk: "I realised while looking at this that:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197074 (https://phabricator.wikimedia.org/T62768) (owner: 10Jforrester) [15:41:10] 6operations: (U)EFI support - https://phabricator.wikimedia.org/T93208#1132116 (10fgiunchedi) 3NEW [15:41:28] 6operations, 10ops-eqiad, 5Patch-For-Review: Rack and set up ms-be1016-1018 - https://phabricator.wikimedia.org/T90922#1071198 (10fgiunchedi) grub-efi boots fine, however (obviously) doesn't support the path prefix passed from dhcp to pxelinux and thus tries to load files relative to the tftp root directory.... 
[15:46:00] (03PS2) 10Eevans: overrid-able concurrent_compactors setting [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/197911 (https://phabricator.wikimedia.org/T93140) [15:46:47] (03PS1) 10Alex Monk: Enable Citoid extension on all VisualEditor wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197925 (https://phabricator.wikimedia.org/T62768) [15:48:02] 6operations, 3codfw-appserver-setup, 7database, 3wikis-in-codfw: Grant access to the databases to codfw appserver networks - https://phabricator.wikimedia.org/T93211#1132152 (10Joe) 3NEW [15:48:37] (03PS2) 10Alex Monk: Enable Citoid extension on all VisualEditor wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197925 (https://phabricator.wikimedia.org/T62768) [15:49:32] (03CR) 10Eevans: overrid-able concurrent_compactors setting (031 comment) [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/197911 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [15:49:52] (03PS1) 10Glaisher: Make Spam Blacklist global file protocol-relative [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197927 [15:50:55] !log krenair Synchronized php-1.25wmf21/extensions/WikiEditor/WikiEditor.hooks.php: https://gerrit.wikimedia.org/r/#/c/197905/ (duration: 00m 07s) [15:50:58] Logged the message, Master [15:51:31] Krenair: I think we do have a visualeditor dblist, no? [15:51:39] I don't think so? 
[15:51:51] we have a visualeditor-default [15:52:17] ah, confused with that one [15:53:14] !log krenair Synchronized php-1.25wmf22/extensions/WikiEditor/WikiEditor.hooks.php: https://gerrit.wikimedia.org/r/#/c/197904/ (duration: 00m 05s) [15:53:17] Logged the message, Master [15:55:26] (03CR) 10Alex Monk: [C: 032] Enable Citoid extension on all VisualEditor wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197925 (https://phabricator.wikimedia.org/T62768) (owner: 10Alex Monk) [15:55:28] (03PS2) 10Nemo bis: Add dedicated runner for MessageIndexRebuildJob [puppet] - 10https://gerrit.wikimedia.org/r/197919 (https://phabricator.wikimedia.org/T90704) (owner: 10Nikerabbit) [15:55:31] (03Merged) 10jenkins-bot: Enable Citoid extension on all VisualEditor wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197925 (https://phabricator.wikimedia.org/T62768) (owner: 10Alex Monk) [15:56:40] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/197925/ (duration: 00m 07s) [15:56:46] Logged the message, Master [15:58:40] (03CR) 10Mobrovac: [C: 031] overrid-able concurrent_compactors setting [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/197911 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [15:58:58] swat is over [16:00:03] (03CR) 10Tim Landscheidt: "This caused T93212." [puppet] - 10https://gerrit.wikimedia.org/r/197337 (owner: 10Faidon Liambotis) [16:00:04] kart_, ^d: Respected human, time to deploy Content Translation/cxserver (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150319T1600). Please do the needful. 
[16:01:16] Aye Sir [16:05:50] set sail [16:13:43] !log kartik Started scap: Update ContentTranslation [16:13:47] 6operations, 10ops-eqiad: db1020 raid degraded - https://phabricator.wikimedia.org/T93166#1132236 (10Cmjohnson) 5Open>3Resolved Disk replaced and spun up Enclosure Device ID: 32 Slot Number: 10 Drive's position: DiskGroup: 0, Span: 5, Arm: 0 Enclosure position: N/A Device Id: 10 WWN: 5000C5003240ED44 Seq... [16:13:47] Logged the message, Master [16:14:12] Captain Nikerabbit, scap has started. Mutiny!! [16:14:27] let there be good wind [16:17:10] 6operations, 10ops-eqiad: cp1047 down - https://phabricator.wikimedia.org/T88045#1132245 (10Cmjohnson) Received the new DIMM and replaced the bad one that I had inserted in slot B1. Rebooted and no errors show. Sending the other bad DIMM back. Tracking Numbers are Fedex 9611918 2393026 47648516 USPS 9202... [16:17:14] (03PS1) 10Tim Landscheidt: sslcert: Don't delete manually managed private keys [puppet] - 10https://gerrit.wikimedia.org/r/197932 (https://phabricator.wikimedia.org/T93212) [16:17:22] 6operations, 10ops-eqiad: cp1047 down - https://phabricator.wikimedia.org/T88045#1132258 (10Cmjohnson) 5Open>3Resolved [16:17:22] RECOVERY - Host cp1047 is UP: PING OK - Packet loss = 0%, RTA = 2.10 ms [16:20:04] 6operations, 10ops-codfw: rack and connect labstore capacity expansion in codfw - https://phabricator.wikimedia.org/T93215#1132269 (10RobH) 3NEW a:3coren [16:20:24] 6operations, 10ops-codfw: rack and connect labstore capacity expansion in codfw - https://phabricator.wikimedia.org/T93215#1132285 (10RobH) This is the shelf ordered on https://rt.wikimedia.org/Ticket/Display.html?id=9122 [16:21:12] (03CR) 10BBlack: [C: 04-1] "a file with just a lone "ensure => present" will create the file via puppet if missing (with default perms and zero bytes of content)." 
[puppet] - 10https://gerrit.wikimedia.org/r/197932 (https://phabricator.wikimedia.org/T93212) (owner: 10Tim Landscheidt) [16:21:53] (03CR) 10BBlack: "(probably we should remove the else clause altogether for now)" [puppet] - 10https://gerrit.wikimedia.org/r/197932 (https://phabricator.wikimedia.org/T93212) (owner: 10Tim Landscheidt) [16:24:09] 6operations, 10ops-codfw: rack/wire/initial setup of db2043-db2070 - https://phabricator.wikimedia.org/T89368#1132300 (10RobH) @Mark: Can you advise on the db racking (all info is in above ticket , quick link to comment https://phabricator.wikimedia.org/T89368#1034823 ) The unracked databases are just sittin... [16:24:14] 6operations: Add Yana to contracts@ - https://phabricator.wikimedia.org/T91269#1132301 (10Chip) Sorry all, missed these, didn't have notifications set up properly and didn't have a proper Phabricator account. Yana has been added to the contracts@ group. In the future, would you please send OIT requests to tech... [16:25:20] 6operations: Add Yana to contracts@ - https://phabricator.wikimedia.org/T91269#1132302 (10Slaporte) Got it. Thanks! [16:26:29] (03PS2) 10BBlack: sslcert: Don't delete manually managed private keys [puppet] - 10https://gerrit.wikimedia.org/r/197932 (https://phabricator.wikimedia.org/T93212) (owner: 10Tim Landscheidt) [16:26:35] 6operations, 10Analytics, 10Analytics-EventLogging, 6Analytics-Kanban, 5Patch-For-Review: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1132304 (10Nuria) Crisis averted, logs are growing at 300kbs sec, not *ahem* 2MB per sec. Resolving ticket.
[16:26:49] 6operations, 10Analytics, 10Analytics-EventLogging, 6Analytics-Kanban, 5Patch-For-Review: Disk space full on vanadium from logs in /var/log/upstart - https://phabricator.wikimedia.org/T93185#1132307 (10Nuria) 5Open>3Resolved [16:29:06] (03CR) 10BBlack: [C: 032] sslcert: Don't delete manually managed private keys [puppet] - 10https://gerrit.wikimedia.org/r/197932 (https://phabricator.wikimedia.org/T93212) (owner: 10Tim Landscheidt) [16:30:53] (03PS1) 10BryanDavis: Update wgLocalisationUpdateDirectory to match l10nupdate-1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197937 (https://phabricator.wikimedia.org/T92823) [16:31:31] Nikerabbit: ^ we need that change now for l10nupdate [16:32:41] bd808: what do you need, +2? [16:32:47] 6operations, 10ops-codfw: rack and connect labstore capacity expansion in codfw - https://phabricator.wikimedia.org/T93215#1132329 (10coren) @papaul: This needs to be configured and wired the same way eqiad currently is wired. Please ask @cmjohnson for a recent diagram, I believe he has updated it when labsto... [16:32:58] I need someone to deploy. I'm too busy today [16:33:07] we could just put it up for afternoon swat [16:33:27] bd808: if we have time in CX window I could do it, otherwise let's put it into swat [16:33:49] (03PS2) 10BryanDavis: Update wgLocalisationUpdateDirectory to match l10nupdate-1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197937 (https://phabricator.wikimedia.org/T92823) [16:34:27] Nikerabbit: sweet. Can you add it to the swat window if you don't have time to push it? I'm going to be out most of the day today. 
[16:35:02] PROBLEM - HHVM rendering on mw1034 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:05] bd808: sure [16:35:24] PROBLEM - Apache HTTP on mw1034 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:38:46] 6operations, 10ops-codfw: rack and connect labstore capacity expansion in codfw - https://phabricator.wikimedia.org/T93215#1132350 (10coren) (Sorry, I should have probably mentioned that this shelf is intended for labstore200[12] alongside the other four) [16:39:24] 6operations, 10hardware-requests: deploy eventlog2001 - https://phabricator.wikimedia.org/T90907#1132351 (10RobH) [16:42:43] PROBLEM - HHVM busy threads on mw1034 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [86.4] [16:43:22] !log deployed parsoid sha f5f5f0ede [16:43:34] Logged the message, Master [16:45:02] 6operations, 10ops-codfw: rack and connect labstore capacity expansion in codfw - https://phabricator.wikimedia.org/T93215#1132361 (10RobH) So this will rack in b1-codfw. Lets get the layout of the eqiad labstore on this task so we can compare, and then see if we have to move anything around in b1-codfw, or i... [16:46:07] 6operations, 10ops-codfw: rack and connect labstore capacity expansion in codfw - https://phabricator.wikimedia.org/T93215#1132362 (10RobH) a:5coren>3Papaul @Papaul: Please work with @cmjohnson on getting the layout of eqiad replicated in codfw for this (as he advises.) I'm asking Chris to help on this,... 
[16:46:12] (03PS1) 10Glaisher: Add 'Kurs' (106) to $wgContentNamespaces at dewikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197938 (https://phabricator.wikimedia.org/T93071) [16:48:21] 6operations, 10ops-codfw: rack and connect labstore-array4-codfw in codfw - https://phabricator.wikimedia.org/T93215#1132370 (10RobH) [16:49:52] (03CR) 10Manybubbles: [C: 031] Add 'Kurs' (106) to $wgContentNamespaces at dewikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197938 (https://phabricator.wikimedia.org/T93071) (owner: 10Glaisher) [16:51:42] YuviPanda: So what exactly did you do with Parsoid in beta regarding automatic updates? [16:52:00] YuviPanda: Because the Parsoid team deployed a patch to production and 3 minutes later someone tells me Parsoid in beta is broken [16:52:02] PROBLEM - HHVM queue size on mw1034 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [80.0] [16:52:26] !log kartik Finished scap: Update ContentTranslation (duration: 38m 43s) [16:52:29] Logged the message, Master [16:52:38] Nikerabbit: we're done with scap [16:52:48] kart_: ok [16:53:50] RoanKattouw: I've done absolutely nothing so far. That patch is still unmerged. [16:53:55] OK [16:53:58] Well, it broke :( [16:54:14] But if you didn't do anything I guess I shouldn't be talking to you [16:54:26] I haven't understood the current state of affairs well enough to replace them and hashar has been busy [16:54:28] Heh yeah [16:54:37] Plus I'm on mobile outside as well. [16:57:24] (03CR) 10Aaron Schulz: "The referenced commit didn't seem to have changed the dir, was it an older one?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/197937 (https://phabricator.wikimedia.org/T92823) (owner: 10BryanDavis) [16:58:20] aspiecat: https://gerrit.wikimedia.org/r/#/c/196137/2/modules/scap/files/l10nupdate-1,unified -- the change is in line 83 [16:58:44] nevermind, I was looking at the final dir [16:59:14] (03CR) 10Nikerabbit: [C: 032] Update wgLocalisationUpdateDirectory to match l10nupdate-1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197937 (https://phabricator.wikimedia.org/T92823) (owner: 10BryanDavis) [16:59:16] (03CR) 10BryanDavis: "See -- the change is in line 83 where the new caches" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197937 (https://phabricator.wikimedia.org/T92823) (owner: 10BryanDavis) [16:59:19] (03Merged) 10jenkins-bot: Update wgLocalisationUpdateDirectory to match l10nupdate-1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197937 (https://phabricator.wikimedia.org/T92823) (owner: 10BryanDavis) [16:59:55] 6operations, 10hardware-requests: deploy eventlog2001 - https://phabricator.wikimedia.org/T90907#1132402 (10RobH) [17:00:15] bd808, aspiecat deploying now, legoktm sorry for running over your window [17:00:36] 6operations, 10hardware-requests: deploy eventlog2001 - https://phabricator.wikimedia.org/T90907#1070870 (10RobH) [17:00:40] Nikerabbit: my window starts in another hour [17:00:47] legoktm: are you sure? 
did I misread [17:00:57] legoktm: yes I did, np ten [17:00:59] bd808: it's odd that a temp dir has config instead of just being a script param [17:01:01] :) [17:01:21] (03PS1) 10Chad: Automate creation of -mc* and -elastic* instances [puppet] - 10https://gerrit.wikimedia.org/r/197941 [17:01:52] (03PS2) 10Chad: Automate creation of -mc* and -elastic* instances in staging [puppet] - 10https://gerrit.wikimedia.org/r/197941 [17:01:53] !log nikerabbit Synchronized wmf-config/CommonSettings.php: Update wgLocalisationUpdateDirectory to match l10nupdate-1 (duration: 00m 05s) [17:01:57] <^d> YuviPanda: ^^ [17:01:57] Logged the message, Master [17:02:21] wow sync-file is fast! [17:02:43] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [17:02:52] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [17:03:55] kart_: yarrr!!!! [17:04:33] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: Puppet last ran 7 hours ago [17:04:49] 6operations: deploy eventlog2001 services - https://phabricator.wikimedia.org/T93220#1132416 (10RobH) 3NEW a:3ori [17:05:01] 6operations, 10hardware-requests: deploy eventlog2001 - https://phabricator.wikimedia.org/T90907#1070870 (10RobH) [17:05:02] 6operations: deploy eventlog2001 services - https://phabricator.wikimedia.org/T93220#1132426 (10RobH) [17:05:43] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [17:06:15] gwicke: what is load rate on that graph, exactly? [17:06:31] sorry, wrong channel [17:06:42] 6operations, 10hardware-requests: deploy eventlog2001 - https://phabricator.wikimedia.org/T90907#1132431 (10RobH) 5Open>3Resolved System installed and keys are signed. I created task T93220 for the actual service implementation. 
[17:06:43] 6operations, 10hardware-requests: codfw/eqiad: (1) eventlogging node (per site) - eqiad done, codfw in progress - https://phabricator.wikimedia.org/T90747#1132436 (10RobH) [17:06:44] 6operations: deploy eventlog2001 services - https://phabricator.wikimedia.org/T93220#1132416 (10RobH) [17:06:58] 6operations, 10hardware-requests: codfw/eqiad: (1) eventlogging node (per site) - https://phabricator.wikimedia.org/T90747#1132437 (10RobH) 5stalled>3Resolved [17:07:22] 6operations, 10hardware-requests: codfw/eqiad: (1) eventlogging node (per site) - https://phabricator.wikimedia.org/T90747#1066479 (10RobH) eventlog1001 and eventlog2001 are installed and ready for service (or in the former, already in service) resolving this hardware request. [17:09:32] (03CR) 10Filippo Giunchedi: [C: 031] overrid-able concurrent_compactors setting [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/197911 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [17:10:46] (03CR) 10Filippo Giunchedi: "LGTM modulo what mobrovac said" [puppet] - 10https://gerrit.wikimedia.org/r/197915 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [17:11:51] 6operations, 10RESTBase, 10RESTBase-Cassandra, 5Patch-For-Review: Cassandra compaction is getting behind - https://phabricator.wikimedia.org/T93140#1132446 (10fgiunchedi) thanks for looking into this! what are reasonable thresholds we should alert on? 
[17:14:24] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [17:14:24] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0] [17:16:24] (03PS3) 10Yuvipanda: Automate creation of -mc* and -elastic* instances in staging [puppet] - 10https://gerrit.wikimedia.org/r/197941 (owner: 10Chad) [17:16:53] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: / 4266 MB (3% inode=94%): [17:16:54] (03CR) 10Yuvipanda: [C: 032 V: 032] Automate creation of -mc* and -elastic* instances in staging [puppet] - 10https://gerrit.wikimedia.org/r/197941 (owner: 10Chad) [17:19:55] 6operations, 10ops-codfw, 3wikis-in-codfw: PXE doesn't work on mc2017-18 - https://phabricator.wikimedia.org/T90586#1132483 (10Joe) @Robh I guess these two need their VLANs configured as well? [17:20:04] (03CR) 10Filippo Giunchedi: [C: 04-1] Adjust RESTBase / Cassandra settings for deployment-prep (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/197662 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [17:20:09] 6operations, 10ops-codfw, 3wikis-in-codfw: PXE doesn't work on mc2017-18 - https://phabricator.wikimedia.org/T90586#1132484 (10Joe) a:5Papaul>3RobH [17:21:12] (03PS1) 10Catrope: Actually disable RESTbase in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197949 [17:21:18] James_F: ---^^ [17:21:28] 6operations, 10ops-codfw, 3wikis-in-codfw: PXE doesn't work on mc2017-18 - https://phabricator.wikimedia.org/T90586#1062590 (10Joe) p:5High>3Normal [17:22:21] nuria: michi_cc vanadium is about to run out of space again ^ [17:22:24] 6operations, 10hardware-requests: Upgrade eqiad LVS to 10G - https://phabricator.wikimedia.org/T89120#1132488 (10RobH) a:3RobH [17:22:50] YuviPanda: we did a fix that will mitigate this, let me move other logs [17:22:56] nuria: cool [17:23:14] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, thanks matanya! 
to be merged next week" [puppet] - 10https://gerrit.wikimedia.org/r/195607 (owner: 10Matanya) [17:23:21] 6operations, 10hardware-requests: Upgrade eqiad LVS to 10G - https://phabricator.wikimedia.org/T89120#1027766 (10RobH) Each LVS system has 4 connections (one per row), so I've created https://rt.wikimedia.org/Ticket/Display.html?id=9271 for the quote request to add this to lvs1001-1006. [17:24:22] 6operations, 10hardware-requests: Upgrade eqiad LVS to 10G - https://phabricator.wikimedia.org/T89120#1132494 (10RobH) I'll note the warranty on lvs1001-1006 expired on 2014-07-27. However, LVS systems kind of sit and just work, and haven't had hardware issues. As such, my initial request is simply to add th... [17:26:02] (03CR) 10Jforrester: [C: 032] Actually disable RESTbase in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197949 (owner: 10Catrope) [17:26:07] (03Merged) 10jenkins-bot: Actually disable RESTbase in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197949 (owner: 10Catrope) [17:27:11] !log catrope Synchronized wmf-config/InitialiseSettings-labs.php: Actually disable RESTbase in labs (duration: 00m 06s) [17:27:15] Logged the message, Master [17:30:03] RECOVERY - Disk space on vanadium is OK: DISK OK [17:30:35] !log manually attached all of User:FuzzyBot's accounts [17:30:39] Logged the message, Master [17:31:36] !log manually attached all of User:MediaWiki message delivery's accounts [17:31:42] Logged the message, Master [17:31:49] YuviPanda: moved things around [17:31:53] 6operations, 10ops-codfw, 3wikis-in-codfw: PXE doesn't work on mc2017-18 - https://phabricator.wikimedia.org/T90586#1132500 (10RobH) a:5RobH>3Joe the ports were set to disabled and the vlan wasn't set. both are fixed now.
[17:32:24] https://www.irccloud.com/pastebin/jknzYs5u [17:32:31] cc YuviPanda [17:33:32] legoktm: ah great FuzzyBot :) [17:33:48] !log ran fix-stats.php on all wikis with ContentTranslation [17:33:51] Logged the message, Master [17:34:06] :) [17:34:16] I'm still not sure what to do about the Translation notifications bot... [17:35:47] (03CR) 10Ori.livneh: graphite: add error alerts (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/197352 (https://phabricator.wikimedia.org/T92965) (owner: 10Filippo Giunchedi) [17:35:52] legoktm, those are all system users right? [17:35:54] ^ godog [17:35:58] don't we have a list of those somewhere? [17:36:13] Krenair: yup, and kinda not really. [17:37:24] ori: sweet, thanks I'll take a look [17:37:53] Well, there must be the password in private settings if the accounts are used by MediaWiki *and* use the web API [17:38:43] (03CR) 10Ori.livneh: [C: 032] overrid-able concurrent_compactors setting [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/197911 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [17:39:11] (03CR) 10Ori.livneh: [V: 032] overrid-able concurrent_compactors setting [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/197911 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [17:39:48] nuria: https://graphite.wikimedia.org/render/?width=586&height=308&_salt=1426786782.217&target=servers.vanadium.diskspace.root.byte_percentfree.value&from=-64hours [17:40:00] right...if I log in manually it should theoretically merge [17:40:20] nuria: so it might not last long, unless something has changed... [17:40:29] 6operations, 10ops-codfw, 3wikis-in-codfw: PXE doesn't work on mc2017-18 - https://phabricator.wikimedia.org/T90586#1132525 (10Joe) now on mc2017: ``` PXE-E61: Media test failure, check cable PXE-M0F: Exiting Broadcom PXE ROM. ``` seems like the cable is disconnected/connected to the wrong port? 
[17:40:37] (03PS7) 10Ori.livneh: dsh: delete most groups [puppet] - 10https://gerrit.wikimedia.org/r/195840 (https://phabricator.wikimedia.org/T92259) (owner: 10Dzahn) [17:40:46] (03CR) 10Ori.livneh: [C: 032 V: 032] dsh: delete most groups [puppet] - 10https://gerrit.wikimedia.org/r/195840 (https://phabricator.wikimedia.org/T92259) (owner: 10Dzahn) [17:40:47] YuviPanda: we did change logging [17:40:53] nuria: ah, cool :) [17:40:59] YuviPanda: from 2MB/sec to 300k/sec [17:41:05] nuria: aaah :) cool, cool [17:41:10] nuria: do keep an eye out, however. [17:41:15] YuviPanda: so that should improve matters [17:41:22] nuria: it should also probably go on /srv [17:41:31] which has a lot more space [17:41:44] https://meta.wikimedia.org/wiki/Special:CentralAuth/Translation_Notification_Bot actually looks like everything is already attached? [17:41:54] so what's the issue? [17:42:08] none :) [17:42:13] YuviPanda: that requires puppet changes but we can do that too of course [17:42:39] nuria: :) I might not be available much, but do keep an eye out :) [17:42:48] YuviPanda: super-many-thanks [17:42:54] nuria: yw!
:) [17:44:27] (03PS1) 10Ori.livneh: Configure statsd for citoid [puppet] - 10https://gerrit.wikimedia.org/r/197961 [17:45:16] (03CR) 10Jforrester: [C: 031] Configure statsd for citoid [puppet] - 10https://gerrit.wikimedia.org/r/197961 (owner: 10Ori.livneh) [17:46:23] (03CR) 10Eevans: "The dependent change, https://gerrit.wikimedia.org/r/#/c/197911/2, has been merged" [puppet] - 10https://gerrit.wikimedia.org/r/197915 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [17:48:10] (03Abandoned) 10Ori.livneh: Configure statsd for citoid [puppet] - 10https://gerrit.wikimedia.org/r/197961 (owner: 10Ori.livneh) [17:50:53] !log manually attached accounts for User:MediaWiki default, required clearing password+email on dewiki and cswiki [17:51:00] Logged the message, Master [17:52:09] 6operations, 10Analytics, 6Scrum-of-Scrums, 10Wikipedia-App-Android-App, and 3 others: Avoid cache fragmenting URLs for Share a Fact shares - https://phabricator.wikimedia.org/T90606#1132599 (10dr0ptp4kt) [17:53:33] 6operations, 10Analytics, 6Scrum-of-Scrums, 10Wikipedia-App-Android-App, and 3 others: Avoid cache fragmenting URLs for Share a Fact shares - https://phabricator.wikimedia.org/T90606#1063059 (10dr0ptp4kt) [17:56:50] 6operations, 10Analytics, 6Scrum-of-Scrums, 10Wikipedia-App-Android-App, and 3 others: Avoid cache fragmenting URLs for Share a Fact shares - https://phabricator.wikimedia.org/T90606#1132651 (10dr0ptp4kt) a:3dr0ptp4kt [17:56:55] legoktm: what about the Babel account? [17:57:21] Glaisher: what's the username for that one? [17:57:43] https://meta.wikimedia.org/wiki/Special:CentralAuth/Babel_AutoCreate [17:58:59] * legoktm gumbles at bable for re-inventing the wheel [17:59:03] babel even [17:59:13] 6operations, 10ops-eqiad: Increase asw-d-eqiad uplink capacity - https://phabricator.wikimedia.org/T92914#1132681 (10Cmjohnson) All the fibers are run to each router from D7/8. 
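A rough back-of-the-envelope sketch of why the vanadium logging change discussed earlier matters. It takes the 4266 MB of free space reported in the 17:16:53 disk alert and the two log growth rates mentioned in the channel (about 2 MB per second before the fix, about 300 KB per second after). The function name, the 1 MB = 1000 KB conversion, and the assumption that nothing else writes to or frees space on the disk are illustrative choices, not something stated in the log.

```python
def seconds_until_full(free_mb, write_rate_mb_per_s):
    """Estimate how long free_mb of disk lasts at a constant write rate.

    Assumes a single writer at a steady rate and no log rotation or cleanup.
    """
    if write_rate_mb_per_s <= 0:
        raise ValueError("write rate must be positive")
    return free_mb / write_rate_mb_per_s


FREE_MB = 4266          # free space on / from the 17:16:53 alert
OLD_RATE_MB_S = 2.0     # approximate rate before the logging change
NEW_RATE_MB_S = 0.3     # approximate rate after (300 KB/s, taking 1 MB = 1000 KB)

if __name__ == "__main__":
    for label, rate in (("old", OLD_RATE_MB_S), ("new", NEW_RATE_MB_S)):
        minutes = seconds_until_full(FREE_MB, rate) / 60
        print(f"{label} rate: about {minutes:.0f} minutes until full")
```

Under these assumptions the old rate would fill the remaining space in a little over half an hour and the new rate in roughly four hours, which fits the advice in the channel to keep an eye on it and to consider moving the logs to /srv.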
[18:00:04] legoktm: Respected human, time to deploy CentralAuth / SUL Finalization (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150319T1800). Please do the needful. [18:00:12] (03PS2) 10Eevans: increase compaction throughput and concurrency [puppet] - 10https://gerrit.wikimedia.org/r/197915 (https://phabricator.wikimedia.org/T93140) [18:02:32] (03PS2) 10Filippo Giunchedi: graphite: add error alerts [puppet] - 10https://gerrit.wikimedia.org/r/197352 (https://phabricator.wikimedia.org/T92965) [18:03:08] (03CR) 10Ori.livneh: [C: 031] "very nice" [puppet] - 10https://gerrit.wikimedia.org/r/197352 (https://phabricator.wikimedia.org/T92965) (owner: 10Filippo Giunchedi) [18:03:53] (03CR) 10Mobrovac: [C: 031] increase compaction throughput and concurrency [puppet] - 10https://gerrit.wikimedia.org/r/197915 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [18:03:58] (03Abandoned) 10Ori.livneh: apache mod: correct relationship declarations [puppet] - 10https://gerrit.wikimedia.org/r/195898 (owner: 10Matanya) [18:06:43] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [18:06:43] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [18:08:18] 7Puppet, 6operations: puppet lint check for resource names - https://phabricator.wikimedia.org/T93231#1132731 (10fgiunchedi) 3NEW [18:08:49] (03CR) 10Filippo Giunchedi: graphite: add error alerts (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/197352 (https://phabricator.wikimedia.org/T92965) (owner: 10Filippo Giunchedi) [18:09:16] (03PS3) 10Filippo Giunchedi: graphite: add error alerts [puppet] - 10https://gerrit.wikimedia.org/r/197352 (https://phabricator.wikimedia.org/T92965) [18:09:23] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] graphite: add error alerts [puppet] - 10https://gerrit.wikimedia.org/r/197352 (https://phabricator.wikimedia.org/T92965) (owner: 
10Filippo Giunchedi) [18:15:56] !log legoktm Started scap: Update CentralAuth to master [18:16:01] Logged the message, Master [18:16:35] 6operations: Purge > 90 days stat1002:/a/squid/archive/glam_nara - https://phabricator.wikimedia.org/T92340#1132783 (10leila) @kevinator, where do we keep track of the definition of each entry in the gzip files? Knowing what's exactly in there can help us figure out if/how we can aggregate data or remove PIIs. [18:18:10] 6operations: Purge > 90 days stat1002:/a/squid/archive/glam_nara - https://phabricator.wikimedia.org/T92340#1132795 (10Ottomata) Keep track of? har har har. https://github.com/wikimedia/operations-puppet/blob/production/templates/udp2log/filters.erbium.erb#L24 [18:18:13] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:18:13] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:19:35] 7Puppet, 6operations: puppet lint check for resource names - https://phabricator.wikimedia.org/T93231#1132805 (10fgiunchedi) p:5Triage>3Low [18:21:15] (03PS3) 10Alexandros Kosiaris: Package builder module [puppet] - 10https://gerrit.wikimedia.org/r/194471 [18:24:49] (03CR) 10RobH: [C: 04-2] "As this is a sudo level access request, my -2 is only to reflect that this has to be reviewed in the weekly operations meeting. The next " [puppet] - 10https://gerrit.wikimedia.org/r/197798 (https://phabricator.wikimedia.org/T93151) (owner: 10Dzahn) [18:26:37] (03PS16) 10Ori.livneh: Gzip SVGs on back upload varnishes. [puppet] - 10https://gerrit.wikimedia.org/r/108484 (https://bugzilla.wikimedia.org/54291) [18:27:01] 10Ops-Access-Requests, 6operations, 6Phabricator, 6Release-Engineering, 5Patch-For-Review: Mukunda needs sudo on iridium (phab host) - https://phabricator.wikimedia.org/T93151#1132820 (10RobH) a:3mmodell @mmodell, Please read and sign https://phabricator.wikimedia.org/L3. This is now required for any... 
[18:28:05] (03PS7) 10Ori.livneh: Gzip .svg and .ico files on bits.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/113687 (https://bugzilla.wikimedia.org/61442) (owner: 10Brion VIBBER) [18:28:07] (03PS3) 10Faidon Liambotis: increase compaction throughput and concurrency [puppet] - 10https://gerrit.wikimedia.org/r/197915 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [18:28:21] (03CR) 10Faidon Liambotis: [C: 032] increase compaction throughput and concurrency [puppet] - 10https://gerrit.wikimedia.org/r/197915 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [18:28:59] (03CR) 10Faidon Liambotis: [V: 032] increase compaction throughput and concurrency [puppet] - 10https://gerrit.wikimedia.org/r/197915 (https://phabricator.wikimedia.org/T93140) (owner: 10Eevans) [18:30:04] (03PS8) 10Ori.livneh: Gzip .svg and .ico files on bits.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/113687 (https://bugzilla.wikimedia.org/61442) (owner: 10Brion VIBBER) [18:32:23] !log restarting cassandra on restbase1006 [18:32:29] Logged the message, Master [18:36:46] akosiaris: citoid deploy ... again [18:36:46] :( [18:37:00] akosiaris: tried your git deploy-update-info on tin [18:37:03] didn't work [18:37:03] :( [18:37:19] update-server-info [18:37:59] 6operations, 6MediaWiki-Core-Team, 10hardware-requests, 5Patch-For-Review: Fluorine needs bigger disks - https://phabricator.wikimedia.org/T92417#1132883 (10Andrew) No need to do anything about this yet, I still need to gather info. [18:38:19] 6operations, 10Wikimedia-Shop, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1132884 (10vshchepakina) I don't plan on moving back to shop.wikimedia.org. Let's just go with it. 
[18:38:24] (03Abandoned) 10BBlack: OCSP support for install_certificate [puppet] - 10https://gerrit.wikimedia.org/r/197821 (owner: 10BBlack) [18:46:33] !log legoktm Finished scap: Update CentralAuth to master (duration: 30m 36s) [18:46:38] Logged the message, Master [18:56:09] 6operations, 10ops-codfw, 3wikis-in-codfw: PXE doesn't work on mc2017-18 - https://phabricator.wikimedia.org/T90586#1132968 (10Papaul) i switched port and cable same error message. [18:56:19] !log legoktm Synchronized php-1.25wmf22/extensions/Renameuser/: Move logging inside of RenameuserSQL (duration: 00m 07s) [18:56:24] Logged the message, Master [18:56:49] !log legoktm Synchronized php-1.25wmf21/extensions/Renameuser/: Move logging inside of RenameuserSQL (duration: 00m 08s) [18:56:52] Logged the message, Master [18:57:14] PROBLEM - Host cp1047 is DOWN: PING CRITICAL - Packet loss = 100% [18:58:23] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100% [18:58:56] hmm [18:59:02] (03CR) 10Ottomata: [C: 031] "I'm fine with this, but I'm not really sure who cares about these." [puppet] - 10https://gerrit.wikimedia.org/r/195917 (owner: 10ArielGlenn) [18:59:13] RECOVERY - Host mw2027 is UP: PING OK - Packet loss = 0%, RTA = 43.24 ms [19:01:54] ACKNOWLEDGEMENT - Host cp1047 is DOWN: PING CRITICAL - Packet loss = 100% Brandon Black reinstalling [19:02:23] RECOVERY - Host cp1047 is UP: PING OK - Packet loss = 0%, RTA = 0.87 ms [19:07:53] !log restarting cassandra on restbase1005 [19:07:58] Logged the message, Master [19:09:22] 6operations: Put archiva.wikimedia.org behind misc-web-lb and force https - https://phabricator.wikimedia.org/T88139#1133057 (10RobH) Yes, but the main question asked with every certificate purchase is why it cannot exist behind misc-web. While I see you reference this conversation occurred, can you summarize t... 
[19:10:23] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [19:10:32] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [19:13:38] mobrovac: akosiaris@tin:/srv/deployment/citoid/deploy/src$ git update-server-info [19:13:42] and it seems to have worked [19:13:59] 553 Undefined variable: wmgUseCitoid in /srv/mediawiki/wmf-config/CommonSett [19:13:59] ings.php on line 2096 [19:14:04] RoanKattouw, James_F ^ [19:14:19] Hmm. Krenair? [19:14:30] legoktm: When was this? [19:14:48] dunno, it just popped up on the top of fatalmonitor [19:14:58] Fun. [19:15:03] So that means it probably happened recently [19:15:06] 533 is high too [19:17:19] sigh [19:18:02] legoktm, James_F: there is no wmgUseCitoid [19:18:37] akosiaris: solved it, thnx [19:18:43] earlier [19:18:53] lol [19:18:59] akosiaris: strangely, had to do that 3 times before it worked [19:19:05] legoktm, it is not referenced in that file [19:19:20] dunno [19:19:24] I haven't looked [19:20:47] mobrovac: weird... [19:21:02] mucho weird [19:21:44] krenair@fluorine:/a/mw-log$ grep UseCitoid hhvm.log [19:21:44] Mar 19 15:33:34 mw1017: #012Notice: Undefined variable: wmgUseCitoid in /srv/mediawiki/wmf-config/CommonSettings.php on line 2096 [19:21:44] Mar 19 15:56:38 mw1229: #012Notice: Undefined variable: wmgUseCitoid in /srv/mediawiki/wmf-config/CommonSettings.php on line 2096 [19:21:44] Mar 19 15:35:37 mw1017: message repeated 552 times: [ #012Notice: Undefined variable: wmgUseCitoid in /srv/mediawiki/wmf-config/CommonSettings.php on line 2096] [19:22:14] RoanKattouw, legoktm, James_F I think that was probably me testing James_F's broken commit on mw1017 earlier [19:22:15] Krenair: maybe it didn't sync to 1017 properly? 
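[editor's note] The grep output above mixes plain notice lines with a syslog "message repeated N times" summary line, so a simple line count under-reports how often the notice fired on each host. A minimal Python sketch of a per-host tally follows. The sample lines are copied from the paste above; the function name count_by_host and both regexes are illustrative assumptions, not part of any Wikimedia tooling.

```python
import re
from collections import Counter

# Sample lines copied from the hhvm.log grep pasted in the channel.
SAMPLE = """\
Mar 19 15:33:34 mw1017: #012Notice: Undefined variable: wmgUseCitoid in /srv/mediawiki/wmf-config/CommonSettings.php on line 2096
Mar 19 15:56:38 mw1229: #012Notice: Undefined variable: wmgUseCitoid in /srv/mediawiki/wmf-config/CommonSettings.php on line 2096
Mar 19 15:35:37 mw1017: message repeated 552 times: [ #012Notice: Undefined variable: wmgUseCitoid in /srv/mediawiki/wmf-config/CommonSettings.php on line 2096]
"""

# Assumed layout: "<Mon> <day> <hh:mm:ss> <host>: <rest of message>"
LINE_RE = re.compile(r"^\w{3} +\d+ [\d:]+ (?P<host>[\w.-]+): (?P<rest>.*)$")
# Summary lines that stand in for N identical messages.
REPEAT_RE = re.compile(r"^message repeated (?P<n>\d+) times:")


def count_by_host(text, needle):
    """Tally occurrences of `needle` per host, expanding 'repeated N times' lines."""
    counts = Counter()
    for line in text.splitlines():
        if needle not in line:
            continue
        parsed = LINE_RE.match(line)
        if parsed is None:
            continue  # not in the assumed syslog layout; skip it
        repeat = REPEAT_RE.match(parsed.group("rest"))
        counts[parsed.group("host")] += int(repeat.group("n")) if repeat else 1
    return counts


if __name__ == "__main__":
    for host, n in count_by_host(SAMPLE, "wmgUseCitoid").most_common():
        print(host, n)
```

On the three sample lines nearly all occurrences land on mw1017, which fits Krenair's explanation that the notices came from testing on that host.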
[19:22:18] ah [19:22:22] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0] [19:22:22] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [19:22:28] alright [19:22:32] * legoktm is off for lunch [19:22:34] it only appears for a bunch of timestamps earlier today [19:23:18] (03PS4) 10Ottomata: Make eventlogging forwarder define work with new forwarder config file format [puppet] - 10https://gerrit.wikimedia.org/r/197413 [19:23:30] (03PS5) 10Ottomata: Make eventlogging forwarder define work with new forwarder config file format [puppet] - 10https://gerrit.wikimedia.org/r/197413 [19:24:44] 10Ops-Access-Requests, 6operations, 6Phabricator, 6Release-Engineering, 5Patch-For-Review: Mukunda needs sudo on iridium (phab host) - https://phabricator.wikimedia.org/T93151#1133088 (10mmodell) signed. [19:26:16] (03PS4) 10Alexandros Kosiaris: Package builder module [puppet] - 10https://gerrit.wikimedia.org/r/194471 [19:27:18] (03CR) 10Ottomata: [C: 032] Make eventlogging forwarder define work with new forwarder config file format [puppet] - 10https://gerrit.wikimedia.org/r/197413 (owner: 10Ottomata) [19:28:30] (03CR) 10Jforrester: "Scheduled for Monday 30 March. Not to go out before then." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/196984 (owner: 10Jforrester) [19:29:42] (03Abandoned) 10Jforrester: RESTbase production enablement step 4 – enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197471 (owner: 10Jforrester) [19:29:46] (03Abandoned) 10Jforrester: RESTbase production enablement step 5 – dewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197472 (owner: 10Jforrester) [19:29:50] (03Abandoned) 10Jforrester: RESTbase production enablement step 6 – all Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197473 (owner: 10Jforrester) [19:30:42] (03PS1) 10BBlack: repool cp1047 [puppet] - 10https://gerrit.wikimedia.org/r/197993 [19:31:07] (03CR) 10BBlack: [C: 032 V: 032] repool cp1047 [puppet] - 10https://gerrit.wikimedia.org/r/197993 (owner: 10BBlack) [19:31:33] (03PS2) 10Ottomata: Set up eventlogging varnishkafka instance on production bits [puppet] - 10https://gerrit.wikimedia.org/r/197650 [19:31:35] !log repooled cp1047 in pybal [19:31:38] Logged the message, Master [19:32:59] (03CR) 10Ottomata: [C: 032 V: 032] Set up eventlogging varnishkafka instance on production bits [puppet] - 10https://gerrit.wikimedia.org/r/197650 (owner: 10Ottomata) [19:34:37] 6operations: Purge > 90 days stat1002:/a/squid/archive/glam_nara - https://phabricator.wikimedia.org/T92340#1133109 (10leila) thanks, @Ottomata. It looks like IP address and user_agent is the problem. @Multichill, I hear you that it's unfortunate to just purge the data but we shouldn't be keeping it as is. Do yo... 
[19:46:14] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: / 4190 MB (3% inode=94%): [20:02:01] heya paravoid, q about deb packaging stuff [20:02:15] i need a newer version of python-six that is not in an upstream apt [20:02:19] as a dependency for a newer version of python-kafka [20:02:44] i downloaded the source package, copied the debian/ dir into the source of the newer python-six [20:02:48] dch -i [20:02:52] and then built a package [20:03:04] now, should I also make a gerrit repo for this? [20:03:09] nah it's fine [20:03:10] and do the git-buildpackage thing? [20:03:19] I mean you can if you want to [20:03:19] i can just reprepro into apt? [20:03:25] (i dont' really want to :p) [20:03:32] which version do you need? [20:03:35] 1.9.0 [20:03:37] six! [20:03:40] (j/k) [20:03:43] ha [20:03:46] ah, it's not even in jessie [20:04:02] dget http://ftp.de.debian.org/debian/pool/main/s/six/six_1.9.0-1.dsc [20:04:49] make the version 1.9.0-1~trusty1 or something [20:04:56] rebuild, put in apt [20:04:57] no gerrit needed [20:04:57] oh. [20:05:00] hmmm [20:05:01] cool [20:05:01] (03PS1) 10Rush: phab update password location for phabtools [puppet] - 10https://gerrit.wikimedia.org/r/198000 [20:06:48] (03CR) 10Rush: [C: 032] phab update password location for phabtools [puppet] - 10https://gerrit.wikimedia.org/r/198000 (owner: 10Rush) [20:07:49] paravoid, that would go in the universe repo? [20:08:00] universe for trusty, yes [20:08:08] (03PS2) 10Rush: phabricator: delete legalpad.yaml [puppet] - 10https://gerrit.wikimedia.org/r/197320 (owner: 10Dzahn) [20:08:09] backports for jessie-wikimedia [20:08:13] (03CR) 10Rush: [C: 031] phabricator: delete legalpad.yaml [puppet] - 10https://gerrit.wikimedia.org/r/197320 (owner: 10Dzahn) [20:08:18] i need it for precise [20:08:22] will build it for trusty too [20:09:20] ok [20:09:22] be careful [20:09:36] reprepro won't be happy if you set the version to be the same [20:09:50] so if you e.g. 
pick 1.9.0-1~wikimedia1, it won't accept that for both trusty-wikimedia and precise-wikimedia [20:10:07] ~trusty1 & ~precise1 would be better picks [20:10:42] I'm surprised it needs such a newer version of six [20:14:37] aye ok. hmm, it might not i guess...? it just won't work with what is in precise [20:14:47] !log manually merged accounts for User:Babel AutoCreate [20:14:50] Glaisher: ^ [20:14:51] Logged the message, Master [20:15:13] checking... [20:16:13] hm, yeah the kafka-python package doesn't specify a version [20:16:31] and, in development, when this eventlogging installed python-six 1.9.0 [20:16:45] so that's the one i was going with [20:17:02] i bet 1.8.0 would work, paravoid, should I use that? [20:17:34] 6operations, 10Deployment-Systems, 6Release-Engineering, 5Patch-For-Review: /usr/local/bin/deploy2graphite broken on tin due to nc command syntax - https://phabricator.wikimedia.org/T1387#1133205 (10greg) [20:22:21] 6operations: Backport python-virtualenv 1.11.4 from Trusty to Precise - https://phabricator.wikimedia.org/T92033#1133226 (10hashar) Thanks! One less manually maintained dependency :) [20:24:33] oof, paravoid, i'm not sure this upstream debian package will work on precise...it has build deps i can't find [20:24:46] i was able to build this using precise's debianization of 1.1.0 [20:24:57] but not the source package you pointed me at :/ [20:25:01] that's fine too [20:25:08] ok phew :) [20:25:08] for which system is that? [20:25:12] ? [20:25:18] is what? [20:25:20] just curious [20:25:22] oh [20:25:23] eventlogging [20:25:25] kafka integration [20:25:27] vanadium? 
[20:25:29] yes, [20:25:41] want to injest from kafka instead of varnishncsa/udp [20:26:23] interesting [20:35:57] (03PS6) 10Ottomata: Set up kafka processor and kafka consumer for client side events from /beacon/event.gif via varnishkafka [puppet] - 10https://gerrit.wikimedia.org/r/197418 [20:36:57] 6operations: poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133244 (10Dzahn) 3NEW [20:37:50] 6operations: poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133253 (10Dzahn) [20:37:51] 6operations, 3codfw-appserver-setup, 3wikis-in-codfw: Set up the mediawiki application layer in codfw - https://phabricator.wikimedia.org/T86894#1133252 (10Dzahn) [20:38:39] (03PS7) 10Ottomata: Set up kafka processor and kafka consumer for client side events from /beacon/event.gif via varnishkafka [puppet] - 10https://gerrit.wikimedia.org/r/197418 [20:41:49] (03PS8) 10Ottomata: Set up kafka processor and kafka consumer for client side events from /beacon/event.gif via varnishkafka [puppet] - 10https://gerrit.wikimedia.org/r/197418 [20:45:43] (03CR) 10Ottomata: [C: 032 V: 032] Set up kafka processor and kafka consumer for client side events from /beacon/event.gif via varnishkafka [puppet] - 10https://gerrit.wikimedia.org/r/197418 (owner: 10Ottomata) [20:48:23] 6operations: poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133280 (10Dzahn) the current poolservers in eqiad are potassium (WMF3287) and helium (WMF3137). the new ones should be similar.helium is a Dell PowerEdge R310, potassium is unlabeled [20:49:58] every string like "R123" becomes a link in phabricator to a repository. 
that also means if you just write stuff like "Dell R310" it links to an unrelated repo [20:53:47] (PS1) Ottomata: Use proper timestamp format in eventlogging varnishkafka messages [puppet] - https://gerrit.wikimedia.org/r/198084 [20:54:25] (PS3) Tim Landscheidt: Tools: Factor out registering with proxies [puppet] - https://gerrit.wikimedia.org/r/197658 (https://phabricator.wikimedia.org/T91954) [20:55:01] operations: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133306 (Dzahn) [20:55:09] (CR) Ottomata: [C: 2] Use proper timestamp format in eventlogging varnishkafka messages [puppet] - https://gerrit.wikimedia.org/r/198084 (owner: Ottomata) [20:57:51] (CR) Tim Landscheidt: "Still needs a test case of uswgi-python and nodejs." [puppet] - https://gerrit.wikimedia.org/r/197658 (https://phabricator.wikimedia.org/T91954) (owner: Tim Landscheidt) [21:03:21] (PS1) Ottomata: Another fix for eventlogging varnishkafka timestamp format [puppet] - https://gerrit.wikimedia.org/r/198088 [21:03:29] (CR) Ottomata: [C: 2 V: 2] Another fix for eventlogging varnishkafka timestamp format [puppet] - https://gerrit.wikimedia.org/r/198088 (owner: Ottomata) [21:07:41] !log set email for User:Phrazz@global and attached commonswiki account [21:16:33] operations, Phabricator, Project-Creators: Create policy projects and convert people projects to open - https://phabricator.wikimedia.org/T90491#1133346 (greg) The way launchpad dealt with this was just by having separate teams. Example: * Anyone could just projectX and watch the bugs, join the mailing... [21:19:01] operations: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133352 (Dzahn) ticket from the past to setup the ones in eqiad: https://rt.wikimedia.org/Ticket/Display.html?id=3407 so, these should be - in different racks - don't need public IPs, private vlan - should be like "...
[21:21:05] 6operations, 6Commons, 6Multimedia, 7HHVM, and 2 others: Create an HHVM 3.6.0 package, adding Tim's streaming patch - https://phabricator.wikimedia.org/T93194#1133362 (10gpaumier) [21:21:10] 6operations: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133244 (10Dzahn) [21:24:42] 6operations: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133383 (10RobH) [21:25:40] 6operations, 6Phabricator, 6Project-Creators: Create policy projects and convert people projects to open - https://phabricator.wikimedia.org/T90491#1133385 (10greg) >>! In T90491#1133346, @greg wrote: > The way launchpad dealt with this was just by having separate teams. ....which looks like what this task... [21:26:23] PROBLEM - puppet last run on mw2033 is CRITICAL: CRITICAL: puppet fail [21:28:22] Krenair: (responding over here so people know) ok, what's up? [21:28:30] 14:28 <+ Krenair> greg-g, hey, I'm about to deploy some emergency wikieditor stuff [21:29:03] we seem to be flooding eventlogging [21:29:06] with invalid entries [21:29:11] * greg-g nods [21:29:12] like, at least twice every plwiki page load [21:29:12] doit [21:29:17] greg-g: did the deployment train just… train? [21:29:23] need to upload the patch first :) [21:29:24] andrewbogott: yesterday [21:29:59] mwdeploy mwdeploy 4096 Mar 19 18:55 php-1.25wmf22 <- that’s a couple hours ago; what’s that? [21:30:21] uh [21:30:33] twentyafterfour: around? opsen question ^ [21:30:51] greg-g: also, logins don’t work anymore [21:30:56] and also cswiki, maybe more [21:31:03] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.69% of data above the critical threshold [500.0] [21:31:03] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 7.69% of data above the critical threshold [500.0] [21:31:07] what the [21:31:08] greg-g: I'm here [21:31:13] and all existing sessions seem to be trashed. 
[21:31:14] andrewbogott: I synced around that time [21:31:15] what? [21:31:32] I was logging in and out not too long ago [21:31:35] andrewbogott: on wikitechwiki or production? [21:31:40] wikitech [21:31:45] oh, wikitech [21:31:46] whew [21:32:23] greg-g: so… now that deployers can deploy to wikitech, does that mean that it’s not exclusively my proble when it breaks? [21:32:25] *problem [21:32:25] 1) not sure about that timestamp for mwf22, it was deployed yesterday to group0 and seems to be working on mw.org [21:32:32] 6operations, 10hardware-requests: codfw: (2) poolcounter systems - https://phabricator.wikimedia.org/T93266#1133412 (10RobH) p:5Triage>3Normal [21:33:01] andrewbogott: is wikitechwiki in group0? [21:33:34] check it out: upgraded an hour ago https://wikitech.wikimedia.org/wiki/Special:Version [21:34:17] greg-g: I don’t know what group0 is, but no doubt that’s in the config someplace [21:34:40] 6operations: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133419 (10RobH) [21:34:41] 6operations, 10hardware-requests: codfw: (2) poolcounter systems - https://phabricator.wikimedia.org/T93266#1133417 (10RobH) 5Open>3Resolved I've allocated systems wmf5816 (subra) and wmf5817 (suhail). However, system suhail is in B5, and has to relocate into A5, so they are in different racks. I'll reso... [21:34:59] andrewbogott: we have groups of mediawiki servers which are deployment targets for scap, group0 is mediawiki.org + testwikis [21:35:39] andrewbogott: group0 is the first to get the new branch, then group1 (non-wikipedias), then group2 (wikipedias) [21:36:02] so, honestly, I can't tell you how wikitech is upgraded [21:36:05] how/when [21:36:15] https://mediawiki.org/wiki/MediaWiki_1.25/Roadmap [21:36:47] 1) that date/time is probably a result of the scap that legoktm ran [21:36:58] 2) I don't really have a 2, I don't know about the login issue [21:37:08] ok, but this is all knowable, right? 
I mean, what group it’s in is coded someplace… [21:37:35] wikitech is not group 0 [21:37:39] wikitech is group 1 [21:37:39] legoktm: you updated centralauth, can you help debug the lack of ability to login to wikitech, plz [21:37:40] https://gerrit.wikimedia.org/r/#/c/197382/1/wikiversions.json [21:37:43] line 434 [21:37:48] greg-g: CentralAuth doesn't run on wikitech [21:37:50] is it possible for me to find out in retrospect what happened in that most recent sync? [21:38:01] legoktm: then I'm out of ideas [21:39:44] legoktm: and yet, somehow I expect that a central auth change caused the problem… [21:39:53] andrewbogott: that wmf22 on disk isn't a big deal, wikitech is still running wmf21 (every host gets all versions, it's only the wikiversions.json that tells which version is served) [21:40:06] all the CentralAuth updates were related to global rename [21:40:17] greg-g: ok, how about this one then? mwdeploy mwdeploy 4096 Mar 19 18:56 php-1.25wmf21 [21:40:34] I updated both 1.25wmf21 and 22 [21:40:37] legoktm: are you able to log in and see where the auth process is failing? [21:41:04] it still runs wmf21 [21:41:18] it's on silver right? [21:41:41] yes [21:41:49] andrewbogott: the timestamps update when scap is run, look at the SAL for all the scaps today [21:42:36] where are wikitech's debug logs kept? [21:42:54] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [21:42:54] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0] [21:42:55] 18:56 logmsgbot: legoktm Synchronized php-1.25wmf21/extensions/Renameuser/: Move logging inside of RenameuserSQL (duration: 00m 08s) [21:43:25] legoktm: what I know is: wikitech is a normal-ish wiki, configured using the same config files as every other production wiki. 
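[editor's note] The point made above is that every host carries all branch directories on disk, and only wikiversions.json decides which branch a given wiki is served from. A minimal Python sketch of that lookup follows. It assumes the file is a flat JSON object mapping a wiki database name to a branch directory name. The excerpt is illustrative rather than copied from the real file, and the use of "labswiki" as wikitech's database name is likewise an assumption.

```python
import json

# Illustrative excerpt only; the real file has one entry per wiki.
SAMPLE_WIKIVERSIONS = json.loads("""
{
  "testwiki": "php-1.25wmf22",
  "mediawikiwiki": "php-1.25wmf22",
  "labswiki": "php-1.25wmf21",
  "enwiki": "php-1.25wmf21"
}
""")


def served_version(wikiversions, dbname):
    """Return the branch directory configured for `dbname`, or None if unlisted."""
    return wikiversions.get(dbname)


def wikis_on(wikiversions, branch):
    """Return the sorted database names currently mapped to `branch`."""
    return sorted(db for db, version in wikiversions.items() if version == branch)


if __name__ == "__main__":
    print(served_version(SAMPLE_WIKIVERSIONS, "labswiki"))
    print(wikis_on(SAMPLE_WIKIVERSIONS, "php-1.25wmf22"))
```

A fresh timestamp on a php-1.25wmf22 directory therefore says nothing about what a wiki is running. The mapping is the thing to check.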
[21:43:31] ok [21:44:46] andrewbogott: I think memcached is having issues [21:44:47] > $wgMemc->set('foobar', 'baz'); [21:44:47] > var_dump($wgMemc->get('foobar')); [21:44:47] bool(false) [21:45:28] legoktm: I can restart it. But… any guesses what’s wrong/why? [21:45:53] RECOVERY - puppet last run on mw2033 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:01] 6operations, 10ops-codfw: subra/wmf5816 - relabel system / setup mgmt / update racktables - https://phabricator.wikimedia.org/T93272#1133458 (10RobH) 3NEW a:3Papaul [21:46:20] andrewbogott: well MW is having issues talking to memcache [21:46:26] and that would break sessions [21:46:42] is there any reason why a sync would cause that? [21:46:49] memcache doesn’t usually just… die off [21:47:12] I don't think it's related to the sync [21:47:37] I restarted memcached… any difference? [21:47:56] I renamed a page then edited it, and got "The revision #0 of the page named "MediaWiki Developer Summit 2015/Lessons learned" does not exist." in two tabs. I've never seen that before. ?action=purge fixed it, is it just a transient? [21:48:25] andrewbogott: nope still failing...try restarting nutcracker? [21:49:10] legoktm: yeah, that helped. [21:49:12] hm [21:49:26] (03PS1) 10RobH: setting subra/suhail dns records [dns] - 10https://gerrit.wikimedia.org/r/198108 [21:49:32] yup, memcache is working again [21:49:34] !log restarted memcached and nutcracker on silver/wikitech. Why did it need a restart? [21:49:34] nutcracker log is a poet or something [2015-03-19 21:48:33.980] nc.c:189 run, rabbit run / dig that hole, forget the sun / and when at last the work is done / don't sit down / it's time to dig another one [21:49:55] and login wfm now [21:50:39] yay for spontaneous unexplained failures! 
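[editor's note] The diagnosis above came from a set-then-get round trip: a value written to the cache could not be read back, which is why sessions and logins broke. A minimal Python sketch of the same check follows. The `cache` argument is a hypothetical client object exposing set(key, value) and get(key); the two classes are stand-ins for a healthy and a failing backend, not real memcached or nutcracker clients.

```python
import uuid


def cache_round_trip_ok(cache):
    """Write a unique key, read it back, and report whether the value survived."""
    key = "healthcheck:" + uuid.uuid4().hex
    value = uuid.uuid4().hex
    try:
        cache.set(key, value)
        return cache.get(key) == value
    except Exception:
        # A connection error counts as a failed round trip.
        return False


class DictCache:
    """Stand-in for a healthy cache: stores values in a local dict."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class DroppingCache:
    """Stand-in for the failure seen above: writes vanish, reads find nothing."""

    def set(self, key, value):
        pass

    def get(self, key):
        return None


if __name__ == "__main__":
    print("healthy:", cache_round_trip_ok(DictCache()))
    print("dropping:", cache_round_trip_ok(DroppingCache()))
```

Using a fresh random key on every run keeps a stale value left by an earlier check from masking a backend that has stopped accepting writes.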
[21:50:49] We don’t even really need nutcracker on that box, it’s just talking to itself [21:50:52] Mar 19 21:48:33 silver kernel: [3711097.930612] init: nutcracker main process (31101) killed by TERM signal [21:50:55] ^ [21:51:14] mutante: I think that’s me restarting it just now, isn’t it? [21:51:20] That’s, like, 2 minutes ago? [21:51:25] * legoktm goes back to the other stuff he was doing [21:51:38] andrewbogott: right it is, and that means nothing else in syslog about it . [21:51:42] hmm [21:51:46] yeah [21:51:50] FWIW "The revision #0... does not exist" was on mw1027 [21:52:22] legoktm: thanks [21:52:31] np [21:53:24] (03CR) 10RobH: [C: 032] setting subra/suhail dns records [dns] - 10https://gerrit.wikimedia.org/r/198108 (owner: 10RobH) [21:56:08] !log krenair Synchronized php-1.25wmf21/extensions/WikiEditor/modules/jquery.wikiEditor.js: https://gerrit.wikimedia.org/r/198106 (duration: 00m 06s) [21:56:11] Logged the message, Master [21:56:21] robla: subra-a and subra-b would have been following the rules. it's a binary star :) [21:57:05] or omicron and leonis [21:57:12] RoanKattouw, hm, it's still happening :/ [21:57:29] Krenair: You should wait at least 10 minutes [21:57:34] yeah [21:57:34] JavaScript doesn't update instantly on clients [21:57:41] aha [21:57:43] mutante: robh [21:57:49] https://en.wikipedia.org/wiki/Subra [21:58:23] RoanKattouw, it's applied in my client now [21:58:25] and the errors went away [21:58:37] greg-g: ? [21:59:50] (03PS1) 10BBlack: Local OCSP updates to the filesystem [puppet] - 10https://gerrit.wikimedia.org/r/198110 [22:00:30] Nice [22:00:30] mutante: you tab completed to rob.la [22:00:38] 6operations, 10ops-codfw: suhail/wmf5817 - relabel system / relocate / setup mgmt / update racktables - https://phabricator.wikimedia.org/T93284#1133615 (10RobH) 3NEW a:3Papaul [22:00:44] RoanKattouw, at least, some of them [22:01:27] mutante: ha, noooo [22:01:30] greg-g: ooh, did not notice. 
thanks [22:01:33] that soudns like its asking for confusion [22:01:56] so dns is done, once papaul updates and does the blocking onsite tasks i'll have the port info to setup that part as well [22:02:21] great, thanks [22:02:29] nuria, milimetric: around? [22:03:15] (03Abandoned) 10Dzahn: WIP: role and module for contacts [puppet] - 10https://gerrit.wikimedia.org/r/194786 (https://phabricator.wikimedia.org/T90679) (owner: 10Dzahn) [22:03:42] RoanKattouw, http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=vanadium.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1426802457&v=381653.03&m=bytes_in&vl=bytes%2Fsec&ti=Bytes%20Received&z=large [22:04:10] Krenair: yes, but one second [22:04:20] hah nice [22:04:28] what's the edit schema rate like now? [22:04:31] Bytes received on vanadium halved [22:14:21] (03PS2) 10BBlack: Local OCSP updates to the filesystem [puppet] - 10https://gerrit.wikimedia.org/r/198110 [22:15:15] (03CR) 10jenkins-bot: [V: 04-1] Local OCSP updates to the filesystem [puppet] - 10https://gerrit.wikimedia.org/r/198110 (owner: 10BBlack) [22:15:17] Krenair: ok, back [22:15:20] what's up? [22:16:02] We identified a few ways that events would be getting logged when they probably shouldn't've been [22:16:21] deployed a patch, see the ganglia link I posted above [22:16:22] 6operations: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133642 (10RobH) [22:16:46] 6operations: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133244 (10RobH) [22:17:08] krenair; yes [22:17:16] Krenair:yes [22:17:38] what is the edit schema rate like now? [22:17:51] Krenair: of inflow events per second? 
[22:17:58] 6operations: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133657 (10RobH) p:5Triage>3Normal [22:18:02] yues [22:18:03] yes* [22:18:42] (03CR) 10Dzahn: cassandra: add ferm rules (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/197840 (https://phabricator.wikimedia.org/T92680) (owner: 10Dzahn) [22:18:52] Krenair: 120 events/sec valid: http://graphite.wikimedia.org/render/?width=588&height=311&_salt=1426803508.28&from=00%3A00_20150317&until=23%3A59_20150319&target=eventlogging.schema.Edit.rate [22:19:05] Krenair: more than that invalid [22:19:06] (03PS3) 10BBlack: Local OCSP updates to the filesystem [puppet] - 10https://gerrit.wikimedia.org/r/198110 [22:19:21] Krenair: more than that invalid, so >200 events/sec total, [22:19:33] ok, I saw there were still some invalid ones going in [22:19:56] (03PS1) 10Rush: exim should send unaliased local mail to root@wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/198114 [22:20:09] Krenair: there is a lot more of invalid ones, let me show you, so many that all our alarms are up [22:20:19] (03CR) 10Dzahn: "does that mean you want to kill the entire network.pp and move all that data into hiera? 
or does it mean bastion hosts and monitoring host" [puppet] - 10https://gerrit.wikimedia.org/r/197839 (https://phabricator.wikimedia.org/T92680) (owner: 10Dzahn) [22:20:27] 6operations: deploy francium for html/zim dumps - https://phabricator.wikimedia.org/T93113#1133664 (10RobH) p:5Triage>3Normal [22:20:28] (03CR) 10jenkins-bot: [V: 04-1] Local OCSP updates to the filesystem [puppet] - 10https://gerrit.wikimedia.org/r/198110 (owner: 10BBlack) [22:21:00] Krenair: big green waves are mostly edit events: http://graphite.wikimedia.org/render/?width=588&height=311&_salt=1426803636.621&from=00%3A00_20150317&until=23%3A59_20150319&target=eventlogging.schema.Edit.rate&target=eventlogging.client_side_events.raw.rate [22:21:04] http://graphite.wikimedia.org/render/?width=588&height=311&_salt=1426783712.644&from=00%3A00_20150317&until=23%3A59_20150319&target=eventlogging.client_side_events.valid.rate&target=eventlogging.client_side_events.raw.rate&target=eventlogging.schema.Edit.rate&target=eventlogging.schema.MobileWebUIClickTracking.rate is looking better [22:22:21] !log restarting cassandra on restbase1004 [22:22:24] Krenair: yes, pufff... [22:22:25] Logged the message, Master [22:22:56] Krenair: we just got Oks from alarms [22:23:06] (03CR) 10Dzahn: mediawiki: allow ssh from deployment servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/197053 (https://phabricator.wikimedia.org/T92843) (owner: 10Dzahn) [22:24:14] Krenair: will keep an eye on things [22:24:19] ok [22:24:26] there are still fixes to make on our end that should improve things more [22:24:34] Krenair: did you deployed the sampling? [22:24:45] not yet. [22:24:58] Krenair: so, how did you lower the inflow? [22:25:15] Fixed some crazy cases where it was erroring twice on every cswiki/plwiki (and probably others) page load [22:25:24] k [22:25:49] i.e. not actual editing pages. 
We assume some gadget was depending on it or something [22:25:53] (03PS1) 10Tim Landscheidt: Ignore warnings about URLs without modules for private repository [puppet] - 10https://gerrit.wikimedia.org/r/198116 (https://phabricator.wikimedia.org/T87132) [22:26:01] Krenair: ok, let me know as you deploy fixes as we should see those reflected on data [22:27:02] (03PS1) 10QChris: Mark limn's apache configs as managed by puppet [puppet] - 10https://gerrit.wikimedia.org/r/198117 [22:27:04] (03PS1) 10QChris: Allow limn instances to specify Apache config for proxy [puppet] - 10https://gerrit.wikimedia.org/r/198118 [22:27:06] (03PS1) 10QChris: Add limn proxy template that handles taken down Wikipedia Zero dashboards [puppet] - 10https://gerrit.wikimedia.org/r/198119 (https://phabricator.wikimedia.org/T92920) [22:27:48] 7Puppet, 6operations, 5Patch-For-Review: Resource attributes are quoted inconsistently - https://phabricator.wikimedia.org/T91908#1133676 (10scfc) @Matanya, as you wrote most of the [[https://wikitech.wikimedia.org/wiki/Puppet_coding|style guide]], do you want to update it accordingly? [22:28:43] 7Puppet, 6operations, 5Patch-For-Review: Resource attributes are quoted inconsistently - https://phabricator.wikimedia.org/T91908#1133678 (10Matanya) Yes, i'll do it during the weekend. 
[22:32:55] (CR) Milimetric: [C: 1] "handsome patch" [puppet] - https://gerrit.wikimedia.org/r/198119 (https://phabricator.wikimedia.org/T92920) (owner: QChris) [22:35:17] how to get list of restbase hostnames out of hiera [22:37:10] (PS3) BryanDavis: Make deploy2graphite use mw-deployment-vars.sh [puppet] - https://gerrit.wikimedia.org/r/183568 (https://phabricator.wikimedia.org/T1387) (owner: Reedy) [22:38:33] (PS4) BBlack: Local OCSP updates to the filesystem [puppet] - https://gerrit.wikimedia.org/r/198110 [22:39:30] (CR) jenkins-bot: [V: -1] Local OCSP updates to the filesystem [puppet] - https://gerrit.wikimedia.org/r/198110 (owner: BBlack) [22:39:33] (PS1) EBernhardson: Enable flow on ruwiki and pawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/198126 [22:40:31] (CR) BryanDavis: "PS 3 was a manual rebase + updated commit message" [puppet] - https://gerrit.wikimedia.org/r/183568 (https://phabricator.wikimedia.org/T1387) (owner: Reedy) [22:41:12] (PS5) BBlack: Local OCSP updates to the filesystem [puppet] - https://gerrit.wikimedia.org/r/198110 [22:47:20] operations, Wikimedia-Shop, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1133769 (Dzahn) Alright, i let them know to go ahead. [22:48:13] PROBLEM - Check status of defined EventLogging jobs on vanadium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:48:18] operations, Deployment-Systems, Release-Engineering, Patch-For-Review: /usr/local/bin/deploy2graphite broken on tin due to nc command syntax - https://phabricator.wikimedia.org/T1387#1133775 (bd808) >>! In T1387#1095974, @bd808 wrote: > I would guess that the file was renamed in Puppet and no ensure... [22:48:27] If someone is prepping for SWAT, note I replaced one of the Flow bumps with another bump that goes one commit farther. [22:48:44] I also added one to 1.25wmf21 (the other branch).
[22:53:14] RECOVERY - Check status of defined EventLogging jobs on vanadium is OK: OK: All defined EventLogging jobs are runnning.
[22:57:00] operations: Put archiva.wikimedia.org behind misc-web-lb and force https - https://phabricator.wikimedia.org/T88139#1133824 (Dzahn) There is also the option to use misc-web but without the varnish caching, so to just let it pass throught all requests but still get the benefits of the wildcard cert. (we do thi...
[22:58:27] operations: Add Yana to contracts@ - https://phabricator.wikimedia.org/T91269#1133835 (Dzahn) Open>Resolved
[23:00:04] RoanKattouw, ^d, Krenair, James_F: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150319T2300).
[23:00:17] who wants to get that?
[23:00:58] chad's out right now
[23:01:07] RoanKattouw: swat'able?
[23:03:29] Krenair: ? :)
[23:04:04] ok
[23:04:33] let's see
[23:04:54] PROBLEM - Check status of defined EventLogging jobs on vanadium is CRITICAL: CRITICAL: Stopped EventLogging jobs: reporter/statsd consumer/server-side-events-log consumer/mysql-m4-master consumer/client-side-events-log consumer/client-side-events-kafka-log consumer/all-events-log multiplexer/all-events processor/server-side-events processor/client-side-events-kafka processor/client-side-events forwarder/8422 forwarder/8421
[23:05:14] (PS2) Alex Monk: Enable VisualEditor in plwiki's NS 102 ('Wikiprojekt') [mediawiki-config] - https://gerrit.wikimedia.org/r/197466 (https://phabricator.wikimedia.org/T92698) (owner: Jforrester)
[23:06:05] (I didn't touch anything analytics-related when that warning happened, btw)
[23:06:14] (CR) Alex Monk: [C: 2] Enable VisualEditor in plwiki's NS 102 ('Wikiprojekt') [mediawiki-config] - https://gerrit.wikimedia.org/r/197466 (https://phabricator.wikimedia.org/T92698) (owner: Jforrester)
[23:06:30] (Merged) jenkins-bot: Enable VisualEditor in plwiki's NS 102 ('Wikiprojekt') [mediawiki-config] - https://gerrit.wikimedia.org/r/197466 (https://phabricator.wikimedia.org/T92698) (owner: Jforrester)
[23:07:03] gosh. is tin slow today or is it just my connection?
[23:07:28] James_F, ping
[23:08:14] fyi re that EL CRITICAL
[23:08:14] 16:07 < nuria> qchris, greg-g: yes, i know, disk is full , i am doing cleanup, will re-start again in a sec
[23:08:36] meh, I can test this VE config change myself
[23:08:46] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/197466/ (duration: 00m 07s)
[23:08:52] Logged the message, Master
[23:10:08] yeah seems fine
[23:10:31] ok... these others look like fun
[23:10:32] superm401, hey
[23:10:39] Krenair, hey
[23:10:50] ugh your submodule updates don't include details either
[23:10:51] ok
[23:11:00] Krenair, I can add the underlying patches easily.
[23:11:03] Just give me a sec.
[23:11:32] it's fine
[23:12:29] Krenair, done
[23:12:49] I already followed the hashes back
[23:13:54] superm401, I have no idea what jenkins will do now you updated the commit message in the middle of it merging
[23:13:59] probably require it to be done again
[23:14:01] will resubmit
[23:15:50] Krenair: Hey.
[23:15:53] RECOVERY - Disk space on vanadium is OK: DISK OK
[23:15:53] Sorry, the one you +2'ed the old message I had already updated before I saw you preferred I just leave it.
[23:16:01] The other one you +2'ed the edited message.
[23:16:09] ACKNOWLEDGEMENT - Check status of defined EventLogging jobs on vanadium is CRITICAL: CRITICAL: Stopped EventLogging jobs: reporter/statsd consumer/server-side-events-log consumer/mysql-m4-master consumer/client-side-events-log consumer/client-side-events-kafka-log consumer/all-events-log multiplexer/all-events processor/server-side-events processor/client-side-events-kafka processor/client-side-events forwarder/8422 forwarder/8421 Nur
[23:17:11] jenkins has quite a queue today :/
[23:22:40] superm401
[23:22:41] !log krenair Synchronized php-1.25wmf22/extensions/Flow: https://gerrit.wikimedia.org/r/#/c/198127/ (duration: 00m 07s)
[23:22:46] Logged the message, Master
[23:24:02] Krenair, confirmed (through the board deletion one).
[23:24:07] !log eventlogging re-start due to vanadium disk filling up. Moved logs to "/srv/"
[23:24:10] Logged the message, Master
[23:24:19] superm401, do you have the rights you need to confirm the board deletion one?
[23:24:39] Krenair, no, I had quiddity check for me.
[23:24:49] quiddity checked it? OK
[23:24:53] ya
[23:27:44] superm401
[23:27:47] !log krenair Synchronized php-1.25wmf21/extensions/Flow/maintenance/FlowUpdateRevisionContentLength.php: https://gerrit.wikimedia.org/r/#/c/198125/ (duration: 00m 05s)
[23:27:51] Logged the message, Master
[23:28:12] Krenair, thanks. Can't test that one right now. It's a long-running job.
[23:28:20] okay
[23:29:28] kaldari, hey
[23:29:43] wikilove has sql doesn't it?
[23:30:55] looks like I need to create the tbale
[23:30:56] table*
[23:30:59] hm
[23:31:44] Krenair: yes
[23:31:57] Krenair: it creates 2 tables
[23:32:13] I found wikilove_log in the patches dir
[23:32:30] there's supposed to be an image log somewhere?
[23:33:01] Krenair: The image log was deprecated, I believe
[23:33:04] based on https://github.com/wikimedia/mediawiki-extensions-WikimediaMaintenance/blob/master/createExtensionTables.php#L81
[23:33:05] ok
[23:33:35] Krenair: Do I need to change that file?
[23:35:18] Creating wikilove tables...[836e9304] [no req] MWException from line 3998 of /srv/mediawiki-staging/php-1.25wmf21/includes/db/Database.php: Could not open "/srv/mediawiki-staging/php-1.25wmf21/extensions/WikiLove/patches/WikiLoveImageLog.sql".
[23:35:23] I'll do it manually this time.
[23:36:11] Krenair: Sorry, I’ve never done a deployment that required a new table. What’s the correct procedure for doing that?
[23:36:34] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: puppet fail
[23:36:56] !log Run mwscript sql.php --wiki=ukwiki php-1.25wmf21/extensions/WikiLove/patches/WikiLoveLog.sql for https://gerrit.wikimedia.org/r/#/c/196988/
[23:36:59] Logged the message, Master
[23:37:12] kaldari, there's a page on wikitech documenting how to do schema changes
[23:37:19] I don't know how widely createExtensionTables is used
[23:37:39] (PS2) Alex Monk: Enable WikiLove extension at Ukrainian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/196988 (https://phabricator.wikimedia.org/T91530) (owner: Glaisher)
[23:37:58] (CR) Alex Monk: [C: 2] Enable WikiLove extension at Ukrainian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/196988 (https://phabricator.wikimedia.org/T91530) (owner: Glaisher)
[23:38:03] 1) Don't run update.php
[23:38:03] (Merged) jenkins-bot: Enable WikiLove extension at Ukrainian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/196988 (https://phabricator.wikimedia.org/T91530) (owner: Glaisher)
[23:38:12] 2) You'll figure out the rest
[23:39:12] Krenair, MaxSem: So normally I would create a phab ticket for creating/updating the schema first, and then once that is done, request deployment of the config change?
[23:39:20] :)
[23:39:25] kaldari, nah
[23:39:35] you just create the table yourself
[23:39:45] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/196988/ (duration: 00m 05s)
[23:39:48] Logged the message, Master
[23:39:48] the deployer will create it
[23:39:52] assuming they know what they're doing
[23:39:52] (provided that the table was originally approved by a dba)
[23:40:18] ^
[23:40:20] the dba will love it if you tell about schema changes :p
[23:40:23] springle
[23:40:41] That table was already used on a bunch of other wikis
[23:41:05] so it was already approved
[23:41:07] Krenair: so maybe just add a note in the deployment request mentioning that it has a DB update requirement?
[23:41:18] which reminds me of... springle, can you take a look at https://gerrit.wikimedia.org/r/#/c/177448/ ? :)
[23:41:33] but yeah, get it reviewed by the dba if you're making something new
[23:41:43] kaldari, probably best to do so, yeah
[23:41:56] Krenair: Oh yeah, for sure. These tables were approved ages ago :)
[23:42:19] Patch Set 61 .. wow
[23:42:34] anyway does it work on ukwiki kaldari?
[23:42:44] let’s see...
[23:43:10] mutante, my pain... :P
[23:43:35] Whoops, back.
[23:43:58] https://uk.wikipedia.org/wiki/Обговорення_користувача:Kaldari - lgtm!
[23:44:28] Yay :)
[23:44:59] ebernhardson, hey
[23:45:07] you didn't break stuff, right? :)
[23:45:17] :)
[23:45:23] awesome
[23:45:27] MaxSem: ack
[23:45:35] thanks!
[23:45:37] Krenair: hey
[23:45:44] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[23:45:51] ebernhardson, so flow is going to need DB tables, I assume
[23:45:56] Krenair: nope
[23:45:58] Krenair: I had to remove the barnstar interface for uk.wiki (apparently they don’t use them), and they insisted on a blue heart instead of red. Go figure.
[23:46:03] no? ok
[23:46:20] (PS2) Alex Monk: Enable flow on ruwiki and pawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/198126 (owner: EBernhardson)
[23:46:33] (CR) Alex Monk: [C: 2] Enable flow on ruwiki and pawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/198126 (owner: EBernhardson)
[23:46:38] (Merged) jenkins-bot: Enable flow on ruwiki and pawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/198126 (owner: EBernhardson)
[23:47:44] ebernhardson,
[23:47:49] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/198126/ (duration: 00m 07s)
[23:47:57] Logged the message, Master
[23:50:41] Krenair: looks good, thanks
[23:51:01] and still time for me to go through my list of trivial stuff that's been waiting forever
[23:51:20] (CR) Alex Monk: [C: -1] "does not merge" [mediawiki-config] - https://gerrit.wikimedia.org/r/192894 (owner: Southparkfan)
[23:51:21] blue and yellow heart is a thing because of the .uk national flag i suppose
[23:52:16] (PS2) Alex Monk: Fix typo in robots.txt: Wayback Machine [mediawiki-config] - https://gerrit.wikimedia.org/r/195097 (owner: Glaisher)
[23:52:25] (CR) Alex Monk: [C: 2] Fix typo in robots.txt: Wayback Machine [mediawiki-config] - https://gerrit.wikimedia.org/r/195097 (owner: Glaisher)
[23:52:52] (Merged) jenkins-bot: Fix typo in robots.txt: Wayback Machine [mediawiki-config] - https://gerrit.wikimedia.org/r/195097 (owner: Glaisher)
[23:53:27] !log krenair Synchronized robots.txt: https://gerrit.wikimedia.org/r/#/c/195097/ - typo fix (duration: 00m 06s)
[23:53:31] Logged the message, Master
[23:54:48] (PS2) Alex Monk: Remove toolserver.org from captcha whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/194966 (owner: Reedy)
[23:54:57] (CR) Alex Monk: [C: 2] Remove toolserver.org from captcha whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/194966 (owner: Reedy)
[23:57:34] (Merged) jenkins-bot: Remove toolserver.org from captcha whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/194966 (owner: Reedy)
[23:58:24] !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/194966/ - rm toolserver.org from whitelist (duration: 00m 07s)
[23:58:30] Logged the message, Master
[23:59:28] springle, replied. can change to unsigned if you say that a query error is better:)
[23:59:28] (CR) Alex Monk: [C: -1] "does not merge" [mediawiki-config] - https://gerrit.wikimedia.org/r/156078 (https://bugzilla.wikimedia.org/29902) (owner: Withoutaname)