[00:00:40] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 810.263560907 [00:03:08] (PS1) Andrew Bogott: We need -y when calling salt-key -d remotely [puppet] - https://gerrit.wikimedia.org/r/204423 [00:04:25] (CR) Andrew Bogott: [C: 2] We need -y when calling salt-key -d remotely [puppet] - https://gerrit.wikimedia.org/r/204423 (owner: Andrew Bogott) [00:07:50] (CR) Dzahn: [C: 2] drop shop & store entries from most projects [dns] - https://gerrit.wikimedia.org/r/196605 (https://phabricator.wikimedia.org/T92438) (owner: Dzahn) [00:08:21] (PS6) Andrew Bogott: Have sink create ldap host entries. [puppet] - https://gerrit.wikimedia.org/r/202582 [00:25:52] (CR) Dzahn: [C: -1] "most of these have been dropped from DNS in https://gerrit.wikimedia.org/r/#/c/196605/" [puppet] - https://gerrit.wikimedia.org/r/199791 (https://phabricator.wikimedia.org/T92438) (owner: Dzahn) [00:26:38] (Abandoned) Dzahn: shop redirects: store instead of shop [puppet] - https://gerrit.wikimedia.org/r/199791 (https://phabricator.wikimedia.org/T92438) (owner: Dzahn) [00:34:10] (CR) Dzahn: [C: 1] html dumps will be served from host where they are produced, via proxy [puppet] - https://gerrit.wikimedia.org/r/204257 (owner: ArielGlenn) [00:36:18] (CR) Dzahn: "and the remaining mediawiki-installation group would be added in Hiera here now: https://gerrit.wikimedia.org/r/#/c/204331/" [puppet] - https://gerrit.wikimedia.org/r/179121 (owner: Giuseppe Lavagetto) [01:01:55] operations, ops-codfw: mw2128 not rebooting after network driver crash, blank console - https://phabricator.wikimedia.org/T95264#1211151 (Papaul) on 4-13-1015 the tech came with another main board and the server did power back up with the new board but there were a problem on one of the power supply it tu...
[01:13:58] operations: Update DNS for the Wikipedia store, before May 31 - https://phabricator.wikimedia.org/T96182#1211176 (Dzahn) I have mailed Effie and CCed you, Victoria. [01:16:32] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1211177 (Dzahn) >>! In T92438#1116424, @faidon wrote: > Let's just drop everything but 2-3 domains, I'd say.... >>! In T92438#1117737, @vshchepakina w... [01:31:26] (PS4) Thcipriani: Add submodules to master checkoutMediaWiki [mediawiki-config] - https://gerrit.wikimedia.org/r/204080 (https://phabricator.wikimedia.org/T88442) [01:45:41] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds [01:47:21] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 8.149 second response time [01:51:41] (CR) Krinkle: Add submodules to master checkoutMediaWiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/204080 (https://phabricator.wikimedia.org/T88442) (owner: Thcipriani) [01:53:11] (PS4) Springle: Labs: puppetize labsdb1005's mysql setup [puppet] - https://gerrit.wikimedia.org/r/200170 (https://phabricator.wikimedia.org/T88234) (owner: coren) [01:53:42] * YuviPanda springle: wooo! :) [01:54:21] springle: I’m here if you need anything [01:54:51] excellent ;) [01:55:31] (CR) Springle: [C: 2] Labs: puppetize labsdb1005's mysql setup [puppet] - https://gerrit.wikimedia.org/r/200170 (https://phabricator.wikimedia.org/T88234) (owner: coren) [01:57:26] YuviPanda: a shoulder to cry on? [01:57:28] !log restarted keystone and nova-scheduler in a failed attempt to unstick things [01:59:12] !log restarting mysql on virt1000, because how could things be worse?
[02:00:00] PROBLEM - puppet last run on labsdb1005 is CRITICAL puppet fail [02:00:10] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 3872.95178999 [02:00:51] (PS1) Springle: Add missing .erb extension [puppet] - https://gerrit.wikimedia.org/r/204434 [02:01:20] blazecat: more often than not [02:01:24] !log restarted keystone and nova-scheduler in a failed attempt to unstick things [02:01:30] !log restarting mysql on virt1000, because how could things be worse? [02:01:46] oh you already did that [02:01:51] (CR) Springle: [C: 2] Add missing .erb extension [puppet] - https://gerrit.wikimedia.org/r/204434 (owner: Springle) [02:02:08] morebots, what’s the deal? [02:02:09] I am a logbot running on tools-exec-09. [02:02:09] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [02:02:09] To log a message, type !log . [02:03:49] !log log I say [02:06:37] (PS9) BBlack: Set up /api/v1/ entry point for restbase [puppet] - https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) (owner: GWicke) [02:08:37] andrewbogott: ah, so toollabs homepage is down because the db is down because springle is doing things to it :) [02:09:00] > DBI connect('database=toollabs_p;host=tools.labsdb','s51051',...) failed: Can't connect to MySQL server on 'tools.labsdb' (111) at /data/project/admin/bin/updatetools.pl line 16 [02:09:01] Unable to connect to database [02:09:02] yeah, but there’s a problem on virt1000 as well, presumably unrelated [02:09:27] oh ok [02:12:59] keystone has suddenly decided that it needs to purge expired tokens, and that query apparently takes hours [02:13:04] and it won’t do anything in the meantime, I guess?
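A minimal sketch of one common way to keep a purge like the one described at 02:12:59 from blocking everything else: delete expired rows in small batches, committing after each batch. This is not what keystone or the cron job on virt1000 actually does. The table layout (a `token` table with `id` and an integer `expires` column), the in-memory SQLite database and the helper names are all assumptions made so the example is self-contained.

```python
import sqlite3


def make_demo_db(expired, live, now):
    """Build a throwaway in-memory table (hypothetical schema, not keystone's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE token (id INTEGER PRIMARY KEY, expires INTEGER)")
    rows = [(now - 60,)] * expired + [(now + 3600,)] * live
    conn.executemany("INSERT INTO token (expires) VALUES (?)", rows)
    conn.commit()
    return conn


def purge_expired_tokens(conn, now, batch_size=1000):
    """Delete expired rows batch by batch and return the total number removed."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM token WHERE id IN "
            "(SELECT id FROM token WHERE expires < ? LIMIT ?)",
            (now, batch_size),
        )
        # Commit per batch so each transaction stays short.
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


if __name__ == "__main__":
    now = 1_429_150_000  # arbitrary epoch timestamp for the demo
    conn = make_demo_db(expired=2500, live=10, now=now)
    print(purge_expired_tokens(conn, now))
```

The batch size trades total run time against how long any single transaction holds locks; the right value would have to be found by testing on the real table.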
[02:15:00] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds [02:17:51] RECOVERY - puppet last run on labsdb1005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [02:18:11] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 5.300 second response time [02:18:11] PROBLEM - puppet last run on cp3018 is CRITICAL puppet fail [02:21:26] YuviPanda: currently backing up the data (just a local copy). going slower than expected [02:21:37] disk io seems oddly bursty on labsdb1005 [02:21:51] :) I wouldn’t be surprised if we dig in deeply and end up somewhere horrible [02:22:27] or i'm just too used to the production DBs disks [02:22:46] :) [02:23:34] !log testing the log by logging a test [02:23:44] Logged the message, Master [02:23:54] (PS3) Yuvipanda: tools: Remove tomcat node definitions from puppet [puppet] - https://gerrit.wikimedia.org/r/193561 (https://phabricator.wikimedia.org/T91066) [02:23:57] PROBLEM - mysqld processes on labsdb1005 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [02:24:15] !log restarted keystone and nova-scheduler in a failed attempt to unstick things [02:24:21] Logged the message, Master [02:24:23] (PS4) Yuvipanda: tools: Remove tomcat node definitions from puppet [puppet] - https://gerrit.wikimedia.org/r/193561 (https://phabricator.wikimedia.org/T91066) [02:24:30] !log restarted mysql on virt1000 because keystone was stuck. It seems to have helped, eventually [02:24:36] Logged the message, Master [02:24:36] !log but the ‘token’ table is still too big to manage [02:24:42] Logged the message, Master [02:28:39] YuviPanda: 50% copied.
will just wait it out, but we'll go at least another half hour [02:29:02] !log l10nupdate Synchronized php-1.26wmf1/cache/l10n: (no message) (duration: 05m 57s) [02:29:12] Logged the message, Master [02:29:19] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [02:29:37] springle: ow. how much data is there anyway? [02:30:36] 380G + 100G binlogs [02:31:16] woah, that’s a lot of data... [02:33:29] !log LocalisationUpdate completed (1.26wmf1) at 2015-04-16 02:32:26+00:00 [02:33:36] Logged the message, Master [02:34:03] !log starting forceRenameUsers.php (SUL finalization) on non-test*wikis [02:34:09] Logged the message, Master [02:34:49] RECOVERY - puppet last run on cp3018 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [02:36:37] (PS10) GWicke: Set up /api/v1/ entry point for restbase [puppet] - https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) [02:42:48] legoktm: such a generic simple log entry for such an important and long running script ;) [02:42:54] also grats [02:42:58] * jamesofur crosses fingers [02:43:01] (PS11) GWicke: Set up /api/v1/ entry point for restbase [puppet] - https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) [02:43:45] springle: "select max(q.c) from (select count(*) as c from watchlist group by wl_user) as q" <= any idea where those are coming from? [02:43:53] are you manually triggering this per-wiki legoktm? [02:44:16] blazecat, woah, those? that looks like a single query I ran earlier [02:44:35] Krenair: yes. we'll see how well that scales, I might automate it a bit later on [02:44:58] RECOVERY - mysqld processes on labsdb1005 is OK: PROCS OK: 1 process with command name mysqld [02:45:02] or, maybe one or two. I killed it when I came up with a better idea [02:45:13] did they not end properly? [02:46:05] is it causing issues? [02:46:11] or just looking funny?
[02:46:50] it's better to use one of the less-used ones, I guess if it's one off it doesn't matter [02:47:14] * blazecat is glad that's not coming from code [02:47:17] Yeah it was literally just to prove to someone how ridiculous some users' watchlists are [02:47:27] It shouldn't still be running [02:47:28] yeah, that should really be limited [02:47:47] they show up at tendril slow query logs [02:48:54] YuviPanda: tool-db lives. care to test some tools [02:49:07] springle: still trying to find the cause of the lag spikes at https://tendril.wikimedia.org/host/view/db1073.eqiad.wmnet/3306 (I fixed the 1-am ones, but I still see more) [02:49:16] * blazecat doesn't think those are cron scripts this time [02:49:18] springle: woo [02:49:27] springle: https://tools.wmflabs.org/ depends on it [02:49:35] so... not causing issues, right? just looking funny? can we end them? [02:49:46] * blazecat is looking at the 36/53/59 ones [02:50:00] Krenair: it's all just logs, nothing running [02:50:07] springle: so looks good. let me email labs-l [02:50:14] ohh, ok [02:51:05] * blazecat needs more http://cdn2.bigcommerce.com/server5500/tpbc2s65/products/1237/images/1270/carolansirishcream__65883__93458.1407759865.1280.1280.jpg?c=2 [02:51:14] * Krenair will be more careful in future anyway. that query really sucked and I should have come up with a better one before giving it to a live db to work on [02:51:58] springle: think it has 1. areas for tuning / improvement, 2. you’ll have any time for them? [02:52:28] !log l10nupdate Synchronized php-1.26wmf2/cache/l10n: (no message) (duration: 04m 38s) [02:52:37] Logged the message, Master [02:52:38] Krenair: sql.php takes a --slave argument, so you can use db1055 for enwiki [02:52:43] YuviPanda: cool tnx. have to brb, but will review it today [02:52:50] springle: :) [02:53:00] eval.php could just use 'vslow' in getConnection() [02:53:09] anyway, it probably did cause the lag anway [02:53:58] right. 
I definitely chose a slave to run it on, it may not have been the best one to run it on [02:54:21] heh [02:54:45] I was looking at the lag for it around the time and didn't see anything :/ [02:56:11] !log LocalisationUpdate completed (1.26wmf2) at 2015-04-16 02:55:08+00:00 [02:56:18] Logged the message, Master [02:56:57] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60631 bytes in 2.780 second response time [02:59:28] (PS1) Aaron Schulz: Use "groupLoadsBySection" for enwiki for consistency, just like s4 [mediawiki-config] - https://gerrit.wikimedia.org/r/204443 [03:01:49] (CR) Yuvipanda: [C: 2] tools: Remove tomcat node definitions from puppet [puppet] - https://gerrit.wikimedia.org/r/193561 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda) [03:07:04] !log Updated iegreview to e126f7c (Fix aggregated reports to work on the new reviews system) [03:07:11] Logged the message, Master [03:07:15] Who's blazecat? [03:07:36] superm401: cheesecat is blazecat [03:08:06] no burritocat is blazecat [03:08:16] *cat == aaron [03:08:47] Is this anti-ping technology?
[03:13:54] (PS1) GWicke: Fix hiera lookup [puppet] - https://gerrit.wikimedia.org/r/204448 [03:14:11] bd808: WP:BEANS [03:17:28] RECOVERY - nova-compute process on labvirt1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [03:18:28] * gwicke looks through the docs for reverse prefix irc nick autocompletion [03:18:55] suffix even [03:20:16] (CR) Krinkle: "Maintained at https://github.com/wikimedia/integration-docroot/tree/master/org/wikimedia/doc and https://github.com/wikimedia/integration-" [puppet] - https://gerrit.wikimedia.org/r/204202 (owner: Dzahn) [03:21:37] (CR) Krinkle: "Since the pages these redirects handle are in that repo, it made sense to manage the redirects there as well since otherwise atomic deploy" [puppet] - https://gerrit.wikimedia.org/r/204202 (owner: Dzahn) [03:22:07] (PS2) GWicke: Fix hiera lookup for cache::text::node [puppet] - https://gerrit.wikimedia.org/r/204448 [03:22:28] PROBLEM - nova-compute process on labvirt1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [03:22:32] 2015-04-15 08:26:29 mw1170 enwiki: Query affected 71642 row(s): [03:22:34] query-m: INSERT IGNORE INTO `watchlist` (wl_user,wl_namespace,wl_title,wl_notificationtimestamp) VALUES ('X',NULL) [TRX#0db58bb99fba] [03:23:09] boy someone is having fun with SpecialEditWatchlist [03:28:16] springle: why oh why don't we limit watchlist size? :( [03:29:06] blazecat: limit it! [03:29:31] (PS3) GWicke: Fix hiera lookup for cache::text::node [puppet] - https://gerrit.wikimedia.org/r/204448 [03:29:37] !log legoktm Synchronized php-1.26wmf1/includes/DefaultSettings.php: The 'spambot_username' message is a reserved username (duration: 00m 12s) [03:29:45] Logged the message, Master [03:30:22] gwicke: what's the deal? hiera doesn't like fully-qualified variables?
[03:32:58] (CR) Ori.livneh: "@gwicke: I think it happens here: https://github.com/wikimedia/operations-puppet/blob/production/modules/wmflib/lib/hiera/backend/nuyaml_b" [puppet] - https://gerrit.wikimedia.org/r/204448 (owner: GWicke) [03:33:28] PROBLEM - puppet last run on cp3031 is CRITICAL puppet fail [03:34:27] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0] [03:36:35] (PS1) Ori.livneh: hiera mwyaml backend: strip leading '::' from key names [puppet] - https://gerrit.wikimedia.org/r/204452 [03:36:55] (CR) Ori.livneh: "Maybe instead?" [puppet] - https://gerrit.wikimedia.org/r/204448 (owner: GWicke) [03:45:57] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [03:46:01] ori: I'm not sure what the deal is really [03:46:36] let's try https://gerrit.wikimedia.org/r/#/c/204448/ , because there are other lookup keys with a leading '::' [03:47:07] so we might as well try to fix it comprehensively, either by getting rid of all instances of leading '::', or by applying the same normalization [03:47:19] but yeah, this nu_yaml / mw_yaml is confusing and poorly-documented [03:47:33] I've had problems with '::whatever' not working in hiera lookups in mw-vagrant [03:47:53] the thing I'm wondering most about is where the config for production lives [03:48:26] grepping for the full key or parts of it didn't seem to turn up much [03:48:28] hiera is just bad [03:49:36] I can't say that I'm in love with it either [03:49:57] ok, so which is it -- normalize in both backends, or strip all leading colons from keys? [03:50:28] That var would go in this file I think -- https://github.com/wikimedia/operations-puppet/blob/production/hieradata/role/common/cache/text.yaml [03:51:02] nope...
here it is -- https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/cache/text.yaml [03:51:06] bd808: thing is though, this is referenced in production [03:51:27] RECOVERY - puppet last run on cp3031 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [03:51:47] so either it's broken in prod, or there's something else going on that makes it work magically [03:52:54] there are different backends in use [03:53:06] one munges keys to strip leading '::' (nuyaml) [03:53:09] one does not (mwyaml) [03:53:15] the former is used in production; the latter in labs [03:53:27] ;/ [03:53:40] why oh why? [03:53:40] owyaml [03:53:49] and _j.oe_ wrote both right? [03:54:03] yes [03:54:12] could we use the same in both labs & prod? [03:54:30] if we're going to talk trash, we might as well ping _joe_ so he has a chance to defend himself [03:54:33] mwyaml is the one that reads from wikitech for labs [03:54:35] it doesn't strike me as a great idea to test with one, then use another in prod [03:54:45] no, it's an awful idea [03:54:56] using different backends [03:55:03] I think labs uses both actually [03:55:21] but mwyaml first and nuyaml second [03:55:58] but, unless mwyaml does some drastic key munging I still don't see how this would work in prod [03:55:59] https://github.com/wikimedia/operations-puppet/blob/21c72942dd7bf25dbe0759d2f867082e966bfb45/modules/puppetmaster/files/labs.hiera.yaml#L1-L3 [03:56:23] Sounds like mwyaml just needs to be fixed to strip the leading :: [03:56:33] that's what i did in my patch [03:56:42] but i think it's a bad idea, actually [03:57:11] for one, i was lazy and didn't make both backends delegate to the same munging function, so there is still breakage waiting to happen when we change one and not the other [03:57:27] grepping for text::nodes only turns up one definition, the one for labs [03:57:36] and it's another magical wmf idiosyncracy that makes testing hard [03:57:56] so i think we're better off 
with gwicke's initial idea of stripping the leading colons, but it should be done to all keys [03:58:23] manifests/role/analytics/kafkatee.pp:81: $mobile_nodes = hiera('::cache::mobile::nodes') [03:58:23] modules/role/manifests/cache/mobile.pp:3: $mobile_nodes = hiera('::cache::mobile::nodes') [03:58:23] modules/role/manifests/cache/parsoid.pp:3: $parsoid_nodes = hiera('::cache::parsoid::nodes') [03:58:25] modules/role/manifests/cache/text.pp:3: $text_nodes = hiera('::cache::text::nodes') [03:58:27] modules/role/manifests/cache/upload.pp:3: $upload_nodes = hiera('::cache::upload::nodes') [03:58:29] I do wonder why we can't get rid of the second backend [03:59:55] we should be a lot less flexible in general [04:00:01] !log legoktm Synchronized php-1.26wmf2/includes/DefaultSettings.php: The 'spambot_username' message is a reserved username (duration: 00m 11s) [04:00:09] Logged the message, Master [04:00:18] i'm not sold on hiera even when the decision procedure for determining a value for a particular host is transparent [04:00:52] doubly so when it is so labyrinthine [04:02:48] so we shouldn't use hiera and we shouldn't use $::realm conditionals? [04:03:19] seems like that will make beta/prod config differences impossible [04:04:21] I kind of like the way hierarchical configs ('group_vars') are done in ansible [04:04:50] http://docs.ansible.com/intro_inventory.html#groups-of-groups-and-group-variables [04:07:05] use hiera but use the hieradata/ filesystem hierarchy as the exclusive backend [04:07:34] rather than allowing regex host-matching rules, delegating to the mediawiki api, and all that fancy nonsense [04:07:50] yeah, and also allow everyone with a project +2 on ops/puppet [04:08:14] well, everyone who is a projectadmin [04:08:46] well, ok, look up values in hieradata/ and in a project-specific location [04:08:53] like, wikitech? 
[04:08:55] that's already quite elaborate, but still better [04:09:05] no, use the filesystem [04:09:06] project specific location where? [04:09:10] filesystem on which host? [04:09:28] most projects don’t have their own puppetmaster [04:10:32] labs / prod hiera differences need fixing, but mwyaml isn’t the problem... [04:10:45] lack of ‘role’ equivalent is one [04:10:48] if your threshold for when complexity is worthwhile is set so low, you'll always be fighting mountains of slop [04:10:50] yeah [04:11:08] ori: I’m happy to hear alternative solutions to ‘ContentHandler on Wikitech' [04:11:41] lack of ‘role’ equivalent, the fact that labs starts from scratch than just overriding the stuff defined in prod another [04:11:49] pandacat: git repo [04:12:11] automatically created per project? with +2 on gerrit to the owners? [04:12:19] single one for all of labs [04:12:23] with +2 on gerrit to the owners [04:12:27] ‘the owners’? [04:12:47] of what? we can’t hand out +2 for tools’ hiera to people who maintain, say, the xtools project [04:13:20] again, our problems aren’t wikitech :) [04:13:22] actually [04:13:30] just make it a local filesystem path [04:13:41] if you want to override a setting for multiple hosts in your project, do it on each [04:13:43] can't there be a fork or branch per project? [04:13:46] or get a project puppetmaster [04:13:52] gwicke: i thought of that, but that's getting elaborate again [04:13:57] a per-project puppetmaster is immense amount of complexity. 
[04:14:22] copying a file to local_hieradata for five hosts, on the other hand, is not a lot of work [04:14:26] pandacat: it's simple compared to also having to set up a salt master for trebuchet ;/ [04:14:31] well, patches welcome [04:14:36] gwicke: that’s trebuchet’s fault [04:14:56] it also bugs me slgihtly that labs is just considered as ‘that place we test prod stuff on’ ignoring the fact there’s an ecosystem of people who use just labs [04:15:00] but that’s a rant for another day [04:15:28] ok [04:15:45] I can say ‘so copy this file via some means to all your hosts / set up your own puppetmaster / your own salt master’ is not a valid way for most people who have their own projects. [04:15:57] (for some definition of valid) [04:16:03] I do hope that we can eventually eliminate the labs / prod distinction by using the same tech for both [04:16:22] pandacat: no, it's "go edit /etc/puppet/hiera_overrides.yaml" [04:16:30] ori: on all your 14 machines [04:17:03] ok, how many settings (a) are applied project-wide, (b) on a project with 14 machines, (c) that doesn't have a local puppet master? [04:17:08] how big is that intersection? [04:17:34] Tools for example. 
Or halfak's revscoring project [04:17:58] submit a patch to operations/puppet then [04:18:03] that's two cases [04:18:12] not the end of the world [04:18:18] Easy to say :) [04:18:45] i'm not doubting that the way things are set up does not provide some convenience value [04:18:56] i'm doubting that it's enough value to warrant the complexity [04:19:24] Tim L and valhallasw are admins in good standing of tools who have no access to +2 on ops puppet [04:19:50] Anyway I think hiera has bigger problems than mwyaml [04:19:53] they've been perfectly happy to wait on an ops merge for adding tools to the software set [04:20:05] For some definition of happy [04:20:15] well, ok [04:20:30] all i'm saying is: at the end of the day you can only maintain so much code [04:20:32] choose wisely [04:20:56] I think mwyaml being involved in this discussion is a red herring [04:21:12] And frustrations elsewhere are are being channeled here. [04:21:54] On that note I'll go eat and sleep [04:22:02] different lookup behaviors for hiera vars are a problem for a staging-like testing scenario [04:22:30] Solution is to unify them and I have no disagreements :) they are literally two files now... [04:22:39] Large room for improvement there [04:22:45] Where the yaml comes from is immaterial [04:24:48] heading out as well, see you all tomorrow! [04:24:55] Cya [04:58:08] PROBLEM - puppet last run on cp3039 is CRITICAL puppet fail [05:10:48] !log disabling puppet on virt1000 so that I can prevent a questionable cron (purging tokens from the keystone db) from running while I sleep. [05:11:25] morebots: what gives? [05:12:40] I am a logbot running on tools-exec-09. [05:12:40] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [05:12:40] To log a message, type !log . [05:14:01] !log disabling puppet on virt1000 so that I can prevent a questionable cron (purging tokens from the keystone db) from running while I sleep. 
[05:14:27] RECOVERY - puppet last run on cp3039 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [05:15:17] PROBLEM - check_puppetrun on boron is CRITICAL Puppet has 1 failures [05:15:45] welp [05:20:17] PROBLEM - check_puppetrun on boron is CRITICAL Puppet has 1 failures [05:25:17] PROBLEM - check_puppetrun on boron is CRITICAL Puppet has 1 failures [05:30:17] PROBLEM - check_puppetrun on boron is CRITICAL Puppet has 1 failures [05:35:17] PROBLEM - check_puppetrun on boron is CRITICAL Puppet has 1 failures [05:39:30] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Apr 16 05:38:27 UTC 2015 (duration 38m 26s) [05:40:17] PROBLEM - check_puppetrun on boron is CRITICAL Puppet has 1 failures [05:45:17] PROBLEM - check_puppetrun on boron is CRITICAL Puppet has 1 failures [05:50:17] PROBLEM - check_puppetrun on boron is CRITICAL Puppet has 1 failures [05:55:17] PROBLEM - check_puppetrun on boron is CRITICAL Puppet has 1 failures [06:00:17] PROBLEM - check_puppetrun on boron is CRITICAL Puppet has 1 failures [06:05:17] PROBLEM - check_puppetrun on boron is CRITICAL Puppet has 1 failures [06:07:59] PROBLEM - puppet last run on labsdb1003 is CRITICAL puppet fail [06:10:17] RECOVERY - check_puppetrun on boron is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:25:58] RECOVERY - puppet last run on labsdb1003 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [06:28:57] PROBLEM - puppet last run on mw1126 is CRITICAL puppet fail [06:30:28] PROBLEM - puppet last run on cp4001 is CRITICAL puppet fail [06:30:38] PROBLEM - puppet last run on pc1002 is CRITICAL puppet fail [06:30:58] PROBLEM - puppet last run on db1040 is CRITICAL Puppet has 1 failures [06:30:58] PROBLEM - puppet last run on ms-fe2001 is CRITICAL Puppet has 1 failures [06:33:47] PROBLEM - puppet last run on cp3010 is CRITICAL Puppet has 1 failures [06:34:48] PROBLEM - puppet last run on mw1092 is CRITICAL Puppet has 1 
failures [06:35:08] PROBLEM - puppet last run on analytics1030 is CRITICAL Puppet has 1 failures [06:35:27] PROBLEM - puppet last run on mw1123 is CRITICAL Puppet has 1 failures [06:35:38] PROBLEM - puppet last run on mw2079 is CRITICAL Puppet has 1 failures [06:36:08] PROBLEM - puppet last run on mw2011 is CRITICAL Puppet has 1 failures [06:36:18] PROBLEM - puppet last run on mw2123 is CRITICAL Puppet has 1 failures [06:36:18] PROBLEM - puppet last run on mw2003 is CRITICAL Puppet has 1 failures [06:36:38] PROBLEM - puppet last run on mw1213 is CRITICAL Puppet has 1 failures [06:45:39] RECOVERY - puppet last run on db1040 is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:45:47] RECOVERY - puppet last run on ms-fe2001 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [06:46:18] RECOVERY - puppet last run on mw1092 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [06:46:38] RECOVERY - puppet last run on analytics1030 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:46:49] RECOVERY - puppet last run on mw1126 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [06:46:57] RECOVERY - puppet last run on mw1123 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:46:57] RECOVERY - puppet last run on pc1002 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures [06:47:08] RECOVERY - puppet last run on mw2079 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:47:48] RECOVERY - puppet last run on mw2123 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:48] RECOVERY - puppet last run on mw2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:08] RECOVERY - puppet last run on mw1213 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:28] RECOVERY - puppet last run on cp4001 is OK Puppet is 
currently enabled, last run 27 seconds ago with 0 failures [06:48:29] RECOVERY - puppet last run on cp3010 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:49:18] RECOVERY - puppet last run on mw2011 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [07:02:24] (PS1) Giuseppe Lavagetto: cache: change the label of the hiera lookup [puppet] - https://gerrit.wikimedia.org/r/204461 [07:03:06] <_joe_> bblack: ^^ [07:05:43] I hate whitespace changes mixed in with real ones btw :P [07:07:41] <_joe_> bblack: ach it's whitespace-mode in emacs [07:08:32] I also hate emacs! :) [07:08:54] <_joe_> well, I use the corresponding plugin in vim too [07:08:59] lol [07:09:17] <_joe_> but it's to protect me from my laziness [07:09:30] <_joe_> most of our files are already clean [07:09:58] of course if gerrit sucked less at displaying whitespace changes.... [07:09:59] * YuviPanda uses intelliJ for Puppet too [07:10:41] it has a very nice vim emulation plugin [07:10:46] not as good as evil mode tho [07:11:14] (CR) BBlack: [C: 1] "verified via "git-review -d 204461; git show -w"" [puppet] - https://gerrit.wikimedia.org/r/204461 (owner: Giuseppe Lavagetto) [07:16:03] (CR) Giuseppe Lavagetto: [C: -2] "I think this is wrong. You can find 100 ways to do this via hiera (but I'd use stored configs, tbh), like parametrizing the directory wher" [puppet] - https://gerrit.wikimedia.org/r/204331 (owner: Chad) [07:19:29] (CR) Giuseppe Lavagetto: [C: 2] "sorry for the whitespace changes :(" [puppet] - https://gerrit.wikimedia.org/r/204461 (owner: Giuseppe Lavagetto) [07:33:17] (PS1) Mjbmr: Rename project namespace for tewikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/204464 (https://phabricator.wikimedia.org/T89332) [07:43:54] oh men it is already the morning!
[07:48:49] (PS2) Muehlenhoff: Add a systemd unit file (Bug: T95055) [debs/ircecho] - https://gerrit.wikimedia.org/r/204054 [07:53:47] (CR) Muehlenhoff: "Thanks, I've updated the Phab reference in the correct format." [debs/ircecho] - https://gerrit.wikimedia.org/r/204054 (owner: Muehlenhoff) [07:57:04] (PS2) Muehlenhoff: * Simplify package build, also the stepping stone for adding a systemd unit file (Bug: T95055) [debs/ircecho] - https://gerrit.wikimedia.org/r/204045 [08:10:47] (PS1) Mjbmr: Add draft namespace for fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/204467 (https://phabricator.wikimedia.org/T92760) [08:16:16] (PS2) Mjbmr: Add draft namespace for fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/204467 (https://phabricator.wikimedia.org/T92760) [08:20:20] (PS1) KartikMistry: Beta: Enable Greek (el) and Zulu (zu) [puppet] - https://gerrit.wikimedia.org/r/204469 [08:27:50] !log swift ms-be10[678] to weight 2450 [08:28:19] sigh, no morebots?
[08:30:28] (CR) Filippo Giunchedi: [C: 1] Add a systemd unit file (Bug: T95055) [debs/ircecho] - https://gerrit.wikimedia.org/r/204054 (owner: Muehlenhoff) [08:31:04] (CR) Filippo Giunchedi: [C: 1] * Simplify package build, also the stepping stone for adding a systemd unit file (Bug: T95055) [debs/ircecho] - https://gerrit.wikimedia.org/r/204045 (owner: Muehlenhoff) [08:32:08] (PS2) Filippo Giunchedi: eventlogging: adjust counters thresholds [puppet] - https://gerrit.wikimedia.org/r/204237 (https://phabricator.wikimedia.org/T90111) [08:32:13] (CR) Filippo Giunchedi: [C: 2 V: 2] eventlogging: adjust counters thresholds [puppet] - https://gerrit.wikimedia.org/r/204237 (https://phabricator.wikimedia.org/T90111) (owner: Filippo Giunchedi) [08:33:47] PROBLEM - puppet last run on mw2048 is CRITICAL puppet fail [08:34:49] (PS6) Filippo Giunchedi: graphite: introduce carbon-c-relay [puppet] - https://gerrit.wikimedia.org/r/181080 (https://phabricator.wikimedia.org/T85908) [08:49:04] operations, Graphite: scale graphite deployment (tracking) - https://phabricator.wikimedia.org/T85451#1211741 (fgiunchedi) [08:49:06] operations, Graphite, Patch-For-Review: replace txstatsd - https://phabricator.wikimedia.org/T90111#1211739 (fgiunchedi) Open>Resolved this is completed, txstatsd has been replaced by statsite [08:50:34] RECOVERY - puppet last run on mw2048 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures [08:52:21] (CR) Giuseppe Lavagetto: [C: -1] "I already made a change fixing all occurrences, not just for text."
[puppet] - 10https://gerrit.wikimedia.org/r/204448 (owner: 10GWicke) [08:54:37] 6operations, 7Graphite: scale graphite deployment (tracking) - https://phabricator.wikimedia.org/T85451#947062 (10fgiunchedi) [08:54:38] 6operations, 7Graphite: scale statsd reporting/aggregation (plan) - https://phabricator.wikimedia.org/T89857#1211744 (10fgiunchedi) 5Open>3stalled stalling this for now, getting graphite more reliable is a priority on T85451 [08:54:40] 6operations, 10RESTBase: Investigate apparent restbase request rate under-reporting in graphite: statsd issue? - https://phabricator.wikimedia.org/T89846#1211749 (10fgiunchedi) [09:04:19] (03CR) 10Alexandros Kosiaris: [C: 032] Beta: Enable Greek (el) and Zulu (zu) [puppet] - 10https://gerrit.wikimedia.org/r/204469 (owner: 10KartikMistry) [09:18:55] 6operations: Encrypted password storage - https://phabricator.wikimedia.org/T96130#1211758 (10MoritzMuehlenhoff) Sure, we should merge it, then. I did some digging in existing bugs, but couldn't find anything. I can't access the existing https://phabricator.wikimedia.org/T83410 , though?. It claims to be a restr... [09:19:48] (03PS1) 10Mjbmr: Add autopatrolled, patroller and rollbacker user groups for svwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) [09:37:14] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me (and tested on precise and trusty)" [puppet] - 10https://gerrit.wikimedia.org/r/185321 (owner: 10Dzahn) [09:38:27] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 5Continuous-Integration-Isolation, and 2 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1211784 (10hashar) @fgiunchedi and I have just finished a 1/1. I have further tweaked the debian files for Precise and... 
[09:51:16] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me (and tested on trusty)" [puppet] - 10https://gerrit.wikimedia.org/r/185325 (owner: 10Dzahn) [09:52:45] (03CR) 10Dereckson: [C: 04-1] "Change looks good to me." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204467 (https://phabricator.wikimedia.org/T92760) (owner: 10Mjbmr) [09:57:14] (03CR) 10Dereckson: [C: 04-1] "It seems according the diff you edited the +swwiki block as well." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [10:02:51] (03PS3) 10Dereckson: User rights configuration on ne.wikipedia - Reviewer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203323 (https://phabricator.wikimedia.org/T95101) [10:06:55] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me (and tested on precise and trusty)" [puppet] - 10https://gerrit.wikimedia.org/r/185329 (owner: 10Dzahn) [10:13:13] mhhh logins on wikitech not working, still keystone related andrewbogott_afk _joe_ akosiaris ? [10:15:00] Can repro. "Wikitech uses cookies to log in users. You have cookies disabled. Please enable them and try again. [10:15:03] " [10:15:34] (03CR) 10Mjbmr: [C: 031] "They have to be alphabetical order, swwiki is after all sv projects, I fixed it." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [10:19:30] I can try bouncing keystone but didn't follow yesterday's investigation/fix [10:20:30] (03CR) 10Mjbmr: "Read the discussion at https://fa.wikipedia.org/wiki/%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%D9%82%D9%87%D9%88%D9%87%E2" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204467 (https://phabricator.wikimedia.org/T92760) (owner: 10Mjbmr) [10:20:52] (03CR) 10Mjbmr: [C: 031] Add draft namespace for fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204467 (https://phabricator.wikimedia.org/T92760) (owner: 10Mjbmr) [10:21:12] !log bounce keystone on virt1000 [10:21:24] that didn't do much afaict [10:26:08] (03CR) 10Dereckson: "So your change does two things:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [10:31:21] (03CR) 10Dereckson: "As indicated on Phabricator, we would like a confirmation from the community itself." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204467 (https://phabricator.wikimedia.org/T92760) (owner: 10Mjbmr) [10:32:54] (03CR) 10Mjbmr: "Yep, they provided a link on the ticket." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204467 (https://phabricator.wikimedia.org/T92760) (owner: 10Mjbmr) [10:32:56] (03CR) 10Mjbmr: "I don't know where you come from, I've never see you before, please don't put your unusable discussion here." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [10:33:26] (03CR) 10Mjbmr: Add draft namespace for fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204467 (https://phabricator.wikimedia.org/T92760) (owner: 10Mjbmr) [10:33:52] godog: yeah tried that already [10:33:59] I am not sure what's the issue tbh [10:34:06] (03CR) 10Giuseppe Lavagetto: [C: 031] "this also makes use of the servicegroup and hostgroup directives in icinga, as an added plus." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/204274 (owner: 10Filippo Giunchedi) [10:34:13] and the error message is really not helpful [10:34:19] me neither :sadface: [10:34:58] (03PS2) 10Filippo Giunchedi: restbase: add ganglia cluster [puppet] - 10https://gerrit.wikimedia.org/r/204274 [10:35:04] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] restbase: add ganglia cluster [puppet] - 10https://gerrit.wikimedia.org/r/204274 (owner: 10Filippo Giunchedi) [10:38:23] (03CR) 10Dereckson: "Hmmmm... community didn't validate this translation, which is a question from from Glaisher since March 19th." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204464 (https://phabricator.wikimedia.org/T89332) (owner: 10Mjbmr) [10:41:36] (03CR) 10Mjbmr: "Please read more about MediaWiki, there is a already NS_PROJECT_TALK defined for this project in MessagesTe.php." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204464 (https://phabricator.wikimedia.org/T89332) (owner: 10Mjbmr) [10:43:36] (03PS1) 10Steinsplitter: Whitelisting *.dropbox.com for GWToolset upload. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204485 [10:44:36] jenkins down? [10:46:03] (03PS2) 10Steinsplitter: Whitelisting *.dropbox.com for GWToolset upload. Requested by Nicolas Rück (WMDE) via IRC. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/204485 [10:53:46] (03PS2) 10Mjbmr: Add autopatrolled, patroller and rollbacker user groups for svwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) [10:55:14] (03CR) 10Mjbmr: "I don't need to learn from you, you don't know me. You better think of what asked for." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [10:55:27] (03CR) 10Dereckson: "This is a strange question, this "I don't know where you come from" especially during a review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [11:05:22] not sure if anyone is able to fix wikitech's login issues https://phabricator.wikimedia.org/T96240#1211906 [11:11:15] (03PS1) 10Filippo Giunchedi: restbase: add ganglia aggregators [puppet] - 10https://gerrit.wikimedia.org/r/204488 [11:11:53] <_joe_> sDrewth: I'm taking a look [11:12:07] thx so much _joe_ [11:12:29] it is at least hours ago, but less than 12 hours ago that there has been whatever change [11:12:38] at least 8 hours [11:12:47] (03CR) 10Giuseppe Lavagetto: [C: 031] restbase: add ganglia aggregators [puppet] - 10https://gerrit.wikimedia.org/r/204488 (owner: 10Filippo Giunchedi) [11:13:05] hmm, maybe I exaggerate, at least 6+ [11:13:20] 6operations, 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211923 (10Krenair) [11:13:32] <_joe_> sDrewth: if it's a systems failure, I can try to fix it. If this is due to some software release, though, I'm not sure I can help you [11:13:43] k [11:14:49] 6operations, 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211926 (10Joe) We already restarted keystone earlier to no effect I'm looking around to find any specific error. 
[11:15:16] 6operations, 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211928 (10Joe) p:5Triage>3Unbreak! a:3Joe [11:15:54] (03CR) 10Mjbmr: "I don't need to know you, you should check my past commits before telling me how to commit." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [11:18:54] 6operations: Encrypted password storage - https://phabricator.wikimedia.org/T96130#1211929 (10MoritzMuehlenhoff) a:3MoritzMuehlenhoff [11:21:33] 6operations, 5Interdatacenter-IPsec: Update 3.19 kernel to 3.19.4 - https://phabricator.wikimedia.org/T96146#1211936 (10MoritzMuehlenhoff) a:3MoritzMuehlenhoff [11:22:31] 6operations, 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211938 (10Krenair) In case you don't already know about it there are troubleshooting instructions at https://wikitech.wikimedia.org/wiki/Wikitech I usually hide the logging behi... [11:24:20] (03PS1) 10MaxSem: Package jetty-runner [debs/jetty-runner] - 10https://gerrit.wikimedia.org/r/204489 [11:31:11] PROBLEM - puppet last run on mw2050 is CRITICAL Puppet has 1 failures [11:32:37] (03CR) 10Tim Landscheidt: Beta: Enable Greek (el) and Zulu (zu) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/204469 (owner: 10KartikMistry) [11:33:19] <_joe_> is wikitech extremely slow for everyone or just me? 
[11:33:59] _joe_: it has been quite slow for most of the morning for me [11:34:33] _joe_: gotta restart keystone on virt1000 I guess [11:34:36] it is a recurring issue [11:34:48] <_joe_> hashar: the guys did that, and changed nothing [11:34:56] bah :/ [11:39:11] PROBLEM - puppet last run on mw2129 is CRITICAL puppet fail [11:47:42] RECOVERY - puppet last run on mw2050 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:57:06] (03CR) 10Aklapper: "Mjbmr: Please read https://wikimediafoundation.org/wiki/Code_of_conduct_policy . Your previous comments sound disrespectful to me." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [11:57:21] RECOVERY - puppet last run on mw2129 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:58:46] !log restarted apache on silver, wikitech login seems to work again [11:58:50] Logged the message, Master [12:00:22] 6operations, 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211951 (10Krenair) a:5Joe>3Krenair ```krenair@silver:/srv/mediawiki/wmf-config$ sudo /usr/sbin/apache2ctl restart krenair@silver:/srv/mediawiki/wmf-config$ ``` Seems to wor... [12:00:31] 6operations, 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211953 (10Krenair) 5Open>3Resolved [12:01:05] 6operations, 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211954 (10Joe) I checked/restarted keystone, I checked all the error logs I could think of (both locally and on fluorine), restarted the local failing nutcracker service on silv... 
[12:03:36] _joe_, actually if you check the link you posted, you'll see group 1 didn't get changed yesterday [12:04:04] 1.26wmf1 hit group 1 wikis the day before [12:08:30] <_joe_> Krenair: oh I got confused, that page is a bit full of info [12:08:49] <_joe_> Krenair: still, I can't find any log anywhere telling me what's not working [12:08:52] yeah it's probably not too obvious if you don't follow deployments too closely [12:09:04] thx gentlemen [12:09:18] <_joe_> sDrewth: thank me if I fix it :) [12:09:31] <_joe_> which as of now, I don't seem to be able to do [12:09:46] we (read: labs people) should probably find a proper solution to it [12:09:55] because right now we are using the "turn it off and back on again" method [12:10:19] <_joe_> Krenair: logging what gave an error to mediawiki would help, for starters [12:11:03] 6operations, 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211962 (10Joe) Maybe this is obvious enough, but is everyone having problems logging in using the 2 factor auth? [12:12:25] 6operations, 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211963 (10Krenair) Nope... Works for me [12:12:50] I figure it's something to do with cookies and sessions [12:14:41] <_joe_> sDrewth: can you login? [12:14:55] <_joe_> because I just logged in [12:15:12] (03PS3) 10Alex Monk: Whitelisting *.dropbox.com for GWToolset upload. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204485 (owner: 10Steinsplitter) [12:16:15] (03CR) 10Mjbmr: "Calling me a noob is not disrespectful?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [12:16:26] yes _joe_ , now to get the phone set up as I tried to fathom all afternoon about its "stupid" browser settings [12:16:30] (03CR) 10Alex Monk: "Previously these have been justified by saying they are only domains which host content almost entirely usable on commons... The same is n" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204485 (owner: 10Steinsplitter) [12:17:34] Mjbmr: please stop picking fights [12:18:01] 6operations, 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211967 (10Joe) So, it seems memcached was the problem. Restarting nutcracker solved the issue afterall. [12:18:49] 6operations, 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211969 (10MoritzMuehlenhoff) No, I don't use 2FA, in my case Krenair's Apache restart fixed it for me. [12:28:56] (03CR) 10Glaisher: "Does this need to be scheduled?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195886 (https://phabricator.wikimedia.org/T92376) (owner: 10Nemo bis) [12:33:15] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] restbase: add ganglia aggregators [puppet] - 10https://gerrit.wikimedia.org/r/204488 (owner: 10Filippo Giunchedi) [12:34:31] (03CR) 10Aklapper: "Mjbmr: Dereckson verbosely and clearly explained which further changes are required to the patch. For no obvious reason, you get personal " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [12:34:48] (03CR) 10Alex Monk: "Probably just registered for a swat deployment or something. Maybe check with Greg. 
Actually, if you're unsure whether something needs to " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195886 (https://phabricator.wikimedia.org/T92376) (owner: 10Nemo bis) [12:35:43] !log uploaded etherpad-lite_1.4.1-2 on apt.wikimedia.org [12:35:48] Logged the message, Master [12:36:12] (03CR) 10Mjbmr: "You thought I don't know how to use git?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [12:37:24] (03CR) 10Glaisher: "Greg: Is SWAT enough to deploy this?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195886 (https://phabricator.wikimedia.org/T92376) (owner: 10Nemo bis) [12:42:52] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0] [12:49:00] (03CR) 10Aklapper: "@Mjbmr: I explicitly refer to your communication style like "don't put your unusable discussion here." or "I don't need to learn from you"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [12:52:43] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [12:55:29] (03CR) 10Mjbmr: "You're saying I shouldn't defend my work when I did nothing wrong and someone say I did wrong? is that it? or you're trying to stop people" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [12:56:24] <_joe_> #betterThanPopcornTime [13:01:00] PROBLEM - Disk space on zirconium is CRITICAL: DISK CRITICAL - free space: / 266 MB (2% inode=40%) [13:01:02] #headthunk [13:02:40] <_joe_> looking ^^ [13:02:40] RECOVERY - Disk space on zirconium is OK: DISK OK [13:02:44] <_joe_> uhm [13:02:58] <_joe_> the etherpad host [13:03:09] yeah, apt-cache [13:03:15] <_joe_> I guess you have something to do with it :) [13:03:17] I did a dist-upgrade [13:04:11] speaking of etherpad, could we please add a listing?
[13:04:27] (listing of available etherpads) [13:04:39] all of them ? [13:04:45] they must be thousands [13:04:49] if not more [13:05:32] I had done a survey of a couple of etherpad plugins sometime ago that supposedly did that automagically [13:05:54] one would not install at all [13:06:03] :/ [13:06:04] the other clearly said "DO NOT USE THIS IN PRODUCTION" [13:06:09] haha [13:06:27] there is an bugzilla/phab task about it [13:06:29] lemme find it [13:07:20] https://phabricator.wikimedia.org/T32240 [13:07:40] cool thnx [13:07:48] it's getting a star from me right away [13:07:49] :P [13:08:26] ah closed already [13:13:35] (03CR) 10Aklapper: "You can (and should) defend your work by argumenting *technically* instead of personally. Your choice of words is problematic. That is ent" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [13:16:05] (03CR) 10Hashar: [C: 031] integration: move redirect out of .htaccess [puppet] - 10https://gerrit.wikimedia.org/r/204202 (owner: 10Dzahn) [13:18:07] (03CR) 10Alex Monk: [C: 04-1] Add AffCom user group application contact page on meta (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204205 (https://phabricator.wikimedia.org/T95789) (owner: 10Alex Monk) [13:18:14] (03CR) 10Glaisher: Add AffCom user group application contact page on meta (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204205 (https://phabricator.wikimedia.org/T95789) (owner: 10Alex Monk) [13:22:08] (03CR) 10Mjbmr: "So next time I'll comment on your patches and show you how to use git, how does that sound like?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [13:22:14] hrm ^^ [13:22:41] Krenair: wikibugs has left us [13:22:58] hmm [13:23:03] not sure if that's a bot I can kick [13:23:18] I have grrrit-wm and ecmabot-wm, IIRC [13:24:06] YuviPanda: ping [13:24:25] legoktm and YuviPanda are probably sleeping given it's 6:30 in the morning there [13:24:35] twentyafterfour, ....? [13:25:19] I wonder if we have flood protection on wikibugs [13:25:27] I think it's like 8:30 for him [13:25:44] I'll bet there happened to be a surge of bug changes and that caused something [13:27:21] the bot seems to have rejoined only -labs [13:27:52] and collab [13:29:00] Krenair: well can you ping YuviPanda and legoktm when they're around. I'm heading out [13:29:22] * Krenair waves [13:29:37] (03PS1) 10Giuseppe Lavagetto: cache: mock change to show use of hiera [puppet] - 10https://gerrit.wikimedia.org/r/204504 [13:29:49] <_joe_> bblack: ^^ [13:30:10] PROBLEM - check_puppetrun on payments1001 is CRITICAL puppet fail [13:30:26] <_joe_> bblack: of course the same thing can be done for most things you have in there [13:30:46] springle: still up? [13:30:55] payments1001 looking... [13:32:57] (03CR) 10Chad: "I'm not sure I'd call a static file perfectly fine since it doesn't work outside of prod :)" [puppet] - 10https://gerrit.wikimedia.org/r/204331 (owner: 10Chad) [13:33:29] (03CR) 10Aklapper: "That sounds like I would appreciate your review comments to make me follow patch conventions and use Git effectively. 
"Assume people mean " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [13:35:10] PROBLEM - check_puppetrun on indium is CRITICAL puppet fail [13:35:11] RECOVERY - check_puppetrun on payments1001 is OK Puppet is currently enabled, last run 225 seconds ago with 0 failures [13:35:20] PROBLEM - check_puppetrun on tellurium is CRITICAL puppet fail [13:35:45] (03CR) 10Giuseppe Lavagetto: "file { "/etc/dsh/group/$title":" [puppet] - 10https://gerrit.wikimedia.org/r/204331 (owner: 10Chad) [13:35:52] garg, for some reason the puppet agent croaked on the last run [13:36:13] <^d> _joe_: That'll work just fine ^ [13:36:20] i think it tried to restart itself because of whitespace cleanup in the config file [13:36:24] * ^d shall amend today [13:36:36] akosiaris: did you have a chance to circle back on my cert issue from yesterday? (If not then I’ll take another stab) [13:38:51] andrewbogott: yeah, I know what to do, I am missing though some info [13:39:02] the key for the cert to be precise [13:39:32] (03CR) 10Mjbmr: "Now, you're trying to tell me how to live, you better tell people when someone did nothing wrong, don't tell them you did something wrong." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [13:40:10] RECOVERY - check_puppetrun on indium is OK Puppet is currently enabled, last run 179 seconds ago with 0 failures [13:40:13] oh man this is getting better and better ^ [13:40:20] RECOVERY - check_puppetrun on tellurium is OK Puppet is currently enabled, last run 161 seconds ago with 0 failures [13:40:35] akosiaris: on palladium, /root/private/files/ssl/labvirt-star.eqiad.wmnet.key [13:40:46] which is actually a copy of /root/private/files/ssl/virt-star.eqiad.wmnet.key [13:41:23] andrewbogott: yes [13:41:44] hashar: zuul is on precise-wikimedia [13:41:51] (03PS1) 10BBlack: remove ancient+unused varnish::packages from base role [puppet] - 10https://gerrit.wikimedia.org/r/204506 [13:41:53] (03PS1) 10BBlack: move prof perf tweaks to separate include [puppet] - 10https://gerrit.wikimedia.org/r/204507 [13:41:55] springle: ok, I just sent you a series of (probably) incoherent emails. The upshot is, trying to run even a limited query times out. [13:42:03] andrewbogott: ok thanks, will update that change then [13:42:11] springle: e.g. ‘DELETE FROM token WHERE NOW() - INTERVAL 2 day > expires LIMIT 100;’ [13:42:28] springle: but, I should have asked ‘are you up and working?’ because this is not an emergency. [13:42:30] (03CR) 10BBlack: [C: 032 V: 032] remove ancient+unused varnish::packages from base role [puppet] - 10https://gerrit.wikimedia.org/r/204506 (owner: 10BBlack) [13:42:39] andrewbogott: you'll have to leave it alone for a while: ---TRANSACTION 34103A95, ACTIVE 41925 sec recovered trx ROLLING BACK [13:42:55] springle: ah, ok. Where is that? [13:42:56] nothing to be done. 
it needs to finish rolling back something big, I guess the first DELETE [13:43:09] SHOW ENGINE INNODB STATUS [13:44:13] springle: ok thanks [13:45:12] (03CR) 10BBlack: [C: 032] move prof perf tweaks to separate include [puppet] - 10https://gerrit.wikimedia.org/r/204507 (owner: 10BBlack) [13:45:23] andrewbogott: what started the downward slide I don't know. For now, let it clean up after those two 12h transactions. Then consider batching the delete [13:45:38] or adding more resources [13:45:43] or both :) [13:45:46] springle: yeah, makes sense. [13:46:07] In theory more resources shouldn’t be needed; if cleanups happen ongoingly then the tokens table should only have a few thousand entries. [13:46:18] fair enough [13:46:23] We only have 5000 users [13:46:41] there will be a sweet spot for the cron frequency [13:47:18] 5000 users each with one 30-day token, that means there should only be <200 records to delete each day. [13:47:27] Unless something is seriously broken or I’m misunderstanding how this works. [13:47:48] But I will baby it in the meantime. Once I can run queries at all [13:48:02] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1212093 (10Anomie) Since the comments on the gerrit change seem to not be having any real attention paid: I (and apparently also @MaxSem) think that "/api/v1/"... [13:49:45] godog: great! [13:50:24] godog: I am merging in the related Gerrit change https://gerrit.wikimedia.org/r/#/c/195272/ against debian/precise-wikimedia [13:50:32] have you had any issue building it ? [13:53:06] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 5Continuous-Integration-Isolation, and 2 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1212105 (10hashar) Package has been built out of https://gerrit.wikimedia.org/r/#/c/195272/ patchset 19 ``` $ apt-cach...
[13:53:09] (03CR) 10Aklapper: "If you don't interpret help as help but instead as commands, and if you reply to recommendations with a "Nobody should tell me what to do!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204473 (https://phabricator.wikimedia.org/T93339) (owner: 10Mjbmr) [13:58:42] 6operations: bacula restore job waiting on higher jobs - https://phabricator.wikimedia.org/T95705#1212111 (10akosiaris) Seems like the problem was bacula-sd on palladium locking up. bacula-sd has had a lot of problems in the past with threads locking up quite often. Most have been fixed but maybe we stumbled acr... [14:00:53] andrewbogott: my (brief and not thorough) understanding of the tokens thing the other day was that it would get a token for every operation, not just one per user [14:01:25] bblack: ah, that would better account for the gigantic leakage [14:04:21] PROBLEM - puppet last run on mw2102 is CRITICAL puppet fail [14:06:16] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 5Continuous-Integration-Isolation, and 2 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1212127 (10hashar) The Trusty version has been build out of https://gerrit.wikimedia.org/r/#/c/197329/ patchset 7 ``` a... 
[14:06:45] 6operations: bacula restore job waiting on higher jobs - https://phabricator.wikimedia.org/T95705#1212128 (10fgiunchedi) sounds good, I've kicked off the restore again [14:08:08] (03PS4) 10Filippo Giunchedi: Ensure that apt preferences are named *.pref [puppet] - 10https://gerrit.wikimedia.org/r/195081 (https://phabricator.wikimedia.org/T60681) (owner: 10Tim Landscheidt) [14:08:17] (03PS1) 10BBlack: move $mmaN down to 2layer where it belongs [puppet] - 10https://gerrit.wikimedia.org/r/204513 [14:08:19] (03PS1) 10BBlack: s/inheritance/include/ for role::cache::base [puppet] - 10https://gerrit.wikimedia.org/r/204514 [14:08:21] (03PS1) 10BBlack: s/inheritance/include/ for role::cache::1layer [puppet] - 10https://gerrit.wikimedia.org/r/204515 [14:08:36] (03PS4) 10Alex Monk: Add AffCom user group application contact page on meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204205 (https://phabricator.wikimedia.org/T95789) [14:09:52] (03CR) 10jenkins-bot: [V: 04-1] s/inheritance/include/ for role::cache::base [puppet] - 10https://gerrit.wikimedia.org/r/204514 (owner: 10BBlack) [14:14:05] (03CR) 10JanZerebecki: [C: 031] sshd: don't use NIST key exchange protocols [puppet] - 10https://gerrit.wikimedia.org/r/185321 (owner: 10Dzahn) [14:22:31] RECOVERY - puppet last run on mw2102 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:22:32] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Ensure that apt preferences are named *.pref [puppet] - 10https://gerrit.wikimedia.org/r/195081 (https://phabricator.wikimedia.org/T60681) (owner: 10Tim Landscheidt) [14:30:56] (03PS4) 10Filippo Giunchedi: statsite: default to localhost, override as needed [puppet] - 10https://gerrit.wikimedia.org/r/204275 [14:32:11] 6operations, 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1212172 (10Andrew) The keystone database is jam packed with expired tokens -- many of these issues seem to 
be resulting from resulting keystone slowdowns. I'm working on cleanin... [14:34:29] thanks andrewbogott [14:41:12] I am off see you tomorrow [14:45:25] (03CR) 10Mjbmr: [C: 04-1] Provide URLs for licenses mentioned in Phabricator footer [puppet] - 10https://gerrit.wikimedia.org/r/202708 (https://phabricator.wikimedia.org/T94946) (owner: 10Aklapper) [14:46:04] (03PS2) 10BBlack: move $mmaN down to 2layer where it belongs [puppet] - 10https://gerrit.wikimedia.org/r/204513 [14:46:06] (03PS2) 10BBlack: s/inheritance/include/ for role::cache::1layer [puppet] - 10https://gerrit.wikimedia.org/r/204515 [14:46:08] (03PS2) 10BBlack: s/inheritance/include/ for role::cache::base [puppet] - 10https://gerrit.wikimedia.org/r/204514 [14:46:23] where is jouncebot [14:46:39] not logged in to IRC at all, apparently [14:47:09] (03CR) 10jenkins-bot: [V: 04-1] s/inheritance/include/ for role::cache::1layer [puppet] - 10https://gerrit.wikimedia.org/r/204515 (owner: 10BBlack) [14:47:15] (03CR) 10jenkins-bot: [V: 04-1] s/inheritance/include/ for role::cache::base [puppet] - 10https://gerrit.wikimedia.org/r/204514 (owner: 10BBlack) [14:49:13] Dereckson, want to do those newiki changes in this swat window? [14:51:45] (03PS1) 10Andrew Bogott: Change the keystone token cleanup cron. [puppet] - 10https://gerrit.wikimedia.org/r/204519 [14:52:35] (03CR) 10Mjbmr: [C: 04-1] "Please see https://www.mediawiki.org/wiki/Manual:Coding_conventions#General_style" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203335 (https://phabricator.wikimedia.org/T95103) (owner: 10Dereckson) [14:52:41] (03PS2) 10Andrew Bogott: Change the keystone token cleanup cron. [puppet] - 10https://gerrit.wikimedia.org/r/204519 [14:54:16] (03CR) 10Tim Landscheidt: "Mille grazie! 
"git log -p 21f1fdcc834d593ee3dddcc61d35ce115620f696..9e95f828a0859f26710c0abfc290d720921e390c" unfortunately showed that m" [puppet] - 10https://gerrit.wikimedia.org/r/195081 (https://phabricator.wikimedia.org/T60681) (owner: 10Tim Landscheidt) [14:54:20] (03PS3) 10Andrew Bogott: Change the keystone token cleanup cron. [puppet] - 10https://gerrit.wikimedia.org/r/204519 [14:54:56] (03CR) 10Alexandros Kosiaris: [C: 04-1] "a first round of comments" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/181080 (https://phabricator.wikimedia.org/T85908) (owner: 10Filippo Giunchedi) [14:55:29] (03CR) 10Andrew Bogott: [C: 032] Change the keystone token cleanup cron. [puppet] - 10https://gerrit.wikimedia.org/r/204519 (owner: 10Andrew Bogott) [14:56:02] (03CR) 10Mjbmr: [C: 04-1] "You must submit separate patches for unrelated edits such as fixing commas." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203323 (https://phabricator.wikimedia.org/T95101) (owner: 10Dereckson) [14:57:41] PROBLEM - puppet last run on virt1000 is CRITICAL Puppet last ran 10 hours ago [14:59:21] RECOVERY - puppet last run on virt1000 is OK Puppet is currently enabled, last run 48 seconds ago with 0 failures [15:01:31] (no jouncebot, morning swat window open) [15:01:51] (03CR) 10Alex Monk: [C: 032] Follow-up I7cf8a614: Clean up lists of BZ numbers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203861 (owner: 10Alex Monk) [15:01:59] (03Merged) 10jenkins-bot: Follow-up I7cf8a614: Clean up lists of BZ numbers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203861 (owner: 10Alex Monk) [15:02:03] (03PS5) 10Thcipriani: Add submodules to master checkoutMediaWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204080 (https://phabricator.wikimedia.org/T88442) [15:02:28] (03CR) 10JanZerebecki: [C: 031] "Looks good. We could still add the alternative for older versions (see inline)." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/185325 (owner: 10Dzahn) [15:03:47] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/203861/ - should be a no-op, just config file cleanup (duration: 00m 13s) [15:03:57] Logged the message, Master [15:05:01] (03CR) 10Andrew Bogott: [C: 032] Have sink create ldap host entries. [puppet] - 10https://gerrit.wikimedia.org/r/202582 (owner: 10Andrew Bogott) [15:05:52] (03CR) 10Alex Monk: [C: 032] "No unrelated changes, looks fine to me." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203323 (https://phabricator.wikimedia.org/T95101) (owner: 10Dereckson) [15:06:01] (03Merged) 10jenkins-bot: User rights configuration on ne.wikipedia - Reviewer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203323 (https://phabricator.wikimedia.org/T95101) (owner: 10Dereckson) [15:06:21] (03PS3) 10Alex Monk: User rights configuration on ne.wikipedia - Abusefilter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203333 (https://phabricator.wikimedia.org/T95102) (owner: 10Dereckson) [15:09:00] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/203323 (duration: 00m 12s) [15:09:01] (03CR) 10Mjbmr: "You've submitted this patch on top of your other patch which is not merged, you can use following command to reset your local branch:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203333 (https://phabricator.wikimedia.org/T95102) (owner: 10Dereckson) [15:09:06] Logged the message, Master [15:09:35] (03CR) 10Alex Monk: "No, I assure you it's merged :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203333 (https://phabricator.wikimedia.org/T95102) (owner: 10Dereckson) [15:09:44] <_joe_> hey all of you [15:09:55] <_joe_> please stop this. [15:11:02] (03CR) 10JanZerebecki: [C: 031] "Looks good." 
[puppet] - 10https://gerrit.wikimedia.org/r/185329 (owner: 10Dzahn) [15:11:06] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1212302 (10GWicke) @anomie, my understanding is that there are no plans to version the action api, which is why I believe `/api/action/` could work well. We'll... [15:11:37] (03CR) 10Alex Monk: [C: 04-1] "There appears to be another issue though - this group doesn't appear to exist?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203333 (https://phabricator.wikimedia.org/T95102) (owner: 10Dereckson) [15:11:37] what's wrong _joe_? [15:13:05] (03CR) 10Mjbmr: "Right one or two minutes before my comment you merged it, but I'm trying to help him how to use git." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203333 (https://phabricator.wikimedia.org/T95102) (owner: 10Dereckson) [15:13:31] Mjbmr: Really, can you stop that? [15:13:34] (03CR) 10Mjbmr: [C: 04-1] User rights configuration on ne.wikipedia - Abusefilter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203333 (https://phabricator.wikimedia.org/T95102) (owner: 10Dereckson) [15:14:04] (03CR) 10Chad: "Yeah we won't be able to vary just on realm, as staging/beta will have different values here (hence my preference for hiera)." [puppet] - 10https://gerrit.wikimedia.org/r/204331 (owner: 10Chad) [15:23:17] (03CR) 10Mjbmr: [C: 04-1] New user groups on fr.wikinews (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198822 (https://phabricator.wikimedia.org/T90979) (owner: 10Dereckson) [15:24:30] <^d> Mjbmr: If you're going to go around and nitpick -1s all over the place because you're mad tell me now so I can just block you from gerrit and be done with it. [15:25:33] (03CR) 10Alex Monk: "I disagree with Mjbmr, I think the placing of the comments is fine, but we should add a space between the "//" and the "T"." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/198822 (https://phabricator.wikimedia.org/T90979) (owner: 10Dereckson) [15:25:48] (03PS1) 10Ottomata: Add oozie queue that uses DRF scheduling mode [puppet] - 10https://gerrit.wikimedia.org/r/204527 [15:26:20] (03CR) 10Ottomata: [C: 032 V: 032] Add oozie queue that uses DRF scheduling mode [puppet] - 10https://gerrit.wikimedia.org/r/204527 (owner: 10Ottomata) [15:27:06] (03PS1) 10coren: Put mysql db on tmpfs for role::ci::slave::labs [puppet] - 10https://gerrit.wikimedia.org/r/204528 [15:27:13] Krinkle: ^^ [15:29:24] Krinkle: This does have the side effect that mysql-server won't start until the initial puppet run on reboot, because that's the only way for the permissions of /var/lib/mysql to be set first [15:29:32] (03CR) 10Mjbmr: "People will ignore it and add new things in the same array, so you won't understand by which ticket was added." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198822 (https://phabricator.wikimedia.org/T90979) (owner: 10Dereckson) [15:30:34] Coren: Ah interesting. The mysql-server is available like that? [15:31:02] Krinkle: It relies on the default (debian) provider - it works because 'service X start' works. [15:31:11] But I just noticed a missing require [15:31:33] I'm decoding the exec() [15:32:03] ^d, https://gerrit.wikimedia.org/r/#/c/204203/ [15:32:03] So the puppet 'creates' passes both if it's a directory and if it's a mount [15:32:03] (03PS2) 10coren: Put mysql db on tmpfs for role::ci::slave::labs [puppet] - 10https://gerrit.wikimedia.org/r/204528 [15:32:03] it's not even nitpicking [15:32:25] it's just completely wrong [15:32:40] Coren: Cool. [15:33:42] Coren: So to deploy. (or rather, try on one instance first). I should... stop mysql server, clear /var/lib/dir, create the mount, and then start the server (it will populate the directory? or re-run mysql-install). I think Tim was saying something about that yesterday. 
[15:33:46] Krinkle: The creates stanza just short-circuits the exec is there is an object of that name in the filesystem, it doesn't care what it is. [15:33:56] Nice [15:34:07] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1212410 (10Anomie) >>! In T95229#1212302, @GWicke wrote: > and lets us present documentation for each api at /api/{name}/. You changed the name of restbase to... [15:34:49] Krinkle: I don't think the directory will be populated by default - if you delete the dir and run puppet, mysql-server will fail to start but everything will be ready for a myql-install [15:34:55] (03CR) 10jenkins-bot: [V: 04-1] Put mysql db on tmpfs for role::ci::slave::labs [puppet] - 10https://gerrit.wikimedia.org/r/204528 (owner: 10coren) [15:35:09] Hm. Typo? [15:35:13] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1212424 (10GWicke) >>! In T95229#1212410, @Anomie wrote: > You changed the name of restbase to "v1" now? No, 'v1' is the name of the API we expose currently. R... [15:35:44] Missing comma [15:35:56] (03PS3) 10coren: Put mysql db on tmpfs for role::ci::slave::labs [puppet] - 10https://gerrit.wikimedia.org/r/204528 [15:37:39] Krinkle: So we should also puppetize the mysql-install since we know the tmpfs will be empty at every boot. [15:37:41] Coren: OK. Comparing to contint::tmpfs (which antoine wrote, we use that in /home/jenkins-deploy/tmpfs, assigned to env TMPDIR for any tmp files created during the build, wiped afterwards) [15:37:59] I guess we'll need size=. I don't know what the other options do though. [15:38:06] options => "noatime,defaults,size=${size},mode=1777", [15:38:24] andrewbogott: please look at https://gerrit.wikimedia.org/r/#/c/195567/ again [15:38:35] Coren: Thanks :) [15:38:39] Krinkle: size= is usful if you want to limit it. 
mode=1777 makes it a "true" sticky temporary dir which you don't want [15:39:16] Coren: Ah, right. world writable? [15:39:24] <^d> !log restarting gerrit [15:39:30] Yes, and with sticky bit [15:39:44] Logged the message, Master [15:39:46] <_joe_> ^d: aw in the middle of a review!!!1! [15:40:46] <^d> I'm sorry [15:40:46] <^d> It's back now [15:40:46] <_joe_> nah nevermind [15:40:46] <_joe_> now as a retaliation I'll engage in a !log war [15:40:46] "now you ruined it" [15:40:46] <^d> Er, coming back [15:41:33] Coren: Hm.. I notice that yours uses device=>none which makes sense. The other one uses device=tmpfs. I guess that's weird, right? [15:41:38] <_joe_> gerrit is java, right? [15:42:28] <_joe_> in that case, I'll take a break [15:42:28] <^d> Yes [15:42:28] <^d> It's back [15:42:28] <^d> Mjbmr: I've set your account to inactive in gerrit, you will not be able to login. Multiple people asked you to stop. [15:42:28] Krinkle: Actually, the device on a pseudofilesystem is just ignored. I used the traditional 'none' but any string works. [15:42:28] Oh, ha [15:43:04] was gerrit 503? [15:43:21] <^d> kart_: Briefly, had to restart [15:45:28] <_joe_> gerrit-wm is dead too [15:45:34] yeah :/ [15:45:59] <^d> it runs on tool labs with a redis queue [15:46:03] Krinkle: Hm, odd, I can't seem to find mysql_install_db in the mariadb 10 packages. [15:46:35] Coren: `which mysql_install_db` does resolve on the integration slave. [15:46:40] just checked [15:46:53] Hm. You're using 5 though right? [15:47:00] mysql Ver 14.14 Distrib 5.5.41, for debian-linux-gnu (x86_64) using readline 6.3 [15:47:08] Hm.. not MariaDB? [15:47:16] Oh, that too. [15:47:24] You use 'mysql-server' not 'mariadb-server'. :-) [15:47:28] sorry [15:47:45] Yeah. I wouldn't mind switching to mariadb if that's easier though [15:47:47] No, it's just me being silly. I just *saw* the package {} stanza minutes ago. [15:47:58] ^d: can't let me login [15:49:49] ok. I forgot. How to reset Gerrit password? 
[15:49:50] Is it same as Wikitech? [15:49:50] Krinkle: What's the full pathname? [15:49:50] <_joe_> kart_: yes [15:49:50] _joe_: thanks! [15:49:56] Coren: /var/lib/mysql. I'm aiming for 256M for now. [15:50:09] It's never used more than 90M. [15:50:29] oh sorry, that's not what you meant [15:50:34] Krinkle: I mean, the path to mysql_install_db :-) [15:50:37] ; /usr/bin/mysql_install_db [15:50:55] Yeah, just realised. [15:51:12] * Krinkle points to the jQuery logo on his shirt :P [15:55:30] RECOVERY - Host mw2128 is UPING OK - Packet loss = 0%, RTA = 43.47 ms [15:57:11] 6operations, 10MediaWiki-extensions-Graph, 6Services, 10service-template-node, 7service-runner: Deploy graphoid service into production - https://phabricator.wikimedia.org/T90487#1060320 (10Yurik) [15:58:55] Krinkle: ^^ new and improved [16:00:00] akosiaris, is there anything blocking this? https://phabricator.wikimedia.org/T94984 [16:00:15] sorry, wrong nick (( [16:01:38] jouncebot is gone? [16:01:40] :) [16:04:21] Coren: Thanks. Now for the fun part :) [16:04:31] (where you get to laugh :D) [16:05:05] can someone poke jouncebot? [16:06:12] !log starting to run forceRenameUsers.php (SUL finalization) [16:06:18] Logged the message, Master [16:11:24] 6operations, 10ops-codfw: mw2128 not rebooting after network driver crash, blank console - https://phabricator.wikimedia.org/T95264#1212610 (10Papaul) 5Open>3Resolved a:3Papaul Har Joe add the new NIC MAC address int he dhcp files. Did the installation, the system is back up and running. [16:12:40] legoktm: CX deployment will start soon, hope that it won't intrupt on SUL finalization :) [16:13:09] waiting to merge patches for submodule update. [16:14:09] kart_: nope, thats fine [16:14:09] cool. thanks. 
[16:18:18] 6operations, 10MediaWiki-extensions-Graph, 6Services, 10service-template-node, 7service-runner: Deploy graphoid service into production - https://phabricator.wikimedia.org/T90487#1212629 (10Yurik) a:5Yurik>3akosiaris [16:19:37] !log kartik Started scap: Update ContentTranslation [16:19:42] Logged the message, Master [16:28:21] PROBLEM - puppet last run on wtp2018 is CRITICAL puppet fail [16:28:47] sth is wrong with gerrit? [16:28:57] "Add Reviewer" button is not there [16:29:07] <^d> Is for me... [16:29:14] <^d> Logged in? [16:29:31] <^d> Gerrit had to get restarted a bit ago, sessions were reset. [16:29:31] oh.. got logged out somehow [16:29:35] ah [16:34:51] PROBLEM - HHVM rendering on mw1181 is CRITICAL - Socket timeout after 10 seconds [16:34:51] 10Ops-Access-Requests, 6operations: Give joal access to eventlog1001.eqiad.wmnet - https://phabricator.wikimedia.org/T95905#1212682 (10kevinator) Joseph will need rights to access **hafnium** as well [16:34:51] PROBLEM - Apache HTTP on mw1181 is CRITICAL - Socket timeout after 10 seconds [16:35:28] 10Ops-Access-Requests, 6operations: Give joal access to eventlog1001.eqiad.wmnet - https://phabricator.wikimedia.org/T95905#1212684 (10kevinator) [16:39:18] 10Ops-Access-Requests, 6operations: Give joal access to eventlog1001.eqiad.wmnet - https://phabricator.wikimedia.org/T95905#1212695 (10Tnegrin) approved [16:42:29] !log kartik Finished scap: Update ContentTranslation (duration: 22m 13s) [16:42:30] Logged the message, Master [16:43:50] PROBLEM - NTP on mw2128 is CRITICAL: NTP CRITICAL: No response from NTP server [16:44:08] !log csteipp Synchronized w: (no message) (duration: 00m 11s) [16:44:11] Logged the message, Master [16:45:00] RECOVERY - puppet last run on wtp2018 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:45:01] PROBLEM - HHVM busy threads on mw1181 is CRITICAL 40.00% of data above the critical threshold [115.2] [16:45:22] <^d> _joe_: Actually the more I look at it, I 
don't like keeping dsh lists as plain text files [16:45:42] <^d> There really isn't a clean way to do it that wouldn't require staging to check in prod puppet changes every time we change a dest host. [16:46:00] !log removed oauth-headers.php since that allowed stealing httponly cookies [16:46:10] Logged the message, Master [16:48:40] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [16:50:09] <^d> Ugh, that's what we do right now for beta :( [16:52:02] ^d: yeah :( It wouldn't be hard to make scap read the list of hosts from somewhere else if we had a place for it that was automatically generated and updated [16:52:12] 6operations, 10ops-codfw: rack/wire/initial setup of db2043-db2070 - https://phabricator.wikimedia.org/T89368#1212737 (10RobH) It seems the DRAC isn't setup, please advise when done, as we'll start the installs on these. [16:52:45] <^d> bd808: I was looking at stuffing them in hiera but it was ugly [16:52:58] <^d> https://gerrit.wikimedia.org/r/#/c/204331/, for reference [16:53:08] etcd ;) [16:53:49] or salt I guess [16:53:53] * bd808 shivvers [16:56:18] <^d> I mean we could easily do the if ( prod ) { use file } else { fetch from hiera } [16:56:28] <^d> But those kinds of distinctions are exactly what I'm trying to kill :) [16:57:29] PROBLEM - HHVM queue size on mw1181 is CRITICAL 44.44% of data above the critical threshold [80.0] [16:57:30] 6operations, 10ops-codfw: rack/wire/initial setup of db2043-db2070 - https://phabricator.wikimedia.org/T89368#1212761 (10Papaul) @robh working on that [17:03:40] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60627 bytes in 7.143 second response time [17:06:56] 6operations, 10Wikimedia-Logstash, 10hardware-requests: eqiad: (3) servers for logstash service - https://phabricator.wikimedia.org/T84958#1212802 (10RobH) The order for these systems has been placed on https://rt.wikimedia.org/Ticket/Display.html?id=9199. 
The ETA as of now is 2015-04-24 (Friday) for delive... [17:10:20] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [17:18:21] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60627 bytes in 0.089 second response time [17:31:07] Hm.. no gerrit bot? [17:32:17] <^d> Yeah it did earlier with jouncebot [17:32:22] <^d> Or from the gerrit restart [17:33:06] Krenair: there is a fix [17:33:13] jouncebot was dead first [17:33:27] grrrit-wm is here but not relaying changes for some reason [17:33:47] I think the redis stream probably died with the gerrit restart, and I don't know if I have access to kick that [17:34:02] Nikerabbit, for the content translation thing? [17:34:28] Krenair: https://gerrit.wikimedia.org/r/#/c/204551/ [17:34:46] yeah I saw :) [17:36:05] Nikerabbit, it looks OK as long as you're never basing it on an existing revision. If the text is based on an existing revision you should set oldid [17:38:36] YuviPanda, legoktm, marktraceur, and others can restart gerrit-to-redis [17:40:23] hmm what's the sources.list syntax to use debian/experimental from our mirror? [17:41:54] grrrit-wm: ping [17:42:25] Is jenkins known to be broken? Or is there just a big backlog? [17:43:07] looks like a big backlog andrewbogott [17:43:11] see https://integration.wikimedia.org/zuul/ [17:43:18] ok, I will seize the moment and get lunch [17:43:38] It could be both of course :) [17:44:07] hmm [17:44:28] how hard would it be to add more $realms besides prod and labs? i guess since beta and staging have different values for things they are actually different realms and "labs" isnt specific enough? [17:46:15] I think that's the idea with hiera, to allow for more $realms [17:47:15] except that right now, we kinda treat $realm as a binary value for prod/labs. It might be easier to use a new name than to track down all the != 'production' sorts of use-cases. I donno. 
[17:47:26] well, interestingly this was re: a discussion on gerrit where it's said moving a static file to hiera is wrong, in this specific case [17:47:41] where? [17:47:49] https://gerrit.wikimedia.org/r/#/c/204331/ [17:49:00] yeah, I don't know the specifics on what our plan is for "many $realms" in pragmatic hiera terms yet [17:49:40] 3 realms would feel natural, beta -> staging -> prod [17:49:42] I guess a big part of it is "custom hiera data provider" [17:50:22] in a discussion last night, Yuvi said in the long term they basically want $realms_as_a_service . Like, deploy a new testing $realm of new unique name in a matter of hours for a new project to use :) [17:51:02] oh, that sounds advanced :) [17:53:19] <^d> bblack: That's the idea with staging. See https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/nodes/labs/staging.yaml so far [17:53:59] creator of realms [17:57:08] apergos: let's enable the proxy for /htmldumps or not yet? [17:57:31] RECOVERY - nova-compute process on labvirt1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [18:02:40] PROBLEM - nova-compute process on labvirt1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [18:10:30] 6operations, 10MediaWiki-Debug-Logging, 10hardware-requests, 5Patch-For-Review: Fluorine needs bigger disks - https://phabricator.wikimedia.org/T92417#1213188 (10RobH) I'm going to pull the hardware-requests project off this, and update the subject of the task accordingly. 
[18:10:51] 6operations, 10MediaWiki-Debug-Logging, 5Patch-For-Review: Investigation if Fluorine needs bigger disks or we retain too much data - https://phabricator.wikimedia.org/T92417#1213189 (10RobH) [18:11:01] Krenair: grrrit-wm no longer depends on gerrrit to redis [18:11:08] In fact nothing depends on that [18:11:59] 6operations, 10MediaWiki-extensions-Sentry, 6Multimedia, 10hardware-requests, 3Multimedia-Sprint-2015-03-25: Procure hardware for Sentry - https://phabricator.wikimedia.org/T93138#1213191 (10RobH) p:5Lowest>3Normal [18:12:17] YuviPanda, huh [18:12:45] YuviPanda, please update https://wikitech.wikimedia.org/wiki/Grrrit-wm [18:13:34] Boop yes [18:33:27] 6operations, 10Parsoid, 6Services: Lets consider upgrading our Node.js installs to io.js (once decent Debian packages are ready) - https://phabricator.wikimedia.org/T91855#1213324 (10Ricordisamoa) [18:36:30] 6operations, 10Parsoid, 6Services: Lets consider upgrading our Node.js installs to io.js (once decent Debian packages are ready) - https://phabricator.wikimedia.org/T91855#1213348 (10GWicke) I think it's worth considering the upgrade for our Jessie installs. That would avoid affecting any older services incl... [18:38:16] !log creating "Maintenance script" account on all SUL wikis for globaluserpage [18:38:22] Logged the message, Master [18:38:41] PROBLEM - Host labvirt1001 is DOWN: PING CRITICAL - Packet loss = 100% [18:40:50] !log rebooting labvirt1001 [18:40:54] Logged the message, Master [18:41:47] RECOVERY - Host labvirt1001 is UPING OK - Packet loss = 0%, RTA = 0.57 ms [18:43:27] oops, somehow i managed to mass add a ton of reviewers to a patch without meaning too :) [18:43:48] i think i hit a group name by autocomplete [18:49:36] 6operations: Encrypted password storage - https://phabricator.wikimedia.org/T96130#1213364 (10akosiaris) Just 2 cents from my (limited) effort on this. I have evaluated http://www.passwordstore.org/ two times. 
The first one was on version 1.4 or something and I found it unsuitable for the job. Back then it had... [18:50:11] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 35.71% of data above the critical threshold [500.0] [18:52:12] 6operations: Encrypted password storage - https://phabricator.wikimedia.org/T96130#1213366 (10akosiaris) [18:52:12] 6operations: implement GPG based password sharing solution - https://phabricator.wikimedia.org/T83410#1213365 (10akosiaris) [18:52:17] 6operations: implement GPG based password sharing solution - https://phabricator.wikimedia.org/T83410#913554 (10akosiaris) [18:53:58] !log rebooting labvirt100x to turn on virtualization in bios [18:54:04] Logged the message, Master [18:54:21] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 11 data above and 2 below the confidence bounds [18:54:47] 503 on bits [18:55:03] 6operations: Encrypted password storage - https://phabricator.wikimedia.org/T96130#1213378 (10Dzahn) >>! In T96130#1211758, @MoritzMuehlenhoff wrote: > I did some digging in existing bugs, but couldn't find anything. I can't access the existing https://phabricator.wikimedia.org/T83410 , though?. It claims to be... [18:55:38] (not always) [18:56:22] PROBLEM - Host labvirt1001 is DOWN: PING CRITICAL - Packet loss = 100% [18:56:55] 503's not only on bits [18:57:11] PROBLEM - Host labvirt1004 is DOWN: PING CRITICAL - Packet loss = 100% [18:57:11] PROBLEM - Host labvirt1003 is DOWN: PING CRITICAL - Packet loss = 100% [18:57:11] PROBLEM - Host labvirt1002 is DOWN: PING CRITICAL - Packet loss = 100% [18:57:11] PROBLEM - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100% [18:57:11] PROBLEM - Host labvirt1006 is DOWN: PING CRITICAL - Packet loss = 100% [18:57:47] http://wklej.org/id/1689779/ [18:58:39] Hi [18:58:48] Wikisource a bit so-so right now... 
[18:58:53] (UK users) [18:58:57] everything is so-so [18:59:07] and getting worse [18:59:26] ah, the lack of css for en.wp isn't just me then. [18:59:31] jgage: ^ [18:59:36] 503 on main site too [18:59:37] stuff is down [18:59:39] region? [18:59:44] europe [18:59:48] europe [18:59:49] UK for me, bblack [18:59:56] Backend problem [19:00:02] cp3007 cp3013 [19:00:48] I see a serious QPS on db1072 and db1073 [19:00:58] https://gdash.wikimedia.org/dashboards/reqerror/ <- is awful too [19:01:32] thanks for the reports everyone, we are investigating [19:02:09] <_joe_> ok seems we're back [19:02:16] <_joe_> it was /not/ a backend issue [19:02:30] PROBLEM - puppet last run on cp3044 is CRITICAL puppet fail [19:02:33] which backend is backend in that statement? [19:02:51] <_joe_> bblack: appservers, people looking at databases may suggest that :) [19:03:08] dberrors might be unrelated [19:03:25] Getting a lot of can't connect to DB stuff [19:03:28] but not that much [19:04:39] I'll come back later you seem busy [19:05:58] still seeing errors [19:06:21] RECOVERY - Host labvirt1005 is UPING OK - Packet loss = 0%, RTA = 1.43 ms [19:06:21] RECOVERY - Host labvirt1002 is UPING OK - Packet loss = 0%, RTA = 1.78 ms [19:06:21] RECOVERY - Host labvirt1003 is UPING OK - Packet loss = 0%, RTA = 2.72 ms [19:06:21] RECOVERY - Host labvirt1006 is UPING OK - Packet loss = 0%, RTA = 1.65 ms [19:06:30] RECOVERY - Host labvirt1004 is UPING OK - Packet loss = 0%, RTA = 2.89 ms [19:09:07] no grrrit-wm? 
[19:10:12] PROBLEM - nova-compute process on labvirt1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [19:10:21] RECOVERY - Host labvirt1001 is UPING OK - Packet loss = 0%, RTA = 4.64 ms [19:10:31] PROBLEM - nova-compute process on labvirt1004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [19:10:41] PROBLEM - nova-compute process on labvirt1006 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [19:10:41] PROBLEM - nova-compute process on labvirt1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [19:11:11] PROBLEM - nova-compute process on labvirt1005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [19:15:51] !log depooling esams, network issues [19:15:56] Logged the message, Master [19:16:03] hey a working bot! [19:16:13] No grrrit-wm for some reason. Yesterday's gerrit restart maybe? 
[19:16:41] I'll poke from office (stuck in Bart for 15mins now, 'medical emergency') [19:19:30] RECOVERY - puppet last run on cp3044 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [19:22:32] ACKNOWLEDGEMENT - nova-compute process on labvirt1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute andrew bogott nova-compute and libvirtd broken pending https://phabricator.wikimedia.org/T96291 [19:22:32] ACKNOWLEDGEMENT - nova-compute process on labvirt1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute andrew bogott nova-compute and libvirtd broken pending https://phabricator.wikimedia.org/T96291 [19:22:32] ACKNOWLEDGEMENT - nova-compute process on labvirt1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute andrew bogott nova-compute and libvirtd broken pending https://phabricator.wikimedia.org/T96291 [19:22:32] ACKNOWLEDGEMENT - nova-compute process on labvirt1004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute andrew bogott nova-compute and libvirtd broken pending https://phabricator.wikimedia.org/T96291 [19:22:32] ACKNOWLEDGEMENT - nova-compute process on labvirt1005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute andrew bogott nova-compute and libvirtd broken pending https://phabricator.wikimedia.org/T96291 [19:22:33] ACKNOWLEDGEMENT - nova-compute process on labvirt1006 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute andrew bogott nova-compute and libvirtd broken pending https://phabricator.wikimedia.org/T96291 [19:25:01] andrewbogott: assuming you ? 
^ [19:25:11] PROBLEM - puppet last run on cp3046 is CRITICAL puppet fail [19:25:12] yep [19:32:32] mutante: fire when ready, I say [19:32:48] but I would disable puppet on ds1001 first, test on ms1001 [19:32:50] FIRE ZE MISSILES [19:33:06] then if there's some stupid thing I did you can catch it without any harm [19:37:42] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [19:37:50] 6operations: Encrypted password storage - https://phabricator.wikimedia.org/T96130#1213480 (10chasemp) >>! In T96130#1211758, @MoritzMuehlenhoff wrote: > Sure, we should merge it, then. I did some digging in existing bugs, but couldn't find anything. I can't access the existing https://phabricator.wikimedia.org/... [19:40:25] 6operations: Encrypted password storage - https://phabricator.wikimedia.org/T96130#1213481 (10chasemp) p:5Triage>3High [19:43:41] RECOVERY - puppet last run on cp3046 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures [19:44:51] PROBLEM - puppet last run on nescio is CRITICAL puppet fail [19:44:52] !log Updated jobqueue:aggregator:s-wikis:v2 key on 10.64.32.76 to $wgLocalDatabases (sans labswiki) [19:44:59] Logged the message, Master [19:46:16] andrewbogott: https://gerrit.wikimedia.org/r/204612 [19:46:44] I also have a half hearted attempt to set the ca in there. I suppose it should work but I haven't tested it [19:47:08] (03CR) 10Hashar: "random comments mostly to clarify." (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/204528 (https://phabricator.wikimedia.org/T96230) (owner: 10coren) [19:47:14] (03CR) 10BBlack: [C: 031] "testing grrrit-wm" [puppet] - 10https://gerrit.wikimedia.org/r/204545 (owner: 10BBlack) [19:47:18] woot [19:48:32] (03CR) 10Yuvipanda: "The template seems overkill, but why not just put the data itself in hiera and just construct the output via inline_template or something?" 
[puppet] - 10https://gerrit.wikimedia.org/r/204331 (owner: 10Chad) [19:48:51] PROBLEM - puppet last run on cp3016 is CRITICAL puppet fail [19:49:09] \o/ wb grrrit-wm [19:49:11] PROBLEM - puppet last run on mw2104 is CRITICAL Puppet has 1 failures [19:49:23] puppetfails are going to be sproadically-unavoidable for esams hosts, due to the link issues [19:49:24] should have a logging solution that isn’t ‘write to NFS' [19:49:29] legoktm: round? [19:49:30] but traffic is off of them regardless [19:49:31] I think that’s going to fix some of our issues too [19:49:36] jamesofur: yes [19:50:19] (03PS2) 10Alexandros Kosiaris: Populate labvirtstar from wmf_ca_2014_2017 [puppet] - 10https://gerrit.wikimedia.org/r/204612 [19:50:34] legoktm: any thought about a good way to force email to be disabled on all wikis but 1? [19:50:44] The global merge caused some privacy issues... [19:50:56] jamesofur: don't say that [19:51:19] jamesofur: er, for a global account? [19:51:50] legoktm: yup [19:51:54] I'll PM you [19:52:01] you may have some other thoughts [19:52:07] (03PS12) 10GWicke: Set up /api/v1/ entry point for restbase [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) [19:54:27] (03CR) 10coren: contint: Put mysql db on tmpfs for role::ci::slave::labs (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/204528 (https://phabricator.wikimedia.org/T96230) (owner: 10coren) [19:56:18] (03PS1) 10Alex Monk: Change RT references to Phabricator tickets [puppet] - 10https://gerrit.wikimedia.org/r/204616 [19:57:19] (03CR) 10Dzahn: [C: 032] html dumps will be served from host where they are produced, via proxy [puppet] - 10https://gerrit.wikimedia.org/r/204257 (owner: 10ArielGlenn) [19:58:12] (03PS3) 10Alexandros Kosiaris: Populate labvirtstar from wmf_ca_2014_2017 [puppet] - 10https://gerrit.wikimedia.org/r/204612 (https://phabricator.wikimedia.org/T96291) [20:00:35] (03PS5) 10Hashar: (wip) nodepool yaml conf file (wip) [puppet] - 
10https://gerrit.wikimedia.org/r/201728 (https://phabricator.wikimedia.org/T89143) [20:01:25] (03CR) 10jenkins-bot: [V: 04-1] (wip) nodepool yaml conf file (wip) [puppet] - 10https://gerrit.wikimedia.org/r/201728 (https://phabricator.wikimedia.org/T89143) (owner: 10Hashar) [20:02:11] akosiaris: I’ll try it, thanks [20:02:45] (03CR) 10Andrew Bogott: [C: 032] Add a codfw nova config. [puppet] - 10https://gerrit.wikimedia.org/r/204610 (owner: 10Andrew Bogott) [20:03:21] RECOVERY - puppet last run on nescio is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:06:18] <^d> YuviPanda: I thought hiera but it was ugly :) [20:06:22] PROBLEM - puppet last run on db2039 is CRITICAL puppet fail [20:07:21] RECOVERY - puppet last run on cp3016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:07:40] RECOVERY - puppet last run on mw2104 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:09:27] andrewbogott: question, on labs VMs with jessie I get a Notice: /Stage[main]/Ssh::Server/File[/etc/ssh/userkeys/admin/.ssh/authorized_keys /public]/ensure: removed [20:09:34] when puppet is running... [20:09:54] actually, not one, more like 10 but all pretty much for that hierarchy [20:10:06] that rings a bell, let me look… [20:10:09] I still haven't figured out how that thing ends up in there [20:10:27] it’s probably a remnant from the base image [20:10:44] if we built a new one they wouldn’t be there because puppet would run on the base before packaging [20:11:06] I did investigate the image [20:11:14] a did a glance download [20:11:22] and then qemu-utils convert to raw [20:11:22] (03PS1) 10Dzahn: site.pp: add node francium.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/204617 (https://phabricator.wikimedia.org/T94457) [20:11:31] and mounted it and that path was not there [20:11:53] YuviPanda: ^ that’s something you worked on, right? 
[20:12:06] akosiaris: does it happen on every puppet run or just the initial one? [20:12:13] every one [20:12:27] i did get this btw cloud-init.log:Apr 15 19:34:09 [CLOUDINIT] util.py[DEBUG]: Writing to /etc/ssh/userkeys/admin/.ssh/authorized_keys /public/keys/admin/.ssh/authorized_keys - wb: [384] 0 bytes [20:12:52] akosiaris: that sounds like a hiera mistake that is applying admin on labs instances [20:12:55] and cloud-init is not something I have worked with before [20:13:17] how is francium.eqiad.wmnet up and running and a puppet run works, but it's not in site.pp? [20:13:31] is this some kind of testing with a different master in prod? [20:13:35] mutante: default [20:13:39] end of site.pp [20:13:48] oh! right [20:14:00] magic [20:15:31] (03CR) 10Dzahn: [C: 032] site.pp: add node francium.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/204617 (https://phabricator.wikimedia.org/T94457) (owner: 10Dzahn) [20:16:11] https://wikitech.wikimedia.org/wiki/Incident_documentation/20150406-Flow is not showing up at https://wikitech.wikimedia.org/wiki/Incident_documentation , even after I edit the page. Is there normally a lag on the query? [20:20:13] (03PS1) 10Dzahn: dumps::zim: add role and firewall to francium [puppet] - 10https://gerrit.wikimedia.org/r/204618 (https://phabricator.wikimedia.org/T94457) [20:21:40] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 14.29% of data above the critical threshold [500.0] [20:22:10] (03CR) 10Dzahn: [C: 032] dumps::zim: add role and firewall to francium [puppet] - 10https://gerrit.wikimedia.org/r/204618 (https://phabricator.wikimedia.org/T94457) (owner: 10Dzahn) [20:23:52] akosiaris: that looks like roughly the same thing as https://phabricator.wikimedia.org/T94866 [20:24:13] 7Blocked-on-Operations, 6operations, 5Patch-For-Review: Install nodejs, nginx and other dependencies on francium - https://phabricator.wikimedia.org/T94457#1213563 (10Dzahn) >>! 
In T94457#1209183, @ArielGlenn wrote: > https://gerrit.wikimedia.org/r/#/c/204257/ nginx setup for the html dumps producing host, w... [20:25:00] RECOVERY - puppet last run on db2039 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:25:24] 7Blocked-on-Operations, 6operations, 5Patch-For-Review: Install nodejs, nginx and other dependencies on francium - https://phabricator.wikimedia.org/T94457#1213565 (10Dzahn) ``` [francium:~] $ dpkg -l | grep nodejs ii nodejs 0.10.25~dfsg2-2ubuntu1 amd64 evented... [20:25:38] andrewbogott: indeed [20:25:43] thanks! [20:25:56] (03CR) 10Hashar: "The package was not creating the nodepool user. Refreshed it and uploaded the result at terbium.eqiad.wmnet:/home/hashar/public_html/debs/" [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203961 (https://phabricator.wikimedia.org/T89142) (owner: 10Hashar) [20:26:41] PROBLEM - puppet last run on francium is CRITICAL Puppet has 2 failures [20:26:44] andrewbogott: I wasn’t particularly involved - I just merged a bunch of paravoid’s patches because they were needed for some beta work I was doing [20:26:44] (03PS1) 10Ori.livneh: Reset port of $wgStatsdServer to default (8125) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204622 [20:26:56] what did I do [20:26:58] (03PS1) 10Legoktm: Increase $wgMaxNameChars to 85 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204623 [20:27:01] 7Blocked-on-Operations, 6operations, 5Patch-For-Review: Install nodejs, nginx and other dependencies on francium - https://phabricator.wikimedia.org/T94457#1213569 (10Dzahn) ``` Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install libsqlite3' returned 100: Reading package li... 
[20:27:04] 6operations, 10Continuous-Integration, 5Continuous-Integration-Isolation, 5Patch-For-Review, 7Upstream: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1213570 (10hashar) I have build the package based on https://gerrit.wikimedia.org/r/#/c/203961/ patchset... [20:29:57] 6operations, 5Continuous-Integration-Isolation: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1213581 (10hashar) I have created a basic Debian package for Nodepool (T89142) and installed it on `labnodepool1001.eqiad.wmnet`. For testing purposes I have created a basic configuration... [20:31:16] anyone free to chat with a volunteer on IRC to see if there are any routing issues for their area? Dutch user has been having slowness for hours now on all our sites. [20:31:39] (03PS1) 10Dzahn: dumps::zim: libsqlite3 is actually libsqlite3-0 [puppet] - 10https://gerrit.wikimedia.org/r/204625 (https://phabricator.wikimedia.org/T94457) [20:31:50] !log ori Synchronized php-1.26wmf1/includes/libs/BufferingStatsdDataFactory.php: 3077a66625: Don't bother buffering a counter update with a delta of zero. 
(duration: 00m 13s) [20:32:15] Logged the message, Master [20:33:17] (03CR) 10Dzahn: [C: 032] "[francium:~] $ apt-cache search libsqlite3" [puppet] - 10https://gerrit.wikimedia.org/r/204625 (https://phabricator.wikimedia.org/T94457) (owner: 10Dzahn) [20:33:22] (03CR) 10Filippo Giunchedi: [C: 031] Reset port of $wgStatsdServer to default (8125) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204622 (owner: 10Ori.livneh) [20:33:22] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [20:33:29] ori: looks good [20:33:39] (03CR) 10Ori.livneh: [C: 032] Reset port of $wgStatsdServer to default (8125) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204622 (owner: 10Ori.livneh) [20:33:45] (03Merged) 10jenkins-bot: Reset port of $wgStatsdServer to default (8125) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204622 (owner: 10Ori.livneh) [20:34:58] !log ori Synchronized wmf-config/CommonSettings.php: I31c7b2c3d5: Reset port of $wgStatsdServer to default (8125) (duration: 00m 14s) [20:35:02] Logged the message, Master [20:37:14] !log MediaWiki stats flowing into StatsD again. [20:37:19] Logged the message, Master [20:37:23] 7Blocked-on-Operations, 6operations, 5Patch-For-Review: Install nodejs, nginx and other dependencies on francium - https://phabricator.wikimedia.org/T94457#1213599 (10Dzahn) ``` ii libsqlite3-0:amd64 3.8.2-1ubuntu2 amd64 SQLite 3 shared library ``` ^ now installed.... [20:37:52] jamesofur: yes? 
[20:38:11] (03PS1) 10Alex Monk: Change BZ references to Phabricator tickets [puppet] - 10https://gerrit.wikimedia.org/r/204626 [20:39:30] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected [20:39:49] paravoid: could you /query Trijnstel she can explain more about where she is/what's happening [20:40:21] ok, thanks [20:42:39] 7Blocked-on-Operations, 6operations, 5Patch-For-Review: Install nodejs, nginx and other dependencies on francium - https://phabricator.wikimedia.org/T94457#1213615 (10Dzahn) so to summarize: original request was "nodejs, nodejs-dev, npm, libsqlite3 and libsqlite3-dev .. additionally other tools like imagema... [20:49:06] (03Abandoned) 10GWicke: Fix hiera lookup for cache::text::node [puppet] - 10https://gerrit.wikimedia.org/r/204448 (owner: 10GWicke) [20:50:01] PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL 1.67% of data above the critical threshold [1000.0] [20:50:34] icinga-wm: shush [20:51:31] ACKNOWLEDGEMENT - carbon-cache too many creates on graphite1001 is CRITICAL 1.67% of data above the critical threshold [1000.0] Filippo Giunchedi mediawiki statsd traffic [20:52:10] (03PS1) 10Andrew Bogott: Tidy up codfw nova config a bit. [puppet] - 10https://gerrit.wikimedia.org/r/204627 [20:53:00] (03CR) 10GWicke: "Tested again in labs, with Giuseppe's hiera lookup patch applied." [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) (owner: 10GWicke) [20:53:38] (03CR) 10Andrew Bogott: [C: 032] Tidy up codfw nova config a bit. [puppet] - 10https://gerrit.wikimedia.org/r/204627 (owner: 10Andrew Bogott) [20:54:35] thedj: around? 
[20:55:51] RECOVERY - puppet last run on labcontrol2001 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures
[20:58:08] operations: Update DNS for the Wikipedia store, before May 31 - https://phabricator.wikimedia.org/T96182#1213667 (Dzahn) @vshchepakina We can close this one, Effie replied and said: "Apologies for the mass email sent to Victoria, this doesn't apply to Plus customers, or your account, because you're using a...
[20:58:14] (PS1) Ori.livneh: Rename rrd-navtiming to 'coal'; log to Whisper file instead of RRD [puppet] - https://gerrit.wikimedia.org/r/204628
[20:58:18] operations: Update DNS for the Wikipedia store, before May 31 - https://phabricator.wikimedia.org/T96182#1213668 (Dzahn) Open>Invalid a:Dzahn
[20:58:32] (PS2) Ori.livneh: Rename rrd-navtiming to 'coal'; log to Whisper file instead of RRD [puppet] - https://gerrit.wikimedia.org/r/204628
[20:58:47] haha :)
[20:58:58] * YuviPanda makes a coal log joke
[20:59:06] YuviPanda: you were right, re: whisper
[20:59:33] just using it directly you mean?
[20:59:36] (CR) Ori.livneh: [C: 2] Rename rrd-navtiming to 'coal'; log to Whisper file instead of RRD [puppet] - https://gerrit.wikimedia.org/r/204628 (owner: Ori.livneh)
[20:59:41] does it support medians?
[21:00:04] no, but in my fight for "aggressive KISS", i am not using aggregation at all
[21:00:15] aha! YESS
[21:00:44] one year's worth of data, write an update every sixty seconds
[21:00:57] update is median of sliding window representing last five minutes
[21:01:39] nice.
[21:01:44] that definitely is far simpler.
[21:01:48] and you can just backup older data...
[21:01:50] that's 525,949 datapoints, or 6 megabytes of data per metric. easy and fast to resample to hourly/daily/weekly/monthly values on the fly
[21:02:01] yeah.
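A rough sketch of the scheme described above, not the actual coal code: one update per minute, each update being the median of the samples seen in the last five minutes, plus the storage arithmetic behind "525,949 datapoints, or 6 megabytes". The class name `SlidingMedian`, the (timestamp, value) sample layout, and the 12-byte datapoint size (a 4-byte timestamp plus an 8-byte double, which is how Whisper's on-disk points are usually described) are assumptions made for illustration.

```python
import statistics
from collections import deque

WINDOW_SECONDS = 300      # "sliding window representing last five minutes"
UPDATE_INTERVAL = 60      # "write an update every sixty seconds"
MINUTES_PER_YEAR = int(365.2425 * 24 * 60)   # one datapoint per minute for a year
WHISPER_POINT_BYTES = 12  # assumed: 4-byte timestamp + 8-byte double per point


class SlidingMedian:
    """Hypothetical helper: keeps recent samples and reports their median."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.samples.append((timestamp, value))

    def median(self, now):
        # Drop samples that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()
        if not self.samples:
            return None
        # Materialize the values into a list before taking the median.
        return statistics.median([value for _, value in self.samples])


if __name__ == "__main__":
    total_bytes = MINUTES_PER_YEAR * WHISPER_POINT_BYTES
    print(MINUTES_PER_YEAR, "datapoints,", round(total_bytes / 2**20, 2), "MiB per metric")
```

Because every stored point is already a per-minute median, coarser views (hourly, daily and so on) can be derived at read time by resampling, which is the "no aggregation" trade-off mentioned in the conversation.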
[21:02:16] I’m going to not do this for tools until you’re done with your stuff and then I’ll just steal it :)
[21:02:24] cool cool
[21:03:02] this is the source: https://github.com/wikimedia/operations-puppet/blob/HEAD/modules/webperf/files/coal
[21:03:05] godog: ^ you might like this too
[21:04:10] ori: yeah I like minimalist tools
[21:04:25] (CR) Andrew Bogott: [C: 2] "This works! Thank you." [puppet] - https://gerrit.wikimedia.org/r/204612 (https://phabricator.wikimedia.org/T96291) (owner: Alexandros Kosiaris)
[21:04:31] RECOVERY - nova-compute process on labvirt1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute
[21:04:31] * YuviPanda puts coal in godog’s socks.
[21:04:41] I never understood that joke until like a year ago
[21:05:26] godog: re: the statsite / graphite issue I was talking to you about earlier, realized that it is working as intended...
[21:05:37] (PS1) Dzahn: shop/store: set CNAME for store c.ssl.shopify.com [dns] - https://gerrit.wikimedia.org/r/204629 (https://phabricator.wikimedia.org/T92438)
[21:05:40] if I’m sending it 6 points every minute, and then it’s averaging them...
[21:05:53] then it’s going to be 1/6 if there was only one data point and everything is 0
[21:06:04] so what I want is for it to be *added* together for each flush interval...
[21:06:08] not sure if that’s possible.
[21:06:19] sure, you can send delta for gauges
[21:06:29] but when they get flushed do gauges get set to 0?
[21:06:44] (coal in socks, yahoo answers delivers https://answers.yahoo.com/question/index?qid=20061224202156AACGaaF)
[21:06:45] I can probably set gauges and use something like derivative to get the rate of change
[21:07:34] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1213688 (Dzahn) ``` for your new domain store.wikimedia.org, you'll need to change your CNAME for store to c.ssl.shopify.com. Currently it's an A reco...
[21:08:43] YuviPanda: what if you always send the total number of manifests you have found?
[21:08:52] (CR) Gage: [C: 1] remove dropped shop/store redirects [puppet] - https://gerrit.wikimedia.org/r/204558 (https://phabricator.wikimedia.org/T92438) (owner: Dzahn)
[21:09:07] godog: then because sometimes I get 5 metrics per min and sometimes 6, it has this weird ridgy pattern
[21:09:08] (CR) Dzahn: [C: 2] "as last time, not touching existing shop, but switching new URL over to new shopify destination, per Effie" [dns] - https://gerrit.wikimedia.org/r/204629 (https://phabricator.wikimedia.org/T92438) (owner: Dzahn)
[21:09:12] godog: I am sending it total number of manifests
[21:09:35] godog: see for example http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1429218569.113&target=tools.tools-services-02.WebServiceMonitor.manifestscollected
[21:09:41] PROBLEM - puppet last run on wtp2012 is CRITICAL puppet fail
[21:09:41] also lol no idea where that spike came from
[21:13:47] (PS1) Ori.livneh: coal: add some comments, fix a couple of typos [puppet] - https://gerrit.wikimedia.org/r/204631
[21:14:23] godog: gauge + deltas seem like what I should be doing...
[21:14:48] YuviPanda: mhh but then you have to know the previous count to calculate delta? messy
[21:14:55] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1213736 (Dzahn) We have made the switch above per instructions from shopify. I have mailed Effie so that they can properly enable it on their end now....
[21:14:58] godog: hmm?
[21:15:16] (CR) Ori.livneh: [C: 2] coal: add some comments, fix a couple of typos [puppet] - https://gerrit.wikimedia.org/r/204631 (owner: Ori.livneh)
[21:15:21] godog: it’ll just monotonically increase, sadly - and I’ll have to use graphite functions to get a real answer...
[21:15:29] godog: oooh, that’ll wrap around pretty quickly won’t it?
[21:15:40] I wonder if graphite does 64bit ints or bignums
[21:15:43] YuviPanda: the problem I think is this, a counter will flush the sum of its values and a gauge will flush its current value
[21:15:55] so you set the right value and you are done
[21:16:08] the right value being the number of manifests you have found
[21:16:46] hmm
[21:16:51] I think in statsite counters seem to return a rate rather than a total count
[21:17:07] https://phabricator.wikimedia.org/T95703
[21:17:13] look at http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1429219025.764&target=tools.tools-services-02.WebServiceMonitor.manifestscollected&from=-24minutes
[21:17:35] the fluctuations are by ~300
[21:17:41] which is the number of manifests it collects
[21:17:46] https://github.com/armon/statsite/blob/master/src/conn_handler.c#L118
[21:17:48] so it’s outputting the sum?
[21:17:54] (CR) Dzahn: [C: -1] "wait until shopify properly enabled it on their side and Victoria ack's we want to switch over. existing shop is untouched for now" [puppet] - https://gerrit.wikimedia.org/r/204559 (https://phabricator.wikimedia.org/T92438) (owner: Dzahn)
[21:17:58] gauge is a few lines above that
[21:18:34] (CR) Dzahn: [C: 1] "anytime but i'm not really sure about deploying" [puppet] - https://gerrit.wikimedia.org/r/204558 (https://phabricator.wikimedia.org/T92438) (owner: Dzahn)
[21:18:34] operations, Parsoid, Services: Lets consider upgrading our Node.js installs to io.js (once decent Debian packages are ready) - https://phabricator.wikimedia.org/T91855#1213756 (Krinkle) I would strongly recommend against Wikimedia adopting io.js at this time. To think it'll just be a performance boost...
[21:18:38] oh, hmm. interesting
[21:18:47] that doesn’t seem to be the behavior I’m seeing.
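The counter-versus-gauge distinction from the exchange above ("a counter will flush the sum of its values and a gauge will flush its current value") can be illustrated with a toy model. This is a sketch only: `ToyAggregator` is a made-up class, not statsite's or statsd's API, and the choice to keep gauge values across flushes is an assumption of the toy, since the log leaves that question unanswered.

```python
class ToyAggregator:
    """Hypothetical in-memory model of counter and gauge flush semantics."""

    def __init__(self):
        self.counters = {}
        self.gauges = {}

    def count(self, name, value):
        # Counters accumulate everything received since the last flush.
        self.counters[name] = self.counters.get(name, 0) + value

    def gauge(self, name, value):
        # Gauges remember only the most recent value.
        self.gauges[name] = value

    def flush(self):
        flushed = {"counters": dict(self.counters), "gauges": dict(self.gauges)}
        self.counters.clear()  # counters start again from zero after a flush
        return flushed         # gauges are kept (toy assumption, see above)


if __name__ == "__main__":
    agg = ToyAggregator()
    # Report "about 300 manifests collected" six times in one flush interval.
    for _ in range(6):
        agg.count("manifestscollected", 300)
        agg.gauge("manifestscollected", 300)
    print(agg.flush())
```

In this model, reporting the same total five times in one interval and six times in the next changes the counter's flushed value by one report's worth, while the gauge stays flat. That is one possible reading of the "ridgy pattern" and the "fluctuations are by ~300" remarks, offered as a guess rather than a diagnosis.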
[21:19:05] specifically for http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1429219135.824&from=-24hours&target=tools.tools-services-02.WebServiceMonitor.startsuccess [21:19:11] 6operations, 10Graphoid, 6Services, 10service-template-node, 7service-runner: Deploy graphoid service into production - https://phabricator.wikimedia.org/T90487#1213767 (10Yurik) [21:19:11] which seems to be an average [21:19:21] because it’s sending only integer data but output seems to be real? [21:21:00] YuviPanda: not sure, perhaps the raw data isn't? [21:21:09] another factor is aggregation that whisper does [21:22:27] (03CR) 10Anomie: "Do we care about the possible numeric suffix?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204623 (owner: 10Legoktm) [21:22:32] (03PS1) 10Andrew Bogott: Have libvirtd use the newer wmf_ca_2014_2017.pem on labvirt* [puppet] - 10https://gerrit.wikimedia.org/r/204632 [21:23:39] (03CR) 10Andrew Bogott: [C: 032] Have libvirtd use the newer wmf_ca_2014_2017.pem on labvirt* [puppet] - 10https://gerrit.wikimedia.org/r/204632 (owner: 10Andrew Bogott) [21:24:47] (03PS4) 10Dereckson: User rights configuration on ne.wikipedia - Abusefilter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203333 (https://phabricator.wikimedia.org/T95102) [21:25:09] (03PS5) 10Dereckson: User rights configuration on ne.wikipedia - Abusefilter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203333 (https://phabricator.wikimedia.org/T95102) [21:25:35] (03CR) 10Legoktm: "I just looked at the users who are going to be renamed on those two wikis, and none of them have usernames long enough to cause issues. 
Ot" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204623 (owner: 10Legoktm) [21:25:41] (03PS3) 10Dereckson: User rights configuration on ne.wikipedia - Filemover [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203335 (https://phabricator.wikimedia.org/T95103) [21:26:41] RECOVERY - puppet last run on wtp2012 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:27:34] (03CR) 10Anomie: [C: 031] "Ok then, works for me. You should be in time for the evening SWAT window." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204623 (owner: 10Legoktm) [21:27:41] actually I think that's what is wrong, I'll look into it [21:27:51] RECOVERY - nova-compute process on labvirt1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [21:28:01] (03PS2) 10Dzahn: lint: indentation fixes in roles [puppet] - 10https://gerrit.wikimedia.org/r/204554 (https://phabricator.wikimedia.org/T93645) [21:28:12] (03CR) 10Legoktm: [C: 032] Increase $wgMaxNameChars to 85 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204623 (owner: 10Legoktm) [21:28:40] RECOVERY - nova-compute process on labvirt1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [21:29:01] RECOVERY - nova-compute process on labvirt1006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [21:29:10] RECOVERY - nova-compute process on labvirt1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [21:29:28] (03CR) 10Dzahn: [C: 032] lint: indentation fixes in roles [puppet] - 10https://gerrit.wikimedia.org/r/204554 (https://phabricator.wikimedia.org/T93645) (owner: 10Dzahn) [21:30:23] (03CR) 10Hashar: "That has the side effect of git blame pointing to this commit. 
We have a redirector in place so not sure it is that useful to update the " [puppet] - 10https://gerrit.wikimedia.org/r/204616 (owner: 10Alex Monk) [21:30:30] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 13 data above and 9 below the confidence bounds [21:30:42] (03CR) 10Hashar: "That has the side effect of git blame pointing to this commit. We have a redirector in place so not sure it is that useful to update the " [puppet] - 10https://gerrit.wikimedia.org/r/204626 (owner: 10Alex Monk) [21:31:16] (03Merged) 10jenkins-bot: Increase $wgMaxNameChars to 85 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204623 (owner: 10Legoktm) [21:32:13] !log legoktm Synchronized wmf-config/InitialiseSettings.php: Increase $wgMaxNameChars to 85 (duration: 00m 12s) [21:32:21] Logged the message, Master [21:32:28] (03CR) 10Dzahn: [C: 032] integration: move redirect out of .htaccess [puppet] - 10https://gerrit.wikimedia.org/r/204202 (owner: 10Dzahn) [21:32:29] no jouncebot [21:32:30] still [21:33:09] 6operations, 7Graphite: Counters now only provide rates (multiplied by 1000?) - https://phabricator.wikimedia.org/T95703#1213842 (10fgiunchedi) so it is a rate in the sense that the counter (a sum of all values received for that counter during the flush period) is reset at each flush to 0, hence a 1/flushperio... [21:34:01] (03CR) 10Dzahn: [C: 031] Redirect dev.wikimedia.org URLs [puppet] - 10https://gerrit.wikimedia.org/r/199182 (https://phabricator.wikimedia.org/T372) (owner: 10Spage) [21:35:25] 6operations, 10Parsoid, 6Services: Lets consider upgrading our Node.js installs to io.js (once decent Debian packages are ready) - https://phabricator.wikimedia.org/T91855#1213845 (10GWicke) @Krinkle, you seem to be more concerned about the name & political situation. I don't think that those aspects matter... 
[21:46:25] akosiaris: it’s looking like I need the certs on all machines (virt* and labvirt*) to use the same CA so that they can talk to each other. [21:47:31] RECOVERY - carbon-cache too many creates on graphite1001 is OK Less than 1.00% above the threshold [500.0] [21:48:41] (03PS1) 10Dzahn: dumps::zim: fix nginx setup / basic site template [puppet] - 10https://gerrit.wikimedia.org/r/204636 [21:51:22] (03PS2) 10Dzahn: dumps::zim: fix nginx setup / basic site template [puppet] - 10https://gerrit.wikimedia.org/r/204636 (https://phabricator.wikimedia.org/T94457) [21:51:38] (03CR) 10Dzahn: [C: 032] dumps::zim: fix nginx setup / basic site template [puppet] - 10https://gerrit.wikimedia.org/r/204636 (https://phabricator.wikimedia.org/T94457) (owner: 10Dzahn) [21:55:21] RECOVERY - nova-compute process on labvirt1004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [21:56:37] (03PS1) 10Dzahn: dumps::zim: fix template source line [puppet] - 10https://gerrit.wikimedia.org/r/204640 (https://phabricator.wikimedia.org/T94457) [21:56:49] (03CR) 10jenkins-bot: [V: 04-1] dumps::zim: fix template source line [puppet] - 10https://gerrit.wikimedia.org/r/204640 (https://phabricator.wikimedia.org/T94457) (owner: 10Dzahn) [21:57:20] (03PS2) 10Dzahn: dumps::zim: fix template source line [puppet] - 10https://gerrit.wikimedia.org/r/204640 (https://phabricator.wikimedia.org/T94457) [21:57:41] (03PS1) 10Ori.livneh: coal: explicitly import logging.handlers [puppet] - 10https://gerrit.wikimedia.org/r/204641 [21:57:56] (03PS2) 10Ori.livneh: coal: explicitly import logging.handlers [puppet] - 10https://gerrit.wikimedia.org/r/204641 [21:58:00] (03CR) 10Ori.livneh: [C: 032] coal: explicitly import logging.handlers [puppet] - 10https://gerrit.wikimedia.org/r/204641 (owner: 10Ori.livneh) [21:58:08] (03CR) 10Ori.livneh: [V: 032] coal: explicitly import logging.handlers [puppet] - 10https://gerrit.wikimedia.org/r/204641 (owner: 10Ori.livneh) [21:58:25] (03CR) 
10Dzahn: [C: 032] dumps::zim: fix template source line [puppet] - 10https://gerrit.wikimedia.org/r/204640 (https://phabricator.wikimedia.org/T94457) (owner: 10Dzahn) [21:58:42] mutante: i merged your change [21:58:43] Dzahn: dumps::zim: fix template source line (26f46c7b3a) [21:58:50] ori: i merged your change :) [21:58:53] heh [21:58:55] i thought i did [21:59:04] i typed "yes" and then it was done *g* [22:00:55] 7Blocked-on-Operations, 6operations, 5Patch-For-Review: Install nodejs, nginx and other dependencies on francium - https://phabricator.wikimedia.org/T94457#1213959 (10Dzahn) nginx now running: ``` Info: /Stage[main]/Dumps::Zim/Nginx::Site[zim]/File[/etc/nginx/sites-enabled/zim]: Scheduling refresh of Servi... [22:01:11] RECOVERY - puppet last run on francium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [22:06:15] 6operations, 7Graphite: Counters now only provide rates (multiplied by 1000?) - https://phabricator.wikimedia.org/T95703#1213993 (10GWicke) Yeah, the regular counter semantics are closer to what statsite provides as [counter_count(value)](https://github.com/armon/statsite/blob/master/src/conn_handler.c#L109) i... [22:11:52] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1214000 (10GWicke) [22:12:02] 6operations, 5Interdatacenter-IPsec: Strongswan: security association reauthentication failure - https://phabricator.wikimedia.org/T96111#1214002 (10Gage) Strongswan 5.3.0 has been uploaded to Debian/experimental, and is now running on Berkelium & Curium. So far the problem has not recurred. 
[22:15:35] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1214020 (10GWicke) [22:22:34] (03PS1) 10Dzahn: dumps::zim: open port 80 for http connections [puppet] - 10https://gerrit.wikimedia.org/r/204649 (https://phabricator.wikimedia.org/T94457) [22:25:18] (03CR) 10Springle: "I'd wondered about doing this previously, but figured that by leaving enwiki out there alone we also ensure that groupLoadsByDB keeps work" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204443 (owner: 10Aaron Schulz) [22:25:36] !log running forceRenameUsers.php (SUL finalization) on all small wikis [22:25:40] Logged the message, Master [22:26:56] (03CR) 10Dzahn: [C: 032] dumps::zim: open port 80 for http connections [puppet] - 10https://gerrit.wikimedia.org/r/204649 (https://phabricator.wikimedia.org/T94457) (owner: 10Dzahn) [22:27:03] legoktm: go go go! [22:27:22] (03PS2) 10Dzahn: dumps::zim: open port 80 for http connections [puppet] - 10https://gerrit.wikimedia.org/r/204649 (https://phabricator.wikimedia.org/T94457) [22:28:06] greg-g: if I am around to babysit, would you be OK with Hovercards going out at around 6pm SF time to Catalan and Greek Wikipedias? [22:28:24] theoretically yes [22:28:34] ori: as long as you're comfortable with everything (and erik) [22:28:47] yeah. cc Eloquence ^ [22:28:53] pretend at 5:59 I ask you: "Are you sure?" [22:29:29] Thanks. It has been blocked on me for some time and I took longer than is reasonable to review it, so I don't want to delay it further. [22:29:31] (I agree that no chinese wiki makes sense) [22:31:40] it's a bit troubling to discover these missing dependencies so late in the game. if we see any evidence that it's not working as expected, we should not deploy. 
would be great to get the exact config up on cawiki beta wmflabs first [22:32:01] 7Blocked-on-Operations, 6operations, 5Patch-For-Review: Install nodejs, nginx and other dependencies on francium - https://phabricator.wikimedia.org/T94457#1214087 (10Dzahn) firewall opened on 80 on francium. current error is it refuses connection from dataset1001: ``` [error] 24810#0: *88980 connect() fa... [22:41:52] (03PS1) 10Dzahn: dumps::zim: fix nginx listening port and docroot [puppet] - 10https://gerrit.wikimedia.org/r/204661 (https://phabricator.wikimedia.org/T94457) [22:42:03] (03CR) 10jenkins-bot: [V: 04-1] dumps::zim: fix nginx listening port and docroot [puppet] - 10https://gerrit.wikimedia.org/r/204661 (https://phabricator.wikimedia.org/T94457) (owner: 10Dzahn) [22:42:29] (03PS2) 10Dzahn: dumps::zim: fix nginx listening port and docroot [puppet] - 10https://gerrit.wikimedia.org/r/204661 (https://phabricator.wikimedia.org/T94457) [22:42:49] FYI report of slowness in #wikipedia - might be a fluke. [22:43:06] (03CR) 10Dzahn: [C: 032] dumps::zim: fix nginx listening port and docroot [puppet] - 10https://gerrit.wikimedia.org/r/204661 (https://phabricator.wikimedia.org/T94457) (owner: 10Dzahn) [22:43:21] marktraceur: Probably network issues as esams is depooled and traffic has to make it to eqiad [22:43:32] OK. [22:46:59] 7Blocked-on-Operations, 6operations, 5Patch-For-Review: Install nodejs, nginx and other dependencies on francium - https://phabricator.wikimedia.org/T94457#1214114 (10Dzahn) [22:47:01] 7Blocked-on-Operations, 10Ops-Access-Requests, 6operations: Access to francium - https://phabricator.wikimedia.org/T94093#1214113 (10Dzahn) [22:49:26] gwicke: http://dumps.wikimedia.org/htmldumps/ [22:49:44] proxied to francium [22:50:55] 7Blocked-on-Operations, 6operations, 5Patch-For-Review: Install nodejs, nginx and other dependencies on francium - https://phabricator.wikimedia.org/T94457#1214128 (10Dzahn) nginx proxy setup working now. see patches above. 
added placeholder page: http://dumps.wikimedia.org/htmldumps/ ^ this is now server...
[22:51:31] can an opsen have a look at archiva.wikimedia.org? Its timing shit out. Is the JVM spinning like crazy, stuck in a GC hell?
[22:52:38] or maybe it just recovered....
[22:52:40] Blocked-on-Operations, operations, Patch-For-Review: Install nodejs, nginx and other dependencies on francium - https://phabricator.wikimedia.org/T94457#1214133 (Dzahn)
[22:52:42] operations, Patch-For-Review: deploy francium for html/zim dumps - https://phabricator.wikimedia.org/T93113#1214132 (Dzahn)
[22:59:34] mutante: nice!
[22:59:49] mutante: what is the path on francium this is pointing to?
[23:00:37] nm, found it on the ticket: /srv/www/htmldumps
[23:02:20] * legoktm waves
[23:02:55] oh, the swat deployment window
[23:02:58] RoanKattouw_away, ^demon|away, Krenair, rmoen: swat time!
[23:03:04] where is jon
[23:03:11] * Dereckson pings.
[23:03:12] Not it
[23:03:14] I have a meeting
[23:03:27] I can do it
[23:04:41] (CR) Legoktm: [C: 2] User rights configuration on ne.wikipedia - Abusefilter [mediawiki-config] - https://gerrit.wikimedia.org/r/203333 (https://phabricator.wikimedia.org/T95102) (owner: Dereckson)
[23:04:49] (Merged) jenkins-bot: User rights configuration on ne.wikipedia - Abusefilter [mediawiki-config] - https://gerrit.wikimedia.org/r/203333 (https://phabricator.wikimedia.org/T95102) (owner: Dereckson)
[23:05:33] !log legoktm Synchronized wmf-config/: User rights configuration on ne.wikipedia - Abusefilter (duration: 00m 12s)
[23:05:38] Logged the message, Master
[23:05:40] (CR) Legoktm: [C: 2] User rights configuration on ne.wikipedia - Filemover [mediawiki-config] - https://gerrit.wikimedia.org/r/203335 (https://phabricator.wikimedia.org/T95103) (owner: Dereckson)
[23:05:48] (Merged) jenkins-bot: User rights configuration on ne.wikipedia - Filemover [mediawiki-config] - https://gerrit.wikimedia.org/r/203335 (https://phabricator.wikimedia.org/T95103) (owner: Dereckson)
[23:06:03] sorry Krenair
[23:06:08] was grabbing water
[23:06:16] jdlrobson: awesome, you're after Dereckson
[23:06:32] 203333 verified.
[23:06:55] !log legoktm Synchronized wmf-config/InitialiseSettings.php: User rights configuration on ne.wikipedia - Filemover (duration: 00m 11s)
[23:06:56] roger
[23:06:58] Logged the message, Master
[23:07:15] gwicke: /srv/www/htmldumps
[23:07:16] 203335 verified.
[23:07:29] (CR) Legoktm: [C: 2] Set meta namespace on or.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/204563 (https://phabricator.wikimedia.org/T94142) (owner: Dereckson)
[23:07:35] (Merged) jenkins-bot: Set meta namespace on or.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/204563 (https://phabricator.wikimedia.org/T94142) (owner: Dereckson)
[23:08:20] !log legoktm Synchronized wmf-config/InitialiseSettings.php: Set meta namespace on or.wiktionary (duration: 00m 14s)
[23:08:24] Logged the message, Master
[23:08:24] Dereckson: ^
[23:08:44] jdlrobson: do you already have a cherry-pick/submodule bump?
[23:08:55] shoot no let me do that now
[23:09:01] jdlrobson: I can do it
[23:09:07] legoktm: okay! whatever's easiest
[23:09:15] jdlrobson: just wmf2?
[23:09:26] the issue is on test wiki
[23:09:31] ok
[23:09:40] so i'm assuming if we cherry pick there then that will ride out automatically to enwiki?
[23:10:04] or is there any other branches i should be concerned about?
[23:10:06] on tuesday yes
[23:10:13] cool
[23:11:43] (CR) Aaron Schulz: "A comment can always be added to explain the feature, but I'd prefer the consistency of using groups." [mediawiki-config] - https://gerrit.wikimedia.org/r/204443 (owner: Aaron Schulz)
[23:15:01] 204563 verified too.
[23:19:24] Thank you legoktm for the deploy.
[23:22:59] Dereckson: np :)
[23:25:21] !log legoktm Synchronized php-1.26wmf2/extensions/Gather/includes/specials/SpecialGather.php: Error in regex broke User lists pages https://gerrit.wikimedia.org/r/#/c/204499/ (duration: 00m 12s)
[23:25:25] jdlrobson: ^
[23:25:26] Logged the message, Master
[23:26:41] !log legoktm Synchronized php-1.26wmf2/extensions/CentralAuth/includes/CentralAuthUser.php: Fix CentralAuthUser::loadAttached if no accounts are attached (duration: 00m 13s)
[23:26:44] Logged the message, Master
[23:26:50] legoktm: checking
[23:27:27] !log legoktm Synchronized php-1.26wmf1/extensions/CentralAuth/includes/CentralAuthUser.php: Fix CentralAuthUser::loadAttached if no accounts are attached (duration: 00m 13s)
[23:27:31] Logged the message, Master
[23:28:21] legoktm: i can confirm that one's good :) Ready for the other patch?
[23:29:24] oh shoot.. where's my other patch on the wiki page :-S shoot edit conflict!
[23:29:56] jdlrobson: uhohs :/
[23:29:59] jdlrobson: link?
[23:30:09] https://gerrit.wikimedia.org/r/#/c/204671/ if you have time
[23:30:18] no worries if not, i just need to make sure i get these done by tuesday
[23:30:20] i appreciate it
[23:30:35] we have 30 more minutes :)
[23:30:38] \o/
[23:40:00] (PS1) Tim Landscheidt: package_builder: Reflect rename of /etc/apt/preferences.d/wikimedia [puppet] - https://gerrit.wikimedia.org/r/204673
[23:43:18] !log legoktm Synchronized php-1.26wmf2/extensions/Gather/includes/specials/SpecialGather.php: Make Special:Gather show pages for that user https://gerrit.wikimedia.org/r/#/c/204671/ (duration: 00m 13s)
[23:43:22] jdlrobson: ^
[23:43:25] Logged the message, Master
[23:43:43] legoktm: it works! you're a star :D
[23:43:48] thanks so much! that's all from me :D
[23:43:48] woot
[23:43:51] :)
[23:52:26] (PS1) Ori.livneh: coal: pass a materialized list to numpy.median [puppet] - https://gerrit.wikimedia.org/r/204674
[23:52:39] (CR) Ori.livneh: [C: 2 V: 2] coal: pass a materialized list to numpy.median [puppet] - https://gerrit.wikimedia.org/r/204674 (owner: Ori.livneh)