[02:49:09] (03PS2) 10Andrew Bogott: Switch keystone to mysql assignment from ldap. [puppet] - 10https://gerrit.wikimedia.org/r/268325 (https://phabricator.wikimedia.org/T115029) [02:49:11] (03PS1) 10Andrew Bogott: Specify wgOpenStackManagerProjectId in WikitechPrivateSettings.php [puppet] - 10https://gerrit.wikimedia.org/r/268927 (https://phabricator.wikimedia.org/T115029) [02:53:39] (03CR) 10Alex Monk: [C: 031] Specify wgOpenStackManagerProjectId in WikitechPrivateSettings.php [puppet] - 10https://gerrit.wikimedia.org/r/268927 (https://phabricator.wikimedia.org/T115029) (owner: 10Andrew Bogott) [02:58:25] (03PS2) 10Alex Monk: Specify wgOpenStackManagerProjectId in WikitechPrivateSettings.php [puppet] - 10https://gerrit.wikimedia.org/r/268927 (https://phabricator.wikimedia.org/T115029) (owner: 10Andrew Bogott) [02:58:27] (03PS5) 10Alex Monk: labs dnsrecursor IP aliasing: work on all projects, not just some arbitrary ones [puppet] - 10https://gerrit.wikimedia.org/r/268921 [02:58:31] oops [02:58:55] (03CR) 10Alex Monk: "oops... messed this up while trying to make my own commit depend on it." [puppet] - 10https://gerrit.wikimedia.org/r/268927 (https://phabricator.wikimedia.org/T115029) (owner: 10Andrew Bogott) [03:00:18] (03PS6) 10Alex Monk: labs dnsrecursor IP aliasing: work on all projects, not just some arbitrary ones [puppet] - 10https://gerrit.wikimedia.org/r/268921 [03:01:40] (03CR) 10Alex Monk: "hmm... I suppose it didn't actually depend on the previous parent..." [puppet] - 10https://gerrit.wikimedia.org/r/268927 (https://phabricator.wikimedia.org/T115029) (owner: 10Andrew Bogott) [03:17:21] Krenair: sorry about the 1000 gerrit alerts [03:17:35] you keep rebasing all of the patches every time you update 1? 
[03:17:44] or something [03:20:16] It’s just that it’s a long dependency chain and I keep changing the first one [03:21:51] you don't really need to keep the others rebased all the time [03:21:51] hmmm, in this case, if you wish to avoid any flood, you can wait until the first one is ready before rebasing the others [03:22:47] You still won't be able to merge them as long as the first isn't ready to merge, so there is no hurry for that. [03:23:06] what is the difference between projectadmin and admin in keystone roles andrewbogott? [03:23:58] admin is global, projectadmin is project-local [03:24:18] admin is roughly equivalent to cloudadmin [03:24:30] and users = project members? [03:24:34] right [03:25:07] I have this documented someplace, hang on... [03:26:03] Krenair: https://wikitech.wikimedia.org/wiki/Labs_keystone_roles#The_Future [03:26:21] Although I should add the ‘admin’ one there, hang on... [03:28:05] how working is labtest exactly? is it possible to actually run instances in there? [03:30:06] The networking part of the cluster isn’t up yet. [03:30:12] So you can launch instances and look at their consoles [03:30:22] but no ssh, and the consoles will reveal distress [03:31:04] (and of course right now I’ve applied this refactor, so probably lots of wikitech things are broken now.) [03:31:13] 'distress'? [03:31:26] Like ‘OMG I can’t reach the network who am I' [03:31:37] heh, ok [03:32:02] I updated https://wikitech.wikimedia.org/wiki/Labs_keystone_roles#The_Future, hopefully it makes sense now. [03:32:54] so say I approved the next commit... we're ready to begin reading some information from keystone? [03:33:30] clearly this works on labtestweb2001 but I don't know if keystone etc. is ready in real labs? [03:35:12] Hm...
[03:36:00] PROBLEM - puppet last run on cp1059 is CRITICAL: CRITICAL: Puppet has 1 failures [03:36:00] PROBLEM - puppet last run on analytics1050 is CRITICAL: CRITICAL: Puppet has 2 failures [03:36:00] PROBLEM - puppet last run on mw1192 is CRITICAL: CRITICAL: Puppet has 1 failures [03:36:30] Krenair: How about if I roll labtestweb back to https://gerrit.wikimedia.org/r/#/c/252615/ and we hammer on it for a day or two to make sure it’s really providing accurate info [03:36:46] Keystone in production is in the same state, that shouldn’t be an issue [03:37:02] But I don’t trust my code to be totally accurate, especially the caching. [03:37:08] yes please [03:37:17] And it’s super slow still, that’s my main reservation about deploying the reading side of it. [03:37:45] (Some of it /has/ to be slow due to limits in the keystone api. But I still want to look around for options.) [03:38:09] did you see my comment about the dns ip aliaser? [03:38:12] commit* [03:38:27] I did, seems promising. I haven’t read it yet though. [03:39:32] ok, labtestweb is now reading via keystone (which is backed w/ldap) but still writing directly to ldap. [03:39:52] I am pretty sure it’s broken in at least one way, though, conflating uids with usernames. [03:39:56] Going to look at that in a minute. [03:40:30] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures [03:41:33] Krenair: I have a dumb question about memcache: The cache is global, right? Not session-local? I keep running things on the commandline that should be priming the cache but then the webUI is acting like the cache is still cold. 
[03:41:47] yes, it's global [03:42:37] ok, so that means I’m probably making a mistake of some sort :) [03:45:11] andrewbogott, re: my commit [03:45:32] when I run it against labtest I get an empty output [03:45:51] presumably no instances there have public ips [03:46:28] but when I run it against proper labs, I get HTTP 404 from the /tenants query [03:47:07] hm, let me make sure that keystone is using the same api version on both [03:47:27] it's trying to use v2.0 on both [03:51:22] PROBLEM - puppet last run on mw1054 is CRITICAL: CRITICAL: Puppet has 1 failures [03:53:30] Krenair: I’m pretty sure that keystone rewrites your url based on the keystone service catalog. And I’ve confirmed that the catalog on production has v3 api urls [03:53:39] I’m trying to think now what it would hurt for me to change them... [03:54:12] Or, ok, really I should probably move labtest to v3 to match instead [03:54:21] which will require me to rewrite my patches again, but I can live with that [03:54:41] wouldn't python-keystoneclient choose v3 if it was available? [03:55:07] i've been trying to avoid the phrase 'production labs' :) [03:55:07] I think it uses the api that the catalog offers. [03:55:13] yeah, good point. [03:55:16] What do you call it instead? [03:55:21] real labs, lol [03:55:24] proper labs [03:55:25] etc. [03:55:42] oh, wait -- [03:56:04] are you getting the 404 from a simple curl? 
[03:56:17] If so, try using urls like these instead: http://developer.openstack.org/api-ref-identity-v3.html#projects-v3 [03:57:45] no 404 from a simple curl [03:58:39] curl -H "Accept: application/json" -H "Content-Type: application/json" --data '{"auth": {"passwordCredentials": {"username": "novaadmin", "password": "redacted"}, "tenantId": "testlabs"}}' "http://labcontrol1001.wikimedia.org:35357/v2.0/tokens" [03:58:43] substitute in the password, run [03:58:59] get token from output (access -> token -> id) [03:59:21] curl -H "X-Auth-Token: redacted" "http://labcontrol1001.wikimedia.org:35357/v2.0/tenants" [03:59:28] returns http 200 and all the data you would expect [04:00:11] wait, didn’t you say before you were getting 404 with proper labs but 200 with labtest? [04:00:17] that's right [04:00:40] RECOVERY - puppet last run on analytics1050 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [04:01:12] in labtest it gives me the full project list as you'd expect [04:01:43] but 404 from labs [04:02:30] RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [04:02:30] RECOVERY - puppet last run on mw1192 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:02:51] the command you posted above… curl -H "X-Auth-Token: redacted" "http://labcontrol1001.wikimedia.org:35357/v2.0/tenants" [04:03:09] I’m confused because I thought you said a second ago that that worked [04:03:32] it does work [04:03:40] simple curl works, python-keystoneclient does not [04:03:48] oh! I see, sorry. [04:04:14] it's the combination of python-keystoneclient + real labs that returns a 404 [04:05:39] can you try keystoneclient.v2_0 import client [04:05:42] or the equivalent? 
[04:06:01] maybe it would be from keystoneclient.v2_0.client import Client [04:06:03] not sure [04:06:08] but I think specifying the client version will matter [04:06:45] well part of the stack trace is this [04:06:47] File "/usr/lib/python2.7/dist-packages/keystoneclient/v2_0/tenants.py", line 118, in list [04:06:47] tenant_list = self._list("/tenants%s" % query, "tenants") [04:07:00] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [04:07:02] File "/usr/lib/python2.7/dist-packages/keystoneclient/base.py", line 106, in _list [04:07:03] resp, body = self.client.get(url) [04:07:09] File "/usr/lib/python2.7/dist-packages/keystoneclient/httpclient.py", line 590, in get [04:07:09] return self._cs_request(url, 'GET', **kwargs) [04:07:10] etc. [04:07:16] keystoneclient.apiclient.exceptions.NotFound: The resource could not be found. (HTTP 404) [04:07:23] so it's definitely v2_0 [04:08:21] PROBLEM - puppet last run on ms-be2018 is CRITICAL: CRITICAL: puppet fail [04:08:35] I suspect that you’re getting a v3 client, but then you’re asking it for tenants rather than projects (‘tenants’ being 2.0 terminology) and are traversing a codepath that’s untested/unintended. [04:08:49] and I get a object [04:09:09] oh, well, that’s more convincing :) [04:09:24] in my keystoneClient variable [04:09:53] I’m going to try to shift the labtest catalog to point to v3, let’s see if then you get identical (if bad) behavior both places. [04:10:03] ok [04:11:09] PROBLEM - Disk space on ytterbium is CRITICAL: DISK CRITICAL - free space: / 355 MB (3% inode=87%) [04:15:41] Krenair: ok, I don’t have it perfectly organized, but, try now? [04:16:05] still get 404 [04:16:10] from proper labs [04:16:17] ah, right, I only changed labtest [04:16:19] do you get 404 there too?
[04:16:21] and I still get a useful project list from labtest [04:16:30] hm, ok, stay tuned :) [04:17:51] RECOVERY - puppet last run on mw1054 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [04:20:44] Krenair: how about now? [04:21:16] labtest still gives me a useful project list [04:21:24] real labs still gives me a 404 [04:24:07] hm, I don’t know why I’m not getting any logging from labtest keystone [04:28:51] the other thing that could be different is that real labs is still using ldap projects/assignments and labtest is using keystone projects/assignments. [04:28:56] So let me roll that back as well [04:29:03] (since I should be doing that for testing anyway) [04:31:32] (03CR) 10Andrew Bogott: [C: 032] Specify wgOpenStackManagerProjectId in WikitechPrivateSettings.php [puppet] - 10https://gerrit.wikimedia.org/r/268927 (https://phabricator.wikimedia.org/T115029) (owner: 10Andrew Bogott) [04:36:40] RECOVERY - puppet last run on ms-be2018 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [04:37:19] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [04:42:39] Krenair: ok, now labtest should be the same as real labs, except for having https://gerrit.wikimedia.org/r/#/c/252615/ applied [04:42:47] do you still see differences from the python client? [04:43:11] I still get a project list from labtest and an http 404 error from labs [04:43:41] weird [04:44:21] can you get anything else back from labs? Does the client 404 no matter what you ask?
[04:44:30] although I suspect silver and labtestweb2001 are running different versions of python-keystoneclient [04:44:45] since they have differing versions of the files [04:45:19] hm, yeah [04:45:43] I get the same behavior on both when I use the commandline [04:45:52] like ‘keystone tenant-list’ or ‘openstack project list' [04:45:55] and a differing number of files [04:46:02] the former fails (both places) but the latter works (both places) [04:46:03] under /usr/lib/python2.7/dist-packages/keystoneclient [04:46:18] Want me to upgrade silver? I don’t mind, no one’s using it but you [04:47:50] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [04:48:07] I'd like to know exactly what's wrong currently before changing anything [04:49:41] the version running on labcontrol1001 is also 1:1.2.0-0ubuntu1.1~cloud0 [04:49:54] much newer, almost as modern as on labtest [04:51:16] krenair@silver:~$ keystone --version [04:51:16] 0.7.1 [04:51:23] krenair@labtestweb2001:~$ keystone --version [04:51:24] 1.2.0 [04:51:25] huh, okay [04:51:40] yes, please update silver andrewbogott [04:53:55] !log upgraded python-openstackclient python-glanceclient python-novaclient python-keystoneclient on silver [04:54:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [04:54:16] ok, all done [04:55:36] so now silver has a newer version of novaclient than labtestweb2001 [04:56:03] yeah, slightly :) [04:56:24] on the other hand, this looks promising [04:56:32] no, both are 1:1.2.0-0ubuntu1.2~cloud0 [04:56:42] yep [04:56:44] it now works [04:56:50] ah, great!
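For reference, the two-step curl check from 03:58 to 03:59 (POST credentials to /v2.0/tokens, read the token at access -> token -> id, then send it as X-Auth-Token to /v2.0/tenants) can be sketched in Python roughly as below. This is only an illustration: the helper names are hypothetical, the sample response is invented and trimmed to the one field the log mentions, and no request is actually sent.

```python
import json


def build_auth_payload(username, password, tenant_id):
    """Build the JSON body for POST /v2.0/tokens, same shape as the curl call."""
    return json.dumps({
        "auth": {
            "passwordCredentials": {"username": username, "password": password},
            "tenantId": tenant_id,
        }
    })


def extract_token_id(response_body):
    """Read the token id from a /v2.0/tokens response (access -> token -> id)."""
    return json.loads(response_body)["access"]["token"]["id"]


def tenant_list_headers(token_id):
    """Headers for the follow-up GET /v2.0/tenants request."""
    return {"X-Auth-Token": token_id}


# Invented sample response; a real one carries many more fields.
sample_response = json.dumps({"access": {"token": {"id": "example-token"}}})

payload = build_auth_payload("novaadmin", "redacted", "testlabs")
token_id = extract_token_id(sample_response)
print(payload)
print(tenant_list_headers(token_id))
```

Keeping the payload and parsing steps separate from the HTTP call makes it easy to compare what a raw request sends with what a client library sends, which was the open question while the 404 was being chased.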
[04:57:03] krenair@silver:~$ nova --version [04:57:03] 2.22.0 [04:57:09] krenair@labtestweb2001:~$ nova --version [04:57:09] 2.17.0 [04:57:29] oh yeah, I was talking about keystone [04:57:37] I’ll update nova too, just so we’re in harmony [04:58:06] there, now the above four packages should be the same in both places [05:00:41] Krenair: I’m about to skip out. Thank you for your comments on that giant query patch… I’ll catch up (and debug) tomorrow with luck. [05:00:45] You’re unblocked for now, right? [05:00:58] well that patch seems to be good to go [05:01:23] hopefully the version of python-keystoneclient on whatever machine actually runs that script is up to date [05:02:52] labservices — I’ll check that now [05:03:16] yeah, it’s reasonably modern — 1:1.2.0-0ubuntu1.1~cloud0 [05:03:25] I can upgrade it but not right before I go [05:03:34] But yeah, I’ll catch up with your patch soon. [05:07:48] 1.2.0 is the version needed, that's fine [05:08:50] great, we can merge as soon as we both have time to sit and watch [05:09:23] cool. bye! [05:09:29] ‘night! 
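A small sketch of the version comparison that settled the 404 above, assuming plain dotted version strings as printed by `keystone --version`. The function names are hypothetical; the host versions and the 1.2.0 minimum are the ones quoted in the log.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# "1.2.0 is the version needed" (05:07:48)
MINIMUM = parse_version("1.2.0")


def meets_minimum(version_text, minimum=MINIMUM):
    """True if the reported version is at least the required one."""
    return parse_version(version_text) >= minimum


# Versions reported by `keystone --version` before silver was upgraded.
reported = {"silver": "0.7.1", "labtestweb2001": "1.2.0"}

for host, version in reported.items():
    status = "ok" if meets_minimum(version) else "too old"
    print(host, version, status)
```

Comparing integer tuples avoids the string-comparison trap where "0.10.0" would sort before "0.7.1". Debian-style package versions such as 1:1.2.0-0ubuntu1.1~cloud0 are not handled here and would need the epoch and suffix stripped first.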
[06:18:50] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]BR [06:27:19] RECOVERY - Disk space on ytterbium is OK: DISK OK [06:30:40] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:40] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:40] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Puppet has 2 failures [06:33:29] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 2 failures [06:33:29] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 2 failures [06:35:09] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [06:56:39] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [06:56:50] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [06:57:31] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:30] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:31] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:15:21] PROBLEM - Kafka Broker Replica Max Lag on kafka1020 is CRITICAL: CRITICAL: 53.57% of data above the critical threshold [5000000.0] [08:22:59] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [08:23:01] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: puppet fail [08:26:00] RECOVERY - Kafka Broker Replica Max Lag on kafka1020 is OK: 
OK: Less than 50.00% above the threshold [1000000.0] [08:26:09] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [08:29:40] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [08:30:09] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [08:49:49] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:35:09] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 817 [10:40:09] RECOVERY - check_mysql on db1008 is OK: Uptime: 1623715 Threads: 2 Questions: 9162732 Slow queries: 10951 Opens: 4096 Flush tables: 2 Open tables: 421 Queries per second avg: 5.643 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [11:51:21] PROBLEM - HHVM rendering on mw1132 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50392 bytes in 0.007 second response time [11:52:20] PROBLEM - Apache HTTP on mw1132 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50394 bytes in 0.025 second response time [11:54:49] PROBLEM - Apache HTTP on mw1134 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:55:00] PROBLEM - HHVM rendering on mw1134 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:55:39] PROBLEM - nutcracker port on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:55:39] PROBLEM - salt-minion processes on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:55:49] PROBLEM - HHVM processes on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:56:01] PROBLEM - SSH on mw1134 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:56:01] PROBLEM - Disk space on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[11:56:10] PROBLEM - DPKG on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:56:20] PROBLEM - nutcracker process on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:56:40] PROBLEM - puppet last run on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:56:50] PROBLEM - RAID on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:57:00] PROBLEM - configured eth on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:57:01] PROBLEM - Check size of conntrack table on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:57:20] PROBLEM - dhclient process on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:00:49] RECOVERY - configured eth on mw1134 is OK: OK - interfaces up [12:01:00] RECOVERY - dhclient process on mw1134 is OK: PROCS OK: 0 processes with command name dhclient [12:03:00] RECOVERY - nutcracker port on mw1134 is OK: TCP OK - 0.000 second response time on port 11212 [12:03:00] RECOVERY - salt-minion processes on mw1134 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [12:03:10] RECOVERY - HHVM processes on mw1134 is OK: PROCS OK: 12 processes with command name hhvm [12:03:20] RECOVERY - SSH on mw1134 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0) [12:03:20] RECOVERY - Disk space on mw1134 is OK: DISK OK [12:03:30] RECOVERY - DPKG on mw1134 is OK: All packages OK [12:03:40] RECOVERY - nutcracker process on mw1134 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [12:04:00] RECOVERY - Apache HTTP on mw1134 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 486 bytes in 6.695 second response time [12:04:09] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 41 minutes ago with 0 failures [12:04:11] RECOVERY - RAID on mw1134 is OK: OK: no RAID installed [12:04:19] RECOVERY - HHVM rendering on mw1134 is OK: HTTP OK: HTTP/1.1 
200 OK - 67229 bytes in 8.348 second response time [12:04:20] RECOVERY - Check size of conntrack table on mw1134 is OK: OK: nf_conntrack is 4 % full [14:15:19] PROBLEM - puppet last run on db2070 is CRITICAL: CRITICAL: puppet fail [14:42:20] RECOVERY - puppet last run on db2070 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [15:28:17] 6operations, 10DBA, 5Patch-For-Review: Prepare db1018 and s2-slaves for s2 master failover - https://phabricator.wikimedia.org/T125215#2006239 (10jcrespo) @greg I realized that I said 1-10 seconds of unavailability, but that doesn't have into account that currently, it takes a bit over a minute to deploy a f... [15:40:53] 6operations: [RFC] Alert about *when* partitions will run out of space, not a percentage/absolute number - https://phabricator.wikimedia.org/T126158#2006246 (10jcrespo) 3NEW [15:57:58] 6operations, 7Monitoring: [RFC] Alert about *when* partitions will run out of space, not a percentage/absolute number - https://phabricator.wikimedia.org/T126158#2006261 (10jcrespo) [16:01:06] (03CR) 10Jcrespo: "I am actually holding this change until Monday discussion, as it will require small tweaks on the actual group used." 
[puppet] - 10https://gerrit.wikimedia.org/r/268438 (https://phabricator.wikimedia.org/T125435) (owner: 10Jcrespo) [16:20:48] (03PS1) 10Andrew Bogott: Fix ldap_user_name_attribute in keystone config [puppet] - 10https://gerrit.wikimedia.org/r/268949 [16:23:26] (03CR) 10Andrew Bogott: [C: 032] Fix ldap_user_name_attribute in keystone config [puppet] - 10https://gerrit.wikimedia.org/r/268949 (owner: 10Andrew Bogott) [16:27:41] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [16:47:30] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [17:40:32] (03PS1) 10Merlijn van Deen: toollabs: libbytes-random-secure-perl does not exist on precise [puppet] - 10https://gerrit.wikimedia.org/r/268960 (https://phabricator.wikimedia.org/T126168) [17:40:34] (03PS1) 10Merlijn van Deen: toollabs: sync jessie packages with precise/trusty [puppet] - 10https://gerrit.wikimedia.org/r/268961 [21:06:34] (03PS1) 10Tim Landscheidt: Tools: Fix "invalid byte sequence in UTF-8" in crontab backups [puppet] - 10https://gerrit.wikimedia.org/r/268978 [21:10:49] (03PS2) 10Yuvipanda: toollabs: libbytes-random-secure-perl does not exist on precise [puppet] - 10https://gerrit.wikimedia.org/r/268960 (https://phabricator.wikimedia.org/T126168) (owner: 10Merlijn van Deen) [21:11:19] (03PS2) 10Tim Landscheidt: Tools: Fix "invalid byte sequence in UTF-8" in crontab backups [puppet] - 10https://gerrit.wikimedia.org/r/268978 [21:11:36] (03CR) 10Yuvipanda: [C: 032 V: 032] toollabs: libbytes-random-secure-perl does not exist on precise [puppet] - 10https://gerrit.wikimedia.org/r/268960 (https://phabricator.wikimedia.org/T126168) (owner: 10Merlijn van Deen) [21:11:53] (03PS2) 10Yuvipanda: toollabs: sync jessie packages with precise/trusty [puppet] - 10https://gerrit.wikimedia.org/r/268961 (owner: 10Merlijn van Deen) [21:12:13] (03CR) 10Yuvipanda: [C: 032 V: 032] toollabs: sync jessie packages with precise/trusty [puppet] - 
10https://gerrit.wikimedia.org/r/268961 (owner: 10Merlijn van Deen) [21:14:21] Krenair: what username are you using on labtestweb? And do you get that failure even if you load the front page? [21:15:06] andrewbogott, Alex Monk [21:15:15] andrewbogott, no, only on the Nova pages [21:15:32] if you reload now does it still happen? [21:15:32] IIRC this error is triggered by not being able to get an unscoped token from nova for the user [21:16:31] still happens [21:16:55] can you c/p the error? [21:17:02] No Nova credentials found for your account. [21:17:03] There were no Nova credentials found for your user account. Please ask a Nova administrator to create credentials for you. [21:23:45] 10Ops-Access-Requests, 6operations, 10DBA, 5Patch-For-Review: Grant mysql client access to testreduce_vd and testreduce_0715 databases - https://phabricator.wikimedia.org/T125435#2006696 (10ssastry) Actually, I was a little slow to realize this but, I just noticed that mysql client is already installed (pr... [21:25:02] Krenair: you have 2fa enabled, and have tried logging out and in again? [21:25:27] (I’m looking in ldap but don’t see anything wrong with your account) [21:25:29] not using 2fa on the test wiki at the moment [21:25:39] 2016-02-07 21:24:49 labtestweb2001 labtestwiki ldap INFO: 2.1.0 OpenStackNovaController::restCall fullurl: http://labtestcontrol2001.wikimedia.org:35357/v2.0/tokens [21:25:39] 2016-02-07 21:24:49 labtestweb2001 labtestwiki ldap INFO: 2.1.0 OpenStackNovaController::authenticate return code: 401 [21:26:23] does keystone pull my record from ldap? [21:28:08] hm, OSM shouldn’t even let you load that page without 2fa enabled. [21:28:29] NovaProject? [21:28:53] we don't require users to have 2fa enabled to use osm [21:29:42] (and, yes, the current running code is making keystone calls but keystone is using the ldap backend.) 
[21:31:43] ok, it looks like it’s only when changing things that it checks 2fa [21:32:22] I can enable it if that'd make OSM feel better [21:32:24] at least, in SpecialNovaProject it gets checked in execute() [21:32:37] sure, try it [21:32:41] I don’t think it’ll make a difference for this issue [21:33:16] ah, interesting. it requires that for users with cloudadmin [21:33:18] but, sorry, you’ve logged out and in since we started talking just now? [21:33:33] it logs me out every time it sends me this error [21:33:57] harsh [21:35:02] ok, logged in with 2fa [21:35:10] No Nova credentials found for your account. [21:39:28] I added some debug lines. If you reload the project page you should see something like ‘checkpoint one checkpoint two checkpoint three’ along the top. [21:39:30] what do you see? [21:41:57] andrewbogott, checkpoint two [21:42:47] so it thinks my ldap user doesn't exist [21:43:51] which is unsurprising given that the way that check works is by looking at your token [21:44:03] which it failed to get while I logged in during the previous request [21:44:08] (03PS3) 10Yuvipanda: Tools: Fix "invalid byte sequence in UTF-8" in crontab backups [puppet] - 10https://gerrit.wikimedia.org/r/268978 (owner: 10Tim Landscheidt) [21:44:15] (03CR) 10Yuvipanda: [C: 032 V: 032] Tools: Fix "invalid byte sequence in UTF-8" in crontab backups [puppet] - 10https://gerrit.wikimedia.org/r/268978 (owner: 10Tim Landscheidt) [21:45:09] Hm, I wonder… Krenair, does live wikitech fail for you in the same way? [21:46:51] it fails for me in a different way, one sec [21:46:59] probably unrelated [21:51:34] Hello [21:51:43] Anyone in? [21:53:20] andrewbogott, right, sorry about that. I can get into my normal wikitech account just fine [21:53:21] hi ShakespeareFan00 [21:53:42] Krenair [21:54:00] I've found a problem in SVG rendering [21:54:12] stick it in phab? 
[21:54:17] Namely that rsvg doesn't support some attributes in and [21:54:27] Krenair: Already done so :) [21:54:34] I was coming here to note it [21:54:59] ok... [21:55:10] https://phabricator.wikimedia.org/T126175#2006729 [21:57:56] Krenair: What's the default dip? [21:58:00] *dpi used ? [21:58:04] I have no idea [21:58:24] Okay, asking so I can manually export PNGs if needed.. [21:59:43] I have never touched SVG rendering [22:05:49] SVG has DPI? [22:06:18] seems like that would be asking for trouble [22:06:53] Krenair: I created a new account with a mismatched username/uid because I thought that might be the problem... [22:06:55] but it seems to work fine [22:07:05] TimStarling: The export does [22:07:17] I.e. exporting at 72dpi for printers for example [22:07:29] right... [22:07:33] Krenair: so we can create a new test account for you, or I can muck around and try to delete the one you have now so that we get a fresh start [22:07:44] it's rendered at whatever size the user requests [22:07:54] plus srcset 1.5x, 2x [22:08:17] OH OK [22:08:18] Thanks [22:08:37] but obviously you don't know what size the user is requesting when you upload a PNG [22:08:49] you just need to make it some reasonably large size, like 2000px width [22:08:55] Hmm [22:09:03] so that it won't look too rubbish [22:09:18] Someone over in Inkscape was saying that most modern browsers support SVG anyway [22:09:27] http://caniuse.com/#search=SVG [22:09:48] So MediaWiki's PNG render as PNG may be an older tech [22:09:56] *SVG render as PNG sorry [22:10:00] or you could think about whether you really really need these text attributes [22:10:33] there are a couple of other reasons [22:10:47] one is that SVGs tend to be about 10x larger than PNGs [22:11:24] ori recently had the idea of sending SVGs to save on bandwidth, and rediscovered this fact [22:11:25] (03PS1) 10BryanDavis: Monolog: normalize messages before PSR3 expansion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/269063
(https://phabricator.wikimedia.org/T124985) [22:11:26] TimStarling: In the uploads I've been making of some figures for Wikisource , I've been converting the text to paths [22:11:47] and probably should upload PNG conversions as well [22:12:03] another is that a lot of our SVGs would take many seconds to render on a mobile device [22:12:55] converting text to paths seems pretty sensible to me [22:12:56] andrewbogott, I'll try making a new account [22:13:19] you can't rely on the server having the exact same font file as you, and text rendering is hard [22:13:40] 6operations, 6Discovery, 10Maps, 10Salt, 3Discovery-Maps-Sprint: Kartotherian git deploy service restart failed with perm error - https://phabricator.wikimedia.org/T112707#2006804 (10Yurik) 5Open>3Resolved all's good on this front i think. [22:13:54] you lose indexability, not sure if there is a fix for that [22:14:25] andrewbogott, wait [22:14:34] andrewbogott, for your new test account, what if you log out and back in? [22:19:25] Krenair: trying [22:19:53] Krenair: seems to still work. I get the 2fa refusal page [22:20:01] which comes after the error you’re getting [22:20:37] well I get "No Nova credentials found for your account." on my new account [22:21:35] hm... [22:21:39] what’s the new account name? [22:21:47] username Labtestalex [22:21:51] uid labtestkrenair [22:22:39] ah, and you gave yourself cloudadmin already? [22:22:46] I’m going to give you shell too, we’ll see if that helps [22:22:55] (It shouldn’t matter) [22:22:58] yes [22:23:36] ok, now I need to fuss with the code for a minute to add your new account as projectadmin someplace. One moment... [22:24:47] ok, try now, with labtestalex? [22:26:03] No Nova credentials found for your account. 
[22:26:13] what the heck [22:27:41] I am out of ideas but will ponder for a bit [22:29:35] every account I have on labtestwikitech creates and logs in successfully [22:29:42] but cannot get a token from nova [22:30:30] no matter whether the uid matches cn or not [22:31:53] andrewbogott, It's perfectly happy if I roll OSM back to the previous commit [22:36:01] which doesn't make a lot of sense [22:42:02] (03PS1) 10BryanDavis: Monolog: Add mwversion to udp2log log events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/269065 (https://phabricator.wikimedia.org/T125707) [22:55:52] (03CR) 10Gergő Tisza: [C: 04-1] Monolog: normalize messages before PSR3 expansion (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/269063 (https://phabricator.wikimedia.org/T124985) (owner: 10BryanDavis) [22:57:58] (03CR) 10Gergő Tisza: [C: 031] Monolog: Add mwversion to udp2log log events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/269065 (https://phabricator.wikimedia.org/T125707) (owner: 10BryanDavis) [23:09:46] Krenair: could it have to do with characters in your password that are getting messed up during the extra hop through keystone? [23:09:54] possibly [23:11:22] although I can now log in just fine [23:11:36] might be a cached token though, will purge it and try again [23:12:20] PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: puppet fail [23:12:25] andrewbogott, it seems to work now... [23:17:06] Krenair: what did you purge? 
[23:17:12] I’m sure I restarted memcached several times when we were testing before [23:17:42] removed it from the openstack_tokens table [23:17:52] didn't think about memcached, ugh [23:19:58] (03PS2) 10BryanDavis: Monolog: normalize messages before PSR3 expansion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/269063 (https://phabricator.wikimedia.org/T124985) [23:20:00] (03PS1) 10BryanDavis: logging: Send all udp2log eligible messages to $wmgDefaultMonologHandler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/269068 (https://phabricator.wikimedia.org/T117019) [23:20:02] (03PS1) 10BryanDavis: logging: Collect mw1017 logs for debugging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/269069 (https://phabricator.wikimedia.org/T117020) [23:20:46] (03CR) 10BryanDavis: Monolog: normalize messages before PSR3 expansion (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/269063 (https://phabricator.wikimedia.org/T124985) (owner: 10BryanDavis) [23:21:10] (03CR) 10jenkins-bot: [V: 04-1] logging: Collect mw1017 logs for debugging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/269069 (https://phabricator.wikimedia.org/T117020) (owner: 10BryanDavis) [23:21:54] how would you have a bad cached token for an account you just created? [23:22:26] I wanted to get rid of it to make sure the token was really being generated next time I logged in [23:22:41] instead of just copied from memcached [23:22:45] or the db [23:24:10] (03PS2) 10BryanDavis: logging: Collect mw1017 logs for debugging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/269069 (https://phabricator.wikimedia.org/T117020) [23:24:29] So what have we learned? Just to truncate the token table when we roll this out?
[23:28:45] (03CR) 1020after4: "I've come up with 3 possibile solutions to the package_settings problem, discussion on the task: https://phabricator.wikimedia.org/T113072" [puppet] - 10https://gerrit.wikimedia.org/r/262742 (https://phabricator.wikimedia.org/T113072) (owner: 10Alexandros Kosiaris) [23:29:21] (03PS1) 10Subramanya Sastry: ruthenium services: Couple more tweaks [puppet] - 10https://gerrit.wikimedia.org/r/269070 [23:33:38] (03PS2) 10Subramanya Sastry: ruthenium services: Few more tweaks [puppet] - 10https://gerrit.wikimedia.org/r/269070 [23:34:52] (03CR) 10jenkins-bot: [V: 04-1] ruthenium services: Few more tweaks [puppet] - 10https://gerrit.wikimedia.org/r/269070 (owner: 10Subramanya Sastry) [23:36:45] andrewbogott, no, I've learned nothing [23:36:55] great, me neither [23:37:54] (03PS3) 10Subramanya Sastry: ruthenium services: Few more tweaks [puppet] - 10https://gerrit.wikimedia.org/r/269070 [23:39:00] RECOVERY - puppet last run on cp3043 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [23:48:02] (03CR) 10Gergő Tisza: [C: 031] Monolog: normalize messages before PSR3 expansion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/269063 (https://phabricator.wikimedia.org/T124985) (owner: 10BryanDavis) [23:57:50] (03CR) 10Tim Starling: [C: 032] ruthenium services: Few more tweaks [puppet] - 10https://gerrit.wikimedia.org/r/269070 (owner: 10Subramanya Sastry)