[01:38:50] PROBLEM - Puppet errors on tools-exec-1442 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[02:18:49] RECOVERY - Puppet errors on tools-exec-1442 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:54:36] Steinsplitter: did you get your respawning job issue sorted out? It looks like the err log spew at least stopped.
[03:26:01] bd808: https://phabricator.wikimedia.org/T168206#3358698
[03:26:17] he pinged me yesterday, I was zzz-ing
[03:29:47] my phone's irc client seems to be not always getting push notifications for pings. :/
[03:30:59] * zhuyifei1999_ hopes there are phones-for-devs where everything inside is FOSS and you can freely hack around in it :/
[03:31:23] ^ why I still hate phones
[03:31:51] I sold my freedom for stability long ago. piles of Apple devices in my house
[03:32:03] lol
[03:32:25] I ran linux on laptops from 1995-2002.
[03:32:42] I'm still on linux
[03:33:07] Then I found OSX and stopped fighting with video drivers, wifi cards, sound, cameras, ...
[03:33:45] most (all?) of the things I run on my desktop are FLOSS still
[03:34:00] lol driver nuisance
[03:34:28] I'm on thinkpad and haven't had the need to deal with drivers in a while
[03:34:38] battery life for laptops used to be horrible too. I've heard that is mostly fixed in the modern era
[03:35:58] I do have a ~1 year old thinkpad that I bought to try the linux laptop life again. I used it for a month and could not find an irc client I liked enough for day to day. Weird blocker I know
[03:36:11] yeah, battery is the thing that doesn't scale with that Moore's law
[03:36:58] petan used to have an irc client that I like
[03:37:21] ah. I've seen a git repo for that I think
[03:38:22] I use Textual on my laptop. It has really awesome display customization ability from using Safari as the rendering engine.
[03:38:34] * zhuyifei1999_ searches
[03:39:10] hmm "Textual: IRC for OS X"
[03:39:36] yes, https://github.com/Codeux-Software/Textual
[03:40:05] I build it from source and add a few hacks that they don't want upstream
[03:40:36] * zhuyifei1999_ isn't familiar with objective-c :/ so no chance to port it to work with firefox on linux
[03:40:37] and then I run this pile of css/js theme additions -- https://github.com/bd808/Textual-Theme-bd808
[03:40:54] heh. yeah it would be a full rewrite I'm afraid
[03:42:16] probably easier to figure out how to swap the rendering engine in xchat or something
[03:42:35] s/xchat/hexchat/
[03:42:44] hexchat?
[03:42:48] * zhuyifei1999_ searches
[03:43:03] https://hexchat.github.io/
[03:44:16] I'm not sure that writing an irc client is a good use of anyone's time ;)
[03:44:56] * zhuyifei1999_ is gonna try this one
[03:45:12] Some people like WeeChat.
[03:45:21] I use MacIrssi, which is just a weird thin wrapper around irssi.
[03:52:47] my favorite thing about Textual is that I can apply css classes using js that make things like bots render in a smaller font with lower contrast. The messages are still on my screen, but things said by people stand out and the bots fade away if I'm not looking closely
[03:52:52] (actually, I give up on this irc client madness. it's like vim vs emacs)
[03:53:47] zhuyifei1999_: yeah, please don't waste your spare cycles trying to make an irc client for me.
[03:54:01] lol
[03:58:29] I turn off join, part, and quit messages, which seems to help quite a bit.
[03:58:48] I don't have to suffer the "download my client!" spam and other quit messages, at least.
[03:59:56] channels like this one and -operations can be a real wall of noise during an outage. makes things hard to follow. :/
[04:21:38] Yeah, I have very ignores, but...
[04:21:39] [00:21] Ignore List:
[04:21:39] [00:21] 1 icinga-wm #wikimedia-operations: ALL
[04:26:44] ooh good idea
[06:15:43] !log xtools Redirected //tools.wmflabs.org/xtools-dev/ to //xtools-dev.wmflabs.org/
[06:15:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Xtools/SAL
[06:20:38] PAWS, Wikidata: Editing wikidata with paws/pywikibot fails when user is not registered for commons - https://phabricator.wikimedia.org/T168222#3358914 (Knuthuehne)
[06:30:59] PROBLEM - Puppet errors on tools-exec-1428 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[06:43:54] PROBLEM - Puppet errors on tools-exec-1435 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[06:58:12] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1425 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[07:11:00] RECOVERY - Puppet errors on tools-exec-1428 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:18:55] RECOVERY - Puppet errors on tools-exec-1435 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:33:12] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1425 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:05:33] Labs, Labs-Infrastructure: ssh fails for Labs instance with "debian-9.0-stretch (experimental)" image - https://phabricator.wikimedia.org/T167267#3359117 (Magnus) ``` ssh magnus@petscan-dev.petscan.eqiad.wmflabs Permission denied (publickey). Killed by signal 1.
```
[08:45:41] cloud-services-team, DBA, Operations, Patch-For-Review, Wikimedia-log-errors: Terbium cronjobs attempting to connect to labstestweb2001 - https://phabricator.wikimedia.org/T167961#3359253 (jcrespo)
[08:46:54] cloud-services-team, DBA, Operations, Patch-For-Review, Wikimedia-log-errors: Terbium cronjobs attempting to connect to labstestweb2001 - https://phabricator.wikimedia.org/T167961#3351232 (jcrespo) I have added 2 temporary accounts to connect from terbium and wasat as the admin users.
[09:13:01] Labs, DBA: Prepare and check storage layer for kbp.wikipedia.org - https://phabricator.wikimedia.org/T160869#3113206 (Marostegui) Just leaving the comment here for the record: ping us (DBAs) when the tables are created so we can sanitize them on sanitarium hosts and labs.
[10:58:19] Labs, Tool-Labs: Automatically restarting job for tools.sbot - https://phabricator.wikimedia.org/T168206#3359886 (Steinsplitter) job has been stopped from restarting after using //kill -SIGSTOP 6755//. Still surprised... there was no crontab, no continuous job, .bigbrother, screen, etc. Maybe something i...
[10:59:06] Labs, DBA: Prepare and check storage layer for kbp.wikipedia.org - https://phabricator.wikimedia.org/T160869#3359887 (Dereckson) OK. Will be probably this week.
[12:48:27] (CR) Gehel: [V: 2 C: 2] maps - add dummy redis password for tilerator / tileratorui [labs/private] - https://gerrit.wikimedia.org/r/358950 (https://phabricator.wikimedia.org/T167871) (owner: Gehel)
[13:07:00] Labs, DBA, User-bd808, cloud-services-team (Kanban): setup dewiki and wikidatawiki on the labsdb1009, 1010 and 1011 - https://phabricator.wikimedia.org/T168021#3360443 (Marostegui) @jcrespo @bd808 I have tried to run: `sudo /usr/local/sbin/maintain-views --databases dewiki` on labsdb1009, 1010 an...
[13:18:01] Labs, DBA, User-bd808, cloud-services-team (Kanban): setup dewiki and wikidatawiki on the labsdb1009, 1010 and 1011 - https://phabricator.wikimedia.org/T168021#3360473 (jcrespo) a:jcrespo>Marostegui Assigned to to for now, ping it back when done.
[13:19:01] Labs, DBA, Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3360477 (Marostegui)
[13:19:03] Labs, DBA, User-bd808, cloud-services-team (Kanban): setup dewiki and wikidatawiki on the labsdb1009, 1010 and 1011 - https://phabricator.wikimedia.org/T168021#3360476 (Marostegui) Open>Resolved
[13:55:49] andrewbogott: Hey, I have a problem with https://horizon.wikimedia.org/project/proxy/ it shows "Something went wrong!" error to me, I'm on "wikidata-federation" project and tried to add two public DNS proxies but didn't work
[14:11:29] Amir1: so the page loads correctly but it fails when you create?
[14:12:42] andrewbogott: https://horizon.wikimedia.org/project/proxy/ loads correctly but gives me error
[14:12:54] cloud-services-team, DBA, Operations, Patch-For-Review, Wikimedia-log-errors: Terbium cronjobs attempting to connect to labstestweb2001 - https://phabricator.wikimedia.org/T167961#3360676 (jcrespo)
[14:12:59] the DNS proxies I've made doesn't load at all
[14:13:05] Amir1: what are you doing when it gives you the error?
[14:13:14] just loading the page
[14:13:42] but it was working before I tried to make DNS proxies which failed
[14:13:50] but when you said 'loads correctly...'
[14:13:56] https://usercontent.irccloud-cdn.com/file/VGKCMEzf/image.png
[14:14:04] HTH
[14:15:19] I would not describe that as 'correctly' :)
[14:15:29] I'm looking, it'll be a few minutes
[14:15:37] :D
[14:15:47] I need to go now, be back in ten minutes
[14:15:50] thanks
[14:40:28] (PS1) Alexandros Kosiaris: hieraize OTRS private data [labs/private] - https://gerrit.wikimedia.org/r/359947
[14:55:44] Amir1: I'm pretty sure I fixed the base issue, but dns is taking a while to catch up so it may stay broken for another hour or so :(
[14:58:19] (CR) Alexandros Kosiaris: [V: 2 C: 2] hieraize OTRS private data [labs/private] - https://gerrit.wikimedia.org/r/359947 (owner: Alexandros Kosiaris)
[14:59:55] cloud-services-team, DBA, Operations, Patch-For-Review, Wikimedia-log-errors: Terbium cronjobs attempting to connect to labstestweb2001 - https://phabricator.wikimedia.org/T167961#3360789 (Marostegui) Open>Resolved a:jcrespo After Jaime added the grants manually, I have talked to...
[15:17:36] andrewbogott: I tried to make it again and errored out: "Danger: There was an error submitting the form. Please try again."
[15:17:46] yep, dns is still behind
[15:17:53] it's intermittent, some servers are up to date and some not
[15:18:41] okay, thanks
[15:26:03] Labs, Labs-Infrastructure, Operations: Puppet CA: virt1000.wikimedia.org' will expire on 2017-08-15 - https://phabricator.wikimedia.org/T168110#3360856 (Andrew) I see this but can't figure out where it's coming from. We haven't had a box named virt1000 for ages... is the CA cert somehow still named...
[16:08:31] Amir1: all good now?
[16:09:54] andrewbogott: I made the proxies but times out when I try out on browser
[16:10:06] inside the server curl -L 0.0.0.0:80 works just fine
[16:10:17] http://federated-commons.wmflabs.org/
[16:10:31] the node for this: federated-commons.eqiad.wmflabs
[16:10:34] So probably you need to change the security group?
[16:11:56] andrewbogott: hmm, is 80 not open in default security group?
[16:12:08] nope, definitely not
[16:12:22] well, I mean, security groups are project-specific and editable
[16:12:27] but by default it's just ping and ssh
[16:12:36] okay, I work on it
[16:12:42] Thanks
[16:15:23] Labs, Labs-Infrastructure: ssh fails for Labs instance with "debian-9.0-stretch (experimental)" image - https://phabricator.wikimedia.org/T167267#3360938 (bd808) Logs from the last attempt by @Magnus: ``` Jun 19 08:04:44 petscan-dev sshd[1268]: Connection from 10.68.17.232 port 39748 on 10.68.20.123 port...
[16:16:20] Labs, Labs-Infrastructure: ssh-dss (DSA) keys fail for Labs instances with "debian-9.0-stretch (experimental)" image - https://phabricator.wikimedia.org/T167267#3360939 (bd808)
[16:19:27] Labs, Tool-Labs: Automatically restarting job for tools.sbot - https://phabricator.wikimedia.org/T168206#3360942 (zhuyifei1999) a:zhuyifei1999>None ``` tools.sbot@tools-bastion-03:~$ ps up 6755 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND tools.s+ 6755 0.0 0.0 4478...
[16:21:50] Labs, Labs-Infrastructure: designate: on instance deletion, remove proxy dns entries as well as proxy entries - https://phabricator.wikimedia.org/T168313#3360958 (Andrew)
[16:31:11] andrewbogott: okay, still not working and I'm kinda stuck: https://horizon.wikimedia.org/project/access_and_security/security_groups/874/ (it's "web" security group in wikidata-federation project) 80 and 8080 is open
[16:31:25] and this group is added to the nodes
[16:31:33] but still gives 504 error
[16:31:59] you need to specify the ip range of incoming hosts
[16:32:07] in this case it's just the proxy, so you can do 10.0.0.0/8
[16:32:22] (which basically just means 'ok from within wmf cloud instances')
[16:33:54] okay
[16:37:42] andrewbogott: it's working now. Thanks
[16:37:47] great!
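Note on the security group fix above: the rule that made the proxy work opens ports 80 and 8080 only to sources inside 10.0.0.0/8. A minimal Python sketch of that check follows. The function name `allowed_by_rule` and the sample addresses are illustrative assumptions and not part of Horizon or OpenStack; the real filtering is done by the cloud's networking layer, not by code like this.

```python
import ipaddress

# Range suggested in the chat: traffic coming from within the cloud instances.
INTERNAL_RANGE = ipaddress.ip_network("10.0.0.0/8")
# Ports opened in the "web" security group mentioned in the chat.
OPEN_PORTS = {80, 8080}


def allowed_by_rule(source_ip: str, port: int) -> bool:
    """Return True if an ingress rule limited to INTERNAL_RANGE and
    OPEN_PORTS would admit a connection from source_ip to port.

    Hypothetical helper for illustration only.
    """
    return port in OPEN_PORTS and ipaddress.ip_address(source_ip) in INTERNAL_RANGE


if __name__ == "__main__":
    # An internal address (such as the web proxy) reaching port 80.
    print(allowed_by_rule("10.68.17.232", 80))
    # An external documentation-range address reaching port 80.
    print(allowed_by_rule("203.0.113.5", 80))
    # An internal address reaching a port the group does not open.
    print(allowed_by_rule("10.68.17.232", 22))
```

Because public requests arrive via the proxy, which itself sits inside 10.0.0.0/8, restricting the source range this way still lets browser traffic through while keeping the ports closed to direct external access.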
[20:30:07] Labs, Labs-Infrastructure, Patch-For-Review, cloud-services-team (Kanban): Horizon puppet roles not cleared when instance is deleted - https://phabricator.wikimedia.org/T147878#2706690 (Andrew) Open>Resolved
[20:30:22] Labs, Labs-Infrastructure, Patch-For-Review, cloud-services-team (Kanban): Horizon puppet roles not cleared when instance is deleted - https://phabricator.wikimedia.org/T147878#2706690 (Andrew) Resolved>Open
[20:31:40] Labs, Labs-Infrastructure, cloud-services-team (Kanban): designate: on instance deletion, remove proxy dns entries as well as proxy entries - https://phabricator.wikimedia.org/T168313#3361553 (Andrew)
[22:24:54] Labs, cloud-services-team (Kanban): labmon1001 disk filling up - https://phabricator.wikimedia.org/T168344#3361694 (Andrew)
[22:29:30] Labs, cloud-services-team (Kanban): labmon1001 disk filling up - https://phabricator.wikimedia.org/T168344#3361739 (Andrew) For now, I am going to delete all metrics more than 2 years old: ``` find . -mtime +730 -type f -delete ``` That didn't free up a ton of space, but enough to last us a few weeks...
[23:36:29] hello! did something happen to the Tool Labs databases around 22:00 UTC on 19 January (3.5 hours ago)?
[23:36:43] all of a sudden my bot can't authenticate, using the same credentials it always has
[23:37:09] odd thing is I tried running `mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.labsdb enwiki_p` and I'm able to log in
[23:39:05] also, unrelated, I'm getting dramatically different results running this query on Tool Labs vs production: https://quarry.wmflabs.org/query/19672
[23:40:13] musikanimal: `sql enwiki` doesn't work either?
[23:40:33] yes I'm able to log in manually
[23:40:41] https://en.wikipedia.org/wiki/User:MusikBot/PermClerk/Error_log (see the bottom)
[23:41:02] I didn't change any code
[23:41:55] o.O
[23:42:46] can I see the code?
[23:43:44] well, it's more about authentication and not the code, I was just saying that to state the record
[23:43:55] but anyway it's at https://github.com/MusikAnimal/MusikBot
[23:44:34] logging in happens at https://github.com/MusikAnimal/MusikBot/blob/master/musikbot.rb#L358-L367
[23:46:07] are you sure app_config is the same as in replica.my.cnf
[23:46:59] the generate_replica_users script might malfunction if that's the case
[23:47:10] I just ran the bot locally and it was able to connect
[23:47:14] same credentials
[23:47:48] wait nvm, I killed it too early
[23:48:20] yeah, same issue :(
[23:48:43] credentials are the same as replica.my.cnf
[23:49:08] at least now I know it's not the Tool Labs environment, since it fails locally
[23:50:13] I wish my ruby skills are better
[23:50:22] hehe
[23:54:48] musikanimal: can you try manually select?
[23:55:12] I just did a test and found if it were a password error it should be:
[23:55:20] ERROR 1045 (28000): Access denied for user 's51201'@'10.68.16.44' (using password: YES)
[23:56:02] password correct but wrong permissions is like:
[23:56:19] tools.yifeibot@tools-bastion-02:~$ mysql -h enwiki.labsdb -u s51201 -p <<< 'SELECT * FROM enwiki.page LIMIT 1'
[23:56:19] Enter password:
[23:56:19] ERROR 1142 (42000) at line 1: SELECT command denied to user 's51201'@'10.68.16.44' for table 'page'
[23:57:26] zhuyifei1999_: enwiki_p
[23:57:35] Platonides: yeah ik
[23:57:53] yes
[23:57:59] just demonstrating what an authorization (non-password) issue looks like
[23:58:01] you were using "enwiki.page", not "enwiki_p.page"
[23:58:07] nvm, my stupid mistake... https://github.com/MusikAnimal/MusikBot/commit/bb1b53b1a154e52beee83b7c7de9b88ab1efbd07#diff-c5c5b7041a7aa084bdd46ebd36e4de28R138
[23:58:13] Platonides: see above
[23:58:19] I've been working off of Analytics replicas locally and forgot to change it back! so sorry for the noise
[23:58:24] but thank you for the help nonetheless
[23:58:50] musikanimal: np
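Note on the two errors quoted above: they differ by server error number. 1045 means the login itself was rejected, while 1142 means the login succeeded but the account lacks the privilege on that table. A small Python sketch of telling them apart from `mysql` client output follows. The function name `classify_mysql_error` and the wording of the labels are assumptions made for illustration; only the two error numbers come from the chat.

```python
import re

# Meanings of the two server error numbers quoted in the chat.
KNOWN_ERRORS = {
    1045: "authentication failed (wrong user or password)",
    1142: "authenticated, but not authorized for that table",
}

# Matches the "ERROR <number> (<SQLSTATE>)" prefix printed by the mysql client.
ERROR_RE = re.compile(r"ERROR (\d+) \((\w+)\)")


def classify_mysql_error(line: str) -> str:
    """Label a mysql client error line by its error number.

    Hypothetical helper for illustration only.
    """
    match = ERROR_RE.search(line)
    if match is None:
        return "not a mysql client error line"
    code = int(match.group(1))
    return KNOWN_ERRORS.get(code, f"unrecognized error {code}")


if __name__ == "__main__":
    samples = [
        "ERROR 1045 (28000): Access denied for user 's51201'@'10.68.16.44' (using password: YES)",
        "ERROR 1142 (42000) at line 1: SELECT command denied to user 's51201'@'10.68.16.44' for table 'page'",
    ]
    for sample in samples:
        print(classify_mysql_error(sample))
```

In the conversation the 1142 case was triggered on purpose by querying `enwiki.page` instead of the `enwiki_p.page` view, which is why the password was accepted but the SELECT was refused.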