[02:33:46] !log tools Rebooting tools-sgewebgrid-lighttpd-0903. Instance hung.
[02:33:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:59:19] 1 hour replag on s1, s3, s5 and s8?
[11:59:53] !log admin network switch hardware is down affecting cloudvirt1025/1026 (T227536); VMs are supposed to be online but unreachable
[11:59:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:59:56] T227536: b1-eqiad pdu refresh (Thursday 10/10 @11am UTC) - https://phabricator.wikimedia.org/T227536
[13:32:11] !log openstack resuming cleanup of designate dns leaks in eqiad1 T235127
[13:32:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[13:32:17] T235127: wmfsink designate handler not running during vm deletes - https://phabricator.wikimedia.org/T235127
[14:47:20] Any updates (re the status message) jeh or arturo?
[14:49:48] that is old :-)
[15:03:23] !log openstack cleanup puppet leaks left over from eqiad1 upgrades
[15:03:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[16:25:37] !log ores unquiet icinga-wm in #wikimedia-ai
[16:25:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL
[16:32:11] * arturo off
[18:55:34] !log admin Created indexes and views for nqowiki (T230543)
[18:55:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:55:37] T230543: Prepare and check storage layer for nqowiki - https://phabricator.wikimedia.org/T230543
[19:02:07] !log tools.heritage restarted webservice
[19:02:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[19:20:06] !log ores quiet icinga-wm in #wikimedia-ai
[19:20:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL
[19:29:41] Hi, why is this happening on Toolforge? https://pastebin.com/mtARexm2
[19:30:11] Zoranzoki21: which part? the lack of sendmail?
[19:30:23] yes
[19:30:25] Is your tool running on the Kubernetes backend?
[19:30:28] yes
[19:30:48] no sendmail there :) Let me find the wikitech docs for you on using an SMTP server
[19:31:36] Zoranzoki21: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Email#Sending_via_SMTP
[19:32:44] "sh: 1: /usr/sbin/sendmail: not found" is related to MediaWiki, because one user registered an account on the tool's wiki with an email address
[19:33:12] I don't know how to make MediaWiki use /usr/bin/mail
[19:33:42] Zoranzoki21: https://www.mediawiki.org/wiki/Manual:$wgSMTP
[19:34:42] How do I set it for Toolforge?
[19:36:21] Use mail.tools.wmflabs.org as the SMTP host, and auth => false
[19:37:20] bd808: Is this correct? https://pastebin.com/HepiY9BV
[19:38:52] Zoranzoki21: maybe? You may want to set IDHost to just tools.wmflabs.org. I would have to read the PHP code to figure out what that is actually used for; the docs on that setting on mediawiki.org are not really clear about it
[19:39:43] oh, it is clear: "If not provided, will default to $wgServer." I think you can just leave that line out.
[19:40:07] but yes, the host, port, and auth values in that paste look correct
[19:41:41] Yes, thank you!
[19:43:42] Hi, does anyone know if the email I just sent to wikitech-l has gone through? It's not showing up at https://lists.wikimedia.org/pipermail/wikitech-l/2019-October/thread.html
[19:45:39] paladox: I don't see anything in my inbox from you on that list yet. The last one I have from wikitech-l is the email from Zeljko
[19:45:49] ah, ok, thanks!
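Editor's note: the two pastebin links above have since expired, so here is a minimal, hedged reconstruction of the LocalSettings.php snippet being discussed. The host, auth => false, and optional IDHost values come straight from the conversation; port 25 is an assumption taken from the linked Toolforge email docs, not stated in the log.

    // In the tool's LocalSettings.php (sketch, not the actual paste):
    $wgSMTP = [
        'host'   => 'mail.tools.wmflabs.org', // Toolforge mail relay (from the log)
        'IDHost' => 'tools.wmflabs.org',      // optional; defaults to $wgServer
        'port'   => 25,                       // assumed from the Toolforge docs
        'auth'   => false,                    // the relay needs no SMTP auth (from the log)
    ];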
[19:54:22] paladox: it looks like it's actually the Yahoo issue again.. yes :(
[19:54:27] :(
[19:54:27] per other channel
[19:54:50] that one ticket is literally 10 years old, wow. but there is another one i think
[19:55:06] you already pinged the right people if you comment there i think
[19:55:13] yup
[19:56:38] [TSS04] Messages from 208.80.154.21 temporarily deferred due to user complaints
[19:57:02] says... host mta6.am0.yahoodns.net
[19:57:22] i wonder who complained about lists.wikimedia.org?
[19:59:04] i dunno. we also had issues with delivering _to_ Yahoo from wiki mail, like https://phabricator.wikimedia.org/T146281#2681996
[20:03:08] "I signed up to a mailing list then marked it as spam"
[20:05:39] haha, probably
[20:16:37] Hi, puppet seems to be failing with:
[20:16:38] Warning: Unable to fetch my node definition, but the agent run will continue:
[20:16:38] Warning: Error 500 on SERVER: Server Error: Failed to find puppet-paladox.git.eqiad.wmflabs via exec: Execution of '/usr/local/bin/puppet-enc puppet-paladox.git.eqiad.wmflabs' returned 1:
[20:16:56] running /usr/local/bin/puppet-enc puppet-paladox.git.eqiad.wmflabs shows a 502
[20:17:51] paladox: Mind if I log in there and poke around a bit?
[20:17:56] yup
[20:17:56] sure
[20:18:13] it's happening on all my other hosts too
[20:19:10] ping from there to puppetmaster.cloudinfra.wmflabs.org works
[20:19:34] * bd808 is live blogging this debugging :)
[20:19:47] heh
[20:21:33] confirmed "bad gateway" for curl to http://puppetmaster.cloudinfra.wmflabs.org:8100
[20:23:01] I added a new security group rule to the puppetmaster. Must've tripped a bug of some sort.
[20:23:15] hm...
[20:25:49] andrewbogott: I'm getting the 502 response if I curl directly on cloud-puppetmaster-01.cloudinfra.eqiad.wmflabs too, but... maybe that's by design? How does the authn/authz work for the enc backend?
[20:25:56] hm, or maybe not
[20:26:12] bd808: it should work locally
[20:26:15] that's the most common use case
[20:26:21] so the enc is misbehaving somehow
[20:26:32] but not logging anything :|
[20:27:06] 502 makes me think that the backend behind the Apache reverse proxy is not working?
[20:28:02] all okay in here?
[20:28:13] the enc on cloud-puppetmaster-01 is failing
[20:28:55] bd808: on port 8101 is an nginx proxy that rejects write traffic but forwards read traffic to 8100 where the actual enc runs
[20:29:17] andrewbogott: should we try restarting the uwsgi container for encapi?
[20:29:22] I did
[20:29:41] Okay
[20:30:03] cloud-puppetmaster-01 runs nginx which listens on 8100; it forwards to localhost port 8101
[20:30:12] root@cloud-puppetmaster-01:~# curl http://localhost:8101/v1/tools/node/tools-puppetmaster-01
[20:30:12] curl: (52) Empty reply from server
[20:30:18] ok sorry, had it the wrong way around I guess
[20:30:35] andrewbogott: it is failing to connect to a mysql db. `journalctl -u uwsgi-labspuppetbackend.service --no-pager -f`
[20:30:44] ok
[20:30:48] that's what I feared...
[20:31:01] access denied
[20:31:05] didn't you just make a change in this area?
[20:31:08] (1045, "Access denied for user 'labspuppet'@'172.16.4.166' (using password: YES)")
[20:31:21] Krenair: on the cloudinfra db? I don't think so
[20:31:42] https://gerrit.wikimedia.org/r/#/c/labs/private/+/542207/
[20:31:59] what, and that overrode the horizon-set password?
[20:32:05] that's not how that's supposed to work
[20:32:25] or, I guess it's a local patch and not in horizon?
[20:32:37] hm...
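Editor's note: the diagnosis above, collected into one hedged shell sketch. Hostnames, ports, the node path, and the uwsgi unit name are taken verbatim from the log; the exact URL curled against port 8100 is not shown, so the node path on that line is illustrative.

    # 1. Reproduce what the agent sees: run the ENC by hand (exited 1 with a 502)
    /usr/local/bin/puppet-enc puppet-paladox.git.eqiad.wmflabs

    # 2. Check the nginx front end on port 8100 (answered "bad gateway")
    curl http://puppetmaster.cloudinfra.wmflabs.org:8100/v1/tools/node/tools-puppetmaster-01

    # 3. Bypass nginx and hit the uwsgi backend on 8101 directly (empty reply)
    curl http://localhost:8101/v1/tools/node/tools-puppetmaster-01

    # 4. Tail the backend's journal, which surfaced the real failure:
    #    (1045, "Access denied for user 'labspuppet'@... (using password: YES)")
    journalctl -u uwsgi-labspuppetbackend.service --no-pager -f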
[20:33:10] root@cloud-puppetmaster-01:~# grep MYSQL_PASSWORD /etc/uwsgi/apps-enabled/labspuppetbackend.ini
[20:33:10] env=MYSQL_PASSWORD=dummy
[20:33:14] This is obviously not the actual password.
[20:33:43] ok, but /that/ password should be coming from cloudinfra-internal-puppetmaster01 shouldn't it?
[20:34:21] and I see a real-looking "profile::openstack::eqiad1::puppetmaster::encapi::db_pass" on there
[20:34:33] it should be
[20:34:48] hm, patch conflict there
[20:34:49] Ah
[20:34:51] I see why
[20:34:53] I don't know why that would be breaking it but...
[20:34:55] You set it in labs.yaml
[20:34:59] Our cherry-pick in there is for common.yaml
[20:35:11] ah
[20:35:17] ok
[20:35:26] do you want to resolve that rebase conflict while you're in there or shall I?
[20:35:41] I will
[20:35:47] thanks
[20:36:48] done
[20:37:15] probably have to paste that password in by hand on cloud-puppetmaster-01 to get things bootstrapped
[20:37:41] yes
[20:38:26] doing...
[20:38:35] I did it
[20:38:40] ok :)
[20:38:43] oh wait we didn't fix the problem on the repo yet
[20:38:45] it reverted the fix
[20:39:09] shall we change the cherry-pick to replace the line you added to common.yaml?
[20:39:17] er, to labs.yaml
[20:39:31] sure
[20:41:57] Okay
[20:42:09] puppet on the main puppetmaster is happy
[20:42:26] puppet on the tools puppetmaster is happy
[20:43:12] lgtm
[20:43:32] paladox, thanks for reporting this. want to check your instance can run puppet again?
[20:43:41] yup
[20:43:48] thanks to all of you for fixing it :)
[20:44:24] it works!
[20:45:04] thanks Krenair
[20:45:10] np
[21:06:05] andrewbogott: My DB VM is at 2/3 capacity now. I'm going to ask now, though I don't anticipate it filling up for another 6 months at least.
[21:06:35] How do I expand the DB size? Is it possible to draw on another VM's SRV space?
[21:06:51] Or is it possible to spawn a new DB VM with larger disk space?
[21:08:18] Cyberpower678: In theory you can shard your DB across multiple servers but I can't offer much help or guidance with that. If you need a bigger VM then best to open a ticket.
[21:08:50] andrewbogott: yea. It should hold for a while but IABot is at 200 GB of 300 GB
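Editor's note (on the ENC fix above, not the database-capacity discussion): a hedged sketch of the resolution and its verification. The hiera key and the puppet-enc command are verbatim from the log; `puppet agent --test` is the standard agent invocation and is an assumption about what "run puppet" meant here.

    # In the labs/private hiera data, the key ends up defined once; per the
    # discussion, the cherry-pick in common.yaml replaces the line originally
    # added to labs.yaml:
    #   profile::openstack::eqiad1::puppetmaster::encapi::db_pass: '<real password>'

    # Re-run the ENC by hand on cloud-puppetmaster-01:
    /usr/local/bin/puppet-enc puppet-paladox.git.eqiad.wmflabs

    # Then re-run the agent on an affected instance:
    sudo puppet agent --test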