[04:29:42] I am getting error "User None doesn't have upload rights"
[04:29:55] on pywikibot
[04:30:55] even though I am logged in using os.system("python3 /shared/....../pwb.py login")
[04:31:15] Login works correctly
[04:32:22] Look here https://github.com/nokibsarkar/nokib-bot/blob/393f0879061ae7a93037ffad31842d9a9b9e9e03/reduceImage.py#L107
[04:34:30] !help
[04:34:30] If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-kanban
[04:54:25] Knock knock
[04:54:30] !help
[04:54:30] If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-kanban
[05:13:31] Is there any known issue with replication lag for Toolforge, particularly enwiki?
[05:13:55] The replag tool says there is 0 lag, but I'm struggling to find a user that was registered 19 hours ago in the database
[09:10:34] !log project-proxy restart acme-chief service on project-proxy-acme-chief-01 (T262237)
[09:10:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[09:10:39] T262237: The TLS certificate for https://wma.wmflabs.org is expired - https://phabricator.wikimedia.org/T262237
[09:17:58] !log project-proxy upgrading acme-chief deb package from 0.25-1 to 0.28-1 on project-proxy-acme-chief-01 (T26223
[09:18:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[09:18:03] T26223: Add button to &diffonly=1 history link - https://phabricator.wikimedia.org/T26223
[09:18:07] !log project-proxy upgrading acme-chief deb package from 0.25-1 to 0.28-1 on project-proxy-acme-chief-01 (T262237)
[09:18:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[09:18:11] T262237: The TLS certificate for https://wma.wmflabs.org is expired - https://phabricator.wikimedia.org/T262237
[09:23:22] !log project-proxy clean up old apt sources.list entries referencing mitaka-jessie that prevent clean package upgrades on proxy-01
[09:23:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[09:35:28] !log project-proxy refresh some hiera settings in the `project-proxy-acme-chief` puppet tab
[09:35:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[10:05:52] !log project-proxy remove /var/lib/acme-chief/certs/* to force acme-chief to generate new certs instead of renewing them (T262237)
[10:05:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[10:05:56] T262237: The TLS certificate for https://wma.wmflabs.org is expired - https://phabricator.wikimedia.org/T262237
[11:13:14] anybody using PuTTY? I've tried to set it up to use the new domain but it refuses the connection
[11:14:02] refuses?
[11:16:00] let me copy the message
[11:16:02] mmt
[11:16:54] Reedy: proxy: FATAL ERROR: Server refused to open main channel: Administratively prohibited [open failed]
[12:22:06] Hello everyone, I'm having trouble setting up puppet for the deployment-push-notifications01 instance
[12:22:22] It was erroring because it was pointing to puppetdb03
[12:22:41] But now I can't sign a cert from puppetdb04
[12:22:49] any documentation I can look into?
[12:23:52] as far as I remember, certs are signed on puppetmaster* and not on puppetdb*
[12:25:07] yeah, sorry, puppetmaster
[12:25:23] I'm following this guide https://phabricator.wikimedia.org/P7162
[12:25:32] but with puppetmaster04
[12:26:00] how does it fail?
[12:27:28] https://www.irccloud.com/pastebin/3zB5jxZE/
[12:28:07] did you try to run puppet once on deployment-push-notifications01?
[12:28:36] https://www.irccloud.com/pastebin/kNfRR3cw/
[12:28:40] Yes, I did
[12:30:50] uhh interesting
[12:31:16] did you try removing certs from both the master and the agent with the instructions in the error message?
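On the pywikibot question from earlier (04:29), "User None" usually means the bot process never found the saved login session. A hedged sketch of one common fix, under the assumption that the login run and the bot run were using different pywikibot config directories (the `PYWIKIBOT_DIR` path below is hypothetical, and the elided `/shared/....../` path is kept as in the log):

```shell
# pywikibot reads user-config.py and its cookie jar from PYWIKIBOT_DIR (or the
# current working directory). If the login step and the bot script resolve
# different directories, the script sees no session and reports user "None".
export PYWIKIBOT_DIR=/data/project/nokib-bot/pywikibot   # hypothetical path
python3 /shared/....../pwb.py login                      # writes cookies into PYWIKIBOT_DIR
python3 /shared/....../pwb.py reduceImage                # run via pwb.py so the same config loads
```

Calling the script through `pwb.py` (rather than plain `python3 reduceImage.py`) also ensures the pywikibot config bootstrap runs before the upload is attempted.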
[12:31:51] I removed the certs from the agent, but not the master
[12:32:32] I thought the master would do that when signing the cert
[12:32:42] no, the master remembers the existing cert
[12:32:51] $ puppet cert clean deployment-push-notifications01.deployment-prep.eqiad.wmflabs
[12:34:41] nice catch
[12:34:46] but still can't sign
[12:34:49] https://www.irccloud.com/pastebin/Eyx3fc2c/
[12:35:52] you ran puppet on the host between cleaning and trying to sign, right?
[12:36:37] !log tools.wikibugs Updated channels.yaml to: fed690d7bcd92f17f1f4d090d4a05cf35a52a5d7 Send #jenkins to #wikimedia-releng
[12:36:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[12:37:38] Majavah: I got it, thanks!
[12:37:48] It works fine now
[12:37:51] you're welcome
[12:38:04] !log tools.wikibugs restart to pick up https://gerrit.wikimedia.org/r/c/labs/tools/wikibugs2/+/625755
[12:38:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[12:38:43] Yeah, almost, I'm getting this error now
[12:38:44] https://www.irccloud.com/pastebin/ANDy9rRI/
[12:40:03] uhh, that's a puppetdb error
[12:41:09] I have no idea how puppetdb works on deployment-prep :(
[12:55:41] May I ask why, when I changed my hard drive, onecloud is only linking to the file name and not the contents in the search function?
[12:56:19] DDD-016: I believe you are in the wrong place
[12:56:47] this channel is for support related to Wikimedia Cloud Services
[13:17:26] mateusbs17, Majavah: most often, how puppetdb works on deployment-prep is that it crashes: https://phabricator.wikimedia.org/T248041
[13:17:53] not sure if that's today's problem though.
[13:18:39] andrewbogott: should I restart it and see if the problem persists?
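The agent/master cert reset discussed above can be sketched as a sequence. This is a hedged outline, assuming the Puppet 5 `puppet cert` CLI shown in the channel and the default `/var/lib/puppet/ssl` directory (paths can differ per image; the hostname is the one from the log):

```shell
# On the agent: wipe the local SSL state so a fresh key and CSR are generated.
sudo rm -rf /var/lib/puppet/ssl

# On the puppetmaster: forget the agent's old certificate (the command run above).
sudo puppet cert clean deployment-push-notifications01.deployment-prep.eqiad.wmflabs

# On the agent: run puppet once so it submits a new CSR to the master.
sudo puppet agent --test

# On the puppetmaster: sign the newly submitted request.
sudo puppet cert sign deployment-push-notifications01.deployment-prep.eqiad.wmflabs
```

The ordering matters, which is why the follow-up question "you ran puppet on the host between cleaning and trying to sign, right?" resolved it: without the agent run, there is no pending CSR for the master to sign.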
[13:18:52] yeah, that's a good place to start
[13:19:08] I don't know if I'll have time to dig deeper for the next few hours though, sorry
[13:27:45] puppetdb was not even running
[13:28:16] working now
[13:33:25] cool
[13:48:13] Heja. Could someone who's better at understanding cloud things (and expressing themselves) than me comment on https://phabricator.wikimedia.org/T101442 about hosting code in Cloud VPS? Thanks in advance :)
[13:59:38] ok, seems like it was a fluke, now it got created
[13:59:50] (had to stop and start it a few times)
[14:01:50] gave it a try, andre__
[14:02:03] thank you!
[14:02:49] yeah, that's a better explanation, definitely
[14:13:20] the VimConf Live 2020 videos have been published https://www.youtube.com/playlist?list=PLcTu2VkAIIWzD2kicFNHN2c35XQCeZdsv 🤤
[14:13:37] hmm, no, seems like there really is a problem with creating pods on Toolforge :(
[14:14:35] dibabel keeps getting stuck in the ContainerCreating state, with events showing it is unable to mount volumes
[14:14:56] joakino, are you the person to ping?
[14:15:21] !log admin stopping apache2 on labweb1001 to make sure the Horizon outage is total
[14:15:53] nope, yurik. I'm sure someone will show up in a bit
[14:16:34] judging by the above log entry, something big is coming down?
[14:17:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:17:53] hmm, took about 10 min for a pod to get created
[14:18:20] !log admin restarting nova-fullstack service on cloudcontrol1003
[14:18:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:21:00] yurik, yes, there is some disruption of services at the moment. We're working to find and mitigate impacts to WMCS-related services
[14:21:35] thanks balloons!
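For pods stuck in ContainerCreating like dibabel above, the volume-mount failures normally appear in the pod's event stream. A sketch of how to inspect that on the Toolforge Kubernetes cluster; the namespace and pod name below are placeholders, not taken from the log:

```shell
# List the tool's pods and spot the one stuck in ContainerCreating.
kubectl get pods -n tool-dibabel                  # "tool-<name>" namespace is an assumption

# The Events section at the bottom shows FailedMount / volume errors for the pod.
kubectl describe pod -n tool-dibabel <pod-name>

# Recent events for the namespace, newest last:
kubectl get events -n tool-dibabel --sort-by=.lastTimestamp | tail -n 20
```

When NFS or other shared volumes are slow cluster-wide (as during this outage), the same FailedMount events tend to show up across many tools at once, which distinguishes it from a single misconfigured pod.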
[14:29:11] !log admin restarting nova-compute on all cloudvirts (everyone is upset from the reset switch failure)
[14:29:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:54:19] andrewbogott: probably related to the outage, but I noticed that the hiera config was erased for at least 2 instances in horizon
[14:54:41] mateusbs17: which instances?
[14:54:47] mateusbs17: changes you made in the last couple of hours?
[14:55:04] deployment-push-notifications-01, which I was working on today
[14:55:13] And deployment-maps06
[14:55:22] our database cluster was affected, so some edits may have been lost.
[14:55:23] Which I haven't touched since last Friday
[14:55:26] huh
[14:55:34] I see this as the last change, mateusbs17
[14:55:34] worst case, there should be records in git
[14:55:36] The hiera config is empty
[14:55:38] https://www.irccloud.com/pastebin/zmsfGNK6/
[14:56:18] arturo: that's right, but for me the hiera config is now empty, although not applied to the machines
[14:56:37] mateusbs17: we're about to all head into a long-scheduled meeting; if you can create a task about this, we will look later on
[14:56:49] sounds good
[14:56:51] I'd also encourage you to log out/in to Horizon just to make sure you're seeing a consistent state
[14:57:07] ok, will do
[14:57:32] mateusbs17: for the record, this can serve as a backup
[14:57:33] https://gerrit.wikimedia.org/r/admin/repos/cloud/instance-puppet
[14:57:56] arturo: thanks, that's useful information indeed
[14:57:59] for example https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/refs/heads/master/deployment-prep/
[16:24:47] I doubt anyone saw my comment last night, but enwiki database replicas still appear to be lagged much more than replag.toolforge.org is indicating. I filed T262239 with some more details. Is there a known issue or maintenance?
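The replica-lag check described here (and in the 05:13 question about a user registered 19 hours ago) can be reproduced from a Toolforge bastion. This is a sketch assuming GNU `date`, the Toolforge `sql` wrapper, and the standard MediaWiki `user` table; the actual query used by the reporter is in T262239, not here:

```shell
# MediaWiki stores registration times as YYYYMMDDHHMMSS strings; compute a
# cutoff for "19 hours ago" in that format (GNU date syntax).
CUTOFF=$(date -u -d '19 hours ago' +%Y%m%d%H%M%S)
echo "$CUTOFF"

# On a Toolforge bastion, look for accounts registered since the cutoff
# (commented out here because it needs replica credentials):
# sql enwiki "SELECT user_id, user_name, user_registration FROM user
#             WHERE user_registration >= '$CUTOFF'
#             ORDER BY user_id DESC LIMIT 10"
```

If that query returns nothing while recent accounts visible on-wiki exist, the replica is behind regardless of what replag.toolforge.org reports, since replag only measures heartbeat lag on the hosts it samples.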
[16:24:48] T262239: enwiki database replicas appear to be lagged and are falling further behind - https://phabricator.wikimedia.org/T262239
[16:40:14] !log tools.lexeme-forms deployed 9ac796e7aa (Manbhumi verbs)
[16:40:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[16:52:54] ST47, let me have a look
[16:53:08] balloons: I have a hunch it may be related to the dc switch
[16:53:20] bd808, ahh, good point!
[16:53:42] * bd808 applied the #dba tag to the task
[16:55:20] and Manuel responds almost immediately :)
[16:55:57] :D
[16:56:19] At least it is a known situation
[16:58:29] I'm not sure how to warn everyone that things are likely to be a bit slow in the wiki replicas the whole time that the main wikis are being served from the codfw datacenter
[16:59:13] I think it would help to place a notice on replag.toolforge.org, if you are able
[16:59:24] And perhaps a note in the /topic for this channel?
[18:05:00] !log tools.openstack-browser Updated to ec02307 for T262293
[18:05:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.openstack-browser/SAL
[18:21:28] !log clouddb-services copied the profile::mariadb::section_ports key into prefix puppet to fix puppet after the refactor for wmfmariadbpy
[18:21:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[20:54:57] hi all! welcome razzi! He's a new SRE on the analytics engineering team
[20:55:10] he's learning how to set up a standalone puppetmaster in Cloud VPS
[20:55:13] Hi all :)
[20:55:25] I haven't done this in a while, but something isn't working right, and we are following the docs pretty closely
[20:55:39] we can get the standalone master to be a puppet agent client of itself fine
[20:55:47] but a second node isn't working
[20:55:57] we are able to sign the client cert on the master just fine
[20:56:12] but then the next puppet run on the client gives: Error: Could not request certificate: SSL_connect returned=1 errno=0 state=error: certificate verify failed (certificate rejected): [ok for /CN=razzi-puppetmaster.analytics.eqiad.wmflabs]
[20:56:29] stumped for the moment, so I suggested we ask in here
[20:56:43] maybe bstorm has some advice :)
[20:56:50] I have to run very soon, but razzi will be around for a bit
[20:56:58] Indeed
[20:57:50] * bstorm wonders how much needs updating in the docs
[20:58:18] Hmm. Actually, this could be related to this morning's change
[20:58:26] andrewbogott: check this out above ^^
[20:58:35] The cert that was rejected
[20:58:55] Did you just create that puppetmaster?
[20:59:55] Created it last week
[21:00:08] Ok, so before the change
[21:00:52] * bstorm goes to look at the puppetmaster
[21:01:48] bstorm: that's probably from naming changes but I can't look right now, feel free to make me a bug
[21:01:58] Sure! I'll collect some info first
[21:01:59] gotta run, good luck, thanks bstorm and all!
[21:02:33] razzi: what's the name of the client that was having trouble?
[21:02:50] bstorm: razzi-puppet-client.analytics.eqiad.wmflabs
[21:03:03] easy enough to remember :)
[21:04:43] Yeah, it does seem related to the changes. I'll start recording info
[21:09:38] Gotcha. Any related docs I can peek at?
I don't suppose I'll understand, but maybe I'll learn something
[21:10:54] Does anyone with access to the configuration of cloud machines have a minute to look at https://phabricator.wikimedia.org/T262186? Thanks. In addition: tools-sgeexec-0947 is also affected
[21:12:28] T262328
[21:12:29] T262328: Standalone puppetmaster seems broken, possibly due to FQDN changes - https://phabricator.wikimedia.org/T262328
[21:13:50] Thanks
[21:14:46] That's linked to the parent task that talks about what was done
[21:15:06] I'm looking at a few other things to see if there's a quick way to unblock you
[21:15:24] Wurgl: that seems like there's a difference in the PHP libs installed.
[21:15:31] It works on some nodes, I take it
[21:27:14] Yes, some updates done here but not there …
[21:29:22] the best option right now is for y'all to just stop for a bit
[21:29:32] while I fix a different known issue that might relate to the puppetmaster thing
[21:31:05] Can do. This isn't blocking me; not an issue for me to wait
[21:31:36] thx
[21:31:50] Great.
[21:39:18] Wurgl: does it work on the login *and* the dev bastion?
[21:39:38] tools-sgebastion-08 works
[21:39:44] Ok, good
[21:40:01] That's got the php-7.2 version that I think really should be installed
[21:40:02] I have one tool which uses redis and it seems to run stably
[21:40:16] just an annoying error message
[21:41:29] It's one of the few things that isn't installed from the correct repo across the whole set, so I'll try a patch to see if that clears it up
[21:48:11] !log admin Renamed FQDN prefixes to wikimedia.cloud scheme in cloudinfra-db01's labspuppet db (T260614)
[21:48:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[21:48:16] T260614: Phase out use of .wmflabs tld - https://phabricator.wikimedia.org/T260614
[22:11:31] ssh/mosh to toolforge seems broken
[22:16:36] gifti: we are having some issues with ldap apparently. bstorm is working on it
[22:53:47] !log tools forcing puppet run on tools-sgebastion-07
[22:53:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:24:52] !log tools clearing grid queue error states blocking job runs
[23:24:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:45:25] !log tools.toolschecker deleted job 2111722 to clear up errored queue
[23:45:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.toolschecker/SAL
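The grid-queue cleanup logged at 23:24 and 23:45 typically corresponds to commands like the following on the grid master. This is a plausible reconstruction, not the exact commands run; `2111722` is the job id from the log, and standard (Son of) Grid Engine tooling is assumed:

```shell
# Show queue instances and explain why any are in the Error state.
qstat -f -explain E

# Clear the Error state on all queue instances so jobs can be scheduled again.
sudo qmod -c '*'

# Delete the specific stuck job that was holding the errored queue.
qdel 2111722
```

Queues enter the Error state when a job fails in a way the scheduler attributes to the queue itself, so clearing the state without removing the offending job usually just re-triggers it; hence the paired `qmod -c` and `qdel` in the SAL entries.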