[00:12:42] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:38:18] Labs, Tool-Labs, Community-Tech-Tool-Labs, Developer-Relations, and 2 others: Developing community norms for vital bots and tools - https://phabricator.wikimedia.org/T149312#2944007 (bd808) Session notes: https://etherpad.wikimedia.org/p/devsummit17-developing-community-norms Video: https://www.y...
[02:39:12] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Facilitate Volunteer NDA application process for potential Tool Labs standards committee appointees - https://phabricator.wikimedia.org/T154625#2944057 (bd808)
[03:09:04] Labs, Tool-Labs: Please install mktorrent on tool labs - https://phabricator.wikimedia.org/T155470#2944096 (Legoktm)
[05:30:35] (CR) TTO: [C: 1] Add mediawiki/extensions/Babel to #wikimedia-dev for gerrit [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/331899 (https://phabricator.wikimedia.org/T155165) (owner: Paladox)
[06:08:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[06:12:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[06:49:20] PROBLEM - Puppet run on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:16:12] (CR) Legoktm: [C: 2] Add mediawiki/extensions/Babel to #wikimedia-dev for gerrit [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/331899 (https://phabricator.wikimedia.org/T155165) (owner: Paladox)
[07:16:34] (Merged) jenkins-bot: Add mediawiki/extensions/Babel to #wikimedia-dev for gerrit [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/331899 (https://phabricator.wikimedia.org/T155165) (owner: Paladox)
[07:16:45] (CR) jenkins-bot: Add mediawiki/extensions/Babel to #wikimedia-dev for gerrit [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/331899 (https://phabricator.wikimedia.org/T155165) (owner: Paladox)
[07:29:19] RECOVERY - Puppet run on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:26:34] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Shweta Chandrakant Pawar was created, changed by Shweta Chandrakant Pawar link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Shweta_Chandrakant_Pawar edit summary: Created page with "{{Tools Access Request |Justification=Shweta Chandrakant Pawar is Born on 1st May 1982. And she is real heroine for every girl. She is very good humanbeing. She is also Acting..."
[08:27:40] >She is very good humanbeing
[08:52:10] Labs, Labs-Infrastructure: Empty default security group for newly created project - https://phabricator.wikimedia.org/T136871#2944456 (Lokal_Profil) Just a note about this still happening
[09:07:21] Labs, Tool-Labs-tools-Other, Community-Tech-Tool-Labs, Developer-Relations, Developer Wishlist (2017): Create an authoritative and well promoted catalog of Wikimedia tools - https://phabricator.wikimedia.org/T115650#2944497 (Qgil)
[09:09:37] Labs, Tool-Labs, Community-Tech-Tool-Labs, Developer-Relations, and 2 others: Run a documentation sprint for Labs - https://phabricator.wikimedia.org/T101659#2944502 (Qgil)
[09:39:53] Question... are people already able to usurp and or adapt abandoned tools?
[09:44:49] Wiki13: bd808__ could correct me if I'm wrong, but I think not
[09:45:35] :/
[09:49:11] do you know when people will be able to?
[10:00:31] Wiki13: nope, but bd808__ might. he is probably asleep tho
[10:00:33] forking is always an option
[11:25:52] yuvipanda: !
[11:27:28] Hi samtar
[11:27:46] Am at a hospital now, will be back in a few hours. Do ask your question :)
[11:27:58] I'll get to it as soon as i can
[11:28:26] yuvipanda: How's it going? :) I was going to ask if I could be that Super Annoying Person(TM) and ask if you can/know of anyone who can generate a replica.my.cnf?
[11:29:47] That would be me :) I'll get to it in a few hours.
[11:30:05] I've some amount of work to do around user replica.my.cnf files
[11:31:20] Tool-Labs-tools-Attribution-Generator, TCB-Team, User-Tobi_WMDE_SW: E-Mail Adress - https://phabricator.wikimedia.org/T155093#2944730 (Tobi_WMDE_SW)
[11:32:11] yuvipanda: :D <3 well that would be awesome - hope the hospital visit is just that :)
[12:17:01] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:08:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[13:10:56] Tool-Labs-tools-Attribution-Generator, TCB-Team, User-Tobi_WMDE_SW: E-Mail Adress - https://phabricator.wikimedia.org/T155093#2944809 (Tobi_WMDE_SW) Open>Resolved a:Tobi_WMDE_SW You have been added @Katja_Ullrich_WMDE!
[14:10:25] samtar: I just got back, will work on it in a bit
[14:10:37] yuvipanda: No worries :D thank you
[14:13:03] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:21:52] Labs: Webservice outages and/or issues - https://phabricator.wikimedia.org/T155494#2944916 (doctaxon)
[14:28:55] Labs: Webservice outages and/or issues - https://phabricator.wikimedia.org/T155494#2944932 (doctaxon)
[14:30:21] Labs: Webservice outages and/or issues - https://phabricator.wikimedia.org/T155494#2944916 (doctaxon)
[14:44:42] doctaxon: there's webservicemon iirc
[14:49:57] * zhuyifei1999_ is searching
[15:03:45] Labs: Webservice outages and/or issues - https://phabricator.wikimedia.org/T155494#2944996 (zhuyifei1999) https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Grid#Bigbrother > The webservice system uses manifest monitors to provide similar functionality automatically. See also {T90561}. I'm pretty sure webse...
[15:09:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[15:27:11] Tool-Labs-tools-Attribution-Generator, TCB-Team, User-Tobi_WMDE_SW: Redirect e-mail adress given in Attribution-Generator feedback form to katja - https://phabricator.wikimedia.org/T155093#2945015 (Aklapper)
[15:39:49] Labs, Tool-Labs: Webservice outages and/or issues - https://phabricator.wikimedia.org/T155494#2945042 (zhuyifei1999)
[16:00:51] Krenair: think I should just go ahead and merge https://gerrit.wikimedia.org/r/#/c/331638/, or do I need to pre-announce the change to tools users?
[16:01:16] (ordinarily I'd add it to the deployment calendar to avoid conflict with other things, but it seems pretty clear that there are no deployments happening today)
[16:02:38] andrewbogott, should probably check that ruby-httpclient on a few puppetmasters is happy with LE certs first
[16:02:50] maybe restart the puppetmaster services there too
[16:03:36] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Facilitate Volunteer NDA application process for potential Tool Labs standards committee appointees - https://phabricator.wikimedia.org/T154625#2945136 (bd808)
[16:05:12] Krenair: You mean, after merging? Or are there tests I should be doing beforehand?
[16:05:55] before
[16:06:29] to make sure the ruby-httpclient ssl config change took effect
[16:08:33] ok, so that's the steps that you list in T154913. Let me try that on the tools puppetmaster...
[16:08:33] T154913: convert wikitech.wikimedia.org from globalsign to letsencrypt certificate (deadline 2017-02-24) - https://phabricator.wikimedia.org/T154913
[16:13:35] Krenair: I'm still seeing "unable to get local issuer certificate"
[16:14:00] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:14:56] what host?
[16:15:19] and how are you testing this exactly
[16:15:32] I'm on tools-puppetmaster-02
[16:15:39] and I'm trying to do the steps you list in that ticket.
[16:15:51] So, in particular, it was @http.get('https://labtestwikitech.wikimedia.org')
[16:25:18] andrewbogott, you should get the same results
[16:26:01] the point was to demonstrate the exact change I was making to the script - one part of that comment is to test before the new added lines, and the rest tests with the new lines
[16:31:56] oh, ok — I misunderstood
[16:34:02] So that c/p is from a system that already had https://gerrit.wikimedia.org/r/#/c/311048/ applied?
[16:39:22] Is this some variety of spam? -- https://wikitech.wikimedia.org/wiki/User:Shweta_Chandrakant_Pawar
[16:39:49] The associated tools membership request certainly looks spammy
[16:41:14] bd808: the tool request is definitely..
[16:41:59] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Shweta Chandrakant Pawar was modified, changed by BryanDavis link https://wikitech.wikimedia.org/w/index.php?diff=1333677 edit summary:
[16:53:59] Krenair, I'm going to skip over my ongoing confusion and ask: how should I "check that ruby-httpclient on a few puppetmasters is happy with LE certs first" ?
[16:56:10] andrewbogott, the script I posted on the task should give the same results before and after the patch is applied
[16:56:24] it does not run the code edited in the patch
[16:56:42] It merely demonstrates the change I made to the code in a standalone way
[16:57:07] I'm not sure exactly how to check whether it applied. Hmm
[16:57:37] Labs, Labs-Infrastructure: Empty default security group for newly created project - https://phabricator.wikimedia.org/T136871#2945454 (chasemp) >>! In T136871#2944456, @Lokal_Profil wrote: > Just a note about this still happening Can you be more specific @Lokal_Profil. Is there a new project you notic...
[16:59:54] andrewbogott, I guess we can try switching wikitech to LE and see if it breaks puppet anywhere
[17:01:14] +1
[17:01:19] it looked good to me
[17:04:15] You can check that /var/lib/puppet/lib/hiera/mwcache.rb got updated
[17:05:29] Krenair: or we could point a labs instance to labtestcontrol puppetmaster
[17:05:40] sure
[17:05:54] I'll set up a one-off instance to try that
[17:05:55] that... might work
[17:05:58] and then will just roll out the patch
[17:06:37] Krenair: will any old instance do, or does it specifically need to be running a ::standalone master?
[17:06:51] I think any old instance would do
[17:06:59] 'k
[17:07:09] I think the worst that could happen when rolling this out is puppet fails everywhere, but would be fixed by a revert of the cert change commit
[17:08:23] * andrewbogott nods
[17:10:03] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[17:18:56] ok, testing with labtest is just leading me down a bunch of rabbit holes. I'm going to just merge and revert if it goes badly.
[17:28:16] Krenair: puppet on silver says
[17:28:17] "Error: /Stage[main]/Role::Labs::Openstack::Nova::Manager/Sslcert::Certificate[wikitech.wikimedia.org]/File[/etc/ssl/localcerts/wikitech.wikimedia.org.crt]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///files/ssl/wikitech.wikimedia.org.crt"
[17:28:38] that's probably leftover stuff from the pre-LE setup?
[17:28:39] andrewbogott: are you pursuing the cert replacement using LE?
[17:28:42] ah ok :)
[17:29:16] ( andrewbogott: thank you )
[17:29:58] andrewbogott, that's interesting
[17:30:12] is sslcert::certificate not able to handle ensure => absent properly?
[17:31:48] it looks to me like it /should/ handle it, all I see is $ensure getting passed to file resources
[17:32:57] andrewbogott, and also source getting passed
[17:33:05] but source should be ignored by file resources if ensure is absent, right?
[17:33:10] I'd think
[17:33:43] we could change sslsert::certificate to check $ensure, but it's hacky
[17:34:11] maybe we can find a puppet expert in the other channel to take a look before we resort to that?
[17:34:33] Krenair: I'm looking at Role::Labs::Openstack::Nova::Manager… should it be defining Sslcert::Certificate at all?
[17:34:52] only to get rid of the old one from the machine
[17:35:20] where is that happening? I only see it in the else clause
[17:35:21] could do it manually instead of fiddling with ensure => absent resources
[17:35:32] (which shouldn't be traversed on account of hiera('wikitech_use_letsencrypt', false) )
[17:35:53] the if/else is gone
[17:36:03] oh, I'm looking at an obsolete branch, sorry
[17:36:05] * andrewbogott rebases
[17:37:34] I'm looking at https://gerrit.wikimedia.org/r/#/c/331638/4/modules/role/manifests/labs/openstack/nova/manager.pp
[17:38:54] Krenair: ok, so the dumb solution is something like https://gerrit.wikimedia.org/r/#/c/332519/
[17:39:39] yep
[17:39:58] I think
[17:40:18] lemme see if I can scare up alexandros for a second opinion
[17:40:27] Krenair: the good news is, nothing else broke
[17:40:32] yay
[17:40:40] I think
[17:41:00] yeah, wikitech shows an LE cert
[17:41:03] okay
[17:41:15] so we just need to make puppet on silver.wm.o happy
[17:43:08] <_joe_> andrewbogott: can you state your problem again? I wasn't here apparently
[17:43:34] We did https://gerrit.wikimedia.org/r/#/c/331638/4/modules/role/manifests/labs/openstack/nova/manager.pp
[17:43:39] But this error came up on silver.wm.o:
[17:43:40] _joe_: It seems like puppet is failing to handle ensure=>absent properly
[17:43:46] Error: /Stage[main]/Role::Labs::Openstack::Nova::Manager/Sslcert::Certificate[wikitech.wikimedia.org]/File[/etc/ssl/localcerts/wikitech.wikimedia.org.crt]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///files/ssl/wikitech.wikimedia.org.crt
[17:43:51] Proposed fix is https://gerrit.wikimedia.org/r/#/c/332519/ but that seems really stupid
[17:44:12] We set ensure => absent on this resource but it complains about the source not existing
[17:44:27] <_joe_> ok let me look at this for a sec
[17:44:39] thanks
[17:45:01] (I'm running the proposed patch through the puppet compiler now, to see if it actually helps)
[17:45:09] I suggested a hacky potential fix to sslcert::certificate which andrew uploaded in https://gerrit.wikimedia.org/r/#/c/332519/ - we both think this is suboptimal
[17:45:36] we could just do it manually rather than using ensure => absent
[17:45:41] <_joe_> uhm
[17:45:49] <_joe_> this is indeed weird and dumb
[17:45:53] <_joe_> sounds like puppet!
[17:45:56] :)
[17:46:02] <_joe_> lemme verify quickly
[17:46:59] You would expect that with a file resource, if you set ensure => absent (which we are passing in indirectly), it would ignore the source. I think.
[17:47:03] puppet compiler is weird since it seems to imply that it regards the before state as still valid
[17:47:10] https://puppet-compiler.wmflabs.org/5119/
[17:47:31] <_joe_> puppet apply -e 'file { "/home/joe/test": ensure => absent, source => "/tmp/nonexistent" }'
[17:47:34] <_joe_> Notice: Compiled catalog for gizmo.local in environment production in 0.06 seconds
[17:47:37] <_joe_> Error: /Stage[main]/Main/File[/home/joe/test]: Could not evaluate: Could not retrieve information from environment production source(s) file:/tmp/nonexistent
[17:47:40] <_joe_> Notice: Applied catalog in 0.04 seconds
[17:47:41] <_joe_> puppet, you suck
[17:47:44] <_joe_> ahahahahah
[17:47:44] !log deployment-prep re-enabling puppet on deployment-restbase01
[17:47:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[17:49:54] didn't know you could do that to test, neat
[17:50:58] !log deployment-prep re-enabling puppet on deployment-restbase02
[17:51:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[17:51:05] _joe_: that's...bizarre
[17:51:14] <_joe_> chasemp: it is, right?
[17:51:47] <_joe_> chasemp: look
[17:51:48] <_joe_> joe@gizmo:~$ touch /tmp/nonexistent
[17:51:48] <_joe_> joe@gizmo:~$ puppet apply -e 'file { "/home/joe/test": ensure => absent, source => "/tmp/nonexistent" }'
[17:51:51] <_joe_> Notice: Compiled catalog for gizmo.local in environment production in 0.06 seconds
[17:51:54] <_joe_> Notice: Applied catalog in 0.08 seconds
[17:52:07] then next run fails I imagine?
[17:52:11] no
[17:52:17] I thought so too, but read carefully
[17:52:24] source is /tmp/nonexistent, target is /home/joe/test
[17:52:26] <_joe_> nope, because /tmp/nonexistent still is there
[17:52:31] ah
[17:52:37] target is ensure => absent'd, source file is not controlled by puppet
[17:52:46] <_joe_> yes
[17:52:58] <_joe_> the funny thing is that I can imagine why this goes like that
[17:53:19] <_joe_> btw when you have such doubts, #puppet is a good place to ask
[17:53:49] okay, so
[17:54:00] <_joe_> there are a pair of the elite puppet experts in the world hanging out there
[17:54:07] cool
[17:54:10] <_joe_> I mean besides me
[17:54:13] :D
[17:54:18] the proper way around this is our unideal patch for sslcert::certificate?
[17:54:34] <_joe_> Krenair: seems so, sigh
[17:54:40] <_joe_> but it cannot be so stupid
[17:54:55] yeah but, I think it just was :p
[17:56:21] I will merge my patch, but I'm going to frown while doing it.
[18:00:18] _joe_, Krenair, it worked :(
[18:00:29] :/
[18:01:03] <_joe_> I'm sure there are better ways to do it
[18:01:11] <_joe_> but well, whatever
[18:01:32] if there are better ways let's take a look at those
[18:04:07] _joe_, I want to say something like "if you think it's worth the time", but sometimes people would say that to imply that it's not. I also don't want to push you to do it if you don't have time
[18:04:26] Krenair: time for me to close that ticket, or is there other followup?
[18:05:10] andrewbogott: was all that triggered by wikitech move to LE this week?
[18:05:19] (well, i have a note to ask if we are going to do that this week)
[18:05:35] I also don't want to make it seem like I'm just rushing to get poor fixes adopted without considering better ones. I just don't know of any better ones
[18:05:49] robh: Yeah. Typically we don't bother with ensure=>absent for the old certs, but we did this time.
[18:06:07] well, be careful when applying the patches cuz you need the old source certs there for the initial creation of LE certs
[18:06:19] and your config has to source the old certs until after regen
[18:06:21] Everything about adding the new cert went just fine, it was only the cleanup after that broke
[18:06:26] I thought our LE puppetisation handles that robh
[18:06:26] ahh, cool
[18:06:31] Krenair: Nope =P
[18:06:35] andrewbogott, hm. you may want to remove the old key from puppet-private
[18:06:46] Krenair: yeah, ok
[18:06:46] it breaks, daniel and i have experienced it in detail!
[18:06:59] I can't think of any other follow-up but I'm not usually the one who deals with these. mutante and robh have been doing them
[18:07:02] so the fix is leave old cert, roll new cert class via puppet, but dont touch apache config
[18:07:11] but yeah, if tis all working on new cert now, yay!
[18:07:23] robh, to be honest I've not seen it succeeding to handle that
[18:07:28] but I thought it made some attempt to :)
[18:07:41] afaik it makes zero attempt but i may be wrong
[18:07:50] ive been poorly trying to understand the python script
[18:08:00] Krenair: so, that's modules/secret/secrets/ssl/wikitech.wikimedia.org.key ?
[18:08:06] modules/letsencrypt/manifests/cert/integrated.pp says 'Pre-setup with self-signed cert if necessary, to let $puppet_svc start'
[18:08:08] andrewbogott: indeed
[18:08:16] andrewbogott, don't know, don't have access to that repo
[18:08:20] it is
[18:08:38] and we dont bother to revoke the cert
[18:08:49] cuz cluttering the revokation lists for that is messy and not ideal
[18:08:55] (plus it expires next week anyhow)
[18:09:12] thanks for handling it folks!
[18:09:25] Krenair: I'll double check with daniel
[18:09:37] andrewbogott: so the absent i dislike
[18:09:39] we tend to shred it
[18:09:41] not absent it
[18:09:47] absent doesnt do anything but rm
[18:10:01] robh: also it would store it in the archive on-server
[18:10:03] afaik
[18:10:10] but absent automates versus our manual shred so im not sure which is best
[18:10:31] and my use of shred may simply be paranoia.
[18:10:39] shred referring to the multiple-pass deleting processes?
[18:10:43] yep
[18:11:00] likley overly paranoid, and doesnt work if system is ssd of course
[18:11:02] have we made that impossible by absenting it?
[18:11:07] robh: this is largely moot for a just-about-to-expire cert, right?
[18:11:12] well, absent removes the file via rm right?
[18:11:33] not via any overly destructive process (nor do i think it typically should)
[18:11:33] yeah
[18:11:49] afaik
[18:11:55] im just sharing what i used to do while acknowledging it is likely not the best practice for this
[18:12:03] okay
[18:12:03] robh: https://docs.puppet.com/puppet/latest/type.html#file-attribute-backup
[18:12:10] the absent is likely ok. if the file was complromised we would have revoked anyhow
[18:12:11] it probably got filebucketed in the puppet store
[18:12:16] oh
[18:12:17] unless you do
[18:12:28] backup => false
[18:12:32] afair
[18:12:51] so shouldnt our new patch set backup fasle?
[18:12:53] false even
[18:12:58] well, andrews =]
[18:13:17] I said 'our' earlier when I didn't want andrew to get the blame for what I thought was a bad idea of mine
[18:13:28] on the other hand if we're giving out credit, well, andrew went to the effort of actually implementing it
[18:13:42] I didnt wanna say our and claim any credit for you guys fixing shit ;]
[18:14:23] but it seems like when we absent a key file, we may want to ensure it isn't in the local filebucket backup as chasemp lists
[18:14:27] (unless im not following)
[18:14:42] yes, as long as we want it dead and gone for sure
[18:14:58] https://docs.puppet.com/puppet/latest/man/filebucket.html
[18:15:09] you can pretty trivially retrieve filebucketed items usually
[18:15:40] that key is also going to remain sitting in the history of the puppet-private repo right?
[18:16:05] correct
[18:16:14] also on our paid ssl certs, even if we lost the key
[18:16:24] we could regen a new cert and ey without added cost on the original cert's expiry
[18:16:41] so multiple safety nets
[18:26:47] !log tools Disabling puppet across tools to test https://gerrit.wikimedia.org/r/#/c/329707/
[18:26:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:27:23] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[18:27:29] PROBLEM - Puppet run on tools-static-10 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[18:28:09] PROBLEM - Puppet run on tools-exec-1419 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[18:28:17] gah
[18:28:28] working on it ^
[18:29:13] PROBLEM - Puppet run on tools-webgrid-lighttpd-1207 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[18:29:19] PROBLEM - Puppet run on tools-worker-1019 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[18:29:21] PROBLEM - Puppet run on tools-worker-1004 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[18:29:31] PROBLEM - Puppet run on tools-webgrid-lighttpd-1406 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[18:32:01] PROBLEM - Puppet run on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[18:32:19] PROBLEM - Puppet run on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[18:47:21] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:47:53] !log tools Reenabled puppet across tools
[18:47:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:49:22] RECOVERY - Puppet run on tools-worker-1004 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:51:59] madhuvishy: things going ok? :)
[18:52:02] RECOVERY - Puppet run on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:52:19] chasemp: yeah :) I disabled puppet a bit too late
[18:52:30] RECOVERY - Puppet run on tools-static-10 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:54:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1207 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:54:20] RECOVERY - Puppet run on tools-worker-1019 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:54:31] RECOVERY - Puppet run on tools-webgrid-lighttpd-1406 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:57:17] RECOVERY - Puppet run on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:58:09] RECOVERY - Puppet run on tools-exec-1419 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:20:02] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:28:40] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:04:36] Labs, Labs-Infrastructure, labs-sprint-117, Patch-For-Review: Give 'novaobserver' keystone account rights to read everything, everywhere, write or change nothing - https://phabricator.wikimedia.org/T104588#2946336 (Andrew)
[20:04:39] Labs, Labs-Infrastructure, labs-sprint-117: Support a multi-domain model in keystone - https://phabricator.wikimedia.org/T115026#2946334 (Andrew) Open>declined This turns out to not be needed for any of my plans, so closing for now.
[20:04:58] Labs, LDAP, Patch-For-Review: Clean up ldap host entries and references - https://phabricator.wikimedia.org/T148781#2946341 (Andrew)
[20:05:03] Labs, Labs-Sprint-109, Patch-For-Review: Remove reliance on ldap $::projectid from shinkengen - https://phabricator.wikimedia.org/T108625#2946340 (Andrew) Open>Resolved
[20:06:29] Labs, Labs-Infrastructure, labs-sprint-117: Support a multi-domain model in keystone - https://phabricator.wikimedia.org/T115026#2946348 (Andrew)
[20:06:31] Labs, Labs-Infrastructure, labs-sprint-117: switch to keystone api v3 - https://phabricator.wikimedia.org/T115027#2946345 (Andrew) Open>Resolved a:Andrew This is done in all the places that matter.
[20:06:49] Labs, Labs-Sprint-109, labs-sprint-116, labs-sprint-117, and 3 others: Monitor nova services - https://phabricator.wikimedia.org/T90784#2946357 (Andrew)
[20:06:51] Labs, Labs-Infrastructure, labs-sprint-117, Patch-For-Review: Give 'novaobserver' keystone account rights to read everything, everywhere, write or change nothing - https://phabricator.wikimedia.org/T104588#2946352 (Andrew) Open>Resolved a:Andrew This works!
[20:09:44] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[20:09:44] Labs, Operations, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2946364 (madhuvishy)
[20:10:20] Labs, Tool-Labs, Tool-Labs-tools-LTA-Knowledgebase: tools.lta missing replica.my.cnf - https://phabricator.wikimedia.org/T155317#2946377 (chasemp) @yuvipanda, I know you're digging around here. I'm going to leave this for you in case it's interesting in your current debugging.
[20:11:02] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[20:34:45] Labs, Labs-Infrastructure: Empty default security group for newly created project - https://phabricator.wikimedia.org/T136871#2946515 (Lokal_Profil) >>! In T136871#2945454, @chasemp wrote: >>>! In T136871#2944456, @Lokal_Profil wrote: >> Just a note about this still happening > > Can you be more specif...
[20:38:57] Labs, Labs-Infrastructure: Empty default security group for newly created project - https://phabricator.wikimedia.org/T136871#2946516 (chasemp) a:Andrew
[20:45:08] Labs, Operations, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2907785 (Neil_P._Quinn_WMF) @madhuvishy Do you know what `editor-engagement` is? Is is the project for http://ee-dashboards.wmflabs.org? I'm trying to figure out...
[21:08:49] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2946759 (chasemp) >>! In T143349#2921057, @chasemp wrote: > @Mattflaschen-WMF (listed as contact here https://wikitech.wikimedia.org/wiki/Nova_Resource:Editor-engagement) > > do you k...
[21:11:46] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2946792 (chasemp) > bots wm-bot @Petrb Do you know if this 'wm-bot' instance can be deleted? From this [[ https://wikitech.wikimedia.org/wiki/Nova_Resource:Bots | text ]]: ```An obs...
[21:12:48] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2946796 (chasemp)
[21:14:33] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2565449 (chasemp) @Qgil and @acs do you know if the instance (admins here https://wikitech.wikimedia.org/wiki/Nova_Resource:Contributors): > Contributors-metrics.contributors.eqiad.wm...
[21:14:50] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2946815 (chasemp)
[21:16:00] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:17:25] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2946831 (Krenair) >>! In T143349#2946792, @chasemp wrote: >> bots wm-bot > > @Petrb Do you know if this 'wm-bot' instance can be deleted? From this [[ https://wikitech.wikimedia.org/...
[21:45:31] Labs, Operations, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2946953 (madhuvishy) @Neil_P._Quinn_WMF This is the link to the project - https://wikitech.wikimedia.org/wiki/Nova_Resource:Editor-engagement. It's described as...
[21:47:05] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2946955 (Acs) >>! In T143349#2946796, @chasemp wrote: > @Qgil and @acs do you know if the instance (admins here https://wikitech.wikimedia.org/wiki/Nova_Resource:Contributors): > >> C...
[22:08:13] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2565449 (scfc) AFAIK, `wm-bot.bots` (still) runs [[https://meta.wikimedia.org/wiki/Wm-bot|`wm-bot`]] (https://github.com/benapetr/wikimedia-bot). As I believe its services are in use...
[22:32:42] hey halfak
[22:32:45] Hey folks. I'm trying to help jonas_agx_ connect to some of my instances.
[22:32:52] But we can't figure out what his username is.
[22:33:22] (PS7) Paladox: Connect wikibugs to irc over ssl [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328663 (https://phabricator.wikimedia.org/T141089)
[22:34:02] chasemp, ^
[22:34:19] I used tool labs awhile ago
[22:34:42] it's his shell name on wikitech, jonas_agx_ can you login to wikitech?
[22:35:28] jonas_agx_, https://wikitech.wikimedia.org
[22:36:18] ah sorry
[22:37:07] I can't log in to wikitech
[22:37:26] Could it be that you'
[22:37:33] misremembered using tools?
[22:37:44] Or that you chose a different username than your usual one?
[22:37:46] Found my user now
[22:37:50] \o/
[22:38:17] Sorry for that delay
[22:38:27] Logged to wikitech right now
[22:38:48] my user is Agx
[22:39:05] your "Instance shell account name" will be on https://wikitech.wikimedia.org/wiki/Special:Preferences
[22:39:23] halfak: agx:x:2984:500:Agx:/home/agx:/bin/bash
[22:39:25] * halfak takes notes for next time
[22:39:25] that's him I think
[22:39:46] Thanks chasemp & bd808. I think I can take it from here :)
[22:39:53] Thank you!
[22:40:26] I'm so sorry for this.
[22:40:38] No worries. This is why there's a chat room to get help :)
[22:40:51] * halfak loves to hang out in this room and help when he can.
[22:40:59] :d
[22:41:01] :D
[22:41:02] Usually it's just SQL queries though
[22:41:13] One sec and I'll have the instance ready for you.
[22:41:19] Heading back to #wikimedia-ai
[22:41:21] thanks again!
[22:53:58] jonas_agx_: please never feel bad about asking for help. keeping the whole universe in a single head is pretty much not possible. :)
[23:08:46] Labs, Operations, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2947504 (Neil_P._Quinn_WMF) @madhuvishy: okay, thanks! None of that looks like it impacts my work. I'm not sure who else one could talk to about that project, bu...
[23:14:42] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:41:39] Labs, Tool-Labs: Grid engine masters down - https://phabricator.wikimedia.org/T100554#2947633 (scfc)
[23:41:42] Labs, Tool-Labs: Test if grid engine master non-failure depends on the lengths of /etc/hosts lines - https://phabricator.wikimedia.org/T100660#2947632 (scfc) Resolved>declined
[23:43:58] MediaWiki-extensions-OpenStackManager: Non-Admin users can't see anything in manage addresses interface - https://phabricator.wikimedia.org/T57897#2947666 (scfc) Resolved>declined >>! In T57897#2486743, @hashar wrote: > That works given you have proper rights. … but this task was about "does not sho...