[00:02:53] <wikibugs>	 10Toolforge, 10InternetArchiveBot (v1.4), 10User-Zppix, 10User-bd808: IABot Management interface: Make the login sessions last longer or add an option for "remember me" - https://phabricator.wikimedia.org/T170849#3464330 (10Cyberpower678) So the problem is for some reason the DB Session modifications don't...
[00:02:59] <Cyberpower678>	 bd808: ^
[00:03:05] <Cyberpower678>	 Do you need more information?
[00:03:43] <bd808>	 Cyberpower678: that seems like a good place to start working from.
[00:04:10] <bd808>	 my idea is for me to make a test tool and try to get db sessions working properly there
[00:04:33] <bd808>	 I'll start from your core and see if I can isolate what may cause it to fail
[00:05:12] <bd808>	 I'm kind of thinking that making a composer library just to do this very specifically for Toolforge might be a long term useful thing
[00:06:04] <wikibugs>	 10Toolforge, 10InternetArchiveBot (v1.4), 10User-Zppix, 10User-bd808: IABot Management interface: Make the login sessions last longer or add an option for "remember me" - https://phabricator.wikimedia.org/T170849#3464333 (10Cyberpower678) As a note AES-256-CBC-HMAC-SHA256 is not supported on Toolforge so i...
[00:06:58] <Cyberpower678>	 bd808: Just know mine encrypts the session data, using the tool's consumer key with a some random numbers.
[00:07:37] <bd808>	 having the data encrypted at rest is a nice thing for sure
[00:08:05] <Cyberpower678>	 I'm guessing there's an issue with the encryption process.  Or the decryption process.  Perhaps the session data isn't decrypting properly?
[00:08:13] <bd808>	 could you tell if decrypting was part of the problem, or was it just that the sessions were not found?
[00:08:27] <Cyberpower678>	 Though it does seem to work correctly on my machine.
[00:08:55] <Cyberpower678>	 bd808: the problem is it works 100% correctly on my machine, and on labs it just fails miserably.
[00:09:03] <Cyberpower678>	 And the error.log is useless.
[00:10:03] <bd808>	 well we are going to have to get to something more specific than that. :)
[00:10:33] <Cyberpower678>	 Indeed. :p
[00:12:15] <Cyberpower678>	 bd808: If I could debug it there, with xDebug, I would. :p
[00:12:18] <Cyberpower678>	 But I can't.
[00:15:13] <bd808>	 do we not have xdebug available at all?
[00:15:29] <Cyberpower678>	 I wouldn't know how to make use of the PHP session remotely.
[00:15:36] <bd808>	 I think there is a way to add custom php.ini stuff...
[00:16:05] <bd808>	 these are good questions to be asking.
[00:16:44] <bd808>	 I have an open task somewhere about finding ways to make local dev using similar config easier for most languages we support
[00:17:08] <Cyberpower678>	 bd808: I believe you have xDebug installed.  But I do not know how to launch a session on toolforge and step through the code on my machine to see where it goes wrong.
[00:17:19] <bd808>	 *nod*
[00:19:19] <bd808>	 I really don't have any xdebug skills, but I might be able to find someone to look into that
[00:22:47] <wikibugs>	 10Toolforge, 10Documentation: Investigate and document xdebug usage for PHP projects on Toolforge - https://phabricator.wikimedia.org/T171419#3464354 (10bd808)
[00:23:10] <bd808>	 Cyberpower678: ^ I made a task to at least remind me/someone to look into xdebug
[00:23:18] <Cyberpower678>	 :-)
[00:23:39] <Cyberpower678>	 Would go a really long way to letting me debug scripts running on labs.
[00:24:12] <bd808>	 what is this "labs" you speak of? ;)
[00:24:22] <bd808>	 toolforge sure is nice :)
[00:24:39] <Cyberpower678>	 Oh noes.  It disappeared from existence. :p
[00:25:45] <Cyberpower678>	 It seems there are only this fluffy white things where Labs used to be. :p
[00:25:48] <Cyberpower678>	 bd808: ^
[00:26:00] <bd808>	 unicorns!
[00:26:06] <Cyberpower678>	 :D
[00:26:08] <bd808>	 it's unicorns all the way down
[00:26:13] <Cyberpower678>	 Precisely.
[00:26:36] * bd808 now wants a graphic of that
[00:27:40] <Cyberpower678>	 I want a new logo for IABot. :p
[00:28:56] <bd808>	 I have some proposals for a new Toolforge logo -- https://github.com/bd808/toolforge-logos
[00:29:10] <bd808>	 I'm going to email the list about them tomorrow I think
[00:29:41] <Cyberpower678>	 bd808: 1 or 3
[00:30:19] <bd808>	 I think my favorites are "Positive image of anvil" and "Compact"
[00:30:42] <Cyberpower678>	 I don't like compact.  
[00:30:53] <Cyberpower678>	 Order from best to worst.
[00:31:01] <Cyberpower678>	 1, 3, 2
[00:31:07] <d3r1ck>	 bd808: Where did you get this concept? https://avatars0.githubusercontent.com/u/6469?v=4&s=400
[00:32:55] <bd808>	 d3r1ck: the angry unicorn? It started as a low cost vector art image that I bought and then modified
[00:33:16] <d3r1ck>	 Ohhh
[00:33:31] <d3r1ck>	 bd808: It's interesting but a bit scary :D
[00:34:44] <bd808>	 I've had a love of unicorns since I was small. And I like mohawk hair cuts and punk rock music. Somehow it turned into that avatar. :)
[00:35:09] <Cyberpower678>	 bd808: so that's how the cloud logo was formed.
[00:35:30] <bd808>	 no, the Labs unicorn existed when I got here
[00:36:12] <bd808>	 Jorm drew it up and the history of why is a bit muddy, but I like the version where Chad said "just make it a unicorn" and so they did
[00:36:21] <d3r1ck>	 :)
[00:36:26] <Cyberpower678>	 :)
[00:36:38] <d3r1ck>	 Checking out now... Good night people o/
[00:42:00] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1423 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[00:52:12] <Cyberpower678>	 bd808: I forgot to mention.  That code is live, but I unloaded the DB from the class declaration.
[00:52:33] <Cyberpower678>	 So when the class doesn't get a DB object, it will not use the DB session handler.
[00:53:54] <Cyberpower678>	 bd808: if you want to load them again, where $oauthObject is declared and initialized, replace the statement with $oauthObject = new OAuth( false, $dbObject );
[00:54:15] <Cyberpower678>	 in files index.php, api.php, and oauthcallback.php
[01:17:02] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1423 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:10:42] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1405 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[02:50:42] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:27:25] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1411 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[05:02:26] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:17:36] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1406 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[06:07:37] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:47:10] <shinken-wm>	 PROBLEM - Puppet errors on tools-worker-1027 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[06:57:02] <wikibugs>	 10Data-Services, 10cloud-services-team (Kanban), 10Operations, 10Patch-For-Review: Reimage labstore1001 and labstore1002 for DRBD storage setup - https://phabricator.wikimedia.org/T158196#3464610 (10MoritzMuehlenhoff) These have been reimaged with jessie, but I'm wondering if it would be better to use stre...
[07:27:12] <shinken-wm>	 RECOVERY - Puppet errors on tools-worker-1027 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:59:53] <wikibugs>	 10VPS-Projects, 10Beta-Cluster-Infrastructure, 10Operations, 10Release-Engineering-Team (Kanban), and 2 others: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3465349 (10hashar)
[10:32:04] <paladox>	 !log git upgrading gerrit to 2.14.2 final on gerrit-test3
[10:32:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL
[11:10:23] <wikibugs>	 10VPS-Projects, 10Beta-Cluster-Infrastructure, 10Operations, 10Release-Engineering-Team (Kanban), and 2 others: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3465468 (10hashar) 05Open>03Resolved a:03hashar I have removed faulty puppet classes,...
[11:15:55] <wikibugs>	 10Cloud-Services, 10Wikidata, 10Patch-For-Review, 10User-Ladsgroup, 10Wikidata-Sprint: Open view for term_full_entity_id in wb_terms table in labs - https://phabricator.wikimedia.org/T167114#3465490 (10daniel)
[12:29:55] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1420 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[13:04:57] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1420 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:10:12] <Cyberpower678>	 bd808: whenever you're ready to debug the webserver issue, I'm ready to assist.
[13:52:35] <wikibugs>	 10cloud-services-team, 10DC-Ops, 10Operations, 10ops-eqiad: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3466006 (10Andrew)
[13:56:16] <wikibugs>	 10cloud-services-team, 10DC-Ops, 10Operations, 10ops-eqiad: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3466042 (10Andrew) (I should note that there's no data of interest on that box -- reimaging is just fine)
[14:01:00] <wikibugs>	 10cloud-services-team, 10DC-Ops, 10Operations, 10ops-eqiad: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3466006 (10Cmjohnson) This could be a h/w issue. The h/w system event log shows this   ecord:      1 Date/Time:   04/28/2017 19:37:34 Source:      system Severity:    Ok Description:...
[14:10:25] <wikibugs>	 10cloud-services-team, 10DC-Ops, 10Operations, 10ops-eqiad: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3466109 (10Andrew) Thank you, Chris!  This is new hardware and we can live without it... can we leave this in your hands to follow up with Dell?  Is there any additional info you need?
[14:57:40] <wikibugs>	 10Cloud-Services, 10DBA, 10Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3466268 (10Marostegui) s2 has been imported on labsdb1009, labsdb1010 and labsdb1011. Views have been created so these wikis are now fully availa...
[14:57:48] <wikibugs>	 10Cloud-Services, 10DBA, 10Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3466269 (10Marostegui)
[14:58:38] <wikibugs>	 10Cloud-Services, 10DBA, 10Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3325353 (10Marostegui) a:03Marostegui I will start with s7, the last pending shard, on Monday as I am off from Tuesday.
[15:07:30] <wikibugs>	 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3466298 (10chasemp) 05Open>03Resolved closing this as further implementation will be tracked in other tasks
[15:07:35] <wikibugs>	 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3466301 (10chasemp) 05Open>03Resolved closing this as further implementation will be tracked in other tasks
[15:15:44] <wikibugs>	 10Cloud-Services, 10Horizon, 10Upstream: Clicking on a project name in Identity logs you out of Horizon - https://phabricator.wikimedia.org/T138809#3466340 (10Andrew) 05Open>03Resolved a:03Andrew
[15:25:34] <wikibugs>	 10Cloud-VPS, 10cloud-services-team (Kanban): Set good availability-zone defaults for nova users - https://phabricator.wikimedia.org/T170447#3466380 (10Andrew) p:05Triage>03Normal
[15:32:59] <wikibugs>	 10cloud-services-team, 10Analytics: Remove logging from labs for schema https://meta.wikimedia.org/wiki/Schema:CommandInvocation - https://phabricator.wikimedia.org/T166712#3305259 (10Andrew) Is this tagged with cloud-services-team in error, or is there something you need from us?
[15:35:43] <wikibugs>	 10Striker, 10cloud-services-team (Kanban), 10Patch-For-Review, 10User-bd808: Error saving OAuth credentials. [req id: f1a2370b1b8a4e1a8827de96b9bce144] bug - https://phabricator.wikimedia.org/T164847#3466458 (10Andrew) p:05Triage>03Normal
[15:40:06] <wikibugs>	 10Striker, 10cloud-services-team (Kanban), 10Patch-For-Review, 10User-bd808: Striker gives fatal error when a SUL account already in use tries to attach to a second LDAP account - https://phabricator.wikimedia.org/T164847#3466489 (10bd808)
[15:43:52] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-lighttpd-1428 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:49:59] <wikibugs>	 10Cloud-VPS, 10Operations, 10Puppet: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188#3466547 (10bd808)
[15:56:05] <wikibugs>	 10Cloud-VPS, 10Operations, 10Puppet: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188#3456889 (10Andrew) I'm pretty sure that #1 is moot -- at least, anytime we discuss it we conclude that the 'labs-support' vlan isn't really a useful concept and should be elimi...
[15:58:57] <wikibugs>	 10Cloud-VPS, 10Operations, 10Puppet: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188#3466573 (10Andrew) Here are some things that need to be thought about/figured out before we can go forward:  - Security model: Having a labs VM that is a Ops-only and critical...
[16:23:50] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-lighttpd-1428 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:14:04] <wikibugs>	 10Cloud-Services, 10Cloud-VPS, 10monitoring, 10Wikimedia-Incident: toolschecker fell to pieces when labs-ns0 went down - https://phabricator.wikimedia.org/T152369#3466829 (10madhuvishy) 05Open>03Resolved a:03madhuvishy I had finished investigating and fixing this, but apparently forgot to update this...
[17:22:21] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[17:32:11] <wikibugs>	 10Cloud-VPS, 10Operations, 10Puppet: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188#3456889 (10chasemp) My understanding of this is we are looking at #1 as the current compromise short of moving services into the the Labs realm directly, though I believe in th...
[17:36:41] <Cyberpower678>	 bd808: I'm heading out.  If you need me, just leave me a message on the ticket or here, and I'll read it when I get back.
[17:37:19] <wikibugs>	 10Cloud-VPS, 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3466870 (10chasemp) 05Open>03Resolved ongoing implementation tracked in T167559
[17:37:24] <bd808>	 Cyberpower678: ok. I hope to start working on a test "soon". Just wrapping up my morning email & meetings and thinking about lunch.
[17:37:52] <Cyberpower678>	 Enjoy your lunch
[17:39:54] <wikibugs>	 10cloud-services-team (FY2017-18), 10Goal: Refactor openstack Puppet to account for Neutron - https://phabricator.wikimedia.org/T171494#3466898 (10chasemp)
[18:01:41] <wikibugs>	 10cloud-services-team (FY2017-18), 10Goal: Refactor openstack Puppet to account for Neutron - https://phabricator.wikimedia.org/T171494#3467080 (10chasemp) Due to the intermixed nature of some inherent dependencies (our model of include for openstack::repo at the module level, and the role level indiscriminate...
[18:02:22] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:42:06] <wikibugs>	 10cloud-services-team (FY2017-18): Investigate and implement alternative for showmount based check at instance boot time - https://phabricator.wikimedia.org/T171508#3467337 (10madhuvishy)
[18:46:36] <wikibugs>	 10Cloud-Services, 10cloud-services-team (FY2017-18): Investigate and implement alternative for showmount based check at instance boot time - https://phabricator.wikimedia.org/T171508#3467357 (10madhuvishy)
[18:58:38] <wikibugs>	 10Cloud-Services, 10cloud-services-team (FY2017-18): Investigate and implement alternative for showmount based check at instance boot time - https://phabricator.wikimedia.org/T171508#3467413 (10madhuvishy)
[19:22:52] <wikibugs>	 10Cloud-Services, 10Toolforge, 10wikimedia-irc-freenode: Freenode sometimes throttles bot connections from tools - https://phabricator.wikimedia.org/T151704#2825129 (10bd808) >>! In T151704#3389006, @charitwo wrote: > after speaking with a staffer today there is no issue adding an iline but the box needs to...
[19:26:10] <Sagan>	 bd808: sounds good :)
[19:26:55] <bd808>	 Sagan: next steps would be to actually test this theory somewhere of course
[19:27:42] <Sagan>	 bd808: yeah. but all in all a good thing. The second advantage of that is: If a bot etc is using ident, only this one would get k-lined in case he does shit
[19:27:45] <Sagan>	 instead of the NAT
[19:27:54] <Sagan>	 I rember we had this two or three times
[19:27:59] <valhallasw`cloud>	 bd808: I just realized there is one potential issue. The NAT is for all labs traffic, and the traffic may thus come from an untrustworthy identd
[19:28:34] <Zppix>	 im currently making my bot use nickserv to ident would this mean this would affect me?
[19:28:46] <valhallasw`cloud>	 So we probably have to limit it to tools, but having a tools-specific NAT sounds "meh" again
[19:29:31] <valhallasw`cloud>	 Zppix: no, identd and nickserv auth are completely separate
[19:29:43] <bd808>	 valhallasw`cloud: hmmm... I saw something about telling oident what upstream to respond to but I didn't look for the inverse
[19:30:27] <Sagan>	 Zppix: ident is something running when IRC-Connection gets established. When succesful, people don't have a ~ at their mask, like me
[19:30:41] <Sagan>	 and your client for example is not running that
[19:30:57] <Zppix>	 I dont my bot can support that let me check
[19:31:05] <Zppix>	 yeah it cant
[19:31:06] <Sagan>	 Zppix: that's why you see a *** No Ident response upon connect
[19:31:22] <Sagan>	 Zppix: that's a thing that would get installed for all tools, so you don't need to change something
[19:31:34] <Zppix>	 oh sweet
[19:31:35] <Zppix>	 ok
[19:31:47] <bd808>	 Zppix: it is something that gets handled at the host computer level rather than inside the irc client code
[19:31:49] <Sagan>	 but it can take a moment longer until the IRC-connection is established, until the ircd gets and ident response
[19:32:41] <Sagan>	 bd808: I don't know the full concept of ident, but I wonder if it's still that useful, for example I've setup ident in a way that I can tell znc what ident I want to use
[19:33:10] <bd808>	 it is mostly useful if you feel you can trust the host itself.
[19:33:31] <Zppix>	 Sagan speed is no issue (my code doesnt run that fast atm anyway)
[19:33:34] <Zppix>	 thanks guys :)
[19:33:49] <bd808>	 if you control root on the host then you can spoof it any way you want
[19:34:11] <Sagan>	 Zppix: it's only when the connection starts, not during it, and you will notice only a few secs normally, that's my experience
[19:34:26] <Sagan>	 bd808: ah, ok
[19:34:38] <bd808>	 so this would basically be freenode saying "we trust the Wikimedia admins" and then I trying to do the right thing
[19:34:47] <bd808>	 s/I/us/
[19:34:49] <Sagan>	 heh :D
[19:35:13] <Sagan>	 bd808: the other way is: If I fail to configure ident rightly, and people use the same one, then multiple users whould get klined, so not their problem ;)
[19:35:20] <Sagan>	 then my problem as admin
[19:36:07] <bd808>	 the real fix for this problem is kubernetes + IPv6
[19:36:27] <wikibugs>	 10Cloud-Services, 10Toolforge, 10wikimedia-irc-freenode: Freenode sometimes throttles bot connections from tools - https://phabricator.wikimedia.org/T151704#3467557 (10charitwo) I am more familiar with oident but I believe either would be suitable  if connecting clients start connecting without the ~ in the...
[19:36:28] <bd808>	 then every container has a unique IPv6 address
[19:38:21] <Sagan>	 bd808: depends on. I don't know how it's configured at freenode, but I know the ircd, normally ranges have a connection limit tii
[19:38:23] <Sagan>	 *too
[19:39:31] <bd808>	 yeah, but it gets rid of NAT at least
[19:40:30] <bd808>	 and sharing on a given IP
[19:41:00] <bd808>	 even with the static IPs we use today N irc clients could end up on the same IP
[20:17:00] <wikibugs>	 10Wikibugs, 10XTools, 10Patch-For-Review: Update XTools on Wikibugs - https://phabricator.wikimedia.org/T171265#3467786 (10Quiddity) I think it probably needs a `fab pull` to be run by one of the [[https://tools.wmflabs.org/?tool=wikibugs |maintainers]], per https://www.mediawiki.org/wiki/Wikibugs#Deploying_...
[21:00:51] <wikibugs>	 !log tools.wikibugs Updated channels.yaml to: da35cf4cfbe116d7c6a32237718696903423ea4d Update project name for XTools
[21:03:56] <wikibugs>	 10Wikibugs, 10XTools, 10Patch-For-Review: Update XTools on Wikibugs - https://phabricator.wikimedia.org/T171265#3467970 (10Legoktm) Normally it should happen automatically, idk why it didn't. Anyways:  ``` (python)km@km-tp ~/p/g/l/t/wikibugs2> fab pull [tools-login.wmflabs.org] Executing task 'pull' [tools-l...
[21:17:40] <wikibugs>	 10Wikibugs, 10XTools, 10Patch-For-Review: Update XTools on Wikibugs - https://phabricator.wikimedia.org/T171265#3468033 (10Quiddity) 05Open>03Resolved
[21:36:38] <SMalyshev>	 bd808: was there some change to labs setup? I can't login to elastic-wikidata.eqiad.wmflabs now and had same problem with wdqs-beta earlier (gehel fixed it but I'm not sure how...)
[21:37:53] <bd808>	 SMalyshev: we had some ldap issues last week that could still be breaking hosts, especially if they are using a project local puppetmaster and it is not up to date with the prod master's puppet tree
[21:38:09] <bd808>	 let me look at elastic-wikidata
[21:38:31] <SMalyshev>	 bd808: this one doesn't seem to use any special puppet... but maybe it needs a puppet run or something
[21:38:43] <chasemp>	 almost definitely teh case for ldap breakage holdover
[21:38:54] <chasemp>	 a reboot would be the biggest hammer to fix
[21:39:36] <SMalyshev>	 ok I'll try that
[21:39:46] <bd808>	 puppet is busted there
[21:40:04] <chasemp>	 ah, and if puppet is busted it probably can't pull the correct cert
[21:40:19] <bd808>	 !log wikidata-dev Puppet broken on elastic-wikidata.wikidata-dev.eqiad.wmflabs: "Could not find data item lvs::configuration::lvs_service_ips in any Hiera data file"
[21:40:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[21:40:43] <bd808>	 SMalyshev: so we need to figure out how to fix hiera to make puppet happy
[21:40:45] <SMalyshev>	 I don't see any special classes on it, so it's not local puppetmaster probably
[21:41:00] <SMalyshev>	 bd808: so I assume rebooting it won't fix it then
[21:41:04] <chasemp>	 cat /etc/puppet/puppet.conf (I think) would know for sure
[21:41:25] <bd808>	 SMalyshev: no, this is some role applied there that needs something added to hiera
[21:42:07] <bd808>	 it is pulling from the main puppetmaster, so it's just a role that doesn't have defaults in Cloud VPS
[21:43:09] <bd808>	 it looks like role::elasticsearch::cirrus is all that is applied -- https://tools.wmflabs.org/openstack-browser/server/elastic-wikidata.wikidata-dev.eqiad.wmflabs
[21:43:26] <SMalyshev>	 yes, makes sense
[21:44:25] <bd808>	 the missing thing is "lvs::configuration::lvs_service_ips"
[21:44:40] <paladox>	 bd808 if you remove the role, run puppet and reboot. That should work :). Then re apply the role.
[21:45:03] <bd808>	 it wiould fix one problem anyway
[21:45:14] <bd808>	 but fixing puppet properly seems ideal
[21:46:06] <bd808>	 SMalyshev: hieradata/labs/deployment-prep/common.yaml has a pile of config that deployment-prep is using
[21:46:12] <bd808>	 including this setting
[21:46:46] <bd808>	 SMalyshev: I'm going to join the project and see if I can fix this if it is ok with you
[21:47:14] <SMalyshev>	 bd808: sure, thanks
[21:47:18] <jdlrobson>	 bd808: who looks after labs these days? i'm running into some issues with my labs vagrant instance and i have no idea how to even start fixing it
[21:48:21] <bd808>	 jdlrobson: "people" ;) !help is the keyword to get attention
[21:48:31] <jdlrobson>	 :)
[21:48:42] <andrewbogott>	 jdlrobson: what instance and project?
[21:48:44] <jdlrobson>	 http://reading-web-staging.wmflabs.org/ < been trying to upgrade vagrant here
[21:48:44] <bd808>	 jdlrobson: I'm helping SMalyshev right now but you can get in line
[21:48:57] <andrewbogott>	 jdlrobson: is the issue that you can't log in?  Or is it something vagrant-specific?
[21:48:58] <jdlrobson>	 and i got lots of scary messages about running out of disk space
[21:49:13] <jdlrobson>	 and now i'm failing to recover it
[21:49:19] <jdlrobson>	 andrewbogott: issue is vagrant specific
[21:49:50] <bd808>	 !log wikidata-dev Added BryanDavis (self) as admin to work on hiera/puppet config issue
[21:49:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[21:50:27] <andrewbogott>	 jdlrobson: ok, probably you want bryan to help with vagrant.  But also note that what you gave me is a proxy address, bryan will need the actual instance and project to log in.
[21:50:30] <jdlrobson>	 im going to destroy the instance and see if i can boot it up again...
[21:50:44] <andrewbogott>	 that's also a reasonable approach :)
[21:50:53] <jdlrobson>	 There was an error executing ["sudo", "/usr/bin/env", "lxc-stop", "--name", "mediawiki-vagrant_default_1500930374047_27531"]
[21:51:01] <jdlrobson>	 bd808: any ideas? that's a new one for me
[21:51:14] <bd808>	 LXC barfed
[21:51:26] <bd808>	 that's how vagrant runs there
[21:52:08] <bd808>	 sometimes a second attempt works. other times a reboot of the host vm works
[21:53:18] <jdlrobson>	 no errors 2nd attempt.. ok so i'll try `vagrant up` one last time
[21:54:52] <jdlrobson>	 bd808: okay so pretty sure something is wrong
[21:54:54] <jdlrobson>	 Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install apache2' returned 100: Reading package lists...
[21:54:59] <jdlrobson>	 geting lots of red walls of text
[21:56:47] <jdlrobson>	 bd808: reading-web-staging-3.eqiad.wmflabs is the instance
[21:57:30] <bd808>	 !log wikidata-dev Added some hiera config to elastic-wikidata via Horizon. Still more needed to make role::elasticsearch::cirrus apply cleanly.
[21:57:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[21:57:39] <mdholloway>	 !log mobile created zim-builder instance
[21:57:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Mobile/SAL
[21:58:22] <bd808>	 SMalyshev: this role needs a ton of config that is missing. I'm going to disable it and get the ldap problem fixed and then leave it for you/someone to actually work out all the problems.
[21:58:46] <SMalyshev>	 bd808: ok, I think just getting login working again should be enough for now
[21:59:10] <bd808>	 !log wikidata-dev Disabling role::elasticsearch::cirrus on elastic-wikidata
[21:59:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[22:01:19] <bd808>	 jdlrobson: does the red wall of text say something about a cache directory being missing?
[22:02:06] <bd808>	 SMalyshev: see if you can ssh in now. I think I got ldap fixed there
[22:02:31] <SMalyshev>	 bd808: yes, can log in, thanks!
[22:03:08] <bd808>	 cool. I'm going to leave that puppet role off. If I put it back and nobody works on it then puppet will just stay broken
[22:03:33] <bd808>	 if you or someone else has time to figure out what needs to be changed then feel free to apply it again
[22:04:47] <legoktm>	 bd808: once you have a second, I'm a bit stumped by trusty-compat branch apparently requiring jessie? https://paste.fedoraproject.org/paste/gGrNZseFXWrTH~p6kQyuQg
[22:05:40] <bd808>	 legoktm: hmmm... that's not awesome
[22:06:06] <bd808>	 that check would be in the Vagrantfile
[22:06:11] <bd808>	 maybe there was a bad backport?
[22:06:32] <legoktm>	 yeah I see it in the vagrantfile
[22:06:58] <bd808>	 jdlrobson: checking reading-web-staging-3.eqiad.wmflabs now...
[22:08:35] <legoktm>	 somehow https://github.com/wikimedia/mediawiki-vagrant/commit/bd1562e21ee08217b75a7127f4e845fefb605cdd is in the trusty-compat branch
[22:08:37] <jdlrobson>	 bd808: bit lost myself :/ so many errors
[22:09:00] <bd808>	 jdlrobson: I saw it scroll by. once it stops blowing up I'll see if I can fix it :)
[22:12:35] <bd808>	 legoktm: ugh... this has apparently been broken since I made the branch? the order of 8f31ca2 and a57de1d is flipped?
[22:13:02] <bd808>	 I merged the big scary in and then branched to back compat?
[22:13:36] <bd808>	 or gerrit was "helpful" with dependencies on a cherry-pick
[22:13:54] <legoktm>	 I'm trying with 88c3171 now, which is 2 before the jessie one
[22:14:12] <legoktm>	 it's provisioning now
[22:15:30] <legoktm>	 ok, that works
[22:17:10] <legoktm>	 T171537
[22:17:11] <stashbot>	 T171537: trusty-compat branch requires Debian Jesie - https://phabricator.wikimedia.org/T171537
[22:17:18] <bd808>	 legoktm: if you can figure out how to un-fsck that branch others may appreciate it
[22:17:23] <legoktm>	 will do
[22:17:35] <bd808>	 although I think the only role that lost there is ocg
[22:17:59] <legoktm>	 the whole point of this excercise was actually to be able to login and dump mysql so I could migrate it to jessie :)
[22:18:07] <bd808>	 so ideally people will rebuild and get on jessie
[22:18:08] <bd808>	 heh
[22:20:18] <bd808>	 jdlrobson: somebody rm'ed your cache dir and now the permissions are all messed up :/
[22:20:42] <bd808>	 I git restored the dir, but I think I need to also change perms ...
[22:24:34] <bd808>	 jdlrobson: vagrant provision is running now. I'll !log something when it finishes
[22:25:00] <wikibugs>	 10Cloud-Services, 10cloud-services-team (FY2017-18): Puppetize and setup initial lvms and directory structures for labstore1006|7 - https://phabricator.wikimedia.org/T171539#3468276 (10madhuvishy)
[22:25:52] <legoktm>	 ==> default: Notice: /Stage[main]/Git/Apt::Ppa[git-core/ppa]/Exec[/usr/bin/add-apt-repository --yes ppa:git-core/ppa && /bin/sed -i 's/jessie/xenial/g' /etc/apt/sources.list.d/git-core-ppa-jessie.list && /usr/bin/apt-get update]/returns: executed successfully
[22:26:18] <bd808>	 that's from the jessie branch too obviously
[22:26:31] <legoktm>	 oh this is me creating a new jessie instance
[22:26:49] <bd808>	 ah. there is a comment in the puppet
[22:27:05] <bd808>	 it's a horrible horrible hack to make a couple of PPAs we use work on jessie
[22:27:21] <legoktm>	 is there a reason it doesn't use jessie-backports?
[22:27:26] <wikibugs>	 10cloud-services-team (FY2017-18): Figure out how NFS failovers will work for the dumps servers - labstore1006|7 - https://phabricator.wikimedia.org/T171540#3468288 (10madhuvishy)
[22:27:34] <wikibugs>	 10cloud-services-team (FY2017-18): Figure out how NFS failovers will work for the dumps servers - labstore1006|7 - https://phabricator.wikimedia.org/T171540#3468288 (10madhuvishy) a:05madhuvishy>03None
[22:27:47] <legoktm>	 jessie-backports has 2.13.3, the ppa has 2.13.0
[22:28:12] <bd808>	 legoktm: because reasons? 7 months ago it was the best thing we could do
[22:28:30] <paladox>	 Should this https://phabricator.wikimedia.org/T171538 be tagged with cloud?
[22:28:37] <bd808>	 pinning to backports seems like a better thing than that ppa hack now
[22:29:14] <wikibugs>	 10Data-Services, 10Operations, 10ops-eqiad: Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468303 (10bd808)
[22:29:18] <jdlrobson>	 bd808: thankkk you
[22:29:38] <jdlrobson>	 bd808: im seeing `No wiki found` now.. so i guess that's a slight improvement
[22:30:38] <bd808>	 jdlrobson: it still had a few failures. I'm running again to see if they fix themselves or not
[22:32:05] <wikibugs>	 10Cloud-Services, 10cloud-services-team (FY2017-18), 10Datasets-General-or-Unknown: Setup periodic rsync jobs from dataset1001/dumpsdata1001|2 to labstore1006|7 - https://phabricator.wikimedia.org/T171541#3468306 (10madhuvishy)
[22:32:40] <wikibugs>	 10Data-Services, 10Operations, 10ops-eqiad: Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468270 (10jcrespo) I have no idea why things didn't explode here.
[22:33:59] <wikibugs>	 10cloud-services-team (FY2017-18), 10Datasets-General-or-Unknown, 10Goal: Begin migrating customer-facing Dumps endpoints to Cloud Services - https://phabricator.wikimedia.org/T168486#3366040 (10madhuvishy)
[22:34:05] <jdlrobson>	 bd808: bbiab just switching location
[22:34:32] <wikibugs>	 10Data-Services, 10Operations, 10ops-eqiad: Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468330 (10jcrespo) I would have ready the labsdb1001 depool patch just in case.
[22:36:18] <bd808>	 !log reading-web-staging Fixed mediawiki-vagrant `vagrant provision` with: git checkout -- cache; chown -r mwvagrant:wikidev cache
[22:36:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Reading-web-staging/SAL
[22:36:36] <bd808>	 jdlrobson: ^ all better. Don't rm -rf cache again :)
[22:39:55] <jdlrobson>	 bd808: thanks
[22:39:58] <jdlrobson>	 hows disk space ?
[22:45:27] <bd808>	 jdlrobson: I didn't look
[22:45:39] <wikibugs>	 (03Draft1) 10Paladox: Switch disk check test to check_disk_space [labs/icinga2] - 10https://gerrit.wikimedia.org/r/367620
[22:45:42] <wikibugs>	 (03PS2) 10Paladox: Switch disk check test to check_disk_space [labs/icinga2] - 10https://gerrit.wikimedia.org/r/367620
[22:46:44] <wikibugs>	 (03CR) 10Paladox: [V: 032 C: 032] "Lets give this a try" [labs/icinga2] - 10https://gerrit.wikimedia.org/r/367620 (owner: 10Paladox)
[23:13:29] <wikibugs>	 10Data-Services, 10Operations, 10ops-eqiad: Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468441 (10chasemp) p:05Triage>03Unbreak! a:03Cmjohnson I think this must be one of the two RAID1 drives for the OS itself rather than a drive in the RAID0 data array.  We should really g...
[23:15:08] <wikibugs>	 10Data-Services, 10cloud-services-team (Kanban), 10Operations, 10ops-eqiad: Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468445 (10chasemp)
[23:22:35] <wikibugs>	 10Data-Services, 10cloud-services-team (Kanban), 10Operations, 10ops-eqiad: Degraded RAID on labsdb1001 - https://phabricator.wikimedia.org/T171538#3468451 (10chasemp) hmm  ```# cat /proc/mdstat Personalities : unused devices: <none>```