[01:04:22] (PS1) Tim Landscheidt: Do not hardcore database hosts in list-user-databases [labs/toollabs] - https://gerrit.wikimedia.org/r/336171
[01:29:31] Hello, I am having a little bit of trouble with my Phabricator account. I need to either change my Phabricator username or create a new Phabricator account. I just registered at Wikitech under the LDAP/shell name "nicolesharp" (MediaWiki name [[user:Nicole Sharp]]). There is an old Phabricator account already registered as "NicoleRuizSharp" though, that I would like to either delete or rename as "nicolesharp". I
[01:29:32] tried renaming on the old account, but the rename option is ghosted, and I cannot log into Phabricator under "nicolesharp" to create a new account
[02:04:20] PROBLEM - Puppet run on tools-docker-builder-03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[02:09:41] Labs, Tool-Labs, labs-sprint-119, Community-Tech-Tool-Labs, Diffusion: Figure out a git hosting solution for tools/kubernetes - https://phabricator.wikimedia.org/T117071#2999985 (demon) >>! In T117071#2998614, @bd808 wrote: >>>! In T117071#2998385, @Legoktm wrote: >> With Striker+Diffusion, i...
[02:14:15] Labs, Tool-Labs, Tools-Kubernetes, Patch-For-Review: k8s webservice restart failure with `ValueError: get() more than one object; use filter` - https://phabricator.wikimedia.org/T156626#2999989 (scfc) I have installed https://integration.wikimedia.org/ci/job/debian-glue/608/artifact/toollabs-webs...
[02:20:30] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[02:29:26] Labs, Tool-Labs: Archive tool purodha-testing - https://phabricator.wikimedia.org/T152849#2999990 (scfc) Resolved>Open Did I forget to delete the directory after archiving?! ``` scfc@tools-services-02:~$ ls -l /data/project/purodha-testing/ total 3136 -rw-r--r-- 1 51141 51141 227720 Dec 10 19...
[02:31:11] Labs, Tool-Labs: Delete /data/project/purodha-testing on labstore - https://phabricator.wikimedia.org/T157242#2999992 (scfc)
[02:35:41] Labs, Tool-Labs, labs-sprint-119, Community-Tech-Tool-Labs, Diffusion: Figure out a git hosting solution for tools/kubernetes - https://phabricator.wikimedia.org/T117071#3000006 (bd808) >>! In T117071#2999985, @demon wrote: >>>! In T117071#2998614, @bd808 wrote: >> There is at least one missi...
[02:35:55] Labs, Tool-Labs, Tools-Kubernetes, Patch-For-Review: k8s webservice restart failure with `ValueError: get() more than one object; use filter` - https://phabricator.wikimedia.org/T156626#3000007 (scfc) Puppet wanted to downgrade `tools-webservice`, so I put the dev version in `aptly` and ran Puppe...
[02:36:33] Labs, Tool-Labs, Tracking: Tools that should get deleted (tracking) - https://phabricator.wikimedia.org/T133777#3000009 (scfc)
[02:36:36] Labs, Tool-Labs: Archive tool purodha-testing - https://phabricator.wikimedia.org/T152849#3000008 (scfc)
[02:43:10] Labs, Tool-Labs: kube-proxy service is failing at least on tools-bastion-03 - https://phabricator.wikimedia.org/T157243#3000012 (scfc)
[02:45:23] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:08:56] enterprisey: you can save yourself some pain by using callback prefixes
[05:15:10] tgr: what does that mean?
[05:16:28] well, you register http://tools.wmflabs.org/apersonbot/yabbr/ as a callback prefix, and you don't have to create a new consumer every time you want to tweak the URL
[05:21:01] I had no idea I could do that
[05:21:04] lol
[05:21:10] I learn something new every day, I gues
[05:21:12] *guess
[06:36:45] RECOVERY - Puppet staleness on tools-worker-1004 is OK: OK: Less than 1.00% above the threshold [3600.0]
[06:58:10] (CR) jenkins-bot: Localisation updates from https://translatewiki.net.
[labs/tools/heritage] - https://gerrit.wikimedia.org/r/336182 (owner: L10n-bot)
[07:10:42] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[07:45:44] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:10:13] Labs, Tool-Labs, Project-Admins: Migrate Tools access request process to Phabricator - https://phabricator.wikimedia.org/T72625#3000817 (zhuyifei1999) >>! In T72625#2999816, @bd808 wrote: > There is another way that we could automate Phabricator task creation today. Striker (or any tool actually) cou...
[09:36:08] Labs, Tool-Labs, DBA: labsdb1001 and labsdb1003 short on available space - https://phabricator.wikimedia.org/T132431#3000896 (marcmiquel)
[09:36:11] Labs, Tool-Labs, DBA: u3532__ (=marcmiquel) table using 64G on labsdb1001 - https://phabricator.wikimedia.org/T133322#3000893 (marcmiquel) Open>Resolved a:marcmiquel Ok! Thanks. Marc
[13:15:11] Quarry: Add time stamp (when query was run) to results page - https://phabricator.wikimedia.org/T157232#3001552 (Aklapper)
[13:37:11] Labs: labstore1004 is high load and periodic unavailability to icinga - https://phabricator.wikimedia.org/T155832#3001625 (chasemp) [[ https://wikitech.wikimedia.org/wiki/Server_Admin_Log | SAL ]] items where we addressed recabling eth1 to be a direct crossover: 2017-01-24 15:54 chasemp: drbdadm adjust tool...
[13:37:51] Labs: labstore1004 is high load and periodic unavailability to icinga - https://phabricator.wikimedia.org/T155832#3001627 (chasemp) Incident report in progress https://wikitech.wikimedia.org/wiki/Incident_documentation/20170119-Labstore
[14:25:28] Labs, DBA, Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3001845 (Marostegui) I have hit this: T151607#2826415 when importing the tables on labsdb1011.
I am mysqldumping these three tables (which are not reall...
[14:28:41] Tool-Labs-tools-Other, Wikimedia-Video: (Tool labs) videoconvert tool sometimes misrotates phone videos - https://phabricator.wikimedia.org/T134685#3001866 (zhuyifei1999) a:Prolineserver (Assigning to @Prolineserver, the maintainer of the tool)
[14:37:20] Labs, DBA, Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#2889412 (jcrespo) __wmf_checksums can just be deleted, specially on labs.
[15:48:46] Tool-Labs-tools-LTA-Knowledgebase, Wikimedia-Mailing-lists: Create mailing list for LTA Knowledgebase tool admins - https://phabricator.wikimedia.org/T156556#3002143 (RobH) Open>Resolved I've gone ahead and created the mailing list as requested, with both list emails as admin. The list's admin p...
[15:53:02] Quarry: Allow Quarry to work on All Project vice only Meta - https://phabricator.wikimedia.org/T157342#3002152 (Reguyla)
[16:06:31] Tool-Labs-standards-committee, Wikimedia-Mailing-lists: Create mailing list for Tool-Labs-standards-committee - https://phabricator.wikimedia.org/T156218#3002274 (RobH) Open>Resolved List has been created, initial admin email auto generated and emailed out. List is currently set to public with p...
[16:28:22] Labs, Tool-Labs: Several users and tools have invalid credentials in replica.my.cnf - https://phabricator.wikimedia.org/T154933#3002340 (yuvipanda) I responded on the wrong ticket, whoops. We don't have support for user accounts in the new maintain-dbusers script yet.
[16:33:55] (CR) Giuseppe Lavagetto: [C: 1] Add missing dummy secrets from production [labs/private] - https://gerrit.wikimedia.org/r/335643 (owner: Volans)
[16:43:45] hi madhuvishy and yuvipanda! do either of you have a few minutes for a paws internal question/issue?
[16:44:03] zareen: in a meeting rn, will ping in a few minutes!
[16:44:13] what's up?
[16:46:03] madhu: thanks! is there a limit to notebook size? i'm generating matplotlib charts in a notebook which is currently ~1 MB. i don't think the notebook is saving properly - the top of the notebook says "Autosave Failed!" and when i try to manually save, an orange box pops up on the right side saying "Request Entity Too Large"
[16:46:10] madhuvishy ^
[16:47:04] * zhuyifei1999_ thinks it's probably nginx
[16:47:28] Is grafana-labs + logs from kubernetes somehow buggy at the moment? I don't see any values for memory/ cpu and network usage: https://grafana-labs.wikimedia.org/dashboard/db/kubernetes-tool-combined-stats?var-namespace=autolist
[16:48:17] tbs_: grafana-labs is buggy since a long time ago
[16:48:32] no later than december of last year
[16:50:02] (CR) Niedzielski: [C: 1] "This looks good to me. Once this is merged to the repo, do you have to manually pull update the server?" [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/335845 (https://phabricator.wikimedia.org/T108318) (owner: Mholloway)
[16:54:05] PROBLEM - High iowait on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: tools.tools-webgrid-lighttpd-1413.cpu.total.iowait (>11.11%)
[16:54:56] ^ yuvipanda could the k8s upgrade have damaged the prometheus integration in any way?
[16:55:23] zhuyifei1999_ madhuvishy : my google research suggested configuring nginx to allow larger requests - however, i'm totally unfamiliar with nginx so wanted to get some help on the issue :)
[16:55:49] zareen: nginx default is quite small
[16:56:10] something like 1MB
[16:56:45] * zhuyifei1999_ leaves this to yuvipanda or madhuvishy
[16:56:55] zareen: aah alright, will look into it and raise limits
[16:57:23] chasemp, yuvipanda: I'm not seeing data for tool=bash on that dashboard after 2017-01-24T06:00. Is that around the time of one of the k8s config/version changes?
[16:58:14] hm not sure, need to troll back in tools logs and ask yuvipanda
[16:58:15] bd808: it is!
[16:58:43] spot checking it looks like that's the last metrics for pretty much everything
[16:59:23] tbs_: looks like you have made us aware of a bug. Thanks!
[16:59:26] seems pretty conclusive
[17:00:02] madhuvishy: let me know if you need any other info from me or if it's something i can do. thanks so much!
[17:00:37] zareen: no, i'm going to go ahead and raise the nginx limits, will ping when it's merged!
[17:00:41] (CR) Mholloway: "afaik documentation about the alpha build server doesn't exist. Yeah, there doesn't seem to be any deployment process for it beyond 'git " [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/335845 (https://phabricator.wikimedia.org/T108318) (owner: Mholloway)
[17:02:02] bd808: you are welcome :)
[17:03:19] thanks tbs_!
[17:04:04] RECOVERY - High iowait on tools-webgrid-lighttpd-1413 is OK: OK: All targets OK
[17:04:20] Labs, Tool-Labs, Tools-Kubernetes, Prometheus-metrics-monitoring: Labs Promethius not recording k8s stats since 2017-01-24T06:00 - https://phabricator.wikimedia.org/T157355#3002421 (bd808)
[17:04:34] chasemp, yuvipanda: ^ opened a bug
[17:04:46] kk
[17:11:33] bd808, chasemp: since you are working on it, could you also see why https://grafana-labs.wikimedia.org/dashboard/db/labs-project-board?var-project=video&var-server=All looks totally broken?
[17:12:13] zhuyifei1999_: I see stats there, can you elaborate on what's broken?
[17:12:37] zhuyifei1999_: do you see lots of little red corners with ! in them?
[17:12:43] yep
[17:12:54] and they have no data
[17:13:11] I can screenshot if necessary
[17:13:19] if so, that's the backend server getting swamped out by the way that grafana does parallel ajax requests
[17:13:57] hitting the "refresh" up in the top right corner will often fix that.
sometimes takes multiple reloads :/
[17:14:23] I remember seeing something like backend=null when I was debugging that last time
[17:15:03] basically the more graphs are on the dashboard, the higher the chance of each erring out.
[17:15:22] it's a grafana flaw that I've seen a lot both in labs and prod
[17:15:25] refreshing will always leave some behind
[17:15:31] *nod*
[17:15:34] tldr graphite overloaded?
[17:15:41] chasemp: yes
[17:15:46] hmm /me never see this on prod
[17:15:51] we don't have a solid plan to fix this but we do know it's an issue
[17:16:07] (CR) Sniedzielski: [V: 2 C: 2] Add time zone to alpha build timestamp on android-builds.wmflabs.org [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/335845 (https://phabricator.wikimedia.org/T108318) (owner: Mholloway)
[17:16:51] graphite really wants to hold all metrics in ram with ssd backups. It's not the prettiest backend system ever built
[17:17:07] and the folks who built it had lots of hardware to throw at scaling
[17:17:18] (upstream that is)
[17:18:08] I would not be surprised to learn that Etsy has more hardware for graphite tooling than WMF has for MediaWiki ;)
[17:25:13] Labs, Tool-Labs, Patch-For-Review: kube-proxy service is failing at least on tools-bastion-03 - https://phabricator.wikimedia.org/T157243#3002503 (scfc) Open>Resolved a:scfc ``` scfc@tools-bastion-03:~$ sudo puppet agent -t Info: Retrieving plugin Info: Loading facts in /var/lib/puppet/li...
[17:29:13] Labs, Labs-Infrastructure, DBA, Operations: labsdb1005 (mysql) maintenance for reimage - https://phabricator.wikimedia.org/T157358#3002516 (jcrespo)
[17:29:42] Labs, Tool-Labs, Tools-Kubernetes, Prometheus-metrics-monitoring: Labs Promethius not recording k8s stats since 2017-01-24T06:00 - https://phabricator.wikimedia.org/T157355#3002421 (scfc) Timewise this aligns with the switch from deployment with shell scripts to Debian packages (efd468f0fb2934920...
[17:29:52] Labs, Labs-Infrastructure, DBA, Operations: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3002535 (jcrespo)
[17:31:33] Labs, Labs-Infrastructure, DBA, Operations: labsdb1005 (mysql) maintenance for reimage - https://phabricator.wikimedia.org/T157358#3002565 (jcrespo)
[17:32:01] Labs, Labs-Infrastructure, DBA, Operations: labsdb1005 (mysql) maintenance for reimage - https://phabricator.wikimedia.org/T157358#3002516 (jcrespo)
[17:32:17] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#3002567 (jcrespo) stalled>Open
[17:37:42] !log mobile updated android-builder with 08b2b78
[17:37:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Mobile/SAL
[17:38:24] Quarry: Allow Quarry to work on All Project vice only Meta - https://phabricator.wikimedia.org/T157342#3002585 (Capt_Swing) (ping @yuvipanda ) @Reguyla I think that Quarry did work like most other applications. If you're logged into your account on any public Wiki, you can log into Quarry via Oauth. The Oau...
[17:41:34] Quarry: Add date when query was last run - https://phabricator.wikimedia.org/T77941#3002610 (Capt_Swing)
[17:41:40] Quarry: Add time stamp (when query was run) to results page - https://phabricator.wikimedia.org/T157232#3002612 (Capt_Swing)
[18:27:44] PROBLEM - Puppet staleness on tools-worker-1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [43200.0]
[18:31:16] Labs, Tool-Labs, Community-Tech-Tool-Labs, Epic, and 2 others: Remove support for precise OGE exec hosts - https://phabricator.wikimedia.org/T94792#3002772 (bd808)
[18:46:31] Quarry: Allow Quarry to work on All Project vice only Meta - https://phabricator.wikimedia.org/T157342#3002790 (Reguyla) To be honest I am blocked on Meta, which is fine because I don't need access to it anyway since its mostly for Stewards and the WMF folks but it's causing a problem because Quarry is linke...
[18:50:03] Labs: Please install a recent version of node and npm by default on labs - https://phabricator.wikimedia.org/T157368#3002816 (Jdlrobson)
[18:50:17] Labs: Please install a recent version of node and npm by default on new labs instance - https://phabricator.wikimedia.org/T157368#3002828 (Jdlrobson)
[18:52:32] zareen: check now?
[18:56:04] madhuvishy: got kicked off paws internal, reestablishing connection now and will let you know
[18:56:22] aah sorry about that
[18:59:27] Tool-Labs-standards-committee, Wikimedia-Mailing-lists: Create mailing list for Tool-Labs-standards-committee - https://phabricator.wikimedia.org/T156218#3002871 (Huji) Rob, I did not get the email. It might have gone into my spam folder which I purged minutes ago. I hope @Quiddity got it though.
[19:00:17] madhuvishy: seems like i can save now with no issues - thanks!!
[19:00:29] zareen: okay cool
[19:03:55] madhuvishy: thanks from me too!
:)
[19:04:54] HaeB: :) no problem
[19:34:01] Tool-Labs-standards-committee, Wikimedia-Mailing-lists: Create mailing list for Tool-Labs-standards-committee - https://phabricator.wikimedia.org/T156218#3003036 (RobH) @Huji: That is normal. So when I create the list, it asks for a single email and allows me to auto-generate and auto-email out the list...
[19:40:01] Labs: Please install a recent version of node and npm by default on new labs instance - https://phabricator.wikimedia.org/T157368#3003055 (Jhernandez) Here are my notes on reinstalling everything node for one of my instances after the last changes > Labs has an old version of node installed (0.10), to get...
[19:51:56] puppet run on my labs instance is broken without making a change. it also claims puppet is "disabled" but i can run it anyways. then it fails with all kinds of dependency problems and talks about files in .puppet in my home dir
[19:52:25] Error: failed to set mode 755 on /home/dzahn/.puppet/ssl: Operation not permitted @ chmod_internal
[19:52:36] and more like that. i have not touched it
[19:52:42] it used to run normal
[20:01:23] mutante: I don't have an explanation / can't think of anything we would have done to cause that. when did it start?
[20:02:45] chasemp: 3974 minutes .. 2.75 days about
[20:02:51] huh
[20:03:01] that's bizarre
[20:03:14] it's like i have a puppet:self but i thought i am just a regular instance with central master
[20:04:04] let me check the instance config, maybe somebody else in the project did something
[20:07:30] ah, i have to go to Horizon now to configure roles? looking
[20:10:46] i went to horizon, logged in, switched to the project, click the instance name, puppet configuration.
[20:10:50] and no classes are applied
[20:11:34] though the behaviour of the instance is like it had puppetmaster::standalone on it
[20:11:47] the role class that used to be on it is gone
[20:12:24] role::puppetmaster::standalone shows up but as "Applied False"
[20:13:35] aptly::server and mediawiki_vagrant appear in the list but have not been used by this ever
[20:16:26] i'm trying to add the role class that it used to have before
[20:16:38] but it doesn't seem to work
[20:17:06] i clicked "Edit" under "other classes" and added it and saved. i'm back to the screen as it was before
[20:17:40] ah, i see it now
[20:17:52] it's applied, let's see what the puppet run does now
[20:18:36] behaviour already changed that it gives the normal "is disabled" error instead of running anyways
[20:19:07] ok, this fixed it. puppet run back to normal
[20:32:15] mutante: so what in the world caused that?
[20:34:07] chasemp: i don't know, the switch to horizon? that i did not add the class there earlier ?
[20:34:30] That wouldn't make sense on the timeline but I don't know either
[20:35:19] a weird side-effect was that "puppet is disabled" part
[20:36:03] someone had to have done something or it's some combo of fallout from several events
[20:37:16] hmm, yea, there are other people in this project, just looks like none logged in
[20:37:40] it's ok with me if we don't find out. it works now:)
[20:39:16] it's like it had puppetmaster::self temporarily
[20:40:20] PAWS: Some usernames breaks notebook imports - https://phabricator.wikimedia.org/T157378#3003267 (Abbe98)
[20:46:25] yuvipanda: hi :) if you are around I got a regex fix for labstore nfs-mount-manager .
That is from a couple weeks ago, you did a rebase but I guess forgot to +2/merge it : https://gerrit.wikimedia.org/r/#/c/333230/3
[20:47:21] hashar: I'm going to let madhuvishy|food deal with it when she has the time :)
[20:47:32] good :)
[20:48:42] hashar: yeah so I think that mod is ok but the original problem was something else, mainly trying to match something that wasn't meant to be a mount point "/home"
[20:48:47] hashar: we fixed the real problem - https://gerrit.wikimedia.org/r/#/c/333794/
[20:49:59] ohhh
[20:50:04] madhuvishy: I still think the more particular match is an ok thing probably
[20:50:27] atm the actual mount point will be on one single line by the point of grep
[20:50:29] yeah - i saw some follow up comments from Tim on that patch
[20:50:33] so anchors esp seem ok
[20:50:35] it is a corner case really, but I guess it might hit us again at some point
[20:50:37] if not a good idea
[20:50:45] ^ and $ etc
[20:51:34] madhuvishy: but take your time there is no urgency :]
[21:22:50] Labs: Request creation of wikifactmine labs project - https://phabricator.wikimedia.org/T157385#3003391 (Tarrow)
[21:35:47] !log phabricator cherry-picking https://phabricator.wikimedia.org/D551 to phabricator
[21:35:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL
[21:41:42] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[21:46:10] !log phabricator cherry-picking https://gerrit.wikimedia.org/r/#/c/336304/
[21:46:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL
[22:14:53] !log phabricator Start the migration to phabricator instance (labs) from phab-01.
[22:14:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL
[22:15:00] mutante twentyafterfour ^^
[22:15:15] paladox: :)
[22:15:32] I will shut apache on there.
[22:16:41] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:18:34] PAWS: Some usernames breaks notebook imports - https://phabricator.wikimedia.org/T157378#3003524 (Abbe98) @YuviPanda I tracked this down to the [[ https://github.com/yuvipanda/ipynb-paws/blob/master/paws/__init__.py#L41 | following line ]] in the ipynb-paws repository: ``` url_segment = '/'.join([s.replace('...
[22:21:24] !log phabricator completed the migration from phab-01 to phabricator.
[22:21:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL
[22:29:58] PAWS: User/file-names containing _ breaks notebook imports - https://phabricator.wikimedia.org/T157378#3003538 (Abbe98)
[22:40:17] Labs, Labs-Infrastructure, DBA, Operations: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3003555 (Dzahn)
[23:40:17] Labs, Stashbot: Get a cloak for morebots & labs-morebots - https://phabricator.wikimedia.org/T140547#3003727 (Krinkle) Open>Resolved a:Krinkle Stashbot has an `@wikimedia/bot/stashbot` cloak.