[00:44:44] !log tools.edgarsdev Edited /data/project/edgarsdev/www/python/src/app.py to fix syntax and stop CrashLoopBackOff of Kubernetes pod
[00:44:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.edgarsdev/SAL
[00:49:58] !log tools.fireflytools Stopped webservice; stuck in CrashLoopBackOff due to missing python dependency. $HOME/www/python/venv should be examined (T216461)
[00:50:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.fireflytools/SAL
[00:50:01] T216461: fireflytools python Kubernetes pod stuck in CrashLoopBackOff state - https://phabricator.wikimedia.org/T216461
[00:54:51] !log tools.flossbrowser Commented out $HOME/.lighttpd.conf contents and restarted webservice (T216462)
[00:54:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.flossbrowser/SAL
[00:54:53] T216462: flossbrowser php5.6 Kubernetes pod stuck in CrashLoopBackOff - https://phabricator.wikimedia.org/T216462
[01:02:50] !log tools.itwikiarticlebot Stopped webservice (T216463)
[01:02:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.itwikiarticlebot/SAL
[01:02:53] T216463: itwikiarticlebot ruby Kubernetes pod stuck in CrashLoopBackOff - https://phabricator.wikimedia.org/T216463
[01:06:32] !log tools.nsfw Stopped broken webservice (T216464)
[01:06:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.nsfw/SAL
[01:06:35] T216464: nsfw python Kuberntes pod stuck in CrashLoopBackOff - https://phabricator.wikimedia.org/T216464
[01:17:23] !log tools.webarchivebot kubectl delete deployment/webarchivebot-backend (T216465)
[01:17:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.webarchivebot/SAL
[01:17:27] T216465: webarchivebot custom Kubernetes pod stuck in CrashLoopBackOff - https://phabricator.wikimedia.org/T216465
[01:49:26] !log tools Revoked Toolforge project membership for user DannyS712 (T215092)
[01:49:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[01:49:29] T215092: Rename developer account 'DannyS712 bot' to 'DannyS712' - https://phabricator.wikimedia.org/T215092
[03:24:05] !help is this a good command line for backing up my toolsdb application db? `tools.enwp10@tools-bastion-03:~$ mysqldump --user=s51114 --password --lock-tables --databases s51114_enwp10 > enwp10.2019-02-18.dump.sql`
[03:24:06] Sorry, you are not authorized to perform this
[03:24:56] it says I'm not authorized to perform '!help' so I'll ask again without
[03:24:58] is this a good command line for backing up my toolsdb application db? `tools.enwp10@tools-bastion-03:~$ mysqldump --user=s51114 --password --lock-tables --databases s51114_enwp10 > enwp10.2019-02-18.dump.sql`
[03:25:15] LOL
[03:25:33] the bot is a bit goofy. It sees "! is ..." as an attempt to set a new keyword
[03:25:51] ahh
[03:26:23] is it cruel to do this to the database after it is so freshly recreated?
[03:26:31] audiodude: that command should work to make a dump, but if you are doing that to the tool's homedir you may not be getting as much backup as you hoped
[03:27:38] what is the quota in the tool homedir and where should I be writing it instead?
[03:27:44] audiodude: oh... that tool has 33GB of data. How about please don't try to dump that to NFS
[03:28:06] sure can do
[03:28:07] https://tools.wmflabs.org/tool-db-usage/owner/s51114
[03:28:17] or at least exclude the "logging" table
[03:28:54] I can do that, I don't think it's super critical
[03:32:02] so where should I dump it to, assuming the homedir is NFS
[03:39:02] Homedir is NFS. Hrm.
[03:39:12] /tmp :)
[03:39:24] * zhuyifei1999_ is not serious
[03:39:40] heh
[03:42:06] alas, hopefully next presidents day we'll resume the annual tradition of giving each other gifts from the presidents day tree
[03:42:28] bd808: do we use scratch for that?
[03:42:39] Something like a dump before moving it off there
[03:42:50] ssh tunnel? :)
[03:45:00] scratch is NFS as well, but it has a very different use profile
[03:45:53] harej: tree?
[03:53:04] scratch would be better than $HOME I think, but archives that big are going to be a pain anywhere until maybe we build the backup service
[03:55:55] I would be happy to download it to my local disk and delete the toolforge copy once it's made
[03:56:19] this isn't part of a persistent backup solution, this is a one off
[03:57:44] actually, should I just SSH tunnel to the toolsdb and backup to local disk straight away?
[03:58:23] the only thing I would worry about would be if my copy of mysqldump is compatible with the MariaDB version being run
[04:00:00] audiodude: an ssh tunnel and local dump is at least worth trying. There are some instructions at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#SSH_tunneling_for_local_testing_which_makes_use_of_Wiki_Replica_databases on tunneling.
[04:01:05] You might try using `--single-transaction --quick` options too -- https://stackoverflow.com/questions/5666784/how-can-i-slow-down-a-mysql-dump-as-to-not-affect-current-load-on-the-server
[04:10:46] okay looks like it's working with the tunnel
[05:34:03] Hi, what is the PHP memory limit?
[05:34:11] For Toolforge
[05:34:12] !help
[05:34:12] Zoranzoki21: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[05:34:44] Zoranzoki21: in which runtime environment?
[05:34:53] kubernetes
[05:35:01] * bd808 checks
[05:35:08] on stretch
[05:35:17] no trusty
[05:35:51] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Memory_limit -- I think it's 2G
[05:36:26] No it
[05:36:39] For uploading files via PHP
[05:37:18] This https://premium.wpmudev.org/blog/increase-memory-limit/
[05:38:10] So you want to know what upload_max_filesize is?
[05:38:31] Yes
[05:40:11] 2M
[05:40:30] Can it be increased?
[05:40:42] For tool
[05:40:49] Here's how I found that: webservice --backend=kubernetes php7.2 shell and then php -i|grep upload_max_filesize
[05:41:09] Ok. Can it be increased to 4M for my tool testwiki?
[05:43:38] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Lighttpd#PHP -- create a $HOME/public_html/.user.ini file and use it to change the setting. You will need to restart the webservice for that to take effect.
[05:43:43] * bd808 hopes that works
[05:45:17] !log tools.testwiki Restarted tool so the newly created .user.ini file takes effect
[05:45:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.testwiki/SAL
[05:48:53] Looks like it is working, thanks!
[05:50:08] awesome
[05:51:33] bd808: Is it possible to stop the access.log file from being created?
[05:54:51] Zoranzoki21: I think it is, but I closed my laptop so not easy for me to check right now. If it can be done it would be by adding a setting in $HOME/.lighttpd.conf.
[05:55:30] Ok, thanks
[05:55:33] Try looking on the lighttpd help page on wikitech
[05:55:38] Ok, thanks!
[05:56:15] If you can’t figure it out, file a bug assigned to me and I’ll track down the right config
[05:56:31] I’m really sure I have done that before
[12:00:25] !log admin added nagios@icinga2001.wikimedia.org to cloud-admin-feed@ allowed senders
[12:00:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:16:48] you could get a wikimedia/ cloak bstorm_
[16:17:10] hehe...I've tried!
[16:17:19] I probably should
[16:17:37] After failing vaguely once, I wandered off and haven't tried again
[16:18:50] you got rejected for a cloak?
[16:18:56] or there were problems submitting the form?
[17:13:03] Not rejected. The process went down a hole and didn't come back out.
[17:13:30] I might bother again. Maybe if I decide to get rid of the underscore in my username as well
[17:18:57] bstorm_, arturo, bd808, and others: I hate "me too" mailing list replies, so I'll +1 https://lists.wikimedia.org/pipermail/wikitech-l/2019-February/091568.html here. Thanks for the quick resolution!
[17:19:47] thanks anomie. It's nice to hear from folks who are supportive :)
[17:21:30] thanks :-)
[17:21:47] Thanks :)
[17:26:08] bd808: bstorm_ Definitely a big thanks to all of you! But I'd also not want this "crisis to go to waste" - would it also be constructive to let your higher ups of where they could better send resources to help prevent these in the future? That is, how can we help you folks by letting people in positions of power/funding get you things to help?
[17:26:47] rewrite: let your higher ups *know* of where they could better send resources
[17:28:04] things that help us the most are ways to show the value and impact of the tools and services that people run inside Cloud Services
[17:28:55] value and impact to the movement was what got the creation of the Cloud Services team approved 2 years ago
[17:33:39] bd808: Thanks, am happy to do what I can to express that
[17:45:58] since I don't read wikitech-l, I'll +1 the thank you here too! bstorm_, arturo, bd808, and anyone I forgot: thanks for the work you're doing, not just around this incident, but also in general to keep things running smoothly day in/day out
[19:05:34] gtirloni: arturo: do you remember any context on why raid10 was chosen for the cloudvirt hosts, btw?
[19:05:57] cdanis: no idea. Way before I was here :-)
[19:06:13] I wasn't around but it seems like a sensible option (good balance of read/write performance)
[19:06:57] you have any other suggestion cdanis ?
[19:08:48] mostly curious. I saw the recent suggestion for a hot spare and I think that's a good idea :)
[19:10:34] cdanis: at least part of it is that RAID 10 is semi-standard for all prod hosts
[19:11:16] cdanis: if I can summarize the consensus from our meeting today is that 1) losing disk capacity sucks; 2) we may not need all of that capacity after all so we could have spares 3) rebuilding the RAID volume is a very time consuming task (have to drain the hypervisor, reimage, etc) so it's going to happen slowly at each opportunity we have 4) we have some cloudvirts to move from the `main` region to the `eqiad1` region and, when we reimage
[19:11:17] them, they will get spares... that's all I remember right now (multi-tasking)
[19:11:47] reimage, really? wow, didn't expect that
[19:12:44] bd808: that also makes sense. i've only been looking at our software RAID configurations lately, which are a bit of a mess; don't know the story on the hw RAID side
[19:12:48] some servers have dedicated OS disks, so in that case no.. but most of them are a single RAID volume
[19:13:56] cdanis: would love to hear what you think about stripe size, we default to 256kb and it seems too much for me (more on the sequential/streaming side of workloads)
[19:16:06] your intuition there is better than mine gtirloni. possibly it would make sense to do some measuring of I/O request sizes on the current hosts?
[19:16:29] yeah, we should do that.. I was exposed to that information this week so haven't done any research
[19:17:01] luckily we have the workloads in place so it's not hard to confirm it (or not)
[19:17:11] i'm super clueless about it in prod though
[19:19:28] the uptime of wmcs is unbelievable for something that used to be 'labs'
[19:19:56] so +1 to that mailing list post as well
[19:30:48] * framawiki is happy to see that T214921 is moving forward
[19:31:19] https://phabricator.wikimedia.org/T214921 is about replicating production elasticsearch indices to labs
[19:31:48] framawiki: :) I'm super excited about it too.
[19:40:30] how can i enforce the webserver on my instance to set the encoding to utf8? i've put "AddDefaultCharset utf-8" to .htaccess in web root and restarted apache and it did not help
[20:42:41] Danny_B: is your apache config setup to process htaccess files?
[21:15:19] harej: i reused your sitematrix import code for PAWS the other day, thanks ;) (https://paws-public.wmflabs.org/paws-public/User:Tbayer_(WMF)/Percentage%20of%20pages%20with%20(ambox)%20issues.ipynb )
[21:17:33] yay code reuse!
[21:17:42] that's the dream with PAWS
[22:06:00] bd808: i assume so - mods-enabled contains rewrite.load
[22:07:26] Danny_B: you would need AllowOverride enabled for the directory tree that you are trying to use the .htaccess file in as well. But if you have full control of the apache config you can just make your settings changes there too
[22:07:31] https://httpd.apache.org/docs/2.4/howto/htaccess.html
[22:19:15] bd808: so i've put AllowOverride All in apache2.conf and restarted apache, still does not work :-/
[22:21:49] Danny_B: StackOverflow is probably your friend for figuring out how to debug your apache2 config. I would start by increasing the log verbosity and making requests that should include the .htaccess change to see if apache is even trying to process it
[22:24:59] bd808: in http headers i actually see nginx - which is weird, because i don't see it installed on my instance. so is there any chance that my apache sends to some sort of proxy nginx which cuts the encoding header?
[22:26:19] Danny_B: the shared Cloud VPS proxy is nginx, but it should not be tampering with headers. It's just a typical reverse proxy setup to whatever backend you name through Horizon
[23:49:14] bd808: fyi: so the issue is that new apache does not add mime type of text/plain to extensionless files (via DefaultType which is no longer valid) and such mime type is needed to apply the charset
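
A minimal sketch of one way to act on that last finding, assuming Apache 2.4 and that the extensionless files really are plain text. The `ForceType` workaround and the filename pattern are assumptions for illustration and were not mentioned in the channel; only `AddDefaultCharset utf-8` comes from the discussion. It can go in the main config or in an .htaccess file (the latter needs AllowOverride to permit FileInfo).

```apache
# Assumption: every file whose name contains no dot should be served as plain text.
# DefaultType no longer assigns a fallback type in Apache 2.4, so set one explicitly.
<FilesMatch "^[^.]+$">
    ForceType text/plain
</FilesMatch>

# AddDefaultCharset only adds a charset to text/plain and text/html responses,
# so it takes effect for these files once they carry a text/plain type.
AddDefaultCharset utf-8
```

After reloading apache, something like `curl -I` against one of the extensionless files should show whether the Content-Type header now carries the charset.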