[00:23:54] <_huh> Is it okay to ask a really dumb question because I've wasted too much time on it already?
[00:24:16] <_huh> inb4 "Don't ask to ask, ask"
[00:27:58] _huh, yes
[00:28:10] chances are the question is not dumb
[00:29:38] <_huh> I'm trying to run an SQL query on the replica database on toolforge, using LIKE in a case-insensitive manner
[00:29:52] <_huh> e.g. SELECT COUNT(*) FROM page WHERE page_title LIKE "Example%"
[00:30:13] that'll be case-sensitive
[00:30:16] <_huh> I already tried rapping page_title in LOWER() and setting different collations but I haven't figured out anything that works
[00:30:23] <_huh> *wrapping LOL
[00:31:03] <_huh> I'd like [[ExAmPlE 123]] to match for example
[00:32:24] _huh, so you tried where lower(page_title) like 'example%' ?
[00:32:57] <_huh> yeah amazingly it didn't appear to work, but IDK if that's because my settings are wrong
[00:33:01] <_huh> it does work on quarry though
[00:33:22] can you give an example?
[00:34:46] <_huh> https://quarry.wmflabs.org/query/35130
[00:35:13] ... huh.
[00:35:46] <_huh> if you make it capital W Wikimania it returns results
[00:35:57] <_huh> so I guess it doesn't work on Quarry either (I thought it did)
[00:36:48] <_huh> I don't know how the heck LOWER("Wikimania") LIKE "wikimania%" is false and LOWER("Wikimania") LIKE "Wikimania%" is true
[00:36:53] MariaDB [metawiki_p]> select page_title, LOWER(page_title) from page where page_id = 10454666 \G
[00:36:54] *************************** 1. row ***************************
[00:36:54] page_title: Wikimania_2019
[00:36:54] LOWER(page_title): Wikimania_2019
[00:36:55] <_huh> unless LOWER(X) just returns X
[00:36:58] <_huh> ah
[00:37:43] <_huh> huh
[00:39:09] MariaDB [metawiki_p]> select page_title, LOWER(convert(page_title using 'latin1')) from page where page_id = 10454666 \G
[00:39:09] *************************** 1. row ***************************
[00:39:10] page_title: Wikimania_2019
[00:39:10] LOWER(convert(page_title using 'latin1')): wikimania_2019
[00:39:38] <_huh> Thanks!
[00:39:41] np
[00:40:15] I think this is because it's varbinary
[00:40:58] https://dev.mysql.com/doc/refman/5.7/en/cast-functions.html says LOWER() and UPPER() are ineffective when applied directly to binary strings because the concept of lettercase does not apply. To perform lettercase conversion of a binary string, first convert it to a nonbinary string:
[00:41:24] <_huh> Well, I don't know if I ever would have figured that out
[00:41:27] <_huh> lol
[00:41:59] welcome back by the way
[00:42:05] haven't seen you around for a few years
[00:42:10] <_huh> thanks
[00:42:16] <_huh> nice to see you too
[14:55:09] !help - Since 15:12 Search has been down, https://usercontent.irccloud-cdn.com/file/Ftqpe5ko/image.png, Any fix??
[14:55:09] RhinosF1: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[14:55:31] ^^ Pinging Praxidice, Originally reported
[14:55:34] RhinosF1: what wiki?
[14:55:49] English Wikipedia bd08
[14:56:58] hmmm... works for me?
[14:57:29] Was down for Praxidicae at 15:12 and just failed for me
[14:57:43] it's on and off
[14:58:39] Let's move to #wikimedia-operations. This channel is really for Toolforge and Cloud VPS support rather than the main project wikis
[14:58:46] Okay
[17:52:10] !help Is anyone able to force shut down puppet-lta on cloudvirt1015 i cannot do it via ssh or horizon for some reason
[17:52:10] Zppix: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[17:52:33] Zppix: you want the instance shut down?
[17:52:56] bd808: similar to the shut off button on horizon yes
[17:54:07] Zppix: what project is that instance in?
[17:54:19] bd808: lta-tracker
[17:54:20] * bd808 too lazy to search for it ;)
[18:00:39] bd808: lmk when youve done that please
[18:00:59] I'm poking around.
[18:01:59] bd808: *poke* xD
[18:03:21] i think cloudvirt1015 was having issues from andrewbogott comment earlier this morning.
[18:03:54] paladox: that would explain the kernel panic error
[18:04:00] In the horizon console log
[18:04:41] yeah, that cloudvirt might be broken. I think that's the one we replaced a CPU in recently too
[18:05:03] * bd808 is failing to login to Horizon currently
[18:05:28] bd808: any way to migrate without losing data?
[18:10:02] I'm at a party, but: no real risk of data loss... Just reboot for now and I'll evacuate when I have the time
[18:10:10] Zppix: ok. I finally got logged into Horizon. Now I can see that instance is marked as "error" state by OpenStack.
[18:10:14] Yea
[18:10:31] Hm, that's new :(
[18:10:41] andrewbogott: thats the issue i cant
[18:10:55] "Failed to terminate process 31596 with SIGKILL: Device or resource busy"
[18:11:06] But I'm out for now, sorry. There's a ticket if someone searches for cloudvirt1015
[18:11:26] bd808: that was me trying to shut it down myself
[18:11:42] andrewbogott: I'll keep poking for a bit. This is not UBN, just a pain in the butt for Zppix
[18:11:50] Yes
[18:11:55] Seeing thats my puppet master
[18:12:11] ticket is T220853
[18:12:12] T220853: VMs on cloudvirt1015 crashing - https://phabricator.wikimedia.org/T220853
[18:13:30] Zppix: do you have a bunch of custom stuff there that will be hard to recreate, or could you just abandon that instance for now and make a new puppetmaster with reasonable effort?
[18:13:45] bd808: yeah i could recreate it
[18:13:53] Its just a pain in the @$$
[18:14:08] If you have access you can log in and force a restart of nova-compute
[18:14:22] Otherwise I can help in a few hours
[18:14:33] I can try that
[18:14:40] bd808: just lmk
[18:18:29] Or worst case reboot 1015 altogether. I checked yesterday and I'm 90% sure that everything on there is redundant
[18:19:02] andrewbogott: i think the server itself thinks its redundant :=
[18:19:04] :P
[18:21:45] !log lta-tracker Attempting to reboot puppet-lta via OpenStack cli
[18:21:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Lta-tracker/SAL
[18:23:03] !log lta-tracker nova reset-state --active 8b764eb9-2dca-4902-a9c5-ed54fa3fc57d
[18:23:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Lta-tracker/SAL
[18:23:57] !log lta-tracker Attempting to reboot puppet-lta via OpenStack cli (take 2)
[18:23:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Lta-tracker/SAL
[18:25:56] !log admin Restarted nova-compute service on cloudvirt1015 (T220853)
[18:25:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:25:59] T220853: VMs on cloudvirt1015 crashing - https://phabricator.wikimedia.org/T220853
[18:29:10] !log admin Rebooting cloudvirt1015 (T220853)
[18:30:19] that paged :-)
[18:30:43] whats wrong with my IRC?
[18:30:53] * Guest7200 is arturo
[18:31:08] nickserv hates you arturo
[18:31:15] Apparently
[18:31:34] cloudvirt1015 and I see at least one VM unreachable complaint?
[18:31:36] Lol
[18:31:56] bstorm_: I rebooted it -- T220853 -- and it's not back yet
[18:32:01] Dont mind me im just breaking things
[18:32:09] Sorry for getting you all paged
[18:32:16] Ok. Rebooted sounds better than just everything blowing up :)
[18:32:38] I'm out of the usual IRC channels and I can't identify right now
[18:33:55] So I take it we are looking to get 15 working well enough to get VMs off it?
[18:34:34] bstorm_: I've got a hunch that its stuck at some bios prompt :/
[18:34:44] Let me check
[18:35:07] and yeah this was a small attempt to get it limping along before evacuating
[18:36:10] `A stop job is running for Suspend/R…bvirt Guests (7min 14s / no limit)`
[18:36:21] It's trying to shut down vms
[18:36:28] bstorm_: I'm here too if needed
[18:36:36] thanks
[18:36:47] I'm about to enter the theater. I dont have my laptop with me. I will be back in a couple of hours
[18:36:54] (arturo)
[18:37:16] On IRCCloud you still have your photo and full name at least arturo :)
[18:38:03] I see full name. No photo
[18:39:23] bd808: so it's trying to shut down the vms, but I see kernel debug messages and problems while it works. https://www.irccloud.com/pastebin/vFmT0HL9/
[18:40:36] yuck.
[18:41:19] https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
[18:42:12] so "1043 GPs behind" pretty much means bad things
[18:44:54] :(
[18:44:56] yeah
[18:45:47] sounds like a case of downtime in icinga, power it all off, try to evacuate VMs by request in the week?
[18:46:03] godog just downtimed it :)
[18:46:46] As for hard power off, we are likely to lose some data, unfortunately...but that may be a foregone conclusion.
[18:47:01] indeed, just three hours though, in the happy case a reboot brings it back
[18:47:29] As of now, the stop job has not succeeded for 18 minutes
[18:48:01] the host was known to have problems yesterday and it's already gone through one reboot
[18:48:03] :/
[18:48:25] Ah. That really sucks
[18:48:47] I feel like i caused more problems than there were initially
[18:49:03] Zppix: not anything you did :)
[18:49:27] blame me!
[18:49:43] Looks like cpus 37 through 71? Stalling in loop
[18:50:00] bd808: only cause i indirectly asked you to?
[18:50:02] Lol
[18:51:10] It's still trying at 21 min. Shall I kick it over with the remote power button at...26min?
[18:51:36] `[682479.269046] 49-...: (3099 GPs behind) idle=a0e/0/0 softirq=2273834/2273835 fqs=0` 49 is really struggling :-p [18:51:36] bstorm_: yeah. hit it with the bigger hammer [18:51:57] * Zppix wants to see this bigger hammer [18:51:57] and.rew's phrasing was "I've just seen two VMs crash in a row on cloudvirt1015, so I'm a bit worried that the processor there is making mistakes" :D [18:52:36] we swapped a CPU in that host ~2 weeks ago [18:52:50] probably mainboard I guess [18:57:39] Ok, switching to the hard power cycle [18:58:23] I just wanted to give it a chance [18:59:52] It's booting [19:01:03] made it through post, linux is starting [19:01:28] bios seemed as happy as could be. I love hardware [19:01:36] Ok. It's up fully [19:01:42] * bd808 smells sarcasm [19:01:57] `cloudvirt1015 login: [ 37.796241] libvirt-guests.sh[2020]: Resuming guest i-00006887: done` [19:02:12] VMs should be coming up ...which may or may not be good [19:02:52] that virt is mostly full of tools and toolsbeta instances [19:03:34] which matches the recent re-pooling of it and moving more things out of eqiad to eqiad1-r [19:05:15] Zppix: puppet-lta is up. I would suggest you backup anything special you have done there and build a new instance to replace it [19:05:17] It is successfully resuming most instances [19:05:38] maybe this is a good opportunity to evacuate? [19:05:41] I just don't have much faith in them [19:06:31] The nice thing about tools and toolsbeta stuff is a lot of it has zero important local data. [19:06:42] yeah we should evacuate it. Multiple instance kernel panics and hard reboot needed are not good signs of hardware health there [19:06:58] Not at all :) Very bad signs. [19:11:35] that virt has 7 k8s workers on it. 
That's a pretty big hit to our capacity [19:51:56] !log tools started migrating tools-flannel-etcd-02 to cloudvirt1013 T220853 [19:52:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [19:52:01] T220853: VMs on cloudvirt1015 crashing - https://phabricator.wikimedia.org/T220853 [19:58:51] !log tools started migrating tools-k8s-etcd-03 to cloudvirt1012 T220853 [19:58:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [19:58:54] T220853: VMs on cloudvirt1015 crashing - https://phabricator.wikimedia.org/T220853 [20:17:43] !log discovery-stats migrated product-analytics-bayes to cloudvirt1009 for T220853 [20:17:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Discovery-stats/SAL [20:17:48] T220853: VMs on cloudvirt1015 crashing - https://phabricator.wikimedia.org/T220853 [20:28:39] !log discovery-stats migrated product-analytics-test to cloudvirt1009 for T220853 [20:28:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Discovery-stats/SAL [20:28:46] T220853: VMs on cloudvirt1015 crashing - https://phabricator.wikimedia.org/T220853 [20:32:33] Zppix: are you rebuilding and deleting puppet-lta? If not, and you don't mind a bit more outage on it, I can move it now to another physical server. [20:36:14] !log tools moving tools-elastic-02 to cloudvirt1009 for T220853 [20:36:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:36:20] T220853: VMs on cloudvirt1015 crashing - https://phabricator.wikimedia.org/T220853 [20:42:38] I'll move puppet-lta. That will leave behind nothing but tools stuff and toolsbeta, which will be a better state to leave things in. [20:43:56] bstorm_: I am finally at a point where I can rush home, but it seems like the drama is mostly over? 
[20:43:59] !log lta-tracker Moving puppet-lta to a new server because of hardware problems T220853
[20:44:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Lta-tracker/SAL
[20:44:02] T220853: VMs on cloudvirt1015 crashing - https://phabricator.wikimedia.org/T220853
[20:44:21] andrewbogott: I'm evacuating the most important stuff from 1015.
[20:44:40] I'll take a break when I'm mostly down to workers and toolsbeta stuff
[20:45:00] Should be ok
[20:45:07] Ok! I'm inclined to leave workers there until we have a good diagnosis since being busy seems to trigger the issue.
[20:45:31] Yeah. I'll try to take some notes from what I collected on console
[20:45:36] Thank you for handling it! I am going to go buy some groceries, but if you want to leave that in this for me to do I will attend to them in half an hour or so.
[20:46:05] Hm, the voice recognition screwed that up and now I'm not sure what I actually said.
[20:46:13] Np! I need to figure this out anyway, and my apologies if I didn't put things in a great place lol
[20:46:37] It should be fine
[21:08:05] !log tools Moving tools-prometheus-01 to cloudvirt1009 and tools-clushmaster-02 to cloudvirt1008 for T220853
[21:09:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:10:11] T220853: VMs on cloudvirt1015 crashing - https://phabricator.wikimedia.org/T220853
[21:47:37] thanks bstorm_
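[Editor's note] The LIKE case-sensitivity puzzle at the top of the log comes down to the replica storing page_title as VARBINARY: per the MySQL docs quoted in-channel, LOWER() is a no-op on binary strings, so you must CONVERT() to a nonbinary string first. A minimal Python sketch of that documented rule; `mysql_lower()` is a hypothetical stand-in for illustration, not a real MySQL API:

```python
def mysql_lower(value):
    """Mimic MySQL LOWER(): identity on binary strings, lowercase on text."""
    if isinstance(value, (bytes, bytearray)):
        return value  # lettercase does not apply to binary strings
    return value.lower()


# What the replica returns for the VARBINARY page_title column:
raw = b"Wikimania_2019"

# LOWER() applied directly to the binary value changes nothing...
print(mysql_lower(raw))                   # b'Wikimania_2019'

# ...which is why CONVERT(page_title USING latin1) first, then LOWER(), works:
print(mysql_lower(raw.decode("latin1")))  # wikimania_2019
```

Hence the working form of the original query is `WHERE LOWER(CONVERT(page_title USING latin1)) LIKE 'wikimania%'`, matching the MariaDB session shown above.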