Note that this question is NOT about searching for (non-)accented characters.
Suppose I have a table with a column called name that uses the collation utf8mb4_unicode_ci.
This collation works perfectly for selecting the base set of rows
in a case-insensitive, accent-insensitive way.
The problem is that I need to order the results in an accent-sensitive but case-insensitive way.
The goal is to select every name starting with some character/string and sort the names "alphabetically", with unaccented characters first, then accented ones.
For example, from this selection:
Črpw
Cewo
céag
čefw
The final results should be:
Cewo
céag -- because accented e is more than non-accented
čefw
Črpw -- because r is more than e
Note that c/C < č/Č, while lower and upper case are treated as equal.
I searched for this problem, but only found similar questions or questions about searching, which is not the issue here; the searching itself works fine.
Based on what I found, I tried this test query:
SELECT * FROM
(SELECT 'Črpw' as t
UNION SELECT 'Cewo'
UNION SELECT 'céag'
UNION SELECT 'čefw')virtual
ORDER BY t COLLATE utf8mb4_czech_ci ASC
This produces something very similar to what I want:
céag
Cewo
čefw
Črpw
But note that é gets ordered before e.
Is there a way to get the result order I want?
Using: MySQL 5.5.54 (Debian)
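Not a pure-MySQL answer, but the ordering rule described above can be pinned down as a sort key. Here is a minimal Python sketch, assuming it is acceptable to sort the already-selected rows in the application. The function name sort_key is made up for illustration. It compares names character by character, first on the case-folded base letter and then on the accent marks, so an unaccented letter always sorts before its accented form. It does not implement Czech-specific rules such as the "ch" digraph.

```python
import unicodedata

def sort_key(name):
    """Case-insensitive, accent-sensitive key, compared character by character."""
    key = []
    # Normalize to NFC first so each accented letter is a single character.
    for ch in unicodedata.normalize("NFC", name):
        decomposed = unicodedata.normalize("NFD", ch)
        base = decomposed[0].casefold()   # base letter, case ignored
        marks = decomposed[1:]            # combining accents ('' if none)
        key.append((base, marks))
    return key

names = ["Črpw", "Cewo", "céag", "čefw"]
print(sorted(names, key=sort_key))
# ['Cewo', 'céag', 'čefw', 'Črpw']
```

Because the accent is compared at each character position rather than as a tie-breaker for the whole word, Cewo sorts before céag even though "a" < "w" later in the word.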
I am doing a large UNION select, and every part of the union retrieves a result except one, which raises a collation error. For example:
SELECT word FROM words WHERE english = 'hello'
UNION
SELECT word FROM words WHERE english = 'no'
UNION
SELECT word FROM words WHERE english = 'пыук';
The last one produces a collation error, so the whole select fails. Is there a way to include the select that returns the error while still getting the results for the rest of them?
One way to do this would be to CONVERT() the string you’re looking up to the character set of the column:
SELECT word FROM words WHERE english = CONVERT('пыук' USING utf8);
Side note: be sure to change utf8 to the character set of your english column.
This is suboptimal, but it will do what you need.
I'm trying to use a regex in MySQL that searches for whole words (using word boundaries) in a JSON array string, but I don't want the regex to depend on the order of the words, because I don't know it.
I first wrote my regex on regex101 (https://regex101.com/r/wNVyaZ/1) and then tried to convert it for MySQL.
WHERE `Wish`.`services` REGEXP '^([^>].*[[:<:]]Hygiène[[:>:]])([^>].*[[:<:]]Radiothérapie[[:>:]]).+';
WHERE `Wish`.`services` REGEXP '^([^>].*[[:<:]]Hygiène[[:>:]])([^>].*[[:<:]]Andrologie[[:>:]]).+';
The first query returns a result, because "Hygiène" comes before "Radiothérapie". In the second query, though, "Andrologie" comes before "Hygiène" in the data, not after it as written in the query. The problem is that the query is generated automatically from a list of services that are chosen in no particular order, and I want to match the whole words if they exist, no matter what order they are in.
You can search for words in JSON like the following (I tested on MySQL 5.7):
select * from wish
where json_search(services, 'one', 'Hygiène') is not null
and json_search(services, 'one', 'Andrologie') is not null;
+------------------------------------------------------------+
| services |
+------------------------------------------------------------+
| ["Andrologie", "Angiologie", "Hygiène", "Radiothérapie"] |
+------------------------------------------------------------+
See https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html#function_json-search
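Since the question says the query is generated automatically, here is a rough Python sketch of building that condition from an arbitrary list of services. The function name build_all_services_filter is hypothetical, and the %s placeholder style is an assumption about the driver (it is what the common Python MySQL drivers use). Note that JSON_SEARCH treats the search string like a LIKE pattern, so a % or _ inside a service name would act as a wildcard.

```python
def build_all_services_filter(words):
    """Return (sql_condition, params) requiring every word, in any order."""
    if not words:
        # No services requested: match every row.
        return "TRUE", []
    condition = " AND ".join(
        "JSON_SEARCH(services, 'one', %s) IS NOT NULL" for _ in words
    )
    return condition, list(words)

condition, params = build_all_services_filter(["Hygiène", "Andrologie"])
print("SELECT * FROM wish WHERE " + condition)
print(params)
```

The condition and the parameter list would then be passed to the driver's execute() call, so the service names are never spliced into the SQL text directly.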
If you can, use the JSON search queries (you need a MySQL with JSON support).
If it's feasible, consider changing the database structure and storing the various "words" in a related table. This would allow you much more powerful (and faster) queries.
JOIN has_service AS hh ON (hh.row_id = id)
JOIN services AS ss ON (hh.service_id = ss.id
AND ss.name IN ('Hygiène', 'Angiologie', ...))
Otherwise, in this context, consider that you're not really doing a regexp search, and that you're doing a full table scan anyway (unless you have MySQL 8.0+ or PerconaDB 5.7+ (not sure) and an index on the full extent of the 'services' column), so several LIKE queries will actually cost you less:
WHERE (services LIKE '%"Hygiène"%'
OR services LIKE '%"Angiologie"%'
...)
or
IF(services LIKE '%"Hygiène"%', 1, 0)
+IF(services LIKE '%"Angiologie"%', 1, 0)
+ ... AS score
HAVING score > 0 -- or score=5 if you want only matches on all full five
ORDER BY score DESC;
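To make it clear what the score stands for, here is the same idea as a small Python sketch working on one row's services value. The name service_score is made up for illustration, and the sketch assumes the column holds a valid JSON array of strings. The score is the number of requested services present in the row, regardless of their order.

```python
import json

def service_score(services_json, wanted):
    """Count how many of the wanted services appear in the JSON array."""
    present = set(json.loads(services_json))
    return sum(1 for name in wanted if name in present)

row = '["Andrologie", "Angiologie", "Hygiène", "Radiothérapie"]'
wanted = ["Hygiène", "Andrologie", "Cardiologie"]
print(service_score(row, wanted))  # 2
```

A row matches "all of them" when the score equals len(wanted), and "any of them" when the score is greater than 0.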
I have a very long list of users. I want to group them by the first letter of their name. If the first character is not a letter, the user is grouped under #, so I have at most 27 groups: a-z plus #.
I want to show a group label (e.g. F) only if it will have results, and for every letter I want to know how many results it will have. So I do a single GROUP BY query to count all groups:
SELECT
IF(lastname REGEXP '^[a-z]', UPPER(SUBSTRING(lastname, 1, 1)), '#') first_char,
COUNT(1) num_users
GROUP BY first_char
That seems to work, BUT using REGEXP means that Ö isn't an O, but a #. That's a problem, because LIKE does treat 'Ö' = 'O', so it will be in the O group when I query name LIKE 'O%'. I could use REGEXP in the results query too, but I'd rather file Ö under O.
So the LIKE query works perfectly, but the GROUP query doesn't. How do I do exactly what LIKE does during comparisons, so that the group numbers and the results always match perfectly?
Or is there another way to count correctly?
edit 1
Using LIKE a OR LIKE b OR .. OR LIKE z in the IF doesn't work either, because then the group might be Ö instead of O. The numbers will be correct, but the group label won't be. I really need a conversion...
edit 2
Thanks to @mpen. lastname REGEXP '^[[:alpha:]]' is shorter than 26 LIKEs, but the Ö label problem remains. Converting that outside MySQL is easy though.
You can do the grouping like this:
select
IF(name REGEXP '^[[:alpha:]]', UPPER(SUBSTRING(name, 1, 1)), '#') first_char,
COUNT(1) num_users
from _grouptest
group by first_char
And then remove the accents in your scripting language of choice, or if you're brave, you can attempt to remove them in pure MySQL.
_.deburr in JS
Str::removeDiacritics from my PHP lib ptilz which was yoinked from WordPress
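For completeness, a minimal Python sketch of the same post-processing step. It assumes the query returns (first_char, num_users) rows; the names group_label and merge_groups are made up for illustration. It strips the accents from each label and adds up the counts that end up under the same letter. Letters that have no Unicode decomposition (Ø, for example) fall into # here, which may or may not agree with what your collation does in LIKE.

```python
import unicodedata
from collections import Counter

def group_label(first_char):
    """Map a first character to A-Z with accents removed, or '#'."""
    decomposed = unicodedata.normalize("NFD", first_char)
    base = "".join(c for c in decomposed if not unicodedata.combining(c)).upper()
    return base if len(base) == 1 and "A" <= base <= "Z" else "#"

def merge_groups(rows):
    """Sum the counts of rows whose labels collapse to the same letter."""
    totals = Counter()
    for first_char, num_users in rows:
        totals[group_label(first_char)] += num_users
    return dict(sorted(totals.items()))

rows = [("O", 3), ("Ö", 2), ("#", 5), ("É", 1)]
print(merge_groups(rows))  # {'#': 5, 'E': 1, 'O': 5}
```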
I have a table that contains two TEXT fields with textual content and, in a separate table, a field that contains comma-separated keywords, each of which can be more than one word. The following query works in my WAMP setup using AppServ, but does not work on our Hostgator LAMP server... why?
SELECT
t.content_me, t.content_visitor, t.Id, exp.owner_user_id, exp.name
FROM (SELECT t.content_me, t.content_visitor
FROM `texts` AS t
WHERE t.owner_user_id=1 -- obviously this changes...
ORDER BY t.Id DESC) AS t
INNER JOIN exp ON t.Id = exp.owner_user_id AND
t.content_me REGEXP (REPLACE(exp.keywords,',','|'))
WHERE t.owner_user_id=e.owner_user_id=6
ORDER BY t.Id DESC
Furthermore, if I put a literal value in this part:
t.content_me REGEXP (REPLACE(exp.keywords,',','|'))
such as, let's say:
t.content_me REGEXP ('yeah|ok')
it works on Hostgator. So I guess the problem is that REGEXP (REPLACE(exp.keywords,',','|')) thingy... right?
EDIT
Ok, I simplified the query just for the fun of it :)
SELECT
t.Id, t.text, t.owner_user_id FROM `t`
LEFT JOIN e ON e.owner_user_id = t.owner_user_id
WHERE
t.text REGEXP REPLACE(e.keywords,',','|') AND t.owner_user_id=1
Same result: works in WAMP, doesn't in LAMP. Also, if I do a literal REGEXP like
REGEXP REPLACE('yeah,can',',','|')
it works on both servers. My guess is that something is happening with
REGEXP REPLACE(e.keywords,',','|')
i.e., having the REGEXP use field content and not a literal string.
EDIT:
Well... now I see that the LAMP MySQL throws an error (which doesn't happen in WAMP):
Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_unicode_ci,IMPLICIT) for operation 'regexp'
So....
REGEXP REPLACE(e.keywords,',','|') COLLATE utf8_unicode_ci
Fixed it
I have a simple table of a single column with rows of char(12) like:
DRF4482
DRF4497
DRF451
DRF4515
EHF452
FJF453
GKF4573
I want to select all of the rows that start with a letter between D and F and have 4 digits at the end, like DRF4482, DRF4497, DRF4515, etc. I've tried a number of different wildcard combinations but I get no rows. I'm using:
SELECT * FROM `expired` WHERE id like '%[D-F][A-Z][A-Z]____';
I've even tried to broaden it to:
SELECT * FROM `expired` WHERE id like '%[D-F]%';
and that returns nothing as well.
I've even tried COLLATE latin1_bin based on some other posts but that didn't work either. My table is utf8, but I've created a second table as latin1 and tried a few different collations with the same results - no rows.
Where is my error?
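You need to use REGEXP instead of LIKE. MySQL's LIKE only understands the % and _ wildcards, so a bracket expression like [D-F] is matched as literal characters. Notice that the REGEXP syntax is a little different; it doesn't do anything with the SQLish % wildcard characters.
So, you want
id REGEXP '^[D-F][A-Z][A-Z][0-9]{4}$'
for this app; the ^ and $ anchors keep the pattern from matching in the middle of a longer value. Hopefully you don't have multibyte characters in these strings, because MySQL's regexp (at least before 8.0) doesn't work correctly in those circumstances.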
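If you want to sanity-check the pattern before running it against the table, here is a quick Python sketch using an anchored variant of it on the sample values from the question. Python's re module is only a stand-in for MySQL's regexp engine here: the character classes and the {4} quantifier behave the same way, but Python is case-sensitive, whereas MySQL's REGEXP follows the column's collation (so a _ci collation would also accept lowercase letters).

```python
import re

# One letter D-F, two more letters, then exactly four digits.
PATTERN = re.compile(r"[D-F][A-Z]{2}[0-9]{4}")

ids = ["DRF4482", "DRF4497", "DRF451", "DRF4515", "EHF452", "FJF453", "GKF4573"]

# fullmatch() requires the whole value to match, like ^...$ in the SQL pattern.
matches = [value for value in ids if PATTERN.fullmatch(value)]
print(matches)  # ['DRF4482', 'DRF4497', 'DRF4515']
```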