Search With LIKE inside a JSON object in Mysql [duplicate] - mysql

I have a MySQL query, where I filter by a json field:
SELECT id, username
FROM (SELECT id,
Json_extract(payload, '$.username') AS username
FROM table1) AS tmp
WHERE username = 'userName1';
It returns 1 row, which looks like:
1, "userName1"
See the quotes that are not in the clause?
What I need is to make the WHERE clause case insensitive.
But when I do
WHERE username LIKE 'userName1';
it returns 0 rows. I don't understand why it works this way: the = comparison matches even though its value doesn't have those double quotes.
If I do
WHERE username LIKE '%userName1%';
it now also returns the row, because the % wildcards absorb the quotes:
1, "userName1"
But when I do
WHERE username LIKE '%username1%';
it returns 0 rows, so unlike the usual MySQL LIKE it's somehow case sensitive.
What am I doing wrong, and how do I filter the JSON payload in a case-insensitive way?
EDIT=========================================
The guess is that COLLATE should be used here, but so far I don't understand how to make it work.

The default collation of MySQL is latin1_swedish_ci before 8.0 and utf8mb4_0900_ai_ci since 8.0, so non-binary string comparisons are case-insensitive by default in ordinary columns.
However, as mentioned in the MySQL manual for the JSON type:
"MySQL handles strings used in JSON context using the utf8mb4 character set and utf8mb4_bin collation."
Therefore, your JSON value uses the utf8mb4_bin collation, and you need to apply a case-insensitive collation to either operand to make the comparison case-insensitive.
E.g.
WHERE username COLLATE XXX LIKE '...'
where XXX should be a utf8mb4 collation (such as the utf8mb4_general_ci you've mentioned).
Or
WHERE username LIKE '...' COLLATE YYY
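where YYY should be a collation that matches the character set of your connection.
For equality comparison (a pattern without % wildcards), you should unquote the JSON value with JSON_UNQUOTE() or the unquoting extraction operator ->>; otherwise the double quotes remain part of the compared string.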
E.g.
JSON_UNQUOTE(JSON_EXTRACT(payload, '$.username'))
Or simply
payload->>'$.username'
The JSON type and functions work very differently from ordinary data types. It appears that you are new to them, so I would suggest reading the manual carefully before putting this into a production environment.
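The three observations from the question can be reproduced with a small Python model. This is only a sketch of the semantics, not MySQL itself: like() is a hypothetical helper that emulates LIKE, json.dumps stands in for JSON_EXTRACT returning the quoted JSON text, and the case_insensitive flag stands in for applying a _ci collation.

```python
import json
import re


def like(value: str, pattern: str, case_insensitive: bool = False) -> bool:
    """Emulate SQL LIKE: % matches any run of characters, _ matches exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.fullmatch(regex, value, flags) is not None


# JSON_EXTRACT yields the JSON text of the value, double quotes included.
extracted = json.dumps("userName1")
# JSON_UNQUOTE (or ->>) strips those quotes again.
unquoted = json.loads(extracted)

print(like(extracted, "userName1"))                            # False: the quotes are part of the string
print(like(extracted, "%userName1%"))                          # True: the wildcards absorb the quotes
print(like(extracted, "%username1%"))                          # False: binary comparison is case sensitive
print(like(extracted, "%username1%", case_insensitive=True))   # True: a _ci collation ignores case
print(like(unquoted, "username1", case_insensitive=True))      # True: unquoted, no wildcards needed
```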

Okay, I was able to solve the case insensitivity by adding COLLATE utf8mb4_general_ci after the LIKE clause.
So the point here is to find a working collation, which in its turn can be found by researching the db you work with.
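Putting both pieces together (unquoting plus an explicit collation), the query could be built from Python roughly like this. It is a sketch under assumptions: build_username_query is a hypothetical helper, table1 and payload come from the question, and the %s placeholder assumes a DB-API driver that uses the "format" parameter style.

```python
def build_username_query(collation: str = "utf8mb4_general_ci") -> str:
    """Return a parameterized MySQL query matching $.username case-insensitively."""
    # A collation name cannot be bound as a query parameter, so only accept
    # plain identifiers (letters, digits, underscores) before interpolating it.
    if not collation.replace("_", "").isalnum():
        raise ValueError(f"unexpected collation name: {collation!r}")
    return (
        "SELECT id, JSON_UNQUOTE(JSON_EXTRACT(payload, '$.username')) AS username "
        "FROM table1 "
        "WHERE JSON_UNQUOTE(JSON_EXTRACT(payload, '$.username')) "
        f"COLLATE {collation} LIKE %s"
    )


query = build_username_query()
params = ("username1",)  # the search value is passed separately, never interpolated
print(query)
# With a connected cursor (not created here): cursor.execute(query, params)
```

The collation is attached to the extracted value, which is always utf8mb4, so the collation name does not depend on the connection character set.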

Related

Django - Query EXACT string match

I have a user in my DB with this value:
booking_id -> 25DgW
This field is marked as unique in my model
booking_id = models.CharField(null=False, unique=True, max_length=5, default=set_booking_id)
But now when I query for the user like this:
>>> User.objects.get(booking_id='25dgw') # I think this should throw a DoesNotExist exception
<User: John Doe>
Even if I do:
>>> Partner.objects.get(booking_id__exact='25dgw')
<User: John Doe>
I need a query that returns the user only if the code is written exactly as it is saved in the database: 25DgW instead of 25dgw.
How should I query?
By default, MySQL does not consider the case of the strings. There will be a character set and collation associated with the database/table or column.
The default collation for the latin1 character set is latin1_swedish_ci, which is case-insensitive. You will have to change the collation to latin1_general_cs, which supports case-sensitive comparison.
You can change character set/collation settings at the database, table, or column level.
The following steps worked for me.
Go to MySQL shell
mysql -u user -p databasename
Change character set and collation type of the table/column
ALTER TABLE tablename CONVERT TO CHARACTER SET latin1 COLLATE latin1_general_cs;
Now try the query
Partner.objects.get(booking_id__exact='25dgw')
This now throws an error: matching query does not exist.
Reference - Character Sets and Collations in MySQL
User.objects.filter(booking_id='25dgw', booking_id__contains='25dgw').first()
seems to work for me (result = None). The booking_id parameter asserts that the letters match and that there are no extra characters; booking_id__contains asserts case sensitivity.
This seems to work:
qs.filter(booking_id__contains='25DgW').first()
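Why combining the two lookups gives an exact, case-sensitive match can be modelled in plain Python. This is a sketch under assumptions rather than Django code: ci_exact stands in for = under a case-insensitive collation, and binary_contains stands in for a byte-wise LIKE BINARY '%...%', which, to my knowledge, is how Django's MySQL backend renders contains. All three function names are hypothetical.

```python
def ci_exact(stored: str, wanted: str) -> bool:
    """Model of `=` under a case-insensitive collation."""
    return stored.casefold() == wanted.casefold()


def binary_contains(stored: str, wanted: str) -> bool:
    """Model of a byte-wise, case-sensitive substring test."""
    return wanted.encode("utf-8") in stored.encode("utf-8")


def matches(stored: str, wanted: str) -> bool:
    """Both conditions together, as in filter(booking_id=..., booking_id__contains=...)."""
    return ci_exact(stored, wanted) and binary_contains(stored, wanted)


print(matches("25DgW", "25dgw"))         # False: same letters, different case
print(matches("25DgW", "25DgW"))         # True: identical
print(binary_contains("25DgW", "5Dg"))   # True: contains alone also accepts substrings
```

The last line shows why contains on its own is only safe here because max_length=5 leaves no room for extra characters.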

Case sensitive search in Django, but ignored in Mysql

I have a field in a Django Model for storing a unique (hash) value. Turns out that the database (MySQL/inno) doesn't do a case sensitive search on this type (VARCHAR), not even if I explicitly tell Django to do a case sensitive search Document.objects.get(hash__exact="abcd123"). So "abcd123" and "ABcd123" are both returned, which I don't want.
class document(models.Model):
    filename = models.CharField(max_length=120)
    hash = models.CharField(max_length=33)
I can change the 'hash field' to a BinaryField , so in the DB it becomes a LONGBLOB , and it does do a case-sensitive search (and works). However, this doesn't seem very efficient to me.
Is there a better way (in Django) to do this, like adding 'utf8 COLLATE'? or what would be the correct Fieldtype in this situation?
(yes, I know I could use PostgreSQL instead..)
The default collation for MySQL is latin1_swedish_ci, which is case insensitive. Not sure why that is. But you should create your database with a case-sensitive collation, like so (CHARACTER SET utf8 on its own would default to utf8_general_ci, which is still case insensitive):
CREATE DATABASE database_name CHARACTER SET utf8 COLLATE utf8_bin;
As @dan-klasson mentioned, non-binary string comparison is case-insensitive by default; notice the _ci at the end of latin1_swedish_ci, which stands for case-insensitive.
You can, as Dan mentioned, create the database with a case sensitive collation and character set.
You may be also interested to know that you can always create a single table or even set only a single column to use a different collation (for the same result). And you may also change these collations post creation, for instance per table:
ALTER TABLE documents__document CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
Additionally, if you would rather not change the database/table charset/collation, Django allows you to run a custom query using the raw method. So you may be able to work around the change by using something like the following, though I have not tested this myself:
Document.objects.raw("SELECT * FROM documents__document WHERE hash LIKE %s COLLATE latin1_bin", ['abcd123'])

mysql collate latin1_german1_ci not working with order by

I have a MySQL database where I need to perform a search on a varchar column. All data is encoded in latin1. Sometimes these columns have western accented characters in them (for me almost always French). Using the default collation (latin1_swedish_ci) has always worked fine for me. But now I have a problem with some data containing umlauts. If I search for "nusserhof" I want MySQL to return "nüsserhof", but it does not. Changing the collation to latin1_german1_ci solves the problem in the simplest sense; for instance, this query works, returning all rows containing the word "nüsserhof":
select * from mytable where mycolumn like '%nusserhof%' collate latin1_german1_ci;
But if I add an order by clause it no longer works. This doesn't return any rows containing the word "nüsserhof":
select * from mytable where mycolumn like '%nusserhof%' order by mycolumn collate latin1_german1_ci;
Surprisingly, I can't find anything here or through google about this. Is this expected behavior? As a work around I'm just dropping the order by, and sorting after the select in PHP. But it seems like I should be able to get it to work.
Is this expected behavior?
Yes, it is.
In Swedish, the glyph ü represents the letter tyskt y ("German Y") and thus under latin1_swedish_ci it is a variation of the letter y rather than u. If, applying that collation, you were to search where mycolumn like '%nysserhof%', your record containing nüsserhof would be returned.
In German, the glyph ü represents an accented variation (specifically an umlaut) of the base glyph and thus under latin1_german1_ci it is a variation of the letter u as expected. Thus you obtain the desired results when running your search under this collation.
It is because of local differences of this sort that we must choose appropriate collations for our data: no single collation can always be appropriate in the general case.
The problem that you observe when applying ORDER BY results from a misunderstanding of the COLLATE keyword: it is not part of the SELECT command (such that it instructs MySQL to use that collation for all comparisons within the command); rather, it is part of the immediately preceding string (such that it instructs MySQL to use that explicit collation for the immediately preceding string only).
That is, in your first case, the explicit latin1_german1_ci collation is applied to the '%nusserhof%' string literal with a coercibility of 0; the collation of mycolumn (which is presumably latin1_swedish_ci) has a coercibility of 2. Since the former has a lower value, it is used when evaluating the expression.
In your second case, the explicit latin1_german1_ci collation is applied to mycolumn within the ORDER BY clause: thus the sorted results will place 'nüsserhof' between 'nu' and 'nv' instead of between 'ny' and 'nz'. However the explicit collation no longer applies to the filter expression within the WHERE clause, and so the column's default collation will apply.
If the data in mycolumn is all in the German language, you can simply change its default collation and no longer worry about specifying explicit collations within your SQL commands:
ALTER TABLE mytable MODIFY mycolumn <type> COLLATE latin1_german1_ci
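The coercibility rule described above can be sketched as a tiny Python model. It is a deliberate simplification: it only picks the operand with the lowest coercibility and ignores MySQL's tie-breaking and error cases. The values 0 and 2 are the ones quoted in the answer; the value 4 for a plain string literal and the placeholder name connection_collation are additions for illustration.

```python
# Coercibility: lower values win when two operands disagree on collation.
EXPLICIT = 0   # expression with an explicit COLLATE clause
COLUMN = 2     # collation of a column
LITERAL = 4    # plain string literal (uses the connection collation)


def effective_collation(*operands):
    """Return the collation of the operand with the lowest coercibility."""
    collation, _ = min(operands, key=lambda operand: operand[1])
    return collation


column = ("latin1_swedish_ci", COLUMN)

# WHERE mycolumn LIKE '%nusserhof%' COLLATE latin1_german1_ci
first_where = effective_collation(column, ("latin1_german1_ci", EXPLICIT))

# WHERE mycolumn LIKE '%nusserhof%'   (COLLATE appears only in ORDER BY)
second_where = effective_collation(column, ("connection_collation", LITERAL))

print(first_where)    # latin1_german1_ci
print(second_where)   # latin1_swedish_ci
```

In other words, a query that wants German rules for both filtering and sorting needs COLLATE latin1_german1_ci in the WHERE clause as well as in ORDER BY, unless the column's own collation is changed.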

How to interpret a column as having a different character set per query?

I need to interface with a database for which I cannot change the collation and charset.
However, I would like to pick some binary data from it, interpret it as if it were UTF8 and then do an UPPER on it (since just doing UPPER() on binary returns the raw value).
I would assume that this works:
SELECT UPPER(Filename.Name) COLLATE utf8_general_ci FROM Filename;
but it doesn't and complains that
COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'binary'
which is fair enough, I need some incantation to cast the binary field as being utf-8. How do I do a select which gives me a computed column with the right character set assigned to it?
Ok figured it out. For modern MySQL versions you can use CAST, and for older ones CONVERT (which is actually standard SQL).
SELECT UPPER(CONVERT(BINARY(Filename.Name) USING utf8)) FROM Filename;
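As a rough analogy in Python: raw bytes carry no character set, so they have to be decoded before text operations behave as expected, which is the role CONVERT ... USING utf8 plays in the query above. The file name below is made up for illustration, and the analogy is loose, since Python's bytes.upper() still uppercases ASCII letters while leaving everything else alone.

```python
def upper_utf8(raw: bytes) -> str:
    """Interpret raw bytes as UTF-8 text, then uppercase the text."""
    return raw.decode("utf-8").upper()


# Hypothetical stored value: the UTF-8 bytes of a file name.
raw_name = "nüsserhof.txt".encode("utf-8")

print(raw_name.upper())       # b'N\xc3\xbcSSERHOF.TXT': the two bytes of "ü" are untouched
print(upper_utf8(raw_name))   # NÜSSERHOF.TXT
```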
I think you're looking for:
SELECT UPPER(Filename.Name COLLATE utf8_general_ci) FROM Filename;
But I'm not sure because I don't have a broken database to test with.