I have a user in my DB with this value:
booking_id -> 25DgW
This field is marked as unique in my model
booking_id = models.CharField(null=False, unique=True, max_length=5, default=set_booking_id)
But now when I query for the user like this:
>>> User.objects.get(booking_id='25dgw') # I think this should throw a DoesNotExist exception
<User: John Doe>
Even if I do:
>>> Partner.objects.get(booking_id__exact='25dgw')
<User: John Doe>
I need a query that returns the user only if the code is written exactly as it is saved in the database: 25DgW, not 25dgw.
How should I query?
By default, MySQL string comparisons ignore case. Every database, table, and column has an associated character set and collation.
The default collation for the latin1 character set is latin1_swedish_ci, which is case-insensitive. To get case-sensitive comparison, change the collation to latin1_general_cs.
You can change character-set/collation settings at the database, table, or column level.
The following steps worked for me.
Go to MySQL shell
mysql -u user -p databasename
Change character set and collation type of the table/column
ALTER TABLE tablename CONVERT TO CHARACTER SET latin1 COLLATE latin1_general_cs;
Now try the query
Partner.objects.get(booking_id__exact='25dgw')
This now raises a DoesNotExist error: Partner matching query does not exist.
Reference - Character Sets and Collations in MySQL
User.objects.filter(booking_id='25dgw', booking_id__contains='25dgw').first()
seems to work for me (result = None). The booking_id parameter asserts that the letters match and nothing more; the booking_id__contains parameter asserts case sensitivity.
This seems to work:
qs.filter(booking_id__contains='25DgW').first()
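If changing the collation is not an option, one fallback is to re-check the value in Python after the query returns. A minimal sketch, assuming the rows have already been fetched; FakeUser and get_exact_case are hypothetical names standing in for the Django model and a helper, and are not part of Django:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeUser:
    """Hypothetical stand-in for a Django model instance."""
    name: str
    booking_id: str


def get_exact_case(rows: Iterable[FakeUser], booking_id: str) -> Optional[FakeUser]:
    """Return the first row whose booking_id matches exactly, else None."""
    for row in rows:
        # Python's == on str is always case-sensitive, unlike a *_ci collation.
        if row.booking_id == booking_id:
            return row
    return None


# Simulates what a case-insensitive MySQL lookup for '25dgw' would hand back.
rows = [FakeUser("John Doe", "25DgW")]
wrong_case = get_exact_case(rows, "25dgw")
right_case = get_exact_case(rows, "25DgW")
```

With a real queryset the same comparison could be applied to the objects returned by filter(); the database still does the coarse lookup and Python only rejects rows whose case differs.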
I have a MySQL query, where I filter by a json field:
SELECT id, username
FROM (SELECT id,
Json_extract(payload, '$.username') AS username
FROM table1) AS tmp
WHERE username = 'userName1';
It returns 1 row, which looks like:
1, "userName1" See the quotes that are not in the clause?
What I need is to make the WHERE clause case insensitive.
But when I do
WHERE username LIKE 'userName1';
it returns 0 rows. I don't understand why it works this way, the = clause works though it doesn't have those double quotes.
If I do
WHERE username LIKE '%userName1%';
now also returns the row, because %% takes quotes into consideration:
1, "userName1"
But when I do
WHERE username LIKE '%username1%';
it returns 0 rows, so unlike the usual MySQL LIKE it's somehow case sensitive.
What am I doing wrong and how to filter the json payload the case insensitive way?
EDIT=========================================
The guess is that COLLATE should be used here, but so far I don't understand how to make it work.
Default collation of MySQL is latin1_swedish_ci before 8.0 and utf8mb4_0900_ai_ci since 8.0. So non-binary string comparisons are case-insensitive by default in ordinary columns.
However, as mentioned in the MySQL manual for the JSON type:
"MySQL handles strings used in JSON context using the utf8mb4 character set and utf8mb4_bin collation."
Therefore, your JSON value is in utf8mb4_bin collation and you need to apply a case insensitive collation to either operand to make the comparison case insensitive.
E.g.
WHERE username COLLATE XXX LIKE '...'
where XXX should be a utf8mb4 collation (such as the utf8mb4_general_ci you've mentioned).
Or
WHERE username LIKE '...' COLLATE YYY
where YYY should be a collation that matches the character set of your connection.
For equality comparison, you should unquote the JSON value with JSON_UNQUOTE() or the unquoting extraction operator ->>.
E.g.
JSON_UNQUOTE(JSON_EXTRACT(payload, '$.username'))
Or simply
payload->>'$.username'
The JSON type and functions work very differently from ordinary data types. It appears that you are new to them, so I would suggest reading the manual carefully before putting this into a production environment.
Okay, I was able to solve the case insensitivity by adding COLLATE utf8mb4_general_ci after the LIKE clause.
So the point here is to find a working collation, which in its turn can be found by researching the db you work with.
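The role of the double quotes can be illustrated outside MySQL. This is only an analogy in Python using the standard json module, not MySQL's actual implementation: json.dumps stands in for the quoted result of JSON_EXTRACT, json.loads for JSON_UNQUOTE, and ci_equal is a hypothetical helper standing in for a case-insensitive collation.

```python
import json

# A payload like the one in the question (the value is an assumption).
payload = json.dumps({"username": "userName1"})

# Analogous to JSON_EXTRACT: the value is still JSON text, quotes included.
extracted = json.dumps(json.loads(payload)["username"])

# Analogous to JSON_UNQUOTE or ->>: a plain string without the quotes.
unquoted = json.loads(extracted)


def ci_equal(left: str, right: str) -> bool:
    """Compare two strings while ignoring case."""
    return left.casefold() == right.casefold()


print(repr(extracted))  # the quotes are part of the value
print(repr(unquoted))   # the quotes are gone
print(ci_equal(unquoted, "username1"))
```

The quoted form never equals the bare pattern, whatever the case handling, which matches the observation that LIKE 'userName1' returned no rows while LIKE '%userName1%' did.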
I have a field in a Django Model for storing a unique (hash) value. Turns out that the database (MySQL/inno) doesn't do a case sensitive search on this type (VARCHAR), not even if I explicitly tell Django to do a case sensitive search Document.objects.get(hash__exact="abcd123"). So "abcd123" and "ABcd123" are both returned, which I don't want.
class Document(models.Model):
    filename = models.CharField(max_length=120)
    hash = models.CharField(max_length=33)
I can change the 'hash field' to a BinaryField , so in the DB it becomes a LONGBLOB , and it does do a case-sensitive search (and works). However, this doesn't seem very efficient to me.
Is there a better way (in Django) to do this, like adding 'utf8 COLLATE'? or what would be the correct Fieldtype in this situation?
(yes, I know I could use PostgreSQL instead..)
The default collation for MySQL's default character set (latin1) is latin1_swedish_ci, which is case insensitive. Not sure why that is. But you should create your database like so:
CREATE DATABASE database_name CHARACTER SET utf8 COLLATE utf8_bin;
As @dan-klasson mentioned, non-binary string comparison is case-insensitive by default; notice the _ci at the end of latin1_swedish_ci, which stands for case-insensitive.
You can, as Dan mentioned, create the database with a case sensitive collation and character set.
You may also be interested to know that you can create a single table, or even a single column, with a different collation (for the same result). You can also change these collations after creation, for instance per table (utf8_bin compares case-sensitively):
ALTER TABLE documents__document CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
Additionally, if you would rather not change the database/table charset/collation, Django lets you run a custom query using the raw method. So you may be able to work around the change by using something like the following, though I have not tested this myself:
Document.objects.raw("SELECT * FROM documents__document WHERE hash LIKE %s COLLATE latin1_bin", ['abcd123'])
I'm having the utf-8 Vs. byte string problems mentioned here: Django headache with simple non-ascii string
I don't care about case sensitive matching in the MySQL columns, I just always want UTF-8 strings returned because I find it is impossible to deal with byte strings returned for character columns for non-ascii text.
How do I change my MySQL collation type so that UTF-8 strings are always returned through Django?
You need to be aware of the character-set/collation settings at the database/table/column levels. Column-level settings take precedence over the others. Because of this, I'm including commands you can use to perform these changes at each level of the db.
Inspect your current configuration (database):
SHOW CREATE DATABASE db_name;
Inspect your current configuration (table):
SHOW TABLE STATUS WHERE name='tbl_name';
Inspect your current configuration (columns):
SHOW FULL COLUMNS FROM tbl_name;
Change the character-set/collation (database):
ALTER DATABASE db_name DEFAULT CHARACTER SET utf8;
Change the character-set/collation (table):
ALTER TABLE tbl_name DEFAULT CHARACTER SET utf8;
Change the character-set/collation (columns):
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8;
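When several tables need the same treatment, the three ALTER statements above can be generated rather than typed by hand. A rough sketch, assuming plain unquoted identifiers; charset_statements is a hypothetical helper, not part of MySQL or Django, and it only builds strings without executing anything:

```python
import re
from typing import List, Optional, Sequence

# Only plain identifiers are accepted, since they are interpolated into SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def charset_statements(
    db_name: str,
    tables: Sequence[str],
    charset: str = "utf8",
    collation: Optional[str] = None,
) -> List[str]:
    """Build ALTER statements for the database and each listed table."""
    names = [db_name, charset, *tables]
    if collation is not None:
        names.append(collation)
    for name in names:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    suffix = f" COLLATE {collation}" if collation else ""
    statements = [
        f"ALTER DATABASE {db_name} DEFAULT CHARACTER SET {charset}{suffix};"
    ]
    for table in tables:
        # Default for columns added later, then conversion of existing columns.
        statements.append(
            f"ALTER TABLE {table} DEFAULT CHARACTER SET {charset}{suffix};"
        )
        statements.append(
            f"ALTER TABLE {table} CONVERT TO CHARACTER SET {charset}{suffix};"
        )
    return statements


for statement in charset_statements("db_name", ["tbl_name"]):
    print(statement)
```

The resulting strings can be pasted into the MySQL shell or passed to whatever executes SQL in your project. Take a backup first, since CONVERT TO rewrites the stored data.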
In Django you must write your own migration:
./manage.py makemigrations --empty app_name
Then fill the empty migration with these SQL commands, like this:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from django.db import models, migrations


class Migration(migrations.Migration):

    dependencies = [
        ('app', '0008_prev_migration'),
    ]

    operations = [
        migrations.RunSQL('ALTER DATABASE db_name DEFAULT CHARACTER SET utf8;'),
        migrations.RunSQL('ALTER TABLE tbl_name DEFAULT CHARACTER SET utf8;'),
        migrations.RunSQL('ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8;'),
    ]
ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci
Note that if you really did want to change the collation for just one column (I can't think why you might, but who knows) then this is the syntax to alter a TEXT column called DESCRIPTION in the ITEMS table to UTF-8, binary, non-null:
ALTER TABLE ITEMS CHANGE DESCRIPTION DESCRIPTION TEXT CHARACTER SET utf8
COLLATE utf8_bin NOT NULL;
Older MySQL versions have no case-sensitive UTF-8 collation per se (MySQL 8.0 added utf8mb4_0900_as_cs for utf8mb4), but the utf8_bin collation works for most cases.
I'm getting this strange error while processing a large number of data...
Error Number: 1267
Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
SELECT COUNT(*) as num from keywords WHERE campaignId='12' AND LCASE(keyword)='hello again 昔 ã‹ã‚‰ ã‚ã‚‹ å ´æ‰€'
What can I do to resolve this? Can I escape the string somehow so this error wouldn't occur, or do I need to change my table encoding somehow, and if so, what should I change it to?
SET collation_connection = 'utf8_general_ci';
then for your databases
ALTER DATABASE your_database_name CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
MySQL sneaks swedish in there sometimes for no sensible reason.
CONVERT(column1 USING utf8)
Solves my problem. Where column1 is the column which gives me this error.
You should set both your table encoding and connection encoding to UTF-8:
ALTER TABLE keywords CHARACTER SET UTF8; -- run once
and
SET NAMES 'UTF8';
SET CHARACTER SET 'UTF8';
Use the following statement to fix this error.
Be careful with your data: take a backup first if the table already contains data.
ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
In general the best way is to change the table collation. However, I have an old application and cannot really estimate whether this would have side effects. Therefore I tried to convert the string into some other format that avoids the collation problem.
What I found working is to compare the strings by converting them into a hexadecimal representation of their characters. On the database this is done with HEX(column). For PHP you may use this function:
public static function strToHex($string)
{
    $hex = '';
    for ($i=0; $i<strlen($string); $i++){
        $ord = ord($string[$i]);
        $hexCode = dechex($ord);
        $hex .= substr('0'.$hexCode, -2);
    }
    return strToUpper($hex);
}
When doing the database query, your original UTF-8 string must first be converted into an ISO string (e.g. using utf8_decode() in PHP) before using it in the DB. Because of the collation type the database cannot contain UTF-8 characters, so the comparison should work even though this changes the original string (UTF-8 characters that do not exist in the ISO charset become a ? or are removed entirely). Just make sure that you use the same UTF-8 to ISO conversion when you write data into the database.
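For readers on the Django side of this thread, the same idea can be sketched in Python. This is an assumed equivalent of the PHP helper combined with the utf8_decode() step, not a tested drop-in; str_to_hex is a hypothetical name, and characters outside Latin-1 are replaced with ? as described above:

```python
def str_to_hex(text: str) -> str:
    """Return the uppercase hex of the Latin-1 bytes of text.

    Characters that Latin-1 cannot represent become '?', so the same
    conversion must also be used when the data is written.
    """
    raw = text.encode("latin-1", errors="replace")
    return raw.hex().upper()


print(str_to_hex("abc"))
print(str_to_hex("é"))
```

The result is intended to be compared against HEX(column) on a latin1 column, which sidesteps the collation mix because both sides are plain ASCII hex digits.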
I had my table originally created with CHARSET=latin1. After table conversion to utf8 some columns were not converted, however that was not really obvious.
You can try to run SHOW CREATE TABLE my_table; and see which column was not converted or just fix incorrect character set on problematic column with query below (change varchar length and CHARSET and COLLATE according to your needs):
ALTER TABLE `my_table` CHANGE `my_column` `my_column` VARCHAR(10) CHARSET utf8
COLLATE utf8_general_ci NULL;
I found that using cast() was the best solution for me:
cast(Format(amount, "Standard") AS CHAR CHARACTER SET utf8) AS Amount
There is also a convert() function. More details on it here
Another resource here
Change the character set of the table to utf8
ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8
My user account did not have the permissions to alter the database and table, as suggested in this solution.
If, like me, you don't care about the character collation (you are using the '=' operator), you can apply the reverse fix. Run this before your SELECT:
SET collation_connection = 'latin1_swedish_ci';
After making your corrections listed in the top answer, change the default settings of your server.
In your "/etc/my.cnf.d/server.cnf", or wherever it is located, add the defaults to the [mysqld] section so it looks like this:
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
Source: https://dev.mysql.com/doc/refman/5.7/en/charset-applications.html