Convert MySQL LONGTEXT column to JSON - mysql

I have an existing table that holds valid JSON data but is stored as LONGTEXT (character set utf8mb4 with collation utf8mb4_bin). Doing JSON queries on this is highly inefficient and I want to change the data type of this column to JSON.
When I do so in HeidiSQL I get an error: COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'binary'. I can fix this by resetting the collation to empty.
I mention this because I thought the character set for a JSON column is utf8mb4 and its default collation is utf8mb4_bin. The JSON contains some GUIDs, and when I try to query them it appears that I have to use LIKE. If I use the 'normal' = I get no results.
Works: SELECT * FROM MyTable WHERE Data->>"$.Shop[*].ContactPerson.UserId" LIKE "%b1b9ad95-1098-4e6c-a697-50c2a47dc301%";
Doesn't work: SELECT * FROM MyTable WHERE Data->>"$.Shop[*].ContactPerson.UserId" = "b1b9ad95-1098-4e6c-a697-50c2a47dc301";
Is that a problem with my syntax, or is it related to collation? (I'd say no, but it's the only discrepancy I can find.) BTW: Leaving out the % in the LIKE also produces no results.
Create/insert:
CREATE TABLE Temp ( `Data` LONGTEXT COLLATE UTF8MB4_BIN );
INSERT INTO Temp( `Data` ) VALUES ("{\"Shop\":[{\"ContactPerson\": {\"UserId\":\"b1b9ad95-1098-4e6c-a697-50c2a47dc301\"}},{\"ContactPerson\": {\"UserId\":\"B27DA5A7-D678-4513-8A44-BD76CC4651CC\"}}]}");
INSERT INTO Temp( `Data` ) VALUES ("{\"Shop\":[{\"ContactPerson\": {\"UserId\":\"82899A81-2024-4F68-917A-710764296A21\"}},{\"ContactPerson\": {\"UserId\":\"AE59DCA7-32AF-4131-93C7-A1BB698DF8E0\"}}]}");
INSERT INTO Temp( `Data` ) VALUES ("{\"Shop\":[{\"ContactPerson\": {\"UserId\":\"4154477B-1B70-4F25-9E4B-2CFBBF4F678F\"}},{\"ContactPerson\": {\"UserId\":\"B27DA5A7-D678-4513-8A44-BD76CC4651CC\"}}]}");
Trying the update: ALTER TABLE `Temp` MODIFY `Data` JSON;
And now it works... (of course) :(
So apparently when using the table-editor in HeidiSQL it fails (I've retried this). Using a SQL statement actually does what I want.
That still leaves me with having to use the LIKE as opposed to = in my query.
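Thank you to @Rick James for the "aha moment" (see: Where clause for JSON field on array of objects). A path with the [*] wildcard returns a JSON array of all matches rather than a single string, so = against one GUID never matches.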
So in my case:
SELECT * FROM `Temp` WHERE JSON_SEARCH(`Data`, 'one', 'b1b9ad95-1098-4e6c-a697-50c2a47dc301', NULL, '$.Shop[*].ContactPerson.UserId') IS NOT NULL;
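A rough way to see why = returns nothing while LIKE with % works: a path containing [*] evaluates to a JSON array of every match, and ->> leaves a non-string result as its JSON text. The Python sketch below only imitates that behaviour on the first sample row. wildcard_user_ids is a hypothetical helper, not a MySQL API, and json.dumps merely stands in for MySQL's array formatting.

```python
import json

# First sample row from the INSERT statements above.
row = json.loads(
    '{"Shop":[{"ContactPerson":{"UserId":"b1b9ad95-1098-4e6c-a697-50c2a47dc301"}},'
    '{"ContactPerson":{"UserId":"B27DA5A7-D678-4513-8A44-BD76CC4651CC"}}]}'
)


def wildcard_user_ids(document):
    """Hypothetical stand-in for the path $.Shop[*].ContactPerson.UserId."""
    return [shop["ContactPerson"]["UserId"] for shop in document["Shop"]]


target = "b1b9ad95-1098-4e6c-a697-50c2a47dc301"

matches = wildcard_user_ids(row)   # a list, not a single string
as_text = json.dumps(matches)      # the array rendered as text, brackets and quotes included

equals_matches = as_text == target   # like "=": whole text must equal the GUID
like_matches = target in as_text     # like LIKE '%...%': substring match
search_matches = target in matches   # like JSON_SEARCH(...) IS NOT NULL: element match
```

Under these assumptions only the substring and element checks succeed, which mirrors the observed behaviour of =, LIKE and JSON_SEARCH.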
Update:
Issue was reported and will be fixed in next build (https://github.com/HeidiSQL/HeidiSQL/issues/1652)

Related

Equal to operation not working on a field with updated collation

I was trying to convert an existing varchar column with a unique index on it to a case sensitive column. So to do this, I updated the collation of the particular column.
Previous value: utf8mb4_unicode_ci
Current value: utf8mb4_bin
Now I have a row in my table TEST_TABLE whose test_column value is abcd.
When I try to run a simple query like SELECT * FROM TEST_TABLE WHERE test_column = 'abcd'; it returns no result.
However when I try SELECT * FROM TEST_TABLE WHERE test_column LIKE 'abcd'; it returns the data correctly.
Also when I try SELECT * FROM TEST_TABLE WHERE BINARY test_column = 'abcd'; it returns the data correctly.
One more thing I tried was creating a duplicate of the table, with the column collation set to utf8mb4_bin at creation time, and then copying all data over from the original table. In that copy, the query SELECT * FROM TEST_TABLE WHERE test_column = 'abcd'; works fine.
So this seems to be a problem with the BINARY conversion. Is there a solution to this, or am I doing something wrong?
This seems to be an issue with MySQL. The steps I followed to resolve it were:
dropped the unique index on the column
changed the collation of the column
created the unique index again
Now it is working as expected. It seems MySQL didn't rebuild the unique index when the collation was changed, so lookups through the index still used the old ordering. The steps above solved my issue.
How did you change the collation? There are about 4 ways that you might think to do it. Most do something different.
Probably ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin was what you needed.
Why "bin"? You want to match case and accents? That is "abcd" != "Abcd"?

MySQL trying to apply UTF-8 encoding when trying to insert bytes to BLOB field

I'm using MySQL 8.0.4 (rc4). I need MySQL 8 because it's the only version of MySQL that supports CTEs.
My database is created thus:
CREATE DATABASE IF NOT EXISTS TestDB
DEFAULT CHARACTER SET utf8mb4
DEFAULT COLLATE utf8mb4_general_ci;
USE TestDB;
SET sql_mode = 'STRICT_TRANS_TABLES';
CREATE TABLE IF NOT EXISTS MyTable (
(...)
Body LONGBLOB NOT NULL,
(...)
);
When I try to insert raw byte data to this description field, I receive this error:
Error 1366: Incorrect string value: '\x8B\x08\x00\x00\x00\x00...' for column 'Body' at row 1.
This is the insert statement I'm using.
REPLACE INTO MyTable
SELECT Candidate.* FROM
(SELECT :Id AS Id,
(...)
:Body AS Body,
(...)
) AS Candidate
LEFT JOIN MyTable ON Candidate.Id = MyTable.Id
WHERE (
(...)
);
How could there be an incorrect string value for BLOB? Doesn't BLOB mean I can insert quite literally anything?
What's the : stuff? Why have the nested query? May we see actual SQL? What language are you using? It sounds like the "binding" tried to apply character set rules, when it should not. May we see the code that did the substitution of the : stuff?
BLOBs have no character set. As long as you can get the bytes past the parser, there should be no problem.
However, I find this to be a better way to do it...
In the app language, generate a hex string, then use that in
INSERT INTO ... VALUES (..., UNHEX(the-hex-string), ...)
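A minimal sketch of the hex approach, assuming the application is written in Python. The column list (Id, Body), the sample payload and the %s placeholder style are assumptions for illustration; the real table has more columns and the driver's placeholder syntax may differ. The point is only that the hex string is plain ASCII, so no character-set conversion can reject it.

```python
import gzip

# Assumed payload: gzip output, whose header contains the bytes 8B 08 seen in the error.
body = gzip.compress(b"example body text")

# Hex-encode in the application; the result holds only the characters 0-9 and a-f.
hex_body = body.hex()

# Hypothetical statement and parameters; UNHEX() turns the hex text back into bytes server-side.
sql = "INSERT INTO MyTable (Id, Body) VALUES (%s, UNHEX(%s))"
params = (1, hex_body)

# Decoding the hex again gives back exactly the original bytes.
round_trip = bytes.fromhex(hex_body)
```

The statement and params would then be passed to the driver's execute call; round_trip is only there to show the encoding is lossless.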

MYSQL syntax error in CONVERT function

I have a problem with MySQL syntax.
This statement works correctly:
CONVERT(_latin1 'SOME-AR-TEXT' USING utf8));
But I don't need the 'SOME-AR-TEXT' value; I need the value of some variable.
In other words, I tried to do this:
CONVERT(_latin1 (SELECT some_variable) USING utf8));
But the console displays a syntax error.
What can I do to get the value of the some_variable variable?
Thank you all
SELECT CONVERT(some_variable USING UTF8) AS field_value
FROM MyTable
From your SQL fiddle it seems you want to convert each field. Why not just create the table with a default charset of latin1? That way, you would not have to convert each field individually.
CREATE TABLE IF NOT EXISTS `example` (
`some_variable` varchar(30) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
There is also something wrong with the terminology. 'some_variable' isn't really a variable but a column/field in the database table example.

MySQL LOWER() function not multi-byte safe for the º character?

When I encode the following character as UTF-8:
º
I get:
Âº
Then, with Âº stored as a field value, I select the field with the LOWER() function and get
âº
I was expecting it to respect that the value is a multi-byte character and therefore not apply LOWER to it.
Expected:
Âº
Am I misunderstanding, or is the LOWER() function supposed to be multi-byte safe, as stated in the manual? (http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_lower)
Or am I doing something wrong here?
I am running MySQL 5.1.
EDIT
The encoding on the table is set to UTF-8. The session encoding is default latin1.
Here are my repro steps.
CREATE TABLE test_table (
test_field VARCHAR(1000) DEFAULT NULL
) ENGINE=INNODB DEFAULT CHARSET=utf8;
INSERT INTO test_table(test_field) VALUES('º');
SELECT LOWER(test_field) FROM test_table;
INSERT INTO test_table(test_field) VALUES('Âº');
This will insert a 2-character string, which has the correct LOWER() of "âº".
Lower("Â") is "â"
Lower("º") is "º"
If you want to insert "º" then make sure you have
SET NAMES 'utf8';
and
INSERT INTO test_table(test_field) VALUES('º');
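The "âº" result can be reproduced outside MySQL. The Python sketch below assumes the client sent the UTF-8 bytes of "º" over a connection that treated them as latin1, which is what a session with the default latin1 encoding would do. Python's latin-1 codec is used as a stand-in for MySQL's latin1; the two agree for the two bytes involved here.

```python
original = "º"                           # U+00BA, one character

utf8_bytes = original.encode("utf-8")    # two bytes: 0xC2 0xBA

# Reading each byte as a separate latin1 character yields two characters.
misread = utf8_bytes.decode("latin-1")   # "Â" followed by "º"

# Lowercasing is then applied per character: "Â" has a lowercase form, "º" does not change.
lowered = misread.lower()

# Interpreting the bytes as UTF-8 instead gives back the single original character.
correct = utf8_bytes.decode("utf-8")
```

So LOWER() is behaving correctly on the two characters it was given; the data was already mis-decoded before the function ran.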

MySQL case insensitive string matching using =

I'm trying to search records using an alphanumeric "short_code" column. Something like:
SELECT * FROM items WHERE short_code = "1AV9"
With no collation and with column type set to varchar(), this query is case-insensitive, so it returns records with short_codes 1av9, 1Av9, etc. I don't want this.
So I tried changing the collation of the short_code column to utf8_bin, but now the query isn't returning anything at all. However, if I change the query to:
SELECT * FROM items WHERE short_code LIKE "1AV9%"
Then I get the exact row I want. Is it possible that by converting my column's collation, it somehow appended invisible chars at the end of all my shortcodes? How can I verify/fix this?
EDIT: It looks that by changing my column type to binary and trying a bunch of other stuff, it somehow padded all my short_codes with null bytes, which explains why the query wouldn't return any result. After starting over and setting the utf8_bin collation, everything's working as expected.
Here's a wild guess. I think the table did not originally have a collation set. Then you set the collation to utf8_bin, and that caused confusion in the stored length of the field.
First back up your table. Then try:
ALTER TABLE items
CHANGE COLUMN short_code short_code VARCHAR(48)
CHARACTER SET 'utf8'
COLLATE 'utf8_unicode_ci' ;
Adding some characters (that are not in your data):
UPDATE items
SET short_code = CONCAT('++F++F', short_code, '++F++F') ;
Removing them:
UPDATE items
SET short_code = REPLACE(short_code, '++F++F', '') ;
Back to length 8:
ALTER TABLE items
CHANGE COLUMN short_code short_code VARCHAR(8) ;
And back again to binary collation:
ALTER TABLE items
CHANGE COLUMN short_code short_code VARCHAR(8)
CHARACTER SET 'utf8'
COLLATE 'utf8_bin' ;
Perhaps this will fix the incorrect length. (perhaps a shorter change - from varchar to char and back to varchar - will fix it).
Try
SELECT LENGTH(short_code) FROM items WHERE short_code LIKE "1AV9%"
and see if you get something other than 4 as the result.
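To illustrate what that length check would reveal in the padding case described in the question's EDIT, here is a small Python sketch. It assumes the value was padded to 8 bytes with NUL bytes, as a fixed-width binary column would do; the width of 8 is an assumption taken from the VARCHAR(8) above.

```python
wanted = b"1AV9"

# Assumed stored value: the short code padded with NUL bytes to a width of 8.
stored = wanted + b"\x00" * 4

exact_match = stored == wanted            # like "=": fails because of the padding
prefix_match = stored.startswith(wanted)  # like LIKE '1AV9%': still succeeds
stored_length = len(stored)               # like LENGTH(short_code): 8 instead of 4

# Stripping the trailing NUL bytes restores a value that compares equal.
cleaned = stored.rstrip(b"\x00")
```

A length other than 4 is therefore a quick sign that invisible trailing bytes are what breaks the equality comparison.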
Edit: Hmm, your values might have trailing spaces. Try
SELECT * FROM items WHERE short_code = "1AV9 "
(that's 1AV9 plus four spaces) and see if you get any results.
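If you can change the collation, use a case-sensitive one. Note that stock MySQL does not ship a "utf8_general_cs" collation; utf8_bin is the usual case-sensitive choice for utf8.
or maybe
WHERE '1AV9' COLLATE utf8_bin = short_code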