When storing binary data in MySQL I use the hex() and unhex() functions. But there are two ways I can search on binary data:
Method 1
select * from tbl
where
id=unhex('ABCDABCDABCDABCDABCDABCDABCDABCD')
Method 2
select * from tbl
where
hex(id)='ABCDABCDABCDABCDABCDABCDABCDABCD'
Both methods work, but my instinct is that method 1 is better because only the input value is processed by UNHEX(), whereas in method 2 every value in the id column of the table will be put through HEX().
Is this reasoning correct, or would MySQL optimise the query to prevent this? Are there any other reasons for choosing one method over the other?
When you apply a function to a column, the optimizer generally can't use an index on that column. (MySQL 8.0.13 and later do support indexes on expressions, called functional key parts, but that is still more complicated than indexing the plain column.)
Also, as you say, the function has to run for every row, whereas in the other form it runs only once, on the input value.
For these reasons, use the form with UNHEX().
If there is an index on the id column, the first method is much faster. If there is no index, the first method is still more efficient.
With the first method, UNHEX() needs to be called only once, and if there is an index, it is used. The second method calls HEX() for every row of the table and cannot use the index.
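One way to see the difference is to ask a query planner. The sketch below uses SQLite (bundled with Python) as a stand-in, not MySQL, so the plan wording differs; the table and data are hypothetical. Binding `bytes.fromhex(...)` plays the role of UNHEX() applied once to the input, while `hex(id)` mirrors HEX() applied to every stored value.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail strings for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column

conn = sqlite3.connect(":memory:")
# Hypothetical table: a BLOB primary key gets an automatic unique index.
conn.execute("CREATE TABLE tbl (id BLOB PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(bytes.fromhex("ABCD" * 8), "target"), (bytes.fromhex("1234" * 8), "other")],
)

key_hex = "ABCDABCDABCDABCDABCDABCDABCDABCD"

# Method 1: decode the input once (like UNHEX) and compare the raw column.
method1 = "SELECT * FROM tbl WHERE id = ?"
# Method 2: encode every stored value (like HEX) and compare strings.
method2 = "SELECT * FROM tbl WHERE hex(id) = ?"

rows1 = conn.execute(method1, (bytes.fromhex(key_hex),)).fetchall()
rows2 = conn.execute(method2, (key_hex,)).fetchall()
plan1 = plan(conn, method1, (bytes.fromhex(key_hex),))
plan2 = plan(conn, method2, (key_hex,))

print(rows1 == rows2)  # both methods find the same row
print(plan1)           # a SEARCH through the primary-key index
print(plan2)           # a full SCAN of the table
```

Both queries return the same row, but only the first is planned as an index lookup. Running EXPLAIN on the two MySQL queries is the way to confirm the same contrast on your own server.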
I retrieve data from a MySQL database using a simple case-insensitive SELECT ... FROM ... WHERE ... LIKE query. I escape any % or _ in the user's input and then wrap it in % myself, so the user can only perform a basic text search and cannot inject their own wildcard patterns.
For every row returned by this query, I then have to search again with a JS script to find all the positions of the substring in the original string. I dislike this approach because it uses a different matching implementation from the LIKE query, so I can't guarantee that the two behave the same.
I found the MySQL functions POSITION() and LOCATE(), which come close, but they return only the position of the first match, or 0 if there is none. LOCATE() does accept a starting position, so by repeatedly passing the previously returned position (plus one) as the start until it returns 0, you can find every occurrence of the substring. However, that means many additional queries and might slow my application down considerably.
So I'm now wondering: is there a way to make a LIKE query return the substring positions directly? I haven't found one, possibly because I still lack the MySQL vocabulary (I'm a noob).
Simple answer: No.
Longer answer: MySQL has no syntax or mechanism to return an array of anything, from either a SELECT or even a Stored Procedure.
Maybe answer: You could write a stored procedure that loops through one result, finding the positions and packing them into a commalist. But I cringe at how messy that code would be. I would quickly decide to write JS code, as you have already done.
Moral of the story: SQL is not a full language. It's great at storing and efficiently retrieving large sets of rows, but lousy at string manipulation and arrays (other than "rows").
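Doing the position search in application code, as the answer suggests, can be sketched like this. It is written in Python rather than JS, and `locate` and `all_positions` are hypothetical helpers that imitate the LOCATE() loop from the question: 1-based positions, 0 for "not found", and each new search starting one character past the previous match. Case-insensitivity is approximated with `str.lower()`, which assumes lower-casing does not change the string's length (true for ASCII) and will not exactly reproduce a MySQL collation.

```python
def locate(needle: str, haystack: str, pos: int = 1) -> int:
    """Rough client-side imitation of MySQL LOCATE(substr, str, pos).

    Positions are 1-based and 0 means "not found". Matching is made
    case-insensitive with str.lower(), an approximation of a collation.
    """
    if pos < 1:
        return 0  # simplification: treat a start before position 1 as no match
    index = haystack.lower().find(needle.lower(), pos - 1)
    return index + 1  # find() returns -1 when absent, so this yields 0


def all_positions(needle: str, haystack: str) -> list:
    """Call locate() repeatedly, restarting one past the previous hit."""
    if not needle:
        return []  # an empty needle would "match" at every position
    positions = []
    pos = locate(needle, haystack)
    while pos != 0:
        positions.append(pos)
        # Restart one character after the match; passing pos itself again
        # would keep finding the same occurrence forever.
        pos = locate(needle, haystack, pos + 1)
    return positions


print(all_positions("abc", "Abcabc ABC"))  # → [1, 4, 8]
```

Because each search restarts only one character later, overlapping matches are reported too (for example "aa" in "aaaa" gives 1, 2 and 3), which is the same behavior the repeated LOCATE() calls would have.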
Commalist
If you are actually searching a simple list of items separated by commas, then MySQL's FIND_IN_SET() and SUBSTRING_INDEX() closely match what JS can do with split(',') on strings.
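To make the comparison concrete, here is a rough Python imitation of the two MySQL functions next to a plain split. `find_in_set` and `substring_index` are hypothetical helpers written for illustration; they are case-sensitive and ignore NULL handling, both of which differ from MySQL.

```python
def find_in_set(item: str, strlist: str) -> int:
    """Approximate FIND_IN_SET(): 1-based index of item in a comma list, else 0."""
    if not strlist:
        return 0
    parts = strlist.split(",")
    return parts.index(item) + 1 if item in parts else 0


def substring_index(s: str, delim: str, count: int) -> str:
    """Approximate SUBSTRING_INDEX(): text before the count-th delimiter
    (count > 0), or after the count-th delimiter counted from the right
    (count < 0). Assumes a non-empty delimiter."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


print(find_in_set("b", "a,b,c,d"))                # → 2
print(substring_index("www.mysql.com", ".", 2))   # → www.mysql
print(substring_index("www.mysql.com", ".", -2))  # → mysql.com
print("a,b,c".split(","))                         # → ['a', 'b', 'c']
```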
I have a table where one of the columns is of type TEXT. I need to search the table for rows whose text is similar to a given string.
As the string can be pretty long (say 10,000 bytes), I decided it is enough to compare only the first 20 bytes. To make this search faster, I created a key:
KEY `description` (`description`(20))
So what I want to run now is one of the following queries:
SELECT * FROM `table` WHERE STRCMP(SUBSTRING(`description`,0,20),'string_to_compare') = 0
or
SELECT * FROM `table` WHERE `description` LIKE 'string_to_compare%'
Note that I put a single percent sign at the end of string_to_compare to tell the database that I only want to compare the leading bytes.
I hope MySQL is smart enough to use the key and not do any extra work.
Questions:
Is there any difference, and which query is better? I personally prefer the second, as it looks clearer and will hopefully be better understood by the DB engine (MyISAM).
Is it correct that MySQL with MyISAM will execute these queries efficiently?
How do I put '%' in a PDO prepared statement? For example: SELECT * FROM table WHERE description LIKE ":text%"?
Yes, there's a difference. When the WHERE condition calls a function on the column value, indexes can't be used. I don't think it will realize that your SUBSTRING() call happens to match the indexed part of the text and use that. On the other hand, LIKE is specifically coded to recognize the cases where it can use an index. Also, if you want to compare two strings for equality, you should use =, not STRCMP(), e.g.
WHERE SUBSTRING(`description`,1,20) = 'string_to_compare'
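The change from position 0 to 1 matters: MySQL string positions are 1-based, and SUBSTRING() with a position of 0 returns an empty string, so SUBSTRING(`description`,0,20) in the question could never equal 'string_to_compare'. The sketch below is a hypothetical Python imitation of SUBSTRING(str, pos, len) for non-negative positions only; it is not MySQL code.

```python
def mysql_substring(s: str, pos: int, length: int) -> str:
    """Sketch of MySQL SUBSTRING(str, pos, len) for pos >= 0 only.

    Positions are 1-based and position 0 yields an empty string.
    Negative positions (counted from the end) are not modelled here.
    """
    if pos < 0:
        raise ValueError("negative positions are not modelled in this sketch")
    if pos == 0 or length <= 0:
        return ""
    return s[pos - 1 : pos - 1 + length]


text = "string_to_compare and a lot more text"
print(repr(mysql_substring(text, 0, 20)))  # → '' (the question's version)
print(repr(mysql_substring(text, 1, 20)))  # → 'string_to_compare an'
```

The second line also shows a limitation of the = form: a 20-character prefix only equals the search string when that string is itself 20 characters long, or when the stored value is exactly the search string. LIKE 'string_to_compare%' has no such restriction.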
I believe the LIKE version will execute efficiently: a pattern with a constant prefix followed by a trailing % can be resolved as a range scan on the index.
The placeholder can't be in quotes for it to work. Use CONCAT() to combine it: WHERE description LIKE CONCAT(:text, '%'). Or you can put the % at the end of the PHP variable that you bind to the placeholder, and use WHERE description LIKE :text.
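The idea of binding the value with the % already appended can be sketched with Python's sqlite3 module, used here only as a stand-in for PDO: both use unquoted placeholders, but SQLite concatenates with || where MySQL uses CONCAT(). `escape_like` is a hypothetical helper that escapes % and _ in the user's input so they match literally; SQLite needs the explicit ESCAPE clause, whereas in MySQL backslash is already the default LIKE escape character.

```python
import sqlite3

def escape_like(text: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return (
        text.replace(escape, escape + escape)  # escape the escape char first
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (description TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [("50% off sale",), ("500 units",), ("other",)],
)

user_text = "50%"

# Option A: append the wildcard to the bound value; no quotes around ?.
rows_a = conn.execute(
    "SELECT description FROM items WHERE description LIKE ? ESCAPE '\\'",
    (escape_like(user_text) + "%",),
).fetchall()

# Option B: concatenate in SQL (SQLite uses ||, MySQL would use CONCAT()).
rows_b = conn.execute(
    "SELECT description FROM items WHERE description LIKE ? || '%' ESCAPE '\\'",
    (escape_like(user_text),),
).fetchall()

print(rows_a, rows_b)  # → [('50% off sale',)] [('50% off sale',)]
```

Without the escaping step, the user's "50%" would act as a wildcard and also match "500 units".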
I have a column of type binary(16) not null in a MySQL table. Not all records have data in this column, but because it is set up to disallow NULL, such records have an all-zero value.
I want to exclude these all-zero records from a query.
So far, the only solution I can find is to HEX the value and compare that:
SELECT uuid
FROM example
WHERE HEX(uuid) != '00000000000000000000000000000000'
which works, but is there a better way?
To match a binary value using a string literal, write each zero byte (0x00) as the escape sequence \0.
In your case with a binary(16), this is how to exclude only the all-zero values:
where uuid != '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'
The advantage of using a plain comparison with a literal (like this) is that an index on the column can be used and it's a lot faster. If you invoke functions to make the comparison, indexes will not be used and the function(s) must be invoked on every row, which can cause big performance problems on large tables.
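A quick check that the literal comparison and the HEX() comparison select the same rows, using SQLite as a stand-in (not MySQL; the table contents are hypothetical). SQLite does not interpret \0 escapes in string literals, so the sketch writes the all-zero value as the hex blob literal X'00000000000000000000000000000000' instead; MySQL accepts the same X'...' hex literal syntax.

```python
import sqlite3

ZERO_LITERAL = "X'" + "00" * 16 + "'"  # 16 zero bytes as an SQL blob literal

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (uuid BLOB NOT NULL)")
zero = bytes(16)  # b'\x00' * 16, the "no data" marker
real = bytes.fromhex("0123456789abcdef0123456789abcdef")
conn.executemany("INSERT INTO example VALUES (?)", [(zero,), (real,), (zero,)])

# Plain comparison against a literal: the stored column is left untouched.
by_literal = conn.execute(
    "SELECT uuid FROM example WHERE uuid != " + ZERO_LITERAL
).fetchall()

# Function-based comparison: hex() is evaluated for every row.
by_hex = conn.execute(
    "SELECT uuid FROM example WHERE hex(uuid) != ?", ("0" * 32,)
).fetchall()

print(by_literal == by_hex == [(real,)])  # → True
```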
SELECT uuid FROM example WHERE TRIM('\0' FROM uuid)!='';
Note that Bohemian's answer is a lot neater; I just use this when I am not sure about the length of the field (which probably comes down to bad design on some level, so feel free to educate me).
select uuid from example where uuid!=cast('' as binary(N));
where N is the length of the uuid column; CAST('' AS BINARY(N)) pads the empty string with 0x00 bytes to a length of N. Not pretty, but it works.
I've been told on several occasions that it is quite efficient to SELECT using math and that it is NOT very efficient to use math in a WHERE clause. Are these sentiments correct? And how does this apply to ORDER BY clauses?
Thanks!!
Example:
SELECT a.* FROM a ORDER BY (a.field_1*a.field_2)
Your query will have to sort the entire table using temporary files on disk if the result is larger than the sort_buffer_size.
You probably want to add a column to your table that holds the value of field_1*field_2. This of course slightly denormalizes your data, BUT YOU CAN CREATE AN INDEX ON THE FIELD.
If you have an index on the new field, MySQL can read the data pre-sorted using the index, because MySQL indexes are B-tree structures and B-trees are stored in sorted order. This avoids extra disk I/O and CPU work for the sort, and you scan the table only once.
It's a good idea, but I don't think using a mathematical function in an ORDER BY clause makes any sense.
You can do this with an alias:
select *,(intId * intId)as xalias from m_xxx_list order by xalias;
OR
select * from m_xxx_list order by (intId + intId);
Yes. If you are using MySQL's mathematical aggregate functions, then test it.
For MySQL to sort results by a computed value, it actually needs to calculate the value on the fly, after it has filtered out rows based on the WHERE clause. If the result set is quite large, then MySQL will need to compute the results for all the rows.
For a small result set, this should be fine. However, the larger your result set is (before the LIMIT is applied), the more calculations the server has to do simply to figure out the value to order the rows by. If the calculation is deterministic, you should cache it in a column in the table and then index it. If it has to be computed on the fly, you'll need to ensure your CPU is up to the task.
In the case provided, I would recommend creating a column, a.field_3, and store the result of (a.field_1*a.field_2) in it. Whenever the values of a.field_1 or a.field_2 change, you'll need to recalculate the result.
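The precomputed-column approach can be sketched with SQLite (a stand-in, not MySQL; the plan wording differs). It compares the plan for sorting by the expression with the plan for sorting by an indexed field_3. The query selects only id and field_3 so that the index alone covers it; that narrowing is an assumption made to keep the planner's choice predictable. In MySQL 5.7 and later, a generated column could keep field_3 in sync automatically; here it is simply computed on insert.

```python
import sqlite3

def plan(conn, sql):
    """Return the EXPLAIN QUERY PLAN detail strings for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a (id INTEGER PRIMARY KEY, field_1 INTEGER, "
    "field_2 INTEGER, field_3 INTEGER)"
)
# Store the product next to its inputs; the application must recompute it
# whenever field_1 or field_2 changes.
conn.executemany(
    "INSERT INTO a (field_1, field_2, field_3) VALUES (?, ?, ?)",
    [(f1, f2, f1 * f2) for f1, f2 in [(3, 4), (1, 2), (5, 5)]],
)

computed = "SELECT id FROM a ORDER BY field_1 * field_2"
stored = "SELECT id, field_3 FROM a ORDER BY field_3"

before = plan(conn, stored)  # no index yet: needs a separate sort step
conn.execute("CREATE INDEX idx_a_field_3 ON a (field_3)")
after = plan(conn, stored)   # rows can now be read in index order

print(plan(conn, computed))  # the expression still needs a temporary B-tree
print(after)
print([r[0] for r in conn.execute(stored)])  # → [2, 1, 3]
```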
I am curious about the disadvantages of quoting integers in MySQL queries.
For example
SELECT col1,col2,col3 FROM table WHERE col1='3';
VS
SELECT col1,col2,col3 FROM table WHERE col1= 3;
If there is a performance cost, what is its size and why does it occur? Are there any other disadvantages besides performance?
Thanks
Andrew
Edit: The reason for this question
1. Because I am curious and want to learn the difference.
2. I am experimenting with a way of passing composite keys from my database around in my PHP code as pseudo-Id-keys (PIKs). These PIKs are then used to target the record.
For example, given a primary key (AreaCode,Category,RecordDtm)
My PIK in the url would look like this:
index.php?action=hello&Id=20001,trvl,2010:10:10 17:10:45
And I would select this record like this:
$Id = $_GET['Id']; // equals 20001,trvl,2010:10:10 17:10:45
$sql = "SELECT AreaCode,Category,RecordDtm,OtherColumns.... FROM table WHERE (AreaCode,Category,RecordDtm) = ({$Id})";
$mysqli->query($sql);
......and so on.
At this point the query won't work because of the datetime (which must be quoted), and it is open to SQL injection because I haven't escaped those values. Given that I won't always know how my PIKs are constructed, I would write a function that splits the Id PIK at the commas, cleans each part with real_escape_string and puts it back together with the values quoted. For example:
$Id = "'20001','trvl','2010:10:10 17:10:45'"
Of course, in the function that breaks apart and cleans the Id, I could check whether each value is a number. If it is a number, don't quote it; if it is anything else, quote it.
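An alternative to quoting and escaping each part by hand is a parameterized query, which sidesteps both the injection risk and the question of which values need quotes. PHP's mysqli and PDO both support prepared statements; the sketch below uses Python's sqlite3 as a stand-in, and the `records` table, its `Note` column, `KEY_COLUMNS` and `fetch_by_pik` are hypothetical. It assumes no key part ever contains a comma.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (AreaCode INTEGER, Category TEXT, RecordDtm TEXT, "
    "Note TEXT, PRIMARY KEY (AreaCode, Category, RecordDtm))"
)
conn.execute(
    "INSERT INTO records VALUES (20001, 'trvl', '2010:10:10 17:10:45', 'hello')"
)

# Hypothetical key layout; column names come from this fixed tuple only,
# never from user input, so building them into the SQL text is safe.
KEY_COLUMNS = ("AreaCode", "Category", "RecordDtm")

def fetch_by_pik(conn, pik: str):
    """Split a comma-separated pseudo-Id-key and bind each part as a parameter."""
    parts = pik.split(",")
    if len(parts) != len(KEY_COLUMNS):
        raise ValueError("unexpected number of key parts")
    where = " AND ".join(f"{col} = ?" for col in KEY_COLUMNS)
    sql = f"SELECT AreaCode, Category, RecordDtm, Note FROM records WHERE {where}"
    # The values travel separately from the SQL text: no quoting needed.
    return conn.execute(sql, parts).fetchall()

print(fetch_by_pik(conn, "20001,trvl,2010:10:10 17:10:45"))
```

The bound "20001" is a string, yet it still matches the INTEGER column because the database converts it, which is exactly the kind of implicit conversion the answers discuss.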
The performance cost comes from MySQL having to convert whatever you give it to the data type of the column. So with your query
SELECT col1,col2,col3 FROM table WHERE col1='3';
If col1 is not a string type, MySQL needs to convert '3' to that type. This type of query isn't really a big deal, as the performance overhead of that conversion is negligible.
However, consider doing the same thing when, say, joining two tables that have several million rows each. If the columns in the ON clause are not the same data type, MySQL will have to convert several million values every single time you run the query, and that is where the performance overhead comes in.
Strings also have a different sort order from numbers.
Compare:
SELECT 312 < 41
(yields 0, because 312 numerically comes after 41)
to:
SELECT '312' < '41'
(yields 1, because '312' lexicographically comes before '41')
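The same ordering difference shows up outside SQL as well. Here it is in Python, where strings compare by code point rather than by a MySQL collation, though for digits the two agree.

```python
# Numbers compare by value; strings compare character by character.
print(312 < 41)      # → False (MySQL shows this as 0)
print("312" < "41")  # → True (MySQL shows this as 1): '3' sorts before '4'
print(sorted(["312", "41", "5"]))           # → ['312', '41', '5']
print(sorted(["312", "41", "5"], key=int))  # → ['5', '41', '312']
```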
Depending on how your query is built, using quotes might give wrong results or none at all.
Numbers should be used as such, so never use quotes unless you have a special reason to do so.
I don't think there is any performance or size cost in the case you mention. Even if there is, it is negligible and won't affect your application.
It gives the wrong impression about the data type of the column. As an outsider, I would assume the column in question is CHAR/VARCHAR and choose operations accordingly.
Otherwise MySQL, like most other databases, will implicitly convert the value to whatever the column data type is. There's no performance issue with this that I'm aware of but there's a risk that supplying a value that requires explicit conversion (using CAST or CONVERT) will trigger an error.