Exclude records with empty binary column data - MySQL

I have a column of type binary(16) not null in a MySQL table. Not all records have data in this column, but because it is set up to disallow NULL, such records have an all-zero value.
I want to exclude these all-zero records from a query.
So far, the only solution I can find is to HEX the value and compare that:
SELECT uuid
FROM example
WHERE HEX(uuid) != '00000000000000000000000000000000'
which works, but is there a better way?

To compare a binary column with a literal, write \0 for each zero byte.
In your case, with a binary(16), this is how to exclude only the all-zero rows:
where uuid != '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'
The advantage of a plain comparison with a literal (like this) is that an index on the column can be used, which makes it a lot faster. If you invoke functions to make the comparison, indexes cannot be used and the function(s) must be evaluated for every row, which can cause big performance problems on large tables.
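As a runnable cross-check of this idea, here is a small sketch using Python's sqlite3 module, with a BLOB column standing in for MySQL's BINARY(16); the table and index names are made up for the demo:

```python
import sqlite3

# BLOB stands in for MySQL's BINARY(16); all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (uuid BLOB NOT NULL)")
conn.execute("CREATE INDEX idx_uuid ON example (uuid)")

zero = bytes(16)            # the all-zero placeholder value
real = bytes(range(16))     # a row with actual data
conn.executemany("INSERT INTO example (uuid) VALUES (?)",
                 [(zero,), (real,), (zero,)])

# Plain comparison against the all-zero literal: index-friendly,
# unlike wrapping the column in HEX() for every row.
rows = conn.execute("SELECT uuid FROM example WHERE uuid != ?",
                    (zero,)).fetchall()
print(len(rows))  # 1
```

Only the row with real data survives the filter; the two all-zero rows are excluded.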

SELECT uuid FROM example WHERE TRIM('\0' FROM uuid)!='';
Note that Bohemian's answer is a lot neater; I just use this when I am not sure about the length of the field (which probably comes down to bad design on some level; feel free to educate me).

select uuid from example where uuid!=cast('' as binary(N));
where N is the length of the UUID column. Not pretty, but it works.

Related

MySQL- INDEX(): How to Create a Functional Key Part Using Last nth Characters?

How would I write the INDEX() statement to use the last N characters of a functional key part? I'm brand new to SQL/MySQL and believe that's the proper verbiage for my question; an explanation of what I'm looking for is below.
The MySQL 8.0 Reference Manual explains how to index the first N characters of a column, showing a secondary index that uses col2's first 10 characters, via this example:
CREATE TABLE t1 (
col1 VARCHAR(40),
col2 VARCHAR(30),
INDEX (col1, col2(10))
);
However, I would like to know how one could form this using the ending characters? Perhaps something like:
...
INDEX ((RIGHT(col2, 3)))
);
However, I think that says to index something else entirely, instead of "put an index on each column value using the last 3 of 30 potential characters", which is what I'm really trying to figure out.
For some context: it'd be helpful to be able to index smooshed/mixed data, and I'm playing around with how such a thing could be accomplished. An example of the kind of data I'm talking about is below; it's a simplified, adjusted version of data exported from an inventory/billing manager that hails from the '90s, which I had to endure some years back:
Col1                                      | Col2
GP6500012_SALES_FY2023_SBucks_503_Thurs   | R-DK_Sumat__SKU-503-20230174
GP6500012_SALES_FY2023_SBucks_607_Mon     | R-MD_Columb__SKU-607-2023035
GP6500012_SALES_FY2023_SBucks_627_Mon-pm  | R-BLD_House__SKU-503-20230024
GP6500012_SALES_FY2023_SBucks_929_Wed     | R-FR_Ethp__SKU-929-20230324
Undoubtedly, better options exist that bypass this question altogether, and I'll presumably learn those techniques in time in my data-analytics coursework. For now, I'm just curious whether it's possible to index the rows by suffix instead of prefix, and what a code example to accomplish that would look like. TIA.
Proposed solution (INDEX ((RIGHT (col2,3)))):
Not available.
Case 1:
When you need to split apart a column to search it, you have probably designed the schema wrong; in particular, that part of the column needs to be in its own column. That said, it is possible to create a 'virtual' (or 'generated') column that is a function of the original column, then INDEX that.
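A runnable sketch of that generated-column idea, using Python's sqlite3 module as a stand-in (SQLite 3.31+ syntax; in MySQL the DDL would instead be along the lines of a VARCHAR(3) column defined AS (RIGHT(col2, 3)) plus an index on it; all names here are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A generated column derived from the last 3 characters of col2,
# plus an ordinary index on that generated column.
conn.execute("""
    CREATE TABLE t1 (
        col2 TEXT,
        col2_suffix TEXT GENERATED ALWAYS AS (substr(col2, -3)) VIRTUAL
    )
""")
conn.execute("CREATE INDEX idx_suffix ON t1 (col2_suffix)")
conn.execute("INSERT INTO t1 (col2) VALUES ('R-DK_Sumat__SKU-503-20230174')")

suffix = conn.execute("SELECT col2_suffix FROM t1").fetchone()[0]
print(suffix)  # 174
```

Queries filtering on the suffix can then hit idx_suffix instead of scanning col2.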
Case 2:
If you are suggesting that the last 3 characters are the most selective and that indexing them might speed up lookups, don't bother. Simply index the entire column.
That data:
I would consider splitting up the parts that are concatenated together with _. Do it as you INSERT the rows. If the value needs to be put back together, do so in subsequent SELECTs.
DATEs:
Do not, on the other hand, split up dates (into year, month, etc). Keep them together. (That's another discussion.) Always go to the effort to convert dates (and datetimes) to the MySQL format (year-first) when storing. That way, you can properly use indexes and use the many date functions.
MySQL's Prefix indexing:
In general it is a "bad idea" to use the INDEX(col(10)) construct. It is rarely of any benefit, and it often fails to use the index as much as you would expect. This case is especially deceptive: UNIQUE(col(10)) declares that the first 10 characters are unique, not the entire column!
CAST:
If the data is the wrong datatype (string vs. int, wrong collation, etc.), then I argue that it is a bad schema design. This is a common problem with EAV (Entity-Attribute-Value) schemas. When a number is stored as a string, CAST is needed to sort (ORDER BY) it correctly.
Functional indexes:
Your proposed solution is not a "prefix"; it is something more complicated. I suspect any expression, even one on non-string columns, will work. Functional indexes became available in MySQL 8.0.13 (GA, 2018-10-22):
MySQL now supports creation of functional index key parts that index
expression values rather than column values. Functional key parts
enable indexing of values that cannot be indexed otherwise, such as
JSON values. For details, see CREATE INDEX Syntax.

Fast search solution for numeric type of large mysql table?

I have large mysql database (5 million rows) and data is phone number.
I have tried many solutions, but it is still slow. Currently I store phone numbers in an INT column and search them with a LIKE query.
Ex: SELECT phonenumber FROM tbl_phone WHERE phonenumber LIKE '%4567'
for searching phone numbers such as 1704567, 2494567, ... (matching the 4567 suffix).
I need a solution which make it run faster. Help me, please!
You are storing numbers as INT but querying them as CHAR (the LIKE operator implicitly converts INTs to CHARs), and that is surely not optimal. If you'd like to keep the numbers as INT (probably the best idea in terms of I/O performance), you'd better change your queries to use numeric comparisons:
-- instead of CHAR operators
WHERE phone_number LIKE '%4567'
WHERE phone_number LIKE '1234%'
-- use NUMERIC operators
WHERE phone_number % 10000 = 4567
WHERE phone_number >= 12340000 -- considering 8 digit numbers
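The arithmetic behind those numeric predicates, sketched in Python:

```python
# "Ends with 4567": the remainder modulo 10**4 isolates the last 4 digits.
phone = 1704567
print(phone % 10000)  # 4567

# "Starts with 1234", for 8-digit numbers: a range check on the value,
# which (unlike the modulo) an index on the column can satisfy.
phone2 = 12345678
print(12340000 <= phone2 <= 12349999)  # True
```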
Besides choosing a consistent way to store and query the data, you should keep in mind to create the appropriate index: CREATE INDEX IDX0 ON table (phone_number);
Unfortunately, even then your query might not be optimal, for reasons similar to what @ron commented about: a predicate such as phone_number % 10000 = 4567 still cannot use the index. In that case you might have to restructure the table to break this column into more manageable columns (like national_code, area_code and phone_number). That would allow an efficient indexed query by area code, for example.
Check the advice here How to speed up SELECT .. LIKE queries in MySQL on multiple columns?
Hope it helps!
I would experiment with using REGEXP rather than LIKE, as in the following example:
SELECT `field` FROM `tbl_phone` WHERE `field` REGEXP '[0-9]';
Other than that, indeed, create an index if the part of the phone number you search for has a constant length.
See also the MySQL pattern matching documentation.
That LIKE predicate operates on a string, so you've got an implicit conversion from INT to VARCHAR happening. And that means an index on the INT column isn't going to help, even for a LIKE predicate that has leading characters. (The predicate is not sargable.)
If you are searching for the last digits of the phone number, the only way (that I know of) to make such a search fast is to add another VARCHAR column that stores the reversed phone number, i.e. with the order of the digits backwards.
Create an index on that VARCHAR column, and then to find phone number that end with '4567':
WHERE reverse_phone LIKE '7654%'
-or-
WHERE reverse_phone LIKE CONCAT(REVERSE('4567'),'%')
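A runnable sketch of the reversed-column trick, using Python's sqlite3 module (names taken from the question; the point is that the suffix search becomes a prefix search, which an index on reverse_phone can satisfy):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_phone (phonenumber INTEGER, reverse_phone TEXT)")
conn.execute("CREATE INDEX idx_rev ON tbl_phone (reverse_phone)")

# Store each number alongside its digit-reversed form.
for n in (1704567, 2494567, 5551234):
    conn.execute("INSERT INTO tbl_phone VALUES (?, ?)", (n, str(n)[::-1]))

# "Ends with 4567" becomes "reversed number starts with 7654".
target = "4567"
rows = conn.execute(
    "SELECT phonenumber FROM tbl_phone WHERE reverse_phone LIKE ?",
    (target[::-1] + "%",),
).fetchall()
print(sorted(r[0] for r in rows))  # [1704567, 2494567]
```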

Which one will be faster in MySQL with BINARY or without Binary?

Please explain which one will be faster in MySQL for the following query:
SELECT * FROM `userstatus` where BINARY Name = 'Raja'
[OR]
SELECT * FROM `userstatus` where Name = 'raja'
The DB entry for the Name field is 'Raja'.
I have 10,000 records in my DB. I tried EXPLAIN on both queries, but it reports the same for each.
Your question does not quite make sense as posed.
The collation of a column determines the layout of its index and whether comparisons will be case-sensitive or not.
If you cast a column, the cast takes time, so logically the uncast operation should be faster.
However, if the cast causes the query to find fewer rows, the cast version may end up faster, or the other way round. That changes the whole problem and makes the comparison invalid.
A cast to BINARY makes the comparison case-sensitive, changing the nature of the test and very probably the number of hits.
My advice
Never worry about the speed of collations; the percentages are so small that it is never worth bothering about.
The speed penalty from using SELECT * (a big no-no) will far outweigh any collation issues.
Start by putting in an index. That's a factor-10,000 speedup with a million rows.
Assuming that the Name field is a simple latin-1 text type and there's no index on it, the BINARY version of the query will be faster. By default, MySQL does case-insensitive comparisons, which means the field value and the value you're comparing against are both folded into a single case (either all-upper or all-lower) and then compared. A binary comparison skips the case conversion and does a raw 1:1 numeric comparison of each character value, making it a case-sensitive comparison.
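The difference between the two comparison styles can be illustrated in plain Python (case folding standing in for a case-insensitive collation, byte-wise equality standing in for BINARY):

```python
name_in_db = 'Raja'

# Case-insensitive comparison: both sides folded to one case first.
print(name_in_db.lower() == 'raja'.lower())  # True

# BINARY-style comparison: raw character-by-character equality.
print(name_in_db == 'raja')                  # False
```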
Of course, that's just one very specific scenario, and it's unlikely to be met in your case. Too many other factors affect this, especially the presence of an index.

Disadvantages of quoting integers in a Mysql query?

I am curious about the disadvantages of quoting integers in MySQL queries.
For example
SELECT col1,col2,col3 FROM table WHERE col1='3';
VS
SELECT col1,col2,col3 FROM table WHERE col1= 3;
If there is a performance cost, what is the size of it, and why does it occur? Are there any disadvantages other than performance?
Thanks
Andrew
Edit: The reason for this question
1. Because I want to learn the difference, out of curiosity.
2. I am experimenting with a way of passing composite keys from my database around in my PHP code as pseudo-ID keys (PIKs). These PIKs are then used to target the record.
For example, given a primary key (AreaCode,Category,RecordDtm)
My PIK in the url would look like this:
index.php?action=hello&Id=20001,trvl,2010:10:10 17:10:45
And I would select this record like this:
$Id = $_POST['Id'];//equals 20001,trvl,2010:10:10 17:10:45
$sql = "SELECT AreaCode,Category,RecordDtm,OtherColumns.... FROM table WHERE (AreaCode,Category,RecordDtm) = ({$Id})";
$mysqli->query($sql);
......and so on.
At this point the query won't work because of the datetime (which must be quoted), and it is open to SQL injection because I haven't escaped those values. Given that I won't always know how my PIKs are constructed, I would write a function that splits the Id PIK at the commas, cleans each part with real_escape_string, and puts it back together with the values quoted. For example:
$Id = "'20001','trvl','2010:10:10 17:10:45'"
Of course, in the function that breaks apart and cleans the Id, I could check whether each value is a number. If it is a number, don't quote it; if it is anything but a number, quote it.
The performance cost arises whenever MySQL needs to do a type conversion from whatever you give it to the datatype of the column. So with your query
SELECT col1,col2,col3 FROM table WHERE col1='3';
if col1 is not a string type, MySQL needs to convert '3' to that type. For a query like this it isn't really a big deal, as the performance overhead of that conversion is negligible.
However, consider doing the same thing when, say, joining two tables that have several million rows each. If the columns in the ON clause are not the same datatype, MySQL will have to convert several million rows every single time you run the query, and that is where the performance overhead comes in.
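The single-row case can be checked with Python's sqlite3 module (SQLite's column affinity plays the role of MySQL's implicit coercion here; the table name is made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER)")
conn.execute("INSERT INTO t VALUES (3)")

# Both predicates match: the quoted '3' is implicitly converted
# to the column's integer type before comparison.
quoted   = conn.execute("SELECT count(*) FROM t WHERE col1 = '3'").fetchone()[0]
unquoted = conn.execute("SELECT count(*) FROM t WHERE col1 = 3").fetchone()[0]
print(quoted, unquoted)  # 1 1
```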
Strings also have a different sort order from numbers.
Compare:
SELECT 312 < 41
(yields 0, because 312 numerically comes after 41)
to:
SELECT '312' < '41'
(yields 1, because '312' lexicographically comes before '41')
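The same two comparisons, reproduced in Python:

```python
# Numeric order: 312 comes after 41.
print(312 < 41)       # False

# Lexicographic order: '3' sorts before '4', so '312' < '41'.
print('312' < '41')   # True
```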
Depending on how your query is built, using quotes might give wrong results, or none at all.
Numbers should be used as numbers, so never quote them unless you have a special reason to do so.
In my opinion, there is no performance or size cost in the case you have mentioned. Even if there is one, it is negligible and won't noticeably affect your application.
It gives the wrong impression about the column's data type: an outsider reading the query would assume the column is CHAR/VARCHAR and choose operations accordingly.
Otherwise MySQL, like most other databases, will implicitly convert the value to the column's data type. There's no performance issue with this that I'm aware of, but there is a risk that supplying a value that requires explicit conversion (using CAST or CONVERT) will trigger an error.

Enum or Bool in mysql?

Simple silly question. What is better?
A Bool or an Enum('y','n') ?
BOOLEAN is an alias for TINYINT(1) and is stored as one byte of data.
ENUM('y','n') is also stored as 1 byte of data.
So from a storage size point of view, neither is better.
However, you can store 9 in a BOOLEAN field and it will be accepted. So if you want to enforce exactly two states, go for ENUM.
Here's the problem with storing boolean values as an enum:
SELECT count(*) FROM people WHERE is_active = true; #=> Returns 0 because true != 'true'
Which is misleading because:
SELECT count(*) FROM people WHERE is_active = 'true'; #=> Returns 10
If you're writing all of your own SQL queries, you would know not to pass a boolean expression into your query; but if you're using an ORM, you're going to run into trouble, since an ORM will typically convert the expression to something the database it's querying can understand ('t'/'f' for SQLite; 0/1 for MySQL, etc.).
In short, while one may not be faster than the other at the byte level, booleans should be stored as boolean values so they can be compared with boolean expressions.
At least, that's how I see it.
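The pitfall described above can be reproduced with Python's sqlite3 module (a TEXT column holding the string 'true', queried with a real boolean, which the driver binds as 1; the table name is made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (is_active TEXT)")
conn.execute("INSERT INTO people VALUES ('true')")

# Querying with a real boolean finds nothing: 'true' != 1.
as_bool = conn.execute("SELECT count(*) FROM people WHERE is_active = ?",
                       (True,)).fetchone()[0]
# Querying with the string literal finds the row.
as_string = conn.execute("SELECT count(*) FROM people WHERE is_active = 'true'"
                         ).fetchone()[0]
print(as_bool, as_string)  # 0 1
```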
TINYINT(1) - it looks like a Boolean, so make it one.
Never compare internally to things like y when a Boolean (0/1) is available.
Neither is ideal for storing a single bit (or boolean). The ENUM has a lookup table and stores the answer as an integer. BOOLEAN is actually just an alias for TINYINT(1), which is technically 8 bits of information. The BIT data type stores only as many bits as its definition specifies (much like the VARCHAR type), so a BIT(1) will literally store only one bit. However, if you have only one such field, the question is moot: nothing will fill the remaining bits, so they will be unused space in each row (storage per row is rounded up to at least a whole byte).
A lot of the default advice is to use BOOL/TINYINT(1), but as stated in the answer at https://stackoverflow.com/a/4180982/2045006, this allows values other than 0 and 1 to be stored and treated as TRUE.
In many cases this does not matter, but if your column will be part of a unique index then this will become quite a problem.
In the case that you will use the column in a unique index, I would recommend using BIT(1).
ENUM would also work well with a unique index (provided you have a suitable SQL mode set). However, I would use ENUM only when you want to work with string representations of true/false rather than actual boolean values.
Depending on the language you're using to interface with the database, you can run into case sensitivity issues by using enum, for example if your database uses a lowercase 'y' but your code expects an uppercase 'Y'. A bool/tinyint will always be 0 or 1 (or NULL) so it avoids this problem.
There are 8 commonly cited reasons for not using the ENUM data type; so, instead of ENUM, use either a boolean or a reference (foreign-key) table.