mysql performance <, > for varchar vs integer - mysql

I need to save unknown datatypes to a field. I therefore have to use varchar instead of integer (because the data could also be a string)
id, filter1, filter2
1, male, 24
2, female, 53
In this case filter1 is gender and filter2 is age. Is there a big performance impact if I query:
SELECT * FROM tbl WHERE filter2 > 30
compared to using integer?

If you write:
where filter > 30
Then MySQL will do what you want -- but you might get strange results. If you had a column with the value '44x', then it would also be chosen by the filter. Why? Because MySQL will convert filter to a number, and '44x' converts to 44. In addition, the type conversion generally precludes the use of an index.
If you use strings:
where filter > '30'
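Then you don't have the type conversion problem, but the values are compared as strings, character by character. You will get all the strings that start with letters, as well as values such as '4', while '100' is left out.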
In other words, don't mix types like this. Values should be stored in their native types. You should revisit your data model -- you probably want a column called age somewhere (or better yet, date-of-birth).
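To make the string-comparison pitfall concrete, here is a minimal sketch. It uses Python's own string ordering as a stand-in for a simple collation; real MySQL collations may order some characters differently, and the sample values are made up for illustration.

```python
# Model of: SELECT ... WHERE filter2 > '30'  (both sides compared as strings).
# Python's str ordering stands in for a simple collation; this is an
# assumption for illustration, not MySQL's actual comparison code.
values = ["24", "53", "4", "100", "male", "44x"]

# Strings are compared character by character, so "4" sorts after "30"
# and "100" sorts before it.
selected = [v for v in values if v > "30"]

print(selected)
```

Note that "4" is selected and "100" is not, which is the opposite of what a numeric age filter should do.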

Related

Efficient way of querying string and number field from PostgreSQL's json field

We use a json field in Postgres because we have dynamic fields. Fields can be of string or number type only. The table has billions of rows, so queries are too slow. We couldn't add an index because we don't know in advance which field names will be used; queries are constructed dynamically at run time.
Table design is as follows,
id - integer
workspace_id - integer
data - json
created_at - timestamp
updated_at - timestamp
Stored data in json field as follows,
{"age": 21, "city": "London", "name": "ABC", "test_filed1": "text",...}
Example for string field:
SELECT users.*
FROM users
WHERE users.workspace_id = 1
AND data ->> 'city' = 'London'
ORDER BY users.id DESC
LIMIT 50;
Example for number field:
SELECT users.*
FROM users
WHERE users.workspace_id = 1
AND CAST(data ->> 'age' AS NUMERIC) = 21
ORDER BY users.id DESC
LIMIT 50;
The ->> operator returns the value as text. For instance, data ->> 'age' yields '21' even though the value is stored as a number. To evaluate a numeric condition such as greater-than or less-than, we have to cast the result (as in the example above), even though age is stored as a number in the json field. String comparisons also involve a cast to ::text.
Since I have stored data in appropriate format (used quotes for string and stored number as number without quotes), is there any better way to get data as stored in DB rather than type cast? So that I can do number related conditions without type cast.
Note: I have already added an index on workspace_id.
Well, the ->> operator always returns a text value - that's how it is implemented (it's not really "casting" the value).
SQL is statically typed, so there is no operator in Postgres that yields a different data type depending on which JSON value it extracts. As everything can be represented as text, this is what ->> returns.
This is the price you pay for de-normalizing your data model.
However equality comparisons can be done differently (when you cast the column to jsonb which it should be to begin with).
SELECT users.*
FROM users
WHERE users.workspace_id = 1
AND data::jsonb @> '{"city": "London"}'
ORDER BY users.id DESC
LIMIT 50;
SELECT users.*
FROM users
WHERE users.workspace_id = 1
AND data::jsonb @> '{"age": 21}'
ORDER BY users.id DESC
LIMIT 50;
The @> (containment) operator can make use of a GIN index on the JSONB value, e.g.
create index on users using gin ((data::jsonb));
Obviously this is subject to the usual index usage rules. Not every index that is created is also being used.
(But it would be better to convert the data to jsonb to avoid the casting all the time)
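As a rough illustration of why containment avoids the cast, the sketch below models a containment check for top-level keys only. The `contains` helper is hypothetical and much simpler than what Postgres implements (it ignores nesting and arrays); it only shows that the probe value is compared with its JSON type intact.

```python
import json

def contains(document: dict, probe: dict) -> bool:
    """Top-level-only model of a jsonb containment test (hypothetical helper)."""
    return all(key in document and document[key] == value
               for key, value in probe.items())

# Row content taken from the question's example document.
row = json.loads('{"age": 21, "city": "London", "name": "ABC"}')

print(contains(row, {"city": "London"}))  # string compared as a string
print(contains(row, {"age": 21}))         # number compared as a number
print(contains(row, {"age": "21"}))       # the string "21" is not the number 21
```

The last call returns False: a probe with a quoted "21" does not match a stored number, so the query must use the same JSON type that was stored.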

MySQL Query where id = id_product error

I'm working with MySQL in a Node.js web app. I don't understand why, when I ask for one id (key), it gives me more than 1 result.
When I:
SELECT * FROM products WHERE id = 1;
This happens, I get 3 results, but I only want 1:
1, 001 and 0000001.
I just want the info of one product (id: 1 in this example)
How can I fix this?
ID type is varchar(20)
If I use LIKE instead of = my result changes:
SELECT * FROM products WHERE id LIKE 0000001;
I get the info of id = 1 instead of 0000001. Don't know why.
Thanks
The WHERE clause of your query contains a comparison of a literal numeric value with a string (column id).
When it needs to compare values of different type, MySQL uses several rules to convert one or both of the values to a common type.
Some of the type conversion rules are not intuitive. The last rule is the only one that matches a comparison of an integer number with a string:
In all other cases, the arguments are compared as floating-point (real) numbers.
When they are converted to floating-point (real) numbers, 1 (integer), '1', '0001' and '0000001' are all equal.
In order to get an exact match the literal value you put in the query must have the same type as the column id (i.e string). The query should be:
SELECT * FROM products WHERE id = '1'
The problem is that you are searching a varchar column with an integer literal.
Try to add quotes to the id parameter:
SELECT * FROM products WHERE id = '1';
If you want integer ids with leading zeros, I recommend using the zerofill option:
https://dev.mysql.com/doc/refman/5.5/en/numeric-type-attributes.html
If you want to use alphanumeric values, then keep the ID type as varchar, but remember to enclose the search parameter in quotes.
Numbers in MySQL (and the real world) don't have leading zeros. Strings do.
So, you just need to make the comparison using the right type:
SELECT *
FROM products
WHERE id = '1';
What happens with your original query is that the id is converted to a number. And '1', '001' and '0000001' are all converted to the same integer -- 1. Hence, all three pass the filter.
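The difference between the two comparisons can be sketched in a few lines. Python's `float()` is used here as a stand-in for the string-to-number conversion described above, and the list of ids is made up for illustration.

```python
# Hypothetical contents of the varchar id column.
ids = ["1", "001", "0000001", "2", "10"]

# Model of: WHERE id = 1   (both sides compared as floating-point numbers)
numeric_matches = [s for s in ids if float(s) == 1.0]

# Model of: WHERE id = '1' (both sides compared as strings)
string_matches = [s for s in ids if s == "1"]

print(numeric_matches)
print(string_matches)
```

The numeric comparison selects three rows, while the string comparison selects exactly one.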

Automatic MySQL data type casting

I just happened upon an interesting case of data type casting in MySQL. Consider the following queries:
SELECT * FROM (SELECT 0 AS col) AS t WHERE t.col=123; #Yields 0 rows
SELECT * FROM (SELECT 0 AS col) AS t WHERE t.col="123"; #Yields 0 rows
SELECT * FROM (SELECT 0 AS col) AS t WHERE t.col="0"; #Yields 1 row, col=0
SELECT * FROM (SELECT 0 AS col) AS t WHERE t.col="abc"; #Yields 1 row, col=0
Lines 1, 2, and 3 seem logical to me. But on line 4, why, oh why, dear SQL, do you so eagerly cast "abc" to be equal to 0?!
I mean, I get it - "abc" isn't an integer, so 0 makes the most sense... Is there a scenario in which this behavior is actually useful? As far as I can tell, it likely just leads to bugs (as it did on our application)...
Perhaps there's a MySQL "mode" that enables warnings for automatic type-casting like this?
MySQL does implicit type casting for strings in a numeric context. The leading numeric characters of the string are converted to a number, so a string such as 'abc' gets converted to 0.
This can be very handy because this conversion does not cause an error (an explicit conversion would).
The moral is simple: When comparing constants to columns, make the constant the same type as the column. That is, don't compare strings and numbers, lest something unexpected happen.
This is definitely the way MySQL works.
When you use a comparison that compares a numeric object to a string constant, the string gets cast to a number. MySQL tries to interpret the string as a number, like this:
'0123abc' gets the value 123.
'1abc' gets the value 1.
'abc' gets the value 0.
What use is this? It comes in handy in ORDER BY clauses if you need numeric text strings ordered in numeric order with '112abc' after '12abc'.
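The conversion rule and its effect on ordering can be sketched as follows. The `to_number` helper and its regular expression are assumptions that approximate the "leading numeric characters" rule; they are not MySQL's actual parser.

```python
import re

# Assumed grammar for the numeric prefix: optional whitespace and sign,
# digits with an optional fraction, and an optional exponent.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

def to_number(text: str) -> float:
    """Approximate a string used in a numeric context: leading number or 0."""
    match = _NUMERIC_PREFIX.match(text)
    return float(match.group(0)) if match else 0.0

for sample in ["0123abc", "1abc", "abc"]:
    print(sample, to_number(sample))

labels = ["12abc", "112abc", "2abc"]
print(sorted(labels))                 # plain string order
print(sorted(labels, key=to_number))  # order by numeric prefix
```

Sorting by the numeric prefix puts '112abc' after '12abc', whereas plain string ordering puts it first.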

SQL search with REGEX instead with BETWEEN operator

I have a MySQL database with a table of ads. One field of that table stores data in json format. In that json data there is a key whose value contains the price (with decimal values).
That field (named for example ad_data), which is saved in database field, contains (json) data like this:
{"single_input_51":"Ad 44 test.","price":"20.00","single_input_4":"ad test title, ad tes title, .","single_input_11":"8.8.2015.","single_input_5":"video test","single_input_6":"https://www.youtube.com/watch?v=nlTPeCs2puw"}
I would like to search in that field by price range. If, for example, the user sets in an HTML form that he wants to search the range from 100.00 to 755.00, SQL should return only rows where that field (whose data is saved as json) contains a price from 100.00 to 755.00.
So basically, I want to write something like this with REGEX in SQL against the json contents of that field (the numbers here are just examples; I must be able to do this for any starting and ending decimal number, and I will pass the numbers programmatically):
SELECT id, price FROM ads WHERE price BETWEEN 100.00 AND 755.00
What would be SQL command for that search via REGEX?
Don't use REGEX for doing the match, that will be painful. If you had a particular range of prices you were looking for, it might be doable, but to dynamically generate the regular expression to "work" for any specified range of prices, when the price could be two, three or more characters, that's going to be hard. (The REGEXP function in MySQL only returns a boolean indicating whether a match was found or not; it won't return the portion of the string that was matched.)
If I had to do a comparison on "price", I would parse the value for price out of the string, then cast that to a numeric value, and then do a comparison on that.
For example:
SELECT t.col
FROM mytable t
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(t.col,'"price":"',-1),'"',1) + 0
BETWEEN 100.00 AND 755.00
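To show what the nested SUBSTRING_INDEX calls extract, here is a small Python emulation. The `substring_index` helper is a hypothetical re-implementation written for this sketch, and the sample row is an abbreviated version of the one in the question.

```python
from decimal import Decimal

def substring_index(text: str, delim: str, count: int) -> str:
    """Emulate SUBSTRING_INDEX(text, delim, count) for non-empty delimiters."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything right of the count-th from the end
    return ""

# Abbreviated sample row from the question.
ad_data = '{"single_input_51":"Ad 44 test.","price":"20.00","single_input_5":"video test"}'

tail = substring_index(ad_data, '"price":"', -1)  # text after the price key
price_text = substring_index(tail, '"', 1)        # text up to the closing quote
price = Decimal(price_text)                       # plays the role of "+ 0"

in_range = Decimal("100.00") <= price <= Decimal("755.00")
print(price_text, in_range)
```

The inner call keeps everything after the `"price":"` marker and the outer call cuts at the next double quote, leaving just the digits to compare numerically.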
To answer the question you asked: what expression would you use to perform this match using a REGEX...
For "price between 100.00 and 755.00", using MySQL REGEXP, the regular expression you would need would be something like the second expression in the SELECT list of this query:
SELECT t.col
, t.col REGEXP '"price":"([1-6][0-9][0-9]\\.[0-9][0-9]|7[0-4][0-9]\\.[0-9][0-9]|75[0-4]\\.[0-9][0-9]|755\\.00)"' AS _match
FROM ( SELECT 'no' AS col
UNION ALL SELECT 'no "price":"14.00"def'
UNION ALL SELECT 'no "price":"99.99" def'
UNION ALL SELECT 'ok "price":"100.00" def'
UNION ALL SELECT 'ok "price":"699.99" def'
UNION ALL SELECT 'ok "price":"703.33" def'
UNION ALL SELECT 'ok "price":"743.15" def'
UNION ALL SELECT 'ok "price":"754.99" def'
UNION ALL SELECT 'no "price":"755.01" def'
) t
The regular expression in this example is almost a trivial example, because the price values we're matching all have three digits before the decimal point.
The string used for a regular expression would need to be crafted for each possible range of values. The crafting would need to take into account prices with different number of digits before the decimal point, and handle each of those separately.
For doing a range check of price between 95.55 to 1044.44, that would need to be crafted into a regular expression to check price in these ranges:
Price range              Regular expression
95.55 thru 95.59         95\.5[5-9]
95.60 thru 95.99         95\.[6-9][0-9]
96.00 thru 99.99         9[6-9]\.[0-9][0-9]
100.00 thru 999.99       [1-9][0-9][0-9]\.[0-9][0-9]
1000.00 thru 1039.99     10[0-3][0-9]\.[0-9][0-9]
1040.00 thru 1043.99     104[0-3]\.[0-9][0-9]
1044.00 thru 1044.39     1044\.[0-3][0-9]
1044.40 thru 1044.44     1044\.4[0-4]
It could be done, but the code to generate the regular expression string won't be pretty. (And getting it fully tested won't be pretty either.)
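One way to gain confidence in a hand-crafted expression like this is to brute-force it against a numeric check. The sketch below does that for the 95.55 to 1044.44 example using Python's `re` module rather than MySQL's REGEXP, with `104[0-3]` as the pattern for the 1040.00 to 1043.99 band; the tested interval of 0.00 to 1199.99 is an arbitrary choice.

```python
import re

# One alternative per band of the 95.55 .. 1044.44 range.
BANDS = [
    r"95\.5[5-9]",                   # 95.55 .. 95.59
    r"95\.[6-9][0-9]",               # 95.60 .. 95.99
    r"9[6-9]\.[0-9][0-9]",           # 96.00 .. 99.99
    r"[1-9][0-9][0-9]\.[0-9][0-9]",  # 100.00 .. 999.99
    r"10[0-3][0-9]\.[0-9][0-9]",     # 1000.00 .. 1039.99
    r"104[0-3]\.[0-9][0-9]",         # 1040.00 .. 1043.99
    r"1044\.[0-3][0-9]",             # 1044.00 .. 1044.39
    r"1044\.4[0-4]",                 # 1044.40 .. 1044.44
]
PRICE_RE = re.compile("|".join(BANDS))

def in_range_by_regex(price_text: str) -> bool:
    return PRICE_RE.fullmatch(price_text) is not None

# Compare the regex against a plain numeric check for every price in cents.
mismatches = []
for cents in range(0, 120000):
    text = f"{cents // 100}.{cents % 100:02d}"
    expected = 9555 <= cents <= 104444
    if in_range_by_regex(text) != expected:
        mismatches.append(text)

print(len(mismatches))
```

An empty mismatch list means the eight bands cover the range exactly; a single wrong character class in any band shows up immediately as a list of offending prices.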
(@spencer7593 has a good point; here's another point)
Performance... If you have an index on that field (and the optimizer decides to use the index), then BETWEEN can be much faster than a REGEXP.
BETWEEN can use an index, thereby minimizing the number of rows to look at.
REGEXP always has to check all rows.
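The difference can be pictured with a sorted list standing in for an index. This is only an analogy, not how MySQL's B-tree indexes are implemented, and the prices are borrowed from the test rows in the previous answer.

```python
from bisect import bisect_left, bisect_right

# A sorted list plays the role of an index on a numeric price column.
prices = sorted([14.00, 99.99, 100.00, 699.99, 703.33, 743.15, 754.99, 755.01])

# Range lookup: two binary searches locate the boundaries, and only the
# slice between them is read.
lo = bisect_left(prices, 100.00)
hi = bisect_right(prices, 755.00)
by_index = prices[lo:hi]

# Pattern-style lookup: every value has to be examined.
by_scan = [p for p in prices if 100.00 <= p <= 755.00]

print(by_index)
print(by_index == by_scan)
```

Both approaches return the same rows, but the range lookup touches only the matching slice, while the scan visits every entry.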

What is the best way to select rows by comparing ENUM and SET columns

I mean, if a table contains an ENUM or SET typed column and I make a query like:
SELECT * FROM `tabname` WHERE `enum_field` = 'case1' OR `enum_field` = 'case2'
Will it be efficient?
Or will the engine convert the number stored in enum_field to a string and compare it to the pattern (in my example, case1 and case2)?
What is the most efficient way to use such columns?
Thanks!
The syntax you provided above is correct.
The database engine will convert the strings in your query into numeric indexes, which will be used when searching the table.
According to the MySQL documentation, you can also query directly by numeric index, but this can sometimes have unexpected results, particularly if any of your enum string values are numeric.
So assuming "case1" has index 1, and "case2" has index 2, you could rewrite your query like this:
SELECT * FROM `tabname` WHERE `enum_field` = 1 OR `enum_field` = 2
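The pitfall with numeric-looking enum values can be sketched with a small model. The `enum_index` helper is hypothetical and covers only two cases, a bare number treated as an index and a quoted string matched against the value list; MySQL applies further rules that are not modeled here.

```python
def enum_index(values, literal):
    """Resolve a literal to a 1-based ENUM index, or None if nothing matches."""
    if isinstance(literal, int):
        # A bare number is treated as an index into the value list.
        return literal if 1 <= literal <= len(values) else None
    # A quoted string is matched against the enumeration values.
    return values.index(literal) + 1 if literal in values else None

cases = ["case1", "case2"]
print(enum_index(cases, "case2"), enum_index(cases, 2))  # same index either way

# With numeric-looking values the two spellings diverge.
digits = ["0", "1", "2"]
print(digits[enum_index(digits, 2) - 1])    # index 2 is the value '1'
print(digits[enum_index(digits, "2") - 1])  # the string '2' is the value '2'
```

For ordinary labels the two query styles agree, but for an enum such as ('0','1','2') the unquoted 2 and the quoted '2' refer to different members, which is why comparing against the string values is the safer habit.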