Alternative to the IN operator in MySQL

I have a query that uses the IN operator, but it is very slow in the production environment.
This is my query:
SELECT col1 FROM table_name WHERE col2 IN (251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,
270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,1,2,3,4,5,6,7,8,9,10,11,12,13,14) AND col3 > '1' AND col4 = 1 GROUP BY col1;
I tried FIND_IN_SET, but it is still slow. I created a composite index with the columns in the same order as they appear in the WHERE clause. EXPLAIN reports a query cost of 151630.44 and filtered 100, and the table contains millions of records.
I am hoping to get the query cost as low as possible, ideally 1 or 2, with filtered 100.

IN with a long list tends to be slower than a range comparison. Since your values appear to be largely sequential, try something like:
SELECT col1
FROM table_name
WHERE ((col2 >= 1 AND col2 <= 14) OR (col2 >= 251 AND col2 <= 298))
AND col3 > '1' AND col4 = 1
GROUP BY col1;
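To check that the range rewrite really selects the same rows as the IN list, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server; the sample data is made up, and the GROUP BY is left out so the two row sets can be compared directly. It says nothing about MySQL's cost estimates, only about equivalence of the predicates.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; table and column names mirror the
# question, the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (col1 INTEGER, col2 INTEGER, col3 TEXT, col4 INTEGER)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [(i % 7, i, "2", 1) for i in range(1, 401)],
)

# The ids from the question: 251..298 followed by 1..14.
ids = list(range(251, 299)) + list(range(1, 15))
placeholders = ",".join("?" * len(ids))

in_rows = conn.execute(
    f"SELECT col1, col2 FROM table_name WHERE col2 IN ({placeholders}) "
    "AND col3 > '1' AND col4 = 1 ORDER BY col2",
    ids,
).fetchall()

range_rows = conn.execute(
    "SELECT col1, col2 FROM table_name "
    "WHERE (col2 BETWEEN 1 AND 14 OR col2 BETWEEN 251 AND 298) "
    "AND col3 > '1' AND col4 = 1 ORDER BY col2"
).fetchall()

print(len(in_rows), in_rows == range_rows)
```

Both queries return the same 62 rows here. Whether the range form is actually cheaper on your MySQL server is something only EXPLAIN on the real table can tell you.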

Related

How to build a sequence of values by combining two columns?

I am looking to get a sequence of values by combining two columns that are linked using some random ids:
Table (col2 and col3 are linked)
col1 col2 col3
aa a144 d653
bb z567 a144
cc d653 h999
dd y678 z567
The two columns (col2 and col3) form a chain.
The result I am looking for is a sequence from start to end:
sequence
y678
z567
a144
d653
h999
Explanation:
The sequence starts at row 4 (dd,y678,z567), followed by row 2 (bb,z567,a144) and so on.
The col3 id of a row refers to the col2 id of the next row, which determines the next element.
What you're looking for is a recursive query.
Assuming your table is called data, you do it like this:
WITH RECURSIVE query(id) AS (
SELECT col2
FROM data
WHERE col1 = 'dd' -- Select the initial row here
UNION ALL
SELECT data.col3
FROM data
INNER JOIN query on query.id = data.col2
)
SELECT *
FROM query;
Tested snippet available here: https://onecompiler.com/mysql/3xvj2a47v.
This syntax works in MySQL 8.0 and later. If your version is older, the first thing I would recommend is upgrading, if possible. If that is not possible, see this answer for a workaround for MySQL 5: https://stackoverflow.com/a/33737203/2979473.
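For anyone who wants to try the recursive approach without a MySQL 8 server, here is a small sketch using Python's built-in sqlite3 module, which also supports WITH RECURSIVE. Two things are my own assumptions, not part of the answer above: the start of the chain is found automatically as the col2 value that never appears in col3 (so it only works with a single start), and a depth column is added so the output order is explicit.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL 8; the data is the sample table
# from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("aa", "a144", "d653"),
        ("bb", "z567", "a144"),
        ("cc", "d653", "h999"),
        ("dd", "y678", "z567"),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE chain(id, depth) AS (
        -- Anchor: the col2 value that never appears in col3 starts the chain.
        SELECT col2, 0
        FROM data
        WHERE col2 NOT IN (SELECT col3 FROM data)
        UNION ALL
        -- Step: follow col2 -> col3 from the previously found element.
        SELECT data.col3, chain.depth + 1
        FROM data
        INNER JOIN chain ON chain.id = data.col2
    )
    SELECT id FROM chain ORDER BY depth
    """
).fetchall()

sequence = [r[0] for r in rows]
print(sequence)
```

With the sample data this prints ['y678', 'z567', 'a144', 'd653', 'h999'].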
You are going to have to use a cursor:
https://www.mysqltutorial.org/mysql-cursor/
The first step is to select the value from col2 that doesn't exist in col3.
Then insert the value from col3 of the row whose col2 matches the current variable.
Return the result set when the value in col3 is not found in col2.
This will only work if there is one start value, one end value, and one distinct path through the chain.
It will also be slow, because this is not how relational databases are designed to work.
I think this query will work for you.
SELECT DISTINCT SEQ
FROM
(
SELECT COL2 SEQ FROM TABLE1
UNION
SELECT COL3 SEQ FROM TABLE1
) AS T ORDER BY 1;

Checking multiple columns for one value with greater than or equal (>=)

Let's say I have a table like this:
id,col1,col2,col3,col4
I want to check whether any of col1, col2, col3, col4 is greater than or equal to 10.
The idea was something like:
SELECT * FROM table WHERE (col1 >= 10 OR col2 >= 10 OR col3 >= 10 OR col4 >= 10);
Is there a more optimized way?
I thought I could use IN, but I have no idea how to use >= with it.
Assuming none of the values are NULL, you can use greatest():
SELECT *
FROM table
WHERE GREATEST(col1, col2, col3, col4) >= 10;
This is no more efficient, but it is shorter.
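The "assuming none of the values are NULL" part matters. Here is a rough sketch of why, using Python's built-in sqlite3 module. SQLite has no GREATEST(), so its multi-argument max() stands in for it; like MySQL's GREATEST(), it returns NULL as soon as any argument is NULL. The table name t and the rows are made up.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 0, 0, 0), (2, 1, 2, 3, 4), (3, None, 20, 0, 0)],
)

# The OR version: a NULL in one column does not hide a match in another.
or_ids = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM t "
        "WHERE col1 >= 10 OR col2 >= 10 OR col3 >= 10 OR col4 >= 10 ORDER BY id"
    )
]

# Stand-in for GREATEST(): max() with several arguments is NULL if any
# argument is NULL, so row 3 is silently dropped.
greatest_ids = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM t WHERE max(col1, col2, col3, col4) >= 10 ORDER BY id"
    )
]

print(or_ids, greatest_ids)
```

This prints [1, 3] [1]: row 3 has col2 = 20 but a NULL in col1, so only the OR version finds it.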
That is how I might write the script.
You could try the below, but performance is likely to degrade (I'm not sure, as I don't have MySQL set up).
Test Table Setup
CREATE TABLE tbl_test (
col1 INT
,col2 INT
,col3 INT
);
INSERT INTO tbl_test (col1,col2,col3)
VALUES (10,15,20)
,(1,2,3)
,(4,5,6)
,(10,0,0)
,(20,0,0);
Comparing using GREATEST() (Less verbose, but performance cost likely)
SELECT *,GREATEST(col1,col2,col3)
FROM tbl_test
WHERE GREATEST(col1,col2,col3) >= 10;
Performance Optimized Version (More verbose, but improved performance)
For performance reasons, I'd generally recommend separate queries combined with UNION instead of multiple OR conditions. This allows the database engine to use the best index for each query (should one exist).
SELECT * FROM tbl_test WHERE col1 >=10
UNION
SELECT * FROM tbl_test WHERE col2 >=10
UNION
SELECT * FROM tbl_test WHERE col3 >=10;

MySQL order by limit

I am trying to understand the following snippet of a complicated MySQL procedure, where var1, var2, var3 and var4 are declared variables:
select col1, col2, col3, col4
from table
order by col1 limit 5, 1 into var1, var2, var3, var4;
In particular, I don't know what the above is inserting into var1, var2, var3 and var4.
More generally, I don't understand what the following does in MySQL:
select...limit ..., 1 into
Any help would be great!
Let's dissect the query.
select col1, col2, col3, col4 from table;
This gives you all the rows from the table, but only with 4 columns.
select col1, col2, col3, col4 from table order by col1;
Now you are ordering the results in ascending order of col1.
select col1, col2, col3, col4 from table order by col1 limit 5, 1;
Here 5 is the offset and 1 is the actual limit, which will reduce your result set to a size of 1.
What does offset mean?
Suppose you get 10 results from the second query. Then this third query will give you results starting from the 6th row and limited by the limit passed, which in this case is 1. Hence, you will only get one row having four values.
What does into do?
It simply stores the four column values of that one row in those four variables.
So LIMIT can take two parameters: LIMIT offset, count
offset is the offset of the first row you want to return (for example, the offset of row 1 is 0)
count is the maximum number of rows that you want to return
If you pass just one parameter, it is treated as the count, so LIMIT 5 returns the first 5 rows of the result.
So your query selects the 4 values of col1, col2, col3 and col4 from row 6 and stores them in var1, var2, var3, var4.
(I think... I haven't used anything like that in a while... so I stand to be corrected)
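To see the offset behaviour for yourself, here is a small sketch with Python's built-in sqlite3 module, which accepts the same LIMIT offset, count form as MySQL. SQLite has no SELECT ... INTO, so unpacking the fetched row into Python variables stands in for it; the table name t and its rows are made up.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; ten invented rows, inserted in
# reverse order so that ORDER BY actually has to sort them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(n, f"b{n}", f"c{n}", f"d{n}") for n in range(10, 0, -1)],
)

# Skip the first 5 rows of the ordered result, then return at most 1 row.
row = conn.execute(
    "SELECT col1, col2, col3, col4 FROM t ORDER BY col1 LIMIT 5, 1"
).fetchone()

# Stand-in for "INTO var1, var2, var3, var4".
var1, var2, var3, var4 = row
print(var1, var2, var3, var4)
```

This prints 6 b6 c6 d6, that is, the sixth row in col1 order.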

Query with GROUP BY and ORDER BY not working when multiple columns in SELECT are chosen

I'm updating an old website and one of the queries isn't working anymore:
SELECT * FROM tbl WHERE col1 IS NULL GROUP BY col2 ORDER BY col2
I noticed if I dropped the GROUP BY it works, but the result set doesn't match the original:
SELECT * FROM tbl WHERE col1 IS NULL ORDER BY col2
So I tried reading up on GROUP BY in the docs to see what might be the issue, and it seemed to suggest not using * to select all the fields, but explicitly using the column name so I tried it with just the column that was being ordered and grouped:
SELECT col2 FROM tbl WHERE col1 IS NULL GROUP BY col2 ORDER BY col2
That works. After looking through the code, the query only needs 2 columns, so whoever added * was overdoing it. But adding the second column produces an error, and adding a third column produces the same error:
SELECT col2, col3 FROM tbl WHERE col1 IS NULL GROUP BY col2 ORDER BY col2
SELECT col1, col2, col3 FROM tbl WHERE col1 IS NULL GROUP BY col2 ORDER BY col2
Can anyone tell me why this last query doesn't work? I can't decipher why from the docs, but this is the minimum query required to get the result set I need.
Running the query in Adminer I get this error
Error in query (1055): Expression #2 of SELECT list is not in GROUP BY
clause and contains nonaggregated column 'name.table.column'
which is not functionally dependent on columns in GROUP BY clause; this is
incompatible with sql_mode=only_full_group_by
You need to be careful when you use GROUP BY. Once you understand what GROUP BY does, you will see the issue yourself. It aggregates your data; in other words, it reduces the raw rows to a smaller number of rows by applying an aggregate function (SUM, COUNT, AVG, etc.) to them.
The fields you provide in the GROUP BY clause represent the level of aggregation/roll-up you are going for.
SELECT col2, col3 FROM tbl WHERE col1 IS NULL GROUP BY col1 ORDER BY col1
Here you are aggregating at the col1 level: for every distinct value in col1, some operation is applied to the other columns in the SELECT clause (here col2 and col3), so the output has one row per distinct col1 value with rolled-up values of col2 and col3, depending on the function you apply (SUM, COUNT, AVG, etc.).
How do you apply this function? That is what is missing in your query above. To fix it, apply an aggregate function to the fields that are in the SELECT clause but not in the GROUP BY clause. Taking SUM as an example, try this:
SELECT SUM(col2), SUM(col3) FROM tbl WHERE col1 IS NULL GROUP BY col1 ORDER BY col1
Or, for a better idea, remove the WHERE filter and check the output by running:
SELECT col1, SUM(col2), SUM(col3) FROM tbl GROUP BY col1 ORDER BY col1
Additionally, your other query
SELECT col2 FROM tbl WHERE col1 IS NULL GROUP BY col2 ORDER BY col2
worked because you do not need to aggregate a field (here col2) that is in the GROUP BY clause.
First of all, when query() returns false, you should find out what the error was. You seem to be using PDO, so I will direct you to this page: http://php.net/manual/en/pdo.error-handling.php
TL;DR - you should enable PDO exceptions, or else you need to write code to check the result of every call to query(), prepare(), and execute() to see if an error occurred. And if so, use errorInfo() to find out the actual error. Doing anything else is flying blind!
Error in query (1055): Expression #2 of SELECT list is not in GROUP BY
clause and contains nonaggregated column 'webvictoria.cats_oct.matchLink'
which is not functionally dependent on columns in GROUP BY clause; this is
incompatible with sql_mode=only_full_group_by
This is a common issue. See dozens of questions tagged mysql-error-1055.
I guess you just upgraded to MySQL 5.7, which enables this strict mode (ONLY_FULL_GROUP_BY) by default. Before MySQL 5.7, it was optional and not enabled by default.
See: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
You can't write ambiguous queries. If you GROUP BY col2, which value from the rows of each group should be used for col1 and col3? It's ambiguous.
Without strict mode, MySQL chooses an arbitrary row from the group. With strict mode, it reverts to standard SQL behavior, and disallows the ambiguous query. This is how most other brands of SQL database behave, by the way.
To fix it, you must follow this rule: Every column in your select list must be one of:
A column in your GROUP BY clause
A column functionally dependent on the columns in your GROUP BY clause (so there can only be one value)
Used in an aggregate function like MIN(), MAX(), COUNT(), SUM(), AVG(), or GROUP_CONCAT()
Some people choose to disable strict mode in MySQL 5.7 for the sake of "getting the code working again." But it isn't working—it's just giving ambiguous results like it did before MySQL 5.7.
It's better to fix the logic of your queries.
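Here is a minimal sketch of two ways to satisfy that rule, run against Python's built-in sqlite3 module with made-up rows. SQLite does not enforce ONLY_FULL_GROUP_BY, so it cannot reproduce error 1055; the point is only to show what each rewritten query returns. Which rewrite is right depends on what the original code expected, which I can't know.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(None, "a", 3), (None, "a", 1), (None, "b", 7), ("x", "b", 0)],
)

# Option 1: aggregate the column that is not in GROUP BY
# (one row per col2, with a well-defined col3).
aggregated = conn.execute(
    "SELECT col2, MIN(col3) FROM tbl WHERE col1 IS NULL "
    "GROUP BY col2 ORDER BY col2"
).fetchall()

# Option 2: group by every selected column
# (one row per distinct col2/col3 pair).
grouped = conn.execute(
    "SELECT col2, col3 FROM tbl WHERE col1 IS NULL "
    "GROUP BY col2, col3 ORDER BY col2, col3"
).fetchall()

print(aggregated)
print(grouped)
```

Option 1 yields [('a', 1), ('b', 7)] and option 2 yields [('a', 1), ('a', 3), ('b', 7)], so the two fixes are not interchangeable.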
This query:
SELECT *
FROM tbl
WHERE col1 IS NULL
GROUP BY col1
ORDER BY col1;
never really worked. It may have seemed to work, but you were just lucky. You have unaggregated columns in the SELECT. These come from an arbitrary row.
You can do something like this to get values from other columns:
SELECT col1, min(col2), min(col3)
FROM tbl t
WHERE col1 IS NULL
GROUP BY col1
ORDER BY col1;
The reason it didn't work is that you need to use one of the selected columns in the GROUP BY and the ORDER BY. So if you wanted to group by col1, you would need to do this:
SELECT col1, col2, col3
FROM tbl
WHERE
col1 IS NULL
GROUP BY col1
ORDER BY col1
;
Without selecting that field, you are basically saying "Hey, go get me every phone number in California." Then, after you get that, you say "Now order them by first name and group them by last name," and the DBMS says "but... I don't have any of that."
Try this:
SELECT col2, col3 FROM tbl WHERE col1 IS NULL GROUP BY col2, col3 ORDER BY col2, col3

MySQL use of index in CASE WHEN syntax

I'm writing a query that updates with the following syntax:
UPDATE foo SET col1 = CASE col2
WHEN 1 THEN 3
WHEN 2 THEN 9
...
ELSE col1 END
WHERE col2 IN (1,2...)
Note that there can be thousands of WHEN THEN cases. EXPLAIN shows that the PK will be used for the IN clause, but how does the database evaluate the CASE/WHEN after it filters on the IN clause: does it scan all of them or use a hash? I don't think this would be explicit in an EXPLAIN (for example, without the IN clause).
Instead of thousands of WHEN ... THEN branches, create another table in your database (let's call it keyValueTable) with one column for the WHEN (key) and another for the THEN (value):
id colkey value
1 1 3
2 2 9
Make colkey unique (a unique constraint also creates an index), then update foo by joining it to the lookup table on col2:
UPDATE foo
INNER JOIN keyValueTable ON keyValueTable.colkey = foo.col2
SET foo.col1 = keyValueTable.value
WHERE ...
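The lookup-table idea can be tried out with Python's built-in sqlite3 module. This is only a sketch under a few assumptions: foo has col2 as its primary key (as the question's EXPLAIN suggests), the rows are made up, and a correlated subquery is used because it is portable; on MySQL you may prefer a multi-table UPDATE with a JOIN.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (col1 INTEGER, col2 INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE keyValueTable (colkey INTEGER PRIMARY KEY, value INTEGER)"
)
conn.executemany("INSERT INTO foo VALUES (?, ?)", [(0, 1), (0, 2), (0, 3)])
# One row per WHEN/THEN pair: WHEN 1 THEN 3, WHEN 2 THEN 9.
conn.executemany("INSERT INTO keyValueTable VALUES (?, ?)", [(1, 3), (2, 9)])

# Each foo row looks up its new col1 by col2; rows without a matching key
# are excluded by the WHERE clause and keep their old value (the ELSE case).
conn.execute(
    """
    UPDATE foo
    SET col1 = (SELECT value FROM keyValueTable WHERE colkey = foo.col2)
    WHERE col2 IN (SELECT colkey FROM keyValueTable)
    """
)

result = conn.execute("SELECT col2, col1 FROM foo ORDER BY col2").fetchall()
print(result)
```

This prints [(1, 3), (2, 9), (3, 0)]: the row with col2 = 3 has no mapping and is left alone.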
The CASE expression will be tediously walked through for each row not filtered by the WHERE. There is no optimization, since the WHEN values could be arbitrary expressions, not simple constants like in your case.
This might be faster, assuming you have an index on col2:
UPDATE foo SET col1 = 3 WHERE col2 = 1;
UPDATE foo SET col1 = 9 WHERE col2 = 2;
...