MySQL - SELECT WHERE field IN (subquery) - Extremely slow why?

I've got a couple of duplicates in a database that I want to inspect. To see which values are duplicated, I ran this:
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
This gives me every relevant_field value that occurs more than once. The query takes milliseconds to execute.
Now I wanted to inspect each of the duplicates, so I thought I could SELECT every row in some_table whose relevant_field appears in the result of the query above, like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)
This turns out to be extremely slow for some reason (it takes minutes). What exactly is going on here to make it that slow? relevant_field is indexed.
Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1), and then wrote my second query like this instead:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM temp_view
)
And that works just fine. MySQL does this in a few milliseconds.
Any SQL experts here who can explain what's going on?

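The subquery is being run for each row because MySQL treats it as a correlated query: although it is not correlated as written, older versions of the optimizer (before 5.6) rewrite an IN subquery into a correlated one. You can force it to be evaluated once by selecting everything from the subquery as a derived table, like so: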
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
The final query would look like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
)
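To confirm what the optimizer is doing, it may help to run EXPLAIN on both forms. The following is a minimal sketch using the table and column names from the question; the exact output depends on the MySQL version, so treat the comments as what one would typically expect on older releases rather than a guarantee.

```sql
-- Original form: on older MySQL versions the inner SELECT is typically
-- reported with select_type = DEPENDENT SUBQUERY, i.e. re-evaluated per outer row.
EXPLAIN
SELECT *
FROM some_table
WHERE relevant_field IN
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
);

-- Wrapped form: the inner GROUP BY is typically reported with
-- select_type = DERIVED, i.e. materialized once.
EXPLAIN
SELECT *
FROM some_table
WHERE relevant_field IN
(
    SELECT * FROM
    (
        SELECT relevant_field
        FROM some_table
        GROUP BY relevant_field
        HAVING COUNT(*) > 1
    ) AS subquery
);
```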

Rewrite the query into this
SELECT st1.*, st2.relevant_field FROM some_table st1
INNER JOIN some_table st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id /* list a unique some_table field here */
HAVING COUNT(*) > 1
I think st2.relevant_field must be in the SELECT list, because otherwise the HAVING clause will give an error, but I'm not 100% sure.
Avoid IN with a subquery on older MySQL versions; it is notoriously slow there.
Prefer IN with a fixed list of values.
More tips
If you want to make queries faster:
Don't do a SELECT *; only select the fields that you really need.
Make sure you have an index on relevant_field to speed up the equi-join.
Make sure to group by on the primary key.
If you are on InnoDB and you only select indexed fields (and things are not too complex), then MySQL will resolve your query using only the indexes, speeding things way up.
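As a rough illustration of the last tip, the sketch below assumes that id is the primary key of some_table and uses a hypothetical index name, idx_relevant_field. Because an InnoDB secondary index also stores the primary key, a query that reads only id and relevant_field can usually be answered from the index alone, which EXPLAIN typically reports as "Using index" in the Extra column.

```sql
-- Hypothetical index name; skip this if relevant_field is already indexed.
CREATE INDEX idx_relevant_field ON some_table (relevant_field);

-- Only indexed columns are selected (id is assumed to be the primary key).
EXPLAIN
SELECT st1.id, st1.relevant_field
FROM some_table st1
INNER JOIN some_table st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id
HAVING COUNT(*) > 1;
```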
General solution for 90% of your IN (SELECT ...) queries
Use this code
SELECT * FROM some_table a WHERE EXISTS (
SELECT 1 FROM some_table b
WHERE a.relevant_field = b.relevant_field
GROUP BY b.relevant_field
HAVING count(*) > 1)

SELECT st1.*
FROM some_table st1
inner join
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)st2 on st2.relevant_field = st1.relevant_field;
I've tried your query on one of my databases, and also tried it rewritten as a join to a sub-query.
This worked a lot faster, try it!

I have reformatted your slow sql query with www.prettysql.net
SELECT *
FROM some_table
WHERE
relevant_field in
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
);
When using a table in both the query and the subquery, you should always alias both, like this:
SELECT *
FROM some_table as t1
WHERE
t1.relevant_field in
(
SELECT t2.relevant_field
FROM some_table as t2
GROUP BY t2.relevant_field
HAVING COUNT(t2.relevant_field) > 1
);
Does that help?

Subqueries vs joins
http://www.scribd.com/doc/2546837/New-Subquery-Optimizations-In-MySQL-6

Try this
SELECT t1.*
FROM
some_table t1,
(SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1) t2
WHERE
t1.relevant_field = t2.relevant_field;

First, you can find the duplicate rows, count how many times each value occurs, and number the rows within each group, like this:
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
CASE q.NID
WHEN @curCode THEN
@curRow := @curRow + 1
ELSE
@curRow := 1
AND @curCode := q.NID
END
) AS No
FROM UserInfo q,
(
SELECT
@curRow := 1,
@curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
After that, create a table and insert the result into it.
create table CopyTable
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
CASE q.NID
WHEN @curCode THEN
@curRow := @curRow + 1
ELSE
@curRow := 1
AND @curCode := q.NID
END
) AS No
FROM UserInfo q,
(
SELECT
@curRow := 1,
@curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
Finally, delete the duplicate rows. No starts at 0 for the first row of each group, so delete every row except the first one of each group.
delete from CopyTable where No!= 0;

Sometimes, as the data grows, MySQL WHERE ... IN queries can become pretty slow because of how the optimizer plans them. Try using STRAIGHT_JOIN to tell MySQL to join the tables in the order they are listed, e.g.
SELECT STRAIGHT_JOIN table.field FROM table WHERE table.id IN (...)
But beware: in most cases the MySQL optimizer works pretty well, so I would recommend using it only when you have this kind of problem.

This is similar to my case, where I have a table named tabel_buku_besar. What I need is:
1) Find the records in tabel_buku_besar that have account_code='101.100', companyarea='20000' and IDR as currency.
2) Get all records from tabel_buku_besar that have the same account_code as in step 1 and a transaction_number that appears in the step 1 result.
When using select ... from ... where ... transaction_number in (select transaction_number from ...), my query runs extremely slowly and sometimes causes a request timeout or makes my application stop responding.
I tried this combination and the result is not bad:
`select DATE_FORMAT(L.TANGGAL_INPUT,'%d-%m-%y') AS TANGGAL,
L.TRANSACTION_NUMBER AS VOUCHER,
L.ACCOUNT_CODE,
C.DESCRIPTION,
L.DEBET,
L.KREDIT
from (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE!='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) L
INNER JOIN (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) R ON R.TRANSACTION_NUMBER=L.TRANSACTION_NUMBER AND R.COMPANYAREA=L.COMPANYAREA
LEFT OUTER JOIN master_account C ON C.ACCOUNT_CODE=L.ACCOUNT_CODE AND C.COMPANYAREA=L.COMPANYAREA
ORDER BY L.TANGGAL_INPUT,L.TRANSACTION_NUMBER`

I find this to be the most efficient way of checking whether a value exists; the logic can easily be inverted to find values that don't exist (i.e. IS NULL):
SELECT * FROM primary_table st1
LEFT JOIN comparison_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NOT NULL
*Replace relevant_field with the name of the value that you want to check exists in your table
*Replace primaryKey with the name of the primary key column on the comparison table.

It's slow because your sub-query is executed once for every comparison between relevant_field and your IN clause's sub-query. You can avoid that like so:
SELECT *
FROM some_table T1 INNER JOIN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) T2
USING(relevant_field)
This creates a derived table (in memory unless it's too large to fit) as T2, then INNER JOINs it with T1. The JOIN happens one time, so the query is executed one time.
I find this particularly handy for optimising cases where a pivot is used to associate a bulk data table with a more specific data table and you want to produce counts of the bulk table based on a subset of the more specific one's related rows. If you can narrow down the bulk rows to <5% then the resulting sparse accesses will generally be faster than a full table scan.
For example, you have a Users table (condition), an Orders table (pivot) and a LineItems table (bulk) which references counts of Products. You want the sum of Products grouped by User in PostCode '90210'. In this case the JOIN will be orders of magnitude smaller than when using WHERE relevant_field IN( SELECT * FROM (...) T2 ), and therefore much faster, especially if that JOIN is spilling to disk!
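The Users/Orders/LineItems case could be sketched roughly as follows. The answer names only the tables; every column name here (id, post_code, user_id, order_id, quantity) is a hypothetical stand-in, so adapt them to the real schema.

```sql
-- Narrow Users down first, then join through the pivot to the bulk table.
-- All column names are assumptions made for illustration.
SELECT u.id AS user_id, SUM(li.quantity) AS total_products
FROM
(
    SELECT id
    FROM Users
    WHERE post_code = '90210'
) u
INNER JOIN Orders o ON o.user_id = u.id
INNER JOIN LineItems li ON li.order_id = o.id
GROUP BY u.id;
```

The derived table u keeps only the matching users, so the joins to Orders and LineItems touch just the rows related to that subset.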

Related

How to select matching results with other random results in mysql

I want to select all the matching results from a database table together with random results, with the matching results at the top. Currently I use two queries: the first is the matching query, and if its count is zero I then select random results. I would like to do this with just one query.
You could attempt using a UNION ALL query as follows.
select product_name,price
from marketing_table
where price >=5000 /*user supplied filter*/
and price <=10000 /*user supplied filter*/
union all
select m.product_name,m.price
from marketing_table m
where not exists (select *
from marketing_table m1
where m1.price >=5000 /*user supplied filter*/
and m1.price <=10000 /*user supplied filter*/
)
From what I understand of your comment, you may try something simple like this first:
SET @product := 'purse'; -- search term
SELECT * FROM product
ORDER BY product_name LIKE CONCAT('%',@product,'%') DESC, price ASC;
This is the simplest I can think of and it could be a starting point for you.
Here's a demo : https://www.db-fiddle.com/f/31jrR27dFJqYQQigzBqLcs/2
If this is not what you want, you have to edit your question and insert some example data with the expected output. Your current question is likely to be flagged as too broad and needs more focus/clarity.
Did you try using a UNION subquery with a LIMIT?
SELECT *
FROM (
SELECT 0 priority, t.*
FROM first_table t
UNION ALL
SELECT 1 priority, t.*
FROM second_table t
) AS combined
ORDER BY priority
LIMIT 20
If you do not want to include any second_table records when first_table returns rows, you would need a subquery in the second SELECT to confirm that no rows exist in first_table.
SELECT *
FROM (
SELECT 0 priority, t.*
FROM first_table t
UNION ALL
SELECT 1 priority, t.*
FROM second_table t
LEFT JOIN (SELECT ... FROM first_table) a
WHERE a.id IS NULL
) AS combined
ORDER BY priority
LIMIT 20
I think it would be possible to use the Common Table Expressions (CTE) feature in MySQL 8, if you are using that version.
https://dev.mysql.com/doc/refman/8.0/en/with.html
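A CTE version might look like the sketch below. It assumes MySQL 8.0 or later and reuses the marketing_table example and price filter from the first answer; the CTE name matches is made up for illustration.

```sql
-- Name the matching rows once, then reuse that result in both branches.
WITH matches AS (
    SELECT product_name, price
    FROM marketing_table
    WHERE price >= 5000   /* user supplied filter */
      AND price <= 10000  /* user supplied filter */
)
SELECT product_name, price
FROM matches
UNION ALL
-- Fallback branch: contributes rows only when nothing matched.
SELECT m.product_name, m.price
FROM marketing_table m
WHERE NOT EXISTS (SELECT 1 FROM matches);
```

This keeps the filter in one place instead of repeating it inside the NOT EXISTS subquery.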

SQL Union Query - Referencing to alias of derived table

I have a complicated aggregate-functions query that produces a result-set, and which has to be amended with a single row that contains the totals and averages of that result-set.
My idea is to assign an alias to the result-set, and then use that alias in a second query, after a UNION ALL statement.
But I can't successfully use the alias in the subsequent SELECT statement after the UNION ALL.
For the sake of simplicity, I won't post the original query here, just a simplified list of the variants I've tried:
SELECT * FROM fees AS Test1 WHERE Percentage = 15
UNION ALL
(SELECT * FROM fees AS Test2 WHERE Percentage > 15)
UNION ALL
(SELECT * FROM (SELECT * FROM fees AS Test3 WHERE Percentage < 10) AS Test4)
UNION ALL
SELECT * FROM Test3
The result is:
MySQL said:
#1146 - Table 'xxxxxx.Test3' doesn't exist
The result is the same if the last query references Test1, Test2, or Test4.
So, how should I assign an alias to a result-set/derived table in earlier queries and use that same alias in later queries, all within a UNION query?
Amendment:
My primary query is:
SELECT
COALESCE(referrers.name,order_items.ReferrerID),
SUM(order_items.quantity) as QtySold,
ROUND(SUM((order_items.quantity*order_items.price+order_items.shippingcosts)/((100+order_items.vat)/100)), 2) as TotalRevenueNetto,
ROUND(100*SUM(order_items.quantity*order_items.purchasepricenet)/SUM((order_items.quantity*order_items.price+order_items.shippingcosts)/((100+order_items.vat)/100)), 1) as PurchasePrice,
ROUND(100*SUM(order_items.quantity*COALESCE(order_items.calculatedfee,0)+order_items.quantity*COALESCE(order_items.calculatedcost,0))/SUM((order_items.quantity*order_items.price+order_items.shippingcosts)/((100+order_items.vat)/100)), 1) as Costs,
ROUND(100*SUM(order_items.calculatedprofit) / SUM( (order_items.quantity*order_items.price + order_items.shippingcosts)/((100+order_items.vat)/100) ) , 1) as Profit,
COALESCE(round(100*Returns.TotalReturns_Qty/SUM(order_items.quantity),2),0) as TotalReturns
FROM order_items LEFT JOIN (SELECT order_items.ReferrerID as ReferrerID, sum(order_items.quantity) as TotalReturns_Qty FROM order_items WHERE OrderType='returns' and OrderTimeStamp>='2017-12-1 00:00:00' GROUP BY order_items.ReferrerID) as Returns ON Returns.ReferrerID = order_items.ReferrerID LEFT JOIN `referrers` on `referrers`.`referrerId` = `order_items`.`ReferrerID`
WHERE ( ( order_items.BundleItemID in ('-1', '0') and order_items.OrderType in ('order', '') ) or ( order_items.BundleItemID is NULL and order_items.OrderType = 'returns' ) ) and order_items.OrderTimestamp >= '2017-12-1 00:00:00'
GROUP BY order_items.ReferrerID
ORDER BY referrers.name ASC
I want to make a grand-total of all the rows resulting from query above with:
SELECT 'All marketplaces', SUM(QtySold), SUM(TotalRevenueNetto), AVG(PurchasePrice), AVG(Costs), AVG(Profit), AVG(TotalReturns) FROM PrimaryQuery
I want to do this with a single query.
Your query is well-written. You may be able to get a total line by using a surrounding query with a dummy GROUP BY clause and WITH ROLLUP:
SELECT
COALESCE(Referrer, 'All marketplaces'),
SUM(QtySold) AS QtySold,
SUM(TotalRevenueNetto) AS TotalRevenueNetto,
AVG(PurchasePrice) AS PurchasePrice,
AVG(Costs) AS Costs,
AVG(Profit) AS Profit,
AVG(TotalReturns) AS TotalReturns
FROM
(
SELECT
COALESCE(referrers.name,order_items.ReferrerID) AS Referrer,
SUM(order_items.quantity) AS QtySold,
...
) PrimaryQuery
GROUP BY Referrer WITH ROLLUP;
I'm not entirely sure what you are attempting to solve, but I guess something like the following:
Hypothetical 'main' query:
SELECT T1.ID
, Sum(total_grade)/COUNT(subjects) as AverageGrade
FROM A_Table T1
JOIN AnotherTable T2
ON T2.id = T1.id
GROUP BY T1.ID
You want sub resultsets, without having to keep querying the same data.
Edit: I mistakenly thought the linked documentation and method mentioned below were for the current version of MySQL. It is, however, a draft for a future version (CTEs arrived in MySQL 8.0), and they are not supported in earlier releases.
In the absence of CTE support, I would probably just insert the resultset into a temporary table. Something like:
CREATE TABLE TEMP_TABLE(ID INT, AverageGrade DECIMAL(15, 3));
INSERT INTO TEMP_TABLE
SELECT T1.ID
, Sum(total_grade)/COUNT(subjects) as AverageGrade
FROM A_Table T1
JOIN AnotherTable T2
ON T2.id = T1.id
GROUP BY T1.ID;
SELECT ID, AverageGrade FROM TEMP_TABLE WHERE AverageGrade > 5
UNION ALL
SELECT COUNT(ID) AS TotalCount, SUM(AverageGrade) AS Total_AVGGrade FROM TEMP_TABLE;
DROP TABLE TEMP_TABLE;
(Disclaimer: I'm not too familiar with MySQL, so there may be some syntax errors here. The general idea should be clear, though.)
That is, of course, if I had to do it like this; there are probably better ways to achieve the same. See Thorsten Kettner's comments on the matter.
(Previous answer, assuming CTE is a possibility:)
A CTE approach looks like:
WITH CTE AS
(
SELECT T1.ID
, Sum(total_grade)/COUNT(subjects) as AverageGrade
FROM A_Table T1
JOIN AnotherTable T2
ON T2.id = T1.id
GROUP BY T1.ID
)
SELECT ID, AverageGrade FROM CTE WHERE AverageGrade > 5
UNION ALL
SELECT COUNT(ID) AS TotalCount, SUM(AverageGrade) AS Total_AVGGrade FROM CTE
You get the error because each query involved in the UNION doesn't know the aliases of the others.
The DB engine executes, in your case, 4 queries and then combines their results with the UNION operation.
Your real table is fees. Test3 is an alias that exists only within the third query.
If you want to process the results of the UNION operation, you must wrap your queries in a main (outer) query.
It looks like you need something like the query below. Please try:
SELECT * FROM fees AS Test2 WHERE Percentage >= 15
UNION ALL
SELECT * FROM fees AS Test3 WHERE Percentage < 10
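Wrapping the UNION in a main query, as described above, could be sketched like this. The alias combined and the two aggregates are illustrative assumptions; only the fees table and its Percentage column come from the question.

```sql
-- The UNION result becomes a derived table with its own alias,
-- which the outer (main) query can then aggregate.
SELECT COUNT(*) AS row_count, AVG(Percentage) AS avg_percentage
FROM
(
    SELECT * FROM fees WHERE Percentage >= 15
    UNION ALL
    SELECT * FROM fees WHERE Percentage < 10
) AS combined;
```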
You can't use a table alias defined inside a subquery (it is not in the scope of the other SELECTs in the union); you must repeat the code, e.g.:
SELECT * FROM fees AS Test1 WHERE Percentage = 15
UNION ALL
SELECT * FROM fees AS Test2 WHERE Percentage > 15
UNION ALL
SELECT * FROM (
SELECT * FROM fees AS Test3 WHERE Percentage < 10
) AS Test4
UNION ALL
SELECT * FROM fees AS Test3 WHERE Percentage < 10

SQL - Select multiple conditions

I would like to select multiple conditions using the query below:
SELECT (SELECT count(*)
FROM users
)
as totalusers,
(SELECT sum(cashedout)
FROM users
) AS cashedout,
(SELECT COUNT(*)
FROM xeon_users_rented
) AS totalbots,
(SELECT sum(value)
FROM xeon_stats_clicks
WHERE typ='3' OR typ='1'
) AS totalclicks
The above query takes just under a second (0.912 to be exact) to execute. This slows things down a lot with thousands of requests.
What seems logical to me is this approach:
SELECT (SELECT count(*), sum(cashedout)
FROM users
)
as totalusers, cashedout,
(SELECT COUNT(*)
FROM xeon_users_rented
) AS totalbots,
(SELECT sum(value)
FROM xeon_stats_clicks
WHERE typ='3' OR typ='1'
) AS totalclicks
However that doesn't work, as I get the following error:
#1241 - Operand should contain 1 column(s)
Furthermore, how can I join the two other tables "xeon_users_rented" and "xeon_stats_clicks" in my first query?
It's slow because you have multiple subqueries. Try using joins instead.
Also, a list of your tables and columns would help us better assist you.
Your second query uses the wrong syntax: a scalar subquery in the SELECT list may return only one column. It should be:
SELECT
count(*) as totalusers,
sum(cashedout) cashedout,
(SELECT COUNT(*) FROM xeon_users_rented) AS totalbots,
(SELECT sum(value) FROM xeon_stats_clicks
WHERE typ='3' OR typ='1') AS totalclicks
FROM users
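As for joining the other two tables, one possible sketch is to compute each aggregate in its own single-row derived table and combine them with CROSS JOIN. The aliases u, r and c are made up here; table and column names are taken from the question.

```sql
-- Each derived table returns exactly one row, so the cross join
-- also yields exactly one row with all four totals.
SELECT u.totalusers, u.cashedout, r.totalbots, c.totalclicks
FROM
    (SELECT COUNT(*) AS totalusers, SUM(cashedout) AS cashedout
     FROM users) AS u
CROSS JOIN
    (SELECT COUNT(*) AS totalbots
     FROM xeon_users_rented) AS r
CROSS JOIN
    (SELECT SUM(value) AS totalclicks
     FROM xeon_stats_clicks
     WHERE typ IN ('1', '3')) AS c;
```

Whether this is faster than the scalar subqueries depends on the data and indexes, so it is worth timing both.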

MySQL CROSS JOIN FROM syntax

I have the following query working
SELECT newTable.Score, COUNT(1) AS Total, COUNT(1) / t.count * 100 AS `Frequency`
FROM mytable newTable
CROSS JOIN (SELECT COUNT(1) AS count FROM mytable) t
GROUP BY newTable.Score
ORDER BY Frequency DESC
However, two things I don't understand from the MySQL docs:
1) I don't understand why there isn't a comma, or a join type, specified in the from clause.
Reading the MySQL docs, this seems necessary.
2) What does the 't' represent in the CROSS JOIN clause?
Any advice appreciated.
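The t is the same kind of thing as newTable: an alias. newTable is an alias for the table mytable, and t is an alias for the temporary (derived) table that the subquery builds. No comma is needed because CROSS JOIN is itself the join type, and a table name followed by a space and an alias is valid on its own.
It is easier to read when the optional AS keyword is used: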
SELECT newTable.Score, COUNT(1) AS Total, COUNT(1) / t.count * 100 AS `Frequency`
FROM mytable as newTable
CROSS JOIN (SELECT COUNT(1) AS count FROM mytable) as t
GROUP BY newTable.Score
ORDER BY Frequency DESC
An alias name replaces the original name of the table with a new one to be used in your query. And you need to give subqueries a name to refer to them in your query too.

SQLite select all records and count

I have the following table:
CREATE TABLE sometable (my_id INTEGER PRIMARY KEY AUTOINCREMENT, name STRING, number STRING);
Running this query:
SELECT * FROM sometable;
Produces the following output:
1|someone|111
2|someone|222
3|monster|333
Along with these three fields I would also like to include a count representing the number of times the same name exists in the table.
I've obviously tried:
SELECT my_id, name, count(name) FROM sometable GROUP BY name;
though that will not give me an individual result row for every record.
Ideally I would have the following output:
1|someone|111|2
2|someone|222|2
3|monster|333|1
where the 4th column represents the number of times this name exists in the table.
Thanks for any help.
You can do this with a correlated subquery in the select clause:
Select st.*,
(SELECT count(*) from sometable st2 where st.name = st2.name) as NameCount
from sometable st;
You can also write this as a join to an aggregated subquery:
select st.*, stn.NameCount
from sometable st join
(select name, count(*) as NameCount
from sometable
group by name
) stn
on st.name = stn.name;
EDIT:
As for performance, the best way to find out is to try both and time them. The correlated subquery will work best when there is an index on sometable(name). Although aggregation is reputed to be slow in MySQL, sometimes this type of query gets surprisingly good results. The best answer is to test.
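The index mentioned above could be created as in the sketch below; the index name is hypothetical. The second statement is an alternative that does not appear in the answers above: a window function, which assumes SQLite 3.25 or later (or MySQL 8.0 or later).

```sql
-- Hypothetical index name; lets the correlated subquery look up each name quickly.
CREATE INDEX idx_sometable_name ON sometable (name);

-- Alternative, assuming window function support:
-- count the rows sharing the same name without collapsing them.
SELECT st.*, COUNT(*) OVER (PARTITION BY st.name) AS NameCount
FROM sometable st;
```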
Select *, (SELECT count(my_id) from sometable) as total from sometable