MySQL Select from UNION performance issue (kills database) - mysql

I have a small problem with MySQL.
I'm trying to do a UNION of two tables, like so:
SELECT `user_id`, `post_id`, `requested_on`
FROM `a`
WHERE `status` != 'cancelled'
UNION
SELECT `user_id`, `post_id`, `time` as requested_on
FROM `b`
WHERE `type` = 'ADD'
This query executes fine: Showing rows 0 - 29 (36684 total, Query took 0.0147 sec)
but when I do
SELECT * FROM (
SELECT `user_id`, `post_id`, `requested_on`
FROM `a`
WHERE `status` != 'cancelled'
UNION
SELECT `user_id`, `post_id`, `time` as requested_on
FROM `b`
WHERE `type` = 'ADD'
) tbl1
MySQL dies.
The reason I want to do this is so that I can GROUP BY user_id, post_id.
Any ideas why this happens / any workarounds?
Later edit:
This is the SQL Fiddle:
http://sqlfiddle.com/#!2/c7f82d/2
The final query is there, which executes in:
Record Count: 10; Execution Time: 574ms
574 ms for 10 records is, from my point of view, enormous.

I found what the problem was.
I was running the queries in phpMyAdmin. A plain SELECT ... UNION SELECT ... was fine, but when I did
SELECT * FROM (SELECT ... UNION SELECT ...)
phpMyAdmin's pagination failed, and it tried to output a table of over 30k rows to my browser; that is why the SQL request appeared to hang. :(
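One way I could have sanity-checked the wrapped query in phpMyAdmin without triggering that (just a sketch, same tables as above): add an explicit LIMIT so only one page of rows is sent to the browser:
SELECT * FROM (
    SELECT `user_id`, `post_id`, `requested_on`
    FROM `a`
    WHERE `status` != 'cancelled'
    UNION
    SELECT `user_id`, `post_id`, `time` AS requested_on
    FROM `b`
    WHERE `type` = 'ADD'
) tbl1
LIMIT 30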

It is not clear what the following query:
SELECT * FROM (
SELECT user_id, post_id, requested_on
FROM a
WHERE status != 'cancelled'
UNION
SELECT user_id, post_id, time as requested_on
FROM b
WHERE type = 'ADD'
) tbl1 GROUP BY user_id, post_id
means. Assume you have:
A, x, t1
A, x, t2
Would you like the row with t1 or the one with t2? If that does not matter, let's apply an aggregate function such as MIN:
SELECT user_id, post_id, MIN(requested_on) FROM (
SELECT user_id, post_id, requested_on
FROM a
WHERE status <> 'cancelled'
UNION
SELECT user_id, post_id, time as requested_on
FROM b
WHERE type = 'ADD'
) tbl1
GROUP BY user_id, post_id
MySQL usually doesn't handle derived tables like this very well. Is there any other predicate you can apply to the individual parts of the UNION?
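Another thing worth trying (a sketch only, assuming the earliest requested_on per user_id, post_id pair is what is wanted): aggregate each branch separately so the derived table is already small, then re-aggregate over the UNION ALL of the two pre-aggregated results:
SELECT user_id, post_id, MIN(requested_on) AS requested_on
FROM (
    SELECT user_id, post_id, MIN(requested_on) AS requested_on
    FROM a
    WHERE status != 'cancelled'
    GROUP BY user_id, post_id
    UNION ALL
    SELECT user_id, post_id, MIN(time) AS requested_on
    FROM b
    WHERE type = 'ADD'
    GROUP BY user_id, post_id
) tbl1
GROUP BY user_id, post_id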

Query with aggregate, subquery and group by not working

Can you help me, please? I have spent about two hours trying to understand what is wrong, but I still don't.
SQLSTATE[42S21]: Column already exists: 1060 Duplicate column name
'id'
select count(*) as aggregate
from (
select `cities`.*,
`cities`.`id` as `id`,
`cities`.`country_id` as `country_id`,
`cities`.`name` as `name`,
`cities`.`alias` as `alias`,
`cities`.`active_frontend` as `active_frontend`
from `cities`
where (
cities.alias in (
select `alias`
from `cities`
group by `alias`
having COUNT(`alias`) > 1
)
)
) count_row_table
Don't ask me what the hell is going on here, please. The biggest part of this query is generated by Laravel.
If I delete this part:
where
(cities.alias IN (SELECT alias FROM cities GROUP BY alias HAVING
COUNT(alias) > 1))
It will work. But I need that part.
The issue is with cities.*: it already includes the id column, so aliasing `cities`.`id` as `id` again produces a duplicate column name inside the derived table.
But you can simplify your query to:
select sum(cnt) as cnt
from (
select COUNT(alias) as cnt
from cities
group by alias
having COUNT(alias) > 1
) t
and avoid re-reading your table, because in the end all you need is the total number of rows whose alias occurs more than once.
You don't need to materialize a subquery for this. You can do:
select count(*)
from cities c
where exists (select 1 from cities c2 where c2.alias = c.alias and c2.id <> c.id);
With an index on cities(alias, id), this should have better performance.
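If that index does not already exist, it could be created like this (the index name is just an example):
CREATE INDEX idx_cities_alias_id ON cities (alias, id);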

mysql group by not returning correct corresponding rows

The following query returns the correct latest_time but the corresponding fields are not the correct ones. How do I get matching fields for the given MAX value?
select `id`,`value1`,`value2`, MAX(`timestamp`) as `latest_time`
from `table`
group by `value1`
Non-aggregated columns that appear in the SELECT have an indeterminate value. You need a self-join to get the required values:
select t1.`id`, t1.`value1`, t1.`value2`, t1.`timestamp`
from `table` as t1
join (
select `value1`, MAX(`timestamp`) as `latest_time`
from `table`
group by `value1`
) as t2 on t1.`value1` = t2.`value1` and t1.`timestamp` = t2.`latest_time`
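On MySQL 8.0 or later, a window function is an alternative (a sketch using the same columns; if two rows share the latest timestamp for a value1, ROW_NUMBER picks one of them arbitrarily):
SELECT `id`, `value1`, `value2`, `timestamp` AS `latest_time`
FROM (
    SELECT `id`, `value1`, `value2`, `timestamp`,
           ROW_NUMBER() OVER (PARTITION BY `value1` ORDER BY `timestamp` DESC) AS rn
    FROM `table`
) AS ranked
WHERE rn = 1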

How can I combine four queries into one query?

Structure of my tables:
posts (id, name, user_id, about, time)
comments (id, post_id, user_id, text, time)
users_votes (id, user, post_id, time)
users_favs ( id, user_id, post_id, time)
How can I combine these four queries (not with UNION):
SELECT `id`, `name`, `user_id`, `time` FROM `posts` WHERE `user_id` = 1
SELECT `post_id`, `user_id`, `text`, `time` FROM `comments` WHERE `user_id` = 1
SELECT `user`, `post_id`, `time` FROM `users_votes` WHERE `user` = 1
SELECT `user_id`, `post_id`, `time` FROM `users_favs` WHERE `user_id` = 1
Should I use JOINs?
What would the SQL query for this be?
You don't want to join these together.
The kind of JOIN you'd use to retrieve this would end up doing a cross-product of all the rows it finds. This means that if you had 4 posts, 2 comments, 3 votes, and 6 favorites, you'd get 4*2*3*6 = 144 rows in your results instead of the 4+2+3+6 = 15 rows you get from separate queries.
The only time you'd want to JOIN is when the two things are intrinsically related. That is, you want to retrieve the posts associated with a favorite, a vote, or a comment.
Based on your example, there's no such commonality in these things.
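To make that concrete, a join of roughly this shape (a sketch over the tables above, joining everything on the shared user) multiplies the row counts instead of adding them:
SELECT p.`id`, c.`text`, v.`time` AS vote_time, f.`time` AS fav_time
FROM `posts` p
JOIN `comments` c ON c.`user_id` = p.`user_id`
JOIN `users_votes` v ON v.`user` = p.`user_id`
JOIN `users_favs` f ON f.`user_id` = p.`user_id`
WHERE p.`user_id` = 1
-- with 4 posts, 2 comments, 3 votes and 6 favorites this returns 144 rows
Run the four queries separately instead.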

MySQL select with min

I need to select the fields message and username from table list where list_id = 1 (it can be 2 or 5, etc.) for the row with the minimal number value (MIN(number)). How can I do it?
I tried it:
SELECT `message`,`username` FROM `list` WHERE `list_id`=2 AND min(`number`)
But it does not work.
Try this:
SELECT `message`,`username`
FROM `list`
WHERE `list_id` = 2
ORDER BY `number` ASC
LIMIT 1
Alternatively, join to a subquery that finds the minimum value:
SELECT l.`message`, l.`username`
FROM `list` l
INNER JOIN (
    SELECT MIN(`number`) AS `min_number`
    FROM `list`
    WHERE `list_id` = 2
) AS m ON m.`min_number` = l.`number`
WHERE l.`list_id` = 2

MySQL - SELECT WHERE field IN (subquery) - Extremely slow why?

I've got a couple of duplicates in a database that I want to inspect, so what I did to see which are duplicates, I did this:
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
This way, I will get all rows with relevant_field occurring more than once. This query takes milliseconds to execute.
Now, I wanted to inspect each of the duplicates, so I thought I could SELECT every row in some_table whose relevant_field is in the result of the above query, like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)
This turns out to be extremely slow for some reason (it takes minutes). What exactly is going on here that makes it so slow? relevant_field is indexed.
Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1), and then making my second query like this instead:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM temp_view
)
And that works just fine. MySQL does this in some milliseconds.
Any SQL experts here who can explain what's going on?
On older MySQL versions the optimizer rewrites the IN subquery as a dependent (effectively correlated) subquery, so it gets re-evaluated for every row of the outer table. You can force it to be materialized just once by wrapping it in another derived table, selecting everything from the subquery, like so:
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
The final query would look like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
)
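If you want to verify which plan you are getting, prefix either version with EXPLAIN; on the affected MySQL versions the slow form typically shows the inner query as a DEPENDENT SUBQUERY, while the wrapped form shows a materialized DERIVED table (the exact output depends on your MySQL version):
EXPLAIN
SELECT *
FROM some_table
WHERE relevant_field IN
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
);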
Rewrite the query like this:
SELECT st1.*, st2.relevant_field FROM sometable st1
INNER JOIN sometable st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id /* list a unique sometable field here*/
HAVING COUNT(*) > 1
I think st2.relevant_field must be in the select, because otherwise the having clause will give an error, but I'm not 100% sure
Avoid IN with a subquery on older MySQL versions; it is notoriously slow there. IN with a fixed list of values is fine.
More tips:
If you want to make queries faster, don't SELECT *; select only the fields you really need.
Make sure you have an index on relevant_field to speed up the equi-join.
Make sure to group by the primary key.
If you are on InnoDB and you only select indexed fields (and things are not too complex), then MySQL will resolve your query using only the indexes, speeding things up considerably (a sketch of this follows below).
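For example (the index name here is only an illustration), with an index on relevant_field the duplicate-finding part can be answered from the index alone:
ALTER TABLE some_table ADD INDEX idx_relevant_field (relevant_field);
SELECT relevant_field, COUNT(*) AS cnt
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1;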
A general solution for 90% of your IN (SELECT ...) queries:
Use this code:
SELECT * FROM sometable a WHERE EXISTS (
SELECT 1 FROM sometable b
WHERE a.relevant_field = b.relevant_field
GROUP BY b.relevant_field
HAVING count(*) > 1)
SELECT st1.*
FROM some_table st1
inner join
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)st2 on st2.relevant_field = st1.relevant_field;
I've tried your query on one of my databases, and also tried it rewritten as a join to a sub-query.
This worked a lot faster, try it!
I have reformatted your slow sql query with www.prettysql.net
SELECT *
FROM some_table
WHERE
relevant_field in
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
);
When using a table in both the query and the subquery, you should always alias both, like this:
SELECT *
FROM some_table as t1
WHERE
t1.relevant_field in
(
SELECT t2.relevant_field
FROM some_table as t2
GROUP BY t2.relevant_field
HAVING COUNT(t2.relevant_field) > 1
);
Does that help?
Subqueries vs joins
http://www.scribd.com/doc/2546837/New-Subquery-Optimizations-In-MySQL-6
Try this
SELECT t1.*
FROM
some_table t1,
(SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1) t2
WHERE
t1.relevant_field = t2.relevant_field;
First, you can find the duplicate rows, count how many times each NID is used, and number the rows within each group, like this:
SELECT q.id, q.name, q.password, q.NID, (select count(*) from UserInfo k where k.NID = q.NID) as Count,
(
CASE q.NID
WHEN @curCode THEN
@curRow := @curRow + 1
ELSE
@curRow := 1
AND @curCode := q.NID
END
) AS No
FROM UserInfo q,
(
SELECT
@curRow := 1,
@curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
After that, create a table and insert the result into it:
create table CopyTable
SELECT q.id, q.name, q.password, q.NID, (select count(*) from UserInfo k where k.NID = q.NID) as Count,
(
CASE q.NID
WHEN @curCode THEN
@curRow := @curRow + 1
ELSE
@curRow := 1
AND @curCode := q.NID
END
) AS No
FROM UserInfo q,
(
SELECT
@curRow := 1,
@curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
Finally, delete the duplicate rows. No starts at 0; keep the first row of each group and delete all the other duplicates:
delete from CopyTable where No!= 0;
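If the goal is simply to keep one row per NID, a more direct alternative (just a sketch, assuming id is the primary key and that you want to keep the row with the lowest id per NID) is a multi-table DELETE with a self-join on the original table:
DELETE u1
FROM UserInfo u1
JOIN UserInfo u2
  ON u2.NID = u1.NID
 AND u2.id < u1.id;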
Sometimes, when the data grows bigger, MySQL's WHERE ... IN can be pretty slow because of query optimization. Try using STRAIGHT_JOIN to tell MySQL to execute the query as written, e.g.
SELECT STRAIGHT_JOIN table.field FROM table WHERE table.id IN (...)
But beware: in most cases the MySQL optimizer works pretty well, so I would recommend using this only when you run into this kind of problem.
This is similar to my case, where I have a table named tabel_buku_besar. What I need is:
1. to find the records in tabel_buku_besar that have account_code = '101.100', companyarea = '20000', and IDR as the currency;
2. to get all records from tabel_buku_besar that have the same account_code as in step 1 but whose transaction_number appears in the step 1 result.
When using select ... from ... where ... transaction_number in (select transaction_number from ...), my query ran extremely slowly and sometimes caused a request timeout or made my application unresponsive...
I tried this combination, and the result is... not bad:
select DATE_FORMAT(L.TANGGAL_INPUT,'%d-%m-%y') AS TANGGAL,
L.TRANSACTION_NUMBER AS VOUCHER,
L.ACCOUNT_CODE,
C.DESCRIPTION,
L.DEBET,
L.KREDIT
from (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE!='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) L
INNER JOIN (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) R ON R.TRANSACTION_NUMBER=L.TRANSACTION_NUMBER AND R.COMPANYAREA=L.COMPANYAREA
LEFT OUTER JOIN master_account C ON C.ACCOUNT_CODE=L.ACCOUNT_CODE AND C.COMPANYAREA=L.COMPANYAREA
ORDER BY L.TANGGAL_INPUT,L.TRANSACTION_NUMBER
I find this to be the most efficient way to check whether a value exists; the logic can easily be inverted to find rows where the value does not exist (i.e. IS NULL):
SELECT * FROM primary_table st1
LEFT JOIN comparision_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NOT NULL
* Replace relevant_field with the name of the column whose value you want to check exists in your table.
* Replace primaryKey with the name of the primary key column on the comparison table.
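The inverted form mentioned above (rows in primary_table that have no match in comparision_table) would be:
SELECT st1.*
FROM primary_table st1
LEFT JOIN comparision_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NULL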
It's slow because the subquery in the IN clause is re-executed for every row of the outer table it is compared against. You can avoid that like so:
SELECT *
FROM some_table T1 INNER JOIN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) T2
USING(relevant_field)
This creates a derived table (in memory unless it's too large to fit) as T2, then INNER JOIN's it with T1. The JOIN happens one time, so the query is executed one time.
I find this particularly handy for optimising cases where a pivot is used to associate a bulk data table with a more specific data table and you want to produce counts of the bulk table based on a subset of the more specific one's related rows. If you can narrow down the bulk rows to <5% then the resulting sparse accesses will generally be faster than a full table scan.
I.e. you have a Users table (condition), an Orders table (pivot) and a LineItems table (bulk) that references Products. You want the sum of products grouped by User for PostCode '90210'. In this case the JOIN will be orders of magnitude smaller than when using WHERE relevant_field IN (SELECT * FROM (...) T2), and therefore much faster, especially if that JOIN spills to disk!
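A sketch of that shape (all table and column names here are invented for illustration; quantity stands in for the per-line product count):
SELECT u.id AS user_id, SUM(li.quantity) AS product_count
FROM Users u
INNER JOIN Orders o ON o.user_id = u.id
INNER JOIN LineItems li ON li.order_id = o.id
WHERE u.PostCode = '90210'
GROUP BY u.id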