MySQL: optimize a union query by using a join query instead

I have 3 tables: one for users, one for their incoming payments, and one for their outgoing payments. I want to display all incoming and outgoing payments in a single result set. I can do this with multiple selects and a union, but it seems cumbersome, and I suspect it's slow due to the subqueries, and the tables are extremely large (though I am using indexes). Is there a faster way to achieve this? Maybe using a full outer join?
Here is a simplified version of the schema with some example data:
create table users (
id int auto_increment,
name varchar(20),
primary key (id)
) engine=InnoDB;
insert into users (name) values ('bob'),('fred');
create table user_incoming_payments (
user_id int,
funds_in int
) engine=InnoDB;
insert into user_incoming_payments
values (1,100),(1,101),(1,102),(1,103),
(2,200),(2,201),(2,202),(2,203);
create table user_outgoing_payments (
user_id int,
funds_out int
) engine=InnoDB;
insert into user_outgoing_payments
values (1,100),(1,101),(2,200),(2,201);
And here is the ugly looking query which generates the result I want for user bob:
select * from (
(select u.name, i.funds_in, 0 as 'funds_out' from users u
inner join user_incoming_payments i on u.id = i.user_id)
union
(select u.name, 0 as 'funds_in', o.funds_out from users u
inner join user_outgoing_payments o on u.id = o.user_id)
) a where a.name = 'bob'
order by a.funds_in asc, a.funds_out asc;
And here is as close as I can get to doing the same thing with joins. It's not correct, though, because I want this result set to look the same as the previous one, and I wasn't sure how to use a full outer join:
select *
from users u
right join user_incoming_payments i on u.id = i.user_id
right join user_outgoing_payments o on u.id = o.user_id
where u.name = 'bob';

MySQL doesn't support FULL OUTER JOIN. Even if it did support it, I don't think you would want that, as it would introduce a semi-cartesian product... with each row from incoming_ matching every row in outgoing_, creating extra rows.
If there were four rows from incoming_ and six rows from outgoing_, the set produced by a join operation would contain 24 rows.
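To make the arithmetic concrete, here is a minimal sketch of the row counts. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL and uses made-up data with four incoming and six outgoing rows for one user; the table names come from the question, everything else is illustrative.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption made so the sketch is runnable).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table user_incoming_payments (user_id int, funds_in int);
    create table user_outgoing_payments (user_id int, funds_out int);
""")
# Hypothetical data: one user with 4 incoming and 6 outgoing payments.
conn.executemany("insert into user_incoming_payments values (1, ?)",
                 [(100 + n,) for n in range(4)])
conn.executemany("insert into user_outgoing_payments values (1, ?)",
                 [(200 + n,) for n in range(6)])

# Joining the two payment tables pairs every incoming row with every outgoing row.
joined_rows = conn.execute("""
    select count(*)
    from user_incoming_payments i
    join user_outgoing_payments o on o.user_id = i.user_id
""").fetchone()[0]

# Concatenating the two sets keeps one row per payment.
concatenated_rows = conn.execute("""
    select count(*) from (
        select user_id, funds_in, 0 as funds_out from user_incoming_payments
        union all
        select user_id, 0 as funds_in, funds_out from user_outgoing_payments
    ) t
""").fetchone()[0]

print(joined_rows, concatenated_rows)
```

The join count is the product of the two row counts, while the concatenation is their sum, which is the shape the question asks for.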
This really looks more like you want a set concatenation operation. That is, you have two separate sets that you want to concatenate together. That's not a JOIN operation. That's a UNION ALL set operation.
SELECT ... FROM ...
UNION ALL
SELECT ... FROM ...
If you don't need to remove duplicates, use the UNION ALL set operator, which does not check for and remove duplicate rows. That looks like the right choice in this scenario: if there are multiple rows in incoming_ with the same value of funds_in, I don't think you want to drop any of them.
The UNION operator does remove duplicate rows, which (again) I don't think you want.
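A small sketch of the difference, again assuming SQLite through Python's sqlite3 as a stand-in for MySQL. The data is hypothetical: the same user receives a payment of 100 twice.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption); UNION / UNION ALL behave the same way.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table user_incoming_payments (user_id int, funds_in int);
    create table user_outgoing_payments (user_id int, funds_out int);
    -- Hypothetical data: user 1 receives 100 twice.
    insert into user_incoming_payments values (1, 100), (1, 100), (1, 101);
    insert into user_outgoing_payments values (1, 50);
""")

template = """
    select user_id, funds_in, 0 as funds_out from user_incoming_payments
    {op}
    select user_id, 0 as funds_in, funds_out from user_outgoing_payments
    order by funds_in, funds_out
"""
union_rows = conn.execute(template.format(op="union")).fetchall()
union_all_rows = conn.execute(template.format(op="union all")).fetchall()

print(len(union_rows), len(union_all_rows))
```

With UNION the two identical incoming payments collapse into a single row; with UNION ALL both survive.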
The derived table isn't necessary.
And MySQL doesn't "push" the predicate from the outer query into the inline view. That means MySQL is going to materialize a derived table with all incoming and outgoing payments for all users, and then the outer query is going to look through that to find the rows. And until the most recent versions of MySQL, there were no indexes created on derived tables.
See the answer from Strawberry for an example of a more efficient query.
With the small example set, indexes aren't going to make any difference. With a large set, however, you are going to want to add appropriate covering indexes.
Also, with queries like this, I tend to include a discriminator column that tells me which query returned a row.
(
SELECT 'i' AS src
, ...
FROM ...
)
UNION ALL
(
SELECT 'o' AS src
, ...
FROM ...
)
ORDER BY ...

With this model, I'd probably write that query as follows, but I doubt it makes much difference...
select u.name
, i.funds_in
, 0 funds_out
from users u
join user_incoming_payments i
on u.id = i.user_id
where u.name = 'bob'
union all
select u.name
, 0 funds_in
, o.funds_out
from users u
join user_outgoing_payments o
on u.id = o.user_id
where u.name = 'bob'
order
by funds_in asc
, funds_out asc;
However, note that there's no PK here, which may prove problematic.
If it were me, I'd have one table for transactions, which would include a transaction_id PK, a timestamp for each transaction, and a column to record whether a value was a credit or a debit.

Related

MySQL: Optimizing Sub-queries

I have this query I need to optimize further since it requires too much cpu time and I can't seem to find any other way to write it more efficiently. Is there another way to write this without altering the tables?
SELECT category, b.fruit_name, u.name
, r.count_vote, r.text_c
FROM Fruits b, Customers u
, Categories c
, (SELECT * FROM
(SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r
WHERE b.fruit_id = r.fruit_id
AND u.customer_id = r.customer_id
AND category = "Fruits";
This is your query re-written with explicit joins:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN
(
SELECT * FROM
(
SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r on r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
CROSS JOIN Categories c
WHERE c.category = 'Fruits';
(I am guessing here that the category column belongs to the categories table.)
There are some parts that look suspicious:
Why do you cross join the Categories table, when you don't even display a column of the table?
What is ORDER BY fruit_id, count_vote DESC, r_id supposed to do? Sub query results are considered unordered sets, so an ORDER BY is superfluous and can be ignored by the DBMS. What do you want to achieve here?
SELECT * FROM [ reviews ] GROUP BY fruit_id is invalid. If you group by fruit_id, which count_vote and which r.text_c do you expect to get for the ID? You don't tell the DBMS (that would be something like MAX(count_vote) and MIN(r.text_c), for instance). MySQL should throw an error, but with ONLY_FULL_GROUP_BY disabled it silently replaces count_vote, r.text_c by ANY_VALUE(count_vote), ANY_VALUE(r.text_c) instead. This means you get arbitrarily picked values for a fruit.
The answer hence to your question is: Don't try to speed it up, but fix it instead. (Maybe you want to place a new request showing the query and explaining what it is supposed to do, so people can help you with that.)
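If the intent was "the review with the most votes per fruit, lowest r_id on ties" (a guess based on the ORDER BY in the question, not something the question states), one deterministic way to write it is sketched below. It assumes SQLite via Python's sqlite3 as a stand-in for MySQL, and the Reviews columns and sample rows are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption); the NOT EXISTS pattern is portable SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data, modelled on the column names in the question.
    create table Reviews (r_id int primary key, fruit_id int, customer_id int,
                          count_vote int, text_c text);
    insert into Reviews values
        (1, 10, 1, 5, 'ok'),
        (2, 10, 2, 9, 'great'),
        (3, 10, 3, 9, 'also great'),
        (4, 20, 1, 2, 'meh');
""")

# Keep a review only if no other review of the same fruit ranks above it:
# more votes wins, and on equal votes the lower r_id wins.
top_reviews = conn.execute("""
    select r.fruit_id, r.r_id, r.count_vote, r.text_c
    from Reviews r
    where not exists (
        select 1
        from Reviews better
        where better.fruit_id = r.fruit_id
          and (better.count_vote > r.count_vote
               or (better.count_vote = r.count_vote and better.r_id < r.r_id))
    )
    order by r.fruit_id
""").fetchall()

print(top_reviews)
```

Unlike GROUP BY without aggregates, this returns exactly one well-defined row per fruit, so the result does not depend on how the DBMS happens to scan the table.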
Your Categories table does not seem to be joined/related to the others, so this produces a Cartesian product between all the rows.
If you want distinct results, don't use GROUP BY but DISTINCT, so you can avoid an unnecessary subquery,
and you don't need an ORDER BY on a subquery:
SELECT category
, b.fruit_name
, u.name
, r.count_vote
, r.text_c
FROM Fruits b
INNER JOIN (
SELECT DISTINCT fruit_id, count_vote, text_c, customer_id
FROM Reviews
) r ON b.fruit_id = r.fruit_id
INNER JOIN Customers u ON u.customer_id = r.customer_id
INNER JOIN Categories c ON ?????? /* Your Categories table seems not joined/related to the others */
WHERE category = 'Fruits';
For better readability you should use explicit join syntax and avoid the old join syntax based on comma-separated table names and WHERE conditions.
The next time you want help optimizing a query, please include the table/index structure, an indication of the cardinality of the indexes and the EXPLAIN plan for the query.
There appears to be absolutely no reason for a single sub-query here, let alone 2. Using sub-queries mostly prevents the DBMS optimizer from doing its job. So your biggest win will come from eliminating these sub-queries.
The CROSS JOIN creates a deliberate Cartesian join. It's also unclear whether any attributes from this table are actually required for the result, whether it is there to produce multiples of the same row in the output, or whether it is just an error.
The attribute category in the last line of your query is not attributed to any of the tables (but I suspect it comes from the categories table).
Further, your code uses a GROUP BY clause with no aggregation function. This will produce non-deterministic results and is a bug. Assuming that you are not exploiting a side-effect of that, the query can be re-written as:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN Reviews r
ON r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
ORDER BY r.fruit_id, count_vote DESC, r_id;
Since there are no predicates other than joins in your query, there is no scope for further optimization beyond ensuring there are indexes on the join predicates.
As is all too often the case, the biggest benefit may come from simply asking why you need to retrieve every single row in the tables in a single query.

MySql Query Timed Out in Live but not in Local

We are using MySQL InnoDB.
We have a query that looks like this.
In our live environment, this query took more than 30 seconds to complete.
select count(*) as aggregate
from `parents`
where exists (
SELECT *
from `childs`
where `parents`.`id` = `childs`.`parent_id`
and exists (
SELECT *
from `users`
where `childs`.`user_id` = `users`.`id`
and `id` = '123456' )
and `status` = 'OK' )
So we exported the whole database and imported it into our local MySQL database.
Surprisingly, the same query returned its result almost instantly there.
So we suspected the tables were not optimized, and we did the following.
optimize table users;
optimize table parents;
optimize table childs;
Unfortunately the query speed didn't improve.
Can anyone see what could be going wrong?
And why does the exported/imported local copy (with exactly the same structure and data) run the query almost instantly, while the live one takes 30-60 seconds to complete?
EXPLAIN on local and live shows a difference:
on local, one of the DEPENDENT SUBQUERY rows, the one whose possible keys relate the parents and childs tables, shows
Using where; FirstMatch(closing_batches)
but live shows only Using where, without the FirstMatch.
You can actually probably get all the data from a single query without even using the parents or user table -- IF the "Status" field is in the childs table.
From basic Transitive association,
if A = B and B = C, then A = C.
You are joining from Child to User by ID, then looking at the User ID = "123456".
This is the same as just asking for Childs.User_ID = "123456" (assuming every childs.user_id refers to an existing row in users).
Likewise, from the Child joined to the parent by the Child.Parent_ID, it looks like your query is trying to get a count of distinct parent IDs that are associated with given childs.
So, the following SHOULD be able to get what you need.
select
count( distinct c.Parent_id ) Aggregate
from
childs c
where
c.user_id = '123456'
AND c.status = 'OK'
If the status field is on the PARENT table, you will need to join to that:
select
count( distinct c.Parent_id ) Aggregate
from
childs c
join parents p
on c.parent_id = p.id
AND p.status = 'OK'
where
c.user_id = '123456'
For performance, I would ALSO have an index on the childs table on ( user_id, parent_id ). This can significantly optimize the query too.
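As a sanity check on the rewrite, here is a minimal sketch comparing the nested EXISTS form with the COUNT(DISTINCT ...) form. It assumes SQLite via Python's sqlite3 as a stand-in for MySQL, that status lives on childs, and that every childs.user_id exists in users; the sample rows and the index name are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption made so the sketch is runnable).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table parents (id int primary key);
    create table users (id int primary key);
    create table childs (parent_id int, user_id int, status text);
    -- The composite index suggested above (index name is made up).
    create index childs_user_parent on childs (user_id, parent_id);

    insert into parents values (1), (2), (3), (4);
    insert into users values (123456), (7);
    -- Hypothetical rows: parent 1 has two matching children, parent 2's child
    -- has the wrong status, parent 3's child belongs to another user.
    insert into childs values
        (1, 123456, 'OK'), (1, 123456, 'OK'),
        (2, 123456, 'BAD'),
        (3, 7, 'OK'),
        (4, 123456, 'OK');
""")

exists_count = conn.execute("""
    select count(*)
    from parents p
    where exists (
        select 1 from childs c
        where c.parent_id = p.id
          and c.status = 'OK'
          and exists (select 1 from users u
                      where u.id = c.user_id and u.id = 123456)
    )
""").fetchone()[0]

distinct_count = conn.execute("""
    select count(distinct c.parent_id)
    from childs c
    where c.user_id = 123456 and c.status = 'OK'
""").fetchone()[0]

print(exists_count, distinct_count)
```

The DISTINCT matters: parent 1 has two matching children but must be counted once, just as EXISTS counts it once. If childs rows could point at missing parents or users, the two forms would no longer be guaranteed to agree.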
This is probably equivalent:
select count(*) as aggregate
from `parents` AS p
where exists (
SELECT *
from `childs` AS c
JOIN users AS u ON c.user_id = u.id
WHERE c.user_id = 123456
AND p.`id` = c.`parent_id`
and `status` = 'OK'
)
OPTIMIZE TABLE is rarely useful.
Which table is status in?

Querying a large table using mysql

I manage a property website. I have a table with banned users (small table) and a table called advert_views, which keeps track of each listing that each user views (currently 1.3m rows and growing). The advert_views table also records the IP address for every advert viewed.
I want to get the IP addresses used by the banned users and check if any of these banned users have opened new accounts. I ran the following query:
SELECT adviews.user_id AS 'banned user_id',
adviews.client_ip AS 'IPs used by banned users',
adviews2.user_id AS 'banned users that opened a new account'
FROM banned_users
LEFT JOIN users on users.email_address = banned_users.email_address #since I don't store the user_id in banned_users
LEFT JOIN advert_views adviews ON adviews.user_id = users.id AND adviews.user_id IS NOT NULL # users may view listings when not logged in but they have restricted access to the information on the listing
LEFT JOIN (SELECT client_ip,
user_id
FROM advert_views
WHERE user_id IS NOT NULL
) adviews2
ON adviews2.client_ip = adviews.client_ip
WHERE banned_users.rec_status = 1 and adviews.user_id <> adviews2.user_id
GROUP BY adviews2.user_id
I applied an index on the advert_views table and the users table (shown in a screenshot, not reproduced here).
My query takes half an hour to execute. Is there a way to improve its speed?
Thanks!
Chris
First of all: Why do you outer join the tables? Or better: Why do you try to outer join the tables? A left join is meant to get data from a table even when there is no match. But then your results could contain rows with all values null. (That doesn't happen though, because adviews.user_id <> adviews2.user_id in your where clause dismisses all outer-joined rows.) Don't give the DBMS more work to do than necessary. If you want inner joins, then don't outer join. (Though the difference in execution time won't be huge.)
Next: You select from banned_users, but you only use it to check existence. You shouldn't do this. Use an EXISTS or IN clause instead. (This is mainly for readability and in order not to produce duplicate results. This probably won't speed things up.)
SELECT av1.user_id AS 'banned user_id',
av2.client_ip AS 'IPs used by banned users',
av2.user_id AS 'banned users that opened a new account'
FROM advert_views av1
JOIN advert_views av2 ON av2.client_ip = av1.client_ip AND av2.user_id <> av1.user_id
WHERE av1.user_id IN
(
SELECT id
FROM users
WHERE email_address IN (select email_address from banned_users where rec_status = 1)
)
GROUP BY av2.user_id;
You may replace the inner IN clause with a join. It's mostly a matter of personal preference, but in the past MySQL sometimes didn't perform well on IN clauses, so many people made it a habit to join instead.
WHERE av1.user_id IN
(
SELECT u.id
FROM users u
JOIN banned_users bu ON bu.email_address = u.email_address
WHERE bu.rec_status = 1
)
Lastly, consider removing the GROUP BY clause. It reduces your results to one row per reusing user_id, showing one of its related banned user_ids (arbitrarily chosen in case there is more than one). I don't know your tables. Are you getting many records per reusing user_id? If not, remove the clause.
As to indexes I suggest:
banned_users(rec_status, email_address)
users(email_address, id)
advert_views(user_id, client_ip)
advert_views(client_ip, user_id)

mysql query is taking too much time to execute

Hello everyone. I am working on a database through phpMyAdmin. Whenever I try to execute this query, it takes more than 10 minutes to show results. Is there any way to speed it up? Please respond.
The query is
SELECT ib.*, b.brand_name, m.model_name,
s.id as sale_id, br.branch_code,br.branch_name,r.rentry_date,r.id as rid
from in_book ib
left join brand b on ib.brand_id=b.id
left join model m on ib.vehicle_id=m.id
left join re_entry r on r.in_book_id=ib.id
left join sale s on ib.id=s.in_book_id
left join branch br on ib.branch_id=br.id
where ib.id !=''
and ib.branch_id='65'
group by ib.id
order by r.id ASC,
count(r.in_book_id) DESC ,
ib.purchaes_date ASC,
ib.id ASC
There are almost 7 tables.
Make sure you have an index on every key you use to join the tables.
from http://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html:
The best way to improve the performance of SELECT operations is to create indexes on one or more of the columns that are tested in the query. The index entries act like pointers to the table rows, allowing the query to quickly determine which rows match a condition in the WHERE clause, and retrieve the other column values for those rows. All MySQL data types can be indexed.
.. this of course also applies to the JOIN conditions.
You don't list any such indexes; however, I would start with the following suggested indexes:
table      index
in_book    ( branch_id, id, brand_id, vehicle_id )
brand      ( id, brand_name )
model      ( id, model_name )
re_entry   ( in_book_id, id, rentry_date )
sale       ( in_book_id, id )
branch     ( id )
Also, with MySQL, you can use the special keyword STRAIGHT_JOIN, which tells the engine to join the tables in the order you have listed them. Although you are doing LEFT JOINs, I don't think it will matter, as it appears the secondary tables are all lookup-type tables and in_book is your primary. But just as a try, it would be:
SELECT STRAIGHT_JOIN (...rest of query...)

LEFT JOIN but with WHERE criteria, rows getting lost

I have a simple database with three tables. In the database I have a table for users of my system, a table for applications to a competition, and an intermediary table that allows me to track which users have selected which applications to view.
Table 1 = users (user_id, username, first, last, etc...)
Table 2 = applications (application_id, company_name, url, etc...)
Table 3 = picks (pick_id, user_id, application_id, picked)
I am trying to write an SQL query that will show all the applications that have been submitted and if any individual application has been selected by a user will show that it has been "picked" (1=picked, 0=not picked).
So for user_id = 1 I'd like to see:
Column Names (application_id, company_name, picked)
1, Foo, 1
2, Bar, 1
3, Alpha, Null
4, Beta, Null
I tried it with the following query:
SELECT applications.application_id, applications.company_name, picks.picked
FROM applications
LEFT JOIN picks ON applications.application_id = picks.application_id
ORDER BY applications.application_id ASC
Which is returning this:
1, Foo, 1
1, Foo, 1
2, Bar, null
3, Alpha, null
4, Beta, null
I have a second user (user_id = 2) that also picked application 1 ("Foo") which I know is returning the second row.
Then I tried to limit the scope by specifying user_id = 1 here:
SELECT applications.application_id, applications.company_name, picks.picked
FROM applications
LEFT JOIN picks ON applications.application_id = picks.application_id
WHERE user_id = 1
ORDER BY applications.application_id ASC
Now I'm only getting:
1, Foo, 1
Any suggestions on how I can get what I'm looking for? Again, ideally for a single user I'd like to see:
Column Names (application_id, company_name, picked)
1, Foo, 1
2, Bar, 1
3, Alpha, Null
4, Beta, Null
You have a so-called join table in your database schema. In your case it's called picks. This allows you to create a many-to-many relationship between your users and applications.
To use that join table correctly you need to join all three tables. These queries are easier to write if you use table aliases (applications AS a, etc.)
SELECT a.application_id, a.company_name, p.picked, u.user_id, u.username
FROM applications AS a
LEFT JOIN picks AS p ON a.application_id = p.application_id
LEFT JOIN users AS u ON p.user_id = u.user_id
ORDER BY a.application_id, u.user_id
This will give you a list of all applications with the users who have picked them. If no users are related to an application, the LEFT JOIN operations will retain the application row and you'll see NULL values for columns from the picks and users tables.
Now, if you add a WHERE p.something = something or u.something = something clause to this query in an attempt to narrow down the presentation, it has the effect of converting the LEFT JOIN clauses into INNER JOIN clauses. That is, you won't retain the applications rows that don't have matching rows in the other tables.
If you want to retain those unmatched rows in your result set, put the condition in the first ON clause instead of the WHERE clause, like so.
SELECT a.application_id, a.company_name, p.picked, u.user_id, u.username
FROM applications AS a
LEFT JOIN picks AS p ON a.application_id = p.application_id AND p.user_id = 1
LEFT JOIN users AS u ON p.user_id = u.user_id
ORDER BY a.application_id, u.user_id
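The WHERE-versus-ON difference can be seen side by side in this minimal sketch. It assumes SQLite via Python's sqlite3 as a stand-in for MySQL, drops the users join for brevity, and uses hypothetical picks (user 1 picked Foo and Bar, user 2 picked Foo).

```python
import sqlite3

# SQLite stands in for MySQL (an assumption); LEFT JOIN semantics are the same.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table applications (application_id int primary key, company_name text);
    create table picks (pick_id integer primary key, user_id int,
                        application_id int, picked int);
    insert into applications values (1, 'Foo'), (2, 'Bar'), (3, 'Alpha'), (4, 'Beta');
    -- Hypothetical picks: user 1 picked Foo and Bar, user 2 picked Foo.
    insert into picks (user_id, application_id, picked)
    values (1, 1, 1), (1, 2, 1), (2, 1, 1);
""")

base = """
    select a.application_id, a.company_name, p.picked
    from applications a
    left join picks p on a.application_id = p.application_id {on_extra}
    {where}
    order by a.application_id
"""
# Filter in WHERE: unmatched applications (p.user_id is NULL) are thrown away.
filter_in_where = conn.execute(
    base.format(on_extra="", where="where p.user_id = 1")).fetchall()
# Filter in ON: unmatched applications are kept, with NULL for picked.
filter_in_on = conn.execute(
    base.format(on_extra="and p.user_id = 1", where="")).fetchall()

print(filter_in_where)
print(filter_in_on)
```

Only the ON version returns all four applications, with NULL (None in Python) in the picked column for Alpha and Beta.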
Edit: Many join tables like your picks table are set up with a composite primary key, in your example (application_id, user_id). That ensures just one row per possible relationship between the tables being joined. In your case you have the potential for multiple such rows.
To use only the most recent of those rows (the one with the highest pick_id) takes a little more work. You need a subquery (virtual table) to extract it, and to retrieve the appropriate value of picked so your query works. So now things get interesting.
SELECT MAX(pick_id) AS pick_id,
application_id, user_id
FROM picks
GROUP BY application_id, user_id
retrieves the unique relationship pair. That is good. But next we have to fetch the picked column detail value from those rows. That takes another join, using the MAX value of pick_id, like so
SELECT q.application_id, q.user_id, r.picked
FROM (
SELECT MAX(pick_id) AS pick_id,
application_id, user_id
FROM picks
GROUP BY application_id, user_id
) AS q
JOIN picks AS r ON q.pick_id = r.pick_id
So, we need to substitute this little virtual table (subquery) in place of the picks AS p table in the original query. That looks like this.
SELECT a.application_id, a.company_name, p.picked, u.user_id, u.username
FROM applications AS a
LEFT JOIN (
SELECT q.application_id, q.user_id, r.picked
FROM (
SELECT MAX(pick_id) AS pick_id,
application_id, user_id
FROM picks
GROUP BY application_id, user_id
) AS q
JOIN picks AS r ON q.pick_id = r.pick_id
) AS p ON a.application_id = p.application_id AND p.user_id = 1
LEFT JOIN users AS u ON p.user_id = u.user_id
ORDER BY a.application_id, u.user_id
Some developers prefer to create VIEW objects for subqueries like the one here, rather than creating a club sandwich of a query like this one. It's not called Structured Query Language on a foolish whim, eh? These subqueries sometimes can be elements of a structure.