Optimizing a simple query with join and subquery - mysql

I have two tables - object_72194_ and object_72197_ respectively:
| attr_72195_ | | attr_72198_ | attr_72199_ |
| 2013-07-31 | | a | 2013-07-31 |
| 2013-07-30 | | b | 2013-07-31 |
| 2013-07-29 | | c | 2013-07-30 |
| 2013-07-28 | | d | 2013-07-29 |
For each row in the first table I want the value of attr_72198_ from the second-table row with the latest attr_72199_ that is less than or equal to attr_72195_. So, in this case the result should look like this:
|attr_72195_ | attr_72196_ |
|2013-07-31 | a |
|2013-07-30 | c |
|2013-07-29 | d |
|2013-07-28 | NULL |
I want one value per row. My current working query looks like this:
SELECT f1.attr_72195_, t.attr_72198_ AS attr_72196_
FROM object_72194_ f1
LEFT OUTER JOIN (
SELECT id, attr_72198_, attr_72199_ FROM object_72197_ t
) AS t ON t.attr_72199_ <= f1.attr_72195_
WHERE ( f1.id_obj = 72194 ) AND (t.attr_72199_ = (
SELECT MAX(attr_72199_) FROM object_72197_ t
WHERE attr_72199_ <= f1.attr_72195_
) OR t.attr_72199_ IS NULL
) ORDER BY f1.id_order DESC
It works as expected. But it does not seem to be quite optimal because of the subquery in the last WHERE block. One programmer advised me to use one more join with grouping instead, but I just do not know how.
Thank you!
EDIT:
I removed the unnecessary ordering inside the subqueries, checked both queries (mine and the version with joins instead of a subquery), and got an interesting result. EXPLAIN returned five rows for the version with a subquery and four rows for the version with joins (id column). In the "rows" column I got 13 rows for the subquery version and 18 rows in total for the join version. So I have to test against a large data set to decide which version to use.
EDIT:
Oh, this query with joins turned out to be incorrect, because it groups results by the column
EDIT:
The question is still open. The problem appears when there are duplicates in the second table. As a result, both queries return as many rows as in the second table. But I need just a value per row. Just as I showed from the very first in the example with values "a", "b", "c" and "d".
EDIT:
Finally, I did it. I added grouping by unique field in the first table and returned the previous grouping. So the query now looks like this:
SELECT f1.attr_72195_, f2.attr_72198_ AS attr_72196_
FROM object_72194_ f1
INNER JOIN
(
SELECT f1.attr_72195_, MAX(f2.attr_72199_) AS attr_72199_
FROM object_72194_ f1
LEFT OUTER JOIN object_72197_ f2 ON f1.attr_72195_ >= f2.attr_72199_
GROUP BY f1.attr_72195_
) o
ON f1.attr_72195_ = o.attr_72195_
LEFT OUTER JOIN object_72197_ f2 ON f2.attr_72199_ = o.attr_72199_
GROUP BY f1.id, attr_72195_ ORDER BY f1.id_order DESC
Simple and elegant.

Avoiding the correlated sub queries (not tested):-
SELECT f1.attr_72195_, MIN(f2.attr_72198_)
FROM object_72194_ f1
INNER JOIN
(
SELECT f1.attr_72195_, MAX(f2.attr_72199_) As Max_attr_72199_
FROM object_72194_ f1
LEFT OUTER JOIN object_72197_ f2
ON f1.attr_72195_ >= f2.attr_72199_
GROUP BY f1.attr_72195_
) Sub1
ON f1.attr_72195_ = Sub1.attr_72195_
LEFT OUTER JOIN object_72197_ f2
ON f2.attr_72199_ = Sub1.Max_attr_72199_
GROUP BY f1.attr_72195_
Do a LEFT JOIN between the two tables and, for each date in the first table, get the max date from the second table that is less than or equal to it. Inner join that result back to the first table, then left join to the second table on the max date. I am not sure which value of attr_72198_ you want when dates are duplicated, so I have just used the MIN function to get the smallest one.
EDIT
Try this which should cope with duplicates on the first table.
SELECT f1.attr_72195_, f2.attr_72198_
FROM object_72194_ f1
INNER JOIN
(
SELECT f1.attr_72195_, MAX(f2.attr_72199_) As Max_attr_72199_
FROM object_72194_ f1
LEFT OUTER JOIN object_72197_ f2
ON f1.attr_72195_ >= f2.attr_72199_
GROUP BY f1.attr_72195_
) Sub1
ON f1.attr_72195_ = Sub1.attr_72195_
LEFT OUTER JOIN
(
SELECT attr_72199_, MIN(attr_72198_) AS attr_72198_
FROM object_72197_
GROUP BY attr_72199_
) f2
ON f2.attr_72199_ = Sub1.Max_attr_72199_

Related

SQL Distinct based on different column

I have a problem counting distinct values in one column based on another column. The case is:
Table: List
well | wbore | op|
------------------
wella|wbore_a|op_a|
wella|wbore_a|op_b|
wella|wbore_a|op_b|
wella|wbore_b|op_c|
wella|wbore_b|op_c|
wellb|wbore_g|op_t|
wellb|wbore_g|op_t|
wellb|wbore_h|op_k|
So, I want the output to appear in separate columns, like:
well | total_wbore | total_op
----------------------------
wella | 2 | 3
---------------------------
wellb | 2 | 2
The real case comes from different tables, but to simplify it I assume it all happens in one table.
The sql query that I tried:
SELECT well.well_name, wellbore.wellbore_name, operation.operation_name, COUNT(*)
FROM well
INNER JOIN wellbore ON wellbore.well_uid = well.well_uid
INNER JOIN operation ON wellbore.well_uid = operation.well_uid
GROUP BY well.well_name,wellbore.wellbore_name
HAVING COUNT(*) > 1
But this query counts duplicate rows, which does not meet the requirement. Can anyone help?
You need to use COUNT(DISTINCT ...):
SELECT
count(distinct wellbore.wellbore_name) as total_wbore,
count(distinct operation.operation_name) as total_op
FROM well
INNER JOIN wellbore ON wellbore.well_uid = well.well_uid
INNER JOIN operation ON wellbore.well_uid = operation.well_uid
Final query:
SELECT
well.well_name,
COUNT(DISTINCT wellbore.wellbore_name) AS total_wbore,
COUNT(DISTINCT operation.operation_name) AS total_op
FROM well
INNER JOIN wellbore ON wellbore.well_uid = well.well_uid
INNER JOIN operation ON wellbore.well_uid = operation.well_uid
GROUP BY well.well_name
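Here is a small sketch of the same idea on the simplified single-table version from the question, run through Python's sqlite3 module so it can be tried directly. The table name well_list is a hypothetical stand-in for "List".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- well_list is a hypothetical name for the simplified "List" table
CREATE TABLE well_list (well TEXT, wbore TEXT, op TEXT);
INSERT INTO well_list VALUES
    ('wella', 'wbore_a', 'op_a'),
    ('wella', 'wbore_a', 'op_b'),
    ('wella', 'wbore_a', 'op_b'),
    ('wella', 'wbore_b', 'op_c'),
    ('wella', 'wbore_b', 'op_c'),
    ('wellb', 'wbore_g', 'op_t'),
    ('wellb', 'wbore_g', 'op_t'),
    ('wellb', 'wbore_h', 'op_k');
""")

# One output row per well; each COUNT(DISTINCT ...) ignores repeated values.
rows = conn.execute("""
SELECT well,
       COUNT(DISTINCT wbore) AS total_wbore,
       COUNT(DISTINCT op)    AS total_op
FROM well_list
GROUP BY well
ORDER BY well
""").fetchall()
for row in rows:
    print(row)
```

This gives 2 wellbores and 3 operations for wella, and 2 and 2 for wellb, as in the desired output.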

difference made by sub-queries

Problem statement link
Correct code (by dongyuzhang):
select con.contest_id,
con.hacker_id,
con.name,
sum(total_submissions),
sum(total_accepted_submissions),
sum(total_views), sum(total_unique_views)
from contests con
join colleges col on con.contest_id = col.contest_id
join challenges cha on col.college_id = cha.college_id
left join
(select challenge_id, sum(total_views) as total_views, sum(total_unique_views) as total_unique_views
from view_stats group by challenge_id) vs on cha.challenge_id = vs.challenge_id
left join
(select challenge_id, sum(total_submissions) as total_submissions, sum(total_accepted_submissions) as total_accepted_submissions from submission_stats group by challenge_id) ss on cha.challenge_id = ss.challenge_id
group by con.contest_id, con.hacker_id, con.name
having sum(total_submissions)!=0 or
sum(total_accepted_submissions)!=0 or
sum(total_views)!=0 or
sum(total_unique_views)!=0
order by contest_id;
My changed code without sub-queries is incorrect and gives larger sums. I don't understand how writing sub-queries makes the difference. A simple example test case would be very helpful. Thanks!
select con.contest_id,
con.hacker_id,
con.name,
sum(total_submissions),
sum(total_accepted_submissions),
sum(total_views), sum(total_unique_views)
from contests con
join colleges col on con.contest_id = col.contest_id
join challenges cha on col.college_id = cha.college_id
left join view_stats vs
on cha.challenge_id = vs.challenge_id
left join submission_stats ss
on cha.challenge_id = ss.challenge_id
group by con.contest_id, con.hacker_id, con.name
having sum(total_submissions)!=0 or
sum(total_accepted_submissions)!=0 or
sum(total_views)!=0 or
sum(total_unique_views)!=0
order by contest_id;
In general, with the subqueries you do the aggregation before the join, so the values are right: each subquery returns only one row per challenge_id, already carrying the correct sum.
If you join the tables together first, each value is summed once for every matching row from the other joined tables.
Table1:
id | value1
a | 1
a | 2
b | 3
Table2:
id | value2
a | 5
a | 6
If you join without subqueries you get (before grouping):
a | 1 | 5
a | 1 | 6
a | 2 | 5
a | 2 | 6
So surely the sums are wrong.
select Table1.id, sum(value1), sum(value2) from
Table1 join Table2 on Table1.id = Table2.id
group by Table1.id
would return
a | 6 | 22
but
select Table1.id, sum(value1), max(sum2) from
Table1 join (select id, sum(value2) as sum2 from Table2 group by id) t2 on Table1.id = t2.id
group by Table1.id
would return
a | 3 | 11
I don't know if this is the case in your query, but this is the main difference of using subqueries
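The Table1/Table2 example can be run as a quick check. This is a minimal sketch using Python's sqlite3 module. The column types are assumed, and the derived table selects id so that the join condition has something to match on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id TEXT, value1 INTEGER);
CREATE TABLE Table2 (id TEXT, value2 INTEGER);
INSERT INTO Table1 VALUES ('a', 1), ('a', 2), ('b', 3);
INSERT INTO Table2 VALUES ('a', 5), ('a', 6);
""")

# Join first, aggregate afterwards: each row is repeated once per match.
joined_first = conn.execute("""
SELECT Table1.id, SUM(value1), SUM(value2)
FROM Table1 JOIN Table2 ON Table1.id = Table2.id
GROUP BY Table1.id
""").fetchall()

# Aggregate Table2 first, so it contributes a single row per id to the join.
aggregated_first = conn.execute("""
SELECT Table1.id, SUM(value1), MAX(sum2)
FROM Table1
JOIN (SELECT id, SUM(value2) AS sum2 FROM Table2 GROUP BY id) t2
  ON Table1.id = t2.id
GROUP BY Table1.id
""").fetchall()

print(joined_first)      # inflated sums
print(aggregated_first)  # sums taken before the join
```

The first query reports 6 and 22 for id a, because the four joined rows are summed. The second reports 3 and 11. Id b has no match in Table2, so the inner join drops it in both cases.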

Unique rows in join result

I have tables of deals and currencies that look like this:
currencies
id,code
pairs (the available pairs of currencies)
id to_sell to_buy
deals
id
user_id
pair_id
amount_to_sell
amount_to_buy
So I need to get all matching deals that can be executed, but I cannot get unique matches.
Here is my SQL query:
select *
from deals as d1
join deals d2
on d1.sell_amount = d2.buy_amount and d1.buy_amount = d2.sell_amount
I am getting a result that looks like this:
id | user_id | pair_id | amount_to_buy | amount_to_sell | id | user_id | pair_id | amount_to_buy | amount_to_sell
1|2|1|1000|3000|2|1|2|3000|1000
2|1|2|3000|1000|1|2|1|1000|3000
You may try using a least/greatest trick here:
SELECT t1.*, t2.*
FROM
(
SELECT DISTINCT
LEAST(d1.id, d2.id) AS d1_id,
GREATEST(d1.id, d2.id) AS d2_id
FROM deals AS d1
INNER JOIN deals d2
ON d1.sell_amount = d2.buy_amount AND
d1.buy_amount = d2.sell_amount
) d
INNER JOIN deals t1
ON d.d1_id = t1.id
INNER JOIN deals t2
ON d.d2_id = t2.id;
The basic idea here is that the subquery labelled d finds a single pair of matched deal IDs, using a least/greatest trick. Then, we join twice to the deals table again to bring in the full information for each member of that deal pair.
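To see the deduplication step in isolation, here is a rough sketch using Python's sqlite3 module. SQLite has no LEAST/GREATEST, so the two-argument scalar forms of min() and max() are swapped in to play the same role. The column names buy_amount and sell_amount follow the queries above rather than the table description, and the two rows are the ones from the question's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deals (id INTEGER PRIMARY KEY, user_id INTEGER, pair_id INTEGER,
                    buy_amount INTEGER, sell_amount INTEGER);
INSERT INTO deals VALUES (1, 2, 1, 1000, 3000), (2, 1, 2, 3000, 1000);
""")

# Plain self-join: every match shows up twice, once in each direction.
mirrored = conn.execute("""
SELECT d1.id, d2.id
FROM deals d1
JOIN deals d2
  ON d1.sell_amount = d2.buy_amount AND d1.buy_amount = d2.sell_amount
ORDER BY d1.id
""").fetchall()

# Normalise each pair to (smaller id, larger id), then DISTINCT removes the mirror.
unique_pairs = conn.execute("""
SELECT DISTINCT MIN(d1.id, d2.id) AS d1_id,  -- stands in for LEAST
                MAX(d1.id, d2.id) AS d2_id   -- stands in for GREATEST
FROM deals d1
JOIN deals d2
  ON d1.sell_amount = d2.buy_amount AND d1.buy_amount = d2.sell_amount
""").fetchall()

print(mirrored)
print(unique_pairs)
```

The plain join returns both (1, 2) and (2, 1), while the normalised version keeps only (1, 2).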

An explanation with SQL query

I am trying to get some data for my JavaFX application from a couple of tables in a MySQL database.
Here's the query:
select veturattable.id, veturattable.vetura,veturattable.modeli,veturattable.ngjyra,
veturattable.targa, renttable.pagesa, hargjimettable.shuma
from veturattable
left join hargjimettable
on hargjimettable.veturaid= veturattable.id
left join renttable
on renttable.veturaid = veturattable.id ;
Here is the data from renttable:
And here is the data from hargjimettable:
So what I need is a result like this:
veturaid | pagesa | shuma
1 | 150 | 91
10 | 110 | 40
You actually need two subqueries, each pre-aggregating the sum per respective ID, and then join each one individually back to the main table. If you don't, you get a Cartesian product: for a given ID, every record in hargjimettable is joined to every record in renttable. So, if you have 2 records in the first table and 3 records in the second, you get 2 x 3 = 6 joined rows, and each amount is counted several times.
By pre-querying each table grouped by the ID key, you have at most one record per ID for each summation, so you grab that record if it exists. The LEFT JOIN keeps IDs that have no matching rows in the result, and COALESCE() turns the resulting NULLs into 0.
select
v.id,
v.vetura,
v.modeli,
v.ngjyra,
v.targa,
COALESCE( RSum.SumPagesa, 0 ) as AllPagesa,
COALESCE( HSum.SumShuma, 0 ) as AllShuma
from
veturattable v
left join
( select
h.veturaid,
SUM( h.shuma ) as SumShuma
from
hargjimettable h
group by
h.veturaid ) HSum
ON v.id = HSum.veturaid
left join
( select
r.veturaid,
SUM( r.pagesa ) as SumPagesa
from
renttable r
group by
r.veturaid ) RSum
ON v.id = RSum.veturaid
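The original data is not shown, so the following sketch uses invented rows, chosen so the totals match the desired output, plus a hypothetical vehicle 7 with no rentals or expenses to show what LEFT JOIN and COALESCE do. It runs through Python's sqlite3 module and trims veturattable to two columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE veturattable (id INTEGER PRIMARY KEY, vetura TEXT);
CREATE TABLE renttable (veturaid INTEGER, pagesa INTEGER);
CREATE TABLE hargjimettable (veturaid INTEGER, shuma INTEGER);
-- Hypothetical rows; vehicle 7 has neither rentals nor expenses.
INSERT INTO veturattable VALUES (1, 'car A'), (7, 'car B'), (10, 'car C');
INSERT INTO renttable VALUES (1, 100), (1, 50), (10, 110);
INSERT INTO hargjimettable VALUES (1, 41), (1, 50), (10, 40);
""")

# Each derived table has at most one row per vehicle, so nothing is multiplied.
rows = conn.execute("""
SELECT v.id,
       COALESCE(RSum.SumPagesa, 0) AS AllPagesa,
       COALESCE(HSum.SumShuma, 0)  AS AllShuma
FROM veturattable v
LEFT JOIN (SELECT veturaid, SUM(shuma) AS SumShuma
           FROM hargjimettable GROUP BY veturaid) HSum
       ON v.id = HSum.veturaid
LEFT JOIN (SELECT veturaid, SUM(pagesa) AS SumPagesa
           FROM renttable GROUP BY veturaid) RSum
       ON v.id = RSum.veturaid
ORDER BY v.id
""").fetchall()
for row in rows:
    print(row)
```

Vehicle 7 still appears, with 0 in both sum columns instead of NULL.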
You actually want MAX() and SUM() along with GROUP BY, like:
select max(veturattable.id) as id, max(veturattable.vetura) as vetura,
max(veturattable.modeli) as modeli,
max(veturattable.ngjyra) as ngjyra,
max(veturattable.targa) as targa,
max(renttable.pagesa) as pagesa,
sum(hargjimettable.shuma) as shuma
from veturattable
left join hargjimettable
on hargjimettable.veturaid= veturattable.id
left join renttable
on renttable.veturaid = veturattable.id
group by veturattable.id;

MySQL JOIN returns unexpected values

I'm trying to do a simple mysql request and I'm having problems.
I have 2 tables defined like below:
currencies
______________________________________________________________________________________
currency_id | currency_name | currency_symbol | currency_active | currency_auto_update
exchange_rates
____________________________________________________________________________________________________
exchange_rate_id | currency_id | exchange_rate_date | exchange_rate_value | exchange_rate_added_date
What I want to do is to select the last row inside exchange_rates for the active currency.
I did it like this:
SELECT c.currency_id, c.currency_name, c.currency_symbol, c.currency_active, er.exchange_rate_id, er.exchange_rate_date, er.exchange_rate_value
FROM currencies c
LEFT JOIN (
SELECT er1.exchange_rate_id, er1.currency_id, er1.exchange_rate_date, er1.exchange_rate_value
FROM exchange_rates er1
ORDER BY er1.exchange_rate_date DESC
LIMIT 1
) AS er
ON er.currency_id=c.currency_id
WHERE c.currency_active='1'
This returns NULL values from the exchange_rates table, even though there are matching rows.
I've tried removing LIMIT 1, but then it returns all the rows for the active currency, which is not what I want.
What should this query look like?
Thanks!
Try this:
SELECT c.currency_id, c.currency_name, c.currency_symbol, c.currency_active, er.exchange_rate_id, er.exchange_rate_date, er.exchange_rate_value
FROM currencies c
LEFT JOIN (SELECT * FROM
(SELECT er1.exchange_rate_id, er1.currency_id, er1.exchange_rate_date, er1.exchange_rate_value
FROM exchange_rates er1
ORDER BY er1.exchange_rate_date DESC) AS A GROUP BY currency_id
) AS er
ON er.currency_id=c.currency_id
WHERE c.currency_active='1'
The idea of the query looks correct to me. Your left join should return the row with the newest exchange rate. You could use an inner join here, because there must be a currency referring to this...
SELECT c.currency_id, c.currency_name, c.currency_symbol, c.currency_active, er.exchange_rate_id, er.exchange_rate_date, er.exchange_rate_value
FROM currencies c
INNER JOIN exchange_rates er ON er.currency_id = c.currency_id
WHERE c.currency_active='1'
ORDER BY er.exchange_rate_date DESC
LIMIT 1
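For comparison, here is a sketch of why LIMIT 1 inside the derived table produces NULLs, followed by one possible alternative that is not taken from the answers above: find each currency's latest date with MAX() and join back to that row. It runs through Python's sqlite3 module. The rows are invented, the tables are trimmed to the columns used, and it assumes at most one rate per currency per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE currencies (currency_id INTEGER PRIMARY KEY, currency_name TEXT,
                         currency_active TEXT);
CREATE TABLE exchange_rates (exchange_rate_id INTEGER PRIMARY KEY,
                             currency_id INTEGER, exchange_rate_date TEXT,
                             exchange_rate_value REAL);
-- Hypothetical rows for illustration only.
INSERT INTO currencies VALUES (1, 'EUR', '1'), (2, 'GBP', '1'), (3, 'CHF', '0');
INSERT INTO exchange_rates VALUES
    (1, 1, '2013-01-01', 1.30), (2, 1, '2013-01-02', 1.31),
    (3, 2, '2013-01-01', 1.60), (4, 2, '2013-01-03', 1.62);
""")

# LIMIT 1 in the derived table keeps one row in total, not one per currency.
limited = conn.execute("""
SELECT c.currency_id, er.exchange_rate_id
FROM currencies c
LEFT JOIN (SELECT * FROM exchange_rates
           ORDER BY exchange_rate_date DESC LIMIT 1) er
       ON er.currency_id = c.currency_id
WHERE c.currency_active = '1'
ORDER BY c.currency_id
""").fetchall()

# Alternative: latest date per currency, then join back to the matching row.
latest = conn.execute("""
SELECT c.currency_id, er.exchange_rate_id, er.exchange_rate_date
FROM currencies c
LEFT JOIN (SELECT currency_id, MAX(exchange_rate_date) AS max_date
           FROM exchange_rates GROUP BY currency_id) m
       ON m.currency_id = c.currency_id
LEFT JOIN exchange_rates er
       ON er.currency_id = m.currency_id
      AND er.exchange_rate_date = m.max_date
WHERE c.currency_active = '1'
ORDER BY c.currency_id
""").fetchall()

print(limited)
print(latest)
```

In the first result only the currency that owns the single newest row gets a match, and every other active currency gets NULL. The second result has one latest row per active currency.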