Nested queries and Join - mysql

As a beginner with SQL, I can handle simple tasks, but I'm struggling right now with multiple nested queries.
My problem is that I have 3 tables like this:
a Case table:
id  nd   date                 username
--------------------------------------------
1   596  2016-02-09 16:50:03  UserA
2   967  2015-10-09 21:12:23  UserB
3   967  2015-10-09 22:35:40  UserA
4   967  2015-10-09 23:50:31  UserB
5   580  2017-02-09 10:19:43  UserA
a Value table:
case_id  labelValue_id  Value          Type
-------------------------------------------------
1        3633           2731858342     X
1        124            ["864","862"]  X
1        8981           -2.103         X
1        27             443            X
...      ...            ...            ...
2        7890           232478         X
2        765            0.2334         X
...      ...            ...            ...
and a Label table:
id    label
----------------------
3633  Value of W
124   Value of X
8981  Value of Y
27    Value of Z
Obviously, I want to join these tables. So I can do something like this:
SELECT *
from Case, Value, Label
where Case.id= Value.case_id
and Label.id = Value.labelValue_id
but I get pretty much everything whereas I would like to be more specific.
What I want is to filter the Case table first and then use the resulting ids to join the two other tables. I'd like to:
1. Filter the Case rows so that if there are several instances of the same nd, only the oldest one is kept.
2. Limit the number of nd values in the query. For example, I want to be able to join the tables for just 2, 3, 4, etc. different nd values.
3. Use this query to make a join on the Value and Label tables.
For example, the output of steps 1 and 2 would be:
id nd date username
--------------------------------------------
1 596 2016-02-09 16:50:03 UserA
2 967 2015-10-09 21:12:23 UserB
if I ask for 2 different nd. The nd 967 appears several times but we take the oldest one.
In fact, I think I found out how to do all these things but I can't/don't know how to merge them.
To select the oldest nd, I can do something like:
select min((date)), nd,id
from Case
group by nd
Then, to limit the number of nd in the output, I found this (based on this and that) :
select *,
@num := if(@type <> t.nd, @num + 1, 1) as row_number,
@type := t.nd as dummy
from(
select min((date)), nd,id
from Case
group by nd
) as t
group by t.nd
having row_number <= 2 -- number of output
It works but I feel it's getting slow.
Finally, when I try to join this subquery with the two other tables, the query runs forever.
During my research, I could find answers for every part of the problem but I can't merge them. Also, for the "counting" problem, where I want to limit the number of nd values, my approach feels rather far-fetched.
I realize this is a long question, but I think I'm missing something and I wanted to give as much detail as possible.

To filter the case table down to the oldest row for each nd (case is a reserved word in MySQL, so the table name needs backticks):
select * from `case` c
where c.date = (select min(c2.date) from `case` c2
where c2.nd = c.nd)
Then just join this to the other tables:
select * from `case` c
join value v on v.case_id = c.id
join label l on l.id = v.labelValue_id
where c.date = (select min(c2.date) from `case` c2
where c2.nd = c.nd)
To limit the result to a certain number of records, MySQL has the LIMIT clause:
select * from `case` c
join value v on v.case_id = c.id
join label l on l.id = v.labelValue_id
where c.date = (select min(c2.date) from `case` c2
where c2.nd = c.nd)
limit 4 -- <=== limits the result set to 4 rows
If you only want records for the top N values of nd, the LIMIT goes in a subquery that restricts which values of nd are retrieved. MySQL has historically rejected LIMIT placed directly inside an IN subquery, so wrap it in a derived table:
select * from `case` c
join value v on v.case_id = c.id
join label l on l.id = v.labelValue_id
where c.date = (select min(c2.date) from `case` c2
where c2.nd = c.nd)
and c.nd in (select nd from (select distinct nd from `case`
order by nd desc limit N) as top_nd)
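
The steps in this answer can be checked end to end on the question's sample data. The sketch below is a stand-in under stated assumptions, not the answerer's code: it uses Python's built-in sqlite3 module instead of a MySQL server so that it runs on its own, quotes the table as "case" (MySQL would use backticks), puts the LIMIT in a joined derived table, and adds a made-up label row for id 7890 because the question does not show it. The helper names build_db and oldest_cases are hypothetical.

```python
import sqlite3

def build_db():
    """Load a subset of the question's sample rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE "case" (id INTEGER PRIMARY KEY, nd INTEGER, date TEXT, username TEXT);
        CREATE TABLE value (case_id INTEGER, labelValue_id INTEGER, value TEXT);
        CREATE TABLE label (id INTEGER PRIMARY KEY, label TEXT);
        INSERT INTO "case" VALUES
            (1, 596, '2016-02-09 16:50:03', 'UserA'),
            (2, 967, '2015-10-09 21:12:23', 'UserB'),
            (3, 967, '2015-10-09 22:35:40', 'UserA'),
            (4, 967, '2015-10-09 23:50:31', 'UserB'),
            (5, 580, '2017-02-09 10:19:43', 'UserA');
        INSERT INTO value VALUES
            (1, 3633, '2731858342'), (1, 27, '443'), (2, 7890, '232478');
        -- The label for 7890 is not shown in the question; this row is made up.
        INSERT INTO label VALUES
            (3633, 'Value of W'), (27, 'Value of Z'), (7890, 'hypothetical label');
    """)
    return conn

def oldest_cases(conn, n):
    """Oldest case per nd, restricted to the n highest nd values, joined to value and label."""
    sql = """
        SELECT c.id, c.nd, c.date, l.label, v.value
        FROM "case" AS c
        JOIN (SELECT DISTINCT nd FROM "case" ORDER BY nd DESC LIMIT ?) AS picked
          ON picked.nd = c.nd
        JOIN value AS v ON v.case_id = c.id
        JOIN label AS l ON l.id = v.labelValue_id
        WHERE c.date = (SELECT MIN(c2.date) FROM "case" AS c2 WHERE c2.nd = c.nd)
        ORDER BY c.id, l.id
    """
    return conn.execute(sql, (n,)).fetchall()

for row in oldest_cases(build_db(), 2):
    print(row)
```

Joining against the derived table picked, rather than using IN with LIMIT, sidesteps the MySQL restriction on LIMIT inside IN subqueries.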

So finally, here is what worked well for me:
select *
from (
select *
from Case
join (
select nd as T_ND, date as T_date
from Case
where nd in (select distinct nd from Case)
group by T_ND Limit 5 -- <========= Limit of nd's
) as t
on Case.nd = t.T_ND
where date = (select min(date)
from Case
where nd = t.T_ND)
) as subquery
join Value
on Value.case_id = subquery.id
join Label
on Label.id = Value.labelValue_id
Thank you @charlesbretana for putting me on the right track :).

Related

How can I optimise mySQL to use JOINs instead of nested IN queries?

I have a query which combines a user's balance at a number of locations and uses a nested subquery to combine data from the customer_balance table and the merchant_groups table. There is a second piece of data required from the customer_balance table that is unique to each merchant.
I'd like to optimise my query to return a sum and a unique value i.e. the order of results is important.
For instance, there may be three merchants in a merchant_group:
id | group_id | group_member_id
1  | 12       | 36
2  | 12       | 70
3  | 12       | 106
The user may have a balance at 2 locations but not all in the customer_balance table:
id | group_member_id | user_id | balance | personal_note
1  | 36              | 420     | 1.00    | "Likes chocolate"
2  | 70              | 420     | 20.00   | null
Notice there isn't a 3rd row in the balance table.
What I'd like to end up with is the ability to pull the sum of the balance as well as the most appropriate personal_note.
So far I have this working in all situations with the following query:
SELECT sum(c.cash_balance) as cash_balance,n.customer_note FROM customer_balance AS c
LEFT JOIN (SELECT customer_note, user_id FROM customer_balance
WHERE user_id = 420 AND group_member_id = 36) AS n on c.user_id = n.user_id
WHERE c.user_id = 420 AND c.group_id IN (SELECT group_member_id FROM merchant_group WHERE group_id = 12)
I can change out the group_member_id appropriately and I will always get the combined balance as expected and the appropriate note. i.e. what I'm looking for is:
balance: 21.00
customer_note: "Likes Chocolate" OR null (depending on the group_member_id)
Is it possible to optimise this query without using resource heavy nested queries e.g. using a JOIN? (or some other method).
I have tried a number of options, but cannot get it working in all situations. The following is the closest I have gotten, except this doesn't return the correct note:
SELECT sum(cb.balance), cb.personal_note FROM customer_balance AS cb
LEFT JOIN merchant_group AS mg on mg.group_member_id = cb.group_member_id
WHERE cb.user_id = 420 && mg.group_id = 12
ORDER BY (mg.group_member_id = 106)
I also tried another option (but since lost the query) that works, but not when the group_member_id = 106 - because there was no record in one table (but this is a valid use case that I'd like to cater for).
Thanks!
This should be equivalent, but without the subselect:
SELECT
sum(c.cash_balance) as cash_balance
, n.customer_note
FROM customer_balance AS c
LEFT JOIN customer_balance as n on ( c.user_id = n.user_id AND n.group_member_id = 36 AND n.user_id = 420 )
INNER JOIN merchant_group as mg on ( c.group_id = mg.group_member_id AND mg.group_id = 12)
WHERE c.user_id = 420
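
A different way to get the sum and the note in one pass is conditional aggregation. This is a swapped-in technique, not the query from the answer above. The sketch below assumes the column names shown in the question's sample tables (balance, personal_note, group_member_id) and runs on Python's built-in sqlite3 module rather than MySQL so that it is self-contained. The helper names build_db and balance_and_note are hypothetical.

```python
import sqlite3

def build_db():
    """Load the question's sample rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE merchant_group (id INTEGER PRIMARY KEY, group_id INTEGER,
                                     group_member_id INTEGER);
        CREATE TABLE customer_balance (id INTEGER PRIMARY KEY, group_member_id INTEGER,
                                       user_id INTEGER, balance REAL, personal_note TEXT);
        INSERT INTO merchant_group VALUES (1, 12, 36), (2, 12, 70), (3, 12, 106);
        INSERT INTO customer_balance VALUES
            (1, 36, 420, 1.00, 'Likes chocolate'),
            (2, 70, 420, 20.00, NULL);
    """)
    return conn

def balance_and_note(conn, user_id, group_id, member_id):
    """Sum the user's balances across the group; pick the note of one chosen member."""
    sql = """
        SELECT SUM(cb.balance) AS balance,
               -- Only the chosen member's row contributes a non-NULL note.
               MAX(CASE WHEN cb.group_member_id = ? THEN cb.personal_note END) AS personal_note
        FROM merchant_group AS mg
        JOIN customer_balance AS cb ON cb.group_member_id = mg.group_member_id
        WHERE mg.group_id = ? AND cb.user_id = ?
    """
    return conn.execute(sql, (member_id, group_id, user_id)).fetchone()

conn = build_db()
for member in (36, 70, 106):
    print(member, balance_and_note(conn, 420, 12, member))
```

Because the note is picked inside the aggregate, a member with no balance row (106 here) still yields the full sum with a NULL note.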

One to one join in MySQL

I need to join two tables on one common column, but I want to maintain a one-to-one relation between the other two columns. For example:
table_1
ID_C ID_ROW_C OPT
C 1 10
C 2 10
table_2
ID_F ID_ROW_F OPT
F 3 10
F 4 10
My query:
select *
from table_1, table_2
where table_1.OPT=table_2.OPT
result
ID_C ID_ROW_C OPT ID_F ID_ROW_F
C 1 10 F 3
C 1 10 F 4
C 2 10 F 3
C 2 10 F 4
desired result:
ID_C ID_ROW_C OPT ID_F ID_ROW_F
C 1 10 F 4
C 2 10 F 3
or
ID_C ID_ROW_C OPT ID_F ID_ROW_F
C 1 10 F 3
C 2 10 F 4
How can I do this?
What you need to do is use JOIN.
SELECT * FROM table_1
JOIN table_2
ON table_1.OPT = table_2.OPT
More info from the MySQL manual: https://dev.mysql.com/doc/refman/5.0/en/join.html
And a relevant Stack Overflow discussion on the different types of JOINs: What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?
Since you're not providing any rule to relate the columns, you're getting exactly what you're supposed to get: All the rows of both tables that fulfill the relation.
However, you can create an "artificial" condition to get what you want... it's not pretty, but it will work:
select t1.id_c, t1.id_row_c
, t1.opt
, t2.id_f, t2.id_row_f
from
(
select @r_id_1 := (case
when @prev_opt_1 = table_1.opt then @r_id_1 + 1
else 1
end) as r_id
, table_1.*
, @prev_opt_1 := table_1.opt as new_opt_1
from (select @r_id_1 := 0, @prev_opt_1 := 0) as init_1
, table_1
order by table_1.opt, table_1.id_row_c
) as t1
inner join (
select @r_id_2 := (case
when @prev_opt_2 = table_2.opt then @r_id_2 + 1
else 1
end) as r_id
, table_2.*
, @prev_opt_2 := table_2.opt as new_opt_2
from (select @r_id_2 := 0, @prev_opt_2 := 0) as init_2, table_2
order by table_2.opt, table_2.id_row_f
) as t2 on t1.opt = t2.opt and t1.r_id = t2.r_id
See the result at SQL Fiddle.
The explanation
Let's take the first subquery:
select @r_id_1 := (case
when @prev_opt_1 = table_1.opt then @r_id_1 + 1
else 1
end) as r_id
, table_1.*
, @prev_opt_1 := table_1.opt as new_opt_1
from (select @r_id_1 := 0, @prev_opt_1 := 0) as init_1
, table_1
order by table_1.opt, table_1.id_row_c
In the from clause of this query, I'm declaring two user variables and initializing them to zero. The @r_id_1 variable increases by one if the previous value of @prev_opt_1 equals the current value of opt, or resets to 1 if the value is different. The variable @prev_opt_1 takes the value of the opt column after the @r_id_1 variable is set. This means that, within each opt value, the @r_id_1 variable holds an increasing row number.
The second subquery does exactly the same for the other table.
Finally, the outermost query joins both subqueries on opt and the increasing id.
Take the time to understand what's going on behind the scenes (execute each subquery separately and see what happens).
As I said, this solution is "artificial": it's a way to get what you need, but to avoid this kind of dirty and rather complex workaround, you need to rethink your tables and make them easier to relate to each other.
Hope this helps
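
On engines with window functions (MySQL 8.0 and later, for example), the same pairing can be written with ROW_NUMBER() instead of user variables. The sketch below is an alternative under stated assumptions, not the answer's own method: it runs on Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer) so that it is self-contained, and the helper names build_db and pair_rows are hypothetical.

```python
import sqlite3

def build_db():
    """Load the question's two sample tables into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_1 (ID_C TEXT, ID_ROW_C INTEGER, OPT INTEGER);
        CREATE TABLE table_2 (ID_F TEXT, ID_ROW_F INTEGER, OPT INTEGER);
        INSERT INTO table_1 VALUES ('C', 1, 10), ('C', 2, 10);
        INSERT INTO table_2 VALUES ('F', 3, 10), ('F', 4, 10);
    """)
    return conn

def pair_rows(conn):
    """Number the rows inside each OPT group on both sides, then join on (OPT, row number)."""
    sql = """
        SELECT t1.ID_C, t1.ID_ROW_C, t1.OPT, t2.ID_F, t2.ID_ROW_F
        FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY OPT ORDER BY ID_ROW_C) AS rn
              FROM table_1) AS t1
        JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY OPT ORDER BY ID_ROW_F) AS rn
              FROM table_2) AS t2
          ON t1.OPT = t2.OPT AND t1.rn = t2.rn
        ORDER BY t1.ID_ROW_C
    """
    return conn.execute(sql).fetchall()

for row in pair_rows(build_db()):
    print(row)
```

The pairing is still arbitrary: it depends only on the ORDER BY inside each window, so the advice about rethinking the tables applies here as well.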

Determine ranking with single mysql query

I am selecting a set of items from my table and determining their ranking to display on my page. My code for selecting the items:
<?php
$attra_query = mysqli_query($link, "select * from table WHERE category ='4'");
if (mysqli_num_rows($attra_query) > 0) {
while ($attra_data = mysqli_fetch_array($attra_query, MYSQLI_ASSOC)) {
?>
In the while loop I determine the ranking for each of those items like so:
SELECT COUNT(mi.location) + 1 rank
FROM table m
LEFT JOIN (
SELECT id,location,country, ROUND(COALESCE(total_rating/total_rating_amount,0),10) rating_per_vote
FROM table WHERE category = '4'
) mi
ON mi.location = m.location
AND mi.country = m.country
AND mi.rating_per_vote > ROUND(COALESCE(m.total_rating/m.total_rating_amount,0),10)
WHERE m.id = '$attra_id';
I figure this is highly inefficient, is there a way to combine the 2 queries into a single one so I don't have to run the ranking query for each item separately ?
//EDIT
Sample data:
id | location | country | category | total_rating | total_rating_amount
1  | berlin   | DE      | 4        | 12           | 2
2  | munich   | DE      | 4        | 9            | 1
The vote system is 1-10 points. In the sample data, berlin has received a total rating of 12 from 2 votes and munich a total rating of 9 from 1 vote, so berlin has a rating of 6/10 and munich a rating of 9/10. Munich should therefore be ranked #1.
SELECT COUNT(m.id) rank, m.id
FROM
(SELECT * FROM table WHERE category = '4') m
LEFT JOIN (
SELECT id,location,country, ROUND(COALESCE(total_rating/total_rating_amount,0),10) rating_per_vote
FROM table WHERE category = '4'
) mi
ON (mi.location = m.location
AND mi.country = m.country
AND mi.rating_per_vote > ROUND(COALESCE(m.total_rating/m.total_rating_amount,0),10))
OR mi.id=m.id
GROUP BY m.id
This should do I suppose. I don't know if this is the best possible solution.
In MySQL, you can do the ranking using variables. It is a bit hard to tell what you want to rank by from your query, but it would be something like this:
select t.*, (@rn := @rn + 1) as ranking
from table t cross join
(select @rn := 0) vars
where category = '4'
order by rating_per_vote;
If you provide sample data and desired results, it would be possible to refine this solution.
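
With the sample data from the question's edit, one way to get every item's rank in a single statement is a correlated count of the items that score higher. The sketch below makes several assumptions: the table is called items (the question only uses the placeholder name table), ranking is done across the whole category rather than per location, and it runs on Python's built-in sqlite3 module rather than MySQL. The helper names build_db and ranking are hypothetical.

```python
import sqlite3

def build_db():
    """Load the two sample rows from the question into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, location TEXT, country TEXT,
                            category INTEGER, total_rating INTEGER,
                            total_rating_amount INTEGER);
        INSERT INTO items VALUES
            (1, 'berlin', 'DE', 4, 12, 2),
            (2, 'munich', 'DE', 4, 9, 1);
    """)
    return conn

def ranking(conn, category):
    """Rank = 1 + number of items in the same category with a higher rating per vote."""
    sql = """
        SELECT m.id, m.location,
               1 + (SELECT COUNT(*)
                    FROM items AS o
                    WHERE o.category = m.category
                      -- * 1.0 forces real division; NULLIF guards against zero votes
                      AND COALESCE(o.total_rating * 1.0 / NULLIF(o.total_rating_amount, 0), 0)
                        > COALESCE(m.total_rating * 1.0 / NULLIF(m.total_rating_amount, 0), 0)
                   ) AS ranking
        FROM items AS m
        WHERE m.category = ?
        ORDER BY ranking, m.id
    """
    return conn.execute(sql, (category,)).fetchall()

for row in ranking(build_db(), 4):
    print(row)
```

Items with equal ratings share a rank under this scheme, which matches the COUNT(...) + 1 logic of the per-item query in the question.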

Join Table and Select Highest Date Value

Here is the query that I run
SELECT cl.cl_id, cc_rego, cc_model, cl_dateIn, cl_dateOut
FROM courtesycar cc LEFT JOIN courtesyloan cl
ON cc.cc_id = cl.cc_id
Results:
1 NXI955 Prado 2013-10-24 11:48:38 NULL
2 RJI603 Avalon 2013-10-24 11:48:42 2013-10-24 11:54:18
3 RJI603 Avalon 2013-10-24 12:01:40 NULL
The result I want is one row per cc_rego value, showing the most recent cl_dateIn (only rows 1 and 3 should be displayed).
I've tried using MAX on the date with a GROUP BY clause, but it combines rows 2 and 3, showing the highest value of both dateIn and dateOut.
I resolved the problem.
Instead of using a left join, I added a condition in the where clause that compares cl_dateIn with a subquery returning the MAX of cl_dateIn:
SELECT cll.cl_id, cc.cc_id, cc_rego, cc_model, cll.cl_dateIn, cll.cl_dateOut
FROM courtesycar cc, courtesyloan cll
WHERE cl_dateIn = (
SELECT MAX( cl.cl_dateIn )
FROM courtesyloan cl
WHERE cl.cc_id = cc.cc_id )
AND cc.cc_id = cll.cc_id
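
The same correlated-MAX filter can be written with an explicit JOIN and checked against the three result rows shown in the question. The sketch below is an illustration under assumptions: the cc_id values 1 and 2 are made up (the question does not show them), and it runs on Python's built-in sqlite3 module rather than MySQL. The helper names build_db and latest_loans are hypothetical.

```python
import sqlite3

def build_db():
    """Rebuild the question's rows; the cc_id values are assumed, not from the source."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE courtesycar (cc_id INTEGER PRIMARY KEY, cc_rego TEXT, cc_model TEXT);
        CREATE TABLE courtesyloan (cl_id INTEGER PRIMARY KEY, cc_id INTEGER,
                                   cl_dateIn TEXT, cl_dateOut TEXT);
        INSERT INTO courtesycar VALUES (1, 'NXI955', 'Prado'), (2, 'RJI603', 'Avalon');
        INSERT INTO courtesyloan VALUES
            (1, 1, '2013-10-24 11:48:38', NULL),
            (2, 2, '2013-10-24 11:48:42', '2013-10-24 11:54:18'),
            (3, 2, '2013-10-24 12:01:40', NULL);
    """)
    return conn

def latest_loans(conn):
    """Keep, for each car, only the loan whose cl_dateIn is the car's maximum."""
    sql = """
        SELECT cl.cl_id, cc.cc_rego, cc.cc_model, cl.cl_dateIn, cl.cl_dateOut
        FROM courtesycar AS cc
        JOIN courtesyloan AS cl ON cl.cc_id = cc.cc_id
        WHERE cl.cl_dateIn = (SELECT MAX(x.cl_dateIn)
                              FROM courtesyloan AS x
                              WHERE x.cc_id = cc.cc_id)
        ORDER BY cl.cl_id
    """
    return conn.execute(sql).fetchall()

for row in latest_loans(build_db()):
    print(row)
```

Note that, like the query above, this inner join drops cars that have no loans at all.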

Select Count of Rows with Joined Tables

I have two tables with a one to many relationship. I join the tables by an id column. My problem is that I need a count of all matching entries from the second (tablekey_id) table but I need the information from the row marked with the boolean is_basedomain. As a note there is only one row with is_basedomain = 1 per set of rows with the same tablekey_id.
Table: tablekey
id linkdata_id timestamp
22 9495028175 2013-03-10 01:13:46
23 8392740179 2013-03-10 21:23:25
Table: searched_domains.
NOTE: tablekey_id is the foreign key to the id in the tablekey table.
id tablekey_id domain is_basedomain
1 22 somesite.com 1
2 22 yahoo.com 0
3 23 red.com 1
4 23 blue.com 0
5 23 green.com 0
Here's the query I'm working with. I was trying to use a subquery, but I can't seem to select only the count for the current tablekey_id, so this does not work.
SELECT `tablekey_id`, `linkdata_id`, `timestamp`, `domain`, `is_basedomain`,
(SELECT COUNT(1) AS other FROM `searched_domains` AS dd
ON dd.tablekey_id = d.tablekey_id GROUP BY `tablekey_id`) AS count
FROM `tablekey` AS k
JOIN `searched_domains` AS d
ON k.id = d.tablekey_id
WHERE `is_basedomain` = 1 GROUP BY `tablekey_id`
The result that I would like to get back is:
tablekey_id linkdata_id timestamp domain is_basedomain count
22 9495028175 2013-03-10 01:13:46 somesite.com 1 2
23 8392740179 2013-03-10 21:23:25 red.com 1 3
Can anyone help me get this into one query?
You can treat the searched_domains rows that have is_basedomain=1 as a separate table in the query and join it with another instance of searched_domains (to get the count):
SELECT
d.tablekey_id,
k.linkdata_id,
k.timestamp,
d.domain,
d.is_basedomain,
COUNT(*) as 'count'
FROM
tablekey AS k
join searched_domains AS d on d.tablekey_id=k.id
join searched_domains AS d2 on d2.tablekey_id=d.tablekey_id
WHERE
d.is_basedomain = 1
GROUP BY
d.tablekey_id,
k.linkdata_id,
k.timestamp,
d.domain,
d.is_basedomain
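
The self-join approach can be checked against the question's sample data and desired result. The sketch below is a minimal test harness under assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, names the count column domain_count to avoid the count keyword, and the helper names build_db and base_domain_counts are hypothetical.

```python
import sqlite3

def build_db():
    """Load the question's sample rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tablekey (id INTEGER PRIMARY KEY, linkdata_id INTEGER, timestamp TEXT);
        CREATE TABLE searched_domains (id INTEGER PRIMARY KEY, tablekey_id INTEGER,
                                       domain TEXT, is_basedomain INTEGER);
        INSERT INTO tablekey VALUES
            (22, 9495028175, '2013-03-10 01:13:46'),
            (23, 8392740179, '2013-03-10 21:23:25');
        INSERT INTO searched_domains VALUES
            (1, 22, 'somesite.com', 1),
            (2, 22, 'yahoo.com', 0),
            (3, 23, 'red.com', 1),
            (4, 23, 'blue.com', 0),
            (5, 23, 'green.com', 0);
    """)
    return conn

def base_domain_counts(conn):
    """d selects the base-domain row; d2 fans out to every domain of the same tablekey."""
    sql = """
        SELECT d.tablekey_id, k.linkdata_id, k.timestamp, d.domain, d.is_basedomain,
               COUNT(*) AS domain_count
        FROM tablekey AS k
        JOIN searched_domains AS d ON d.tablekey_id = k.id
        JOIN searched_domains AS d2 ON d2.tablekey_id = d.tablekey_id
        WHERE d.is_basedomain = 1
        GROUP BY d.tablekey_id, k.linkdata_id, k.timestamp, d.domain, d.is_basedomain
        ORDER BY d.tablekey_id
    """
    return conn.execute(sql).fetchall()

for row in base_domain_counts(build_db()):
    print(row)
```

The count comes from d2, so it includes the base-domain row itself, which is what the desired result in the question shows (2 and 3).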
You have an error in the subquery: it uses ON where it should use WHERE.
Try this:
SELECT `tablekey_id`, `linkdata_id`, `timestamp`, `domain`, `is_basedomain`,
(SELECT COUNT(1) AS other FROM `searched_domains` AS dd
where dd.tablekey_id = d.tablekey_id GROUP BY `tablekey_id`) AS count
FROM `tablekey` AS k
JOIN `searched_domains` AS d
ON k.id = d.tablekey_id
WHERE `is_basedomain` = 1 GROUP BY `tablekey_id`
DEMO HERE
There is no reason to use subquery, or what is your opinion?
SELECT
`tablekey_id`,
`linkdata_id`,
`timestamp`,
`domain`,
`is_basedomain`,
COUNT(*) as count
FROM
`tablekey` AS k ,
`searched_domains` AS d
WHERE
k.id = d.tablekey_id AND
`is_basedomain` = 1
GROUP BY
`tablekey_id`,
`linkdata_id`,
`timestamp`,
`domain`,
`is_basedomain`
If you want only latest timestamp use MAX(timestamp) as timestamp and remove it from group by.