Merging 3 Tables, Limiting 1 Table With Multiple Fields Needed - mysql

I've been looking into this for a while and am hoping someone can provide some insight. I have 3 tables and I'm grabbing multiple columns from all of them, but for the 3rd I need to limit the output to just the entry with the most recent timestamp, while still displaying multiple columns.
If I have the following data [ Please see SQL Fiddle ]:
http://sqlfiddle.com/#!2/84b91/6
The fiddle is a list of (names) in Table1(users), (job_name,years) in Table2(job), and then (score, timestamp) in Table3(job_details). All linked together by the users id.
I am definitely not great at MySQL. I know I'm missing something, possibly a series of JOINs. I have been able to get Table 1, Table 2 and one column of Table 3 by doing this:
select a.id, a.name, b.job_name, b.years,
(select c.timestamp
from job_details as c
where c.user_id = a.id
order by c.timestamp desc limit 1) score
from users a, job as b where a.id = b.user_id;
At this point, I can get multiple column data on the first two columns, limit the 3rd to one value and sort that value on the last timestamp...
My question is: How does one go about adding a second column to the limit? In the example in the fiddle, I'd like to add the score as well as the timestamp to the output.
I'd like the output to be:
NAME, JOB, YEARS, SCORE, TIMESTAMP. The last two columns would only be the last entry in job_details sorted by the most recent TIMESTAMP.
Please let me know if more information is required! Thank you for your time!
T

Try this:
select a.id, a.name, b.job_name, b.years, c.timestamp, c.score
from users a
INNER JOIN job as b ON a.id = b.user_id
INNER JOIN (SELECT jd.user_id, jd.timestamp, jd.score
FROM job_details as jd
INNER JOIN (select user_id, MAX(timestamp) as tstamp
from job_details
GROUP BY user_id) as max_ts ON jd.user_id = max_ts.user_id
AND jd.timestamp = max_ts.tstamp
) as c ON a.id = c.user_id
;
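The join above can be tried out on a small data set. The following is a minimal sketch, assuming hypothetical sample rows and using Python's built-in sqlite3 module as a stand-in for MySQL; the table and column names follow the question, but the data values are invented for illustration.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE job (user_id INTEGER, job_name TEXT, years INTEGER);
CREATE TABLE job_details (user_id INTEGER, score INTEGER, timestamp TEXT);
INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO job VALUES (1, 'Welder', 3), (2, 'Painter', 5);
INSERT INTO job_details VALUES
  (1, 70, '2014-01-01 10:00:00'),
  (1, 85, '2014-02-01 10:00:00'),
  (2, 60, '2014-01-15 09:00:00');
""")

# Find each user's newest timestamp, then join back to job_details
# so that every column of that newest row (score, timestamp) is available.
latest_per_user = conn.execute("""
SELECT a.name, b.job_name, b.years, c.score, c.timestamp
FROM users AS a
INNER JOIN job AS b ON a.id = b.user_id
INNER JOIN (SELECT jd.user_id, jd.timestamp, jd.score
            FROM job_details AS jd
            INNER JOIN (SELECT user_id, MAX(timestamp) AS tstamp
                        FROM job_details
                        GROUP BY user_id) AS max_ts
                    ON jd.user_id = max_ts.user_id
                   AND jd.timestamp = max_ts.tstamp
           ) AS c ON a.id = c.user_id
ORDER BY a.id
""").fetchall()

for row in latest_per_user:
    print(row)
```

Note that if two job_details rows for the same user share the exact same latest timestamp, this pattern returns both of them.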

Related

SQL Selecting Repeated Rows in a table

I am currently refreshing my SQL knowledge.
I have a table - Sessions. It stores information about user log activity. ie the duration of how long they are logged in for. See the table below.
So I am trying to select all repeated rows from a table (not just validate that repeated rows exist).
So far I have managed to get the output of the entire table, however, I only need the userId and duration columns. How can I go about selecting only these two columns?
I thought it would have been SELECT a.userId instead of a.* etc however I get the error "ambiguous column name: userId". Not sure what is going on. Sorry if it's a stupid question but any help is appreciated. Thanks.
SELECT a.*
FROM sessions a
JOIN ( SELECT userId,duration
FROM sessions
GROUP BY userId
HAVING COUNT(userId) > 1 ) b
ON a.userId = b.userId
ORDER BY userId;
The problem is due to the ORDER BY clause, which does not scope the userId reference to one of the tables. Use this version:
ORDER BY a.userId;
Here is your updated query, with the select clause of the subquery also corrected by removing the incorrect (and unnecessary) reference to duration:
SELECT a.*
FROM sessions a
INNER JOIN
(
SELECT userId
FROM sessions
GROUP BY userId
HAVING COUNT(userId) > 1
) b
ON a.userId = b.userId
ORDER BY
a.userId;
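To return only userId and duration, the same query can qualify both columns with the outer alias. Below is a minimal sketch, assuming a hypothetical sessions table with an id column and invented rows, run through Python's sqlite3 module rather than the asker's actual database.

```python
import sqlite3

# Hypothetical sessions data; the id column and the values are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (id INTEGER PRIMARY KEY, userId INTEGER, duration INTEGER);
INSERT INTO sessions VALUES
  (1, 10, 30), (2, 10, 45), (3, 20, 15), (4, 30, 5), (5, 30, 60);
""")

# The subquery finds users that appear more than once; the outer query
# returns every session row for those users, with both columns qualified.
dupes = conn.execute("""
SELECT a.userId, a.duration
FROM sessions a
INNER JOIN (SELECT userId
            FROM sessions
            GROUP BY userId
            HAVING COUNT(userId) > 1) b
        ON a.userId = b.userId
ORDER BY a.userId, a.id
""").fetchall()

for row in dupes:
    print(row)
```

User 20 has a single session in this sample, so it does not appear in the result.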

How can I speed up a multiple inner join query?

I have two tables. The first table (users) is a simple "id, username" with 100,00 rows and the second (stats) is "id, date, stat" with 20M rows.
I'm trying to figure out which username went up by the most in stat and here's the query I have. On a powerful machine, this query takes minutes to complete. Is there a better way to write it to speed it up?
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON (b.id=a.id)
INNER JOIN stats AS c ON (c.id=a.id)
WHERE b.date = '2016-01-10'
AND c.date = '2016-01-13'
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
The other way I tried, which doesn't seem optimal, is:
SELECT a.id, a.username,
(SELECT b.stat FROM stats AS b WHERE b.id=a.id AND b.date = '2016-01-10') AS start,
(SELECT c.stat FROM stats AS c WHERE c.id=a.id AND c.date = '2016-01-14') AS end,
((SELECT b.stat FROM stats AS b WHERE b.id=a.id AND b.date = '2016-01-10') -
(SELECT c.stat FROM stats AS c WHERE c.id=a.id AND c.date = '2016-01-14')) AS stat_diff
FROM users AS a
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
Introduction
Suppose we rewrite the statement like this:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
And we ensure that:
The users table has an index on the id field.
The stats table has a composite index on (date, id): create index stats_idx_d_i on stats ( date, id );
Then
The database optimizer can use the indexes to select a restricted set of data ('RSD'), that is, only the rows that match the filtered dates. This is fast.
But
You are sorting by a calculated field:
(b.stat - c.stat) AS stat_diff #<-- calculated
ORDER BY stat_diff DESC #<-- this forces it to be calculated
No optimization is possible for this sort, because the value has to be calculated one by one for every row in your RSD.
Conclusion
The question is: how many rows are in your RSD? If there are only a few hundred rows, your query may run fast; otherwise it will be slow.
In any case, make sure the first step of the query (without the sort) is served by an index and not by a full scan. Use the EXPLAIN command to check.
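The indexing advice above can be checked on a toy data set. This is a rough sketch, assuming invented rows and using Python's sqlite3 module in place of MySQL: SQLite's EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN, and its output format differs. The sign convention for stat_diff (later date minus earlier date) is also an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE stats (id INTEGER, date TEXT, stat INTEGER);
-- Composite index from the answer: filter by date first, then match the user id.
CREATE INDEX stats_idx_d_i ON stats (date, id);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO stats VALUES
  (1, '2016-01-10', 100), (1, '2016-01-13', 150),
  (2, '2016-01-10', 200), (2, '2016-01-13', 210);
""")

# b is the earlier snapshot, c the later one; stat_diff is the increase.
query = """
SELECT a.id, a.username, b.stat, c.stat, (c.stat - b.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON b.date = '2016-01-10' AND b.id = a.id
INNER JOIN stats AS c ON c.date = '2016-01-13' AND c.id = a.id
ORDER BY stat_diff DESC
LIMIT 100
"""

ranking = conn.execute(query).fetchall()
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

for row in ranking:
    print(row)
# The last field of each plan row describes one step chosen by the planner.
for step in plan:
    print(step[-1])
```

The plan output shows whether each stats lookup goes through an index or a scan; the sort on the calculated column still has to be done over every matching row.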
All you need to do is help the optimizer. At a bare minimum, have a checklist like the one below:
1. Are my join columns indexed?
2. Are the WHERE clauses sargable?
3. Are there any implicit or explicit conversions?
4. Am I seeing any statistics issues?
One more interesting aspect to look at is how your data is distributed. Once you understand the data, you will be able to interpret the execution plan and alter it as needed.
Example:
Suppose I have a customers table with 100 rows, and each customer has a minimum of 10 orders (up to 10,000 orders in total). If you only need the top 3 orders by date, you don't want a scan of the whole orders table.
In your case, I would not go with the second option, even though the optimizer may choose a good plan for that one as well. I would go with the first approach and see whether the execution time is acceptable. If not, I would go through my checklist and try to tune it further.
The query seems OK; verify your indexes.
Or try this query:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN (select id,stat from stats where date = '2016-01-10') AS b ON (b.id=a.id)
INNER JOIN (select id,stat from stats where date = '2016-01-13') AS c ON (c.id=a.id)
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100

creating a custom column from joining two tables

I am terrible with subqueries, if that is what I need here. First let me show you a preview of my tables and what I'm trying to do.
This is the result I want at the end:
business.name
reviews_count (total count, matching the current query's business_id)
where the b.industry_id matches 7
This is what I'm trying, but I feel stuck and don't know how to match the total count. Let me explain:
select
b.name,
reviews_count as (select count(*) as count from reviews where business_id = b.business_id),
from business as b
left join reviews as r
on r.business_id = b.id
where b.industry_id = 7
The subquery's business_id needs to match the id of the business currently being processed. I hope that makes sense. (reviews_count doesn't exist; I just made it up to use in the output.)
This looks like a job for GROUP BY
SELECT
b.name,
count(distinct r.id) AS reviews_count
FROM
business b
JOIN reviews r ON r.business_id = b.id
WHERE b.industry_id = 7
GROUP BY b.id
That way you can avoid the subquery altogether.
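One thing to be aware of with the GROUP BY approach: an inner JOIN drops businesses that have no reviews at all, whereas a LEFT JOIN keeps them with a count of 0. The sketch below illustrates the difference, assuming hypothetical business and reviews rows and using Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

# Hypothetical data: 'Bakery' is in industry 7 but has no reviews.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, industry_id INTEGER);
CREATE TABLE reviews (id INTEGER PRIMARY KEY, business_id INTEGER);
INSERT INTO business VALUES (1, 'Cafe', 7), (2, 'Bakery', 7), (3, 'Garage', 9);
INSERT INTO reviews VALUES (1, 1), (2, 1), (3, 3);
""")

# Inner join: only businesses with at least one review are counted.
counts_inner = conn.execute("""
SELECT b.name, COUNT(r.id) AS reviews_count
FROM business b
JOIN reviews r ON r.business_id = b.id
WHERE b.industry_id = 7
GROUP BY b.id, b.name
ORDER BY b.id
""").fetchall()

# Left join: businesses without reviews are kept; COUNT(r.id) ignores NULLs.
counts_left = conn.execute("""
SELECT b.name, COUNT(r.id) AS reviews_count
FROM business b
LEFT JOIN reviews r ON r.business_id = b.id
WHERE b.industry_id = 7
GROUP BY b.id, b.name
ORDER BY b.id
""").fetchall()

print(counts_inner)
print(counts_left)
```

Which variant is appropriate depends on whether businesses with zero reviews should show up in the output.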

Select corresponding records from another table, but just the last one

I have 2 tables: authors and authors_sales.
The table authors_sales is updated each hour, so it is huge.
What I need is to create a ranking. For that I need to join both tables (authors has all the author data while authors_sales has just sales numbers).
How can I create a final table with the ranking of authors, ordered by sales?
The common key is authorId.
I tried with LEFT JOIN but I must be doing something wrong, because I get all of the authors_sales table, not just the last entry.
Any tip in the right direction is much appreciated.
If you're looking for aggregate data of the sales, you'd want to join the tables, group by the authorId. Something like...
select authors.author_id, SUM(author_sales.sale_amt) as total_sales
from authors
inner join author_sales on author_sales.author_id = authors.author_id
group by authors.author_id
order by total_sales desc
However (I couldn't distinguish from your question whether the above scenario or next is true), if you're only looking for the max value of the author_sales table (if the data in this table is already aggregated), you can join on a nested query for author_sales, such as...
select authors.author_id, t.sale_amt
from authors
inner join author_sales t
on t.author_id = authors.author_id
inner join
(select author_id, max(some_identifier) as latest
from author_sales
group by author_id) m
on m.author_id = t.author_id and m.latest = t.some_identifier
order by t.sale_amt desc
The some_identifier would be how you determine which record is the most recent for author_sales, whether it is a timestamp of when it was inserted or an incremental primary key, however it is set up. Depending on if the data in author_sales is aggregated already, one of these two should do it for you...
select a.*, sum(b.sales)
from authors as a
inner join authors_sales as b
using (authorId)
group by b.authorId
order by sum(b.sales) desc;
/* assuming column sales = total for each row in authors_sales */

ordering a table in Mysql, according to another, but without seeing repetitive rows of the first table

In MySql, I have two tables, A and B.
A has as columns A.id, B has as columns B.id and B.aid.
For each row of A I have many rows of B, and of course the value of B.aid = A.id.
Now I need to get a list of the values in A, but I need to order them, according to B.
In particular if I have two rows in A: a1 and a2. Each will have a series of rows in B:
b11, b12, b13, ...
and
b21, b22, b23, ...
Now I need to order the A from the one connected with the highest b.id to the one with the second highest, and so on. (of course having one row appearing only once).
I tried this:
SELECT a.id FROM a, b WHERE a.id=b.aid ORDER BY b.id DESC
I did indeed get all the values in the right order, but each value of A appeared n times, where n is the number of rows in B related to that row in A.
How do I avoid that, so that I get each value only once?
I am considering taking the whole list and then eliminating all the non-unique values, but I fear that once the website becomes big this might not be feasible anymore.
(In case you wonder, this is to program a facsimile of a discussion board: table A is the thread, table B is the entry, and I want a page where all the threads are listed, ordered so that the thread with the most recent action comes first.)
Many thanks,
Pietro
P.S. MySql is not my thing, so please do spell out the solution :)
UPDATE:
The actual code is more complex, as it also involves users, and similar. So I am looking at something like:
SELECT DISTINCT a.id, a.question, a.roundid, a.phase, users.username, users.id
FROM a, users, b
WHERE a.phase = 0 AND users.id = a.usercreatorid AND b.experimentid = a.id
ORDER BY b.id DESC
I tried DISTINCT, as suggested below, but it does not work. I do get all the threads (i.e. questions) uniquely, but they are not perfectly ordered. I do not know why, but it seems that it is not choosing a random row from b, since the result goes generally in the right direction, yet it is not the row with the max(b.id) either. So DISTINCT does not sort between rows in the correct way. I will now look at the other solutions proposed.
select * from parent a
order by ( select max(id) from child b where b.parent_id = a.id);
NOTE WELL: this is not a join, so you'll get all rows in a, not just those that have a child in b.
You can see why if you do this:
select *, ( select max(id) from child b where b.parent_id = a.id)
from parent a
order by ( select max(id) from child b where b.parent_id = a.id);
(null sorts before anything else in an ascending sort.)
This avoids grouping or distincting, and has the advantage that the SQL pretty clearly states your intent, not a workaround to get at your intent, which makes it more self-commenting than some alternatives.
As I understand it, you want the a.id values, ordered by the most recent corresponding b.id value.
Where you have a 1->many relation and need that sort of info, you're typically looking at a GROUP BY to aggregate the data, or a subquery for more complex criteria.
So, something like this should do it, using a group by:
SELECT
a.id, a.question, a.roundid, a.phase,
users.username, users.id,
MAX(b.id) AS latest
FROM
a, users, b
WHERE
a.phase = 0 AND users.id = a.usercreatorid AND b.experimentid = a.id
GROUP BY a.id
ORDER BY latest DESC
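The GROUP BY approach can be tried on the simplified a/b schema from the question. This is a minimal sketch with invented rows, run through Python's sqlite3 module instead of MySQL; the question column is borrowed from the asker's update, and everything else is an assumption for illustration.

```python
import sqlite3

# Hypothetical threads (a) and entries (b); b.aid points at a.id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, question TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, aid INTEGER);
INSERT INTO a VALUES (1, 'first'), (2, 'second'), (3, 'third');
INSERT INTO b VALUES (1, 1), (2, 2), (3, 1), (4, 3), (5, 2);
""")

# One row per thread, ordered by the id of its most recent entry.
ordered = conn.execute("""
SELECT a.id, MAX(b.id) AS latest
FROM a
INNER JOIN b ON b.aid = a.id
GROUP BY a.id
ORDER BY latest DESC
""").fetchall()

for row in ordered:
    print(row)
```

Each thread appears exactly once, and the thread whose newest entry has the highest b.id comes first. Threads with no entries are left out by the inner join.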
You want to use the DISTINCT keyword.
SELECT DISTINCT a.id FROM a, b WHERE a.id=b.aid ORDER BY b.id DESC