I want to retrieve information, specifically the names and the total number of unique activities of an athelete (which i've renamed "sports Variety". I want to rank first the athlete who does more activities, but I'm not sure how to approach this.
I also know I should use DISTINCT somewhere, but not sure where to put it. I also don't know how to sort by unique/most activities.
SELECT athlete.name, COUNT(athlete.name) AS 'Sports Variety' FROM athlete
INNER JOIN athlete ON athlete.id = training_session.activity GROUP BY
training_session.activity HAVING COUNT(athlete) > 0
I expect it to be 6 rows to be returned.
assuming you have two table one name athlete with athlete name and one name training_session for the activities related to each athletes
SELECT athlete.name, COUNT(*) AS 'Sports Variety'
FROM athlete
INNER JOIN training_session ON athlete.id = training_session.athlete_id
GROUP BY athlete.name
If you want to find the athlete having the most activities, you could try a LIMIT query:
SELECT
a.id,
a.name,
COUNT(*) AS num_activities
FROM athlete a
INNER JOIN training_session ts
ON a.id = ts.athelete_id
GROUP BY
a.id,
a.name
ORDER BY
COUNT(*) DESC
LIMIT 1;
I see a handful of syntax problems with your current query, such as the second table name is the same as the first (not what you want here). Note that my query doesn't handle the possibility that two or more athletes might happen to have the same amount of activity. To resolve that, we could either find something else to rank, or restructure the query.
Related
I encountered a problem on a database I am working with. I have a table of counsels which may hold repeating values, but their is an enrolment number filed which is unique and can be used to fetch them. However, I want to join from a cases_counsel table on the "first" unique value of the counsel table that matches that column on the cases counsel table.
I want to list the cases belonging to a particular counsel using the enrolment_number as the counsel_id on the cp_cases_counsel table. That means I want to pick just a distinct value of a counsel, then use it to join the cp_cases_counsel table and also return the count for such.
However, I keep getting duplicates. This was the mysql query I tried
SELECT T.suitno, T.counsel_id, COUNT(*) as total from cp_cases_counsel T
INNER JOIN (SELECT
enrolment_number as id, MIN(counsel)
FROM
cp_counsel
GROUP BY
enrolment_number
) A
ON A.id = T.counsel_id
GROUP BY T.suitno, T.counsel_id
and
SELECT enrolment_number as id, MIN(counsel) as counsel, COUNT(*) as total FROM cp_counsel
JOIN cp_cases_counsel ON cp_cases_counsel.counsel_id = cp_counsel.enrolment_number
GROUP BY enrolment_number
For the second query, it's joining twice and I am having like double of what I am supposed to get.
The columns that you want in the results are councel (actually only one of all its values) from cp_counsel and counsel_id from cp_cases_counsel, so you must group by them and select them:
SELECT a.counsel, t.counsel_id, COUNT(*) AS total
FROM cp_cases_counsel t
INNER JOIN (
SELECT enrolment_number, MIN(counsel) AS counsel
FROM cp_counsel
GROUP BY enrolment_number
) a ON a.enrolment_number = t.counsel_id
GROUP BY a.counsel, t.counsel_id;
I have a table with a recursive 1:M
Every customer may have been referred by a previous customer, now I want to group the customers who have made referrals, so I can display which ones have referred the most.
and I would want a query which gave following output
I tried
SELECT count(*) AS Count_of_referrals, referral_id
FROM Customer
GROUP BY referral_id
which gives the amount of times each referral_id is mentioned, but I can't find a way to link this back to the actual customer who referred them.
Appreciate any help I can get here.. :-)
Well, you're close. You just need to self join the customer table to get the name of the referral_id in there.
SELECT c1.referral_id,
c2.name,
count(*) count_of_referrals
FROM customer c1
INNER JOIN customer c2
ON c2.customer_id = c1.referral_id
GROUP BY c1.referral_id,
c2.name;
I have an SQL query that needs to perform multiple inner joins, as follows:
SELECT DISTINCT adv.Email, adv.Credit, c.credit_id AS creditId, c.creditName AS creditName, a.Ad_id AS adId, a.adName
FROM placementlist pl
INNER JOIN
(SELECT Ad_id, List_id FROM placements) AS p
ON pl.List_id = p.List_id
INNER JOIN
(SELECT Ad_id, Name AS adName, credit_id FROM ad) AS a
ON ...
(few more inner joins)
My question is the following: How can I optimize this query? I was under the impression that, even though the way I currently query the database creates small temporary tables (inner SELECT statements), it would still be advantageous to performing an inner join on the unaltered tables as they could have about 10,000 - 100,000 entries (not millions). However, I was told that this is not the best way to go about it but did not have the opportunity to ask what the recommended approach would be.
What would be the best approach here?
To use derived tables such as
INNER JOIN (SELECT Ad_id, List_id FROM placements) AS p
is not recommendable. Let the dbms find out by itself what values it needs from
INNER JOIN placements AS p
instead of telling it (again) by kinda forcing it to create a view on the table with the two values only. (And using FROM tablename is even much more readable.)
With SQL you mainly say what you want to see, not how this is going to be achieved. (Well, of course this is just a rule of thumb.) So if no other columns except Ad_id and List_id are used from table placements, the dbms will find its best way to handle this. Don't try to make it use your way.
The same is true of the IN clause, by the way, where you often see WHERE col IN (SELECT DISTINCT colx FROM ...) instead of simply WHERE col IN (SELECT colx FROM ...). This does exactly the same, but with DISTINCT you tell the dbms "make your subquery's rows distinct before looking for col". But why would you want to force it to do so? Why not have it use just the method the dbms finds most appropriate?
Back to derived tables: Use them when they really do something, especially aggregations, or when they make your query more readable.
Moreover,
SELECT DISTINCT adv.Email, adv.Credit, ...
doesn't look to good either. Yes, sometimes you need SELECT DISTINCT, but usually you wouldn't. Most often it is just a sign that you haven't thought your query through.
An example: you want to select clients that bought product X. In SQL you would say: where a purchase of X EXISTS for the client. Or: where the client is IN the set of the X purchasers.
select * from clients c where exists
(select * from purchases p where p.clientid = c.clientid and product = 'X');
Or
select * from clients where clientid in
(select clientid from purchases where product = 'X');
You don't say: Give me all combinations of clients and X purchases and then boil that down so I just get each client once.
select distinct c.*
from clients c
join purchases p on p.clientid = c.clientid and product = 'X';
Yes, it is very easy to just join all tables needed and then just list the columns to select and then just put DISTINCT in front. But it makes the query kind of blurry, because you don't write the query as you would word the task. And it can make things difficult when it comes to aggregations. The following query is wrong, because you multiply money earned with the number of money-spent records and vice versa.
select
sum(money_spent.value),
sum(money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
And the following may look correct, but is still incorrect (it only works when the values happen to be unique):
select
sum(distinct money_spent.value),
sum(distinct money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
Again: You would not say: "I want to combine each purchase with each earning and then ...". You would say: "I want the sum of money spent and the sum of money earned per user". So you are not dealing with single purchases or earnings, but with their sums. As in
select
sum(select value from money_spent where money_spent.userid = user.userid),
sum(select value from money_earned where money_earned.userid = user.userid)
from user;
Or:
select
spent.total,
earned.total
from user
join (select userid, sum(value) as total from money_spent group by userid) spent
on spent.userid = user.userid
join (select userid, sum(value) as total from money_earned group by userid) earned
on earned.userid = user.userid;
So you see, this is where derived tables come into play.
I have to tables in my database, the first one (participants) look just like that:
And I have another called votes in which I can vote for any participants.
So my problem is that I'm trying to get all the votes of each participant but when I execute my query it only retrieves four rows sorted by the COUNT of votes, And the other remaining are not appearing in my query:
SELECT COUNT(DISTINCT `votes`.`id`) AS count_id, participants.name
AS participant_name FROM `participants` LEFT OUTER JOIN `votes` ON
`votes`.`participant_id` = `participants`.`id` GROUP BY votes.participant_id ORDER BY
votes.participant_id DESC;
Retrieves:
I think the problem is that you're grouping by votes.participant_id, rather than participants.id, which limits you to participants with votes, the outer join notwithstanding. Check out http://sqlfiddle.com/#!2/c5d3d/5/0
As what i have understood from the query you gave you were selecting unique id's from the votes table and I assume that your column id is not an identity. but it would be better if that would be an identity? and if so, here is my answer.replace your select with these.
Select count (votes.participant.id) as count_id ,participants.name as participant_name
from participants join votes
on participants.id = vote.participant_id
group by participants.name
order by count_id
just let me know if it works
cheers
I have these tables and queries as defined in sqlfiddle.
First my problem was to group people showing LEFT JOINed visits rows with the newest year. That I solved using subquery.
Now my problem is that that subquery is not using INDEX defined on visits table. That is causing my query to run nearly indefinitely on tables with approx 15000 rows each.
Here's the query. The goal is to list every person once with his newest (by year) record in visits table.
Unfortunately on large tables it gets real sloooow because it's not using INDEX in subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use INDEX already defined on visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, is using non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions and do not sepend on the grouping items). This can give indeterminate (semi-random) results.
Second, ( to avoid the indeterminate results) you have added an ORDER BY inside a subquery which (non-standard or not) is not documented anywhere in MySQL documentation that it should work as expected. So, it may be working now but it may not work in the not so distant future, when you upgrade to MySQL version X (where the optimizer will be clever enough to understand that ORDER BY inside a derived table is redundant and can be eliminated).
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
The: SQL-fiddle
A compound index on (id_people, year) would help efficiency.
A different approach. It works fine if you limit the persons to a sensible limit (say 30) first and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people that have never had a visit, you should try switching the JOIN items in my statement with LEFT JOIN.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0 But this is not the way your example is set up. So you need another way to disambiguate your latest visit, so you just get one row per person. The only trick we have at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.