MySQL subquery field unknown

I have read many posts that offer a solution for this, but none of them work in my case. What am I doing wrong?
This gives me the SUM of scores for every user, which is the first part (aggregated data):
SELECT user_id, sum(score) as total_user_score
FROM (
SELECT comments_proper.user_id, comments_proper.score
FROM assignment_2.comments_proper
) AS rsch
GROUP BY user_id;
However, I want only 2 records which contain the min and the max score values.

What am I doing wrong?
Oh dear, where to begin.
I have read many posts
You should have been paying attention to which ones got upvotes and good answers, and which were downvoted or closed. The former would have included the table structures, examples of input and expected output, and an unambiguous question.
I want only 2 records
Is that from the source data set or from the aggregated data set?
The latter is a slightly tricky problem that has been asked and answered many times here on SO; there are multiple solutions with different performance characteristics. There's even a chapter in the manual covering just this question. The current content at that link uses subqueries to identify the min/max value (it replaced an earlier version of the documentation that explained the max-concat trick), but it's also possible to use variables to identify the right candidate rows in a subquery, or to use sorting.
However, the SQL you've shown us here has very little to do with solving the problem you describe, and is very badly written.
I won't provide examples of every solution, but this will solve your problem...
(SELECT user_id, SUM(score) AS total_user_score
FROM assignment_2.comments_proper
GROUP BY user_id
ORDER BY SUM(score) LIMIT 1)
UNION
(SELECT user_id, SUM(score) AS total_user_score
FROM assignment_2.comments_proper
GROUP BY user_id
ORDER BY SUM(score) DESC LIMIT 1)
Update: I hadn't tested the above. I did test this:
SELECT *
FROM (
SELECT user_id, SUM(score)
FROM assignment_2.comments_proper
GROUP BY user_id
ORDER BY SUM(score) LIMIT 0,1
) as lowest
UNION ALL
SELECT *
FROM (
SELECT user_id, SUM(score)
FROM assignment_2.comments_proper
GROUP BY user_id
ORDER BY SUM(score) DESC LIMIT 0,1
) as highest
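To see the tested UNION ALL approach in action, here is a small sketch that runs the same query shape through Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here; assumptions: the assignment_2 schema prefix is dropped, a hypothetical kind column labels the two rows, and the sample rows are invented.

```python
import sqlite3

# Same shape as the tested MySQL query: one derived table picks the lowest
# total, the other picks the highest, and UNION ALL stacks the two rows.
QUERY = """
SELECT 'min' AS kind, user_id, total FROM (
    SELECT user_id, SUM(score) AS total
    FROM comments_proper
    GROUP BY user_id
    ORDER BY SUM(score) LIMIT 1
) AS lowest
UNION ALL
SELECT 'max', user_id, total FROM (
    SELECT user_id, SUM(score) AS total
    FROM comments_proper
    GROUP BY user_id
    ORDER BY SUM(score) DESC LIMIT 1
) AS highest
"""


def min_max_users(rows):
    """Return {'min': (user_id, total), 'max': (user_id, total)}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE comments_proper (user_id INTEGER, score INTEGER)")
    con.executemany("INSERT INTO comments_proper VALUES (?, ?)", rows)
    result = {kind: (user_id, total)
              for kind, user_id, total in con.execute(QUERY)}
    con.close()
    return result


# Hypothetical data: user 1 totals 5, user 2 totals 12, user 3 totals -2.
sample = [(1, 2), (1, 3), (2, 10), (2, 2), (3, -2)]
res = min_max_users(sample)
print(res["min"], res["max"])  # (3, -2) (2, 12)
```

If two users tie on the lowest or highest total, LIMIT 1 returns just one of them, and which one is not defined.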

Your query has some syntax problems and an overly complex calculation for the aggregated result. Anyway, in the query below the cp1.* columns hold the row with each user's minimum score, and the cp2.* columns hold the row with the maximum score.
If you need the min and max rows on the same result row, you can use a couple of inner joins based on the aggregated result:
select cp1.* , cp2.*
from ( SELECT cp.user_id, sum(cp.score), min(cp.score) min_score, max(cp.score) max_score
FROM assignment_2.comments_proper cp
group by cp.user_id ) t
inner join assignment_2.comments_proper cp1 on cp1.user_id = t.user_id
and cp1.score = t.min_score
inner join assignment_2.comments_proper cp2 on cp2.user_id = t.user_id
and cp2.score = t.max_score
Otherwise, if you want the result in two rows, one for the min and one for the max:
select 'min' , cp1.*
from ( SELECT cp.user_id, sum(cp.score), min(cp.score) min_score, max(cp.score) max_score
FROM assignment_2.comments_proper cp
group by cp.user_id ) t
inner join assignment_2.comments_proper cp1 on cp1.user_id = t.user_id
and cp1.score = t.min_score
union
select 'max' , cp2.*
from ( SELECT cp.user_id, sum(cp.score), min(cp.score) min_score, max(cp.score) max_score
FROM assignment_2.comments_proper cp
group by cp.user_id ) t
inner join assignment_2.comments_proper cp2 on cp2.user_id = t.user_id
and cp2.score = t.max_score
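A sketch of the two-row variant, run against SQLite through Python's sqlite3 as a stand-in for MySQL. Assumptions: a hypothetical id column is added to comments_proper so individual rows can be told apart, the unused sum(cp.score) column is left out of the derived table, and the rows are invented.

```python
import sqlite3

# Each half joins the per-user MIN/MAX back to the base table to recover the
# full row; the 'min'/'max' label stops UNION from merging the two halves when
# a user's min and max come from the same row.
QUERY = """
SELECT 'min' AS kind, cp1.id, cp1.user_id, cp1.score
FROM (SELECT user_id, MIN(score) AS min_score, MAX(score) AS max_score
      FROM comments_proper GROUP BY user_id) t
JOIN comments_proper cp1 ON cp1.user_id = t.user_id AND cp1.score = t.min_score
UNION
SELECT 'max', cp2.id, cp2.user_id, cp2.score
FROM (SELECT user_id, MIN(score) AS min_score, MAX(score) AS max_score
      FROM comments_proper GROUP BY user_id) t
JOIN comments_proper cp2 ON cp2.user_id = t.user_id AND cp2.score = t.max_score
"""


def min_max_rows(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE comments_proper "
                "(id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER)")
    con.executemany("INSERT INTO comments_proper VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    # Sort in Python (by user_id, then label) so the output order is predictable.
    return sorted(result, key=lambda r: (r[2], r[0]))


# Hypothetical rows (id, user_id, score); user 2 has a single comment.
sample = [(1, 1, 2), (2, 1, 7), (3, 1, 4), (4, 2, 5)]
for row in min_max_rows(sample):
    print(row)
```

User 2's only comment shows up twice, once labelled min and once max. Ties on the minimum or maximum score return one row per tied comment.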

Related

How to optimize a MySQL query that brings back a huge quantity of rows?

I really need your help. I'm doing an assignment for my university, and before coming here I read a lot of the MySQL documentation and searched and searched, but none of it helped me with my SQL query. I have this query:
SELECT a.nome, COUNT(*)
FROM publ p JOIN auth a on p.pubid = a.pubid
WHERE p.pubid IN (SELECT pubid
FROM auth
GROUP BY pubid
HAVING COUNT(*) < 3) -- I also have to run this with the values 2, 4 and 5,
GROUP BY a.nome -- in separate queries.
ORDER BY COUNT(*) DESC, a.nome ASC
I tried adding an index for the column in the WHERE clause, but I never get the results; it takes too long. What can I do to make my query return the results faster? Thank you for the help.
I would create these indexes and reorder the query:
CREATE INDEX publ_pubid ON publ(pubid);
CREATE INDEX auth_pubid ON auth(pubid, nome);
SELECT a.nome, COUNT(*)
FROM (
SELECT pubid
FROM auth
GROUP BY pubid
HAVING COUNT(*) < 3
) L
LEFT JOIN publ p on L.pubid=p.pubid
JOIN auth a on p.pubid = a.pubid
GROUP BY a.nome
ORDER BY COUNT(*) DESC, a.nome ASC;
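One way to check that the reordered query still returns the same answer as the original is to run both side by side. This sketch uses Python's sqlite3 as a stand-in for MySQL, with invented rows, the threshold passed as a parameter (so the 2, 3, 4 and 5 variants are one query), and the join written with the p alias. It only compares results; it says nothing about MySQL's execution plan or speed.

```python
import sqlite3

ORIGINAL = """
SELECT a.nome, COUNT(*)
FROM publ p JOIN auth a ON p.pubid = a.pubid
WHERE p.pubid IN (SELECT pubid FROM auth GROUP BY pubid HAVING COUNT(*) < ?)
GROUP BY a.nome
ORDER BY COUNT(*) DESC, a.nome ASC
"""

# Reordered form: compute the qualifying pubids first, then join outward.
REORDERED = """
SELECT a.nome, COUNT(*)
FROM (SELECT pubid FROM auth GROUP BY pubid HAVING COUNT(*) < ?) L
LEFT JOIN publ p ON L.pubid = p.pubid
JOIN auth a ON p.pubid = a.pubid
GROUP BY a.nome
ORDER BY COUNT(*) DESC, a.nome ASC
"""


def setup():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE publ (pubid INTEGER);
        CREATE TABLE auth (pubid INTEGER, nome TEXT);
        CREATE INDEX publ_pubid ON publ(pubid);
        CREATE INDEX auth_pubid ON auth(pubid, nome);
    """)
    con.executemany("INSERT INTO publ VALUES (?)", [(1,), (2,), (3,), (4,)])
    # Hypothetical authorships: pub 1 has 2 authors, pub 3 has 3, pubs 2 and 4 have 1.
    con.executemany("INSERT INTO auth VALUES (?, ?)", [
        (1, "Ana"), (1, "Bruno"), (2, "Ana"),
        (3, "Ana"), (3, "Bruno"), (3, "Carla"), (4, "Carla"),
    ])
    return con


con = setup()
for limit in (2, 3, 4, 5):
    assert con.execute(ORIGINAL, (limit,)).fetchall() == \
        con.execute(REORDERED, (limit,)).fetchall()
print(con.execute(REORDERED, (3,)).fetchall())  # [('Ana', 2), ('Bruno', 1), ('Carla', 1)]
```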

How to select comment count, Sum of votes, and whether active user has voted

I'm having trouble structuring my MySQL query to return an accurate comment count, sum of votes, and the active user's vote.
My tables are:
wall_posts ( id, message, username, etc )
comments ( id, wall_id, username, text, etc )
votes ( id, wall_id, vote (+1 or -1), username )
My query looks like this
SELECT
wall_posts.*,
COUNT( comments.wall_id ) AS comment_count,
COALESCE( SUM( v1.vote ), 0 ) AS vote_tally,
v2.vote
FROM
wall_posts
LEFT JOIN comments ON wall_posts.id = comments.wall_id
LEFT JOIN votes v1 ON wall_posts.id = v1.wall_id
LEFT JOIN votes v2 ON wall_posts.id = v2.wall_id AND v2.username=:username
WHERE
symbol = :symbol
GROUP BY
wall_posts.id
ORDER BY
date DESC
LIMIT 15
It always returns the correct value for the active user's vote (+1 or -1), or NULL if they haven't voted. If there are no comments on an item, the total vote sum is correct. If there are any comments, though, the vote sum always equals the comment count, possibly with a negative sign if there are downvotes, but always the same number as the comment count.
I think it's obviously the way I've joined my tables, but I just can't figure out why it's copying the comment count. 1000000 points to someone who can explain this to me :)
You need to perform the aggregate operations in subqueries. Right now you're JOINing all of the tables together before aggregating, so every vote row for a post is repeated once per comment on that post (and every comment once per vote). If you remove the aggregates (and the GROUP BY), you'll see the large mass of duplicated rows, which doesn't really mean anything.
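The duplication is easy to reproduce. This sketch uses Python's sqlite3 as a stand-in for MySQL, with made-up rows and the v2/:username join left out. Post 1 gets three comments and two +1 votes: the joined query reports 6 and 6, because each of the 3 × 2 = 6 joined rows is counted and summed, while counting each table on its own gives 3 and 2.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE wall_posts (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, wall_id INTEGER);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, wall_id INTEGER, vote INTEGER);
    INSERT INTO wall_posts VALUES (1), (2);
    -- Post 1: three comments and two +1 votes. Post 2: no comments, one -1 vote.
    INSERT INTO comments (wall_id) VALUES (1), (1), (1);
    INSERT INTO votes (wall_id, vote) VALUES (1, 1), (1, 1), (2, -1);
""")

# Joining both child tables before aggregating: rows multiply per post.
joined = con.execute("""
    SELECT wall_posts.id, COUNT(comments.wall_id), COALESCE(SUM(votes.vote), 0)
    FROM wall_posts
    LEFT JOIN comments ON wall_posts.id = comments.wall_id
    LEFT JOIN votes ON wall_posts.id = votes.wall_id
    GROUP BY wall_posts.id
    ORDER BY wall_posts.id
""").fetchall()

# Aggregating each child table on its own gives the real numbers for post 1.
real_comments = con.execute(
    "SELECT COUNT(*) FROM comments WHERE wall_id = 1").fetchone()[0]
real_tally = con.execute(
    "SELECT SUM(vote) FROM votes WHERE wall_id = 1").fetchone()[0]

print(joined)                     # [(1, 6, 6), (2, 0, -1)]
print(real_comments, real_tally)  # 3 2
```

Post 2 comes out right only because it has no comments, which matches the symptom described in the question.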
Instead, try this (note I'm using a VIEW):
CREATE VIEW walls_posts_stats AS
SELECT
wall_posts.id,
COALESCE( comments_stats.comment_count, 0 ) AS comment_count,
COALESCE( votes_stats.vote_tally, 0 ) AS vote_tally
FROM
wall_posts
LEFT OUTER JOIN
(
SELECT
wall_id,
COUNT(*) AS comment_count
FROM
comments
GROUP BY
wall_id
) AS comments_stats ON wall_posts.id = comments_stats.wall_id
LEFT OUTER JOIN
(
SELECT
wall_id,
SUM( vote ) AS vote_tally
FROM
votes
GROUP BY
wall_id
) AS votes_stats ON wall_posts.id = votes_stats.wall_id
Then you can query it JOINed with your original wall data:
SELECT
wall_posts.*, -- note: avoid the use of * in production queries
stats.comment_count,
stats.vote_tally,
user_votes.vote
FROM
wall_posts
INNER JOIN walls_posts_stats AS stats ON wall_posts.id = stats.id
LEFT OUTER JOIN
(
SELECT
wall_id,
vote
FROM
votes
WHERE
username = :username
) AS user_votes ON wall_posts.id = user_votes.wall_id
ORDER BY
date DESC
LIMIT 15
Hypothetically you could combine it into a single large query (basically copy+paste the VIEW body into the INNER JOIN walls_posts_stats clause) but I feel that would introduce maintainability issues.
While MySQL does support views, it does not support parameterized views (aka composable table-valued functions; stored procedures are not composable) so that's why the user_votes subquery isn't in the walls_posts_stats VIEW.
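A sketch of the VIEW-based answer running end to end, again with Python's sqlite3 standing in for MySQL. Assumptions: the tables carry only the columns the queries touch, the answer's date column is replaced by a hypothetical created column, the symbol filter is dropped, the rows are invented, and :username is bound with a dict, which sqlite3 supports for named placeholders.

```python
import sqlite3

SCHEMA = """
CREATE TABLE wall_posts (id INTEGER PRIMARY KEY, message TEXT, created TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, wall_id INTEGER, username TEXT);
CREATE TABLE votes (id INTEGER PRIMARY KEY, wall_id INTEGER, vote INTEGER, username TEXT);
-- Each child table is aggregated on its own before being joined to the posts.
CREATE VIEW walls_posts_stats AS
SELECT wall_posts.id,
       COALESCE(comments_stats.comment_count, 0) AS comment_count,
       COALESCE(votes_stats.vote_tally, 0) AS vote_tally
FROM wall_posts
LEFT OUTER JOIN (SELECT wall_id, COUNT(*) AS comment_count
                 FROM comments GROUP BY wall_id) AS comments_stats
  ON wall_posts.id = comments_stats.wall_id
LEFT OUTER JOIN (SELECT wall_id, SUM(vote) AS vote_tally
                 FROM votes GROUP BY wall_id) AS votes_stats
  ON wall_posts.id = votes_stats.wall_id;
"""

# The per-user vote stays outside the view because it needs a parameter.
QUERY = """
SELECT wall_posts.id, stats.comment_count, stats.vote_tally, user_votes.vote
FROM wall_posts
INNER JOIN walls_posts_stats AS stats ON wall_posts.id = stats.id
LEFT OUTER JOIN (SELECT wall_id, vote FROM votes WHERE username = :username) AS user_votes
  ON wall_posts.id = user_votes.wall_id
ORDER BY wall_posts.created DESC
LIMIT 15
"""


def wall_feed(username):
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO wall_posts VALUES (?, ?, ?)",
                    [(1, "first", "2024-01-01"), (2, "second", "2024-01-02")])
    con.executemany("INSERT INTO comments VALUES (?, ?, ?)",
                    [(1, 1, "bob"), (2, 1, "carol"), (3, 1, "bob")])
    con.executemany("INSERT INTO votes VALUES (?, ?, ?, ?)",
                    [(1, 1, 1, "alice"), (2, 1, 1, "bob"), (3, 2, -1, "bob")])
    rows = con.execute(QUERY, {"username": username}).fetchall()
    con.close()
    return rows


print(wall_feed("alice"))  # [(2, 0, -1, None), (1, 3, 2, 1)]
```

Post 1 now reports 3 comments and a tally of 2, no matter how many comments or votes it has.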

Fetch rows with MAX DATE and GROUP BY

I have a table named payment_schedule with the following contents:
I want to fetch the records with MAX(due_date) for each loan_application_id.
With reference to the records in the image above, I expect the following result:
I tried the following SQL query:
SELECT
id,
MAX(due_date) as due_date,
loan_application_id
FROM
payment_schedule
GROUP BY
loan_application_id
Which returns the following result:
As you can see, it does not return the id that corresponds to the given due date.
Additionally, I have another column called payment_type_id and I need to exclude rows when payment_type_id has value of 3.
I tried several solutions available here, but nothing seems to work. How should I go about it?
Thanks.
This is called Group-wise Maximum and tagged here as greatest-n-per-group. The most traditional approach is to find the value you want and do a join to get the corresponding row per group like this:
SELECT
MIN(ps.id) AS id, -- MIN() keeps one row if two rows tie on the latest due_date
ps.due_date,
ps.loan_application_id
FROM
(
SELECT
MAX(due_date) as due_date,
loan_application_id
FROM payment_schedule
WHERE payment_type_id != '3'
GROUP BY loan_application_id
) ps2
JOIN payment_schedule ps
ON ps.loan_application_id = ps2.loan_application_id
AND ps.due_date = ps2.due_date
WHERE ps.payment_type_id != '3'
GROUP BY ps.loan_application_id, ps.due_date
It's also worth mentioning that this query will run much faster if you have a composite index on (loan_application_id, due_date).
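A sketch of the join-back idea applied to this table, run with Python's sqlite3 as a stand-in for MySQL. It excludes payment_type_id 3 both when finding the maximum and when joining back, and (an assumption beyond the answer's text) uses MIN(ps.id) so that two rows tying on the latest due_date collapse to one. Dates are stored as ISO-8601 strings, which sort chronologically; the rows are invented.

```python
import sqlite3

QUERY = """
SELECT MIN(ps.id) AS id, ps.due_date, ps.loan_application_id
FROM (
    -- Latest due_date per loan, ignoring payment_type_id 3
    SELECT MAX(due_date) AS due_date, loan_application_id
    FROM payment_schedule
    WHERE payment_type_id != 3
    GROUP BY loan_application_id
) ps2
JOIN payment_schedule ps
  ON ps.loan_application_id = ps2.loan_application_id
 AND ps.due_date = ps2.due_date
WHERE ps.payment_type_id != 3
GROUP BY ps.loan_application_id, ps.due_date
ORDER BY ps.loan_application_id
"""

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE payment_schedule (
    id INTEGER PRIMARY KEY, loan_application_id INTEGER,
    due_date TEXT, payment_type_id INTEGER)""")
# Composite index as suggested in the answer.
con.execute("CREATE INDEX ps_loan_due ON payment_schedule (loan_application_id, due_date)")
con.executemany("INSERT INTO payment_schedule VALUES (?, ?, ?, ?)", [
    (1, 10, "2024-01-01", 1),
    (2, 10, "2024-02-01", 1),
    (3, 10, "2024-03-01", 3),  # latest for loan 10, but excluded (type 3)
    (4, 20, "2024-01-15", 2),
    (5, 20, "2024-04-15", 1),
    (6, 30, "2024-05-01", 1),  # ids 6 and 7 tie on the latest date
    (7, 30, "2024-05-01", 1),
])
latest = con.execute(QUERY).fetchall()
print(latest)
# [(2, '2024-02-01', 10), (5, '2024-04-15', 20), (6, '2024-05-01', 30)]
```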
Best discussion I've seen here on SO is this: Select first row in each GROUP BY group?
Also addressed in the official docs here: http://dev.mysql.com/doc/refman/5.7/en/example-maximum-column-group-row.html
If due_date is distinct within each loan_application_id, you can remove the DISTINCT keyword below:
select distinct a.*
from payment_schedule a, (
select loan_application_id, max(due_date) max_date
from payment_schedule
where payment_type_id <> 3
group by 1
) as b
where a.loan_application_id = b.loan_application_id
and a.due_date = b.max_date
and a.payment_type_id <> 3
In most databases, this is easiest using window functions. In MySQL, you can use a join and group by:
select ps.*
from payment_schedule ps join
(select loan_application_id, max(due_date) as maxdd
from payment_schedule
group by loan_application_id
) l
on ps.loan_application_id = l.loan_application_id and ps.due_date = l.maxdd;

MySQL is not using INDEX in subquery

I have these tables and queries as defined in sqlfiddle.
My first problem was to group people, showing the LEFT JOINed visits row with the newest year. I solved that using a subquery.
Now my problem is that the subquery is not using the INDEX defined on the visits table, which causes my query to run nearly indefinitely on tables with approx. 15000 rows each.
Here's the query. The goal is to list every person once with his newest (by year) record in visits table.
Unfortunately, on large tables it gets really sloooow because it's not using the INDEX in the subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use the INDEX already defined on the visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, it uses non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions, and do not depend on the grouping items). This can give indeterminate (semi-random) results.
Second, to avoid the indeterminate results, you have added an ORDER BY inside a subquery, and nothing in the MySQL documentation says that this (non-standard or not) should work as expected. So it may be working now, but it may stop working in the not-so-distant future when you upgrade to MySQL version X, where the optimizer will be clever enough to understand that an ORDER BY inside a derived table is redundant and can be eliminated.
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
The SQL-fiddle
A compound index on (id_people, year) would help efficiency.
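A sketch of the MAX(year) join-back, run with Python's sqlite3 standing in for MySQL on invented rows. It is written as a LEFT JOIN to a derived table rather than the answer's nested JOIN ... ON ... ON form (a deliberate swap, to keep the parse unambiguous in SQLite); the intended result is the same: each person once, with their newest visit, and NULLs for people with no visits. The (id_people, year) index is created as suggested.

```python
import sqlite3

QUERY = """
SELECT p.id, p.name, lv.year, lv.note
FROM people AS p
LEFT JOIN (
    -- Newest visit per person: find MAX(year), then join back for the row
    SELECT v.id_people, v.year, v.note
    FROM (SELECT id_people, MAX(year) AS year
          FROM visits GROUP BY id_people) AS vm
    JOIN visits AS v ON v.id_people = vm.id_people AND v.year = vm.year
) AS lv ON lv.id_people = p.id
ORDER BY p.id
"""

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                         year INTEGER, note TEXT);
    CREATE INDEX visits_people_year ON visits (id_people, year);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cid');
    INSERT INTO visits VALUES (1, 1, 2010, 'a'), (2, 1, 2012, 'b'), (3, 2, 2011, 'c');
""")
newest = con.execute(QUERY).fetchall()
print(newest)  # [(1, 'Ann', 2012, 'b'), (2, 'Ben', 2011, 'c'), (3, 'Cid', None, None)]
```

If a person has two visits in the same newest year, this returns both rows; the later answer in this thread deals with that case.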
A different approach, which works fine if you first limit the people to a sensible number (say 30) and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people who have never had a visit, you should try replacing the JOINs in my statement with LEFT JOINs.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it, and don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0
But this is not the way your example is set up. So you need another way to disambiguate your latest visit, so that you get just one row per person. The only trick we have at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.
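A sketch of the final overall query on data with a tie, run through Python's sqlite3 as a stand-in for MySQL. The rows are invented, and an ORDER BY is added only so the output order is fixed. Ann has two visits in 2012, and the outer MAX(id) picks the later-inserted one, under the answer's assumption that a higher id means a later visit.

```python
import sqlite3

QUERY = """
SELECT p.id, p.name, v.year, v.note
FROM (
    SELECT v.id_people, MAX(v.id) id
    FROM (
        SELECT id_people, MAX(year) year
        FROM visits
        GROUP BY id_people
    ) p
    JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
    GROUP BY v.id_people
) m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
ORDER BY p.id
"""

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                         year INTEGER, note TEXT);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Ben');
    -- Ann has two visits in 2012 (ids 1 and 2); id 2 should win.
    INSERT INTO visits VALUES (1, 1, 2012, 'x'), (2, 1, 2012, 'y'),
                              (3, 1, 2010, 'z'), (4, 2, 2011, 'w');
""")
latest = con.execute(QUERY).fetchall()
print(latest)  # [(1, 'Ann', 2012, 'y'), (2, 'Ben', 2011, 'w')]
```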

select ONLY repeated rows in a table

I need to find users who received a bonus in my DB. The only users I'm interested in are those who got a bonus more than once.
How should I write this query to get ONLY the users who got a bonus more than once?
select Bonus, BonusUser, BonusType, Amount
from Bonus
where BonusType="1"
order by BonusUser asc;
I need a query that returns all the "duplicated" rows, so I can remove the bonus from them.
I didn't explain this before, but some users exploited a bug to get free bonuses, so I must select those duplicated rows, analyze them, and remove them if it's a case of abuse.
You can do it like this:
select BonusUser,
count(*)
from Bonus
where BonusType="1"
group by BonusUser
having count(*)>1
order by BonusUser asc
You should provide some dummy data along with the expected result.
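A quick check of the GROUP BY ... HAVING query, using Python's sqlite3 as a stand-in for MySQL with invented rows. The string literal is written '1' in single quotes, the standard SQL form; MySQL also accepts "1" by default, but databases that follow the standard treat double quotes as identifiers.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Bonus (Bonus INTEGER PRIMARY KEY, BonusUser TEXT, "
            "BonusType TEXT, Amount INTEGER)")
con.executemany("INSERT INTO Bonus VALUES (?, ?, ?, ?)", [
    (1, "amy", "1", 10), (2, "amy", "1", 10),
    (3, "bob", "1", 10), (7, "bob", "2", 20),  # bob has only one type-1 bonus
    (4, "cat", "1", 5), (5, "cat", "1", 5), (6, "cat", "1", 5),
])
# Users with more than one type-1 bonus, and how many they got.
repeat_users = con.execute("""
    SELECT BonusUser, COUNT(*)
    FROM Bonus
    WHERE BonusType = '1'
    GROUP BY BonusUser
    HAVING COUNT(*) > 1
    ORDER BY BonusUser ASC
""").fetchall()
print(repeat_users)  # [('amy', 2), ('cat', 3)]
```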
Add a GROUP BY and a HAVING clause:
SELECT Bonus
, BonusUser
, BonusType
, Amount
FROM Bonus
WHERE BonusType="1"
GROUP BY BonusUser
HAVING Count(*) > 1
ORDER BY BonusUser asc;
Based on your comment, I think this is what you want. It lists every type-1 bonus row for each user, along with that user's bonus count (CntBonus), so you can see who had more than one bonus:
SELECT Bonus
, t.BonusUser
, BonusType
, amount
, t2.cntbonus
FROM Bonus t
inner join
(
select count(*) as CntBonus, bonususer
from Bonus
where BonusType='1'
group by bonususer
) t2
on t.bonususer = t2.bonususer
WHERE BonusType='1'
ORDER BY BonusUser asc
As an alternative to the other answers, you can also do the following:
SELECT DISTINCT b1.*
FROM Bonus b1
JOIN Bonus b2
ON b1.BonusUser = b2.BonusUser
AND b1.Id > b2.Id
WHERE b1.BonusType = "1"
AND b2.BonusType = "1"
ORDER BY b1.BonusUser ASC;
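A sketch of the self-join, run with Python's sqlite3 as a stand-in for MySQL. The query relies on an Id column that the question doesn't show, so here Id is a hypothetical primary key and the rows are invented. Because of b1.Id > b2.Id, each user's earliest type-1 bonus is left out and only the repeats come back, which may be the set to review for removal; b1.Id is added to the ORDER BY only to fix the output order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Bonus (Id INTEGER PRIMARY KEY, BonusUser TEXT, "
            "BonusType TEXT, Amount INTEGER)")
con.executemany("INSERT INTO Bonus VALUES (?, ?, ?, ?)", [
    (1, "amy", "1", 10), (2, "amy", "1", 10),
    (3, "bob", "1", 10),
    (4, "cat", "1", 5), (5, "cat", "1", 5), (6, "cat", "1", 5),
])
# Every type-1 row that has an earlier type-1 row for the same user.
repeats = con.execute("""
    SELECT DISTINCT b1.*
    FROM Bonus b1
    JOIN Bonus b2
      ON b1.BonusUser = b2.BonusUser
     AND b1.Id > b2.Id
    WHERE b1.BonusType = '1'
      AND b2.BonusType = '1'
    ORDER BY b1.BonusUser ASC, b1.Id ASC
""").fetchall()
print(repeats)  # [(2, 'amy', '1', 10), (5, 'cat', '1', 5), (6, 'cat', '1', 5)]
```

DISTINCT matters for users like cat: row 6 matches both row 4 and row 5, and would otherwise appear twice.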