I have a MySQL table as below.
**AuthorID**, **PublicationName**, ReferenceCount, CitationCount
AuthorID and PublicationName act as the primary key. I need to find the maximum sum of ReferenceCount and CitationCount for each author. For example, the data is as below.
1 AAA 2 5
1 BBB 1 3
1 CCC 2 4
2 AAA 1 4
In this case, I need my output as,
1 AAA 7
2 AAA 5
I tried the below query.
SELECT AuthorID, PublicationName, Max(Sum(ReferenceCount + CitationCount))
from Author
Group by AuthorID, PublicationName
If I use Max(Sum(ReferenceCount + CitationCount)) with GROUP BY AuthorID, PublicationName, I get the error "Invalid use of group function". I believe I should include a HAVING clause in my query, but I am not sure how to do that.
If I understand the question right, you want all the records for the publication that has the most citations. The publication with the highest combined reference and citation count is given by:
SELECT PublicationName, Sum(ReferenceCount + CitationCount)
from Author
Group by PublicationName
order by Sum(ReferenceCount + CitationCount) desc
limit 1;
The order by and limit 1 give you the highest value.
If you want all records for the publication with the maximum sum:
select a.*
from Author a join
(SELECT PublicationName, Sum(ReferenceCount + CitationCount)
from Author
Group by PublicationName
order by Sum(ReferenceCount + CitationCount) desc
limit 1
) asum
on a.PublicationName = asum.PublicationName
Try this:
SELECT AuthorID, PublicationName, Max(ReferenceCount+CitationCount)
FROM Author
GROUP BY AuthorID
The problem with your query was that SUM() adds up one column's values across many rows. It cannot be used to add two columns within a row the way you wanted, and aggregate functions cannot be nested inside each other. To add columns, just use the plus operator (+).
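A minimal sketch of one way to also return the matching PublicationName, assuming the Author table from the question. With GROUP BY AuthorID alone, the PublicationName in the select list is not guaranteed to come from the row that holds the maximum, and the query is rejected when ONLY_FULL_GROUP_BY is enabled, so this version joins each row back to its author's maximum. The aliases m, MaxTotal and TotalCount are illustrative names, not part of the original schema.

```sql
-- Per-author maximum of (ReferenceCount + CitationCount),
-- joined back to Author to recover the publication that reaches it.
SELECT a.AuthorID,
       a.PublicationName,
       a.ReferenceCount + a.CitationCount AS TotalCount
FROM Author a
JOIN (
    SELECT AuthorID,
           MAX(ReferenceCount + CitationCount) AS MaxTotal
    FROM Author
    GROUP BY AuthorID
) m
  ON m.AuthorID = a.AuthorID
 AND m.MaxTotal = a.ReferenceCount + a.CitationCount
ORDER BY a.AuthorID;
```

If an author has several publications tied for the maximum, all of them are returned.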
Related
There is a table with the name '**work**' that contains data as shown below:
Id Name a_Column work_datetime
-----------------------------------------
1 A A_1 1592110166
2 A A_2 1592110166
3 A A_3 1592110164
4 B B_1 1582111665
5 B B_2 1592110166
6 C C_1 1592110166
If I run a query that groups by Name and takes max(work_datetime), there are two candidate rows for the group with Name='A', but I need only the one with a_Column='A_1', so that the final desired output is as follows:
Id Name a_Column work_datetime
-----------------------------------------
1 A A_1 1592110166
5 B B_2 1592110166
6 C C_1 1592110166
Handling duplicate records within a GROUP BY is something MySQL doesn't seem to support!
Is there any way I can achieve the required result?
A simple option that works on all versions of MySQL is to filter with a subquery:
select w.*
from work w
where w.id = (
select id
from work w1
where w1.name = w.name
order by work_datetime desc, a_column
limit 1
)
For each name, this brings the row with the latest work_datetime; ties are broken by picking the row with the smallest a_column (which is how I understood your requirement).
For performance, you want an index on (name, work_datetime, a_column, id), so that the correlated subquery can filter on name before sorting.
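As a sketch, an index supporting the correlated subquery above could be created as follows. The leading name column is there because the subquery filters on name before sorting by work_datetime and a_column. The index name idx_work_name_dt is a made-up identifier, and whether the optimizer actually uses the index should be checked with EXPLAIN on your own data.

```sql
-- Hypothetical index name; column order follows the subquery:
-- equality filter on name first, then the two sort columns, then id.
CREATE INDEX idx_work_name_dt
    ON work (name, work_datetime, a_column, id);
```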
Since version 8 you can use row_number() to number the rows within each name in descending order of time. Do that in a derived table, and then select only the rows where this number is 1.
SELECT x.id,
x.name,
x.a_column,
x.work_datetime
FROM (SELECT w.id,
w.name,
w.a_column,
w.work_datetime,
row_number() OVER (PARTITION BY w.name
ORDER BY w.work_datetime DESC) rn
FROM work w) x
WHERE x.rn = 1;
With row_number() there are no duplicates. Should there be two rows with the same name and time, one of them is chosen arbitrarily. If you want to retain the duplicates, you can replace row_number() with rank().
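If the tie should be broken deterministically, as the question asks (prefer the smallest a_column), a second sort key can be added to the window's ORDER BY. This is a sketch assuming MySQL 8 or later and the work table from the question; the aliases x and rn are the same illustrative names used above.

```sql
-- Latest row per name; ties on work_datetime go to the smallest a_column.
SELECT x.id,
       x.name,
       x.a_column,
       x.work_datetime
FROM (
    SELECT w.id,
           w.name,
           w.a_column,
           w.work_datetime,
           ROW_NUMBER() OVER (
               PARTITION BY w.name
               ORDER BY w.work_datetime DESC, w.a_column ASC
           ) AS rn
    FROM work w
) x
WHERE x.rn = 1
ORDER BY x.id;
```

For the sample data this selects the rows with id 1, 5 and 6.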
I'm having a problem with returning some rows in an ordered / grouped query.
I'm trying to get the latest row of type 2 from a table, so I used an ORDER first, GROUP later approach.
Example table:
id type field1 date (dd/mm/yyyy)
1 1 texta 01/01/2019
2 1 textb 02/01/2019
3 2 textc 01/01/2019
4 2 textd 02/01/2019
5 2 texte 03/01/2019
If I do
SELECT * FROM cars WHERE type = 2 ORDER BY date DESC
it returns:
id type field1 date
5 2 texte 03/01/2019
4 2 textd 02/01/2019
3 2 textc 01/01/2019
Then if I do
SELECT a.*
FROM ( SELECT * FROM cars WHERE type = 2 ORDER BY date DESC ) a
GROUP BY a.type
I get:
id type field1 date
3 2 textc 01/01/2019
Doesn't GROUP BY return the first row of the group? How can I get the latest row, like:
id type field1 date
5 2 texte 03/01/2019
Thank you!!!
First of all, to the best of my knowledge, your query should work. Maybe it's a matter of MySQL versions, as suggested in the comments.
As for your question, the solution is to avoid GROUP BY altogether. How about ORDER BY + LIMIT?
For example:
SELECT * FROM cars WHERE type = 2 ORDER BY date DESC LIMIT 1;
In this case, the ORDER BY brings the needed row first, and the LIMIT ensures that only one row is returned.
Another approach, less efficient, is by LEFT JOIN + NULL:
SELECT cars.* FROM cars
LEFT JOIN cars AS cars2 ON cars.type = cars2.type AND cars2.date > cars.date
WHERE cars.type = 2
AND cars2.id IS NULL
LIMIT 1;
In this approach, the inequality in the join condition means that the latest row finds no match in the joined copy of the table, so filtering on cars2.id IS NULL extracts that latest row.
To get the latest row of each type, you can use NOT EXISTS:
select c.*
from cars c
where not exists (
select 1 from cars
where type = c.type and date > c.date
)
MySQL doesn't guarantee the order of the rows in group by for non-aggregated columns, so the query you wrote will not work consistently and will produce non-deterministic values.
To get the data that you wanted, you can perform the following query:
SELECT *
FROM cars
WHERE type = 2
AND date = (SELECT date FROM cars WHERE type = 2 ORDER BY date DESC LIMIT 1)
In your query you are grouping by type but trying to get all other columns which are not part of the aggregation, so it will not work correctly, and in case ONLY_FULL_GROUP_BY SQL mode is enabled, the query will be considered invalid.
Related information for group by can be found in official MySQL documentation
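A small sketch of what this means in practice, assuming the cars table from the question and assuming its date column is a real DATE type rather than a dd/mm/yyyy string. The first statement shows whether ONLY_FULL_GROUP_BY is active for the session; the second is a grouped query that stays valid in that mode because every selected column is either grouped or aggregated. The alias latest_date is illustrative.

```sql
-- Inspect the current SQL mode; look for ONLY_FULL_GROUP_BY in the result.
SELECT @@SESSION.sql_mode;

-- Valid with or without ONLY_FULL_GROUP_BY:
-- type is the grouping column, date is wrapped in an aggregate.
SELECT type,
       MAX(date) AS latest_date
FROM cars
GROUP BY type;
```

The second query gives only the latest date per type. To get the other columns of that row, it still has to be joined back to cars or replaced by one of the approaches above.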
I need to display the last 2 results from a table (results). Each result comprises several rows with a matching submissionId, and the number of rows per submission is unknown. Of course, I would prefer a single query.
Here is the DB table structure
submissionId input value
1 name jay
1 phone 123-4567
1 email test@gmail.com
2 name mo
2 age 32
3 name abe
3 email abe@gmail.com
4 name jack
4 phone 123-4567
4 email jack@gmail.com
Desired results:
submissionId input value
3 name abe
3 email abe@gmail.com
4 name jack
4 phone 123-4567
4 email jack@gmail.com
Or even better, if I can combine the rows like this:
3 name abe 3 email abe@gmail.com
4 name jack 4 phone 123-4567 4 email jack@gmail.com
One option here is to use a subquery to identify the most recent and next to most recent submissionId:
SELECT submissionId, input, value
FROM yourTable
WHERE submissionId >= (SELECT MAX(submissionId) FROM yourTable) - 1
ORDER BY submissionId
Update:
If your submissionId column were really a date type, and you wanted the most recent two dates in your result set, then the following query will achieve that. Note that the subquery in the WHERE clause, while ugly, is not correlated to the outer query. This means that the MySQL optimizer should be able to figure out that it only needs to run it once.
SELECT submissionDate, input, value
FROM yourTable
WHERE submissionDate >=
(SELECT MAX(CASE WHEN submissionDate = (SELECT MAX(submissionDate) FROM yourTable)
THEN '1000-01-01'
ELSE submissionDate
END) FROM yourTable)
ORDER BY submissionDate
You can use limit in subqueries in the from clause, so a typical way to write this is:
SELECT t.submissionDate, t.input, t.value
FROM t join
(select distinct submissionDate
from t
order by submissionDate desc
limit 2
) sd
on t.submissionDate = sd.submissionDate;
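The question also asks whether the rows of each submission can be combined into one line. A possible sketch uses GROUP_CONCAT on top of the same "last two" derived table, here keyed on submissionId as in the original table. The table name yourTable follows the earlier answer, and the aliases last2 and fields are illustrative.

```sql
-- One output row per submission, restricted to the two highest submissionIds.
SELECT t.submissionId,
       GROUP_CONCAT(CONCAT(t.input, ' ', t.value) SEPARATOR ' ') AS fields
FROM yourTable t
JOIN (
    SELECT DISTINCT submissionId
    FROM yourTable
    ORDER BY submissionId DESC
    LIMIT 2
) last2
  ON last2.submissionId = t.submissionId
GROUP BY t.submissionId
ORDER BY t.submissionId;
```

The table has no column that records the order of the inputs within a submission, so the order of the concatenated pairs is not guaranteed. Long results are also truncated at the group_concat_max_len setting.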
This is what the query looks like now, so I can get the results with a LIMIT, a range, and an id/timestamp (with the help of Tim and Gordon):
SELECT *
FROM myTable t
JOIN
(SELECT DISTINCT sd.submissionId
FROM myTable sd
WHERE sd.questionId = yourId
ORDER BY sd.submissionId DESC
LIMIT 2
) t2
ON t.submissionId = t2.submissionId
WHERE t.formId = yourId
AND dateTime BETWEEN 0000 AND 1111
So I have a posts table with an author_id foreign key and a published status (bool).
I want to select the most recent (highest id) published post for each author (that is, for each author_id value).
Right now I can get the highest id for a single foreign key:
SELECT * FROM Posts WHERE id = (SELECT MAX(id) FROM Posts WHERE published = 1 AND author_id = 5);
But this only returns the highest id for a given foreign key. How would I write this to return all the posts with the highest id in their foreign key group?
Any advice is appreciated. Thanks
EDIT: Had it tagged with sql-server and mysql. It's mysql. Sorry about that
EDIT: Some asked for clarity
Here is a sample of what I'm looking for:
id body author_id published
1 Top 10... 1 1
2 Breaking... 1 1
3 New Report.. 3 1
4 Can Sunscreen... 3 1
5 Dow down... 2 1
6 Heart Warming... 2 1
7 Next time... 1 1
8 New study... 3 0
So what I want to do is grab the posts with ids 4, 6, and 7, because 4 is the most recent (highest id) for author 3, 6 is the most recent for author 2, and 7 is the most recent for author 1. I also have the condition on published, which is why we don't grab 8: its published value is 0.
4 Can Sunscreen... 3 1
6 Heart Warming... 2 1
7 Next time... 1 1
Answered:
With a slight tweak to Igor Quirino's answer, using IN instead of =, I think the following works:
SELECT * FROM Posts WHERE id IN (SELECT MAX(id) FROM Posts WHERE published = 1 GROUP BY author_id);
You should use the IN operator instead of = operator
The IN operator allows you to specify multiple values in a WHERE clause.
The IN operator is a shorthand for multiple OR conditions.
IN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);
or:
SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT STATEMENT);
The IN operator allows you to use a query that returns multiple records (your query must return exactly one column):
SELECT * FROM Posts WHERE id IN (SELECT MAX(id) FROM Posts WHERE published = 1 GROUP BY author_id);
Happy to Help.
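The same result can also be written as a join against a derived table of per-author maxima, which some people find easier to extend (for example, to expose the author_id next to the maximum). This is a sketch assuming the Posts table from the question; the aliases latest and max_id are illustrative.

```sql
-- Highest published id per author, joined back to Posts for the full row.
SELECT p.*
FROM Posts p
JOIN (
    SELECT author_id,
           MAX(id) AS max_id
    FROM Posts
    WHERE published = 1
    GROUP BY author_id
) latest
  ON latest.max_id = p.id
ORDER BY p.id;
```

For the sample data this returns the posts with ids 4, 6 and 7.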
If you want the most recent posts for an author, use ORDER BY and LIMIT. For instance, to get the 10 most recent posts for author 5:
SELECT p.*
FROM Posts p
WHERE p.author_id = 5
ORDER BY p.id DESC
LIMIT 10;
Suppose I have a table called tblSchoolSupplies that looks like this:
itemsID categoryID subCategoryID itemName
1 1 1 pencil
2 1 1 eraser
3 2 2 toilet paper
4 1 2 bond paper
5 1 3 bag
6 1 1 ruler
7 1 2 compass
8 1 3 pencil case
9 2 2 soap
What I want to do is construct a query that meets these 4 criteria:
1) select rows under categoryID = 1
2) group rows by subCategoryID
3) limit 2 rows per subCategoryID
4) rows must be selected by random
Doug R's comment should be taken to heart. Please always include what you have tried. The questions you have are of varying difficulty and I feel like an answer will help you and others.
Example table and queries are here: http://sqlfiddle.com/#!9/3beee/6
To select records with a category of 1, use the query below. The WHERE clause filters your records down to category 1 only.
select * from tblSchoolSupplies where categoryID = 1;
Grouping rows by subcategory requires more information. You generally group rows to get statistics: for example, how many items there are in each subcategory, or how many categories each subcategory belongs to. Notice that I am selecting subCategoryID and also grouping by it; the other columns are statistical calculations. Most, if not all, GROUP BY queries you will encounter have a dimension like subCategoryID that is grouped, along with aggregate functions such as SUM, COUNT, and AVG.
select
subCategoryID,
count(*) as items_in_subcategory,
count(distinct categoryID) as distinct_categories
from tblSchoolSupplies
group by subCategoryID;
Limiting 2 rows per subCategoryID is more challenging in comparison to your first question. The answer below is based on question 12113699
-- limit 2 rows per subCategoryID
set @number := 0;
set @subCategoryID := '';
select *
from
(
select *,
@number:=if(@subCategoryID = subCategoryID, @number + 1, 1) as rownum,
@subCategoryID:=subCategoryID as field1
from tblSchoolSupplies
order by subCategoryID, itemsID
) as subcat
where subcat.rownum < 3;
Using a random sort order and limiting the output to 1 record will give you a randomly selected row. Please read through the discussion in question 4329396 to gain a different perspective on similar questions.
select * from tblSchoolSupplies order by rand() limit 1;
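Putting the four criteria together, one possible sketch for MySQL 8 or later uses a window function instead of user variables: rows are numbered in random order within each subcategory, and only the first two per subcategory are kept. This assumes the tblSchoolSupplies table from the question; the aliases s, ranked and rn are illustrative.

```sql
-- Up to 2 randomly chosen rows per subCategoryID, restricted to categoryID = 1.
SELECT itemsID, categoryID, subCategoryID, itemName
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (
               PARTITION BY s.subCategoryID
               ORDER BY RAND()
           ) AS rn
    FROM tblSchoolSupplies s
    WHERE s.categoryID = 1
) ranked
WHERE rn <= 2
ORDER BY subCategoryID;
```

Because the ordering inside each partition is random, repeated runs can return different rows. A subcategory with fewer than two matching rows simply contributes what it has.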