Get N per group in MySQL

This is NOT the same question as "Using LIMIT within GROUP BY to get N results per group?", though I admit it is similar.
I need to select the first 2 rows per person.
The rows are ordered by the year received.
Problem: two rows may have been entered in the same month (the date is stored as YYYY-MM).
The query I came up with (following the referenced question) seems to be stuck in a big loop:
SELECT *
FROM `table_data` as b
WHERE (
SELECT count(*) FROM `table_data` as a
WHERE a.Nom = b.Nom and a.year < b.year
) <= 2;
Sample data:
A | year    | Nom
---------------------
b | 2011-01 | Tim
d | 2011-01 | Tim
s | 2011-01 | Tim
a | 2011-03 | Luc
g | 2011-01 | Luc
s | 2011-01 | Luc
Should export:
A | year    | Nom
---------------------
b | 2011-01 | Tim
d | 2011-01 | Tim
a | 2011-03 | Luc
g | 2011-01 | Luc

(
-- First get a set of results as if you only wanted the latest entry for each
-- name - a simple GROUP BY from a derived table with an ORDER BY
SELECT *
FROM (
SELECT *
FROM `table_data`
ORDER BY `year` DESC
) `a`
GROUP BY `Nom`
)
UNION
(
-- Next union it with the set of result you get if you apply the same criteria
-- and additionally specify that you do not want any of the rows found by the
-- first operation
SELECT *
FROM (
SELECT *
FROM `table_data`
WHERE `id` NOT IN (
SELECT `id`
FROM (
SELECT *
FROM `table_data`
ORDER BY `year` DESC
) `a`
GROUP BY `Nom`
)
ORDER BY `year` DESC
) `b`
GROUP BY `Nom`
)
-- Optionally apply ordering to the final results
ORDER BY `Nom` DESC, `year` DESC
I feel sure there is a shorter way of doing it but right now I can't for the life of me work out what it is. That does work, though - assuming you have a primary key (which you should) and that it is called id.
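On MySQL 8.0 or later, window functions offer a shorter way to express this. What follows is only a sketch under a few assumptions: the sample table shows no id column, so rows entered in the same month are ordered by column A (an arbitrary tie-breaker), and "first" is taken to mean the most recent year, matching the ORDER BY `year` DESC used in the query above.

```sql
SELECT A, `year`, Nom
FROM (
    SELECT A, `year`, Nom,
           -- Number the rows of each person, newest first; A breaks ties
           -- between rows entered in the same month (assumed tie-breaker).
           ROW_NUMBER() OVER (PARTITION BY Nom
                              ORDER BY `year` DESC, A ASC) AS rn
    FROM `table_data`
) AS ranked
WHERE rn <= 2          -- keep at most 2 rows per person
ORDER BY Nom DESC, `year` DESC;
```

Unlike a GROUP BY over an ordered derived table, this does not depend on which row MySQL happens to pick for the non-aggregated columns, so it also runs with ONLY_FULL_GROUP_BY enabled.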

Related

MYSQL subquery WHERE IN with count/having

I searched and found posts similar to what I am trying to accomplish, but not an exact solution. I have a table of grouped articles (articles that have information in common). I need to select articles from this table where at least 10 articles belong to the group.
Group ID | Article ID | Posting Date
------------------------------------
1        | 1234       | 2017-07-14
1        | 5678       | 2017-07-14
1        | 9000       | 2017-07-14
2        | 8001       | 2017-07-14
2        | 8002       | 2017-07-14
------------------------------------
SELECT `groupid`, `article_id`, `publish_date`
FROM `article_group`
WHERE `groupid` IN ( SELECT `groupid`, count(`groupid`) as cnt
FROM `article_group`
WHERE date(`publish_date`) = '2017-07-14'
group by `groupid`
having cnt > 10
order by cnt desc
)
I understand the sub-query should just return the one column, but how do I accomplish this with the count and having?
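You are very close. The subquery should select only one column, and its ORDER BY is not necessary. Note that HAVING COUNT(*) > 10 keeps groups with more than 10 articles; use >= 10 if "at least 10" should include groups of exactly 10: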
SELECT `groupid`, `article_id`, `publish_date`
FROM `article_group`
WHERE `groupid` IN (SELECT `groupid`
FROM `article_group`
WHERE date(`publish_date`) = '2017-07-14'
GROUP BY `groupid`
HAVING COUNT(*) > 10
)
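A possible variation, offered as a sketch rather than a drop-in replacement: wrapping `publish_date` in DATE() generally prevents MySQL from using an ordinary index on that column, so a half-open range can be used instead. This assumes `publish_date` is a DATETIME or TIMESTAMP column and that "at least 10" means >= 10. The derived table is joined rather than used with IN, which returns the same rows here.

```sql
SELECT ag.`groupid`, ag.`article_id`, ag.`publish_date`
FROM `article_group` AS ag
JOIN (
    -- Groups with at least 10 articles published on 2017-07-14.
    SELECT `groupid`
    FROM `article_group`
    WHERE `publish_date` >= '2017-07-14'
      AND `publish_date` <  '2017-07-15'
    GROUP BY `groupid`
    HAVING COUNT(*) >= 10
) AS big
  ON big.`groupid` = ag.`groupid`;
```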

Calculating percentage from MySQL query

I am trying to find the top 5 selling books.
My idea for calculating the top 5 selling books is this:
percentage = number_of_SUCCESS_transactions_each_book / total_number_transactions_each_book
Fetch the result (book_id, percentage) sorted in DESC order, with a LIMIT of 5
Here's a simplified representation of the tables, for the sake of understanding:
tblPayments
-----------
trans_id | book_id | payment_status | purchase_date
---------------------------------------------------
1 | 233 | SUCCESS | 2017-04-05
2 | 145 | FAILED | 2017-04-10
3 | 233 | FAILED | 2017-04-05
4 | 233 | SUCCESS | 2017-04-05
tblBooks
--------
book_id | book_name
-------------------
233 | My Autobiography
145 | How to learn English
201 | Finding Nemo
I will be querying for the top 5 selling books within a particular date range, for example from 2017-04-01 to 2017-04-25.
What I am expecting as output is something like this:
book_id | book_name | percentage
----------------------------------
233 | My Autobiography | 67
145 | How to learn English | 0
201 | Finding Nemo | 0
After brainstorming for hours, this is what I have come up with:
SELECT b.`book_id`, (
(
( SELECT COUNT(*) FROM `tblPayments` WHERE `book_id` = b.`book_id` AND `payment_status` = 'SUCCESS' ) /
( SELECT COUNT(*) FROM `tblPayments` WHERE `book_id` = b.`book_id` )
) * 100.0 ) AS `percentage`
FROM `tblPayments` AS b
WHERE b.`purchase_date` BETWEEN '2017-04-01' AND '2017-04-25'
GROUP BY b.`book_id`
ORDER BY `percentage` DESC LIMIT 5
Can it be improved further? Will it cause any performance issues in the database?
Right now I am on a train back home, so I am writing this from memory on a tablet. I will be able to test it when I get home in about 6 hours, so I thought I would ask here in the meantime.
Or do you have a suggestion for a better approach?
Thank you
EDIT
Thanks to both @Strawberry and @Stefano Zanini for the answers.
Just one more question: would it be okay to simply JOIN that with tblBooks to get the book_name field in the result set?
I mean, the tblPayments table is expected to have a ton of rows. So will a JOIN be okay? Or should I fetch these 5 rows in PHP and run another query just to get the book_name of each of these 5 books? Which would be the more efficient method?
DROP TABLE IF EXISTS transactions;
CREATE TABLE transactions
(transaction_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,book_id INT NOT NULL
,transaction_status VARCHAR(12) NOT NULL
,transaction_date DATE NOT NULL
);
INSERT INTO transactions VALUES
(1,233,'SUCCESS','2017-04-05'),
(2,145,'FAILED','2017-04-10'),
(3,233,'FAILED','2017-04-05'),
(4,233,'SUCCESS','2017-04-05');
SELECT book_id
, SUM(CASE WHEN transaction_status = 'success' THEN 1 ELSE 0 END)/COUNT(*) success_rate
FROM transactions
GROUP
BY book_id
+---------+--------------+
| book_id | success_rate |
+---------+--------------+
| 145 | 0.0000 |
| 233 | 0.6667 |
+---------+--------------+
I've left out the trivial bits.
You can improve that query by replacing the inner queries you use for the percentage with a conditional sum. The ELSE 0 makes a book with no successful payments report 0 rather than NULL:
SELECT b.`book_id`,
SUM(CASE WHEN b.`payment_status` = 'SUCCESS' THEN 1 ELSE 0 END) /
COUNT(*) * 100.0 AS `percentage`
FROM `tblPayments` AS b
WHERE b.`purchase_date` BETWEEN '2017-04-01' AND '2017-04-25'
GROUP BY b.`book_id`
ORDER BY `percentage` DESC
LIMIT 5
Edit
Addressing your new question: there's no need for a second query issued from PHP; you can do everything in a single query:
SELECT t1.book_id, t2.book_name, t1.percentage
FROM (
SELECT b.`book_id`,
SUM(CASE WHEN b.`payment_status` = 'SUCCESS' THEN 1 ELSE 0 END) /
COUNT(*) * 100.0 AS `percentage`
FROM `tblPayments` AS b
WHERE b.`purchase_date` BETWEEN '2017-04-01' AND '2017-04-25'
GROUP BY b.`book_id`
ORDER BY `percentage` DESC
LIMIT 5
) t1
JOIN tblBooks t2
ON t1.book_id = t2.book_id
That may be faster than joining tblBooks in the first query
SELECT b.`book_id`,
c.`book_name`,
-- columns are qualified with b. because book_id exists in both tables
SUM(CASE WHEN b.`payment_status` = 'SUCCESS' THEN 1 ELSE 0 END) /
COUNT(*) * 100.0 AS `percentage`
FROM `tblPayments` AS b
JOIN `tblBooks` AS c
ON b.`book_id` = c.`book_id`
WHERE b.`purchase_date` BETWEEN '2017-04-01' AND '2017-04-25'
GROUP BY b.`book_id`, c.`book_name`
ORDER BY `percentage` DESC
LIMIT 5
But if I were you, I'd run a few tests myself to see whether performance is actually an issue and, if so, which query is faster.
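The expected output in the question also lists a book with no payments at all (201, Finding Nemo) at 0, which an inner join starting from tblPayments cannot produce. One way to get such rows, sketched here under the assumption that the table and column names are exactly those shown in the question, is to start from tblBooks and LEFT JOIN the payments, moving the date filter into the join condition:

```sql
SELECT bk.`book_id`,
       bk.`book_name`,
       COALESCE(
           ROUND(100 * SUM(CASE WHEN p.`payment_status` = 'SUCCESS' THEN 1 ELSE 0 END)
                     / NULLIF(COUNT(p.`trans_id`), 0)),  -- avoid dividing by zero
           0) AS `percentage`
FROM `tblBooks` AS bk
LEFT JOIN `tblPayments` AS p
       ON p.`book_id` = bk.`book_id`
      -- filtering here (not in WHERE) keeps books without payments in range
      AND p.`purchase_date` BETWEEN '2017-04-01' AND '2017-04-25'
GROUP BY bk.`book_id`, bk.`book_name`
ORDER BY `percentage` DESC
LIMIT 5;
```

Because every book is scanned, this may be slower than aggregating tblPayments first, so it is worth measuring on real data before choosing.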

How to select rows that has distinct value in one field and sorted by another field in MySQL?

I have a table like this:
-----------------------------
id | uid | year | other | many | fields
-----------------------------
1 | 1 | 2010 | blabla ...
2 | 2 | 1999 | blablabla ...
3 | 3 | 2011 | bla ...
4 | 1 | 2006 | blablablabla ...
...
-----------------------------
What I want is to select all fields of the records such that:
- each uid appears only once, keeping only its last record (i.e., the one with the highest id)
- the results are sorted by year
An example of returned records like:
-----------------------------
id | uid | year | other | many | fields
-----------------------------
2 | 2 | 1999 | blablabla ...
4 | 1 | 2006 | blablablabla ...
3 | 3 | 2011 | bla ...
-----------------------------
It looks similar to the question "How to use DISTINCT and ORDER BY in same SELECT statement?", but I couldn't get it to work.
I tried SELECT * FROM table GROUP BY uid ORDER BY MAX(id) DESC, MAX(year), but it seems to sort by neither id nor year.
update:
Thanks for all the solutions. Here is the new problem: I'm actually developing a plugin for Discuz, and it doesn't allow subqueries for security reasons. Is there any way to do this with only one SELECT? Or is there a workaround in Discuz plugin development? Thanks again.
You can try this one (test is the table name):
select distinct * from test where id IN (select MAX(id) id from test GROUP BY uid) order by year
To my knowledge, I can give you two approaches,
(1) Mysql specific
SELECT * FROM (SELECT * FROM `table_name` ORDER BY `id` DESC) tbl
GROUP BY `uid` ORDER BY `year`
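Note: With ONLY_FULL_GROUP_BY disabled, MySQL does not require every non-aggregated column to appear in GROUP BY. However, the value returned for such a column comes from an arbitrary row of the group, so this query is not guaranteed to return the row with the highest id, and it is rejected outright when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5).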
(2) For any RDBMS
SELECT * FROM table_name
WHERE id IN (
SELECT Max(id) FROM table_name
GROUP BY uid
)
ORDER BY year
OR
SELECT tbl1.id, tbl1.uid, tbl1.year, tbl1.other
FROM table_name tbl1
INNER JOIN (
SELECT Max(id) id FROM table_name
GROUP BY uid
) tbl2
ON tbl1.id = tbl2.id
ORDER BY tbl1.year
The statements under (2) yield the result below:
-----------------------------------------
| id | uid | year | other               |
-----+-----+------+----------------------
| 2  | 2   | 1999 | blablabla ...       |
| 4  | 1   | 2006 | blablablabla ...    |
| 3  | 3   | 2011 | bla ...             |
-----------------------------------------
The following query might do the job.
SELECT
your_table.id,
your_table.uid,
your_table.year,
your_table.other
FROM your_table
INNER JOIN
(
SELECT
uid,
MAx(id) max_id
FROM your_table
GROUP BY uid
) t
ON your_table.id = t.max_id AND your_table.uid = t.uid
ORDER BY your_table.year;
The above query returns, for each uid, the record with the maximum id, and sorts the result in ascending order of year, as the question asks.
SQL Fiddle is not working, so here is some test data instead.
TEST DATA:
DROP TABLE IF EXISTS `your_table`;
CREATE TABLE `your_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`uid` int(50) NOT NULL,
`year` int(11) NOT NULL,
`other` varchar(100) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `your_table` VALUES ('1', '1', '2010', 'blabla ...');
INSERT INTO `your_table` VALUES ('2', '2', '1999', 'blablabla...');
INSERT INTO `your_table` VALUES ('3', '3', '2011', 'bla ...');
INSERT INTO `your_table` VALUES ('4', '1', '2006', 'blablablabla....');
Output:
Running the above query on this test data gives the following output:
id uid year other
2 2 1999 blablabla...
4 1 2006 blablablabla....
3 3 2011 bla ...
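For the follow-up about subqueries not being allowed, a self-join can express "no other row with the same uid has a higher id" in a single SELECT. This is a sketch against the `your_table` test data above; whether Discuz accepts a self-join is an assumption that has not been checked here.

```sql
SELECT t1.*
FROM `your_table` AS t1
LEFT JOIN `your_table` AS t2
       ON t2.`uid` = t1.`uid`
      AND t2.`id`  > t1.`id`   -- a later row for the same uid, if any
WHERE t2.`id` IS NULL          -- keep t1 only when no later row exists
ORDER BY t1.`year`;
```

On the four test rows this keeps ids 2, 4 and 3, in that order, because id 1 is superseded by id 4 for uid 1.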

Passing the results of a Mysql query to a subquery on the same table

CREATE TABLE test (
id INT(12),
`time` VARCHAR(16),
-- GROUP is a reserved word in MySQL, so the column name must be quoted
`group` INT(2),
taken TINYINT(1),
RID int(11) NOT NULL auto_increment,
primary KEY (RID));
id | time  | group | taken
---------------------------
1  | 13.00 | 1     | 1
2  | 13.00 | 2     | 0
3  | 14.00 | 2     | 0
4  | 15.00 | 2     | 0
5  | 12.00 | 3     | 0
Given the table structure and sample data above, I want to get the rows of the smallest "group" number that has not been "taken" (taken=0).
I have come up with two queries:
SELECT * From `test`
WHERE taken=0
and
SELECT * FROM `test`
WHERE `group` = ( SELECT MIN(`group`) FROM `test` )
Can someone show me how to combine the two queries, so that the result of the first query is passed to the second, to get the result below?
id | time  | group | taken
---------------------------
2  | 13.00 | 2     | 0
3  | 14.00 | 2     | 0
4  | 15.00 | 2     | 0
---------------------------
You can use the result of the first query in the second query as follows:
SELECT *
FROM `test`
WHERE `group` = (SELECT MIN(`group`)
FROM `test`
WHERE taken = 0)
Which gives you the desired result according to this SQLFiddle
Use the subquery to get the lowest group among the rows with taken = 0, then join your main table to the result of the subquery.
Something like this:
SELECT a.*
FROM `test` a
INNER JOIN
(
SELECT MIN(`group`) AS min_group
FROM `test`
WHERE taken = 0
) b
ON a.`group` = b.min_group
WHERE a.taken = 0
Try this (note that it returns only the smallest group number, not the matching rows):
SELECT min(`group`) FROM (
SELECT * FROM test
WHERE taken = 0)
AS t;

GROUP BY max in mysql

Let's say I have the following three entries:
`id` | `timestamp` | `content` | `reference`
1 | 2012-01-01 | NEWER | 1
2 | 2013-01-01 | NEWEST | 1
3 | 2011-01-01 | OLD | 2
I need the following result from my query:
`id` | `timestamp` | `content` | `reference`
2 | 2013-01-01 | NEWEST | 1
3 | 2011-01-01 | OLD | 2
Here's what I have so far, but it is incorrect:
SELECT * FROM table GROUP BY reference
What would be the correct query here?
I am looking to get the newest piece of content per reference id. In the example above, there are two reference id's (1 & 2), and I want to get the most recent entry for each.
SELECT *
FROM (SELECT * FROM table ORDER BY timestamp desc) as sub
GROUP BY reference
If you wish to expand the query, put limiting logic into the subquery like so for better performance:
SELECT *
FROM (SELECT *
FROM table
WHERE 1=1 and 2=2
ORDER BY timestamp desc
) as sub
GROUP BY reference
I take it you want the newest row for each reference? Something like this:
SELECT * FROM my_table t
WHERE id = (
SELECT id FROM my_table
WHERE reference = t.reference
ORDER BY timestamp DESC LIMIT 1
);
-- assumes a higher id always means a newer row
select * from `table` where id in
(select max(id) from `table` group by reference)
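If the newest row should be decided by `timestamp` rather than by id, one option that does not rely on how MySQL picks non-aggregated columns is to join each row to the maximum timestamp of its reference. This is a sketch, with `my_table` standing in for the real table name; if two rows of the same reference share the maximum timestamp, both are returned.

```sql
SELECT t.*
FROM `my_table` AS t
JOIN (
    -- Latest timestamp per reference.
    SELECT `reference`, MAX(`timestamp`) AS max_ts
    FROM `my_table`
    GROUP BY `reference`
) AS latest
  ON latest.`reference` = t.`reference`
 AND latest.max_ts      = t.`timestamp`;
```

On the three sample rows this returns ids 2 (NEWEST) and 3 (OLD).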