I am trying to find the top 5 selling books.
My idea of calculating the top 5 selling books is this:
percentage = number_of_SUCCESS_transactions_each_book / total_number_transactions_each_book
Fetch the result(book_id, percentage) sorted in DESC order, with a LIMIT of 5
Here's a simple representation of the table containing data for the sake of understanding:
tblPayments
-----------
trans_id | book_id | payment_status | purchase_date
---------------------------------------------------
1 | 233 | SUCCESS | 2017-04-05
2 | 145 | FAILED | 2017-04-10
3 | 233 | FAILED | 2017-04-05
4 | 233 | SUCCESS | 2017-04-05
tblBooks
--------
book_id | book_name
-------------------
233 | My Autobiography
145 | How to learn English
201 | Finding Nemo
I will be querying for the top 5 selling books within a particular date range, for example between 2017-04-01 and 2017-04-25.
What I am expecting as output is something like this:
book_id | book_name | percentage
----------------------------------
233 | My Autobiography | 67
145 | How to learn English | 0
201 | Finding Nemo | 0
After brainstorming for hours, this is what I have come up with:
SELECT b.`book_id`, (
(
( SELECT COUNT(*) FROM `tblPayments` WHERE `book_id` = b.`book_id` AND `payment_status` = 'SUCCESS' ) /
( SELECT COUNT(*) FROM `tblPayments` WHERE `book_id` = b.`book_id` )
) * 100.0 ) AS `percentage`
FROM `tblPayments` AS b
WHERE b.`purchase_date` BETWEEN '2017-04-01' AND '2017-04-25'
GROUP BY b.`book_id`
ORDER BY `percentage` DESC LIMIT 5
Can it be further improved? Will it cause any performance issues in the database?
Right now I am on a train back home, writing this from a tablet off the top of my head. I will be able to test it when I get home in around 6 hours, so I thought I would ask here in the meantime.
Or do you have a suggestion for a better approach than this?
Thank you
EDIT
Thanks to both @Strawberry and @Stefano Zanini for the answers.
Just one more question. Would it be okay to simply JOIN that with tblBooks to get the book_name field in the result set?
I mean, this tblPayments table is expected to have a ton of rows, so will a JOIN be okay? Or should I fetch these 5 rows in PHP and run another query just to get the book_name of each of these 5 books? Which method would be more efficient?
DROP TABLE IF EXISTS transactions;
CREATE TABLE transactions
(transaction_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,book_id INT NOT NULL
,transaction_status VARCHAR(12) NOT NULL
,transaction_date DATE NOT NULL
);
INSERT INTO transactions VALUES
(1,233,'SUCCESS','2017-04-05'),
(2,145,'FAILED','2017-04-10'),
(3,233,'FAILED','2017-04-05'),
(4,233,'SUCCESS','2017-04-05');
SELECT book_id
, SUM(CASE WHEN transaction_status = 'success' THEN 1 ELSE 0 END)/COUNT(*) success_rate
FROM transactions
GROUP
BY book_id
+---------+--------------+
| book_id | success_rate |
+---------+--------------+
| 145 | 0.0000 |
| 233 | 0.6667 |
+---------+--------------+
I've left out the trivial bits.
You can improve that query by replacing the inner queries you use for the percentage with a conditional sum:
SELECT b.`book_id`,
-- ELSE 0 keeps the result at 0 instead of NULL when a book has no successful payment
SUM(CASE WHEN b.`payment_status` = 'SUCCESS' THEN 1 ELSE 0 END) /
COUNT(*) * 100.0 AS `percentage`
FROM `tblPayments` AS b
WHERE b.`purchase_date` BETWEEN '2017-04-01' AND '2017-04-25'
GROUP BY b.`book_id`
ORDER BY `percentage` DESC
LIMIT 5
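To check the conditional sum against the sample rows from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for MySQL (that substitution is my assumption, not part of the original setup); the table and column names are taken from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPayments (
        trans_id       INTEGER PRIMARY KEY,
        book_id        INTEGER NOT NULL,
        payment_status TEXT    NOT NULL,
        purchase_date  TEXT    NOT NULL
    );
    INSERT INTO tblPayments VALUES
        (1, 233, 'SUCCESS', '2017-04-05'),
        (2, 145, 'FAILED',  '2017-04-10'),
        (3, 233, 'FAILED',  '2017-04-05'),
        (4, 233, 'SUCCESS', '2017-04-05');
""")

# Multiplying by 100.0 first forces floating-point division in SQLite.
query = """
    SELECT book_id,
           100.0 * SUM(CASE WHEN payment_status = 'SUCCESS' THEN 1 ELSE 0 END)
                 / COUNT(*) AS percentage
    FROM tblPayments
    WHERE purchase_date BETWEEN ? AND ?
    GROUP BY book_id
    ORDER BY percentage DESC
    LIMIT 5
"""
rows = conn.execute(query, ("2017-04-01", "2017-04-25")).fetchall()
for book_id, percentage in rows:
    print(book_id, round(percentage, 2))
```

Book 233 comes out first with two successes out of three transactions, and book 145 follows with 0. Passing the dates as bound parameters also avoids building the SQL string by hand.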
Edit
Addressing your new question: there's no need for a second query issued from PHP; you can do everything in a single query:
select t1.book_id, t2.book_name, t1.percentage
from (
SELECT b.`book_id`,
SUM(CASE WHEN b.`payment_status` = 'SUCCESS' THEN 1 ELSE 0 END) /
COUNT(*) * 100.0 AS `percentage`
FROM `tblPayments` AS b
WHERE b.`purchase_date` BETWEEN '2017-04-01' AND '2017-04-25'
GROUP BY b.`book_id`
ORDER BY `percentage` DESC
LIMIT 5
) t1
join tblBooks t2
on t1.book_id = t2.book_id
-- the ordering inside the derived table is not guaranteed to survive the join
order by t1.percentage desc
That may be faster than joining tblBooks in the first query:
SELECT b.`book_id`,
c.`book_name`,
-- columns are qualified with b. because book_id exists in both tables
SUM(CASE WHEN b.`payment_status` = 'SUCCESS' THEN 1 ELSE 0 END) /
COUNT(*) * 100.0 AS `percentage`
FROM `tblPayments` AS b
JOIN `tblBooks` AS c
ON b.`book_id` = c.`book_id`
WHERE b.`purchase_date` BETWEEN '2017-04-01' AND '2017-04-25'
GROUP BY b.`book_id`
ORDER BY `percentage` DESC
LIMIT 5
But if I were you I'd run a few tests myself to see whether performance is actually an issue, and if so, which query is faster.
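One thing to keep in mind: the expected output in the question also lists "Finding Nemo" with 0, although that book has no payment rows at all. An inner join drops such books. A possible variation, sketched here under the assumption that books without payments should appear with 0, starts from tblBooks and LEFT JOINs the payments. Again this uses Python's sqlite3 as a stand-in for MySQL, which is my assumption.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblBooks (book_id INTEGER PRIMARY KEY, book_name TEXT NOT NULL);
    CREATE TABLE tblPayments (
        trans_id       INTEGER PRIMARY KEY,
        book_id        INTEGER NOT NULL,
        payment_status TEXT    NOT NULL,
        purchase_date  TEXT    NOT NULL
    );
    INSERT INTO tblBooks VALUES
        (233, 'My Autobiography'),
        (145, 'How to learn English'),
        (201, 'Finding Nemo');
    INSERT INTO tblPayments VALUES
        (1, 233, 'SUCCESS', '2017-04-05'),
        (2, 145, 'FAILED',  '2017-04-10'),
        (3, 233, 'FAILED',  '2017-04-05'),
        (4, 233, 'SUCCESS', '2017-04-05');
""")

# The date filter lives in the ON clause so that books with no matching
# payments are kept. COUNT(p.trans_id) counts only real payment rows, and
# COALESCE turns the NULL produced by dividing by zero into 0.
query = """
    SELECT b.book_id,
           b.book_name,
           COALESCE(
               100.0 * SUM(CASE WHEN p.payment_status = 'SUCCESS' THEN 1 ELSE 0 END)
                     / COUNT(p.trans_id),
               0) AS percentage
    FROM tblBooks AS b
    LEFT JOIN tblPayments AS p
           ON p.book_id = b.book_id
          AND p.purchase_date BETWEEN ? AND ?
    GROUP BY b.book_id, b.book_name
    ORDER BY percentage DESC, b.book_id
    LIMIT 5
"""
rows = conn.execute(query, ("2017-04-01", "2017-04-25")).fetchall()
for book_id, book_name, percentage in rows:
    print(book_id, book_name, round(percentage))
```

With the sample data this lists 233 at 67, then 145 and 201 at 0. Whether this is cheaper or more expensive than the inner-join versions on a large tblPayments table is something to measure rather than assume.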
I searched and found posts similar to what I am trying to accomplish, but not an exact solution. I have a table of grouped articles (articles that have information in common). I need to select articles from that table where at least 10 articles belong to the group.
Group ID | Article ID | Posting Date
------------------------------------
1 | 1234 | 2017-07-14
1 | 5678 | 2017-07-14
1 | 9000 | 2017-07-14
2 | 8001 | 2017-07-14
2 | 8002 | 2017-07-14
SELECT `groupid`, `article_id`, `publish_date`
FROM `article_group`
WHERE `groupid` IN ( SELECT `groupid`, count(`groupid`) as cnt
FROM `article_group`
WHERE date(`publish_date`) = '2017-07-14'
group by `groupid`
having cnt > 10
order by cnt desc
)
I understand the sub-query should just return the one column, but how do I accomplish this with the count and having?
You are very close. You should only be selecting one column in the subquery and the ORDER BY is not necessary:
SELECT `groupid`, `article_id`, `publish_date`
FROM `article_group`
WHERE `groupid` IN (SELECT `groupid`
FROM `article_group`
WHERE date(`publish_date`) = '2017-07-14'
GROUP BY `groupid`
HAVING COUNT(*) > 10
)
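A small way to try the one-column subquery on the sample rows, assuming Python's sqlite3 as a stand-in for MySQL (my assumption). Because the sample only has groups of 3 and 2 articles, the threshold is passed as a parameter and set to 3 here. The sketch uses >= to express "at least"; use > if "more than" is what you want.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article_group (
        groupid      INTEGER NOT NULL,
        article_id   INTEGER NOT NULL,
        publish_date TEXT    NOT NULL
    );
    INSERT INTO article_group VALUES
        (1, 1234, '2017-07-14'),
        (1, 5678, '2017-07-14'),
        (1, 9000, '2017-07-14'),
        (2, 8001, '2017-07-14'),
        (2, 8002, '2017-07-14');
""")

# The subquery returns a single column; the count only appears in HAVING.
query = """
    SELECT groupid, article_id, publish_date
    FROM article_group
    WHERE groupid IN (SELECT groupid
                      FROM article_group
                      WHERE publish_date = ?
                      GROUP BY groupid
                      HAVING COUNT(*) >= ?)
    ORDER BY groupid, article_id
"""
min_articles = 3  # hypothetical threshold chosen to fit the tiny sample
rows = conn.execute(query, ("2017-07-14", min_articles)).fetchall()
for row in rows:
    print(row)
```

Only the three articles of group 1 are returned, since group 2 has just two articles on that date.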
I have two tables: a cost table and a payment table. The cost table contains the cost of each product along with the product name.
Cost Table
id | cost | name
1 | 100 | A
2 | 200 | B
3 | 200 | A
Payment Table
pid | amount | costID
1 | 10 | 1
2 | 20 | 1
3 | 30 | 2
4 | 50 | 1
Now I have to sum the total cost for rows with the same name, and also sum the total amount of payments by costID, like the result below:
totalTable
name | sum(cost) | sum(amount) |
A | 300 | 80 |
B | 200 | 30 |
However, I have been trying to work this out using the query below, but I think I am doing it very wrong.
SELECT
b.name,
b.sum(cost),
a.sum(amount)
FROM
`Payment Table` a
LEFT JOIN
`Cost Table` b
ON
b.id=a.costID
GROUP by b.name,a.costID
I would be grateful if somebody would help me with my queries or better still an idea as to how to go about it. Thank you
This should work:
select t2.name, sum(t2.cost), coalesce(sum(t1.amount), 0) as amount
from (
select id, name, sum(cost) as cost
from `Cost`
group by id, name
) t2
left join (
select costID, sum(amount) as amount
from `Payment`
group by CostID
) t1 on t2.id = t1.costID
group by t2.name
SQLFiddle
You need to do each calculation in a separate query and then join them together.
The first one is straightforward.
For the second one you need to get the name associated with each payment based on its costID.
SQL Fiddle Demo
SELECT C.`name`, C.`sum_cost`, COALESCE(P.`sum_amount`,0 ) as `sum_amount`
FROM (
SELECT `name`, SUM(`cost`) as `sum_cost`
FROM `Cost`
GROUP BY `name`
) C
LEFT JOIN (
SELECT `Cost`.`name`, SUM(`Payment`.`amount`) as `sum_amount`
FROM `Payment`
JOIN `Cost`
ON `Payment`.`costID` = `Cost`.`id`
GROUP BY `Cost`.`name`
) P
ON C.`name` = P.`name`
OUTPUT
| name | sum_cost | sum_amount |
|------|----------|------------|
| A | 300 | 80 |
| B | 200 | 30 |
A couple of issues. For one thing, the column references should be qualified, not the aggregate functions.
This is invalid:
table_alias.SUM(column_name)
Should be:
SUM(table_alias.column_name)
This query should return the first two columns you are looking for:
SELECT c.name AS `name`
, SUM(c.cost) AS `sum(cost)`
FROM `Cost Table` c
GROUP BY c.name
ORDER BY c.name
When you introduce a join to another table, like Payment Table, where costid is not UNIQUE, you have the potential to produce a (partial) Cartesian product.
To see what that looks like, to see what's happening, remove the GROUP BY and the aggregate SUM() functions, and take a look at the detail rows returned by a query with the join operation.
SELECT c.id AS `c.id`
, c.cost AS `c.cost`
, c.name AS `c.name`
, p.pid AS `p.pid`
, p.amount AS `p.amount`
, p.costid AS `p.costid`
FROM `Cost Table` c
LEFT
JOIN `Payment Table` p
ON p.costid = c.id
ORDER BY c.id, p.pid
That's going to return:
c.id | c.cost | c.name | p.pid | p.amount | p.costid
1 | 100 | A | 1 | 10 | 1
1 | 100 | A | 2 | 20 | 1
1 | 100 | A | 4 | 50 | 1
2 | 200 | B | 3 | 30 | 2
3 | 200 | A | NULL | NULL | NULL
Notice that we are getting three copies of the id=1 row from Cost Table.
So, if we modified that query, adding a GROUP BY c.name, and wrapping c.cost in a SUM() aggregate, we're going to get an inflated value for total cost.
To avoid that, we can aggregate the amount from the Payment Table, so we get only one row for each costid. Then when we do the join operation, we won't be producing duplicate copies of rows from Cost.
Here's a query to aggregate the total amount from the Payment Table, so we get a single row for each costid.
SELECT p.costid
, SUM(p.amount) AS tot_amount
FROM `Payment Table` p
GROUP BY p.costid
ORDER BY p.costid
That would return:
costid | tot_amount
1 | 80
2 | 30
We can use the results from that query as if it were a table, by making that query an "inline view". In this example, we assign an alias of v to the query results. (In the MySQL vernacular, an "inline view" is called a "derived table".)
SELECT c.name AS `name`
, SUM(c.cost) AS `sum_cost`
, IFNULL(SUM(v.tot_amount),0) AS `sum_amount`
FROM `Cost Table` c
LEFT
JOIN ( -- inline view to return total amount by costid
SELECT p.costid
, SUM(p.amount) AS tot_amount
FROM `Payment Table` p
GROUP BY p.costid
ORDER BY p.costid
) v
ON v.costid = c.id
GROUP BY c.name
ORDER BY c.name
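To see the inflation and the fix side by side, here is a rough sketch that runs both variants on the sample data. It assumes Python's sqlite3 as a stand-in for MySQL, and the table names cost and payment (without spaces) are hypothetical simplifications of Cost Table and Payment Table.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cost (id INTEGER PRIMARY KEY, cost INTEGER, name TEXT);
    CREATE TABLE payment (pid INTEGER PRIMARY KEY, amount INTEGER, costID INTEGER);
    INSERT INTO cost VALUES (1, 100, 'A'), (2, 200, 'B'), (3, 200, 'A');
    INSERT INTO payment VALUES (1, 10, 1), (2, 20, 1), (3, 30, 2), (4, 50, 1);
""")

# Naive version: joining the detail rows first repeats cost row id=1 three
# times, so SUM(c.cost) for name A is inflated.
naive = conn.execute("""
    SELECT c.name, SUM(c.cost), COALESCE(SUM(p.amount), 0)
    FROM cost AS c
    LEFT JOIN payment AS p ON p.costID = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Pre-aggregated version: payments are collapsed to one row per costID
# before the join, so each cost row appears exactly once.
fixed = conn.execute("""
    SELECT c.name, SUM(c.cost), COALESCE(SUM(v.tot_amount), 0)
    FROM cost AS c
    LEFT JOIN (SELECT costID, SUM(amount) AS tot_amount
               FROM payment
               GROUP BY costID) AS v
           ON v.costID = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print("naive:", naive)
print("fixed:", fixed)
```

The naive query reports a cost of 500 for A (100 counted three times plus 200), while the pre-aggregated one gives the expected 300. The payment totals happen to agree in both, because each payment row still appears only once.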
I have this query
SELECT
`from_id` as user_id,
MAX(`createdon`) as updated_at,
SUM(`unread`) as new,
u.username,
p.sessionid,
s.access
FROM (
SELECT `from_id`, `createdon`, `unread`
FROM `modx_messenger_messages`
WHERE `to_id` = {$id}
UNION
SELECT `to_id`, `createdon`, 0
FROM `modx_messenger_messages`
WHERE `from_id` = {$id}
ORDER BY `createdon` DESC
) as m
LEFT JOIN `modx_users` as u ON (u.id = m.from_id)
LEFT JOIN `modx_user_attributes` as p ON (p.internalKey = m.from_id)
LEFT JOIN `modx_session` as s ON (s.id = p.internalKey)
GROUP BY `from_id`
ORDER BY `new` DESC, `createdon` DESC;
table
id | message | createdon | from_id | to_id | unread
1 | test | NULL | 5 | 6 | 0
2 | test2 | NULL | 6 | 5 | 1
3 | test3 | NULL | 6 | 5 | 1
The result is new = 28. Why?
If I remove the joins, new = 2, which is correct.
Though it depends on the actual database, pure SQL says that a statement using GROUP BY requires all non-aggregated columns to be in the GROUP BY. Without including all columns, weird stuff can happen, which might explain why you get different results. If you know that the other columns are going to be the same within the user_id, you could do MAX(u.username) or something similar (again, depending on your database server). So I'd try and clean up the SQL statement first.
THIS IS NOT THE SAME QUESTION AS: Using LIMIT within GROUP BY to get N results per group?
but I admit it is similar.
I need to select the first 2 rows per person.
The rows are ordered by the year received.
Problem: it is possible that 2 entries were made in the same month (the date is entered as YYYY-MM).
The query I came up with (following the referenced question) gets stuck in a BIG loop.
SELECT *
FROM `table_data` as b
WHERE (
SELECT count(*) FROM `table_data` as a
WHERE a.Nom = b.Nom and a.year < b.year
) <= 2;
Sample Data :
A | year | Nom
---------------------
b | 2011-01 | Tim
---------------------
d | 2011-01 | Tim
---------------------
s | 2011-01 | Tim
---------------------
a | 2011-03 | Luc
---------------------
g | 2011-01 | Luc
---------------------
s | 2011-01 | Luc
Should export :
A | year | Nom
---------------------
b | 2011-01 | Tim
---------------------
d | 2011-01 | Tim
---------------------
a | 2011-03 | Luc
---------------------
g | 2011-01 | Luc
(
-- First get a set of results as if you only wanted the latest entry for each
-- name - a simple GROUP BY from a derived table with an ORDER BY
SELECT *
FROM (
SELECT *
FROM `table_data`
ORDER BY `year` DESC
) `a`
GROUP BY `Nom`
)
UNION
(
-- Next union it with the set of result you get if you apply the same criteria
-- and additionally specify that you do not want any of the rows found by the
-- first operation
SELECT *
FROM (
SELECT *
FROM `table_data`
WHERE `id` NOT IN (
SELECT `id`
FROM (
SELECT *
FROM `table_data`
ORDER BY `year` DESC
) `a`
GROUP BY `Nom`
)
ORDER BY `year` DESC
) `b`
GROUP BY `Nom`
)
-- Optionally apply ordering to the final results
ORDER BY `Nom` DESC, `year` DESC
I feel sure there is a shorter way of doing it but right now I can't for the life of me work out what it is. That does work, though - assuming you have a primary key (which you should) and that it is called id.
I have this table named "events" in my mysql database:
+-----+-----------+------------------------------+------------+
| ID | CATEGORY | NAME | TYPE |
+-----+-----------+------------------------------+------------+
| 1 | 1 | Concert | music |
| 2 | 2 | Basketball match | indoors |
| 3 | 1 | Theather play | outdoors |
| 4 | 1 | Concert | outdoors |
+-----+-----------+------------------------------+------------+
I need a query to count the events with category 1 whose type is both music and outdoors.
Meaning that from the table above the count should be only 1: there are three events with category 1
but only "Concert" has both type outdoors and type music (ID 1 and ID 4).
What would be that query? Can that be done?
Try this:
SELECT count(DISTINCT e1.name)
FROM `events` AS e1
JOIN `events` AS e2 ON e1.name = e2.name
WHERE e1.category = 1
AND e2.category = 1
AND e1.type = 'music'
AND e2.type = 'outdoors'
Or a harder-to-understand way that should be faster than the previous one, since it avoids the self-join:
SELECT count(*) FROM (
SELECT `name`
FROM `events`
WHERE `category` = 1
GROUP BY `name`
HAVING SUM( `type` = 'music') * SUM( `type` = 'outdoors' ) >= 1
) AS notNeeded
For 2 criteria I would use Alin's answer. An approach you can use for greater numbers is below.
SELECT COUNT(*)
FROM (SELECT `name`
FROM `events`
WHERE `category` = 1
AND `type` IN ( 'outdoors', 'music' )
GROUP BY `name`
HAVING COUNT(DISTINCT `type`) = 2) t
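To illustrate how this scales to more criteria, the sketch below builds the IN list and the HAVING count from a Python list of required types. It assumes Python's sqlite3 as a stand-in for MySQL, and the variable name required is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, category INTEGER, name TEXT, type TEXT);
    INSERT INTO events VALUES
        (1, 1, 'Concert',          'music'),
        (2, 2, 'Basketball match', 'indoors'),
        (3, 1, 'Theather play',    'outdoors'),
        (4, 1, 'Concert',          'outdoors');
""")

required = ["music", "outdoors"]  # every type an event name must have

# One "?" placeholder per required type, e.g. "?,?" for two types.
placeholders = ",".join("?" * len(required))
query = f"""
    SELECT COUNT(*)
    FROM (SELECT name
          FROM events
          WHERE category = ?
            AND type IN ({placeholders})
          GROUP BY name
          HAVING COUNT(DISTINCT type) = ?) AS t
"""
# A name qualifies only if it matched as many distinct types as were required.
count = conn.execute(query, (1, *required, len(required))).fetchone()[0]
print(count)
```

With the sample table only "Concert" has both types in category 1, so the count is 1. Adding a third type to required needs no change to the query text.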
Try this query
Select count(*), group_concat(TYPE SEPARATOR ',') as types
from events where category = 1
HAVING LOCATE('music', types) and LOCATE('outdoors', types)
try:
SELECT * FROM `events` AS e1
LEFT JOIN `events` AS e2 USING (`name`)
WHERE e1.`category` = 1 AND e2.`category` = 1 AND e1.`type` = 'music' AND e2.`type` = 'outdoors'
SELECT COUNT(*)
FROM table
WHERE category=1
AND type='music' AND type IN (SELECT type
FROM table
WHERE type = 'outdoor')
one line keeps resetting my connection. wth? i'll try posting as a comment
Select count(distinct ID) as 'eventcount' from events where Category = '1' and Type in('music','outdoor')