MySQL: Sum the total number of unique ids - mysql

In my database, I wish to output the total number of books with reviews after a certain date:
> SELECT book_id, AVG(score)
FROM review
WHERE review.date > "2012-07-11"
GROUP BY review.book_id ;
+---------+------------+
| book_id | AVG(score) |
+---------+------------+
| 345335 | 3.5 |
| 974147 | 3 |
| 723923 | 4 |
| 281192 | 3 |
| 384423 | 3.5 |
| 123122 | 3.5 |
| 112859 | 3 |
| 234892 | 5 |
+---------+------------+
Now, I would like to know the "total number" of books which meet this condition. That is, I need a total sum of the book_id.
However, I am not sure how to do this. How do you SELECT the SUM(book_id)?

First of all, I'm pretty sure you don't want the SUM because that would be 3,179,893. SUM means adding up all the values and totaling them.
Instead you probably want the COUNT of DISTINCT ids that match your criteria. COUNTing means "how many rows" or using your words the "total number" of entities. And DISTINCT is the keyword which only looks at unique values.
So in SQL, this would be:
select count(distinct book_id)
from review
where review.date > '2012-07-11'
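To see the difference between SUM and COUNT(DISTINCT ...) concretely, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The rows are made up for illustration (the question only shows per-book averages, not the underlying reviews), and the dates are assumed to be stored as ISO-formatted strings so they compare correctly as text.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review (book_id INTEGER, score REAL, date TEXT)")
conn.executemany(
    "INSERT INTO review VALUES (?, ?, ?)",
    [
        (345335, 3.0, "2012-07-12"),
        (345335, 4.0, "2012-08-01"),  # second review of the same book
        (974147, 3.0, "2012-07-15"),
        (723923, 4.0, "2012-09-30"),
        (281192, 2.0, "2012-07-01"),  # before the cutoff, so filtered out
    ],
)

# SUM adds the id values together, which is meaningless for identifiers.
id_sum = conn.execute(
    "SELECT SUM(book_id) FROM review WHERE date > '2012-07-11'"
).fetchone()[0]

# COUNT(DISTINCT ...) answers "how many different books?".
book_count = conn.execute(
    "SELECT COUNT(DISTINCT book_id) FROM review WHERE date > '2012-07-11'"
).fetchone()[0]

print(id_sum, book_count)
```

With this sample data, the book reviewed twice is counted once, and the review before the cutoff is ignored.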

Maybe using COUNT() is what you want:
https://dev.mysql.com/doc/refman/5.7/en/counting-rows.html

Related

How to select for percentage and count calculation from different tables

I need help completing a query that does the following:
calculate the percentage of voters who have voted, per batch.
get the number of voters per batch.
My tables:
voters table
+-------------+-------------+----------------+
| stud_id | name | batch |
+-------------+-------------+----------------+
| 1 | Peter | 2016 |
| 2 | John | 2017 |
| 3 | Wick | 2017 |
+-------------+-------------+----------------+
vote table
+-------------+----------------+
| vote_id | stud_id |
+-------------+----------------+
| 1 | 1 |
| 2 | 2 |
+-------------+----------------+
I've tried this query:
SELECT voters.batch,COUNT(*) AS voted_batch, 100.0 * COUNT(*) / (SELECT COUNT(*) FROM vote) AS percentage
FROM vote JOIN voters WHERE voters.stud_id=vote.stud_id
GROUP BY batch asc
This query only shows the percentage of all votes (not per batch) and the number of votes per batch; I can't work out how to also show the total number of voters in each batch.
My expected result is:
+-------------+----------------+----------------+----------------+
| batch |total_batch | voted_batch | percentage |
+-------------+----------------+----------------+----------------+
|2016 | 1 | 1 | 100 |
|2017 | 2 | 1 | 50 |
+-------------+----------------+----------------+----------------+
I'd much appreciate your support. Thank you very much.
To differentiate between the total number of students in a batch, and the number of students who voted in a given batch, you need to use different fields in your COUNT expressions:
SELECT
voters.batch,
COUNT(*) AS total_batch,
COUNT(vote.vote_id) AS voted_batch,
COUNT(vote.vote_id) * 100.0 / COUNT(*) AS percentage
FROM voters
LEFT JOIN vote ON
vote.stud_id = voters.stud_id
GROUP BY voters.batch
Note that:
COUNT(*) counts all students, even those who haven't voted (since we're using a LEFT JOIN).
COUNT(vote.vote_id), i.e. counting by the primary key of the table on the right side of the join, only counts students who have voted.
We multiply by 100.0 to express the ratio as a percentage, matching your expected output. The decimal literal also guards against integer division in databases that truncate integer / integer; MySQL's / operator already returns a decimal result.
Finally, you might want to consider changing your table names to be consistent: One is named voters (plural), while the other is named vote (singular). To avoid starting a religious war, I won't tell you which one to go for.
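As a quick check of the LEFT JOIN approach, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database (via Python's built-in sqlite3 module, used here as a stand-in for MySQL) and runs the query with the ratio scaled by 100.0 so it matches the expected table. The ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL, loaded with the
# sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE voters (stud_id INTEGER PRIMARY KEY, name TEXT, batch INTEGER);
    CREATE TABLE vote (vote_id INTEGER PRIMARY KEY, stud_id INTEGER);
    INSERT INTO voters VALUES (1, 'Peter', 2016), (2, 'John', 2017), (3, 'Wick', 2017);
    INSERT INTO vote VALUES (1, 1), (2, 2);
""")

rows = conn.execute("""
    SELECT voters.batch,
           COUNT(*)            AS total_batch,  -- every student in the batch
           COUNT(vote.vote_id) AS voted_batch,  -- NULLs (non-voters) are skipped
           COUNT(vote.vote_id) * 100.0 / COUNT(*) AS percentage
    FROM voters
    LEFT JOIN vote ON vote.stud_id = voters.stud_id
    GROUP BY voters.batch
    ORDER BY voters.batch
""").fetchall()

for row in rows:
    print(row)
```

This yields one row per batch: 1 of 1 voted in 2016 (100%), and 1 of 2 voted in 2017 (50%).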

count groupings of multiple columns

I have a table of tickets to multiple dates of shows. Basically, it looks like this...
+----+---------------+--------------+-----------+
| ID | ticket_holder | ticket_buyer | show_date |
+----+---------------+--------------+-----------+
ticket_holder and ticket_buyer are both user ids
If I wanted to count the total number of tickets that one ticket holder has, I could group by that holder and count the rows, but I want more stats than that.
I want to know a user's total bought tickets, how many they hold and how many shows they've bought tickets for.
+------+---------+--------+-------+
| USER | HOLDING | BOUGHT | DATES |
+------+---------+--------+-------+
| 1 | 12 | 24 | 7 |
+------+---------+--------+-------+
| 2 | 3 | 4 | 2 |
+------+---------+--------+-------+
| 3 | 1 | 2 | 1 |
+------+---------+--------+-------+
Is it possible to put all this in one query, or do I need to do PHP stuff to make it happen?
I would do it in multiple queries. You can't group by either ticket_holder or ticket_buyer like you want, in a single query. If you try GROUP BY ticket_holder, ticket_buyer then it will group by both columns, which is not what you want.
SELECT ticket_holder, COUNT(*) AS tickets_held
FROM `a table of tickets` GROUP BY ticket_holder;
SELECT ticket_buyer, COUNT(*) as tickets_bought
FROM `a table of tickets` GROUP BY ticket_buyer;
SELECT ticket_buyer, COUNT(DISTINCT show_date) AS shows_bought
FROM `a table of tickets` GROUP BY ticket_buyer;
Not every task has to be accomplished in a single query! It's part of the design of SQL that it should be used by some application language, and you're expected to handle formatting and display in the application.
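To illustrate the "let the application combine the results" idea, here is a rough sketch in Python (the question mentions PHP, but the pattern is the same in any language). It uses the built-in sqlite3 module as a stand-in for MySQL; the table name tickets and the sample rows are made up for the example.

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite database standing in for MySQL; "tickets" and its
# rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (id INTEGER PRIMARY KEY, ticket_holder INTEGER, "
    "ticket_buyer INTEGER, show_date TEXT)"
)
conn.executemany(
    "INSERT INTO tickets (ticket_holder, ticket_buyer, show_date) VALUES (?, ?, ?)",
    [
        (1, 1, "2024-01-05"),
        (2, 1, "2024-01-05"),
        (1, 1, "2024-02-10"),
        (2, 2, "2024-02-10"),
    ],
)

# One simple query per statistic, each grouped by the relevant user column.
queries = {
    "holding": "SELECT ticket_holder, COUNT(*) FROM tickets GROUP BY ticket_holder",
    "bought": "SELECT ticket_buyer, COUNT(*) FROM tickets GROUP BY ticket_buyer",
    "dates": "SELECT ticket_buyer, COUNT(DISTINCT show_date) "
             "FROM tickets GROUP BY ticket_buyer",
}

# Merge the three result sets into one record per user; users missing
# from a result set keep the default of 0.
stats = defaultdict(lambda: {"holding": 0, "bought": 0, "dates": 0})
for column, sql in queries.items():
    for user, value in conn.execute(sql):
        stats[user][column] = value

for user in sorted(stats):
    print(user, stats[user])
```

Each query stays trivial, and the merge step is a few lines of application code.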

MySQL: select all rows where just the name is distinct [duplicate]

I'm currently trying to select unique entries in only the name column. I have tried using this query but it will not return prices that are the same as well. I've tried other variations with no success either.
SELECT DISTINCT name, price from table;
Here's the table I'm working with:
+----+-----------+-------+
| id | name      | price |
+----+-----------+-------+
| 1  | Henry     | 20    |
| 2  | Henry     | 30    |
| 3  | Robert    | 20    |
| 4  | Joshua    | 10    |
| 5  | Alexander | 30    |
+----+-----------+-------+
The output that I'm seeking is:
+----+-----------+-------+
| id | name      | price |
+----+-----------+-------+
| 1  | Henry     | 20    |
| 3  | Robert    | 20    |
| 4  | Joshua    | 10    |
| 5  | Alexander | 30    |
+----+-----------+-------+
The desired output as you can tell only removed the duplicate name and none of the prices. Is there something I can add to my query above to only select unique entries in the name column? Any help is really appreciated as I have tried to find a solution on here, Google, DuckDuckGo, etc. with no luck.
From your sample data, this should work.
SELECT MIN(Id) AS Id, name, MIN(price) AS price
FROM `table`
GROUP BY name;
This is what GROUP BY is for:
SELECT * FROM `table` GROUP BY `name`
Usually people run into trouble because they will now get an arbitrarily-chosen row when more than one matches for a given name — you have to use aggregate functions to pick a specific one, e.g. "the one with the maximum price".
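But in your case, since you don't seem to care which row is returned, this works as-is, provided the ONLY_FULL_GROUP_BY SQL mode is disabled. With that mode enabled (the default since MySQL 5.7.5), the query is rejected with an error.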
So you want to select a distinct list of names and then fetch the entire row for each one from the table? Try this query, where the derived table is just a list of unique ids and each id is joined back to the table.
Select n.*
From nameprices n
Join (Select MIN(id) as id
From nameprices
Group by name
Order By id) aTemp On (aTemp.id=n.id);
This is a common problem in SQL queries: we want the full row data, but the filter relies on a DISTINCT/GROUP BY formula.
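Here is a small sketch of the join-back approach run against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the alias first_ids is my own naming, and the outer ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL, loaded with the
# sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nameprices (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
    INSERT INTO nameprices VALUES
        (1, 'Henry', 20), (2, 'Henry', 30), (3, 'Robert', 20),
        (4, 'Joshua', 10), (5, 'Alexander', 30);
""")

# The derived table picks the lowest id per name; joining it back to the
# base table returns the complete row for each of those ids.
rows = conn.execute("""
    SELECT n.id, n.name, n.price
    FROM nameprices n
    JOIN (SELECT MIN(id) AS id FROM nameprices GROUP BY name) first_ids
      ON first_ids.id = n.id
    ORDER BY n.id
""").fetchall()

for row in rows:
    print(row)
```

Because the row is chosen by id rather than left to the database, the result is deterministic: the second Henry row (id 2) is dropped and every other row is kept whole.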

MySQL query to select rows from table 2 if *all* rows from table 1 are not present

I'm doing a kind of point-of-sale system whose MySQL database has (among other things) a table with items for sale, a table with sales, and a table with purchases (a purchase being my ad-hoc notation for any single item bought in a sale; if the same person buys three items at once, for example, that's one sale consisting of three purchases). All these tables have logical IDs, viz. item_id, sale_id, purchase_id, and are easily joined with simple pivotal tables.
I am now trying to add a discount feature; basically your garden-variety supermarket discount: buy these particular items and pay X instead of paying the full sum of the regular item prices. These 'package deals' have their own table and are linked to the items table with a simple pivotal table containing deal_id and item_id.
My problem is getting to the point of figuring out when this is to be applied. To give some example data:
items
+---------+--------+---------+
| item_id | title | price |
+---------+--------+---------+
| 12 | Shoe | 10 |
| 76 | Coat | 23 |
| 82 | Whip | 19 |
+---------+--------+---------+
sales
+---------+-----------+
| sale_id | timestamp |
+---------+-----------+
| 2973 | 144995839 |
| 3092 | 144996173 |
+---------+-----------+
purchases
+-------------+-------------+---------+----------+---------+
| purchase_id | no_of_items | item_id | at_price | sale_id |
+-------------+-------------+---------+----------+---------+
| 12993 | 1 | 12 | 10 | 2973 |
| 12994 | 1 | 76 | 23 | 2973 |
| 12996 | 1 | 82 | 19 | 2973 |
| 13053 | 1 | 12 | 10 | 3092 |
| 13054 | 1 | 82 | 19 | 3092 |
+-------------+-------------+---------+----------+---------+
package_deals
+---------+-------+
| deal_id | price |
+---------+-------+
| 1 | 40 |
+---------+-------+
deals_items
+---------+---------+
| deal_id | item_id |
+---------+---------+
| 1 | 12 |
| 1 | 76 |
| 1 | 82 |
+---------+---------+
As is hopefully obvious from that, we have a shoe that costs $10 (let's just assume we use dollars as our currency here, doesn't matter), a coat that costs $23, and a whip that costs $19. We also have a package deal: if you buy a shoe, a coat, and a whip together, you get the whole thing for $40.
Of the two sales given, one (2973) has purchased all three things and will get the discount, while the other (3092) has purchased only the shoe and the whip and won't get the discount.
In order to find out whether or not to apply the package-deal discount, I of course have to find out whether all the item_ids in a package deal are present in the purchases table for a given sale_id.
How do I do this?
I thought I should be able to do something like this:
SELECT deal_id, item_id, purchase_id
FROM package_deals
LEFT JOIN deals_items
USING (deal_id)
LEFT JOIN purchases
USING (item_id)
WHERE
sale_id = 2973
AND item_id IS NULL
GROUP BY deal_id
In my head, that retrieved all rows from the package_deals table where at least one of the item_ids associated with the package deal in question does not have a corresponding match in the purchases table for the sale_id given. This would then have told me which packages don't apply; i.e., it would return zero rows for sale 2973 (since none of the items associated with package deal 1 are absent from the purchases table filtered on sale_id = 2973) and one row for sale 3092 (since one of the items associated with package deal 1, namely the coat, item_id 76, is absent from the purchases table filtered on sale_id = 3092).
Obviously, it doesn't do what I naïvely thought it would—rather, it just always returns zero rows, no matter what.
It doesn't really matter much to me whether the resulting set gives me one row for each package deal that should apply, or one for each package deal that shouldn't apply—but how do I get it to show me either in a single query?
Is it even possible?
The problem with your query above is that sale_id is also NULL in the missing row that you're interested in, due to the LEFT JOIN.
This query will return the deal_id for any deals that DO NOT apply to a given order:
SELECT DISTINCT
pd.deal_id
FROM package_deals pd
JOIN deals_items di on pd.deal_id = di.deal_id
WHERE di.item_id NOT IN (SELECT item_id FROM purchases WHERE sale_id = 3092)
From that it's easy to work out the ones that do apply. Note that for a fully functioning system, you'd still need to take the purchase quantities into account, e.g. if the customer had bought 2 of two of the items in the deal, but only 1 of the third.
A SQL fiddle demonstrating the query is here: http://sqlfiddle.com/#!9/f2ae4/8
Note that I've written my joins using the ON syntax, as I'm simply more familiar with it than with USING. I expect USING would work too if you prefer it.
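For anyone who wants to try this without the fiddle, here is a minimal sketch that loads the sample data from the question into an in-memory SQLite database (Python's built-in sqlite3 module, standing in for MySQL) and runs the NOT IN query for both sales. The helper name deals_not_applicable is made up for this example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL, loaded with the
# sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE package_deals (deal_id INTEGER PRIMARY KEY, price INTEGER);
    CREATE TABLE deals_items (deal_id INTEGER, item_id INTEGER);
    CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, no_of_items INTEGER,
                            item_id INTEGER, at_price INTEGER, sale_id INTEGER);
    INSERT INTO package_deals VALUES (1, 40);
    INSERT INTO deals_items VALUES (1, 12), (1, 76), (1, 82);
    INSERT INTO purchases VALUES
        (12993, 1, 12, 10, 2973), (12994, 1, 76, 23, 2973), (12996, 1, 82, 19, 2973),
        (13053, 1, 12, 10, 3092), (13054, 1, 82, 19, 3092);
""")


def deals_not_applicable(sale_id):
    """Return deal_ids that have at least one item missing from the sale."""
    sql = """
        SELECT DISTINCT pd.deal_id
        FROM package_deals pd
        JOIN deals_items di ON pd.deal_id = di.deal_id
        WHERE di.item_id NOT IN (SELECT item_id FROM purchases WHERE sale_id = ?)
    """
    return [row[0] for row in conn.execute(sql, (sale_id,))]


print(deals_not_applicable(2973))  # sale containing all three items
print(deals_not_applicable(3092))  # sale missing the coat
```

Sale 2973 contains every item in deal 1, so no deal is reported as missing items; sale 3092 lacks the coat, so deal 1 is reported. As noted above, this sketch ignores quantities.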

Mysql query Max not working

What I want to happen is to group by parentid first, then by position, which I have done. Within each group I want the name with the highest rating to be displayed, which isn't happening. Instead, the lowest id for each group is being displayed. The results should be tv1, tv3, tv5, tv7, as these are the highest-rated entries in each group.
id | name| parentid| position| rating |
1 | tv1 | 1 | 1 | 6 |
2 | tv2 | 1 | 2 | 5 |
3 | tv3 | 1 | 2 | 7 |
4 | tv4 | 1 | 2 | 3 |
5 | tv5 | 5 | 1 | 8 |
6 | tv6 | 5 | 1 | 2 |
7 | tv7 | 3 | 1 | 9 |
8 | tv8 | 3 | 1 | 3 |
$getquery = mysql_query("SELECT name,MAX(rating) FROM outcomes GROUP BY position,parentid") or die(mysql_error());
while($row=mysql_fetch_assoc($getquery)) {
$name = $row['name'];
$rating = $row['rating'];
echo "<p>Name: $name - $rating</p><p></p>";
}
It's not that the lowest id is being displayed -- you're not actually selecting the id column. Probably what you are seeing is the first entry in the name column for each group.
SELECT name, MAX(rating)
doesn't do what you think it does: it doesn't instruct MySQL to pick the maximum value from the rating column and also return the name associated with that row. (Aside: what do you think it would return if there was a tie for the maximum rating? What do you think it would return if you used AVG rather than MAX?)
What it does instead is return the correctly calculated MAX(rating), and then one of the names out of that group. It doesn't guarantee which one gets returned, and it can change depending on how it decides to execute the query.
In fact, because of the undefined nature of a query such as this, it's not even legal SQL in other databases. (Try this in Postgres, and you'll get an error. Heck, try it in MySQL with the ONLY_FULL_GROUP_BY option enabled, and you'll get a similar error)
If what you want to do is find the maximum rating for each group, and then find the name associated with it, you'll have to do something like this:
SELECT name, max_rating FROM outcomes
JOIN (SELECT position, parentid, MAX(rating) AS max_rating from outcomes group by position, parentid) AS aggregated_table
USING (position, parentid)
WHERE rating = max_rating
(There are four or five other ways to do this, searching this site for mysql and aggregation will likely turn them up)
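To check the join-on-aggregate approach against the sample table, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. It spells the join out with ON rather than USING, and the alias agg and the final ORDER BY are additions of mine for readability and a stable output order.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL, loaded with the
# sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outcomes (id INTEGER PRIMARY KEY, name TEXT, parentid INTEGER,
                           position INTEGER, rating INTEGER);
    INSERT INTO outcomes VALUES
        (1, 'tv1', 1, 1, 6), (2, 'tv2', 1, 2, 5), (3, 'tv3', 1, 2, 7),
        (4, 'tv4', 1, 2, 3), (5, 'tv5', 5, 1, 8), (6, 'tv6', 5, 1, 2),
        (7, 'tv7', 3, 1, 9), (8, 'tv8', 3, 1, 3);
""")

# Step 1 (derived table): the maximum rating per (position, parentid) group.
# Step 2 (join + WHERE): keep only the rows whose rating equals that maximum.
rows = conn.execute("""
    SELECT o.name, agg.max_rating
    FROM outcomes o
    JOIN (SELECT position, parentid, MAX(rating) AS max_rating
          FROM outcomes
          GROUP BY position, parentid) agg
      ON agg.position = o.position AND agg.parentid = o.parentid
    WHERE o.rating = agg.max_rating
    ORDER BY o.id
""").fetchall()

for name, rating in rows:
    print(name, rating)
```

This returns tv1, tv3, tv5 and tv7, as the question expects. If two rows in a group tie for the maximum rating, both are returned.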