The following image is the ER diagram of the database:
My task is to create a report that includes the following:
the storeID
the store name
the number of unique players that have purchased a badge from the store
the number of unique players that have not purchased a badge from the store
the total money spent at the store
the most expensive badge a player has purchased at the store
the cheapest badge a player has purchased at the store
the average price of the items that have been purchased at the store.
But when I try to execute the following SQL command, I get this error: Error Code 1111. Invalid use of group function.
use treasurehunters;
select rpt_category.storeId,
rpt_category.storeName,
total_purchased_user,
non_purchased_player,
total_spent,
expensive_badge,
cheapest_badge
avg_spent
from
(select badgename as expensive_badge
from badge
inner join purchase
where cost = max(cost)) rpt_data_2
inner join
(select badgename as cheapest_badge
from badge
inner join purchase
where cost = min(cost)) rpt_data_3
inner join
(select distinct count(username) as total_purchased_user,
storeid,
storename,
sum(cost) as total_spent,
average(cost) as avg_spent
from player
inner join purchase
inner join store
inner join badge
on store.storeID = purchase.storeID and
purchase.username= player.username and
purchase.badgeID = badge.badgeId) rpt_category
inner join
(select count (username) as non_purchased_player,
storeid
from player
inner join purchase
on purchase.storeid != store.storeid and
player.userername= purchase.uername ) rpt_data_1;
Now, what can I do to get rid of that error?
Error 1111 is most likely triggered by the aggregate calls in your WHERE clauses (where cost = max(cost) and where cost = min(cost)): MySQL does not allow group functions in WHERE. On top of that, you're implying a store-level grouping without explicitly grouping on that column with a GROUP BY clause, so the aggregates are computed over the whole joined table rather than per store.
You can probably resolve the grouping part by adding GROUP BY store.storeID to each of your subqueries. However, there is enough else wrong with this query that diagnosing and patching it piece by piece isn't worthwhile.
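As a minimal sketch of why an aggregate in WHERE is rejected, the snippet below uses Python's built-in sqlite3 module rather than MySQL, so the error text differs from Error 1111, but the rule is the same. The badge table and its three rows are made-up sample data, not the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE badge (badgeId INTEGER PRIMARY KEY, badgeName TEXT, cost INTEGER);
    INSERT INTO badge VALUES (1, 'Bronze', 5), (2, 'Silver', 10), (3, 'Gold', 25);
""")

# An aggregate directly in WHERE is rejected: WHERE filters individual rows
# before any grouping has taken place.
try:
    conn.execute("SELECT badgeName FROM badge WHERE cost = MAX(cost)").fetchall()
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# Computing the aggregate in a scalar subquery first is accepted.
most_expensive = conn.execute(
    "SELECT badgeName FROM badge WHERE cost = (SELECT MAX(cost) FROM badge)"
).fetchall()

print(where_error)
print(most_expensive)
```

The subquery form is the smallest change that keeps the original intent; it does not by itself add the per-store grouping the report needs.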
This is all doable in a single query / grouping. Here's what your query should look like:
SELECT
store.storeID,
MAX(store.storeName) AS storeName,
COUNT(DISTINCT purchase.username) AS total_purchased_user,
MAX(player_count.players) - COUNT(DISTINCT purchase.username) AS non_purchased_user,
SUM(purchase.cost) AS total_spent,
AVG(purchase.cost) AS avg_spent,
SUBSTRING(MIN(CONCAT(LPAD(purchase.cost, 11, '0'), badge.badgeName)), 12) AS cheapest_badge,
SUBSTRING(MAX(CONCAT(LPAD(purchase.cost, 11, '0'), badge.badgeName)), 12) AS expensive_badge
FROM store
LEFT JOIN purchase ON store.storeID = purchase.storeID
LEFT JOIN badge ON purchase.badgeID = badge.badgeId
CROSS JOIN (SELECT COUNT(*) AS players FROM player) AS player_count
GROUP BY store.storeID;
What's happening here (working bottom-up):
GROUP BY store.storeID ensures the results are aggregated per store, so every other metric is calculated at that level
FROM store / LEFT JOIN all other tables ensures we get metrics for every store, whether or not there are purchases for it
CROSS JOIN (SELECT COUNT(*) FROM player) is a hack that puts the overall player count on every row, so we can compare it against each store's purchasing-player count to get the "didn't purchase" count simply and quickly, without any additional joins
COUNT(DISTINCT purchase.username) ensures that user counts are referenced from purchases. This also means we don't have to join on the player table in this main portion of the query to get purchase counts.
SUM / AVG work like you had them (note that the MySQL function is AVG, not AVERAGE)
SUBSTRING(MIN(CONCAT... these calculations are using Scalar-Aggregate Reduction, a technique I invented to prevent the need for self-joining a query to get associated min/max values. There's more on this technique here: SQL Query to get column values that correspond with MAX value of another column?
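The pad-and-concatenate trick in the last step can be tried in isolation. This is a sketch in Python's sqlite3, so it swaps MySQL's LPAD/CONCAT/SUBSTRING for SQLite's printf, || and substr; the tables and rows are made-up sample data, and it assumes cost is a non-negative integer of at most 11 digits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE badge (badgeId INTEGER PRIMARY KEY, badgeName TEXT);
    CREATE TABLE purchase (storeID INTEGER, username TEXT, badgeID INTEGER, cost INTEGER);
    INSERT INTO badge VALUES (1, 'Bronze'), (2, 'Silver'), (3, 'Gold');
    INSERT INTO purchase VALUES
        (1, 'ann', 1, 5), (1, 'bob', 3, 250), (1, 'ann', 2, 40),
        (2, 'bob', 2, 40);
""")

rows = conn.execute("""
    SELECT p.storeID,
           -- zero-pad cost to 11 characters so string order matches numeric order,
           -- append the badge name, take MIN/MAX, then cut off the 11-character prefix
           substr(MIN(printf('%011d', p.cost) || b.badgeName), 12) AS cheapest_badge,
           substr(MAX(printf('%011d', p.cost) || b.badgeName), 12) AS expensive_badge
    FROM purchase p
    JOIN badge b ON b.badgeId = p.badgeID
    GROUP BY p.storeID
    ORDER BY p.storeID
""").fetchall()

print(rows)
```

Without the padding, the strings '5', '250' and '40' would sort alphabetically and the wrong badge would win, which is why the fixed-width prefix matters.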
Cheers!
I tried to write a query, but unfortunately I didn't succeed.
I want to know how many packages delivered over a given period by a person.
So I want to know how many packages were delivered by John (user_id = 1) between 01-02-18 and 28-02-18. John drives another car (another plate_id) every day.
(orders_drivers.user_id, plates.plate_name, orders.delivery_date, orders.package_amount)
I have 3 tables:
orders with plate_id delivery_date package_amount
plates with plate_id plate_name
orders_drivers with plate_id plate_date user_id
I tried some solutions but didn't get the expected result. Thanks!
Try using JOINS as shown below:
SELECT SUM(o.package_amount)
FROM orders o INNER JOIN orders_drivers od
ON o.plate_id=od.plate_id
WHERE od.user_id=<the_user_id>;
See MySQL Join Made Easy for insight.
You can also use a subquery:
SELECT SUM(o.package_amount)
FROM orders o
WHERE EXISTS (SELECT 1
FROM orders_drivers od
WHERE user_id=<user_id> AND o.plate_id=od.plate_id);
SELECT SUM(orders.package_amount) AS amount
FROM orders
LEFT JOIN plates ON orders.plate_id = plates.plate_id
LEFT JOIN orders_drivers ON orders.plate_id = orders_drivers.plate_id
WHERE orders.delivery_date > date1 AND orders.delivery_date < date2 AND orders_drivers.user_id = userid
GROUP BY orders_drivers.user_id
But seriously, you need to ask questions that make more sense.
sum is a function that adds up all values that have been grouped by GROUP BY.
LEFT JOIN connects all tables by id = id. Any other join can do this in this case, as all ids are unique (at least I hope).
WHERE is where you give the dates and the user.
And GROUP BY userid, so if there are more records of the same id, they are returned as one (and summed by their package amount).
With the AS, your result is returned under the name 'amount'.
If you want the total package_amount by user in a period, you can use this query:
UPDATE: add a where clause on user_id, to retrieve John related data
SELECT od.user_id
, p.plate_name
, SUM(o.package_amount) AS TotalPackageAmount
FROM orders_drivers od
JOIN plates p
ON p.plate_id = od.plate_id
JOIN orders o
ON o.plate_id = od.plate_id
WHERE o.delivery_date BETWEEN CONVERT(datetime, '01/02/2018', 103) AND CONVERT(datetime, '28/02/2018', 103)
AND od.user_id = 1
GROUP BY od.user_id
, p.plate_name
It joins the three tables, filters on a period of delivery_date values, groups the rows on user_id and plate_name, and then calculates the sum of package_amount for each group.
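Because John drives a different plate every day, joining on plate_id alone may also pick up packages that another driver delivered with the same plate on a different day. A possible refinement, sketched here with Python's sqlite3, is to join on the date as well. This assumes that orders_drivers.plate_date is the day the user drove that plate and that it is comparable to orders.delivery_date (ISO date strings here); the helper name packages_delivered and all rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (plate_id INTEGER, delivery_date TEXT, package_amount INTEGER);
    CREATE TABLE orders_drivers (plate_id INTEGER, plate_date TEXT, user_id INTEGER);
    -- John (user 1) drives plate 10 on Feb 1 and plate 20 on Feb 2;
    -- user 2 drives the other plate on each day.
    INSERT INTO orders_drivers VALUES
        (10, '2018-02-01', 1), (20, '2018-02-01', 2),
        (20, '2018-02-02', 1), (10, '2018-02-02', 2);
    INSERT INTO orders VALUES
        (10, '2018-02-01', 3), (20, '2018-02-01', 7),
        (20, '2018-02-02', 4), (10, '2018-02-02', 9);
""")

def packages_delivered(user_id, start, end):
    """Sum package_amount for one driver, matching orders on plate AND day."""
    row = conn.execute("""
        SELECT COALESCE(SUM(o.package_amount), 0)
        FROM orders o
        JOIN orders_drivers od
          ON od.plate_id = o.plate_id
         AND od.plate_date = o.delivery_date   -- assumed link between the two tables
        WHERE od.user_id = ?
          AND o.delivery_date BETWEEN ? AND ?
    """, (user_id, start, end)).fetchone()
    return row[0]

john_total = packages_delivered(1, "2018-02-01", "2018-02-28")
other_total = packages_delivered(2, "2018-02-01", "2018-02-28")
print(john_total, other_total)
```

With the plate-only join, both drivers would be credited with all four orders in this sample data; the extra date condition splits them by who actually drove the plate that day.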
Please see above for the data structure. I am trying to write an SQL query to get the average number of customers for each session
My attempt:
select avg(A.NumberCustomer)
from(
select SessionName, count(distinct customers.Idcustomer) as NumberCustomer,
from customers, enrollments, sessions
where customers.Idcustomer=enrollments.Idcustomer and enrollments.Idsession=sessions.Idsession
group by sessions.SessionName
) A
But I seem to get an error on the from customers, enrollments, sessions line
Not sure about this, any help appreciated.
Thanks
You have an extra comma that you should delete:
select avg(A.NumberCustomer)
from(
select SessionName,
count(distinct customers.Idcustomer) as NumberCustomer, #<--- here
from customers, enrollments, sessions
where customers.Idcustomer=enrollments.Idcustomer
and enrollments.Idsession=sessions.Idsession
group by sessions.SessionName
) A
By the way, I suggest you move to explicit JOIN syntax (standard since SQL-92) for readability:
SELECT
avg(A.NumberCustomer)
FROM (
select
SessionName,
count(distinct customers.Idcustomer) as NumberCustomer
from customers
inner join enrollments
on customers.Idcustomer=enrollments.Idcustomer
inner join sessions
on enrollments.Idsession=sessions.Idsession
group by sessions.SessionName
) A
Also, nice diagram in the question, and remember to include your error message next time.
For the average number of customers in each session, you should be able to use just the enrollments table. The average would be the number of enrollments divided by the number of sessions:
select count(*) / count(distinct idSession)
from enrollments e;
This makes the following assumptions:
All sessions have at least one customer (your original query had this assumption as well).
No customer signs up multiple times for the same session.
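Under those two assumptions the ratio and the average of per-session counts should agree, which can be checked on a small made-up enrollments table with Python's sqlite3. One caveat this sketch also shows: in engines that divide integers as integers (SQLite here, and also SQL Server and PostgreSQL), the plain ratio is truncated, so multiplying by 1.0 first is a common workaround; MySQL's / operator does not truncate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollments (Idcustomer INTEGER, Idsession INTEGER);
    -- session 1 has three customers, session 2 has two
    INSERT INTO enrollments VALUES (1, 1), (2, 1), (3, 1), (1, 2), (2, 2);
""")

# Average of the per-session customer counts (the derived-table approach)
avg_via_subquery = conn.execute("""
    SELECT AVG(n) FROM (
        SELECT COUNT(DISTINCT Idcustomer) AS n
        FROM enrollments
        GROUP BY Idsession
    )
""").fetchone()[0]

# Enrollments divided by sessions; 1.0 * forces a non-integer division
avg_via_ratio = conn.execute(
    "SELECT 1.0 * COUNT(*) / COUNT(DISTINCT Idsession) FROM enrollments"
).fetchone()[0]

# The same ratio without the 1.0 factor is truncated by SQLite
truncated = conn.execute(
    "SELECT COUNT(*) / COUNT(DISTINCT Idsession) FROM enrollments"
).fetchone()[0]

print(avg_via_subquery, avg_via_ratio, truncated)
```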
I have an SQL query that needs to perform multiple inner joins, as follows:
SELECT DISTINCT adv.Email, adv.Credit, c.credit_id AS creditId, c.creditName AS creditName, a.Ad_id AS adId, a.adName
FROM placementlist pl
INNER JOIN
(SELECT Ad_id, List_id FROM placements) AS p
ON pl.List_id = p.List_id
INNER JOIN
(SELECT Ad_id, Name AS adName, credit_id FROM ad) AS a
ON ...
(few more inner joins)
My question is the following: How can I optimize this query? I was under the impression that, even though the way I currently query the database creates small temporary tables (inner SELECT statements), it would still be advantageous to performing an inner join on the unaltered tables as they could have about 10,000 - 100,000 entries (not millions). However, I was told that this is not the best way to go about it but did not have the opportunity to ask what the recommended approach would be.
What would be the best approach here?
To use derived tables such as
INNER JOIN (SELECT Ad_id, List_id FROM placements) AS p
is not recommended. Let the DBMS find out by itself what values it needs from
INNER JOIN placements AS p
instead of telling it (again) by more or less forcing it to create a view on the table with only those two values. (And using FROM tablename is much more readable, too.)
With SQL you mainly say what you want to see, not how this is going to be achieved. (Well, of course this is just a rule of thumb.) So if no other columns except Ad_id and List_id are used from table placements, the dbms will find its best way to handle this. Don't try to make it use your way.
The same is true of the IN clause, by the way, where you often see WHERE col IN (SELECT DISTINCT colx FROM ...) instead of simply WHERE col IN (SELECT colx FROM ...). This does exactly the same, but with DISTINCT you tell the dbms "make your subquery's rows distinct before looking for col". But why would you want to force it to do so? Why not have it use just the method the dbms finds most appropriate?
Back to derived tables: Use them when they really do something, especially aggregations, or when they make your query more readable.
Moreover,
SELECT DISTINCT adv.Email, adv.Credit, ...
doesn't look too good either. Yes, sometimes you need SELECT DISTINCT, but usually you wouldn't. Most often it is just a sign that you haven't thought your query through.
An example: you want to select clients that bought product X. In SQL you would say: where a purchase of X EXISTS for the client. Or: where the client is IN the set of the X purchasers.
select * from clients c where exists
(select * from purchases p where p.clientid = c.clientid and product = 'X');
Or
select * from clients where clientid in
(select clientid from purchases where product = 'X');
You don't say: Give me all combinations of clients and X purchases and then boil that down so I just get each client once.
select distinct c.*
from clients c
join purchases p on p.clientid = c.clientid and product = 'X';
Yes, it is very easy to just join all tables needed and then just list the columns to select and then just put DISTINCT in front. But it makes the query kind of blurry, because you don't write the query as you would word the task. And it can make things difficult when it comes to aggregations. The following query is wrong, because you multiply money earned with the number of money-spent records and vice versa.
select
sum(money_spent.value),
sum(money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
And the following may look correct, but is still incorrect (it only works when the values happen to be unique):
select
sum(distinct money_spent.value),
sum(distinct money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
Again: You would not say: "I want to combine each purchase with each earning and then ...". You would say: "I want the sum of money spent and the sum of money earned per user". So you are not dealing with single purchases or earnings, but with their sums. As in
select
(select sum(value) from money_spent where money_spent.userid = user.userid) as spent,
(select sum(value) from money_earned where money_earned.userid = user.userid) as earned
from user;
Or:
select
spent.total,
earned.total
from user
join (select userid, sum(value) as total from money_spent group by userid) spent
on spent.userid = user.userid
join (select userid, sum(value) as total from money_earned group by userid) earned
on earned.userid = user.userid;
So you see, this is where derived tables come into play.
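The three variants can be compared side by side on a tiny data set. This is a sketch using Python's sqlite3 with made-up rows; the table is named users rather than user here only because user is a reserved word in some databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY);
    CREATE TABLE money_spent (userid INTEGER, value INTEGER);
    CREATE TABLE money_earned (userid INTEGER, value INTEGER);
    INSERT INTO users VALUES (1);
    INSERT INTO money_spent VALUES (1, 10), (1, 10);                -- true total: 20
    INSERT INTO money_earned VALUES (1, 100), (1, 200), (1, 300);   -- true total: 600
""")

# Plain join: 2 spent rows x 3 earned rows = 6 combined rows, so both sums inflate
fanned_out = conn.execute("""
    SELECT SUM(s.value), SUM(e.value)
    FROM users u
    JOIN money_spent s ON s.userid = u.userid
    JOIN money_earned e ON e.userid = u.userid
""").fetchone()

# SUM(DISTINCT ...) collapses the two equal spendings of 10 into one
with_distinct = conn.execute("""
    SELECT SUM(DISTINCT s.value), SUM(DISTINCT e.value)
    FROM users u
    JOIN money_spent s ON s.userid = u.userid
    JOIN money_earned e ON e.userid = u.userid
""").fetchone()

# Aggregating in derived tables before joining gives one row per user per side
pre_aggregated = conn.execute("""
    SELECT spent.total, earned.total
    FROM users u
    JOIN (SELECT userid, SUM(value) AS total FROM money_spent GROUP BY userid) spent
      ON spent.userid = u.userid
    JOIN (SELECT userid, SUM(value) AS total FROM money_earned GROUP BY userid) earned
      ON earned.userid = u.userid
""").fetchone()

print(fanned_out, with_distinct, pre_aggregated)
```

Only the pre-aggregated query returns the true totals for this data; the DISTINCT version gets the earned sum right purely because those three values happen to differ.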
I am trying to do a quick accounting sql statement and I am running into some problems.
I have 3 tables registrations, events, and a payments table. Registrations are individual transactions, events are information about what they signed up for, and payments are payments made to events.
I would like to total the amounts paid by the registrations, put the event name and event startdate into a column, then total the amount of payments made so far. If possible I would also like to find a total not paid. I believe the bottom figures out everything except the payment amount total. The payment amount total is much larger than it should be, more than likely by using the SUM it is counting payments multiple times because of the nesting.
select
sum(`reg_amount`) as total,
event_name,
event_startdate,
(
select sum(payment_amount) as paid
from registrations
group by events.event_id
) pay
FROM registrations
left join events
on events.event_id = registrations.event_id
left join payments
on payments.event_id = events.event_id
group by registrations.event_id
First, you should use aliases so we know where all the fields come from. I'm guessing that payment_amount comes from the payments table and not from registrations.
If so, your subquery is adding up the payments from the outer table for every row in registrations. Probably not what you want.
I think you want something like this:
select sum(`reg_amount`) as total,
e.event_name,
e.event_startdate,
p.TotPayments
FROM registrations r left join
events e
on e.event_id = r.event_id left join
(select event_id, sum(payment_amount) as TotPayments
from payments
group by event_id
) p
on p.event_id = e.event_id
group by r.event_id;
The idea is to aggregate the payments at the lowest possible level, to avoid duplications caused by joining. That is, aggregate before joining.
This is a guess as to the right SQL, but it should put you on the right path.
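The question also asked for a total not paid. One way to get it, sketched below with Python's sqlite3, is to aggregate registrations and payments separately and subtract. The column layout follows the question, but the sample rows are made up, and treating an event with no payments as paid = 0 (via COALESCE) is an assumption about the desired behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_name TEXT, event_startdate TEXT);
    CREATE TABLE registrations (event_id INTEGER, reg_amount INTEGER);
    CREATE TABLE payments (event_id INTEGER, payment_amount INTEGER);
    INSERT INTO events VALUES (1, 'Spring Gala', '2024-04-01'), (2, 'Fall Fair', '2024-10-01');
    INSERT INTO registrations VALUES (1, 50), (1, 50), (1, 50), (2, 30);
    INSERT INTO payments VALUES (1, 40), (1, 60);
""")

report = conn.execute("""
    SELECT e.event_name,
           r.total,
           COALESCE(p.paid, 0)           AS paid,
           r.total - COALESCE(p.paid, 0) AS unpaid
    FROM events e
    -- each side is reduced to one row per event before it is joined
    JOIN (SELECT event_id, SUM(reg_amount) AS total
          FROM registrations GROUP BY event_id) r ON r.event_id = e.event_id
    LEFT JOIN (SELECT event_id, SUM(payment_amount) AS paid
               FROM payments GROUP BY event_id) p ON p.event_id = e.event_id
    ORDER BY e.event_id
""").fetchall()

print(report)
```

Since neither derived table can contribute more than one row per event, no GROUP BY is needed in the outer query.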
I'm trying to create a stepped table report using SQL report builder 3.0. The stepped report contains Groups/devices/users along with associated totals for each group/device/user.
I want the entire report to be sorted by these totals along with each individual step sorted this way also.
Currently users are sorted by their totals, but not devices or groups.
Is there a way to sort the other steps?
You can just do this in SQL using some nested queries. Let's assume you have the following tables: Transaction, User, Device, and Group. The transaction table records the transactions of the User on a Device and has an Amount field to sum. A user belongs to a Group.
So you need to sum the Amount for the User, for the Groups and for the Devices used within a Group which will give you SQL that looks like this:
SELECT G.Description AS [Group], D.Description AS Device, U.Description AS UserName,
MAX(GT.GroupTotal) AS GroupTotal, MAX(GDT.GroupDeviceTotal) AS GroupDeviceTotal, SUM(T.Amount) AS UserTotal
-- Transaction, User and Group are reserved words in SQL Server, hence the brackets
FROM [Transaction] AS T
INNER JOIN [User] AS U ON U.UserId = T.UserId
INNER JOIN [Group] AS G ON G.GroupId = U.GroupId
INNER JOIN Device AS D ON D.DeviceId = T.DeviceId
INNER JOIN
-- total per group
(SELECT U2.GroupId, SUM(T2.Amount) AS GroupTotal
FROM [Transaction] AS T2
INNER JOIN [User] AS U2 ON U2.UserId = T2.UserId
WHERE (T2.TxDate >= '2011-01-01')
GROUP BY U2.GroupId) AS GT ON GT.GroupId = U.GroupId
INNER JOIN
-- total per device within each group
(SELECT U3.GroupId, T3.DeviceId, SUM(T3.Amount) AS GroupDeviceTotal
FROM [Transaction] AS T3
INNER JOIN [User] AS U3 ON U3.UserId = T3.UserId
WHERE (T3.TxDate >= '2011-01-01')
GROUP BY U3.GroupId, T3.DeviceId) AS GDT ON GDT.GroupId = U.GroupId AND GDT.DeviceId = T.DeviceId
WHERE (T.TxDate >= '2011-01-01')
GROUP BY G.GroupId, G.Description, D.DeviceId, D.Description, U.UserId, U.Description
ORDER BY GroupTotal DESC, GroupDeviceTotal DESC, UserTotal DESC
Note that the where clause you use has to be the same in the main query and each nested query (this is the "WHERE (T.TxDate >= '2011-01-01')" bit).
You can try going to the Row/Column groups area... then for each group you have, double click the group, select "Sorting" and then add as many sorting fields as you need for the info contained at that group level.
If other sorts are applied to the data, such as on the tablix/matrix, SSRS can sometimes get confused. So if my suggestion gets you close to the effect you're going for but there are still some issues, try removing all other sorting you've applied elsewhere in the report besides on those groups. I would start with the innermost group and work outwards, trying not to repeat a field that is already in a lower group's data (if that makes sense).
edit:
So, let's say we have a report for a vet's office that shows client information, and we want to group by personID, petID and visitID. The tablix as a whole would be sorted by the person's name (or last name, then first name, or whatever). Then your first group would group on the personID and be sorted by the petName. The second, lower group would group on the petID and be sorted by the visitDate. The third level would group on the visitID, and this doesn't really need to be sorted, unless by visitTime if it's not included in visitDate.