We have a vehicles table that has a VehicleName column. I'm looking into a bug where a user has multiple vehicles of the same VehicleName. I have this query to help return which VehicleNames have been used multiple times:
SELECT VehicleName,
(
SELECT count(VehicleName)
FROM Vehicles as V2
WHERE V1.VehicleName = V2.VehicleName
)
FROM Vehicles as V1;
For one thing, it's slow. That's not too bad because this isn't going into production; it's just to aid in a bug fix. Second, it will return every VehicleName, even the ones with a count of one; those are VehicleNames I'm not interested in for this application. I can't remember how to name the subquery, so I can't add a WHERE clause to limit it.
I'm interested not just in how to name the subquery, but also whether there are faster solutions.
Are you trying to do this in some ugly way? The straightforward approach is:
SELECT VehicleName, COUNT(*) as TOTAL
FROM Vehicles
GROUP BY VehicleName
HAVING TOTAL > 1
ORDER BY TOTAL DESC;
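To answer the naming part of the question: the correlated count can be given a column alias, and the whole SELECT can be wrapped as a derived table so a WHERE can filter on it. A sketch of that shape (the aliases UseCount and counted are just illustrative names):
SELECT VehicleName, UseCount
FROM (
    SELECT VehicleName,
           (SELECT COUNT(*)
            FROM Vehicles AS V2
            WHERE V2.VehicleName = V1.VehicleName) AS UseCount
    FROM Vehicles AS V1
) AS counted
WHERE UseCount > 1;
The GROUP BY / HAVING form above should still be faster, since it scans the table once instead of running a correlated count for every row.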
I'm new to MySQL and databases, and I've seen in many places that it is not considered good programming practice to use subqueries in the FROM clause of a SELECT in MySQL, like this:
select userid, avg(num_pages)
from (select userid , pageid , regid , count(regid) as num_pages
from reg_pag
where ativa = 1
group by userid, pageid) as from_query
group by userid;
which calculates the average number of registers per page that the users have.
The reg_pag table has columns userid, pageid, regid, and ativa.
Questions:
How can I change the query so that it doesn't use a subquery in the FROM clause?
And is there a general way to "flatten" queries like this?
The average number of registers per page per user can also be calculated as the number of registers per user divided by the number of distinct pages per user. Use COUNT(DISTINCT ...) to count only distinct pageids per user:
select userid, count(regid) / count(distinct pageid) as avg_regs
from reg_pag
where ativa=1
group by userid;
There is no general way of flattening such queries. It may not even be possible to flatten some of them; otherwise there would be little point in having this feature in the first place. Don't be afraid of using subqueries in the FROM clause: on some occasions they may even be more efficient than a flattened query. But this is vendor-specific and even version-specific.
One way is to use count(distinct):
select userid, count(*) / count(distinct pageid)
from reg_pag
where ativa = 1
group by userid;
I have two tables, Orders and Salesperson, and I want to retrieve the names of all salespeople that have more than one order.
But the following query produces an error:
SELECT Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT( salesperson_id ) >1
The error is:
Column 'Name' is invalid in the select list because it is
not contained in either an aggregate function or
the GROUP BY clause.
From the error and from searching Google, I understand that the Name column must be either part of the GROUP BY clause or inside an aggregate function.
I also tried to understand why a selected column has to be in the GROUP BY clause or part of an aggregate function, but I didn't fully understand it.
So, how to fix this error?
SELECT max(Name) as Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT( salesperson_id ) >1
The basic idea is that columns that are not in the GROUP BY clause need to be inside an aggregate function. Here, since the name is presumably the same for every salesperson_id, MIN or MAX makes no real difference (the result is the same).
Example:
Looking at your data, you have 3 entries for Dan (7). When the join is created, the row with the name Dan gets multiplied by 3 (one for every order number), and the server does not know which "Dan" to pick, because to the server those are 3 separate rows, even though they are semantically the same.
Also try this, so you can see what I am talking about:
SELECT Orders.Number, Salesperson.Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
As far as the query goes, an explicit INNER JOIN is the better solution, since it's the standard way to write this. For a simple query like this one it should not matter, but in some cases an INNER JOIN can produce better results; as far as I know that is mostly a legacy concern, since these days the server should produce pretty much the same execution plan either way.
For code clarity I would stick with INNER JOIN.
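For reference, a sketch of the same query rewritten with an explicit INNER JOIN (keeping the MAX(Name) workaround from above):
SELECT MAX(Salesperson.Name) AS Name
FROM Orders
INNER JOIN Salesperson
    ON Orders.salesperson_id = Salesperson.ID
GROUP BY Orders.salesperson_id
HAVING COUNT(Orders.salesperson_id) > 1;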
Assuming the name is unique per Salesperson.ID, then simply add it to your GROUP BY clause (see the full sketch at the end of this answer):
GROUP BY salesperson_id, salesperson.Name
Otherwise, use any aggregate function:
Select Min(Name)
The reason for this is that SQL doesn't know whether there are multiple names per Salesperson.ID.
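Putting the first suggestion together with the original query, a sketch:
SELECT Salesperson.Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id, Salesperson.Name
HAVING COUNT(salesperson_id) > 1;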
For readability and correctness, I usually split aggregate queries into two parts:
The aggregate query
Any additional queries to support fields not contained in aggregate functions
So:
1. Aggregate query - salespeople with more than one order:
SELECT salesperson_id
FROM ORDERS
GROUP BY salesperson_id
HAVING COUNT(Number) > 1
2. Use the aggregate as a subquery (basically a SELECT joining onto another SELECT) to join on any additional fields:
SELECT *
FROM Salesperson SP
INNER JOIN
(
SELECT salesperson_id
FROM ORDERS
GROUP BY salesperson_id
HAVING COUNT(Number) > 1
) AGG_QUERY
ON AGG_QUERY.salesperson_id = SP.ID
There are other approaches, such as selecting the additional fields via aggregate functions (as shown by the other answers). Those get the code written quickly, so if you are writing the query under time pressure you may prefer that approach. If the query needs to be maintained (and hence readable), I would favour subqueries.
I am working on a query that needs to output 'total engagements' by users in columns: a 1eng column will display how many users have one engagement, a second column 2eng will display how many users have done 2 engagements, likewise 3eng, and so on. I have an engagements table which has a userID column. I get distinct users like this:
select count(distinct userID) from engagements
and I get total engagements as:
select count(*) from engagements
Engagements here refers to users who have either liked, replied, or shared the content.
I have used CASE and IF but have been unable to display the output in the form below. Please help. Thanks!
1eng 2eng 3eng
100 200 100
Consider returning the results in rows and pivoting them afterwards in your application.
To return the desired results in rows, you could use the following query:
SELECT
engagementCount,
COUNT(*) AS userCount
FROM (
SELECT
userID,
COUNT(*) AS engagementCount
FROM engagements
GROUP BY userID
) AS s
GROUP BY engagementCount
;
Basically, you first group the engagements rows by userID and get the row counts per userID. Afterwards, you use the counts as the grouping criterion and count how many users were found with that count.
If you insist on returning the columnar view in SQL, you'll need to resort to dynamic SQL because of the indefinite number of columns in the final result set. You'd probably need to store the results of the inner SELECT temporarily, scan it to build the list of count expressions for every engagementCount value and ultimately construct a query of this kind:
SELECT
COUNT(engagementCount = 1 OR NULL) AS `1eng`,
COUNT(engagementCount = 2 OR NULL) AS `2eng`,
COUNT(engagementCount = 3 OR NULL) AS `3eng`,
...
FROM temporary_storage
;
Or SUM(engagementCount = value) instead of COUNT(engagementCount = value OR NULL). (To me, the latter expresses the intention more explicitly, which is why I've suggested it first, but in case you happen to prefer the SUM technique, there should be no discernible difference in performance between the two. The OR NULL trick is explained here.)
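If you do go the dynamic-SQL route in MySQL, here is a rough sketch using GROUP_CONCAT and a prepared statement to build the column list (it assumes the engagements table from the question; names like @cols and stmt are just illustrative):
-- Build one "SUM(engagementCount = N) AS `Neng`" expression per distinct count
-- (group_concat_max_len may need to be raised if there are many distinct counts)
SELECT GROUP_CONCAT(
           DISTINCT CONCAT('SUM(engagementCount = ', engagementCount,
                           ') AS `', engagementCount, 'eng`')
           ORDER BY engagementCount
       )
INTO @cols
FROM (
    SELECT COUNT(*) AS engagementCount
    FROM engagements
    GROUP BY userID
) AS s;

-- Assemble the pivot query and execute it
SET @sql = CONCAT(
    'SELECT ', @cols,
    ' FROM (SELECT COUNT(*) AS engagementCount FROM engagements GROUP BY userID) AS s');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;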
I have this query:
select count(distinct User_ID) from Web_Request_Log where Added_Timestamp like '20110312%' and User_ID Is Not Null;
User_ID and Added_Timestamp are indexed.
The query is painfully slow (we have millions of records and the table is growing fast).
I've read all the posts I could find about count and distinct, here, but they seem to be mostly syntax related. I'm interested in optimization and I'm wondering if I'm using the right tool for the job.
I can use an intermediate counter table to summarize overall hits, but I'd like a way to do this that would allow me to easily generate ad-hoc 'range' queries; i.e., what is the distinct visitor count for last week, or last month.
Did some tests to see if GROUP BY can help and it seems it can.
On table A with ~8M records and ~340K distinct records for a given non-indexed field:
GROUP BY 17 seconds
COUNT(DISTINCT ..) 21 seconds
On table A with ~2M records and ~50K distinct records for a given indexed field:
GROUP BY 200 ms
COUNT(DISTINCT ..) 2.5 seconds
This is MySQL with the InnoDB engine, BTW.
I can't find any relevant documentation though, and I wonder if that comparison is dependent on the data (how many duplicates there are).
For your table, the GROUP BY query will look like this:
SELECT COUNT(t.c)
FROM (SELECT 1 AS c
FROM Web_Request_Log
WHERE Added_Timestamp LIKE '20110312%'
AND User_ID IS NOT NULL
GROUP BY User_ID
) AS t
Try it and let us know if it's quicker :)
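And for the ad-hoc range queries mentioned in the question (last week, last month), the same shape works with a range predicate instead of LIKE; a sketch, assuming Added_Timestamp is stored as a string starting with YYYYMMDD (the dates are just illustrative):
SELECT COUNT(t.c)
FROM (SELECT 1 AS c
      FROM Web_Request_Log
      WHERE Added_Timestamp >= '20110307'   -- start of the range (inclusive)
        AND Added_Timestamp < '20110314'    -- end of the range (exclusive)
        AND User_ID IS NOT NULL
      GROUP BY User_ID
     ) AS t;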
I'm writing a query where I group a selection of rows to find the MIN value for one of the columns.
I'd also like to return the other column values associated with the MIN row returned.
e.g.
ID  QTY  PRODUCT  TYPE
----------------------
1   2    Orange   Fruit
2   4    Banana   Fruit
3   3    Apple    Fruit
If I GROUP this table by the column 'TYPE' and select the MIN qty, it won't return the corresponding product for the MIN row, which in the case above is 'Apple'.
Adding an ORDER BY clause before grouping seems to solve the problem. However, before I go ahead and include this query in my application I'd just like to know whether this method will always return the correct value. Is this the correct approach? I've seen some examples where subqueries are used, however I have also read that this inefficient.
Thanks in advance.
No, this is not the correct approach.
I believe you are talking about a query like this:
SELECT product.*, MIN(qty)
FROM product
GROUP BY
type
ORDER BY
qty
What you are doing here is using MySQL's extension that allows you to select unaggregated/ungrouped columns in a GROUP BY query.
This is mostly used in the queries containing both a JOIN and a GROUP BY on a PRIMARY KEY, like this:
SELECT order.id, order.customer, SUM(price)
FROM order
JOIN orderline
ON orderline.order_id = order.id
GROUP BY
order.id
Here, order.customer is neither grouped nor aggregated, but since you are grouping on order.id, it is guaranteed to have the same value within each group.
In your case, qty can take different values within a group.
It is not guaranteed from which record within the group the engine will take the value.
You should do this:
SELECT p.*
FROM (
    SELECT DISTINCT type
    FROM product
) pd
JOIN product p
ON p.id =
(
    SELECT pi.id
    FROM product pi
    WHERE pi.type = pd.type
    ORDER BY
        pi.type, pi.qty, pi.id
    LIMIT 1
)
If you create an index on product (type, qty, id), this query will work fast.
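A sketch of creating that index (the index name is just illustrative):
CREATE INDEX ix_product_type_qty_id ON product (type, qty, id);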
It's difficult to follow you properly without an example of the query you are trying.
From your comments, I guess you are querying something like:
SELECT ID, COUNT(*) AS QTY, PRODUCT_TYPE
FROM PRODUCTS
GROUP BY PRODUCT_TYPE
ORDER BY COUNT(*) DESC;
My advice: group by the concept (in this case PRODUCT_TYPE) and order by the number of times it appears, COUNT(*). The query above would do what you want.
Subqueries are mostly for sorting or for dismissing rows that are not of interest.
The MIN you are looking for is not exactly a MIN; it is an occurrence count, and you want to see first the one with the fewest occurrences (meaning it appears the fewest times, I guess).
Cheers,