Limit the count using GROUP BY - mysql

I want to limit the count to 5 using COUNT(*) and GROUP BY, but it returns all the rows. Consider I have a table named tbhits:
tbhits
id | uname
------------------------
101 | john
101 | james
101 | henry
101 | paul
101 | jacob
101 | jaden
101 | steve
101 | lucas
102 | marie
SELECT id,COUNT(*) as 'hits' FROM tbhits GROUP BY id
returns
id | hits
--------------------
101 | 8
102 | 1
But I want the group by to limit maximum count to 5.
Say I have 1000 rows: I don't want to count them all. If there are more than 5 rows, just display 5+.
I tried using LIMIT 5 but it does not seem to work
SELECT id,COUNT(*) as 'hits' FROM tbhits GROUP BY id LIMIT 5 does not work.
I also used WHERE Clause
SELECT id,COUNT(*) as 'hits' FROM tbhits WHERE id = 101 GROUP BY id LIMIT 5
but it still returns hits as 8.
id | hits
--------------------
101 | 8
Any help is greatly appreciated.

LIMIT is intended to limit the number of rows you'll get from your query. I suggest you use the COUNT function as follows :
SELECT id, CASE WHEN COUNT(*) < 6 then COUNT(*) ELSE '5+' END as 'hits'
FROM tbhits
GROUP BY id
More details about selecting the minimum of two numbers here, and here goes the sqlfiddle (consider providing it yourself next time).
Note that I went for 6 instead of '5+' on my first suggestion, because you should not, in my opinion, mix data types. But putting 6 is not a good solution either, because someone not aware of the trick will not notice it ('5+', at least, is explicit)
As far as performance is concerned, AFAIK you should not expect MySQL to do the optimization itself.
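To make the CASE approach concrete, here is a minimal sketch that runs the same idea against the sample data. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so `CAST(... AS TEXT)` and `||` are SQLite spellings (MySQL would use `CAST(... AS CHAR)` and `CONCAT`). The helper name `capped_hits` and the `cap` parameter are made up for illustration. Casting the count to text keeps the column a single type, which addresses the mixed-type concern above.

```python
import sqlite3

def capped_hits(conn, cap=5):
    """Return (id, hits) pairs; hits is text and reads '<cap>+' above the cap."""
    sql = """
        SELECT id,
               CASE WHEN COUNT(*) <= :cap THEN CAST(COUNT(*) AS TEXT)
                    ELSE :cap || '+'
               END AS hits
        FROM tbhits
        GROUP BY id
        ORDER BY id
    """
    return conn.execute(sql, {"cap": cap}).fetchall()

# Sample data from the question, loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbhits (id INTEGER, uname TEXT)")
names = ["john", "james", "henry", "paul", "jacob", "jaden", "steve", "lucas"]
conn.executemany(
    "INSERT INTO tbhits VALUES (?, ?)",
    [(101, n) for n in names] + [(102, "marie")],
)

print(capped_hits(conn))  # [(101, '5+'), (102, '1')]
```

Note that this only changes what is displayed; the database still counts every row in each group.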

LIMIT on a GROUP BY query won't actually limit the counts; it limits the number of rows being output.
Try using an IF expression to compare the count result:
SELECT id,if(COUNT(*)>5,'5+',COUNT(*)) as 'hits'
FROM tbhits
GROUP BY id
O/p:
id | hits
--------------------
101 | 5+
102 | 1
Regarding the performance issue: AFAIK GROUP BY will always cost some performance, and there is no direct way to limit counts in a GROUP BY clause. You will have to go with either an IF or a CASE expression if you want a solution in MySQL. Otherwise, do it in PHP itself.
Moreover you should have a look at GROUP BY optimization

As was already said LIMIT applies last in this case, thus after the grouping. What you want to do is modify the value that is selected once the grouping is done.
This will appropriately output "5+" if you have more than 5 records for your table.
SELECT id,
IF(COUNT(*)>5,"5+",COUNT(*)) AS 'count'
FROM Whatever GROUP BY id
See the SQL Fiddle here:
http://sqlfiddle.com/#!2/e381e/4

Try with this?
SELECT id,COUNT(*) as 'hits' FROM tbhits GROUP BY id
HAVING hits >= 5
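For comparison, a quick sketch of what a HAVING filter does with the sample data. This again uses Python's sqlite3 as a stand-in for MySQL, and the `user0`..`user7` names are placeholders. As far as I can tell, HAVING drops whole groups that fail the condition rather than capping the count, so id 102 disappears and 101 still reports 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbhits (id INTEGER, uname TEXT)")
conn.executemany(
    "INSERT INTO tbhits VALUES (?, ?)",
    [(101, "user%d" % i) for i in range(8)] + [(102, "marie")],
)

# HAVING is applied after grouping and keeps or discards entire groups.
having_rows = conn.execute(
    "SELECT id, COUNT(*) AS hits FROM tbhits "
    "GROUP BY id HAVING COUNT(*) >= 5 ORDER BY id"
).fetchall()

print(having_rows)  # [(101, 8)]
```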

Related

SQL - return latest of multiple records from large data set

Background
I have a stock_prices table that stores historical intra-day stock prices for roughly 1000 stocks. Although old data is purged regularly, the table regularly has 5M+ records. Structure is loosely:
| id | stock_id | value | change | created_at |
|--------|----------|-------|--------|---------------------|
| 12345 | 1 | 50 | 2.12 | 2020-05-05 17:39:00 |
| 12346 | 2 | 25 | 1.23 | 2020-05-05 17:39:00 |
I regularly need to fetch the latest stock prices for ~20 stocks at a time for an API endpoint. The original implementation executed a single query per stock:
select * from stock_prices where stock_id = 1 order by created_at desc limit 1
Part 1: An inefficient query
Somewhat inefficient with 20+ queries, but it worked. The code (Laravel 6) was updated to use the correct relationships (stock hasMany stock_prices), which in turn generated a query like this:
select *
from `stock_prices`
where `stock_prices`.`stock_id` in (1, 2, 3, 4, 5)
order by `id` desc
While this saves on queries, it takes 1-2 seconds to run. Running explain shows it's still having to query 50k+ rows at any given time, even with the foreign key index. My next thought was that I'd add a limit to the query to only return the number of rows equal to the number of stocks I'm asking for. Query is now:
select *
from `stock_prices`
where `stock_prices`.`stock_id` in (1, 2, 3, 4, 5)
order by `id` desc
limit 5
Part 2: Query sometimes misses records
Performance is amazing - millisecond-level processing with this. However, it suffers from potentially not returning a price for one/ multiple of the stocks. Since the limit has been added, if any stock has more than one price (row) before the next stock, it will "consume" one of the row counts.
This is a very real scenario as some stocks pull data each minute, others every 15 minutes, etc. So there are cases where that above query, due to the limit will pull multiple rows for one stock and subsequently not return data for others:
| id | stock_id | value | change | created_at |
|------|----------|-------|--------|----------------|
| 5000 | 1 | 50 | 0.5 | 5/5/2020 17:00 |
| 5001 | 1 | 51 | 1 | 5/5/2020 17:01 |
| 6001 | 2 | 25 | 2.2 | 5/5/2020 17:00 |
| 6002 | 3 | 35 | 3.2 | 5/5/2020 17:00 |
| 6003 | 4 | 10 | 1.3 | 5/5/2020 17:00 |
In this scenario, you can see that stock_id 1 has more frequent intervals of data, so when the query was run, it returned two records for that ID, then continued down the list. After it hit 5 records, it stopped, meaning that stock_id 5 did not have any data returned, although it does exist. As you can imagine, that breaks things down the line in the app when no data is returned.
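The failure mode can be reproduced in a few lines. This is only a sketch: it uses Python's sqlite3 instead of MySQL, trims the table to four columns, and adds a hypothetical older row (id 4000) for stock 5, since the sample above does not show one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_prices ("
    "id INTEGER PRIMARY KEY, stock_id INTEGER, value REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO stock_prices VALUES (?, ?, ?, ?)",
    [
        (4000, 5, 70, "2020-05-05 16:45:00"),  # hypothetical row for stock 5
        (5000, 1, 50, "2020-05-05 17:00:00"),
        (5001, 1, 51, "2020-05-05 17:01:00"),
        (6001, 2, 25, "2020-05-05 17:00:00"),
        (6002, 3, 35, "2020-05-05 17:00:00"),
        (6003, 4, 10, "2020-05-05 17:00:00"),
    ],
)

# Same shape as the LIMIT query: the 5 newest rows overall, not one per stock.
limited = conn.execute(
    "SELECT stock_id FROM stock_prices "
    "WHERE stock_id IN (1, 2, 3, 4, 5) ORDER BY id DESC LIMIT 5"
).fetchall()

returned = {row[0] for row in limited}
missing = {1, 2, 3, 4, 5} - returned
print(sorted(missing))  # [5]
```

Stock 1 takes two of the five slots, so stock 5 never makes it into the result.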
Part 3: Attempts to solve
The most obvious answer seems to be to add a GROUP BY stock_id as a way to require that I get the same number of results as I'm expected per stock. Unfortunately, this leads me back to Part 1, wherein that query, while it works, takes 1-2 seconds because it ends up having to traverse the same 50k+ rows as it did without the limit previously. This leaves me no better off.
The next thought was to arbitrarily make the LIMIT larger than it needs to be so it can capture all the rows. This is not a predictable solution since the query could be any combination of thousands of stocks that each have different intervals of data available. The most extreme example is stocks that pull daily versus each minute, which means one could have somewhere near 350+ rows before the second stock appears. Multiply that by the number of stocks in one query, say 50, and this will still require querying 15k+ rows. Feasible, but not ideal, and potentially not scalable.
Part 4: Suggestions?
Is it such a bad practice to have one API call initiate potentially 50+ DB queries just to get stock price data? Is there some threshold of LIMIT I should use that minimizes the chances of failure enough to be comfortable? Are there other methods with SQL that would allow me to return the required rows without having to query a large chunk of the table?
Any help appreciated.
The fastest method is union all:
(select * from stock_prices where stock_id = 1 order by created_at desc limit 1)
union all
(select * from stock_prices where stock_id = 2 order by created_at desc limit 1)
union all
(select * from stock_prices where stock_id = 3 order by created_at desc limit 1)
union all
(select * from stock_prices where stock_id = 4 order by created_at desc limit 1)
union all
(select * from stock_prices where stock_id = 5 order by created_at desc limit 1)
This can use an index on stock_prices(stock_id, created_at [desc]). Unfortunately, when you use in, the index cannot be used as effectively.
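Since the list of stocks changes per request, the UNION ALL statement would normally be generated. Here is a rough sketch of that, assuming Python's sqlite3 as a stand-in for MySQL; the helper name `latest_prices_union` and the trimmed four-column table are made up for illustration. SQLite does not accept parenthesized SELECTs in a compound query, so each branch is wrapped in a derived table, a form MySQL should accept as well.

```python
import sqlite3

def latest_prices_union(conn, stock_ids):
    """Fetch the newest row per stock with one UNION ALL branch per id."""
    if not stock_ids:
        return []
    branch = (
        "SELECT * FROM (SELECT * FROM stock_prices WHERE stock_id = ? "
        "ORDER BY created_at DESC LIMIT 1) AS t{n}"
    )
    sql = " UNION ALL ".join(branch.format(n=n) for n in range(len(stock_ids)))
    return conn.execute(sql, list(stock_ids)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_prices ("
    "id INTEGER PRIMARY KEY, stock_id INTEGER, value REAL, created_at TEXT)"
)
# Composite index so each branch can seek straight to the newest row.
conn.execute("CREATE INDEX idx_stock_created ON stock_prices (stock_id, created_at)")
conn.executemany(
    "INSERT INTO stock_prices VALUES (?, ?, ?, ?)",
    [
        (4000, 5, 70, "2020-05-05 16:45:00"),  # hypothetical row for stock 5
        (5000, 1, 50, "2020-05-05 17:00:00"),
        (5001, 1, 51, "2020-05-05 17:01:00"),
        (6001, 2, 25, "2020-05-05 17:00:00"),
        (6002, 3, 35, "2020-05-05 17:00:00"),
        (6003, 4, 10, "2020-05-05 17:00:00"),
    ],
)

rows = latest_prices_union(conn, [1, 2, 3, 4, 5])
latest_id_by_stock = {row[1]: row[0] for row in rows}
print(latest_id_by_stock)
```

The ids are bound as parameters rather than interpolated, so only the number of branches varies with the input. A stock with no rows simply contributes nothing to the result.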
Groupwise-max
SELECT b.*
FROM ( SELECT stock_id, MAX(created_at) AS created_at
FROM stock_prices
GROUP BY stock_id
) AS a
JOIN stock_prices AS b USING(stock_id, created_at)
Needed:
INDEX(stock_id, created_at)
If you can have two rows for the same stock in the same second, this will give 2 rows. See the link below for alternatives.
If that pair is unique, then make it the PRIMARY KEY and get rid of id; this will help performance, too.
More discussion: http://mysql.rjweb.org/doc.php/groupwise_max#using_an_uncorrelated_subquery
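Here is a sketch of the groupwise-max query restricted to the requested stocks. The `WHERE stock_id IN (...)` inside the subquery is my addition, not part of the query above, and the example runs on Python's sqlite3 rather than MySQL. The helper name `latest_prices_groupwise` and the trimmed table are hypothetical.

```python
import sqlite3

def latest_prices_groupwise(conn, stock_ids):
    """Newest row per stock via MAX(created_at) joined back to the table."""
    if not stock_ids:
        return []
    placeholders = ", ".join("?" for _ in stock_ids)
    sql = f"""
        SELECT b.*
        FROM (SELECT stock_id, MAX(created_at) AS created_at
              FROM stock_prices
              WHERE stock_id IN ({placeholders})
              GROUP BY stock_id) AS a
        JOIN stock_prices AS b USING (stock_id, created_at)
    """
    return conn.execute(sql, list(stock_ids)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_prices ("
    "id INTEGER PRIMARY KEY, stock_id INTEGER, value REAL, created_at TEXT)"
)
conn.execute("CREATE INDEX idx_stock_created ON stock_prices (stock_id, created_at)")
conn.executemany(
    "INSERT INTO stock_prices VALUES (?, ?, ?, ?)",
    [
        (4000, 5, 70, "2020-05-05 16:45:00"),  # hypothetical row for stock 5
        (5000, 1, 50, "2020-05-05 17:00:00"),
        (5001, 1, 51, "2020-05-05 17:01:00"),
        (6001, 2, 25, "2020-05-05 17:00:00"),
        (6002, 3, 35, "2020-05-05 17:00:00"),
        (6003, 4, 10, "2020-05-05 17:00:00"),
    ],
)

rows = latest_prices_groupwise(conn, [1, 2, 3, 4, 5])
latest_id_by_stock = {row[0]: row[1] for row in rows}
print(latest_id_by_stock)
```

In this sample every (stock_id, created_at) pair is unique, so each stock yields exactly one row; with duplicate timestamps the join would return one row per duplicate, as noted above.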

how to sort 2 different column in sql

Can you show me how to sort by a user-defined column and a fixed column name in SQL? I need to display the highest transaction count together with the outletid; instead I only get the highest transactions, and the outlet ids are not grouped together.
pardon me, im very bad at english
here is the problem
outlet id | revenue code | total transaction | total amount
6837 | 014 | 326 | 39158.94
6821 | 408 | 291 | 48786.50
6814 | 014 | 285 | 74159.76
6837 | 452 | 282 | 8846.80
and here is my sql
SELECT
outletid,
revcode,
count(receiptnumbe) as Transactions,
sum(amount) as total
FROM
user_payment
WHERE
date = (SELECT MAX(date) FROM user_payment GROUP BY date desc LIMIT 0, 1)
GROUP BY
outletid, revcode
ORDER BY Transactions desc
i need it to be like this. sort by outlet id and highest transactions.
outlet id | revenue code | total transaction | total amount
6837 | 014 | 326 | 39158.94
6837 | 452 | 282 | 8846.80
6821 | 408 | 291 | 48786.50
6814 | 014 | 285 | 74159.76
Is this what you want?
ORDER BY OutletId, Transactions desc
EDIT:
If I understand correctly, you want it sorted by the outlet that has the most total transactions. Then by transactions within that group. To do that, you need to summarize again at the outlet level and join back the results:
select outor.*
from (SELECT up.outletid, up.revcode, count(up.receiptnumbe) as Transactions,
             sum(up.amount) as total
      FROM user_payment up
      WHERE date = (SELECT MAX(date) FROM user_payment)
      GROUP BY outletid, revcode
     ) outor join
     (SELECT up.outletid, count(up.receiptnumbe) as Transactions,
             sum(up.amount) as total
      FROM user_payment up
      WHERE date = (SELECT MAX(date) FROM user_payment)
      GROUP BY outletid
     ) o
     on outor.outletid = o.outletid
order by o.Transactions desc, outor.outletid, outor.Transactions desc;
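A small runnable sketch of the join-back ordering, using Python's sqlite3 in place of the original database. The receipt counts and amounts are invented and much smaller than in the question; the aliases `per_code`, `per_outlet`, and `tx_count` are my own names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_payment ("
    "outletid INTEGER, revcode TEXT, receiptnumbe TEXT, amount REAL, date TEXT)"
)
# (outletid, revcode, number of receipts) on the latest date; counts are made up.
groups = [(6837, "014", 3), (6837, "452", 2), (6821, "408", 4), (6814, "014", 1)]
payments = [
    (outlet, code, f"{outlet}-{code}-{i}", 10.0, "2020-01-02")
    for outlet, code, n in groups
    for i in range(n)
]
payments.append((6814, "014", "old-receipt", 10.0, "2020-01-01"))  # older date, ignored
conn.executemany("INSERT INTO user_payment VALUES (?, ?, ?, ?, ?)", payments)

sql = """
    SELECT per_code.outletid, per_code.revcode, per_code.tx_count
    FROM (SELECT outletid, revcode, COUNT(receiptnumbe) AS tx_count
          FROM user_payment
          WHERE date = (SELECT MAX(date) FROM user_payment)
          GROUP BY outletid, revcode) AS per_code
    JOIN (SELECT outletid, COUNT(receiptnumbe) AS tx_count
          FROM user_payment
          WHERE date = (SELECT MAX(date) FROM user_payment)
          GROUP BY outletid) AS per_outlet
      ON per_code.outletid = per_outlet.outletid
    ORDER BY per_outlet.tx_count DESC, per_code.outletid, per_code.tx_count DESC
"""
ordered = conn.execute(sql).fetchall()
print(ordered)
# [(6837, '014', 3), (6837, '452', 2), (6821, '408', 4), (6814, '014', 1)]
```

Outlet 6837 comes first because its two revenue codes add up to 5 transactions, even though 6821 has the single largest group.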
1) The first thing to do is make sure that you are sorting the fields the way you want to. Do you want them sorted numerically or alphabetically?
See Sorting Lexical and Numeric
Count should be numerical, but you should check outletid.
If you have access to the tables, you could change the field to a number type for it to be sorted numerically or a string for it to be sorted alphabetically.
You might have to use cast or convert. See Oracle Cast Documentation.
2) If you want the whole table sorted by outlet id and number of transactions, you might consider removing the GROUP BY clause.
3) The third thing I would look at, even if this did work, is renaming any columns or tables whose names are reserved words. I noticed transaction highlighted in blue.
Once these things are checked, Melon's comment should work.
Good question. Feel free to comment so I can follow up.

Multiple SUM commands

I am trying to sum a column in my table, the problem being there are multiple sums that need to be done.
So for instance there may be 40 records with an ID of 1 and a point value of 20, And then it will change to a new person with an ID of 2 and a point value of 20. If that makes sense.
How I want to do the query, but it doesn't work is like this:
SELECT SUM(Value)
FROM Points WHERE RegNum IN('','','')
And then I would like it to show up just like a normal SUM command would, with the total summed up, but with a line for each ID. I have looked over other questions about SUM commands and just can't quite apply it to my situation.
Thank you for any help.
It seems like you need to use a GROUP BY in your case. Try
SELECT RegNum, SUM(Value) total
FROM Points
WHERE RegNum IN(1, 2, 3)
GROUP BY RegNum
Sample output:
| REGNUM | TOTAL |
------------------
| 1 | 17 |
| 2 | 9 |
| 3 | 1 |
Here is SQLFiddle demo
Try
SELECT RegNum, SUM(Value) as TotalRegNum
From Points
WHERE RegNum IN('1','2','3')
GROUP BY RegNum
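A quick way to try the GROUP BY version without a MySQL server is Python's built-in sqlite3. The rows below are invented so that the totals match the sample output above; RegNum 4 is there only to show that the IN filter excludes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Points (RegNum INTEGER, Value INTEGER)")
conn.executemany(
    "INSERT INTO Points VALUES (?, ?)",
    [(1, 10), (1, 7), (2, 9), (3, 1), (4, 100)],  # made-up sample rows
)

# One output row per RegNum, each with its own sum.
totals = conn.execute(
    "SELECT RegNum, SUM(Value) AS total FROM Points "
    "WHERE RegNum IN (1, 2, 3) GROUP BY RegNum ORDER BY RegNum"
).fetchall()

print(totals)  # [(1, 17), (2, 9), (3, 1)]
```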

Is it possible to perform calculations with mysql?

I have a DB table for photo ratings and want to retrieve the highest rated photos. I know I need to do this based on an average for the ratings sorted from highest to lowest. The DB table looks like this:
id rating rated_photo_id
-- ------ -------------
1 5 1
2 6 1
3 3 2
4 4 1
5 7 2
Is it efficient or even possible to perform this calculation in the SQL query? If not would it make sense to maintain a second table that stores the averages for each photo_id?
This is possible with almost all databases. Check out the aggregate functions of MySQL.
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
Specifically http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_avg for your question.
You DO NOT need a second table. The rating table has the information you need. Use MySQL aggregate functions with GROUP BY:
SELECT rated_photo_id, AVG(rating) AS AverageRating, COUNT(*) AS NumberOfRatings
FROM rating_table
GROUP BY rated_photo_id
ORDER BY AverageRating DESC
Output:
+----------------+---------------+-----------------+
| rated_photo_id | AverageRating | NumberOfRatings |
+----------------+---------------+-----------------+
| 1 | 5.0000 | 3 |
| 2 | 5.0000 | 2 |
+----------------+---------------+-----------------+
Yes, it's easy and efficient to calculate averages, assuming you've an index on the rated_photo_id column
select rated_photo_id, AVG(rating) as average_rating
from photos group by rated_photo_id order by average_rating desc
For a specific photo, you could specify an id:
select rated_photo_id, AVG(rating)
from photos where rated_photo_id = 2 group by rated_photo_id
Ideally your index would be (rated_photo_id, rating) to be covering for these queries--resulting in the fastest execution.
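Here is a small sketch of the averaging query together with the composite index, run on Python's sqlite3 rather than MySQL. The table name `ratings` and index name `idx_photo_rating` are placeholders, since the question does not name the table. The extra `rated_photo_id` sort key is only there to make the order deterministic when averages tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings ("
    "id INTEGER PRIMARY KEY, rating INTEGER, rated_photo_id INTEGER)"
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, 5, 1), (2, 6, 1), (3, 3, 2), (4, 4, 1), (5, 7, 2)],  # data from the question
)
# Index on (rated_photo_id, rating) so the query can be answered from the index.
conn.execute("CREATE INDEX idx_photo_rating ON ratings (rated_photo_id, rating)")

averages = conn.execute(
    "SELECT rated_photo_id, AVG(rating) AS average_rating, COUNT(*) AS n "
    "FROM ratings GROUP BY rated_photo_id "
    "ORDER BY average_rating DESC, rated_photo_id"
).fetchall()

print(averages)  # [(1, 5.0, 3), (2, 5.0, 2)]
```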
You should be able to just group by the photo id and get the average as the group is created.
SELECT rated_photo_id , AVG(rating) as rating
FROM photos
GROUP BY rated_photo_id
ORDER BY rating DESC

Finding an element with max. no of appearing in SQL

I have the following table:
id | year
10 | 2000
11 | 2001
10 | 2002
12 | 2003
11 | 2004
13 | 2005
10 | 2006
10 | 2007
According to id, since 10 appears most, the selection should give 10 for this table. I know this is easy but I couldn't go further than COUNT(*).
The following SQL will work when there is more than one id having the maximum count:
SELECT id FROM table GROUP BY 1
HAVING COUNT(*)=( SELECT MAX(t.count)
FROM ( SELECT id,COUNT(*) AS count
FROM table GROUP BY 1 ) t )
The first (innermost) SELECT will just count each id, this is used in the second SELECT to determine the maximum count and this will be used in the final (outermost) SELECT to display only the right IDs.
Hope that helps.
You need a group by, order by - along with a limit:
SELECT id FROM sometable GROUP BY id ORDER BY COUNT(*) DESC LIMIT 1
This will group the table by id, order them in descending order by their count and pick the first row (the one with highest count).
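The difference between the two answers shows up when ids tie for the top count. Below is a sketch on Python's sqlite3 (standing in for whatever database is in use); the table name `events` and the helper names are hypothetical, and `GROUP BY id` is written out instead of `GROUP BY 1`.

```python
import sqlite3

def top_id(conn):
    """Single most frequent id; an arbitrary winner if several ids tie."""
    return conn.execute(
        "SELECT id FROM events GROUP BY id ORDER BY COUNT(*) DESC LIMIT 1"
    ).fetchone()[0]

def all_top_ids(conn):
    """Every id whose count equals the maximum count."""
    sql = """
        SELECT id FROM events GROUP BY id
        HAVING COUNT(*) = (SELECT MAX(c)
                           FROM (SELECT COUNT(*) AS c
                                 FROM events GROUP BY id) AS t)
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(10, 2000), (11, 2001), (10, 2002), (12, 2003),
     (11, 2004), (13, 2005), (10, 2006), (10, 2007)],  # data from the question
)

single = top_id(conn)
before_tie = all_top_ids(conn)
print(single, before_tie)  # 10 [10]

# Two made-up extra rows bring id 11 up to four appearances as well.
conn.executemany("INSERT INTO events VALUES (?, ?)", [(11, 2008), (11, 2009)])
after_tie = all_top_ids(conn)
print(after_tie)  # [10, 11]
```

With a tie, the LIMIT 1 query still returns just one of the tied ids, so the HAVING form is the safer choice if all winners are needed.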