MySQL: LIMIT by a percentage of the amount of records? - mysql

Let's say I have a list of values, like this:
id value
----------
A 53
B 23
C 12
D 72
E 21
F 16
..
I need the top 10 percent of this list - I tried:
SELECT id, value
FROM list
ORDER BY value DESC
LIMIT COUNT(*) / 10
But this doesn't work. The problem is that I don't know the amount of records before I do the query. Any idea's?

Best answer I found:
SELECT*
FROM (
SELECT list.*, #counter := #counter +1 AS counter
FROM (select #counter:=0) AS initvar, list
ORDER BY value DESC
) AS X
where counter <= (10/100 * #counter);
ORDER BY value DESC
Change the 10 to get a different percentage.

In case you are doing this for an out of order, or random situation - I've started using the following style:
SELECT id, value FROM list HAVING RAND() > 0.9
If you need it to be random but controllable you can use a seed (example with PHP):
SELECT id, value FROM list HAVING RAND($seed) > 0.9
Lastly - if this is a sort of thing that you need full control over you can actually add a column that holds a random value whenever a row is inserted, and then query using that
SELECT id, value FROM list HAVING `rand_column` BETWEEN 0.8 AND 0.9
Since this does not require sorting, or ORDER BY - it is O(n) rather than O(n lg n)

You can also try with that:
SET #amount =(SELECT COUNT(*) FROM page) /10;
PREPARE STMT FROM 'SELECT * FROM page LIMIT ?';
EXECUTE STMT USING #amount;
This is MySQL bug described in here: http://bugs.mysql.com/bug.php?id=19795
Hope it'll help.

I realize this is VERY old, but it still pops up as the top result when you google SQL limit by percent so I'll try to save you some time. This is pretty simple to do these days. The following would give the OP the results they need:
SELECT TOP 10 PERCENT
id,
value
FROM list
ORDER BY value DESC
To get a quick and dirty random 10 percent of your table, the following would suffice:
SELECT TOP 10 PERCENT
id,
value
FROM list
ORDER BY NEWID()

I have an alternative which hasn't been mentionned in the other answers: if you access from any language where you have full access to the MySQL API (i.e. not the MySQL CLI), you can launch the query, ask how many rows there will be and then break the loop if it is time.
E.g. in Python:
...
maxnum = cursor.execute(query)
for num, row in enumerate(query)
if num > .1 * maxnum: # Here I break the loop if I got 10% of the rows.
break
do_stuff...
This works only with mysql_store_result(), not with mysql_use_result(), as the latter requires that you always accept all needed rows.
OTOH, the traffic for my solution might be too high - all rows have to be transferred.

Related

DIviding the SQL result into two halves

The SQL query is :
Select ProductName from Products;
The above query returns 5000 rows.
How can the result of 5000 rows be divided into two result sets of 2500 rows each,.i.e., one result set from 1 to 2500 and the other from 2501 to 5000?
Note:
Here ProductName is the primary Key.No ProductID column is present in the table.
It can be done either in the back end or in the front end.
An approach that works for mySQL (based on this answer https://stackoverflow.com/a/4741301/14015737):
Upper half
SELECT *
FROM (
SELECT test.*, #counter := #counter +1 counter
FROM (select #counter:=0) initvar, test
ORDER BY num
) X
WHERE counter <= round(50/100 * #counter);
ORDER BY num;
Lower half
Invert the sort order and remove the rounding
SELECT *
FROM (
SELECT test.*, #counter := #counter +1 counter
FROM (select #counter:=0) initvar, test
ORDER BY num DESC
) X
WHERE counter <= (50/100 * #counter);
ORDER BY num;
In case of an uneven number of records, the middle record is added to the upper half in this example. If you want it the other way around, move the round() to the other statement. If you don't want it at all, remove round().
Dbfiddle example: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=fb70eae0f7f1434a24099b5bb19f0878
If you know the numbers that you want, just use limit:
select ProductName
from Products
order by id
And then either:
limit 2500
limit 2500 offset 2499
If you simply want the results split into half, then you can use:
select t.*
from (select t.*,
ntile(2) over (order by <primary key>) as tile
from t
) t
where tile = 1; -- or 2 for the other half
The easiest and probably fastest approach is to use the table's primary key if you are fine with getting the rows in its order.
Run
select productname, id from products order by id;
and fetch 2500 rows. Then with the last ID, say ID 3456, run
select productname, id from products where id > 3456 order by id;
and fetch 2500 rows again. Etc.
UPDATE: Seeing I got a downvote for this, I'll better explain :-)
The query returns 5000 rows now and the OP doesn't want that many rows, so they want to cut this in halves. But the query may well return 10000 rows next year. Will the OP suddenly be fine with getting 5000 rows at once? This doesn't seem likely. It is more likely that there is an amount of rows that shall not be surpassed. This is why I cut the amount into slices of 2500.
The other approach to number all rows and return the first n rows has a severe drawback: All rows must be read again. Even if it is decided to cut the result in chunks of 100 each, everytime all rows must be read, sorted, numbered, fetched from. Reading all rows from a table and sorting all these rows is a lot of work for a DBMS.

select random value based on probability chance

How do I select a random row from the database based on the probability chance assigned to each row.
Example:
Make Chance Value
ALFA ROMEO 0.0024 20000
AUDI 0.0338 35000
BMW 0.0376 40000
CHEVROLET 0.0087 15000
CITROEN 0.016 15000
........
How do I select random make name and its value based on the probability it has to be chosen.
Would a combination of rand() and ORDER BY work? If so what is the best way to do this?
You can do this by using rand() and then using a cumulative sum. Assuming they add up to 100%:
select t.*
from (select t.*, (#cumep := #cumep + chance) as cumep
from t cross join
(select #cumep := 0, #r := rand()) params
) t
where #r between cumep - chance and cumep
limit 1;
Notes:
rand() is called once in a subquery to initialize a variable. Multiple calls to rand() are not desirable.
There is a remote chance that the random number will be exactly on the boundary between two values. The limit 1 arbitrarily chooses 1.
This could be made more efficient by stopping the subquery when cumep > #r.
The values do not have to be in any particular order.
This can be modified to handle chances where the sum is not equal to 1, but that would be another question.

how to generate subsets from a query in postgres?

I need subsets of tuples for a given query. For example , if I need to find list of all employees who are older than 25 and limit that to atmost 5. how do I generate various subsets for the same ?
# select * from employee where age > 25 ;
will list all the employee whose age is > 25
tupleid = { 1,2,3,4,5,6,7,8,9 }
#select * from employee where age > 25 limit 5 ;
will list any 5 tuples from it.
tupleid on first time execution = {1,2,3,4,5}
But I need some kind of permutation of the 5 set tuple where I get
{1,2,3,4,5} {2,3,4,5,6} .. and so on ..
Is there a way to generate the same in sql / postgres ?
Edit 1:
Since the question dint seem to be clear at first.
I have added a sample table and sql Query I was talking about . sqlfiddle.com/#!2/69a0c/2 If you see the output of the 2nd query I only get to see [1,2,3,4,5] tuples are the answer. I want [1,2,3,4,5] , [1,2,3,4,6] , [1,2,3,4,7] and soo on unique combinations as the answer .
To get your "kind of permutations", use OFFSET:
SELECT *
FROM employee
WHERE age > 25
ORDER BY employee_id -- or whatever
OFFSET 1
LIMIT 5;
To get a random sample:
SELECT *
FROM employee
WHERE age > 25
ORDER BY random()
LIMIT 5;
Postgres. In MySQL you can use RAND() instead.

Select data which have same letters

I'm having trouble with this SQL:
$sql = mysql_query("SELECT $menucompare ,
(COUNT($menucompare ) * 100 / (SELECT COUNT( $menucompare )
FROM data WHERE $ww = $button )) AS percentday FROM data WHERE $ww >0 ");
$menucompare is table fields names what ever field is selected and contains data bellow
$button is the week number selected (lets say week '6')
$ww table field name with row who have the number of week '6'
For example, I have data in $menucompare like that:
123456bool
521478bool
122555heel
147788itoo
and I want to select those, who have same word in the last of the data and make percentage.
The output should be like that:
bool -- 50% (2 entries)
heel -- 25% (1 entry)
itoo -- 25% (1 entry)
Any clearness to my SQL will be very appreciated.
I didn't find anything like that around.
Well, keeping data in such format probably not the best way, if possible, split the field into 2 separate ones.
First, you need to extract the string part from the end of the field.
if the length of the string / numeric parts is fixed, then it's quite easy;
if not, you should use regular expressions which, unfortunately, are not there by default with MySQL. There's a solution, check this question: How to do a regular expression replace in MySQL?
I'll assume, that numeric part is fixed:
SELECT s.str, CAST(count(s.str) AS decimal) / t.cnt * 100 AS pct
FROM (SELECT substr(entry, 7) AS str FROM data) AS s
JOIN (SELECT count(*) AS cnt FROM data) AS t ON 1=1
GROUP BY s.str, t.cnt;
If you'll have regexp_replace function, then substr(entry, 7) should be replaced to regexp_replace(entry, '^[0-9]*', '') to achieve the required result.
Variant with substr can be tested here.
When sorting out problems like this, I would do it in two steps:
Sort out the SQL independently of the presentation language (PHP?).
Sort out the parameterization of the query and the presentation of the results after you know you've got the correct query.
Since this question is tagged 'SQL', I'm only going to address the first question.
The first step is to unclutter the query:
SELECT menucompare,
(COUNT(menucompare) * 100 / (SELECT COUNT(menucompare) FROM data WHERE ww = 6))
AS percentday
FROM data
WHERE ww > 0;
This removes the $ signs from most of the variable bits, and substitutes 6 for the button value. That makes it a bit easier to understand.
Your desired output seems to need the last four characters of the string held in menucompare for grouping and counting purposes.
The data to be aggregated would be selected by:
SELECT SUBSTR(MenuCompare, -4) AS Last4
FROM Data
WHERE ww = 6
The divisor in the percentage is the count of such rows, but the sub-stringing isn't necessary to count them, so we can write:
SELECT COUNT(*) FROM Data WHERE ww = 6
This is exactly what you have anyway.
The divdend in the percentage will be the group count of each substring.
SELECT Last4, COUNT(Last4) * 100.0 / (SELECT COUNT(*) FROM Data WHERE ww = 6)
FROM (SELECT SUBSTR(MenuCompare, -4) AS Last4
FROM Data
WHERE ww = 6
) AS Week6
GROUP BY Last4
ORDER BY Last4;
When you've demonstrated that this works, you can re-parameterize the query and deal with the presentation of the results.

MySQL database resultset with values as close to a number "x" as possible

Im trying to get a result set that contains the 10 values that are closest to, in this case, the number 3.
I have a database that has values in a column named rated which can be 1,2,3,4 or 5. What im trying to do is query the database and return the first 10 rows that have the values closest to 3. The values can be above 3 or below 3. I should note that these values in the rated column are floats.
I then need to sort these rows in order so that rows with value of 3 are first and then the row with lowest offset (+/-) from 3.
Is there any SQL query that can return atleast the result set of values closest to 3 ? or am i going to have to return the whole db and sort it myself?
To get the first 10 rows with highest value down i used the statement
SELECT * FROM tabs ORDER BY 5 DESC LIMIT 10";
5 refers to the column rated
Is there some way to modify this to do what i want ?
Thanks
If I understand your problem correctly, this should do the trick:
select *
from tabs
order by abs(`rated` - 3) asc
limit 10
Note that it sorts by the difference in ascending order, so those with a difference of 0 will come first.
SELECT * FROM tabs ORDER BY ABS(3 - Rate) ASC LIMIT 10
If I got right what you need try:
select *
from (
select
case when -(3-rated) > 0 then -(3-rated) else (3-rated) end as distance,
tabs.*
from tabs
) subsel
order by distance
limit 10