Getting rows until a certain condition is met in MySQL - mysql

Basically I have a table which has attachments. Simple table: ID, Name, Size and UploadedDate. I want to get the last x rows that in total, are less than 2 GB.
So, collect all rows in DESC order of UploadedDate until I have 2GB of total file sizes, then eliminate the rest.
Actually I need the inverse of that. So, I need to get all the attachments that are not part of the first 2 GB. I have advanced experience in MySQL but right now I'm drawing a blank on this. I don't know what to search for.

A hacky hint:
SELECT items.* FROM (
SELECT 1 as id, 100 as size
UNION ALL
SELECT 2 as id, 100 as size
UNION ALL
SELECT 3 as id, 100 as size
UNION ALL
SELECT 4 as id, 100 as size
ORDER BY id DESC
) items, (SELECT @total:=0) as init
WHERE (@total:=@total+size)+0 <= 200;
+----+------+
| id | size |
+----+------+
| 4 | 100 |
| 3 | 100 |
+----+------+
2 rows in set (0.00 sec)
UPD
Essentially the same, but probably more efficient:
SELECT items.* FROM (
SELECT 1 as id, 100 as size
UNION ALL
SELECT 2 as id, 100 as size
UNION ALL
SELECT 3 as id, 100 as size
UNION ALL
SELECT 4 as id, 100 as size
) items, (SELECT @total:=0) as init
HAVING (@total:=@total+size)+0 <= 200
ORDER BY id DESC;
The idea is that your own table takes the place of items.

You could work out the possible size of each row by adding up the space requirements for the data type of each field and then use this to work out how many rows are 2 GB.

You could use this query:
SELECT t1.ID
FROM attachments t1, attachments t2
WHERE t2.UploadedDate >= t1.UploadedDate
GROUP BY t1.ID
HAVING sum(t2.Size) > 2147483648 -- 2 GB, assuming Size is stored in bytes
to select attachments to delete.
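To see what the self-join above selects, here is a minimal sketch that runs the same query shape against an in-memory SQLite table from Python. The sample rows, the 200-byte limit (a stand-in for 2 GB) and the use of SQLite instead of MySQL are assumptions made only for illustration.

```python
import sqlite3

LIMIT_BYTES = 200  # hypothetical stand-in for 2 GB so the toy data stays readable

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attachments (ID INTEGER PRIMARY KEY, Size INTEGER, UploadedDate TEXT)"
)
conn.executemany(
    "INSERT INTO attachments VALUES (?, ?, ?)",
    [
        (1, 100, "2020-01-01"),
        (2, 100, "2020-01-02"),
        (3, 100, "2020-01-03"),
        (4, 100, "2020-01-04"),
    ],
)

# For every row t1, sum the sizes of all rows that are at least as new as t1
# (t1 itself included). If that sum exceeds the limit, t1 lies outside the
# newest LIMIT_BYTES and is a candidate for deletion.
rows = conn.execute(
    """
    SELECT t1.ID
    FROM attachments t1
    JOIN attachments t2 ON t2.UploadedDate >= t1.UploadedDate
    GROUP BY t1.ID
    HAVING SUM(t2.Size) > ?
    ORDER BY t1.ID
    """,
    (LIMIT_BYTES,),
).fetchall()

ids_to_delete = [row[0] for row in rows]
print(ids_to_delete)
```

With this sample data the two oldest attachments are the ones selected, since the two newest already fill the 200-byte budget exactly.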
DISCLAIMER: while it's standard SQL, it WILL be slow, as it's Ω(n^2) in the worst case for a table with n rows. Use @newtover's solution.
In this case you might be better off using a stored procedure and looping over the attachments while summing up their sizes.
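The looping idea can be sketched outside the database as well. This is not a stored procedure, just a rough Python illustration of the same single pass; the function name ids_outside_budget, the tuple layout and the sample rows are all hypothetical.

```python
from typing import Iterable, List, Tuple

# (id, size_in_bytes, uploaded_date as an ISO string); layout assumed for this sketch
Attachment = Tuple[int, int, str]


def ids_outside_budget(rows: Iterable[Attachment], budget: int) -> List[int]:
    """Return ids of attachments that do not fit into the newest `budget` bytes."""
    # Walk the attachments from newest to oldest, keeping a running total.
    newest_first = sorted(rows, key=lambda r: r[2], reverse=True)
    total = 0
    outside = []
    for att_id, size, _uploaded in newest_first:
        total += size
        # Once the running total exceeds the budget, this row and every older
        # row fall outside it, because the total only grows.
        if total > budget:
            outside.append(att_id)
    return outside


sample = [
    (1, 100, "2020-01-01"),
    (2, 100, "2020-01-02"),
    (3, 100, "2020-01-03"),
    (4, 100, "2020-01-04"),
]
result = ids_outside_budget(sample, 200)
print(result)
```

This makes one pass over the sorted rows, so the cost is dominated by the sort rather than by a pairwise comparison of rows.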
And for lost Postgres souls, here's a solution equivalent to @newtover's but using a window function:
SELECT outer_t.ID
FROM (
SELECT t.ID, sum(t.Size) OVER (ORDER BY t.UploadedDate DESC) AS s
FROM attachments AS t
) AS outer_t
WHERE outer_t.s > 2147483648 -- 2 GB, assuming Size is stored in bytes

Related

select the first time three unique values appear with sql

From the table below as an example, I need to select all fields from a table where the first 3 columns are exactly the same, taking only the first time each combination appears. For example, rows 1, 3 and 4 should be selected, as they have differing values in the first 3 columns. I have been given this data, and there is no unique ID. There are about 25000 records, so handling this in Python after I have SELECTed the data seems silly; the only methods I can think of are deleting the records that are nearly identical, or using a SELECT statement I have not worked out yet. Would it be better to try and select the data in small amounts and use Python to pick out the correct bits? While this is messier, I know how to do it this way.
ID | Class | Season | Grade
---|-------|--------|---------
1 | x | 1 | A
1 | x | 1 | A*
1 | y | 1 | A
1 | x | 2 | C
Try using DISTINCT *. It means "select all columns and skip any rows where the values in all columns match some already included row".
So with LIMIT 3 you will have the first 3 unique rows:
SELECT distinct * FROM yourTable LIMIT 3;
You want the first three unique rows. You can actually do this pretty easily if you have an ordering column:
select t.*
from (select t.*,
row_number() over (partition by id, class, season order by <orderingcol>) as seqnum
from t
) t
where seqnum = 1
order by <orderingcol>
limit 3;
Actually, the subquery is not necessary, but the query is a bit more inscrutable without it:
select t.*
from t
order by row_number() over (partition by id, class, season order by <orderingcol>),
<orderingcol>
limit 3;
The one caveat is that this will return duplicates if there are not three unique ones.
Window functions were introduced in MySQL 8.0. This could be phrased in earlier versions of MySQL as well:
select t.*
from t join
(select id, class, season, min(<orderingcol>) as min_oc
from t
group by id, class, season
) tt
using (id, class, season)
where t.<orderingcol> = tt.min_oc
order by tt.min_oc;
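As a rough check of the join-on-minimum approach, here is a small sketch using Python's sqlite3 module with the four sample rows from the question. The table name grades and the ordering column seq are assumptions, since the original data has no ordering column; seq simply records the order in which the rows were listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `seq` is a hypothetical ordering column standing in for <orderingcol>.
conn.execute(
    "CREATE TABLE grades (seq INTEGER, id INTEGER, class TEXT, season INTEGER, grade TEXT)"
)
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "x", 1, "A"),
        (2, 1, "x", 1, "A*"),
        (3, 1, "y", 1, "A"),
        (4, 1, "x", 2, "C"),
    ],
)

# The derived table finds the smallest seq per (id, class, season) combination;
# joining back keeps only the row where each combination first appears.
first_rows = conn.execute(
    """
    SELECT t.seq, t.id, t.class, t.season, t.grade
    FROM grades t
    JOIN (SELECT id, class, season, MIN(seq) AS min_seq
          FROM grades
          GROUP BY id, class, season) tt
      ON t.id = tt.id AND t.class = tt.class AND t.season = tt.season
    WHERE t.seq = tt.min_seq
    ORDER BY t.seq
    """
).fetchall()

for row in first_rows:
    print(row)
```

On this data the rows with seq 1, 3 and 4 come back, which matches the rows the question asks for.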

Number duplicate records on the MySQL table

Have a table with similar schema
id control code amount
1 200 12 300
2 400 12 300
3 200 12 300
4 100 10 400
5 100 10 400
6 500 13 500
Trying to list the duplicates of records on a UI.
Using following query I can retrieve the duplicate records and show it on UI.
select * from mwt group by control,code,amount having count(id) > 1;
id control code amount
1 200 12 300
4 100 10 400
Here the records with id 1 and 4 are duplicates of 3 and 5 respectively.
On the UI, the user will click a check-box adjacent to a record, and the corresponding duplicate records should be populated in the UI. To make things easier, I am trying to populate another column named dup_id. Using this dup_id it is possible to filter the results, which are in JSON format, from the UI.
How to create a result set similar to the one shown below?
id control code amount dup_id
1 200 12 300 1
2 400 12 300
3 200 12 300 1
4 100 10 400 4
5 100 10 400 4
6 500 13 500
This seems like a simpler solution than that suggested by @kickstarter - but maybe I've misunderstood the requirement...
SELECT x.*
, y.dup_id
FROM my_table x
LEFT
JOIN
( SELECT MIN(id) dup_id
, control
, code
, amount
FROM my_table
GROUP
BY control
, code
, amount
HAVING COUNT(*) > 1
) y
ON y.control = x.control
AND y.code = x.code
AND y.amount = x.amount;
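For anyone who wants to try the LEFT JOIN above on the sample data, here is a minimal sketch using Python's sqlite3 module. It assumes the table is called mwt as in the question, and it adds an ORDER BY so the output order is predictable; SQLite is used only as a convenient stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mwt (id INTEGER PRIMARY KEY, control INTEGER, code INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO mwt VALUES (?, ?, ?, ?)",
    [
        (1, 200, 12, 300),
        (2, 400, 12, 300),
        (3, 200, 12, 300),
        (4, 100, 10, 400),
        (5, 100, 10, 400),
        (6, 500, 13, 500),
    ],
)

# The derived table y holds one row per duplicated (control, code, amount)
# combination together with its smallest id. The LEFT JOIN attaches that id
# to every member of the group and leaves NULL for non-duplicated rows.
dup_rows = conn.execute(
    """
    SELECT x.id, x.control, x.code, x.amount, y.dup_id
    FROM mwt x
    LEFT JOIN (SELECT MIN(id) AS dup_id, control, code, amount
               FROM mwt
               GROUP BY control, code, amount
               HAVING COUNT(*) > 1) y
      ON y.control = x.control
     AND y.code = x.code
     AND y.amount = x.amount
    ORDER BY x.id
    """
).fetchall()

for row in dup_rows:
    print(row)
```

Rows 1 and 3 get dup_id 1, rows 4 and 5 get dup_id 4, and rows 2 and 6 get None, which is the result set asked for in the question.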
Depending on how accurate the order has to be, you could do something like this.
This is getting all the unique control / code / amount with a count, to get a flag to know if that is a duplicate row, and ordered by control / code / amount so that they are in order. It does a cross join to initialise a few user variables.
Then it calculates a counter, only incrementing it if any of control / code / amount have changed AND it is a duplicate row. Then sets user variables to store the previous values of control / code / amount.
The outer query then orders the results back in to id order.
SELECT sub3.id,
sub3.control,
sub3.code,
sub3.amount,
sub3.dup_id
FROM
(
SELECT sub2.id,
sub2.control,
sub2.code,
sub2.amount,
@cnt:=IF(@control=control AND @code=code AND @amount=amount AND sub2.id_count IS NOT NULL, @cnt, IF(sub2.id_count IS NULL, @cnt, @cnt + 1)),
@control:=control,
@code:=code,
@amount:=amount,
IF(sub2.id_count IS NULL, NULL, @cnt) AS dup_id
FROM
(
SELECT mwt.id, mwt.control, mwt.code, mwt.amount, sub1.id_count
FROM mwt
LEFT OUTER JOIN
(
SELECT control, code, amount, COUNT(id) AS id_count
FROM mwt
GROUP BY control,code,amount
HAVING id_count > 1
) sub1
ON mwt.control = sub1.control
AND mwt.code = sub1.code
AND mwt.amount = sub1.amount
ORDER BY mwt.control, mwt.code, mwt.amount
) sub2
CROSS JOIN
(
SELECT @cnt:=0, @control:=0, @code:=0, @amount:=0
) sub0
) sub3
ORDER BY id
Note that this is ordering by control, code and amount, so not an exact match for your required output (which would require getting the first duplicates ordered by id first).
EDIT - Simpler and better way to do it. This gets all the duplicate rows with the min id for those duplicates (ordered by the min id), and uses a user variable to add a sequence number for those. Then LEFT OUTER JOINs that back against the main table to put that sequence number in all the matching rows.
SELECT mwt.id, mwt.control, mwt.code, mwt.amount, sub2.dup_id
FROM mwt
LEFT OUTER JOIN
(
SELECT sub1.id, sub1.control, sub1.code, sub1.amount, @cnt:=@cnt+1 AS dup_id
FROM
(
SELECT MIN(id) AS id, control, code, amount
FROM mwt
GROUP BY control,code,amount
HAVING COUNT(id) > 1
ORDER BY id
) sub1
CROSS JOIN
(
SELECT @cnt:=0
) sub0
) sub2
ON mwt.control = sub2.control
AND mwt.code = sub2.code
AND mwt.amount = sub2.amount
ORDER BY mwt.id
Do you need a dup_id column at all? I think this can be achieved with a simple query like the one below, where the :selected_* placeholders stand for the values of the record the user selected:
select id
, control
, code
, amount
from mwt
where control = :selected_control
and code = :selected_code
and amount = :selected_amount
and id <> :selected_id
You can omit the last condition if the requirement is to list the duplicates including the selected record.

Single Query to fill in missing column by calculating percentage of group total

I have a mysql table with schema as follows:
group id amount fraction
1 1 3
1 2 5
2 3 2
2 3 1
Each of the rows belongs to a group. Each row also stores a specific value called amount. I want to find the fraction of the group total amount that each row has, so the table will look like this:
AFTER
group id amount fraction
1 1 3 .375
1 2 5 .625
2 3 2 .667
2 3 1 .333
To get the value for row 1, I sum up all the amount columns in group 1. That would be 3+5, which is 8. Then I divide the amount in row 1 by the group sum, which yields .375. I do this for all of them.
I could do this by writing a query like so:
SELECT `group`, SUM(amount) FROM t GROUP BY `group`
Then loop through each group, select rows in that group, calculate fractions, and update the rows. Unfortunately this means that after the initial query I will be dealing with 2 nested for loops, and millions and millions of queries, which will take a long time given the size of the dataset.
I have a subtle feeling that there is a way to do this more efficiently with one mysql query. If anybody has any ideas how to do this with a single query, that's my question.
You can do this in a single query with a join and aggregation:
-- group is a reserved word in MySQL, so it has to be quoted with backticks
select t.*, t.amount / tt.sumamount as fraction
from t join
(select `group`, sum(amount) as sumamount
from t
group by `group`
) tt
on t.`group` = tt.`group`;
EDIT:
The update is quite similar:
update t join
(select `group`, sum(amount) as sumamount
from t
group by `group`
) tt
on t.`group` = tt.`group`
set t.fraction = t.amount / tt.sumamount;
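Here is a small sketch of the join-and-aggregate SELECT, run from Python against an in-memory SQLite table. A few things are assumptions for the sake of the example: the grouping column is named grp to sidestep the reserved word, the rows get distinct ids 1 to 4, and the amount is multiplied by 1.0 because SQLite, unlike MySQL, performs integer division on two integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `grp` is a hypothetical name for the question's `group` column.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, grp INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO t (id, grp, amount) VALUES (?, ?, ?)",
    [(1, 1, 3), (2, 1, 5), (3, 2, 2), (4, 2, 1)],
)

# The derived table tt carries one total per group; joining it back lets each
# row divide its own amount by its group's total in a single statement.
fractions = conn.execute(
    """
    SELECT t.id, t.grp, t.amount, t.amount * 1.0 / tt.sumamount AS fraction
    FROM t
    JOIN (SELECT grp, SUM(amount) AS sumamount
          FROM t
          GROUP BY grp) tt
      ON t.grp = tt.grp
    ORDER BY t.id
    """
).fetchall()

rounded = [round(row[3], 4) for row in fractions]
print(rounded)
```

The group totals are computed once per group rather than once per row, which is what removes the need for the nested loops described in the question.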

Single MySQL query to get result from diffrent tables based on no of views

Hi, I know the basic MySQL query format, but now I need to write a more complex query.
Say (for explanation purposes) I have 3 tables, each with the columns id, item_name, and no_of_views.
I want a single query to get the top 10 viewed items from these 3 tables.
Table1
id: item_name no_of_views
1 item1 20
2 item2 25
3 item3 16
4 item4 10
Table2
id: item_name no_of_views
1 itemA 2
2 itemB 5
3 itemC 70
4 itemD 0
Table3
id: item_name no_of_views
1 item_1 34
2 item_2 55
3 item_3 10
4 item_4 1
I know I have to combine these tables and then query based on no_of_views, like:
$query="
(SELECT * FROM Table1)
UNION ALL
(SELECT * FROM Table2)
UNION ALL
(SELECT * FROM Table3)
ORDER BY no_of_views DESC LIMIT 10";
This works perfectly fine, but my problem is that my tables have hundreds of items in them and I have many tables. Done this way, the query has to fetch all the rows from all the tables and then give me only the top 10 by views.
I want to select only 10 values across all tables based on the number of views.
Say I have 10 tables and each one has a no_of_views column: how can I query to see which item has the highest views among all tables, then the next lower no_of_views from another table or the same table, whichever applies, and so on?
Is that even possible?
Please suggest.
When comparing your query to the following query, the MySQL Execution plans are identical. The optimizer will still only be selecting the top 10 from each table (not the full record set) and then comparing them as that is the maximum it will need. This method is not as inefficient as you seem to believe.
$query="
(SELECT * FROM Table1 ORDER BY no_of_views DESC LIMIT 10)
UNION ALL
(SELECT * FROM Table2 ORDER BY no_of_views DESC LIMIT 10)
UNION ALL
(SELECT * FROM Table3 ORDER BY no_of_views DESC LIMIT 10)
ORDER BY no_of_views DESC LIMIT 10";
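The per-table LIMIT pattern can be tried on the sample data with the sketch below, written in Python against in-memory SQLite. Two things differ from the MySQL query and are assumptions of this sketch: the limit is 3 rather than 10 so the small tables show an effect, and each limited SELECT is wrapped in a subquery because SQLite does not accept parenthesised SELECTs inside a UNION.

```python
import sqlite3

TOP_N = 3  # the question uses 10; 3 keeps the toy output short

data = {
    "Table1": [("item1", 20), ("item2", 25), ("item3", 16), ("item4", 10)],
    "Table2": [("itemA", 2), ("itemB", 5), ("itemC", 70), ("itemD", 0)],
    "Table3": [("item_1", 34), ("item_2", 55), ("item_3", 10), ("item_4", 1)],
}

conn = sqlite3.connect(":memory:")
for name, items in data.items():
    # Table names are fixed constants here, so formatting them into SQL is safe.
    conn.execute(
        f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, item_name TEXT, no_of_views INTEGER)"
    )
    conn.executemany(
        f"INSERT INTO {name} (item_name, no_of_views) VALUES (?, ?)", items
    )

# Take at most TOP_N candidates from each table, then rank the combined
# candidates once more and keep the overall TOP_N.
per_table = " UNION ALL ".join(
    "SELECT item_name, no_of_views FROM "
    f"(SELECT item_name, no_of_views FROM {name} ORDER BY no_of_views DESC LIMIT {TOP_N})"
    for name in data
)
top = conn.execute(
    f"{per_table} ORDER BY no_of_views DESC LIMIT {TOP_N}"
).fetchall()
print(top)
```

Any item in the overall top N must also be in the top N of its own table, so limiting each branch first cannot drop a row that belongs in the final result.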

SELECT rows with minimum count(*)

Let's say I have a simple table voting with columns
id (primary key), token (int), candidate (int), rank (int).
I want to extract all rows having a specific rank, grouped by candidate and, most importantly, only those with the minimum count(*).
So far I have reached
SELECT candidate, count( * ) AS count
FROM voting
WHERE rank =1
AND candidate <200
GROUP BY candidate
HAVING count = min( count )
But it is returning an empty set. If I replace min(count) with the actual minimum value it works properly.
I have also tried
SELECT candidate,min(count)
FROM (SELECT candidate,count(*) AS count
FROM voting
where rank = 1
AND candidate < 200
group by candidate
order by count(*)
) AS temp
But this resulted in only 1 row. I have 3 rows with the same minimum count but with different candidates, and I want all 3 of them.
Can anyone help me? The number of rows with the same minimum count(*) value would also help.
The sample is quite big, so I am showing some dummy values:
1 $sampleToken1 101 1
2 $sampleToken2 102 1
3 $sampleToken3 103 1
4 $sampleToken4 102 1
Here, when grouped by candidate, there are 3 rows with these count(*) results:
candidate count( * )
101 1
103 1
102 2
I want the top 2 rows to be shown, i.e. those with count(*) = 1, or whatever the minimum is.
Try to use this script as pattern -
-- find minimum count
SELECT MIN(cnt) INTO @min FROM (SELECT COUNT(*) cnt FROM voting GROUP BY candidate) t;
-- show records with minimum count
SELECT * FROM voting t1
JOIN (SELECT candidate FROM voting GROUP BY candidate HAVING COUNT(*) = @min) t2
ON t1.candidate = t2.candidate;
Remove your HAVING clause completely; it is not written correctly.
Instead, add a sub-select to the where clause to fit that criterion.
(i.e. select cand, count(*) as count from voting where rank = 1 and count = (select ..... )
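One possible way to spell out the sub-select idea is sketched below in Python with in-memory SQLite. It is only an interpretation of the suggestion: the per-candidate counts are computed in a derived table first, so the outer WHERE can compare them with the minimum. The column is named vote_rank here (an assumption, to avoid clashing with the RANK keyword in newer MySQL versions), and the token values are dummies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `vote_rank` is a hypothetical name for the question's `rank` column.
conn.execute(
    "CREATE TABLE voting (id INTEGER PRIMARY KEY, token TEXT, candidate INTEGER, vote_rank INTEGER)"
)
conn.executemany(
    "INSERT INTO voting VALUES (?, ?, ?, ?)",
    [
        (1, "sampleToken1", 101, 1),
        (2, "sampleToken2", 102, 1),
        (3, "sampleToken3", 103, 1),
        (4, "sampleToken4", 102, 1),
    ],
)

# Inner derived table: one count per candidate.
# Scalar sub-select: the smallest of those counts.
# Outer WHERE: keep every candidate whose count equals that minimum.
min_rows = conn.execute(
    """
    SELECT candidate, cnt
    FROM (SELECT candidate, COUNT(*) AS cnt
          FROM voting
          WHERE vote_rank = 1
          GROUP BY candidate) AS per_candidate
    WHERE cnt = (SELECT MIN(cnt)
                 FROM (SELECT COUNT(*) AS cnt
                       FROM voting
                       WHERE vote_rank = 1
                       GROUP BY candidate) AS counts)
    ORDER BY candidate
    """
).fetchall()
print(min_rows)
```

Candidates 101 and 103 both come back with a count of 1, so ties at the minimum are all kept, and len(min_rows) gives the number of candidates sharing that minimum.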
The HAVING keyword can not use the MIN function in the way you are trying. Replace the MIN function with an absolute value such as HAVING count > 10