Select rows until a total amount is met in a column (mysql) - mysql

I have seen this issue in SF, but me being a noob I just can't get my fried brain around them. So please forgive me if this feels like repetition.
My Sample Table
--------------------------
ID | Supplier | QTY
--------------------------
1  | 1        | 2
2  | 1        | 2
3  | 2        | 5
4  | 3        | 2
5  | 1        | 3
6  | 2        | 4
I need to get the rows "UNTIL" the cumulative total for "QTY" is equal or greater than 5 in descending order for a particular supplier id.
In this example, for supplier 1, it will be rows with the ids of 5 and 2.
Id - unique primary key
Supplier - foreign key, there is another table for supplier info.
Qty - double

It ain't pretty, but I think this does it and maybe it can be the basis of something less cumbersome. Note that I use a "fake" INNER JOIN just to get some variable initialized for the first time--it serves no other role.
SELECT ID,
supplier,
qty,
cumulative_qty
FROM
(
SELECT
ID,
supplier,
qty,
-- next line keeps a running total quantity by supplier id
@cumulative_quantity := if (@sup <> supplier, qty, @cumulative_quantity + qty) as cumulative_qty,
-- next is 0 for running total < 5 by supplier, 1 the first time >= 5, and ++ after
@reached_five := if (@cumulative_quantity < 5, 0, if (@sup <> supplier, 1, @reached_five + 1)) as reached_five,
-- next takes note of changes in supplier being processed
@sup := if(@sup <> supplier, supplier, @sup) as sup
FROM
(
-- this subquery is key for getting things in supplier order, by descending id
SELECT *
FROM `sample_table`
ORDER BY supplier, ID DESC
) reverse_order_by_id
INNER JOIN
(
-- initialize the variables used to their first ever values
SELECT @cumulative_quantity := 0, @sup := 0, @reached_five := 0
) only_here_to_initialize_variables
) t_alias
where reached_five <= 1 -- only get things up through the time we first get to 5 or above.

How about this? Using two variables.
SQLFIDDLE DEMO
Query:
set @tot:=0;
set @sup:=0;
select x.id, x.supplier, x.ctot
from (
select id, supplier, qty,
@tot:= (case when @sup = supplier then
@tot + qty else qty end) as ctot,
@sup:=supplier
from demo
order by supplier asc, id desc) x
where x.ctot >=5
;
| ID | SUPPLIER | CTOT |
------------------------
| 2 | 1 | 5 |
| 1 | 1 | 7 |
| 3 | 2 | 5 |
Note that the filter x.ctot >= 5 keeps the rows at and after the point where the running total reaches 5. To keep the rows up to and including that point, as the question asks, filter on x.ctot - x.qty < 5 instead.

Without window functions, plain SQL has no concept of 'what row number am I up to', so one way to implement this is with a cursor. Writing code with cursors is something like writing code with for loops in other languages.
An example of how to use cursors is here:
http://dev.mysql.com/doc/refman/5.0/en/cursors.html
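To make the for-loop analogy concrete, here is a minimal Python sketch of what a cursor-style solution does for the question above: walk one supplier's rows from the highest id down and stop once the running total reaches the target. The function name rows_until_total and the in-memory row list are hypothetical and only there for illustration.

```python
# Rows from the question's sample table as (id, supplier, qty) tuples.
ROWS = [(1, 1, 2), (2, 1, 2), (3, 2, 5), (4, 3, 2), (5, 1, 3), (6, 2, 4)]


def rows_until_total(rows, supplier, target):
    """Return the supplier's rows, highest id first, up to and including
    the row whose running qty total first reaches `target`."""
    picked = []
    running = 0
    # Comparable to a cursor declared with ORDER BY id DESC for one supplier.
    ordered = sorted((r for r in rows if r[1] == supplier),
                     key=lambda r: r[0], reverse=True)
    for row in ordered:
        picked.append(row)
        running += row[2]
        if running >= target:
            break  # same role as leaving the cursor's fetch loop
    return picked


print(rows_until_total(ROWS, supplier=1, target=5))
```

If the supplier's total never reaches the target, the sketch simply returns all of that supplier's rows.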

Here is a rough cursor demo; maybe it's helpful. Note that it is written in SQL Server (T-SQL) syntax, not MySQL.
CREATE TABLE #t
(
ID INT IDENTITY,
Supplier INT,
QTY INT
);
TRUNCATE TABLE #t;
INSERT INTO #t (Supplier, QTY)
VALUES (1, 2),
(1, 2),
(2, 5),
(3, 2),
(1, 3);
DECLARE @sum AS INT;
DECLARE @qty AS INT;
DECLARE @totalRows AS INT;
DECLARE curSelectQTY CURSOR
FOR SELECT QTY
FROM #t
ORDER BY QTY DESC;
OPEN curSelectQTY;
SET @sum = 0;
SET @totalRows = 0;
FETCH NEXT FROM curSelectQTY INTO @qty;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sum = @sum + @qty;
SET @totalRows = @totalRows + 1;
IF @sum >= 5
BREAK;
-- advance to the next row; without this the loop keeps re-adding the first QTY
FETCH NEXT FROM curSelectQTY INTO @qty;
END
SELECT TOP (@totalRows) *
FROM #t
ORDER BY QTY DESC;
CLOSE curSelectQTY;
DEALLOCATE curSelectQTY;

SELECT x.*
FROM supplier_stock x
JOIN supplier_stock y
ON y.supplier = x.supplier
AND y.id >= x.id
GROUP
BY x.supplier
, x.id
-- keep a row while the total of the rows before it (higher ids) is still below 5
HAVING SUM(y.qty) - x.qty < 5;
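The idea of keeping a row while the total of the rows ahead of it is still below the target can also be written as a correlated subquery. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made only so the example is self-contained; the table name sample_table comes from the first answer, and the helper rows_until_total is hypothetical. Treat it as a sketch rather than a tested MySQL query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_table (id INTEGER PRIMARY KEY, supplier INTEGER, qty REAL)"
)
conn.executemany(
    "INSERT INTO sample_table VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 1, 2), (3, 2, 5), (4, 3, 2), (5, 1, 3), (6, 2, 4)],
)

QUERY = """
SELECT x.id, x.supplier, x.qty
FROM sample_table AS x
WHERE x.supplier = ?
  -- total qty of this supplier's rows with a higher id, i.e. the rows
  -- read before x when going through the ids in descending order
  AND (SELECT COALESCE(SUM(y.qty), 0)
       FROM sample_table AS y
       WHERE y.supplier = x.supplier AND y.id > x.id) < ?
ORDER BY x.id DESC
"""


def rows_until_total(supplier, target):
    """Rows of one supplier, highest id first, until the total reaches target."""
    return conn.execute(QUERY, (supplier, target)).fetchall()


print(rows_until_total(1, 5))
```

For supplier 1 and a target of 5 this selects ids 5 and 2, matching the result the question asks for. The subquery is re-evaluated per row, so on a large table an index on (supplier, id) would likely matter.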

Related

Select into random records using mysql

I have two tables in my MySql database:
Employee Table:
id | name | projectcount
---------------------------------
5 | john | 1
7 | mike | 1
8 | jane | 0
Project Table:
id | name | employeeId
---------------------------------
1 | pro1 | 5
2 | pro2 | 7
3 | pro3 | 8
CREATE TABLE EmployeeTable (
id INT,
name VARCHAR(30),
projectcount int
);
CREATE TABLE ProjectTable (
id INT,
name VARCHAR(30),
employeeId INT
);
INSERT INTO EmployeeTable (id, name, projectcount) VALUES
(5, "john", 1),
(7, "mike", 1),
(8, "jane", 0);
INSERT INTO ProjectTable (id, name, employeeId) VALUES
(1, "pro1", 5),
(2, "pro2", 7),
(3, "pro3", 8);
I would like to select 3 random records from the projects table and update the projectcount in employees table and return those records using select query.
My Approach using Stored Procedure:
DECLARE userId1 int DEFAULT 0;
DECLARE userId2 int DEFAULT 0;
DECLARE userId3 int DEFAULT 0;
DECLARE projectId1 int DEFAULT 0;
DECLARE projectId2 int DEFAULT 0;
DECLARE projectId3 int DEFAULT 0;
select id, employeeid into projectId1, userId1 from project order by RAND() LIMIT 1;
select id, employeeid into projectId2, userId2 from project order by RAND() LIMIT 1;
select id, employeeid into projectId3, userId3 from project order by RAND() LIMIT 1;
update employee set projectcount = projectcount + 1 where id = userId1;
update employee set projectcount = projectcount + 1 where id = userId2;
update employee set projectcount = projectcount + 1 where id = userId3;
select * from project where id in (projectId1, projectId2, projectId3);
The above code works but written in more static way. Is there any improvements done to this to look more cleaner. Thanks.
Plan A
"Is there any improvements done to this to look more cleaner." Do only one request at a time.
Since that probably does not satisfy your intended meaning of "cleaner", I will continue:
Plan B
Flip things around. And use a transaction.
This runs 3 times as fast, and fixes one potential bug. Does that qualify as "cleaner"?
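BEGIN;
-- Pick the 3 random rows in a subquery first. Applying ORDER BY RAND() LIMIT 3
-- directly to the GROUP_CONCAT query would sort and limit its single result row.
SELECT GROUP_CONCAT(id), GROUP_CONCAT(employeeid)
INTO @ids, @emp_ids
-- I assume that `id` is the PRIMARY KEY of `project`?
FROM ( SELECT id, employeeid
FROM project
ORDER BY RAND() LIMIT 3 ) AS picked;
-- That will run 3 times as fast as 3 separate SELECTs.
-- It avoids tallying the same id 3 times
-- but the same employee may own several of the picked projects;
-- IN (...) then increments that employee only once
SET @sql = CONCAT("UPDATE employee
SET projectcount = projectcount + 1
WHERE id IN (", @emp_ids, ")");
PREPARE _sql FROM @sql;
EXECUTE _sql;
DEALLOCATE PREPARE _sql;
... -- similar execute to get the `SELECT *`
COMMIT;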
If you have millions of rows you are picking from, then be aware that ORDER BY RAND() always requires a full table scan and a sort. Here are some kludges to speed that up: http://mysql.rjweb.org/doc.php/random
Plan C
Tighten up the specs (no dup projects, double-assigning a user, define "cleaner", etc.). Then we can look for other ways.
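As one possible reading of "pick once, update inside a transaction", here is a minimal Python sketch using the built-in sqlite3 module in place of MySQL (an assumption for the sake of a runnable example, so RANDOM() stands in for RAND()). The table names employee and project follow the stored procedure in the question; the function pick_random_projects is hypothetical. It assumes that each picked project should add 1 to its owner's projectcount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, projectcount INTEGER);
CREATE TABLE project  (id INTEGER PRIMARY KEY, name TEXT, employeeId INTEGER);
INSERT INTO employee VALUES (5, 'john', 1), (7, 'mike', 1), (8, 'jane', 0);
INSERT INTO project  VALUES (1, 'pro1', 5), (2, 'pro2', 7), (3, 'pro3', 8);
""")


def pick_random_projects(conn, how_many):
    """Pick `how_many` distinct random projects, bump each owner's
    projectcount once per picked project, and return the picked rows."""
    with conn:  # one transaction: commit on success, roll back on error
        picked = conn.execute(
            "SELECT id, name, employeeId FROM project ORDER BY RANDOM() LIMIT ?",
            (how_many,),
        ).fetchall()
        conn.executemany(
            "UPDATE employee SET projectcount = projectcount + 1 WHERE id = ?",
            [(employee_id,) for _, _, employee_id in picked],
        )
    return picked


print(pick_random_projects(conn, 2))
```

Because the rows are picked in a single query, the same project cannot be chosen twice, while an employee who owns several picked projects is incremented once per project.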

mysql how to select count of rows group by sun_calendar_date and div to periodic by every x day

I think this question needs a function, but any solution is acceptable.
I have a table like the one below.
sun_calendar_date is an integer, and it's easy for me to convert it to a string.
answerset:
id sun_calendar_date data
-------------------------------------------
1 13980120 something
2 13980122 something
3 13980129 something
4 13980130 something
5 13980131 something(end of month)
6 13980201 something
7 13980202 something
8 13980103 something
9 13980103 something
I want to count the rows grouped by sun_calendar_date, divided into periods of x days.
For example, for a period of 5 days I have the code below, but it does not work across the month boundary or for empty days:
SELECT COUNT(answerset.id) as val,sun_calendar_date FROM answerset
WHERE id group by SUBSTRING(sun_calendar_date,7,2) div 5;
I need this:
val sun_calendar_date
-------------------------------------
2 13980120 20-24=> 2 rows
1 13980129 25-29=> 1 rows
5 13980130 30-03=> 5 rows (next month)
You can use the below to solve your problem:
DELIMITER ;
DROP TABLE IF EXISTS answerset;
CREATE TABLE answerset
(
id INTEGER,
sun_calendar_date DATE,
data VARCHAR(100)
);
INSERT INTO answerset VALUES (1,'13980120','something'),
(2,'13980122','something'),
(3,'13980129','something'),
(4,'13980130','something'),
(5,'13980131','something(end of month)'),
(6,'13980201','something'),
(7,'13980202','something'),
(8,'13980203','something'),
(9,'13980203','something');
-- We need a variable as we need a place to start. You could also set this to whatever date you want
-- if you need to avoid using a variable.
-- A MySQL user variable needs no DECLARE outside a stored program; SELECT ... INTO sets it.
SELECT MIN(sun_calendar_date) INTO @minDate FROM answerset;
-- Here we use modulo ((%) returns the remainder of a division) and FLOOR which removes decimal places (you could also
-- convert to INT too). This gives us the number of days after the minimum date grouped into 5s. You could
-- also replace 5 with a variable if you need to change the size of your groups.
SELECT DATE_ADD(sun_calendar_date, INTERVAL -FLOOR((DATEDIFF(sun_calendar_date, @minDate))) % 5 DAY) AS PeriodStart,
MIN(sun_calendar_date) AS Period,
COUNT(DISTINCT sun_calendar_date) AS Val
FROM answerset
GROUP BY DATE_ADD(sun_calendar_date, INTERVAL -FLOOR((DATEDIFF(sun_calendar_date, @minDate))) % 5 DAY)
ORDER BY PeriodStart;
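The bucketing arithmetic in that query (days since the minimum date, modulo the period length, subtracted from the date to get the period start) can be sketched in a few lines of Python. The function name count_by_period is hypothetical, and the dates are the ones from the INSERT above, treated as ordinary calendar dates just as the DATE column does. Unlike the query, this sketch counts rows rather than distinct dates, so the two rows that share 1398-02-03 both count.

```python
from collections import Counter
from datetime import date, timedelta

DATES = [
    date(1398, 1, 20), date(1398, 1, 22), date(1398, 1, 29),
    date(1398, 1, 30), date(1398, 1, 31), date(1398, 2, 1),
    date(1398, 2, 2), date(1398, 2, 3), date(1398, 2, 3),
]


def count_by_period(dates, period_days):
    """Map each period's start date to the number of rows that fall in it.
    Periods are counted from the earliest date in `dates`."""
    start = min(dates)
    counts = Counter()
    for d in dates:
        offset = (d - start).days  # same role as DATEDIFF(d, min date)
        period_start = d - timedelta(days=offset % period_days)
        counts[period_start] += 1
    return dict(sorted(counts.items()))


for period_start, rows in count_by_period(DATES, 5).items():
    print(period_start.isoformat(), rows)
```

Periods with no rows do not appear in the result; producing them needs a generated list of period starts, which is what a calendar table provides.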
You need an auxiliary calendar table. I tried to get this table from the information_schema.columns table.
select
min(a.sun_calendar_date) qnt,
count(a.sun_calendar_date) sun_calendar_date
from (
select
@seq beg,
@seq := adddate(@seq, 5) fin
from (
select
max(sun_calendar_date) x,
@seq := adddate(min(sun_calendar_date),
-(day(min(sun_calendar_date)) % 5))
from answerset
) init
cross join information_schema.columns c1
cross join information_schema.columns c2
where @seq <= init.x
) calendar
join answerset a
on a.sun_calendar_date >= calendar.beg and
a.sun_calendar_date < calendar.fin
group by calendar.beg;
Output:
| qnt | sun_calendar_date |
+------------+-------------------+
| 1398-01-20 | 2 |
| 1398-01-29 | 1 |
| 1398-01-30 | 5 |
Test it online with SQL Fiddle.
MySQL 8.0 with recursive CTEs:
with recursive
init as (
select
adddate(min(sun_calendar_date),
-(day(min(sun_calendar_date)) % 5)) beg,
max(sun_calendar_date) x
from answerset
),
calendar(beg, fin, x) as (
select beg, adddate(beg, 5), x from init
union all
select fin, adddate(fin, 5), x from calendar where fin <= x
)
select
min(a.sun_calendar_date) qnt,
count(a.sun_calendar_date) sun_calendar_date
from answerset a
join calendar c
on a.sun_calendar_date >= c.beg and a.sun_calendar_date < c.fin
group by c.beg;
Test it online with db<>fiddle.

Enumerate records sequentially, grouped and by date, in MySQL

This seems like such a simple question and I'm terrified that I might be bashed with the duplicate question hammer, but here's what I have:
ID Date
1 1/11/01
1 3/3/03
1 2/22/02
2 1/11/01
2 2/22/02
All I need to do is enumerate the records, based on the date, and grouped by ID! As such:
ID Date Num
1 1/11/01 1
1 3/3/03 3
1 2/22/02 2
2 1/11/01 1
2 2/22/02 2
This is very similar to this question, but it's not working for me. This would be great but it's not MySQL.
I've tried to use group by but it doesn't work, as in
SELECT ta.*, count(*) as Num
FROM temp_a ta
GROUP BY `ID` ORDER BY `ID`;
which clearly doesn't work, since GROUP BY collapses each ID to a single row.
Any advice greatly appreciated.
Let's assume the table to be as follows:
CREATE TABLE q43381823(id INT, dt DATE);
INSERT INTO q43381823 VALUES
(1, '2001-01-11'),
(1, '2003-03-03'),
(1, '2002-02-22'),
(2, '2001-01-11'),
(2, '2002-02-22');
Then, one of the ways in which the query to get the desired output could be written is:
SELECT q.*,
CASE WHEN (
IF(@id != q.id, @rank := 0, @rank := @rank + 1)
) >=1 THEN @rank
ELSE @rank := 1
END as rank,
@id := q.id AS buffer_id
FROM q43381823 q
CROSS JOIN (
SELECT @rank:= 0,
@id := (SELECT q2.id FROM q43381823 AS q2 ORDER BY q2.id LIMIT 1)
) x
ORDER BY q.id, q.dt
Output:
id | dt | rank | buffer_id
-------------------------------------------------
1 | 2001-01-11 | 1 | 1
1 | 2002-02-22 | 2 | 1
1 | 2003-03-03 | 3 | 1
2 | 2001-01-11 | 1 | 2
2 | 2002-02-22 | 2 | 2
You can ignore the buffer_id column in the output; it's irrelevant to the result, but required for resetting the rank.
SQL Fiddle Demo
Explanation:
The @id variable keeps track of the id of every row, based on the sorted order of the output. In the initial iteration, we set it to the id of the first record that may be obtained in the final result. See the sub-query SELECT q2.id FROM q43381823 AS q2 ORDER BY q2.id LIMIT 1
@rank is set to 0 initially and is by default incremented for every subsequent row in the result set. However, when the id changes, we reset it back to 1. Please see the CASE - WHEN - ELSE construct in the query for this.
The final output is sorted first by id and then by dt. This ensures that @rank is set incrementally for every subsequent dt field within the same id, but gets reset to 1 whenever a new id group begins to show up in the result set.
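For comparison, the same per-id numbering can be computed without user variables by counting, for each row, how many rows of the same id have a date that is not later. The sketch below runs that correlated subquery through Python's built-in sqlite3 module, used here as a stand-in for MySQL (an assumption made so the example is self-contained). It assumes dates are unique within each id; rows that tie on date would all receive the higher number. On MySQL 8.0 and later, ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt) expresses the same thing directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q43381823 (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO q43381823 VALUES (?, ?)",
    [
        (1, "2001-01-11"), (1, "2003-03-03"), (1, "2002-02-22"),
        (2, "2001-01-11"), (2, "2002-02-22"),
    ],
)

QUERY = """
SELECT a.id, a.dt,
       -- rows of the same id dated on or before this row's date
       (SELECT COUNT(*)
        FROM q43381823 AS b
        WHERE b.id = a.id AND b.dt <= a.dt) AS num
FROM q43381823 AS a
ORDER BY a.id, a.dt
"""

numbered = conn.execute(QUERY).fetchall()
for row in numbered:
    print(row)
```

The dates are stored as ISO-formatted text here, which sorts in date order; in MySQL the DATE column from the CREATE TABLE above would be compared directly.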

Randomly select rows from a table based on weight and probability

I'm using MySQL. I have a table which looks like that:
id: primary key
name: varchar
weight: int (this can be either 1,2 or 3)
What I want to do is randomly select one row until I get a list of 400 selected rows from a table similar to that below that has 500 rows, but taking into account the weight.
For example, if I have 3 rows:
id, name, weight
1, "some content", 2
2, "other content", 1
3, "something", 3
When creating the list, rows that have a weight of 2 appear 30% of times in the list, rows that have a weight of 1 appear 20% of times in the list and rows with weight of 3 appear 50% of times in the list.
Duplicates are permitted but not back to back.
Is there a way to do that?
If you don't understand something please feel free to ask.
Thanks in advance.
I still haven't solved the repetition part, but this will give you a start.
SQL Fiddle Demo
The innermost select assigns a random number to each row.
The middle select uses variables to assign a row number to each row, partitioned by Weight.
The last select filters to match the ratio. In this case it generates a list of size 50.
The original data has an even distribution of ~30 rows for each category, so size 60 is the limit for achieving 50% with Weight = 3.
SELECT `ID`,`Name`,`Weight`, RowNumber
FROM (
SELECT *,
@row_num := IF(@prev_value = `Weight`,
@row_num + 1,
IF(@prev_value:=`Weight`,
1,
1)
) AS RowNumber
FROM (
SELECT `ID`,`Name`,`Weight`, rand() as rng
FROM `myTable`
ORDER BY `Weight`, rng
) X
CROSS JOIN (SELECT @row_num := 1, @prev_value := 0) y
) T
WHERE ( Weight = 3 and RowNumber <= 50 * 0.5 )
OR ( Weight = 2 and RowNumber <= 50 * 0.3 )
OR ( Weight = 1 and RowNumber <= 50 * 0.2 )
ORDER BY Weight, RowNumber
I suggest you make a temporary table where all records with 1's are repeated 2 times, all records with 2's are repeated 3 times, and all records with 3's are repeated 5 times. Then make random selections in the temporary table among all the records. This should statistically end up with a distribution very near your target, if the total is large enough (e.g. 400).
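The repeated-records idea is equivalent to drawing each row with a probability proportional to its repetition factor, which can be sketched in Python with random.choices. The function name weighted_list and the REPEATS mapping are hypothetical; the factors 2, 3 and 5 are the ones from the suggestion above. The sketch also redraws whenever a pick would repeat the previous row, which is one simple way to meet the "not back to back" requirement, at the cost of shifting the ratios slightly.

```python
import random

# Repetition factors from the suggestion: weight 1 -> 2, 2 -> 3, 3 -> 5.
REPEATS = {1: 2, 2: 3, 3: 5}


def weighted_list(rows, size, rng=random):
    """rows: (id, name, weight) tuples. Returns `size` rows drawn at random
    in proportion to REPEATS[weight], never the same row twice in a row."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to avoid back-to-back repeats")
    factors = [REPEATS[weight] for _, _, weight in rows]
    result = []
    while len(result) < size:
        candidate = rng.choices(rows, weights=factors, k=1)[0]
        if result and candidate == result[-1]:
            continue  # redraw instead of repeating the previous row
        result.append(candidate)
    return result


SAMPLE = [(1, "some content", 2), (2, "other content", 1), (3, "something", 3)]
picked = weighted_list(SAMPLE, 400)
print(len(picked), picked[:5])
```

With only three rows the redraws distort the ratios noticeably; with a table of 500 rows, as in the question, back-to-back repeats are rare and the effect should be small.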
In my other answer I showed how to assign an ID to each row within its weight. Here I will show how to create a list that handles the duplicates.
I use tables to show the whole process, so you can also run selects on the demo to validate each result. With some work this can be combined into a single query, but it won't be easy to read.
SQL FIDDLE DEMO
First we need to create a table to store which row will participate in your list
CREATE TABLE `incr` (
`weight` mediumint,
`row` mediumint
);
Using a stored procedure we fill the table.
DELIMITER //
CREATE PROCEDURE dowhile(IN Size INT)
BEGIN
DECLARE v1 INT DEFAULT Size * 0.5;
WHILE v1 >= 0 DO
IF v1 <= (Size - 1) * 0.5 THEN
INSERT incr VALUES (3, v1);
END IF;
IF v1 <= (Size - 1) * 0.3 THEN
INSERT incr VALUES (2, v1);
END IF;
IF v1 <= (Size - 1) * 0.2 THEN
INSERT incr VALUES (1, v1);
END IF;
SET v1 = v1 - 1;
END WHILE;
END//
DELIMITER ;
CALL dowhile(300); -- Indicate List Size
Now create a new table to know the size of each weight category in our sample.
CREATE TABLE maxWeight
SELECT `Weight`, COUNT(*) as mw
FROM `myTable`
GROUP BY `Weight`;
Using % operator we can repeat the rows to fill the required size
CREATE TABLE rowList
SELECT i.weight,
CASE WHEN i.row >= w.mw then i.row % w.mw
ELSE i.row
END newrow
FROM incr i
JOIN maxWeight w
ON i.weight = w.weight;
As you can see here, even though my source table has only about 100 rows, the row list has 300 entries:
SELECT weight, count(*)
FROM rowList
GROUP BY weight;
| weight | count(*) |
|--------|----------|
| 1 | 60 |
| 2 | 90 |
| 3 | 150 |
Now join both tables together
CREATE TABLE finalResult
SELECT `ID`,`Name`, T.`Weight`, RowNumber
FROM (
SELECT *,
@row_num := IF(@prev_value = `Weight`,
@row_num + 1,
IF(@prev_value:=`Weight`,
0,
0)
) AS RowNumber
FROM (
SELECT `ID`,`Name`,`Weight`, rand() as rng
FROM `myTable`
ORDER BY `Weight`, rng
) X
CROSS JOIN (SELECT @row_num := 0, @prev_value := 0) y
) T
JOIN rowList
ON T.`RowNumber` = rowList.`newrow`
AND T.`Weight` = rowList.`weight`;
The final result has the desired ratio, repeating names where needed:
SELECT `Weight`, COUNT(*) total, COUNT(DISTINCT `Name`) d_name
FROM finalResult
GROUP BY `Weight`;
| Weight | total | d_name |
|--------|-------|--------|
| 1 | 60 | 36 |
| 2 | 90 | 32 |
| 3 | 150 | 30 |
Even though the original table has 37 rows with weight = 1, the tool I used to generate random values duplicated one Name, so d_name = 36.

Return the k rows that appear the most

Let's say I have this table
photo_id user_id tag
0 0 Car
0 0 Bridge
0 0 Sky
20 1 Car
20 1 Bridge
2 2 Bridge
2 2 Cat
1 3 Cat
I need to return the k tags that appear the most, WITHOUT USING LIMIT.
The tie breaker for tags that appear the same number of times is lexicographic order (the smallest comes first).
I also need, for each tag, the number of times it appears.
for example, for the table above with k=2 the output should be:
Tag #
Bridge 3
Car 2
and for k=4:
Tag #
Bridge 3
Car 2
Cat 2
Sky 1
Try this :
SELECT t1.tag, COUNT(*) as mycount FROM table t1
GROUP BY t1.tag
ORDER BY mycount DESC
LIMIT 2;
Replace the limit amount with your k value.
Inserting data into table:
INSERT INTO new_table VALUES
(0,0,'Car'),
(0,0,'Bridge'),
(0,0,'Sky'),
(20,1,'Car'),
(20,1,'Bridge'),
(0,0,'bottle');
To query:
SELECT tag, COUNT(1) FROM new_table
GROUP BY tag HAVING COUNT(1) = (
SELECT MIN(c) FROM
(
SELECT COUNT(1) AS c FROM new_table GROUP BY tag
) AS temp
)
Output:
+--------+----------+
| tag | count(1) |
+--------+----------+
| bottle | 1 |
| Sky | 1 |
+--------+----------+
Note: this returns the tags with the smallest count.
Although this is homework and we are not supposed to answer such questions (not until you've shown that you attempted to solve it and didn't get the desired result), I got a little curious about not using LIMIT in this question, so I am posting here.
The idea is to rank the result and then select only rows whose rank are less than or equal to value k (as in your case). The rank column is like adding a S.No. (serial number) column to your result and selecting till desired number.
DDL statements:
CREATE TABLE new_table(
photo_id INTEGER,
user_id INTEGER,
tag VARCHAR(10)
);
INSERT INTO new_table VALUES
(0, 0, 'Car'),
(0, 0, 'Bridge'),
(0, 0, 'Sky'),
(20, 1, 'Car'),
(20, 1, 'Bridge'),
(2, 2, 'Bridge'),
(2, 2, 'Cat'),
(1, 3, 'Cat');
Query:
SELECT
tag, tag_count,
@k := @k + 1 AS k
FROM (
SELECT
tag,
COUNT(*) AS tag_count
FROM new_table
GROUP BY tag
-- ties on tag_count are broken alphabetically, as the question asks
ORDER BY tag_count DESC, tag ASC
) AS temp, (SELECT @k := 0) AS k
WHERE @k < 2;
Check this SQLFiddle.
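The ranking idea can also be expressed without user variables and without LIMIT: a tag is in the top k when fewer than k tags rank ahead of it, where "ahead" means a higher count, or the same count and an alphabetically smaller tag. The sketch below runs such a query through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained; the WITH clause would need MySQL 8.0 or later). The helper top_tags is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_table (photo_id INTEGER, user_id INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO new_table VALUES (?, ?, ?)",
    [
        (0, 0, "Car"), (0, 0, "Bridge"), (0, 0, "Sky"),
        (20, 1, "Car"), (20, 1, "Bridge"),
        (2, 2, "Bridge"), (2, 2, "Cat"), (1, 3, "Cat"),
    ],
)

QUERY = """
WITH counts AS (
    SELECT tag, COUNT(*) AS tag_count FROM new_table GROUP BY tag
)
SELECT c.tag, c.tag_count
FROM counts AS c
WHERE (SELECT COUNT(*)            -- number of tags ranked ahead of c
       FROM counts AS d
       WHERE d.tag_count > c.tag_count
          OR (d.tag_count = c.tag_count AND d.tag < c.tag)) < ?
ORDER BY c.tag_count DESC, c.tag
"""


def top_tags(k):
    """Return the k most frequent tags as (tag, count), ties broken by tag."""
    return conn.execute(QUERY, (k,)).fetchall()


print(top_tags(2))
print(top_tags(4))
```

On the question's sample data this gives Bridge and Car for k=2, and Bridge, Car, Cat, Sky for k=4, matching the expected output in the question.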