For example, I have this table (named purchase_invoice_items):
Id | NameOfItem | PurchaseQuantity | PurchaseDate | SoldQuantity
1  | x          | 2                | 2022-04-01   |
2  | y          | 11               | 2022-04-01   |
3  | z          | 8                | 2022-05-19   |
4  | x          | 23               | 2022-08-19   |
5  | x          | 15               | 2022-05-19   |
I know that the total sold quantity for NameOfItem x is 20, i.e. 20 units of item x were sold. I want to distribute the sold units across the PurchaseQuantity values using the first-in-first-out method (oldest PurchaseDate first). I want to end up with a table like this:
Id | NameOfItem | PurchaseQuantity | PurchaseDate | SoldQuantity
1  | x          | 2                | 2022-04-01   | 2
2  | y          | 11               | 2022-04-01   |
3  | z          | 8                | 2022-05-19   |
4  | x          | 23               | 2022-08-19   | 3
5  | x          | 15               | 2022-05-19   | 15
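To make the rule concrete, the first-in-first-out distribution described above can be sketched as a small standalone PHP function. This is only an illustration of the rule, not the MySQL-only solution I am asking for; allocate_fifo() is a hypothetical name, and the array keys simply mirror the column names of the table.

```php
<?php
// Hypothetical helper: distribute $sold units across purchases, oldest first.
function allocate_fifo(array $purchases, int $sold): array
{
    // Sort by purchase date, then by Id so that equal dates are ordered deterministically.
    usort($purchases, function (array $a, array $b): int {
        return [$a['PurchaseDate'], $a['Id']] <=> [$b['PurchaseDate'], $b['Id']];
    });

    $allocation = []; // Id => SoldQuantity
    foreach ($purchases as $purchase) {
        if ($sold <= 0) {
            break; // everything sold has been distributed
        }
        // Take as much as this purchase can cover.
        $take = min($purchase['PurchaseQuantity'], $sold);
        $allocation[$purchase['Id']] = $take;
        $sold -= $take;
    }
    return $allocation;
}

// The three purchases of item x from the table above.
$purchases_of_x = [
    ['Id' => 1, 'PurchaseQuantity' => 2,  'PurchaseDate' => '2022-04-01'],
    ['Id' => 4, 'PurchaseQuantity' => 23, 'PurchaseDate' => '2022-08-19'],
    ['Id' => 5, 'PurchaseQuantity' => 15, 'PurchaseDate' => '2022-05-19'],
];

// Id 1 gets 2, Id 5 gets 15, Id 4 gets the remaining 3.
print_r(allocate_fifo($purchases_of_x, 20));
```

The oldest purchase (Id 1) is used up first, then Id 5, and only the remainder goes to the newest purchase (Id 4), which matches the desired table.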
Using two MySQL queries and PHP, I can do it in the following way.
First I select the necessary data from MySQL:
$sql_select_purchase_data = 'SELECT `IdPii`, `PurchasedQuantity`
FROM `purchase_invoice_items` WHERE `NameOfItem` = "x"
ORDER BY `PurchaseDate` ASC;';
Then I create the SQL for the update:
$sql_update_sold_quantity = 'INSERT INTO `purchase_invoice_items` (`IdPii`, `SoldQuantity`) VALUES ';
PHP code that continues building the SQL:
if( isset($arr_select_purchase_data) ){
    $sum_of_sold_quantity = 20;
    $data_update_sold_quantity = [];//Values to bind to the placeholders
    foreach( $arr_select_purchase_data as $one_arr_select_purchase_data ){
        if( $sum_of_sold_quantity <= 0 ){ break; }//Everything sold is already distributed
        //Take as much as this purchase can cover
        $allocated_quantity = min( $one_arr_select_purchase_data['PurchasedQuantity'], $sum_of_sold_quantity );
        $sql_update_sold_quantity .= '(?,?), ';
        $data_update_sold_quantity[] = $one_arr_select_purchase_data['IdPii'];//For 'IdPii'
        $data_update_sold_quantity[] = $allocated_quantity;//For 'SoldQuantity'
        $sum_of_sold_quantity -= $allocated_quantity;
    }//foreach(
    $sql_update_sold_quantity = rtrim(trim($sql_update_sold_quantity), ','). ' ON DUPLICATE KEY UPDATE `SoldQuantity`= VALUES(`SoldQuantity`);';
}//if( isset(
But isn't this a waste of resources (if I need to select and update many rows)? It takes two MySQL queries plus additional PHP code.
Any ideas how I can get the same result using only MySQL (one query, without PHP)?
I have a large MySQL table with sorted data. When I need to find a starting point, I perform a binary search to find the lower bound ID (auto increment). The only problem is once some data is deleted, I need to look at the first existing row with a lower ID if the ID given by the algorithm doesn't exist. How should I modify this code to achieve that?
$l = 1;
$h = $max; //SELECT MAX(id)
while ($h - $l > 1){
$m = ($h + $l) / 2;
$q = mysqli_query($db, "SELECT col FROM tab WHERE id=". floor($m));
$result = array();
while($result[] = mysqli_fetch_row($q)){;}
if ($result[0][0] < $val) $l = $m;
else $h = $m;
}
echo round($m);
For example I want to find which rows have the value of col greater than 12345 and the table has max ID 10000. I start by looking at row 5000, where the col = 9000, then 7500 (col = 13000), then 6250 has been deleted, so I start looking for the 1st existing row with ID < 6250 and I find that 6245 has col = 10500. Now I'm looking between IDs 6873 and 7500 etc.
The right way to do this
So you have a table like this:
| ID | col |
---------------
| 1 | 15 |
| 3 | 155 |
| 18 | 9231|
| 190 |14343|
| 500 |16888|
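You can find the row with 14343 using the following query (the ORDER BY makes sure you get the first match rather than an arbitrary one):
SELECT ID, col FROM the_table WHERE col>12345 ORDER BY col LIMIT 1;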
To make it faster, add an index (the word "index" is worth googling):
ALTER TABLE `the_table` ADD INDEX `col` (`col`);
After that MySQL maintains a B-tree on col internally and does the tree search for you.
This will be much faster, as you avoid multiple network round trips plus other per-request expenses (query parsing, optimization, all the locks and mutexes, ...).
Answer to your question
I need to look at the first existing row with a lower ID
E.g. to get the first existing row with an ID lower than 300, do this (ORDER BY ID DESC picks the closest lower ID, and LIMIT makes the query return only one result):
SELECT ID, col FROM the_table WHERE ID < 300 ORDER BY ID DESC LIMIT 1;
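If you still want to keep the binary search from the question, the "nearest existing lower ID" lookup could be plugged in roughly like this. It is a minimal sketch, not a drop-in replacement: the table is simulated with a PHP array so that the example runs standalone, and nearest_row_at_or_below() and find_lower_bound_id() are hypothetical names. In real code the helper would run a query such as SELECT ID, col FROM the_table WHERE ID <= ? ORDER BY ID DESC LIMIT 1.

```php
<?php
// Simulated table (ID => col), sorted by col, with gaps left by deleted rows.
$table = [1 => 15, 3 => 155, 18 => 9231, 190 => 14343, 500 => 16888];

// Stand-in for the query that returns the greatest existing ID <= $id.
function nearest_row_at_or_below(array $table, int $id): ?array
{
    for ($i = $id; $i >= 1; $i--) {
        if (isset($table[$i])) {
            return [$i, $table[$i]]; // [ID, col]
        }
    }
    return null; // no row at or below $id
}

// Returns the lowest existing ID whose col is >= $val, or null if there is none.
function find_lower_bound_id(array $table, int $max_id, int $val): ?int
{
    $lo = 1;
    $hi = $max_id;
    $answer = null;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2); // integer midpoint, no floor()/round() needed
        $row = nearest_row_at_or_below($table, $mid);
        if ($row === null || $row[1] < $val) {
            // Every existing row up to $mid is too small: search the upper half.
            $lo = $mid + 1;
        } else {
            // Candidate found: remember it and look for an earlier one.
            $answer = $row[0];
            $hi = $row[0] - 1;
        }
    }
    return $answer;
}

var_dump(find_lower_bound_id($table, 500, 12345)); // int(190)
```

Because the probe always lands on an existing row, deleted IDs never have to be special-cased in the loop itself.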
I am building a cohort analysis processor. Input parameters: time range and step, a condition (initial event) to extract cohorts, and an additional condition (retention event) to check after each N hours/days/months. Output: a cohort analysis grid, like this:
0h | 16h | 32h | 48h | 64h | 80h | 96h |
cohort #00 15 | 6 | 4 | 1 | 1 | 2 | 2 |
cohort #01 1 | 35 | 8 | 0 | 2 | 0 | 1 |
cohort #02 0 | 3 | 31 | 11 | 5 | 3 | 0 |
cohort #03 0 | 0 | 4 | 27 | 7 | 6 | 2 |
cohort #04 0 | 1 | 1 | 4 | 29 | 4 | 3 |
Basically:
Fetch cohorts: the unique users who did "something 1" in each period of length time_step, starting from time_begin.
Find how many of them (in each cohort) did "something 2" after N seconds, N*2 seconds, N*3 seconds, and so on until now.
In short, I have 2 solutions. The first is too slow: it runs a heavy select with joins for each data step (1 day, 2 days, 3 days, etc.). The second is my attempt to optimize it by joining the results for every data step to the cohorts at once. It looks like it works, but I'm not sure it's the best way, or that it will give the same result even if cohorts intersect. Please check it out.
Here's the whole story.
I have a table of > 100,000 events, something like this:
#user-id, timestamp, event_name
events_view (uid varchar(64), tm int(11), e varchar(64))
example input row:
"user_sampleid1", 1423836540, "level_end:001:win"
To make a cohort analysis, I first extract cohorts: for example, users who sent the special event '1st_launch' in 10-hour periods starting from 2015-02-13 and ending with 2015-02-16. All code in this post is simplified and shortened to show the idea.
DROP TABLE IF EXISTS tmp_c;
create temporary table tmp_c (uid varchar(64), tm int(11), c int(11) );
set beg = UNIX_TIMESTAMP('2015-02-13 00:00:00');
set en = UNIX_TIMESTAMP('2015-02-16 00:00:00');
select min(tm) into t_start from events_view ;
select max(tm) into t_end from events_view ;
if beg < t_start then
set beg = t_start;
end if;
if en > t_end then
set en = t_end;
end if;
set period = 3600 * 10;
set cnt_c = ceil((en - beg) / period) ;
/*works quick enough*/
WHILE i < cnt_c DO
insert into tmp_c (
select uid, min(tm), i from events_view where
locate("1st_launch", e) > 0 and tm > (beg + period * i)
AND tm <= (beg + period * (i+1)) group by uid );
SET i = i+1;
END WHILE;
Cohorts may contain the same user IDs, though usually a user exists in only one cohort. Within each cohort, users are unique.
Now I have temp table like this:
user_id | 1st timestamp | cohort_no
uid1 1423836540 0
uid2 1423839540 0
uid3 1423841160 1
uid4 1423841460 2
...
uidN 1423843080 M
Then I need to divide the time range into periods again and calculate, for each period, how many users from each cohort have sent the event "level_end:001:win".
For each small period I select all unique users who have sent the "level_end:001:win" event and left join them to the tmp_c cohorts table. So I have something like this:
user_id | 1st timestamp | cohort_no | user_id | other fields...
uid1 1423836540 0 uid1
uid2 1423839540 0 null
uid3 1423841160 1 null
uid4 1423841460 2 uid4
...
uidN 1423843080 M null
This way I see how many users from my cohorts are among those who have sent "level_end:001:win"; users that were not found are excluded by the where clause: where t2.uid is not null.
Finally I group the rows and get the counts of users in each cohort who have sent "level_end:001:win" in this particular period.
Here's the code:
DROP TABLE IF EXISTS tmp_res;
create temporary table tmp_res (uid varchar(64) CHARACTER SET cp1251 NOT NULL, c int(11), cnt int(11) );
set i = 0;
set cnt_c = ceil((t_end - beg) / period) ;
WHILE i < cnt_c DO
insert into tmp_res
select concat(beg + period * i, "_", beg + period * (i+1)), c, count(distinct(uid)) from
(select t1.uid, t1.c from tmp_c t1 left join
(select uid, min(tm) from events_view where
locate("level_end:001:win", e) > 0 and
tm > (beg + period * i) AND tm <= (beg + period * (i+1)) group by uid ) t2
on t1.uid = t2.uid where t2.uid is not null) t3
group by c;
SET i = i+1;
END WHILE;
/*getting result of the first method: tooo slooooow!*/
select * from tmp_res;
The result I've got (it's OK that some cohorts do not appear in some periods):
"1423832400_1423890000","1","35"
"1423832400_1423890000","2","3"
"1423832400_1423890000","3","1"
"1423832400_1423890000","4","1"
"1423890000_1423947600","1","21"
"1423890000_1423947600","2","50"
"1423890000_1423947600","3","2"
"1423947600_1424005200","1","9"
"1423947600_1424005200","2","24"
"1423947600_1424005200","3","70"
"1423947600_1424005200","4","6"
"1424005200_1424062800","1","7"
"1424005200_1424062800","2","15"
"1424005200_1424062800","3","21"
"1424005200_1424062800","4","32"
"1424062800_1424120400","1","7"
"1424062800_1424120400","2","13"
"1424062800_1424120400","3","24"
"1424062800_1424120400","4","18"
"1424120400_1424178000","1","10"
"1424120400_1424178000","2","12"
"1424120400_1424178000","3","18"
"1424120400_1424178000","4","14"
"1424178000_1424235600","1","6"
"1424178000_1424235600","2","7"
"1424178000_1424235600","3","9"
"1424178000_1424235600","4","12"
"1424235600_1424293200","1","6"
"1424235600_1424293200","2","8"
"1424235600_1424293200","3","9"
"1424235600_1424293200","4","5"
"1424293200_1424350800","1","5"
"1424293200_1424350800","2","3"
"1424293200_1424350800","3","11"
"1424293200_1424350800","4","10"
"1424350800_1424408400","1","8"
"1424350800_1424408400","2","5"
"1424350800_1424408400","3","7"
"1424350800_1424408400","4","7"
"1424408400_1424466000","2","6"
"1424408400_1424466000","3","7"
"1424408400_1424466000","4","3"
"1424466000_1424523600","1","3"
"1424466000_1424523600","2","4"
"1424466000_1424523600","3","8"
"1424466000_1424523600","4","2"
"1424523600_1424581200","2","3"
"1424523600_1424581200","3","3"
It works but it takes too much time to process because there are many queries here instead of one, so I need to rewrite it.
I think it can be rewritten with joins, but I'm still not sure how.
I decided to make a temporary table and write period boundaries in it:
DROP TABLE IF EXISTS tmp_times;
create temporary table tmp_times (tm_start int(11), tm_end int(11));
set cnt_c = ceil((t_end - beg) / period) ;
set i = 0;
WHILE i < cnt_c DO
insert into tmp_times values( beg + period * i, beg + period * (i+1));
SET i = i+1;
END WHILE;
Then I build a periods-to-events mapping (user_id + timestamp represent a particular event), left join it to the cohorts table, and group the result:
SELECT Concat(tm_start, "_", tm_end) per,
t1.c coh,
Count(DISTINCT( t2.uid ))
FROM tmp_c t1
LEFT JOIN (SELECT *
FROM tmp_times t3
LEFT JOIN (SELECT uid,
tm
FROM events_view
WHERE Locate("level_end:001:win", e) > 0)
t4
ON ( t4.tm > t3.tm_start
AND t4.tm <= t3.tm_end )
WHERE t4.uid IS NOT NULL
ORDER BY t3.tm_start) t2
ON t1.uid = t2.uid
WHERE t2.uid IS NOT NULL
GROUP BY per,
coh
ORDER BY per,
coh;
In my tests this returns the same result as method #1. I can't check the result manually, but I understand better how method #1 works, and as far as I can see it gives what I want. Method #2 is faster, but I'm not sure it's the best way, or that it will give the same result even if cohorts intersect.
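One way I could cross-check both methods on a small sample is a straightforward reference implementation outside the database. The sketch below is only an assumption of how that could look: cohort_grid() and period_index() are hypothetical names, events are plain [uid, tm, e] arrays, the upper bound of the time range is ignored for brevity, and strpos() stands in for LOCATE(...) > 0. A user who belongs to several cohorts is counted once in each of them, which is the behaviour I expect from the joins.

```php
<?php
// Index of the period (start, end] that contains $tm, matching
// "tm > beg + period * i AND tm <= beg + period * (i + 1)". Requires $tm > $beg.
function period_index(int $tm, int $beg, int $period): int
{
    return intdiv($tm - $beg - 1, $period);
}

// Returns [period index => [cohort number => count of distinct users]].
function cohort_grid(array $events, int $beg, int $period, string $initial, string $retention): array
{
    // Cohort number => set of users whose initial event falls into that period.
    $cohorts = [];
    foreach ($events as [$uid, $tm, $name]) {
        if ($tm > $beg && strpos($name, $initial) !== false) {
            $cohorts[period_index($tm, $beg, $period)][$uid] = true;
        }
    }

    // Period => cohort => set of cohort members who sent the retention event.
    $seen = [];
    foreach ($events as [$uid, $tm, $name]) {
        if ($tm <= $beg || strpos($name, $retention) === false) {
            continue;
        }
        $p = period_index($tm, $beg, $period);
        foreach ($cohorts as $c => $members) {
            if (isset($members[$uid])) {
                $seen[$p][$c][$uid] = true; // a set, so repeated events count once
            }
        }
    }

    // Turn the user sets into counts.
    $grid = [];
    foreach ($seen as $p => $by_cohort) {
        foreach ($by_cohort as $c => $users) {
            $grid[$p][$c] = count($users);
        }
    }
    ksort($grid);
    return $grid;
}

// Tiny made-up sample: periods of 100 seconds starting at 0.
$events = [
    ['u1', 10,  '1st_launch'],
    ['u2', 20,  '1st_launch'],
    ['u3', 150, '1st_launch'],
    ['u2', 30,  'level_end:001:win'],
    ['u1', 130, 'level_end:001:win'],
    ['u1', 140, 'level_end:001:win'],
    ['u3', 250, 'level_end:001:win'],
];
print_r(cohort_grid($events, 0, 100, '1st_launch', 'level_end:001:win'));
```

Comparing this grid with the output of both SQL methods on the same sample would at least show whether they disagree when cohorts intersect.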
Maybe there are well-known common methods to perform a cohort analysis in SQL? Is method #1 more reliable than method #2? I don't work with joins that often, which is why I still don't fully understand join magic.
Method #2 looks like pure magic, and I tend not to trust what I don't understand :)
Thanks for answers!
I would like to create a statement that is equivalent to (x - y == 0) ? return 0 : return 100 in MySQL. Something that might look like this:
SELECT id, [(integer_val - 10 == 0) ? 0 : 100] AS new_val FROM my_table
I want to compare an attribute in each row to a certain number, and if the difference between that number and the number in the row is 0, I want it to give me 0, otherwise, I want it to give me 100.
Example:
Applying this query on my_table (with 10 being the 'compared to' number):
id | integer_val
===================
1 10
2 10
3 3
4 9
Would return this:
id | new_val
===================
1 100
2 100
3 0
4 0
How can I do this?
Try this:
SELECT id, IF(integer_val = 10, 100, 0) AS new_val
FROM my_table;
OR
SELECT id, (CASE WHEN integer_val = 10 THEN 100 ELSE 0 END) AS new_val
FROM my_table;
Use a CASE WHEN expression:
select *, (case when integer_val = 10 then 100 else 0 end) as New_Val
from yourtable
Try using the IF function:
SELECT id, IF(integer_val - 10 = 0, 0, 100) AS new_val FROM my_table
(I stuck with your condition expression, but it can be simplified a bit since integer_val - 10 = 0 has exactly the same truth value as integer_val = 10.)
Note that the IF function is different from MySQL's IF statement used for stored programs.
I'm working with a timecard database and trying to determine how much time for each punch falls into each one of three distinct shift periods.
For example
shift 1 = 7AM - 3pm
shift 2 = 3pm - 11pm
shift 3 = 11pm - 7am
Joe clocks in at 6:45AM and out at 1:45PM
15 minutes of this would need to be calculated as time on shift 3, but I'm not sure how to go about slicing out that bit of time in MySQL. All I have are a time in and time out field.
There are three shift periods:
Shift TimeStart TimeEnd
1 07:00 15:00
2 15:00 23:00
3 23:00 07:00
Sample Data
ID TimeIn TimeOut Hours
100 2014-07-31 06:45 2014-07-31 13:45 7
Desired Result
ID Shift TimeWorked
100 1 06:45
100 2 00:00
100 3 00:15
I was able to come up with a solution for this using PHP.
What I did was loop through each punch, minute by minute, and determine which shift each one-minute time span applies to. Within the loop, I increment one of 4 variables for shifts 1, 2, 3 or 0 (no shift pay), and at the end, write those variables to the database for the record being analyzed.
$query = "SELECT * FROM source_filtered_timecard";
$result_set = mysqli_query($connection, $query);
while($record = mysqli_fetch_assoc($result_set)) {
$checkCount++;
$shift1_hours = 0; $shift2_hours = 0;
$shift3_hours = 0; $shift0_hours = 0;
$time = strtotime($record['in_time']);
$time_out = strtotime('-1 Minute',strtotime($record['out_time']));
while($time <= $time_out) {
$mysql_time = date('G:i:s',$time);
//SELECT SHIFT CODE THAT APPLIES TO CURRENT PIT//
$query = "SELECT shift FROM shift_rules WHERE STR_TO_DATE('{$mysql_time}','%H:%i:%S') BETWEEN start_time_24 AND end_time_24 LIMIT 1";
$current_shift_set = mysqli_query($connection, $query);
if(mysqli_num_rows($current_shift_set) == 1) {
$current_shift = mysqli_fetch_assoc($current_shift_set);
if($current_shift['shift'] == '1'){$shift1_hours++;}
elseif($current_shift['shift'] == '2'){$shift2_hours++;}
elseif($current_shift['shift'] == '3'){$shift3_hours++;}
else{$shift0_hours++;}
} else {
$shift0_hours++;
}
//INCREMENT TIME BY 1 MINUTE//
$time = strtotime("+1 minute",$time);
}
$shift1_hours = $shift1_hours/60;
$shift2_hours = $shift2_hours/60;
$shift3_hours = $shift3_hours/60;
$shift0_hours = $shift0_hours/60;
//UPDATE TIMECARD ROWS WITH SHIFT HOURS//
$query = "UPDATE source_filtered_timecard
SET shift1_time = {$shift1_hours},
shift2_time = {$shift2_hours},
shift3_time = {$shift3_hours},
shift0_time = {$shift0_hours}
WHERE id = '{$record['id']}'";
$update = mysqli_query($connection, $query);
}
I'd do this in PHP. Looking at Joe's example, it is initially tempting to try to work out how his data maps onto the shift rules. However, I think it would be a neater solution to do it the other way around i.e. map the rules onto his data, until there is no data to classify.
The algorithm might go a bit like this:
Joe's remaining time is 6:45 - 13:45
Let's map the first rule onto it (i.e. how much of this rule contributes to that range?):
shift 1 = 7:00 - 15:00 (6:45 hours)
Now Joe's remaining time is:
6:45 - 7:00
Do the next rule:
shift 2 = 15:00 - 23:00 (0 hours)
Joe's remaining time is therefore unchanged. And finally the last rule:
shift 3 = 23:00 - 1d7:00 (0:15 hours)
There are a few things to note:
The amount of worked time could be stored in an array (a "worked time set"). It starts off as a simple start and end, but if a rule removes a chunk of time from the middle, it may split into two starts and two ends
When applying a rule, convert them to actual timestamps (i.e. a date and a time) so the wrapping to the next day works correctly
Write a function that takes a worked time set, plus a rule start and end timestamp, modifies a worked time set, and returns a number of hours for the rule
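The idea of mapping each rule onto the worked time could be sketched like this. Note that it is a simplification of what I described: instead of keeping a "worked time set" and removing chunks from it, it directly intersects the punch with every occurrence of every shift, which gives the same totals as long as the shifts do not overlap each other. minutes_per_shift() and the layout of $shift_rules are assumptions made for the example, and daylight-saving changes are ignored by doing all the arithmetic in UTC.

```php
<?php
// Hypothetical rule set: shift => [start, end]; an end <= start wraps past midnight.
$shift_rules = [
    1 => ['07:00', '15:00'],
    2 => ['15:00', '23:00'],
    3 => ['23:00', '07:00'],
];

// Returns shift => minutes worked within that shift.
function minutes_per_shift(string $time_in, string $time_out, array $rules): array
{
    $utc = new DateTimeZone('UTC'); // assumption: no daylight-saving adjustments
    $in  = (new DateTimeImmutable($time_in, $utc))->getTimestamp();
    $out = (new DateTimeImmutable($time_out, $utc))->getTimestamp();
    $minutes = array_fill_keys(array_keys($rules), 0);

    // Start one day early so that a shift which began the previous evening is covered.
    $day = (new DateTimeImmutable($time_in, $utc))->setTime(0, 0)->modify('-1 day');
    while ($day->getTimestamp() < $out) {
        $date = $day->format('Y-m-d');
        foreach ($rules as $shift => [$start, $end]) {
            // Convert the rule to actual timestamps for this day.
            $shift_start = (new DateTimeImmutable("$date $start", $utc))->getTimestamp();
            $shift_end   = (new DateTimeImmutable("$date $end", $utc))->getTimestamp();
            if ($shift_end <= $shift_start) {
                $shift_end += 86400; // the shift ends on the next day
            }
            // Length of the intersection of [in, out) and [shift_start, shift_end).
            $overlap = min($out, $shift_end) - max($in, $shift_start);
            if ($overlap > 0) {
                $minutes[$shift] += intdiv($overlap, 60);
            }
        }
        $day = $day->modify('+1 day');
    }
    return $minutes;
}

// Joe's punch: 405 minutes (6:45) on shift 1, none on shift 2, 15 minutes on shift 3.
print_r(minutes_per_shift('2014-07-31 06:45', '2014-07-31 13:45', $shift_rules));
```

Converting each rule to real timestamps per day is what makes the 23:00 to 07:00 shift behave correctly across midnight.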
First you need to add a day column to differentiate times up to 23:59:59 from times on the next day.
id| shift_name | time_start | time_end | day
1 | Day Shift | '07:00:00' | '18:59:59' | 1
2 | Night Shift | '19:00:00' | '06:59:59' | 2
Procedure :
DELIMITER $$
CREATE PROCEDURE sp_check_shift(IN intime time)
PROC: begin
    -- 00:00:00 (midnight) is a valid time and belongs to the night shift
    IF(intime>='00:00:00' AND intime<='23:59:59') THEN
        -- Shift that starts and ends on the same day
        IF ( SELECT 1 FROM tbl_shift WHERE time_start<=intime AND time_end>=intime AND day=1) THEN
            SELECT shift_name FROM tbl_shift WHERE time_start<=intime AND time_end>=intime AND day=1;
        -- Overnight shift, part before midnight
        ELSEIF ( SELECT 1 FROM tbl_shift WHERE time_start<=intime AND day=2) THEN
            SELECT shift_name FROM tbl_shift WHERE time_start<=intime AND day=2;
        -- Overnight shift, part after midnight
        ELSEIF ( SELECT 1 FROM tbl_shift WHERE time_end>=intime AND day=2) THEN
            SELECT shift_name FROM tbl_shift WHERE time_end>=intime AND day=2;
        END IF;
    ELSE
        SELECT 'Invalid Time' shift_name;
    END IF;
END$$
DELIMITER ;