INSERT interpolated rows into existing table - mysql

I have a MySQL table similar to this simplified example:
orders table
--------------------------------
orderid stockid rem_qty reported
--------------------------------
1000000 100 500 00:01:00
1000000 100 200 01:10:00
1000000 100 200 03:20:00
1000000 100 100 04:30:00
1000000 100 50 11:30:00
:
1000010 100 100 00:01:00
1000010 100 100 01:10:00
1000010 100 20 03:20:00
:
1000020 200 1000 03:20:00
1000020 200 995 08:20:00
1000020 200 995 11:50:00
--------------------------------
The table comes from a 3rd party, weighs in at some 80-100M rows daily, and the format is fixed. It would be good, except it lacks rows showing when rem_qty reaches zero. The good news is that I can estimate them, or at least put good upper and lower bounds on them:
The 3rd party scans each distinct stockid at essentially random times throughout the day, and returns one row for each open orderid at that time. For example, stockid = 100 was scanned at (00:01, 01:10, 03:20, 04:30, 11:30). At each time, there will be a row for every current orderid with that stockid. Hence, one can see that orderid = 1000000 was still open at 11:30 (the last scan in our data), but sometime between 03:20 and 04:30, orderid = 1000010 sold out. (The times for stockid = 200 have no bearing on stockid = 100).
So, what I would like to do is INSERT the interpolated rows with rem_qty = 0 for each sold-out order. In this case, we can (only) say that orderid = 1000010 went to 0 at AVG('03:20:00','04:30:00'), so I would like to INSERT the following row:
orders table INSERT
--------------------------------
orderid stockid rem_qty reported
--------------------------------
1000010 100 0 03:55:00
--------------------------------
Trouble is, my SQL is rusty and I've not been able to figure out this complex query. Among other failed attempts, I've tried various JOINs, made a TEMPORARY TABLE stock_report(stockid,last_report), and I can do something like this:
SELECT orders.stockid,
orderid,
MAX(reported),
TIMEDIFF(last_report,MAX(reported)) as timediff
FROM orders
INNER JOIN stock_report
ON orders.stockid = stock_report.stockid
GROUP BY orderid, orders.stockid, last_report
HAVING timediff > 0
ORDER BY orderid
This would show every sold-out order, along with the HH:MM:SS difference between the last time the orderid was reported and the last time its stockid was reported. It's maybe a good start, but instead of last_report, I need to be able to calculate a next_report column (specific to this orderid), which would basically be:
SELECT MIN(reported) AS next_report
FROM orders
WHERE reported > @order_max_reported
ORDER BY reported
LIMIT 1
But that's just a vain attempt to illustrate part of what I'm after. Again, what I really need is a way to INSERT new rows into the orders table at the AVG() time the order's rem_qty went to 0, as in the "orders table INSERT" example above. Or maybe the 64,000 GFLOP question: would I be better off moving this logic to my main (application) language? I'm working with 100 million rows/day, so efficiency is a concern.
Apologies for the lengthy description. This really is the best I could do to edit for conciseness! Can anyone offer any helpful suggestions?

It's possible to do. Have a subquery that gets the max reported time for each orderid / stockid, and join that against the orders table where the stockid is the same and the latest time is less than the reported time. This gets you all the report times for that stockid that are later than the latest time for that stockid and orderid.
Use MIN to get the lowest reported time. Convert the 2 times to seconds, add them together and divide by 2, then convert back from seconds to a time.
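For the example order in the question (orderid = 1000010, last reported at 03:20:00, next stockid = 100 scan at 04:30:00), that arithmetic works out as follows, which matches the row the question wants to INSERT:

$$
\begin{aligned}
03{:}20{:}00 &= 3 \cdot 3600 + 20 \cdot 60 = 12000\ \text{s} \\
04{:}30{:}00 &= 4 \cdot 3600 + 30 \cdot 60 = 16200\ \text{s} \\
\tfrac{12000 + 16200}{2} &= 14100\ \text{s} = 3 \cdot 3600 + 55 \cdot 60 \;\Rightarrow\; 03{:}55{:}00
\end{aligned}
$$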
Something like this:
SELECT orderid, stockid, 0, SEC_TO_TIME((TIME_TO_SEC(next_poss_order_report) + TIME_TO_SEC(last_order_report)) / 2)
FROM
(
SELECT a.orderid, a.stockid, last_order_report, MIN(b.reported) next_poss_order_report
FROM
(
SELECT orderid, stockid, MAX(reported) last_order_report
FROM orders_table
GROUP BY orderid, stockid
) a
INNER JOIN orders_table b
ON a.stockid = b.stockid
AND a.last_order_report < b.reported
GROUP BY a.orderid, a.stockid, a.last_order_report
) sub0;
SQL Fiddle here:
http://www.sqlfiddle.com/#!2/cf129/17
This can be simplified a bit to:
SELECT a.orderid, a.stockid, 0, SEC_TO_TIME((TIME_TO_SEC(MIN(b.reported)) + TIME_TO_SEC(last_order_report)) / 2)
FROM
(
SELECT orderid, stockid, MAX(reported) last_order_report
FROM orders_table
GROUP BY orderid, stockid
) a
INNER JOIN orders_table b
ON a.stockid = b.stockid
AND a.last_order_report < b.reported
GROUP BY a.orderid, a.stockid, a.last_order_report;
These queries might take a while, but are probably more efficient than running many queries from scripted code.
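Since the question also asks whether this logic belongs in application code, here is a minimal Python sketch of the same idea for comparison: find each order's last report, find the next scan of its stockid, and take the midpoint. It assumes rows arrive as (orderid, stockid, rem_qty, reported) tuples for a single day, with reported as 'HH:MM:SS' strings; the function names are hypothetical. It keeps every (orderid, stockid) pair in memory, so at 80-100M rows/day it would likely need to run one stockid (or batch of stockids) at a time.

```python
from bisect import bisect_right
from collections import defaultdict


def to_sec(hms):
    """Convert an 'HH:MM:SS' string to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Convert seconds since midnight back to an 'HH:MM:SS' string."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def interpolate_sellouts(rows):
    """Return (orderid, stockid, 0, midpoint) rows for orders that disappeared.

    rows: iterable of (orderid, stockid, rem_qty, reported) tuples for one day.
    """
    scan_times = defaultdict(set)  # stockid -> every time that stock was scanned
    last_seen = {}                 # (orderid, stockid) -> latest report time
    for orderid, stockid, _rem_qty, reported in rows:
        t = to_sec(reported)
        scan_times[stockid].add(t)
        key = (orderid, stockid)
        last_seen[key] = max(t, last_seen.get(key, t))

    sorted_scans = {sid: sorted(times) for sid, times in scan_times.items()}
    inserts = []
    for (orderid, stockid), last in sorted(last_seen.items()):
        scans = sorted_scans[stockid]
        i = bisect_right(scans, last)  # index of the first scan strictly after `last`
        if i < len(scans):             # stock scanned again, but this order was absent
            midpoint = (last + scans[i]) // 2  # rounds down to a whole second
            inserts.append((orderid, stockid, 0, to_hms(midpoint)))
    return inserts


# Rows from the example table in the question (the ":" elisions omitted).
orders = [
    (1000000, 100, 500, "00:01:00"),
    (1000000, 100, 200, "01:10:00"),
    (1000000, 100, 200, "03:20:00"),
    (1000000, 100, 100, "04:30:00"),
    (1000000, 100, 50, "11:30:00"),
    (1000010, 100, 100, "00:01:00"),
    (1000010, 100, 100, "01:10:00"),
    (1000010, 100, 20, "03:20:00"),
    (1000020, 200, 1000, "03:20:00"),
    (1000020, 200, 995, "08:20:00"),
    (1000020, 200, 995, "11:50:00"),
]

print(interpolate_sellouts(orders))  # [(1000010, 100, 0, '03:55:00')]
```

Sorting each stockid's scan times once and using a binary search keeps the "next scan" lookup at O(log n) per order, instead of rescanning all later rows the way a naive correlated query might.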

Related

Total amount of sales done for each product using SQL

Here is the structure of the first table, Product:
PRODID PDESC PRICE CATEGORY DISCOUNT
101 BALL 10 SPORTS 5
102 SHIRT 20 APPAREL 10
Here is the structure of the second table, SaleDetail:
SALEID PRODID QUANTITY
1001 101 5
1001 101 2
1002 102 10
1002 102 5
I am trying to get the total sales amount for each product by joining the 2 tables. Here is the SQL I tried, but it's not giving the correct result.
select a.prodid,
(sum((price - discount))),
sum(quantity),
(sum((price - discount))) * sum(quantity)
from product a
join saledetail b on a.prodid = b.prodid
group by a.prodid
The 2nd column of the query is giving an incorrect final price. Please help me correct this SQL.
Please find an indicative answer to your question in the fiddle.
One problem stems from aggregating the price difference: sum(price - discount) adds the unit price once for every joined SaleDetail row, so a product with two sale rows gets its final price doubled.
Moreover, you multiply the sums of the prices and quantities, while you need to perform the calculation on every row. Look at the answer by @DanteTheSmith.
You might consider using the SaleDetail table on the left side of your query.
SELECT SD.PRODID,
P.Price-P.Discount AS Final_Price,
SUM(SD.QUANTITY) AS Amount_Sold,
SUM((P.Price-P.Discount)*SD.QUANTITY) AS Sales_Amount
FROM SaleDetail AS SD
JOIN Product AS P
ON SD.PRODID = P.PRODID
GROUP BY SD.PRODID, P.Price-P.Discount
It would help if you built the example in SQL Fiddle or gave the CREATE statements for the tables, but if I have to guess, your problem is:
(sum((price - discount))) * sum(quantity)
needs to be:
sum((price - discount) * quantity)
(price - discount) * quantity is the expression you want to apply PER ROW of the joined table; then you add all of those up with SUM() when grouping by prodid.
Furthermore, notice that (price - discount) only needs to be computed ONCE PER PRODUCT (price and discount are the same on every row of a given prodid), so a quicker version would be:
(price-discount) * sum(quantity)
That would give you the total money earned for that product across all the sales you made, and I am guessing this is what you want?
I just noticed you have a problem with the 2nd column; not sure if that has been in the question all along:
(sum((price - discount)))
Why are you summing? Do you want the money earned per unit of the product? Your price is the same all the time, and so is your discount, so you can simply go with:
(price-discount) as PPP
NOTE: This assumes the discount is an absolute amount (not a percentage) that applies to all your sales, and that the price never changes, which is not very realistic.
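To make the difference between the two forms concrete, here is a small sketch that loads the question's sample rows into an in-memory SQLite database (Python's built-in sqlite3 module stands in for MySQL here; the join and aggregation behave the same way for this query) and runs the original query next to the per-row version suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (PRODID INTEGER PRIMARY KEY, PDESC TEXT, PRICE INTEGER,
                      CATEGORY TEXT, DISCOUNT INTEGER);
CREATE TABLE SaleDetail (SALEID INTEGER, PRODID INTEGER, QUANTITY INTEGER);
INSERT INTO Product VALUES (101, 'BALL', 10, 'SPORTS', 5),
                           (102, 'SHIRT', 20, 'APPAREL', 10);
INSERT INTO SaleDetail VALUES (1001, 101, 5), (1001, 101, 2),
                              (1002, 102, 10), (1002, 102, 5);
""")

# Original attempt: SUM(price - discount) is added once per joined sale row.
wrong = conn.execute("""
    SELECT a.PRODID, SUM(PRICE - DISCOUNT), SUM(QUANTITY),
           SUM(PRICE - DISCOUNT) * SUM(QUANTITY)
    FROM Product a JOIN SaleDetail b ON a.PRODID = b.PRODID
    GROUP BY a.PRODID ORDER BY a.PRODID
""").fetchall()

# Corrected: multiply per row first, then sum; the unit price is not summed.
right = conn.execute("""
    SELECT a.PRODID, PRICE - DISCOUNT AS final_price, SUM(QUANTITY),
           SUM((PRICE - DISCOUNT) * QUANTITY) AS sales_amount
    FROM Product a JOIN SaleDetail b ON a.PRODID = b.PRODID
    GROUP BY a.PRODID ORDER BY a.PRODID
""").fetchall()

print(wrong)  # [(101, 10, 7, 70), (102, 20, 15, 300)]
print(right)  # [(101, 5, 7, 35), (102, 10, 15, 150)]
```

With two sale rows per product, the original query doubles both the "final price" and the total, which is exactly the symptom described in the question.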

Get Max "laps" but with minimum "time" from 3 results as join

I'm at a loss and hoping for some help. I've searched SOF, google and tried as many things as I can think but can't get anything even close to what I'm after (so far away there is no point in posting my attempts).
results table
result_id
wcics_live_id
class_id
main
round_num
results_drivers table
rd_id
result_id
user_id
race_time (ex. 5:06.231, this is minutes:seconds)
laps (ex. 25)
For each class_id a driver will have 3 entries in the results_drivers table, for example:
Luke Pittman 25 laps 5:06.231
Luke Pittman 24 laps 5:00.691
Luke Pittman 25 laps 5:05.914
Additionally, each class will have multiple drivers - could be as many as 40 or 50.
I need to be able to gather a list of all the drivers, in order of the fastest time (highest laps with lowest race_time), but only returning one result for each driver. For example:
Faster Guy 26 laps 5:11.134
Luke Pittman 25 laps 5:05.914
Joe Doe 25 laps 5:06.014
Other Guy 24 laps 5:00.141
... and so on
Normally I would do a group by with a max value (or something similar) on a column, but I have no idea how to make that happen with 2 separate columns.
I'd do a query with MAX(laps) ... GROUP BY user_id, and then reference that as an inline view in another query to get the minimum race_time.
Something like this:
SELECT ft.user_id
, ft.laps
, MIN(ft.race_time) AS race_time
FROM ( -- maximum laps
SELECT dr.user_id
, MAX(dr.laps) AS max_laps
FROM results_drivers dr
GROUP BY dr.user_id
) ml
JOIN results_drivers ft
ON ft.user_id = ml.user_id
AND ft.laps = ml.max_laps
GROUP
BY ft.user_id
, ft.laps
ORDER
BY laps DESC
, race_time ASC
(I'm assuming here that the laps and race_time columns are canonical, such that ORDER BY and MIN/MAX will work to get the highest number of laps and the fastest time. If these are stored as strings, then it won't necessarily work right; e.g., when comparing strings, '10:23.456' will be less than '8:15.555'.)
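Here is a runnable sketch of the same two-step query against an in-memory SQLite database (Python's sqlite3 module in place of MySQL). To sidestep the string-comparison problem mentioned above, it assumes race_time is stored as a number of seconds, converted from the 'M:SS.fff' format by a hypothetical helper. The rows are Luke Pittman's three runs plus the other drivers' results from the question's example output, and driver names stand in for user_id purely to keep the demo short.

```python
import sqlite3


def to_seconds(race_time):
    """Convert an 'M:SS.fff' race time to seconds so MIN() compares numerically."""
    minutes, seconds = race_time.split(":")
    return int(minutes) * 60 + float(seconds)


runs = [
    ("Luke Pittman", 25, "5:06.231"),
    ("Luke Pittman", 24, "5:00.691"),
    ("Luke Pittman", 25, "5:05.914"),
    ("Faster Guy", 26, "5:11.134"),
    ("Joe Doe", 25, "5:06.014"),
    ("Other Guy", 24, "5:00.141"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results_drivers (user_id TEXT, laps INTEGER, race_time REAL)")
conn.executemany(
    "INSERT INTO results_drivers VALUES (?, ?, ?)",
    [(user, laps, to_seconds(t)) for user, laps, t in runs],
)

standings = conn.execute("""
    SELECT ft.user_id, ft.laps, MIN(ft.race_time) AS race_time
    FROM (SELECT user_id, MAX(laps) AS max_laps      -- best lap count per driver
          FROM results_drivers
          GROUP BY user_id) ml
    JOIN results_drivers ft
      ON ft.user_id = ml.user_id
     AND ft.laps = ml.max_laps                       -- keep only runs with that many laps
    GROUP BY ft.user_id, ft.laps
    ORDER BY ft.laps DESC, MIN(ft.race_time) ASC     -- most laps first, then fastest
""").fetchall()

print([row[0] for row in standings])
# ['Faster Guy', 'Luke Pittman', 'Joe Doe', 'Other Guy']
```

Luke Pittman's 24-lap run is dropped by the join on max_laps, and of his two 25-lap runs the faster 5:05.914 is kept, which puts him just ahead of Joe Doe.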

Summary of a day or run a query and sum the results?

Since I don't know how to calculate efficiency, I'll ask here, and I hope someone can tell me which approach is better and explain it a bit.
The scenario:
Currently I have a table that stores rows of each worker's production.
Something like: (Worker1) produced (product10) with (some amount) for a Date.
And that goes for each station he worked at through the day.
The Question:
I need to generate a report of the total amount each worker produced for each date. I know how to generate the report either way, but the question is: which way is more efficient?
Having to run a query for each person that sums up the production for each date? or having a table that I'll insert the total amount, workerID and date?
Again, if you could explain it a bit further that would be nice; if not, then at least an educated answer would help me a lot with this problem.
Example:
This is what I have right now in my production table:
ID EmpID ProductID Amount Dateofproduction
----------------------------------------------------------
1 1 1 100 14/01/2013
2 1 2 20 14/01/2012
This is what I want in the end:
EmpID Amount DateofProduction
-----------------------------------
1 120 14/01/2013
Should I start another table for this? or should I just sum what I have in the production table and take what I need?
Bear in mind that the production table will get larger and larger each day (of course).
i) Direct:
select EmpId, sum(Amount) as Amount, DateOfProduction
from ProductionTable
group by EmpId, DateOfProduction
ii) Now, the size of the table will keep growing. And you need only day-wise reports.
Is this table being used by anyone else? Can some of the data be archived? If so, I would suggest that after each day's reporting you move all the data from this table to a secondary archive table. That way, every day you only have to query today's records.
Secondly, you can consider adding an index on DateOfProduction. You will then be able to restrict your queries to a date range. For example: select EmpId, sum(Amount) as Amount, DateOfProduction from ProductionTable where DateOfProduction = Date(now()) group by EmpId, DateOfProduction (or something similar).
Because it is just a single table with no complicated queries, MySQL will easily be able to handle millions of records. Try EXPLAIN on the queries to check the number of records being examined and the indexes being used.
Unless I am missing something, it sounds like you just want this:
select empid,
sum(amount) TotalAmount,
Dateofproduction
from yourtable
group by empid, Dateofproduction
See SQL Fiddle with Demo
Result:
| EMPID | TOTALAMOUNT | DATEOFPRODUCTION |
------------------------------------------
| 1 | 120 | 2013-01-14 |
Note: I am guessing that the second row of data you provided is supposed to be 2013 not 2012.
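A minimal sketch of the direct approach combined with the indexed date filter, using an in-memory SQLite database via Python's sqlite3 module in place of MySQL. It assumes dates are stored as ISO 'YYYY-MM-DD' strings, treats the second sample row as 2013 as guessed above, and uses a hypothetical index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductionTable (
    ID INTEGER PRIMARY KEY, EmpID INTEGER, ProductID INTEGER,
    Amount INTEGER, DateOfProduction TEXT);
-- hypothetical index name; lets the WHERE clause avoid a full table scan
CREATE INDEX idx_production_date ON ProductionTable (DateOfProduction);
INSERT INTO ProductionTable VALUES (1, 1, 1, 100, '2013-01-14'),
                                   (2, 1, 2, 20, '2013-01-14');
""")

# Sum each worker's production for one day; WHERE comes before GROUP BY.
daily = conn.execute("""
    SELECT EmpID, SUM(Amount) AS TotalAmount, DateOfProduction
    FROM ProductionTable
    WHERE DateOfProduction = ?
    GROUP BY EmpID, DateOfProduction
""", ("2013-01-14",)).fetchall()

print(daily)  # [(1, 120, '2013-01-14')]
```

Because the total is computed on demand from the detail rows, there is no separate summary table to keep in sync; the index on the date column is what keeps the daily query cheap as the table grows.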

Obtain running frequency distribution from previous N rows of MySQL database

I have a MySQL database where one column contains status codes. The column is of type int and the values will only ever be 100,200,300,400. It looks like below; other columns removed for clarity.
id | status
----------------
1 300
2 100
3 100
4 200
5 200
6 300
7 100
8 400
9 300
10 300
11 100
12 400
13 400
14 400
15 300
16 300
The id field is auto-generated and will always be sequential. I want to have a third column displaying a comma-separated string of the frequency distribution of the status codes of the previous 10 rows. It should look like this.
id | status | freq
-----------------------------------
1 300
2 100
3 100
4 200
5 200
6 300
7 100
8 400
9 300
10 300
11 100 300,100,200,400 -- from rows 1-10
12 400 100,300,200,400 -- from rows 2-11
13 400 100,300,200,400 -- from rows 3-12
14 400 300,400,100,200 -- from rows 4-13
15 300 400,300,100,200 -- from rows 5-14
16 300 300,400,100 -- from rows 6-15
I want the most frequent code listed first. And where two status codes have the same frequency it doesn't matter to me which is listed first but I did list the smaller code before the larger in the example. Lastly, where a code doesn't appear at all in the previous ten rows, it shouldn't be listed in the freq column either.
And to be very clear the row number that the frequency string appears on does NOT take into account the status code of that row; it's only the previous rows.
So what have I done? I'm pretty green with SQL. I'm a programmer and I find this SQL language a tad odd to get used to. I managed the following self-join select statement.
select *, avg(b.status) freq
from sample a
join sample b
on (b.id < a.id) and (b.id > a.id - 11)
where a.id > 10
group by a.id;
Using the aggregate function avg, I can at least demonstrate the concept. The derived table b provides the correct rows to the avg function but I just can't figure out the multi-step process of counting and grouping rows from b to get a frequency distribution and then collapse the frequency rows into a single string value.
Also I've tried using standard stored functions and procedures in place of the built-in aggregate functions, but it seems the b derived table is out of scope or something. I can't seem to access it. And from what I understand writing a custom aggregate function is not possible for me as it seems to require developing in C, something I'm not trained for.
Here's sql to load up the sample.
create table sample (
id int NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
status int
);
insert into sample(status) values(300),(100),(100),(200),(200),(300)
,(100),(400),(300),(300),(100),(400),(400),(400),(300),(300),(300)
,(100),(400),(100),(100),(200),(500),(300),(100),(400),(200),(100)
,(500),(300);
The sample has 30 rows of data to work with. I know it's a long question, but I just wanted to be as detailed as I could be. I've worked on this for a few days now and would really like to get it done.
Thanks for your help.
The only way I know of to do what you're asking is to use a BEFORE INSERT trigger. It has to be BEFORE INSERT because you want to set a value (a freq column) in the row being inserted, which can only be done in a BEFORE trigger. Unfortunately, that also means the row won't have been assigned an ID yet, so hopefully it's safe to assume that at the time a new record is inserted, the last 10 records in the table are the ones you're interested in. Your trigger will need to get the status values of the last 10 rows, count each status, and use the GROUP_CONCAT function to join them into a single string ordered by that count. I've been using SQL Server mostly and I don't have access to a MySQL server at the moment to test this, but hopefully my syntax will be close enough to at least get you moving in the right direction:
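DELIMITER //
create trigger sample_trigger BEFORE INSERT ON sample
FOR EACH ROW
BEGIN
DECLARE _freq varchar(50);
-- count each status among the 10 most recent rows, most frequent first
SELECT GROUP_CONCAT(tbl.status ORDER BY tbl.Occurrences DESC, tbl.status) INTO _freq
FROM (SELECT last10.status, COUNT(*) AS Occurrences
FROM (SELECT status FROM sample ORDER BY id DESC LIMIT 10) AS last10
GROUP BY last10.status) AS tbl;
-- assumes sample has a freq column, e.g. ALTER TABLE sample ADD freq varchar(50)
SET new.freq = _freq;
END//
DELIMITER ;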
SELECT id, GROUP_CONCAT(status ORDER BY freq desc) FROM
(SELECT a.id as id, b.status, COUNT(*) as freq
FROM
sample a
JOIN
sample b ON (b.id < a.id) AND (b.id > a.id - 11)
WHERE
a.id > 10
GROUP BY a.id, b.status) AS sub
GROUP BY id;
SQL Fiddle
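Since the asker mentions being a programmer, the same "previous N rows" logic may be easier to check in ordinary code first. This is a minimal Python sketch (the function name is hypothetical) that keeps a fixed-size window of the previous 10 statuses and orders codes by descending count, breaking ties with the smaller code as in the question's example. It can be used to verify whatever the SQL returns.

```python
from collections import Counter, deque


def rolling_freq(statuses, window=10):
    """For each row, list the codes seen in the previous `window` rows.

    Codes are ordered by descending frequency, ties broken by the smaller code.
    Rows with fewer than `window` predecessors get None.
    """
    recent = deque(maxlen=window)  # holds only the previous `window` statuses
    result = []
    for status in statuses:
        if len(recent) == window:
            counts = Counter(recent)
            ordered = sorted(counts, key=lambda code: (-counts[code], code))
            result.append(",".join(str(code) for code in ordered))
        else:
            result.append(None)
        recent.append(status)  # the current row only counts for later rows
    return result


# The 30 status values from the question's INSERT statement, in id order.
statuses = [300, 100, 100, 200, 200, 300, 100, 400, 300, 300,
            100, 400, 400, 400, 300, 300, 300, 100, 400, 100,
            100, 200, 500, 300, 100, 400, 200, 100, 500, 300]

freq = rolling_freq(statuses)
print(freq[10])  # 300,100,200,400  (id 11, from rows 1-10)
```

Appending the current status only after computing its frequency string is what enforces the question's rule that a row's own status is not counted.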

mysql first record retrieval

While this is very easy to do in Perl or PHP, I cannot figure out how to use MySQL alone to extract the first unique occurrence of a record.
For example, given the following table:
Name Date Time Sale
John 2010-09-12 10:22:22 500
Bill 2010-08-12 09:22:37 2000
John 2010-09-13 10:22:22 500
Sue 2010-09-01 09:07:21 1000
Bill 2010-07-25 11:23:23 2000
Sue 2010-06-24 13:23:45 1000
I would like to extract the first record for each individual in asc time order.
After sorting the table in ascending time order, I need to extract the first unique record by name.
So the output would be :
Name Date Time Sale
John 2010-09-12 10:22:22 500
Bill 2010-07-25 11:23:23 2000
Sue 2010-06-24 13:23:45 1000
Is this doable in an easy fashion with MySQL?
I think that something along the lines of
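select name, date, time, sale from mytable group by name order by date, time;
might get you what you're looking for, but MySQL does not guarantee which row's date, time and sale values are returned for each name (and the query is rejected when ONLY_FULL_GROUP_BY is enabled).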
You need to perform a groupwise max or groupwise min;
see below or http://pastie.org/973117 for an example:
select
u.user_id,
u.username,
latest.comment_id
from
users u
left outer join
(
select
max(comment_id) as comment_id,
user_id
from
user_comment
group by
user_id
) latest on u.user_id = latest.user_id;
In databases, there really is no "first" or "last" record; think of each record as its own, non-positional entity in the table. The only positions they have are when you give them one, say, using ORDER BY.
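This will often give you what you want, though it might not be efficient. Be aware that it relies on MySQL taking the first row of each group from the ordered subquery, which is not guaranteed; newer versions may ignore the ORDER BY inside a derived table.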
select Name, Date, Time, Sale from
(select Name, Date, Time, Sale from MyTable
order by Date asc, Time asc) MyTable_subquery_name
group by Name
Note: MyTable_subquery_name is just a dummy name for the subquery. Without it, MySQL will give the error ERROR 1248 (42000): Every derived table must have its own alias.
If only GROUP BY and ORDER BY were commutative operations, then this wouldn't have to be a subquery.
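For comparison, here is a sketch of the groupwise-min approach described earlier applied to the question's data: a derived table finds each name's earliest timestamp, and a join pulls back the full row, so no ordering inside a subquery is relied on. It runs on an in-memory SQLite database through Python's sqlite3 module rather than MySQL, and assumes Date and Time are stored as ISO-format strings so that concatenating them sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE MyTable (Name TEXT, "Date" TEXT, "Time" TEXT, Sale INTEGER)')
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", [
    ("John", "2010-09-12", "10:22:22", 500),
    ("Bill", "2010-08-12", "09:22:37", 2000),
    ("John", "2010-09-13", "10:22:22", 500),
    ("Sue", "2010-09-01", "09:07:21", 1000),
    ("Bill", "2010-07-25", "11:23:23", 2000),
    ("Sue", "2010-06-24", "13:23:45", 1000),
])

first_rows = conn.execute("""
    SELECT t.Name, t."Date", t."Time", t.Sale
    FROM MyTable t
    JOIN (SELECT Name, MIN("Date" || ' ' || "Time") AS first_ts  -- earliest per name
          FROM MyTable
          GROUP BY Name) f
      ON f.Name = t.Name
     AND t."Date" || ' ' || t."Time" = f.first_ts                -- the row at that time
    ORDER BY t."Date", t."Time"
""").fetchall()

for row in first_rows:
    print(row)
# ('Sue', '2010-06-24', '13:23:45', 1000)
# ('Bill', '2010-07-25', '11:23:23', 2000)
# ('John', '2010-09-12', '10:22:22', 500)
```

If two rows for the same name share the earliest timestamp, this join returns both; a further tie-breaker would be needed in that case.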