Interpolate missing values in a MySQL table

I have some intraday stock data saved into a MySQL table which looks like this:
+----------+-------+
| tick | quote |
+----------+-------+
| 08:00:10 | 5778 |
| 08:00:11 | 5776 |
| 08:00:12 | 5778 |
| 08:00:13 | 5778 |
| 08:00:14 | NULL |
| 08:00:15 | NULL |
| 08:00:16 | 5779 |
| 08:00:17 | 5778 |
| 08:00:18 | 5780 |
| 08:00:19 | NULL |
| 08:00:20 | 5781 |
| 08:00:21 | 5779 |
| 08:00:22 | 5779 |
| 08:00:23 | 5779 |
| 08:00:24 | 5778 |
| 08:00:25 | 5779 |
| 08:00:26 | 5777 |
| 08:00:27 | NULL |
| 08:00:28 | NULL |
| 08:00:29 | 5776 |
+----------+-------+
As you can see, there are some points where no data is available (quote is NULL). What I would like to do is a simple step interpolation: each NULL value should be updated with the last available value. The only way I have managed to do this is with cursors, which is pretty slow given the large amount of data. I'm basically looking for something like this:
UPDATE table AS t1
SET quote = (SELECT quote FROM table AS t2
WHERE t2.tick < t1.tick AND
t2.quote IS NOT NULL
ORDER BY t2.tick DESC
LIMIT 1)
WHERE quote IS NULL
Of course this query will not work (MySQL does not allow the subquery to select from the table being updated), but this is roughly what it should look like.
I would appreciate any ideas on how this can be solved without cursors and temp tables.

This should work:
SET @prev = NULL;
UPDATE ticks
SET quote = @prev := COALESCE(quote, @prev)
ORDER BY tick;
BTW, the same trick works for reading:
SELECT t.tick, @prev := COALESCE(t.quote, @prev) AS quote
FROM ticks t
JOIN (SELECT @prev := NULL) AS x -- initializes @prev
ORDER BY tick;
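The user-variable trick depends on MySQL assigning the variable row by row in ORDER BY order, but MySQL documents the evaluation order of expressions involving user variables as undefined, and assigning user variables outside SET is deprecated in MySQL 8.0. Where window functions are available (MySQL 8.0+), a different technique, not the answerer's, gives the same step fill: COUNT(quote) skips NULLs, so every NULL row shares a running count with the last non-NULL row before it, and MAX(quote) over that group returns that value. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite 3.25+ supports these window functions); the ticks table name follows the answer above.

```python
import sqlite3

# Rows from the question: (tick, quote), with None standing in for NULL.
ROWS = [
    ("08:00:10", 5778), ("08:00:11", 5776), ("08:00:12", 5778),
    ("08:00:13", 5778), ("08:00:14", None), ("08:00:15", None),
    ("08:00:16", 5779), ("08:00:17", 5778), ("08:00:18", 5780),
    ("08:00:19", None), ("08:00:20", 5781), ("08:00:21", 5779),
    ("08:00:22", 5779), ("08:00:23", 5779), ("08:00:24", 5778),
    ("08:00:25", 5779), ("08:00:26", 5777), ("08:00:27", None),
    ("08:00:28", None), ("08:00:29", 5776),
]

# COUNT(quote) ignores NULLs, so a NULL row gets the same running count (grp)
# as the last non-NULL row before it; MAX(quote) over that group is that value.
FILL_SQL = """
SELECT tick, MAX(quote) OVER (PARTITION BY grp) AS quote
FROM (
    SELECT tick, quote, COUNT(quote) OVER (ORDER BY tick) AS grp
    FROM ticks
) AS g
ORDER BY tick
"""

def step_interpolate(rows):
    """Return (tick, quote) pairs with each NULL replaced by the last known quote."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ticks (tick TEXT PRIMARY KEY, quote INTEGER)")
    conn.executemany("INSERT INTO ticks VALUES (?, ?)", rows)
    result = conn.execute(FILL_SQL).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for tick, quote in step_interpolate(ROWS):
        print(tick, quote)
```

Rows before the first known quote stay NULL, which matches what a step interpolation can do. In MySQL 8.0 the same SELECT should be usable as a derived table joined back to ticks in a multi-table UPDATE; that write step is not exercised by this sketch.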

The main problem here is the reference to the outer query inside the subquery (t2.tick < t1.tick). Because of this, you can't simply wrap the subquery in another subquery.
If this is a one-time query and there isn't much data, you can do something like this:
UPDATE `table` AS t1
SET quote = (SELECT quote
             FROM (SELECT quote, tick FROM `table` AS t2 WHERE t2.quote IS NOT NULL) AS t3
             WHERE t3.tick < t1.tick
             ORDER BY t3.tick DESC
             LIMIT 1)
WHERE quote IS NULL;
But really, really don't use that, as it will probably be too slow: for each NULL quote, this query selects all the data from `table` and then picks the desired row from the result.
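For comparison, SQLite does not have MySQL's restriction on reading the target table in a subquery, so the asker's original correlated UPDATE runs there once `table` is replaced by a real name. The sketch below uses Python's sqlite3 as a stand-in (the table name ticks is an assumption) to make explicit what the correlated lookup computes for each NULL row: the quote of the latest earlier tick that has one. It runs on the 08:00:13 to 08:00:20 rows of the question.

```python
import sqlite3

# Rows 08:00:13 .. 08:00:20 from the question; None stands in for NULL.
ROWS = [
    ("08:00:13", 5778), ("08:00:14", None), ("08:00:15", None),
    ("08:00:16", 5779), ("08:00:17", 5778), ("08:00:18", 5780),
    ("08:00:19", None), ("08:00:20", 5781),
]

# Correlated UPDATE: each NULL quote takes the quote of the latest earlier
# tick that has one. SQLite accepts this; MySQL rejects it (error 1093)
# because the subquery reads the table being updated.
FILL_SQL = """
UPDATE ticks
SET quote = (SELECT t2.quote
             FROM ticks AS t2
             WHERE t2.tick < ticks.tick AND t2.quote IS NOT NULL
             ORDER BY t2.tick DESC
             LIMIT 1)
WHERE quote IS NULL
"""

def fill_with_correlated_update(rows):
    """Apply the correlated UPDATE to an in-memory table and return its rows."""
    conn = sqlite3.connect(":memory:")
    # tick is the primary key, so the subquery has an index on tick to use.
    conn.execute("CREATE TABLE ticks (tick TEXT PRIMARY KEY, quote INTEGER)")
    conn.executemany("INSERT INTO ticks VALUES (?, ?)", rows)
    conn.execute(FILL_SQL)
    result = conn.execute("SELECT tick, quote FROM ticks ORDER BY tick").fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    print(fill_with_correlated_update(ROWS))
```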

I would create a (temporary) table with the same layout as your table and run the following two queries:
Insert all interpolations into the temp_stock table:
INSERT INTO temp_stock (tick, quote)
SELECT s2.tick,
       ( (SELECT s1.quote FROM stock s1
          WHERE s1.tick < s2.tick AND s1.quote IS NOT NULL
          ORDER BY s1.tick DESC LIMIT 1)
       + (SELECT s3.quote FROM stock s3
          WHERE s3.tick > s2.tick AND s3.quote IS NOT NULL
          ORDER BY s3.tick ASC LIMIT 1) ) / 2 AS quote
FROM stock s2
WHERE s2.quote IS NULL;
Update the stock table with the temp values:
UPDATE stock s
INNER JOIN temp_stock ts ON (ts.tick = s.tick)
SET s.quote = ts.quote;
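It does use a temp table (make sure it's a MEMORY table for speed), but it doesn't need a cursor. Note that it fills each gap with the average of the surrounding known quotes rather than carrying the last value forward.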
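The two-step temp-table idea can be checked end to end. This sketch uses Python's sqlite3 as a stand-in for MySQL, with the stock and temp_stock names from the answer and a SQLite TEMP table in place of a MEMORY table. Two dialect adjustments are assumptions of the sketch: SQLite divides integers with integer division, so it divides by 2.0 (MySQL's / already returns a decimal), and older SQLite versions have no UPDATE ... JOIN, so the second step uses a correlated subquery instead.

```python
import sqlite3

# Rows 08:00:13 .. 08:00:20 from the question; None stands in for NULL.
ROWS = [
    ("08:00:13", 5778), ("08:00:14", None), ("08:00:15", None),
    ("08:00:16", 5779), ("08:00:17", 5778), ("08:00:18", 5780),
    ("08:00:19", None), ("08:00:20", 5781),
]

# Step 1: midpoint of the previous and next known quotes for every NULL tick.
INSERT_SQL = """
INSERT INTO temp_stock (tick, quote)
SELECT s2.tick,
       ((SELECT s1.quote FROM stock s1
         WHERE s1.tick < s2.tick AND s1.quote IS NOT NULL
         ORDER BY s1.tick DESC LIMIT 1)
      + (SELECT s3.quote FROM stock s3
         WHERE s3.tick > s2.tick AND s3.quote IS NOT NULL
         ORDER BY s3.tick ASC LIMIT 1)) / 2.0
FROM stock s2
WHERE s2.quote IS NULL
"""

# Step 2: copy the interpolated values back (correlated form, not UPDATE ... JOIN).
UPDATE_SQL = """
UPDATE stock
SET quote = (SELECT ts.quote FROM temp_stock ts WHERE ts.tick = stock.tick)
WHERE tick IN (SELECT tick FROM temp_stock)
"""

def midpoint_fill(rows):
    """Run both steps on an in-memory copy of rows and return the stock table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (tick TEXT PRIMARY KEY, quote REAL)")
    conn.execute("CREATE TEMP TABLE temp_stock (tick TEXT PRIMARY KEY, quote REAL)")
    conn.executemany("INSERT INTO stock VALUES (?, ?)", rows)
    conn.execute(INSERT_SQL)
    conn.execute(UPDATE_SQL)
    result = conn.execute("SELECT tick, quote FROM stock ORDER BY tick").fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for tick, quote in midpoint_fill(ROWS):
        print(tick, quote)
```

A gap at the very start or end of the table has no neighbour on one side, so the sum is NULL and the row stays NULL.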

Related

MySQL update Column in increment base based on other column value

I have a MySQL table that includes the following columns:
+------------+-------------+
| auto_no | auto_no_new |
+------------+-------------+
| 2021-10431 | 20577 |
| 2021-10432 | 20578 |
| 2021-10433 | 20579 |
| 2021-10434 | 20580 |
| 2021-10435 | 20581 |
| 2021-10436 | 20582 |
+------------+-------------+
The value in the "auto_no" column increments within the relevant year. The values shown in the table above were started earlier and carried over into the new year. I need the values in the "auto_no" column to restart as follows:
+------------+-------------+
| auto_no | auto_no_new |
+------------+-------------+
| 2021-00001 | 20577 |
| 2021-00002 | 20578 |
| 2021-00003 | 20579 |
| 2021-00004 | 20580 |
| 2021-00005 | 20581 |
| 2021-00006 | 20582 |
+------------+-------------+
I used the following query:
update table set auto_no LIKE %'Y'- '????1'% where auto_no_new > 20577
But I didn't get the desired output. What might be going wrong? Can anyone help?

Seems trivial
drop table if exists t;
create table t(auto_no varchar(12), auto_no_new int);
insert into t values
( '2021-10431' , 20577 ),
( '2021-10432' , 20578 ),
( '2021-10433' , 20579 ),
( '2021-10434' , 20580 ),
( '2021-10435' , 20581 ),
( '2021-10436' , 20582 );
update t
set auto_no = concat(substring_index(auto_no,'-',1),'-',lpad(auto_no_new - 20576,5,'0'))
where substring_index(auto_no,'-',1) = 2021;
select * from t;
+------------+-------------+
| auto_no | auto_no_new |
+------------+-------------+
| 2021-00001 | 20577 |
| 2021-00002 | 20578 |
| 2021-00003 | 20579 |
| 2021-00004 | 20580 |
| 2021-00005 | 20581 |
| 2021-00006 | 20582 |
+------------+-------------+
6 rows in set (0.013 sec)
Or, if you don't know the min(auto_no_new):
update t cross join(select min(auto_no_new) - 1 minno from t where substring_index(auto_no,'-',1) = 2021) s
set auto_no = concat(substring_index(auto_no,'-',1),'-',lpad(auto_no_new - s.minno,5,'0'))
where substring_index(auto_no,'-',1) = 2021;
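The renumbering rule in both queries is plain arithmetic: keep the year before the hyphen and replace the sequence with auto_no_new minus (the year's smallest auto_no_new minus 1), left-padded to five digits. A small Python sketch of that rule can be handy for checking expected values before running the UPDATE; the function name renumber is made up for illustration, and unlike MySQL's LPAD, which shortens a longer string to the given length, Python's :05d formatting never truncates.

```python
def renumber(rows, year="2021"):
    """Mirror the MIN-based UPDATE on (auto_no, auto_no_new) pairs."""
    # substring_index(auto_no, '-', 1) is the text before the first hyphen.
    in_year = [new for auto_no, new in rows if auto_no.split("-", 1)[0] == year]
    if not in_year:
        return list(rows)
    base = min(in_year) - 1  # same as min(auto_no_new) - 1 in the derived table
    result = []
    for auto_no, new in rows:
        if auto_no.split("-", 1)[0] == year:
            auto_no = f"{year}-{new - base:05d}"  # like lpad(..., 5, '0')
        result.append((auto_no, new))
    return result

sample = [("2021-10431", 20577), ("2021-10432", 20578), ("2021-10433", 20579),
          ("2021-10434", 20580), ("2021-10435", 20581), ("2021-10436", 20582)]
for row in renumber(sample):
    print(row)
```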

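After doing a few tests, this is what I came up with:
SET @rn := 0;
UPDATE table1
SET auto_no = CONCAT(SUBSTRING(auto_no, 1, LOCATE('-', auto_no)), LPAD(@rn := @rn + 1, 5, '0'))
WHERE auto_no LIKE '2021-%'
ORDER BY auto_no_new;
The ORDER BY makes the numbering follow auto_no_new. ORDER BY is only allowed in the single-table form of UPDATE, which is why the counter is initialized in a separate SET statement rather than through a CROSS JOIN.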
However, please don't run the update query without backing up your table first. I think the best way is to create a copy of the original table and run the update query on the copy rather than on the original. That at least gives you a chance to start over if something goes wrong. Once you're satisfied with the result, you can rename the original table to something like table1_original and then rename the copy to the original table's name.
Here is a fiddle demo
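The copy, update, and swap workflow from this answer can be sketched as follows. It uses Python's sqlite3 as a stand-in for MySQL, so the statements differ: in MySQL you would more likely copy with CREATE TABLE ... LIKE plus INSERT ... SELECT (which keeps the indexes) and swap both names in one RENAME TABLE statement, whereas SQLite's CREATE TABLE ... AS SELECT copies data without constraints and ALTER TABLE ... RENAME TO renames one table at a time. The numbering step also swaps in a different technique from the answer's @rn counter: each row's number is the count of same-year rows whose auto_no_new is less than or equal to its own. table1 and table1_original follow the answer; table1_copy is an assumed name.

```python
import sqlite3

# Sequence number = how many same-year rows have auto_no_new <= this row's
# (a correlated COUNT standing in for the answer's @rn counter).
RENUMBER_SQL = """
UPDATE table1_copy
SET auto_no = printf('%s-%05d', :year,
        (SELECT COUNT(*) FROM table1_copy AS c2
         WHERE c2.auto_no LIKE :year || '-%'
           AND c2.auto_no_new <= table1_copy.auto_no_new))
WHERE auto_no LIKE :year || '-%'
"""

def renumber_via_copy(conn, year="2021"):
    """Renumber a copy of table1, then swap it in and keep the original as a backup."""
    cur = conn.cursor()
    # 1. Work on a copy so the original stays untouched until the result is checked.
    cur.execute("CREATE TABLE table1_copy AS SELECT * FROM table1")
    cur.execute(RENUMBER_SQL, {"year": year})
    # 2. Keep the original under a new name and promote the copy.
    cur.execute("ALTER TABLE table1 RENAME TO table1_original")
    cur.execute("ALTER TABLE table1_copy RENAME TO table1")
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (auto_no TEXT, auto_no_new INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                     [("2021-10431", 20577), ("2021-10432", 20578), ("2021-10433", 20579)])
    renumber_via_copy(conn)
    print(conn.execute("SELECT * FROM table1 ORDER BY auto_no_new").fetchall())
```

In practice you would inspect table1_copy between the UPDATE and the renames; the sketch runs both steps together only to keep it short.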

Exotic GROUP BY In MySQL

Consider a typical GROUP BY statement in SQL: you have a table like
+------+-------+
| Name | Value |
+------+-------+
| A | 1 |
| B | 2 |
| A | 3 |
| B | 4 |
+------+-------+
And you ask for
SELECT Name, SUM(Value) as Value
FROM table
GROUP BY Name
You'll receive
+------+-------+
| Name | Value |
+------+-------+
| A | 4 |
| B | 6 |
+------+-------+
In your head, you can imagine that SQL generates an intermediate sorted table like
+------+-------+
| Name | Value |
+------+-------+
| A | 1 |
| A | 3 |
| B | 2 |
| B | 4 |
+------+-------+
and then aggregates together successive rows: the "Value" column has been given an aggregator (in this case SUM), so it's easy to aggregate. The "Name" column has been given no aggregator, and thus uses what you might call the "trivial partial aggregator": given two things that are the same (e.g. A and A), it aggregates them into a single copy of one of the inputs (in this case A). Given any other input it doesn't know what to do and is forced to begin aggregating anew (this time with the "Name" column equal to B).
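The mental model above (sort, then merge runs of equal names) can be written out in a few lines of Python. This only illustrates the description; MySQL itself may evaluate GROUP BY with a temporary table or an index rather than an explicit sort.

```python
from itertools import groupby
from operator import itemgetter

def group_sum(rows):
    """SUM(Value) ... GROUP BY Name, done as sort-then-aggregate-successive-rows."""
    ordered = sorted(rows, key=itemgetter(0))  # the intermediate sorted table
    # groupby merges consecutive rows whose Name is equal: the "trivial partial aggregator".
    return [(name, sum(value for _, value in block))
            for name, block in groupby(ordered, key=itemgetter(0))]

print(group_sum([("A", 1), ("B", 2), ("A", 3), ("B", 4)]))
```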
I want to do a more exotic kind of aggregation. My table looks like
+------+-------+
| Name | Value |
+------+-------+
| A | 1 |
| BC | 2 |
| AY | 3 |
| AZ | 4 |
| B | 5 |
| BCR | 6 |
+------+-------+
And the intended output is
+------+-------+
| Name | Value |
+------+-------+
| A | 8 |
| B | 13 |
+------+-------+
Where does this come from? A and B are the "minimal prefixes" for this set of names: they occur in the data set and every Name has exactly one of them as a prefix. I want to aggregate data by grouping rows together when their Names have the same minimal prefix (and add the Values, of course).
In the toy grouping model from before, the intermediate sorted table would be
+------+-------+
| Name | Value |
+------+-------+
| A | 1 |
| AY | 3 |
| AZ | 4 |
| B | 5 |
| BC | 2 |
| BCR | 6 |
+------+-------+
Instead of using the "trivial partial aggregator" for Names, we would use one that can aggregate X and Y together iff X is a prefix of Y; in that case it returns X. So the first three rows would be aggregated together into a row with (Name, Value) = (A, 8), then the aggregator would see that A and B couldn't be aggregated and would move on to a new "block" of rows to aggregate.
The tricky thing is that the value we're grouping by is "non-local": if A were not a name in the dataset, then AY and AZ would each be a minimal prefix. It turns out that the AY and AZ rows are aggregated into the same row in the final output, but you couldn't know that just by looking at them in isolation.
Miraculously, in my use case the minimal prefix of a string can be determined without reference to anything else in the dataset. (Imagine that each of my names is one of the strings "hello", "world", and "bar", followed by any number of z's. I want to group all of the Names with the same "base" word together.)
As I see it I have two options:
1) The simple option: compute the prefix for each row and group by that value directly. Unfortunately I have an index on the Name, and computing the minimal prefix (whose length depends on the Name itself) prevents me from using that index. This forces a full table scan, which is prohibitively slow.
2) The complicated option: somehow convince MySQL to use the "partial prefix aggregator" for Name. This runs into the "non-locality" problem above, but that's fine as long as we scan the table according to my index on Name, since then every minimal prefix will be encountered before any of the other strings it is a prefix of; we would never try to aggregate AY and AZ together if A were in the dataset.
In a procedural (imperative) programming language, #2 would be rather easy: extract rows one at a time, in alphabetical order, keeping track of the current prefix. If the new row's Name has that as a prefix, it goes in the bucket you're currently using. Otherwise, start a new bucket with that Name as your prefix. In MySQL I am lost as to how to do it. Note that the set of minimal prefixes is not known beforehand.
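The one-row-at-a-time procedure described above can be sketched in Python. It relies on the same observation as option 2: in sorted order, every name that starts with a given minimal prefix comes directly after that prefix, so one pass with a single "current prefix" variable is enough. The function name is illustrative.

```python
def group_by_minimal_prefix(rows):
    """Sum values per minimal prefix, scanning (name, value) rows in name order."""
    totals = {}
    prefix = None
    for name, value in sorted(rows):
        # A name that does not extend the current prefix starts a new bucket.
        if prefix is None or not name.startswith(prefix):
            prefix = name
            totals[prefix] = 0
        totals[prefix] += value
    return totals

rows = [("A", 1), ("BC", 2), ("AY", 3), ("AZ", 4), ("B", 5), ("BCR", 6)]
print(group_by_minimal_prefix(rows))
```

Without the A row, AY and AZ would each start their own bucket, which is the non-locality described earlier.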

Edit 2
It occurred to me that if the table is ordered by Name, this would be a lot easier (and faster). Since I don't know if your data is sorted, I've included a sort in this query, but if the data is sorted, you can strip out (SELECT * FROM table1 ORDER BY Name) t1 and just use FROM table1
SELECT prefix, SUM(`Value`)
FROM (SELECT Name, Value, @prefix := IF(Name NOT LIKE CONCAT(@prefix, '_%'), Name, @prefix) AS prefix
FROM (SELECT * FROM table1 ORDER BY Name) t1
JOIN (SELECT @prefix := '~') p
) t2
GROUP BY prefix
Updated SQLFiddle
Edit
Having slept on the problem, I realised that there is no need for the IN; a WHERE NOT EXISTS clause on the JOINed table is enough:
SELECT t1.Name, SUM(t2.Value) AS `Value`
FROM table1 t1
JOIN table1 t2 ON t2.Name LIKE CONCAT(t1.Name, '%')
WHERE NOT EXISTS (SELECT *
FROM table1 t3
WHERE t1.Name LIKE CONCAT(t3.Name, '_%')
)
GROUP BY t1.Name
Updated Explain (Name changed to UNIQUE key from PRIMARY)
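+----+--------------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------+
| 1 | PRIMARY | t1 | index | Name | Name | 11 | NULL | 6 | Using where; Using index; Using temporary; Using filesort |
| 1 | PRIMARY | t2 | ALL | NULL | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |
| 3 | DEPENDENT SUBQUERY | t3 | index | NULL | Name | 11 | NULL | 6 | Using where; Using index |
+----+--------------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------+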
Updated SQLFiddle
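The NOT EXISTS version uses only joins and LIKE, so it can be tried outside MySQL. This sketch runs it on the question's sample data through Python's sqlite3 (a testing stand-in; older SQLite versions have no CONCAT, so || does the concatenation). One caveat applies in both databases: a Name containing % or _ would itself act as a wildcard inside the generated LIKE pattern.

```python
import sqlite3

SAMPLE = [("A", 1), ("BC", 2), ("AY", 3), ("AZ", 4), ("B", 5), ("BCR", 6)]

# t1 keeps only names that have no other name as a strict prefix (the minimal
# prefixes); t2 collects every row whose Name starts with it, including itself.
PREFIX_SUM_SQL = """
SELECT t1.Name, SUM(t2.Value) AS Value
FROM table1 t1
JOIN table1 t2 ON t2.Name LIKE t1.Name || '%'
WHERE NOT EXISTS (SELECT *
                  FROM table1 t3
                  WHERE t1.Name LIKE t3.Name || '_%')
GROUP BY t1.Name
ORDER BY t1.Name
"""

def prefix_sums(rows):
    """Run the NOT EXISTS prefix query on an in-memory copy of rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (Name TEXT PRIMARY KEY, Value INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    result = conn.execute(PREFIX_SUM_SQL).fetchall()
    conn.close()
    return result

print(prefix_sums(SAMPLE))
```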
Original Answer
Here is one way you could do it. First, you need to find all the unique prefixes in your table. You can do that by looking for all values of Name where it does not look like another value of Name with other characters on the end. This can be done with this query:
SELECT Name
FROM table1 t1
WHERE NOT EXISTS (SELECT *
FROM table1 t2
WHERE t1.Name LIKE CONCAT(t2.Name, '_%')
)
For your sample data, that will give:
+------+
| Name |
+------+
| A |
| B |
+------+
Now you can sum all the values where the Name starts with one of those prefixes. Note we change the LIKE pattern in this query so that it also matches the prefix, otherwise we wouldn't count the values for A and B in your example:
SELECT t1.Name, SUM(t2.Value) AS `Value`
FROM table1 t1
JOIN table1 t2 ON t2.Name LIKE CONCAT(t1.Name, '%')
WHERE t1.Name IN (SELECT Name
FROM table1 t3
WHERE NOT EXISTS (SELECT *
FROM table1 t4
WHERE t3.Name LIKE CONCAT(t4.Name, '_%')
)
)
GROUP BY t1.Name
Output:
+------+-------+
| Name | Value |
+------+-------+
| A | 8 |
| B | 13 |
+------+-------+
An EXPLAIN says that both of these queries use the index on Name, so they should be reasonably efficient. Here is the result of the EXPLAIN on my MySQL 5.6 server:
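+----+--------------------+-------+--------+---------------+---------+---------+--------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+---------------+---------+---------+--------------+------+----------------------------------------------------+
| 1 | PRIMARY | t1 | index | PRIMARY | PRIMARY | 11 | NULL | 6 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | t3 | eq_ref | PRIMARY | PRIMARY | 11 | test.t1.Name | 1 | Using where; Using index |
| 1 | PRIMARY | t2 | ALL | NULL | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |
| 3 | DEPENDENT SUBQUERY | t4 | index | NULL | PRIMARY | 11 | NULL | 6 | Using where; Using index |
+----+--------------------+-------+--------+---------------+---------+---------+--------------+------+----------------------------------------------------+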
SQLFiddle Demo

Here are some hints on how to do the task. This locates any prefixes that are useful. That's not what you asked for, but the flow of the query and the usage of @variables, plus the need for 2 (actually 3) levels of nesting, might help you.
SELECT DISTINCT `Prev`
FROM
(
SELECT @prev := @next AS 'Prev',
@next := IF(LEFT(city, LENGTH(@prev)) = @prev, @next, city) AS 'Next'
FROM ( SELECT @next := ' ' ) AS init
JOIN ( SELECT DISTINCT city FROM us ) AS dedup
ORDER BY city
) x
WHERE `Prev` = `Next` ;
Partial output:
+----------------+
| Prev |
+----------------+
| Alamo |
| Allen |
| Altamont |
| Ames |
| Amherst |
| Anderson |
| Arlington |
| Arroyo |
| Auburn |
| Austin |
| Avon |
| Baker |
Check the Al% cities:
mysql> SELECT DISTINCT city FROM us WHERE city LIKE 'Al%' ORDER BY city;
+-------------------+
| city |
+-------------------+
| Alabaster |
| Alameda |
| Alamo | <--
| Alamogordo | <--
| Alamosa |
| Albany |
| Albemarle |
...
| Alhambra |
| Alice |
| Aliquippa |
| Aliso Viejo |
| Allen | <--
| Allen Park | <--
| Allentown | <--
| Alliance |
| Allouez |
| Alma |
| Aloha |
| Alondra Park |
| Alpena |
| Alpharetta |
| Alpine |
| Alsip |
| Altadena |
| Altamont | <--
| Altamonte Springs | <--
| Alton |
| Altoona |
| Altus |
| Alvin |
+-------------------+
40 rows in set (0.01 sec)

How to add the value of a Column with the Average in mysql

This is my table data (the table name is obat):
+---------+---------+----------------+-------+
| merek | formula | nm_obat | harga |
+---------+---------+----------------+-------+
| am001 | 1x1 | Antimo | 3500 |
| gp002 | 1x1 | Glimipirid | 20000 |
| if001 | 1x1 | Inzaflu | 4500 |
| mf500 | 3x1 | Metformin500mg | 10000 |
| mixg001 | 1x1 | Mixagrip | 5000 |
+---------+---------+----------------+-------+
How can I add the average of harga to each value in the harga column?
This is what I've been trying:
UPDATE obat SET
harga = harga + (select avg(harga) from obat);

Create a data set consisting of just the average, then cross join it to the base set so you can add the two values together. Because the average set has exactly one row, cross joining it to the N rows of the data table gives back the same N rows.
This approach computes the average once. You could instead compute it in the select list for each row, but that is generally slower.
Best approach, in my opinion:
SELECT A.merek, A.formula, A.nm_obat, A.harga, A.harga + B.mAvg AS newCol
FROM obat A
CROSS JOIN (SELECT AVG(harga) mAvg FROM obat) B
An alternative approach, but much slower:
SELECT A.merek
, A.formula
, A.nm_obat
, A.harga
, A.harga + (SELECT AVG(harga) mAvg
FROM obat) AS newCol
FROM obat A
To update, it should be this simple:
(other examples) mysql update column with value from another table
UPDATE obat A
CROSS JOIN (SELECT AVG(harga) mavg FROM obat) B
SET A.harga = A.harga + B.mavg;
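The asker's UPDATE fails in MySQL (error 1093) because its subquery reads obat, the table being updated; the CROSS JOIN against a derived table avoids that because MySQL materializes the one-row average before changing any rows. The same compute-once idea is sketched below with Python's sqlite3 as a stand-in: the average is read in one statement and passed to the UPDATE as a parameter, so it cannot drift while rows are modified. The column names follow the question; the two-statement split and the function name add_average are assumptions of this sketch, not the answer's SQL.

```python
import sqlite3

# Rows from the question's obat table: (merek, formula, nm_obat, harga).
OBAT = [
    ("am001", "1x1", "Antimo", 3500),
    ("gp002", "1x1", "Glimipirid", 20000),
    ("if001", "1x1", "Inzaflu", 4500),
    ("mf500", "3x1", "Metformin500mg", 10000),
    ("mixg001", "1x1", "Mixagrip", 5000),
]

def add_average(conn):
    """Add the pre-update average harga to every row; return the average used."""
    # Compute the average once, before any row changes.
    (avg,) = conn.execute("SELECT AVG(harga) FROM obat").fetchone()
    conn.execute("UPDATE obat SET harga = harga + ?", (avg,))
    conn.commit()
    return avg

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obat (merek TEXT, formula TEXT, nm_obat TEXT, harga INTEGER)")
conn.executemany("INSERT INTO obat VALUES (?, ?, ?, ?)", OBAT)
print(add_average(conn))  # 8600.0
print(conn.execute("SELECT merek, harga FROM obat ORDER BY merek").fetchall())
```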

How to improve query Mysql?

I wrote this query in MySQL. It works, but the query takes 3 minutes or more.
I would like to know if it is possible to improve this query.
The query is this:
SELECT CODARTIOLO, NOMEARTICOLO, SUM(QUANTITA) AS QUANTITA,
(SUM(TOTRIGA)/SUM(QUANTITA)) AS TOTALE,
(SELECT (SUM(QUANTITA * PREZZOCAD))/SUM(QUANTITA)
FROM vistacaricomagazzino cm
WHERE cm.DATA <= '$dataStart' AND cm.codarticolo=CODARTIOLO) AS PREZZOMEDIO
FROM vistascontrini c
WHERE c.DATA >= '$dataStart' AND c.DATA <= '$dataEnd'
GROUP BY NOMEARTICOLO
The tables are:
VISTASCONTRINI
+--------------+--------------+------+-----+------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+------------+-------+
| CODARTIOLO | varchar(13) | YES | | NULL | |
| NOMEARTICOLO | varchar(60) | YES | | NULL | |
| QUANTITA | int(11) | YES | | NULL | |
| TOTRIGA | decimal(9,2) | YES | | NULL | |
| DATA | date | NO | | 0000-00-00 | |
+--------------+--------------+------+-----+------------+-------+
VISTACARICOMAGAZZINO
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| codordine | int(11) | NO | | 0 | |
| Quantita | int(11) | YES | | NULL | |
| PrezzoCad | decimal(10,3) | YES | | NULL | |
| codArticolo | varchar(13) | YES | | NULL | |
| Data | date | YES | | NULL | |
+-------------+---------------+------+-----+---------+-------+

If your tables don't have indexes (and it doesn't look like they do), then that's the first thing you need to fix. If you do that right (i.e. put indexes on the fields that need to be indexed), it will probably solve the problem for you in one hit.
The fields you need to consider indexing are the ones being used by the WHERE and GROUP BY clauses.
Next, consider converting it from a nested SELECT query into a JOIN query. This will probably give you better performance too.
Finally, you haven't stated just how much data is being collated here, but if it's a large amount of data, then consider storing the collated totals separately within the database so that you can query them directly rather than having to regenerate all those sums and groups every time. This obviously has its own considerations (additional storage, additional code when updating data to also update the totals, the possibility of things going out of sync, etc.), but if you're really suffering with performance on this, it is a valid solution.

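I think you could try to remove the subquery from the SELECT clause. Modify the query to something like this (untested, just to give the idea):
SELECT c.CODARTIOLO, c.NOMEARTICOLO, SUM(c.QUANTITA) AS QUANTITA,
(SUM(c.TOTRIGA)/SUM(c.QUANTITA)) AS TOTALE,
(SUM(cm.QUANTITA * cm.PREZZOCAD))/SUM(cm.QUANTITA) AS PREZZOMEDIO
FROM vistascontrini c,
vistacaricomagazzino cm
WHERE cm.codarticolo = c.CODARTIOLO
AND cm.DATA <= '$dataStart'
AND c.DATA >= '$dataStart' AND c.DATA <= '$dataEnd'
GROUP BY NOMEARTICOLO
Also add indexes on the DATA column of the vistascontrini and vistacaricomagazzino tables.
Be aware that joining before aggregating repeats each vistascontrini row once for every matching vistacaricomagazzino row, so SUM(c.QUANTITA) comes out multiplied by that count, and articles with no warehouse row before $dataStart drop out of the result entirely.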
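Joining vistascontrini to vistacaricomagazzino before grouping repeats each sales row once per matching warehouse row, which inflates plain sums such as SUM(c.QUANTITA). The sketch below reproduces that with Python's sqlite3 as a stand-in for MySQL, trimmed versions of the two tables from the question, and hypothetical rows made up for illustration (one article, two sales rows, three warehouse rows). It compares join-then-aggregate with aggregating the warehouse table first in a derived table; the date filters are left out to keep it small.

```python
import sqlite3

# Hypothetical rows for illustration only.
SALES = [("A1", "Widget", 2, 10.00), ("A1", "Widget", 3, 15.00)]  # CODARTIOLO, NOMEARTICOLO, QUANTITA, TOTRIGA
LOADS = [("A1", 10, 1.0), ("A1", 10, 2.0), ("A1", 20, 3.0)]       # codArticolo, Quantita, PrezzoCad

# Join first, aggregate later: every sales row is repeated once per warehouse row.
JOIN_THEN_SUM = """
SELECT c.NOMEARTICOLO, SUM(c.QUANTITA)
FROM vistascontrini c
JOIN vistacaricomagazzino cm ON cm.codArticolo = c.CODARTIOLO
GROUP BY c.NOMEARTICOLO
"""

# Aggregate the warehouse table first, then join one row per article.
SUM_THEN_JOIN = """
SELECT c.NOMEARTICOLO, SUM(c.QUANTITA), MAX(p.PREZZOMEDIO)
FROM vistascontrini c
LEFT JOIN (SELECT codArticolo,
                  SUM(Quantita * PrezzoCad) / SUM(Quantita) AS PREZZOMEDIO
           FROM vistacaricomagazzino
           GROUP BY codArticolo) p ON p.codArticolo = c.CODARTIOLO
GROUP BY c.NOMEARTICOLO
"""

def run(sql):
    """Load the hypothetical rows into in-memory tables and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vistascontrini (CODARTIOLO TEXT, NOMEARTICOLO TEXT, QUANTITA INTEGER, TOTRIGA REAL)")
    conn.execute("CREATE TABLE vistacaricomagazzino (codArticolo TEXT, Quantita INTEGER, PrezzoCad REAL)")
    conn.executemany("INSERT INTO vistascontrini VALUES (?, ?, ?, ?)", SALES)
    conn.executemany("INSERT INTO vistacaricomagazzino VALUES (?, ?, ?)", LOADS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

print(run(JOIN_THEN_SUM))  # quantity tripled: 15 instead of 5
print(run(SUM_THEN_JOIN))
```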

Add indexes on vistascontrini.DATA, vistascontrini.NOMEARTICOLO, vistacaricomagazzino.DATA, vistacaricomagazzino.codarticolo
It's not clear whether you want to group by CODARTIOLO or not, and which CODARTIOLO value you need if you group only by NOMEARTICOLO.
Try the following query, which is equivalent to yours:
SELECT T.*,T1.PREZZOMEDIO FROM
(
SELECT CODARTIOLO,NOMEARTICOLO,
SUM(QUANTITA) AS QUANTITA,
(SUM(TOTRIGA)/SUM(QUANTITA)) AS TOTALE
FROM vistascontrini c
WHERE c.DATA >= '$dataStart' AND c.DATA <= '$dataEnd'
group by NOMEARTICOLO
) AS T
LEFT JOIN
( SELECT codArticolo AS CODARTIOLO, (SUM(QUANTITA * PREZZOCAD))/SUM(QUANTITA) as PREZZOMEDIO
FROM vistacaricomagazzino cm
WHERE cm.DATA <= '$dataStart'
GROUP BY codArticolo ) as T1
ON T.CODARTIOLO=T1.CODARTIOLO

MySQL query - only exact result or every choice

I've a query that I need some help with -
As part of a form I've got a serial number field that is populated if there is a serial number, blank if it's not, or no result if it's an invalid serial number.
select *
from cust_site_contract as cs
where cs.serial_no = 'C20050' or (cs.serial_no <> 'C20050' and if(cs.serial_no = 'C20050',1,0)=0)
limit 10;
Here's a sample of the regular data:
+----------------------+-----------+-----------+-----------
| idcust_site_contract | system_id | serial_no | end_date
+----------------------+-----------+-----------+-----------
| 561315 | SH001626 | C19244 | 2009-12-21
| 561316 | SH001626 | C19244 | 2010-06-30
| 561317 | SH002125 | C19671 | 2010-05-31
| 561318 | SH001766 | C14781 | 2010-09-25
| 561319 | SH001766 | C14781 | 2011-02-15
| 561320 | SH002059 | C19020 | 2008-07-09
| 561321 | SH002639 | C18889 | 2008-03-31
| 561322 | SH002639 | C18889 | 2008-06-30
| 561323 | SH002715 | C20051 | 2010-04-30
| 561324 | SH002719 | C20057 | 2010-04-30
And an exact result would look something like this:
| 561487 | SH002837 | C20050 | 2012-07-04
I was writing this as a subquery so I could match the system_ids to customer and contract names, but realised I was getting garbage pretty early on.
I'm tempted to try and simplify it by saying the third case might not hold true (i.e. if it's an invalid serial number, allow the choice of any customer name and simply flag it in the data)
Has anyone got any ideas of where I'm going wrong? The combination of conditions is clearly wrong, and I can't work out how to make each side of the OR mutually exclusive.
Even if I try to evaluate only the if(sn = 'blah') I get the wrong result for obvious reasons, but can't think of a sane way to express it.
Many thanks
Scott

If there is no contract with a serial number of C20050, this query will return all rows; otherwise, it will return only one row where serial_no is C20050:
SELECT a.*
FROM cust_site_contract a
INNER JOIN
(
SELECT COUNT(*) AS rowexists
FROM cust_site_contract
WHERE serial_no = 'C20050'
) b ON b.rowexists = 0
UNION ALL
(
SELECT *
FROM cust_site_contract
WHERE serial_no = 'C20050'
LIMIT 1
)
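A shorter way to express "the exact match if there is one, otherwise everything" is a single WHERE clause with NOT EXISTS. This is a swapped-in alternative to the UNION ALL above, and unlike it, it returns every row with the matching serial number rather than at most one. The sketch runs it through Python's sqlite3 on a few of the question's rows; the query uses only standard SQL, so it should carry over to MySQL, but the :sn placeholder is sqlite3's named-parameter syntax and MySQL client libraries use their own placeholder style.

```python
import sqlite3

# A few rows from the question: (idcust_site_contract, system_id, serial_no, end_date).
ROWS = [
    (561315, "SH001626", "C19244", "2009-12-21"),
    (561323, "SH002715", "C20051", "2010-04-30"),
    (561324, "SH002719", "C20057", "2010-04-30"),
    (561487, "SH002837", "C20050", "2012-07-04"),
]

# Exact matches if any exist; otherwise the NOT EXISTS branch is true for every row.
EXACT_OR_ALL = """
SELECT *
FROM cust_site_contract AS cs
WHERE cs.serial_no = :sn
   OR NOT EXISTS (SELECT 1 FROM cust_site_contract WHERE serial_no = :sn)
ORDER BY cs.idcust_site_contract
"""

def lookup(conn, serial_no):
    """Return the matching contracts, or all contracts if the serial number is unknown."""
    return conn.execute(EXACT_OR_ALL, {"sn": serial_no}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_site_contract (idcust_site_contract INTEGER PRIMARY KEY, "
             "system_id TEXT, serial_no TEXT, end_date TEXT)")
conn.executemany("INSERT INTO cust_site_contract VALUES (?, ?, ?, ?)", ROWS)
print(lookup(conn, "C20050"))     # only row 561487
print(len(lookup(conn, "NOPE")))  # 4: no match, so every row
```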

If you just write the query as below, you will get an empty result if the serial number doesn't exist or is invalid.
select cs.serial_no from cust_site_contract as cs where cs.serial_no = 'C20050'
