MySQL Simple Moving Average Calculation

The following MySQL update statement seems to take an excessive amount of time to execute for the recordset provided (~5000 records). The update statement below takes on average 12 seconds to execute. I currently plan to run this calculation for 5 different periods and about 500 different stock symbols. This translates into 12 sec * 5 calculations * 500 symbols = 30,000 seconds, or about 8.33 hrs.
Update Statement:
UPDATE tblStockDataMovingAverages_AAPL
JOIN (
    SELECT t1.Sequence,
           (SELECT AVG(t2.Close)
            FROM tblStockDataMovingAverages_AAPL AS t2
            WHERE (t1.Sequence - t2.Sequence) BETWEEN 0 AND 7) AS "8SMA"
    FROM tblStockDataMovingAverages_AAPL AS t1
    ORDER BY t1.Sequence
) AS ma_query
    ON tblStockDataMovingAverages_AAPL.Sequence = ma_query.Sequence
SET tblStockDataMovingAverages_AAPL.8MA_Price = ma_query.8SMA
Table Design:
CREATE TABLE `tblStockDataMovingAverages_AAPL` (
`Symbol` char(6) NOT NULL DEFAULT '',
`TradeDate` date NOT NULL DEFAULT '0000-00-00',
`Sequence` int(11) DEFAULT NULL,
`Close` decimal(18,5) DEFAULT NULL,
`200MA_Price` decimal(18,5) DEFAULT NULL,
`100MA_Price` decimal(18,5) DEFAULT NULL,
`50MA_Price` decimal(18,5) DEFAULT NULL,
`20MA_Price` decimal(18,5) DEFAULT NULL,
`8MA_Price` decimal(18,5) DEFAULT NULL,
`50_200_Cross` int(5) DEFAULT NULL,
PRIMARY KEY (`Symbol`,`Sequence`),
KEY `idxSequnce` (`Sequence`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1$$
Any help on speeding up the process would be greatly appreciated.
Output of Select Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t1 index NULL idxSymbol_Sequnce 11 NULL 5205 Using index; Using filesort
2 DEPENDENT SUBQUERY t2 ALL NULL NULL NULL NULL 5271 Using where

This should be a little better:
update tblStockDataMovingAverages_AAPL
join (
select t1.sequence as sequence, avg(t2.close) as av
from tblStockDataMovingAverages_AAPL t1
join tblStockDataMovingAverages_AAPL t2
on t2.sequence BETWEEN t1.sequence-7 AND t1.sequence
group by t1.sequence
) t1 on tblStockDataMovingAverages_AAPL.sequence = t1.sequence
set 8MA_Price = t1.av
With regard to the BETWEEN condition: in an ON clause, field1 OPERATOR expression(field2) is easier to optimise than expression(field1, field2) OPERATOR expression, since the optimizer can use an index on field1. I think this holds for BETWEEN as well.
It looks like the ORDER BY in your query is unnecessary and removing it might speed your query up a ton.
If the other stock symbols live in the same table, fold them all into a single update query (different periods still need separate runs, though); this would likely be much faster than running it once per symbol.
As already suggested, adding an index to Close may help.
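For anyone wanting to sanity-check the window logic outside the database, here is a minimal Python sketch (not from the original post) of the same trailing 8-row average that the `t2.sequence BETWEEN t1.sequence-7 AND t1.sequence` join computes, including the partial windows at the start of the series:

```python
def trailing_sma(closes, period=8):
    """Trailing simple moving average over the previous `period` rows
    (inclusive), mirroring t2.sequence BETWEEN t1.sequence-7 AND t1.sequence.
    Early rows get a partial window, just like the SQL version does."""
    result = []
    for i in range(len(closes)):
        window = closes[max(0, i - period + 1): i + 1]
        result.append(sum(window) / len(window))
    return result

# e.g. trailing_sma([10, 12, 11], period=2) -> [10.0, 11.0, 11.5]
```

Comparing this against a few rows of the table's 8MA_Price column is a quick way to confirm the rewritten UPDATE produced the expected values.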

You can optimize it slightly by adding an index on the Close field, which should make the AVG function more efficient. Please share a dump of your dataset so we can look at it more closely.

Related

Is there any way to get this order by query to start searching from the given timestamp on a MySQL index?

I am working on a mysql 5.6 database, and I have a table looking something like this:
CREATE TABLE `items` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`node_type_id` int(11) NOT NULL,
`property_native_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`parent_item_id` bigint(20) DEFAULT NULL,
`external_timestamp` datetime DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_items_on_acct_node_prop` (`account_id`,`node_type_id`,`property_native_id`),
KEY `index_items_on_account_id_and_external_timestamp` (`account_id`,`external_timestamp`),
KEY `index_items_on_account_id_and_created_at` (`account_id`,`created_at`),
KEY `parent_item_external_timestamp_idx` (`parent_item_id`,`external_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=194417315 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I am trying to optimize a query doing this:
SELECT *
FROM items
WHERE parent_item_id = ?
AND external_timestamp < ( SELECT external_timestamp
                           FROM items
                           WHERE id = ? )
ORDER BY external_timestamp LIMIT 5
Currently, there is an index on parent_item_id, so when I run this query with EXPLAIN, I get an "extra" of "Using where; Using filesort"
When I modify the index to be (parent_item_id, external_timestamp), then the EXPLAIN's "extra" becomes "Using index condition"
The problem is that the EXPLAIN's "rows" field is still the same (which is usually a couple thousand rows, but it could be millions in some use-cases).
I know that I can add something like AND external_timestamp > (1 week ago), but I'd really like the number of examined rows to be just the LIMIT count, so 5 in this case.
Is it possible to instruct the database to lock onto a row and then get the 5 rows before it on that (parent_item_id, external_timestamp) index?
(I'm unclear on what you are trying to do. Perhaps you should provide some sample input and output.) See if this works for you:
SELECT i.*
FROM items AS i
WHERE i.parent_item_id = ?
AND i.external_timestamp < ( SELECT external_timestamp
FROM items
WHERE id = ? )
ORDER BY i.external_timestamp
LIMIT 5
Your existing INDEX(parent_item_id, external_timestamp) will probably be used; see EXPLAIN SELECT ....
If id was supposed to match in all 5 rows, then the subquery is not needed.
SELECT items.*
FROM items
CROSS JOIN ( SELECT external_timestamp
FROM items
WHERE id = ? ) subquery
WHERE items.parent_item_id = ?
AND items.external_timestamp < subquery.external_timestamp
ORDER BY external_timestamp LIMIT 5
id is PK, hence the subquery will return only one row (or none).
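As a quick way to see the shape of the result, here is the answer's query run against a toy SQLite database from Python (SQLite standing in for MySQL; the table and data are made up for illustration). Note that with an ascending ORDER BY, the LIMIT 5 returns the five earliest qualifying rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY,"
            " parent_item_id INT, external_timestamp TEXT)")
# ten rows under one parent, one per day
con.executemany("INSERT INTO items VALUES (?, ?, ?)",
                [(i, 1, f"2020-01-{i:02d}") for i in range(1, 11)])

# anchor on the row with id=8 and fetch rows before its timestamp
got = con.execute("""
    SELECT i.id
    FROM items AS i
    WHERE i.parent_item_id = ?
      AND i.external_timestamp < (SELECT external_timestamp
                                  FROM items WHERE id = ?)
    ORDER BY i.external_timestamp
    LIMIT 5
""", (1, 8)).fetchall()
# got -> [(1,), (2,), (3,), (4,), (5,)]
```

If the intent is the five rows immediately preceding the anchor, an ORDER BY external_timestamp DESC would be needed instead.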

Performance challenge on MySQL RDS with simultaneous queries

I have an app developed over NodeJS on AWS that has a MySQL RDS database (server class: db.r3.large - Engine: InnoDB) associated. We are having a performance problem, when we execute simultaneous queries (at the same time), the database is returning the results after finishing the last query and not after each query is finished.
So, as an example: if we execute a process that has 10 simultaneous queries of 3 seconds each, we start receiving the results at approximately 30 seconds and we want to start receiving when the first query is finished (3 seconds).
It seems that the database is receiving the queries and queuing them.
I'm kind of lost here, since I've changed several things (separate connections, connection pooling, etc.) in the code and in the AWS settings, but nothing seems to improve the result.
TableA (13M records) schema:
CREATE TABLE `TableA` (
`columnA` int(11) NOT NULL AUTO_INCREMENT,
`columnB` varchar(20) DEFAULT NULL,
`columnC` varchar(15) DEFAULT NULL,
`columnD` varchar(20) DEFAULT NULL,
`columnE` varchar(255) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
`columnH` varchar(10) DEFAULT NULL,
`columnI` bigint(11) DEFAULT NULL,
`columnJ` bigint(11) DEFAULT NULL,
`columnK` varchar(5) DEFAULT NULL,
`columnL` varchar(50) DEFAULT NULL,
`columnM` varchar(20) DEFAULT NULL,
`columnN` int(1) DEFAULT NULL,
`columnO` int(1) DEFAULT '0',
`columnP` datetime NOT NULL,
`columnQ` datetime NOT NULL,
PRIMARY KEY (`columnA`),
KEY `columnB` (`columnB`),
KEY `columnO` (`columnO`),
KEY `columnK` (`columnK`),
KEY `columnN` (`columnN`),
FULLTEXT KEY `columnE` (`columnE`)
) ENGINE=InnoDB AUTO_INCREMENT=13867504 DEFAULT CHARSET=utf8;
TableB (15M records) schema:
CREATE TABLE `TableB` (
`columnA` int(11) NOT NULL AUTO_INCREMENT,
`columnB` varchar(50) DEFAULT NULL,
`columnC` varchar(50) DEFAULT NULL,
`columnD` int(1) DEFAULT NULL,
`columnE` datetime NOT NULL,
`columnF` datetime NOT NULL,
PRIMARY KEY (`columnA`),
KEY `columnB` (`columnB`),
KEY `columnC` (`columnC`)
) ENGINE=InnoDB AUTO_INCREMENT=19153275 DEFAULT CHARSET=utf8;
Query:
SELECT COUNT(*) AS total
FROM TableA
WHERE TableA.columnB IN (
    SELECT TableB.columnC
    FROM TableB
    WHERE TableB.columnB = "3764301"
    AND TableB.columnC NOT IN (
        SELECT field
        FROM table
        WHERE table.field = 10
    )
    AND TableB.columnC NOT IN (
        SELECT field
        FROM table
        WHERE table.field = 10
    )
    AND TableB.columnC NOT IN (
        SELECT field
        FROM table
        WHERE table.field = 10
    )
    AND TableB.columnC NOT IN (
        SELECT field
        FROM table
        WHERE table.field = 10
    )
)
AND columnM > 2;
1 execution returns in 2 s.
10 executions return the first result in 20 s, and the other results after that.
To see that queries are running I'm using "SHOW FULL PROCESSLIST" and the queries are most of the time with state "sending data".
It is not a performance issue with the query itself; it is a problem of concurrency against the database. Even a very simple query like "SELECT COUNT(*) FROM TableA WHERE columnM = 5" has the same problem.
UPDATE
For testing purposes only, I reduced the query to a single subquery condition. Both variants return 65k records.
-- USING IN
SELECT COUNT(*) as total
FROM TableA
WHERE TableA.columnB IN (
SELECT TableB.columnC
FROM TableB
WHERE TableB.columnB = "103550181"
AND TableB.columnC NOT IN (
SELECT field
FROM tableX
WHERE fieldX = 15
)
)
AND columnM > 2;
-- USING EXISTS
SELECT COUNT(*) as total
FROM TableA
WHERE EXISTS (
SELECT *
FROM TableB
WHERE TableB.columnB = "103550181"
AND TableA.columnB = TableB.columnC
AND NOT EXISTS (
SELECT *
FROM tableX
WHERE fieldX = 15
AND fieldY = TableB.columnC
)
)
AND columnM > 2;
-- Result
Query using IN : 1.7 sec
Query using EXISTS : 141 sec (:O)
Whether I use IN or EXISTS, the problem is the same: when I execute this query many times, the database delays and the responses arrive only after a long time.
Example: if one query responds in 1.7 sec, executing it 10 times makes the first result arrive after about 20 sec.
Recommendation 1
Change the NOT IN ( SELECT ... ) to NOT EXISTS ( SELECT * ... ). (And you may need to change the WHERE clause a bit.)
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
-->
AND NOT EXISTS ( SELECT * FROM table WHERE field = TableB.columnC )
table needs an index on field.
IN ( SELECT ... ) performs very poorly. EXISTS is much better optimized.
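To see that the rewrite preserves the result set (performance aside), here is a tiny SQLite check from Python; the `excl` table is a made-up stand-in for the unnamed `table` in the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TableB (columnB TEXT, columnC TEXT);
    CREATE TABLE excl (field TEXT);
    INSERT INTO TableB VALUES ('3764301','a'), ('3764301','b'), ('other','c');
    INSERT INTO excl VALUES ('b');
""")
q_not_in = """SELECT columnC FROM TableB
              WHERE columnB = '3764301'
                AND columnC NOT IN (SELECT field FROM excl)
              ORDER BY columnC"""
q_not_exists = """SELECT columnC FROM TableB
                  WHERE columnB = '3764301'
                    AND NOT EXISTS (SELECT 1 FROM excl
                                    WHERE excl.field = TableB.columnC)
                  ORDER BY columnC"""
# both forms exclude 'b' and return only 'a'
assert con.execute(q_not_in).fetchall() == con.execute(q_not_exists).fetchall()
```

One caveat worth keeping in mind: NOT IN and NOT EXISTS diverge when the subquery can return NULLs, since NOT IN then matches nothing.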
Recommendation 2
To deal with the concurrency, consider doing SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before the query. This may keep one connection from interfering with another.
Recommendation 3
Show us the EXPLAIN, the indexes (SHOW CREATE TABLE) (what you gave is not sufficient), and the WHERE clauses so we can critique the indexes.
Recommendation 4
It might help for TableB to have a composite INDEX(ColumnB, ColumnC) in that order.
What I can see here is that a HUGE temporary table is being built for each query. Consider a different architecture.

Ordering in MySQL Bogs Down

I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not been already read. It has been working nicely and it has been quite speedy, overall. However, this afternoon, the performance has degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I present this query:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
WHERE ciid NOT
IN (
SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)
AND (
cid =117
OR cid =308
OR cid =310
)
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
The list of possible cid's varies and could be quite a bit more. In any case, I noted that about 2-3 seconds of the total time to make the query is devoted to "ORDER BY." If I remove that, it only takes about a half second to give me the query back. If I drop the subquery, the performance goes back to normal... but the subquery didn't seem to be problematic until just this afternoon, after working fine for a week or so.
Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.
Update: Incidentally, if anyone has suggestions on how to do multiple queries or some other technique that would be more efficient to accomplish what I am trying to do, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the order by before the join to make it apply to the real table and not the derived table? Would that be more efficient?
Here is the latest version of the query, derived from suggestions from #Gordon, below
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
LEFT JOIN (
SELECT ciid, dateRead
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE (
alreadyRead.ciid IS NULL
)
AND `cid`
IN ( 6648, 329, 323, 6654, 6647 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
Also, I should mention what my db structure looks like with regards to these two tables -- maybe someone can spot something odd about the structure:
CREATE TABLE IF NOT EXISTS `channelitem` (
`newsversion` int(11) NOT NULL DEFAULT '0',
`cid` int(11) NOT NULL DEFAULT '0',
`ciid` int(11) NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`url` varchar(222) DEFAULT NULL,
`creationdate` datetime DEFAULT NULL,
`urgent` varchar(10) DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`author` varchar(255) NOT NULL,
PRIMARY KEY (`ciid`),
KEY `newsversion` (`newsversion`),
KEY `cid` (`cid`),
KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;
CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
`ciid` int(11) NOT NULL,
`uid` int(11) NOT NULL,
`dateRead` datetime NOT NULL,
PRIMARY KEY (`ciid`,`uid`),
KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
It never hurts to try the left outer join version of such a query:
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM `channelitem` ci left outer join
(SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
) cr
on ci.ciid = cr.ciid
where cr.ciid is null and
ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC
LIMIT 0 , 100
This query will be faster with an index on uninet_channelitem_read(ciid) and probably on channelitem(cid, ciid, createddate).
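For completeness, a small SQLite session from Python (toy data, SQLite standing in for MySQL/MyISAM) showing that the LEFT JOIN ... IS NULL anti-join returns the same unread rows as the original NOT IN form:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE channelitem (ciid INTEGER PRIMARY KEY, cid INT);
    CREATE TABLE uninet_channelitem_read (ciid INT, uid INT);
    INSERT INTO channelitem VALUES (1,117), (2,117), (3,308), (4,999);
    INSERT INTO uninet_channelitem_read VALUES (2,1030), (3,9999);
""")
not_in = con.execute("""
    SELECT ciid FROM channelitem
    WHERE ciid NOT IN (SELECT ciid FROM uninet_channelitem_read
                       WHERE uid = '1030')
      AND cid IN (117, 308, 310)
    ORDER BY ciid""").fetchall()
anti_join = con.execute("""
    SELECT ci.ciid FROM channelitem AS ci
    LEFT JOIN (SELECT ciid FROM uninet_channelitem_read
               WHERE uid = '1030') AS cr
      ON ci.ciid = cr.ciid
    WHERE cr.ciid IS NULL
      AND ci.cid IN (117, 308, 310)
    ORDER BY ci.ciid""").fetchall()
# item 2 was read by uid 1030, item 4 has the wrong cid
assert not_in == anti_join == [(1,), (3,)]
```

Whether the anti-join is actually faster depends on the optimizer version; on old MySQL releases it usually is, since the dependent subquery is avoided.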
The problem could be that you need to create an index on the channelitem table for the creationdate column. Indexes help a database run queries faster; see the MySQL documentation on indexing.

How to improve search performance in MySQL

I have a table that contains two bigint columns: beginNumber, endNumber, defined as UNIQUE. The ID is the Primary Key.
ID | beginNumber | endNumber | Name | Criteria
The second table contains a number. I want to retrieve the record from table1 when the Number from table2 falls between the two numbers. This is the query:
select distinct t1.Name, t1.Country
from t1
where t2.Number
BETWEEN t1.beginIpNum AND t1.endNumber
The query is taking too much time because I have so many records. I don't have experience with databases, but I read that indexing the table will improve the search, so MySQL does not have to scan every row looking for the Number, and that this can be done, for example, by having UNIQUE values. I made beginNumber and endNumber in table1 UNIQUE. Is this all I can do? Is there any possible way to improve the time? Please provide detailed answers.
EDIT:
table1:
CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`beginNumber` bigint(20) DEFAULT NULL,
`endNumber` bigint(20) DEFAULT NULL,
`Name` varchar(255) DEFAULT NULL,
`Criteria` varchar(455) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `beginNumber_UNIQUE` (`beginNumber`),
UNIQUE KEY `endNumber_UNIQUE` (`endNumber`)
) ENGINE=InnoDB AUTO_INCREMENT=327 DEFAULT CHARSET=utf8
table2:
CREATE TABLE `t2` (
`id2` int(11) NOT NULL AUTO_INCREMENT,
`description` varchar(255) DEFAULT NULL,
`Number` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id2`),
UNIQUE KEY `description_UNIQUE` (`description`)
) ENGINE=InnoDB AUTO_INCREMENT=433 DEFAULT CHARSET=utf8
This is a toy example of the tables but it shows the concerned part.
I'd suggest an index on t2.Number like this:
ALTER TABLE t2 ADD INDEX numindex(Number);
Your query won't work as written because it won't know which t2 to use. Try this:
SELECT DISTINCT t1.Name, t1.Criteria
FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE t2.Number BETWEEN t1.beginNumber AND t1.endNumber);
Without the t2.Number index EXPLAIN gives this query plan:
1 PRIMARY t1 ALL 1 Using where; Using temporary
2 DEPENDENT SUBQUERY t2 ALL 1 Using where
With an index on t2.Number, you get this plan:
PRIMARY t1 ALL 1 Using where; Using temporary
DEPENDENT SUBQUERY t2 index numindex numindex 9 1 Using where; Using index
The important part to understand is that an ALL comparison is slower than an index comparison.
This is a good place to use a B-tree index. B-tree indexes are best when you often sort or use BETWEEN on a column. (Note that InnoDB indexes are B-trees by default; USING HASH applies mainly to MEMORY tables.)
CREATE INDEX index_name
ON table_name (column_name)
USING BTREE
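A quick end-to-end check of the EXISTS-based range query above, run in SQLite from Python with made-up data (plain CREATE INDEX is used there, since SQLite indexes are B-trees already):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, beginNumber INT,
                     endNumber INT, Name TEXT, Criteria TEXT);
    CREATE TABLE t2 (id2 INTEGER PRIMARY KEY, Number INT);
    CREATE INDEX numindex ON t2 (Number);
    INSERT INTO t1 (beginNumber, endNumber, Name, Criteria)
        VALUES (1, 10, 'A', 'x'), (11, 20, 'B', 'y');
    INSERT INTO t2 (Number) VALUES (15);
""")
got = con.execute("""
    SELECT DISTINCT t1.Name, t1.Criteria
    FROM t1
    WHERE EXISTS (SELECT * FROM t2
                  WHERE t2.Number BETWEEN t1.beginNumber
                                      AND t1.endNumber)
""").fetchall()
# 15 falls inside [11, 20], so only row B matches
assert got == [('B', 'y')]
```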

MYSQL Value Difference optimization

Hello guys,
I'm running a very large database (currently >5 million rows). My database stores custom generated numbers (which numbers and how they are composed doesn't really matter here) and the corresponding date for each one. In addition, an ID is stored for every product (meaning one product can have multiple entries for different dates in my database; the primary key is composite). Now I want to SELECT the top 10 IDs with the largest difference in their numbers over the last two days. Currently I try to achieve this using JOINs, but since I have that many rows this is far too slow. How could I speed up the whole operation?
SELECT
d1.place,d2.place,d1.ID
FROM
daily
INNER JOIN
daily AS d1 ON d1.date = CURDATE()
INNER JOIN
daily as d2 ON d2.date = DATE_ADD(CURDATE(), INTERVAL -1 DAY)
ORDER BY
d2.code-d1.code LIMIT 10
EDIT: That's what my structure looks like:
CREATE TABLE IF NOT EXISTS `daily` (
`ID` bigint(40) NOT NULL,
`source` char(20) NOT NULL,
`date` date NOT NULL,
`code` int(11) NOT NULL,
`cc` char(2) NOT NULL,
PRIMARY KEY (`ID`,`source`,`date`,`cc`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
That's the output of the EXPLAIN statement:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE d1 ALL PRIMARY NULL NULL NULL 5150350 Using where; Using temporary; Using filesort
1 SIMPLE d2 ref PRIMARY PRIMARY 8 mytable.d1.ID 52 Using where
How about this?
SELECT
d1.ID, d1.place, d2.place
FROM
daily AS d1
CROSS JOIN
daily AS d2
USING (ID)
WHERE
d1.date = CURDATE()
AND d2.date = CURDATE() - INTERVAL 1 DAY
ORDER BY
d2.code - d1.code DESC
LIMIT
10
Some thoughts about your table structure.
`ID` bigint(40) NOT NULL,
Why BIGINT? You would need to be doing 136 inserts/s, 24 hours a day, 7 days a week for a year to exhaust the range of INT. And before you get halfway there, your application will probably need a professional DBA anyway.
Remember, a smaller primary index leads to faster lookups - which brings us to:
PRIMARY KEY (`ID`,`source`,`date`,`cc`)
Why? A single-column PK on the ID column should be enough. If you need indexes on other columns, create additional indexes (and do it wisely). As it is, you basically have a covering index for the entire table... which is like having the entire table in the index.
Last but not least: where is the place column? You've used it in your query (and so did I in mine), but it's nowhere to be seen.
Proposed table structure:
CREATE TABLE IF NOT EXISTS `daily` (
`ID` int(10) UNSIGNED NOT NULL, -- usually AUTO_INCREMENT is used as well
`source` char(20) NOT NULL,
`date` date NOT NULL,
`code` int(11) NOT NULL,
`cc` char(2) NOT NULL,
PRIMARY KEY (`ID`),
KEY `ID_date` (`ID`,`date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
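As a sketch of the intended computation (independent of SQL), the top-10-by-difference logic that the corrected query implements can be expressed in a few lines of Python; `today` and `yesterday` are hypothetical dicts mapping ID to code, and only IDs present on both days are ranked, mirroring the inner join:

```python
def top_diffs(today, yesterday, n=10):
    """Return the IDs with the largest yesterday-to-today code difference,
    mirroring the answer's ORDER BY d2.code - d1.code DESC LIMIT 10
    (d1 = today, d2 = yesterday). Ties are broken by ID for determinism."""
    diffs = [(yesterday[i] - today[i], i) for i in today if i in yesterday]
    diffs.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in diffs[:n]]

# e.g. top_diffs({1: 5, 2: 10, 3: 1}, {1: 9, 2: 11, 3: 0}, n=2) -> [1, 2]
```

This can double as a cross-check when validating the SQL rewrite against a small extract of the daily table.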