Good afternoon all. I am coming to you in the hopes that you can provide some direction with a MYSQL optimization problem that I am having. First, a few system specifications.
MYSQL version: 5.2.47 CE
WampServer v 2.2
Computer:
Samsung QX410 (laptop)
Windows 7
Intel i5 (2.67 Ghz)
4GB RAM
I have two tables:
“Delta_Shares” contains stock trade data, and contains two columns of note. “Ticker” is Varchar(45), “Date_Filed” is Date. This table has about 3 million rows (all unique). I have an index on this table “DeltaSharesTickerDateFiled” on (Ticker, Date_Filed).
“Stock_Data” contains two columns of note. “Ticker” is Varchar(45), “Value_Date” is Date. This table has about 19 million rows (all unique). I have an index on this table “StockDataIndex” on (Ticker, Value_Date).
I am attempting to update the “Delta_Shares” table by looking up information from the Stock_Data table. The following query takes more than 4 hours to run.
update delta_shares A, stock_data B
set A.price_at_file = B.stock_close
where A.ticker = B.ticker
and A.date_filed = B.value_Date;
Is the excessive runtime the natural result of the large number of rows, poor indexing, a bad machine, bad SQL writing, or all of the above? Please let me know if any additional information would be useful (I am not overly familiar with MySQL, though this issue has moved me significantly down the path of optimization). I greatly appreciate any thoughts or suggestions.
UPDATED with "EXPLAIN SELECT"
1(id) SIMPLE(seltype) A(table) ALL(type) DeltaSharesTickerDateFiled(possible_keys) ... 3038011(rows)
1(id) SIMPLE(seltype) B(table) ref(type) StockDataIndex(possible_keys) StockDataIndex(key) 52(key_len) 13ffeb2013.A.ticker,13ffeb2013.A.date_filed(ref) 1(rows) Using where
UPDATED with table describes.
Stock_Data Table:
idstock_data int(11) NO PRI auto_increment
ticker varchar(45) YES MUL
value_date date YES
stock_close decimal(10,2) YES
Delta_Shares Table:
iddelta_shares int(11) NO PRI auto_increment
cik int(11) YES MUL
ticker varchar(45) YES MUL
date_filed_identify int(11) YES
Price_At_File decimal(10,2) YES
delta_shares int(11) YES
date_filed date YES
marketcomparable varchar(45) YES
market_comparable_price decimal(10,2) YES
industrycomparable varchar(45) YES
industry_comparable_price decimal(10,2) YES
Index from Delta_Shares:
delta_shares 0 PRIMARY 1 iddelta_shares A 3095057 BTREE
delta_shares 1 DeltaIndex 1 cik A 18 YES BTREE
delta_shares 1 DeltaIndex 2 date_filed_identify A 20633 YES BTREE
delta_shares 1 DeltaSharesAllIndex 1 cik A 18 YES BTREE
delta_shares 1 DeltaSharesAllIndex 2 ticker A 619011 YES BTREE
delta_shares 1 DeltaSharesAllIndex 3 date_filed_identify A 3095057 YES BTREE
delta_shares 1 DeltaSharesTickerDateFiled 1 ticker A 11813 YES BTREE
delta_shares 1 DeltaSharesTickerDateFiled 2 date_filed A 3095057 YES BTREE
Index from Stock_Data:
stock_data 0 PRIMARY 1 idstock_data A 18683114 BTREE
stock_data 1 StockDataIndex 1 ticker A 14676 YES BTREE
stock_data 1 StockDataIndex 2 value_date A 18683114 YES BTREE
There are a few benchmarks you could make to see where the bottleneck is. For example, try updating the field to a constant value and see how long it takes (obviously, you'll want to make a copy of the database to do this on). Then try a select query that doesn't update, but just selects the values to be updated and the values they will be updated to.
Benchmarks like these will usually tell you whether you're wasting your time trying to optimize or whether there is much room for improvement.
As for the memory, here's a rough idea of what you're looking at:
varchar fields take a 1- or 2-byte length prefix plus the actual length, and datetime fields are 8 bytes (your columns are DATE, which is only 3 bytes, so this overestimates). So let's make an extremely liberal guess that your varchar fields in the Stock_Data table average around 42 bytes. With the datetime field that adds up to 50 bytes per row.
50 bytes x 20 million rows = about 0.93 GiB
So if this process is the only thing going on in your machine then I don't see memory as being an issue since you can easily fit all the data from both tables that the query is working with in memory at one time. But if there are other things going on then it might be a factor.
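The estimate above is simple enough to check in a few lines. This is only a sketch of the arithmetic, using the guessed figures from this answer (42 bytes of varchar data, 8 bytes for the date, 20 million rows); none of them are measurements of the real table.

```python
# Back-of-envelope size of the Stock_Data columns the join touches.
# All figures are the rough guesses from the answer, not measurements.
BYTES_PER_ROW = 42 + 8        # ~42 bytes of varchar data + 8-byte date/time value
ROWS = 20_000_000             # "about 19 million rows", rounded up

total_bytes = BYTES_PER_ROW * ROWS
total_gib = total_bytes / 2**30   # 1 GiB = 2**30 bytes

print(f"{total_bytes:,} bytes = {total_gib:.2f} GiB")
# prints: 1,000,000,000 bytes = 0.93 GiB
```

Even with these deliberately generous numbers, the working set stays well under the 4 GB of RAM in the laptop.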
Try running ANALYZE TABLE on both tables and use STRAIGHT_JOIN instead of the implicit join. Just a guess, but it sounds like a confused optimiser.
Related
I have spent 4 hours googling and trying all sorts of indexes, mysqlyog, reading, searching etc. When I add the GROUP BY the query changes from 0.002 seconds to 0.093 seconds. Is this normal and acceptable? Or can I alter the indexes and/or the query?
Table:
uniqueid int(11) NO PRI NULL auto_increment
ip varchar(64) YES NULL
lang varchar(16) YES MUL NULL
timestamp int(11) YES MUL NULL
correct decimal(12,2) YES NULL
user varchar(32) YES NULL
timestart int(11) YES NULL
timeend int(11) YES NULL
speaker varchar(64) YES NULL
postedAnswer int(32) YES NULL
correctAnswerINT int(32) YES NULL
Query:
SELECT
SQL_NO_CACHE
user,
lang,
COUNT(*) AS total,
SUM(correct) AS correct,
ROUND(SUM(correct) / COUNT(*) * 100) AS score,
TIMESTAMP
FROM
maths_score
WHERE TIMESTAMP > 1
AND lang = 'es'
GROUP BY USER
ORDER BY (
(SUM(correct) / COUNT(*) * 100) + SUM(correct)
) DESC
LIMIT 500
explain extended:
id select_type table type possible_keys key key_len ref rows filtered Extra
------ ----------- ----------- ------ ------------------------- -------------- ------- ------ ------ -------- ---------------------------------------------------------------------
1 SIMPLE maths_score ref scoretable,fulltablething fulltablething 51 const 10631 100.00 Using index condition; Using where; Using temporary; Using filesort
Current indexes (I have tried many)
Keyname Type Unique Packed Column Cardinality Collation Null Comment
uniqueid BTREE Yes No uniqueid 21262 A No
scoretable BTREE No No timestamp 21262 A Yes
lang 21262 A Yes
fulltablething BTREE No No lang 56 A Yes
timestamp 21262 A Yes
user 21262 A Yes
Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.
Do you have INDEX(lang, TIMESTAMP)? (Why.) It is likely to help both versions of the query.
Without the GROUP BY, you get one row, correct? With the GROUP BY, you get many rows, correct? Guess what, it takes more time to deliver more rows.
In addition, the GROUP BY probably involves an extra sort. The ORDER BY involves a sort, but in one case there is only 1 row to sort, hence faster. If there are a million USERs, then the ORDER BY will need to sort a million rows, only to deliver 500.
Please provide EXPLAIN SELECT ... for each case -- you will see some of what I am saying.
So you ran the query without GROUP BY and got one result row in 0.002 secs. Then you added GROUP BY (and ORDER BY obviously) and ended up with multiple result rows in 0.093 secs.
In order to produce this result, the DBMS must somehow order your records by user or create buckets per user, so as to get the record count, sum, etc. per user. This of course takes much more time than just running through the table, counting records and summing up a value unconditionally. Finally, the DBMS must sort these results again. I am not surprised this runs much longer.
The most appropriate index for this query should be:
create index idx on maths_score (lang, timestamp, user, correct);
This is a covering index, starting with the columns in WHERE, continuing with the column in GROUP BY and ending with all other columns used in the query.
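To see what "covering" means in practice, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made purely so the example is runnable; the optimizers differ, and MySQL reports this as "Using index" in EXPLAIN). The sample rows and the `correct_sum` alias are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; only the idea of a covering index carries over.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE maths_score (
        uniqueid  INTEGER PRIMARY KEY,
        lang      TEXT,
        timestamp INTEGER,
        correct   NUMERIC,
        user      TEXT
    );
    -- WHERE columns first, then the GROUP BY column, then the remaining column.
    CREATE INDEX idx ON maths_score (lang, timestamp, user, correct);
""")
conn.executemany(
    "INSERT INTO maths_score (lang, timestamp, correct, user) VALUES (?, ?, ?, ?)",
    [("es", 10, 1, "ana"), ("es", 11, 0, "ana"), ("es", 12, 1, "luis"), ("en", 13, 1, "bob")],
)

query = """
    SELECT user, COUNT(*) AS total, SUM(correct) AS correct_sum
    FROM maths_score
    WHERE timestamp > 1 AND lang = 'es'
    GROUP BY user
    ORDER BY (SUM(correct) * 100.0 / COUNT(*)) + SUM(correct) DESC
    LIMIT 500
"""
rows = conn.execute(query).fetchall()
# The fourth column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query))

print(rows)
print(plan)
```

Every column the query reads (lang, timestamp, user, correct) is present in the index, so the plan can be satisfied from the index alone without visiting the table rows.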
SELECT ticker.ticker_id,
ticker.ticker_code,
inter_day_ticker_candle_price_history.close AS previousDayClose
FROM inter_day_ticker_candle_price_history
INNER JOIN
(SELECT MAX(inter_day_ticker_candle_price_history.candle_price_history_id) AS candle_price_history_id
FROM inter_day_ticker_candle_price_history
WHERE inter_day_ticker_candle_price_history.close>0
GROUP BY inter_day_ticker_candle_price_history.ticker_id) derivedTable
ON inter_day_ticker_candle_price_history.candle_price_history_id = derivedTable.candle_price_history_id
RIGHT JOIN ticker ON ticker.ticker_id = inter_day_ticker_candle_price_history.ticker_id
WHERE ticker.is_active = 1
Please suggest any other technique I can apply here to reduce the time.
This is the table structure:
Field Type Null Key Default Extra
----------------------- ------------- ------ ------ ------- ----------------
candle_price_history_id int(8) NO PRI (NULL) auto_increment
ticker_id bigint(11) NO MUL (NULL)
candle_interval int(11) YES 1
trade_date datetime YES (NULL)
trade_price decimal(16,2) YES (NULL)
trade_size decimal(30,2) YES (NULL)
open decimal(16,2) YES (NULL)
high decimal(16,2) YES (NULL)
low decimal(16,2) YES (NULL)
close decimal(16,2) YES (NULL)
volume bigint(30) YES (NULL)
creation_date datetime YES (NULL)
is_ebabled bit(1) YES b'1'
It would look more natural to select from the ticker table first, then find the latest history entry and then join that:
SELECT
t.ticker_id,
t.ticker_code,
h.close AS previousDayClose
FROM ticker t
LEFT JOIN
(
SELECT ticker_id, MAX(candle_price_history_id) AS candle_price_history_id
FROM inter_day_ticker_candle_price_history
WHERE close > 0
GROUP BY ticker_id
) m on m.ticker_id = t.ticker_id
LEFT JOIN inter_day_ticker_candle_price_history h
ON h.candle_price_history_id = m.candle_price_history_id
WHERE t.is_active = 1;
However, your query should also work.
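As a quick sanity check of what the rewritten query returns, here is a sketch on a few invented rows. It runs the same SQL through Python's sqlite3 module, which is only a stand-in for MySQL (an assumption so the example is self-contained); the tables are cut down to the columns the query uses and the data is made up.

```python
import sqlite3

# SQLite stands in for MySQL; tables are reduced to the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticker (ticker_id INTEGER PRIMARY KEY, ticker_code TEXT, is_active INTEGER);
    CREATE TABLE inter_day_ticker_candle_price_history (
        candle_price_history_id INTEGER PRIMARY KEY,
        ticker_id INTEGER NOT NULL,
        close REAL
    );
    -- Ticker 1 has history, ticker 2 has none, ticker 3 is inactive.
    INSERT INTO ticker VALUES (1, 'AAA', 1), (2, 'BBB', 1), (3, 'CCC', 0);
    INSERT INTO inter_day_ticker_candle_price_history VALUES
        (1, 1, 10.5), (2, 1, 11.0), (3, 1, 0), (4, 3, 7.0);
""")

rows = conn.execute("""
    SELECT t.ticker_id, t.ticker_code, h.close AS previousDayClose
    FROM ticker t
    LEFT JOIN (
        SELECT ticker_id, MAX(candle_price_history_id) AS candle_price_history_id
        FROM inter_day_ticker_candle_price_history
        WHERE close > 0
        GROUP BY ticker_id
    ) m ON m.ticker_id = t.ticker_id
    LEFT JOIN inter_day_ticker_candle_price_history h
        ON h.candle_price_history_id = m.candle_price_history_id
    WHERE t.is_active = 1
    ORDER BY t.ticker_id
""").fetchall()

print(rows)
# prints: [(1, 'AAA', 11.0), (2, 'BBB', None)]
```

The latest row for ticker 1 has close = 0, so it is skipped and the previous close (11.0) is returned; an active ticker without any history still appears, with NULL.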
Make sure to have appropriate indexes. I'd suggest:
create index idx_ticker on ticker(is_active,
ticker_id,
ticker_code);
and
create index idx_history on inter_day_ticker_candle_price_history(ticker_id,
close,
candle_price_history_id);
or
create index idx_history on inter_day_ticker_candle_price_history(close,
ticker_id,
candle_price_history_id);
(The order of columns may make a difference, so you may want to try both versions for the history index. Well, you can of course create both indexes at the same time with different names and see which one gets used.)
Generally, creating the appropriate indexes will speed up queries with multiple filtering conditions.
For instance, creating an index on ticker_id may be the key to making the query faster.
On the other hand, creating indexes on close and is_active could help, but only if the rows with is_active = 1 make up roughly 10% or less of the records in the table.
Also, when you look up a single ticker, you could replace the MAX function with ORDER BY candle_price_history_id DESC LIMIT 1, as the table is already ordered by candle_price_history_id. (This does not replace MAX inside the GROUP BY, which needs one value per ticker.)
This seems to be a "groupwise max" problem. For optimization techniques for that pattern, see http://mysql.rjweb.org/doc.php/groupwise_max .
See my comments on shrinking the table size. I assume this could be a huge table, possibly bigger than will fit in RAM?
If this is InnoDB, innodb_buffer_pool_size needs to be about 70% of available RAM.
If (ticker_id, trade_date) is unique, then make it the PRIMARY KEY and get rid of id completely. The order is important -- this clusters all the rows for a given ticker together, thereby cutting down on I/O. (If you are currently I/O-bound, this may give you a 10-fold speedup.)
Provide EXPLAIN SELECT .... You need for the query (as written) to start with the derived query. LEFT JOIN will not allow that.
Consider getting rid of inactive rows and rows with close <= 0.
I have a table with approximately 120k rows, which contains a field with a BLOB (not more than 1MB per entry, usually much less). My problem is that whenever I run a query asking for any columns on this table (not including the BLOB one), if the filesystem cache is empty, it takes approximately 40 seconds to complete. All subsequent queries on the same table require less than 1 second (testing from the command line client, on the server itself). The number of rows returned by the queries varies from an empty set to 60k+.
I have eliminated the query cache so it has nothing to do with it.
The table is myisam but I also tried to change it to innodb (and setting ROW_FORMAT=COMPACT), but without any luck.
If I remove the BLOB column, the query is always fast.
So I would assume that the server reads the blobs from the disk (or parts of them) and the filesystem caches them. The problem is that on a server with high traffic and limited memory, the filesystem cache is refreshed every once in a while, so this particular query keeps causing me trouble.
So my question is, is there a way to considerably speed things up, without removing the blob column from the table?
Here are 2 example queries, run one after the other, along with explain, indexes and table definition:
mysql> SELECT ct.score FROM completed_tests ct where ct.status != 'deleted' and ct.status != 'failed' and score < 100;
Empty set (48.21 sec)
mysql> SELECT ct.score FROM completed_tests ct where ct.status != 'deleted' and ct.status != 'failed' and score < 99;
Empty set (1.16 sec)
mysql> explain SELECT ct.score FROM completed_tests ct where ct.status != 'deleted' and ct.status != 'failed' and score < 99;
+----+-------------+-------+-------+---------------+--------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------+---------+------+-------+-------------+
| 1 | SIMPLE | ct | range | status,score | status | 768 | NULL | 82096 | Using where |
+----+-------------+-------+-------+---------------+--------+---------+------+-------+-------------+
1 row in set (0.00 sec)
mysql> show indexes from completed_tests;
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| completed_tests | 0 | PRIMARY | 1 | id | A | 583938 | NULL | NULL | | BTREE | |
| completed_tests | 1 | users_login | 1 | users_LOGIN | A | 11449 | NULL | NULL | YES | BTREE | |
| completed_tests | 1 | tests_ID | 1 | tests_ID | A | 140 | NULL | NULL | | BTREE | |
| completed_tests | 1 | status | 1 | status | A | 3 | NULL | NULL | YES | BTREE | |
| completed_tests | 1 | timestamp | 1 | timestamp | A | 291969 | NULL | NULL | | BTREE | |
| completed_tests | 1 | archive | 1 | archive | A | 1 | NULL | NULL | | BTREE | |
| completed_tests | 1 | score | 1 | score | A | 783 | NULL | NULL | YES | BTREE | |
| completed_tests | 1 | pending | 1 | pending | A | 1 | NULL | NULL | | BTREE | |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> show create table completed_tests;
+-----------------+--------------------------------------
| Table | Create Table |
+-----------------+--------------------------------------
| completed_tests | CREATE TABLE `completed_tests` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`users_LOGIN` varchar(100) DEFAULT NULL,
`tests_ID` mediumint(8) unsigned NOT NULL DEFAULT '0',
`test` longblob,
`status` varchar(255) DEFAULT NULL,
`timestamp` int(10) unsigned NOT NULL DEFAULT '0',
`archive` tinyint(1) NOT NULL DEFAULT '0',
`time_start` int(10) unsigned DEFAULT NULL,
`time_end` int(10) unsigned DEFAULT NULL,
`time_spent` int(10) unsigned DEFAULT NULL,
`score` float DEFAULT NULL,
`pending` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `users_login` (`users_LOGIN`),
KEY `tests_ID` (`tests_ID`),
KEY `status` (`status`),
KEY `timestamp` (`timestamp`),
KEY `archive` (`archive`),
KEY `score` (`score`),
KEY `pending` (`pending`)
) ENGINE=InnoDB AUTO_INCREMENT=117996 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
1 row in set (0.00 sec)
I originally posted this as "mysql query slow at first fast afterwards", but I now have more information, so I am reposting it as a different question.
I also posted this on the mysql forum, but I haven't heard back
Thanks in advance as always
The design of BLOB (=TEXT) storage in MySQL seems to be totally flawed and counter-intuitive. I ran a couple of times into the same problem and was unable to find any authoritative explanation. The most detailed analysis I've finally found is this post from 2010: http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/
The general belief and expectation is that BLOBs/TEXTs are stored outside main row storage (e.g., see this answer). This is NOT TRUE, though. There are several issues here (I'm basing this on the article given above):
If the size of a BLOB item is several KB, it is included directly in row data. Consequently, even if you SELECT only non-BLOB columns, the engine still has to load all your BLOBs from disk. Say, you have 1M rows with 100 bytes of non-blob data each and 5000 bytes of blob data. You SELECT all non-blob columns and expect that MySQL would read from disk around 100-120 bytes per row, which is 100-120 MB in total (+20 for BLOB address). However, the reality is that MySQL stores all BLOBs in the same disk blocks as rows, so they all must be read together even if not used, and so the size of data read from disk is around 5100 MB = 5 GB - this is 50 times more than you would expect and means 50 times slower query execution.
Of course, this design has an advantage: when you need all the columns, including the blob one, SELECT query is faster when blobs are stored with the row than when stored externally: you avoid (sometimes) 1 additional page access per row. However, this is not a typical use case for BLOBs and DB engine should not be optimized towards this case. If your data is so small that it fits in a row and you're fine with loading it in every query no matter if needed or not - then you would use VARCHAR type instead of BLOB/TEXT.
Even if for some reason (long row or long blob) the BLOB value is stored externally, its 768-byte prefix is still kept in the row itself. Let's take the previous example: you have 100 bytes of non-blob data in each row, but now the blob column holds items of 1 MB each so they must be kept externally. SELECT of non-blob columns will have to read roughly 800 bytes per row (non-blobs + blob prefix), instead of 100-120 - this is again 7 times larger disk transfer than you'd expect, and 7x slower query execution.
External BLOB storage is ineffective in its usage of disk space: it allocates space in blocks of 16 KB and single block cannot hold multiple items, so if your blobs are small and take, for instance, 8 KB each, the actual space allocated is twice that large.
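The numbers in the points above can be reproduced with a few lines of arithmetic. This is just a sketch using the example figures from this answer (1M rows, 100 bytes of non-blob data, a 20-byte pointer, 5000-byte inline blobs, a 768-byte prefix); "MB" here means 10**6 bytes. Note that the answer's "50 times" compares against the 100-byte baseline, while the sketch compares against 120 bytes (data plus pointer), which gives a somewhat smaller ratio.

```python
# Figures taken from the examples in the text; "MB" here means 10**6 bytes.
ROWS = 1_000_000
NON_BLOB = 100          # bytes of ordinary columns per row
POINTER = 20            # bytes for an off-page BLOB pointer
INLINE_BLOB = 5_000     # small BLOB stored inside the row
PREFIX = 768            # inline prefix kept for an externally stored BLOB

expected_mb = ROWS * (NON_BLOB + POINTER) / 10**6     # what one hopes to read
inline_mb = ROWS * (NON_BLOB + INLINE_BLOB) / 10**6   # BLOB stored in the row
prefix_mb = ROWS * (NON_BLOB + PREFIX) / 10**6        # only the prefix in the row

print(f"expected:        {expected_mb:.0f} MB")
print(f"inline BLOBs:    {inline_mb:.0f} MB ({inline_mb / expected_mb:.1f}x)")
print(f"768-byte prefix: {prefix_mb:.0f} MB ({prefix_mb / expected_mb:.1f}x)")

# External storage in whole 16 KB pages: an 8 KB value still occupies a full page.
PAGE = 16 * 1024
blob = 8 * 1024
print(f"allocated/used:  {PAGE / blob:.0f}x")
```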
I hope this design will get fixed one day: MySQL will store ALL blobs - big and small - in external storage, without any prefixes kept in DB, with external storage allocation being efficient for items of all sizes. Before this happens, separating out BLOB/TEXT columns seems the only reasonable solution - separating out to another table or to the filesystem (each BLOB value kept as a file).
[UPDATE 2019-10-15]
InnoDB documentation provides now an ultimate answer to the issue discussed above:
https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html
The case of storing 768-byte prefixes of BLOB/TEXT values inline holds indeed for COMPACT row format. According to the docs, "For each non-NULL variable-length field (...) The internal part is 768 bytes".
However, you can use DYNAMIC row format instead. With this format:
"InnoDB can store long variable-length column values (...) fully off-page, with the clustered index record containing only a 20-byte pointer to the overflow page. (...) TEXT and BLOB columns that are less than or equal to 40 bytes are stored in line."
Here, a BLOB value can occupy up to 40 bytes of inline storage, which is much better than 768 bytes as in the COMPACT mode, and looks like a lot more reasonable approach in the case you want to mix BLOB and non-BLOB types in a table and still be able to scan multiple rows pretty fast. Moreover, the extended (over 20 bytes) inline storage is used ONLY for values sized between 20-40 bytes; for larger values, only the 20-byte pointer is stored (no prefix), unlike in the COMPACT mode. Hence, the extended 40-byte storage is used rarely in practice and one can safely assume the average size of inline storage to be just 20 bytes (or less, if you tend to keep many small values of less than 20B in your BLOB). All in all, it seems DYNAMIC row format, rather than COMPACT, should be the default choice in most cases to achieve good predictable performance of BLOB columns in InnoDB.
An example how to check the actual physical storage in InnoDB can be found here:
https://dba.stackexchange.com/a/210430/177276
As to MyISAM, it apparently does NOT provide off-page storage for BLOBs at all (just inline). Check here for more info:
https://dev.mysql.com/doc/refman/5.7/en/dynamic-format.html
https://forums.mysql.com/read.php?24,105964,267596#msg-267596
I was doing research on this issue for a while. Many people recommend keeping the blob in a separate table with only a primary key, and storing the blob's metadata in another table with a foreign key to the blob table. With this, performance will be considerably higher.
Adding a composite index on the two relevant columns should allow these queries to be executed without accessing the table data directly.
CREATE INDEX `IX_score_status` ON `completed_tests` (`score`, `status`);
If you are able to switch to MariaDB then you can make the most of the table elimination optimisations. This would allow you to split the BLOB field out into its own table and use a view to recreate your existing table structure using a LEFT JOIN. This way it will only access the BLOB data if it is explicitly required for the executing query.
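The split described in the answers above might look roughly like this. It is a sketch only: it uses Python's sqlite3 module as a stand-in (an assumption so it can run anywhere), the table names `completed_tests_meta` and `completed_tests_blob` are hypothetical, and only a few of the original columns are kept. Whether the view avoids touching the BLOB table depends on the engine's join elimination, which this sketch does not demonstrate.

```python
import sqlite3

# SQLite stands in for MySQL/MariaDB; the *_meta / *_blob names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Narrow table: everything except the BLOB.
    CREATE TABLE completed_tests_meta (
        id     INTEGER PRIMARY KEY,
        status TEXT,
        score  REAL
    );
    -- BLOB table: one row per test, keyed by the same id.
    CREATE TABLE completed_tests_blob (
        id   INTEGER PRIMARY KEY REFERENCES completed_tests_meta(id),
        test BLOB
    );
    -- View that recreates the original shape for code that still needs it.
    CREATE VIEW completed_tests AS
        SELECT m.id, m.status, m.score, b.test
        FROM completed_tests_meta m
        LEFT JOIN completed_tests_blob b ON b.id = m.id;
""")
conn.execute("INSERT INTO completed_tests_meta VALUES (1, 'completed', 87.5)")
conn.execute("INSERT INTO completed_tests_blob VALUES (1, ?)", (b"\x00" * 1024,))

# Filtering queries touch only the narrow table.
scores = conn.execute(
    "SELECT score FROM completed_tests_meta "
    "WHERE status NOT IN ('deleted', 'failed') AND score < 100"
).fetchall()
# The BLOB is fetched only when a caller asks for it.
(size,) = conn.execute("SELECT length(test) FROM completed_tests WHERE id = 1").fetchone()

print(scores, size)
# prints: [(87.5,)] 1024
```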
Just add an index (or indexes) on the fields used in the WHERE clause of queries against a table with blobs.
e.g. You have 2 tables with those fields
users : USERID, NAME, ...
userphotos : BLOBID, BLOB, USERNO, ...
select * from userphotos where USERNO=123456;
Normally this works fine. But when you have many large images (e.g. BLOB, MEDIUMBLOB or LONGBLOB, more than 5GB in total), this takes a long time (minutes or more) when BLOBID, the primary key, is the only indexed column.
It appears that MySQL scans the whole data, including the images, if the column used in the WHERE clause of the BLOB table has no index. As your data grows larger, that takes more and more time. If you create an index on USERNO, the query speeds up and no longer depends on the total size of the data.
Solution:
**Add Index to the USERNO at userphotos**
As an answer to your question: you should create an index on ct.status.
tl;dr:
DB Partitioning with Primary Key
Index size problem.
DB size grows around 1-3 GB per day
Raid setup.
Do you have experience with Hypertable?
Long Version:
I just built / bought a home server:
Xeon E3-1245 3,4 HT
32GB RAM
6x 1,5 TB WD Caviar Black 7200
I will use the Server Board INTEL S1200BTL Raid (no money left for a raid controller). http://ark.intel.com/products/53557/Intel-Server-Board-S1200BTL
The mainboard has 4x SATA 3 Gb/s ports and 2x SATA 6 Gb/s ports.
I'm not yet sure if I can set up all 6 HDDs in RAID 10;
if that's not possible, I thought of 4 HDDs in RAID 10 (MySQL DB) and 2 HDDs in RAID 0 (OS / MySQL indexes).
(If the RAID 0 breaks, it's no problem for me; I only need to secure the DB.)
About the DB:
It's a web crawler DB, where domains, URLs, links and similar data get stored.
So I thought I would partition the DB by the primary key of each table, like
(1-1000000) (1000001-2000000) and so on.
When I run search / insert / select queries on the DB, I need to scan the whole table, because some data could be in row 1 and other data in row 1000000000000.
If I partition by primary key (auto_increment) like this, will it use all my CPU cores, so that each partition is scanned in parallel? Or should I stick with one huge DB without partitions?
The DB will be very big; on my home system right now it is:
Table extract: 25,034,072 Rows
Data 2,058.7 MiB
Index 2,682.8 MiB
Total 4,741.5 MiB
Table Structure:
extract_id bigint(20) unsigned NO PRI NULL auto_increment
url_id bigint(20) NO MUL NULL
extern_link varchar(2083) NO MUL NULL
anchor_text varchar(500) NO NULL
http_status smallint(2) unsigned NO 0
Indexes:
PRIMARY BTREE Yes No extract_id 25034072
link BTREE Yes No url_id
extern_link (400) 25034072
externlink BTREE No No extern_link (400) 1788148
Table urls: 21,889,542 Rows
Data 2,402.3 MiB
Index 3,456.2 MiB
Total 5,858.4 MiB
Table Structure:
url_id bigint(20) NO PRI NULL auto_increment
domain_id bigint(20) NO MUL NULL
url varchar(2083) NO NULL
added date NO NULL
last_crawl date NO NULL
extracted tinyint(2) unsigned NO MUL 0
extern_links smallint(5) unsigned NO 0
crawl_status tinyint(11) unsigned NO 0
status smallint(2) unsigned NO 0
INDEXES:
PRIMARY BTREE Yes No url_id 21889542
domain_id BTREE Yes No domain_id 0
url (330) 21889542
extracted_status BTREE No No extracted 2
status 31
I see I could fix the externlink and link indexes; I only added externlink because I needed to query that field and was not able to use the link index. Do you see anything I could tune in the indexes? My new system will have 32 GB, but if the DB keeps growing at this speed, I will be using 90% of the RAM in a few weeks or months.
Does a packed INDEX help? (How much does performance decrease?)
The other important tables are under 500MB.
Only the URL Source table is huge: 48.6 GiB
Structure:
url_id BIGINT
pagesource mediumblob data is packed with gzip high compression
Index is only on url_id (unique).
The data in this table can be wiped once I have extracted everything I need.
Do you have any experience with Hypertable? http://hypertable.org/ (modeled on Google's Bigtable). If I move to Hypertable, would this help me with performance (extracting data / searching / inserting / selecting) and DB size? I read the site but I'm still somewhat clueless, because you can't directly compare MySQL with Hypertable. I will try it out soon, but must read the documentation first.
What I need is a solution that fits my setup, because I have no money left for any other hardware.
Thanks for the help.
Hypertable is an excellent choice for a crawl database. Hypertable is an open source, high performance, scalable database modeled after Google's Bigtable. Google developed Bigtable specifically for their crawl database. I recommend reading the Bigtable paper since it uses the crawl database as the running example.
Regarding #4 (RAID setup): it's not recommended to use RAID 5 for production servers. There is a great article about it: http://www.dbasquare.com/2012/04/02/should-raid-5-be-used-in-a-mysql-server/
I am working on an e-shop which sells products only via loans. I display 10 products per page in any category, and each product has 3 different price tags, one per loan type. Everything went well during testing and query execution time was fine, but today, when I transferred the changes to the production server, the site "collapsed" in about 2 minutes. The query that selects loan types sometimes hangs for ~10 seconds; this happens frequently, so the site can't keep up and is extremely slow. The table that stores the data has approximately 2 million records, and each select looks like this:
SELECT *
FROM products_loans
WHERE KOD IN("X17/Q30-10", "X17/12", "X17/5-24")
AND 369.27 BETWEEN CENA_OD AND CENA_DO;
3 loan types and the price that needs to be in range between CENA_OD and CENA_DO, thus 3 rows are returned.
But since I need to display 10 products per page, I need to run it through a modified select using OR, since I didn't find any other solution. I have asked about it here, but got no answer. As mentioned in the referenced post, this has to be done separately since there is no column that could be used in a join (except of course price and code, but that ended very, very badly). Here is the SHOW CREATE TABLE output; KOD and CENA_OD/CENA_DO are each indexed via INDEX.
CREATE TABLE `products_loans` (
`KOEF_ID` bigint(20) NOT NULL,
`KOD` varchar(30) NOT NULL,
`AKONTACIA` int(11) NOT NULL,
`POCET_SPLATOK` int(11) NOT NULL,
`koeficient` decimal(10,2) NOT NULL default '0.00',
`CENA_OD` decimal(10,2) default NULL,
`CENA_DO` decimal(10,2) default NULL,
`PREDAJNA_CENA` decimal(10,2) default NULL,
`AKONTACIA_SUMA` decimal(10,2) default NULL,
`TYP_VYHODY` varchar(4) default NULL,
`stage` smallint(6) NOT NULL default '1',
PRIMARY KEY (`KOEF_ID`),
KEY `CENA_OD` (`CENA_OD`),
KEY `CENA_DO` (`CENA_DO`),
KEY `KOD` (`KOD`),
KEY `stage` (`stage`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Also, selecting all loan types and filtering them later in PHP doesn't work well, since each type has over 50k records and the select takes too much time as well...
Any ideas about improving the speed are appreciated.
Edit:
Here is the explain
+----+-------------+----------------+-------+---------------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------+---------------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | products_loans | range | CENA_OD,CENA_DO,KOD | KOD | 92 | NULL | 190158 | Using where |
+----+-------------+----------------+-------+---------------------+------+---------+------+--------+-------------+
I have tried the combined index and it improved the performance on the test server from 0.44 sec to 0.06 sec. I can't access the production server from home though, so I will have to try it tomorrow.
Your issue is that you are searching for intervals which contain a point (rather than the more normal query of all points in an interval). These queries do not work well with the standard B-tree index, so instead you need to use an R-Tree index. Unfortunately MySQL doesn't allow you to select an R-Tree index on a column, but you can get the desired index by changing your column type to GEOMETRY and using the geometric functions to check if the interval contains the point.
See Quassnoi's article Adjacency list vs. nested sets: MySQL where he explains this in more detail. The use case is different, but the techniques involved are the same. Here's an extract from the relevant part of the article:
There is also a certain class of tasks that require searching for all ranges containing a known value:
Searching for an IP address in the IP range ban list
Searching for a given date within a date range
and several others. These tasks can be improved by using R-Tree capabilities of MySQL.
Try to refactor your query like:
SELECT * FROM products_loans
WHERE KOD IN("X17/Q30-10", "X17/12", "X17/5-24")
AND CENA_OD <= 369.27
AND CENA_DO >= 369.27;
(MySQL is not very smart when choosing indexes) and check the performance.
The next thing to try is adding a combined key on (KOD, CENA_OD, CENA_DO).
The next major step is to refactor your database so that products are separated from prices. This should really help.
PS: you can also migrate to PostgreSQL, which is smarter than MySQL at choosing the right indexes.
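When rewriting BETWEEN by hand, the direction of the comparisons is easy to get wrong: `x BETWEEN a AND b` means `a <= x AND b >= x`. Here is a small sketch that checks the two forms return the same rows, with a combined key on (KOD, CENA_OD, CENA_DO) in place. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption so it is runnable); the index name `kod_cena` and the sample rows are invented, and the table is reduced to the columns involved.

```python
import sqlite3

# SQLite stands in for MySQL; rows and the index name are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products_loans (
        KOEF_ID INTEGER PRIMARY KEY,
        KOD     TEXT NOT NULL,
        CENA_OD REAL,
        CENA_DO REAL
    );
    CREATE INDEX kod_cena ON products_loans (KOD, CENA_OD, CENA_DO);
    INSERT INTO products_loans VALUES
        (1, 'X17/12',     0.00, 299.99),
        (2, 'X17/12',   300.00, 499.99),
        (3, 'X17/5-24', 300.00, 499.99),
        (4, 'X17/5-24', 500.00, 999.99),
        (5, 'OTHER',    300.00, 499.99);
""")

codes = ("X17/Q30-10", "X17/12", "X17/5-24")
price = 369.27

# Original form: the price lies between the lower and upper bound.
between = conn.execute(
    "SELECT KOEF_ID FROM products_loans "
    "WHERE KOD IN (?, ?, ?) AND ? BETWEEN CENA_OD AND CENA_DO ORDER BY KOEF_ID",
    (*codes, price),
).fetchall()
# Explicit form: lower bound <= price AND upper bound >= price.
explicit = conn.execute(
    "SELECT KOEF_ID FROM products_loans "
    "WHERE KOD IN (?, ?, ?) AND CENA_OD <= ? AND CENA_DO >= ? ORDER BY KOEF_ID",
    (*codes, price, price),
).fetchall()

print(between, explicit)
# prints: [(2,), (3,)] [(2,), (3,)]
```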
MySQL can generally use only one index per table in a query. If you always look up the entry by these 3 columns, then depending on the actual data (range) in the columns, one of the following could very well add a serious amount of performance:
ALTER TABLE products_loans ADD INDEX(KOD, CENA_OD, CENA_DO);
ALTER TABLE products_loans ADD INDEX(CENA_OD, CENA_DO, KOD);
Notice that the order of the columns matters! If that doesn't improve performance, give us the EXPLAIN output of the query.