We would like to add a normal index to fields 3 and 4 of the following MySQL table, and would like to understand the impact on server performance before doing so. For example, will the index take up additional RAM and slow down the database as a result?
We understand it will take time initially to create the index; we're not concerned about that. Rather, we want to know whether we need to upgrade our server in anticipation of increased load or memory pressure on the database after adding the index. Our DBA insists that we must increase RAM from 16GB to 48GB, as he believes the new index will be kept in RAM, causing the server to run out of memory for other operations. It would be great to confirm whether that's necessary.
Thanks in advance for your expert advice.
MySQL version: 5.5.30
OS: CentOS
Hardware config: 8 Core, 32G RAM, 1TB Disk
Table size: 490GB
No. of rows: 67M
CREATE TABLE `mytable` (
`field_1` text NOT NULL,
`field_2` varchar(200) NOT NULL,
`field_3` varchar(100) NOT NULL,
`field_4` text NOT NULL,
`field_5` char(8) NOT NULL,
`field_6` varchar(100) NOT NULL DEFAULT '',
`field_7` varchar(100) DEFAULT '',
`field_8` varchar(20) NOT NULL,
`field_9` char(16) NOT NULL,
`field_0` varchar(25) NOT NULL,
`field_a` varchar(50) NOT NULL DEFAULT '',
`field_b` varchar(20) DEFAULT '',
`field_c` varchar(35) DEFAULT '',
`field_d` varchar(35) DEFAULT '',
`field_e` varchar(30) NOT NULL DEFAULT '',
`field_f` varchar(30) DEFAULT '',
`field_g` varchar(3) NOT NULL DEFAULT 'xx',
`field_h` varchar(50) DEFAULT '',
`field_i` varchar(100) DEFAULT '',
`field_j` char(8) NOT NULL,
`field_k` varchar(10) NOT NULL DEFAULT '',
`field_l` datetime NOT NULL,
PRIMARY KEY (`field_9`),
KEY `field_j_idx` (`field_j`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
First of all, indexes are stored on disk, not in memory. Both MyISAM and InnoDB may cache index blocks in memory to give faster access to the most commonly used blocks. For InnoDB, the size of this cache is controlled by the innodb_buffer_pool_size server system variable.
As you can see from its description, the value of this variable is not affected by adding or removing indexes. So, unless you decide to increase it, adding a new index has no direct impact on MySQL's memory usage. (The new index's blocks will, however, compete with existing data and index blocks for space in the same fixed-size pool.)
Obviously, adding a new index to a large existing table will have a performance impact while the index is being created. After the index is added, there will also be an impact on every insert / update / delete operation, since MySQL has to maintain the additional index data as well.
It depends. What version of MySQL do you have? With newer versions (5.6 and later), ALGORITHM=INPLACE makes adding a secondary, non-unique index relatively fast and painless.
You have another potential problem looming. If this table is really half the size of the disk, then any ALTER that cannot be done with INPLACE will probably fail for lack of disk space, since it needs room for a full copy of the table. Consider getting a bigger disk before this happens, and/or think about ways to shrink the table.
CHAR(8) -- what kind of data is in it? If it is always hex or plain letters, it should be declared CHARACTER SET ascii (or latin1), not utf8 -- which takes 24 bytes. Field_j already takes double that because of the index.
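The byte counts in that comment can be checked with a little arithmetic. This is only a rough sketch: it assumes the worst-case reservation of 3 bytes per character for utf8 and 1 for ascii, uses the 67M row count from the question, and `char_width_bytes` is a made-up helper name. The real footprint depends on the row format and the data.

```python
# Worst-case bytes per character for each charset (MySQL's utf8 is 3-byte).
MAX_BYTES_PER_CHAR = {"utf8": 3, "latin1": 1, "ascii": 1}

def char_width_bytes(length, charset):
    """Upper bound on the bytes a CHAR(length) value may occupy."""
    return length * MAX_BYTES_PER_CHAR[charset]

rows = 67_000_000                             # row count from the question
utf8_width = char_width_bytes(8, "utf8")      # worst case for CHAR(8) in utf8
ascii_width = char_width_bytes(8, "ascii")    # same column declared as ascii

# field_j is stored twice: once in the row and once in its secondary index.
copies = 2
saving_bytes = (utf8_width - ascii_width) * rows * copies

print(utf8_width, ascii_width, round(saving_bytes / 1e9, 2), "GB upper-bound saving")
```

Even as an upper bound, that is small next to a 490GB table, so this tweak alone will not solve the disk problem.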
If some of the columns have repeated values, consider "normalizing" them. Then replace the bulky string with MEDIUMINT UNSIGNED (3 bytes, 16M max) or INT UNSIGNED.
(I understand your need for obfuscation of the column names, but it makes it hard to give you concrete suggestions.)
field_4 is TEXT, which cannot be indexed in full; only a prefix of it can be, e.g. INDEX(field_4(50)). Please describe further what type of text is in it; we may be able to suggest workarounds.
I assume innodb_file_per_table=ON when you built the table? And is still ON? Else, all hope is lost.
I'm having a problem with MySQL .ibd files.
Scenario:
I have a 200GB Ubuntu server on which I've deployed a Django application that uses MySQL.
My application stores a large amount of data and does some processing on it. I have one table with 5 to 6 million records. This table takes up almost 60GB of space (the size of the tablename.ibd file).
I tried running OPTIMIZE TABLE tablename, but the .ibd file doesn't shrink.
InnoDB is in use.
PROBLEM
Firstly, storage is running out because the file has grown so large.
Secondly, when I run a migration that adds a column to this table, the .ibd file keeps growing while the migration runs, and the server eventually runs out of space.
I will be very thankful if someone can help me out with this.
Note: I cannot purge data from the table, as the data is very important to me.
(UPDATED)
SHOW CREATE TABLE tablename
CREATE TABLE `table_name` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) DEFAULT NULL,
`price` double DEFAULT NULL,
`item_identifier` varchar(20) NOT NULL,
`upc` varchar(20) DEFAULT NULL,
`mpn` varchar(100) DEFAULT NULL,
`weight` double DEFAULT NULL,
`weight_unit` varchar(10) DEFAULT NULL,
`main_category` varchar(50) DEFAULT NULL,
`sub_category` varchar(50) DEFAULT NULL,
`category_tree` varchar(500) DEFAULT NULL,
`description` varchar(3800) DEFAULT NULL,
`color` varchar(50) DEFAULT NULL,
`brand` varchar(150) DEFAULT NULL,
`main_image` varchar(2048) DEFAULT NULL,
`secondary_images` varchar(255) DEFAULT NULL,
`shipping` double,
`stock` int(11) NOT NULL,
`sale_rank` varchar(100) DEFAULT NULL,
`itemHeight` double DEFAULT NULL,
`itemLength` double DEFAULT NULL,
`itemWeight` double DEFAULT NULL,
`itemWidth` double DEFAULT NULL,
`manufacturer` varchar(100) DEFAULT NULL,
`product_model` varchar(150) DEFAULT NULL,
`variations` longtext,
`pack_count` int(11),
`size` varchar(100) DEFAULT NULL,
`flavor` varchar(100) DEFAULT NULL,
`successfully_stored` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `item_identifier` (`item_identifier`),
KEY `table_name_upc_3ca3d702` (`upc`)
) ENGINE=InnoDB AUTO_INCREMENT=7279139 DEFAULT CHARSET=latin1;
1 row in set (0.00 sec)
SHOW TABLE STATUS LIKE 'tablename'\G
*************************** 1. row ***************************
Name: table_name
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 7439966
Avg_row_length: 8807
Data_length: 65530740736
Max_data_length: 0
Index_length: 323633152
Data_free: 5242880
Auto_increment: 7279139
Create_time: 2021-06-11 21:26:17
Update_time: 2021-06-12 18:08:06
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.01 sec)
InnoDB disk space is 2-3 times as much as you might think. This is because of several different "overhead" things. They provide performance and features; live with it.
60GB / 5M = 12KB per row. Sounds like you have one or more big TEXT or BLOB columns? Please provide SHOW CREATE TABLE so we can further discuss the layout of the table.
(OPTIMIZE TABLE is rarely of any use; don't bother using it.)
Sizes
Bill covered most of the size-related things (DOUBLE->FLOAT, etc); alas they will shrink the disk footprint by only a few percent in your case.
It seems that variations must be the bulkiest column. What do you get from SELECT AVG(LENGTH(variations)) FROM table_name; ? I suspect it is a few thousand. Most "text" can easily be compressed 3:1 by standard compression libraries. If the average is 3000, then the potential savings is about 2KB which is something like 20-30% of the table. (It may save more due to the "off-record" storage mechanism, but the computation is complex.)
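To make that estimate concrete, here is the same arithmetic as a small sketch. The 3000-byte average and the 3:1 ratio are the guesses from the paragraph above, not measurements; 8807 is the Avg_row_length from the table status; `column_compression_saving` is just an illustrative helper name.

```python
def column_compression_saving(avg_col_bytes, ratio, avg_row_bytes):
    """Return (bytes saved per row, fraction of the row saved)."""
    compressed = avg_col_bytes / ratio
    saved = avg_col_bytes - compressed
    return saved, saved / avg_row_bytes

# Assumed: 3000-byte average `variations`, 3:1 compression, 8807-byte rows.
saved, fraction = column_compression_saving(3000, 3.0, 8807)
print(f"{saved:.0f} bytes saved per row, {fraction:.0%} of the row")
```

Plug in the real AVG(LENGTH(variations)) once you have it; the conclusion changes a lot if the column turns out to be much larger or smaller than assumed.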
Compressing a single column requires the cooperation of the client. That is, code in Django needs to compress and uncompress the column between client and server.
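As a minimal sketch of what that client-side cooperation could look like in Python, using only the standard zlib module. The function names are hypothetical, the sample text is invented, and it assumes the column is changed to a BLOB type so it can hold the compressed bytes.

```python
import zlib

def pack_variations(text):
    """Compress text on the client before writing it to a BLOB column."""
    return zlib.compress(text.encode("utf-8"), 6)

def unpack_variations(blob):
    """Reverse of pack_variations, applied after reading the column."""
    return zlib.decompress(blob).decode("utf-8")

# Invented sample; real `variations` content will compress differently.
sample = '{"color": "red", "size": "M", "sku": "ABC-123"}' * 50
blob = pack_variations(sample)

assert unpack_variations(blob) == sample   # round trip is lossless
print(len(sample), "bytes ->", len(blob), "bytes")
```

The catch is that the server can no longer search inside the column, and every reader and writer of the table has to go through these two functions.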
Using ROW_FORMAT=COMPRESSED gives about 2:1 compression for the whole table and is transparent to the client. So, overall, this is probably better.
As Bill points out, all of this is a temporary fix -- you will run out of disk space as the table grows. That is, OPTIMIZE, smaller datatypes, and compression only buy time. You really need more disk space.
Get a server with larger storage volumes.
Alternative: Get a second server running MySQL Server, and move some of the data in your current instance to that new instance.
Re your update with the table definition and status:
The table status shows that the data length, that is, the rows, use about 61 GiB, and the secondary indexes use about 0.3 GiB. So it's unlikely that you can save space by dropping indexes.
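For anyone who wants to reproduce those figures, this is the arithmetic on the values copied from the SHOW TABLE STATUS output above (a quick sketch, nothing more):

```python
# Values copied from the SHOW TABLE STATUS output in the question.
data_length = 65530740736
index_length = 323633152
rows = 7439966

GIB = 2 ** 30
data_gib = data_length / GIB
index_gib = index_length / GIB
avg_row_length = data_length // rows   # integer division, as in the status output

print(f"{data_gib:.1f} GiB data, {index_gib:.1f} GiB indexes, {avg_row_length} bytes/row")
```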
The average row size is 8807 bytes (this is an estimate, it's just the data_length divided by the number of rows). You might be able to reduce the average row size a little bit by changing some data types.
For example, each double takes 8 bytes. Could you get enough precision using float or numeric(9,2) instead? A float takes 4 bytes and a numeric(9,2) takes 5. Similarly, there are some int columns that might be able to be smallint and still store the range of values you need.
You should read about the storage requirements of each data type and make decisions about how best to define your columns. See https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html
The variable-length data types like varchar and longtext already use only as much space as the content stored in each row (plus a small length prefix), not the max length allowed. So for example changing varchar(200) to varchar(100) doesn't make any difference if the strings in them are already shorter than 100 characters.
There are some cases of varchar that might be replaced by an integer reference to a lookup table. An integer may take less space than repeating the same string on every row.
You could use the InnoDB COMPRESSED row format. This has variable results depending on your data, but it might shrink strings by about half.
Changing data types and the row format do require you to run ALTER TABLE, so there needs to be enough storage space for the copy of the table temporarily, similar to running OPTIMIZE TABLE. If you don't have enough space to do that, then you can't alter the table.
Even with these techniques, your table will still be quite large, and databases tend to grow over time as we store more rows of data in them. Even if you shrink it a bit today, you will still need a plan for getting a larger storage volume eventually.
There are several Q&A for "Why is InnoDB (much) slower than MyISAM", but I could not find any topic for the opposite.
So I had a table defined as InnoDB wherein I stored file contents in a blob field. Because I understood that MyISAM should normally be used for that, I switched the table over. Here is its structure:
CREATE TABLE `liv_fx_files_files` (
`fid` int(11) NOT NULL AUTO_INCREMENT,
`filedata` longblob NOT NULL,
`filetype` varchar(255) NOT NULL,
`filename` varchar(255) NOT NULL,
`filesize` int(11) NOT NULL,
`context` varchar(1) NOT NULL DEFAULT '',
`saveuser` varchar(32) NOT NULL,
`savetime` int(11) NOT NULL,
`_state` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`fid`),
KEY `_state` (`_state`)
) ENGINE=MyISAM AUTO_INCREMENT=4550 DEFAULT CHARSET=utf8;
There are 4549 records stored in it so far (with filedata ranging from 0 to 48M). The sum of all files is about 6G.
So whenever I need to know current total filesize I issue the query
SELECT SUM(filesize) FROM liv_fx_files_files;
The problem is that since I switched from InnoDB to MyISAM, this simple query takes really long (about 30 seconds or more), whereas on InnoDB it finished in under one second.
But aggregations are not the only queries that are very slow; it's almost every query.
I guess I could fix it by adapting the config (which is currently optimized for InnoDB-only use), but I don't know which settings to adjust. Does anyone have a hint for me, please?
current mysql server config (SHOW VARIABLES as csv)
Example of another query run against both table types (both contain exactly the same data and have the same definition). All other tested queries behave the same way, that is, they run much longer against the MyISAM table than against the InnoDB one!
SELECT sql_no_cache `fxfilefile`.`filename` AS `filename` FROM `myisamtable`|`innodbtable` AS `fxfilefile` WHERE `fxfilefile`.`filename` LIKE '%foo%';
Executive Summary: Use InnoDB, and change the my.cnf settings accordingly.
Details:
"MyISAM is faster" -- This is an old wives' tale. Today, InnoDB is faster in most situations.
Assuming you have at least 4GB of RAM...
If all-MyISAM, key_buffer_size should be about 20% of RAM; innodb_buffer_pool_size should be 0.
If all-InnoDB, key_buffer_size should be, say, only 20MB; innodb_buffer_pool_size should be about 70% of RAM.
If a mixture, do something in between. More discussion.
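Those rules of thumb are easy to turn into numbers. The sketch below only mirrors the percentages given above; `suggest_buffers` is a hypothetical helper, the 8 GiB server is an invented example, and the mixed case is deliberately left to judgment because no formula is given for it.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def suggest_buffers(ram_bytes, engine_mix):
    """Rule-of-thumb sizes in bytes for an all-MyISAM or all-InnoDB server."""
    if engine_mix == "all-myisam":
        return {"key_buffer_size": ram_bytes * 20 // 100,
                "innodb_buffer_pool_size": 0}
    if engine_mix == "all-innodb":
        return {"key_buffer_size": 20 * MIB,
                "innodb_buffer_pool_size": ram_bytes * 70 // 100}
    # The advice for a mixture is "something in between", so no formula here.
    raise ValueError("for a mixture, pick values in between by hand")

for mix in ("all-myisam", "all-innodb"):
    sizes = suggest_buffers(8 * GIB, mix)   # 8 GiB is a hypothetical server
    print(mix, {name: f"{value // MIB}M" for name, value in sizes.items()})
```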
Let's look at how things are handled differently by the two Engines.
MyISAM puts the entire BLOB 'inline' with the other columns.
InnoDB puts most or all of each blob in other blocks.
Conclusion:
A table scan in a MyISAM table spends a lot of time stepping over cow paddies; InnoDB is much faster if you don't touch the BLOB.
This makes InnoDB a clear winner for SELECT SUM(x) FROM tbl; when there is no index on x. With INDEX(x), either engine will be fast.
Because of the BLOB being inline, MyISAM has fragmentation issues if you update records in the table; InnoDB has much less fragmentation. This impacts all operations, making InnoDB the winner again.
The order of the columns in the CREATE TABLE has no impact on performance in either engine.
Because the BLOB dominates the size of each row, the tweaks to the other columns will have very little impact on performance.
If you decide to go with MyISAM, I would recommend a 'parallel' table ('vertical partitioning'). Put the BLOB and the id in it a separate table. This would help MyISAM come closer to InnoDB's model and performance, but would add complexity to your code.
For "point queries" (looking up a single row via an index), there won't be much difference in performance between the engines.
Your my.cnf seems antique; set-variable has not been necessary in a long time.
Try editing your MySQL config file, usually /etc/mysql/my.cnf, and use the "huge" preset.
# The MySQL server
[mysqld]
port = 3306
socket = /var/run/mysqld/mysqld.sock
skip-locking
set-variable = key_buffer=384M
set-variable = max_allowed_packet=1M
set-variable = table_cache=512
set-variable = sort_buffer=2M
set-variable = record_buffer=2M
set-variable = thread_cache=8
# Try number of CPU's*2 for thread_concurrency
set-variable = thread_concurrency=8
set-variable = myisam_sort_buffer_size=64M
Certainly 30 seconds to read 4500 records is very slow. Assuming there is plenty of room for I/O caching, the first thing I would try is to change the order of the fields: if these are written to the table in the order they are declared, the DBMS would need to seek to the end of each record before reading the size value. (I'd also recommend capping the size of those varchar(255) columns, and that varchar(1) NOT NULL should be CHAR.)
CREATE TABLE `liv_fx_files_files2` (
`fid` int(11) NOT NULL AUTO_INCREMENT,
`filesize` int(11) NOT NULL,
`context` char(1) NOT NULL DEFAULT '',
`saveuser` varchar(32) NOT NULL,
`savetime` int(11) NOT NULL,
`_state` int(11) NOT NULL DEFAULT '0',
`filetype` varchar(255) NOT NULL,
`filename` varchar(255) NOT NULL,
`filedata` longblob NOT NULL,
PRIMARY KEY (`fid`),
KEY `_state` (`_state`)
) ENGINE=MyISAM AUTO_INCREMENT=4550 DEFAULT CHARSET=utf8;
INSERT INTO liv_fx_files_files2
(fid, filesize, context, saveuser, savetime, _state, filetype, filename, filedata)
SELECT fid, filesize, context, saveuser, savetime, _state, filetype, filename, filedata
FROM liv_fx_files_files;
But ideally I'd split the data and metadata into separate tables.
We have an analytics product. We give each customer a piece of JavaScript code to put on their web site. When a user visits a customer's site, the JavaScript code hits our server so that we can store the page visit on behalf of that customer. Each customer has a unique domain name.
We store these page visits in a MySQL table.
Following is the table schema.
CREATE TABLE `page_visits` (
`domain` varchar(50) DEFAULT NULL,
`guid` varchar(100) DEFAULT NULL,
`sid` varchar(100) DEFAULT NULL,
`url` varchar(2500) DEFAULT NULL,
`ip` varchar(20) DEFAULT NULL,
`is_new` varchar(20) DEFAULT NULL,
`ref` varchar(2500) DEFAULT NULL,
`user_agent` varchar(255) DEFAULT NULL,
`stats_time` datetime DEFAULT NULL,
`country` varchar(50) DEFAULT NULL,
`region` varchar(50) DEFAULT NULL,
`city` varchar(50) DEFAULT NULL,
`city_lat_long` varchar(50) DEFAULT NULL,
`email` varchar(100) DEFAULT NULL,
KEY `sid_index` (`sid`) USING BTREE,
KEY `domain_index` (`domain`),
KEY `email_index` (`email`),
KEY `stats_time_index` (`stats_time`),
KEY `domain_statstime` (`domain`,`stats_time`),
KEY `domain_email` (`domain`,`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
We don't have a primary key on this table.
MySQL server details
It is Google Cloud MySQL (version 5.6) with a storage capacity of 10TB.
As of now we have 350 million rows in the table, and the table size is 300 GB. We store all of our customers' data in the same table, even though there is no relation between one customer and another.
Problem 1: A few of our customers have a huge number of rows in the table, so queries against these customers are very slow.
Example Query 1:
SELECT count(DISTINCT sid) AS count,count(sid) AS total FROM page_views WHERE domain = 'aaa' AND stats_time BETWEEN CONVERT_TZ('2015-02-05 00:00:00','+05:30','+00:00') AND CONVERT_TZ('2016-01-01 23:59:59','+05:30','+00:00');
+---------+---------+
| count | total |
+---------+---------+
| 1056546 | 2713729 |
+---------+---------+
1 row in set (13 min 19.71 sec)
I will add more queries here. We need results in under 5-10 seconds; is that possible?
Problem 2: The table size is increasing rapidly; we might hit 5 TB by the end of this year, so we want to shard our table. We want to keep all records related to one customer on one machine. What are the best practices for this kind of sharding?
We are considering the following approaches to the above issues; please suggest best practices for overcoming them.
Create separate table for each customer
1) What are the advantages and disadvantages of creating a separate table for each customer? As of now we have 30k customers, and we might hit 100k by the end of this year, which means 100k tables in the DB. We access all tables simultaneously for reads and writes.
2) We keep the same table and create partitions based on date range.
UPDATE : Is a "customer" determined by the domain? Answer is Yes
Thanks
First, a critique of the excessively large datatypes:
`domain` varchar(50) DEFAULT NULL, -- normalize to MEDIUMINT UNSIGNED (3 bytes)
`guid` varchar(100) DEFAULT NULL, -- what is this for?
`sid` varchar(100) DEFAULT NULL, -- varchar?
`url` varchar(2500) DEFAULT NULL,
`ip` varchar(20) DEFAULT NULL, -- too big for IPv4, too small for IPv6; see below
`is_new` varchar(20) DEFAULT NULL, -- flag? Consider `TINYINT` or `ENUM`
`ref` varchar(2500) DEFAULT NULL,
`user_agent` varchar(255) DEFAULT NULL, -- normalize! (add new rows as new agents are created)
`stats_time` datetime DEFAULT NULL,
`country` varchar(50) DEFAULT NULL, -- use standard 2-letter code (see below)
`region` varchar(50) DEFAULT NULL, -- see below
`city` varchar(50) DEFAULT NULL, -- see below
`city_lat_long` varchar(50) DEFAULT NULL, -- unusable in current format; toss?
`email` varchar(100) DEFAULT NULL,
For IP addresses, use inet6_aton(), then store in BINARY(16).
For country, use CHAR(2) CHARACTER SET ascii -- only 2 bytes.
country + region + city + (maybe) latlng -- normalize this to a "location".
All these changes may cut the disk footprint in half. Smaller --> more cacheable --> less I/O --> faster.
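For the IP address suggestion, this is roughly what the 16-byte value looks like when built on the client side with Python's standard ipaddress module. It is a sketch under one assumption of mine: IPv4 addresses are stored as IPv4-mapped IPv6 (::ffff:a.b.c.d) so that every value fills a fixed BINARY(16). `ip_to_binary16` is a hypothetical helper, not a MySQL function.

```python
import ipaddress

def ip_to_binary16(text):
    """Return a 16-byte representation of an IPv4 or IPv6 address string."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        # Assumption: map IPv4 into IPv6 space as ::ffff:a.b.c.d
        addr = ipaddress.IPv6Address((0xFFFF << 32) | int(addr))
    return addr.packed

for example in ("192.0.2.1", "2001:db8::1"):
    packed = ip_to_binary16(example)
    print(example, "->", len(packed), "bytes", packed.hex())
```

Either way the column drops from up to 20 characters (too small for IPv6 anyway) to a fixed 16 bytes that sorts and compares as raw binary.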
Other issues...
To greatly speed up your sid counter, change
KEY `domain_statstime` (`domain`,`stats_time`),
to
KEY dss (domain_id,`stats_time`, sid),
That will be a "covering index", hence won't have to bounce between the index and the data 2713729 times -- the bouncing is what cost 13 minutes. (domain_id is discussed below.)
This is redundant with the above index, DROP it:
KEY domain_index (domain)
Is a "customer" determined by the domain?
Every InnoDB table must have a PRIMARY KEY. There are 3 ways to get a PK; you picked the 'worst' one -- a hidden 6-byte integer fabricated by the engine. I assume there is no 'natural' PK available from some combination of columns? Then, an explicit BIGINT UNSIGNED is called for. (Yes that would be 8 bytes, but various forms of maintenance need an explicit PK.)
If most queries include WHERE domain = '...', then I recommend the following. (And this will greatly improve all such queries.)
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
domain_id MEDIUMINT UNSIGNED NOT NULL, -- normalized to `Domains`
PRIMARY KEY(domain_id, id), -- clustering on customer gives you the speedup
INDEX(id) -- this keeps AUTO_INCREMENT happy
Recommend you look into pt-online-schema-change for making all these changes. However, I don't know if it can work without an explicit PRIMARY KEY.
"Separate table for each customer"? No. This is a common question; the resounding answer is No. I won't repeat all the reasons for not having 100K tables.
Sharding
"Sharding" is splitting the data across multiple machines.
To do sharding, you need to have code somewhere that looks at domain and decides which server will handle the query, then hands it off. Sharding is advisable when you have write scaling problems. You did not mention such, so it is unclear whether sharding is advisable.
When sharding on something like domain (or domain_id), you could use (1) a hash to pick the server, (2) a dictionary lookup (of 100K rows), or (3) a hybrid.
I like the hybrid -- hash to, say, 1024 values, then look up into a 1024-row table to see which machine has the data. Since adding a new shard and migrating a user to a different shard are major undertakings, I feel that the hybrid is a reasonable compromise. The lookup table needs to be distributed to all clients that redirect actions to shards.
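A rough sketch of that hybrid scheme, to show how little code the routing itself needs. Everything named here is an assumption for illustration: the shard names are invented, CRC32 is just one stable hash that every client can compute identically, and in practice the bucket table would live somewhere all clients can fetch it rather than in a module-level dict.

```python
import zlib

NUM_BUCKETS = 1024
SHARDS = ["db-a", "db-b", "db-c"]          # hypothetical server names

# The 1024-row lookup table: bucket number -> shard holding that bucket.
bucket_to_shard = {b: SHARDS[b % len(SHARDS)] for b in range(NUM_BUCKETS)}

def bucket_for(domain):
    """Stable hash of the domain into one of NUM_BUCKETS buckets."""
    return zlib.crc32(domain.encode("utf-8")) % NUM_BUCKETS

def shard_for(domain):
    """Route a query for this domain to the shard that owns its bucket."""
    return bucket_to_shard[bucket_for(domain)]

def move_bucket(bucket, new_shard):
    """After migrating a bucket's rows, repoint only that one table entry."""
    bucket_to_shard[bucket] = new_shard

print(shard_for("example.com"))
move_bucket(bucket_for("example.com"), "db-d")   # e.g. a new, emptier shard
print(shard_for("example.com"))
```

The point of the indirection is that rebalancing touches one lookup row per bucket moved, instead of rehashing every customer.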
If your 'writing' is running out of steam, see high speed ingestion for possible ways to speed that up.
PARTITIONing
PARTITIONing is splitting the data across multiple "sub-tables".
There are only a limited number of use cases where partitioning buys you any performance. You have not indicated that any apply to your use case. Read that blog and see if you think that partitioning might be useful.
You mentioned "partition by date range". Will most of the queries include a date range? If so, such partitioning may be advisable. (See the link above for best practices.) Some other options come to mind:
Plan A: PRIMARY KEY(domain_id, stats_time, id) But that is bulky and requires even more overhead on each secondary index. (Each secondary index silently includes all the columns of the PK.)
Plan B: Have stats_time include microseconds, then tweak the values to avoid having dups. Then use stats_time instead of id. But this requires some added complexity, especially if there are multiple clients inserting data. (I can elaborate if needed.)
Plan C: Have a table that maps stats_time values to ids. Look up the id range before doing the real query, then use both WHERE id BETWEEN ... AND stats_time .... (Again, messy code.)
Summary tables
Are many of the queries of the form of counting things over date ranges? Suggest having Summary Tables based perhaps on per-hour. More discussion.
COUNT(DISTINCT sid) is especially difficult to fold into summary tables. For example, the unique counts for each hour cannot be added together to get the unique count for the day. But I have a technique for that, too.
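A tiny example of why the hourly unique counts cannot simply be added. The session ids are invented, and keeping whole sets per hour is only the most literal way to stay exact; it is not necessarily the technique alluded to above.

```python
# Invented session ids seen in two consecutive hours of one day.
hourly_sids = {
    "10:00": {"s1", "s2", "s3"},
    "11:00": {"s2", "s3", "s4"},
}

# Adding the per-hour unique counts double-counts s2 and s3.
sum_of_hourly = sum(len(sids) for sids in hourly_sids.values())

# The true daily figure needs the union of the underlying ids.
daily_unique = len(set().union(*hourly_sids.values()))

print(sum_of_hourly, daily_unique)
```

So a summary table that stores only the hourly number has thrown away what is needed to compute the daily one.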
I wouldn't do this if I were you. The first thing that comes to mind: on receiving a pageview message, I would send the message to a queue so that a worker can pick it up and insert it into the database later (in bulk, maybe); I would also increment a counter keyed by siteid:date in Redis (for example). Doing the count in SQL is just a bad idea for this scenario.
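In outline, the flow could look like the sketch below. It is an in-memory stand-in only: a deque plays the message queue, a Counter plays the Redis counters, and `drain` returns the batch a worker would hand to one bulk INSERT. All names are hypothetical.

```python
from collections import Counter, deque

queue = deque()          # stand-in for a real message queue
counters = Counter()     # stand-in for Redis counters keyed by (site, date)

def record_visit(site_id, day, row):
    """Called on each pageview: enqueue the row and bump the counter."""
    queue.append(row)
    counters[(site_id, day)] += 1

def drain(batch_size):
    """Worker side: take up to batch_size rows for one bulk insert."""
    batch = []
    while queue and len(batch) < batch_size:
        batch.append(queue.popleft())
    return batch

for i in range(5):
    record_visit("aaa", "2016-01-01", {"sid": f"s{i}", "url": "/"})

print(counters[("aaa", "2016-01-01")], len(drain(3)), len(queue))
```

Note that a plain counter like this gives total visits, not unique sids; distinct counts need something more than an increment.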
I have the following two tables in my database (the indexing is not complete as it will be based on which engine I use):
Table 1:
CREATE TABLE `primary_images` (
`imgId` smallint(6) unsigned NOT NULL AUTO_INCREMENT,
`imgTitle` varchar(255) DEFAULT NULL,
`view` varchar(45) DEFAULT NULL,
`secondary` enum('true','false') NOT NULL DEFAULT 'false',
`imgURL` varchar(255) DEFAULT NULL,
`imgWidth` smallint(6) DEFAULT NULL,
`imgHeight` smallint(6) DEFAULT NULL,
`imgDate` datetime DEFAULT NULL,
`imgClass` enum('jeans','t-shirts','shoes','dress_shirts') DEFAULT NULL,
`imgFamily` enum('boss','lacoste','tr') DEFAULT NULL,
`imgGender` enum('mens','womens') NOT NULL DEFAULT 'mens',
PRIMARY KEY (`imgId`),
UNIQUE KEY `imgDate` (`imgDate`)
)
Table 2:
CREATE TABLE `secondary_images` (
`imgId` smallint(6) unsigned NOT NULL AUTO_INCREMENT,
`primaryId` smallint(6) unsigned DEFAULT NULL,
`view` varchar(45) DEFAULT NULL,
`imgURL` varchar(255) DEFAULT NULL,
`imgWidth` smallint(6) DEFAULT NULL,
`imgHeight` smallint(6) DEFAULT NULL,
`imgDate` datetime DEFAULT NULL,
PRIMARY KEY (`imgId`),
UNIQUE KEY `imgDate` (`imgDate`)
)
Table 1 will be used to create a thumbnail gallery with links to larger versions of the image. imgClass, imgFamily, and imgGender will refine the thumbnails that are shown.
Table 2 contains images related to those in Table 1. Hence the use of primaryId to relate a single image in Table 1, with one or more images in Table 2. This is where I was thinking of using the Foreign Key ability of InnoDB, but I'm also familiar with the ability of Indexes in MyISAM to do the same.
Without delving too much into the remaining fields, imgDate is used to order the results.
Last, but not least, I should mention that this database is READ ONLY. All data will be entered by me. I have been told that if a database is read only, it should be MyISAM, but I'm hoping you can shed some light on what you would do in my situation.
Always use InnoDB by default.
In MySQL 5.1 and later, you should use InnoDB. In MySQL 5.1, you should enable the InnoDB plugin. In MySQL 5.5, the InnoDB plugin is enabled by default so just use it.
The advice years ago was that MyISAM was faster in many scenarios. But that is no longer true if you use a current version of MySQL.
There may be some exotic corner cases where MyISAM performs marginally better for certain workloads (e.g. table-scans, or high-volume INSERT-only work), but the default choice should be InnoDB unless you can prove you have a case that MyISAM does better.
Advantages of InnoDB besides the support for transactions and foreign keys that is usually mentioned include:
InnoDB is more resistant to table corruption than MyISAM.
Row-level locking. In MyISAM, readers block writers and vice-versa.
Support for large buffer pool for both data and indexes. MyISAM key buffer is only for indexes.
MyISAM is stagnant; all future development will be in InnoDB.
See also my answer to MyISAM versus InnoDB
MyISAM does not support transactions, so you cannot apply a change to both tables atomically. For instance, if you want to renumber an image in both tables as a single transaction:
START TRANSACTION;
UPDATE primary_images SET imgId=2 WHERE imgId=1;
UPDATE secondary_images SET primaryId=2 WHERE primaryId=1;
COMMIT;
Another drawback is integrity checking: InnoDB enforces foreign key constraints (for example, that every secondary_images.primaryId refers to an existing primary_images.imgId), whereas MyISAM parses FOREIGN KEY clauses but ignores them. (UNIQUE keys such as imgDate are enforced by both engines.) Trust me, this really comes in handy and is way less error prone. In my opinion MyISAM is for playing around, while more serious work should rely on InnoDB.
Hope it helps
A few things to consider :
Do you need transaction support?
Will you be using foreign keys?
Will there be a lot of writes on a table?
If answer to any of these questions is "yes", then you should definitely use InnoDB.
Otherwise, you should answer the following questions :
How big are your tables?
How many rows do they contain?
What is the load on your database engine?
What kind of queries you expect to run?
Unless your tables are very large and you expect large load on your database, either one works just fine.
I would prefer MyISAM because it scales pretty well for a wide range of data-sizes and loads.
I would like to add something that people may benefit from:
I've just created an InnoDB table (leaving everything at the defaults, except changing the collation to Unicode) and populated it with about 300,000 records (rows).
Queries like SELECT COUNT(id) FROM table would hang until giving an error message, not returning a result.
I've cloned the table with its data into a new MyISAM table,
and that same query, along with other large SELECT queries, would return fast, and everything worked OK.
I have a high CPU problem with MySQL: "top" (Linux) shows CPU peaks of 90%.
I was trying to find the source of the problem, so I turned on the general log and the slow query log.
The slow query log did not find anything.
The DB contains a few small tables and one large table with almost 100k rows; the database engine is MyISAM. The strange thing I have noticed is that on the large table, selects and inserts are very fast, but an update takes 0.2 - 0.5 secs.
I have already used OPTIMIZE and REPAIR, with no improvement.
The table is being updated frequently; could this be the source of the high CPU%?
What can I do to improve this?
Does your MySQL server have ganglia setup on it? Regular ganglia metrics along with the mysql_stats plugin for ganglia might reveal what's going on.
I found mytop extremely helpful.
First of all, can you identify which query is overloading the server? If so, please paste it here and maybe we can give you a hand with it.
Also, please look at the table structure. Tables with many indexes are likely to have slow updates.
I also recommend that you give us more data about the problem.
Hope that helps,
The first thing that pops into mind is indexing but that doesn't fit since your selects and inserts are fast. It's usually inserts and updates that will slow down on an "overindexed" table. That leaves triggers... do you have an update trigger on that table that could be doing a lot of work and causing the spike?
A query that takes 0.5 secs won't show up in top as 100% CPU. It's too small.
Also try "show full processlist", verify your my.cnf, and even try reducing the slow query timeout. The slow query log can catch anything that is slow for long enough.
Any update statement on that table based on the table's key is slow.
for example UPDATE customers SET CustMoney = 1 WHERE CustUID = 'someid'
CREATE TABLE IF NOT EXISTS `customers` (
`CustFullName` varchar(45) NOT NULL,
`CustPassword` varchar(45) NOT NULL,
`CustEmail` varchar(128) NOT NULL,
`SocialNetworkId` tinyint(4) NOT NULL,
`CustUID` varchar(64) character set ascii NOT NULL,
`CustMoney` bigint(20) NOT NULL default '0',
`LastIpAddress` varchar(45) character set ascii NOT NULL,
`LastLoginTime` datetime NOT NULL default '1900-10-10 10:10:10',
`SmallPicURL` varchar(120) character set ascii default '',
`LargePicURL` varchar(120) character set ascii default '',
`LuckyChips` int(10) unsigned NOT NULL default '0',
`AccountCreationTime` datetime NOT NULL default '2009-11-11 11:11:11',
`AccountStatus` tinyint(4) NOT NULL default '1',
`CustLevel` int(11) NOT NULL default '0',
`City` varchar(32) NOT NULL default '',
`State` varchar(32) NOT NULL default '0',
`Country` varchar(32) NOT NULL default '',
`Zip` varchar(16) character set ascii NOT NULL,
`CustExp` bigint(20) NOT NULL default '0',
PRIMARY KEY (`CustUID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Again, I'm not sure that this is the cause of the high CPU usage, but it seems to me that it's not normal for an update statement to take that long (0.5 sec).
The table is being updated up to 5 times a second at the moment, and in the future it will be updated more frequently.
What kind of server is this? I've seen slooow writes and relatively fast reads on virtual machines. What does hdparm (http://en.wikipedia.org/wiki/Hdparm) have to say?
What CPU/RAM do you have on it? What's the load average?