MySQL: Denormalization on many tables performance

I need to decide which is better for performance:
A) Retrieving data from one of ~100 similar tables LEFT JOINed to 1 common table.
B) Retrieving data from one of ~100 similar tables after denormalizing the table that was joined in A).
I'm curious whether denormalizing this one table pays off in SELECT performance, since it creates a lot more columns in the database - the similar tables will have 3-15 (let's say 8) columns and the table to denormalize has ~6 columns.
So in variant A) I get 100 tables * 8 columns + 1 table * 6 columns = 806 columns.
In variant B) I get 100 tables * (8 columns + 6 columns) = 1400 columns.
So which is better when we're not looking at disk space, only focusing on performance?
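To make the two variants concrete, the queries would look roughly like the sketch below (the table and column names here are placeholders, not my real schema):

-- Variant A: one of the ~100 similar tables LEFT JOINed to the shared table
SELECT c.*, g.*
FROM items_category_01 AS c
LEFT JOIN items_general AS g ON g.id = c.general_id;

-- Variant B: the ~6 shared columns are copied into every similar table
SELECT c.*
FROM items_category_01 AS c;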
-------------EDIT-----------------
As Rick James asked for SHOW CREATE TABLE, here is the table in question:
CREATE TABLE `ItemsGeneral` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`description` text COLLATE utf8mb4_unicode_ci NOT NULL,
`datePosted` datetime NOT NULL,
`dateEnds` datetime NOT NULL,
`photos` tinyint(3) unsigned NOT NULL,
`userId` int(10) unsigned NOT NULL,
`locationSimple` point NOT NULL,
`locationPrecise` point NOT NULL,
PRIMARY KEY (`id`),
SPATIAL KEY `locationPrecise` (`locationPrecise`),
SPATIAL KEY `locationSimple` (`locationSimple`),
KEY `userId` (`userId`),
KEY `dateEnds` (`dateEnds`)
) ENGINE=MyISAM AUTO_INCREMENT=10001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
And all the other ~100 tables will have tiny/small/medium ints.

The implication of your question is that you already threw normalization out the window from the start.
In terms of performance, a standard rule of thumb for result speed (I'll use > to mean 'faster than' here) is:
cached data > cached query result > query against SSD > query against spinning magnetic disk
Another rule of thumb for database performance is that the smaller the data set, the better the performance you get out of a given set of resources.
Now in terms of joins, there is a price to pay for a simple keyed join, but since these types of queries are typically measured in milliseconds, that certainly isn't a reason to denormalize lots of data and in the process blow up your dataset by 20%, especially if you might then need to update all that duplicated data.
Normalization is simply the cost of having atomic, accurate data, but it also helps keep your dataset at an optimal size.
Just as an example, if you have a MySQL server, you are using InnoDB, AND you have properly allocated memory on the server to your InnoDB cache, you can often see an extremely high cache hit ratio, where the queries are coming straight out of RAM. At that point, the fact that more of your database fits in cache because the dataset is small matters more than the fact that you joined 2 tables together.
I regularly find that people who set up MySQL but aren't experts in it either aren't aware that their dataset could fit entirely in cache (were they to allocate it), or haven't changed any of the default values and so have almost no memory allocated to the cache.
Just to be clear, this involves configuration of the innodb_buffer_pool_size and innodb_buffer_pool_instances.
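As a quick sketch of checking and adjusting these (the 12G figure is purely an assumed value for a dedicated server with 16GB of RAM; adjust to your hardware):

-- how often reads are served from the buffer pool vs. disk
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

-- current buffer pool settings
SHOW VARIABLES LIKE 'innodb_buffer_pool%';

-- MySQL 5.7+ can resize the pool online; on older versions set this in my.cnf
-- and restart (innodb_buffer_pool_instances always requires a restart)
SET GLOBAL innodb_buffer_pool_size = 12 * 1024 * 1024 * 1024;

The hit ratio is roughly 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests); if it is well below 99% and you have free RAM, the pool is probably too small.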

Related

Optimizing a giant mysql table

I have a giant MySQL table which is growing all the time. It's recording chat data.
This is what my table looks like:
CREATE TABLE `log` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`channel` VARCHAR(26) NOT NULL,
`timestamp` DATETIME NOT NULL,
`username` VARCHAR(25) NOT NULL,
`message` TEXT NOT NULL,
PRIMARY KEY (`id`),
INDEX `username` (`username`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=2582573
;
Indexing the username is kind of important, because queries for a username can take about 5 seconds otherwise.
Is there any way of optimizing this table even more, to prepare it for huge amounts of data, so that even 100m rows won't be a problem?
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
Will you have more than 4 billion rows? If not, use INT UNSIGNED, saving 4 bytes per row. Plus another 4 bytes for each row in the secondary index.
`channel` VARCHAR(26) NOT NULL,
`username` VARCHAR(25) NOT NULL,
Normalize each -- that is, replace this by, say, a SMALLINT UNSIGNED and have a mapping between them. Savings: lots.
INDEX `username` (`username`)
That becomes user_id, saving even more.
Smaller --> more cacheable --> faster.
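A minimal sketch of that normalization (the users / channels tables and the new column names are assumptions, not from the original post):

CREATE TABLE users (
  user_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  username VARCHAR(25) NOT NULL,
  PRIMARY KEY (user_id),
  UNIQUE KEY (username)
) ENGINE=InnoDB;

CREATE TABLE channels (
  channel_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  channel VARCHAR(26) NOT NULL,
  PRIMARY KEY (channel_id),
  UNIQUE KEY (channel)
) ENGINE=InnoDB;

CREATE TABLE log (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,   -- INT instead of BIGINT, per above
  channel_id SMALLINT UNSIGNED NOT NULL,
  user_id SMALLINT UNSIGNED NOT NULL,
  `timestamp` DATETIME NOT NULL,
  message TEXT NOT NULL,
  PRIMARY KEY (id),
  INDEX (user_id)
) ENGINE=InnoDB;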
What other queries will you have?
"Memory usage" -- For InnoDB, set innodb_buffer_pool_size to about 70% of available RAM. Then let it worry about what is in memory and what is not. Once the table is too big to be cached, you should shrink the data (as I mentioned above), provide 'good' indexes (as mentioned in other comments), and perhaps structure the table for "locality of reference" (without knowing all the queries, I can't address this).
You grumbled about using IDs instead of strings... Let's take a closer look at that. How many distinct usernames are there? channels? How does the data come in -- do you get one row at a time, or batches? Is something doing direct INSERTs or feeding to some code that does the INSERTs? Could there be a STORED PROCEDURE to do the normalization and insertion? If you need hundreds of rows inserted per second, then I can discuss how to do both, and do them efficiently.
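One possible shape for such a stored procedure, assuming the mapping tables sketched above (this is only an illustration, not tuned for hundreds of inserts per second):

DELIMITER //
CREATE PROCEDURE add_log_row(IN p_channel VARCHAR(26),
                             IN p_username VARCHAR(25),
                             IN p_message TEXT)
BEGIN
  -- create the mapping rows if they do not exist yet
  INSERT IGNORE INTO users (username) VALUES (p_username);
  INSERT IGNORE INTO channels (channel) VALUES (p_channel);
  -- insert the normalized log row
  INSERT INTO log (channel_id, user_id, `timestamp`, message)
  SELECT c.channel_id, u.user_id, NOW(), p_message
  FROM users AS u
  JOIN channels AS c ON c.channel = p_channel
  WHERE u.username = p_username;
END //
DELIMITER ;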
You did not ask about PARTITIONs. I do not recommend it for a simple username query.
2.5M rows is about the 85th percentile. 100M rows is more exciting -- 98th percentile.

MySql - Handle table size and performance

We have an analytics product. We give each of our customers a JavaScript snippet that they put on their websites. When a user visits a customer's site, the JavaScript code hits our server and we store that page visit on behalf of the customer. Each customer has a unique domain name.
We store these page visits in a MySQL table.
Following is the table schema.
CREATE TABLE `page_visits` (
`domain` varchar(50) DEFAULT NULL,
`guid` varchar(100) DEFAULT NULL,
`sid` varchar(100) DEFAULT NULL,
`url` varchar(2500) DEFAULT NULL,
`ip` varchar(20) DEFAULT NULL,
`is_new` varchar(20) DEFAULT NULL,
`ref` varchar(2500) DEFAULT NULL,
`user_agent` varchar(255) DEFAULT NULL,
`stats_time` datetime DEFAULT NULL,
`country` varchar(50) DEFAULT NULL,
`region` varchar(50) DEFAULT NULL,
`city` varchar(50) DEFAULT NULL,
`city_lat_long` varchar(50) DEFAULT NULL,
`email` varchar(100) DEFAULT NULL,
KEY `sid_index` (`sid`) USING BTREE,
KEY `domain_index` (`domain`),
KEY `email_index` (`email`),
KEY `stats_time_index` (`stats_time`),
KEY `domain_statstime` (`domain`,`stats_time`),
KEY `domain_email` (`domain`,`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
We don't have a primary key for this table.
MySQL server details:
It is Google Cloud MySQL (version 5.6) and the storage capacity is 10 TB.
As of now we have 350 million rows in our table and the table size is 300 GB. We store all of our customers' data in the same table even though there is no relation between one customer and another.
Problem 1: A few of our customers have a huge number of rows in the table, so queries against those customers are very slow.
Example Query 1:
SELECT count(DISTINCT sid) AS count, count(sid) AS total FROM page_visits WHERE domain = 'aaa' AND stats_time BETWEEN CONVERT_TZ('2015-02-05 00:00:00','+05:30','+00:00') AND CONVERT_TZ('2016-01-01 23:59:59','+05:30','+00:00');
+---------+---------+
| count | total |
+---------+---------+
| 1056546 | 2713729 |
+---------+---------+
1 row in set (13 min 19.71 sec)
I will update with more queries here. We need results in under 5-10 seconds; will that be possible?
Problem 2: The table size is increasing rapidly; we might hit a table size of 5 TB by the end of this year, so we want to shard our table. We want to keep all records related to one customer on one machine. What are the best practices for this sharding?
We are considering the following approaches for the above issues; please suggest the best practices to overcome them.
Create separate table for each customer
1) What are the advantages and disadvantages if we create a separate table for each customer? As of now we have 30k customers and might hit 100k by the end of this year, which means 100k tables in the DB. We access all tables simultaneously for reads and writes.
2) We keep the same table and create partitions based on date range.
UPDATE: Is a "customer" determined by the domain? Answer: yes.
Thanks
First, a critique of the excessively large datatypes:
`domain` varchar(50) DEFAULT NULL, -- normalize to MEDIUMINT UNSIGNED (3 bytes)
`guid` varchar(100) DEFAULT NULL, -- what is this for?
`sid` varchar(100) DEFAULT NULL, -- varchar?
`url` varchar(2500) DEFAULT NULL,
`ip` varchar(20) DEFAULT NULL, -- too big for IPv4, too small for IPv6; see below
`is_new` varchar(20) DEFAULT NULL, -- flag? Consider `TINYINT` or `ENUM`
`ref` varchar(2500) DEFAULT NULL,
`user_agent` varchar(255) DEFAULT NULL, -- normalize! (add new rows as new agents are created)
`stats_time` datetime DEFAULT NULL,
`country` varchar(50) DEFAULT NULL, -- use standard 2-letter code (see below)
`region` varchar(50) DEFAULT NULL, -- see below
`city` varchar(50) DEFAULT NULL, -- see below
`city_lat_long` varchar(50) DEFAULT NULL, -- unusable in current format; toss?
`email` varchar(100) DEFAULT NULL,
For IP addresses, use inet6_aton(), then store in BINARY(16).
For country, use CHAR(2) CHARACTER SET ascii -- only 2 bytes.
country + region + city + (maybe) latlng -- normalize this to a "location".
All these changes may cut the disk footprint in half. Smaller --> more cacheable --> less I/O --> faster.
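A rough sketch of the IP change (INET6_ATON/INET6_NTOA are standard MySQL 5.6 functions; the new column names here are assumptions, and a one-shot UPDATE over 350M rows would itself need to be done in chunks):

ALTER TABLE page_visits
  ADD COLUMN ip_bin BINARY(16),
  ADD COLUMN country_code CHAR(2) CHARACTER SET ascii;

-- INET6_ATON handles both IPv4 and IPv6 strings
UPDATE page_visits SET ip_bin = INET6_ATON(ip);

-- reading it back
SELECT INET6_NTOA(ip_bin), country_code FROM page_visits LIMIT 10;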
Other issues...
To greatly speed up your sid counter, change
KEY `domain_statstime` (`domain`,`stats_time`),
to
KEY dss (domain_id,`stats_time`, sid),
That will be a "covering index", hence won't have to bounce between the index and the data 2713729 times -- the bouncing is what cost 13 minutes. (domain_id is discussed below.)
This is redundant with the above index, DROP it:
KEY domain_index (domain)
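In DDL form, something like this (domain_id assumes the normalization discussed below; if you keep the varchar column, use (domain, stats_time, sid) instead):

ALTER TABLE page_visits
  DROP INDEX domain_index,
  DROP INDEX domain_statstime,
  ADD INDEX dss (domain_id, stats_time, sid);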
Is a "customer" determined by the domain?
Every InnoDB table must have a PRIMARY KEY. There are 3 ways to get a PK; you picked the 'worst' one -- a hidden 6-byte integer fabricated by the engine. I assume there is no 'natural' PK available from some combination of columns? Then, an explicit BIGINT UNSIGNED is called for. (Yes that would be 8 bytes, but various forms of maintenance need an explicit PK.)
If most queries include WHERE domain = '...', then I recommend the following. (And this will greatly improve all such queries.)
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
domain_id MEDIUMINT UNSIGNED NOT NULL, -- normalized to `Domains`
PRIMARY KEY(domain_id, id), -- clustering on customer gives you the speedup
INDEX(id) -- this keeps AUTO_INCREMENT happy
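For concreteness, the Domains lookup table referred to above might look something like this (a sketch; the names are assumptions):

CREATE TABLE Domains (
  domain_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
  domain VARCHAR(50) NOT NULL,
  PRIMARY KEY (domain_id),
  UNIQUE KEY (domain)
) ENGINE=InnoDB;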
Recommend you look into pt-online-schema-change for making all these changes. However, I don't know if it can work without an explicit PRIMARY KEY.
"Separate table for each customer"? No. This is a common question; the resounding answer is No. I won't repeat all the reasons for not having 100K tables.
Sharding
"Sharding" is splitting the data across multiple machines.
To do sharding, you need to have code somewhere that looks at domain and decides which server will handle the query, then hands it off. Sharding is advisable when you have write scaling problems. You did not mention such, so it is unclear whether sharding is advisable.
When sharding on something like domain (or domain_id), you could use (1) a hash to pick the server, (2) a dictionary lookup (of 100K rows), or (3) a hybrid.
I like the hybrid -- hash to, say, 1024 values, then look up into a 1024-row table to see which machine has the data. Since adding a new shard and migrating a user to a different shard are major undertakings, I feel that the hybrid is a reasonable compromise. The lookup table needs to be distributed to all clients that redirect actions to shards.
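One possible shape for that hybrid lookup (a sketch; the shard_map table and the choice of CRC32 as the hash are assumptions):

CREATE TABLE shard_map (
  bucket SMALLINT UNSIGNED NOT NULL,   -- 0..1023
  server VARCHAR(64) NOT NULL,         -- which machine owns this bucket
  PRIMARY KEY (bucket)
);

-- the client hashes the domain into one of 1024 buckets,
-- then looks up which server holds that bucket's data
SELECT server
FROM shard_map
WHERE bucket = CRC32('aaa.example.com') % 1024;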
If your 'writing' is running out of steam, see high speed ingestion for possible ways to speed that up.
PARTITIONing
PARTITIONing is splitting the data across multiple "sub-tables".
There are only a limited number of use cases where partitioning buys you any performance. You have not indicated that any apply to your use case. Read that blog and see if you think that partitioning might be useful.
You mentioned "partition by date range". Will most of the queries include a date range? If so, such partitioning may be advisable. (See the link above for best practices.) Some other options come to mind:
Plan A: PRIMARY KEY(domain_id, stats_time, id) But that is bulky and requires even more overhead on each secondary index. (Each secondary index silently includes all the columns of the PK.)
Plan B: Have stats_time include microseconds, then tweak the values to avoid having dups. Then use stats_time instead of id. But this requires some added complexity, especially if there are multiple clients inserting data. (I can elaborate if needed.)
Plan C: Have a table that maps stats_time values to ids. Look up the id range before doing the real query, then use both WHERE id BETWEEN ... AND stats_time .... (Again, messy code.)
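Returning to the date-range idea, a minimal partitioning sketch (assuming Plan A's PRIMARY KEY, since the partitioning column must appear in every unique key; the monthly boundaries are only examples):

ALTER TABLE page_visits
PARTITION BY RANGE (TO_DAYS(stats_time)) (
  PARTITION p2015_12 VALUES LESS THAN (TO_DAYS('2016-01-01')),
  PARTITION p2016_01 VALUES LESS THAN (TO_DAYS('2016-02-01')),
  PARTITION pmax     VALUES LESS THAN MAXVALUE
);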
Summary tables
Are many of the queries of the form of counting things over date ranges? Suggest having Summary Tables based perhaps on per-hour. More discussion.
COUNT(DISTINCT sid) is especially difficult to fold into summary tables. For example, the unique counts for each hour cannot be added together to get the unique count for the day. But I have a technique for that, too.
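A sketch of a per-hour summary table for the count query above (the table and column names are assumptions; as noted, the DISTINCT-sid count is the part that does not roll up by simple addition):

CREATE TABLE page_visits_hourly (
  domain_id MEDIUMINT UNSIGNED NOT NULL,
  hr DATETIME NOT NULL,                 -- stats_time truncated to the hour
  visits INT UNSIGNED NOT NULL,         -- COUNT(sid) for that hour
  PRIMARY KEY (domain_id, hr)
);

-- run once per hour (or after each bulk load) for the hour just finished
INSERT INTO page_visits_hourly (domain_id, hr, visits)
SELECT domain_id, '2016-01-01 10:00:00', COUNT(sid)
FROM page_visits
WHERE stats_time >= '2016-01-01 10:00:00'
  AND stats_time <  '2016-01-01 11:00:00'
GROUP BY domain_id
ON DUPLICATE KEY UPDATE visits = VALUES(visits);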
I wouldn't do this if I were you. The first thing that comes to mind: on receiving a pageview message, send it to a queue so that a worker can pick it up and insert it into the database later (in bulk, maybe); also increment a siteid:date counter in Redis (for example). Doing the count in SQL is just a bad idea for this scenario.

Is there any performance hit when we reference multiple columns to one table rather than separate tables?

I have a database design like this. I am using MySQL.
I have a vehicle table to store information about a vehicle:
CREATE TABLE `test`.`vehicle` (
`vehicle_id` BIGINT UNSIGNED NOT NULL,
`fuel_type_id_ref` TINYINT UNSIGNED NULL DEFAULT NULL,
`drive_type_id_ref` TINYINT UNSIGNED NULL DEFAULT NULL,
`condition_id_ref` TINYINT UNSIGNED NOT NULL,
`transmission_type_id_ref` TINYINT UNSIGNED NULL DEFAULT NULL,
PRIMARY KEY (`vehicle_id`)
) ENGINE = INNODB CHARSET = latin1 COLLATE = latin1_swedish_ci ;
I used separate tables to store the records for each reference id.
For example, I have a fuel type table to store fuels, a transmission type table, and so on.
But now I've realized that the schemas of those tables are pretty much equivalent.
So I created a table like this:
CREATE TABLE `test`.`vehicle_feature` (
`veh_feature_id` TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
`feature_type_id_ref` TINYINT UNSIGNED NOT NULL,
`name` VARCHAR (50) NOT NULL,
`is_active` TINYINT (1) NOT NULL DEFAULT TRUE,
PRIMARY KEY (`veh_feature_id`)
) ENGINE = INNODB CHARSET = latin1 COLLATE = latin1_swedish_ci ;
and I put all those fuels and transmission types into this table, with a feature type id to identify the group.
Now I have to join the same table again and again to retrieve the values for my vehicle table, roughly as in the sketch below.
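(For illustration only - a sketch using the column names above; the aliases are made up:)

SELECT v.vehicle_id,
       fuel.name  AS fuel_type,
       drive.name AS drive_type,
       cond.name  AS vehicle_condition,
       trans.name AS transmission_type
FROM vehicle AS v
LEFT JOIN vehicle_feature AS fuel  ON fuel.veh_feature_id  = v.fuel_type_id_ref
LEFT JOIN vehicle_feature AS drive ON drive.veh_feature_id = v.drive_type_id_ref
LEFT JOIN vehicle_feature AS cond  ON cond.veh_feature_id  = v.condition_id_ref
LEFT JOIN vehicle_feature AS trans ON trans.veh_feature_id = v.transmission_type_id_ref;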
So my question is:
Shall I maintain my separate tables, or shall I go with this new approach? Since I have to write the same joins again and again, there is no reduction in my code. I can join my small tables just as easily as this one table. Also, with the small tables I can use inner joins, but here I have to use left joins. The separate tables also have fewer records each than the one combined table. All this approach does is reduce the number of tables in my DB (only 4 tables, which I don't care about). The sum of all records in these 4 tables will be about 100.
So which is better performance-wise?
This is a bit of a difficult question, because these are both reasonable approaches. The key to deciding is understand what the application needs from this type of data.
A separate table for the items has one nice advantage because foreign key constraints can actually check the referential integrity of the data. Furthermore, each of the entities is treated as a full-fledged bona-fide entity. This is handy if you have other information about the fuels, drives, and transmissions that is specific to that entity. For instance, the fuel could have an octane rating, which could be in the fuel table but does not need to clutter the other reference tables.
On the other hand, you might end up with lots of similar reference tables. And, for your application, these may not need to be full-fledged entities. In that case, having a single table is quite reasonable. This is actually a bigger advantage if you want to internationalize your application. That is, if you want to provide the names of things in multiple languages.
In an object-oriented language, you would approach this problem using inheritance. The three "types" would all be "subclasses" from a class of vehicle attributes. Unfortunately, SQL does not have such built-in concepts.
From a performance perspective, the two methods would both involve relatively small reference tables (I'm guessing at most a few thousand rows), that are accessed via primary keys. There should be very little performance difference between the two approaches. The important concern is how to properly model the data for your application.

Best way to speed up a query on an InnoDB table with 100,000,000 rows in MySQL 5.6

I have a MySQL 5.6 table with 70 million rows in it, but it will grow to 100+ million rows or more in a few weeks.
I have a dedicated machine with a humble 500GB disk and 4GB RAM, and innodb_buffer_pool_size is set to 2GB.
The database is used 99% for selects and 1% for inserts (once a month).
The most important column is Descripcion_Detallada_Producto varchar(300), and it is what the selects are aimed at 90% of the time.
My table is:
CREATE TABLE `t1` (
`N_orden` bigint(20) NOT NULL DEFAULT '0',
`Fecha` varchar(15) COLLATE latin1_spanish_ci DEFAULT NULL,
`Ncm` int(11) NOT NULL,
`Origen` int(11) NOT NULL,
`Adquisicion` int(11) NOT NULL,
`Medida_Estadistica` int(11) NOT NULL,
`Unidad_Comercializacion` varchar(30) COLLATE latin1_spanish_ci DEFAULT NULL,
`Descripcion_Detallada_Producto` varchar(300) COLLATE latin1_spanish_ci DEFAULT NULL,
`Cantidad_Estadistica` double DEFAULT NULL,
`Peso_Liquido_Kg` double DEFAULT NULL,
`Valor_Fob` double DEFAULT NULL,
`Valor_Frete` double DEFAULT NULL,
`Valor_Seguro` double DEFAULT NULL,
`Valor_Unidad` double DEFAULT NULL,
`Cantidad` double DEFAULT NULL,
`Valor_Total` double DEFAULT NULL,
PRIMARY KEY (`N_orden`),
KEY `Ncm` (`Ncm`),
KEY `Origen` (`Origen`),
KEY `Adquisicion` (`Adquisicion`),
KEY `Medida_Estadistica` (`Medida_Estadistica`),
KEY `Descripcion_Detallada_Producto` (`Descripcion_Detallada_Producto`),
CONSTRAINT `t1_ibfk_1` FOREIGN KEY (`Ncm`) REFERENCES `ncm` (`Ncm`),
CONSTRAINT `t1_ibfk_2` FOREIGN KEY (`Origen`) REFERENCES `paises` (`Codigo_Pais`),
CONSTRAINT `t1_ibfk_3` FOREIGN KEY (`Adquisicion`) REFERENCES `paises` (`Codigo_Pais`),
CONSTRAINT `t1_ibfk_4` FOREIGN KEY (`Medida_Estadistica`) REFERENCES `medida_estadistica` (`Codigo_Medida_Estadistica`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_spanish_ci;
My question: Today a SELECT query using LIKE '%whatever%' normally takes 5 to 7 minutes, sometimes more. From what I understand, the varchar index is only used when 'whatever%' (no leading wildcard) is used, but I NEED the ability to search for strings with both left and right wildcards without waiting ~7 minutes for each search. How can I do it?
The right way to fix the problem is to look at all the queries being run against the table, and their relative frequency. You've only given us part of one. You didn't even say which field it relates to. Since you do say "The most important column is Descripcion_Detallada_Producto varchar(300), and it is what the selects are aimed at 90% of the time", I'll assume that you only need to optimize
WHERE descripcion_detallada_producto LIKE '%wathever%'
As Vatev has already said, you probably should be using fulltext searches - which are semantically (and syntactically) different from LIKE predicates. Further, you should be splitting the descripcion_detallada_producto attribute into its own relation to reduce the buffer-flushing effect of reading huge rows into memory from disk.
If you are searching for entire words that may be anywhere in a text column, you should consider using fulltext indexes, which are obviously used differently than wildcard searches. If you're unsure how to search your fulltext indexes, you can always get help with that.
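A minimal sketch of the fulltext approach (InnoDB supports FULLTEXT indexes from MySQL 5.6 on; the index name is arbitrary, and note that MATCH ... AGAINST does word-based matching, not arbitrary substring matching):

ALTER TABLE t1 ADD FULLTEXT INDEX ft_descripcion (Descripcion_Detallada_Producto);

SELECT N_orden, Descripcion_Detallada_Producto
FROM t1
WHERE MATCH(Descripcion_Detallada_Producto) AGAINST ('whatever' IN NATURAL LANGUAGE MODE);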
Doing a search like the following will not use any of your indexes. Instead, it will scan through all rows of your table data, and you're subjected to disk reads (and any correlated disk fragmentation, which isn't usually a problem because we don't usually scan through tables):
SELECT * FROM t1
WHERE Descripcion_Detallada_Producto LIKE '%whatever%'
The following query would just scan through your index on Descripcion_Detallada_Producto which would act as a "covering" index (notice that the columns in the select make the difference):
SELECT N_orden FROM t1
WHERE Descripcion_Detallada_Producto LIKE '%whatever%'
The advantage in scanning an index instead of the actual table data is that the amount of data that is read as it scans is minimized, and ideally with a large innodb_buffer_pool_size, that index would be in memory, which would avoid disk seeks.
Once you get the N_orden values, then you could retrieve the individual records from the table data.
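For example, a two-step ("deferred join") form of that, as a sketch:

SELECT t.*
FROM t1 AS t
JOIN ( SELECT N_orden
       FROM t1
       WHERE Descripcion_Detallada_Producto LIKE '%whatever%'
     ) AS hits ON hits.N_orden = t.N_orden;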
Additional Info
Consider reducing the size of the columns (bigint to unsigned int for N_orden) and reduce size of Descripcion_Detallada_Producto. Even though VARCHAR only uses up actual bytes (plus length) in the table data, each index entry actually uses the max, so reducing even a VARCHAR column size in an index will improve index scan speed.
In addition, if you have categories, restrict searches to selected categories and create a multi-column index on category+description. The following will only have to scan through a portion of a multi-column index on both category and description by restricting the search to a particular category:
SELECT N_orden FROM t1
WHERE Category = 1
AND Descripcion_Detallada_Producto LIKE '%whatever%'
Finally, consider removing wildcard prefixes. Make the user at least type the beginning of the model number.

Should I use MyISAM or InnoDB Tables for my MySQL Database?

I have the following two tables in my database (the indexing is not complete as it will be based on which engine I use):
Table 1:
CREATE TABLE `primary_images` (
`imgId` smallint(6) unsigned NOT NULL AUTO_INCREMENT,
`imgTitle` varchar(255) DEFAULT NULL,
`view` varchar(45) DEFAULT NULL,
`secondary` enum('true','false') NOT NULL DEFAULT 'false',
`imgURL` varchar(255) DEFAULT NULL,
`imgWidth` smallint(6) DEFAULT NULL,
`imgHeight` smallint(6) DEFAULT NULL,
`imgDate` datetime DEFAULT NULL,
`imgClass` enum('jeans','t-shirts','shoes','dress_shirts') DEFAULT NULL,
`imgFamily` enum('boss','lacoste','tr') DEFAULT NULL,
`imgGender` enum('mens','womens') NOT NULL DEFAULT 'mens',
PRIMARY KEY (`imgId`),
UNIQUE KEY `imgDate` (`imgDate`)
)
Table 2:
CREATE TABLE `secondary_images` (
`imgId` smallint(6) unsigned NOT NULL AUTO_INCREMENT,
`primaryId` smallint(6) unsigned DEFAULT NULL,
`view` varchar(45) DEFAULT NULL,
`imgURL` varchar(255) DEFAULT NULL,
`imgWidth` smallint(6) DEFAULT NULL,
`imgHeight` smallint(6) DEFAULT NULL,
`imgDate` datetime DEFAULT NULL,
PRIMARY KEY (`imgId`),
UNIQUE KEY `imgDate` (`imgDate`)
)
Table 1 will be used to create a thumbnail gallery with links to larger versions of the image. imgClass, imgFamily, and imgGender will refine the thumbnails that are shown.
Table 2 contains images related to those in Table 1. Hence the use of primaryId to relate a single image in Table 1, with one or more images in Table 2. This is where I was thinking of using the Foreign Key ability of InnoDB, but I'm also familiar with the ability of Indexes in MyISAM to do the same.
Without delving too much into the remaining fields, imgDate is used to order the results.
Last, but not least, I should mention that this database is READ ONLY. All data will be entered by me. I have been told that if a database is read only, it should be MyISAM, but I'm hoping you can shed some light on what you would do in my situation.
Always use InnoDB by default.
In MySQL 5.1 and later, you should use InnoDB. In MySQL 5.1, you should enable the InnoDB plugin. In MySQL 5.5, the InnoDB plugin is enabled by default, so just use it.
The advice years ago was that MyISAM was faster in many scenarios. But that is no longer true if you use a current version of MySQL.
There may be some exotic corner cases where MyISAM performs marginally better for certain workloads (e.g. table-scans, or high-volume INSERT-only work), but the default choice should be InnoDB unless you can prove you have a case that MyISAM does better.
Advantages of InnoDB besides the support for transactions and foreign keys that is usually mentioned include:
InnoDB is more resistant to table corruption than MyISAM.
Row-level locking. In MyISAM, readers block writers and vice-versa.
Support for large buffer pool for both data and indexes. MyISAM key buffer is only for indexes.
MyISAM is stagnant; all future development will be in InnoDB.
See also my answer to MyISAM versus InnoDB
MyISAM won't let you do checks at the MySQL level. For instance, if you want to update the imgId on both tables as a single transaction:
START TRANSACTION;
UPDATE primary_images SET imgId=2 WHERE imgId=1;
UPDATE secondary_images SET imgId=2 WHERE imgId=1;
COMMIT;
Another drawback is integrity checking: using InnoDB you can do some error checking, for example avoiding duplicated values in the field covered by UNIQUE KEY imgDate (imgDate). Trust me, this really comes in handy and is far less error prone. In my opinion, MyISAM is for playing around, while more serious work should rely on InnoDB.
Hope it helps
A few things to consider :
Do you need transaction support?
Will you be using foreign keys?
Will there be a lot of writes on a table?
If the answer to any of these questions is "yes", then you should definitely use InnoDB.
Otherwise, you should answer the following questions :
How big are your tables?
How many rows do they contain?
What is the load on your database engine?
What kind of queries you expect to run?
Unless your tables are very large and you expect large load on your database, either one works just fine.
I would prefer MyISAM because it scales pretty well for a wide range of data-sizes and loads.
I would like to add something that people may benefit from:
I've just created an InnoDB table (leaving everything at the defaults, except changing the collation to Unicode), and populated it with about 300,000 records (rows).
Queries like SELECT COUNT(id) FROM table would hang until giving an error message, not returning a result.
I cloned the table with its data into a new MyISAM table, and the same query, along with other large SELECT queries, returned fast, and everything worked OK.