SQL table split: is it necessary? - mysql

I have a table with around 600,000-700,000 (6-7 lakh) records, and it will keep growing over time. It has around 16-20 columns. None of these columns has a one-to-many relationship.
User data entries are stored in this table.
Would it be feasible to split my table into multiple smaller tables, or else to split it into just two halves: one holding all the entries, and the other holding only the recent records that are presented to the data entry operators to fill in?
In short, my question is whether MySQL execution time would be faster if I split the table into multiple smaller tables, or if I split it into two halves.
I guess the latter would be more feasible, since it would not require any join queries.
Updated:
CREATE TABLE `images` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`primary_category_id` int(10) unsigned DEFAULT NULL,
`secondary_category_id` int(10) unsigned DEFAULT NULL,
`front_url` varchar(255) DEFAULT NULL,
`back_url` varchar(255) DEFAULT NULL,
`title` varchar(100) DEFAULT NULL,
`part` varchar(10) DEFAULT NULL,
`photo_id` int(10) unsigned DEFAULT NULL,
`photo_dt_month` varchar(2) DEFAULT NULL,
`photo_dt_day` varchar(2) DEFAULT NULL,
`photo_dt_yr` varchar(4) DEFAULT NULL,
`type` varchar(25) DEFAULT NULL,
`size_width` int(10) unsigned DEFAULT NULL,
`size_height` int(10) unsigned DEFAULT NULL,
`dpi` int(10) unsigned NOT NULL DEFAULT '0',
`dpix` int(10) unsigned DEFAULT NULL,
`dpiy` int(10) unsigned DEFAULT NULL,
`in_stock` varchar(50) DEFAULT NULL,
`outlet` varchar(50) DEFAULT NULL,
`source` varchar(50) DEFAULT NULL,
`keywords` varchar(255) DEFAULT NULL,
`emotional_keywords` varchar(255) DEFAULT NULL,
`mechanical_keywords` varchar(255) DEFAULT NULL,
`description` text,
`notes` text,
`comments` text,
`exported_to_ebay_dt` datetime DEFAULT NULL,
`exported_to_ebay` set('Y','N') NOT NULL DEFAULT 'N',
`updated_worker_id` int(10) unsigned DEFAULT NULL,
`updated_worker_dt` datetime DEFAULT NULL,
`locked_worker_id` int(10) unsigned DEFAULT NULL,
`locked_worker_dt` datetime DEFAULT NULL,
`updated_admin_id` int(10) unsigned DEFAULT NULL,
`updated_admin_dt` datetime DEFAULT NULL,
`added_dt` datetime DEFAULT NULL,
`updated_manager_id` int(10) unsigned DEFAULT NULL,
`updated_manager_dt` datetime DEFAULT NULL,
`manager_review` set('Y','N') NOT NULL DEFAULT 'N',
`paid_status` set('Y','N') NOT NULL DEFAULT 'N',
`exported_to_web_dt` datetime DEFAULT NULL,
`exported_to_web` set('Y','N') DEFAULT 'N',
`prefix` varchar(50) DEFAULT NULL,
`is_premium` set('Y','N') DEFAULT 'N',
`template` varchar(50) DEFAULT 'HIPE_default',
`photographer` varchar(100) DEFAULT NULL,
`copyright` varchar(100) DEFAULT NULL,
`priority` int(4) DEFAULT '1',
`step` set('1','2') DEFAULT '1',
PRIMARY KEY (`id`),
UNIQUE KEY `part` (`part`),
KEY `primary_category_id` (`primary_category_id`),
KEY `updated_worker_id` (`updated_worker_id`),
KEY `updated_worker_dt` (`updated_worker_dt`)
) ENGINE=MyISAM AUTO_INCREMENT=1013687 DEFAULT CHARSET=latin1
The above is my table structure. Once around 100,000 (1 lakh) entries have been made, I would move them into another table, say images_history, with the same structure. Is this feasible, or should I split the table into multiple tables to reduce query execution time?

Why do you want to split the table? It would lead to a ton of extra code, and it would slow down execution by adding extra queries whenever you still need to access both of the new tables. (If one of the tables is going to store rarely used previous versions of records from the images table, i.e. version control, it may still be a good idea.)
Before even thinking about splitting the table, see if you can improve performance by optimizing the existing code. Check for each of the following common performance problems:
Do all SELECTs filter by PRIMARY KEY or another indexed column?
Is the index cache large enough to hold all indices in the computer's RAM?
Are all string-matching SELECTs with LIKE able to use the indices? That is, only exact matches or wildcards on the right, never on the left (e.g. "searchword%" and never "%searchword").
Are there any slow queries that use SELECT * instead of selecting only the columns you need?
Have you avoided using OR in SELECTs?
Queries on a table with 700,000 records shouldn't be slow if the tables are properly indexed and the queries actually use those indices.
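As a rough illustration of the indexing point above, the following sketch compares how a query planner treats a filter on an indexed column and a filter on an unindexed one. It is only a sketch under assumptions: SQLite (via Python's standard sqlite3 module) stands in for MySQL so that it runs without a server, the table is a cut-down version of images, and the index name idx_primary_category is made up. On MySQL you would run EXPLAIN on the real queries instead, and the output format differs.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch is self-contained.
# The table is a cut-down, hypothetical version of `images`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images ("
    " id INTEGER PRIMARY KEY,"
    " primary_category_id INTEGER,"
    " title TEXT,"
    " source TEXT)"
)
# Hypothetical index name; the original table indexes this column too.
conn.execute("CREATE INDEX idx_primary_category ON images (primary_category_id)")
conn.executemany(
    "INSERT INTO images (primary_category_id, title, source) VALUES (?, ?, ?)",
    [(i % 50, f"title {i}", f"source {i % 7}") for i in range(1000)],
)


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Filter on an indexed column: the planner can search the index.
indexed_plan = plan("SELECT title FROM images WHERE primary_category_id = 5")
# Filter on a column with no index: every row has to be read.
unindexed_plan = plan("SELECT title FROM images WHERE source = 'source 3'")

print(indexed_plan)
print(unindexed_plan)
```

The first plan should mention the index and the second should report a scan of the whole table, which is the difference that matters at 700,000 rows.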

Related

MySQL extremely slow on a very simple query

I'm getting very slow response running a very simple query in a small table (115k records)...
It takes about 8 seconds to respond, and I can't figure out why it's taking that long. Any advice would be awesome.
Table:
CREATE TABLE `financeiro_fluxo` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`branch` int(10) unsigned NOT NULL,
`abertura` int(10) DEFAULT NULL,
`origem` int(10) unsigned DEFAULT NULL,
`status_pagamento` tinyint(3) unsigned DEFAULT NULL,
`conta` int(10) unsigned NOT NULL,
`tipo_lancamento` tinyint(3) unsigned NOT NULL,
`categoria` int(10) unsigned NOT NULL,
`tipo_entidade` varchar(32) COLLATE utf8_unicode_ci NOT NULL,
`entidade` int(10) unsigned DEFAULT NULL,
`entidade_input` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`tipo_pagamento` tinyint(3) unsigned NOT NULL,
`parcela` smallint(5) unsigned NOT NULL,
`parcelas` smallint(5) unsigned NOT NULL,
`valor` decimal(12,2) NOT NULL,
`valor_taxa` decimal(12,2) DEFAULT NULL,
`valor_troco` decimal(12,2) DEFAULT NULL,
`confirmado` tinyint(3) unsigned DEFAULT NULL,
`data_confirmacao` datetime DEFAULT NULL,
`vencimento` date NOT NULL,
`info` varchar(510) COLLATE utf8_unicode_ci DEFAULT NULL,
`bandeira` int(10) unsigned DEFAULT NULL,
`user_add` int(10) unsigned NOT NULL,
`user_last` int(10) unsigned NOT NULL,
`param_ref` varchar(32) COLLATE utf8_unicode_ci DEFAULT NULL,
`param` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`file` int(10) unsigned DEFAULT NULL,
`date_created` datetime NOT NULL,
`date_modified` datetime NOT NULL,
`status` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=116749 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Query:
SELECT * from financeiro_fluxo
Explain:
id  select_type  table             type  key   key_len  rows
1   SIMPLE       financeiro_fluxo  ALL   NULL  NULL     116244
The same query running on localhost with the same table, returns in less than a sec...
Profile:
It seems you are doing a full table scan because your query does not include any limiting conditions (for example, a WHERE clause or LIMIT). To make the query perform better, filter on indexed columns with some kind of criteria. What happens if you add WHERE id IS NOT NULL?
I assume you need all the records; if not, limit the result set by adding conditions in a more specific WHERE clause (on an indexed column) or a LIMIT clause.
Will the "reports" aggregate data? If so, you could speed up the 8-second (remote) query by doing more work in the server, thereby shipping less data across the wire.
That is, think about whether AVG(..), COUNT(*), SUM(..), MAX(..), etc can be done in the SELECT.
Taking that another step... Build and maintain a "Summary table" that has subtotals (etc). Then, reading (or scanning) the summary table and summing up the subtotals, etc, will be even faster, both in the Server and across the wire.
(And I agree with the need to avoid *, and that the 8 seconds is probably due to network delay (and "bandwidth"). Where is the server geographically? How long does SELECT 1; take?)
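To make the "do more work in the server" suggestion concrete, here is a minimal sketch that compares fetching every row with fetching one aggregated row per category. The assumptions are that SQLite (Python's standard sqlite3 module) stands in for the remote MySQL server, that the table is reduced to two of the original columns, and that the data is synthetic. The same SELECT with COUNT and SUM should carry over to MySQL.

```python
import sqlite3

# Assumption: SQLite stands in for the remote MySQL server, and the table is
# reduced to two of the original columns with synthetic data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE financeiro_fluxo ("
    " id INTEGER PRIMARY KEY,"
    " categoria INTEGER NOT NULL,"
    " valor NUMERIC NOT NULL)"
)
conn.executemany(
    "INSERT INTO financeiro_fluxo (categoria, valor) VALUES (?, ?)",
    [(i % 4, 10) for i in range(1000)],
)

# Shipping every row: the client receives all 1000 rows and must sum them itself.
all_rows = conn.execute("SELECT categoria, valor FROM financeiro_fluxo").fetchall()

# Aggregating in the server: only one row per category crosses the wire.
totals = conn.execute(
    "SELECT categoria, COUNT(*), SUM(valor) FROM financeiro_fluxo"
    " GROUP BY categoria ORDER BY categoria"
).fetchall()

print(len(all_rows), "rows fetched without aggregation")
print(len(totals), "rows fetched with aggregation")
```

A summary table takes this one step further by storing rows like those in totals, so that reports read the subtotals rather than recomputing them.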

MySQL int Column Mapping to Int64. How can I make it map to Int32 (ie. int)?

I have the following table in my MySQL database:
CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(15) NOT NULL,
`password` varchar(255) NOT NULL,
`full_name` varchar(30) NOT NULL,
`email` varchar(200) NOT NULL,
`ip_address` varchar(15) NOT NULL,
`joined_unix` int(20) unsigned NOT NULL,
`avatar` varchar(255) DEFAULT NULL,
`bio` text,
`email_confirmed` int(1) NOT NULL DEFAULT '0',
`email_confirmation_security_hash` varchar(64) DEFAULT NULL,
`last_notification_time` int(10) unsigned NOT NULL DEFAULT '0',
`last_login_unix` int(20) unsigned NOT NULL DEFAULT '0',
`admin` int(1) unsigned NOT NULL DEFAULT '0',
`SecurityStamp` longtext,
`PhoneNumber` longtext,
`PhoneNumberConfirmed` tinyint(1) NOT NULL,
`TwoFactorEnabled` tinyint(1) NOT NULL,
`LockoutEndDateUtc` datetime DEFAULT NULL,
`LockoutEnabled` tinyint(1) NOT NULL,
`AccessFailedCount` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `username` (`username`),
UNIQUE KEY `email` (`email`),
KEY `Joined unix for stats page` (`joined_unix`)
) ENGINE=InnoDB AUTO_INCREMENT=3035 DEFAULT CHARSET=utf8;
I notice that my id column maps to Int64 within my EDMX file and if I change it, queries throw an exception even though the EDMX validates successfully.
How can I make it map to just int (Int32) instead of Int64?
The number (e.g., 10) after any integer data type has nothing to do with the data the column can contain in MySQL. That number is known as the "display width."
The display width does not constrain the range of values that can be stored in the column. 
http://dev.mysql.com/doc/refman/5.6/en/numeric-type-attributes.html
It's a legacy feature from the days of fixed-character-width terminals that an application could use as a "hint" of the widest value that should be expected, and nothing more. If what you're using is actually using that information for something else, then that's broken behavior.
An INT UNSIGNED can hold any 32-bit unsigned integer, i.e. values between 0 and 2^32-1, regardless of the display width in the column definition.
If you have a system that doesn't deal with 32-bit unsigned integers, but you really want to use a 32-bit data type, you should declare the column as INT instead of INT UNSIGNED. INT is a signed 32-bit integer, with the accompanying smaller range on either side of zero.
http://dev.mysql.com/doc/refman/5.6/en/integer-types.html
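The ranges involved can be checked with a few lines of arithmetic. This sketch is only illustrative: the helper name fits_int32 is made up, and the idea that the mapping tool picks Int64 because an unsigned 32-bit value can exceed Int32 is a plausible reading of the behaviour in the question, not something confirmed here.

```python
# Ranges of MySQL's 32-bit integer column types. The display width,
# e.g. the 10 in int(10), plays no part in these limits.
INT_SIGNED_MIN = -(2**31)
INT_SIGNED_MAX = 2**31 - 1    # 2147483647, the largest Int32 value
INT_UNSIGNED_MAX = 2**32 - 1  # 4294967295


def fits_int32(value: int) -> bool:
    """Hypothetical helper: True if `value` fits a signed 32-bit integer."""
    return INT_SIGNED_MIN <= value <= INT_SIGNED_MAX


# The AUTO_INCREMENT value from the question fits easily.
print(fits_int32(3035))
# The largest value an INT UNSIGNED column can hold does not fit.
print(fits_int32(INT_UNSIGNED_MAX))
```

Declaring the column as plain INT keeps every possible value inside the Int32 range, at the cost of halving the positive range.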

MySQL index help - which is faster?

What I'm dealing with:
I have a project which uses ActiveCollab 2, and the database structure is new to me - practically everything gets stored to a project_objects table and has a recursively hierarchical relationship:
Record 1234 might be type "Ticket" with parent_id of 123
Record 123 might be type "Category" with parent_id of 12
Record 12 might be type "Milestone" and so on.
Currently there are upwards of 450,000 records in this table and many of the queries in the code reference the name field which does NOT have an index on it. An example value might be Design or Development.
This might be an example query:
SELECT * FROM project_objects WHERE type = "Ticket" and name = "Design"
My problem:
I have a query that is taking upwards of 12-15 seconds, and I have a feeling it's because the name column lacks an index, which forces a full scan. My understanding of indexes is that if I add one to the name field, it'll speed up the reads but slow down the inserts and updates. Does the index need to be rebuilt completely every time a record is added or updated, or is it just altered/appended? I don't want to optimize this query with an index if it means drastically slowing down other parts of the code base which depend on faster writes.
My question:
Assume 100 reads and 100 writes per day, which is more likely to be a faster process for MySQL - executing the above query on the above table without the index or having to rebuild the index every time a record is added?
I don't have the knowledge or authority to start running benchmarks, but I would like to offer a suggestion to the client without sounding completely novice. Thanks!
EDIT: Here is the table:
CREATE TABLE `project_objects` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`source` varchar(50) DEFAULT NULL,
`type` varchar(30) NOT NULL DEFAULT 'ProjectObject',
`module` varchar(30) NOT NULL DEFAULT 'system',
`project_id` int(10) unsigned NOT NULL DEFAULT '0',
`milestone_id` int(10) unsigned DEFAULT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
`parent_type` varchar(30) DEFAULT NULL,
`name` varchar(150) DEFAULT NULL,
`body` longtext,
`tags` text,
`state` tinyint(4) NOT NULL DEFAULT '0',
`visibility` tinyint(4) NOT NULL DEFAULT '0',
`priority` tinyint(4) DEFAULT NULL,
`created_on` datetime DEFAULT NULL,
`created_by_id` smallint(5) unsigned NOT NULL DEFAULT '0',
`created_by_name` varchar(100) DEFAULT NULL,
`created_by_email` varchar(100) DEFAULT NULL,
`updated_on` datetime DEFAULT NULL,
`updated_by_id` smallint(5) unsigned DEFAULT NULL,
`updated_by_name` varchar(100) DEFAULT NULL,
`updated_by_email` varchar(100) DEFAULT NULL,
`due_on` date DEFAULT NULL,
`completed_on` datetime DEFAULT NULL,
`completed_by_id` smallint(5) unsigned DEFAULT NULL,
`completed_by_name` varchar(100) DEFAULT NULL,
`completed_by_email` varchar(100) DEFAULT NULL,
`comments_count` smallint(5) unsigned DEFAULT NULL,
`has_time` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_locked` tinyint(3) unsigned DEFAULT NULL,
`estimate` float(9,2) DEFAULT NULL,
`start_on` date DEFAULT NULL,
`start_on_text` varchar(50) DEFAULT NULL,
`due_on_text` varchar(50) DEFAULT NULL,
`workflow_status` int(4) DEFAULT NULL,
`varchar_field_1` varchar(255) DEFAULT NULL,
`varchar_field_2` varchar(255) DEFAULT NULL,
`integer_field_1` int(11) DEFAULT NULL,
`integer_field_2` int(11) DEFAULT NULL,
`float_field_1` double(10,2) DEFAULT NULL,
`float_field_2` double(10,2) DEFAULT NULL,
`text_field_1` longtext,
`text_field_2` longtext,
`date_field_1` date DEFAULT NULL,
`date_field_2` date DEFAULT NULL,
`datetime_field_1` datetime DEFAULT NULL,
`datetime_field_2` datetime DEFAULT NULL,
`boolean_field_1` tinyint(1) unsigned DEFAULT NULL,
`boolean_field_2` tinyint(1) unsigned DEFAULT NULL,
`position` int(10) unsigned DEFAULT NULL,
`version` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `type` (`type`),
KEY `module` (`module`),
KEY `project_id` (`project_id`),
KEY `parent_id` (`parent_id`),
KEY `created_on` (`created_on`),
KEY `due_on` (`due_on`),
KEY `milestone_id` (`milestone_id`)
) ENGINE=InnoDB AUTO_INCREMENT=993109 DEFAULT CHARSET=utf8
As @Ray points out, indexes do not have to be rebuilt on every INSERT, UPDATE or DELETE operation. So, if you only want to improve the efficiency of this (or similar) queries, add an index on either (name, type) or (type, name).
Since you already have an index on (type) alone, I would add the first one:
ALTER TABLE project_objects
ADD INDEX name_type_IDX
(name, type) ;
It may take a few seconds on a busy server but it has to be done once and then all the queries with conditions like yours will benefit. It may also improve efficiency of several other types of queries that involve name only or name and type:
WHERE name = 'Design' AND type = 'Ticket'  -- your query
WHERE name = 'Design'                      -- condition on `name` only
GROUP BY name                              -- group by `name`
WHERE name LIKE 'Design%'                  -- range condition on `name` only
WHERE name = 'Design'                      -- equality condition on `name`
  AND type LIKE 'Ticket%'                  -- and range condition on `type`
WHERE name = 'Design'                      -- equality condition on `name`
GROUP BY type                              -- and group by `type`
GROUP BY name                              -- group by `name`
       , type                              -- and `type`
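The claim that one (name, type) index serves both the original query and queries on name alone can be sketched as follows. The assumptions: SQLite (Python's standard sqlite3 module) stands in for MySQL, the table is cut down to four columns, and the rows are synthetic. On MySQL the equivalent check would be EXPLAIN on each query, with differently formatted output.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, and `project_objects` is cut down
# to a few columns with synthetic rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project_objects ("
    " id INTEGER PRIMARY KEY,"
    " type TEXT NOT NULL,"
    " name TEXT,"
    " body TEXT)"
)
# Same column order as the ALTER TABLE statement above.
conn.execute("CREATE INDEX name_type_IDX ON project_objects (name, type)")
conn.executemany(
    "INSERT INTO project_objects (type, name, body) VALUES (?, ?, ?)",
    [
        (("Ticket", "Category", "Milestone")[i % 3], f"name {i % 100}", "...")
        for i in range(3000)
    ],
)


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Equality on both columns, as in the original query.
both_columns = plan(
    "SELECT * FROM project_objects WHERE type = 'Ticket' AND name = 'name 7'"
)
# Equality on the leading column only.
name_only = plan("SELECT * FROM project_objects WHERE name = 'name 7'")

print(both_columns)
print(name_only)
```

Both plans should name name_type_IDX, because name is the leading column of the index.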
The insert cost of adding a single-column index on the name column is most likely negligible: it will probably amount to a constant-time increase of no more than a few milliseconds. You will eat up some extra disk space, but that's usually not a concern. That is nothing like the multiple seconds you're experiencing on SELECT performance.
Add the index, enjoy the performance improvement.
BTW: Indexes aren't 'rebuilt' on every insert. They're usually implemented in B-Trees and unless you're deleting frequently, should require very little re-balancing once you get larger than a few levels (and rebalancing with little depth is pretty cheap).

MySQL Error - unused primary key value returns already used

A table in a MySQL MyISAM database, which currently has 2,280 rows, has locked up twice in the past 6 months or so.
When trying to add a new row it says "Primary key already used", even though the next increment value is higher than the last id in the table. It seems to fix itself when I reset the auto increment.
The database is for a site which gets around 200+ hits a day, with a peak of about 20-25 an hour, so I can't imagine it's due to overload on the database.
I'm trying to figure out why this keeps happening and whether I can fix the issue, so the client doesn't have to keep calling up whenever they can't add a new article to their site.
EDIT: Just to preempt potential comments: I realise that the table and code are not ideal, but I'm not looking for ways to improve this unless it's the root cause of the problem. Poor performance/security I can live with (just), but I do need to figure out what could be causing it to lock the table. Thanks.
Table Definition
CREATE TABLE `articles` (
`article_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`article_meta_desc` text,
`article_meta_keyw` text,
`article_title` varchar(255) DEFAULT NULL,
`article_date` int(10) unsigned DEFAULT NULL,
`article_intro` text,
`article_embed` text,
`article_content` text,
`article_sector` varchar(255) DEFAULT NULL,
`article_type` varchar(255) DEFAULT NULL,
`article_ma` tinyint(1) unsigned DEFAULT NULL,
`article_pn` tinyint(1) unsigned DEFAULT NULL,
`article_cw` tinyint(1) unsigned DEFAULT NULL,
`article_er` tinyint(1) unsigned DEFAULT NULL,
`article_kr` tinyint(1) unsigned DEFAULT NULL,
`article_rc` tinyint(1) unsigned DEFAULT NULL,
`article_rs` tinyint(1) DEFAULT NULL,
`article_img_s` varchar(255) DEFAULT NULL,
`article_img_l` varchar(255) DEFAULT NULL,
`article_link` varchar(255) DEFAULT NULL,
`article_highlight` int(10) unsigned DEFAULT NULL,
`article_slug` text,
`article_alias` text,
`article_hide` smallint(1) unsigned NOT NULL DEFAULT '0',
`article_ad_layout` tinyint(1) unsigned DEFAULT '0',
`article_ad_banner` smallint(5) unsigned DEFAULT NULL,
`article_ad_sky` smallint(5) unsigned DEFAULT NULL,
`article_ad_square1` smallint(5) unsigned DEFAULT NULL,
`article_ad_square2` smallint(5) unsigned DEFAULT NULL,
`article_ad_square3` smallint(5) unsigned DEFAULT NULL,
`article_newswire` smallint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`article_id`),
FULLTEXT KEY `full_text` (`article_title`,`article_intro`,`article_content`)
) ENGINE=MyISAM AUTO_INCREMENT=2384 DEFAULT CHARSET=latin1 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC;
_ma,_pn,_cw,_er,_kr,_rc,_rs are used for showing which category articles are for. Please ignore the bad use of a table for the ads and section, site was made quite a long time ago, I have learnt better since :p
Insert statement
INSERT INTO articles (article_id, article_meta_desc, article_meta_keyw, article_title, article_date, article_intro, article_embed, article_content, article_sector, article_type, article_ma, article_pn, article_cw, article_er, article_kr, article_rc, article_rs, article_img_s, article_img_l, article_link, article_highlight, article_slug, article_alias, article_hide)
VALUES ('','$insert_article_meta_desc','$insert_article_meta_keyw','$insert_article_title','$insert_article_date','$insert_article_intro','$insert_article_embed','$insert_article_content','$insert_article_sector','$insert_article_category','$insert_article_ma','$insert_article_pn','$insert_article_cw','$insert_article_er','$insert_article_kr','$insert_article_rc','$insert_article_rs','$insert_article_img_s','$insert_article_img_l','','$insert_article_highlight','$insert_article_slug','','$insert_article_hide')
Again, old site, please forgive me. I'm not sure whether it has something to do with the insert setting the id to '', which would then be set to the next increment value. Could this cause problems?
Try taking the article_id field name, and the first '' in the values, out of the INSERT statement; the auto-increment ID should then get allocated automatically.
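A minimal sketch of that suggestion: the INSERT leaves out the id column entirely and lets the database assign it. The assumptions are that SQLite (Python's standard sqlite3 module) stands in for MySQL, that the table is reduced to two columns, and that the helper name add_article is made up. It shows only the shape of the INSERT and does not reproduce or diagnose the MyISAM lock-up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, and `articles` is reduced to
# two columns for brevity.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    " article_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " article_title TEXT)"
)


def add_article(title):
    """Hypothetical helper: insert without naming article_id, return the new id."""
    cur = conn.execute("INSERT INTO articles (article_title) VALUES (?)", (title,))
    return cur.lastrowid


first_id = add_article("First article")
second_id = add_article("Second article")
print(first_id, second_id)
```

Each call receives the next id without the statement ever mentioning article_id.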

MySQL architecture (hierarchical data)

Basically the question is whether I should use the same columns in every table that uses hierarchical data or instead have one table with those columns that handles hierarchical data of all the types of data.
My database stores different types of hierarchical data: Pages, Questions, etc. The questions table is shown below.
CREATE TABLE `hp_questions` (
`client_id` int(4) unsigned NOT NULL,
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`root_id` int(4) unsigned NOT NULL,
`parent_id` int(4) unsigned NOT NULL,
`depth` int(4) unsigned NOT NULL DEFAULT '0',
`position` int(4) unsigned NOT NULL DEFAULT '0',
`absolute_position` int(4) unsigned NOT NULL DEFAULT '0',
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`slug` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`uri` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`body` longtext COLLATE utf8_unicode_ci,
`last_modified_by_id` int(4) unsigned DEFAULT NULL,
`last_modified_on` int(4) unsigned DEFAULT NULL,
`language` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `client_id` (`client_id`)
) ENGINE=InnoDB AUTO_INCREMENT=738 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Now, root_id, parent_id, depth, position and absolute_position are used only for hierarchical data. The pages, filesystem, templates and permissions tables have the same columns. I was wondering whether it would be more correct to put these columns into one table instead, and add an additional type column indicating the type of data?
The 'correct' answer depends strongly on what you are trying to do and how the data is presented, but I'd say yes, your suggestion makes sense. If the hierarchy is at the core of the data organization, then separating it out makes sense as it normalizes your data.
Doing so would require that you add additional logic in code to ensure that you link to the correct questions/pages/filesystems/etc tables depending on what you are looking at.
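One way the shared table and the extra lookup logic might look is sketched below. Everything beyond the original table names is hypothetical: the hp_hierarchy table, its item_type and item_id columns, the TABLE_FOR_TYPE mapping and the children helper are invented for illustration. SQLite (Python's standard sqlite3 module) stands in for MySQL, and the content tables are cut down to id and name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all names other than hp_questions
# are hypothetical and the content tables are cut down to id and name.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hp_questions (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE hp_pages (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Hypothetical shared table: one row per node, whatever its type.
    CREATE TABLE hp_hierarchy (
        item_type TEXT NOT NULL,     -- 'question' or 'page'
        item_id INTEGER NOT NULL,    -- id in the table for that type
        parent_id INTEGER NOT NULL,  -- 0 for a root node
        depth INTEGER NOT NULL DEFAULT 0,
        position INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (item_type, item_id)
    );
    INSERT INTO hp_questions (id, name) VALUES (1, 'Billing'), (2, 'Refunds');
    INSERT INTO hp_pages (id, name) VALUES (1, 'Home');
    INSERT INTO hp_hierarchy VALUES ('question', 1, 0, 0, 0);
    INSERT INTO hp_hierarchy VALUES ('question', 2, 1, 1, 0);
    INSERT INTO hp_hierarchy VALUES ('page', 1, 0, 0, 0);
    """
)

# The extra logic the answer mentions: map each type to its own table.
# Table names come from this fixed dict, never from user input.
TABLE_FOR_TYPE = {"question": "hp_questions", "page": "hp_pages"}


def children(item_type, parent_id):
    """Return the names of a node's direct children, in position order."""
    table = TABLE_FOR_TYPE[item_type]
    rows = conn.execute(
        f"SELECT t.name FROM hp_hierarchy h JOIN {table} t ON t.id = h.item_id"
        " WHERE h.item_type = ? AND h.parent_id = ? ORDER BY h.position",
        (item_type, parent_id),
    ).fetchall()
    return [name for (name,) in rows]


print(children("question", 0))
print(children("question", 1))
print(children("page", 0))
```

The trade-off is visible in children: the hierarchy columns live in one place, but every lookup has to choose the content table from the type.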