Basically the question is whether I should use the same columns in every table that uses hierarchical data or instead have one table with those columns that handles hierarchical data of all the types of data.
My database stores different types of hierarchical data: Pages, Questions, etc. Below is given the questions database.
CREATE TABLE `hp_questions` (
`client_id` int(4) unsigned NOT NULL,
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`root_id` int(4) unsigned NOT NULL,
`parent_id` int(4) unsigned NOT NULL,
`depth` int(4) unsigned NOT NULL DEFAULT '0',
`position` int(4) unsigned NOT NULL DEFAULT '0',
`absolute_position` int(4) unsigned NOT NULL DEFAULT '0',
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`slug` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`uri` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`body` longtext COLLATE utf8_unicode_ci,
`last_modified_by_id` int(4) unsigned DEFAULT NULL,
`last_modified_on` int(4) unsigned DEFAULT NULL,
`language` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `client_id` (`client_id`)
) ENGINE=InnoDB AUTO_INCREMENT=738 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Now, root_id, parent_id, depth, position, absolute_position are used only for hierarchical data. The same columns has pages, filesystem, templates and permissions tables. I was wondering whether it would be more correct to put them into one table instead and add additional column type indicating the type of data?
The 'correct' answer depends strongly on what you are trying to do and how the data is presented, but I'd say yes, your suggestion makes sense. If the hierarchy is at the core of the data organization, then separating it out makes sense as it normalizes your data.
Doing so would require that you add additional logic in code to ensure that you link to the correct questions/pages/filesystems/etc tables depending on what you are looking at.
Related
I'm getting very slow response running a very simple query in a small table (115k records)...
It takes about 8sec to respond, and I can't figure out why it's taking that long. Any advice would be awesome
Table:
CREATE TABLE `financeiro_fluxo` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`branch` int(10) unsigned NOT NULL,
`abertura` int(10) DEFAULT NULL,
`origem` int(10) unsigned DEFAULT NULL,
`status_pagamento` tinyint(3) unsigned DEFAULT NULL,
`conta` int(10) unsigned NOT NULL,
`tipo_lancamento` tinyint(3) unsigned NOT NULL,
`categoria` int(10) unsigned NOT NULL,
`tipo_entidade` varchar(32) COLLATE utf8_unicode_ci NOT NULL,
`entidade` int(10) unsigned DEFAULT NULL,
`entidade_input` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`tipo_pagamento` tinyint(3) unsigned NOT NULL,
`parcela` smallint(5) unsigned NOT NULL,
`parcelas` smallint(5) unsigned NOT NULL,
`valor` decimal(12,2) NOT NULL,
`valor_taxa` decimal(12,2) DEFAULT NULL,
`valor_troco` decimal(12,2) DEFAULT NULL,
`confirmado` tinyint(3) unsigned DEFAULT NULL,
`data_confirmacao` datetime DEFAULT NULL,
`vencimento` date NOT NULL,
`info` varchar(510) COLLATE utf8_unicode_ci DEFAULT NULL,
`bandeira` int(10) unsigned DEFAULT NULL,
`user_add` int(10) unsigned NOT NULL,
`user_last` int(10) unsigned NOT NULL,
`param_ref` varchar(32) COLLATE utf8_unicode_ci DEFAULT NULL,
`param` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`file` int(10) unsigned DEFAULT NULL,
`date_created` datetime NOT NULL,
`date_modified` datetime NOT NULL,
`status` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=116749 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Query:
SELECT * from financeiro_fluxo
Explain:
id select_type table type key key_len rows
1 SIMPLE financeiro_fluxo ALL 116244
The same query running on localhost with the same table, returns in less than a sec...
Profile:
Seems you are doing a full table scan because your query does not include any limiting conditions (for example WHERE clause or LIMIT). To let the query preform better use indexed columns with some kind of criteria. What happens if you add WHERE id IS NOT NULL
I assume you need all the records, if not limit the result set by added conditions in a more specific WHERE clause (on a indexed column) or a LIMIT clause.
Will the "reports" aggregate data? Of so, you could speed up the 8 second (remote) query by doing more work in the server, thereby shipping less data across the wire.
That is, think about whether AVG(..), COUNT(*), SUM(..), MAX(..), etc can be done in the SELECT.
Taking that another step... Build and maintain a "Summary table" that has subtotals (etc). Then, reading (or scanning) the summary table and summing up the subtotals, etc, will be even faster, both in the Server and across the wire.
(And I agree with the need to avoid *, and that the 8 seconds is probably due to network delay (and "bandwidth"). Where is the server geographically? How long does SELECT 1; take?)
Trying to set up a user profile page for a job site. The database I plan to use is the MySQL database.
Looking into a few database design I came up with this schema.
First, the user management tables
CREATE TABLE `user` (
`user_id` int(11) NOT NULL,
`firstname` varchar(32) NOT NULL,
`lastname` varchar(32) NOT NULL,
`email` varchar(96) NOT NULL,
`mobile_number` varchar(32) NOT NULL,
`password` varchar(40) NOT NULL,
`salt` varchar(9) NOT NULL,
`address_id` int(11) NOT NULL DEFAULT '0',
`ip` varchar(40) NOT NULL,
`status` tinyint(1) NOT NULL,
`approved` tinyint(1) NOT NULL,
`registration_date` datetime NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `user_address` (
`user_id` int(11) NOT NULL,
`city` varchar(128) NOT NULL
`work_city` varchar(128) NOT NULL,
`postal_code` varchar(10) NOT NULL,
`country_id` int(11) NOT NULL DEFAULT '0'
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `user_description` (
`user_id` int(11) NOT NULL,
`description` text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
and then the education table and work experience
CREATE TABLE `education_detail` (
`user_id` int(11) NOT NULL,
`certificate_degree_name` varchar(255) DEFAULT NULL,
`major` varchar(255) DEFAULT NULL,
`institute_university_name` varchar(255) DEFAULT NULL,
`start_date` date NOT NULL DEFAULT '0000-00-00',
`completion_date` date NOT NULL DEFAULT '0000-00-00'
)
CREATE TABLE `experience_detail` (
`user_id` int(11) NOT NULL,
`is_current_job` int(2) DEFAULT NULL,
`start_date` date NOT NULL DEFAULT '0000-00-00',
`end_date` date NOT NULL DEFAULT '0000-00-00',
`job_title` varchar(255) DEFAULT NULL,
`company_name` varchar(255) DEFAULT NULL,
`job_location_city` varchar(255) DEFAULT NULL,
`job_location_state` varchar(255) DEFAULT NULL,
`job_location_country` varchar(255) DEFAULT NULL,
`job_description` varchar(255) DEFAULT NULL
)
Note that user_id in table user_address, user_description, education_detail and experience_detail is a foreign key referencing it to the table user.
There are a few more table like skills, certification etc to which I plan on using user_id as a FK.
My question, is this database design good enough? Can you suggest me what should be done more to make the design much better?
Keep in mind not all will have work experience, some may be freshers.
Use InnoDB, not MyISAM. (There are many Q&A explaining 'why'.)
NULL or an empty string is perfectly fine for a missing description. Do you have any further argument for disliking such? (Meanwhile, InnoDB is more efficient at handling optional big strings.)
Every table should have a PRIMARY KEY; you don't seem to have any. The first table probably needs user_id as the PK. Read about AUTO_INCREMENT.
As with description, why is address in a separate table?
May I suggest this for country name/code/id:
country_code CHAR(2) CHARACTER SET ascii
Education is 1:many from users, so user_id cannot be the PK. Ditto for jobs.
I have a table called register, where I store user's data like username,birth-date,language.
there I offer them to add multiple languages. and I store that languages as id in database.
I have separate table for language too.
now I want to create a view that fetch all data with multiple select language.
CREATE TABLE `register` (
`index_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
`password` varchar(300) NOT NULL,
`m_status` varchar(200) NOT NULL,
`username` varchar(100) NOT NULL,
`occupation` int(5) NOT NULL,
`m_tongue` varchar(200) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
PRIMARY KEY (`index_id`)
) ENGINE=MyISAM AUTO_INCREMENT=621 DEFAULT CHARSET=latin1
What I'm dealing with:
I have a project which uses ActiveCollab 2, and the database structure is new to me - practically everything gets stored to a project_objects table and has a recursively hierarchical relationship:
Record 1234 might be type "Ticket" with parent_id of 123
Record 123 might be type "Category" with parent_id of 12
Record 12 might be type "Milestone" and so on.
Currently there are upwards of 450,000 records in this table and many of the queries in the code reference the name field which does NOT have an index on it. An example value might be Design or Development.
This might be an example query:
SELECT * FROM project_objects WHERE type = "Ticket" and name = "Design"
My problem:
I have a query that is taking upwards of 12-15 seconds and I have a feeling it's from that
name column lacking the index and requiring the full text search. My understanding with indexes is that if I add one to the name field, it'll speed up the reads, but slow down the inserts and updates. Does the index need to get rebuilt completely every time a record is added or updated or is it just altered/appended? I don't want to optimize this query with an index if it means drastically slowing down other parts of the code base which depend on faster writes.
My question:
Assume 100 reads and 100 writes per day, which is more likely to be a faster process for MySQL - executing the above query on the above table without the index or having to rebuild the index every time a record is added?
I don't have the knowledge or authority to start running benchmarks, but I would like to offer a suggestion to the client without sounding completely novice. Thanks!
EDIT: Here is the table:
'CREATE TABLE `project_objects` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`source` varchar(50) DEFAULT NULL,
`type` varchar(30) NOT NULL DEFAULT ''ProjectObject'',
`module` varchar(30) NOT NULL DEFAULT ''system'',
`project_id` int(10) unsigned NOT NULL DEFAULT ''0'',
`milestone_id` int(10) unsigned DEFAULT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
`parent_type` varchar(30) DEFAULT NULL,
`name` varchar(150) DEFAULT NULL,
`body` longtext,
`tags` text,
`state` tinyint(4) NOT NULL DEFAULT ''0'',
`visibility` tinyint(4) NOT NULL DEFAULT ''0'',
`priority` tinyint(4) DEFAULT NULL,
`created_on` datetime DEFAULT NULL,
`created_by_id` smallint(5) unsigned NOT NULL DEFAULT ''0'',
`created_by_name` varchar(100) DEFAULT NULL,
`created_by_email` varchar(100) DEFAULT NULL,
`updated_on` datetime DEFAULT NULL,
`updated_by_id` smallint(5) unsigned DEFAULT NULL,
`updated_by_name` varchar(100) DEFAULT NULL,
`updated_by_email` varchar(100) DEFAULT NULL,
`due_on` date DEFAULT NULL,
`completed_on` datetime DEFAULT NULL,
`completed_by_id` smallint(5) unsigned DEFAULT NULL,
`completed_by_name` varchar(100) DEFAULT NULL,
`completed_by_email` varchar(100) DEFAULT NULL,
`comments_count` smallint(5) unsigned DEFAULT NULL,
`has_time` tinyint(1) unsigned NOT NULL DEFAULT ''0'',
`is_locked` tinyint(3) unsigned DEFAULT NULL,
`estimate` float(9,2) DEFAULT NULL,
`start_on` date DEFAULT NULL,
`start_on_text` varchar(50) DEFAULT NULL,
`due_on_text` varchar(50) DEFAULT NULL,
`workflow_status` int(4) DEFAULT NULL,
`varchar_field_1` varchar(255) DEFAULT NULL,
`varchar_field_2` varchar(255) DEFAULT NULL,
`integer_field_1` int(11) DEFAULT NULL,
`integer_field_2` int(11) DEFAULT NULL,
`float_field_1` double(10,2) DEFAULT NULL,
`float_field_2` double(10,2) DEFAULT NULL,
`text_field_1` longtext,
`text_field_2` longtext,
`date_field_1` date DEFAULT NULL,
`date_field_2` date DEFAULT NULL,
`datetime_field_1` datetime DEFAULT NULL,
`datetime_field_2` datetime DEFAULT NULL,
`boolean_field_1` tinyint(1) unsigned DEFAULT NULL,
`boolean_field_2` tinyint(1) unsigned DEFAULT NULL,
`position` int(10) unsigned DEFAULT NULL,
`version` int(10) unsigned NOT NULL DEFAULT ''0'',
PRIMARY KEY (`id`),
KEY `type` (`type`),
KEY `module` (`module`),
KEY `project_id` (`project_id`),
KEY `parent_id` (`parent_id`),
KEY `created_on` (`created_on`),
KEY `due_on` (`due_on`)
KEY `milestone_id` (`milestone_id`)
) ENGINE=InnoDB AUTO_INCREMENT=993109 DEFAULT CHARSET=utf8'
As #Ray points out, indexes do not have to be rebuilt on every Insert, Update or Delete operation. So, if you only want to improve efficuency of this (or similar) queries, add either an index on (name, type) or on (type, name).
Since you already have an index on (type) alone, I would add the first one:
ALTER TABLE project_objects
ADD INDEX name_type_IDX
(name, type) ;
It may take a few seconds on a busy server but it has to be done once and then all the queries with conditions like yours will benefit. It may also improve efficiency of several other types of queries that involve name only or name and type:
WHERE name = 'Design' AND type = 'Ticket' --- your query
WHERE name = 'Design' --- condition on `name` only
GROUP BY name --- group by `name`
WHERE name LIKE 'Design%' --- range condition on `name` only
WHERE name = 'Design' --- equality condition on `name`
AND type LIKE 'Ticket%' --- and range condition on `type`
WHERE name = 'Design' --- equality condition on `name`
GROUP BY type --- and group by `type`
GROUP BY name --- group by `name`
, type --- and `type`
The insert cost of adding a single point index on the name column is most likely negligible--it will probably amount to an addition of a constant time increase, probably no more that a few milliseconds. You will eat up some extra disk space, but that's usually not a concern. Nothing like the multiple seconds you're experienceing on select performance.
Add the index, enjoy the performance improvement.
BTW: Indexes aren't 'rebuilt' on every insert. They're usually implemented in B-Trees and unless you're deleting frequently, should require very little re-balancing once you get larger than a few levels (and rebalancing with little depth is pretty cheap).
I've a table with around 6-7lacs records and it's going to grow as time passes.It has around 16-20 columns in it. There are no one-many relationship to any of these columns.
User data entries are stored in these table.
So would it be feasible to split my table into multiple small tables or else just split the table into 2 halfs one with all the entries in it and other the recently fresh records which would be present to the data entry operators to feed in their entries.
In short my question is whether the mysql execution time would be faster if I split the tables, or would it be faster if I split them into two half's.
I guess the latter would be more feasible since it would not perform any join queries.
Updated:
CREATE TABLE `images` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`primary_category_id` int(10) unsigned DEFAULT NULL,
`secondary_category_id` int(10) unsigned DEFAULT NULL,
`front_url` varchar(255) DEFAULT NULL,
`back_url` varchar(255) DEFAULT NULL,
`title` varchar(100) DEFAULT NULL,
`part` varchar(10) DEFAULT NULL,
`photo_id` int(10) unsigned DEFAULT NULL,
`photo_dt_month` varchar(2) DEFAULT NULL,
`photo_dt_day` varchar(2) DEFAULT NULL,
`photo_dt_yr` varchar(4) DEFAULT NULL,
`type` varchar(25) DEFAULT NULL,
`size_width` int(10) unsigned DEFAULT NULL,
`size_height` int(10) unsigned DEFAULT NULL,
`dpi` int(10) unsigned NOT NULL DEFAULT '0',
`dpix` int(10) unsigned DEFAULT NULL,
`dpiy` int(10) unsigned DEFAULT NULL,
`in_stock` varchar(50) DEFAULT NULL,
`outlet` varchar(50) DEFAULT NULL,
`source` varchar(50) DEFAULT NULL,
`keywords` varchar(255) DEFAULT NULL,
`emotional_keywords` varchar(255) DEFAULT NULL,
`mechanical_keywords` varchar(255) DEFAULT NULL,
`description` text,
`notes` text,
`comments` text,
`exported_to_ebay_dt` datetime DEFAULT NULL,
`exported_to_ebay` set('Y','N') NOT NULL DEFAULT 'N',
`updated_worker_id` int(10) unsigned DEFAULT NULL,
`updated_worker_dt` datetime DEFAULT NULL,
`locked_worker_id` int(10) unsigned DEFAULT NULL,
`locked_worker_dt` datetime DEFAULT NULL,
`updated_admin_id` int(10) unsigned DEFAULT NULL,
`updated_admin_dt` datetime DEFAULT NULL,
`added_dt` datetime DEFAULT NULL,
`updated_manager_id` int(10) unsigned DEFAULT NULL,
`updated_manager_dt` datetime DEFAULT NULL,
`manager_review` set('Y','N') NOT NULL DEFAULT 'N',
`paid_status` set('Y','N') NOT NULL DEFAULT 'N',
`exported_to_web_dt` datetime DEFAULT NULL,
`exported_to_web` set('Y','N') DEFAULT 'N',
`prefix` varchar(50) DEFAULT NULL,
`is_premium` set('Y','N') DEFAULT 'N',
`template` varchar(50) DEFAULT 'HIPE_default',
`photographer` varchar(100) DEFAULT NULL,
`copyright` varchar(100) DEFAULT NULL,
`priority` int(4) DEFAULT '1',
`step` set('1','2') DEFAULT '1',
PRIMARY KEY (`id`),
UNIQUE KEY `part` (`part`),
KEY `primary_category_id` (`primary_category_id`),
KEY `updated_worker_id` (`updated_worker_id`),
KEY `updated_worker_dt` (`updated_worker_dt`)
) ENGINE=MyISAM AUTO_INCREMENT=1013687 DEFAULT CHARSET=latin1
The above is my table structure.After there are entries being made say around 1lac I would split it into another table say images_history with same structure.Is this feasible or should I split them into multiple tables to reduce the query execution time
Why do you want to split the table? It would lead to a ton of extra code and slow down the execution time by adding extra queries if you still want to access both of the new tables. (If one of the tables are going to store rarely used previous versions of records of the images table - i.e. version control - it may still be a good idea).
Before even thinking about splitting the table, see if you can increase performance by optimizing the existing code by making sure none of the following performance disasters are:
Do all SELECTs filter by PRIMARY KEY?
Is the index cache large enough to hold all indices in the computers RAM?
Are any string matching SELECTs with LIKE using the indices? I.e. only exact matches or wildcards on the right, never on the left (e.g. "searchword%" and never "%searchword"
Are there any slow performing queries that use SELECT * instead of selecting only the columns you need?
Have you avoided using OR in SELECTs?
Performing queries on a table with 700 000 records shouldn't be slow if tabels are properly indexed and queries are actually using those indices.