Normalizing CSV to MySQL?

I'm new to the whole "normalized table" thing. I have a csv file with the contents as follows:
Cell,Width(m),Length(m),Spacing(m),VDD(V),VSS(V),Temp,Param,Value,Path,TOOL
pmos_var12,5e-03,5e-03,5e-03,0.5,0,0,delay[s],4.65e-06,/home/user/tests/run2/pspice
pmos_var12,5e-03,5e-03,5e-03,0.5,0,10,delay[s],6.2e-06,/home/user/tests/run2/pspice
pmos_var12,5e-03,5e-03,5e-03,0.5,0,25,delay[s],7.46e-06,/home/user/tests/run2/pspice
pmos_var12,5e-03,5e-03,5e-03,0.5,0,70,delay[s],8.98e-06,/home/user/tests/run2/pspice
pmos_var12,5e-03,5e-03,5e-03,0.5,0,100,delay[s],9.56e-06,/home/user/tests/run2/pspice
nmos_var12,5e-03,5e-03,5e-03,0.5,0,0,delay[s],4.65e-06,/home/user/tests/run2/pspice
nmos_var12,5e-03,5e-03,5e-03,0.5,0,10,delay[s],6.2e-06,/home/user/tests/run2/pspice
nmos_var12,5e-03,5e-03,5e-03,0.5,0,25,delay[s],7.46e-06,/home/user/tests/run2/pspice
nmos_var12,5e-03,5e-03,5e-03,0.5,0,70,delay[s],8.98e-06,/home/user/tests/run2/pspice
nmos_var12,5e-03,5e-03,5e-03,0.5,0,100,delay[s],9.56e-06,/home/user/tests/run2/pspice
I've created these tables to store the data:
CREATE TABLE `TEST__RUN_MAPPING` (
`ID` int(11) NOT NULL auto_increment,
`NAME` varchar(45) NOT NULL,
`STATUS` varchar(20) NOT NULL,
`PATH` text NOT NULL,
`TOOL` varchar(10) NOT NULL,
`COMMENTS` text NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `TEST__DATA_MAPPING` (
`ID` int(11) NOT NULL auto_increment,
`NAME_ID` int(11) NOT NULL,
`CONDITIONS` int(11) NOT NULL,
`VALUE` varchar(10) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `TEST__CONDITION_MAPPING` (
`ID` int(11) NOT NULL auto_increment,
`CELL_ID` int(11) NOT NULL,
`W_ID` int(11) NOT NULL,
`L_ID` int(11) NOT NULL,
`SPACE_ID` int(11) NOT NULL,
`VDD_ID` int(11) NOT NULL,
`VSS_ID` int(11) NOT NULL,
`TEMP_ID` int(11) NOT NULL,
`PARAM_ID` int(11) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
TEST__RUN_MAPPING.ID maps to TEST__DATA_MAPPING.NAME_ID.
TEST__DATA_MAPPING.CONDITIONS maps to TEST__CONDITION_MAPPING.ID.
Each *_ID column in TEST__CONDITION_MAPPING maps to its own table so that the values stay unique.
Each of these CSV files differs in the technology used in the simulations, and I keep track of this with the NAME column in TEST__RUN_MAPPING. Cell, Width(m), Length(m), Spacing(m), VDD(V), VSS(V), and Temp are all swept values, but they are usually the same per technology, so I grouped them together in a separate table.
Is there a better way, from someone more experienced, to break down these relationships for optimal read times? Is there a better normalization?

If I understand this structure correctly, I would not separate the conditions table from the run table. Surely they have a 1-to-1 relationship, so why not keep both in the same table, with each row holding the conditions for that particular run?
Also, I would be careful about putting a TEXT column in a record. TEXT and BLOB columns can cause performance problems. Since MySQL 5.0.3 a VARCHAR can hold up to 65,535 bytes (subject to the row-size limit and the character set). Paths should not need more than 1024 characters, so VARCHAR(1024) should be enough.
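A minimal sketch of one reading of this advice, using Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The table names test_run and test_data, their column names, and the load_csv helper are hypothetical. The sample rows also assume that Path and TOOL are separate comma-delimited fields, which the rows in the question do not show. The swept conditions are stored inline with each measured value instead of behind per-column ID lookup tables, and the path is a VARCHAR(1024).

```python
import csv
import io
import sqlite3

# Two sample rows in the question's layout. Assumption: Path and TOOL are
# split into separate fields here; the original rows show them run together.
SAMPLE = """Cell,Width(m),Length(m),Spacing(m),VDD(V),VSS(V),Temp,Param,Value,Path,TOOL
pmos_var12,5e-03,5e-03,5e-03,0.5,0,0,delay[s],4.65e-06,/home/user/tests/run2,pspice
pmos_var12,5e-03,5e-03,5e-03,0.5,0,10,delay[s],6.2e-06,/home/user/tests/run2,pspice
"""

# Hypothetical schema: one row per run, one row per measurement with its
# conditions stored inline.
SCHEMA = """
CREATE TABLE test_run (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(45)   NOT NULL,
    path VARCHAR(1024) NOT NULL,
    tool VARCHAR(10)   NOT NULL,
    UNIQUE (name, path, tool)
);
CREATE TABLE test_data (
    id          INTEGER PRIMARY KEY,
    run_id      INTEGER NOT NULL REFERENCES test_run(id),
    cell        VARCHAR(45) NOT NULL,
    width_m     REAL NOT NULL,
    length_m    REAL NOT NULL,
    spacing_m   REAL NOT NULL,
    vdd_v       REAL NOT NULL,
    vss_v       REAL NOT NULL,
    temperature REAL NOT NULL,
    param       VARCHAR(20) NOT NULL,
    value       REAL NOT NULL
);
"""


def load_csv(conn, run_name, text):
    """Insert every CSV row, reusing the run row when it already exists."""
    for row in csv.DictReader(io.StringIO(text)):
        key = (run_name, row["Path"], row["TOOL"])
        # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
        conn.execute(
            "INSERT OR IGNORE INTO test_run (name, path, tool) VALUES (?, ?, ?)",
            key,
        )
        (run_id,) = conn.execute(
            "SELECT id FROM test_run WHERE name = ? AND path = ? AND tool = ?",
            key,
        ).fetchone()
        conn.execute(
            "INSERT INTO test_data (run_id, cell, width_m, length_m, spacing_m,"
            " vdd_v, vss_v, temperature, param, value)"
            " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (
                run_id,
                row["Cell"],
                float(row["Width(m)"]),
                float(row["Length(m)"]),
                float(row["Spacing(m)"]),
                float(row["VDD(V)"]),
                float(row["VSS(V)"]),
                float(row["Temp"]),
                row["Param"],
                float(row["Value"]),
            ),
        )
    conn.commit()
```

Storing the numeric columns as numbers rather than VARCHAR(10) keeps range queries such as "all rows above a given temperature" straightforward. Whether the conditions belong inline or in a shared table depends on how often identical condition sets repeat across runs.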

Related

MySQL "Using filesort" when using GROUP BY

I have a small problem optimizing a query. I have two tables: one records each participation in a quiz (participation), and the other records the answer to each question (participation_rep). participation is linked to the campaign table.
SELECT count(DISTINCT p.id) as number_of_participation
FROM participation_rep prep
INNER JOIN participation p
ON p.id = prep.id_participation
AND p.trash <> 1
WHERE prep.id_question IN (780,787,794,801,809)
AND prep.trash <> 1
GROUP BY p.id_campagne
Explain of the query
The problem is that this query becomes very slow when it matches a lot of rows, and I do not know how to optimize it.
This query takes 30-50 ms to execute.
Structure of table participation :
CREATE TABLE IF NOT EXISTS `participation` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_campagne` int(11) NOT NULL,
`id_identifiant` int(11) DEFAULT NULL,
`firstname` varchar(255) DEFAULT NULL,
`surname` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
`date_p` date NOT NULL,
`hour_p` time NOT NULL,
`comment` text,
`trash` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Structure of table participation_rep :
CREATE TABLE IF NOT EXISTS `participation_rep` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_participation` int(11) NOT NULL,
`id_question` int(11) NOT NULL,
`id_rep` int(11) NOT NULL,
`trash` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `id_participation` (`id_participation`,`id_question`,`id_rep`),
KEY `id_question` (`id_question`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Is this relationship OneToMany or ManyToMany?

I have two tables ARTICLE and FAQ ( frequently asked questions ). I'm trying to establish a relationship between these two tables but I'm confused!
What I want is for an article to have many FAQs. Should I create a pivot table for this, or just reference a foreign key in the FAQ table?
Here is what I tried, but I'm not sure whether this approach is right:
Article table:
CREATE TABLE IF NOT EXISTS `article` (
`id` int(11) UNSIGNED NOT NULL,
`title` varchar(255) DEFAULT NULL,
`slug` varchar(255) DEFAULT NULL,
`description` longtext NOT NULL,
PRIMARY KEY (`id`)
);
FAQ Table Schema:
CREATE TABLE IF NOT EXISTS `eb_faq` (
`id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`faq_category_id` bigint(20) UNSIGNED DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`question` text NOT NULL,
`answer` text NOT NULL,
PRIMARY KEY (`id`)
);
Pivot:
CREATE TABLE IF NOT EXISTS `article_linked_faq` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`article_id` int(11) DEFAULT NULL,
`faq_id` int(11) DEFAULT NULL,
`order_by` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);

This schema will indeed allow an article to have multiple FAQs, but also allows one FAQ to be linked to multiple articles. If that's what you want, great! If not then I'd suggest removing the pivot table and adding article_id into eb_faq.

No, you just need to add a foreign key to the FAQ table; that creates the relationship between the two tables. There is no need for a third table. Note that the foreign key column must have the same type as article.id, including UNSIGNED.
CREATE TABLE IF NOT EXISTS `eb_faq` (
`id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`articleId` int(11) UNSIGNED DEFAULT NULL,
`faq_category_id` bigint(20) UNSIGNED DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`question` text NOT NULL,
`answer` text NOT NULL,
PRIMARY KEY (`id`),
FOREIGN KEY (articleId) REFERENCES article(id)
);
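A small runnable sketch of the one-to-many variant, again using Python's sqlite3 module in place of MySQL. The make_db and faqs_for helpers and the trimmed-down column set are illustrative only. It shows that one article can own several FAQ rows and that a FAQ pointing at a missing article is rejected.

```python
import sqlite3


def make_db():
    """Create an in-memory database with article and eb_faq (one-to-many)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on;
    # MySQL/InnoDB enforces them by default.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE article (
            id    INTEGER PRIMARY KEY,
            title VARCHAR(255)
        );
        CREATE TABLE eb_faq (
            id        INTEGER PRIMARY KEY,
            articleId INTEGER REFERENCES article(id),
            question  TEXT NOT NULL,
            answer    TEXT NOT NULL
        );
    """)
    return conn


def faqs_for(conn, article_id):
    """Return the questions attached to one article, oldest first."""
    rows = conn.execute(
        "SELECT question FROM eb_faq WHERE articleId = ? ORDER BY id",
        (article_id,),
    )
    return [question for (question,) in rows]
```

With this layout each FAQ belongs to at most one article. If the same FAQ ever needs to appear under several articles, the pivot table from the question is the structure that allows it.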

MySQL: Optimizing JOINs to find non-matching records

We have a host management system (let's call it CMDB), and a DNS system, each using different tables. The former syncs to the latter, but manual changes cause them to get out of sync. I would like to craft a query to find aliases in CMDB that do NOT have a matching entry in DNS (either no entry, or the name/IP is different)
Because of the large size of the tables, and the need for this query to run frequently, optimizing the query is very important.
Here's what the tables look like:
cmdb_record: id, ipaddr
cmdb_alias: record_id, host_alias
dns_entry: name, ipaddr
cmdb_alias.record_id is a foreign key from cmdb_record.id, so that one IP address can have multiple aliases.
So far, here's what I've come up with:
SELECT cmdb_alias.host_alias, cmdb_record.ipaddr
FROM cmdb_record
INNER JOIN cmdb_alias ON cmdb_alias.record_id = cmdb_record.id
LEFT JOIN dns_entry
ON dns_entry.ipaddr = cmdb_record.ipaddr
AND dns_entry.name = cmdb_alias.host_alias
WHERE dns_entry.ipaddr IS NULL OR dns_entry.name IS NULL
This seems to work, but takes a very long time to run. Is there a better way to do this? Thanks!
EDIT: As requested, here are the SHOW CREATE TABLEs. There are lots of extra fields that aren't particularly relevant, but included for completeness.
Create Table: CREATE TABLE `cmdb_record` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`ip_version` int(11) DEFAULT NULL,
`ipaddr` varchar(40) DEFAULT NULL,
`ipaddr_numeric` decimal(40,0) DEFAULT NULL,
`block_id` int(11) NOT NULL,
`record_commented` tinyint(1) NOT NULL,
`mod_time` datetime NOT NULL,
`deleted` tinyint(1) NOT NULL,
`deleted_date` datetime DEFAULT NULL,
`record_owner` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ipaddr` (`ipaddr`),
KEY `cmdb_record_fe30f0f7` (`ipaddr`),
KEY `cmdb_record_2b8b575` (`ipaddr_numeric`),
KEY `cmdb_record_45897ef2` (`block_id`),
CONSTRAINT `block_id_refs_id_ed6ed320` FOREIGN KEY (`block_id`) REFERENCES `cmdb_block` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=104427 DEFAULT CHARSET=latin1
Create Table: CREATE TABLE `cmdb_alias` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host_alias` varchar(255) COLLATE latin1_general_cs NOT NULL,
`record_id` int(11) NOT NULL,
`record_order` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `cmdb_alias_fcffc3bb` (`record_id`),
KEY `alias_lookup` (`host_alias`),
CONSTRAINT `record_id_refs_id_8169fc71` FOREIGN KEY (`record_id`) REFERENCES `cmdb_record` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=155433 DEFAULT CHARSET=latin1 COLLATE=latin1_general_cs
Create Table: CREATE TABLE `dns_entry` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rec_grp_id` varchar(40) NOT NULL,
`parent_id` int(11) NOT NULL,
`domain_id` int(11) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`type` varchar(6) DEFAULT NULL,
`ipaddr` varchar(255) DEFAULT NULL,
`ttl` int(11) DEFAULT NULL,
`prio` int(11) DEFAULT NULL,
`status` varchar(20) NOT NULL,
`op` varchar(20) NOT NULL,
`mod_time` datetime NOT NULL,
`whodunit` varchar(50) NOT NULL,
`comments` longtext NOT NULL,
PRIMARY KEY (`id`),
KEY `dns_entry_a2431ea` (`domain_id`),
KEY `dns_entry_52094d6e` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=49437 DEFAULT CHARSET=utf8

If you don't have one already, create an index on dns_entry(ipaddr, name). This might be all you need to speed the query.
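The following sketch puts the suggested index next to the query from the question. It uses Python's sqlite3 module with cut-down versions of the three tables so it can run anywhere. The index name dns_entry_ipaddr_name, the sample rows, and the helper functions are made up for illustration, and the ORDER BY is added only to make the output deterministic. It does not measure speed; it only confirms that the anti-join returns the aliases with no exact (ipaddr, name) match.

```python
import sqlite3

MISSING_IN_DNS = """
SELECT cmdb_alias.host_alias, cmdb_record.ipaddr
FROM cmdb_record
INNER JOIN cmdb_alias ON cmdb_alias.record_id = cmdb_record.id
LEFT JOIN dns_entry
  ON dns_entry.ipaddr = cmdb_record.ipaddr
 AND dns_entry.name = cmdb_alias.host_alias
WHERE dns_entry.ipaddr IS NULL OR dns_entry.name IS NULL
ORDER BY cmdb_alias.host_alias
"""


def make_db():
    """Build reduced versions of the three tables plus the composite index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cmdb_record (id INTEGER PRIMARY KEY, ipaddr VARCHAR(40));
        CREATE TABLE cmdb_alias (
            id         INTEGER PRIMARY KEY,
            host_alias VARCHAR(255) NOT NULL,
            record_id  INTEGER NOT NULL REFERENCES cmdb_record(id)
        );
        CREATE TABLE dns_entry (
            id     INTEGER PRIMARY KEY,
            name   VARCHAR(255),
            ipaddr VARCHAR(255)
        );
        -- The composite index from the answer (hypothetical name).
        CREATE INDEX dns_entry_ipaddr_name ON dns_entry (ipaddr, name);
    """)
    return conn


def find_missing(conn):
    """Return (host_alias, ipaddr) pairs with no exact match in dns_entry."""
    return conn.execute(MISSING_IN_DNS).fetchall()
```

Because the LEFT JOIN matches on both columns at once, the composite index lets the database probe dns_entry once per alias instead of scanning it. On the real MySQL tables, EXPLAIN would be the way to confirm that the index is actually used.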

Data Structure causing impossible joins

Tables:
nodes
data_texts
data_profiles
data_locations
data_date_times
data_media
data_products
data_metas
categories
tags
categories_nodes
tags_nodes
This question is a generalized question and is on the back of another question
Explanation:
Each of the "data" tables has a node_id that refers back to the id of the nodes table (hasMany/belongsTo association).
A "Node" can be anything - a TV Show, a Movie, a Person, an Article...etc (all generated via a CMS, so the user can control what type of "Nodes" they want).
When pulling data, I want to be able to query against certain fields. For example, if they do a search, I want to be able to pull nodes where data_texts.title LIKE '%george%', or order by the datetime field in data_locations.
The problem is, when I do a join on all seven data tables (or more), the query has to hit so many combined rows that it just times out (even with a nearly empty database.... total 200 rows across the entire database).
I realize I can determine IF I need a join depending on what I'm doing - but even with five or six joins (once the database gets to 10k+ records), it's going to be horribly slow, if it works at all. Per this question, the query I'm using just doing a join on these tables times out completely.
Each node can have multiple rows of each data type (for multi-language reasons among others).
I'm completely defeated. I'm at the point where I think I need to restructure the entire thing, but I don't have the time for that. I've thought about combining everything into one table, but I'm not sure how.
nodes
CREATE TABLE `nodes` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(100) NOT NULL,
`slug` VARCHAR(100) NOT NULL,
`node_type_id` CHAR(36) NOT NULL,
`site_id` CHAR(36) NOT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `nodeTypeId` (`node_type_id`),
INDEX `slug` (`slug`),
INDEX `nodeId` (`id`)
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM;
data_texts:
CREATE TABLE `data_texts` (
`id` CHAR(36) NOT NULL,
`title` VARCHAR(250) NULL DEFAULT NULL,
`subtitle` VARCHAR(500) NULL DEFAULT NULL,
`content` LONGTEXT NULL,
`byline` VARCHAR(250) NULL DEFAULT NULL,
`language_id` CHAR(36) NULL DEFAULT NULL,
`foreign_key` CHAR(36) NULL DEFAULT NULL,
`model` VARCHAR(40) NULL DEFAULT NULL,
`node_id` CHAR(36) NULL DEFAULT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `nodeId` (`node_id`),
INDEX `languageId_nodeId` (`language_id`, `node_id`),
INDEX `foreignKey_model` (`foreign_key`, `model`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
data_profiles
CREATE TABLE `data_profiles` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(80) NULL DEFAULT NULL,
`email_personal` VARCHAR(100) NULL DEFAULT NULL,
`email_business` VARCHAR(100) NULL DEFAULT NULL,
`email_other` VARCHAR(100) NULL DEFAULT NULL,
`title` VARCHAR(100) NULL DEFAULT NULL,
`description` LONGTEXT NULL,
`prefix` VARCHAR(40) NULL DEFAULT NULL,
`phone_home` VARCHAR(40) NULL DEFAULT NULL,
`phone_business` VARCHAR(40) NULL DEFAULT NULL,
`phone_mobile` VARCHAR(40) NULL DEFAULT NULL,
`phone_other` VARCHAR(40) NULL DEFAULT NULL,
`foreign_key` CHAR(36) NULL DEFAULT NULL,
`model` VARCHAR(40) NULL DEFAULT NULL,
`node_id` CHAR(36) NULL DEFAULT NULL,
`language_id` CHAR(36) NULL DEFAULT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
`user_id` CHAR(36) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `nodeId` (`node_id`),
INDEX `languageId_nodeId` (`node_id`, `language_id`),
INDEX `foreignKey_model` (`foreign_key`, `model`)
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM;
categories
CREATE TABLE `categories` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(100) NOT NULL,
`node_type_id` CHAR(36) NOT NULL,
`site_id` CHAR(36) NOT NULL,
`slug` VARCHAR(100) NOT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `nodeTypeId` (`node_type_id`),
INDEX `slug` (`slug`)
)
COMMENT='Used to categorize nodes'
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
categories_nodes
CREATE TABLE `categories_nodes` (
`id` CHAR(36) NOT NULL,
`category_id` CHAR(36) NOT NULL,
`node_id` CHAR(36) NOT NULL,
PRIMARY KEY (`id`),
INDEX `categoryId_nodeId` (`category_id`, `node_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
node_tags
CREATE TABLE `node_tags` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(40) NOT NULL,
`site_id` CHAR(36) NOT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `siteId` (`site_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
nodes_node_tags
CREATE TABLE `nodes_node_tags` (
`id` CHAR(36) NOT NULL,
`node_id` CHAR(36) NOT NULL,
`node_tag_id` CHAR(36) NOT NULL,
PRIMARY KEY (`id`),
INDEX `node_id_node_tag_id` (`node_id`, `node_tag_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
MySQL:
SELECT `Node`.`id`, `Node`.`name`, `Node`.`slug`, `Node`.`node_type_id`, `Node`.`site_id`, `Node`.`created`, `Node`.`modified`
FROM `mysite`.`nodes` AS `Node`
LEFT JOIN `mysite`.`data_date_times` AS `DataDateTime` ON (`DataDateTime`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_locations` AS `DataLocation` ON (`DataLocation`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_media` AS `DataMedia` ON (`DataMedia`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_metas` AS `DataMeta` ON (`DataMeta`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_profiles` AS `DataProfile` ON (`DataProfile`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_products` AS `DataProduct` ON (`DataProduct`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_texts` AS `DataText` ON (`DataText`.`node_id` = `Node`.`id`)
WHERE 1=1
GROUP BY `Node`.`id`

First, try InnoDB instead of MyISAM.
Second, remove the GROUP BY and see how well the query runs and how many rows it returns. It shouldn't be that many, but it's worth checking.
You don't need the 'nodeId' index on nodes, since id is already the primary key. Again, that shouldn't make much difference.
The WHERE 1=1 clause is irrelevant. You can remove it with no effect one way or the other.
Third, well, something is seriously broken.
Have a quick look at how to start profiling (e.g. http://dev.mysql.com/doc/refman/5.0/en/show-profile.html), and run a profile to see where all the time is going. Post the result here if it doesn't immediately show what is broken.
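One reason the row count without GROUP BY is worth checking: when a node has several rows in each data table (the question mentions multiple rows per node for languages), joining all of those tables on node_id multiplies the counts together. The sketch below uses Python's sqlite3 module with throwaway tables named data_0, data_1, and so on. These names and the joined_row_count helper are invented for the demonstration and are not the asker's schema.

```python
import sqlite3


def joined_row_count(tables, rows_per_table):
    """Count rows produced by LEFT JOINing `tables` child tables to one node.

    Every child table holds `rows_per_table` rows for the same node, so the
    join yields rows_per_table ** tables rows before any GROUP BY runs.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO nodes VALUES ('n1')")
    joins = []
    for i in range(tables):
        name = f"data_{i}"  # hypothetical stand-in for data_texts etc.
        conn.execute(
            f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, node_id TEXT)"
        )
        conn.executemany(
            f"INSERT INTO {name} (node_id) VALUES (?)",
            [("n1",)] * rows_per_table,
        )
        joins.append(f"LEFT JOIN {name} ON {name}.node_id = nodes.id")
    sql = "SELECT COUNT(*) FROM nodes " + " ".join(joins)
    (count,) = conn.execute(sql).fetchone()
    return count
```

For example, joined_row_count(7, 3) covers a single node with three rows in each of seven tables, and the join already produces 3 to the power of 7 rows for that one node. GROUP BY then has to collapse all of them back to one.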

I'm unfortunately not in a position where I can do any tests right now. I'll just throw out some ideas. I might be able to do some tests later.
Be suspicious of different collations.
Some of your ids are useless. For example, you should drop the column categories_nodes.id, and put a primary key constraint on {category_id, node_id} instead.
Be suspicious of any design that requires joining all the tables at run time. There are better ways.
Use InnoDB and foreign key constraints.
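The suggestion about categories_nodes can be sketched as follows, with Python's sqlite3 module standing in for MySQL. The link helper is hypothetical. The surrogate id column is dropped and the pair (category_id, node_id) becomes the primary key, so the same link cannot be stored twice and lookups by category_id use the key directly.

```python
import sqlite3


def make_db():
    """Create categories_nodes keyed on the (category_id, node_id) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE categories_nodes (
            category_id CHAR(36) NOT NULL,
            node_id     CHAR(36) NOT NULL,
            PRIMARY KEY (category_id, node_id)
        )
    """)
    return conn


def link(conn, category_id, node_id):
    """Attach a node to a category; return False if the link already exists."""
    try:
        conn.execute(
            "INSERT INTO categories_nodes (category_id, node_id) VALUES (?, ?)",
            (category_id, node_id),
        )
        return True
    except sqlite3.IntegrityError:
        # The composite primary key rejected a duplicate pair.
        return False
```

If queries also start from node_id, a second index on (node_id, category_id) would serve that direction; that index is an addition not mentioned in the answer.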

Querying two tables... in MySQL

CREATE TABLE IF NOT EXISTS `document`
(
`intId` int(11) NOT NULL auto_increment,
`chDocumentTitle` varchar(32) default NULL,
`dtLastUpdate` datetime default NULL,
`chUser` varchar(32) default NULL,
`chLink` varchar(256) default NULL,
`Keyword` varchar(256) default NULL,
`intParentid` int(11) NOT NULL,
PRIMARY KEY (`intId`),
KEY `dtLastUpdate` (`dtLastUpdate`,`chUser`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=10 ;
CREATE TABLE IF NOT EXISTS `category`
(
`intId` int(11) NOT NULL auto_increment,
`chName` varchar(32) NOT NULL,
`Isactive` tinyint(1) NOT NULL default '0',
`chnestUnder` int(5) NOT NULL default '0',
PRIMARY KEY (`intId`),
KEY `chName` (`chName`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=9 ;
Now I am looking for a query that lists the documents of each category in hierarchical order:
Category One
Documents of Category One
Sub Category - [ If any ]
Documents of Sub Category
Based on this I need to generate XML.

This page has a very good explanation and plenty of helpful examples on how to work with hierarchical data in MySQL. In your situation it's definitely worth the read:
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
...
Also make sure to follow the link to
There's also a reference to this page, with tips on how to work with hierarchical data in your database with a bit of help from PHP.
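As a rough illustration of the "a bit of help from the application" route, the sketch below walks the category tree in Python and emits XML. It is not the SQL technique from the linked article. It assumes that category.chnestUnder holds the parent category's intId (0 for top-level categories) and that document.intParentid holds the intId of the document's category; neither is stated in the question. The element names and the build_xml function are likewise made up.

```python
import xml.etree.ElementTree as ET


def build_xml(categories, documents):
    """Nest documents under their categories and return an XML string.

    categories: iterable of (intId, chName, chnestUnder) rows
    documents:  iterable of (intId, chDocumentTitle, intParentid) rows
    Assumes the parent links contain no cycles.
    """
    # Group child categories by parent id and document titles by category id.
    children = {}
    for cat_id, name, parent_id in categories:
        children.setdefault(parent_id, []).append((cat_id, name))
    titles = {}
    for _doc_id, title, cat_id in documents:
        titles.setdefault(cat_id, []).append(title)

    def add_level(parent_element, parent_id):
        # Emit each category, then its documents, then its sub-categories.
        for cat_id, name in children.get(parent_id, []):
            element = ET.SubElement(parent_element, "category", name=name)
            for title in titles.get(cat_id, []):
                ET.SubElement(element, "document").text = title
            add_level(element, cat_id)

    root = ET.Element("categories")
    add_level(root, 0)  # 0 marks top-level categories (assumption)
    return ET.tostring(root, encoding="unicode")
```

The two input lists would come from plain SELECT statements on category and document, so the database does two simple reads and the nesting happens in memory.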
