Using query to change table mapping - mysql

I have a table mytable(id, key, value). I realize that key is generating a lot of data redundancy, since each key is a string (my keys are really long, but repetitive). How do I build a separate table keyTable(keyID, key) and then alter my table to be mytable(id, keyID, value)?

Create keyTable
Fill keys from mytable:
INSERT INTO keyTable (`key`) SELECT DISTINCT mytable.key FROM mytable;
Add a keyID column to mytable
Assign keyIDs:
UPDATE mytable SET keyID = (SELECT keyTable.keyID FROM keyTable WHERE keyTable.key = mytable.key);
Remove the key column from mytable
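The five steps above can be sketched end to end with Python's built-in sqlite3 module, used here only as a stand-in for MySQL so the example runs without a server. The helper name normalize_keys and the sample rows are hypothetical. SQLite syntax differs from MySQL: identifiers are double-quoted instead of backticked, an INTEGER PRIMARY KEY column replaces AUTO_INCREMENT, and step 5 rebuilds the table because SQLite versions older than 3.35 have no DROP COLUMN (in MySQL that step would be ALTER TABLE mytable DROP COLUMN `key`).

```python
import sqlite3


def normalize_keys(conn: sqlite3.Connection) -> None:
    """Split repeated key strings out of mytable into keyTable (hypothetical helper)."""
    cur = conn.cursor()
    # 1. Create keyTable; SQLite auto-assigns values to an INTEGER PRIMARY KEY column.
    cur.execute('CREATE TABLE keyTable (keyID INTEGER PRIMARY KEY, "key" TEXT NOT NULL UNIQUE)')
    # 2. Fill it from mytable, one row per distinct key.
    cur.execute('INSERT INTO keyTable ("key") SELECT DISTINCT "key" FROM mytable')
    # 3. Add the keyID column to mytable.
    cur.execute("ALTER TABLE mytable ADD COLUMN keyID INTEGER")
    # 4. Assign keyIDs with a correlated subquery.
    cur.execute('UPDATE mytable SET keyID = '
                '(SELECT keyTable.keyID FROM keyTable WHERE keyTable."key" = mytable."key")')
    # 5. Remove the key column by rebuilding the table without it.
    cur.execute("CREATE TABLE mytable_new AS SELECT id, keyID, value FROM mytable")
    cur.execute("DROP TABLE mytable")
    cur.execute("ALTER TABLE mytable_new RENAME TO mytable")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (id INTEGER PRIMARY KEY, "key" TEXT, value TEXT)')
conn.executemany('INSERT INTO mytable (id, "key", value) VALUES (?, ?, ?)',
                 [(1, "a-very-long-repeated-key", "x"),
                  (2, "another-long-key", "y"),
                  (3, "a-very-long-repeated-key", "z")])
normalize_keys(conn)

columns = [col[1] for col in conn.execute("PRAGMA table_info(mytable)")]
rows = conn.execute("SELECT id, keyID, value FROM mytable ORDER BY id").fetchall()
# Joining back through keyTable recovers the original key strings.
restored = conn.execute('SELECT m.id, k."key" FROM mytable m '
                        'JOIN keyTable k USING (keyID) ORDER BY m.id').fetchall()
```

Rows 1 and 3 end up sharing one keyID, so each long key string is stored only once, in keyTable.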

I've just posted my worked solution to your problem. Go through it step by step:
CREATE TABLE `keytable` (
`keyID` INT( 11 ) NOT NULL auto_increment,
`key` VARCHAR( 100 ) NOT NULL,
`id` INT( 11 ) NOT NULL,
PRIMARY KEY (`keyID`)
) ;
insert into `keytable` (`key`,`id`) select `key`,`id` from mytable;
ALTER TABLE `mytable` CHANGE `key` `keyID` INT( 11 ) NOT NULL ;
update `mytable` set `keyID`= (select `keyID` from keytable where keytable.id=mytable.id);
ALTER TABLE `keytable` DROP `id` ;

Related

How to copy a very large table into another table in MySQL?

I have a large table with 110M rows. I would like to copy some of the fields into a new table, and here is a rough idea of how I am trying to do it:
DELIMITER //
-- Hypothetical procedure name; DECLARE is only valid inside a stored program.
CREATE PROCEDURE migrate_old_table()
BEGIN
DECLARE l_seenChangesTo DATETIME DEFAULT '1970-01-01 01:01:01';
DECLARE l_migrationStartTime DATETIME;
SELECT NOW() into l_migrationStartTime;
-- See if we've run this migration before and if so, pick up from where we left off...
IF EXISTS(SELECT seenChangesTo FROM migration_status WHERE client_user = CONCAT('this-migration-script-', user())) THEN
SELECT seenChangesTo FROM migration_status WHERE client_user = CONCAT('this-migration-script-', user()) INTO l_seenChangesTo;
SELECT NOW() as LogTime, CONCAT('Picking up from where we left off: ', l_seenChangesTo) as MigrationStatus;
END IF;
INSERT IGNORE INTO newTable
(field1, field2, lastModified)
SELECT o.column1 AS field1,
o.column2 AS field2,
o.lastModified
FROM oldTable o
WHERE
o.lastModified >= l_seenChangesTo AND
o.lastModified <= l_migrationStartTime;
INSERT INTO migration_status (client_user,seenChangesTo)
VALUES (CONCAT('this-migration-script-', user()), l_migrationStartTime)
ON DUPLICATE KEY UPDATE seenChangesTo=l_migrationStartTime;
END //
DELIMITER ;
Context:
CREATE TABLE IF NOT EXISTS `newTable` (
`field1` varchar(255) NOT NULL,
`field2` tinyint unsigned NOT NULL,
`lastModified` datetime NOT NULL,
PRIMARY KEY (`field1`, `field2`),
KEY `ix_field1` (`field1`),
KEY `ix_lastModified` (`lastModified`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `oldTable` (
`column1` varchar(255) NOT NULL,
`column2` tinyint unsigned NOT NULL,
`lastModified` datetime NOT NULL,
PRIMARY KEY (`column1`, `column2`),
KEY `ix_column1` (`column1`),
KEY `ix_lastModified` (`lastModified`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `migration_status` (
`client_user` char(64) NOT NULL,
`seenChangesTo` char(128) NOT NULL,
PRIMARY KEY (`client_user`)
);
Note: I have a few more columns in oldTable. Both oldTable and newTable are in the same schema, in MySQL.
What's the general strategy when copying a very large table? Should I perform this migration iteratively, copying, say, 50,000 rows at a time?
Doing a migration like this iteratively is going to give dreadfully slow inserts. Why not export with SELECT ... INTO OUTFILE from oldTable, then load the file with LOAD DATA INFILE?
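The watermark idea in the question (copy only the rows whose lastModified falls between the last recorded seenChangesTo and the migration start time, then record the new watermark) can be sketched with Python's sqlite3 module as a stand-in for MySQL. This is an illustration under assumptions, not the asker's procedure: migrate() and script_id are hypothetical names, the fixed script_id replaces CONCAT('this-migration-script-', user()), and start_time is passed in instead of NOW() so the run is repeatable. INSERT OR IGNORE and INSERT OR REPLACE are SQLite's closest counterparts to MySQL's INSERT IGNORE and INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3


def migrate(conn, start_time, script_id="this-migration-script"):
    """Copy oldTable rows changed since the last recorded watermark (hypothetical helper)."""
    # Read the watermark left by the previous run, defaulting to the question's start value.
    row = conn.execute("SELECT seenChangesTo FROM migration_status WHERE client_user = ?",
                       (script_id,)).fetchone()
    seen = row[0] if row else "1970-01-01 01:01:01"
    # INSERT OR IGNORE plays the role of INSERT IGNORE: rows whose primary key
    # already exists in newTable are skipped, not updated.
    conn.execute("INSERT OR IGNORE INTO newTable (field1, field2, lastModified) "
                 "SELECT column1, column2, lastModified FROM oldTable "
                 "WHERE lastModified >= ? AND lastModified <= ?",
                 (seen, start_time))
    # Record the new watermark (stand-in for ON DUPLICATE KEY UPDATE).
    conn.execute("INSERT OR REPLACE INTO migration_status (client_user, seenChangesTo) "
                 "VALUES (?, ?)", (script_id, start_time))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE oldTable (column1 TEXT NOT NULL, column2 INTEGER NOT NULL,
                       lastModified TEXT NOT NULL, PRIMARY KEY (column1, column2));
CREATE TABLE newTable (field1 TEXT NOT NULL, field2 INTEGER NOT NULL,
                       lastModified TEXT NOT NULL, PRIMARY KEY (field1, field2));
CREATE TABLE migration_status (client_user TEXT PRIMARY KEY, seenChangesTo TEXT NOT NULL);
""")
conn.executemany("INSERT INTO oldTable VALUES (?, ?, ?)",
                 [("a", 1, "2020-01-01 00:00:00"), ("b", 2, "2020-01-02 00:00:00")])
migrate(conn, "2020-01-03 00:00:00")
after_first = conn.execute("SELECT COUNT(*) FROM newTable").fetchone()[0]

# A row changed after the first run is picked up by the second run only.
conn.execute("INSERT INTO oldTable VALUES ('c', 3, '2020-01-04 00:00:00')")
migrate(conn, "2020-01-05 00:00:00")
after_second = conn.execute("SELECT COUNT(*) FROM newTable").fetchone()[0]
watermark = conn.execute("SELECT seenChangesTo FROM migration_status").fetchone()[0]
```

Timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, which sorts correctly as a string, so the range comparison behaves like a DATETIME comparison here.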

creating table from two different table

I am creating table from two different table with query:
create table post_table as
( select t1.id, t2.url, t2.desc, t2.preview, t2.img_url,
t2.title, t2.hash, t2.rate
from user_record t1, post_data t2
primary key (t1.id, t2,hash))
what's syntax error here?
post_data
----
url varchar(255) No
desc varchar(2048) No
preview varchar(255) No
img_url varchar(128) No
title varchar(128) No
hash varchar(128) No // This is one
rate varchar(20) Yes NULL
user_record
id varchar(40) No //This is 2nd
name varchar(50) Yes NULL
email varchar(50) Yes NULL
picture varchar(50) No
UPDATE:
create table post_table (
id VARCHAR(40), url varchar(255), preview varchar(255) , img_url varchar(128), title varchar(128), hash varchar(128), rate varchar(20)
primary key (t1.id, t2,hash));
select t1.id, t2.url, t2.desc, t2.preview, t2.img_url,
t2.title, t2.hash, t2.rate
from user_record t1, post_data t2;
Formatting the CREATE TABLE statement so we can see the ( ) pairing:
create table post_table as (
select t1.id, t2.url, t2.desc, t2.preview, t2.img_url, t2.title, t2.hash, t2.rate
from user_record t1, post_data t2
primary key (t1.id, t2,hash)
)
We can see that the primary key is being attached to the select statement.
Beyond that, there are specific restrictions on which parts of the general CREATE TABLE syntax can be used in a CREATE TABLE ... SELECT statement.
From: http://dev.mysql.com/doc/refman/5.1/en/create-table-select.html
The ENGINE option is part of the CREATE TABLE statement, and should
not be used following the SELECT; this would result in a syntax error.
The same is true for other CREATE TABLE options such as CHARSET.
You can, however, define keys by using syntax similar to:
mysql> CREATE TABLE test (a INT NOT NULL AUTO_INCREMENT,
-> PRIMARY KEY (a), KEY(b))
-> ENGINE=MyISAM SELECT b,c FROM test2;
So rework your query to define the columns first, then the keys, and put the SELECT statement last. Using the column types from your table listing, it would look something like this (`desc` is a reserved word, so it needs backticks in the column list):
create table post_table (
id varchar(40), url varchar(255), `desc` varchar(2048), preview varchar(255),
img_url varchar(128), title varchar(128), hash varchar(128), rate varchar(20),
primary key (id, hash)
)
select t1.id, t2.url, t2.desc, t2.preview, t2.img_url,
t2.title, t2.hash, t2.rate
from user_record t1, post_data t2;
You have to put the key definition BEFORE the SELECT.
Also, you can't define keys without defining the columns, so if you need keys, you have to write out the whole table structure.
http://dev.mysql.com/doc/refman/5.1/en/create-table.html
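Another way is to create the indexes after creating the table, with CREATE INDEX (for the primary key itself, use ALTER TABLE ... ADD PRIMARY KEY, since CREATE INDEX cannot create a PRIMARY KEY in MySQL).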
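The "define the table and key first, then fill it from the SELECT" approach can be tried out with Python's sqlite3 module as a stand-in. SQLite does not accept a column list with CREATE TABLE ... AS SELECT, so this sketch uses a separate CREATE TABLE followed by INSERT ... SELECT; in MySQL the single CREATE TABLE (...) SELECT form shown in the answer works. The tables are cut down to a few columns and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy versions of the question's source tables (a subset of columns, made-up rows).
conn.executescript("""
CREATE TABLE user_record (id TEXT NOT NULL, name TEXT);
CREATE TABLE post_data (url TEXT NOT NULL, "desc" TEXT NOT NULL, hash TEXT NOT NULL);
INSERT INTO user_record VALUES ('u1', 'Ann'), ('u2', 'Bob');
INSERT INTO post_data VALUES ('http://a.example', 'first', 'h1'),
                             ('http://b.example', 'second', 'h2');
""")

# 1. Define the columns, then the primary key. The key lists post_table's own
#    column names; the t1/t2 aliases only exist inside the SELECT.
conn.execute("""
CREATE TABLE post_table (
    id     TEXT NOT NULL,
    url    TEXT NOT NULL,
    "desc" TEXT NOT NULL,
    hash   TEXT NOT NULL,
    PRIMARY KEY (id, hash)
)""")

# 2. Fill it from the SELECT. As in the question, there is no join condition,
#    so every user is paired with every post.
conn.execute("""
INSERT INTO post_table (id, url, "desc", hash)
SELECT t1.id, t2.url, t2."desc", t2.hash
FROM user_record t1 CROSS JOIN post_data t2
""")
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM post_table").fetchone()[0]

# The primary key now rejects a second row with the same (id, hash).
try:
    conn.execute("INSERT INTO post_table VALUES ('u1', 'http://c.example', 'dup', 'h1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```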

How to add an integer column to a string column in MySQL 5.0

I want to add an integer column to a string, because I need to generate a varchar value with a numeric part that increments automatically, for example P000001, P000002, ...
To do that, I created the table with an int field ID that auto-increments, and I am concatenating P, a run of zeros, and the ID value.
The table I have created is:
CREATE TABLE tblAcceptTest(
ID int AUTO_INCREMENT NOT NULL primary key,
PatientID as CONCAT('P' , CONCAT('000000',CAST(ID as char)))
);
It gives me an error at the AS keyword.
Please help.
MySQL's documentation (http://dev.mysql.com/doc/refman/5.1/en/create-table.html) says, "the default value must be a constant; it cannot be a function or an expression." Why don't you just get the PatientID value afterward as part of the SELECT:
SELECT CONCAT('P', LPAD(ID, 6, 0)) AS PatientID FROM tblAcceptTest;
It looks like you want six digits after the "P", so try this for your expression:
CONCAT('P', LPAD(ID, 6, '0'))
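The answers in this thread build the ID in two ways: CONCAT('P', LPAD(ID, 6, '0')) and CONCAT('P', RIGHT(CONCAT('000000', ID), 6)). A small Python emulation (the function names are hypothetical, and mysql_lpad only handles a single-character pad string) shows that the two agree for IDs up to 999999 but differ once an ID needs seven digits: LPAD keeps the leftmost six characters, RIGHT keeps the rightmost six.

```python
def mysql_lpad(s: str, length: int, pad: str = "0") -> str:
    """Emulate MySQL LPAD() for a single-character pad string (simplifying assumption)."""
    if len(s) >= length:
        return s[:length]  # MySQL shortens a too-long string to `length` characters
    return pad * (length - len(s)) + s


def mysql_right(s: str, n: int) -> str:
    """Emulate MySQL RIGHT(): the rightmost n characters."""
    return s[-n:] if n > 0 else ""


def patient_id_lpad(i: int) -> str:
    # CONCAT('P', LPAD(ID, 6, '0'))
    return "P" + mysql_lpad(str(i), 6, "0")


def patient_id_right(i: int) -> str:
    # CONCAT('P', RIGHT(CONCAT('000000', ID), 6))
    return "P" + mysql_right("000000" + str(i), 6)


for i in (1, 42, 999999, 1234567):
    print(i, patient_id_lpad(i), patient_id_right(i))
```

With a char(7) PatientID, either expression silently produces a wrong or colliding value past 999999, so the column width effectively caps the number of patients.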
MySQL 5.0 has little support for computed columns (generated columns only arrived in MySQL 5.7).
Given your specification, PatientID could be a char(7):
CREATE TABLE tblAcceptTest(
ID int AUTO_INCREMENT NOT NULL primary key,
PatientID char(7)
);
Then create some triggers. Note that the following insert trigger will cause problems on high-concurrency servers, because two concurrent inserts can read the same AUTO_INCREMENT value from information_schema.
DELIMITER |
CREATE TRIGGER tblAcceptTest_insert BEFORE INSERT ON tblAcceptTest
FOR EACH ROW BEGIN
DECLARE next_id INT;
SET next_id = (SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA=DATABASE() AND TABLE_NAME='tblAcceptTest');
SET NEW.PatientID = CONCAT('P' , RIGHT(CONCAT('000000',next_id),6)) ;
END;
|
CREATE TRIGGER tblAcceptTest_update BEFORE UPDATE ON tblAcceptTest
FOR EACH ROW BEGIN
SET NEW.PatientID = CONCAT('P' , RIGHT(CONCAT('000000',NEW.ID),6)) ;
END;
|
DELIMITER ;
You can use relationships and views to achieve the same result.
CREATE TABLE `patient` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`patient` varchar(60) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `accepted_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`patient_id` int(11) NOT NULL,
`accepted` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `patient_id` (`patient_id`),
CONSTRAINT `accepted_test_ibfk_1` FOREIGN KEY (`patient_id`) REFERENCES `patient` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
create or replace view accepted_test_veiw as
select CONCAT('P' , RIGHT(CONCAT('000000',patient_id),6)) patient_key
, accepted
, id accepted_test_id
, patient_id
from accepted_test ;
select * from `accepted_test_veiw`;
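The view approach (store only the integer patient_id and derive the "P000001"-style key when reading) can be sketched with Python's sqlite3 module as a stand-in for MySQL. SQLite has no RIGHT() function, so the sketch uses substr(x, -6), which returns the last six characters; the patient names and timestamp are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, patient TEXT);
CREATE TABLE accepted_test (
    id         INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient (id) ON DELETE CASCADE ON UPDATE CASCADE,
    accepted   TEXT
);
-- substr(x, -6) stands in for MySQL's RIGHT(x, 6).
CREATE VIEW accepted_test_veiw AS
SELECT 'P' || substr('000000' || patient_id, -6) AS patient_key,
       accepted,
       id AS accepted_test_id,
       patient_id
FROM accepted_test;
INSERT INTO patient (patient) VALUES ('Ann'), ('Bob');
INSERT INTO accepted_test (patient_id, accepted) VALUES (2, '2012-01-01 10:00:00');
""")

rows = conn.execute("SELECT patient_key, patient_id FROM accepted_test_veiw").fetchall()

# Deleting the patient cascades to accepted_test, so the view row disappears too.
conn.execute("DELETE FROM patient WHERE id = 2")
after_delete = conn.execute("SELECT COUNT(*) FROM accepted_test_veiw").fetchone()[0]
```

Because the key is computed at read time, nothing needs to keep a stored PatientID in sync when rows are inserted concurrently.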

Super slow query with CROSS JOIN

I have two tables named table_1 (1GB) and reference (250Mb).
When I run an UPDATE with a cross join against reference, it takes 16 hours to update table_1. We switched the file system from ext3 to XFS, but it still takes 16 hours. What am I doing wrong?
Here is the update/cross join query :
mysql> UPDATE table_1 CROSS JOIN reference ON
-> (table_1.start >= reference.txStart AND table_1.end <= reference.txEnd)
-> SET table_1.name = reference.name;
Query OK, 17311434 rows affected (16 hours 36 min 48.62 sec)
Rows matched: 17311434 Changed: 17311434 Warnings: 0
Here is the SHOW CREATE TABLE output for table_1 and reference:
CREATE TABLE `table_1` (
`strand` char(1) DEFAULT NULL,
`chr` varchar(10) DEFAULT NULL,
`start` int(11) DEFAULT NULL,
`end` int(11) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`name2` varchar(255) DEFAULT NULL,
KEY `annot` (`start`,`end`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
CREATE TABLE `reference` (
`bin` smallint(5) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
`chrom` varchar(255) NOT NULL,
`strand` char(1) NOT NULL,
`txStart` int(10) unsigned NOT NULL,
`txEnd` int(10) unsigned NOT NULL,
`cdsStart` int(10) unsigned NOT NULL,
`cdsEnd` int(10) unsigned NOT NULL,
`exonCount` int(10) unsigned NOT NULL,
`exonStarts` longblob NOT NULL,
`exonEnds` longblob NOT NULL,
`score` int(11) DEFAULT NULL,
`name2` varchar(255) NOT NULL,
`cdsStartStat` enum('none','unk','incmpl','cmpl') NOT NULL,
`cdsEndStat` enum('none','unk','incmpl','cmpl') NOT NULL,
`exonFrames` longblob NOT NULL,
KEY `chrom` (`chrom`,`bin`),
KEY `name` (`name`),
KEY `name2` (`name2`),
KEY `annot` (`txStart`,`txEnd`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
You should index the table_1.start, reference.txStart, table_1.end and reference.txEnd columns:
ALTER TABLE `table_1` ADD INDEX ( `start` ) ;
ALTER TABLE `table_1` ADD INDEX ( `end` ) ;
ALTER TABLE `reference` ADD INDEX ( `txStart` ) ;
ALTER TABLE `reference` ADD INDEX ( `txEnd` ) ;
Cross joins are Cartesian products, which are among the most computationally expensive operations to compute (they don't scale well).
For tables T_1, ..., T_n, the number of rows generated by crossing them is the product of their sizes:
$$|T_1| \cdot |T_2| \cdots |T_n| = \prod_{i=1}^{n} |T_i|$$
Assuming each table has M rows, the cost of computing the cross join is then
$$\underbrace{M \cdot M \cdots M}_{n} = M^n = O(M^n)$$
which is exponential in the number of tables involved in the join.
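A tiny worked example of the product formula, in Python with made-up table contents: enumerating the Cartesian product with itertools.product and counting the rows gives the same number as multiplying the table sizes, and three 10-row tables give 10^3 = 1000 combinations. The helper name cross_join_size is hypothetical.

```python
from itertools import product
from math import prod


def cross_join_size(*tables):
    """Count the rows of the Cartesian product by enumerating it (hypothetical helper)."""
    return sum(1 for _ in product(*tables))


# Three toy "tables" with 3, 2 and 4 rows: |T_1| * |T_2| * |T_3| = 24.
tables = [["a", "b", "c"], [1, 2], [True, False, None, 0]]
size = cross_join_size(*tables)
formula = prod(len(t) for t in tables)

# n = 3 tables of M = 10 rows each: M ** n combinations.
cube = cross_join_size(range(10), range(10), range(10))
```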
I see 2 problems with the UPDATE statement.
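First, there is no index on the end columns. The compound indexes (annot) you have will only be used for the start columns in this query: once the first column of an index is used in a range condition (>=), MySQL cannot use the second column to narrow the range further. You should add indexes on the end columns, as suggested by Emre: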
ALTER TABLE `table_1` ADD INDEX ( `end` ) ;
ALTER TABLE `reference` ADD INDEX ( `txEnd` ) ;
Second, the JOIN may (and probably does) find many rows of reference that relate to the same row of table_1, so some (or all) of the updated rows of table_1 are matched many times. Check the result of this query to see whether it equals your updated-rows count (17311434):
SELECT COUNT(*)
FROM table_1
WHERE EXISTS
( SELECT *
FROM reference
WHERE table_1.start >= reference.txStart
AND table_1.`end` <= reference.txEnd
);
There may be other ways to write this query, but the lack of a PRIMARY KEY on either table makes it harder. If you define a primary key on table_1, try the following, replacing id with your primary key column.
Update: No, do not try it on a table with 34M rows. Check the execution plan and try with smaller tables first.
UPDATE table_1 AS t1
JOIN
( SELECT t2.id
, MIN(r.name) AS name -- pick one name per row; a bare column fails under ONLY_FULL_GROUP_BY (default since MySQL 5.7.5)
FROM table_1 AS t2
JOIN
( SELECT MIN(name) AS name, txStart, txEnd
FROM reference
GROUP BY txStart, txEnd
) AS r
ON t2.start >= r.txStart
AND t2.`end` <= r.txEnd
GROUP BY t2.id
) AS good
ON good.id = t1.id
SET t1.name = good.name;
You can check the query plan by running EXPLAIN on the equivalent SELECT:
EXPLAIN
SELECT t1.id, t1.name, good.name
FROM table_1 AS t1
JOIN
( SELECT t2.id
, MIN(r.name) AS name
FROM table_1 AS t2
JOIN
( SELECT MIN(name) AS name, txStart, txEnd
FROM reference
GROUP BY txStart, txEnd
) AS r
ON t2.start >= r.txStart
AND t2.`end` <= r.txEnd
GROUP BY t2.id
) AS good
ON good.id = t1.id ;
Try this:
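UPDATE table_1 SET
table_1.name = (
select reference.name
from reference
where table_1.start >= reference.txStart
and table_1.end <= reference.txEnd
limit 1); -- several reference rows can match; without LIMIT 1 MySQL raises "Subquery returns more than 1 row"
-- note: rows with no matching reference row get name = NULL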
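The correlated-subquery form of the interval update can be tried on toy data with Python's sqlite3 module as a stand-in for MySQL. Two additions in this sketch are assumptions, not part of the answer: ORDER BY r.txStart DESC LIMIT 1 makes the scalar subquery return at most one name when intervals overlap (the tie-break toward the latest-starting interval is arbitrary), and WHERE EXISTS leaves rows with no matching interval untouched, as the original UPDATE ... JOIN did. The index name reference_annot and the gene rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (start INTEGER, "end" INTEGER, name TEXT);
CREATE TABLE reference (name TEXT NOT NULL, txStart INTEGER NOT NULL, txEnd INTEGER NOT NULL);
CREATE INDEX reference_annot ON reference (txStart, txEnd);
INSERT INTO table_1 (start, "end") VALUES (10, 20), (55, 60), (500, 510);
INSERT INTO reference VALUES ('geneA', 0, 100), ('geneB', 50, 70);
""")

conn.execute("""
UPDATE table_1
SET name = (
    SELECT r.name FROM reference r
    WHERE table_1.start >= r.txStart AND table_1."end" <= r.txEnd
    ORDER BY r.txStart DESC   -- arbitrary tie-break: latest-starting interval
    LIMIT 1)                  -- at most one value, even when intervals overlap
WHERE EXISTS (                -- leave rows without any match untouched
    SELECT 1 FROM reference r
    WHERE table_1.start >= r.txStart AND table_1."end" <= r.txEnd)
""")
conn.commit()
result = conn.execute("SELECT start, name FROM table_1 ORDER BY start").fetchall()
```

The row (55, 60) lies inside both intervals, which is exactly the case where a subquery without LIMIT 1 would fail.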
Someone has already suggested adding some indexes, but I think you will get the best performance with these two indexes:
ALTER TABLE `reference` ADD INDEX `reference_start_end` (`txStart` ASC, `txEnd` ASC);
ALTER TABLE `table_1` ADD INDEX `table_1_start_end` (`start` ASC, `end` ASC);
MySQL may use only one of them for this query, and the optimizer decides automatically which one is more useful.

Deleting Duplicates in MySQL

My `query` table looks like this:
CREATE TABLE `query` (
`id` int(11) NOT NULL auto_increment,
`searchquery` varchar(255) NOT NULL default '',
`datetime` int(11) NOT NULL default '0',
PRIMARY KEY (`id`)
) ENGINE=MyISAM
First I want to drop the id column with:
ALTER TABLE `querynew` DROP `id`
and then delete the duplicate entries.
I tried it with:
INSERT INTO `querynew` SELECT DISTINCT * FROM `query`
but without success. :(
And with ALTER TABLE query ADD UNIQUE ( searchquery ), is it possible to store each query only once?
I would use MySQL's multi-table delete syntax:
DELETE q2 FROM query q1 JOIN query q2 USING (searchquery, datetime)
WHERE q1.id < q2.id;
I would do this using a unique index with the MySQL-specific IGNORE keyword. This kills two birds with one stone: it deletes duplicate rows and adds a unique index so that you will not get any more of them. It is usually faster than the other methods as well. Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older versions:
alter ignore table query add unique index(searchquery, datetime);
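You can also do it without first removing the column, as below. Note, however, that MySQL rejects a DELETE whose subquery selects from the same table (error 1093, "You can't specify target table 'query' for update in FROM clause"), so in MySQL the multi-table DELETE shown earlier is the safer choice: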
DELETE FROM `query`
WHERE `id` IN (
SELECT `id`
FROM `query` q
WHERE EXISTS ( -- Any matching rows with a lower id?
SELECT *
FROM `query`
WHERE `searchquery` = q.`searchquery`
AND `datetime` = q.`datetime`
AND `id` < q.`id`
)
);
You could also go via a temp table:
CREATE TEMPORARY TABLE `temp_query`
SELECT MIN(`id`) AS `id`, `searchquery`, `datetime`
FROM `query`
GROUP BY `searchquery`, `datetime`;
DELETE FROM `query`;
INSERT INTO `query` SELECT * FROM `temp_query`;
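The temp-table route can be checked on sample rows with Python's sqlite3 module as a stand-in for MySQL. SQLite's CREATE TEMP TABLE ... AS SELECT plays the role of MySQL's CREATE TEMPORARY TABLE ... SELECT; the sample searches are made up, and the whole swap runs inside one transaction so a failure would not leave the table empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "query" is quoted because QUERY is an SQLite keyword (EXPLAIN QUERY PLAN).
conn.executescript("""
CREATE TABLE "query" (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    searchquery TEXT    NOT NULL DEFAULT '',
    datetime    INTEGER NOT NULL DEFAULT 0
);
INSERT INTO "query" (searchquery, datetime) VALUES
    ('mysql', 100), ('mysql', 100), ('sqlite', 100), ('mysql', 200), ('sqlite', 100);
""")

conn.executescript("""
BEGIN;
-- Keep the lowest id of each (searchquery, datetime) group.
CREATE TEMP TABLE temp_query AS
    SELECT MIN(id) AS id, searchquery, datetime
    FROM "query"
    GROUP BY searchquery, datetime;
DELETE FROM "query";
INSERT INTO "query" (id, searchquery, datetime)
    SELECT id, searchquery, datetime FROM temp_query;
DROP TABLE temp_query;
COMMIT;
""")

rows = conn.execute('SELECT id, searchquery, datetime FROM "query" ORDER BY id').fetchall()
```

Rows 2 and 5 are removed as duplicates of rows 1 and 3; keeping MIN(id) preserves the earliest copy of each search.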