Creating a table from two different tables - MySQL

I am creating a table from two different tables with this query:
create table post_table as
( select t1.id, t2.url, t2.desc, t2.preview, t2.img_url,
t2.title, t2.hash, t2.rate
from user_record t1, post_data t2
primary key (t1.id, t2,hash))
What's the syntax error here?
post_data
----
url      varchar(255)   No
desc     varchar(2048)  No
preview  varchar(255)   No
img_url  varchar(128)   No
title    varchar(128)   No
hash     varchar(128)   No   // This is the first key column
rate     varchar(20)    Yes  NULL
user_record
----
id       varchar(40)    No   // This is the second key column
name     varchar(50)    Yes  NULL
email    varchar(50)    Yes  NULL
picture  varchar(50)    No
UPDATE:
create table post_table (
id VARCHAR(40), url varchar(255), preview varchar(255) , img_url varchar(128), title varchar(128), hash varchar(128), rate varchar(20)
primary key (t1.id, t2,hash));
select t1.id, t2.url, t2.desc, t2.preview, t2.img_url,
t2.title, t2.hash, t2.rate
from user_record t1, post_data t2;

Formatting the CREATE TABLE statement so we can see the ( ) pairing:
create table post_table as (
select t1.id, t2.url, t2.desc, t2.preview, t2.img_url, t2.title, t2.hash, t2.rate
from user_record t1, post_data t2
primary key (t1.id, t2,hash)
)
We can see that the primary key is being attached to the select statement.
Beyond that, there are specific restrictions on which parts of the general CREATE TABLE syntax can be used in a CREATE TABLE ... SELECT statement.
From: http://dev.mysql.com/doc/refman/5.1/en/create-table-select.html
The ENGINE option is part of the CREATE TABLE statement, and should
not be used following the SELECT; this would result in a syntax error.
The same is true for other CREATE TABLE options such as CHARSET.
You can, however, define keys by using syntax similar to:
mysql> CREATE TABLE test (a INT NOT NULL AUTO_INCREMENT,
-> PRIMARY KEY (a), KEY(b))
-> ENGINE=MyISAM SELECT b,c FROM test2;
So rework your query to define the column types first, then the keys, and put the SELECT statement last. The key must name the new table's own columns (id, hash) rather than the aliases used in the SELECT, and desc needs backticks in the column list because it is a reserved word. We don't know your data types but it would look something like:
create table post_table (
id DATATYPE, url DATATYPE, `desc` DATATYPE, ...,
primary key (id, hash)
)
select t1.id, t2.url, t2.desc, t2.preview, t2.img_url,
t2.title, t2.hash, t2.rate
from user_record t1, post_data t2

You have to put the key definition BEFORE the SELECT.
Also, you can't define a key without the columns it refers to, so if you need keys, you have to write out the whole table structure.
http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Another way is to create the index after creating the table, using CREATE INDEX.
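The "create the index afterwards" route can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, the sample rows and the index name post_table_id_hash are made up, and only a few of the original columns are included. In MySQL the same two steps would be CREATE TABLE ... AS SELECT followed by CREATE UNIQUE INDEX or ALTER TABLE ... ADD PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_record (id TEXT NOT NULL, name TEXT);
CREATE TABLE post_data (url TEXT NOT NULL, "desc" TEXT NOT NULL, hash TEXT NOT NULL);
INSERT INTO user_record VALUES ('u1', 'Ann'), ('u2', 'Bob');
INSERT INTO post_data VALUES ('http://a.example', 'first', 'h1'),
                             ('http://b.example', 'second', 'h2');

-- Step 1: build the table from the SELECT alone, with no key clause.
CREATE TABLE post_table AS
SELECT t1.id, t2.url, t2."desc", t2.hash
FROM user_record t1, post_data t2;

-- Step 2: add the key afterwards (hypothetical index name).
CREATE UNIQUE INDEX post_table_id_hash ON post_table (id, hash);
""")

# The comma join has no join condition, so every user is paired with every post.
row_count = conn.execute("SELECT COUNT(*) FROM post_table").fetchone()[0]
index_names = [r[1] for r in conn.execute("PRAGMA index_list('post_table')")]
print(row_count, index_names)
```

Note that the SELECT in the question has no join condition, so the new table holds one row per (user, post) pair; the unique index on (id, hash) only succeeds because each such pair occurs once.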

Related

SQL - Left Join and Group By causes row data from second table to get mixed up

I have two tables; the first has a reference to the id of the second. I want a query that LEFT JOINs the second table, selects fields from it, and also has a COUNT function in the select list. Because of the COUNT function, I am using a GROUP BY clause.
So my query looks something like:
SELECT t1.id, t1.txt, t2.id, t2.txt, COUNT(t2.id)
FROM test_data1 t1
LEFT JOIN test_data2 t2 ON (t1.ref_col = t2.id)
GROUP BY t1.id
In my tables, only the second row of test_data1 has an entry in ref_col, so I would expect that for the first row in the results, the value for t2.id would be NULL, however that is not the case (in my example I see the value 2, but I'm not sure if there might be an element of randomness here).
If I use
SELECT MAX(t1.id), MAX(t1.txt), MAX(t2.id), MAX(t2.txt), COUNT(t2.id)
FROM test_data1 t1
LEFT JOIN test_data2 t2 ON (t1.ref_col = t2.id)
GROUP BY t1.id
I get my expected results, however I am surprised this is necessary given that there will at most only be one entry in test_data2 matching ref_col in test_data1.
Does anyone know why LEFT JOIN + GROUP BY is behaving this way? This is using MySQL version 8 on Linux.
If you want to reproduce this here are the table definitions:
CREATE TABLE test_data1 (id int unsigned NOT NULL AUTO_INCREMENT,
txt VARCHAR(45) DEFAULT NULL,
ref_col int unsigned DEFAULT NULL, PRIMARY KEY (id));
CREATE TABLE test_data2 (id int unsigned NOT NULL AUTO_INCREMENT,
txt VARCHAR(45) DEFAULT NULL, PRIMARY KEY (id));
INSERT INTO test_data1 (id, txt, ref_col)
VALUES
(1,'zz',NULL),
(2,'yy',2),
(3,'xx',NULL);
INSERT INTO test_data2 (id, txt)
VALUES
(1,'aa'),
(2,'bb'),
(3,'cc'),
(4,'dd');

LEFT JOIN table to find non matching rows, same table

I have a table that looks like this:
id int primary key
uniqueID string --not uniquely indexed
foreignKeyID int --foreignKey to another table
I want to find all the uniqueIDs in this table that exist for foreign key 1 but do not exist for foreign key 2.
I thought I could do something like this:
SELECT * FROM table t1
LEFT JOIN table t2
ON t1.uniqueID = t2.uniqueID
WHERE
t1.foreignKeyID = 1
AND t2.uniqueID IS NULL
However, this never gives me any results. I can make it work with a NOT IN subquery, but this is a very large table, so I suspect a solution using joins will be faster.
Looking for the best way to structure this query.
Here's a sample data set and an SQL Fiddle with an example of the working NOT IN query I am trying to convert to a LEFT JOIN:
CREATE TABLE `table` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`uniqueID` varchar(255),
`foreignKeyID` int(5) unsigned NOT NULL DEFAULT 0,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
INSERT INTO `table` (uniqueID, foreignKeyID) VALUES ('aaa', 1), ('bbb', 1);
http://sqlfiddle.com/#!9/48a3f3/4 and a non-working LEFT JOIN I thought would be equivalent.
Thanks!
Try this; it seems to work, if I understood the question properly:
SELECT *
FROM `table` t
LEFT JOIN `table` tt ON tt.uniqueID = t.uniqueID AND tt.foreignKeyID <> 1
WHERE t.foreignKeyID = 1 AND tt.id IS NULL;
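To see why the original LEFT JOIN never returns rows while the version above does, here is a small sketch. It runs on Python's built-in sqlite3 module rather than MySQL (an assumption made only so that it is self-contained), and the extra row ('aaa', 2) is hypothetical sample data added so that one uniqueID exists under both foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "table" (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    uniqueID TEXT,
    foreignKeyID INTEGER NOT NULL DEFAULT 0
);
-- ('aaa', 2) is an assumed extra row, not part of the question's data.
INSERT INTO "table" (uniqueID, foreignKeyID)
VALUES ('aaa', 1), ('bbb', 1), ('aaa', 2);
""")

# Original attempt: every row matches at least itself on uniqueID,
# so t2.uniqueID is never NULL and nothing is returned.
naive = conn.execute("""
    SELECT t1.uniqueID
    FROM "table" t1
    LEFT JOIN "table" t2 ON t1.uniqueID = t2.uniqueID
    WHERE t1.foreignKeyID = 1 AND t2.uniqueID IS NULL
""").fetchall()

# Anti-join: the foreign key restriction lives in the ON clause, so rows
# without a partner under another foreign key keep NULLs on the right side.
anti = conn.execute("""
    SELECT t.uniqueID
    FROM "table" t
    LEFT JOIN "table" tt
           ON tt.uniqueID = t.uniqueID AND tt.foreignKeyID <> 1
    WHERE t.foreignKeyID = 1 AND tt.id IS NULL
""").fetchall()

print(naive, anti)
```

The important design point is that the filter on the right-hand table goes into ON rather than WHERE; placed in WHERE it would discard exactly the NULL-extended rows the query is looking for.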

Can't query names from a table that does not exist in another table

I have two tables that are identical in fields. I want to query names in table2 that are not in table1. Both tables have name field as unique (primary key).
Here is the information about my database design:
My query is:
SELECT `table2`.`name` FROM `mydatabase`.`table2`, `mydatabase`.`table1`
WHERE `table2`.`name` NOT IN (SELECT `table1`.`name` FROM `mydatabase`.`table1`)
AND `table2`.`name` NOT LIKE 'xyz%';
The output of SHOW CREATE TABLE <table name>:
For table1:
table1, CREATE TABLE `table1` (
`name` varchar(500) NOT NULL,
`ip` varchar(500) DEFAULT NULL,
`type` varchar(500) DEFAULT NULL,
`grade` varchar(500) DEFAULT NULL,
`extended_ip` text,
PRIMARY KEY (`name`),
UNIQUE KEY `mydatabase_name_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
And table2:
table2, CREATE TABLE `table2` (
`name` varchar(500) NOT NULL,
`ip` varchar(500) DEFAULT NULL,
`type` varchar(500) DEFAULT NULL,
`grade` varchar(500) DEFAULT NULL,
`extended_ip` text,
PRIMARY KEY (`name`),
UNIQUE KEY `mydatabase_name_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The output of EXPLAIN <my query>:
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, PRIMARY, table1, , index, , mydatabase_name_UNIQUE, 502, , 17584, 100.00, Using index
1, PRIMARY, table2, , index, , mydatabase_name_UNIQUE, 502, , 46264, 100.00, Using where; Using index; Using join buffer (Block Nested Loop)
2, SUBQUERY, table1, , index, PRIMARY,mydatabase_name_UNIQUE, mydatabase_name_UNIQUE, 502, , 17584, 100.00, Using index
EDIT:
I forgot to mention that the query crashes the client. I am using MySQL Workbench on Ubuntu 18. When I run this query, the whole Workbench closes and I have to reopen it.
Just do a LEFT JOIN on name, with table2 as your starting table, since you want to consider all the names from table2 which do not exist in table1. Names which don't exist in table1 will have a NULL value after the join. Note that this join-based solution is often significantly faster than a subquery-based approach.
Also, you should avoid comma (,) based implicit joins. It is old syntax, and you should use explicit JOIN-based syntax. Read: Explicit vs implicit SQL joins
Also, it is a good habit to use aliases for better readability.
Try the following:
SELECT t2.name
FROM `mydatabase`.`table2` AS t2
LEFT JOIN `mydatabase`.`table1` AS t1 ON t1.name = t2.name
WHERE t1.name IS NULL
AND t2.name NOT LIKE 'xyz%';
Try a subquery:
SELECT `table2`.`name` FROM `mydatabase`.`table2` WHERE `table2`.`name` NOT IN (SELECT `table1`.`name` FROM `mydatabase`.`table1`);
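The difference between the question's query and the two suggestions above can be checked on a toy data set. The sketch below is an illustration only: it uses Python's built-in sqlite3 module instead of MySQL, and the names inserted into the two tables are invented. It shows that listing table1 in the FROM clause with a comma repeats every result once per row of table1, which on the real tables (about 17,584 and 46,264 rows according to the EXPLAIN output) means a very large result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (name TEXT PRIMARY KEY);
CREATE TABLE table2 (name TEXT PRIMARY KEY);
-- Invented sample names.
INSERT INTO table1 VALUES ('a'), ('b'), ('e');
INSERT INTO table2 VALUES ('a'), ('c'), ('d'), ('xyz1');
""")

# The question's shape: table1 appears in FROM without a join condition.
comma_join = conn.execute("""
    SELECT table2.name
    FROM table2, table1
    WHERE table2.name NOT IN (SELECT table1.name FROM table1)
      AND table2.name NOT LIKE 'xyz%'
    ORDER BY table2.name
""").fetchall()

# The LEFT JOIN ... IS NULL rewrite from the first answer.
left_join = conn.execute("""
    SELECT t2.name
    FROM table2 AS t2
    LEFT JOIN table1 AS t1 ON t1.name = t2.name
    WHERE t1.name IS NULL
      AND t2.name NOT LIKE 'xyz%'
    ORDER BY t2.name
""").fetchall()

print(len(comma_join), left_join)
```

With three rows in table1, each surviving name comes back three times from the comma join, while the LEFT JOIN returns each name once.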

Insert if doesn't exist fail if table empty [duplicate]

This question already has answers here:
How can I do 'insert if not exists' in MySQL?
I used the following command to avoid duplicates in a table:
INSERT INTO mytable (num,name)
SELECT 2,'example' FROM mytable WHERE NOT EXISTS
(SELECT * FROM mytable WHERE num=2 AND name='example') LIMIT 1;
It works, but NOT if mytable is empty.
mytable also contains an AUTO_INCREMENT id.
CREATE TABLE mytable (
id int(11) NOT NULL auto_increment,
num int(11) NOT NULL,
name varchar(100) NOT NULL,
PRIMARY KEY (id)
);
Do you recommend another method or a workaround?
In my case, replacing mytable with DUAL did the trick (but I have no idea why).
INSERT INTO mytable (num,name)
SELECT 2,'example' FROM mytable WHERE NOT EXISTS
(SELECT * FROM mytable WHERE num=2 AND name='example') LIMIT 1;
Replaced by :
INSERT INTO mytable (num,name)
SELECT 2,'example' FROM DUAL WHERE NOT EXISTS
(SELECT * FROM mytable WHERE num=2 AND name='example') LIMIT 1;
Thanks for the help.
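A likely explanation is that SELECT ... FROM mytable produces one candidate row per existing row of mytable, so an empty table yields nothing to insert, whereas DUAL always supplies exactly one row. The sketch below illustrates this under some assumptions: it runs on Python's built-in sqlite3 module, which has no DUAL table, so a one-row derived table named one_row (a made-up name) plays that role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        num INTEGER NOT NULL,
        name TEXT NOT NULL
    )
""")

def count_rows():
    return conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]

# Selecting FROM mytable: an empty table gives the SELECT no rows to return.
from_self = """
    INSERT INTO mytable (num, name)
    SELECT 2, 'example' FROM mytable
    WHERE NOT EXISTS
      (SELECT * FROM mytable WHERE num = 2 AND name = 'example')
    LIMIT 1
"""

# Selecting from a one-row source (stand-in for MySQL's DUAL).
from_one_row = """
    INSERT INTO mytable (num, name)
    SELECT 2, 'example' FROM (SELECT 1) AS one_row
    WHERE NOT EXISTS
      (SELECT * FROM mytable WHERE num = 2 AND name = 'example')
    LIMIT 1
"""

conn.execute(from_self)
after_from_self = count_rows()       # still empty

conn.execute(from_one_row)
after_first_insert = count_rows()    # row inserted

conn.execute(from_one_row)
after_second_insert = count_rows()   # duplicate skipped

print(after_from_self, after_first_insert, after_second_insert)
```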
You can make the column unique by using a MySQL UNIQUE constraint.
Just update your CREATE TABLE query as below:
CREATE TABLE mytable (
id int(11) NOT NULL auto_increment,
num int(11) NOT NULL,
name varchar(100) NOT NULL,
PRIMARY KEY (id),
UNIQUE (num) # You can define Unique column like this
);
Note: then you do not need to check for an existing value before saving to the database.
Just use a normal SQL query to insert the data:
INSERT INTO mytable (num,name) VALUES(2,'example');
You can do something like
SELECT * FROM tbl WHERE _____ = ______
Then add an if statement that inserts the row or leaves the table alone, depending on whether any results came back.
Why don't you simply use INSERT IGNORE? Also, why do you want to copy the data from the same table?
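Combining the UNIQUE constraint with an ignoring insert could look like the sketch below. Two assumptions are worth stating: it runs on Python's built-in sqlite3 module, where the equivalent of MySQL's INSERT IGNORE is spelled INSERT OR IGNORE, and it declares the pair (num, name) unique because that is what the question's NOT EXISTS check tests, whereas the answer above used UNIQUE (num) alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        num INTEGER NOT NULL,
        name TEXT NOT NULL,
        UNIQUE (num, name)  -- assumed: the pair must be unique
    )
""")

# Repeating the insert is harmless: duplicates are silently skipped.
for _ in range(3):
    conn.execute(
        "INSERT OR IGNORE INTO mytable (num, name) VALUES (?, ?)",
        (2, "example"),
    )

rows = conn.execute("SELECT num, name FROM mytable").fetchall()
print(rows)
```

This version also works on an empty table, since the insert no longer depends on selecting from mytable.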

Super slow query with CROSS JOIN

I have two tables named table_1 (1 GB) and reference (250 MB).
When I run an UPDATE with a cross join on reference, it takes 16 hours to update table_1. We changed the file system from ext3 to XFS, but it still takes 16 hours. What am I doing wrong?
Here is the update/cross join query :
mysql> UPDATE table_1 CROSS JOIN reference ON
-> (table_1.start >= reference.txStart AND table_1.end <= reference.txEnd)
-> SET table_1.name = reference.name;
Query OK, 17311434 rows affected (16 hours 36 min 48.62 sec)
Rows matched: 17311434 Changed: 17311434 Warnings: 0
Here is a show create table of table_1 and reference:
CREATE TABLE `table_1` (
`strand` char(1) DEFAULT NULL,
`chr` varchar(10) DEFAULT NULL,
`start` int(11) DEFAULT NULL,
`end` int(11) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`name2` varchar(255) DEFAULT NULL,
KEY `annot` (`start`,`end`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
CREATE TABLE `reference` (
`bin` smallint(5) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
`chrom` varchar(255) NOT NULL,
`strand` char(1) NOT NULL,
`txStart` int(10) unsigned NOT NULL,
`txEnd` int(10) unsigned NOT NULL,
`cdsStart` int(10) unsigned NOT NULL,
`cdsEnd` int(10) unsigned NOT NULL,
`exonCount` int(10) unsigned NOT NULL,
`exonStarts` longblob NOT NULL,
`exonEnds` longblob NOT NULL,
`score` int(11) DEFAULT NULL,
`name2` varchar(255) NOT NULL,
`cdsStartStat` enum('none','unk','incmpl','cmpl') NOT NULL,
`cdsEndStat` enum('none','unk','incmpl','cmpl') NOT NULL,
`exonFrames` longblob NOT NULL,
KEY `chrom` (`chrom`,`bin`),
KEY `name` (`name`),
KEY `name2` (`name2`),
KEY `annot` (`txStart`,`txEnd`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
You should index table_1.start, reference.txStart, table_1.end and reference.txEnd table fields:
ALTER TABLE `table_1` ADD INDEX ( `start` ) ;
ALTER TABLE `table_1` ADD INDEX ( `end` ) ;
ALTER TABLE `reference` ADD INDEX ( `txStart` ) ;
ALTER TABLE `reference` ADD INDEX ( `txEnd` ) ;
Cross joins are Cartesian Products, which are probably one of the most computationally expensive things to compute (they don't scale well).
For tables T_1 to T_n, the number of rows generated by crossing them is the product of the table sizes, i.e.
|T_1| * |T_2| * ... * |T_n|
Assuming each table has M rows, the resulting cost of computing the cross join is then
M * M * ... * M (n factors) = M^n = O(M^n)
which is exponential in the number of tables involved in the join.
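The growth can be checked with a tiny experiment. This is a sketch on Python's built-in sqlite3 module with three made-up tables a, b and c of 10 rows each; it only counts rows and says nothing about how MySQL executes the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (x INTEGER);
CREATE TABLE b (y INTEGER);
CREATE TABLE c (z INTEGER);
""")

# Fill each hypothetical table with M = 10 rows.
rows = [(i,) for i in range(10)]
for name in ("a", "b", "c"):
    conn.executemany(f"INSERT INTO {name} VALUES (?)", rows)

two_tables = conn.execute(
    "SELECT COUNT(*) FROM a CROSS JOIN b"
).fetchone()[0]
three_tables = conn.execute(
    "SELECT COUNT(*) FROM a CROSS JOIN b CROSS JOIN c"
).fetchone()[0]

print(two_tables, three_tables)  # M^2 and M^3 for M = 10
```

In the question's UPDATE the ON condition filters the pairs, so the result is smaller than the full product, although without a usable index the server may still need to compare a large share of the pairs.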
I see 2 problems with the UPDATE statement.
There is no index for the End fields. The compound indexes (annot) you have will be used only for the start fields in this query. You should add them as suggested by Emre:
ALTER TABLE `table_1` ADD INDEX ( `end` ) ;
ALTER TABLE `reference` ADD INDEX ( `txEnd` ) ;
Second, the JOIN may (and probably does) find many rows of table reference that are related to a row of table_1. So some (or all) rows of table_1 that are updated, are updated many times. Check the result of this query, to see if it is the same as your updated rows count (17311434):
SELECT COUNT(*)
FROM table_1
WHERE EXISTS
( SELECT *
FROM reference
WHERE table_1.start >= reference.txStart
AND table_1.`end` <= reference.txEnd
)
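The comparison suggested here can be tried on a few rows first. The sketch below uses Python's built-in sqlite3 module and invented intervals (the names g1 to g3 and all numbers are assumptions); it contrasts the number of rows produced by the range join with the number of table_1 rows that have at least one containing interval.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (start INTEGER, "end" INTEGER, name TEXT);
CREATE TABLE reference (name TEXT, txStart INTEGER, txEnd INTEGER);
-- Invented sample intervals.
INSERT INTO table_1 (start, "end") VALUES (5, 10), (20, 30), (100, 200);
INSERT INTO reference VALUES ('g1', 0, 50), ('g2', 1, 40), ('g3', 15, 35);
""")

# Rows produced by the join: one per (table_1 row, containing interval) pair.
join_rows = conn.execute("""
    SELECT COUNT(*)
    FROM table_1
    JOIN reference
      ON table_1.start >= reference.txStart
     AND table_1."end" <= reference.txEnd
""").fetchone()[0]

# Rows of table_1 that have at least one containing interval.
rows_with_match = conn.execute("""
    SELECT COUNT(*)
    FROM table_1
    WHERE EXISTS
      ( SELECT *
        FROM reference
        WHERE table_1.start >= reference.txStart
          AND table_1."end" <= reference.txEnd
      )
""").fetchone()[0]

print(join_rows, rows_with_match)
```

Here (5, 10) lies inside two intervals and (20, 30) inside three, so the join yields five rows for only two distinct table_1 rows; which of the candidate names ends up in table_1.name is then not determined by the query.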
There can be other ways to write this query but the lack of a PRIMARY KEY on both tables makes it harder. If you define a primary key on table_1, try this, replacing id with the primary key.
Update: No, do not try it on a table with 34M rows. Check the execution plan and try with smaller tables first.
UPDATE table_1 AS t1
JOIN
( SELECT t2.id
, r.name
FROM table_1 AS t2
JOIN
( SELECT name, txStart, txEnd
FROM reference
GROUP BY txStart, txEnd
) AS r
ON t2.start >= r.txStart
AND t2.`end` <= r.txEnd
GROUP BY t2.id
) AS good
ON good.id = t1.id
SET t1.name = good.name;
You can check the query plan by running EXPLAIN on the equivalent SELECT:
EXPLAIN
SELECT t1.id, t1.name, good.name
FROM table_1 AS t1
JOIN
( SELECT t2.id
, r.name
FROM table_1 AS t2
JOIN
( SELECT name, txStart, txEnd
FROM reference
GROUP BY txStart, txEnd
) AS r
ON t2.start >= r.txStart
AND t2.`end` <= r.txEnd
GROUP BY t2.id
) AS good
ON good.id = t1.id ;
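Try this:
UPDATE table_1 SET
table_1.name = (
select reference.name
from reference
where table_1.start >= reference.txStart
and table_1.end <= reference.txEnd
-- LIMIT 1 avoids a "Subquery returns more than 1 row" error when several reference rows match
limit 1);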
Somebody has already suggested adding some indexes, but I think you will get the best performance with these two indexes:
ALTER TABLE `reference`
ADD INDEX `reference_start_end` (`txStart` ASC, `txEnd` ASC);
ALTER TABLE `table_1`
ADD INDEX `table_1_star_end` (`start` ASC, `end` ASC);
Only one of them will be used by the query, but MySQL will automatically decide which is more useful.