Good day everyone! I have a MySQL database with the following table:
CREATE TABLE `TableWithInnoDBEngine` (
  `userID` int(11) NOT NULL,
  PRIMARY KEY (`userID`),
  UNIQUE KEY `userID_UNIQUE` (`userID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
mysql> select * from TableWithInnoDBEngine;
+--------+
| userID |
+--------+
|      1 |
|      2 |
|      3 |
+--------+
I'm running:
INSERT IGNORE INTO TableWithInnoDBEngine (UserID) VALUES (1),(2),(3),(4),(5);
2 row(s) affected Records: 5 Duplicates: 3 Warnings: 0
I want to get all the rows that were actually inserted.
SELECT LAST_INSERT_ID() returns only the last value (5), but I need:
+--------+
| userID |
+--------+
|      4 |
|      5 |
+--------+
I'm using PHP 5.6.17 + MySQL 5.5.46-0+deb7u1
Thank you for your responses!
I would create a temporary table similar to the destination table, insert all the IDs there, and then run two selects: one to find the duplicates and one to insert the non-duplicates into the destination table.
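A minimal sketch of that approach (tmp_ids is a hypothetical name; the joins assume the table from the question):
CREATE TEMPORARY TABLE tmp_ids (`userID` int(11) NOT NULL, PRIMARY KEY (`userID`));
INSERT INTO tmp_ids (userID) VALUES (1),(2),(3),(4),(5);
-- the duplicates: IDs already present in the destination table
SELECT t.userID FROM tmp_ids t JOIN TableWithInnoDBEngine o ON o.userID = t.userID;
-- insert only the non-duplicates
INSERT INTO TableWithInnoDBEngine (userID)
SELECT t.userID FROM tmp_ids t
LEFT JOIN TableWithInnoDBEngine o ON o.userID = t.userID
WHERE o.userID IS NULL;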
I don't understand the second part of your question: if you insert all IDs from 1 to 35000 and then ask for the duplicates, that is equivalent to:
SELECT DISTINCT userId FROM table;
Update:
When you do:
$mysqli->query("INSERT IGNORE INTO TableWithInnoDBEngine (UserID) VALUES (1),(2),(3),(4),(5)");
$info = $mysqli->info();
You can get the information you want as a string like "Records: 3 Duplicates: 0 Warnings: 0"; see http://php.net/manual/en/mysqli.info.php
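If you only need the number of rows that were actually inserted rather than the full string, MySQL's ROW_COUNT() function reports it directly; with the question's data it should return 2, since only two rows were new:
INSERT IGNORE INTO TableWithInnoDBEngine (userID) VALUES (1),(2),(3),(4),(5);
SELECT ROW_COUNT(); -- 2: INSERT IGNORE counts only the rows it really added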
If you use INSERT ... ON DUPLICATE KEY UPDATE ... instead of INSERT IGNORE ..., you can collect all duplicate IDs into one string like this:
SET @duplicates := '';
INSERT INTO TableWithInnoDBEngine (userID) VALUES (1),(2),(3),(4),(5)
ON DUPLICATE KEY UPDATE
userID = userID + if(@duplicates := concat(@duplicates,',',userID),0,0);
SET @duplicates := SUBSTRING(@duplicates FROM 2);
SELECT @duplicates;
You can now parse the resulting string on the application side to filter the inserted data.
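With the question's data (1, 2 and 3 already present), the final SELECT should return the duplicate IDs as one string:
SELECT @duplicates;
-- '1,2,3'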
It seems that a fast and memory-cheap solution is to insert the new data into a temporary table and compare it against the original table:
CREATE TABLE `Original_TableWithInnoDBEngine` (
`userID` int(11) NOT NULL,
UNIQUE KEY `userID_UNIQUE` (`userID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `tmp_TableWithInnoDBEngine` (
`userID` int(11) NOT NULL,
UNIQUE KEY `userID_UNIQUE` (`userID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
and then:
INSERT INTO Original_TableWithInnoDBEngine (userID) VALUES (1),(2),(3),(4),(5),(6);
select * from Original_TableWithInnoDBEngine;
+--------+
| userID |
+--------+
|      1 |
|      2 |
|      3 |
|      4 |
|      5 |
|      6 |
+--------+
INSERT INTO tmp_TableWithInnoDBEngine (userID) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
select * from tmp_TableWithInnoDBEngine;
+--------+
| userID |
+--------+
|      1 |
|      2 |
|      3 |
|      4 |
|      5 |
|      6 |
|      7 |
|      8 |
|      9 |
|     10 |
+--------+
And now I use this query to get the values that are in the tmp table but not in the original:
SELECT tmp_TableWithInnoDBEngine.userID FROM tmp_TableWithInnoDBEngine WHERE tmp_TableWithInnoDBEngine.userID NOT IN (SELECT userID FROM Original_TableWithInnoDBEngine);
+--------+
| userID |
+--------+
|      7 |
|      8 |
|      9 |
|     10 |
+--------+
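As a side note, on large tables a LEFT JOIN anti-join often beats NOT IN with a subquery on MySQL 5.5, since the optimizer handles the join form better; a hedged equivalent of the query above:
SELECT t.userID
FROM tmp_TableWithInnoDBEngine t
LEFT JOIN Original_TableWithInnoDBEngine o ON o.userID = t.userID
WHERE o.userID IS NULL;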
create table Branch
(
BranchNo char(4),
Street varchar(30),
City varchar(30),
PostCode varchar(10)
);
INSERT INTO Branch VALUES ('B002', '55 cOVER', 'LONDON', NULL);
INSERT INTO Branch VALUES ('B003', '163 Main Street', 'Glasgow', NULL);
INSERT INTO Branch VALUES ('B004', '32 Manse Road', 'Bristol', NULL);
INSERT INTO Branch VALUES ('B005', '22 Dear Road', 'LONDON', NULL);
INSERT INTO Branch VALUES ('B007', '16 Argyll', 'Abend', NULL);
Create a view named ViewDeC that displays the information of all branches. Make sure it is not possible to update the data in the Branch table through this view.
How do I create a view in MySQL that cannot be used to update the underlying table?
If I am not mistaken, this is about how to create a read-only view. Though MySQL does not support creating a view with a read-only attribute DIRECTLY, certain things can be done to make the view READONLY. One workaround is to define the view through a join:
create view ViewDeC as
select BranchNo,Street,City,PostCode
from Branch
join (select 1) t;
select * from ViewDeC;
INSERT INTO ViewDeC
VALUES ('B009', '99 Argyll', 'bender', NULL);
-- Error Code: 1471. The target table ViewDeC of the INSERT is not insertable-into
Note that this comes at some performance cost, but it is not unbearable. I have a table with 1.4 million rows. Here is a test with and without the join, using a table scan as the access method.
select * from proctable;
-- 1429158 rows in set (1.26 sec)
select * from proctable join (select 1) t;
-- 1429158 rows in set (1.40 sec)
For an index lookup access method, however, the overhead is almost non-existent.
select * from proctable join (select 1) t where id between 100 and 500;
-- 401 rows in set (0.00 sec)
explain select * from proctable join (select 1) t where id between 100 and 500;
+----+-------------+------------+------------+--------+---------------+---------+---------+------+------+----------+----------------+
| id | select_type | table      | partitions | type   | possible_keys | key     | key_len | ref  | rows | filtered | Extra          |
+----+-------------+------------+------------+--------+---------------+---------+---------+------+------+----------+----------------+
|  1 | PRIMARY     | <derived2> | NULL       | system | NULL          | NULL    | NULL    | NULL |    1 |   100.00 | NULL           |
|  1 | PRIMARY     | proctable  | NULL       | range  | PRIMARY       | PRIMARY | 4       | NULL |  401 |   100.00 | Using where    |
|  2 | DERIVED     | NULL       | NULL       | NULL   | NULL          | NULL    | NULL    | NULL | NULL |     NULL | No tables used |
+----+-------------+------------+------------+--------+---------------+---------+---------+------+------+----------+----------------+
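Another workaround worth mentioning: declaring the view with ALGORITHM = TEMPTABLE also makes it non-insertable, because the view's rows are materialized into a temporary table first. A sketch (ViewDeC2 is a placeholder name; an INSERT against it should fail with the same error 1471):
CREATE ALGORITHM = TEMPTABLE VIEW ViewDeC2 AS
SELECT BranchNo, Street, City, PostCode FROM Branch;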
Let's think of it this way: say I have a table called "names" in MySQL, like so:
id| name
1 | Bob
2 | Sally
3 | Anne
Where "id" is a unique identifier for the table, and auto-increments with every addition of a row.
Say I somehow managed to throw in a row with an id that is completely out of place in the order, like so:
id| name
1 | Bob
2 | Sally
3 | Anne
20| John
Would the rows following the out-of-place row continue from the new id 20 (e.g. the next row added has id 21), or would they still continue from id 3 (e.g. the next row added has id 4)?
Has this happened in SQL before?
They will continue with 21, which prevents duplicates. Otherwise you would run into a problem when you reach 19 and the next inserted row should become 20, which is already there.
By the way, it is not complicated to insert such a row. Just provide a specific value on INSERT instead of leaving out the auto-increment column or passing NULL.
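For example, using the names table from the question:
INSERT INTO names (id, name) VALUES (20, 'John'); -- an explicit id creates the gap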
Unless you set the next auto-increment value manually, MySQL does everything to ensure that you do not run into conflicts. So if you insert a big value, it stores that value + 1 as the next auto-increment value.
To see what the next auto-increment value for a MySQL table will be, use:
SHOW TABLE STATUS LIKE 'tablename';
The Auto_increment column there tells you the next auto-increment value. For the sake of testing and curiosity, I conducted the following experiment:
I created the table as follows:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`num` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1
Then I ran the following queries:
INSERT INTO `test` (`id`,`num`) VALUES (NULL,1);
INSERT INTO `test` (`id`,`num`) VALUES (NULL,2);
INSERT INTO `test` (`id`,`num`) VALUES (NULL,3);
INSERT INTO `test` (`id`,`num`) VALUES (50,4);
INSERT INTO `test` (`id`,`num`) VALUES (NULL,5);
The output is as follows:
mysql> select * from `test`;
+----+------+
| id | num  |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
| 50 |    4 |
| 51 |    5 |
+----+------+
5 rows in set (0.00 sec)
This means that after you insert a custom value, Auto_increment also gets adjusted. Then I executed one more query:
INSERT INTO `test` (`id`,`num`) VALUES (100,6);
And after that, the status is as follows:
mysql> SHOW TABLE STATUS LIKE 'test';
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation           | Checksum | Create_options | Comment |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+---------+
| test | InnoDB |      10 | Compact    |    6 |           2730 |       16384 |               0 |            0 |   8388608 |            101 | 2014-02-26 22:12:32 | NULL        | NULL       | latin1_swedish_ci   |     NULL |                |         |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+---------+
1 row in set (0.00 sec)
You can see that the next auto-increment value is going to be 101, which means that MySQL automatically adjusts to the inserted values.
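Instead of parsing the SHOW TABLE STATUS output, you can also read the same counter from information_schema (assuming the table lives in a schema named test):
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'test';
-- 101 after the inserts above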
Let me know what you think.
I have a large table with ~14M entries. The table type is MyISAM, not InnoDB.
Unfortunately, I have some duplicate entries in this table, which I found with the following query:
SELECT device_serial, temp, tstamp, COUNT(*) c FROM up_logs GROUP BY device_serial, temp, tstamp HAVING c > 1
To avoid these duplicates in the future, I want to convert my current index to a unique constraint with this statement:
ALTER TABLE up_logs DROP INDEX UK_UP_LOGS_TSTAMP_DEVICE_SERIAL,
                    ADD UNIQUE INDEX UK_UP_LOGS_TSTAMP_DEVICE_SERIAL (`tstamp`, `device_serial`);
But before that, I need to clean up my duplicates!
My question is: how can I keep only one entry of each set of duplicates? Keep in mind that my table contains 14M entries, so I would like to avoid loops if possible.
Any comments are welcome!
Creating a new unique key on the columns that need to be unique will automatically clean the table of any duplicates.
ALTER IGNORE TABLE `table_name`
ADD UNIQUE KEY `key_name`(`column_1`,`column_2`);
The IGNORE keyword keeps the statement from aborting when the first duplicate-key error occurs; instead, the default behavior is to delete the duplicate rows.
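Applied to the table from the question, it might look like this (a sketch; note that ALTER IGNORE TABLE rewrites the whole 14M-row table, and the IGNORE clause was removed in MySQL 5.7, so this works only on 5.6 and earlier):
ALTER IGNORE TABLE up_logs
DROP INDEX UK_UP_LOGS_TSTAMP_DEVICE_SERIAL,
ADD UNIQUE KEY UK_UP_LOGS_TSTAMP_DEVICE_SERIAL (`tstamp`, `device_serial`);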
Since MySQL allows subqueries in UPDATE/DELETE statements, but not if they refer to the table you want to update, I'd create a copy of the original table first.
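The copy itself could be created like this (a sketch; copy_table and original_table are placeholder names matching the queries below):
CREATE TABLE copy_table LIKE original_table;
INSERT INTO copy_table SELECT * FROM original_table;
With the copy in place, keep one id per group and delete the rest: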
DELETE FROM original_table
WHERE id NOT IN(
SELECT id FROM copy_table
GROUP BY column1, column2, ...
);
But I could imagine that copying a table with 14M entries takes some time... selecting only the items to keep while copying might make it faster:
INSERT INTO copy_table
SELECT * FROM original_table
GROUP BY column1, column2, ...;
and then
DELETE FROM original_table
WHERE id IN(
SELECT id FROM copy_table
);
It has been some time since I last used MySQL (and SQL in general), so I'm quite sure there is something with better performance, but this should work ;)
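For reference, a common single-statement alternative (a sketch, untested on 14M rows) self-joins the table and deletes the higher-id row of each duplicate pair; MySQL allows the same table on both sides of a multi-table DELETE:
DELETE t1 FROM original_table t1
JOIN original_table t2
  ON t1.column1 = t2.column1
 AND t1.column2 = t2.column2
 AND t1.id > t2.id;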
This is how you can delete duplicate rows. I'll walk through my example, and you'll need to adapt it to your code. I have an actors table with an ID, and I want to delete the rows with a repeated first_name.
mysql> select actor_id, first_name from actor_2;
+----------+-------------+
| actor_id | first_name  |
+----------+-------------+
|        1 | PENELOPE    |
|        2 | NICK        |
|        3 | ED          |
....
|      199 | JULIA       |
|      200 | THORA       |
+----------+-------------+
200 rows in set (0.00 sec)
Now I use a variable called @a to get the ID if the next row has the same first_name (repeated; NULL if it does not):
mysql> select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name;
+---------------+----------------+
| first_names   | @a:=first_name |
+---------------+----------------+
|          NULL | ADAM           |
|            71 | ADAM           |
|          NULL | AL             |
|          NULL | ALAN           |
|          NULL | ALBERT         |
|           125 | ALBERT         |
|          NULL | ALEC           |
|          NULL | ANGELA         |
|           144 | ANGELA         |
...
|          NULL | WILL           |
|          NULL | WILLIAM        |
|          NULL | WOODY          |
|            28 | WOODY          |
|          NULL | ZERO           |
+---------------+----------------+
200 rows in set (0.00 sec)
Now we can get only the duplicate IDs:
mysql> select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1;
+-------------+
| first_names |
+-------------+
|        NULL |
|          71 |
|        NULL |
...
|          28 |
|        NULL |
+-------------+
200 rows in set (0.00 sec)
Now the final step: let's DELETE!
mysql> delete from actor_2 where actor_id in (select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1);
Query OK, 72 rows affected (0.01 sec)
Now let's check our table:
mysql> select count(*) from actor_2 group by first_name;
+----------+
| count(*) |
+----------+
|        1 |
|        1 |
|        1 |
...
|        1 |
+----------+
128 rows in set (0.00 sec)
It works. If you have any questions, write me back.
Maybe I didn't explain it well.
These are the tables:
Table 1
CREATE TABLE `notforeverdata` (
`id` int(11) NOT NULL auto_increment,
`num` varchar(255) default NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `notforeverdata` VALUES (1, '4,3,0,5');
Table 2
CREATE TABLE `notforeverdata2` (
`id2` int(11) NOT NULL auto_increment,
`num2` varchar(255) default NULL,
PRIMARY KEY (`id2`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `notforeverdata2` VALUES (1, '2,5,6,8');
What I need to do is check whether any of the numbers in the num column of notforeverdata exist in the num2 column of notforeverdata2. In this case, the number 5 in the num column of notforeverdata also exists in notforeverdata2.
Any idea?
Thanks
For this example I will create a table, load it with random numbers, and search for each number in a provided list. Here is the catch: the provided list must start with and end with a comma.
From your question, I will use ',3,2,5,'
Here is the example
use test
drop table if exists notforeverdata;
create table notforeverdata
(
id int not null auto_increment,
num VARCHAR(255),
PRIMARY KEY (id)
);
insert into notforeverdata (num) values
(2),(7),(9),(11),(13),(15),(4),(3),(90),(97),(18),(5),(17);
SELECT * FROM notforeverdata;
SELECT * FROM notforeverdata WHERE LOCATE(CONCAT(',',num,','),(',3,2,5,'));
I actually ran it in MySQL 5.5.12 on my desktop. Here is the result:
mysql> use test
Database changed
mysql> drop table if exists notforeverdata;
Query OK, 0 rows affected (0.03 sec)
mysql> create table notforeverdata
-> (
-> id int not null auto_increment,
-> num VARCHAR(255),
-> PRIMARY KEY (id)
-> );
Query OK, 0 rows affected (0.12 sec)
mysql> insert into notforeverdata (num) values
-> (2),(7),(9),(11),(13),(15),(4),(3),(90),(97),(18),(5),(17);
Query OK, 13 rows affected (0.06 sec)
Records: 13 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM notforeverdata;
+----+------+
| id | num  |
+----+------+
|  1 | 2    |
|  2 | 7    |
|  3 | 9    |
|  4 | 11   |
|  5 | 13   |
|  6 | 15   |
|  7 | 4    |
|  8 | 3    |
|  9 | 90   |
| 10 | 97   |
| 11 | 18   |
| 12 | 5    |
| 13 | 17   |
+----+------+
13 rows in set (0.00 sec)
mysql> SELECT * FROM notforeverdata WHERE LOCATE(CONCAT(',',num,','),(',3,2,5,'));
+----+------+
| id | num  |
+----+------+
|  1 | 2    |
|  8 | 3    |
| 12 | 5    |
+----+------+
3 rows in set (0.00 sec)
mysql>
This will, of course, perform a full table scan. Notwithstanding, this works.
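As an aside, MySQL's built-in FIND_IN_SET() function does the same comma-separated lookup without having to wrap the list in leading and trailing commas; a hedged equivalent of the query above:
SELECT * FROM notforeverdata WHERE FIND_IN_SET(num, '3,2,5');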
Give it a try!
I'm wondering what kind of PK I should be choosing for this table in MySQL. Almost all the SELECT operations will involve the DATETIME (date ranges, a specific date, etc.).
Is there a best practice for this?
I wouldn't recommend that the DATETIME be your PK, but you should certainly create an index on that column.
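A minimal sketch of that layout, using a hypothetical events table:
CREATE TABLE events (
  id int unsigned NOT NULL AUTO_INCREMENT,
  created_at datetime NOT NULL,
  PRIMARY KEY (id),
  KEY idx_created_at (created_at) -- supports the date-range SELECTs
) ENGINE=InnoDB;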
It's perfectly acceptable to use dates as part of a composite primary key, especially if you're using InnoDB and want to take advantage of clustered primary key indexes to gain maximum read performance.
Have a look at the following:
http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/
MySQL script
Things to note:
InnoDB doesn't support an auto_increment column as the secondary part of a composite primary key (MyISAM does), hence the use of the sequence table.
The auto_increment portion of the primary key just helps guarantee uniqueness; the outer part of the key is the important part, i.e. the date.
full script here: http://pastie.org/1475625 or continue reading...
drop table if exists foo_seq;
create table foo_seq
(
next_val int unsigned not null default 0
)
engine = innodb;
insert into foo_seq values (0);
drop table if exists foo;
create table foo
(
foo_date datetime not null,
foo_id int unsigned not null, -- auto inc field which just guarantees uniqueness
primary key (foo_date, foo_id) -- clustered composite PK (innodb only)
)
engine=innodb;
delimiter #
create trigger foo_before_ins_trig before insert on foo
for each row
begin
declare v_id int unsigned default 0;
select next_val+1 into v_id from foo_seq;
set new.foo_id = v_id;
update foo_seq set next_val = v_id;
end#
delimiter ;
Stats:
select count(*) as counter from foo; -- count(*) under innodb always slow
+---------+
| counter |
+---------+
| 2000000 |
+---------+
select min(foo_date) as min_foo_date from foo;
+---------------------+
| min_foo_date        |
+---------------------+
| 1782-11-21 16:32:00 |
+---------------------+
1 row in set (0.00 sec)
select max(foo_date) as max_foo_date from foo;
+---------------------+
| max_foo_date        |
+---------------------+
| 2011-01-18 23:06:04 |
+---------------------+
1 row in set (0.00 sec)
select count(*) as counter from foo where foo_date between
'2009-01-01 00:00:00' and '2011-01-01 00:00:00';
+---------+
| counter |
+---------+
|   17520 |
+---------+
1 row in set (0.01 sec)
select * from foo where foo_date between
'2009-01-01 00:00:00' and '2011-01-01 00:00:00' order by 1 desc limit 10;
+---------------------+--------+
| foo_date            | foo_id |
+---------------------+--------+
| 2010-12-31 23:06:04 |    433 |
| 2010-12-31 22:06:04 |    434 |
| 2010-12-31 21:06:04 |    435 |
| 2010-12-31 20:06:04 |    436 |
| 2010-12-31 19:06:04 |    437 |
| 2010-12-31 18:06:04 |    438 |
| 2010-12-31 17:06:04 |    439 |
| 2010-12-31 16:06:04 |    440 |
| 2010-12-31 15:06:04 |    441 |
| 2010-12-31 14:06:04 |    442 |
+---------------------+--------+
10 rows in set (0.00 sec)
explain
select * from foo where foo_date between
'2009-01-01 00:00:00' and '2011-01-01 00:00:00' order by 1 desc limit 10;
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | foo   | range | PRIMARY       | PRIMARY | 8       | NULL | 35308 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
1 row in set (0.00 sec)
Pretty performant considering there are 2 million rows...
Hope this helps :)
Choose an autonumber in order to be on the safe side: you could have two or more rows with the same datetime.