Let's say I have a MySQL table with three columns: id, a and b, where id is an AUTO_INCREMENT field. If I pass a query like the following to MySQL, it works fine:
REPLACE INTO `table` (`id`, `a`, `b`) VALUES (1, 'A', 'B')
But if I skip the id field it no longer works, which is expected.
I want to know if there is a way to ignore some fields in the REPLACE query. So the above query could be something like this:
REPLACE INTO `table` (`a`, `b`) VALUES ('A', 'B')
Why do I need such a thing?
Sometimes I need to check a database with a SELECT query to see whether a row exists. If it exists I need to UPDATE the existing row; otherwise I need to INSERT a new row. I'm wondering if I could achieve a similar (but not identical) result with a single REPLACE query.
Why can't the result be the same? Simply because REPLACE will DELETE the existing row and INSERT a new one, which loses the current primary key and increases the auto-increment value. In contrast, an UPDATE query leaves the primary key and AUTO_INCREMENT fields untouched.
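A minimal sketch of that difference; the UNIQUE key on `a` is an assumption added here so the statements have something other than the primary key to match on:

```sql
CREATE TABLE t (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a VARCHAR(10),
  b VARCHAR(10),
  UNIQUE KEY uk_a (a)
);

INSERT INTO t (a, b) VALUES ('A', 'B');   -- row gets id = 1

-- REPLACE deletes the matching row and inserts a fresh one,
-- so the row now has id = 2:
REPLACE INTO t (a, b) VALUES ('A', 'C');

-- INSERT ... ON DUPLICATE KEY UPDATE modifies the row in place,
-- so the id is preserved:
INSERT INTO t (a, b) VALUES ('A', 'D')
  ON DUPLICATE KEY UPDATE b = VALUES(b);
```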
MySQL REPLACE.
That's not how you're supposed to use REPLACE: use it only when you know the primary key values.
Manual:
Note that unless the table has a PRIMARY KEY or UNIQUE index, using a
REPLACE statement makes no sense. It becomes equivalent to INSERT,
because there is no index to be used to determine whether a new row
duplicates another.
What if you have multiple rows that match the fields?
Consider adding a key that you can match on and using either INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE. The way INSERT IGNORE works is slightly different from REPLACE.
INSERT IGNORE is very fast but can have some invisible side effects.
INSERT... ON DUPLICATE KEY UPDATE
Which has fewer side effects but is probably much slower, especially for MyISAM, heavy write loads, or heavily indexed tables.
For more details on the side effects, see:
https://stackoverflow.com/a/548570/1301627
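A hedged sketch of the two approaches side by side; the table and column names here are made up purely for illustration:

```sql
CREATE TABLE users (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(100) NOT NULL,
  visits INT NOT NULL DEFAULT 1,
  UNIQUE KEY uk_email (email)
);

INSERT INTO users (email) VALUES ('a@example.com');

-- INSERT IGNORE silently drops the duplicate row; it also downgrades
-- other errors (invalid data, NULL into NOT NULL, ...) to warnings,
-- which is the kind of invisible side effect mentioned above:
INSERT IGNORE INTO users (email) VALUES ('a@example.com');

-- ON DUPLICATE KEY UPDATE lets you modify the existing row instead:
INSERT INTO users (email) VALUES ('a@example.com')
  ON DUPLICATE KEY UPDATE visits = visits + 1;
```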
Using INSERT IGNORE seems to work well for very fast lookup MyISAM tables with few columns (maybe just a VARCHAR field).
For example,
create table cities (
city_id int not null auto_increment,
city varchar(200) not null,
primary key (city_id),
unique key city (city))
engine=myisam default charset=utf8;
insert ignore into cities (city) values ("Los Angeles");
In this case, repeatedly re-inserting "Los Angeles" will not change the table at all and will prevent a new auto_increment ID from being generated, which helps avoid ID exhaustion (using up the available auto_increment range on heavily churned tables).
For even more speed, use a small hash like spooky hash before inserting and use that for a separate unique key column and then the varchar won't get indexed at all.
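A sketch of that hash-column idea; MD5 stands in for SpookyHash here purely for illustration (SpookyHash isn't built into MySQL), and the names are invented:

```sql
CREATE TABLE cities2 (
  city_id INT NOT NULL AUTO_INCREMENT,
  city VARCHAR(200) NOT NULL,
  city_hash BINARY(16) NOT NULL,      -- fixed-width hash of `city`
  PRIMARY KEY (city_id),
  UNIQUE KEY uk_city_hash (city_hash) -- only the hash is indexed
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

-- MD5 yields 32 hex chars; UNHEX packs them into 16 bytes:
INSERT IGNORE INTO cities2 (city, city_hash)
VALUES ('Los Angeles', UNHEX(MD5('Los Angeles')));
```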
Related
I've been reading up on MySQL's INSERT ... ON DUPLICATE KEY UPDATE to see if it will let me avoid SELECTing a row, checking whether it exists, and then either inserting or updating. Reading the documentation, however, one area confuses me. This is what it says:
If you specify ON DUPLICATE KEY UPDATE, and a row is inserted that would cause a duplicate value in a UNIQUE index or PRIMARY KEY, an UPDATE of the old row is performed
The thing is, I don't know whether this will work for my problem, because the 'condition' I have for not inserting a new row is the existence of a row with two columns equal to certain values, not necessarily the same primary key. Right now the syntax I'm imagining is this, but I don't know if it will always insert instead of update:
INSERT INTO attendance (event_id, user_id, status) VALUES(some_event_number, some_user_id, some_status) ON DUPLICATE KEY UPDATE status=1
The thing is, event_id and user_id aren't primary keys, but if a row in the table 'attendance' already has those columns with those values, I just want to update it. Otherwise I would like to insert it. Is this even possible with ON DUPLICATE? If not, what other method might I use?
The quote includes "a duplicate value in a UNIQUE index". So, your values do not need to be the primary key:
create unique index attendance_eventid_userid on attendance(event_id, user_id);
Presumably, you want to update the existing record because you don't want duplicates. If you want duplicates sometimes, but not for this particular insert, then you will need another method.
If I were you, I would make a primary key out of event_id and user_id. That will make this extremely easy with ON DUPLICATE.
SQLFiddle
create table attendance (
event_id int,
user_id int,
status varchar(100),
primary key(event_id, user_id)
);
Then with ease:
insert into attendance (event_id, user_id, status) values(some_event_number, some_user_id, some_status)
on duplicate key
update status = values(status);
Maybe you could write a trigger that checks whether the pair (event_id, user_id) already exists in the table before inserting, and updates the existing row if it does.
To the broader question of "Will INSERT ... ON DUPLICATE respect a UK even if the PK changes", the answer is yes: SQLFiddle
In this SQLFiddle I insert a new record, with a new PK id, but its values would violate the UK. It performs the ON DUPLICATE and the original PK id is preserved, but the non-UK ON DUPLICATE KEY UPDATE value changes.
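For readers who can't open the fiddle, a minimal reproduction might look like the following (assuming the composite unique key discussed above, with the surrogate id kept as primary key):

```sql
CREATE TABLE attendance (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  event_id INT,
  user_id INT,
  status VARCHAR(100),
  UNIQUE KEY uk_event_user (event_id, user_id)
);

INSERT INTO attendance (event_id, user_id, status)
VALUES (1, 1, 'going');            -- row gets id = 1

-- This insert would get a new id, but it collides on the unique key,
-- so the existing row is updated instead and keeps id = 1:
INSERT INTO attendance (event_id, user_id, status)
VALUES (1, 1, 'maybe')
  ON DUPLICATE KEY UPDATE status = VALUES(status);
```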
Here is my current table:
CREATE TABLE `linkler` (
`link` varchar(256) NOT NULL,
UNIQUE KEY `link` (`link`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I will only use these two queries on the table: SELECT EXISTS (SELECT 1 FROM linkler WHERE link = ?) and INSERT INTO linkler (link) VALUES (?).
I don't know much about indexing databases. Since I won't be adding the same thing twice, I thought marking the column unique would be a good idea. Is there anything I can do to increase performance? For example, can I do something so that rows are always kept sorted, so MySQL can do a binary search or something similar?
Adding a unique index is perfect. Also, since you have a unique index, you don't need to check for existence before you do an insert. You can simply use INSERT IGNORE to insert the row if it doesn't exist (or ignore the error if it does):
INSERT IGNORE INTO linkler (link) VALUES (?)
Whether that will be faster than doing a SELECT/INSERT combination depends on how often you expect to have duplicates.
ETA: If that is the only column in this table, you might want to make it a PRIMARY KEY instead of just a UNIQUE KEY although I don't think it really matters much other than for clarity.
My MySQL database has a table with the following structure:
CREATE TABLE `Table` (
`value1` VARCHAR(50) NOT NULL DEFAULT '',
`value2` VARCHAR(50) NOT NULL DEFAULT '',
`value3` TEXT NULL,
`value4` VARCHAR(50) NULL DEFAULT NULL,
`value5` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`value1`, `value2`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
ROW_FORMAT=DEFAULT
The first and second columns are VARCHAR(50), and together they form the primary key. The third column is TEXT. The table contains about 1,000,000 records, and searching for a specific item by the first column takes minutes. How can I index this table to speed up my search, and which index type should I use?
A primary key of 50+50 characters? What does it contain? Are you sure the table is in third normal form? It sounds like the key itself might contain some information, which is an alarm bell to me.
If you can change your primary key with something else much shorter and manageable, there are a few things you can try:
externalise value3 (the TEXT column) to a different table, matched by the new primary key
analyse your table to determine a more optimised length, rather than 50 chars, with SELECT * FROM `Table` PROCEDURE ANALYSE()
change the size of the fields accordingly and if you can afford the extra space change VARCHAR to CHAR
add an index to value1, which probably shouldn't be part of the primary key
Always check the performance of the changes, to see if they were worth it or not.
What is the actual query you're executing? The index will only help if you're searching for a prefix (or exact) match. For example:
SELECT * FROM `Table` WHERE value1 LIKE 'Foo%'
will find anything that starts with Foo, and should use the index and be relatively quick. On the other hand:
SELECT * FROM `Table` WHERE value1 LIKE '%Foo%'
will not use the index and you'll be forced to do a full table scan. If you need to do that, you should use a full-text index and query: http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html
The only thing I can see that might possibly improve things would be to add a unique index to the first column. This obviously does not work if the first column is not actually unique, and it is questionable if it would be at all more efficient than the already existing primary key. The way I thought this might possibly help is if the unique index on the first column was smaller than the primary key (index scans would be quicker).
Also, you might be able to create an index on parts of your first column, maybe only the 5 or 10 first characters, that could be more efficient.
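For example, a prefix index could be added like this; the 10-character length is only a guess and should be chosen from the selectivity of your actual data:

```sql
-- Index only the first 10 characters of value1 to keep the index small:
ALTER TABLE `Table` ADD INDEX idx_value1_prefix (value1(10));
```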
Also, after deleting and/or inserting lots of values, remember to run ANALYZE TABLE on the affected table, or even OPTIMIZE TABLE. That way, the stats for the MySQL query optimizer are updated.
It's always a bad idea to use such long strings as indexes, but if you really need to search that way, consider how you're filtering the query: MySQL can't use an index for a LIKE condition with a leading wildcard, so a condition like WHERE value1 LIKE "%mytext%" will never use an index. Instead, try searching for a shorter string so MySQL can turn the operation into an equality check; for example, use value1 = "XXXXX", where "XXXXX" is a part of the string. To determine the best length for the comparison string, analyse the selectivity of your value1 field.
Consider too that a multiple-column index like (value1, value2) won't use the second column unless the first matches exactly. That doesn't make it a bad index; it's just so you know and understand how it works.
If that doesn't work, another solution is to store value1 and value2 in a new table (table2, for example) with an auto-increment id field, then add a foreign key from Table to table2 using ids (e.g. my_long_id), and finally create an index on table2 such as my_idx (value1, value2). The search would be something like:
SELECT t1.*
FROM
table2 as t2
INNER JOIN Table as t1 ON (t1.my_long_id = t2.id)
WHERE
t2.value1 = "your_string"
Ensure that table2 has an index like (value1, value2) and that Table has a primary index on (my_long_id).
As a final recommendation, add an id field with AUTO_INCREMENT as the PRIMARY KEY and make (value1, value2) a unique/regular key. This helps a lot because a B-Tree stores its entries sorted, so keying on a 100-char string wastes I/O on that sorting: InnoDB determines the best position for each new entry at insert time, and will probably need to move some entries to other pages to make room for the new one. With an auto-increment value this is easier and cheaper, because new entries always go at the end and such movements are never needed.
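A sketch of that change against the table from the question; run it on a copy first, since dropping and adding a primary key rebuilds the table:

```sql
-- Replace the long composite primary key with a surrogate id,
-- keeping (value1, value2) unique:
ALTER TABLE `Table`
  DROP PRIMARY KEY,
  ADD `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ADD UNIQUE KEY uk_value1_value2 (value1, value2);
```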
But why are you searching for a unique item on a non-unique column? Why can't you make queries based on your primary key? If for some reason you cannot, then I would index value1, the column you are searching on:
CREATE INDEX `index_name`
ON `table` (column_name)
I need to import data from one MySQL table into another. The old table has a different, outdated structure (which isn't terribly relevant). That said, I'm appending a field to the new table called import_id, which saves the original id from the old table in order to prevent duplicate imports of the old records.
My question now is: how do I actually prevent duplicates? Due to the parallel rollout of the new system with the old, the import will unfortunately need to be run more than once. I can't make the import_id field PK/UNIQUE because it will have NULL values for rows that do not come from the old table, thereby throwing an error when adding new rows. Is there a way to use some type of INSERT IGNORE on the fly for an arbitrary column that doesn't natively have constraints?
The more I think about this problem, the more I think I should handle it in the initial SELECT. However, I'd be interested in quality mechanisms by which to handle this in general.
Best.
You should be able to create a unique key on the import_id column and still specify that column as nullable. It is only primary key columns that must be specified as NOT NULL.
That said, on the new table you could specify a unique key on the nullable import_id column and then handle any duplicate key errors when inserting from the old table into the new table using ON DUPLICATE KEY
Here's a basic worked example of what I'm driving at:
create table your_table
(id int unsigned primary key auto_increment,
someColumn varchar(50) not null,
import_id int null,
UNIQUE KEY `importIdUidx1` (import_id)
);
insert into your_table (someColumn,import_id) values ('someValue1',1) on duplicate key update someColumn = 'someValue1';
insert into your_table (someColumn) values ('someValue2');
insert into your_table (someColumn) values ('someValue3');
insert into your_table (someColumn,import_id) values ('someValue4',1) on duplicate key update someColumn = 'someValue4';
where the first and last inserts represent inserts from the old table and the 2nd and 3rd represent inserts from elsewhere.
Hope this helps and good luck!
This is probably a common situation, but I couldn't find a specific answer on SO or Google.
I have a large table (>10 million rows) of friend relationships on a MySQL database that is very important and needs to be maintained such that there are no duplicate rows. The table stores the user's uids. The SQL for the table is:
CREATE TABLE possiblefriends(
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
user INT,
possiblefriend INT)
The way the table works is that each user has around 1000 or so "possible friends" that are discovered and need to be stored, but duplicate "possible friends" need to be avoided.
The problem is, due to the design of the program, over the course of a day I need to add 1 million rows or more to the table, which may or may not be duplicate row entries. The simple answer would seem to be to check each row to see if it is a duplicate and, if not, insert it into the table. But this technique will probably get very slow as the table grows to 100 million rows, 1 billion rows, or higher (which I expect it to soon).
What is the best (i.e. fastest) way to maintain this unique table?
I don't need to have a table with only unique values always on hand. I just need it once-a-day for batch jobs. In this case, should I create a separate table that just inserts all the possible rows (containing duplicate rows and all), and then at the end of the day, create a second table that calculates all the unique rows in the first table?
If not, what is the best way for this table long-term?
(If indexes are the best long-term solution, please tell me which indexes to use)
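The staging-table idea from the question could be sketched like this; it assumes a unique key on (user, possiblefriend) has been added to possiblefriends, and the staging table name is illustrative:

```sql
-- Unindexed staging table, so the 1M+ daily inserts are cheap:
CREATE TABLE possiblefriends_staging (
  user INT,
  possiblefriend INT
);

-- ... plain INSERTs accumulate here during the day ...

-- Once a day, fold the unique rows into the real table.
-- DISTINCT dedupes within the batch; INSERT IGNORE dedupes
-- against rows already in possiblefriends:
INSERT IGNORE INTO possiblefriends (user, possiblefriend)
SELECT DISTINCT user, possiblefriend FROM possiblefriends_staging;

TRUNCATE TABLE possiblefriends_staging;
```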
Add a unique index on (user, possiblefriend) then use one of:
INSERT ... ON DUPLICATE KEY UPDATE ...
INSERT IGNORE
REPLACE
to ensure that you don't get errors when you try to insert a duplicate row.
You might also want to consider if you can drop your auto-incrementing primary key and use (user, possiblefriend) as the primary key. This will decrease the size of your table and also the primary key will function as the index, saving you from having to create an extra index.
See also:
“INSERT IGNORE” vs “INSERT … ON DUPLICATE KEY UPDATE”
A unique index will let you be sure that the field is indeed unique, you can add a unique index like so:
CREATE TABLE possiblefriends(
id INT NOT NULL AUTO_INCREMENT,
user INT,
possiblefriend INT,
PRIMARY KEY (id),
UNIQUE INDEX DefUserID_UNIQUE (user ASC, possiblefriend ASC))
This will also speed up your table access significantly.
Your other issue, the mass insert, is a little trickier; you could use the built-in ON DUPLICATE KEY UPDATE function shown below:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
-- when a row with a=1 already exists, the above is equivalent to:
UPDATE table SET c=c+1 WHERE a=1;