I want to write a program that adds a new item to a table. The item has a unique key (its name), and it can be created by any one of 100 threads, so I need to make sure it is inserted only once.
I have two ideas:
Use insert ignore
Fetch it from the database via SELECT, then INSERT it into the table if no row is returned.
Which option is better? Is there an even better approach?
Late to the party, but I'm pondering something similar.
I created the following table to track active users on a license per day:
CREATE TABLE `license_active_users` (
`license_active_user_id` int(11) NOT NULL AUTO_INCREMENT,
`license_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`date` date NOT NULL,
PRIMARY KEY (`license_active_user_id`),
UNIQUE KEY `license_id` (`license_id`,`user_id`,`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
In other words, 1 primary key and 1 unique index across the remaining 3 columns.
I then inserted 1 million unique rows into the table.
Attempting to re-insert a subset (10,000 rows) of the same data yielded the following results:
INSERT IGNORE: 38 seconds
INSERT ... ON DUPLICATE KEY UPDATE: 40 seconds
if (!rowExists("SELECT ...")) INSERT: <2 seconds
If those 10,000 rows aren't already present in the table:
INSERT IGNORE: 34 seconds
INSERT ... ON DUPLICATE KEY UPDATE: 41 seconds
if (!rowExists("SELECT ...")) INSERT: 21 seconds
So the conclusion must be if (!rowExists("SELECT ...")) INSERT is fastest by far - at least for this particular table configuration.
The missing test is if (rowExists("SELECT ...")){ UPDATE } else { INSERT }, but I'll assume INSERT ... ON DUPLICATE KEY UPDATE is faster for this operation.
For your particular case, however, I would go with INSERT IGNORE because (as far as I'm aware) it's an atomic operation and that'll save you a lot of trouble when working with threads.
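For the threaded case in the question, that atomic approach might look like the following sketch (the `items` table and its `name` column are made up to mirror the question's setup):

```sql
-- A table whose name column is enforced unique:
CREATE TABLE items (
  item_id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  PRIMARY KEY (item_id),
  UNIQUE KEY name (name)
);

-- Any of the 100 threads can run this concurrently; only the first
-- insert of a given name takes effect, the rest are silently skipped.
INSERT IGNORE INTO items (name) VALUES ('some-unique-name');
```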
SELECT + INSERT -- two round trips to the server, hence slower.
INSERT IGNORE -- requires a PRIMARY or UNIQUE key to decide whether to toss the new INSERT. If this works for you, it is probably the best.
REPLACE -- is a DELETE + an INSERT. This is rarely the best.
INSERT ... ON DUPLICATE KEY UPDATE -- This lets you either INSERT (if the PRIMARY/UNIQUE key(s) are not found) or UPDATE. This is the one to use if you have things you need to update in existing rows.
"Burning ids" -- Only the "select+insert" avoids a potential problem: running out of AUTO_INCREMENT ids (I call it "burning ids"). All the other techniques will allocate the next id before deciding whether it is needed.
If you have several names to conditionally insert into a normalization, then a 2-query technique can batch them quite efficiently, and not burn ids: http://mysql.rjweb.org/doc.php/staging_table#normalization
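As a rough illustration of that idea (not the exact recipe from the link; `staging` and `names` are hypothetical tables), the batch might look like:

```sql
-- Query 1: insert only the names that are genuinely missing, so no
-- AUTO_INCREMENT ids are burned on duplicates.
INSERT INTO names (name)
SELECT DISTINCT s.name
FROM staging AS s
LEFT JOIN names AS n ON n.name = s.name
WHERE n.name IS NULL;

-- Query 2: fetch the ids for every name in the batch.
SELECT n.name, n.name_id
FROM names AS n
JOIN staging AS s ON s.name = n.name;
```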
Best: SELECT + INSERT IGNORE.
Because the existence check uses a SELECT, it does not need to lock the table or any rows.
Every INSERT requires a lock, so skipping unnecessary INSERTs can reduce the impact on concurrent inserts.
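In that scheme, each thread would run something like the following (the `items` table and `name` column are illustrative):

```sql
-- Cheap, lock-free existence check:
SELECT 1 FROM items WHERE name = 'some-unique-name';

-- Only if no row came back, attempt the insert. INSERT IGNORE keeps
-- it safe in case another thread inserted the same name in between.
INSERT IGNORE INTO items (name) VALUES ('some-unique-name');
```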
I have a table in MySQL (50 million rows) new data keep inserting periodically.
This table has following structure
CREATE TABLE values (
id double NOT NULL AUTO_INCREMENT,
channel_id int(11) NOT NULL,
val text NOT NULL,
date_time datetime NOT NULL,
PRIMARY KEY (id),
KEY channel_date_index (channel_id,date_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Two rows must never have duplicate channel_id and date_time, but if such insert occurs it is important to keep the newest value.
Is there a way to check for duplicates in real time before the insert, or should I keep inserting all the data and do periodic duplicate checks in a separate cycle?
Real-time speed is important here, because about 100 inserts occur per second.
To prevent future duplicates:
Change KEY channel_date_index (channel_id,date_time) to UNIQUE (channel_id,date_time)
Change the INSERT to INSERT ... ON DUPLICATE KEY UPDATE ... so that the newest value overwrites the old one when that pair already exists.
To fix the existing table, you could do ALTER IGNORE TABLE ... ADD UNIQUE(...). However that would not give you the latest timestamps.
For minimum downtime (not maximum speed), use pt-online-schema-change.
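Put together, the two changes above might look like this for the `values` table (the column values in the INSERT are placeholders):

```sql
-- Replace the plain index with a uniqueness constraint.
ALTER TABLE `values`
  DROP KEY channel_date_index,
  ADD UNIQUE KEY channel_date_index (channel_id, date_time);

-- Keep the newest value when a (channel_id, date_time) pair repeats.
INSERT INTO `values` (channel_id, val, date_time)
VALUES (42, '3.14', '2024-01-01 12:00:00')
ON DUPLICATE KEY UPDATE val = VALUES(val);
```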
My table has just a single column, ID (passwords for admin log-in).
Because this code runs every time the program starts, I prevent errors when creating the database and tables by using an IF NOT EXISTS clause.
Since the adminLogin table is initialized the first time, a "Duplicate entry" primary-key error occurs when the user re-runs the program.
I tried using IF NOT EXISTS for the insert as well, but that produces yet another error!
You are trying to insert the same value twice.
A primary key must be unique.
Set ID as AUTO_INCREMENT:
CREATE TABLE `table_code` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`your_column` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;
Another possibility
INSERT INTO adminLogin(ID) VALUES(100) ON DUPLICATE KEY UPDATE ID=ID;
You can use INSERT IGNORE instead.
INSERT IGNORE INTO ADMINLOGIN VALUES(200);
INSERT IGNORE, INSERT ... ON DUPLICATE KEY UPDATE ..., and REPLACE INTO can all work without erroring. Depending on your needs, one may be a better fit than the others.
Here's a brief pros and cons of each:
INSERT IGNORE:
Pros: Easy to write
Cons: if the data being inserted is newer, the older data will still be preserved (the new row is silently discarded)
INSERT ... ON DUPLICATE KEY UPDATE ...
Pros: Can Insert or update specific columns that could have more recent data
Cons: A little harder to write the query for
REPLACE INTO
Pros: Can insert and update columns that could have more recent data. Faster than INSERT ... ON DUPLICATE KEY UPDATE.
Cons: Overwrites every column, even though only some columns may have newer data. If you are inserting multiple rows, REPLACE INTO could overwrite data for existing rows that you don't really want to overwrite.
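To make the trade-offs concrete, here are the three statements side by side against a hypothetical `cities` table with a unique `city` column and a `population` column:

```sql
-- Keeps the old row; the new one is silently discarded:
INSERT IGNORE INTO cities (city, population) VALUES ('Oslo', 700000);

-- Keeps the row; updates only the columns you name:
INSERT INTO cities (city, population) VALUES ('Oslo', 700000)
ON DUPLICATE KEY UPDATE population = VALUES(population);

-- Deletes the old row and inserts a fresh one (new auto_increment id;
-- any columns not listed fall back to their defaults):
REPLACE INTO cities (city, population) VALUES ('Oslo', 700000);
```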
Let's say I have a MySQL table with three columns: id, a, and b, where id is an AUTO_INCREMENT field. If I pass a query like the following to MySQL, it works fine:
REPLACE INTO `table` (`id`, `a`, `b`) VALUES (1, 'A', 'B')
But if I skip the id field, it no longer works as expected.
I want to know if there is a way to ignore some fields in the REPLACE query. So the above query could be something like this:
REPLACE INTO `table` (`a`, `b`) VALUES ('A', 'B')
Why do I need such a thing?
Sometimes I need to check the database with a SELECT query to see whether a row exists. If it exists, I need to UPDATE the existing row; otherwise I need to INSERT a new row. I'm wondering if I could achieve a similar (but not identical) result with a single REPLACE query.
Why couldn't it be the same result? Simply because REPLACE will DELETE the existing row and INSERT a new one, which loses the current primary key and increases the auto-increment counter. In contrast, an UPDATE query leaves the primary key and AUTO_INCREMENT fields untouched.
MySQL REPLACE.
That's not how you're supposed to use REPLACE.
Use REPLACE only when you know the primary key values.
Manual:
Note that unless the table has a PRIMARY KEY or UNIQUE index, using a
REPLACE statement makes no sense. It becomes equivalent to INSERT,
because there is no index to be used to determine whether a new row
duplicates another.
What if you have multiple rows that match the fields?
Consider adding a key that you can match on, and use INSERT ... ON DUPLICATE KEY UPDATE. The way INSERT IGNORE works is slightly different from REPLACE.
INSERT IGNORE is very fast but can have some invisible side effects.
INSERT... ON DUPLICATE KEY UPDATE
Which has fewer side effects but is probably much slower, especially for MyISAM, heavy write loads, or heavily indexed tables.
For more details on the side effects, see:
https://stackoverflow.com/a/548570/1301627
Using INSERT IGNORE seems to work well for very fast lookup MyISAM tables with few columns (maybe just a VARCHAR field).
For example,
create table cities (
city_id int not null auto_increment,
city varchar(200) not null,
primary key (city_id),
unique key city (city))
engine=myisam default charset=utf8;
insert ignore into cities (city) values ("Los Angeles");
In this case, repeatedly re-inserting "Los Angeles" will not result in any actual changes to the table at all and will prevent a new auto_increment ID from being generated, which can help prevent ID field exhaustion (using up all the available auto_increment range on heavily churned tables).
For even more speed, compute a small hash (such as SpookyHash) of the value before inserting, store it in a separate unique-key column, and then the varchar doesn't need to be indexed at all.
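A sketch of that hash-column idea, using MD5 as a stand-in since MySQL has no built-in SpookyHash function:

```sql
create table cities (
  city_id int not null auto_increment,
  city varchar(200) not null,
  city_hash binary(16) not null,
  primary key (city_id),
  unique key city_hash (city_hash)
) engine=myisam default charset=utf8;

-- Deduplicate on the fixed-size 16-byte hash; the varchar itself is
-- never indexed.
insert ignore into cities (city, city_hash)
values ('Los Angeles', unhex(md5('Los Angeles')));
```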
I need to import data from one MySQL table into another. The old table has a different outdated structure (which isn't terribly relevant). That said, I'm appending a field to the new table called "imported_id" which saves the original id from the old table in order to prevent duplicate imports of the old records.
My question now is, how do I actually prevent duplicates? Due to the parallel rollout of the new system alongside the old, the import will unfortunately need to be run more than once. I can't make the "import_id" field PK/UNIQUE because it will be NULL for rows that do not come from the old table, which would throw an error when adding new rows. Is there a way to use some type of INSERT IGNORE on the fly for an arbitrary column that doesn't natively have constraints?
The more I think about this problem, the more I think I should handle it in the initial SELECT. However, I'd be interested in quality mechanisms by which to handle this in general.
Best.
You should be able to create a unique key on the import_id column and still specify that column as nullable. It is only primary key columns that must be specified as NOT NULL.
That said, on the new table you could specify a unique key on the nullable import_id column and then handle any duplicate-key conflicts when inserting from the old table into the new table using ON DUPLICATE KEY UPDATE.
Here's a basic worked example of what I'm driving at:
create table your_table
(id int unsigned primary key auto_increment,
someColumn varchar(50) not null,
import_id int null,
UNIQUE KEY `importIdUidx1` (import_id)
);
insert into your_table (someColumn,import_id) values ('someValue1',1) on duplicate key update someColumn = 'someValue1';
insert into your_table (someColumn) values ('someValue2');
insert into your_table (someColumn) values ('someValue3');
insert into your_table (someColumn,import_id) values ('someValue4',1) on duplicate key update someColumn = 'someValue4';
where the first and last inserts represent inserts from the old table and the 2nd and 3rd represent inserts from elsewhere.
Hope this helps and good luck!
This is probably a common situation, but I couldn't find a specific answer on SO or Google.
I have a large table (>10 million rows) of friend relationships on a MySQL database that is very important and needs to be maintained such that there are no duplicate rows. The table stores the user's uids. The SQL for the table is:
CREATE TABLE possiblefriends(
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
user INT,
possiblefriend INT)
The way the table works is that each user has around 1000 or so "possible friends" that are discovered and need to be stored, but duplicate "possible friends" need to be avoided.
The problem is, due to the design of the program, over the course of a day I need to add 1 million rows or more to the table, which may or may not be duplicate entries. The simple answer would seem to be to check each row to see if it is a duplicate and, if not, insert it. But this technique will probably get very slow as the table grows to 100 million rows, 1 billion rows, or higher (which I expect it to soon).
What is the best (i.e. fastest) way to maintain this unique table?
I don't need to have a table with only unique values always on hand. I just need it once a day for batch jobs. In this case, should I create a separate table into which I insert all the possible rows (duplicates and all), and then at the end of the day create a second table containing only the unique rows from the first?
If not, what is the best way for this table long-term?
(If indexes are the best long-term solution, please tell me which indexes to use)
Add a unique index on (user, possiblefriend) then use one of:
INSERT ... ON DUPLICATE KEY UPDATE ...
INSERT IGNORE
REPLACE
to ensure that you don't get errors when you try to insert a duplicate row.
You might also want to consider if you can drop your auto-incrementing primary key and use (user, possiblefriend) as the primary key. This will decrease the size of your table and also the primary key will function as the index, saving you from having to create an extra index.
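Dropping the surrogate key in favor of the natural composite key might look like this:

```sql
CREATE TABLE possiblefriends (
  user INT NOT NULL,
  possiblefriend INT NOT NULL,
  PRIMARY KEY (user, possiblefriend)
);

-- Duplicate pairs are now silently skipped:
INSERT IGNORE INTO possiblefriends (user, possiblefriend) VALUES (1, 42);
```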
See also:
“INSERT IGNORE” vs “INSERT … ON DUPLICATE KEY UPDATE”
A unique index will let you be sure that the field is indeed unique, you can add a unique index like so:
CREATE TABLE possiblefriends(
id INT NOT NULL AUTO_INCREMENT,
user INT,
possiblefriend INT,
PRIMARY KEY (id),
UNIQUE INDEX DefUserID_UNIQUE (user ASC, possiblefriend ASC))
This will also speed up your table access significantly.
Your other issue with the mass insert is a little trickier; you can use the built-in ON DUPLICATE KEY UPDATE clause:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
When a row with the same unique key already exists, this is equivalent to:
UPDATE table SET c=c+1 WHERE a=1;