Would using bulk insert for 2000 rows of data make sense?
It might be 500-2K in reality.
BTW, do bulk inserts ignore constraints, or is that a setting?
(Using SQL Server 2008 and .NET on the server side; the data is coming in via a web service (WSE or WCF).)
Bulk insert would probably not make sense for 2000 rows. Maybe for 200,000 rows.
Ignoring constraints is the default behaviour. (Also described here.)
CHECK_CONSTRAINTS
Specifies that all constraints on the target table or view must be checked during the bulk-import operation. Without the CHECK_CONSTRAINTS option, any CHECK and FOREIGN KEY constraints are ignored, and after the operation, the constraint on the table is marked as not-trusted.
Note: UNIQUE, PRIMARY KEY, and NOT NULL constraints are always enforced.
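For illustration, a minimal sketch of a BULK INSERT that forces constraint checks during the load (the table name and file path are made up):

BULK INSERT dbo.Orders              -- hypothetical target table
FROM 'C:\imports\orders.dat'        -- hypothetical data file
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    CHECK_CONSTRAINTS               -- enforce CHECK and FOREIGN KEY constraints during the import
);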
The "KEEPIDENTITY" option of "BULK INSERT":
Specifies that identity value or
values in the imported data file are
to be used for the identity column. If
KEEPIDENTITY is not specified, the
identity values for this column are
verified but not imported and SQL
Server automatically assigns unique
values based on the seed and increment
values specified during table
creation.
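And the same sketch with KEEPIDENTITY, for the case where the file already carries the identity values (same made-up names):

BULK INSERT dbo.Orders
FROM 'C:\imports\orders.dat'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    KEEPIDENTITY                    -- keep the identity values from the file instead of regenerating them
);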
I use Delphi 10.2 and MySQL. I have a table that has about 50,000 records and an Auto_Increment primary key. It has suddenly, on its own and with no help from me, started trying to re-insert old key values. As a matter of fact, it started over with the value 1. I have no idea how to fix this and I hope you might be able to help.
Thanks,
Jim Sawyer
If the MySQL table is defined with an auto increment primary key then you should never specify the key value. MySQL should not re-use old key values, but you may want to check if there is any table corruption. You can also reset the table's auto-increment value using an ALTER TABLE command. (There's a tutorial on this here: https://www.mysqltutorial.org/mysql-reset-auto-increment)
You can use FireDAC monitoring to confirm whether or not you are sending the primary key to MySQL: set your connection to be monitored using the FireDAC component; they supply a monitoring tool that you can set up to see all of the SQL being transferred. Normally the FireDAC layer would do an insert with no primary key and then use LAST_INSERT_ID to update the TField with the actual value inserted.
If you are sending the wrong key, then alter your logic so you don't send the primary key on an insert.
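For reference, the pattern FireDAC normally produces boils down to something like this (hypothetical table and columns; the auto-increment primary key is simply left out of the INSERT):

INSERT INTO customers (name, city)  -- no id column supplied, MySQL assigns it
VALUES ('Acme', 'Austin');

SELECT LAST_INSERT_ID();            -- the value MySQL just assigned to the auto-increment key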
You can reset the auto-increment value to any value you want with the following command:
ALTER TABLE <table_name> AUTO_INCREMENT = <new value>;
So if the new value is 100, the next inserted record receives a value of 100.
I am creating a Django app where the primary keys are AutoFields. i.e. I am not manually assigning any field as primary key in my models.
I need to use MySQL.
I will need to export all the data to Excel, or perhaps to another Django app, from time to time. Therefore the primary keys must be unique so that new records, or records to be deleted, can be identified in Excel/the other app.
However, I have read that the MySQL auto-increment counter resets to the maximum existing key when the database restarts. This would result in keys being reassigned if the latest records were deleted.
I need to avoid this. No key should be reassigned.
How can this be done?
MySQL 8.0 now persists the last auto-increment value per table, so it is remembered across restarts and the counter is not reset.
https://www.percona.com/blog/2018/10/08/persistence-of-autoinc-fixed-in-mysql-8-0/
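If you want to double-check what counter the server currently holds for a table, one way is to query information_schema (the schema and table names here are hypothetical):

SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME = 'myapp_record';
-- note: in 8.0 this value can be cached; SHOW CREATE TABLE myapp_record also displays it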
I have a very large table (tens of millions of rows) and a UNIQUE index needs to be added to a column on that table. I know for a fact that the table contains duplicated values in that column, which I need to clean up (by deleting rows or resetting the column to something unique that I can generate automatically). A plus is that the rows which are already duplicated no longer get modified.
What would be the right approach to perform a change like this, given that I will be probably using the Percona pt-osc tool and there are continuous deletes/inserts on the table? My plan was:
Add code that ensures no duplicate IDs get inserted anymore. I probably need to add a separate table for this temporarily, since I want the database to enforce this for me rather than the application: insert into the "shadow table" (which has a unique index) in a transaction together with my main table, and roll back any insert that tries to insert a duplicate value
Backfill the table by zapping all invalid column values which are within the primary key range below $current_pkey_value
Then add the index and use pt-osc to change over the table
Is there anything I am missing?
Since we use pt-online-schema-change, we are using triggers to perform the synchronisation from the existing table to a temp table. The tool actually has a special configuration key for this, --no-check-unique-key-change, which does exactly what we need: it agrees to perform the ALTER TABLE and sets up the triggers in such a way that, if a conflict occurs, INSERT .. IGNORE is applied and the first row having used the now-unique value wins during synchronisation. For us this is a good tradeoff because all the duplicates we have seen resulted from data races, not from actual conflicts in the value-generation process.
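The duplicates that already exist still have to be cleaned up before the switchover; a minimal sketch of locating them (the table and column names are placeholders, and the literal stands in for $current_pkey_value from the question):

SELECT the_unique_column, COUNT(*) AS dupes
FROM big_table
WHERE id < 100000000                -- stand-in for $current_pkey_value: the range that no longer changes
GROUP BY the_unique_column
HAVING COUNT(*) > 1;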
Background:
I built a scraper in Python (not sure if that matters). I scrape the website and update my html table. The main table stores the autogenerated_id, url, raw_html, date_it_was_scrapped, and last_date_the_page_was_updated (provided by the website). My table has many duplicate urls, which it shouldn't, so I am planning on making urls unique in the database.
Desired outcome:
I only want to insert a row if the url doesn't exist, and update the html if last_date_the_page_was_updated > date_it_was_scrapped.
Solution:
The following Stack Overflow post shows how.
I haven't tested it because of the selected answer's warning: an INSERT ... ON DUPLICATE KEY UPDATE statement against a table having more than one unique or primary key is also marked as unsafe.
What I plan to do, based on the Stack Overflow question:
INSERT INTO html_table (url, raw_html, date_it_was_scrapped, last_date_the_page_was_updated)
VALUES (the data)
ON DUPLICATE KEY UPDATE
url = VALUES(url),
raw_html = VALUES(raw_html),
date_it_was_scrapped = VALUES(date_it_was_scrapped),
last_date_the_page_was_updated=VALUES(last_date_the_page_was_updated)
WHERE last_date_page_was_update > date_it_was_scrapped
Question:
What is unsafe about it and is there a safe way to do it?
From the description of bug 58637, which is linked in the MySQL documentation page that flags the INSERT ... ON DUPLICATE KEY UPDATE as unsafe:
When the table has more than one unique or primary key, this statement is sensitive to the order in which the storage engines checks the keys. Depending on this order, the storage engine may determine different rows to mysql, and hence mysql can update different rows [...] The order that the storage engine checks keys is not deterministic.
I understand that your table has an auto-incremented primary key, and that you are planning to add a unique key on the url column. Because the primary key is auto-incremented, you will not pass it as a parameter in INSERT commands, as shown in your SQL command. Hence MySQL will not need to check for duplicates on this column; it will only check for duplicates on url. As a consequence, this INSERT should be safe.
Other remarks regarding your question.
you don't need to update the url column on duplicate keys (we know it is the same)
The purpose of the WHERE clause in your query is unclear; are you sure that it is needed?
You will need to remove the duplicates before you enable the unique constraint on URL.
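If the date condition really is needed, ON DUPLICATE KEY UPDATE has no WHERE clause, but the comparison can be folded into IF() expressions; a sketch using the column names from your statement (the VALUES list is still the placeholder from your question):

INSERT INTO html_table (url, raw_html, date_it_was_scrapped, last_date_the_page_was_updated)
VALUES (the data)
ON DUPLICATE KEY UPDATE
    raw_html = IF(VALUES(last_date_the_page_was_updated) > date_it_was_scrapped,
                  VALUES(raw_html), raw_html),
    date_it_was_scrapped = IF(VALUES(last_date_the_page_was_updated) > date_it_was_scrapped,
                              VALUES(date_it_was_scrapped), date_it_was_scrapped),
    last_date_the_page_was_updated = VALUES(last_date_the_page_was_updated)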
Is there a SQL statement (or atomic sequence of statements) supported by both MySQL and HSQLDB to insert values if they aren't already there?
I'm working on an app that uses MySQL as its production database and HSQLDB for unit tests; I'd like to have a single "initial data import when the tables are empty" script.
MySQL supports INSERT IGNORE, REPLACE INTO and INSERT INTO ... ON DUPLICATE KEY UPDATE ..., but HSQLDB doesn't; conversely, HSQLDB supports MERGE but MySQL doesn't.
HSQLDB, from version 2.3.4, adds support for INSERT IGNORE.
http://hsqldb.org/
Version 2.3.4 added the UUID type for columns, SYNONYM for tables and functions, PERIOD predicates, and auto-updated TIMESTAMP columns on row updates. Other new features included the ability to cancel long-running statements from JDBC as well as from admin sessions, and UTF-16 file support for text table sources, in addition to 8-bit text files. MySQL compatibility for REPLACE, INSERT IGNORE and ON DUPLICATE KEY UPDATE statements.
And
http://hsqldb.org/doc/guide/guide.pdf (page 260).
HyperSQL supports and translates INSERT IGNORE, REPLACE and ON DUPLICATE KEY UPDATE variations of INSERT into predictable and error-free operations. When INSERT IGNORE is used, if any of the inserted rows would violate a PRIMARY KEY or UNIQUE constraint, that row is not inserted. With multi-row inserts, the rest of the rows are then inserted only if there is no other violation such as long strings or type mismatch, otherwise the appropriate error is returned. When REPLACE or ON DUPLICATE KEY UPDATE is used, the rows that need replacing or updating are updated with the given values. This works exactly like an UPDATE statement for those rows. Referential constraints and other integrity checks are enforced and update triggers are activated. The row count returned is simply the total number of rows inserted and updated.
If someone still has this problem, you can enable MySQL syntax support by adding the following to your script:
SET DATABASE SQL SYNTAX MYS TRUE
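With that setting applied to the HSQLDB test database (MySQL itself doesn't need it), a shared seed script along these lines (hypothetical table and rows) should then run on both engines:

-- run once against HSQLDB only, e.g. in the unit-test setup:
SET DATABASE SQL SYNTAX MYS TRUE;

-- shared "initial data import when the tables are empty" script,
-- valid on MySQL and on HSQLDB 2.3.4+ with MYS syntax enabled:
INSERT IGNORE INTO countries (code, name)
VALUES ('DE', 'Germany'),
       ('FR', 'France');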