Teradata identity column and "Duplicate unique prime key error in dbname.tablename"

I created a table using the below definition for a Teradata identity column:
ID INTEGER GENERATED BY DEFAULT AS IDENTITY
(START WITH 1
INCREMENT BY 1
MINVALUE 0
MAXVALUE 100000000
NO CYCLE),
----
UNIQUE PRIMARY INDEX ( ID )
For several months, the ID column has been working properly, automatically generating a unique value for the column. Over the past month, however, ELMAH has been intermittently reporting the following exception from our .NET 4.0 ASP.NET app:
Teradata.Client.Provider.TdException: [Teradata Database] [2801] Duplicate unique prime key error in DATABASENAME.TABLENAME.
I was able to replicate it by opening SQL Assistant and inserting a bunch of records into the table with raw SQL. As expected, most of the time it would insert successfully, but other times it would throw the above exception.
It appears that this error is occurring because Teradata is trying to generate a value for this column that it has previously generated.
Does anyone have any idea how to get to the bottom of what's happening? At the very least, I'd like some way to debug the issue a bit deeper.

I would suggest changing the definition of your identity column to GENERATED ALWAYS to prevent the application or ETL process from supplying a value that the identity generator may have already used (or may generate later). In fact, Teradata recommends that if you are using your IDENTITY column as part of a UPI, it should be defined as GENERATED ALWAYS ... NO CYCLE.
EDIT:
If your business requirements are such that you must be able to provide a value I would also consider using a domain that is outside the range of values you have set aside for the IDENTITY column. You can use a negative domain or a range that is an order of magnitude beyond that of the IDENTITY column. Personal preference would be to use a negative domain.
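A sketch of both variants, carrying over the values from the question's DDL (note that changing the GENERATED attribute of an existing identity column generally requires recreating the table and copying the data over):
-- Option 1: Teradata always generates the value; manually supplied values are rejected
ID INTEGER GENERATED ALWAYS AS IDENTITY
    (START WITH 1
     INCREMENT BY 1
     MINVALUE 0
     MAXVALUE 100000000
     NO CYCLE),
-- Option 2: keep GENERATED BY DEFAULT, but reserve the positive range for
-- generated values and have the application supply only negative IDs
ID INTEGER GENERATED BY DEFAULT AS IDENTITY
    (START WITH 1
     INCREMENT BY 1
     MINVALUE 1
     MAXVALUE 100000000
     NO CYCLE),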

Related

Sync SQL Binary column to MySQL table

I’m attempting to use a piece of software (Layer2 Cloud Connector) to sync a local SQL table (Sage software) to a remote MySQL database where the data is used for reports generated via the company's web app. We are doing this with about 12 tables, and have been doing so for almost two years without any issues.
Background:
I’m using a simple piece of software that uses a SELECT statement to sync records from one table to another using ODBC. In this case from SQL (SQLTable) to MySQL (MySQLTable). To do so, the software requires a SELECT statement for each table, a PK field, and, being ODBC-based, a provider. For SQL I'm using the Actian Zen 4.5 driver, and for MySQL the MySQL ODBC 5.3 driver.
Here is a screenshot of what the setup screen looks like for each of the tables. I have omitted the other column names that I'm syncing to make the SELECT statement more readable. The other columns are primarily varchar or int types.
Problem
For unrelated reasons, we must now sync a new table. Like most of the other tables, it has a primary key column named rGUID of type binary. When initially setting up the other tables, I tried to sync the primary key as a binary type to a MySQL binary column, but it failed when attempting to verify the SELECT statement on the SQLServer side with the error “Cannot remove this column, because it is a part of the constraint Constraint1 on the table SQLTable”.
Example of what I see for the GUID/rGUID primary key values stored in the SQLTable via Access, or in MySQL after syncing as string:
¡狻➽䪏蚯㰛蓪
Ҝ諺䖷ᦶ肸邅
ब惈蠷䯧몰吲론�
ॺ䀙㚪䄔麽骧⸍薉
To get around this, I use CAST in the SQLTable SELECT statement to CAST the binary value as a string using: CAST(GUID as nchar(8)) as GUID, and then set up the MySQL column as a VARCHAR(32) using utf8_general_ci collation.
This has worked great for every other table since we originally set this up. But this additional table has considerably more records (about 120,000 versus 5,000-10,000), and though I’m able to sync 10,000 – 15,000 successfully, when I try to sync the entire table I get about 10-12 errors such as:
The metabase record 'd36d2dbe-fa89-4712-be4c-6b212367004b' is marked
to be added. The table 'SQLTable' does not contain a corresponding
row. Changes made to this metabase record will be reset to the
initial state.
I don't understand what is causing the above error or how to work past it.
What I’ve tried so far:
- I’ve confirmed the SQLTable has no other unique fields that could be used as a PK in place of the rGUID column.
- I’ve tried using different type, length, and collation settings on the MySQL table, and have had mixed success, but ultimately still get errors when attempting to sync the entire table.
- I’ve also tried tweaking the CAST settings for the SQL SELECT statement, but nchar(8) seems to work best for the other tables.
- I’ve tried syncing using HASHBYTES('SHA1', GUID) as GUID and syncing the value of that, but get the below ODBC error.
- I was thinking perhaps I could convert the SQL GUID to its value, then sync that as a varchar (or a binary), but my attempts at using CONVERT in the SQLTable SELECT statement have failed.
Settings I used for all the other tables:
SQL SELECT Statement: SELECT CAST(GUID as nchar(8)) as GUID, OtherColumns FROM SQLTable;
MYSQL SELECT Statement: SELECT GUID, OtherColumns FROM MySQLTable;
Primary Key Field: GUID
Primary Key Field Type: String
MySQL Column Type/Collation: VARCHAR(32), utf8_general_ci
Any help or suggestions at all would be great. I've been troubleshooting this in my spare time for a couple of weeks now, and have not had much success. I'm not particularly familiar with the binary type, and am hoping someone might have an idea on how I might be able to successfully sync this SQL table to MySQL without these errors.
Given the small size of the datasets involved I would select as CHAR(36) from SQL Server and store in a CHAR(36) in MySQL.
If you are able to control the way the data is inserted by Layer2 Cloud Connector, then you could set your MySQLTable GUID column to BINARY(16) and use MySQL's UUID conversion functions (available in MySQL 8.0 and later):
SELECT CAST(GUID AS CHAR(36)) AS GUID, OtherColumns FROM SQLTable;
INSERT INTO MySQLTable (GUID) VALUES (UUID_TO_BIN(GUID));
SELECT BIN_TO_UUID(GUID) AS GUID, OtherColumns FROM MySQLTable;
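For reference, a minimal sketch of that MySQL table definition (column names follow the question; the other synced columns are omitted):
CREATE TABLE MySQLTable (
    GUID BINARY(16) NOT NULL PRIMARY KEY
    -- OtherColumns would follow here
);
UUID_TO_BIN() then packs the 36-character textual GUID into those 16 bytes on insert, and BIN_TO_UUID() converts it back to text on read.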

Error with inserting into mysql database

I am using cfwheels (coldfusion orm framework).
I recently moved some data from my previous host to a new one. Now I am trying to insert into a table, but am getting an error message: "Error Executing Database Query.
Duplicate entry '13651' for key 'PRIMARY'"
I looked into the database and it appears a record with id 13651 already exists. So I think the problem is with MySQL not generating the right auto-increment value.
It seems the AUTO_INCREMENT value is damaged or not set to the maximum value in that column, possibly due to a bulk insert.
The solution is to set the maximum PK value + 1 as the new AUTO_INCREMENT value. When you then insert records into this table, they will automatically pick up the next increment correctly:
ALTER TABLE tablename AUTO_INCREMENT = value;
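For example, to reset the counter dynamically (the id column name is an assumption; MySQL does not accept a subquery in ALTER TABLE, hence the prepared statement):
-- find the current maximum id, then move the counter just past it
SELECT MAX(id) + 1 INTO @next_id FROM tablename;
SET @sql = CONCAT('ALTER TABLE tablename AUTO_INCREMENT = ', @next_id);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;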
Is the rest of the data for that record, and the one you are trying to insert, the same? If so, you might just need to tell the ORM to replace that value.
If the primary key has the auto-increment attribute turned on, do not insert it manually. Remove the primary key from your insert query (whatever the syntax, according to the taste of your ORM framework).

ssis Data Migration - Master Detail records with new surrogate keys

I've finally reached the data migration part of my project and am now trying to move data from MySQL to SQL Server.
SQL Server has a new schema (the mapping is not always one-to-one).
I am trying to use SSIS for the conversion, which I started learning this morning.
We have customer and customer location tables in MySQL and equivalent tables in SQL Server. In SQL Server all my tables now have a surrogate key column (GUID), and I am creating the same in a Script Component.
Also note that I do have a primary key in the current MySQL tables.
What I am looking for is how I can add child records to the customer location table with the newly created GUID as the parent key.
I see that SSIS has a Foreach Loop container; is this of any use here?
If not, another possibility I can think of is to create two Data Flow Tasks and, [somehow] just before the master data is sent to the Destination Component [Table] in the primary Data Flow Task, add a variable with the newly created GUID and another with the old primary ID, which would be used to create the source for the Data Flow Task for the child records.
Maybe, to simplify, this could also be done once the Data Flow Task for the master is complete; the task for the child would then read this master data and insert child records from MySQL into the SQL Server table. This would, though, mean that I have to load all my parent table records back into memory.
I know this is all too confusing, and it is mainly because I am very confused :-(, so bear with me, and if you want more information let me know.
I have been through many links that I found through Google searches, but none of them really explains (or I was not able to understand) how the process is carried out.
Please advise
regards,
Mar
EDIT 1:
After further searching and refining keywords, I found this link on SO and am going through it to see if it can be used in my scenario:
How to load parent child data found in EDI 823 lockbox file using SSIS?
OK, here is what I would do. Put the MySQL data into staging tables in SQL Server that have identity columns set up, plus an extra column for the eventual GUID, which will start out as null. Now your records have a primary key.
Next comes the sneaky trick. Pick a required field (we use last_name) and, instead of the real data, insert the value from the id field in the staging table. Now you have a record that has both the GUID and the id in it. Update the GUID field in the staging table by joining to it on the ID and the required field you picked out. Then update the last_name field with the real data.
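A rough T-SQL sketch of that trick (the Staging and Customer table names, the id/guid columns, and a NEWID()-style default on Customer.GUID are all assumptions for illustration):
-- 1. Insert into the target, smuggling the staging id through last_name
INSERT INTO Customer (last_name)
SELECT CAST(id AS VARCHAR(50)) FROM Staging;
-- 2. Copy the generated GUIDs back to staging by joining on the smuggled id
UPDATE s SET s.guid = c.GUID
FROM Staging s JOIN Customer c ON c.last_name = CAST(s.id AS VARCHAR(50));
-- 3. Replace the placeholder with the real data
UPDATE c SET c.last_name = s.last_name_real
FROM Customer c JOIN Staging s ON s.guid = c.GUID;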
To avoid the sneaky trick, and if this is only a one-time upload, add a column to your tables that contains the staging table id. Again, you can use this to get the GUID for inserting into related tables. Then, when you are done, drop the extra column.
You are aware that there are performance issues involved with using GUIDs? Make sure not to make them the clustered index (which, as the PK, they will be by default unless you specify differently), and use newsequentialid() to populate them. Why are you using GUIDs? If an identity would work, it is usually better to use it.
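If you do stay with GUIDs, a minimal sketch of a nonclustered PK populated by newsequentialid() (all names hypothetical):
CREATE TABLE Customer (
    CustomerGUID UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Customer_GUID DEFAULT NEWSEQUENTIALID(),
    CustomerName VARCHAR(100) NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED (CustomerGUID)
);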

switching from NewID to Sequential NewID for a default PK value

I have a production database with a few million rows, all using randomly generated GUIDs with a default value of NewID() as the primary key.
I am thinking of using sequential NEWIDs going forward.
How will SQL Server know while generating the GUIDs in sequence that it did not already create that GUID when it was randomly generating using NEWID()?
It sounds like you're considering using NEWSEQUENTIALID() as the new default value.
Don't worry about the scenario of duplicates. The PK constraint on that column guarantees that a collision isn't going to happen. You aren't guaranteed that a new sequential GUID will be 'higher' anyway:
Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique.
Some relevant information that may help you at Microsoft Connect: NEWSEQUENTIALID() is Not Sequential
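If you do make the switch, it is just a matter of replacing the default constraint; a sketch with hypothetical constraint, table, and column names (existing rows keep their random GUIDs, and the PK constraint continues to guard uniqueness):
ALTER TABLE MyTable DROP CONSTRAINT DF_MyTable_ID;
ALTER TABLE MyTable ADD CONSTRAINT DF_MyTable_ID
    DEFAULT NEWSEQUENTIALID() FOR ID;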

SERIAL-like INT column

I have an app where, depending on the type of transaction being added or updated, the ticket number may or may not increment. I can't use a SERIAL datatype for the ticket number because it would increment regardless of the transaction type, so I defined the ticket number as an INT. So in a multi-user environment, if user A is adding or updating a transaction and user B is doing the same, I test for the tran type and, if the next ticket number is required, then
LET ticket = (SELECT MAX(ticket) [WITH ADDLOCK or UPDLOCK?] FROM transactions) + 1
However, this has to be done exactly when the row is being committed or troubles will begin. Can you think of a better way of doing this with Informix, Oracle, MySQL, SQL Server, 4Js/Genero, or another RDBMS? This is one of the main factors that will determine which RDBMS I'm going to rewrite my app in.
With the Informix DBMS, the SERIAL column will not change after it is inserted; indeed, you cannot update a SERIAL value at all. You can insert a new one with either 0 as the value - in which case a new value is generated - or you can insert some other value. If the other value already exists and there is a unique constraint, that will fail; if it does not exist, or if there is no unique constraint on the serial column, then it will succeed. If the value inserted is larger than the largest value previously inserted, then the next number to be inserted will be one larger again. If the number inserted is smaller, or negative, then there is no effect on the next number.
So, you could do your update without changing the value - no problem. If you need to change the number, you will have to do a delete and insert (or insert and delete), where the insert has a zero in it. If you prefer consistency and you use transactions, you could always delete, and then (re)insert the row with the same number or with a zero to trigger a new number. This assumes you have a programming language running the SQL; I don't think you can tweak ISQL and Perform to do that automatically.
So, at this point, I don't see the problem on Informix.
With the appropriate version of IDS (anything that is supported), you can use SEQUENCE to control the values inserted too. This is based on the Oracle syntax and concept; DB2 also supports this. Other DBMS have other equivalent (but different) mechanisms for handling the auto-generated numbers.
That's what sequences were created for, and they are supported by most databases (MySQL being the only one that does not have sequences; not 100% sure about Informix, though).
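For example, with Oracle-style syntax (DB2 and recent Informix versions are similar; the sequence name and the tran_type value are hypothetical):
CREATE SEQUENCE ticket_seq START WITH 1 INCREMENT BY 1;
-- draw a ticket number only when the transaction type calls for one
INSERT INTO transactions (ticket, tran_type)
VALUES (ticket_seq.NEXTVAL, 'SALE');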
Any algorithm that relies on the SELECT MAX(id) anti-pattern is either dead slow or will simply not work correctly in a multi-user environment.
If you need to support MySQL as well, I'd recommend using the "native" auto-increment type in each database (serial for PostgreSQL, auto_increment for MySQL, identity for SQL Server, sequence + trigger in Oracle, and so on) and letting the driver return the generated ID value.
In JDBC there is a getGeneratedKeys() method and I'm sure other interfaces have something similar.
From your tags it's hard to tell what database you are using.
For SQL Server (since it's listed) I suggest
ticket_num = (SELECT MAX(ticket_number) FROM transactions WITH (UPDLOCK)) + 1
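In practice you would wrap that in a transaction; a sketch following the snippet above (the tran_type column is an assumption, and HOLDLOCK is added so the locked range is held until commit):
BEGIN TRANSACTION;
DECLARE @ticket_num INT;
-- UPDLOCK serializes readers of the current MAX; HOLDLOCK keeps the range
-- locked until the transaction ends, so no competing insert can sneak in
SELECT @ticket_num = COALESCE(MAX(ticket_number), 0) + 1
FROM transactions WITH (UPDLOCK, HOLDLOCK);
INSERT INTO transactions (ticket_number, tran_type)
VALUES (@ticket_num, 'SALE');
COMMIT TRANSACTION;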