MySQL to Postgres data migration

I'm in the process of migrating a Ruby on Rails application from MySQL to Postgres. Is there a recommended way to preserve the IDs of the surviving records, including the gaps left by deleted records, when moving the data over?
In testing, a dump-and-restore didn't seem to preserve those IDs.
Also, in the event that I manage to keep the records where they are, what will happen with the gaps in Postgres? Will those IDs be skipped over or reused?
Example
Say I have a user with an ID of 101 and I've deleted users up to 100. I need 101 to stay at 101.

So you don't want the IDs of existing records, where keys were generated, to be reassigned during the migration.
That should be the default in any sane migration. When you copy the data rows over, say by exporting from MySQL with SELECT ... INTO OUTFILE and importing into PostgreSQL with COPY tablename FROM 'filename.csv' WITH (FORMAT csv), the IDs won't change.
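For example, a minimal sketch of that round trip (the file path and column list here are illustrative, assuming a users table like the one below):
-- MySQL side: export the rows, IDs included
SELECT id, name
FROM users
INTO OUTFILE '/tmp/users.csv'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';

-- PostgreSQL side: load them with the original IDs intact
-- (NULL '\N' matches the marker MySQL writes for NULL values)
COPY users (id, name) FROM '/tmp/users.csv' WITH (FORMAT csv, NULL '\N');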
All you'll need to do afterwards is set the next ID to be generated by the sequence on the PostgreSQL table. So, say you have the table:
CREATE TABLE users
(
    id serial primary key,
    name text not null,
    ...
);
and you've just copied a user with id = 101 into it.
You'll now just assign a new value to the key generation sequence for the table, e.g.:
SELECT setval('users_id_seq', (SELECT max(id) FROM users));
(With setval's default is_called = true, the next nextval will return max(id) + 1, so there's no need to add 1 yourself.)
To learn more about sequences and key generation in PostgreSQL, see SERIAL in the numeric types documentation, the documentation for CREATE SEQUENCE, the docs for setval, etc. The default name for a key generation sequence is tablename_columnname_seq.
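A quick way to verify the result (note that nextval consumes the value it returns):
SELECT nextval('users_id_seq'); -- returns 102 if the highest copied id was 101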

Related

Mysql Auto Increment For Group Entries

I need to set up a table that will have two auto-increment fields. One field will be a standard primary key for each record added. The other field will be used to link multiple records together.
Here is an example.
field 1 | field 2
1       | 1
2       | 1
3       | 1
4       | 2
5       | 2
6       | 3
Notice that field 1 is a standard auto-increment. Field 2 increments a bit differently: records 1, 2 and 3 were made at the same time, records 4 and 5 were made at the same time, and record 6 was made individually.
Would it be best to read the last entry for field 2 and then increment it by one in my PHP program? Just looking for the best solution.
You should have two separate tables.
ItemsToBeInserted
id, batch_id, field, field, field
BatchesOfInserts
id, created_time, field, field, field
You would then create a batch record, and add the insert id for that batch to all of the items that are going to be part of the batch.
You get bonus points if you add a batch_hash field to the batches table and then check that each batch is unique so that you don't accidentally submit the same batch twice.
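A minimal sketch of that design using the table names above (column names beyond id, batch_id, created_time and batch_hash are placeholders):
CREATE TABLE BatchesOfInserts (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    created_time DATETIME NOT NULL,
    batch_hash CHAR(32) NULL UNIQUE -- optional guard against double submission
);

CREATE TABLE ItemsToBeInserted (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    batch_id INT UNSIGNED NOT NULL,
    payload VARCHAR(255),
    FOREIGN KEY (batch_id) REFERENCES BatchesOfInserts (id)
);

-- one batch row per group, then reuse its id for all items in the group:
INSERT INTO BatchesOfInserts (created_time) VALUES (NOW());
SET @batch_id = LAST_INSERT_ID();
INSERT INTO ItemsToBeInserted (batch_id, payload)
VALUES (@batch_id, 'a'), (@batch_id, 'b'), (@batch_id, 'c');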
If you are looking for a more awful way to do it that only uses one table, you could do something like:
$batch = ...; // result of running: SELECT MAX(BATCH_ID) + 1 AS NEW_BATCH_ID FROM myTable
and add that id to all of the inserted records. I wouldn't recommend that though. You will run into trouble down the line.
MySQL only offers one auto-increment column per table. You can't define two, nor does it make sense to do that.
Your question doesn't say what logic you want to use to control the incrementing of the second field you've called auto-increment. Presumably your PHP program will drive that logic.
Don't use PHP to query the largest ID number, then increment it and use it. If you do your system is vulnerable to race conditions. That is, if more than one instance of your PHP program tries that simultaneously, they will occasionally get the same number by mistake.
The Oracle DBMS has an object called a sequence which gives back guaranteed-unique numbers. But you're using MySQL. You can obtain unique numbers with a programming pattern like the following.
First create a table for the sequence. It has an auto-increment field and nothing else.
CREATE TABLE sequence (
    `sequence_id` INT NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (`sequence_id`)
);
Then when you need a unique number in your program, issue these three queries one after the other:
INSERT INTO sequence () VALUES ();
DELETE FROM sequence WHERE sequence_id < LAST_INSERT_ID();
SELECT LAST_INSERT_ID() AS sequence;
The third query is guaranteed to return a unique sequence number. This guarantee holds even if you have dozens of different client programs connected to your database. That's the beauty of AUTO_INCREMENT.
The second query (DELETE) keeps the table from getting big and wasting space. We don't care about any rows in the table except for the most recent one.
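Putting it together for the question's example, every row created together shares one number from this pattern (a sketch; myTable and its columns stand in for the real table):
INSERT INTO sequence () VALUES ();
DELETE FROM sequence WHERE sequence_id < LAST_INSERT_ID();
SET @group_id = LAST_INSERT_ID();
-- field 2 gets the same value for all records made at the same time
INSERT INTO myTable (field2) VALUES (@group_id), (@group_id), (@group_id);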

How to achieve unique auto-incremented id for rows across multiple tables?

I've got several tables in a database, let's say they're called table1, table2, etc.
All tables have a primary key column 'id' with auto-increment.
In my current configuration it happens that when inserting into table1, the generated id is 1.
Afterwards when inserting into table2, the generated id happens to be 1 as well.
How can I force absolutely unique ids across all tables in a database? That is, when inserting into table1 the generated id should be 1, and when afterwards inserting into table2 the generated id should be 2.
I used mysql server on some machine and did not have this problem, but when I installed mysql on my local machine, it started to occur. So I guess it must be some kind of a setting that is applied to the mysql configuration?
Thank you
You can use UUID().
INSERT INTO mytable(id, name) VALUES(UUID(), 'some data');
Read more about UUID: http://mysqlbackupnet.codeplex.com/wikipage?title=Using%20MySQL%20With%20GUID%20or%20UUID
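Note that the id column then has to be a string type rather than an integer, since UUID() returns a 36-character string, e.g. (a sketch):
CREATE TABLE mytable (
    id CHAR(36) NOT NULL PRIMARY KEY,
    name VARCHAR(100)
);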
You can create a SEQUENCE, which can be used globally (in PostgreSQL):
CREATE SEQUENCE serial
    INCREMENT 1
    MINVALUE 0
    MAXVALUE 1000
    START 364
    CACHE 1;
Edit: Sequences are supported in Postgres but not in MySQL; there the same effect can be achieved by maintaining a counter via AUTO_INCREMENT and reading it back with LAST_INSERT_ID().
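A sketch of that MySQL emulation, using the atomic LAST_INSERT_ID(expr) form (the counter table name is made up):
CREATE TABLE global_id_counter (
    id BIGINT UNSIGNED NOT NULL
);
INSERT INTO global_id_counter VALUES (0);

-- before each insert, claim the next id atomically:
UPDATE global_id_counter SET id = LAST_INSERT_ID(id + 1);
SELECT LAST_INSERT_ID(); -- unique across every table that uses this counter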

MySQL tables relationship and the use of md5 hash

I have a MySQL DB with 2 tables:
sample_name (stores the name of a file; multiple names can exist for the same sample_hash);
sample_hash (stores the hash of a file; will not store duplicate md5s);
(all tables have an id int unsigned NOT NULL auto_increment)
My first option for relating these two tables is to create an md5 column in both tables and relate them through it. However this seems to have a downside, as I would be duplicating a varchar(32), which can be a waste of space with millions of records.
My second option is to calculate the file hashes first, grab the mysql_insert_id() of the sample_hash table and insert it into the sample_name table. This works if the hash in the sample_hash table is new, since then I have the mysql_insert_id() value at my disposal.
But if the hash already exists in the samples_db, I don't want to store the hash again, so I will have no mysql_insert_id().
Is there an alternative other than searching for the id of a given md5 in order to store it in the samples_name table in case the md5 already exists? If so, how can I do that?
From the requirements that you describe, there is no need for the sample_hash table at all.
You can keep the hashes in the sample_name table and do all your lookups of hash values in that table.
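A sketch of that single-table layout (the name column is illustrative); the md5 column is deliberately not unique, since several file names can share one hash:
CREATE TABLE sample_name (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    md5 CHAR(32) NOT NULL,
    INDEX (md5) -- non-unique index to keep hash lookups fast
);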

Is there an extant implementation of a reverse "AUTO_INCREMENT" in either PostgreSQL or MySQL?

Without having to do it manually (which I'm open to implementing if no other options exist), is there a way in either PostgreSQL or MySQL to have an automatic counter/field that decrements instead of increments?
For a variety of reasons in a current application, it would be nice to know how many more entries (from a datatype point of view) can still be added to a table just by looking at the most-recently-added record, rather than subtracting the most recent ID from the max for the datatype.
So, is there an "AUTO_DECREMENT" or similar for either system?
You have to do a bit of manual configuration in PostgreSQL, but you can set up a sequence like this:
create sequence example_seq
    increment by -1
    minvalue 1
    maxvalue 5
    start with 5;

create table example(
    example_id int primary key default nextval('example_seq'),
    data text not null
);
alter sequence example_seq owned by example.example_id;
I suppose it would be equivalent to create the table with a serial column and then alter the auto-generated sequence.
Now if I insert some rows, I get example_id counting down from 5. If I try to insert more than 5 rows, I get the error: nextval: reached minimum value of sequence "example_seq" (1).
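For example, continuing the sketch above:
INSERT INTO example(data) VALUES ('first');  -- example_id = 5
INSERT INTO example(data) VALUES ('second'); -- example_id = 4
INSERT INTO example(data) VALUES ('third');  -- example_id = 3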

MySql unique id for several records

I use one table with some casual columns such as id, name, email, etc. I'm also inserting a variable number of records in each transaction. To be more efficient I need one unique id, let's call it a transaction id, that is the same for each group of records inserted in one transaction, and it should increment. What is the best approach for doing that?
I thought about using
select max(transaction_id) from users
and incrementing that value on the server side, but that seems like an old-fashioned solution.
You could have another table usergroups with an auto-incrementing primary key. You first insert a record there (maybe including some other useful information about the group), then get the group's unique id generated during this last insert using mysql_insert_id(), and use that as the groupid for your inserts into the first table.
This way you're still using MySQL's auto-numbering, which guarantees you a unique groupid. Doing select max(transaction_id) from users and incrementing this isn't safe, since it's non-atomic (another thread may have read the same max(transaction_id) before you've had a chance to increment it, and will start inserting records with a conflicting groupid).
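A sketch of that pattern in SQL (the created_at column is a placeholder for whatever per-group information you keep):
CREATE TABLE usergroups (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    created_at DATETIME NOT NULL
);

-- per transaction:
INSERT INTO usergroups (created_at) VALUES (NOW());
SET @group_id = LAST_INSERT_ID(); -- what mysql_insert_id() returns in PHP
INSERT INTO users (transaction_id, name, email)
VALUES (@group_id, 'a', 'a@example.com'),
       (@group_id, 'b', 'b@example.com');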
Add new table with auto_increment column
You can create new table with auto_increment column. So you'll be able to generate unique integers in thread safe way. It'll work like this:
DB::insert_into_transaction_table()
transaction_id = DB::mysql_last_insert_id() ## this is an integer value
for each record:
    DB::insert_into_table(transaction_id, ...other parameters...)
And you don't require mysql transactions for this.
Generate unique string on server side before inserting
You can generate unique id (for example GUID) on server side and use it for all records inserting. But your transaction_id field should be long enough to store values generated this way (some char(...) type). It'll work like this:
transaction_id = new_GUID() ## this is usually a string value
for each record:
    DB::insert_into_table(transaction_id, ...other parameters...)
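In MySQL itself the same idea could look like this (a sketch; transaction_id would need to be CHAR(36) to hold a UUID):
SET @transaction_id = UUID();
INSERT INTO users (transaction_id, name, email)
VALUES (@transaction_id, 'a', 'a@example.com'),
       (@transaction_id, 'b', 'b@example.com');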