MySQL - Migrating some ID numbers over from randomly generated to autoincremental - mysql

I am in the process of rewriting a company's entire system. The original developer was a bit silly and generated ID numbers for each customer report randomly in his database. Each ID number is up to 7 digits long - but could be anything.
I am migrating over all his old data to our new, far more logically structured database. I obviously want to use a MySQL auto-increment for our ID field. However, it's vital that we keep the old ID numbers as customers still phone up each day with those to reference against.
Ideally, the perfect scenario would be we go live December 1st - everything up to December 1st is all randomly IDed, and from December 1st onwards they automatically increment starting at the highest random ID in the old database.
Is such a thing possible with MySQL without any issues? I am currently using two columns - one, our logical autoincrementing ID, and a second column called old_id which was being used during migration. But we need the call centre staff to only be using one ID or mass confusion will ensue.
Thanks!

If you start numbering from the highest random value, just changing the field to autoincrement should be enough. The normal behaviour is that MySQL won't change IDs that are already set, and it starts numbering from the highest value + 1.
If you want to start from a specific value (say 10,000,000) you can set it explicitly:
ALTER TABLE theTableInQuestion AUTO_INCREMENT = 10000000;
Of course, be sure to create backups and test, but it should not pose any problems at all. (Note that the old records will be stored in order of the id-field, which is random, and won't reflect the creation order.)
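A minimal sketch of the first approach, assuming a hypothetical table named report whose existing id column is already the primary key and holds the old random values. The table name, column names and sample rows are illustrative only and do not come from the question:

```sql
-- Hypothetical table holding the migrated rows with their old random IDs.
CREATE TABLE report (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    created_at DATETIME NOT NULL
) ENGINE=InnoDB;

INSERT INTO report (id, created_at) VALUES
    (4821733, '2011-03-14 09:30:00'),
    (17,      '2011-06-02 16:05:00'),
    (9999012, '2011-11-28 11:45:00');

-- Turn the existing column into an auto-increment column; existing IDs are kept.
ALTER TABLE report MODIFY id INT UNSIGNED NOT NULL AUTO_INCREMENT;

-- A row inserted without an id should now be numbered above the old maximum.
INSERT INTO report (created_at) VALUES (NOW());

-- Inspect the result: the old IDs are unchanged and the new row sorts last.
SELECT id, created_at FROM report ORDER BY id;
```

Running this against a copy of the migrated data is a cheap way to confirm the behaviour on your own MySQL version before go-live.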

As you need to keep the old IDs, I'm going to assume that you're going to create a new column for autoincrement ID that will become your primary key but keep the existing ID column and rename it (to old_id, maybe?). I'm also going to assume you record when a customer signed up.
If you make your old ID column nullable (allow NULL as a valid value) then you can simply check whether or not the old ID column is NULL. If it's not NULL then treat that as the ID, otherwise use the autoincrement column.
Finding a customer:
-- @lookup_id and @cutover_date are placeholders: SET them (or bind parameters) first.
SELECT *
FROM customer
WHERE (id = @lookup_id AND reg_date >= @cutover_date)    -- new, auto-incremented IDs
OR (id_old = @lookup_id AND reg_date < @cutover_date);   -- old, randomly generated IDs
This will occasionally return 2 rows so you'll have to use some other criteria to uniquely identify the customer in question.
As for associating an old customer with other tables in the database, you can always use the new ID internally throughout the entire DB once it's generated. You will have to update tables that are using the old ID as the foreign key, obviously.
UPDATE target_table
JOIN customer ON target_table.cust_id = customer.id_old
SET target_table.cust_id = customer.id;
(Note: The above is just a quick and dirty query that hasn't been tested! I'd suggest testing on a copy of the database before you try it for real!)

Related

How can I select max id in MySQL in the fastest way considering time complexity

What would be the best way to find the biggest ID in MySQL?
I am working on an eCommerce website and I need to find the maximum ID.
Given the large table size and how frequently the web application hits the database, I would like to know more about how MySQL finds the biggest ID when MAX() is used.
The only two methods I know are:
Sorting and taking the first row
MAX(id)
Databases are good at data. MySQL, correctly indexed, is no exception.
SELECT MAX(id)
FROM tablename
So keep it simple.
This will read backwards through an id-based index to find the maximum number.
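As a rough sketch, the two approaches from the question can be written side by side and checked with EXPLAIN. The table definition here is an assumption for illustration; what matters is that id is indexed (here via the primary key):

```sql
-- Hypothetical table; the PRIMARY KEY gives MySQL an index on id.
CREATE TABLE tablename (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    placed_at DATETIME NOT NULL
);

-- Option 1: the aggregate.
SELECT MAX(id) FROM tablename;

-- Option 2: sort descending and keep only the first row.
SELECT id FROM tablename ORDER BY id DESC LIMIT 1;

-- EXPLAIN shows how each query is resolved on your own data and MySQL version.
EXPLAIN SELECT MAX(id) FROM tablename;
EXPLAIN SELECT id FROM tablename ORDER BY id DESC LIMIT 1;
```

With an index on id, neither query should need to read the whole table; the EXPLAIN output is the place to confirm that for a given setup.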
What about creating a persistent variable, so that every time a new record is added to the table the max_tableA_id variable gets updated and is always within easy reach?
Alternatively, you could create a simple table with two columns...
table name and current max id
and then update the appropriate record each time a new record is added to the table.
Now all you need is a simple query to get the current max id for a given table.
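One way the two-column table could look, as a sketch only. The names table_max_id, tableA and trg_tableA_max_id are hypothetical, and the trigger is just one possible way to keep the row up to date:

```sql
-- Hypothetical source table.
CREATE TABLE tableA (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    payload VARCHAR(64) NOT NULL
);

-- One row per tracked table: its name and the highest id seen so far.
CREATE TABLE table_max_id (
    table_name VARCHAR(64) NOT NULL PRIMARY KEY,
    max_id BIGINT UNSIGNED NOT NULL
);

-- Keep the tracking row current on every insert into tableA.
CREATE TRIGGER trg_tableA_max_id AFTER INSERT ON tableA
FOR EACH ROW
    INSERT INTO table_max_id (table_name, max_id)
    VALUES ('tableA', NEW.id)
    ON DUPLICATE KEY UPDATE max_id = GREATEST(max_id, NEW.id);

INSERT INTO tableA (payload) VALUES ('first'), ('second');

-- The "simple query" for the current maximum of a given table.
SELECT max_id FROM table_max_id WHERE table_name = 'tableA';
```

Note that this adds a write to every insert, so it trades insert cost for a lookup that an indexed MAX(id) may already answer cheaply.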

MySQL history table design and query

TL;DR: Is this design correct and how should I query it?
Let's say we have history tables for city and address designed like this:
CREATE TABLE city_history (
    id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    name VARCHAR(128) NOT NULL,
    history_at DATETIME NOT NULL,
    obj_id INT UNSIGNED NOT NULL
);
CREATE TABLE address_history (
    id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    city_id INT NULL,
    building_no VARCHAR(10) NULL,
    history_at DATETIME NOT NULL,
    obj_id INT UNSIGNED NOT NULL
);
Original tables are pretty much the same except for history_id and obj_id (city: id, name; address: id, city_id, building_no). There's also a foreign key relation between city and address (city_id).
History tables are populated on every change of the original entry (create, update, delete) with the exact state of the entry at given time.
obj_id holds id of original object - no foreign key, because original entry can be deleted and history entries can't. history_at is the time of creation of history entry.
History entries are created for every table independently - change in city name creates city_history entry but does not create address_history entry.
So to see what was the state of the whole address with city (e.g. on printed documents) at any T1 point in time, we take from both history tables most recent entries for given obj_id created before T1, right?
With this design, in theory, we should be able to see the state of a single address with its city at any given point in time. Could anyone help me create such a query for a given address id and time? Please note that there could be multiple records with the exact same timestamp.
There is also a need to create a report for showing every change of state of given address in given time period with entries like "city_name, building_no, changed_at". Is it something that can be created with SQL query? Performance doesn't matter here so much, such reports won't be generated so often.
The above report will probably be needed in an interactive version where user can filter results e.g. by city name or building number. Is it still possible to do in SQL?
In reality address table and address_history table have 4 more foreign keys that should be joined in report (street, zip code, etc.). Wouldn't the query be like ten pages long to provide all the needed functionality?
I've tried to build some queries, play with greatest-n-per-group, but I don't think I'm getting anywhere with this. Is this design really OK for my use cases (if so, can you please provide some queries for me to play with to get where I want?)? Or should I rethink the whole design?
Any help appreciated.
(My answer copied from here, since that question never marked an answer as accepted.)
My normal "pattern" in (very)pseudo code:
Table A: a_id (PK), a_stuff
Table A_history: a_history_id (PK), a_id(FK referencing A.a_id), valid_from, valid_to, a_stuff
Triggers on A:
On insert: insert values into A_history with valid_from = now, and valid_to = null.
On update: set valid_to = now for last history record of a_id; and do the same insert from the "on insert" trigger with the updated values for the row.
On delete: set valid_to = now for last history record of a_id.
In this scenario, you'd query history with "x >= from and x < to" (not BETWEEN, as a previous record's "to" value should match the next record's "from" value).
Additionally, this pattern also makes "change log" reports easier.
Without a table dedicated to change logging, the relevant records can be found just by SELECT * FROM A_history WHERE valid_from BETWEEN [reporting interval] OR valid_to BETWEEN [reporting interval].
If there is a central change log table, the triggers can just be modified to include log entry inserts as well. (Unless log entries include "meta" data such as reason for change, who changed, etc... obviously).
Note: This pattern can be implemented without triggers. Using a stored procedure, or even just multiple queries in code, can actually negate the need for the non-history table.
The history table's "a_id" would need to be replaced with whatever uniquely identifies the record normally though; it could still be an id value, but these values would need to be synthesized when inserting, and known when updating/deleting.
Queries:
(if not new) UPDATE the most recent entry's valid_to.
(if not deleting) INSERT new entry
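The trigger-free variant of the pattern above might be sketched like this. The column types, the index, the sample a_id of 42 and the sample timestamp are assumptions made for the example; only the column names come from the pseudo code:

```sql
-- History table following the pattern: one row per validity interval.
CREATE TABLE A_history (
    a_history_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    a_id INT UNSIGNED NOT NULL,
    a_stuff VARCHAR(128) NOT NULL,
    valid_from DATETIME NOT NULL,
    valid_to DATETIME NULL,            -- NULL marks the currently valid row
    KEY idx_a_valid (a_id, valid_from)
) ENGINE=InnoDB;

-- Recording a change to record 42: close the open interval, then open a new one.
START TRANSACTION;
SET @now = NOW();

-- (if not new) UPDATE the most recent entry's valid_to.
UPDATE A_history
SET valid_to = @now
WHERE a_id = 42 AND valid_to IS NULL;

-- (if not deleting) INSERT the new entry, starting where the old one ended.
INSERT INTO A_history (a_id, a_stuff, valid_from, valid_to)
VALUES (42, 'new value', @now, NULL);
COMMIT;

-- State of record 42 at a point in time; a NULL valid_to is treated as open-ended.
SET @t1 = '2024-01-15 12:00:00';
SELECT a_stuff
FROM A_history
WHERE a_id = 42
  AND valid_from <= @t1
  AND (valid_to IS NULL OR @t1 < valid_to);
```

Using the same @now for both statements is what makes one row's valid_to line up exactly with the next row's valid_from, which is why the half-open comparison returns at most one row per record.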
This is a very "traditional" problem when it comes to versioning (or monitoring) changes to a certain row.
There are various "solutions", each with its own drawbacks and advantages.
The following "statements" are the result of my experience; they are neither perfect, nor do I claim they are the "only ones"!
1.) Creating a "history table": That's the worst idea of all. You would always need to take into account which table you need to query, depending on the DATA that should be queried. That's a "chicken-and-egg" problem...
2.) Using ONE table with ONE (increasing) "revision" number: That's a better approach, but it will get "hard" to query: determining the "most recent row" per "id" is very costly no matter which approach is used.
My personal experience is that following the pattern of a "doubly linked list" is the best way to solve this when it comes to millions of records:
3.) Maintain two columns among every entity, let's say prev_version_id and next_version_id. prev_version_id points to NULL, if there is no previous version. next_version_id points to NULL if there is no later version.
This approach would require you to ALWAYS perform two actions upon an update:
Create the new row
Update the old row's reference (next_version_id) to the just inserted row.
However, when your database has grown to something like 100 million rows, you will be very happy that you have chosen this path:
Querying the "Oldest" Version is as simple as querying where ISNULL(prev_version_id) and entity_id = 5
Querying the "Latest" Version is as simple as querying where ISNULL(next_version_id) and entity_id = 5
Getting a full version history will just target the entity_id=5 of the data-table, sortable by either prev_version_id or next_version_id.
The very often neglected fact: The first two queries will also work to get a list of ALL first versions or of ALL most recent versions of an entity - in about NO TIME! (Don't underestimate how "costly" it can be to determine the most recent version of an entity otherwise! Believe me, when "testing", everything seems equally fine, but the real struggle starts when live data with millions of records is used.)
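A sketch of the linked-version layout described in point 3. The table name entity_version, the payload column and the indexes are hypothetical; prev_version_id, next_version_id and entity_id are the names used in the answer:

```sql
-- Every version is its own row, linked to its neighbours.
CREATE TABLE entity_version (
    version_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    entity_id INT UNSIGNED NOT NULL,
    payload VARCHAR(255) NOT NULL,
    prev_version_id BIGINT UNSIGNED NULL,   -- NULL: this is the first version
    next_version_id BIGINT UNSIGNED NULL,   -- NULL: this is the latest version
    KEY idx_entity_next (entity_id, next_version_id),
    KEY idx_entity_prev (entity_id, prev_version_id)
) ENGINE=InnoDB;

-- The two actions required on every update of entity 5, done atomically.
START TRANSACTION;
SET @latest = NULL;
SELECT version_id INTO @latest
FROM entity_version
WHERE entity_id = 5 AND next_version_id IS NULL
FOR UPDATE;

-- 1. Create the new row, pointing back at the previous latest version (if any).
INSERT INTO entity_version (entity_id, payload, prev_version_id)
VALUES (5, 'updated payload', @latest);

-- 2. Point the old row forward at the row that was just inserted.
UPDATE entity_version
SET next_version_id = LAST_INSERT_ID()
WHERE version_id = @latest;
COMMIT;

-- Oldest and latest version of entity 5.
SELECT * FROM entity_version WHERE entity_id = 5 AND prev_version_id IS NULL;
SELECT * FROM entity_version WHERE entity_id = 5 AND next_version_id IS NULL;

-- Latest version of ALL entities in one pass.
SELECT * FROM entity_version WHERE next_version_id IS NULL;
```

When entity 5 has no rows yet, @latest stays NULL, so the first version is inserted with a NULL prev_version_id and the UPDATE matches nothing.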
cheers,
dognose

Database design: Managing old and new data in database table

I have a table Student with fields as follows,
Student table (one record per student)
student_id
Name
Parent_Name
Address_line1, Address_line2, Address_line3
Photo_path
Signature_file_path
Preferred_examcity_choice1, Preferred_examcity_choice2, Preferred_examcity_choice3
Gender
Nationality
.
.
.
I am inserting into this table on Registration form completion through the web interface.
Now there is one more module in the web interface for updating the student data. On every update request I update the student table record and insert a new entry in student_data_change_request. A student can change records any number of times.
student_data_change_request
request_id(auto_incr PK)
old_name
new_name
old_photo_path
new_photo_path
old_signature_file_path
new_signature_file_path
Now coming to the problem: earlier, students were allowed to change very few fields. Now the client wants to allow the candidate to update more fields (around 20), and adding old and new columns for each corresponding column isn't elegant or preferable (I guess); I would end up creating 40 columns to keep track of 20 columns. So how should I redesign my table? Suggestions are welcome.
One approach is to have a shadow table named (table)_xx that has the same columns, the time, date, update/insert/delete flag, user or whatever and no referential integrity. Set a trigger to update that table from the source whenever anything happens.
If you've got genuine business requirements that need history then do those properly but this pattern is great as a general audit, debugging and forensic tool.
It's also really easy to automate/script as you just generate it from the DB metadata.
Usually a history table looks like:
request_id
column_name
old_value
new_value
dt
request_id and column_name together form the primary key. When you update the student table, you insert a new entry in student_data_change_request for each column being updated.
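A sketch of that one-row-per-changed-column layout. The column types, the student_id column and the sample values are assumptions added for the example; the remaining column names are the ones listed above:

```sql
-- One row per changed column; a single request can touch several columns.
CREATE TABLE student_data_change_request (
    request_id INT UNSIGNED NOT NULL,
    student_id INT UNSIGNED NOT NULL,     -- assumed: links the change to a student
    column_name VARCHAR(64) NOT NULL,
    old_value TEXT NULL,
    new_value TEXT NULL,
    dt DATETIME NOT NULL,
    PRIMARY KEY (request_id, column_name)
);

-- Request 1 changes two fields of student 1001, so it produces two rows.
INSERT INTO student_data_change_request
    (request_id, student_id, column_name, old_value, new_value, dt)
VALUES
    (1, 1001, 'name',       'Jon Smith',      'John Smith',     NOW()),
    (1, 1001, 'photo_path', '/photos/old.jpg', '/photos/new.jpg', NOW());

-- Full change history of one field for one student.
SELECT dt, old_value, new_value
FROM student_data_change_request
WHERE student_id = 1001 AND column_name = 'name'
ORDER BY dt;
```

Allowing more editable fields then needs no schema change; the trade-off is that every value is stored as text.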
Edited:
Another way:
request_id
value_type
name
photo_path
signature_file_path
...
and insert a first entry with the old values and a second entry with the new values. The value_type column marks whether the entry is old or new.
I would rather have just one table, with an additional column for effective date. Then a view that picks up just the most recent row for each student_id becomes your first "table". If for some reason you must show "current" and "most recently changed" values side-by-side, that is another view.
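The single-table idea could be sketched as follows. The names student_version, effective_at and student_current are hypothetical, and only a few of the student columns are shown:

```sql
-- Every change inserts a new row instead of overwriting the old one.
CREATE TABLE student_version (
    student_id INT UNSIGNED NOT NULL,
    effective_at DATETIME NOT NULL,
    name VARCHAR(128) NOT NULL,
    photo_path VARCHAR(255) NULL,
    signature_file_path VARCHAR(255) NULL,
    PRIMARY KEY (student_id, effective_at)
);

-- The view that plays the role of the original "one record per student" table.
CREATE VIEW student_current AS
SELECT sv.*
FROM student_version sv
WHERE sv.effective_at = (
    SELECT MAX(s2.effective_at)
    FROM student_version s2
    WHERE s2.student_id = sv.student_id
);

-- What did student 1001's record look like on a given date?
SELECT *
FROM student_version
WHERE student_id = 1001 AND effective_at <= '2016-01-01 00:00:00'
ORDER BY effective_at DESC
LIMIT 1;
```

The composite primary key assumes a student never has two changes with the same timestamp; a sequence column would be needed otherwise.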
As usual, it all depends on how you intend to use the data.
My strong preference in these cases is the solution @mathguy suggests - embedding the concept of time in the main table design. This allows you to ask the question "what was this student's address on 1 Jan?", or "who had signature x on 12 Feb?".
If you have to report or execute business logic that reflects the status at any point in time, this design works really well. For instance, if you have to report on how many students lived in a particular address for a given term, you want to know when the records were valid.
But not all applications care about "time" - sometimes, you just want to have an audit table, so you can trace what happened over time in case of anomalies.
In that case, @loztinspace's solution is useful - but in my experience, this rapidly escalates into more work, because those who want to inspect the audit records cannot or should not get access to a SQL prompt on your production environment.

auto_increment to a lower unused number with mysql

I have an old website and a new website... the old website had 4500 orders placed on it, tracked by a table with a primary key for the order id.
When the new website was launched, it was launched before migrating old orders into it. To accomplish this, the auto_increment value on the new orders table was set to 5000 so any new order placed would not collide with an old id.
This allows orders to continue being placed on the new website, all is well...
Now I'd like to run my import script to bring in the old orders into the new website.
Is it possible to temporarily lower the auto_increment value on the new orders table to my desired order id?
Disclaimer: This is a migration from a Drupal 5 Ubercart based site, to a Drupal 7 Commerce based site, so I do not (easily) have control over the complex queries involved in assembling the new orders, and cannot simply (AFAIK) supply an order id when assembling an order, because the system always refers to the next available primary key value in the table when creating an order. I can easily take the site offline to run the script, so nothing gets out of sync.
For importing the "old" orders you don't need to rely on the autoincrement id-- they already have ids, and you probably want to keep those!
Modify your import script to insert the complete old records into the new table, id and all! As long as the ids don't collide, it shouldn't be a problem.
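As a sketch of what such an import amounts to at the SQL level, assuming a hypothetical, much simplified orders table (the real Drupal Commerce schema is considerably more involved):

```sql
-- New site's table, started at 5000 so new orders stay clear of the old range.
CREATE TABLE orders (
    order_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    placed_at DATETIME NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT = 5000;

-- An order placed on the new site: no id supplied, so the counter is used.
INSERT INTO orders (placed_at) VALUES ('2012-03-01 10:00:00');

-- Imported old orders: the id is supplied explicitly, so the counter is not consulted.
INSERT INTO orders (order_id, placed_at) VALUES
    (1,    '2009-05-10 08:15:00'),
    (2,    '2009-05-11 13:40:00'),
    (4500, '2012-02-20 17:05:00');

-- Later orders on the new site keep counting from where the counter already was.
INSERT INTO orders (placed_at) VALUES (NOW());

SELECT order_id, placed_at FROM orders ORDER BY order_id;
```

Because the explicit ids are all below the current counter, the import leaves the AUTO_INCREMENT value untouched.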
You can always run ALTER TABLE table_name AUTO_INCREMENT = 1; (or whatever number you want).
The question is: do you WANT to?
If you have any records already in the database, it's probably best to ensure the next auto increment value is larger than the maximum already in your database.

MySQL: Custom Auto-Generated Key (AUTO_INCREMENT / multiple-column index)

MySQL is used as database.
As part of an inventory system, I need to generate a stock number, that is, a unique identifier for an asset. The client requires that this number is not just an autoincremented integer, but follows the pattern:
#BusinessUnit#YYYY#Number
, where
#BusinessUnit = string representing business unit;
#YYYY = current year;
#Number=n = Unique number for this BusinessUnit & This Year: n-th item asset registered in system this year and ready for sale.
For example, let's say we have various users entering assets for 2 Business Units = {NY, CA}. Stock numbers would be expected as follows:
NY201100001
NY201100002
CA201100001
NY201100003
CA201100002
So far, based on the manuals available, my first thought would be to use AUTO_INCREMENT and have a separate table for each business unit, with a trigger on insert: after the insert, take the numeric auto-generated ID, put the business unit and year in front of it, and update the inventory table (which contains the assets of all business units) with the resulting ID.
Also, as the first thing in the new year, reset AUTO_INCREMENT = 0 by altering all the tables.
Is there any better way that avoids the need to create multiple tables? Can I somehow just create a multi-column index? If yes, could you please provide an appropriate sample table definition?
DISREGARD THE CLIENT (partially).
Create your tables with auto-increment "InventoryID" for purposes of guaranteeing simple, disconnected context to anything else on the record. Create a SECOND column that is their "InventoryIDUnit" which can be a "candidate" key matching the business rules you are responsible for keeping. When a search is done on the "InventoryIDUnit" (specifically formatted field), internally and through the rest of your system, you'll have the INTERNAL numeric for joining the rest of the way down through the system.
Think of a customer order system. If you had it based on a person's name, how many "Jane Doe" versions out there... are they the same or not... Internally, customers have an ID and all orders go back to that common ID. Then, Jane gets married and is now "Jane Smith"... Are you going to go back through the data and rename all the entries to the new name??? That's the whole purpose of a surrogate key.
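A sketch of how the two keys could sit side by side. The column names InventoryID and InventoryIDUnit follow the answer; the stock_counter table and the way the per-unit, per-year number is generated are assumptions added here, not something the answer prescribes:

```sql
-- Surrogate key for internal joins, formatted stock number as a candidate key.
CREATE TABLE inventory (
    InventoryID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    InventoryIDUnit VARCHAR(16) NOT NULL,
    description VARCHAR(255) NOT NULL,
    UNIQUE KEY uq_inventory_unit (InventoryIDUnit)
) ENGINE=InnoDB;

-- Assumed helper: one counter row per business unit and year.
CREATE TABLE stock_counter (
    business_unit CHAR(2) NOT NULL,
    yr SMALLINT UNSIGNED NOT NULL,
    last_number INT UNSIGNED NOT NULL,
    PRIMARY KEY (business_unit, yr)
) ENGINE=InnoDB;

-- Registering one NY asset in 2011.
START TRANSACTION;

-- Create the counter at 1, or bump it; the row stays locked until COMMIT.
INSERT INTO stock_counter (business_unit, yr, last_number)
VALUES ('NY', 2011, 1)
ON DUPLICATE KEY UPDATE last_number = last_number + 1;

SELECT last_number INTO @n
FROM stock_counter
WHERE business_unit = 'NY' AND yr = 2011;

-- Build the formatted number, e.g. NY201100001 for the first NY asset of 2011.
INSERT INTO inventory (InventoryIDUnit, description)
VALUES (CONCAT('NY', 2011, LPAD(@n, 5, '0')), 'Example asset');
COMMIT;

-- Searches use the formatted number; everything else joins on InventoryID.
SELECT InventoryID FROM inventory WHERE InventoryIDUnit = 'NY201100001';
```

A new year simply gets a new counter row, so nothing needs to be reset, and the unique key guards against two assets ever sharing a stock number.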