Moving from UUID to auto-increment keys - MySQL

I have a huge MySQL database with around 400 tables. This database is generated by a CRM, and we are moving to maintaining our own MySQL database.
The primary keys throughout the schema are generated by MySQL's UUID() function. Now, I do not want to continue using UUIDs, for some obvious reasons:
Too large to store (a 36-character string versus a 4- or 8-byte integer)
Inserts are slow, because the randomness of UUIDs scatters writes across the BTree (pages become fragmented in memory)
Indexing is affected; lookups are obviously not as fast as with auto-increment ints
But its benefit is that it's unique, which auto-increment ints guarantee too.
All the data in this schema is related through IDs (though not enforced by foreign key constraints). For example, the ID of a row in one table is stored in a cross-reference table for a many-to-many relationship.
I want to change the IDs from UUIDs to auto-increments, while consistently carrying the new auto-incremented keys through all of the related data. I do not want to mess up my current data. Is there an easy way to achieve this?
We are using the InnoDB engine.
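For illustration, a minimal sketch of one common way such a migration is done, using a hypothetical customer table (UUID column named id) and a customer_tag cross-reference table; all names here are invented, and this should be tested on a copy first:

-- 1) add an auto-increment id alongside the UUID; MySQL backfills 1..N
ALTER TABLE customer
    ADD COLUMN new_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    ADD UNIQUE KEY uk_new_id (new_id);

-- 2) remap each referencing table through the old UUID
ALTER TABLE customer_tag ADD COLUMN new_customer_id INT UNSIGNED;
UPDATE customer_tag ct
JOIN customer c ON c.id = ct.customer_id   -- c.id is the old UUID
SET ct.new_customer_id = c.new_id;

-- 3) once every reference is remapped and verified, swap the keys
--    (optionally rename the new_* columns afterwards)
ALTER TABLE customer
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (new_id),
    DROP KEY uk_new_id,
    DROP COLUMN id;
ALTER TABLE customer_tag DROP COLUMN customer_id;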
Thanks.

Related

In MySQL/MariaDB, are indexes stored at the database level or the table level?

I'm in the process of moving a SQL Server database to MariaDB.
As part of that I'm now going through the index naming, and have to modify some names because they are longer than 64 characters.
That got me wondering: in MariaDB, are indexes stored at the table level, or at the database level as in SQL Server?
To rephrase the question another way: do index names need to be unique per database or per table?
The storage engine I'm using is InnoDB.
Index names (in MySQL) are almost useless. About the only use is for DROP INDEX, which is rarely done. So, I recommend spending very little time on naming indexes. The names only need to be unique within the table.
The PRIMARY KEY (which has no other name than that) is "clustered" with the data. That is, the PK and the data are in the same BTree.
Each secondary key is a separate BTree. The BTree is sorted according to the column(s) specified. The leaf node 'records' contain the columns of the PK, thereby providing a way to get to the actual record.
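A quick way to see that the PK rides along in the secondary BTree (a hypothetical sketch; table and column names invented): a query that needs only the PK and the indexed column can be answered from the index alone.

CREATE TABLE t (
    id INT PRIMARY KEY,
    email VARCHAR(100),
    INDEX idx_email (email)
) ENGINE=InnoDB;

-- EXPLAIN reports "Using index": the secondary BTree already holds id
EXPLAIN SELECT id FROM t WHERE email = 'a@example.com';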
FULLTEXT and SPATIAL indexes work differently.
PARTITIONing... First of all, partitioning is rarely useful. But if you have any partitioned tables, then here are some details about indexes. A Partitioned table is essentially a collection of sub-tables, each identical (including index names). There is no "global index" across the table; each index for a sub-table refers only to the sub-table.
Keys belong to a table, not a database.
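To make the scope concrete, a small sketch: the same index name may appear in two tables of one database, but not twice in one table.

CREATE TABLE a (x INT, INDEX idx_x (x)) ENGINE=InnoDB;
CREATE TABLE b (x INT, INDEX idx_x (x)) ENGINE=InnoDB;  -- fine: same name, different table
ALTER TABLE a ADD INDEX idx_x (x);  -- error 1061: duplicate key name 'idx_x'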

MySQL (InnoDB): GUID as Primary Key for a Distributed Database

I come from the MSSQL world and have no expert knowledge in MySQL.
Having a GUID as the primary key is possible in both of these RDBMSs. In MSSQL I'd better do a few things to avoid running into a performance nightmare as the row count increases (many millions of rows).
I create the primary key as a non-clustered index, so that inserting a new row does not reshuffle the data pages. If I didn't do that, the system would insert each row between existing rows, and to do that the drive has to seek to the right position of the page on disk. I also create a second column of a numeric type, this time as the clustered index. That guarantees that new rows simply get appended on insert.
Question
But how do I do this in MySQL? If my information is right, I cannot force MySQL to use a non-clustered primary key. Is this necessary, or does MySQL store the data in a manner that will not result in a performance disaster later?
Update: But why?
The reason I want to do this is that I want to be able to build a distributed database.
I ended up using sequential GUIDs, as described on CodeProject: GUIDs as fast primary keys under multiple databases. Great performance!
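For reference, the usual InnoDB arrangement for this (a sketch of the common pattern, not taken from the linked article; the orders table is hypothetical): keep a compact auto-increment column as the clustered primary key so inserts append, and put the GUID in a secondary UNIQUE index for external lookups.

CREATE TABLE orders (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- clustered PK: inserts append
    guid BINARY(16) NOT NULL,                      -- identifier exposed to other nodes
    -- ... payload columns ...
    PRIMARY KEY (id),
    UNIQUE KEY uk_orders_guid (guid)
) ENGINE=InnoDB;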

MySQL | relational database vs non-relational database in terms of performance

What I want to ask: if we define relations (one-to-one, one-to-many, etc.), will that increase performance compared to not creating relations and just joining the tables on the fly, like
select * from employee inner join user on user.user_id = employee.user_id
I know this question has been asked before, and most of the answers I have found say that performance is not affected by not using relations.
But I have also heard that creating indexes makes queries faster, so is it possible to create indexes on foreign-key columns without creating relations? I'm a little confused about indexes.
And what if we have a large database, say 100+ tables plus a lot of records; will the relations matter in terms of query performance?
I'm using MySQL and PHP.
Foreign keys are basically used for data integrity.
Of course, indexing boosts performance.
Regarding performance with or without foreign keys: when people say foreign keys improve performance, it is because defining a foreign key implicitly defines an index. Such an index is created on the referencing table automatically if it does not already exist.
Relations are used to maintain the referential integrity of the database. They do not affect the performance of "select" queries at all. They do reduce the performance of "insert", "update" and "delete" queries, but you rarely want a relational database without referential integrity.
Indexes are what make "select" queries run faster. They also make insert and update queries significantly slower. To learn more about how indexes work, go to use-the-index-luke. It is by far the best site about this topic that I have found.
That said, databases usually create indexes automatically when you declare a primary key, and some of them (MySQL in particular) create indexes automatically even when you define a foreign key. You can read all about why they do that on the above site.
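In other words, the index and the constraint are separate things, so you can get the join speed without the enforced relation. Using the employee/user tables from the question (a sketch):

-- plain index on the join column: speeds up the join, enforces nothing
CREATE INDEX idx_employee_user_id ON employee (user_id);

-- foreign key constraint: enforces integrity, and creates the index
-- automatically if a suitable one doesn't already exist
ALTER TABLE employee
    ADD CONSTRAINT fk_employee_user
    FOREIGN KEY (user_id) REFERENCES user (user_id);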

Merging auto-increment table data

I have multiple end-user MySQL DBs with a fairly large amount of data that must be synchronized with a database (also MySQL) populated by an external data feed. End users can add data to their "local" DB, but not to the feed.
The question is how to merge/synchronize the two databases, including the foreign keys between their tables, without either overwriting the "local" additions or changing their keys.
Things I've considered include: taking a CSV dump of the feed DB, doing a LOAD DATA INFILE with IGNORE, then comparing the files to see which rows from the feed didn't get written and writing those manually; or writing a script to go line by line through the feed DB and create new rows in the local DBs, generating new keys at the same time. However, this seems like it could be very slow, particularly with multiple DBs.
Any thoughts on this? If there were a way to merge these DBs, preserving the keys, with something like the simplicity and speed of LOAD DATA INFILE, that would be ideal.
Use a compound primary key.
primary key(id, source_id)
Make each DB use a different value for source_id. That way you can copy database contents around without PK clashes.
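A sketch of what that looks like (hypothetical items table):

CREATE TABLE items (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    source_id TINYINT UNSIGNED NOT NULL,  -- e.g. 0 = feed, 1..n = each local DB
    -- ... payload columns ...
    PRIMARY KEY (id, source_id)
) ENGINE=InnoDB;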
One option would be to use GUIDs rather than integer keys, but it may not be practical to make such a significant change.
Assuming that you're just updating the user databases from the central "feed" database, I'd use CSV and LOAD DATA INFILE, but load into a staging table within the target database. You could then replace the keys with new values, and finally insert the rows into the permanent tables.
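A sketch of that staging approach (file, table, and column names are all hypothetical):

-- load the feed into a staging table inside the target DB
LOAD DATA INFILE '/tmp/feed.csv'
INTO TABLE staging_items
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';

-- copy into the permanent table, letting auto-increment assign fresh local keys
INSERT INTO items (name, price)
SELECT name, price FROM staging_items;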
If you're not dealing with huge data volumes, it could be as simple as computing an offset that lifts the lowest incoming ID past the highest existing ID. Add this offset to all of the keys in your incoming data, and there should be no collisions. This wastes some PK values, but that's probably not worth worrying about unless your record count is in the millions. This assumes that your PKs are integers and sequential.
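That offset trick might look like this (hypothetical names; integer, sequential PKs assumed):

-- suppose the highest existing local id turns out to be 5000
SELECT MAX(id) FROM items;

-- shift the incoming keys, and every column referencing them, past it
UPDATE staging_items     SET id = id + 5000;
UPDATE staging_item_tags SET item_id = item_id + 5000;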

Session / log table key design question

I have almost always heard people say not to use FKs with user session tables or any log tables, as those are usually high-write tables, and once written the data almost always stays there forever without any updates or deletes.
But the question is, I have columns like these:
user_id (links a session or activity log row to the user)
activity_id (links the activity log table to the system activity lookup table)
session_id (links the user log table to the parent session)
... and there are 4-5 more columns.
So if I don't use FKs, then how will I "relate" these columns? Can I join tables and get the user info without FKs? Can I write correct data without FKs? Is there any performance impact, or do people just talk and say this is a no-no?
Another question I have: if I don't use FKs, can I still connect my data to lookup tables?
In fact, you can build the whole database without real FKs in MySQL. If you're using MyISAM as the storage engine, the FKs aren't real anyway.
You can nevertheless do all the joins you like, as long as the join keys match.
The performance impact depends on how much data you stuff into a referenced table. It takes extra time if you have an FK in a table and insert data into it, or update an FK value: upon insertion or modification, the FK needs to be looked up in the referenced table to ensure referential integrity.
On heavily used tables which don't really need referential integrity, I'd just stick with loose columns instead of FKs.
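For example, given the columns listed in the question, a join works exactly the same whether or not an FK constraint exists (the log and user table names here are hypothetical):

SELECT u.user_name, l.activity_id, l.session_id
FROM user_activity_log l
JOIN users u ON u.user_id = l.user_id;  -- no FK needed, just matching values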
AFAIK InnoDB is currently the only one supporting real foreign keys (unless MySQL 5.5 got new or updated storage engines which support them as well). Storage engines like MyISAM do support the syntax, but don't actually validate the referential integrity.
FKs can be detrimental in "history log" tables. This kind of table wants to preserve the exact state of what happened at a point in time.
The problem with FKs is that the log row stores only a pointer to the value, not the value itself. If the referenced value changes, the history is lost. You DO NOT WANT updates to cascade into your history log. It's OK to have a "fake foreign key" that you can join on, but you also want to intentionally denormalize the relevant fields to preserve the history.
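A sketch of that denormalization (all names hypothetical): keep the joinable key, but also snapshot the fields whose historical values must survive later edits.

CREATE TABLE activity_log (
    log_id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id     BIGINT UNSIGNED NOT NULL,   -- "fake" FK: joinable, not enforced
    user_name   VARCHAR(100)    NOT NULL,   -- snapshot: survives later renames
    activity_id INT UNSIGNED    NOT NULL,   -- joins to the activity lookup table
    created_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (log_id)
) ENGINE=InnoDB;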