Database design with user data and central database updates - mysql

I am designing a Windows desktop app. It uses LiteDB as a single-file local database for users, treating it very much as a relational database with foreign keys etc. (each table having an integer ID as primary key and references to other tables via FK integers).
It's a retro-gaming app, so 'tables' will include things such as:
System (e.g. "Sony PlayStation", "Nintendo 64")
Controller (e.g. "Sony Dual Shock")
Control (e.g. "Cross", "Start", "Select")
Because of the above, I will have to stick to using integer IDs as the primary key. I thought about using the 'name', but this wouldn't work for Controls (e.g. 'Start' will be found on many controllers).
Users should be able to add and delete records as they wish (although deleting 'standards' will be discouraged).
The challenge is that I'm also going to host a MySQL database on my server, allowing users to update their tables from it. Now this is the bit I can't get my head around.
Say they add a System "Casio Watch" to their local table. This will get an auto-generated ID (say '94'). At the same time, some updates occur on the server database and a new system is added (e.g. "Commodore Calculator"), which also gets the auto-generated ID of '94'. That's conflict number 1.
You could get around the above by just appending it as a new row in the user DB, where it gets a new ID. But my second worry is around foreign keys. Let's say there's a 'Manufacturers' table with a 'Biggest Seller' field. On the server, for Manufacturer = Commodore, the 'Biggest Seller' FK is 94, for "Commodore Calculator". However, if this Manufacturers table is imported into the user's local DB, then Commodore's biggest seller would be "Casio Watch", its ID being 94 in the user DB.
Forgive me if I'm being a bit slow about all this. Referential integrity comes to mind (is that the one where FKs are updated/nulled on change?), but I don't think you can do this through LiteDB (i.e. a change in one table does not cascade to related tables).
Any advice would be greatly appreciated.

Using a simple auto-increment field will not work, as you have accurately stated.
One option is to add a "server id" field to the relevant tables, identifying the computer/installation the data comes from, and to make sure that this field is unique across all your installations. Each system/manufacturer/etc. that you need to synchronise across multiple databases will then have a compound primary key consisting of the server id and an auto-incremented value (although you will probably need a separate generator to create the auto-increment locally). So, "Casio Watch" would have the server id of 1 and the auto-incremented value of 94. The "Commodore Calculator" would have the same auto-increment value, but its server id would be different, therefore no conflict will occur.
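A minimal sketch of what this could look like in MySQL, assuming a hypothetical system table (server id 0 for the central server is an assumption for illustration):
CREATE TABLE system (
    server_id INT NOT NULL,            -- identifies the installation the row came from
    local_id  INT NOT NULL,            -- generated locally by the application
    name      VARCHAR(100) NOT NULL,
    PRIMARY KEY (server_id, local_id)  -- compound key: unique across installations
);

-- Both rows share local_id 94, but the compound key keeps them distinct:
INSERT INTO system VALUES (1, 94, 'Casio Watch');          -- user installation
INSERT INTO system VALUES (0, 94, 'Commodore Calculator'); -- central server
Note that any foreign key referencing such a table has to carry both columns.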
The other option is to use a universally unique identifier (UUID) instead of a simple auto-increment field. UUIDs are guaranteed to be unique across all MySQL installations (with some limitations). In MySQL you can use the UUID() function to generate a UUID.
From a system design point of view, UUIDs are simpler because MySQL guarantees their uniqueness within the limitations described in the documentation. However, UUIDs require more storage space and will have a negative impact on InnoDB insert performance.
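A minimal sketch of the UUID approach, again assuming a hypothetical system table; UUID_TO_BIN() (available in MySQL 8.0+) stores the UUID in 16 bytes instead of 36 characters, which softens the storage cost:
CREATE TABLE system (
    id   BINARY(16) NOT NULL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- Generate the key on insert; no coordination between installations needed:
INSERT INTO system (id, name) VALUES (UUID_TO_BIN(UUID()), 'Casio Watch');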

Related

Migrate data from MySQL with auto-increment Ids to the Google Datastore?

I am trying to migrate some data from MySQL to the Datastore. I have a table called User with an auto-increment primary key (BIGINT(20)). Now I want to move the data from the User table to the Datastore.
My plan was to let the Datastore generate new IDs for the migrated users and for all new users created after the migration is done. However, we have many services (notifications, URLs, etc.) that depend on the old IDs, so I want to keep the old IDs for the migrated users. How can I guarantee that newly generated IDs won't collide with the migrated IDs?
Record the maximum and minimum IDs before migrating. Migrate all the SQL rows to Datastore entities, setting entity.key.id = sql.row.id.
To prevent new Datastore IDs from colliding with the old ones, always call AllocateIds() to allocate new IDs. In C#, the code looks like this:
Key key;
Key incompleteKey = _db.CreateKeyFactory("Task").CreateIncompleteKey();
do
{
    // Ask the Datastore to allocate a fresh ID for this kind.
    key = _db.AllocateId(incompleteKey);
    // Retry in the (extremely unlikely) case that the allocated ID
    // falls inside the range of migrated SQL IDs.
} while (key.Path[0].Id >= minOldId && key.Path[0].Id <= maxOldId);
// Use new key for new entity.
In reality, you are more likely to win the lottery than to see a key collide, so checking against the range of old IDs costs you essentially nothing.
You cannot hint/tell the Datastore to reserve specific IDs. So, if you manually set IDs when inserting existing data, and later have the Datastore assign an ID, it may pick an ID that you have already used. Depending on the operation you are using (e.g. INSERT or UPSERT), the operation may fail or overwrite the existing entity.
You need to come up with a migration plan to map existing IDs to Datastore IDs. Depending on the number of tables you have and the complexity of the relations between them, this could become a time-consuming project, but you should still be able to do it.
Let's take a simple example and assume you have two tables:
USER (USER_ID is primary key)
USER_DATA (USER_ID is foreign key)
You could add another column to USER (or use another mapping mechanism) to map the USER_ID to a DATASTORE_ID. Here, you call the Datastore's allocateIds method for the Kind you want to use and store the returned ID in the new column.
Now you can move the USER data to Cloud Datastore, ignoring the MySQL USER_ID and instead using the ID from the new column.
To migrate the data from USER_DATA, do a join between the two tables and push the data using the Datastore ID.
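A minimal sketch of the mapping step on the MySQL side, using the two example tables (the DATASTORE_ID values themselves come from allocateIds calls, as described above):
-- Add a mapping column and fill it with pre-allocated Datastore IDs:
ALTER TABLE USER ADD COLUMN DATASTORE_ID BIGINT NULL;

-- When exporting USER_DATA, join to the mapping so each child row
-- carries the new parent ID instead of the old MySQL USER_ID:
SELECT u.DATASTORE_ID, d.*
FROM USER_DATA d
JOIN USER u ON u.USER_ID = d.USER_ID;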
Also, note that using sequential IDs (referred to as monotonically increasing values) can cause performance issues with the Datastore, so you probably want to use IDs that are generated by the Datastore.

Using sAMAccountName as uid in MySQL database

I have an application that authenticates with LDAP and returns a JWT containing the sAMAccountName of the logged-in user.
This application has a MySQL database where I'd like to reference the user in different tables (fields like createdBy, updatedBy, etc.), and I was wondering what the correct way of handling this is:
using the sAMAccountName as the identifier (so createdBy will be a VARCHAR(25))
using a link table to match the sAMAccountName with an auto-incremented identifier
Normally I would choose the "id" way; it's faster and easier to read in my opinion. But I'm not really keen on pulling users from the LDAP directory and changing their ID in my database, so honestly I would choose the first option.
What are the pros/cons of using a string as a UID? In my case it's likely to be only for fields like updatedBy, createdBy, deletedBy, etc., so I won't have hard links between multiple tables using a user identifier.
I think you should create the user table with a surrogate primary key (an auto-incrementing one) and put a unique index on the sAMAccountName column.
Natural primary keys are good because they naturally describe the record they point to. But the downside of using them is that they consume much more space in the index; index lookups and rebuilds are slower, and the tables consume more space as well.
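A minimal sketch, assuming a hypothetical users table; referencing tables then store the integer id in their createdBy/updatedBy columns:
CREATE TABLE users (
    id               INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    sam_account_name VARCHAR(25) NOT NULL,
    UNIQUE KEY uq_users_sam_account_name (sam_account_name)
);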
I'd connect everything using an id as primary key.
One thing to note is that the sAMAccountName is not necessarily stable. Think of a user changing her or his name: the sAMAccountName might then change, but it's still the same user. When you connect everything via an ID, you can change the sAMAccountName field without breaking everything.
But that's just my 2 cents.

Do I need to lock a MySQL table when doing a SELECT followed by an INSERT?

I'm no database guru, so I'm curious if a table lock is necessary in the following circumstance:
We have a web app that lets users add entries to the database via an HTML form
Each entry a user adds must have a unique URL
The URL should be generated on the fly, by pulling the most recent ID from the database, adding one, and appending it to the newly created entry
The app is running on ExpressionEngine (I only mention this in case it makes my situation easier to understand for those familiar with the EE platform)
Relevant DB Columns
(exp_channel_titles)
entry_id (primary key, auto_increment)
url_title (must be unique)
My Hypothetical Solution -- is table locking required here?
Let's say there are 100 entries in the table, and each entry in the table has a url_title like entry_1, entry_2, entry_3, etc., all the way to entry_100. Each time a user adds an entry, my script would do something like this:
Query (SELECT) the table to determine the last entry_id and assign it to the variable $last_id
Add 1 to the returned value, and assign the sum to the variable $new_id
INSERT the new entry, setting the url_title field of the latest entry to entry_$new_id (the 101st entry in the table would thus have a url_title of entry_101)
Since my database knowledge is limited, I don't know if I need to worry about locking here. What if a thousand people try to add entries to the database within a 10 second period? Does MySQL automatically handle this, or do I need to lock the table while each new entry is added, to ensure each entry has the correct id?
Running on the MyISAM engine, if that makes a difference.
I think you should look at one of two approaches:
Use an AUTO_INCREMENT column to assign the ID (a sketch of this approach follows below)
Switch from MyISAM to the InnoDB storage engine, which is fully transactional, and wrap your queries in a transaction
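A rough sketch of the first approach, letting MySQL assign entry_id atomically and deriving url_title from it afterwards. LAST_INSERT_ID() is per-connection, so concurrent inserts don't interfere; the UUID() placeholder is an assumption here, used only to avoid tripping the unique url_title index between the two statements:
INSERT INTO exp_channel_titles (url_title) VALUES (UUID());
UPDATE exp_channel_titles
   SET url_title = CONCAT('entry_', LAST_INSERT_ID())
 WHERE entry_id = LAST_INSERT_ID();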

Merging data while keeping relationships in SQL Server

The scenario I have is that multiple PCs running local SQL Server 2008 instances will be generating data using tables with integer identity fields. This data will have related records, linked on those integer ID fields. The data needs to be mergeable from multiple PCs into a single database on a central server with the same structure (so the same reporting code can run against any database) while properly maintaining links between the related records.
As a (fictional) example, let's say the PCs are all recording aspects of the weather. Every 10 minutes, a record is created in the WeatherInspection table. This has an integer identity ID field. A number of records are also created in WeatherInspectionItems containing the temperature at a number of different temperature sensors. These records are related to the WeatherInspection table by the ID field. This is not the real scenario, but illustrates the principle - parent table with an integer ID field, child table linked back on that ID. In practice, there are many more related tables, each with an int ID field.
I need to then be able to merge WeatherInspection and WeatherInspectionItems from all the PCs into a central SQL Server 2008 database. Because each PC has its own identity fields, each PC could have used the same IDs within its own WeatherInspection table.
During the merge, I need to be able to assign a new identity value to the WeatherInspection records so they remain unique in the master database, but the big issue for me is that I also need to be able to alter the value in the child records so they link to the new ID field.
I want to be able to:
Keep using int IDs rather than switching to GUIDS
Maintain the same database structure in both DBs
Keep int IDs as the sole primary key field
I am really interested in whether there is any merge technology within SQL Server or other related products that can reassign ID fields in parent table and maintain the relationship with child records.
I know I could have composite primary keys locally with the machine ID or something like that in it, but due to ORM tools that we may be using that need a single int ID field, I am trying to avoid composite keys and GUIDs.
I've tried searching, but can't find an article anywhere that covers updating related child records with new parent ID values.
Thanks!
With the MS Sync Framework, you can intercept the data before it makes it to the database.
I am not sure if this is possible with merge-replication.
GUIDs would seem the easiest way to go, but if you insist on using the generated IDs then it is possible, provided your ORM is using an SQL Server SEQUENCE to get the IDs.
All you need to do is give your IDENTITY(start, 1) columns a defined start number for all the tables on a given server, with a different "start" for each server: 0 on server 1, 1000000 on server 2, 2000000 on server 3, etc.
Unfortunately SQL Server does not allow you to define an "end" number for an identity sequence, so you will have to keep an eye on your tables to detect an overlap with another server.
This solution is limited and will only work for a small number of servers, each expecting fewer than 1,000,000 rows in any one table.
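A minimal sketch of the ranged-IDENTITY idea, using the WeatherInspection table from the question (the InspectedAt column is just a placeholder for illustration):
-- On server 1 (IDs 0..999999):
CREATE TABLE WeatherInspection (
    ID          INT IDENTITY(0, 1) PRIMARY KEY,
    InspectedAt DATETIME NOT NULL
);

-- On server 2, the same table but with its range starting at 1000000:
-- ID INT IDENTITY(1000000, 1) PRIMARY KEY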
Like I say GUIDS seem like a better solution.
Can't the SQL Data Compare tool from RedGate bring a solution here?

MySQL Constrain Database Entries in Rails

I am using MySQL 5 and Ruby on Rails 2.3.1. I have some validations that occasionally fail to prevent duplicate items being saved to my database. Is it possible at the database level to prevent a duplicate entry from being created, based on certain parameters?
I am saving emails to a database, and don't want to save a duplicate subject line, body, and sender address. Does anyone know how to impose such a limit on a DB through a migration?
You have a number of options to ensure a unique value set is inserted into your table. Let's consider two: 1) push the responsibility to the database engine, or 2) make it your application's responsibility.
Pushing responsibility to the database engine could entail creating a UNIQUE index on your table; see the MySQL CREATE INDEX syntax. Note that this solution will result in an exception being thrown if a duplicate value is inserted. As you've identified three columns that determine uniqueness (subject line, body, and sender address), you'll create the index to include all three columns. It's been a while since I've worked with Rails, so you may want to check the record count inserted as well.
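A minimal sketch, assuming a hypothetical emails table. MySQL limits the key length of an index, so a TEXT body is handled here through an MD5 hash column rather than by indexing the full body:
ALTER TABLE emails ADD COLUMN body_hash CHAR(32) NOT NULL;
UPDATE emails SET body_hash = MD5(body);  -- keep this in sync on insert/update
ALTER TABLE emails
    ADD UNIQUE INDEX uq_emails_subject_sender_body (subject, sender_address, body_hash);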
If you prefer to push this responsibility to your application software, you'll need to contend with potential data-insertion conflicts. Assume two users are creating an email simultaneously (just work with me here) with the same subject line, body, and sender address. If your code simply queries for any records containing that text, both queries (identical for both users in this example) will return no records found, and both will proceed merrily to insert their emails, which now violate your premise. You can address this with a table lock, or with some other synchronisation mechanism in the database to ensure duplicates don't appear. The latter approach could consist of another table with a single field indicating whether someone is currently inserting a record; once finished, they update that record to say they have completed, and then others can proceed.
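One way to sketch that synchronisation in MySQL is a named lock, serialising the check-then-insert section (the lock name and timeout here are arbitrary assumptions):
SELECT GET_LOCK('email_insert', 10);   -- wait up to 10 seconds for the lock
-- SELECT to check for duplicates, then INSERT only if none were found
SELECT RELEASE_LOCK('email_insert');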
There is a separate architectural discussion to be had on the implications of each alternative, but I'll leave that to a separate post. Hopefully this suffices in answering your question.
You should be able to add a unique index to any columns you want to be unique throughout the table.