Merging data while keeping relationships in SQL Server - sql-server-2008

The scenario I have is that multiple PCs running local SQL Server 2008 instances will be generating data using tables with integer identity fields. This data will have related records, linked on those integer ID fields. The data needs to be mergeable from multiple PCs into a single database on a central server with the same structure (so the same reporting code can run against any database) while properly maintaining links between the related records.
As a (fictional) example, let's say the PCs are all recording aspects of the weather. Every 10 minutes, a record is created in the WeatherInspection table. This has an integer identity ID field. A number of records are also created in WeatherInspectionItems containing the temperature at a number of different temperature sensors. These records are related to the WeatherInspection table by the ID field. This is not the real scenario, but illustrates the principle - parent table with an integer ID field, child table linked back on that ID. In practice, there are many more related tables, each with an int ID field.
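In schema terms, the fictional example looks something like this (the non-ID columns are invented purely for illustration):

CREATE TABLE dbo.WeatherInspection (
    ID int IDENTITY(1, 1) PRIMARY KEY,
    InspectionTime datetime NOT NULL           -- illustrative column
);

CREATE TABLE dbo.WeatherInspectionItems (
    ID int IDENTITY(1, 1) PRIMARY KEY,
    WeatherInspectionID int NOT NULL           -- links back to the parent
        REFERENCES dbo.WeatherInspection (ID),
    SensorName nvarchar(50) NOT NULL,          -- illustrative column
    Temperature decimal(5, 2) NOT NULL         -- illustrative column
);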
I need to then be able to merge WeatherInspection and WeatherInspectionItems from all the PCs into a central SQL Server 2008 database. Because each PC has its own identity fields, each PC could have used the same IDs within its own WeatherInspection table.
During the merge, I need to be able to assign a new identity value to the WeatherInspection records so they remain unique in the master database, but the big issue for me is that I also need to be able to alter the value in the child records so they link to the new ID field.
I want to be able to:
Keep using int IDs rather than switching to GUIDs
Maintain the same database structure in both DBs
Keep int IDs as the sole primary key field
I am really interested in whether there is any merge technology within SQL Server or other related products that can reassign ID fields in parent table and maintain the relationship with child records.
I know I could have composite primary keys locally with the machine ID or something like that in it, but due to ORM tools that we may be using that need a single int ID field, I am trying to avoid composite keys and GUIDs.
I've tried searching, but can't find an article anywhere that covers updating related child records with new parent ID values.
Thanks!

With the MS Sync Framework, you can intercept the data before it reaches the database.
I am not sure whether this is possible with merge replication.
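If you end up doing the remap yourself during the merge, one way in T-SQL is MERGE with an OUTPUT clause, which (unlike plain INSERT ... SELECT) can capture both the source row's old ID and the newly generated identity in one pass. A minimal sketch, assuming the PC's rows have first been copied into staging tables (Staging_WeatherInspection and Staging_WeatherInspectionItems are hypothetical names, as are the non-ID columns):

DECLARE @IdMap TABLE (OldId int PRIMARY KEY, NewId int);

-- Insert parents; ON 1 = 0 never matches, so every source row is treated
-- as new, and OUTPUT records the old -> new identity mapping.
MERGE INTO dbo.WeatherInspection AS target
USING Staging_WeatherInspection AS source
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (InspectionTime)
    VALUES (source.InspectionTime)
OUTPUT source.ID, inserted.ID INTO @IdMap (OldId, NewId);

-- Insert children, rewriting the foreign key through the mapping.
INSERT INTO dbo.WeatherInspectionItems (WeatherInspectionID, SensorName, Temperature)
SELECT m.NewId, s.SensorName, s.Temperature
FROM Staging_WeatherInspectionItems AS s
JOIN @IdMap AS m ON m.OldId = s.WeatherInspectionID;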

GUIDs would seem the easiest way to go, but if you insist on using the generated IDs, it is possible if your ORM uses a SQL Server SEQUENCE to get the IDs.
All you need to do is set your IDENTITY(start, 1) to have a defined start number for all the tables on a given server, with a different "start" on each server: 0 on server 1, 1,000,000 on server 2, 2,000,000 on server 3, etc.
Unfortunately SQL Server does not allow you to define an "end" number for an identity range, so you will have to keep an eye on your tables to detect an overlap with another server.
This solution is limited and will only work for a small number of servers expecting fewer than 1,000,000 rows in any one table.
Like I say, GUIDs seem like a better solution.
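A minimal sketch of the seed-per-server idea, using the question's fictional table:

-- On server 2, start identities at 1,000,000 when creating the table:
CREATE TABLE dbo.WeatherInspection (
    ID int IDENTITY(1000000, 1) PRIMARY KEY,
    InspectionTime datetime NOT NULL
);

-- Or reseed an existing table so the next insert gets ID 1,000,000:
DBCC CHECKIDENT ('dbo.WeatherInspection', RESEED, 999999);

-- Periodic overlap check on server 2: rows should stay below server 3's range.
SELECT COUNT(*) AS StrayRows
FROM dbo.WeatherInspection
WHERE ID >= 2000000;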

Can't the SQL Data Compare tool from RedGate provide a solution here?

Related

Should I use multiple databases in MySQL for my "hosting" platform? [duplicate]

Let's say I need to design a database which will host data for multiple companies. For security and admin purposes, I need to make sure that the data for different companies is properly isolated, but I also do not want to start 10 MySQL processes to host the data for 10 companies on 10 different servers. What are the best ways to do this with MySQL?
There are several approaches to multi-tenant databases. For discussion, they're usually broken into three categories.
One database per tenant.
Shared database, one schema per tenant.
Shared database, shared schema. A tenant identifier (tenant key) associates every row with the right tenant.
MSDN has a good article on the pros and cons of each design, and examples of implementations.
Microsoft has apparently taken down the pages I referred to, but they are on archive.org. Links have been changed to point there.
For reference, this is the original link for the second article
In MySQL I prefer a single database for all tenants. I restrict access to the data by giving each tenant a separate database user that only has access to views which show only the rows belonging to that tenant.
This can be done by:
Add a tenant_id column to every table
Use a trigger to populate the tenant_id with the current database username on insert
Create a view for each table where tenant_id = current_database_username
Only use the views in your application
Connect to the database using the tenant-specific username
I've fully documented this in a blog post:
https://opensource.io/it/mysql-multi-tenant/
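A minimal MySQL sketch of that pattern (table, database, and account names are illustrative, not taken from the post). One nuance: the view below uses USER() rather than CURRENT_USER(), because views default to definer security, so CURRENT_USER() inside the view would return the view's definer, while USER() still returns the connecting tenant:

-- Base table with the tenant column; tenants never access it directly.
CREATE TABLE customers_data (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name      VARCHAR(100) NOT NULL,
    tenant_id VARCHAR(64)  NOT NULL DEFAULT ''
);

-- Trigger stamps each new row with the connecting user's name.
CREATE TRIGGER customers_data_tenant
BEFORE INSERT ON customers_data
FOR EACH ROW SET NEW.tenant_id = SUBSTRING_INDEX(USER(), '@', 1);

-- Per-tenant view: each user sees only their own rows.
CREATE VIEW customers AS
SELECT id, name
FROM customers_data
WHERE tenant_id = SUBSTRING_INDEX(USER(), '@', 1);

-- Grant each tenant's (already-created) user access to the view only.
GRANT SELECT, INSERT, UPDATE, DELETE ON mydb.customers TO 'tenant_a'@'%';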
A simple way: for each shared table, add a column, say SEGMENT_ID. Assign the proper SEGMENT_ID to each customer, then create views for each customer based on the SEGMENT_ID. These views keep the data separated between customers. With this method information can still be shared, and it keeps both operation and development simple (stored procedures can also be shared).
Assuming you'd run one MySQL database on a single MySQL instance, there are several ways to distinguish between what belongs to whom.
Most obvious choice (for me at least) would be creating a composite primary key such as:
CREATE TABLE some_table (
    id int unsigned not null auto_increment,
    company_id int unsigned not null,
    -- ... other columns ...
    primary key (id, company_id)
) engine = innodb;
and then distinguishing between companies by changing the company_id part of the primary key.
That way you can have all the data of all the companies in the same table / database, and at application level you can control which company is tied to which company_id and determine which data to display for a certain company.
If this wasn't what you were looking for - my apologies for misunderstanding your question.
Have you considered creating a different schema for each company?
You should try to define more precisely what you want to achieve, though.
If you want to make sure that a hardware failure doesn't compromise data for more than one company, for example, you have to create different instances and run them on different nodes.
If you want to make sure that someone from company A cannot see data that belongs to company B, you can do that at the application level as per Matthew PK's answer, for example.
If you want to be sure that even someone who manages to compromise security and run arbitrary SQL against the DB cannot see other companies' data, you need something more robust than that, though.
If you want to be able to back up data independently, so that you can safely back up company C on Mondays and company A on Sundays and be able to restore just company C, then, again, a purely application-based solution won't help.
Given a specific DB user, you could give that user membership in group(s) indicating the companies whose data they are permitted to access.
I presume you're going to have a Companies table, so just create a one-to-many relationship between Companies and MySQLUsers or something similar.
Then, as a condition of all your queries, just match the CompanyID based on the UserID.
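A sketch of that pattern (CompanyUsers and Orders are hypothetical names):

-- Mapping table: which users may access which companies.
CREATE TABLE CompanyUsers (
    UserID    INT NOT NULL,
    CompanyID INT NOT NULL,
    PRIMARY KEY (UserID, CompanyID)
);

-- Every application query joins through the mapping, so a user
-- only ever sees rows for companies they belong to.
SELECT o.*
FROM Orders AS o
JOIN CompanyUsers AS cu ON cu.CompanyID = o.CompanyID
WHERE cu.UserID = ?;   -- placeholder bound to the current user by the application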
In my file Generate_multiTanentMysql.php I do all the steps with a PHP script:
https://github.com/ziedtuihri/SaaS_Application
A Solution Design Pattern :
Creating a database user for each tenant
Renaming every table to a different and unique name (e.g. using a prefix ‘someprefix_’)
Adding a text column called ‘id_tenant’ to every table to store the name of the tenant the row belongs to
Creating a trigger for each table to automatically store the current database username to the id_tenant column before inserting a new row
Creating a view for each table with the original table name with all the columns except id_tenant. The view will only return rows where (id_tenant = current_database_username)
Only grant permission to the views (not tables) to each tenant’s database user
Then, the only part of the application that needs to change is the database connection logic. When someone connects to the SaaS, the application would need to:
Connect to the database as that tenant-specific username

Convert Access table to SharePoint list with auto increment ID starting from the same number

I need to move a table from within an MS Access database to a SharePoint list. The table I need to move has had old records removed, so its auto-increment IDs no longer start at 1. I need the SharePoint list to start from the same auto-increment number as the table.
I have tried using the MS Access "export to SharePoint list" functionality, but when I re-import the table back into Access as a linked table to the SharePoint list, the ID has started back at 1 (not 81, as in the table I uploaded to SharePoint).
I need the table to upload to the SharePoint list with the auto-increment ID starting at 81, as it is in the table initially.
I understand Albert's logic, but somewhat disagree. I worked with one client that had been using an original autonumbered field as the customer ID in their Access database for years, so their work orders, invoices, etc., all have the customer's ID on them. If you use the newly created autonumber field, then all customer IDs would change to new numbers. Worse yet, if you use a second, non-autonumber field as Albert recommends, then any time you add a new customer you would need a SQL statement (or query) to determine the last used number in this second field and increment it by 1 so that it is unique. Kind of a pain.
So, the workaround is still a pain, but for a solution that does what you actually asked, you can do this. Create a new list in SharePoint with an auto-numbered field and link that list in the Access database. Then look at the highest ID number (e.g. highest customer ID) in the table with your original data, because you need to create that many rows in the SharePoint list. You can either write code to loop through creating that number of records or, if you're not comfortable with that, create an Excel sheet with that many rows and import that Excel file into the linked table. Next, create a delete query which deletes all records in the linked table that don't have a matching ID (e.g. customer ID) in your Access table. This leaves you with a linked list containing only the IDs you are using, and it's still auto-numbering, so new records are automatically assigned new numbers.
Like I said, it's a bit of a work-around. But this actually does what you are asking for, instead of being forced into a different solution.
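The delete query at the end might look like this in Access SQL (LinkedSPList and LocalCustomers are hypothetical names):

-- Remove the filler rows whose IDs have no match in the original table,
-- leaving the SharePoint list's autonumber counter past the highest real ID.
DELETE FROM LinkedSPList
WHERE ID NOT IN (SELECT CustomerID FROM LocalCustomers);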
If the column in question is an autonumber column, then during an upload or migration those autonumbers can change. Since such numbers have NO meaning, this should not matter. If you have several related tables, then you MUST ensure that your relationships are set up correctly before you move the data to SharePoint (because SharePoint will re-number these values, and the child tables and FK keys will then ALSO be correctly updated). However, if you don't set up the relationships, then you WILL break the related data, since SharePoint can and will re-number the PKs used.
You are limited to ONLY using autonumber PKs if you wish to keep related tables intact. You cannot control this re-numbering, but as noted, the number ONE rule in databases is that such numbers do NOT matter anyway.
If you MUST and WANT to stop the re-numbering of that column, then change the data type to a long number, NOT an autonumber type, and then of course simply add another autonumber column. So to STOP or PREVENT the numbers from being changed, you have to convert the column from autonumber to a standard long number column. (Edit: you ALSO thus have to ensure that the column is NOT marked as the PK.)
Keep in mind that any other table that is part of the related data will ALSO see those standard long number columns re-numbered and changed if the column is part of a defined relationship to some PK. So during an up-size, Access WILL re-number the PK (autonumber) and will ALSO automatically re-number the FK columns used in child tables for you. If, as noted, you do NOT want such re-numbering to occur, then the PK and FK columns canNOT be part of a defined relationship.
So dropping the autonumber column is the only way to prevent such re-numbering. Since autonumbers don't have any real meaning, if they are changed during an upload, this should not matter.

Database design with user data and central database updates

I am designing a windows desktop app. It uses LiteDB as the single file local db for users - using it very much as a relational database with foreign keys etc (each Table having an integer ID as primary key and references to other tables via FK integers).
It's a retro-gaming app, so 'tables' will include things such as:
System (e.g. "Sony PlayStation", "Nintendo 64")
Controller (e.g. "Sony Dual Shock")
Control (e.g. "Cross", "Start", "Select")
Because of the above, I will have to stick to using integer IDs as the primary key. I thought about using the 'name', but this wouldn't work for Controls (i.e. Start will be found on many controllers).
User should be able to add and delete records as they wish (although there will be a discouraging of deleting 'standards')
The challenge is that I'm also going to host a mysql database on my server, allowing users to update their tables from this. Now this is the bit I can't get my head around.
Say they add a System "Casio Watch" to their local table. This will get an auto-generated ID (say '94'). At the same time, some updates occur on the server database and a new system is added (e.g. "Commodore Calculator"); this also gets the auto-generated ID of '94'. That's conflict number 1.
You could get around the above by just appending it as a new row in the user DB, getting a new ID in that. But my second worry is about foreign keys. Let's say there's a Manufacturers table with a 'Biggest Seller' field. Now on the server, for Manufacturer = Commodore, the 'Biggest Seller' FK is 94, for "Commodore Calculator". However, if this Manufacturers table is imported into the user's local DB, then Commodore's biggest seller would be "Casio Watch", its ID being 94 in the user DB.
Forgive me if I'm being a bit slow about all this. Referential integrity is coming to mind (is that the one with update/null FKs on change??) but I don't think you can do this through LiteDB (i.e. a change in one does not cascade to related tables).
Any advice would be greatly appreciated.
Using a simple auto-increment field will not work, as you have accurately stated.
One option is to add a "server id" field to the relevant tables, identifying the computer/installation the data comes from, and to make sure that this field is unique across all your installations. Each system / manufacturer / etc. that you need to synchronise across multiple databases will have a compound primary key consisting of the server id and an auto-incremented value (although you probably need a separate generator to create the auto-increment locally). So, "Casio Watch" would have the server id of 1 and the auto-incremented value of 94. "Commodore Calculator" would have the same auto-increment value, but its server id would be different, therefore no conflict will occur.
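A minimal MySQL sketch of the compound-key option (table and column names are illustrative):

CREATE TABLE systems (
    server_id INT UNSIGNED NOT NULL,   -- unique per installation
    local_id  INT UNSIGNED NOT NULL,   -- generated locally on each server
    name      VARCHAR(100) NOT NULL,
    PRIMARY KEY (server_id, local_id)
);

-- The same local_id no longer collides across installations:
INSERT INTO systems VALUES
    (1, 94, 'Casio Watch'),            -- from the user's PC (server 1)
    (0, 94, 'Commodore Calculator');   -- from the central server (server 0)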
The other option is to use a universally unique ID (UUID) instead of a simple auto-increment field. UUIDs are guaranteed to be unique across all MySQL installations (with some limitations). In MySQL you can use the uuid() function to generate a UUID.
From a system design view, UUIDs are simpler because MySQL guarantees their uniqueness within the limitations described at the above link. However, UUIDs require more storage space and will have a negative impact on InnoDB's performance.
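And a sketch of the UUID option, for comparison (again with illustrative names):

CREATE TABLE systems (
    id   CHAR(36) NOT NULL PRIMARY KEY,   -- uuid() returns a 36-char string
    name VARCHAR(100) NOT NULL
);

-- Generated values are unique across installations, so local
-- and central inserts cannot collide.
INSERT INTO systems (id, name) VALUES (uuid(), 'Casio Watch');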

Migrate data from MySQL with auto-increment Ids to the Google Datastore?

I am trying to migrate some data from MySQL to the Datastore. I have a table called User with auto-increment primary keys (BIGINT(20)). Now I want to move the data from the User table to the Datastore.
My plan was to let the Datastore generate new IDs for the migrated users and for all the new users created after the migration is done. However, we have many services (notifications, URLs etc.) that depend on the old IDs. So I want to use the old IDs for the migrated users; how can I then guarantee that newly generated IDs won't collide with the migrated IDs?
Record the maximum and minimum IDs before migrating. Migrate all the SQL rows to Datastore entities, setting entity.key.id = sql.row.id.
To prevent new Datastore IDs from colliding with the old ones, always call AllocateIds() to allocate new IDs. In C#, the code looks like this:
Key key;
Key incompleteKey = _db.CreateKeyFactory("Task").CreateIncompleteKey();
do
{
    // Ask Datastore to allocate an ID; retry if it falls inside the old range.
    key = _db.AllocateId(incompleteKey);
} while (key.Path[0].Id >= minOldId && key.Path[0].Id <= maxOldId);
// Use new key for new entity.
In reality, you are more likely to win the lottery than to see a key collide, so it won't cost anything more to check against the range of old ids.
You cannot hint/tell the Datastore to reserve specific IDs. So, if you manually set IDs when inserting existing data, and later have the Datastore assign an ID, it may pick an ID that you have already used. Depending on the operation you are using (e.g. INSERT or UPSERT), the operation may fail or overwrite the existing entity.
You need to come up with a migration plan to map existing IDs to Datastore IDs. Depending on the number of tables you have and the complexity of relations between them, this could become a time consuming project, but you should still be able to do it.
Let's take a simple example and assume you have two tables:
USER (USER_ID is primary key)
USER_DATA (USER_ID is foreign key)
You could possibly add another column to USER (or use another mapping) to map the USER_ID to a DATASTORE_ID. Here, you call Datastore's allocateId method for the kind you want to use and store the returned ID in the new column.
Now you can move the USER data to Cloud Datastore, ignoring the MySQL USER_ID and instead using the ID from the new column.
To migrate the data from USER_DATA, do a join between the two tables and push the data using datastore ID.
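For example, the USER_DATA migration query could be a join like this (DATASTORE_ID being the new mapping column described above):

-- Pair each USER_DATA row with its owner's pre-allocated Datastore ID,
-- then push the result to Datastore keyed on DATASTORE_ID.
SELECT u.DATASTORE_ID, d.*
FROM USER_DATA AS d
JOIN USER AS u ON u.USER_ID = d.USER_ID;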
Also, note that using sequential IDs (referred to as monotonically increasing values) could cause performance issues with Datastore. So, you probably want to use IDs that are generated by the Datastore.

Alternative to using same foreign key in almost every table

I am working with a database where "almost" every table has the same field with the same value. For example, almost all tables have a field called GroupId, and there is only one group ID in the database now.
Benefits
All data is related to that field and can be identified by said field
When a new group is created data will be properly identified for the group
Disadvantages
All tables have this field
All stored procedures need to have this field as a parameter
All queries have to be filtered by this field
Is this a big deal? Is there an alternative to this approach?
Thanks
If you need to be able to identify data by more than one group in the future, having the foreign keys is a good practice. However, that doesn't mean all tables need this field, only the ones directly related to the group. For instance, a lookup table with state values may not need it, but the customers table might. Adding it to all tables willy-nilly can lead to bad things when you try to delete a record and have to check 579 tables (only 25 of which are pertinent). All this depends greatly on what the meaning of the groups is. Most of our tables have a relationship to the client table, because they contain data related to specific clients and because we don't want various clients to be able to see data for other clients. Tables which do not contain that kind of data do not.
Yes, most queries may need the field, and many stored procs will want it as an input variable, but if you truly need to filter on this information, then that is as it should be.
If, however, there is only one group and there will never be more than one, it is a waste of time, effort and space.