Database data migration - sql-server-2008

I'm fishing for knowledge on this one: I know a way I could get what I'm after, but I'm wondering if it's the best approach. Due to project creep, the database of what was once a very simple application became an overcomplicated monster, and while it worked, administration wasn't exactly easy.
I've got the opportunity now to essentially start again, however only piece by piece, gradually migrating functions from the old application to the new one. This requires that I keep the new database synchronised with the old one, which thankfully only needs to be one way as no data will be getting created on the new database that needs to be migrated back.
The options considered so far are SSIS and a Quartz-triggered Windows service using plain old C# ADO.NET. I've decided SSIS is probably a bad idea: it can be a nightmare with upserts, requiring temporary tables to be created followed by a merge, and the schema differences are extensive enough that the SSIS logic would be a headache. The ADO.NET approach is the direction I'm leaning, as data readers, bulk inserts and LINQ should do the job nicely. However, considering how many people this must have affected before me, I'm thinking there must be a better way. What approach do you guys use?
To get a bit more specific the details are:
SQL Server 2008 R2
~2 million rows over roughly 30 tables -> ~20 tables
Databases will each be on completely separate database servers with different credentials
Many thanks

Let's suppose this scenario:
S_old: your 'old' server
S_new: your 'new' server
db_old: your 'old' database
db_new: your 'new' database
A solution may be a nightly backup of db_old restored from S_old to S_new as db_old_bk. Then, on S_new you have access to both databases (db_old_bk and db_new). At this point you can do the upserts easily with a T-SQL MERGE statement (we are talking about only 20 tables). Is this what you are looking for?
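For illustration, one of those per-table upserts could look like the following minimal sketch, assuming a hypothetical Customers table that exists in both databases with an Id key (adjust names and column lists to your real schema):

    MERGE db_new.dbo.Customers AS target
    USING db_old_bk.dbo.Customers AS source
        ON target.Id = source.Id
    WHEN MATCHED THEN
        UPDATE SET target.Name  = source.Name,
                   target.Email = source.Email
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Name, Email)
        VALUES (source.Id, source.Name, source.Email);

One such statement per table, run from a nightly job after the restore, would cover all 20 tables.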


SQLite3 database per customer

Scenario:
Building a commercial app consisting of a RESTful backend in Symfony2 and a frontend in AngularJS
This app will never be used by many customers (if I get to sell 100 that would be fantastic; hopefully many more, but in any case it won't be massive)
I want to have a multi-tenant structure for the database, with one schema per customer (they store sensitive information about their customers)
I'm aware of the problems with updating schemas, but I will have to live with that.
Today I have a MySQL demo database that I will clone each time a new customer purchases the app.
There is no relationship between my customers, so I don't need to communicate with multiple shards for any query
For one customer, the app can be in use from several devices at the same time, but there won't be massive write operations on the DB
My question
Trying to set up some functional tests for the backend API, I read about having a dedicated SQLite database for loading test data, which seems to be a good idea.
However, I wonder if it's also a good idea to switch from MySQL to SQLite3 as the main database for the application, and if it's a common practice to have one dedicated SQLite3 database PER CLIENT. I've never used SQLite, and I have no idea if the process of updating a schema and replicating the changes across all the databases works the same way as for other RDBMSs.
Is this a correct scenario for SQLite?
Any suggestion (e.g. a tutorial) on how to achieve this?
[I wonder] if it's a common practice to have one dedicated SQLite3 database PER CLIENT
Only if the database is deployed along with the application, like on a phone. Otherwise I've never heard of such a thing.
I've never used SQLite and I have no idea if the process of updating a schema and replicating the changes across all the databases works the same way as for other RDBMSs
SQLite is a SQL database and responds to ALTER TABLE and the like. As for updating all the schemas, you'll have to re-run the update against each database.
Schema syncing is usually handled by an outside utility; your ORM will often have something. Some are server-agnostic, some only support specific servers. There are also dedicated database change management tools such as Sqitch.
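For example, a migration step is just ordinary SQL, so rolling it out means re-running it against every per-client database file in turn (the table and column names here are hypothetical):

    -- Hypothetical migration step, re-run against each client's database file
    ALTER TABLE invoices ADD COLUMN due_date TEXT;
    UPDATE invoices SET due_date = date('now') WHERE due_date IS NULL;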
However, I wonder if it's also a good idea to switch from MySQL to SQLite3 as the main database for the application
SQLite's main advantage is not requiring you to install and run a server. That makes sense for quick projects, or where you have to deploy the database with the application, as in a phone app. For a server-based application, there's no problem with running a database server, and SQLite's very restricted set of SQL features becomes a disadvantage. It will also likely run slower than a server database for anything but the simplest queries.
Trying to set up some functional tests for the backend API, I read about having a dedicated SQLite database for loading test data, which seems to be a good idea.
Under no circumstances should you test with a different database than the one you use in production. Databases do not all implement SQL the same way (MySQL is particularly bad about this), and your tests will not reflect reality. Running a MySQL instance for testing is not much work.
This separate schema thing claims three advantages...
Extensibility (you can add fields whenever you like)
Security (a query cannot accidentally show data for the wrong tenant)
Parallel Scaling (you can potentially split each schema onto a different server)
What they're proposing is equivalent to having a separate, customized copy of the code for every tenant. You wouldn't do that; it's obviously a maintenance nightmare. Code at least has the advantage of version control systems with branching and merging. I know of only one database change management tool that supports branching: Sqitch.
Let's imagine you've made a custom change to tenant 5's schema. Now you have a general schema change you'd like to apply to all of them. What if the change to 5 conflicts with this? What if the change to 5 requires special data migration different from everybody else? Now let's imagine you've made custom changes to ten schemas. A hundred. A thousand? Nightmare.
Different schemas will require different queries. The application will have to know which schema each tenant is using; there will have to be some sort of schema version map to maintain. And every different possible query for every different possible schema will have to be maintained in the application code. Nightmare.
Yes, putting each tenant in a separate schema is more secure, but that only protects against badly written queries or an included query builder (which is a bad idea anyway). There are better ways to mitigate the problem, such as the view filter suggested in the docs (sketched below). There are many other ways an attacker can access tenant data that this doesn't address: gaining a database connection, gaining access to the filesystem, sniffing network traffic. I don't see the small security gain being worth the maintenance nightmare.
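To sketch the view-filter idea in PostgreSQL syntax (the orders table and the app.tenant_id setting are made-up illustrations): the application sets its tenant on each connection and only ever queries the view.

    -- All tenants share one table
    CREATE TABLE orders (
        id        serial  PRIMARY KEY,
        tenant_id integer NOT NULL,
        total     numeric NOT NULL
    );

    -- The view filters by the tenant id attached to the current connection
    CREATE VIEW tenant_orders AS
        SELECT id, total
        FROM orders
        WHERE tenant_id = current_setting('app.tenant_id')::integer;

    -- Per-connection setup performed by the application
    SET app.tenant_id = '42';
    SELECT * FROM tenant_orders;  -- only tenant 42's rows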
As for scaling, the article is ten years out of date. There are far, far better ways to achieve parallel scaling than to coarsely put schemas on different servers. There are entire databases dedicated to this idea. Fortunately, you don't need any of this! Scaling won't be a problem for you until you have tens of thousands to millions of tenants. Front-loading your design with a schema maintenance nightmare for a hypothetical big parallel scaling problem is putting the cart so far before the horse, it's already at the pub having a pint.
If you want to use a relational database, I would recommend PostgreSQL. It has a very rich SQL implementation, it's fast and scales well, and it has something that renders this whole idea of separate schemas moot: a built-in JSON type. This can be used to implement the "extensibility" mentioned in the article. Each table can have a meta column of the JSON type into which you can throw any extra data you like. The application does not need special queries; the meta column is always there. PostgreSQL's JSON operators make working with the meta data very easy and efficient.
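A minimal sketch of such a meta column, with hypothetical table and field names (jsonb and the ->> operator are standard PostgreSQL):

    -- Every tenant shares one schema; per-tenant extras live in meta
    CREATE TABLE customers (
        id   serial PRIMARY KEY,
        name text   NOT NULL,
        meta jsonb  NOT NULL DEFAULT '{}'
    );

    INSERT INTO customers (name, meta)
    VALUES ('Acme', '{"vat_number": "GB123", "referral": "web"}');

    -- ->> extracts a JSON field as text
    SELECT name FROM customers WHERE meta->>'referral' = 'web';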
You could also look into a NoSQL database. There are plenty to choose from and many support custom schemas and parallel scaling. However, it's likely you will have to change your choice of framework to use one that supports NoSQL.

What is the best way to prevent Access database bloat

Intro:
I am creating an Access database system that will be rolled out with multi-user functionality.
But as I am creating this database in Access 2000 (old school, I know), there are quite a lot of bugs and random mysterious problems that occur when my database gets past 40-60 MB.
My question:
Has anyone got a good solution to how I can shrink this down or to prevent the bloat?
Details:
I am using many local tables combined with SQL Server tables, and my front-end links to a back-end SQL Server.
I have already tried compact and repair, but it only ever shrinks the file to about 15 MB, and after the user has used the database a few times the bloat quickly expands it to over 50-60 MB!
Let me know if more detail is needed but that is the rough outline of my problem.
Many Thanks!
Here are some ideas for you to follow.
You said you also have a lot of local tables. Split the local tables off into yet another Access database. So you'll have 2 back-ends (1 SQL Server & 1 Access), and the front end.
Create a batch file that opens your local-tables back-end database with the /compact option. It will look something like this:
"C:\Prog...\Microsoft...\Officexx\msaccess.exe" "C:\ProjectX_backend.mdb" /compact
Then run this batch file on a daily basis using Scheduled Tasks. Your front-end should never need compacting unless you edit it.
If you are stuck with 2000, which has quite a bad reputation, then you have to dig down into your application and find out what creates the bloat. The most common cause is bulk inserts followed by deletes. Other causes are the use of OLE Object fields and programmatic changes to forms and other objects. You really have to go through your application and find the specific cause.
An mdb file that is only connected to a back-end server and does not make changes to local objects should not grow.
As for your random issues, besides some lack of stability in the 2000 version, you should look into bad RAM in the computers, bad hard drives, and broken network controllers if your mdb file is shared on the network.

Synchronising data between different databases

I'm looking for a possible solution for the following problem.
First, the situation I'm in:
I have 2 databases, 1 Oracle DB and 1 MySQL DB. Although they have a lot of similarities, they are not identical. A lot of tables are available in both the Oracle DB and the MySQL DB, but the Oracle tables are often more extensive and contain more columns.
The situation with the databases can't be changed, so I have to deal with that.
Now I'm looking for the following:
I want to synchronise data from Oracle to MySQL and vice versa. This has to be done in real time, or as close to real time as possible: when changes are made in one DB, they have to be synced to the other DB as quickly as possible.
Also, not every table has to be in sync, so the solution must offer a way of selecting which tables to sync and which not.
Because the databases are not identical, replication isn't an option, I think. But what is?
I hope you guys can help me with finding a way of doing this or a tool which does exactly what I need. Maybe you know some good papers/articles I can use?
Thanks!
Thanks for the comments.
I did some further research on ETL and EAI.
I found out that I am searching for an ETL tool.
I read your question and your answer. I have worked with Oracle, SQL Server, ETL and data warehouses, and here are my suggestions:
It is good to have a ready-made ETL tool. But if your application is big enough that you need a tailor-made ETL tool, I suggest a home-made ETL process.
If your transactional database is on Oracle, you can set up triggers on the key tables that in turn call an external procedure written in C, C++ or Java.
The reason for using an external procedure is to be able to communicate with both databases at once - Oracle and MySQL.
You can read more about Oracle External Procedures in the Oracle documentation.
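Roughly, the wiring looks like the sketch below. Every name in it (the library path, the C entry point, the customers table) is a hypothetical placeholder, and the exact parameter mappings should be checked against the Oracle documentation:

    -- Register the shared library that contains the C code
    CREATE OR REPLACE LIBRARY sync_lib AS '/opt/sync/libsync.so';

    -- PL/SQL wrapper around the C entry point
    CREATE OR REPLACE PROCEDURE push_change (p_id BINARY_INTEGER)
    AS LANGUAGE C
       LIBRARY sync_lib
       NAME "push_change";

    -- Call the external procedure whenever a key table changes
    CREATE OR REPLACE TRIGGER trg_customers_sync
    AFTER INSERT OR UPDATE OR DELETE ON customers
    FOR EACH ROW
    BEGIN
        push_change(NVL(:NEW.id, :OLD.id));
    END;
    /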
If not through extproc, you can develop a separate application in Java or .NET that extracts data from the first database, transforms it according to your business rules and loads it into your warehouse.
Whichever approach you choose, you will have greater control over the ETL process if you implement your own tool rather than going for a ready-made one.

MySQL joins across databases on different servers

So, I have an existing DB with some tables for a class of users. We're building a more general app to handle multiple things the company does, and this class of users - call them hosts - is a general type used by multiple programs in our company. We want to (eventually) migrate into a centralized app, as right now we have several. However, we don't have the time to do it completely right now. I need to build a login system for these hosts, and I'd like to begin the migration to the new system with that. I can't figure out a reasonable way to move those tables from the legacy DB to the new DB, which (of course) resides on a different server, without wanting to stab my own eyes out after 30 seconds of dealing with it. The legacy DB has many reports that rely on joining against the current hosts tables.
The only things I can come up with don't seem like very good ideas: writing to both DBs from both apps (pointless data duplication prone to syncing problems), or providing an API from the new app and mashing the data coming back together with record sets (which just seems... wrong).
Anyone have any ideas how to deal with this?
It has its limitations, but the FEDERATED storage engine might be of assistance.
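For instance, if the hosts tables move to the new server, the legacy server could keep a FEDERATED proxy for each of them so the existing reports still join locally. A minimal sketch with placeholder connection details and a made-up column list (the local definition must match the remote table):

    -- On the legacy server: a proxy whose rows actually live on the new server
    CREATE TABLE hosts (
        id   INT NOT NULL,
        name VARCHAR(64) NOT NULL,
        PRIMARY KEY (id)
    )
    ENGINE=FEDERATED
    CONNECTION='mysql://app_user:secret@new-db-host:3306/newdb/hosts';

    -- Legacy reports keep working as if hosts were still local
    SELECT r.*, h.name
    FROM reports r
    JOIN hosts h ON h.id = r.host_id;

Note that the FEDERATED engine is disabled by default and has to be enabled in the server configuration.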

MS SQL - MySQL Migration in a legacy webapp

I wish to migrate the database of a legacy web app from SQL Server to MySQL. What are the limitations of MySQL that I must look out for? And what items would be part of a comprehensive checklist before jumping into actually modifying the code?
The first thing I would check is the data types - the exact definition of a datatype varies from database to database. I would create a mapping list that tells me what to map each datatype to; that will help in building the new tables. I would also check for tables or columns that are not being used any more - no point in migrating them. Do the same with functions, jobs, stored procedures, etc. Now is the time to clean out the junk.
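As an illustration, a hypothetical mapping for one table might come out like this (names are made up; verify each type against the MySQL documentation):

    -- SQL Server original
    CREATE TABLE orders (
        id        INT IDENTITY(1,1) PRIMARY KEY,
        placed_at DATETIME      NOT NULL,
        notes     NVARCHAR(MAX) NULL,
        total     MONEY         NOT NULL
    );

    -- MySQL equivalent after mapping the types
    CREATE TABLE orders (
        id        INT AUTO_INCREMENT PRIMARY KEY,
        placed_at DATETIME      NOT NULL,
        notes     LONGTEXT      NULL,      -- no direct NVARCHAR(MAX) twin
        total     DECIMAL(19,4) NOT NULL   -- MONEY is fixed-point, not float
    );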
How are you accessing the data - through stored procedures or dynamic queries? Check each query by running it against a new dev database and make sure it still works; again, there are differences between how the two flavors of SQL work. I've not used MySQL, so I'm not sure what the common failure points are. While you are at it, you might want to time the new queries and see if they can be optimized. Optimization also varies from database to database, and there are probably some poorly performing queries right now that you can fix as part of the migration.
User-defined functions will need to be looked at as well. Don't forget these.
Don't forget scheduled jobs either; these will need to be checked and recreated in MySQL as well.
Are you importing any data on a regular schedule? All imports will have to be rewritten.
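To show how different the two ends of a rewritten import can look, here is a hypothetical example (file paths and the table name are placeholders):

    -- SQL Server version of a scheduled import
    BULK INSERT daily_rates
    FROM 'C:\imports\rates.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

    -- The MySQL rewrite of the same import
    LOAD DATA INFILE '/var/imports/rates.csv'
    INTO TABLE daily_rates
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;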
Key to everything is to use a test database and test, test, test. Test everything, especially quarterly or annual reports or jobs that you might otherwise forget.
Another thing you want to do is do everything through scripts that are version controlled. Do not move to production until you can run all the scripts in order on dev with no failures.
One thing I forgot: make sure the dev database you are running the migration from (the SQL Server database) is refreshed from production immediately before each test run. I'd hate to have something fail on prod because you were testing against outdated records.
Your client code is almost certain to be the most complex part to modify. Unless your application has a very high quality test suite, you will end up having to do a lot of testing. You can't rely on anything working the same, even things which you might expect to.
Yes, things in the database itself will need to change, but the client code is where the main action is; it will need heaps of work and rigorous testing.
Forget migrating the data, that is the last thing which should be on your mind; the database schema can probably be converted without too much difficulty; other database objects (SPs, views etc) could cause issues, but the client code is where the focus of the problems will be.
Almost every routine which executes a database query will need to be changed, but absolutely all of them will need to be tested. This will be nontrivial.
I am currently looking at migrating our application's main database from MySQL 4.1 to 5, that is much less of a difference, but it will still be a very, very large task.