My requirement is to load data on a daily basis from a source table on one server to a destination table on another server. The two servers are on different platforms, i.e. one is SQL Server and the other is Oracle.
What I want is to make the source query fast. When I execute the package today, I should get only the new records instead of all the records from the source. Reading the whole table takes a long time, and even using a Lookup transformation to check whether each record already exists is slow.
Please look into this
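One common approach, sketched below with hypothetical names (dbo.DestinationTable, dbo.SourceTable, and CreatedDate are assumptions, not details from the question): keep a high-water mark from the previous load and restrict the source query to rows added after it, so you avoid both the full table read and the row-by-row Lookup.

-- Run against the destination first to find the last value already loaded
SELECT ISNULL(MAX(CreatedDate), '19000101') AS LastLoadedDate
FROM dbo.DestinationTable;

-- Source query for the SSIS data flow: only rows newer than the high-water mark
-- (? is an SSIS parameter populated from LastLoadedDate)
SELECT *
FROM dbo.SourceTable
WHERE CreatedDate > ?;

If the destination table lives on the Oracle side, the first query would use NVL instead of ISNULL, but the pattern is the same.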
I have to migrate data from a non-SQL Server database to a SQL Server database using SSIS.
The data contains millions of rows.
However, I want to make sure that the data in the source and the data in the destination remain the same.
One of the answers I followed suggests using staging tables.
In addition to the above technique, what would be the best approach for doing this?
Any thoughts/suggestions would be appreciated.
Thanks
The staging area in the data warehouse world is the place where you simply copy the data from the source, for multiple reasons:
- To execute only a bulk copy from the production server, and so avoid consuming too many resources on the production servers.
- To keep the data unmodified during your calculations.
- To apply filters and other aggregations that prepare the queries which fill the DWH.
In your case, a staging area is a good way to make the first step from the non-SQL source to a relational database.
Moreover, since the staging load is just a copy, you won't alter the integrity of the data during this step.
Because of this you can run some "integrity tests" after your migration, by running counts on the staging table and your final structure, or by summing data and comparing the totals to identify differences.
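For instance, a minimal sketch of those checks, assuming a hypothetical staging table dbo.Staging_Orders, a final table dbo.Orders, and a numeric Amount column:

-- Compare row counts between the staging table and the final structure
SELECT
    (SELECT COUNT(*) FROM dbo.Staging_Orders) AS StagingRows,
    (SELECT COUNT(*) FROM dbo.Orders)         AS FinalRows;

-- Compare a summed column; a difference points at truncated or altered data
SELECT
    (SELECT SUM(Amount) FROM dbo.Staging_Orders) AS StagingAmount,
    (SELECT SUM(Amount) FROM dbo.Orders)         AS FinalAmount;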
Hi, I have around 1,000 million records in the production table and around 1,500 million records in the staging table. I need to compare the staging data with the production data: if there is a new record, insert it, and if a record has changed, update the changed columns or insert a new row. What is the best approach for this situation? Please suggest.
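One common pattern for this is a T-SQL MERGE, sketched below with hypothetical tables dbo.Production and dbo.Staging keyed on Id; at this volume you would normally drive it in batches (for example by key range or partition) rather than in one statement:

MERGE dbo.Production AS tgt
USING dbo.Staging AS src
    ON tgt.Id = src.Id
WHEN MATCHED AND tgt.SomeColumn <> src.SomeColumn THEN
    -- existing record changed: update the affected columns
    UPDATE SET tgt.SomeColumn = src.SomeColumn
WHEN NOT MATCHED BY TARGET THEN
    -- new record in staging: insert it
    INSERT (Id, SomeColumn) VALUES (src.Id, src.SomeColumn);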
If you want to get a copy of a table from one database to another four times a year, one way is to simply truncate and reload the target table. Don't mess around trying to update individual rows; it's often quicker to delete the lot and reload them.
You need to do some analysis and see if there is any field you can use in the source to reliably load a subset of the data. For example, if there is a 'created date' on the record, you can use that to load only the data created in the last three months (instead of all of it).
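As an illustration (a sketch only; dbo.SourceTable, dbo.TargetTable, CreatedDate, and the three-month window are assumptions, not details from the question), the two options above look roughly like this:

-- Truncate-and-reload: clear the target, then bulk load the full extract
TRUNCATE TABLE dbo.TargetTable;

-- Subset load: restrict the source query to recently created rows
SELECT *
FROM dbo.SourceTable
WHERE CreatedDate >= DATEADD(MONTH, -3, GETDATE());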
If you add some more info then we can be more specific about a solution.
e.g.:
- This has to be fast, but we have lots of disk space
- This must be simple to maintain and can take a day to load
Also... I assume the source and target are SQL Server? If so, is it the Enterprise edition?
I am working on a project where I am storing data in a SQL Server database for data mining. I'm at the first step of data mining: collecting data.
All the data is currently stored in a SQL Server 2008 database, in a couple of different tables at the moment. The table gains about 100,000 rows per day.
At this rate the table will have more than a million records in about a month's time.
I am also running certain SELECT statements against these tables to get up-to-the-minute real-time statistics.
My question is how to handle such a large amount of data without impacting query performance. I have already added some indexes to help with the SELECT statements.
One idea is to archive the database once it hits a certain number of rows. Is this the best solution going forward?
Can anyone recommend the best way to handle such data, keeping in mind that down the road I want to do some data mining if possible?
Thanks
UPDATE: I have not researched enough to decide what tool I will use for data mining. My first order of business is to collect the relevant information, and then do the data mining.
My question is how to manage the growing table so that running SELECT statements against it does not cause performance issues.
What tool will you be using to data mine? If you use a tool that works off a relational source, then you check the workload it is submitting to the database and optimise based on that. So you don't know what indexes you'll need until you actually start doing data mining.
If you are using the SQL Server data mining tools, then they pretty much run off SQL Server cubes (which pre-aggregate the data). So in this case you want to consider which data structure will allow you to build cubes quickly and easily.
That data structure would be a star schema. But there is additional work required to get the data into a star schema, and in most cases you can build a cube off a normalised/OLTP structure OK.
So assuming you are using SQL Server data mining tools, your next step is to build a cube of the tables you have right now and see what challenges you have.
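For reference, a star schema just means the measures sit in a fact table keyed to descriptive dimension tables. A minimal sketch with hypothetical names (nothing here comes from the original question):

-- Hypothetical dimension table holding descriptive attributes
CREATE TABLE dbo.DimCustomer (
    CustomerKey  INT IDENTITY(1, 1) PRIMARY KEY,
    CustomerName NVARCHAR(100) NOT NULL
);

-- Hypothetical fact table: one row per event, referencing the dimension
CREATE TABLE dbo.FactActivity (
    ActivityDate DATE NOT NULL,
    CustomerKey  INT NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
    Amount       DECIMAL(18, 2) NOT NULL
);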
I'm trying to replicate data between 2 MySQL databases. The issue is that only some rows need to be transferred to the second MySQL server, based on specific criteria.
I have 2 MySQL servers. The first one is intranet only; there is an application that reads/writes to it. The second MySQL server is online, and the application connecting to it is read-only.
I need to find a way to get the data from the first server to the second based on specific criteria (some rows are labeled as private and should not be synchronized). I tried to do it with a trigger on the first server (a trigger on insert/update), but I have way too many tables; it's very time-consuming to do it like that.
What approaches do I have? Dumping the entire data set is not an option, as there will be a lot of records and the online server cannot afford to go offline just to get the information. Add to that the fact that not all the records are for public use.
1 - Disable replication.
2 - On the intranet server, create an empty database and a view based on a query that selects exactly the rows you want to replicate to your internet server (see the sketch after this list).
3 - Replicate the new database (the one containing the view) to a new database on your internet server.
4 - On your internet server, you can cron a script that inserts the new rows into your desired table; think about using dumps and LOAD DATA INFILE, it should go very quickly.
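A minimal sketch of step 2, assuming a hypothetical intranet table app_db.records with an is_private flag (all names are illustrative):

-- Database that will hold only the objects safe to replicate
CREATE DATABASE IF NOT EXISTS replica_feed;

-- View exposing only the rows allowed to leave the intranet
CREATE OR REPLACE VIEW replica_feed.public_records AS
SELECT id, title, updated_at
FROM app_db.records
WHERE is_private = 0;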
I have an SSIS package that connects to a MySQL server and attempts to pull data from different tables and insert the rows into a SQL Server 2005 database.
One issue I notice is that any given time it runs, regardless of what step it is on, it almost always fails to bring in the total number of records from MySQL into SQL Server.
There are no errors thrown.
One morning it will have all 11M records and on the next run anywhere between 3K and 17K records.
Anyone notice anything like this?
I import data from two separate MySQL databases -- one over the Internet and one in-house. I've never had this type of issue. Can you describe how you set up your connection to the MySQL database? I used the ODBC driver available on the MySQL website and connect using an ADO.NET data source in my data flow that references the ODBC connection.
One possible way you could at least prevent yourself from loading incomplete data is to load only new records. If the source table has an ID and the records never change once they are inserted, then you could feed in the maximum ID already loaded by checking your destination database first.
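For example (a sketch; dbo.TargetTable and the Id column are assumptions): query the destination for the highest ID already loaded and feed it into the MySQL source query through an SSIS variable:

-- Run against the SQL Server destination before the data flow starts
SELECT ISNULL(MAX(Id), 0) AS MaxLoadedId
FROM dbo.TargetTable;

-- The MySQL source query then only pulls rows above that value, e.g.
-- SELECT * FROM source_table WHERE id > <MaxLoadedId from the SSIS variable>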
Another possible way to prevent loading incomplete data is to load the MySQL data into a staging table on your destination server and then only load the records you haven't already loaded.
Yet another way to do it is to load the data into a staging table, verify the record count is greater than some minimum threshold (such as the row count of the target table, or the expected minimum number of transactions per day), and only commit the changes after this validation. If the rows are insufficient, then raise an error on the package and send a notification email. The advantage of raising an error is that you can set your SQL Server Agent job to retry the step for a defined number of attempts to see if this resolves the issue.
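A sketch of that validation, with illustrative table names, which could sit in an Execute SQL Task after the staging load and before the final insert:

-- Abort the package if the staged row count looks too low
DECLARE @StagingRows INT, @TargetRows INT;
SELECT @StagingRows = COUNT(*) FROM dbo.StagingTable;
SELECT @TargetRows  = COUNT(*) FROM dbo.TargetTable;

IF @StagingRows < @TargetRows
    RAISERROR('Staging has %d rows but the target already has %d; aborting the load.',
              16, 1, @StagingRows, @TargetRows);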
Hope these tips help even if they don't directly address the root cause of your problem.
I've only tried MySQL -> SQL Server via SSIS once, but the error I found was related to MySQL datetimes not converting to SQL Server datetimes. I would have thought this would break the whole data flow, but depending on your configuration you could have set it to simply ignore bad rows?