I have one production server which will store data from day 1 (latest data) up to day 90.
I will move day 91 data to the reporting server every day, as new data arrives on the production server.
My reporting server will keep 365 days of data.
Production will keep 90 days data.
There are still some daily updates in production across the full 90 days of data. How should I synchronize those changes in the production data (90 days) with my reporting data (365 days)?
Please advise.
And for importing the day 91 data into reporting, is the SSIS Import Wizard the best way to do it?
Thanks in advance.
No, don't use the SSIS wizard. You cannot achieve what you want through the wizard.
You'll need to use something to move the data. If the two databases are on the same server, you don't need SSIS; you can just use INSERT/SELECT SQL statements to move the data. If the DBs are on different servers (or are expected to be in the future), then you need an ETL tool, of which SSIS may be your best option.
I suggest you store ALL data in your reporting database, i.e. day 1 to 365. Then you do all your reporting from the reporting database instead of trying to stitch the two databases together.
How do you identify day 91? Is there a single field in the source you can use to do this?
The simplest approach is a rolling window. You delete day 0 to, say, day 20 in your reporting database, then you load that same window over the top from production.
The other approach is full CDC (change data capture), but if you have a reliable 'age' field you can use, that won't be necessary.
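A minimal sketch of that rolling-window reload, assuming both databases sit on the same instance and a TradeDate column identifies the day (all object and column names here are placeholders):

    DECLARE @WindowStart date = DATEADD(DAY, -20, CAST(GETDATE() AS date));

    BEGIN TRANSACTION;

    -- Drop the window that may still be changing in production...
    DELETE FROM Reporting.dbo.Sales
    WHERE TradeDate >= @WindowStart;

    -- ...and reload that same window over the top from production.
    INSERT INTO Reporting.dbo.Sales (TradeDate, OrderID, Amount)
    SELECT TradeDate, OrderID, Amount
    FROM Production.dbo.Sales
    WHERE TradeDate >= @WindowStart;

    COMMIT TRANSACTION;

Scheduled nightly (SQL Server Agent is the usual home for it), this keeps the most recent 20 days in step while the older reporting history is left untouched.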
Related
I have a live shop database. I need to be able to make copies of this database, but with order data only for the last 14 days. The database can be really big, but almost 80 percent of the data sits in the order, payment and related tables. So we want to copy only the last 14 days of data from those tables, and all data from the other tables. How can this be implemented?
To me it sounds like a classic ETL job. You could use any programming language (like Python) or KNIME to read from the source db (with an SQL query whose WHERE clause looks like your_date_column >= CURDATE() - INTERVAL 14 DAY) and write to a sink db.
You can then run it as a (cron) job on Windows/Linux and create a backup of the last 14 days each day, but make sure you also delete/drop the older backups if their total size gets too big.
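A minimal sketch of the extract side of that job, assuming MySQL, a hypothetical created_at column, and that the copy database lives on the same server (otherwise the same queries run from the Python/KNIME job across two connections); all table names are placeholders:

    -- Big, fast-growing tables: only the last 14 days.
    INSERT INTO copy_db.orders
    SELECT * FROM live_db.orders
    WHERE created_at >= CURDATE() - INTERVAL 14 DAY;

    INSERT INTO copy_db.payments
    SELECT * FROM live_db.payments
    WHERE created_at >= CURDATE() - INTERVAL 14 DAY;

    -- Everything else (reference data): copy in full.
    INSERT INTO copy_db.products
    SELECT * FROM live_db.products;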
I am an ex MultiValue developer who over the last 6 months has been thrust into the world of SQL, and I apologise in advance for the length of the question. So far I have got by with general instinct (maybe some ignorance!) and a lot of help from the good people on this site answering questions previously asked.
First some background …
I have an existing reporting database (SQL Server) and a new application (using MySQL) that I am looking to copy data from at 30-minute, hourly or daily intervals (the choice will be based on reporting needs). I have a linked server created so that I can see the MySQL database from SQL Server, and I have the relevant privileges on both databases to do reads/writes/updates etc.
The data that I am looking to move to reporting on the 30-minute or hourly schedule is typically header/transaction in nature, and it has both created and modified date/time stamp columns available for use.
Looking at the reporting DB's other feeds, MERGE is the statement used most frequently across linked servers, but to other SQL Server databases. The MERGE statements also seem to do a full table-to-table comparison, which in some cases takes a while (>5 mins) to complete. Whilst MERGE seems to be a safe option, I do notice a performance hit on reporting whilst the larger tables are being processed.
Looking at delta loads only, using dynamic date ranges (e.g. between -1 hour:00:00 and -1 hour:59:59) on the created and modified timestamps, my concern is that the failure of any one job execution could leave the databases out of sync.
Rather than initially ask for specific SQL statements, what I am looking for is a general approach/statement design for the more regularly (hourly) executed statements, the ideal being to perform delta loads of just the new or modified rows safely over a SQL Server to MySQL connection.
I hope the information given is sufficient and any help/suggestions/pointers to reading material gratefully accepted.
Thanks in advance
Darren
I have done a bit of “playing” over the weekend.
The approach I have working pulls the data (inserts and updates) from MySQL via openquery into a CTE. I then merge the CTE into the SQL Server table.
The openquery seems slow (by comparison to other linked tables) but the merge is much faster due to limiting the amount of source data.
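For anyone landing on the same problem, here is a minimal sketch of that OPENQUERY-into-MERGE pattern, with a stored watermark so a failed run is simply picked up by the next one; the MYSQL_LINK linked server, the shop.orders source, and the dbo.Orders / dbo.LoadWatermark tables are all placeholders:

    DECLARE @LastLoad datetime =
        (SELECT LastModified FROM dbo.LoadWatermark WHERE TableName = 'Orders');

    ;WITH src AS
    (
        -- Pull the remote rows; the filter runs on the SQL Server side because
        -- OPENQUERY cannot take parameters.
        SELECT *
        FROM OPENQUERY(MYSQL_LINK,
            'SELECT order_id, order_date, amount, modified_at FROM shop.orders')
        WHERE modified_at > @LastLoad
    )
    MERGE dbo.Orders AS tgt
    USING src
        ON tgt.OrderID = src.order_id
    WHEN MATCHED THEN
        UPDATE SET tgt.OrderDate  = src.order_date,
                   tgt.Amount     = src.amount,
                   tgt.ModifiedAt = src.modified_at
    WHEN NOT MATCHED THEN
        INSERT (OrderID, OrderDate, Amount, ModifiedAt)
        VALUES (src.order_id, src.order_date, src.amount, src.modified_at);

    -- Only move the watermark forward once the merge has succeeded.
    UPDATE dbo.LoadWatermark
    SET LastModified = (SELECT MAX(ModifiedAt) FROM dbo.Orders)
    WHERE TableName = 'Orders';

Note this still drags the whole remote table across the link before filtering; if that turns out to be the slow part, building the OPENQUERY string dynamically with the watermark embedded lets MySQL do the filtering instead.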
In order to log values and work on strategy for a student race car, I am working on a system that logs sensor values and stores them in a database. We would like to read these values over a WiFi connection, but this connection is not very stable. Besides that, we really want to be sure that the computer in the car has no downtime and is energy efficient.
Our current design idea consists of a Raspberry Pi (or BeagleBone or equivalent) in the car, and a server in the pits. In the car we would like to log the data, and when we have a connection this should be synchronized to the server in the pits. In the pits there are about 10 laptops that connect to this server, where we would like to receive real-time data if available, and otherwise the historical data.
MySQL replication looks like the way to go, with the car as the master and the server in the pits as the slave. The downside of this is that both computers need to hold the same data (correct?). We would like the car to have only today's data, to keep the database on the Raspberry Pi small. On the other side, we want a complete archive of racing days on the server in the pits, where we should be able to easily select data from the past 8 days and plot it.
I think we have the following options:
Complete replication. Synchronize all data from the car to the pit lane. Easy to implement, but hard on the Raspberry Pi and not feasible for a lot of data.
Replication per day. Replicate the data between the car and the server. On the server we have an archive; at the end of the day we copy the data from the replication database to the archive database. This makes it difficult to select data across days.
Custom replication. We make a script on the pit-lane server that connects to the car and compares its data with the database in the pit lane. If there are entries with a newer timestamp, it copies them to the archive (sketched below, after this post). Custom scripts are less reliable than built-in functions, and maybe more intensive for the Raspberry Pi, which could result in downtime.
Is there a better way to do it? For example, could the server in the car automatically clear its database when synchronisation is complete, while the auto IDs of the rows keep counting?
Thank you for thinking with me!
Bart
ps. To give an idea of the data, we have 70 values at 1Hz, 20 values at 5Hz, and 15 values at 20Hz (Double or INT).
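A rough sketch of what that custom-replication option could boil down to on the pit-lane server, assuming a hypothetical sensor_log table (auto-increment id, recorded_at, sensor_name, value) on both sides and that the FEDERATED engine is enabled so the car's table can be reached remotely; every name, host and credential below is a placeholder:

    -- One-off: a FEDERATED "window" onto the table living on the Pi in the car.
    CREATE TABLE archive.car_sensor_log (
        id          BIGINT       NOT NULL,
        recorded_at DATETIME(3)  NOT NULL,
        sensor_name VARCHAR(64)  NOT NULL,
        value       DOUBLE       NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE=FEDERATED
      CONNECTION='mysql://sync_user:sync_pass@car-pi:3306/cardb/sensor_log';

    -- Periodic job: pull only rows the archive has not seen yet,
    -- using the auto-increment id as the high-water mark.
    SET @last_id = (SELECT COALESCE(MAX(id), 0) FROM archive.sensor_log);

    INSERT INTO archive.sensor_log (id, recorded_at, sensor_name, value)
    SELECT id, recorded_at, sensor_name, value
    FROM archive.car_sensor_log
    WHERE id > @last_id;

    -- Optional: trim rows on the car that are safely archived, so the Pi's
    -- database stays small; AUTO_INCREMENT on the car keeps counting.
    DELETE FROM archive.car_sensor_log
    WHERE id <= (SELECT MAX(id) FROM archive.sensor_log);

If FEDERATED is not an option, the same three steps can run from a small cron script holding one connection to the car and one to the archive.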
I am working on a project where I am storing data in a SQL Server database for data mining. I'm at the first step of data mining: collecting data.
All the data is currently stored in a SQL Server 2008 db, in a couple of different tables at the moment. The table gains about 100,000 rows per day.
At this rate the table will have more than a million records in about a month's time.
I am also running certain SELECT statements against these tables to get up-to-the-minute, real-time statistics.
My question is how to handle such a large volume of data without impacting query performance. I have already added some indexes to help with the SELECT statements.
One idea is to archive the database once it hits a certain number of rows. Is this the best solution going forward?
Can anyone recommend the best way to handle such data, keeping in mind that down the road I want to do some data mining if possible?
Thanks
UPDATE: I have not researched enough to decide what tool I would use for data mining. My first order of business is to collect the relevant information, and then do the data mining.
My question is how to manage the growing table so that running selects against it does not cause performance issues.
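A minimal sketch of the archiving idea from the question, assuming a hypothetical dbo.Readings table with a CollectedAt datetime column and a matching dbo.Readings_Archive table (names and the 90-day cutoff are placeholders):

    DECLARE @Cutoff datetime = DATEADD(DAY, -90, GETDATE());

    BEGIN TRANSACTION;

    -- Move the cold rows out so the hot table stays small for the
    -- up-to-the-minute SELECT statements.
    INSERT INTO dbo.Readings_Archive (ReadingID, CollectedAt, SensorValue)
    SELECT ReadingID, CollectedAt, SensorValue
    FROM dbo.Readings
    WHERE CollectedAt < @Cutoff;

    DELETE FROM dbo.Readings
    WHERE CollectedAt < @Cutoff;

    COMMIT TRANSACTION;

On SQL Server 2008 this works on any edition; if you have Enterprise, table partitioning by date does the same job with a metadata-only partition switch instead of large INSERT/DELETE batches.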
What tool will you be using to data mine? If you use a tool that works off a relational source, then you check the workload it is submitting to the database and optimise based on that. So you won't know what indexes you'll need until you actually start doing data mining.
If you are using the SQL Server data mining tools, then they pretty much run off SQL Server cubes (which pre-aggregate the data). So in this case you want to consider which data structure will allow you to build cubes quickly and easily.
That data structure would be a star schema. But there is additional work required to get the data into a star schema, and in most cases you can build a cube off a normalised/OLTP structure OK.
So assuming you are using SQL Server data mining tools, your next step is to build a cube of the tables you have right now and see what challenges you have.
I would like to ask for help on how it would be best to replicate 4 tables from our OLTP production database into another database for reporting and keep the data there forever.
Our OLTP database cleans up data older than 3 months, and now we have a requirement that 4 of the tables in that OLTP database be replicated to another database for reporting, where data should never be removed from those tables.
The structure of the tables is not optimal for reporting, so once we have replicated/copied the tables over to the reporting database, we would select from those tables into new tables with slightly fewer columns and slightly different data types (e.g. they are using the money data type for date values in a few columns).
It is enough if the data is replicated/copied on a nightly basis, but it can be more frequent.
I know the information I am providing here is not detailed, but it is a rough description of what I have at the moment. Hopefully it is enough for someone to offer me some suggestions/ideas.
Any suggestions for a good solution that puts the least amount of load on the OLTP database are highly appreciated.
Thanks in advance!
Have staging tables where you load new data (e.g. each night you can send data over for the previous day), then you can insert with transformations into the main history table on the reporting server (then truncate the staging table). To limit impact on the OLTP server, you can use backup / restore or log shipping and pull the data from a copy of the production database. This will also have the added benefit of thoroughly testing your backup/restore process.
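A minimal sketch of that staging-to-history step on the reporting side, assuming a hypothetical staging.Orders table that is loaded each night and a reporting.OrdersHistory target; the column list and the CAST stand in for the "slightly fewer columns / different data types" transformations the poster described:

    -- Runs nightly on the reporting server after staging.Orders has been loaded.
    BEGIN TRANSACTION;

    -- Insert with transformations: fewer columns, corrected data types,
    -- and append-only so history is never removed.
    INSERT INTO reporting.OrdersHistory (OrderID, OrderDate, Amount)
    SELECT s.OrderID,
           CAST(s.OrderDate AS date),   -- whatever conversion the source type needs goes here
           s.Amount
    FROM staging.Orders AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM reporting.OrdersHistory AS h
                      WHERE h.OrderID = s.OrderID);

    -- Clear the staging table ready for the next night's load.
    TRUNCATE TABLE staging.Orders;

    COMMIT TRANSACTION;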
Some others might suggest SSIS. I think it is overkill for a lot of these scenarios, but YMMV.