OK, so I am trying to build a simple data warehouse to run a report against.
I have three tables that I managed to import from two CSV files and a view in a separate database.
I want to split the columns (none of which have any keys set up yet) out into dimension and fact tables.
Should I go about this by building one big script that builds all the tables and creates keys for each one, or is there an easier way?
The obstacles around the import took me days to get around, so any help would be much appreciated.
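To make it concrete, the kind of script I have in mind would look something like this (a minimal T-SQL sketch; all the table and column names here are made up, since my real columns aren't keyed yet):

    -- Dimension with a surrogate key
    CREATE TABLE dbo.DimCustomer (
        CustomerKey  INT IDENTITY(1,1) PRIMARY KEY,
        CustomerName NVARCHAR(100) NOT NULL
    );

    -- Fact table referencing the dimension
    CREATE TABLE dbo.FactSales (
        SalesKey    INT IDENTITY(1,1) PRIMARY KEY,
        CustomerKey INT NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
        SaleAmount  DECIMAL(18,2) NOT NULL,
        SaleDate    DATE NOT NULL
    );

    -- Populate the dimension from the imported staging table, then load
    -- the fact rows by joining back to pick up the surrogate key
    INSERT INTO dbo.DimCustomer (CustomerName)
    SELECT DISTINCT CustomerName FROM dbo.StagingSales;

    INSERT INTO dbo.FactSales (CustomerKey, SaleAmount, SaleDate)
    SELECT d.CustomerKey, s.SaleAmount, s.SaleDate
    FROM dbo.StagingSales AS s
    JOIN dbo.DimCustomer AS d ON d.CustomerName = s.CustomerName;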
I am using SSIS to move data from a local MSSQL Server table to a remote MySQL table (a data flow with an OLE DB source and an ODBC destination). This works fine if I'm only moving 2 rows of data, but it is very slow with the table I actually want, which has 5,000 rows and fits in a CSV of about 3 MB; this currently takes about 3 minutes using SSIS's options, whereas performing the steps below takes 5 seconds at most.
I can export the data to a CSV file, copy it to the remote server, and then run a script to import it straight into the DB, but this requires more steps than I would like, as I have multiple tables I wish to perform these steps on.
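For reference, the fast manual path boils down to something like this on the MySQL side (a sketch; the file path, table name, and delimiter options are placeholders for my actual setup):

    -- MySQL import script run on the remote server after copying the CSV over
    LOAD DATA LOCAL INFILE '/tmp/mytable.csv'
    INTO TABLE mytable
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;  -- skip the header row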
I have tried row-by-row and batch processing, but both are very slow in comparison.
I know I can use the steps above, but I like using the SSIS GUI and would have thought there was a better way of tackling this.
I have googled multiple times but have not found anything that fits the bill, so I am calling on external opinions.
I understand SSIS has its limitations, but I would hope there is a better and faster way of achieving what I am trying to do. If SSIS is so bad, I may as well just rewrite everything as a script and be done with it, but I like the look and feel of the GUI and would like to move my data in this nice, friendly way of seeing things happen.
Any suggestions or opinions would be appreciated.
Thank you for your time.
As above, I have tried the SSIS options, including a third-party component, CozyRoc, but now and again it sent some data with errors (the column delimiting seemed off), copied differing numbers of rows, and caused enough problems that I don't trust the data.
I have run into a weird situation.
Basically, I have two data flow tasks in my SSIS package, and both load data into the same staging table. Each one uses a Script Component as its data source and a StreamReader on a different file.
If I enable both tasks at the same time, the second data flow task loads the same data as the first one. But if I disable the first and leave only the second enabled, it loads the correct file as expected.
I am not sure what I did wrong, since each StreamReader is defined on a file with a different name; the only common part is that they load to the same destination.
And these two tasks do not run in parallel; they are constrained to run in sequence.
Any suggestions or advice? Thanks in advance.
Thanks for the comments. I actually found out why they load the same data when both connections are open at the same time: I copied the first Script Component to create the second one, so the two shared the same task ID within the container. I tested multiple times, and it turned out that if the task IDs are the same, the second task does whatever the first one does; it is basically a replica of the first task, even though the code is different.
Please correct me if my findings are wrong, but that is what my tests have shown so far.
I'm attempting to take stock data pulled from Google and create tables for each ticker to record historical market data in Access. I can easily import the delimited text data into Access; the problem is that I am pulling multiple tickers in one pull. When imported, the data is vertical, as such:
I know how to do this easily in Excel, yet I am having the worst time figuring out how to automate it in Access. The reason I am attempting to automate it is that the database will be pulling this data and updating it every 15 minutes for over 300 ticker symbols. Essentially, in the example above, I need to find 'CVX' and then, in a new table, have it list out the data below it horizontally, like so:
I have been searching online and am literally going bananas because I can't figure out how to do this (which would be simple in Excel). Does anyone have experience manipulating data in this way, or know of any potential solutions?
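From what I can tell, the closest Access analogue to an Excel pivot is a crosstab query; a sketch, assuming the vertical import lands in a hypothetical table tblQuotes with columns Ticker, FieldName, and FieldValue:

    TRANSFORM First(q.FieldValue)
    SELECT q.Ticker
    FROM tblQuotes AS q
    GROUP BY q.Ticker
    PIVOT q.FieldName;

Each distinct FieldName becomes its own column, with one row per ticker, which is the horizontal layout described above.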
After some more research, I realized the data I was getting is in JSON format. After digging a little more, I was able to find online converters. This one worked particularly well. After converting the file to CSV, it was easy to import the data into Access.
I have 10 tables I am importing into another SQL Server database using SSIS.
Do I have to create 10 different data flow tasks, or can I proceed with one data flow task and add the 10 tables to it?
I have tried to use a single data flow task, but it seems to allow only a single table.
Do all the source tables share one common schema?
Do all the destination tables share one common schema (which doesn't have to be the same as the common schema for the source tables)?
If the answer to both questions is "yes", then you can in fact write a single Data Flow Task (whose connection managers are parameterized) and put it in a Foreach Loop container.
If the answer to either (or both) of those questions is "no", then you'll have to have separate sources and destinations. You might want to investigate Business Intelligence Markup Language as a way to generate those data flows automatically, although it's probably overkill for "only" ten tables.
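If you do go the Foreach Loop route, the list of tables can come from a simple metadata query against the source; a sketch (the schema filter is an assumption):

    -- One row per table to transfer; feed the full result set into a
    -- Foreach Loop (ADO enumerator) and map each name to a package variable
    SELECT s.name + '.' + t.name AS TableName
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    WHERE s.name = 'dbo';  -- adjust to the schema your tables live in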
The answer depends on you, your best practices, and how many developers you will have working on the project at the same time.
It is entirely possible to put more than one set of tables in a single dataflow. You can simply add additional sources and destinations to your dataflow. However, this is almost never a good idea as it adds to the maintenance effort later in the lifecycle of your project. It makes it more difficult to find and debug errors. It makes the entire project more complex.
If you are working alone and you will be building and maintaining this project's full lifecycle by yourself, then by all means do whatever you feel most comfortable with.
If you are in a group that may all maintain this project, then I would suggest that you, at a minimum, break the data flows for the different tables out into separate data flow tasks.
If you are in a larger group and for more flexibility in maintenance, I would suggest that each dataflow be broken out into a different package (assuming 2008 or below. I have not played with the 2012 project models yet, so won't comment on them here), so that each can be worked on by different developers simultaneously. (I would actually recommend coding this way even if you are the only one on the project, but that is just the style I have developed over my career.)
Being an SSIS newbie, I am trying to figure out the best possible way to import multiple tables from one database to another. I could write a separate parallel data flow for each table; however, I want to be smart about it.
For each of the tables, if I were to generalize, I need to (one approach to the logging is sketched after this list):
- transfer rows from one table to a table in another database
- count the number of rows transferred
- record the start and finish time of the data transfer for each table
- record any errors
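One pattern I'm considering for the counting and timing is a small audit table, populated by Execute SQL Tasks around each data flow, with a Row Count transformation feeding an SSIS variable. A sketch, with all names made up:

    -- Audit table for per-table transfer logging (hypothetical names)
    CREATE TABLE dbo.TransferLog (
        LogId        INT IDENTITY(1,1) PRIMARY KEY,
        TableName    SYSNAME        NOT NULL,
        StartTime    DATETIME2      NOT NULL,
        EndTime      DATETIME2      NULL,
        RowsMoved    INT            NULL,
        ErrorMessage NVARCHAR(4000) NULL
    );

    -- Execute SQL Task before the data flow (? maps to a table-name variable)
    INSERT INTO dbo.TransferLog (TableName, StartTime)
    VALUES (?, SYSDATETIME());

    -- Execute SQL Task after the data flow; RowsMoved comes from a Row Count
    -- transformation, LogId from SCOPE_IDENTITY() captured at insert time
    UPDATE dbo.TransferLog
    SET EndTime = SYSDATETIME(), RowsMoved = ?
    WHERE LogId = ?;

Errors could be recorded the same way from an OnError event handler, so everything stays visible at the SSIS level.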
I am trying not to use stored procedures, since I don't want people to have to dig deep into the DB to get the rules for this transformation. I would ideally like to have this done at the SSIS level, using components that can therefore be seen visually and understood.
Are there any best practices that people have used before?
I would ideally want to do something like
foreach (table in list of tables to transfer)
transfer (table name)
To make a generic table handler you would have to construct the data flow programmatically; AFAIK SSIS has no auto-introspection facility. A Script Task will allow you to do this, and you can get the table metadata from the source, but it means fiddling with the API.
I have worked on a product where this was done, although I didn't develop that component, so I can't offer words of wisdom off the top of my head as to how to do it. However, you can find resources on the web that explain how to do it.
You can find the table structure and the column types by querying the system data dictionary. See this posting for some links to resources describing how to do this, including a link to a code sample.
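For example, the column metadata is exposed through the standard INFORMATION_SCHEMA views ('dbo' and 'YourTable' below are placeholders):

    -- Column names and types for a given source table
    SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'dbo'
      AND TABLE_NAME = 'YourTable'
    ORDER BY ORDINAL_POSITION;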
What is your destination database doing with this info? Is it simply reading it?
Perhaps you would be best served by replicating the tables.
You could create a config table that holds the list of tables you want to move, and then use a foreach loop to do something repeatedly... but what to do inside the loop?
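The config table itself would be trivial (a sketch with made-up names); the hard part is what happens inside the loop:

    -- One row per table to move; the foreach loop reads this list
    CREATE TABLE dbo.TablesToTransfer (
        TableName SYSNAME NOT NULL PRIMARY KEY,
        IsEnabled BIT     NOT NULL DEFAULT 1
    );

    SELECT TableName FROM dbo.TablesToTransfer WHERE IsEnabled = 1;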
http://blogs.conchango.com/jamiethomson/archive/2005/02/28/SSIS_3A00_-Dynamic-modification-of-SSIS-packages.aspx
Below the bullet points, he states that SSIS packages cannot be modified to change metadata at run time. And if the goal is to make this easy to maintain... you're going in the wrong direction.
I'd keep it simple: use the wizard, and then customize with logging, notifications, etc.
Maybe you can call a stored procedure inside your SSIS scripts. Here is an example of how you might be able to use the SP:
http://blog.sqlauthority.com/2012/10/31/sql-server-copy-data-from-one-table-to-another-table-sql-in-sixty-seconds-031-video/
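That link covers copying data from one table to another; the core of it is typically an INSERT ... SELECT, which an Execute SQL Task can run directly (the table names below are placeholders):

    -- Copy rows across databases on the same server
    INSERT INTO TargetDb.dbo.TargetTable (Col1, Col2)
    SELECT Col1, Col2
    FROM SourceDb.dbo.SourceTable;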