I am loading multiple Excel files into multiple SQL Server tables.
The Excel files contain city, state, and similar data. Is it possible to create a single flat file connection? Currently I create one connection for city, one for state, and one for each of the other sources. I have attached a screenshot for better understanding.
If all of the source files are structured exactly the same (number of columns, data types, header rows, etc.), it would be possible to re-use a single flat file connection.
In general, though, the best practice is to create a connection for each file. It makes the package easier to understand in a year or two when you, or someone else, has to open it up again to fix or update it. It will also make it much easier to fix later if one or more of the files ends up changing (new columns, etc.).
I am using SSIS 2017, and part of what I am doing involves running several (30ish) SQL scripts whose output goes into flat files in the same folder. My question is: do I have to create 30 New File Connections to do this, or is there a way to define the folder I want all the outputs to go to and have them saved there?
I am only really thinking of keeping a tidy Connection Manager tab. If there's a more efficient way to do it than 30-something file connections, that would be great.
A data flow is tightly bound to the columns and types defined within for performance reasons.
If your use case is "I need to generate an extract of sales by year for the past 30ish" then yes, you can make do with a single Flat File Connection Manager because the columns and data types will not change - you're simply segmenting the data.
However, if your use case is "I need to extract Sales, Employees, Addresses, etc" then you will need a Flat File Connection Manager (and preferably a data flow) per entity/data shape.
It's my experience that you would be nicely served by designing this as 30ish packages (SQL Source -> Flat File Destination) with an overall orchestrator package that uses Execute Package Task to run the dependent processes. Top benefits:
You can have a team of developers work on the individual packages
Packages can be re-run individually in the event of failure
Better performance
Being me, I'd also look at Biml and see whether you can't just script all that out.
Addressing comments
To future-proof location info, I'd define a project parameter named something like BaseFilePath. This assumes the most probable change is the folder: in dev I use paths like C:\ssisdata\input\file1.txt and C:\ssisdata\input\file3.csv, while production would be \\server\share\input\file1.txt or E:\someplace\for\data\file1.txt. I would populate the parameter with the dev value C:\ssisdata\input and then assign the production value \\server\share\input to the project via configuration.
The crucial piece is to ensure that an Expression exists on the Flat File Connection Manager's ConnectionString property so that it is driven, in part, by the parameter's value. Again, being a programmatically lazy person, I have a Variable named CurrentFilePath with an expression like @[$Project::BaseFilePath] + "\\file1.csv"
The FFCM then uses @[User::CurrentFilePath] to ensure I write the file to the correct location. And since I create one package per extract, I don't have to worry about creating a Variable per flat file connection manager, as it's all the same pattern.
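As a rough illustration of the same idea outside SSIS, the sketch below builds the full file path from a per-environment base folder plus a fixed file name. It is written in Python only to show the logic; the dictionary, the environment names "dev" and "prod", and the function name are hypothetical and simply stand in for the BaseFilePath parameter and the CurrentFilePath variable described above.

```python
# Hypothetical per-environment values standing in for the BaseFilePath
# project parameter (dev value set in the project, prod value via configuration).
BASE_FILE_PATH = {
    "dev": r"C:\ssisdata\input",
    "prod": r"\\server\share\input",
}


def current_file_path(environment: str, file_name: str) -> str:
    """Mimic the CurrentFilePath expression: base folder + "\\" + file name."""
    return BASE_FILE_PATH[environment] + "\\" + file_name


if __name__ == "__main__":
    print(current_file_path("dev", "file1.csv"))
    print(current_file_path("prod", "file1.csv"))
```

Only the base folder changes between environments, so each package needs just one expression of this shape.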
I need to load a directory of different files (Excel and CSV) without any relation between them in multiple tables on database, every file must be loaded in its own table without any transformation.
I tried to do this using tFileList ==> tFileInputExcel ==> tMysqlOutput, but it doesn't work because I need a lot of outputs.
Your question is not very clear, but it seems like you want something generic enough that will work with just one flow for all your files.
You might be able to accomplish that using dynamic schemas. See here for further guidance: https://www.talendforge.org/forum/viewtopic.php?id=21887. You will probably need at least 2 flows, one for the CSV files and one for the XLS files. You can filter the files for each flow by their extension in the tFileList component.
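The extension-based split mentioned above can be sketched as follows. This is plain Python rather than Talend, shown only to make the routing logic concrete; the extension sets and the function name are assumptions, and in a real job the filtering would be done by the file mask on each tFileList component.

```python
from pathlib import Path

# Assumed extension sets; adjust to whatever the directory actually contains.
CSV_EXTENSIONS = {".csv"}
EXCEL_EXTENSIONS = {".xls", ".xlsx"}


def split_by_extension(file_names):
    """Route each file name to the CSV flow, the Excel flow, or neither."""
    csv_files, excel_files, other_files = [], [], []
    for name in file_names:
        extension = Path(name).suffix.lower()  # case-insensitive comparison
        if extension in CSV_EXTENSIONS:
            csv_files.append(name)
        elif extension in EXCEL_EXTENSIONS:
            excel_files.append(name)
        else:
            other_files.append(name)
    return csv_files, excel_files, other_files
```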
But if you are new to Talend, I encourage you to avoid this approach. It might be very hard to understand and use dynamic schemas. Instead, I would recommend you have one flow for each file.
I am still learning SQL Server.
The scenario is that I have a lot of .txt files with name format like DIAGNOSIS.YYMMDDHHSS.txt and only the YYMMDDHHSS is different from file to file. They are all saved in folder Z:\diagnosis.
How could I write a stored procedure to upload all .txt files with a name in the format of DIAGNOSIS.YYMMDDHHSS.txt in folder Z:\diagnosis? Files can only be loaded once.
Thank you
I would not do it using a stored proc. I would use SSIS. It has a Foreach Loop container with a file enumerator that you can use. When a file has been loaded, I would move it to an archive location so that it doesn't get processed the next time. Alternatively, you could create a table that stores the names of the files that were successfully processed and have the loop skip any file in that table. But then you just keep getting more and more files to loop through, so it is better to move processed ones to a different location if you can.
Personally, I would also put the file data in a staging table before loading it to the final table. We use two of them, one for the raw data and one for the cleaned data. Then we transform to staging tables that match the relational tables in production, to make sure the data will meet the needs there before trying to affect production, and we send records that can't be inserted for one reason or another to an exception table. Working in a health care environment, you will want to make sure your process meets the government regulations for storage of patient records in the country you are in, if they exist (see HIPAA in the US). You may have to load directly to production or severely limit access to the staging tables and files.
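The loop-and-archive approach described above can be sketched as follows. This is a Python illustration of the logic, not an SSIS package: the function name, the archive folder, and the load_file callback are hypothetical, and the pattern assumes the YYMMDDHHSS part of the name is exactly ten digits.

```python
import re
import shutil
from pathlib import Path

# Matches DIAGNOSIS.<10 digits>.txt, assuming YYMMDDHHSS is always ten digits.
NAME_PATTERN = re.compile(r"^DIAGNOSIS\.\d{10}\.txt$", re.IGNORECASE)


def process_folder(source_dir, archive_dir, load_file):
    """Load every matching file once, then move it out of the source folder.

    load_file is a hypothetical callback that loads one file (for example
    into a staging table). A file is archived only after load_file returns
    without raising, so a failed file stays put for the next run.
    """
    source = Path(source_dir)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)

    processed = []
    for path in sorted(source.iterdir()):
        if not path.is_file() or not NAME_PATTERN.match(path.name):
            continue  # ignore anything that is not a DIAGNOSIS file
        load_file(path)
        shutil.move(str(path), str(archive / path.name))
        processed.append(path.name)
    return processed
```

Because processed files leave the source folder, a second run over the same folder finds nothing to load, which is what keeps each file from being loaded twice.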
I have an Access database that keeps track of many different aspects of my company's performance, and I would like to add functionality to keep track of the hours the employees are working.
The hours are all kept track of on a website called timetracker. They have a few reporting options including XML and CSV files. The site has a favorite report feature to get the same data in the format that I want it every week.
What I would like to do is find the best process for getting the data from this website, into a table in my database that I can reference.
I will not be the one executing whatever process I come up with and I would really like it to be as easy as possible for whoever it is that does have to do it.
Right now I have a linked table that is an XML file in our SharePoint folder. I was thinking that maybe we could just run the report and download the file every week then just save it over the old file with the correct sheet names and it should update.
What I am wondering is if anyone can come up with an easier process for doing this that would take the least amount of time and be easiest to write down instructions for that anyone could execute.
(Would it maybe be possible to create some sort of macro to actually download the report automatically?)
I have an SSIS data flow task that reads a CSV file with certain fields, tweaks it a little and inserts results into a table. The source file name is a package parameter. All is good and fine there.
Now, I need to process a slightly different kind of CSV file with an extra field. This extra field can be safely ignored, so the processing is essentially the same. The only difference is in the column mapping of the data source.
I could, of course, create a copy of the whole package and tweak the data source to match the second file format. However, this "solution" seems like terrible duplication: if there are any changes in the course of processing, I will have to do them twice. I'd rather pass another parameter to the package that would tell it what kind of file to process.
The trouble is, I don't know how to make SSIS read from one data source or another depending on parameter, hence the question.
I would duplicate the Connection Manager (CSV definition) and Data Flow in the SSIS package and tweak them for the new file format. Then I would use the parameter you described to Enable/Disable either Data Flow.
In essence, SSIS doesn't work with variable metadata. If this is going to be a recurring pattern, I would deal with it upstream from SSIS, building a VB / C# command-line app to shred the files into SQL tables.
You could make the connection manager push all the data into 1 column. Then use a script transformation component to parse out the data to the output, depending on the number of fields in the row.
You can split the data based on delimiter into say a string array (I googled for help when I needed to do this). With the array you can tell the size of it and thus what type of file it is that has been connected to.
Then, your mapping to the destination can remain the same. No need to duplicate any components either.
I had to do something similar myself once. Although the files I was using were meant to always be in the same format, the format could change depending on the version of the system sending the file. Handling it in a script transformation this way let me cope with the minor variations. If the files are the same 99% of the time, that is OK; if they were radically different, you would be better off using a separate file connection manager.
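The single-column parsing idea can be sketched as follows. In SSIS this logic would live in a script transformation written in C# or VB; the Python below only illustrates the technique. The expected field count of 3 and the assumption that the ignorable extra field comes last are both hypothetical and would need to match the real files.

```python
import csv

# Hypothetical number of fields in the original file format.
EXPECTED_FIELDS = 3


def parse_line(raw_line, delimiter=","):
    """Split one raw line and return only the fields the destination expects.

    Assumes the newer format simply appends one extra field at the end,
    which is dropped so the downstream mapping stays the same.
    """
    fields = next(csv.reader([raw_line], delimiter=delimiter), [])
    if len(fields) == EXPECTED_FIELDS:
        return fields  # original format
    if len(fields) == EXPECTED_FIELDS + 1:
        return fields[:EXPECTED_FIELDS]  # newer format: ignore the extra field
    raise ValueError(f"unexpected field count {len(fields)}: {raw_line!r}")
```

Using the csv module rather than a plain split keeps quoted values that contain the delimiter intact, and rows with any other field count fail loudly instead of being loaded silently.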