I have the following problem:
We have a case where we need to import several dozen, if not hundreds, of uniquely structured source files with SSIS. A perfect opportunity to use BIML for this. We do not have metadata information for the files as they are, so I need to capture it - though not manually, file by file.
So I thought: easy - build a table with file paths, use BIML to create a package with the source connections, let SSIS identify the metadata (maybe not 100% correctly), and persist that metadata (column name, data type, length, etc.) in a metadata table for further use.
But there seems to be no way to achieve this. While I can view the metadata and even paste it to the clipboard in SSDT, I cannot get it into the pipeline. I tried a script component as well (not my strongest skill); I can get everything BUT the column name in ProcessInput, and I can't create an output.
So: is there any known way of achieving this? Googled for several hours already to no avail.
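For what it is worth, the one place the column names do surface at run time is the component metadata rather than the data buffer: in a Script Component, the ComponentMetaData object exposes the selected input columns with their names, data types and lengths. Below is a minimal sketch of a Script Component (used as a destination) that dumps that metadata to a table; the connection string variable and the dbo.FileColumnMetadata table are assumptions, not anything from the question:

    // SSIS Script Component (destination), C#. Sketch only.
    // Reads the input column metadata exposed to the component and writes one
    // row per column to a metadata table. Note that InputColumnCollection only
    // contains the columns selected as input columns for this component.
    using System.Data.SqlClient;
    using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

    public override void PreExecute()
    {
        base.PreExecute();

        IDTSInput100 input = ComponentMetaData.InputCollection[0];

        // User::MetaDbConnectionString is a hypothetical read-only variable.
        using (var conn = new SqlConnection(Variables.MetaDbConnectionString))
        {
            conn.Open();
            foreach (IDTSInputColumn100 col in input.InputColumnCollection)
            {
                var cmd = new SqlCommand(
                    "INSERT INTO dbo.FileColumnMetadata (ColumnName, DataType, Length) " +
                    "VALUES (@name, @type, @length)", conn);
                cmd.Parameters.AddWithValue("@name", col.Name);
                cmd.Parameters.AddWithValue("@type", col.DataType.ToString()); // e.g. DT_STR
                cmd.Parameters.AddWithValue("@length", col.Length);
                cmd.ExecuteNonQuery();
            }
        }
    }

This still depends on SSIS having guessed the flat-file metadata when you configured the source, so it only gets you the columns wired into the component, not a fully automatic scan of unknown files.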
Related
I have a Data Flow with OLE DB Source, Script Component (Transformation), and Flat File Destination:
The OLE DB Source has 100+ columns. The script component is going to clean up data in each column and then output it to the Flat File Destination.
Adding output columns by hand in the Script Component is unthinkable to me.
What options do I have to mirror the output columns with the input columns in the Script Component? While the output column names will be the same, I plan to change the data type from DT_STR to DT_WSTR.
Thank you.
You are out of luck here. Possible scenarios:
Either you use the Script Component and have to key in all columns and their properties manually. In your case, you also have to set the proper data type.
Or you can create your own Custom Component, which can be programmed to create output columns based on the input columns. It is not easy and I cannot point to a simple guide, but it can be done (a rough sketch follows at the end of this answer).
This might make sense if you have to repeat similar operations in many places, so that it is not a one-time task.
Or you can create a BIML script that generates the package based on metadata. However, the metadata (the list of columns and their data types) has to be prepared before running the BIML script, or you have to do some tricks to fetch it during script execution. Again, some proficiency with BIML is essential.
So, for a one-time job and with little experience in BIML, I would go for a purely manual approach.
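If you do go the custom-component route (the second scenario above), the heart of it is creating output columns from the input columns in the component's metadata methods. A very rough, incomplete sketch, assuming an asynchronous output created in ProvideComponentProperties (omitted here); all names are made up:

    // Custom pipeline component sketch (not production code): mirror every
    // available input column as an output column, promoting DT_STR to DT_WSTR.
    using Microsoft.SqlServer.Dts.Pipeline;
    using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
    using Microsoft.SqlServer.Dts.Runtime.Wrapper;

    [DtsPipelineComponent(DisplayName = "Mirror Columns", ComponentType = ComponentType.Transform)]
    public class MirrorColumnsComponent : PipelineComponent
    {
        public override void OnInputPathAttached(int inputID)
        {
            base.OnInputPathAttached(inputID);

            IDTSInput100 input = ComponentMetaData.InputCollection.GetObjectByID(inputID);
            IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
            IDTSVirtualInput100 virtualInput = input.GetVirtualInput();

            foreach (IDTSVirtualInputColumn100 vcol in virtualInput.VirtualInputColumnCollection)
            {
                IDTSOutputColumn100 outCol = output.OutputColumnCollection.New();
                outCol.Name = vcol.Name;

                // Switch DT_STR to DT_WSTR; DT_WSTR takes no code page.
                DataType dt = vcol.DataType == DataType.DT_STR ? DataType.DT_WSTR : vcol.DataType;
                outCol.SetDataTypeProperties(dt, vcol.Length, vcol.Precision, vcol.Scale,
                    dt == DataType.DT_WSTR ? 0 : vcol.CodePage);
            }
        }
    }

You would still have to implement ProcessInput to copy and convert the buffer values, which is where most of the real work (and testing) is.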
I am loading multiple Excel files into multiple SQL Server tables.
These Excel files contain data such as city, state, etc. Is it possible to create a single flat file connection? Currently, I am creating one for city, one for state, and one for each other source. I have attached a screenshot for better understanding.
If all of the source files are structured exactly the same (number of columns, data types, header rows, etc.), it would be possible to re-use a single flat file connection.
In general, though, the best practice is to create a connection for each file. It makes the package easier to understand in a year or two when you, or someone else, has to open it up again to fix or update it. It will also make it much easier to fix later if one or more of the files ends up changing (new columns, etc.).
I need to load a directory of different files (Excel and CSV), with no relation between them, into multiple tables in a database; every file must be loaded into its own table without any transformation.
I tried to do this using tFileList ==> tFileInputExcel ==> tMysqlOutput, but it doesn't work because I need a lot of outputs.
Your question is not very clear, but it seems like you want something generic enough to work with just one flow for all your files.
You might be able to accomplish that using dynamic schemas. See here for further guidance: https://www.talendforge.org/forum/viewtopic.php?id=21887. You will probably need at least 2 flows, one for the CSV files and one for the XLS files. You can filter the files for each flow by their extension in the tFileList component.
But if you are new to Talend, I encourage you to avoid this approach. It might be very hard to understand and use dynamic schemas. Instead, I would recommend you have one flow for each file.
I have an SSIS data flow task that reads a CSV file with certain fields, tweaks it a little and inserts results into a table. The source file name is a package parameter. All is good and fine there.
Now, I need to process a slightly different kind of CSV file with an extra field. This extra field can be safely ignored, so the processing is essentially the same. The only difference is in the column mapping of the data source.
I could, of course, create a copy of the whole package and tweak the data source to match the second file format. However, this "solution" seems like terrible duplication: if there are any changes in the course of processing, I will have to do them twice. I'd rather pass another parameter to the package that would tell it what kind of file to process.
The trouble is, I don't know how to make SSIS read from one data source or another depending on parameter, hence the question.
I would duplicate the Connection Manager (CSV definition) and Data Flow in the SSIS package and tweak them for the new file format. Then I would use the parameter you described to Enable/Disable either Data Flow.
In essence, SSIS doesn't work with variable metadata. If this is going to be a recurring pattern, I would deal with it upstream from SSIS, building a VB / C# command-line app to shred the files into SQL tables.
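To illustrate the upstream option, a bare-bones sketch of such a command-line shredder in C# could look like the following; the connection string, target table, delimiter, and the naive Split (no quoted fields) are all assumptions:

    // Minimal sketch: read a delimited file and bulk-load it into a staging table.
    // Everything is loaded as text; quoted delimiters are not handled.
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    class Shredder
    {
        static void Main(string[] args)
        {
            string path = args[0];     // the incoming CSV file
            string connStr = args[1];  // e.g. "Server=.;Database=Staging;Integrated Security=true"
            string table = args[2];    // e.g. "dbo.StagingFile"

            string[] lines = File.ReadAllLines(path);
            var data = new DataTable();

            // Build columns from the header row.
            foreach (string header in lines[0].Split(','))
                data.Columns.Add(header.Trim(), typeof(string));

            // Add rows, ignoring any trailing fields the header does not define.
            for (int i = 1; i < lines.Length; i++)
            {
                string[] fields = lines[i].Split(',');
                DataRow row = data.NewRow();
                for (int c = 0; c < data.Columns.Count && c < fields.Length; c++)
                    row[c] = fields[c];
                data.Rows.Add(row);
            }

            using (var bulk = new SqlBulkCopy(connStr) { DestinationTableName = table })
                bulk.WriteToServer(data);
        }
    }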
You could make the connection manager push all the data into 1 column. Then use a script transformation component to parse out the data to the output, depending on the number of fields in the row.
You can split the data on the delimiter into, say, a string array (I googled for help when I needed to do this). From the array's length you can tell which type of file you have connected to (see the sketch at the end of this answer).
Then, your mapping to the destination can remain the same. No need to duplicate any components either.
I had to do something similar myself once: although the files I was using were meant to always have the same format, it could change depending on the version of the system sending the file, and handling it in a script transformation this way let me cope with the minor variations to the file format. If the files are 99% the same, that is fine; if they were radically different, you would be better off using a separate file connection manager.
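Here is a hedged sketch of what that script transformation might look like; the input column RawLine and the output columns Field1..Field3 are invented for illustration:

    // SSIS Script Component (transformation) sketch: the whole line arrives in a
    // single input column (here called RawLine), is split on the delimiter, and
    // mapped to output columns. Column names are invented for illustration.
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        string[] parts = Row.RawLine.Split(',');

        // Both layouts share the first three fields; the variant with an extra
        // field (parts.Length == 4) simply carries a column we can ignore.
        Row.Field1 = parts[0];
        Row.Field2 = parts[1];
        Row.Field3 = parts[2];
    }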
I have a .NET WebForms front end that allows admin users to upload two .xls files for offline processing. As these files will be used for validation (and aggregation), I store them in an image field in a table.
My ultimate goal is to create an SSIS package that will process these files offline. Does anyone know how to use SSIS to read a blob from a table into its native (in this case .xls) format for use in a Data Flow task?
In my (admittedly limited) experience with SSIS, it is quite good at rapidly getting something up and running, but frustratingly limited when it comes to getting something that "feels" like the most elegant, efficient solution to a programmer.
Since the Excel Source Editor seems to take only files as input, you need to give it a file or reimplement its functionality in code that can take a blob. I understand that this is unsatisfying, but in the end, it is a time-saving tool.
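If giving it a file is the route you take, the extraction step itself is small: a Script Task can pull the blob out of the table and write it to a temporary .xls that the Excel connection manager then points at (for example via an expression on its ConnectionString). A rough sketch; the table, column and variable names are made up:

    // SSIS Script Task sketch (C#): read the uploaded .xls blob and write it to a
    // temp file for the Excel Source. The variables used here would have to be
    // listed in the task's ReadOnlyVariables / ReadWriteVariables.
    using System.Data.SqlClient;
    using System.IO;

    public void Main()
    {
        string connStr = Dts.Variables["User::StagingConnectionString"].Value.ToString();
        string tempPath = Path.Combine(Path.GetTempPath(), "upload.xls");

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(
            "SELECT FileContent FROM dbo.UploadedFiles WHERE FileId = @id", conn))
        {
            cmd.Parameters.AddWithValue("@id", Dts.Variables["User::FileId"].Value);
            conn.Open();
            byte[] blob = (byte[])cmd.ExecuteScalar();
            File.WriteAllBytes(tempPath, blob);
        }

        // Hand the path to the rest of the package.
        Dts.Variables["User::ExcelFilePath"].Value = tempPath;
        Dts.TaskResult = (int)ScriptResults.Success;
    }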