Hi guys, I am looking for some help with a flat file source in a Data Flow task or Bulk Insert task. Say I have incoming flat files that can look like
a;b;c or a|b|c
Is it possible to assign multiple column delimiters for the same flat file source?
I have been searching for how to do it.
Thank you very much.
The flat file source doesn't support this. See this similar question as reference.
Instead, you could use a script task to determine which delimiter is used, and then forward the file to the flat file source with the suitable delimiter.
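The detection step can be sketched like this (in Python rather than the SSIS script task's VB/C#, purely as an illustration): look at a sample line and pick whichever candidate delimiter occurs most often, then hand that value to the flat file connection. The candidate set here is an assumption based on the two formats in the question.

```python
def detect_delimiter(sample_line, candidates=(";", "|", ",")):
    """Guess the column delimiter by counting candidate characters in a sample line."""
    # The candidate that appears most often in the line wins.
    return max(candidates, key=sample_line.count)
```

In a real script task you would read the first line of the incoming file, call something like this, and write the result into a package variable that drives the connection manager's delimiter.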
I ran into a similar problem and ended up using Swiss File Knife. Just preprocess the file and have it replace commas with pipes, or vice versa. That way you only need one import.
You could also use a script transform in your flat file reader and the string.Split method. I'd probably go with the SFK option, though: it's a bit more transparent, although slightly less portable.
I need to load a directory of different files (Excel and CSV), without any relation between them, into multiple tables in a database; every file must be loaded into its own table without any transformation.
I tried to do this using tFileList ==> tFileInputExcel ==> tMySQLOutput, but it doesn't work because I would need a lot of outputs.
Your question is not very clear, but it seems you want something generic enough to work with just one flow for all your files.
You might be able to accomplish that using dynamic schemas. See here for further guidance: https://www.talendforge.org/forum/viewtopic.php?id=21887. You will probably need at least 2 flows, one for the CSV files and one for the XLS files. You can filter the files for each flow by their extension in the tFileList component.
But if you are new to Talend, I encourage you to avoid this approach: dynamic schemas can be very hard to understand and use. Instead, I would recommend one flow for each file.
In Progress 4GL, I am exporting some values from multiple procedure files to a single CSV file. But when the second procedure (.p) file runs, the values I got from the previous file get overwritten. How do I export the data from all the procedure files to a single CSV file? Thanks in advance.
The quick answer is to open the second and subsequent outputs to the file as
OUTPUT TO file.txt APPEND.
if that is all you need. If you are looking to do something more complex, you could define and open a new shared stream in the calling program and use that stream in each of the called programs, thus opening and closing the stream only once.
If you're using persistent procedures and functions, this answer may help, as it's a little more complex than normal shared streams.
I would really not suggest using a SHARED stream, especially with persistent procedures or OO. STREAM-HANDLEs provide a more flexible way of distributing the stream.
So, as was previously suggested:
On the first job run you do:
OUTPUT TO file.txt.
On all the jobs running after it you do:
OUTPUT TO file.txt APPEND.
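The truncate-then-append effect of those two statements, sketched outside Progress in Python for illustration (the file name and contents are just examples):

```python
# First job: open in write mode, which truncates any previous contents.
with open("file.txt", "w", encoding="utf-8") as f:
    f.write("first job's rows\n")

# Every later job: open in append mode so earlier output survives.
with open("file.txt", "a", encoding="utf-8") as f:
    f.write("second job's rows\n")
```

Opening in write mode again from the second job is exactly the overwrite the question describes; only the first opener may truncate.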
I need to modify CSV files via automated scripts, and I need help with which direction to look in and which language the scripts should be written in.
Situation: I have a simple CSV file, but I need an automated script that can edit certain fields and fill in blank ones with whatever I specify. What should my starting point be, and what kind of developer should I look for? Which coding language should he or she be knowledgeable in?
Thank you!!
Maybe you are looking for CSVfix, a tool for manipulating CSV data in the command shell. Take a look here: https://code.google.com/p/csvfix/
With it you can, among other things:
Reorder, remove, split and merge fields
Convert case, trim leading & trailing spaces
Search for specific content using regular expressions
Filter out duplicate data or data on exclusion lists
Enrich with data from other sources
Add sequence numbers and file source information
Split large CSV files into smaller files based on field contents
Perform arithmetic calculations on individual fields
Validate CSV data against a collection of validation rules
Convert between CSV and fixed format, XML, SQL and DSV
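If a short script is preferred over a dedicated tool, the fill-in-the-blanks edit from the question is only a few lines in Python's standard csv module. A minimal sketch (the column name and default value are made up for illustration):

```python
import csv
import io

def fill_blanks(csv_text, defaults):
    """Rewrite CSV text, filling blank fields with per-column default values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fieldnames = rows[0].keys() if rows else []
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        # Replace any empty field that has a configured default.
        for col, default in defaults.items():
            if not row.get(col):
                row[col] = default
        writer.writerow(row)
    return out.getvalue()
```

For example, `fill_blanks("name,city\nann,\nbob,paris\n", {"city": "unknown"})` fills Ann's missing city with "unknown" and leaves Bob's row alone.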
I hope this helps you out,
best regards,
Jürgen Jester
I have an SSIS data flow task that reads a CSV file with certain fields, tweaks it a little and inserts results into a table. The source file name is a package parameter. All is good and fine there.
Now I need to process a slightly different kind of CSV file with an extra field. This extra field can be safely ignored, so the processing is essentially the same; the only difference is in the column mapping of the data source.
I could, of course, create a copy of the whole package and tweak the data source to match the second file format. However, this "solution" seems like terrible duplication: if there are any changes in the course of processing, I will have to do them twice. I'd rather pass another parameter to the package that would tell it what kind of file to process.
The trouble is, I don't know how to make SSIS read from one data source or another depending on parameter, hence the question.
I would duplicate the Connection Manager (CSV definition) and Data Flow in the SSIS package and tweak them for the new file format. Then I would use the parameter you described to Enable/Disable either Data Flow.
In essence, SSIS doesn't work with variable metadata. If this is going to be a recurring pattern, I would deal with it upstream of SSIS, building a VB/C# command-line app to shred the files into SQL tables.
You could make the connection manager push all the data into 1 column. Then use a script transformation component to parse out the data to the output, depending on the number of fields in the row.
You can split the data on the delimiter into, say, a string array (I googled for help when I needed to do this). From the array's length you can tell what type of file has been connected.
Then, your mapping to the destination can remain the same. No need to duplicate any components either.
I had to do something similar myself once: although the files I was using were meant to always be in the same format, it could change depending on the version of the system sending the file, so handling it in a script transformation this way let me absorb minor variations in the file format. If the files are 99% the same, this works fine; if they were radically different, you would be better off using a separate file connection manager.
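The per-row logic of that script transformation can be sketched like this (in Python rather than the SSIS script component's VB/C#; the three-versus-four column layout is an assumption matching the question's "one extra, ignorable field"):

```python
def parse_row(raw_line, delimiter=","):
    """Split a single-column raw line and normalize both file variants.

    Assumes the base format has 3 columns and the extended format has 4,
    where the extra 4th column can safely be dropped.
    """
    fields = raw_line.rstrip("\r\n").split(delimiter)
    if len(fields) == 4:
        # Extended format: discard the ignorable extra column.
        fields = fields[:3]
    if len(fields) != 3:
        raise ValueError(f"unexpected field count: {len(fields)}")
    return fields
```

Because both variants come out as the same three fields, the downstream mapping to the destination stays identical, which is the point of the approach above.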
I have a .NET WebForms front end that allows admin users to upload two .xls files for offline processing. As these files will be used for validation (and aggregation), I store them in an image field in a table.
My ultimate goal is to create an SSIS package that will process these files offline. Does anyone know how to use SSIS to read a blob from a table into its native (in this case .xls) format for use in a Data Flow task?
In my (admittedly limited) experience with SSIS, it is quite good at rapidly getting something up and running, but frustratingly limited in getting something that "feels" like the most elegant, efficient solution to a programmer.
Since the Excel Source Editor seems to take only files as input, you need to give it a file, or reimplement its functionality in code that can take a blob. I understand that this is unsatisfying, but in the end this is a time-saving tool.
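The "give it a file" route usually means a step before the data flow that selects the blob and writes it to a temporary file, whose path is then handed to the Excel connection. A rough, runnable sketch of that idea (in Python, with SQLite standing in for the real database and a made-up `uploads` table, since the actual schema isn't given):

```python
import sqlite3
import tempfile

def blob_to_temp_file(conn, doc_id, suffix=".xls"):
    """Fetch a stored blob and write it to a temp file; return the file path."""
    (blob,) = conn.execute(
        "SELECT content FROM uploads WHERE id = ?", (doc_id,)
    ).fetchone()
    # delete=False so the file survives for the downstream Excel source to open.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(blob)
        return f.name

# Demo setup: an in-memory table standing in for the image-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (id INTEGER PRIMARY KEY, content BLOB)")
conn.execute("INSERT INTO uploads VALUES (1, ?)", (b"fake xls bytes",))
path = blob_to_temp_file(conn, 1)
```

In SSIS this step would live in a script task that stores the resulting path in a package variable; remember to delete the temp file after the data flow finishes.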