Adding static files to Talend jobs - csv

I'm using Talend Open Studio for Big Data and I have a job where I use tFileInputDelimited to load a CSV file and use it as a lookup with a tMap.
Currently the file is loaded from the disk using an absolute path (C:\work\jobs\lookup.csv) and everything works fine locally.
The issue is that when I deploy the job, it obviously doesn't take the lookup.csv file with it.
Which raises the question:
Is there any way to "bundle" this file (lookup.csv) into the job so I can later deploy them together?

With static data such as this, your best bet is to hard-code the data into the job using a tFixedFlowInput instead.
As an example, if we want to use a list of country names and their ISO2 and ISO3 codes, you might have these in a CSV that you'd normally access with a tFileInputDelimited. However, to save bundling this CSV with every build (which could be done with Ant/Maven), you can just hard-code this data into a tFixedFlowInput.
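If you use the component's inline-content (delimited) mode, the rows can be pasted straight in as delimited text. The values and delimiter below are just an illustration:
France;FR;FRA
Germany;DE;DEU
United Kingdom;GB;GBR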
You then just need to make sure your schema is set up the same as your delimited file's would have been (so in this case we have 3 columns: Country_Name, ISO2 and ISO3).

Related

Save html table data locally with electron

I'm new to coding and it's a lot of trial and error. Now I'm struggling with HTML tables.
For explanation: I am building an Electron desktop application for stocks. I am able to input values via the GUI into an HTML table, and also export this as an Excel file. But every time I reload the app, all the data from the table is gone. It would be great to save this data permanently and simply add new data to the existing table after an application restart.
What's the best way to achieve this?
In my mind, the best way would be to overwrite the existing Excel file with the new work (old and new data from the table), because that would make it easy to install the tool on a new PC and simply import the Excel file to have all the data there. I don't have access to a web server, so I think a local Excel file would be better than a PHP solution.
Thank you.
<table class="table" id="tblData">
<tr>
<th>Teilenummer</th>
<th>Hersteller</th>
<th>Beschreibung</th>
</tr>
</table>
This is the actual table markup.
Your question has two parts, it seems to me:
1. data representation and manipulation
2. data persistence
For #1, I'd suggest taking a look at Tabulator, in particular its methods of importing and exporting data. In my projects, I use the JSON format with Tabulator and save the data locally so it persists between sessions.
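As a quick sketch of the Tabulator side (assuming the tabulator-tables package and the #tblData element from your markup; the column fields are my guesses based on your headers):
import { TabulatorFull as Tabulator } from 'tabulator-tables';

const table = new Tabulator('#tblData', {
  data: [], // fill this from your saved JSON on startup
  columns: [
    { title: 'Teilenummer', field: 'Teilenummer' },
    { title: 'Hersteller', field: 'Hersteller' },
    { title: 'Beschreibung', field: 'Beschreibung' },
  ],
});
// table.getData() later returns the rows as plain objects, ready to save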
So for #2, how and where to save the data? Electron has built-in methods for getting the paths to common user directories. See app.getPath(name). Since it sounds like you have just one file to save, which does not need to be directly accessible to the user, appData is probably a good place to store it.
As for the "how" to store it – you can just write a file to that path using Node fs, though I like fs-jetpack too. Tabulator can save data as well.
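Here is a minimal sketch along those lines, assuming a JSON file of my own naming ('table-data.json') and a row shape taken from your table headers; 'userData' is the app-specific folder inside appData:
import { app } from 'electron';
import * as fs from 'fs';
import * as path from 'path';

interface Row { Teilenummer: string; Hersteller: string; Beschreibung: string; }

const dataFile = path.join(app.getPath('userData'), 'table-data.json');

function loadRows(): Row[] {
  try {
    return JSON.parse(fs.readFileSync(dataFile, 'utf8')); // restore the previous session
  } catch {
    return []; // first run: nothing saved yet
  }
}

function saveRows(rows: Row[]): void {
  fs.writeFileSync(dataFile, JSON.stringify(rows, null, 2)); // overwrite with current state
}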
Another way to store data is with electron-store. It works very well, though I've only used it with small amounts of data.
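A hedged sketch of the same idea with electron-store (the 'rows' key and sample row are arbitrary):
import Store from 'electron-store';

const store = new Store();
store.set('rows', [{ Teilenummer: 'A-100', Hersteller: 'ACME', Beschreibung: 'example row' }]);
const rows = store.get('rows', []); // the default is returned on first run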
So the gist is that when your app starts, it loads the data and when the app quits, it saves the data, along with any changes which have been made, though I'd suggest saving after every change.
So, there are lots of options depending on your needs.

Import CSV file

I need to pull data from a CSV file into a SQL Server table. Which control flow task should I use? Is it Flat File? What is the correct method to pull the data?
The problem is that I have used the Flat File Task to pull in the CSV file, but the CSV file I have contains headings in the first row, the column names on the third row, and data starting from the fifth row.
Another problem is that the column details appear again after every 1000 rows of data, i.e. the column row occurs more than once in the file. Is it possible to pull this data? If so, how?
While Valentino's suggestion should work, I suggest that you first work with the provider of the file to get them to supply the data in a better format. When we get stuff like this we almost always push it back and ask for properly formatted data, and we get it about 90% of the time. It will save you work if they fix their own dreck. In our case, the customers providing the data are paying for our programming services, and when they understand how substantially it increases their cost, they are usually more than willing to accommodate our needs.
I believe you'll first have to transform your file into a proper CSV file so that the SSIS Flat File Source component (Data Flow) can read it. If the source system cannot produce a real CSV file, we usually create custom .NET applications for the cleanup/conversion task.
An Execute Process task (Control Flow) that executes the custom app can then be called prior to the Data Flow.
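The answer above suggests .NET; just to illustrate the cleanup idea (here sketched as a Node/TypeScript script, with file names as placeholders), the pass boils down to keeping one header row and dropping the junk and the repeats:
import * as fs from 'fs';

const lines = fs.readFileSync('input.csv', 'utf8').split(/\r?\n/);
const header = lines[2];                      // the column names sit on the third row
const body = lines.slice(4)                   // the data starts on the fifth row
  .filter(l => l.length > 0 && l !== header); // drop blanks and the repeating column rows
fs.writeFileSync('clean.csv', [header, ...body].join('\r\n'));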

How can I add file locations to a database after they are uploaded using a Perl CGI script?

I have a CGI program I have written using Perl. One of its functions is to upload pics to the server.
All of it is working well, including adding all kinds of info to a MySQL db. My question is: How can I get the uploaded pic files location and names added to the db?
I would rather do that than change the script to actually upload the pics into the db. I have heard horror stories about storing binary files in databases.
Since I am new to all of this, I am at a loss. I have tried research and web searches for three weeks now with no luck. Any suggestions or answers would be greatly appreciated. I would really hate to have to manually add all the locations/names to the db.
I am using: a Perl CGI script, MySQL db, Linux server and the files are being uploaded to the server. I AM NOT looking to add the actual files to the db. Just their location(s).
It sounds like your method is already complete: you take the upload, read it in much as you would read a file into a string, and toss the details into MySQL. However, since CGI hands you a filehandle rather than a filename, you are wondering where that file actually is.
If you're using CGI.pm, its upload() and uploadInfo() functions, the param for the upload, and its private temporary files will help you deal with the uploaded file's source. Note that wherever the file is stashed after the remote client and the CGI script are done is not permanent, and at a minimum it is volatile.
You've got a bunch of already-uploaded files that need to be added to the db? It should be trivial to dash off a one-off script that loops through all the files and inserts the details into the DB. If they're all in one spot, a simple opendir()/readdir() loop would catch them all; otherwise you can build a list of file paths and loop over that.
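For instance, a one-off sketch (in TypeScript rather than Perl, purely for illustration; directory, table, and column names are made up, and quoting is left naive on purpose):
import * as fs from 'fs';
import * as path from 'path';

const dir = '/var/www/uploads'; // wherever the pics ended up
for (const name of fs.readdirSync(dir)) {
  const full = path.join(dir, name);
  if (fs.statSync(full).isFile()) {
    // print one INSERT per file; review the output, then run it against MySQL
    console.log(`INSERT INTO pics (location, name) VALUES ('${full}', '${name}');`);
  }
}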
If you're talking about recording new uploads on the server, then it would be something along these lines:
user uploads file to server
script extracts any wanted/needed info from the file (name, size, mime-type, checksums, etc...)
start database transaction
insert file info into database
retrieve ID of new record
move uploaded file to final resting place, using the ID as its filename
if everything goes fine, commit the transaction
Using the ID as the filename solves the worries of filename collisions and new uploads overwriting previous ones. And if you store the uploads somewhere outside of the site's webroot, then the only access to the files will be via your scripts, providing you with complete control over downloads.
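A sketch of those steps (again TypeScript rather than Perl, just to show the shape; the uploads table, column names, and /var/uploads path are assumptions):
import * as fs from 'fs';
import * as path from 'path';
import mysql from 'mysql2/promise';

async function recordUpload(tmpPath: string, originalName: string): Promise<number> {
  const conn = await mysql.createConnection({ host: 'localhost', user: 'app', database: 'site' });
  try {
    await conn.beginTransaction();
    const [result] = await conn.execute(
      'INSERT INTO uploads (original_name, size) VALUES (?, ?)',
      [originalName, fs.statSync(tmpPath).size]
    );
    const id = (result as { insertId: number }).insertId;          // ID of the new record
    fs.renameSync(tmpPath, path.join('/var/uploads', String(id))); // ID as the filename
    await conn.commit();
    return id;
  } catch (err) {
    await conn.rollback(); // undo the insert if the move (or anything else) failed
    throw err;
  } finally {
    await conn.end();
  }
}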

How do I load Excel files selectively?

I have an SSIS package that needs to look up two different types of Excel files, type A and type B, and load the data within them into two different staging tables, tableA and tableB. The formats of these Excel sheets are different and match their respective tables.
I have thought of putting typeA.xls and typeB.xls in two different folders for simplicity (folder paths to be configurable). The required Excel files will then be put there through some other application or manually.
What I want is for my dtsx package to scan the folder, pick the latest unprocessed file, load it while ignoring the others, and then postfix the file name with '-loaded' (typeAxxxxxx-loaded.xls). The '-loaded' in the filename is how I plan to differentiate between the already loaded files and the ones yet to be loaded.
I need advice on:
a) How do I check the configured folder for the latest file, i.e. the one without '-loaded' in the filename, and load it? And then, after loading it, how do I rename that file in the configured folder with '-loaded' postfixed?
b) Is this the best approach to doing this or is there a better way?
Thanks.
You can do it this way, but it might require several complex string expressions.
E.g. create a ForEach loop over .xls files; inside the loop add an empty script task, then a data flow to load the file. Connect them with a precedence constraint and make it conditional: the precedence constraint expression will check that the file name does not end with -loaded.xls. You may either do this in the script task or purely with an SSIS expression on the precedence constraint. Finally, add a File System Task to rename the file. You may need to build the new file name with another expression.
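For example, the precedence constraint expression could look like this (User::FileName is a variable name I'm assuming the ForEach loop maps each file into):
RIGHT(@[User::FileName], 11) != "-loaded.xls"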
It might be easier to create two folders: Incoming for new unprocessed files, and Loaded for the files you've processed, and just move the .xls to this folder after processing without renaming. This will avoid the first conditional expression (and dummy script task), and simplify the configuration of File System task.
You can get the SQL File Watcher Task and add it to your SSIS package. I think this is a cleaner way to do what you want.
SQL File Watcher

iSeries Export to CSV

Is there an iSeries command to export the data in a table to CSV format?
I know about the Windows utilities, but since this needs to be run automatically I need to run this from a CL program.
You can use CPYTOIMPF and specify the TOSTMF option to place a CSV file on the IFS.
Example:
CPYTOIMPF FROMFILE(DBFILE) TOSTMF('/outputfile.csv') STMFCODPAG(*PCASCII) RCDDLM(*CRLF)
If you want the data to be downloaded directly to a PC, you can use the "Data Transfer from iSeries" function of IBM iSeries Client Access to create a .CSV file. In the file output details dialog, set the file type to Comma Separated Variable (CSV).
You can save the transfer description to be reused later.
You could use a trigger. The iSeries Client Access software won't do, since that is a Windows application; what I understand is that you need the data to be exported each time the file is written to. Check this link to know more about triggers.
You are going to need FTP to perform that action.
If your iSeries shop uses ZMOD/FTP, your shortest solution is a few lines of code away, three to be exact: Start FTP, Put DBF, and finally, End FTP.
If you don't use ZMOD/FTP:
- you could use native FTP/400 to accomplish what you need to do, but it is quite involved!
- you will probably need an RPGLE program to parse, format, and move the data into a flat file, then use native FTP/400 to send the file out
- and yes, a CL program will be needed as a wrapper!
You can do it all in one very simple CL program:
CPYTOIMPF the file TOSTMF -> the CSV file will be in the IFS
FTP the file elsewhere (to a server or a PC)
It works like a charm
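A minimal CL sketch of those two steps (library, member, path, and server names are placeholders; in batch, FTP reads its subcommands from whatever the INPUT file is overridden to):
PGM
/* Step 1: export the table to a CSV file on the IFS */
CPYTOIMPF  FROMFILE(MYLIB/DBFILE) TOSTMF('/tmp/outputfile.csv') +
           STMFCODPAG(*PCASCII) RCDDLM(*CRLF)
/* Step 2: run FTP against a member holding the PUT subcommands */
OVRDBF     FILE(INPUT) TOFILE(MYLIB/QFTPSRC) MBR(FTPCMDS)
OVRDBF     FILE(OUTPUT) TOFILE(MYLIB/QFTPSRC) MBR(FTPLOG)
FTP        RMTSYS('myserver.example.com')
DLTOVR     FILE(*ALL)
ENDPGM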