How to process files sequentially? - csv

I have approximately 1000 files on a local drive, and I need to move them into SQL Server one after another.
The files are named file1.csv, file2.csv, up to file1000.csv, but the number of files on the drive may change dynamically.
I have already created a template that moves the files into SQL Server, but file2 must only be processed once file1 has been completely moved into SQL Server.
Is this possible in NiFi without using the Wait/Notify processors?
Can anyone please guide me to solve this?

The EnforceOrder processor, available since NiFi 1.2.0, can be used to process files sequentially. See this example:
https://gist.github.com/ijokarumawak/7e6158460cfcb0b5911acefbb455edf0

Processors have a Concurrent Tasks property.
If you set it to 1 on each processor, they will run sequentially.
But maybe it's better to insert all the files into a temp table and then run the aggregation at the database level?
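The "temp table first, aggregate later" alternative can be sketched as below. This is a hypothetical illustration, not NiFi configuration: the in-memory CSV "files" and table names are made up, and sqlite3 stands in for SQL Server. The point is that once every file lands in the same temp table, load order no longer matters and the database does the final work in one pass.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins for file1.csv, file2.csv, ... on the local drive.
files = {
    "file1.csv": "id,amount\n1,10\n2,20\n",
    "file2.csv": "id,amount\n3,5\n",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_load (id INTEGER, amount INTEGER, source TEXT)")

# Load every file into the temp table first; order no longer matters here.
for name, body in files.items():
    rows = csv.DictReader(io.StringIO(body))
    conn.executemany(
        "INSERT INTO temp_load VALUES (?, ?, ?)",
        [(r["id"], r["amount"], name) for r in rows],
    )

# ... then aggregate once, on the database side.
total = conn.execute("SELECT SUM(amount) FROM temp_load").fetchone()[0]
```

In NiFi terms, each file would reach the temp table via something like PutDatabaseRecord, and the aggregation would run as a single SQL statement afterwards.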

Related

Sending .csv files to a database: MariaDB

I will preface this by saying I am very new to databases. I am working on a project for my undergraduate research that requires various sensor data to be sent from a Raspberry Pi over the internet to a database. I am using MariaDB at the moment, but am open to other options.
The background: Currently all sensor data is being saved in csv files on the RPi. There will be automation to send data at given intervals to the database.
The question: Am I able to add the file itself to a database? For our application, a csv file is the most logical data storage format, and we simply want the database to be a way for us to retrieve data remotely, since the system will be installed miles away from where we work.
I have read about "LOAD DATA INFILE" on this website, but am unsure how it applies to this database. Would JSON be at all applicable for this? I am willing to learn if it makes the process more streamlined.
Thank you!
If 'sending data to the database' means that, by one means or another, additional or replacement CSV files are saved on disk, in a location accessible to a MariaDB client program, then you can load these into the database using the "mysql" command-line client and an appropriate script of SQL commands. That script very likely will make use of the LOAD DATA LOCAL INFILE command.
The "mysql" program may be launched in a variety of ways: 1) spawned by the process that receives the uploaded file; 2) launched by a cron job (Task Scheduler on Windows) that runs periodically to check for new or changed CSV files; or 3) launched by a daemon that continually monitors the disk for new or changed CSV files.
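The LOAD DATA approach described above can be sketched as follows. The SQL in the comment shows roughly what the script fed to the mysql client might contain (the file path, table, and column names are invented); below it, Python's sqlite3 stands in for MariaDB so the same idea is runnable here.

```python
import csv
import io
import sqlite3

# What the SQL script run by the "mysql" client might contain (MariaDB syntax;
# the path and table are hypothetical):
#   LOAD DATA LOCAL INFILE '/data/sensors.csv'
#   INTO TABLE readings
#   FIELDS TERMINATED BY ','
#   IGNORE 1 LINES (ts, temp_c);

csv_body = "ts,temp_c\n2024-01-01T00:00,21.5\n2024-01-01T00:05,21.7\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, temp_c REAL)")

reader = csv.reader(io.StringIO(csv_body))
next(reader)  # the equivalent of IGNORE 1 LINES: skip the header row
conn.executemany("INSERT INTO readings VALUES (?, ?)", reader)

count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

In production the script would simply be piped into the client, e.g. `mysql mydb < load.sql`, with LOCAL INFILE enabled on both client and server.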
A CSV is typically human readable. I would work with that first before worrying about using JSON. Unless the CSVs are huge, you could probably open them up in a simple text editor to read their contents to get an idea of what the data looks like.
I'm not sure of your environment (feel free to elaborate), but you could just use whatever web services you have to read in the CSV directly and inject the data into your database.
You say that data is being sent using automation. How is it communicating to your web service?
What is your web service? (Is it php?)
Where is the database being hosted? (Is it in the same webservice?)

Load txt file into SQL Server database

I am still learning SQL Server.
The scenario is that I have a lot of .txt files with name format like DIAGNOSIS.YYMMDDHHSS.txt and only the YYMMDDHHSS is different from file to file. They are all saved in folder Z:\diagnosis.
How could I write a stored procedure to upload all .txt files with a name in the format of DIAGNOSIS.YYMMDDHHSS.txt in folder Z:\diagnosis? Files can only be loaded once.
Thank you
I would not do it using a stored proc. I would use SSIS, which has a Foreach Loop container that can enumerate files. When a file has been loaded, I would move it to an archive location so that it doesn't get processed the next time. Alternatively, you could create a table that stores the names of the files that were successfully processed and have the loop skip any file in that table, but then you just keep accumulating more and more files to loop through; it is better to move processed files to a different location if you can.
Personally, I would also put the file data into a staging table before loading it into the final table. We use two of them: one for the raw data and one for the cleaned data. We then transform into staging tables that match the relational tables in production, to make sure the data will meet the requirements there before touching production, and send records that can't be inserted for one reason or another to an exception table.
Working in a health-care environment, you will want to make sure your process meets your country's government regulations for the storage of patient records, if any exist (see HIPAA in the US). You may have to load directly to production, or severely limit access to the staging tables and files.
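The loop-load-archive pattern plus the raw/clean staging split described above can be sketched like this. It is an illustration only: the folder and table names are made up, sqlite3 stands in for SQL Server, and the archive move is what guarantees each file is loaded exactly once.

```python
import csv
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical folder standing in for Z:\diagnosis, with two sample files.
root = Path(tempfile.mkdtemp())
archive = root / "archive"
archive.mkdir()
(root / "DIAGNOSIS.2401010101.txt").write_text("a,1\nb,2\n")
(root / "DIAGNOSIS.2401020202.txt").write_text("c,3\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_raw (code TEXT, qty TEXT)")
conn.execute("CREATE TABLE staging_clean (code TEXT, qty INTEGER)")

# For each file: load into the raw staging table, then archive it so it
# cannot be picked up again on the next run.
for path in sorted(root.glob("DIAGNOSIS.*.txt")):
    with path.open() as f:
        conn.executemany("INSERT INTO staging_raw VALUES (?, ?)", csv.reader(f))
    shutil.move(str(path), archive / path.name)

# Clean/validate from raw staging into typed staging before touching production.
conn.execute(
    "INSERT INTO staging_clean SELECT code, CAST(qty AS INTEGER) FROM staging_raw"
)
loaded = conn.execute("SELECT COUNT(*) FROM staging_clean").fetchone()[0]
remaining = len(list(root.glob("DIAGNOSIS.*.txt")))
```

In SSIS the outer loop would be the Foreach Loop container with a file enumerator, the load a Data Flow task, and the archive step a File System task.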

MySQL query calling other SQL files to execute in order

So I have a Java EE application and I use Hibernate.
I have created an import.sql file which is loaded each time I start the application.
My issue is that the database is quite big, so I have the startup data prepared in separate SQL files, which I need to load in a certain order.
So within this SQL script file I need to CALL, IMPORT, or LOAD the other SQL files from the folder above this one (the path is not a problem).
I would be grateful for a solution for MySQL, and maybe Oracle DB as well (but MySQL is more important at the moment).
This solution is not working
Thanks!
Ok, so the thing is that, for the moment, it is not possible to load one query script file from inside another. Raw data can be loaded with LOAD DATA INFILE, but other SQL scripts cannot.
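Since the scripts can't include each other, the usual workaround is to run them in order from outside the database. A minimal sketch, assuming the files carry a numeric prefix that fixes their order; sqlite3's `executescript` stands in here for feeding each file to the mysql client (e.g. `mysql mydb < 01_schema.sql`).

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical startup scripts; the numeric prefix fixes the load order.
d = Path(tempfile.mkdtemp())
(d / "01_schema.sql").write_text("CREATE TABLE t (x INTEGER);")
(d / "02_data.sql").write_text("INSERT INTO t VALUES (1); INSERT INTO t VALUES (2);")

conn = sqlite3.connect(":memory:")
for script in sorted(d.glob("*.sql")):  # sorted() enforces 01 before 02
    conn.executescript(script.read_text())

n = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

In a Hibernate setup, the same effect can often be had by listing the files in order in the relevant import property rather than chaining them from SQL.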

load many txt files to mysql db

I want to store 7000 records consisting of txt files as the BLOB datatype in my DB using Workbench.
But:
1. I don't know how to do it automatically. Should I put all the files in one directory and then write a script that takes them one by one and inserts them into the appropriate rows?
2. I am not sure whether BLOB is suitable for this kind of file storage. Later I want to connect my DB to a GUI, so after clicking, it should be possible to open each txt file in a new window.
Could you advise me how to solve my problem?
You should write a script, yes. If it's hard for you to put them all in one folder I think there are scripts and tools to do this.
You can use C#, PHP or any other lang to scan those files and then insert them into the database.
Bunch of tutorials:
http://www.csharp-examples.net/get-files-from-directory/
Inserting record in to MySQL Database using C#
http://www.codeproject.com/Articles/43438/Connect-C-to-MySQL
http://net.tutsplus.com/articles/news/scanning-folders-with-php/
http://www.homeandlearn.co.uk/php/php13p3.html
BLOB should do for small files, but note that a standard BLOB in MySQL holds at most 64 KB; for larger text files use MEDIUMBLOB (up to 16 MB) or LONGBLOB (up to 4 GB).
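The scan-and-insert script suggested above might look like the following sketch. The folder, file names, and table are invented for illustration, and sqlite3 stands in for MySQL; with MySQL you would use a connector library and the same parameterized INSERT.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical folder with two of the txt files to be stored.
folder = Path(tempfile.mkdtemp())
(folder / "report1.txt").write_text("first report")
(folder / "report2.txt").write_text("second report")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (name TEXT PRIMARY KEY, body BLOB)")

# Scan the folder and insert each file's bytes as a BLOB, one row per file.
for path in sorted(folder.glob("*.txt")):
    conn.execute("INSERT INTO docs VALUES (?, ?)", (path.name, path.read_bytes()))

stored = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
first = conn.execute(
    "SELECT body FROM docs WHERE name = 'report1.txt'"
).fetchone()[0]
```

The GUI side then just SELECTs the `body` column by name and writes the bytes to a temp file or window.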

Indirection in SSIS

Is it possible to perform any sort of indirection in SSIS?
I have a series of jobs performing FTP and looping through the files before trying to run another DTSX package on them. Currently this involves a lot of repeated boilerplate for pulling down the files and for logging.
Is there any way of redesigning this so I only need one package rather than 6?
Based on your comment:
Effectively the 6 packages are really 2 x 3. 1st for each "group" is FTP pull
down and XML parsing to place into flat tables. Then 2nd then transforms and
loads that data.
Instead of downloading files using one package and inserting data into tables using another package, you can do that in a single package.
Here is a link containing an example which downloads files from FTP and saves it to local disk.
Here is a link containing an example to loop through CSV files in a given folder and inserts that data into database.
Since you are using XML files, here is a link that shows how to loop through XML files.
You can effectively combine the above examples into a single package by placing the control flow tasks one after the other.
Let me know if this is not what you are looking for.
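The idea above, one parameterized flow instead of 2 packages x 3 groups, can be sketched in pseudocode-style Python. Everything here is a hypothetical stand-in: the group configs and the three step functions represent the FTP task, the XML parsing into flat tables, and the transform/load step of the real SSIS packages.

```python
# Hypothetical per-group configuration; in SSIS these would be package
# parameters or rows in a config table.
groups = [
    {"name": "orders", "remote_dir": "/ftp/orders"},
    {"name": "customers", "remote_dir": "/ftp/customers"},
    {"name": "products", "remote_dir": "/ftp/products"},
]

log = []


def download(group):
    # Stand-in for the FTP task that pulls the group's files down.
    log.append(f"download {group['remote_dir']}")


def parse_and_stage(group):
    # Stand-in for the XML parsing into flat tables.
    log.append(f"stage {group['name']}")


def transform_and_load(group):
    # Stand-in for the second package that transforms and loads the data.
    log.append(f"load {group['name']}")


# One pipeline, run once per group, instead of six near-identical packages.
for group in groups:
    download(group)
    parse_and_stage(group)
    transform_and_load(group)
```

In SSIS terms, the outer loop is a Foreach Loop over the group configs, and the download/stage/load logic lives once in a single package (or a child package invoked with parameters).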