I have an app/tool which reads from a CSV file and writes to another, processing it using HSQLDB.
I want to have the CSV file as the only output, and the database files should disappear after the process finishes.
I tried to use mem storage, but that prevents HSQLDB from writing to the CSV file.
I also tried to DROP SCHEMA before closing the connection, but that does not remove the files.
I don't like deleting the files manually, as that's HSQLDB implementation-specific and can change over time.
Is there some systemic way to leave only the CSV file?
Ideally, I'd like an option that allows HSQLDB to write the CSV file while using in-memory storage.
HSQLDB never deletes its own files. You can reduce the number of files by using
SET FILES LOG FALSE
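One way to avoid depending on HSQLDB's file layout is to point the database at a scratch directory and delete the whole directory afterwards. The sketch below shows only that pattern, in Python rather than Java. `run_in_scratch_dir`, `fake_job`, and the file names are hypothetical stand-ins and are not part of HSQLDB.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_scratch_dir(job, output_name, destination):
    """Run job(workdir) in a throwaway directory and keep only one output file."""
    destination = Path(destination)
    with tempfile.TemporaryDirectory() as workdir:
        work = Path(workdir)
        job(work)  # hypothetical: creates database files and the CSV in `work`
        shutil.copy2(work / output_name, destination)
    # The scratch directory, including every database file, is gone here.
    return destination


def fake_job(work):
    # Stand-in for the real processing step; these file names are made up.
    (work / "db.script").write_text("CREATE SCHEMA ...")
    (work / "db.properties").write_text("version=0")
    (work / "out.csv").write_text("id,total\n1,42\n")
```

The database connection has to be closed before the directory is removed, otherwise deletion may fail on platforms that lock open files.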
Related
I have approximately 1000 files on a local drive. I need to move those files into SQL Server one after another.
The local drive holds files named file1.csv, file2.csv, ... up to file1000.csv, and the number of files may change dynamically.
I was able to create a template that moves the files into SQL Server, but file2 must be processed only after file1 has been completely moved into SQL Server.
Is this possible in NiFi without using the Wait/Notify processors?
Can anyone please guide me on how to solve this?
Use the EnforceOrder processor, available in NiFi 1.2.0, to process the files sequentially.
https://gist.github.com/ijokarumawak/7e6158460cfcb0b5911acefbb455edf0
Processors have a Concurrent Tasks property.
If you set it to 1 in each processor, they will run sequentially.
But maybe it's better to insert all the files into a temp table and then run the aggregation at the database level?
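Outside NiFi, the requirement itself (strict one-at-a-time processing in numeric file order) can be sketched in a few lines of Python. This is an illustration of the ordering logic only, not a NiFi configuration. `numeric_key`, `process_in_order`, and the `load` callback are hypothetical names.

```python
import re
from pathlib import Path


def numeric_key(path):
    """Sort key: the first integer found in the file name (file10.csv -> 10)."""
    match = re.search(r"\d+", Path(path).stem)
    return int(match.group()) if match else -1


def process_in_order(paths, load):
    """Call load(path) for each file, strictly one after another."""
    done = []
    for path in sorted(paths, key=numeric_key):
        load(path)  # the next file starts only after this call returns
        done.append(Path(path).name)
    return done
```

Sorting by the embedded number matters because a plain string sort would place file10.csv before file2.csv.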
I am still learning SQL Server.
The scenario is that I have a lot of .txt files with name format like DIAGNOSIS.YYMMDDHHSS.txt and only the YYMMDDHHSS is different from file to file. They are all saved in folder Z:\diagnosis.
How could I write a stored procedure to upload all .txt files with a name in the format of DIAGNOSIS.YYMMDDHHSS.txt in folder Z:\diagnosis? Files can only be loaded once.
Thank you
I would not do it using a stored proc. I would use SSIS. It has a for each file task you can use. When the file has been loaded, I would move it to an archive location so that it doesn't get processed the next time. Alternatively, you could create a table where you store the names of the files that were successfully processed and have the for each file loop skip any in that table. But then you just keep getting more and more files to loop through, so it is better to move processed ones to a different location if you can.
Personally, I would also put the file data in a staging table before loading it into the final table. We use two of them, one for the raw data and one for the cleaned data. Then we transform to staging tables that match the relational tables in production, to make sure the data will meet the needs there before trying to affect production, and we send records that can't be inserted for one reason or another to an exception table. Working in the health care environment, you will want to make sure your process meets the government regulations for storage of patient records for the country you are in, if they exist (see HIPAA in the US). You may have to load directly to production or severely limit the access to staging tables and files.
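The loop-and-archive pattern described above can be sketched outside SSIS as follows. This is a minimal Python illustration, assuming the timestamp part of the name is always ten digits. `load_and_archive` and the `load` callback (for example a bulk insert into a staging table) are hypothetical.

```python
import re
import shutil
from pathlib import Path

# DIAGNOSIS.<10 digits>.txt, matching the DIAGNOSIS.YYMMDDHHSS.txt format
NAME_PATTERN = re.compile(r"^DIAGNOSIS\.\d{10}\.txt$")


def load_and_archive(inbox, archive, load):
    """Load each matching file once, then move it out of the inbox."""
    inbox, archive = Path(inbox), Path(archive)
    archive.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(inbox.iterdir()):
        if not (path.is_file() and NAME_PATTERN.match(path.name)):
            continue
        load(path)  # hypothetical loader; an exception here leaves the file in place
        shutil.move(str(path), str(archive / path.name))
        processed.append(path.name)
    return processed
```

Because a file is moved only after `load` returns, a second run finds nothing left to load, which gives the "files can only be loaded once" behaviour.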
How do I import a database in DataGrip, the way I would in phpMyAdmin?
I have the .sql file exported from phpMyAdmin, but it has so many lines that the IDE stops working when I try to run the whole .sql file.
In DataGrip, go to File > Open and select your MySQL dump file. Then right-click the tab for the file to get the context menu, and select the "Run [your filename...]" option. It may ask you to select the schema to run against. This is how I imported a dump from phpMyAdmin using DataGrip.
The JetBrains documentation on running SQL scripts does not provide much information on processing large insert statements. There is a discussion in the DataGrip community forums, and apparently there are upcoming features to make working with large scripts easier.
Quote from thread:
Huge SQL files can be executed from Files view (use a context menu action).
I assume you are attempting to import a database export which is a series of SQL statements saved to a file. There could be a memory issue if you are attempting to run a large SQL file in memory. Try the following.
Insert commit statements into your SQL file using a text editor. This can even be done from within DataGrip. Every couple of hundred statements you can place the line
commit;
which should purge the previous statements from memory. I strongly recommend saving the edited file separately from the export script. This method is not applicable if you need an all-or-nothing import, meaning that if even one statement or block fails you want all of the statements to be rolled back.
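The manual edit described above can also be scripted. The sketch below is one hedged way to do it in Python. `add_commits` is a hypothetical helper, and it assumes that every statement ends on a line whose last non-blank character is a semicolon, which is typical for dumps but not guaranteed for every SQL file.

```python
def add_commits(lines, every=200):
    """Yield lines, adding 'commit;' after every `every` statements.

    Assumes a statement ends on a line whose last non-blank character is ';'.
    """
    count = 0
    for line in lines:
        yield line
        if line.rstrip().endswith(";"):
            count += 1
            if count % every == 0:
                yield "commit;\n"
```

To follow the advice about keeping the export untouched, read from the original dump and write the result to a new file, for example with `out.writelines(add_commits(src))` where `src` and `out` are two open file objects.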
1 - Go to View -> Tool Windows -> Files.
2 - Go to the schema folder and open it in Windows Explorer, then paste your dump file there. In my example I paste MyDump.dmp.
3 - Right-click MyDump.dmp and run it.
To import data from a script file, run the file as it is described in Run database code. In addition to script files, you can import a CSV, TSV, or any other text file that contains delimiter-separated values.
https://www.jetbrains.com/help/datagrip/import-data.html
I'm working on a membership site where users are able to upload a CSV file containing sales data. The file will then be read and parsed, and the data will be charted, which will allow me to create charts dynamically.
My question is how to handle this csv upload? Should it be uploaded to folder and stored for later or should it be directly inserted into a MySQL table?
It depends on how much processing needs to be done, I'd say. If it's "short" data and processing is quick, then your upload-handling script should be able to take care of it.
If it's a large file and you'd rather not tie up the user's browser/session while the data's parsed, then do the upload-now-and-deal-with-it-later option.
It depends on how you think the users will use this site.
What do you estimate the size of the files for these users to be?
How often would they (if ever) upload a file twice? Can they download the charts?
If the files are small and more for one-off use, you could upload and process them on the fly. If users require repeated access and analysis, then you will save them time by importing the data into the database.
The LOAD DATA INFILE command in MySQL handles uploads like that really nicely. If you create the table you want to load into and then use that command, it works great and is very quick; I've loaded several thousand rows of data in under 5 seconds using it.
http://dev.mysql.com/doc/refman/5.5/en/load-data.html
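For reference, the statement could be assembled along these lines. This is a minimal Python sketch that only builds the SQL text. `load_data_statement`, the file path, and the table name are hypothetical, and the delimiter settings assume a comma-separated file with a header row.

```python
def load_data_statement(csv_path, table, skip_header=True):
    """Build a LOAD DATA INFILE statement for a comma-separated file.

    Illustration only: csv_path and table are interpolated as-is, so they must
    come from trusted code, never from user input.
    """
    parts = [
        f"LOAD DATA INFILE '{csv_path}'",
        f"INTO TABLE {table}",
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'",
        "LINES TERMINATED BY '\\n'",
    ]
    if skip_header:
        parts.append("IGNORE 1 LINES")
    return "\n".join(parts) + ";"
```

For an uploaded file, the path would be wherever the upload handler saved it, and the MySQL server must be allowed to read that location.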
In phpMyAdmin there are two options to import a CSV file.
One is CSV. The other is CSV using LOAD DATA.
What's the difference between these two? Is there an advantage to using one over the other?
LOAD DATA INFILE is a MySQL statement that works completely independently of phpMyAdmin.
The CSV import probably involves uploading the file to the phpMyAdmin server, where it parses the file and builds a series of INSERT statements to be run against the database server.
Personally, I wouldn't trust anything phpMyAdmin does ;-) - however, actual performance will probably depend on your table structure and the data.
I will note, however, that MySQL takes some very efficient shortcuts when inserting data from a LOAD DATA INFILE command.
As stated above, the LOAD DATA option tells phpMyAdmin to use the MySQL command, so that MySQL parses and loads the file rather than phpMyAdmin parsing it first.
As also stated above, giving MySQL access to load the file can be dangerous if you don't feel 100% secure about the source and accuracy of the file itself. It's like using a PHP form with no SQL injection protection to insert data.
However, in some cases phpMyAdmin does not format the data correctly or has trouble parsing it when the regular "CSV" option is used. This causes unexplained errors such as "invalid format on line N" or "incorrect field count on line N". Those might not be the exact error messages, since I'm not logged into phpMyAdmin at the moment. In these cases the LOAD DATA option can be used to get past the error. I think the extra option of Use local keyword has to do with making sure the correct commands for that specific version of MySQL on the local server are used. I'm not sure about that last part, though.
Something else to keep in mind is the size of the file (the number of lines being imported). I have had to break a 1600-line file into smaller files, even when using the LOAD DATA option, in order to get it to go through. It gave no errors, but the "affected rows" count was incorrect when the file was too big.
The first option will have phpMyAdmin parse the CSV file itself and then generate and execute the SQL to insert the data. The second option will let MySQL take care of loading, processing, and inserting the data.
Both options (should) behave the same way, but the LOAD DATA INFILE option is generally much faster, and you don't have to worry about PHP's memory/execution time limits. The only problem is that it isn't supported by all configurations because there are security implications for giving MySQL access to the uploaded files, and as such it is often disabled (ex. shared hosting).
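To make the first approach concrete, here is a rough Python sketch of client-side parsing followed by row-by-row inserts. It is not phpMyAdmin's actual code. It uses the standard-library `sqlite3` module as a stand-in for MySQL, and `insert_csv` plus the table and column names are hypothetical.

```python
import csv
import io
import sqlite3


def insert_csv(conn, table, csv_text):
    """Parse CSV text client-side and insert it row by row.

    The header row supplies the column names; table and column names are
    interpolated as-is, so they must come from a trusted source.
    """
    reader = csv.reader(io.StringIO(csv_text))
    columns = next(reader)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = list(reader)
    conn.executemany(sql, rows)  # one parameterized INSERT per CSV row
    conn.commit()
    return len(rows)
```

All parsing here happens in the client process, which is why this style of import is bound by the client's memory and execution time limits, while LOAD DATA hands the file to the database server.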
To add to the other replies: the "CSV" option insists that the text file and the table have exactly the same number of columns. "CSV using LOAD DATA" does not.
CSV and CSV using LOAD DATA. The first method is implemented internally by phpMyAdmin and is the recommended one for its simplicity. With the second method, phpMyAdmin receives the file to be loaded and passes it to MySQL. In theory, this method should be faster. However, it has more requirements due to MySQL itself.