TIMEOUT in Laravel - mysql

So, I have to read an Excel file in which each row contains some data that I want to write to my database. I pass the whole file to Laravel, which reads the file and formats it into an array, and then I insert (or update) each record in my database.
The thing is, the input Excel file can contain thousands of rows, so the import takes a while to complete and causes a timeout error in some cases.
When I run this locally I call set_time_limit(0); so the timeout doesn't occur, and it works pretty well. But on the remote server that function is disabled for security reasons, and my code crashes because of the timeout.
Can somebody help me solve this? Or maybe suggest a better approach to the problem?

A nice way to handle tasks that take a long time is by making use of so-called jobs.
You can create a job called ImportExcel and dispatch it when someone sends you a file.
Take a good look at the docs; they have some great examples of how to do this.
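For illustration, here is a minimal sketch of what that could look like. The ImportExcel class name comes from the answer above, while the storage path, the controller snippet, and the body of handle() are assumptions you would replace with your own parsing and insert/update logic.

<?php
// app/Jobs/ImportExcel.php -- minimal queued-job sketch (illustrative only)

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ImportExcel implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    /** @var string Path of the uploaded file inside storage/ */
    protected $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function handle()
    {
        // Read the stored file row by row and insert/update your records here.
        // Because this runs on a queue worker, the HTTP request returns
        // immediately and the web server's execution time limit no longer applies.
    }
}

// In the controller: store the upload, then dispatch the job.
// $path = $request->file('excel')->store('imports');
// ImportExcel::dispatch($path);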

You can take care of this using the following steps:
1. Take the CSV file and store it temporarily in storage:
You can store the large CSV when the user uploads it. If it's something that is not uploaded from the frontend, just make sure it is saved somewhere so it can be processed in the next step.
2. Then dispatch a job which can be queued:
You can create a job which handles this asynchronously. You can use Supervisor to manage the queue workers, timeouts, etc.
3. Use a package like thephpleague's CSV library:
Using this package (or a similar one), you can chunk the records or read them one at a time. That really helps to keep your memory usage under the limit, and it offers several different methods for reading data from files. A minimal sketch covering steps 2 to 4 follows this list.
4. Once the file is processed, delete it from temporary storage:
Just some teardown/cleanup activity.
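Here is a rough sketch, assuming the queued job from step 2 uses league/csv (version 9) to read the stored file in chunks. The table name, column names, chunk size, and products schema are all made up for illustration, and updateOrInsert assumes a Laravel version that provides it.

// Inside the queued job from step 2 -- sketch of a chunked handle() method.

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Storage;
use League\Csv\Reader;
use League\Csv\Statement;

public function handle()
{
    $csv = Reader::createFromPath(Storage::path($this->path), 'r');
    $csv->setHeaderOffset(0); // first row holds the column names

    $chunkSize = 500;
    $offset    = 0;

    do {
        $records = (new Statement())->offset($offset)->limit($chunkSize)->process($csv);
        $count   = 0;

        foreach ($records as $row) {
            // Insert or update one record at a time; adapt to your own schema.
            DB::table('products')->updateOrInsert(
                ['sku' => $row['sku']],
                ['name' => $row['name'], 'price' => $row['price']]
            );
            $count++;
        }

        $offset += $chunkSize;
    } while ($count === $chunkSize);

    // Step 4: clean up the temporary file once the import is done.
    Storage::delete($this->path);
}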


Options for storing and retrieving small objects (node.js), is a database necessary?

I am in the process of building my first live node.js web app. It contains a form that accepts data regarding my client's current stock. When submitted, an object is created and saved to an array of current stock. This stock is then permanently displayed on their website until the entry is modified or deleted.
It is unlikely that there will ever be more than 20 objects stored at any time, and these will only be updated perhaps once a week. I am not sure whether it is necessary to use MongoDB to store these, or whether there is a simpler, more appropriate alternative. Perhaps the objects could be stored in a JSON file instead? Or would that have too big an impact on page load times?
You could potentially store the data in a JSON file, or even in a cache of sorts such as Redis, but I still think MongoDB would be your best bet for a live site.
Storing something in a JSON file is not scalable, so if you end up storing a lot more data than originally planned (this often happens), you may find you run out of storage on your server's hard drive. Also, if you end up scaling and putting your app behind a load balancer, you will need to make sure there are matching copies of that JSON file on each server. Furthermore, it is easy to run into race conditions when updating a JSON file: if two processes try to update the file at the same time, you can potentially lose data. Technically speaking, a JSON file would work, but it's not recommended.
Storing in memory (i.e., Redis) has a similar implication in that the data is only available on that one server. Also, the data is not persistent, so if your server restarted for whatever reason, you'd lose what was stored in memory.
For all intents and purposes, MongoDB is your best bet.
The only way to know for sure is to run a load test. But since you probably read HTML and JS files from the file system when serving web pages anyway, the extra load of reading a few JSON files shouldn't be a problem.
If you want to go the simpler route, i.e. a JSON file, use the NeDB API, which is plenty fast as well.

EC2 suitability for synching large CSV files from an FTP

I have to execute a task twice per week. The task consists of fetching a 1.4 GB CSV file from a public FTP server. Then I have to process it (apply some filters, discard some rows, make some calculations) and then sync it to a Postgres database hosted on AWS RDS. For each row I have to retrieve a SKU entry from the database and determine whether it needs an update or not.
My question is whether EC2 could work as a solution for me. My main concern is memory. I have looked at solutions such as https://github.com/goodby/csv which handle this issue by fetching row by row instead of pulling everything into memory; however, they do not work if I try to read the .csv directly from the FTP.
Can anyone provide some insight? Is AWS EC2 a good platform to solve this problem? How would you deal with the CSV size and the memory limitations?
You won't be able to stream the file directly from FTP; instead, you are going to have to copy the entire file and store it locally. Using the curl or ftp command is likely the most efficient way to do this.
Once you do that, you will need to write some kind of program that reads the file one line at a time, or several at a time if you can parallelize the work. There are ETL tools available that will make this easy. Using PHP can work, but it's not a very efficient choice for this type of work, and your parallelization options are limited.
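As a rough illustration of that download-then-stream pattern (sketched in PHP, even though, as noted, it is not the most efficient choice), something like the following keeps only one row in memory at a time; the file path and column handling are hypothetical.

<?php
// Sketch: the file has already been copied locally, e.g. with
//   curl -o /tmp/products.csv ftp://example.com/export/products.csv

$localPath = '/tmp/products.csv';

$handle = fopen($localPath, 'r');
if ($handle === false) {
    exit("Could not open {$localPath}\n");
}

$header = fgetcsv($handle); // first row: column names

while (($row = fgetcsv($handle)) !== false) {
    $record = array_combine($header, $row);

    // Filter/transform the row here, then look up the SKU in Postgres
    // (e.g. via PDO) and decide whether it needs an INSERT or an UPDATE.
}

fclose($handle);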
Of course you can do this on an EC2 instance (you can do almost anything you can supply the code for in EC2), but if you only need to run the task twice a week, the EC2 instance will be sitting idle, eating money, the rest of the time, unless you manually stop and start it for each task run.
A scheduled AWS Lambda function may be more cost-effective and appropriate here. You are slightly more limited in your code options, but you can give the Lambda function the same IAM privileges to access RDS, and it only runs when it's scheduled or invoked.
The FTP protocol doesn't do "streaming": you cannot read the file from FTP chunk by chunk.
Honestly, downloading the file and spinning up a bigger instance is not a big deal if you only run it twice a week. Just choose an r3.large (it costs less than $0.20/hour), execute as fast as possible, and stop it. The instance's internal SSD should give you the best possible I/O compared to EBS.
Just make sure your OS and code are deployed on EBS for future reuse (unless you have an automated code deployment mechanism). And you must make sure RDS can handle the burst of I/O; otherwise it will become the bottleneck.
Even better, on an r3.large instance you can split the CSV file into smaller chunks and load them in parallel, then shut down the instance after everything finishes. You just need to pay the minimal root EBS storage cost afterwards.
I would not suggest Lambda if the process is lengthy, since Lambda is only meant for short and fast processing (it will terminate after 300 seconds).
(update):
If you open up a file, the simple way to parse it is to read it sequentially, which may not put the CPU to full use. You can split up the CSV file following this answer here.
Then, using the same script, you can process the pieces simultaneously by sending some of them to background processes; the example below shows putting Python processes in the background under Linux.
parse_csvfile.py csv1 &
parse_csvfile.py csv2 &
parse_csvfile.py csv3 &
So instead of sequential I/O on a single file, it will make use of multiple files. In addition, splitting the file should be a snap on an SSD.
So I made it work like this.
I used Python and two great libraries. First of all, I wrote a Python script to request and download the CSV file from the FTP so I could load it into memory. The first package is Pandas, which is a tool for analyzing large amounts of data and includes methods to read CSV files easily. I used its built-in features to filter and sort: I filtered the large CSV by a field and created about 25 new, smaller CSV files, which allowed me to deal with the memory issue. I also used Eloquent, which is a library inspired by Laravel's ORM. This library lets you create a connection using the AWS public DNS, database name, username, and password, and make queries using simple methods, without writing a single Postgres query. Finally, I created a t2.micro AWS instance, installed Pandas and Eloquent, updated my code, and that was it.

mysql db heavy load and numerous connections

In my work I need to revamp a website that must always accept numerous connections. Until now I have been reading the data from JSON, but now I want to call the DB directly to get the data. As far as I know, using a cache is the best approach for my site, but initially there will often be concurrent access to the DB. Any advice on how to handle this situation? I want the site to always serve up-to-date data.
Thanks.
Here are my suggestions:
If you want to use a cache, you have to automate your cache-clearing process so that it runs whenever the particular data you hit is updated (see the sketch after this list). This is only practical if your data is updated infrequently.
If your budget allows, put your DB in a cluster (write to the master and read from the master and slaves).
In the worst case, at least ensure your DB is properly indexed.
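As a minimal sketch of the first suggestion (read-through caching with invalidation on write), assuming Memcached and a hypothetical products table; the key names and TTL are illustrative:

<?php
// Read-through cache with invalidation on update (illustrative only).

$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$pdo = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'secret');

function getProduct(PDO $pdo, Memcached $cache, int $id)
{
    $key = "product:{$id}";

    $cached = $cache->get($key);
    if ($cached !== false) {
        return $cached; // served from cache, no DB hit
    }

    $stmt = $pdo->prepare('SELECT * FROM products WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    $cache->set($key, $row, 300); // cache for 5 minutes
    return $row;
}

function updateProduct(PDO $pdo, Memcached $cache, int $id, float $price)
{
    $pdo->prepare('UPDATE products SET price = ? WHERE id = ?')->execute([$price, $id]);

    // The important part: clear the cached copy whenever the row changes,
    // so readers always see up-to-date data on the next request.
    $cache->delete("product:{$id}");
}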

Optimisation of the export function

I am wondering what would be the best way to speed up the company export function in my application:
function export() {
    ini_set('memory_limit', '-1');
    ini_set('max_execution_time', 0);

    $conditions = $this->getConditions($this->data);
    $resultCompanies = $this->Company->find('all', array('conditions' => $conditions));

    $this->set(compact('resultCompanies'));
}
So what it does is search for the companies in the database that match certain conditions. The results are then set so they can be displayed in the corresponding view.
How can I speed this function up? The more results you want to export, the longer it takes, but is it possible to optimize it in some way? It currently takes around 30 seconds to export just 4000 results, so I can't imagine it handling, say, 40000 results, which it should be able to. Is it a matter of the server?
Thanks.
This is not a matter of the server but of the program architecture.
You do not want to fetch and render this huge amount of information on the fly, for the obvious reasons you have already encountered.
I don't know enough about the requirements of your app, but I assume that you need to download a report. Assuming it always has to be up to date, here is what I would do:
The user clicks the link to download the report. Using JS and AJAX, the user is shown a loading indicator and a message that the report export is being prepared. On the server side, a task is triggered to build the report.
A background service, a simple CakePHP shell that runs in a loop, will notice that there is a new report to build. It will build the report by reading the DB records in chunks, to avoid running out of memory, and write it to a file. When it's done, the report request is flagged as done and the file can be downloaded. On the client side, the long-polling JS script notices that the file is ready and downloads it.
Another solution, assuming the data does not have to be up to date, would be to generate the report files once per day, for example, and have them available for download without any waiting time for the user. On the server side the task stays the same: read and write the data in chunks, as in the sketch below.
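A rough sketch of that chunked read-and-write loop in a CakePHP 2-style shell (matching the syntax of the question); the shell name, chunk size, output file, and completion flag are all assumptions:

// app/Console/Command/ReportShell.php -- illustrative chunked export

App::uses('AppShell', 'Console/Command');

class ReportShell extends AppShell {

    public $uses = array('Company');

    public function build() {
        $conditions = array(/* same conditions as in the export action */);
        $limit  = 1000;   // rows per chunk -- keeps memory usage flat
        $page   = 1;
        $handle = fopen(TMP . 'company_report.csv', 'w');

        do {
            $companies = $this->Company->find('all', array(
                'conditions' => $conditions,
                'limit'      => $limit,
                'page'       => $page,
                'recursive'  => -1, // fetch only the Company table
            ));

            foreach ($companies as $company) {
                fputcsv($handle, $company['Company']);
            }

            $page++;
        } while (count($companies) === $limit);

        fclose($handle);
        // Flag the report as ready here (e.g. update a row the JS polls against).
    }
}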
About the performance part of the question:
This does not make it faster, but it gives the user feedback; you could even estimate the remaining time based on the chunks already processed, and it also prevents the script from crashing because it ran out of memory. Instead of writing the file to disk, you could stream it directly to the client: as soon as the first chunk is read, you start sending the data. But as for reading the data from the database... well, throw money and hardware at it. I would suggest something like a RAID 5 array of SSDs if you have the money. Expect to spend a few thousand dollars on it.
But even the fastest DB read is limited by the bandwidth you can send and the user can receive. For speeding up the DB I recommend asking on superuser.com; I'm not an expert in DB hardware, but a properly set up SSD configuration gives your DB a huge speed boost.
Fetch only the data you need:
I don't see any contain() or recursive setting in your find call: make sure you only fetch data that is really needed for the report!
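For example, a minimal tweak along these lines (the field list is hypothetical) keeps related models and unused columns out of the query:

$resultCompanies = $this->Company->find('all', array(
    'conditions' => $conditions,
    'fields'     => array('Company.id', 'Company.name', 'Company.city'), // only what the report needs
    'recursive'  => -1, // or attach the Containable behavior and use an explicit contain()
));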

mysql huge operations

I am currently importing a huge CSV file from my iPhone to a Rails server. The server parses the data and then starts inserting the rows into the database. The CSV file is fairly large, so the operation takes a long time to finish.
Since I am doing this asynchronously, my iPhone is then able to go to other views and do other stuff.
However, when it issues another query against another table, the request HANGS because the first operation is still inserting the CSV's data into the database.
Is there a way to resolve this type of issue?
As long as the phone doesn't care when the database insert is complete, you might want to try storing the CSV file in a tmp directory on your server and then having a script write from that file to the database. Or simply store it in memory. That way, once the phone has posted the CSV file, it can move on to other things while the script handles the database inserts asynchronously. And yes, @Barmar is right about using the InnoDB engine rather than MyISAM (which may be the default in some configurations).
Or, you might want to consider enabling "low-priority updates", which delay write calls until all pending read calls have finished. See this article about MySQL table locking. (I'm not sure which you are saying hangs: the update, or reads while the update is running…)
Regardless, if you are posting the data asynchronously from your phone (i.e., not from the UI thread), it shouldn't be an issue as long as you don't try to use more than the maximum number of concurrent HTTP connections.