Poor API performance with gunicorn

I have developed a REST API in Flask which will be deployed to production. The API basically gets a request with around 70 variables, writes a CSV file, then calls (via os.system(myCommand)) an R script which reads the CSV file, does some math and writes a CSV file with the result. The API then reads the CSV file with the final result and sends it back to the client. To test performance, I sent 60 simultaneous requests, and the total time was around 30 s.
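A minimal sketch of that flow, with hypothetical endpoint, field and script names (and subprocess standing in for the os.system call):

    import csv
    import subprocess
    import uuid

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/score", methods=["POST"])      # hypothetical endpoint name
    def score():
        payload = request.get_json()             # ~70 variables in the real API
        job_id = uuid.uuid4().hex
        in_path = f"/tmp/{job_id}_in.csv"
        out_path = f"/tmp/{job_id}_out.csv"

        # 1. Write the request variables to a CSV file for the R script.
        with open(in_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(payload.keys())
            writer.writerow(payload.values())

        # 2. Call the R script (the real code uses os.system(myCommand)).
        subprocess.run(["Rscript", "model.R", in_path, out_path], check=True)

        # 3. Read the result CSV produced by R and send it back to the client.
        with open(out_path, newline="") as f:
            result = list(csv.reader(f))
        return jsonify(result)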
The API is running on a machine with 10 virtual cores, 64 GB of RAM and an SSD disk.
How many workers should be used? I got slightly better performance with 8 workers instead of 4, but no improvement using 10. What kind of workers should be used in this scenario? I tried both sync and async workers with no difference in time.
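For reference, the gunicorn docs suggest (2 × cores) + 1 workers as a starting point; a sketch of that kind of configuration (file name, port and timeout are just examples):

    # gunicorn_conf.py -- hypothetical config file, started with
    #   gunicorn -c gunicorn_conf.py myapp:app
    import multiprocessing

    bind = "0.0.0.0:8000"
    workers = multiprocessing.cpu_count() * 2 + 1   # 21 on a 10-core machine
    worker_class = "sync"    # sync workers fit a CPU-/subprocess-bound request
    timeout = 120            # allow time for the R step on each request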
Any ideas?
Thanks in advance.
Regards,
Àlex

Related

EC2 instance type for an API with complex database and data calculation

I need advice from anyone who uses AWS EC2 instances to host their projects.
Currently I have a PHP project (the backend API) and a ReactJS project (the frontend). When testing locally, the API response time is 3 seconds (I am still optimizing my backend code to reduce it to 2 seconds), but my main concern is that when deployed to a staging machine in AWS, using a t3.medium for the backend and a t2.medium for the frontend, the response time is at least 19 seconds. Here are my goals:
1. For staging, a response time of at most 5 seconds, since this is mainly used for testing purposes.
2. For production, I want the same response time as on my local machine. My local machine uses an i7 and 16 GB of RAM (with, of course, too many other applications running and lots of Chrome tabs open). The initial target for production is 10-15 users, but this will grow once our app is well tested and stable (I mean the data should be accurate).
At first my plan was to test all the available EC2 instance types and see which of them suits my requirements, particularly the response time, but a friend told me it would cost a lot, since AWS charges for the resources used every time an EC2 instance is provisioned. Also, what is the best approach, given that my backend API runs a lot of scripts? The scripts call the Amazon Selling Partner API and Advertising API, which are themselves very slow at the moment; some of their endpoints have a response time of at least 30 seconds, which is why I decided to run them in the background through cron jobs (sketched below). These scripts also perform database writes after the response from the Amazon API is successful.
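The project is PHP, but as a language-neutral illustration of that cron-driven pattern (the endpoint URL, table and schema below are made up):

    # fetch_amazon_data.py -- illustrative only; the real project does this in PHP.
    # Run from cron, e.g.: */15 * * * * python3 fetch_amazon_data.py
    import sqlite3   # stand-in for the real database

    import requests

    API_URL = "https://example.com/selling-partner/orders"   # hypothetical endpoint

    def main():
        # The upstream API can take 30+ seconds, so allow a generous timeout.
        resp = requests.get(API_URL, timeout=60)
        if resp.status_code != 200:
            return   # only write to the database when the API call succeeds

        conn = sqlite3.connect("app.db")
        with conn:
            for order in resp.json().get("orders", []):
                conn.execute(
                    "INSERT OR REPLACE INTO orders (id, total) VALUES (?, ?)",
                    (order["id"], order["total"]),
                )
        conn.close()

    if __name__ == "__main__":
        main()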
Thank you

Do Firebase Functions count toward my GB bandwidth?

I have written two functions in Firebase which maintain data, for example deleting old data daily.
My question is: when I write a query to get data, does it count toward my GB downloaded limit, which is $1/GB on the Blaze plan?
Since the data is transferred from Firebase servers (Google servers) to a user's computer (that is, yours in this case), you will be charged for all of that data transferred to your computer.

HTTP POST Transmission suggestion

I'm building a system which requires an Arduino board to send data to the server.
The requirements/constraints of the app are:
The server must receive data and store them in a MySQL database.
A web application is used to graph and plot historical data.
Data consumption is critical.
Web application must also be able to plot data in real time.
So far the system is working fine; however, optimization is required.
The current adopted steps are:
Accumulate data in Arduino board for 10 seconds.
Send the data to the server using POST with data containing an XML string representing the 10 records.
The server parses the received XML and stores the values in the database (a rough sketch of this step is shown below).
This approach is good for historical data, but not for realtime monitoring.
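A rough server-side sketch of that parse-and-store step; the XML layout, table and connection details are assumptions:

    # Assumed batch format: <batch><record ts="1692000000" temp="21.5"/>...</batch>
    import xml.etree.ElementTree as ET

    import pymysql   # one of several MySQL drivers; credentials are placeholders

    def store_batch(xml_string: str) -> None:
        root = ET.fromstring(xml_string)
        rows = [(r.get("ts"), r.get("temp")) for r in root.findall("record")]

        conn = pymysql.connect(host="localhost", user="app", password="secret",
                               database="sensors")
        try:
            with conn.cursor() as cur:
                # One multi-row INSERT per POST: the 10 accumulated records
                # arrive together and are stored together.
                cur.executemany(
                    "INSERT INTO readings (recorded_at, temperature) VALUES (%s, %s)",
                    rows,
                )
            conn.commit()
        finally:
            conn.close()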
My question is: is there a difference between:
accumulating the data and sending it as XML, and
sending the data each second?
In terms of data consumption, is sending a POST request each second too much?
Thanks
EDIT: Can anybody provide a mathematical formula comparing the two approaches in terms of data consumption?
For your data consumption question, you need to figure out how much each POST costs you given your cell phone plan. I don't know if there is a mathematical formula, but you could easily test and work it out.
However, using 3G (or even WiFi for that matter), power consumption will be an issue, especially if your circuit runs on a battery; each POST draws a burst of around 1.5 A, which is too much for sending data every second.
But again, why would you send data every second?
Real time doesn't mean sending data every second, it means being at least as fast as the system.
For example, if you are sending temperatures, temperature doesn't change from 0° to 100° in one second. So all those POSTs will be a waste of power and data.
You need to know how fast the parameters change in your system and adapt your POST rate accordingly.
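There is no universal formula, but a back-of-the-envelope comparison is easy to write down once you measure your own per-POST overhead; all byte counts below are assumptions to replace with measured values:

    # Rough estimate only: real overhead depends on HTTP headers, TCP/IP framing
    # and how your carrier bills traffic, so measure these numbers on your setup.
    HEADER_OVERHEAD = 300     # assumed bytes of HTTP request/response headers per POST
    BYTES_PER_RECORD = 60     # assumed size of one record serialized as XML
    RECORDS_PER_BATCH = 10    # accumulate 10 seconds of data, then send

    def bytes_per_minute(posts_per_minute: int, records_per_post: int) -> int:
        """Approximate bytes per minute for a given POST rate."""
        return posts_per_minute * (HEADER_OVERHEAD + records_per_post * BYTES_PER_RECORD)

    per_second = bytes_per_minute(60, 1)                                     # one record per POST, every second
    batched = bytes_per_minute(60 // RECORDS_PER_BATCH, RECORDS_PER_BATCH)   # one POST per 10 s

    print(f"1 record/POST every second : {per_second} bytes/min")   # 21600 with these numbers
    print(f"10 records/POST every 10 s : {batched} bytes/min")      #  5400 with these numbers
    # The payload bytes are the same either way; the per-second approach just pays
    # the header overhead 60 times a minute instead of 6.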

Data Importing using Azure Web Jobs to Azure SQL

Just looking for some advice on the best way to handle data imports via scheduled WebJobs.
I have 8 JSON files that are imported every 5 hours via an FTP client, deserialized into memory, and then processed and inserted into Azure SQL using EF6. Each file is processed sequentially in a loop, as I wanted to make sure that all data is inserted correctly; when I tried to use a Parallel.ForEach, some of the data was not being inserted into related tables. So if the WebJob fails, I know there has been an error and we can run it again. The problem is that this now takes a long time to complete, nearly 2 hours, as we have a lot of data: each file has 500 locations, and each location has 11 days of 24-hour data.
Does anyone have any ideas on how to do this more quickly, whilst ensuring that the data is always inserted correctly and errors are handled? I was looking at using Storage queues, but we may need to point to other databases in the future. Or could I use one WebJob per file, so 8 WebJobs each scheduled every 5 hours? I think there is a limit to the number of WebJobs I can run per day.
Or is there an alternative way of importing data into Azure SQL that can be scheduled?
Azure WebJobs (via the WebJobs SDK) can monitor and process blobs. There is no need to create a scheduled job: the SDK can monitor for new blobs and process them as they are created. You could break your processing up into smaller files and load them as they are created.
Azure Stream Analytics has similar capabilities.

LOAD DATA not available; fgetcsv times out; alternatives?

I have a site where a CSV of racehorse data is to be uploaded once a week. The CSV contains the details of about 19,000 racehorses currently registered in the UK and is about 1.3 MB in size, on average. I have a script that processes that CSV and either updates a horse if it exists and the ratings data has changed, or adds it if it doesn't exist. If a horse is unchanged, it skips to the next one. The script works: it ran fine on the host I use for testing. It took 5 or 6 minutes to run (less than ideal, I know), but it worked.
However, we're now testing on the staging version of the client's host, and it's running for 15 minutes and then returning a 504 timeout. We've tweaked htaccess and php.ini settings as much as we're able ... no joy.
The host is in a shared environment, so they tell me that MySQL's LOAD DATA is unavailable to us.
What other alternative approaches would you try? Or is there a way of splitting the CSV into chunks and running a process on each one in turn, for example?
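For example, the kind of chunked approach I have in mind, sketched in Python rather than PHP; the file name, chunk size and process_horse() helper are placeholders:

    # Illustrative only: process_horse() stands in for the existing
    # update/insert/skip logic. The idea is to handle the ~19,000 rows in
    # batches, so each batch finishes (or can be resumed) well inside the timeout.
    import csv
    from itertools import islice

    CHUNK_SIZE = 1000   # placeholder batch size

    def process_horse(row: dict) -> None:
        """Placeholder for the existing update-or-insert logic."""
        pass

    def process_in_chunks(path: str) -> None:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            while True:
                chunk = list(islice(reader, CHUNK_SIZE))
                if not chunk:
                    break
                for row in chunk:
                    process_horse(row)
                # A real version could commit here and record the last row handled,
                # so a follow-up run (e.g. cron-triggered) can resume where it left off.

    process_in_chunks("racehorses.csv")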