Suppose I have multiple users.
Which of the following approaches will give better performance if I have to process every user's data?
A database hit per user to fetch that user's filtered data, or
Fetch all users' data in a single database hit and then filter it with expressions/loops such as lambdas or LINQ.
It depends on your data.
If it is small, then load it all and process it in a loop/stream.
If it is very big, then it is best to combine the two approaches:
load a chunk of data, process it, load the next chunk, process it, and so on.
The issue is that loading from a DB takes time (opening a connection, etc.), but loading the entire DB into memory is also problematic if it is big, so you need to combine the two options.
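A minimal sketch of that chunked approach, shown here with PHP and PDO purely for illustration (with LINQ, Skip/Take gives the same effect); the connection details, table and column names are hypothetical:

<?php
// Process a large table in fixed-size chunks instead of loading it all at once.
// Connection details and the user_data table are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
$chunkSize = 1000;
$offset = 0;

do {
    $stmt = $pdo->prepare(
        'SELECT user_id, payload FROM user_data ORDER BY user_id LIMIT :limit OFFSET :offset'
    );
    $stmt->bindValue(':limit', $chunkSize, PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    foreach ($rows as $row) {
        // process one row here (filter, aggregate per user, etc.)
    }

    $offset += $chunkSize;
} while (count($rows) === $chunkSize); // a short chunk means we reached the end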
hope it helps.
I have a scenario wherein a frontend app makes a call to a backend DB (APP -> API Gateway -> SpringBoot -> DB) with a JSON request. The backend returns a very large dataset (>50,000 rows) in the response, roughly 10 MB in size.
My frontend app is highly responsive and mission critical, but we are seeing performance issues: the app stops responding or times out. What would be the best design to resolve this issue, considering:
The DB query can't be normalized any further.
The SpringBoot code already has caching built in.
No data can be left behind, due to the intrinsic nature of the data.
No multiple calls can be made, as all the data is needed in the first call itself.
Can any cache be built in-between frontend and backend?
Thanks.
Sounds like this is a generated report from a search. If this data all belongs to one result set, I'd assign the search an id and store the results on the server. Then pull the data for this id as needed on the frontend. You should never have to send 50,000 rows to the client in one go... Paginate the data and pull it as needed if you have to.

If you don't want to paginate, how much data can they display on a single screen? You can pull more data from the server based on where they scroll on the page. You should only need to return the row count to the frontend, plus maybe 100 rows of data. This lets you show a scrollbar with the right height. When they scroll to a certain position within the data, you pull the corresponding offset from the server for that particular search id.

Even if you could return all 50,000+ rows in one go, it doesn't sound very friendly to the end user's device to have to hold that much in memory for a functional page.
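To make that concrete, here is a rough sketch of the paged endpoint such a design implies. It is written in PHP purely to show the request/response shape (the backend in the question is Spring Boot, and the search_results table, row_no column, and parameter names are all hypothetical):

<?php
// Sketch of the "stored search" contract. The search runs once on the server;
// its rows are written to a search_results table keyed by a generated search id,
// and only the id and the total row count go back to the client. The client then
// pages through the stored result set as the user scrolls:
function fetchPage(PDO $pdo, string $searchId, int $offset, int $limit = 100): array {
    $stmt = $pdo->prepare(
        'SELECT payload
           FROM search_results
          WHERE search_id = :id
          ORDER BY row_no
          LIMIT :limit OFFSET :offset'
    );
    $stmt->bindValue(':id', $searchId);
    $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}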
This is a sign of a flawed frontend that should be redone.
10 MB is huge and can be inconsiderate to your users, especially if there's a high probability of mobile use.
If possible, it would be best to collect this data on the backend, probably put it onto disk, and then provide only the necessary data to the frontend as it's needed. As the map needs more data, you would make further calls to the backend.
If this isn't possible, you could load this data with the client-side bundle. If the data doesn't update too frequently, you can even cache it on the frontend. This would at least prevent the user from needing to fetch it repeatedly.
At my work I need to revamp a website that must always accept numerous concurrent connections. Until now I have been using JSON to get the data, but now I want to call the DB directly. As far as I know, using a cache is the best approach for my site, but initially concurrent access to the DB will happen often. Any advice on how to handle this situation? I want the site to always serve up-to-date data.
Thanks.
Following are my suggestions
If you want to use a cache, you have to automate your cache-clearing process whenever the particular data you hit is updated (see the sketch after this list). This is only practical if your data is updated infrequently.
If your budget allows, put your DB in a cluster (write to the master and read from both master and slaves).
In the worst case, at least ensure your DB is properly indexed.
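For the first suggestion, the usual pattern is to clear (or overwrite) the cache entry in the same code path that writes to the database. A minimal sketch using PHP with the Memcached extension; the users table, key scheme and TTL are made up:

<?php
// Update the row, then invalidate its cache entry so the next read repopulates
// the cache with fresh data. Table name and key scheme are illustrative only.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

function updateUser(PDO $pdo, Memcached $memcached, int $userId, string $name): void {
    $stmt = $pdo->prepare('UPDATE users SET name = :name WHERE id = :id');
    $stmt->execute([':name' => $name, ':id' => $userId]);

    // Invalidate so the next read goes to the DB and re-caches the result...
    $memcached->delete("user:$userId");
    // ...or refresh the entry right away if reads must never miss:
    // $memcached->set("user:$userId", ['id' => $userId, 'name' => $name], 300);
}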
I am wondering what would be the best way to speed up the company export function in my application:
function export() {
    // Lift PHP's limits so large exports don't die halfway through
    ini_set('memory_limit', '-1');
    ini_set('max_execution_time', 0);

    // Build the find() conditions from the submitted form data
    $conditions = $this->getConditions($this->data);

    // Fetch every matching company in a single query
    $resultCompanies = $this->Company->find('all', array('conditions' => $conditions));

    // Hand the full result set to the view for rendering
    $this->set(compact('resultCompanies'));
}
So what it does is search for the companies in the database that match certain conditions. The results are then set so they can be displayed in the corresponding view.
How can I speed this function up? The more results you want to export, the more time it takes, but is it possible to optimize it in some way? It currently takes around 30 seconds to export just 4,000 results, so I can't imagine it being able to export, say, 40,000 results, and it should be able to. Is it a matter of the server?
Thanks.
This is not a matter of the server but of the program architecture.
You do not want to fetch and render this huge amount of information on the fly for obvious reasons you already have encountered.
I don't know enough about the requirements of your app, but I assume that you need to download a report. Assuming it always has to be up to date, here is what I would do:
The user clicks the link to download the report. Using JS and AJAX, the user is shown a loading indicator and a message that the report export is being prepared. On the server side a task is triggered to build the report.
A background service, a simple CakePHP shell that runs in a loop, will notice that there is a new report to build. It will build the report reading the db records in chunks to avoid running out of memory and write it to a file. When it's done the report download request is flagged as done and the file can be downloaded. On the client side the long polling JS script notices that the file is ready and downloads it.
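A rough sketch of such a shell, assuming CakePHP 2.x conventions; the ReportRequest model, the status field, and the stored conditions are invented for illustration:

<?php
// app/Console/Command/ReportShell.php (names and fields are illustrative only).
// Called repeatedly from the shell's main loop (or from cron) to pick up and
// build pending report requests flagged by the controller.
class ReportShell extends AppShell {
    public $uses = array('Company', 'ReportRequest');

    public function build() {
        $request = $this->ReportRequest->find('first', array(
            'conditions' => array('ReportRequest.status' => 'pending'),
            'order' => array('ReportRequest.created' => 'ASC'),
        ));
        if (!$request) {
            return; // nothing to build this time around
        }

        // Write the report to a file; assumes app/tmp/reports/ exists.
        $file = fopen(TMP . 'reports' . DS . $request['ReportRequest']['id'] . '.csv', 'w');
        $limit = 1000;
        $page = 1;

        // Read and write the data in chunks so memory use stays flat.
        do {
            $chunk = $this->Company->find('all', array(
                'conditions' => unserialize($request['ReportRequest']['conditions']),
                'recursive'  => -1,
                'limit'      => $limit,
                'page'       => $page++,
            ));
            foreach ($chunk as $row) {
                fputcsv($file, $row['Company']);
            }
        } while (count($chunk) === $limit);

        fclose($file);

        // Flag the request as done so the long-polling JS can start the download.
        $this->ReportRequest->id = $request['ReportRequest']['id'];
        $this->ReportRequest->saveField('status', 'done');
    }
}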
Another solution, assuming the data does not have to be up to date, would be to generate the report files once per day, for example, and have them available for download without any waiting time for the user. On the server side the task stays the same: read and write the data in chunks.
About the performance part of the question:
All of this does not make the export itself faster, but it gives the user feedback; you could even estimate the remaining time based on the chunks already processed, and it prevents the script from crashing because it ran out of memory. Instead of writing the file to disk you could stream it directly to the client: as soon as the first chunk has been read you start sending the data. But reading the data from the database... well, throw money and hardware at it. I'd suggest something like a RAID 5 array of SSDs if you have the money. Expect to spend a few thousand dollars on it.
But even the fastest DB read is limited by the bandwidth you can send and the user can receive. For speeding up the DB hardware I recommend asking on superuser.com; I'm not an expert on DB hardware, but a properly set up SSD configuration gives your DB a huge speed boost.
Fetch only the data you need:
I don't see any contain() or recursive setting in your find: Make sure you only fetch data that is really needed in the report!
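For example (a sketch only: the column list and the Address association are hypothetical), restrict the query to exactly what the report needs:

$resultCompanies = $this->Company->find('all', array(
    'conditions' => $conditions,
    'fields'     => array('Company.id', 'Company.name', 'Company.city'), // hypothetical columns
    'recursive'  => -1, // do not pull in associated models at all
    // or, with the Containable behavior, pull in only what the report uses:
    // 'contain' => array('Address'),
));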
I am currently importing a huge CSV file from my iPhone to a Rails server. In this case, the server parses the data and then starts inserting rows into the database. The CSV file is fairly large and the operation takes a long time to finish.
Since I am doing this asynchronously, my iPhone is able to go to other views and do other things.
However, when it issues another query against another table, that request HANGS because the first operation is still inserting the CSV's data into the database.
Is there a way to resolve this type of issue?
As long as the phone doesn't care when the database insert is complete, you might want to try storing the CSV file in a tmp directory on your server and then have a script write from that file to the database. Or simply store it in memory. That way, once the phone has posted the CSV file, it can move on to other things while the script handles the database inserts asynchronously. And yes, @Barmar is right about using the InnoDB engine rather than MyISAM (which may be the default in some configurations).
Or, you might want to consider enabling "low-priority updates" which will delay write calls until all pending read calls have finished. See this article about MySQL table locking. (I'm not sure what exactly you say is hanging: the update, or reads while performing the update…)
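For reference, low-priority writes can be requested per statement or per connection; a rough sketch (shown with PHP/PDO against MySQL, and the import_rows table is made up):

<?php
// Two ways to make MyISAM writes yield to pending reads.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');

// Per statement: LOW_PRIORITY delays this insert until no client is reading the table.
$pdo->exec("INSERT LOW_PRIORITY INTO import_rows (payload) VALUES ('example')");

// Per connection: every subsequent write on this connection becomes low priority.
$pdo->exec('SET low_priority_updates = 1');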
Regardless, if you are posting the data asynchronously from your phone (i.e., not from the UI thread), it shouldn't be an issue as long as you don't try to use more than the maximum number of concurrent HTTP connections.
On the customizable front page of our web site, we offer users the option of displaying modules that show recently updated content, choosing from well over 100 modules.
All of the data is generated by MySQL queries, the results of which are cached via memcached. Our current system works like this: when a user loads a page containing modules, they are immediately served the data from cache, and the query is added to a queue to be refreshed by a separate gearman process (so that the page load does not wait for the MySQL query). Each queued query is then run once every 15 minutes to refresh the data in cache. The queue of queries itself is periodically purged so that we do not continually refresh data that has not been requested recently.
The problem is what to do when the cache is empty, for some reason. This doesn't happen often, but when it does, the user is currently shown an empty module, and the data is refreshed in the gearman process so that a bit later, when the same (or a different) user reloads the page, there is data to show.
Our traffic is such that, if we were to try to run the query live for the user when the cache is empty, we would have a serious problem with stampeding--we'd be running the same (possibly slow) query many times as many users loaded the page. Is there any way to solve the "blank module" problem without opening up the risk of stampeding?
This is an interesting implementation, though it varies a bit from the way most people typically implement memcached in front of MySQL.
In most cases, queries are first evaluated against memcached to see if there is an available entry. If so, it is served from memcached and the database is never queried at all. If there is a cache miss, the query is made against the database, the results are added to memcached, and the information is returned to the caller. This is how you would typically build up your cache for read queries.
In cases where data is being updated, the update would be made against the database, and then the appropriate data in memcached invalidated and/or updated. Similarly for inserts, you could either do nothing regarding the cache (and let the next read on that record populate the cache), or you could actively add the data related to the insert into the cache, depending on your application needs.
In this way you wouldn't need to take the extra step of calling the database to get authoritative data after getting initial data from memcached. The data in memcached would be a copy of the authoritative data which is just updated/invalidated upon updates/inserts.
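The read path of that pattern looks roughly like this (a sketch in PHP with the Memcached extension; the key scheme, the module_content query and the 15-minute TTL are placeholders):

<?php
// Cache-aside read: try memcached first, fall back to MySQL on a miss,
// then populate the cache for subsequent readers. Names are illustrative.
function getModuleData(Memcached $memcached, PDO $pdo, string $moduleKey): array {
    $cached = $memcached->get($moduleKey);
    if ($cached !== false) {
        return $cached; // cache hit: the database is never touched
    }

    // Cache miss: run the (possibly slow) module query against MySQL.
    $stmt = $pdo->prepare(
        'SELECT * FROM module_content WHERE module_key = :k ORDER BY updated DESC LIMIT 20'
    );
    $stmt->execute([':k' => $moduleKey]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    $memcached->set($moduleKey, $rows, 900); // cache for 15 minutes
    return $rows;
}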
Based on your comments, one thing you might want to try in order to prevent a flood of queries against your database on cache misses is to use a mutex of sorts. For example, when the first client hits memcached and gets a cache miss for that lookup, it could insert a temporary value in memcached indicating that the data is pending, then make the query against the database, and then update the memcached entry with the result.
On the client side, when you get a cache miss or a "pending" result, you simply retry the cache after a certain period of time (which you may want to increase exponentially). So perhaps they first wait 1 second, then try again after 2 seconds if they still get a "pending" result, then retry after 4 seconds, and so on.
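A sketch of that pending-marker idea, again using the PHP Memcached extension: Memcached::add() only succeeds if the key does not already exist, which is what gives you the mutex. Key names, TTLs and the query are made up:

<?php
// On a cache miss, only the first client gets to run the query; everyone else
// sees the "pending" marker and retries with exponential backoff.
function getWithStampedeGuard(Memcached $memcached, PDO $pdo, string $key): array {
    $attempt = 0;
    while (true) {
        $value = $memcached->get($key);
        if ($value !== false && $value !== 'pending') {
            return $value; // real cached data
        }

        if ($value === false) {
            // add() is atomic: it fails if another client already set the marker,
            // so exactly one client wins the right to run the query.
            if ($memcached->add($key, 'pending', 30)) {
                $stmt = $pdo->query('SELECT * FROM module_content ORDER BY updated DESC LIMIT 20');
                $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
                $memcached->set($key, $rows, 900);
                return $rows;
            }
        }

        // Someone else is building the entry: back off 1s, 2s, 4s, ... capped at 8s.
        // (In production you would also cap the number of attempts.)
        sleep(min(pow(2, $attempt++), 8));
    }
}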
This may amount to more requests against the memcached server, but it should resolve any problems at the database layer.