Scaling a Spring application better using @Async for a large number of database insert or update queries - mysql

I have a Spring REST controller whose sole purpose is to create or update a record every time the mobile client launches or boots the app. This URL is hit only when the user launches the app or when the app returns to the foreground after a resume (i.e., the user presses the device home button to do something else and, after a while, taps the app icon to bring the app back to the foreground from memory).
The expected number of requests for this URL is around 600 requests per minute.
To scale this application, is it better to move the database (MySQL) create/update logic of the Spring controller into a separate thread, or to use Spring's @Async feature?
The goal is that a request won't hold the system port for very long, so one machine can handle a large number of requests before my web server (GlassFish) pushes requests to the waiting queue.
Also, the expected size of this table is around 10M - 30M records.

I personally wouldn't bother with an async call, at least to start with. Create a JMeter script, fire some load at it, and see how it performs.
If you start to see slowdowns, using @Async with a ThreadPoolExecutor behind it (which you can easily configure) is certainly a valid option. With this type of thing, configuring the queue size and number of threads (both for your thread pool executor and your web container) is a bit of a black art, which is where something like JMeter and a good profiling tool such as YourKit come into their own.
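To make the "queue size and number of threads" knobs concrete, here is a minimal sketch using plain java.util.concurrent rather than Spring itself. The class name, the pool sizes, and simulateInsert() are all made up for illustration. In a Spring app the same two settings would normally be set on the executor that backs @Async (for example, the core pool size and queue capacity of a ThreadPoolTaskExecutor).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the pool behind @Async; all names and sizes are illustrative.
class BoundedPoolSketch {

    // Runs `requests` simulated inserts on a bounded pool and returns how many completed.
    static int runLoad(int threads, int queueCapacity, int requests) {
        AtomicInteger completed = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads,                            // fixed number of worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),     // bounded waiting queue
                new ThreadPoolExecutor.CallerRunsPolicy());  // queue full: the submitting thread runs the task

        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                simulateInsert();
                completed.incrementAndGet();
            });
        }

        pool.shutdown(); // stop accepting work, let queued tasks finish
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    // Stand-in for the real MySQL insert/update.
    static void simulateInsert() {
        try {
            Thread.sleep(2);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runLoad(4, 16, 100));
    }
}
```

With CallerRunsPolicy nothing is dropped when the queue fills up; the caller simply slows down, which is one way of applying backpressure. Which sizes actually work for 600 requests per minute is something only a load test can tell you.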

Related

Spring Framework @Async method + MySql Performance Degradation - Scalability Problem

I have an API, notifyCustomers(), implemented on my batch server, which gets called from my application server. It can send notifications via three channels: SMS, push, and email. I have separate helper classes for each of them, and they all execute in async mode.
I have around 30k users, of which I usually notify a particular set ranging from 3k to 20k. The issue is that whenever I call that API, MySQL performance goes for a toss, particularly CPU: utilisation stays around 100% for a very long period of around 30 minutes.
I've worked out a workaround by doing the following things, and it helps keep things under control:
1. Using a projection instead of the domain object
2. Fetching data in batches of 500 per call
3. Adding indexes based on the criteria that I need
4. No database calls from the async methods for SMS, email, and push
5. Thread.sleep(10 mins) between each subsequent fetch of a data batch <== This is the dirty hack that's bothering me a lot
If I remove Thread.sleep(), everything goes haywire, because the batch server just calls the async methods and then fires the DB call to fetch the next batch of 500 users in very quick succession, until the DB server stops responding.
What should I do to get rid of the 5th point while keeping things under control? I'm running MySQL on RDS with 300 IOPS and 4 GB RAM (db.t3.medium).
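One possible replacement for the fixed Thread.sleep() is to have the batch loop wait until the current batch's async sends have finished before it fetches the next batch. The following is a minimal sketch with plain java.util.concurrent, not the poster's actual code: fetchBatch() and send() are hypothetical stand-ins for the paged DB query and the SMS/push/email helpers, and the pool size of 8 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class BatchBackpressureSketch {

    // Hypothetical stand-in for the paged DB query (batches of user ids).
    static List<Integer> fetchBatch(int offset, int batchSize, int totalUsers) {
        List<Integer> ids = new ArrayList<>();
        for (int id = offset; id < Math.min(offset + batchSize, totalUsers); id++) {
            ids.add(id);
        }
        return ids;
    }

    // Hypothetical stand-in for the async SMS/push/email helpers.
    static void send(int userId, AtomicInteger sent) {
        sent.incrementAndGet();
    }

    // Fetches the next batch only after every send of the current batch has finished.
    static int notifyInBatches(int totalUsers, int batchSize) {
        ExecutorService senders = Executors.newFixedThreadPool(8);
        AtomicInteger sent = new AtomicInteger();
        try {
            for (int offset = 0; offset < totalUsers; offset += batchSize) {
                List<Integer> batch = fetchBatch(offset, batchSize, totalUsers);
                List<CompletableFuture<Void>> pending = new ArrayList<>();
                for (int userId : batch) {
                    pending.add(CompletableFuture.runAsync(() -> send(userId, sent), senders));
                }
                // Backpressure: block here instead of sleeping for a fixed time.
                CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            }
        } finally {
            senders.shutdown();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        System.out.println("notified: " + notifyInBatches(1200, 500));
    }
}
```

The fetch rate then follows the send rate rather than a guessed delay. Whether that alone keeps the database CPU under control would still need to be measured.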

Slick with Hikari doesn't use more connections when needed

I'm trying to understand how Slick-Hikari works. I've read a lot of documentation, but I have a use case whose behavior I don't understand.
I'm using Slick 3 with Hikari, with the default configuration. I already have a production app with ~1000 users connected concurrently. My app works with websockets, and when I deploy a new release all clients are reconnected. (I know it's not the best way to handle a deploy, but I don't have clustering at the moment.) When all these users reconnect, they all start running queries to get their user state (dog-pile effect). When that happens, Slick starts to throw a lot of errors like:
java.util.concurrent.RejectedExecutionException: Task slick.backend.DatabaseComponent$DatabaseDef$$anon$2@4dbbd9d1 rejected from java.util.concurrent.ThreadPoolExecutor@a3b8495[Running, pool size = 20, active threads = 20, queued tasks = 1000, completed tasks = 23740]
What I think is happening is that the Slick queue for pending queries is full because it can't handle all the clients requesting information from the database. But when I look at the metrics that Dropwizard provides, I see the following:
Near 16:45 we see a deploy. Until the old instance is terminated, we can see that the number of connections goes from 20 to 40. I think that's normal, given how the deploy process is done.
But if Slick's query queue becomes full because of the dog-pile effect, why is it not using more than 3-5 connections when it has 20 connections available? The database is performing really well, so I think the bottleneck is in Slick.
Do you have any advice for improving this deploy process? I have only 1000 users now, but I'll have a lot more in a few weeks.
Based on the "rejected" exception, I think many Slick actions were submitted concurrently, which exceeded the default size (1000) of the queue embedded in Slick.
So I think you should:
Increase the queue size (queueSize) to hold more unprocessed actions.
Increase the number of threads (numThreads) in Slick to process more actions concurrently.
You can get more tips here
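The exception itself comes from a plain java.util.concurrent.ThreadPoolExecutor, so the mechanism can be reproduced without Slick. Below is a minimal sketch, assuming a pool whose workers are all stuck on slow work; the class name and the numbers are made up, and the `threads` and `queueSize` parameters play the roles of numThreads and queueSize from the answer above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class QueueRejectionSketch {

    // Submits `tasks` blocking tasks to a pool with `threads` workers and a bounded
    // queue of `queueSize`, and returns how many submissions were rejected.
    static int countRejections(int threads, int queueSize, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize)); // default handler (AbortPolicy) throws on overflow
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulate a slow query that keeps the worker busy
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // all workers busy and the queue is full
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countRejections(2, 3, 10));
    }
}
```

With 2 workers and 3 queue slots, only 5 tasks can be held at once, so further submissions are rejected until a worker frees up. A larger queue or more threads raises that ceiling but does not remove it.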

Will IIS ever terminate the thread if a POST gets canceled by the browser [duplicate]

Environment:
Windows Server 2003 - IIS 6.x
ASP.NET 3.5 (C#)
IE 7,8,9
FF (whatever the latest 10 versions are)
User Scenario:
User enters search criteria against large data-set. After initiating the request, they are navigated to a results page, where they wait until the data is loaded and can then refine the data.
Technical Scenario:
After the user sends search criteria (via an ajax call), the UI calls a back-end service. The back-end service queries transactional system(s) and puts the resulting data into a db "cache": a denormalized table, set up for further refining of the data (i.e. sorting, filtering). The UI waits until the data is cached and then, upon being notified that the process is done, navigates to a results page. The results page then makes a call to get the data from the denormalized table.
Problem:
The search is relatively slow (15-25 seconds) for large queries that end up having to query many systems based on the criteria entered. It is relatively fast for other queries ( <4 seconds).
Technical Constraints:
We cannot entirely re-architect this search / results system. There are way too many complexities here in how the UI and the back-end are tied together. The page is required (because of constraints that cannot be solved on StackOverflow) to turn after performing the search criteria.
We also can not ask the organization to denormalize the data prior to searching because the data has to be real-time, i.e. if a user makes a change in other systems, the data has to show up correctly if they do a search afterwards.
Process that I want to follow:
I want to cheat a little. I want to issue the "Cache" request via an async HttpHandler in a fire-forget model.
After issuing the query, I want to transition the page to the resulting page.
On the transition page, I want to poll the "Cache" table to see if the data has been inserted into it yet.
The reason I want to do this transition right away is that the resulting page is expensive in itself (even without getting the data): there are still 2 seconds of load time before it even gets to calling the service that gets the data from the cache.
Question:
Will the ASP.NET thread that is called via the async handler reliably continue processing even if I navigate away from the page using a javascript redirect?
Technical Boundaries 2:
Yes, I know... This search process does not sound efficient. There is nothing I can do about that right now. I am trying to do whatever I can to get it to perform a little better while we continue researching how we are going to re-architect it.
If your answer is to: "Throw it away and start over", please do not answer. That is not acceptable.
Yes.
There is the property Response.IsClientConnected, which is used to check whether the client of a long-running process is still connected. The reason this property exists is that a process will continue running even if the client disconnects; the disconnect must be detected manually via the property, and the process shut down manually if it happens prematurely. By default, a running process is not stopped when the client disconnects.
Reference to this property: http://msdn.microsoft.com/en-us/library/system.web.httpresponse.isclientconnected.aspx
update
FYI, this is a very bad property to rely on these days with sockets. I strongly encourage an approach that lets you complete the request quickly by recording the long-running task in a database or queue (probably RabbitMQ or something like that), which in turn uses socket.io or similar to update the web page or app once the task completes.
How about not doing the async operation on an ASP.NET thread at all? Let the ASP.NET code call a service to queue the data search, then return to the browser with a token from the service; the browser then redirects to the result page, which awaits the completed result. The result page will poll using the token from the service.
That way, you won't have to worry about whether or not ASP.NET will somehow learn that the browser has moved to a different page.
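The queue-then-poll-with-a-token idea is not specific to ASP.NET. Here is a minimal in-memory sketch in Java, with made-up names (SearchJobService, submit, poll), a fixed pool of 4 workers chosen arbitrarily, and a search that returns a plain String instead of writing to a cache table. A real service would live in its own process so that it survives web-server recycling.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical in-memory version of the "service" in the answer above.
class SearchJobService {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final Map<String, String> results = new ConcurrentHashMap<>();

    // Queues the search and returns immediately with a token for later polling.
    // The search must return a non-null value (ConcurrentHashMap rejects nulls).
    String submit(Supplier<String> search) {
        String token = UUID.randomUUID().toString();
        workers.execute(() -> results.put(token, search.get()));
        return token;
    }

    // Called by the result page on each poll; empty until the search has finished.
    Optional<String> poll(String token) {
        return Optional.ofNullable(results.get(token));
    }

    void shutdown() {
        workers.shutdown();
    }

    // Small end-to-end demo: submit, then poll until the result shows up.
    static String runDemo() {
        SearchJobService service = new SearchJobService();
        try {
            String token = service.submit(() -> "42 rows");
            Optional<String> result = service.poll(token);
            while (!result.isPresent()) {
                Thread.sleep(10); // the result page would poll on a timer instead
                result = service.poll(token);
            }
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            service.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The request that starts the search only has to hand back the token, so it finishes quickly regardless of how long the search itself takes.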
Another option is to use Threading (System.Threading).
When the user sends the search criteria, the server begins processing the page request, creates a new Thread responsible for executing the search, and finishes the response getting back to the browser and redirecting to the results page while the thread continues to execute on the server background.
The results page would keep verifying on the server if the query execution had finished as the started Thread would share the progress information. When it does finish, the results are returned when the next ajax call is done by the results page.
WebSockets could also be considered: because they offer full-duplex communication channels, the web server itself could tell the browser when it is done processing the query.

Sync data on App_Closing event

I have a few clumps of data that need to be synced. The app is a calendar in which dates are stored, along with some other information. So on app exit I need to sync all the dates to the server. The dates and other info are converted to JSON and sent.
I have used HttpWebRequest to get the responses from the server, so there is a series of callbacks. The function SyncHistory is called in Application_Closing.
What happens is that I can see execution move into SyncHistory, but once the app is closed, it does not go on to call the other functions.
How can I make the app sync data before it stops? I have tried the await keyword; sometimes it calls the functions, but other times it does not.
Where should the code ideally be put? I don't want to sync data every time the user enters data. Are there any other common exit points that run even after the app is closed?
This isn't a great idea - you only have a maximum of 10s to complete Application_Closing before the phone OS forcibly shuts down your app. Once your app is closed (or forcibly shut down), none of your code will run.
The nature of mobile phone networking and cellular networks is that you can't rely on having sent all your data to a server in 10s. You'll have to think of an alternative strategy if you want this to be reliable.
And you haven't even considered the Application_Deactivated scenario, where you get even less time to complete.

Application design for data persistence over unreliable internet

I have a Flex ActionScript 3 schedule-reminder app that talks to a web service over the internet via Wi-Fi. The problem is that the Wi-Fi connection is unreliable and there are frequent dropouts. The schedule that the app reminds about doesn't change very frequently, so instead of calling the web service to fetch the schedule every day/hour, the app can store the data locally. Also, if the user updates the schedule in the app, the web service is told that the task on the schedule is complete. This data can also be stored locally, so that the next time the user uses the app and there is an internet connection, the app can update the web service.
What are the suggestions for the application design in such a case? Are there any examples?
For storing the schedule locally, use a shared object. Here is a tutorial on the subject, if you haven't used them before.
Any time the user adds/edits an item, attempt to send it to the server. Make sure to store the changed/new item in the shared object. If it fails, have the application periodically (eg every min or every 10 sec or every 15 mins, depending on how you want to set it up) check for a successful connection. As soon as it has a successful connection, have the app sync with the server. Make sure the server sends back a signal for successful saving before the app stops trying to send changes.
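The store-locally-then-retry flow described above can be sketched roughly as follows. This is in Java rather than ActionScript, with made-up names: an in-memory Deque stands in for the shared object, and the `server` predicate stands in for the web-service call, returning true only when the server acknowledges the save.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical local queue of unsent changes; in the real app it would be persisted.
class PendingChangeQueue {
    private final Deque<String> pending = new ArrayDeque<>();

    // Record the change locally first, so it survives a failed send.
    void recordChange(String change) {
        pending.addLast(change);
    }

    // Tries to push pending changes in order. A change is removed only after the
    // server acknowledges it; the first failure stops the run so the rest is
    // retried on the next attempt. Returns the number of changes synced.
    int sync(Predicate<String> server) {
        int synced = 0;
        while (!pending.isEmpty() && server.test(pending.peekFirst())) {
            pending.removeFirst();
            synced++;
        }
        return synced;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingChangeQueue queue = new PendingChangeQueue();
        queue.recordChange("task 1 complete");
        queue.recordChange("task 2 complete");
        System.out.println("offline sync: " + queue.sync(change -> false));
        System.out.println("online sync: " + queue.sync(change -> true));
    }
}
```

A timer in the app would call sync() periodically. Because nothing leaves the queue without an acknowledgement, a dropout in the middle of a sync only delays the remaining changes.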
Does your application run all the time, or just for brief stints? It would only be able to sync when the app is open on the user's computer, of course. How frequently do you lose/regain connectivity?