Is it possible to do asynchronous / parallel database queries in a Django application? - mysql

I have web pages that take 10 - 20 database queries in order to get all the required data.
Normally, after a query is sent out, the Django thread/process blocks waiting for the results to come back, then resumes execution until it reaches the next query.
Is there any way to issue all the queries asynchronously so that they can be processed by the database server(s) in parallel?
I'm using MySQL but would like to hear about solutions for other databases too. For example, I've heard that PostgreSQL has an async client library - how would I use that in this case?
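For reference, psycopg2's documented asynchronous mode looks roughly like this outside the ORM (a minimal sketch; the connection parameters are made up, and I haven't wired it into Django):

    import select
    import psycopg2
    import psycopg2.extensions

    def wait(conn):
        # Poll until the pending operation finishes (pattern from the psycopg2 docs).
        while True:
            state = conn.poll()
            if state == psycopg2.extensions.POLL_OK:
                return
            elif state == psycopg2.extensions.POLL_READ:
                select.select([conn.fileno()], [], [])
            elif state == psycopg2.extensions.POLL_WRITE:
                select.select([], [conn.fileno()], [])

    conn = psycopg2.connect("dbname=test user=test", async_=1)  # async_=1 on psycopg2 >= 2.7
    wait(conn)                  # connection setup is asynchronous too
    cur = conn.cursor()
    cur.execute("SELECT 42")    # returns immediately; the result arrives later
    wait(conn)                  # to run queries in parallel, you'd select() over
    print(cur.fetchone())       # several connections instead of waiting on one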

This very recent blog entry seems to imply that it's not built into either the Django or Rails frameworks. I think it covers the issue well and is quite worth a read, along with the comments.
http://www.eflorenzano.com/blog/post/how-do-we-kick-our-synchronous-addiction/ (broken link)
I think I remember Cal Henderson mentioning this deficiency somewhere in his excellent talk http://www.youtube.com/watch?v=i6Fr65PFqfk
My naive guess is that you might be able to hack something together with separate Python libraries, but you would lose so much of the ORM/template lazy-evaluation machinery Django gives you that you might as well be using another stack. Then again, if you are only optimizing a few views in a large Django project, it might be fine.
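A minimal sketch of that hack, assuming Django's ORM (which opens one MySQL connection per worker thread) and the stdlib thread pool - this is not an officially supported pattern:

    from concurrent.futures import ThreadPoolExecutor
    from django.db import connection

    def evaluate(queryset):
        try:
            return list(queryset)   # forces the query on this worker thread
        finally:
            connection.close()      # don't leak the per-thread connection

    def run_in_parallel(querysets):
        # Issue every query at once; total wall time is roughly the
        # slowest query rather than the sum of all of them.
        with ThreadPoolExecutor(max_workers=len(querysets)) as pool:
            futures = [pool.submit(evaluate, qs) for qs in querysets]
            return [f.result() for f in futures]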

I had a similar problem and I solved it with JavaScript/AJAX.
Just load the template with basic markup and then make several AJAX requests to execute the queries and load the data. You can even show a loading animation. The user gets a web 2.0 feel instead of a gloomy page load. Of course, this means several more HTTP requests per page, but it's up to you to decide.
Here is how my example looks: http://artiox.lv/en/search?query=test&where_to_search=all (broken link)
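On the Django side, each such AJAX endpoint can be a tiny JSON view - a minimal sketch, assuming a hypothetical Comment model:

    import json
    from django.http import HttpResponse
    from myapp.models import Comment   # hypothetical app and model

    def latest_comments(request):
        # One fragment of the page; the template's JavaScript requests
        # this URL and injects the result into the DOM.
        rows = list(Comment.objects.order_by('-created')
                                   .values('author', 'text')[:10])
        return HttpResponse(json.dumps(rows), content_type='application/json')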

Try Celery. There's a bit of overhead in having to run an AMQP server, but it might do what you want. Not sure about the concurrency of the DB, though. Also, if you want speed from your DB, I'd recommend MongoDB (but you'll need django-nonrel for that).
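A minimal sketch of offloading one of the page's queries to a Celery task (recent Celery, a configured result backend, and hypothetical app/model names assumed):

    from celery import shared_task

    @shared_task
    def count_widgets():
        from myapp.models import Widget   # hypothetical model
        return Widget.objects.count()

    def dashboard(request):
        handle = count_widgets.delay()    # returns immediately
        # ...kick off the other queries the same way, then block only
        # when gathering, so the workers run them in parallel:
        widget_count = handle.get(timeout=10)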

Related

Is there a better way to render fetched feed data on a webpage with infinite scrolling?

I am creating a webpage in ReactJS for a post feed (with text, images, videos), just like Reddit with infinite scrolling. I have created a single post component which is given the required data. I am fetching the posts from MySQL with axios. I have also implemented a Redux store in my project.
I have also added post voting. Currently, I am storing all the posts from the DB in the Redux store. If a user upvotes or downvotes, that change is made in the Redux store as well as in the database, and the page re-renders the element with ease.
Is it feasible to use the Redux store for this, given that the data will soon grow, maybe into the millions and beyond?
I previously used the useState hook to store all the data, but with that I had an issue with dynamic re-rendering, as I had to set state every time a user voted.
If anyone has any efficient way, please help out.
This question goes far beyond just one topic, so let's break it down into the main pieces:
Client state. You say that you are currently using Redux to store posts and update the number of upvotes as it changes. The thing is that this data is not actually state in your case (or at least most of it isn't). It is a common misconception to treat whatever data comes back from an API as state. In most cases it's not state, it's a cache, and you need a tool that makes working with a cache easier. I would suggest trying something like react-query or swr. This way you avoid a lot of boilerplate code and hand server-data cache management off to a library.
Infinite scrolling. There are a few things to consider here. First, you need to figure out how you are going to detect when to preload more posts. You can do it yourself with an IntersectionObserver, or use some fancy library from NPM that does it for you. Second, if you aim for millions of records, you need to think about virtualization. In a nutshell, it removes elements that are outside of the viewport from the DOM, so the browser doesn't eat up all the memory and die after some time of doomscrolling (that would be a nice feature, though). This article is a good starting point: https://levelup.gitconnected.com/how-to-render-your-lists-faster-with-react-virtualization-5e327588c910.
Data source. You say that you are storing all the posts in the database but don't mention any API layer. If you are shooting for millions of records and this is not just a practice project, I would suggest putting an API between the client app and the database, as sketched below. Here are some good questions explaining why connecting to the database directly from the client is not the best idea: one, two.
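A minimal sketch of such an API layer, assuming a Django backend and a hypothetical Post model; keyset pagination (filtering on the last seen id) stays fast where OFFSET-based paging degrades once the table reaches millions of rows:

    from django.http import JsonResponse
    from myapp.models import Post   # hypothetical app and model

    def posts_api(request):
        after = int(request.GET.get('after', 0))   # last post id the client has
        page = (Post.objects.filter(id__gt=after)
                            .order_by('id')
                            .values('id', 'title', 'votes')[:20])
        return JsonResponse({'posts': list(page)})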

Best practice for using MySQL in Vert.x

I just discovered Vert.x and played with it a little. I love it; however, there is one thing that puzzles me: SQL support. Or better said, asynchronous SQL support.
I know there is a Postgres mod, and even a MySQL one, but from what I've read on the GitHub page it is still in an early phase.
Also, one should use the event bus to send the query from the requesting verticle to the worker verticle that would perform the request and then transfer the results back.
However, this doesn't sound like a good idea to me. The communication would be done in text mode, which implies a lot of serialization/deserialization. Plus, the results would need to be read completely and then sent to the requesting verticle, which can sometimes be overkill.
And now for my questions:
What is the best way to use MySQL in a Vert.x verticle?
Is the event bus OK for transferring this much information in text mode? (Or maybe I'm missing something and there is actually a way to send binary data through the event bus.)

MvvmCross - Best way to retrieve updates

I'm not after any code in particular, but I want to know the most efficient way to build a function that constantly checks for updates, e.g. for messages: think of a chat conversation window with live updates, like Facebook's.
Currently I have implemented it with a while loop in my core code that, while the view is visible, runs a Task every 5 seconds to get new messages. This works, but I don't believe it's the most efficient way to do it, and I need to consider battery life. Note that I do change the visibility flag when the view goes away, e.g. on iOS I do:
public override void ViewDidDisappear(bool animated)
{
    base.ViewDidDisappear(animated);
    Model.SetVisible(false);
}
Has anyone implemented some sort of polling on a cross platform app?
There are many different possible solutions here - which one you prefer depends a lot on your requirements in terms of latency, reliability, efficiency, etc - and it depends on how much you can change server side.
If your server is fixed as a normal HTTP server, then frequent polling may be your best route forward, although you could lengthen the 5-second interval at times when you think updates aren't likely.
One step up from this is to use long-polling HTTP requests to your server.
Another step beyond that is using socket (TCP, UDP or WebSocket) communications to provide "real-time" messaging.
And in parallel with these things, you could also consider using push notifications, both within your app and in the background.
Overall, this is a big topic - I'd recommend reading up about PushSharp from @Redth and about SignalR from Microsoft - @gshackles has some blog posts about using this in Xamarin. Also, services like Azure Mobile Services, UrbanAirship, Buddy, Parse, etc. may help.

How can I investigate these mystery Django crashes?

A Django site (hosted on Webfaction) that serves around 950k pageviews a month is experiencing crashes that I haven't been able to figure out how to debug. At unpredictable intervals (averaging about once per day, but not at the same time each day), all requests to the site start to hang/timeout, making the site totally inaccessible until we restart Apache. These requests appear in the frontend access logs as 499s, but do not appear in our application's logs at all.
In poring over the server logs (including those generated by django-timelog) I can't seem to find any pattern in which pages are hit right before the site goes down. For the most recent crash, all the pages that are hit right before the site went down seem to be standard render-to-response operations using templates that seem pretty straightforward and work well the rest of the time. The requests right before the crash do not seem to take longer according to timelog, and I haven't been able to replicate the crashes intentionally via load testing.
Webfaction says it isn't a case of overrunning our allowed memory usage, or else they would have notified us. One thing to note is that the database is not restarted (just the app/Apache) when we bring the site back up.
How would you go about investigating this type of recurring issue? It seems like there must be a line of code somewhere that's hanging - do you have any suggestions about a process for finding it?
I once had some issues like this, and it basically boiled down to my misunderstanding of thread safety within Django middleware. Django middleware is, I believe, a singleton shared among all threads, and my threads were thrashing over the values set on a custom middleware class I had. My solution was to rewrite my middleware so that it did not use instance or class attributes that changed, and to switch the critical parts of my application away from threads entirely in my uWSGI server, as they seemed to be an overall performance downside for my app. Threaded uWSGI setups seem to work best when you have views that complete at different speeds (some long-running views and some fast ones).
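To illustrate the bug class (a minimal sketch with made-up names, old-style Django middleware): state stored on the middleware instance is shared by every thread, so concurrent requests clobber each other, while state kept on the request object is safe.

    import time
    import logging

    log = logging.getLogger(__name__)

    class BadTimingMiddleware(object):
        def process_request(self, request):
            self.start = time.time()          # shared instance attribute: races under threads

        def process_response(self, request, response):
            log.info("took %.3fs", time.time() - self.start)
            return response

    class GoodTimingMiddleware(object):
        def process_request(self, request):
            request._start = time.time()      # per-request object: thread-safe

        def process_response(self, request, response):
            log.info("took %.3fs", time.time() - request._start)
            return response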
Since you can't really describe the failure conditions until you can replicate the crash, you may need to force the situation with ab (Apache Bench). If you don't want to do this on your production site, you might replicate the site on a subdomain. Warning: ab can beat the ever-loving crap out of a server, so RTM. You might also want to give the WF admins a heads-up about what you are going to do.
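For example (hypothetical URL; tune the request count and concurrency via -n and -c):

    ab -n 10000 -c 50 http://test.example.com/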
Update for comment:
I was suggesting using the exact same machine so that the subdomain name was the only difference. Given that you used a different machine, there are a large number of subtle (and not so subtle) environmental differences that could keep the error from manifesting. If the new machine is OK, and you are willing to walk away from the problem without actually solving it, you might simply make it your production machine and be happy. Personally, I tend to obsess over stuff like this, but then again I'm also retired and have plenty of time to play with my toes. :-)

What is the best approach to monitor site performance in rails

I'd like to have a special part of the admin area which shows a list of the pages/queries that were slow over the past week.
Is there a gem/plugin or another easy way to implement/automate such functionality in the Rails framework?
http://www.newrelic.com/features.html
And a short screencast for RPM:
http://railslab.newrelic.com/2009/01/22/new-relic-rpm
P.S. I don't work for New Relic. :)
Here's one example, and another and yet another. See google for even more.
Yeah, it seems like a nice idea. I've never seen it done before, but I imagine you could just log the time each request takes to a DB; then a simple query will show you the slowest requests.
How you optimize your app depends entirely on your code; I guess there's no silver bullet for that.
First thing that comes to mind: one could track the execution time of each query, and if it exceeds some threshold considered normal (the average, maybe), it gets logged along with some profiling information (which is discarded otherwise).
It might also be feasible to profile individual parts of the query (like data acquisition from the DB, logic, and so on), and then again compare the times against some averages.
One pitfall is that some pages/queries are bound to take significantly longer than others because of the difference in the amount of "work" they do. One would have to keep a whole lot of averages for different parts of the site / different types of queries in order to filter out the constant flow of normal queries that execute longer by design.
This is a very simple approach though, I'm sure there are better ways to do it.