Performance Issues with Include in Entity Framework

I am working on a large application developed using the repository pattern, Web APIs, and AngularJS. In one scenario, I am trying to retrieve data for a single lead that has relations with approximately 20 tables. Lazy loading is disabled, so I am using Include to get the data from all 20 tables. Now here comes the performance issue: if I try to retrieve a single record, it takes approximately 15 seconds. This is a huge performance problem. I am returning JSON, and my entities are decorated with [DataContract(IsReference = true)] and [DataMember] attributes.
Any suggestions will be highly appreciated.

Include is really nasty for performance because of how it joins.
See more info in my blog post here http://mikee.se/Archive.aspx/Details/entity_framework_pitfalls,_include_20140101
To summarize the problem a bit: it's because EF handles Include by joining. This creates a result set where every row includes every column of every joined entity (some containing null values).
This is even nastier if the root entity contains large fields (like a long text or a binary), because those values get repeated in every row.
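As a rough sketch of the kind of SQL a single Include produces (the table and column names here are hypothetical, not taken from the question):

    -- One Include becomes a LEFT JOIN: every column of Leads is
    -- repeated once per matching Contacts row, and each further
    -- Include multiplies the rows in the result set.
    SELECT l.Id, l.Name, l.LongNotes, c.Id, c.Email
    FROM Leads AS l
    LEFT JOIN Contacts AS c ON c.LeadId = l.Id
    WHERE l.Id = 42;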
15 seconds is way too much, though. I suspect something more is at play, like missing indexes.
To summarize the solutions: my normal suggestion is to load every relation separately or in a multiquery. A simple query like that should take 5-30 ms per entity, depending on your setup. In this case it would still be quite slow (~1 s if you are querying on indexes). If this query is run often, maybe you need to look at some way to store the data in a better format (a cache, a document, JSON in the db). I can't help you with that, though; I would need far more information, as the update paths affect the possibilities a lot.
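A minimal sketch of the load-relations-separately approach, again with hypothetical names: each related table gets its own narrow query keyed on the lead's id, so every statement can use an index and no column values are duplicated.

    -- One small, index-friendly query per related table:
    SELECT Id, Name, LongNotes FROM Leads WHERE Id = 42;
    SELECT Id, Email FROM Contacts WHERE LeadId = 42;
    SELECT Id, Street FROM Addresses WHERE LeadId = 42;
    -- ...and so on for the remaining related tables.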

The performance was improved by enabling lazy loading.

Related

Is filtered selection faster than fetching all the rows and then filtering

So I want to create a table in the frontend where I will list every single user. The thing is that the tables are relational, and I have to get data from multiple tables in order to fulfill my goal.
Now here comes my question (keep in mind I have a MySQL database):
Which method is better in the long run?
Generate joined queries that fetch all the data from each table where a user has any information (this outputs ~80 columns per row, of which only 15 are needed)
Fetch the data I need with multiple queries and then just "stick" the values together and output them (15 columns, all of them needed, but I have to do extra work)
I would suggest you go for a third option:
Generate a joined query that fetches only the 15 columns your front end needs. That would be the most efficient way.
If you are facing challenges joining the tables, you can share your table structures, sample data, and desired output here along with your query, and we can try to help you achieve your goal.
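A minimal sketch of that kind of targeted join (the table and column names are made up for illustration): select exactly the columns the page needs and nothing else.

    -- Only the 15 or so columns the frontend actually displays:
    SELECT u.id, u.username, u.email,
           p.avatar_url, p.country,
           s.last_login
    FROM users AS u
    JOIN profiles AS p ON p.user_id = u.id
    LEFT JOIN sessions AS s ON s.user_id = u.id;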
This is a bit long for a comment.
I don't understand your first option. Why would you be selecting columns that you don't need? If there are 15 columns that you specifically want, then select those columns and nothing else.
In general, it is faster to have the database do most of the work. It can take advantage of its optimizer to produce the best execution plan that it can.
From experience with a MySQL server on embedded hardware:
If the hardware can handle it and has enough resources, let the database server run the query, since it can use its optimizer.
But if the server hardware lags on some front, transport all the data to the client and let it run JavaScript over the returned data.
The same goes for the bandwidth of the internet connection: if it is slow, you want to transport fewer rows, because the user will notice the delay; even old smartphones have more than enough CPU power to handle with ease whatever you throw at them.
Basically, there is no simple answer. You have to check the server hardware and the typical bandwidth available, and then program the solution that works best.
A simple rule of thumb:
Fewer round-trips to the database server is usually the faster alternative.

CakePHP: Is it possible to force find() to run a single MySQL query

I'm using CakePHP 2.x. When I inspect the SQL dump, I notice that its "automagic" is causing one of my find()s to run several separate SELECT queries (and then presumably merging them all together into a single pretty array of data).
This is normally fine, but I need to run one very large query on a table of 10K rows with several joins, and this is proving too much for the magic to handle: when I try to construct it through find('all', $conditions), the query times out after 300 seconds. But when I write an equivalent query manually with JOINs, it runs very fast.
My theory is that whatever PHP "magic" is required to weave the separate queries together is causing a bottleneck for this one large query.
Is my theory a plausible explanation for what's going on?
Is there a way to tell Cake to just keep it simple and make one big fat SELECT instead of its fancy automagic?
Update: I forgot to mention that I already know about $this->Model->query(); using it is how I figured out that the slow-down was coming from the PHP magic. It works when we do it this way, but it feels a little clunky to maintain the same query in two different forms. That's why I was hoping CakePHP offered an alternative to the way it builds up big queries from multiple smaller ones.
In cases like this, where you query tables with 10k records, you shouldn't be doing a find('all') without limiting the associations. These are some of the strategies you can apply:
Set recursive to 0 if you don't need related models.
Use the Containable behavior to bring in only the associated models you need.
Apply limits to your query.
Caching is a good friend.
Create and destroy associations on the fly as you need them.
Since you didn't specify the problem, these are just general ideas to apply depending on the issue you have.

Speeding up Hibernate Object creation?

We use Hibernate as our ORM layer on top of a MySQL database. We have quite a few model objects, some of which are quite large (in terms of number of fields, etc.). Some of our queries require that a lot (if not all) of the model objects be retrieved from the database, to do various calculations on them.
We have lazy loading enabled, but in some cases it still takes a significant amount of time for Hibernate to populate the objects. The execution time of the MySQL query is very fast (in the order of a few milliseconds), but then Hibernate takes its sweet time to populate the objects.
Is there any way / pattern / optimization to speed up this process?
Thanks.
One approach is to not populate the entity but some kind of view object.
Assuming a CustomerView has the appropriate constructor, you can do
    select new CustomerView(c.firstname, c.lastname, c.age) from Customer c
Though I'm a bit surprised about Hibernate being slow to populate objects, unless you happen to load associated objects by cascade and have forgotten a few appropriate fetch settings.
Perhaps consider adding a second level cache? This won't necessarily speed up the object instantiation, but it could considerably cut down the frequency in which you are needing to do that.
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html
Since you're asking a performance-related question, you might want to collect more data on where the bottleneck is. You say
Hibernate takes its sweet time to populate the objects.
How do you know it's Hibernate that's the problem? In other words, is Hibernate itself the problem, or could there be too little (or too much) memory, so that the JVM isn't running efficiently?
Also, you mention
We have quite a few model objects, of which some are quite large (in terms of number of fields etc.).
How many is "quite large"? Dozens? Hundreds? Thousands? It makes a big difference, because relational databases (such as MySQL) start performing more poorly as your table gets "wider" (see this question: Is there a performance decrease if there are too many columns in a table?).
Performance is a lot about balancing constraints, but it's also about collecting a lot of data to see where the problem is and then fixing that problem. Then you'll find the next bottleneck and fix that one until your performance is good enough, or you run out of implementation time.

Performance of MySQL XML functions?

I am pretty excited about the new MySQL XML functions.
Now I can finally embed something like "object oriented" documents in my old-school relational database.
For an example use case, consider a user who signs up at your website using Facebook Connect.
You can fetch an object for the user using the Graph API and get nice information. This information, however, can vary vastly. Some fields may or may not be set, some may be added over time, and so on.
Well, if you are just interested in a few specific fields (for example friend relations, gender, movies...), you can project them into your relational database schema.
However, using the XML functions, you could store the whole object inside a field, and then your different models can access the data using the ExtractValue function. You can store everything right away without needing to worry about what you will need later.
But what will the performance be?
For example, say I have a table with 50,000 entries which represent users.
I have an enum field that states "male" or "female" (or various other genders, to be politically correct).
The performance of, for example, fetching all males will be very fast.
But what about something like WHERE ExtractValue(userdata, '/gender') = 'male'?
How will the performance vary as the object gets bigger?
Can I maybe somehow put an index on specific XPath selections?
How do field types (VARCHAR vs. BLOB) work together with these functions and their performance?
Do I need fulltext indexes?
To sum up my question:
MySQL XML functions look great, and I am sure they are really useful if you just want to store structured data that you fetch and analyze further in your application.
But how will they hold up in procedures where internal scans/sorting/comparisons/calculations are performed on them?
Can MySQL replace document-oriented databases like CouchDB/Sesame?
What are the gains and trade-offs of the XML functions?
How and why are they better/worse than a dynamic application that stores various data as attributes?
For example, a key/value table with an XPath as the key and the value as the value, connected to the document entity.
Has anyone had any other experiences with this, or noticed anything worth mentioning?
I tend to make comments similar to Pekka's, but I think the reason we cannot laugh this off is your statement "This information however can vary vastly." That means it is not realistic to plan to parse it all and project it into the database.
I cannot answer all of your questions, but I can answer some of them.
Most notably, I cannot tell you about performance on MySQL. I have seen it in SQL Server, tested it, and found that SQL Server performs in-memory XML extractions very slowly; to me it seemed as if it were reading from disk, but that is a bit of an exaggeration. Others may dispute this, but that is what I found.
"Can MySQL replace document-oriented databases like CouchDB/Sesame?" This question is a bit over-broad, but in your case, using MySQL lets you keep ACID compliance for these XML chunks (assuming you are using InnoDB), which cannot be said automatically of some of those document-oriented databases.
"How and why are they better/worse than a dynamic application that stores various data as attributes?" I think this is really a matter of style. You are given XML chunks that are (presumably) documented, and MySQL can navigate them. If you just keep them as such, you save a step. What would be gained by converting them to something else?
The MySQL docs suggest that the XML file will go into a clob field. Performance may suffer on larger docs. Perhaps then you will identify sub-documents that you want to regularly break out and put into a child table.
Along these same lines, if there are particular sub-docs you know you will want to know about, you can make a child table ("HasDocs", say), do a little pre-processing, and populate it with the names of sub-docs along with their counts. This would make for faster statistical analysis and also make it faster to find docs that have certain sub-docs.
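Building on that idea, a common workaround for the indexing question above (MySQL cannot index the result of ExtractValue directly) is to copy a frequently queried value into an ordinary column and index that; all names below are hypothetical:

    -- Materialize one XPath value into a plain, indexable column:
    ALTER TABLE users ADD COLUMN gender VARCHAR(10);
    UPDATE users SET gender = ExtractValue(userdata, '/user/gender');
    CREATE INDEX idx_users_gender ON users (gender);
    -- WHERE gender = 'male' can now use the index instead of
    -- re-parsing the XML document on every row.

The trade-off is that the column has to be kept in sync on writes, via a trigger or application code, so the update paths matter here too.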
Wish I could say more, hope this helps.

Is it better to return one big query or a few smaller ones?

I'm using MySQL to store video game data. I have tables for titles, platforms, tags, badges, reviews, developers, publishers, etc...
When someone is viewing a game, is it best to have one query that returns all the data associated with a game, or is it better to use several queries? Intuitively, since we have reviews, it seems pointless to include them in the same query, since they'll need to be paginated. But there are other situations where I'm unsure whether to break the query down or use two queries...
I'm a bit worried about performance, since I'm now joining the following tables to games: developers, publishers, metatags, badges, titles, genres, subgenres, classifications... To grab game badges (from games_badges, which is many-to-many to the games table and many-to-many to the badges table) I can either do another join or run a separate query... and I'm unsure what is best.
It is significantly faster to use one query than to use multiple queries, because the startup of a query and the calculation of the query plan are themselves costly, and running multiple queries in a row slows the server down more each time. Obviously you should only get the data that you actually need, but fewer queries is always better.
So if you are going to show 20 games on a page, you can speed things up (still using only one query) with a LIMIT clause, and only run the query again later when the user gets to the next page. Either that, or you can just make them wait for the full query to complete and have all of the data there at once. One big wait or several little waits.
tl;dr: use as few queries as possible.
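A minimal sketch of that LIMIT-based paging (the table and column names are made up): 20 rows per page, advancing the offset for each subsequent page.

    -- Page 2 of the games list, 20 rows per page:
    SELECT g.id, g.title, d.name AS developer
    FROM games AS g
    JOIN developers AS d ON d.id = g.developer_id
    ORDER BY g.title
    LIMIT 20 OFFSET 20;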
There is no panacea.
Always try to get only necessary data.
There is no single answer as to whether one big query or several small queries is better. Each case is unique; to answer the question, you should profile your application and examine the queries' EXPLAIN output.
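For example (the query itself is hypothetical), prefixing a candidate statement with EXPLAIN shows which indexes MySQL would use and roughly how many rows it expects to examine:

    EXPLAIN SELECT g.id, g.title
    FROM games AS g
    JOIN developers AS d ON d.id = g.developer_id;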
This is generally a processing problem.
If making one query would mean retrieving thousands of entries, run several queries instead and let MySQL do the processing (sums, etc.).
If making multiple queries would mean making tens or hundreds of them, then run a single query.
Obviously you're always facing both of these (neither would be a go-to option if you're asking the question), so the choices really are:
Pick the one you can take the hit on.
Cache or mitigate it as much as you can, so that you take the hit only rarely.
Try to insert preprocessed data into the database to help you process the current data.
Do the processing as part of a cron job and have the application only retrieve the data.
Take a few steps back and explore other possible approaches that don't require the processing.