Performance of JPA mappings - MySQL

Right now I am using JPA. What I would like to know is the following:
I have a lot of tables mapped to each other. When I look at the log, I see that a lot of information is being pulled out of the database after a simple query. What will happen if there are a lot of queries at a time? Will it still work fine? How can I increase performance?

There is an overhead that an ORM framework comes with. Because it operates at a fairly high level, it sometimes needs to generate a lot of native SQL queries to produce what you asked for in one or two lines of JPQL or pure EntityManager operations.
However, JPA uses two caches - L1 and L2. The first operates at the level of a single persistence context (EntityManager), the second is shared across the whole persistence unit (EntityManagerFactory). Therefore, you might see a lot of SQL queries generated at first, but after some time, some of the data should already be cached.
If you're unhappy with the performance, you could try using lazily loaded collections or fetching the required data yourself (you might be interested in Bozho's post regarding this matter).
Finally, if you see that the cache hasn't improved your performance and that hand-made JPQL queries are not doing the job either - you can always fall back to plain SQL queries. Beware that those queries bypass the JPA caches and might require you to flush pending changes before executing the native query (or to invoke it at the beginning of the active transaction).
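A minimal sketch of that flush-before-native-query pattern (assuming a plain javax.persistence EntityManager; the table and column names here are made up):

import javax.persistence.EntityManager;
import java.util.List;

public class OrderDao {

    // Native SQL bypasses the persistence context, so pending entity
    // changes must be flushed first or the query will not see them.
    @SuppressWarnings("unchecked")
    public List<Object[]> findOrderTotals(EntityManager em, long customerId) {
        em.flush(); // push buffered INSERTs/UPDATEs to the database
        return em.createNativeQuery(
                "SELECT o.id, o.total FROM orders o WHERE o.customer_id = ?1")
            .setParameter(1, customerId)
            .getResultList();
    }
}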
Whichever optimisation route you choose - first test it in your environment and ask yourself whether you need the optimisation at all. Do some heavy testing, performance tests and so on.
"Premature optimization is the root of all evil." D. Knuth

If you really have a LOT of entities mapped together, this could indeed lead to a performance problem. This will usually be the case if you have a lot of @OneToMany or @ManyToMany mappings:
@Entity
public class A {
    @OneToMany
    private List<B> listB;

    @ManyToMany
    private List<C> listC;

    @OneToMany
    private List<D> listD;
    ...
}
However, one thing you could do is use lazy fetching. This means that the loading of a field is delayed until it is accessed for the first time. You can achieve this with the fetch attribute:
@Entity
public class A {
    @OneToMany(fetch=FetchType.LAZY)
    private List<B> listB;

    @ManyToMany(fetch=FetchType.LAZY)
    private List<C> listC;

    @OneToMany(fetch=FetchType.LAZY)
    private List<D> listD;
    ...
}
In the above sample, listB, listC and listD will not be fetched from the DB until the first access to each list.
The concrete implementation of the lazy fetching depends on your JPA provider.
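One caveat: the first access has to happen while the persistence context is still open; with Hibernate, for instance, touching an uninitialized collection after the EntityManager has been closed throws a LazyInitializationException. A minimal sketch (assuming the entity A above exposes a getter for listB):

import javax.persistence.EntityManager;

public class LazyAccessExample {

    // The collection is only loaded on first access, so that access
    // must happen while the EntityManager is still open.
    public int countBs(EntityManager em, long id) {
        A a = em.find(A.class, id);  // loads A, but not listB
        return a.getListB().size(); // triggers the SELECT for listB here
    }
}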

Related

Is the performance of raw SQL much better than using spring-data-jpa?

I want to request a large number of records (100000 to 1000000) per select request, with a join of three tables. Is the performance much better with native SQL instead of using spring-data-jpa to map the results to @Entity objects?
Thx!
JPA and every ORM turn your query results into domain objects.
That of course takes resources. Spring Data JPA adds potential conversions to that and it preprocesses your query in order to support fancy ways of setting parameters.
If you are selecting large amounts of data the preprocessing of the statement probably doesn't matter that much.
But the conversion to domain objects will.
You used the word "migrating", which sounds like you are going to select data and then immediately write it somewhere else. If that is the case, use plain SQL, work directly on the ResultSet, and tell the driver to make it read-only and forward-only. See Understanding Forward Only ResultSet
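A rough JDBC sketch of that approach (the tables and join are made up; setFetchSize(Integer.MIN_VALUE) is the MySQL Connector/J convention for streaming rows one by one instead of buffering the whole result):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class BulkReader {

    // Streams a large three-table join without materializing entities
    // or buffering the full result set in memory.
    public void readAll(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(Integer.MIN_VALUE); // MySQL-specific streaming hint
            try (ResultSet rs = st.executeQuery(
                    "SELECT a.id, b.name, c.value FROM a"
                    + " JOIN b ON b.a_id = a.id"
                    + " JOIN c ON c.b_id = b.id")) {
                while (rs.next()) {
                    // write each row to its destination here
                }
            }
        }
    }
}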

Spring Transaction Performance Multiple Vs Single

So I am curious to know what the performance impact is if we have a single transaction with multiple updates in one flow, versus a separate transaction for each update.
If the application can sustain both patterns, which one is the better option? And if the application can only go with the second option, i.e. a different transaction for each update, how much do we lose in terms of performance?
@Transactional
public void updateXYZ() {
    updateX();
    updateY();
    updateZ();
}

VS

public void updateXYZ() {
    updateSeparateTransactionX();
    updateSeparateTransactionY();
    updateSeparateTransactionZ();
}
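For reference, a minimal sketch of how the second variant is usually declared with Spring (method names taken from the question; REQUIRES_NEW is what actually gives each update its own transaction):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class UpdateService {

    // Each method commits (or rolls back) on its own; a failure in Z
    // no longer undoes work already committed by X and Y.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void updateSeparateTransactionX() { /* ... */ }

    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void updateSeparateTransactionY() { /* ... */ }

    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void updateSeparateTransactionZ() { /* ... */ }
}

Bear in mind that Spring applies @Transactional through proxies, so calling these methods from within the same bean bypasses the new transactions, and that each extra commit typically costs an extra round trip to the database - which is where the per-update variant pays in performance.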

Speeding up Hibernate Object creation?

We use Hibernate as our ORM layer on top of a MySQL database. We have quite a few model objects, of which some are quite large (in terms of number of fields etc.). Some of our queries require that a lot (if not all) of the model objects are retrieved from the database, to do various calculations on them.
We have lazy loading enabled, but in some cases it still takes a significant amount of time for Hibernate to populate the objects. The execution time of the MySQL query is very fast (in the order of a few milliseconds), but then Hibernate takes its sweet time to populate the objects.
Is there any way / pattern / optimization to speed up this process?
Thanks.
One approach is not to populate the entity at all, but some kind of view object instead.
Assuming a CustomerView has the appropriate constructor, you can do
select new CustomerView(c.firstname, c.lastname, c.age) from Customer c
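Such a view class is just a plain DTO whose constructor matches the constructor expression in the query; a minimal sketch (field names taken from the query above):

// Plain value object populated directly by the JPQL constructor expression;
// Hibernate never tracks it, so there is no dirty checking or proxy overhead.
public class CustomerView {

    private final String firstname;
    private final String lastname;
    private final int age;

    public CustomerView(String firstname, String lastname, int age) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
    }
    // getters omitted
}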
Though I'm a bit surprised about Hibernate being slow to populate objects unless you happen to load associated objects by cascade and forget a few appropriate fetches.
Perhaps consider adding a second level cache? This won't necessarily speed up the object instantiation, but it could considerably cut down the frequency in which you are needing to do that.
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html
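A sketch of what enabling it can look like per entity in Hibernate 3.x (the Customer entity here is made up, and a cache provider such as Ehcache still has to be configured separately):

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Entities marked like this are kept in the second-level cache, so
// repeated lookups skip the database (though not object instantiation).
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Customer {

    @Id
    private Long id;

    private String firstname;
    private String lastname;
    private int age;
}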
Since you're asking a performance-related question, you might want to collect more data on where the bottleneck is. You say
Hibernate takes its sweet time to populate the objects.
How do you know it's Hibernate that's the problem? In other words, is Hibernate itself the problem, or could there not be enough memory (or too much) so the JVM isn't running efficiently?
Also, you mention
We have quite a few model objects, of which some are quite large (in terms of number of fields etc.).
How many is "quite large"? Dozens? Hundreds? Thousands? It makes a big difference, because relational databases (such as MySQL) start performing more poorly as your table gets "wider" (see this question: Is there a performance decrease if there are too many columns in a table?).
Performance is a lot about balancing constraints, but it's also about collecting a lot of data to see where the problem is and then fixing that problem. Then you'll find the next bottleneck and fix that one until your performance is good enough, or you run out of implementation time.

Challenges with Linq to sql concept in dot net

Let's say I use Linq to Sql to interact with the database from C#. What challenges might I face, in terms of architecture, performance, type safety, object orientation, etc.?
Basically Linq to SQL generates a class for each table in your database, complete with relation properties and all, so you will have no problems with type safety. The use of C# partials allows you to add functionality to these objects without messing around with Linq to SQL's autogenerated code. It works pretty well.
As tables map directly to classes and objects, you will either have to accept that your domain layer mirrors the database design directly, or you will have to build some form of abstraction layer above Linq to SQL. The direct mirroring of tables can be especially troublesome with many-to-many relations, which are not directly supported - instead of Order.Products you get Order.OrderDetails.SelectMany(od => od.Product).
Unlike most other ORMs, Linq to SQL does not just dispense objects from the database and allow you to store or update objects by passing them back into the ORM. Instead, Linq to SQL tracks the state of objects loaded from the database, and allows you to change the saved state. It is difficult to explain and strange to understand - I recommend you read some of Rick Strahl's blog posts on the subject.
Performance-wise, Linq to SQL does pretty well. In benchmarking tests it shows speeds of about 90-95% of what a native SQL reader would provide, and in my experience real-world usage is also pretty fast. Like all ORMs, Linq to SQL is affected by the N+1 selects problem, but it provides good ways to specify lazy/eager loading depending on context.
Also, by choosing Linq to SQL you choose MSSQL - there do exist third party solutions that allow you to connect to other databases, but last time I checked, none of them appeared very complete.
All in all, Linq to SQL is a good and somewhat easy to learn ORM, which performs okay. If you need features beyond what Linq to SQL is offering, take a look at the new entity framework - it has more features, but is also more complex.
We've had a few challenges, mainly from opening the query construction capability to programmers who don't understand how databases work. Here are a few smells:
// bad scaling:
// query in a loop - causes one roundtrip per order (n roundtrips)
// when one per customer (c roundtrips) could have been performed.
List<OrderDetail> od = new List<OrderDetail>();
foreach (Customer cust in customers)
{
    foreach (Order o in cust.Orders)
    {
        od.AddRange(dc.OrderDetails.Where(x => x.OrderId == o.OrderId));
    }
}
// no separation of
// operations intended for execution in the database
// from operations intended to be executed locally
var query =
    from c in dc.Customers
    where c.City.StartsWith(textBox1.Text)
    where DateTime.Parse(textBox2.Text) <= c.SignUpDate
    from o in c.Orders
    where o.OrderCode == (OrderCodes)Enum.Parse(typeof(OrderCodes), "Complete")
    select o;
// not understanding when results are pulled into memory -
// the first ToList() here loads the entire table before paging
List<Item> result = dc.Items.ToList().Skip(100).Take(20).ToList();
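// better: keep Skip/Take inside the query, so the paging happens in SQL
// List<Item> result = dc.Items.Skip(100).Take(20).ToList();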
Another problem is that one more level of separation from the table structures means indexes are even easier to ignore (that's a problem with any ORM though).

Multi-tenancy backstop check on Linq-to-sql entity load()

I am writing an aspx application that will host thousands of small customers in a communal SQL Server database. All entities will be created and loaded via Linq-To-Sql.
Surrogate keys (identity columns) will be used throughout the schema for all table relationships and so starting with a root customer object I should be able to navigate to exclusive sets of data for a particular customer using regular Linq queries (SQL joins).
However, from a security standpoint the above is a bit fragile, so I wish to add an extra layer of tenancy checking as a security backstop. All entities in my entity model will have a non-indexed int TenantId field.
I am looking for critical comments about this solution from a performance perspective.
public partial class MyLinqEntity
{
    partial void OnLoaded() // linq-to-sql extensibility function
    {
        // backstop: the loaded row must belong to the tenant in the current session
        if (this.TenantId != (int)HttpContext.Current.Session["tenantId"])
            throw new ApplicationException("Logic error, LINQ query crossed tenantId data boundary");
    }

    partial void OnCreated() // linq-to-sql extensibility function
    {
        this.TenantId = (int)HttpContext.Current.Session["tenantId"];
    }
}
Sorry this is mostly random thoughts….
I do not like all the objects depending on HttpContext.
Also, I don't know if looking in the session for each object is fast enough. I think you will be OK on speed, as a database lookup will normally be a lot slower than anything you do in process.
I would tend to use a dependency injection framework to auto-create an object with session scope to do the check. However, if you don't need a dependency injection framework elsewhere, this will be overkill.
As all your database rows will have a tenantId column, I hope you can move the check into a linq-to-sql “row read callback”, so you don't have to put it into each object. I don't know linq-to-sql well, but I expect you could hook into its query creation framework and add a “where tenantId = xx” to all database queries.
Having a “where tenantId = xx” in all queries will also let you partition the database by customer if needed, making “table scans” etc. cheaper.