I'd like to test how selecting and loading data in a Java application using Hibernate and MySQL can be optimized.
I divided the approaches I found into two groups:
MySQL
indexes - for sure
stored procedures - is there a difference if the select is done in a stored procedure?
views - is there a difference if the select definition is kept in a view?
query cache - does it work only if we run the same select a second time?
Hibernate
hibernate cache - is this similar to the query cache? how can it be configured?
lazy loading - can it help?
Are there any other ways? I use simple queries with several joins and aggregate functions.
I need to demonstrate time changes between "before" and "after" optimization.
For more information I tried to read this, but the language is too complicated for me.
Batch fetching is important when you read e.g. a collection. In that case Hibernate can fetch many rows of the collection in one SQL request, which is faster.
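As a minimal sketch, batch fetching of a collection can be enabled with Hibernate's @BatchSize annotation (the Invoice/InvoiceLine entities are made up; the javax.persistence namespace is assumed, use jakarta.persistence on newer stacks):

import java.util.ArrayList;
import java.util.List;
import javax.persistence.*;
import org.hibernate.annotations.BatchSize;

@Entity
public class Invoice {
    @Id @GeneratedValue
    private Long id;

    // With several Invoice instances in the session, touching one lazy "lines"
    // collection makes Hibernate load the lines of up to 25 invoices in a single
    // SELECT ... WHERE invoice_id IN (...), instead of one query per invoice.
    @OneToMany(mappedBy = "invoice")
    @BatchSize(size = 25)
    private List<InvoiceLine> lines = new ArrayList<>();
}

@Entity
class InvoiceLine {
    @Id @GeneratedValue
    private Long id;

    @ManyToOne
    private Invoice invoice;
}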
Hibernate caching is a very good solution (read about EHCache, for instance): it can store retrieved data in memory, and if nothing has changed on its side it can return that data without even asking the SQL engine.
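A hedged sketch of how the second-level cache is usually switched on: a few configuration properties plus a cache annotation on the entity (property names and the EHCache region factory class can differ between Hibernate versions, so verify them against yours):

import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Hibernate configuration (hibernate.cfg.xml / persistence.xml); property names
// may vary slightly between versions:
//   hibernate.cache.use_second_level_cache = true
//   hibernate.cache.use_query_cache        = true
//   hibernate.cache.region.factory_class   = org.hibernate.cache.ehcache.EhCacheRegionFactory

@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Person {

    @Id
    private Long id;

    private String fullName;
}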
Lazy loading is a must for one-to-many associations (you can kill your solution without it), but fortunately it is the default in Hibernate for such associations.
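For illustration, the defaults in a nutshell: to-many associations are lazy out of the box, while to-one associations are eager unless marked otherwise (Author/Book are made-up entities):

import java.util.ArrayList;
import java.util.List;
import javax.persistence.*;

@Entity
public class Author {
    @Id @GeneratedValue
    private Long id;

    // One-to-many: lazy by default, so the books are loaded only when the
    // collection is actually accessed.
    @OneToMany(mappedBy = "author")
    private List<Book> books = new ArrayList<>();
}

@Entity
class Book {
    @Id @GeneratedValue
    private Long id;

    // Many-to-one: eager by default; mark it LAZY explicitly if you don't
    // always need the author when loading a book.
    @ManyToOne(fetch = FetchType.LAZY)
    private Author author;
}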
You can also read about optimistic locking, which is faster than pessimistic locking in many cases.
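A minimal sketch of optimistic locking with a @Version column (the Account entity is made up); Hibernate then guards updates with WHERE version = ? instead of holding database locks:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Account {

    @Id
    private Long id;

    private long balance;

    // Incremented on every update; a concurrent modification makes the stale
    // update fail (OptimisticLockException) instead of blocking on a DB lock.
    @Version
    private int version;
}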
Last but not least, you should use a proper transaction strategy, so that you don't create new transactions when it is not necessary.
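As an illustration (a fragment using the plain Hibernate Session API; sessionFactory, customerId and the entities are assumed to exist), related reads can share one session and transaction instead of opening a new one per query:

// sessionFactory, customerId, Customer and Invoice are assumed to exist.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
try {
    // Both reads share one transaction and one session (and its first-level cache)
    // instead of paying for a new transaction per statement.
    Customer customer = session.get(Customer.class, customerId);
    List<Invoice> invoices = session
            .createQuery("from Invoice i where i.customer = :c", Invoice.class)
            .setParameter("c", customer)
            .list();
    tx.commit();
} catch (RuntimeException e) {
    tx.rollback();
    throw e;
} finally {
    session.close();
}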
Related
Our application is based on Java 8, Spring Data JPA and MySQL. We have two different data sources in the application; our task is to fetch millions of records (text stored in a table) from one data source and insert them into the other data source after some small computation.
When I try to iterate through each record and insert it into the other database, it takes much longer than expected.
Is there any standard, fast way of doing this? Do I need to use a stored procedure? If yes, how would I pass the list of entities to the procedure?
Don't use JPA. JPA's main use case is loading a non-trivial domain model, manipulating it, then flushing it to the database with automatic detection of what changed. You don't seem to need that in your use case.
Use JDBC and batch inserts. Spring's JdbcTemplate will come in handy.
Select a batch, manipulate it as desired, insert it into the target.
For tuning the select process, consider value-based pagination.
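A rough sketch of how the two previous points could look with Spring's JdbcTemplate, combining keyset ("value-based") pagination for the reads with batched inserts for the writes (table and column names are made up):

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.springframework.jdbc.core.JdbcTemplate;

public class CopyJob {

    private final JdbcTemplate source; // built on the source DataSource
    private final JdbcTemplate target; // built on the target DataSource

    public CopyJob(JdbcTemplate source, JdbcTemplate target) {
        this.source = source;
        this.target = target;
    }

    public void copyAll(int batchSize) {
        long lastId = 0;
        while (true) {
            // Keyset pagination: stays cheap no matter how deep into the table we are,
            // unlike OFFSET, which rescans all skipped rows.
            List<Map<String, Object>> rows = source.queryForList(
                    "SELECT id, text FROM source_table WHERE id > ? ORDER BY id LIMIT ?",
                    lastId, batchSize);
            if (rows.isEmpty()) {
                break;
            }
            // One round trip per batch instead of one INSERT per row.
            List<Object[]> args = rows.stream()
                    .map(r -> new Object[] { r.get("id"), transform((String) r.get("text")) })
                    .collect(Collectors.toList());
            target.batchUpdate("INSERT INTO target_table (id, text) VALUES (?, ?)", args);
            lastId = ((Number) rows.get(rows.size() - 1).get("id")).longValue();
        }
    }

    private String transform(String text) {
        return text; // the "small computation" goes here
    }
}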
For writing, consider removing constraints and indexes and recreating them after the process.
There might be more MySQL specific options available, but I don't know about those.
You might want to split your work across three thread pools: one for reading, one for writing, and one for processing the data.
I'm not sure, but Spring Batch might help with that.
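If you do split the work, a very rough sketch of the shape it could take with plain java.util.concurrent building blocks (pool and queue sizes are arbitrary):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CopyPipeline {

    // Bounded queues apply back-pressure so the reader can't run far ahead of the writer.
    private final BlockingQueue<List<Map<String, Object>>> toProcess = new ArrayBlockingQueue<>(10);
    private final BlockingQueue<List<Map<String, Object>>> toWrite = new ArrayBlockingQueue<>(10);

    private final ExecutorService readers = Executors.newSingleThreadExecutor();
    private final ExecutorService processors = Executors.newFixedThreadPool(4);
    private final ExecutorService writers = Executors.newFixedThreadPool(2);

    // readers.submit(...)    pulls batches from the source and puts them on toProcess,
    // processors.submit(...) transforms batches from toProcess and puts them on toWrite,
    // writers.submit(...)    batch-inserts batches from toWrite into the target.
}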
Load/save entries in batches (100 or 1000 entries in one go).
Load and/or save asynchronously.
I need to fetch a huge amount of data in my code. I am using ZF2 and currently use Doctrine 2 to deal with my database. But I just found that ORM queries take much more time than plain MySQL queries. Could you please suggest the best way to fetch huge amounts of data from the database in ZF2?
There are a few things to consider here:
By default, Doctrine will hydrate the database records into PHP objects (entities): it fills the entities with the data from your query. This is done by "smart guessing", so to speak, which is relatively slow. Hydrate to arrays and you get a much faster response. Read more about hydration in the manual.
By default, Doctrine does not use any caching to parse your mappings or when transforming the DQL into SQL. You can use caching to speed things up. Also, Doctrine is faster dealing with read-only entities than making them read/write. Read more about Doctrine performance issues in the manual.
Rendering 50,000 rows in an HTML table (or whatever) is a pain for your render engine (Zend\View / PHP) and your DOM (in the browser). Even if the query is optimized and loads fairly quickly, rendering all these results into a view and displaying them in a browser will be really slow.
Using pagination will decrease the size of your dataset, speeding up the query and the rendering at the same time.
Using an ORM to help you with mapping, database abstraction and so on is really nice, but it comes with a trade-off: performance. Using a library like Doctrine inherently increases execution time; you will never reach the performance of the plain PHP MySQL functions with Doctrine.
So:
Try to decrease the number of results from one query, to speed up the database query, hydration and rendering
Try tuning performance with Doctrine by using array hydration, caching and read-only entities.
If you need the fastest way possible, don't use Doctrine. Try out Zend\Db or write everything in SQL yourself.
I have multiple tables where I have to join, subquery, paginate, group and order. Keeping Hibernate's limitations in mind, native SQL is sometimes required, and in that case the Hibernate cache is helpless. Also, storing data in the Hibernate second-level cache is not automatic, since it is only stored when the DB is accessed, so the first time around the second-level cache is empty.
My problem is that I used native SQL to get data with multiple joins, grouping and ordering, and ended up with a performance issue.
My thoughts: I like an SQL VIEW to pull data with all those joins, ordering and grouping. But an SQL VIEW is like a normal select statement and executes on every access. Is there any live result set stored as a table, where I can just fetch data as select * from ONE_LIVE_RESULT_SET where condition?
Is there any concept like a LIVE_RESULT_SET in the SQL world? Any comments?
Use a materialized view
Extract from Wikipedia: http://en.wikipedia.org/wiki/Materialized_view
A materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary based on aggregations of a table's data. Materialized views, which store data based on remote tables, are also known as snapshots. A snapshot can be redefined as a materialized view.
Example syntax to create a materialized view in Oracle:
CREATE MATERIALIZED VIEW MV_MY_VIEW
REFRESH FAST START WITH SYSDATE
NEXT SYSDATE + 1
AS SELECT * FROM <table_name>;
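MySQL (the database used elsewhere in this thread) has no native materialized views; a common workaround is a summary table rebuilt on a schedule. A hedged sketch of that workaround using Spring's JdbcTemplate and @Scheduled, where the table and the query are made-up examples:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Requires @EnableScheduling in the application configuration.
@Component
public class ReportSnapshotRefresher {

    private final JdbcTemplate jdbc;

    public ReportSnapshotRefresher(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Rebuild the emulated "materialized view" every night at 02:00; reads then hit
    // report_snapshot instead of re-running the expensive joins and aggregations.
    @Scheduled(cron = "0 0 2 * * *")
    public void refresh() {
        jdbc.execute("TRUNCATE TABLE report_snapshot");
        jdbc.update("INSERT INTO report_snapshot (person_id, order_count, total) "
                + "SELECT p.id, COUNT(o.id), SUM(o.amount) "
                + "FROM person p JOIN orders o ON o.person_id = p.id "
                + "GROUP BY p.id");
    }
}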
Regards
But this MATERIALIZED VIEW is not live data (in sync with the table); to make it live it has to be REFRESHed. The question then becomes when to REFRESH, and during such a refresh one again has to wait. Frequently changing data is another use case that suffers. Is there any way the refresh can be done for a specific row?
Any Hibernate experts: does Hibernate cache the data for multiple or complex joins?
I have seen Hibernate populate the second-level cache for session.get(id), but I am not sure about HQL or native SQL with multiple/complex joins. Is it possible to get results from the Hibernate second-level cache for multiple/complex joins?
I have a huge person database and run common searches by name on it.
SELECT * FROM tbl_person WHERE full_name LIKE 'Sparow%Jack%';
SELECT * FROM tbl_person WHERE full_name LIKE 'Sparow%';
I rarely insert new data in this table.
I want to store common last_name queries on hard disk; the queries are already cached in RAM, but I lose them all each time the server reboots.
I have 1.7 billion rows in my table and each row (with index) takes 1 KB, so yes, it's a 1.7 TB database.
That's the main reason why I want to store common selects on disk.
Variable_name                   Value
query_alloc_block_size          8192
query_cache_limit               1048576
query_cache_min_res_unit        1024
query_cache_size                4294966272
query_cache_type                ON
query_cache_wlock_invalidate    OFF
query_prealloc_size             8192
Edit :
SELECT * FROM tbl_person WHERE full_name LIKE 'Savard%';
takes 1000 seconds to execute the first time and 2 seconds afterwards.
If I reboot the system and execute it again, the query takes 1000 seconds again.
I simply want to avoid MySQL taking another 1000 seconds to run the same query I already ran before the reboot.
Why not consider something like Redis for caching?
It's an in memory data store and it's very popular right now. Sites using Redis:
http://blog.togo.io/redisphere/redis-roundup-what-companies-use-redis
Redis also can persist data to disk: http://redis.io/topics/persistence
For caching, though, saving to disk shouldn't be absolutely critical: if some data is not in the cache, the worst case is simply going straight through to your database.
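As an illustration, a rough sketch of that fall-through pattern using the Jedis client (the key naming, the one-hour TTL and runQueryAgainstMySql are all made up for the example):

import redis.clients.jedis.Jedis;

public class PersonSearchCache {

    private final Jedis jedis = new Jedis("localhost", 6379);

    public String findByNamePrefix(String prefix) {
        String key = "person:search:" + prefix;

        // Cache hit: answer from Redis without touching MySQL.
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;
        }

        // Cache miss: fall through to the database, then cache the result for an hour.
        String result = runQueryAgainstMySql(prefix);
        jedis.setex(key, 3600, result);
        return result;
    }

    private String runQueryAgainstMySql(String prefix) {
        // placeholder for: SELECT ... FROM tbl_person WHERE full_name LIKE 'prefix%'
        return "...";
    }
}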
If you are performing many such queries on your data, I suggest you index your table using Apache Lucene or Sphinx. Databases are fast, but they are not very efficient (especially MySQL) at performing partial matches on millions of rows.
I already answered a similar question about Zend Framework and Lucene, and I favor Zend's solution as I believe it is the easiest to set up and use in a PHP environment.
Luckily, Zend Framework can be used module by module, and you can easily use the Zend Search Lucene module by itself without the entire class library.
Edit:
The role of an indexer is not to replace your DB, but to improve its search functionality by providing a way to perform partial searches. For example, given your table, you may only index a few of your fields (make them "queryable") and keep other static (non-indexed) fields that reference your rows in the database.
The advantage of using an indexer is that you can also index pre-computations and search them directly, instead of querying the database.
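For illustration only, a rough sketch of the same idea with the Java version of Apache Lucene (the answer above recommends Zend's PHP port; exact API details vary between Lucene versions, and the index path and values here are made up):

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class PersonIndex {

    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("/tmp/person-index"));

        // Index: only the searchable name is analyzed; the row id is just stored,
        // so the database row can be looked up from the search hit.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new TextField("full_name", "Sparow Jack", Field.Store.NO));
            doc.add(new StoredField("person_id", 123L));
            writer.addDocument(doc);
        }

        // Search: a prefix query roughly matches the LIKE 'Sparow%' use case
        // (StandardAnalyzer lowercases terms, so the prefix is lowercase here).
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new PrefixQuery(new Term("full_name", "sparow")), 20);
            System.out.println(hits.totalHits);
        }
    }
}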
I'm having some performance issues with a MySQL database due to its normalization.
Most of my applications that use a database need to do some heavy nested queries, which in my case take a lot of time. Queries can take up to 2 seconds to run with indexes; without indexes, about 45 seconds.
A solution I came across a few months back was to use a faster, more linear, document-based database, in my case Solr, as the primary database. As soon as something changed in the MySQL database, Solr was notified.
This worked really great. All queries using the Solr database only took about 3ms.
The numbers look good, but I'm having some problems.
Huge database
The MySQL database is about 200 MB; the Solr DB contains about 1.4 GB of data.
Each time I need to change a table/column the database needs to be reindexed, which in this example took over 12 hours.
Difficult to render both a Solr object and an Active Record (MySQL) object without things getting messy.
The view relies on a certain object. It doesn't care whether the object itself is an Active Record object or a Solr object, as long as it can call a set of attributes on it.
Like this.
# Controller
@song = Song.first
# View
@song.artist.urls.first.service.name
The problem in my case is that the data being returned from Solr is flat, like this:
{
id: 123,
song: "Waterloo",
artist: "ABBA",
service_name: "Groveshark",
urls: ["url1", "url2", "url3"]
}
This forces me to build an active record object that can be passed to the view.
My question
Is there a better way to solve the problem?
Some kind of super duper fast primary read only database that can handle complex queries fast would be nice.
Solr individual fields update
About reindexing everything on schema change: Solr does not support updating individual fields yet, but there is a JIRA issue about this that's still unresolved. However, how often do you change the schema?
MongoDB
If you can live without an RDBMS (without joins, schema, transactions, foreign key constraints), a document-based DB like MongoDB or CouchDB would be a perfect fit (here is a good comparison between them).
Why use MongoDB:
data is in native format (you can use an ORM mapper like Mongoid directly in the views, so you don't need to adapt your records as you do with Solr)
dynamic queries
very good performance on non-full text search queries
schema-less (no need for migrations)
built-in, easy-to-set-up replication
Why use SOLR:
advanced, very performant full-text search
Why use MySQL:
joins, constraints, transactions
Solutions
So, the solutions (combinations) would be:
Use MongoDB + Solr
but you would still need to reindex all on schema change
Use only MongoDB
but drop support for advanced full-text search
Use MySQL in a master-slave configuration, and balance reads from the slave(s) (using a plugin like octopus) + Solr
setup complexity
Keep current setup, denormalize data in MySQL
messy
Solr reindexing slowness
The MySQL database is about 200 MB, the Solr DB contains about 1.4 GB of data. Each time I need to change a table/column the database needs to be reindexed, which in this example took over 12 hours.
Reindexing a 200 MB DB in Solr SHOULD NOT take 12 hours! Most probably you also have other issues, like:
MySQL:
n+1 issue
indexes
SOLR:
commit after each request - this is the default setup if you use a plugin like Sunspot, but it's a performance killer in production
From http://outoftime.github.com/pivotal-sunspot-presentation.html:
By default, Sunspot::Rails commits at the end of every request that updates the Solr index. Turn that off.
Use Solr's autoCommit functionality. That's configured in solr/conf/solrconfig.xml.
Be glad for assumed inconsistency. Don't use search where results need to be up-to-the-second.
other setup issues (http://wiki.apache.org/solr/SolrPerformanceFactors#Indexing_Performance)
Look at the logs for more details
Instead of pushing your data into Solr to flatten the records, why don't you just create a separate table in your MySQL database that is optimized for read-only access?
Also, you seem to contradict yourself:
The view is relying on a certain object. It doesn't care if the object it self is an Active Record object or an Solr object, as long as it can call a set of attributes on the it.
The problem in my case is that the data being returned from Solr is flat... This forces me to build a fake active record object that can be rendered by the view.