I am developing a web application with Spring, Hibernate and MySQL, and I would like to know how to fetch data from the database quickly. I am trying to select data from my database, but there are thousands of records, so selecting them takes a long time. How can I minimize the fetch time? Please give me some suggestions so that I can optimize my web application.
Note: my database uses foreign key mappings, so I am joining many tables to produce the final result.
To optimize a slow query, start by looking at its execution plan. Here is the documentation for MySQL.
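For example (the table and column names here are just made up to illustrate the command):

EXPLAIN
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= '2015-01-01';

The output shows, for each table, which index (if any) is used and roughly how many rows MySQL expects to examine.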
In general, here are some recommendations:
Choose a good index. For instance, if you have to choose between a Long and a String key, prefer the Long.
In the SELECT clause, specify only the fields you need.
Joins are expensive in terms of time. Make sure you use all the keys that relate the two tables, don't join to tables you don't use, and always try to join on indexed fields. The join type matters as well (INNER, OUTER, ...).
There are other tips as well, but the ones listed above can already improve your query time significantly.
Related
There are 10 tables, all with a session_id column, and a single session table. The goal is to join them all on the session table. I get the feeling that this is a major code smell. Is this good or bad practice?
What problems could occur?
Whether this is a good design or not depends deeply on what you are trying to represent with it. So, it might be OK or it might not be... there's no way to tell just from your question in its current form.
That being said, there are a couple of ways to speed up a join:
Use indexes.
Use covering indexes.
Under the right DBMS, you could use a materialized view to store pre-joined rows. You should be able to simulate that under MySQL by maintaining a special table via triggers (or even manually); a rough sketch follows after this list.
Don't join a table unless you actually need its fields. List only the fields you need in the SELECT list (instead of blindly using *). The fastest operation is the one you don't have to do!
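As a sketch of the trigger idea mentioned above (the session and pageview tables and their columns are invented here purely for illustration, not taken from the question):

-- a pre-joined copy maintained by a trigger, so reports never repeat the join
CREATE TABLE session_pageview (
    session_id INT NOT NULL,
    user_id    INT NOT NULL,
    url        VARCHAR(255) NOT NULL,
    INDEX (session_id)
);

DELIMITER //
CREATE TRIGGER pageview_after_insert
AFTER INSERT ON pageview
FOR EACH ROW
BEGIN
    -- copy the already-joined row into the reporting table
    INSERT INTO session_pageview (session_id, user_id, url)
    SELECT s.session_id, s.user_id, NEW.url
    FROM session s
    WHERE s.session_id = NEW.session_id;
END//
DELIMITER ;

You would need similar triggers for UPDATE and DELETE if those happen on the joined tables.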
And above all, measure on representative amounts of data! Possible results:
It's lightning fast. Yay!
It's slow, but it doesn't matter that it's slow (i.e. rarely used / not important).
It's slow and it matters that it's slow. Strap in, you have work to do!
We need the query with the 11 joins, and its EXPLAIN, posted in the original question when it is available, please. And be kind to your community: for every table involved, also post SHOW CREATE TABLE tblname and SHOW INDEX FROM tblname, to avoid additional requests for these 11 tables. Then we will know the scope of the data and the cardinality involved for each indexed column.
Of course, more joins hurt performance.
But it depends! If your data model is like that, then you can't help it here unless a complete data-model redesign happens.
1) Is it an online (real-time transactional) DB or an offline DB (data warehouse)?
If online, it is better to maintain a single table: keep the data in one table and let the number of columns grow.
If offline, it's better to maintain separate tables, because you are not going to need all the columns all the time.
As an example: I have a database to detect visitors (bots, etc.), and since not every visitor has the same set of 'credentials', I made a 'dynamic' table like so: see fiddle: http://sqlfiddle.com/#!9/ca4c8/1 (simplified version).
This returns me the profile ID that I use to gather info about each profile (in another DB). Depending on the profile type, I query the table with a different name clause (name='something') (e.g.: hostname, ipAddr, userAgent, HumanId, etc.).
I'm not an expert in SQL, but I'm familiar with indexes, constraints, primary, unique and foreign keys, etc. And from what I saw in these search results:
Mysql Self-Join Performance
How to tune self-join table in mysql like this?
Optimize MySQL self join query
JOIN Performance Issue MySQL
MySQL JOIN performance issue
Most of them have comments about bad performance with self-joins, but the answers tend to attribute it to a missing index.
So the final question is: does self-joining a table make it more prone to bad performance, assuming that everything is indexed properly?
On a side note, here is some more information about the table; it might be irrelevant to the question, but it gives context for my particular situation:
The flag column is used to mark records for deletion, as the user I connect with from PHP doesn't have DELETE permission on this database. Sorry, security is more important than performance.
I added the 'type' that goes with the info I get from the user agent (i.e. if anything is, or at least seems to be, a bot, we will only search for type 5000).
Column 'name' is unfortunately a varchar indexed as part of the primary key (with profile and type).
I tried to use as many INTs and as much filtering (WHERE) in the SELECT query as possible to reduce any eventual loss of performance (if that even matters).
I'm willing to study and tweak the thing if needed, unless someone with a strong MySQL background tells me it's really not a good thing to do.
This is a big project I have in development, so I cannot test it with millions of records for now, but I wonder whether performance will become an issue as this grows. Any input, links, references, documentation or test procedures (maybe in comments) will be appreciated.
A self-join is no different from joining two different tables. The optimizer will pick one 'table', usually based on the WHERE clause, then do a Nested Loop Join into the other. In your case, you have implied, via LEFT, that it should work only one way. (The optimizer will ignore that if it sees no need for it.)
Your keys are fine for that fiddle.
The real problem is "Entity-Attribute-Value" (EAV), which is a messy way to lay out data in tables. Your query seems to be saying "find a (LIMIT 1) profile (entity) that has a certain pair of attributes (name = Googlebot AND addr = ...)".
It would be so much easier, and faster, to have two columns (name and addr) and a "composite" INDEX(name, addr).
I recommend doing that for the common "attributes", then put the rest into a single column with a JSON string. See here.
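A minimal sketch of what that could look like (the column sizes, the extra column and the sample values are my assumptions, not taken from your fiddle):

-- one row per profile, with the common attributes as real columns
CREATE TABLE profile (
    profile_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name  VARCHAR(100) NOT NULL,
    addr  VARCHAR(45)  NOT NULL,
    extra TEXT NULL,               -- JSON string holding the remaining, rarely used attributes
    INDEX (name, addr)             -- the composite index mentioned above
);

-- the "find a profile with this name/addr pair" lookup becomes a single index probe
SELECT profile_id
FROM profile
WHERE name = 'Googlebot' AND addr = '66.249.66.1'
LIMIT 1;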
I am developing a website using ASP.NET and my DB is MySQL.
Users can post ads in each category, and I want to display how many ads there are for each category, in front of the category name.
Like this.
To achieve this, I am currently using a query similar to this:
SELECT b.name, COUNT(*) AS count
FROM `vehicle_cat` a
INNER JOIN `vehicle_type` b
ON a.`type_id_ref` = b.`vehicle_type_id`
GROUP BY b.name
This is my EXPLAIN result:
So assume I have 200,000 records for each category.
So am I doing the right thing, considering performance and efficiency?
What if I manage a separate table that stores the count for each category? When a user saves a record in a category, I would increment the value for the corresponding type. Assume 100,000 users will post records at once. Would that crash my DB?
Or are there any other solutions?
Start by developing the application using the query. If performance is a problem, then create indexes to optimize the query. If indexes are not sufficient, then think about partitioning.
Things not to do:
Don't create a separate table for each category.
Don't focus on performance before you have a performance problem. Do reasonable things, but get the functionality to work first.
If you do need to maintain counts in a separate table for performance reasons, you will probably have to maintain them using triggers.
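A rough sketch of such a trigger, reusing the column names from your query (the counter table itself, vehicle_type_counts, is just a name I made up):

CREATE TABLE vehicle_type_counts (
    vehicle_type_id INT NOT NULL PRIMARY KEY,
    ad_count        INT NOT NULL DEFAULT 0
);

DELIMITER //
CREATE TRIGGER vehicle_cat_after_insert
AFTER INSERT ON vehicle_cat
FOR EACH ROW
BEGIN
    -- create the counter row on first use, otherwise bump it
    INSERT INTO vehicle_type_counts (vehicle_type_id, ad_count)
    VALUES (NEW.type_id_ref, 1)
    ON DUPLICATE KEY UPDATE ad_count = ad_count + 1;
END//
DELIMITER ;

A matching AFTER DELETE trigger would decrement the count.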
You can use a caching solution, probably an in-memory cache like Redis or Memcached, and store your counters there. On cache initialization, populate them with your SQL script; later, adjust these counters when adding or deleting ads. It will be faster than storing them in SQL.
But first check whether COUNT(*) is really an expensive operation in your SQL database. The SQL engine is clever, and this SELECT may already be fast enough, or you may be able to optimize it well. If it works, you'd better do nothing until you actually have performance problems!
I can't seem to find any examples of anyone doing this on the web, so am wondering if maybe there's a reason for that (or maybe I haven't used the right search terms). There might even already be a term for this that I'm unaware of?
To save on database storage space for regularly recurring strings, I'm thinking of creating a MySQL table called unique_string. It would only have two columns:
"id" : INT : PRIMARY_KEY index
"string" : varchar(255) : UNIQUE index
Any other tables anywhere in the database can then use INT columns instead of VARCHAR columns. For example a varchar field called browser would instead be an INT field called browser_unique_string_id.
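In SQL terms, the idea might look something like this (request_log is just an invented example of a referencing table):

CREATE TABLE unique_string (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    string VARCHAR(255) NOT NULL,
    UNIQUE KEY (string)
);

-- e.g. a stats table referencing it instead of storing the raw browser string
CREATE TABLE request_log (
    request_id               BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    requested_at             DATETIME NOT NULL,
    browser_unique_string_id INT UNSIGNED NOT NULL
);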
I would not use this for anything where performance matters. In this case I'm using it to track details of every single page request (logging web stats) and an "audit trail" of user actions on intranets, but potentially other things too.
I'm also aware the SELECT queries would be more complex, but I'm not worried about that. I'll most likely write some code to generate the queries that return the "real" string data.
Thoughts? I feel like I might be overlooking something obvious here.
Thanks!
I have used this structure for a similar application -- keeping track of URIs for web logs. In this case, the database was Oracle.
The performance issues are not minimal. As the database grows, there are tens of millions of URIs, so just identifying the right string during an INSERT is challenging. We handled this by building most of the update logic in Hadoop, so the database table was, in essence, just a copy of a Hadoop table.
In a regular database, you would get around this by building an index, as you suggest in your question. An index solution works well up to the limits of your available memory. In fact, this is a rather degenerate case for an index, because you really only need the index and not the underlying table. I do not know whether MySQL or SQL Server recognizes this, although columnar databases (such as Vertica) should.
SQL Server has another option. If you declare the string as VARCHAR(MAX), then it is stored on a separate data page from the rest of the data. During a full table scan, there is no need to load the additional page into memory if the column is not referenced in the query.
This is a very common design pattern in databases where the cardinality of the data is relatively small compared to the transaction table it's linked to. The queries wouldn't be very complex -- just a simple join to the lookup table. You can include more than just a string in the lookup table: any other information that is commonly repeated. You're simply normalizing your model to remove duplicate data.
Example:
Request Table:
Date
Time
IP Address
Browser_ID
Browser Table:
Browser_ID
Browser_Name
Browser_Version
Browser_Properties
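Putting that example together, the report query is just a join to the lookup table (column names adjusted to valid identifiers):

SELECT r.Date, r.Time, r.IP_Address, b.Browser_Name, b.Browser_Version
FROM Request r
JOIN Browser b ON b.Browser_ID = r.Browser_ID;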
If you are planning on logging data in real time (as opposed to a batch job), then you want to ensure that the time to write a record to the database is as short as possible. If you are logging synchronously, then the record creation time will obviously directly affect the time it takes for an HTTP request to complete. If it is asynchronous, then slow record creation times will lead to a bottleneck. However, if this is a batch job, then performance will not matter as long as you can confidently create all the batched records before the next batch runs.
In order to reduce the time it takes to create a record, you really want to flatten out your database structure. Your current query, in pseudo-code, might look something like this:
SET @id = (SELECT id FROM PagesTable WHERE PageName = @RequestedPageName);

IF @id IS NULL THEN
    INSERT INTO PagesTable (PageName) VALUES (@RequestedPageName);
    SET @id = LAST_INSERT_ID();  -- or whatever method your DB supports for
                                 -- fetching the id of a newly created record
END IF;

INSERT INTO BrowserLogTable (page_id, browser_name) VALUES (@id, @BrowserName);
Whereas with a flat structure you would just need a single INSERT.
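For instance, something like this (FlatLogTable is just a hypothetical flat version of the log table above):

INSERT INTO FlatLogTable (page_name, browser_name)
VALUES (@RequestedPageName, @BrowserName);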
If you are concerned about data integrity, which you should be, then typically you would normalise this data by querying it and writing it into a separate set of tables (or a separate database) at regular intervals, and use that for querying against.
I have a large database of normalized order data that is becoming very slow to query for reporting. Many of the queries that I use in reports join five or six tables and have to examine tens or hundreds of thousands of rows.
There are lots of queries and most have been optimized as much as possible to reduce server load and increase speed. I think it's time to start keeping a copy of the data in a denormalized format.
Any ideas on an approach? Should I start with a couple of my worst queries and go from there?
I know more about MSSQL than MySQL, but I don't think the number of joins or the number of rows you are talking about should cause you too many problems with the correct indexes in place. Have you analyzed the query plan to see if you are missing any?
http://dev.mysql.com/doc/refman/5.0/en/explain.html
That being said, once you are satisfied with your indexes and have exhausted all other avenues, de-normalization might be the right answer. If you just have one or two queries that are problems, a manual approach is probably appropriate, whereas some sort of data warehousing tool might be better for creating a platform to develop data cubes.
Here's a site I found that touches on the subject:
http://www.meansandends.com/mysql-data-warehouse/?link_body%2Fbody=%7Bincl%3AAggregation%7D
Here's a simple technique that you can use to keep denormalizing queries simple, if you're just doing a few at a time (and I'm not replacing your OLTP tables, just creating a new one for reporting purposes). Let's say you have this query in your application:
select a.name, b.address from tbla a
join tblb b on b.fk_a_id = a.id where a.id=1
You could create a denormalized table and populate with almost the same query:
create table tbl_ab (a_id, a_name, b_address);
-- (types elided)
Notice the underscores match the table aliases you use
insert tbl_ab select a.id, a.name, b.address from tbla a
join tblb b on b.fk_a_id = a.id
-- no where clause because you want everything
Then to fix your app to use the new denormalized table, switch the dots for underscores.
select a_name as name, b_address as address
from tbl_ab where a_id = 1;
For huge queries this can save a lot of time and makes it clear where the data came from, and you can re-use the queries you already have.
Remember, I'm only advocating this as a last resort. I bet there are a few indexes that would help you. And when you de-normalize, don't forget to account for the extra space on your disks, and figure out when you will run the query to populate the new tables. This should probably be at night, or whenever activity is low. And the data in that table, of course, will never be exactly up to date.
[Yet another edit] Don't forget that the new tables you create need to be indexed too! The good part is that you can index to your heart's content and not worry about update lock contention, since aside from your bulk insert the table will only see selects.
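For the tbl_ab example above, that could be as simple as:

-- index the denormalized table for the lookup pattern the app actually uses
CREATE INDEX idx_tbl_ab_a_id ON tbl_ab (a_id);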
MySQL 5 does support views, which may be helpful in this scenario. It sounds like you've already done a lot of optimizing, but if not you can use MySQL's EXPLAIN syntax to see what indexes are actually being used and what is slowing down your queries.
As far as denormalizing the data goes (whether you're using views or just duplicating data in a more efficient manner), I think starting with the slowest queries and working your way through is a good approach to take.
I know this is a bit tangential, but have you tried seeing if there are more indexes you can add?
I don't have a lot of DB background, but I am working with databases a lot recently, and I've been finding that a lot of the queries can be improved just by adding indexes.
We are using DB2, and there are commands called db2expln and db2advis; the first will indicate whether table scans or index scans are being used, and the second will recommend indexes you can add to improve performance. I'm sure MySQL has similar tools...
Anyway, if this is something you haven't considered yet, it has been helping me a lot... but if you've already gone this route, then I guess it's not what you are looking for.
Another possibility is a "materialized view" (or as they call it in DB2), which lets you specify a table that is essentially built from parts of multiple tables. Thus, rather than denormalizing the actual columns, you could provide this view to access the data... but I don't know if this has severe performance impacts on inserts/updates/deletes (though if it is "materialized", then it should help with selects, since the values are physically stored separately).
In line with some of the other comments, I would definitely have a look at your indexing.
One thing I discovered earlier this year on our MySQL databases was the power of composite indexes. For example, if you are reporting on order numbers over date ranges, a composite index on the order number and order date columns could help. I believe MySQL can only use one index per table in a query, so if you just had separate indexes on the order number and order date, it would have to decide on just one of them to use. Using the EXPLAIN command can help determine this.
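For example, something along these lines (the orders table and column names are hypothetical; adjust them to your schema):

CREATE INDEX idx_orders_number_date ON orders (order_number, order_date);

With that in place, a query filtering on order_number and a date range can be satisfied by the one composite index instead of MySQL having to pick between two single-column indexes.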
To give an indication of the performance with good indexes (including numerous composite indexes), I can run queries joining 3 tables in our database and get almost instant results in most cases. For more complex reporting, most of the queries run in under 10 seconds. These 3 tables have 33 million, 110 million and 140 million rows respectively. Note that we had also already normalised these slightly to speed up our most common query on the database.
More information regarding your tables and the types of reporting queries may allow further suggestions.
For MySQL I like this talk: Real World Web: Performance & Scalability, MySQL Edition. This contains a lot of different pieces of advice for getting more speed out of MySQL.
You might also want to consider selecting into a temporary table and then performing queries on that temporary table. This would avoid the need to rejoin your tables for every single query you issue (assuming that you can use the temporary table for numerous queries, of course). This basically gives you denormalized data, but if you are only doing select calls, there's no concern about data consistency.
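As a sketch, reusing the tbla/tblb example from earlier in this thread:

-- materialize the joined data once...
CREATE TEMPORARY TABLE tmp_report AS
SELECT a.id, a.name, b.address
FROM tbla a
JOIN tblb b ON b.fk_a_id = a.id;

-- ...then run any number of reporting queries against it without re-joining
SELECT name, COUNT(*) AS cnt FROM tmp_report GROUP BY name;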
Further to my previous answer, another approach we have taken in some situations is to store key reporting data in separate summary tables. There are certain reporting queries which are just going to be slow even after denormalising and optimising, and we found that creating a table and storing running totals or summary information throughout the month, as it came in, made the end-of-month reporting much quicker as well.
We found this approach easy to implement as it didn't break anything that was already working - it's just additional database inserts at certain points.
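A rough sketch of such a summary table (all names and columns here are invented for illustration):

CREATE TABLE order_summary (
    summary_date DATE NOT NULL,
    category_id  INT  NOT NULL,
    order_count  INT  NOT NULL DEFAULT 0,
    total_value  DECIMAL(12,2) NOT NULL DEFAULT 0,
    PRIMARY KEY (summary_date, category_id)
);

-- run at the "certain points", e.g. whenever an order is recorded
INSERT INTO order_summary (summary_date, category_id, order_count, total_value)
VALUES (CURDATE(), @category_id, 1, @order_value)
ON DUPLICATE KEY UPDATE
    order_count = order_count + 1,
    total_value = total_value + @order_value;

End-of-month reporting then reads these pre-aggregated rows instead of scanning the raw order tables.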
I've been toying with composite indexes and have seen some real benefits... maybe I'll set up some tests to see if that can save me here... at least for a little longer.