I am sending queries to a very large database (meaning many entities/tables).
So I have some queries which include some 7 to 8 joins.
The problem is that I do not know how many rows the tables will have in the near future. It could be between 1,000 and 100,000 rows per table (or even more).
I am thinking about splitting my queries into two or three consecutive queries instead of one mega-query.
Is there a common/recommended limit on the number of JOINs in a MySQL query?
How can I measure/calculate which type of splitting would be a good variant (depending on the row counts of the tables, and so on)?
I have many JOINs on the same field (foreign key) of the same table. Is there a way to optimize that as well? (One row in that table has many relations/connections.)
thanks ;)
UPDATE:
I saw it too late. Somebody was so nice and changed the title of the question.
Because of my bad English I wrote "performant", meaning having good performance. I didn't mean "to perform"!
Please consider this in your answers. Thanks again!
You probably want to learn about EXPLAIN, which will show you what MySQL's plan is for executing your query, e.g.
EXPLAIN SELECT foo FROM bar NATURAL JOIN baz
will tell you how MySQL would execute the query SELECT foo FROM bar NATURAL JOIN baz
From the EXPLAIN results you may see opportunities to add indexes to the database that will help your queries if they're slow, and in some cases you may be able to add hints to the query, e.g. telling MySQL to prefer one index over another, if you have the expertise to know that.
In general you will gain nothing from trying to "split up" a query unless your "splitting up" actually completely changes the semantics of what will need to be executed. e.g. if your query is fetching six unrelated things from the database, and you re-write this as six separate queries each fetching one thing, the aggregate time taken to execute will probably be no better (and may be much worse) for your separate queries.
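One way to check the "splitting up" claim for your own schema is to time both variants on synthetic data. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the customers/orders tables and their sizes are made up. Absolute timings will differ on a real MySQL server, so treat it as a measuring pattern rather than a benchmark.

```python
import sqlite3
import time

# SQLite (in memory) stands in for MySQL; table names and sizes are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"customer {i}") for i in range(1, 1001)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 1000 + 1, float(i)) for i in range(1, 10001)])


def one_join():
    """Let the database do the join in a single statement."""
    return conn.execute("""
        SELECT c.name, o.total
        FROM customers c JOIN orders o ON o.customer_id = c.id
        ORDER BY o.id
    """).fetchall()


def split_queries():
    """Fetch the orders first, then look up each customer separately."""
    rows = []
    orders = conn.execute(
        "SELECT customer_id, total FROM orders ORDER BY id").fetchall()
    for customer_id, total in orders:
        name = conn.execute("SELECT name FROM customers WHERE id = ?",
                            (customer_id,)).fetchone()[0]
        rows.append((name, total))
    return rows


def timed(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


joined, t_join = timed(one_join)
split, t_split = timed(split_queries)
print(f"one join: {t_join:.4f}s, split: {t_split:.4f}s, same rows: {joined == split}")
```

Both variants return the same rows; only the number of round trips changes, which is usually what makes the split version slower against a networked server.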
Use 'desc (query);' to get a sense of how MySQL will treat your query (in MySQL, DESC is a synonym for EXPLAIN). You are generally better off having MySQL do the joining and optimizing than doing it yourself. That's what it's good at.
This will also tell you where indexing is working or needs to be augmented.
Related
My thinking is that if I put the ANDs that filter out a greater number of rows before those that filter out just a few, my query should run quicker, since the selection set passed between AND conditions is much smaller.
But does the order of the ANDs in the WHERE clause of an SQL statement really affect the performance that much, or are the engines already optimized for this?
It really depends on the optimiser.
It shouldn't matter because it's the optimiser's job to figure out the optimal way to run your query regardless of how you describe it.
In practice, no optimiser is perfect so you might find that re-ordering the clauses does make a difference to particular queries. The only way to know for sure is to test it yourself with your own schema, data etc.
Most SQL engines are optimized to do this work for you. However, I have found situations in which trying to carve down the largest table first can make a big difference, and it doesn't hurt!
A lot depends how the indices are set up. If an index exists which combines the two keys, the optimizer should be able to answer the query with a single index search. Otherwise if independent indices exist for both keys, the optimizer may get a list of the records satisfying each key and merge the lists. If an index exists for one condition but not the other, the optimizer should filter using the indexed list first. In any of those scenarios, it shouldn't matter what order the conditions are listed.
If none of the conditions apply, the order the conditions are specified may affect the order of evaluation, but since the database is going to have to fetch every single record to satisfy the query, the time spent fetching will likely dwarf the time spent evaluating the conditions.
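To test this yourself, you can compare the plans for both orderings of the same conditions. The following is a rough sketch using Python's sqlite3 module as a stand-in (MySQL's EXPLAIN output looks different, and the orders table with its composite index is a made-up example). With an index covering both keys, the planner should report the same plan regardless of how the ANDs are written.

```python
import sqlite3

# SQLite stands in for MySQL; schema and index are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         status TEXT, total REAL);
    CREATE INDEX idx_customer_status ON orders (customer_id, status);
""")
conn.executemany(
    "INSERT INTO orders (customer_id, status, total) VALUES (?, ?, ?)",
    [(i % 100, "open" if i % 7 == 0 else "closed", float(i))
     for i in range(5000)])


def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Same two conditions, written in opposite order.
a_first = "SELECT total FROM orders WHERE customer_id = 42 AND status = 'open'"
b_first = "SELECT total FROM orders WHERE status = 'open' AND customer_id = 42"

plan_a, plan_b = plan(a_first), plan(b_first)
rows_a = sorted(conn.execute(a_first).fetchall())
rows_b = sorted(conn.execute(b_first).fetchall())

print(plan_a)
print(plan_b)
print("same plan:", plan_a == plan_b, "same rows:", rows_a == rows_b)
```

If the two plans ever differ for your real queries, that is the case where re-ordering (or a better index) is worth testing.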
I have a MySQL question:
Can anybody tell me a way to measure whether an IN() clause is becoming nonperformant or not?
So far I have a table which holds about 5,000 rows, and the IN() will check up to 100 IDs. The table may grow to 50,000 rows in the next two years.
Thanks
NOTE
By nonperformant I mean ineffective, slow, badly performing, ...
UPDATE
It's a decision-making problem, so the EXPLAIN command in MySQL does not answer my question. When the performance is bad, I can see it myself. But I want to know before I start designing in a way that might be wrong...
UPDATE
I am searching for a measuring technique for general purpose.
You would use the EXPLAIN statement to check how the query is being executed. It displays information from the optimizer about the query execution plan, how it would process the statement, and how tables are joined and in which order.
There are many times that a JOIN can be used in place of an IN, which should yield better performance. Additionally, indices make a significant difference on how fast the query runs.
We would need to see your query and an EXPLAIN at the very least.
You can use the MySQL EXPLAIN statement to get the query plan. Just put EXPLAIN in front of your SELECT and see what it says. You will need to learn how to read it, but it is very helpful in identifying whether a query is as fast as you would expect.
MySQL also does not have the best query optimizer. In my experience it is sometimes faster to run 100 simple and fast queries than to run a complicated join. This is a rare case, but I have gotten performance increases from it.
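For a general-purpose measuring technique that works before the design is fixed, one option is to load synthetic data at today's size and at the projected size, then time the query at each. This is only a sketch under assumptions: it uses Python's sqlite3 module instead of MySQL, and the items table, the helper names and the repeat count are all invented for the example. The same pattern applies with a MySQL driver and your real schema.

```python
import sqlite3
import time


def build(n_rows):
    """Create an in-memory table with n_rows synthetic rows (made-up schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     ((i, f"row {i}") for i in range(1, n_rows + 1)))
    return conn


def time_in_query(conn, ids, repeats=50):
    """Run the IN() query several times and keep the best wall-clock time."""
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, payload FROM items WHERE id IN ({placeholders})"
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql, ids).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best


results = {}
for n_rows in (5_000, 50_000):
    conn = build(n_rows)
    # 100 IDs spread evenly over the table, as in the question.
    ids = list(range(1, n_rows + 1, n_rows // 100))[:100]
    rows, seconds = time_in_query(conn, ids)
    results[n_rows] = (len(rows), seconds)
    print(f"{n_rows} rows: {len(rows)} matches, best of 50 runs {seconds:.6f}s")
    conn.close()
```

Taking the best of several runs reduces noise from caching and other processes; comparing the two sizes shows how the query scales rather than how fast one machine is.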
I'm new to MySQL.
I have a books table and I want users to be able to search through the books and find a specific book.
the books table columns:
======================
publisher
writer
name
price
publishing date
etc.
How should I query this table to find a specific book with good performance? What I am doing now is:
SELECT name,writer,publisher,price
FROM books
WHERE publisher='publisher'
AND writer='writer'
AND name='name'
AND price<='price'
AND publishingdate>='publishingdate'
etc.
But there are too many ANDs and I think this will kill the server. Is there a better method to search a table?
thanks
You should add indexes to your table to improve performance.
You could try indexing each column individually if you want to support a wide range of user-specified searches on different columns.
If, after adding these indexes, you still find that a specific query runs slowly, you may want to add a multi-column index that MySQL can use to improve the performance of that query. You can use EXPLAIN to see which indexes a specific query uses.
You should also be aware that adding indexes can improve performance of reads, but it will also decrease the performance of writes to the table. If writing speed is also important, you should be careful to not add many unused indexes.
Related
CREATE INDEX Syntax
Understanding the Query Execution Plan
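As a rough illustration of checking whether an index is actually used, the sketch below builds a small books table and compares the query plan before and after adding an index. It runs on Python's sqlite3 module rather than MySQL, so the plan text is SQLite's (EXPLAIN QUERY PLAN), and the sample data and the index name idx_books_writer are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL; column names follow the question, data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE books (
    name TEXT, writer TEXT, publisher TEXT, price REAL, publishingdate TEXT)""")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?, ?)",
    [(f"book {i}", f"writer {i % 50}", f"publisher {i % 10}",
      5.0 + i % 40, f"20{i % 20:02d}-01-01") for i in range(2000)])

query = """SELECT name, writer, publisher, price FROM books
           WHERE publisher = ? AND writer = ?
             AND price <= ? AND publishingdate >= ?"""
params = ("publisher 3", "writer 13", 30.0, "2005-01-01")


def plan():
    """Return SQLite's plan for the search query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, params)
    return " | ".join(row[3] for row in rows)


before = plan()  # no index yet: the whole table has to be scanned
conn.execute("CREATE INDEX idx_books_writer ON books (writer)")
after = plan()   # the planner can now narrow the rows down via the index

print("before:", before)
print("after: ", after)
```

In MySQL the equivalent check is to run EXPLAIN on the SELECT and look at the key column before and after CREATE INDEX.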
It's perfectly OK to have multiple ANDs like this for the conditions you require.
I wouldn't consider fewer than a dozen to be excessive. I've seen queries with 30 ANDs that didn't have performance issues.
It will not 'kill the server' at all. However this is a good opportunity to try queries, see how long they run and then try to improve them with indexes, other tables, etc.
If performance is an issue due to a large number of records you can add indexes to the various fields. If you have under 10,000 records I wouldn't bother.
This is fine, it won't kill anything. Try not to optimize prematurely; make sure it works first.
Once you have a significant number of books rows and actually see performance problems, then is the time to revisit your schema. In most cases you will just need a few indexes to speed up lookups. Tools like the EXPLAIN query help in diagnosing what the database is doing when executing your query.
So in short, don't worry about it.
Recently I was asked to develop an app, which basically is going to use 1 main single table in the whole database for the operations.
It has to have around 20 columns with various types - decimals, int, varchar, date, float. At some point the table will have thousands of rows (3-5k).
The app must have the ability to SELECT records by combining each of the columns criteria - e.g. BETWEEN dates, greater than something, smaller than something, equal to something etc. Basically combining a lot of where clauses in order to see the desired result.
So my question is, since I know how to combine the wheres and make the app, what is the best approach? I mean is MySQL good enough not to slow down when I have 3k records and make a SELECT query with 15 WHERE clauses? I've never worked with a database larger than 1k records, so I'm not sure if I should use MySQL for this. Also I'm going to use PHP as a server language if that matters at all.
You are talking about conditions in ONE WHERE clause.
3,000 rows is very small for a relational database. These typically go far larger (3 million+ or even much more).
I am concerned that you have 20 columns in one table. This sounds like a normalization problem.
With a well-defined structure for your database, including appropriate indexes, 3k records is nothing, even with 15 conditions. Even without indexes, it is doubtful that with so few records, you will see any performance hit.
I would however plan for the future and perhaps look at your queries and see if there is any table optimisation you can do at this stage, to save pain in the future. Who knows, 3k records today, 30m next year.
3,000 records in a database is nothing. You won't have any performance issues even with your 15 WHERE conditions.
MySQL and PHP will do the job just fine.
I'd be more concerned about your large number of columns. Maybe you should take a look at this article to make sure you respect the database normal forms.
Good luck for your project.
I don't think querying a single table of 3-5K rows is going to be particularly intensive. MySQL should be able to cope with something like this easily enough. You could add lots of indexes to speed up your selects if this is the "choke point", but this will slow down inserts, edits, etc. Also, if you are querying lots of different rows, this probably isn't a good idea.
Since the number of rows is very small, I guess it should not cause any performance issue. Still, you can look at using the OR operator carefully, and at indexes on the columns in the WHERE clause.
Indices, indices, indices!
If you need to check a lot of different columns, try to flatten your logic. In any case, make sure you have set an appropriate index on the checked columns: not an index per column, but one index over all the columns that are used regularly.
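For the "combine many optional conditions" part of the question, here is a minimal sketch of building one WHERE clause from whichever filters the user filled in, with bound parameters. It is written in Python with sqlite3 purely for illustration (the question mentions PHP and MySQL), and the records table, the filter names and the ALLOWED_FILTERS whitelist are all hypothetical.

```python
import sqlite3

# Hypothetical whitelist: filter name -> SQL condition with one placeholder.
ALLOWED_FILTERS = {
    "min_price": "price >= ?",
    "max_price": "price <= ?",
    "category": "category = ?",
    "created_from": "created >= ?",
    "created_to": "created <= ?",
}


def build_query(filters):
    """Join the conditions for all filters that were actually supplied."""
    clauses, params = [], []
    for key, value in filters.items():
        if value is None:
            continue  # filter left empty by the user
        clauses.append(ALLOWED_FILTERS[key])  # KeyError for unknown filters
        params.append(value)
    sql = "SELECT id FROM records"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE records (
    id INTEGER PRIMARY KEY, price REAL, category TEXT, created TEXT)""")
data = [(i, float(i % 100), "abc"[i % 3], f"2024-{i % 12 + 1:02d}-15")
        for i in range(1, 5001)]
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", data)

filters = {"min_price": 10, "max_price": 20, "category": "b",
           "created_from": "2024-06-01", "created_to": None}
sql, params = build_query(filters)
ids = [row[0] for row in conn.execute(sql, params)]

# Cross-check against the same filter applied in plain Python.
expected = [i for i, price, cat, created in data
            if 10 <= price <= 20 and cat == "b" and created >= "2024-06-01"]
print(sql)
print(len(ids), "matching rows; matches Python filter:", ids == expected)
```

Only whitelisted SQL fragments are concatenated and every user value is passed as a bound parameter, so the number of conditions can grow without opening the query to injection.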
I have a large database of normalized order data that is becoming very slow to query for reporting. Many of the queries that I use in reports join five or six tables and are having to examine tens or hundreds of thousands of lines.
There are lots of queries and most have been optimized as much as possible to reduce server load and increase speed. I think it's time to start keeping a copy of the data in a denormalized format.
Any ideas on an approach? Should I start with a couple of my worst queries and go from there?
I know more about MSSQL than MySQL, but I don't think the number of joins or number of rows you are talking about should cause you too many problems with the correct indexes in place. Have you analyzed the query plan to see if you are missing any?
http://dev.mysql.com/doc/refman/5.0/en/explain.html
That being said, once you are satisfied with your indexes and have exhausted all other avenues, de-normalization might be the right answer. If you just have one or two queries that are problems, a manual approach is probably appropriate, whereas some sort of data warehousing tool might be better for creating a platform to develop data cubes.
Here's a site I found that touches on the subject:
http://www.meansandends.com/mysql-data-warehouse/?link_body%2Fbody=%7Bincl%3AAggregation%7D
Here's a simple technique that you can use to keep denormalizing queries simple, if you're just doing a few at a time (and I'm not suggesting replacing your OLTP tables, just creating a new one for reporting purposes). Let's say you have this query in your application:
select a.name, b.address from tbla a
join tblb b on b.fk_a_id = a.id where a.id=1
You could create a denormalized table and populate with almost the same query:
create table tbl_ab (a_id, a_name, b_address);
-- (types elided)
Notice the underscores match the table aliases you use
insert into tbl_ab select a.id, a.name, b.address from tbla a
join tblb b on b.fk_a_id = a.id;
-- no where clause because you want everything
Then to fix your app to use the new denormalized table, switch the dots for underscores.
select a_name as name, b_address as address
from tbl_ab where a_id = 1;
For huge queries this can save a lot of time and makes it clear where the data came from, and you can re-use the queries you already have.
Remember, I'm only advocating this as the last resort. I bet there's a few indexes that would help you. And when you de-normalize, don't forget to account for the extra space on your disks, and figure out when you will run the query to populate the new tables. This should probably be at night, or whenever activity is low. And the data in that table, of course, will never exactly be up to date.
[Yet another edit] Don't forget that the new tables you create need to be indexed too! The good part is that you can index to your heart's content and not worry about update lock contention, since aside from your bulk insert the table will only see selects.
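A runnable version of the technique above, as a sketch: it uses Python's sqlite3 module in place of MySQL, the column types and sample rows are assumptions (the original elides the types), and the index name idx_tbl_ab_a_id is made up. It checks that the rewritten query against tbl_ab returns the same rows as the original join.

```python
import sqlite3

# SQLite stands in for MySQL; types, sample rows and index name are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbla (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblb (id INTEGER PRIMARY KEY, fk_a_id INTEGER, address TEXT);
    INSERT INTO tbla VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO tblb VALUES (10, 1, '1 Main St'),
                            (11, 1, '2 Side St'),
                            (12, 2, '3 High St');

    -- Reporting copy: one column per "alias.column" pair, dots become underscores.
    CREATE TABLE tbl_ab (a_id INTEGER, a_name TEXT, b_address TEXT);
    INSERT INTO tbl_ab
        SELECT a.id, a.name, b.address
        FROM tbla a JOIN tblb b ON b.fk_a_id = a.id;

    -- The denormalized table needs its own index for the lookups.
    CREATE INDEX idx_tbl_ab_a_id ON tbl_ab (a_id);
""")

normalized = sorted(conn.execute("""
    SELECT a.name, b.address FROM tbla a
    JOIN tblb b ON b.fk_a_id = a.id WHERE a.id = 1
""").fetchall())

denormalized = sorted(conn.execute("""
    SELECT a_name AS name, b_address AS address
    FROM tbl_ab WHERE a_id = 1
""").fetchall())

print(normalized)
print(denormalized)
print("same result:", normalized == denormalized)
```

In production the INSERT ... SELECT would be re-run on a schedule (after emptying the table), which is where the staleness mentioned above comes from.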
MySQL 5 does support views, which may be helpful in this scenario. It sounds like you've already done a lot of optimizing, but if not you can use MySQL's EXPLAIN syntax to see what indexes are actually being used and what is slowing down your queries.
As far as going about denormalizing data (whether you're using views or just duplicating data in a more efficient manner), I think starting with the slowest queries and working your way through is a good approach to take.
I know this is a bit tangential, but have you tried seeing if there are more indexes you can add?
I don't have a lot of DB background, but I am working with databases a lot recently, and I've been finding that a lot of the queries can be improved just by adding indexes.
We are using DB2, and there are two commands called db2expln and db2advis: the first will indicate whether table scans or index scans are being used, and the second will recommend indexes you can add to improve performance. I'm sure MySQL has similar tools...
Anyway, if this is something you haven't considered yet, it has been helping me a lot... but if you've already gone this route, then I guess it's not what you are looking for.
Another possibility is a "materialized view" (or as they call it in DB2), which lets you specify a table that is essentially built of parts from multiple tables. Thus, rather than normalizing the actual columns, you could provide this view to access the data... but I don't know if this has severe performance impacts on inserts/updates/deletes (but if it is "materialized", then it should help with selects since the values are physically stored separately).
In line with some of the other comments, I would definitely have a look at your indexing.
One thing I discovered earlier this year on our MySQL databases was the power of composite indexes. For example, if you are reporting on order numbers over date ranges, a composite index on the order number and order date columns could help. I believe MySQL generally uses only one index per table in a query (index merge aside), so if you just had separate indexes on the order number and order date, it would usually have to pick one of them. Using the EXPLAIN command can help determine this.
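To see the "one index or the other" behaviour for yourself, you can compare plans with two separate indexes and then with a composite one. The sketch below is an assumption-laden illustration: it runs on Python's sqlite3 module, whose planner is not MySQL's, and the orders table, data and index names are invented. It prints the plans rather than promising a particular choice.

```python
import sqlite3

# SQLite stands in for MySQL; schema, data and index names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_number INTEGER,
                         order_date TEXT, total REAL);
    CREATE INDEX idx_number ON orders (order_number);
    CREATE INDEX idx_date ON orders (order_date);
""")
conn.executemany(
    "INSERT INTO orders (order_number, order_date, total) VALUES (?, ?, ?)",
    [(i % 500, f"2024-{i % 12 + 1:02d}-{i % 28 + 1:02d}", float(i))
     for i in range(20000)])

query = """SELECT total FROM orders
           WHERE order_number = ? AND order_date BETWEEN ? AND ?"""
params = (42, "2024-03-01", "2024-09-30")


def plan():
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]


# With two single-column indexes the planner settles on one of them.
separate = plan()

# A composite index can serve the equality and the date range together.
conn.execute("CREATE INDEX idx_number_date ON orders (order_number, order_date)")
composite = plan()

print("separate indexes: ", separate)
print("composite present:", composite)
```

On MySQL the same comparison is done with EXPLAIN, checking the possible_keys and key columns before and after adding the composite index.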
To give an indication of the performance with good indexes (including numerous composite indexes), I can run queries joining 3 tables in our database and get almost instant results in most cases. For more complex reporting most of the queries run in under 10 seconds. These 3 tables have 33 million, 110 million and 140 million rows respectively. Note that we had also already normalised these slightly to speed up our most common query on the database.
More information regarding your tables and the types of reporting queries may allow further suggestions.
For MySQL I like this talk: Real World Web: Performance & Scalability, MySQL Edition. This contains a lot of different pieces of advice for getting more speed out of MySQL.
You might also want to consider selecting into a temporary table and then performing queries on that temporary table. This would avoid the need to rejoin your tables for every single query you issue (assuming that you can use the temporary table for numerous queries, of course). This basically gives you denormalized data, but if you are only doing select calls, there's no concern about data consistency.
Further to my previous answer, another approach we have taken in some situations is to store key reporting data in separate summary tables. There are certain reporting queries which are just going to be slow even after denormalising and optimisations and we found that creating a table and storing running totals or summary information throughout the month as it came in made the end of month reporting much quicker as well.
We found this approach easy to implement as it didn't break anything that was already working - it's just additional database inserts at certain points.
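A minimal sketch of the summary-table idea, assuming a hypothetical orders table and a monthly_totals table (both names invented), and again using Python's sqlite3 module instead of MySQL. Each order insert also updates the running totals in the same transaction, so the month-end report reads one small table instead of aggregating the detail rows.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, total REAL);
    CREATE TABLE monthly_totals (month TEXT PRIMARY KEY,
                                 order_count INTEGER, revenue REAL);
""")


def record_order(order_date, total):
    """Insert the order and bump the month's running totals atomically."""
    month = order_date[:7]  # 'YYYY-MM'
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO orders (order_date, total) VALUES (?, ?)",
                     (order_date, total))
        updated = conn.execute(
            """UPDATE monthly_totals
               SET order_count = order_count + 1, revenue = revenue + ?
               WHERE month = ?""", (total, month)).rowcount
        if updated == 0:  # first order of this month
            conn.execute("INSERT INTO monthly_totals VALUES (?, 1, ?)",
                         (month, total))


for order_date, total in [("2024-01-05", 10.0), ("2024-01-20", 20.5),
                          ("2024-02-03", 5.25), ("2024-02-14", 4.75),
                          ("2024-02-28", 100.0)]:
    record_order(order_date, total)

# Fast path: read the pre-computed summary.
from_summary = conn.execute(
    "SELECT month, order_count, revenue FROM monthly_totals ORDER BY month"
).fetchall()

# Slow path: aggregate the detail rows, used here only to cross-check.
from_detail = conn.execute(
    """SELECT substr(order_date, 1, 7) AS month, COUNT(*), SUM(total)
       FROM orders GROUP BY month ORDER BY month"""
).fetchall()

print(from_summary)
print("matches detail aggregation:", from_summary == from_detail)
```

Because the summary is maintained by extra inserts/updates only, nothing that already reads the orders table has to change.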
I've been toying with composite indexes and have seen some real benefits... maybe I'll set up some tests to see if that can save me here, at least for a little longer.