I am running a very simple select query on some tables of information_schema, but it always takes too much time.
For example:
select * from REFERENTIAL_CONSTRAINTS limit 3;
It takes around 34 seconds.
This query is very simple: no table scan needed, I think, no conditions, etc. So why does it take so much time?
Some other tables in information_schema also take a lot of time.
Thanks
The information_schema tables are not really tables. They are a mechanism that exposes server internals via the SQL interface. The responses to these queries are not from data that is "stored in a table" in any sense that you might expect -- it's "collected" each time the query is run.
The level of communication between the SQL layer and the lower layers that collect the data does not always support the optimizations you might expect; for example, the LIMIT here is most likely not being pushed down -- the entire table is rendered internally and then all but the first three rows are discarded... so this query is probably just as slow with the limit as without it.
Two general rules of thumb with information_schema -- which really apply to all of SQL, but particularly here -- are to select only the columns you need (not *, which can force the server to do more work than necessary if you do not actually need every column) and to specify a WHERE clause; both may reduce the amount of internal work being done.
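For example, rather than the SELECT * above, a narrower query might look like this (a minimal sketch; 'my_database' is a placeholder schema name, while the columns shown are real columns of REFERENTIAL_CONSTRAINTS):

SELECT CONSTRAINT_NAME, TABLE_NAME, REFERENCED_TABLE_NAME
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE CONSTRAINT_SCHEMA = 'my_database';

Restricting CONSTRAINT_SCHEMA in particular can matter, because it may let the server skip collecting metadata for databases you don't care about.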
Another potential performance killer is heavy-handed tweaking ("tuning") of server variables. Most variables on most systems need to be left alone more often than they are. Some of them, like table_open_cache, can even cause the server to perform worse the more "optimally" you tune them.
Hi, just a simple question.
I need to store some data in a database, and there are two options as I see it now.
Data: a,b,c,d
1. Store a,b,c,d in one column; when needed, query it and split the value in the application.
2. Store a,b,c,d in four different columns, which can be queried directly from the database.
Which option would be better? My concern is that splitting it into four different columns will make the table contain many columns -- does that slow down performance? I am also curious: is it possible that the query is fast but the transfer of data to my application is slow?
MySQL performance is a complicated subject. To the issue you raised:
My concern is that splitting it into four different columns will make the table contain many columns -- does that slow down performance?
there is nothing inherently worse, from a performance perspective, about having 4 columns, or 10, or 20, or 50.
Now, that being said, there are things that could impact performance, and probably will if you don't know about them. For example, if you SELECT * FROM {my_table} when really you only need to SELECT a FROM {my_table}... yeah, that'll impact your performance (although there are arguments to be made in favor of SELECT * FROM {my_table} depending on your caching strategy).
Likewise, you'll want to consider LIMIT clauses. To your other question:
is it possible that the query is fast but the transfer of data to my application is slow?
Yes, of course. If you only need 50 rows and your table has 50,000, you're gonna want to add LIMIT clauses to your SQL statements, or you'll be sending a lot more data over the wire than you need to. Memory is faster than disk, and disk is faster than network. If you're sending a lot of data over the wire that you don't need, you better believe it's gonna cause performance problems. But again, keep in mind that this has nothing to do with how many columns you have. There is absolutely nothing inherent in the number of columns a table has that affects performance (at least not at the scale you're talking about, and not in the way you're thinking about it).
All of which is to say: performance is a complex topic, and worth looking into if you're interested. It sounds like a, b, c, and d are logically different columns, so you should probably go ahead and store them in different columns in MySQL. Hope this helps.
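A minimal sketch of option 2 (the table and column names here are placeholders, not anything from your application):

CREATE TABLE my_table (
  id INT AUTO_INCREMENT PRIMARY KEY,
  a  VARCHAR(50),
  b  VARCHAR(50),
  c  VARCHAR(50),
  d  VARCHAR(50)
);

-- each field can now be queried directly, no application-side splitting:
SELECT a FROM my_table WHERE id = 1;

With option 1 (one delimited column), the database can't index or filter on b, c, or d individually, which is usually a bigger long-term cost than having a few extra columns.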
What is the optimal solution: use an INNER JOIN or multiple queries?
something like this:
SELECT * FROM brands
INNER JOIN cars
ON brands.id = cars.brand_id
or like this:
SELECT * FROM brands
-- then, looping over each returned row:
SELECT * FROM cars WHERE brand_id = [row(brands.id)]
Generally speaking, one query is better, but there are some caveats. For example, older versions of SQL Server showed a big decrease in performance if you did more than seven joins. The answer will really depend on the database engine, version, query, schema, fields, etc., so we can't say for sure which is better. Always look into minimizing the number of queries when possible, without going overboard and creating result sets that are unwieldy or impossible to maintain.
This is a very subjective question but remember that each time you call the database there's significant overhead.
Almost without exception, the optimum is to issue as few commands as possible and pull out all the data you'll need. However, for practical reasons this clearly may not always be possible.
Generally speaking, if a database is well maintained, one query is quicker than two. If it's not, you need to look at your data/indices and determine why.
A final point: you're hinting in your second example that you'd load the brands and then issue a command to get all the cars for each brand. This is without a doubt your worst option, as it doesn't issue 2 commands - it issues N+1, where N is the number of brands you have... 100 brands is 101 DB hits!
Your two queries are not exactly the same.
The first returns all fields from brands and cars together in each row. The second returns two separate result sets that need to be combined in the application.
In general, it is better to do as many operations in the database as possible. The database is more efficient for processing large amounts of data. And, it generally reduces the amount of data being brought back to the client.
That said, there are a few circumstances where a single query returns more data than multiple queries would. For instance, in your example, if you have one brand record with 100 columns and 10,000 car records with three columns, then the two-query method is probably faster: you bring back the columns from the brands table only once, rather than 10,000 times.
Such cases where multiple queries are better are few and far between. In general, it is better to do the processing in the database. If performance needs to be improved, then in a few rare cases you might be able to break up queries and improve it.
In general, use the first query. Why? Because query execution time is not just the time of the query itself; there are also overheads, such as:
Creating connection overhead
Network data sending overhead
Closing (handling) connection overhead
Depending on the situation, some overheads may or may not be present. For example, if you're using a persistent connection, you won't pay the connection overhead; but in the common case that's not true, so it will be there. And the creating/maintaining/closing connection overhead is a very significant part. Imagine that this overhead is only 1% of the total query time (in a real situation it will be much more), and that you have, let's say, 1,000,000 car rows with about 100 cars per brand. Then the first query pays that overhead only once, while the second pays it 1,000,000/100 = 10,000 times - once per brand. Just think about how slow that will be.
Besides, the INNER JOIN will also be done using a key, if one exists, so in terms of raw query speed it will be nearly the same as the second option. So I highly recommend the INNER JOIN option.
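For that key to exist on the join column in the example above, you would need an index like this (a sketch using the asker's table names; skip it if brand_id is already indexed or covered by a foreign key):

CREATE INDEX idx_cars_brand_id ON cars (brand_id);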
Breaking a complex query into simple queries can be useful in some very specific cases. One example is the IN subquery. If you use WHERE id IN (subquery), where (subquery) is some SQL, MySQL treats it as an = ANY subquery and will not use a key for it, even if the subquery results in a narrow list of ids. So - yes, splitting it into two queries can make sense, because WHERE id IN (static list) works differently: MySQL will use a range index scan for it (strange, but true - for an IN (static list) statement, IN is treated as a comparison operator, not as an =ANY subquery qualifier). This part isn't directly about your case, but it shows that cases where splitting work away from the DBMS improves performance do exist.
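A sketch of that rewrite (the table names are hypothetical, and the behavior described applies to older MySQL versions that did not optimize IN subqueries):

-- may be executed as a dependent =ANY subquery, scanning orders:
SELECT * FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE country = 'US');

-- split version: fetch the ids first...
SELECT id FROM customers WHERE country = 'US';   -- suppose it returns 3, 17, 42

-- ...then use the static list, which can use a range scan on the index:
SELECT * FROM orders WHERE customer_id IN (3, 17, 42);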
One query is better, because up to about 90% of the expense of executing a query is in the overheads:
communication traffic to/from database
syntax checking
authority checking
access plan calculation by optimizer
logging
locking (even read-only requires a lock)
lots of other stuff too
Do all that just once for one query, or do it all n times for n queries, but get the same data.
Possible Duplicate:
Can select * usage ever be justified?
Curious to hear this from folks with more DBA insight, but what performance implications does an application face when you run a query like:
select * from some_large_table;
You have to do a full table scan since no index is being hit, and if we're talking O notation, I believe we're speaking O(N) here, where N is the size of the table. Is this typically considered suboptimal behavior? What if you really do need everything from the table at certain times? Yes, we have tools such as pagination, etc., but I'm talking strictly from a database perspective here. Is this type of behavior normally frowned upon?
What happens if you don't specify columns is that the DB engine has to query the master table data for the column list. That query is really fast, but it causes a minor performance hit. As long as you're not doing a sloppy SELECT * with a JOIN statement or nested queries, you should be fine. Still, note the small performance impact of making the DB engine do a query to find the columns.
MySQL server opens a server-side cursor to read the table. The client of the query may read none or all of the records, and performance for the client will depend only on the number of records it actually fetched. The query can even be faster on the server side than a query with conditions, since conditions involve some index reading as well. Only if the client fetches all records does it become equivalent to a full table scan.
Selecting more columns than you need (select *) is always bad. Don't do more than you have to.
If you're selecting from the whole table, it doesn't matter if you have an index.
Another issue you're going to run into is how you want to lock the table. If this is a busy application, you might not want to turn off locking entirely, because of the inconsistent data that might be returned; but if you lock too tightly, it could slow the query further. O(n) is considered acceptable in any computer science application. However, in databases we measure in time and in the number of reads/writes. This is a huge number of reads and will probably take a long time to execute, so it is unacceptable.
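If you do need everything from a large table, one common alternative to a single SELECT * is to read it in keyset-paginated batches (a sketch; some_large_table and its columns are placeholders, and it assumes an indexed id primary key):

-- instead of: SELECT * FROM some_large_table;
SELECT id, col_a, col_b
FROM some_large_table
WHERE id > 0            -- replace 0 with the last id seen in the previous batch
ORDER BY id
LIMIT 1000;

Each batch is then an index range scan, so the server never has to materialize the whole table for one request.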
We have a report that users can run that needs to select records from 5 different services. Right now, I am using UNION to combine all the tables in one query, but sometimes it was just too much for the server and it crashed!
I optimized bits and pieces of the query (WHERE clauses and table joins), and there haven't been any crashes since, but the report still takes a long time to load (i.e. the query is very slow).
The question is: will MySQL perform faster and more optimally if I create 5 temp tables for the different service types and then select from all of the temp tables? Or is there a different approach?
I could, of course, just use 5 separate selects and then combine them in the code (PHP). But I imagine this would make the report load even slower...
Any ideas?
Usually the limiting factor in speed is the database, not PHP. I'd suggest running separate queries, letting PHP do the combining, and seeing if that is faster. If you're not storing all the data in arrays or doing other heavy processing, I suspect the PHP way is much faster.
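One thing worth checking on the single-query side first: plain UNION performs a duplicate-elimination pass over the combined result, while UNION ALL skips it. A sketch of the two shapes (service_1/service_2 and the columns are placeholders for your five service tables):

-- one combined query; UNION ALL avoids the deduplication sort:
SELECT id, created_at, 'svc1' AS source FROM service_1
UNION ALL
SELECT id, created_at, 'svc2' AS source FROM service_2;

-- versus separate queries, combined in PHP afterwards:
SELECT id, created_at FROM service_1;
SELECT id, created_at FROM service_2;

If your report doesn't need duplicates removed across services, switching to UNION ALL alone can help noticeably.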
I once had a MySQL database table containing 25 million records, which made even a simple COUNT(*) query take minutes to execute. I ended up making partitions, separating the records into a couple of tables. What I'm asking is: is there any pattern or design technique for handling this kind of problem (a huge number of records)? Are MSSQL or Oracle better at handling lots of records?
P.S. The COUNT(*) problem stated above is just an example case; in reality the app does CRUD functionality and some aggregate queries (for reporting), but nothing really complicated. It's just that some of these queries take quite a while (minutes) to execute because of the table volume.
See Why MySQL could be slow with large tables and COUNT(*) vs COUNT(col)
Make sure you have an index on the column you're counting. If your server has plenty of RAM, consider increasing MySQL's buffer size. Make sure your disks are configured correctly -- DMA enabled, not sharing a drive or cable with the swap partition, etc.
What you're asking with "SELECT COUNT(*)" is not easy.
In MySQL, the MyISAM non-transactional engine optimises this by keeping a record count, so SELECT COUNT(*) will be very quick.
However, if you're using a transactional engine, SELECT COUNT(*) is basically saying:
Exactly how many records exist in this table in my transaction?
To do this, the engine needs to scan the entire table; it probably knows roughly how many records exist in the table already, but to get an exact answer for a particular transaction, it needs a scan. This isn't going to be fast with MySQL's InnoDB, and it's not going to be fast in Oracle or anything else. The whole table MUST be read (excluding things stored separately by the engine, such as BLOBs).
Having the whole table in ram will make it a bit faster, but it's still not going to be fast.
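If a rough figure is good enough, the engine's own estimate is available without a scan (a sketch; big_table and my_db are placeholders, and note that TABLE_ROWS is only an approximation for InnoDB):

SHOW TABLE STATUS LIKE 'big_table';
-- or:
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'my_db' AND TABLE_NAME = 'big_table';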
If your application relies on frequent, accurate counts, you may want to make a summary table which is updated by a trigger or some other means.
If your application relies on frequent, less accurate counts, you could maintain summary data with a scheduled task (which may impact performance of other operations less).
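A minimal sketch of the trigger-maintained summary table idea (all names are placeholders; you'd need one pair of triggers per counted table):

CREATE TABLE row_counts (
  table_name VARCHAR(64) PRIMARY KEY,
  row_count  BIGINT NOT NULL
);

-- seed the counter once from the real table:
INSERT INTO row_counts VALUES ('big_table', (SELECT COUNT(*) FROM big_table));

CREATE TRIGGER big_table_count_ins AFTER INSERT ON big_table
FOR EACH ROW
  UPDATE row_counts SET row_count = row_count + 1
  WHERE table_name = 'big_table';

CREATE TRIGGER big_table_count_del AFTER DELETE ON big_table
FOR EACH ROW
  UPDATE row_counts SET row_count = row_count - 1
  WHERE table_name = 'big_table';

-- the count is then a single-row primary key lookup:
SELECT row_count FROM row_counts WHERE table_name = 'big_table';

Note the trade-off: every insert and delete now also touches row_counts, which serializes concurrent writers on that one row.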
Many performance issues around large tables relate to indexing problems, or to a lack of indexing altogether. I'd definitely make sure you are familiar with indexing techniques and the specifics of the database you plan to use.
With regard to your slow count(*) on the huge table, I would assume you were using the InnoDB table type in MySQL. I have some tables with over 100 million records using MyISAM under MySQL, and count(*) is very quick.
With regard to MySQL in particular, there are even slight indexing differences between InnoDB and MyISAM tables, which are the two most commonly used table types. It's worth understanding the pros and cons of each and how to use them.
What kind of access to the data do you need? I've used HBase (based on Google's BigTable) loaded with a vast amount of data (~30 million rows) as the backend for an application which could return results within a matter of seconds. However, it's not really appropriate if you need "real time" access - i.e. to power a website. Its column-oriented nature is also a fairly radical change if you're used to row-oriented DBMS.
Is count(*) on the whole table actually something you do a lot?
InnoDB will have to do a full table scan to count the rows, which is obviously a major performance issue if counting all of them is something you actually want to do. But that doesn't mean that other operations on the table will be slow.
With the right indexes, MySQL will be very fast at retrieving data from tables much bigger than that. The problem with indexes is that they can hurt insert speeds, particularly for large tables as insert performance drops dramatically once the space required for the index reaches a certain threshold - presumably the size it will keep in memory. But if you only need modest insert speeds, MySQL should do everything you need.
Any other database will have similar tradeoffs between retrieve speed and insert speed; they may or may not be better for your application. But I would look first at getting the indexes right, and maybe rewriting your queries, before you try other databases. For what it's worth, we picked MySQL originally because we found it performed best.
Note that MyISAM tables in MySQL store the total row count of the table. MyISAM maintains this because it's useful to the optimiser in some cases, but a side effect is that count(*) on the whole table is really fast. That doesn't necessarily mean MyISAM is faster than InnoDB at anything else.
I answered a similar question in This Stackoverflow Posting in some detail, describing the merits of the architectures of both systems. To some extent it was done from a data warehousing point of view but many of the differences also matter on transactional systems.
However, 25 million rows is not a VLDB and if you are having performance problems you should look to indexing and tuning. You don't need to go to Oracle to support a 25 million row database - you've got about 3 orders of magnitude to go before you're truly in VLDB territory.
You are asking for a book's worth of answer, and I therefore propose you get a good book on databases. There are many.
To get you started, here are some database basics:
First, you need a great data model based not just on what data you need to store but on usage patterns. Good database performance starts with good schema design.
Second, place indices on columns based upon expected lookup AND update needs, as update performance is often overlooked.
Third, don't put functions in WHERE clauses if at all possible (see the sketch after this list).
Fourth, use an -ahem- RDBMS engine that is of quality design. I would respectfully submit that while it has improved greatly in the recent past, MySQL does not qualify. (Apologies to those who wish to argue it has finally made the grade in recent times.) There is no longer any need to choose between high price and quality; Postgres (aka PostgreSQL) is available open source and is truly fantastic - and has all the plug-ins available to meet your needs.
Finally, learn what you are asking a database engine to do - gain some insight into internals - so you can better judge what kinds of things are expensive and why.
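To illustrate the third point about functions in WHERE clauses (a sketch; orders and created_at are hypothetical names, with an index assumed on created_at):

-- wrapping the indexed column in a function defeats the index:
SELECT * FROM orders WHERE YEAR(created_at) = 2023;

-- the equivalent range predicate lets the index be used:
SELECT * FROM orders
WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01';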
I'm going to second @Mark Baker and say that you need to build indices on your tables.
For queries other than the one you showed, you should also be aware that constructs such as IN() are faster than a series of OR conditions in the query. There are lots of little steps you can take to speed up individual queries.
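For example (a sketch with a hypothetical users table):

-- a series of ORs:
SELECT * FROM users WHERE id = 3 OR id = 17 OR id = 42;

-- the equivalent IN(), which the optimizer generally handles better:
SELECT * FROM users WHERE id IN (3, 17, 42);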
Indexing is key to performance with this number of records, but how you write the queries can make a big difference as well. Specific performance-tuning methods vary by database, but in general: avoid returning more records or fields than you actually need, make sure all join fields are indexed (as well as common WHERE-clause fields), and avoid cursors (although I think this is less true in Oracle than in SQL Server; I don't know about MySQL).
Hardware can also be a bottleneck especially if you are running things besides the database server on the same machine.
Performance tuning is a very technical subject and can't really be answered well in a format like this. I suggest you get a performance-tuning book and read it. Here is a link to one for MySQL:
http://www.amazon.com/High-Performance-MySQL-Optimization-Replication/dp/0596101716