I have a table with the following structure:
ID, SourceID, EventId, Starttime, Stoptime
All of the ID columns are char(36) and the times are dates.
The problem is that querying the table is really slow. It has 7 million rows, and about 60-70 threads are writing to it (insert or update) all the time.
On the other side I have the GUI that needs to read from this table, and this is where it gets slow. If I want to select all the events where SourceID = something, it takes almost 300 seconds. SourceID has an index. When I run the same query with the EXPLAIN keyword in front, I get this:
select type = simple
type = ref
possible_keys = sourceidnevent,sourceid
key = sourceid
key_len = 109
ref = const
rows = 84148
And the query
SELECT * FROM tabel where sourceid='28B791C7-D519-4F0C-BC03-EFB1D4AC9CEB'
However, I started to think about what I really need from the table. I want to know which events occurred on which server, and also which events occurred on the servers, sorted by date. I have added an index for all combinations of columns used in WHERE and ORDER BY.
I need all the rows because I want to do some calculations on them: grouping, averages and so on. I'm doing that in the .NET environment instead of issuing many queries.
However, if I add a LIMIT to the select it goes faster. So is the bottleneck the amount of data that is transferred and not actually the finding/selecting part? If so, I can rebuild my application to do the calculation for only one day at a time, save the result into another table, and aggregate all of it later.
How can I speed up the process? Would it be better to switch to MongoDB? I currently use MySQL and InnoDB.
There's a lot of information you've not provided here - some of which I've mentioned in my comment elsewhere.
NoSQL is unlikely to be much faster than MySQL on a single node. I'd be very surprised if it were faster than using the handler API on MySQL along with appropriate indexes.
You've provided part of an explain plan (but not the query being explained) - but you haven't provided any interpretation of this:
rows = 84148
Does it really need to process that many rows to provide the result you need? If so and the result is not aggregated then maybe you need to think about why you need to ship 80k rows of data to the front end. If it's only having to return a few non-aggregated rows then you really need to analyse your indexes.
I have added index for all combination
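Having too many indexes is just as bad for performance as having too few: every index has to be maintained on each insert and update, and you have 60-70 threads writing all the time.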
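To make the "why ship 80k rows to the front end" point concrete, here is a minimal sketch of pushing the grouping and averaging into the query itself. It uses Python's sqlite3 module purely as a runnable stand-in for MySQL; the table name `events`, the sample rows and the index name are assumptions for illustration, not the asker's real schema.

```python
import sqlite3

# SQLite stands in for MySQL here; table, index and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id TEXT, sourceid TEXT, eventid TEXT, "
    "starttime TEXT, stoptime TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        ("1", "srv-a", "boot", "2024-01-01 10:00:00", "2024-01-01 10:00:30"),
        ("2", "srv-a", "boot", "2024-01-01 12:00:00", "2024-01-01 12:00:10"),
        ("3", "srv-a", "scan", "2024-01-02 09:00:00", "2024-01-02 09:01:00"),
        ("4", "srv-b", "boot", "2024-01-01 11:00:00", "2024-01-01 11:00:20"),
    ],
)
conn.execute("CREATE INDEX idx_source_event ON events (sourceid, eventid)")

# Let the database group and average, so only one row per event type
# crosses the wire instead of every matching row.
# (In MySQL the duration would typically be TIMESTAMPDIFF(SECOND, starttime, stoptime).)
summary = conn.execute(
    """
    SELECT eventid,
           COUNT(*) AS occurrences,
           AVG(strftime('%s', stoptime) - strftime('%s', starttime)) AS avg_seconds
    FROM events
    WHERE sourceid = ?
    GROUP BY eventid
    ORDER BY eventid
    """,
    ("srv-a",),
).fetchall()
print(summary)
```

Whether this helps in the real system depends on which calculations the GUI actually needs; it only shows that the result set can shrink from thousands of rows to a handful.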
Here is my situation. I have a MySQL MyISAM table containing about 4 million records with a total of 13.3 GB of data. The table contains messages received from an external system. Two of the columns in the table keep track of a timestamp and a boolean indicating whether the message has been handled or not.
When using this query:
SELECT MIN(timestampCB) FROM webshop_cb_onx_message
The result shows up almost instantly.
However, I need to find the earliest timestamp of unhandled messages, like this:
SELECT MIN(timestampCB ) FROM webshop_cb_onx_message WHERE handled = 0
The results of this query show up after about 3 minutes, which is way too slow for the script I'm writing.
Both columns are individually indexed, not together. However, adding an index to the table would take incredibly long considering the amount of data that is in there already.
Does my problem originate from the fact that both columns are separately indexed, and if so, does anyone have a solution to my issue other than adding another index?
A common rule of thumb is that if an index lookup would match more than about 20% of the rows, a full table scan is preferable to index access. With a boolean column like handled, that means the optimizer is likely to ignore your index on handled and do a full table scan instead.
A composite index on (handled, timestampCB) may actually improve performance. Even if the selectivity isn't great, MySQL would most likely still use it, and if it didn't you could force its use.
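As a rough illustration of the composite index idea, the sketch below builds a tiny copy of the table and checks that the MIN query is answered from the index. It runs on SQLite through Python's sqlite3 module as a stand-in, so the plan text differs from MySQL's EXPLAIN output; the index name `idx_handled_ts`, the extra `body` column and the sample rows are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL; the index name and sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webshop_cb_onx_message ("
    "id INTEGER PRIMARY KEY, timestampCB INTEGER, handled INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO webshop_cb_onx_message (timestampCB, handled, body) "
    "VALUES (?, ?, ?)",
    [(100, 1, "a"), (200, 0, "b"), (150, 0, "c"), (50, 1, "d")],
)

# Composite index: equality column first, then the column MIN() reads.
conn.execute(
    "CREATE INDEX idx_handled_ts "
    "ON webshop_cb_onx_message (handled, timestampCB)"
)

query = "SELECT MIN(timestampCB) FROM webshop_cb_onx_message WHERE handled = 0"

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + query))
earliest = conn.execute(query).fetchone()[0]

print(plan)
print(earliest)
```

In MySQL the equivalent index would be added with ALTER TABLE ... ADD INDEX, and a FORCE INDEX hint is the usual way to insist on it if the optimizer picks something else.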
I have a java application and I would like to get some data from a table and display in the application.
I have millions of records, and the query gets really slow when I go to the last records. It takes a good few minutes to get the results.
select Id from Table1x where description like '%error%' and Id between 0 and 1329999 limit 0, 1000
The above query returns a result quickly. That is, the first pages come back fast. But when I move to the last pages, it becomes slow.
select Id from Table1x where description like '%error%' and Id between 0 and 1329999 limit 644000, 1000
This query is slow and taking 17 secs.
Any ideas on how to make this faster? Id is the primary key of table1x.
The problem is in the LIKE. To get the first 1000 records, the database only needs to scan the table until it finds 1000 records that match the search. For the other query, the database needs to keep matching until it has 645000 records, which makes it much slower. There is no sorting or other filtering, so the index on ID doesn't help at all.
An index on description would help, but not if you start the search with a wildcard, like you do now.
I see two solutions.
First option is to add a FULLTEXT index on the description field. It allows you to look for the word error using MATCH rather than LIKE. I think it will be a lot faster, but the index will become larger too, and I'm not sure about the optimizations in the long run.
Second solution: Since you're obviously looking for errors (I think you're building a report on a log table?), you may add a column with a record type. You can give each record a type (just an integer) which indicates whether that record holds an error or not. You will need to update your table once, and insert the type along with new records, but it will make your query faster.
I must admit that this second solution is based on assumptions about the data and your goal. If I'm wrong about that, please provide additional information and I may find a solution that suits you better.
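The second solution could look roughly like the sketch below. It again uses SQLite via Python's sqlite3 module as a stand-in for MySQL; the column name `record_type`, the index name and the sample descriptions are assumptions made for the example.

```python
import sqlite3

# SQLite stands in for MySQL; record_type and idx_type_id are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1x (Id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO Table1x (description) VALUES (?)",
    [("disk error on sda",), ("job finished",), ("timeout error",), ("started",)],
)

# One-time migration: add the type column and backfill it with the slow LIKE.
conn.execute("ALTER TABLE Table1x ADD COLUMN record_type INTEGER NOT NULL DEFAULT 0")
conn.execute("UPDATE Table1x SET record_type = 1 WHERE description LIKE '%error%'")

# Index the type together with Id so paging can walk the index in Id order.
conn.execute("CREATE INDEX idx_type_id ON Table1x (record_type, Id)")

# The paged query now filters on an indexed integer instead of a wildcard LIKE.
error_ids = [
    row[0]
    for row in conn.execute(
        "SELECT Id FROM Table1x "
        "WHERE record_type = 1 AND Id BETWEEN 0 AND 1329999 "
        "ORDER BY Id LIMIT 1000 OFFSET 0"
    )
]
print(error_ids)
```

New rows would need record_type set at insert time, as described above; the backfill UPDATE only has to run once.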
I'm creating an ecommerce web application using PHP and MySQL (MyISAM). I want to know how to speed up my queries.
I have a products table with over a million records with the following columns: id (int, primary), catid (int), usrid (int), title (int), description (int), status (enum), date (datetime).
Recently I split this one table into multiple tables based on the product categories (catid), thinking that it might reduce the load on the server.
Now I need to fetch results from these tables combined, under the following sets of conditions:
1. Results matching a usrid and status (to fetch a user's products).
2. Results matching status and title or description (e.g. for product search).
Currently I have to use UNION to fetch results from all these tables combined, which is slowing down the performance, and I can't apply the LIMIT to the combined result set either. I thought of creating an index on all these columns to speed up the searching, but this might slow down the INSERTs and UPDATEs. I'm also beginning to think that splitting the table was not a good idea in the first place.
I would like to know the best approach to optimize the data retrieval in such a situation. I'm open to new database schema proposals as well.
To start: load test and turn on the MySQL slow query log.
Some other suggestions:
If staying with separate tables per category use UNION ALL instead of UNION. Reason being UNION implies distinctness, which makes the database engine do extra work to dedupe the rows unnecessarily.
Indices do add a write penalty, but what you describe probably has a read-write ratio of at least 10 to 1 and probably more like 1000 to 1 or higher. So index. For the two queries you describe, I would probably create three indices (you'll need to study explain plans to determine what column order is better).
usrid and status
status and title
status and description (is this an indexable field?)
Another note on indices: creating a covering index, that is, one that has all your columns, can also be a useful solution if one of your frequent access patterns is retrieval by primary key.
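If the per-category tables stay, something along these lines shows UNION ALL with a single ORDER BY and LIMIT applied to the combined result. This is a sketch on SQLite via Python's sqlite3 module as a stand-in for MySQL; the table names `products_books` and `products_toys`, the index names and the rows are invented for the example.

```python
import sqlite3

# SQLite stands in for MySQL; the two category tables are hypothetical.
conn = sqlite3.connect(":memory:")
for table in ("products_books", "products_toys"):
    conn.execute(
        f"CREATE TABLE {table} "
        "(id INTEGER PRIMARY KEY, usrid INTEGER, status TEXT, title TEXT)"
    )
    # Index matching the "user's products" query: usrid and status.
    conn.execute(f"CREATE INDEX idx_{table}_usr_status ON {table} (usrid, status)")

conn.executemany(
    "INSERT INTO products_books (id, usrid, status, title) VALUES (?, ?, ?, ?)",
    [(1, 7, "active", "SQL primer"), (2, 8, "active", "Cookbook"), (3, 7, "sold", "Atlas")],
)
conn.executemany(
    "INSERT INTO products_toys (id, usrid, status, title) VALUES (?, ?, ?, ?)",
    [(10, 7, "active", "Ball"), (11, 7, "active", "Kite")],
)

# UNION ALL skips the de-duplication step; the trailing ORDER BY and LIMIT
# apply to the combined result, not to the last SELECT alone.
rows = conn.execute(
    """
    SELECT id, title FROM products_books WHERE usrid = ? AND status = ?
    UNION ALL
    SELECT id, title FROM products_toys WHERE usrid = ? AND status = ?
    ORDER BY id
    LIMIT 2
    """,
    (7, "active", 7, "active"),
).fetchall()
print(rows)
```

In MySQL the individual SELECTs are usually wrapped in parentheses before the final ORDER BY and LIMIT so that it is unambiguous which part they belong to.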
Have you considered using memcached? It caches the resultset from database queries on the server and returns them if they are requested by multiple users. Only if it doesn't find a cached resultset will it query the database. It should alleviate the load on the database significantly.
http://memcached.org/
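The caching pattern described here can be sketched as follows. To keep the example self-contained, a plain Python dict stands in for the memcached server and SQLite stands in for MySQL; the function name `products_for_user`, the key format and the sample data are assumptions, and a real setup would also need an expiry or invalidation strategy.

```python
import sqlite3

# SQLite stands in for MySQL, and a dict stands in for a memcached client.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, usrid INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO products (id, usrid, title) VALUES (?, ?, ?)",
    [(1, 7, "Ball"), (2, 7, "Kite"), (3, 8, "Drum")],
)

cache = {}                   # hypothetical stand-in for the cache server
stats = {"db_queries": 0}    # counts how often the database is actually hit


def products_for_user(conn, cache, stats, usrid):
    """Return a user's products, consulting the cache before the database."""
    key = f"products:user:{usrid}"
    if key in cache:
        return cache[key]
    stats["db_queries"] += 1
    rows = conn.execute(
        "SELECT id, title FROM products WHERE usrid = ? ORDER BY id", (usrid,)
    ).fetchall()
    cache[key] = rows
    return rows


first = products_for_user(conn, cache, stats, 7)   # goes to the database
second = products_for_user(conn, cache, stats, 7)  # served from the cache
print(first, stats["db_queries"])
```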
1. So if I search for the word ball inside the toys table, where I have 5,000,000 entries, does it search through all 5 million?
I think the answer is yes, because how else would it know, but please let me know.
2. If yes: if I need more information from that table, isn't it more logical to query just once and work with the results?
An example
I have this table structure for example:
id | toy_name | state
Now I should query like this
mysql_query("SELECT * FROM toys WHERE STATE = 1");
But isn't it more logical to query the whole table
mysql_query("SELECT * FROM toys"); and then do this if($query['state'] == 1)?
3. And something else, if I put an ORDER BY id LIMIT 5 in the mysql_query will it search for the 5 million entries or just the last 5?
Thanks for the answers.
Yes, unless you have a LIMIT clause it will look through all the rows. It will do a table scan unless it can use an index.
You should use a query with a WHERE clause here, not filter the results in PHP. Your RDBMS is designed to be able to do this kind of thing efficiently. Only when you need to do complex processing of the data is it more appropriate to load a resultset into PHP and do it there.
With the LIMIT 5, the RDBMS will look through the table until it has found you your five rows, and then it will stop looking. So, all I can say for sure is, it will look at between 5 and 5 million rows!
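A small sketch of the difference between filtering in the database and filtering in the application. It uses SQLite through Python's sqlite3 module instead of MySQL and PHP, and the 1000 sample rows and the index name are made up; the point is only how many rows have to be handed to the application in each case.

```python
import sqlite3

# SQLite stands in for MySQL; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toys (id INTEGER PRIMARY KEY, toy_name TEXT, state INTEGER)")
conn.executemany(
    "INSERT INTO toys (id, toy_name, state) VALUES (?, ?, ?)",
    [(i, f"toy-{i}", 1 if i % 100 == 0 else 0) for i in range(1, 1001)],
)
conn.execute("CREATE INDEX idx_state ON toys (state)")

# Approach from the question: fetch everything, then filter in the application.
all_rows = conn.execute("SELECT id, toy_name, state FROM toys ORDER BY id").fetchall()
filtered_in_app = [row for row in all_rows if row[2] == 1]

# Recommended approach: let the WHERE clause do the filtering.
filtered_in_db = conn.execute(
    "SELECT id, toy_name, state FROM toys WHERE state = 1 ORDER BY id"
).fetchall()

print(len(all_rows), len(filtered_in_db))
```

Both approaches end up with the same rows, but the first one transfers every row in the table to get there.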
Read this about indexes :-)
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
It makes it uber-fast :-)
A full table scan happens only if there are no matching indexes, and it is indeed a very slow operation.
Sorting is also accelerated by indexes.
And for the #2 - this is slow because transfer rate from MySQL -> PHP is slow, and MySQL is MUCH faster at doing filtering.
For your #1 question: Depends on how you're searching for 'ball'. If there's no index on the column(s) where you're searching, then the entire table has to be read. If there is an index, then...
WHERE field LIKE 'ball%' will use an index
WHERE field LIKE '%ball%' will NOT use an index
For your #2, think of it this way: Doing SELECT * FROM table and then perusing the results in your application is exactly the same as going to the local super walmart, loading the store's complete inventory into your car, driving it home, picking through every box/package, and throwing out everything except the pack of gum from the impulse buy rack by the front till that you'd wanted in the first place. The whole point of a database is to make it easy to search for data and filter by any kind of clause you could think of. By slurping everything across to your application and doing the filtering there, you've reduced that shiny database to a very expensive disk interface, and would probably be better off storing things in flat files. That's why there's WHERE clauses. "SELECT something FROM store WHERE type=pack_of_gum" gets you just the gum, and doesn't force you to truck home a few thousand bottles of shampoo and bags of kitty litter.
For your #3, yes, unless an index on the ORDER BY column lets the database read the rows already in order. Without such an index, the result set has to be sorted before the database can figure out what those 5 records should be. While it's not quite as bad as actually transferring the entire record set to your app and only picking out the first five records, it still involves a bit more work than just retrieving the first 5 records that match your WHERE clause.
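Whether ORDER BY ... LIMIT 5 needs a separate sort depends on whether an index can supply the order. The sketch below checks this on SQLite via Python's sqlite3 module, used here only as a stand-in: SQLite reports an extra sort step as "USE TEMP B-TREE FOR ORDER BY", where MySQL's EXPLAIN would show "Using filesort". The helper name `plan` and the 20 sample rows are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL; plan wording is SQLite-specific.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toys (id INTEGER PRIMARY KEY, toy_name TEXT, state INTEGER)")
conn.executemany(
    "INSERT INTO toys (id, toy_name, state) VALUES (?, ?, ?)",
    [(i, f"toy-{i:02d}", i % 2) for i in range(1, 21)],
)


def plan(conn, sql):
    """Join the detail column of every EXPLAIN QUERY PLAN row into one string."""
    return " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Ordered by the primary key: rows can be read in order, no sort step.
by_id = plan(conn, "SELECT * FROM toys ORDER BY id DESC LIMIT 5")

# Ordered by an unindexed column: the rows must be sorted first.
by_name = plan(conn, "SELECT * FROM toys ORDER BY toy_name LIMIT 5")

last_five = [r[0] for r in conn.execute("SELECT id FROM toys ORDER BY id DESC LIMIT 5")]

print(by_id)
print(by_name)
print(last_five)
```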
I am looking at storing some JMX data from JVMs on many servers for about 90 days. This data would be statistics like heap size and thread count. This will mean that one of the tables will have around 388 million records.
From this data I am building some graphs so you can compare the stats retrieved from the Mbeans. This means I will be grabbing some data at an interval using timestamps.
So the real question is, Is there anyway to optimize the table or query so you can perform these queries in a reasonable amount of time?
Thanks,
Josh
There are several things you can do:
Build your indexes to match the queries you are running. Run EXPLAIN to see the types of queries that are run and make sure that they all use an index where possible.
Partition your table. Partitioning is a technique for splitting a large table into several smaller ones by a specific (aggregate) key. MySQL supports this natively from version 5.1.
If necessary, build summary tables that cache the costlier parts of your queries. Then run your queries against the summary tables. Similarly, temporary in-memory tables can be used to store a simplified view of your table as a pre-processing stage.
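The summary table idea from the list above can be sketched like this. SQLite via Python's sqlite3 module stands in for MySQL, and the table names `jmx_samples` and `jmx_hourly`, the columns and the sample values are all assumptions; timestamps are stored as integer seconds to keep the hourly bucketing simple.

```python
import sqlite3

# SQLite stands in for MySQL; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jmx_samples "
    "(server TEXT, sampled_at INTEGER, heap_mb REAL, thread_count INTEGER)"
)
conn.execute("CREATE INDEX idx_server_time ON jmx_samples (server, sampled_at)")
conn.executemany(
    "INSERT INTO jmx_samples VALUES (?, ?, ?, ?)",
    [
        ("app1", 0, 100.0, 10),
        ("app1", 1800, 200.0, 20),
        ("app1", 3600, 300.0, 30),
        ("app2", 60, 50.0, 5),
    ],
)

# Summary table: one row per server per hour instead of one row per sample.
conn.execute(
    "CREATE TABLE jmx_hourly (server TEXT, hour_start INTEGER, "
    "avg_heap_mb REAL, max_threads INTEGER, PRIMARY KEY (server, hour_start))"
)
# Integer division truncates in SQLite; MySQL would need DIV or FLOOR() here.
conn.execute(
    """
    INSERT INTO jmx_hourly
    SELECT server, (sampled_at / 3600) * 3600, AVG(heap_mb), MAX(thread_count)
    FROM jmx_samples
    GROUP BY server, (sampled_at / 3600) * 3600
    """
)

# Graph queries read the small summary table rather than the raw samples.
hourly = conn.execute(
    "SELECT hour_start, avg_heap_mb, max_threads FROM jmx_hourly "
    "WHERE server = ? ORDER BY hour_start",
    ("app1",),
).fetchall()
print(hourly)
```

In practice the summary would be refreshed periodically or as data is loaded, rather than rebuilt from scratch.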
3 suggestions:
index
index
index
p.s. for timestamps you may run into performance issues -- depending on how MySQL handles DATETIME and TIMESTAMP internally, it may be better to store timestamps as integers. (# secs since 1970 or whatever)
Well, for a start, I would suggest you use "offline" processing to produce 'graph ready' data (for most of the common cases) rather than trying to query the raw data on demand.
If you are using MySQL 5.1 you can use the new features, but be warned that they contain a lot of bugs.
First, you should use indexes.
If this is not enough, you can try to split the tables by using partitioning.
If this also won't work, you can try load balancing.
A few suggestions.
You're probably going to run aggregate queries on this stuff, so after (or while) you load the data into your tables, you should pre-aggregate the data. For instance, pre-compute totals by hour, or by user, or by week, whatever, you get the idea, and store that in cache tables that you use for your reporting graphs. If you can shrink your dataset by an order of magnitude, then good for you!
This means I will be grabbing some data at an interval using timestamps.
So this means you only use data from the last X days?
Deleting old data from tables can be horribly slow if you have a few tens of millions of rows to delete. Partitioning is great for that (just drop the old partition). It also groups all records from the same time period close together on disk, so it's a lot more cache-efficient.
Now if you use MySQL, I strongly suggest using MyISAM tables. You don't get crash-proofness or transactions and locking is dumb, but the size of the table is much smaller than InnoDB, which means it can fit in RAM, which means much quicker access.
Since big aggregates can involve lots of rather sequential disk IO, a fast IO system like RAID10 (or SSD) is a plus.
Is there anyway to optimize the table or query so you can perform these queries in a reasonable amount of time?
That depends on the table and the queries; I can't give any advice without knowing more.
If you need complicated reporting queries with big aggregates and joins, remember that MySQL does not support any fancy JOINs, or hash-aggregates, or anything else useful really, basically the only thing it can do is nested-loop indexscan which is good on a cached table, and absolutely atrocious on other cases if some random access is involved.
I suggest you test with Postgres. For big aggregates the smarter optimizer does work well.
Example :
CREATE TABLE t (id INTEGER PRIMARY KEY AUTO_INCREMENT, category INT NOT NULL, counter INT NOT NULL) ENGINE=MyISAM;
INSERT INTO t (category, counter) SELECT n%10, n&255 FROM serie;
(serie contains 16M lines with n = 1 .. 16000000)
MySQL    Postgres   Operation
58 s     100 s      INSERT
75 s     51 s       CREATE INDEX on (category,id) (useless)
9.3 s    5 s        SELECT category, sum(counter) FROM t GROUP BY category;
1.7 s    0.5 s      SELECT category, sum(counter) FROM t WHERE id>15000000 GROUP BY category;
On a simple query like this pg is about 2-3x faster (the difference would be much larger if complex joins were involved).
EXPLAIN Your SELECT Queries
LIMIT 1 When Getting a Unique Row
SELECT * FROM user WHERE state = 'Alabama' -- wrong
SELECT 1 FROM user WHERE state = 'Alabama' LIMIT 1
Index the Search Fields
Indexes are not just for the primary keys or the unique keys. If there are any columns in your table that you will search by, you should almost always index them.
Index and Use Same Column Types for Joins
If your application contains many JOIN queries, you need to make sure that the columns you join by are indexed on both tables. This affects how MySQL internally optimizes the join operation.
Do Not ORDER BY RAND()
If you really need random rows out of your results, there are much better ways of doing it. Granted, it takes additional code, but you will prevent a bottleneck that gets steadily worse as your data grows. The problem is that MySQL has to perform the RAND() operation (which takes processing power) for every single row in the table before sorting the rows and giving you just 1 of them.
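One common alternative, sketched here under assumptions, is to pick a random id in the application and fetch the first row at or after it, which uses the primary key instead of sorting the whole table. The example runs on SQLite via Python's sqlite3 module as a stand-in for MySQL; the function name `random_row` and the sample rows are hypothetical. Note that ids with gaps in front of them are picked more often, so this is not perfectly uniform.

```python
import random
import sqlite3

# SQLite stands in for MySQL; the ids deliberately contain gaps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO user (id, name) VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (5, "cy"), (9, "dee")],
)


def random_row(conn, rng):
    """Pick a random id in [MIN(id), MAX(id)] and return the first row at or after it."""
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM user").fetchone()
    if lo is None:  # empty table
        return None
    target = rng.randint(lo, hi)
    # A primary key range lookup: no RAND() per row, no full sort.
    return conn.execute(
        "SELECT id, name FROM user WHERE id >= ? ORDER BY id LIMIT 1", (target,)
    ).fetchone()


picked = random_row(conn, random.Random(0))
print(picked)
```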
Use ENUM over VARCHAR
ENUM type columns are very fast and compact. Internally they are stored like TINYINT, yet they can contain and display string values.
Use NOT NULL If You Can
Unless you have a very specific reason to use a NULL value, you should always set your columns as NOT NULL.
"NULL columns require additional space in the row to record whether their values are NULL. For MyISAM tables, each NULL column takes one bit extra, rounded up to the nearest byte."
Store IP Addresses as UNSIGNED INT
In your queries you can use INET_ATON() to convert an IP to an integer, and INET_NTOA() for the reverse. There are also similar functions in PHP called ip2long() and long2ip().
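For applications that do the conversion outside the database, the same mapping can be sketched in Python with the standard ipaddress module. The helper names `ip_to_int` and `int_to_ip` are made up for this example; they mirror what INET_ATON() and INET_NTOA() do for IPv4 addresses.

```python
import ipaddress


def ip_to_int(address):
    """Convert a dotted-quad IPv4 string to its 32-bit unsigned integer value."""
    return int(ipaddress.IPv4Address(address))


def int_to_ip(value):
    """Convert a 32-bit unsigned integer back to a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(value))


as_int = ip_to_int("192.168.0.1")
round_trip = int_to_ip(as_int)
print(as_int, round_trip)
```

The integer always fits in 32 bits, which is why an UNSIGNED INT column is enough for IPv4 addresses.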