Fast page retrieval in MySQL, index usage?

I would like to speed up a MySQL query that basically retrieves a page of data following the pattern below:
select
my_field_A,
my_field_B
from mytable
where
time_id >= UNIX_TIMESTAMP('1901-01-01 00:00:00') AND
time_id < UNIX_TIMESTAMP('2009-01-16 00:00:00')
The field time_id has a MySQL index on it, yet the query behaves as if the entire table were read on each query (retrieving even a couple of rows is already quite slow). I am not an expert in MySQL. Can someone guess what I am doing wrong?

Make sure you have a B-tree index on time_id; that should be efficient for range queries. Also make sure that time_id is stored in an appropriate time format.
If you really want to understand what MySQL is doing, you can add the keyword EXPLAIN in front of the query and run it in your MySQL client. This will show some information about what MySQL is doing and what kinds of scans are performed.
http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
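For example, running it against the original query (a sketch; mytable stands in for the real table name):
EXPLAIN SELECT my_field_A, my_field_B
FROM mytable
WHERE time_id >= UNIX_TIMESTAMP('1901-01-01 00:00:00')
  AND time_id < UNIX_TIMESTAMP('2009-01-16 00:00:00');
The type, key, and rows columns of the output show whether an index is used and roughly how many rows MySQL expects to examine.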

As there are probably lots of time_id values falling within that range, MySQL may decide that a full table scan is cheaper.
Try forcing the index:
SELECT
my_field_A,
my_field_B
FROM mytable FORCE INDEX (index_name_on_time_id)
WHERE
time_id >= UNIX_TIMESTAMP('1901-01-01 00:00:00') AND
time_id < UNIX_TIMESTAMP('2009-01-16 00:00:00')

Do you need the lower range? Are there any entries earlier than 1901?
How is the time_id column generated? If time_id always increases with each new entry added to the DB, you may want to consider finding the id of the entry closest to 2009-01-16 and then selecting by id:
SELECT my_field_A, my_field_B
FROM mytable
WHERE id <= ?
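To find that boundary id first, a lookup along these lines could work (a sketch, assuming id and time_id grow together):
SELECT MAX(id) FROM mytable
WHERE time_id < UNIX_TIMESTAMP('2009-01-16 00:00:00');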
If that is not the case, try checking out partitioning, available from MySQL 5.1, and break the table down by year; that should increase speed dramatically.
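A minimal sketch of yearly range partitioning, assuming time_id is an integer column holding UNIX timestamps (partition names are illustrative; note that MySQL requires the partitioning column to be part of every unique key, including the primary key):
ALTER TABLE mytable
PARTITION BY RANGE (time_id) (
    PARTITION p2007 VALUES LESS THAN (UNIX_TIMESTAMP('2008-01-01 00:00:00')),
    PARTITION p2008 VALUES LESS THAN (UNIX_TIMESTAMP('2009-01-01 00:00:00')),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);
With this in place, a range query on time_id only has to touch the partitions that overlap the requested interval (partition pruning).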

Related

MySQL - Group By date/time functions on a large table

I have a bunch of financial stock data in a MySQL table. The data is stored in a one-row-per-1min-tick format (OHLC). From that data I'd like to create 30min/hourly/daily aggregates. The problem is that the table is enormous, and grouping by date functions on the timestamp column yields horrible performance results.
Ex: The following query produces the right result but ends up taking too long.
SELECT market, max(timestamp) AS TS
FROM tbl_data
GROUP BY market, DATE(timestamp), HOUR(timestamp)
ORDER BY market, TS ASC
The table has a primary index on the (market, timestamp) columns. I have also added an additional index on the timestamp column. However, that does not help much, since applying date/hour functions to the column forces a table scan regardless.
How can I improve the performance? Perhaps I should consider a different database than MySQL that provides specialized date/time indexes? If so, what would be a good option?
One thing to note is that it would suffice if I could get the LAST row of each hour/day/timeframe. The database has tens of millions of rows.
MySQL version: 5.7
Thanks in advance for the help.
Edit: here is what EXPLAIN shows on a smaller DB of the exact same format (screenshot not reproduced).
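Since the question specifies MySQL 5.7, one direction worth testing (a sketch, not the asker's code; the generated-column and index names are illustrative) is to materialize the grouping buckets as stored generated columns and index them, so the GROUP BY no longer has to evaluate date functions per row:
ALTER TABLE tbl_data
    ADD COLUMN ts_date DATE GENERATED ALWAYS AS (DATE(`timestamp`)) STORED,
    ADD COLUMN ts_hour TINYINT GENERATED ALWAYS AS (HOUR(`timestamp`)) STORED,
    ADD INDEX idx_bucket (market, ts_date, ts_hour);

SELECT market, MAX(`timestamp`) AS TS
FROM tbl_data
GROUP BY market, ts_date, ts_hour
ORDER BY market, TS ASC;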

What keys should be indexed here to make this query optimal

I have a query that looks like the following:
SELECT * FROM foo
WHERE days >= DATEDIFF(CURDATE(), last_day)
In this case, days is an INT. last_day is a DATE column.
So do I need two individual indexes here, one for days and one for last_day?
This query predicate, days >= DATEDIFF(CURDATE(), last_day), is inherently not sargable.
If you keep the present table design you'll probably benefit from a compound index on (last_day, days). Nevertheless, satisfying the query will require a full scan of that index.
Single-column indexes on either one of those columns, or both, will be useless or worse for improving this query's performance.
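For reference, the compound index suggested above would be created like this (index name illustrative):
ALTER TABLE foo ADD INDEX idx_lastday_days (last_day, days);
Even so, the DATEDIFF() predicate still forces a scan of every index entry, and since the query selects *, the optimizer may well prefer a plain table scan anyway.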
If you must have this query perform very well, you need to reorganize your table a bit. Let's figure that out. It looks like you are trying to exclude "overdue" records: you want last_day + days to be no earlier than today, that is, expiration_date >= CURDATE(). That is a sargable search predicate.
So if you added a new column expiration_date to your table, and then set it as follows:
UPDATE foo SET expiration_date = last_day + INTERVAL days DAY
and then indexed it, you'd have a well-performing query.
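The remaining steps, sketched (the index name is illustrative):
ALTER TABLE foo ADD INDEX idx_expiration (expiration_date);

SELECT * FROM foo
WHERE expiration_date >= CURDATE();
The range predicate on the indexed expiration_date column lets MySQL use an index range scan instead of evaluating DATEDIFF() for every row.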
Be careful with indexes: they can speed up reads, but they can reduce insert performance.
You may also consider creating a partition over the last_day field.
I would try an index on the last_day field alone first, but the best approach is to run performance tests with different configurations.
Since you are using an expression in the WHERE criteria, MySQL will not be able to use indexes on either of the two fields. If you use this expression regularly and you have at least MySQL 5.7.8, you can create a generated column and create an index on it.
The other option is to create a regular column, set its value to the result of this expression, and index that column. You will need triggers to keep it updated.
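A minimal sketch of the generated-column variant (requires MySQL 5.7.8+; column and index names are illustrative):
ALTER TABLE foo
    ADD COLUMN expires_on DATE
        GENERATED ALWAYS AS (last_day + INTERVAL days DAY) STORED,
    ADD INDEX idx_expires_on (expires_on);

SELECT * FROM foo WHERE expires_on >= CURDATE();
Unlike a trigger-maintained column, a generated column is kept up to date by MySQL automatically whenever last_day or days changes.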

SQL Performance of grouping by DATE(TIMESTAMP) vs separate columns for DATE and TIME

I'm facing a problem displaying data from a MySQL database.
I have a table of all user requests in this format:
| Time (TIMESTAMP, indexed) | some other params |
I want to show this data on my website as a table with the number of requests per day.
The query is quite simple:
SELECT DATE(Time) as D, COUNT(*) as S FROM Stats GROUP BY D ORDER BY D DESC
But looking at the EXPLAIN output drives me mad:
Using index; Using temporary; Using filesort
The MySQL docs say this means a temporary table is created for this query, possibly on disk.
How fast would it be with 1,000,000 records? And how fast with 100,000,000?
Is there any way to put an index on the result of a function?
Maybe I should create separate columns for DATE and TIME and then group by the DATE column?
What are other good ways of dealing with such a problem? Caching? Another DB engine?
If you have an index on your Time column this operation is going to perform tolerably well. I'm guessing you do have that index, because your EXPLAIN output says it's using an index.
Why does this work well? Because MySQL can access this index in order -- it can scan the index -- to satisfy your query.
Don't be confused by Using temporary; Using filesort. This simply means MySQL needs to create and return a virtual table with a row for each day. That's pretty small and almost surely fits in memory. "filesort" doesn't necessarily mean the sort has spilled to a temp file on disk; it just means MySQL has to sort the virtual table, and it has to sort it to get the most recent day first.
By the way, if you can restrict the date range of the query, you'll get predictable performance even when your application has been in use for years. Try something like this:
SELECT DATE(Time) as D, COUNT(*) as S
FROM Stats
WHERE Time >= CURDATE() - INTERVAL 30 DAY
GROUP BY D ORDER BY D DESC
First: a GROUP BY implies sorting, which is an expensive operation. The data in the index is sorted, but even then the database still needs to group the dates. So I feel that indexing by DATE may help: it will improve the speed of the query at the cost of refreshing another index on every insert. Please test it, I am not 100% sure.
Other alternatives are:
Using a partitioned table by month.
Using materialized views.
Updating a counter on every visit (see the sketch after this list).
Precalculating and storing past days' data; just refresh today's visits with a WHERE restricted to the current day (e.g. Time >= CURDATE()). This way the server has to sort a much smaller amount of data.
It depends on how often users visit your page and how fresh this data needs to be. Do not optimize prematurely if you do not need to.
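A sketch of the counter idea (table and column names are illustrative):
CREATE TABLE daily_stats (
    stat_day DATE PRIMARY KEY,
    visits INT UNSIGNED NOT NULL DEFAULT 0
);

-- run once per incoming request
INSERT INTO daily_stats (stat_day, visits)
VALUES (CURDATE(), 1)
ON DUPLICATE KEY UPDATE visits = visits + 1;
The stats page then reads the tiny daily_stats table instead of grouping millions of rows.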

MySQL SQL optimization

This query takes an hour:
select *,
unix_timestamp(finishtime)-unix_timestamp(submittime) timetaken
from joblog
where jobname like '%cas%'
and submittime>='2013-01-01 00:00:00'
and submittime<='2013-01-10 00:00:00'
order by id desc limit 300;
But the same query with only one submittime bound finishes in about 0.03 seconds.
The table has 2.1 million rows.
Any idea what's causing the issue or how to debug it?
Your first step should be to use MySQL EXPLAIN to see what the query is doing. It'll probably give you some insight on how to fix your issue.
My guess is that jobname LIKE '%cas%' is the slowest part because you're doing a wildcard text search. Adding an index here won't even help, because you have a leading wildcard. Is there any way to do this query without a leading wildcard like that? Also adding an index on submittime might improve the speed of this query.
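If submittime is not indexed yet, adding the index is a one-liner (a sketch; the index name is illustrative):
ALTER TABLE joblog ADD INDEX idx_submittime (submittime);
That can let MySQL narrow the scan to the ten-day window first and apply the LIKE filter only to the surviving rows.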
You might try adding a LIMIT to the query and see if that makes it return faster ...
Excerpt from http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
"Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses LIMIT to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result. "
select *,unix_timestamp(finishtime)-unix_timestamp(submittime) timetaken
from joblog
where (submittime between '2013-01-10 00:00:00' and '2013-01-19 00:00:00')
and jobname is not null
and jobname like '%cas%';
This helped (0.93 seconds).

Very slow query, any other ways to format this with better performance?

I have this query (I didn't write it) that was working fine for a client until the table got more than a few thousand rows in it; now it's taking 40+ seconds on only 4,200 rows.
Any suggestions on how to optimize it and get the same result?
I've tried a few other methods but didn't get the correct result that this slower query returned...
SELECT COUNT(*) AS num
FROM `fl_events`
WHERE id IN(
SELECT DISTINCT (e2.id)
FROM `fl_events` AS e1, fl_events AS e2
WHERE e1.startdate >= now() AND e1.startdate = e2.startdate
)
ORDER BY `startdate`
Any help would be greatly appreciated!
Apart from the obvious indexes needed, I don't really get why you are joining the table with itself to build the IN condition. The ORDER BY is also not needed. Are you sure your query can't be written just like this?:
SELECT COUNT(*) AS num
FROM `fl_events` AS e1
WHERE e1.startdate >= now()
I don't think rewriting the query will help. The key to your question is "until the table got more than a few thousand rows." This implies that important columns aren't indexed. Below a certain number of records, all the data fits in a single memory block; over that point, it takes additional blocks, and an index is the only way to speed up the search.
First, check that the id in fl_events is actually marked as a primary key. That physically orders the records, and without it you can occasionally see super-slow results. The use of DISTINCT in the query makes it look like id might NOT be unique; that would pose a problem.
Then, make sure to add an index on startdate.
The slowness is probably related to the join of the event table with itself, and possibly startdate not having an index.
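A sketch of the indexing suggestion the answers share (index name illustrative):
ALTER TABLE fl_events ADD INDEX idx_startdate (startdate);
With startdate indexed, both the e1.startdate >= NOW() range check and the e1.startdate = e2.startdate join condition can use the index.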