MySQL - SQLite How to improve this very simple query? - mysql

I have one simple but large table.
id_tick INTEGER eg: 1622911
price DOUBLE eg: 1.31723
timestamp DATETIME eg: '2010-04-28 09:34:23'
For 1 month of data, I have 2.3 millions rows (150MB)
My query aims at returning the latest price at a given time.
I first set up a SQLite table and used the query:
SELECT max(id_tick), price, timestamp
FROM EURUSD
WHERE timestamp <='2010-04-16 15:22:05'
It is running in 1.6s.
As I need to run this query several thousands of time, 1.6s is by far too long...
I then set up a MySQL table and modified the query (the max function differs from MySQL to SQLite):
SELECT id_tick, price, timestamp
FROM EURUSD
WHERE id_tick = (SELECT MAX(id_tick)
FROM EURUSD WHERE timestamp <='2010-04-16 15:22:05')
Execution time is getting far worse 3.6s
(I know I can avoid the sub query using ORDER BY and LIMIT 1 but it does not improve the execution time.)
I am only using one month of data for now, but I will have to use several years at some point.
My questions are then the following:
is there a way to improve my query?
given the large dataset, should I use another database engine?
any tips ?
Thanks !

1) Make sure you have an index on timestamp
2) Assuming that id_tick is both the PRIMARY KEY and Clustered Index, and assuming that id_tick increments as a function of time (since you are doing a MAX)
You can try this:
SELECT id_tick, price, timestamp
FROM EURUSD
WHERE id_tick = (SELECT id_tick
FROM EURUSD WHERE timestamp <='2010-04-16 15:22:05'
ORDER BY id_tick DESC
LIMIT 1)
This should be similar to janmoesen's performance though, since there should be high page correlation between id_tick and timestamp in any event

Do you have any indexed fields ?
indexing timestamp and/or id_tick could change a lot of things.
Also why don't you use an interval for timestamp ?
WHERE timestamp >= '2010-04-15 15:22:05' AND timestamp <= '2010-04-16 15:22:05'
that would ease the burden of the MAX function.

You are doing analysis using ALL the ticks for large intervals? I'd tried to filter data into minute/hour/day etc. graphs.

OK, I guess my index was corrupted somehow, a re-indexation greatly improved the performance.
The following is now executed in 0.0012s (non cached)
SELECT id_tick, price, timestamp
FROM EURUSD
WHERE timestamp <= '2010-05-11 05:30:10'
ORDER by id_tick desc
LIMIT 1
Thanks!

Related

How to speed up query for datetime in Mysql

SELECT *
FROM LOGS
WHERE datetime > DATE_SUB(NOW(), INTERVAL 1 MONTH)
I have a big table LOGS (InnoDB). When I try to get last month's data, the query waits too long.
I created an index for column datetime but it seems not helping. How to speed up this query?
Since the database records are inserted in oldest to newest, you could create 2 calls. The first call requesting the ID of the oldest record:
int oldestRecordID = SELECT TOP 1 MIN(id)
FROM LOGS
WHERE datetime > DATE_SUB(NOW(), INTERVAL 1 MONTH)
Then with that ID just request all records where ID > oldestRecordID:
SELECT *
FROM LOGS
WHERE ID > oldestRecordID
It's multiple calls, but it could be faster however I am sure you could combine those 2 calls too.
Probably the only thing you can do is create a clustered index on datetime. This will ensure that the values are co-located.
However, I don't think this will solve your real problem. Why are you bringing back all records from a month. This is a lot of data.
In all likelihood, you could summarize the data in the database and only bring back the information you need rather than all the data.

How to generate faster mysql query with 1.6M rows

I have a table that has 1.6M rows. Whenever I use the query below, I get an average of 7.5 seconds.
select * from table
where pid = 170
and cdate between '2017-01-01 0:00:00' and '2017-12-31 23:59:59';
I tried adding a LIMIT 1000 or 10000 or change the date to filter for 1 month, it still processes it to an average of 7.5s. I tried adding a composite index for pid and cdate but it resulted to 1 second slower.
Here is the INDEX list
https://gist.github.com/primerg/3e2470fcd9b21a748af84746554309bc
Can I still make it faster? Is this an acceptable performance considering the amount of data?
Looks like the index is missing. Create this index and see if its helping you.
CREATE INDEX cid_date_index ON table_name (pid, cdate);
And also modify your query to below.
select * from table
where pid = 170
and cdate between CAST('2017-01-01 0:00:00' AS DATETIME) and CAST('2017-12-31 23:59:59' AS DATETIME);
Please provide SHOW CREATE TABLE clicks.
How many rows are returned? If it is 100K rows, the effort to shovel that many rows is significant. And what will you do with that many rows? If you then summarize them, consider summarizing in SQL!
Do have cdate as DATETIME.
Do you use id for anything? Perhaps this would be better:
PRIMARY KEY (pid, cdate, id) -- to get benefit from clustering
INDEX(id) -- if still needed (and to keep AUTO_INCREMENT happy)
This smells like Data Warehousing. DW benefits significantly from building and maintaining Summary table(s), such as one that has the daily click count (etc), from which you could very rapidly sum up 365 counts to get the answer.
CAST is unnecessary. Furthermore 0:00:00 is optional -- it can be included or excluded for either DATE or DATETIME. I prefer
cdate >= '2017-01-01'
AND cdate < '2017-01-01' + INTERVAL 1 YEAR
to avoid leap year, midnight, date arithmetic, etc.

MySQL - group by interval query optimisation

Some background first. We have a MySQL database with a "live currency" table. We use an API to pull the latest currency values for different currencies, every 5 seconds. The table currently has over 8 million rows.
Structure of the table is as follows:
id (INT 11 PK)
currency (VARCHAR 8)
value (DECIMAL
timestamp (TIMESTAMP)
Now we are trying to use this table to plot the data on a graph. We are going to have various different graphs, e.g: Live, Hourly, Daily, Weekly, Monthly.
I'm having a bit of trouble with the query. Using the Weekly graph as an example, I want to output data from the last 7 days, in 15 minute intervals. So here is how I have attempted it:
SELECT *
FROM currency_data
WHERE ((currency = 'GBP')) AND (timestamp > '2017-09-20 12:29:09')
GROUP BY UNIX_TIMESTAMP(timestamp) DIV (15 * 60)
ORDER BY id DESC
This outputs the data I want, but the query is extremely slow. I have a feeling the GROUP BY clause is the cause.
Also BTW I have switched off the sql mode 'ONLY_FULL_GROUP_BY' as it was forcing me to group by id as well, which was returning incorrect results.
Does anyone know of a better way of doing this query which will reduce the time taken to run the query?
You may want to create summary tables for each of the graphs you want to do.
If your data really is coming every 5 seconds, you can attempt something like:
SELECT *
FROM currency_data cd
WHERE currency = 'GBP' AND
timestamp > '2017-09-20 12:29:09' AND
UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
ORDER BY id DESC;
For both this query and your original query, you want an index on currency_data(currency, timestamp, id).

What is better: select date with trunc date or between

I need to create a query to select some data of my mysql db based on date, but in my where clause i have to options:
1 - trunc the date:
select count(*) from mailing_user where date_format(create_date, '%Y-%m-%d')='2013-11-05';
2 - use between
select count(*) from mailing_user where create_date between '2013-11-05 00:00:00' and '2013-11-05 23:59:59';
the two query's will work, but whats the better? Or, what's recommended? Why?
Here is an article to read.
http://willem.stuursma.name/2009/01/09/mysql-performance-with-date-functions/
If your created_date column is indexed, the 2nd query will be faster.
But if the column is not indexed and if this is your defined date format, you can use the following query.
select count(*) from mailing_user where DATE(create_date) = '2013-11-05';
I use DATE instead of DATE_FORMAT as I can make use of the native feature of getting in this format('2013-11-05').
From your question it seems you want to select records from one day, according to the documentation A DATETIME or TIMESTAMP value can include a trailing fractional seconds part in up to microseconds (6 digits) precision.
So this means your second query might actually get unlucky and miss some records that were inserted into the table at the very last second of that day, so that is why I would say the first one is more precise and is guaranteed to always get you the correct result.
The downside of this is that you cannot index that column using the date_format-function, because MySQL isn't cool with that.
If you don't want to use date_format and get around the precision issue you would change
where create_date between '2013-11-05 00:00:00' and '2013-11-05 23:59:59'
into
where create_date >= '2013-11-05 00:00:00' and create_date < '2013-12-05 00:00:00'
Number 2 will be faster if you have an index on the create_date because number one won't be able to use the index to quickly scan the results.
However this requires there to be an index on the create_date.
Otherwise I imagine they would be similar speed, possibly the second would still be faster because of the smaller processing time to compare(datetime comparison rather than converting to a string and comparing strings), but I doubt it'd be significant.

Datetime vs Date and Time Mysql

I generally use datetime field to store created_time updated time of data within an application.
But now i have come across a database table where they have kept date and time separate fields in table.
So what are the schema in which two of these should be used and why?
What are pros and cons attached with using of two?
There is a huge difference in performance when using DATE field above DATETIME field. I have a table with more then 4.000.000 records and for testing purposes I added 2 fields with both their own index. One using DATETIME and the other field using DATE.
I disabled MySQL query cache to be able to test properly and looped over the same query for 1000x:
SELECT * FROM `logs` WHERE `dt` BETWEEN '2015-04-01' AND '2015-05-01' LIMIT 10000,10;
DATETIME INDEX:
197.564 seconds.
SELECT * FROM `logs` WHERE `d` BETWEEN '2015-04-01' AND '2015-05-01' LIMIT 10000,10;
DATE INDEX:
107.577 seconds.
Using a date indexed field has a performance improvement of: 45.55%!!
So I would say if you are expecting a lot of data in your table please consider in separating the date from the time with their own index.
I tend to think there are basically no advantages to storing the date and time in separate fields. MySQL offers very convenient functions for extracting the date and time parts of a datetime value.
Okay. There can be some efficiency reasons. In MySQL, you can put separate indexes on the fields. So, if you want to search for particular times, for instance, then a query that counts by hours of the day (for instance) can use an index on the time field. An index on a datetime field would not be used in this case. A separate date field might make it easier to write a query that will use the date index, but, strictly speaking, a datetime should also work.
The one time where I've seen dates and times stored separately is in a trading system. In this case, the trade has a valuation date. The valuation time is something like "NY Open" or "London Close" -- this is not a real time value. It is a description of the time of day used for valuation.
The tricky part is when you have to do date arithmetic on a time value and you do not want a date portion coming into the mix. Ex:
myapptdate = 2014-01-02 09:00:00
Select such and such where myapptdate between 2014-01-02 07:00:00 and 2014-01-02 13:00:00
1900-01-02 07:00:00
2014-01-02 07:00:00
One difference I found is using BETWEEN for dates with non-zero time.
Imagine a search with "between dates" filter. Standard user's expectation is it will return records from the end day as well, so using DATETIME you have to always add an extra day for the BETWEEN to work as expected, while using DATE you only pass what user entered, with no extra logic needed.
So query
SELECT * FROM mytable WHERE mydate BETWEEN '2020-06-24' AND '2020-06-25'
will return a record for 2020-06-25 16:30:00, while query:
SELECT * FROM mytable WHERE mydatetime BETWEEN '2020-06-24' AND '2020-06-25'
won't - you'd have to add an extra day:
SELECT * FROM mytable WHERE mydatetime BETWEEN '2020-06-24' AND '2020-06-26'
But as victor diaz mentioned, doing datetime calculations with date+time would be a super inefficient nightmare and far worse, than just adding a day to the second datetime. Therefore I'd only use DATE if the time is irrelevant, or as a "cache" for speeding queries up for date lookups (see Elwin's answer).