I have a large database.
There are 22 million rows for now, but there will be billions of rows in the future.
I have attached example rows; the timestamp is in milliseconds.
What I want to do is get the last row of every second.
SELECT *
FROM btcusdt
WHERE 1502942432000 <= timestamp
AND timestamp < 1502942433000
ORDER BY tradeid DESC
LIMIT 1
The query above works, but the WHERE condition takes too long because it scans every row in the table. It shouldn't need to: the timestamps are sequential, so the scan could stop as soon as a row falls outside the WHERE range.
Any suggestions on how I can speed this up?
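One thing worth checking, as an assumption on my part since the post doesn't show the table definition: whether `timestamp` is indexed. With an index, the range condition only touches the handful of trades inside that one second instead of scanning the whole table. A minimal sketch (the index name is illustrative):

ALTER TABLE btcusdt ADD INDEX idx_timestamp (`timestamp`);

-- Same query as above: the index narrows the scan to the rows of that single second,
-- and ORDER BY tradeid DESC LIMIT 1 then sorts only those few rows.
SELECT *
FROM btcusdt
WHERE 1502942432000 <= timestamp
  AND timestamp < 1502942433000
ORDER BY tradeid DESC
LIMIT 1;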
I don't have much experience with MySQL, so I'm not sure whether this is an issue with MySQL or with my code.
I have a table, let's say `data`, which has a created_at column of type DATETIME.
This table gets 20-30 new records per second, no updates at all.
I have a cron job that runs every 15 minutes and tries to get all records created in the last 15 minutes.
If it runs at 10:15:06 am and the last run was at 10:00:03, it makes this query:
SELECT * FROM `data` WHERE (created_at >= '2021-07-30T10:00:03Z' AND created_at < '2021-07-30T10:15:06Z')
The current time is excluded, hence the created_at < current_time.
But the problem is, once in a while I get a duplicate-data error: the result includes a few rows from current_time, which should have been excluded.
In this case, if 15 records were inserted at 10:15:06, the query result might include 4-5 of them. But it does not happen every time.
I am using Golang, and for the current time I use time.Now(). Could this be because of milliseconds or something else? I am not making more than one database query, so if I am getting extra records I think it has to do with the DB.
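One possible explanation, purely as an assumption (the post doesn't show how the query string is built): if the driver sends time.Now() with fractional seconds while created_at is a plain DATETIME storing whole seconds, the upper bound no longer excludes that second. A sketch of what the server might effectively see:

-- Hypothetical boundary carrying fractional seconds:
SELECT * FROM `data`
WHERE created_at >= '2021-07-30 10:00:03'
  AND created_at <  '2021-07-30 10:15:06.5';
-- Rows stored as '2021-07-30 10:15:06' satisfy created_at < '2021-07-30 10:15:06.5',
-- so they are returned now and again on the next run, hence the occasional duplicates.

Truncating the current time to whole seconds before building the query (for example with time.Now().Truncate(time.Second) in Go) keeps the boundary aligned with the column's precision.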
I have a billion rows in a MySQL table and I want to query the table on an indexed field, let's say `timestamp`.
I want to query the last 7 days of data, which is approximately 1,000,000 rows, and I am querying based on the last id fetched and a limit of 500.
This query works fine while I am processing up to 5,000,000 rows of data (about 10,000 queries), but when I increase the number of queries to, say, 50,000, I can see performance degrade over time. A query took 5-10 ms at the start, but after running for a long time it degraded to 2 s. How can I optimize this?
I earlier tried a naive LIMIT/OFFSET solution, which was highly unoptimized, so I tried to improve it by saving the last id and adding it to every query, but performance still degraded over time when fetching batch after batch for 3-4 hours.
Java, using Hibernate and slicing:
// Joda-Time: lower bound = rows from the last 7 days
Date date = new Date();
Date timestamp = new DateTime(date).minusDays(7).toDate();

long lastId = 0L;                    // keyset cursor: highest id fetched so far
Slice<Entity> entityDataSlice;       // Entity = the mapped table class (name illustrative)

Integer rowLimit = 500;
Sort sort = Sort.by(Sort.Order.asc("timestamp"));
Pageable pageable = PageRequest.of(0, rowLimit, sort);

while (true) {
    long queryStartTime = System.currentTimeMillis();
    entityDataSlice = repository.findAllByTimestampAfterAndIdGreaterThan(
            timestamp, lastId, pageable);
    long queryEndTime = System.currentTimeMillis();

    // ... process the slice and advance lastId to the id of its last row ...

    if (!entityDataSlice.hasNext()) {
        break;
    }
}
MySQL:
select *
from table
where timestamp >= "some_time"
  and id >= <some_id>
order by timestamp
limit 500
The expected result was a performance improvement, but over time it degraded.
I expected queries to stay within roughly 100 ms over time, but they actually reach 2-3 seconds, and it looks likely to degrade further to 5-10 seconds.
Please provide SHOW CREATE TABLE. Meanwhile, if you have INDEX(timestamp), you don't need the `and id ...` condition. In fact, it may get in the way of optimizing the ORDER BY.
So, if your query is this:
select *
from table
where timestamp >= "some_time"
order by timestamp
limit 500
and you have INDEX(timestamp), then it is well optimized, and it will not slow down (aside from caching issues).
If that is just a simplified version of the 'real' query, then all bets are off.
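For reference, a sketch of the index this assumes, using the placeholder names from the question (the index name is illustrative):

ALTER TABLE `table` ADD INDEX idx_timestamp (`timestamp`);

-- With this index, the range scan starts at "some_time", reads the index in order,
-- and stops after 500 rows, so each query does a bounded amount of work.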
I'm using a MySQL database to store values from some energy measurement system. The problem is that the DB contains millions of rows, and the queries take somewhat long to complete. Are the queries optimal? What should I do to improve them?
The database table consists of rows with 15 columns each (t, UL1, UL2, UL3, PL1, PL2, PL3, P, Q1, Q2, Q3,CosPhi1, CosPhi2, CosPhi3, i), where t is time, P is total power and i is some identifier.
Since I display the data in graphs grouped into different intervals (15 minutes, 1 hour, 1 day, 1 month), I want to group the queries the same way.
As an example I have a graph that shows the kWh for every day in the current year. The query to gather the data goes like this:
SELECT t, SUM(P) as P
FROM table
WHERE i = 0 and t >= '2015-01-01 00:00:00'
GROUP BY DAY(t), MONTH(t)
ORDER BY t
The database has been gathering measurements for 13 days, and this query alone is already taking 2-3 seconds to complete. Those 13 days have added about 1-1.3 million rows to the db, as a new row gets added every second.
Is this query optimal?
I would actually create a secondary table with a row for each day and a column for that day's total. Then, via a trigger, your insert into the detail table can update the secondary aggregate table. This way you can sum the DAILY table, which will be much quicker, and yet still have the per-second table if you need to look at the granular details.
Aggregate tables are a common time-saver for querying, especially for read-only data or data you know won't be changing. Then, if you want more granular detail, such as hourly or 15-minute intervals, go directly to the raw data.
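A minimal sketch of that approach, assuming the raw table is the `table` from the question with columns (t, P, i); the aggregate table and trigger names are illustrative:

CREATE TABLE daily_energy (
  day   DATE   NOT NULL PRIMARY KEY,
  p_sum DOUBLE NOT NULL DEFAULT 0
);

DELIMITER $$
CREATE TRIGGER trg_daily_energy
AFTER INSERT ON `table`
FOR EACH ROW
BEGIN
  -- keep one row per day, accumulating total power for meter i = 0
  IF NEW.i = 0 THEN
    INSERT INTO daily_energy (day, p_sum)
    VALUES (DATE(NEW.t), NEW.P)
    ON DUPLICATE KEY UPDATE p_sum = p_sum + NEW.P;
  END IF;
END$$
DELIMITER ;

The yearly graph can then read a few hundred rows from daily_energy instead of aggregating millions of raw rows.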
For this query:
SELECT t, SUM(P) as P
FROM table
WHERE i = 0 and t >= '2015-01-01 00:00:00'
GROUP BY DAY(t), MONTH(t)
ORDER BY t
The optimal index is a covering index: table(i, t, p).
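A sketch of creating it (the index name is illustrative):

ALTER TABLE `table` ADD INDEX idx_i_t_p (i, t, P);

-- "Covering" here means the WHERE i = 0 AND t >= ... filter and the SUM(P)
-- are all answered from the index alone, without reading the base rows.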
2-3 seconds for 1+ million rows suggests that you already have an index.
You may want to consider DRapp's suggestion and use summary tables. In a few months, you will have so much data that historical queries could be taking a long time.
In the meantime, though, indexes and partitioning might provide sufficient performance for your needs.
My table has two columns: time and work. It's very, very large and still growing.
I want to select the last entry:
$sql = "SELECT time, work FROM device ORDER BY time DESC LIMIT 1"
It takes 1.5 seconds to respond. How can I speed this up? I repeat it 20 times, and I can't wait 20 seconds.
Greetings!
Use MAX:
SELECT *
FROM device
WHERE time = (SELECT MAX(time) FROM device)
Also add an index on the time column.
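A sketch of that index, using the `device` table and `time` column from the question:

ALTER TABLE device ADD INDEX idx_time (`time`);

-- With the index, SELECT MAX(time) FROM device is resolved from the index directly,
-- and ORDER BY time DESC LIMIT 1 reads a single entry from the end of the index.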
I was wondering why you want to repeat this 20 times. If you are working at the application level, maybe you can store the result in a variable so you don't execute the query again.
My dataset is a table with 3 columns: the ID of a hard drive, the percentage of empty space, and a timestamp. The table is appended with the new state of each HDD (1,200 of them) every 20 minutes.
If I want to pick the last state of my HDD pool, I go for a MAX(timestamp), and a MIN(timestamp) if I want the oldest.
But say I have a given timestamp: how can I ask MySQL to retrieve data from within X seconds around this timestamp?
WHERE yourTimestampColumn
    BETWEEN TIMESTAMPADD(SECOND, -3, givenTimestamp)
        AND TIMESTAMPADD(SECOND,  3, givenTimestamp)
where -3 and +3 are substituted for -X and +X.
See http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_timestampadd for more help.
Like this:
WHERE timestamp BETWEEN DATE_SUB('<given_timestamp>', INTERVAL 5 SECOND)
AND DATE_ADD('<given_timestamp>', INTERVAL 5 SECOND);
As mentioned in the other answer, your query is slow when selection is based on the timestamp field.
You can add an INDEX on that column to speed it up:
ALTER TABLE <table_name> ADD INDEX(`timestamp`)
Note that, depending on the size of your table, adding the index for the first time can take a while. It also slows down INSERT queries and adds to the size of your database. The impact differs from case to case, so you will have to find out by testing.