SELECT *
FROM LOGS
WHERE datetime > DATE_SUB(NOW(), INTERVAL 1 MONTH)
I have a big InnoDB table, LOGS. When I try to get last month's data, the query takes too long.
I created an index on the datetime column, but it does not seem to help. How can I speed up this query?
Since the records are inserted from oldest to newest, you could split this into two calls. The first call requests the ID of the oldest record in the range:
SELECT MIN(id) INTO @oldestRecordID
FROM LOGS
WHERE datetime > DATE_SUB(NOW(), INTERVAL 1 MONTH)
Then, with that ID, request all records where ID >= @oldestRecordID:
SELECT *
FROM LOGS
WHERE ID >= @oldestRecordID
It takes multiple calls, but it could be faster. I am sure you could combine those two calls as well.
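As a rough sketch of how the two calls might be combined into one statement, the lookup of the oldest ID can be moved into a scalar subquery. The example below uses Python's sqlite3 module as a stand-in for MySQL so that it runs on its own; the sample rows and the fixed cutoff string are assumptions for illustration, and in MySQL the cutoff would be DATE_SUB(NOW(), INTERVAL 1 MONTH).

```python
import sqlite3


def rows_since(conn, cutoff):
    """Return LOGS rows whose id is >= the smallest id logged after cutoff."""
    sql = """
        SELECT id, datetime
        FROM LOGS
        WHERE id >= (SELECT MIN(id) FROM LOGS WHERE datetime > ?)
        ORDER BY id
    """
    return conn.execute(sql, (cutoff,)).fetchall()


def demo():
    # In-memory table with made-up rows, inserted oldest to newest.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE LOGS (id INTEGER PRIMARY KEY, datetime TEXT)")
    conn.executemany(
        "INSERT INTO LOGS (id, datetime) VALUES (?, ?)",
        [
            (1, "2024-01-10 08:00:00"),
            (2, "2024-01-25 09:30:00"),
            (3, "2024-02-03 12:00:00"),
            (4, "2024-02-20 18:45:00"),
        ],
    )
    return rows_since(conn, "2024-02-01 00:00:00")


if __name__ == "__main__":
    print(demo())
```

This only gives the right rows if IDs really do increase together with datetime, which is the assumption the answer starts from.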
Probably the only thing you can do is create a clustered index on datetime (in InnoDB the clustered index is the primary key, so datetime would have to be its leading column). This will ensure that the values are co-located.
However, I don't think this will solve your real problem. Why are you bringing back all records from a month? This is a lot of data.
In all likelihood, you could summarize the data in the database and only bring back the information you need rather than all the data.
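To make the "summarize in the database" idea concrete, here is a minimal sketch that returns one row per day instead of every log row. It uses Python's sqlite3 module as a stand-in for MySQL; the per-day count is only an assumed example of what the application might actually need, and the sample rows are invented.

```python
import sqlite3


def daily_counts(conn, cutoff):
    """Return (day, row_count) pairs for rows newer than cutoff."""
    sql = """
        SELECT date(datetime) AS day, COUNT(*) AS n
        FROM LOGS
        WHERE datetime > ?
        GROUP BY day
        ORDER BY day
    """
    return conn.execute(sql, (cutoff,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE LOGS (id INTEGER PRIMARY KEY, datetime TEXT)")
    conn.executemany(
        "INSERT INTO LOGS (id, datetime) VALUES (?, ?)",
        [
            (1, "2024-01-10 08:00:00"),
            (2, "2024-02-03 12:00:00"),
            (3, "2024-02-03 13:00:00"),
            (4, "2024-02-20 18:45:00"),
        ],
    )
    return daily_counts(conn, "2024-02-01 00:00:00")


if __name__ == "__main__":
    print(demo())
```

The aggregation happens inside the database, so only a handful of summary rows cross the wire rather than a month of raw records.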
I don't have much experience with MySQL, so I'm not sure whether this is an issue with MySQL or with my code.
I have a table, let's say data, with a created_at column of type DATETIME.
This table gets 20-30 new records per second and no updates at all.
I have a cron job that runs every 15 minutes and tries to get all records created in the last 15 minutes.
If it runs at 10:15:06 am and the last run was at 10:00:03, it makes this query:
SELECT * FROM `data` WHERE (created_at >= '2021-07-30T10:00:03Z' AND created_at < '2021-07-30T10:15:06Z')
The current time is excluded, hence created_at < current_time.
The problem is that once in a while I get a duplicate data error. That is, the result includes a few rows from current_time, which should have been excluded.
For example, in this case, if 15 records were inserted at 10:15:06, the query result might include 4-5 of them. But it does not happen every time.
I am using Golang, and for the current time I use time.Now(). Could this be caused by milliseconds or something else? I am not making more than one database query, so if I get extra records, I think it has something to do with the DB.
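One possible explanation, which the post does not confirm, is sub-second precision: if created_at is stored to whole seconds but the upper bound carries a fractional part (Go's time.Now() does have sub-second precision), then rows stamped 10:15:06 compare as earlier than 10:15:06.789 and slip into the window. The sketch below illustrates that effect with Python's sqlite3 module as a stand-in for MySQL; the sample rows and the 789 ms fraction are assumptions.

```python
import sqlite3
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def window_rows(conn, start, end):
    """Return ids with start <= created_at < end (half-open window)."""
    sql = "SELECT id FROM data WHERE created_at >= ? AND created_at < ? ORDER BY id"
    return [r[0] for r in conn.execute(sql, (start, end))]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, created_at TEXT)")
    # Rows are stored with whole-second precision only.
    conn.executemany(
        "INSERT INTO data (id, created_at) VALUES (?, ?)",
        [
            (1, "2021-07-30 10:05:00"),
            (2, "2021-07-30 10:15:06"),
            (3, "2021-07-30 10:15:06"),
        ],
    )
    start = "2021-07-30 10:00:03"
    now = datetime(2021, 7, 30, 10, 15, 6, 789000)  # assumed "current time"
    raw_end = now.strftime(FMT + ".%f")  # keeps the fractional part
    truncated_end = now.replace(microsecond=0).strftime(FMT)  # whole seconds
    return window_rows(conn, start, raw_end), window_rows(conn, start, truncated_end)


if __name__ == "__main__":
    print(demo())
```

If this is the cause, truncating both window bounds to whole seconds before building the query (and storing the truncated value as the next run's lower bound) should keep consecutive windows from overlapping.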
I am just learning MySQL (and SQL in general) and I have a question. I ran a process to populate a table with 72 records. That worked, but I needed to run the process again, and this time it added a second record for each user, for a total of 144 records. How can I isolate the newest records created today?
A simple solution is to use current_date to figure out today's date and date() to remove the time portion of your column. Then:
where current_date = date(createdTS)
This is fine for a dataset as small as yours. As a general solution, you'd need a query that doesn't have to manipulate every row (and can therefore use an index on createdTS), e.g.
where createdTS >= current_date and createdTS < current_date + interval 1 day
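A minimal sketch of the half-open range form, with the two bounds computed once in the application rather than per row. It uses Python's sqlite3 module as a stand-in for MySQL; the table name my_table, the user_id column, and the sample rows are assumptions for illustration.

```python
import sqlite3
from datetime import date, datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def rows_created_on(conn, day):
    """Return ids of rows with midnight of day <= createdTS < next midnight."""
    start = datetime.combine(day, datetime.min.time())
    end = start + timedelta(days=1)
    sql = "SELECT id FROM my_table WHERE createdTS >= ? AND createdTS < ? ORDER BY id"
    params = (start.strftime(FMT), end.strftime(FMT))
    return [r[0] for r in conn.execute(sql, params)]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, user_id INTEGER, createdTS TEXT)"
    )
    conn.executemany(
        "INSERT INTO my_table (id, user_id, createdTS) VALUES (?, ?, ?)",
        [
            (1, 1, "2019-07-24 09:00:00"),
            (2, 1, "2019-07-25 15:30:00"),
            (3, 2, "2019-07-25 23:59:59"),
            (4, 2, "2019-07-26 00:00:00"),
        ],
    )
    return rows_created_on(conn, date(2019, 7, 25))


if __name__ == "__main__":
    print(demo())
```

Because the column itself is compared against constants, the database is free to use an index on createdTS instead of evaluating date() for every row.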
You just have to filter on your createdTS column (assuming you know the timestamps of both runs).
SELECT * FROM `my_table` WHERE `createdTS` > '2019-07-25 15:00:00'
You could also use a RANK() window function and keep only the newest row for each user.
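The RANK() suggestion above could be sketched roughly as follows: rank each user's rows from newest to oldest and keep rank 1. The example runs against Python's sqlite3 module as a stand-in (it needs a SQLite build with window functions; in MySQL, window functions require version 8.0 or later). The user_id column and the sample rows are assumptions, since the post does not show the table definition.

```python
import sqlite3


def newest_per_user(conn):
    """Return (id, user_id, createdTS) for each user's most recent row."""
    sql = """
        SELECT id, user_id, createdTS
        FROM (
            SELECT id, user_id, createdTS,
                   RANK() OVER (PARTITION BY user_id ORDER BY createdTS DESC) AS rnk
            FROM my_table
        ) AS ranked
        WHERE rnk = 1
        ORDER BY user_id
    """
    return conn.execute(sql).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, user_id INTEGER, createdTS TEXT)"
    )
    # Two runs of the populate process: one row per user per run.
    conn.executemany(
        "INSERT INTO my_table (id, user_id, createdTS) VALUES (?, ?, ?)",
        [
            (1, 1, "2019-07-25 10:00:00"),
            (2, 2, "2019-07-25 10:00:01"),
            (3, 1, "2019-07-25 15:00:00"),
            (4, 2, "2019-07-25 15:00:01"),
        ],
    )
    return newest_per_user(conn)


if __name__ == "__main__":
    print(demo())
```

Note that RANK() returns ties, so two rows for the same user with an identical createdTS would both come back with rank 1.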
Some background first. We have a MySQL database with a "live currency" table. We use an API to pull the latest currency values for different currencies, every 5 seconds. The table currently has over 8 million rows.
Structure of the table is as follows:
id (INT 11 PK)
currency (VARCHAR 8)
value (DECIMAL)
timestamp (TIMESTAMP)
Now we are trying to use this table to plot the data on a graph. We are going to have various graphs, e.g. Live, Hourly, Daily, Weekly, Monthly.
I'm having a bit of trouble with the query. Using the Weekly graph as an example, I want to output data from the last 7 days, in 15 minute intervals. So here is how I have attempted it:
SELECT *
FROM currency_data
WHERE ((currency = 'GBP')) AND (timestamp > '2017-09-20 12:29:09')
GROUP BY UNIX_TIMESTAMP(timestamp) DIV (15 * 60)
ORDER BY id DESC
This outputs the data I want, but the query is extremely slow. I have a feeling the GROUP BY clause is the cause.
Also BTW I have switched off the sql mode 'ONLY_FULL_GROUP_BY' as it was forcing me to group by id as well, which was returning incorrect results.
Does anyone know of a better way of doing this query which will reduce the time taken to run the query?
You may want to create summary tables for each of the graphs you want to do.
If your data really is coming every 5 seconds, you can attempt something like:
SELECT *
FROM currency_data cd
WHERE currency = 'GBP' AND
timestamp > '2017-09-20 12:29:09' AND
UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
ORDER BY id DESC;
For both this query and your original query, you want an index on currency_data(currency, timestamp, id).
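The summary-table suggestion could look something like the sketch below: raw rows are rolled up once into 15-minute buckets, and the graphs read from the much smaller rollup. It uses Python's sqlite3 module as a stand-in for MySQL; the table name currency_15min, the choice of AVG(value) as the per-bucket figure, and the sample rows are all assumptions. In MySQL the bucket expression would be UNIX_TIMESTAMP(timestamp) DIV (15 * 60).

```python
import sqlite3


def build_summary(conn):
    """Create and fill a hypothetical 15-minute rollup of currency_data."""
    conn.execute(
        """
        CREATE TABLE currency_15min (
            currency TEXT,
            bucket_start INTEGER,  -- epoch seconds at the start of the bucket
            avg_value REAL,
            PRIMARY KEY (currency, bucket_start)
        )
        """
    )
    conn.execute(
        """
        INSERT INTO currency_15min (currency, bucket_start, avg_value)
        SELECT currency,
               (CAST(strftime('%s', timestamp) AS INTEGER) / 900) * 900 AS bucket_start,
               AVG(value)
        FROM currency_data
        GROUP BY currency, bucket_start
        """
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE currency_data "
        "(id INTEGER PRIMARY KEY, currency TEXT, value REAL, timestamp TEXT)"
    )
    conn.executemany(
        "INSERT INTO currency_data (currency, value, timestamp) VALUES (?, ?, ?)",
        [
            ("GBP", 1.0, "2017-09-21 12:00:00"),
            ("GBP", 2.0, "2017-09-21 12:00:05"),
            ("GBP", 3.0, "2017-09-21 12:14:55"),
            ("GBP", 4.0, "2017-09-21 12:15:00"),
        ],
    )
    build_summary(conn)
    # Read the rollup back with a human-readable bucket start.
    return conn.execute(
        "SELECT currency, datetime(bucket_start, 'unixepoch'), avg_value "
        "FROM currency_15min ORDER BY currency, bucket_start"
    ).fetchall()


if __name__ == "__main__":
    print(demo())
```

In practice the rollup would be refreshed incrementally (for example by a scheduled job that only processes new rows), so the weekly graph never has to group millions of raw rows at request time.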
I'm having a hard time wrapping my head around how to combine these queries together.
Here is my db setup:
Table 1:
querylog - log of all api calls application makes
- id (AI)
- url (VARCHAR)
- when (DATETIME)
Table 2:
trades - data returned from api calls
- tid (trade ID, unique)
- price
- date (datetime) - when trade occurred, not when inserted
- etc
I am trying to get a count of records added in the last hour.
I can use this sql statement to get the first trade TID added in the last hour (pre-modified url is in the form: https://API_INFO_HERE/trades?id=TID_HERE)
SELECT SUBSTRING(url,50, 50) as oldest from querylog where url like 'https://API_INFO_HERE/trades?%' and `when`>= DATE_SUB(NOW(),INTERVAL 1 HOUR) ORDER BY `querylog`.`when` DESC LIMIT 1
Then, to get the count, all I need is:
SELECT count(*) FROM `trades` where tid > VALUE_FROM_PREVIOUS_QUERY
If anyone could help me combine the queries I would be very appreciative!
I think you should do something like this:
SELECT count(*) FROM `trades`
WHERE tid > (SELECT SUBSTRING(url,50, 50) as oldest from querylog where url like 'https://API_INFO_HERE/trades?%' and `when`>= DATE_SUB(NOW(),INTERVAL 1 HOUR) ORDER BY `querylog`.`when` DESC LIMIT 1)
Keep in mind, though, that it may be slow if there's no index on substring(url,50,50).
You should probably consider logging the request time in the trades table as well or, alternatively, adding an appropriate index on querylog to speed things up.
I have a table containing access logs. I want to know how many accesses to resource_id '123' occurred in each hour in a 24 hour day.
My first thought for retrieving this info is just looping through each hour and querying the table in each loop with something like... and time like '$hour:%', given that the time field holds data in the format 15:47:55.
Is there a way I can group by the hours and retrieve each hour and the number of rows within each hour in a single query?
Database is MySQL, language is PHP.
SELECT HOUR(MyDatetimeColumn) AS h, COUNT(*)
FROM MyTable
GROUP BY h;
You can use the function HOUR to get the hour out of the time. Then you should be able to group by that.
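As a sketch of the same grouping with the resource filter from the question added, the example below counts accesses per hour for one resource in a single query. It uses Python's sqlite3 module as a stand-in for MySQL, so the hour is taken from the first two characters of the HH:MM:SS string; in MySQL, HOUR(time) does this directly. The table name access_log and the sample rows are assumptions.

```python
import sqlite3


def hourly_counts(conn, resource_id):
    """Return (hour, access_count) pairs for one resource, ordered by hour."""
    sql = """
        SELECT CAST(substr(time, 1, 2) AS INTEGER) AS h, COUNT(*) AS n
        FROM access_log
        WHERE resource_id = ?
        GROUP BY h
        ORDER BY h
    """
    return conn.execute(sql, (resource_id,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE access_log (resource_id TEXT, time TEXT)")
    conn.executemany(
        "INSERT INTO access_log (resource_id, time) VALUES (?, ?)",
        [
            ("123", "15:47:55"),
            ("123", "15:02:10"),
            ("123", "09:00:00"),
            ("456", "15:30:00"),
        ],
    )
    return hourly_counts(conn, "123")


if __name__ == "__main__":
    print(demo())
```

Hours with no accesses simply do not appear in the result, so the calling code would need to fill in zeros if it wants all 24 rows.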