I have a table with three columns:
`id` int(11) NOT NULL auto_increment
`tm` int NOT NULL
`ip` varchar(16) NOT NULL DEFAULT '0.0.0.0'
I want to run a query that will check if the same IP was logged within a minute and then delete all but one of those entries.
For example I have the two rows below.
id=1 tm=1361886629 ip=192.168.0.1
id=2 tm=1361886630 ip=192.168.0.1
I would only like to keep one in the database.
I have read lots of other remove duplicate/partial duplicate entry questions but I'm looking for a way to compare the last two digits of the Unix/epoch time and delete all but one based on that plus the IP.
Any help is much appreciated.
You can use CAST in MySQL to drop the last 2 digits:
SELECT CAST( tm AS CHAR( 8 ) )
This keeps only the first 8 digits of the 10-digit timestamp, so rows with the same result fall in the same 100-second window (not exactly one minute), which lets you find duplicates.
If you only want to know what the last 2 digits are:
SELECT RIGHT(CAST( tm AS CHAR( 10 ) ), 2)
This selects only the last two digits of each timestamp.
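To go from finding the duplicates to deleting them, one option is a self-join that buckets the timestamp by integer division instead of string casting. This is only a sketch: the table name `ip_log` is an assumption (the question never names the table), and the buckets are fixed 60-second windows, so two rows one second apart can still land in different buckets.

```sql
-- Hypothetical table name: ip_log (columns id, tm, ip as in the question).
-- Keeps the row with the lowest id per (ip, 60-second bucket), deletes the rest.
DELETE newer
FROM ip_log AS newer
JOIN ip_log AS older
  ON older.ip = newer.ip
 AND older.tm DIV 60 = newer.tm DIV 60   -- same fixed one-minute bucket
 AND older.id < newer.id;                -- an earlier row exists, so drop this one
```

Running the same join as a SELECT first is a cheap way to check which rows would be removed.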
I've got a MySQL database with around 1,000,000 DATETIME entries (format: 2022-01-31 19:45:39).
There are 100 entries per day. I need a query that returns the first DATETIME entry of each day, for the last 30 days, where the column number = 1234.
In PostgreSQL I would use a lateral join, but how do I do this in MySQL?
Schema:
id int(11) auto_increment
datetime datetime
number varchar(255)
Can someone give me a full working example? I can't get it working.
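One way this could be approached without a lateral join is a plain GROUP BY on the calendar day. A minimal sketch, assuming the table is called `entries` (the name is not given in the question) and that "first entry" means the earliest datetime value of each day:

```sql
-- Hypothetical table name: entries (columns id, datetime, number as in the schema).
SELECT DATE(`datetime`) AS entry_day,
       MIN(`datetime`)  AS first_entry
FROM entries
WHERE `number` = '1234'                            -- number is a varchar column
  AND `datetime` >= CURDATE() - INTERVAL 30 DAY    -- last 30 days
GROUP BY entry_day
ORDER BY entry_day;
```

A composite index on (`number`, `datetime`) would probably help here, since both the filter and the MIN can then be served from the index.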
SELECT MIN(classification) AS classification
     , MIN(START) AS START
     , MAX(next_start) AS END
     , SUM(duration) AS seconds
FROM ( SELECT *
            , CASE WHEN (duration < 20*60) THEN CASE WHEN (duration = -1) THEN 'current_session' ELSE 'session' END
                   ELSE 'break'
              END AS classification
            , CASE WHEN (duration > 20*60) THEN ((@sum_grouping := @sum_grouping + 2) - 1)
                   ELSE @sum_grouping
              END AS sum_grouping
       FROM ( SELECT *
                   , CASE WHEN next_start IS NOT NULL THEN TIMESTAMPDIFF(SECOND, START, next_start) ELSE -1 END AS duration
              FROM ( SELECT id, studentId, START
                          , (SELECT MIN(START)
                             FROM attempt AS sub
                             WHERE sub.studentId = main.studentId
                               AND sub.start > main.start
                            ) AS next_start
                     FROM attempt AS main
                     WHERE main.studentId = 605
                     ORDER BY START
                   ) AS t1
            ) AS t2
       WHERE duration != 0
     ) AS t3
GROUP BY sum_grouping
ORDER BY START DESC, END DESC
Explanation and goal
The attempt table records a student's attempt at some activity, during a session. If two attempts are less than 20 minutes apart, we consider those to be the same session. If they are more than 20 minutes apart, we assume they took a break.
My goal with this query is to take all of the attempts and condense them down in a list of sessions and breaks, with the start time of each session, the end time (defined as the start of the subsequent session), and how long the session was. The classification is whether it is a session, a break, or the current session.
The above query does all of that, but is too slow. How can I improve the performance?
How the current query works
The innermost queries select an attempt's start time and the subsequent attempt's start time, along with the duration between those values.
Then, the @sum_grouping and sum_grouping are used to split the attempts into the sessions and breaks. @sum_grouping is only ever increased when an attempt is more than 20 minutes long (i.e. a break), and it is always increased by 2. However, sum_grouping is set to a value of one less than that for that "break". If an attempt is less than 20 minutes long, then the current @sum_grouping value is used, without modification. As a result, all breaks are distinct odd values, and all sessions (whether of 1 or more attempt) end up as distinct even numbers. This allows the GROUP BY portion to correctly separate the attempts into sessions and breaks.
Example:
Attempt type   @sum_grouping   sum_grouping
non-break      0               0
non-break      0               0
break          2               1
break          4               3
non-break      4               4
break          6               5
As you can see, all the breaks will be grouped by sum_grouping separately with distinct odd values and all the non-breaks will be grouped together as sessions with the even values.
The MIN(classification) simply forces 'current_session' to be returned when both 'session' and 'current_session' are present within a grouped row, because 'current_session' sorts first.
OUTPUT OF SHOW CREATE TABLE attempt
CREATE TABLE `attempt` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `caseId` int(11) NOT NULL DEFAULT '0',
  `eventId` int(11) NOT NULL DEFAULT '0',
  `studentId` int(11) NOT NULL DEFAULT '0',
  `activeUuid` char(36) NOT NULL,
  `start` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `end` timestamp NULL DEFAULT NULL,
  `outcome` float DEFAULT NULL,
  `response` varchar(5000) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `activeUuid` (`activeUuid`),
  KEY `caseId` (`caseId`,`activeUuid`),
  KEY `end` (`end`),
  KEY `start` (`start`),
  KEY `studentId` (`studentId`),
  KEY `attempt_idx_studentid_stat_id` (`studentId`,`start`,`id`),
  KEY `attempt_idx_studentid_stat` (`studentId`,`start`)
) ENGINE=MyISAM AUTO_INCREMENT=298382 DEFAULT CHARSET=latin1
(This is not a proper Answer, but here goes anyway.)
Try not to nest 'derived' tables.
I see a lot of syntax errors.
Move from MyISAM to InnoDB.
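INDEX(a, b) handles situations where you need INDEX(a), so DROP the latter. Here, KEY studentId (studentId) and KEY attempt_idx_studentid_stat (studentId,start) are both prefixes of attempt_idx_studentid_stat_id (studentId,start,id), so they are redundant.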
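As a rough illustration of what avoiding nested derived tables could look like, here is a sketch that assumes MySQL 8.0 or later (window functions and CTEs do not exist in older versions, and the table shown may well be running on one). The names `gaps`, `flagged`, `is_break`, `breaks_so_far`, `session_start` and `session_end` are made up for this example. It is not a drop-in equivalent of the original query: attempts with identical start times and a gap of exactly 20 minutes are treated slightly differently.

```sql
WITH gaps AS (
    -- Pair each attempt with the start of the next one, without a correlated subquery.
    SELECT `start`,
           LEAD(`start`) OVER (ORDER BY `start`) AS next_start
    FROM attempt
    WHERE studentId = 605
),
flagged AS (
    SELECT `start`,
           next_start,
           COALESCE(TIMESTAMPDIFF(SECOND, `start`, next_start), -1) AS duration,
           -- 1 when the gap to the next attempt exceeds 20 minutes
           COALESCE(TIMESTAMPDIFF(SECOND, `start`, next_start), -1) > 20*60 AS is_break,
           -- running count of breaks; replaces the @sum_grouping user variable
           SUM(COALESCE(TIMESTAMPDIFF(SECOND, `start`, next_start), -1) > 20*60)
               OVER (ORDER BY `start`) AS breaks_so_far
    FROM gaps
)
SELECT CASE WHEN is_break THEN 'break'
            WHEN MIN(duration) = -1 THEN 'current_session'
            ELSE 'session'
       END AS classification,
       MIN(`start`)    AS session_start,
       MAX(next_start) AS session_end,
       SUM(duration)   AS seconds
FROM flagged
WHERE duration <> 0
GROUP BY is_break, breaks_so_far
ORDER BY session_start DESC;
```

Each break gets its own (is_break, breaks_so_far) pair, while consecutive non-break attempts share one, which mirrors the odd/even grouping trick described in the question.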
I have a table containing thousands of records, each representing the temperature of a room at a given moment. Up to now I have been rendering a client-side graph of the temperature with jQuery. However, as the number of records increases, it makes little sense to send so much data to the view if it cannot represent them all in a single graph.
I would like to know whether a single MySQL query can return one out of every n records in the table. If so, I think I could get a representative sample of the temperatures measured over a given period of time.
Any ideas? Thanks in advance.
Edit: add table structure.
CREATE TABLE IF NOT EXISTS `temperature` (
`nid` int(10) unsigned NOT NULL COMMENT 'Node identifier',
`temperature` float unsigned NOT NULL COMMENT 'Temperature in Celsius degrees',
`timestamp` int(10) unsigned NOT NULL COMMENT 'Unix timestamp of the temperature record',
PRIMARY KEY (`nid`,`timestamp`)
)
You could do this, where the subquery is your query, and you add a row number to it:
SET @rows = 0;
SELECT * FROM (
    SELECT @rows := @rows + 1 AS rowNumber, nid, temperature, `timestamp`
    FROM temperature
) yourQuery
WHERE MOD(rowNumber, 5) = 0
MOD picks every 5th row (the 5th, then the 10th, 15th, and so on); the 5 here is your n.
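If the server happens to be MySQL 8.0 or later (an assumption, since the question does not state a version), the same idea can be written with ROW_NUMBER() instead of a user variable. This sketch numbers the rows per node in timestamp order, which is also an assumption about what "every nth record" should mean; the alias names `numbered` and `rn` are made up here.

```sql
SELECT nid, temperature, `timestamp`
FROM (
    SELECT nid, temperature, `timestamp`,
           -- number each node's readings from oldest to newest
           ROW_NUMBER() OVER (PARTITION BY nid ORDER BY `timestamp`) AS rn
    FROM temperature
) AS numbered
WHERE MOD(rn, 5) = 0;   -- keep every 5th reading; 5 is the n
```

The explicit ORDER BY makes the sample deterministic, which the user-variable version does not guarantee.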
Not really sure what you're asking, but you have multiple options.
You can limit your results to n rows (n being the number of temperatures you want to display)
with a simple query that has a LIMIT at the end:
select * from tablename limit 1000
You could use a time/date constraint so that you display only the results of the last n days.
Here is an example that uses date functions. The following query selects all rows with a date_col value from within the last 30 days:
mysql> SELECT something FROM tbl_name
-> WHERE DATE_SUB(CURDATE(),INTERVAL 30 DAY) <= date_col;
You could select an average temperature of a certain period, the shorter the period the more results you'll get. You can group by date, yearweek, month etc. to "create the periods"
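For the averaging option, a minimal sketch against the `temperature` table from the question might look like this. It assumes hourly periods are wanted; the alias names `hour_start` and `avg_temperature` are made up for the example.

```sql
SELECT nid,
       -- round the Unix timestamp down to the start of its hour
       FROM_UNIXTIME((`timestamp` DIV 3600) * 3600) AS hour_start,
       AVG(temperature) AS avg_temperature
FROM temperature
GROUP BY nid, hour_start
ORDER BY nid, hour_start;
```

Changing 3600 to 86400 would give one averaged point per day (aligned to UTC day boundaries), and so on for other period lengths.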
Clearly, I am missing the forest for the trees...I am missing something obvious here!
Scenario:
I've a typical table asset_locator with multiple fields:
id, int(11) PRIMARY
logref, int(11)
unitno, int(11)
tunits, int(11)
operator, varchar(24)
lineid, varchar(24)
uniqueid, varchar(64)
timestamp, timestamp
My current challenge is to SELECT records from this table based on a date range. More specifically, a date range using the MAX(timestamp) field.
So...when selecting I need to start with the latest timestamp value and go back 3 days.
EX: I select all records WHERE the lineid = 'xyz' and going back 3 days from the latest timestamp. Below is an actual example (of the dozens) I've been trying to run.
MySQL returns a single row with all NULL values for the following:
SELECT id, logref, unitno, tunits, operator, lineid,
uniqueid, timestamp, MAX( timestamp ) AS maxdate
FROM asset_locator
WHERE 'maxdate' < DATE_ADD('maxdate',INTERVAL -3 DAY)
ORDER BY uniqueid DESC
There MUST be something obvious I am missing. If anyone has any ideas, please share.
Many thanks!
MAX() is an aggregate function, which means your SELECT will always return one row containing the maximum value, unless you use GROUP BY, but it looks like that's not what you need.
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_max
If you need all the entries between MAX(timestamp) and 3 days before, then you need to do a subselect to obtain the max date, and after that use it in the search condition. Like this:
SELECT id, logref, unitno, tunits, operator, lineid, uniqueid, timestamp
FROM asset_locator
WHERE timestamp >= DATE_ADD( (SELECT MAX(timestamp) FROM asset_locator), INTERVAL -3 DAY)
It will still run efficiently as long as you have an index defined on the timestamp column.
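The question also mentions filtering on lineid = 'xyz'. A possible variant with that filter is sketched here, under the assumption that "latest timestamp" means the latest one for that same lineid rather than for the whole table:

```sql
SELECT id, logref, unitno, tunits, operator, lineid, uniqueid, `timestamp`
FROM asset_locator
WHERE lineid = 'xyz'
  AND `timestamp` >= DATE_SUB(
        -- latest timestamp recorded for this line only (assumption)
        (SELECT MAX(`timestamp`) FROM asset_locator WHERE lineid = 'xyz'),
        INTERVAL 3 DAY)
ORDER BY `timestamp` DESC;
```

With this filter, a composite index on (lineid, `timestamp`) is likely a better fit than an index on the timestamp alone.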
Note: In your example
WHERE 'maxdate' < DATE_ADD('maxdate',INTERVAL -3 DAY)
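Here you are actually comparing the string 'maxdate' rather than the column alias, because of the quotes. DATE_ADD('maxdate', ...) is not a valid date and yields NULL, so the condition is never true and no rows match. The aggregate over that empty set is why you were seeing a single row with NULL in all fields.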
Edit: Oops, forgot the "FROM asset_locator" in query. It got lost at some point when writing the answer :)
My question is: how can I select all 24 hours of a day as data in a SELECT?
Is there a more polished way to do that than this?
select 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
from dual
My target DB is MySQL, but an SQL-standard solution is appreciated!
MySQL before 8.0 doesn't have recursive functionality, so you're left with using the NUMBERS table trick -
Create a table that only holds incrementing numbers - easy to do using an auto_increment:
DROP TABLE IF EXISTS `example`.`numbers`;
CREATE TABLE `example`.`numbers` (
`id` int(10) unsigned NOT NULL auto_increment,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Populate the table using:
INSERT INTO numbers
(id)
VALUES
(NULL)
...for as many values as you need. In this case, the INSERT statement needs to be run at least 25 times.
Use DATE_ADD to construct a list of hours, increasing based on the NUMBERS.id value:
SELECT x.dt
FROM (SELECT TIME(DATE_ADD('2010-01-01', INTERVAL (n.id - 1) HOUR)) AS dt
FROM numbers n
WHERE DATE_ADD('2010-01-01', INTERVAL (n.id - 1) HOUR) <= '2010-01-02' ) x
Why Numbers, not Dates?
Simple - dates can be generated based on the number, like in the example I provided. It also means using a single table, vs say one per data type.
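On MySQL 8.0 and later, recursive common table expressions are available, so the list of hours can also be produced without a helper table. A minimal sketch, assuming that version; the CTE name `hours` and the column `h` are made up for this example.

```sql
WITH RECURSIVE hours (h) AS (
    SELECT 0                                    -- first hour of the day
    UNION ALL
    SELECT h + 1 FROM hours WHERE h < 23        -- stop after hour 23
)
SELECT h,
       MAKETIME(h, 0, 0) AS dt                  -- 00:00:00 through 23:00:00
FROM hours;
```

This returns exactly 24 rows, one per hour, and WITH RECURSIVE is part of standard SQL, so the CTE itself ports to other databases (MAKETIME is MySQL-specific).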
SELECT * from dual WHERE field BETWEEN 0 AND 24