mysql view with group by - performance problem - mysql

I have a table which collects data on web page performance. There are multiple machines testing multiple sites at 10-minute intervals, so currently I have about 700 000 rows (920 MB), with +/- 50 000 new rows daily.
Table source:
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
CREATE TABLE `http_perf_raw_log` (
`run_dt` int(11) DEFAULT NULL,
`dataset` varchar(64) DEFAULT NULL,
`runner` varchar(64) DEFAULT NULL,
`site` varchar(128) DEFAULT NULL,
`machine` varchar(32) DEFAULT NULL,
`called_url` varchar(1024) DEFAULT NULL,
`method` varchar(8) DEFAULT NULL,
`url` varchar(1024) DEFAULT NULL,
`content_type` varchar(64) DEFAULT NULL,
`http_code` int(11) DEFAULT NULL,
`header_size` int(11) DEFAULT NULL,
`request_size` int(11) DEFAULT NULL,
`filetime` int(11) DEFAULT NULL,
`ssl_verify_result` int(11) DEFAULT NULL,
`redirect_count` int(11) DEFAULT NULL,
`total_time` decimal(6,4) DEFAULT NULL,
`namelookup_time` decimal(6,4) DEFAULT NULL,
`connect_time` decimal(6,4) DEFAULT NULL,
`pretransfer_time` decimal(6,4) DEFAULT NULL,
`starttransfer_time` decimal(6,4) DEFAULT NULL,
`redirect_time` decimal(6,4) DEFAULT NULL,
`size_upload` int(11) DEFAULT NULL,
`size_download` int(11) DEFAULT NULL,
`speed_download` int(11) DEFAULT NULL,
`speed_upload` int(11) DEFAULT NULL,
`download_content_length` int(11) DEFAULT NULL,
`upload_content_length` int(11) DEFAULT NULL,
`certinfo` varchar(1024) DEFAULT NULL,
`request_header` varchar(1024) DEFAULT NULL,
`return_content` varchar(4096) DEFAULT NULL,
`return_headers` varchar(2048) DEFAULT NULL,
KEY `run_dt_idx` (`run_dt`),
KEY `dataset_idx` (`dataset`),
KEY `runner_idx` (`runner`),
KEY `site_idx` (`site`),
KEY `machine_idx` (`machine`),
KEY `total_time_idx` (`total_time`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
For aggregating stats (with 1 hour resolution), I created a view:
CREATE OR REPLACE VIEW http_perf_stats (dataset, runner, site, machine, day, hour, calls, total_time, namelookup_time, connect_time, pretransfer_time, starttransfer_time, size_download) AS
SELECT dataset, runner, site, machine,
DATE_FORMAT(run_dt, '%Y-%m-%d') AS day,
DATE_FORMAT(run_dt, '%k') AS hour,
COUNT(*) AS calls,
SUM(total_time),
SUM(namelookup_time),
SUM(connect_time),
SUM(pretransfer_time),
SUM(starttransfer_time),
SUM(size_download)
FROM http_perf_raw_log GROUP BY runner, site, machine, day, hour ORDER BY `day` DESC
But the performance of VIEW (and underlying SELECT) is terrible - takes about 4 seconds.
So, my questions:
1. Is using GROUP BY in a VIEW a good idea at all? And if not, what is a better alternative?
2. Is there (I imagine yes, I am not an SQL expert :/) a way to optimize this SELECT, by changing the query or the structure of http_perf_raw_log?

Remove the GROUP BY from the VIEW and use it in the SELECT that calls the VIEW.
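A minimal sketch of that split, assuming a plain pass-through view (the view name http_perf_hourly and the example filter value are mine, not from the original schema):
CREATE OR REPLACE VIEW http_perf_hourly AS
SELECT dataset, runner, site, machine,
       DATE_FORMAT(run_dt, '%Y-%m-%d') AS day,
       DATE_FORMAT(run_dt, '%k') AS hour,
       total_time, namelookup_time, connect_time,
       pretransfer_time, starttransfer_time, size_download
FROM http_perf_raw_log;

SELECT runner, site, machine, day, hour,
       COUNT(*) AS calls, SUM(total_time) AS total_time
FROM http_perf_hourly
WHERE day = '2013-01-15'   -- example filter, now applied before aggregation
GROUP BY runner, site, machine, day, hour;
Because the view itself has no GROUP BY, a WHERE clause on the caller can be merged into the view's SELECT and use the indexes on http_perf_raw_log, instead of aggregating the whole table first.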

In this case it might be a good idea to only create statistics periodically (once per hour for example).
I'd do that as follows. Run the following code once to create a table structure.
CREATE TABLE http_perf_stats AS
SELECT dataset, runner, site, machine,
DATE_FORMAT(run_dt, '%Y-%m-%d') AS day,
DATE_FORMAT(run_dt, '%k') AS hour,
COUNT(*) AS calls,
SUM(total_time),
SUM(namelookup_time),
SUM(connect_time),
SUM(pretransfer_time),
SUM(starttransfer_time),
SUM(size_download)
FROM http_perf_raw_log
GROUP BY runner, site, machine, day, hour
ORDER BY `day` DESC
Make some modifications like changing field types, default values, adding a primary key, and perhaps add some indexes so that you can access and query this table in a fast way.
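For example (the index names here are just placeholders; pick columns that match how you actually query the table):
ALTER TABLE http_perf_stats
  ADD INDEX day_hour_idx (day, hour),
  ADD INDEX site_idx (site);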
From then on, update the table like this:
START TRANSACTION;
DELETE FROM http_perf_stats;
INSERT INTO http_perf_stats
SELECT dataset, runner, site, machine,
DATE_FORMAT(run_dt, '%Y-%m-%d') AS day,
DATE_FORMAT(run_dt, '%k') AS hour,
COUNT(*) AS calls,
SUM(total_time),
SUM(namelookup_time),
SUM(connect_time),
SUM(pretransfer_time),
SUM(starttransfer_time),
SUM(size_download)
FROM http_perf_raw_log
GROUP BY runner, site, machine, day, hour
ORDER BY `day` DESC;
COMMIT;
Several ways to do this:
Create a MySQL event (see http://dev.mysql.com/doc/refman/5.1/en/create-event.html); that's how I would do it, and there's a sketch after this list
Create a cron job (Unix-flavoured systems) or a Windows scheduler task
Do a "lazy" update. When somebody requests this list, run the code above if the last time it was ran was longer than x minutes/hours ago. That way it works more like a cache. Slow on the first request, fast after. But you won't slow the server down unless somebody is interested in this.

The view is just another SELECT query, abstracted away to make querying the result set easier. If the underlying SELECT is slow, so is the view. Reading through and summing up nearly 1 GB of data in four seconds doesn't sound slow at all to me.

Related

Speed Up A Large Insert From Select Query With Multiple Joins

I'm trying to denormalize a few MySQL tables I have into a new table that I can use to speed up some complex queries with lots of business logic. The problem I'm having is that there are 2.3 million records I need to add to the new table, and to do that I need to pull data from several tables and do a few conversions too. Here's my query (with names changed):
INSERT INTO database_name.log_set_logs
(offload_date, vehicle, jurisdiction, baselog_path, path,
baselog_index_guid, new_location, log_set_name, index_guid)
(
select STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d') as offload_date,
logset_logs.vehicle, jurisdiction, baselog_path, path,
baselog_trees.baselog_index_guid, new_location, logset_logs.log_set_name,
logset_logs.index_guid
from
(
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) as offload_date,
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle,
SUBSTRING_INDEX(path, '/', 9) as baselog_path, index_guid,
path, log_set_name
FROM database_name.baselog_and_amendment_guid_to_path_mappings
) logset_logs
left join database_name.log_trees baselog_trees
ON baselog_trees.original_location = logset_logs.baselog_path
left join database_name.baselog_offload_location location
ON location.baselog_index_guid = baselog_trees.baselog_index_guid);
The query itself works, because I was able to run it using a filter on log_set_name. However, that filter's condition only covers less than 1% of the total records, because one of the values of log_set_name has 2.2 million records in it, the majority of the table, so there is nothing else I can use to break this query up into smaller chunks as far as I can see. The problem is that the query takes too long to run on the remaining 2.2 million records: it times out after a few hours, the transaction is rolled back, and nothing is added to the new table for those 2.2 million records. Only the 0.1 million records could be processed, and only because I could add a filter that said where log_set_name != 'value with the 2.2 million records'.
Is there a way to make this query more performant? Am I trying to do too many joins at once, and should I perhaps populate the row's columns in their own individual queries? Or is there some way I can page this type of query so that MySQL executes it in batches? I already got rid of all my indexes on the log_set_logs table because I read that those slow down inserts. I also jacked my RDS instance up to a db.r4.4xlarge write node, and since I'm using MySQL Workbench I increased all of its timeout values to their maximums, giving them all nines. All three of these steps helped and were necessary for me to get the 1% of records into the new table, but it still wasn't enough to get the 2.2 million records in without timing out. I'd appreciate any insights, as I'm not adept at this type of bulk insert from a select.
CREATE TABLE `log_set_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`purged` tinyint(1) NOT NULL DEFAUL,
`baselog_path` text,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`new_location` text,
`offload_date` date NOT NULL,
`jurisdiction` varchar(20) DEFAULT NULL,
`vehicle` varchar(20) DEFAULT NULL,
`index_guid` varchar(36) NOT NULL,
`path` text NOT NULL,
`log_set_name` varchar(60) NOT NULL,
`protected_by_retention_condition_1` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_2` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_3` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_4` tinyint(1) NOT NULL DEFAULT '1',
`general_comments_about_this_log` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1736707 DEFAULT CHARSET=latin1;
CREATE TABLE `baselog_and_amendment_guid_to_path_mappings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`path` text NOT NULL,
`index_guid` varchar(36) NOT NULL,
`log_set_name` varchar(60) NOT NULL,
PRIMARY KEY (`id`),
KEY `log_set_name_index` (`log_set_name`),
KEY `path_index` (`path`(42))
) ENGINE=InnoDB AUTO_INCREMENT=2387821 DEFAULT CHARSET=latin1;
...
CREATE TABLE `baselog_offload_location` (
`baselog_index_guid` varchar(36) NOT NULL,
`jurisdiction` varchar(20) NOT NULL,
KEY `baselog_index` (`baselog_index_guid`),
KEY `jurisdiction` (`jurisdiction`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `log_trees` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`original_location` text NOT NULL, -- This is what I have to join everything on, and since it's text I cannot index it, and the largest value is above 255 characters so I cannot change it to a varchar and then index it either.
`new_location` text,
`distcp_returncode` int(11) DEFAULT NULL,
`distcp_job_id` text,
`distcp_stdout` text,
`distcp_stderr` text,
`validation_attempt` int(11) NOT NULL DEFAULT '0',
`validation_result` tinyint(1) NOT NULL DEFAULT '0',
`archived` tinyint(1) NOT NULL DEFAULT '0',
`archived_at` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dir_exists` tinyint(1) NOT NULL DEFAULT '0',
`random_guid` tinyint(1) NOT NULL DEFAULT '0',
`offload_date` date NOT NULL,
`vehicle` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `baselog_index_guid` (`baselog_index_guid`)
) ENGINE=InnoDB AUTO_INCREMENT=1028617 DEFAULT CHARSET=latin1;
baselog_offload_location has no PRIMARY KEY; what's up with that?
GUIDs/UUIDs can be terribly inefficient. A partial solution is to convert them to BINARY(16) to shrink them. More details here: http://mysql.rjweb.org/doc.php/uuid ; (MySQL 8.0 has similar functions.)
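A hedged sketch of what that conversion could look like on log_trees (the column name baselog_index_guid_bin is made up; UUID_TO_BIN needs MySQL 8.0, on older versions use UNHEX(REPLACE(guid, '-', '')) instead):
ALTER TABLE log_trees ADD COLUMN baselog_index_guid_bin BINARY(16);
UPDATE log_trees SET baselog_index_guid_bin = UUID_TO_BIN(baselog_index_guid);
ALTER TABLE log_trees ADD UNIQUE INDEX baselog_index_guid_bin_idx (baselog_index_guid_bin);
Both sides of any join would need the same BINARY(16) treatment so the columns stay comparable.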
It would probably be more efficient if you have a separate (optionally redundant) column for vehicle rather than needing to do
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle
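For instance, something along these lines (generated columns need MySQL 5.7+, and the 20-character width is only assumed to match the target table):
ALTER TABLE baselog_and_amendment_guid_to_path_mappings
  ADD COLUMN vehicle VARCHAR(20)
    AS (SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1)) STORED,
  ADD INDEX vehicle_idx (vehicle);
The big INSERT .. SELECT could then read the column directly instead of re-parsing path for every row.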
Why JOIN baselog_offload_location? There seems to be no reference to columns in that table. If there is one, be sure to qualify the columns so we know what is where. Preferably use short aliases.
The lack of an index on baselog_index_guid may be critical to performance.
Please provide EXPLAIN SELECT ... for the SELECT in your INSERT and for the original (slow) query.
SELECT MAX(LENGTH(original_location)) FROM .. -- to see if it really is too big to index. What version of MySQL are you using? The limit increased recently.
For the above item, we can talk about having a 'hash'.
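One hedged way to do that 'hash' (column and index names are made up; an MD5 is enough here because it is only used for equality matching, and generated columns need MySQL 5.7+):
ALTER TABLE log_trees
  ADD COLUMN original_location_md5 CHAR(32)
    AS (MD5(original_location)) STORED,
  ADD INDEX original_location_md5_idx (original_location_md5);
-- then join on the hash instead of the full TEXT value:
--   ON baselog_trees.original_location_md5 = MD5(logset_logs.baselog_path)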
"paging the query". I call it "chunking". See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . That talks about deleting, but it can be adapted to INSERT .. SELECT since you want to "chunk" the select. If you go with chunking, Javier's comment becomes moot. Your code would be chunking the selects, hence batching the inserts:
Loop:
INSERT .. SELECT .. -- of up to 1000 rows (see link)
End loop
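Fleshed out (and untested), chunking by the source table's id could look roughly like this; the procedure name and the 10 000-row batch size are arbitrary choices, not from the original post:
DELIMITER //
CREATE PROCEDURE copy_log_set_logs_in_chunks()
BEGIN
  DECLARE v_start INT DEFAULT 0;
  DECLARE v_max   INT;
  SELECT MAX(id) INTO v_max
    FROM database_name.baselog_and_amendment_guid_to_path_mappings;
  WHILE v_start <= v_max DO
    INSERT INTO database_name.log_set_logs
      (offload_date, vehicle, jurisdiction, baselog_path, path,
       baselog_index_guid, new_location, log_set_name, index_guid)
    SELECT STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d'),
           logset_logs.vehicle, location.jurisdiction, logset_logs.baselog_path,
           logset_logs.path, baselog_trees.baselog_index_guid,
           baselog_trees.new_location, logset_logs.log_set_name,
           logset_logs.index_guid
      FROM (SELECT id,
                   SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) AS offload_date,
                   SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) AS vehicle,
                   SUBSTRING_INDEX(path, '/', 9) AS baselog_path,
                   index_guid, path, log_set_name
              FROM database_name.baselog_and_amendment_guid_to_path_mappings
             WHERE id >  v_start
               AND id <= v_start + 10000) logset_logs
      LEFT JOIN database_name.log_trees baselog_trees
             ON baselog_trees.original_location = logset_logs.baselog_path
      LEFT JOIN database_name.baselog_offload_location location
             ON location.baselog_index_guid = baselog_trees.baselog_index_guid;
    SET v_start = v_start + 10000;
  END WHILE;
END //
DELIMITER ;

CALL copy_log_set_logs_in_chunks();
With autocommit on, each batch commits on its own, so a failure late in the run does not roll back everything already copied.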

Why is this query being logged as "not using indexes"?

For some reason my slow query log is reporting the following query as "not using indexes" and for the life of me I cannot understand why.
Here is the query:
update scheduletask
set active = 0
where nextrun < date_sub( now(), interval 2 minute )
and enabled = 1
and active = 1;
Here is the table:
CREATE TABLE `scheduletask` (
`scheduletaskid` int(11) NOT NULL AUTO_INCREMENT,
`schedulethreadid` int(11) NOT NULL,
`taskname` varchar(50) NOT NULL,
`taskpath` varchar(100) NOT NULL,
`tasknote` text,
`recur` int(11) NOT NULL,
`taskinterval` int(11) NOT NULL,
`lastrunstart` datetime NOT NULL,
`lastruncomplete` datetime NOT NULL,
`nextrun` datetime NOT NULL,
`active` int(11) NOT NULL,
`enabled` int(11) NOT NULL,
`creatorid` int(11) NOT NULL,
`editorid` int(11) NOT NULL,
`created` datetime NOT NULL,
`edited` datetime NOT NULL,
PRIMARY KEY (`scheduletaskid`),
UNIQUE KEY `Name` (`taskname`),
KEY `IDX_NEXTRUN` (`nextrun`)
) ENGINE=InnoDB AUTO_INCREMENT=34 DEFAULT CHARSET=latin1;
Add another index like this
KEY `IDX_COMB` (`nextrun`, `enabled`, `active`)
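That is, something like:
ALTER TABLE scheduletask
  ADD INDEX IDX_COMB (nextrun, enabled, active);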
I'm not sure how many rows your table has, but the following might apply as well:
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.)
try using the "explain" command in mysql.
http://dev.mysql.com/doc/refman/5.5/en/explain.html
I think EXPLAIN only works on SELECT statements; try:
explain select * from scheduletask where nextrun < date_sub( now(), interval 2 minute ) and enabled = 1 and active = 1;
Maybe if you use nextrun = ... it will match the key IDX_NEXTRUN. Your WHERE clause has to use one of your keys: scheduletaskid, taskname or nextrun.
Sorry for the short answer, but I don't have time to write a complete solution.
I believe you can fix your issue by saving date_sub( now(), interval 2 minute ) in a temporary variable before using it in the query, see here maybe: MySql How to set a local variable in an update statement (Syntax?).
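A quick sketch of that idea (@cutoff is just an illustrative name):
SET @cutoff := DATE_SUB(NOW(), INTERVAL 2 MINUTE);

UPDATE scheduletask
   SET active = 0
 WHERE nextrun < @cutoff
   AND enabled = 1
   AND active = 1;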

Would using a table with preprocessed date conditions improve efficiency?

I inherited a system that keeps track of temperature data related to time. I had asked a previous question about it: (What is the most efficient way to store a collection of temperature values into MYSQL?)
The system has a separate table that is used to keep track of dates (shown below). It contains several descriptor columns for the current day. I am hesitant about the benefits this kind of structure provides, as it seems to add extra weight to do the same thing that a few date functions and some math can do.
I was told by the creator of the system that it is better to select a range of data by using the DATE_ID with operators instead of a date function.
For example: let's say you want to collect all temperature information from June 1st, 2012 until the end of 2012. You could do the following:
1) Get the date ID that corresponds to June 1st, 2012. Let's say the ID is 23000.
2) Get the date ID that corresponds to the end of the year by using something like:
SELECT DATE_ID FROM DATE_REPRESENTATION WHERE DATE_ID >= 23000 AND END_YEAR_FLAG = 1 LIMIT 1;
Let's say that one is 23213.
3) Now we have two DATE_IDs, which we can just use like so:
SELECT * FROM temperature_readings WHERE DATE_ID BETWEEN 23000 AND 23213;
I feel that it might be better to properly index the 'temperature_readings' table and use date functions. For example:
SELECT ...... actual_date BETWEEN DATE('2012-06-01') AND LAST_DAY(DATE_ADD(DATE('2012-06-01'), INTERVAL (12 - MONTH(DATE('2012-06-01'))) MONTH))
Is there a better solution than what is currently in use, in terms of improving overall performance? In the previous question, I mention that the system uses the data to produce graphs and alerts based on data selected by date ranges (daily, weekly, monthly, yearly, or a range that a user can specify).
Current table:
CREATE TABLE `DATE_REPRESENTATION` (
`DATE_ID` int(10) NOT NULL,
`DAY_DATE` timestamp NULL DEFAULT NULL,
`DATE_DESC_LONG` varchar(18) DEFAULT NULL,
`MB_DATE_M_D_YYYY` varchar(18) DEFAULT NULL,
`WEEKDAY` varchar(9) DEFAULT NULL,
`WEEKDAY_ABBREV` char(4) DEFAULT NULL,
`WEEKDAY_NUM` decimal(1,0) DEFAULT NULL,
`WEEK` char(13) DEFAULT NULL,
`WEEK_NUM` decimal(4,0) DEFAULT NULL,
`WEEK_NUM_ABS` decimal(4,0) DEFAULT NULL,
`MONTH_LONG` varchar(9) DEFAULT NULL,
`MONTH_ABBREV` char(3) DEFAULT NULL,
`MONTH_NUM` decimal(2,0) DEFAULT NULL,
`MONTH_NUM_ABS` decimal(5,0) DEFAULT NULL,
`QUARTER` char(1) DEFAULT NULL,
`QUARTER_NUM` decimal(1,0) DEFAULT NULL,
`QUARTER_NUM_ABS` decimal(5,0) DEFAULT NULL,
`YEAR4` decimal(4,0) DEFAULT NULL,
`BEG_WEEK_FLAG` decimal(1,0) DEFAULT NULL,
`END_WEEK_FLAG` decimal(1,0) DEFAULT NULL,
`BEG_MONTH_FLAG` decimal(1,0) DEFAULT NULL,
`END_MONTH_FLAG` decimal(1,0) DEFAULT NULL,
`BEG_QUARTER_FLAG` decimal(1,0) DEFAULT NULL,
`END_QUARTER_FLAG` decimal(1,0) DEFAULT NULL,
`BEG_YEAR_FLAG` decimal(1,0) DEFAULT NULL,
`END_YEAR_FLAG` decimal(1,0) DEFAULT NULL,
PRIMARY KEY (`DATE_ID`),
UNIQUE KEY `DATEID_PK` (`DATE_ID`),
KEY `timeStampky` (`DAY_DATE`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
A DATE should be stored internally as just a number, so the only thing I can imagine is that the previous developer used to store dates as CHAR and suffered for it :)
When MySQL calculates the BETWEEN values, it will do that once, so there will be little math to be done. Add in the standard optimizations (preparing, parameterizing, indexing, etc), and you should be fine.
The formulas might be a little illegible. Maybe you could wrap them in a stored procedure, so you could call GET_LAST_DAY_OF_QUARTER(date) instead of putting all the date math in the SELECT.
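A possible sketch of such a wrapper (untested; declared DETERMINISTIC so it can be used freely inside queries):
DELIMITER //
CREATE FUNCTION GET_LAST_DAY_OF_QUARTER(d DATE) RETURNS DATE DETERMINISTIC
BEGIN
  -- last day of the quarter that d falls in
  RETURN LAST_DAY(MAKEDATE(YEAR(d), 1) + INTERVAL QUARTER(d) * 3 - 1 MONTH);
END //
DELIMITER ;

-- e.g. SELECT GET_LAST_DAY_OF_QUARTER('2012-06-01');   -- 2012-06-30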

MySQL Query Optimization on a Big Table

I am working with MySQL, querying a table that has 12 million records covering a year of the data in question.
The query has to select certain kinds of data (coin, enterprise, type, etc.) and then provide a daily average for certain fields of that data, so we can graph it afterwards.
The dream is to be able to do this in real time, with a response time under 10 seconds; however, at the moment it's not looking bright at all, as it takes between 4 and 6 minutes.
For example, one of the WHERE queries comes up with 150k records, split into about 500 per day, and we then average three fields (which are not in the WHERE clause) using AVG() and GROUP BY.
Now, on to the raw data. The query is:
SELECT
`Valorizacion`.`fecha`, AVG(tir) AS `tir`, AVG(tirBase) AS `tirBase`, AVG(precioPorcentajeValorPar) AS `precioPorcentajeValorPar`
FROM `Valorizacion` USE INDEX (ix_mercado2)
WHERE
(Valorizacion.fecha >= '2011-07-17' ) AND
(Valorizacion.fecha <= '2012-07-18' ) AND
(Valorizacion.plazoResidual >= 365 ) AND
(Valorizacion.plazoResidual <= 3650000 ) AND
(Valorizacion.idMoneda_cache IN ('UF')) AND
(Valorizacion.idEmisorFusionado_cache IN ('ABN AMRO','WATTS', ...)) AND
(Valorizacion.idTipoRA_cache IN ('BB', 'BE', 'BS', 'BU'))
GROUP BY `Valorizacion`.`fecha` ORDER BY `Valorizacion`.`fecha` asc;
248 rows in set (4 min 28.82 sec)
The index is made over all the where clause fields in the order
(fecha, idTipoRA_cache, idMoneda_cache, idEmisorFusionado_cache, plazoResidual)
Selecting the "where" registers, without using group by or AVG
149670 rows in set (58.77 sec)
And selecting the records, grouping, and just doing a COUNT(*) instead of an average takes
248 rows in set (35.15 sec)
This is probably because it doesn't need to go to disk to search for the data; it is obtained directly from the index.
So as far as it goes, I'm inclined to tell my boss "I'm sorry, but it can't be done", but before doing so I'm coming to you guys to ask whether you think there is something I could do to improve this. I think I could improve the index search time by moving the column with the biggest cardinality to the front of the index, and so on, but even after that, the time it takes to access the disk for each record and do the AVG seems like too much.
Any ideas?
-- EDIT, the table structure
CREATE TABLE `Valorizacion` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idInstrumento` int(11) NOT NULL,
`fecha` date NOT NULL,
`tir` decimal(10,4) DEFAULT NULL,
`tirBase` decimal(10,4) DEFAULT NULL,
`plazoResidual` double NOT NULL,
`duracionMacaulay` double DEFAULT NULL,
`duracionModACT365` double DEFAULT NULL,
`precioPorcentajeValorPar` decimal(20,15) DEFAULT NULL,
`valorPar` decimal(20,15) DEFAULT NULL,
`convexidad` decimal(20,15) DEFAULT NULL,
`volatilidad` decimal(20,15) DEFAULT NULL,
`montoCLP` double DEFAULT NULL,
`tirACT365` decimal(10,4) DEFAULT NULL,
`tipoVal` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`idEmisorFusionado_cache` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`idMoneda_cache` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`idClasificacionRA_cache` int(11) DEFAULT NULL,
`idTipoRA_cache` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`fechaPrepagable_cache` date DEFAULT NULL,
`tasaEmision_cache` decimal(10,4) DEFAULT NULL,
PRIMARY KEY (`id`,`fecha`),
KEY `ix_FechaNemo` (`fecha`,`idInstrumento`) USING BTREE,
KEY `ix_mercado_stackover` (`idMoneda_cache`,`idTipoRA_cache`,`idEmisorFusionado_cache`,`plazoResidual`)
) ENGINE=InnoDB AUTO_INCREMENT=12933194 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Selecting 150K records out of 12M records and performing aggregate functions on them will not be fast no matter what you try to do.
You are probably dealing with primarily historical data, as your sample query covers a year of data. A better approach may be to pre-calculate your daily averages and put them into separate tables. Then you may query those tables for reporting, graphs, etc. You will need to decide when and how to run such calculations so that you don't need to re-run them on the same data.
When your requirement is to do analysis and reporting on millions of historical records you need to consider a data warehouse approach http://en.wikipedia.org/wiki/Data_warehouse rather than a simple database approach.
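A hedged sketch of that pre-calculation, keyed by day plus the columns the reports filter on (the table and column names are invented, and it assumes the plazoResidual >= 365 cut-off is fixed so it can be baked in when the summary is built):
CREATE TABLE Valorizacion_daily (
  fecha date NOT NULL,
  idMoneda_cache varchar(20) NOT NULL DEFAULT '',
  idTipoRA_cache varchar(20) NOT NULL,
  idEmisorFusionado_cache varchar(20) NOT NULL DEFAULT '',
  registros int NOT NULL,
  tir_sum decimal(20,4),
  tirBase_sum decimal(20,4),
  precioPorcentajeValorPar_sum decimal(30,15),
  PRIMARY KEY (fecha, idMoneda_cache, idTipoRA_cache, idEmisorFusionado_cache)
) ENGINE=InnoDB;

INSERT INTO Valorizacion_daily
SELECT fecha, IFNULL(idMoneda_cache, ''), idTipoRA_cache, IFNULL(idEmisorFusionado_cache, ''),
       COUNT(*), SUM(tir), SUM(tirBase), SUM(precioPorcentajeValorPar)
FROM Valorizacion
WHERE plazoResidual >= 365
GROUP BY 1, 2, 3, 4;

-- reporting then re-aggregates the pre-summed values, e.g.:
SELECT fecha, SUM(tir_sum) / SUM(registros) AS tir
FROM Valorizacion_daily
WHERE fecha BETWEEN '2011-07-17' AND '2012-07-18'
  AND idMoneda_cache = 'UF'
GROUP BY fecha
ORDER BY fecha;
Storing sums and counts rather than averages keeps the re-aggregated daily figures exact when several summary rows are combined into one day.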

MySQL: Precision of a Datefield

Since I launched a podcast recently, I wanted to analyse our download data. But some clients seem to send multiple requests, so I wanted to count only one request per IP and User-Agent every 15 minutes. The best I could come up with is the following query, which counts one request per IP and User-Agent every hour. Any ideas how to solve that problem in MySQL?
SELECT episode, podcast, DATE_FORMAT(date, '%d.%m.%Y %k') as blurry_date, useragent, ip FROM downloaddata GROUP BY ip, useragent
This is the table I've got
CREATE TABLE `downloaddata` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`podcast` varchar(255) DEFAULT NULL,
`episode` int(4) DEFAULT NULL,
`source` varchar(255) DEFAULT NULL,
`useragent` varchar(255) DEFAULT NULL,
`referer` varchar(255) DEFAULT NULL,
`filetype` varchar(15) DEFAULT NULL,
`ip` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=216 DEFAULT CHARSET=utf8;
Personally I'd recommend collecting every request, and then only taking one every 15 minutes with a distinct query, or perhaps counting the number every 15 minutes.
If you are determined to throw data away so it can never be analysed, though:
Quick and simple is to store just the date and have an int column for the 15-minute period,
Hour part of current time * 4 + Minute part / 15
DatePart-style functions are what you want to look up. The thing is, each time you want to record a request, you'll have to check whether that IP and User-Agent already appear in the current 15-minute period. Extra work, extra complexity and less / lower-quality data...
MINUTE(date)/15 will give you the quarter hour (0-3). Ensure that this value together with the date (down to the hour) is unique, or ensure that UNIX_TIMESTAMP(date)/(15*60) is unique.
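With the table as it is, a sketch of the 15-minute version of your query (it leans on MySQL's loose GROUP BY handling the same way your original does):
SELECT episode, podcast,
       FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(`date`) / (15 * 60)) * 15 * 60) AS blurry_date,
       useragent, ip
FROM downloaddata
GROUP BY ip, useragent, FLOOR(UNIX_TIMESTAMP(`date`) / (15 * 60));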