I have added advertisements to my website, and they have quite a few conditions to meet before being delivered to a browsing user. Here's a detailed explanation:
These are the fields that require explaining:
start is by default '0000-00-00' and indicates whether the ad has been paid yet. When an ad payment is accepted, start is set to the following day, or to any date the customer chooses.
impressions holds the remaining impressions of the advertisement
impressions_total and impressions_perday are self-explanatory
and the other fields used in the query are just fields that validate whether the user falls within the advertisement's target audience
An advertisement has to be paid for before it displays at all. However, it can be set to start on a future date, so the start value may be set while the ad should not show up before that date arrives. Then, since customers can limit impressions per day, I need to pick only advertisements that still have enough impressions for the day in progress. For example, if an advertisement started on 30/08/2013 with 10,000 impressions and 2,000 impressions per day, it shouldn't be able to show up today (31/08/2013) if it has fewer than 6,000 impressions remaining, because it's the second day of the campaign. On the other hand, if the term period is, say, 5 days and 5 days have passed, the advertisement has to be shown regardless of remaining impressions. Then there are the other comparisons that validate that the user fits the ad's audience, and the whole thing gets complicated.
I am not very good with MySQL. Although I have managed to construct a working query, I am concerned about optimizing it: I am fairly certain the methods I have used are inefficient, but I couldn't find a better way online. That's why I'm asking here; can anyone help me improve the performance of this query?
SELECT `fields`
FROM `ads`
WHERE (`impressions`>0 AND `start`!='0000-00-00')
AND `start`<CURDATE() AND
(
`impressions`>(`impressions_total`-(DATEDIFF(CURDATE(), `start`)*`impressions_perday`))
OR (`impressions_total`/`impressions_perday` < DATEDIFF(CURDATE(), `start`))
-- this is the part where I validate the impressions for the day
-- and am most concerned that I haven't built correctly
)
AND
(
(
(YEAR(NOW())-YEAR("user's birthday") BETWEEN `ageMIN` AND `ageMax`)
AND (`sex`=2 OR `sex`="user's gender")
AND (`country`='' OR `country`="user's country")
) OR `applyToUnregistered` = 1
)
ORDER BY $random_order -- Generate random order pattern
Schema:
CREATE TABLE `ads` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`headline` varchar(25) NOT NULL,
`text` varchar(90) NOT NULL,
`url` varchar(50) NOT NULL,
`country` varchar(2) DEFAULT '0',
`ageMIN` tinyint(2) unsigned NOT NULL,
`ageMax` tinyint(2) unsigned NOT NULL,
`sex` tinyint(1) unsigned NOT NULL DEFAULT '2',
`applyToUnregistered` tinyint(1) unsigned NOT NULL DEFAULT '0',
`creator` int(10) unsigned NOT NULL,
`created` int(10) unsigned NOT NULL,
`start` date NOT NULL,
`impressions_total` int(10) unsigned NOT NULL,
`impressions_perday` mediumint(8) unsigned NOT NULL,
`impressions` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=27 DEFAULT CHARSET=utf8
You have a very complicated query from an optimization perspective.
The only indexes that can be used for the WHERE clause are on ads(impressions) or ads(start). Because you use inequalities on both columns, you cannot combine them into one composite index.
Can you modify the table structure to have an ImpressionsFlag? This would be 1 if there are any impressions and 0 otherwise. If so, then you can try an index on ads(ImpressionsFlag, Start).
If that helps with performance, the next step would be to break the query into separate subqueries and combine them with UNION ALL. The purpose is to design indexes that optimize each underlying query.
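A minimal sketch of that flag idea, assuming the ads schema from the question (the column and index names here are my own, not from the original):

```sql
-- Flag is 1 while the ad still has impressions left, 0 otherwise.
-- Whatever code decrements `impressions` must also maintain this flag.
ALTER TABLE ads ADD COLUMN ImpressionsFlag TINYINT(1) NOT NULL DEFAULT 0;
UPDATE ads SET ImpressionsFlag = (impressions > 0);

-- Equality on the flag lets the optimizer use the second column
-- of the index for the range condition on `start`.
ALTER TABLE ads ADD INDEX idx_flag_start (ImpressionsFlag, `start`);
```

The WHERE clause would then begin with ImpressionsFlag=1 AND `start`>'0000-00-00' AND `start`<CURDATE(): one equality plus a single range, which the composite index can satisfy directly.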
Related
For self-education I am developing an invoicing system for an electricity company. I have multiple time-series tables with different intervals: one table represents consumption and two others represent prices (a third price table still needs to be incorporated). I am now running calculation queries, but they are slow. I would like to improve the query speed, especially since these are only the initial calculations and the queries will only become more complicated. Also please note that this is the first database I have created, so a simplified explanation is preferred. Thanks for any help provided.
I have indexed DATE, PERIOD_FROM, and PERIOD_UNTIL in each table. This sped up the process from 60 seconds to 5 seconds.
The structure of the tables is the following:
CREATE TABLE `apxprice` (
`APX_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`PRICE` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`APX_id`)
) ENGINE=MyISAM AUTO_INCREMENT=28728 DEFAULT CHARSET=latin1
CREATE TABLE `imbalanceprice` (
`imbalanceprice_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PTU` tinyint(3) DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`UPWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`DOWNWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`UPWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`DOWNWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`INCENTIVE_COMPONENT` decimal(10,2) DEFAULT NULL,
`TAKE_FROM_SYSTEM` decimal(10,2) DEFAULT NULL,
`FEED_INTO_SYSTEM` decimal(10,2) DEFAULT NULL,
`REGULATION_STATE` tinyint(1) DEFAULT NULL,
`HOUR` int(2) DEFAULT NULL,
PRIMARY KEY (`imbalanceprice_id`),
KEY `DATE` (`DATE`,`PERIOD_FROM`,`PERIOD_UNTIL`)
) ENGINE=MyISAM AUTO_INCREMENT=117427 DEFAULT CHARSET=latin1
CREATE TABLE `powerload` (
`powerload_id` int(11) NOT NULL AUTO_INCREMENT,
`EAN` varchar(18) DEFAULT NULL,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`POWERLOAD` int(11) DEFAULT NULL,
PRIMARY KEY (`powerload_id`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin1
Now when running this query:
SELECT i.DATE, i.PERIOD_FROM, i.TAKE_FROM_SYSTEM, i.FEED_INTO_SYSTEM,
a.PRICE, p.POWERLOAD, sum(a.PRICE * p.POWERLOAD)
FROM imbalanceprice i, apxprice a, powerload p
WHERE i.DATE = a.DATE
and i.DATE = p.DATE
AND i.PERIOD_FROM >= a.PERIOD_FROM
and i.PERIOD_FROM = p.PERIOD_FROM
AND i.PERIOD_FROM < a.PERIOD_UNTIL
AND i.DATE >= '2018-01-01'
AND i.DATE <= '2018-01-31'
group by i.DATE
I have run the query with EXPLAIN and get the following result:

table | select_type | partitions | possible_keys | key  | key_len | ref                                        | rows  | filtered | Extra
i     | SIMPLE      | NULL       | DATE          | DATE | 8       | timeseries.a.DATE,timeseries.p.PERIOD_FROM | 1     | 100      | NULL
a     | SIMPLE      | NULL       | NULL          | NULL | NULL    | NULL                                       | 28727 | 100      | Using where; Using temporary; Using filesort
p     | SIMPLE      | NULL       | NULL          | NULL | NULL    | NULL                                       | 61038 | 10       | Using where; Using join buffer (Block Nested Loop)
Preferably I would run a more complicated query for a whole year, grouped by month for example, with all price tables incorporated; however, that would be too slow. The calculation result may not be changed: in this case, quarter-hourly consumption of two meters multiplied by hourly prices.
"Categorically speaking," the first thing you should look at is indexes.
Your clauses such as WHERE i.DATE = a.DATE ... are categorically known as INNER JOINs, and the SQL engine needs to have the ability to locate the matching rows "instantly." (That is to say, without looking through the entire table!)
FYI: Just like any index in real-life – here I would be talking about "library card catalogs" if we still had such a thing – indexes will assist both "equal to" and "less/greater than" queries. The index takes the computer directly to a particular point in the data, whether that's a "hit" or a "near miss."
Finally, the EXPLAIN verb is very useful: put that word in front of your query, and the SQL engine should "explain to you" exactly how it intends to carry out your query. (The SQL engine looks at the structure of the database to make that decision.) Although the EXPLAIN output is ... (heh) ... "not exactly standardized," it will help you to see if the computer thinks that it needs to do something very time-wasting in order to deliver your answer.
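To make both points concrete, here is a hedged sketch of the earlier electricity query rewritten with explicit INNER JOINs, plus indexes the engine could use to locate matching rows (the index names are invented; I also dropped the non-aggregated columns from the SELECT list, since with GROUP BY i.DATE their values are ill-defined anyway):

```sql
-- Indexes so each joined table can be probed by (DATE, PERIOD_FROM):
ALTER TABLE apxprice  ADD INDEX idx_apx_date_period  (`DATE`, PERIOD_FROM, PERIOD_UNTIL);
ALTER TABLE powerload ADD INDEX idx_load_date_period (`DATE`, PERIOD_FROM);

-- The same January query with explicit join syntax:
SELECT i.`DATE`, SUM(a.PRICE * p.POWERLOAD) AS daily_cost
FROM imbalanceprice i
JOIN apxprice a
  ON a.`DATE` = i.`DATE`
 AND i.PERIOD_FROM >= a.PERIOD_FROM
 AND i.PERIOD_FROM <  a.PERIOD_UNTIL
JOIN powerload p
  ON p.`DATE` = i.`DATE`
 AND p.PERIOD_FROM = i.PERIOD_FROM
WHERE i.`DATE` BETWEEN '2018-01-01' AND '2018-01-31'
GROUP BY i.`DATE`;
```

Running EXPLAIN on this version should show the price and load tables being probed via the new indexes instead of full scans with a join buffer.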
So I have a table that gets relatively good insert performance (2,500-3,000 rows inserted per second), but once it grows bigger than my allocated memory (2 GB) it slows down about 15 times. The worst part is that it will eventually grow to around 20 GB...
It looks like this:
CREATE TABLE `slots` (
`customerid` int(11) NOT NULL,
`orderid` int(11) NOT NULL,
`queueid` int(11) NOT NULL AUTO_INCREMENT,
`item_id` int(3) NOT NULL,
`variable1` int(3) NOT NULL,
`variable2` int(3) NOT NULL,
`variable3` int(3) NOT NULL,
`variable4` int(3) NOT NULL,
`variable5` int(3) NOT NULL,
`variable6` int(3) NOT NULL,
`variable7` tinyint(1) NOT NULL,
`variable8` tinyint(1) NOT NULL,
`variable9` tinyint(1) NOT NULL,
PRIMARY KEY (`queueid`, `customerid`),
UNIQUE KEY `customerid` (`customerid`,`orderid`)
) ENGINE=InnoDB AUTO_INCREMENT=25883472 DEFAULT CHARSET=latin1
The reason I believe this happens is that I check whether each insert is a duplicate (INSERT IGNORE), so the MySQL server needs to check all of the unique-index values; since they no longer fit in memory, it has to fall back to much slower disk I/O.
I've decided to partition my table: I've made 30 ranges that split the 50 million rows into 30 parts (by range of the customerid field).
But still when I hit the memory limit, it slows down.
Are there any tricks to keep insert rates high? This presentation (http://www.slideshare.net/bluesmoon/scaling-mysql-writes-through-partitioning-3397422) seems to use partitioning with great success. My problem is that my data does not arrive in sequential order, so rows cannot simply be appended at the last position in the index (sometimes a row lands in the middle of the index, sometimes at the beginning, sometimes at the end, etc.).
How would you construct the keys to get the highest possible insert/select performance? It seems to me that there is some obvious thing that I could do (besides getting more RAM) to keep the speed high.
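For reference, the partitioning I tried looks roughly like this (the boundary values here are illustrative, not my real ones):

```sql
-- Range-partition slots by customerid; MySQL allows this because
-- customerid appears in both the primary key and the unique key.
ALTER TABLE slots
PARTITION BY RANGE (customerid) (
  PARTITION p00 VALUES LESS THAN (1000000),
  PARTITION p01 VALUES LESS THAN (2000000),
  -- ... 27 more ranges ...
  PARTITION p29 VALUES LESS THAN MAXVALUE
);
```

The hope was that each partition's (customerid, orderid) unique index would be small enough to stay cached, but as described above that did not hold once the table outgrew memory.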
If any MySQL genius is reading this - kindly please spend a minute and help me with this.
Since I launched a podcast recently, I wanted to analyse our download data. But some clients seem to send multiple requests, so I want to count only one request per IP and user agent every 15 minutes. The best thing I could come up with is the following query, which counts one request per IP and user agent every hour. Any ideas how to solve this problem in MySQL?
SELECT episode, podcast, DATE_FORMAT(date, '%d.%m.%Y %k') AS blurry_date, useragent, ip FROM downloaddata GROUP BY ip, useragent, blurry_date
This is the table I've got
CREATE TABLE `downloaddata` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`podcast` varchar(255) DEFAULT NULL,
`episode` int(4) DEFAULT NULL,
`source` varchar(255) DEFAULT NULL,
`useragent` varchar(255) DEFAULT NULL,
`referer` varchar(255) DEFAULT NULL,
`filetype` varchar(15) DEFAULT NULL,
`ip` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=216 DEFAULT CHARSET=utf8;
Personally I'd recommend collecting every request, and then only taking one per 15 minutes with a DISTINCT query, or perhaps counting the number per 15-minute period.
If you are determined to throw data away so it can never be analysed, though, a quick and simple approach is to store the date plus an int column holding the 15-minute period of the day:

hour part of the time * 4 + minute part / 15

The date-part functions (HOUR(), MINUTE()) are what you want to look up. The thing is, each time you want to record a request you'll have to check whether that IP/user agent already has a row in the current 15-minute period. Extra work, extra complexity, and less / lower-quality data...
MINUTE(date)/15 will give you the quarter-hour (0-3). Ensure that it, together with the date and hour, is unique (or ensure UNIX_TIMESTAMP(date) DIV (15*60) is unique).
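As a sketch against the downloaddata table from the question (the timestamp-bucket expression is the only assumption here):

```sql
-- 900 = 15 * 60 seconds; DIV is integer division, so every datetime in the
-- same quarter-hour maps to the same bucket_start.
SELECT ip,
       useragent,
       FROM_UNIXTIME((UNIX_TIMESTAMP(`date`) DIV 900) * 900) AS bucket_start,
       COUNT(*) AS requests_collapsed
FROM downloaddata
GROUP BY ip, useragent, bucket_start;
```

Each (ip, useragent, bucket_start) group then counts as one download, regardless of how many requests the client actually sent in that quarter-hour.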
For reference, this is my current table:
`impression` (
`impressionid` bigint(19) unsigned NOT NULL AUTO_INCREMENT,
`creationdate` datetime NOT NULL,
`ip` int(4) unsigned DEFAULT NULL,
`canvas2d` tinyint(1) DEFAULT '0',
`canvas3d` tinyint(1) DEFAULT '0',
`websockets` tinyint(1) DEFAULT '0',
`useragentid` int(10) unsigned NOT NULL,
PRIMARY KEY (`impressionid`),
UNIQUE KEY `impressionsid_UNIQUE` (`impressionid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=447267 ;
It keeps a record of all the impressions on a certain page. After one day of running, it has gathered 447,266 views. That's a lot of records.
Now I want the number of visitors per minute. I can easily get them like this:
SELECT COUNT( impressionid ) AS visits, DATE_FORMAT( creationdate, '%m-%d %H%i' ) AS DATE
FROM `impression`
GROUP BY DATE
This query takes a long time, of course. Right now around 56 seconds.
So I'm wondering what to do next. Do I:
Create an index on creationdate (I don't know if that'll help, since I'm grouping on a function applied to this column)
Create new fields that store hours and minutes separately.
The last one would introduce duplicate data, and I hate that. But maybe it's the only way in this case?
Or should I go about it in some different way?
If you run this query often, you could denormalize the calculated value into a separate column (perhaps maintained by a trigger on insert/update) and then group by that.
Your idea of separate hour and minute fields is a good one too, since it lets you group a few different ways other than just by minute. It's still denormalization, but it's more versatile.
Denormalization is fine, as long as it's justified and understood.
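A minimal sketch of that trigger-maintained column against the impression table (the column, index, and trigger names are made up):

```sql
-- Store creationdate truncated to the minute, and index it for grouping.
ALTER TABLE impression ADD COLUMN creation_minute DATETIME NOT NULL;
CREATE INDEX idx_creation_minute ON impression (creation_minute);

CREATE TRIGGER impression_set_minute
BEFORE INSERT ON impression
FOR EACH ROW
SET NEW.creation_minute = DATE_FORMAT(NEW.creationdate, '%Y-%m-%d %H:%i:00');
```

Counting visits per minute then becomes SELECT creation_minute, COUNT(*) FROM impression GROUP BY creation_minute, which can read the index instead of formatting every row's date at query time.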
I am creating a calendar webapp and I'm stuck on a performance-versus-storage trade-off in the creation and subsequent querying of my events table. The concept is: how do you design a table for repeating events (daily/weekly)? Here is my current solution:
CREATE TABLE `events` (
`eventid` int(10) NOT NULL AUTO_INCREMENT, //primary key
`evttitle` varchar(255) NOT NULL, //title of event
`createdby` char(8) NOT NULL, //user identification (I'm using
`evtdatestart` date NOT NULL, ////another's login system)
`evtdateend` date NOT NULL,
`evttimestart` time NOT NULL,
`evttimeend` time NOT NULL,
`evtrepdaily` tinyint(1) NOT NULL DEFAULT 0, //if both are '0' then its
`evtrepweekly` tinyint(1) NOT NULL DEFAULT 0, //a one time event
`evtrepsun` tinyint(1) NOT NULL DEFAULT 0,
`evtrepmon` tinyint(1) NOT NULL DEFAULT 0,
`evtreptue` tinyint(1) NOT NULL DEFAULT 0,
`evtrepwed` tinyint(1) NOT NULL DEFAULT 0,
`evtrepthu` tinyint(1) NOT NULL DEFAULT 0,
`evtrepfri` tinyint(1) NOT NULL DEFAULT 0,
`evtrepsat` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`eventid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I also have a very small table of numbers from 0 to 62 which can be used for many different things, but in the query of looking up events, it is used via MOD(num,7) as the day of the week. Here is that query:
SELECT date, evttitle, evttimestart, evttimeend
FROM ( //create result of all days between the span of two dates
SELECT DATE_ADD(startdate, INTERVAL (num-startday) DAY) AS date,
IF(MOD(num,7)=0,7,MOD(num,7)) AS weekday //1=sun...7=sat
FROM (
SELECT '#startdate' AS startdate, DAYOFWEEK('#startdate') AS startday,
'#enddate' AS enddate, DATEDIFF('#enddate','#startdate') AS diff
) AS span, numbers //numbers is 0-62
WHERE num>=startday AND num<=startday+diff
) AS daysinspan, events
WHERE evtdatestart<=date AND evtdateend>=date AND (
(evtdatestart=evtdateend) OR //single event
(evtrepdaily) OR //daily event
(evtrepweekly AND ( //weekly event
(weekday=1 AND evtrepsun) OR ////on Sunday
(weekday=2 AND evtrepmon) OR ////on Monday
(weekday=3 AND evtreptue) OR ////on Tuesday
(weekday=4 AND evtrepwed) OR ////on Wednesday
(weekday=5 AND evtrepthu) OR ////on Thursday
(weekday=6 AND evtrepfri) OR ////on Friday
(weekday=7 AND evtrepsat) ////on Saturday
)) //end of repeat truths
)
ORDER BY date, evttimestart;
I like this approach mostly because it generates the results in pure SQL, saves me from duplicating the event 50+ times for repeating events, and makes repeating events easier to modify. However, slow performance is unacceptable here, as this is likely the most common query users will trigger.
So another way would be to drop the evtrep columns and simply create a new, slightly different event as many times as needed for the span. But I don't like this idea, as the thought of duplicating that much data makes me cringe. However, if it will guarantee significantly faster results (and clearly the lookup query would be much simpler and faster), then I guess it can justify the extra storage.
Which do you all think to be the better plan? Or is there another one that I have not thought of/mentioned here?
Thanks in advance!
Update 1:
person-b made a good suggestion. I believe person-b is suggesting that my table may not be in first normal form, which could be true if many (more than ~30%) of the events were non-repeating (in my case, however, 80%+ of the events will likely be repeating). However, I think my question needs restating: as in the first update, person-b's suggested change of adding a reptimes column would simply push the date processing to the back end (in my case PHP), whose processing time I still have to account for. My main concern (question) is this: what is the fastest way, on an average per-query basis, to compute the dates and times of every event without manually creating x entries for repeating events?
Try normalising your tables: you would separate the event information from the (multiple) date and recurrence information.
Update: Here's an example:
Table 1:
CREATE TABLE `events` (
`eventid` int(10) NOT NULL AUTO_INCREMENT, //primary key
`repeatid` int(10) NOT NULL,
`evttitle` varchar(255) NOT NULL, //title of event
`createdby` char(8) NOT NULL, //user identification (I'm using
`evtdatestart` date NOT NULL, ////another's login system)
`evtdateend` date NOT NULL,
`evttimestart` time NOT NULL,
`evttimeend` time NOT NULL,
PRIMARY KEY (`eventid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Table 2:
CREATE TABLE `repeats` (
`repeatid` int(10) NOT NULL AUTO_INCREMENT, // referenced by events.repeatid
`repweek` tinyint(1) NOT NULL DEFAULT 0, // if 0, don't repeat; otherwise, the day 1..7 of the week
`repday` tinyint(1) NOT NULL DEFAULT 0, // repeat daily if 1
`reptimes` int(10) NOT NULL DEFAULT 0, // 0 for indefinite, otherwise number of times
PRIMARY KEY (`repeatid`)
)
Then, use a query like this (untested):
SELECT e.evttitle, r.reptimes FROM events e JOIN repeats r ON r.repeatid = e.repeatid WHERE e.eventid = 9
More information (both from the same guide) on Simple joins and Normalisation.
This will make your system more flexible, and hopefully faster too.