MySQL SELECT statement isn't working - mysql

So maybe it's due to lack of sleep, but I am having a major brain malfunction and can't remember what is going wrong here. Here is my statement:
SELECT DISTINCT `county`, COUNT(*)
FROM `ips`
WHERE `county` != 'NULL' AND `county` != '' AND
EXISTS (SELECT * FROM `pages`
WHERE (`timestamp` BETWEEN FROM_UNIXTIME(?) AND FROM_UNIXTIME(?)))
GROUP BY `county`
I'm expecting the results to be something like:
County | Number
Some county | 42
Other county | 27
My pages table has a timestamp for each time a page is viewed by a user, so if they viewed a page between the two dates, the county from the ips table should be selected and the total count for that county returned as the number. I'm using PDO and I'm passing in two times that I've run through strtotime().
I'm currently stuck. All help is appreciated. Hopefully it's not some stupid little mistake that I've overlooked.

You can't compare NULL with !=; you need to use IS NOT NULL.
SELECT `county`, COUNT(*)
FROM `ips`
WHERE `county` IS NOT NULL AND `county` != '' AND
EXISTS (SELECT 1 FROM `pages`
WHERE (`timestamp` BETWEEN FROM_UNIXTIME(?) AND FROM_UNIXTIME(?)))
GROUP BY `county`
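As a quick illustration of the NULL point, you can run this directly in MySQL:
SELECT NULL != 'NULL', NULL != '', NULL IS NOT NULL;
-- yields NULL, NULL, 0: a comparison against NULL is never true, so such rows never pass a != test in the WHERE clause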

Related

Having a hard time with mysql's GROUP BY or DISTINCT

I tried searching and searching for an answer to my problem but haven't successfully found one so I'm hoping one of you more experienced gurus can help me out with this MySQL issue I'm having.
On one of my websites I allow people to basically POST on their profile. Then on the side of my website, I list the last 8 profile posts that are authored by the recipient (in other words, they post on their own profile), and I don't show posts made by other people in that sidebar. I call these "Updates".
Recently I noticed a woman had posted twice on her profile and it was showing two posts on the side of the website from her. I only want to show one post by each person, and only the last 8 "Updates"...
What I did which seemed to work at first, is use a GROUP BY in my query as you see here:
SELECT `ratemynudepics`.* FROM `ratemynudepics` WHERE `user_id` = `author` AND `body` <> '' GROUP BY `author` ORDER BY `id` DESC LIMIT 8
Her multiple posts on the side went down to just one post and I thought it was fixed until I tried posting a test "Update" of my own on my own profile. It didn't show up on the side at all.
I tried using a DISTINCT instead of GROUP BY as seen here:
SELECT DISTINCT `user_id`, `author`, `id`, `body`, `date` FROM ratemynudepics WHERE `user_id` = `author` AND `body` <> '' ORDER BY `id` DESC LIMIT 8
That change made it so my post did in fact appear on the side, but the woman's two updates were back underneath mine.
I've tried all sorts of variations, including DISTINCT and GROUP BY at the same time, and no matter what I try nothing will properly show mine up top (which should be the case, since mine is the last record in the database) with only one listing of the woman's. I checked many times to make sure she didn't post with a different user_id, and sure enough testing the query in phpMyAdmin shows two listings for her, both having the same user_id and author. I'm not sure why DISTINCT allows multiple rows with the same user_id in the above query.
I tried the following query:
SELECT DISTINCT `user_id`, `id`, `author`, `date`, `body` FROM ratemynudepics WHERE `user_id` = `author` AND `body` <> '' GROUP BY `author` ORDER by `id` DESC LIMIT 8
and it only shows her once but doesn't show me in any of those 8 results even though again my row is the last one inserted into the database.
Can someone please help me to understand what I'm doing wrong here so that I can properly display only a maximum of 1 row per user but not abandon my latest database result which should be the 1st result? Much appreciated!! Peace
Edit - Here is some sample result data to match the queries
Notice the 2nd result set shows my 'test' update but also shows the woman's post twice (she did post the same text twice, about a month apart).
SELECT DISTINCT `id`, `user_id`, `author`, `body`, `date` FROM ratemynudepics WHERE `user_id` = `author` AND `body` <> '' GROUP BY `author` ORDER BY `id` DESC LIMIT 8
AND
SELECT `ratemynudepics`.* FROM ratemynudepics WHERE `user_id` = `author` AND `body` <> '' GROUP BY `author` ORDER BY `id` DESC LIMIT 8
Both above queries produce the following
id user_id author body date
122 4391 4391 Email me at [blocked] 1497299836
83 4270 4270 I'm back..lol..ho 1474258804
79 4303 4303 Send me a message if y 1473959358
76 4362 4362 This place is a morgue. 1472580597
68 4358 4358 Smile, have a nice day 1470897755
57 4344 4344 Can someone rate my bo 1467946896
55 4338 4338 hey lets chat 1466792249
50 4319 4319 hi whats up 1465604578
SELECT DISTINCT `id`, `user_id`, `author`, `body`, `date` FROM `ratemynudepics` WHERE `user_id` = `author` AND `body` <> '' ORDER BY `id` DESC LIMIT 8
produces the following results
id user_id author body date
153 1 1 test 1510212341
135 4391 4391 Email me at [blocked] 1508374921
122 4391 4391 Email me at [blocked] 1497299836
83 4270 4270 I'm back..lol..ho 1474258804
79 4303 4303 Send me a message if y 1473959358
76 4362 4362 This place is a morgue. 1472580597
68 4358 4358 Smile, have a nice day 1470897755
57 4344 4344 Can someone rate my bo 1467946896
The bottom result shows my test row, but shows the woman's multiple posts. The first result shows only one of hers but leaves mine out... Any ideas? Thanks!!
Thanks to another post on this website I found the answer to my problem if I use the following query:
SELECT `id`, `user_id`, `author`, `body`, `date`
FROM ratemynudepics
WHERE id IN ( SELECT MAX(id)
FROM ratemynudepics
WHERE `user_id` = `author` AND `body` <> ''
GROUP BY user_id )
ORDER BY `id` DESC LIMIT 8
Produces
id user_id author body date
153 1 1 test 1510212341
135 4391 4391 Email me at [blocked] 1508374921
83 4270 4270 I'm back..lol..ho 1474258804
79 4303 4303 Send me a message if y 1473959358
76 4362 4362 This place is a morgue. 1472580597
68 4358 4358 Smile, have a nice day 1470897755
57 4344 4344 Can someone rate my bo 1467946896
55 4338 4338 hey lets chat 1466792249
Thanks for everyone's help nonetheless. Happy Halloween! :D
Here is the post that helped me with the answer How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
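For reference, the same "latest row per author" result can also be written with a derived-table join instead of IN, the other common pattern from that linked question (a sketch only, not tested against this schema):
SELECT p.`id`, p.`user_id`, p.`author`, p.`body`, p.`date`
FROM ratemynudepics p
INNER JOIN (SELECT `user_id`, MAX(`id`) AS max_id
            FROM ratemynudepics
            WHERE `user_id` = `author` AND `body` <> ''
            GROUP BY `user_id`) latest ON p.`id` = latest.max_id
ORDER BY p.`id` DESC LIMIT 8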

MySQL select HOTTEST (most upvoted in shortest time)

For a long time I have been trying to figure out how to make "Hottest Posts".
What I want to achieve: ORDER BY MOST UPVOTED IN THE LEAST TIME.
For example I got 4 posts:
ID UPVOTES(Total) UPVOTES(Weekly) DATE
1 50 50 01.09.2017
2 421 6 25.07.2017
3 71 50 13.08.2017
4 111 37 15.08.2017
And it would need to be ordered like 1 -> 3 -> 4 -> 2.
My goal is to get UPVOTES(Weekly) -> I don't know how to calculate it. I just made it up here to better explain what I want to achieve.
I have got 2 database tables: fun_posts and fun_post_upvotes.
I was trying to achieve it like this, but it didn't work; it just ordered by id or by upvotes:
$stmt = $this->conn->prepare("SELECT *, (SELECT COUNT(*) FROM
fun_post_upvotes WHERE image_id = fun_posts.id GROUP BY image_id) FROM fun_posts
ORDER BY fun_posts.id DESC, fun_posts.upvotes DESC");
This does part of it:
SELECT fun_post_upvotes.image_ID, COUNT(image_ID) AS Upvotes FROM fun_post_upvotes GROUP BY fun_post_upvotes.image_ID ORDER BY Upvotes DESC;
Just try it and add the date part. ;-) You can ask again once you've tried something. :P
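A minimal sketch of what adding the date part could look like, assuming fun_post_upvotes has a datetime column recording when each upvote happened (upvoted_at here is purely illustrative; use the real column name):
SELECT p.id, COUNT(u.image_id) AS weekly_upvotes
FROM fun_posts p
LEFT JOIN fun_post_upvotes u
  ON u.image_id = p.id
  AND u.upvoted_at >= NOW() - INTERVAL 7 DAY -- hypothetical column name
GROUP BY p.id
ORDER BY weekly_upvotes DESC;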
If I understand the problem correctly you should quite easily be able to apply the following to your problem:
MYSQL Order By Sum of Columns
Order By Sum of Two Fields
CREATE TABLE `tb1` (
`id` int(11) NOT NULL,
`Name` varchar(50) NOT NULL,
`up` int(11) NOT NULL,
`Down` int(11) NOT NULL,
`Date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SELECT `Name`, (`tb1`.`up` / DATEDIFF(NOW(), `tb1`.`Date`)) AS `heat`
FROM `tb1`
ORDER BY `heat` DESC
This should illustrate my point.
I think adding to your query:
ORDER BY UPVOTES(Weekly) DESC
would work for your issue.

MySQL: How to construct a given query

I am not a MySQL guru at all, and I would really appreciate it if someone took some time to help me. I have three tables as shown below:
TEAM(teamID, teamName, userID)
YOUTH_TEAM(youthTeamID, youthTeamName, teamID)
YOUTH_PLAYER(youthPlayerID, youthPlayerFirstName, youthPlayerLastName, youthPlayerAge, youthPlayerDays, youthPlayerRating, youthPlayerPosition, youthTeamID)
And this is the query that I have now:
SELECT team.teamName, youth_team.youthTeamName, youth_player.*
FROM youth_player
INNER JOIN youth_team ON youth_player.youthTeamID = youth_team.youthTeamID
INNER JOIN team ON youth_team.teamID = team.teamID
WHERE youth_player.youthPlayerAge < 18
AND youth_player.youthPlayerDays < 21
AND youth_player.youthPlayerRating >= 5.5
What I would like to add to this query are more thorough checks like the following:
if a player is 16 years old and his position is scorer, then he should have a rating of at least 7 in order to be returned
if a player is 15 years old and his position is playmaker, then he should have a rating of at least 5.5 in order to be returned
etc., etc.
How can I implement these requirements in my query (if that's possible), and would that query be a bad solution? Would it be better to do the filtering in PHP code (assuming I use PHP) instead of in the query?
Here is a possible solution with an additional "criteria/filter" table:
-- SAMPLE TEAMS: Yankees, Knicks:
INSERT INTO `team` VALUES (1,'Yankees',2),(2,'Knicks',1);
-- SAMPLE YOUTH TEAMS: Yankees Juniors, Knicks Juniors
INSERT INTO `youth_team` VALUES (1,'Knicks Juniors',1),(2,'Yankees Juniors',2);
-- SAMPLE PLAYERS
INSERT INTO `youth_player` VALUES
(1,'Carmelo','Anthony',16,20,7.5,'scorer',1),
(2,'Amar\'e','Stoudemire',17,45,5.5,'playmaker',1),
(3,'Iman','Shumpert',15,15,6.1,'playmaker',1),
(4,'Alex','Rodriguez',18,60,3.5,'playmaker',2),
(5,'Hiroki','Kuroda',16,17,8.7,'scorer',2),
(6,'Ichiro','Suzuki',19,73,8.3,'playmaker',2);
-- CRITERIA TABLE
CREATE TABLE `criterias` (
`id` int(11) NOT NULL,
`age` int(11) DEFAULT NULL,
`position` varchar(45) DEFAULT NULL,
`min_rating` double DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- SAMPLE CRITERIAS
-- AGE=16, POSITION=SCORER, MIN_RATING=7
-- AGE=15, POSITION=PLAYMAKER, MIN_RATING=5.5
INSERT INTO `criterias` VALUES (1,16,'scorer',7), (2,15,'playmaker',5.5);
Now your query could look like:
SELECT team.teamName, youth_team.youthTeamName, youth_player.*
FROM youth_player
CROSS JOIN criterias
INNER JOIN youth_team ON youth_player.youthTeamID = youth_team.youthTeamID
INNER JOIN team ON youth_team.teamID = team.teamID
WHERE
(
youth_player.youthPlayerAge < 18
AND youth_player.youthPlayerDays < 21
AND youth_player.youthPlayerRating >= 5.5
)
AND
(
youth_player.youthPlayerAge = criterias.age
AND youth_player.youthPlayerPosition = criterias.position
AND youth_player.youthPlayerRating >= criterias.min_rating
)
This yields (shortened results):
teamName youthTeamName youthPlayerName Age Days Rating Position
=============================================================================
Yankees "Knicks Juniors" Carmelo Anthony 16 20 7.5 scorer
Yankees "Knicks Juniors" Iman Shumpert 15 15 6.1 playmaker
Knicks "Yankees Juniors" Hiroki Kuroda 16 17 8.7 scorer
Doing it in the query is quite fine... as long as it doesn't get too messy. You can do a lot in your query, but it may get hard to maintain. So if it gets too long and you want somebody else to be able to take a look at it, you should split it up or handle it in your PHP script.
As for your requirements, add this to your WHERE clause:
AND
(
(YOUTH_PLAYER.youthPlayerAge >= 16 AND YOUTH_PLAYER.youthPlayerPosition = 'scorer' AND YOUTH_PLAYER.youthPlayerRating >= 7)
OR (YOUTH_PLAYER.youthPlayerAge >= 15 AND YOUTH_PLAYER.youthPlayerPosition = 'playmaker' AND YOUTH_PLAYER.youthPlayerRating >= 5.5)
)
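Merged into the query from the question, that would look roughly like this (just the original query with the block above appended; no new logic):
SELECT team.teamName, youth_team.youthTeamName, youth_player.*
FROM youth_player
INNER JOIN youth_team ON youth_player.youthTeamID = youth_team.youthTeamID
INNER JOIN team ON youth_team.teamID = team.teamID
WHERE youth_player.youthPlayerAge < 18
AND youth_player.youthPlayerDays < 21
AND youth_player.youthPlayerRating >= 5.5
AND
(
(youth_player.youthPlayerAge >= 16 AND youth_player.youthPlayerPosition = 'scorer' AND youth_player.youthPlayerRating >= 7)
OR (youth_player.youthPlayerAge >= 15 AND youth_player.youthPlayerPosition = 'playmaker' AND youth_player.youthPlayerRating >= 5.5)
)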

MySQL - searching a self join and ranges of data

I've been tasked by my local community center with building a 'newlywed'-type game in time for Valentine's Day, so no rush!
So, we've got 50-odd couples who know each other quite well, and they are going to be asked 100 questions ahead of time. For each question a user records their own answer plus a range allowing a margin of error (this range quota will be limited), and they can then select what they think their partner's answer will be, with the same kind of margin-of-error range.
EG (I'll play a round as me and my GF):
Question: Do you like fruit?
I am quite fussy about fruit, so I'll put a low score out of 100... say 20. But what I do like, I LOVE, and I think my GF might expect me to put a higher answer, so the margin of error I'll allow is going to be 30.
I think she loves fruit and will put at least 90... but she enjoys a lot of foods so may rank it lower, so I'll give her a margin of 20.
Ok, repeat that process for 100 questions and 50 couples.
I'm left with a table like this:
u_a = User answer
u_l = user margin of error level
p_a = partner answer
p_l = partner margin of error level
CREATE TABLE IF NOT EXISTS `large` (
`id_user` int(11) NOT NULL,
`id_q` int(11) NOT NULL,
`u_a` int(11) NOT NULL,
`u_l` int(11) NOT NULL,
`p_a` int(11) NOT NULL,
`p_l` int(11) NOT NULL,
KEY `id_user` (`id_user`,`id_q`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Stackoverflow Test';
So, for the previous example, my row will be:
(1, 1, 20, 30, 90, 20)
My mission is to search ALL users to see who the best matches are out of the 50... (and hope that the couples are well matched!).
I'll want to search the DB for all users where my answer for my partner matches their answer, and do that for every user.
Here's what I've got so far (note: I've commented out some code because I'm trying two approaches and I'm not sure which is best):
SELECT
match.id_user,
count(*) as count
from `large` `match`
INNER JOIN `large` `me` ON me.id_q = match.id_q
WHERE
me.id_user = 1 AND
match.id_user != 1 AND
GREATEST(abs(me.p_a - match.u_a), 0) <= me.p_l
AND
GREATEST(abs(match.p_a - me.u_a), 0) <= match.p_l
#match.u_a BETWEEN GREATEST(me.p_a - me.p_l, 0) AND (me.p_a + me.p_l)
#AND
#me.u_a BETWEEN GREATEST(match.p_a - match.p_l, 0) AND (match.p_a + match.p_l)
GROUP BY match.id_user
ORDER BY count DESC
My question today is:
This query takes AGES! I'd like to do it during the game and allow users a chance to change answers on the night and get instant results, so this has to be quick. I'm looking at 40 seconds when looking up all matches for me (user 1).
I'm reading about DB engines and indexing now to make sure I'm doing all that I can... but suggestions are welcome!
Cheers and PHEW!
Your query shouldn't be taking 40 seconds on a smallish data set. The best way to know what is going on is to use EXPLAIN on the query.
However, I suspect the problem is the condition on me. The MySQL engine might be creating all possible combinations for all users and only filtering them down afterwards. You can test this by modifying this code:
from `large` `match` INNER JOIN
`large` `me`
ON me.id_q = match.id_q
WHERE me.id_user = 1 AND
match.id_user != 1 AND . . . .
To:
from `large` `match` INNER JOIN
(select me.*
from `large` `me`
where me.id_user = 1
) me
ON me.id_q = match.id_q
WHERE match.id_user != 1 AND . . . .
In addition, the following indexes might help the query: large(id_user, id_q) and large(id_q).
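Applied to the full query from the question, the suggested rewrite would look something like this (a sketch only; the match conditions are unchanged, the me side is simply pre-filtered into a derived table). Note that the (id_user, id_q) combination already exists as the id_user key in the CREATE TABLE above, so only the id_q index would be new:
ALTER TABLE `large` ADD KEY `id_q` (`id_q`);
SELECT
`match`.id_user,
count(*) as count
FROM (SELECT * FROM `large` WHERE id_user = 1) `me`
INNER JOIN `large` `match` ON `me`.id_q = `match`.id_q
WHERE
`match`.id_user != 1 AND
GREATEST(abs(`me`.p_a - `match`.u_a), 0) <= `me`.p_l
AND
GREATEST(abs(`match`.p_a - `me`.u_a), 0) <= `match`.p_l
GROUP BY `match`.id_user
ORDER BY count DESC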

Optimizing a MySQL query summing and averaging by multiple groups over a given date range

I'm currently working on a home-grown analytics system, currently using MySQL 5.6.10 on Windows Server 2008 (moving to Linux soon, and we're not dead set on MySQL, still exploring different options, including Hadoop).
We've just done a huge import, and what was a lightning-fast query for a small customer is now unbearably slow for a big one. I'm probably going to add an entirely new table to pre-calculate the results of this query, unless I can figure out how to make the query itself fast.
What the query does is take @StartDate and @EndDate as parameters and calculate, for every day of that range, the date, the number of new reviews on that date, a running total of the number of reviews (including any before @StartDate), and the daily average rating (if there is no information for a given day, the average rating is carried over from the previous day).
Available filters are age, gender, product, company, and rating type. Every review has 1-N ratings, containing at the very least an "overall" rating, but possibly more per customer/product, such as "Quality", "Sound Quality", "Durability", "Value", etc...
The API that calls this injects these filters based on user selection. If no rating type is specified, it uses "AND ratingTypeId = 1" in place of the AND clause comment in all three parts of the query I'll be listing below. All ratings are integers between 1 and 5, though that doesn't really matter to this query.
Here are the tables I'm working with:
CREATE TABLE `times` (
`timeId` int(11) NOT NULL AUTO_INCREMENT,
`date` date NOT NULL,
`month` char(7) NOT NULL,
`quarter` char(7) NOT NULL,
`year` char(4) NOT NULL,
PRIMARY KEY (`timeId`),
UNIQUE KEY `date` (`date`)
) ENGINE=MyISAM
CREATE TABLE `reviewCount` (
`companyId` int(11) NOT NULL,
`productId` int(11) NOT NULL,
`createdOnTimeId` int(11) NOT NULL,
`ageId` int(11) NOT NULL,
`genderId` int(11) NOT NULL,
`totalReviews` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`companyId`,`productId`,`createdOnTimeId`,`ageId`,`genderId`),
KEY `companyId_fk` (`companyId`),
KEY `productId_fk` (`productId`),
KEY `createdOnTimeId` (`createdOnTimeId`),
KEY `ageId_fk` (`ageId`),
KEY `genderId_fk` (`genderId`)
) ENGINE=MyISAM
CREATE TABLE `ratingCount` (
`companyId` int(11) NOT NULL,
`productId` int(11) NOT NULL,
`createdOnTimeId` int(11) NOT NULL,
`ageId` int(11) NOT NULL,
`genderId` int(11) NOT NULL,
`ratingTypeId` int(11) NOT NULL,
`negativeRatings` int(10) unsigned NOT NULL DEFAULT '0',
`positiveRatings` int(10) unsigned NOT NULL DEFAULT '0',
`neutralRatings` int(10) unsigned NOT NULL DEFAULT '0',
`totalRatings` int(10) unsigned NOT NULL DEFAULT '0',
`ratingsSum` double unsigned DEFAULT '0',
`totalRecommendations` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`companyId`,`productId`,`createdOnTimeId`,`ageId`,`genderId`,`ratingTypeId`),
KEY `companyId_fk` (`companyId`),
KEY `productId_fk` (`productId`),
KEY `createdOnTimeId` (`createdOnTimeId`),
KEY `ageId_fk` (`ageId`),
KEY `genderId_fk` (`genderId`),
KEY `ratingTypeId_fk` (`ratingTypeId`)
) ENGINE=MyISAM
The 'times' table is pre-filled with every day from 1900-01-01 to 2049-12-31, and the two count tables are populated by an ETL script with a roll-up query grouped by company, product, age, gender, ratingType, etc...
What I'm expecting back from the query is something like this:
Date NewReviews CumulativeReviewsCount DailyRatingAverage
2013-01-24 7020 10586 4.017514595496247
2013-01-25 5505 16091 4.058400718778077
2013-01-27 2043 18134 3.992957746478873
2013-01-28 3280 21414 3.983625730994152
2013-01-29 4648 26062 3.921597633136095
...
2013-03-09 1608 60297 3.9409722222222223
2013-03-10 470 60767 3.7743682310469313
2013-03-11 1028 61795 4.036697247706422
2013-03-13 494 62289 3.857388316151203
2013-03-14 449 62738 3.8282208588957056
I'm pretty sure I could pre-calculate everything grouped by age, gender, etc..., except for the average, but I may be wrong on that. If I had three reviews for two products on one day, with all other groups different, and one had a rating of 2 and 5, and the other a 4, the first would have a daily average of 3.5, and the second 4. Averaging those averages would give me 3.75, when I'd expect to get 3.66667. Maybe I could do something like multiplying the average for that grouping by the number of reviews to get the total rating sum for the day, sum those up, then divide them by total ratings count at the end. Seems like a lot of extra work, but it may be faster than what I'm currently doing. Speaking of which, here's my current query:
SET @cumulativeCount :=
(SELECT coalesce(sum(rc.totalReviews), 0)
FROM reviewCount rc
INNER JOIN times dt ON rc.createdOnTimeId = dt.timeId
WHERE dt.date < @StartDate
-- AND clause for filtering by ratingType (default 1), age, gender, product, and company is injected here in C#
);
SET @dailyAverageWithCarry :=
(SELECT SUM(rc.ratingsSum) / SUM(rc.totalRatings)
FROM ratingCount rc
INNER JOIN times dt ON rc.createdOnTimeId = dt.timeId
WHERE dt.date < @StartDate
AND rc.totalRatings > 0
-- AND clause for filtering by ratingType (default 1), age, gender, product, and company is injected here in C#
GROUP BY dt.timeId
ORDER BY dt.date DESC LIMIT 1
);
SELECT
subquery.d AS `Date`,
subquery.newReviewsCount AS `NewReviews`,
(@cumulativeCount := @cumulativeCount + subquery.newReviewsCount) AS `CumulativeReviewsCount`,
(@dailyAverageWithCarry := COALESCE(subquery.dailyRatingAverage, @dailyAverageWithCarry)) AS `DailyRatingAverage`
FROM
(
SELECT
dt.date AS d,
COALESCE(SUM(rc.totalReviews), 0) AS newReviewsCount,
SUM(rac.ratingsSum) / SUM(rac.totalRatings) AS dailyRatingAverage
FROM times dt
LEFT JOIN reviewCount rc ON dt.timeId = rc.createdOnTimeId
LEFT JOIN ratingCount rac ON dt.timeId = rac.createdOnTimeId
WHERE dt.date BETWEEN @StartDate AND @EndDate
-- AND clause for filtering by ratingType (default 1), age, gender, product, and company is injected here in C#
GROUP BY dt.timeId
ORDER BY dt.timeId
) AS subquery;
The query currently takes ~2 minutes to run, with the following row counts:
times 54787
reviewCount 276389
ratingCount 473683
age 122
gender 3
ratingType 28
product 70070
Any help would be greatly appreciated. I'd either like to make this query much faster, or if it would be faster to do so, to pre-calculate the values grouped by date, age, gender, product, company, and ratingType, then do a quick roll-up query on that table.
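For what it's worth, the pre-calculated table mentioned above might look something like this (a sketch only; the table name dailyRollup is hypothetical, and the columns just mirror the grouping columns already present in reviewCount and ratingCount):
CREATE TABLE `dailyRollup` (
`createdOnTimeId` int(11) NOT NULL,
`companyId` int(11) NOT NULL,
`productId` int(11) NOT NULL,
`ageId` int(11) NOT NULL,
`genderId` int(11) NOT NULL,
`ratingTypeId` int(11) NOT NULL,
`totalReviews` int(10) unsigned NOT NULL DEFAULT '0',
`ratingsSum` double unsigned DEFAULT '0',
`totalRatings` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`createdOnTimeId`,`companyId`,`productId`,`ageId`,`genderId`,`ratingTypeId`)
) ENGINE=MyISAM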
UPDATE #1: I tried Meherzad's suggestions of adding indexes to times and ratingCount with:
ALTER TABLE times ADD KEY `timeId_date_key` (`timeId`, `date`);
ALTER TABLE ratingCount ADD KEY `createdOnTimeId_totalRatings_key` (`createdOnTimeId`, `totalRatings`);
Then ran my initial query again, and it was about 1s faster (~89s), but still too slow. I tried Meherzad's suggested query, and had to kill it after a few minutes.
As requested, here are the EXPLAIN results from my query:
id|select_type|table|type|possible_keys|key|key_len|ref|rows|Extra
1|PRIMARY|<derived2>|ALL|NULL|NULL|NULL|NULL|6808032|NULL
2|DERIVED|dt|range|PRIMARY,timeId_date_key,date|date|3|NULL|88|Using index condition; Using temporary; Using filesort
2|DERIVED|rc|ref|PRIMARY,companyId_fk,createdOnTimeId|createdOnTimeId|4|dt.timeId|126|Using where
2|DERIVED|rac|ref|createdOnTimeId,createdOnTimeId_total_ratings_key|createdOnTimeId|4|dt.timeId|614|NULL
I checked the cache read miss rate as mentioned in the article on buffer sizes, and it was
Key_reads 58303
Key_read_requests 147411279
For a miss rate of 3.9551247635535405672723319902814e-4
UPDATE #2: Solved! The indices definitely helped, so I'll give credit for the answer to Meherzad. What actually made the most difference was realizing that calculating the rolling average and daily/cumulative review counts in the same query was joining those two huge tables together. I saw that the variable initialization was done in two separate queries, and decided to try separating the two big queries into subqueries and then joining them based on the timeId. Now it runs in 0.358s with the following query:
SET @StartDate = '2013-01-24';
SET @EndDate = '2013-04-24';
SELECT
@StartDateId:=MIN(timeId), @EndDateId:=MAX(timeId)
FROM
times
WHERE
date IN (@StartDate, @EndDate);
SELECT
@CumulativeCount:=COALESCE(SUM(totalReviews), 0)
FROM
reviewCount
WHERE
createdOnTimeId < @StartDateId
-- Add Filters
;
SELECT
@DailyAverage:=COALESCE(SUM(ratingsSum) / SUM(totalRatings), 0)
FROM
ratingCount
WHERE
createdOnTimeId < @StartDateId
AND totalRatings > 0
-- Add Filters
GROUP BY createdOnTimeId
ORDER BY createdOnTimeId DESC
LIMIT 1;
SELECT
t.date AS `Date`,
COALESCE(q1.newReviewsCount, 0) AS `NewReviews`,
(@CumulativeCount:=@CumulativeCount + COALESCE(q1.newReviewsCount, 0)) AS `CumulativeReviewsCount`,
(@DailyAverage:=COALESCE(q2.dailyRatingAverage,
COALESCE(@DailyAverage, 0))) AS `DailyRatingAverage`
FROM
times t
LEFT JOIN
(SELECT
rc.createdOnTimeId AS createdOnTimeId,
COALESCE(SUM(rc.totalReviews), 0) AS newReviewsCount
FROM
reviewCount rc
WHERE
rc.createdOnTimeId BETWEEN @StartDateId AND @EndDateId
-- Add Filters
GROUP BY rc.createdOnTimeId) AS q1 ON t.timeId = q1.createdOnTimeId
LEFT JOIN
(SELECT
rc.createdOnTimeId AS createdOnTimeId,
SUM(rc.ratingsSum) / SUM(rc.totalRatings) AS dailyRatingAverage
FROM
ratingCount rc
WHERE
rc.createdOnTimeId BETWEEN @StartDateId AND @EndDateId
-- Add Filters
GROUP BY rc.createdOnTimeId) AS q2 ON t.timeId = q2.createdOnTimeId
WHERE
t.timeId BETWEEN @StartDateId AND @EndDateId;
I had assumed that two subqueries would be incredibly slow, but they were insanely fast because they weren't joining completely unrelated rows. It also pointed out the fact that my earlier results were way off. For example, from above:
Date NewReviews CumulativeReviewsCount DailyRatingAverage
2013-01-24 7020 10586 4.017514595496247
Should have been, and now is:
Date NewReviews CumulativeReviewsCount DailyRatingAverage
2013-01-24 599 407327 4.017514595496247
The average was correct, but the join was screwing up the number of both new and cumulative reviews, which I verified with a single query.
I also got rid of the joins to the times table, instead determining the start and end date IDs in a quick initialization query, then just rejoined to the times table at the end.
Now the results are:
Date NewReviews CumulativeReviewsCount DailyRatingAverage
2013-01-24 599 407327 4.017514595496247
2013-01-25 551 407878 4.058400718778077
2013-01-26 455 408333 3.838926174496644
2013-01-27 433 408766 3.992957746478873
2013-01-28 425 409191 3.983625730994152
...
2013-04-13 170 426066 3.874239350912779
2013-04-14 182 426248 3.585714285714286
2013-04-15 171 426419 3.6202531645569622
2013-04-16 0 426419 3.6202531645569622
2013-04-17 0 426419 3.6202531645569622
2013-04-18 0 426419 3.6202531645569622
2013-04-19 0 426419 3.6202531645569622
2013-04-20 0 426419 3.6202531645569622
2013-04-21 0 426419 3.6202531645569622
2013-04-22 0 426419 3.6202531645569622
2013-04-23 0 426419 3.6202531645569622
2013-04-24 0 426419 3.6202531645569622
The last few averages properly carry the earlier ones, too, since we haven't imported from that customer's data feed in about 10 days.
Thanks for the help!
Try this query
You don't have the necessary indexes to optimize your query:
Table times: add a compound index on (timeId, date)
Table ratingCount: add a compound index on (createdOnTimeId, totalRatings)
Since you have already mentioned that you apply various other AND filters based on user input, also create a compound index on those columns, in the order in which you add them, for each respective table. Ex: on ratingCount, a compound index on (createdOnTimeId, totalRatings, ratingType, age, gender, product, company). NOTE: this index will only be useful if you actually add those constraints to the query.
I'd also check to make sure your buffer pool is large enough to hold your indexes. You don't want indexes to be paging in and out of the buffer pool during a query.
Check your buffer pool size (BUFFER_SIZE).
If you don't find any improvement in performance, please also post the EXPLAIN output for your query; it will help in understanding the problem properly.
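Since the tables above are MyISAM, the relevant cache is the key cache rather than the InnoDB buffer pool; a quick way to check its size and hit rate (standard MySQL variable/status names, shown here for convenience) is:
SHOW VARIABLES LIKE 'key_buffer_size';
SHOW GLOBAL STATUS LIKE 'Key_read%';
-- miss rate ~= Key_reads / Key_read_requests (the counters quoted in UPDATE #1 above)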
I have tried to understand your query and made a new one; check whether it works or not.
SELECT
*
FROM
(SELECT
dt.timeId,
dt.date,
COALESCE(SUM(rc.totalReviews), 0) AS `NewReviews`,
(@cumulativeCount := @cumulativeCount + COALESCE(SUM(rc.totalReviews), 0)) AS `CumulativeReviewsCount`,
(@dailyAverageWithCarry := COALESCE(SUM(rac.ratingsSum) / SUM(rac.totalRatings), @dailyAverageWithCarry)) AS `DailyRatingAverage`
FROM
times dt
LEFT JOIN
reviewCount rc
ON
dt.timeId = rc.createdOnTimeId
LEFT JOIN
ratingCount rac ON dt.timeId = rac.createdOnTimeId
JOIN
(SELECT @cumulativeCount:=0, @dailyAverageWithCarry:=0) tmp
WHERE
dt.date < @EndDate
-- AND clause for filtering by ratingType (default 1), age, gender, product, and company is injected here in C#
GROUP BY
dt.timeId
ORDER BY
dt.timeId
) AS subquery
WHERE
subquery.date > @StartDate;
Hope this helps....