mysql index guidance needed - group by sub query super slow - mysql

Quick overview: I have worked out a MySQL query but need to optimize its performance.
My original post was here, but it has gone cold and I would like to elaborate on some of the suggestions that I tried to implement. So this is not a duplicate post, but it is related.
Here is the query, which takes 45 seconds or more; the GROUP BY in the second subquery really slows things down.
SELECT * FROM
(
SELECT DISTINCT email,
title,
first_name,
last_name,
'chauntry' AS source,
post_code AS postcode
FROM chauntry
WHERE mailing_indicator = 1
) AS x
JOIN
(
SELECT email,
Avg(amount_paid) AS avg_paid,
Count(*) AS no_times_booked,
Count(DISTINCT( Date_format(added, '%M %Y') )) AS unique_months
FROM chauntry
WHERE added >= Now() - INTERVAL 1 year
GROUP BY email
) AS y
ON x.email = y.email
Based on the index suggestions from here, I looked around for a few examples of indexing and came up with the following:
ALTER TABLE `chauntry`
ADD INDEX(`mailing_indicator`, `email`);
ALTER TABLE `chauntry`
ADD INDEX covering_index (`added`, `email`, `amount_paid`);
This makes no difference to the query time, and I'm not sure whether what I'm doing is even close, as up until now I have had no need to use indexing.
Suggestions are welcome on how to index my table correctly or how to modify the query.

Out of curiosity, does this query do what you want?
SELECT email, title, first_name, last_name, 'chauntry' AS source,
post_code AS postcode,
Avg(amount_paid) AS avg_paid,
Count(*) AS no_times_booked,
Count(DISTINCT( Date_format(added, '%M %Y') )) AS unique_months
FROM chauntry
WHERE added >= Now() - INTERVAL 1 year
GROUP BY email, title, first_name, last_name, post_code
HAVING SUM(mailing_indicator = 1) > 0;
It would seem to follow the same logic as your query, except that the mailing indicator would need to have been set in the past year.

Why use JOIN on subselects to same table?
I would try this:
SELECT email,
title,
first_name,
last_name,
'chauntry' AS source,
post_code AS postcode,
Avg(amount_paid) AS avg_paid,
Count(*) AS no_times_booked,
Count(DISTINCT( Date_format(added, '%M %Y') )) AS unique_months
FROM chauntry
WHERE
mailing_indicator = 1 and
added >= Now() - INTERVAL 1 year
GROUP BY email
Also, I don't think you need any extra index for a query like this, except maybe one on added and email, and you have already added those.

Minor play.
The average of the amount_paid is the biggest problem. If you are prepared to put up with the possibility of an inaccuracy for this figure then you could maybe average the distinct values of the amount_paid field. This WILL give the wrong value under certain circumstances (ie, if you had 100 bookings, 99 at $1 and 1 at $100 the average would be given as $50.50 rather than $1.99), but if the amount paid is never repeated then this may be acceptable.
Otherwise you can probably use a join of the table against itself. To get the no_times_booked you can count the DISTINCT unique identifiers of the table (I have assumed id here).
SELECT c1.email,
c1.title,
c1.first_name,
c1.last_name,
'chauntry' AS source,
c1.post_code AS postcode,
Avg(DISTINCT c2.amount_paid) AS avg_paid,
Count(DISTINCT c2.id) AS no_times_booked,
Count(DISTINCT( Date_format(c2.added, '%M %Y') )) AS unique_months
FROM chauntry c1
INNER JOIN chauntry c2
ON c1.email = c2.email
WHERE c1.mailing_indicator = 1
AND c2.added >= Now() - INTERVAL 1 year
GROUP BY c1.email,
c1.title,
c1.first_name,
c1.last_name,
source,
c1.post_code
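The inaccuracy of averaging distinct values can be checked on a toy table. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption made only to keep the example self-contained), with a hypothetical bookings table holding 99 payments of 1 and one payment of 100.

```python
import sqlite3

# Hypothetical table; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (email TEXT, amount_paid REAL)")

# 99 bookings at 1 and a single booking at 100, as in the example above.
rows = [("a@example.com", 1.0)] * 99 + [("a@example.com", 100.0)]
conn.executemany("INSERT INTO bookings VALUES (?, ?)", rows)

# Compare the true average with the average over distinct amounts.
plain_avg, distinct_avg = conn.execute(
    "SELECT AVG(amount_paid), AVG(DISTINCT amount_paid) FROM bookings"
).fetchone()
print(plain_avg, distinct_avg)
```

The plain average comes out at about 1.99, while the distinct average is 50.5, so the shortcut is only safe when amounts are rarely repeated.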

Related

making a query for stock/price trend in mysql

SQL Fiddle
Table scheme:
CREATE TABLE company
(`company_id` int,`name` varchar(30))
;
INSERT INTO company
(`company_id`,`name`)
VALUES
(1,"Company A"),
(2,"Company B")
;
CREATE TABLE price
(`company_id` int,`price` int,`time` timestamp)
;
INSERT INTO price
(`company_id`,`price`,`time`)
VALUES
(1,50,'2015-02-21 02:34:40'),
(2,60,'2015-02-21 02:35:40'),
(1,70,'2015-02-21 05:34:40'),
(2,120,'2015-02-21 05:35:40'),
(1,150,'2015-02-22 02:34:40'),
(2,130,'2015-02-22 02:35:40'),
(1,170,'2015-02-22 05:34:40'),
(2,190,'2015-02-22 05:35:40')
I'm using Cron Jobs to fetch company prices. In concatenating the price history for each company, how can I make sure that only the last one in each day is included? In this case, I want all of the price records around 05:30am concatenated.
This is the result I'm trying to get (I have used Date(time) to only get the dates from the timestamps):
COMPANY_ID PRICE TIME
1 70|170 2015-02-21|2015-02-22
2 120|190 2015-02-21|2015-02-22
I have tried the following query but it doesn't work. The prices don't correspond to the dates and I don't know how to exclude all of the 2:30 am records before applying the Group_concat function.
SELECT company_id,price,trend_date FROM
(
SELECT company_id, GROUP_CONCAT(price SEPARATOR'|') AS price,
GROUP_CONCAT(trend_date SEPARATOR'|') AS trend_date
FROM
(
SELECT company_id,price,
DATE(time) AS trend_date
FROM price
ORDER BY time ASC
)x1
GROUP BY company_id
)t1
Can anyone show me how to get the desired result?
Ok, so this should work as intended:
SELECT p.company_id,
GROUP_CONCAT(price SEPARATOR '|') as price,
GROUP_CONCAT(PriceDate SEPARATOR '|') as trend_date
FROM price as p
INNER JOIN (SELECT company_id,
DATE(`time`) as PriceDate,
MAX(`time`) as MaxTime
FROM price
GROUP BY company_id,
DATE(`time`)) as t
ON p.company_id = t.company_id
AND p.`time` = t.MaxTime
GROUP BY p.company_id
Here is the modified sqlfiddle.
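For anyone who wants to try the join-on-MAX(time) idea without a MySQL server, here is a minimal sketch using Python's sqlite3 module as a stand-in (an assumption; the column aliases price_date and max_time are made up for the example). It keeps only the last row per company per day and does the concatenation in Python, so the prices and dates stay aligned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price (company_id INTEGER, price INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO price VALUES (?, ?, ?)",
    [
        (1, 50, "2015-02-21 02:34:40"),
        (2, 60, "2015-02-21 02:35:40"),
        (1, 70, "2015-02-21 05:34:40"),
        (2, 120, "2015-02-21 05:35:40"),
        (1, 150, "2015-02-22 02:34:40"),
        (2, 130, "2015-02-22 02:35:40"),
        (1, 170, "2015-02-22 05:34:40"),
        (2, 190, "2015-02-22 05:35:40"),
    ],
)

# Join each row to the latest timestamp of its (company, day) group.
latest_rows = conn.execute(
    """
    SELECT p.company_id, p.price, t.price_date
    FROM price AS p
    INNER JOIN (SELECT company_id,
                       DATE(time) AS price_date,
                       MAX(time)  AS max_time
                FROM price
                GROUP BY company_id, DATE(time)) AS t
            ON p.company_id = t.company_id
           AND p.time = t.max_time
    ORDER BY p.company_id, p.time
    """
).fetchall()

# Concatenate per company in Python, preserving the time order.
trend = {}
for company_id, price, price_date in latest_rows:
    prices, dates = trend.setdefault(company_id, ([], []))
    prices.append(str(price))
    dates.append(price_date)

result = {cid: ("|".join(p), "|".join(d)) for cid, (p, d) in trend.items()}
print(result)
```

In MySQL itself, GROUP_CONCAT accepts an ORDER BY clause inside the call, which is one way to keep the two concatenated columns in the same order.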
This is a bit unorthodox but I think it solves your problem:
SELECT company_id,
GROUP_CONCAT(price SEPARATOR'|'),
GROUP_CONCAT(trend_date SEPARATOR'|')
FROM (
SELECT *
FROM (
SELECT company_id,
DATE(`time`) `trend_date`,
price
FROM price
ORDER BY `time` DESC
) AS a
GROUP BY company_id, `trend_date`
) AS b
GROUP BY company_id

Select smallest date after group by

Let's start off by saying that I have almost completed my MySQL query, but I just need a final hint towards the answer.
I used the following MySQL query (for reference):
SELECT * FROM
(
SELECT ll.id AS id, ll.globalId AS globalId, ll.date AS date, ll.serverId AS serverId, ll.gamemodeId AS gamemodeId, ll.mapId AS mapId, origin,
pjl.id AS pjlid, pjl.globalid AS pjlglobalId, pjl.date AS pjldate, MIN(pjl.date) AS mindate, pjl.serverId AS pjlserverId, pjl.playerId AS pjlplayerId
FROM (
(
SELECT id, globalId, date, serverId, playerId, 'playerjoins' AS origin
FROM playerjoins pj
WHERE playerId =976
)
UNION ALL
(
SELECT id, globalId, date, serverId, playerId, 'playerleaves' AS origin
FROM playerleaves pl
WHERE playerId =976
)
ORDER BY date DESC
)pjl
JOIN levelsloaded ll ON pjl.date >= ll.date
GROUP BY ll.id, origin
ORDER BY date DESC) above
This gives me the result set (part of it) that can be found on the following SQL Fiddle: http://sqlfiddle.com/#!2/514b6/1/0
What I want is the following:
You can see that there are duplicate ids in the result set; take, for example, the result with id = 133.
I want to see the first action that happened after the date in that record (id = 113).
The date of that record is November, 27 2013 00:00:17+0000.
Now there are two possible actions that happened directly after that date:
1) origin = 'playerjoins' on mindate = November, 28 2013 00:00:18+0000.
2) origin = 'playerleaves' on mindate = November, 28 2013 00:00:19+0000.
Since playerjoins is the one that has first happened, I want that in my final resultset.
So, I hope my example makes it clear: for every 2 rows with the same id, I want the row that has the lowest mindate. I need to be able to see the whole row, so only knowing the lowest mindate per 2 rows does not suffice. I need to know the origin as well.
EDIT: The answer might be found here, https://stackoverflow.com/a/7745635/2057294 , still investigating it.
The correct query is:
SELECT *
FROM levelsloaded ll
INNER JOIN
(SELECT id, MIN(mindate) AS finalmindate
FROM levelsloaded
GROUP BY id
) ill
ON ll.id = ill.id AND ll.mindate = ill.finalmindate
ORDER BY date DESC
This does exactly what I described, a more detailed answer can be found in: https://stackoverflow.com/a/7745635/2057294.
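A small sketch of the same "join back on the group minimum" pattern, using Python's sqlite3 module as a stand-in for MySQL. The actions table and its rows are hypothetical and only mimic the shape of the result set described above (id, origin, mindate).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actions (id INTEGER, origin TEXT, mindate TEXT)")
conn.executemany(
    "INSERT INTO actions VALUES (?, ?, ?)",
    [
        (1, "playerjoins", "2013-11-28 00:00:18"),
        (1, "playerleaves", "2013-11-28 00:00:19"),
        (2, "playerjoins", "2013-11-29 10:00:05"),
        (2, "playerleaves", "2013-11-29 09:59:00"),
    ],
)

# The subquery finds the lowest mindate per id; the join then recovers
# the whole row (including origin) that carries that date.
first_actions = conn.execute(
    """
    SELECT a.id, a.origin, a.mindate
    FROM actions AS a
    INNER JOIN (SELECT id, MIN(mindate) AS finalmindate
                FROM actions
                GROUP BY id) AS m
            ON a.id = m.id AND a.mindate = m.finalmindate
    ORDER BY a.id
    """
).fetchall()
print(first_actions)
```

If two rows of the same id share the lowest mindate, this pattern returns both of them, so a tie-breaker would be needed in that case.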

sql calculate change and percent by year

I have a data set that simulates the rate of return for a trading account. There is an entry for each day showing the balance and the open equity. I want to calculate the yearly, quarterly, or monthly change and the percent gain or loss. I have this working for daily data, but for some reason I can't seem to get it to work for yearly data.
The code for daily data follows:
SELECT b.`Date`, b.Open_Equity, delta,
concat(round(delta_p*100,4),'%') as delta_p
FROM (SELECT *,
(Open_Equity - @pequity) as delta,
(Open_Equity - @pequity)/@pequity as delta_p,
(@pequity:= Open_Equity)
FROM tim_account_history p
CROSS JOIN
(SELECT @pequity:= NULL
FROM tim_account_history
ORDER by `Date` LIMIT 1) as a
ORDER BY `Date`) as b
ORDER by `Date` ASC
Grouping by YEAR(Date) doesn't seem to make the desired difference. I have tried everything I can think of, but it still seems to return daily rate of change even if you group by month or year, etc. I think I'm not using windowing correctly, but I can't seem to figure it out. If anyone knows of a good book about this sort of query I'd appreciate that also.
Thanks. (sqlfiddle example)
Using what Lolo contributed, I have added some code so the data comes from the last day of the year, instead of the first. I also just need the Open_Equity, not the sum.
I'm still not certain I understand why this works, but it does give me what I was looking for. Using another select statement as a from seems to be the key here; I don't think I would have come up with this without Lolo's help. Thank you.
SELECT b.`yyyy`, b.Open_Equity,
concat('$',round(delta, 2)) as delta,
concat(round(delta_p*100,4),'%') as delta_p
FROM (SELECT *,
(Open_Equity - @pequity) as delta,
(Open_Equity - @pequity)/@pequity as delta_p,
(@pequity:= Open_Equity)
FROM (SELECT (EXTRACT(YEAR FROM `Date`)) as `yyyy`,
(SUBSTRING_INDEX(GROUP_CONCAT(CAST(`Open_Equity` AS CHAR) ORDER BY `Date` DESC), ',', 1 )) AS `Open_Equity`
FROM tim_account_history GROUP BY `yyyy` ORDER BY `yyyy` DESC) p
CROSS JOIN
(SELECT @pequity:= NULL) as a
ORDER BY `yyyy` ) as b
ORDER by `yyyy` ASC
Try this:
SELECT b.`Date`, b.Open_Equity, delta,
concat(round(delta_p*100,4),'%') as delta_p
FROM (SELECT *,
(Open_Equity - @pequity) as delta,
(Open_Equity - @pequity)/@pequity as delta_p,
(@pequity:= Open_Equity)
FROM (SELECT YEAR(`Date`) `Date`, SUM(Open_Equity) Open_Equity FROM tim_account_history GROUP BY YEAR(`Date`)) p
CROSS JOIN
(SELECT @pequity:= NULL) as a
ORDER BY `Date` ) as b
ORDER by `Date` ASC
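To see why aggregating to one row per year first is the key step, here is a rough sketch using Python's sqlite3 module as a stand-in for MySQL. The sample rows are invented, and the "previous value" bookkeeping is done in a Python loop instead of a user variable; it takes the Open_Equity from the last recorded day of each year, as in the asker's follow-up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tim_account_history (Date TEXT, Open_Equity REAL)")
# Invented sample data: a few days spread over three years.
conn.executemany(
    "INSERT INTO tim_account_history VALUES (?, ?)",
    [
        ("2012-06-29", 900.0),
        ("2012-12-31", 1000.0),
        ("2013-03-15", 1050.0),
        ("2013-12-31", 1100.0),
        ("2014-12-30", 990.0),
    ],
)

# Step 1: reduce the daily data to one row per year (last recorded day).
year_end = conn.execute(
    """
    SELECT strftime('%Y', h.Date) AS yyyy, h.Open_Equity
    FROM tim_account_history AS h
    INNER JOIN (SELECT MAX(Date) AS last_date
                FROM tim_account_history
                GROUP BY strftime('%Y', Date)) AS l
            ON h.Date = l.last_date
    ORDER BY yyyy
    """
).fetchall()

# Step 2: walk the yearly rows in order, comparing each with the previous one.
changes = []
previous = None
for yyyy, equity in year_end:
    if previous is None:
        delta, delta_p = None, None  # no earlier year to compare against
    else:
        delta = equity - previous
        delta_p = delta / previous
    changes.append((yyyy, equity, delta, delta_p))
    previous = equity

for row in changes:
    print(row)
```

Running the change calculation directly over the daily rows and grouping afterwards compares consecutive days, which is why the grouping has to happen in an inner query first.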

Balancing out MYSQL select statements

I inserted 'vanity_name' and 'name' into the first and second SELECT statements respectively.
I get a mismatched number of columns error, which I'm confused about because I added a column to both select statements to maintain a balance.
SQL Statement:
SELECT id,
vanity_name,
Date_format(DATE, '%M %e, %Y') AS DATE,
TYPE
FROM (SELECT resume_id AS id,
date_mod AS DATE,
'resume' AS TYPE
FROM resumes
WHERE user_id = '1'
UNION ALL
SELECT profile_id,
name,
date_mod AS DATE,
'profile'
FROM profiles
WHERE user_id = '1'
ORDER BY DATE DESC
LIMIT
5) AS d1
ORDER BY DATE DESC
Erm, you have four columns in your outer select, three in the inner select.
id, vanity_name, date, type
vs.
id, date, TYPE
Based on the parentheses, you're trying to union:
(SELECT resume_id AS id, date_mod AS date, 'resume' AS TYPE FROM resumes WHERE user_id = '1'
with
SELECT profile_id,name,date_mod AS date, 'profile' FROM profiles ... LIMIT 5)
and they obviously don't match. Reposition your parens.
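A quick way to reproduce the mismatch and one possible balanced version, sketched with Python's sqlite3 module as a stand-in for MySQL. The table layouts are assumptions (in particular, that resumes has a vanity_name column), and the exact error text differs between SQLite and MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schemas, only for illustration.
conn.execute("CREATE TABLE resumes (resume_id INTEGER, vanity_name TEXT, date_mod TEXT)")
conn.execute("CREATE TABLE profiles (profile_id INTEGER, name TEXT, date_mod TEXT)")
conn.execute("INSERT INTO resumes VALUES (1, 'my-resume', '2011-01-02')")
conn.execute("INSERT INTO profiles VALUES (7, 'My profile', '2011-01-03')")

# Three columns on the left, four on the right: the union is rejected.
unbalanced = """
    SELECT resume_id AS id, date_mod AS date, 'resume' AS type FROM resumes
    UNION ALL
    SELECT profile_id, name, date_mod, 'profile' FROM profiles
"""
try:
    conn.execute(unbalanced)
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)

# Four columns on both sides, in the same order.
balanced = """
    SELECT resume_id AS id, vanity_name AS name, date_mod AS date, 'resume' AS type
    FROM resumes
    UNION ALL
    SELECT profile_id, name, date_mod, 'profile' FROM profiles
    ORDER BY date DESC
    LIMIT 5
"""
balanced_rows = conn.execute(balanced).fetchall()
print(mismatch_error)
print(balanced_rows)
```

The column added to the outer SELECT has to exist in both inner SELECTs, since the outer query can only see what the union produces.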

PHP MySQL Group By question

I have a column inside my table: tbl_customers that distinguishes a customer record as either a LEAD or a CUS.
The column is simply: recordtype, which is a char(1). I populate it with either C or L.
Obviously C = customer, while L = lead.
I want to run a query that groups by the day the record was created, so I have a column called: datecreated.
Here's where I get confused with the grouping.
I want to display, in one query, the COUNT of customers and the COUNT of leads for a particular day or date range. I can pull the number for either recordtype:C or recordtype:L successfully, but that takes 2 queries.
Here's what I have so far:
SELECT COUNT(customerid) AS `count`, datecreated
FROM `tbl_customers`
WHERE `datecreated` BETWEEN '".$startdate."' AND '".$enddate."'
AND `recordtype` = 'C'
GROUP BY `datecreated` ASC
As expected, this displays 2 columns (the count of customer records and the datecreated).
Is there a way to display both in one query, while still grouping by the datecreated column?
You can group by multiple columns.
SELECT COUNT(customerid) AS `count`, datecreated, `recordtype`
FROM `tbl_customers`
WHERE `datecreated` BETWEEN '".$startdate."' AND '".$enddate."'
GROUP BY `datecreated` ASC, `recordtype`
SELECT COUNT(customerid) AS `count`,
datecreated,
SUM(`recordtype` = 'C') AS CountOfC,
SUM(`recordtype` = 'L') AS CountOfL
FROM `tbl_customers`
WHERE `datecreated` BETWEEN '".$startdate."' AND '".$enddate."'
GROUP BY `datecreated` ASC
See Is it possible to count two columns in the same query
There are two solutions, depending on whether you want the two counts in separate rows or in separate columns.
In separate rows:
SELECT datecreated, recordtype, COUNT(*)
FROM tbl_customers
WHERE datecreated BETWEEN '...' AND '...'
GROUP BY datecreated, recordtype
In separate columns (this is called pivoting the table):
SELECT datecreated,
SUM(recordtype = 'C') AS count_customers,
SUM(recordtype = 'L') AS count_leads
FROM tbl_customers
WHERE datecreated BETWEEN '...' AND '...'
GROUP BY datecreated
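The pivoting variant can be tried out with a few sample rows. This is a sketch only, using Python's sqlite3 module as a stand-in for MySQL, invented data, and a hypothetical helper named counts_by_day; the date range is passed as bound parameters instead of being spliced into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_customers (customerid INTEGER, recordtype TEXT, datecreated TEXT)"
)
# Invented rows: C = customer, L = lead.
conn.executemany(
    "INSERT INTO tbl_customers VALUES (?, ?, ?)",
    [
        (1, "C", "2010-05-01"),
        (2, "L", "2010-05-01"),
        (3, "L", "2010-05-01"),
        (4, "C", "2010-05-02"),
        (5, "C", "2010-05-03"),
    ],
)


def counts_by_day(connection, startdate, enddate):
    """Return (datecreated, customers, leads) for each day in the range."""
    return connection.execute(
        """
        SELECT datecreated,
               SUM(CASE WHEN recordtype = 'C' THEN 1 ELSE 0 END) AS count_customers,
               SUM(CASE WHEN recordtype = 'L' THEN 1 ELSE 0 END) AS count_leads
        FROM tbl_customers
        WHERE datecreated BETWEEN ? AND ?
        GROUP BY datecreated
        ORDER BY datecreated
        """,
        (startdate, enddate),
    ).fetchall()


pivot = counts_by_day(conn, "2010-05-01", "2010-05-02")
print(pivot)
```

Each day appears once, with the customer and lead counts side by side, and days outside the range are left out.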
Use:
$query = sprintf("SELECT COUNT(c.customerid) AS count,
c.datecreated,
SUM(CASE WHEN c.recordtype = 'C' THEN 1 ELSE 0 END) AS CountOfC,
SUM(CASE WHEN c.recordtype = 'L' THEN 1 ELSE 0 END) AS CountOfL
FROM tbl_customers c
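WHERE c.datecreated BETWEEN STR_TO_DATE('%s', '%%Y-%%m-%%d %%H:%%i')
AND STR_TO_DATE('%s', '%%Y-%%m-%%d %%H:%%i')
GROUP BY c.datecreated",
$startdate, $enddate);
You need to fill out the date format - see STR_TO_DATE for details. The percent signs in the date format are doubled so that sprintf passes them through literally instead of treating them as its own placeholders.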