Using results from parent queries in nested select - mysql

I'm sure this is a fairly trivial problem, but I'm not sure what to google to find the solution.
I have a table that looks like this:
CREATE TABLE IF NOT EXISTS `transactions` (
`name` text collate utf8_swedish_ci NOT NULL,
`value` decimal(65,2) NOT NULL,
`date` date NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci ROW_FORMAT=COMPACT;
I populate this by cutting and pasting data from my internet banking service.
Value can be negative or positive; what date and name contain should be fairly obvious ;)
I have constructed a query to let me see my bottom line for each month:
SELECT sum(`value`) as 'change', DATE_FORMAT(`date`, '%M %Y') as 'month'
FROM `transactions`
WHERE 1
GROUP BY year(`date`), month(`date`)
Now I would like to add the total accumulated money in the account at the end of the month as an additional column.
SELECT sum(`value`) as 'change', DATE_FORMAT(`date`, '%M %Y') as 'month',
(SELECT sum(`value`) FROM `transactions` WHERE `date` <= 123) as 'accumulated'
FROM `transactions`
WHERE 1
GROUP BY year(`date`), month(`date`)
123 is not exactly what I want in there, but I do not understand how to get at the result from my DATE_FORMAT inside that subquery.
Is this even the proper way to approach the problem?
This is mostly a personal exercise (running on a very small dataset) so I'm not very concerned about performance, readable SQL is far more important.
I am running an InnoDB table on MySQL 5.0.45.

SELECT `change`,
CONCAT(mymonth, ' ', myyear) AS 'month',
(
SELECT SUM(`value`)
FROM `transactions`
WHERE `date` < DATE_ADD(STR_TO_DATE(CONCAT('01.', mymonth, '.', myyear), '%d.%m.%Y'), INTERVAL 1 MONTH)
) AS 'accumulated'
FROM (
SELECT sum(`value`) as 'change', YEAR(date) AS myyear, MONTH(date) AS mymonth
FROM `transactions`
WHERE 1
GROUP BY
YEAR(`date`), MONTH(`date`)
) q
You wrote that you don't care about performance, but this syntax is not much more complex and will be more efficient (just in case):
SELECT SUM(value) AS `change`,
CONCAT(MONTH(`date`), ' ', YEAR(`date`)) AS 'month',
@r := @r + SUM(value) AS cumulative
FROM (
SELECT @r := 0
) AS vars,
transactions
WHERE 1
GROUP BY
YEAR(`date`), MONTH(`date`)
ORDER BY
YEAR(`date`), MONTH(`date`)
This one computes the cumulative sums as well, but it processes each month only once.
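For anyone who wants to try the correlated-subquery idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is a stand-in under stated assumptions, not the MySQL answer itself: SQLite's strftime replaces DATE_FORMAT, dates are stored as ISO text, and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (name TEXT, value REAL, date TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("salary", 1000.0, "2009-01-05"),      # made-up sample data
        ("rent", -400.0, "2009-01-20"),
        ("salary", 1000.0, "2009-02-05"),
        ("rent", -400.0, "2009-02-20"),
        ("car repair", -250.0, "2009-03-02"),
    ],
)

# The derived table q holds one row per month; the correlated subquery
# then sums every transaction up to and including that month.
monthly_totals = conn.execute(
    """
    SELECT q.month,
           q.change,
           (SELECT SUM(t.value)
              FROM transactions t
             WHERE strftime('%Y-%m', t.date) <= q.month) AS accumulated
      FROM (SELECT strftime('%Y-%m', date) AS month,
                   SUM(value) AS change
              FROM transactions
             GROUP BY strftime('%Y-%m', date)) q
     ORDER BY q.month
    """
).fetchall()

for month, change, accumulated in monthly_totals:
    print(month, change, accumulated)
```

The point is the same as in the first query above: group first, then let the subquery refer to the grouped month through the derived table's alias.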

Nested MariaDB Query Slow

I am having performance issues with a query. I have 21 million records across the tables, and 2 of the tables I'm querying here have 8 million each; individually, they are very quick. I've written a query that, in my opinion, isn't very good, but it's the only way I know how to do it.
This query takes 65 seconds. I need to get it under 1 second, and I think that's possible without all the SELECT subqueries, but once again, I am not sure how else to do it with my SQL knowledge.
Database server version is MariaDB 10.6.
SELECT
pa.`slug`,
(
SELECT
SUM(`impressions`)
FROM `rh_pages_gsc_country`
WHERE `page_id` = pa.`page_id`
AND `country` = 'aus'
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as au_impressions,
(
SELECT
SUM(`clicks`)
FROM `rh_pages_gsc_country`
WHERE `page_id` = pa.`page_id`
AND `country` = 'aus'
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as au_clicks,
(
SELECT
COUNT(`keywords_id`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as keywords,
(
SELECT
AVG(`position`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as avg_pos,
(
SELECT
AVG(`ctr`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as avg_ctr
FROM `rh_pages` pa
WHERE pa.`site_id` = 13
ORDER BY au_impressions DESC, keywords DESC, slug DESC
If anyone can help, I don't think the table structure is needed here as it's basically shown in the query, but here is a photo of the constraints and table types.
Anyone that can help is greatly appreciated.
Do NOT normalize any column that will be regularly used in a "range scan", such as date. The following is terribly slow:
AND `date_id` IN (
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH
AND NOW() )
It also consumes extra space to have a BIGINT (8 bytes) pointing to a DATE (3 bytes).
Once you move the date to the various tables, the subqueries simplify, such as
SELECT AVG(`position`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date_id` IN (
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH
AND NOW() )
becomes
SELECT AVG(`position`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date` >= NOW() - INTERVAL 12 MONTH
I'm assuming that nothing after "NOW" has yet been stored.
If there are dates in the future, then add
AND `date` < NOW()
Each table will probably need a new index, such as
INDEX(page_id, date) -- in that order
(Yes, the "JOIN" suggestion by others is a good one. It's essentially orthogonal to my suggestions above and below.)
After you have made those changes, if the performance is not good enough, we can discuss Summary Tables
Your query is aggregating (summarizing) rows from two different detail tables, rh_pages_gsc_country and rh_pages_gsc_keywords, and doing so for a particular date range. And it has a lot of correlated subqueries.
The first steps in your path to better performance are:
Converting your correlated subqueries to independent subqueries, then JOINing them.
Writing one subquery for each detail table, rather than one for each column you need summarized.
You mentioned you've been struggling with this. The concept I hope you learn from this answer is this: you can often refactor away your correlated subqueries if you can come up with independent subqueries that give the same results, and then join them together. If you mention subqueries in your SELECT clause -- SELECT ... (SELECT whatever) whatever ... -- you probably have an opportunity to do this refactoring.
Here goes. First you need a subquery for your date range. You have this one right, just repeated.
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
Next you need a subquery for rh_pages_gsc_country. It is a modification of what you have. We'll fetch both SUMs in one subquery.
SELECT SUM(`impressions`) impressions,
SUM(`clicks`) clicks,
page_id, date_id
FROM `rh_pages_gsc_country`
WHERE `country` = 'aus'
GROUP BY page_id, date_id
See how this goes? This subquery yields a virtual table with exactly one row for every combination of page_id and date_id, containing the number of impressions and the number of clicks.
Next, let's join the subqueries together in a main query. This yields some columns of your result set.
SELECT pa.slug, country.impressions, country.clicks
FROM rh_pages pa
JOIN (
SELECT SUM(`impressions`) impressions,
SUM(`clicks`) clicks,
page_id, date_id
FROM `rh_pages_gsc_country`
WHERE `country` = 'aus' -- constant for country code
GROUP BY page_id, date_id
) country ON country.page_id = pa.page_id
JOIN (
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
) dates ON dates.date_id = country.date_id
WHERE pa.site_id = 13 -- constant for site id
ORDER BY country.impressions DESC
This runs through the rows of rh_pages_gsc_dates and rh_pages_gsc_country just once to satisfy your query. So, faster.
Finally let's do the same thing for your rh_pages_gsc_keywords table's summary.
SELECT pa.slug, country.impressions, country.clicks,
keywords.keywords, keywords.avg_pos, keywords.avg_ctr
FROM rh_pages pa
JOIN (
SELECT SUM(`impressions`) impressions,
SUM(`clicks`) clicks,
page_id, date_id
FROM `rh_pages_gsc_country`
WHERE `country` = 'aus' -- constant for country code
GROUP BY page_id, date_id
) country ON country.page_id = pa.page_id
JOIN (
SELECT COUNT(`keywords_id`) keywords,
AVG(`position`) avg_pos,
AVG(`ctr`) avg_ctr,
page_id, date_id
FROM `rh_pages_gsc_keywords`
GROUP BY page_id, date_id
) keywords ON keywords.page_id = pa.page_id
JOIN (
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
) dates ON dates.date_id = country.date_id
AND dates.date_id = keywords.date_id
WHERE pa.site_id = 13 -- constant for site id
ORDER BY impressions DESC, keywords DESC, slug DESC
This will almost certainly be faster than what you have now. If it's fast enough, great. If not, please don't hesitate to ask another question for help, tagging it query-optimization. We will need to see your table definitions, your index definitions, and the output of EXPLAIN. Please read this before asking a followup question.
I did not, repeat not, debug any of this. That's up to you.
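Because the derived tables above are grouped by page_id and date_id, the joined result has one row per page and date. If one row per page is wanted, as in the original query, one possible variation is to apply the date filter inside each derived table and group by page_id alone. The sketch below tries that with Python's built-in sqlite3 module. Everything beyond the table and column names from the question is an assumption: the column types, the sample rows, the fixed date window (instead of NOW()), and the LEFT JOINs that keep pages without data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rh_pages (page_id INTEGER, site_id INTEGER, slug TEXT);
    CREATE TABLE rh_pages_gsc_dates (date_id INTEGER, date TEXT);
    CREATE TABLE rh_pages_gsc_country (
        page_id INTEGER, date_id INTEGER, country TEXT,
        impressions INTEGER, clicks INTEGER);
    CREATE TABLE rh_pages_gsc_keywords (
        page_id INTEGER, date_id INTEGER, keywords_id INTEGER,
        position REAL, ctr REAL);

    -- made-up sample rows
    INSERT INTO rh_pages VALUES
        (1, 13, 'home'), (2, 13, 'about'), (3, 99, 'other-site');
    INSERT INTO rh_pages_gsc_dates VALUES
        (1, '2023-01-10'), (2, '2023-06-10'), (3, '2020-01-01');
    INSERT INTO rh_pages_gsc_country VALUES
        (1, 1, 'aus', 100, 10),
        (1, 2, 'aus', 50, 5),
        (1, 3, 'aus', 999, 99),   -- outside the date window
        (1, 1, 'usa', 70, 7),     -- different country
        (2, 2, 'aus', 30, 3);
    INSERT INTO rh_pages_gsc_keywords VALUES
        (1, 1, 501, 2.0, 0.25),
        (1, 2, 502, 4.0, 0.75),
        (2, 2, 503, 8.0, 0.50);
    """
)

# One derived table per detail table, each filtered by date and grouped
# by page_id only, then joined to rh_pages.
page_summary = conn.execute(
    """
    SELECT pa.slug, c.impressions, c.clicks,
           k.keywords, k.avg_pos, k.avg_ctr
      FROM rh_pages pa
      LEFT JOIN (SELECT gc.page_id,
                        SUM(gc.impressions) AS impressions,
                        SUM(gc.clicks) AS clicks
                   FROM rh_pages_gsc_country gc
                   JOIN rh_pages_gsc_dates d ON d.date_id = gc.date_id
                  WHERE gc.country = 'aus'
                    AND d.date BETWEEN :date_from AND :date_to
                  GROUP BY gc.page_id) c ON c.page_id = pa.page_id
      LEFT JOIN (SELECT gk.page_id,
                        COUNT(gk.keywords_id) AS keywords,
                        AVG(gk.position) AS avg_pos,
                        AVG(gk.ctr) AS avg_ctr
                   FROM rh_pages_gsc_keywords gk
                   JOIN rh_pages_gsc_dates d ON d.date_id = gk.date_id
                  WHERE d.date BETWEEN :date_from AND :date_to
                  GROUP BY gk.page_id) k ON k.page_id = pa.page_id
     WHERE pa.site_id = 13
     ORDER BY c.impressions DESC, k.keywords DESC, pa.slug DESC
    """,
    {"date_from": "2022-07-01", "date_to": "2023-07-01"},
).fetchall()

for row in page_summary:
    print(row)
```

Whether this is fast enough on 8 million rows still depends on the indexes, so EXPLAIN on the real server remains the deciding check.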

MySQL Sub Query difficulty

I'm trying to get a count of how many times a user has triggered the following query. And I've concluded that a Sub Query is required.
The below (admittedly indelicate) query works, as far as it goes, without the Sub Query. And the Sub Query works as a standalone query. But after three days of trying, I cannot get the two to work combined. I don't know if I have a glaring syntax error, or whether I'm getting it all wrong in principle. I need help!
SELECT id, status, FirstName, LastName, Track, KeyChange, Version,
DATE_FORMAT(CONVERT_TZ(Created,'+00:00','+1:00'), '%l:%i %p') AS Created_formatted,
TIME_FORMAT(SEC_TO_TIME(TIMESTAMPDIFF(SECOND, pinknoise.Created, CURRENT_TIMESTAMP() - INTERVAL '0' HOUR)),'%Hh %im') AS elapsed,
(SELECT `FirstName`, Count(*) AS 'CountRequests' FROM `pinknoise` GROUP by `FirstName`)
FROM pinknoise
WHERE status = 'incoming'
ORDER BY Created DESC
I don't really understand what your query should achieve, but well formatted it looks like:
SELECT
id,
status,
FirstName,
LastName,
Track,
KeyChange,
Version,
DATE_FORMAT(
CONVERT_TZ(
Created,
'+00:00',
'+1:00'
),
'%l:%i %p'
) AS Created_formatted,
TIME_FORMAT(
SEC_TO_TIME(
TIMESTAMPDIFF(
SECOND,
pinknoise.Created,
CURRENT_TIMESTAMP() - INTERVAL '0' HOUR
)
),
'%Hh %im'
) AS elapsed,
(
SELECT
`FirstName`,
Count(*) AS 'CountRequests'
FROM
`pinknoise`
GROUP by
`FirstName`
)
FROM
pinknoise
WHERE
status = 'incoming'
ORDER BY
Created DESC
What I imagine: you want the number of total entries for this particular firstname in the same table. The dirty way would be:
SELECT
id,
status,
FirstName,
LastName,
Track,
KeyChange,
Version,
DATE_FORMAT(
CONVERT_TZ(
Created,
'+00:00',
'+1:00'
),
'%l:%i %p'
) AS Created_formatted,
TIME_FORMAT(
SEC_TO_TIME(
TIMESTAMPDIFF(
SECOND,
pinknoise.Created,
CURRENT_TIMESTAMP() - INTERVAL '0' HOUR
)
),
'%Hh %im'
) AS elapsed,
(
SELECT
Count(*)
FROM
`pinknoise` AS tb
WHERE
tb.FirstName = pinknoise.FirstName
) AS CountRequests
FROM
pinknoise
WHERE
status = 'incoming'
ORDER BY
Created DESC
A join would perform much better:
SELECT
pinknoise.id,
pinknoise.status,
pinknoise.FirstName,
pinknoise.LastName,
pinknoise.Track,
pinknoise.KeyChange,
pinknoise.Version,
DATE_FORMAT(
CONVERT_TZ(
pinknoise.Created,
'+00:00',
'+1:00'
),
'%l:%i %p'
) AS Created_formatted,
TIME_FORMAT(
SEC_TO_TIME(
TIMESTAMPDIFF(
SECOND,
pinknoise.Created,
CURRENT_TIMESTAMP() - INTERVAL '0' HOUR
)
),
'%Hh %im'
) AS elapsed,
tabA.CountRequests
FROM
pinknoise
INNER JOIN
(
SELECT
Count(*) AS 'CountRequests',
FirstName
FROM
`pinknoise`
GROUP BY
FirstName
) tabA
ON
pinknoise.FirstName = tabA.FirstName
WHERE
status = 'incoming'
ORDER BY
Created DESC
Your subselect is returning 2 values in the select portion where it only expects one value. I'm guessing you are getting the FirstName with the intent of doing a join. If so, then try this:
SELECT
p.id,
p.status,
p.FirstName,
p.LastName,
p.Track,
p.KeyChange,
p.Version,
DATE_FORMAT(CONVERT_TZ(p.Created,'+00:00','+1:00'), '%l:%i %p') AS Created_formatted,
TIME_FORMAT(SEC_TO_TIME(TIMESTAMPDIFF(SECOND, p.Created, CURRENT_TIMESTAMP() - INTERVAL '0' HOUR)),'%Hh %im') AS elapsed,
cnt.CountRequests
FROM
pinknoise p
inner join (SELECT p.FirstName, Count(*) AS CountRequests FROM pinknoise p GROUP by p.FirstName) cnt on p.FirstName = cnt.FirstName
WHERE
p.status = 'incoming'
ORDER BY
p.Created DESC;
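The join against a grouped derived table can be tried in isolation with a small sketch in Python's built-in sqlite3 module. The table is cut down to a few columns and the rows are invented, so treat it as an illustration of the pattern rather than the real pinknoise schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- reduced, hypothetical version of the pinknoise table
    CREATE TABLE pinknoise (
        id INTEGER PRIMARY KEY, status TEXT, FirstName TEXT, Track TEXT);
    INSERT INTO pinknoise (status, FirstName, Track) VALUES
        ('incoming', 'Ann', 'Song A'),
        ('played',   'Ann', 'Song B'),
        ('incoming', 'Bob', 'Song C'),
        ('played',   'Ann', 'Song D');
    """
)

# cnt has exactly one row per FirstName, so the join adds a single
# CountRequests value to each incoming row without duplicating rows.
incoming = conn.execute(
    """
    SELECT p.id, p.FirstName, p.Track, cnt.CountRequests
      FROM pinknoise p
      JOIN (SELECT FirstName, COUNT(*) AS CountRequests
              FROM pinknoise
             GROUP BY FirstName) cnt
        ON cnt.FirstName = p.FirstName
     WHERE p.status = 'incoming'
     ORDER BY p.id
    """
).fetchall()

for row in incoming:
    print(row)
```

Note that the count covers all of a user's rows, not only the incoming ones, because the WHERE clause is applied to the outer query and not inside the derived table.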

How to display three result counts by month?

I am trying to achieve 3 counts in one query:
Count all results where ‘app_creationdate’ = ('month') from current row
Count all results where ‘app_start’ = ('month') from current row
Count all results where ‘app_creationsdate’ < ‘app_start’ and ‘app_start’ = ('month') from current row
My Table:
app_id | app_creationdate(timestamp) | app_start(datetime)
00001 | 2014-11-17 19:39:04 | 2014-11-18 09:30:00
SELECT
DATE_FORMAT( app_creationsdate, '%m' ) AS 'month',
COUNT( app_id ) AS 'new',
(SELECT COUNT( app_id )
FROM appointments WHERE MONTH(app_start) = MONTH(NOW())) AS 'act',
(SELECT COUNT( app_id )
FROM appointments WHERE MONTH(app_creationsdate) < MONTH(app_start)) AS 'prev'
FROM appointments
WHERE app_owner = 2 AND app_creationsdate > DATE_SUB(now(), INTERVAL 12 MONTH)
GROUP BY DATE_FORMAT( app_creationsdate, '%Y%m' )
This may be closer to what you want. I'm still a bit confused about the prev scenario, so I did my best. I use EXTRACT(YEAR_MONTH FROM ...) to get the year and month, without the day, of each date so that we can do monthly comparisons. That's probably what you were trying to do with the DATE_FORMAT business.
SELECT DATE_FORMAT( app_creationsdate, '%m' ) AS 'month',
COUNT( app_id ) AS 'new',
-- get all other appointments that start in this month
(SELECT COUNT( act.app_id )
FROM appointments AS act
WHERE EXTRACT(YEAR_MONTH FROM act.app_start) = EXTRACT(YEAR_MONTH FROM appointments.app_creationsdate)) AS 'act',
-- get all appointments that were created before they started (???) and that started before this month
(SELECT COUNT( prev.app_id )
FROM appointments AS prev
WHERE EXTRACT(YEAR_MONTH FROM prev.app_creationsdate) < EXTRACT(YEAR_MONTH FROM appointments.app_creationsdate)
AND EXTRACT(YEAR_MONTH FROM prev.app_start) = EXTRACT(YEAR_MONTH FROM appointments.app_creationsdate)) AS 'prev'
FROM appointments
WHERE app_owner = 2
AND app_creationsdate > DATE_SUB(now(), INTERVAL 12 MONTH)
GROUP BY EXTRACT(YEAR_MONTH FROM app_creationsdate)
I am not entirely sure what you are trying to accomplish but I do notice one thing:
DATE_FORMAT(a2.app_start, '%m' ) = DATE_FORMAT('month', '%m' )
This: DATE_FORMAT('month', '%m' )...evaluates to NULL so equating that with anything is never going to work (I don't think anyway, but I'm new to MySQL... :) ).

Using grouped value instead of table value in MySQL query

I need to group some data by date, but I have a very special case where the name of the final field should be the same as the original field, and I can't use an expression in the GROUP BY.
I have created this sqlfiddle with some example data:
http://sqlfiddle.com/#!2/8771a/1
I need this result:
DATE PAGEVIEWS
2013-12 69
2013-11 70
Note 1: I can't change the group by, if I do this I get the result, but I need to group by date and date should be the formatted date, and not the real date in the table:
SELECT DATE_FORMAT(`date`, "%Y-%m") AS `date`, SUM(pageviews) AS pageviews
FROM `domains_data`
GROUP BY DATE_FORMAT(`date`, "%Y-%m")
ORDER BY `date` DESC
Note 2: I can't rename the field, it should have the name "date" at the end, this isn't possible for me:
SELECT DATE_FORMAT(`date`, "%Y-%m") AS `date2`, SUM(pageviews) AS pageviews
FROM `domains_data`
GROUP BY `date2`
ORDER BY `date` DESC
Is there some way to do this with MySQL?
Is the use of a nested query allowed?
SELECT `date`, SUM(pageviews) AS pageviews FROM
(SELECT DATE_FORMAT(`date`, "%Y-%m") AS `date`, SUM(pageviews) AS pageviews
FROM `domains_data`
GROUP BY `date`) AS Ref
GROUP BY `date`
ORDER BY `date` DESC
If you can change the from clause:
SELECT `date`, SUM(pageviews) AS pageviews
FROM (select DATE_FORMAT(`date`, "%Y-%m") AS `date`, pageviews
from `domains_data` dd
) dd
GROUP BY `date`
ORDER BY `date` DESC;
However, given the constraint that you cannot change the group by, you probably cannot change the from either. Can you explain why your query has these limitations?
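The derived-table approach can be checked quickly with Python's built-in sqlite3 module. This is only a sketch: strftime stands in for DATE_FORMAT, and the rows are invented rather than taken from the sqlfiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domains_data (date TEXT, pageviews INTEGER)")
conn.executemany(
    "INSERT INTO domains_data VALUES (?, ?)",
    [
        ("2013-11-03", 20),   # made-up sample data
        ("2013-11-21", 5),
        ("2013-12-01", 8),
        ("2013-12-15", 2),
    ],
)

# Inside the derived table the formatted value takes over the name "date",
# so the outer GROUP BY and ORDER BY can use the plain column name.
by_month = conn.execute(
    """
    SELECT date, SUM(pageviews) AS pageviews
      FROM (SELECT strftime('%Y-%m', date) AS date, pageviews
              FROM domains_data) dd
     GROUP BY date
     ORDER BY date DESC
    """
).fetchall()

for row in by_month:
    print(row)
```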

Multiple month columns without inline subqueries

Is there a more effective way to return multiple columns from a table that contains a date column instead of using inline subqueries?
SELECT (SELECT SUM(`value`) FROM `data` WHERE MONTH(`date`) = 1) AS `Jan`,
(SELECT ...) -- Feb, Mar, etc.
Because having 12 inline subqueries is taxing on the query engine, right?
SELECT YEAR(`date`) as `YEAR`,
SUM(CASE WHEN MONTH(`date`)=1 THEN `value` ELSE 0 END) AS `JAN`,
...
FROM `data`
GROUP BY YEAR(`date`)
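The SUM(CASE ...) pivot can be sketched in a runnable form with Python's built-in sqlite3 module. Assumptions: strftime replaces YEAR() and MONTH(), only two month columns are shown, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        ("2013-01-05", 10),   # made-up sample data
        ("2013-01-20", 5),
        ("2013-02-01", 7),
        ("2014-01-03", 1),
    ],
)

# A single pass over the table: each CASE routes a row's value into the
# column for its month and contributes 0 to every other column.
pivot = conn.execute(
    """
    SELECT CAST(strftime('%Y', date) AS INTEGER) AS year,
           SUM(CASE WHEN strftime('%m', date) = '01' THEN value ELSE 0 END) AS jan,
           SUM(CASE WHEN strftime('%m', date) = '02' THEN value ELSE 0 END) AS feb
      FROM data
     GROUP BY strftime('%Y', date)
     ORDER BY year
    """
).fetchall()

for row in pivot:
    print(row)
```

Compared with twelve inline subqueries, this reads the table once and lets the aggregate do the splitting.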
SELECT SUM(`value`) FROM `data` GROUP BY MONTH(`date`)