Optimize sql query - select in select - mysql

What is the best way to optimize a MySQL query that has a SELECT inside a SELECT?
This is my example:
SELECT count(distinct email)
FROM emails_stats
WHERE DATE_FORMAT(time, '%Y-%m-%d') >= '2012-12-12'
and email in (SELECT email
FROM `reminder`
WHERE DATE_FORMAT(time, '%Y-%m-%d') = '2012-12-12')
My database has over 500k entries.

SELECT count(distinct emails_stats.email)
FROM emails_stats
JOIN reminder ON emails_stats.email= reminder.email
WHERE
emails_stats.time >= CAST('2012-12-12 00:00:00' AS datetime) AND
(reminder.time BETWEEN CAST('2012-12-12 00:00:00' AS datetime) AND CAST('2012-12-12 23:59:59' AS datetime));
If you apply DATE_FORMAT() to a table column, MySQL has to read every row in the table, because it must compute the function result before it can compare it with your string. To make the query faster, create an index on the time column of each table and use the query above instead. MySQL can then find the rows it needs by looking them up in the index.
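A minimal sketch of the indexes this answer relies on, assuming the table and column names from the question. The index names are hypothetical, and EXPLAIN is only there so you can check whether the optimizer actually picks the indexes on your data.

```sql
-- Index names are hypothetical; table and column names come from the question.
CREATE INDEX idx_emails_stats_time ON emails_stats (time);
CREATE INDEX idx_reminder_time ON reminder (time);

-- Half-open range on the bare columns, so the indexes on time can be used.
-- The upper bound '2012-12-13' is exclusive, which also covers fractional seconds.
EXPLAIN
SELECT COUNT(DISTINCT emails_stats.email)
FROM emails_stats
JOIN reminder ON emails_stats.email = reminder.email
WHERE emails_stats.time >= '2012-12-12'
  AND reminder.time >= '2012-12-12'
  AND reminder.time <  '2012-12-13';
```

An index on the email columns may help the join as well, but whether it pays off depends on your data distribution.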

Use an exists clause instead:
SELECT count(distinct email)
FROM emails_stats
WHERE DATE_FORMAT(time, '%Y-%m-%d') >= '2012-12-12'
and exists (SELECT 1
FROM `reminder`
WHERE emails_stats.email = `reminder`.email
and DATE_FORMAT(`reminder`.time, '%Y-%m-%d') = '2012-12-12')

The best method here is to use a join, for example:
SELECT count(distinct emails_stats.email)
FROM emails_stats
JOIN reminder ON emails_stats.email= reminder.email
WHERE DATE_FORMAT(emails_stats.time, '%Y-%m-%d') >= '2012-12-12'
AND DATE_FORMAT(reminder.time, '%Y-%m-%d') = '2012-12-12';

Related

Avg function not returning proper value

I expect this query to return the average number of daily active users to date, grouped by month (October to December). But the result is approximately 164K when it should be 128K. Why is AVG not working? The average should be the SUM of the daily values divided by the number of days in the month so far.
SELECT sq.month_year AS 'month_year', AVG(number)
FROM
(
SELECT CONCAT(MONTHNAME(date), "-", YEAR(DATE)) AS 'month_year', count(distinct id_user) AS number
FROM table1
WHERE date between '2020-10-01' and '2020-12-31 23:59:59'
GROUP BY EXTRACT(year_month FROM date)
) sq
GROUP BY 1
OK, thanks for your help. The problem was that the subquery was aggregating by month instead of by day. It needs to aggregate by day, and the outer query then groups by month. This finally worked:
SELECT EXTRACT(year_month FROM day_month) AS month_year, AVG(number)
FROM (SELECT date(date) AS day_month,
count(distinct id_user) AS number
FROM table1
WHERE date >= '2020-10-01' AND
date < '2021-01-01'
GROUP BY 1
) sq
GROUP BY EXTRACT(year_month FROM day_month)
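To see why the two aggregation levels give different numbers, here is a small sketch with made-up data. The temporary table and its rows are assumptions for illustration only; just the names table1, id_user and date come from the question.

```sql
-- Hypothetical sample data: two days in October 2020.
CREATE TEMPORARY TABLE table1 (id_user INT, date DATETIME);
INSERT INTO table1 VALUES
  (1, '2020-10-01 08:00:00'),
  (2, '2020-10-01 09:00:00'),
  (1, '2020-10-02 08:00:00'),
  (2, '2020-10-02 09:00:00'),
  (3, '2020-10-02 10:00:00');

-- Counting per month first yields one row per month (3 distinct users here),
-- so averaging that single row just returns the monthly total.
SELECT EXTRACT(YEAR_MONTH FROM date) AS month_year,
       COUNT(DISTINCT id_user) AS number
FROM table1
GROUP BY EXTRACT(YEAR_MONTH FROM date);

-- Counting per day first yields 2 and 3, and AVG over those is 2.5.
SELECT EXTRACT(YEAR_MONTH FROM day_month) AS month_year,
       AVG(number) AS avg_daily_users
FROM (SELECT DATE(date) AS day_month,
             COUNT(DISTINCT id_user) AS number
      FROM table1
      GROUP BY DATE(date)
     ) sq
GROUP BY EXTRACT(YEAR_MONTH FROM day_month);
```

The monthly distinct count is always at least as large as any daily count, which is consistent with the inflated figure in the question.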
Do not use single quotes for column aliases!
SELECT sq.month_year, AVG(number)
FROM (SELECT CONCAT(MONTHNAME(date), '-', YEAR(DATE)) AS month_year,
count(distinct id_user) AS number
FROM table1
WHERE date >= '2020-10-01' AND
date < '2021-01-01'
GROUP BY month_year
) sq
GROUP BY 1;
Note the fixes to the query:
The GROUP BY uses the same expression as the SELECT. With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5), your query returns an error, although it works in older versions of MySQL.
The date comparisons have been simplified.
No single quotes on column aliases.
Note that the outer query is not needed. I assume it is there just to illustrate the issue you are having.

MySQL Performance DATE_FORMAT() vs YEAR() AND MONTH()

Which is better for performance when looking for timestamps in current month and year?
SELECT *
FROM mytable
WHERE YEAR(CURRENT_TIMESTAMP) = YEAR(mytable.timestamp)
AND MONTH(CURRENT_TIMESTAMP) = MONTH(mytable.timestamp)
OR
SELECT *
FROM mytable
WHERE DATE_FORMAT(CURRENT_TIMESTAMP,'%m-%Y') = DATE_FORMAT(mytable.timestamp,'%m-%Y')
For better performance, and so that MySQL can use an index on mytable.timestamp, truncate the current date to the first day of the month:
SELECT DATE_FORMAT(CURRENT_TIMESTAMP, '%Y-%m-01')
This creates a constant value that MySQL can search for in the index.
You can then get all the records from this month:
SELECT *
FROM mytable
WHERE mytable.timestamp >= DATE_FORMAT(CURRENT_TIMESTAMP, '%Y-%m-01')
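The query above has no upper bound, so it would also return rows dated after the current month if the table contains any. A hedged variant, assuming the same mytable and timestamp names, closes the range using LAST_DAY():

```sql
-- Half-open range: from the first day of the current month (inclusive)
-- to the first day of the next month (exclusive).
-- Both bounds are constants for the query, so an index on timestamp stays usable.
SELECT *
FROM mytable
WHERE mytable.timestamp >= DATE_FORMAT(CURRENT_TIMESTAMP, '%Y-%m-01')
  AND mytable.timestamp <  LAST_DAY(CURDATE()) + INTERVAL 1 DAY;
```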

MySQL query with HAVING gives me an error. How to fix it?

When I run this query in phpMyAdmin, I get this error message: Unknown column 'timestamp' in 'having clause'
My column name is timestamp.
SELECT DISTINCT (
hash
) AS total
FROM behaviour
HAVING total =1 and date(timestamp) = curdate()
How can I get the number of hashes for today?
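Use WHERE: the filter on timestamp belongs there, because HAVING cannot see a column that is neither in the select list nor in a GROUP BY. Also, parentheses are not appropriate after SELECT DISTINCT (DISTINCT is not a function). I suspect that you intend: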
SELECT COUNT(DISTINCT hash) AS total
FROM behaviour
WHERE date(timestamp) = curdate();
It is better to write the WHERE clause without using a function on the column:
SELECT COUNT(DISTINCT hash) AS total
FROM behaviour
WHERE timestamp >= curdate() AND timestamp < date_add(curdate(), interval 1 day);
Although more complicated, it allows the database engine to use an index on behaviour(timestamp) (or better yet, on behaviour(timestamp, hash)).
EDIT:
If you want the hashes that appear only once, one method is a subquery:
select count(*)
from (select hash
from behaviour
where timestamp >= curdate() AND timestamp < date_add(curdate(), interval 1 day)
group by hash
having count(*) = 1
) t;
To count the hash values that occur exactly once:
select count(*)
from
(
select hash
from behaviour
where date(timestamp) = curdate()
group by hash
having count(*) = 1
) dt
The inner select (derived table) returns the hash values that occur exactly once. The outer select counts those rows.

MySQL select specific date from datetime

I'm trying to set up a MySQL select query based on a given date in a datetime column. Something like:
SELECT DATE_FORMAT(colName, '%Y-%m-%d') WHERE..
But I'm unsure on how to do this correctly. I'm using php to do this.
The following example selects dates between 01-Jan-2014 and 15-Jan-2014.
SELECT
DATE_FORMAT(colName, '%Y-%m-%d') AS colName
FROM
`my_table`
WHERE
DATE_FORMAT(colName, '%Y-%m-%d') >= '2014-01-01' AND DATE_FORMAT(colName, '%Y-%m-%d') <= '2014-01-15'
To verify your query, use sqlfiddle and use a query without any table such as
SELECT DATE_FORMAT(now(), '%Y-%m-%d') from dual
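Wrapping the column in DATE_FORMAT() inside the WHERE clause prevents MySQL from using an index on it. A sketch of the same date range written against the bare column, assuming colName is a DATETIME column (the alias colDate is hypothetical):

```sql
-- Half-open range on the bare column: everything from 2014-01-01 00:00:00
-- up to, but not including, 2014-01-16 00:00:00.
SELECT DATE(colName) AS colDate
FROM `my_table`
WHERE colName >= '2014-01-01'
  AND colName <  '2014-01-16';
```

The formatting function is kept only in the select list, where it does not affect which rows are read.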

Select the first element in each day of the month

How do I select the first element of each day in a month with a MySQL query?
I have a table of offers with a startdate column, so I can look up the elements for a given day, month, or year, but how do I get only the first element of each day in a given month?
Assume the following
Table is called mytable
Table has id as primary key
Table has dt as DATETIME
You want the first id of every day in February 2012
Try this:
SELECT B.id FROM
(
SELECT DATE(dt) date_dt,MIN(dt) dt
FROM mytable
WHERE dt >= '2012-02-01 00:00:00'
AND dt < '2012-03-01 00:00:00'
GROUP BY DATE(dt)
) A
LEFT JOIN mytable B USING (dt);
If any dt has multiple B.id values, try this:
SELECT dt,MIN(id) id FROM
(
SELECT B.id,B.dt FROM
(
SELECT DATE(dt) date_dt,MIN(dt) dt
FROM mytable
WHERE dt >= '2012-02-01 00:00:00'
AND dt < '2012-03-01 00:00:00'
GROUP BY DATE(dt)
) A
LEFT JOIN mytable B USING (dt)
) AA GROUP BY dt;
Assuming startdate is a DATETIME type, and the earliest entry is the one with the earliest DATETIME value, for February 2012:
SELECT DISTINCT *
FROM tbl t1
LEFT JOIN tbl t2
ON (t2.startdate BETWEEN '2012-02-01 00:00:00' AND '2012-02-29 23:59:59')
AND t2.startdate < t1.startdate
WHERE (t1.startdate BETWEEN '2012-02-01 00:00:00' AND '2012-02-29 23:59:59')
AND t2.startdate IS NULL
If there are no duplicate dates, then you don't need the DISTINCT.
This query works by joining with any earlier record for the same month, so if nothing was joined, it's the earliest, through process of elimination.
This technique is explained in detail in the book SQL Antipatterns.
This could also be solved with subqueries, but this type of JOIN is supposed to be easier to optimize by MySQL than subqueries, which often negate the use of indexes.
Without knowing the exact structure of your table, something like this should work:
SELECT MIN(offerId) FROM offers WHERE startdate <= '2012-03-06' AND startdate >= '2012-02-06' GROUP BY date(startdate)
It sounds like you are trying to do something like the following:
SELECT col_1, date_col, col_3 FROM tbl
WHERE
date_col = ( SELECT min(date_col) FROM tbl
WHERE
year(date_col) = 2006 AND
month(date_col) = 02
);
This can also be used to find the max( date_col ) . Hope this helps.
Just to offer a different way to skin this cat (much easier in SQL Server for once actually)
SELECT
t0.offerId
FROM
offers AS t0 LEFT JOIN
offers AS t1 ON DATE(t1.startDate) = DATE(t0.startDate) AND t1.startDate < t0.startDate
WHERE
t0.startDate >= '2012-02-01' AND t0.startDate < '2012-03-01' AND
t1.offerId IS NULL;
If you have multiple rows with the same exact time you will get multiple values returned, which you can weed out in your application logic or with a sub-query. BTW this is called a groupwise minimum/maximum.
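On MySQL 8.0 or later, a groupwise minimum can also be written with a window function, which is a different technique from the self-joins above. This is only a sketch: the table and column names follow the answers in this thread, and breaking ties by offerId is an assumption.

```sql
-- Requires MySQL 8.0+ (window functions).
-- Numbers the rows within each calendar day, earliest startdate first;
-- ties on startdate are broken by offerId (an assumed rule).
SELECT offerId, startdate
FROM (SELECT offerId,
             startdate,
             ROW_NUMBER() OVER (PARTITION BY DATE(startdate)
                                ORDER BY startdate, offerId) AS rn
      FROM offers
      WHERE startdate >= '2012-02-01'
        AND startdate <  '2012-03-01'
     ) ranked
WHERE rn = 1;
```

Because ROW_NUMBER() assigns exactly one row the value 1 per day, this returns a single offer per day even when several rows share the same timestamp.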