MySQL subselect is very slow

I have a table like this :
-------------------------------
| id | valid_until | username|
-------------------------------
| 1 | 2020-01-01 | user1 |
-------------------------------
| 1 | 2020-01-01 | user2 |
-------------------------------
| 1 | 2020-01-02 | user3 |
-------------------------------
| 1 | 2020-01-02 | user4 |
-------------------------------
| 1 | 2020-01-03 | user5 |
-------------------------------
| 1 | 2020-01-03 | user6 |
-------------------------------
| 1 | 2020-01-03 | user7 |
-------------------------------
This is the user subscription table; valid_until is the date on which the subscription ends.
I want to know the number of active subscriptions on each day. I have a date range, for example 2020-01-01 to 2020-01-03, and here is my query:
SELECT
valid_until qva,
(SELECT
COUNT(iod.id) AS ct
FROM
`order_detail` iod
WHERE
valid_until > qva)
FROM
`order_detail`
WHERE
valid_until >= '2020-01-01'
AND valid_until <= '2020-01-03'
GROUP BY qva
But this query is too slow (response time: 230 sec). What is the problem with my query?

Your query is invalid according to the SQL standard. You have GROUP BY qva (and using the alias name here is usually not allowed, but in MySQL and MariaDB it is), but you don't apply an aggregate function on the subquery result. MySQL is known for violating the standard here and they silently apply ANY_VALUE on such unaggregated expressions.
This makes the query slow. For each order in the date range the count is evaluated, only to pick one of those counts per date at the very end.
So, let's build the query up from scratch. You want one result row for each day in the given date range. That is:
select distinct valid_until
from order_detail
where valid_until between date '2020-01-01' and date '2020-01-03';
Then, for each of these dates you want to get the count. You can do this in a subquery as in your original query. I suppose it must be >= instead of > though, as an order valid until a day is still valid that day (or at least this is what I'd expect).
SELECT
dates.day,
(
select count(*)
from order_detail od
where od.valid_until >= dates.day
) as ct
FROM
(
select distinct valid_until as day
from order_detail
where valid_until between date '2020-01-01' and date '2020-01-03'
) dates
order by dates.day;
(This assumes that valid_until is a mere date. If it's a datetime, you'll have to adjust this query a little.)
UPDATE:
As ysth told you in the request comments, your query will get you only days in the given range that exist as valid_until in your table. So does mine. If you wanted the three days regardless, you'd have to replace the subquery and make it independent from the table, e.g.
FROM
(
select date '2020-01-01' as day union all
select date '2020-01-02' as day union all
select date '2020-01-03' as day
) dates
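The distinct-dates-plus-correlated-count query above can be tried on the sample rows from the question. This is only a sketch: SQLite, through Python's sqlite3 module, stands in for MySQL so that it runs without a server, and the helper name active_counts is made up for illustration.

```python
import sqlite3

def active_counts(conn, first_day, last_day):
    """Return (day, subscriptions still valid on that day) per distinct valid_until."""
    sql = """
        SELECT dates.day,
               (SELECT COUNT(*)
                FROM order_detail od
                WHERE od.valid_until >= dates.day) AS ct
        FROM (SELECT DISTINCT valid_until AS day
              FROM order_detail
              WHERE valid_until BETWEEN ? AND ?) AS dates
        ORDER BY dates.day
    """
    return conn.execute(sql, (first_day, last_day)).fetchall()

# In-memory SQLite database as a stand-in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_detail (id INTEGER, valid_until TEXT, username TEXT)")
conn.executemany(
    "INSERT INTO order_detail VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "user1"), (1, "2020-01-01", "user2"),
        (1, "2020-01-02", "user3"), (1, "2020-01-02", "user4"),
        (1, "2020-01-03", "user5"), (1, "2020-01-03", "user6"),
        (1, "2020-01-03", "user7"),
    ],
)

result = active_counts(conn, "2020-01-01", "2020-01-03")
for day, ct in result:
    print(day, ct)
```

The count is evaluated once per distinct day rather than once per order row, which is the point of building the list of days first.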

I think you just want a cumulative count of valid_until from the latest date to the earliest (window functions require MySQL 8.0 or later):
select valid_until, count(*) day_count,
sum(count(*)) over (order by valid_until desc) as count_as_of_day
from order_detail od
group by valid_until;
If you want this for a range of dates, then use a subquery:
select od.*
from (select valid_until, count(*) day_count,
sum(count(*)) over (order by valid_until desc) as count_as_of_day
from order_detail od
group by valid_until
) od
where valid_until >= :from and valid_until <= :to

Related

MySQL sum where inner join is missing from right or left table

I have a turnover table on the one side that has :
| Storeid | Turnover | myDate |
| 1 | 1000 | 2020-01-01 |
| 1 | 200 | 2020-01-02 |
| 1 | 4000 | 2020-01-03 |
| 1 | 1000 | 2020-01-05 |
on the other side I have a table with the number of transactions:
| Storeid | Transactions | myDate |
| 1 | 20 | 2020-01-01 |
| 1 | 40 | 2020-01-03 |
| 1 | 20 | 2020-01-04 |
| 1 | 60 | 2020-01-05 |
I need to work out the sum of the turnover and the sum of the transactions for a given date range. However, I might have missing dates in either one of the tables. If I sum them individually I get the correct answer for each, but with any sort of inner or left join I get incomplete answers (as shown below):
select sum(Turnover), sum(transactions) from TurnoverTable
left join TransactionTable on TurnoverTable.storeid = TransactionTable.storeid and
TurnoverTable.myDate = TransactionTable.myDate where TurnoverTable.myDate >= '2020-01-01'
This will produce a sum for Turnover of 6200 and for Transactions of 120 (20 is missing from the 2020-01-04 date as this date is not available in the Turnover table, therefore fails in the join).
Short of running 2 select sum queries, is there a way to run these sums?
Much appreciated.
You have dates missing in both tables, which rules out a left join solution. Conceptually, you want a full join. MySQL does not support that syntax, but you can use union all; the rest is just aggregation:
select sum(turnover) turnover, sum(transactions) transactions
from (
select mydate, turnover, 0 transactions from TurnoverTable
union all
select mydate, 0, transactions from TransactionTable
) t
where mydate >= '2020-01-01'
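As a quick check of the union all idea on the question's sample rows, here is a minimal sketch. SQLite via Python's sqlite3 stands in for MySQL (an assumption made so it runs anywhere), and the helper name totals_since is hypothetical.

```python
import sqlite3

def totals_since(conn, start_date):
    """Sum turnover and transactions independently, with no join between the tables."""
    sql = """
        SELECT SUM(turnover), SUM(transactions)
        FROM (SELECT myDate, Turnover AS turnover, 0 AS transactions
              FROM TurnoverTable
              UNION ALL
              SELECT myDate, 0, Transactions
              FROM TransactionTable) AS t
        WHERE myDate >= ?
    """
    return conn.execute(sql, (start_date,)).fetchone()

# In-memory SQLite database as a stand-in for the two MySQL tables (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TurnoverTable (Storeid INTEGER, Turnover INTEGER, myDate TEXT)")
conn.execute("CREATE TABLE TransactionTable (Storeid INTEGER, Transactions INTEGER, myDate TEXT)")
conn.executemany(
    "INSERT INTO TurnoverTable VALUES (?, ?, ?)",
    [(1, 1000, "2020-01-01"), (1, 200, "2020-01-02"),
     (1, 4000, "2020-01-03"), (1, 1000, "2020-01-05")],
)
conn.executemany(
    "INSERT INTO TransactionTable VALUES (?, ?, ?)",
    [(1, 20, "2020-01-01"), (1, 40, "2020-01-03"),
     (1, 20, "2020-01-04"), (1, 60, "2020-01-05")],
)

totals = totals_since(conn, "2020-01-01")
print(totals)
```

Because each table contributes its own rows, the 20 transactions from 2020-01-04 are no longer lost, unlike in the left join from the question.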
For this kind of statistic you should not use JOIN, because duplicated rows can produce wrong results, especially when many tables have to be joined, as is common in practice.
So I recommend UNION ALL, as in the following query. Put the date condition in each branch of the union.
SELECT
Storeid,
SUM(Turnover),
SUM(Transactions)
FROM
(SELECT
Storeid,
myDate,
Turnover,
0 AS Transactions
FROM
TurnoverTable
WHERE myDate BETWEEN '2020-01-01'
AND '2020-08-21'
UNION ALL
SELECT
Storeid,
myDate,
0 AS Turnover,
Transactions
FROM
TransactionTable
WHERE myDate BETWEEN '2020-01-01'
AND '2020-08-21') AS t
GROUP BY Storeid ;

Conditional subquery in where clause

I'm trying to create a query with conditional logic where I only calculate revenue for the most recent records by each month using a datetime column (start_date), but only if there are multiple records in that month from the same account_id.
Here's a basic example of the schema after I join two tables (full schema in sqlfiddle link).
| account_id | plan_id | start_date | plan_interval | price |
|------------|---------|----------------------|---------------|-------|
| 1 | 1 | 2018-01-03T14:52:13Z | month | 39 |
| 1 | 3 | 2018-02-07T11:10:17Z | year | 999 |
| 1 | 2 | 2018-02-07T11:11:17Z | month | 99 |
In the above example, I would only like to include rows 1 and 3 in my output, as it's the one record from account_id 1 in January and the most recent of two records for account_id 1 in February.
SELECT
MONTH(start_date) AS month,
SUM(CASE WHEN plan_interval = 'month'
THEN price * .01
ELSE (price * .01)/12 END) AS mrr
FROM subscriptions
JOIN plans
ON plans.id = subscriptions.plan_id
WHERE Year(start_date) = 2018 AND
CASE WHEN (account_id = account_id
AND MONTH(start_date) = MONTH(start_date))
THEN (SELECT MAX(start_date) FROM subscriptions)
ELSE (SELECT start_date FROM subscriptions)
END
GROUP BY month
ORDER BY month ASC;
The case statement in the subquery above does not seem to work in doing this. It returns the data without filtering out records when the first condition is met.
Here is an example: sqlfiddle
This query returns the rows that you are asking for in the question:
SELECT s.*, p.plan_interval, p.price,
(CASE WHEN p.plan_interval = 'month'
THEN p.price * 0.01
ELSE (p.price * 0.01)/12
END) AS mrr
FROM subscriptions s JOIN
plans p
ON p.id = s.plan_id
WHERE YEAR(s.start_date) = 2018 AND
s.start_date = (SELECT MAX(s2.start_date)
FROM subscriptions s2
WHERE s2.account_id = s.account_id AND
EXTRACT(YEAR_MONTH FROM s2.start_date) = EXTRACT(YEAR_MONTH FROM s.start_date)
)
ORDER BY s.start_date ASC;
This uses a subquery to get the most recent record for a subscription for each month.
You can then aggregate this however you wish.
Notes about the query:
Table aliases make the query easier to write and to read.
The subquery uses the handy YEAR_MONTH option of EXTRACT(), so it handles both years and months.
For numeric constants between -1 and 1, I always prepend a 0, so 0.12 rather than .12. I find that this makes the decimal point more obvious.
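The "latest row per account and month" filter can be sketched on the three sample rows from the question. SQLite via Python's sqlite3 is used as a stand-in for MySQL, so EXTRACT(YEAR_MONTH ...) is replaced by substr(start_date, 1, 7) on text timestamps; that substitution, the text storage format and the helper name latest_per_month are assumptions of this sketch.

```python
import sqlite3

def latest_per_month(conn, year):
    """Return the most recent subscription per account and calendar month of a year."""
    sql = """
        SELECT s.account_id, s.plan_id, s.start_date
        FROM subscriptions s
        WHERE substr(s.start_date, 1, 4) = ?
          AND s.start_date = (SELECT MAX(s2.start_date)
                              FROM subscriptions s2
                              WHERE s2.account_id = s.account_id
                                AND substr(s2.start_date, 1, 7)
                                    = substr(s.start_date, 1, 7))
        ORDER BY s.start_date
    """
    return conn.execute(sql, (year,)).fetchall()

# In-memory SQLite table; timestamps stored as 'YYYY-MM-DD HH:MM:SS' text (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (account_id INTEGER, plan_id INTEGER, start_date TEXT)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?)",
    [(1, 1, "2018-01-03 14:52:13"),
     (1, 3, "2018-02-07 11:10:17"),
     (1, 2, "2018-02-07 11:11:17")],
)

kept = latest_per_month(conn, "2018")
for row in kept:
    print(row)
```

Only the January row and the later of the two February rows survive, which matches the rows 1 and 3 the question asks for.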
First work out the last entry by account and month (subquery a), join it to subscriptions to get the plan_id, and then join to plans to get the plan:
SELECT S.ACCOUNT_id,s.plan_id,s.start_date,p.Price,p.plan_interval,
case when p.plan_interval = 'month' then p.price * .01 else p.price * .01 /12 end as rev
from subscriptions s
join (select s.account_id,month(s.start_date), max(s.start_date) start_date
from subscriptions s
group by account_id,month(start_date)) a on a.account_id = s.account_id and a.start_date = s.start_date
join plans p on p.id = s.plan_id;
+------------+---------+---------------------+----------+---------------+--------------+
| ACCOUNT_id | plan_id | start_date | Price | plan_interval | rev |
+------------+---------+---------------------+----------+---------------+--------------+
| 1 | 1 | 2018-01-03 14:52:13 | 3900.00 | month | 39.00000000 |
| 1 | 2 | 2018-02-07 11:11:17 | 9900.00 | month | 99.00000000 |
| 2 | 3 | 2018-01-03 17:40:05 | 99900.00 | year | 83.25000000 |
+------------+---------+---------------------+----------+---------------+--------------+
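In your case, the WHERE clause does not filter anything because the CASE expression is not a comparison. Both account_id = account_id and MONTH(start_date) = MONTH(start_date) compare a column with itself, so the first branch is always taken, and the date it returns is simply evaluated as true for every row.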
CASE WHEN (account_id = account_id
AND MONTH(start_date) = MONTH(start_date))
THEN (SELECT MAX(start_date) FROM subscriptions)
ELSE (SELECT start_date FROM subscriptions)
END
Another approach to what you are building would involve using a subquery to order the rows the way you want within the groups.
SELECT
account_id,
month,
CASE WHEN plan_interval = 'month'
THEN price * .01
ELSE (price * .01)/12
END AS mrr
FROM (
SELECT *, MONTH(start_date) AS month
FROM subscriptions
INNER JOIN plans ON plans.id = subscriptions.plan_id
ORDER BY account_id, start_date DESC
) sq
GROUP BY account_id, month
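This relies on MySQL picking, for each group, the first row returned by the subquery for the non-aggregated columns. MySQL does not guarantee which row is used, and the query is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled, so treat this approach with caution.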

SQL count row equal to column

Here is my table:
+--------+---------------------+
| roomNo | date |
+--------+---------------------+
| 1 | 2017-05-17 16:05:00 |
| 1 | 2017-05-17 15:05:00 |
| 2 | 2019-05-20 12:30:00 |
| 2 | 2019-05-15 10:30:00 |
| 2 | 2019-05-14 08:00:00 |
+--------+---------------------+
I want to find, for the current year, the days on which a room was used at least once, which day(s) had the most operations, and how many operations there were. I don't know how to compare the dates.
The expected result would be something like :
+--------+------------+------------+
| roomNo | date | operations |
+--------+------------+------------+
| 2 | 2019-05-20 | 3 |
+--------+------------+------------+
We can use the MySQL DATE function to lop off the time from DATETIME and TIMESTAMP columns. Or we could use the MySQL DATE_FORMAT function to return just the year, month and day.
We can use an aggregate function like COUNT or SUM in a query with GROUP BY to get counts by room and day.
If "current year" means from Jan 1 through Dec 31, we can use an expression to derive the date values '2019-01-01' and '2020-01-01', and compare the date column to those values in the WHERE clause.
As a start, consider this:
SELECT t.roomno
, DATE(t.date) AS date_
, COUNT(*) AS cnt_
FROM mytable t
WHERE t.date >= DATE_FORMAT(NOW(),'%Y-01-01') + INTERVAL 0 YEAR
AND t.date < DATE_FORMAT(NOW(),'%Y-01-01') + INTERVAL 1 YEAR
GROUP
BY t.roomno
, DATE(t.date)
ORDER
BY t.roomno
, cnt_ DESC
If the goal is to return just one of the rooms that has the highest number of uses, we could use a LIMIT clause and order from the highest count to the lowest:
ORDER
BY cnt_ DESC
, t.roomno
LIMIT 1
If the results are more complex than that, we can omit the LIMIT clause, and use the result from that query as an inline view in an outer query.
With MySQL 8.0, we can use common table expression (CTE) and window/analytic functions, to get more elaborate results.
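The group-by-room-and-day count with a half-open year range can be sketched on the sample table. SQLite via Python's sqlite3 stands in for MySQL here (an assumption), the year is passed as a parameter instead of being derived from NOW() so the sample rows are not filtered out, and the helper name usage_by_day is hypothetical.

```python
import sqlite3

def usage_by_day(conn, year):
    """Count operations per room and day within one calendar year, busiest first."""
    sql = """
        SELECT roomNo, date("date") AS day, COUNT(*) AS cnt
        FROM mytable
        WHERE "date" >= ? AND "date" < ?
        GROUP BY roomNo, date("date")
        ORDER BY cnt DESC, roomNo, day
    """
    # Half-open range [Jan 1 of year, Jan 1 of next year), as in the answer above.
    return conn.execute(sql, (f"{year}-01-01", f"{year + 1}-01-01")).fetchall()

# In-memory SQLite table with timestamps stored as text (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (roomNo INTEGER, "date" TEXT)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "2017-05-17 16:05:00"), (1, "2017-05-17 15:05:00"),
     (2, "2019-05-20 12:30:00"), (2, "2019-05-15 10:30:00"),
     (2, "2019-05-14 08:00:00")],
)

rows_2017 = usage_by_day(conn, 2017)
rows_2019 = usage_by_day(conn, 2019)
print(rows_2017)
print(rows_2019)
```

Taking only the first row of the result corresponds to the LIMIT 1 variant; ties, such as the three 2019 days with one operation each, need the full result.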

How can I return new IDs in accordance with a date?

I have a table with the following data (merely an example, actual table has 600,000 rows) (aid = access id [primary key] and id = user id [foreign key]):
aid | id | date
332 | 1 | 2016-12-15
331 | 4 | 2016-12-15
330 | 3 | 2016-12-15
329 | 1 | 2016-12-14
328 | 1 | 2016-12-14
327 | 2 | 2016-12-14
326 | 3 | 2016-12-13
325 | 2 | 2016-12-13
324 | 1 | 2016-12-13
323 | 1 | 2016-12-12
322 | 3 | 2016-12-12
321 | 1 | 2016-12-12
Each id is a user's primary key, and every time they access something in my system I log them in this table (with the date in the format shown, and their id). A user can be logged multiple times a day.
I'm looking to: return the total number of times the thing has been accessed in a day and return the total number of NEW users who have accessed the thing in a day, for the last 8 days (something will always be logged each day, so using "LIMIT 8" is fine for getting only the last 8 days).
My SQL currently looks like:
SELECT COUNT(id), COUNT(distinct id), date
FROM table
GROUP BY date
ORDER BY date DESC
LIMIT 8;
That SQL does the first part correctly, but I can't figure out how to get it to return the number of users who have never accessed the thing until that day.
Desired results would be, the one "newuser" represents the user with id "4" as they have never accessed the thing before:
COUNT(id) | newusers | date
3 | 1 | 2016-12-15
3 | 0 | 2016-12-14
3 | 0 | 2016-12-13
3 | 0 | 2016-12-12
Sorry if I didn't explain this clear enough.
To get new users you want the first day an id appeared:
select id, min(date)
from t
group by id;
The rest is just a join and group by:
select d.date, cnt, count(dd.id) as newusers
from (select date, count(*) as cnt
from t
group by date
) d left join
(select id, min(date) as mindate
from t
group by id
) dd
on d.date = dd.mindate
group by d.date, d.cnt
order by d.date desc
limit 8;
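The first-appearance join can be checked against the sample rows from the question. This is a sketch under assumptions: SQLite via Python's sqlite3 stands in for MySQL, the table is called access_log (the question only calls it "table"), and the helper name daily_stats is made up.

```python
import sqlite3

def daily_stats(conn, days):
    """Return (date, total accesses, new users) for the most recent `days` dates."""
    sql = """
        SELECT d.date, d.cnt, COUNT(dd.id) AS newusers
        FROM (SELECT date, COUNT(*) AS cnt
              FROM access_log
              GROUP BY date) AS d
        LEFT JOIN (SELECT id, MIN(date) AS mindate
                   FROM access_log
                   GROUP BY id) AS dd
               ON d.date = dd.mindate
        GROUP BY d.date, d.cnt
        ORDER BY d.date DESC
        LIMIT ?
    """
    return conn.execute(sql, (days,)).fetchall()

# In-memory SQLite table holding the sample rows (table name is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (aid INTEGER PRIMARY KEY, id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?, ?)",
    [(332, 1, "2016-12-15"), (331, 4, "2016-12-15"), (330, 3, "2016-12-15"),
     (329, 1, "2016-12-14"), (328, 1, "2016-12-14"), (327, 2, "2016-12-14"),
     (326, 3, "2016-12-13"), (325, 2, "2016-12-13"), (324, 1, "2016-12-13"),
     (323, 1, "2016-12-12"), (322, 3, "2016-12-12"), (321, 1, "2016-12-12")],
)

stats = daily_stats(conn, 8)
for row in stats:
    print(row)
```

On these twelve rows the two earliest days also report new users, because the sample contains no history before 2016-12-12; against the full table only genuinely first-time ids would be counted.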
To get the number of new users you need to compare them to a set of ids over the past 8 days
My MySQL is a bit rusty, so you might have to correct the syntax.
SELECT COUNT(id)
FROM table
WHERE id NOT IN (
SELECT DISTINCT id
FROM table
WHERE date BETWEEN DATE(DATE_SUB(NOW(), INTERVAL 8 DAY)) AND DATE(DATE_SUB(NOW(), INTERVAL 1 DAY))
)
I'll leave it as a task for you to combine it with your other query ;)
Hi, if your date column is stored as datetime, date or another date type, you can do something like this.
To get all users who accessed something in the last 8 days:
Select id, date from table
where date BETWEEN DATE_ADD(NOW(), INTERVAL -9 DAY) AND NOW()
You can then apply whatever grouping you want to that.
To get new users, you can use either a self join or a subselect.
Self join:
select t.id, t.date from table as t
LEFT join table as t2
ON t.id = t2.id
AND t.date BETWEEN DATE_ADD(NOW(), INTERVAL -1 DAY) AND NOW()
AND t2.date NOT BETWEEN DATE_ADD(NOW(), INTERVAL -9 DAY) AND NOW()
WHERE t2.id IS NULL
I used a left join to match all accesses by each user and then excluded the matched rows in the WHERE clause. However, self joins are slow, and even slower with LEFT JOIN.
Subselect:
select id, date from table
where date BETWEEN DATE_ADD(NOW(), INTERVAL -1 DAY) AND NOW()
AND id NOT IN (
SELECT id FROM table
WHERE date BETWEEN DATE_ADD(NOW(), INTERVAL -2 DAY) AND DATE_ADD(NOW(), INTERVAL -1 DAY)
)
I know those BETWEENs with DATE_ADD are not exactly nice looking, but I hope they help you more than grouping by date.
I would suggest storing the date with the time for more information, but that depends entirely on the meaning of your data.

MySQL find field values that were never recorded previously

Hi I have a table that logs user events.
date | event | userId
2016/09/29 | A | 10
2016/09/29 | A | 3
2016/09/29 | A | 2
2016/09/28 | A | 2
2016/09/28 | B | 2
2016/09/27 | A | 1
2016/09/27 | A | 1
2016/09/27 | B | 1
I need to count for each day of the current month, the number of userId that were never seen before (to simplify, the number of new users).
I have come up with the following query for the current day.
SELECT DATE(date) AS time, COUNT(DISTINCT userId) AS count FROM event
WHERE DATE(date) = date(now())
AND userId NOT IN (
SELECT DISTINCT userId FROM event WHERE DATE(date) <= DATE_SUB(now(), INTERVAL 1 DAY)
AND userId IS NOT NULL
)
But I am not able to find an efficient query to get the same result for each day of the current month.
You can try this:
SELECT myDate, COUNT(userId) AS user_count
FROM (
SELECT userId, DATE(MIN(date)) AS myDate
FROM event
GROUP BY userId
) AS ch
GROUP BY myDate
select fd date, count(1) newUserCount from
(select min(date) fd ,userId from event where date between {monthStart} and {monthEnd}
group by userId )t group by fd
This finds each user's first visit date within the month; the number of new users on each day of the month is then the count of users per first-visit date.
Hope this can help
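A variation that combines the two answers, first-seen date over the whole history and then a filter on the month, might look like the sketch below. SQLite via Python's sqlite3 stands in for MySQL, dates are stored as ISO 'YYYY-MM-DD' text rather than the 2016/09/29 format shown in the question, and the helper name new_users_per_day is hypothetical; all three are assumptions.

```python
import sqlite3

def new_users_per_day(conn, month_start, next_month_start):
    """Count users whose first-ever event falls on each day of the given month."""
    sql = """
        SELECT first_seen, COUNT(*) AS new_users
        FROM (SELECT userId, MIN("date") AS first_seen
              FROM event
              WHERE userId IS NOT NULL
              GROUP BY userId) AS f
        WHERE first_seen >= ? AND first_seen < ?
        GROUP BY first_seen
        ORDER BY first_seen
    """
    return conn.execute(sql, (month_start, next_month_start)).fetchall()

# In-memory SQLite table with the sample events, dates rewritten to ISO text (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE event ("date" TEXT, event TEXT, userId INTEGER)')
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?)",
    [("2016-09-29", "A", 10), ("2016-09-29", "A", 3), ("2016-09-29", "A", 2),
     ("2016-09-28", "A", 2), ("2016-09-28", "B", 2),
     ("2016-09-27", "A", 1), ("2016-09-27", "A", 1), ("2016-09-27", "B", 1)],
)

september = new_users_per_day(conn, "2016-09-01", "2016-10-01")
print(september)
```

Computing the first-seen date before applying the month filter keeps a user who was already active in an earlier month from being counted as new again.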