How can I optimize multiple left joins in MySQL?

Can you help me get this query to work?
I have a log query that counts logged items (table: log) for each active user (table: user, status: 1 for active) by day (table: calendar), including days without rows.
The following query takes 10 minutes to run! How can I run this in seconds rather than minutes?
SELECT
c.day, COUNT(u.id) AS count
FROM calendar c
LEFT JOIN log l
ON c.day = DATE_FORMAT(l.db_timestamp , '%Y-%m-%d')
LEFT JOIN user u
ON l.user_id = u.id
AND u.user_status_type_id = 1
WHERE
c.day > '2012-12-01'
AND c.day < '2013-01-01'
GROUP BY
c.day
Table structure:
calendar (~3,000 rows)
day
===============================
2012-01-01
2012-01-02
2012-01-03
...
2020-01-01
log (~30,000 rows)
id user_id db_timestamp
================================
1 1 2012-01-01 01:01:01
1 2 2012-01-01 01:01:01
1 1 2012-01-01 01:01:01
user (~3,000,000 rows)
id user_status_type_id
================================
1 1
1 0
Result should look like this:
Sample Expected Results
day count
=================
2012-12-01 1
2012-12-02 0
2012-12-03 4
...
2012-12-31 0
Unfortunately it takes forever to run. What should I do next?

For your selected columns you don't need any joins. Use the following SQL:
SELECT DATE_FORMAT(l.db_timestamp , '%Y-%m-%d') AS days, COUNT(l.id) AS COUNT
FROM log l
WHERE
l.db_timestamp >= '2012-12-02' -- same rows as DATE_FORMAT(...) > '2012-12-01', but an index on db_timestamp can be used
AND l.db_timestamp < '2013-01-01'
GROUP BY days
For a per-user count, use:
GROUP BY days, l.user_id

Try this, using DATE() in the join condition:
SELECT
c.day, COUNT(u.id) AS count
FROM calendar c
LEFT JOIN log l
ON c.day = DATE(l.db_timestamp)
LEFT JOIN user u
ON l.user_id = u.id
AND u.user_status_type_id = 1
WHERE
c.day BETWEEN '2012-12-01'
AND '2013-01-01'
GROUP BY
c.day
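The answers above all revolve around the same point: wrapping l.db_timestamp in a function inside the join condition forces a per-row computation and rules out an index on that column. A minimal sketch of one alternative follows: aggregate log per day first, with a plain range filter on the raw timestamp, and only then join the small per-day result to calendar. It uses Python's sqlite3 module as a stand-in for MySQL so it can be run as-is; the sample rows, the index name log_ts and the alias log_count are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (day TEXT PRIMARY KEY);
    CREATE TABLE user (id INTEGER PRIMARY KEY, user_status_type_id INTEGER);
    CREATE TABLE log (id INTEGER PRIMARY KEY, user_id INTEGER, db_timestamp TEXT);
    CREATE INDEX log_ts ON log (db_timestamp);  -- hypothetical index name
    INSERT INTO calendar VALUES ('2012-12-01'), ('2012-12-02'), ('2012-12-03');
    INSERT INTO user VALUES (1, 1), (2, 0);     -- user 2 is inactive
    INSERT INTO log VALUES
        (1, 1, '2012-12-01 01:01:01'),
        (2, 2, '2012-12-01 02:00:00'),
        (3, 1, '2012-12-03 09:00:00'),
        (4, 1, '2012-12-03 10:00:00');
""")

daily_counts = conn.execute("""
    SELECT c.day, COALESCE(d.cnt, 0) AS log_count
    FROM calendar c
    LEFT JOIN (
        -- Aggregate once per day; the range filter touches the raw column only.
        SELECT DATE(l.db_timestamp) AS day, COUNT(*) AS cnt
        FROM log l
        JOIN user u ON u.id = l.user_id AND u.user_status_type_id = 1
        WHERE l.db_timestamp >= '2012-12-01'
          AND l.db_timestamp <  '2013-01-01'
        GROUP BY DATE(l.db_timestamp)
    ) d ON d.day = c.day
    WHERE c.day >= '2012-12-01' AND c.day < '2013-01-01'
    ORDER BY c.day
""").fetchall()

print(daily_counts)
```

Days without log rows still appear because calendar drives the outer query, and COALESCE turns the missing count into 0. In MySQL the same shape should apply, but whether it is faster on the real tables depends on the indexes that exist there.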

Related

Creating an overdraft statement

I'm currently stuck on how to create a statement that shows daily overdraft statements for a particular council.
I have the following tables: councils, users, markets, market_transactions, user_deposits.
market_transactions run daily, reducing the user's account balance. When the account balance reaches 0, the user goes into overdraft (negative). When users make a deposit, their account balance increases.
I have included the following tables to show how transactions and deposits are stored.
If I reverse today's transactions I can get the account balance a user had yesterday, but I don't know how to formulate a query that gets the daily OD amount.
USERS
user_id  name   account_bal
1        Wells  -5
2        James  100
3        Joy    10
4        Mumbi  -300
DEPOSITS
id  user_id  amount  date
1   1        5       2021-04-26
2   3        10      2021-04-26
3   3        5       2021-04-25
4   4        5       2021-04-25
TRANSACTIONS
id  user_id  amount_tendered  date
1   1        5                2021-04-27
2   2        10               2021-04-26
3   3        15               2021-04-26
4   4        50               2021-04-25
The relationships are as follows:
COUNCILS
council_id  name
1           a
2           b
3           c
MARKETS
market_id  name  council_id
1          x     3
2          y     1
3          z     2
MARKET_USER_LINK
id  market_id  user_id
1   1          3
2   2          2
3   3          1
I'm running this SQL query to get the total amount each user has spent, so that I can subtract it from the user's current account balance.
I don't know if I can use this to figure out the account_balance for each day.
SELECT u.user_id, total_spent, m.council_id -- total_deposits removed: no subquery produces it
FROM users u
JOIN market_user_link ul ON ul.user_id= u.user_id
LEFT JOIN markets m ON ul.market_id =m.market_id
LEFT JOIN councils c ON m.council_id =c.council_id
LEFT JOIN (
SELECT user_id, SUM(amount_tendered) AS total_spent
FROM transactions
WHERE DATE(date) BETWEEN DATE('2021-02-01') AND DATE(NOW())
GROUP BY user_id
) t ON t.user_id= u.user_id
ORDER BY u.user_id, total_spent ASC
It looks like this when run:
| user_id | total_spent | council_id |
|-------------|----------------|------------|
| 1 | 50.00 | 1 |
| 2 | 2.00 | 3 |
I was hoping to reverse transactions and deposits done to get the account balance for a day then get the sum of users with an account balance < 0... But this has just failed to work.
The goal is to produce a query that shows the daily overdraft for a particular council (summing only the balances of users whose account balance is below 0).
Expected Result
date        council_id  o_d_amount
2021-04-24  1           -300.00
2021-04-24  2           -60.00
2021-04-24  3           -900.00
2021-04-25  1           -600.00
2021-04-25  2           -100.00
2021-04-25  3           -1200.00
This is actually not that hard, but the way you asked makes it hard to follow.
Also, your expected result should match the data you provided.
Edited: the previous solution was wrong. It counted withdrawals and deposits more than once when there was more than one event for a user/date.
Start by computing the total exchanged on each day:
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d
group by user_id, date
order by user_id, date;
What follows gets the state of the account only on days that had any deposits or withdrawals.
To get results for all days (and not just those with account movement), change the cross join part to use a table with all the dates you want (see "Get all dates between two dates in SQL Server"), but I digress...
select dates.date, c.council_id, u.name username
, u.account_bal - sum(case when e.date >= dates.date then e.exchanged_on_day else 0 end) as amount_on_start_of_day
, u.account_bal - sum(case when e.date > dates.date then e.exchanged_on_day else 0 end) as amount_on_end_of_day
from councils c
inner join markets m on c.council_id=m.council_id
inner join market_user_link mul on m.market_id=mul.market_id
inner join users u on mul.user_id=u.user_id
left join (
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d group by user_id, date
) e on u.user_id=e.user_id -- exchange on each day (MySQL needs a space after "--")
cross join (select distinct date from (select date from deposits union select date from transactions) datesInternal) dates -- all days that had a transaction
group by dates.date, c.council_id, u.name, u.account_bal
order by dates.date desc, c.council_id, u.name;
From there you can rearrange to get the result you want.
select date, council_id
, sum(case when amount_on_start_of_day<0 then amount_on_start_of_day else 0 end) o_d_amount_start
, sum(case when amount_on_end_of_day<0 then amount_on_end_of_day else 0 end) o_d_amount_end
from (
select dates.date, c.council_id, u.name username
, u.account_bal - sum(case when e.date >= dates.date then e.exchanged_on_day else 0 end) as amount_on_start_of_day
, u.account_bal - sum(case when e.date > dates.date then e.exchanged_on_day else 0 end) as amount_on_end_of_day
from councils c
inner join markets m on c.council_id=m.council_id
inner join market_user_link mul on m.market_id=mul.market_id
inner join users u on mul.user_id=u.user_id
left join (
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d group by user_id, date
) e on u.user_id=e.user_id -- exchange on each day (MySQL needs a space after "--")
cross join (select distinct date from (select date from deposits union select date from transactions) datesInternal) dates -- all days that had a transaction
group by dates.date, c.council_id, u.name, u.account_bal
) result
group by date, council_id
order by date;
You can check it on https://www.db-fiddle.com/f/msScT6B5F7FjU2aQXVr2da/6
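The core step in the queries above is the reversal: the balance at the end of a given day equals the current account_bal minus the net of every deposit and transaction dated after that day. A small sketch of just that step follows, run through Python's sqlite3 module instead of MySQL so it is self-contained. It uses a subset of the question's sample rows; the helper name balance_at_end_of is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, account_bal INTEGER);
    CREATE TABLE deposits (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER, date TEXT);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER,
                               amount_tendered INTEGER, date TEXT);
    INSERT INTO users VALUES (1, 'Wells', -5), (3, 'Joy', 10);
    INSERT INTO deposits VALUES
        (1, 1, 5, '2021-04-26'), (2, 3, 10, '2021-04-26'), (3, 3, 5, '2021-04-25');
    INSERT INTO transactions VALUES
        (1, 1, 5, '2021-04-27'), (3, 3, 15, '2021-04-26');
""")

def balance_at_end_of(day):
    """Hypothetical helper: undo every movement dated after `day`."""
    rows = conn.execute("""
        SELECT u.user_id,
               u.account_bal
                 - COALESCE(SUM(CASE WHEN e.date > ? THEN e.amount ELSE 0 END), 0) AS bal
        FROM users u
        LEFT JOIN (
            -- Deposits count as positive movements, transactions as negative ones.
            SELECT user_id, date, amount FROM deposits
            UNION ALL
            SELECT user_id, date, -amount_tendered FROM transactions
        ) e ON e.user_id = u.user_id
        GROUP BY u.user_id, u.account_bal
        ORDER BY u.user_id
    """, (day,)).fetchall()
    return dict(rows)

print(balance_at_end_of("2021-04-26"))
print(balance_at_end_of("2021-04-25"))
```

Summing only the negative balances per council, as in the final query of the answer, is then a second aggregation on top of this result.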
Basically, the query maps users to councils, calculates periods of overdraft for users, then aggregates over councils. I assume that the starting balance is dated at the start of the month, '2021-04-01' (it could be the ending balance as well, see below); change it as needed. I also assume that a negative starting balance counts as an overdraft. For simplicity and debugging, the query is divided into a number of steps.
with uc as (
select distinct m.council_id, mul.user_id
from markets m
join market_user_link mul on m.market_id = mul.market_id
),
user_running_total as (
select user_id, date,
coalesce(lead(date) over(partition by user_id order by date) - interval 1 day, date) nxt,
sum(sum(s)) over(partition by user_id order by date) rt
from (
select user_id, date, -amount_tendered s
from transactions
union all
select user_id, date, amount
from deposits
union all
select user_id, se.d, se.s
from users
cross join lateral (
select date(NOW() + interval 1 day) d, 0 s
union all
select '2021-04-01' d, account_bal
) se
) t
group by user_id, date
),
user_overdraft as (
select user_id, date, nxt, least(rt, 0) ovd
from user_running_total
where date <= date(NOW())
),
dates as (
select date
from user_overdraft
union
select nxt
from user_overdraft
),
council__overdraft as (
select uc.council_id, d.date, sum(uo.ovd) total_overdraft, lag(sum(uo.ovd), 1, sum(uo.ovd) - 1) over(partition by uc.council_id order by d.date) prev_ovd
from uc
cross join dates d
join user_overdraft uo on uc.user_id = uo.user_id and d.date between uo.date and uo.nxt
group by uc.council_id, d.date
)
select council_id, date, total_overdraft
from council__overdraft
where total_overdraft <> prev_ovd
order by date, council_id
In fact, council__overdraft is already quite usable; the last step just compacts the output by excluding intermediate dates on which the overdraft did not change.
With following sample data:
users
user_id name account_bal
1 Wells -5
2 James 100
3 Joy 10
4 Mumbi -300
deposits, ordered by date, extra row added for the last date
id user_id amount date
3 3 5 2021-04-25
4 4 5 2021-04-25
1 1 5 2021-04-26
2 3 10 2021-04-26
5 3 73 2021-05-06
transactions, ordered by date (note the added row, to illustrate the running total in action)
id user_id amount_tendered date
5 4 50 2021-04-25
2 2 10 2021-04-26
3 3 15 2021-04-26
1 1 5 2021-04-27
4 3 17 2021-04-27
councils
council_id name
1 a
2 b
3 c
markets
market_id name council_id
1 x 3
2 y 1
3 z 2
market_user_link
id market_id user_id
1 1 3
2 2 2
3 3 1
4 3 4
the query output is
council_id  date        overdraft
1           2021-04-01  0
2           2021-04-01  -305
3           2021-04-01  0
2           2021-04-25  -350
2           2021-04-26  -345
2           2021-04-27  -350
3           2021-04-27  -7
3           2021-05-06  0
Alternatively, provided the users table holds a closing (NOW()) balance, replace the user_running_total CTE with the following code:
user_running_total as (
select user_id, date,
coalesce(lead(date) over(partition by user_id order by date) - interval 1 day, date) nxt,
coalesce(sum(sum(s)) over(partition by user_id order by date desc
rows between unbounded preceding and 1 preceding), sum(s)) rt
from (
select user_id, date, amount_tendered s
from transactions
union all
select user_id, date, -amount
from deposits
union all
select user_id, se.d, se.s
from users
cross join lateral (
select date(NOW() + interval 1 day) d, account_bal s
union all
select '2021-04-01' d, 0
) se
) t
where DATE(date) between date '2021-04-01' and date(NOW() + interval 1 day)
group by user_id, date
),
This way the query starts with the closing balance, dated the day after today, and rolls the running total back in reverse order to '2021-04-01' as the starting date.
Output
council_id  date        overdraft
1           2021-04-01  0
2           2021-04-01  -260
3           2021-04-01  -46
2           2021-04-25  -305
3           2021-04-25  -41
2           2021-04-26  -300
3           2021-04-26  -46
2           2021-04-27  -305
3           2021-04-27  -63
3           2021-05-06  0
db-fiddle both versions

How to include some data and other data in SQL conditioned to date?

I have this query that selects users and some of their data, with some filters (such as the group they are in), along with the amount they produced (in $) last month (taken from the last existing record of last month, using MAX(created_date)). It is for a management platform that shows how much each user produced this month and the previous month (us.amount_produced and up.amount_produced last_month_amount).
The problem is that it doesn't select users that are new (that haven't produced any amount last month), and I need those to be returned too.
Any help is appreciated, thanks.
(I was thinking about doing a JOIN or even two queries, but I'm not sure about the best approach.)
Note from the examples below that user #3 didn't have any logs in the User_Performance table before February; he was created in February. So the query below won't return him (I need it to return him).
User table structure:
Users
id email login amount_produced created_date
---------------------------------------------
1 foo#bar.com foo 1000 2019-12-20 22:30:01
2 jack#gmail.com jack 0 2019-12-20 22:30:01
3 john#gmail.com john 2000 2020-02-01 00:00:01
User_Group_Config table structure:
User_Group_Config
user_id group_id
---------------------------------------------
1 4
2 1
3 4
User_Performance table structure (a log table; a job inserts data every hour, calculating and logging user productivity):
User_Performance
user_id amount_produced created_date
---------------------------------------------
1 500 2020-01-31 22:30:01
2 0 2020-01-31 22:30:01
1 500 2020-01-31 23:30:01
2 0 2020-01-31 23:30:01
1 1000 2020-02-01 00:30:01
2 0 2020-02-01 00:30:01
3 0 2020-02-01 00:30:01
SELECT
us.id,
us.email,
us.login,
ugc.group_id,
up.user_id,
up.amount_produced last_month_amount
FROM
db.User_Performance AS up,
db.User_Group_Config ugc,
db.User AS us
WHERE
created_date IN (SELECT
MAX(created_date)
FROM
User_Performance
WHERE
/* Here it filters only users that have data last month, I need these AND the ones that have no data to return zero here or null or undefined at this row)*/
MONTH(created_date) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)
GROUP BY user_id)
AND ugc.group_id = 4
AND up.user_id = ugc.user_id
AND us.id = up.user_id;
Desired results (note that user #2 wasn't selected, since his group_id is 1):
Results
(current month) (previous month)
id email login amount_produced last_month_amount
---------------------------------------------
1 foo#bar.com foo 1000 500
3 john#gmail.com john 0 null or 0
Test
SELECT
us.id,
us.contact_phone,
us.email,
us.first_name,
us.last_name,
us.login,
ugc.group_id,
us.create_date,
us.expire_date,
us.profile_photo,
us.dashboard_enabled,
us.general_rating,
us.rework_rating,
us.amount_produced,
us.amount_spent,
up.user_id,
up.amount_produced last_month_amount
FROM db.User_Performance AS up
LEFT JOIN db.User_Group_Config ugc ON up.user_id = ugc.user_id AND ugc.group_id = 4
LEFT JOIN db.User us ON us.id = up.user_id
WHERE
up.created_date IN (SELECT
MAX(created_date)
FROM
User_Performance
WHERE
/* Here it filters only users that have data last month, I need these AND the ones that have no data to return zero here or null or undefined at this row)*/
MONTH(created_date) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)
GROUP BY user_id);
Solved using this, with subquery and JOIN (not the best solution, but a solution):
SELECT
us.id,
us.email,
us.login,
ugc.group_id,
us.amount_produced,
(
SELECT
perf.amount_produced
FROM
User_Performance perf
WHERE
perf.user_id = us.id AND
perf.created_date BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00') and CONCAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), " 23:59:59")
ORDER BY
perf.created_date DESC
LIMIT 1
) as amount_produced_last_month
FROM
User AS us
INNER JOIN
User_Group_Config ugc ON ugc.user_id = us.id
WHERE
ugc.group_id = 4;
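The correlated subquery above works because it starts from User rather than from User_Performance, so a user without any row last month still produces a result row. The same idea can be written with LEFT JOINs: find each user's latest created_date inside the month, then join back to fetch that row. The sketch below assumes this join-based shape as an alternative, not as the poster's method, and runs on Python's sqlite3 rather than MySQL. The month bounds are passed in as parameters to keep the result deterministic, and the function name last_month_report is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (id INTEGER PRIMARY KEY, login TEXT, amount_produced INTEGER);
    CREATE TABLE User_Group_Config (user_id INTEGER, group_id INTEGER);
    CREATE TABLE User_Performance (user_id INTEGER, amount_produced INTEGER, created_date TEXT);
    INSERT INTO User VALUES (1, 'foo', 1000), (2, 'jack', 0), (3, 'john', 2000);
    INSERT INTO User_Group_Config VALUES (1, 4), (2, 1), (3, 4);
    INSERT INTO User_Performance VALUES
        (1, 500,  '2020-01-31 22:30:01'), (2, 0, '2020-01-31 22:30:01'),
        (1, 500,  '2020-01-31 23:30:01'), (2, 0, '2020-01-31 23:30:01'),
        (1, 1000, '2020-02-01 00:30:01'), (2, 0, '2020-02-01 00:30:01'),
        (3, 0,    '2020-02-01 00:30:01');
""")

def last_month_report(month_start, next_month_start):
    """Hypothetical helper: group 4 users with their last logged amount in the month."""
    return conn.execute("""
        SELECT us.id, us.login, us.amount_produced,
               up.amount_produced AS last_month_amount
        FROM User us
        JOIN User_Group_Config ugc ON ugc.user_id = us.id AND ugc.group_id = 4
        LEFT JOIN (
            -- Latest log timestamp per user inside the half-open month range.
            SELECT user_id, MAX(created_date) AS last_created
            FROM User_Performance
            WHERE created_date >= ? AND created_date < ?
            GROUP BY user_id
        ) latest ON latest.user_id = us.id
        LEFT JOIN User_Performance up
               ON up.user_id = latest.user_id
              AND up.created_date = latest.last_created
        ORDER BY us.id
    """, (month_start, next_month_start)).fetchall()

report = last_month_report("2020-01-01", "2020-02-01")
print(report)
```

User 3 comes back with None (NULL) for last_month_amount because neither LEFT JOIN finds a January row for him, while user 2 is dropped by the group filter.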

How to determine a query for a specific time interval in MySQL?

I'm running some crontabs which trigger R scripts where I load Google Analytics data for a specific time interval. Usually it's the interval
Today - 1 to Today - 14 days, which corresponds to the following statement:
subset(mydata, date >= Sys.Date()-14 & date <= Sys.Date()-1)
I would like to add a MySQL query to that R script in order to get some data which uses the same time interval. My tables have the following form:
`pictures` `music` `likes`
id date_of_upload id pictures_id id pictures_id
1 2012-01-16 50 1283 287 12
2 2012-02-17 25 736 2366 39
... ... ... ... ... ...
6000 2016-01-23
My query has the following form where I would like to meet the upper time interval:
SELECT
COUNT(p.id) AS pictures,
COUNT(m.id) AS songs,
COUNT(l.id) AS likes,
CAST(p.date_of_upload AS DATE) AS Posted
FROM pictures p
LEFT JOIN
music m ON p.id = m.pictures_id
LEFT JOIN
likes l ON p.id = l.pictures_id
WHERE p.date_of_upload > DATE_ADD(CURRENT_DATE(), INTERVAL - 14 DAY)
But that doesn't seem to be the right implementation for the time interval.
The required output may look as following:
posted songs likes picture
2016-01-23 20 30 3
2016-01-22 10 8 1
2016-01-21
...
2016-01-07
I think the simplest solution is to use COUNT(DISTINCT):
SELECT COUNT(DISTINCT p.id) AS pictures,
COUNT(DISTINCT m.id) AS songs,
COUNT(DISTINCT l.id) AS likes,
CAST(p.date_of_upload AS DATE) AS Posted
FROM pictures p LEFT JOIN
music m
ON p.id = m.pictures_id LEFT JOIN
likes l
ON p.id = l.pictures_id
WHERE p.date_of_upload > DATE_ADD(CURRENT_DATE(), INTERVAL - 14 DAY)
GROUP BY CAST(p.date_of_upload AS DATE) -- one output row per upload day
The problem is probably that you are getting a Cartesian product between the music and likes rows of each picture: a separate row for each combination of picture, song, and like.
COUNT(DISTINCT) is the easiest fix, but it is inefficient when the counts are large.
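To see the Cartesian product in numbers, here is a small sketch on Python's sqlite3 (standing in for MySQL) with one picture that has two songs and three likes. The ids are made up for illustration. The two LEFT JOINs yield 2 x 3 = 6 rows, so every plain COUNT reports 6, while COUNT(DISTINCT ...) recovers the real figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, date_of_upload TEXT);
    CREATE TABLE music (id INTEGER PRIMARY KEY, pictures_id INTEGER);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, pictures_id INTEGER);
    INSERT INTO pictures VALUES (1, '2016-01-23');
    INSERT INTO music VALUES (50, 1), (51, 1);
    INSERT INTO likes VALUES (287, 1), (288, 1), (289, 1);
""")

row = conn.execute("""
    SELECT COUNT(p.id), COUNT(m.id), COUNT(l.id),
           COUNT(DISTINCT p.id), COUNT(DISTINCT m.id), COUNT(DISTINCT l.id)
    FROM pictures p
    LEFT JOIN music m ON p.id = m.pictures_id
    LEFT JOIN likes l ON p.id = l.pictures_id
    GROUP BY p.date_of_upload
""").fetchone()

# First three values are inflated by the join, last three are the true counts.
plain_counts, distinct_counts = row[:3], row[3:]
print(plain_counts, distinct_counts)
```

When the per-picture counts get large, aggregating music and likes in separate subqueries before joining avoids building the product at all.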

How to update table after a certain time interval

How can I update a table after some time interval when a condition is matched?
tb_contest
id contest_id name is_expire
1 101 new 0
2 102 old 0
tb_answer
contest_id answer_id date
101 1 2012-02-02
101 2 2012-09-14
102 5 2012-06-01
I need to update tb_contest once a condition is met: set is_expire=1 two days after the last answer was received, i.e. 2012-09-14, so tb_contest should be updated on 2012-09-16.
You could use MySQL's event scheduler:
CREATE EVENT expire_contests
ON SCHEDULE EVERY DAY
STARTS CURRENT_DATE
DO UPDATE tb_contest JOIN (
SELECT contest_id, MAX(date) AS latest
FROM tb_answer
GROUP BY contest_id
) t USING (contest_id)
SET tb_contest.is_expire = 1
WHERE tb_contest.is_expire <> 1
AND t.latest <= CURRENT_DATE - INTERVAL 2 DAY
Try this one,
UPDATE tb_contest a INNER JOIN
(
SELECT contest_ID, MAX(`date`) maxDate
FROM tb_answer
GROUP BY contest_ID
) b ON a.contest_ID = b.contest_ID
SET a.is_expire = 1
WHERE DATEDIFF(CURDATE(), b.maxDate) >= 2 AND
a.is_expire = 0
So here it goes: the two tables are joined by contest_ID, using the latest answer date from tb_answer. With DATEDIFF() we get the difference between today's date and the date the contest was last answered.
You can JOIN the contest and an inner-query on the answer table in the UPDATE clause and use MySQL's DATEDIFF to count the number-of-days since the answer was, well, answered:
UPDATE
tb_contest c
JOIN (SELECT contest_id, MAX(date) AS date FROM tb_answer GROUP BY contest_id) AS a
ON a.contest_id = c.contest_id
SET
c.is_expire = 1
WHERE
DATEDIFF(NOW(), a.date) >= 2
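The expiry rule shared by the answers above (latest answer at least two days old) can be checked on the question's sample rows. The sketch below runs on Python's sqlite3 instead of MySQL, so it uses a correlated subquery and SQLite's DATE(?, '-2 days') where the answers use UPDATE ... JOIN and DATEDIFF. "Today" is passed in as a parameter to keep the run deterministic, and the function name expire_contests is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb_contest (id INTEGER PRIMARY KEY, contest_id INTEGER,
                             name TEXT, is_expire INTEGER);
    CREATE TABLE tb_answer (contest_id INTEGER, answer_id INTEGER, date TEXT);
    INSERT INTO tb_contest VALUES (1, 101, 'new', 0), (2, 102, 'old', 0);
    INSERT INTO tb_answer VALUES
        (101, 1, '2012-02-02'), (101, 2, '2012-09-14'), (102, 5, '2012-06-01');
""")

def expire_contests(today):
    """Hypothetical helper: expire contests whose last answer is 2+ days before `today`."""
    conn.execute("""
        UPDATE tb_contest
        SET is_expire = 1
        WHERE is_expire = 0
          AND (SELECT MAX(a.date) FROM tb_answer a
               WHERE a.contest_id = tb_contest.contest_id) <= DATE(?, '-2 days')
    """, (today,))
    # Return the flag per contest_id so the effect of each run is visible.
    return dict(conn.execute(
        "SELECT contest_id, is_expire FROM tb_contest ORDER BY contest_id"))

state_on_15th = expire_contests("2012-09-15")
state_on_16th = expire_contests("2012-09-16")
print(state_on_15th, state_on_16th)
```

Contest 101 stays open on 2012-09-15 and expires on 2012-09-16, two days after its last answer. A contest with no answers is never expired here, because MAX over no rows is NULL and the comparison is then not true.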

MySQL date related query

I am having problems understanding how best to tackle this query.
I have a table called user_files. A user can have many files. I want to get a list of users who have not modified any of their files within the last year. If a user has modified at least one of their files within the last year, then that user should be excluded.
Table: user_files
file_id | user_id | date_modified
----------------------------------
1 100 2010-10-01
2 100 2010-11-13
3 100 2011-01-01
4 101 2010-10-01
5 101 2010-06-13
6 101 2011-04-12
7 101 2012-04-01
The expected result would only list user_id 100.
Here is some bad sql I have been playing with. The idea is that I find all users who recently modified their files and then find users who are not included in that list.
select user_id from users where user_id not in
(
select user_id from
(
select user_id, story_id, max(date_modified) from user_files
where DATE_FORMAT(date_modified, '%Y-%m-%d') >= DATE_SUB(curdate(), INTERVAL 1 YEAR)
group by user_id
)x
)
Thanks
SELECT DISTINCT(f.user_id)
FROM user_files f
WHERE NOT EXISTS(SELECT 1
FROM user_files ff
WHERE ff.user_id = f.user_id
AND ff.date_modified >= DATE_SUB(curdate(), INTERVAL 1 YEAR))
http://sqlfiddle.com/#!2/64e7f/1
Or,
SELECT user_id
FROM user_files
GROUP BY user_id
HAVING MAX(date_modified) < DATE_SUB(curdate(), INTERVAL 1 YEAR)
http://sqlfiddle.com/#!2/64e7f/4
You can use this simple solution:
SELECT a.user_id
FROM users a
LEFT JOIN user_files b ON
a.user_id = b.user_id AND
b.date_modified >= CURDATE() - INTERVAL 1 YEAR
WHERE b.user_id IS NULL
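The NOT EXISTS and HAVING MAX(...) variants above can be compared side by side on the question's sample rows. This sketch uses Python's sqlite3 rather than MySQL and replaces CURDATE() - INTERVAL 1 YEAR with a fixed cutoff of '2011-06-01', an assumed date chosen so the result does not depend on when the code is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_files (file_id INTEGER PRIMARY KEY, user_id INTEGER,
                             date_modified TEXT);
    INSERT INTO user_files VALUES
        (1, 100, '2010-10-01'), (2, 100, '2010-11-13'), (3, 100, '2011-01-01'),
        (4, 101, '2010-10-01'), (5, 101, '2010-06-13'), (6, 101, '2011-04-12'),
        (7, 101, '2012-04-01');
""")

cutoff = "2011-06-01"  # assumed stand-in for "one year ago"

# Variant 1: a user qualifies when even the newest file is older than the cutoff.
by_having = [r[0] for r in conn.execute("""
    SELECT user_id
    FROM user_files
    GROUP BY user_id
    HAVING MAX(date_modified) < ?
    ORDER BY user_id
""", (cutoff,))]

# Variant 2: a user qualifies when no file on or after the cutoff exists.
by_not_exists = [r[0] for r in conn.execute("""
    SELECT DISTINCT f.user_id
    FROM user_files f
    WHERE NOT EXISTS (SELECT 1
                      FROM user_files ff
                      WHERE ff.user_id = f.user_id
                        AND ff.date_modified >= ?)
    ORDER BY f.user_id
""", (cutoff,))]

print(by_having, by_not_exists)
```

Both variants read only user_files, so a user with no files at all never shows up in them; the LEFT JOIN from users in the last answer is the one that would also list such users.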