Select on aggregate result - mysql

I have a query that counts rows in another table and returns the result as an extra column. I then need to filter the original select on that column, but MySQL reports an unknown column.
E.g. the following query counts rows from another table inside the main query and names the result shares. I need to filter the main result set on whether that column is greater than 0, but I get the error "unknown column shares".
select b.name, event_title, ue.event_vis, event_date,
(select count(*) from list_shares
where user_id = 63 and event_id=ue.user_event_id) as shares,
(DAYOFYEAR(ue.event_date) - DAYOFYEAR(CURDATE())) as days
FROM brains b
join user_events ue on b.user_id=ue.user_id
where b.user_id=63 and ((ue.event_vis='Public') OR (shares>0))
and MOD(DAYOFYEAR(ue.event_date) - DAYOFYEAR(CURDATE()) + 365, 365) <= 30
order by days asc
Is there a way to do this?

I would suggest using a derived table to deliver the aggregate value and joining it as you would a "physical" table. Example:
select
b.name,
ue.event_title,
ue.event_vis,
ue.event_date,
tmp.shares,
(DAYOFYEAR(ue.event_date) - DAYOFYEAR(CURDATE())) as days
from
brains b join user_events ue on b.user_id = ue.user_id
left join (
select
ls.user_id,
ls.event_id,
count(*) as shares
from
list_shares ls
group by
ls.user_id,
ls.event_id) tmp on b.user_id = tmp.user_id and ue.user_event_id = tmp.event_id
where
b.user_id = 63
and
((ue.event_vis = 'Public') OR (tmp.shares > 0))
and
MOD(DAYOFYEAR(ue.event_date) - DAYOFYEAR(CURDATE()) + 365, 365) <= 30
order by
days asc
Please note the "left join". Because you're using the OR operator in your where clause, it seems to me that you also want rows without a share.
Of course you could also repeat the same subselect in your where clause, but that's duplicate code and harder to maintain.
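A minimal sketch of the derived-table approach, run against an in-memory SQLite database as a stand-in for MySQL. The table and column names follow the question; the rows are invented for illustration, and the date filter is left out because DAYOFYEAR and CURDATE are MySQL functions.

```python
import sqlite3

# Hypothetical miniature of the tables in the question (rows are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brains (user_id INTEGER, name TEXT);
CREATE TABLE user_events (user_event_id INTEGER, user_id INTEGER,
                          event_title TEXT, event_vis TEXT);
CREATE TABLE list_shares (user_id INTEGER, event_id INTEGER);
INSERT INTO brains VALUES (63, 'Ann');
INSERT INTO user_events VALUES
  (1, 63, 'Birthday', 'Public'),
  (2, 63, 'Wedding',  'Private'),
  (3, 63, 'Reunion',  'Private');
INSERT INTO list_shares VALUES (63, 2), (63, 2);
""")

# The aggregate lives in a derived table, so the outer WHERE can use it.
# LEFT JOIN keeps public events that have no shares at all.
rows = conn.execute("""
SELECT ue.event_title, ue.event_vis, COALESCE(tmp.shares, 0) AS shares
FROM brains b
JOIN user_events ue ON b.user_id = ue.user_id
LEFT JOIN (
    SELECT user_id, event_id, COUNT(*) AS shares
    FROM list_shares
    GROUP BY user_id, event_id
) tmp ON tmp.user_id = b.user_id AND tmp.event_id = ue.user_event_id
WHERE b.user_id = 63
  AND (ue.event_vis = 'Public' OR tmp.shares > 0)
ORDER BY ue.user_event_id
""").fetchall()

print(rows)
```

The private event without shares drops out because its tmp.shares is NULL, and NULL > 0 is not true.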

You cannot filter on a column alias in the WHERE clause of the same query level, because WHERE is evaluated before the select list. Try something like
SELECT x.*
FROM (
/*Your Query*/
) as x
WHERE x.shares > 0
Or you could do something like
select b.name, event_title, ue.event_vis, event_date,
shares.sharesCount as shares,
(DAYOFYEAR(ue.event_date) - DAYOFYEAR(CURDATE())) as days
FROM brains b
join user_events ue on b.user_id=ue.user_id
-- the inner join keeps only events that have at least one share
JOIN (select event_id, count(*) as sharesCount from list_shares
where user_id = 63 group by event_id) as shares ON shares.event_id = ue.user_event_id
where b.user_id=63
and MOD(DAYOFYEAR(ue.event_date) - DAYOFYEAR(CURDATE()) + 365, 365) <= 30
AND shares.sharesCount > 0
order by days asc
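The wrapping idea from this answer can be sketched as follows, again with SQLite standing in for MySQL and with invented rows. The inner query computes shares; the outer query is then free to filter on it. The date condition is omitted to keep the sketch portable.

```python
import sqlite3

# Hypothetical data; only the columns needed for the filter are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_events (user_event_id INTEGER, user_id INTEGER, event_vis TEXT);
CREATE TABLE list_shares (user_id INTEGER, event_id INTEGER);
INSERT INTO user_events VALUES
  (1, 63, 'Public'), (2, 63, 'Private'), (3, 63, 'Private');
INSERT INTO list_shares VALUES (63, 2), (63, 2);
""")

# Inner level: compute the alias. Outer level: filter on it.
rows = conn.execute("""
SELECT x.user_event_id, x.event_vis, x.shares
FROM (
    SELECT ue.user_event_id, ue.event_vis,
           (SELECT COUNT(*) FROM list_shares ls
             WHERE ls.user_id = 63
               AND ls.event_id = ue.user_event_id) AS shares
    FROM user_events ue
    WHERE ue.user_id = 63
) AS x
WHERE x.event_vis = 'Public' OR x.shares > 0
ORDER BY x.user_event_id
""").fetchall()

print(rows)
```

Because the filter sits one level above the alias, the same shape should work in MySQL without repeating the subselect.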

Related

MYSQL select max date from joined tables

I have 2 tables which I want to join and retrieve some specific data. These are my tables.
tbl_user (reg_id, l_name, f_name, status)
tbl_payments (pay_id, reg_id, mem_plan, from_date, to_date, bill_no, payed_date)
What I need to do is list the users who have due payments. To do that I want the user details where "status=0" from tbl_user, joined to tbl_payments under these conditions: to_date < current date, the difference between the current date and to_date is < 31 days, and only the row with the max value of to_date is kept.
What I have so far satisfies the conditions above, except that it doesn't filter by MAX(to_date). This is my query.
SELECT
A.reg_id,
A.f_name,
A.l_name,
B.mem_plan,
B.from_date,
Max(B.to_date) AS to_date,
B.bill_no,
B.payed_date
FROM
tbl_user A,
tbl_payments B
WHERE
A.status = 0
AND A.reg_id = B.reg_id
AND Date(Now()) >= Date(B.to_date)
AND Datediff(Date(Now()), Date(b.to_date)) < 31
GROUP BY
a.reg_id, b.mem_plan, b.from_date, b.bill_no, b.payed_date;
I'm not very familiar with MySQL, so please tell me what I did wrong or whether this query is not up to standard.
Here are some sample data to work on.
tbl_user ( [M1111,Jon, Doe,0], [M1112,Jane,Doe,1],[M1113,Jony,Doe,0] )
tbl_payment ( [1,M1111,Monthly,2018-05-14,2018-06-14,b123,2018-05-14],[2,M1112,3Months,2018-02-03,2018-05-03,b112,2018-02-03],[3,M1113,Monthly,2018-06-14,2018-07-14,b158,2018-06-14],[4,M1111,Monthly,2018-06-15,2018-07-15,b345,2018-06-15],[5,M1113,Monthly,2018-06-06,2018-07-06,b158,2018-06-06],[6,M1111,Monthly,2018-07-05,2018-08-05,b345,2018-07-05] )
Assuming the current date is 2018-07-17, the expected result should be this
[M1111,Jon,Doe,Monthly,2018-06-15,2018-07-15,b345,2018-06-15],[M1113,Jony,Doe,Monthly,2018-06-14,2018-07-14,b158,2018-06-14]
Instead of that, my query gives me this.
[M1111,Jon,Doe,Monthly,2018-06-15,2018-07-15,b345,2018-06-15],[M1113,Jony,Doe,Monthly,2018-06-06,2018-07-06,b158,2018-06-06],
[M1113,Jony,Doe,Monthly,2018-06-14,2018-07-14,b158,2018-06-14]
I wrote another query which gives me exactly the result set I want, but I'm not sure whether it's up to standard. If someone can simplify it or make it better, I'd appreciate it very much.
SELECT A.reg_id,A.f_name,A.l_name,D.mem_plan,D.from_date,D.to_date,D.bill_no,D.payed_date
FROM tbl_user A
JOIN (SELECT B.reg_id,B.mem_plan,B.from_date,B.to_date,B.bill_no,B.payed_date
FROM tbl_payments B
JOIN (
SELECT reg_id, MAX(to_date) as to_date
FROM tbl_payments
WHERE DATE(NOW()) >= DATE(to_date) AND DATEDIFF(DATE(NOW()), DATE(to_date))<31
GROUP BY reg_id) C
ON B.reg_id = C.reg_id AND B.to_date= C.to_date) D
ON A.reg_id = D.reg_id
WHERE A.status=0;
I believe HAVING won't work here and that your second query is about as good as it gets. I've condensed it a little here:
SELECT A.reg_id,f_name,l_name,mem_plan,from_date,to_date,bill_no,payed_date
FROM tbl_user A
JOIN tbl_payments B ON A.reg_id = B.reg_id
JOIN (
SELECT reg_id, MAX(to_date) as max_to_date
FROM tbl_payments
WHERE DATE(NOW()) >= DATE(to_date) AND DATEDIFF(DATE(NOW()), DATE(to_date))<31
GROUP BY reg_id
) C ON B.reg_id = C.reg_id AND B.to_date= C.max_to_date
WHERE A.status=0;
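As a check, the condensed query can be run against the sample rows from the question. This sketch uses SQLite instead of MySQL, so the date arithmetic is an assumption: a bound parameter named today replaces NOW(), and julianday() replaces DATEDIFF. Dates are stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_user (reg_id TEXT, f_name TEXT, l_name TEXT, status INTEGER);
CREATE TABLE tbl_payments (pay_id INTEGER, reg_id TEXT, mem_plan TEXT,
                           from_date TEXT, to_date TEXT,
                           bill_no TEXT, payed_date TEXT);
INSERT INTO tbl_user VALUES
  ('M1111', 'Jon',  'Doe', 0),
  ('M1112', 'Jane', 'Doe', 1),
  ('M1113', 'Jony', 'Doe', 0);
INSERT INTO tbl_payments VALUES
  (1, 'M1111', 'Monthly', '2018-05-14', '2018-06-14', 'b123', '2018-05-14'),
  (2, 'M1112', '3Months', '2018-02-03', '2018-05-03', 'b112', '2018-02-03'),
  (3, 'M1113', 'Monthly', '2018-06-14', '2018-07-14', 'b158', '2018-06-14'),
  (4, 'M1111', 'Monthly', '2018-06-15', '2018-07-15', 'b345', '2018-06-15'),
  (5, 'M1113', 'Monthly', '2018-06-06', '2018-07-06', 'b158', '2018-06-06'),
  (6, 'M1111', 'Monthly', '2018-07-05', '2018-08-05', 'b345', '2018-07-05');
""")

# Per user, find the latest to_date inside the 31-day window,
# then join back to fetch the full payment row for that date.
rows = conn.execute("""
SELECT a.reg_id, a.f_name, b.from_date, b.to_date, b.bill_no
FROM tbl_user a
JOIN tbl_payments b ON b.reg_id = a.reg_id
JOIN (
    SELECT reg_id, MAX(to_date) AS max_to_date
    FROM tbl_payments
    WHERE to_date <= :today
      AND julianday(:today) - julianday(to_date) < 31
    GROUP BY reg_id
) c ON c.reg_id = b.reg_id AND c.max_to_date = b.to_date
WHERE a.status = 0
ORDER BY a.reg_id
""", {"today": "2018-07-17"}).fetchall()

for row in rows:
    print(row)
```

With the current date fixed at 2018-07-17 this yields one row each for M1111 and M1113, matching the expected result in the question.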

Select most recent record grouped by 3 columns

I am trying to return the price of the most recent record grouped by ItemNum and FeeSched, Customer can be eliminated. I am having trouble understanding how I can do that reasonably.
The issue is that I am joining about 5 tables containing hundreds of thousands of rows to end up with this result set. The initial query takes about a minute to run, and there has been some trouble with timeout errors in the past. Since this will run on a client's workstation, it may run even slower, and I have no access to modify server settings to increase memory / timeouts.
Here is my data:
Customer Price ItemNum FeeSched Date
5 70.75 01202 12 12-06-2017
5 70.80 01202 12 06-07-2016
5 70.80 01202 12 07-21-2017
5 70.80 01202 12 10-26-2016
5 82.63 02144 61 12-06-2017
5 84.46 02144 61 06-07-2016
5 84.46 02144 61 07-21-2017
5 84.46 02144 61 10-26-2016
I don't have access to create temporary tables or views, and there is no such thing as a @variable in C-tree, but in most ways it acts like MySQL. I wanted to use something like GROUP BY ItemNum, FeeSched and select MAX(Date). The issue is that unless I put Price into the GROUP BY I get an error.
I could run the query again only selecting ItemNum, FeeSched, Date and then doing an INNER JOIN, but with the query taking a minute to run each time, it seems there is a better way that maybe I don't know.
Here is my query I am running, it isn't really that complicated of a query other than the amount of data it is processing. Final results are about 50,000 rows. I can't share much about the database structure as it is covered under an NDA.
SELECT DISTINCT
CustomerNum,
paid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.primfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
UNION ALL
SELECT DISTINCT
CustomerNum,
secpaid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.secfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
The problem seemed quite simple after the first three paragraphs, but I got a little confused once I had read the whole question.
However you produced the data posted above, once it is in that shape it's easy to retrieve "the most recent record grouped by ItemNum and FeeSched".
How to:
First, sort the whole result set by Date DESC.
Second, select the fields you need from the sorted result set and group by ItemNum, FeeSched without any aggregate functions.
So, the query might be something like this:
SELECT t.Price, t.ItemNum, t.FeeSched, t.Date
FROM (SELECT * FROM table ORDER BY Date DESC) AS t
GROUP BY t.ItemNum, t.FeeSched;
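How it works:
When your data is grouped and you select columns without aggregate functions, MySQL (with ONLY_FULL_GROUP_BY disabled) returns the values of a single row per group, in practice often the first one. Because all rows were sorted before grouping, that first row would be "the most recent record". Note that this row choice is not guaranteed by MySQL, and other databases reject such a query, so verify the result on your system.
Contact me if you run into any problems or errors with this approach.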
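You can also try this:
Select Price, ItemNum, FeeSched, Date from table where Date IN (Select MAX(Date) from table group by ItemNum, FeeSched,Customer);
The inner query returns the maximum date for each ItemNum, FeeSched, Customer group, and the IN predicate keeps only the records whose date equals one of those maximum dates. Because the comparison is on Date alone, a record also passes if its date happens to be the maximum of a different group.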
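For comparison, here is a sketch of a deterministic variant: compute MAX(date) per (ItemNum, FeeSched) and join back on all three columns, which is the INNER JOIN the question alludes to. It runs on SQLite with the sample rows from the question; the table name prices and the column name PriceDate are hypothetical, and dates are stored as ISO text so that MAX orders them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE prices (Customer INTEGER, Price REAL,
                ItemNum TEXT, FeeSched INTEGER, PriceDate TEXT)""")
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?, ?)", [
    (5, 70.75, "01202", 12, "2017-12-06"),
    (5, 70.80, "01202", 12, "2016-06-07"),
    (5, 70.80, "01202", 12, "2017-07-21"),
    (5, 70.80, "01202", 12, "2016-10-26"),
    (5, 82.63, "02144", 61, "2017-12-06"),
    (5, 84.46, "02144", 61, "2016-06-07"),
    (5, 84.46, "02144", 61, "2017-07-21"),
    (5, 84.46, "02144", 61, "2016-10-26"),
])

# Join each row to its group's maximum date; only the newest row survives.
rows = conn.execute("""
SELECT p.ItemNum, p.FeeSched, p.Price, p.PriceDate
FROM prices p
JOIN (
    SELECT ItemNum, FeeSched, MAX(PriceDate) AS MaxDate
    FROM prices
    GROUP BY ItemNum, FeeSched
) latest ON latest.ItemNum = p.ItemNum
        AND latest.FeeSched = p.FeeSched
        AND latest.MaxDate = p.PriceDate
ORDER BY p.ItemNum
""").fetchall()

for row in rows:
    print(row)
```

Unlike the date-only IN filter, the join matches the group keys as well, so a date that is the maximum of one group cannot leak rows from another. If two rows in a group share the maximum date, both are returned.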

Optimize Join, Sum, Subqueries

I'm building a Tinder clone for a study project and I'm trying to do something very simple conceptually but it appears that my request is really too heavy.
Data Structure
I've created this simple fiddle to visualize the database structure.
I've tried to put indexes on user.id, user.gender * user.orientation, match.user1, match.user2 and match.createdAt, with no luck.
Expected result
I want to find the people who have the fewest matches, filtered by gender, orientation, lastLogin and calendar date.
Users mustn't be part of more than 4 matches within 24h, so I look for users with <= 3 matches during the last 24h.
The values in the following query are hard-coded to make the request easy to edit, and because I haven't taken the time to write that part yet.
A match is composed of 2 users (user1 and user2).
The limit of 4 matches on the same day is a sum of when they appear as user1 and user2.
SELECT total_sum, userId
FROM (
SELECT u.id as userId, u.orientation as userOrientation, u.gender as userGender, m1.sum1, m2.sum2, (m1.sum1 + m2.sum2) AS total_sum
FROM user u
INNER JOIN (
SELECT user1, COUNT(user1) as sum1
FROM `match`
WHERE createdAt > DATE('2017-12-11 00:00:00')
GROUP BY user1
) m1
ON m1.user1 = u.id
INNER JOIN (
SELECT user2, COUNT(user1) as sum2
FROM `match`
WHERE createdAt > DATE('2017-12-11 00:00:00')
GROUP BY user2
) m2
ON m2.user2 = u.id
WHERE u.gender IN ('female')
AND u.orientation IN ('hetero', 'bi')
AND u.lastLogin > 1512873464582
) as total
WHERE total_sum < 4
ORDER BY total_sum ASC
LIMIT 8
The issue
With tiny tables the request takes a few ms, but with medium tables (50k users, 200k matches) it takes ages (170s).
Optimizing
Following @Thorsten Kettner's answer, this is the explain plan of his query when I run it on my test db after creating the indexes he advised:
Solution
I ended up doing something simpler.
First I flattened my match table by removing the user2 column. That doubles its size, because 1 match now becomes 2 rows, but it allows a much simpler query that is very efficient with proper indexes.
The first query handles users with no matches and the second one handles users with matches. I no longer have the matchesLimit in the query, as it adds extra work for MySQL, and I just need to check the first result to see if matchNumber is <= 3.
(SELECT u.id, mc.id as nb_match, u.gender, u.orientation
FROM user u
LEFT JOIN match_composition mc
ON (mc.matchedUser = u.id AND mc.createdAt > DATE('2017-12-11 00:00:00'))
WHERE u.lastLogin > 1512931740721
AND u.orientation IN ('bi', 'hetero')
AND u.gender IN ('female')
AND mc.id IS NULL
ORDER BY u.lastLogin DESC)
UNION ALL
(SELECT u.id, count(mc.id) as nb_match, u.gender, u.orientation
FROM match_composition mc
JOIN user u
ON u.id = matchedUser
WHERE mc.createdAt > DATE('2017-12-11 00:00:00')
AND u.lastLogin > 1512931740721
AND u.orientation IN ('bi', 'hetero')
AND u.gender IN ('female')
GROUP BY matchedUser
ORDER BY nb_match ASC
LIMIT 8)
Thanks for your help.
A user can be matched as user1 or user2. We can use UNION ALL to get one record per user:
select user1 as userid from `match` union all select user2 as userid from `match`;
The complete query:
select
u.id as userid,
coalesce(um.total, 0) as total
from user u
left join
(
select userid, count(*) as total
from
(
select user1 as userid from `match` where createdat > date '2017-12-11'
union all
select user2 as userid from `match` where createdat > date '2017-12-11'
) m
group by userid
) um on um.userid = u.id
where u.gender IN ('female')
and u.orientation in ('hetero', 'bi')
and u.lastlogin > 1512873464582
and coalesce(um.total, 0) < 4
order by coalesce(um.total, 0);
You would have the following indexes for this:
create index idx_m1 on `match` (createdat, user1);
create index idx_m2 on `match` (createdat, user2);
create index idx_u on user (lastlogin, gender, orientation, id);
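A small sketch of the UNION ALL counting idea, using SQLite and invented rows. The table and column names come from the question; "match" is quoted because it is a keyword, and the date is compared as ISO text, which is an assumption of this sketch rather than MySQL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, gender TEXT,
                   orientation TEXT, lastLogin INTEGER);
CREATE TABLE "match" (user1 INTEGER, user2 INTEGER, createdAt TEXT);
INSERT INTO user VALUES
  (1, 'female', 'hetero', 1512900000000),
  (2, 'female', 'bi',     1512900000000),
  (3, 'female', 'hetero', 1512900000000),
  (4, 'male',   'hetero', 1512900000000);
INSERT INTO "match" VALUES
  (1, 4, '2017-12-11 10:00:00'),
  (4, 1, '2017-12-11 11:00:00'),
  (1, 2, '2017-12-11 12:00:00'),
  (4, 1, '2017-12-11 13:00:00'),
  (2, 4, '2017-12-10 09:00:00');
""")

# Stack both sides of each match into one userid column, count once,
# and LEFT JOIN so users without any match still appear with total 0.
rows = conn.execute("""
SELECT u.id, COALESCE(um.total, 0) AS total
FROM user u
LEFT JOIN (
    SELECT userid, COUNT(*) AS total
    FROM (
        SELECT user1 AS userid FROM "match" WHERE createdAt > '2017-12-11'
        UNION ALL
        SELECT user2 AS userid FROM "match" WHERE createdAt > '2017-12-11'
    ) m
    GROUP BY userid
) um ON um.userid = u.id
WHERE u.gender IN ('female')
  AND u.orientation IN ('hetero', 'bi')
  AND u.lastLogin > 1512873464582
  AND COALESCE(um.total, 0) < 4
ORDER BY total, u.id
""").fetchall()

print(rows)
```

User 1 appears in four recent matches and is filtered out, user 3 has none and is listed first with a total of 0.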
I guess you were right about your SQL skills. This is what I came up with:
SELECT u.id as userId,
u.orientation as userOrientation,
u.gender as userGender,
count(m.user1) total_sum
FROM user u
LEFT JOIN `match` m on (u.id in (m.user1, m.user2)
and m.createdAt > DATE('2017-12-11 00:00:00'))
WHERE u.gender IN ('female')
AND u.orientation IN ('hetero', 'bi')
AND u.lastLogin > 1512873464582
GROUP BY u.id, u.orientation, u.gender
HAVING count(m.user1) < 4
ORDER BY total_sum ASC
LIMIT 8;
Edit: Covered also cases with no matches
Try playing around with indexes on the match table columns user1, user2, and also on the user table columns (or column combinations) you use in filters (gender, for example), and see what brings better performance.
From what you provide, I would create indexes on:
- match.user1
- match.user2
- match.createdAt
- user.id (unique, and probably a PK)
- user.lastLogin
I would also try replacing COUNT(user1) with COUNT(*), but it probably won't have a big impact.
Indexes on user.gender and user.orientation are probably useless: the usefulness of an index grows with its selectivity, i.e. with the number of distinct values in the indexed column. An index on a field with only 2-3 distinct values is therefore more costly than useful.
As for the query itself, try the following. I tried to force the filtering on user to be done BEFORE the joins with match, in case the query optimizer does not do this on its own (I have little experience with non-MS databases)
SELECT total_sum, userId
FROM (SELECT u.id as userId, u.orientation as userOrientation, u.gender as userGender, m1.sum1, m2.sum2, (m1.sum1 + m2.sum2) AS total_sum
FROM (SELECT * FROM user
WHERE gender = 'female'
AND orientation IN ('hetero', 'bi')
AND lastLogin > 1512873464582
) u
INNER JOIN (SELECT user1, COUNT(*) as sum1
FROM `match`
WHERE createdAt > DATE('2017-12-11 00:00:00')
GROUP BY user1
) m1 ON m1.user1 = u.id
INNER JOIN (SELECT user2, COUNT(*) as sum2
FROM `match`
WHERE createdAt > DATE('2017-12-11 00:00:00')
GROUP BY user2
) m2 ON m2.user2 = u.id
) as total
WHERE total_sum < 4
ORDER BY total_sum ASC
LIMIT 8

MySQL Help: Return invoices and payments by date

I am having trouble getting a MySQL query to work for me. Here is the setup.
A customer has asked me to compile a report from some accounting data. He wants to select a date (and possibly other criteria) and have it return all of the following (an OR statement):
1.) All invoices that were inserted on or after that date
2.) All invoices regardless of their insert date that have corresponding payments in a separate table whose insert dates are on or after the selected date.
The first clause is basic, but I am having trouble pairing it with the second.
I have assembled a comparable set of test data in an SQL Fiddle. The query that I currently have is provided.
http://www.sqlfiddle.com/#!2/d8d9c/3/2
As noted in the comments of the fiddle, I am working with July 1, 2013 as my selected date. For the test to work, I need invoices 1 through 5 to appear, but not invoice #6.
Try this: http://www.sqlfiddle.com/#!2/d8d9c/9
Here are the summarized changes
I got rid of your GROUP BY. You did not have any aggregate functions. I used DISTINCT instead to eliminate duplicate records
I removed your implicit joins and put explicit joins in their place for readability. Then I changed them to LEFT JOINs. I am not sure what your data looks like but at a minimum, I would assume you need the payments LEFT JOINed if you want to select an invoice that has no payments.
This will probably get you the records you want, but those subselects in the SELECT clause may perform better if rewritten as LEFT JOINs combined with the SUM function.
Here is the query
SELECT DISTINCT
a.abbr landowner,
CONCAT(f.ForestLabel, '-', l.serial, '-', l.revision) leasenumber,
i.iid,
FROM_UNIXTIME(i.dateadded,'%M %d, %Y') InvoiceDate,
(SELECT IFNULL(SUM(ch.amount), 0.00) n FROM test_charges ch WHERE ch.invoiceid = i.iid) totalBilled,
(SELECT SUM(p1.amount) n FROM test_payments p1 WHERE p1.invoiceid = i.iid AND p1.transtype = 'check' AND p1.status = 2) checks,
(SELECT SUM(p1.amount) n FROM test_payments p1 WHERE p1.invoiceid = i.iid AND p1.transtype = 'ach' AND p1.status = 2) ach,
CASE WHEN i.totalbilled < 0 THEN i.totalbilled * -1 ELSE 0.00 END credits,
CASE WHEN i.balance >= 0 THEN i.balance ELSE 0.00 END balance,
t.typelabel, g.groupname
FROM test_invoices i
LEFT JOIN test_contracts c
ON i.contractid = c.cid
LEFT JOIN test_leases l
ON c.leaseid = l.bid
LEFT JOIN test_forest f
ON l.forest = f.ForestID
LEFT JOIN test_leasetypes t
ON l.leasetype = t.tid
LEFT JOIN test_accounts a
ON l.account = a.aid
LEFT JOIN test_groups g
ON c.groupid = g.gid
LEFT JOIN test_payments p
ON p.invoiceid = i.iid
WHERE (i.dateadded >= @startdate) OR (p.dateadded >= @startdate)
Try this.
http://www.sqlfiddle.com/#!2/d8d9c/11/2
TL;DR:
… AND (i.dateadded >= @startdate
OR EXISTS (
SELECT * FROM test_payments
WHERE test_payments.invoiceid = i.iid
AND test_payments.dateadded >= @startdate))
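A sketch contrasting the EXISTS form with a plain LEFT JOIN, on SQLite with a hypothetical four-invoice data set (not the fiddle's data). Timestamps are Unix seconds, as the FROM_UNIXTIME call in the other answer suggests, and the selected date is July 1, 2013 (taken here as UTC).

```python
import sqlite3
from datetime import datetime, timezone

DAY = 86400
START = int(datetime(2013, 7, 1, tzinfo=timezone.utc).timestamp())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_invoices (iid INTEGER PRIMARY KEY, dateadded INTEGER)")
conn.execute("CREATE TABLE test_payments (invoiceid INTEGER, dateadded INTEGER)")
conn.executemany("INSERT INTO test_invoices VALUES (?, ?)", [
    (1, START + DAY),        # new invoice, no payments
    (2, START - 10 * DAY),   # old invoice, one recent payment
    (3, START - 10 * DAY),   # old invoice, two recent payments
    (4, START - 20 * DAY),   # old invoice, only an old payment
])
conn.executemany("INSERT INTO test_payments VALUES (?, ?)", [
    (2, START + 2 * DAY),
    (3, START + 3 * DAY),
    (3, START + 4 * DAY),
    (4, START - 5 * DAY),
])

# EXISTS asks "is there any qualifying payment?" and yields one row per invoice.
with_exists = [r[0] for r in conn.execute("""
SELECT i.iid
FROM test_invoices i
WHERE i.dateadded >= :start
   OR EXISTS (SELECT 1 FROM test_payments p
               WHERE p.invoiceid = i.iid AND p.dateadded >= :start)
ORDER BY i.iid
""", {"start": START})]

# A LEFT JOIN without DISTINCT repeats an invoice once per matching payment.
with_join = [r[0] for r in conn.execute("""
SELECT i.iid
FROM test_invoices i
LEFT JOIN test_payments p ON p.invoiceid = i.iid
WHERE i.dateadded >= :start OR p.dateadded >= :start
ORDER BY i.iid
""", {"start": START})]

print(with_exists, with_join)
```

Both forms select the same invoices, but only the join needs DISTINCT to collapse the repeated invoice 3.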

MySQL Subquery returned more than 1 row

I cannot figure out why this is not working. Basically, I am running a subquery to count all rows of p.songid WHERE trackDeleted=0. The subquery works fine when I execute it by itself, but when I implement I get "subquery returned more than 1 row".
SELECT u.username, u.id, u.score, s.genre, s.songid, s.songTitle, s.timeSubmitted, s.userid, s.insWanted, s.bounty,
(SELECT COUNT(p.songid)
FROM songs s
LEFT JOIN users u
ON u.id = s.userid
LEFT JOIN posttracks p
ON s.songid = p.songid
WHERE p.trackDeleted=0
GROUP BY s.timeSubmitted ASC
LIMIT 25)
AS trackCount
FROM songs s
LEFT JOIN users u
ON u.id = s.userid
LEFT JOIN posttracks p
ON s.songid = p.songid
WHERE paid=1 AND s.timeSubmitted >= ( CURDATE() - INTERVAL 60 DAY )
GROUP BY s.timeSubmitted ASC
LIMIT 25
A scalar subquery in the select list can't return more than one row. You only expect one value to be returned - COUNT(p.songid) - yet you GROUP BY s.timeSubmitted, which makes it return multiple rows, one count of p.songid per group.
Think about it this way: a subquery in the SELECT list like yours needs to return a single value, since it acts like just another column in your select list. Since you have a LIMIT 25 on yours, you're evidently expecting more than one value back, which is incorrect for this usage.
OK, your query is a mess. Not only is the subquery broken, but I'm pretty sure the GROUP BY s.timeSubmitted ASC isn't doing what you think it does. (Did you mean ORDER BY instead?) It might help if you explained in words what you're trying to accomplish.
Anyway, I'm going to take a wild guess and suggest that this might be what you want:
SELECT
u.username, u.id, u.score, s.genre, s.songid, s.songTitle,
s.timeSubmitted, s.userid, s.insWanted, s.bounty,
COUNT(p.songid) AS trackCount
FROM songs s
LEFT JOIN users u ON u.id = s.userid
LEFT JOIN posttracks p ON p.songid = s.songid AND p.trackDeleted = 0
WHERE paid = 1 AND s.timeSubmitted >= ( CURDATE() - INTERVAL 60 DAY )
GROUP BY s.songid
ORDER BY s.timeSubmitted ASC
LIMIT 25
Edit: Fixed the COUNT() so that it will correctly return 0 if there are no matching tracks.
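The effect of that edit can be seen in a small sketch on SQLite with invented rows: placing trackDeleted = 0 in the ON clause keeps songs without live tracks (count 0), while the same condition in WHERE removes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE songs (songid INTEGER PRIMARY KEY, songTitle TEXT);
CREATE TABLE posttracks (trackid INTEGER PRIMARY KEY, songid INTEGER,
                         trackDeleted INTEGER);
INSERT INTO songs VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO posttracks VALUES (1, 1, 0), (2, 1, 0), (3, 1, 1), (4, 2, 1);
""")

# Condition in ON: unmatched songs survive with NULLs, COUNT(p.songid) gives 0.
on_rows = conn.execute("""
SELECT s.songid, COUNT(p.songid) AS trackCount
FROM songs s
LEFT JOIN posttracks p ON p.songid = s.songid AND p.trackDeleted = 0
GROUP BY s.songid
ORDER BY s.songid
""").fetchall()

# Condition in WHERE: the NULL rows are filtered out after the join.
where_rows = conn.execute("""
SELECT s.songid, COUNT(p.songid) AS trackCount
FROM songs s
LEFT JOIN posttracks p ON p.songid = s.songid
WHERE p.trackDeleted = 0
GROUP BY s.songid
ORDER BY s.songid
""").fetchall()

print(on_rows)
print(where_rows)
```

Song 2 has only a deleted track and song 3 has none, so both appear with a count of 0 in the first result and are missing from the second.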