MS Access Totals Query Sum Field Incorrect Result - ms-access

I've built what I believe to be a simple query in MS Access.
Two Tables are involved:
Properties
History
The History table includes multiple rows of data for each property, and for various dates.
I'm trying to show the sum of net_value for a specific date for properties that share a common area in the property table.
Here's my query:
SELECT Properties.Area
, History.HIST_DATE
, History.ID
, Sum(History.NET_VALUE) AS SumOfNET_VALUE
FROM Properties INNER JOIN History ON Properties.ID = History.ID
WHERE (((History.Account_ID)=45))
GROUP BY Properties.Area, History.HIST_DATE, History.ID
HAVING (((Properties.Area)="MY AREA") AND
((History.HIST_DATE)=#2/1/2017#));
The problem is, the sum field is wildly incorrect.
Debugging
The root cause of the issue is that there are multiple entries of Properties.ID.
So I suppose the select is not distinct? Is there a way around this?
The Properties.ID is effectively an account identifier and multiple properties can be associated with it; so I can't really limit Properties.ID to one record per ID... thoughts?

It looks like History.ID is a unique field, or at least unique to each property.
By including it in your query the sum will group on that ID as well as the area, so you'll end up with a total per property per area.
No idea where Account_ID comes into it, but have included it in the WHERE clause anyway.
SELECT Properties.Area
, History.Hist_Date
, SUM(History.Net_Value) AS Total_Net_Value
FROM Properties LEFT JOIN History ON Properties.ID = History.ID
WHERE History.Account_ID = 45 AND
Properties.Area = "My Area" AND
History.Hist_Date=#04/27/2018#
GROUP BY Properties.Area
, History.Hist_Date
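The duplicate-ID problem described in the question can be reproduced on a small scale. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Access, the sample rows are invented, and dates are stored as ISO strings rather than Access date literals. It shows how a repeated Properties.ID inflates the sum, and one possible workaround, which is to collapse Properties to distinct (ID, Area) pairs before joining.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Access here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Properties (ID INTEGER, Area TEXT);
    CREATE TABLE History (ID INTEGER, Hist_Date TEXT, Net_Value REAL);
    -- ID 1 is associated with two properties, so it appears twice.
    INSERT INTO Properties VALUES (1, 'MY AREA'), (1, 'MY AREA'), (2, 'MY AREA');
    INSERT INTO History VALUES (1, '2017-02-01', 100), (2, '2017-02-01', 50);
""")

# Naive join: the History row for ID 1 matches two Properties rows,
# so its Net_Value is counted twice.
naive = conn.execute("""
    SELECT p.Area, SUM(h.Net_Value)
    FROM Properties AS p INNER JOIN History AS h ON p.ID = h.ID
    WHERE h.Hist_Date = '2017-02-01'
    GROUP BY p.Area
""").fetchall()

# Workaround: reduce Properties to one row per (ID, Area) before joining.
deduped = conn.execute("""
    SELECT p.Area, SUM(h.Net_Value)
    FROM (SELECT DISTINCT ID, Area FROM Properties) AS p
    INNER JOIN History AS h ON p.ID = h.ID
    WHERE h.Hist_Date = '2017-02-01'
    GROUP BY p.Area
""").fetchall()

print(naive)    # [('MY AREA', 250.0)]
print(deduped)  # [('MY AREA', 150.0)]
```

Note that if one ID is linked to properties in more than one area, the de-duplicated version would still count its history once per area, so whether this is the right fix depends on what the ID really identifies.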

Try this one:
SELECT Properties.Area
, History.HIST_DATE
, History.ID
, Sum(History.NET_VALUE) AS SumOfNET_VALUE
FROM Properties INNER JOIN History ON Properties.ID = History.ID
WHERE (History.Account_ID)=45
AND (Properties.Area)="MY AREA"
AND (History.HIST_DATE)=#2/1/2017#
GROUP BY Properties.Area, History.HIST_DATE, History.ID;
Do not use the HAVING clause as a filter except on aggregate functions. Use the WHERE clause instead.
HAVING would be reasonable for a condition such as ...HAVING Sum(History.NET_VALUE) > 200, for example.
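To make the WHERE versus HAVING distinction concrete, here is a minimal sketch using Python's sqlite3 module with an invented History table (SQLite rather than Access, so the syntax differs slightly). WHERE removes rows before grouping; HAVING removes groups after the aggregate has been computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows, for illustration only.
    CREATE TABLE History (ID INTEGER, Account_ID INTEGER, Net_Value REAL);
    INSERT INTO History VALUES
        (1, 45, 150), (1, 45, 100), (2, 45, 50), (3, 99, 900);
""")

rows = conn.execute("""
    SELECT ID, SUM(Net_Value) AS Total
    FROM History
    WHERE Account_ID = 45          -- row filter, applied before grouping
    GROUP BY ID
    HAVING SUM(Net_Value) > 200    -- group filter, applied after aggregation
    ORDER BY ID
""").fetchall()

print(rows)  # [(1, 250.0)]
```

ID 3 is dropped by WHERE (wrong account) and ID 2 is dropped by HAVING (total of 50 is not above 200).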

Related

SQL: Column Must Appear in the GROUP BY Clause Or Be Used in an Aggregate Function

I'm doing what I would have expected to be a fairly straightforward query on a modified version of the imdb database:
select primary_name, release_year, max(rating)
from titles natural join primary_names natural join title_ratings
group by year
having title_category = 'film' and year > 1989;
However, I'm immediately running into
"column must appear in the GROUP BY clause or be used in an aggregate function."
I've tried researching this but have gotten confusing information; some examples I've found for this problem look structurally identical to mine, where others state that you must group every single selected parameter, which defeats the whole purpose of a group as I'm only wanting to select the maximum entry per year.
What am I doing wrong with this query?
Expected result: table with 3 columns which displays the highest-rated movie of each year.
If you want the maximum entry per year, then you should do something like this:
select r.*
from ratings r
where r.rating = (select max(r2.rating) from ratings r2 where r2.year = r.year) and
r.year > 1989;
In other words, group by is the wrong approach to writing this query.
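As a rough illustration of the correlated-subquery approach, the sketch below runs it with Python's sqlite3 module against an invented single-table ratings schema (the title column and all rows are assumptions, not the real IMDb layout).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema and data.
    CREATE TABLE ratings (title TEXT, year INTEGER, rating REAL);
    INSERT INTO ratings VALUES
        ('A', 1990, 7.1), ('B', 1990, 8.4),
        ('C', 1991, 6.0), ('D', 1985, 9.9);
""")

# Keep a row only if its rating equals the best rating of its own year.
top_per_year = conn.execute("""
    SELECT r.title, r.year, r.rating
    FROM ratings AS r
    WHERE r.year > 1989
      AND r.rating = (SELECT MAX(r2.rating)
                      FROM ratings AS r2
                      WHERE r2.year = r.year)
    ORDER BY r.year
""").fetchall()

print(top_per_year)  # [('B', 1990, 8.4), ('C', 1991, 6.0)]
```

If two titles tie for the best rating in a year, both rows are returned.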
I would also strongly encourage you to forget that natural join exists at all. It is an abomination. It uses the names of common columns for joins. It does not even use properly declared foreign key relationships. In addition, you cannot see what columns are used for the join.
While I am at it, another piece of advice: qualify all column names in queries that have more than one table reference. That is, include the table alias in the column name.
If you want to display all the columns, you can use a window function like this:
select primary_name, year, max(rating) Over (Partition by year) as rating
from titles natural join primary_names natural join ratings
where title_type = 'film' and year > 1989;
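It may help to see what a windowed MAX actually returns. In this sketch (Python's sqlite3 module, an invented one-table schema, and SQLite 3.25 or newer for window function support), every row is kept and simply annotated with its year's best rating, so a further comparison is still needed to isolate the top title per year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema and data.
    CREATE TABLE ratings (title TEXT, year INTEGER, rating REAL);
    INSERT INTO ratings VALUES
        ('A', 1990, 7.1), ('B', 1990, 8.4),
        ('C', 1991, 6.0), ('D', 1985, 9.9);
""")

# The window function does not collapse rows; it adds a per-year maximum.
annotated = conn.execute("""
    SELECT title, year, rating,
           MAX(rating) OVER (PARTITION BY year) AS best
    FROM ratings
    WHERE year > 1989
    ORDER BY year, title
""").fetchall()
print(annotated)
# [('A', 1990, 7.1, 8.4), ('B', 1990, 8.4, 8.4), ('C', 1991, 6.0, 6.0)]

# Filtering on rating == best then yields one winner per year.
winners = [(t, y, r) for (t, y, r, best) in annotated if r == best]
print(winners)  # [('B', 1990, 8.4), ('C', 1991, 6.0)]
```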

Remove duplicates from LEFT JOIN query

I am using the following JOIN statement:
SELECT *
FROM students2014
JOIN notes2014 ON (students2014.Student = notes2014.NoteStudent)
WHERE students2014.Consultant='$Consultant'
ORDER BY students2014.LastName
to retrieve a list of students (students2014) and corresponding notes for each student stored in (notes2014).
Each student has multiple notes within the notes2014 table and each note has an ID that corresponds with each student's unique ID. The above statement is returning the list of students but duplicating every student that has more than one note. I only want to display the latest note for each student (which is determined by the highest note ID).
Is this possible?
You need another join based on the MAX noteId you got from your select.
Something like this should do it (not tested; next time, I'd recommend pasting a link to http://sqlfiddle.com/ with your table structure and some sample data):
SELECT *
FROM students s
LEFT JOIN (
SELECT MAX(NoteId) max_id, NoteStudent
FROM notes
GROUP BY NoteStudent
) aux ON aux.NoteStudent = s.Student
LEFT JOIN notes n2 ON aux.max_id = n2.NoteId
If I may say so, the fact that a table is called students2014 is a big code smell. You'd be much better off with a students table and a year field, for many reasons (just a couple: you won't need to change your DB structure every year, querying across years is much, much easier, etc, etc). Perhaps you "inherited" this, but I thought I'd mention it.
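The join-on-MAX idea can be tried out on toy data. This is a sketch only, using Python's sqlite3 module; the column names follow the question, while the Body column and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data.
    CREATE TABLE students (Student INTEGER, LastName TEXT);
    CREATE TABLE notes (NoteId INTEGER, NoteStudent INTEGER, Body TEXT);
    INSERT INTO students VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
    INSERT INTO notes VALUES (10, 1, 'first'), (11, 1, 'second'), (12, 2, 'only');
""")

# Step 1 (subquery): find the highest NoteId per student.
# Step 2 (second join): fetch the full note row for that NoteId.
latest = conn.execute("""
    SELECT s.LastName, n.NoteId, n.Body
    FROM students AS s
    LEFT JOIN (
        SELECT NoteStudent, MAX(NoteId) AS max_id
        FROM notes
        GROUP BY NoteStudent
    ) AS aux ON aux.NoteStudent = s.Student
    LEFT JOIN notes AS n ON n.NoteId = aux.max_id
    ORDER BY s.LastName
""").fetchall()

print(latest)
# [('Adams', 11, 'second'), ('Baker', 12, 'only'), ('Clark', None, None)]
```

Because both joins are LEFT JOINs, a student with no notes (Clark here) still appears, with NULLs in the note columns.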
Group the query by the student ID and select the MAX of the note ID.
Try:
SELECT
students2014.Student,
IFNULL(MAX(NoteId),0)
FROM students2014
LEFT JOIN notes2014 ON (students2014.Student = notes2014.NoteStudent)
WHERE students2014.Consultant='$Consultant'
GROUP BY students2014.Student
ORDER BY students2014.LastName

How to select distinct rows from a table without a primary key

I need to show a notification when a user logs in if there are any unread messages. If several users each send, say, 5 messages while the user is offline, these messages should be shown at login. That means I have to show the last message from each user.
I use a join to find the records.
In this scenario MessageFrom is not a primary key.
This is my query
SELECT
UserMessageConversations.MessageFrom, UserMessageConversations.MessageFromUserName,
UserMessages.MessageTo, UserMessageConversations.IsGroupChat,
UserMessageConversations.IsLocationChat,
UserMessageConversations.Message, UserMessages.UserGroupID,UserMessages.LocationID
FROM
UserMessageConversations
LEFT OUTER JOIN
UserMessages ON UserMessageConversations.UserMessageID = UserMessages.UserMessageID
WHERE
UserMessageConversations.MessageTo = 743
AND UserMessageConversations.ReadFlag = 0
This is the output obtained from the above query.
MessageFrom -582 appears twice. I need only one record for this user.
How can I do that?
I'm not entirely sure I totally understand your question - but one approach would be to use a CTE (Common Table Expression).
With this CTE, you can partition your data by some criteria - i.e. your MessageFrom - and have SQL Server number all your rows starting at 1 for each of those partitions, ordered by some other criteria - this is the point that's entirely unclear from your question, whether you even care what the rows for each MessageFrom number are sorted on (do you have some kind of a MessageDate or something that you could order by?) ...
So try something like this:
;WITH PartitionedMessages AS
(
SELECT
umc.MessageFrom, umc.MessageFromUserName,
um.MessageTo, umc.IsGroupChat,
umc.IsLocationChat,
umc.Message, um.UserGroupID, um.LocationID ,
ROW_NUMBER() OVER(PARTITION BY umc.MessageFrom
ORDER BY MessageDate DESC) AS RowNum -- MessageDate is a guess; the ordering column is unclear from the question
FROM
dbo.UserMessageConversations umc
LEFT OUTER JOIN
dbo.UserMessages um ON umc.UserMessageID = um.UserMessageID
WHERE
umc.MessageTo = 743
AND umc.ReadFlag = 0
)
SELECT
MessageFrom, MessageFromUserName, MessageTo,
IsGroupChat, IsLocationChat,
Message, UserGroupID, LocationID
FROM
PartitionedMessages
WHERE
RowNum = 1
Here, I am selecting only the "first" entry for each "partition" (i.e. for each MessageFrom), ordered by an "imagined" MessageDate column so that the most recent (the newest) message would be selected.
Does that approach what you're looking for??
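A cut-down version of the ROW_NUMBER pattern is sketched below with Python's sqlite3 module (SQLite 3.25 or newer is needed for window functions). The single Messages table, the MessageDate column, and the rows are all assumptions made for the example; the real schema has two tables and may order by something else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table; MessageDate is an assumed column.
    CREATE TABLE Messages (MessageFrom INTEGER, Message TEXT, MessageDate TEXT);
    INSERT INTO Messages VALUES
        (582, 'hi',             '2024-01-01'),
        (582, 'are you there?', '2024-01-02'),
        (601, 'hello',          '2024-01-01');
""")

# Number the rows within each sender, newest first, then keep row 1.
newest = conn.execute("""
    WITH Partitioned AS (
        SELECT MessageFrom, Message,
               ROW_NUMBER() OVER (PARTITION BY MessageFrom
                                  ORDER BY MessageDate DESC) AS RowNum
        FROM Messages
    )
    SELECT MessageFrom, Message
    FROM Partitioned
    WHERE RowNum = 1
    ORDER BY MessageFrom
""").fetchall()

print(newest)  # [(582, 'are you there?'), (601, 'hello')]
```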
If you think of them as same rows, I assume you don't care about the message field.
In this case you can use the DISTINCT clause:
SELECT DISTINCT
UserMessageConversations.MessageFrom, UserMessageConversations.MessageFromUserName,
UserMessages.MessageTo, UserMessageConversations.IsGroupChat,
UserMessageConversations.IsLocationChat,
UserMessages.UserGroupID,UserMessages.LocationID
FROM
UserMessageConversations
LEFT OUTER JOIN
UserMessages ON UserMessageConversations.UserMessageID = UserMessages.UserMessageID
WHERE
UserMessageConversations.MessageTo = 743
AND UserMessageConversations.ReadFlag = 0
In general with distinct clause you have a row for every distinct group of row attributes.
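The effect of dropping or keeping the message column can be seen in a small sketch (Python's sqlite3 module, an invented two-column table). DISTINCT compares whole output rows, so as soon as a column that differs between messages is selected, the sender shows up more than once again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data.
    CREATE TABLE Msgs (MessageFrom INTEGER, Message TEXT);
    INSERT INTO Msgs VALUES (582, 'hi'), (582, 'bye'), (601, 'hello');
""")

# Only the sender is selected: one row per sender.
senders = conn.execute(
    "SELECT DISTINCT MessageFrom FROM Msgs ORDER BY MessageFrom"
).fetchall()
print(senders)  # [(582,), (601,)]

# Sender plus message text: every (sender, text) pair is distinct.
with_text = conn.execute(
    "SELECT DISTINCT MessageFrom, Message FROM Msgs "
    "ORDER BY MessageFrom, Message"
).fetchall()
print(with_text)  # [(582, 'bye'), (582, 'hi'), (601, 'hello')]
```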
If your requirement instead is to show a single field for all the messages (for example, every message folded into a single message with a separator between them), you can use an aggregate function, but in SQL Server it seems that this is not that easy.

Removing duplicate rows in a complex MySQL Query - not sure if I'm doing this wrong, or it's a MySQL bug

I have a reasonably complex MySQL query being run on another developer's database. I am trying to copy over his data to our new database structure, so I'm running this query to get a load of the data over to copy. The main table has around 45,000 rows.
As you can see from the query below, there's a lot of fields from several different tables. My problem is that the field Ref.refno (as ref_id) is being pulled through, in some cases, two or three times. This is because in the table LandlordOnlineRef (LLRef) there are sometimes multiple rows with this same reference number - in this case, because the row should have been edited, but instead was duplicated...
Here's what I've tried doing: -
SELECT DISTINCT(Ref.refno) [...] - this makes no difference to the output at all, although I would've assumed it would stop selecting duplicate refno IDs
Is this a MySQL bug, or me? - I also tried adding GROUP BY ref_id to the end of my query. The query normally takes a few milliseconds to run, but when I add GROUP BY to the end, it seems to run infinitely - I waited several minutes but nothing was happening. I thought it might be struggling because I'm using LIMIT 1000, so I also tried LIMIT 10 but still get the same effect.
Here's the problem query - thanks!
SELECT
-- progress
Ref.refno AS ref_id,
Ref.tenantid AS tenant_id,
Ref.productid AS product_id,
Ref.guarantorid AS guarantor_id,
Ref.agentid AS agent_id,
Ref.companyid AS company_id,
Ref.status AS status,
Ref.startdate AS ref_start_date,
Ref.enddate AS ref_end_date,
-- ReferenceDetails
RefDetails.creditscore AS credit_score,
-- LandlordOnlineRef
LLRef.propaddress AS prev_ll_address,
LLRef.rent AS prev_ll_rent,
LLRef.startdate AS prev_ll_start_date,
LLRef.enddate AS prev_ll_end_date,
LLRef.arrears AS prev_ll_arrears,
LLRef.arrearsreason AS prev_ll_arrears_reason,
LLRef.propertycondition AS prev_ll_property_condition,
LLRef.conditionreason AS prev_ll_condition_reason,
LLRef.consideragain AS prev_ll_consider_again,
LLRef.completedby AS prev_ll_completed_by,
LLRef.contactno AS prev_ll_contact_no,
LLRef.landlordagent AS prev_ll_or_agent,
-- EmpDetails
EmpRef.cempname AS emp_name,
EmpRef.cempadd1 AS emp_address_1,
EmpRef.cempadd2 AS emp_address_2,
EmpRef.cemptown AS emp_address_town,
EmpRef.cempcounty AS emp_address_county,
EmpRef.cemppostcode AS emp_address_postcode,
EmpRef.ctelephone AS emp_telephone,
EmpRef.cemail AS emp_email,
EmpRef.ccontact AS emp_contact,
EmpRef.cgross AS emp_income,
EmpRef.cyears AS emp_years,
EmpRef.cmonths AS emp_months,
EmpRef.cposition AS emp_position,
-- EmpLlodReference
ELRef.lod_ref_status AS prev_ll_status,
ELRef.lod_ref_email AS prev_ll_email,
ELRef.lod_ref_tele AS prev_ll_telephone,
ELRef.emp_ref_status AS emp_status,
ELRef.emp_ref_tele AS emp_telephone,
ELRef.emp_ref_email AS emp_email
FROM ReferenceDetails AS RefDetails
LEFT JOIN progress AS Ref ON Ref.refno
LEFT JOIN LandlordOnlineRef AS LLRef ON LLRef.refno = Ref.refno
LEFT JOIN EmpLlodReference AS ELRef ON ELRef.refno = Ref.refno
LEFT JOIN EmpDetails AS EmpRef ON EmpRef.tenantid = Ref.tenantid
-- For testing purposes to speed things up, limit it to 1000 rows
LIMIT 1000
LEFT JOIN progress AS Ref ON Ref.refno
is going to basically turn that into a cartesian join. You're not doing an explicit comparison, you're saying "join all records where refno is a non-null (and non-zero) value".
Shouldn't it be
LEFT JOIN progress AS Ref ON Ref.refno = RefDetails.something
?
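The row explosion is easy to demonstrate. In the sketch below (Python's sqlite3 module standing in for MySQL, invented rows), a refno column on ReferenceDetails is assumed purely for illustration, since the real join column is not given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data; ReferenceDetails.refno is an assumed join column.
    CREATE TABLE ReferenceDetails (refno INTEGER, creditscore INTEGER);
    CREATE TABLE progress (refno INTEGER, status TEXT);
    INSERT INTO ReferenceDetails VALUES (1, 700), (2, 650), (3, 720);
    INSERT INTO progress VALUES (1, 'open'), (2, 'closed'), (3, 'open');
""")

# ON with a bare column: the condition is true for every non-zero refno,
# so each ReferenceDetails row pairs with every progress row.
bad = conn.execute("""
    SELECT COUNT(*)
    FROM ReferenceDetails AS d
    LEFT JOIN progress AS p ON p.refno
""").fetchone()[0]

# ON with an explicit comparison: one match per reference.
good = conn.execute("""
    SELECT COUNT(*)
    FROM ReferenceDetails AS d
    LEFT JOIN progress AS p ON p.refno = d.refno
""").fetchone()[0]

print(bad, good)  # 9 3
```

With around 45,000 rows in the main table, the same multiplication would also go some way toward explaining why adding GROUP BY never finishes.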
Put all of the selected columns into the DISTINCT, separated by commas. If you want to keep the renaming, wrap another SELECT DISTINCT * FROM (YOUR_SELECT) AS t around it.
Are there indexes on the columns in the GROUP BY clause? LIMIT is applied after GROUP BY. So limiting does not affect the query runtime.
The general rule is to never group by more columns than you have to. Use a subquery with a GROUP BY on the table that's returning duplicate rows to get rid of them.
Change:
LEFT JOIN LandlordOnlineRef AS LLRef ON LLRef.refno = Ref.refno
to:
Left Join (select refno
, othercolumns you need
from LandlordOnlineRef
group by refno,othercolumn) as LLRef on LLRef.refno = Ref.refno
Not sure which columns you'll want to include here, but at any table level, you can change that table to a subquery to eliminate duplicate rows before the join. As MarkBannister says, you'll need some logic to identify a unique refno within LLRef. You can also use a date column for 'most recent' or any other logic you can think of to get back a unique LLRef and the info related to that record.
ugh, auto spell check is changing refno to refine. ha

Losing Records in a Left join after using Group by

Basically after executing this query:
SELECT
`view_customer_locations`.customerid,
`view_customer_locations`.community_groupid,
`view_customer_locations`.community_group,
`view_sip_user_agents`.sip_user_agentid,
`view_sip_user_agents`.didid,
`view_sip_user_agents`.temporary_didid,
`view_sip_user_agents`.active_did,
GROUP_CONCAT( (IF(`view_sip_user_agents`.active_did = 'permanent', cast(`permanent_dids`.did as char(10)), cast(`temporary_dids`.did as char(10)))) SEPARATOR ', ') as did,
`view_sip_user_agents`.sip_user_agents_date_archived
FROM `view_customer_locations`
LEFT JOIN `view_sip_user_agents` on `view_customer_locations`.customerid = `view_sip_user_agents`.customerid
LEFT JOIN `dids` AS permanent_dids ON `view_sip_user_agents`.didid = `permanent_dids`.id
LEFT JOIN `dids` AS temporary_dids ON `view_sip_user_agents`.temporary_didid = `temporary_dids`.id
Group by `view_customer_locations`.customerid
I still want all the rows from the view_customer_locations table, but I am losing any entries in the view_customer_locations table that don't have a corresponding record in the view_sip_user_agents table. I also want the entries to be grouped by customerid, so that each customer has only one entry in the resulting query.
If I remove the GROUP BY clause, I get all the entries from the view_customer_locations table, but naturally I have multiple entries per customer, which is not what I want.
Please help.
Although MySQL does let you "get away" with expressing a GROUP BY clause with fields in the SELECT that might conceivably vary over the GROUP BY fields (theoretically picking an "arbitrary/random value"), the results of this ill-conceived, logically not well-founded operation are sometimes surprising, as you've noticed.
Try using correct SQL, e.g. with a MAX operator over the fields you're not "grouping by". If the implied assumption that those fields are strictly determined by the grouped-by fields is right, this can't possibly damage your results in any way, right? And yet sometimes you'll find that results do appear, or change (meaning the implied assumption was, simply, wrong).
In your case, since some of the fields might be uniformly NULL in a group, and MAX in that case is not necessarily well-defined, you might further try to use IFNULL there, of course.
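As a sketch of what "correct SQL" could look like here, the following uses Python's sqlite3 module with two invented tables (customers and agents stand in for the views, and SQLite's GROUP_CONCAT takes the separator as a second argument rather than MySQL's SEPARATOR keyword). Every selected column is either grouped or wrapped in an aggregate, and the unmatched customer is still returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the views in the question.
    CREATE TABLE customers (customerid INTEGER, community_group TEXT);
    CREATE TABLE agents (customerid INTEGER, did TEXT);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south');
    INSERT INTO agents VALUES (1, '5551000'), (1, '5552000');
""")

rows = conn.execute("""
    SELECT c.customerid,
           MAX(c.community_group)    AS community_group,  -- aggregate, not a bare column
           GROUP_CONCAT(a.did, ', ') AS dids              -- NULL when nothing matches
    FROM customers AS c
    LEFT JOIN agents AS a ON a.customerid = c.customerid
    GROUP BY c.customerid
    ORDER BY c.customerid
""").fetchall()

for row in rows:
    print(row)
```

Customer 2 has no agents and comes back with NULL (None in Python) in the dids column. If rows still go missing with a query shaped like this, the cause is more likely a filter inside one of the views than the GROUP BY itself.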
I don't think GROUP BY is what you really want here. DISTINCT is more correct because it will eliminate duplicates while the results stay well defined for the non-grouped fields:
SELECT DISTINCT
`view_customer_locations`.customerid,
`view_customer_locations`.community_groupid,
`view_customer_locations`.community_group,
`view_sip_user_agents`.sip_user_agentid,
`view_sip_user_agents`.didid,
`view_sip_user_agents`.temporary_didid,
`view_sip_user_agents`.active_did,
GROUP_CONCAT( (IF(`view_sip_user_agents`.active_did = 'permanent', cast(`permanent_dids`.did as char(10)), cast(`temporary_dids`.did as char(10)))) SEPARATOR ', ') as did,
`view_sip_user_agents`.sip_user_agents_date_archived
FROM `view_customer_locations`
LEFT JOIN `view_sip_user_agents` on `view_customer_locations`.customerid = `view_sip_user_agents`.customerid
LEFT JOIN `dids` AS permanent_dids ON `view_sip_user_agents`.didid = `permanent_dids`.id
LEFT JOIN `dids` AS temporary_dids ON `view_sip_user_agents`.temporary_didid = `temporary_dids`.id
Group by `view_customer_locations`.customerid