Remove duplicates from LEFT JOIN query


I am using the following JOIN statement:
SELECT *
FROM students2014
JOIN notes2014 ON (students2014.Student = notes2014.NoteStudent)
WHERE students2014.Consultant='$Consultant'
ORDER BY students2014.LastName
to retrieve a list of students (students2014) and the corresponding notes for each student (stored in notes2014).
Each student has multiple notes in the notes2014 table, and each note is linked to a student through the student's unique ID. The above statement returns the list of students, but duplicates every student who has more than one note. I only want to display the latest note for each student (determined by the highest note ID).
Is this possible?

You need another join based on the MAX noteId you got from your select.
Something like this should do it (not tested; next time I'd recommend pasting a link to http://sqlfiddle.com/ with your table structure and some sample data):
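SELECT *
FROM students2014 s
LEFT JOIN (
    -- highest note ID per student
    SELECT NoteStudent, MAX(NoteId) AS max_id
    FROM notes2014
    GROUP BY NoteStudent
) aux ON aux.NoteStudent = s.Student
-- join back to fetch the full row of that latest note
LEFT JOIN notes2014 n2 ON n2.NoteId = aux.max_id
WHERE s.Consultant = '$Consultant'
ORDER BY s.LastName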
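To check that the derived-table join really returns one row per student, here is a minimal sketch that runs it through Python's built-in sqlite3 module instead of MySQL. The sample rows, the consultant value 'kim' and the NoteText column are hypothetical, and the ? placeholder stands in for the interpolated '$Consultant' value.

```python
import sqlite3

# Hypothetical in-memory copies of the asker's tables (only the columns used).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students2014 (Student INTEGER PRIMARY KEY, LastName TEXT, Consultant TEXT);
CREATE TABLE notes2014 (NoteId INTEGER PRIMARY KEY, NoteStudent INTEGER, NoteText TEXT);
INSERT INTO students2014 VALUES (1, 'Adams', 'kim'), (2, 'Baker', 'kim'), (3, 'Clark', 'kim');
INSERT INTO notes2014 VALUES (10, 1, 'first'), (12, 1, 'latest'), (11, 2, 'only');
""")

# Derived table: highest NoteId per student; then join back to fetch that note.
# Both joins are LEFT JOINs so students without any notes are kept.
rows = conn.execute("""
SELECT s.Student, s.LastName, n2.NoteId, n2.NoteText
FROM students2014 s
LEFT JOIN (
    SELECT NoteStudent, MAX(NoteId) AS max_id
    FROM notes2014
    GROUP BY NoteStudent
) aux ON aux.NoteStudent = s.Student
LEFT JOIN notes2014 n2 ON n2.NoteId = aux.max_id
WHERE s.Consultant = ?
ORDER BY s.LastName
""", ("kim",)).fetchall()

for row in rows:
    print(row)
```

Student 1 has notes 10 and 12 but appears only once, with note 12; student 3 has no notes and still appears, with None (NULL) in the note columns.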
If I may say so, the fact that a table is called students2014 is a big code smell. You'd be much better off with a students table and a year field, for many reasons (to name a couple: you won't need to change your DB structure every year, and querying across years is much, much easier). Perhaps you "inherited" this, but I thought I'd mention it.

GROUP the query by student and select the MAX of the note ID.
Try:
SELECT
students2014.Student,
IFNULL(MAX(NoteId),0)
FROM students2014
LEFT JOIN notes2014 ON (students2014.Student = notes2014.NoteStudent)
WHERE students2014.Consultant='$Consultant'
GROUP BY students2014.Student
ORDER BY students2014.LastName
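A sketch of this GROUP BY approach on hypothetical sample data, run through Python's sqlite3 module (SQLite also provides IFNULL, so the expression is unchanged; the NoteId column name is assumed). It shows that a student without notes gets 0 instead of disappearing. Note that it returns only the latest note ID, so getting the note text itself still requires joining back to notes2014.

```python
import sqlite3

# Hypothetical sample data: student 1 has two notes, student 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students2014 (Student INTEGER PRIMARY KEY, LastName TEXT, Consultant TEXT);
CREATE TABLE notes2014 (NoteId INTEGER PRIMARY KEY, NoteStudent INTEGER);
INSERT INTO students2014 VALUES (1, 'Adams', 'kim'), (2, 'Baker', 'kim');
INSERT INTO notes2014 VALUES (10, 1), (12, 1);
""")

# One group per student; IFNULL turns "no notes" (MAX over NULLs) into 0.
rows = conn.execute("""
SELECT students2014.Student, IFNULL(MAX(NoteId), 0) AS latest_note_id
FROM students2014
LEFT JOIN notes2014 ON students2014.Student = notes2014.NoteStudent
WHERE students2014.Consultant = ?
GROUP BY students2014.Student
ORDER BY students2014.LastName
""", ("kim",)).fetchall()

print(rows)
```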

Related

MySQL - Cannot reference result of a subquery

I've hit a roadblock and am hoping someone here can help.
edit: DB<>FIDDLE : https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=c73f8ec9a60f530fe4ad489dc743f9b9
I have 3 tables:
marks - which uses grades;
target - which also uses grades;
grades - a lookup for grades to points.
What I am trying to do is calculate the total points for grades within the marks table, then calculate the total target points by multiplying the points value for the target by the number of grades within the marks table for a given person (adno).
I'm able to sum and count the points values from the marks table without a problem, but because I've already used an inner join from marks to grades, I can't add another one from targets to grades, so I've used a subquery.
However, when I try to use the result of the subquery (EDIT: single_target_points, not target_points as I originally posted) in the calculation on the line straight after it, I get the error:
[Err] 1054 - Unknown column 'single_target_points' in 'field list'
This is the query I am trying:
SELECT
marks.adno,
Sum(grades.points) AS total_points,
Count(grades.points) AS no_of_subjects,
(SELECT grades.points FROM targets INNER JOIN grades ON targets.grade = grades.grade WHERE targets.adno = marks.adno GROUP BY grades.points) AS single_target_points,
single_target_points*no_of_subjects AS target_points
FROM
marks
INNER JOIN grades ON marks.resultvalue = grades.grade
INNER JOIN targets ON targets.adno = marks.adno
GROUP BY
marks.adno

This will resolve the query for you, but I'm not sure what you are doing is completely accurate. I took your inline query and made it a subquery that includes adno (the person ID) to get all distinct points, then joined it on the person like the other tables.
SELECT
m.adno,
Sum(g.points) AS total_points,
Count(g.points) AS no_of_subjects,
SUM( TP.single_target_points * g.points) AS target_points
FROM
marks m
JOIN grades g
ON m.resultvalue = g.grade
JOIN students s
ON m.adno = s.adno
JOIN targets t
ON m.adno = t.adno
JOIN
( SELECT distinct
t2.adno,
t2.grade,
g2.points single_target_points
FROM
targets t2
JOIN grades g2
ON t2.grade = g2.grade ) TP
on t.adno = TP.adno
and t.grade = TP.grade
GROUP BY
m.adno
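The 1054 error in the question comes from referencing a column alias (single_target_points) inside the same SELECT list that defines it; MySQL documents such aliases as usable in GROUP BY, ORDER BY and HAVING, not in sibling select expressions. An alternative sketch, assuming each adno has exactly one row in targets, joins grades a second time under a different alias and repeats the expressions instead of the aliases. The sample data is hypothetical, and the query runs through Python's sqlite3 module rather than MySQL.

```python
import sqlite3

# Hypothetical lookup, marks and one target per student.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grades (grade TEXT, points INTEGER);
CREATE TABLE marks (adno INTEGER, resultvalue TEXT);
CREATE TABLE targets (adno INTEGER, grade TEXT);
INSERT INTO grades VALUES ('A', 5), ('B', 4), ('C', 3);
INSERT INTO marks VALUES (1, 'A'), (1, 'B'), (2, 'C');
INSERT INTO targets VALUES (1, 'B'), (2, 'A');
""")

# grades is joined twice: g for the marks, tg for the target grade.
# The target expression is repeated rather than referenced by alias.
rows = conn.execute("""
SELECT m.adno,
       SUM(g.points) AS total_points,
       COUNT(g.points) AS no_of_subjects,
       MAX(tg.points) * COUNT(g.points) AS target_points
FROM marks m
JOIN grades g ON m.resultvalue = g.grade
JOIN targets t ON t.adno = m.adno
JOIN grades tg ON t.grade = tg.grade
GROUP BY m.adno
ORDER BY m.adno
""").fetchall()

print(rows)
```

If a student can have several target rows, the targets join multiplies the marks rows and the sums are inflated, so that assumption matters.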
Now, that being said, it looks like you are trying to compute a person's GPA (grade point average). If you can, EDIT your post and provide sample data (use spaces to align, not tabs); fictional names are fine but not necessary, since everything here is an ID value and nothing is private. It would help to see the basis of the computations. If they are not done correctly, you will get Cartesian results that skew your points, because multiple entries times multiple entries gives oversized totals.
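The Cartesian effect is easy to reproduce. In this simplified, hypothetical sketch (points stored directly in marks, run through Python's sqlite3 module), one student has two marks and a duplicated target row; joining both tables on adno pairs every mark with every target row, so the sum doubles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE marks (adno INTEGER, points INTEGER);
CREATE TABLE targets (adno INTEGER, grade TEXT);
INSERT INTO marks VALUES (1, 5), (1, 4);        -- two marks
INSERT INTO targets VALUES (1, 'B'), (1, 'B');  -- a duplicated target row
""")

# Summing marks on their own gives the expected total.
direct = conn.execute("SELECT SUM(points) FROM marks WHERE adno = 1").fetchone()[0]

# Joining a second multi-row table on the same key yields 2 x 2 = 4 rows.
joined = conn.execute("""
SELECT SUM(m.points), COUNT(*)
FROM marks m
JOIN targets t ON t.adno = m.adno
WHERE m.adno = 1
""").fetchone()

print(direct, joined)  # 9 (18, 4)
```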
Also, I updated your dbfiddle. You had inserted the sample student records into adno instead of students, and the grades into targets, so those tables were blank; I corrected that. I then changed grades to integer, since you are doing math on them (via SUM(g.points)). It is best to use proper data types rather than storing everything as character.
At least the queries now work, but the result does not make sense as a GPA calculation, as far as I know from doing them for U.S. college transcripts.

SQL Temporary Table or Select

I've got a problem with MySQL select statement.
I have a table with different departments and statuses. There are 4 statuses for every department, but not every status occurs every month, and I would like the analytics graph to show '0' for the missing ones.
The problem is that my SELECT statement shows only the statuses that exist (of course :D).
Is it possible to create a temporary table with all of the departments and statuses, with the status count set to 0, and then update it with values from another SELECT?
Here is the SELECT statement, with screenshots of how the result looks in the ideal situation and in the problem situation:
SELECT utd.Departament,uts.statusDef as statusoforder,Count(uts.statusDef) as Ilosc_Statusow
FROM ur_tasks_details utd
INNER JOIN ur_tasks_status uts on utd.StatusOfOrder = uts.statusNR
WHERE month = 'Sierpien'
GROUP BY uts.statusDef,utd.Departament
[Screenshots: perfect scenario, then bad scenario (images not preserved)]
I've tried UNION statements, but I don't know whether it's possible to take only "the highest value" for every department. For example: [example screenshot not preserved]
I've also heard about WITH (CTE) queries, but I don't really understand how to use them. I'd love some tips!
Thanks for your help.

Use a cross join to generate the rows you want. Then use a left join and aggregation to bring in the data:
select d.Departament, uts.statusDef as statusoforder,
       -- count a column of the left-joined table, so missing combinations give 0
       count(utd.StatusOfOrder) as Ilosc_Statusow
from (select distinct utd.Departament
      from ur_tasks_details utd
     ) d cross join
     ur_tasks_status uts left join
     ur_tasks_details utd
     on utd.Departament = d.Departament and
        utd.StatusOfOrder = uts.statusNR and
        utd.month = 'Sierpien'
group by uts.statusDef, d.Departament;
The first subquery should be your source of all the departments.
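I also suspect that month is in the details table, so that condition should be part of the on clause; in the where clause it would discard the NULL-extended rows and turn the left join back into an inner join.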
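Here is a minimal sketch of the cross join plus left join pattern on hypothetical data, run through Python's sqlite3 module. The count has to be taken over a column of the left-joined ur_tasks_details: counting uts.statusDef, which comes from the cross-joined side and is never NULL, reports 1 instead of 0 for the missing combinations, as the second query shows. The status names and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ur_tasks_status (statusNR INTEGER, statusDef TEXT);
CREATE TABLE ur_tasks_details (Departament TEXT, StatusOfOrder INTEGER, month TEXT);
INSERT INTO ur_tasks_status VALUES (1, 'new'), (2, 'done');
INSERT INTO ur_tasks_details VALUES
  ('IT', 1, 'Sierpien'), ('IT', 1, 'Sierpien'),
  ('HR', 2, 'Sierpien'), ('HR', 1, 'Lipiec');
""")

# Every department x every status, then attach matching detail rows (if any).
QUERY = """
SELECT d.Departament, uts.statusDef AS statusoforder,
       COUNT({counted}) AS Ilosc_Statusow
FROM (SELECT DISTINCT Departament FROM ur_tasks_details) d
CROSS JOIN ur_tasks_status uts
LEFT JOIN ur_tasks_details utd
  ON utd.Departament = d.Departament
 AND utd.StatusOfOrder = uts.statusNR
 AND utd.month = 'Sierpien'
GROUP BY d.Departament, uts.statusDef
ORDER BY d.Departament, uts.statusDef
"""

rows = conn.execute(QUERY.format(counted="utd.StatusOfOrder")).fetchall()
# Counting a cross-joined column never sees NULL, so gaps show up as 1.
wrong = conn.execute(QUERY.format(counted="uts.statusDef")).fetchall()
print(rows)
print(wrong)
```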

SQL: Column Must Appear in the GROUP BY Clause Or Be Used in an Aggregate Function

I'm doing what I would have expected to be a fairly straightforward query on a modified version of the imdb database:
select primary_name, release_year, max(rating)
from titles natural join primary_names natural join title_ratings
group by year
having title_category = 'film' and year > 1989;
However, I'm immediately running into
"column must appear in the GROUP BY clause or be used in an aggregate function."
I've tried researching this but have found confusing information: some examples of this problem look structurally identical to mine, whereas others state that you must group by every selected column, which defeats the whole purpose of grouping, since I only want to select the maximum entry per year.
What am I doing wrong with this query?
Expected result: table with 3 columns which displays the highest-rated movie of each year.

If you want the maximum entry per year, then you should do something like this:
select r.*
from ratings r
where r.rating = (select max(r2.rating) from ratings r2 where r2.year = r.year) and
r.year > 1989;
In other words, group by is the wrong approach to writing this query.
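A small sketch of the correlated-subquery approach, using a single hypothetical ratings table (the real schema splits titles, names and ratings across tables) and Python's sqlite3 module. Ties are kept: if two films share a year's top rating, both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (primary_name TEXT, year INTEGER, rating REAL);
INSERT INTO ratings VALUES
  ('A', 1995, 7.5), ('B', 1995, 8.9), ('C', 1985, 9.9),
  ('D', 2001, 8.0), ('E', 2001, 8.0);
""")

# For each row, the subquery computes the maximum rating of that row's year.
rows = conn.execute("""
SELECT r.primary_name, r.year, r.rating
FROM ratings r
WHERE r.rating = (SELECT MAX(r2.rating) FROM ratings r2 WHERE r2.year = r.year)
  AND r.year > 1989
ORDER BY r.year, r.primary_name
""").fetchall()

print(rows)
```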
I would also strongly encourage you to forget that natural join exists at all. It is an abomination. It uses the names of common columns for joins. It does not even use properly declared foreign key relationships. In addition, you cannot see what columns are used for the join.
While I am at it, another piece of advice: qualify all column names in queries that have more than one table reference. That is, include the table alias in the column name.

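If you want to display all the columns, you can use a window function and keep only the rows that match the yearly maximum:
select primary_name, year, rating
from (select primary_name, year, rating,
             max(rating) over (partition by year) as max_rating
      from titles natural join primary_names natural join ratings
      where title_type = 'film' and year > 1989
     ) t
where rating = max_rating;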
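MAX() OVER only annotates each row with its year's maximum; it does not remove any rows, so an outer filter is needed to keep just the top-rated film per year. A sketch on the same kind of hypothetical single ratings table, run through Python's sqlite3 module (window functions need SQLite 3.25 or later):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (primary_name TEXT, year INTEGER, rating REAL);
INSERT INTO ratings VALUES
  ('A', 1995, 7.5), ('B', 1995, 8.9), ('C', 1985, 9.9),
  ('D', 2001, 8.0), ('E', 2001, 8.0);
""")

# Window function alone: every qualifying row is kept, annotated with its year's max.
annotated = conn.execute("""
SELECT primary_name, year, rating,
       MAX(rating) OVER (PARTITION BY year) AS max_rating
FROM ratings
WHERE year > 1989
""").fetchall()

# Outer filter: keep only rows whose rating equals their year's maximum.
top_per_year = conn.execute("""
SELECT primary_name, year, rating
FROM (SELECT primary_name, year, rating,
             MAX(rating) OVER (PARTITION BY year) AS max_rating
      FROM ratings
      WHERE year > 1989) AS t
WHERE rating = max_rating
ORDER BY year, primary_name
""").fetchall()

print(len(annotated), top_per_year)
```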

Multitable counting and multiplying in same query

I have got a somewhat complicated problem. This is my situation (ERD).
For a dashboard I need to create a pivot table that shows the total number of competences used by the vacancies. To do that I need to:
Count the number of vacancies per template
Count the number of templates per competence
and last: multiply these numbers to get the total number of competences used.
I have the first query:
SELECT vacancytemplate_id, count(id)
FROM vacancies
group by vacancytemplate_id;
The second query isn't that difficult either, but I don't know what the right solution is. I'm literally stuck: I can't see how to get to the next step and express it in a query. Please, kind stranger, help me out :)
EDIT: my desired result is something like this
NameOfComp, NrOfTimesUsed
Leading, 17
Inspiring, 2
EDIT2: the query should look something like this (pseudocode):
SELECT NameOfComp, (count of the competences used by templates) * (number of vacancies per template)
EDIT3: http://sqlfiddle.com/#!9/2773ca SQLFiddle
Thanks a lot!

If I understand your request correctly, you want a count of competences per vacancy. Given your table structure, this can be done very simply:
Select v.ID, count(*) from vacancy as v inner join CompTemplate_Table as CT
on v.Template_ID = CT.Template_ID group by v.ID;
The reason you can do only one join is because there will be a record in the CompTemplate_Table for every competency in each template. Additionally, the same key is used to join vacancy to templates as is used to join templates to CompTemplate_Table, so they represent the same key value (and you can skip joining the Templates table if you don't need data from there).
If you want to add this data to a pivot table, I will leave that exercise to you. There are a number of tutorials available if you do a quick Google search, and it should not be that hard.
UPDATE: For the second query you are looking at something like:
Select CP.NameOfComp, count(*) from vacancy as v inner join CompTemplate_Table as CT
on v.Template_ID = CT.Template_ID inner join competencies as CP
on CP.ID = CT.Comp_ID
group by CP.NameOfComp
The differences here are that you are adding the competencies table, since you need data from it, and grouping by CP.NameOfComp instead of the vacancy ID. You can also restrict this to specific templates, competencies, or vacancies by adding search conditions (e.g. where CP.ID = 12345).
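Counting the rows of the three-way join gives the same number as the asker's "vacancies per template times templates per competence" formulation (summed over templates), because each competence row is repeated once per vacancy whose template contains it. A hypothetical sketch, run through Python's sqlite3 module with the table and column names guessed in the answer, computes it both ways:

```python
import sqlite3

# Table and column names follow the answer's guesses; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE competencies (ID INTEGER PRIMARY KEY, NameOfComp TEXT);
CREATE TABLE CompTemplate_Table (Template_ID INTEGER, Comp_ID INTEGER);
CREATE TABLE vacancy (ID INTEGER PRIMARY KEY, Template_ID INTEGER);
INSERT INTO competencies VALUES (1, 'Leading'), (2, 'Inspiring');
INSERT INTO CompTemplate_Table VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO vacancy VALUES (101, 1), (102, 1), (103, 1), (201, 2), (202, 2);
""")

# The answer's query: one joined row per (vacancy, competence in its template).
joined_counts = conn.execute("""
SELECT CP.NameOfComp, COUNT(*)
FROM vacancy AS v
INNER JOIN CompTemplate_Table AS CT ON v.Template_ID = CT.Template_ID
INNER JOIN competencies AS CP ON CP.ID = CT.Comp_ID
GROUP BY CP.NameOfComp
ORDER BY CP.NameOfComp
""").fetchall()

# Explicit form: vacancies per template, summed over the templates using each competence.
multiplied_counts = conn.execute("""
SELECT CP.NameOfComp, SUM(vt.n_vacancies)
FROM competencies AS CP
INNER JOIN CompTemplate_Table AS CT ON CT.Comp_ID = CP.ID
INNER JOIN (SELECT Template_ID, COUNT(*) AS n_vacancies
            FROM vacancy
            GROUP BY Template_ID) AS vt ON vt.Template_ID = CT.Template_ID
GROUP BY CP.NameOfComp
ORDER BY CP.NameOfComp
""").fetchall()

print(joined_counts)
print(multiplied_counts)
```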

How to deal with bad data in mysql?

I have three tables that I want to combine.
I have the following query to run:
DROP TABLE
IF EXISTS testgiver.smart_curmonth_downs;
CREATE TABLE testgiver.smart_curmonth_downs
SELECT
ldap_karen.uid,
ldap_karen.supemail,
ldap_karen.regionname,
smart_curmonth_downs_raw.username,
smart_curmonth_downs_raw.email,
smart_curmonth_downs_raw.publisher,
smart_curmonth_downs_raw.itemtitle,
smart_items.`Owner`
FROM
smart_curmonth_downs_raw
INNER JOIN ldap_karen ON smart_curmonth_downs_raw.username = ldap_karen.uid
INNER JOIN smart_items ON smart_curmonth_downs_raw.itemtitle = smart_items.Title
I want to know how to write the joins so that the result always has exactly one row for each row in smart_curmonth_downs_raw.
For instance, if there is no matching uid in ldap_karen, I have issues. The other issue I have found is that our CMS allows duplicate itemtitle values, so my query returns many more rows because it creates a row for each matching item title. Is there a way to pick only the last matching item in smart_items? I would really just like to keep the same number of rows, and I have no control over the integrity issues in the other tables.
The smart_curmonth_downs_raw table is the raw download information (download stats), the karen table adds unique user information, and the smart_items table adds unique items (download) info. They are all important. If a user made a download but is knocked off the karen table I would like to see NULLs for the user info and if there is more than one item in smart_items that has the same name then I would like to see just the item with the highest ID.

It sounds like the relationship between smart_curmonth_downs_raw and ldap_karen is optional, which means you want a LEFT JOIN: it returns all the rows of the first table and, when no matching row exists in the right table, uses NULL for the right table's column values.
In terms of the last item in the smart_items table, you could use this query.
SELECT title, MAX(id) AS max_id
FROM smart_items
GROUP BY title;
Combining that query with the other logic, try this query as a solution.
SELECT COALESCE(ldap_karen.uid, 'Unknown') AS uid,
COALESCE(ldap_karen.supemail, 'Unknown') AS supemail,
COALESCE(ldap_karen.regionname, 'Unknown') AS regionname,
smart_curmonth_downs_raw.username,
smart_curmonth_downs_raw.email,
smart_curmonth_downs_raw.publisher,
smart_curmonth_downs_raw.itemtitle,
smart_items.`Owner`
FROM smart_curmonth_downs_raw
INNER JOIN (SELECT title, MAX(id) AS max_id
FROM smart_items
GROUP BY title) AS most_recent
ON smart_curmonth_downs_raw.itemtitle = most_recent.Title
INNER JOIN smart_items
ON most_recent.max_id = smart_items.id
LEFT JOIN ldap_karen
ON smart_curmonth_downs_raw.username = ldap_karen.uid;
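A minimal sketch of this query on hypothetical data, run through Python's sqlite3 module with only the columns the joins need. It shows that a duplicated item title resolves to the row with the highest id and that a user missing from ldap_karen still produces a row. Note that the INNER JOINs on smart_items still drop downloads whose title has no match there; switching those to LEFT JOIN would keep them as well.

```python
import sqlite3

# Only the columns the joins need; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE smart_curmonth_downs_raw (username TEXT, itemtitle TEXT);
CREATE TABLE ldap_karen (uid TEXT, supemail TEXT, regionname TEXT);
CREATE TABLE smart_items (id INTEGER PRIMARY KEY, Title TEXT, Owner TEXT);
INSERT INTO smart_curmonth_downs_raw VALUES
  ('alice', 'Guide'), ('bob', 'Guide'), ('carol', 'Manual');
INSERT INTO ldap_karen VALUES ('alice', 'a@example.com', 'North'),
                              ('bob', 'b@example.com', 'South');
INSERT INTO smart_items VALUES (1, 'Guide', 'Old Owner'),
                               (2, 'Guide', 'New Owner'),
                               (3, 'Manual', 'Docs Team');
""")

rows = conn.execute("""
SELECT COALESCE(ldap_karen.uid, 'Unknown') AS uid,
       COALESCE(ldap_karen.regionname, 'Unknown') AS regionname,
       smart_curmonth_downs_raw.username,
       smart_curmonth_downs_raw.itemtitle,
       smart_items.Owner
FROM smart_curmonth_downs_raw
-- one row per title: the highest id wins when titles are duplicated
INNER JOIN (SELECT title, MAX(id) AS max_id
            FROM smart_items
            GROUP BY title) AS most_recent
        ON smart_curmonth_downs_raw.itemtitle = most_recent.title
INNER JOIN smart_items ON most_recent.max_id = smart_items.id
-- optional user info: missing users yield NULLs, shown as 'Unknown'
LEFT JOIN ldap_karen ON smart_curmonth_downs_raw.username = ldap_karen.uid
ORDER BY smart_curmonth_downs_raw.username
""").fetchall()

raw_count = conn.execute("SELECT COUNT(*) FROM smart_curmonth_downs_raw").fetchone()[0]
print(raw_count, rows)
```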
