remove only duplicates ids - mysql

I got 2 tables: job and job_working_time
job: id (Increment, Index, Unique)
job_working_time: job_id(allow multiple), property_working_time
This SQL query returns the ids with multiple values or duplicates. It helps, but it doesn't fix my problem:
SELECT a.id AS id, count( b.job_id ) AS cnt, b.property_working_time AS value
FROM job a
INNER JOIN job_working_time b ON a.id = b.job_id
GROUP BY b.job_id
HAVING cnt >1
I want to remove duplicates, only for ids with duplicates, e.g: l = leave, r = remove
1 - 1 (l)
1 - 1 (r)
1 - 2 (l)
1 - 3 (l)
1 - 1 (r)
2 - 1 (l)
2 - 1 (r)
2 - 2 (l)
Thanks in advance
[Later edit]
One thing you should consider:
Anyway, when I want to delete the ID, will remove all values for it. So, the idea is to keep all the values for that id, so later I can add without duplicates. That's why it's so important to retrieve, both the id and the value, but not duplicate content. The SQL should return all the (l) values from the above e.g. list.

Since your example looks like you don't want any row be same
DISTINCT will work for you
it select completely diffirent rows
SELECT DISTINCT a.id AS id, count( b.job_id ) AS cnt, b.property_working_time AS value
FROM job a
INNER JOIN job_working_time b ON a.id = b.job_id
GROUP BY b.job_id
HAVING cnt >1
Edit:
Example added
https://data.stackexchange.com/stackoverflow/q/119267/

First/ simple Query:
SELECT id, field, count(*) c from ... group by field having c >1
Second Query, modifed for get IDs to delete:
SELECT instr(s,mid(s,',')+1) from (
SELECT group_concat(id) s, count(*) c from ... group by field having c >1)z
and the end:
Delete FROM ... WHERE id in ( ...)
looks ugly, but fast (Mid function performance is faster than not in (...)).

Related

SELECT query with JOIN returns duplicate of each row

I am running a SELECT query to return addresses in a table associated with a certain "applicant code" and I'd like to join a table to also return (in the same row) the name of that applicant.
Therefore my query as of now is
SELECT a.id, a.created_at, a.updated_at, a.code, a.applicant_code, a.form_code, a.address_line_1, a.address_line_2, a.town_city, a.county_state, a.country, a.post_code, a.start_date, a.end_date, a.type, ap.first_name, ap.last_name
FROM sfs_addresses a
JOIN sfs_personal_details ap ON a.form_code = ap.form_code
WHERE a.form_code = ? AND a.applicant_code = ?
The query works, and I get the right columns and values in each row, but it returns 2 of each so like
ID
===
1
1
2
2
3
3
4
4
If I remove the JOIN it works fine. I have tried adding DISTINCT (makes no difference) I'm lost.
EDIT: Based on this answer and the comments, the OP realized that the JOIN condition should be on applicant_code rather than form_code.
You have duplicates in the second table based on the JOIN key you are using (I question if the JOIN is correct).
If you just want one row arbitrarily, you can use row_number():
SELECT a.*, ap.first_name, ap.last_name
FROM sfs_addresses a JOIN
(SELECT ap.*,
ROW_NUMBER() OVER (PARTITION BY ap.form_code ORDER BY ap.form_code) as seqnum
FROM sfs_personal_details ap
) ap
ON a.form_code = ap.form_code
WHERE a.form_code = ? AND a.applicant_code = ?;
You can replace the columns in the ORDER BY with which result you want -- for instance the oldest or most recent.
Note: form_code seems like an odd JOIN column for a table called "personal details". So, you might just need to fix the JOIN condition.
relation between 2 tables one to many to return non duplicate use distinct
SELECT distinct a.id, a.created_at, a.updated_at, a.code, a.applicant_code, a.form_code, a.address_line_1, a.address_line_2, a.town_city, a.county_state, a.country, a.post_code, a.start_date, a.end_date, a.type, ap.first_name, ap.last_name
FROM sfs_addresses a
JOIN sfs_personal_details ap ON a.form_code = ap.form_code
WHERE a.form_code = ? AND a.applicant_code = ?

Select most recent record grouped by 3 columns

I am trying to return the price of the most recent record grouped by ItemNum and FeeSched, Customer can be eliminated. I am having trouble understanding how I can do that reasonably.
The issue is that I am joining about 5 tables containing hundreds of thousands of rows to end up with this result set. The initial query takes about a minute to run, and there has been some trouble with timeout errors in the past. Since this will run on a client's workstation, it may run even slower, and I have no access to modify server settings to increase memory / timeouts.
Here is my data:
Customer Price ItemNum FeeSched Date
5 70.75 01202 12 12-06-2017
5 70.80 01202 12 06-07-2016
5 70.80 01202 12 07-21-2017
5 70.80 01202 12 10-26-2016
5 82.63 02144 61 12-06-2017
5 84.46 02144 61 06-07-2016
5 84.46 02144 61 07-21-2017
5 84.46 02144 61 10-26-2016
I don't have access to create temporary tables, or views and there is no such thing as a #variable in C-tree, but in most ways it acts like MySql. I wanted to use something like GROUP BY ItemNum, FeeSched and select MAX(Date). The issue is that unless I put Price into the GROUP BY I get an error.
I could run the query again only selecting ItemNum, FeeSched, Date and then doing an INNER JOIN, but with the query taking a minute to run each time, it seems there is a better way that maybe I don't know.
Here is my query I am running, it isn't really that complicated of a query other than the amount of data it is processing. Final results are about 50,000 rows. I can't share much about the database structure as it is covered under an NDA.
SELECT DISTINCT
CustomerNum,
paid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.primfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
UNION ALL
SELECT DISTINCT
CustomerNum,
secpaid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.secfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
I feel it quite simple when I'd read the first three paragraphs, but I get a little confused when I've read the whole question.
Whatever you have done to get the data posted above, once you've got the data like that it's easy to retrive "the most recent record grouped by ItemNum and FeeSched".
How to:
Firstly, sort the whole result set by Date DESC.
Secondly, select fields you need from the sorted result set and group by ItemNum, FeeSched without any aggregation methods.
So, the query might be something like this:
SELECT t.Price, t.ItemNum, t.FeeSched, t.Date
FROM (SELECT * FROM table ORDER BY Date DESC) AS t
GROUP BY t.ItemNum, t.FeeSched;
How it works:
When your data is grouped and you select rows without aggregation methods, it will only return you the first row of each group. As you have sorted all rows before grouping, so the first row would exactly be "the most recent record".
Contact me if you got any problems or errors with this approach.
You can also try like this:
Select Price, ItemNum, FeeSched, Date from table where Date IN (Select MAX(Date) from table group by ItemNum, FeeSched,Customer);
Internal sql query return maximum date group by ItemNum and FeeSched and IN statement fetch only the records with maximum date.

mysql select statement for multiple table

This is just a test project. I want to know how to select All professors with more than 5 failed students in a subject
I already know how to select All professors with at least 2 subjects with the following query:
SELECT paulin_professors.*,
IFNULL(sub_p.total, 0) num
FROM paulin_professors
LEFT JOIN ( SELECT COUNT(*) total, pau_profid
FROM paulin_profsubject
GROUP BY pau_profid
) sub_p ON (sub_p.pau_profid = paulin_professors.pau_profid)
WHERE sub_p.total >= 2;
I know I'm close but I can't get it to work (All professors with more than 5 failed students in a subject) . Any ideas? TIA
try using SELECT with UNION
select [columnName1],[columnName2] from [Table1] where [condition] union select [columnName1],[columnName2] from [Table1] where [condition] union ....
Looks like can get the professor IDs from the profsubject table and JOIN the studentenrolled table using the subjid for the join. In a similar way to what you had, you can get the count of students who have a grade less than a certain pass/fail threshold (in this case 65).
Then to get a short list, you can select the distinct profids from this derivaed table.
SELECT
distinct pau_profid
FROM
(SELECT
t1.pau_profid,
IFNULL(t2.total_failed, 0) number_failed >= 5
FROM
paulin_profsubject t1
LEFT JOIN
(SELECT
COUNT(*) total_failed,
pau_subjid
FROM
paulin_studentenrolled
WHERE
pau_grade < 65
GROUP BY
pau_subjid
) t2
ON
t1.pau_subjid = t2.pau_subjid
WHERE
number_failed >= 5
) t3;

Inner join to select MAX value from the SUM of Rows

http://sqlfiddle.com/#!2/1a2df
I am trying to select a column "photo_name" by counting the MAX value of SUM(valid)
the preferred results is "test5.jpg" but after hours of trying, I still can't figure it out ,
below is my previous approach, but it doesn't work
SELECT photo_name FROM
(
SELECT a.*
FROM test a
INNER JOIN
(
SELECT *, SUM(valid) v
FROM test
WHERE page_id = 3 AND `valid` = 1
) b ON MAX(b.v)
)c
Please help,
Something like this should work. It sums the valid column for each photo, orders high-to-low for the sum, then limits results to the top row:
SELECT photo_name, SUM(valid) AS sum_valid
FROM test
GROUP BY photo_name
ORDER BY sum_valid DESC
LIMIT 1

Query output differs from the expected output

Below query is doing what I need:
SELECT assign.from_uid, assign.aid, assign.message, curriculum.asset,
curriculum.title, curriculum.description
FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum
ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13 AND assign.status = 1
GROUP BY assign.from_uid, assign.to_uid, assign.nid
ORDER BY assign.created DESC
Now I need to get the total count of rows of the result. For example if it is displaying 5 rows the o/p should be like My expected o/p. The query I tried is given below.
SELECT count(description) FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13 AND assign.status = 1
GROUP BY assign.from_uid, assign.to_uid, assign.nid
ORDER BY assign.created DESC
My expected o/p:
count(*)
---------
5
My current o/p:
count(*)
---------
6
2
5
6
6
The easiest solution would be to
place your initial GROUP BY query in a subselect
select the amount of rows retrieved from this subselect
SQL Statement
SELECT COUNT(*)
FROM (
SELECT assign.from_uid
FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13
AND assign.status = 1
GROUP BY
assign.from_uid
, assign.to_uid
, assign.nid
) q
Edit - why doesn't the original query return the results required
It did already prepared what was needed to get the correct result
Your query without grouping returns a resultset of 25 records (6+2+5+6+6)
From these 25 records, you have 5 unique combinations of from_uid, to_uid, nid
Now you don't want to count how many records each combination has (as you did in your example) but how many unique (distinct anyone?) combinations there are.
One solution to this is the subselect I presented but following equivalent statement using a DISTINCT clause might be more comprehensive.
SELECT COUNT(*)
FROM (
SELECT DISTINCT assign.from_uid
, assign.to_uid
, assign.nid
FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13
AND assign.status = 1
) q
Note that my personal preference goes to the GROUP BY solution.
To get the number of rows for a query do:
SELECT COUNT(*) as RowCount FROM (--insert other query here--) s
In you example:
SELECT COUNT(*) as RowCount FROM (SELECT a.from_uid
FROM assignment a
INNER JOIN curriculum_topics_assets c ON a.nid = c.asset
WHERE a.to_uid = 13
AND a.status = 1
GROUP BY a.from_uid, a.to_uid, a.nid
) s
Note that I the dropped the stuff that has no effect on the number of rows to make the query run slightly faster.
You should use COUNT(*) instead of count(description). Look at: http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/