How to find and group duplicates in MS Access 2010?

I'm trying to find duplicate entries across 2 columns in Access 2010 and group them together without affecting data integrity (I have 20 columns in total, and 2 of them contain duplicates in random rows). I would like the rows that contain duplicates to be lined up next to each other.
I have this query for finding the duplicates, but it doesn't seem to work and doesn't group the duplicates in any way.
SELECT t.IsParent, tbl.*
FROM (SELECT
parent.id , 1 AS IsParent, parent.[company url] as Col
FROM
tbl AS child INNER JOIN tbl AS parent ON child.domain_name = parent.[company url]
WHERE (((parent.domain_name)<>[parent].[company url]))
union all
SELECT
child.id , 0 AS IsParent, child.[company url] as Col
FROM
tbl AS child INNER JOIN tbl AS parent ON child.domain_name = parent.[company url]
WHERE (((parent.domain_name)<>[parent].[company url]))
) AS t INNER JOIN tbl ON t.id = tbl.id;
I would highly appreciate help in solving this issue.
Thank you

Related

How to fix MySQL providing duplicates that do not exist?

I have recently been working with MySQL for a current project. I have a few thousand records in a table, but one problem stands out: I have a SELECT statement that collects a set of columns and uses them for the final query.
However when I run the query, it gives me duplicates as seen here:
https://i.imgur.com/PImNBam.png
The strange thing is that the ID is set as the key, so MySQL has no reason to produce duplicates, and even when I check the table manually, no duplicates exist.
This query used to work without a hitch on this exact server. I tried grouping the scores by id and by song_name (from the screenshot), but that gave no results. I also tried to delete duplicates using:
DELETE t1
FROM scores t1
INNER JOIN scores t2
WHERE t1.score < t2.score
AND t1.beatmap_md5 = t2.beatmap_md5
AND t1.userid = t2.userid;
But that affected zero rows and didn't change anything at all.
The SQL query that I use to gather the information:
SELECT scores.id,
beatmaps.song_name,
scores.beatmap_md5,
users.username,
scores.userid,
scores.time,
scores.score,
scores.pp,
scores.play_mode,
scores.mods
FROM scores
LEFT JOIN beatmaps ON beatmaps.beatmap_md5 = scores.beatmap_md5
LEFT JOIN users ON users.id = scores.userid
WHERE users.privileges & 1 > 0
I really expected no duplicates to show, as none exist. I don't know if MySQL is having some caching issue or if this could be something else.
To avoid duplicate rows, you could use DISTINCT:
SELECT DISTINCT
scores.id
, beatmaps.song_name
, scores.beatmap_md5
, users.username
, scores.userid
, scores.time
, scores.score
, scores.pp
, scores.play_mode
, scores.mods
FROM scores
LEFT JOIN beatmaps ON beatmaps.beatmap_md5 = scores.beatmap_md5
LEFT JOIN users ON users.id = scores.userid
WHERE users.privileges & 1 > 0
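A minimal sketch of why a join can show "duplicates" that are not in the scores table, using SQLite through Python as a stand-in for MySQL. It assumes (the question does not confirm this) that beatmaps holds more than one row for the same beatmap_md5; the table contents here are made up for illustration. DISTINCT collapses the repeated rows only when every selected column is identical, so checking the joined table for repeated keys is worth doing as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (id INTEGER PRIMARY KEY, beatmap_md5 TEXT, userid INTEGER, score INTEGER);
    -- Assumption: beatmap_md5 is NOT unique in beatmaps (hypothetical data).
    CREATE TABLE beatmaps (beatmap_md5 TEXT, song_name TEXT);
    INSERT INTO scores VALUES (1, 'abc', 10, 5000), (2, 'def', 10, 7000);
    INSERT INTO beatmaps VALUES ('abc', 'Song A'), ('abc', 'Song A'), ('def', 'Song B');
""")

join_sql = """
    SELECT {distinct} scores.id, beatmaps.song_name, scores.score
    FROM scores
    LEFT JOIN beatmaps ON beatmaps.beatmap_md5 = scores.beatmap_md5
    ORDER BY scores.id
"""

# Without DISTINCT, score 1 appears once per matching beatmaps row.
plain_rows = conn.execute(join_sql.format(distinct="")).fetchall()

# With DISTINCT, identical result rows are collapsed.
distinct_rows = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

# Diagnostic: which join keys occur more than once in the joined table?
repeated_keys = conn.execute("""
    SELECT beatmap_md5, COUNT(*)
    FROM beatmaps
    GROUP BY beatmap_md5
    HAVING COUNT(*) > 1
""").fetchall()

print(plain_rows)
print(distinct_rows)
print(repeated_keys)
```

If the diagnostic query returns rows on the real database, the duplicates come from the join rather than from the scores table itself.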

Select most recent record grouped by 3 columns

I am trying to return the price of the most recent record grouped by ItemNum and FeeSched, Customer can be eliminated. I am having trouble understanding how I can do that reasonably.
The issue is that I am joining about 5 tables containing hundreds of thousands of rows to end up with this result set. The initial query takes about a minute to run, and there has been some trouble with timeout errors in the past. Since this will run on a client's workstation, it may run even slower, and I have no access to modify server settings to increase memory / timeouts.
Here is my data:
Customer  Price  ItemNum  FeeSched  Date
5         70.75  01202    12        12-06-2017
5         70.80  01202    12        06-07-2016
5         70.80  01202    12        07-21-2017
5         70.80  01202    12        10-26-2016
5         82.63  02144    61        12-06-2017
5         84.46  02144    61        06-07-2016
5         84.46  02144    61        07-21-2017
5         84.46  02144    61        10-26-2016
I don't have access to create temporary tables or views, and there is no such thing as a #variable in C-tree, but in most ways it acts like MySQL. I wanted to use something like GROUP BY ItemNum, FeeSched and select MAX(Date). The issue is that unless I put Price into the GROUP BY, I get an error.
I could run the query again, selecting only ItemNum, FeeSched, Date, and then do an INNER JOIN, but with the query taking a minute to run each time, there must be a better way that I don't know about.
Here is the query I am running. It isn't really that complicated, apart from the amount of data it is processing. Final results are about 50,000 rows. I can't share much about the database structure as it is covered under an NDA.
SELECT DISTINCT
CustomerNum,
paid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.primfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
UNION ALL
SELECT DISTINCT
CustomerNum,
secpaid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.secfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
It seemed quite simple after I read the first three paragraphs, but I got a little confused after reading the whole question.
Whatever you did to get the data posted above, once you have the data in that shape it's easy to retrieve "the most recent record grouped by ItemNum and FeeSched".
How to:
First, sort the whole result set by Date DESC.
Second, select the fields you need from the sorted result set and group by ItemNum, FeeSched without any aggregate functions.
So, the query might be something like this:
SELECT t.Price, t.ItemNum, t.FeeSched, t.Date
FROM (SELECT * FROM table ORDER BY Date DESC) AS t
GROUP BY t.ItemNum, t.FeeSched;
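How it works:
When your data is grouped and you select columns without aggregate functions, MySQL returns one row per group, with values taken from a row of its choosing; in practice that is often the first row it encounters. Because the rows were sorted before grouping, that row tends to be "the most recent record". Note that MySQL does not guarantee which row it picks, and with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) the query is rejected, so treat this as a shortcut that may not carry over to other engines.
Contact me if you run into any problems or errors with this approach.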
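You can also try it like this:
Select Price, ItemNum, FeeSched, Date from table where (ItemNum, FeeSched, Date) IN (Select ItemNum, FeeSched, MAX(Date) from table group by ItemNum, FeeSched);
The inner query returns the maximum Date for each ItemNum and FeeSched group, and the IN clause keeps only the rows that match one of those groups on its latest date. Comparing on Date alone would also keep older rows of one group whose date happens to equal another group's maximum. This assumes the engine supports row-value comparisons in IN, as MySQL does.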
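The "latest row per group" idea can be sketched with a join against the grouped maximum, which avoids relying on how an engine treats non-aggregated columns. This is only an illustration in SQLite through Python, not the C-tree query from the question: the table name prices and the column name PDate are hypothetical, and the sample rows are the ones posted above with the dates rewritten as ISO text so that MAX orders them chronologically (strings in MM-DD-YYYY form would sort by month first).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (Customer INTEGER, Price REAL, ItemNum TEXT, FeeSched INTEGER, PDate TEXT)"
)
sample = [
    (5, 70.75, "01202", 12, "2017-12-06"),
    (5, 70.80, "01202", 12, "2016-06-07"),
    (5, 70.80, "01202", 12, "2017-07-21"),
    (5, 70.80, "01202", 12, "2016-10-26"),
    (5, 82.63, "02144", 61, "2017-12-06"),
    (5, 84.46, "02144", 61, "2016-06-07"),
    (5, 84.46, "02144", 61, "2017-07-21"),
    (5, 84.46, "02144", 61, "2016-10-26"),
]
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?, ?)", sample)

# Find the newest date per (ItemNum, FeeSched), then join back to fetch
# the price that belongs to exactly that row.
latest = conn.execute("""
    SELECT p.ItemNum, p.FeeSched, p.Price, p.PDate
    FROM prices AS p
    INNER JOIN (
        SELECT ItemNum, FeeSched, MAX(PDate) AS PDate
        FROM prices
        GROUP BY ItemNum, FeeSched
    ) AS m
      ON m.ItemNum = p.ItemNum
     AND m.FeeSched = p.FeeSched
     AND m.PDate = p.PDate
    ORDER BY p.ItemNum, p.FeeSched
""").fetchall()

for row in latest:
    print(row)
```

If two rows in a group share the same latest date, this form returns both of them, so a tie-breaker column may be needed on real data.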

Mysql query with group by and order by involving several tables

I'm having a problem with a query: I don't get all the records and I don't know why.
This is the query:
SELECT `ebspma_paad_ebspma`.`semana_dias`.`dia`,`ebspma_paad_ebspma`.`req_material_sala`.`sala`, `ebspma_paad_ebspma`.`req_material_tempo`.`inicio`, `ebspma_paad_ebspma`.`sala_ocupacao`.`id_ocup`, `ebspma_paad_ebspma`.`turmas`.`turma`
FROM `ebspma_paad_ebspma`.`sala_ocupacao`
INNER JOIN `ebspma_paad_ebspma`.`semana_dias`
ON (`sala_ocupacao`.`id_dia` = `semana_dias`.`id_dia`)
INNER JOIN `ebspma_paad_ebspma`.`req_material_sala`
ON (`sala_ocupacao`.`id_sala` = `req_material_sala`.`idsala`)
LEFT JOIN `ebspma_paad_ebspma`.`req_material_tempo`
ON (`sala_ocupacao`.`id_tempo` = `req_material_tempo`.`idtempo`)
LEFT JOIN `ebspma_paad_ebspma`.`turmas`
ON (`sala_ocupacao`.`id_turma` = `turmas`.`id_turma`)
where`ebspma_paad_ebspma`.`sala_ocupacao`.`id_turma` = '$turma'
GROUP BY `ebspma_paad_ebspma`.`sala_ocupacao`.`id_dia` , `ebspma_paad_ebspma`.`req_material_tempo`.`inicio` ASC";
Running this query I get almost all the records, but this is a school timetable, and when a class is split into 2 groups I have two classrooms for that class. With this query I only get one group.
For example, the class starts at 1 PM in two classrooms (27 and 31). With this query I should see at 1 PM that class X is in classrooms 27 and 31, but I only get the first one.
Image to check: http://postimg.org/image/u24r35fkz/
And my database image: http://postimg.org/image/hyvpb1qz1/ce7a7320/
So what's wrong with my query?
Thanks
UPDATE 1
I have simplified my query to
SELECT t2.`dia` , t3.`sala` , t4.`inicio` , t1.`id_ocup` , t5.`turma`
FROM `ebspma_paad_ebspma`.`sala_ocupacao` AS t1
INNER JOIN `ebspma_paad_ebspma`.`semana_dias` AS t2 ON ( t1.`id_dia` = t2.`id_dia` )
INNER JOIN `ebspma_paad_ebspma`.`req_material_sala` AS t3 ON ( t1.`id_sala` = t3.`idsala` )
LEFT JOIN `ebspma_paad_ebspma`.`req_material_tempo` AS t4 ON ( t1.`id_tempo` = t4.`idtempo` )
LEFT JOIN `ebspma_paad_ebspma`.`turmas` AS t5 ON ( t1.`id_turma` = t5.`id_turma` )
WHERE t1.`id_turma` =12
GROUP BY t1.`id_dia` , t3.`idsala` , t4.`inicio`
Now I can see all the classes, but not in the right order. The order should be given by t4.inicio and by day (id_dia).
You are not grouping by sala, so MySQL picks an arbitrary row among those that fit the other requirements. Stricter database engines would give you an error saying you haven't aggregated or grouped all result columns.
If you add sala to GROUP BY you should see the difference.
For the ordering: you're not asking the database to ORDER BY anything, so the rows will be in whatever order they happen to come out. You probably want to add ORDER BY t4.inicio, t1.id_dia to handle that.
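A small sketch of both points, using SQLite through Python and a single flattened table instead of the five joined tables from the question. The table name timetable and its rows are hypothetical; only the column names dia, inicio and sala echo the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timetable (id_dia INTEGER, inicio TEXT, sala TEXT)")
conn.executemany(
    "INSERT INTO timetable VALUES (?, ?, ?)",
    [
        (2, "13:00", "27"),
        (1, "14:00", "12"),
        (1, "13:00", "31"),  # same day and start time as the next row,
        (1, "13:00", "27"),  # but a different classroom
    ],
)

# Grouping by day and start time only: the two classrooms fall into one group.
collapsed = conn.execute("""
    SELECT id_dia, inicio, COUNT(*) AS rooms_in_group
    FROM timetable
    GROUP BY id_dia, inicio
    ORDER BY inicio, id_dia
""").fetchall()

# Adding sala to GROUP BY keeps one row per classroom, and the explicit
# ORDER BY fixes the order instead of leaving it to the engine.
per_room = conn.execute("""
    SELECT id_dia, inicio, sala
    FROM timetable
    GROUP BY id_dia, inicio, sala
    ORDER BY inicio, id_dia, sala
""").fetchall()

print(collapsed)
print(per_room)
```

The first result shows a group that silently contains two rows; the second lists both classrooms for that slot in a predictable order.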

mysql limiting join

I've done a few searches on this subject, but none of the solutions seem to work, so perhaps my requirement is slightly different.
Basically I have a "content" table and a "file_screenshots" table. Each row in the "file_screenshots" table has a "screenshot_content_id" column. I want to select from the "content" table and join the "file_screenshots" table, but only select a maximum of 5 screenshots for any single piece of content.
If this isn't possible I'm happy to use two queries, but again I'm not sure how to limit the results to only 5 screenshots per piece of content.
Here is an example query:
SELECT * FROM content
LEFT JOIN file_screenshots
ON file_screenshots.screenshot_content_id = content.content_id
WHERE content_type_id = 4
Assuming you have some sort of unique id column in your file_screenshots table, this should work for you:
SELECT
c.*,
fs.*
FROM
content c
JOIN
file_screenshots fs
ON (fs.screenshot_content_id = c.content_id)
LEFT JOIN
file_screenshots fs2
ON (fs2.screenshot_content_id = c.content_id AND fs2.id < fs.id)
GROUP BY
fs.id
HAVING
COUNT(*) < 5
ORDER BY c.content_id, fs.id
I've named the id column id. Rename it if necessary.
If you want the 5 screenshots with the highest id, reverse the fs2.id vs. fs.id comparison.
ON (fs2.screenshot_content_id = c.content_id AND fs2.id > fs.id)
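The self-join trick can be tried out on a tiny data set. The sketch below uses SQLite through Python with made-up rows and leaves out the content table for brevity. It is a variant rather than the exact query above: it counts fs2.id instead of COUNT(*), so the number being compared is literally "how many screenshots of the same content have a smaller id".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_screenshots (id INTEGER PRIMARY KEY, screenshot_content_id INTEGER)"
)
# Hypothetical data: content 1 has seven screenshots, content 2 has two.
conn.executemany(
    "INSERT INTO file_screenshots VALUES (?, ?)",
    [(i, 1) for i in range(1, 8)] + [(8, 2), (9, 2)],
)

# For each screenshot, count the screenshots of the same content with a
# smaller id. Keeping rows where that count is below 5 keeps the five
# lowest ids per content.
limited = conn.execute("""
    SELECT fs.screenshot_content_id, fs.id
    FROM file_screenshots AS fs
    LEFT JOIN file_screenshots AS fs2
      ON fs2.screenshot_content_id = fs.screenshot_content_id
     AND fs2.id < fs.id
    GROUP BY fs.screenshot_content_id, fs.id
    HAVING COUNT(fs2.id) < 5
    ORDER BY fs.screenshot_content_id, fs.id
""").fetchall()

print(limited)
```

The self-join compares every screenshot with every other screenshot of the same content, so the cost grows quickly when a single piece of content has many screenshots.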

How to retrieve count of items in top level categories and including items in immediate sub-categories using mysql query

I tried ..
SELECT c.* , (
SELECT COUNT( * )
FROM item t
WHERE t.cat_id = c.cat_id
)ct_items, (
SELECT COUNT( * )
FROM item t
INNER JOIN cat c3 ON t.cat_id = c3.cat_id
AND c3.cat_id = c.parent_id
) ct_sub
FROM cat c
WHERE parent_id = '0'
ORDER BY name
but got Unknown column 'c.parent_id' in 'on clause'. Any ideas why I am getting this, or another way to achieve this using a MySQL query? I could work out the numbers using multiple queries and PHP, though.
Thanks
You don't necessarily have to do everything in one query; sometimes trying to glue queries together ends up with worse performance (especially when correlated subqueries are involved). Two queries is OK; it's when you end up calling a new query for each row you've got problems.
So you could get the category items:
SELECT c0.*, COUNT(i0.id) AS cat_nitems
FROM cat AS c0
LEFT JOIN item AS i0 ON i0.cat_id=c0.cat_id
WHERE c0.parent_id= '0'
GROUP BY c0.cat_id
ORDER BY c0.name
and then separately get the subcategory items using a parent-child self-join:
SELECT c0.*, COUNT(i1.id) AS subcats_nitems
FROM cat AS c0
LEFT JOIN cat AS c1 ON c1.parent_id=c0.cat_id
LEFT JOIN item AS i1 ON i1.cat_id=c1.cat_id
WHERE c0.parent_id= '0'
GROUP BY c0.cat_id
ORDER BY c0.name
And yes, you can join them both into a single query:
SELECT c0.*, COUNT(DISTINCT i0.id) AS cat_nitems, COUNT(DISTINCT i1.id) AS subcats_nitems
FROM cat AS c0
LEFT JOIN cat AS c1 ON c1.parent_id=c0.cat_id
LEFT JOIN item AS i0 ON i0.cat_id=c0.cat_id
LEFT JOIN item AS i1 ON i1.cat_id=c1.cat_id
WHERE c0.parent_id= '0'
GROUP BY c0.cat_id
ORDER BY c0.name
I would suspect the larger join and DISTINCT processing might make it less efficient. But then I guess on a small database you ain't gonna notice.
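To see why the combined query needs COUNT(DISTINCT ...), here is a small sketch in SQLite through Python. The rows are invented, and parent_id is stored as the integer 0 rather than the string '0' used above. Joining items at both levels multiplies the rows per top-level category, so a plain COUNT over-counts while COUNT(DISTINCT ...) does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cat (cat_id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    CREATE TABLE item (id INTEGER PRIMARY KEY, cat_id INTEGER);
    -- Hypothetical data: 'Books' has two sub-categories, 'Music' has none.
    INSERT INTO cat VALUES (1, 0, 'Books'), (2, 1, 'Fiction'), (3, 1, 'Science'), (4, 0, 'Music');
    INSERT INTO item VALUES (1, 1), (2, 1), (3, 2), (4, 3), (5, 3), (6, 3);
""")

query = """
    SELECT c0.name,
           COUNT({d} i0.id) AS cat_nitems,
           COUNT({d} i1.id) AS subcats_nitems
    FROM cat AS c0
    LEFT JOIN cat AS c1 ON c1.parent_id = c0.cat_id
    LEFT JOIN item AS i0 ON i0.cat_id = c0.cat_id
    LEFT JOIN item AS i1 ON i1.cat_id = c1.cat_id
    WHERE c0.parent_id = 0
    GROUP BY c0.cat_id, c0.name
    ORDER BY c0.name
"""

# Each direct item is repeated once per sub-category item (and vice versa),
# so DISTINCT is what keeps the two counts honest.
with_distinct = conn.execute(query.format(d="DISTINCT")).fetchall()
without_distinct = conn.execute(query.format(d="")).fetchall()

print(with_distinct)
print(without_distinct)
```

The inflated figures in the second result come from the row multiplication of the two item joins, which is also the extra work the answer suspects makes the single query less efficient.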
Either way, this will only work for two-deep nesting. If you need sub-sub-categories or arbitrary-depth trees in general, you should consider a schema that's better at modelling trees, such as nested sets.