I need to design a query that finds "duplicates" in MS Access table. They are not true duplicates, in that every field is not identical, but it is highly unlikely that a patient would be seen twice within 60 days, so 2 records in that timespan are likely duplicates.
The relevant columns in the table are:
id integer autoincrement
patientid text
proceduredate date/time
I want to produce a list of patientid where the proceduredate is within 60 days of each other. I was able to find a list of all "duplicates" with the following query:
SELECT * FROM tblProcedures
WHERE patientid = ANY
(SELECT tblProcedures.patientid
FROM tblProcedures
GROUP BY tblProcedures.patientid
HAVING COUNT(tblProcedures.patientid) > 1)
ORDER BY tblProcedures.patientid, tblProcedures.proceduredate DESC
But I'm not sure how to limit the results to records involving the same patientid where the proceduredate is within 60 days of a previous procedure.
One solution is to join the table to itself:
Select *
From tblProcedures As P1
Inner Join tblProcedures As P2
On P2.PatientId = P1.PatientId
And P2.Id <> P1.Id
Where Abs(DateDiff("d", P1.ProcedureDate, P2.ProcedureDate)) <= 60
Order By P1.PatientId, P1.ProcedureDate Desc
I think you want something like:
SELECT *
FROM tblProcedures P1
LEFT JOIN tblProcedures P2
ON P1.patientid = P2.patientid
AND P1.proceduredate < P2.proceduredate
AND DateDiff("d", P1.proceduredate, P2.Proceduredate) <= 60
WHERE P2.patientid IS NOT NULL
ORDER BY P1.patientid, P1.proceduredate DESC
Related
I'm trying to run count query on a 2 table join. e_amazing_client table is having million entries/rows and m_user has just 50 rows BUT count query is taking forever!
SELECT COUNT(`e`.`id`) AS `count`
FROM `e_amazing_client` AS `e`
LEFT JOIN `user` AS `u` ON `e`.`cx_hc_user_id` = `u`.`id`
WHERE ((`e`.`date_created` >= '2018-11-11') AND (`e`.`date_created` >= '2018-11-18')) AND (`e`.`id` >= 1)
I don't know what is wrong with this query?
First, I'm guessing that this is sufficient:
SELECT COUNT(*) AS `count`
FROM e_amazing_client e
WHERE e.date_created >= '2018-11-11' AND e.id >= 1;
If user has only 50 rows, I doubt it is creating duplicates. The comparisons on date_created are redundant.
For this query, try creating an index on e_amazing_client(date_created, id).
Maybe you wanted this:
SELECT COUNT(`e`.`id`) AS `count`
FROM `e_amazing_client` AS `e`
LEFT JOIN `user` AS `u` ON `e`.`cx_hc_user_id` = `u`.`id`
WHERE ((`e`.`date_created` >= '2018-11-11') AND (`e`.`date_created` <= '2018-11-18')) AND (`e`.`id` >= 1)
to check between dates?
Also, do you really need
AND (`e`.`id` >= 1)
If id is what an id is usually in a table, is there a case to be <1?
Your query is pulling ALL records on/after 2018-11-11 because your WHERE clause is ID >= 1 You have no clause in there for a specific user. You also had in your original query based on a date of >= 2018-11-18. You MAY have meant you only wanted the count WITHIN the week 11/11 to 11/18 where the sign SHOULD have been >= 11-11 and <= 11-18.
As for the count, you are getting ALL people (assuming no entry has an ID less than 1) and thus a count within that date range. If you want it per user as you indicated you need to group by the cx_hc_user_id (user) column to see who has the most, or make the user part of the WHERE clause to get one person.
SELECT
e.cx_hc_user_id,
count(*) countPerUser
from
e_amazing_client e
WHERE
e.date_created >= '2018-11-11'
AND e.date_created <= '2018-11-18'
group by
e.cx_hc_user_id
You can order by the count descending to get the user with the highest count, but still not positive what you are asking.
I am trying to return the price of the most recent record grouped by ItemNum and FeeSched, Customer can be eliminated. I am having trouble understanding how I can do that reasonably.
The issue is that I am joining about 5 tables containing hundreds of thousands of rows to end up with this result set. The initial query takes about a minute to run, and there has been some trouble with timeout errors in the past. Since this will run on a client's workstation, it may run even slower, and I have no access to modify server settings to increase memory / timeouts.
Here is my data:
Customer Price ItemNum FeeSched Date
5 70.75 01202 12 12-06-2017
5 70.80 01202 12 06-07-2016
5 70.80 01202 12 07-21-2017
5 70.80 01202 12 10-26-2016
5 82.63 02144 61 12-06-2017
5 84.46 02144 61 06-07-2016
5 84.46 02144 61 07-21-2017
5 84.46 02144 61 10-26-2016
I don't have access to create temporary tables, or views and there is no such thing as a #variable in C-tree, but in most ways it acts like MySql. I wanted to use something like GROUP BY ItemNum, FeeSched and select MAX(Date). The issue is that unless I put Price into the GROUP BY I get an error.
I could run the query again only selecting ItemNum, FeeSched, Date and then doing an INNER JOIN, but with the query taking a minute to run each time, it seems there is a better way that maybe I don't know.
Here is my query I am running, it isn't really that complicated of a query other than the amount of data it is processing. Final results are about 50,000 rows. I can't share much about the database structure as it is covered under an NDA.
SELECT DISTINCT
CustomerNum,
paid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.primfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
UNION ALL
SELECT DISTINCT
CustomerNum,
secpaid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.secfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
I feel it quite simple when I'd read the first three paragraphs, but I get a little confused when I've read the whole question.
Whatever you have done to get the data posted above, once you've got the data like that it's easy to retrive "the most recent record grouped by ItemNum and FeeSched".
How to:
Firstly, sort the whole result set by Date DESC.
Secondly, select fields you need from the sorted result set and group by ItemNum, FeeSched without any aggregation methods.
So, the query might be something like this:
SELECT t.Price, t.ItemNum, t.FeeSched, t.Date
FROM (SELECT * FROM table ORDER BY Date DESC) AS t
GROUP BY t.ItemNum, t.FeeSched;
How it works:
When your data is grouped and you select rows without aggregation methods, it will only return you the first row of each group. As you have sorted all rows before grouping, so the first row would exactly be "the most recent record".
Contact me if you got any problems or errors with this approach.
You can also try like this:
Select Price, ItemNum, FeeSched, Date from table where Date IN (Select MAX(Date) from table group by ItemNum, FeeSched,Customer);
Internal sql query return maximum date group by ItemNum and FeeSched and IN statement fetch only the records with maximum date.
I have a table where I am storing the stored number of barrels inside of many tanks. I am storing values here every night at midnight, and at the beggining and end of any operator initiated transfer.
What I want to return is the number of barrels difference since the previous event record for that specific tank. I have the correct ID for the self join to get the previous record number, however the barrels is incorrect.
Here is what I currently have.
SELECT
inventory.id,
MAX(inventory2.id) AS id2,
inventory.tankname,
inventory.barrels,
inventory.eventstamp,
inventory2.barrels
FROM
inventory
LEFT JOIN
inventory inventory2 ON inventory2.tankname = inventory.tankname AND inventory2.eventstamp < inventory.eventstamp
GROUP BY
inventory.id,
inventory.tankname,
inventory.barrels,
inventory.eventstamp
ORDER BY
inventory.tankname,
inventory.eventstamp
That returns the following
Just use correlated subqueries:
SELECT i.*,
(SELECT i2.id
FROM inventory i2
WHERE i2.tankname = i.tankname AND
i2.eventstamp < i.eventstamp
ORDER BY i2.eventstamp DESC
LIMIT 1
) as prev_id,
(SELECT i2.barrels
FROM inventory i2
WHERE i2.tankname = i.tankname AND
i2.eventstamp < i.eventstamp
ORDER BY i2.eventstamp DESC
LIMIT 1
) as prev_barrels
FROM inventory i
ORDER BY i.tankname, i.eventstamp;
Your query doesn't work because you have columns in the SELECT that are not in the GROUP BY and are not aggregated. That shouldn't be allowed in any database; it is unfortunate that MySQL does allow it.
There are 3 tables, persontbl1, persontbl2 (each 7500 rows) and schedule (~3000 active schedules i.e. schedule.status = 0). Person tables contain data for the same persons as one to one relationship and INNER join between two takes less than a second. And schedule table contains data about persons to be interviewed and not all persons have schedules in schedule table. With Left join query instantly takes around 45 seconds, which is causing all sorts of issues.
SELECT persontbl1._CREATION_DATE, persontbl2._TOP_LEVEL_AURI,
persontbl2.RESP_CNIC, persontbl2.RESP_CNIC_NAME,
persontbl1.MOB_NUMBER1, persontbl1.MOB_NUMBER2,
schedule.id, schedule.call_datetime, schedule.enum_id,
schedule.enum_change, schedule.status
FROM persontbl1
INNER JOIN persontbl2 ON (persontbl2._TOP_LEVEL_AURI = persontbl1._URI)
AND (AGR_CONTACT=1)
LEFT JOIN SCHEDULE ON (schedule.survey_id = persontbl1._URI)
AND (SCHEDULE.status=0)
AND (DATE(SCHEDULE.call_datetime) <= CURDATE())
ORDER BY schedule.call_datetime IS NULL DESC, persontbl1._CREATION_DATE ASC
Here is the explain for query:
Schedule Table structure:
Schedule Table indexes:
Please let me know if any further information is required.
Thanks.
Edit: Added fully qualified table names and their columns.
You should just replace this line:
AND (DATE(SCHEDULE.call_datetime) <= CURDATE())
to this one:
AND SCHEDULE.call_datetime <= '2015-04-18 00:00:00'
so mysql will not call 2 functions per every record but will use static constant '2015-04-18 00:00:00'.
So you can just try for performance improvements if your query is:
SELECT persontbl1._CREATION_DATE, persontbl2._TOP_LEVEL_AURI,
persontbl2.RESP_CNIC, persontbl2.RESP_CNIC_NAME,
persontbl1.MOB_NUMBER1, persontbl1.MOB_NUMBER2,
schedule.id, schedule.call_datetime, schedule.enum_id,
schedule.enum_change, schedule.status
FROM persontbl1
INNER JOIN persontbl2 ON (persontbl2._TOP_LEVEL_AURI = persontbl1._URI)
AND (AGR_CONTACT=1)
LEFT JOIN SCHEDULE ON (schedule.survey_id = persontbl1._URI)
AND (SCHEDULE.status=0)
AND (SCHEDULE.call_datetime <= '2015-02-01 00:00:00')
ORDER BY schedule.call_datetime IS NULL DESC, persontbl1._CREATION_DATE ASC
EDIT 1 So you said without LEFT JOIN part it was fast enough, so you can try then:
SELECT persontbl1._CREATION_DATE, persontbl2._TOP_LEVEL_AURI,
persontbl2.RESP_CNIC, persontbl2.RESP_CNIC_NAME,
persontbl1.MOB_NUMBER1, persontbl1.MOB_NUMBER2,
s.id, s.call_datetime, s.enum_id,
s.enum_change, s.status
FROM persontbl1
INNER JOIN persontbl2 ON (persontbl2._TOP_LEVEL_AURI = persontbl1._URI)
AND (AGR_CONTACT=1)
LEFT JOIN
(SELECT *
FROM SCHEDULE
WHERE status=0
AND call_datetime <= '2015-02-01 00:00:00'
) s
ON s.survey_id = persontbl1._URI
ORDER BY s.call_datetime IS NULL DESC, persontbl1._CREATION_DATE ASC
I'm guessing that AGR_CONTACT comes from p1. This is the query you want to optimize:
SELECT p1._CREATION_DATE, _TOP_LEVEL_AURI, RESP_CNIC, RESP_CNIC_NAME,
MOB_NUMBER1, MOB_NUMBER2,
s.id, s.call_datetime, s.enum_id, s.enum_change, s.status
FROM persontbl1 p1 INNER JOIN
persontbl2 p2
ON (p2._TOP_LEVEL_AURI = p1._URI) AND (p1.AGR_CONTACT = 1) LEFT JOIN
SCHEDULE s
ON (s.survey_id = p1._URI) AND
(s.status = 0) AND
(DATE(s.call_datetime) <= CURDATE())
ORDER BY s.call_datetime IS NULL DESC, p1._CREATION_DATE ASC;
The best indexes for this query are: persontbl2(agr_contact), persontbl1(_TOP_LEVEL_AURI, _uri), and schedule(survey_id, status, call_datime).
The use of date() around the date time is not recommended. In general, that precludes the use of indexes. However, in this case, you have a left join, so it doesn't make a difference. That column is not being used for filtering anyway. The index on schedule is only for covering the on clause.
I'm referencing two tables in the simple machines forum web forum software: smf_members, and smf_members. each row in smf_members has a field: id_group, of which I am interested in values: 1,9,20,26,23,27 (using the IN() clause). The general goal is to determine which rows in the smf_members table that are in the above group id's haven't had a row entered in smf_messages in 90 days. poster_time in smf_messages is a unix timestamp. What I have so far is:
SELECT
m.id_member,
m.id_group,
from_unixtime(max(ms.poster_time))
FROM
smf_members m
LEFT JOIN smf_messages ms
USING(id_member)
WHERE
max(ms.poster_time) < (NOW() < (86400 * 90)
GROUP BY
m.id_member
It fails with and ERROR 1111: Invalid use of the group function, since I am using max() in the where clause. How can I aggregate my join results to only reference the latest entry based off the poster_time field?
There are several ways to this.
SELECT * FROM members
WHERE id_member IN ( ... )
AND NOT EXISTS
( SELECT 1 FROM messages
WHERE is_member = members.id_member
AND poster_time > (UNIX_TIMESTAMP*() - 90 * 24 * 60 * 60)
look for the cases where an outer join finds nothing
SELECT DISTINCT id_member
FROM MEMBERS AS m ...
LEFT JOIN messages AS msg USING ....
WHERE msg.id IS NULL
SELECT * FROM members
WHERE id_member IN ( .... )
AND id_member NOT IN
(SELECT id_member FROM messages
WHERE poster_time > (UNIX_TIMESTAMP*() - 90 * 24 * 60 * 60)
)
You may want to try using a temporary table to figure out the max(ms.poster_time) before attempting the GROUP BY. Something like this ought to work.
SELECT * FROM
(SELECT
m.id_member,
m.id_group,
from_unixtime(max(ms.poster_time))
FROM
smf_members m
LEFT JOIN smf_messages ms
USING(id_member)
WHERE
max(ms.poster_time) < (NOW() < (86400 * 90) ) AS tmp
GROUP BY
m.id_member
Doing it this way allows you to use the aggregate function with the GROUP BY. It won't perform very well if you have large datasets. If that's the case, you may want to consider regularly summarizing the data generated by the inner table, and then querying on that.