Excluding 'near' duplicates from a mysql query - mysql

We have an iPhone app through which each of our employees sends invoice data several times per day. When they are in areas with low cell signal, tickets can come in as duplicates; however, each one is assigned a unique 'job id' in the MySQL database, so they're treated as unique. I could exclude the job id and make the rest of the columns DISTINCT, which gives me the filtered rows I'm looking for (since literally every data point is identical except for the job id). However, I need the job id, since it's the primary reference point for each invoice and is what I point to for approvals, edits, etc.
So my question is, how can I filter out 'near' duplicate rows in my query, while still pulling in the job id for each ticket?
The current query is below:
SELECT * FROM jobs, users
WHERE jobs.job_csuper = users.user_id
AND users.user_email = '".$login."'
AND jobs.job_approverid1 = '0'
Thanks for looking into it!
Edit (examples provided):
This is what I meant by 'near duplicate'
Job_ID - Job_title - Job_user - Job_time - Job_date
2345 - Worked on circuits - John Smith - 1.50 - 2013-01-01
2344 - Worked on circuits - John Smith - 1.50 - 2013-01-01
2343 - Worked on circuits - John Smith - 1.50 - 2013-01-01
So everything is identical except for the Job_ID column.

You want a group by:
SELECT *
FROM jobs, users
WHERE jobs.job_csuper = users.user_id
AND users.user_email = '".$login."'
AND jobs.job_approverid1 = '0'
group by <all fields from jobs except jobid>
I think the final query should look something like this:
select min(Job_ID) as JobId, Job_title, users.name as Job_user, Job_time, Job_date
FROM jobs join users
on jobs.job_csuper = users.user_id
WHERE users.user_email = '".$login."' AND jobs.job_approverid1 = '0'
group by Job_title, users.name, Job_time, Job_date
(This uses ANSI syntax for joins and is explicit about the fields coming back.)

It's better to prevent the double submission.
Given that you cannot prevent the double submission...
I would query like this:
select
min(Job_ID) as real_job_id
,count(Job_ID) as num_dup_job_ids
,group_concat(Job_ID) as all_dup_job_ids
,j.Job_title, j.Job_user, j.Job_time, j.Job_date
from
jobs j
inner join users u on u.user_id = j.job_csuper
where
whatever_else
group by
j.Job_title, j.Job_user, j.Job_time, j.Job_date
That includes more than you explicitly asked for. But it's probably good to be reminded of how many dups you have, and it gives you easy access to the duplicate id info when you need it.
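As a rough, runnable illustration of the MIN/COUNT grouping idea, here is a small Python sketch that uses an in-memory SQLite database as a stand-in for MySQL. The table layout follows the example rows in the question; the fourth row ("Replaced breaker") is made up purely so there is something that is not a duplicate.

```python
import sqlite3

# SQLite stands in for MySQL here; the column names follow the question's example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (Job_ID INTEGER PRIMARY KEY, Job_title TEXT, "
    "Job_user TEXT, Job_time REAL, Job_date TEXT)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?, ?)",
    [
        (2343, "Worked on circuits", "John Smith", 1.5, "2013-01-01"),
        (2344, "Worked on circuits", "John Smith", 1.5, "2013-01-01"),
        (2345, "Worked on circuits", "John Smith", 1.5, "2013-01-01"),
        # Hypothetical non-duplicate row, added only for illustration.
        (2346, "Replaced breaker", "John Smith", 0.75, "2013-01-01"),
    ],
)

# Collapse near-duplicates: keep the lowest Job_ID per group and count the copies.
dedup_rows = conn.execute(
    """
    SELECT MIN(Job_ID)   AS real_job_id,
           COUNT(Job_ID) AS num_dup_job_ids,
           Job_title, Job_user, Job_time, Job_date
    FROM jobs
    GROUP BY Job_title, Job_user, Job_time, Job_date
    ORDER BY real_job_id
    """
).fetchall()

for row in dedup_rows:
    print(row)
```

The three identical tickets collapse into one row keyed by the lowest Job_ID, so approvals and edits can still point at a real id.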

How about creating a hash for each row and comparing them:
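`SHA1(CONCAT_WS('|', field1, field2, field3, ...)) AS jobhash`
(The first argument to CONCAT_WS is the separator, so it has to be given explicitly; otherwise field1 would be used as the separator.)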
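To show why a separator matters when hashing a row, here is a minimal Python sketch of the same idea done on the application side with hashlib. The helper name job_hash and the '|' separator are my own choices for illustration, not anything from the app in the question.

```python
import hashlib


def job_hash(*fields, sep="|"):
    """Return a SHA-1 hex digest over the fields joined with an explicit separator."""
    joined = sep.join(str(f) for f in fields)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


# Two identical tickets hash the same; a ticket with a different date does not.
a = job_hash("Worked on circuits", "John Smith", "1.50", "2013-01-01")
b = job_hash("Worked on circuits", "John Smith", "1.50", "2013-01-01")
c = job_hash("Worked on circuits", "John Smith", "1.50", "2013-01-02")

# Without a separator, ("ab", "c") and ("a", "bc") join to the same string "abc".
no_sep_collision = job_hash("ab", "c", sep="") == job_hash("a", "bc", sep="")
with_sep_distinct = job_hash("ab", "c") != job_hash("a", "bc")

print(a == b, a == c, no_sep_collision, with_sep_distinct)
```

Rows with equal hashes can then be treated as near-duplicates, provided the separator cannot occur inside the field values themselves.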

Related

Get rows which are related to the searched row, by specific column

I am trying to write a SQL query for the scenario below:
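user_id | nic_number | reg_number | full_name | code
-----------------------------------------------------
B123    | 12345      | 1212       | John      | 123
B124    | 12346      | 1213       | Peter     | 124
B125    | 12347      | 1214       | Darln     | 125
B123    | 12345      | 1212       | John      | 126
B123    | 12345      | 1212       | John      | 127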
In the subscribers table there can be rows with the same user_id, nic_number, reg_number and full_name, but with a different code.
First: get the user that has the code I typed in the query (I have implemented a query for that and it is working fine).
Second: within that data, I need to find the related rows (matched by nic_number and reg_number) and display only those related rows. The query below gets the data for code = 123, which shows the first row of the table.
But I need to display only the remaining rows that have the same nic_number or reg_number as the searched code, each only once.
That means the last 2 rows of the table.
select code,
GROUP_CONCAT(distinct trim(nic_number)) as nic_number,
GROUP_CONCAT(distinct trim(reg_number)) as reg_number,
GROUP_CONCAT(distinct trim(full_name)) as full_name from subscribers
where code like lower(concat('123')) group by code;
I need to implement a SQL query for this scenario by changing the above query (only one query, without joins or triggers).
I have been trying this for a long time and have been unable to get the result. Any help in getting there would be much appreciated.
You can combine the nic and reg numbers into a single key to get your records.
EDITED to extract only the related rows and not the one searched by code.
By the way, code does not seem to be unique in the subscribers table.
select
trim(code) as code,
trim(nic_number) as nic_number,
trim(reg_number) as reg_number,
trim(full_name) as full_name
from
subscribers s1
where
code <> lower(trim('123'))
and concat(trim(nic_number), '|', trim(reg_number)) IN (
select concat(trim(nic_number), '|', trim(reg_number))
from subscribers
where code = lower(trim('123'))
)
(In MySQL, + is numeric addition, so the combined key has to be built with CONCAT.)
I'm not sure why you have specified "without joins" - I get that you may not want to have triggers on a table (which you don't need to achieve this anyway), but a JOIN is standard SQL syntax that will help you achieve the result you are after.
Try:
SELECT
s1.code, s1.nic_number, s1.reg_number, s1.full_name
FROM subscribers s1
INNER JOIN
(
SELECT nic_number, reg_number
FROM subscribers
WHERE code = '123'
) s2
ON s1.nic_number = s2.nic_number
AND s1.reg_number = s2.reg_number
WHERE s1.code <> '123';
Or, if you really need to achieve it with no JOINs at all, then you're just doubling-up the sub-query that you need to include:
SELECT
s1.code, s1.nic_number, s1.reg_number, s1.full_name
FROM subscribers s1
WHERE s1.nic_number IN
(
SELECT nic_number FROM subscribers
WHERE code = '123'
)
AND s1.reg_number IN
(
SELECT reg_number FROM subscribers
WHERE code = '123'
)
AND s1.code <> '123';
The latter query is not necessarily ideal, but it still achieves the desired result.
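For anyone who wants to try the JOIN version against the sample data, here is a small Python sketch using an in-memory SQLite database as a stand-in for MySQL. The column types are assumptions (everything is stored as text); the rows are the ones from the question.

```python
import sqlite3

# SQLite stands in for MySQL; all columns are assumed to be text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscribers (user_id TEXT, nic_number TEXT, "
    "reg_number TEXT, full_name TEXT, code TEXT)"
)
conn.executemany(
    "INSERT INTO subscribers VALUES (?, ?, ?, ?, ?)",
    [
        ("B123", "12345", "1212", "John", "123"),
        ("B124", "12346", "1213", "Peter", "124"),
        ("B125", "12347", "1214", "Darln", "125"),
        ("B123", "12345", "1212", "John", "126"),
        ("B123", "12345", "1212", "John", "127"),
    ],
)

searched_code = "123"

# Rows sharing nic_number and reg_number with the searched code, excluding that code itself.
related = conn.execute(
    """
    SELECT s1.code, s1.nic_number, s1.reg_number, s1.full_name
    FROM subscribers s1
    INNER JOIN (
        SELECT nic_number, reg_number
        FROM subscribers
        WHERE code = ?
    ) s2
      ON s1.nic_number = s2.nic_number
     AND s1.reg_number = s2.reg_number
    WHERE s1.code <> ?
    ORDER BY s1.code
    """,
    (searched_code, searched_code),
).fetchall()

for row in related:
    print(row)
```

With the sample data this returns only the rows with codes 126 and 127, i.e. the last two rows of the table.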

Joining and selecting multiple tables and creating new column names

I have very limited experience with MySQL past standard queries, but when it comes to joins and relations between multiple tables I have a bit of an issue.
I've been tasked with creating a job that will pull a few values from a mysql database every 15 minutes but the info it needs to display is pulled from multiple tables.
I have worked with it for a while to figure out the relationships between everything for the phone system and I have discovered how I need to pull everything out but I'm trying to find the right way to create the job to do the joins.
I'm thinking of creating a new table for the info I need, with columns named as:
Extension | Total Talk Time | Total Calls | Outbound Calls | Inbound Calls | Missed Calls
I know that I need to start with the extension ID from my 'user' table and match it with 'extensionID' in my 'callSession'. There may be multiple instances of each extensionID but each instance creates a new 'UniqueCallID'.
The 'UniqueCallID' field then matches to 'UniqueCallID' in my 'CallSum' table. At that point, I just need to be able to say "For each 'uniqueCallID' that is associated with the same 'extensionID', get the sum of all instances in each column or a count of those instances".
Here is an example of what I need it to do:
callSession table
UniqueCallID | extensionID
--------------------------
A            | 123
B            | 123
C            | 123
callSum table
UniqueCallID | Duration | Answered
----------------------------------
A            | 10       | 1
B            | 5        | 1
C            | 15       | 0
newReport table
Extension | Total Talk Time | Total Calls | Missed Calls
--------------------------------------------------------
123       | 30              | 3           | 1
Hopefully that conveys my idea properly.
If I create a table to hold these values, I need to know how I would select, join and insert those things based on that diagram but I'm unable to construct the right query/statement.
You simply JOIN the two tables, and do a group by on the extensionID. Also, add formulas to summarize and gather the info.
SELECT
`extensionID` AS `Extension`,
SUM(`Duration`) AS `Total Talk Time`,
COUNT(DISTINCT a.`UniqueCallID`) as `Total Calls`,
SUM(IF(`Answered` = 1,0,1)) AS `Missed Calls`
FROM `callSession` a
JOIN `callSum` b
ON a.`UniqueCallID` = b.`UniqueCallID`
GROUP BY a.`extensionID`
ORDER BY a.`extensionID`
You can use a join and group by
select
a.extensionID
, sum(b.Duration) as Total_Talk_Time
, count(b.Answered) as Total_Calls
, count(b.Answered) -sum(b.Answered) as Missed_calls
from callSession as a
inner join callSum as b on a.UniqueCallID = b.UniqueCallID
group by a.extensionID
This should do the trick. What you are being asked to do is to aggregate the number of and duration of calls. Unless explicitly requested, you do not need to create a new table to do this. The right combination of JOINs and AGGREGATEs will get the information you need. This should be pretty straightforward... the only semi-interesting part is calculating the number of missed calls, which is accomplished here using a "CASE" statement as a conditional check on whether each call was answered or not.
Pardon my syntax... My experience is with SQL Server.
SELECT CS.extensionID AS `Extension`,
SUM(CA.Duration) AS `Total Talk Time`,
COUNT(CS.UniqueCallID) AS `Total Calls`,
SUM(CASE CA.Answered WHEN 0 THEN 1 ELSE 0 END) AS `Missed Calls`
FROM callSession CS
INNER JOIN callSum CA ON CA.UniqueCallID = CS.UniqueCallID
GROUP BY CS.extensionID
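To check the join-and-aggregate approach against the example tables from the question, here is a short Python sketch that runs it on an in-memory SQLite database (standing in for MySQL). The column types are assumptions; the data is exactly the sample shown in the question.

```python
import sqlite3

# SQLite stands in for MySQL; types are assumed from the sample values.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE callSession (UniqueCallID TEXT PRIMARY KEY, extensionID INTEGER);
    CREATE TABLE callSum (UniqueCallID TEXT PRIMARY KEY, Duration INTEGER, Answered INTEGER);
    INSERT INTO callSession VALUES ('A', 123), ('B', 123), ('C', 123);
    INSERT INTO callSum VALUES ('A', 10, 1), ('B', 5, 1), ('C', 15, 0);
    """
)

# One row per extension: total duration, number of calls, number of unanswered calls.
report = conn.execute(
    """
    SELECT a.extensionID                                    AS extension,
           SUM(b.Duration)                                  AS total_talk_time,
           COUNT(*)                                         AS total_calls,
           SUM(CASE WHEN b.Answered = 1 THEN 0 ELSE 1 END)  AS missed_calls
    FROM callSession AS a
    INNER JOIN callSum AS b ON a.UniqueCallID = b.UniqueCallID
    GROUP BY a.extensionID
    ORDER BY a.extensionID
    """
).fetchall()

print(report)
```

This gives 30 / 3 / 1 for extension 123, which matches the newReport row in the question. If a separate table really is required, the same SELECT could feed an INSERT ... SELECT on each 15-minute run.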

Joins are coming back with no rows selected

I'm having some trouble extracting data from several MySQL tables with a join statement.
My tables and attributes are:
appointment_end_time (table)
    appointment_end_time_id (int)(pk)(ai)
    appointment_end_date (datetime)
appointment_start_time (table)
    appointment_date_id (int)(pk)(ai)
    appointment_start_date (datetime)
instructor (table)
    instructor_id (int)(pk)(ai)
    firstname varchar(45)
    lastname varchar(45)
appointment_timetable (table)
    appointment_timetable_id int(11) AI PK
    instructor_id int(11) FK
    appointment_date_id int(11) FK
    appointment_end_time_id int(11) FK
SELECT a.appointment_timetable_id, i.instructor_id, ad.appointment_start_date, aet.appointment_end_date
FROM db12405956.appointment_timetable a
JOIN instructor i on i.instructor_id = a.instructor_id
JOIN appointment_start_time ad on ad.appointment_date_id = a.appointment_date_id
JOIN appointment_end_time aet on aet.appointment_end_time_id = a.appointment_end_time_id
ORDER BY a.appointment_timetable_id;
However, this query returns no rows when executed, so I'm wondering what I'm doing wrong. Any help will be much appreciated.
Sample rows:
(appointment_end_time)
appointment_end_time_id   appointment_end_date
1                         2016-12-26 14:00:00
2                         2016-12-24 13:00:00
3                         2016-12-26 13:00:00
(appointment_start_time)
appointment_date_id   appointment_start_date
1                     2016-12-26 15:00:00
2                     2016-12-24 16:00:00
3                     2016-12-26 15:30:00
(instructor)
instructor_id   firstname   lastname
1               Sasha       Thompson
2               Laura       Robinson
3               John        Walters
(appointment_timetable)
appointment_timetable_id   instructor_id   appointment_date_id   appointment_end_time_
1                          Blank           Blank                 Blank
2                          Blank           Blank                 Blank
3                          Blank           Blank                 Blank
What you need is to learn how to diagnose the problem yourself. It is a common problem that a query doesn't return the expected results and you should understand how to break things down to find the issue.
Let's start with your query:
SELECT a.appointment_timetable_id, i.instructor_id, ad.appointment_start_date, aet.appointment_end_date
FROM db12405956.appointment_timetable a
JOIN instructor i on i.instructor_id = a.instructor_id
JOIN appointment_start_time ad on ad.appointment_date_id = a.appointment_date_id
JOIN appointment_end_time aet on aet.appointment_end_time_id = a.appointment_end_time_id
ORDER BY a.appointment_timetable_id;
What you do to break it down is start with the first table and then add the joins (and where conditions, although you don't have any here) one at a time until the data problem appears. I find this easiest to do by using SELECT * with LIMIT 1 (or LIMIT 10, as I usually prefer to see more than one record) instead of the field list, because then you don't have to look for the fields that belong to joins you haven't added yet.
So start with
SELECT *
FROM db12405956.appointment_timetable a
LIMIT 10;
Then try
SELECT *
FROM db12405956.appointment_timetable a
JOIN instructor i on i.instructor_id = a.instructor_id
LIMIT 10;
Then
SELECT *
FROM db12405956.appointment_timetable a
JOIN instructor i on i.instructor_id = a.instructor_id
JOIN appointment_start_time ad on ad.appointment_date_id = a.appointment_date_id
LIMIT 10;
Finally
SELECT *
FROM db12405956.appointment_timetable a
JOIN instructor i on i.instructor_id = a.instructor_id
JOIN appointment_start_time ad on ad.appointment_date_id = a.appointment_date_id
JOIN appointment_end_time aet on aet.appointment_end_time_id = a.appointment_end_time_id
ORDER BY a.appointment_timetable_id
LIMIT 10;
At some point you will see where the records fell out, and that is the location of the problem. Then you might need to look at the fields you are joining on, and the data in them, to see why they are not returning any matches. For instance, if you are joining on dates, they may be stored as dates in one table and as varchar in another, and the date '01/01/2016' is not equal to 'Jan 1, 2016'. Sometimes a column has some sort of prefix or suffix that the other table lacks, something like PR2345 in one table and 2345 in the other.
Sometimes the query is correct and no rows genuinely meet the conditions. This could be because the data is not fully populated yet (think of writing a report for a system that is not live yet: there is no data on completed actions because none have completed), because the requirement was wrong in some of its assumptions, or because there should be no matching records. It could even be a bug in the data entry.
Depending on the nature of the problem, you might need to return all the records or only use LIMIT 1 (since all records are disappearing). Using SELECT * this way also helps when you are returning too many or duplicate records, as sometimes it is the fields not being returned that affect the result set. Note that I am not saying to use SELECT * in your final result set; it is only being used as a diagnostic tool here.
In your case, the problem looks as if it is in the first table. There are blanks for instructor ID and the other fields in your sample, so there is nothing to join on. (You only gave a sample so the rest of the table may not be like this.) If this is a case where the data is not there yet due to the feature that would add it not yet being live, then you can test your query only by adding test data to the table. Be sure to delete this data after you have finished unit testing. If the data should have been there, then you need to look at the insert from the application for a bug.
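The "add one join at a time" diagnosis can also be scripted. Below is a rough Python sketch against an in-memory SQLite database (a stand-in for MySQL) that counts rows after each successive join. It assumes the "Blank" values in the sample timetable are NULLs, which is a guess on my part; the other rows are taken from the question's sample data.

```python
import sqlite3

# SQLite stands in for MySQL. Assumption: "Blank" in the sample means NULL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE instructor (instructor_id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE appointment_start_time (appointment_date_id INTEGER PRIMARY KEY,
                                         appointment_start_date TEXT);
    CREATE TABLE appointment_end_time (appointment_end_time_id INTEGER PRIMARY KEY,
                                       appointment_end_date TEXT);
    CREATE TABLE appointment_timetable (appointment_timetable_id INTEGER PRIMARY KEY,
                                        instructor_id INTEGER,
                                        appointment_date_id INTEGER,
                                        appointment_end_time_id INTEGER);
    INSERT INTO instructor VALUES (1, 'Sasha', 'Thompson'), (2, 'Laura', 'Robinson'),
                                  (3, 'John', 'Walters');
    INSERT INTO appointment_start_time VALUES (1, '2016-12-26 15:00:00'),
                                              (2, '2016-12-24 16:00:00'),
                                              (3, '2016-12-26 15:30:00');
    INSERT INTO appointment_end_time VALUES (1, '2016-12-26 14:00:00'),
                                            (2, '2016-12-24 13:00:00'),
                                            (3, '2016-12-26 13:00:00');
    INSERT INTO appointment_timetable VALUES (1, NULL, NULL, NULL),
                                             (2, NULL, NULL, NULL),
                                             (3, NULL, NULL, NULL);
    """
)

base = "FROM appointment_timetable a"
joins = [
    "JOIN instructor i ON i.instructor_id = a.instructor_id",
    "JOIN appointment_start_time ad ON ad.appointment_date_id = a.appointment_date_id",
    "JOIN appointment_end_time aet ON aet.appointment_end_time_id = a.appointment_end_time_id",
]

# Count rows with 0, 1, 2 and 3 joins applied to see where the rows disappear.
counts = []
for n in range(len(joins) + 1):
    sql = "SELECT COUNT(*) " + base + " " + " ".join(joins[:n])
    counts.append(conn.execute(sql).fetchone()[0])

print(counts)
```

Under that NULL assumption the count drops from 3 to 0 at the very first join, which points at the empty foreign key columns in appointment_timetable rather than at the join syntax.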

Complex MySQL COUNT query

Evening folks,
I have a complex MySQL COUNT query I am trying to perform and am looking for the best way to do it.
In our system, we have References. Each Reference can have many (or no) Income Sources, each of which can be validated or not (status). We have a Reference table and an Income table - each row in the Income table points back to Reference with reference_id
On our 'Awaiting' page (the screen that shows each Income that is yet to be validated), we show it grouped by Reference. So you may, for example, see Mr John Smith has 3 Income Sources.
We want it to show something like "2 of 3 Validated" beside each row
My problem is writing the query that figures this out!
What I have been trying to do is this, using a combination of PHP and MySQL to bridge the gap where SQL (or my knowledge) falls short:
First, select a COUNT of the number of incomes associated with each reference:
SELECT `reference_id`, COUNT(status) AS status_count
FROM (`income`)
WHERE `income`.`status` = 0
GROUP BY `reference_id`
Next, having used PHP to generate a WHERE IN clause, proceed to COUNT the number of confirmed references from these:
SELECT `reference_id`, COUNT(status) AS status_count
FROM (`income`)
WHERE `reference_id` IN ('8469', '78969', '126613', ..... etc
AND status = 1
GROUP BY `reference_id`
However this doesn't work. It returns 0 rows.
Any way to achieve what I'm after?
Thanks!
In MySQL, you can SUM() on a boolean expression to get a count of the rows where that expression is true. You can do this because MySQL treats true as the integer 1 and false as the integer 0.
SELECT `reference_id`,
SUM(`status` = 1) AS `validated_count`,
COUNT(*) AS `total_count`
FROM `income`
GROUP BY `reference_id`
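Here is a small Python sketch of the SUM-on-a-boolean trick, run against an in-memory SQLite database as a stand-in for MySQL (SQLite also evaluates a comparison to 1 or 0). The income rows are invented for illustration; only the reference ids are borrowed from the question.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE income (reference_id INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO income VALUES (?, ?)",
    [(8469, 1), (8469, 1), (8469, 0), (78969, 0)],
)

# status = 1 evaluates to 1 or 0, so SUM() counts the validated rows.
rows = conn.execute(
    """
    SELECT reference_id,
           SUM(status = 1) AS validated_count,
           COUNT(*)        AS total_count
    FROM income
    GROUP BY reference_id
    ORDER BY reference_id
    """
).fetchall()

labels = {ref: f"{validated} of {total} Validated" for ref, validated, total in rows}
print(labels)
```

Both counts come back in one pass, so there is no need to build a WHERE IN list in PHP and run a second query.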

Mysql: Adding product restricted shipping options to cart

I have a custom shop, and I need to redo the shipping. However, that is for some time later, and in the meantime I need to add a shipping option for when a cart contains only products from a certain range.
So there is a ship_method table:
id | menuname      | name      | zone | maxweight
1  | UK Standard   | ukfirst   | 1    | 2000
2  | UK Economy    | uksecond  | 1    | 750
3  | Worldwide Air | world_air | 4    | 2000
To this I have added another column prod_restrict which is 0 for the existing ones, and 1 for the restricted ones, and a new table called ship_prod_restrict which contains two columns, ship_method_id and item_id, listing what products are allowed in a shipping category.
So all I need to do is look in my transactions and, for each cart, check which shipping methods either have prod_restrict = 0, or have prod_restrict = 1 and no products in the cart that are missing from the restriction table.
Unfortunately, it seems that because you can't pass values from an outer query to an inner one, I can't find a neat way of doing it. (Edited to show the full query due to comments below.)
select ship_method.* from ship_method, ship_prod_restrict where
ship_method.`zone` = 1 and prod_restrict='0' or
(
prod_restrict='1'
and ship_method.id = ship_prod_restrict.ship_method_id
and (
select count(*) from (
select transactions.item from transactions
LEFT JOIN ship_prod_restrict
on ship_prod_restrict.item_id = transactions.item
and ship_prod_restrict.ship_method_id=XXXXX
where transactions.session='shoppingcartsessionid'
and item_id is null
) as non_permitted_items < 1 )
group by ship_method.id
gives you a list of whether the section matches or not, and works as an inner query but I can't get that ship_method_id in there (at XXXXX).
Is there a simple way of doing this, or am I going about it the wrong way? I can't currently change the primary shipping table, as this is already in place for now, but the other bits can change. I could also do it within PHP but you know, that seems like cheating!
Not sure how the count is important, but this might be a bit lighter - hard to tell without a full table schema dump:
SELECT COUNT(t.item) FROM transactions t
INNER JOIN ship_prod_restrict r
ON r.item_id = t.item
WHERE t.session = 'foo'
AND r.ship_method_id IN (**restricted, id's, here**)
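One possible alternative, which is not what the answer above proposes, is a correlated NOT EXISTS in the WHERE clause: as far as I know, the restriction on referencing the outer query applies to derived tables in FROM, not to subqueries in WHERE. The Python sketch below tries the idea on an in-memory SQLite database standing in for MySQL. The fourth shipping method ("UK Letter"), the item ids, and the session names are all hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL. Method 4, the item ids and the sessions are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ship_method (id INTEGER PRIMARY KEY, menuname TEXT, name TEXT,
                              zone INTEGER, maxweight INTEGER, prod_restrict INTEGER);
    CREATE TABLE ship_prod_restrict (ship_method_id INTEGER, item_id INTEGER);
    CREATE TABLE transactions (session TEXT, item INTEGER);
    INSERT INTO ship_method VALUES
        (1, 'UK Standard',   'ukfirst',   1, 2000, 0),
        (2, 'UK Economy',    'uksecond',  1,  750, 0),
        (3, 'Worldwide Air', 'world_air', 4, 2000, 0),
        (4, 'UK Letter',     'ukletter',  1,  100, 1);
    INSERT INTO ship_prod_restrict VALUES (4, 10), (4, 11);
    INSERT INTO transactions VALUES ('cart_ok', 10), ('cart_ok', 11),
                                    ('cart_mixed', 10), ('cart_mixed', 99);
    """
)

# A restricted method is allowed only if no cart item is missing from its permitted list.
SQL = """
    SELECT m.id
    FROM ship_method m
    WHERE m.zone = 1
      AND (
        m.prod_restrict = 0
        OR NOT EXISTS (
            SELECT 1
            FROM transactions t
            WHERE t.session = ?
              AND NOT EXISTS (
                  SELECT 1
                  FROM ship_prod_restrict r
                  WHERE r.ship_method_id = m.id
                    AND r.item_id = t.item
              )
        )
      )
    ORDER BY m.id
"""


def allowed_methods(session_id):
    """Return the ids of zone-1 shipping methods available to the given cart."""
    return [row[0] for row in conn.execute(SQL, (session_id,))]


print(allowed_methods("cart_ok"), allowed_methods("cart_mixed"))
```

The cart containing only permitted items gets the restricted method as well, while the mixed cart falls back to the unrestricted ones. Note that in this sketch an empty cart would also qualify for the restricted method, which may or may not be what you want.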