Select rows with duplicate values that may be in different columns - mysql

I have a table of businesses, and each business can have up to 3 phone numbers. I want to find any duplicate phone numbers, but since the phone numbers are in different columns I don't think I can make the classic GROUP BY query work.
Sample data:
ID | Business_Name | phone_main | phone_mobile | phone_tollfree
1 | John's Donuts | 555-551-5555 | 555-551-5556 | null
2 | John's Bakery | 555-551-5557 | 555-551-5555 | null
3 | SuperBake! | 555-300-1005 | null | 555-551-5555
4 | Grocery Fred | 555-223-5511 | 555-334-5555 | null
In this case I want to identify records 1, 2, and 3 as being the same. Simply identifying the phone number 555-551-5555 as a number with duplicates would be fine, as I can do a subquery or the calling program can use the phone number and send a new query getting all records with 555-551-5555 in any of the 3 phone columns.
This is on MariaDB if it matters.
Edit, (adding my current flailing attempt since someone seems to really want it):
Here's what I have right now:
SELECT ID, phone_main, phone_mobile, phone_tollfree
(
SELECT COUNT(*) FROM businesses b
WHERE (
phone IS NOT NULL AND (b.phone_mobile=phone OR b.tollfree=phone )
)
OR (
phone_mobile IS NOT NULL AND (b.phone=phone_mobile OR b.phone_tollfree=phone_mobile)
)
OR (
phone_tollfree IS NOT NULL AND (b.phone=phone_tollfree OR b.phone_mobile=phone_tollfree)
)
) cnt
from business HAVING cnt > 1
Problems with this:
It seems to be returning every row in my table.
It won't find duplicates within a single column.

How about uniting all the phone columns into one and then counting the recurrences?
I didn't run the code, but it might give you a direction:
SELECT phone, COUNT(*) AS cnt
FROM (
SELECT phone_main AS phone FROM SampleData
UNION ALL
SELECT phone_mobile AS phone FROM SampleData
UNION ALL
SELECT phone_tollfree AS phone FROM SampleData
) AS all_phones
WHERE phone IS NOT NULL
GROUP BY phone
HAVING COUNT(*) > 1
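The column-stacking idea above can be tried end to end with a small script. This is only a sketch: an in-memory SQLite database stands in for MariaDB, the table is named `businesses` as in the question's own attempt, and the data is the sample from the question. The second query is the follow-up step the question describes (fetch every row that uses a duplicated number).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MariaDB; names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE businesses ("
    "ID INTEGER PRIMARY KEY, Business_Name TEXT, "
    "phone_main TEXT, phone_mobile TEXT, phone_tollfree TEXT)"
)
conn.executemany(
    "INSERT INTO businesses VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John's Donuts", "555-551-5555", "555-551-5556", None),
        (2, "John's Bakery", "555-551-5557", "555-551-5555", None),
        (3, "SuperBake!", "555-300-1005", None, "555-551-5555"),
        (4, "Grocery Fred", "555-223-5511", "555-334-5555", None),
    ],
)

# Stack the three phone columns into one, then keep numbers seen more than once.
DUPLICATE_PHONES = """
SELECT phone, COUNT(*) AS cnt
FROM (
    SELECT phone_main AS phone FROM businesses
    UNION ALL
    SELECT phone_mobile FROM businesses
    UNION ALL
    SELECT phone_tollfree FROM businesses
) AS all_phones
WHERE phone IS NOT NULL
GROUP BY phone
HAVING COUNT(*) > 1
"""
duplicates = conn.execute(DUPLICATE_PHONES).fetchall()

# Follow-up step: every business that has the duplicated number in any column.
matching_ids = [
    row[0]
    for row in conn.execute(
        "SELECT ID FROM businesses "
        "WHERE ? IN (phone_main, phone_mobile, phone_tollfree) ORDER BY ID",
        (duplicates[0][0],),
    )
]
print(duplicates, matching_ids)
```

Note that a business listing the same number in two of its own columns would also be counted as a duplicate by this query.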

E.g.:
SELECT DISTINCT x.*
FROM my_table x
JOIN my_table y
ON y.id <> x.id
AND
( y.phone_main IN(x.phone_main,x.phone_mobile,x.phone_tollfree)
OR y.phone_mobile IN(x.phone_main,x.phone_mobile,x.phone_tollfree)
OR y.phone_tollfree IN(x.phone_main,x.phone_mobile,x.phone_tollfree)
);

You can do:
with
l as (
select *, least(phone_main, phone_mobile, phone_tollfree) as p
from t
)
select *
from l
where p in (select p from l group by p having count(*) > 1)

Related

How can I easily INSERT data in a table from multiple columns from another table?

I want to take all phone numbers from the companies table, and put that in a dedicated phone numbers table, is there an easy way to do this using (if possible) only one query?
example data from the companies table (tel3 and tel4 could have phone numbers):
id | tel | tel2 | tel3 | tel4
1 | 32772373636 | 32724522341 | |
2 | 32783675626 | | |
3 | 32968381949 | | |
expected example output in phonenrs table:
id | company_id | phonenr
1 | 1 | 32772373636
2 | 1 | 32724522341
3 | 2 | 32783675626
4 | 3 | 32968381949
You could use an insert-select statement from a query that union alls the phone numbers:
INSERT INTO phonenrs (company_id, phonenr)
SELECT id, tel FROM companies WHERE tel IS NOT NULL
UNION ALL
SELECT id, tel2 FROM companies WHERE tel2 IS NOT NULL
UNION ALL
SELECT id, tel3 FROM companies WHERE tel3 IS NOT NULL
UNION ALL
SELECT id, tel4 FROM companies WHERE tel4 IS NOT NULL
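As a rough check of the insert-select idea, here is a sketch that runs it against an in-memory SQLite database (standing in for MySQL). The `slot` column is a hypothetical helper, not part of the question; it just records which source column a number came from so the rows can be ordered tel, tel2, tel3, tel4 within each company. Which column holds the single number for companies 2 and 3 is assumed to be `tel`.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (
    id INTEGER PRIMARY KEY, tel TEXT, tel2 TEXT, tel3 TEXT, tel4 TEXT
);
CREATE TABLE phonenrs (
    id INTEGER PRIMARY KEY AUTOINCREMENT, company_id INTEGER, phonenr TEXT
);
INSERT INTO companies VALUES
    (1, '32772373636', '32724522341', NULL, NULL),
    (2, '32783675626', NULL, NULL, NULL),
    (3, '32968381949', NULL, NULL, NULL);
""")

# Unpivot the four columns with UNION ALL; "slot" (hypothetical) keeps the
# column order so new ids are assigned tel first, then tel2, tel3, tel4.
conn.execute("""
INSERT INTO phonenrs (company_id, phonenr)
SELECT company_id, phonenr
FROM (
    SELECT id AS company_id, 1 AS slot, tel AS phonenr FROM companies
    UNION ALL SELECT id, 2, tel2 FROM companies
    UNION ALL SELECT id, 3, tel3 FROM companies
    UNION ALL SELECT id, 4, tel4 FROM companies
) AS unpivoted
WHERE phonenr IS NOT NULL
ORDER BY company_id, slot
""")

rows = conn.execute(
    "SELECT id, company_id, phonenr FROM phonenrs ORDER BY id"
).fetchall()
print(rows)
```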
To get an exact match to your intended output, we first need to generate a row id for the new id column. Then, to preserve the sorting precedence tel > tel2 > tel3 > tel4, we can use a trick. Here is the code, written and tested in Workbench:
select @row_id:=@row_id+1 as id, id as company_id,trim(leading '0' from phone ) as phonenr
from
(select id,ifnull(concat('000',tel),1) as phone from companies
union all
select id,ifnull(concat('00',tel2),1) from companies
union all
select id,ifnull(concat('0',tel3),1) from companies
union all
select id,ifnull(tel4,1) from companies
) t1,
(select @row_id:=0) t2
where phone !=1
order by company_id,phone
;
-- result:
# id, company_id, phonenr
1, 1, 32772373636
2, 1, 32724522341
3, 2, 32783675626
4, 3, 32968381949
As you can see, by prepending a different number of leading zeros to each phone column, we can control the sorting precedence. Without this, I got 32724522341 instead of 32772373636 on the first line.

MySQL query delivers wrong result when using <> (not equal) in WHERE clause

I came across a strange problem with a MySQL query:
SELECT COUNT(id) FROM members
100
SELECT COUNT(id) FROM members WHERE lastname = 'Smith'
20
SELECT COUNT(id) FROM members WHERE lastname <> 'Smith'
0
The problem is, that the last query (Members with lastname != 'Smith') returns 0.
If there are 100 members in total and 20 members named 'Smith', the number of member with other last names should be 80, shouldn't it?
I tried different version using <>, !=, enclosing Smith with ' or ". The result when using LIKE and NOT LIKE instead is the same.
How is this possible? It seems that I am missing something quite obvious, but what...?
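Because the other rows have a NULL lastname. Any comparison with NULL, including <>, evaluates to NULL rather than true, so those rows are filtered out by the WHERE clause.
Try this: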
SELECT COUNT(id) FROM members WHERE IFNULL(lastname ,'--')<> 'Smith'
Example :
CREATE TABLE my_table
SELECT 'ersin' name FROM dual
union all
SELECT 'ersin' name FROM dual
union all
SELECT 'ersin' name FROM dual
union all
SELECT null name FROM dual
union all
SELECT null name FROM dual
union all
SELECT null name FROM dual;
select script:
select count(*) from my_table where IFNULL(name ,'--') <> 'ersin' ;
output:
count(*)
3
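The NULL behaviour described above is easy to reproduce. The sketch below uses an in-memory SQLite database as a stand-in for MySQL (both treat a comparison with NULL as unknown); the `members` rows are made up for illustration, and `count` is a hypothetical helper.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, lastname TEXT)")
conn.executemany(
    "INSERT INTO members (lastname) VALUES (?)",
    [("Smith",), ("Smith",), ("Jones",), (None,), (None,)],
)


def count(where):
    """Hypothetical helper: count members matching a WHERE condition."""
    sql = f"SELECT COUNT(id) FROM members WHERE {where}"
    return conn.execute(sql).fetchone()[0]


# NULL <> 'Smith' is NULL (not true), so the two NULL rows drop out.
plain = count("lastname <> 'Smith'")
# Either test for NULL explicitly, or map NULL to a placeholder first.
with_null_check = count("lastname <> 'Smith' OR lastname IS NULL")
with_ifnull = count("IFNULL(lastname, '--') <> 'Smith'")
print(plain, with_null_check, with_ifnull)
```

The explicit `OR lastname IS NULL` form avoids picking a placeholder value that could collide with real data.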

Need list of data using DISTINCT, COUNT, MAX

The table structure is as below,
My first SQL query is as below,
SELECT DISTINCT(IndustryVertical)
, COUNT(IndustryVertical) AS IndustryVerticalCount
, City
FROM `records`
WHERE City!=''
GROUP
BY IndustryVertical
, City
ORDER
BY `IndustryVerticalCount` DESC
by running the above query I'm getting the below,
What I'm trying to achieve is a list of all the distinct cities, each with only the one IndustryVertical that has the maximum IndustryVerticalCount.
I have tried several things without success.
Could anyone please guide me?
There are several records for each City value. What I'm trying to get is:
All the distinct City values
The MAX COUNT of IndustryVertical
The name of that IndustryVertical
The record I'm getting is as below,
What I'm trying to get,
The above record is for reference purposes. Here you can see only distinct city values, each with the one vertical name that has the maximum count.
Since you are using GROUP BY, the query already returns only distinct rows. Because you group on two columns, each row in the result is a distinct combination of IndustryVertical and City.
What you now have to do is use this resulting table, and perform a query on it to find the maximum count grouped by city.
SELECT IndustryVertical, IndustryVerticalCount, City
FROM (
SELECT IndustryVertical
, COUNT(IndustryVertical) AS IndustryVerticalCount
, City
FROM `records`
WHERE City!=''
GROUP
BY IndustryVertical
, City
) AS tbl
WHERE IndustryVerticalCount IN (
SELECT MAX(IndustryVerticalCount)
FROM (
SELECT IndustryVertical
, COUNT(IndustryVertical) AS IndustryVerticalCount
, City
FROM `records`
WHERE City!=''
GROUP
BY IndustryVertical
, City
) AS tbl2
WHERE tbl.City=tbl2.City
)
This may not be the most efficient method, but I think it will work.
How about this? I think it should work:
DECLARE @DataSet TABLE (
City VARCHAR(50),
IndustryVertical VARCHAR(50),
IndustryVerticalCount INT
)
INSERT INTO @DataSet SELECT 'Bangalore', 'Consumer Internet', 279
INSERT INTO @DataSet SELECT 'Bangalore', 'Technology', 269
INSERT INTO @DataSet SELECT 'Bangalore', 'Logistics', 179
INSERT INTO @DataSet SELECT 'Mumbai', 'Technology', 194
INSERT INTO @DataSet SELECT 'Mumbai', 'Consumer Internet', 89
SELECT
table_a.*
FROM @DataSet table_a
LEFT JOIN @DataSet table_b
ON table_a.City = table_b.City
AND table_a.IndustryVerticalCount < table_b.IndustryVerticalCount
WHERE table_b.IndustryVerticalCount IS NULL
I think you simply want a HAVING clause:
SELECT r.IndustryVertical,
COUNT(*) AS IndustryVerticalCount,
r.City
FROM records r
WHERE r.City <> ''
GROUP BY r.IndustryVertical, r.City
HAVING COUNT(*) = (SELECT COUNT(*)
FROM records r2
WHERE r2.City = r.City
GROUP BY r2.IndustryVertical
ORDER BY COUNT(*) DESC
LIMIT 1
)
ORDER BY IndustryVerticalCount DESC;
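A compact way to test the "top vertical per city" logic is sketched below. It is an illustration only: an in-memory SQLite database stands in for MySQL, the sample rows are invented, and the query uses a CTE named `counts` (a hypothetical name), which in MySQL/MariaDB needs a version with `WITH` support.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (City TEXT, IndustryVertical TEXT)")
sample = (
    [("Bangalore", "Consumer Internet")] * 3
    + [("Bangalore", "Technology")] * 2
    + [("Mumbai", "Technology")] * 2
    + [("Mumbai", "Logistics")] * 1
    + [("", "Technology")] * 5  # empty City values are excluded by the query
)
conn.executemany("INSERT INTO records VALUES (?, ?)", sample)

# Count once per (City, IndustryVertical), then keep the rows whose count
# equals the largest count within the same city.
TOP_VERTICAL_PER_CITY = """
WITH counts AS (
    SELECT City, IndustryVertical, COUNT(*) AS IndustryVerticalCount
    FROM records
    WHERE City <> ''
    GROUP BY City, IndustryVertical
)
SELECT c.City, c.IndustryVertical, c.IndustryVerticalCount
FROM counts AS c
WHERE c.IndustryVerticalCount = (
    SELECT MAX(c2.IndustryVerticalCount)
    FROM counts AS c2
    WHERE c2.City = c.City
)
ORDER BY c.City
"""
top = conn.execute(TOP_VERTICAL_PER_CITY).fetchall()
print(top)
```

If two verticals tie for the maximum in a city, this returns both rows for that city.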

SELECT COUNT(*) for unique pairs of IDs

I have a table like the following, named matches:
match_id ( AUTO INCREMENT )
user_id ( INT 11 )
opponent_id ( INT 11 )
date ( TIMESTAMP )
What I have to do is to SELECT the count of the rows where user_id and opponent_id are a unique pair. The goal is to see the count of total matches started between different users.
So if we have:
user_id = 10 and opponent_id = 11
user_id = 20 and opponent_id = 22
user_id = 10 and opponent_id = 11
user_id = 11 and opponent_id = 10
The result of the query should be 2.
In fact, we only have 2 matches started between different pairs of users. Matches 1, 3, and 4 count as the same match, because they were played by the same pair of user IDs.
Can anyone help me with this?
I have done similar queries but never on pairs of IDs, always on a single ID.
FancyPants' answer is correct, but I prefer to use DISTINCT when no aggregate function is used:
SELECT COUNT(DISTINCT
LEAST(user_id, opponent_id),
GREATEST(user_id, opponent_id)
)
FROM yourtable;
is sufficient.
SELECT COUNT(*) AS nr_of_matches FROM (
SELECT
LEAST(user_id, opponent_id) AS pl1,
GREATEST(user_id, opponent_id) AS pl2
FROM yourtable
GROUP BY pl1, pl2
) sq
see it working in an sqlfiddle
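The pair-normalisation trick can be checked with the four sample rows from the question. This is a sketch under one assumption worth flagging: it runs on in-memory SQLite, which has no `LEAST`/`GREATEST`, so the two-argument scalar `MIN`/`MAX` functions play that role here.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL. SQLite's two-argument
# MIN()/MAX() are scalar functions and substitute for LEAST()/GREATEST().
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches ("
    "match_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "user_id INTEGER, opponent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO matches (user_id, opponent_id) VALUES (?, ?)",
    [(10, 11), (20, 22), (10, 11), (11, 10)],
)

# Put the smaller id first so (10, 11) and (11, 10) become the same pair,
# then count the distinct normalised pairs.
unique_pairs = conn.execute("""
SELECT COUNT(*) FROM (
    SELECT DISTINCT
        MIN(user_id, opponent_id) AS pl1,
        MAX(user_id, opponent_id) AS pl2
    FROM matches
) AS pairs
""").fetchone()[0]
print(unique_pairs)
```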

Fetch 2nd Highest value from MySQL DB with GROUP BY

I have a table tbl_patient and I want to fetch the last 2 visits of each patient in order to compare whether the patient's condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient:
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the 2nd-to-last visit of each patient, but the query below gives me an error
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where am I going wrong?
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
id1 is the ID of the last visit, id2 is the ID of the 2nd to last visit.
Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9.
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant-looking statements tend to come with poor (or unbearable) performance when dealing with large sets, and the statements that tend to perform better are less elegant.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic RANK() OVER(PARTITION ...) function available in other DBMS:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
, @prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patient p
JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id even if only one id value exists for that patient; it does not restrict the result to patients that have at least two rows.
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patient p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
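To see the correlated-subquery approach in action on the question's data, here is a small sketch. It assumes an in-memory SQLite database as a stand-in for MySQL and uses the table name `tbl_patient` from the question. For each row, the subquery counts how many rows of the same patient have an id at least as large; a count of 1 or 2 means the row is among that patient's last two visits.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_patient ("
    "id INTEGER PRIMARY KEY, patient_ID INTEGER, "
    "visit_ID INTEGER, patient_result INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_patient VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 5),
        (2, 2, 1, 6),
        (3, 2, 3, 7),
        (4, 1, 2, 3),
        (5, 2, 3, 2),
        (6, 1, 3, 9),
    ],
)

# Keep a row when at most two rows of the same patient have id >= this id,
# i.e. the row is the latest or second-latest visit of that patient.
LAST_TWO_VISITS = """
SELECT t.id, t.patient_ID, t.visit_ID, t.patient_result
FROM tbl_patient AS t
WHERE (
    SELECT COUNT(*)
    FROM tbl_patient AS p
    WHERE p.patient_ID = t.patient_ID
      AND p.id >= t.id
) <= 2
ORDER BY t.patient_ID ASC, t.id ASC
"""
last_two = conn.execute(LAST_TWO_VISITS).fetchall()
print(last_two)
```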
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
LEFT
JOIN tbl_patient p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
HAVING COUNT(p.id) < 2
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values for each patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n**2).
Try this..
SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) - 1) GROUP BY patient_ID
Why not use simply...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?