MySQL - Select Unique Common Columns between two tables - Most Efficient Query

I have two tables:
db_contacts
Phone | Name | Last_Name
--------------------
111 | Foo | Foo
222 | Bar | Bar
333 | John | Smith
444 | Tomy | Smith
users_contacts
User_ID | Phone
--------------------
1 | 123
1 | 111
2 | 222
2 | 333
3 | 111
3 | 333
4 | 444
Notice from above that:
User with ID 2 is the only one that has the phone number 222
User with ID 4 is the only one that has the phone number 444
I need to obtain these results with a MySQL query.
In other words: how can I select all the users that have a unique phone number, on the condition that this number exists in db_contacts?
I need my end result to look like this:
User_ID | Phone | Name | Last_Name
------------------------------------
2 | 222 | Bar | Bar
4 | 444 | Tomy | Smith
PS: There is no foreign key between the Phone columns, as a user can have a phone that is not in db_contacts.
In real life, db_contacts contains about 1 million records and users_contacts about 5 million records.
Here is what I tried; it takes a very long time to execute:
SELECT *
FROM users_contacts
WHERE users_contacts.phone IN (
SELECT users_contacts.phone
FROM `users_contacts`
JOIN db_contacts ON db_contacts.phone = users_contacts.phone
GROUP BY users_contacts.phone
HAVING COUNT(users_contacts.phone) = 1
)
Update:
Thank you for your replies, I have provided my solution that fits my case perfectly.

I think you want:
select uc.*
from users_contacts uc
where not exists (select 1
                  from users_contacts uc2
                  where uc2.phone = uc.phone and uc2.user_id <> uc.user_id
                 );
For performance, you want an index on users_contacts(phone, user_id).
Another method is:
select max(user_id) as user_id, phone
from users_contacts
group by phone
having count(*) = 1;
The not exists version is probably going to be faster.

I would use a simple JOIN with a NOT EXISTS condition. This is usually the most efficient way to check that something has no duplicates; compared to your solution, it has the advantage of avoiding aggregation.
SELECT uc.User_ID, dc.*
FROM users_contacts uc
INNER JOIN db_contacts dc ON uc.Phone = dc.Phone
WHERE NOT EXISTS (
SELECT 1
FROM users_contacts uc1
WHERE uc1.Phone = dc.Phone AND uc1.User_ID != uc.User_ID
)
Hint: consider setting the following indexes:
users_contacts(Phone, User_ID)
db_contacts(Phone)
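To check the JOIN plus NOT EXISTS approach against the sample data from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the index names are hypothetical. It says nothing about timings on the real 5-million-row table.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db_contacts (Phone TEXT, Name TEXT, Last_Name TEXT);
CREATE TABLE users_contacts (User_ID INTEGER, Phone TEXT);
-- Hypothetical index names; the columns follow the hint above.
CREATE INDEX idx_uc_phone_user ON users_contacts (Phone, User_ID);
CREATE INDEX idx_dc_phone ON db_contacts (Phone);
INSERT INTO db_contacts VALUES
  ('111', 'Foo', 'Foo'), ('222', 'Bar', 'Bar'),
  ('333', 'John', 'Smith'), ('444', 'Tomy', 'Smith');
INSERT INTO users_contacts VALUES
  (1, '123'), (1, '111'), (2, '222'), (2, '333'),
  (3, '111'), (3, '333'), (4, '444');
""")

# Keep a row only if no other user holds the same phone.
unique_rows = conn.execute("""
SELECT uc.User_ID, dc.Phone, dc.Name, dc.Last_Name
FROM users_contacts uc
INNER JOIN db_contacts dc ON uc.Phone = dc.Phone
WHERE NOT EXISTS (
    SELECT 1
    FROM users_contacts uc1
    WHERE uc1.Phone = uc.Phone AND uc1.User_ID != uc.User_ID
)
ORDER BY uc.User_ID
""").fetchall()

print(unique_rows)
```

Phone 123 belongs to a single user but is dropped by the join because it is not in db_contacts, which matches the expected result in the question.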

I would first like to thank everyone who posted solutions; they all worked.
However, response time was crucial for me, and the proposed queries took a couple of seconds to execute.
In case anyone has a similar problem: I ended up creating a new table, users_unique_contacts, and an AFTER INSERT trigger on users_contacts that checks whether the newly inserted phone already exists in users_unique_contacts. If it does not exist, the trigger adds it; otherwise it removes it, since the number is no longer unique.
My trigger looks like this:
BEGIN
IF EXISTS (SELECT 1 = 1 FROM users_unique_contacts WHERE phone = new.phone LIMIT 1) THEN
BEGIN
DELETE FROM users_unique_contacts WHERE phone = new.phone LIMIT 1;
END;
ELSE
BEGIN
INSERT INTO users_unique_contacts (user_id,phone) VALUES (new.user_id, new.phone);
END;
END IF;
END
Now every time I want the unique numbers of a user, I query users_unique_contacts, and the execution time is in milliseconds.
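One caveat with the toggle logic in the trigger: if a third user inserts the same phone, the phone is no longer in users_unique_contacts, so the trigger adds it back as if it were unique. A minimal sketch of a counting variant that avoids this follows. It uses Python's sqlite3 as a stand-in for MySQL, so the trigger syntax differs from MySQL's, and the phone_owner_counts table and trg_count_phone trigger are hypothetical names. It is not the table layout described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_contacts (user_id INTEGER, phone TEXT);

-- Hypothetical helper table: one row per phone with its number of owners.
CREATE TABLE phone_owner_counts (
    phone TEXT PRIMARY KEY,
    first_user_id INTEGER,
    owners INTEGER NOT NULL
);

-- Count owners instead of toggling a row on and off.
CREATE TRIGGER trg_count_phone AFTER INSERT ON users_contacts
BEGIN
    INSERT OR IGNORE INTO phone_owner_counts (phone, first_user_id, owners)
    VALUES (NEW.phone, NEW.user_id, 0);
    UPDATE phone_owner_counts SET owners = owners + 1
    WHERE phone = NEW.phone;
END;
""")

# Three different users share phone 555; only user 4 has phone 444.
conn.executemany(
    "INSERT INTO users_contacts (user_id, phone) VALUES (?, ?)",
    [(1, "555"), (2, "555"), (3, "555"), (4, "444")],
)

# A phone is unique when exactly one owner has been counted.
unique_contacts = conn.execute(
    "SELECT first_user_id, phone FROM phone_owner_counts "
    "WHERE owners = 1 ORDER BY phone"
).fetchall()
print(unique_contacts)
```

With the toggle logic, phone 555 would be listed as unique again after the third insert; with a counter it stays excluded. Neither version handles deletes from users_contacts, which would need a matching AFTER DELETE trigger.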

Related

SQL, table join wont display proper output [duplicate]

I've got the following two tables (in MySQL):
Phone_book
+----+------+--------------+
| id | name | phone_number |
+----+------+--------------+
| 1 | John | 111111111111 |
+----+------+--------------+
| 2 | Jane | 222222222222 |
+----+------+--------------+
Call
+----+------+--------------+
| id | date | phone_number |
+----+------+--------------+
| 1 | 0945 | 111111111111 |
+----+------+--------------+
| 2 | 0950 | 222222222222 |
+----+------+--------------+
| 3 | 1045 | 333333333333 |
+----+------+--------------+
How do I find out which calls were made by people whose phone_number is not in the Phone_book? The desired output would be:
Call
+----+------+--------------+
| id | date | phone_number |
+----+------+--------------+
| 3 | 1045 | 333333333333 |
+----+------+--------------+
There are several different ways of doing this, with varying efficiency, depending on how good your query optimiser is and the relative size of your two tables.
This is the shortest statement, and may be the quickest if your phone book is very short:
SELECT *
FROM Call
WHERE phone_number NOT IN (SELECT phone_number FROM Phone_book)
alternatively (thanks to Alterlife)
SELECT *
FROM Call
WHERE NOT EXISTS
(SELECT *
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number)
or (thanks to WOPR)
SELECT *
FROM Call
LEFT OUTER JOIN Phone_Book
ON (Call.phone_number = Phone_book.phone_number)
WHERE Phone_book.phone_number IS NULL
(ignoring that, as others have said, it's normally best to select just the columns you want, not '*')
SELECT Call.ID, Call.date, Call.phone_number
FROM Call
LEFT OUTER JOIN Phone_Book
ON (Call.phone_number=Phone_book.phone_number)
WHERE Phone_book.phone_number IS NULL
This removes the subquery, allowing the query optimiser to work its magic.
Also, avoid "SELECT *" because it can break your code if someone alters the underlying tables or views (and it's inefficient).
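The NOT IN, NOT EXISTS and LEFT JOIN ... IS NULL forms above return the same rows on the sample data, but they diverge once the subquery column can contain NULL. A minimal sketch using Python's sqlite3 as a stand-in for MySQL follows. The extra 'Anon' phone book row with a NULL number is a hypothetical addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Phone_book (id INTEGER, name TEXT, phone_number TEXT);
CREATE TABLE Call (id INTEGER, date TEXT, phone_number TEXT);
INSERT INTO Phone_book VALUES
  (1, 'John', '111111111111'), (2, 'Jane', '222222222222');
INSERT INTO Call VALUES
  (1, '0945', '111111111111'), (2, '0950', '222222222222'),
  (3, '1045', '333333333333');
""")

# The three anti-join forms from the answers above.
QUERIES = {
    "not_in": """
        SELECT id FROM Call
        WHERE phone_number NOT IN (SELECT phone_number FROM Phone_book)""",
    "not_exists": """
        SELECT id FROM Call
        WHERE NOT EXISTS (SELECT 1 FROM Phone_book
                          WHERE Phone_book.phone_number = Call.phone_number)""",
    "left_join": """
        SELECT Call.id FROM Call
        LEFT JOIN Phone_book ON Call.phone_number = Phone_book.phone_number
        WHERE Phone_book.phone_number IS NULL""",
}

def run_all():
    """Run every variant and return {name: [call ids]}."""
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in QUERIES.items()}

before = run_all()

# Hypothetical phone book entry without a number.
conn.execute("INSERT INTO Phone_book VALUES (3, 'Anon', NULL)")
after = run_all()

print(before)
print(after)
```

Once a NULL is present, `x NOT IN (...)` evaluates to NULL rather than true for unmatched rows, so the NOT IN variant returns nothing, while the other two still return call 3.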
The code below would be a bit more efficient than the answers presented above when dealing with larger datasets.
SELECT *
FROM Call
WHERE NOT EXISTS (
SELECT 'x'
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number
);
SELECT DISTINCT Call.id
FROM Call
LEFT OUTER JOIN Phone_book USING (phone_number)
WHERE Phone_book.id IS NULL
This will return the ids of the calls whose phone_number is missing from your Phone_book table.
I think this should work:
SELECT Call.* FROM Call LEFT JOIN Phone_book ON
Call.phone_number = Phone_book.phone_number WHERE Phone_book.name IS NULL
SELECT t1.ColumnID,
CASE
WHEN NOT EXISTS( SELECT t2.FieldText
FROM Table t2
WHERE t2.ColumnID = t1.ColumnID)
THEN t1.FieldText
ELSE t2.FieldText
END FieldText
FROM Table1 t1, Table2 t2
SELECT a.id, a.date, a.phone_number FROM Call a
WHERE a.phone_number NOT IN (SELECT b.phone_number FROM Phone_book b)
Alternatively, on databases that support MINUS (for example Oracle; MySQL does not):
select phone_number from call
minus
select phone_number from phone_book
Don't forget to check your indexes!
If your tables are quite large you'll need to make sure the phone book has an index on the phone_number field. With large tables the database will most likely choose to scan both tables.
SELECT *
FROM Call
WHERE NOT EXISTS
(SELECT *
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number)
You should create indexes on both Phone_Book and Call that contain phone_number. If performance is becoming an issue, try a lean index like this, with only the phone number:
The fewer fields the better, since the index has to be loaded entirely. You'll need an index on both tables.
ALTER TABLE [dbo].Phone_Book ADD CONSTRAINT [IX_Unique_PhoneNumber] UNIQUE NONCLUSTERED
(
Phone_Number
)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = ON) ON [PRIMARY]
GO
If you look at the query plan, you can confirm your new index is actually being used. Note this is for SQL Server, but it should be similar for MySQL.
With the query I showed there's literally no other way for the database to produce a result other than scanning every record in both tables.

update rate for unique productId by each userID

I'm trying to implement a rating method in SQL. I have two tables in MySQL. Rows in FirstTable get updated, and the values of rate and countView change; I'm trying to update both with the same command:
UPDATE FirstTable SET `countView`= `countView`+1,
`rate`=('$MyRate' + (`countView`-1)*`rate`)/`countView`
WHERE `productId`='$productId'
FirstTable:
productId | countView | rate | other column |
------------+-----------+------+-------------------+---
21 | 12 | 4 | anything |
------------+-----------+------+-------------------+---
22 | 18 | 3 | anything |
------------+-----------+------+-------------------+---
But this way, a user can vote as many times as he wants. So I tried to create a table with two columns, productId and userID, like below:
SecondTable:
productId | userID |
------------+---------------|
21 | 100001 |
------------+---------------|
22 | 100002 |
------------+---------------|
21 | 100001 |
------------+---------------|
21 | 100003 |
------------+---------------|
Now, as in the example in SecondTable, a user has given two votes to one productId. I don't want both of these votes to be recorded.
Problems with this method:
The counter is incremented on every vote.
I cannot properly link SecondTable and FirstTable to manage the update of FirstTable.
Of course, this question may not be completely new, but I searched a lot to get the right answer. One of the questions on this site suggested the following method for managing the update of a table:
UPDATE `FirstTable` SET `countView`= `countView`+1,
`rate`=('$MyRate' + (`countView`-1)*`rate`)/`countView`
WHERE `productId`='$productId' IN ( SELECT DISTINCT productId, userID
FROM SecondTable)
But when I use this command, I encounter the following error:
1241 - Operand should contain 1 column(s)
I would be very grateful for any guidance. I'm also fairly sure my question is not a duplicate. Thank you again.
This fixes your specific syntax problem:
UPDATE FirstTable
SET countView = countView + 1,
rate = ($MyRate + (countView - 1) * rate) / countView
WHERE productId = $productId AND
productId IN (SELECT t2.productId FROM SecondTable t2);
But if two different users vote on the same product, FirstTable will be updated only once. It is unclear if that is intentional behavior or not.
Note that SELECT DISTINCT is not needed in the subquery.
The error is generated because the subquery in an IN clause can't return two columns. You'll want to select a single column and use GROUP BY:
Try:
IN (SELECT DISTINCT productId FROM SecondTable GROUP BY productId, userID)
Here's documentation to look over for mysql group by if you want: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
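Neither fix above stops the same user from voting twice. One possible approach, sketched here under assumptions, is to put a unique key on (productId, userID) and update the rating only when the vote row was actually inserted. The sketch uses Python's sqlite3 as a stand-in for MySQL, and the record_vote helper is hypothetical. Note that SQLite evaluates every SET expression against the old row values, whereas MySQL's single-table UPDATE applies assignments left to right, so the rate formula is written differently from the one in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FirstTable (
    productId INTEGER PRIMARY KEY,
    countView INTEGER NOT NULL,
    rate REAL NOT NULL
);
CREATE TABLE SecondTable (
    productId INTEGER,
    userID INTEGER,
    PRIMARY KEY (productId, userID)  -- at most one vote per user and product
);
INSERT INTO FirstTable VALUES (21, 12, 4.0), (22, 18, 3.0);
""")

def record_vote(product_id, user_id, my_rate):
    """Count a vote once per (product, user); return True if it was counted."""
    with conn:  # both statements commit together or not at all
        cur = conn.execute(
            "INSERT OR IGNORE INTO SecondTable (productId, userID) VALUES (?, ?)",
            (product_id, user_id),
        )
        if cur.rowcount == 0:
            return False  # this user already voted on this product
        # SQLite semantics: countView and rate on the right are the OLD values.
        conn.execute(
            "UPDATE FirstTable "
            "SET countView = countView + 1, "
            "    rate = (? + countView * rate) / (countView + 1) "
            "WHERE productId = ?",
            (my_rate, product_id),
        )
        return True

print(record_vote(21, 100001, 5))  # first vote is counted
print(record_vote(21, 100001, 5))  # repeat vote is ignored
print(conn.execute("SELECT countView, rate FROM FirstTable WHERE productId = 21").fetchone())
```

In MySQL the equivalent building blocks would be a UNIQUE or PRIMARY KEY on the two columns plus INSERT IGNORE; the rest of the flow is an assumption about how the application is structured.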

Get most unique combinations of 2 pictures

I have a MySQL table that holds n pictures.
+------------+--------------+
| picture_id | picture_name |
+------------+--------------+
| 1 | ben.jpg |
| 2 | nick.jpg |
| 3 | mark.jpg |
| 4 | james.jpg |
| .. | ... |
| n | abraham.jpg |
+------------+--------------+
For a web application, I need to display 2 pictures simultaneously so that the user can vote for one picture or the other. After voting, the user gets a new set of two pictures.
(application use interface)
+---------------------+--------------------+
| Vote for picture 1  | Vote for picture 2 |
+---------------------+--------------------+
I would like to avoid displaying the same combinations as much as possible. I can create a helper table that will hold all possible combinations.
+----------------+--------------+--------------+
| combination_id | picture_id_1 | Picture_id_2 |
+----------------+--------------+--------------+
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 1 | 4 |
| 4 | 1 | 5 |
| .. | .. | .. |
| (n^2-n)/2 | .. | .. |
+----------------+--------------+--------------+
but for 100 pictures, that would be (100^2 - 100)/2 = 4950 (edit) rows, and with every added picture the table would grow quadratically (which is not a big issue in today's computing, I suppose).
But how do I query this table in a way that the user always sees as few duplicates as possible?
Expected result:
run 1: picture_id's = 4,5 (any numbers between 1 and n)
run 2: picture_id's = 2,7
run 3: picture_id's = 5 and 20
...
DEMO:http://rextester.com/VNWIOA4679 (added 100 pic samples) 2 sec query for 1 user w/o any indexes.
I see no need for a helper table, as the data can easily be constructed on the fly with proper indexes. At 1000 pictures you're looking at 499,500 combinations a user could vote upon. That is still easily managed within a database, as we operate on a set level, not a record level.
Here's one way, assuming my own table structures. I can't think of a more efficient way to store/process the data.
With this approach, as new pictures are added the query generates a larger and larger combination set but always excludes the pairs on which a user has already voted. There are no code changes for new pictures and no regenerating of sets; each run just processes the pairs the user hasn't made a selection upon.
Create table SO46205797_Pics(
PICID int);
Insert into SO46205797_Pics values (1);
Insert into SO46205797_Pics values (2);
Insert into SO46205797_Pics values (3);
Insert into SO46205797_Pics values (4);
Insert into SO46205797_Pics values (5);
Insert into SO46205797_Pics values (6);
Insert into SO46205797_Pics values (7);
Create table SO46205797_UserPicResults (
USERID int,
PICID int,
PICID2 int,
PICChoiceID int);
Insert into SO46205797_UserPicResults values (1,1,2,1);
Insert into SO46205797_UserPicResults values (1,1,3,1);
Insert into SO46205797_UserPicResults values (1,1,4,4);
The magic happens here; the above was just data setup.
SELECT A.PICID, B.PICID, C.PICChoiceID
FROM SO46205797_Pics A
INNER JOIN SO46205797_Pics B
on A.PICID < B.PICID
LEFT JOIN SO46205797_UserPicResults C
on A.PICID = C.PicID
and B.PICID = C.PICID2
and C.USERID = 1
WHERE C.userID is null;
Note that if we eliminate the C.userID is null part, then we see all of the possible combinations of 2 photos (for user 1) and which ones the user has selected. I treat IDs 1,2 the same as IDs 2,1, which I think you want. Since we don't want to display a pair again once the user has chosen, we use C.userID is null to exclude combinations the user already made a choice for.
Also, when saving data to SO46205797_UserPicResults, you need to ensure PICID is always less than PICID2.
A different way to do this is using NOT EXISTS, which may be slightly faster.
Obviously, an index on (USERID, PICID, PICID2), in that order, would be beneficial for SO46205797_UserPicResults (I'd probably make it a combined PK), along with an index on PICID as the PK for SO46205797_Pics.
SELECT A.PICID, B.PICID
FROM SO46205797_Pics A
INNER JOIN SO46205797_Pics B
on A.PICID < B.PICID
WHERE not exists (SELECT *
FROM SO46205797_UserPicResults C
WHERE A.PICID = C.PicID
and B.PICID = C.PICID2
and C.USERID = 1);
I considered maintaining a parent/child relationship for each image for each user; but this approach doesn't store the choices for all combinations.
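The unseen-pair query and the rule that the smaller picture id is always stored first can be sketched end to end as follows. This uses Python's sqlite3 as a stand-in for MySQL; the shortened table names and the save_choice, unseen_pairs and next_pair helpers are hypothetical, and the random pick via ORDER BY RANDOM() is an assumption about how a single pair would be chosen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pics (picid INTEGER PRIMARY KEY);
CREATE TABLE user_pic_results (
    userid INTEGER,
    picid INTEGER,
    picid2 INTEGER,
    pic_choice_id INTEGER,
    PRIMARY KEY (userid, picid, picid2),
    CHECK (picid < picid2)  -- (1, 2) and (2, 1) are the same pair
);
INSERT INTO pics VALUES (1), (2), (3), (4), (5), (6), (7);
""")

def save_choice(userid, pic_a, pic_b, choice):
    """Store a vote with the smaller picture id first."""
    low, high = sorted((pic_a, pic_b))
    with conn:
        conn.execute(
            "INSERT INTO user_pic_results VALUES (?, ?, ?, ?)",
            (userid, low, high, choice),
        )

UNSEEN_SQL = """
SELECT a.picid, b.picid
FROM pics a
JOIN pics b ON a.picid < b.picid
WHERE NOT EXISTS (
    SELECT 1 FROM user_pic_results c
    WHERE c.userid = ? AND c.picid = a.picid AND c.picid2 = b.picid
)
"""

def unseen_pairs(userid):
    """All pairs this user has not voted on yet."""
    return conn.execute(UNSEEN_SQL + " ORDER BY a.picid, b.picid", (userid,)).fetchall()

def next_pair(userid):
    """One random unseen pair, or None when every pair has been voted on."""
    return conn.execute(UNSEEN_SQL + " ORDER BY RANDOM() LIMIT 1", (userid,)).fetchone()

# Same three votes as the sample data, given in arbitrary order.
save_choice(1, 2, 1, 1)
save_choice(1, 1, 3, 1)
save_choice(1, 4, 1, 4)

print(len(unseen_pairs(1)))
print(next_pair(1))
```

Seven pictures give 21 pairs, so user 1 has 18 left after three votes, while a user without votes still sees all 21.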
The goal of this application is to let people vote for one picture against another, right? Then you need to have some kind of vote results table:
vote_results:
| vote_id | user_id | vote_up_picture_id | vote_down_picture_id | ...
Then, based on data from this table, you can easily show a user picture pairs which he hasn't seen yet:
select first.picture_id, second.picture_id
from pictures as first, pictures as second
where not exists(
select * from vote_results v
where v.user_id = 1 -- id of the current user
and ((v.vote_up_picture_id = first.picture_id and v.vote_down_picture_id = second.picture_id)
or (v.vote_up_picture_id = second.picture_id and v.vote_down_picture_id = first.picture_id))
) and first.picture_id != second.picture_id
order by rand()
limit 1
PS. As you see, there is no need for a helper table with combination_id

MySQL counting number of max groups

I asked a similar question earlier today, but I've run into another issue that I need assistance with.
I have a logging system that scans a server and catalogs every user that's online at that given moment. Here is what my table looks like:
-----------------
| ab_logs |
-----------------
| id |
| scan_id |
| found_user |
-----------------
id is an auto-incrementing primary key. It has no real value other than that.
scan_id is an integer that is incremented after each successful scan of all users. It exists so I can separate results from different scans.
found_user stores which user was found online during the scan.
The above will generate a table that could look like this:
id | scan_id | found_user
----------------------------
1 | 1 | Nick
2 | 2 | Nick
3 | 2 | John
4 | 3 | John
So on the first scan the system found only Nick online. On the 2nd it found both Nick and John. On the 3rd only John was still online.
My problem is that I want to get the total number of unique users that have connected to the server up to the time of each scan. In other words, I want the cumulative number of distinct users seen as of each scan. Think counter.
From the example above, the result I want from the sql is:
1
2
2
EDIT:
This is what I have tried so far, but it's wrong:
SELECT COUNT(DISTINCT(found_user)) FROM ab_logs WHERE DATE(timestamp) = CURDATE() GROUP BY scan_id
What I tried returns this:
1
2
1
The code below should give you the results you are looking for
select s.scan_id, count(*) from
(select distinct
    t.scan_id
    ,t1.found_user
from
    ab_logs t
    inner join ab_logs t1 on t.scan_id >= t1.scan_id) s
group by
    s.scan_id;
It assumes the names are unique and includes the current and every previous scan in the count.
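The running distinct count can be checked against the sample rows from the question. A minimal sketch, using Python's sqlite3 as a stand-in for MySQL and a correlated subquery instead of the self-join (a swapped-in formulation that should give the same counts):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ab_logs (
    id INTEGER PRIMARY KEY,
    scan_id INTEGER,
    found_user TEXT
);
INSERT INTO ab_logs (scan_id, found_user) VALUES
  (1, 'Nick'), (2, 'Nick'), (2, 'John'), (3, 'John');
""")

# For each scan, count the distinct users seen in that scan or any earlier one.
running_totals = conn.execute("""
SELECT s.scan_id,
       (SELECT COUNT(DISTINCT l.found_user)
        FROM ab_logs l
        WHERE l.scan_id <= s.scan_id) AS users_so_far
FROM (SELECT DISTINCT scan_id FROM ab_logs) s
ORDER BY s.scan_id
""").fetchall()

print(running_totals)
```

A plain GROUP BY scan_id counts only the users present in each individual scan, which is why the attempt in the question returned 1 for the third scan instead of 2.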
Try with group by clause:
SELECT scan_id, count(*)
FROM mytable
GROUP BY scan_id

MySQL: Update SET and WHERE both determined by one subselect

I don't have any idea how to do this..
I have a table like this:
account_categories
--------------------
id | description
--------------------
34 | Home Services
35 | Home Services
36 | Home Services
39 | Home Design
40 | Home Design
I have another table (accounts) that references account_categories.id and it uses all of the above values. :/
I want to flatten account_categories, so I need to pick one duplicate from account_categories and update accounts so that all duplicates use the one selected value.
For instance, I need to turn this:
accounts
---------------------
id | accountCategory
---------------------
1 | 34
2 | 35
3 | 36
4 | 39
5 | 40
Into this:
accounts
---------------------
id | accountCategory
---------------------
1 | 34
2 | 34
3 | 34
4 | 39
5 | 39
I can select an id and distinct description from account categories like this:
SELECT DISTINCT (description), id
FROM crmalpha.account_categories
GROUP BY description
But I guess that the next step is to do something like this:
for ( row in ( SELECT DISTINCT (description), id FROM crmalpha.account_categories GROUP BY description ) ) {
UPDATE crmalpha.accounts SET accountCategory = $row['id'] WHERE accountCategory IN ( SELECT id FROM crmalpha.account_categories WHERE description = $row['description'] )
}
Forgive the for loop and php variable pseudo code, I'm just trying to think through it logically. I have no idea how to accomplish this in pure SQL.
Any ideas?
PS., Afterwards, I will go through and delete from account_categories every row where the ID is not used in the accounts table.
This worked when I tried it against the test data you posted above. That said, when doing any mass cleanup like this I'd recommend making a copy of the table first. Also check results after issuing the UPDATE and before issuing a COMMIT.
Here's the query:
UPDATE Accounts acct
INNER JOIN Account_Categories cat ON acct.AccountCategory = cat.id
INNER JOIN (
SELECT MIN(id) AS NewID, Description
FROM Account_Categories
GROUP BY Description) NewCat ON cat.Description = NewCat.Description
SET acct.AccountCategory = NewCat.NewID
Some explanation:
The subquery (SELECT MIN(id)...) gets a single ID value (the lowest one) for each description.
The first join (to Account_Categories) associates each account with its category for the sole purpose of having the description available.
The second join (to the subquery) associates the account's existing description to the table of flattened/de-duped descriptions and their ID.
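The flattening can be tried on the sample data before touching real tables. A minimal sketch using Python's sqlite3 as a stand-in for MySQL follows. Because older SQLite versions lack UPDATE with JOIN, the sketch uses a correlated subquery that computes the same MIN(id) per description; that rewrite is an assumption made for portability, not the query above. The final DELETE corresponds to the cleanup step mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account_categories (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE accounts (id INTEGER PRIMARY KEY, accountCategory INTEGER);
INSERT INTO account_categories VALUES
  (34, 'Home Services'), (35, 'Home Services'), (36, 'Home Services'),
  (39, 'Home Design'), (40, 'Home Design');
INSERT INTO accounts VALUES (1, 34), (2, 35), (3, 36), (4, 39), (5, 40);
""")

with conn:  # run the update and the cleanup in one transaction
    # Point each account at the lowest category id sharing its description.
    conn.execute("""
    UPDATE accounts
    SET accountCategory = (
        SELECT MIN(keep_cat.id)
        FROM account_categories old_cat
        JOIN account_categories keep_cat
          ON keep_cat.description = old_cat.description
        WHERE old_cat.id = accounts.accountCategory
    )
    WHERE accountCategory IN (SELECT id FROM account_categories)
    """)
    # Remove the categories that no account references any more.
    conn.execute("""
    DELETE FROM account_categories
    WHERE id NOT IN (SELECT accountCategory FROM accounts)
    """)

flattened = conn.execute("SELECT id, accountCategory FROM accounts ORDER BY id").fetchall()
remaining = conn.execute("SELECT id FROM account_categories ORDER BY id").fetchall()
print(flattened)
print(remaining)
```

The DELETE relies on accountCategory never being NULL; with NULLs present, the NOT IN subquery would match nothing and the cleanup would silently delete no rows.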