I am not a database guy, but I have been given the "fun" job of cleaning up someone else's database. We have many duplicate records in our database, and some of our customers are getting double or triple billed every month.
Given the following example database:
Table: Customers
ID Name Phone DoNotBill
1 Acme Inc 5125551212 No
2 ABC LLC 7138221661 No
3 Big Inc 4132229807 No
4 Acme 5125551212 No
5 Tree Top 2127657654 No
Is it possible to write a query that identifies all duplicate phone numbers (in this case records 1 and 4) and then marks the duplicate records as Yes by updating the DoNotBill column, while leaving the first record unmarked?
In this example case we would be left with:
ID Name Phone DoNotBill
1 Acme Inc 5125551212 No
2 ABC LLC 7138221661 No
3 Big Inc 4132229807 No
4 Acme 5125551212 Yes
5 Tree Top 2127657654 No
Something like this?
UPDATE
customers cust,
(SELECT
c1.ID,
c1.name,
c1.phone,
c1.DoNotBill
FROM customers c
LEFT JOIN
(SELECT
cc.ID
FROM customers cc
) as c1 on c1.phone = c.phone
) dup
SET cust.DoNotBill = 'Yes' WHERE cust.id=dup.id ;
To begin with I assume that the DoNotBill column only has two possible values; yes and no. In that case it should be bool instead of varchar, meaning it would be either true or false.
Furthermore I don't get the meaning of the DoNotBill column. Why wouldn't you just use something like this?
select distinct phone from customers
SQL SELECT DISTINCT
That would give you the phone numbers without duplicates and without the need for an extra column.
This depends on the amount of data you have.
You can do it in steps and make use of tools like Excel.
This query
SELECT a.id, b.id, a.phone FROM clients a, clients b WHERE
a.phone = b.phone
AND a.id != b.id
returns all duplicated records.
Add
GROUP BY a.phone
and you will get one record for each pair of duplicates.
If the records look right and are what you need, change the select list to SELECT a.id and
use this query as a subquery in an UPDATE statement:
UPDATE clients SET billing='no' WHERE id IN ( sql goes here)
UPDATE customers c SET c.DoNotBill="Yes";
UPDATE customers c
JOIN (
SELECT MIN( ID ) ID, Phone
FROM customers
GROUP BY Phone
) u ON c.ID = u.ID AND c.Phone = u.Phone
SET c.DoNotBill="No";
That way not only pairs of duplicates are handled, but any number of repeated entries for the same phone number.
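As a rough way to try the "keep the lowest ID per phone number" idea on the sample data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL here, and the single-statement NOT IN form is an assumption of this sketch rather than the query from the answer above (MySQL does not allow a subquery on the table being updated, which is why the answer uses a JOIN). The function name mark_duplicates is made up for illustration.

```python
import sqlite3

def mark_duplicates(conn):
    # Flag every row whose ID is not the smallest ID for its phone number.
    # (SQLite syntax; hypothetical helper, not from the original answers.)
    conn.execute(
        """
        UPDATE customers
        SET DoNotBill = 'Yes'
        WHERE ID NOT IN (SELECT MIN(ID) FROM customers GROUP BY Phone)
        """
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(ID INTEGER PRIMARY KEY, Name TEXT, Phone TEXT, DoNotBill TEXT)"
)
# Sample rows from the question, all starting as 'No'.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, 'No')",
    [
        (1, "Acme Inc", "5125551212"),
        (2, "ABC LLC", "7138221661"),
        (3, "Big Inc", "4132229807"),
        (4, "Acme", "5125551212"),
        (5, "Tree Top", "2127657654"),
    ],
)

mark_duplicates(conn)
flags = dict(conn.execute("SELECT ID, DoNotBill FROM customers ORDER BY ID"))
print(flags)
```

Only record 4 should end up flagged, since record 1 has the lowest ID for the shared phone number.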
Table: user
id compId
1 comp1
2 comp1
Table: Company
id name
comp1 coke
comp2 pepsi
I need a MySQL query that fetches a company record only if the company has one or more users, when passed a company id. I will have other WHERE conditions on the company table.
Can this be achieved with joins?
Example 1: query(comp1) result: coke (at least one user exists)
Example 2: query(comp2) result: no records (since no user belongs to company comp2)
What you're asking for is called a semi-join. This returns one row from company if there are one or more matching rows in user.
If you use a regular join:
SELECT c.* FROM company c JOIN user u ON u.compid = c.id;
This does return the row from company, but you might not like that it returns one row per user. I.e. rows in the result are multiplied by the number of matches.
There are several possible fixes for this, to reduce the results to one row per company.
SELECT DISTINCT c.* FROM company c JOIN user u ON u.compid = c.id;
SELECT c.* FROM company c JOIN (SELECT DISTINCT compid FROM user) u ON u.compid = c.id;
SELECT * FROM company c WHERE c.id IN (SELECT compid FROM user);
SELECT * FROM company c WHERE EXISTS (SELECT * FROM user WHERE compid = c.id);
Which one is best for your app depends on many factors, such as the sizes of the tables, the other conditions in the query, etc... I'll leave it to you to evaluate them given your specific needs.
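To see the difference between the plain join and the EXISTS form on the sample data, here is a small sketch using Python's sqlite3 module. SQLite is used as a stand-in for MySQL, and the helper name company_with_users is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, compId TEXT);
    INSERT INTO company VALUES ('comp1', 'coke'), ('comp2', 'pepsi');
    INSERT INTO "user" VALUES (1, 'comp1'), (2, 'comp1');
    """
)

def company_with_users(conn, company_id):
    # Semi-join: return the company row at most once, and only if
    # at least one user references it.
    return conn.execute(
        """
        SELECT c.name
        FROM company c
        WHERE c.id = ?
          AND EXISTS (SELECT 1 FROM "user" u WHERE u.compId = c.id)
        """,
        (company_id,),
    ).fetchall()

# A regular join repeats the company once per matching user.
joined = conn.execute(
    'SELECT c.name FROM company c JOIN "user" u ON u.compId = c.id '
    "WHERE c.id = ?",
    ("comp1",),
).fetchall()

print(joined)
print(company_with_users(conn, "comp1"))
print(company_with_users(conn, "comp2"))
```

With two users in comp1, the join yields two rows while the EXISTS query yields one, and comp2 yields none.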
I have an issue with data redundancy. My JOIN query in MySQL creates a very large data set (~8 MB), and a lot of the data is redundant. After analysis, I can see that the query is fast, but the data transfer can take several seconds. What options do I have?
For example, say that I have the two tables
Users:
user_id user_name
1 Alex
2 Joe
And Purchases:
user_id purchase_id purchase_amount
1 A 100
2 B 200
1 C 300
1 D 400
If I simply LEFT JOIN the tables with
SELECT users.user_id, users.user_name, purchase_id, purchase_amount
FROM Users
LEFT JOIN purchases ON users.user_id = purchases.user_id
I will end up with a result:
user_id user_name purchase_id purchase_amount
1 Alex A 100
2 Joe B 200
1 Alex C 300
1 Alex D 400
However, as we can see, user_id 1 and user_name Alex appear in three rows. For very large result sets this can become an issue.
I'm thinking about using GROUP BY and GROUP_CONCAT to reduce the redundancy. Is this in general a good idea? My first tests seem to work, but I have to run SET SESSION group_concat_max_len = 1000000; in MySQL, which might not be a good thing since I don't know what value to set it to.
For example I could do something like
SELECT users.user_id, users.user_name, GROUP_CONCAT(CONCAT(purchase_id, ':', purchase_amount))
FROM Users
LEFT JOIN purchases ON users.user_id = purchases.user_id
GROUP BY users.user_id, users.user_name
And end up with a result:
user_id user_name GROUP_CONCAT...
1 Alex A:100,C:300,D:400
2 Joe B:200
Are there any other options for me? Is this the way to go? Parsing the concatenated column is not an issue. I am trying to solve the large data set being returned.
Could you use a temporary table in between?
Use Apache Spark's map-reduce to get the data into the desired format.
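For reference, the GROUP_CONCAT idea from the question can be tried end to end with Python's sqlite3 module. This is only a sketch under some assumptions: SQLite stands in for MySQL (so || replaces CONCAT), the unpack helper is hypothetical, and it assumes purchase ids never contain ':' or ','.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE purchases (
        user_id INTEGER, purchase_id TEXT, purchase_amount INTEGER
    );
    INSERT INTO users VALUES (1, 'Alex'), (2, 'Joe');
    INSERT INTO purchases VALUES
        (1, 'A', 100), (2, 'B', 200), (1, 'C', 300), (1, 'D', 400);
    """
)

# One row per user; purchases are packed into a single text column.
rows = conn.execute(
    """
    SELECT u.user_id, u.user_name,
           GROUP_CONCAT(p.purchase_id || ':' || p.purchase_amount) AS packed
    FROM users u
    LEFT JOIN purchases p ON u.user_id = p.user_id
    GROUP BY u.user_id, u.user_name
    ORDER BY u.user_id
    """
).fetchall()

def unpack(packed):
    # Turn 'A:100,C:300' back into {'A': 100, 'C': 300}.
    # A user without purchases yields NULL (None) from the LEFT JOIN.
    if packed is None:
        return {}
    result = {}
    for item in packed.split(","):
        purchase_id, amount = item.split(":")
        result[purchase_id] = int(amount)
    return result

purchases_by_user = {name: unpack(packed) for _, name, packed in rows}
print(purchases_by_user)
```

The order of items inside the packed column is not guaranteed, which is why the sketch unpacks into a dictionary instead of relying on position.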
I'm having trouble counting rows from a separate table. I'm only getting how many callers are making calls rather than the individual count of every call.
I have gone in and checked that most callers make multiple calls, but I'm not sure how to show this.
I'm looking for which companies have >18 calls.
Tables are:
Customer: Company_ref, Company_name, Contact_id, Address_1, Address_2
Caller: Caller_id, Company_ref, First_name, Last_name
Issue: Call_ref, Caller_id, Call_date, Detail
Query:
SELECT Company_name, Count(Call_ref)
from Customer JOIN Issue on (Contact_id = Caller_id)
Group by Company_name
An example of the outcome is
Affright Retail 5
Askew Inc. 5
Askew Shipping 6
Bai Services 2
Cell Group 5
Comfiture Traders 5
which only counts how many callers there are rather than how many calls were made.
This should work (MS SQL Server):
select a.Company_ref, count(b.Call_ref) as Calls from caller a
join Issue b on (a.Caller_id = b.Caller_id)
join Customer c on (a.Company_ref = c.Company_ref)
group by a.Company_ref
Adding a WHERE clause to find companies with more than 18 calls:
select * from (
select a.Company_ref, count(b.Call_ref) as Calls from caller a
join Issue b on (a.Caller_id = b.Caller_id)
join Customer c on (a.Company_ref = c.Company_ref)
group by a.Company_ref ) result
where Calls > 18
Caller_id refers to the Caller table, not the Customer table:
SELECT Y.Company_name, Count(I.Call_ref)
FROM Issue I
JOIN Caller C
ON I.Caller_id = C.Caller_id
JOIN Customer Y
ON C.Company_ref = Y.Company_ref
GROUP BY Y.Company_name
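The Issue to Caller to Customer join path can be checked on a tiny data set with Python's sqlite3 module. This is a sketch only: SQLite stands in for the real server, the sample rows and the busy_companies helper are invented for illustration, and HAVING is used here as an alternative to wrapping the query in an outer SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (Company_ref INTEGER PRIMARY KEY, Company_name TEXT);
    CREATE TABLE Caller (Caller_id INTEGER PRIMARY KEY, Company_ref INTEGER);
    CREATE TABLE Issue (Call_ref INTEGER PRIMARY KEY, Caller_id INTEGER);
    -- Hypothetical sample data: two companies, three callers, four calls.
    INSERT INTO Customer VALUES (1, 'Askew Inc.'), (2, 'Bai Services');
    INSERT INTO Caller VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO Issue VALUES (100, 10), (101, 10), (102, 11), (103, 20);
    """
)

def busy_companies(conn, min_calls):
    # Count calls (rows in Issue) per company and keep only companies
    # with more than min_calls calls.
    return conn.execute(
        """
        SELECT cu.Company_name, COUNT(i.Call_ref) AS calls
        FROM Issue i
        JOIN Caller ca ON i.Caller_id = ca.Caller_id
        JOIN Customer cu ON ca.Company_ref = cu.Company_ref
        GROUP BY cu.Company_ref, cu.Company_name
        HAVING COUNT(i.Call_ref) > ?
        ORDER BY cu.Company_name
        """,
        (min_calls,),
    ).fetchall()

print(busy_companies(conn, 0))
print(busy_companies(conn, 2))
```

For the question's requirement, the threshold passed in would be 18.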
I have the following tables:
Users
user_id course_id completion_rate
1 2 0.4
1 23 0.6
1 49 0.5
... ... ...
Courses
course_id title
1 Intro to Python
2 Intro to R
... ...
70 Intro to Flask
Each entry in the user table represents a course that the user took. However, it is rare that users have taken every course.
What I need is a result set with user_id, course_id, completion_rate. In the case that the user has taken the course, the existing completion_rate should be used, but if not then the completion_rate should be set to 0. That is, there would be 70 rows for each user_id, one for each course.
I don't have a lot of experience with SQL, and I'm not sure where to start. Would it be easier to do this in something like R?
Thank you.
You should first cross join the courses with the distinct users, then left join the Users table onto that to get the desired result. If the user hasn't taken a course, completion_rate will be null, and coalesce defaults it to 0.
select c.course_id,cu.user_id,coalesce(u.completion_rate,0) as completion_rate
from courses c
cross join (select distinct user_id from users) cu
left join users u on u.course_id=c.course_id and cu.user_id=u.user_id
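The cross join plus left join can be run on a reduced data set with Python's sqlite3 module to see the zero-filling in action. This is a sketch with assumptions: SQLite stands in for the actual database, and the sample uses three courses and two users instead of the 70 courses from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER, course_id INTEGER, completion_rate REAL);
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY, title TEXT);
    -- Reduced, hypothetical sample data.
    INSERT INTO courses VALUES
        (1, 'Intro to Python'), (2, 'Intro to R'), (3, 'Intro to Flask');
    INSERT INTO users VALUES (1, 2, 0.4), (2, 1, 0.9);
    """
)

# Every (user, course) pair appears once; missing pairs get 0.
rows = conn.execute(
    """
    SELECT cu.user_id, c.course_id,
           COALESCE(u.completion_rate, 0) AS completion_rate
    FROM courses c
    CROSS JOIN (SELECT DISTINCT user_id FROM users) cu
    LEFT JOIN users u
           ON u.course_id = c.course_id AND u.user_id = cu.user_id
    ORDER BY cu.user_id, c.course_id
    """
).fetchall()

for row in rows:
    print(row)
```

The result has (number of users) x (number of courses) rows, here 2 x 3 = 6.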
Step 1: Take the distinct client_id values from the client data (abc) and join them with the course data (abc1) on 1=1. Joining on 1=1 pairs every course with each client_id.
Step 2: Left join the above dataset with the client data on client_id and course.
create table ans as
select p.*,case when q.completion_rate is not null then q.completion_rate else 0
end as completion_rate
from
(
select a.client_id,b.course from
(select distinct client_id from abc) a
left join
abc1 b
on 1=1
) p
left join
abc q
on p.client_id = q.client_id and p.course = q.course
order by client_id,course;
Let me know in case of any queries.
I am trying to get a list of companies in our db whose account_level is either 'basic' or 'full'. Our clients are account_level = 'enterprise'. In many cases, an enterprise client will also have rows in the db from when they were basic or full, so I want to exclude any basic/full companies which also have enterprise rows (i.e. I want to exclude our current clients). This way, I can get a list of companies that are strictly basic or full and aren't our clients yet.
Here's an example of the company table:
1 company a basic
2 company a full
3 company b basic
4 company b enterprise
5 company c basic
I want the query to return companies a and c.
I am trying to use:
SELECT *
FROM company c1
INNER JOIN company c2 ON c1.id=c2.id
WHERE c1.company NOT IN (SELECT c2.company FROM company c2
WHERE account_level = 'enterprise')
AND c1.account_level IN ('full', 'basic')
ORDER BY c1.company;
but I get no results. Can somebody see what I am doing wrong? Sorry, I'm not too experienced yet in MySQL. Thanks for your help.
You can get the desired result using a combination of EXISTS and NOT EXISTS:
SELECT DISTINCT c1.company
FROM company c1
WHERE EXISTS (SELECT 1
FROM company AS c2
WHERE c1.company = c2.company AND c2.account_level IN ('full', 'basic'))
AND
NOT EXISTS (SELECT 1
FROM company AS c3
WHERE c1.company = c3.company AND c3.account_level IN ('enterprise'))
Or, even simpler:
SELECT DISTINCT c1.company
FROM company c1
WHERE c1.account_level IN ('full', 'basic')
AND
NOT EXISTS (SELECT 1
FROM company AS c2
WHERE c1.company = c2.company AND c2.account_level IN ('enterprise'))
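To check the simpler NOT EXISTS form against the sample rows from the question, here is a short sketch using Python's sqlite3 module. SQLite is assumed as a stand-in for MySQL, and the variable name prospects is just illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (id INTEGER PRIMARY KEY, company TEXT, account_level TEXT);
    INSERT INTO company VALUES
        (1, 'company a', 'basic'),
        (2, 'company a', 'full'),
        (3, 'company b', 'basic'),
        (4, 'company b', 'enterprise'),
        (5, 'company c', 'basic');
    """
)

# Companies with a basic/full row and no enterprise row at all.
prospects = [
    name
    for (name,) in conn.execute(
        """
        SELECT DISTINCT c1.company
        FROM company c1
        WHERE c1.account_level IN ('full', 'basic')
          AND NOT EXISTS (SELECT 1
                          FROM company c2
                          WHERE c2.company = c1.company
                            AND c2.account_level = 'enterprise')
        ORDER BY c1.company
        """
    )
]
print(prospects)
```

Company b is excluded because it also has an enterprise row, leaving companies a and c as in the question.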