I have two tables, contacts and calllist. contacts has multiple columns containing phone numbers. calllist has only one column, from_number, containing phone numbers. I'm trying to get all phone numbers from the column from_number that do not match any phone number in the table contacts.
Here is my working but probably very inefficient and slow SQL query:
SELECT from_number AS phone_number, COUNT(from_number) AS number_of_calls
FROM calllist
WHERE from_number NOT IN (
SELECT businessPhone1
FROM contacts
WHERE businessPhone1 IS NOT NULL
)
AND from_number NOT IN (
SELECT businessPhone2
FROM contacts
WHERE businessPhone2 IS NOT NULL
)
AND from_number NOT IN (
SELECT homePhone1
FROM contacts
WHERE homePhone1 IS NOT NULL
)
AND from_number NOT IN (
SELECT homePhone2
FROM contacts
WHERE homePhone2 IS NOT NULL
)
AND from_number NOT IN (
SELECT mobilePhone
FROM contacts
WHERE mobilePhone IS NOT NULL
)
AND (received_at BETWEEN '$startDate' AND DATE_ADD('$endDate', INTERVAL 1 DAY))
GROUP BY phone_number
ORDER BY number_of_calls DESC
LIMIT 10
How do I rewrite this SQL query to be faster? Any help would be much appreciated.
Try this. It combines the five subqueries into a single NOT IN over a UNION:
SELECT from_number AS phone_number, COUNT(from_number) AS number_of_calls
FROM calllist
WHERE from_number NOT IN (
SELECT businessPhone1
FROM contacts
WHERE businessPhone1 IS NOT NULL
UNION
SELECT businessPhone2
FROM contacts
WHERE businessPhone2 IS NOT NULL
UNION
SELECT homePhone1
FROM contacts
WHERE homePhone1 IS NOT NULL
UNION
SELECT homePhone2
FROM contacts
WHERE homePhone2 IS NOT NULL
UNION
SELECT mobilePhone
FROM contacts
WHERE mobilePhone IS NOT NULL
)
AND (received_at BETWEEN '$startDate' AND DATE_ADD('$endDate', INTERVAL 1 DAY))
GROUP BY phone_number
ORDER BY number_of_calls DESC
LIMIT 10
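A note on why the IS NOT NULL filters in each branch matter: if the subquery returns even one NULL, NOT IN can never evaluate to true, and the outer query returns no rows at all. Here is a minimal sketch of that behaviour, using Python's sqlite3 module as a stand-in for MySQL. The two-column contacts table and the sample rows are made up for illustration; the NULL handling itself is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (businessPhone1 TEXT, mobilePhone TEXT);
    CREATE TABLE calllist (from_number TEXT);
    -- Hypothetical sample rows
    INSERT INTO contacts VALUES ('111', NULL), (NULL, '222');
    INSERT INTO calllist VALUES ('111'), ('333'), ('333');
""")

def unknown_callers(filter_nulls):
    # Build the UNION subquery with or without the IS NOT NULL filters.
    where = "WHERE {col} IS NOT NULL" if filter_nulls else ""
    branches = " UNION ".join(
        f"SELECT {col} FROM contacts " + where.format(col=col)
        for col in ("businessPhone1", "mobilePhone")
    )
    sql = f"""
        SELECT from_number, COUNT(*) AS number_of_calls
        FROM calllist
        WHERE from_number NOT IN ({branches})
        GROUP BY from_number
        ORDER BY number_of_calls DESC
    """
    return conn.execute(sql).fetchall()

with_filter = unknown_callers(True)      # NULLs removed from the subquery
without_filter = unknown_callers(False)  # a NULL reaches NOT IN
print(with_filter)     # [('333', 2)]
print(without_filter)  # []
```

So if you ever drop those filters, expect an empty result rather than an error.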
I don't like the schema design. You have multiple columns holding 'identical' data -- namely phone numbers. What if technology advances and you need a 6th phone number??
Instead, have a separate table of phone numbers, with a linkage (id) back to contacts, that you can JOIN to calllist. That gets rid of all the slow NOT IN ( SELECT... ), avoids a messy UNION ALL, etc.
If you desire, the new table could have a 3rd column that says which type of phone it is.
ENUM('unknown', 'company', 'home', 'mobile')
The simplified query goes something like
SELECT cl.from_number AS phone_number,
COUNT(*) AS number_of_calls
FROM calllist AS cl
LEFT JOIN phonenums AS pn ON pn.number = cl.from_number
WHERE cl.received_at >= '$startDate'
AND cl.received_at < '$endDate' + INTERVAL 1 DAY
AND pn.number IS NULL -- not found in phonenums
GROUP BY phone_number
ORDER BY number_of_calls DESC
LIMIT 10
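To make the separate-table idea concrete, here is a rough sketch, again with Python's sqlite3 standing in for MySQL. The column names contact_id and type, the index name, and the sample rows are assumptions for illustration, not part of the original schema. The type column is plain TEXT because SQLite has no ENUM, and date(:end_date, '+1 day') plays the role of '$endDate' + INTERVAL 1 DAY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized layout: one row per phone number
    CREATE TABLE phonenums (
        contact_id INTEGER NOT NULL,
        number     TEXT    NOT NULL,
        type       TEXT    NOT NULL DEFAULT 'unknown'
    );
    CREATE INDEX idx_phonenums_number ON phonenums (number);
    CREATE TABLE calllist (from_number TEXT, received_at TEXT);

    INSERT INTO phonenums VALUES (1, '111', 'company'), (1, '222', 'mobile');
    INSERT INTO calllist VALUES
        ('111', '2020-01-05 09:00:00'),
        ('333', '2020-01-05 10:00:00'),
        ('333', '2020-01-06 11:00:00'),
        ('444', '2020-01-06 12:00:00'),
        ('444', '2020-02-01 08:00:00');  -- outside the date range
""")

sql = """
    SELECT cl.from_number AS phone_number, COUNT(*) AS number_of_calls
    FROM calllist AS cl
    LEFT JOIN phonenums AS pn ON pn.number = cl.from_number
    WHERE cl.received_at >= :start_date
      AND cl.received_at <  date(:end_date, '+1 day')
      AND pn.number IS NULL            -- no matching contact number
    GROUP BY cl.from_number
    ORDER BY number_of_calls DESC
    LIMIT 10
"""
params = {"start_date": "2020-01-01", "end_date": "2020-01-31"}
rows = conn.execute(sql, params).fetchall()
print(rows)  # [('333', 2), ('444', 1)]
```

The sketch passes the dates as bound parameters rather than interpolating them into the SQL string, which is usually the safer habit.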
Related
I am fetching data from MySQL views and a main table. I have created indexes and primary keys on the main table, but I cannot create indexes or primary keys on the views.
When I execute the query below, it takes around 10 seconds. I want to reduce that time.
SELECT DISTINCT
`Emp_No`, `Name`
FROM
`ResLookup`
WHERE
`IsActive` = 1
AND `Department` IN ('SDG' , 'HDD', 'ENG', 'PDN')
AND (`Emp_No` IN (SELECT DISTINCT
ProjList.PM_No
FROM
ProjList
WHERE
ProjList.PM_No != 1749 UNION SELECT DISTINCT
ProjList.PL_No
FROM
ProjList
WHERE
ProjList.PL_No != 1749)
OR Emp_No IN (SELECT
MEMBER_ID
FROM
s_group_details
WHERE
GROUP_ID = 'GRP109'
AND MEMBERSHIP_LEVEL = 30));
Only the s_group_details table has indexes and a primary key. All the remaining tables are views.
Using Explain Query I have the below output
I don't know your query requirements, but check whether the query below helps:
SELECT DISTINCT
`Emp_No`, `Name`
FROM
`ResLookup` inner join (SELECT DISTINCT
ProjList.PM_No ,ProjList.PL_No
FROM
ProjList
WHERE
ProjList.PM_No != 1749
or
ProjList.PL_No != 1749) a
on ResLookup.Emp_No = a.PM_No
and ResLookup.Emp_No = a.PL_No
OR Emp_No IN (SELECT
MEMBER_ID
FROM
s_group_details
WHERE
GROUP_ID = 'GRP109'
AND MEMBERSHIP_LEVEL = 30)
WHERE
`IsActive` = 1
AND `Department` IN ('SDG' , 'HDD', 'ENG', 'PDN');
It may be better to turn things somewhat inside-out:
SELECT r.`Emp_No`, r.`Name`
FROM
( ( SELECT PM_No AS Emp_No FROM ProjList WHERE PM_No != 1749 )
UNION DISTINCT
( SELECT PL_No FROM ProjList WHERE PL_No != 1749 )
UNION DISTINCT
( SELECT MEMBER_ID
FROM s_group_details AS d
WHERE d.GROUP_ID = 'GRP109'
AND d.MEMBERSHIP_LEVEL = 30
)
) AS u
JOIN `ResLookup` AS r ON u.Emp_No = r.Emp_No
WHERE r.`IsActive` = 1
AND r.`Department` IN ('SDG' , 'HDD', 'ENG', 'PDN');
Indexes needed:
ResLookup: (Emp_No, IsActive, Department)
s_group_details: (GROUP_ID, MEMBERSHIP_LEVEL, MEMBER_ID)
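If you want to convince yourself that the union-then-join shape returns the same rows as the original IN form, a small check along these lines may help. It is only a sketch: sqlite3 stands in for MySQL, the views are replaced by plain tables, the sample rows are invented, and it assumes Emp_No is unique in ResLookup (otherwise keep DISTINCT on the outer SELECT). SQLite does not accept parentheses around the UNION branches, so they are left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ResLookup (Emp_No INTEGER, Name TEXT, IsActive INTEGER, Department TEXT);
    CREATE TABLE ProjList (PM_No INTEGER, PL_No INTEGER);
    CREATE TABLE s_group_details (MEMBER_ID INTEGER, GROUP_ID TEXT, MEMBERSHIP_LEVEL INTEGER);
    -- Hypothetical sample rows
    INSERT INTO ResLookup VALUES
        (1, 'Ann', 1, 'SDG'), (2, 'Bob', 1, 'HDD'), (3, 'Cy', 0, 'ENG'),
        (4, 'Dee', 1, 'PDN'), (5, 'Eve', 1, 'OPS'), (1749, 'Sys', 1, 'SDG');
    INSERT INTO ProjList VALUES (1, 2), (1, 1749), (1749, 3), (5, 2);
    INSERT INTO s_group_details VALUES (4, 'GRP109', 30), (2, 'GRP109', 10);
""")

# The question's shape: filter ResLookup with two IN subqueries.
in_form = """
    SELECT DISTINCT Emp_No, Name FROM ResLookup
    WHERE IsActive = 1 AND Department IN ('SDG', 'HDD', 'ENG', 'PDN')
      AND (Emp_No IN (SELECT PM_No FROM ProjList WHERE PM_No != 1749
                      UNION SELECT PL_No FROM ProjList WHERE PL_No != 1749)
           OR Emp_No IN (SELECT MEMBER_ID FROM s_group_details
                         WHERE GROUP_ID = 'GRP109' AND MEMBERSHIP_LEVEL = 30))
    ORDER BY Emp_No
"""
# The inside-out shape: build the id list first, then join to ResLookup.
join_form = """
    SELECT r.Emp_No, r.Name
    FROM (SELECT PM_No AS Emp_No FROM ProjList WHERE PM_No != 1749
          UNION SELECT PL_No FROM ProjList WHERE PL_No != 1749
          UNION SELECT MEMBER_ID FROM s_group_details
                WHERE GROUP_ID = 'GRP109' AND MEMBERSHIP_LEVEL = 30) AS u
    JOIN ResLookup AS r ON r.Emp_No = u.Emp_No
    WHERE r.IsActive = 1 AND r.Department IN ('SDG', 'HDD', 'ENG', 'PDN')
    ORDER BY r.Emp_No
"""
in_rows = conn.execute(in_form).fetchall()
join_rows = conn.execute(join_form).fetchall()
print(in_rows == join_rows, join_rows)
```

Whether the join form is actually faster on your data is something only EXPLAIN on the real tables can tell you.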
I have the following query -
(SELECT entry_by_id, description,
issue_date AS issue_date,
bill_type_id AS account_type,
amount
FROM ac_bills
WHERE issue_date BETWEEN '2015-08-01' and '2015-08-31')
UNION
(select officer_id AS entry_by_id, description,
tr_date AS issue_date,
account_type_id AS account_type,
amount
FROM ac_transactions
WHERE ac_transactions.tr_date BETWEEN '2015-08-01' AND '2015-08-31'
AND ac_transactions.account_type_id
IN (SELECT id FROM account_types WHERE type_of_nature=2))
ORDER BY issue_date DESC, account_type ASC
As you can see -
1. The account_types table (in the subquery) has the column type_of_nature, which tells me what type of transaction an item is (1=income and 2=expense)
2. I am trying to pull all expenses from two tables - ac_bills and ac_transactions. ac_bills only holds expenses while the ac_transactions can hold any type of transactions (income/expense) and we know what type the transaction is by the account_types table reference.
The issue - the query is pulling all types of transactions from the ac_transactions table when, in theory, it should pull only expenses. Funny thing is - when I run the select query on the ac_transactions alone without the union part, then it successfully pulls only expenses.
I am pulling my hair out! Can anyone see what I am doing wrong?
Can you run this... not sure that it's right, but it seems cleaner at least
(
SELECT entry_by_id
, description
, issue_date
, bill_type_id account_type
, amount
FROM ac_bills
where issue_date between '2015-08-01' and '2015-08-31'
)
union
(
select officer_id entry_by_id
, description
,tr_date
, account_type_id
, amount
from ac_transactions
join account_types
ON account_types.id = ac_transactions.account_type_id
where ac_transactions.tr_date between '2015-08-01' and '2015-08-31'
and account_types.type_of_nature = 2
)
ORDER
BY issue_date DESC
, account_type ASC;
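One more thing to watch with this kind of query: UNION removes duplicate rows, so two genuinely separate expenses that happen to have identical values in all five columns collapse into one, while UNION ALL keeps both. Below is a small sketch of the join-based filter with both set operators, using sqlite3 as a stand-in for MySQL. The sample rows are invented, and the parentheses around the branches are dropped because SQLite does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_types (id INTEGER PRIMARY KEY, type_of_nature INTEGER);
    CREATE TABLE ac_bills (entry_by_id INTEGER, description TEXT, issue_date TEXT,
                           bill_type_id INTEGER, amount REAL);
    CREATE TABLE ac_transactions (officer_id INTEGER, description TEXT, tr_date TEXT,
                                  account_type_id INTEGER, amount REAL);
    -- Hypothetical sample rows: type 10 is income (1), type 20 is expense (2)
    INSERT INTO account_types VALUES (10, 1), (20, 2);
    INSERT INTO ac_bills VALUES (1, 'rent', '2015-08-03', 20, 500.0);
    INSERT INTO ac_transactions VALUES
        (2, 'sale',   '2015-08-04', 10, 900.0),   -- income, must be excluded
        (2, 'coffee', '2015-08-05', 20, 3.5),
        (2, 'coffee', '2015-08-05', 20, 3.5);     -- identical second expense
""")

def expenses(set_op):
    # set_op is either "UNION" or "UNION ALL"
    sql = f"""
        SELECT entry_by_id, description, issue_date,
               bill_type_id AS account_type, amount
        FROM ac_bills
        WHERE issue_date BETWEEN '2015-08-01' AND '2015-08-31'
        {set_op}
        SELECT t.officer_id, t.description, t.tr_date, t.account_type_id, t.amount
        FROM ac_transactions AS t
        JOIN account_types AS a ON a.id = t.account_type_id
        WHERE t.tr_date BETWEEN '2015-08-01' AND '2015-08-31'
          AND a.type_of_nature = 2
        ORDER BY issue_date DESC, account_type ASC
    """
    return conn.execute(sql).fetchall()

deduplicated = expenses("UNION")      # the two identical coffee rows collapse
all_rows = expenses("UNION ALL")      # both coffee rows are kept
print(len(deduplicated), len(all_rows))  # 2 3
```

In both cases the income row is filtered out by the join on account_types.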
I have a table CONTACT with a field opt_out.
The field opt_out may have values 'Y', 'N' and NULL.
I have a table CONTACT_AUDIT with fields
date
contact_id
field_name
value_before
value_after
When I add a new contact, a new line is added to the CONTACT table and nothing is added to the CONTACT_AUDIT table.
When I edit a contact, for example if I change the opt_out field value from NULL to 'Y', the opt_out field value in CONTACT table is changed and a new line is added to CONTACT_AUDIT table with values
date=NOW()
contact_id=<my contact's id>
field_name='opt_out'
value_before=NULL
value_after='Y'
I need to know the contacts who had opt_out='Y' at a given date.
I tried this :
SELECT count(*) AS nb
FROM contacts c
WHERE
( -- contact is optout now and has never been modified before
c.optout = 'Y'
AND c.id NOT IN (SELECT DISTINCT contact_id FROM contacts_audit WHERE field_name = 'optout')
)
OR ( -- we consider contacts where the last row before date in contacts_audit is optout = 'Y'
c.id IN (
SELECT ca.contact_id
FROM contacts_audit ca
WHERE date_created BETWEEN '2014-07-24' AND DATE_ADD( '2014-07-24', INTERVAL 1 DAY )
AND field_name = 'optout'
ORDER BY date_created
LIMIT 1
)
)
But MySQL does not support LIMIT in an IN subquery.
So I tried with HAVING :
SELECT count(*) AS nb
FROM contacts c
WHERE
( -- contact is optout now and has never been modified before
c.optout = 'Y'
AND c.id NOT IN (SELECT DISTINCT contact_id FROM contacts_audit WHERE field_name = 'optout')
)
OR ( -- we consider contacts where the last row before date in contacts_audit is optout = 'Y'
c.id IN (
SELECT ca.contact_id
FROM contacts_audit ca
WHERE date_created BETWEEN '2014-07-24' AND DATE_ADD( '2014-07-24', INTERVAL 1 DAY )
AND field_name = 'optout'
HAVING MAX(date_created)
)
)
The query runs, but now I don't know how to tell whether the value corresponding to the subquery row is 'Y' or 'N'. If I add a WHERE clause to check only for 'Y' values, the 'N' values will be filtered out and I will not be able to know whether the last value at that date was 'Y' or 'N'...
Thank you for your help
If I understand your problem correctly, you may want to use a UNION. I don't have MySQL to test it right now, but the code could be something like this. Tell me if this helped.
select c.id, c.optout
from contacts c
where c.optout = 'Y'
AND c.id NOT IN (SELECT DISTINCT contact_id FROM contacts_audit WHERE field_name = 'optout')
UNION
select c.id, c.optout
from contacts c
where c.id IN (
SELECT ca.contact_id
FROM contacts_audit ca
WHERE date_created BETWEEN '2014-07-24' AND DATE_ADD( '2014-07-24', INTERVAL 1 DAY )
AND field_name = 'optout'
HAVING MAX(date_created)
)
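A different way to look at the problem, offered as a sketch and not as the answer above: reconstruct each contact's value at the end of the given day from the audit trail. If the contact was changed on or before that day, the latest value_after applies; if it was only changed later, the earliest value_before applies; if it was never changed, the current value applies. The version below uses sqlite3 as a stand-in for MySQL with invented sample rows. It assumes the audit table has the value_before and value_after columns described in the question, that every contact already existed on that day (the schema has no creation date), and it relies on LIMIT inside scalar subqueries, which as far as I know MySQL accepts (the restriction applies to IN/ALL/ANY/SOME subqueries). In MySQL the cutoff would be written with DATE_ADD instead of date(:day, '+1 day').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, optout TEXT);
    CREATE TABLE contacts_audit (
        date_created TEXT, contact_id INTEGER, field_name TEXT,
        value_before TEXT, value_after TEXT
    );
    -- Hypothetical sample rows
    INSERT INTO contacts VALUES (1, 'Y'), (2, 'N'), (3, 'Y'), (4, 'N'), (5, 'N');
    INSERT INTO contacts_audit VALUES
        ('2014-07-20 09:00:00', 2, 'optout', NULL, 'Y'),
        ('2014-07-30 09:00:00', 2, 'optout', 'Y',  'N'),
        ('2014-07-28 09:00:00', 3, 'optout', 'N',  'Y'),
        ('2014-07-24 10:00:00', 5, 'optout', 'Y',  'N');
""")

sql = """
    SELECT c.id
    FROM contacts AS c
    WHERE CASE
        -- 1. Changed on or before the day: take the latest value_after.
        WHEN EXISTS (SELECT 1 FROM contacts_audit AS a
                     WHERE a.contact_id = c.id AND a.field_name = 'optout'
                       AND a.date_created < date(:day, '+1 day'))
        THEN (SELECT a.value_after FROM contacts_audit AS a
              WHERE a.contact_id = c.id AND a.field_name = 'optout'
                AND a.date_created < date(:day, '+1 day')
              ORDER BY a.date_created DESC LIMIT 1)
        -- 2. Only changed later: take the earliest value_before.
        WHEN EXISTS (SELECT 1 FROM contacts_audit AS a
                     WHERE a.contact_id = c.id AND a.field_name = 'optout')
        THEN (SELECT a.value_before FROM contacts_audit AS a
              WHERE a.contact_id = c.id AND a.field_name = 'optout'
              ORDER BY a.date_created ASC LIMIT 1)
        -- 3. Never changed: the current value applies.
        ELSE c.optout
    END = 'Y'
    ORDER BY c.id
"""
opted_out_ids = [row[0] for row in conn.execute(sql, {"day": "2014-07-24"})]
print(opted_out_ids)  # [1, 2]
```

In the sample data, contact 1 was never edited and is 'Y' today, and contact 2 was switched to 'Y' on 2014-07-20 and back to 'N' only on 2014-07-30, so both count as opted out on 2014-07-24.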
Let's say we have a table named record with 4 fields
id (INT 11 AUTO_INC)
email (VAR 50)
timestamp (INT 11)
status (INT 1)
And the table contains following data
Now we can see that the email address test#example.com was duplicated 4 times (the record with the lowest timestamp is the original one and all copies after that are duplicates). I can easily count the number of unique records using
SELECT COUNT(DISTINCT email) FROM record
I can also easily find out which email address was duplicated how many times using
SELECT email, count(id) FROM record GROUP BY email HAVING COUNT(id)>1
But now the business question is
How many times STATUS was 1 on all the Duplicate Records?
For example:
For test#example.com there was no duplicate record having status 1
For second#example.com there was 1 duplicate record having status 1
For third#example.com there was 1 duplicate record having status 1
For four#example.com there was no duplicate record having status 1
For five#example.com there were 2 duplicate records having status 1
So the sum of all the numbers is 0 + 1 + 1 + 0 + 2 = 4
Which means there were 4 duplicate records in the table which had status = 1
Question
How many Duplicate records have status = 1 ?
This is a new solution that works better. It removes the first entry for each email and then counts the rest. It's not easy to read, and if possible I would write this in a stored procedure, but it works. (In the fiddle the table is named dude and the timestamp column is ts.)
select sum(status)
from dude d1
join (select email,
min(ts) as ts
from dude
group by email) mins
using (email)
where d1.ts != mins.ts;
sqlfiddle
original answer below
Your own query to find "which email address was duplicated how many times using"
SELECT email,
count(id) as duplicates
FROM record
GROUP BY email
HAVING COUNT(id)>1
can easily be modified to answer "How many Duplicate records have status = 1"
SELECT email,
count(id) as duplicates_status_sum
FROM record
WHERE status = 1
GROUP BY email
HAVING COUNT(id)>1
Both of these queries count the original line as well, so the result is really "duplicates including the original one". You can subtract 1 from the sums if the original one always has status 1.
SELECT email,
count(id) -1 as true_duplicates
FROM record
GROUP BY email
HAVING COUNT(id)>1
SELECT email,
count(id) -1 as true_duplicates_status_sum
FROM record
WHERE status = 1
GROUP BY email
HAVING COUNT(id)>1
If I understand correctly, your query should be
SELECT `email` , COUNT( `id` ) AS `tot`
FROM `record` , (
SELECT `email` AS `emt` , MIN( `timestamp` ) AS `mtm`
FROM `record`
GROUP BY `email`
) AS `temp`
WHERE `email` = `emt`
AND `timestamp` > `mtm`
AND `status` =1
GROUP BY `email`
HAVING COUNT( `id` ) >=1
First we need to get the minimum timestamp and then find duplicate records that are inserted after this timestamp and having status 1.
If you want the total sum then the query is
SELECT SUM( `tot` ) AS `duplicatesWithStatus1`
FROM (
SELECT `email` , COUNT( `id` ) AS `tot`
FROM `record` , (
SELECT `email` AS `emt` , MIN( `timestamp` ) AS `mtm`
FROM `record`
GROUP BY `email`
) AS `temp`
WHERE `email` = `emt`
AND `timestamp` > `mtm`
AND `status` =1
GROUP BY `email`
HAVING COUNT( `id` ) >=1
) AS t
Hope this is what you want
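The "skip the earliest row per email, then count status = 1" idea can be checked against the numbers in the question with a small script. This is only a sketch: sqlite3 stands in for MySQL, and since the question's table data is not shown here, the rows below are invented so that they match the per-email counts listed in the question (0, 1, 1, 0, 2). It also assumes timestamps are unique per email; if two rows tie for the lowest timestamp, both are treated as originals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE record (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        email TEXT, timestamp INTEGER, status INTEGER
    )
""")
# Hypothetical rows: (email, timestamp, status); lowest timestamp is the original.
sample = [
    ("test#example.com", 1, 1), ("test#example.com", 2, 0),
    ("test#example.com", 3, 0), ("test#example.com", 4, 0),
    ("test#example.com", 5, 0),
    ("second#example.com", 1, 0), ("second#example.com", 2, 1),
    ("third#example.com", 1, 1), ("third#example.com", 2, 1),
    ("third#example.com", 3, 0),
    ("four#example.com", 1, 1), ("four#example.com", 2, 0),
    ("five#example.com", 1, 0), ("five#example.com", 2, 1),
    ("five#example.com", 3, 1),
]
conn.executemany(
    "INSERT INTO record (email, timestamp, status) VALUES (?, ?, ?)", sample
)

# Join every row to its email's minimum timestamp and keep later rows only.
base = """
    FROM record AS r
    JOIN (SELECT email, MIN(timestamp) AS mtm
          FROM record GROUP BY email) AS o ON o.email = r.email
    WHERE r.timestamp > o.mtm AND r.status = 1
"""
per_email = conn.execute(
    "SELECT r.email, COUNT(r.id) AS tot" + base + "GROUP BY r.email ORDER BY r.email"
).fetchall()
total = conn.execute("SELECT COUNT(*)" + base).fetchone()[0]
print(per_email)
print(total)  # 4
```

Emails whose duplicates never had status 1 (test# and four# in this data) simply do not appear in the per-email list.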
You can get the count of duplicate records that have status = 1 with
select count(*) as Duplicate_Record_Count
from (select *
from record r
where r.status=1
group by r.email,r.status
having count(r.email)>1 ) t1
The following query will return the duplicate email with status 1 count and timestamp
select r.email,count(*)-1 as Duplicate_Count,min(r.timestamp) as timestamp
from record r
where r.status=1
group by r.email
having count(r.email)>1
I have a query
select user_id,sum(hours),date, task_id from table where used_id = 'x' and date >='' and date<= '' group by user_id, date, task_id with rollup
The query works fine. But I also need to find a second sum(hours) where the group by order is changed.
select user_id,sum(hours),date, task_id from table where used_id = 'x' group by user_id,task_id
(The actual where condition is much longer.)
Is it possible to get both sums in a single query, since the where condition is almost the same?
SELECT * FROM (
SELECT 1 AS list_id
, user_id
, sum(hours) AS total_hours
, `date`
, task_id
FROM table WHERE used_id = 'x' AND `date` BETWEEN @thisdate AND @thatdate
GROUP BY user_id, `date`, task_id /*WITH ROLLUP*/
UNION ALL
SELECT 2 AS list_id
, user_id
, sum(hours) AS total_hours
, `date`
, task_id
FROM table
WHERE used_id = 'x'
GROUP BY user_id,task_id WITH ROLLUP ) q
/*ORDER BY q.list_id, q.user_id, q.`date`, q.task_id*/
Depending on your needs, you may want WITH ROLLUP on only one of the two queries, or on both.
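Here is a rough, runnable sketch of the same UNION ALL idea, with a list_id column marking which grouping each row belongs to. It uses sqlite3 as a stand-in for MySQL, so WITH ROLLUP (a MySQL modifier) is left out. The table name timesheet, the column user_id in the WHERE clause, and the sample rows are assumptions for illustration. The second list returns NULL for the date, since it is not grouped by date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and rows
    CREATE TABLE timesheet (user_id TEXT, task_id INTEGER, date TEXT, hours REAL);
    INSERT INTO timesheet VALUES
        ('x', 1, '2020-01-01', 2.0),
        ('x', 1, '2020-01-01', 1.0),
        ('x', 1, '2020-01-02', 4.0),
        ('x', 2, '2020-01-02', 3.0),
        ('y', 1, '2020-01-01', 8.0);
""")

sql = """
    -- List 1: hours per user, day and task, within the date range
    SELECT 1 AS list_id, user_id, date, task_id, SUM(hours) AS total_hours
    FROM timesheet
    WHERE user_id = :user AND date BETWEEN :first_day AND :last_day
    GROUP BY user_id, date, task_id
    UNION ALL
    -- List 2: hours per user and task, no date filter (as in the question)
    SELECT 2, user_id, NULL, task_id, SUM(hours)
    FROM timesheet
    WHERE user_id = :user
    GROUP BY user_id, task_id
    ORDER BY list_id, task_id, date
"""
params = {"user": "x", "first_day": "2020-01-01", "last_day": "2020-01-31"}
rows = conn.execute(sql, params).fetchall()
for row in rows:
    print(row)
```

The application can then split the result on list_id to get the two sets of sums from one round trip.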