Is there a better way to do this query? - mysql

I'm fairly new to MySQL. This query works, but I wonder if there's a more efficient way.
I have 2 tables: client_table and client_skills. Both tables have a 'client_id' column. I'm trying to select only the clients that match all of the requested skill codes and types.
This is what I first tried (it doesn't work unless you change AND to OR):
SELECT * FROM client_table
LEFT JOIN client_skills ON client_table.client_id = client_skills.client_id
WHERE (skill_code='97' AND skill_type='0')
AND (skill_code='65' AND skill_type='0')
AND (skill_code='23' AND skill_type='5')
Then I tried this (which works and returns the results I want):
SELECT * FROM client_table
WHERE client_id IN
(SELECT client_id FROM client_skills WHERE (skill_code='97' AND skill_type='0'))
AND client_id IN
(SELECT client_id FROM client_skills WHERE (skill_code='65' AND skill_type='0'))
AND client_id IN
(SELECT client_id FROM client_skills WHERE (skill_code='23' AND skill_type='5'))
But if there is a shorter or more efficient way to do that, I'd love to learn it.
Thanks for your help

Your first query, modified to use OR, is not bad. You can also do it with UNION:
SELECT client_id FROM client_skills WHERE (skill_code='97' AND skill_type='0')
UNION
SELECT client_id FROM client_skills WHERE (skill_code='65' AND skill_type='0')
UNION
SELECT client_id FROM client_skills WHERE (skill_code='23' AND skill_type='5')
EDITED
You can accomplish it with something like this. A self join is required (not tested on your data):
SELECT t1.client_id
FROM client_skills t1 JOIN client_skills t2 ON t1.client_id = t2.client_id
AND t1.skill_code='97' AND t1.skill_type='0'
AND t2.skill_code='65' AND t2.skill_type='0'
JOIN client_skills t3 ON t3.client_id = t1.client_id
AND t3.skill_code='23' AND t3.skill_type='5'

Is this what you're asking for, since you wanted to use AND instead of OR?
sqlFiddle example: in my example only Jeff has all 3 skills (of that skill_code and skill_type).
SELECT c.*
FROM client_table c,
client_skills s1,
client_skills s2,
client_skills s3
WHERE
c.client_id = s1.client_id AND s1.skill_code=97 AND s1.skill_type=0
AND c.client_id = s2.client_id AND s2.skill_code=65 AND s2.skill_type=0
AND c.client_id = s3.client_id AND s3.skill_code=23 AND s3.skill_type=5
It's the same logic as your second query.

The second one is slower than the first since you are running 4 queries, so between those two examples just go with the first one, with OR instead of AND.
There may be better or faster queries, but that depends on your requirements and your tables' indexes.
For example, do you really need skill_type? Can you have two skills with the same code and different types? If all skill codes are different, just do:
SELECT * FROM client_table
LEFT JOIN client_skills ON client_table.client_id = client_skills.client_id
WHERE skill_code IN ('97','65','23')
If you have an index on skill_code, the query will be faster too.
EDIT: OK, now I understand your question. You need at least 2 queries: one to get all client ids that match the three skills, and one to get the clients with those ids. Try this:
SELECT * from client_table WHERE client_id IN (
SELECT client_id FROM client_skills WHERE (skill_code='97' AND skill_type='0') OR
(skill_code='65' AND skill_type='0') OR (skill_code='23' AND skill_type='5')
GROUP BY client_id HAVING count(*) = 3)
What you are doing there is:
get all client_skills records that match any of your three desired skills
group them by client_id
select only the client_id of groups that have exactly 3 records (meaning those groups have all three skills)
select all clients whose ids are among the ids returned by the subquery
I think that's the fastest approach, with only 2 queries; I don't think it can be done with just one query.
NOTE: that query is not tested, it's just the idea; you may need to play a little with HAVING and GROUP BY.
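To make the steps above concrete, here is a minimal sketch that runs the GROUP BY / HAVING idea against an in-memory SQLite database from Python. The sample rows and the helper name clients_with_all_skills are hypothetical, and SQLite stands in for MySQL only so the example is runnable as-is. It assumes client_skills holds no duplicate (client_id, skill_code, skill_type) rows; otherwise COUNT(*) would overcount.

```python
import sqlite3

# Hypothetical sample data: (client_id, skill_code, skill_type)
SKILLS = [
    (1, '97', '0'), (1, '65', '0'), (1, '23', '5'),   # client 1 has all three
    (2, '97', '0'), (2, '65', '0'),                   # client 2 lacks one
    (3, '23', '5'), (3, '23', '0'), (3, '97', '5'),   # three rows, only one matches
]


def clients_with_all_skills(wanted):
    """Return ids of clients that have every (skill_code, skill_type) pair in `wanted`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE client_skills (client_id INTEGER, skill_code TEXT, skill_type TEXT)"
    )
    conn.executemany("INSERT INTO client_skills VALUES (?, ?, ?)", SKILLS)
    # One "(skill_code = ? AND skill_type = ?)" term per requested pair, joined by OR.
    where = " OR ".join(["(skill_code = ? AND skill_type = ?)"] * len(wanted))
    params = [value for pair in wanted for value in pair]
    sql = f"""
        SELECT client_id
        FROM client_skills
        WHERE {where}
        GROUP BY client_id
        HAVING COUNT(*) = ?
        ORDER BY client_id
    """
    # A client qualifies only if it matched as many rows as there are requested pairs.
    rows = conn.execute(sql, params + [len(wanted)]).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(clients_with_all_skills([('97', '0'), ('65', '0'), ('23', '5')]))
```

The resulting ids can then be fed to the outer SELECT on client_table with IN, as in the query above.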

SELECT
client_table.*
,SUM(
IF( skill_code='97' AND skill_type='0' ,1 ,0 )
+
IF( skill_code='65' AND skill_type='0' ,1 ,0 )
+
IF( skill_code='23' AND skill_type='5' ,1 ,0 )
) AS skill_code_type_count
FROM
client_table
LEFT JOIN client_skills ON client_table.client_id = client_skills.client_id
GROUP BY
client_table.client_id
HAVING
skill_code_type_count >= 3

Related

sql request too long ... How to simplify?

I would like to reduce the run time of my SQL request (it currently takes 10 minutes...).
I think the problem comes from the nested SQL queries.
(Sorry for my English, I'm a French student.)
SELECT DISTINCT `gst.codeAP21`, `gst.email`, `gst.date`, `go.amount`
FROM globe_statistique
JOIN globe_customers ON `gst.codeAP21`=`gc.codeAP21`
JOIN globe_orders ON `gc.ID`=`go.FK_ID_customers`
WHERE `gst.page` = 'send_order'
AND `gst.date` = FROM_UNIXTIME(`go.date`,'%%Y-%%m-%%d')
UNION
SELECT DISTINCT `gst.codeAP21`, `gst.email`, `gst.date`, '-'
FROM globe_statistique
WHERE `gst.page` NOT LIKE 'send_order'
AND (`gst.codeAP21`,`gst.date`) NOT IN
( SELECT `gst.codeAP21`,`gst.date` FROM globe_statistique
WHERE `gst.page`='send_order');
Thanks
try this:
SELECT DISTINCT `gst.codeAP21`, `gst.email`, `gst.date`, `go.amount`
FROM globe_statistique
JOIN globe_customers ON `gst.codeAP21`=`gc.codeAP21`
JOIN globe_orders ON `gc.ID`=`go.FK_ID_customers`
WHERE `gst.page` = 'send_order'
AND `gst.date` = FROM_UNIXTIME(`go.date`,'%%Y-%%m-%%d')
UNION
SELECT DISTINCT t1.`gst.codeAP21`, t1.`gst.email`, t1.`gst.date`, '-'
FROM globe_statistique t1
LEFT JOIN globe_statistique t2 ON t1.`gst.codeAP21` = t2.`gst.codeAP21` AND t1.`gst.date` = t2.`gst.date` AND t2.`gst.page` = 'send_order'
WHERE t1.`gst.page` <> 'send_order' AND t2.`gst.date` IS NULL
But I recommend renaming your columns to remove the dots.
Also use EXPLAIN to find out why the query is slow, and add the appropriate index.
Try to avoid DISTINCT. To that end, use UNION ALL; a GROUP BY at the end gives the same result:
select codeAP21, email, date, amount
from ( /* your query without DISTINCT but with UNION ALL */ ) AS q
group by codeAP21, email, date, amount
see: Huge performance difference when using group by vs distinct
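As a rough illustration of the LEFT JOIN ... IS NULL rewrite suggested above, here is a small sketch using Python's sqlite3 module. The table stats and its columns code, day and page are simplified, hypothetical stand-ins for globe_statistique and its dotted column names, and the rows are made up. The self join looks for a 'send_order' row with the same code and day, and the IS NULL test keeps only the rows where no such match exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (code TEXT, day TEXT, page TEXT)")
conn.executemany("INSERT INTO stats VALUES (?, ?, ?)", [
    ("A", "2013-01-01", "home"),
    ("A", "2013-01-01", "send_order"),  # A ordered that day, so A / 01-01 is excluded
    ("A", "2013-01-02", "home"),
    ("B", "2013-01-01", "cart"),
])

# Anti-join: keep visits that have no 'send_order' row for the same code and day.
ANTI_JOIN = """
    SELECT DISTINCT t1.code, t1.day
    FROM stats t1
    LEFT JOIN stats t2
           ON t2.code = t1.code
          AND t2.day  = t1.day
          AND t2.page = 'send_order'
    WHERE t1.page <> 'send_order'
      AND t2.code IS NULL
    ORDER BY t1.code, t1.day
"""

visits_without_order = conn.execute(ANTI_JOIN).fetchall()
print(visits_without_order)
```

Whether this beats NOT IN on the real table depends on the indexes, so it is still worth checking with EXPLAIN.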

How efficiently check record exist more than 2 times in table using sub-query?

I have a query like this. I have a compound index on CC.key1, CC.key2.
I am executing this on a big database:
Select * from CC where
( (
(select count(*) from Service s
where CC.key1=s.sr2 and CC.key2=s.sr1) > 2
AND
CC.key3='new'
)
OR
(
(select count(*) from Service s
where CC.key1=s.sr2 and CC.key2=s.sr1) <= 2
)
)
limit 10000;
I tried to rewrite it as an inner join, but it gets slower. How can I optimize this query?
The trick here is being able to articulate a query for the problem:
SELECT *
FROM CC t1
INNER JOIN
(
SELECT cc.key1, cc.key2
FROM CC cc
LEFT JOIN Service s
ON cc.key1 = s.sr2 AND
cc.key2 = s.sr1
GROUP BY cc.key1, cc.key2
HAVING COUNT(*) <= 2 OR
SUM(CASE WHEN cc.key3 = 'new' THEN 1 ELSE 0 END) > 2
) t2
ON t1.key1 = t2.key1 AND
t1.key2 = t2.key2
Explanation:
Your original two subqueries would only add to the count if a given record in CC, with a given key1 and key2 value, matched to a corresponding record in the Service table. The strategy behind my inner query is to use GROUP BY to count the number of times that this happens, and use this instead of your subqueries. The first count condition is your bottom subquery, and the second one is the top.
The inner query finds all key1, key2 pairs in CC corresponding to records which should be retained. And recognize that these two columns are the only criteria in your original query for determining whether a record from CC gets retained. Then, this inner query can be inner joined to CC again to get your final result set.
In terms of performance, even this answer could leave something to be desired, but it should be better than a massive correlated subquery, which is what you had.
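One way to check the idea of counting once with GROUP BY instead of running a correlated subquery per row is the sketch below, which runs on SQLite from Python. Note that it is a slightly different formulation from the query above: it pre-aggregates Service alone and LEFT JOINs the counts back to CC. The id column and the sample rows are hypothetical, and the filter relies on the original condition simplifying to "count <= 2 OR key3 = 'new'".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CC (id INTEGER PRIMARY KEY, key1 TEXT, key2 TEXT, key3 TEXT);
    CREATE TABLE Service (sr1 TEXT, sr2 TEXT);
    INSERT INTO CC VALUES (1,'a','x','new'), (2,'a','x','old'),
                          (3,'b','y','old'), (4,'c','z','old');
    -- (key1='a', key2='x') has three Service rows, ('b','y') has one, ('c','z') none
    INSERT INTO Service (sr2, sr1) VALUES ('a','x'), ('a','x'), ('a','x'), ('b','y');
""")

# The question's query: the same correlated COUNT(*) evaluated twice per CC row.
CORRELATED = """
    SELECT id FROM CC
    WHERE ((SELECT COUNT(*) FROM Service s
            WHERE CC.key1 = s.sr2 AND CC.key2 = s.sr1) > 2 AND CC.key3 = 'new')
       OR (SELECT COUNT(*) FROM Service s
            WHERE CC.key1 = s.sr2 AND CC.key2 = s.sr1) <= 2
    ORDER BY id
"""

# Count each (sr2, sr1) pair once, then join; COALESCE covers pairs with no Service rows.
PRE_AGGREGATED = """
    SELECT CC.id
    FROM CC
    LEFT JOIN (SELECT sr2, sr1, COUNT(*) AS n
               FROM Service
               GROUP BY sr2, sr1) t
           ON t.sr2 = CC.key1 AND t.sr1 = CC.key2
    WHERE COALESCE(t.n, 0) <= 2 OR CC.key3 = 'new'
    ORDER BY CC.id
"""

correlated_ids = [row[0] for row in conn.execute(CORRELATED)]
aggregated_ids = [row[0] for row in conn.execute(PRE_AGGREGATED)]
print(correlated_ids, aggregated_ids)
```

On this toy data both forms return the same rows; on a large table the actual speed-up would depend on an index over Service (sr2, sr1), which is an assumption here.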
Basically, get the columns that must not have duplicates, then join the table to itself on them. Example:
select *
FROM Table_X A
WHERE exists (SELECT 1
FROM Table_X B
WHERE 1=1
and a.SHOULD_BE_UNIQUE = b.SHOULD_BE_UNIQUE
and a.SHOULD_BE_UNIQUE2 = b.SHOULD_BE_UNIQUE2
/* excluded because these columns are null or can be Duplicated*/
--and a.GENERIC_COLUMN = b.GENERIC_COLUMN
--and a.GENERIC_COLUMN2 = b.GENERIC_COLUMN2
--and a.NULL_COLUMN = b.NULL_COLUMN
--and a.NULL_COLUMN2 = b.NULL_COLUMN2
and b.rowid > a.ROWID);
Here SHOULD_BE_UNIQUE and SHOULD_BE_UNIQUE2 are columns whose values shouldn't be repeated, and the GENERIC_COLUMN and NULL_COLUMN columns can be ignored, so just leave them out of the query.
I've been using this approach when we have issues with duplicate records.
With the limited information you've given us, this could be a rewrite using 'simplified' logic:
SELECT *
FROM CC NATURAL JOIN
( SELECT sr2 AS key1, sr1 AS key2, COUNT(*) AS tally
FROM Service
GROUP
BY sr2, sr1 ) AS t
WHERE key3 = 'new' OR tally <= 2;
Not sure whether it will perform better, but it might give you some ideas of what to try next.

How to combine 2 mysql queries

I have the following 2 queries.
Query 1 :
select distinct(thread_id) from records where client_name='MyClient'
Query 2 :
select max(thread_no) from records
where thread_id='loop_result_from_above_query' AND action='Reviewed'
Is it possible to combine them into a single query ?
The second query is run on every result of the first query.
Thank you.
See attached image of a small snippet of mysql records.
I need a single mysql query to output only records which have action="MyAction" as the latest records for a given set of thread_ids. In the sample data set : record with Sr: 7201
I hope this helps in helping me :)
SELECT client_name, thread_id, MAX(thread_no) max_thread
FROM records
WHERE action='Reviewed' AND client_name='MyClient'
GROUP BY client_name, thread_id
UPDATE 1
SELECT a.*
FROM records a
INNER JOIN
(
SELECT thread_id, max(sr) max_sr
FROM records
GROUP BY thread_id
) b ON a.thread_id = b.thread_id AND
a.sr = b.max_sr
WHERE a.action = 'MyAction'
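The UPDATE 1 query can be tried out with a sketch like the following, run on SQLite from Python. The rows are invented for illustration (the original screenshot is not available), and only the sr, thread_id and action columns are modelled. The inner query picks the highest sr per thread, the join pulls that row back, and the outer WHERE keeps it only if its action is 'MyAction'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (sr INTEGER PRIMARY KEY, thread_id TEXT, action TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", [
    (1, "t1", "Opened"), (2, "t1", "MyAction"),    # t1 ends with MyAction
    (3, "t2", "MyAction"), (4, "t2", "Reviewed"),  # t2 ends with Reviewed
    (5, "t3", "MyAction"),                         # t3 has a single row
])

# Latest row per thread (highest sr), kept only when that latest row is 'MyAction'.
LATEST = """
    SELECT a.sr, a.thread_id
    FROM records a
    INNER JOIN (SELECT thread_id, MAX(sr) AS max_sr
                FROM records
                GROUP BY thread_id) b
            ON a.thread_id = b.thread_id AND a.sr = b.max_sr
    WHERE a.action = 'MyAction'
    ORDER BY a.sr
"""

latest_my_action = conn.execute(LATEST).fetchall()
print(latest_my_action)
```

Thread t2 is dropped even though it contains a 'MyAction' row, because that row is not its latest one.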
You can use a self join, but it is not advisable because it will impact your query performance. Please check the query below for reference:
SELECT r1.thread_id, MAX(r2.thread_no) FROM records r1 LEFT JOIN records r2 ON r2.thread_id=r1.thread_id AND r2.action='Reviewed' WHERE r1.client_name='MyClient' GROUP BY r1.thread_id
SELECT a.maxthreadid,
b.maxthreadno
FROM (SELECT DISTINCT( thread_id ) AS MaxThreadId
FROM records
WHERE client_name = 'MyClient') a
CROSS JOIN (SELECT Max(thread_no) AS MaxThreadNo
FROM records
WHERE thread_id = 'loop_result_from_above_query'
AND action = 'Reviewed') b
Try this.
SELECT *
FROM (SELECT Row_number()
OVER (
partition BY thread_id
ORDER BY thread_no) no,
Max(thread_no)
OVER(
partition BY thread_id ) Maxthread_no,
thread_id,
action,
client_name
FROM records
Where client_name = 'MyClient') AS T1
WHERE no = 1
AND action = 'Reviewed'

union of two select queries with different fields

I have two tables, empmaster and allocation. I used UNION to get results from the two tables. empmaster has empid and other employee details. The allocation table contains empid from empmaster as a foreign key, plus another field called per_alloc. I need to retrieve the employee details that satisfy either of:
empmaster.empid not in allocation.empid.
empmaster.empid in allocation.empid and allocation.per_alloc < 100.
MySQL query I used is:
select distinct(tbl_empmaster.emp_fname)
from tbl_empmaster
where tbl_empmaster.emp_id not in(select tbl_allocation.emp_id
from tbl_allocation)
union
select distinct(tbl_empmaster.emp_fname)
from tbl_empmaster
where tbl_empmaster.emp_id in(select tbl_allocation.emp_id
from tbl_allocation
group by emp_id
having sum(per_alloc) < 100)
This only retrieves employee details, say tbl_empmaster.emp_fname. I also need to retrieve sum(per_alloc) from tbl_allocation. When I tried, it gave a lot of errors. Can anyone show me the correct way, please?
Try this:
SELECT DISTINCT em.emp_fname, 0 alloc
FROM tbl_empmaster em
WHERE em.emp_id NOT IN(SELECT emp_id FROM tbl_allocation)
UNION
SELECT DISTINCT em.emp_fname, SUM(a.per_alloc) alloc
FROM tbl_empmaster em
INNER JOIN tbl_allocation a ON em.emp_id = a.emp_id
GROUP BY a.emp_id
HAVING SUM(a.per_alloc)<100
OK, from what I have understood, I see two problems.
There is unnecessary grouping in the subquery of the second select statement. It should be OK to just write select tbl_allocation.emp_id from tbl_allocation where tbl_allocation.per_alloc<100 (*)
And the answer to your question: change the second select statement to the following, and it should work: select A.emp_fname, B.per_alloc from tbl_empmaster A join tbl_allocation B using(emp_id) where A.emp_id in (select C.emp_id from tbl_allocation C where C.per_alloc<100)
(*) Assuming that emp_id is the primary key.
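As an alternative to the UNION, both cases (no allocation at all, or allocations summing to less than 100) can probably be covered by one LEFT JOIN with GROUP BY and HAVING. The sketch below is not taken from either answer; it runs on SQLite from Python with made-up employees, and it assumes an employee may have several allocation rows. COALESCE turns the NULL sum of an unallocated employee into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_empmaster (emp_id INTEGER PRIMARY KEY, emp_fname TEXT);
    CREATE TABLE tbl_allocation (emp_id INTEGER, per_alloc INTEGER);
    INSERT INTO tbl_empmaster VALUES (1,'Ann'), (2,'Bob'), (3,'Cid');
    -- Ann has no allocation, Bob totals 70, Cid totals 100
    INSERT INTO tbl_allocation VALUES (2, 40), (2, 30), (3, 60), (3, 40);
""")

# LEFT JOIN keeps unallocated employees; HAVING filters on the per-employee total.
UNDER_ALLOCATED = """
    SELECT em.emp_fname, COALESCE(SUM(a.per_alloc), 0) AS alloc
    FROM tbl_empmaster em
    LEFT JOIN tbl_allocation a ON a.emp_id = em.emp_id
    GROUP BY em.emp_id, em.emp_fname
    HAVING COALESCE(SUM(a.per_alloc), 0) < 100
    ORDER BY em.emp_id
"""

under_allocated = conn.execute(UNDER_ALLOCATED).fetchall()
print(under_allocated)
```

Grouping by emp_id as well as emp_fname keeps two employees with the same first name from being merged into one row.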

Query optimization

I'm having a problem with this slow query:
SELECT c.*, csc1.changed_status
FROM contract c
LEFT
JOIN contract_status_change csc1
ON csc1.contract_status_change_id =
( SELECT csc2.contract_status_change_id
FROM contract_status_change csc2
WHERE csc2.contract_id = c.contract_id
ORDER
BY csc2.date_changed DESC
LIMIT 1
)
;
I have a contract table and a contract_status_change table, which records statuses against the contract. This query joins the latest status to the contract so you can get its current status.
Please can you help me tidy it up?
-edit-
My apologies. I have updated the query to include selecting the actual latest status. Sorry for the confusion!
After formatting your query for readability (consistent whitespace and capitalization, removing unnecessary backticks and parentheses, more sensible aliases):
SELECT c.*
FROM contract c
LEFT
JOIN contract_status_change csc1
ON csc1.contract_status_change_id =
( SELECT csc2.contract_status_change_id
FROM contract_status_change csc2
WHERE csc2.contract_id = c.contract_id
ORDER
BY csc2.date_changed DESC
LIMIT 1
)
;
and assuming that contract_status_change.contract_status_change_id is a unique identifier, I'm forced to conclude that your query is equivalent to this, much more efficient one:
SELECT c.*
FROM contract c
;
You say that it "is joining on the latest status with the contract so you can get its current status", but it doesn't do anything with the current status — doesn't order by it, doesn't filter by it, doesn't include it in the query results — so there's no need for that.
This should help a bit.
SELECT c.*, csc1.changed_status
FROM contract c LEFT JOIN contract_status_change csc1 ON c.contract_id = csc1.contract_id
INNER JOIN
(
SELECT contract_id, MAX(date_changed) AS max_date
FROM contract_status_change
GROUP BY contract_id
) csc2 ON csc1.contract_id = csc2.contract_id AND csc1.date_changed = csc2.max_date
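For reference, here is a small runnable sketch of the same "latest status per contract" idea, using SQLite from Python. It is a variant rather than the exact query above: it LEFT JOINs the per-contract maximum date first, so a contract with no status rows is still returned (with a NULL status), matching the LEFT JOIN in the question. The name column and the sample rows are hypothetical, and it assumes a contract never has two changes with the same date_changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contract (contract_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contract_status_change (
        contract_status_change_id INTEGER PRIMARY KEY,
        contract_id INTEGER, changed_status TEXT, date_changed TEXT);
    INSERT INTO contract VALUES (1,'alpha'), (2,'beta');
    -- contract 1 has two status changes, contract 2 has none
    INSERT INTO contract_status_change VALUES
        (10, 1, 'draft',  '2012-01-01'),
        (11, 1, 'signed', '2012-02-01');
""")

# Find each contract's latest change date once, then fetch the row with that date.
LATEST_STATUS = """
    SELECT c.contract_id, csc1.changed_status
    FROM contract c
    LEFT JOIN (SELECT contract_id, MAX(date_changed) AS max_date
               FROM contract_status_change
               GROUP BY contract_id) latest
           ON latest.contract_id = c.contract_id
    LEFT JOIN contract_status_change csc1
           ON csc1.contract_id = latest.contract_id
          AND csc1.date_changed = latest.max_date
    ORDER BY c.contract_id
"""

current_status = conn.execute(LATEST_STATUS).fetchall()
print(current_status)
```

An index on contract_status_change (contract_id, date_changed) would likely help both the aggregate and the join, though that is worth confirming with EXPLAIN on the real data.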