MINUS operator in MySQL query [duplicate] - mysql

I am trying to perform a MINUS operation in MySQL. I have three tables:
one with service details,
one with the states a service is offered in,
and another (based on zipcode and state) showing where the service is not offered.
I can get the output for those two SELECT queries separately, but I need a combined statement that gives the output of
'SELECT query_1 - SELECT query_2'.
Service_Details table:
Service_Code (PK), Service_Name
Servicing_States table:
Service_Code (FK), State, Country; PK: (Service_Code, State, Country)
Exception table:
Service_Code (FK), Zipcode, State; PK: (Service_Code, Zipcode, State)
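For reference, here is a rough DDL sketch of the three tables as described (column types and lengths are assumptions on my part, not taken from the question):
-- Service_Details: one row per service
CREATE TABLE Service_Details (
  Service_Code VARCHAR(20) NOT NULL PRIMARY KEY,
  Service_Name VARCHAR(100)
);
-- Servicing_States: states where a service is offered
CREATE TABLE Servicing_States (
  Service_Code VARCHAR(20) NOT NULL,
  State        VARCHAR(50)  NOT NULL,
  Country      VARCHAR(50)  NOT NULL,
  PRIMARY KEY (Service_Code, State, Country),
  FOREIGN KEY (Service_Code) REFERENCES Service_Details (Service_Code)
);
-- Exception: zipcode/state combinations where the service is not offered
CREATE TABLE Exception (
  Service_Code VARCHAR(20) NOT NULL,
  Zipcode      VARCHAR(10) NOT NULL,
  State        VARCHAR(50) NOT NULL,
  PRIMARY KEY (Service_Code, Zipcode, State),
  FOREIGN KEY (Service_Code) REFERENCES Service_Details (Service_Code)
);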

MySQL does not recognise MINUS or INTERSECT; these are Oracle-based operations. In MySQL you can use NOT IN in place of MINUS (other solutions exist as well, but I liked this one a lot).
Example:
select a.id
from table1 as a
where <condition>
AND a.id NOT IN (select b.id
from table2 as b
where <condition>);

MySQL does not support MINUS or EXCEPT. You can use NOT EXISTS, LEFT JOIN ... IS NULL, or NOT IN instead.
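For instance, a minimal sketch of the LEFT JOIN ... IS NULL variant on the question's tables, assuming you want to match on Service_Code and State (adjust the join columns as needed):
-- rows in Servicing_States with no matching row in Exception
SELECT ss.*
FROM Servicing_States ss
LEFT JOIN Exception e
       ON e.Service_Code = ss.Service_Code
      AND e.State = ss.State
WHERE e.Service_Code IS NULL;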

Here's my two cents... a complex query that just worked for me, originally expressed with MINUS and then translated for MySQL.
With MINUS:
select distinct oi.`productOfferingId`,f.name
from t_m_prod_action_oitem_fld f
join t_m_prod_action_oitem oi
on f.fld2prod_action_oitem = oi.oid
minus
select
distinct r.name,f.name
from t_m_prod_action_oitem_fld f
join t_m_prod_action_oitem oi
on f.fld2prod_action_oitem = oi.oid
join t_m_rfs r
on r.name = oi.productOfferingId
join t_m_attr a
on a.attr2rfs = r.oid and f.name = a.name;
With NOT EXISTS
select distinct oi.`productOfferingId`,f.name
from t_m_prod_action_oitem_fld f
join t_m_prod_action_oitem oi
on f.fld2prod_action_oitem = oi.oid
where not exists (
select
r.name,f.name
from t_m_rfs r
join t_m_attr a
on a.attr2rfs = r.oid
where r.name = oi.productOfferingId and f.name = a.name
);

The tables have to have the same columns, but I think you can achieve what you are looking for with EXCEPT... except that EXCEPT only works in standard SQL! Here's how to do it in MySQL:
SELECT * FROM Servicing_states ss WHERE NOT EXISTS
( SELECT * FROM Exception e WHERE ss.Service_Code = e.Service_Code);
http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/
Standard SQL
SELECT * FROM Servicing_States
EXCEPT
SELECT * FROM Exception;

An anti-join pattern is the approach I typically use. That's an outer join, to return all rows from query_1, along with matching rows from query_2, and then filtering out all the rows that had a match... leaving only rows from query_1 that didn't have a match. For example:
SELECT q1.*
FROM ( query_1 ) q1
LEFT
JOIN ( query_2 ) q2
ON q2.id = q1.id
WHERE q2.id IS NULL
To emulate the MINUS set operator, we'd need the join predicate to compare all columns returned by q1 and q2, also matching NULL values.
ON q1.col1 <=> q2.col1
AND q1.col2 <=> q2.col2
AND q1.col3 <=> q2.col3
AND ...
To fully emulate the MINUS operation, we'd also need to remove duplicate rows returned by q1; adding the DISTINCT keyword is sufficient to do that.
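Putting those two points together, a sketch of the full emulation for a three-column result might look like this (query_1/query_2 and col1..col3 are placeholders; the extra matched flag is my own addition to avoid ambiguity when q2's columns can themselves be NULL):
SELECT DISTINCT q1.*
FROM ( query_1 ) q1
LEFT
JOIN ( SELECT q.*, 1 AS matched
       FROM ( query_2 ) q ) q2
  ON  q1.col1 <=> q2.col1
  AND q1.col2 <=> q2.col2
  AND q1.col3 <=> q2.col3
WHERE q2.matched IS NULL
Filtering on the added matched flag, rather than on one of q2's own columns, keeps the anti-join correct even when every column of a matching query_2 row is NULL.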

If the tables are huge and similar, one option is to save just the PKs into new tables and then compare based only on the PK. If you know that, say, the first half is identical, add a WHERE clause so you only check rows after a specific value or date.
create table _temp_old ( id int NOT NULL PRIMARY KEY );
create table _temp_new ( id int NOT NULL PRIMARY KEY );
### will take some time
insert into _temp_old ( id )
select id from _real_table_old;
### will take some time
insert into _temp_new ( id )
select id from _real_table_new;
### this version should be much faster
select id from _temp_old t_old where not exists ( select id from _temp_new tn where t_old.id = tn.id );
### this should be much slower
select id from _real_table_old rto where not exists ( select id from _real_table_new rtn where rto.id = rtn.id );
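With the primary keys in place, the same fast comparison over the temp tables can also be written as an anti-join (a sketch, equivalent in spirit to the NOT EXISTS version above):
### anti-join form of the faster comparison
select o.id
from _temp_old o
left join _temp_new n on n.id = o.id
where n.id is null;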

Related

Order by count(*) of my second table takes a long time

Let's assume I have 2 tables. One contains car manufacturers' names and their IDs; the second contains information about car models. I need to select a few rows from the first table, but order them by the quantity of linked data in the second table.
Currently, my query looks like this:
SELECT DISTINCT `manufacturers`.`name`,
`manufacturers`.`cars_link`,
`manufacturers`.`slug`
FROM `manufacturers`
JOIN `cars`
ON manufacturers.cars_link = cars.manufacturer
WHERE ( NOT ( `manufacturers`.`cars_link` IS NULL ) )
AND ( `cars`.`class` = 'sedan' )
ORDER BY (SELECT Count(*)
FROM `cars`
WHERE `manufacturers`.cars_link = `cars`.manufacturer) DESC
It was working OK for my scooters table, which is a few dozen MB in size. But now I need to do the same thing for the cars table, which is a few hundred megabytes. The problem is that the query takes a very long time, and sometimes it even causes an nginx timeout. Also, I think I have all the necessary database indexes. Is there any alternative to the query above?
Let's try using a subquery for your count instead.
select * from (
    select distinct m.name, m.cars_link, m.slug, t1.ct
    from manufacturers m
    join cars c on m.cars_link = c.manufacturer
    left join
        (select count(1) ct, c1.manufacturer
         from manufacturers m1
         inner join cars c1 on m1.cars_link = c1.manufacturer
         where coalesce(m1.cars_link, '') != '' and c1.class = 'sedan'
         group by c1.manufacturer) as t1
        on t1.manufacturer = c.manufacturer
    where coalesce(m.cars_link, '') != '' and c.class = 'sedan'
) t2
order by t2.ct desc
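If the count is only needed for ordering, another sketch is to aggregate once and order by the count directly; this assumes indexes on cars(manufacturer, class) and manufacturers(cars_link), and is a starting point rather than a drop-in replacement:
select m.name, m.cars_link, m.slug
from manufacturers m
join cars c on c.manufacturer = m.cars_link
where m.cars_link is not null
  and c.class = 'sedan'
group by m.name, m.cars_link, m.slug
order by count(*) desc;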

How to exclude rows from a SQL fetch that have a related keyword through a many to many relationship

I have a MySQL database with a media table, and a keywords table, and a many-to-many relationship between media and keywords via a media_keywords join table.
I want to fetch all records from the media table where the following set of conditions match:
'description' is like 'dog' OR
'media.keywords' includes the id for the 'dog' keyword [100]
And exclude from the found set any records where:
'description' is like 'cat' OR
'media.keywords' includes the id for the 'cat' keyword [400]
And also exclude any row where:
'media.keywords' includes the id for the 'monochrome' keyword [500]
I also want to return only distinct rows, so I'm using GROUP BY `media.id`.
The SQL statement I have at the moment is as follows:
SELECT DISTINCT
`media`.`id`,`media`.`description`,
`keywords`.`id` AS `keywords.id`,
`keywords->media_keywords`.`id` AS `keywords.media_keywords.id`,
`keywords->media_keywords`.`media_id` AS `keywords.media_keywords.media_id`,
`keywords->media_keywords`.`keyword_id` AS `keywords.media_keywords.keyword_id`
FROM database.media
LEFT OUTER JOIN
(
`media_keywords` AS `keywords->media_keywords`
INNER JOIN `keywords` AS `keywords`
ON `keywords`.`id` = `keywords->media_keywords`.`keyword_id`
)
ON `media`.`id` = `keywords->media_keywords`.`media_id`
WHERE
(
(`media`.`description` LIKE '%dog%' )
OR `keywords`.`id` IN (100)
)
AND NOT
(
(`media`.`description` LIKE '%cat%' )
OR `keywords`.`id` IN (400,500)
)
GROUP BY `media`.`id` ;
This correctly fetches records where 'dog' is in the description or is a keyword, but ignores the exclusions completely.
Can anyone see what I'm doing wrong here?
I would use a where clause:
select m.*
from media m
where (m.description like '%dog%' or
       exists (select 1
               from media_keywords mk
               where mk.media_id = m.id and
                     mk.keyword_id = 100
              )
      ) and
      not (m.description like '%cat%' or
           exists (select 1
                   from media_keywords mk
                   where mk.media_id = m.id and
                         mk.keyword_id in (400, 500)
                  )
          );
This is pretty much a direct translation of your conditions.
I don't work with MySQL much, but I would suggest a different approach:
Move the "DOG" condition inside the LEFT JOIN (so you only get keywords matching "DOG") and make it a JOIN. Now you'll have a list of all matches.
Then add a subquery in the WHERE clause to handle the exclusions, for example:
WHERE id NOT IN (SELECT id FROM ... WHERE description LIKE '%cat%')
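A rough sketch of how that could come together against the question's schema, using the media_keywords join table described in the question and keeping the OR on the description so the inclusion logic matches the question (illustrative only):
SELECT DISTINCT m.id, m.description
FROM media m
LEFT JOIN media_keywords mk ON mk.media_id = m.id
WHERE (m.description LIKE '%dog%' OR mk.keyword_id = 100)
  AND m.id NOT IN (
        SELECT m2.id
        FROM media m2
        LEFT JOIN media_keywords mk2 ON mk2.media_id = m2.id
        WHERE m2.description LIKE '%cat%'
           OR mk2.keyword_id IN (400, 500)
      );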

MySQL query to find location of the last entry of an device

I have two tables (in a SQL Server database) as following:
create table tblDevices(
idDevice varchar(255) not null,
(...)
primary key(idGUID));
create table tblEnvironmentLog(
ID int not null auto_increment,
idDevice varchar(30) not null,
txtLocation varchar(255),
datDate date,
datTime time,
primary key(ID));
Each event in tblEnvironmentLog belongs to a device in tblDevices, and each record has a date and a location (the location of the device). A device may have several records (at least one).
I need a SQL query that finds, for each idDevice in tblDevices, the location of its latest record.
I've tried to write the query for a long time but couldn't find a solution, so any help or hint is welcome.
So, you basically want to get the rows that correspond to the latest date/time for each device. You've specified both MySQL and SQL Server. This is for SQL Server.
SELECT *
FROM tblDevices t1
INNER JOIN tblEnvironmentLog t2
ON t1.idDevice = t2.idDevice
WHERE t2.ID = (SELECT TOP 1 t3.ID
FROM tblEnvironmentLog t3
WHERE t2.idDevice = t3.idDevice
ORDER BY t3.datDate DESC, t3.datTime DESC)
You need something like this -
SELECT t1.*, t2.datDate, t2.datTime
FROM tblDevices t1
JOIN tblEnvironmentLog t2 on t1.idDevice = t2.idDevice
WHERE t2.id in ( SELECT MAX(ID) FROM tblEnvironmentLog GROUP BY idDevice)
Try a query like the one below. If you need more detail on how to combine the individual date and time components correctly, see this answer.
Explanation:
We use an inner query (aliased E) to find the maximum date/time per idDevice, and then join it to tblDevices (as D) as well as tblEnvironmentLog (as E2) to get the desired results.
Query
select
DISTINCT -- this is needed as many result pairs may come up with the same values
D.idDevice, E2.txtLocation
from tblDevices D left join
(
select
idDevice,
max( cast(datDate as datetime) + cast(datTime as datetime) ) as dt
from tblEnvironmentLog
group by idDevice) E
on D.idDevice = E.idDevice
left join tblEnvironmentLog E2 on E.idDevice = E2.idDevice and
cast(E2.datDate as datetime) + cast(E2.datTime as datetime) = E.dt
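Since the question title mentions MySQL, here is a MySQL-flavoured sketch of the same idea; TIMESTAMP(datDate, datTime) combines the two columns into one comparable value (treat this as illustrative):
select D.idDevice, E2.txtLocation
from tblDevices D
left join (
    select idDevice, max( TIMESTAMP(datDate, datTime) ) as dt
    from tblEnvironmentLog
    group by idDevice ) E
  on D.idDevice = E.idDevice
left join tblEnvironmentLog E2
  on E2.idDevice = E.idDevice
 and TIMESTAMP(E2.datDate, E2.datTime) = E.dt;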

How to efficiently check whether a record exists more than 2 times in a table using a sub-query?

I have a query like this. I have a compound index on (CC.key1, CC.key2).
I am executing this against a big database.
Select * from CC where
( (
(select count(*) from Service s
where CC.key1=s.sr2 and CC.key2=s.sr1) > 2
AND
CC.key3='new'
)
OR
(
(select count(*) from Service s
where CC.key1=s.sr2 and CC.key2=s.sr1) <= 2
)
)
limit 10000;
I tried to rewrite it as an inner join, but it got slower. How can I optimize this query?
The trick here is being able to articulate a query for the problem:
SELECT *
FROM CC t1
INNER JOIN
(
SELECT cc.key1, cc.key2
FROM CC cc
LEFT JOIN Service s
ON cc.key1 = s.sr2 AND
cc.key2 = s.sr1
GROUP BY cc.key1, cc.key2
HAVING COUNT(*) <= 2 OR
SUM(CASE WHEN cc.key3 = 'new' THEN 1 ELSE 0 END) > 2
) t2
ON t1.key1 = t2.key1 AND
t1.key2 = t2.key2
Explanation:
Your original two subqueries would only add to the count if a given record in CC, with a given key1 and key2 value, matched to a corresponding record in the Service table. The strategy behind my inner query is to use GROUP BY to count the number of times that this happens, and use this instead of your subqueries. The first count condition is your bottom subquery, and the second one is the top.
The inner query finds all key1, key2 pairs in CC corresponding to records which should be retained. And recognize that these two columns are the only criteria in your original query for determining whether a record from CC gets retained. Then, this inner query can be inner joined to CC again to get your final result set.
In terms of performance, even this answer could leave something to be desired, but it should be better than a massive correlated subquery, which is what you had.
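As a side note, since both the correlated subqueries and the grouped join probe Service on (sr2, sr1), an index there is likely what matters most; a sketch, with an illustrative index name:
-- helps the lookups against Service(sr2, sr1) performed for each CC row;
-- the compound index on CC(key1, key2) mentioned in the question covers the other side
ALTER TABLE Service ADD INDEX idx_service_sr2_sr1 (sr2, sr1);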
Basically, get the columns that must not have duplicates, then join the table to itself on them. Example:
select *
FROM Table_X A
WHERE exists (SELECT 1
FROM Table_X B
WHERE 1=1
and a.SHOULD_BE_UNIQUE = b.SHOULD_BE_UNIQUE
and a.SHOULD_BE_UNIQUE2 = b.SHOULD_BE_UNIQUE2
/* excluded because these columns are null or can be Duplicated*/
--and a.GENERIC_COLUMN = b.GENERIC_COLUMN
--and a.GENERIC_COLUMN2 = b.GENERIC_COLUMN2
--and a.NULL_COLUMN = b.NULL_COLUMN
--and a.NULL_COLUMN2 = b.NULL_COLUMN2
and b.rowid > a.ROWID);
Here, SHOULD_BE_UNIQUE and SHOULD_BE_UNIQUE2 are the columns that should not be repeated (the ones that identify a unique row), while GENERIC_COLUMN and the NULL_COLUMNs can be ignored, so just leave them out of the query.
We've been using this approach whenever we run into duplicate-record issues.
With the limited information you've given us, this could be a rewrite using 'simplified' logic:
SELECT *
FROM CC NATURAL JOIN
( SELECT sr2 AS key1, sr1 AS key2, COUNT(*) AS tally
FROM Service
GROUP
BY sr2, sr1 ) AS t
WHERE key3 = 'new' OR tally <= 2;
Not sure whether it will perform better but might give you some ideas of what to try next?

Searching on multi (1-n) relation tables

I learned the hard way that I shouldn't store serialized data in a table when I need to make it searchable.
So I made 3 tables: the base table and two 1-n relation tables.
Here is the query I end up with if I want to select a specific activity.
SELECT
jdc_organizations_activities.id
FROM
jdc_activity_sector ,
jdc_activity_type
INNER JOIN jdc_organizations_activities ON jdc_activity_type.activityId = jdc_organizations_activities.id
AND
jdc_activity_sector.activityId = jdc_organizations_activities.id
WHERE
jdc_activity_sector.activitySector = 5 AND
jdc_activity_type.activityType = 3
Questions:
1- What kind of indexes can I add on a 1-n relation table? I already have a unique combination of (activityId, activitySector) and (activityId, activityType).
2- Is there a better way to write the query to get better performance?
Thank you!
I would re-organise the query to avoid the cross product caused by using , notation.
Also, you are effectively only using the sector and type tables as filters. So put activity table first, and then join on your other tables.
Some may suggest that the first join should ideally be the one most likely to restrict your results the most, leaving the minimal amount of work to do in the second join. In reality, the SQL engine can re-arrange your query when generating a plan, but it does help to think this way when considering the effort the SQL engine has to go to.
Finally, there are the indexes on each table. I would actually suggest reversing the Indexes...
- ActivitySector THEN ActivityId
- ActivityType THEN ActivityId
This is specifically because the SQL engine is manipulating your query. It can take the WHERE clause and say "only include records from the Sector table where ActivitySector = 5", and similarly for the Type table. By having the Sector and Type identifiers FIRST in the index, this filtering of the tables can be done much faster, and then the joins will have much less work to do.
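A sketch of those composite indexes in MySQL (the index names are illustrative):
ALTER TABLE jdc_activity_sector
  ADD INDEX idx_sector_activity (activitySector, activityId);
ALTER TABLE jdc_activity_type
  ADD INDEX idx_type_activity (activityType, activityId);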
SELECT
[activity].id
FROM
jdc_organizations_activities AS [activity]
INNER JOIN
jdc_activity_sector AS [sector]
ON [activity].id = [sector].activityId
INNER JOIN
jdc_activity_type AS [type]
ON [activity].id = [type].activityId
WHERE
[sector].activitySector = 5
AND [type].activityType = 3
Or, because you don't actually use the content of the Activity table...
SELECT
[sector].activityId
FROM
jdc_activity_sector AS [sector]
INNER JOIN
jdc_activity_type AS [type]
ON [sector].activityId = [type].activityId
WHERE
[sector].activitySector = 5
AND [type].activityType = 3
Or...
SELECT
[activity].id
FROM
jdc_organizations_activities AS [activity]
WHERE
EXISTS (SELECT * FROM jdc_activity_sector WHERE activityId = [activity].id AND activitySector = 5)
AND EXISTS (SELECT * FROM jdc_activity_type WHERE activityId = [activity].id AND activityType = 3)
I would advise against mixing the old style (from table1, table2) and the new style (from table1 inner join table2 ...) in a single query. You can also alias tables using table1 as t1, shortening long table names to an easy-to-remember mnemonic:
select a.id
from jdc_organizations_activities a
join jdc_activity_sector sec
on sec.activityId = a.Id
join jdc_activity_type typ
on typ.activityId = a.Id
where sec.activitySector = 5
and typ.activityType = 3
Or even more readable using IN:
select a.id
from jdc_organizations_activities a
where a.id in
(
select activityId
from jdc_activity_sector
where activitySector = 5
)
and a.id in
(
select activityId
from jdc_activity_type
where activityType = 3
)