Is querying with NOT IN faster than querying with IN? - mysql

Let's presume the following simple situation:
I have two tables, a category table that contains two fields, CategoryId, and CategoryGroup, and an ads table that contains another two fields, AdId, and category_CategoryId which is a link to the category table.
All the rows in the category table are grouped in two separate groups: buy or rent. So, each row in that table has in the CategoryGroup either the string buy or the string rent.
Let's say I want to count how many ads in the ads table are for sale.
I have two ways to do this:
Do a NOT IN query like this: SELECT COUNT(AdId) as Total FROM ads WHERE category_CategoryId NOT IN (SELECT CategoryId FROM category WHERE CategoryGroup = 'rent')
Or do an 'IN' query like this: SELECT COUNT(AdId) as Total FROM ads WHERE category_CategoryId IN (SELECT CategoryId FROM category WHERE CategoryGroup = 'buy')
I've tested both queries, and it seems to me that the NOT IN query performs much faster than the IN query.
(0.45 secs for NOT IN on a table of ~900.000 rows with around 45 categories, versus 1.1 secs for IN on the same dataset)
Is this incidental, or will NOT IN queries always perform faster in similar situations?

IN ( SELECT ... ) and NOT IN ( SELECT ... ) are perhaps never the most efficient way to code something. One may be faster than the other because its subquery returns fewer rows, not because of the NOT.
Assuming an ad is in only one category, this is probably the most efficient.
SELECT Count(ads.AdId) as Total, ads.CategoryId
FROM ads
JOIN category AS c ON c.CategoryId = ads.CategoryId
WHERE c.CategoryGroup = 'buy'
GROUP BY ads.CategoryId
If an ad can be in multiple categories, then you have a puzzle: Should an ad that is both 'buy' and 'rent' be included or excluded from the count? Anyway, I am leading up to replacing IN with EXISTS as an alternative optimization:
SELECT Count(AdId) as Total, CategoryId
FROM ads
WHERE EXISTS
( SELECT *
FROM category
WHERE CategoryId = ads.CategoryId
AND CategoryGroup = 'buy'
)
GROUP BY CategoryId
(Sorry, I can't stand unnecessarily redundant column names like category_CategoryId.)
Perform EXPLAIN SELECT ... on the various choices to get more insight.
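A minimal sketch of the comparison, assuming Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The four sample ads and three categories are made up, and the column is called CategoryId as in the answer above. It checks that the IN, NOT IN, JOIN and EXISTS forms agree on this data (NOT IN only agrees because every category is either 'buy' or 'rent'), then prints SQLite's EXPLAIN QUERY PLAN output as a rough analogue of MySQL's EXPLAIN.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (CategoryId INTEGER PRIMARY KEY, CategoryGroup TEXT);
    CREATE TABLE ads (AdId INTEGER PRIMARY KEY, CategoryId INTEGER);
    INSERT INTO category VALUES (1, 'buy'), (2, 'buy'), (3, 'rent');
    INSERT INTO ads VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")

# Four ways to count the ads that are for sale.
queries = {
    "in": """SELECT COUNT(*) FROM ads WHERE CategoryId IN
             (SELECT CategoryId FROM category WHERE CategoryGroup = 'buy')""",
    "not_in": """SELECT COUNT(*) FROM ads WHERE CategoryId NOT IN
             (SELECT CategoryId FROM category WHERE CategoryGroup = 'rent')""",
    "join": """SELECT COUNT(*) FROM ads
             JOIN category AS c ON c.CategoryId = ads.CategoryId
             WHERE c.CategoryGroup = 'buy'""",
    "exists": """SELECT COUNT(*) FROM ads WHERE EXISTS
             (SELECT 1 FROM category AS c
              WHERE c.CategoryId = ads.CategoryId AND c.CategoryGroup = 'buy')""",
}

counts = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(counts)

# The last column of each plan row is the human-readable step description.
for name, sql in queries.items():
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(name, [row[-1] for row in plan])
```

The plans printed here are SQLite's, not MySQL's, so they only illustrate the habit of inspecting the plan; timings and optimizer choices on the real 900.000-row table have to be measured in MySQL itself.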

Related

SELECT data from multiple tables if a requirement is met in second table

The title doesn't describe it that well. My problem:
I have 2 tables, one table for orders, the other for the product.
An order can have n products associated with it.
I want to select those orders where all associated products have a status (an attribute of the product) greater than or equal to x. (That way I know that every product of my order is "ready" and the order can be processed further.)
Every ordered product has an OrderID
Any tips?
e: Just started with SQL, don't bash me if this is a stupid question
It's a matter of mindset.
You have to find the 'dual' form of your question ( -> double negation).
You need to find all the orders for which there is NOT even one line that is not ready.
Assuming your tables are the common:
Order(ID,bla,bla,bla) and Order Line(orderID, row#, status, bla, bla) FK orderid references order.
You can use this stub:
Select *
from orders O
where not exists ( select * from order_line OL
where OL.orderID = O.ID -- binding with outer query
and OL.status <> 'ready'
)
SIDE NOTE: my query will also return empty orders (orders with no lines). To filter them out, just add this to the outer query: and exists (select * from order_line OE where OE.orderID = O.ID)
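The double negation can be tried out on a few rows. This is only a sketch: it assumes Python's built-in sqlite3 module instead of the asker's database, the orders(ID) and order_line(orderID, row_no, status) layout sketched in the answer, and three invented orders, one of which has no lines at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (ID INTEGER PRIMARY KEY);
    CREATE TABLE order_line (orderID INTEGER, row_no INTEGER, status TEXT);
    INSERT INTO orders VALUES (1), (2), (3);          -- order 3 has no lines
    INSERT INTO order_line VALUES (1, 1, 'ready'), (1, 2, 'ready'),
                                  (2, 1, 'ready'), (2, 2, 'pending');
""")

# Orders for which no line exists that is not ready.
all_ready_sql = """
    SELECT O.ID FROM orders O
    WHERE NOT EXISTS (SELECT * FROM order_line OL
                      WHERE OL.orderID = O.ID AND OL.status <> 'ready')
    ORDER BY O.ID
"""
all_ready = [row[0] for row in conn.execute(all_ready_sql)]

# Same condition, but the order must also have at least one line.
non_empty_sql = """
    SELECT O.ID FROM orders O
    WHERE NOT EXISTS (SELECT * FROM order_line OL
                      WHERE OL.orderID = O.ID AND OL.status <> 'ready')
      AND EXISTS (SELECT * FROM order_line OE WHERE OE.orderID = O.ID)
    ORDER BY O.ID
"""
all_ready_non_empty = [row[0] for row in conn.execute(non_empty_sql)]

print(all_ready)            # [1, 3]: order 3 is empty, so nothing contradicts it
print(all_ready_non_empty)  # [1]
```

For the numeric status in the question, the inner condition would become something like status < x instead of status <> 'ready'.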

sql table design to fetch records with multiple inclusion and exclusion conditions

We want to select customers based on following parameters i.e. customer should be in:
specific city i.e. cityId=1,2,3...
specific customerId should be excluded i.e. customerId=33,2323,34534...
specific age i.e. 5 years, 7 years, 72 years...
These inclusion & exclusion lists can be arbitrarily long.
How should we design database for this:
Create separate table 'customerInclusionCities' for these inclusion cities and do like:
select * from customers where cityId in (select cityId from customerInclusionCities)
We do the same for age: create table 'customerEligibleAge' with all the eligible ages:
i.e. select * from customers where age in (select age from customerEligibleAge)
and Create separate table 'customerIdToBeExcluded' for excluding customers:
i.e. select * from customers where customerId not in (select customerId from customerIdToBeExcluded)
OR
Create One table with Category and Ids.
i.e. Category1 for cities, Category2 for CustomerIds to be excluded.
Which approach is better, creating one table for these parameters OR creating separate tables for each list i.e. age, customerId, city?
IN ( SELECT ... ) can be very slow. Do your query as a single SELECT without subqueries. I assume all 3 columns are in the same table? (If not, that adds complexity.) The WHERE clause will probably have 3 IN ( constants ) clauses:
SELECT ...
FROM tbl
WHERE cityId IN (1,2,3...)
AND customerId NOT IN (33,2323,34534...)
AND age IN (5, 7, 72)
Have (at least):
INDEX(cityId),
INDEX(age)
(Negated things are unlikely to be able to use an index.)
The query will use one of the indexes; having both will give the Optimizer a choice of which it thinks is better.
Or...
SELECT c.*
FROM customers AS c
JOIN cityEligible AS b ON b.city = c.city
JOIN customerEligibleAge AS ce ON c.age = ce.age
LEFT JOIN customerIdToBeExcluded AS ex ON c.customerId = ex.customerId
WHERE ex.customerId IS NULL
Suggested indexes (probably as PRIMARY KEY):
customers: (city)
customerEligibleAge: (age)
customerIdToBeExcluded: (customerId)
In order to discuss further, please provide SHOW CREATE TABLE for each table and EXPLAIN SELECT ... for any of the queries that actually work.
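Both shapes can be compared on a toy dataset. The sketch below assumes Python's built-in sqlite3 module rather than the asker's database, uses the list-table names from the question (customerInclusionCities, customerEligibleAge, customerIdToBeExcluded), and invents five customers purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerId INTEGER PRIMARY KEY,
                            cityId INTEGER, age INTEGER);
    CREATE TABLE customerInclusionCities (cityId INTEGER PRIMARY KEY);
    CREATE TABLE customerEligibleAge (age INTEGER PRIMARY KEY);
    CREATE TABLE customerIdToBeExcluded (customerId INTEGER PRIMARY KEY);
    INSERT INTO customers VALUES
        (1, 1, 5), (2, 2, 7), (33, 1, 5), (4, 9, 5), (5, 3, 40);
    INSERT INTO customerInclusionCities VALUES (1), (2), (3);
    INSERT INTO customerEligibleAge VALUES (5), (7), (72);
    INSERT INTO customerIdToBeExcluded VALUES (33), (2323), (34534);
""")

# Variant 1: a single SELECT with constant lists.
constants_sql = """
    SELECT customerId FROM customers
    WHERE cityId IN (1, 2, 3)
      AND customerId NOT IN (33, 2323, 34534)
      AND age IN (5, 7, 72)
    ORDER BY customerId
"""

# Variant 2: JOIN to the inclusion tables, anti-join to the exclusion table.
joins_sql = """
    SELECT c.customerId
    FROM customers AS c
    JOIN customerInclusionCities AS ci ON ci.cityId = c.cityId
    JOIN customerEligibleAge AS ce ON ce.age = c.age
    LEFT JOIN customerIdToBeExcluded AS ex ON ex.customerId = c.customerId
    WHERE ex.customerId IS NULL
    ORDER BY c.customerId
"""

by_constants = [row[0] for row in conn.execute(constants_sql)]
by_joins = [row[0] for row in conn.execute(joins_sql)]
print(by_constants, by_joins)
```

Customer 33 is dropped by the exclusion list, customer 4 by the city list and customer 5 by the age list, so both variants return customers 1 and 2.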
If you use the database only for that operation, I recommend the first solution. The first solution is also very simple to deploy.
The second solution fills the DB with junk.

Select rows where item is only in certain categories

So I have a somewhat complicated MySQL query question. I have 3 tables. One is a table of items. One is a table of categories. And one is a linking table that just has 2 fields, itemID and categoryID. It is a many-to-many relationship, so one item can be in multiple categories and each category can have multiple items. Now two of the fields in the category table are isactive and ismain. They are just bools of 1 or 0. I want to grab all items that only belong to categories where either isactive=0 or ismain=0 or both.
I took some time and set up a sql fiddle for someone to play around with. http://sqlfiddle.com/#!9/b03842/2
Solution using subquery:
SELECT DISTINCT i.* FROM cart_item i
JOIN cart_item_category ic ON i.itemref = ic.itemref
WHERE ic.catid IN (
SELECT id FROM cart_category WHERE active = 0 OR ismain = 0
)
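Note that the IN form returns every item that has at least one such category. If "only" is meant strictly, one possible alternative is a NOT EXISTS test against categories that are both active and main. The sketch below assumes Python's built-in sqlite3 module, the table and column names used in the answer (cart_item, cart_item_category, cart_category, active, ismain), and made-up rows; it is not taken from the linked fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cart_item (itemref TEXT PRIMARY KEY);
    CREATE TABLE cart_category (id INTEGER PRIMARY KEY,
                                active INTEGER, ismain INTEGER);
    CREATE TABLE cart_item_category (itemref TEXT, catid INTEGER);
    INSERT INTO cart_item VALUES ('A'), ('B'), ('C'), ('D'), ('E');
    -- category 1 is active and main; 2 is inactive; 3 is not main
    INSERT INTO cart_category VALUES (1, 1, 1), (2, 0, 1), (3, 1, 0);
    INSERT INTO cart_item_category VALUES
        ('A', 2), ('B', 1), ('B', 2), ('C', 1), ('D', 2), ('D', 3);
""")

# "At least one" matching category (the IN form from the answer).
at_least_one_sql = """
    SELECT DISTINCT i.itemref FROM cart_item i
    JOIN cart_item_category ic ON i.itemref = ic.itemref
    WHERE ic.catid IN (SELECT id FROM cart_category
                       WHERE active = 0 OR ismain = 0)
    ORDER BY i.itemref
"""

# "Only" matching categories: the item has a category, and none of its
# categories is both active and main.
only_sql = """
    SELECT DISTINCT i.itemref FROM cart_item i
    JOIN cart_item_category ic ON i.itemref = ic.itemref
    WHERE NOT EXISTS (SELECT 1 FROM cart_item_category ic2
                      JOIN cart_category c ON c.id = ic2.catid
                      WHERE ic2.itemref = i.itemref
                        AND c.active = 1 AND c.ismain = 1)
    ORDER BY i.itemref
"""

at_least_one = [row[0] for row in conn.execute(at_least_one_sql)]
only_matching = [row[0] for row in conn.execute(only_sql)]
print(at_least_one)   # ['A', 'B', 'D']
print(only_matching)  # ['A', 'D']
```

Item B is the difference: it sits in an inactive category and in a fully active main one, so it passes the IN test but fails the strict reading. Item E has no categories and is left out of both results by the join.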

What's the best way to combine 2 tables in MYSQL and remove duplicates?

I have 2 tables:
matches TABLE
FIELDS: record, date, competition
outrights TABLE
FIELDS: record, competition
What I would like is to select rows grouped by the different types of competition. Below are the statements that work fine when I treat each table separately.
Firstly, from 'matches' and only if the date hasn't already passed:
SELECT competition, date FROM matches WHERE date >= '$currentTime' GROUP BY competition
Followed by rows from 'outrights':
SELECT competition FROM outrights GROUP BY competition
This is all pretty straightforward, except the same competition values will often (but not always) appear in both tables. I have looked at many different methods (including LEFT and RIGHT JOINS), but haven't found a simple solution. Basically I want the different competition types that appear in both tables, without duplication. Is this possible?
Is this what you are looking for? I'm a little confused by the question, but it appears that you want a DISTINCT listing of the competition column from both tables:
SELECT DISTINCT competition
FROM
(
SELECT competition FROM matches
UNION
SELECT competition from outrights
) AS t
If you need only the distinct competitions that appear in both tables (not those that appear in just one), you could use
SELECT DISTINCT competition
FROM
(
SELECT matches.competition FROM matches INNER JOIN
outrights ON matches.competition = outrights.competition
) AS t
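The difference between the two readings shows up on a handful of rows. A small sketch, assuming Python's built-in sqlite3 module instead of MySQL and invented competition names; note that UNION already removes duplicates on its own, and that the joined query qualifies the competition column with a table alias so the reference is not ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (record INTEGER, date TEXT, competition TEXT);
    CREATE TABLE outrights (record INTEGER, competition TEXT);
    INSERT INTO matches VALUES (1, '2030-01-01', 'Cup'),
                               (2, '2030-01-02', 'League'),
                               (3, '2030-01-03', 'Cup');
    INSERT INTO outrights VALUES (1, 'League'), (2, 'Trophy');
""")

# Competitions that appear in either table, each listed once.
either_sql = """
    SELECT competition FROM matches
    UNION
    SELECT competition FROM outrights
    ORDER BY competition
"""

# Competitions that appear in both tables.
both_sql = """
    SELECT DISTINCT m.competition
    FROM matches AS m
    INNER JOIN outrights AS o ON m.competition = o.competition
    ORDER BY m.competition
"""

in_either = [row[0] for row in conn.execute(either_sql)]
in_both = [row[0] for row in conn.execute(both_sql)]
print(in_either)  # ['Cup', 'League', 'Trophy']
print(in_both)    # ['League']
```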

MySQL: grab one row from each category, but remove duplicate rows posted in multiple categories

I have a database of articles, which are stored in categories. For my homepage, I want to grab an article from each category (I don't care which). However, some articles are crossposted to multiple categories, so they come up twice.
I have a table called tblReview with the article fields (reviewID, headline, reviewText) and a table called tblWebsiteContent that tells the site which categories the articles are in (id, reviewID, categoryID) and finally, a table called tblCategories (categoryID, categoryName) which stores the categories.
My query basically joins these tables and uses GROUP BY tblCategory.categoryID. If I try adding 'tblReview.reviewID' into the GROUP BY statement, I end up with hundreds of articles, rather than 22 (the number of categories I have).
I have a feeling this needs a subquery but my test efforts haven't worked (not sure which query needs to contain my joins / field list / where clause etc).
Thanks!
Matt
SELECT T.categoryName, tR.headline, tR.reviewText
FROM (
SELECT tC.categoryName, MAX(tR1.reviewID) reviewID
FROM tblReview tR1 join tblWebsiteContent tWC on tR1.reviewID = tWC.reviewID
join tblCategory tC on tC.categoryID = tWC.categoryID
GROUP BY tC.categoryName) T JOIN
tblReview tR on tR.reviewID = T.reviewID
This query selects, for each category, the article headline corresponding to the MAX reviewID in that category (you said 'I don't care which').
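The derived-table shape can be exercised on sample rows. This sketch assumes Python's built-in sqlite3 module instead of MySQL, the table name tblCategory as written in the query, and three invented articles, one of which is cross-posted to both categories.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblReview (reviewID INTEGER PRIMARY KEY,
                            headline TEXT, reviewText TEXT);
    CREATE TABLE tblCategory (categoryID INTEGER PRIMARY KEY, categoryName TEXT);
    CREATE TABLE tblWebsiteContent (id INTEGER PRIMARY KEY,
                                    reviewID INTEGER, categoryID INTEGER);
    INSERT INTO tblReview VALUES (1, 'Headline A', 'text'),
                                 (2, 'Headline B', 'text'),
                                 (3, 'Headline C', 'text');
    INSERT INTO tblCategory VALUES (1, 'News'), (2, 'Sport');
    -- review 3 is cross-posted to both categories
    INSERT INTO tblWebsiteContent VALUES (1, 1, 1), (2, 3, 1), (3, 3, 2), (4, 2, 2);
""")

# One row per category: the article with the highest reviewID in it.
per_category_sql = """
    SELECT T.categoryName, tR.headline
    FROM (SELECT tC.categoryName, MAX(tWC.reviewID) AS reviewID
          FROM tblWebsiteContent tWC
          JOIN tblCategory tC ON tC.categoryID = tWC.categoryID
          GROUP BY tC.categoryName) T
    JOIN tblReview tR ON tR.reviewID = T.reviewID
    ORDER BY T.categoryName
"""
per_category = conn.execute(per_category_sql).fetchall()
print(per_category)  # [('News', 'Headline C'), ('Sport', 'Headline C')]
```

With this data the query returns exactly one row per category, but the cross-posted article is picked for both, so the approach limits the result to one article per category without by itself guaranteeing that the articles differ.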
Try using SELECT DISTINCT. (This will only work if your SELECT is only pulling the article ID.)
select DISTINCT reviewID