sql table design to fetch records with multiple inclusion and exclusion conditions - mysql

We want to select customers based on the following parameters:
customer should be in specific cities, i.e. cityId = 1, 2, 3...
specific customers should be excluded, i.e. customerId = 33, 2323, 34534...
customer should have a specific age, i.e. 5 years, 7 years, 72 years...
These inclusion and exclusion lists can be arbitrarily long.
How should we design the database for this?
Create a separate table 'customerInclusionCities' for the inclusion cities and query like:
select * from customers where cityId in (select cityId from customerInclusionCities)
We do the same for age: create a table 'customerEligibleAge' with one entry per eligible age:
i.e. select * from customers where age in (select age from customerEligibleAge)
and create a separate table 'customerIdToBeExcluded' for excluding customers:
i.e. select * from customers where customerId not in (select customerId from customerIdToBeExcluded)
OR
Create One table with Category and Ids.
i.e. Category1 for cities, Category2 for CustomerIds to be excluded.
Which approach is better, creating one table for these parameters OR creating separate tables for each list i.e. age, customerId, city?

IN ( SELECT ... ) can be very slow. Do your query as a single SELECT without subqueries. I assume all 3 columns are in the same table? (If not, that adds complexity.) The WHERE clause will probably have 3 IN ( constants ) clauses:
SELECT ...
FROM tbl
WHERE cityId IN (1,2,3...)
AND customerId NOT IN (33,2323,34534...)
AND age IN (5, 7, 72)
Have (at least):
INDEX(cityId),
INDEX(age)
(Negated things are unlikely to be able to use an index.)
The query will use one of the indexes; having both will give the Optimizer a choice of which it thinks is better.
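As a rough illustration of the single-SELECT form, here is a minimal sketch that builds the three IN lists from application-side lists using placeholders. It uses Python's built-in sqlite3 as a stand-in for MySQL; the helper name `build_customer_query`, the sample rows, and the index names are hypothetical, and the lists are assumed to be non-empty (MySQL rejects an empty `IN ()`).

```python
import sqlite3

def build_customer_query(city_ids, excluded_ids, ages):
    """Return (sql, params) for one SELECT with three constant lists.

    Assumes every list is non-empty.
    """
    def marks(values):
        # One "?" placeholder per value, e.g. "?,?,?"
        return ",".join("?" for _ in values)

    sql = (
        "SELECT customerId FROM customers "
        f"WHERE cityId IN ({marks(city_ids)}) "
        f"AND customerId NOT IN ({marks(excluded_ids)}) "
        f"AND age IN ({marks(ages)}) "
        "ORDER BY customerId"
    )
    return sql, [*city_ids, *excluded_ids, *ages]

def demo():
    # In-memory SQLite database with hypothetical sample data.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            customerId INTEGER PRIMARY KEY,
            cityId INTEGER,
            age INTEGER
        );
        CREATE INDEX idx_city ON customers (cityId);
        CREATE INDEX idx_age ON customers (age);
        INSERT INTO customers VALUES
            (10, 1, 5), (33, 2, 7), (40, 3, 72), (50, 9, 5), (60, 1, 30);
    """)
    sql, params = build_customer_query([1, 2, 3], [33, 2323], [5, 7, 72])
    return [row[0] for row in conn.execute(sql, params)]
```

Only the values travel as parameters; the SQL text contains nothing but placeholders. MySQL drivers for Python typically use a different placeholder style than sqlite3's `?`, so that part would need adapting.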
Or...
SELECT c.*
FROM customers AS c
JOIN cityEligible AS b ON b.city = c.city
JOIN customerEligibleAge AS ce ON c.age = ce.age
LEFT JOIN customerIdToBeExcluded AS ex ON c.customerId = ex.customerId
WHERE ex.customerId IS NULL
Suggested indexes (probably as PRIMARY KEY):
customers: (city)
customerEligibleAge: (age)
customerIdToBeExcluded: (customerId)
In order to discuss further, please provide SHOW CREATE TABLE for each table and EXPLAIN SELECT ... for any of the queries that actually work.
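To make the table-driven variant concrete, here is a small sketch of the two inner joins plus the LEFT JOIN ... IS NULL exclusion, run against Python's built-in sqlite3 rather than MySQL. The sample rows are invented; the table names follow the query above.

```python
import sqlite3

def eligible_customers():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customerId INTEGER PRIMARY KEY, city TEXT, age INTEGER);
        CREATE TABLE cityEligible (city TEXT PRIMARY KEY);
        CREATE TABLE customerEligibleAge (age INTEGER PRIMARY KEY);
        CREATE TABLE customerIdToBeExcluded (customerId INTEGER PRIMARY KEY);
        CREATE INDEX idx_customers_city ON customers (city);

        -- Hypothetical sample data
        INSERT INTO customers VALUES
            (1, 'A', 5), (2, 'A', 7), (3, 'B', 5), (4, 'C', 5), (5, 'A', 40);
        INSERT INTO cityEligible VALUES ('A'), ('B');
        INSERT INTO customerEligibleAge VALUES (5), (7);
        INSERT INTO customerIdToBeExcluded VALUES (2);
    """)
    sql = """
        SELECT c.customerId
        FROM customers AS c
        JOIN cityEligible AS b ON b.city = c.city
        JOIN customerEligibleAge AS ce ON ce.age = c.age
        LEFT JOIN customerIdToBeExcluded AS ex ON ex.customerId = c.customerId
        WHERE ex.customerId IS NULL   -- keep only rows with no exclusion match
        ORDER BY c.customerId
    """
    return [row[0] for row in conn.execute(sql)]
```

Customer 2 matches on city and age but is dropped by the exclusion table; customers 4 and 5 fail the city and age joins respectively.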

If you use the database only for that operation, I recommend the first solution. It is also very simple to deploy.
The second solution fills the DB with junk.

Related

Inner join in mysql take a long time

I have a contacts table with more than 1,000,000 records and a cities table with about 20,000 records. I need to fetch all cities that are used in the contacts table.
The contacts table has the following columns:
Id, name, phone, email, city, state, country, postal, address, manager_Id
The cities table has:
Id, city
I used an inner join for this, but it's taking a long time. The query takes more than 2 minutes to execute.
I used this query:
SELECT cities.* FROM cities
INNER JOIN contacts ON contacts.City = cities.city
WHERE contacts.manager_Id= 1
I created an index on manager_Id as well, but it's still very slow.
For better performance you could add indexes:
on table cities column city
on table contacts a composite index on columns (manager_id, city)
Filter contacts first and then join to cities:
SELECT ct.*
FROM cities ct INNER JOIN (
SELECT city FROM contacts
WHERE manager_Id = 1
) cn ON cn.city = ct.city
You need indexes for city in both tables and for manager_id in contacts.
As others have pointed out the need for proper indexes, I'll add a bit more clarification. You are specifically looking for contacts where the MANAGER ID = 1. This is not expected to be one person, but could be many people. So having the MANAGER ID in the first position of the index optimizes "get me all people for that manager". By having the city as part of the index via (manager_id, city), both data elements the query needs are in the index. This way the engine does not have to go to the raw data pages to get the other part of interest.
Now, from that, you want all the city information (hence the join to the cities table).
Since you are only querying the CITIES and not the actual contact people information, you probably want DISTINCT cities. Let's say a manager is responsible for 50 people and most of them live in the same or neighboring cities. You may have only 5 distinct cities. That too will limit the result set being joined.
Having said that, I would do as follows. With MySQL, using STRAIGHT_JOIN can help by telling the optimizer "do the query as I wrote it, don't think for me".
select STRAIGHT_JOIN
cty.*
from
( select distinct c.City
from Contacts c
where c.Manager_ID = 1 ) PQ
JOIN Cities cty
on PQ.City = Cty.City
The "PQ" is an alias representing my "pre-query" of just DISTINCT cities for a given manager.
Again, have one index on the Contacts table on (manager_id, city). On the Cities table, I would expect an index on (city).
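For anyone who wants to try the pre-query idea on a toy data set, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL. STRAIGHT_JOIN is MySQL-specific and is left out; the sample rows and the index names are made up for illustration.

```python
import sqlite3

def cities_for_manager(manager_id):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cities (Id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE contacts (
            Id INTEGER PRIMARY KEY, name TEXT, city TEXT, manager_Id INTEGER
        );
        -- The two indexes discussed above
        CREATE INDEX idx_cities_city ON cities (city);
        CREATE INDEX idx_contacts_mgr_city ON contacts (manager_Id, city);

        -- Hypothetical sample data
        INSERT INTO cities VALUES (1, 'Leeds'), (2, 'York'), (3, 'Hull');
        INSERT INTO contacts VALUES
            (1, 'a', 'Leeds', 1), (2, 'b', 'Leeds', 1),
            (3, 'c', 'York', 1), (4, 'd', 'Hull', 2);
    """)
    sql = """
        SELECT cty.Id, cty.city
        FROM (SELECT DISTINCT c.city
              FROM contacts AS c
              WHERE c.manager_Id = ?) AS pq      -- pre-query: distinct cities
        JOIN cities AS cty ON cty.city = pq.city
        ORDER BY cty.Id
    """
    return conn.execute(sql, (manager_id,)).fetchall()
```

Manager 1 has three contacts but only two distinct cities, so the join against cities sees two rows instead of three.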
You need two indexes, one on each table.
On the contacts table, index manager_Id first, then City:
CREATE INDEX idx_contacts_mgr_city ON contacts(manager_Id, City);
On the cities table, just index City.
Is the 'City' field from the table 'Contacts' a VARCHAR?
If that's the case, I see multiple things here.
First of all, since you already have the 'Id' for the corresponding city in your 'cities' table, I don't see why you wouldn't use that same 'Id' in the 'Contacts' table.
You can add an 'IdCity' field to the 'Contacts' table, so you don't have to modify the existing columns of your records.
You'll have to populate 'IdCity' for each existing record, though, either manually or with a query that matches the city name in 'Contacts' against the 'cities' table and stores the corresponding 'Id'.
Returning to your query:
Then, use an INT JOIN instead of a VARCHAR JOIN. Since you have many records, this can make a significant difference in performance.
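A possible way to do that backfill is sketched below, again with Python's sqlite3 standing in for MySQL. The correlated-subquery UPDATE is used because it is portable; in MySQL a multi-table UPDATE with a JOIN would be the more usual form. The sample rows and the index name are hypothetical.

```python
import sqlite3

def backfill_city_ids():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cities (Id INTEGER PRIMARY KEY, city TEXT UNIQUE);
        CREATE TABLE contacts (Id INTEGER PRIMARY KEY, city TEXT, manager_Id INTEGER);
        INSERT INTO cities VALUES (1, 'Leeds'), (2, 'York');
        INSERT INTO contacts VALUES (1, 'Leeds', 1), (2, 'York', 1), (3, 'Nowhere', 1);

        -- Add the integer reference and fill it from the city name.
        ALTER TABLE contacts ADD COLUMN IdCity INTEGER;
        UPDATE contacts
        SET IdCity = (SELECT ct.Id FROM cities AS ct WHERE ct.city = contacts.city);

        CREATE INDEX idx_contacts_mgr_idcity ON contacts (manager_Id, IdCity);
    """)
    backfilled = conn.execute(
        "SELECT Id, IdCity FROM contacts ORDER BY Id"
    ).fetchall()
    # The join now compares integers instead of strings.
    cities = conn.execute("""
        SELECT DISTINCT ct.Id, ct.city
        FROM contacts AS c
        JOIN cities AS ct ON ct.Id = c.IdCity
        WHERE c.manager_Id = 1
        ORDER BY ct.Id
    """).fetchall()
    return backfilled, cities
```

Contacts whose city name has no match in cities end up with a NULL IdCity, which is worth checking for before relying on the new column.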
It looks like you need to add two indexes, one on cities.city and one on (contacts.manager_Id, contacts.city). That should speed things up significantly.

SELECT data from multiple tables if a requirement is met in second table

The title doesn't describe it that well. My problem:
I have 2 tables, one table for orders, the other for the product.
An order can have n products associated with it.
I want to select those orders, where all their associated products have a status (attribute of the product) greater or equal to x. (So I know that every product of my order is "ready" and the order can be processed further)
Every ordered product has an OrderID
Any tips?
e: Just started with SQL, don't bash me if this is a stupid question
It's a matter of mindset.
You have to find the 'dual' form of your question (-> double negation).
Instead of looking for orders where every line is ready, exclude the orders that have AT LEAST one line that is not ready.
Assuming your tables are the common:
Order(ID,bla,bla,bla) and Order Line(orderID, row#, status, bla, bla) FK orderid references order.
You can use this stub:
Select *
from orders O
where not exists ( select * from order_line OL
where OL.orderID=O.orderID -- binding with outer query
and OL.status <> 'ready'
)
SIDE NOTE: my query will also return empty orders; to filter them out, just add to the outer query: and exists (select * from order_line OE where OE.orderID=O.orderID)
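Here is a small runnable sketch of the double-negation idea, adapted to the numeric "status greater or equal to x" wording of the question. It uses Python's built-in sqlite3 instead of MySQL; the table layout, the `line_no` column, and the sample rows are assumptions.

```python
import sqlite3

def ready_orders(min_status):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (orderID INTEGER PRIMARY KEY);
        CREATE TABLE order_line (
            orderID INTEGER NOT NULL REFERENCES orders (orderID),
            line_no INTEGER NOT NULL,
            status  INTEGER NOT NULL,
            PRIMARY KEY (orderID, line_no)
        );
        -- Order 1: all lines at status 3 or above; order 2: one line lagging;
        -- order 3: no lines at all.
        INSERT INTO orders VALUES (1), (2), (3);
        INSERT INTO order_line VALUES (1, 1, 3), (1, 2, 4), (2, 1, 3), (2, 2, 1);
    """)
    sql = """
        SELECT o.orderID
        FROM orders AS o
        WHERE NOT EXISTS (            -- no line below the threshold
                SELECT 1 FROM order_line AS ol
                WHERE ol.orderID = o.orderID AND ol.status < ?)
          AND EXISTS (                -- and at least one line exists
                SELECT 1 FROM order_line AS oe
                WHERE oe.orderID = o.orderID)
        ORDER BY o.orderID
    """
    return [row[0] for row in conn.execute(sql, (min_status,))]
```

The second EXISTS is the empty-order filter from the side note; dropping it would let order 3 through, since an order with no lines trivially has no unready line.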

Sql: choose all baskets containing a set of particular items

Eddy has baskets with items. Each item can belong to arbitrary number of baskets or can belong to none of them.
Sql schema to represent it is as following:
tbl_basket
- basketId
tbl_item
- itemId
tbl_basket_item
- pkId
- basketId
- itemId
Question: how to select all baskets containing a particular set of items?
UPDATE. Baskets with all the items are needed. Otherwise it would have been easy task to solve.
UPDATE B. Have implemented following solution, including SQL generation in PHP:
SELECT basketId
FROM tbl_basket
JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = 1 ) AS t0 USING(basketId)
JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = 15 ) AS t1 USING(basketId)
JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = 488) AS t2 USING(basketId)
where number of JOINs equals to number of items.
That works well unless some of the items are included in almost every basket. Then performance drops dramatically.
UPDATE B+. To resolve the performance issues a heuristic is applied. First you select the frequency of each item. If it exceeds some threshold, you don't include it in the JOINs and either:
apply post-filtering in PHP
or just don't apply the filter for that particular itemId, giving the user approximate results in a reasonable amount of time
UPDATE B++. Seems that the current problem has no nice solution in MySQL. This raises one question and one possible solution:
(question) Does PostgreSQL have some advanced indexing techniques which allow solving this problem without doing a full scan?
(solution) Seems that it could be solved nicely in Redis using sets and SINTER command to get an intersection.
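The intersection idea can be sketched without any server at all. The snippet below uses plain Python sets, one per item, holding the basket ids that contain it; it is not a Redis client, and the function name and sample data are hypothetical.

```python
def baskets_with_all(item_to_baskets, wanted_items):
    """Intersect the per-item basket sets for every wanted item."""
    sets = [item_to_baskets.get(item, set()) for item in wanted_items]
    if not sets:
        return set()
    # Starting from the smallest set keeps intermediate results small,
    # which is the same reason rare items are cheap and common ones are not.
    sets.sort(key=len)
    return set.intersection(*sets)

# Hypothetical index: itemId -> set of basketIds containing that item
index = {1: {10, 11, 12}, 15: {11, 12}, 488: {12, 13}}
```

With this layout an item that sits in almost every basket only costs one large set in the intersection, instead of a large join.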
I think the best way is to create a temporary table with the set of needed items (procedure that takes the item ids as parameters or something along those lines) and then left join it with all of the above tables joined together.
If for a given basketid you have NO nulls on the right side of the left join, the basket contains all the needed items.
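One way to read the temporary-table suggestion is sketched below with Python's built-in sqlite3 (not MySQL). Every basket is paired with every needed item, the link table is left-joined on, and a basket qualifies when none of its pairs came back NULL. The temp table name `needed_item` and the sample rows are assumptions.

```python
import sqlite3

def baskets_containing_all(item_ids):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_basket (basketId INTEGER PRIMARY KEY);
        CREATE TABLE tbl_basket_item (
            basketId INTEGER, itemId INTEGER, PRIMARY KEY (basketId, itemId)
        );
        INSERT INTO tbl_basket VALUES (1), (2), (3);
        INSERT INTO tbl_basket_item VALUES
            (1, 1), (1, 15), (1, 488), (2, 1), (2, 15), (3, 488);
        CREATE TEMP TABLE needed_item (itemId INTEGER PRIMARY KEY);
    """)
    # De-duplicate so the primary key on the temp table is not violated.
    conn.executemany(
        "INSERT INTO needed_item VALUES (?)", [(i,) for i in set(item_ids)]
    )
    sql = """
        SELECT b.basketId
        FROM tbl_basket AS b
        CROSS JOIN needed_item AS n
        LEFT JOIN tbl_basket_item AS bi
               ON bi.basketId = b.basketId AND bi.itemId = n.itemId
        GROUP BY b.basketId
        HAVING COUNT(*) = COUNT(bi.itemId)   -- no NULLs on the right side
        ORDER BY b.basketId
    """
    return [row[0] for row in conn.execute(sql)]
```

COUNT(*) counts every needed item for the basket, while COUNT(bi.itemId) skips the NULLs produced by missing items, so the two only agree when nothing is missing.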
-- the table definitions
CREATE TABLE basket ( basketid INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE item ( itemid INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE basket_item
( basketid INTEGER NOT NULL REFERENCES basket (basketid)
, itemid INTEGER NOT NULL REFERENCES item (itemid)
, PRIMARY KEY (basketid, itemid)
);
-- the query
SELECT * FROM basket b
WHERE NOT EXISTS (
SELECT * FROM item i
WHERE i.itemid IN (1,15,488)
AND NOT EXISTS (
SELECT * FROM basket_item bi
WHERE bi.basketid = b.basketid
AND bi.itemid = i.itemid
)
);
If you are going to provide the list of items, then edit id1, id2, etc. in the query below:
select distinct t.basketId
from tbl_basket_item as t
where t.itemID in (id1, id2)
This gives all baskets containing at least one of the listed items; on its own it does not guarantee that a basket contains all of them. No need to join any other tables, as your requirements don't need them.
The simplest solution is to use a HAVING clause on the basket/item link table (GROUP BY must come before HAVING):
SELECT basketId
FROM tbl_basket_item
WHERE itemId IN (1,15,488)
GROUP BY basketId
HAVING COUNT(DISTINCT itemId) = 3 -- DISTINCT in case we have duplicate items in a basket
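A quick way to check the GROUP BY / HAVING form against sample data is sketched below, using Python's built-in sqlite3 rather than MySQL. The count in the HAVING clause is derived from the de-duplicated item list instead of being hard-coded; the function name and rows are made up.

```python
import sqlite3

def baskets_with_all_items(item_ids):
    wanted = sorted(set(item_ids))
    if not wanted:
        return []  # nothing to match; also avoids an empty IN () list
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_basket_item (
            pkId INTEGER PRIMARY KEY, basketId INTEGER, itemId INTEGER
        );
        INSERT INTO tbl_basket_item (basketId, itemId) VALUES
            (1, 1), (1, 15), (1, 488), (2, 1), (2, 15), (3, 488);
    """)
    marks = ",".join("?" * len(wanted))  # "?,?,?" for three items
    sql = f"""
        SELECT basketId
        FROM tbl_basket_item
        WHERE itemId IN ({marks})
        GROUP BY basketId
        HAVING COUNT(DISTINCT itemId) = ?
        ORDER BY basketId
    """
    return [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]
```

Compared with one JOIN per item, the query text stays the same shape no matter how many items are requested.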

MySQL comma separated values and using IN to select data

I store destinations a user is willing to ship a product to in a varchar field like this:
"userId" "destinations" "product"
"1" "US,SE,DE" "apples"
"2" "US,SE" "books"
"3" "US" "mushrooms"
"1" "SE,DE" "figs"
"2" "UK" "Golf Balls"
I was hoping this query would return all rows where US was present. Instead it returns only a single row.
select * from destinations where destinations IN('US');
How do I get this right? Am I using the wrong column type? or is it my query that's failing.
Current Results
US
Expected Results
US,SE,DE
US,SE
US
Try with FIND_IN_SET
select * from destinations where FIND_IN_SET('US',destinations);
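To see why IN ('US') only matches the row whose whole string is "US" while FIND_IN_SET matches list members, here is a small sketch. SQLite has no FIND_IN_SET, so a rough Python stand-in is registered under that name; it only approximates MySQL's function and is an assumption of this example, as is the ordering by product.

```python
import sqlite3

def find_in_set(needle, csv):
    """Rough stand-in for MySQL's FIND_IN_SET: 1-based position, 0 if absent."""
    if needle is None or csv is None:
        return None
    parts = csv.split(",")
    return parts.index(needle) + 1 if needle in parts else 0

def compare_queries():
    conn = sqlite3.connect(":memory:")
    conn.create_function("FIND_IN_SET", 2, find_in_set)
    conn.executescript("""
        CREATE TABLE destinations (userId INTEGER, destinations TEXT, product TEXT);
        INSERT INTO destinations VALUES
            (1, 'US,SE,DE', 'apples'), (2, 'US,SE', 'books'),
            (3, 'US', 'mushrooms'), (1, 'SE,DE', 'figs'), (2, 'UK', 'Golf Balls');
    """)
    # IN compares the whole column value with each listed constant.
    with_in = [r[0] for r in conn.execute(
        "SELECT product FROM destinations "
        "WHERE destinations IN ('US') ORDER BY product")]
    # FIND_IN_SET looks for the value among the comma-separated members.
    with_fis = [r[0] for r in conn.execute(
        "SELECT product FROM destinations "
        "WHERE FIND_IN_SET('US', destinations) > 0 ORDER BY product")]
    return with_in, with_fis
```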
Unfortunately, the way you've structured your table, you'll have to check for a pattern match for "US" in your string at the beginning, middle, or end.
One way you can do that is using LIKE, as follows:
SELECT *
FROM destinations
WHERE destinations LIKE ('%US%');
Another way is using REGEXP:
SELECT *
FROM destinations
WHERE destinations REGEXP '.*US.*';
Yet another is using FIND_IN_SET, as explained by Sadkhasan.
CAVEAT
None of these will offer great performance or data integrity, though. And they will all COMPOUND their performance problems when you add criteria to your search.
E.g. using FIND_IN_SET, proposed by Sadkhasan, you would have to do something like:
SELECT * FROM destinations
WHERE FIND_IN_SET('US',destinations)
OR FIND_IN_SET('CA',destinations)
OR FIND_IN_SET('ET',destinations);
Using REGEXP is a little better, though REGEXP is innately slow:
SELECT *
FROM destinations
WHERE destinations REGEXP 'US|CA|ET';
SO WHAT NOW?
Your best bet would be switching to a 3NF design with destinations applying to products by splitting into 2 tables that you can join, e.g.:
CREATE TABLE products (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
userId INT NOT NULL REFERENCES users(id),
name VARCHAR(255) NOT NULL
) ENGINE=InnoDB;
Then you would add what's called a composite key table, each row containing a productId and a single country, with one row per country.
CREATE TABLE product_destinations (
productId INT NOT NULL REFERENCES products(id),
country VARCHAR(2) NOT NULL,
PRIMARY KEY (productId, country)
) ENGINE=InnoDB;
Data in this table would look like:
productId | country
----------|--------
1 | US
1 | CA
1 | ET
2 | US
2 | GB
Then you could structure a query like this:
SELECT p.*
FROM products AS p
INNER JOIN product_destinations AS d
ON p.id = d.productId
WHERE d.country IN ('US', 'CA', 'ET')
GROUP BY p.id;
It's important to add the GROUP BY (or DISTINCT in the SELECT clause), as a single product may ship to multiple countries, resulting in multiple row matches - aggregation will reduce those to a single result per product id.
An added bonus is you don't have to UPDATE your countries column and do string operations to determine if the country already exists there. You can let the database do that for you, and INSERT - preventing locking issues that will further compound your problems.
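A possible migration from the comma-separated column to the two-table layout is sketched below with Python's built-in sqlite3 as a stand-in for MySQL. The `legacy` table name is hypothetical, and `INSERT OR IGNORE` is SQLite syntax (MySQL spells it `INSERT IGNORE`).

```python
import sqlite3

def migrate_and_query(countries):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE legacy (userId INTEGER, destinations TEXT, product TEXT);
        INSERT INTO legacy VALUES
            (1, 'US,SE,DE', 'apples'), (2, 'US,SE', 'books'),
            (3, 'US', 'mushrooms'), (1, 'SE,DE', 'figs'), (2, 'UK', 'Golf Balls');

        CREATE TABLE products (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            userId INTEGER NOT NULL,
            name TEXT NOT NULL
        );
        CREATE TABLE product_destinations (
            productId INTEGER NOT NULL REFERENCES products (id),
            country TEXT NOT NULL,
            PRIMARY KEY (productId, country)
        );
    """)
    legacy_rows = conn.execute(
        "SELECT userId, destinations, product FROM legacy"
    ).fetchall()
    for user_id, csv, name in legacy_rows:
        cur = conn.execute(
            "INSERT INTO products (userId, name) VALUES (?, ?)", (user_id, name)
        )
        # One row per country; the composite key rejects duplicates.
        conn.executemany(
            "INSERT OR IGNORE INTO product_destinations VALUES (?, ?)",
            [(cur.lastrowid, c.strip()) for c in csv.split(",") if c.strip()],
        )
    marks = ",".join("?" * len(countries))
    sql = f"""
        SELECT DISTINCT p.name
        FROM products AS p
        JOIN product_destinations AS d ON d.productId = p.id
        WHERE d.country IN ({marks})
        ORDER BY p.name
    """
    return [row[0] for row in conn.execute(sql, list(countries))]
```

After the split, a search for several countries is an ordinary IN on an indexed column rather than a string scan per country.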
You can use this if your destinations contain only two-character country codes:
SELECT * FROM destinations WHERE destinations LIKE ('%US%')
To add another country:
SELECT * FROM destinations WHERE destinations LIKE ('%US%')
AND destinations LIKE ('%SE%')
Use AND or OR between the LIKE conditions depending on the result you want.

mysql select problem

Hi, we're trying to perform a MySQL select which isn't going to plan, and we're hoping someone can shed some light on it.
We have an estimated 10,000-plus listing records; a customer can have several listing records for different locations. We need to select all customer listings where at least one of the locations is equal to a specified location.
So for example, let's say customer 1 has listings in sheffield, doncaster, leeds and wakefield, and customer 2 has listings in london and brighton.
Now I want to select all customer listings where one of the listings is for the area sheffield.
I'd hope to get back the 4 rows for customer 1 because one of his listings is in sheffield.
For the sake of this example let's just presume the table consists of just customerId and LocationName.
I need to select all customerIds where one of the locationNames = sheffield. So I'd get 4 rows returned with the customer ID and the 4 locations.
How do you write this query in MySQL? I'm guessing a subselect, but I'm not too sure.
SELECT customerid, LocationName FROM customers_location
WHERE customerid IN(SELECT DISTINCT customerid FROM customers_location WHERE LocationName = 'sheffield')
Something like:
SELECT * FROM CUST_TABLE WHERE CUST_ID IN (
SELECT DISTINCT CUST_ID FROM CUST_TABLE WHERE CUST_LOCN='Sheffield')
Note: the DISTINCT is not strictly necessary; IN only tests membership, so duplicate ids in the subquery do not change the result.
That would give you eg. 4 records for customer xyz who has one of their listing locations as Sheffield, which I think is what you're asking.
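For completeness, here is a sketch of the same result written as a join against a derived table of matching customer ids, run on Python's built-in sqlite3 rather than MySQL. The table and column names follow the question's simplified example; the alias `hit` and the sample rows are hypothetical.

```python
import sqlite3

def listings_for_customers_in(location):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers_location (customerId INTEGER, LocationName TEXT);
        INSERT INTO customers_location VALUES
            (1, 'sheffield'), (1, 'doncaster'), (1, 'leeds'), (1, 'wakefield'),
            (2, 'london'), (2, 'brighton');
    """)
    sql = """
        SELECT cl.customerId, cl.LocationName
        FROM customers_location AS cl
        JOIN (SELECT DISTINCT customerId
              FROM customers_location
              WHERE LocationName = ?) AS hit   -- customers with a matching listing
          ON hit.customerId = cl.customerId
        ORDER BY cl.customerId, cl.LocationName
    """
    return conn.execute(sql, (location,)).fetchall()
```

Here the DISTINCT does matter: without it, a customer with two listings in the same location would have every listing returned twice.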