Sql: choose all baskets containing a set of particular items - mysql

Eddy has baskets with items. Each item can belong to an arbitrary number of baskets, or to none of them.
The SQL schema representing this is as follows:
tbl_basket
- basketId
tbl_item
- itemId
tbl_basket_item
- pkId
- basketId
- itemId
Question: how do I select all baskets containing a particular set of items?
UPDATE. Only baskets containing all of the items are needed. Otherwise it would have been an easy task to solve.
UPDATE B. I have implemented the following solution, including SQL generation in PHP:
SELECT basketId
FROM tbl_basket
JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = 1 ) AS t0 USING(basketId)
JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = 15 ) AS t1 USING(basketId)
JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = 488) AS t2 USING(basketId)
where the number of JOINs equals the number of items.
That works well unless some of the items are included in almost every basket. Then performance drops dramatically.
UPDATE B+. To resolve the performance issues, a heuristic is applied. First you select the frequency of each item. If it exceeds some threshold, you don't include the item in the JOINs and either:
apply post-filtering in PHP
or just don't filter by that particular itemId, giving the user approximate results in a reasonable amount of time
UPDATE B++. It seems that the current problem has no nice solution in MySQL. This point raises one question and one solution:
(question) Does PostgreSQL have some advanced indexing techniques which allow solving this problem without doing a full scan?
(solution) It seems that it could be solved nicely in Redis, using sets and the SINTER command to get an intersection.

I think the best way is to create a temporary table with the set of needed items (procedure that takes the item ids as parameters or something along those lines) and then left join it with all of the above tables joined together.
If for a given basketid you have NO nulls on the right side of the left join, the basket contains all the needed items.
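A minimal sketch of this idea in MySQL, assuming the question's tbl_basket and tbl_basket_item tables. The temporary table name tmp_wanted is hypothetical, and pairing every basket with every wanted item through a CROSS JOIN is only one possible reading of "left join it with the tables above":

```sql
-- Hypothetical temporary table holding the wanted item ids
CREATE TEMPORARY TABLE tmp_wanted (itemId INT NOT NULL PRIMARY KEY);
INSERT INTO tmp_wanted (itemId) VALUES (1), (15), (488);

-- Pair each basket with each wanted item, then LEFT JOIN the link table.
-- A wanted item that is missing from a basket produces a row with NULL
-- on the right side, so COUNT(bi.itemId) falls below COUNT(*).
SELECT b.basketId
FROM tbl_basket AS b
CROSS JOIN tmp_wanted AS w
LEFT JOIN tbl_basket_item AS bi
       ON bi.basketId = b.basketId
      AND bi.itemId   = w.itemId
GROUP BY b.basketId
HAVING COUNT(*) = COUNT(bi.itemId);  -- no NULLs: every wanted item is present
```

The HAVING test stays correct even if tbl_basket_item holds the same item twice for a basket, because matched rows are counted on both sides of the comparison.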

-- the table definitions
CREATE TABLE basket ( basketid INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE item ( itemid INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE basket_item
( basketid INTEGER NOT NULL REFERENCES basket (basketid)
, itemid INTEGER NOT NULL REFERENCES item (itemid)
, PRIMARY KEY (basketid, itemid)
);
-- the query
SELECT * FROM basket b
WHERE NOT EXISTS (
SELECT * FROM item i
WHERE i.itemid IN (1,15,488)
AND NOT EXISTS (
SELECT * FROM basket_item bi
WHERE bi.basketid = b.basketid
AND bi.itemid = i.itemid
)
);

If you are going to provide the list of items, then edit id1, id2, etc. in the query below:
select distinct t.basketId
from tbl_basket_item as t
where t.itemID in (id1, id2)
This gives all baskets containing at least one of the listed items; by itself it does not guarantee that a basket contains all of them. No need to join any other tables, as your requirements don't need them.

The simplest solution is to use a HAVING clause.
SELECT basketId
FROM tbl_basket_item
WHERE itemId IN (1, 15, 488)
GROUP BY basketId
HAVING COUNT(DISTINCT itemId) = 3; -- DISTINCT in case a basket contains duplicate items

Related

Update Column Using ROW_Number() function. But it is failing. Could Any one suggest a solution?

I know this might be a silly question, but I have not found a solution so far, so I am asking it with all the inputs and outputs I have. Could anyone suggest a solution?
What I want to do: a ParcelNo can have one or more invoice numbers. I want to find how many invoice numbers a parcel has and give each one a rank. The ranking part is important because my further work depends on this column.
I have one table named TableA. It has three columns: Invoicenumber, which is the unique id; ParcelNo, which can be duplicated; and Ranking, which I want to update.
CREATE TABLE TableA
(
Invoicenumber varchar(5),
ParcelNo varchar(5),
Ranking bit,
IDate Datetime
)
INSERT INTO TableA (Invoicenumber, ParcelNo)
VALUES ('INV01', 'P0001'), ('INV02', 'P0001'),
('INV03', 'P0002'), ('INV04', 'P0002'),
('INV05', 'P0003'), ('INV06', 'P0003')
When I run the following query the output is as desired.
;WITH CTE AS
(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY PARCELNO ORDER BY INVOICENUMBER) AS RWNO
FROM
TableA
)
SELECT
T.*, C.RWNO
FROM CTE C
JOIN TableA T ON T.Invoicenumber = C.Invoicenumber
The output is below:
So, I tried to update the Ranking column in TableA.
I ran this query to do so:
;WITH CTE AS
(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY PARCELNO ORDER BY INVOICENUMBER) AS RWNO
FROM
TableA
)
UPDATE T
SET Ranking = C.RWNO
FROM CTE C
JOIN TableA T ON T.Invoicenumber = C.Invoicenumber
But the output is wrong. The column is not updated as expected.
Below is the output of the updated column:
Why is the Ranking column updated incorrectly?
I want to update the column to prepare some data. This table is a sample for the explanation.
I am elaborating on my issue below.
The image below shows two tables:
Table A and Table B both have an IDate column.
I want to update the IDate column in A from B. But the dates should be unique: the first date should not be repeated. These dates are associated with Invoicenumbers.
I think what you really want is a calculated column (called a calculated field or generated field). I'm guessing that your parcel number should point to another table that stores information about the parcels. If that's the case, then go with:
-- First approach
CREATE TABLE Parcels (
id int IDENTITY (1,1) NOT NULL PRIMARY KEY, -- a key is required so that Invoices can reference it
ParcelNo varchar(5),
Description varchar(max)
-- Ranking AS (SELECT COUNT(*) FROM Invoices i WHERE i.ParcelID = id)
);
CREATE TABLE Invoices (
id int IDENTITY (1,1) NOT NULL,
InvoiceNumber varchar(5),
ParcelID int FOREIGN KEY REFERENCES Parcels(id)
);
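GO
-- SQL Server does not allow a subquery directly in a computed column,
-- so the count is wrapped in a scalar function (CountInvoices is a hypothetical name)
CREATE FUNCTION dbo.CountInvoices (@ParcelID int) RETURNS int AS
BEGIN
RETURN (SELECT COUNT(*) FROM Invoices i WHERE i.ParcelID = @ParcelID);
END;
GO
ALTER TABLE Parcels ADD Ranking AS dbo.CountInvoices(id);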
INSERT INTO Parcels
(ParcelNo)
VALUES
('P0001'),
('P0002'),
('P0003');
INSERT INTO Invoices
(InvoiceNumber, ParcelID)
VALUES
('INV01', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0001')),
('INV02', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0001')),
('INV03', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0002')),
('INV04', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0002')),
('INV05', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0003')),
('INV06', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0003'));
On the other hand, if you really want all the data in a single table, then try this:
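-- Second approach
CREATE TABLE TableA (
Invoicenumber varchar(5),
ParcelNo varchar(5)
);
GO
-- A computed column cannot contain a subquery, so use a scalar function (hypothetical name)
CREATE FUNCTION dbo.CountInvoicesInParcel (@ParcelNo varchar(5)) RETURNS int AS
BEGIN RETURN (SELECT COUNT(*) FROM TableA a WHERE a.ParcelNo = @ParcelNo); END;
GO
ALTER TABLE TableA ADD Ranking AS dbo.CountInvoicesInParcel(ParcelNo);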
Some notes:
Both of my approaches assume that by ranking, you mean that you want a count of how many invoices are in a parcel.
My first approach has a circular reference, because the Invoices table has a foreign key into the Parcels table, but the Parcels table tabulates information from the Invoices table. That's why I commented out the calculated field in the first table, then added the calculated field back in after creating both tables.
Notice that I capitalized all SQL keywords (except the types such as varchar). It's easier to read SQL if you either go with all caps or no caps for an entire query.
Notice my semicolons at the end of each logical break. Semi-colons are technically optional, but a lot of folks consider using them to be good practice.
For my first approach, I'm using a foreign key. You can read more about those here.
Because my first approach split the table into 2 tables, I needed to somehow know the id of the Parcels table when populating the Invoices table, even though the ids are given by the database (so I can't know them ahead of time). Those select statements accomplish that.
My syntax should work with SQL Server, but not necessarily with any other DBMS. That's because SQL Server's syntax for calculated fields is not ANSI standard.
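As for why the original UPDATE misbehaves: one likely cause, judging from the CREATE TABLE in the question, is that Ranking is declared as bit. SQL Server stores any nonzero value assigned to a bit column as 1, so row numbers 1 and 2 would both end up as 1. A minimal sketch of a fix for SQL Server, assuming the column can be changed to int:

```sql
-- Assumption: the wrong values come from the bit type of Ranking,
-- which can only hold 0, 1 or NULL.
ALTER TABLE TableA ALTER COLUMN Ranking int;

-- Number the invoices within each parcel and write the number back
-- by updating through the CTE.
WITH CTE AS (
    SELECT Ranking,
           ROW_NUMBER() OVER (PARTITION BY ParcelNo ORDER BY Invoicenumber) AS RWNO
    FROM TableA
)
UPDATE CTE
SET Ranking = RWNO;
```

Updating the CTE directly avoids joining TableA to itself; the join on Invoicenumber from the question should work as well once the column type is int.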

Accessing multiple tables at the same time and multiplying values in SQL

My problem is very specific and I couldn't figure out a better name for the title.
I have 3 tables, which are Pessoa (Person), Bicicleta (Bicycle) and Viagem (Trip):
What I want to do is select, in alphabetical order, the names of the individuals who had a trip, together with the Avaliacao (Evaluation) multiplied by Valor_Viagem (Trip cost).
What I tried to do (neither finished nor working properly):
select distinct PESSOA.Nome, VIAGEM.Avaliacao, VIAGEM.Id_Bicicleta, BICICLETA.Valor_Viagem from PESSOA, VIAGEM
join BICICLETA ON VIAGEM.Id_Bicicleta = BICICLETA.Id where PESSOA.Email IN (
SELECT Email_Utilizador FROM VIAGEM
);
Which gives me:
^This is NOT what I want, as stated before.
I am also not 100% sure what you are looking for, but I assume you need a list of distinct names with Avaliacao * Valor_Viagem summed for each person (so for a person with 5 trips, the five products Avaliacao * Valor_Viagem are added together).
That is very easy to achieve:
select PESSOA.Nome, SUM(VIAGEM.Avaliacao * BICICLETA.Valor_Viagem) AS trip_cost
from PESSOA
join VIAGEM ON VIAGEM.Email_Utilizador = PESSOA.Email
join BICICLETA ON VIAGEM.Id_Bicicleta = BICICLETA.Id
GROUP BY PESSOA.Nome
ORDER BY PESSOA.Nome;
What happens is the following:
first you compute the product for each trip
then you use the GROUP BY clause to group persons with identical names together
using SUM in combination with GROUP BY sums the values of all records within each group, in this case all records with the same PESSOA.Nome
A word of warning
This assumes that names are distinct, which is risky. It is better to assign each person a unique Id and use this Id instead of the name.
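The warning above could be addressed along these lines. This is only a sketch: it assumes that PESSOA.Email uniquely identifies a person and that VIAGEM.Email_Utilizador refers to it, which the question's subquery suggests but does not confirm.

```sql
-- Assumption: PESSOA.Email is unique per person, so grouping by it keeps
-- two different people who share a name in separate rows.
SELECT PESSOA.Nome,
       SUM(VIAGEM.Avaliacao * BICICLETA.Valor_Viagem) AS trip_cost
FROM PESSOA
JOIN VIAGEM    ON VIAGEM.Email_Utilizador = PESSOA.Email
JOIN BICICLETA ON BICICLETA.Id = VIAGEM.Id_Bicicleta
GROUP BY PESSOA.Email, PESSOA.Nome
ORDER BY PESSOA.Nome;
```

Because the joins are inner joins, people without any trip drop out of the result, which matches the "who had a trip" requirement.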

SELECT data from multiple tables if a requirement is met in second table

The title doesn't describe it that well. My problem:
I have 2 tables, one for orders, the other for products.
An order can have n products associated with it.
I want to select those orders where all associated products have a status (an attribute of the product) greater than or equal to x. (So I know that every product of my order is "ready" and the order can be processed further.)
Every ordered product has an OrderID.
Any tips?
Edit: I just started with SQL, so don't bash me if this is a stupid question.
It's a matter of mindset.
You have to find the 'dual' form of your question (-> double negation).
Find the orders that have AT LEAST one line that is not ready, then exclude them: what remains are the orders with no line that is not ready.
Assuming your tables are the usual:
orders(orderID, bla, bla, bla) and order_line(orderID, row#, status, bla, bla), with an FK from order_line.orderID to orders.
You can use this stub:
Select *
from orders O
where not exists ( select * from order_line OL
where OL.orderID = O.orderID -- binding with outer query
and OL.status <> 'ready'
)
SIDE NOTE: my query will also return empty orders (orders with no lines). To filter them out, add this to the outer query: and exists (select * from order_line oe where oe.orderID = O.orderID)
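A small worked example of the double negation together with the side note, using the numeric status from the question. The table layout, the column name line_no, the sample rows and the threshold 5 are all hypothetical and chosen only for illustration:

```sql
-- Hypothetical tables and data
CREATE TABLE orders (orderID INT NOT NULL PRIMARY KEY);
CREATE TABLE order_line (
    orderID INT NOT NULL,
    line_no INT NOT NULL,
    status  INT NOT NULL,
    PRIMARY KEY (orderID, line_no),
    FOREIGN KEY (orderID) REFERENCES orders (orderID)
);

INSERT INTO orders (orderID) VALUES (1), (2), (3);
INSERT INTO order_line (orderID, line_no, status) VALUES
    (1, 1, 5), (1, 2, 7),   -- order 1: every line at or above the threshold
    (2, 1, 5), (2, 2, 2);   -- order 2: one line below the threshold
-- order 3 has no lines at all

-- Orders with at least one line and no line below the threshold (x = 5)
SELECT o.orderID
FROM orders AS o
WHERE NOT EXISTS (SELECT 1 FROM order_line AS ol
                  WHERE ol.orderID = o.orderID AND ol.status < 5)
  AND EXISTS     (SELECT 1 FROM order_line AS ol
                  WHERE ol.orderID = o.orderID);
-- With the sample rows above, only orderID 1 qualifies.
```

Dropping the EXISTS condition would also return order 3, the empty order mentioned in the side note.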

sql table design to fetch records with multiple inclusion and exclusion conditions

We want to select customers based on the following parameters, i.e. the customer should match:
specific cities, i.e. cityId=1,2,3...
specific customerIds should be excluded, i.e. customerId=33,2323,34534...
specific ages, i.e. 5 years, 7 years, 72 years...
These inclusion and exclusion lists can be arbitrarily long.
How should we design the database for this:
Create a separate table 'customerInclusionCities' for the inclusion cities and query like:
select * from customers where cityId in (select cityId from customerInclusionCities)
Do the same for age: create a table 'customerEligibleAge' with an entry for each eligible age:
i.e. select * from customers where age in (select age from customerEligibleAge)
and create a separate table 'customerIdToBeExcluded' for excluding customers:
i.e. select * from customers where customerId not in (select customerId from customerIdToBeExcluded)
OR
Create one table with Category and Ids,
i.e. Category1 for cities, Category2 for customerIds to be excluded.
Which approach is better: creating one table for these parameters, or creating separate tables for each list, i.e. age, customerId, city?
IN ( SELECT ... ) can be very slow. Do your query as a single SELECT without subqueries. I assume all 3 columns are in the same table? (If not, that adds complexity.) The WHERE clause will probably have 3 IN ( constants ) clauses:
SELECT ...
FROM tbl
WHERE cityId IN (1,2,3...)
AND customerId NOT IN (33,2323,34534...)
AND age IN (5, 7, 72)
Have (at least):
INDEX(cityId),
INDEX(age)
(Negated things are unlikely to be able to use an index.)
The query will use one of the indexes; having both will give the Optimizer a choice of which it thinks is better.
Or...
SELECT c.*
FROM customers AS c
JOIN customerInclusionCities AS ci ON ci.cityId = c.cityId
JOIN customerEligibleAge AS ce ON c.age = ce.age
LEFT JOIN customerIdToBeExcluded AS ex ON c.customerId = ex.customerId
WHERE ex.customerId IS NULL
Suggested indexes (probably as PRIMARY KEY):
customers: (cityId)
customerEligibleAge: (age)
customerIdToBeExcluded: (customerId)
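The list tables and indexes suggested above might be declared as follows in MySQL. This is a sketch under assumptions: the table names come from the question, the INT column types and the index name idx_customers_cityId are guesses, and the customers table is assumed to exist already.

```sql
-- One row per allowed or excluded value; the PRIMARY KEY doubles as the index
CREATE TABLE customerInclusionCities (cityId     INT NOT NULL PRIMARY KEY);
CREATE TABLE customerEligibleAge     (age        INT NOT NULL PRIMARY KEY);
CREATE TABLE customerIdToBeExcluded  (customerId INT NOT NULL PRIMARY KEY);

-- Lets the optimizer start from the city list and look up matching customers
ALTER TABLE customers ADD INDEX idx_customers_cityId (cityId);
```

A PRIMARY KEY on each list table also prevents duplicate entries, which would otherwise multiply rows in the JOIN version of the query.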
In order to discuss further, please provide SHOW CREATE TABLE for each table and EXPLAIN SELECT ... for any of the queries that actually work.
If you use the database only for that operation, I recommend the first solution. The first solution is also very simple to deploy.
The second solution fills the DB with junk.

Strange behavior in SQL with CASE WHEN EXIST in subquery (MySQL 5.5)

I've got a shop with items and item groups.
I also have some additional items, one of which should be randomly selected and presented in the cart overview, provided it is not already in the cart.
Additional items can be linked to an item's group as well. If no additional item is linked to the items in the cart, I want one of the group's additional items that is not already in the cart to be randomly selected.
I save relations between items in the table item2item:
itemid INT
additional_item_id INT
I save relations between groups and items in table group2item
groupid INT
additional_item_id INT
To make it a little more simple let's assume my item table looks like this:
itemid INT
name VARCHAR(100)
Here is what i tried to get an additional item:
SELECT
name
FROM
items a
WHERE
(a.itemid) = (
# if we have any additional items linked to the item get one of em randomly, that is not inside of a cart
SELECT
CASE WHEN EXISTS (
SELECT
b.additional_item_id
FROM
item2item b
WHERE
b.additional_item_id NOT IN (10)
AND b.itemid IN (10)
)
THEN (
SELECT
c.additional_item_id
FROM
item2item c
WHERE
c.additional_item_id NOT IN (10)
AND c.itemid IN (10)
ORDER BY
RAND()
LIMIT 1
# else if we have additional items linked to the items group get one of em randomly, that is not inside of a cart
) ELSE (
(
SELECT
d.additional_item_id
FROM
group2item d
WHERE
d.additional_item_id NOT IN (10)
AND
d.groupid IN (1)
ORDER BY RAND()
LIMIT 1
)
)
END as selecteditemid
)
Can anyone explain to me why I get different numbers of rows with this?
You are asking why you might get different numbers of rows. Here are some thoughts:
items.itemid is not unique, so the duplicates are coming from multiple matches.
The else clause is executed and the where clause filters out all rows.
The else clause is executed and group2item.additional_item_id matches no items.itemid.
I speculate that the data might change between the evaluation of the when condition and the evaluation of the then branch of the case expression.
If you are looking for a fix to this, then move the subqueries to the from clause and put simpler logic in the where.
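One way the suggested fix could look is sketched below. A further likely cause of the varying row counts, not confirmed by the question, is that a subquery containing RAND() is re-evaluated for every row of items, so each row is compared against a different random id. Moving the pick into a derived table in the FROM clause makes it in one place. The alias pick is hypothetical, and COALESCE stands in for the CASE WHEN EXISTS logic:

```sql
SELECT a.name
FROM (
    -- First try an additional item linked to the cart item (10);
    -- if none exists the scalar subquery yields NULL and COALESCE
    -- falls back to an additional item linked to the group (1).
    SELECT COALESCE(
        (SELECT c.additional_item_id
         FROM item2item AS c
         WHERE c.itemid IN (10)
           AND c.additional_item_id NOT IN (10)
         ORDER BY RAND()
         LIMIT 1),
        (SELECT d.additional_item_id
         FROM group2item AS d
         WHERE d.groupid IN (1)
           AND d.additional_item_id NOT IN (10)
         ORDER BY RAND()
         LIMIT 1)
    ) AS selecteditemid
) AS pick
JOIN items AS a ON a.itemid = pick.selecteditemid;
```

If items.itemid is unique, this returns at most one row, and no row when neither table offers a candidate, because a NULL pick matches nothing in the join.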