MySQL: How can this UNION be improved?

I have two tables, a for access and p for provider. I then have a third table for joining the two, as in standard normalization. However, the provider table is a parent/child table, and the joining table has an option for whether access should also be granted to all of the provider's children.
CREATE TABLE p (
p_id int PRIMARY KEY,
name varchar(32),
parent_id int,
FOREIGN KEY (parent_id) REFERENCES p(p_id)
);
CREATE TABLE a (
a_id int PRIMARY KEY,
name varchar(32)
);
CREATE TABLE ap (
a_id int,
p_id int,
sub tinyint,
FOREIGN KEY (a_id) REFERENCES a(a_id),
FOREIGN KEY (p_id) REFERENCES p(p_id)
);
Some sample data: 1 provider with 2 child providers, and 2 access users, 1 without child access and 1 with child access.
INSERT INTO p VALUES(1, 'a', null);
INSERT INTO p VALUES(2, 'a.a', 1);
INSERT INTO p VALUES(3, 'a.b', 1);
INSERT INTO a VALUES(1, 'user 1');
INSERT INTO a VALUES(2, 'user 2');
INSERT INTO ap VALUES(1, 1, 0);
INSERT INTO ap VALUES(2, 1, 1);
As a result, I want a list of the providers that the user has access to, based on the third table.
Currently I've solved this by joining two queries with UNION. The first query selects possible child providers and the second goes for primary providers.
SELECT p_id, name
FROM p
WHERE parent_id IN(
SELECT p_id FROM ap WHERE a_id = 1 AND sub = 1
)
UNION
SELECT ap.p_id, p.name
FROM ap
LEFT JOIN p ON p.p_id = ap.p_id
WHERE a_id = 1;
I don't like this query, it's ugly and there must be a smarter way :)

If I have the logic correct, then you want all records from p where one of the following is true:
p_id matches a record in ap for the given a_id.
parent_id matches a record in ap for the given a_id that has sub = 1.
This suggests using exists for the conditions:
select p.p_id, p.name
from p
where exists (select 1
from ap
where ap.p_id = p.p_id and ap.a_id = 1
) or
exists (select 1
from ap
where ap.p_id = p.parent_id and ap.sub = 1 and ap.a_id = 1
)
With a composite index on ap(p_id, a_id, sub), both EXISTS subqueries can be resolved from the index alone, so this should perform much better than your version of the query.
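To check that the EXISTS form returns the same rows as the original UNION, here is a minimal sketch that loads the sample data into an in-memory SQLite database from Python. The use of SQLite instead of MySQL, the index name ap_p_a_sub and the helper providers() are assumptions made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE p (p_id int PRIMARY KEY, name varchar(32), parent_id int REFERENCES p(p_id));
CREATE TABLE a (a_id int PRIMARY KEY, name varchar(32));
CREATE TABLE ap (a_id int REFERENCES a(a_id), p_id int REFERENCES p(p_id), sub tinyint);
INSERT INTO p VALUES (1, 'a', NULL), (2, 'a.a', 1), (3, 'a.b', 1);
INSERT INTO a VALUES (1, 'user 1'), (2, 'user 2');
INSERT INTO ap VALUES (1, 1, 0), (2, 1, 1);
-- hypothetical index name; the column order follows the answer above
CREATE INDEX ap_p_a_sub ON ap (p_id, a_id, sub);
""")

# The original UNION query from the question, parameterised on the user.
UNION_SQL = """
SELECT p_id, name FROM p
WHERE parent_id IN (SELECT p_id FROM ap WHERE a_id = :a_id AND sub = 1)
UNION
SELECT ap.p_id, p.name FROM ap LEFT JOIN p ON p.p_id = ap.p_id
WHERE a_id = :a_id
"""

# The EXISTS rewrite from the answer.
EXISTS_SQL = """
SELECT p.p_id, p.name FROM p
WHERE EXISTS (SELECT 1 FROM ap
              WHERE ap.p_id = p.p_id AND ap.a_id = :a_id)
   OR EXISTS (SELECT 1 FROM ap
              WHERE ap.p_id = p.parent_id AND ap.sub = 1 AND ap.a_id = :a_id)
"""

def providers(sql, a_id):
    """Return the sorted (p_id, name) rows visible to the given user."""
    return sorted(conn.execute(sql, {"a_id": a_id}).fetchall())

user1 = providers(EXISTS_SQL, 1)  # no child access
user2 = providers(EXISTS_SQL, 2)  # child access
print(user1)  # [(1, 'a')]
print(user2)  # [(1, 'a'), (2, 'a.a'), (3, 'a.b')]
```

On this sample data both queries agree for both users; whether the EXISTS form is actually faster on MySQL depends on the real data and should be confirmed with EXPLAIN.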

MySQL: filter child records, include all siblings

There are two MySQL tables:
tparent(id int, some data...)
tchild(id int, parent_id int, some data...)
I need to return all columns (parent plus all children) where at least one of the children matches some criteria.
My current solution:
-- prepare sample data
DROP TABLE IF EXISTS tparent;
DROP TABLE IF EXISTS tchild;
CREATE TABLE tparent (id int, c1 varchar(10), c2 date, c3 float);
CREATE TABLE tchild(id int, parent_id int, c4 float, c5 varchar(20), c6 date);
CREATE UNIQUE INDEX tparent_id_IDX USING BTREE ON tparent (id);
CREATE UNIQUE INDEX tchild_id_IDX USING BTREE ON tchild (id);
INSERT INTO tparent
VALUES
(1, 'a', '2021-01-01', 1.23)
, (2, 'b', '2021-02-01', 1.32)
, (3, 'c', '2021-01-03', 2.31);
INSERT INTO tchild
VALUES
(10, 1, 22.333, 'argh1', '2000-01-01')
, (20, 1, 33.222, 'argh2', '2000-01-02')
, (30, 1, 44.555, 'argh3', '2000-02-02')
, (40, 2, 33.222, 'argh4', '2000-03-02')
, (50, 3, 33.222, 'argh5', '2000-04-02')
, (60, 3, 33.222, 'argh6', '2000-05-02');
-- the query
WITH parent_filter AS
(
SELECT
parent_id
FROM
tchild
WHERE
c4>44
)
SELECT
p.*,
c.*
FROM
tparent p
JOIN tchild c ON p.id = c.parent_id
JOIN parent_filter pf ON p.id = pf.parent_id;
It returns 3 rows for parent id 1 and child ids 10, 20, 30, because child id 30 has a matching record. It does not return data for any other parent id.
However, I am querying tchild twice here (first in the CTE, then again in the main query). As both tables are relatively big (10s - 100s millions of rows, 2-5 child records per parent record on average), I am hitting performance / timing issues.
Is there a better way of achieving this filtering? I.e. without having to query tchild table more than once?
Did you try this version?
SELECT *
FROM tparent p
JOIN tchild c ON p.id = c.parent_id AND <criteria>
This way you limit the tchild table with the criteria before the actual join, although the result then contains only the matching child rows and not their siblings.
Perhaps you can use this instead:
select p.*, c.*
from tparent p
join tchild c
on p.id = c.parent_id
where exists (select 1 from tchild c2 where c2.parent_id = p.id and <criteria>)
This should retrieve all joined parent and child rows for every parent that has at least one child record meeting the criteria.
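As a rough check of the EXISTS approach, the sketch below runs it against the sample data from the question in an in-memory SQLite database. SQLite, the concrete criterion c4 > 44 taken from the question, the alias m and the index name tchild_parent_c4 are assumptions for illustration only; the subquery still reads tchild a second time, but with an index on (parent_id, c4) that read can be an index lookup per parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tparent (id int, c1 varchar(10), c2 date, c3 float);
CREATE TABLE tchild (id int, parent_id int, c4 float, c5 varchar(20), c6 date);
CREATE UNIQUE INDEX tparent_id_IDX ON tparent (id);
CREATE UNIQUE INDEX tchild_id_IDX ON tchild (id);
-- hypothetical supporting index for the correlated subquery
CREATE INDEX tchild_parent_c4 ON tchild (parent_id, c4);
INSERT INTO tparent VALUES
  (1, 'a', '2021-01-01', 1.23), (2, 'b', '2021-02-01', 1.32), (3, 'c', '2021-01-03', 2.31);
INSERT INTO tchild VALUES
  (10, 1, 22.333, 'argh1', '2000-01-01'), (20, 1, 33.222, 'argh2', '2000-01-02'),
  (30, 1, 44.555, 'argh3', '2000-02-02'), (40, 2, 33.222, 'argh4', '2000-03-02'),
  (50, 3, 33.222, 'argh5', '2000-04-02'), (60, 3, 33.222, 'argh6', '2000-05-02');
""")

# Keep every child of a parent as soon as one sibling satisfies the criterion.
SIBLINGS_SQL = """
SELECT p.id, c.id
FROM tparent p
JOIN tchild c ON c.parent_id = p.id
WHERE EXISTS (SELECT 1 FROM tchild m
              WHERE m.parent_id = p.id AND m.c4 > 44)
ORDER BY c.id
"""

rows = conn.execute(SIBLINGS_SQL).fetchall()
print(rows)  # [(1, 10), (1, 20), (1, 30)]
```

Only parent 1 qualifies, because child 30 is the only row with c4 > 44, and all three of its children are returned.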

Get all Items attached to sellerId - SQL

When I execute my query, I get back only 1 of the items attached to the sellerId instead of 2. Does anyone know how I can express the following?
Select the name of the item and of the reseller for each item that belongs to a reseller with a rating higher than 4.
Current Query:
SELECT items.name, sellers.name
FROM items
inner JOIN sellers
on items.id=sellers.id
WHERE rating > 4
ORDER BY sellerId
The queries for the tables, including data:
CREATE TABLE sellers (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(30) NOT NULL,
rating INTEGER NOT NULL
);
CREATE TABLE items (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(30) NOT NULL,
sellerId INTEGER REFERENCES sellers(id)
);
INSERT INTO sellers(id, name, rating) values(1, 'Roger', 3);
INSERT INTO sellers(id, name, rating) values(2, 'Penny', 5);
INSERT INTO items(id, name, sellerId) values(1, 'Notebook', 2);
INSERT INTO items(id, name, sellerId) values(2, 'Stapler', 1);
INSERT INTO items(id, name, sellerId) values(3, 'Pencil', 2);
You've got the wrong join condition; here's a corrected query:
SELECT items.name, sellers.name
FROM items
inner JOIN sellers
on items.sellerId=sellers.id
WHERE rating > 4
ORDER BY sellerId
You're joining on id = id, but you want sellerId = id.
Notice in your table definition that items.sellerId is the field that joins to sellers.id:
CREATE TABLE items (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(30) NOT NULL,
sellerId INTEGER REFERENCES sellers(id)
);
You need to join on the correct column:
SELECT i.name, s.name
FROM items i INNER JOIN
sellers s
ON i.sellerid = s.id
----------^
WHERE rating > 4
ORDER BY i.sellerId
Note that I also introduced table aliases and qualified column names. These make a query easier to write and to read.
SELECT items.name, sellers.name
FROM items, sellers
WHERE items.sellerId = sellers.id and sellers.rating>4;
Here is the right query:
SELECT items.name as items, sellers.name as sellers
FROM sellers
INNER JOIN items
ON (sellers.id = items.sellerid)
WHERE sellers.rating > 4
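To see why the original join condition returns a single row, here is a small sketch that runs both variants against the sample data in an in-memory SQLite database. Running it from Python with SQLite and the variable names wrong_rows and right_rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sellers (
  id INTEGER NOT NULL PRIMARY KEY,
  name VARCHAR(30) NOT NULL,
  rating INTEGER NOT NULL
);
CREATE TABLE items (
  id INTEGER NOT NULL PRIMARY KEY,
  name VARCHAR(30) NOT NULL,
  sellerId INTEGER REFERENCES sellers(id)
);
INSERT INTO sellers(id, name, rating) VALUES (1, 'Roger', 3), (2, 'Penny', 5);
INSERT INTO items(id, name, sellerId) VALUES
  (1, 'Notebook', 2), (2, 'Stapler', 1), (3, 'Pencil', 2);
""")

# Joining on items.id = sellers.id pairs item 2 with seller 2 by coincidence.
wrong_rows = conn.execute("""
    SELECT items.name, sellers.name
    FROM items INNER JOIN sellers ON items.id = sellers.id
    WHERE sellers.rating > 4
    ORDER BY items.id
""").fetchall()

# Joining on the foreign key returns every item of the highly rated seller.
right_rows = conn.execute("""
    SELECT items.name, sellers.name
    FROM items INNER JOIN sellers ON items.sellerId = sellers.id
    WHERE sellers.rating > 4
    ORDER BY items.id
""").fetchall()

print(wrong_rows)  # [('Stapler', 'Penny')]
print(right_rows)  # [('Notebook', 'Penny'), ('Pencil', 'Penny')]
```

Note that the wrong join does not just drop a row: the one row it returns pairs the Stapler with Penny, although the Stapler belongs to Roger.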

how to get the ID card in MYSQL?

I created the following tables:
create table people (
ID varchar(9),
name varchar(20),
CONSTRAINT pk_ID PRIMARY KEY (ID)
);
create table cars (
license_plate varchar(9),
ID varchar(9),
CONSTRAINT pk_ID PRIMARY KEY (license_plate)
);
create table accidents (
code varchar(9),
license_plate varchar(9),
CONSTRAINT pk_ID PRIMARY KEY (code)
);
I inserted the following data:
insert into people(ID, name) values('0x1','Louis');
insert into people(ID, name) values('0x2','Alice');
insert into people(ID, name) values('0x3','Peter');
insert into cars(license_plate, ID) values('001','0x1');
insert into cars(license_plate, ID) values('002','0x2');
insert into cars(license_plate, ID) values('003','0x1');
insert into cars(license_plate, ID) values('004','0x3');
insert into accidents(code, license_plate) values('fd1','001');
insert into accidents(code, license_plate) values('fd2','004');
insert into accidents(code, license_plate) values('fd3','002');
The question is: how do I select the people who have not had an accident in any of their cars?
My problem came up when I tried to use NOT IN. Although "Louis" has at least one car in the accidents table, the query still returns "Louis", and it should not.
My query:
select ID from people where ID in (select ID from cars where license_plate not in (select license_plate from accidents));
Result:
+-----+
| ID |
+-----+
| 0x1 |
+-----+
select name from people where ID not in (
select distinct c.ID from
accidents as a inner join cars as c
on a.license_plate = c.license_plate
)
Explanation: the subquery joins cars and accidents and gives you the IDs of all people whose cars have had accidents. You can then run a NOT IN query against the people table with that result.
You need two subqueries:
select id from people
where id not in
(select id from cars where license_plate in
(select distinct license_plate from accidents))
This should be quite fast:
SELECT people.* FROM people
LEFT JOIN cars ON cars.ID = people.ID
LEFT JOIN accidents ON accidents.license_plate = cars.license_plate
GROUP BY people.ID
-- keep only people with no accident on any of their cars
HAVING COUNT(accidents.code) = 0
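One thing worth knowing about the sample data is that every person owns at least one car that appears in accidents, so a correct query returns no rows at all. The sketch below shows this with a NOT EXISTS variant (which also avoids the NULL pitfalls of NOT IN) in an in-memory SQLite database. The extra person 'Maria' with car '005' is not in the question; it is a hypothetical row added only to show a non-empty result, and the simplified DDL and variable names are assumptions as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (ID varchar(9) PRIMARY KEY, name varchar(20));
CREATE TABLE cars (license_plate varchar(9) PRIMARY KEY, ID varchar(9));
CREATE TABLE accidents (code varchar(9) PRIMARY KEY, license_plate varchar(9));
INSERT INTO people(ID, name) VALUES ('0x1', 'Louis'), ('0x2', 'Alice'), ('0x3', 'Peter');
INSERT INTO cars(license_plate, ID) VALUES
  ('001', '0x1'), ('002', '0x2'), ('003', '0x1'), ('004', '0x3');
INSERT INTO accidents(code, license_plate) VALUES
  ('fd1', '001'), ('fd2', '004'), ('fd3', '002');
""")

# People for whom no owned car appears in accidents.
NO_ACCIDENTS_SQL = """
SELECT p.name
FROM people p
WHERE NOT EXISTS (SELECT 1
                  FROM cars c
                  JOIN accidents a ON a.license_plate = c.license_plate
                  WHERE c.ID = p.ID)
ORDER BY p.ID
"""

def accident_free():
    return [row[0] for row in conn.execute(NO_ACCIDENTS_SQL)]

original_result = accident_free()
print(original_result)  # []

# Hypothetical extra person (not in the question) whose only car has no accident.
conn.execute("INSERT INTO people(ID, name) VALUES ('0x4', 'Maria')")
conn.execute("INSERT INTO cars(license_plate, ID) VALUES ('005', '0x4')")
extended_result = accident_free()
print(extended_result)  # ['Maria']

# The query from the question still reports Louis, because car 003 has no accident.
askers_ids = [row[0] for row in conn.execute("""
    select ID from people where ID in
      (select ID from cars where license_plate not in
        (select license_plate from accidents))
    order by ID
""")]
print(askers_ids)  # ['0x1', '0x4']
```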

For every customer select all the other customers that bought the same item

Database description
I have a simple database composed of three tables: customer, product and customer_product.
customer: Contains the information about the customer: ID and name
product: Contains information about the products that are available in the store: ID and name
customer_product: junction table
- customer (table)
id integer primary key not null
name TEXT
- customer_product (table)
id_product integer
id_customer integer
primary key(id_product, id_customer)
FOREIGN KEY(Id_product) REFERENCES product(id)
FOREIGN KEY (ID_customer) REFERENCES customer(ID)
- product (table)
id integer primary key not null
name TEXT
The three tables have been initialized in sqlfiddle by using SQLITE. The following SQL queries are used to construct the database
create table if not exists customer (id integer primary key not null, name TEXT);
create table if not exists product (id integer primary key not null, name TEXT);
create table if not exists customer_product (id_product integer, id_customer
integer, primary key(id_product, id_customer), FOREIGN KEY(Id_product) REFERENCES product(id), FOREIGN KEY (ID_customer) REFERENCES customer(ID));
insert into customer(id,name) values(1,"john");
insert into customer(id,name) values(2,"Paul");
insert into customer(id,name) values(3,"Jenny");
insert into customer(id,name) values(4,"Fred");
insert into customer(id,name) values(5,"Lea");
insert into product(id,name) values(1,"Mouse");
insert into product(id,name) values(2,"screen");
insert into product(id,name) values(3,"pc");
insert into product(id,name) values(4,"CD");
insert into product(id,name) values(5,"Game");
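insert into customer_product(id_customer, id_product) values(1,1);
insert into customer_product(id_customer, id_product) values(1,2);
insert into customer_product(id_customer, id_product) values(1,3);
insert into customer_product(id_customer, id_product) values(2,1);
insert into customer_product(id_customer, id_product) values(2,2);
insert into customer_product(id_customer, id_product) values(2,3);
insert into customer_product(id_customer, id_product) values(3,4);
insert into customer_product(id_customer, id_product) values(4,5);
insert into customer_product(id_customer, id_product) values(5,5);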
Problem
For every customer I want to select all the other customers that bought at least one of the same products.
John and Paul bought at least 1 product in common
No other customer has bought the same product as Jenny yet
Fred and Lea bought the same product
output
"John" "Paul"
"Jenny"
"Fred" "Lea"
This is basically a self-join and possibly an aggregation. For instance, the following gets all customers that have purchased a similar product as another, ordered by the number of similar products:
select cp.id_customer, cp2.id_customer, count(*)
from customer_product cp join
customer_product cp2
on cp.id_product = cp2.id_product
and cp.id_customer <> cp2.id_customer
group by cp.id_customer, cp2.id_customer
order by cp.id_customer, count(*) desc;
You can bring in additional information such as customer names by doing additional joins.
While I'm not entirely sure I understand the conditions, there are three basic steps to this problem, which you can combine into one query (or not).
Get the products that the customer bought
Get the IDs of the customers that bought the same products
Get the customer details based on those IDs
So for 1, you do a simple select:
SELECT id_product FROM customer_product WHERE id_customer = 1
For 2, you can use the IN operator:
SELECT * FROM customer_product WHERE id_product IN
(SELECT id_product FROM customer_product WHERE id_customer = 1);
For 3 use a combination of JOIN and GROUP BY to get the relevant details from the customer table.
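The three steps can be combined into one query, sketched here against an in-memory SQLite database from Python. It assumes each sample row is meant as (customer, product), which is the reading that matches the expected output in the question, so the inserts name their columns explicitly. The aliases cp2 and o and the LEFT JOINs (used so that customers without a match, such as Jenny, still appear) are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id integer PRIMARY KEY NOT NULL, name TEXT);
CREATE TABLE product (id integer PRIMARY KEY NOT NULL, name TEXT);
CREATE TABLE customer_product (
  id_product integer, id_customer integer,
  PRIMARY KEY (id_product, id_customer),
  FOREIGN KEY (id_product) REFERENCES product(id),
  FOREIGN KEY (id_customer) REFERENCES customer(id)
);
INSERT INTO customer(id, name) VALUES
  (1, 'john'), (2, 'Paul'), (3, 'Jenny'), (4, 'Fred'), (5, 'Lea');
INSERT INTO product(id, name) VALUES
  (1, 'Mouse'), (2, 'screen'), (3, 'pc'), (4, 'CD'), (5, 'Game');
-- assumption: the sample pairs are (customer, product)
INSERT INTO customer_product(id_customer, id_product) VALUES
  (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 4), (4, 5), (5, 5);
""")

# For every customer, list each other customer sharing at least one product.
# GROUP BY collapses the repeats that arise when two customers share several products.
PAIRS_SQL = """
SELECT c.name, o.name
FROM customer c
LEFT JOIN customer_product cp  ON cp.id_customer = c.id
LEFT JOIN customer_product cp2 ON cp2.id_product = cp.id_product
                              AND cp2.id_customer <> c.id
LEFT JOIN customer o ON o.id = cp2.id_customer
GROUP BY c.id, o.id
ORDER BY c.id, o.id
"""

pairs = conn.execute(PAIRS_SQL).fetchall()
for customer_name, other_name in pairs:
    print(customer_name, other_name)
```

On this data the rows are john/Paul, Paul/john, Jenny with no partner (None), Fred/Lea and Lea/Fred. Adding a condition such as c.id < o.id would keep each pair once, at the cost of dropping customers without a match.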
First find the products bought by at least 2 customers, then find the customers who bought them through the junction table, and finally select each customer's name once. Here is the query:
select distinct c.name
from customer_product cp
join customer c on c.id = cp.id_customer
where cp.id_product in (
select id_product from customer_product
group by id_product
having count(*) >= 2
)

SQL: Detect duplicate customers

I'm trying to create a SQL query that will detect (possible) duplicate customers in my database:
I have two tables:
Customer with the columns: cid, firstname, lastname, zip. Note that cid is the unique customer id and primary key for this table.
IgnoreForDuplicateCustomer with the columns: cid1, cid2. Both columns are foreign keys that reference Customer(cid). This table is used to record that the customer with cid1 is not the same as the customer with cid2.
So for example, if I have
a Customer entry with cid = 1, firstname="foo", lastname="anonymous" and zip="11231"
and another Customer entry with cid=2, firstname="foo", lastname="anonymous" and zip="11231",
then my SQL query should search for customers that have the same firstname, lastname and zip, and detect that the customer with cid = 1 is the same as the customer with cid = 2.
However, it should be possible to say that customers cid = 1 and cid = 2 are not the same, by storing a new entry in the IgnoreForDuplicateCustomer table with cid1 = 1 and cid2 = 2.
Detecting the duplicate customers works well with this SQL query:
SELECT cid, firstname, lastname, zip, COUNT(*) AS NumOccurrences
FROM Customer
GROUP BY firstname, lastname, zip
HAVING ( COUNT(*) > 1 )
My problem is that I am not able to integrate the IgnoreForDuplicateCustomer table so that,
as in my previous example, the customers with cid = 1 and cid = 2 will not be marked / returned as the same, since there is an entry/rule in the IgnoreForDuplicateCustomer table.
So I tried to extend my previous query by adding a WHERE clause:
SELECT cid, firstname, lastname, COUNT(*) AS NumOccurrences
FROM Customer
WHERE cid NOT IN (
SELECT cid1 FROM IgnoreForDuplicateCustomer WHERE cid2=cid
UNION
SELECT cid2 FROM IgnoreForDuplicateCustomer WHERE cid1=cid
)
GROUP BY firstname, lastname, zip
HAVING ( COUNT(*) > 1 )
Unfortunately this additional WHERE clause has absolutely no impact on my result.
Any suggestions?
Here you are:
Select a.*
From (
select c1.cid 'CID1', c2.cid 'CID2'
from Customer c1
join Customer c2 on c1.firstname=c2.firstname
and c1.lastname=c2.lastname and c1.zip=c2.zip
and c1.cid < c2.cid) a
Left Join (
Select cid1 'CID1', cid2 'CID2'
From ignoreforduplicatecustomer one
Union
Select cid2 'CID1', cid1 'CID2'
From ignoreforduplicatecustomer two) b on a.cid1 = b.cid1 and a.cid2 = b.cid2
where b.cid1 is null
This will get you the IDs of duplicate records from customer table, which are not in table ignoreforduplicatecustomer.
Tested with:
CREATE TABLE IF NOT EXISTS `customer` (
`CID` int(11) NOT NULL AUTO_INCREMENT,
`Firstname` varchar(50) NOT NULL,
`Lastname` varchar(50) NOT NULL,
`ZIP` varchar(10) NOT NULL,
PRIMARY KEY (`CID`))
ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=100 ;
INSERT INTO `customer` (`CID`, `Firstname`, `Lastname`, `ZIP`) VALUES
(1, 'John', 'Smith', '1234'),
(2, 'John', 'Smith', '1234'),
(3, 'John', 'Smith', '1234'),
(4, 'Jane', 'Doe', '1234');
And:
CREATE TABLE IF NOT EXISTS `ignoreforduplicatecustomer` (
`CID1` int(11) NOT NULL,
`CID2` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `ignoreforduplicatecustomer` (`CID1`, `CID2`) VALUES
(1, 2);
Results for my test setup are:
CID1 CID2
1 3
2 3
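The same result can be reproduced with a NOT EXISTS filter in place of the LEFT JOIN against the UNION. The sketch below uses the test setup above in an in-memory SQLite database; SQLite, the simplified DDL and the alias i are assumptions made for this illustration, not part of the tested answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
  CID integer PRIMARY KEY,
  Firstname varchar(50) NOT NULL,
  Lastname varchar(50) NOT NULL,
  ZIP varchar(10) NOT NULL
);
CREATE TABLE ignoreforduplicatecustomer (
  CID1 int NOT NULL,
  CID2 int NOT NULL
);
INSERT INTO customer (CID, Firstname, Lastname, ZIP) VALUES
  (1, 'John', 'Smith', '1234'),
  (2, 'John', 'Smith', '1234'),
  (3, 'John', 'Smith', '1234'),
  (4, 'Jane', 'Doe', '1234');
INSERT INTO ignoreforduplicatecustomer (CID1, CID2) VALUES (1, 2);
""")

# Pair up customers with identical name and zip (c1.cid < c2.cid keeps each
# pair once), then drop pairs listed in the ignore table in either order.
DUPLICATES_SQL = """
SELECT c1.cid, c2.cid
FROM customer c1
JOIN customer c2
  ON c1.firstname = c2.firstname
 AND c1.lastname = c2.lastname
 AND c1.zip = c2.zip
 AND c1.cid < c2.cid
WHERE NOT EXISTS (
  SELECT 1 FROM ignoreforduplicatecustomer i
  WHERE (i.cid1 = c1.cid AND i.cid2 = c2.cid)
     OR (i.cid1 = c2.cid AND i.cid2 = c1.cid)
)
ORDER BY c1.cid, c2.cid
"""

duplicate_pairs = conn.execute(DUPLICATES_SQL).fetchall()
print(duplicate_pairs)  # [(1, 3), (2, 3)]
```

The pair (1, 2) is suppressed by the ignore entry, while (1, 3) and (2, 3) are still reported, matching the results listed above.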
Edit as per TPete's comment (didn't try it):
SELECT
C1.cid, C1.firstname, C1.lastname
FROM
Customer C1,
Customer C2
WHERE
C1.cid < C2.cid AND
C1.firstname = C2.firstname AND
C1.lastname = C2.lastname AND
C1.zip = C2.zip AND
NOT EXISTS (SELECT 1 FROM IgnoreForDuplicateCustomer I
WHERE (I.cid1 = C1.cid AND I.cid2 = C2.cid)
OR (I.cid1 = C2.cid AND I.cid2 = C1.cid));
Initially I thought that IgnoreForDuplicateCustomer was a field in the customer table.
Crazy, but I think it works :)
First I join the Customer table with itself on the names and zip to get the duplicates.
Then I exclude the key pairs found in the IgnoreForDuplicateCustomer table (the UNION is there because the first query returns both cid1, cid2 and cid2, cid1).
Each pair will appear twice in the result, but I think you can get the info you need.
select c1.cid, c2.cid
from Customer c1
join Customer c2 on c1.firstname=c2.firstname
and c1.lastname=c2.lastname and c1.zip=c2.zip
and c1.cid!=c2.cid
except
(
select cid1,cid2 from IgnoreForDuplicateCustomer
UNION
select cid2,cid1 from IgnoreForDuplicateCustomer
)
second shot:
select firstname,lastname,zip from Customer
group by firstname,lastname,zip
having (count(*)>1)
except
select c1.firstname, c1.lastname, c1.zip
from Customer c1 join IgnoreForDuplicateCustomer ig on c1.cid=ig.cid1 join Customer c2 on ig.cid2=c2.cid
third:
select firstname,lastname,zip from (
select firstname,lastname,zip from Customer
group by firstname,lastname,zip
having (count(*)>1)
) X
where firstname not in (
select c1.firstname
from Customer c1 join IgnoreForDuplicateCustomer ig on c1.cid=ig.cid1 join Customer c2 on ig.cid2=c2.cid
)