Select distinct records on a join - mysql

I have two mysql tables - a sales table:
+----------------+------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------------------+------+-----+---------+-------+
| StoreId | bigint(20) unsigned | NO | PRI | NULL | |
| ItemId | bigint(20) unsigned | NO | | NULL | |
| SaleWeek | int(10) unsigned | NO | PRI | NULL | |
+----------------+------------------------------+------+-----+---------+-------+
and an items table:
+--------------------+------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+------------------------------+------+-----+---------+-------+
| ItemId | bigint(20) unsigned | NO | PRI | NULL | |
| ItemName | varchar(100) | NO | | NULL | |
+--------------------+------------------------------+------+-----+---------+-------+
The sales table contains multiple records for each ItemID - one for each SaleWeek. I want to select all items sold by joining the two tables like so:
SELECT items.ItemName, items.ItemId FROM items
JOIN sales ON items.ItemId = sales.ItemId
WHERE sales.StoreID = ? ORDER BY sales.SaleWeek DESC;
However, this returns each ItemId multiple times, once per SaleWeek entry. Can I do a distinct select so that each ItemId is returned only once? I don't want to query for the latest SaleWeek, because some items may have no entry for that week, so I need each item's last sale. Do I need to specify DISTINCT, use a LEFT OUTER JOIN, or something else?

A DISTINCT should do what you're looking for:
SELECT DISTINCT items.ItemName, items.ItemId FROM items
JOIN sales ON items.ItemId = sales.ItemId
WHERE sales.StoreID = ? ORDER BY sales.SaleWeek DESC;
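That would return only distinct (items.ItemName, items.ItemId) tuples. Note that sales.SaleWeek is not in the select list, so the ORDER BY is ambiguous once duplicates are collapsed; MySQL 5.7 and later reject this combination when ONLY_FULL_GROUP_BY is enabled, in which case drop the ORDER BY or use a GROUP BY with MAX(sales.SaleWeek).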

You also mentioned the sale week. If you want the most recent week for each item, try a GROUP BY:
SELECT
    items.ItemName,
    items.ItemId,
    MAX(sales.SaleWeek) AS MostRecentSaleWeek
FROM
    items
    JOIN sales ON items.ItemId = sales.ItemId
WHERE
    sales.StoreID = ?
GROUP BY
    items.ItemId,
    items.ItemName
ORDER BY
    MostRecentSaleWeek DESC,  -- column 3, produced by the MAX() call
    items.ItemName
If your MySQL version does not accept the alias in the ORDER BY, refer to the column by its ordinal position (3) instead. This query gives you each distinct item and the most recent week it was sold.
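
To try the GROUP BY approach without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite standing in for MySQL, the sample rows, and the helper name latest_sales are all assumptions made for illustration; the table here also omits the primary key from the question so that one store can sell several items in the same week.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (ItemId INTEGER PRIMARY KEY, ItemName TEXT NOT NULL);
    CREATE TABLE sales (
        StoreId  INTEGER NOT NULL,
        ItemId   INTEGER NOT NULL,
        SaleWeek INTEGER NOT NULL
    );
    -- Hypothetical sample rows
    INSERT INTO items VALUES (1, 'apple'), (2, 'pear');
    INSERT INTO sales VALUES (7, 1, 10), (7, 1, 12), (7, 2, 11), (8, 2, 13);
""")


def latest_sales(store_id):
    """Return one row per item sold in the store, most recent sale first."""
    return conn.execute(
        """
        SELECT items.ItemName, items.ItemId,
               MAX(sales.SaleWeek) AS MostRecentSaleWeek
        FROM items
        JOIN sales ON items.ItemId = sales.ItemId
        WHERE sales.StoreId = ?
        GROUP BY items.ItemId, items.ItemName
        ORDER BY MostRecentSaleWeek DESC, items.ItemName
        """,
        (store_id,),
    ).fetchall()


rows = latest_sales(7)
print(rows)
```

Each item appears once, even though 'apple' has two sales rows for store 7, and items with no sale in the very latest week are still listed with their own last week.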

SELECT u.user_name,u.user_id, u.user_country,u.user_phone_no,ind.Industry_name,inv.id,u.user_email
FROM invitations inv
LEFT JOIN users u
ON inv.sender_id = u.user_id
LEFT JOIN employee_info ei
ON inv.sender_id=ei.employee_fb_id
LEFT JOIN industries ind
ON ei.industry_id=ind.id
WHERE inv.receiver_id='XXX'
AND inv.invitation_status='0'
AND inv.invitation_status_desc='PENDING'
GROUP BY (user_id)

We can use this:
INSERT INTO `test_table` (`id`, `name`)
SELECT DISTINCT a.`employee_id`, b.`first_name`
FROM `employee_leave_details` AS a
INNER JOIN `employee_register` AS b ON a.`employee_id` = b.`employee_id`

Related

Show only products the user did not order

I'm getting data about products from 3 different tables and I want to show only products the user didn't order.
Table 1:
Supplier
__________________
| id | name | .. |
|____|______|____|
| 1 | john | .. |
|____|______|____|
Table 2:
Product
___________________________
| id | p_name| supplier_id |
|____|_______|_____________|
| 1 | phone | 1 |
|____|_______|_____________|
| 2 | watch | 1 |
|____|_______|_____________|
Table 3:
Order
___________________________
| id | p_id | buyer_id |
|____|_______|_____________|
| 1 | 1 | 10 |
|____|_______|_____________|
So in this case, when the user visits the products page, I want to show the products he didn't order, which is watch in this example.
My SQL query:
SELECT supplier.name, products.p_name FROM products
INNER JOIN supplier ON supplier.id = product.supplier_id
INNER JOIN order ON product.id = order.p_id
I tried LEFT JOIN order ON product.id != order.p_id and WHERE order.p_id IS NULL, but without success.
So how do I check that the user didn't order a product, and then show the rest of the products?
You can use WHERE ... NOT IN to exclude specific products, as shown below.
SELECT supplier.name, products.p_name
FROM products
INNER JOIN supplier ON supplier.id = products.supplier_id
WHERE products.id NOT IN (
    SELECT p_id FROM `order` WHERE buyer_id = ?
)
Within the WHERE clause, the subquery selects the product ids of all orders placed by a specific user (bind that user's id to the ? placeholder). Applying NOT IN excludes all of those products from your list. Note that order is a reserved word in MySQL, so the table name must be quoted with backticks.
You can use a WHERE NOT EXISTS clause for your case:
SELECT suppliers.name, products.name
FROM products
INNER JOIN suppliers ON suppliers.id = products.supplier_id
WHERE NOT EXISTS (
SELECT product_id FROM orders WHERE orders.product_id = products.id
);
It should be something like this:
SELECT supplier.name, product.p_name
FROM product
INNER JOIN supplier ON supplier.id = product.supplier_id
LEFT JOIN `order` ON product.id = `order`.p_id
WHERE `order`.id IS NULL
You issue a LEFT JOIN against order so that products without a match are kept, then keep only those unmatched rows with order.id IS NULL. There is also no need to discard duplicate rows, because a product that hasn't been ordered appears only once.
+---------+
| |
| |
| |
| product +--------+
| | |
| | order |
| | |
+---------+--------+
This is an unusual schema: a supplier would not normally be an attribute of a 'products' table, and the details of an order would normally be held in a separate table from the orders themselves (otherwise an order can only comprise one item), but anyway...
DROP TABLE IF EXISTS suppliers;
CREATE TABLE suppliers
(id INT AUTO_INCREMENT PRIMARY KEY
,name VARCHAR(12) UNIQUE
);
INSERT INTO suppliers VALUES
(1,'john');
DROP TABLE IF EXISTS products;
CREATE TABLE products
(id INT AUTO_INCREMENT PRIMARY KEY
,product_name VARCHAR(12) UNIQUE
,supplier_id INT NOT NULL
);
INSERT INTO products VALUES
(1,'phone',1),
(2,'watch',1);
DROP TABLE IF EXISTS orders;
CREATE TABLE orders
(id INT AUTO_INCREMENT PRIMARY KEY
,product_id INT NOT NULL
,buyer_id INT NOT NULL
);
INSERT INTO orders VALUES
(1,1,10);
...
SELECT p.*
FROM products p
LEFT
JOIN orders o
ON o.product_id = p.id
WHERE o.id IS NULL;
+----+--------------+-------------+
| id | product_name | supplier_id |
+----+--------------+-------------+
| 2 | watch | 1 |
+----+--------------+-------------+
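
The LEFT JOIN ... IS NULL queries above list products that nobody has ordered. Since the question asks about one particular user, a possible variation is to put the buyer condition in the ON clause, so that only that buyer's orders can produce a match. A minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL; the second order row and the helper name not_ordered_by are hypothetical additions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        product_name TEXT,
        supplier_id INTEGER NOT NULL
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        buyer_id INTEGER NOT NULL
    );
    INSERT INTO products VALUES (1, 'phone', 1), (2, 'watch', 1);
    -- The second order (buyer 11 buys the watch) is a hypothetical extra row.
    INSERT INTO orders VALUES (1, 1, 10), (2, 2, 11);
""")


def not_ordered_by(buyer_id):
    """Return names of products that the given buyer has never ordered."""
    cursor = conn.execute(
        """
        SELECT p.product_name
        FROM products p
        LEFT JOIN orders o
               ON o.product_id = p.id
              AND o.buyer_id = ?   -- filter in ON, not WHERE, to keep unmatched products
        WHERE o.id IS NULL
        ORDER BY p.id
        """,
        (buyer_id,),
    )
    return [name for (name,) in cursor]


print(not_ordered_by(10))
print(not_ordered_by(11))
```

Putting the buyer filter in the WHERE clause instead would discard the NULL-extended rows and return nothing, which is why it lives in the ON clause here.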

How to join multiple tables with multiple conditions

I have six tables
online_transaction
| date | id | supplier_id | product code |
online_transaction_enc
| date | id | item |
offline_transaction
| date | id |
offline_transaction_enc
| date | id | item |
products
| type | product_code |
supplier
| supplier_id | country |
Select count(item) where date is between '2018-Jun-01' And '2018-July-30' AND Type='household' AND country='Malaysia'
This is roughly what I want to achieve. I want to union the items from the online and offline tables on date and id so that all items are combined, and then apply the other requirements.
How can I do this in MySQL?
Try this:
select count(a.item) from
(select date, id, item from online_transaction_enc
union
select date, id, item from offline_transaction_enc)a
inner join
(
select date,id,supplier_id,productcode from online_transaction
union
select date,id,supplier_id,productcode from offline_transaction)b
on a.date=b.date and a.id=b.id
inner join supplier on b.supplier_id=supplier.supplier_id
inner join products on b.productcode=products.product_code
where a.date between '20180601' And '20180730' AND Type='household' AND country='Malaysia'
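
One thing to check with the query above: UNION removes duplicate rows before COUNT sees them, while UNION ALL keeps every row. If the same (date, id, item) combination can legitimately occur more than once, the two give different counts. A minimal sketch of the difference, assuming SQLite (via Python's sqlite3) in place of MySQL; the sample rows and the helper name count_items are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE online_transaction_enc  (date TEXT, id INTEGER, item TEXT);
    CREATE TABLE offline_transaction_enc (date TEXT, id INTEGER, item TEXT);
    -- Hypothetical rows: the same item row appears twice online and once offline.
    INSERT INTO online_transaction_enc VALUES
        ('2018-06-01', 1, 'soap'), ('2018-06-01', 1, 'soap');
    INSERT INTO offline_transaction_enc VALUES
        ('2018-06-01', 1, 'soap'), ('2018-06-02', 2, 'mop');
""")


def count_items(set_operator):
    """Count items across both tables using UNION or UNION ALL."""
    if set_operator not in ("UNION", "UNION ALL"):
        raise ValueError("set_operator must be UNION or UNION ALL")
    sql = f"""
        SELECT COUNT(a.item)
        FROM (
            SELECT date, id, item FROM online_transaction_enc
            {set_operator}
            SELECT date, id, item FROM offline_transaction_enc
        ) a
    """
    return conn.execute(sql).fetchone()[0]


deduplicated = count_items("UNION")
all_rows = count_items("UNION ALL")
print(deduplicated, all_rows)
```

Which of the two is right depends on whether repeated rows mean repeated sales in your data.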
SELECT count(item)
FROM online_transaction_enc
INNER JOIN online_transaction_enc ON supplier_id
INNER JOIN products ON supplier_id
...
WHERE (date BETWEEN '2010-01-30 14:15:55' AND '2010-09-29 10:15:55')
;
This solution is using an inner join. I believe this is what you're looking for. Here is more documentation
http://www.mysqltutorial.org/mysql-inner-join.aspx

Sorting left join results on large open schema tables

I am designing an open schema database with the following table definitions
mysql> desc orders;
+-------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| json | text | NO | | NULL | |
+-------+---------+------+-----+---------+----------------+
mysql> desc ordersnames;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(330) | NO | UNI | NULL | |
+-------+--------------+------+-----+---------+----------------+
with an index on name
mysql> desc orderskeys;
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| reference | int(11) | NO | MUL | NULL | |
| nameref | int(11) | NO | MUL | NULL | |
| value | varchar(330) | NO | | NULL | |
+-----------+--------------+------+-----+---------+----------------+
with indices on:
reference,nameref,value
nameref,value
reference
Every JSON field (one dimension only) has one entry in the orderskeys table, where nameref is a reference to the field name as defined in ordersnames.
I would typically query like this:
SELECT
orderskeysdeliveryPostcode.value deliveryPostcode,
orders.ID,
orderskeysCN.value CN
FROM
orders
JOIN ordersnames as ordersnamesCN
on ordersnamesCN.name = 'CN'
JOIN orderskeys as orderskeysCN
on orderskeysCN.nameref = ordersnamesCN.ID
and orderskeysCN.reference = orders.ID
and orderskeysCN.value = '10094'
JOIN ordersnames as ordersnamesdeliveryPostcode
on ordersnamesdeliveryPostcode.name = 'deliveryPostcode'
JOIN orderskeys as orderskeysdeliveryPostcode
on orderskeysdeliveryPostcode.nameref = ordersnamesdeliveryPostcode.ID
and orderskeysdeliveryPostcode.reference = orders.ID
order by deliveryPostcode
limit 0,1000
yielding a result set like this
+------------------+--------+-------+
| deliveryPostcode | ID | CN |
+------------------+--------+-------+
| NULL | 251018 | 10094 |
| NULL | 157153 | 10094 |
| NULL | 95419 | 10094 |
| B-5030 | 172944 | 10094 |
+------------------+--------+-------+
-> lightning fast even with 400k + orders records
However, not all records contain all fields, so the above query will not return the records that lack a 'deliveryPostcode' field, so I have to query like this:
SELECT
orderskeysdeliveryPostcode.value deliveryPostcode,
orders.ID,
orderskeysCN.value CN
FROM
orders
JOIN ordersnames as ordersnamesCN
on ordersnamesCN.name = 'CN'
JOIN orderskeys as orderskeysCN
on orderskeysCN.nameref = ordersnamesCN.ID
and orderskeysCN.reference = orders.ID
and orderskeysCN.value = '10094'
JOIN ordersnames as ordersnamesdeliveryPostcode
on ordersnamesdeliveryPostcode.name = 'deliveryPostcode'
LEFT JOIN orderskeys as orderskeysdeliveryPostcode
on orderskeysdeliveryPostcode.nameref = ordersnamesdeliveryPostcode.ID
and orderskeysdeliveryPostcode.reference = orders.ID
limit 0,1000
-> equally fast, but as soon as I add an ORDER BY clause on the key value from a left joined table, mysql wants to do the sorting externally (temporary, filesort) instead of using an existing index.
SELECT
orderskeysdeliveryPostcode.value deliveryPostcode,
orders.ID,
orderskeysCN.value CN
FROM
orders
JOIN ordersnames as ordersnamesCN
on ordersnamesCN.name = 'CN'
JOIN orderskeys as orderskeysCN
on orderskeysCN.nameref = ordersnamesCN.ID
and orderskeysCN.reference = orders.ID
and orderskeysCN.value = '10094'
JOIN ordersnames as ordersnamesdeliveryPostcode
on ordersnamesdeliveryPostcode.name = 'deliveryPostcode'
LEFT JOIN orderskeys as orderskeysdeliveryPostcode
on orderskeysdeliveryPostcode.nameref = ordersnamesdeliveryPostcode.ID
and orderskeysdeliveryPostcode.reference = orders.ID
ORDER BY deliveryPostCode
limit 0,1000
-> very slow ...
In fact, the sorting operation itself should not be much different, as all NULL values for column deliveryPostcode would be at the beginning (ASC) or the end (DESC), while the rest of the dataset would have the same order as with JOIN instead of LEFT JOIN.
How can I query (and order) such tables efficiently? Do I need different relations or indices ?
Much obliged ...
With INNER JOINs, to reduce the number of lookups, MySQL is going to start with the table with the fewest rows (see the EXPLAIN result to see which table MySQL starts with).
If you order by anything other than a column in that first table, or there is no index to satisfy the ORDER BY clause on that first table, MySQL is going to have to do a filesort.
The use of a temporary table is much more likely when text columns are involved, and not just an in-memory temporary table, but a dreadful on-disk temporary table.
Use STRAIGHT_JOIN to force the order that MySQL performs inner joins.
I am not sure what the logic behind some parts of your query is, and I think it can still be optimized.
But just to resolve the issue you have, try switching to a RIGHT JOIN for now:
SELECT
orderskeysdeliveryPostcode.value deliveryPostcode,
o.id,
o.CN
FROM orderskeys as orderskeysdeliveryPostcode
INNER JOIN ordersnames as ord_n
on ord_n.id = orderskeysdeliveryPostcode.nameref
AND ord_n.name = 'deliveryPostcode'
RIGHT JOIN (
SELECT
orders.ID,
orderskeysCN.CN
FROM
orders
LEFT JOIN
(SELECT
orderskeys.value as CN,
orderskeys.reference
FROM
orderskeys
INNER JOIN ordersnames as ordersnamesCN
ON ordersnamesCN.id = orderskeys.nameref
AND ordersnamesCN.name = 'CN'
WHERE orderskeys.value = '12209'
) as orderskeysCN
ON
orderskeysCN.reference = orders.ID
limit 0,1000
) as o
on
orderskeysdeliveryPostcode.reference = o.ID
ORDER BY deliveryPostCode;
and here is sqlfiddle we can play with. Just need you to add data inserts there.

Left Join Subselect with LIMIT in MySQL

I have 3 tables:
actor
| FIELD | TYPE | NULL | KEY | DEFAULT | EXTRA |
|----------|------------------|------|-----|---------|----------------|
| actor_id | int(10) unsigned | NO | PRI | (null) | auto_increment |
| username | varchar(30) | NO | | (null) | |
tag
| FIELD | TYPE | NULL | KEY | DEFAULT | EXTRA |
|--------|------------------|------|-----|---------|----------------|
| tag_id | int(10) unsigned | NO | PRI | (null) | auto_increment |
| title | varchar(40) | NO | | (null) | |
actor_tag_count
| FIELD | TYPE | NULL | KEY | DEFAULT | EXTRA |
|------------------|------------------|------|-----|-------------------|-----------------------------|
| actor_id | int(10) unsigned | NO | PRI | (null) | |
| tag_id | int(10) unsigned | NO | PRI | (null) | |
| clip_count | int(10) unsigned | NO | | (null) | |
| update_timestamp | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
I want to get the 5 most frequent (highest clip_count) and most recently updated (latest update_timestamp) tags for each actor.
My attempted query is:
SELECT
`a`.`actor_id`,
`a`.`username`,
GROUP_CONCAT(atc.clip_count) AS `tag_clip_counts`,
GROUP_CONCAT(t.tag_id) AS `tag_ids`,
GROUP_CONCAT(t.title) AS `tag_titles`
FROM
`actor` AS `a`
LEFT JOIN (
SELECT
`atc`.`actor_id`,
`atc`.`tag_id`,
`atc`.`clip_count`
FROM
`actor_tag_count` AS `atc`
INNER JOIN `actor` AS `a` USING (actor_id)
ORDER BY
atc.clip_count DESC,
atc.update_timestamp DESC
LIMIT 5
) AS `atc` USING (actor_id)
LEFT JOIN `tag` AS `t` ON atc.tag_id = t.tag_id
GROUP BY
`a`.`actor_id`
The problem is that the left join subselect is only calculated once and the tags for every result in the set are only fetched from a pool of 5 tags.
Expected GROUP_CONCAT'd tag title results for Keanu Reeves:
comedy, scifi, action, suspense, western
(Both western and documentary have a clip_count of 2, but western should come first because it has a later update_timestamp)
I'm not sure this is a point of any relevance, but I am executing other joins on the actors table but had them removed for this question.
It would be highly preferable to make this all 1 query, but I'm stumped on how to do this even with 2 queries. 1-or-2-query solutions appreciated.
SQLFiddle, with the help of a very nice answer about using a GROUP_CONCAT limit workaround:
SELECT
`a`.`actor_id`,
`a`.`username`,
SUBSTRING_INDEX(GROUP_CONCAT(atc.clip_count ORDER BY atc.clip_count DESC, atc.update_timestamp DESC), ',', 5) AS `tag_clip_counts`,
SUBSTRING_INDEX(GROUP_CONCAT(t.tag_id ORDER BY atc.clip_count DESC, atc.update_timestamp DESC), ',', 5) AS `tag_ids`,
SUBSTRING_INDEX(GROUP_CONCAT(t.title ORDER BY atc.clip_count DESC, atc.update_timestamp DESC), ',', 5) AS `tag_titles`
FROM
`actor` AS `a`
LEFT JOIN actor_tag_count AS `atc` USING (actor_id)
LEFT JOIN `tag` AS `t` ON atc.tag_id = t.tag_id
GROUP BY
`a`.`actor_id`
It is possible by adding a sequence number, but might not perform well on large tables.
Something like this (not tested):-
SELECT actor_id,
    username,
    GROUP_CONCAT(clip_count) AS tag_clip_counts,
    GROUP_CONCAT(tag_id) AS tag_ids,
    GROUP_CONCAT(title) AS tag_titles
FROM
(
    -- Number each actor's rows; the counter restarts when actor_id changes
    SELECT sub0.actor_id,
        sub0.username,
        sub0.clip_count,
        sub0.tag_id,
        sub0.title,
        @aSeq := IF(@aActorId = sub0.actor_id, @aSeq, 0) + 1 AS aSequence,
        @aActorId := sub0.actor_id
    FROM
    (
        SELECT actor.actor_id,
            actor.username,
            atc.clip_count,
            tag.tag_id,
            tag.title
        FROM actor
        LEFT JOIN actor_tag_count AS atc ON actor.actor_id = atc.actor_id
        LEFT JOIN tag ON atc.tag_id = tag.tag_id
        ORDER BY actor.actor_id, atc.clip_count DESC, atc.update_timestamp DESC
    ) sub0
    CROSS JOIN (SELECT @aSeq:=0, @aActorId:=0) sub1
) sub2
WHERE aSequence <= 5
GROUP BY actor_id, username
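
On MySQL 8.0 and later, the sequence number can come from the ROW_NUMBER() window function rather than user variables. A minimal sketch of that idea, assuming SQLite 3.25 or newer (via Python's sqlite3) as a stand-in for MySQL; the sample rows are hypothetical, and the sketch covers only the actor_tag_count table to keep it short.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8.0+ (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor_tag_count (
        actor_id INTEGER,
        tag_id INTEGER,
        clip_count INTEGER,
        update_timestamp TEXT,
        PRIMARY KEY (actor_id, tag_id)
    );
    -- Hypothetical rows: actor 1 has six tags, two of them tied on clip_count.
    INSERT INTO actor_tag_count VALUES
        (1, 1, 9, '2013-01-01'), (1, 2, 7, '2013-01-02'), (1, 3, 5, '2013-01-03'),
        (1, 4, 3, '2013-01-04'), (1, 5, 2, '2013-01-05'), (1, 6, 2, '2013-01-06'),
        (2, 1, 4, '2013-01-01');
""")

# Number the rows within each actor, then keep the first N of each.
TOP_N_PER_ACTOR = """
    SELECT actor_id, tag_id, clip_count
    FROM (
        SELECT actor_id, tag_id, clip_count,
               ROW_NUMBER() OVER (
                   PARTITION BY actor_id
                   ORDER BY clip_count DESC, update_timestamp DESC
               ) AS seq
        FROM actor_tag_count
    ) ranked
    WHERE seq <= ?
    ORDER BY actor_id, seq
"""

rows = conn.execute(TOP_N_PER_ACTOR, (5,)).fetchall()
print(rows)
```

For actor 1, tag 6 makes the cut ahead of tag 5 because the two tie on clip_count and tag 6 has the later update_timestamp. The filtered rows can then be joined to actor and tag and passed to GROUP_CONCAT as in the queries above.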
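An alternative would be a subselect with a correlated subquery in the select list (with a limit of 5), wrapped in an outer query that does the group concats. MySQL only allows a subquery in the select list to return one column and one row, so treat the following as an outline of the idea rather than a runnable query (again not tested):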
SELECT
actor_id,
username,
GROUP_CONCAT(clip_count) AS tag_clip_counts,
GROUP_CONCAT(tag_id) AS tag_ids,
GROUP_CONCAT(title) AS tag_titles
FROM
(
SELECT
a.actor_id,
a.username,
(
SELECT
atc.clip_count,
t.tag_id,
t.title
FROM actor_tag_count AS atc
LEFT JOIN tag t ON atc.tag_id = t.tag_id
WHERE atc.actor_id = a.actor_id
ORDER BY atc.clip_count DESC, atc.update_timestamp DESC
LIMIT 5
)
FROM actor a
)
GROUP BY actor_id, username

How to get smallest column value without triggering "Mixing of GROUP columns [...] with no GROUP columns is illegal if there is no GROUP BY clause"?

I have a table 'foo' with a timestamp field 'bar'. How do I get only the oldest timestamp for a query like: SELECT foo.bar from foo? I tried doing something like: SELECT MIN(foo.bar) from foo but it failed with this error
ERROR 1140 (42000) at line 1: Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause
OK, so my query is much more complicated than that and that's why I am having a hard time with it. This is the query with the MIN(a.timestamp):
select distinct a.user_id as 'User ID',
a.project_id as 'Remix Project Id',
prjs.based_on_pid as 'Original Project ID',
(case when f.reasons is NULL then 'N' else 'Y' end)
as 'Flagged Y or N',
f.reasons, f.timestamp, MIN(a.timestamp)
from view_stats a
join (select id, based_on_pid, user_id
from projects p) prjs on
(a.project_id = prjs.id)
left outer join flaggers f on
( f.project_id = a.project_id
and f.user_id = a.user_id)
where a.project_id in
(select distinct b.id
from projects b
where b.based_on_pid in
( select distinct c.id
from projects c
where c.user_id = a.user_id
)
)
order by f.reasons desc, a.user_id, a.project_id;
Any help would be greatly appreciated.
The view_stats table:
+------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+-------------------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| user_id | int(10) unsigned | NO | MUL | 0 | |
| project_id | int(10) unsigned | NO | MUL | 0 | |
| ipaddress | bigint(20) | YES | MUL | NULL | |
| timestamp | timestamp | NO | | CURRENT_TIMESTAMP | |
+------------+------------------+------+-----+-------------------+----------------+
If you are going to use aggregate functions (like min(), max(), avg(), etc.) you need to tell the database what exactly it needs to take the min() of.
transaction date
one 8/4/09
one 8/5/09
one 8/6/09
two 8/1/09
two 8/3/09
three 8/4/09
I assume you want the following.
transaction date
one 8/4/09
two 8/1/09
three 8/4/09
To get that, you can use the following query. Note the GROUP BY clause, which tells the database how to group the rows before taking the min() of each group.
select
transaction,
min(date)
from
table
group by
transaction
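
The sample data above can be used to check this. A minimal sketch, assuming SQLite (via Python's sqlite3) rather than MySQL; the table name t and the column names txn and txn_date are hypothetical, chosen to avoid reserved words, and the dates are rewritten as ISO strings so that MIN() compares them correctly as text.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (txn TEXT, txn_date TEXT);
    INSERT INTO t VALUES
        ('one',   '2009-08-04'),
        ('one',   '2009-08-05'),
        ('one',   '2009-08-06'),
        ('two',   '2009-08-01'),
        ('two',   '2009-08-03'),
        ('three', '2009-08-04');
""")

# Oldest date per transaction: every non-aggregated column is in the GROUP BY.
per_transaction = conn.execute(
    "SELECT txn, MIN(txn_date) FROM t GROUP BY txn ORDER BY txn"
).fetchall()

# Oldest date overall: an aggregate on its own needs no GROUP BY.
overall = conn.execute("SELECT MIN(txn_date) FROM t").fetchone()[0]

print(per_transaction)
print(overall)
```

The error in the question comes from mixing the two forms: selecting MIN(...) alongside ordinary columns without a GROUP BY that lists those columns.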