MySQL subquery to self join - mysql

I'm defining the relationship between two tables using a join table. I want to rank people by how many items they have in common with a given person. Currently I am using a subquery. Is there a way to get the same result using a join?
People
ID | NAME
1    BOB
2    JOHN
3    KATY
4    MILLER
5    AMANDA

FoodTable
ID | Food
1    Hamberger
2    Pizza
3    Chicken
4    Salad
5    Sushi

PeopleFood
ID | PeopleId | FoodId
1    1          1
2    1          2
3    1          3
4    2          1
5    2          2
6    2          3
7    3          2
8    3          3
9    4          3
10   4          5
11   5          5
With the tables defined this way, I want to rank the other people by how similar their food tastes are to Bob's.
I'm doing it like this now.
SELECT people_id, COUNT(people_id) as count
FROM peopleFood
WHERE food_id IN
(SELECT food_id FROM peopleFood
WHERE people_id = 1)
AND people_id != 1
GROUP BY people_id
ORDER BY count DESC;
-- Result -------------
People_id | count
2 3
3 2
4 1
Is there a better way to change this method or use join?
Thank you!!!

You have been inconsistent in your use of the table and column names -
Tables - PeopleFood in your sample data but you reference peopleFood in your query.
Columns - PeopleId and FoodId in your sample data but you reference people_id and food_id in your query.
Choose a naming convention and stick to it. Everyone has their own preference, but the important thing is to be consistent.
The equivalent query with INNER JOIN instead of your sub-query is -
SELECT
`pf2`.`people_id`,
COUNT(`pf2`.`food_id`) as `count`
FROM `PeopleFood` `pf1`
INNER JOIN `PeopleFood` `pf2`
ON `pf2`.`people_id` <> `pf1`.`people_id`
AND `pf2`.`food_id` = `pf1`.`food_id`
WHERE `pf1`.`people_id` = 1
GROUP BY `pf2`.`people_id`
ORDER BY `count` DESC;
The performance difference between the two queries is unlikely to be noticeable and it might be argued that the intent is clearer in your version with the sub-query.
The surrogate key ID on your PeopleFood table should be dropped in favour of the compound “natural” primary key on people_id and food_id.
The Cost of Useless Surrogate Keys in Relationship Tables
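As a quick way to check that the two forms agree, here is a minimal sketch that loads the sample rows into an in-memory SQLite database (an assumption made so the example is self-contained; the thread is about MySQL) and runs both queries. The table uses the compound primary key suggested above, and the alias `shared` replaces `count` purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PeopleFood (
    people_id INTEGER NOT NULL,
    food_id   INTEGER NOT NULL,
    PRIMARY KEY (people_id, food_id)  -- compound natural key, no surrogate ID
);
INSERT INTO PeopleFood (people_id, food_id) VALUES
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 2), (2, 3),
    (3, 2), (3, 3),
    (4, 3), (4, 5),
    (5, 5);
""")

# Original approach: filter on the foods of the chosen person with a subquery.
SUBQUERY_SQL = """
SELECT people_id, COUNT(people_id) AS shared
FROM PeopleFood
WHERE food_id IN (SELECT food_id FROM PeopleFood WHERE people_id = ?)
  AND people_id != ?
GROUP BY people_id
ORDER BY shared DESC, people_id
"""

# Self join: pf1 holds the chosen person's rows, pf2 everyone else's.
JOIN_SQL = """
SELECT pf2.people_id, COUNT(pf2.food_id) AS shared
FROM PeopleFood pf1
INNER JOIN PeopleFood pf2
    ON pf2.people_id <> pf1.people_id
   AND pf2.food_id = pf1.food_id
WHERE pf1.people_id = ?
GROUP BY pf2.people_id
ORDER BY shared DESC, pf2.people_id
"""

subquery_rows = conn.execute(SUBQUERY_SQL, (1, 1)).fetchall()
join_rows = conn.execute(JOIN_SQL, (1,)).fetchall()
print(subquery_rows)
print(join_rows)
```

Both queries return the same three rows for Bob (people 2, 3 and 4). The extra `people_id` sort key is only there to make the order deterministic when two people tie.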

Inner join:
SELECT pf2.people_id, COUNT(pf2.people_id) AS count
FROM PeopleFood pf1
INNER JOIN PeopleFood pf2
ON pf2.food_id = pf1.food_id
WHERE pf1.people_id = 1 AND pf2.people_id <> 1
GROUP BY pf2.people_id
ORDER BY count DESC;
If it helps, please mark it as an accepted answer!

Related

Trying to get latest status for related shipment but the results I receive are incorrect

I am currently working on a project while trying to learn MySQL and I would like to join three tables and get the latest status for each related shipment. Here are the tables I'm working with (with example data):
shipments
id | consignee   | tracking_number | shipper | weight | import_no
1    JOHN BROWN    TBA99900000121    AMAZON    1        101
2    HELEN SMITH   TBA99900000190    AMAZON    1        102
3    JACK BLACK    TBA99900000123    AMAZON    1        103
4    JOE BROWM     TBA99900000812    AMAZON    1        104
5    JULIA KERR    TBA99900000904    AMAZON    1        105
statuses
id | name             | slug
1    At Warehouse       at_warehouse
2    Ready For Pickup   ready_for_pickup
3    Delivered          delivered
shipment_status (pivot table)
id | shipment_id | status_id
1    1             1
2    2             1
3    3             1
4    4             1
5    5             1
6    1             2
7    2             2
8    3             2
9    4             2
10   5             2
All tables also have created_at and updated_at timestamp columns.
Example of the results I'm trying to achieve
slug             | shipment_id | status_id
ready_for_pickup   1             2
ready_for_pickup   2             2
ready_for_pickup   3             2
ready_for_pickup   4             2
ready_for_pickup   5             2
Here's the query I wrote to try to achieve what I'm looking for, based on examples and research I did during the past couple of days. I find that the latest status it returns sometimes does not match the shipment.
SELECT
statuses.slug AS slug,
MAX(shipments.id) AS shipment_id,
statuses.id AS status_id
FROM
`shipments`
INNER JOIN `shipment_status` ON `shipment_status`.`shipment_id` = `shipments`.`id`
INNER JOIN `statuses` ON `shipment_status`.`status_id` = `statuses`.`id`
GROUP BY
`shipment_id`
Because we need to reference other fields from the same record that the MAX aggregation picks out, you need to do it in two steps. There are other ways, but I find this syntax simpler:
SELECT
shipments.id AS id,
statuses.slug AS slug,
statuses.id AS status_id,
shipment_status.shipment_id as shipment_id
FROM
`shipments`
INNER JOIN `shipment_status` ON `shipment_status`.`shipment_id` = `shipments`.`id`
INNER JOIN `statuses` ON `shipment_status`.`status_id` = `statuses`.`id`
WHERE
shipment_status.id = (
SELECT MAX(shipment_status.id)
FROM `shipment_status`
WHERE shipment_status.shipment_id = shipments.id
)
try it out!
This query assumes that the id field is an auto-incrementing identity column, so MAX(shipment_status.id) identifies the most recent status row for the given shipment_id.
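To see the two-step lookup run against the sample data, here is a minimal sketch using Python's built-in sqlite3 module (an assumption for self-containment; the question targets MySQL). The shipments table is trimmed to two columns, and the inner alias `latest` is introduced only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (id INTEGER PRIMARY KEY, consignee TEXT);
CREATE TABLE statuses (id INTEGER PRIMARY KEY, name TEXT, slug TEXT);
CREATE TABLE shipment_status (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    shipment_id INTEGER REFERENCES shipments(id),
    status_id   INTEGER REFERENCES statuses(id)
);
INSERT INTO shipments VALUES
    (1, 'JOHN BROWN'), (2, 'HELEN SMITH'), (3, 'JACK BLACK'),
    (4, 'JOE BROWM'), (5, 'JULIA KERR');
INSERT INTO statuses VALUES
    (1, 'At Warehouse', 'at_warehouse'),
    (2, 'Ready For Pickup', 'ready_for_pickup'),
    (3, 'Delivered', 'delivered');
-- Pivot rows are inserted in the same order as the sample data, so ids run 1..10.
INSERT INTO shipment_status (shipment_id, status_id) VALUES
    (1, 1), (2, 1), (3, 1), (4, 1), (5, 1),
    (1, 2), (2, 2), (3, 2), (4, 2), (5, 2);
""")

LATEST_STATUS_SQL = """
SELECT statuses.slug, shipment_status.shipment_id, statuses.id AS status_id
FROM shipments
INNER JOIN shipment_status ON shipment_status.shipment_id = shipments.id
INNER JOIN statuses ON shipment_status.status_id = statuses.id
WHERE shipment_status.id = (
    -- Step 1: highest pivot id for this shipment, i.e. its newest status row.
    SELECT MAX(latest.id)
    FROM shipment_status AS latest
    WHERE latest.shipment_id = shipments.id
)
ORDER BY shipments.id
"""

latest_rows = conn.execute(LATEST_STATUS_SQL).fetchall()
for row in latest_rows:
    print(row)
```

Each shipment comes back exactly once with the ready_for_pickup status, which matches the result table in the question.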
You can use window functions (available in MySQL 8.0 and later):
SELECT s.id, st.slug, st.id
FROM shipments s JOIN
(SELECT ss.*,
ROW_NUMBER() OVER (PARTITION BY shipment_id ORDER BY ss.id DESC) as seqnum
FROM shipment_status ss
) ss
ON ss.shipment_id = s.id JOIN
statuses st
ON ss.status_id = st.id
WHERE ss.seqnum = 1;
Also note the use of table aliases so the query is easier to write and to read.
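The ROW_NUMBER() approach can be tried the same way. The sketch below assumes an SQLite build with window-function support (3.25 or newer, which recent Python releases normally bundle) and uses only shipments 1 to 3 from the sample data to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (id INTEGER PRIMARY KEY);
CREATE TABLE statuses (id INTEGER PRIMARY KEY, slug TEXT);
CREATE TABLE shipment_status (
    id INTEGER PRIMARY KEY, shipment_id INTEGER, status_id INTEGER
);
INSERT INTO shipments VALUES (1), (2), (3);
INSERT INTO statuses VALUES
    (1, 'at_warehouse'), (2, 'ready_for_pickup'), (3, 'delivered');
INSERT INTO shipment_status VALUES
    (1, 1, 1), (2, 2, 1), (3, 3, 1),
    (6, 1, 2), (7, 2, 2), (8, 3, 2);
""")

WINDOW_SQL = """
SELECT s.id, st.slug, st.id
FROM shipments s
JOIN (
    -- Number each shipment's status rows from newest (1) to oldest.
    SELECT ss.*,
           ROW_NUMBER() OVER (PARTITION BY ss.shipment_id
                              ORDER BY ss.id DESC) AS seqnum
    FROM shipment_status ss
) ss ON ss.shipment_id = s.id
JOIN statuses st ON ss.status_id = st.id
WHERE ss.seqnum = 1
ORDER BY s.id
"""

window_rows = conn.execute(WINDOW_SQL).fetchall()
print(window_rows)
```

Filtering on seqnum = 1 keeps only the newest status row per shipment, so the derived table plays the same role as the correlated MAX subquery.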

Mysql group by some field non distinct

I have a task.
I need to get the table below. It is built from two tables, where the names in table_2 are not distinct.
Please help me write this query. Thanks!
id name1 id name2
1 Alex 2 Alexander
2 Alex 3 Alexan
4 Vlad 5 Vladimir
5 Vlad 6 Vladik
From two tables.
Table_1
id name
1 Alex
2 Pit
3 Vlad
And
Table_2
id id_table_1 real_name
1 1 Alexander
2 1 Alexan
3 2 Piter
4 3 Vladimir
5 3 Vladik
my query
select table_1.name, table_2.id, table_2.real_name
from table_1 join table_2
on table_1.id = table_2.id_table_1
If all you want is to combine duplicated rows, use SELECT DISTINCT.
If you need to combine rows that are duplicates in only some columns, use GROUP BY, but you need to specify what to do with the other columns. You can either omit them (by not listing them in the SELECT clause) or aggregate them (using functions like SUM, MIN, and AVG).
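The difference between the two options can be sketched on the question's sample tables. This is an illustration run in an in-memory SQLite database (an assumption for self-containment), and the choice of COUNT and MIN as aggregates is hypothetical, since the question does not say how the extra columns should be combined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE table_2 (id INTEGER PRIMARY KEY, id_table_1 INTEGER, real_name TEXT);
INSERT INTO table_1 VALUES (1, 'Alex'), (2, 'Pit'), (3, 'Vlad');
INSERT INTO table_2 VALUES
    (1, 1, 'Alexander'), (2, 1, 'Alexan'), (3, 2, 'Piter'),
    (4, 3, 'Vladimir'), (5, 3, 'Vladik');
""")

# SELECT DISTINCT collapses rows that are identical in every selected column.
distinct_names = conn.execute("""
    SELECT DISTINCT table_1.name
    FROM table_1
    JOIN table_2 ON table_1.id = table_2.id_table_1
    ORDER BY table_1.name
""").fetchall()

# GROUP BY yields one row per name; the remaining columns must be aggregated.
grouped = conn.execute("""
    SELECT table_1.name,
           COUNT(*) AS variants,
           MIN(table_2.real_name) AS first_real_name
    FROM table_1
    JOIN table_2 ON table_1.id = table_2.id_table_1
    GROUP BY table_1.name
    ORDER BY table_1.name
""").fetchall()

print(distinct_names)
print(grouped)
```

With DISTINCT each name appears once and nothing else is returned; with GROUP BY each name still appears once, but the row also carries a summary of the table_2 rows behind it.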

mysql select in another select group: how many people in downline?

Hello, I have a table similar to this one:
id sponsor name
------------------------
1 0 Sasha
2 1 John
3 1 Walter
4 3 Ashley
5 1 Mark
6 4 Alexa
7 3 Robert
8 3 Frank
9 4 Marika
10 5 Philip
11 9 Elizabeth
When I choose an ID (call it MYCHOICE), I want to know the names of all the people whose sponsor is MYCHOICE. That is simply:
select * from tablename where sponsor=MYCHOICE
But here is the problem: I also want to know how many people are in the downline of each of these results, that is, how many records have each returned id as their sponsor.
If I choose id 1, the result should be
id name downline
----------------------
2 John 0 (noone with sponsor=2)
3 Walter 3 (3 with sponsor=3: ashley, robert, frank)
5 Mark 1 (1 with sponsor=5: philip)
If I choose id 4, the result should be
id name downline
----------------------
6 Alexa 0
9 Marika 1 (1 with sponsor=9: Elizabeth)
I tried this "bad solution" with MYCHOICE set to 1:
select sponsor,count(*) as downline from tablename where sponsor in
(select id from tablename where sponsor=1) group by sponsor order by
downline desc
The result of this query is
sponsor downline
---------------------
3 3
5 1
There are 2 problems:
- the names are missing, so this is not what I want
- the zero count ("2|John|0" in the example) does not appear
Thank you for your advice and help, and sorry for my English,
N.
SELECT child.id,
child.name,
COUNT(grandchild.sponsor) downline
FROM TableName child
INNER JOIN TableName parent
ON child.sponsor = parent.id AND
parent.id = ? -- << user choice
LEFT JOIN TableName grandchild
ON child.id = grandchild.sponsor
GROUP BY child.id, child.name
SQLFiddle Demo
As you can see, the table is joined to itself twice. The first join, an INNER JOIN, restricts child to the records whose sponsor is your user choice. The second join, a LEFT JOIN, attaches the records sponsored by each of those children. Because it is a LEFT JOIN, children with no downline are kept, and COUNT(grandchild.sponsor) counts them as 0 since it ignores NULLs.
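The double self join can be checked against the expected results from the question. This is a minimal sketch on an in-memory SQLite database (an assumption for self-containment); the helper name `downline` and the added ORDER BY are for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableName (id INTEGER PRIMARY KEY, sponsor INTEGER, name TEXT);
INSERT INTO TableName VALUES
    (1, 0, 'Sasha'), (2, 1, 'John'), (3, 1, 'Walter'), (4, 3, 'Ashley'),
    (5, 1, 'Mark'), (6, 4, 'Alexa'), (7, 3, 'Robert'), (8, 3, 'Frank'),
    (9, 4, 'Marika'), (10, 5, 'Philip'), (11, 9, 'Elizabeth');
""")

DOWNLINE_SQL = """
SELECT child.id, child.name, COUNT(grandchild.sponsor) AS downline
FROM TableName child
INNER JOIN TableName parent
    ON child.sponsor = parent.id AND parent.id = ?
LEFT JOIN TableName grandchild
    ON child.id = grandchild.sponsor
GROUP BY child.id, child.name
ORDER BY child.id
"""


def downline(choice):
    """Return (id, name, downline count) for everyone sponsored by `choice`."""
    return conn.execute(DOWNLINE_SQL, (choice,)).fetchall()


print(downline(1))
print(downline(4))
```

John shows up with a count of 0 because the LEFT JOIN keeps his row even though no one lists him as sponsor.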

Make in clause to match all items or any alternative?

I have a table hotel [hotelid,hotelname,etc]
and another table facilities[facilityid,facilityname]
these 2 tables are linked through table hotel_to_facilities_map[hotelid,facility_id]
so the table hotel_to_facilities_map might contain values as
hotelid facility_id
-------------------
1 3
1 5
1 6
2 6
2 2
2 5
Now I want to retrieve all the hotels which match ALL the facilities asked for:
select * from hotel_to_facilities_map where facility_id IN (3,5,2)
But this matches as an OR expression, while I need AND.
Is there any workaround or solution for this?
select hotelid
from hotel_to_facilities_map
where facility_id in (3,5,2)
group by hotelid
having count(*) = 3
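The count in the HAVING clause has to equal the number of facilities requested, and count(*) only works if each (hotelid, facility_id) pair appears once in the map table. The sketch below is one way to generalise the idea, assuming an in-memory SQLite database and a hypothetical helper `hotels_with_all`; it uses COUNT(DISTINCT facility_id) so duplicate map rows would not inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hotel_to_facilities_map (hotelid INTEGER, facility_id INTEGER);
INSERT INTO hotel_to_facilities_map VALUES
    (1, 3), (1, 5), (1, 6),
    (2, 6), (2, 2), (2, 5);
""")


def hotels_with_all(facility_ids):
    """Return ids of hotels that offer every facility in `facility_ids`."""
    wanted = sorted(set(facility_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT hotelid
        FROM hotel_to_facilities_map
        WHERE facility_id IN ({placeholders})
        GROUP BY hotelid
        HAVING COUNT(DISTINCT facility_id) = ?
        ORDER BY hotelid
    """
    # The last parameter is the number of distinct facilities requested.
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


print(hotels_with_all([3, 5, 2]))  # no hotel in the sample has all three
print(hotels_with_all([5, 6]))
print(hotels_with_all([3, 5, 6]))
```

With the sample rows, no hotel offers facilities 3, 5 and 2 together, so the question's example request returns nothing, while the request for 5 and 6 returns both hotels.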

Complex query involving timestamps

I'm having some trouble with a complex query involving the following tables. Assume time is using the built-in sqlite timestamp datatype.
I am trying to return the customers whose 2nd purchase is within 4 hours of their first purchase AND if it's within 2 hours it must be from a different store.
I'm having trouble wrapping my head around how to refer to the specific rows to compare a first purchase with a second purchase.
purchases
purchase_id | customer_id | store_id | purchase_time
1 1 1 2009-01-27 10:00:00.0
2 1 2 2009-01-27 10:30:00.0
3 2 1 2009-01-27 10:00:00.0
4 2 1 2009-01-27 10:30:00.0
5 3 1 2009-01-27 10:00:00.0
6 3 2 2009-01-27 16:00:00.0
7 4 3 2009-01-27 10:00:00.0
8 4 3 2009-01-27 13:00:00.0
stores
store_id | misc columns...
1
2
3
customers
customer_id | f_name
1 name1
2 name2
3 name3
4 name4
The correct return would be name1, name4 in this case.
You're going to be joining the purchase table to itself, and then selecting on one of the two criteria.
The only real trick here is to formulate the different time criteria as:
Purchases that were made < 2 hours at different stores.
Purchases that were made between 2 and 4 hours, independent of store_id.
Both of which obviously apply for the same customer_id.
So, we've got:
select p1.purchase_id purchase_1,
p2.purchase_id purchase_2,
c.f_name name,
p1.customer_id customer
from purchases p1
join purchases p2 on
p1.customer_id = p2.customer_id
join customers c on c.customer_id = p1.customer_id
where p1.purchase_time < p2.purchase_time
and (
(
addtime(p1.purchase_time,'2:00:00') >= p2.purchase_time
and p1.store_id <> p2.store_id
)
or
(
addtime(p1.purchase_time,'2:00:00') < p2.purchase_time
and addtime(p1.purchase_time,'4:00:00') >= p2.purchase_time
)
)
This joins purchases to itself by customer_id, first checks that you're comparing earlier purchases to later purchases, and then applies the two different criteria, ORed together.
I find the time comparisons easiest to do with the addtime() and then comparing the results. Others may prefer other ways.
SQL Fiddle here: http://sqlfiddle.com/#!2/14dda/2
Results:
PURCHASE_1 PURCHASE_2 NAME CUSTOMER
1 2 name1 1
7 8 name4 4
--
EDIT: Perhaps, you'd get some efficiency by moving the p1.purchase_time < p2.purchase_time up into the join clause. This might be faster with lots of data, though the execution plans for this little amount of data are identical. You'd like the optimizer to eliminate all those cases where p1.purchase_time > p2.purchase_time before doing the more expensive comparisons. But that's somewhat beyond the basic question of ways to write this query.
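Since the question mentions SQLite and addtime() is a MySQL function, here is a hedged sketch of the same self join in SQLite, with the earlier-than comparison moved into the join clause as suggested in the edit. It assumes timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text (the trailing '.0' from the sample is dropped) so that the strings returned by datetime() compare correctly against the stored values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, f_name TEXT);
CREATE TABLE purchases (
    purchase_id   INTEGER PRIMARY KEY,
    customer_id   INTEGER,
    store_id      INTEGER,
    purchase_time TEXT  -- 'YYYY-MM-DD HH:MM:SS'
);
INSERT INTO customers VALUES (1, 'name1'), (2, 'name2'), (3, 'name3'), (4, 'name4');
INSERT INTO purchases VALUES
    (1, 1, 1, '2009-01-27 10:00:00'), (2, 1, 2, '2009-01-27 10:30:00'),
    (3, 2, 1, '2009-01-27 10:00:00'), (4, 2, 1, '2009-01-27 10:30:00'),
    (5, 3, 1, '2009-01-27 10:00:00'), (6, 3, 2, '2009-01-27 16:00:00'),
    (7, 4, 3, '2009-01-27 10:00:00'), (8, 4, 3, '2009-01-27 13:00:00');
""")

PAIRS_SQL = """
SELECT p1.purchase_id, p2.purchase_id, c.f_name, p1.customer_id
FROM purchases p1
JOIN purchases p2
    ON p2.customer_id = p1.customer_id
   AND p1.purchase_time < p2.purchase_time      -- earlier vs. later purchase
JOIN customers c ON c.customer_id = p1.customer_id
WHERE (datetime(p1.purchase_time, '+2 hours') >= p2.purchase_time
       AND p1.store_id <> p2.store_id)           -- within 2h, different store
   OR (datetime(p1.purchase_time, '+2 hours') < p2.purchase_time
       AND datetime(p1.purchase_time, '+4 hours') >= p2.purchase_time)  -- 2h to 4h
ORDER BY p1.purchase_id
"""

pairs = conn.execute(PAIRS_SQL).fetchall()
print(pairs)
```

On the sample data this returns the same two pairs as the MySQL version, for name1 and name4. Like the original, it compares every earlier purchase with every later one for a customer, not strictly the first and second.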