MySQL excluding results on a JOIN - mysql

I'm working on a restaurant CMS app. I have a many-to-many relationship between 2 tables, menu_sections and menu_items. The relationship is maintained with a table in between called menu_relationships.
As an example let's say the menu section called Snacks (menu_section_id = 1) contains a menu item called Pretzels (menu_item_id = 1) and the menu section called Desserts (menu_section_id = 2) contains a menu item called Ice Cream (menu_item_id = 2), but Ice Cream is also contained within another menu section called Kids Food (menu_section_id = 3). So there would be 3 rows in the menu_relationships table to map out these 3 relationships. The relationship table would look like this:
----------------------------------
| menu_section_id | menu_item_id |
|================================|
| 1               | 1            |
|--------------------------------|
| 2               | 2            |
|--------------------------------|
| 3               | 2            |
----------------------------------
So far so good.
I want to generate a result set that will return the names of all menu items except for menu items with a given menu_section_id. So to return the menu item names, I have a join on the menu_items table. Here's the SQL:
SELECT menu_section_id, menu_items.menu_item_id, menu_item_name
FROM menu_relationships
JOIN menu_items
ON menu_items.menu_item_id = menu_relationships.menu_item_id
WHERE menu_section_id != 2
This result set gives me a row for each relationship whose menu_section_id is not the given one. With the example data I get 2 rows back from the relationship table:
---------------------------------------------------
| menu_section_id | menu_item_id | menu_item_name |
|=================================================|
| 1               | 1            | Pretzels       |
|-------------------------------------------------|
| 3               | 2            | Ice Cream      |
---------------------------------------------------
But what I want is to exclude the menu item altogether from the result set if it has ANY relationship to the specified menu_section_id. In other words, in this example I only want to return rows for menu items that have no relationship mappings at all to a menu_section_id of 2, so I only want the Pretzels row.
I've tried various things with GROUP BY and HAVING using the bit_xor() aggregate function, but so far no luck at all in getting what I want.
I probably could have taken less time to explain that, but I wanted it to be as clear as I can make it. I hope it is. Can anyone help?

This is a wonderful case for the use of LEFT OUTER JOIN because it includes all rows from your left-hand table and matches where it can, returning NULL for any non-match.
Building on Mark Breyer's sample query from above, see this example:
SELECT I.menu_item_id, I.menu_item_name
FROM menu_items AS I
LEFT OUTER JOIN menu_relationships AS R
ON (R.menu_item_id = I.menu_item_id) AND (R.menu_section_id = 2)
-- Keep only the items that found no matching row for section 2
WHERE R.menu_item_id IS NULL
The MySQL optimizer may actually rewrite this as a subquery. I'm not an optimization expert by any means, so I'd take a look at the way your indexes are built and see if this type of join makes sense for your schema. I'd also test whether it's actually faster, because it states the intent less directly than a subquery does.
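To see the LEFT OUTER JOIN exclusion pattern on the question's sample data, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for MySQL (the join syntax used is common to both), and the item names come from the question's example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL. Table and column names
# follow the question; the rows are the question's three-relationship example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE menu_items (menu_item_id INTEGER PRIMARY KEY, menu_item_name TEXT);
    CREATE TABLE menu_relationships (menu_section_id INTEGER, menu_item_id INTEGER);
    INSERT INTO menu_items VALUES (1, 'Pretzels'), (2, 'Ice Cream');
    INSERT INTO menu_relationships VALUES (1, 1), (2, 2), (3, 2);
""")

# The join only matches relationship rows for section 2. Items without such
# a row get NULLs for R, and the WHERE clause keeps exactly those items.
anti_join_rows = conn.execute("""
    SELECT I.menu_item_id, I.menu_item_name
    FROM menu_items AS I
    LEFT OUTER JOIN menu_relationships AS R
        ON R.menu_item_id = I.menu_item_id AND R.menu_section_id = 2
    WHERE R.menu_item_id IS NULL
    ORDER BY I.menu_item_id
""").fetchall()

print(anti_join_rows)  # [(1, 'Pretzels')]
```

Ice Cream drops out because it has a section 2 row, even though it also belongs to section 3.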

There are many ways to do this. Here is one example using WHERE ... NOT IN (...):
SELECT
R.menu_section_id,
I.menu_item_id,
I.menu_item_name
FROM menu_items AS I
JOIN menu_relationships AS R
ON R.menu_item_id = I.menu_item_id
WHERE I.menu_item_id NOT IN
(
SELECT menu_item_id
FROM menu_relationships
WHERE menu_section_id = 2
)

I would use a subquery for this, getting me every menu_item_id which has the menu_section_id 2 and then using NOT IN. Here you go:
SELECT menu_section_id, menu_items.menu_item_id, menu_item_name
FROM menu_relationships
JOIN menu_items
ON menu_items.menu_item_id = menu_relationships.menu_item_id
WHERE menu_relationships.menu_item_id NOT IN (
SELECT menu_item_id
FROM menu_relationships
WHERE menu_section_id = 2
);
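For comparison, here is a small sketch that runs the NOT IN form next to an equivalent NOT EXISTS form on the question's sample data. It assumes in-memory SQLite in place of MySQL, and the alias X is just an illustrative name. One thing worth knowing: if the NOT IN subquery can ever return a NULL, the NOT IN test never evaluates to true, which NOT EXISTS avoids.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; data is the question's example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE menu_items (menu_item_id INTEGER PRIMARY KEY, menu_item_name TEXT);
    CREATE TABLE menu_relationships (menu_section_id INTEGER, menu_item_id INTEGER);
    INSERT INTO menu_items VALUES (1, 'Pretzels'), (2, 'Ice Cream');
    INSERT INTO menu_relationships VALUES (1, 1), (2, 2), (3, 2);
""")

excluded_section = 2

# NOT IN: drop every item that appears in the excluded section.
not_in_rows = conn.execute("""
    SELECT R.menu_section_id, I.menu_item_id, I.menu_item_name
    FROM menu_relationships AS R
    JOIN menu_items AS I ON I.menu_item_id = R.menu_item_id
    WHERE R.menu_item_id NOT IN (
        SELECT menu_item_id FROM menu_relationships WHERE menu_section_id = ?
    )
    ORDER BY R.menu_section_id
""", (excluded_section,)).fetchall()

# NOT EXISTS: same intent, written as a correlated subquery.
not_exists_rows = conn.execute("""
    SELECT R.menu_section_id, I.menu_item_id, I.menu_item_name
    FROM menu_relationships AS R
    JOIN menu_items AS I ON I.menu_item_id = R.menu_item_id
    WHERE NOT EXISTS (
        SELECT 1
        FROM menu_relationships AS X
        WHERE X.menu_item_id = I.menu_item_id
          AND X.menu_section_id = ?
    )
    ORDER BY R.menu_section_id
""", (excluded_section,)).fetchall()

print(not_in_rows)      # [(1, 1, 'Pretzels')]
print(not_exists_rows)  # [(1, 1, 'Pretzels')]
```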

I was going to suggest a subquery as well, except that I wanted to mention that subqueries can dramatically affect performance on your site. You may want to consider options for caching to avoid serious load time hangups due to things like this.
In most cases you'll be OK. But if you're only showing us part of the issue and leaving out details that seemed irrelevant, you could very well be building a site that runs 100 of these queries on a page, and nobody here has mentioned the compounded overhead that can result.
Like I said though, you'll probably be fine. Just don't do a subquery within a subquery unless you want to restart your server.

Related

Running a query using multiple junction tables

I have a table listing case studies and another table that list outcomes. A case study can have multiple outcomes so I created a junction table.
I want to run a SQL query that will show the case once along with each of its outcomes. So far I have:
SELECT caseSummaries.caseTitle, caseSummaries.caseSynopsis, RESULTS.resultText
FROM JNCT_RESULT_CASESUMMARY
JOIN caseSummaries ON JNCT_RESULT_CASESUMMARY.caseSummary_FK = caseSummaries.caseID
JOIN RESULTS ON JNCT_RESULT_CASESUMMARY.result_FK = RESULTS.result_ID
GROUP BY caseSummaries.caseID;
which gives me one row and only the first outcome of three. How can I show the others in the same row? Will I have to create temporary tables and how is that done? So far I have used a LEFT JOIN but I still get one row. If I don't use GROUP BY I get the caseSummaries.caseTitle repeated thrice and the outcome for each listed. I want to get the case summary once and each outcome appear in a new column.
Thanks,
C
Assume from the question I have two tables
Case studies with three fields:
ID
Title
Synopsis
and another table containing Outcomes:
Apology
Compensation
Policy change
There is a many to many relationship and my SQL needs to show the outcomes for each case study like this:
Case 1 | Title | Synopsis | Apology|Compensation|Policy change
Case 2 |Title | Synopsis | Apology|NULL|Policy change
assuming the Case 2 only has 2 outcomes.
At the moment without the GROUP BY or SELECT DISTINCT I get
Case 1 | Title | Synopsis | Apology
Case 1 | Title | Synopsis |Compensation
Case 1 | Title | Synopsis |Policy change
Case 2 | Title | Synopsis | Apology
Case 2 | Title | Synopsis |Policy change
The group_concat function should do what you need:
SELECT caseSummaries.caseTitle,
caseSummaries.caseSynopsis,
GROUP_CONCAT(RESULTS.resultText)
FROM JNCT_RESULT_CASESUMMARY
JOIN caseSummaries ON JNCT_RESULT_CASESUMMARY.caseSummary_FK = caseSummaries.caseID
JOIN RESULTS ON JNCT_RESULT_CASESUMMARY.result_FK = RESULTS.result_ID
GROUP BY caseSummaries.caseTitle, caseSummaries.caseSynopsis;
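A quick sketch of what GROUP_CONCAT returns here, assuming in-memory SQLite in place of MySQL (both provide GROUP_CONCAT) and two made-up case rows matching the question's description. Note that the outcomes arrive as one comma-separated string per case rather than as separate columns, so the sketch splits them in application code.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL. Table and column names
# follow the question; the case rows themselves are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE caseSummaries (caseID INTEGER PRIMARY KEY, caseTitle TEXT, caseSynopsis TEXT);
    CREATE TABLE RESULTS (result_ID INTEGER PRIMARY KEY, resultText TEXT);
    CREATE TABLE JNCT_RESULT_CASESUMMARY (caseSummary_FK INTEGER, result_FK INTEGER);
    INSERT INTO caseSummaries VALUES (1, 'Case 1', 'Synopsis 1'), (2, 'Case 2', 'Synopsis 2');
    INSERT INTO RESULTS VALUES (1, 'Apology'), (2, 'Compensation'), (3, 'Policy change');
    INSERT INTO JNCT_RESULT_CASESUMMARY VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
""")

rows = conn.execute("""
    SELECT c.caseTitle, c.caseSynopsis, GROUP_CONCAT(r.resultText) AS outcomes
    FROM JNCT_RESULT_CASESUMMARY AS j
    JOIN caseSummaries AS c ON j.caseSummary_FK = c.caseID
    JOIN RESULTS AS r ON j.result_FK = r.result_ID
    GROUP BY c.caseID, c.caseTitle, c.caseSynopsis
    ORDER BY c.caseID
""").fetchall()

# One row per case; split the concatenated string back into a sorted list,
# because the order inside the string is not guaranteed without ORDER BY.
outcomes_by_case = {
    title: sorted(outcomes.split(","))
    for title, _synopsis, outcomes in rows
}
print(outcomes_by_case)
```

In MySQL the function also accepts ORDER BY and SEPARATOR inside the call, which helps if the outcome order or the delimiter matters.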

MySQL queries, selecting field from one of many databases

I have a remarks table which can be linked to any number of other items in a system, in the case of this example we'll use bookings, enquiries and referrals.
Thus in the remarks table we have columns
remark_id | datetime | text | booking_id | enquiry_id | referral_id
1 | 2014-06-28 | abc | 0 | 8 | 0
2 | 2014-06-27 | def | 3 | 0 | 0
2 | 2014-05-31 | ghi | 0 | 0 | 10
Etc...
Each of the item tables will have a field called name. Thus when I want to select a remark the likelihood is I'll need this name.
I'd like to achieve this with a single query, getting a 2d array as follows:
['remark_id'=>1, 'datetime'=>'2014-06-28', 'text'=>'abc', 'name'=>'Harold']
However the query I'd expect to use would be
SELECT r.remark_id,r.datetime,r.text
,b.name AS book,rr.name AS referral,e.name AS enquiry
FROM remarks AS r
LEFT JOIN bookings AS b ON b.book_id=r.book_id
LEFT JOIN referrals AS rr ON rr.referral_id=r.referral_id
LEFT JOIN enquiries AS e ON e.enquiry_id=r.enquiry_id
Leaving me with the output
['remark_id'=>1, 'datetime'=>'2014-06-28', 'text'=>'abc', 'book'=>'Harold', 'referral'=>'', 'enquiry'=>'']
And more processing to do before or during rendering it to a view.
Is there a way to write a query such that it would fill a field from the first NOT NULL string it encountered in one of the joined tables?
Please only suggest using a different database system if you know that MySQL doesn't provide any way to do what I'm asking. If it's the case it can't be done there's no business sense in rewriting the system anyway, but I'd like to ask!
Two ways I can think of:
use UNION:
SELECT remark_id, datetime, text, name
FROM remarks
JOIN bookings ON (remarks.book_id = bookings.book_id)
UNION
SELECT remark_id, datetime, text, name
FROM remarks
JOIN referrals ON (remarks.referral_id = referrals.referral_id)
UNION
SELECT remark_id, datetime, text, name
FROM remarks
JOIN enquiries ON (remarks.enquiry_id = enquiries.enquiry_id)
use IFNULL (probably much slower):
SELECT r.remark_id,r.datetime,r.text,
IFNULL(b.name,IFNULL(rr.name,e.name)) AS name
FROM remarks AS r
LEFT JOIN bookings AS b ON b.book_id=r.book_id
LEFT JOIN referrals AS rr ON rr.referral_id=r.referral_id
LEFT JOIN enquiries AS e ON e.enquiry_id=r.enquiry_id
Variant 2 is really much slower because of the LEFT JOINs.
Also, generally I would not recommend using 0 as value for non-existent links, rather use NULL. This will allow MySQL to speed up the join.
One way to achieve this is with nested IF() calls:
if(b.name is not null, b.name, if(rr.name is not null, rr.name, e.name)) as name
One drawback is that this gives an implicit priority to bookings; not sure if that would be an issue.
Perhaps the main drawback, though, is that this is kind of "magical" and has goofy syntax, so it might be clearer to just handle those cases in the controller after all.
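The nested IFNULL and IF variants above can also be written with COALESCE, which returns its first non-NULL argument and exists in MySQL. The sketch below assumes in-memory SQLite in place of MySQL, uses booking_id as the key name throughout (the question mixes booking_id and book_id), and invents the names other than Harold.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL. Key names and all person
# names except 'Harold' are illustrative. Unused links are stored as 0, as in
# the question, so the LEFT JOINs simply find no match for them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE remarks (remark_id INTEGER PRIMARY KEY, datetime TEXT, text TEXT,
                          booking_id INTEGER, enquiry_id INTEGER, referral_id INTEGER);
    CREATE TABLE bookings (booking_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enquiries (enquiry_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE referrals (referral_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO remarks VALUES
        (1, '2014-06-28', 'abc', 0, 8, 0),
        (2, '2014-06-27', 'def', 3, 0, 0),
        (3, '2014-05-31', 'ghi', 0, 0, 10);
    INSERT INTO bookings VALUES (3, 'Maude');
    INSERT INTO enquiries VALUES (8, 'Harold');
    INSERT INTO referrals VALUES (10, 'Ethel');
""")

# COALESCE picks the first non-NULL name, in booking, referral, enquiry order.
named_remarks = conn.execute("""
    SELECT r.remark_id, r.datetime, r.text,
           COALESCE(b.name, rr.name, e.name) AS name
    FROM remarks AS r
    LEFT JOIN bookings AS b ON b.booking_id = r.booking_id
    LEFT JOIN referrals AS rr ON rr.referral_id = r.referral_id
    LEFT JOIN enquiries AS e ON e.enquiry_id = r.enquiry_id
    ORDER BY r.remark_id
""").fetchall()

for row in named_remarks:
    print(row)
```

The argument order still encodes a priority, as with the nested IF form, but adding a fourth linked table only means adding one more argument.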
Seems quite messy that you have multiple unused columns for each entry, unless I'm not understanding correctly. If you add more tables, you'd have to adjust each of the views so that it would filter out the new table.
I'd be tempted to redesign your structure so that each of the tables has a remarkgroup_id column, then add the following remark table
remark_id, remarkgroup_id, date, message
This would clean up the extra unused columns and allow you to use simple joining logic.

MySQL query get column value similar to given

Sorry if my question seems unclear, I'll try to explain.
I have a column in a row, for example /1/3/5/8/42/239/, and I would like to find a similar one that shares as many "ids" as possible.
Example:
| My Column |
#1 | /1/3/7/2/4/ |
#2 | /1/5/7/2/4/ |
#3 | /1/3/6/8/4/ |
Now, by running the query on #1 I would like to get row #2 as it's the most similar. Is there any way to do it or it's just my fantasy? Thanks for your time.
EDIT:
As suggested, I'm expanding my question. This column represents the favourite artists of a user on a music site. I search them with MyColumn LIKE '%/ID/%' and remove an artist by replacing /ID/ with /.
Since you did not provide much info about your data, I have to fill the gaps with my guesses.
So you have a users table
users table
-----------
id
name
other_stuff
And you like to store which artists are favorites of a user. So you must have an artists table
artists table
-------------
id
name
other_stuff
And to relate you can add another table called favorites
favorites table
---------------
user_id
artist_id
In that table you add a record for every artist that a user likes.
Example data
users
id | name
1 | tom
2 | john
artists
id | name
1 | michael jackson
2 | madonna
3 | deep purple
favorites
user_id | artist_id
1 | 1
1 | 3
2 | 2
To select the favorites of user tom for instance you can do
select a.name
from artists a
join favorites f on f.artist_id = a.id
join users u on f.user_id = u.id
where u.name = 'tom'
And if you add proper indexing to your table then this is really fast!
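With a favorites table like this, the original "most similar row" question becomes a self-join that counts shared artists. The sketch below is one possible way to do it, assuming in-memory SQLite in place of MySQL and loading the three example rows from the question as users 1, 2 and 3; the aliases mine and other are illustrative.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL. The three lists are the
# question's rows #1, #2 and #3, stored one (user, artist) pair per row.
favorite_lists = {
    1: [1, 3, 7, 2, 4],
    2: [1, 5, 7, 2, 4],
    3: [1, 3, 6, 8, 4],
}

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE favorites (
        user_id INTEGER,
        artist_id INTEGER,
        PRIMARY KEY (user_id, artist_id)
    )
""")
conn.executemany(
    "INSERT INTO favorites VALUES (?, ?)",
    [(user, artist) for user, artists in favorite_lists.items() for artist in artists],
)

# Pair each of the given user's favorites with the same artist in other
# users' favorites, then count the pairs per other user.
most_similar = conn.execute("""
    SELECT other.user_id, COUNT(*) AS shared
    FROM favorites AS mine
    JOIN favorites AS other
        ON other.artist_id = mine.artist_id
       AND other.user_id != mine.user_id
    WHERE mine.user_id = ?
    GROUP BY other.user_id
    ORDER BY shared DESC, other.user_id
    LIMIT 1
""", (1,)).fetchone()

print(most_similar)  # (2, 4): user 2 shares artists 1, 7, 2 and 4 with user 1
```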
Problem is you're storing this in a really, really awkward way.
I'm guessing you have to deal with an arbitrary number of values. You have two options:
1. Store the multiple IDs in a blob in JSON format. Older MySQL versions don't have JSON functions built in (native JSON support arrived in MySQL 5.7.8), but there are user-defined functions that will extract values for you, etc.
See: http://blog.ulf-wendel.de/2013/mysql-5-7-sql-functions-for-json-udf/
Alternatively, switch to PostgreSQL.
2. Add as many columns to your table as the maximum number of IDs you expect to have. So if /1/3/7/2/4/8/ is the longest entry, have 6 columns in your table. The reason this is bad: you'll have sparse columns that unnecessarily slow your tables.
I'm sure you could write some horrific regex to accomplish the task, but I caution on using complex regex's on enormous tables.

Database design and query optimization/general efficiency when joining 6 tables in mySQL

I have 6 tables. These are simplified for this example.
user_items
ID | user_id | item_name | version
-------------------------------------
1 | 123 | test | 1
data
ID | name | version | info
----------------------------
1 | test | 1 | info
data_emails
ID | name | version | email_id
------------------------
1 | test | 1 | 1
2 | test | 1 | 2
emails
ID | email
-------------------
1 | email@address.com
2 | second@email.com
data_ips
ID | name | version | ip_id
----------------------------
1 | test | 1 | 1
2 | test | 1 | 2
ips
ID | ip
--------
1 | 1.2.3.4
2 | 2.3.4.5
What I am looking to achieve is the following.
The user (123) has the item with name 'test'. This is the basic information we need for a given entry.
There is data in our 'data' table and the current version is 1; as such, the version in our user_items table is also 1. The two tables are linked together by the name and version. The setup is like this because a user could have an item for which we don't have data; likewise there could be an item for which we have data but that no user owns.
For each item there are also 0 or more emails and ips associated. These can be the same for many items so rather than duplicate the actual email varchar over and over we have the data_emails and data_ips tables which link to the emails and ips table respectively based on the email_id/ip_id and the respective ID columns.
The emails and ips are associated with the data version again through the item name and version number.
My first query is: is this a good/well optimized database setup?
My next query and my main question is joining this complex data structure.
What i had was:
PHP
- get all the user items
- loop through them and get the most recent data entry (if any)
- if there is one get the respective emails
- get the respective ips
Does that count as 3 queries or essentially infinite depending on the number of user items?
I was made to believe that the above was inefficient and as such I wanted to condense my setup into using one query to get the same data.
I have achieved that with the following code
SELECT user_items.name,GROUP_CONCAT( emails.email SEPARATOR ',' ) as emails, x.ip
FROM user_items
JOIN data AS data ON (data.name = user_items.name AND data.version = user_items.version)
LEFT JOIN data_emails AS data_emails ON (data_emails.name = user_items.name AND data_emails.version = user_items.version)
LEFT JOIN emails AS emails ON (data_emails.email_id = emails.ID)
LEFT JOIN
(SELECT name,version,GROUP_CONCAT( the_ips.ip SEPARATOR ',' ) as ip FROM data_ips
LEFT JOIN ips as the_ips ON data_ips.ip_id = the_ips.ID )
x ON (x.name = data.name AND x.version = user_items.version)
I have done loads of reading to get to this point and worked tirelessly to get here.
This works as I require - this question seeks to clarify what are the benefits of using this instead?
I have had to use a subquery (I believe?) to get the ips as previously it was multiplying results (I believe based on the complex joins). How this subquery works I suppose is my main confusion.
Summary of questions.
-Is my database setup well setup for my usage? Any improvements would be appreciated. And any useful resources to help me expand my knowledge would be great.
-How does the subquery in my sql actually work - what is the query doing?
-Am I correct to keep using left joins - I want to return the user item, and null values if applicable to the right.
-Am I essentially replacing a potentially infinite number of queries with 2? Does this make a REAL difference? Can the above be improved?
-Given that when I update a version of an item in my data table I now have to update the version in the user_items table, I now have a few more update queries to do. Is the tradeoff of this setup in practice worthwhile?
Thanks to anyone who contributes to helping me get a better grasp of this !!
Given your data layout, and your objective, the query is correct. If you've only got a small amount of data it shouldn't be a performance problem - that will change quickly as the amount of data grows. However, when you have a large amount of data there are very few circumstances where you should ever see all your data in one go, which implies that the results will be filtered in some way. Exactly how they are filtered has a huge impact on the structure of the query.
How does the subquery in my sql actually work
Currently it doesn't work properly - there is no GROUP BY
Is the tradeoff off of this setup in practice worthwhile?
No - it implies that your schema is too normalized.
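On the missing GROUP BY: one way to avoid the row multiplication is to aggregate emails and ips in two separate derived tables, each grouped by (name, version), and join those to the item. The sketch below assumes in-memory SQLite in place of MySQL, uses item_name as in the table listing, and adds a hypothetical second item 'other' plus placeholder email addresses to show that each item gets its own lists.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL. The 'other' item and the
# email addresses are illustrative; the layout follows the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_items (ID INTEGER PRIMARY KEY, user_id INTEGER, item_name TEXT, version INTEGER);
    CREATE TABLE data (ID INTEGER PRIMARY KEY, name TEXT, version INTEGER, info TEXT);
    CREATE TABLE emails (ID INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE data_emails (ID INTEGER PRIMARY KEY, name TEXT, version INTEGER, email_id INTEGER);
    CREATE TABLE ips (ID INTEGER PRIMARY KEY, ip TEXT);
    CREATE TABLE data_ips (ID INTEGER PRIMARY KEY, name TEXT, version INTEGER, ip_id INTEGER);
    INSERT INTO user_items VALUES (1, 123, 'test', 1), (2, 123, 'other', 1);
    INSERT INTO data VALUES (1, 'test', 1, 'info'), (2, 'other', 1, 'info');
    INSERT INTO emails VALUES (1, 'first@example.com'), (2, 'second@example.com');
    INSERT INTO data_emails VALUES (1, 'test', 1, 1), (2, 'test', 1, 2), (3, 'other', 1, 1);
    INSERT INTO ips VALUES (1, '1.2.3.4'), (2, '2.3.4.5');
    INSERT INTO data_ips VALUES (1, 'test', 1, 1), (2, 'test', 1, 2);
""")

# Each derived table collapses to one row per (name, version) before joining,
# so emails and ips cannot multiply each other.
rows = conn.execute("""
    SELECT ui.item_name, e.emails, i.ips
    FROM user_items AS ui
    JOIN data AS d ON d.name = ui.item_name AND d.version = ui.version
    LEFT JOIN (
        SELECT de.name, de.version, GROUP_CONCAT(em.email) AS emails
        FROM data_emails AS de
        JOIN emails AS em ON em.ID = de.email_id
        GROUP BY de.name, de.version
    ) AS e ON e.name = d.name AND e.version = d.version
    LEFT JOIN (
        SELECT di.name, di.version, GROUP_CONCAT(ip.ip) AS ips
        FROM data_ips AS di
        JOIN ips AS ip ON ip.ID = di.ip_id
        GROUP BY di.name, di.version
    ) AS i ON i.name = d.name AND i.version = d.version
    WHERE ui.user_id = ?
""", (123,)).fetchall()

def split_sorted(value):
    """Turn a GROUP_CONCAT string (or NULL) into a sorted list."""
    return sorted(value.split(",")) if value is not None else []

items = {name: (split_sorted(emails), split_sorted(ips)) for name, emails, ips in rows}
print(items)
```

Without the GROUP BY in a derived table, the aggregate collapses every row into a single group, so all items would end up sharing one combined list.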

Grabbing child-nodes of parent using the adjacency model in MySQL

I'm having difficulty with the current model for my MySQL Table, namely that I can't seem to properly query all the child nodes of a specific parent. As the title states, I'm using the adjacency model.
The problem is that most methods I have found online either query all leaf nodes, or select more than just what I'm attempting to grab.
The first tutorial I was following was on MikeHillyer.com, and his solution was to the effect of:
SELECT t1.name FROM
category AS t1 LEFT JOIN category as t2
ON t1.category_id = t2.parent
WHERE t2.category_id IS NULL;
The problem with this is that it queries all of the leaf nodes, and not just the ones related to the parents. He also suggested using the Nested Set Model, which I REALLY don't want to have to use because inserting new nodes is a little more difficult (I do realize its significance though, I'd just rather not have to resort to it).
The next solution I found was on a shared slideshow on slide 53 (found from another answer here on StackOverflow). This solution is supposed to query a node's immediate children, only... The solution does not seem to be working for me.
Here's their solution:
SELECT * FROM Comments cl
LEFT JOIN Comments c2
ON(c2.parent_id = cl.comment_id);
Now, my table is a little different, and so I adjusted some of the code for it. A brief excerpt of my table is the following:
Table: category
id | parent | section | title | ...
----+--------+----------+----------+-----
1 | NULL | home | Home | ...
2 | NULL | software | Software | ...
3 | 2 | software | Desktop | ...
4 | 2 | software | Mobile | ...
5 | NULL | about | About | ...
6 | 5 | about | Legal | ...
... | ... | ... | ... | ...
When I modified the above query, I did the following:
SELECT * FROM category cat1
LEFT JOIN category cat2
ON(category.parent = cl.id);
This resulted in EVERYTHING being queried and tied in a table twice as long as the unaltered table (obviously not what I'm looking for)
I'm pretty certain I'm just doing something wrong with my query, and so I'm just hoping someone here can correct whatever my mistake is and point me in the right direction.
I know it's supposed to be easier to use a Nested Set Model, but I just dislike that option for the difficulty of adding new options.
Looks like you're very close. Your left join is guaranteeing that all records from the table will be returned.
See the below query.
SELECT c1.*, c2.id FROM category c1 INNER JOIN category c2 ON (c1.parent = c2.id);
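To narrow the self-join to the children of one specific parent, a WHERE clause on the parent side is enough. The sketch below assumes in-memory SQLite in place of MySQL, loads the excerpt of the category table from the question, and uses child and parent as illustrative aliases. It returns immediate children only; walking deeper levels would need repeated joins or a recursive query (MySQL 8.0 and later support WITH RECURSIVE).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are the question's excerpt.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, parent INTEGER, section TEXT, title TEXT);
    INSERT INTO category VALUES
        (1, NULL, 'home', 'Home'),
        (2, NULL, 'software', 'Software'),
        (3, 2, 'software', 'Desktop'),
        (4, 2, 'software', 'Mobile'),
        (5, NULL, 'about', 'About'),
        (6, 5, 'about', 'Legal');
""")

def immediate_children(parent_title):
    """Return (id, title) for rows whose parent has the given title."""
    return conn.execute("""
        SELECT child.id, child.title
        FROM category AS child
        INNER JOIN category AS parent ON child.parent = parent.id
        WHERE parent.title = ?
        ORDER BY child.id
    """, (parent_title,)).fetchall()

software_children = immediate_children("Software")
print(software_children)  # [(3, 'Desktop'), (4, 'Mobile')]
```

The INNER JOIN drops the top-level rows (their parent is NULL), and the WHERE clause then keeps only the rows under the requested parent.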