MySQL select unique values from a relationship table - mysql

I am currently working on a MySQL project which has a few fairly standard many-to-many relationships.
I've shown a greatly simplified relationship table to aid my example.
table: book_authors
ID BOOKS AUTHORS
1 1 4
2 1 5
3 4 4
4 4 5
5 2 6
6 2 1
7 2 5
8 3 6
9 3 5
10 3 1
12 5 2
13 6 2
14 7 5
What I'm looking to achieve is to be able to select all books which have specified authors, and only get the books which match all of the supplied authors. The number of authors requested each time is also variable.
So if I'm looking for all books written by author 4 and author 5, I'll get books 1 and 4 only. If I'm looking for books written by author 5 only, I'll get book 7 only.

I don't think there's a very smooth way to do this, but if performance isn't a big concern, you can exploit MySQL's GROUP_CONCAT function, and write this:
SELECT books
FROM book_authors
WHERE books IN
( SELECT books
FROM book_authors
WHERE authors = 4
)
GROUP
BY books
HAVING GROUP_CONCAT(authors ORDER BY authors) = '4,5'
;
(The WHERE clause isn't even needed for correct results, but it makes the query less gratuitously expensive.)
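A different technique, counting-based relational division (not the GROUP_CONCAT approach above), handles a variable number of authors without building a '4,5' string: a book qualifies if it has exactly N distinct authors and all N requested authors are among them. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for MySQL; books_with_exact_authors is a hypothetical helper name, and the SQL uses only standard constructs:

```python
import sqlite3

# Sample rows from book_authors: (ID, BOOKS, AUTHORS).
ROWS = [
    (1, 1, 4), (2, 1, 5), (3, 4, 4), (4, 4, 5), (5, 2, 6), (6, 2, 1),
    (7, 2, 5), (8, 3, 6), (9, 3, 5), (10, 3, 1), (12, 5, 2), (13, 6, 2),
    (14, 7, 5),
]


def books_with_exact_authors(conn, author_ids):
    """Return books whose set of authors is exactly author_ids (hypothetical helper)."""
    wanted = sorted(set(author_ids))
    if not wanted:
        return []
    placeholders = ",".join("?" * len(wanted))
    sql = f"""
        SELECT books
        FROM book_authors
        GROUP BY books
        -- the book has exactly N distinct authors ...
        HAVING COUNT(DISTINCT authors) = ?
        -- ... and all N requested authors are among them
           AND COUNT(DISTINCT CASE WHEN authors IN ({placeholders})
                                   THEN authors END) = ?
        ORDER BY books
    """
    params = [len(wanted), *wanted, len(wanted)]
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book_authors (id INTEGER PRIMARY KEY, books INTEGER, authors INTEGER)")
conn.executemany("INSERT INTO book_authors VALUES (?, ?, ?)", ROWS)

print(books_with_exact_authors(conn, [4, 5]))  # [1, 4]
print(books_with_exact_authors(conn, [5]))     # [7]
```

Because N is passed as a parameter, the same query works for any number of authors; with a MySQL driver you would adapt the placeholder style to that driver.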

If it's a one-off, you could do the following (not very repeatable, though):
select distinct
ba1.books
from
book_authors ba1
join
book_authors ba2
on ba1.id<>ba2.id and ba1.books=ba2.books
where
ba1.authors=5
and ba2.authors=4
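Note that this self-join returns every book that has at least authors 4 and 5, not only books written by exactly those two, and each additional author needs another self-join. A minimal sketch, using Python's sqlite3 module as a stand-in for MySQL, runs the query on the sample data and again with one extra hypothetical book 8 (authors 4, 5 and 6; rows 15 to 17 are invented for illustration):

```python
import sqlite3

# Sample rows from book_authors: (ID, BOOKS, AUTHORS).
ROWS = [
    (1, 1, 4), (2, 1, 5), (3, 4, 4), (4, 4, 5), (5, 2, 6), (6, 2, 1),
    (7, 2, 5), (8, 3, 6), (9, 3, 5), (10, 3, 1), (12, 5, 2), (13, 6, 2),
    (14, 7, 5),
]
# Hypothetical extra book 8 written by authors 4, 5 AND 6 (not in the original data).
EXTRA = [(15, 8, 4), (16, 8, 5), (17, 8, 6)]

SELF_JOIN = """
    SELECT DISTINCT ba1.books
    FROM book_authors ba1
    JOIN book_authors ba2
      ON ba1.id <> ba2.id AND ba1.books = ba2.books
    WHERE ba1.authors = 5
      AND ba2.authors = 4
    ORDER BY ba1.books
"""


def run(rows):
    """Load rows into a fresh in-memory table and run the self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book_authors (id INTEGER PRIMARY KEY, books INTEGER, authors INTEGER)")
    conn.executemany("INSERT INTO book_authors VALUES (?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(SELF_JOIN)]
    conn.close()
    return result


original_result = run(ROWS)          # books 1 and 4
extended_result = run(ROWS + EXTRA)  # book 8 is also returned
print(original_result, extended_result)
```

On the sample data this gives the expected books 1 and 4, but the hypothetical book 8 is returned too, so if you need an exact author match, add a check on the total number of authors per book.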

Related

Limit selected results by unique selected IDs when using left joins

I have a table users and some other tables like images and products
Table users:
user_id user_name
1 andrew
2 lutz
3 sophie
4 michael
5 peter
6 oscor
7 anton
8 billy
9 henry
10 jon
Table images:
user_id img_type img_url
1 0 url1
1 1 url4
2 0 url5
7 0 url7
8 0 url8
9 1 url9
Table products:
user_id prod_id
1 5
1 55
2 555
8 5555
9 5
9 55
I use this kind of SELECT:
SELECT * FROM
(SELECT users.user_id, users.user_name, img.img_type, prod.prod_id FROM
users
LEFT JOIN images img ON img.user_id = users.user_id
LEFT JOIN products prod ON prod.user_id = users.user_id
WHERE users.user_id <= 5) AS users
ORDER BY users.user_id ASC
The result should be the following output. For performance reasons, I use ORDER BY and an inner select. If I put a LIMIT 5 in the inner or outer select, it doesn't work: MySQL limits the result to 5 rows. However, for pagination I need the LIMIT to apply to 5 distinct user_id values, which in this case would yield 9 rows.
Could I maybe use an if-statement to push each found user_id into an array and stop the select once the array contains 5 UIDs? Or can I modify the select somehow?
user_id user_name img_type prod_id
1 andrew 0 5
1 andrew 1 5
1 andrew 0 55
1 andrew 1 55
2 lutz 0 5
2 lutz 0 55
3 sophie null null
4 michael null null
5 peter null null
results: 9
LIMIT 5 and user_id <= 5 do not necessarily give you the same results. One reason: There are multiple rows (after the JOINs) for user_id = 1. This is because there can be multiple images and/or multiple products for a given 'user'.
So, first decide which you want.
LIMIT without ORDER BY gives you an arbitrary set of rows. (Yeah, it is somewhat predictable, but you should not depend on it.)
ORDER BY + LIMIT usually implies gathering all the potentially relevant rows, sorting them, then doing the "limit". There are sometimes ways around this sluggishness.
LEFT leads to the NULLs you got; did you want that?
What do you want pagination to do if you are displaying 5 items per page, but user 1 has 6 images? You need to think about this edge case before we can help you with a solution. Maybe you want all of user 1 on a page, even if it exceeds 5? Maybe you want to break in the middle of '1'; but then we need an unambiguous way to know where to continue from for the next page.
Probably any viable solution will not use nested SELECTs. As you are finding out, it leads to "errors". Think of it this way: First find all the rows you need to display on all the pages, then carve out 5 for the current page.
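Assuming you want all of a user's rows on the same page (the first option above), one way to apply that idea without nesting is two separate queries: page over users first, then fetch the joined rows for exactly those users. A minimal sketch with Python's sqlite3 module standing in for MySQL and the sample tables from the question; fetch_page and per_page are hypothetical names, and LIMIT/OFFSET is used only for brevity (the pagination article linked below explains why remembering where you left off scales better):

```python
import sqlite3

USERS = [(1, "andrew"), (2, "lutz"), (3, "sophie"), (4, "michael"), (5, "peter"),
         (6, "oscor"), (7, "anton"), (8, "billy"), (9, "henry"), (10, "jon")]
IMAGES = [(1, 0, "url1"), (1, 1, "url4"), (2, 0, "url5"),
          (7, 0, "url7"), (8, 0, "url8"), (9, 1, "url9")]
PRODUCTS = [(1, 5), (1, 55), (2, 555), (8, 5555), (9, 5), (9, 55)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE images (user_id INTEGER, img_type INTEGER, img_url TEXT);
    CREATE TABLE products (user_id INTEGER, prod_id INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", USERS)
conn.executemany("INSERT INTO images VALUES (?, ?, ?)", IMAGES)
conn.executemany("INSERT INTO products VALUES (?, ?)", PRODUCTS)


def fetch_page(conn, page, per_page=5):
    # Step 1: pick this page's user_ids; the LIMIT counts users, not joined rows.
    ids = [row[0] for row in conn.execute(
        "SELECT user_id FROM users ORDER BY user_id LIMIT ? OFFSET ?",
        (per_page, page * per_page))]
    if not ids:
        return []
    # Step 2: fetch every joined row for exactly those users.
    placeholders = ",".join("?" * len(ids))
    sql = f"""
        SELECT u.user_id, u.user_name, img.img_type, prod.prod_id
        FROM users u
        LEFT JOIN images img ON img.user_id = u.user_id
        LEFT JOIN products prod ON prod.user_id = u.user_id
        WHERE u.user_id IN ({placeholders})
        ORDER BY u.user_id, img.img_type, prod.prod_id
    """
    return conn.execute(sql, ids).fetchall()


first_page = fetch_page(conn, 0)
for row in first_page:
    print(row)
```

Each page always covers 5 users, however many image and product rows each of them produces.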
Here are some more musings on pagination: http://mysql.rjweb.org/doc.php/pagination

MySQL custom sort by subquery impacts performance

I have one table from which I return search results and display them in a specific order. This example is a simplified version of my DB structure: http://www.java2s.com/Code/SQL/Select-Clause/Orderbyvaluefromsubquery.htm
And here is my current code, which works but hurts performance heavily because of the subquery:
SELECT * FROM `table` AS p1
WHERE CONCAT(title,artist,creator,version) LIKE '%searchInput%'
ORDER BY
(SELECT
MAX(`rating`) FROM `table` AS p2 WHERE p1.setId=p2.setId
) DESC
The above code searches, then sorts the results by the highest rating in each set so that all rows from the same set are kept together, for example:
id setId rating title,artist,etc...
1 1 5
2 1 5
3 2 7
4 1 6
5 2 1
6 3 3
would sort to:
id setId rating title,artist,etc...
3 2 7
5 2 1
4 1 6
1 1 5
2 1 5
6 3 3
Currently it takes around 8.5 sec to query 1000 rows and over half a minute for a larger number of rows. Is there any way to improve the performance, or would it be better to fetch all the results and sort them in PHP memory?
Help is much appreciated
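You can probably speed things up a bit by separating the LIKEs (this is not strictly equivalent, since the CONCAT version can also match text that spans two adjacent columns):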
SELECT p1.* FROM `table` AS p1
WHERE (title LIKE '%searchInput%')
OR (artist LIKE '%searchInput%')
OR (creator LIKE '%searchInput%')
OR (version LIKE '%searchInput%')
ORDER BY
(SELECT MAX(`rating`) FROM `table` AS p2 WHERE p1.setId=p2.setId) DESC
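You could also try adding an index:
CREATE INDEX tbl_ndx ON `table` (setId, rating);
which should let the MAX(rating) lookup for each setId be answered from the index instead of by scanning the table.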
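A correlated subquery in ORDER BY may be re-evaluated for every matching row. A common alternative (a swapped-in technique, not taken from the answer above) computes each set's maximum once in a derived table and joins it back; the extra ORDER BY keys make the order inside a set, and between sets that tie on their maximum, deterministic, matching the sample output. Python's sqlite3 module stands in for MySQL here (SQLite accepts MySQL-style backticks); the titles and the search pattern are hypothetical placeholders:

```python
import sqlite3

# (id, setId, rating, title): ids, setIds and ratings from the example above;
# the titles are hypothetical so the LIKE filter has something to match.
ROWS = [(1, 1, 5, "song a"), (2, 1, 5, "song b"), (3, 2, 7, "song c"),
        (4, 1, 6, "song d"), (5, 2, 1, "song e"), (6, 3, 3, "song f")]

SQL = """
    SELECT p1.id, p1.setId, p1.rating
    FROM `table` AS p1
    JOIN (SELECT setId, MAX(rating) AS setMax   -- one aggregate per set
          FROM `table`
          GROUP BY setId) AS m
      ON m.setId = p1.setId
    WHERE p1.title LIKE ?
    ORDER BY m.setMax DESC,    -- best set first
             p1.setId,         -- keep sets together if two sets tie on setMax
             p1.rating DESC,   -- best row first within a set
             p1.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `table` (id INTEGER PRIMARY KEY, setId INTEGER, rating INTEGER, title TEXT)")
conn.executemany("INSERT INTO `table` VALUES (?, ?, ?, ?)", ROWS)
conn.execute("CREATE INDEX tbl_ndx ON `table` (setId, rating)")

ordered_ids = [row[0] for row in conn.execute(SQL, ("%song%",))]
print(ordered_ids)  # [3, 5, 4, 1, 2, 6]
```

Like the original subquery, the derived table takes the maximum over all rows of a set, not just the rows that match the search. A LIKE pattern with a leading % still cannot use an ordinary B-tree index, so the filter itself remains a scan.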

Mysql query contains

Table
id name (varchar)
2 15
3 15,23
4 1315,424
5 1512,2323
6 23,15,345
7 253,234,15
I need to find the rows whose list contains 15, which means I need 2, 3, 6, 7, not 4, 5.
The above is sample data; in the real data it can be any number.
Can anyone please help me?
If your database is small, consider using the FIND_IN_SET function:
select * from your_table
where find_in_set('15',name);
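FIND_IN_SET matches whole comma-separated elements, which is why 1315 and 1512 do not show up as false positives the way they would with LIKE '%15%'. The Python function below is only a model of the documented MySQL behaviour (1-based position, 0 when the value is absent or the list is empty, NULL/None for NULL input, no trimming of spaces), not MySQL itself; find_in_set here is a hypothetical helper:

```python
def find_in_set(needle, haystack):
    """Model of MySQL FIND_IN_SET(str, strlist): 1-based index of needle, 0 if absent."""
    if needle is None or haystack is None:
        return None  # MySQL returns NULL if either argument is NULL
    if haystack == "":
        return 0  # an empty list never contains anything
    items = haystack.split(",")  # no trimming: "23, 15" does not contain "15"
    return items.index(needle) + 1 if needle in items else 0


# Sample rows from the question: (id, name).
ROWS = [(2, "15"), (3, "15,23"), (4, "1315,424"), (5, "1512,2323"),
        (6, "23,15,345"), (7, "253,234,15")]

# Equivalent of: SELECT * FROM your_table WHERE FIND_IN_SET('15', name);
matching_ids = [row_id for row_id, name in ROWS if find_in_set("15", name)]
print(matching_ids)  # [2, 3, 6, 7]
```

Because the function has to be evaluated against every row's string, an index on name cannot help, which is why this only suits small tables.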
If you have a big table, consider changing the model to a master-detail pair of tables to increase speed.
This is the kind of relational model you could adopt to make this an easy problem to solve:
TABLE: records
id
2
3
4
5
6
7
TABLE: values
record_id value
2 15
3 15
3 23
4 1315
4 424
5 1512
5 2323
6 23
6 15
6 345
7 253
7 234
7 15
Then you can query:
SELECT DISTINCT id FROM records
INNER JOIN `values` ON records.id = `values`.record_id AND `values`.value = 15
This is the only way you can take good advantage of MySQL's query optimizer.
Not that it's impossible to do what you're trying to do, but it kind of misses the point.
If you're already storing data in this format, you should write a one-time migration to transfer it to this "normalized" format in the programming language of your choice, using something like Java's split or PHP's explode.
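A minimal sketch of such a one-time migration, using Python's str.split in place of Java's split or PHP's explode, and an in-memory SQLite database as a stand-in for MySQL. Because values is a reserved word, the table name has to be quoted (double quotes in SQLite, backticks in MySQL); the index on (value, record_id) is an assumption about what the lookup needs:

```python
import sqlite3

# Legacy rows from the question: (id, comma-separated name).
LEGACY = [(2, "15"), (3, "15,23"), (4, "1315,424"), (5, "1512,2323"),
          (6, "23,15,345"), (7, "253,234,15")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY);
    CREATE TABLE "values" (
        record_id INTEGER NOT NULL REFERENCES records(id),
        value     INTEGER NOT NULL
    );
    CREATE INDEX values_value_idx ON "values" (value, record_id);
""")

# One-time migration: split each comma-separated string into detail rows.
for record_id, csv_text in LEGACY:
    conn.execute("INSERT INTO records (id) VALUES (?)", (record_id,))
    conn.executemany(
        'INSERT INTO "values" (record_id, value) VALUES (?, ?)',
        [(record_id, int(part)) for part in csv_text.split(",") if part.strip()])

matching_ids = [row[0] for row in conn.execute(
    'SELECT DISTINCT r.id FROM records r '
    'INNER JOIN "values" v ON r.id = v.record_id AND v.value = ? '
    'ORDER BY r.id', (15,))]
print(matching_ids)  # [2, 3, 6, 7]
```

Storing value as an integer means 15 can never accidentally match 1315 or 1512, and the lookup becomes an ordinary indexed equality test.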

Find all the leaf nodes below a subtree in a Tree structure in sql server

I have a tree structure, and an associated assignment table, for customer categories in a SQL Server database.
CustomerCategory (CategoryID, ParentId)
CustomerInCategory(CustomerID, CategoryID)
If a CustomerCategory has any customer assigned to it, we can't add another subcategory to it. So customers can only be added to the lowest level of each subtree. In other words, the result of this query
SELECT * FROM CustomerCategory WHERE CategoryId NOT IN
(SELECT DISTINCT ParentId FROM CustomerCategory WHERE ParentId IS NOT NULL)
would yield the leaf nodes. The other thing is that this tree might have subtrees of different depths, and we don't want to bound the number of levels in any way; however, our users won't need more than 10 levels. Consider this as an illustration:
CategoryID ParentID Name
1 NULL All Customers
2 1 Domestic
3 1 International
4 2 Independent Retailers
5 2 Chain Retailers
6 2 Whole Sellers
7 5 A-Mart
8 5 B-Mart
9 4 Grocery Stores
10 4 Restaurants
11 4 Cafes
CustomerID CustomerName Category
1 Int.Customer#1 3
2 Int.Customer#2 3
3 A-Mart.Branch#1 7
4 A-Mart.Branch#2 7
5 B-Mart.Branch#1 8
6 B-Mart.Branch#2 8
7 Grocery#1 9
8 Grocery#2 9
9 Grocery#3 9
10 Restaurant#1 10
11 Restaurant#2 10
12 Cafe#1 11
13 Wholeseller#1 6
14 Wholeseller#2 6
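The leaf-node query from the question can be checked against the category table above. The WHERE ParentId IS NOT NULL filter matters: the root's NULL ParentId would otherwise make NOT IN unable to evaluate to TRUE for any row, so the query would return nothing. A minimal sketch with Python's sqlite3 module standing in for SQL Server (the NULL behaviour of NOT IN is standard SQL, which SQL Server follows with its default ANSI_NULLS ON):

```python
import sqlite3

# Sample categories from the question: (CategoryID, ParentID, Name).
CATEGORIES = [
    (1, None, "All Customers"), (2, 1, "Domestic"), (3, 1, "International"),
    (4, 2, "Independent Retailers"), (5, 2, "Chain Retailers"),
    (6, 2, "Whole Sellers"), (7, 5, "A-Mart"), (8, 5, "B-Mart"),
    (9, 4, "Grocery Stores"), (10, 4, "Restaurants"), (11, 4, "Cafes"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerCategory (CategoryID INTEGER PRIMARY KEY, ParentID INTEGER, Name TEXT)")
conn.executemany("INSERT INTO CustomerCategory VALUES (?, ?, ?)", CATEGORIES)

# The question's leaf-node query.
LEAVES = """
    SELECT CategoryID FROM CustomerCategory
    WHERE CategoryID NOT IN
        (SELECT DISTINCT ParentID FROM CustomerCategory WHERE ParentID IS NOT NULL)
    ORDER BY CategoryID
"""
# Same query without the IS NOT NULL filter: the NULL in the subquery result
# means NOT IN can never be TRUE, so no rows come back.
LEAVES_WITHOUT_FILTER = """
    SELECT CategoryID FROM CustomerCategory
    WHERE CategoryID NOT IN (SELECT DISTINCT ParentID FROM CustomerCategory)
"""

leaf_ids = [row[0] for row in conn.execute(LEAVES)]
unfiltered_ids = [row[0] for row in conn.execute(LEAVES_WITHOUT_FILTER)]
print(leaf_ids, unfiltered_ids)  # [3, 6, 7, 8, 9, 10, 11] []
```

Every category that has customers in the sample data (3, 6, 7, 8, 9, 10, 11) is indeed a leaf.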
My requirement is something like this, "Given a node in Categories, Return All the Customers attached to any node below it".
How can I do it with sql?
Obviously this can be done with a recursive call in the code, but how can we do it in T-SQL (without calling a stored procedure several times or using text-based search)?
Can anybody use a CTE to solve this problem?
I have a result set something like this in mind:
CustomerID CustomerName CategoryId CategoryName
12 Cafe#1 11 Cafes
12 Cafe#1 4 Independent Retailers
12 Cafe#1 2 Domestic
12 Cafe#1 1 All Customers
.
.
.
4 A-Mart.Branch#2 7 A-Mart
4 A-Mart.Branch#2 5 Chain Retailers
4 A-Mart.Branch#2 2 Domestic
4 A-Mart.Branch#2 1 All Customers
.
.
.
14 Wholeseller#2 6 Whole Sellers
14 Wholeseller#2 2 Domestic
14 Wholeseller#2 1 All Customers
This is not necessarily a good way to lay out a result: it would consume a lot of space that might not be required, yet a search in such a result set would be very fast. If I want to find all the customers below, say, CategoryId = 2, I would simply query:
SELECT * FROM resultset WHERE CategoryId = 2
Any suggestions to improve the data model are very welcome, if they help solve this problem.
Once again, I'm not fixated on this result set. Any other suggestion that solves the problem,
"Given a node in Categories, Return All the Customers attached to any node below it", is welcome.
You can use a CTE to recursively build a table containing all the ancestor-descendant relationships and use the WHERE clause to get only the subtree you need (in my example, everything under CategoryId 5):
WITH CategorySubTree AS (
SELECT cc.CategoryId as SubTreeRoot,
cc.CategoryId
FROM CustomerCategory cc
UNION ALL
SELECT cst.SubTreeRoot, cc.CategoryId
FROM CustomerCategory cc
INNER JOIN CategorySubTree cst ON cst.CategoryId = cc.parentId
)
SELECT cst.CategoryId
FROM CategorySubTree cst
WHERE cst.SubTreeRoot = 5
You can modify this query to add whatever you need, for example, to get customers linked to the category nodes in the subtree:
WITH CategorySubTree AS (
SELECT cc.CategoryId as SubTreeRoot,
cc.CategoryId
FROM CustomerCategory cc
UNION ALL
SELECT cst.SubTreeRoot, cc.CategoryId
FROM CustomerCategory cc
INNER JOIN CategorySubTree cst ON cst.CategoryId = cc.parentId
)
SELECT cst.CategoryId,cic.CustomerId
FROM CategorySubTree cst
INNER JOIN CustomerInCategory cic ON cic.CategoryId = cst.CategoryId
WHERE cst.SubTreeRoot = 5
And of course you can join further tables to get labels and other needed information.
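A minimal sketch that runs the same CTE shape on the question's sample data, with Python's sqlite3 module standing in for SQL Server. SQLite requires the RECURSIVE keyword, which T-SQL does not use; apart from that the query mirrors the answer, and customers_under is a hypothetical helper name:

```python
import sqlite3

# Sample data from the question: (CategoryID, ParentID, Name).
CATEGORIES = [
    (1, None, "All Customers"), (2, 1, "Domestic"), (3, 1, "International"),
    (4, 2, "Independent Retailers"), (5, 2, "Chain Retailers"),
    (6, 2, "Whole Sellers"), (7, 5, "A-Mart"), (8, 5, "B-Mart"),
    (9, 4, "Grocery Stores"), (10, 4, "Restaurants"), (11, 4, "Cafes"),
]
# (CustomerID, CategoryID) assignments from the question.
ASSIGNMENTS = [
    (1, 3), (2, 3), (3, 7), (4, 7), (5, 8), (6, 8), (7, 9),
    (8, 9), (9, 9), (10, 10), (11, 10), (12, 11), (13, 6), (14, 6),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerCategory (CategoryID INTEGER PRIMARY KEY, ParentID INTEGER, Name TEXT);
    CREATE TABLE CustomerInCategory (CustomerID INTEGER, CategoryID INTEGER);
""")
conn.executemany("INSERT INTO CustomerCategory VALUES (?, ?, ?)", CATEGORIES)
conn.executemany("INSERT INTO CustomerInCategory VALUES (?, ?)", ASSIGNMENTS)

SUBTREE_CUSTOMERS = """
    WITH RECURSIVE CategorySubTree(SubTreeRoot, CategoryID) AS (
        SELECT cc.CategoryID, cc.CategoryID      -- every node roots its own subtree
        FROM CustomerCategory cc
        UNION ALL
        SELECT cst.SubTreeRoot, cc.CategoryID    -- walk one level down
        FROM CustomerCategory cc
        INNER JOIN CategorySubTree cst ON cst.CategoryID = cc.ParentID
    )
    SELECT cic.CustomerID
    FROM CategorySubTree cst
    INNER JOIN CustomerInCategory cic ON cic.CategoryID = cst.CategoryID
    WHERE cst.SubTreeRoot = ?
    ORDER BY cic.CustomerID
"""


def customers_under(conn, root_id):
    """Customers attached to root_id or any category below it."""
    return [row[0] for row in conn.execute(SUBTREE_CUSTOMERS, (root_id,))]


print(customers_under(conn, 5))  # [3, 4, 5, 6]
```

The unfiltered CTE is essentially the (root, descendant) layout sketched in the question; seeding the anchor member with only the requested root is a variant that avoids building the pairs for every node.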

Storing data in a link table

Suppose I have the following:
tbl_options
===========
id name
1 experience
2 languages
3 hourly_rate
tbl_option_attributes
=====================
id option_id name value
1 1 beginner 1
2 1 advanced 2
3 2 english 1
4 2 french 2
5 2 spanish 3
6 3 £10 p/h 10
7 3 £20 p/h 20
tbl_user_options
================
user_id option_id value
1 1 2
1 2 1
1 2 2
1 2 3
1 3 20
In the above example tbl_user_options stores option data for the user. We can store multiple entries for some options.
Now I wish to extend this, i.e. for "languages" I want the user to be able to specify their proficiency in a language (basic/intermediate/advanced). There will also be other fields that will have extended attributes.
So my question is: can these extended attributes be stored in the same table (tbl_user_options), or do I need to create more tables? Obviously, if I add a column "language_proficiency", it won't apply to the other options. But this way I only have one user options table to manage. What do you think?
EDIT: This is what I propose
tbl_user_options
================
user_id option_id value lang_prof
1 1 2 null
1 2 1 2
1 2 2 3
1 2 3 3
1 3 20 null
My gut instinct would be to split the User/Language/Proficiency relationship out into its own tables. Even if you kept it in the same table with your other options, you'd need to write special code to handle the language case, so you might as well use a new table structure.
Unless your data model is in constant flux, I would rather have tbl_languages and tbl_user_languages tables to store those types of data:
tbl_languages
================
lang_id name
1 English
2 French
3 Spanish
tbl_user_languages
================
user_id lang_id proficiency hourly_rate
1 1 1 20
1 2 2 10
2 2 1 15
2 2 3 20
3 3 2 10
Designing a system that is "too generic" is a Turing tarpit trap for a relational SQL database. A document-based database is better suited to arbitrary key-value stores.
Excepting certain optimisations, your database model should match your domain model as closely as possible to minimise the object-relational impedance mismatch.
This design lets you display a sensible table of user language proficiencies and hourly rates with only two inner joins (assuming a tbl_users table with user_id and name columns, which isn't shown above):
SELECT
ul.user_id,
u.name,
l.name,
ul.proficiency,
ul.hourly_rate
FROM tbl_user_languages ul
INNER JOIN tbl_languages l
ON l.lang_id = ul.lang_id
INNER JOIN tbl_users u
ON u.user_id = ul.user_id
ORDER BY
l.name, ul.hourly_rate
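A minimal sketch of that two-join query on the tbl_languages and tbl_user_languages rows above, ordering by language name and then ul.hourly_rate. Python's sqlite3 module stands in for MySQL, and tbl_users with its names is hypothetical because it is not shown in the post:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_users (user_id INTEGER PRIMARY KEY, name TEXT);  -- hypothetical
    CREATE TABLE tbl_languages (lang_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tbl_user_languages (
        user_id INTEGER, lang_id INTEGER, proficiency INTEGER, hourly_rate INTEGER);
""")
# Hypothetical user names; the other rows are copied from the tables above.
conn.executemany("INSERT INTO tbl_users VALUES (?, ?)",
                 [(1, "user one"), (2, "user two"), (3, "user three")])
conn.executemany("INSERT INTO tbl_languages VALUES (?, ?)",
                 [(1, "English"), (2, "French"), (3, "Spanish")])
conn.executemany("INSERT INTO tbl_user_languages VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 20), (1, 2, 2, 10), (2, 2, 1, 15), (2, 2, 3, 20), (3, 3, 2, 10)])

rows = conn.execute("""
    SELECT ul.user_id, u.name, l.name, ul.proficiency, ul.hourly_rate
    FROM tbl_user_languages ul
    INNER JOIN tbl_languages l ON l.lang_id = ul.lang_id
    INNER JOIN tbl_users u ON u.user_id = ul.user_id
    ORDER BY l.name, ul.hourly_rate   -- language first, cheapest rate first
""").fetchall()

for row in rows:
    print(row)
```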
Optionally, you can split out a list of language proficiencies into a tbl_proficiencies table, where 1 = Beginner, 2 = Advanced, 3 = Expert, and join it onto tbl_user_languages.
I think it's a mistake to put "languages" as an option. Reading your text, it seems to me that English is an option, and it might have an attribute from option_attributes.