Advanced searching in a joined multirelation MySQL DB - mysql

This might be a bit advanced to explain, as it's a pretty complicated thing I'm trying to do (at least to me).
I'm currently building a movie-database for personal use in PHP and MySQL, and the MySQL part is killing me. The current setup is like this:
I have a main movie table containing names, descriptions and values with a single option (like year, age limit and media type (DVD, Blu-ray, etc.)).
I have additional tables for language, subtitles, audio-formats etc. which all have two columns. One for the ID of the movie, and one that matches an index (eg. language id). These are supposed to be joined together with the main table, and concatted into a single field.
Example of my language table:
movid | langid
--------------
1 | 2
1 | 4
2 | 4
3 | 5
Optimally, I would like something like this:
| ID | name | description | year | subtitles | languages | audio |
--------------------------------------------------------------------
| 1 | One | Bla bla | 2010 | 2,3,5,6,7 | 3,6,22,6 | 10,5 |
| 2 | Another | foo bar | 2008 | 6,33,5,27 | 10,4,2,3 | 8,15 |
With the subtitles and languages being able to be exploded to a PHP array. That part I've actually got working just fine using GROUP_CONCAT, right up until the part where I need to search for specific subids or langids. This is the query I've been using so far. I hope you'll get the idea even though I haven't written out all my table info:
SET SESSION group_concat_max_len = 512;
SELECT
movie.id,
movie.name,
movie.origname,
movie.`year`,
movie.`type`,
movie.duration,
movie.age,
GROUP_CONCAT(DISTINCT movie_language.langid ORDER BY langid) AS lang,
GROUP_CONCAT(DISTINCT movie_subtitles.subid ORDER BY subid) AS subtitles
FROM `movie`
LEFT JOIN `movie_audio` ON `movie`.`id`=`movie_audio`.`movid`
LEFT JOIN `movie_company` ON `movie`.`id`=`movie_company`.`movid`
LEFT JOIN `movie_genre` ON `movie`.`id`=`movie_genre`.`movid`
LEFT JOIN `movie_language` ON `movie`.`id`=`movie_language`.`movid`
LEFT JOIN `movie_subtitles` ON `movie`.`id`=`movie_subtitles`.`movid`
GROUP BY movie.id
I use the group_concat_max_len to prevent me getting a BLOB, and so far I have only tried group_concatting two of my joined tables (will add the rest later).
This returns exactly what I want, but I can only have one WHERE clause per joined table or it'll return 0 rows. Again, if I only search for one, it will only return the searched number/id in the GROUP_CONCAT'ted field.
Then I sorta fixed it using the IN() function. At least I thought I did. But the problem is that only works with what I'd call OR-searches. Adding:
WHERE movie_subtitles.subid IN ()
With numbers not in the subtitles table will still return the row, just only with the matching numbers. This is fine for half the searches, but I need a way to search with the AND-like method as well.
I have no idea if I need to restructure completely, or need a totally different query, but I hope for some assistance or hints.
I should perhaps say that I've had a look at the HAVING option as well, but as far as I've understood, it will not be effective on my query.
By the way, if this is impossible to do, I've considered scrapping the joined tables and replacing them with a field easily searchable in the main movie table (like using this "syntax": '#2##4#' and then using LIKE '%#2#%' AND '%#4#%' to match results, or as a last resort using PHP to sort it out (I'd rather die than doing that), though I'd rather like it if above solution could be fixed and used).
Thanks a lot in advance for helping away my headaches!

Sub-query your select then you will have an easier time with your clauses.
Like:
select *
from (`your big query above`) as t
where subtitles regexp `your ids you want`
and lang regexp `your ids you want`
Well, it's not perfect, since your ids will have been turned into strings. (In Postgres you have arrays, so you could do a proper search from the top level.) I don't think I would really want to search for ids with regular expressions though.
It would be better then, to not concatenate your ids until the final level. So you want 3 levels of queries:
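select
stuff, ...
group_concats
from
(
select *
-- INNER and OUTER are reserved words in MySQL, so use other alias names
from (`your big query above but without the group_concat`) as inner_rows
where conditions ...
) as outer_rows
group by id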
edit
try this:
SELECT
id,
name,
origname,
`year`,
`type`,
duration,
age,
-- at this point we have the right rows; we are just
-- grouping lang and subtitles
GROUP_CONCAT(DISTINCT langid ORDER BY langid) AS lang,
GROUP_CONCAT(DISTINCT subid ORDER BY subid) AS subtitles
FROM
(
SELECT *
FROM
(
SELECT
movie.id,
movie.name,
movie.origname,
movie.`year`,
movie.`type`,
movie.duration,
movie.age,
langid,
subid
FROM `movie`
LEFT JOIN `movie_audio` ON `movie`.`id`=`movie_audio`.`movid`
LEFT JOIN `movie_company` ON `movie`.`id`=`movie_company`.`movid`
LEFT JOIN `movie_genre` ON `movie`.`id`=`movie_genre`.`movid`
LEFT JOIN `movie_language` ON `movie`.`id`=`movie_language`.`movid`
LEFT JOIN `movie_subtitles` ON `movie`.`id`=`movie_subtitles`.`movid`
-- each row will have a different langid and different subid
GROUP BY
movie.id, langid, subid
) AS inner_rows -- INNER and OUTER are reserved words, so they cannot be bare aliases
-- you should be able to do any complex condition at this point
WHERE
(langid = 1 OR langid = 2)
AND (subid = 2 OR subid = 3)
) AS outer_rows
-- collapse back to one row per movie
GROUP BY id
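For the AND-style search the question asks about, one common technique is to count the distinct matching ids per movie and keep only movies where that count equals the number of ids requested (HAVING COUNT(DISTINCT ...)). This is a different technique from the nested query above, not something the answer itself proposes. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 module as a stand-in for MySQL, a cut-down schema with made-up rows, and a hypothetical helper name movies_with_all. It also assumes both id lists are non-empty.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie_language (movid INTEGER, langid INTEGER);
CREATE TABLE movie_subtitles (movid INTEGER, subid INTEGER);
INSERT INTO movie VALUES (1, 'One'), (2, 'Another'), (3, 'Third');
INSERT INTO movie_language VALUES (1, 2), (1, 4), (2, 4), (3, 5);
INSERT INTO movie_subtitles VALUES (1, 2), (1, 3), (2, 3), (3, 2);
""")


def movies_with_all(lang_ids, sub_ids):
    """Return movies that have every requested langid AND every requested subid."""
    lang_ids = sorted(set(lang_ids))
    sub_ids = sorted(set(sub_ids))
    lang_marks = ",".join("?" * len(lang_ids))
    sub_marks = ",".join("?" * len(sub_ids))
    sql = f"""
        SELECT m.id, m.name,
               -- the full id lists are concatenated separately, so filtering
               -- does not shorten them
               (SELECT GROUP_CONCAT(langid) FROM movie_language
                 WHERE movid = m.id) AS lang,
               (SELECT GROUP_CONCAT(subid) FROM movie_subtitles
                 WHERE movid = m.id) AS subtitles
        FROM movie m
        WHERE m.id IN (SELECT movid FROM movie_language
                        WHERE langid IN ({lang_marks})
                        GROUP BY movid
                        HAVING COUNT(DISTINCT langid) = ?)
          AND m.id IN (SELECT movid FROM movie_subtitles
                        WHERE subid IN ({sub_marks})
                        GROUP BY movid
                        HAVING COUNT(DISTINCT subid) = ?)
        ORDER BY m.id
    """
    params = [*lang_ids, len(lang_ids), *sub_ids, len(sub_ids)]
    return conn.execute(sql, params).fetchall()


# Movies that have languages 2 AND 4, and subtitles 2 AND 3.
print(movies_with_all([2, 4], [2, 3]))
```

Because the filter lives in the IN subqueries rather than in a WHERE on the joined rows, the concatenated fields still list every language and subtitle of the matching movies, not only the ids that were searched for.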

Related

Mysql - Import values into a column from joined tables and two queries

my task is to have the list of all products with the associated brand.
I have available the following tables:
posts     | term_relationships | term_taxonomy    | terms
----------|--------------------|------------------|--------
ID        | object_id          | term_taxonomy_id | term_id
post_type | term_taxonomy_id   | term_id          | name
brand_id  |                    | taxonomies       |
they are related in the following way:
posts.ID -> term_relationships.object_id
term_relationships.term_taxonomy_id -> term_taxonomy.term_taxonomy_id
term_taxonomy.term_id -> terms.term_id
the association between the fields is made by a script that assigns IDs, as shown in the attached image:
[Image: query result]
At this point I have to develop the query for assigning to posts.brand_id the value terms.term_id if the association is in place.
Filtering criteria are:
term_taxonomy.taxonomies="product_brand"
posts.post_type= "products"
The query logic could be:
IF
term_taxonomy.taxonomies="product_brand" AND
term_taxonomy.term_taxonomy_id = term_relationships.term_taxonomy_id AND
term_relationships.object_id = posts.ID
THEN
INSERT INTO terms_term_id = brand_id
ELSE
brands.brand_id IS NULL
I tried to translate it into MySQL but unsuccessfully. Do you have suggestion for this?
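One way the IF/THEN logic above could be expressed is a single UPDATE whose value comes from a correlated subquery: posts with a product_brand association get that term_id, and the rest get NULL. This is only a sketch under assumptions: it runs on Python's sqlite3 module rather than MySQL, the sample rows are invented, and LIMIT 1 assumes each product has at most one brand. The terms table is not needed here because term_taxonomy already carries term_id.

```python
import sqlite3

# In-memory SQLite database with the four tables from the question (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (ID INTEGER PRIMARY KEY, post_type TEXT, brand_id INTEGER);
CREATE TABLE term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);
CREATE TABLE term_taxonomy (term_taxonomy_id INTEGER, term_id INTEGER, taxonomies TEXT);
CREATE TABLE terms (term_id INTEGER, name TEXT);
INSERT INTO posts VALUES (1, 'products', NULL), (2, 'products', NULL), (3, 'page', NULL);
INSERT INTO terms VALUES (10, 'Acme'), (20, 'Sale');
INSERT INTO term_taxonomy VALUES (100, 10, 'product_brand'), (200, 20, 'product_tag');
INSERT INTO term_relationships VALUES (1, 100), (1, 200), (2, 200), (3, 100);
""")

# For every product, look up the brand term linked to it.
# The subquery yields NULL when no product_brand association exists.
conn.execute("""
UPDATE posts
SET brand_id = (
    SELECT tt.term_id
    FROM term_relationships tr
    JOIN term_taxonomy tt ON tt.term_taxonomy_id = tr.term_taxonomy_id
    WHERE tr.object_id = posts.ID
      AND tt.taxonomies = 'product_brand'
    LIMIT 1
)
WHERE post_type = 'products'
""")

brands = dict(conn.execute("SELECT ID, brand_id FROM posts"))
print(brands)
```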
You can create a view and then SELECT from it as you would from any other table (it is not a table, of course, even though it looks like one). Note that a MySQL view is not materialized: it is a stored query that runs again each time you select from it.
If you want to see what the optimizer does, prepend EXPLAIN to the query. Then you will see how well it performs. Okay, a bit OT but still generally advisable.
Maybe for that IF you need a function? I'm not sure about it. Still, I would recommend views over complex queries as they "hide" those complex queries behind a much simpler query. You can still do WHERE and ORDER BY and even JOIN again. But look out for performance (when it comes to high-load websites, every JOIN and missing index hurts).
Okay, OT. Maybe this gives you some hints at least.
I tried to create the VIEW:
create view brand2product as
select
mg_term_relationships.object_id,
mg_term_relationships.term_taxonomy_id ,
mg_term_taxonomy.term_id ,
mg_term_taxonomy.term_taxonomy_id ,
mg_termS.term_id ,
mg_terms.name
from mg_term_relationships
join mg_term_taxonomy on mg_term_taxonomy.term_taxonomy_id = mg_term_relationships.term_taxonomy_id
join mg_terms on mg_terms.term_id= mg_term_taxonomy.term_id
where mg_term_taxonomy.taxonomy="product_brand"
it reports the following error:
#1060 Duplicate column name "term_taxonomy_id"
The query itself works fine.
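The error comes from the view's column list: a plain SELECT may return two columns with the same name, but a view needs every column name to be unique, and here term_taxonomy_id and term_id are each selected twice. A possible fix is to select each column once and alias anything ambiguous. The sketch below illustrates that on Python's sqlite3 module with invented sample rows; SQLite is more lenient about duplicate names than MySQL, so it only demonstrates the de-duplicated definition, and the alias brand_name is made up for the example.

```python
import sqlite3

# In-memory SQLite database with the three mg_ tables (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mg_term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);
CREATE TABLE mg_term_taxonomy (term_taxonomy_id INTEGER, term_id INTEGER, taxonomy TEXT);
CREATE TABLE mg_terms (term_id INTEGER, name TEXT);
INSERT INTO mg_terms VALUES (10, 'Acme');
INSERT INTO mg_term_taxonomy VALUES (100, 10, 'product_brand');
INSERT INTO mg_term_relationships VALUES (1, 100);

-- Each column appears once; the join keys are equal on both sides,
-- so selecting them from one table loses nothing.
CREATE VIEW brand2product AS
SELECT
    tr.object_id,
    tr.term_taxonomy_id,
    tt.term_id,
    t.name AS brand_name
FROM mg_term_relationships tr
JOIN mg_term_taxonomy tt ON tt.term_taxonomy_id = tr.term_taxonomy_id
JOIN mg_terms t ON t.term_id = tt.term_id
WHERE tt.taxonomy = 'product_brand';
""")

cur = conn.execute("SELECT * FROM brand2product")
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()
print(column_names, rows)
```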

How to combine data if a table returns multiple rows in a select?

Let's say I have a SQL database and query like this:
http://sqlfiddle.com/#!2/10af9/3
As you can see, I have a few users who have a few favorite colors. What I would like to be able to do is run a query through and select all the users and their favorite colors... however, for ease of use in my own code, I would like there not to be duplicate data on each row.
What I would like is for the end result to be:
1 | Email | admin@sqlfiddle.com | [Blue, Green]
2 | Twitter | @sqlfiddle | [Purple]
That way, what I get back is a table of users, where every row is a user, and I can get their favorite colors.
If there's a different way of thinking about this (not combining, but ending up with a table of users and their favorite colors on ONE row), I'd love to hear that too.
Thanks.
The SQL Fiddle is MySQL, so I'm assuming that is the database. In MySQL, you can put them together using group_concat():
select sc.id, sc.type, sc.details,
group_concat(fc.color)
from supportContacts sc left join
favoriteColors fc
on sc.id = fc.support_id
group by sc.id, sc.type, sc.details;
If you really want them in the format with the square braces:
concat('[', group_concat(fc.color separator ', '), ']')
Here is the SQL Fiddle.
Other databases generally have similar functionality.
You can use GROUP_CONCAT which will do exactly what you want.
select
supportContacts.id,
supportContacts.type,
supportContacts.details,
GROUP_CONCAT(favoriteColors.color)
from supportContacts
left join favoriteColors on supportContacts.id = favoriteColors.support_id
group by supportContacts.id

Getting quiz data, questions and answers in 1 query?

I need to get the quiz title, quiz description, quiz questions and the answers for each question. My table structure is:
quizes
quize_id | title | user_id | ...
questions
questions_id | quize_id | question | ...
question_answers
answer_id | question_id | user_id | answer | ...
I can use join
SELECT * FROM quizes JOIN questions q ON q.quize_id=quizes.quize_id JOIN question_answers a ON a.question_id=q.question_id
But the problem with this is that I will get in results many rows with redundant data. For example each row will carry field title,user_id, ... Another way is to make for each question extra query to get answers. Is there any better way? Should I use only 1 query or more?
Your tables hold 3 types of data. If you use the query you've got, you'll get all the data as a big table. You've said that this involves a lot of duplication.
If you use multiple queries, you will get multiple result sets, which effectively will leave you with multiple tables, and thus this is unlikely to help.
You could cut the query down to just the columns you want to get the data for:
SELECT qq.Question, qa.Answer
FROM quizes qz
join questions qq on qz.quize_id = qq.quize_id
join question_answers qa on qq.question_id = qa.question_id
WHERE qz.quize_id = @quize_id
ORDER BY 1, 2 -- or other ordering
However, where there are multiple answers for the same question, the question will be repeated on every row. There isn't much you can do about that; it is the price of combining multiple tables' data into one table ("denormalising").
If you need to format your output table so that it looks like this (but with more columns):
Quize_id | Question | Answer
1        | Q1       | A1
         |          | A2
         | Q2       | A3
2        | Q3       | A4
This is a whole different matter. You would need to use the query you've got to populate a temporary table, ordering the data by the sort order you want displayed. To this table you'd need to add a primary key (integer) column, then run a set of update statements to replace the repeated values with nulls, then output the table in the order of the primary key column. (There are other ways to do this, but this is the easiest to explain)
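One of the "other ways" is to leave the query alone and blank out the repeated values in application code while looping over the sorted rows. The following is a minimal sketch of that idea in Python, not the temporary-table method described above. The function name blank_repeats and the sample rows are made up for illustration, and it assumes the rows arrive already ordered by quiz and question.

```python
def blank_repeats(rows):
    """Replace leading values that repeat the previous row with None.

    Blanking stops at the first column that differs, and the last column
    (the answer) is always kept.
    """
    out = []
    prev = None
    for row in rows:
        shown = list(row)
        if prev is not None:
            for i, value in enumerate(row[:-1]):
                if value != prev[i]:
                    break
                shown[i] = None
        out.append(tuple(shown))
        prev = row
    return out


# Rows as the joined query might return them: (quize_id, question, answer).
rows = [
    (1, "Q1", "A1"),
    (1, "Q1", "A2"),
    (1, "Q2", "A3"),
    (2, "Q3", "A4"),
]
display = blank_repeats(rows)
for line in display:
    print(line)
```

This keeps the database result fully populated, so the blanking is purely a display concern.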
Does this help?
I also found another way, which returns all the data I need, including user details for each question:
SELECT
question,
group_concat(qa.answer SEPARATOR ',') as answers,
group_concat(qa.user_id SEPARATOR ',') as userIds,
group_concat(up.nickname SEPARATOR ',') as nickname
FROM quize_questions qq
INNER JOIN question_answers qa ON qa.question_id=qq.question_id
INNER JOIN user_profile up ON up.user_id = qa.user_Id
GROUP BY qq.question_id
I am just not sure if this is the right way. I am worried about speed.

MySQL Select based on count of substring in a column?

Let's say I have two tables (I'm trying to remove everything irrelevant to the question from the tables and make some sample ones, so bear with me :)
___________________           ________________________
|File             |           |Content               |
|_________________|           |______________________|
|ID Primary Key   | 1       * |ID Primary Key        |
|URL Varchar(255) |-----------|FileID Foreign Key    |
|_________________|           |  ref File(ID)        |
                              |FileContent Text      |
                              |______________________|
A File has a url. There may be many Content items corresponding to each File.
I need to create a query using these tables that I'm having some trouble with. I essentially want the query, in simple terms, to say:
"Select the file URL and the sum of the times substring "X" appears in all content entries associated with that file."
I'm pretty good with SQL selects, but I'm not so good with aggregate functions and it's letting me down. Any help is greatly appreciated :)
The query won't be efficient but might give you a hint:
SELECT url, cnt
FROM (
SELECT
f.id,
IFNULL(
SUM(
-- occurrences of 'X' = characters removed by REPLACE / length of 'X'
(LENGTH(c.FileContent) - LENGTH(REPLACE(c.FileContent, 'X', '')))/LENGTH('X')
),
0
) as cnt
FROM file f
JOIN content c ON f.id = c.fileid
GROUP BY f.id
) cnts JOIN file USING(id);
To include files that do not have a match in the content table, you can UNION ALL the rest or use a LEFT JOIN in the cnts subquery.
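The counting trick is arithmetic on string lengths: removing every occurrence of the substring shortens the text by (occurrences × substring length), so dividing the length difference by the substring length gives the number of occurrences. Here is a small sketch of the LEFT JOIN variant, run on Python's sqlite3 module as a stand-in for MySQL, with invented sample rows and a hypothetical helper substring_counts. It assumes a non-empty search string, and in MySQL the division would return a decimal rather than an integer.

```python
import sqlite3

# In-memory SQLite database with the two tables from the question (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE File (ID INTEGER PRIMARY KEY, URL TEXT);
CREATE TABLE Content (
    ID INTEGER PRIMARY KEY,
    FileID INTEGER REFERENCES File(ID),
    FileContent TEXT
);
INSERT INTO File VALUES (1, 'a.html'), (2, 'b.html'), (3, 'c.html');
INSERT INTO Content (FileID, FileContent) VALUES
    (1, 'foo bar foo'), (1, 'foo'), (2, 'bar');
""")


def substring_counts(needle):
    """Return (URL, total occurrences of needle) for every file, 0 if none."""
    sql = """
        SELECT f.URL,
               COALESCE(SUM((LENGTH(c.FileContent)
                             - LENGTH(REPLACE(c.FileContent, ?, '')))
                            / LENGTH(?)), 0) AS cnt
        FROM File f
        LEFT JOIN Content c ON c.FileID = f.ID  -- keeps files without content
        GROUP BY f.ID, f.URL
        ORDER BY f.ID
    """
    return conn.execute(sql, (needle, needle)).fetchall()


print(substring_counts("foo"))
```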
This solution attempts to use REGEXP to match the substring. REGEXP returns 1 if it matches, 0 if not, so SUM() them up for the total. REGEXP might seem like overkill, but would allow for more complicated matching than a simple substring.
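SELECT
File.ID,
File.URL,
-- COALESCE turns the NULL sum for files without content into 0
COALESCE(SUM(Content.FileContent REGEXP 'substring'), 0) AS numSubStrs
-- join on the foreign key Content.FileID, not on Content's own ID
FROM File LEFT JOIN Content ON File.ID = Content.FileID
GROUP BY File.ID, File.URL;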
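The easier method, if a more complex match pattern won't ever be needed, uses LIKE and COUNT() instead of SUM():
SELECT
File.ID,
File.URL,
-- COUNT(Content.ID) ignores the NULLs that LEFT JOIN produces for files without a match
COUNT(Content.ID) AS numSubStrs
FROM File LEFT JOIN Content
ON File.ID = Content.FileID
AND Content.FileContent LIKE '%substring%'
GROUP BY File.ID, File.URL;
Note the use of LEFT JOIN with the LIKE condition in the ON clause, which produces 0 when there are no matching entries in Content. (Putting the condition in WHERE would drop those files from the result.)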

Rewriting a subquery that has a HAVING clause into a join?

I have a table with columns like this;
site, chromosome, strand.
The pair of site and chromosome should be unique while they can have more than one strand.
Before loading the data, I found that some of the sites have more than one chromosome, which is obviously an error. I was trying to identify the errors, that is, sites with more than one chromosome. I thought about it but couldn't come up with the proper SQL.
So I divided the problem. First I create a table selecting distinct records by;
create table distinct_pair
as select distinct site, chromosome
from original_table;
Then I could find the sites that have more than one chromosome by this;
select site
from distinct_pair
group by site
having count(site)>1;
It worked fine. Then trying to see the whole information of the errors from the original table, I did this;
select * from original_table
where site
in (select site from distinct_pair
group by site
having count(site)>1);
Then this subquery was way too slow even though the columns were all indexed.
I tried to rewrite the query as a join but the having makes it difficult.
Please help me.
===================
Thanks all of you who answered this question.
My data look like this.
Site | Chromosome | Strand
N111 | 2L | +
N111 | 2L | -
N112 | 2L | +
N112 | 2L | -
N112 | 3L | +
N112 | 3L | -
....
In this case, N111 is fine but N112 is an error because it has data for two chromosomes. The subquery of the second answer picked N111 as well as N112 because of the strand, which was the same problem I had. GROUP BY with multiple columns worked differently from what I had guessed. However, the suggested answer gave me a clue about how GROUP BY works, so I could modify it slightly to make it work. The two answers give the same results.
Thanks again, you guys.
You could just find the one with different chromosome for a given site:
SELECT DISTINCT t1.site, t1.chromosome, t2.chromosome
FROM original_table t1
INNER JOIN original_table t2 USING (site)
WHERE t1.chromosome <> t2.chromosome
Looks like you want something like this :
SELECT site, chromosome, strand
FROM original_table O
INNER JOIN (SELECT site, chromosome
FROM original_table
GROUP BY site, chromosome
HAVING COUNT(*) > 1) T
USING (site, chromosome)
The subquery selects the site and chromosome pairs that occur more than once, then you join it to the big table. Since it's an INNER JOIN, it only returns the rows that have a match in the subquery.
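As the asker's follow-up notes, grouping by both site and chromosome counts strands, so it also flags sites like N111. One possible adjustment (a guess at the kind of modification meant, not the asker's actual query) is to group by site alone and count distinct chromosomes. The sketch below runs on Python's sqlite3 module as a stand-in for MySQL, using the sample rows from the question; the alias dup is made up for the example.

```python
import sqlite3

# In-memory SQLite database holding the sample data shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE original_table (site TEXT, chromosome TEXT, strand TEXT);
INSERT INTO original_table VALUES
    ('N111', '2L', '+'), ('N111', '2L', '-'),
    ('N112', '2L', '+'), ('N112', '2L', '-'),
    ('N112', '3L', '+'), ('N112', '3L', '-');
""")

# The derived table lists sites with more than one distinct chromosome;
# joining back returns every original row for those sites.
bad_rows = conn.execute("""
    SELECT o.site, o.chromosome, o.strand
    FROM original_table o
    JOIN (SELECT site
          FROM original_table
          GROUP BY site
          HAVING COUNT(DISTINCT chromosome) > 1) dup
      ON dup.site = o.site
    ORDER BY o.site, o.chromosome, o.strand
""").fetchall()

for row in bad_rows:
    print(row)
```

Joining against a derived table instead of using IN (subquery) is also the usual workaround when an older MySQL optimizer re-evaluates the subquery for every outer row, which would fit the slowness described in the question.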