organize similar items in columns from same table - mysql

I have a table named 'artist' with columns id, name, genre.
I need to find all artists that share the same genre.
Here's an example: if artist X has genres ‘rock’ and ‘classic’, artist Y has genres ‘classic’ and ‘pop’, and artist Z has genre ‘pop’, then the sample output would be:
first artist | second artist | genre
X | Y | Classic
Y | Z | Pop

Join the table to itself on genre.
Exclude "mirror duplicates" (XY-Classic vs YX-Classic) and loops (ZZ-Pop) using the S.id < J.id condition:
Select S.Name as FirstArtist,
J.Name as SecondArtist,
S.Genre as CommonGenre
From Artist S
Inner Join Artist J
ON S.Genre = J.Genre and S.id < J.id
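To see the self-join at work, here is a minimal sketch that runs the same query against the X/Y/Z example from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and it assumes the table holds one row per artist/genre combination with id identifying the artist; both are assumptions, not something the question states.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the join itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (id INTEGER, name TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO artist (id, name, genre) VALUES (?, ?, ?)",
    [
        (1, "X", "rock"),
        (1, "X", "classic"),
        (2, "Y", "classic"),
        (2, "Y", "pop"),
        (3, "Z", "pop"),
    ],
)

pairs = conn.execute(
    """
    SELECT s.name  AS first_artist,
           j.name  AS second_artist,
           s.genre AS common_genre
    FROM artist s
    INNER JOIN artist j
            ON s.genre = j.genre
           AND s.id < j.id   -- drops mirrored pairs and self-pairs
    ORDER BY s.name, j.name, s.genre
    """
).fetchall()

for row in pairs:
    print(row)
```

The ORDER BY is only there to make the output order predictable; it is not part of the original answer.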

Judging from your sample output, you're trying to do two things. First, order your data set, and second, "pivot" it so some columns in your data become rows.
The first part:
SELECT name, genre
FROM artist
ORDER BY genre
The second part is presentation. There are many ways to do this. You could try this, for example.
SELECT GROUP_CONCAT(name ORDER BY name) AS names,
genre
FROM artist
GROUP BY genre
ORDER BY genre
Or you could look up various techniques in MySQL to do pivoting. Hint: they're all kind of difficult.
Or, you could do this in your presentation software.
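As a rough illustration of doing the grouping in the presentation layer, the sketch below takes rows shaped like the result of SELECT name, genre FROM artist and groups them in application code. Python is an assumption here (the question does not say what the presentation software is), and the helper name names_by_genre is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Rows as they might come back from: SELECT name, genre FROM artist
rows = [
    ("X", "rock"),
    ("X", "classic"),
    ("Y", "classic"),
    ("Y", "pop"),
    ("Z", "pop"),
]


def names_by_genre(rows):
    """Group (name, genre) rows into {genre: [names]}, both sorted."""
    # groupby only merges adjacent items, so sort by genre (then name) first.
    ordered = sorted(rows, key=itemgetter(1, 0))
    return {
        genre: [name for name, _ in group]
        for genre, group in groupby(ordered, key=itemgetter(1))
    }


grouped = names_by_genre(rows)
for genre, names in grouped.items():
    print(f"{genre}: {', '.join(names)}")
```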

Related

MYSQL: How to select using multiple tags

I have 3 tables set up like this
artist
[ artist_id | artist_name | ... ]
genre
[ genre_id | genre_name | ... ]
artist_genre
[ artist_genre_artist_id | artist_genre_genre_id | ... ]
An artist (e.g. Queen) has multiple genres like Rock, glam rock, psychedelic rock, hard rock, progressive rock, punk rock, heavy metal, pop, blues, rock and roll, rhythm 'n blues.
If I visit the page of another artist such as The Rolling Stones, I would like to display similar artists like Queen, based on the current artist's multiple genres (tags).
The Rolling Stones' genres include Rock, blues, and rock and roll.
I want to return results in order of RELEVANCE by how many of the current artist's genres appear on the other artists' list.
I don't know if I've made enough sense but I hope you get the point. Thanks!
We can try a self join approach with aggregation:
SELECT
a1.artist_name,
a2.artist_name,
COUNT(*) AS common_cnt
FROM artist_genre ag1
INNER JOIN artist_genre ag2
ON ag2.artist_genre_artist_id > ag1.artist_genre_artist_id AND
ag2.artist_genre_genre_id = ag1.artist_genre_genre_id
INNER JOIN artist a1
ON a1.artist_id = ag1.artist_genre_artist_id
INNER JOIN artist a2
ON a2.artist_id = ag2.artist_genre_artist_id
GROUP BY
a1.artist_name,
a2.artist_name
ORDER BY
COUNT(*) DESC,
a1.artist_name,
a2.artist_name;
The strategy here is to bring every artist in comparison to every other artist (without duplication), on the condition that a given junction table record matches genres on both sides of the join. Then, we aggregate by the pair of artists being compared, and take the number of records, which is the overlap. Here is a working demo with a small sample data set:
Demo
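The query above ranks every pair of artists. For the page of one particular artist, a variant that filters on that artist may be closer to what the question asks. The sketch below is one possible way to do that, not the answer's own query. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, omits the genre table because only the ids are needed, and uses made-up sample data ("Artist C" and "Artist D" are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, artist_name TEXT);
    CREATE TABLE artist_genre (
        artist_genre_artist_id INTEGER,
        artist_genre_genre_id  INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO artist VALUES (?, ?)",
    [(1, "Queen"), (2, "The Rolling Stones"), (3, "Artist C"), (4, "Artist D")],
)
# Illustrative genre ids: 1 = rock, 2 = blues, 3 = rock and roll, 4 = pop
conn.executemany(
    "INSERT INTO artist_genre VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4),   # Queen
     (2, 1), (2, 2), (2, 3),           # The Rolling Stones
     (3, 1), (3, 4),                   # Artist C
     (4, 4)],                          # Artist D
)

current_artist_id = 2  # the artist whose page is being viewed

similar = conn.execute(
    """
    SELECT a.artist_name, COUNT(*) AS common_cnt
    FROM artist_genre cur
    INNER JOIN artist_genre other
            ON other.artist_genre_genre_id = cur.artist_genre_genre_id
           AND other.artist_genre_artist_id <> cur.artist_genre_artist_id
    INNER JOIN artist a
            ON a.artist_id = other.artist_genre_artist_id
    WHERE cur.artist_genre_artist_id = ?
    GROUP BY a.artist_id, a.artist_name
    ORDER BY common_cnt DESC, a.artist_name
    """,
    (current_artist_id,),
).fetchall()

for name, common_cnt in similar:
    print(name, common_cnt)
```

Artists with no genre in common simply do not appear, because the inner join finds no matching junction rows for them.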
Alternatively, do it in two steps. First, fetch the genre ids of the current artist:
SELECT artist_genre_genre_id FROM artist_genre WHERE artist_genre_artist_id = $login_id;
Store the result in an array $related_genre, then select every artist that has at least one of those genres:
SELECT DISTINCT artist.artist_name FROM artist_genre
LEFT JOIN artist
ON artist.artist_id = artist_genre.artist_genre_artist_id
WHERE artist_genre.artist_genre_genre_id IN ($related_genre);

Average Count of Genres per Movie for MySQL

Stumbled into a problem using an IMDb dataset that I can't seem to figure out the answer to. The question is:
Create a table that contains the average count of genres per movie for
each genre
We have two tables: Movies: id, name; Genres: id (movieId), genre
Movies:
id,name
1,Toy Story
2,Jumanji
3,Grumpier Old Men
4,Waiting to Exhale
5,Father of the Bride Part II
6,Heat
Genres:
id,genre
1,Animation
1,Children's
1,Comedy
2,Adventure
2,Children's
2,Fantasy
3,Comedy
3,Romance
4,Comedy
4,Drama
5,Comedy
6,Action
6,Crime
6,Thriller
I may be interpreting the question incorrectly, but shouldn't the output be 3 columns: genre, movie, and count?
My answer would start along the lines of:
SELECT genre, name, AVG(COUNT(*)) FROM movies
JOIN genres ON genres.id=movies.id
GROUP BY name;
Any ideas on how you would interpret the question and answer?
Well, I would start with the number of genres per movie:
select id, count(*) as num_genres
from genres g
group by id
Then, I would "attach" this information to the genres information. And aggregate and average:
select g.genre, avg(m.num_genres)
from genres g join
(select id, count(*) as num_genres
from genres g
group by id
) m
on g.id = m.id
group by g.genre;
I agree with Gordon: first get the number of genres per movie:
select id, count(*) as num_genres
from genres g
group by id
But the average number of genres per movie should be
SELECT AVG(num_genres)
FROM (
SELECT id, count(*) as num_genres
FROM genres g
GROUP BY id
) t
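Both readings discussed above can be checked against the sample data from the question. The sketch below loads the Genres rows into an in-memory SQLite database (a stand-in for MySQL, used through Python's sqlite3 module) and runs the per-genre average and the single overall average side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genres (id INTEGER, genre TEXT)")
conn.executemany(
    "INSERT INTO genres (id, genre) VALUES (?, ?)",
    [
        (1, "Animation"), (1, "Children's"), (1, "Comedy"),
        (2, "Adventure"), (2, "Children's"), (2, "Fantasy"),
        (3, "Comedy"), (3, "Romance"),
        (4, "Comedy"), (4, "Drama"),
        (5, "Comedy"),
        (6, "Action"), (6, "Crime"), (6, "Thriller"),
    ],
)

# Reading 1: for each genre, the average genre count of the movies in it.
per_genre = dict(
    conn.execute(
        """
        SELECT g.genre, AVG(m.num_genres)
        FROM genres g
        JOIN (SELECT id, COUNT(*) AS num_genres
              FROM genres
              GROUP BY id) m
          ON g.id = m.id
        GROUP BY g.genre
        """
    ).fetchall()
)

# Reading 2: one overall average of genres per movie.
overall = conn.execute(
    """
    SELECT AVG(num_genres)
    FROM (SELECT id, COUNT(*) AS num_genres
          FROM genres
          GROUP BY id) t
    """
).fetchone()[0]

for genre in sorted(per_genre):
    print(genre, per_genre[genre])
print("overall", overall)
```

With this data, Comedy appears in movies that have 3, 2, 2 and 1 genres, so its average is 2.0, while the overall figure is 14 genres over 6 movies.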

How to separate the maximum count for each genre of something

For each genre of movie, I want to find the N actors who have played in the most movies of that genre.
I have done this:
select genre.genre_name,actor.actor_id,count(genre.genre_name) from genre
inner join movie_has_genre on movie_has_genre.genre_id=genre.genre_id
inner join movie on movie_has_genre.movie_id=movie.movie_id
inner join role on movie.movie_id=role.movie_id
inner join actor on actor.actor_id=role.actor_id
group by genre.genre_name,actor.actor_id;
This gives, for each genre, how many movies of that genre every actor has played in. Now I want to find, for each genre, the actor who has played in the most movies of that genre.
Tables and their columns:
actor(actor_id,name)
role(actor_id,movie_id)
movie(movie_id,title)
movie_has_genre(movie_id,genre_id)
genre(genre_id,genre_name)
Also the result should be something like this:
Action 22591 7
Horror 25863 3
Horror 24867 3
Comedy 23476 2
Drama 14536 1
Drama 19634 1
Drama 17563 1
What I'd do is the following (assuming your code works):
-- Notice this is your code with some aliases, nothing else.
-- Just to make my job easier.
create view frequency as
select genre.genre_name as genre_name,
actor.actor_id as actor_id,
count(genre.genre_name) as freq
from genre
inner join movie_has_genre on movie_has_genre.genre_id=genre.genre_id
inner join movie on movie_has_genre.movie_id=movie.movie_id
inner join role on movie.movie_id=role.movie_id
inner join actor on actor.actor_id=role.actor_id
group by genre.genre_name,actor.actor_id;
-- And this is my proposal
-- Take the max frequency per each category
-- and find the guy who possesses it (maybe 2 or more...)
select tbl1.genre_name, tbl1.actor_id
from frequency as tbl1 inner join
(
-- The max frequency in a genre.
select f.genre_name,
max(f.freq) as max_freq
from frequency f
group by(genre_name)
) as tbl2 on (tbl1.genre_name = tbl2.genre_name)
where tbl1.freq = tbl2.max_freq;
There's one caveat: it may return more than one actor per genre if there's a tie. How to pick a single winner in that case I'll leave to you. I don't think the query is wrong, but we're both learning. Hope this helps.
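The "take the maximum per genre, then join back" idea can be tried on a small data set. The sketch below is an illustration under a few assumptions: it uses SQLite through Python's sqlite3 module instead of MySQL, replaces the view with a common table expression (which MySQL supports from version 8.0), leaves out the actor and movie tables because only their ids are needed, and uses invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
    CREATE TABLE movie_has_genre (movie_id INTEGER, genre_id INTEGER);
    CREATE TABLE role (actor_id INTEGER, movie_id INTEGER);
    """
)
conn.executemany("INSERT INTO genre VALUES (?, ?)", [(1, "Action"), (2, "Horror")])
# Movies 1 and 2 are Action, movies 3 and 4 are Horror (made-up data).
conn.executemany(
    "INSERT INTO movie_has_genre VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 2)],
)
conn.executemany(
    "INSERT INTO role VALUES (?, ?)",
    [(10, 1), (10, 2), (10, 3),
     (11, 1), (11, 3), (11, 4),
     (12, 3), (12, 4)],
)

top_actors = conn.execute(
    """
    WITH frequency AS (
        -- movies per actor and genre
        SELECT g.genre_name, r.actor_id, COUNT(*) AS freq
        FROM genre g
        JOIN movie_has_genre mg ON mg.genre_id = g.genre_id
        JOIN role r ON r.movie_id = mg.movie_id
        GROUP BY g.genre_name, r.actor_id
    )
    SELECT f.genre_name, f.actor_id, f.freq
    FROM frequency f
    JOIN (SELECT genre_name, MAX(freq) AS max_freq
          FROM frequency
          GROUP BY genre_name) m
      ON m.genre_name = f.genre_name
     AND m.max_freq = f.freq
    ORDER BY f.genre_name, f.actor_id
    """
).fetchall()

for row in top_actors:
    print(row)
```

Actors 11 and 12 both appear for Horror, which is the tie behaviour described above.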
You need to use the MAX() function. Some SQL implementations (such as Oracle) allow you to do this: SELECT MAX(COUNT(whatever)) but MySQL isn't one of them.
One way to do what you want is this:
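select genre_name, max(genrecount) as max_count
from (
-- movies per actor and genre; role already carries the actor_id
select genre.genre_name, role.actor_id, count(genre.genre_name) as genrecount
from genre
inner join movie_has_genre on movie_has_genre.genre_id=genre.genre_id
inner join movie on movie_has_genre.movie_id=movie.movie_id
inner join role on movie.movie_id=role.movie_id
group by genre.genre_name, role.actor_id
) as topactor
group by genre_name;
This does the outer SELECT on the table derived from the inner SELECT and yields the highest count per genre. To see which actor holds that count, join this result back to the per-actor counts on genre_name and the count.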

denormalize many to many relationship in MySQL

I would like to denormalize a many-to-many relationship in MySQL in order to import the data into MongoDB as a JSON schema.
Input
I have 3 tables:
Movies : id, title, url
Genres : id, genre
movie_genres : movie_id, genre_id
example
movie Table
id title link
1 star wars http://link-to-imdb
2 shrek http://link-to-imdb
movie_genres Table
movie genre
1 1
2 1
genres Table
id genre
0 unknown
1 action
2 comedy
3 drama
I would like to transform it to a single table by moving genres into movies as array or multiple values.
There is only a limited number of genres (15).
Output
So, Final output of table would be:
Movies : id, title, url, genre
Here, genre would be multiple values.
Example:
id title link genre
1 star wars http://link-to-imdb action, drama, sci-fi
2 shrek http://link-to-imdb anime
I did this - MySQL Query:
select M.id ,M.title ,M.release_date, M.video, M.IMDBURL, G.genre
from genres G, movie_genres MG, movies M
where M.id = MG.movie and MG.genre = G.id
but causes lot of repetition depending on number of genres. It would be nice If I could dump genres altogether.
In this case you should use the GROUP_CONCAT() function:
SELECT m.id, m.title, m.url, GROUP_CONCAT(g.genre SEPARATOR ', ') AS genres
FROM movies m
LEFT JOIN movie_genres mg ON m.id = mg.movie_id
LEFT JOIN genres g ON mg.genre_id = g.id
GROUP BY m.id
I didn't test the query above (there could be some typos), but I hope you will be able to get the idea
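Since the stated goal is a MongoDB import, another option is to keep the genres as a real array instead of a comma-separated string. The sketch below is one way that could look, not part of the answer above: it runs the same joins without GROUP_CONCAT and folds the rows into one document per movie in application code. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the sample rows (including the movie with no genres) are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, url TEXT);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, genre TEXT);
    CREATE TABLE movie_genres (movie_id INTEGER, genre_id INTEGER);
    """
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [(1, "star wars", "http://link-to-imdb"),
     (2, "shrek", "http://link-to-imdb"),
     (3, "untagged movie", "http://link-to-imdb")],  # hypothetical, no genres
)
conn.executemany(
    "INSERT INTO genres VALUES (?, ?)",
    [(0, "unknown"), (1, "action"), (2, "comedy"), (3, "drama")],
)
conn.executemany("INSERT INTO movie_genres VALUES (?, ?)", [(1, 1), (1, 3), (2, 2)])

rows = conn.execute(
    """
    SELECT m.id, m.title, m.url, g.genre
    FROM movies m
    LEFT JOIN movie_genres mg ON mg.movie_id = m.id
    LEFT JOIN genres g ON g.id = mg.genre_id
    ORDER BY m.id, g.genre
    """
).fetchall()

# Fold the one-row-per-genre result into one document per movie.
by_id = {}
for movie_id, title, url, genre in rows:
    doc = by_id.setdefault(
        movie_id, {"id": movie_id, "title": title, "url": url, "genres": []}
    )
    if genre is not None:  # LEFT JOIN yields NULL for movies without genres
        doc["genres"].append(genre)

documents = list(by_id.values())
print(json.dumps(documents, indent=2))
```

The printed JSON array has one object per movie, which is the shape a document store import generally expects.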

SQL column with multiple values

I want to know how to store multiple values in a SQL column, or what the best way to store such values is.
Consider a movie table where a movie can have multiple genres,
e.g.
Genre: "Action, Adventure, Fantasy"
Genre: "Adventure, Drama, Fantasy, Thriller"
What is the best way to store the Genre values in the database?
This is a classic n to m relation. It works like this
movies table
------------
id
name
...
genres table
------------
id
name
movie_genres table
------------------
movie_id
genre_id
The movie_genres table then contains one record for each genre of a movie. Example:
movie_genres
------------
movie_id | genre_id
1 | 1
1 | 2
1 | 13
To get the genres of a movie do:
select g.name as genre_name
from genres g
join movie_genres mg on mg.genre_id = g.id
join movies m on m.id = mg.movie_id
where m.name = 'star wars'
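The three-table layout can be tried end to end with a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for any SQL database; the key constraints and the sample genre names are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: one row per (movie, genre) combination.
    CREATE TABLE movie_genres (
        movie_id INTEGER REFERENCES movies(id),
        genre_id INTEGER REFERENCES genres(id),
        PRIMARY KEY (movie_id, genre_id)
    );
    """
)
conn.execute("INSERT INTO movies VALUES (1, 'star wars')")
conn.executemany(
    "INSERT INTO genres VALUES (?, ?)",
    [(1, "Action"), (2, "Adventure"), (13, "Fantasy")],  # illustrative names
)
conn.executemany("INSERT INTO movie_genres VALUES (?, ?)", [(1, 1), (1, 2), (1, 13)])

genre_names = [
    row[0]
    for row in conn.execute(
        """
        SELECT g.name AS genre_name
        FROM genres g
        JOIN movie_genres mg ON mg.genre_id = g.id
        JOIN movies m ON m.id = mg.movie_id
        WHERE m.name = ?
        ORDER BY g.name
        """,
        ("star wars",),
    )
]
print(genre_names)
```

The composite primary key on movie_genres keeps the same genre from being attached to a movie twice.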
IMO, using an n:m relationship, as juergen_d suggested, is the best option.
But in MySQL there is another option that might work in your case: the SET data type. It is definitely not as powerful or robust as an n:m relationship, and it is not normalization friendly.