Normalizing table LIMIT issue - mysql

I have 3 tables: song, author, and song_author.
The relationship is song 1--* song_author *--1 author.
I use a query like
SELECT *
FROM song
LEFT JOIN song_author ON s_id = sa_song
LEFT JOIN author ON sa_author = a_id
For the given example that would result in 3 rows:
John - How beautiful it is
John - Awesome
George - Awesome
and I populate my objects from those rows, so that part is fine.
However, I want to add a LIMIT clause, but because the query returns several rows for a single song, LIMIT 10 doesn't always give me 10 songs.
The other option I know of is to fetch all songs and then run a second query for each one, but that would mean O(n) queries, which I'd like to avoid.
author:
+------+--------+
| a_id | a_name |
+------+--------+
| 1 | John |
| 2 | George |
+------+--------+
song:
+------+---------------------+
| s_id | s_name |
+------+---------------------+
| 1 | How beautiful it is |
| 2 | Awesome |
+------+---------------------+
song_author:
+-----------+---------+
| sa_author | sa_song |
+-----------+---------+
| 1 | 1 |
| 1 | 2 |
| 2 | 2 |
+-----------+---------+

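You can try something like this, which applies the LIMIT to the songs in a derived table before joining the authors (the ORDER BY makes the choice of songs deterministic):
SELECT *
FROM
(SELECT *
FROM song
ORDER BY s_id
LIMIT 10) r
LEFT JOIN song_author ON r.s_id = sa_song
LEFT JOIN author ON sa_author = a_id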
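To check that limiting the songs inside a derived table yields exactly N songs no matter how many authors each one has, here is a minimal sketch that runs the query on the sample data. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL so it runs without a server; the function name songs_with_authors is hypothetical, and the outer ORDER BY is added only to make the output deterministic.

```python
import sqlite3

SETUP = """
CREATE TABLE author (a_id INTEGER PRIMARY KEY, a_name TEXT);
CREATE TABLE song (s_id INTEGER PRIMARY KEY, s_name TEXT);
CREATE TABLE song_author (sa_author INTEGER, sa_song INTEGER);
INSERT INTO author VALUES (1, 'John'), (2, 'George');
INSERT INTO song VALUES (1, 'How beautiful it is'), (2, 'Awesome');
INSERT INTO song_author VALUES (1, 1), (1, 2), (2, 2);
"""


def songs_with_authors(limit):
    """Return (s_id, s_name, a_name) rows for the first `limit` songs only."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.executescript(SETUP)
    # The LIMIT applies to songs inside the derived table r, so joining the
    # authors afterwards can add rows but never cuts a song off the page.
    rows = conn.execute(
        """
        SELECT r.s_id, r.s_name, a.a_name
        FROM (SELECT * FROM song ORDER BY s_id LIMIT ?) r
        LEFT JOIN song_author sa ON r.s_id = sa.sa_song
        LEFT JOIN author a ON sa.sa_author = a.a_id
        ORDER BY r.s_id, a.a_id
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return rows


for row in songs_with_authors(10):
    print(row)
```

With a limit of 1 only the first song comes back, and with a limit of 2 you get three rows but still exactly two distinct songs.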

To be clear, your goal is what, exactly? It seems like you want to grab a set number of songs at a time (10 in the example), along with all related authors for those songs. However, if several of those [song, author] tuples represent the same song, you'd end up with 10 records but fewer than 10 actual songs.
I like Marcin's approach, but if you want to avoid sub-queries and temporary tables, you can instead use a feature built into MySQL: GROUP_CONCAT()
SELECT
s_name,
GROUP_CONCAT(DISTINCT a_name SEPARATOR '|') AS authors
FROM
song
LEFT JOIN
song_author ON sa_song = s_id
LEFT JOIN
author ON a_id = sa_author
GROUP BY
s_id, s_name
ORDER BY
s_id
LIMIT 10
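Here is a minimal sketch of the GROUP_CONCAT approach, again assuming SQLite via Python's sqlite3 as a stand-in for MySQL. SQLite spells MySQL's GROUP_CONCAT(x SEPARATOR '|') as group_concat(x, '|'), and DISTINCT is omitted because SQLite only allows it on single-argument aggregates (the song_author rows here are unique anyway). The function name songs_with_author_list is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE author (a_id INTEGER PRIMARY KEY, a_name TEXT);
CREATE TABLE song (s_id INTEGER PRIMARY KEY, s_name TEXT);
CREATE TABLE song_author (sa_author INTEGER, sa_song INTEGER);
INSERT INTO author VALUES (1, 'John'), (2, 'George');
INSERT INTO song VALUES (1, 'How beautiful it is'), (2, 'Awesome');
INSERT INTO song_author VALUES (1, 1), (1, 2), (2, 2);
"""


def songs_with_author_list(limit):
    """Return one (s_id, s_name, [authors]) tuple per song, at most `limit` songs."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.executescript(SETUP)
    # One row per song, so LIMIT now counts songs, not song/author pairs.
    # Grouping by s_id keeps two songs with the same title apart.
    rows = conn.execute(
        """
        SELECT s.s_id, s.s_name, group_concat(a.a_name, '|') AS authors
        FROM song s
        LEFT JOIN song_author sa ON sa.sa_song = s.s_id
        LEFT JOIN author a ON a.a_id = sa.sa_author
        GROUP BY s.s_id, s.s_name
        ORDER BY s.s_id
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    # Split the names back into a list; a song without authors gets NULL
    # (None) from group_concat, which becomes an empty list here.
    return [(s_id, name, names.split("|") if names else [])
            for s_id, name, names in rows]
```

The order of names inside each concatenated string is not guaranteed, and the split breaks if an author name itself contains the separator, so pick a separator that cannot occur in the data.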

Related

Merge based on "group by" groups

I have a table called Activities with the columns user_id and activity.
There is one row for each (user_id, activity) combination.
Here is what it might look like (empty rows added to make it easier to read, please ignore them):
| user_id | activity |
|---------|-----------|
| 1 | swimming | -- We want to match this
| 1 | running | -- person's activities
| | |
| 2 | swimming |
| 2 | running |
| 2 | rowing |
| | |
| 3 | swimming |
| | |
| 4 | skydiving |
| 4 | running |
| 4 | swimming |
I would like to find all other users who have at least the same activities as a given input user_id, so that I can recommend users with similar activities.
So in the table above, if I want recommended users for user_id=1, the query should return user_id=2 and user_id=4, because they engage in both swimming and running (and more), but not user_id=3, because they only engage in swimming.
So a result with a single column of:
| user_id |
|---------|
| 2 |
| 4 |
is what I would ideally be looking for
As for what I've tried, I'm stuck on how to get a solid set of user_id=1's activities to match against. Basically I'm looking for something along the lines of:
SELECT user_id from Activities
GROUP BY user_id
HAVING input_user_activities in user_x_activities
where input_user_activities is just the set of our input user's activities. I can create that set using a WITH input_user_activities AS (...) at the beginning; what I'm stuck on is the user_x_activities part.
Any thoughts?
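To get users with the same activities, you can use a self join. Let me assume that the rows are unique and that the input user id is stored in the variable @user_id (a bare #user_id would start a comment in MySQL):
SET @user_id = 1;
select a.user_id
from activities a1 join
activities a
on a1.activity = a.activity and
a1.user_id = @user_id
where a.user_id <> @user_id
group by a.user_id
having count(*) = (select count(*) from activities a2 where a2.user_id = @user_id);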
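The HAVING clause answers your question: it keeps only the users whose number of matching activities equals the total number of activities of the given user, that is, users who have all of that user's activities (and possibly more).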
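A minimal sketch of this self-join check ("relational division") on the sample data, assuming SQLite through Python's sqlite3 as a stand-in for MySQL. The named parameter :uid plays the role of @user_id, and the function name users_with_all_activities_of is hypothetical.

```python
import sqlite3

ROWS = [
    (1, "swimming"), (1, "running"),
    (2, "swimming"), (2, "running"), (2, "rowing"),
    (3, "swimming"),
    (4, "skydiving"), (4, "running"), (4, "swimming"),
]


def users_with_all_activities_of(user_id):
    """Return the other users who do every activity that `user_id` does."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute("CREATE TABLE activities (user_id INTEGER, activity TEXT)")
    conn.executemany("INSERT INTO activities VALUES (?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT a.user_id
        FROM activities a1
        JOIN activities a
          ON a1.activity = a.activity AND a1.user_id = :uid
        WHERE a.user_id <> :uid
        GROUP BY a.user_id
        -- keep users who matched as many activities as the input user has
        HAVING COUNT(*) = (SELECT COUNT(*) FROM activities a2
                           WHERE a2.user_id = :uid)
        ORDER BY a.user_id
        """,
        {"uid": user_id},
    ).fetchall()
    conn.close()
    return [uid for (uid,) in rows]


print(users_with_all_activities_of(1))
```

For user 1 this returns users 2 and 4, matching the expected output in the question; the count comparison only works if (user_id, activity) pairs are unique, as the answer assumes.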
You can easily get all users ordered by similarity using a JOIN (that finds all common rows) and a GROUP BY (to summarize the similarity per user_id) and finally an ORDER BY to return the most similar users first.
SELECT b.user_id, COUNT(*) similarity
FROM activities a
JOIN activities b
ON a.activity = b.activity
WHERE a.user_id = 1 AND b.user_id != 1
GROUP BY b.user_id
ORDER BY COUNT(*) DESC
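A minimal sketch of the similarity ranking, assuming SQLite through Python's sqlite3 as a stand-in for MySQL. The tie-breaker on b.user_id is added only to make the order deterministic, and the function name similar_users is hypothetical.

```python
import sqlite3

ROWS = [
    (1, "swimming"), (1, "running"),
    (2, "swimming"), (2, "running"), (2, "rowing"),
    (3, "swimming"),
    (4, "skydiving"), (4, "running"), (4, "swimming"),
]


def similar_users(user_id):
    """Return (user_id, shared_activity_count) pairs, most similar first."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute("CREATE TABLE activities (user_id INTEGER, activity TEXT)")
    conn.executemany("INSERT INTO activities VALUES (?, ?)", ROWS)
    # Each joined row is one activity shared by the input user and user b.
    rows = conn.execute(
        """
        SELECT b.user_id, COUNT(*) AS similarity
        FROM activities a
        JOIN activities b ON a.activity = b.activity
        WHERE a.user_id = :uid AND b.user_id != :uid
        GROUP BY b.user_id
        ORDER BY similarity DESC, b.user_id
        """,
        {"uid": user_id},
    ).fetchall()
    conn.close()
    return rows


print(similar_users(1))
```

Unlike the HAVING version, this keeps partial matches such as user 3, so you can decide on a cut-off in the application.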

Order table by presences in a third table

I have a movie database with a table for actors and another one for movies, and I created a third table to record an actor's participation in a movie. I added a field "star" to distinguish leading actors from non-leading actors.
I want to create a list ordered by the actors' importance, that is, by their total number of "stars".
SELECT a.id, a.name, COUNT( p.star ) AS star
FROM actors a
JOIN playing p, movies m
WHERE p.id_actor = a.id
AND p.id_movie = m.id
AND p.star =1
GROUP BY p.star
ORDER BY p.star DESC;
ACTORS
+----+---------+
| id | name |
+----+---------+
| 1 | actor01 |
| 2 | actor02 |
| 3 | actor03 |
+----+---------+
MOVIES
+----+----------+
| id | title |
+----+----------+
| 1 | movie01 |
| 2 | movie02 |
| 3 | movie03 |
+----+----------+
PLAYING
+----------+----------+-------+------+
| id_movie | id_actor | char | star |
+----------+----------+-------+------+
| 1 | 1 | char1 | 0 |
| 1 | 2 | char2 | 1 |
| 2 | 3 | char3 | 1 |
+----------+----------+-------+------+
I need output like:
+----------+--------------+
| actor | protagonist |
+----------+--------------+
| actor01 | 2 times |
| actor02 | 3 times |
+----------+--------------+
You need to fix the GROUP BY clause to group by the actor, not the star column. You also need to fix the ORDER BY to order by the aggregated column, not the original column:
SELECT a.id, a.name, sum( p.star = 1) AS stars
FROM actors a join playing p
on p.id_actor = a.id join
movies m
on p.id_movie = m.id
GROUP BY a.id, a.name
ORDER BY stars DESC;
Along the way, I fixed the FROM clause so it uses proper join syntax (with an ON clause). I also changed the query so it returns every actor who appears in playing, including those who have never been the star (they get a stars value of 0).
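A minimal sketch of this query on the sample data, assuming SQLite through Python's sqlite3 as a stand-in for MySQL. In both SQLite and MySQL the comparison p.star = 1 evaluates to 1 or 0, so SUM over it counts lead roles. The "char" column is quoted defensively, the tie-breaker on a.id is added for a deterministic order, and the function name lead_roles_per_actor is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE playing (id_movie INTEGER, id_actor INTEGER, "char" TEXT, star INTEGER);
INSERT INTO actors VALUES (1, 'actor01'), (2, 'actor02'), (3, 'actor03');
INSERT INTO movies VALUES (1, 'movie01'), (2, 'movie02'), (3, 'movie03');
INSERT INTO playing VALUES (1, 1, 'char1', 0), (1, 2, 'char2', 1), (2, 3, 'char3', 1);
"""


def lead_roles_per_actor():
    """Return (id, name, stars) per actor who played at least once, most stars first."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.executescript(SETUP)
    rows = conn.execute(
        """
        SELECT a.id, a.name, SUM(p.star = 1) AS stars
        FROM actors a
        JOIN playing p ON p.id_actor = a.id
        JOIN movies m ON p.id_movie = m.id
        GROUP BY a.id, a.name      -- one group per actor, not per star value
        ORDER BY stars DESC, a.id  -- order by the aggregate alias
        """
    ).fetchall()
    conn.close()
    return rows


print(lead_roles_per_actor())
```

actor01 still appears with 0 because they played a non-leading role; an actor with no rows in playing at all would be dropped by the inner joins.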
1. If you want to count all stars for an actor, you should group by the actor, not by star. (Unless you want to count how many times an actor gets 1 star in a movie, you probably don't want to group by star.)
2. You may want to use ON with JOIN.
3. You may want to ORDER BY star (the alias) rather than ORDER BY p.star, since you want to order by the aggregated result.
4. You may want to use SUM instead of COUNT to get the star counts. SUM adds up the values, whereas COUNT only counts the non-NULL rows. With SUM, you can set the star value to whatever you want without changing your SQL: you could use star=2 to show the actor is especially important to the movie, or star=-1 to mean the actor stinks.
Have a look at the SQL below (COALESCE turns the NULL sum of an actor with no roles into 0):
SELECT a.id, a.name, COALESCE(SUM(p.star), 0) AS stars
FROM actors a
LEFT JOIN playing p ON p.id_actor = a.id
LEFT JOIN movies m ON p.id_movie = m.id
GROUP BY a.id, a.name
ORDER BY stars DESC;
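To see what the LEFT JOIN changes, here is a minimal sketch that runs the SUM query both ways, assuming SQLite through Python's sqlite3 as a stand-in for MySQL. The extra actor04 row is hypothetical (an actor with no roles, not in the question's data), the join keyword is switched by a fixed flag rather than user input, and the function name star_totals is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE playing (id_movie INTEGER, id_actor INTEGER, "char" TEXT, star INTEGER);
INSERT INTO actors VALUES (1, 'actor01'), (2, 'actor02'), (3, 'actor03');
INSERT INTO actors VALUES (4, 'actor04');  -- hypothetical actor with no roles
INSERT INTO movies VALUES (1, 'movie01'), (2, 'movie02'), (3, 'movie03');
INSERT INTO playing VALUES (1, 1, 'char1', 0), (1, 2, 'char2', 1), (2, 3, 'char3', 1);
"""


def star_totals(left_join=True):
    """Return (id, name, stars) per actor, summing the star column."""
    join = "LEFT JOIN" if left_join else "JOIN"  # fixed keyword, never user input
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.executescript(SETUP)
    rows = conn.execute(
        f"""
        SELECT a.id, a.name, COALESCE(SUM(p.star), 0) AS stars
        FROM actors a
        {join} playing p ON p.id_actor = a.id
        {join} movies m ON p.id_movie = m.id
        GROUP BY a.id, a.name
        ORDER BY stars DESC, a.id
        """
    ).fetchall()
    conn.close()
    return rows


print(star_totals(left_join=True))
print(star_totals(left_join=False))
```

With LEFT JOIN the actor without roles is listed with 0 stars; with inner joins that actor disappears from the list.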

Join two tables using multiple rows in the join

I have two tables
Table: color_document
+----------+---------------------+
| color_id | document_id |
+----------+---------------------+
| 180907 | 4270851 |
| 180954 | 4270851 |
+----------+---------------------+
Table: color_group
+----------------+-----------+
| color_group_id | color_id |
+----------------+-----------+
| 3 | 180954 |
| 4 | 180907 |
| 11 | 180907 |
| 11 | 180984 |
| 12 | 180907 |
| 12 | 180954 |
+----------------+-----------+
Is it possible for a query to get a result like this, using multiple color IDs to join the two tables?
Result
+----------------+--------------+
| color_group_id | document_id |
+----------------+--------------+
| 12 | 4270851 |
+----------------+--------------+
Color group 12 is the only group that has exactly the same set of colors as document 4270851.
I've got some bad data that I'm forced to work with, so I've had to manufacture the color groups by finding each unique set of color_ids associated with a document_id. I'm now trying to create a new relationship directly between my manufactured color groups and the documents.
I know I could probably do something with a GROUP_CONCAT to make a pseudo key of concatenated color ids, but I'm trying to find a solution that would also work in, say, Oracle. Am I barking up the completely wrong tree with this logic?
My ultimate goal is to be able to have a single row in a table that would represent any number of Colors that are associated with a Document to be exported to a completely different system than the one I'm working with.
Any thoughts/comments/suggestions are greatly appreciated.
Thank you in advance for looking at my question.
Do a normal join of the two tables and count the number of rows in each (color_group_id, document_id) pairing. Then check that this count equals both the number of colors in the group and the number of colors on the document. If both match, the two sets of color IDs are identical (assuming neither table contains duplicate rows).
SELECT a.color_group_id, a.document_id
FROM (
SELECT color_group_id, document_id, COUNT(*) ct
FROM color_document d
JOIN color_group g ON d.color_id = g.color_id
GROUP BY color_group_id, document_id) a
JOIN (
SELECT color_group_id, COUNT(*) ct
FROM color_group
GROUP BY color_group_id) b
ON a.color_group_id = b.color_group_id and a.ct = b.ct
JOIN (
SELECT document_id, COUNT(*) ct
FROM color_document
GROUP BY document_id) c
ON a.document_id = c.document_id and a.ct = c.ct
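A minimal sketch that runs this exact-set-match query, assuming SQLite through Python's sqlite3 as a stand-in for MySQL (the query itself uses only standard SQL). Document 4270852 is a hypothetical extra row, not from the question, added to show a second match, and the function name matching_groups is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE color_document (color_id INTEGER, document_id INTEGER);
CREATE TABLE color_group (color_group_id INTEGER, color_id INTEGER);
INSERT INTO color_document VALUES (180907, 4270851), (180954, 4270851);
INSERT INTO color_document VALUES (180954, 4270852);  -- hypothetical document
INSERT INTO color_group VALUES (3, 180954), (4, 180907), (11, 180907),
    (11, 180984), (12, 180907), (12, 180954);
"""

QUERY = """
SELECT a.color_group_id, a.document_id
FROM (
    -- colors shared by each (group, document) pair
    SELECT color_group_id, document_id, COUNT(*) ct
    FROM color_document d
    JOIN color_group g ON d.color_id = g.color_id
    GROUP BY color_group_id, document_id) a
JOIN (
    -- total colors per group
    SELECT color_group_id, COUNT(*) ct
    FROM color_group
    GROUP BY color_group_id) b
  ON a.color_group_id = b.color_group_id AND a.ct = b.ct
JOIN (
    -- total colors per document
    SELECT document_id, COUNT(*) ct
    FROM color_document
    GROUP BY document_id) c
  ON a.document_id = c.document_id AND a.ct = c.ct
ORDER BY a.color_group_id, a.document_id
"""


def matching_groups():
    """Return (color_group_id, document_id) pairs whose color sets are identical."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(matching_groups())
```

Group 11 shares one color with document 4270851 but has two colors in total, so the group-count check rejects it; group 12 passes both checks.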
If I understand your question correctly, you just have to join the two tables and then group the results by color_group_id and document_id.
select color_group_id, document_id
from
color_document cd join
color_group cg
on cd.color_id = cg.color_id
group by color_group_id, document_id
That query will give you this result set:
COLOR_GROUP_ID DOCUMENT_ID
3 4270851
4 4270851
11 4270851
12 4270851
Is that what you want?

Listing user types with counts and percentages

I have a 'users' table:
user_id | prov_platform | first_name | last_name
--------|-----------------|--------------|-------------------
1 | Facebook | Joe | Bloggs
2 | Facebook | Sue | Barker
3 | | John | Doe
4 | Twitter | John | Terry
5 | Google | Angelina | Jolie
And I originally wanted to return a list of all the different social platform types in my users table, with a count beside each one, so I came up with this:
SELECT
IFNULL(prov_platform, 'Other') AS prov_platform,
COUNT(*) AS platform_total
FROM users
GROUP BY prov_platform
ORDER BY platform_total DESC
Which resulted in this:
prov_platform | platform_total
---------------|-----------------
Facebook | 2
Twitter | 1
Google | 1
Other | 1
But I now want to add another couple of fields to this query: 'allround_total' and 'percentage'. So, the above recordset would become:
prov_platform | platform_total | allround_total | percentage
---------------|----------------|----------------|---------------
Facebook | 2 | 5 | 40%
Twitter | 1 | 5 | 20%
Google | 1 | 5 | 20%
Other | 1 | 5 | 20%
This is as far as I got before getting in a muddle:
SELECT
u.prov_platform,
COUNT(*) AS platform_total,
allround_total,
allround_total/platform_total*100 AS percentage
FROM
users AS u
INNER JOIN (
SELECT COUNT(*) AS allround_total FROM users
) AS allround_total
GROUP BY
prov_platform
ORDER BY
platform_total DESC
This returns the 'allround_total' field, which works, but I have no idea how performance-friendly it will be. What I can't work out is how to get the percentage to work correctly. Currently, the above query returns an error:
Unknown column 'platform_total' in 'field list'
I think I'm close, I just need a much appreciated push over the line.
You cannot use a column alias at the same level of the query where it is defined. I also think you have the percentage calculation backwards.
SELECT IFNULL(u.prov_platform, 'Other') AS prov_platform,
COUNT(*) AS platform_total,
const.allround_total,
100*count(*)/const.allround_total AS percentage
FROM users u cross join
(SELECT COUNT(*) as allround_total FROM users
) const
GROUP BY u.prov_platform
ORDER BY platform_total DESC;
I changed the join from INNER JOIN to CROSS JOIN. Although MySQL allows an INNER JOIN without an ON clause, I find it disconcerting to see one. Similarly, I gave the derived table an alias (const) that differs from the column alias, to make the query easier to read.
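A minimal sketch of the cross-join percentage query on the sample users, assuming SQLite through Python's sqlite3 as a stand-in for MySQL. It multiplies by 100.0 so SQLite does not fall back to integer division, lists t.allround_total in GROUP BY so the query stays valid under strict GROUP BY rules, and breaks ties by platform name for a deterministic order; the function name platform_breakdown is hypothetical.

```python
import sqlite3

USERS = [
    (1, "Facebook", "Joe", "Bloggs"),
    (2, "Facebook", "Sue", "Barker"),
    (3, None, "John", "Doe"),  # no platform, reported as 'Other'
    (4, "Twitter", "John", "Terry"),
    (5, "Google", "Angelina", "Jolie"),
]


def platform_breakdown():
    """Return (platform, platform_total, allround_total, percentage) rows."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute(
        "CREATE TABLE users (user_id INTEGER PRIMARY KEY, prov_platform TEXT,"
        " first_name TEXT, last_name TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", USERS)
    # The one-row derived table t is cross joined onto every user row,
    # so each group can divide its own count by the overall total.
    rows = conn.execute(
        """
        SELECT IFNULL(u.prov_platform, 'Other') AS platform,
               COUNT(*) AS platform_total,
               t.allround_total,
               100.0 * COUNT(*) / t.allround_total AS percentage
        FROM users u
        CROSS JOIN (SELECT COUNT(*) AS allround_total FROM users) t
        GROUP BY u.prov_platform, t.allround_total
        ORDER BY platform_total DESC, platform
        """
    ).fetchall()
    conn.close()
    return rows


for row in platform_breakdown():
    print(row)
```

The percentage comes back as a number; appending the "%" sign is easiest in the application layer.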

Using subqueries with multiple results which depend on the main query

I have a persons table called "tc_person" and a marriage table called "tc_marriage". I want to select a few columns from my persons table and one column which represents the id of the partner.
The marriage table holds the pid_1 and pid_2 of two people, but it is important that there is only one entry per couple, and the order of the couple's IDs may vary. Here are the tables:
tc_person:
| id | name | lastname |
--------------------------------------
| 4 | peter | smith |
| 5 | sarah | smith |
tc_marriage:
| id | pid_1 | pid_2 |
--------------------------------------
| 0 | 5 | 4 |
| 1 | 7 | 9 |
It seems that my subquery is evaluated as a whole before the outer SELECT statement, and now I get the error that my subquery returns more than one row.
SELECT p.id, p.name, p.lastname,
(SELECT m.pid_1 FROM tc_marriage m WHERE m.pid_2 = p.id UNION
SELECT m.pid_2 FROM tc_marriage m WHERE m.pid_1 = p.id) as partner_id
FROM tc_person p WHERE p.lastname LIKE 'smith';
I am looking for the following output:
| id | name | lastname | partner_id |
-----------------------------------------------------
| 4 | peter | smith | 5 |
| 5 | sarah | smith | 4 |
Is this even possible with only one single query? You can probably tell by now that I'm quite the SQL noob. Maybe you guys can help.
You can use IN to avoid a UNION (which is typically slower), and a CASE expression to pick out the correct partner_id:
SELECT p.id,
p.name,
p.lastname,
CASE p.id WHEN m.pid_1 THEN m.pid_2 ELSE m.pid_1 END AS partner_id
FROM tc_person p
JOIN tc_marriage m ON p.id IN (m.pid_1, m.pid_2)
WHERE p.lastname LIKE 'smith';
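A minimal sketch of the IN plus CASE join on the sample tables, assuming SQLite through Python's sqlite3 as a stand-in for MySQL. The ORDER BY is added only for a deterministic result, and the function name partners_via_case is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE tc_person (id INTEGER PRIMARY KEY, name TEXT, lastname TEXT);
CREATE TABLE tc_marriage (id INTEGER PRIMARY KEY, pid_1 INTEGER, pid_2 INTEGER);
INSERT INTO tc_person VALUES (4, 'peter', 'smith'), (5, 'sarah', 'smith');
INSERT INTO tc_marriage VALUES (0, 5, 4), (1, 7, 9);
"""


def partners_via_case():
    """Return (id, name, lastname, partner_id) for each married 'smith'."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.executescript(SETUP)
    # The join matches the marriage row whichever column holds p.id;
    # CASE then returns the id from the other column.
    rows = conn.execute(
        """
        SELECT p.id, p.name, p.lastname,
               CASE p.id WHEN m.pid_1 THEN m.pid_2 ELSE m.pid_1 END AS partner_id
        FROM tc_person p
        JOIN tc_marriage m ON p.id IN (m.pid_1, m.pid_2)
        WHERE p.lastname LIKE 'smith'
        ORDER BY p.id
        """
    ).fetchall()
    conn.close()
    return rows


print(partners_via_case())
```

Because it is a join rather than a scalar subquery, there is no "subquery returns more than one row" error; an unmarried person is simply left out (use LEFT JOIN to keep them with a NULL partner_id).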
What you want to do is to join the person table to the marriage table. However, the problem is that you have two keys. One solution is to do two joins and then some logic to choose the right value.
I prefer to "double" the marriage table by swapping the partners. The following query takes this approach in a subquery:
select p.id, p.name, p.lastname, partner_id
from tc_person p join
(select pid_1 as firstid, pid_2 as partner_id
from tc_marriage
union all
select pid_2 as firstid, pid_1 as partner_id
from tc_marriage
) m
on p.id = m.firstid;
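A minimal sketch of the "doubled" marriage table on the sample data, assuming SQLite through Python's sqlite3 as a stand-in for MySQL. The ORDER BY is added only for a deterministic result, and the function name partners_via_union is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE tc_person (id INTEGER PRIMARY KEY, name TEXT, lastname TEXT);
CREATE TABLE tc_marriage (id INTEGER PRIMARY KEY, pid_1 INTEGER, pid_2 INTEGER);
INSERT INTO tc_person VALUES (4, 'peter', 'smith'), (5, 'sarah', 'smith');
INSERT INTO tc_marriage VALUES (0, 5, 4), (1, 7, 9);
"""


def partners_via_union():
    """Return (id, name, lastname, partner_id) using a symmetric marriage list."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.executescript(SETUP)
    # UNION ALL lists every couple in both directions, so a single
    # equality join on firstid finds the partner from either side.
    rows = conn.execute(
        """
        SELECT p.id, p.name, p.lastname, m.partner_id
        FROM tc_person p
        JOIN (SELECT pid_1 AS firstid, pid_2 AS partner_id FROM tc_marriage
              UNION ALL
              SELECT pid_2 AS firstid, pid_1 AS partner_id FROM tc_marriage) m
          ON p.id = m.firstid
        ORDER BY p.id
        """
    ).fetchall()
    conn.close()
    return rows


print(partners_via_union())
```

This returns the same rows as the CASE version; the trade-off is a simple join condition in exchange for reading the marriage table twice.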