Greatest n per group in a many to many relationship - mysql

I know there are many questions similar to this one, but none of them seems to fit my problem. I've spent quite a few hours researching it and came up with a query that doesn't select the last image (more on that later). So, the problem is that I have 3 tables:
table: images
| Field | Type
+-------------------+------------------
| id | int(10) unsigned
| filename | varchar(255)
| created_at | timestamp
| updated_at | timestamp
table: offers
| Field | Type
+-------------------+------------------
| id | int(10) unsigned
| message | varchar(255)
| created_at | timestamp
and a table connecting them: offer_images
| Field | Type
+----------+------------------
| offer_id | int(10) unsigned
| image_id | int(10) unsigned
So, the question is:
How do I select all offers together with the most recently updated image (based on updated_at) from the images table that is linked to each offer? Here is what I have so far:
SELECT `o`.*, `i`.`filename`
FROM `offer_images` AS `oi`
INNER JOIN `offers` AS `o` on `oi`.`offer_id` = `o`.`id`
INNER JOIN `images` as `i` on `oi`.`image_id` = `i`.`id`
GROUP BY `o`.`id`
The query returns every offer and works, except that it ignores the updated_at field.

Try this:
SELECT o.*, i.*
FROM offers AS o
INNER JOIN (
-- Get the latest updated_at date per offer_id
SELECT oi.offer_id, MAX(i.updated_at) AS max_updated_at
FROM offer_images AS oi
INNER JOIN images AS i ON oi.image_id = i.id
GROUP BY oi.offer_id
) AS d ON o.id = d.offer_id
INNER JOIN offer_images AS oi ON oi.offer_id = d.offer_id
INNER JOIN images AS i ON i.id = oi.image_id AND i.updated_at = d.max_updated_at
The query uses a derived table to get the latest updated_at date per offer_id. Using this date we can join back to the images table in order to get the greatest-n-per-group record. If two images of the same offer share that latest updated_at value, both rows are returned.
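To see the derived-table join in action, here is a minimal sketch that runs the same query shape against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL here only so the example is self-contained, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (id INTEGER PRIMARY KEY, filename TEXT, updated_at TEXT);
CREATE TABLE offers (id INTEGER PRIMARY KEY, message TEXT);
CREATE TABLE offer_images (offer_id INTEGER, image_id INTEGER);
-- Invented sample data: offer 1 has two images, offer 2 has one.
INSERT INTO images VALUES
  (1, 'old.jpg',  '2016-01-01 10:00:00'),
  (2, 'new.jpg',  '2016-01-05 10:00:00'),
  (3, 'only.jpg', '2016-01-03 10:00:00');
INSERT INTO offers VALUES (1, 'first offer'), (2, 'second offer');
INSERT INTO offer_images VALUES (1, 1), (1, 2), (2, 3);
""")

LATEST_IMAGE_SQL = """
SELECT o.id, o.message, i.filename
FROM offers AS o
INNER JOIN (
    -- Latest updated_at per offer
    SELECT oi.offer_id, MAX(i.updated_at) AS max_updated_at
    FROM offer_images AS oi
    INNER JOIN images AS i ON oi.image_id = i.id
    GROUP BY oi.offer_id
) AS d ON o.id = d.offer_id
INNER JOIN offer_images AS oi ON oi.offer_id = d.offer_id
INNER JOIN images AS i ON i.id = oi.image_id
                      AND i.updated_at = d.max_updated_at
ORDER BY o.id
"""

latest_rows = conn.execute(LATEST_IMAGE_SQL).fetchall()
print(latest_rows)
# [(1, 'first offer', 'new.jpg'), (2, 'second offer', 'only.jpg')]
```

Offers with no linked image are dropped by the inner joins; switching the joins after `offers` to LEFT JOIN would keep them with NULL image columns.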

Related

SUM() not working correctly when left joining Many to Many table

Note: I see similar SQL questions but nothing specific to MySQL on how to solve this issue.
I have the following query which sums a product value by day for a period based on the sale date from the sales table, products can be filtered based on categories which is why I need to have the left join, categories also need to be displayed along with the rest of the information. Due to project requirements I can not do any processing outside of this MySQL query.
select `sales`.`sell_date` as `date`, SUM(product_value.value) as value from
`sales` left join `products` on `sales`.`product_id` = `products`.`id` left join
`product_value` on `product_value`.`product_id` = `products`.`id` and
`sales`.`sell_date` BETWEEN product_value.date_from AND
IFNULL(product_value.date_to, '2999-01-01')
left join `product_product_category` on `product_product_category`.`product_id`
= `products`.`id` left join `product_categories` on
`product_product_category`.`product_category_id` = `product_categories`.`id`
left join `users` on `sales`.`seller_id` = `users`.`id`
where `sales`.`sell_date` between "2016-02-01" and "2016-02-29" and `product_value`.`deleted_at` is null
and `products`.`id` in ("178") and `sales`.`deleted_at` is null group by
`sales`.`sell_date` order by `sales`.`sell_date` asc
The above query returns a sum that is doubled or tripled when a product has two or three categories. Categories can be things such as color, size, etc.
The sum is correct when I remove the following from the query, which has led me to believe that the many-to-many relationship is causing the issue.
left join `product_product_category` on `product_product_category`.`product_id` =
`products`.`id` left join `product_categories` on
`product_product_category`.`product_category_id` = `product_categories`.`id`
How can I prevent this left join from causing my SUM() to give me the wrong total value?
Using DISTINCT on product_value.value will not work, as product values can be the same for many products.
My tables
sales
ID | sell_date | product_id
----------------------------
2 | 2016-02-15 | 178
product_value
ID | value | date_from | date_to | product_id
-------------------------------------------------
1 | 500 | 2016-01-01 | NULL | 178
2 | 500 | 2015-01-01 | 2015-12-01 | 392
products
ID | name
----------
178 | ProductName
product_product_category
product_id | product_category_id
--------------------------------
178 | 1
178 | 2
product_categories
ID | name
---------
1 | Red
2 | Large
So to make this clear, if I run the above query on these tables I would get value = 1000 but value should be 500. How can I make sure SUM() shows the correct value when joining many to many relationships?
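You can remove the left joins to the category tables and apply the category filters in the WHERE clause as an EXISTS subquery, so the categories no longer multiply the rows being summed: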
...
where
...
and exists (
select 1
from `product_product_category`
inner join `product_categories` on `product_product_category`.`product_category_id` = `product_categories`.`id`
where `product_product_category`.`product_id` = `products`.`id`
and ....
and ....
)
...
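The row multiplication and the EXISTS fix can be reproduced with the sample rows from the question. This is a sketch only: it uses Python's sqlite3 module as a stand-in for MySQL, trims the query to the joins that matter, and the filter `pc.name IN ('Red', 'Large')` is a hypothetical example of a category condition, not something taken from the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE sales (id INTEGER, sell_date TEXT, product_id INTEGER);
CREATE TABLE product_value (id INTEGER, value INTEGER, date_from TEXT,
                            date_to TEXT, product_id INTEGER);
CREATE TABLE product_product_category (product_id INTEGER,
                                       product_category_id INTEGER);
CREATE TABLE product_categories (id INTEGER, name TEXT);
INSERT INTO sales VALUES (2, '2016-02-15', 178);
INSERT INTO product_value VALUES
  (1, 500, '2016-01-01', NULL, 178),
  (2, 500, '2015-01-01', '2015-12-01', 392);
INSERT INTO product_product_category VALUES (178, 1), (178, 2);
INSERT INTO product_categories VALUES (1, 'Red'), (2, 'Large');
""")

VALUE_JOIN = """
FROM sales s
LEFT JOIN product_value pv
       ON pv.product_id = s.product_id
      AND s.sell_date BETWEEN pv.date_from
                          AND IFNULL(pv.date_to, '2999-01-01')
"""

# Joining the category tables repeats each sale once per category.
joined_sql = "SELECT s.sell_date, SUM(pv.value) " + VALUE_JOIN + """
LEFT JOIN product_product_category ppc ON ppc.product_id = s.product_id
LEFT JOIN product_categories pc ON pc.id = ppc.product_category_id
GROUP BY s.sell_date
"""

# Filtering with EXISTS tests the categories without adding rows.
exists_sql = "SELECT s.sell_date, SUM(pv.value) " + VALUE_JOIN + """
WHERE EXISTS (
    SELECT 1
    FROM product_product_category ppc
    INNER JOIN product_categories pc ON pc.id = ppc.product_category_id
    WHERE ppc.product_id = s.product_id
      AND pc.name IN ('Red', 'Large')   -- hypothetical category filter
)
GROUP BY s.sell_date
"""

joined_total = conn.execute(joined_sql).fetchall()
exists_total = conn.execute(exists_sql).fetchall()
print(joined_total)  # [('2016-02-15', 1000)]
print(exists_total)  # [('2016-02-15', 500)]
```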

sql not showing proper result

I have following tables in mySql.
blog
Field Type
---------- ------------
id int(11)
name varchar(255)
user_id int(11)
share int(14)
user_blog_analytics
Field Type
----------- ------------
id int(11)
blog_id int(11)
ip varchar(255)
impressions int(11)
date date
user_profile
Field Type
----------- ------------
id int(11)
user_id int(11)
description text
share int(14)
user_profile_analytics
Field Type
----------- ------------
id int(11)
user_id int(11)
ip varchar(255)
impressions int(11)
date date
users
Field Type
----------- ------------
id int(11)
email varchar(255)
I want a query that gives me the total blog shares of each user from the blog table, the total profile shares of each user from the user_profile table, the total blog views from yesterday (from the user_blog_analytics table), and the all-time profile views from the user_profile_analytics table.
I created a query, but it does not give me the results I expect; it returns only a few rows.
SELECT a.user_id, COUNT(DISTINCT b.ip) AS blog_view_count, a.share AS blog_share_count, c.share AS profile_share_count, COUNT(DISTINCT d.ip) AS user_profile_view
FROM blog AS a
JOIN user_blog_analytics AS b ON b.blog_id=a.id
JOIN user_profile AS c ON c.user_id=a.user_id
JOIN user_profile_analytics AS d ON d.user_id=c.user_id
JOIN users AS e ON e.id=a.user_id
WHERE DATE_SUB(CURDATE(), INTERVAL 1 DAY) = b.date AND e.role_id=2
GROUP BY a.id;
When I run this query it gives me only one result, but when I checked the tables manually it should give me at least 2. Where am I going wrong, and how can I modify this query to get the expected result?
Try this:
SELECT u.id, u.email, b.blog_share_count, b.blog_view_count,
up.profile_share_count, upa.user_profile_view
FROM users u
LEFT JOIN (SELECT b.user_id, SUM(b.share) AS blog_share_count, COUNT(DISTINCT uba.ip) AS blog_view_count
FROM blog b
LEFT JOIN user_blog_analytics AS uba ON uba.blog_id = b.id AND DATE_SUB(CURDATE(), INTERVAL 1 DAY) = uba.date
GROUP BY b.user_id
) b ON u.id = b.user_id
LEFT JOIN (SELECT up.user_id, SUM(up.share) AS profile_share_count
FROM user_profile up
GROUP BY up.user_id
) up ON u.id = up.user_id
LEFT JOIN (SELECT up.user_id, COUNT(DISTINCT up.ip) AS user_profile_view
FROM user_profile_analytics up
GROUP BY up.user_id
) upa ON u.id = upa.user_id
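One thing worth checking with this pattern is that each derived table aggregates a single table (or a single one-to-many chain). If blog is joined to user_blog_analytics before SUM(share) is computed, each blog row is repeated once per analytics row and the share total is inflated. The sketch below, run on SQLite via Python's sqlite3 as a stand-in for MySQL, uses invented rows and leaves out the date filter to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE blog (id INTEGER PRIMARY KEY, user_id INTEGER, share INTEGER);
CREATE TABLE user_blog_analytics (id INTEGER PRIMARY KEY, blog_id INTEGER,
                                  ip TEXT);
-- Invented sample data: user 1 has two blogs, user 2 has none.
INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO blog VALUES (1, 1, 3), (2, 1, 4);
INSERT INTO user_blog_analytics VALUES
  (1, 1, '10.0.0.1'), (2, 1, '10.0.0.2'), (3, 2, '10.0.0.1');
""")

# Shares and views aggregated in one join: blog 1 is counted twice.
mixed_sql = """
SELECT b.user_id, SUM(b.share), COUNT(DISTINCT a.ip)
FROM blog b
LEFT JOIN user_blog_analytics a ON a.blog_id = b.id
GROUP BY b.user_id
"""

# Shares and views aggregated separately, then joined to users.
separate_sql = """
SELECT u.id, s.blog_share_count, v.blog_view_count
FROM users u
LEFT JOIN (SELECT user_id, SUM(share) AS blog_share_count
           FROM blog
           GROUP BY user_id) s ON s.user_id = u.id
LEFT JOIN (SELECT b.user_id, COUNT(DISTINCT a.ip) AS blog_view_count
           FROM blog b
           JOIN user_blog_analytics a ON a.blog_id = b.id
           GROUP BY b.user_id) v ON v.user_id = u.id
ORDER BY u.id
"""

mixed_rows = conn.execute(mixed_sql).fetchall()
separate_rows = conn.execute(separate_sql).fetchall()
print(mixed_rows)     # [(1, 10, 2)]  share total inflated from 7 to 10
print(separate_rows)  # [(1, 7, 2), (2, None, None)]
```

Starting from users with LEFT JOINs also keeps users that have no blog rows at all, which an inner join chain would silently drop.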
Not an answer, but something to think about...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table(i14 INT(14),i4 INT(4));
INSERT INTO my_table VALUES (123456789012345,123456789012345);
SELECT * FROM my_table;
+------------+------------+
| i14 | i4 |
+------------+------------+
| 2147483647 | 2147483647 |
+------------+------------+
So, the numbers in parentheses ain't doing much for ya!

Normalization made my queries slower

Before normalization I had a column called genre, and it contained values like "Action, Thriller, Comedy".
Now I have normalized the genre column by creating genre and movie2genre tables.
The problem now is that my queries are more complicated and are actually slower.
These two queries basically search for movies that are both action and thriller:
Old query
select title, genre from movie where genre like '%action%' and genre like '%thriller%'
0.062 sec duration / 0.032 sec fetch
New Query
SELECT movie.title, movie.genre
FROM Movie
Where
EXISTS (
select *
from movie2genre
JOIN Genre on Genre.id = movie2genre.GenreId
where Movie.id = movie2genre.MovieId
and genre in ('action', 'thriller')
)
0.328 sec duration / 0.078 sec fetch
Am I doing something wrong?
More info:
Movie
+-------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| Title | varchar(345) | YES | | NULL | |
ETC....
Genre
+---------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+----------------+
| genreid | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(50) | YES | | NULL | |
+---------+-------------+------+-----+---------+----------------+.
movie2genre
+---------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+-------+
| movieid | int(11) | YES | | NULL | |
| genreid | int(11) | YES | | NULL | |
+---------+---------+------+-----+---------+-------+
Try this without correlated subqueries (check the execution plans of both queries if you are concerned about performance). Also make sure you have proper indexes on your new tables.
SELECT *
FROM movie2genre mg, Genre g, Movie m
WHERE m.id = mg.MovieId
AND g.id = mg.GenreId
AND g.genre in ('action', 'thriller')
First, your two queries are not the same. The newer version does an or rather than an and, so the difference in time could simply be returning a larger result set. In addition, your new query refers to movie.genre, a column that wouldn't exist in a normalized database.
You seem to be asking for:
select m.title
from Movie m
where exists (select 1
from movie2genre m2g JOIN
Genre g
on g.id = m2g.GenreId
where m.id = m2g.MovieId and g.genre = 'action'
) and
exists (select 1
from movie2genre m2g JOIN
Genre g
on g.id = m2g.GenreId
where m.id = m2g.MovieId and g.genre = 'thriller'
);
Admittedly, you probably will not think this solves the "complication" problem. Leaving that aside, you need to have indexes for this to work well. Do you have the "obvious" indexes of: movie2genre(MovieId, GenreId) and genre(GenreId)?
Second, your data is not particularly large (judging by the duration for the queries). So, a full table scan may be more efficient than the joining and filtering with these tables. As the database grows, the normalized approach will often be faster.
A more equivalent query is:
select m.title, group_concat(g.genre)
from Movie m join
movie2genre m2g
on m.id = m2g.movieid join
genre g
on g.genreid = m2g.genreid
group by m.title
having sum(g.genre = 'action') > 0 and sum(g.genre = 'thriller') > 0;
Because of the nature of your particular query -- you need to fetch all genres on a movie so you cannot filter on them -- this particular query is probably going to perform less well than the unnormalized version.
By the way, normalization is more about keeping data consistent than about speeding queries. Normalized databases require more join operations. Indexes can help performance, but there is still work in doing the join. In some cases, the tables themselves are bigger than the unnormalized forms. And, normalized databases may require aggregation where none is required for non-normalized database. All of these can affect performance, which is why in many decision support architectures, the central database is normalized but the application-specific databases are not.
Indexes are vitally important when doing joins (and subqueries, derived tables in particular, tend to lose the benefit of indexing).
There are 2 ways I would suggest trying.
Firstly, you join Movie to movie2genre once for each genre you are checking, with a matching join to Genre each time. Well indexed, this should be fast.
SELECT movie.title,
movie.genre
FROM Movie
INNER JOIN movie2genre MG1
ON Movie.id = MG1.MovieId
INNER JOIN Genre G1
ON G1.id = MG1.GenreId
AND G1.genre = 'action'
INNER JOIN movie2genre MG2
ON Movie.id = MG2.MovieId
INNER JOIN Genre G2
ON G2.id = MG2.GenreId
AND G2.genre = 'thriller'
An alternative is to use IN, and use the aggregate COUNT function to check that the number of genres found is the same as the number expected.
SELECT movie.title,
movie.genre
FROM Movie
INNER JOIN movie2genre
ON Movie.id = movie2genre.MovieId
INNER JOIN Genre
ON Genre.id = movie2genre.GenreId
AND Genre.genre IN ('action', 'thriller')
GROUP BY movie.title, movie.genre
HAVING COUNT(DISTINCT movie2genre.GenreId) = 2
I would prefer the 1st solution, but it is a bit more complicated to set up the SQL for in code (ie, the SQL varies greatly depending on the number of genres), and potentially is limited by the max number of table joins if you are checking for lots of genres.
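The difference between matching any of the listed genres and matching all of them is easy to check on a tiny data set. The following sketch uses Python's sqlite3 module as a stand-in for MySQL; the column names follow the answers above (Movie.id, Genre.id, Genre.genre) and the three movies are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE Movie (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Genre (id INTEGER PRIMARY KEY, genre TEXT);
CREATE TABLE movie2genre (MovieId INTEGER, GenreId INTEGER,
                          PRIMARY KEY (MovieId, GenreId));
-- Invented data: Alpha is action + thriller, Beta is comedy,
-- Gamma is action only.
INSERT INTO Movie VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO Genre VALUES (1, 'action'), (2, 'thriller'), (3, 'comedy');
INSERT INTO movie2genre VALUES (1, 1), (1, 2), (2, 3), (3, 1);
""")

# EXISTS with IN: true as soon as ONE of the genres matches (an OR).
any_sql = """
SELECT m.title
FROM Movie m
WHERE EXISTS (
    SELECT 1
    FROM movie2genre mg
    JOIN Genre g ON g.id = mg.GenreId
    WHERE mg.MovieId = m.id
      AND g.genre IN ('action', 'thriller')
)
ORDER BY m.title
"""

# IN plus HAVING COUNT(DISTINCT ...) = 2: BOTH genres must be present.
all_sql = """
SELECT m.title
FROM Movie m
JOIN movie2genre mg ON mg.MovieId = m.id
JOIN Genre g ON g.id = mg.GenreId
WHERE g.genre IN ('action', 'thriller')
GROUP BY m.id, m.title
HAVING COUNT(DISTINCT g.id) = 2
ORDER BY m.title
"""

any_titles = [row[0] for row in conn.execute(any_sql)]
all_titles = [row[0] for row in conn.execute(all_sql)]
print(any_titles)  # ['Alpha', 'Gamma']
print(all_titles)  # ['Alpha']
```

The composite primary key on movie2genre doubles as the (MovieId, GenreId) index recommended above.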

mysql join tables with an if condition?

I'm trying to join 5 tables that look somewhat like this:
post table
ID | product | user-us | make-id | dealer-id | pending | .... 30 other columns ... |
make table
ID | make |
state table
prefix | state | city | zip
members table
ID | name | email | password | zip | dealer-id
dealer table
ID | dealer name | city | state | zip | address | phone
MySql query looks like this
SELECT *
FROM `post` AS p
JOIN (`make` AS m, `state` AS s, `members` AS mb, `dealer` AS d)
ON (p.Make = m.id AND p.state = s.id AND p.id = mb.id AND mb.dealer-id = d.id)
WHERE p.pending != '1'
The problem is that this query only returns rows where member.dealer-id = dealer.id. If I use a LEFT JOIN, it returns all the correct rows, but all columns from the make, state, members and dealer tables are NULL, which they shouldn't be because I need the info in those tables.
Is there a way I can join the dealer table only if member.dealer-id is > 0? I could add a row in the dealer table with id 0, but there has to be a better way.
Once you code your joins in the normal way, use LEFT JOIN only on the dealer table:
SELECT *
FROM post AS p
JOIN make AS m ON p.Make = m.id
JOIN state AS s ON p.state = s.id
JOIN members AS mb ON p.id = mb.id
LEFT JOIN dealer AS d ON mb.dealer_id = d.id
WHERE p.pending != '1'
Because it is a LEFT JOIN, every row is kept: the dealer columns are filled in when member.dealer-id matches a dealer (i.e. is greater than zero) and are NULL otherwise.
btw, I have never seen a query join coded like yours before. If I had, I would have assumed it would not execute due to a syntax error - it looks that strange.
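A small sketch of the effect of using LEFT JOIN on the dealer table only, run on SQLite via Python's sqlite3 as a stand-in for MySQL. The two members and the dealer are invented, and the tables are cut down to the columns involved in the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT, dealer_id INTEGER);
CREATE TABLE dealer (id INTEGER PRIMARY KEY, dealer_name TEXT);
-- Invented data: alice belongs to dealer 7, bob has no dealer (0).
INSERT INTO dealer VALUES (7, 'Acme Motors');
INSERT INTO members VALUES (1, 'alice', 7), (2, 'bob', 0);
""")

# Inner join: members without a matching dealer disappear.
inner_rows = conn.execute("""
SELECT mb.name, d.dealer_name
FROM members mb
JOIN dealer d ON mb.dealer_id = d.id
ORDER BY mb.id
""").fetchall()

# Left join: every member is kept, dealer columns are NULL if unmatched.
left_rows = conn.execute("""
SELECT mb.name, d.dealer_name
FROM members mb
LEFT JOIN dealer d ON mb.dealer_id = d.id
ORDER BY mb.id
""").fetchall()

print(inner_rows)  # [('alice', 'Acme Motors')]
print(left_rows)   # [('alice', 'Acme Motors'), ('bob', None)]
```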

Forcing all rows from first table of a join

I have three tables, machines holding vending machines, products holding all possible products, and machines_products which is the intersection of the two, giving how many of each product line is stocked in a particular machine. If a product is not stocked in a machine, there is no corresponding row in the third table.
DESCRIBE machines_products;
+------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| machine_id | int(10) unsigned | NO | PRI | 0 | |
| product_id | int(10) unsigned | NO | PRI | 0 | |
| quantity | int(10) unsigned | NO | | 0 | |
+------------+------------------+------+-----+---------+-------+
Each product has a category (think chocolate bars vs. drinks bottles) and a machine knows what category of products it can vend. I want a result table of all products for the category, with a quantity for a specific machine. I have got as far as this:
SELECT products.*, SUM(quantity) qty
FROM products
LEFT JOIN machines_products USING (product_id)
WHERE machine_id=m AND category_id=c
GROUP BY product_id;
The problem is that this filters out all rows where there is no quantity, whereas what I want is all rows from the left table, and NULL/0 in the qty column if there are no corresponding rows in the right-hand table.
BTW: this is not a homework question! I am 30 and sitting in my office :o)
SELECT p.*
, SUM(mp.quantity) AS qty
FROM products p
LEFT JOIN machines_products mp
ON mp.product_id = p.product_id
AND mp.machine_id = m -- this condition moved from WHERE to ON
WHERE p.category_id = c
GROUP BY p.product_id
Actually I figured out the answer a short while after posting. The trick is to avoid specifying either of the columns from the third table's primary key (i.e. machine_id and product_id) in the WHERE clause. By using an AND in the JOIN's ON condition, and specifying the machine ID there, I get the result I was looking for.
SELECT products.*, quantity
FROM products
LEFT JOIN machines_products
ON products.product_id=machines_products.product_id
AND machine_id=m
WHERE category_id=c
The COALESCE() function suggested by Brendan was not necessary in my case, since I check the value with PHP's empty() function, so NULL is fine.
As it turns out, there was never a need for GROUP BY, which I had been playing with when posting the question.
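The difference between filtering in WHERE and filtering in ON can be checked with a few rows. This is a sketch on SQLite via Python's sqlite3, standing in for MySQL; the products, machine 5 and category 10 are invented values used in place of m and c.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT,
                       category_id INTEGER);
CREATE TABLE machines_products (machine_id INTEGER, product_id INTEGER,
                                quantity INTEGER,
                                PRIMARY KEY (machine_id, product_id));
-- Invented data: machine 5 stocks cola only; water is in machine 6.
INSERT INTO products VALUES (1, 'cola', 10), (2, 'water', 10),
                            (3, 'crisps', 20);
INSERT INTO machines_products VALUES (5, 1, 12), (6, 2, 4);
""")

# machine_id in WHERE: rows with NULL machine_id are filtered out,
# so the LEFT JOIN behaves like an inner join.
where_rows = conn.execute("""
SELECT p.name, mp.quantity
FROM products p
LEFT JOIN machines_products mp ON mp.product_id = p.product_id
WHERE mp.machine_id = ? AND p.category_id = ?
ORDER BY p.product_id
""", (5, 10)).fetchall()

# machine_id in ON: unmatched products survive with NULL quantity.
on_rows = conn.execute("""
SELECT p.name, mp.quantity
FROM products p
LEFT JOIN machines_products mp ON mp.product_id = p.product_id
                              AND mp.machine_id = ?
WHERE p.category_id = ?
ORDER BY p.product_id
""", (5, 10)).fetchall()

# Same query with COALESCE, for callers that want 0 instead of NULL.
zero_rows = conn.execute("""
SELECT p.name, COALESCE(mp.quantity, 0)
FROM products p
LEFT JOIN machines_products mp ON mp.product_id = p.product_id
                              AND mp.machine_id = ?
WHERE p.category_id = ?
ORDER BY p.product_id
""", (5, 10)).fetchall()

print(where_rows)  # [('cola', 12)]
print(on_rows)     # [('cola', 12), ('water', None)]
print(zero_rows)   # [('cola', 12), ('water', 0)]
```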
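SUM ignores NULL values, but it returns NULL when every value in the group is NULL, which is what happens for a product with no matching row. COALESCE the value first and then SUM. The machine_id condition also has to sit in the ON clause, otherwise the WHERE clause discards the unmatched rows:
SELECT p.*, SUM(COALESCE(mp.quantity, 0)) AS qty
FROM products p
LEFT JOIN machines_products mp ON mp.product_id = p.id
AND mp.machine_id = m
WHERE p.category_id = c
GROUP BY p.id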
I assumed you have a column in products called id. Rename if it's something different...
SELECT p.id, SUM(mp.quantity) AS qty
FROM products p
LEFT JOIN machines_products mp ON p.id=mp.product_id
AND mp.machine_id=m
WHERE p.category_id=c
GROUP BY p.id;