I have three tables: articles, tags and articles_tags. As you can imagine, each article can have multiple tags and each tag can be assigned to multiple articles. I have a so-called main article (identified by its unique URL) and would like to get its related articles, based on the tags they share: if the main article and article 2 have at least one tag in common, show both articles (ideally, the main article itself would be excluded from the results). The unique URL of the main article is passed into the SQL query.
The expected result is beyond my reach, so any help would be appreciated.
SQLFiddle
Copied code, if site above goes offline:
Tables and content:
CREATE TABLE `articles` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`status` tinyint(4) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tags` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tag` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `articles_tags` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`article_id` int(11) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `articles` (`url`, `title`, `status`) VALUES
('test-article-1', 'Test Article #1', 1),
('test-article-2', 'Test Article #2', 1),
('test-article-3', 'Test Article #3', 0),
('test-article-4', 'Test Article #4', 0),
('test-article-5', 'Test Article #5', 1);
INSERT INTO `tags` (`tag`, `url`) VALUES
('Test', 'test'),
('City', 'city'),
('Nature', 'nature');
INSERT INTO `articles_tags` (`article_id`, `tag_id`) VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 2),
(3, 1),
(3, 2),
(4, 2),
(5, 1);
Latest (not working properly) SQL query:
SELECT
tags.tag,
articles.url,
articles.title
FROM articles
LEFT JOIN articles_tags ON articles_tags.article_id=articles.id
LEFT JOIN tags ON articles_tags.tag_id=tags.id
WHERE (articles.url='test-article-1'
OR tags.id IN (articles_tags.tag_id))
AND articles.status=1
GROUP BY articles.id
Result:
As you can see on SQLFiddle, it shows articles 1, 2 and 5, but in my mind it should show only 1 and 5
Expected Result: Articles 1 and 5, ideally only 5 (excluding article 1 because it's the main one).
I am not quite sure why you didn't expect article 2 in your results, as it and article 1 both have tag 2. The query below will still return article 2, so it may not be what you want, but it is the most straightforward "similarly tagged ranking" query I can think of:
SELECT b.*, COUNT(1) AS tagMatches
FROM articles AS a
INNER JOIN articles_tags AS aTags ON a.id=aTags.article_id
INNER JOIN articles_tags AS bTags
ON aTags.article_id<>bTags.article_id
AND aTags.tag_id = bTags.tag_id
INNER JOIN articles AS b ON bTags.article_id = b.id
WHERE a.url = ?
GROUP BY b.id
ORDER BY tagMatches DESC, b.title
;
Edit: This assumes articles cannot have the same tag more than once. If this is not the case, it will skew the rankings (but that might be favorable if the duplicated tags should have more weight).
Edit2: It is also worth noting, that * probably should not be used for final results; I just used it here for simplicity.
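A quick way to check the ranking query against the sample data from the question is sketched below. It is only a sketch: Python's built-in sqlite3 stands in for MySQL, the table definitions are simplified, and the optional `active_only` flag (which applies the asker's `status=1` filter to the related articles) is a hypothetical addition, not part of the answer above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the example runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, url TEXT, title TEXT, status INTEGER);
CREATE TABLE articles_tags (id INTEGER PRIMARY KEY, article_id INTEGER, tag_id INTEGER);
INSERT INTO articles (id, url, title, status) VALUES
  (1, 'test-article-1', 'Test Article #1', 1),
  (2, 'test-article-2', 'Test Article #2', 1),
  (3, 'test-article-3', 'Test Article #3', 0),
  (4, 'test-article-4', 'Test Article #4', 0),
  (5, 'test-article-5', 'Test Article #5', 1);
INSERT INTO articles_tags (article_id, tag_id) VALUES
  (1, 1), (1, 2), (1, 3), (2, 2), (3, 1), (3, 2), (4, 2), (5, 1);
""")

# Same self-join idea as the answer: pair every tag row of the main article
# with the rows of other articles that carry the same tag, then count pairs.
RELATED_SQL = """
SELECT b.url, COUNT(*) AS tagMatches
FROM articles AS a
INNER JOIN articles_tags AS aTags ON a.id = aTags.article_id
INNER JOIN articles_tags AS bTags
        ON aTags.article_id <> bTags.article_id
       AND aTags.tag_id = bTags.tag_id
INNER JOIN articles AS b ON bTags.article_id = b.id
WHERE a.url = ?
  AND (? = 0 OR b.status = 1)  -- hypothetical switch for the status filter
GROUP BY b.id, b.url, b.title
ORDER BY tagMatches DESC, b.title
"""

def related(url, active_only=False):
    """Return (url, shared tag count) rows for articles related to `url`."""
    return conn.execute(RELATED_SQL, (url, int(active_only))).fetchall()

all_related = related("test-article-1")
active_related = related("test-article-1", active_only=True)
print(all_related)
print(active_related)
```

With the sample rows, article 3 shares two tags with the main article and ranks first; with the status filter on, only articles 2 and 5 remain, and the main article never appears because of the `<>` condition.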
Your condition OR tags.id IN (articles_tags.tag_id) is true for every row where the join to tags found a match, including these rows:
INSERT INTO `articles_tags` (`article_id`, `tag_id`) VALUES
(1, 1),
...
(3, 1),
...,
(5, 1);
so the result looks correct to me: the query returns every article with status=1 that has at least one tag.
I have a MYSQL table called tbl_product
Another table called tb_opciones_productos
And a third table called tb_opciones
I need to show every item from tbl_product as follows:
How can I get one item from tbl_product together with the matching rows from tb_opciones_productos to produce the needed result?
EDIT:
This is my current query proposal:
SELECT tbl_product.*,
GROUP_CONCAT( (SELECT CONCAT(tb_opciones.nombre, "(+$", tb_opciones.precio, ")")
FROM tb_opciones WHERE tb_opciones.id_opcion = tb_opciones_productos.id_opcion) SEPARATOR "<br>" ) as options FROM tbl_product
INNER JOIN tb_opciones_productos ON tbl_product.id = tb_opciones_productos.producto
I've created a small SQL Fiddle to test: http://sqlfiddle.com/#!9/fc3316/16
You can GROUP_CONCAT a sub-query. It may not be optimized, but it does the job.
PS: next time, can you provide a sample structure?
Structure:
CREATE TABLE IF NOT EXISTS `products` (
`id` int(6) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `products` (`id`, `name`) VALUES
(1, 'Product Lorem'),
(2, 'Product Ipsum');
CREATE TABLE IF NOT EXISTS `products_options` (
`id_product` int(6) unsigned NOT NULL,
`id_option` int(6) unsigned NOT NULL,
PRIMARY KEY (`id_product`, `id_option`)
) DEFAULT CHARSET=utf8;
INSERT INTO `products_options` (`id_product`, `id_option`) VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 3);
CREATE TABLE IF NOT EXISTS `options` (
`id` int(6) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
`value` double NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `options` (`id`, `name`, `value`) VALUES
(1, 'Option A', 42),
(2, 'Option B', 6),
(3, 'Option C', 12);
Query:
SELECT products.*,
GROUP_CONCAT(options.name, " (+$", options.value, ")" SEPARATOR "<br>")
FROM products
INNER JOIN products_options
ON products.id = products_options.id_product
INNER JOIN options
ON products_options.id_option = options.id
GROUP BY products.id
With your structure, I think this one will work:
SELECT tbl_product.*,
GROUP_CONCAT(tb_opciones.nombre, " (+$", tb_opciones.precio, ")" SEPARATOR "<br>")
FROM tbl_product
INNER JOIN tb_opciones_productos
ON tbl_product.id = tb_opciones_productos.producto
INNER JOIN tb_opciones
ON tb_opciones_productos.id_opcion = tb_opciones.id_opcion
GROUP BY tbl_product.id
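The join-plus-GROUP_CONCAT approach can be tried out on the sample structure with the rough sketch below. It makes some assumptions: Python's built-in sqlite3 is used in place of MySQL, so string concatenation is written with `||` and the separator is passed as a second argument instead of MySQL's `SEPARATOR` keyword, and `value` is stored as an integer to keep the output short.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the schema is a simplified copy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE options (id INTEGER PRIMARY KEY, name TEXT NOT NULL, value INTEGER NOT NULL);
CREATE TABLE products_options (
  id_product INTEGER NOT NULL,
  id_option INTEGER NOT NULL,
  PRIMARY KEY (id_product, id_option)
);
INSERT INTO products (id, name) VALUES (1, 'Product Lorem'), (2, 'Product Ipsum');
INSERT INTO options (id, name, value) VALUES
  (1, 'Option A', 42), (2, 'Option B', 6), (3, 'Option C', 12);
INSERT INTO products_options (id_product, id_option) VALUES
  (1, 1), (1, 2), (1, 3), (2, 3);
""")

# One output row per product; its options are folded into a single string.
rows = conn.execute("""
SELECT products.name,
       GROUP_CONCAT(options.name || ' (+$' || options.value || ')', '<br>') AS option_list
FROM products
INNER JOIN products_options ON products.id = products_options.id_product
INNER JOIN options ON products_options.id_option = options.id
GROUP BY products.id, products.name
ORDER BY products.id
""").fetchall()

# The order inside the concatenated string is not guaranteed without an
# explicit ordering, so the pieces are sorted before being compared or shown.
options_by_product = {name: sorted(text.split("<br>")) for name, text in rows}
print(options_by_product)
```

Products without any option would be dropped by the INNER JOINs; switching both joins to LEFT JOIN is the usual way to keep them.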
I have a simple click tracking system that consists of three tables: "tracking" (which holds unique views), "views" (which holds raw views) and "products" (which holds products).
Here's how it works: each time a user clicks on a tracking link, if the hash present in the link does not exist in the database, it is saved in the "tracking" table as a unique view and also in the "views" table as a raw view. If the hash present in the link already exists in the database, it is saved only in the "views" table. So the number of "raw views" can never be smaller than the number of "unique views", because each "unique view" also counts as a "raw view".
I wrote a query to create reports based on products, but the number of "raw views" it returns is not correct.
I've also created a fiddle, which I hope gives a better overview of my problem.
Here's the table structure:
CREATE TABLE `products` (
`id` int(10) UNSIGNED NOT NULL,
`name` varchar(128) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `products` (`id`, `name`) VALUES
(1, 'Test product');
CREATE TABLE `tracking` (
`id` int(10) UNSIGNED NOT NULL,
`product_id` int(11) NOT NULL,
`hash` varchar(32) NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `tracking` (`id`, `product_id`, `hash`, `created`) VALUES
(1, 1, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:50:19'),
(2, 1, '00bb28eaf259ba0c932d67f649d90783', '2020-02-09 18:55:34');
CREATE TABLE `views` (
`id` int(10) UNSIGNED NOT NULL,
`hash` varchar(32) NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `views` (`id`, `hash`, `created`) VALUES
(1, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:46:30'),
(2, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:46:30'),
(3, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:46:35'),
(4, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:46:42'),
(5, '00bb28eaf259ba0c932d67f649d90783', '2020-02-09 18:56:31'),
(6, '00bb28eaf259ba0c932d67f649d90783', '2020-02-09 18:57:01');
And here's the query I wrote so far:
SELECT products.name AS `param`,
SUM(IF(tracking.product_id<>24, 1, 0)) AS `uniques`,
IF(SUM(IF(tracking.product_id<>24, 1, 0))=0, 0,
(SELECT COUNT(`hash`)
FROM `views` WHERE tracking.hash = views.hash)) AS `views`
FROM tracking
LEFT JOIN products ON products.id = tracking.product_id
WHERE tracking.created BETWEEN '2019-01-01 00:00:00' AND '2020-02-10 00:00:00'
GROUP BY products.name
As you can see I have 2 unique views and 6 raw views (4 for one hash and 2 for the other hash).
I expected the query to return 2 uniques and 6 raw views for this product, but instead I'm getting 2 uniques and 4 raw views, as if it were counting the views only for the first hash.
The following query should solve your problem:
SELECT
products.name,
COUNT(DISTINCT `tracking`.`hash`) AS `uniques`, -- count unique hashes
COUNT(*) AS `views` -- count total
FROM `tracking`
JOIN `views` ON `views`.hash = tracking.hash
LEFT JOIN products ON products.id = tracking.product_id
WHERE tracking.created BETWEEN '2019-01-01 00:00:00' AND '2020-02-10 00:00:00'
GROUP BY products.name;
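To see what the joined query returns for the sample rows, here is a small check. It is a sketch under assumptions: Python's built-in sqlite3 replaces MySQL, timestamps are stored as plain text, and the second column is named `raw_views` only to avoid clashing with the `views` table name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column types are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tracking (id INTEGER PRIMARY KEY, product_id INTEGER, hash TEXT, created TEXT);
CREATE TABLE views (id INTEGER PRIMARY KEY, hash TEXT, created TEXT);
INSERT INTO products (id, name) VALUES (1, 'Test product');
INSERT INTO tracking (id, product_id, hash, created) VALUES
  (1, 1, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:50:19'),
  (2, 1, '00bb28eaf259ba0c932d67f649d90783', '2020-02-09 18:55:34');
INSERT INTO views (id, hash, created) VALUES
  (1, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:46:30'),
  (2, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:46:30'),
  (3, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:46:35'),
  (4, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:46:42'),
  (5, '00bb28eaf259ba0c932d67f649d90783', '2020-02-09 18:56:31'),
  (6, '00bb28eaf259ba0c932d67f649d90783', '2020-02-09 18:57:01');
""")

# Joining tracking to views yields one row per raw view, so COUNT(*) counts
# raw views while COUNT(DISTINCT hash) counts unique ones.
report = conn.execute("""
SELECT products.name,
       COUNT(DISTINCT tracking.hash) AS uniques,
       COUNT(*) AS raw_views
FROM tracking
JOIN views ON views.hash = tracking.hash
LEFT JOIN products ON products.id = tracking.product_id
WHERE tracking.created BETWEEN '2019-01-01 00:00:00' AND '2020-02-10 00:00:00'
GROUP BY products.name
""").fetchall()
print(report)
```

The date filter applies to when the hash was first tracked, not to when each raw view happened; filtering on `views.created` instead would be a different report.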
I am working on a comment system, and I have to count all replies to a single comment across several levels.
Like this:
Parent
->child
-> child
Parent
-> child
-> child
->child
My SQL is:
CREATE TABLE IF NOT EXISTS `comments` (
`id` bigint(11) NOT NULL AUTO_INCREMENT COMMENT 'This is primary key of the table',
`parent_id` bigint(11) NOT NULL,
`content` text NOT NULL,
PRIMARY KEY (`id`),
KEY `parent_id` (`parent_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=8 ;
INSERT INTO `comments` (`id`, `parent_id`, `content`) VALUES
(1, 0, 'Parent'),
(2, 1, 'child'),
(3, 2, 'child'),
(4, 3, 'child'),
(5, 1, 'child2'),
(6, 0, 'Parent2'),
(7, 6,'child of parent2');
Try the query below:
select count(*)
from comments c0
join comments c1 on c0.id = c1.parent_id
-- in case the child comment doesn't have any children, we still need to keep it
left join comments c2 on c1.id = c2.parent_id
where c0.id = 1 -- particular id for which we want to count children
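Self-joins like the ones above reach a fixed number of levels. If the replies can nest to any depth, one alternative (not the approach of the answer above) is a recursive common table expression. The sketch below assumes a database that supports `WITH RECURSIVE` (MySQL 8.0 and later does; older MySQL versions do not) and uses Python's built-in sqlite3 so it can run as-is; `count_replies` is a hypothetical helper name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL 8.0+; both accept WITH RECURSIVE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER NOT NULL, content TEXT NOT NULL);
INSERT INTO comments (id, parent_id, content) VALUES
  (1, 0, 'Parent'),
  (2, 1, 'child'),
  (3, 2, 'child'),
  (4, 3, 'child'),
  (5, 1, 'child2'),
  (6, 0, 'Parent2'),
  (7, 6, 'child of parent2');
""")

# Start from the direct replies, then keep adding replies to rows already found.
COUNT_SQL = """
WITH RECURSIVE descendants(id) AS (
  SELECT id FROM comments WHERE parent_id = ?
  UNION ALL
  SELECT c.id
  FROM comments AS c
  JOIN descendants AS d ON c.parent_id = d.id
)
SELECT COUNT(*) FROM descendants
"""

def count_replies(comment_id):
    """Count replies to `comment_id` at every nesting level."""
    return conn.execute(COUNT_SQL, (comment_id,)).fetchone()[0]

print(count_replies(1), count_replies(6), count_replies(4))
```

For comment 1 this walks 2 and 5, then 3, then 4. A cycle in `parent_id` would make the recursion run without end, so this assumes the data forms a proper tree.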
So far I have managed to join two tables to retrieve the information that is needed.
I have now decided to retrieve another piece of information from a third table (users.user_id), but the query I'm trying to use doesn't seem to work. If someone could help with the query, that would be great.
Here is my current query that works fine.
"SELECT films.movie_title, films.rating, films.actor, reviewed.review
FROM films
INNER JOIN reviewed
ON films.movie_id=reviewed.movie_id";
Here is the query I am using to get data from three tables, but it won't work:
"SELECT films.movie_title, films.rating, films.actor, reviewed.review users.name
FROM films
OUTER JOIN reviewed, users
ON films.movie_id=reviewed.movie_id && films.user_id=users.user_id";
Database: film
Table structure for table films
CREATE TABLE IF NOT EXISTS `films` (
`movie_id` int(4) NOT NULL AUTO_INCREMENT,
`movie_title` varchar(100) NOT NULL,
`actor` varchar(100) NOT NULL,
`rating` varchar(20) NOT NULL,
`user_id` int(100) NOT NULL,
PRIMARY KEY (`movie_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;
INSERT INTO `films` (`movie_id`, `movie_title`, `actor`, `rating`, `user_id`) VALUES
(1, 'batman', 'christian bale', 'Excellent', 3),
(2, 'Bne', 'reee', 'Ok', 3),
(3, 'Today', 'dd', 'Fair', 3);
Table structure for table reviewed
CREATE TABLE IF NOT EXISTS `reviewed` (
`review_id` int(4) NOT NULL AUTO_INCREMENT,
`review` mediumtext NOT NULL,
`movie_id` int(4) NOT NULL,
PRIMARY KEY (`review_id`),
KEY `movie_id` (`movie_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;
INSERT INTO `reviewed` (`review_id`, `review`, `movie_id`) VALUES
(1, 'Wicked film', 1),
(2, 'gedtg', 2),
(3, 'dddd', 3);
Table structure for table users
CREATE TABLE IF NOT EXISTS `users` (
`user_id` int(4) NOT NULL AUTO_INCREMENT,
`email` varchar(40) NOT NULL,
`password` varchar(40) NOT NULL,
`name` varchar(30) NOT NULL,
PRIMARY KEY (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=30 ;
INSERT INTO `users` (`user_id`, `email`, `password`, `name`) VALUES
(1, 'ben#talktalk.net', 'password', 'Ben'),
(2, 'richard#talk.net', '1', 'richard'),
Try this:
SELECT films.movie_title, films.rating, films.actor, reviewed.review, users.name
FROM films
LEFT JOIN reviewed ON films.movie_id=reviewed.movie_id
LEFT JOIN users ON films.user_id=users.user_id
Is it possible that you want to have the user_id in your reviews table?
That way you'd have the following:
table of movies; only one per movie
table of users; only one per user
table of reviews; one per review linked to a user and to a movie
The reviews table would now have the rating, the review itself, a user id, a movie id and a unique review id.
That way Batman could be given an Excellent rating by me and an Average rating by you without duplicating the movie row.
To just fix your above query, you can use the following:
SELECT films.movie_title, films.rating, films.actor, reviewed.review, users.name
FROM films, reviewed, users
WHERE films.movie_id = reviewed.movie_id
AND films.user_id = users.user_id;
If you want to print only films with reviews, you don't need OUTER JOIN.
SELECT films.movie_title, films.rating, films.actor, reviewed.review, users.name
FROM films JOIN reviewed on films.movie_id=reviewed.movie_id
JOIN users ON films.user_id=users.user_id;
If you want to print all films, even those with 0 reviews, you have to use LEFT JOIN (MySQL doesn't have FULL OUTER JOIN).
SELECT films.movie_title, films.rating, films.actor, reviewed.review, users.name
FROM films LEFT JOIN reviewed on films.movie_id=reviewed.movie_id
LEFT JOIN users ON films.user_id=users.user_id;
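The difference between the two join types shows up clearly if the queries are run against only the rows visible in the question. In that excerpt every film points at user 3, while the users insert lists only users 1 and 2 (it looks cut off, so the real table may well contain user 3). The sketch below assumes Python's built-in sqlite3 in place of MySQL and selects fewer columns for brevity.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; only rows shown in the question are loaded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE films (movie_id INTEGER PRIMARY KEY, movie_title TEXT, actor TEXT, rating TEXT, user_id INTEGER);
CREATE TABLE reviewed (review_id INTEGER PRIMARY KEY, review TEXT, movie_id INTEGER);
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT, password TEXT, name TEXT);
INSERT INTO films VALUES
  (1, 'batman', 'christian bale', 'Excellent', 3),
  (2, 'Bne', 'reee', 'Ok', 3),
  (3, 'Today', 'dd', 'Fair', 3);
INSERT INTO reviewed VALUES (1, 'Wicked film', 1), (2, 'gedtg', 2), (3, 'dddd', 3);
-- Only the two user rows visible in the question; user 3 is not among them.
INSERT INTO users VALUES
  (1, 'ben#talktalk.net', 'password', 'Ben'),
  (2, 'richard#talk.net', '1', 'richard');
""")

QUERY = """
SELECT films.movie_title, reviewed.review, users.name
FROM films
{join} reviewed ON films.movie_id = reviewed.movie_id
{join} users ON films.user_id = users.user_id
ORDER BY films.movie_id
"""

# An inner join keeps a film only if both joins find a partner row.
inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
# A left join keeps every film and fills the missing side with NULL (None).
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
print(inner_rows)
print(left_rows)
```

With this data the inner join returns nothing, because no film has a matching user row, while the left join returns all three films with an empty name.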
Having these 3 tables:
users
CREATE TABLE `users` (
`user_id` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`first_name` VARCHAR(64) NOT NULL,
`last_name` VARCHAR(64) NOT NULL,
PRIMARY KEY (`user_id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
AUTO_INCREMENT=1;
posts
CREATE TABLE `posts` (
`post_id` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`category_id` MEDIUMINT(8) UNSIGNED NOT NULL,
`author_id` MEDIUMINT(8) UNSIGNED NOT NULL,
`title` VARCHAR(128) NOT NULL,
`text` TEXT NOT NULL,
PRIMARY KEY (`post_id`),
INDEX `FK_posts__category_id` (`category_id`),
INDEX `FK_posts__author_id` (`author_id`),
CONSTRAINT `FK_posts__author_id` FOREIGN KEY (`author_id`) REFERENCES `users` (`user_id`) ON UPDATE CASCADE,
CONSTRAINT `FK_posts__category_id` FOREIGN KEY (`category_id`) REFERENCES `categories` (`category_id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
AUTO_INCREMENT=1;
categories
CREATE TABLE `categories` (
`category_id` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(64) NOT NULL,
PRIMARY KEY (`category_id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
AUTO_INCREMENT=1;
And data in tables:
INSERT INTO `users` (`user_id`, `first_name`, `last_name`) VALUES
(1, 'John', 'Doe'),
(2, 'Pen', 'Poe'),
(3, 'Robert', 'Roe');
INSERT INTO `categories` (`category_id`, `name`) VALUES
(1, 'Category 1'),
(2, 'Category 2'),
(3, 'Category 3'),
(4, 'Category 4');
INSERT INTO `posts` (`post_id`, `category_id`, `author_id`, `title`, `text`) VALUES
(1, 1, 1, 'title 1', 'text 1'),
(2, 1, 2, 'title 2', 'text 2');
I want to make a simple select (and let MySQL EXPLAIN it):
EXPLAIN SELECT p.post_id, p.title, p.text, c.category_id, c.name, u.user_id, u.first_name, u.last_name
FROM posts AS p
JOIN categories AS c
ON c.category_id = p.category_id
JOIN users AS u
ON u.user_id = p.author_id
WHERE p.category_id = 1
I got this:
What I don't understand is why MySQL has to do a full table scan on u (users). There are only two users it has to retrieve data about (those with id 1 and 2), and both can be found by the primary key user_id. Can somebody with more experience help me understand this? Is there a better way of creating indexes so that MySQL doesn't have to do a full scan of the users table to retrieve data about the post authors?
Thank you!
With such a small amount of data, an index search is going to be slower than a sequential search, so MySQL chooses a simple table read.
It comes down to operational efficiency. Let's simplify the operations MySQL has to perform to read the entire table versus using an index.
Full read:
Open table
Read each row one at a time and match the criteria
Return result set
With the 3 rows here, that is 1 + 3 + 1 = 5 operations.
Index read:
Open table
For the criteria, read the index entry for each row
Using the index pointer, locate each row on disk
Return result set
In this case, that is 1 + 3 + 3 + 1 = 8 operations.
This is very simplified, but unless you have enough data, your indexes can slow you down. As the table grows, MySQL might choose a different query path. That is why you shouldn't force the use of indexes.
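The operation counts in this answer can be written down as a toy model. This is only an illustration of the simplified reasoning, not how MySQL's optimizer actually estimates cost; the function names are hypothetical, and the index formula assumes one index read plus one row lookup per row fetched.

```python
def full_scan_ops(total_rows):
    """Open the table, read every row once, return the result set."""
    return 1 + total_rows + 1


def index_read_ops(fetched_rows):
    """Open the table, then one index read and one row lookup per fetched row, then return."""
    return 1 + 2 * fetched_rows + 1


# The tiny users table from the question: all 3 rows are touched either way.
small_full = full_scan_ops(3)
small_index = index_read_ops(3)

# A larger table where only 2 rows are needed: the balance flips.
large_full = full_scan_ops(1000)
large_index = index_read_ops(2)

print(small_full, small_index, large_full, large_index)
```

In this model the index only pays off once the rows fetched are well under half of the table, which matches the point that the chosen plan can change as the table grows.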
You only have ~3 rows in your users table, according to your test data and your EXPLAIN report.
The optimizer can produce skewed results if you have too few rows in the tables. It may do a table-scan for a tiny table, even if it would use an index for the same query against the same tables with a few hundred or a few thousand rows.
So when doing development, it's important to have a non-trivial amount of test data in your tables if you want to get accurate optimizer reports.