Many-to-many single query optimization - MySQL

I have 3 tables with a many-to-many relation:
CREATE TABLE news(id int, content varchar(64));
CREATE TABLE tags(id int, name varchar(64));
CREATE TABLE news_tags(id int, tag_id int, news_id int);
INSERT INTO news VALUES
(1, "Hello, world!"),
(2, "Test news"),
(3, "test news 2"),
(4, "test news 3"),
(5, "test news 4");
INSERT INTO tags VALUES
(1, "general tag"),
(2, "sub tag 1"),
(3, "sub tag 2"),
(4, "normal tag");
INSERT INTO news_tags VALUES
(1, 1, 1),
(2, 2, 1),
(3, 3, 1);
INSERT INTO news_tags VALUES
(4, 1, 2),
(5, 2, 2),
(6, 1, 3),
(7, 4, 3),
(8, 2, 4),
(9, 3, 4),
(10, 1, 5);
I want to select the news_id values that either:
have the general tag (id 1 in the example) but no sub tag (ids 2 and 3 in the example), or
have the pair general tag + sub tag (id 2).
I created this query:
SELECT news_id FROM news_tags WHERE tag_id = 1 OR tag_id = 2
GROUP BY news_id
HAVING COUNT(news_id) = 2
UNION
SELECT news_id FROM news_tags WHERE tag_id = 1 AND news_id not in (SELECT news_id FROM news_tags WHERE tag_id in (2,3));
but it has two problems:
I don't think it is an efficient approach (two SELECTs with a UNION, plus a subquery).
If I want to search for more than one pair of sub tags, I need to add another SELECT with UNION.
How can I optimize this query?
Live example: http://www.sqlfiddle.com/#!9/1067b7/1/0

Your question is unclear because the concepts of "sub" tags and "general" tags are not defined.
But if you want to handle multiple conditions at the same time, you can still use one GROUP BY and HAVING clause.
For instance, if you wanted the news_ids that meet either of these conditions:
tag_id = 1 is the only tag
Or both tag_id = 2 and tag_id = 3
Then you can use:
SELECT nt.news_id
FROM news_tags nt
GROUP BY nt.news_id
HAVING (COUNT(*) = 1 AND MIN(nt.tag_id) = 1) OR
SUM( nt.tag_id IN (2, 3) ) = 2;
You can easily extend this idea to the descriptions of the tags (but you need to join in the tags table for that).
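Here is a sketch of the same single GROUP BY / HAVING idea applied to the question's own two conditions, run next to the question's UNION query so you can compare the results. It assumes an in-memory SQLite database stands in for MySQL. Both engines evaluate a comparison to 0 or 1, so SUM(condition) counts the rows in a group that match. The rows are the question's sample data. The id column is not used by either query.

```python
import sqlite3

# The question's sample data (SQLite standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news_tags(id INTEGER, tag_id INTEGER, news_id INTEGER);
INSERT INTO news_tags VALUES
  (1, 1, 1), (2, 2, 1), (3, 3, 1),
  (4, 1, 2), (5, 2, 2),
  (6, 1, 3), (7, 4, 3),
  (8, 2, 4), (9, 3, 4),
  (10, 1, 5);
""")

# The question's original two-branch query, kept for comparison.
UNION_QUERY = """
SELECT news_id FROM news_tags WHERE tag_id = 1 OR tag_id = 2
GROUP BY news_id HAVING COUNT(news_id) = 2
UNION
SELECT news_id FROM news_tags
WHERE tag_id = 1 AND news_id NOT IN
  (SELECT news_id FROM news_tags WHERE tag_id IN (2, 3))
"""

# One pass over news_tags: each OR branch of HAVING is one condition.
SINGLE_QUERY = """
SELECT news_id
FROM news_tags
GROUP BY news_id
HAVING (SUM(tag_id = 1) > 0 AND SUM(tag_id IN (2, 3)) = 0)  -- general, no sub tag
    OR (SUM(tag_id = 1) > 0 AND SUM(tag_id = 2) > 0)         -- general + sub tag 2
"""

def news_ids(query):
    """Run a query and return the sorted news_id values it produces."""
    return sorted(row[0] for row in conn.execute(query))

print(news_ids(UNION_QUERY))   # [1, 2, 3, 5]
print(news_ids(SINGLE_QUERY))  # [1, 2, 3, 5]
```

To search for another pair, you would add another OR branch, for example `OR (SUM(tag_id = 1) > 0 AND SUM(tag_id = 3) > 0)`, rather than another SELECT with UNION.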

Based on the extra comments about blocking tags, I would suggest a redesign.
Assigning tags to news items is good, but your table should look like news_tags(news_id, tag_id), with the primary key over both the news_id and tag_id fields.
If you want to make tags blocking, one way is to add another Many-To-Many relation, called news_blocking_tags(news_id, tag_id). Or you can define your news_tags(news_id, tag_id, is_blocking), so you know which tags are blocking, and which are just tags.
Optimising starts with designing the database. We can only give general pointers here. Good that you know what the outcome needs to be, that's already half the design!
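Here is a minimal sketch of the news_tags(news_id, tag_id, is_blocking) variant, with the composite primary key described above. It assumes SQLite in place of MySQL, uses a hypothetical 0/1 integer for is_blocking, and uses made-up sample rows. It shows that the key rejects a duplicate (news_id, tag_id) pair without a surrogate id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news_tags(
  news_id     INTEGER NOT NULL,
  tag_id      INTEGER NOT NULL,
  is_blocking INTEGER NOT NULL DEFAULT 0,  -- hypothetical flag: 1 = blocking tag
  PRIMARY KEY (news_id, tag_id)            -- each pair can appear only once
);
INSERT INTO news_tags VALUES (1, 1, 1), (1, 2, 0), (2, 4, 0);
""")

def try_insert(news_id, tag_id, is_blocking=0):
    """Return True if the row was inserted, False if the pair already exists."""
    try:
        conn.execute("INSERT INTO news_tags VALUES (?, ?, ?)",
                     (news_id, tag_id, is_blocking))
        return True
    except sqlite3.IntegrityError:
        return False

# News items that carry at least one blocking tag.
blocked = [r[0] for r in conn.execute(
    "SELECT DISTINCT news_id FROM news_tags WHERE is_blocking = 1 ORDER BY news_id")]
```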

Looks like I found a solution without changing the table schema:
SELECT news_id, tag_id
FROM news_tags
WHERE tag_id in (1,2,3)
GROUP BY news_id
HAVING (COUNT(news_id) = 1 AND tag_id = 1) OR (count(news_id) = 2 and tag_id in (1,2));
First we find all news with blocked tags, then filter the result with HAVING:
(COUNT(news_id) = 1 AND tag_id = 1)
finds all records with only the "blocked" tag for all countries (this part can be used on any many-to-many relation to find all records that use only a single tag)
(count(news_id) = 2 and tag_id in (1,2))
finds pairs of the "blocked" tag and a country tag
If we need more countries, we need to add OR and
(count(news_id) = 2 and tag_id in (1,3))
I need to do some tests, but it looks like it works nicely, and better than my first query.
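One caution about the query above. tag_id in SELECT and HAVING is neither grouped nor aggregated, so MySQL may take it from any row of the group. With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5), MySQL rejects the query. For example, news 4 (tags 2 and 3) passes the second branch whenever the engine happens to pick tag_id 2. Below is a sketch of the same intent written only with aggregates. It assumes SQLite in place of MySQL and uses the question's sample data. On this data it returns news 2, 3 and 5, which is not the same set as the first UNION query (that one also returns news 1), so it is worth checking which behavior you want.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news_tags(id INTEGER, tag_id INTEGER, news_id INTEGER);
INSERT INTO news_tags VALUES
  (1, 1, 1), (2, 2, 1), (3, 3, 1),
  (4, 1, 2), (5, 2, 2),
  (6, 1, 3), (7, 4, 3),
  (8, 2, 4), (9, 3, 4),
  (10, 1, 5);
""")

# Every HAVING term is an aggregate, so the result no longer depends on
# which row of the group the engine picks.
QUERY = """
SELECT news_id
FROM news_tags
WHERE tag_id IN (1, 2, 3)
GROUP BY news_id
HAVING (COUNT(*) = 1 AND MIN(tag_id) = 1)                              -- blocked tag only
    OR (COUNT(*) = 2 AND SUM(tag_id = 1) = 1 AND SUM(tag_id = 2) = 1)  -- blocked + country 2
    OR (COUNT(*) = 2 AND SUM(tag_id = 1) = 1 AND SUM(tag_id = 3) = 1)  -- blocked + country 3
ORDER BY news_id
"""

result = [row[0] for row in conn.execute(QUERY)]
print(result)  # [2, 3, 5]
```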

I've done tagging many times. I recommend not using a many-to-many mapping table to a tag table. Instead, combine them into
CREATE TABLE tags (
news_id MEDIUMINT UNSIGNED NOT NULL, -- Assuming you don't need more than 16M
tag VARCHAR(...) NOT NULL,
PRIMARY KEY(news_id, tag), -- For going one way
INDEX(tag, news_id) -- For going the other way
) ENGINE=InnoDB; -- Important due to the way indexes are handled
Adding an auto_inc is a waste of space and speed.
Yeah, some queries get a bit gnarly. But any technique leads to messy code; I believe that this schema is the best.
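Here is a sketch of this single-table design, showing one index per lookup direction. It assumes SQLite in place of MySQL/InnoDB. `WITHOUT ROWID` is used as SQLite's closest analogue to InnoDB storing rows in primary-key order. The sample rows and the index name `tags_tag_news` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (
  news_id INTEGER NOT NULL,
  tag     TEXT    NOT NULL,
  PRIMARY KEY (news_id, tag)               -- news -> its tags
) WITHOUT ROWID;
CREATE INDEX tags_tag_news ON tags (tag, news_id);  -- tag -> its news
INSERT INTO tags VALUES
  (1, 'general tag'), (1, 'sub tag 1'),
  (2, 'general tag'), (3, 'normal tag');
""")

def tags_for(news_id):
    """All tags of one news item (served by the primary key)."""
    return [r[0] for r in conn.execute(
        "SELECT tag FROM tags WHERE news_id = ? ORDER BY tag", (news_id,))]

def news_with(tag):
    """All news items carrying one tag (served by the secondary index)."""
    return [r[0] for r in conn.execute(
        "SELECT news_id FROM tags WHERE tag = ? ORDER BY news_id", (tag,))]
```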

Related

How can I write that SQL statement analyzing the frequency of used keywords?

I have a MySQL database containing blog articles. Each article has multiple keywords that are m:n linked using the table 'art_key'.
Table containing the article itself:
table articles {
id,
title,
text
}
Table containing each keyword once:
table keywords {
id,
word
}
Table linking the articles and keywords together: one article can contain multiple keywords, and one keyword can be used in multiple articles.
table art_key {
id,
article_id,
keyword_id
}
Some of the articles contain pictures. Those have an additional keyword "[PICTURE]".
For analysis I'd like to see how often (in how many articles) each keyword has been used and, for each keyword, what percentage of the articles containing this keyword have a picture (have the keyword "[PICTURE]").
Additionally, the analysis should be case-insensitive and ignore surrounding blanks. So the keywords 'sql', ' SQL', 'sqL ', 'SqL' should be seen as one keyword 'sql'.
How can I write that query using an SQL statement?
Thanks!
This query should do what you want. It joins the keyword list to the art_key table to find all articles with a given keyword, then joins that to a list of articles which have pictures (which is found by a separate JOIN subquery) to determine how many articles with a given keyword have pictures in them. Keywords are pre-processed and grouped for display using LOWER and TRIM to make the result case-insensitive and tolerant of white space.
SELECT LOWER(TRIM(k.word)) AS keyword
, COUNT(DISTINCT a.article_id) AS num_articles
, COUNT(DISTINCT p.article_id) / COUNT(DISTINCT a.article_id) * 100 AS percent_with_pictures
FROM keywords k
LEFT JOIN art_key a ON a.keyword_id = k.id
LEFT JOIN (SELECT a.article_id
, COUNT(DISTINCT a.article_id) AS num_pictures
FROM art_key a
JOIN keywords k ON k.id = a.keyword_id AND LOWER(TRIM(k.word)) = '[picture]'
GROUP BY a.article_id) p ON p.article_id = a.article_id
GROUP BY keyword
HAVING COUNT(a.article_id) > 0
I created a small demo on SQLFiddle to show how I've interpreted your question and how the query works.
create table keywords (id int auto_increment primary key, word varchar(20));
insert into keywords (word) values
('sql'), ('SQL '), (' SQL'), ('SQl'), (' sQl '), ('MySQL'), ('[PICTURE]');
create table art_key(id int auto_increment primary key, article_id int, keyword_id int);
insert into art_key (article_id, keyword_id) values
(1, 2), (1, 3), (1, 4), (1, 6), (2, 1), (2, 5), (3, 4), (4, 5), (4, 2), (4, 6), (1, 7), (4, 7);
Output:
keyword    | num_articles | percent_with_pictures
mysql      | 2            | 100
sql        | 4            | 50
[picture]  | 2            | 100
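Here is a sketch that replays the demo data above in SQLite (standing in for MySQL) and checks the output shown. There are two deliberate deviations from the answer's query. First, the percentage is written as `100.0 * ... / ...`, because SQLite truncates integer division, whereas MySQL's `/` already returns a decimal. Second, the picture subquery is reduced to `SELECT DISTINCT article_id`, which should be equivalent for this purpose since its count column was never used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT);
INSERT INTO keywords (word) VALUES
  ('sql'), ('SQL '), (' SQL'), ('SQl'), (' sQl '), ('MySQL'), ('[PICTURE]');
CREATE TABLE art_key (id INTEGER PRIMARY KEY, article_id INTEGER, keyword_id INTEGER);
INSERT INTO art_key (article_id, keyword_id) VALUES
  (1, 2), (1, 3), (1, 4), (1, 6), (2, 1), (2, 5), (3, 4),
  (4, 5), (4, 2), (4, 6), (1, 7), (4, 7);
""")

QUERY = """
SELECT LOWER(TRIM(k.word)) AS keyword,
       COUNT(DISTINCT a.article_id) AS num_articles,
       -- 100.0 first: in SQLite, integer / integer truncates (1 / 2 = 0)
       100.0 * COUNT(DISTINCT p.article_id) / COUNT(DISTINCT a.article_id)
           AS percent_with_pictures
FROM keywords k
LEFT JOIN art_key a ON a.keyword_id = k.id
-- p: one row per article that carries the [PICTURE] keyword
LEFT JOIN (SELECT DISTINCT a2.article_id
           FROM art_key a2
           JOIN keywords k2 ON k2.id = a2.keyword_id
           WHERE LOWER(TRIM(k2.word)) = '[picture]') p
       ON p.article_id = a.article_id
GROUP BY keyword
HAVING COUNT(a.article_id) > 0
"""

result = {kw: (n, pct) for kw, n, pct in conn.execute(QUERY)}
```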

Selecting 1 image from each product if there are n different products; if not, selecting more than 1 image from each product until n is reached

In the project's design schema, one product may have many images.
Now I want to select n images from the products as follows:
If there are n products, select 1 image from each.
Otherwise, select more images from each product, up to n in total.
Do we also need some PHP-side logic to reach the goal?
The schema is as expected:
product (id, title, ...)
image (id, product_id, filename, ...)
I can't even think of how to write such a query, which is why, unfortunately, I haven't tried anything.
The query should look like this:
SELECT * FROM image ..with..those..hard..conditions.. LIMIT n
If I understand correctly, you need n images, from different products if possible. Otherwise, several images from the same product are acceptable as a fallback.
For now, the only solution I can think of is to build a temporary table with numbered rows, such that one image of each product is at the "top", followed by the rest of the images.
Once that table is built, your query is just a SELECT ... LIMIT n.
This will perform horribly, so if you choose this solution (or something inspired by it), you should consolidate the image table offline or on a scheduled basis.
see http://sqlfiddle.com/#!2/81274/2
--
-- test values
--
create table P (id int, title char(20));
insert into P values
(1, "product 1"),
(2, "product 2"),
(3, "product 3");
create table I (pid int, filename char(40));
insert into I values
(1, "image p1-1"),
(1, "image p1-2"),
(3, "image p3-1"),
(3, "image p3-2"),
(3, "image p3-3");
--
-- "ordered" image table
--
create table T(n int primary key auto_increment not null,
filename char(40));
--
-- consolidate images (once in a while)
--
delete from T;
insert into T(filename)
select min(filename) from I group by pid;
insert into T(filename)
select filename from I order by rand();
--
-- do the actual query
--
select * from T limit n;
EDIT: Here is a completely different idea that doesn't use a consolidation table or view, so it might be seen as better:
http://sqlfiddle.com/#!2/57ea9/4
select filename from
(select 1 as p, min(filename) as filename from I group by pid
union (select 2 as p, filename from I order by rand() limit 3)) as T
group by filename
order by min(p) limit 3
The key point here is that I don't really have to "number" the rows, only keep track of which rows come from the first SELECT. That is the purpose of p. I set both LIMIT clauses to the same value for simplicity. I don't think you have to "optimize" that part since the benefit would be very small, and ORDER BY RAND() is so terrible that you don't have to think about performance here ;)
Please note I haven't fully tested this solution. Let me know if there are any corner cases (well... any cases) that don't work.
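None of the answers here use window functions, which post-date them (MySQL 8.0+, SQLite 3.25+). With them, the "numbered rows" idea fits in one query. `ROW_NUMBER()` numbers each product's images, and ordering by that number first puts one image per product ahead of any second image. This is a sketch assuming SQLite in place of MySQL. It orders images by filename so the output is deterministic, whereas the question wants random picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE I (pid INTEGER, filename TEXT);
INSERT INTO I VALUES
  (1, 'image p1-1'), (1, 'image p1-2'),
  (3, 'image p3-1'), (3, 'image p3-2'), (3, 'image p3-3'),
  (3, 'image p3-4'), (3, 'image p3-5');
""")

# rn = 1 for each product's first image, 2 for its second, and so on.
QUERY = """
SELECT filename
FROM (SELECT pid, filename,
             ROW_NUMBER() OVER (PARTITION BY pid ORDER BY filename) AS rn
      FROM I) ranked
ORDER BY rn, pid
LIMIT ?
"""

def pick_images(n):
    """Return n filenames, spreading them over as many products as possible."""
    return [row[0] for row in conn.execute(QUERY, (n,))]
```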
http://sqlfiddle.com/#!2/32198/2/0
create view T as select (select filename from I where pid = id order by filename limit 1) as singleImage from P having singleImage is not null;
select * from (
select singleImage from T
union all (select filename from I where filename not in
(select singleImage from T) order by rand() limit 5)
) as MoreThanN limit 5;
If your N is rather small, you may benefit from my technique for selecting random rows from large tables: although it is intended for selecting a single row, it could be adapted to select a few random rows relatively easily.
Here's the SQL with Sylvain Leroux's examples:
-- Test values
create table P (id int, title char(20));
insert into P values
(1, "product 1"),
(2, "product 2"),
(3, "product 3");
create table I (pid int, filename char(40));
insert into I values
(1, "image p1-1"),
(1, "image p1-2"),
(3, "image p3-1"),
(3, "image p3-2"),
(3, "image p3-3"),
(3, "image p3-4"),
(3, "image p3-5");
-- View to avoid repeating the query
create view T as select (select filename from I where pid = id order by filename limit 1) as singleImage from P having singleImage is not null;
-- Query
select * from (
select singleImage from T
union all (select filename from I where filename not in
(select singleImage from T) order by rand() limit 5)
) as MoreThanN limit 5;

MySQL order by IN order

I have a simple query:
SELECT data FROM table WHERE id IN (5, 2, 8, 1, 10)
The question is: how can I select my data and order it as in my IN list?
The order must be 5, 2, 8, 1, 10.
The problem is that I have no key to order by. The IN data comes from another query (1), but I need to preserve its order.
Any solutions?
(1)
SELECT login
FROM posts
LEFT JOIN users ON posts.post_id=users.id
WHERE posts.post_n IN (
2280219,2372244, 2345146, 2374106, 2375952, 2375320, 2371611, 2360673, 2339976, 2331440, 2279494, 2329266, 2271919, 1672114, 2301856
)
Thanks for helping. The solutions work but are very slow; maybe I'll find something better later. Thanks anyway.
The only way I can think to order by an arbitrary list would be to ORDER BY comparisons to each item in that list. It's ugly, but it will work. You may be better off sorting in whatever code you are doing the selection.
SELECT data FROM t1 WHERE id IN (5, 2, 8, 1, 10)
ORDER BY id = 10, id = 1, id = 8, id = 2, id = 5
The order is reversed because otherwise you would have to add DESC to each condition.
You can use a CASE expression:
SELECT data
FROM table WHERE id IN (5, 2, 8, 1, 10)
ORDER BY CASE WHEN id = 5 THEN 1 WHEN id = 2 THEN 2 WHEN id = 8 THEN 3 WHEN id = 1 THEN 4 WHEN id = 10 THEN 5 END
SELECT data FROM table
WHERE id IN (5, 2, 8, 1, 10)
ORDER BY FIELD (id, 5, 2, 8, 1, 10)
http://dev.mysql.com/doc/refman/5.5/en/string-functions.html#function_field
This might be easier to auto-generate than the other suggested solutions using CASE or a series of ID=x, ID=y ..., because it basically just needs the wanted IDs inserted comma-separated, in the same order, a second time.
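Here is a sketch of generating the ORDER BY from the same list that feeds the IN clause. It assumes SQLite in place of MySQL. SQLite has no `FIELD()`, so the sketch emits the `CASE` variant instead. With MySQL, the same loop could emit `FIELD(id, ?, ?, ...)`. The table `t1` and its rows are hypothetical. The ids are bound as parameters rather than pasted into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 11)])

def select_in_order(ids):
    """WHERE id IN (...) ORDER BY CASE ... END, both built from one list."""
    marks = ", ".join("?" for _ in ids)
    whens = " ".join(f"WHEN ? THEN {pos}" for pos in range(len(ids)))
    sql = (f"SELECT id, data FROM t1 WHERE id IN ({marks}) "
           f"ORDER BY CASE id {whens} END")
    # Placeholders are bound in textual order: the IN list first, then the CASE.
    return conn.execute(sql, list(ids) + list(ids)).fetchall()
```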
http://sqlfiddle.com/#!2/40b299/6
I think that's what you're looking for :D Adapt it to your own situation.
To do this dynamically within MySQL, I would suggest the following:
Create a temporary table (MySQL has CREATE TEMPORARY TABLE, but no table variables) with two columns:
OrderID mediumint not null auto_increment primary key
InValue mediumint (or whatever type it is)
Insert the values of the IN clause in order, which will generate IDs in order of insertion
Add a JOIN to your query on this temp table
Change your Order By to be
order by TempTable.OrderID
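Drop the temp table (in SQL Server, a temp table created inside a stored procedure is dropped automatically when the procedure finishes; MySQL drops a TEMPORARY table automatically only when the session ends, so drop it explicitly if the connection is reused)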
This effectively circumvents the issue of you not having a key to order by in your table ... you create one. Should work.
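Here is a sketch of the steps above. It assumes SQLite stands in for MySQL: SQLite's `CREATE TEMP TABLE` with `INTEGER PRIMARY KEY AUTOINCREMENT` plays the role of MySQL's `CREATE TEMPORARY TABLE` with `AUTO_INCREMENT`. The table names `t1` and `wanted` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 11)])

def select_in_order(ids):
    # 1. Temp table whose OrderID is assigned in insertion order.
    conn.execute("""CREATE TEMP TABLE wanted (
                      OrderID INTEGER PRIMARY KEY AUTOINCREMENT,
                      InValue INTEGER NOT NULL)""")
    try:
        # 2. Insert the IN values in the order they must come back.
        conn.executemany("INSERT INTO wanted (InValue) VALUES (?)",
                         [(i,) for i in ids])
        # 3-4. Join on the temp table and order by its generated key.
        return conn.execute("""SELECT t1.id, t1.data
                               FROM t1 JOIN wanted ON wanted.InValue = t1.id
                               ORDER BY wanted.OrderID""").fetchall()
    finally:
        # 5. Drop it so the next call can recreate it on the same connection.
        conn.execute("DROP TABLE wanted")
```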

Multiple Join on same table with different columns

I'm having problems writing a SQL query.
Here are my tables:
CREATE TABLE dates(
id INT PRIMARY KEY,
obj_id INT,
dispo_date text
);
CREATE TABLE `option`(
id INT PRIMARY KEY,
obj_id INT,
random_option INT
);
CREATE TABLE obj(
id INT PRIMARY KEY
);
I also have a date that the user gives me, and some options.
I'd like to select everything from both tables that corresponds to an obj whose date equals the user's date.
Let's say that DATE = "22/01/2013" and OPTIONS = 3:
SELECT * FROM obj
INNER JOIN dates
ON dates.obj_id=obj.id
INNER JOIN `option`
ON `option`.obj_id=obj.id
WHERE dates.dispo_date="22/01/2013"
AND `option`.random_option=3;
That just gives me everything from my obj table with, for each one, the same dates and options without filtering anything.
Can someone give me some pointers about what I'm doing wrong?
SOLUTION:
Since everybody seemed to understand what I was looking for, I restarted my SQL server, and since then everything works...
Thanks for your help, and sorry for the lost time :-(
As far as I can see, there is nothing wrong with the query.
When I try it, it returns only the obj rows where there is a corresponding date and a corresponding option.
insert into dates values
(1, 1, '22/01/2013'),
(2, 1, '23/01/2013'),
(3, 2, '22/01/2013'),
(4, 2, '23/01/2013'),
(5, 3, '23/01/2013'),
(6, 3, '24/01/2013');
insert into `option` values
(1, 1, 4),
(2, 1, 5),
(3, 2, 3),
(4, 2, 4),
(5, 3, 3),
(6, 3, 4);
insert into obj values
(1),
(2),
(3);
With this data it should filter out obj 1 because there is no option 3 for it, and filter out obj 3 because there is no date 22 for it.
Result:
ID OBJ_ID DISPO_DATE RANDOM_OPTION
-------------------------------------
2 2 22/01/2013 3
Demo: http://sqlfiddle.com/#!2/a398f/1
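Here is a reproduction of this demo in SQLite (standing in for MySQL), selecting a few explicit columns. `option` is quoted because OPTION is a reserved word in MySQL. Quoting is harmless in SQLite. The date and option are passed as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obj (id INTEGER PRIMARY KEY);
CREATE TABLE dates (id INTEGER PRIMARY KEY, obj_id INTEGER, dispo_date TEXT);
CREATE TABLE "option" (id INTEGER PRIMARY KEY, obj_id INTEGER, random_option INTEGER);
INSERT INTO obj VALUES (1), (2), (3);
INSERT INTO dates VALUES
  (1, 1, '22/01/2013'), (2, 1, '23/01/2013'),
  (3, 2, '22/01/2013'), (4, 2, '23/01/2013'),
  (5, 3, '23/01/2013'), (6, 3, '24/01/2013');
INSERT INTO "option" VALUES
  (1, 1, 4), (2, 1, 5), (3, 2, 3), (4, 2, 4), (5, 3, 3), (6, 3, 4);
""")

# Only obj 2 has both the requested date and the requested option.
rows = conn.execute("""
SELECT obj.id, dates.dispo_date, o.random_option
FROM obj
INNER JOIN dates ON dates.obj_id = obj.id
INNER JOIN "option" o ON o.obj_id = obj.id
WHERE dates.dispo_date = ? AND o.random_option = ?
""", ("22/01/2013", 3)).fetchall()
```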
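Change your line
WHERE dates.dispo_date="22/01/2013"
to
WHERE STR_TO_DATE(dates.dispo_date, '%d/%m/%Y') = '2013-01-22'
DATE() does not understand dd/mm/yyyy text, so parse the column with STR_TO_DATE() and an explicit format instead.
Handling dates in text fields is a little tricky (and also bad practice). Make sure both dates are in the same format.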
First, I'm a little confused about which IDs map to which tables. I might respectfully suggest that the id field in DATES be renamed to date_id, the id in OPTION be renamed to option_id, and the id in obj to obj_id. That makes those relationships MUCH clearer for folks looking in through the keyhole. I'm going in a bit of a circle making sure I understand your relationships properly. On that basis, I may be understanding your problem incorrectly.
I think you have obj.id->dates.obj_id, and option.obj_id->dates.obj_id, so on that basis, I think your query has to be a bit more complicated:
This gives you object dates:
Select *
from obj obj
join dates d
on obj.id=d.obj_id
This gives you user dates:
select *
from `option` o
join dates d
on o.obj_id=d.obj_id
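To get the result of objects and users having the same dates, you'd need to hook these two together. The derived tables list their columns explicitly, because MySQL rejects a derived table with two columns of the same name (id appears in both joined tables):
select *
from (select obj.id as obj_id, d.dispo_date
from obj obj
join dates d
on obj.id=d.obj_id) a
join (select o.obj_id, o.random_option, d.dispo_date
from `option` o
join dates d
on o.obj_id=d.obj_id) b
on a.dispo_date=b.dispo_date
where b.random_option=3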
I hope this is useful. Good luck.

SQL Query to get column values that correspond with MAX value of another column?

Ok, this is my query:
SELECT
video_category,
video_url,
video_date,
video_title,
short_description,
MAX(video_id)
FROM
videos
GROUP BY
video_category
When it pulls the data, I get the correct value for video_id, but for the other columns it pulls the first row of each category. So for category 1 I get the max ID, but the url, date, title, and description come from the first row in the table.
How can I have it pull the other columns that correspond with the max ID result?
Edit: Fixed.
SELECT
*
FROM
videos
WHERE
video_id IN
(
SELECT
DISTINCT
MAX(video_id)
FROM
videos
GROUP BY
video_category
)
ORDER BY
video_category ASC
I would try something like this:
SELECT
s.video_id
,s.video_category
,s.video_url
,s.video_date
,s.video_title
,short_description
FROM videos s
JOIN (SELECT MAX(video_id) AS id FROM videos GROUP BY video_category) max
ON s.video_id = max.id
which should be quite a bit faster than your own solution
I recently released a new technique to handle this type of problem in MySQL.
SCALAR-AGGREGATE REDUCTION
Scalar-Aggregate Reduction is by far the highest-performance approach and simplest method (in DB engine terms) for accomplishing this, because it requires no joins, no subqueries, and no CTE.
For your query, it would look something like this:
SELECT
video_category,
MAX(video_id) AS video_id,
SUBSTRING(MAX(CONCAT(LPAD(video_id, 11, '0'), video_url)), 12) AS video_url,
SUBSTRING(MAX(CONCAT(LPAD(video_id, 11, '0'), video_date)), 12) AS video_date,
SUBSTRING(MAX(CONCAT(LPAD(video_id, 11, '0'), video_title)), 12) AS video_title,
SUBSTRING(MAX(CONCAT(LPAD(video_id, 11, '0'), short_description)), 12) AS short_description
FROM
videos
GROUP BY
video_category
The combination of scalar and aggregate functions does the following:
LPADs the intra-aggregate correlated identifier to allow proper string comparison (e.g. "0009" and "0025" will be properly ranked). I'm LPADDING to 11 characters here assuming an INT primary key. If you use a BIGINT, you will want to increase this to support your table's ordinality. If you're comparing on a DATETIME field (fixed length), no padding is necessary.
CONCATs the padded identifier with the output column (so you get "00000000009myvalue" vs "00000000025othervalue")
MAX the aggregate set, which will yield "00000000025othervalue" as the winner.
SUBSTRING the result, which will truncate the compared identifier portion, leaving only the value.
If you want to retrieve values in types other than CHAR, you may need to perform an additional CAST on the output, e.g. if you want video_date to be a DATETIME:
CAST(SUBSTRING(MAX(CONCAT(LPAD(video_id, 11, '0'), video_date)), 12) AS DATETIME)
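Here is a sketch of the same pad, concatenate, MAX and strip steps, assuming SQLite in place of MySQL. SQLite has no LPAD, so `printf('%011d', ...)` does the padding, `||` does the CONCAT, and `substr(..., 12)` drops the 11-character prefix. The `videos` rows are hypothetical. The last query shows why the padding matters: unpadded, '9...' sorts above '25...'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (video_id INTEGER PRIMARY KEY, video_category INTEGER,
                     video_title TEXT);
INSERT INTO videos VALUES
  (3, 2, 'cat 2, old'), (9, 1, 'cat 1, old'),
  (25, 1, 'cat 1, new'), (100, 2, 'cat 2, new');
""")

QUERY = """
SELECT video_category,
       MAX(video_id) AS video_id,
       -- pad id to 11 digits, glue the value on, take the MAX, strip the id
       substr(MAX(printf('%011d', video_id) || video_title), 12) AS video_title
FROM videos
GROUP BY video_category
ORDER BY video_category
"""
rows = conn.execute(QUERY).fetchall()

# Without padding, string comparison ranks '9...' above '25...'.
unpadded = conn.execute(
    "SELECT MAX(video_id || video_title) FROM videos WHERE video_category = 1"
).fetchone()[0]
```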
Another benefit of this method over the self-joining method is that you can combine other aggregate data (not just latest values), or even combine first AND last item in the same query, e.g.
SELECT
-- Overall totals
video_category,
COUNT(1) AS videos_in_category,
DATEDIFF(MAX(video_date), MIN(video_date)) AS timespan,
-- Last video details
MAX(video_id) AS last_video_id,
SUBSTRING(MAX(CONCAT(LPAD(video_id, 11, '0'), video_url)), 12) AS last_video_url,
...
-- First video details
MIN(video_id) AS first_video_id,
SUBSTRING(MIN(CONCAT(LPAD(video_id, 11, '0'), video_url)), 12) AS first_video_url,
...
-- And so on
For further details explaining the benefits of this method vs other older methods, my full blog post is here: https://www.stevenmoseley.com/blog/tech/high-performance-sql-correlated-scalar-aggregate-reduction-queries
Here is a more general solution (handles duplicates)
CREATE TABLE test(
i INTEGER,
c INTEGER,
v INTEGER
);
insert into test(i, c, v)
values
(3, 1, 1),
(3, 2, 2),
(3, 3, 3),
(4, 2, 4),
(4, 3, 5),
(4, 4, 6),
(5, 3, 7),
(5, 4, 8),
(5, 5, 9),
(6, 4, 10),
(6, 5, 11),
(6, 6, 12);
SELECT t.c, t.v
FROM test t
JOIN (SELECT test.c, max(i) as mi FROM test GROUP BY c) j ON
t.i = j.mi AND
t.c = j.c
ORDER BY c;
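Here is a check of this query in SQLite (standing in for MySQL) using the answer's test data. One extra row, (6, 6, 13), which is not in the answer, ties with (6, 6, 12) on the maximum i, so you can see what "handles duplicates" means: both tied rows come back. `t.v` is added to the ORDER BY only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (i INTEGER, c INTEGER, v INTEGER);
INSERT INTO test (i, c, v) VALUES
  (3, 1, 1), (3, 2, 2), (3, 3, 3),
  (4, 2, 4), (4, 3, 5), (4, 4, 6),
  (5, 3, 7), (5, 4, 8), (5, 5, 9),
  (6, 4, 10), (6, 5, 11), (6, 6, 12),
  (6, 6, 13);  -- extra row: ties with (6, 6, 12) on the max i for c = 6
""")

rows = conn.execute("""
SELECT t.c, t.v
FROM test t
JOIN (SELECT c, MAX(i) AS mi FROM test GROUP BY c) j
  ON t.i = j.mi AND t.c = j.c
ORDER BY t.c, t.v
""").fetchall()
```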
A slightly more "rustic" solution, which handles one category at a time:
SELECT
video_category,
video_url,
video_date,
video_title,
short_description,
video_id
FROM
videos
WHERE video_category = 1 -- example value: the category you want
ORDER BY video_id DESC
LIMIT 1;
In other words, just produce a table with all of the columns that you want, sort it so that your maximum value is at the top, and chop it off so you only return one row. Without the WHERE clause, this returns only the single latest video across all categories, not one per category.
SELECT video_category,video_url,video_date,video_title,short_description,video_id
FROM videos t1
where video_id in (SELECT max(video_id) FROM videos t2 WHERE t1.video_category=t2.video_category );
Please provide your input and output records so that it can be understood properly and tested.