The Situation
As some of you might already know from my previous questions, I'm currently developing a Blog-system.
This time, I'm stuck on getting all posts from a specific category together with the categories each post belongs to.
Database
Here are the SQL-commands to create the three required tables.
Post
create table Post(
headline varchar(100),
date datetime,
content text,
author int unsigned,
public tinyint,
type int,
ID serial,
Primary Key (ID)
)ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
author is the ID of the user who created the post. public determines whether the post can be read by everyone or is just a draft. type determines whether it's a blog post (0) or something else.
Category
create table Kategorie(
name varchar(30),
short varchar(200),
ID serial,
Primary Key (name)
)ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Post_Kategorie
create table Post_Kategorie(
post_ID bigint unsigned,
kategorie_ID bigint unsigned,
Primary Key (post_ID, kategorie_ID),
Foreign Key (post_ID) references Post(ID),
Foreign Key (kategorie_ID) references Kategorie(ID)
)ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The Query
This is my current query to get all posts tagged with a specific category, which is determined by the category's ID:
SELECT Post.headline, Post.date, Post.ID,
CONCAT(
"[", GROUP_CONCAT('{"name":"',Kategorie.name,'","id":',Kategorie.ID,'}'), "]"
) as "categorys"
FROM Post
INNER JOIN Post_Kategorie
ON Post.ID = Post_Kategorie.post_ID
INNER JOIN Kategorie
ON Post_Kategorie.kategorie_ID = 2
WHERE Post.public = 1
AND Post.type = 0
GROUP BY Post.headline, Post.date
ORDER BY Post.date DESC
LIMIT 0, 20
The query works for listing all posts tagged with a specific category, but the categorys column gets mixed up: every listed post shows all available categories (every category in the Kategorie table).
I'm sure the problem lies in the INNER JOIN condition, but I have no clue where. Please point me in the right direction.
I suspect there might be issues with your CONCAT function, as it mixes different types of quotation marks. I think "[" and "]" should be respectively '[' and ']'.
Otherwise, the problem does seem to be with one of the joins. In particular, INNER JOIN Kategorie does not specify the joining condition, which, I think, should be Post_Kategorie.Kategorie_ID = Kategorie.ID.
The entire query should thus be something like this:
SELECT Post.headline, Post.date, Post.ID,
CONCAT(
"[", GROUP_CONCAT('{"name":"',Kategorie.name,'","id":',Kategorie.ID,'}'), "]"
) as "categorys"
FROM Post
INNER JOIN Post_Kategorie
ON Post.ID = Post_Kategorie.post_ID
INNER JOIN Kategorie
ON Post_Kategorie.Kategorie_ID = Kategorie.ID
WHERE Post.public = 1
AND Post.type = 0
GROUP BY Post.headline, Post.date
HAVING MAX(CASE Post_Kategorie.kategorie_ID WHEN 2 THEN 1 ELSE 0 END) = 1
ORDER BY Post.date DESC
LIMIT 0, 20
The Post_Kategorie.kategorie_ID = 2 condition has been modified to a CASE expression and moved to the HAVING clause, and it is used together with the MAX() aggregate function. This works as follows:
If at least one of a post's categories has Kategorie.ID = 2, the CASE expression returns 1 for that row, so MAX evaluates to 1 too. Consequently, the whole group is valid and remains in the output.
If none of the post's categories has that ID, the CASE expression never evaluates to 1, so MAX is 0. As a result, the entire group is discarded.
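The effect of the HAVING filter can be tried out in a small sandbox. The following is only a sketch, assuming Python's bundled sqlite3 module in place of MySQL and a cut-down version of the three tables with made-up rows. It shows that a post tagged with category 2 stays in the result with all of its categories counted, while a post without that category is dropped.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL schema (columns reduced for brevity).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Post (ID INTEGER PRIMARY KEY, headline TEXT);
CREATE TABLE Kategorie (ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Post_Kategorie (post_ID INTEGER, kategorie_ID INTEGER);
INSERT INTO Post VALUES (1, 'first'), (2, 'second');
INSERT INTO Kategorie VALUES (1, 'news'), (2, 'sql'), (3, 'misc');
-- Post 1 carries categories 2 and 3; post 2 carries only category 1.
INSERT INTO Post_Kategorie VALUES (1, 2), (1, 3), (2, 1);
""")

# Keep a group only if at least one of its rows has kategorie_ID = 2.
rows = conn.execute("""
SELECT Post.ID, COUNT(*) AS category_count
FROM Post
JOIN Post_Kategorie ON Post.ID = Post_Kategorie.post_ID
JOIN Kategorie ON Post_Kategorie.kategorie_ID = Kategorie.ID
GROUP BY Post.ID
HAVING MAX(CASE Post_Kategorie.kategorie_ID WHEN 2 THEN 1 ELSE 0 END) = 1
""").fetchall()

print(rows)  # [(1, 2)]: post 1 survives with both of its categories
```

Filtering in HAVING rather than WHERE is what keeps the other categories of a matching post available to the aggregate.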
Related
I need to select rows from images where the set of tags belonging to an image contains at least all of the tags specified in a list of strings.
CREATE TABLE images (
image_checksum varchar(56) NOT NULL,
filename varchar(56),
PRIMARY KEY (image_checksum)
);
CREATE TABLE tags (
id int NOT NULL AUTO_INCREMENT,
name varchar(64),
confidence DECIMAL(5,2),
image varchar(56) NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (image) REFERENCES images(image_checksum)
);
I have this query that returns all of the images with tags that contain ANY of the objects specified in the list. The list will be of variable length depending on what comes in from the client. I have two images in my database: one of a dog, one of a cat. The query I need should return zero results here, because neither image contains a dog AND a cat.
SELECT DISTINCT images.image_checksum, images.filename, tags.name, tags.confidence from images
LEFT OUTER JOIN tags ON (tags.image = images.image_checksum)
WHERE name in ('dog','cat');
Any help is appreciated!
You want window functions to count the number of matching tags. Then use that for filtering:
SELECT it.*
FROM (SELECT i.image_checksum, i.filename, t.name, t.confidence,
COUNT(*) OVER (PARTITION BY i.image_checksum) as num_tags
FROM images i JOIN
tags t
ON t.image = i.image_checksum
WHERE t.name in ('dog', 'cat')
) it
WHERE num_tags = 2;
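For a list of variable length, the same "count the matches" idea can also be written with GROUP BY and HAVING instead of a window function. This is a swapped-in variant rather than the query above, sketched with Python's sqlite3 module standing in for MySQL; the helper name images_with_all_tags and the sample rows are made up for illustration.

```python
import sqlite3

def images_with_all_tags(conn, wanted):
    """Return checksums of images that carry every tag name in `wanted`."""
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT i.image_checksum
        FROM images i
        JOIN tags t ON t.image = i.image_checksum
        WHERE t.name IN ({placeholders})
        GROUP BY i.image_checksum
        HAVING COUNT(DISTINCT t.name) = ?
        ORDER BY i.image_checksum
    """
    # The last parameter is the number of distinct tags that must all match.
    return [row[0] for row in conn.execute(sql, [*wanted, len(set(wanted))])]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (image_checksum TEXT PRIMARY KEY, filename TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT, confidence REAL, image TEXT);
INSERT INTO images VALUES ('a', 'dog.jpg'), ('b', 'cat.jpg'), ('c', 'both.jpg');
INSERT INTO tags (name, confidence, image) VALUES
  ('dog', 99.0, 'a'), ('cat', 98.0, 'b'),
  ('cat', 90.0, 'c'), ('dog', 91.0, 'c'), ('bird', 50.0, 'c');
""")

print(images_with_all_tags(conn, ["dog", "cat"]))  # ['c']
```

COUNT(DISTINCT ...) is used so that an image tagged twice with the same name is not counted as having two different tags.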
You could use group_concat for this particular problem.
SELECT images.image_checksum, images.filename, tags.name,
tags.confidence
from images
LEFT OUTER JOIN tags ON (tags.image = images.image_checksum)
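WHERE tags.image in (select t1.image
from tags t1
where t1.name in ('cat', 'dog')
group by t1.image
having group_concat(distinct t1.name order by t1.name asc) = 'cat,dog');
This will return all images that have both tags, but will also return all the tags related to those images.
The inner WHERE keeps only the searched tags, so other tags on the same image cannot end up between them in the concatenated string, and the exact comparison avoids partial matches such as 'bobcat,dog'.
You would have to just make sure that the tags being searched are in alphabetical order, so that it may find them.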
By default, group_concat uses a comma as the separator between values, but this can be overridden with the SEPARATOR keyword:
group_concat(tags.name SEPARATOR ', ')
More info can be obtained here
I have a recipe table called recipes. It holds the IDRecipe field and the other parameters of a recipe, except the categories. Categories are multi-dimensional, so a separate category table (table 1 below) links each recipe to many categories. As you will see below, one recipe can have multiple categories in multiple dimensions. Another table (table 2) describes the categories and dimensions:
-- Table 1
CREATE TABLE `recepti_kategorije` (
`IDRecipe` int(11) NOT NULL,
`IDdimenzija` int(11) NOT NULL,
`IDKategorija` int(11) NOT NULL,
KEY `Iskanje` (`IDdimenzija`,`IDKategorija`,`IDRecipe`) USING BTREE,
KEY `izvlecek_recept` (`IDdimenzija`,`IDRecipe`),
KEY `IDRecipe` (`IDRecipe`,`IDdimenzija`,`IDKategorija`) USING BTREE,
KEY `kategorija` (`IDKategorija`,`IDdimenzija`,`IDRecipe`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_slovenian_ci;
INSERT INTO `recepti_kategorije` VALUES
(1,1,1),
(1,1,2),
(1,2,3),
(1,3,2);
-- Table 2
CREATE TABLE `recepti_dimenzije` (
`IDDimenzija` int(11) NOT NULL,
`IDKategorija` int(11) NOT NULL,
`Ime` char(50) COLLATE utf8_slovenian_ci NOT NULL,
KEY `IDDmenzija` (`IDDimenzija`,`IDKategorija`) USING BTREE,
KEY `IDKategorija` (`IDKategorija`,`IDDimenzija`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_slovenian_ci;
INSERT INTO `recepti_dimenzije` VALUES
(1,1,'cheese'),
(1,2,'eggs'),
(1,3,'meat'),
(1,4,'vegetables'),
(2,1,'main dish'),
(2,2,'sweet'),
(2,3,'soup'),
(3,1,'summer'),
(3,2,'winter');
-- Table 3
CREATE TABLE `recepti_dimenzije_glavne` (
`IDDimenzija` int(11) NOT NULL,
`DimenzijaIme` char(50) COLLATE utf8_slovenian_ci DEFAULT NULL,
PRIMARY KEY (`IDDimenzija`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_slovenian_ci;
INSERT INTO `recepti_dimenzije_glavne` VALUES
(1,'ingredient'),
(2,'type'),
(3,'season');
Table 2 is the lookup table that gives the name of each category within each dimension.
From this example we see that the recipe with ID 1 has the tags cheese and eggs from dimension 1, and is a soup for the winter season.
On my recipe page I need to print the name of each dimension together with all of its category names.
The names of the dimensions come from another table, table 3 (shown above).
What I need is a single query that returns, for recipe ID = 1, all the dimensions group-concatenated with their category names, like:
ingredient: cheese, eggs | type: soup | season: winter
I tried using one subquery per dimension in the SELECT list, and it works, but I need 8 subqueries (I have 8 dimensions in total; the example shows only 3). My query is:
SELECT
r.ID,
(
SELECT
group_concat(ime SEPARATOR ', ')
FROM
recepti_kategorije rkat
JOIN recepti_dimenzije rd ON rd.IDKategorija = rkat.IDKategorija
AND rd.IDDimenzija = rkat.IDdimenzija
WHERE
rkat.IDRecipe = r.ID
AND rkat.IDDimenzija = 1
ORDER BY
ime ASC
) AS ingredient,
(
SELECT
group_concat(ime SEPARATOR ', ')
FROM
recepti_kategorije rkat
JOIN recepti_dimenzije rd ON rd.IDKategorija = rkat.IDKategorija
AND rd.IDDimenzija = rkat.IDdimenzija
WHERE
rkat.IDRecipe = r.ID
AND rkat.IDDimenzija = 2
ORDER BY
ime ASC
) AS type,
(
SELECT
group_concat(ime SEPARATOR ', ')
FROM
recepti_kategorije rkat
JOIN recepti_dimenzije rd ON rd.IDKategorija = rkat.IDKategorija
AND rd.IDDimenzija = rkat.IDdimenzija
WHERE
rkat.IDRecipe = r.ID
AND rkat.IDDimenzija = 3
ORDER BY
ime ASC
) AS season
FROM
recipes r
WHERE
r.ID = 1
That works, but it is rather slow: EXPLAIN reports 6-8 rows examined for each subquery, the query is long, and I still don't get the names of the dimensions because that needs another join.
What would be the optimal way to get all the dimensions separated into fields and concatenated with their category names? This has to be optimised because it runs for a recipe page that is shown about once per second. And which indexes do I need to make it fast?
Something like below, not sure I typed the table/column names right or not, but should be easy to debug:
SELECT c.ID,GROUP_CONCAT(CONCAT(d.DimenzijaIme,': ',c.imes) SEPARATOR ' | ')
FROM (
SELECT
r.ID,rkat.IDDimenzija,
group_concat(rd.ime ORDER BY rd.ime SEPARATOR ', ') AS imes
FROM recepti_kategorije rkat
JOIN recepti_dimenzije rd
ON rd.IDKategorija = rkat.IDKategorija
AND rd.IDDimenzija = rkat.IDdimenzija
INNER JOIN recipes r
ON r.ID=rkat.IDRecipe
GROUP BY r.ID,rkat.IDDimenzija) c
INNER JOIN recepti_dimenzije_glavne d
ON d.IDDimenzija=c.IDDimenzija
GROUP BY c.ID
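The two-level aggregation can be checked against the sample rows from the question. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL, so it uses SQLite's group_concat(value, separator) form and the || operator instead of CONCAT, and it filters on the recipe ID in the inner query. Table and column names follow the question; everything else is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recepti_kategorije (IDRecipe INT, IDdimenzija INT, IDKategorija INT);
CREATE TABLE recepti_dimenzije (IDDimenzija INT, IDKategorija INT, Ime TEXT);
CREATE TABLE recepti_dimenzije_glavne (IDDimenzija INT PRIMARY KEY, DimenzijaIme TEXT);
INSERT INTO recepti_kategorije VALUES (1,1,1),(1,1,2),(1,2,3),(1,3,2);
INSERT INTO recepti_dimenzije VALUES
  (1,1,'cheese'),(1,2,'eggs'),(1,3,'meat'),(1,4,'vegetables'),
  (2,1,'main dish'),(2,2,'sweet'),(2,3,'soup'),(3,1,'summer'),(3,2,'winter');
INSERT INTO recepti_dimenzije_glavne VALUES (1,'ingredient'),(2,'type'),(3,'season');
""")

# Inner query: one row per dimension with its category names joined by ', '.
# Outer query: join those rows into a single 'dimension: names' string.
summary = conn.execute("""
SELECT group_concat(d.DimenzijaIme || ': ' || c.imes, ' | ')
FROM (SELECT rkat.IDRecipe, rkat.IDdimenzija,
             group_concat(rd.Ime, ', ') AS imes
      FROM recepti_kategorije rkat
      JOIN recepti_dimenzije rd
        ON rd.IDKategorija = rkat.IDKategorija
       AND rd.IDDimenzija = rkat.IDdimenzija
      WHERE rkat.IDRecipe = ?
      GROUP BY rkat.IDRecipe, rkat.IDdimenzija) c
JOIN recepti_dimenzije_glavne d ON d.IDDimenzija = c.IDdimenzija
GROUP BY c.IDRecipe
""", (1,)).fetchone()[0]

# SQLite does not promise an order inside group_concat, so the string is
# parsed back into a dict and compared without relying on the order.
parts = {
    name: sorted(values.split(", "))
    for name, values in (chunk.split(": ") for chunk in summary.split(" | "))
}
print(parts)
```

In MySQL the ORDER BY inside GROUP_CONCAT makes the order of the names deterministic, which this SQLite stand-in does not guarantee.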
I have a database in which I store a Japanese dictionary: words, readings, tags, types, meanings in other languages (English is the most important here, but there are a few others as well) and so on.
Now I want to create an interface using the DataTables JS plugin, so that the user can see a table and use some filtering options (for example, show only verbs, or find entries containing "dog"). However, I'm struggling with the query, which can be pretty slow when filtering is used. I have already sped it up a lot, but it is still not good enough.
This is my basic query:
select
v.id,
(
select group_concat(distinct vke.kanji_element separator '; ') from vocabulary_kanji_element as vke
where vke.vocabulary_id = v.id
) kanji_notation,
(
select group_concat(distinct vre.reading_element separator '; ') from vocabulary_reading_element as vre
where vre.vocabulary_id = v.id
) reading_notation,
(
select group_concat(distinct vsg.gloss separator '; ') from vocabulary_sense_gloss as vsg
join vocabulary_sense as vs on vsg.sense_id = vs.id
join language as l on vsg.language_id = l.id and l.language_code = 'eng'
where vs.vocabulary_id = v.id
) meanings,
(
select group_concat(distinct pos.name_code separator '; ') from vocabulary_sense as vs
join vocabulary_sense_has_pos as vshp on vshp.sense_id = vs.id
join part_of_speech as pos on pos.id = vshp.pos_id
where vs.vocabulary_id = v.id
) pos
from vocabulary as v
join vocabulary_sense as vs on vs.vocabulary_id = v.id
join vocabulary_sense_gloss as vsg on vsg.sense_id = vs.id
join vocabulary_kanji_element as vke on vke.vocabulary_id = v.id
join vocabulary_reading_element as vre on vre.vocabulary_id = v.id
join language as l on l.id = vsg.language_id and l.language_code = 'eng'
join vocabulary_sense_has_pos as vshp on vshp.sense_id = vs.id
join part_of_speech as pos on pos.id = vshp.pos_id
where
-- pos.name_code = 'n' and
(vsg.gloss like '%eat%' OR vke.kanji_element like '%eat%' OR vre.reading_element like '%eat%')
group by v.id
order by v.id desc
-- limit 3900, 25
Output is something like this:
|id | kanji_notation | reading_notation | meanings | pos |
---------------------------------------------------------------
|117312| お手; 御手 | おて | hand; arm |n; int|
Right now (working on my local machine), with no WHERE clause but with a LIMIT, it runs fast: about 0.14 s. But with text filtering on, execution time goes up to 6.5 s and often more. Filtering on part_of_speech first brings it to about 5.5 s. 3 s would be acceptable, but 6 s is just way too long.
There are 1,155,897 records in the vocabulary_sense_gloss table, which I don't think is a lot.
CREATE TABLE `vocabulary_sense_gloss` (
`id` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`sense_id` MEDIUMINT(8) UNSIGNED NOT NULL,
`gloss` VARCHAR(255) NOT NULL,
`language_id` MEDIUMINT(8) UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
INDEX `vocabulary_sense_gloss_vocabulary_sense_id` (`sense_id`),
INDEX `vocabulary_sense_gloss_language_id` (`language_id`),
FULLTEXT INDEX `vocabulary_sense_gloss_gloss` (`gloss`),
CONSTRAINT `vocabulary_sense_gloss_language_id` FOREIGN KEY (`language_id`) REFERENCES `language` (`id`),
CONSTRAINT `vocabulary_sense_gloss_vocabulary_sense_id` FOREIGN KEY (`sense_id`) REFERENCES `vocabulary_sense` (`id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
I wonder whether there is some way to optimize it, or whether I should change my database design. I tried fulltext search, but it is not much faster and seems to match only whole terms, so it's no use. It's a similar story with 'eat%' instead of '%eat%': it won't return what I want.
I tried splitting vocabulary_sense_gloss into two tables, one with English-only terms and the other with the rest. Since users would usually search in English anyway, it would make things faster, but I'm not sure that's a good approach.
I also tried changing VARCHAR to CHAR. It seemed to reduce execution time, though the table size went up a lot.
This WHERE clause has extremely poor performance.
(vsg.gloss like '%eat%' OR
vke.kanji_element like '%eat%' OR
vre.reading_element like '%eat%')
Why? First of all: column LIKE '%constant%' requires the query engine to examine every possible value of column. It can't possibly use an index because of the leading % in the constant search term.
Second: the OR clause means the query planner has to scan the results three different times.
What are you going to do to improve this? It won't be easy. You need to figure out how to use column LIKE 'constant%' search terms eliminating the leading % from the constants.
Once you do that, you may be able to beat the triple scan of your vast joined result set with a construct like this
...
WHERE v.id IN
(SELECT vs.vocabulary_id AS id
FROM vocabulary_sense_gloss vsg
JOIN vocabulary_sense vs ON vs.id = vsg.sense_id
WHERE vsg.gloss LIKE 'eat%'
UNION
SELECT vocabulary_id AS id
FROM vocabulary_kanji_element
WHERE kanji_element LIKE 'eat%'
UNION
SELECT vocabulary_id AS id
FROM vocabulary_reading_element
WHERE reading_element LIKE 'eat%'
)
This will pull the id numbers of the relevant words directly, rather than from the result of a multiway JOIN. For this to be fast, your vocabulary_sense_gloss table will need an index on (gloss, sense_id). The other two tables will need similar indexes.
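To see what the UNION of prefix searches returns, here is a small sketch using Python's sqlite3 module instead of MySQL. It assumes, as in the joins of the question, that a gloss reaches its vocabulary entry through vocabulary_sense. The tables are cut down to the columns involved and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vocabulary (id INTEGER PRIMARY KEY);
CREATE TABLE vocabulary_sense (id INTEGER PRIMARY KEY, vocabulary_id INT);
CREATE TABLE vocabulary_sense_gloss (id INTEGER PRIMARY KEY, sense_id INT, gloss TEXT);
CREATE TABLE vocabulary_kanji_element (vocabulary_id INT, kanji_element TEXT);
CREATE TABLE vocabulary_reading_element (vocabulary_id INT, reading_element TEXT);
INSERT INTO vocabulary VALUES (1), (2), (3);
INSERT INTO vocabulary_sense VALUES (10, 1), (20, 2), (30, 3);
INSERT INTO vocabulary_sense_gloss VALUES
  (1, 10, 'eat'), (2, 20, 'eating out'), (3, 30, 'meat');
INSERT INTO vocabulary_kanji_element VALUES (1, '食べる'), (3, '肉');
INSERT INTO vocabulary_reading_element VALUES (1, 'たべる'), (3, 'にく');
""")

# Each branch of the UNION is a single-table prefix search that yields
# vocabulary ids; UNION removes duplicates across the three branches.
ids = [row[0] for row in conn.execute("""
SELECT v.id FROM vocabulary v
WHERE v.id IN (
    SELECT vs.vocabulary_id FROM vocabulary_sense_gloss vsg
    JOIN vocabulary_sense vs ON vs.id = vsg.sense_id
    WHERE vsg.gloss LIKE ?
    UNION
    SELECT vocabulary_id FROM vocabulary_kanji_element WHERE kanji_element LIKE ?
    UNION
    SELECT vocabulary_id FROM vocabulary_reading_element WHERE reading_element LIKE ?
)
ORDER BY v.id
""", ("eat%",) * 3)]

print(ids)  # [1, 2]: 'meat' no longer matches once the leading % is gone
```

The output also shows the trade-off mentioned in the question: a prefix pattern can use an index, but it stops matching terms that merely contain the search string.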
I am trying to interpret the output of MySQL's EXPLAIN for a query. This is the table:
create table text_mess(
datamess timestamp(3) DEFAULT 0,
sender bigint ,
recipient bigint ,
roger boolean,
msg char(255),
foreign key(recipient)
references users (tel)
on delete cascade
on update cascade,
primary key(datamess,sender)
)
engine = InnoDB
this is the first type of query :
EXPLAIN
select /*!STRAIGHT_JOIN*/datamess, sender,recipient,roger,msg
from text_mess join (select max(datamess)as dmess
from text_mess
where roger = true
group by sender,recipient) as max
on text_mess.datamess=max.dmess ;
and this is the second:
EXPLAIN
select /*!STRAIGHT_JOIN*/datamess, sender,recipient,roger,msg
from (select max(datamess)as dmess
from text_mess
where roger = true
group by sender,recipient) as max
join
text_mess
on max.dmess = text_mess.datamess ;
The two queries ask for the same thing; the only difference is which table drives the join (the ref table): text_mess in the first case, the subquery in the second:
![first and second query][1]
As you can see, the difference is in the order of the first two rows. My question is about the second (faster) query in particular.
The second row should be the inner table, but if so, why does the ref column show max.dmess, which is a column of the ref table (the subquery)?
Also, does the last row describe how the first one is built?
Finally, do you think there is a more efficient query?
I have this query in sqlite:
SELECT
'L_MEDIA_ARTIST'.'MEDIA_ID'
FROM
'L_MEDIA_ARTIST',
'L_ARTIST_CAT',
'ARTIST_CAT'
WHERE
'L_ARTIST_CAT'.'ART_ID' == 'L_MEDIA_ARTIST'.'ART_ID'
AND
'L_ARTIST_CAT'.'ART_CAT_ID' == 'ARTIST_CAT'.'ID'
AND
('ARTIST_CAT'.'NAME' == 'SINGER' OR 'ARTIST_CAT'.'NAME' == 'ACTOR')
which just selects all the media IDs for which the artist has at least one of the tags 'SINGER' or 'ACTOR'.
How can I change this query to obtain the list of all media for which the artist has neither the tag 'SINGER' nor the tag 'ACTOR'?
The tables involved are built as follows:
CREATE TABLE 'L_MEDIA_ARTIST' (
'MEDIA_ID' INTEGER,
'ART_ID' INTEGER,
FOREIGN KEY('MEDIA_ID') REFERENCES MEDIA('ID'),
FOREIGN KEY('ART_ID') REFERENCES ARTIST('ID'),
UNIQUE('MEDIA_ID', 'ART_ID'));
CREATE TABLE 'L_ARTIST_CAT' (
'ART_ID' INTEGER,
'ART_CAT_ID' INTEGER,
FOREIGN KEY('ART_ID') REFERENCES ARTIST('ID'),
FOREIGN KEY('ART_CAT_ID') REFERENCES ARTIST_CAT('ID'),
UNIQUE('ART_ID', 'ART_CAT_ID'));
CREATE TABLE 'ARTIST_CAT' (
'ID' INTEGER PRIMARY KEY,
'NAME' TEXT NOT NULL UNIQUE);
You need an aggregation query for this, because you have to check that none of the values for a media are in the list. Just looking at one row doesn't provide enough information:
SELECT l.MEDIA_ID
FROM L_MEDIA_ARTIST l JOIN
L_ARTIST_CAT ac
ON l.ART_ID = ac.ART_ID JOIN
ARTIST_CAT c
ON ac.ART_CAT_ID = c.ID
GROUP BY l.MEDIA_ID
HAVING SUM(CASE WHEN c.Name IN ('SINGER', 'ACTOR') THEN 1 ELSE 0 END) = 0;
Note that I also fixed the query:
Introduced proper join syntax. You should learn modern join syntax.
Added table aliases so the query is easier to write and to read.
Removed the single quotes around table and column names, which just cause syntax errors.
The HAVING clause counts the number of times that "SINGER" and "ACTOR" are found in the data. The = 0 ensures there are none for a given media.
The media IDs that you do not want can be retrieved with this query:
SELECT L_Media_Artist.Media_ID
FROM L_Media_Artist
JOIN L_Artist_Cat USING (Art_ID)
JOIN Artist_Cat ON L_Artist_Cat.Art_Cat_ID = Artist_Cat.ID
WHERE Artist_Cat.Name IN ('SINGER', 'ACTOR')
(This is the same as your first query.)
So you want all media that are not one of those:
SELECT ID
FROM Media
WHERE ID NOT IN (SELECT L_Media_Artist.Media_ID
FROM L_Media_Artist
JOIN L_Artist_Cat USING (Art_ID)
JOIN Artist_Cat ON L_Artist_Cat.Art_Cat_ID = Artist_Cat.ID
WHERE Artist_Cat.Name IN ('SINGER', 'ACTOR'))
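The aggregation approach and the NOT IN approach do not treat every edge case the same way, which a quick run makes visible. This is a sketch with Python's sqlite3 module and made-up rows, assuming a MEDIA table with an ID column as implied by the foreign keys in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MEDIA (ID INTEGER PRIMARY KEY);
CREATE TABLE L_MEDIA_ARTIST (MEDIA_ID INTEGER, ART_ID INTEGER);
CREATE TABLE L_ARTIST_CAT (ART_ID INTEGER, ART_CAT_ID INTEGER);
CREATE TABLE ARTIST_CAT (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL UNIQUE);
INSERT INTO MEDIA VALUES (1), (2), (3), (4);
INSERT INTO ARTIST_CAT VALUES (1, 'SINGER'), (2, 'ACTOR'), (3, 'DIRECTOR');
-- media 1: a singer; media 2: a director;
-- media 3: an artist without any category; media 4: no artist at all
INSERT INTO L_MEDIA_ARTIST VALUES (1, 100), (2, 200), (3, 300);
INSERT INTO L_ARTIST_CAT VALUES (100, 1), (200, 3);
""")

# Aggregation: only media that survive the inner joins can appear.
by_having = [r[0] for r in conn.execute("""
SELECT l.MEDIA_ID
FROM L_MEDIA_ARTIST l
JOIN L_ARTIST_CAT ac ON l.ART_ID = ac.ART_ID
JOIN ARTIST_CAT c ON ac.ART_CAT_ID = c.ID
GROUP BY l.MEDIA_ID
HAVING SUM(CASE WHEN c.NAME IN ('SINGER', 'ACTOR') THEN 1 ELSE 0 END) = 0
ORDER BY l.MEDIA_ID
""")]

# NOT IN: starts from every media row and removes the unwanted ones.
by_not_in = [r[0] for r in conn.execute("""
SELECT ID FROM MEDIA
WHERE ID NOT IN (SELECT L_MEDIA_ARTIST.MEDIA_ID
                 FROM L_MEDIA_ARTIST
                 JOIN L_ARTIST_CAT USING (ART_ID)
                 JOIN ARTIST_CAT ON L_ARTIST_CAT.ART_CAT_ID = ARTIST_CAT.ID
                 WHERE ARTIST_CAT.NAME IN ('SINGER', 'ACTOR'))
ORDER BY ID
""")]

print(by_having)  # [2]
print(by_not_in)  # [2, 3, 4]
```

Which result is the right one depends on whether media without artists, or artists without categories, should count as having "neither" tag.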