I have a difficult MySQL question. I have a database that matches keywords to cases, simplified:
TABLE cases (id)
TABLE keywords (id)
TABLE cases_keywords (case, keyword)
So I could have case A with keywords Y and Z and I could have case B with keywords X and Y. This gives me four rows in cases_keywords:
(A, Y)
(A, Z)
(B, X)
(B, Y)
The problem:
I have a search on my site where users type in keywords. I want the results to show only matches where all keywords are found. So when a user types in Y and Z as keywords, only case A appears (as case B does not have keyword Z). But when the user only gives Y as a keyword, the site shows both A and B (as both have keyword Y).
I know the query has to be built dynamically in PHP; the big question is how to match on multiple keywords. Let's say I want to match 2 keywords, Y and Z, with the result: case A.
How do I program this into a query? How can I match on multiple rows?
SELECT C.id FROM cases C JOIN cases_keywords CK ON CK.case = C.id WHERE CK.keyword = Y AND CK.keyword = Z
The above is not working, so I tried something with WHERE C.case IN (), but I got stuck there too...
Someone please help me.
2 suggestions here.
The first one must be built dynamically and won't be that efficient, but it is easy to understand:
SELECT C.id
FROM cases C
JOIN cases_keywords CK1 ON CK1.case = C.id AND CK1.keyword = 'Y'
JOIN cases_keywords CK2 ON CK2.case = C.id AND CK2.keyword = 'Z'
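To make the "built dynamically" part concrete, here is a minimal sketch that generates one JOIN per keyword. It uses SQLite through Python's standard sqlite3 module as a stand-in for MySQL/PHP, and the link column is renamed case_id (an assumption) because CASE is a reserved word. The keyword values are bound as parameters; only the number of JOINs changes.

```python
import sqlite3

# Hypothetical in-memory copy of the example data (case_id instead of `case`).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (id TEXT PRIMARY KEY);
CREATE TABLE cases_keywords (
    case_id TEXT,
    keyword TEXT,
    PRIMARY KEY (case_id, keyword)
);
INSERT INTO cases VALUES ('A'), ('B');
INSERT INTO cases_keywords VALUES ('A', 'Y'), ('A', 'Z'), ('B', 'X'), ('B', 'Y');
""")


def build_self_join_query(n_keywords: int) -> str:
    """One JOIN per keyword: a case survives only if every join finds a row."""
    joins = "\n".join(
        f"JOIN cases_keywords CK{i} "
        f"ON CK{i}.case_id = C.id AND CK{i}.keyword = ?"
        for i in range(1, n_keywords + 1)
    )
    return f"SELECT C.id\nFROM cases C\n{joins}\nORDER BY C.id"


def search_all_keywords(conn, keywords):
    # Keywords are bound as parameters, never pasted into the SQL text.
    sql = build_self_join_query(len(keywords))
    return [row[0] for row in conn.execute(sql, list(keywords))]


print(search_all_keywords(conn, ["Y", "Z"]))  # ['A']
print(search_all_keywords(conn, ["Y"]))       # ['A', 'B']
```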
The second is more efficient, and its shape does not change with the number of keywords (only the IN list and the count in HAVING do):
SELECT C.id, COUNT(CK.keyword) AS KeywordCount
FROM cases C
JOIN cases_keywords CK ON CK.case = C.id
WHERE CK.keyword IN ('Y', 'Z')
GROUP BY C.id
HAVING KeywordCount = 2
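A hedged sketch of this GROUP BY/HAVING approach, again with SQLite via Python's sqlite3 standing in for MySQL and an assumed case_id column. The search terms are de-duplicated and the HAVING clause uses COUNT(DISTINCT ...), so a repeated term or a duplicate link row cannot inflate the count; the function name search_cases is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (id TEXT PRIMARY KEY);
CREATE TABLE cases_keywords (
    case_id TEXT,
    keyword TEXT,
    PRIMARY KEY (case_id, keyword)
);
INSERT INTO cases VALUES ('A'), ('B');
INSERT INTO cases_keywords VALUES ('A', 'Y'), ('A', 'Z'), ('B', 'X'), ('B', 'Y');
""")


def search_cases(conn, keywords):
    wanted = sorted(set(keywords))  # de-duplicate the search terms
    if not wanted:
        return []  # assumption: an empty search returns nothing
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT C.id
        FROM cases C
        JOIN cases_keywords CK ON CK.case_id = C.id
        WHERE CK.keyword IN ({placeholders})
        GROUP BY C.id
        HAVING COUNT(DISTINCT CK.keyword) = ?  -- every search term matched
        ORDER BY C.id
    """
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]


print(search_cases(conn, ["Y", "Z"]))  # ['A']
print(search_cases(conn, ["Y"]))       # ['A', 'B']
```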
If you can use the MyISAM storage engine (or InnoDB from MySQL 5.6 onward), you can do the following with FULLTEXT indexes (which then allows you to have keywords that are actually phrases):
CREATE TABLE cases (id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(10) NOT NULL) Engine = MyISAM;
CREATE TABLE keywords (id INT AUTO_INCREMENT PRIMARY KEY,
word VARCHAR(10) NOT NULL) Engine = MyISAM;
ALTER TABLE keywords ADD FULLTEXT INDEX (word);
CREATE TABLE cases_keywords (`case` INT,
keyword INT,
PRIMARY KEY (`case`,keyword)) Engine = MyISAM;
INSERT INTO cases VALUES (1,'Alpha'),(2,'Beta');
INSERT INTO keywords VALUES (1,'Xylophone'),(2,'Yaks'),(3,'Zebra');
INSERT INTO cases_keywords VALUES (1,2),(1,3),(2,1),(2,2);
SELECT `cases`.name, COUNT(keywords.id) AS matches
FROM `cases`
JOIN cases_keywords
ON cases_keywords.case = `cases`.id
JOIN keywords
ON keywords.id = cases_keywords.keyword
WHERE MATCH(word) AGAINST ('(Yaks Zebra) ("Yaks Zebra") (+Yaks* +Zebra*)' IN BOOLEAN MODE)
GROUP BY `cases`.id
ORDER BY matches DESC
Working SQLFiddle: http://sqlfiddle.com/#!2/49c19/2
Obviously you can tweak the fulltext search to behave however you want, but we used the following BOOLEAN MODE search, which fit our requirements:
(Yaks Zebra) -> matches any occurrence of Yaks or Zebra in the `word` column
("Yaks Zebra") -> matches the whole phrase "Yaks Zebra" in the `word` column
(+Yaks* +Zebra*) -> requires both Yaks and Zebra to be in the `word` column, but also allows extra characters after each of those words...
I have a database in which I'm storing a Japanese dictionary: words, readings, tags, types, meanings in other languages (English is the most important here, but there are a few others) and so on.
Now I want to create an interface using the DataTables JS plugin, so users can see the table and use some filtering options (like showing only verbs, or finding entries containing "dog"). However, I'm struggling with the query, which can be pretty slow when filtering is used. I've already sped it up a lot, but it's still not good enough.
This is my basic query:
select
v.id,
(
select group_concat(distinct vke.kanji_element separator '; ') from vocabulary_kanji_element as vke
where vke.vocabulary_id = v.id
) kanji_notation,
(
select group_concat(distinct vre.reading_element separator '; ') from vocabulary_reading_element as vre
where vre.vocabulary_id = v.id
) reading_notation,
(
select group_concat(distinct vsg.gloss separator '; ') from vocabulary_sense_gloss as vsg
join vocabulary_sense as vs on vsg.sense_id = vs.id
join language as l on vsg.language_id = l.id and l.language_code = 'eng'
where vs.vocabulary_id = v.id
) meanings,
(
select group_concat(distinct pos.name_code separator '; ') from vocabulary_sense as vs
join vocabulary_sense_has_pos as vshp on vshp.sense_id = vs.id
join part_of_speech as pos on pos.id = vshp.pos_id
where vs.vocabulary_id = v.id
) pos
from vocabulary as v
join vocabulary_sense as vs on vs.vocabulary_id = v.id
join vocabulary_sense_gloss as vsg on vsg.sense_id = vs.id
join vocabulary_kanji_element as vke on vke.vocabulary_id = v.id
join vocabulary_reading_element as vre on vre.vocabulary_id = v.id
join language as l on l.id = vsg.language_id and l.language_code = 'eng'
join vocabulary_sense_has_pos as vshp on vshp.sense_id = vs.id
join part_of_speech as pos on pos.id = vshp.pos_id
where
-- pos.name_code = 'n' and
(vsg.gloss like '%eat%' OR vke.kanji_element like '%eat%' OR vre.reading_element like '%eat%')
group by v.id
order by v.id desc
-- limit 3900, 25
Output is something like this:
|id | kanji_notation | reading_notation | meanings | pos |
---------------------------------------------------------------
|117312| お手; 御手 | おて | hand; arm |n; int|
Right now (working on my local machine), if there's no WHERE clause but there is a LIMIT, it runs fast: about 0.140 sec. But when text filtering is on, execution time goes up to 6.5 sec, and often higher. With filtering on part_of_speech first, it's about 5.5 sec. 3 sec would be OK, but 6 is just way too long.
There are 1,155,897 records in the vocabulary_sense_gloss table, which I don't think is a lot.
CREATE TABLE `vocabulary_sense_gloss` (
`id` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`sense_id` MEDIUMINT(8) UNSIGNED NOT NULL,
`gloss` VARCHAR(255) NOT NULL,
`language_id` MEDIUMINT(8) UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
INDEX `vocabulary_sense_gloss_vocabulary_sense_id` (`sense_id`),
INDEX `vocabulary_sense_gloss_language_id` (`language_id`),
FULLTEXT INDEX `vocabulary_sense_gloss_gloss` (`gloss`),
CONSTRAINT `vocabulary_sense_gloss_language_id` FOREIGN KEY (`language_id`) REFERENCES `language` (`id`),
CONSTRAINT `vocabulary_sense_gloss_vocabulary_sense_id` FOREIGN KEY (`sense_id`) REFERENCES `vocabulary_sense` (`id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
I wonder: is there some way to optimize it? Or should I change my database? I tried fulltext search, but it's not much faster, and it seems to work only on full terms, so it's no use. It's a similar story with using 'eat%' instead of '%eat%': it won't return what I want.
I tried dividing vocabulary_sense_gloss into two tables, one with English-only terms and the other with the rest. Since users would usually search in English anyway, that would make things faster, but I'm not sure it's a good approach.
I also tried changing VARCHAR to CHAR. It seemed to speed up execution, though the table size went up a lot.
This WHERE clause has extremely poor performance.
(vsg.gloss like '%eat%' OR
vke.kanji_element like '%eat%' OR
vre.reading_element like '%eat%')
Why? First of all: column LIKE '%constant%' requires the query engine to examine every possible value of column. It can't possibly use an index because of the leading % in the constant search term.
Second: the OR clause means the query planner has to scan the results three different times.
What are you going to do to improve this? It won't be easy. You need to figure out how to use column LIKE 'constant%' search terms, eliminating the leading % from the constants.
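One way to see why the leading % matters: a prefix pattern is equivalent to a range on the column, which an index can seek into, whereas '%eat%' forces every entry to be examined. The sketch below compares the two with EXPLAIN QUERY PLAN, using SQLite via Python's sqlite3, a hypothetical table gloss and a hypothetical helper prefix_upper_bound. It assumes plain code-point ordering, which does not match case-insensitive collations such as utf8_general_ci, and SQLite's plans are only indicative of what MySQL does.

```python
import sqlite3


def prefix_upper_bound(prefix: str) -> str:
    """Exclusive upper bound for strings starting with `prefix`, assuming
    plain code-point ordering (e.g. 'eat' -> 'eau')."""
    if not prefix:
        raise ValueError("an empty prefix matches every value")
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gloss (vocabulary_id INTEGER, gloss TEXT);
CREATE INDEX gloss_prefix ON gloss (gloss, vocabulary_id);
INSERT INTO gloss VALUES (1, 'eat'), (2, 'eaten'), (3, 'beat'), (4, 'east');
""")


def plan(sql, params=()):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " ".join(str(row[-1]) for row in rows)


lo, hi = "eat", prefix_upper_bound("eat")
range_sql = "SELECT vocabulary_id FROM gloss WHERE gloss >= ? AND gloss < ?"
contains_sql = "SELECT vocabulary_id FROM gloss WHERE gloss LIKE '%eat%'"

print(sorted(r[0] for r in conn.execute(range_sql, (lo, hi))))  # [1, 2]
print(plan(range_sql, (lo, hi)))  # expect a SEARCH step (index range seek)
print(plan(contains_sql))         # expect a SCAN step (every entry read)
```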
Once you do that, you may be able to beat the triple scan of your vast joined result set with a construct like this:
...
WHERE v.id IN
(SELECT vs.vocabulary_id AS id
FROM vocabulary_sense_gloss vsg
JOIN vocabulary_sense vs ON vs.id = vsg.sense_id
WHERE vsg.gloss LIKE 'eat%'
UNION
SELECT vocabulary_id AS id
FROM vocabulary_kanji_element
WHERE kanji_element LIKE 'eat%'
UNION
SELECT vocabulary_id AS id
FROM vocabulary_reading_element
WHERE reading_element LIKE 'eat%'
)
This will pull the id numbers of the relevant words directly, rather than from the result of a multiway JOIN. For this to be fast, vocabulary_sense_gloss will need an index on (gloss, sense_id), and the other two tables will need similar indexes: (kanji_element, vocabulary_id) and (reading_element, vocabulary_id).
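A minimal runnable version of the UNION construct, using SQLite via Python's sqlite3 and made-up rows (the romaji readings are illustrative only). Because vocabulary_sense_gloss references senses rather than words, the gloss branch goes through vocabulary_sense to get a vocabulary_id. The prefix is bound as a parameter and concatenated with '%'; in MySQL you would write CONCAT(?, '%') instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vocabulary (id INTEGER PRIMARY KEY);
CREATE TABLE vocabulary_sense (id INTEGER PRIMARY KEY, vocabulary_id INTEGER);
CREATE TABLE vocabulary_sense_gloss (id INTEGER PRIMARY KEY, sense_id INTEGER, gloss TEXT);
CREATE TABLE vocabulary_kanji_element (vocabulary_id INTEGER, kanji_element TEXT);
CREATE TABLE vocabulary_reading_element (vocabulary_id INTEGER, reading_element TEXT);
-- Prefix-search indexes, leading with the searched column.
CREATE INDEX vsg_gloss ON vocabulary_sense_gloss (gloss, sense_id);
CREATE INDEX vke_kanji ON vocabulary_kanji_element (kanji_element, vocabulary_id);
CREATE INDEX vre_reading ON vocabulary_reading_element (reading_element, vocabulary_id);

INSERT INTO vocabulary VALUES (10), (20), (30);
INSERT INTO vocabulary_sense VALUES (100, 10), (200, 20), (300, 30);
INSERT INTO vocabulary_sense_gloss VALUES (1, 100, 'eat'), (2, 200, 'hand'), (3, 300, 'eaten');
INSERT INTO vocabulary_kanji_element VALUES (10, '食べる'), (20, '手'), (30, '食う');
INSERT INTO vocabulary_reading_element VALUES (10, 'taberu'), (20, 'te'), (30, 'kuu');
""")

SEARCH_SQL = """
SELECT v.id
FROM vocabulary v
WHERE v.id IN (
    SELECT vs.vocabulary_id
    FROM vocabulary_sense_gloss vsg
    JOIN vocabulary_sense vs ON vs.id = vsg.sense_id
    WHERE vsg.gloss LIKE :prefix || '%'
    UNION
    SELECT vocabulary_id FROM vocabulary_kanji_element
    WHERE kanji_element LIKE :prefix || '%'
    UNION
    SELECT vocabulary_id FROM vocabulary_reading_element
    WHERE reading_element LIKE :prefix || '%'
)
ORDER BY v.id DESC
"""


def search(prefix):
    return [row[0] for row in conn.execute(SEARCH_SQL, {"prefix": prefix})]


print(search("eat"))  # [30, 10]
print(search("te"))   # [20]
```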
I have this query in sqlite:
SELECT
'L_MEDIA_ARTIST'.'MEDIA_ID'
FROM
'L_MEDIA_ARTIST',
'L_ARTIST_CAT',
'ARTIST_CAT'
WHERE
'L_ARTIST_CAT'.'ART_ID' == 'L_MEDIA_ARTIST'.'ART_ID'
AND
'L_ARTIST_CAT'.'ART_CAT_ID' == 'ARTIST_CAT'.'ID'
AND
('ARTIST_CAT'.'NAME' == 'SINGER' OR 'ARTIST_CAT'.'NAME' == 'ACTOR')
which selects all the media IDs whose artist has at least one of the tags 'SINGER' or 'ACTOR'.
How can I change this query to obtain the list of all media whose artist has neither the tag 'SINGER' nor the tag 'ACTOR'?
The involved tables are built up as follows:
CREATE TABLE 'L_MEDIA_ARTIST' (
'MEDIA_ID' INTEGER,
'ART_ID' INTEGER,
FOREIGN KEY('MEDIA_ID') REFERENCES MEDIA('ID'),
FOREIGN KEY('ART_ID') REFERENCES ARTIST('ID'),
UNIQUE('MEDIA_ID', 'ART_ID'));
CREATE TABLE 'L_ARTIST_CAT' (
'ART_ID' INTEGER,
'ART_CAT_ID' INTEGER,
FOREIGN KEY('ART_ID') REFERENCES ARTIST('ID'),
FOREIGN KEY('ART_CAT_ID') REFERENCES ARTIST_CAT('ID'),
UNIQUE('ART_ID', 'ART_CAT_ID'));
CREATE TABLE 'ARTIST_CAT' (
'ID' INTEGER PRIMARY KEY,
'NAME' TEXT NOT NULL UNIQUE);
You need an aggregation query for this, because you have to check that none of the values for a given media item are in the list. Looking at one row at a time doesn't provide enough information:
SELECT l.MEDIA_ID
FROM L_MEDIA_ARTIST l JOIN
L_ARTIST_CAT ac
ON l.ART_ID = ac.ART_ID JOIN
ARTIST_CAT c
ON ac.ART_CAT_ID = c.ID
GROUP BY l.MEDIA_ID
HAVING SUM(CASE WHEN c.Name IN ('SINGER', 'ACTOR') THEN 1 ELSE 0 END) = 0;
Note that I also fixed the query:
Introduced proper join syntax. You should learn modern join syntax.
Added table aliases so the query is easier to write and to read.
Removed the single quotes around table and column names, which just cause syntax errors.
The HAVING clause counts the number of times that "SINGER" and "ACTOR" are found in the data. The = 0 ensures there are none for a given media.
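A small sketch to check the HAVING logic, running the query above on SQLite through Python's sqlite3 with made-up data: media 100 has a singer, 101 only a director, 102 a director and an actor, and 103 an artist who is both singer and director. Only 101 should survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE L_MEDIA_ARTIST (MEDIA_ID INTEGER, ART_ID INTEGER, UNIQUE (MEDIA_ID, ART_ID));
CREATE TABLE L_ARTIST_CAT (ART_ID INTEGER, ART_CAT_ID INTEGER, UNIQUE (ART_ID, ART_CAT_ID));
CREATE TABLE ARTIST_CAT (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL UNIQUE);

INSERT INTO ARTIST_CAT VALUES (1, 'SINGER'), (2, 'ACTOR'), (3, 'DIRECTOR');
-- artist 1: singer, 2: actor, 3: director, 4: singer and director
INSERT INTO L_ARTIST_CAT VALUES (1, 1), (2, 2), (3, 3), (4, 1), (4, 3);
INSERT INTO L_MEDIA_ARTIST VALUES (100, 1), (101, 3), (102, 3), (102, 2), (103, 4);
""")

QUERY = """
SELECT l.MEDIA_ID
FROM L_MEDIA_ARTIST l
JOIN L_ARTIST_CAT ac ON l.ART_ID = ac.ART_ID
JOIN ARTIST_CAT c ON ac.ART_CAT_ID = c.ID
GROUP BY l.MEDIA_ID
-- count the SINGER/ACTOR rows per media item and keep only those with none
HAVING SUM(CASE WHEN c.NAME IN ('SINGER', 'ACTOR') THEN 1 ELSE 0 END) = 0
ORDER BY l.MEDIA_ID
"""

result = [row[0] for row in conn.execute(QUERY)]
print(result)  # [101]
```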
The media IDs that you do not want can be retrieved with this query:
SELECT L_Media_Artist.Media_ID
FROM L_Media_Artist
JOIN L_Artist_Cat USING (Art_ID)
JOIN Artist_Cat ON L_Artist_Cat.Art_Cat_ID = Artist_Cat.ID
WHERE Artist_Cat.Name IN ('SINGER', 'ACTOR')
(This is the same as your first query.)
So you want all media that are not one of those:
SELECT ID
FROM Media
WHERE ID NOT IN (SELECT L_Media_Artist.Media_ID
FROM L_Media_Artist
JOIN L_Artist_Cat USING (Art_ID)
JOIN Artist_Cat ON L_Artist_Cat.Art_Cat_ID = Artist_Cat.ID
WHERE Artist_Cat.Name IN ('SINGER', 'ACTOR'))
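The same NOT IN query run on SQLite via Python's sqlite3, with made-up data plus a Media table (assumed to exist, as the foreign keys suggest) and a media item 104 that has no artists. This shows one practical difference from the aggregation approach: 104 is returned here, while the GROUP BY version never sees it because its inner joins produce no rows for it. Also note that NOT IN returns no rows at all if the subquery yields a NULL MEDIA_ID, so NOT EXISTS is the safer form if that column can be NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Media (ID INTEGER PRIMARY KEY);
CREATE TABLE L_MEDIA_ARTIST (MEDIA_ID INTEGER, ART_ID INTEGER, UNIQUE (MEDIA_ID, ART_ID));
CREATE TABLE L_ARTIST_CAT (ART_ID INTEGER, ART_CAT_ID INTEGER, UNIQUE (ART_ID, ART_CAT_ID));
CREATE TABLE ARTIST_CAT (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL UNIQUE);

INSERT INTO Media VALUES (100), (101), (102), (103), (104);  -- 104 has no artists
INSERT INTO ARTIST_CAT VALUES (1, 'SINGER'), (2, 'ACTOR'), (3, 'DIRECTOR');
INSERT INTO L_ARTIST_CAT VALUES (1, 1), (2, 2), (3, 3), (4, 1), (4, 3);
INSERT INTO L_MEDIA_ARTIST VALUES (100, 1), (101, 3), (102, 3), (102, 2), (103, 4);
""")

# The answer's query, with an ORDER BY added for stable output.
NOT_IN_QUERY = """
SELECT ID
FROM Media
WHERE ID NOT IN (SELECT L_Media_Artist.Media_ID
                 FROM L_Media_Artist
                 JOIN L_Artist_Cat USING (Art_ID)
                 JOIN Artist_Cat ON L_Artist_Cat.Art_Cat_ID = Artist_Cat.ID
                 WHERE Artist_Cat.Name IN ('SINGER', 'ACTOR'))
ORDER BY ID
"""

result = [row[0] for row in conn.execute(NOT_IN_QUERY)]
print(result)  # [101, 104]
```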
I have a table with a composite key composed of 2 columns, say Name and ID. I have a service that gives me the keys (name, id combinations) of the rows to keep; the rest I need to delete. If the key were only 1 column, I could use
delete from table_name where name not in (list_of_valid_names)
but how do I make the query so that I can say something like
name not in (valid_names) and id not in(valid_ids)
// this won't work, since separately they don't identify a unique record, or will it?
Use MySQL's special "multiple value" (row constructor) IN syntax:
delete from table_name
where (name, id) not in (select name, id from some_table where some_condition);
If your list is a literal list, you can still use this approach:
delete from table_name
where (name, id) not in (select 'john', 1 union select 'sally', 2);
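A parameterized sketch of the row-constructor NOT IN delete, using SQLite (which supports row values from 3.15) via Python's sqlite3; the helper delete_except is hypothetical. It builds one SELECT ?, ? per pair, like the literal-list form above. SQLite limits compound SELECTs (500 terms by default) and the number of bound parameters, so for long lists a temporary table is the better fit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_name (name TEXT, id INTEGER, PRIMARY KEY (name, id));
INSERT INTO table_name VALUES ('john', 1), ('john', 2), ('sally', 2), ('bob', 3);
""")


def delete_except(conn, keep):
    """Delete every row whose (name, id) pair is not listed in `keep`."""
    if not keep:
        conn.execute("DELETE FROM table_name")  # nothing to keep
        return
    # One "SELECT ?, ?" per pair, mirroring the literal-list form.
    keep_rows = " UNION ALL ".join("SELECT ?, ?" for _ in keep)
    params = [value for pair in keep for value in pair]
    conn.execute(
        f"DELETE FROM table_name WHERE (name, id) NOT IN ({keep_rows})", params
    )


delete_except(conn, [("john", 1), ("sally", 2)])
remaining = conn.execute("SELECT name, id FROM table_name ORDER BY name, id").fetchall()
print(remaining)  # [('john', 1), ('sally', 2)]
```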
Actually, I retract my comment about needing special juice or being stuck AND/OR-ing all your options.
Since you have a list of the values you want to retain, dump that into a temporary table. Then delete from the base table whatever does not exist in the temporary table (a left outer join). I'm not great at MySQL syntax, or I'd cobble together your query; the pseudocode is approximate:
DELETE
B
FROM
BASE B
LEFT OUTER JOIN
RETAIN R -- the temporary table of (key1, key2) pairs to keep
ON R.key1 = B.key1
AND R.key2 = B.key2
WHERE
R.key1 IS NULL
The NOT EXISTS version:
DELETE
b
FROM
BaseTable b
WHERE
NOT EXISTS
( SELECT
*
FROM
RetainTable r
WHERE
(r.key1, r.key2) = (b.key1, b.key2)
)
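For comparison, the temporary-table plus NOT EXISTS idea as a runnable sketch on SQLite via Python's sqlite3. SQLite has no DELETE alias FROM ... form, so the subquery correlates on the table name instead; base_table, retain_table and delete_not_retained are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base_table (key1 TEXT, key2 INTEGER, PRIMARY KEY (key1, key2));
INSERT INTO base_table VALUES ('john', 1), ('john', 2), ('sally', 2), ('bob', 3);
CREATE TEMP TABLE retain_table (key1 TEXT, key2 INTEGER, PRIMARY KEY (key1, key2));
""")


def delete_not_retained(conn, keep):
    # Refill the temporary table with the keys returned by the service.
    conn.execute("DELETE FROM retain_table")
    conn.executemany("INSERT OR IGNORE INTO retain_table VALUES (?, ?)", keep)
    # Anti-join: remove every base row with no matching retained key.
    conn.execute("""
        DELETE FROM base_table
        WHERE NOT EXISTS (
            SELECT 1
            FROM retain_table AS r
            WHERE r.key1 = base_table.key1 AND r.key2 = base_table.key2
        )
    """)


delete_not_retained(conn, [("john", 1), ("sally", 2), ("nobody", 9)])
remaining = conn.execute("SELECT key1, key2 FROM base_table ORDER BY key1, key2").fetchall()
print(remaining)  # [('john', 1), ('sally', 2)]
```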
This query will run behind a cached autocomplete text box, possibly for thousands of users at the same time. What I have below works, but I feel there may be a better way to do what I am doing.
Any advice?
UPDATED: the search term can be 'something%' (no leading wildcard):
SELECT a.`object_id`, a.`type`,
IF( b.`name` IS NOT NULL, b.`name`,
IF( c.`name` IS NOT NULL, c.`name`,
IF( d.`name` IS NOT NULL, d.`name`,
IF ( e.`name` IS NOT NULL, e.`name`, f.`name` )
)
)
) AS name
FROM `user_permissions` AS a
LEFT JOIN `divisions` AS b
ON ( a.`object_id` = b.`division_id`
AND a.`type` = 'division'
AND b.`status` = 1 )
LEFT JOIN `departments` AS c
ON ( a.`object_id` = c.`department_id`
AND a.`type` = 'department'
AND c.`status` = 1 )
LEFT JOIN `sections` AS d
ON ( a.`object_id` = d.`section_id`
AND a.`type` = 'section'
AND d.`status` = 1 )
LEFT JOIN `units` AS e
ON ( a.`object_id` = e.`unit_id`
AND a.`type` = 'unit'
AND e.`status` = 1 )
LEFT JOIN `positions` AS f
ON ( a.`object_id` = f.`position_id`
AND a.`type` = 'position'
AND f.`status` = 1 )
WHERE a.`user_id` = 1 AND (
b.`name` LIKE CONCAT(?, '%') OR
c.`name` LIKE CONCAT(?, '%') OR
d.`name` LIKE CONCAT(?, '%') OR
e.`name` LIKE CONCAT(?, '%') OR
f.`name` LIKE CONCAT(?, '%')
)
Two simple, fast queries are often better than one huge, inefficient query.
Here's how I'd design it:
First, create a table for all your names, in MyISAM format with a FULLTEXT index. That's where your names are stored. Each object type (divisions, departments, etc.) then becomes a dependent table whose primary key references the primary key of the main named_objects table.
Now you can search for names with this much simpler query, which runs blazingly fast:
SELECT a.`object_id`, a.`type`, n.name, n.object_type
FROM `user_permissions` AS a
JOIN `named_objects` AS n ON a.`object_id` = n.`object_id`
WHERE MATCH(n.name) AGAINST ('name-to-be-searched')
Using the fulltext index will run hundreds of times faster than using LIKE in the way you're doing.
Once you have the object id and type, if you want any other attributes of the respective object type you can do a second SQL query joining to the table for the appropriate object type:
SELECT ... FROM {$object_type} WHERE object_id = ?
This will also go very fast.
Re your comment: Yes, I'd create the table with names even if it's redundant.
Other than changing the nested IFs to a COALESCE() call (MySQL does have COALESCE()), there is not much you can do as long as you need to filter on that input parameter with a LIKE expression. Filtering a column with a LIKE expression whose pattern starts with a wildcard, as yours does, makes the predicate non-SARGable, which means the query processor must scan every row in the table to evaluate the filter.
It cannot use an index, because an index is ordered by the column values, and with your LIKE pattern it doesn't know which index entries to read (since the pattern starts with a wildcard).
Since MySQL has COALESCE(), you can replace your SELECT with:
SELECT a.`object_id`, a.`type`,
COALESCE(b.name, c.name, d.name, e.name, f.name) AS name
If you can change the search argument so that it does not start with a wildcard, then just make sure there is an index on the name column in each of the tables (add one if it doesn't exist yet), and query performance should increase enormously.
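A hedged sketch of the COALESCE rewrite with a prefix-only search, trimmed to two of the five object tables and run on SQLite through Python's sqlite3 with made-up contents. The prefix is bound as a parameter and concatenated with '%'; in MySQL that would be CONCAT(?, '%'), since || is a logical OR there by default. Writing '?%' inside quotes gives a literal string, not a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_permissions (user_id INTEGER, object_id INTEGER, type TEXT);
CREATE TABLE divisions (division_id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
CREATE INDEX divisions_name ON divisions (name);
CREATE INDEX departments_name ON departments (name);

INSERT INTO divisions VALUES (1, 'Sales', 1), (2, 'Support', 0);
INSERT INTO departments VALUES (1, 'Security', 1), (2, 'Finance', 1);
INSERT INTO user_permissions VALUES
    (1, 1, 'division'), (1, 2, 'division'), (1, 1, 'department'), (1, 2, 'department');
""")

QUERY = """
SELECT a.object_id, a.type, COALESCE(b.name, c.name) AS display_name
FROM user_permissions AS a
LEFT JOIN divisions AS b
  ON a.object_id = b.division_id AND a.type = 'division' AND b.status = 1
LEFT JOIN departments AS c
  ON a.object_id = c.department_id AND a.type = 'department' AND c.status = 1
WHERE a.user_id = :user_id
  AND (b.name LIKE :prefix || '%' OR c.name LIKE :prefix || '%')
ORDER BY display_name
"""


def autocomplete(user_id, prefix):
    return conn.execute(QUERY, {"user_id": user_id, "prefix": prefix}).fetchall()


print(autocomplete(1, "S"))  # [(1, 'division', 'Sales'), (1, 'department', 'Security')]
```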
There are 500 things you can do. Optimize once you know where your bottlenecks are. Until then, work on getting those users onto your app; it's a much higher priority.
If it matters, I'm using a Firebird 2.1 database.
I have three tables: one with keywords, one with negative keywords, and one with required keywords. I need to filter the data so the output contains only the keywords that meet the condition of not being in the negative keyword list, and IF there are any required words, then the results must contain only those required keywords.
The tables are very similar; the field I would be matching against is called keyword in all of them.
I don't know SQL very well at all. I'm guessing it would be something like SELECT keyword from keywordstable where keyword in requiredkeywordstable and where NOT in negativekeywordstable.
Just a side note: the required keywords table could be empty, which would mean there are no required keywords.
Any help would be appreciated.
Example of tables:
KeywordsTable
-Keywords varchar 255
RequiredKeywordsTable
-Keywords varchar 255
NegativeKeywordsTable
-Keywords varchar 255
Example Data:
KeywordsTable
Cat
Dog
Mouse
Horse
House
With nothing set in the negative and required keywords tables, the output would simply be the KeywordsTable data unchanged.
If RequiredKeywordsTable has the values Car, Cat, Dog, then the output would be Cat and Dog.
If NegativeKeywordsTable has the value Horse and RequiredKeywordsTable is empty, then the output of the keywords table would be Cat, Dog, Mouse, House.
etc.
-Brad
Your specification is a bit hazy. It would help if you provided some schema. Is the keywords table just words or is it a list of keywords for a given entity? What happens if there exists at least one RequiredKeyword but not all Keywords are required? Should the non-required keywords show or should only required keywords show in that scenario? If both required and non-required keywords should be returned, then how does the list of required keywords affect the outcome? Here are some possible solutions:
Scenario 1:
The three tables are keywords for a given entity key.
The EntityKey is not nullable.
If a given entity has a required keyword, then only required keywords should show.
Select ...
From Keywords As K
Left Join NegativeKeywords As NK
On NK.EntityKey = K.EntityKey
And NK.Keyword = K.Keyword
Left Join RequiredKeywords As RK
On RK.EntityKey = K.EntityKey
And RK.Keyword = K.Keyword
Where NK.EntityKey Is Null
And (
Not Exists (
Select 1
From RequiredKeywords As RK1
Where RK1.EntityKey = K.EntityKey
)
Or RK.EntityKey Is Not Null
)
Scenario 2:
Only the Keywords table is for a given entity key or is just words but the other two are a list of required and negative keywords.
The Keyword column in all three tables is not nullable.
If there exists even one required keyword, then only required keywords should show:
Select ...
From Keywords As K
Left Join NegativeKeywords As NK
On NK.Keyword = K.Keyword
Left Join RequiredKeywords As RK
On RK.Keyword = K.Keyword
Where NK.Keyword Is Null
And (
Not Exists (
Select 1
From RequiredKeywords As RK1
)
Or RK.Keyword Is Not Null
)
Scenario 3:
The Keywords table is just words
The Keyword column in all three tables is not nullable.
The system should return whether the given keyword is required or not but should also show non-required keywords.
Select ...
, Case When RK.Keyword Is Not Null Then 1 Else 0 End As IsRequired
From Keywords As K
Left Join NegativeKeywords As NK
On NK.Keyword = K.Keyword
Left Join RequiredKeywords As RK
On RK.Keyword = K.Keyword
Where NK.Keyword Is Null
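Scenario 3 as a runnable sketch on SQLite via Python's sqlite3 (standing in for Firebird), assuming the Keyword column naming used in that scenario and made-up negative (Horse) and required (Car, Cat) words. Horse is dropped and only Cat is flagged as required.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Keywords (Keyword VARCHAR(255) NOT NULL PRIMARY KEY);
CREATE TABLE NegativeKeywords (Keyword VARCHAR(255) NOT NULL PRIMARY KEY);
CREATE TABLE RequiredKeywords (Keyword VARCHAR(255) NOT NULL PRIMARY KEY);
INSERT INTO Keywords VALUES ('Cat'), ('Dog'), ('Mouse'), ('Horse'), ('House');
INSERT INTO NegativeKeywords VALUES ('Horse');
INSERT INTO RequiredKeywords VALUES ('Car'), ('Cat');
""")

QUERY = """
SELECT K.Keyword,
       CASE WHEN RK.Keyword IS NOT NULL THEN 1 ELSE 0 END AS IsRequired
FROM Keywords AS K
LEFT JOIN NegativeKeywords AS NK ON NK.Keyword = K.Keyword
LEFT JOIN RequiredKeywords AS RK ON RK.Keyword = K.Keyword
WHERE NK.Keyword IS NULL  -- drop negative keywords
ORDER BY K.Keyword
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('Cat', 1), ('Dog', 0), ('House', 0), ('Mouse', 0)]
```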
Addition
Given your additional information, here is how you can solve the problem. First, based on what you said, I'm presuming the schema looks something like:
Create Table Keywords( Keywords varchar(255) not null primary key )
Create Table NegativeKeywords( Keywords varchar(255) not null primary key )
Create Table RequiredKeywords( Keywords varchar(255) not null primary key )
If the Keywords column is the only column, I would make it not nullable and the primary key. This ensures that you do not have duplicates and lets us rely on the fact that the column is not nullable to check for non-existence. The problem is significantly more difficult to solve if the Keywords column is nullable in the NegativeKeywords and/or RequiredKeywords table.
Insert Into Keywords(Keywords) Values( 'Cat' )
Insert Into Keywords(Keywords) Values( 'Dog' )
Insert Into Keywords(Keywords) Values( 'Mouse' )
Insert Into Keywords(Keywords) Values( 'Horse' )
Insert Into Keywords(Keywords) Values( 'House' )
Select ...
From Keywords As K
Left Join NegativeKeywords As NK
On NK.Keywords = K.Keywords
Left Join RequiredKeywords As RK
On RK.Keywords = K.Keywords
Where NK.Keywords Is Null
And (
Not Exists (
Select 1
From RequiredKeywords As RK1
)
Or RK.Keywords Is Not Null
)
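To check the filter against the three examples from the question, here is a sketch on SQLite via Python's sqlite3 (standing in for Firebird; the helper filter_keywords is hypothetical). The emptiness test on RequiredKeywords is deliberately not correlated with the current keyword: it only asks whether any required keyword exists at all, which is what makes an empty required table mean "no restriction".

```python
import sqlite3

SCHEMA = """
CREATE TABLE Keywords (Keywords VARCHAR(255) NOT NULL PRIMARY KEY);
CREATE TABLE NegativeKeywords (Keywords VARCHAR(255) NOT NULL PRIMARY KEY);
CREATE TABLE RequiredKeywords (Keywords VARCHAR(255) NOT NULL PRIMARY KEY);
INSERT INTO Keywords VALUES ('Cat'), ('Dog'), ('Mouse'), ('Horse'), ('House');
"""

QUERY = """
SELECT K.Keywords
FROM Keywords AS K
LEFT JOIN NegativeKeywords AS NK ON NK.Keywords = K.Keywords
LEFT JOIN RequiredKeywords AS RK ON RK.Keywords = K.Keywords
WHERE NK.Keywords IS NULL
  AND (NOT EXISTS (SELECT 1 FROM RequiredKeywords)  -- no required words at all
       OR RK.Keywords IS NOT NULL)                  -- or this word is required
ORDER BY K.Keywords
"""


def filter_keywords(negative=(), required=()):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO NegativeKeywords VALUES (?)", [(w,) for w in negative])
    conn.executemany("INSERT INTO RequiredKeywords VALUES (?)", [(w,) for w in required])
    return [row[0] for row in conn.execute(QUERY)]


print(filter_keywords())                              # all five keywords
print(filter_keywords(required=["Car", "Cat", "Dog"]))  # ['Cat', 'Dog']
print(filter_keywords(negative=["Horse"]))            # ['Cat', 'Dog', 'House', 'Mouse']
```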