Matching phrases within delimiters in MySQL

I am looking to perform an exact match on a phrase within specified delimiters in MySQL. I have the following data in a full text index field.
,garden furniture,patio heaters,best offers,best deals,
I am performing the following query which is returning the aforementioned record.
SELECT id, tags
FROM Store
WHERE MATCH(tags) AGAINST(',garden,' IN BOOLEAN MODE)
I only want to return records which contain the exact value ,garden, and not ,garden furniture, or ,country garden, etc.
It is currently performing a greedy match and ignoring the comma delimiters specified in the query. I have attempted to escape the commas to force them to be included in the query, but this does not work.
Is it possible to specify non-alphanumeric delimiters as part of the match? I want to be able to perform an exact match, like the regular expression '/,garden,/'.

From the docs:
Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. You can edit the contents of the <ctype><map> array in one of the character set XML files to specify that ',' is a “letter.” Then use the given character set for your FULLTEXT indexes. For information about the <ctype><map> array format, see Section 9.3.1, “Character Definition Arrays”.
Another option is to add a new collation.
Either way, you'll have to rebuild the index:
REPAIR TABLE Store QUICK;

Only MATCH ... AGAINST can use an index for your search.
However, if your table is not too big, you can use:
SELECT id, tags
FROM Store
WHERE tags LIKE '%,garden,%'
Because the sample data begins and ends with a comma, this single pattern is enough; without those wrapper commas you would also need the variants tags LIKE 'garden', tags LIKE 'garden,%', and tags LIKE '%,garden'.
There are other options (FIND_IN_SET), but I really don't want to go into those, because they perform even worse than the SQL above.
The real problem: never use CSV in a database!
Storing CSV in a database is a really, really bad idea, because:
• It is wasteful, your data is not normalized
• You cannot join on a CSV field
• You cannot use indexes on a CSV field
• Full-text indexes do not play nicely with separators (as you've seen)
The answer is to create two extra tables.
Table tag (innoDB)
----------
id integer primary key auto_increment
tag varchar(50) //one tag per row!
Table tag_link (innoDB)
--------------
store_id integer foreign key references store(id)
tag_id integer foreign key references tag(id)
primary key = (store_id + tag_id) //composite PK
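A minimal DDL sketch of this design (column sizes, constraint details, and the existing store(id) primary key are assumptions based on the outline above):
CREATE TABLE tag (
  id  INT AUTO_INCREMENT PRIMARY KEY,
  tag VARCHAR(50) NOT NULL               -- one tag per row
) ENGINE=InnoDB;
CREATE TABLE tag_link (
  store_id INT NOT NULL,
  tag_id   INT NOT NULL,
  PRIMARY KEY (store_id, tag_id),        -- composite PK
  FOREIGN KEY (store_id) REFERENCES store (id),
  FOREIGN KEY (tag_id) REFERENCES tag (id)
) ENGINE=InnoDB;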
Now you can easily do all sorts of queries on tags.
SELECT s.id, GROUP_CONCAT(t2.tag) FROM store s
INNER JOIN tag_link tl1 ON (s.id = tl1.store_id)
INNER JOIN tag t1 ON (t1.id = tl1.tag_id)
INNER JOIN tag_link tl2 ON (s.id = tl2.store_id)
INNER JOIN tag t2 ON (t2.id = tl2.tag_id)
WHERE t1.tag = 'garden'
GROUP BY s.id
This will select one tag named garden (using t1 and tl1), find all stores linked to that tag and then get all tags linked to those stores (using t2 and tl2).
Very fast and very flexible.
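For example, a quick sketch of another query this schema makes easy (counting how many stores use each tag; table and column names as defined above):
SELECT t.tag, COUNT(*) AS store_count
FROM tag t
INNER JOIN tag_link tl ON tl.tag_id = t.id
GROUP BY t.tag
ORDER BY store_count DESC;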


Is there a way to add multiple values to 1 ID in Access

I have a table that has Act ID, and another table that has Act ID and percentage complete. The second table can have multiple entries for different days. I need the sum of those percentages for each Act ID in the first table.
First table
Act ID          Total from 2nd table
ZA.108381.080
ZA.108381.110
ZA.108381.120
ZA.108476.020
Second table
Act ID          Percent   Date
ZA.108381.110   25%       5/25/19
ZA.108381.110   75%       6/1/19
ZA.108381.120
ZA.108476.020
This would generally be considered bad practice. Your primary key should uniquely identify a row in that specific table, and any other data related to that key should be stored in separate columns.
However, since an answer is not the place for a lecture: if you want to store multiple values in your Act ID column, I would suggest changing your primary key to something more generic, such as "RowID", and then using VBA to insert multiple values into that field.
However, changing the primary key late in a database's life may cause a lot of issues or be difficult. So good luck.
Storing calculated values in a table has the disadvantage that these entries may become outdated as the other table is updated. It is preferable to query the tables on the fly to always get up-to-date results:
SELECT A.ActID, SUM(B.Percentage) AS SumPercent
FROM
table1 A
LEFT JOIN table2 B
ON A.ActID = B.ActID
GROUP BY A.ActID
ORDER BY A.ActID
This query allows you to add additional columns from the first table. If you only need the ActID from table 1, then you can simplify the query, and instead take it from table 2:
SELECT ActID, SUM(Percentage) AS SumPercent
FROM table2
GROUP BY ActID
ORDER BY ActID
If you have spaces or other special characters in a column or table name, you must escape it with brackets, e.g. [Act ID].
Do not change the IDs in the table. If you want to have the result displayed as the ID merged with the sum, change the query to
SELECT A.ActID & "." & Format(SUM(B.Percentage), "0.000") AS Result
FROM ...
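Put together with the FROM/JOIN/GROUP BY clauses of the first query, the full statement would look roughly like this (a sketch; table1 and table2 are the placeholder names used above):
SELECT A.ActID & "." & Format(SUM(B.Percentage), "0.000") AS Result
FROM
table1 A
LEFT JOIN table2 B
ON A.ActID = B.ActID
GROUP BY A.ActID
ORDER BY A.ActID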
See also: SQL GROUP BY Statement (w3schools)

Optimal MySQL table schema for given use case

I have two tables - books and images. The books table has many columns - including id (primary key), name (which is not unique), releasedate, etc. The images table has two columns - id (which is not unique, i.e. one book id may have multiple images associated with it, and we need all those images; this column has a non-unique index), and poster (which is a unique primary key; all images lie in the same bucket, hence cannot have duplicate names). My requirement is: given a book name, find all images associated with it, along with the year of release and the bucketname for each image (the bucketname being just a number in this case).
I am running this query:
select books.id,poster,bucketname,year(releasedate) from books
inner join images where images.bookId = books.id and books.name = "<name>";
A sample result set may look like this:
As you can see, there are two matching books: one with id 2 and year 1989, having 5 images, the other with id 261009, year 2013, and one image.
The problem is, the query is extremely slow. It takes around .14 seconds from MySQL console itself, under zero load (in production there may be several concurrent requests and they may be queued, leading to further delay), which is unacceptable for autocomplete. Can anyone tell me how to optimize the query by adding correct indices/keys to the tables? If it is not possible from MySQL, suggestions regarding a proper Redis schema would be useful as well.
Edit: Approx. no. of rows in images - 480k, in books - 285k. In the future, autocomplete will show results for book authors as well as book names, hence the query will need to expand to take into account a separate authors table where each author will have an id and name, just like a book.
For optimal performance, you want suitable covering indexes available. For example:
... on `books` (`name`,`id`,`releasedate`)
... on `images` (`bookid`,`poster`,`bucketname`)
We want name as the leading column in the index, because of the equality predicate in the WHERE clause. We want id and releasedate also included in the index to make it a "covering index", so the query can be satisfied from the index, without a need to visit pages of the underlying table to retrieve values.
We want bookid as the leading column because of the reference in the ON clause. Again, having poster and bucketname available right in the index make it a "covering" index.
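As a concrete sketch, the corresponding DDL might look like this (the index names here are illustrative, not prescribed):
CREATE INDEX ix_books_name_cover ON books (name, id, releasedate);
CREATE INDEX ix_images_bookid_cover ON images (bookid, poster, bucketname);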
Use EXPLAIN to see the query execution plan.
Also, note that the inner join operation won't return a row from books if a matching row in images is not found. If we want to return a row from books even when no image is available, we could use an outer join.
I'd write the query like this:
SELECT b.id
, i.poster
, i.bucketname
, YEAR(b.releasedate)
FROM books b
LEFT
JOIN images i
ON i.bookid = b.id
WHERE b.name = ?
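To follow the EXPLAIN suggestion above, prefix the query with EXPLAIN, substituting a literal value for the parameter placeholder:
EXPLAIN
SELECT b.id, i.poster, i.bucketname, YEAR(b.releasedate)
FROM books b
LEFT JOIN images i ON i.bookid = b.id
WHERE b.name = 'some book name';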

order table by optional value in second table (mysql, wordpress)

I want to order all rows of a table ("posts") by a (sort-)value ("sortDate") that is stored in a second table ("meta").
The (sort-)value of the second table is stored as a key-value pair; the key is 'publishDate'.
The linking column between both tables is "postID".
The (sort-)value of the second table is optional, or can be entered multiple times.
-> If the (sort-)value is entered multiple times, I want to use the maximum.
-> If the (sort-)value is not present in the second table, I want to use the "postDate" value of the first table instead.
This is my solution:
SELECT posts.postID,posts.postDate,metaDate.publishDate,
CASE
WHEN metaDate.publishDate is null Then posts.postDate
ELSE metaDate.publishDate
END AS sortDate /*fallback for those rows that do not have a matching key-value pair in second table*/
From posts
Left Join
(
Select meta.postID,MAX(metaValue) as publishDate
From meta
Where meta.metaKey = 'publishDate'
GROUP BY meta.postID
) As metaDate /*create a table with the maximum of publishDate, thereby handling multiple entries*/
ON posts.postID = metaDate.postID
ORDER BY sortDate DESC;
See also: an sqlfiddle with this solution.
Is there a smarter / faster way to do this?
As I am not an SQL expert: is there anything I have overlooked?
(Background: the structure of the tables is the WordPress database structure, so it is given. A related topic would be "sort posts by custom fields in WordPress", but the solutions I found did not handle multiple or optional custom fields.)
Thanks for comments and support

Combining table values in one output in SQL

I have two tables: one called tweets and one called references. tweets contains the columns tweet_id and classified, among others. references contains the columns tweet_id and class_id.
The tweet_id column in the references table only contains a fraction of the total tweet_ids in the tweets table.
What I would like to do is combine these tables in such a way that the resulting output shows the columns r.tweet_id, t.classified and r.class_id.
I've come up with this query, but for some reason it returns zero rows. In reality, however, there are about 900 tweet_ids in references, all of which exist in tweets.
SELECT 't.tweet_id', 't.classified', 'r.tweet_id', 'r.class_id'
FROM `tweets` t, `references` r
WHERE 'r.tweet_id' = 't.tweet_id'
Could somebody tell me what I am doing wrong and how I should change my script in order to get the desired outcome?
MySQL uses backticks (`) to quote schema object names (columns, tables, databases), and apostrophes (') or double quotes (") for string literals, so your condition compares the string 'r.tweet_id' with the string 't.tweet_id', which is never true. Do this instead:
SELECT t.tweet_id, t.classified, r.tweet_id, r.class_id
FROM tweets AS t
INNER JOIN `references` AS r ON r.tweet_id = t.tweet_id
Note that you only have to quote the word references, because it is a reserved word in MySQL; the other backticks can be omitted.
Also, if you want to display rows like 1, 2, NULL, NULL (tweets that weren't classified), you can use LEFT JOIN instead of INNER JOIN; and if you allow multiple classifications per tweet, some GROUP BY (aggregate) functions may come in handy.
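For instance, a sketch of that LEFT JOIN variant, using GROUP_CONCAT to collapse multiple classifications into one row per tweet (column names as in your question):
SELECT t.tweet_id,
       t.classified,
       GROUP_CONCAT(r.class_id) AS class_ids
FROM tweets AS t
LEFT JOIN `references` AS r ON r.tweet_id = t.tweet_id
GROUP BY t.tweet_id, t.classified;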
BTW: PostgreSQL uses " for schema object names and ' for strings.

Querying for multiple many-to-many associates in MySQL

Background
My program is storing a series of objects, a set of tags, and the many-to-many associations between tags and objects, in a MySQL database. To give you an idea of the structure:
CREATE TABLE objects (
object_id INT PRIMARY KEY,
...
);
CREATE TABLE tags (
tag_name VARCHAR(32) NOT NULL
);
CREATE TABLE object_tags (
object_id INT NOT NULL,
tag_name VARCHAR(32) NOT NULL,
PRIMARY KEY (object_id, tag_name)
);
Problem
I want to be able to query for all objects that are tagged with all of the tags in a given set. As an example, let's say I have a live tree, a dead flower, an orangutan, and a ship as my objects, and I want to query all of those tagged living and plant. I expect to receive a list containing only the tree, assuming the tags match the characteristics of the objects.
Current Solution
Presently, given a list of tags T1, T2, ..., Tn, I am solving the problem as follows:
1. Select all object_id columns from the object_tags table where tag_name is T1.
2. Join the result of (1) with the object_tags table, and select all object_id columns where tag_name is T2.
3. Join the result of (2) with the object_tags table again, and select all object_id columns where tag_name is T3.
4. Repeat as necessary for T4, ..., Tn.
5. Join the result of (4) with the objects table and select the additional columns of the objects that are needed.
In practice (using Java), I start with the query string for the first tag, then prepend/append the string parts for the second tag, and so on in a loop, before finally prepending/appending the string parts that make up the overall query. Only then does the string actually get passed into a PreparedStatement and get executed on the server.
Edit: Expanding on my example from above, using this solution I would issue the following query:
SELECT object_id FROM object_tags JOIN (
SELECT object_id FROM object_tags WHERE tag_name='living'
) AS _temp USING (object_id) WHERE tag_name='plant';
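For three tags, the same pattern just nests one level deeper; a sketch, with 'green' standing in as a hypothetical third tag:
SELECT object_id FROM object_tags JOIN (
    SELECT object_id FROM object_tags JOIN (
        SELECT object_id FROM object_tags WHERE tag_name='living'
    ) AS _temp1 USING (object_id) WHERE tag_name='plant'
) AS _temp2 USING (object_id) WHERE tag_name='green';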
Question
Is there a better solution to this problem? Although the number of tags is not likely to be large, I am concerned about the performance of this solution, especially as the database grows in size. Furthermore, it is very difficult to read and maintain the code, especially when the additional concerns/constraints of the application are thrown in.
I am open to suggestions at any level, although the languages (MySQL and Java) are not variables at this point.
I don't know about the performance of this solution, but you can simplify by using pattern matching in MySQL to match a set of pipe-delimited tags (or any delimiter). This is a solution I've used before for similar applications with tag tables (@match would be a variable passed in by your Java code; I've hard-coded a value for demonstration):
set @match = 'living|plant';
set @numtags =
length(@match) - length(replace(@match, '|', '')) + 1;
select * from objects o
where @numtags =
(
select count(*) from object_tags ot
where concat('|', @match, '|')
like concat('%|', ot.tag_name, '|%')
and ot.object_id = o.object_id
)
Here is a working demo: http://sqlize.com/0vP6DgQh0j