MySQL search query ordered by match relevance


I know basic MySQL querying, but I have no idea how to achieve an accurate and relevant search query.
My table looks like this:
id | kanji
-------------
1 | 一子
2 | 一人子
3 | 一私人
4 | 一時
5 | 一時逃れ
I already have this query:
SELECT * FROM `definition` WHERE `kanji` LIKE '%一%'
The problem is that I want to order the results by the characters the user has already learnt, with 一 being a required character for the results of this query.
Say a user knows these characters: 人,子,時
Then, I want the results to be ordered that way:
id | kanji
-------------
2 | 一人子
1 | 一子
4 | 一時
3 | 一私人
5 | 一時逃れ
The result which matches the most learnt characters should be first. If possible, I'd like to show results that contain only learnt characters first, then a mix of learnt and unknown characters.
How do I do that?

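Per your preference, this orders by the number of unmatched characters (increasing) and then by the number of matched characters (decreasing). Each LIKE comparison evaluates to 1 or 0 in MySQL, so the sum is the number of learnt characters present:
SELECT *,
(kanji LIKE '%人%')
+ (kanji LIKE '%子%')
+ (kanji LIKE '%時%') AS score
FROM `definition`
WHERE kanji LIKE '%一%'
ORDER BY CHAR_LENGTH(kanji) - score, score DESC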
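As a cross-check outside the database, the same ordering rule (fewest unknown characters first, then most learnt characters) can be sketched in Python. This is only an illustration using the five rows from the question; the variable names are made up for the example, and ties keep their input order here, which MySQL does not promise.

```python
# Rows from the question's table: (id, kanji).
rows = [(1, "一子"), (2, "一人子"), (3, "一私人"), (4, "一時"), (5, "一時逃れ")]
learnt = "人子時"   # characters the user already knows
required = "一"     # character every result must contain

def sort_key(row):
    _id, kanji = row
    # Like the sum of LIKE tests: each learnt character counts once if present.
    score = sum(ch in kanji for ch in learnt)
    unmatched = len(kanji) - score
    # Fewest unmatched first, then highest score first.
    return (unmatched, -score)

ordered = sorted((r for r in rows if required in r[1]), key=sort_key)
for row in ordered:
    print(*row)
```

With the sample rows this yields the ids in the order 2, 1, 4, 3, 5, which is the order asked for in the question.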
Or, the relational way to do it is to normalize. Create the table like this:
kanji_characters
kanji_id | index | character
----------------------------
1 | 0 | 一
1 | 1 | 子
2 | 0 | 一
2 | 1 | 人
2 | 2 | 子
...
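Then query it. Note that index and character are reserved words in MySQL, so they need backticks, and the CASE needs an ELSE 0 so that a word with no learnt characters gets a score of 0 rather than NULL:
SELECT kanji_id,
COUNT(*) AS length,
SUM(CASE WHEN `character` IN ('人','子','時') THEN 1 ELSE 0 END) AS score
FROM kanji_characters
WHERE `index` <> 0
AND kanji_id IN (SELECT kanji_id FROM kanji_characters WHERE `index` = 0 AND `character` = '一')
GROUP BY kanji_id
ORDER BY length - score, score DESC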
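You didn't specify what should be done in the case of duplicate characters, though. The two solutions above handle that differently: the LIKE version counts each learnt character at most once per word, while the normalized version counts every occurrence.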
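To make the duplicate-character difference concrete, here is a small Python sketch of the two scoring rules. The word 一人人 is a hypothetical entry invented for the illustration, not a row from the question, and the function names are likewise made up.

```python
learnt = set("人子時")

def score_like(word):
    # Mirrors the LIKE-based query: a learnt character counts once,
    # no matter how many times it appears in the word.
    return sum(ch in word for ch in learnt)

def score_normalized(word):
    # Mirrors the kanji_characters query: every position after index 0
    # that holds a learnt character counts separately.
    return sum(ch in learnt for ch in word[1:])

word = "一人人"  # hypothetical entry with a repeated learnt character
print(score_like(word), score_normalized(word))
```

Which behaviour is preferable depends on whether a repeated known character should make a word look "more known".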

Just a thought, but a full-text index may help. You can get a relevance score back like this (sorted descending, so the best matches come first):
SELECT MATCH(kanji) AGAINST ('your search' IN NATURAL LANGUAGE MODE) AS relevance
FROM `definition` WHERE MATCH(`kanji`) AGAINST ('your search' IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC, CHAR_LENGTH(kanji)
The trick is to index these terms (or words?) the right way. I think the general trick is to encapsulate each word with double quotes and make a space between each. This way the tokenizer will populate the index the way you want. Of course you would need to add/remove the quotes on the way in/out respectively.
Hope this doesn't bog you down.

Related

Mysql-> Group after rand()

I have the following table in Mysql
Name Age Group
abel 7 A
joe 6 A
Rick 7 A
Diana 5 B
Billy 6 B
Pat 5 B
I want to randomize the rows, but they should still remain grouped by the Group column.
For example, I want my result to look something like this.
Name Age Group
joe 6 A
abel 7 A
Rick 7 A
Billy 6 B
Pat 5 B
Diana 5 B
What query should I use to get this result? The entire table should be randomised and then grouped by the "Group" column.

What you describe in your question as GROUPing is more correctly described as sorting. This is a particular issue when talking about SQL databases where "GROUP" means something quite different and determines the scope of aggregation operations.
Indeed, "group" is a reserved word in SQL, so although MySQL and some other SQL databases let you work around this by quoting it, it is a poor choice for an attribute name.
SELECT *
FROM yourtable
ORDER BY `group`
Using random values also has a lot of semantic confusion. A truly random number would have a different value every time it is retrieved - which would make any sorting impossible (and databases do a lot of sorting which is normally invisible to the user). As long as the implementation uses a finite time algorithm such as quicksort that shouldn't be a problem - but a bubble sort would never finish, and a merge sort could get very confused.
There are also degrees of randomness, and different algorithms for generating random numbers. For encryption it's critical that the random numbers be evenly distributed and completely unpredictable; often these will use hardware events (sometimes even dedicated hardware), but I don't expect you would need that. The question that matters here is: do you want the ordering to be repeatable across invocations?
SELECT *
FROM yourtable
ORDER BY `group`, RAND()
...will give different results each time.
On the other hand,
SELECT *
FROM yourtable
ORDER BY `group`, MD5(CONCAT(age, name, `group`))
...would always give the results in the same order, while
SELECT *
FROM yourtable
ORDER BY `group`, MD5(CONCAT(CURDATE(), age, name, `group`))
...will give different results on different days.
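The "stable within a day, different across days" idea can also be sketched in Python, assuming the six rows from the question. The helper names daily_key and daily_order are made up for the illustration; the hash is used purely as a repeatable shuffle key, not for security.

```python
import hashlib
from datetime import date

# (name, age, group) rows from the question.
rows = [
    ("abel", 7, "A"), ("joe", 6, "A"), ("Rick", 7, "A"),
    ("Diana", 5, "B"), ("Billy", 6, "B"), ("Pat", 5, "B"),
]

def daily_key(row, day):
    name, age, group = row
    # Same ingredients as MD5(CONCAT(CURDATE(), age, name, `group`)).
    text = f"{day.isoformat()}{age}{name}{group}"
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    # Sort by group first, then by the pseudo-random digest.
    return (group, digest)

def daily_order(rows, day):
    return sorted(rows, key=lambda r: daily_key(r, day))

today = date.today()
for row in daily_order(rows, today):
    print(row)
```

Running it twice on the same day gives the same order, and the groups always stay together because the group is the first sort key.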

DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(name VARCHAR(12) NOT NULL
,age INT NOT NULL
,my_group CHAR(1) NOT NULL
);
INSERT INTO my_table VALUES
('Abel',7,'A'),
('Joe',6,'A'),
('Rick',7,'A'),
('Diana',5,'B'),
('Billy',6,'B'),
('Pat',5,'B');
SELECT * FROM my_table ORDER BY my_group,RAND();
+-------+-----+----------+
| name | age | my_group |
+-------+-----+----------+
| Joe | 6 | A |
| Abel | 7 | A |
| Rick | 7 | A |
| Pat | 5 | B |
| Diana | 5 | B |
| Billy | 6 | B |
+-------+-----+----------+

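Do the random ordering first, then sort by the Group column. Group is a reserved word, so it needs backticks:
SELECT Name, Age, `Group`
FROM (
SELECT *
FROM yourtable
ORDER BY RAND()
) t
ORDER BY `Group`
Note that the order produced inside a derived table is not guaranteed to survive the outer sort, so ORDER BY `Group`, RAND() in a single query is the more reliable form.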

Try this (with the reserved word Group quoted in backticks):
SELECT * FROM yourtable ORDER BY `Group`, RAND()

mySQL Multi Join from 2 Statements

I've found many similar questions but have not been able to understand / apply the answers; and I don't really know what to search for...
I have 2 tables (docs and words) which have a many-to-many relationship. I am trying to generate a list of the top 5 most frequently used words that DO NOT appear in a specified doc.
To this end I have 2 mySQL queries, each of which takes me part way to achieving my goal:
Query #1 - returns words sorted by frequency of use, falls short because it also returns ALL words (SQLFiddle.com)
SELECT `words_idwords` as wdID, COUNT(*) as freq
FROM docs_has_words
GROUP BY `words_idwords`
ORDER BY freq DESC, wdID ASC
Query #2 - returns words that are missing from specified document, falls short because it does not sort by frequency of use (SQLFiddle.com)
SELECT wordscol as wrd, idwords as wID
FROM `words` where NOT `idwords`
IN (SELECT `words_idwords` FROM `docs_has_words` WHERE `docs_iddocs` = 1)
But what I want the output to look like is:
idwords | wordscol | freq
-------------------------
8 | Dog | 3
3 | Ape | 2
4 | Bear | 1
6 | Cat | 1
7 | Cheetah | 1
5 | Beaver | 0
Note: `Dolphin`, one of the most frequently used words, is NOT in the
list because it is already in the document iddocs = 1
Note: `Beaver`, is a "never used word" BUT is in the list because it is
in the main word list
And the question is: how can I combine these two queries, or otherwise get my desired output?
Basic requirements:
- 3 column output
- results sorted by frequency of use, even if use is zero
Updates:
In light of some comments, the approach that I was thinking of when I came up with the 2 queries was:
Step 1 - find all the words that are in the main word list but not used in document 1
Step 2 - rank words from Step 1 according to how many documents use them
Once I had the 2 queries I thought it would be easy to combine them with a where clause, but I just can't get it working.
A hack solution could be based on adding a dummy document that contains all the words and then subtract 1 from freq (but I'm not that much of a hack!).

I see now what the problem is. I was misled by your statement regarding the results of the 1st query (emphasis is mine):
returns words sorted by frequency of use, falls short because it also returns ALL words
This query does not return all words, it only returns all used words.
So, you need to left join the words table on docs_has_words table to get all words and eliminate the words that are associated with doc 1:
SELECT w.idwords as wdID, w.wordscol, COUNT(d.words_idwords) as freq
FROM words w
LEFT JOIN `docs_has_words` d on w.idwords=d.words_idwords
WHERE w.idwords not in (SELECT `words_idwords` FROM `docs_has_words` WHERE `docs_iddocs` = 1)
GROUP BY w.idwords
ORDER BY freq DESC, wdID ASC;
See sqlfiddle
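For anyone who cannot open the fiddle, here is a self-contained sketch that runs the same LEFT JOIN query against an in-memory SQLite database from Python. The sample rows are invented so that they reproduce the desired output from the question; in particular, the id 9 for Dolphin and the document assignments are assumptions, and SQLite stands in for MySQL only because the query uses nothing MySQL-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words (idwords INTEGER PRIMARY KEY, wordscol TEXT);
CREATE TABLE docs_has_words (docs_iddocs INTEGER, words_idwords INTEGER);
-- Hypothetical data: Beaver (5) is never used, Dolphin (9) is in doc 1.
INSERT INTO words VALUES
  (3,'Ape'),(4,'Bear'),(5,'Beaver'),(6,'Cat'),(7,'Cheetah'),(8,'Dog'),(9,'Dolphin');
INSERT INTO docs_has_words VALUES
  (1,9),(2,9),(3,9),(4,9),
  (2,8),(3,8),(4,8),
  (2,3),(3,3),
  (2,4),(3,6),(4,7);
""")

query = """
SELECT w.idwords AS wdID, w.wordscol, COUNT(d.words_idwords) AS freq
FROM words w
LEFT JOIN docs_has_words d ON w.idwords = d.words_idwords
WHERE w.idwords NOT IN (SELECT words_idwords FROM docs_has_words WHERE docs_iddocs = 1)
GROUP BY w.idwords
ORDER BY freq DESC, wdID ASC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

COUNT(d.words_idwords) counts only non-NULL join partners, which is why an unused word such as Beaver shows up with a frequency of 0 instead of disappearing.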

I think @Shadow has it right in his comment; you just need to add the WHERE clause like this: sqlFiddle
SELECT
`words_idwords` as wdID,
COUNT(*) as freq
FROM docs_has_words
WHERE NOT `words_idwords` IN (SELECT `words_idwords` FROM `docs_has_words` WHERE `docs_iddocs` = 1)
GROUP BY `words_idwords`
ORDER BY freq DESC, wdID ASC
Does this produce the output you need?

ASCII sum of all the characters in a column in MySQL

I have a table users, of which I have shown only 2 columns. I want to sum the ASCII values of all the characters in the name column.
+----+-------+
| id | name |
+----+-------+
| 0 | user |
| 1 | admin |
| 3 | edit |
+----+-------+
For example, the ASCII sum of user will be
sum(user)=117+115+101+114=447
I have tried this
SELECT ASCII(Substr(name, 1,1)) + ASCII(Substr(name, 2, 1)) FROM user
but it only sums the first 2 characters.

You are going to have to fetch one character at a time to do the sum. One method is to write a function with a WHILE loop. You can also do this with a single SELECT, if you know the length of the longest string:
SELECT name, SUM(ASCII(SUBSTR(name, n.n, 1))) AS ascii_sum
FROM user u JOIN
(SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
SELECT 4 UNION ALL SELECT 5 -- sufficient for your examples
) n
ON n.n <= CHAR_LENGTH(name) -- one joined row per character position
GROUP BY name;
If your goal is to turn the string into something that can be easily compared, or into a fixed-length value, then you might consider the hashing functions among MySQL's encryption functions (such as MD5() or SHA1()). Adding up the ASCII values is not a particularly good hash function, because strings with the same characters in different orders produce the same value. At the very least, multiplying each ASCII value by its position is a bit better.
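The weakness of a plain character-code sum, and the small improvement from weighting by position, can be seen in a few lines of Python. The function names are made up for the illustration, and "tide" is just an anagram of the sample name "edit" chosen to force a collision.

```python
def ascii_sum(text):
    # Plain sum of character codes, like SUM(ASCII(...)) per character.
    return sum(ord(ch) for ch in text)

def weighted_sum(text):
    # Each code is multiplied by its 1-based position in the string.
    return sum(pos * ord(ch) for pos, ch in enumerate(text, start=1))

print(ascii_sum("user"))                             # 447, as in the question
print(ascii_sum("edit") == ascii_sum("tide"))        # True: anagrams collide
print(weighted_sum("edit") == weighted_sum("tide"))  # False: order now matters
```

The weighted variant still collides for many other inputs, so a real hash function remains the better choice when the value is used for comparison.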

Sorting field by numerical value and lexicographical value

I have a table of orders. For each order, we allow the user to enter their (non-unique) order number. This can be whatever they want.
I have these orders displayed in an HTML table, with the ability to sort the orders by various fields, such as order number.
One of our clients noticed an issue with the sorting. Since the order numbers are stored as VARCHAR, they are sorted lexicographically. The problem is that not all order numbers are numeric; some are words, and others are alphanumeric.
So, for example, I can have order numbers like so:
42
Order8
MyOrder
9
Order63
When sorted using ORDER BY orderNumber, I get:
42
9
MyOrder
Order63
Order8
DEMO: http://sqlfiddle.com/#!2/7973e/1
This is not what I want. I want them to be sorted like so:
9
42
MyOrder
Order8
Order63
I want them to be lexicographical for the strings, but numeric for the numbers. I thought of something that might work:
ORDER BY IFNULL(NULLIF(CAST(orderNumber AS SIGNED), 0), orderNumber)
DEMO: http://sqlfiddle.com/#!2/7973e/2
But alas, I still get the same results (as the numbers are then re-cast back to strings). How can I sort these values in the way that I want? If only there was some way to "convert" the strings into a sort of numerical value.

You could try padding the order number with zeros when you see a numeric value. See this, for example:
http://sqlfiddle.com/#!2/7973e/21
SELECT
CASE
WHEN CAST(orderNumber AS SIGNED) != 0 THEN LPAD(orderNumber, 10, '0')
ELSE orderNumber
END as padded,
orders.*
FROM orders
ORDER BY
padded
Results in :
| PADDED | ORDERID | ORDERNUMBER |
--------------------------------------
| 0000000009 | 4 | 9 |
| 0000000042 | 1 | 42 |
| MyOrder | 3 | MyOrder |
| Order63 | 5 | Order63 |
| Order8 | 2 | Order8 |
Full disclosure: I'm the author of SQL Fiddle.

You can get close to what you want with the following:
ORDER BY CASE WHEN CONVERT(OrderNumber, SIGNED INTEGER)= 0
THEN 1e50
ELSE CONVERT(OrderNumber, SIGNED INTEGER)
END ASC, OrderNumber ASC
However, it would need more work if you need to sort a mixed (text/number) order number by its last digits (e.g. Order63, Order8).
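If the sorting can be done in application code instead of SQL, the "more work" case is easy to cover with a natural-sort key. The following Python sketch is an assumption about where the sorting happens, not something the question asked for; natural_key is a made-up helper name, and casefold() is used to roughly imitate a case-insensitive collation.

```python
import re

def natural_key(order_number):
    # Split into alternating text and digit chunks, e.g.
    # "Order63" -> ["Order", "63", ""]; digit chunks sit at odd indexes.
    parts = re.split(r"([0-9]+)", order_number)
    # Compare digit chunks as integers and text chunks case-insensitively.
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]

orders = ["42", "Order8", "MyOrder", "9", "Order63"]
sorted_orders = sorted(orders, key=natural_key)
print(sorted_orders)
```

For the five sample values this produces 9, 42, MyOrder, Order8, Order63, which is the order requested in the question, including the Order8 before Order63 case.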

MySQL Fulltext search present me inaccurate result

Let's say that I have a database that looks like this (MyISAM):
+------------+-------------------+------------------+
| student_id | student_firstname | student_lastname |
+------------+-------------------+------------------+
| 30 | Patrik | Andersson |
| 79 | Patrik | Svensson |
+------------+-------------------+------------------+
And I perform this query:
SELECT s.student_firstname, s.student_lastname FROM students s
WHERE MATCH (student_firstname, student_lastname)
AGAINST
('+Patrik Svensson*' IN BOOLEAN mode)
This generates both of the above rows. Why do I not get 1 row in my result? Is it because the last three letters in the student_lastname are the same? Is there any way to make FULLTEXT more precise?

Have you tried reading the MySQL documentation?
http://dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.html
And I quote:
By default (when neither + nor - is specified) the word is optional,
but the rows that contain it are rated higher.
And:
'+apple macintosh'
Find rows that contain the word “apple”, but rank rows higher if they
also contain “macintosh”.

I have tested it, and this query gives the right result. Putting + in front of both terms makes both of them required:
SELECT s.student_firstname, s.student_lastname FROM students s
WHERE MATCH (student_firstname, student_lastname)
AGAINST
('+Patrik +Svensson*' IN BOOLEAN mode)
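The difference between the two search strings can be illustrated with a deliberately simplified Python model of the + operator. This is not how MySQL implements boolean mode (it ignores ranking, the - operator, stopwords and minimum word length), and all function names are invented; it only shows why an optional term cannot filter rows out.

```python
def parse(query):
    # Terms prefixed with + are required; the rest are optional.
    required, optional = [], []
    for token in query.split():
        target = required if token.startswith("+") else optional
        target.append(token.lstrip("+").lower())
    return required, optional

def term_matches(term, words):
    # A trailing * means "any word starting with this prefix".
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words

def row_matches(query, row):
    words = [w.lower() for w in row]
    required, optional = parse(query)
    if required:
        # Optional terms would only affect ranking, not membership.
        return all(term_matches(t, words) for t in required)
    return any(term_matches(t, words) for t in optional)

students = [("Patrik", "Andersson"), ("Patrik", "Svensson")]
loose = [s for s in students if row_matches("+Patrik Svensson*", s)]
strict = [s for s in students if row_matches("+Patrik +Svensson*", s)]
print(loose)
print(strict)
```

In this model the first search string keeps both students because only Patrik is required, while the second keeps only Patrik Svensson.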
