Database reports different character lengths for the same word - mysql

I have a table to which I added English dictionary words. Now I have some records that seem to be duplicates, but their string lengths differ.
For example, 'aaron' appears twice in my table, but when I use this query:
select id, word, char_length(word) from my_table;
I get the following back:
id | word | char_length
7 | aaron | 5
12 | aaron | 6
How can the char_length differ for the same word? What can I do to remove the copy that is one character longer?

It's likely that @Vatev is on the right track with his comment.
Try these two queries:
1. SELECT * FROM my_table WHERE word = 'aaron';
2. SELECT * FROM my_table WHERE word like '%aaron%';
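The first matches rows where word equals 'aaron', while the second matches rows that contain aaron anywhere. If there are rows with extra content, such as leading whitespace or a trailing newline, they would show up in the second but not the first. (Trailing spaces are a special case: with PAD SPACE collations, the default for most collations before MySQL 8.0, = ignores trailing spaces, so a row whose only extra character is a trailing space matches the first query too.)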
One possible way to clean up these duplicates would be to run the following:
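DELETE FROM my_table WHERE CHAR_LENGTH(TRIM(word)) <> CHAR_LENGTH(word);
But don't run that blindly: it will delete every row with leading or trailing spaces, even if there isn't a matching "correct" entry. (Comparing lengths instead of writing TRIM(word) != word matters because, with PAD SPACE collations, 'aaron ' and 'aaron' compare as equal.)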
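If the extra character turns out to be something other than a space (for example a carriage return left over from importing a dictionary file with Windows line endings), note that MySQL's TRIM() removes only spaces unless you name another character to strip. A minimal sketch of a more cautious cleanup done in application code, assuming the rows have already been fetched as (id, word) pairs: it flags only padded rows that have a clean twin, which avoids deleting padded words that have no "correct" entry. The sample rows are hypothetical.

```python
# Hypothetical rows as (id, word) pairs, e.g. from SELECT id, word FROM my_table.
# Row 12 carries an invisible trailing carriage return; row 20 has no clean twin.
rows = [(7, "aaron"), (12, "aaron\r"), (20, " zebra")]

def padded_duplicates(rows):
    """Return ids of padded rows whose stripped word also exists as a clean row.

    Only these ids are safe to delete, because a clean copy is kept.
    Python's str.strip() removes all whitespace (spaces, tabs, \r, \n),
    unlike MySQL's TRIM(), which removes only spaces by default.
    """
    clean_words = {word for _, word in rows if word == word.strip()}
    return [row_id for row_id, word in rows
            if word != word.strip() and word.strip() in clean_words]

for row_id, word in rows:
    # repr() makes the extra character visible, e.g. 'aaron\r'
    print(row_id, repr(word), len(word))

print(padded_duplicates(rows))  # [12]
```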

Related

Is it possible to search a column of IDs for 1 if other rows include IDs with 1 in them?

I have a MySQL database with a varchar column (although the column type can be changed if needed).
The column stores some ids separated with underscores like so:
Row 1: 1
Row 2: 1_2_3
Row 3: 10_2
Row 4: 4_5_1
Is there any way, with this structure, to query that column for 1 and return all rows containing the ID 1 (but not Row 3, which contains a 1 but whose ID is 10)?
To get the current results I am searching the column with LIKE '%1%'.
Or do I need to change the structure to get the result I want?
Maybe you can try:
select *
from t
where c like '1\_%'
or c like '%\_1'
or c like '%\_1\_%'
or c = '1'
You need to escape the underscore as \_, because in a LIKE pattern it is a wildcard that matches any single character.
If we had a comma separator, we could use the MySQL FIND_IN_SET function.
We can use the MySQL REPLACE function to change the underscores to commas, e.g.:
SELECT t.*
FROM t
WHERE FIND_IN_SET('1',REPLACE( t.id ,'_',','))
Reference:
https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_find-in-set
https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_replace
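To see why the FIND_IN_SET/REPLACE combination excludes Row 3, here is a small Python sketch that mimics FIND_IN_SET's documented return value (the 1-based position of the needle in a comma-separated list, or 0 if it is absent). It is an illustration only: it ignores FIND_IN_SET's NULL handling and collation rules, and the function names are hypothetical.

```python
def find_in_set(needle, csv_list):
    """Rough stand-in for MySQL FIND_IN_SET(): 1-based position, or 0."""
    items = csv_list.split(",")
    return items.index(needle) + 1 if needle in items else 0

def contains_id(needle, underscore_list):
    # Same idea as FIND_IN_SET('1', REPLACE(col, '_', ','))
    return find_in_set(needle, underscore_list.replace("_", ",")) > 0

rows = ["1", "1_2_3", "10_2", "4_5_1"]
print([r for r in rows if contains_id("1", r)])  # ['1', '1_2_3', '4_5_1']
```

Because the list is split into whole items, "10" is compared as a unit and never equals "1".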
NOTE:
Storing underscore-separated lists is an antipattern. See Chapter 2 of Bill Karwin's book "SQL Antipatterns: Avoiding the Pitfalls of Database Programming"
https://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557
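Since the note recommends changing the structure, here is a minimal sketch of the normalized alternative: one row per (item, ID) pair instead of an underscore list. It runs against SQLite from Python's standard library as a stand-in for MySQL; the CREATE TABLE and SELECT statements are plain SQL that should carry over, but the table names item and item_tag are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per (item, id) pair instead of an underscore-separated list.
    CREATE TABLE item (item_id INTEGER PRIMARY KEY);
    CREATE TABLE item_tag (
        item_id INTEGER NOT NULL REFERENCES item(item_id),
        tag_id  INTEGER NOT NULL,
        PRIMARY KEY (item_id, tag_id)
    );
""")

# Rows 1..4 from the question: '1', '1_2_3', '10_2', '4_5_1'
lists = {1: "1", 2: "1_2_3", 3: "10_2", 4: "4_5_1"}
for item_id, s in lists.items():
    conn.execute("INSERT INTO item (item_id) VALUES (?)", (item_id,))
    conn.executemany(
        "INSERT INTO item_tag (item_id, tag_id) VALUES (?, ?)",
        [(item_id, int(x)) for x in s.split("_")],
    )

# An exact match is now a plain equality test, with no pattern matching.
matches = [r[0] for r in conn.execute(
    "SELECT item_id FROM item_tag WHERE tag_id = ? ORDER BY item_id", (1,))]
print(matches)  # [1, 2, 4]
```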
With the LIKE operator:
select * from tablename
where concat('_', id, '_') like '%#_1#_%' escape '#'
Results:
| id |
| ----- |
| 1 |
| 1_2_3 |
| 4_5_1 |
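The trick in that query is to wrap both the stored value and the searched ID in the separator, so the ID can only match a whole token. A minimal Python sketch of the same check (the function name is hypothetical); in SQL the '#' escape is needed because _ is a LIKE wildcard, whereas Python's `in` is a literal substring test.

```python
def has_id(value, wanted, sep="_"):
    """Exact token match in a delimiter-separated string.

    '_10_2_' does not contain '_1_', but '_4_5_1_' does.
    """
    return f"{sep}{wanted}{sep}" in f"{sep}{value}{sep}"

rows = ["1", "1_2_3", "10_2", "4_5_1"]
print([r for r in rows if has_id(r, "1")])  # ['1', '1_2_3', '4_5_1']
```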

MySQL FULLTEXT search yields 0 for a one-letter difference

I have a table, let's call it A, with a FULLTEXT index on its name field.
This table, containing around 1200 rows, contains a single row whose name field has the value "TELEVISORI".
This query:
SELECT A.name, MATCH(name) AGAINST ('Televisori') AS `match`
FROM A
Results in (not the exact value, but I remember it being around 8):
+------------+-----------+
| name | match |
+------------+-----------+
| TELEVISORI | 8.3947893 |
+------------+-----------+
Whereas this one:
SELECT A.name, MATCH(name) AGAINST ('Televisore') AS `match`
FROM A
yields no results.
Things I've checked:
The word is not a stop word
Its length is over 4 characters (or rather 3, since I'm using InnoDB?)
The word appears in fewer than 50% of the records; it's the only one among hundreds.
I tried changing the casing of the words in every possible combination just to be completely sure, but that shouldn't be it since I'm using a case-insensitive collation. Also, it matches without a problem when I use "Televisori".
Is there anything I'm missing?

MySQL and regexp - return as a SELECT column with a table's column's regexp matching part only

I know there is an easy way to write a WHERE condition that matches a regexp in MySQL, but my question is different. The column contains values like these:
jknewfjnkewnjkfewnjfwe1jnkf2jnw wefwef 1234567.12345678 qwrqwerqwrq
jnewdnkewjk ewnfewf1 wefwefew2 1234568.22314152 qwrqwrqwr qw
whjefjwefwe1 wefwefwef2 qweqwrqrw 1234369.21213131 qwdqwdqwd
I would like a SELECT expression (something like SUBSTRING) that returns:
selectcol1 selectcol2
1234567 12345678
1234568 22314152
1234369 21213131
All I know is that the first number to match is always 7 digits and the second is always 8 digits, and the parts before and after will certainly not match the exact 7-digit pattern.
Is there any way to get these SELECT columns?
Short answer: No. Long answer...
REGEXP '[[:<:]][0-9]{7}[[:>:]].*[[:<:]][0-9]{8}[[:>:]]'
will match the first 'word' of 7 digits followed (eventually) by a 'word' with 8 digits.
Demonstration (1=true):
mysql> SELECT 'jnewdnkewjk ewnfewf1 wefwefew2 1234568.22314152 qwrqwrqwr qw'
-> REGEXP '[[:<:]][0-9]{7}[[:>:]].*[[:<:]][0-9]{8}[[:>:]]' AS test;
+------+
| test |
+------+
| 1 |
+------+
1 row in set (0.00 sec)
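Keep in mind that SQL is not a good language for extracting those fields. In fact, I would say that you should use PHP (or another language) to extract the fields after MySQL locates and fetches the rows.
That is, there is no LOCATE() with regexp. (This holds for MySQL before 8.0. MySQL 8.0 added REGEXP_SUBSTR() and REGEXP_INSTR(), and its ICU-based regexp engine uses \b for word boundaries instead of [[:<:]] and [[:>:]].)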
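Following that advice, here is a minimal sketch of the extraction step in Python (standing in for PHP as "another language"). It assumes, as the question states, a standalone 7-digit number followed somewhere later by a standalone 8-digit number; `\b` plays the role of the [[:<:]] and [[:>:]] markers in the REGEXP above. The function name is hypothetical.

```python
import re

# Standalone 7-digit number, then (eventually) a standalone 8-digit number.
PATTERN = re.compile(r"\b(\d{7})\b.*?\b(\d{8})\b")

def extract_fields(value):
    """Return (selectcol1, selectcol2) as strings, or None if no match."""
    m = PATTERN.search(value)
    return (m.group(1), m.group(2)) if m else None

# Sample values copied from the question; in practice these would be the
# rows MySQL located and fetched.
rows = [
    "jknewfjnkewnjkfewnjfwe1jnkf2jnw wefwef 1234567.12345678 qwrqwerqwrq",
    "jnewdnkewjk ewnfewf1 wefwefew2 1234568.22314152 qwrqwrqwr qw",
    "whjefjwefwe1 wefwefwef2 qweqwrqrw 1234369.21213131 qwdqwdqwd",
]
for value in rows:
    print(extract_fields(value))
```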

Iterating through MySQL rows

I have a simple MySQL table made up of words and an associated number. The numbers are unique for each word. I want to find the first word whose index is larger than a given number. As an example:
-----------------------
| WORD: | F_INDEX: |
|---------------------|
| a | 5 |
| cat | 12 |
| bat | 4002 |
-----------------------
If I were given the number 9, I would want "cat" returned, as it is the first word whose index is larger than 9.
I know that I can get a full list of sorted rows by querying:
SELECT * FROM table_name ORDER BY f_index;
But I would instead like a MySQL query that does this. (My confusion is that I'm unsure how to keep track of the current row in my query.) I know I can loop with something like this:
CREATE PROCEDURE looper(desired_index INT)
BEGIN
DECLARE current_index INT DEFAULT 0;
-- Loop here, setting current_index to the next row's index,
-- then compare it to desired_index, breaking out
-- if it is greater.
END;
Any help would be greatly appreciated.
Try this:
SELECT t.word
, t.f_index
FROM table_name t
WHERE t.f_index > 9
ORDER
BY t.f_index
LIMIT 1
It is much more efficient to have the database return the row you need, than it is to pull a whole bunch of rows and figure out which one you need.
For best performance of this query, you will want an index ON table_name (f_index,word).
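The index helps because it keeps (f_index, word) sorted, so the server can seek to the first entry above the given value instead of scanning the table. As a rough illustration of that idea only (not of MySQL's B-tree internals), here is the same lookup in Python using bisect over a sorted list; the data is the table from the question and the function name is hypothetical.

```python
from bisect import bisect_right

# (f_index, word) pairs kept sorted by f_index, like an index on (f_index, word)
entries = [(5, "a"), (12, "cat"), (4002, "bat")]
keys = [f for f, _ in entries]

def first_word_after(value):
    """Return the word with the smallest f_index strictly greater than value."""
    pos = bisect_right(keys, value)  # first position whose key is > value
    return entries[pos][1] if pos < len(entries) else None

print(first_word_after(9))     # cat
print(first_word_after(5000))  # None
```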
Why not just use a MySQL statement to retrieve the first row whose f_index is greater than the value you pass in?
For example:
select word from table_name
where f_index > desired_index
order by f_index
limit 1
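When the value comes from application code, it can be bound as a query parameter rather than pasted into the SQL text. A minimal sketch, using SQLite from Python's standard library as a stand-in for MySQL (SQLite uses ? placeholders; Python MySQL drivers such as mysql-connector-python and PyMySQL use %s instead). The function and index names are hypothetical.

```python
import sqlite3

def first_word_above(conn, desired_index):
    # desired_index is bound as a parameter, not concatenated into the SQL.
    row = conn.execute(
        "SELECT word FROM table_name WHERE f_index > ? ORDER BY f_index LIMIT 1",
        (desired_index,),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (word TEXT, f_index INTEGER UNIQUE)")
conn.execute("CREATE INDEX idx_f_index_word ON table_name (f_index, word)")
conn.executemany("INSERT INTO table_name (word, f_index) VALUES (?, ?)",
                 [("a", 5), ("cat", 12), ("bat", 4002)])
print(first_word_above(conn, 9))  # cat
```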

Best way to store and retrieve synonyms in database mysql

I am making a synonyms list that I will store in a database and retrieve before doing a full-text search.
When a user enters something like: word1
I need to look up this word in my synonyms table. If the word is found, I would SELECT all the synonyms of this word and use them in the full-text search in the next query, where I construct the query like
MATCH (columnname) AGAINST ('word1a word1b word1c' IN BOOLEAN MODE)
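Whichever storage approach is chosen, the synonyms end up joined into one boolean-mode search string. A minimal sketch of that step in application code (the function name is hypothetical): in BOOLEAN MODE a bare word is an optional term, and double quotes mark a phrase, so multi-word synonyms are quoted. It only strips embedded double quotes; other characters that act as boolean-mode operators (such as + - * ~) are not handled here.

```python
def boolean_mode_query(synonyms):
    """Join synonyms into one search string for MATCH ... AGAINST (... IN BOOLEAN MODE)."""
    terms = []
    for s in synonyms:
        s = s.replace('"', "").strip()  # drop embedded quotes and padding
        if not s:
            continue
        # Quote multi-word synonyms so they are searched as phrases.
        terms.append(f'"{s}"' if " " in s else s)
    return " ".join(terms)

print(boolean_mode_query(["car", "automobile", "motor car"]))
# car automobile "motor car"
```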
So how do I store the synonyms in a table? I found 2 choices:
using key and word columns like
val keyword
-------------
1 word1a
1 word1b
1 word1c
2 word2a
2 word2b
3 word3a
etc.
So then I can find an exact match of the entered word in one query and get its ID. In the next SELECT I get all the words with that ID and concatenate them using a recordset loop in a server-side language. I can then construct the real search on the main table where I need to look for the words.
using only word columns like
word1a|word1b|word1c
word2a|word2b|word2c
word3a
Now I do a SELECT to check whether my word is inside any record; if it is, I fetch the whole record, explode it at | and I have my words again to use.
This second approach looks easier to maintain for whoever builds this database of synonyms, but I see 2 problems:
a) How do I find in MySQL whether a word is inside the string? I cannot use LIKE 'word1a' because synonyms can look very alike: word1a could be strawberry, another entry could be strawberries, and word2a could be berry. Obviously I need an exact match, so how could a LIKE statement match exactly inside a string?
b) I see a speed problem: I guess using LIKE would take MySQL more time than "=" in the first approach, where I match a word exactly. On the other hand, the first option needs 2 statements: one to get the ID of the word and a second to get all the words with this ID.
How would you solve this problem, or rather this dilemma about which approach to take? Is there a third way I don't see that makes it easy for an admin to add/edit synonyms and is at the same time fast and optimal? OK, I know there usually is no best way ;-)
UPDATE: The solution of using two tables, one for the master word and a second for the synonym words, will not work in my case, because there is no MASTER word that the user types in the search field. He can type any of the synonyms, so I am still wondering how to set up these tables, since I don't have master words whose IDs I could keep in one table with the synonyms referencing them in a second table. There is no master word.
Don't use a (one) string to store different entries.
In other words: Build a word table (word_ID,word) and a synonym table (word_ID,synonym_ID) then add the word to the word table and one entry per synonym to the synonyms table.
UPDATE (added 3rd synonym)
Your word table must contain every word (ALL); your synonym table only holds pointers to synonyms (not a single word!).
If you had three words: A, B and C, that are synonyms, your DB would be
WORD_TABLE
ID | WORD
---+-----
1 | A
2 | B
3 | C
SYNONYM_TABLE
W_ID | S_ID
-----+-----
1 | 2
2 | 1
1 | 3
3 | 1
2 | 3
3 | 2
Don't be afraid of the many entries in the SYNONYM_TABLE, they will be managed by the computer and are needed to reflect the existing relations between the words.
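With this layout, any word the user types can be looked up directly, and its synonyms come back in a single query (no master word, no second statement). A minimal sketch, using SQLite from Python's standard library as a stand-in for MySQL; the table and column names follow the answer, and the join is plain SQL that should carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word_table (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
    CREATE TABLE synonym_table (w_id INTEGER NOT NULL, s_id INTEGER NOT NULL,
                                PRIMARY KEY (w_id, s_id));
    INSERT INTO word_table VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO synonym_table VALUES (1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2);
""")

def synonyms(conn, typed):
    """Return the typed word's synonyms in one query."""
    sql = """
        SELECT s.word
        FROM word_table AS w
        JOIN synonym_table AS st ON st.w_id = w.id
        JOIN word_table AS s ON s.id = st.s_id
        WHERE w.word = ?
        ORDER BY s.word
    """
    return [r[0] for r in conn.execute(sql, (typed,))]

print(synonyms(conn, "B"))  # ['A', 'C']
```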
2nd approach
You might also be tempted (I don't think you should!) to go with one table that has separate fields for the word and a list of synonyms (or IDs) (word_id,word,synonym_list). Beware that this is contrary to the way a relational DB works (one field, one fact).
I think a single table with 3 columns is better:
WORD_TABLE
ID | WORD | GroupID
---+----------------
1 | A | 1
2 | B | 1
3 | C | 1
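With the GroupID column, the synonyms of whatever word was typed come from a self-join on the group, again in one query. A minimal sketch using SQLite from Python's standard library as a stand-in for MySQL; the extra word D in group 2 is hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word_table (ID INTEGER PRIMARY KEY, WORD TEXT NOT NULL UNIQUE,
                             GroupID INTEGER NOT NULL);
    INSERT INTO word_table VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 1), (4, 'D', 2);
""")

def synonyms(conn, typed):
    """Other words in the same group as the typed word."""
    sql = """
        SELECT other.WORD
        FROM word_table AS typed
        JOIN word_table AS other ON other.GroupID = typed.GroupID
        WHERE typed.WORD = ? AND other.ID <> typed.ID
        ORDER BY other.WORD
    """
    return [r[0] for r in conn.execute(sql, (typed,))]

print(synonyms(conn, "C"))  # ['A', 'B']
```

Note that this layout lets each word belong to only one group.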
Another approach is to store meanings (this does not use master words; instead, a meaning table groups the words), with the words kept in a separate words table that holds only their text, like this:
Many words, one meaning
meaning_table
meaning_id
---
1
2
3
And store the words in another table, for example if A, B and C were all synonyms of 1 meaning
word_table
word_id | meaning_id | word
--------+------------+------
1 | 1 | A
2 | 1 | B
3 | 1 | C
Even though it looks a lot like what Hasan Amin Sarand suggests, it has the key difference that you don't select from the WORD_TABLE but from the MEANING_TABLE instead; this is much better, and I learned that the hard way.
This way you store the meaning in one table and as many words for that meaning as you like in another.
It does, however, assume one meaning per word.
Many words, many meanings
If you want to store words with multiple meanings, you need another table for the many-to-many relationship, and the whole thing becomes:
meaning_table
-------------
meaning_id
-------------
1
2
3
word_meaning_table
--------------------
word_id | meaning_id
--------+-----------
1 | 1
2 | 1
3 | 1
word_table
--------------
word_id | word
--------+-----
1 | A
2 | B
3 | C
Now you can have as many words with as many meanings as you want, where any word can mean anything you want and any meaning can have many words.
If you want to select a word and its synonyms, then you do
SELECT
meaning_id,word_id,word
FROM meaning_table
INNER JOIN word_meaning_table USING (meaning_id)
INNER JOIN word_table USING (word_id)
WHERE meaning_id=1
You can then also store a meaning that has no word yet, or whose word you don't know.
If you don't know which meaning a word belongs to, you can just insert a new meaning for every new word and fix its meaning_id in word_meaning_table later.
You can then even store and select words that are spelled the same but mean different things:
SELECT
meaning_id,word_id,word
FROM meaning_table
INNER JOIN word_meaning_table USING (meaning_id)
INNER JOIN word_table USING (word_id)
WHERE word_id=1
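A minimal sketch of the many-to-many layout, run against SQLite from Python's standard library as a stand-in for MySQL. It executes the listing query above (with word_table joined on word_id, since that table has no meaning_id column) and adds a hypothetical synonyms() helper that answers the original question: given whatever word the user typed, return every word sharing at least one meaning with it. The word D and meaning 2 are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meaning_table (meaning_id INTEGER PRIMARY KEY);
    CREATE TABLE word_table (word_id INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE word_meaning_table (
        word_id    INTEGER NOT NULL REFERENCES word_table(word_id),
        meaning_id INTEGER NOT NULL REFERENCES meaning_table(meaning_id),
        PRIMARY KEY (word_id, meaning_id)
    );
    INSERT INTO meaning_table VALUES (1), (2);
    INSERT INTO word_table VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    -- A, B, C share meaning 1; C and the hypothetical D share meaning 2.
    INSERT INTO word_meaning_table VALUES (1, 1), (2, 1), (3, 1), (3, 2), (4, 2);
""")

# The listing query from the answer, joining word_table on word_id.
by_meaning = conn.execute("""
    SELECT meaning_id, word_id, word
    FROM meaning_table
    INNER JOIN word_meaning_table USING (meaning_id)
    INNER JOIN word_table USING (word_id)
    WHERE meaning_id = 1
    ORDER BY word_id
""").fetchall()

def synonyms(conn, typed):
    """All words sharing at least one meaning with the typed word."""
    sql = """
        SELECT DISTINCT w2.word
        FROM word_table AS w1
        JOIN word_meaning_table AS wm1 ON wm1.word_id = w1.word_id
        JOIN word_meaning_table AS wm2 ON wm2.meaning_id = wm1.meaning_id
        JOIN word_table AS w2 ON w2.word_id = wm2.word_id
        WHERE w1.word = ? AND w2.word_id <> w1.word_id
        ORDER BY w2.word
    """
    return [r[0] for r in conn.execute(sql, (typed,))]

print(by_meaning)           # [(1, 1, 'A'), (1, 2, 'B'), (1, 3, 'C')]
print(synonyms(conn, "C"))  # ['A', 'B', 'D']
```

DISTINCT is there because a word reachable through two shared meanings would otherwise be returned twice.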