Accented characters and MySQL searching - mysql

I've been searching for hours and haven't found an answer that suits my case, so here I am.
I'm Hungarian, and we use the following accented characters in our language: áéíóöőúüű (and of course their capital counterparts).
I want to build a smart search in PHP where the user enters a search word and it finds matches whether or not the accents are present, both in the MySQL table and in the search field.
My MySQL table uses utf8_hungarian_ci.
I can do the conversion in PHP so that whether the user types 'Bla' or 'blá', the search term becomes 'bla', and that is what we search the MySQL database for.
But my problem is that the database might contain a 'bla fér' or a 'bláter' entry, and if we search with 'bla' (from PHP) it only returns 'bla fer'. How can I convert the field I'm searching so that 'bla fér' -> 'bla fer' and 'bláter' -> 'blater'?
Essentially, I want to turn the accented characters into unaccented ones, but only for the sake of searching. Please help! Thank you!
EDIT:
<?php
$search = $_GET["search"] ?? ""; // May contain áéíóöőúüű
// Map accented letters to their unaccented forms. Space, "+" and "'" become "_",
// which LIKE treats as a single-character wildcard.
$map = array(
    "Á"=>"A", "á"=>"a", "Ä"=>"A", "ä"=>"a", "É"=>"E", "é"=>"e", "Í"=>"I", "í"=>"i",
    "Ó"=>"O", "ó"=>"o", "Ö"=>"O", "ö"=>"o", "Ő"=>"O", "ő"=>"o",
    "Ú"=>"U", "ú"=>"u", "Ü"=>"U", "ü"=>"u", "Ű"=>"U", "ű"=>"u",
    " "=>"_", "+"=>"_", "'"=>"_",
);
$search = strtr($search, $map);
// $search is user input: escape it or bind it via a prepared statement before running the query.
$query = "SELECT id, name FROM people WHERE name LIKE '%$search%'"; // Database column 'name' may also contain áéíóöőúüű
?>

Here are some results from my tests. You can compare to yours:
CREATE TABLE `test` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(32) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8
Table contents:
mysql> select * from test;
+----+---------+
| id | name    |
+----+---------+
|  1 | bla     |
|  2 | blater  |
|  3 | bláter  |
|  4 | bhei    |
+----+---------+
4 rows in set (0.00 sec)
Search results:
mysql> select * from test where name like '%bla%';
+----+---------+
| id | name    |
+----+---------+
|  1 | bla     |
|  2 | blater  |
|  3 | bláter  |
+----+---------+
3 rows in set (0.00 sec)
Search with accent:
mysql> select * from test where name like '%blá%';
+----+---------+
| id | name    |
+----+---------+
|  3 | bláter  |
+----+---------+
1 row in set (0.00 sec)
I get the same results even with COLLATE=utf8_hungarian_ci.

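For me, setting the collation to utf8_general_ci did the trick. The _ci suffix means case-insensitive; utf8_general_ci also compares accented and unaccented forms of a letter as equal, so searches match irrespective of case or accent. Language-specific collations can treat some accented letters as distinct letters, so not every "_ci" collation behaves the same way.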
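For comparison, here is a minimal sketch of the same accent folding done in application code rather than by the collation. It is written in Python, not the asker's PHP, and the function names are made up for illustration. It decomposes each character with Unicode NFD normalization and drops the combining marks, which covers áéíóöőúüű without a hand-written lookup table.

```python
import unicodedata


def fold_accents(text):
    """Return text case-folded with combining accent marks removed."""
    # NFD splits e.g. "á" into "a" + U+0301 and "ő" into "o" + U+030B.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_insensitive_search(term, names):
    """Return the names that contain term, ignoring case and accents on both sides."""
    needle = fold_accents(term)
    return [name for name in names if needle in fold_accents(name)]


if __name__ == "__main__":
    names = ["bla", "blater", "bláter", "bhei", "bla fér"]
    print(accent_insensitive_search("Blá", names))
```

Folding both the search term and the stored value is the key point: folding only the search term is what caused the 'bla fér' problem in the question.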

Recommended MySQL INDEX for storing domain names

I'm trying to store about 100 Million domain names in a MySQL database, but I can't figure out the right INDEX method to use on the domain names.
The issue being that LIKE queries will also be executed:
SELECT id FROM domains WHERE domain LIKE '%.example.com'
or
SELECT id FROM domains WHERE domain LIKE 'example.%'
If it makes things easier, '%example%' is not a requirement, just a nice-to-have at best.
What would be the proper index to use? Left to right (example.%) should be relatively straightforward, but right to left (%.example.com) is problematic, and it is the most common query.
I'm using MariaDB 10.3 on Linux. The DB runs on a PCI-e SSD; lookup times longer than 10 seconds should be considered "unacceptable".
You can add a generated (virtual or persistent) column (rdomain) to your table that stores the domain name in reverse order, i.e. REVERSE(domain). A suffix search then becomes a search from the start of the string, which can use an index: to search for '%.mydomain.com', use WHERE rdomain LIKE REVERSE('%.mydomain.com').
the table
CREATE TABLE `myreverse` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`domain` varchar(64) CHARACTER SET latin1 DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_domain` (`domain`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
add the column
ALTER TABLE myreverse
ADD COLUMN rdomain VARCHAR(64) AS (REVERSE(domain)),
ADD KEY idx_rdomain (rdomain);
insert some data
INSERT INTO `myreverse` (`id`, `domain`)
VALUES
(2, 'img.google.com'),
(3, 'w3.google.com'),
(1, 'www.coogle.com'),
(4, 'www.google.de'),
(5, 'www.mydomain.com');
see the data
mysql> SELECT * from myreverse;
+----+------------------+------------------+
| id | domain           | rdomain          |
+----+------------------+------------------+
|  1 | www.coogle.com   | moc.elgooc.www   |
|  2 | img.google.com   | moc.elgoog.gmi   |
|  3 | w3.google.com    | moc.elgoog.3w    |
|  4 | www.google.de    | ed.elgoog.www    |
|  5 | www.mydomain.com | moc.niamodym.www |
+----+------------------+------------------+
5 rows in set (0.01 sec)
mysql>
Now you can query in reverse order, and MySQL can use the index.
query
mysql> select * from myreverse WHERE rdomain like REVERSE('%.google.com');
+----+----------------+----------------+
| id | domain         | rdomain        |
+----+----------------+----------------+
|  3 | w3.google.com  | moc.elgoog.3w  |
|  2 | img.google.com | moc.elgoog.gmi |
+----+----------------+----------------+
2 rows in set (0.00 sec)
mysql>
Here you can see that the optimizer uses the index.
mysql> EXPLAIN select * from myreverse WHERE rdomain like REVERSE('%.google.com');
+----+-------------+-----------+------------+-------+---------------+-------------+---------+------+------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key         | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+-------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | myreverse | NULL       | range | idx_rdomain   | idx_rdomain | 195     | NULL |    2 |   100.00 | Using where |
+----+-------------+-----------+------------+-------+---------------+-------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
mysql>
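Why the reversed column helps can be sketched outside the database. The following is a toy model in Python, not how MariaDB implements its B-tree: a sorted list of reversed names stands in for idx_rdomain, and a suffix search becomes a prefix range scan located with a binary search. The function names are made up for illustration.

```python
from bisect import bisect_left


def build_reversed_index(domains):
    """Sorted (reversed_domain, domain) pairs, standing in for the idx_rdomain index."""
    return sorted((d[::-1], d) for d in domains)


def find_by_suffix(index, suffix):
    """Return domains ending in suffix via a prefix range scan over the reversed keys."""
    prefix = suffix[::-1]  # ".google.com" becomes "moc.elgoog."
    start = bisect_left(index, (prefix,))  # first key >= prefix, found in O(log n)
    matches = []
    for reversed_key, domain in index[start:]:
        if not reversed_key.startswith(prefix):
            break  # keys are sorted, so nothing further can match
        matches.append(domain)
    return matches


if __name__ == "__main__":
    domains = ["www.coogle.com", "img.google.com", "w3.google.com",
               "www.google.de", "www.mydomain.com"]
    index = build_reversed_index(domains)
    print(find_by_suffix(index, ".google.com"))
```

The scan touches only the matching keys plus one, which is the same reason the EXPLAIN output above reports a range access on idx_rdomain instead of a full table scan.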
I'm not sure an index would help you here. If you can't change the database, your options seem limited. One thing you could do, if you're running a subdomain query and a domain query back to back, is to run the subdomain query first. That should reduce the number of rows the domain query has to cover.
It would definitely help to split each name into its subdomain and domain parts, stored in separate columns in the database, with an index on each. Then you could query the subdomains only or the domains only, which should speed things up. And if there are a lot of repeating values, you should normalize those fields to remove the repetition and speed up queries even more.

Mysql Full Text Match Returns Bool Match but 0 Relevance [duplicate]

I have a problem which I have been able to recreate with two very simple tables. The tables were defined as follows:
create table Temp_Table_MyISAM(
id INT UNSIGNED AUTO_INCREMENT,
code VARCHAR(10) NOT NULL,
name VARCHAR(256) NOT NULL,
PRIMARY KEY (id),
KEY (code),
FULLTEXT (name)
) ENGINE = MYISAM;
create table Temp_Table_InnoDB(
id INT UNSIGNED AUTO_INCREMENT,
code VARCHAR(10) NOT NULL,
name VARCHAR(256) NOT NULL,
PRIMARY KEY (id),
KEY (code),
FULLTEXT (name)
);
Each table has two rows, as can be seen from the result of the following two queries:
select * from Temp_Table_MyISAM;
+----+---------+----------------+
| id | code    | name           |
+----+---------+----------------+
|  1 | AC-7865 | 38 NORTHRIDGE  |
|  2 | DE-3514 | POLARIS VENTRI |
+----+---------+----------------+
select * from Temp_Table_InnoDB;
+----+---------+----------------+
| id | code    | name           |
+----+---------+----------------+
|  1 | AC-7865 | 38 NORTHRIDGE  |
|  2 | DE-3514 | POLARIS VENTRI |
+----+---------+----------------+
When I do a FULLTEXT search on the MyISAM table, I don't get any hits:
MariaDB [stackoverflow]> SELECT name, code FROM Temp_Table_MyISAM
WHERE MATCH(name) AGAINST('38');
Empty set (0.00 sec)
MariaDB [stackoverflow]> SELECT name, code FROM Temp_Table_MyISAM
WHERE MATCH(name) AGAINST('POLARIS');
Empty set (0.00 sec)
When I do a FULLTEXT search on the InnoDB table, I get a hit only when the pattern to be matched does not start with a numeric value:
MariaDB [stackoverflow]> SELECT name, code FROM Temp_Table_InnoDB
WHERE MATCH(name) AGAINST('38');
Empty set (0.00 sec)
MariaDB [stackoverflow]> SELECT name, code FROM Temp_Table_InnoDB
WHERE MATCH(name) AGAINST('POLARIS');
+----------------+---------+
| name           | code    |
+----------------+---------+
| POLARIS VENTRI | DE-3514 |
+----------------+---------+
Any insight would be appreciated.
There are 3 rules to watch out for in MyISAM's FULLTEXT:
Words in the text shorter than ft_min_word_len (default 4 characters) are not indexed. ("38")
Search words that show up in 50% or more of the rows are ignored. ("POLARIS", which is in one of the two rows)
"Stop words" in the text are not indexed. ("the", "and", ...)
Since InnoDB now supports FULLTEXT, you should move to that engine. (The rules are different there: there is no 50% rule, and the minimum word length is innodb_ft_min_token_size, default 3, which is why "38" is still not found.)
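The three MyISAM rules can be mimicked with a toy model to see why both searches in the question come back empty. This is a sketch in Python under simplifying assumptions, not MySQL's real tokenizer or ranking: the stopword set is a two-word stand-in, and the third row used in the last call is invented for the demonstration.

```python
import re

FT_MIN_WORD_LEN = 4          # MyISAM default for ft_min_word_len
STOPWORDS = {"the", "and"}   # tiny stand-in for the built-in stopword list


def indexable_words(text):
    """Words that would make it into the toy index for one row."""
    words = re.findall(r"\w+", text.lower())
    return {w for w in words if len(w) >= FT_MIN_WORD_LEN and w not in STOPWORDS}


def natural_language_match(term, rows):
    """Rows matching term, applying the min-length, stopword and 50% rules."""
    term = term.lower()
    hits = [row for row in rows if term in indexable_words(row)]
    if len(hits) * 2 >= len(rows):
        return []  # the word is in 50% or more of the rows, so it is ignored
    return hits


if __name__ == "__main__":
    rows = ["38 NORTHRIDGE", "POLARIS VENTRI"]
    print(natural_language_match("38", rows))        # too short to be indexed
    print(natural_language_match("POLARIS", rows))   # in 1 of 2 rows, hits the 50% rule
    print(natural_language_match("POLARIS", rows + ["ACME HOLDINGS"]))  # 1 of 3 rows
```

With only two rows in the table, any word that matches a row is automatically in 50% of the rows, which is why such small test tables are misleading for MyISAM FULLTEXT.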

How to retrieve in MySQL select result the characters/words that meet the LIKE requirements, not a whole line

The table that can be used as a reference point:
CREATE TABLE TEST (
Owner varchar(64) DEFAULT NULL,
Devices varchar(64) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO TEST (Owner,Devices) VALUES
('Peter', 'laptop,phone,tablet'),
('Joe', 'phone,laptop,tablet'),
('Eugene', 'phone,tablet,laptop');
mysql> SELECT Owner, Devices FROM TEST WHERE Devices LIKE '%Laptop%';
+--------+---------------------+
| Owner  | Devices             |
+--------+---------------------+
| Peter  | laptop,phone,tablet |
| Joe    | phone,laptop,tablet |
| Eugene | phone,tablet,laptop |
+--------+---------------------+
3 rows in set (0.00 sec)
The LIKE operator looks for a certain pattern, but the MySQL result shows the whole string that contains the match, not just the matched pattern.
Please advise if there is a way to show the result in the following way:
+--------+---------------------+
| Owner  | Devices(LIKE result)|
+--------+---------------------+
| Peter  | laptop              |
| Joe    | laptop              |
| Eugene | laptop              |
+--------+---------------------+
3 rows in set (0.00 sec)
Thank you in advance!
Just use the word you are looking for in the Select:
SELECT Owner, "Laptop" FROM TEST WHERE Devices LIKE '%Laptop%';
-- MySQL user variable; strings are concatenated with CONCAT(), not +
SET @device = 'laptop';
SELECT Owner, @device AS Devices FROM TEST
WHERE Devices LIKE CONCAT('%', @device, '%');
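Both answers echo the search word back rather than the stored text. If what you want is the list item that actually matched, one option is to pick it out in application code after fetching the rows. This is a minimal sketch in Python, assuming the Devices column is a comma-separated list as in the question; the function name is made up for illustration.

```python
def matched_items(devices_csv, needle):
    """Return the comma-separated items that contain needle, ignoring case."""
    wanted = needle.lower()
    return [item for item in devices_csv.split(",") if wanted in item.lower()]


if __name__ == "__main__":
    rows = [
        ("Peter", "laptop,phone,tablet"),
        ("Joe", "phone,laptop,tablet"),
        ("Eugene", "phone,tablet,laptop"),
    ]
    for owner, devices in rows:
        for item in matched_items(devices, "Laptop"):
            print(owner, item)
```

The lowercase comparison mirrors the case-insensitive LIKE match shown in the question, where '%Laptop%' found 'laptop'.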

mysql aes_encrypt into longtext column

Is it possible to store a MySQL AES_ENCRYPT into a LONGTEXT column?
I know I'm supposed to use varbinary or blob, but I have a table that I'm storing a bunch of random "settings" in, and the settings_value column is longtext.
I went to store a "smtp mail password" in there, and got a little stuck.
If not, I guess, I'll store it as a hex string through php.
SOLUTION:
My query was something like this:
INSERT INTO table (setting_value)VALUES(AES_ENCRYPT('password', 'key')) ON DUPLICATE KEY UPDATE setting_value=VALUES(setting_value)
As you will see in my comments below, I tried changing my column collation from utf8_unicode_ci to utf8_bin and it still failed. I changed to latin1_bin and it worked.
I switched back to utf8_unicode_ci and changed my query to the following:
INSERT INTO table (setting_value)VALUES(HEX(AES_ENCRYPT('password', 'key'))) ON DUPLICATE KEY UPDATE setting_value=VALUES(setting_value)
That worked since it just turned my value into a hex string.
Took me a second to figure out how to get the value back out correctly, so for documentation purposes:
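// $mysqli is assumed to be an open mysqli connection (not shown in the original)
$pass = $mysqli->query("SELECT AES_DECRYPT(BINARY(UNHEX(setting_value)), 'key') AS orig_text FROM `table`")->fetch_object();
echo $pass->orig_text;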
Did you try it? It's pretty easy to set up a test case, and from what I can see it works fine for your requirements:
mysql> create table t (id int unsigned not null auto_increment primary key, str LONGTEXT);
Query OK, 0 rows affected (0.13 sec)
mysql> desc t;
+-------+------------------+------+-----+---------+----------------+
| Field | Type             | Null | Key | Default | Extra          |
+-------+------------------+------+-----+---------+----------------+
| id    | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| str   | longtext         | YES  |     | NULL    |                |
+-------+------------------+------+-----+---------+----------------+
2 rows in set (0.05 sec)
mysql>
mysql> INSERT INTO t VALUES (1,AES_ENCRYPT('text','password'));
Query OK, 1 row affected (0.02 sec)
mysql>
mysql> select id,str,AES_DECRYPT(str,'password') from t;
+----+-----------------------------+-----------------------------+
| id | str | AES_DECRYPT(str,'password') |
+----+-----------------------------+-----------------------------+
| 1 | ö½¨Ü·øÍJ/ª¼Tf€D | text |
+----+-----------------------------+-----------------------------+
1 row in set (0.00 sec)
Use some binary column type (like BLOB instead of LONGTEXT) for storing AES_ENCRYPTed content.
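The reason the HEX() workaround in the question behaves well in a utf8 text column can be shown without a database. The sketch below is in Python and uses a made-up four-byte sample rather than real AES output: arbitrary ciphertext bytes are usually not valid UTF-8, while their hex form is plain ASCII and round-trips exactly, the same idea as HEX() and UNHEX().

```python
def is_valid_utf8(data):
    """True if the byte string can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_text_safe(ciphertext):
    """Hex-encode bytes for storage in a text column, like MySQL's HEX()."""
    return ciphertext.hex().upper()


def from_text_safe(stored):
    """Recover the original bytes, like MySQL's UNHEX()."""
    return bytes.fromhex(stored)


if __name__ == "__main__":
    sample = bytes([0xF6, 0xBD, 0xA8, 0xDC])  # hypothetical bytes, not real AES output
    print(is_valid_utf8(sample))
    print(to_text_safe(sample))
    print(from_text_safe(to_text_safe(sample)) == sample)
```

The trade-off is size: the hex form takes two characters per byte, which is why a BLOB or VARBINARY column remains the more natural fit when you can change the schema.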

How can I access the table comment from a mysql table?

How can I get just the table comment from a mysql table? I tried the following, but they didn't work for various reasons. I want to figure out how to get just the string 'my comment' (ideally via perl =)
Any help?
-- Abbreviated output for convenience.
SHOW TABLE STATUS WHERE Name="foo"
+------+--------+---------+------------+------+----------------+---------------+
| Name | Engine | Version | Row_format | Rows | Create_options | Comment       |
+------+--------+---------+------------+------+----------------+---------------+
| foo  | MyISAM |      10 | Fixed      |    0 |                | my comment    |
+------+--------+---------+------------+------+----------------+---------------+
and
SHOW CREATE TABLE foo;
+-------+------------------------------------------------------------------------------+
| Table | Create Table                                                                 |
+-------+------------------------------------------------------------------------------+
| foo   | CREATE TABLE `foo` (`id` int(11) NOT NULL PRIMARY KEY) COMMENT='my comment'  |
+-------+------------------------------------------------------------------------------+
Based on the answer by OMG Ponies, but using INFORMATION_SCHEMA.TABLES instead of INFORMATION_SCHEMA.COLUMNS. When looking around on the web, all I could find was info on the columns' comments, but never on the table's. This is how to get a table's comment.
SELECT table_comment
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema='my_cool_database'
AND table_name='user_skill';
+--------------------------+
| table_comment            |
+--------------------------+
| my awesome comment       |
+--------------------------+
If you don't want to have both the database name and the table name in the query, you can use:
SHOW TABLE STATUS WHERE Name='table_name';
and then pick up the "Comment" key of the result (you have to fetch the row as an associative array, e.g. with mysqli_fetch_assoc() in PHP).