Matching first char in string to digit or non-standard character - mysql

I need to allow users to browse a table, with >1 million entries, by the first letter in the title.
I want them to be able to browse by every letter from A-Z, 0-9 in a list together and all other characters together.
Since it's a big database and it is to be displayed on a website, I need it to be efficient. Regex does not use index, so that would be too slow.
Is this possible or will I have to rethink the design?
Thanks in advance

As long as there is an index on the Title column, a prefix match with LIKE can use it, for example:
select *
from myTable
where Title like 'A%'
(or 'B%', 'C%'...)

Create links representing every letter and number. Clicking these links will provide the users with the results from the database that begin with the selected character.
SELECT title FROM table
WHERE LEFT(title,1) = ?Char
ORDER BY title ASC;
Consider paginating these result pages into appropriate chunks. MySQL will let you do this with LIMIT.
This command will select the first 100 records from the desired character group:
SELECT title FROM table
WHERE LEFT(title,1) = ?Char
ORDER BY title ASC
LIMIT 0, 100;
This command will select the second 100 records from the desired character group:
SELECT title FROM table
WHERE LEFT(title,1) = ?Char
ORDER BY title ASC
LIMIT 100, 100;
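The offsets in the two LIMIT examples above follow a simple rule: offset = (page - 1) * page size. A minimal sketch of that arithmetic in application code, assuming 1-based page numbers (the helper name limit_clause is made up for illustration):

```python
def limit_clause(page: int, per_page: int = 100) -> str:
    """Return a MySQL 'LIMIT offset, count' clause for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive integers")
    offset = (page - 1) * per_page  # rows to skip before this page starts
    return f"LIMIT {offset}, {per_page}"


print(limit_clause(1))  # LIMIT 0, 100
print(limit_clause(2))  # LIMIT 100, 100
```

Because both values are validated integers computed by the application, they can be formatted into the statement directly; anything coming from the user (the page number) should still be parsed as an integer first.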
Per your comments, if you want to combine characters 0-9 without using regex, you will need to combine several OR conditions:
SELECT title FROM table
WHERE (
LEFT(title,1) = '0'
OR LEFT(title,1) = '1'
...
)
ORDER BY title ASC;
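One caveat on the queries above: wrapping the column in LEFT() generally prevents MySQL from using an ordinary index on title, whereas a prefix LIKE such as 'A%' can use one. Below is a rough sketch of building the per-bucket condition with prefix LIKEs instead. It is written in Python and demonstrated against an in-memory SQLite table only so that it runs on its own; the table name titles, the bucket label '0-9', and the helper bucket_where are assumptions for illustration, not part of the original schema.

```python
import sqlite3
import string


def bucket_where(bucket: str, column: str = "title"):
    """Build a WHERE fragment and its parameters for one browse bucket.

    bucket is a single letter 'A'..'Z' or the string '0-9'.
    column must be a trusted, hard-coded identifier (never user input).
    """
    if bucket == "0-9":
        prefixes = list(string.digits)
    elif len(bucket) == 1 and bucket.upper() in string.ascii_uppercase:
        prefixes = [bucket.upper()]
    else:
        raise ValueError(f"unknown bucket: {bucket!r}")
    # One prefix LIKE per leading character, OR-ed together.
    clause = " OR ".join(f"{column} LIKE ?" for _ in prefixes)
    params = [p + "%" for p in prefixes]
    return f"({clause})", params


# Demo data in SQLite, standing in for the real MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title TEXT)")
conn.executemany(
    "INSERT INTO titles VALUES (?)",
    [("Apple",), ("avocado",), ("7up",), ("42nd Street",), ("#hash",)],
)

where, params = bucket_where("0-9")
rows = conn.execute(
    f"SELECT title FROM titles WHERE {where} ORDER BY title ASC", params
).fetchall()
print([r[0] for r in rows])  # ['42nd Street', '7up']
```

The "all other characters" group is not covered here. It is the negation of the letter and digit buckets, and whether that can be served from an index depends on the collation, so it may be worth testing with EXPLAIN or storing the bucket in its own indexed column.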

Related

How to use REGEXP in mysql for matching words from a text

I have a mysql query like :
SELECT name FROM table_name WHERE name LIKE '%custom%' limit 10
It returns 2 rows from my custom table.
I also want to get records that contain any of the substrings c cu cus cust usto stom tom om m.
I tried the query below:
SELECT name FROM table_name WHERE name like '%custom%' OR name REGEXP 'c|cu|cus|cust|usto|stom|tom|om|m' limit 10
This query returns 7 records, but they do not include the 2 records returned by the first query.
How can I get those as well? Or is there another way to get this result in MySQL?
EDIT: I also want to order the results of the second query by the number of matching substrings.
Try this:
SELECT name FROM table_name WHERE name REGEXP 'custom' limit 10;
There is no need for LIKE alongside REGEXP, but REGEXP is slower than LIKE, so on a table with many records REGEXP queries will be slow.
Try this:
SELECT name FROM table_name WHERE name REGEXP 'custom|c|cu|cus|cust|usto|stom|tom|om|m' limit 10
What we did above is that we combined custom with the rest of the patterns, and we made them all use REGEXP.
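You need to add word boundaries, which in MySQL are [[:<:]] for start of word and [[:>:]] for end of word (MySQL 8.0.4 and later use a different regex library that does not support these markers; there \\b serves as the word boundary):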
SELECT name
FROM table_name
WHERE name REGEXP '[[:<:]](c|cu|cus|cust|usto|stom|tom|om|m)[[:>:]]'
limit 10
See live demo.
Note the brackets around the alternation.
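None of the queries above address the EDIT in the question (ordering by the number of matching substrings). One possible approach is to fetch the candidate rows and rank them in application code. The sketch below does this in Python on a made-up list of names; the function names and sample data are assumptions for illustration only.

```python
FRAGMENTS = ["custom", "c", "cu", "cus", "cust", "usto", "stom", "tom", "om", "m"]


def substring_hits(name, fragments=FRAGMENTS):
    """Count how many of the fragments occur anywhere in name (case-insensitive)."""
    lowered = name.lower()
    return sum(1 for frag in fragments if frag in lowered)


def rank(names, fragments=FRAGMENTS):
    """Drop names with no hit, then sort by hit count (highest first), then by name."""
    scored = [(substring_hits(n, fragments), n) for n in names]
    matching = [(hits, n) for hits, n in scored if hits > 0]
    matching.sort(key=lambda pair: (-pair[0], pair[1]))
    return [n for _, n in matching]


names = ["tom cust", "dog", "custom table", "my cat"]
print(rank(names))  # ['custom table', 'tom cust', 'my cat']
```

Note how permissive the single-letter fragments are: "my cat" qualifies only because it contains a c and an m, which is also why the unanchored REGEXP alternation matches so many rows.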

MySQL wildcard Like query with multiple words

I have a mysql query as follows.
$query="SELECT name,activity FROM appid
where result > 5 AND name LIKE :term ORDER BY name ASC LIMIT 0,40";
$result = $pdo->prepare($query);
$result->bindvalue(':term','%'.$_GET["q"].'%',PDO::PARAM_STR);
$result->execute();
What I want to do is this.
I have an entry like this that I want to find:
'News & Weather'
However, when I type
'news weather'
it of course will not find it. How can I type that and still retrieve that entry?
Regular expressions can do the trick:
select *
from appid
where name rlike 'news|weather' -- Matches 'news' or 'weather'
Another example:
select *
from appid
where name rlike 'news.*weather' -- Matches 'news' and 'weather'
-- with any characters in-between or none at all
-- (ordered)
Just one more:
select *
from appid
where name rlike '^news.{1,}weather$' -- Matches any text that starts with 'news'
-- and has at least one character before
-- ending with 'weather'
Regular expressions can be used to create very complicated filters on text fields. Read the link above for more information.
If you can add a full-text index to your table, Full-text search might be the better way to go with this. Specifically, a boolean Full-Text search:
select *
from appid
where match(name) against ('+news +weather' in boolean mode)
I believe the only way to do this is through code:
Option A: Replace the spaces in your query parameter with '%' in code, but that of course will make the multiple words ordered
Option B: Split your parameter on spaces and dynamically construct your query with as many LIKEs as needed, adding additional ":termN" parameters for each one.
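Option B might look roughly like the following. The question uses PHP/PDO with named parameters; this sketch uses Python with positional placeholders and an in-memory SQLite copy of the appid table purely so it can be run standalone, and the helper name build_like_query is hypothetical.

```python
import sqlite3


def build_like_query(search: str):
    """Return (sql, params) requiring every whitespace-separated word to appear in name."""
    words = search.split()
    if not words:
        raise ValueError("empty search")
    # One "name LIKE ?" per word, so word order in the input does not matter.
    conditions = " AND ".join("name LIKE ?" for _ in words)
    sql = (
        "SELECT name, activity FROM appid "
        f"WHERE result > 5 AND {conditions} "
        "ORDER BY name ASC LIMIT 0, 40"
    )
    # Caveat: % and _ inside a word still act as LIKE wildcards; escaping is omitted.
    params = ["%" + w + "%" for w in words]
    return sql, params


# Demo data standing in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appid (name TEXT, activity TEXT, result INTEGER)")
conn.executemany(
    "INSERT INTO appid VALUES (?, ?, ?)",
    [("News & Weather", "a", 10), ("Weather Radar", "b", 10), ("Daily News", "c", 10)],
)

sql, params = build_like_query("news weather")
print([row[0] for row in conn.execute(sql, params)])  # ['News & Weather']
```

Only the number of placeholders is built dynamically; the search words themselves are always passed as bound parameters, never concatenated into the SQL.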

Best way to return "champion" that exists in a table in case the given query "championship" is not found

I have a very big table with strings.
Field "words":
- dog
- champion
- cat
- this is a cat
- pool
- champ
- boots
...
In my example, if a select query is looking for the given string "championship", it won't find it because this string is not in the table.
In that case, I want the query to return "champion" from the table, i.e. the longest string in the table that begins the given word "championship".
The possible match (if found) is the longest one in table between championship, or championshi, or championsh, or champions, ..., or cham, or cha, or ch, or C.
Question: I want to return longest string in table that starts a given string.
I need high speed. Is there a way to create index and query in order to have fast execution of queries?
Here's one query that will return the specified result:
SELECT t.mycol
FROM mytable t
WHERE 'championship' LIKE CONCAT(t.mycol,'%')
ORDER
BY LENGTH(t.mycol) DESC
LIMIT 1
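This query can't use an index range scan, because the column is wrapped in CONCAT on the right-hand side of LIKE. It has to examine every row, although it may be able to read the values from an index on mycol (a full index scan) instead of from the table itself.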
If you can restrict the search to a finite number of leading letters that need to match to be considered a "hit", you could include another predicate. For example, to match at least 4 characters:
SELECT t.mycol
FROM mytable t
WHERE 'championship' LIKE CONCAT(t.mycol,'%')
AND t.mycol LIKE 'cham%'
ORDER
BY LENGTH(t.mycol) DESC
LIMIT 1
--or--
AND t.mycol >= 'cham'
AND t.mycol < 'chan'
You are a little vague with 'the longest string in the table that begins the given word "championship".' Would "championing" count as a match?
Perhaps the following will help. If you have an index on words, then the following will return the last word before the given word. It should maximize the initial sequence of matches:
select words
from t
where words <= 'championship'
order by words desc
limit 1;
This isn't exactly what you are asking for, but it might work in practice.
EDIT:
If you are looking for an exact match, then the following should use an index on words effectively and return what you want:
select words
from t
where words in ('championship', 'championshi', 'championsh', 'champions', 'champion',
'champio', 'champi', 'champ', 'cham', 'cha', 'ch', 'c')
order by words desc
limit 1;
It is a bit brute force, but it should have the property of using the index to speed up the query.
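The IN list does not have to be typed by hand; it can be generated from the search word. A minimal sketch in Python, run against an in-memory SQLite table so it is self-contained (the table name t and column name words follow the answer above; the helper names are made up):

```python
import sqlite3


def prefixes(word):
    """All non-empty prefixes of word, longest first."""
    return [word[:n] for n in range(len(word), 0, -1)]


def longest_prefix_query(word, table="t", column="words"):
    """Return (sql, params) selecting the longest stored prefix of word.

    table and column must be trusted identifiers, not user input.
    """
    if not word:
        raise ValueError("empty search word")
    candidates = prefixes(word)
    placeholders = ", ".join("?" for _ in candidates)
    sql = (
        f"SELECT {column} FROM {table} "
        f"WHERE {column} IN ({placeholders}) "
        f"ORDER BY {column} DESC LIMIT 1"
    )
    return sql, candidates


# Demo data taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (words TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(w,) for w in ["dog", "champion", "cat", "this is a cat", "pool", "champ", "boots"]],
)

sql, params = longest_prefix_query("championship")
print(conn.execute(sql, params).fetchone()[0])  # champion
```

The list has one entry per character of the search word, so it stays small even for long inputs, and each entry is an exact-match lookup.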
Have a look at this article:
http://blog.fatalmind.com/2010/09/29/finding-the-best-match-with-a-top-n-query/
It explains the solution from this SO question:
How to use index efficienty in mysql query
The solution pattern looks like this:
select words
from (
select words
from yourtable
where words <= 'championship'
order by words desc
limit 1
) tmp
where 'championship' like concat (words, '%')
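Read literally, this pattern keeps only the single greatest value that sorts at or before the search word and then checks whether it is a prefix. That works for the sample data, but a non-prefix neighbor can hide a shorter valid prefix. The sketch below mimics the pattern on a sorted Python list to show both cases; it is an illustration of the logic under that reading, not a claim about what the linked article recommends for such data.

```python
import bisect


def top_n_prefix(sorted_words, query):
    """Take the greatest word <= query and keep it only if it is a prefix of query."""
    i = bisect.bisect_right(sorted_words, query)
    if i == 0:
        return None  # nothing sorts at or before the query
    candidate = sorted_words[i - 1]
    return candidate if query.startswith(candidate) else None


words = sorted(["boots", "cat", "champ", "champion", "dog", "pool"])
print(top_n_prefix(words, "championship"))  # champion

# 'championed' sorts between 'champion' and 'championship' but is not a prefix,
# so the single-row lookup finds it, rejects it, and returns nothing.
words_with_neighbor = sorted(words + ["championed"])
print(top_n_prefix(words_with_neighbor, "championship"))  # None
```

If such neighbors can occur in the table, the explicit prefix list from the previous answer does not have this problem.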

mysql search LIKE not working for long phrase

I have a MySQL table, type 'MyISAM', collation: 'latin1_swedish_ci'. Inside it, I have a column named 'content'.
Inside there, I have a row with the following content:
<p>The state will have a different advantage over most other states, with one of the largest populations in the nation to blablablabla. </p>
My query is this in phpMyAdmin and also in my PHP file:
SELECT *
FROM `pages`
WHERE `content` LIKE '%with one of the largest populations%'
ORDER BY `pages`.`title` ASC
LIMIT 0 , 30
0 rows are returned.
The weird thing is that if I edit the query to this:
SELECT *
FROM `pages`
WHERE `content` LIKE '%with one of the largest%'
ORDER BY `pages`.`title` ASC
LIMIT 0 , 30
Then 1 row is returned, and it works.
Is there any setting that might limit the search query to only a few words or only a few characters?
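Most probably the stored text contains some other whitespace character (for example a tab, a line break, or a non-breaking space) between the words; otherwise your query looks fine.
To check, search for '%largest populations%'; it should also return 0 rows.
If it does, replace those characters in the column before searching.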
You can find some help here
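To illustrate the kind of problem described above, here is a small Python sketch. It assumes the hidden character is a non-breaking space (U+00A0), which the question does not confirm; the same idea applies to tabs or line breaks.

```python
import re


def normalize_spaces(text):
    """Collapse every run of whitespace (tabs, newlines, non-breaking spaces) to one space."""
    return re.sub(r"\s+", " ", text)


# Assumed stored value: looks identical on screen, but the gap is U+00A0.
stored = "with one of the largest\u00a0populations in the nation"
needle = "largest populations"

print(needle in stored)                    # False
print(needle in normalize_spaces(stored))  # True

# List the code points of any whitespace that is not a plain space.
print([hex(ord(c)) for c in stored if c.isspace() and c != " "])  # ['0xa0']
```

Inspecting the stored bytes this way (or with HEX() on the MySQL side) shows which character to replace before running the LIKE search.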

Unexpected behaviour in MySQL with Boolean-Mode-Query with quoted hyphenated string

I have a problem, or rather a problem understanding, a hyphenated search string which is quoted.
In my database there is a table with a column 'company'.
One of the entries in that column is: A-Z Electro
The following examples are simplified a lot (though the real query is much more complex) - but the effect is still the same.
When I do the following search, I don't get the row with the above mentioned company:
SELECT i.*
FROM my_table i
WHERE MATCH (i.company) AGAINST ('+\"A-Z\" +Electro*' IN BOOLEAN MODE)
GROUP BY i.uid ORDER BY i.company ASC LIMIT 0, 40;
If I do the following search, I get the row with the above-mentioned company (notice I only changed the + to a - before "A-Z"):
SELECT i.*
FROM my_table i
WHERE MATCH (i.company) AGAINST ('-\"A-Z\" +Electro*' IN BOOLEAN MODE)
GROUP BY i.uid ORDER BY i.company ASC LIMIT 0, 40;
I also get the row, if I remove the operator completely:
SELECT i.*
FROM my_table i
WHERE MATCH (i.company) AGAINST ('\"A-Z\" +Electro*' IN BOOLEAN MODE)
GROUP BY i.uid ORDER BY i.company ASC LIMIT 0, 40;
Can anyone explain to me this behaviour? Because I would expect, when searching with a +, I should get the result too...
I just checked the table index with myisam_ftdump.
Two-Character-Words are indexed properly as there are entries like
14f2e8 0.7908264 ab
3a164 0.8613265 dv
There is also an entry:
de340 0.6801047 az
I suppose this should be the entry for A-Z - so the search should find this entry, shouldn't it?
The default value of ft_min_word_len is 4. See this link for information on that. In short, your system isn't indexing words of less than 4 characters.
Why is this important? Well:
A-Z is less than 4 characters long
...therefore it's not in the index
...but your first query +"A-Z" states it must be in the index in order for the match to succeed
The other two (match if it's not in the index, match if either this or that is in the index) work because it's not in the index.
The hyphen is a red herring - the reason is that "A-Z" is three characters long and your FT index ignores it.
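The reasoning in this answer can be mimicked with a toy model. The Python sketch below is not MySQL's parser or tokenizer; it simply assumes, as the answer does, that "A-Z" is treated as one three-character word, that words shorter than the minimum length never reach the index, and that the trailing * on Electro can be ignored for this example.

```python
def matches(text, terms, min_len=4):
    """Toy boolean-mode check.

    terms is a list of (operator, word) pairs where operator is '+', '-' or ''.
    Words shorter than min_len are treated as absent from the index.
    """
    index = {w for w in text.lower().split() if len(w) >= min_len}
    required = [w for op, w in terms if op == "+"]
    excluded = [w for op, w in terms if op == "-"]
    optional = [w for op, w in terms if op == ""]
    if any(w in index for w in excluded):
        return False  # an excluded word is present
    if not all(w in index for w in required):
        return False  # a required word is missing from the index
    if not required:
        return any(w in index for w in optional)
    return True  # optional words do not affect matching here


row = "A-Z Electro"
print(matches(row, [("+", "a-z"), ("+", "electro")]))  # False: +"A-Z" must be indexed
print(matches(row, [("-", "a-z"), ("+", "electro")]))  # True: "A-Z" is absent, as demanded
print(matches(row, [("", "a-z"), ("+", "electro")]))   # True: "A-Z" is merely optional
```

Under the same assumptions, calling matches with min_len=2 flips the first two results, which is the effect the answer expects from lowering ft_min_word_len and rebuilding the index.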