Sphinxsearch cannot match Arabic words - mysql

I use Sphinx search with a real-time (RT) index. This is the config of my RT index:
mc_offers{
type = rt
path = /var/lib/sphinxsearch/mc_offers
rt_mem_limit = 16M
rt_field = title
rt_attr_string = title
min_word_len = 1
min_infix_len = 1
enable_star = 1
dict = keywords
charset_type = utf-8
charset_table = 0..9, A..Z->a..z, _, !, /, +, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F,\
U+0531..U+0556->U+0561..U+0586, U+0561..U+0586, U+0587, U+2116,\
U+0626,U+0627..U+063A,U+0641..U+064A,U+0679,U+067E,U+0686,U+0688,U+0691,U+0698,U+06AF,U+06BA, U+06BB,U+0660..U+0669→0..9,U+06F0..U+06F9→0..9, U+0622→U+0627, U+0623→U+0627, U+0625→U+0627, U+0671→U+0627, U+0672→U+0627, U+0673→U+0627, U+0675→U+0627, U+066E→U+0628, U+067B→U+0628, U+0680→U+0628, U+06C0→U+0629, U+06C1→U+0629, U+06C2→U+0629, U+06C3→U+0629, U+067A→U+062A, U+067B→U+062A, U+067C→U+062A, U+067D→U+062A, U+067F→U+062A, U+0680→U+062A, U+0681→U+062D, U+0682→U+062D, U+0683→U+062D, U+0684→U+062D, U+0685→U+062D, U+0687→U+0686, U+06BF→U+0686, U+0689→U+062F, U+068A→U+062F, U+068C→U+062F, U+068D→U+062F, U+068E→U+062F, U+068F→U+062F, U+0690→U+062F, U+06EE→U+062F, U+068B→U+0688, U+0692→U+0631, U+0693→U+0631, U+0694→U+0631, U+0695→U+0631, U+0696→U+0631, U+0697→U+0631, U+0699→U+0631, U+06EF→U+0631, U+069A→U+0633, U+069B→U+0633, U+069C→U+0633, U+06FA→U+0633, U+069D→U+0635, U+069E→U+0635, U+06FB→U+0635, U+069F→U+0637, U+06A0→U+0639, U+06FC→U+0639, U+06A1→U+0641, U+06A2→U+0641, U+06A3→U+0641, U+06A4→U+0641, U+06A5→U+0641, U+06A6→U+0641, U+066F→U+0642, U+06A7→U+0642, U+06A8→U+0642, U+063B→U+0643, U+063C→U+0643, U+06A9→U+0643, U+06AA→U+0643, U+06AB→U+0643, U+06AC→U+0643, U+06AD→U+0643, U+06AE→U+0643, U+06B0→U+06AF, U+06B1→U+06AF, U+06B2→U+06AF, U+06B3→U+06AF, U+06B4→U+06AF, U+06B5→U+0644, U+06B6→U+0644, U+06B7→U+0644, U+06B8→U+0644, U+06FE→U+0645, U+06B9→U+0646, U+06BC→U+0646, U+06BD→U+0646, U+06BE→U+0647, U+06C0→U+0647, U+06C1→U+0647, U+06C2→U+0647, U+06C3→U+0647, U+06D5→U+0647, U+06FF→U+0647, U+06C4→U+0648, U+06C5→U+0648, U+06C6→U+0648, U+06C7→U+0648, U+06C8→U+0648, U+06C9→U+0648, U+06CA→U+0648, U+06CB→U+0648, U+06CF→U+0648, U+063D→U+064A, U+063E→U+064A, U+063F→U+064A, U+06CC→U+064A, U+06CD→U+064A, U+06CE→U+064A, U+06D0→U+064A, U+06D1→U+064A, U+06D2→U+064A, U+06D3→U+064A
docinfo = extern
morphology = none
ignore_chars=U+0640,U+064B..U+065F,U+06D6..U+06DC,U+06DF..U+06E8,U+06EA..U+06ED
}
and I have a row like this one:
| id | weight | partner_offer_id | section_id | location_id | place_id|price_aed | price_usd | label_id | lat | lng | end_date |title | description | short_description | tags | type | owner_type |sub_section | user_residency | available_lng_id |
| 405 | 1 | 0 | 1 | 1 | 0 | 123 | 19 | 0 | 25.269428 | 55.279106 | 1893441600 | test offer asd քաք | nknkn انضم | knkjnk انضم | | regular | partner | 4 | visitor resident | 1 8 |
which contains Arabic and Armenian words:
Arabic - انضم
Armenian - քաք
When I run this query, it works fine:
SELECT id, sub_section, WEIGHT() as relevance FROM mc_offers WHERE MATCH('(քաք*)');
It returns a result, but when I run the same query to match the Arabic word, it returns an empty result:
SELECT id, sub_section, WEIGHT() as relevance FROM mc_offers WHERE MATCH('(انضم*)');
Empty set (0.00 sec)

Did you try adding sql_query_pre = SET NAMES utf8 to the source config?

Related

How to remove the special character from any length of the string in MYSQL

I have this sample data:
+----+---------------+
| Id | Name          |
+----+---------------+
| 1  | $John         |
| 2  | $Carol        |
| 3  | $Mike         |
| 4  | $Sam          |
| 5  | $David$Mohan$ |
| 6  | $David$       |
| 7  | $David$Mohan$ |
| 8  | Robert$Ram$   |
| 9  | Maxwell$      |
+----+---------------+
I need to remove only the first $ character.
Expected output:
+----+---------------+
| Id | Name          |
+----+---------------+
| 1  | John          |
| 2  | Carol         |
| 3  | Mike          |
| 4  | Sam           |
| 5  | David$Mohan   |
| 6  | David         |
| 7  | David$Mohan   |
| 8  | Robert$Ram    |
| 9  | Maxwell       |
+----+---------------+
Select REPLACE(col,'$','') from Tbl
select regexp_replace(name, '^$', '') name from mytable
I have tried REPLACE and SUBSTRING, but I am still missing something.
Can anyone suggest a solution?
If you are only looking for a leading $, you can use the logic below:
SELECT
  CASE
    WHEN LEFT(D,1) = '$' THEN RIGHT(D, LENGTH(D)-1)
    ELSE D
  END AS STR,
  IF(LEFT(D,1) = '$', RIGHT(D, LENGTH(D)-1), D) AS STR2
  -- you can use either of the above options
FROM
(
  select '$David$Mohan$' D UNION ALL
  select 'Da$Mo$'
) A
Try this:
select
  id,
  case
    when SUBSTR(Name, 1,1)='$' and SUBSTR(Name,-1,1)='$' then substr(Name,2,(length(Name)-2))
    when SUBSTR(Name, 1,1)='$' then substr(Name,2)
    else Name
  end as Name
from Tbl
Based on your example, you could try:
Replace(trim(replace({col},'$',' ')), ' ','$')
This turns each '$' into a space, removes the spaces at the start and end of the string, then switches the remaining spaces back to '$'. It assumes the values contain no spaces of their own.
Try this; it works for all your test cases:
SELECT REGEXP_SUBSTR(name,'[^$].+[^$]') from users;
In case you want to replace $ with a space (David$Ang => David Ang):
SELECT REGEXP_REPLACE(REGEXP_SUBSTR(name,'[^$].+[^$]'), "[$]", " ") from users;
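As a side note on the regexp_replace attempt in the question: the pattern '^$' contains no literal dollar sign, because an unescaped $ is the end-of-string anchor. The sketch below shows the difference in Python rather than MySQL, assuming Python's re module treats these simple patterns the same way MySQL's regex engine does. The function names are made up for illustration.

```python
import re

def strip_with_unescaped(name: str) -> str:
    # '^$' means "start of string immediately followed by end of string",
    # so it can only match an empty string and never removes a literal '$'.
    return re.sub(r'^$', '', name)

def strip_outer_dollars(name: str) -> str:
    # Escaping the dollar sign makes it a literal character. This removes
    # one leading and one trailing '$' and leaves any '$' in the middle.
    return re.sub(r'^\$|\$$', '', name)

names = ['$John', '$David$Mohan$', 'Robert$Ram$', 'Maxwell$']
cleaned = [strip_outer_dollars(n) for n in names]
print(cleaned)
```

The same escaped pattern should carry over to REGEXP_REPLACE, keeping in mind that a backslash inside a MySQL string literal usually has to be doubled.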

SQL query returning empty set

I have this table
+--------+--------------------------------+---------------+------------+
| BookID | BookTitle                      | NumberOfPages | NoOfCopies |
+--------+--------------------------------+---------------+------------+
| 1      | The Help                       | 444           | 4          |
| 2      | The Catcher in the Rye         | 277           | 10         |
| 3      | Crime and Punishment           | 545           | 2          |
| 4      | The Brothers Karamazov         | 795           | 1          |
| 5      | A Crown of Wishes              | 369           | 12         |
| 6      | The Fireman                    | 752           | 3          |
| 7      | Fahrenheit 451                 | 174           | 9          |
| 8      | The Hobbit                     | 366           | 1          |
| 9      | Lord of Emperors               | 560           | 4          |
| 10     | Holy Bible: King James Version | 1590          | 11         |
+--------+--------------------------------+---------------+------------+
When I search by a book title and expect the query to return the book id, it always returns an empty set.
So far, I have tried these queries (book_info is the name of the table):
select BookID from book_info where ucase(BookTitle) = ' THE HELP% ';
select BookID from book_info where BookTitle = ' The Help ';
select BookID from book_info where lcase(trim(BookTitle) = 'the help';
but none of them worked.
Note I don't rely on sql in my job.
You need to use LIKE if you want to use "%".
When you use "=", the value has to match exactly; even spaces count.
select BookID from book_info where BookTitle LIKE 'THE HELP%';
The issue here is the operator you are using and what you expect from it. The = operator checks for an exact match, which is why your queries return no records:
select BookID from book_info where ucase(BookTitle) = ' THE HELP% ';
select BookID from book_info where BookTitle = ' The Help ';
select BookID from book_info where lcase(trim(BookTitle) = 'the help';
One more thing:
String comparisons in MySQL are not case-sensitive by default (with the default, non-binary collations).
So you don't need the string functions here to change the case of the values.
The % wildcard is normally used only with LIKE, like this:
select BookID from book_info where ucase(BookTitle) LIKE '%THE HELP%';
In this query, LIKE '%THE HELP%' matches every string that contains THE HELP.
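The difference between = and LIKE can be tried out in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made purely for convenience: SQLite's = is case-sensitive, unlike MySQL's default collations, so the values below keep the original case. The helper book_ids is hypothetical, and only three rows of the table are loaded.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE book_info (BookID INTEGER PRIMARY KEY, BookTitle TEXT)')
conn.executemany(
    'INSERT INTO book_info VALUES (?, ?)',
    [(1, 'The Help'), (2, 'The Catcher in the Rye'), (8, 'The Hobbit')],
)

def book_ids(where_clause: str, value: str) -> list:
    # where_clause is a hard-coded fragment; only the value is bound.
    sql = f'SELECT BookID FROM book_info WHERE {where_clause} ORDER BY BookID'
    return [row[0] for row in conn.execute(sql, (value,))]

padded = book_ids('BookTitle = ?', ' The Help ')       # surrounding spaces are part of the value
exact = book_ids('BookTitle = ?', 'The Help')          # exact match
wildcard_eq = book_ids('BookTitle = ?', 'The Help%')   # with =, % is just a character
prefix = book_ids('BookTitle LIKE ?', 'The H%')        # with LIKE, % is a wildcard
print(padded, exact, wildcard_eq, prefix)
```

Only the exact value and the LIKE pattern find rows; the padded value and the = comparison against a string containing % find none.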

group_concat does not show all the values mysql

ModelName.all(:having=>"count(receipt_no)>1",:select=>"school_id,group_concat(id SEPARATOR ',') as f_ids,receipt_no,count(distinct id) as id_count,count(receipt_no) as rec_count",:conditions=>"receipt_no is not null",:group=>"receipt_no")
Output is
+------------+-----------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| receipt_no | school_id | id_count | f_ids | rec_count |
+------------+-----------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| 1261 | 1783 | 2 | 557660,557661 | 2 |
| 14/15- | 1783 | 1209 | 68352,77056,113664,56320,68353,77057,113665,56321,68354,56322,68355,81923,173571,113667,56323,68356,94980,56324,68357,56325,68358,80390,56326,68359,80391,110599,56327,80392,885... | 1209 |
| 15- | 1783 | 112 | 344067,344068,344069,344070,344075,326923,373261,373262,345882,360218,344091,361755,347685,341542,347689,360233,351530,358705,352829,324674,341576,324684,360018,368469,371541,3... | 112 |
Here group_concat does not show all the values, although the count of items matches the count of receipt numbers. When the items in the f_ids column exceed about 200 characters, not all the values are shown; otherwise the value is correct.
I found the solution:
SET SESSION group_concat_max_len = 1000000;
Run this statement in the MySQL console. It raises the GROUP_CONCAT length limit (1024 bytes by default) to 1000000 for the current session.
If you want to do the same from the Rails console, you can do it this way:
sql = "SET SESSION group_concat_max_len = 1000000"
ActiveRecord::Base.connection.execute(sql)
Please note:
This setting applies only to the current session.

Mysql query to extract tld from dns domain names

In this exercise, I'd like to extract the TLD (Top Level Domain) from each domain name, given the following tables.
Table name: dns
+---------------------------+
| dnsdomain |
+---------------------------+
| ns2.hosting.indo.net.id. |
| ns1.onepanel.indo.net.id. |
| ns-1591.awsdns-06.co.uk. |
| mail189.atl21.rsgsv.net. |
| gli.websitewelcome.com. |
| ns2.metrolink.pl. |
| ns1.metrolink.pl. |
| ns-1591.awsdns-06.co.uk. |
| NS3.METRORED.HN. |
| NS.METRORED.HN. |
| ns2.hosting.indo.net.id. |
| ns1.onepanel.indo.net.id. |
| www.csis.ul.ie. |
+---------------------------+
and
Table name: tld
+----------+
| tld |
+----------+
| .net.id. |
| .co.uk. |
| .net. |
| .com. |
| .pl. |
| .uk. |
| .hn. |
| .id. |
| .ie. |
+----------+
I'd like to print each dnsdomain with its related tld. I run the following MySQL query:
select test.dnsdomain , tld.tld from test join tld where locate(tld.tld, test.dnsdomain, length(test.dnsdomain) - length (tld.tld) )!= 0;
and get the below table:
+---------------------------+----------+
| dnsdomain | tld |
+---------------------------+----------+
| ns2.hosting.indo.net.id. | .net.id. |
| ns1.onepanel.indo.net.id. | .net.id. |
| ns-1591.awsdns-06.co.uk. | .co.uk. |
| mail189.atl21.rsgsv.net. | .net. |
| gli.websitewelcome.com. | .com. |
| ns2.metrolink.pl. | .pl. |
| ns1.metrolink.pl. | .pl. |
| ns-1591.awsdns-06.co.uk. | .uk. |
| NS3.METRORED.HN. | .hn. |
| NS.METRORED.HN. | .hn. |
| ns2.hosting.indo.net.id. | .id. |
| ns1.onepanel.indo.net.id. | .id. |
| www.csis.ul.ie. | .ie. |
+---------------------------+----------+
The problem with my query is that, for each record in table 'test', it does not pick the best match among all the tld values in table 'tld'. That's why I see something like:
| ns-1591.awsdns-06.co.uk. | .uk. |
whereas the expected result would be:
| ns-1591.awsdns-06.co.uk. | .co.uk. |
What am I doing wrong?
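Try GROUP BY. This statement runs in MySQL when ONLY_FULL_GROUP_BY is disabled, but the tld value shown for each group is arbitrary and not guaranteed to be the longest one:
select test.dnsdomain , tld.tld ,
max(length(tld.tld)) as x
from test
join tld
where locate(tld.tld, test.dnsdomain, length(test.dnsdomain) - length(tld.tld) ) != 0
group by test.dnsdomain;
OR
select test.dnsdomain , max(tld.tld) as tld
from test
join tld
where locate(tld.tld, test.dnsdomain, length(test.dnsdomain) - length(tld.tld) ) != 0
group by test.dnsdomain;
Note that MAX(tld.tld) compares the strings alphabetically, not by length, so for 'ns-1591.awsdns-06.co.uk.' it still returns '.uk.' rather than '.co.uk.'.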
You're not doing anything wrong. The dnsdomain 'blah.co.uk.' matches both '.co.uk.' and '.uk.', so both rows are returned.
Sounds like you want to filter out all but the "longest" matching tld.
NOTE: I'd prefer to use the RIGHT() function to extract the rightmost portion from dnsdomain. (That's just easier for me to understand, but it should be equivalent to the expression you are using.)
Reference: RIGHT() https://dev.mysql.com/doc/refman/5.5/en/string-functions.html#function_right
One option to filter out the shorter matches is to use a correlated subquery to determine the maximum length of all of the tld that match, and only return the tld that has that length.
For example:
SELECT test.dnsdomain
     , tld.tld
  FROM test
  JOIN tld
    ON tld.tld = RIGHT(test.dnsdomain,CHAR_LENGTH(tld.tld))
 WHERE CHAR_LENGTH(tld.tld) =
       ( SELECT MAX(CHAR_LENGTH(m.tld))
           FROM tld m
          WHERE m.tld = RIGHT(test.dnsdomain,CHAR_LENGTH(m.tld))
       )
You could get an equivalent result with a JOIN to an inline view, which does basically the same thing:
SELECT test.dnsdomain
     , tld.tld
  FROM test
  JOIN tld
    ON tld.tld = RIGHT(test.dnsdomain,CHAR_LENGTH(tld.tld))
  JOIN ( SELECT n.dnsdomain
              , MAX(CHAR_LENGTH(m.tld)) AS tld_len
           FROM test n
           JOIN tld m
             ON m.tld = RIGHT(n.dnsdomain,CHAR_LENGTH(m.tld))
          GROUP BY n.dnsdomain
       ) o
    ON o.dnsdomain = test.dnsdomain
   AND o.tld_len = CHAR_LENGTH(tld.tld)
Also, it's better practice to use the CHAR_LENGTH() function than the LENGTH() function. LENGTH() returns the number of bytes, which equals the number of characters for single-byte character sets (like latin1); with multibyte character sets, the number of characters can be less than the number of bytes.
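The "keep only the longest matching tld" idea can also be sketched outside SQL. The Python snippet below illustrates the logic only and is not a replacement for the queries above. The function name longest_tld is made up, and the comparison is lower-cased on the assumption that the MySQL columns use a case-insensitive collation (which is why NS3.METRORED.HN. matches .hn. in the question).

```python
from typing import List, Optional

def longest_tld(domain: str, tlds: List[str]) -> Optional[str]:
    # Compare case-insensitively, mirroring a case-insensitive collation.
    lowered = domain.lower()
    # Every tld that is a suffix of the domain is a candidate match.
    matches = [t for t in tlds if lowered.endswith(t.lower())]
    # Keep only the longest candidate; None if nothing matches.
    return max(matches, key=len, default=None)

tlds = ['.net.id.', '.co.uk.', '.net.', '.com.', '.pl.', '.uk.', '.hn.', '.id.', '.ie.']
domains = [
    'ns2.hosting.indo.net.id.',
    'ns-1591.awsdns-06.co.uk.',
    'mail189.atl21.rsgsv.net.',
    'NS3.METRORED.HN.',
    'www.csis.ul.ie.',
]
result = {d: longest_tld(d, tlds) for d in domains}
for domain, tld in result.items():
    print(domain, tld)
```

Both '.co.uk.' and '.uk.' are candidates for the awsdns name, and picking the maximum by length is what the MAX(CHAR_LENGTH(...)) subquery does in SQL.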

Issue with UNION in MySQL

I have two tables.
rp_format
+-----+--+--------------+
| fid | | recordformat |
+-----+--+--------------+
| 1 | | CD |
| 2 | | Vinyl |
| 3 | | DVD |
+-----+--+--------------+
rp_records
+----+--+--------+
| id | | format |
+----+--+--------+
| 1 | | 1 |
| 2 | | 2 |
| 3 | | 3 |
+----+--+--------+
What I would like to achieve is to display everything from "rp_format". But I would also like make a check to see if there is a "fid"-value found in "format".
Example that should be displayed on page like this:
fid recordformat
1 CD Remove this format
2 Vinyl Remove this format
3 DVD Remove this format
But let's say an "fid" value is found in "format" then I would like it to be displayed like this on page:
fid recordformat
1 CD Remove this format
2 Vinyl Can't remove this format
3 DVD Remove this format
"Remove this format / Can't remove this format" is text that will be displayed by checking if "fid" = "format" using PHP.
Here is my SQL query so far:
global $wpdb;
$rpdb = $wpdb->prefix . 'rp_format';
$rpdb2 = $wpdb->prefix . 'rp_records';
$sql = "
SELECT *
FROM $rpdb
LEFT OUTER JOIN $rpdb2 ON $rpdb.fid = $rpdb2.format
UNION
SELECT *
FROM $rpdb
RIGHT OUTER JOIN $rpdb2 ON $rpdb.fid = $rpdb2.format
WHERE $rpdb.fid IS NOT NULL
";
The issue I have with this query is that when "fid" is found in "format" several times (let's say 10 times), each of those 10 rows is output as well.
How can this be fixed?
Kind regards
Johan
If I understand correctly, you want to display a message depending on whether the data exists in rp_records, without displaying the same format more than once.
Consider the following
mysql> select * from rp_format;
+------+--------------+
| fid | recordformat |
+------+--------------+
| 1 | CD |
| 2 | Vinyl |
| 3 | DVD |
| 4 | Test |
+------+--------------+
4 rows in set (0.00 sec)
mysql> select * from rp_records;
+------+--------+
| id | format |
+------+--------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 2 |
| 5 | 1 |
+------+--------+
So the query is
select
  f.*,
  case
    when r.format is not null then 'Can\'t remove' else 'Remove this'
  end as message
from rp_format f
left join rp_records r on r.format = f.fid
group by f.fid ;
+------+--------------+--------------+
| fid | recordformat | message |
+------+--------------+--------------+
| 1 | CD | Can't remove |
| 2 | Vinyl | Can't remove |
| 3 | DVD | Can't remove |
| 4 | Test | Remove this |
+------+--------------+--------------+
I'm not sure I understood your logic for found and not-found formats correctly; if I got it wrong, use r.format IS NOT NULL in the IF condition instead of r.format IS NULL. Also, I don't think you need UNION here; use a join instead:
SELECT
  f.fid,
  f.recordformat,
  IF(r.format IS NULL, "Can't remove this format", "Remove this format")
FROM rp_format f
LEFT JOIN rp_records r ON f.fid = r.format
GROUP BY f.fid
;
I'm sure that something like this will help you!
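Another way to avoid the repeated rows, not used by either answer above, is an EXISTS subquery: it yields one row per format without needing a join or GROUP BY. Below is a minimal sketch of that alternative, using Python's built-in sqlite3 module as a stand-in for MySQL and WordPress (an assumption for the sake of a runnable example). The table contents are copied from the first answer, and the table names are used without the $wpdb prefix.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE rp_format (fid INTEGER, recordformat TEXT)')
conn.execute('CREATE TABLE rp_records (id INTEGER, format INTEGER)')
conn.executemany('INSERT INTO rp_format VALUES (?, ?)',
                 [(1, 'CD'), (2, 'Vinyl'), (3, 'DVD'), (4, 'Test')])
conn.executemany('INSERT INTO rp_records VALUES (?, ?)',
                 [(1, 1), (2, 2), (3, 3), (4, 2), (5, 1)])

# One row per format; EXISTS only asks whether at least one record uses it,
# so formats referenced many times are still listed once.
sql = """
SELECT f.fid,
       f.recordformat,
       CASE WHEN EXISTS (SELECT 1 FROM rp_records r WHERE r.format = f.fid)
            THEN 'Can''t remove'
            ELSE 'Remove this'
       END AS message
FROM rp_format f
ORDER BY f.fid
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The SELECT statement uses only standard SQL, so it should run on MySQL as well once the table names carry the right prefix.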