MySQL fulltext query is very slow

I have a user-detail table in MySQL with about 500,000 records, and I have created a fulltext index on the firstname and lastname fields. When I search for a single letter (a to z), the first query is very slow: it takes about 5-6 seconds to respond, and subsequent runs come down to about 800 milliseconds. The EXPLAIN output seems fine, since it shows "fulltext" in the type column, but I can't work out why the query is so slow.
My query looks like this:
SELECT SQL_NO_CACHE usr.id, usr.uname, ifnull(usr.fullname,'') fullname,
ifnull(ct.City, '') city,
MATCH(usr.fname,usr.lname) AGAINST('a*' IN BOOLEAN MODE) ordfld
FROM usertable usr
LEFT JOIN citymas ct ON ct.CityID = usr.CityID
WHERE usr.UserStatus IN(10,11)
AND usr.id <> 1
AND MATCH(usr.fname,usr.lname) AGAINST('a*' IN BOOLEAN MODE) > 0
ORDER BY ( CASE WHEN usr.fullname = 'a' THEN 1
WHEN usr.fname LIKE 'a%' THEN 2
WHEN usr.lname LIKE 'a%' THEN 3
WHEN usr.fname like '%a' THEN 6
WHEN usr.lname LIKE '%a' THEN 7
WHEN usr.fullname LIKE '%a%' THEN 8
ELSE 10 END ),
ordfld DESC,
( CASE WHEN ifnull(usr.cityid,0) = 234 THEN '0' ELSE '1' END ), usr.fullname
LIMIT 20
and EXPLAIN shows the following:
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, 'SIMPLE', 'usr', 'fulltext', 'PRIMARY,IX_usertable_fname_lname', 'IX_usertable_fname_lname', 0, NULL, 1, 'Using where; Using filesort'
1, 'SIMPLE', 'ct', 'eq_ref', 'PRIMARY', 'PRIMARY', 3, 'usr.cityid', 1, NULL
The query above still takes too long: it responds in 800-900 ms.
Any ideas?
EDIT:
Does ft_min_word_len matter? When I changed it on my localhost and rebuilt the index, the same query returned within 500 ms. How do I change this setting on Amazon RDS?
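To make the EDIT concrete, here is a minimal sketch of how the setting could be inspected and the index rebuilt. It assumes the table uses MyISAM, which the question does not state: ft_min_word_len applies to MyISAM fulltext indexes, while InnoDB uses innodb_ft_min_token_size instead. Both variables are read at server startup, so on a self-managed server they are set in the option file and the server is restarted. On Amazon RDS, server variables are normally changed through a custom DB parameter group; whether this particular one is modifiable there is worth checking in the RDS documentation.

```sql
-- Inspect the current minimum word lengths (both are read-only at runtime).
SHOW VARIABLES LIKE 'ft_min_word_len';           -- MyISAM fulltext indexes
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';  -- InnoDB fulltext indexes

-- After the value has been changed and the server restarted,
-- existing fulltext indexes must be rebuilt before the new length applies.
-- Assumption: usertable is a MyISAM table.
REPAIR TABLE usertable QUICK;
```

For an InnoDB table, the usual equivalent is to drop and re-create the fulltext index.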

Related

MySQL Count after a specific value shows

The problem is, I need to calculate average number of pages/hits after reaching the pax
page (including pax hit).
The database is:
CREATE TABLE search (
SESSION_ID INTEGER,
HIT_NUMBER INTEGER,
PAGE VARCHAR(24),
MEDIUM_T VARCHAR(24)
);
INSERT INTO search
(SESSION_ID, HIT_NUMBER, PAGE, MEDIUM_T)
VALUES
('123', '1', 'home', 'direct'),
('123', '2', 'flights_home', 'direct'),
('123', '3', 'results', 'direct'),
('456', '1', 'pax', 'metasearch'),
('789', '1', 'home', 'partners'),
('789', '2', 'flights_home', 'partners'),
('789', '3', 'results', 'partners'),
('789', '4', 'home', 'partners'),
('146', '1', 'results', 'SEM'),
('146', '2', 'pax', 'SEM'),
('146', '3', 'payment', 'SEM'),
('146', '4', 'confirmation', 'SEM');
And my approach is:
SELECT s1.SESSION_ID, COUNT(*) as sCOUNT
FROM search s1
WHERE PAGE = 'pax'
GROUP BY s1.SESSION_ID
UNION ALL
SELECT 'Total AVG', AVG(a.sCOUNT)
FROM (
SELECT COUNT(*) as sCOUNT
FROM search s2
GROUP BY s2.SESSION_ID
) a
Obviously the 3rd line is wrong: my code misses the part where counting should start only once 'pax' is shown, and I don't have any clue how to do that.
Thank you in advance :)
Finding all pax pages and the ones after them can be done with EXISTS. The rest is straightforward:
SELECT AVG(hits)
FROM (
SELECT session_id, COUNT(*) AS hits
FROM search AS s1
WHERE page = 'pax' OR EXISTS (
SELECT *
FROM search AS s2
WHERE s2.session_id = s1.session_id
AND s2.hit_number < s1.hit_number
AND s2.page = 'pax'
)
GROUP BY session_id
) AS x
If using MySQL 8 then window functions provide a simpler solution:
WITH cte1 AS (
SELECT session_id, MAX(CASE WHEN page = 'pax' THEN 1 END) OVER (
PARTITION BY session_id
ORDER BY hit_number
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS countme
FROM search
), cte2 as (
SELECT COUNT(*) AS hits
FROM cte1
WHERE countme IS NOT NULL
GROUP BY session_id
)
SELECT AVG(hits)
FROM cte2
My approach uses a WITH clause (common table expression, CTE) to declare the underlying query up front, then queries and averages from it.
First, one premise that was not explicitly covered in your sample data: what happens if a user bounces back and forth between pages and hits the pax page more than once? You then have multiple pax page hits. I assume you want the FIRST pax hit, and that the count should include every page hit from that point on. This solution accounts for that.
Let's look at the innermost FROM clause, with the final alias "pxHits".
I am grouping by session ID and grabbing the FIRST pax page hit (or NULL if no pax page was encountered), and ALSO the HIGHEST hit number per session. The HAVING clause makes sure that only sessions that HAD a pax page are returned; all other sessions are excluded from the results.
This results in two entries being passed up to the outer select, which adds the 1 + lastHitNumber - firstPaxHit calculation. The reason for the 1 + is that the pax page itself counts as a hit. You need it for a case like your session 456, where the first and last hit were both the pax page, since lastHitNumber - firstPaxHit alone would net zero. The same would be true if a person had 25 page hits and got to the pax page on hit 26: the result would still be 1 via 1 + 26 - 26 = 1 total page including the pax page, not counting the 25 prior.
Your other qualifying session is 146. The first pax hit was 2, but they proceeded to a highest hit number of 4, so 1 + 4 - 2 = 3 total pages.
So now on to the final query. Since you can see HOW things are prepared, we can now get the averages. You can't mix/auto-convert different data types in the same UNION column (session_id vs. the fixed message of your 'Total Avg'); they must be the same type. So my query converts the session_id to character to match. I happen to run the AVERAGE query first, as a simple select from the CTE alias, and THEN get the actual session_id and counts.
with PaxSummary as
(
select
pxHits.*,
1 + lastHitNumber - firstPaxHit HitsIncludingPax
from
( select
session_id,
min( case when page = 'pax'
then hit_number
else null end ) firstPaxHit,
max( hit_number ) lastHitNumber
from
search
group by
session_id
having
min( case when page = 'pax'
then hit_number
else null end ) > 0 ) pxHits
)
select
'Avg Pax Pages' FinalMsg,
avg( ps2.HitsIncludingPax ) HitsIncludingPax
from
PaxSummary ps2
union all
select
cast( ps1.session_id as char ) FinalMsg,
ps1.HitsIncludingPax
from
PaxSummary ps1
As an alternative to the EXISTS (correlated subquery) pattern, we can write a query that gets us the hit_number of the first 'pax' hit for each session_id, and use that as an inline view.
Something along these lines:
-- count hits on or after the first 'pax' of each session_id that has a 'pax' hit
SELECT s.session_id
, COUNT(*) AS cnt_hits_after_pax
FROM ( -- get the first 'pax' hit for each session_id
-- exclude session_id that do not have a 'pax' hit
SELECT px.session_id AS pax_session_id
, MIN(px.hit_number) AS pax_hit_number
FROM search px
WHERE px.page = 'pax'
GROUP BY px.session_id
) p
-- all the hits for session_id on or after the first 'pax' hit
JOIN search s
ON s.session_id = p.pax_session_id
AND s.hit_number >= p.pax_hit_number
GROUP BY s.session_id
To get an average from that query, we can wrap it in parens and turn it into an inline view:
SELECT AVG(c.cnt_hits_after_pax) AS avg_cnt_hits_after_pax
FROM (
-- query above goes here
) c
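For reference, here is the query assembled in full, with the counting query substituted for the placeholder comment. It is a sketch that assumes the search table and sample rows from the question, and that the inner view groups by session_id and is joined through its pax_session_id and pax_hit_number aliases. With that data, session 456 contributes 1 hit and session 146 contributes 3, so the average is 2.

```sql
SELECT AVG(c.cnt_hits_after_pax) AS avg_cnt_hits_after_pax
  FROM (
         -- hits on or after the first 'pax' hit, per session
         SELECT s.session_id
              , COUNT(*) AS cnt_hits_after_pax
           FROM ( -- first 'pax' hit per session; sessions without one drop out
                  SELECT px.session_id      AS pax_session_id
                       , MIN(px.hit_number) AS pax_hit_number
                    FROM search px
                   WHERE px.page = 'pax'
                   GROUP BY px.session_id
                ) p
           JOIN search s
             ON s.session_id = p.pax_session_id
            AND s.hit_number >= p.pax_hit_number
          GROUP BY s.session_id
       ) c;
```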

Extract and return distinct alphabetic values with MySQL

Intro
I am creating an AutoComplete street name app. Let's say below are column entries (street names) which only differ by the street number.
Ahnewinkelstr. 1
Ahnewinkelstr. 32B
Ahnewinkelstr. 36
Ahnewinkelstr. 37
Ahnewinkelstr. 39
Hansstr. 3
Hansstr. 6
Hansstr. 128
Now I would like MySQL to extract only the first alphabetical part of each street name, dropping everything from the first numerical character onward, and return a list of the distinct street names extracted.
The result should look like:
Ahnewinkelstr.
Hansstr.
Do you think this is doable? I previously tried to do this with Hibernate Search, but that was certainly too complicated.
Without a regex replace (as has already been shown), you'd probably need to do something ugly, and probably slow, like this:
SELECT SUBSTRING(theField, 1
, LEAST(
-- appending the digit guarantees INSTR never returns 0 (not found)
INSTR(CONCAT(theField, '0'), '0')
, INSTR(CONCAT(theField, '1'), '1')
, INSTR(CONCAT(theField, '2'), '2')
, INSTR(CONCAT(theField, '3'), '3')
, INSTR(CONCAT(theField, '4'), '4')
, INSTR(CONCAT(theField, '5'), '5')
, INSTR(CONCAT(theField, '6'), '6')
, INSTR(CONCAT(theField, '7'), '7')
, INSTR(CONCAT(theField, '8'), '8')
, INSTR(CONCAT(theField, '9'), '9')
) - 1) AS beforeNums
FROM ....
On MariaDB, you have access to REGEXP_REPLACE(). You could write a query like:
SELECT DISTINCT REGEXP_REPLACE(`street`, '\\s+\\d.*$', '')
FROM `streets`
I believe the function you need is SUBSTRING_INDEX. It returns the part of a string that comes before (or after) a given number of occurrences of a delimiter. In your case you would run:
SELECT DISTINCT(SUBSTRING_INDEX(Field,'.',1))
FROM your_table
See the SUBSTRING_INDEX reference in the MySQL manual.
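As a quick illustration of how SUBSTRING_INDEX behaves on the sample street names, the sketch below uses literal strings rather than a real table. Note that the delimiter itself is not part of the result, so splitting on '.' drops the trailing dot. Splitting on a space keeps it, but that only suits single-word street names like the ones in the question.

```sql
-- Everything before the first '.', delimiter excluded.
SELECT SUBSTRING_INDEX('Ahnewinkelstr. 32B', '.', 1);  -- Ahnewinkelstr

-- Everything before the first space; the dot is kept.
SELECT SUBSTRING_INDEX('Hansstr. 128', ' ', 1);        -- Hansstr.
```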

How to make search faster over millions of rows in SQL Server 2008

We have millions of records in the database, and we need to fetch the TOP 50 records that match some conditions.
Here are the filter criteria; the WHERE clause references columns from several tables:
WHERE e.event_id NOT IN (
SELECT Event_Id
FROM [dbo].Event_Table eeii
WHERE eeii.Submitted_By != 524
AND eeii.STATUS = 'Draft'
AND eeii.ActivityCompletedIndicator IS NULL
)
AND e.Event_Id > 0
AND CONTAINS (
e.Event_Id_SearchString
,'"*4*"'
)
AND CONTAINS (
e.Event_Name
,'"*a*"'
)
AND e.STATUS IN (
'Not Submitted'
,'Draft'
)
AND e.Event_Type_Id = 2
AND e.Country_Id IN (22)
AND sm.State_Name LIKE N'%A%'
AND CONTAINS (
e.External_Activity_ID
,'"*P*"'
)
AND u.FirstName LIKE N'%a%'
AND u.LastName LIKE N'%a%'
AND ai.Attendee_FName LIKE N'%a%'
AND ai.Attendee_LName LIKE N'%a%'
AND ai.Fullname_organizationName LIKE N'%a%'
AND ai.CustomerID LIKE N'%P%'
AND EX.SpendItemid = 1
AND epi.Product_Id = 18
AND e.Total_Amount = 789
AND e.CurrencyId = 11
AND e.Payment_ID = 1826
We have used full-text indexes on the columns where we could.
We have created the required indexes for the other columns.
Note: We have used APPLYs instead of JOINs, so we cannot use a full-text index on the other tables that are combined using APPLYs.
We are still not getting the expected performance.
Please suggest improvements.

MySQL Slow Query: How to optimize the following query?

Following is the query that I am using:
SELECT *
FROM (
SELECT
(
CASE WHEN product_name like '%word1%' THEN 1 ELSE 0 END +
CASE WHEN product_name like '%word2%' THEN 1 ELSE 0 END +
CASE WHEN product_name like '%word3%' THEN 1 ELSE 0 END
) AS numMatches
FROM products as p
) as derived
WHERE numMatches > 0
ORDER BY numMatches DESC
LIMIT 30,10
I added a BTREE index on product_name. There are 3 million records in the table, and the query executes in 3-5 seconds.
EXPLAIN says 'Using where; Using filesort', so I gather it is not using the index.
No, it's not using the index.
For that, you would have to compare against 'word1%', 'word2%', etc.; a B-tree index can't be used when the wildcard is at the beginning of the pattern.
But if your MySQL version is relatively modern, you can use fulltext indexes, which would serve your query.
https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html
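A minimal sketch of what the fulltext approach could look like for the products table from the question. The index name ft_product_name is hypothetical, and the relevance score MySQL returns is not the same thing as the numMatches count in the original query. Fulltext search also matches whole words (or word prefixes with * in boolean mode) rather than arbitrary substrings as '%word1%' does, so the result set may differ.

```sql
-- Hypothetical index name; the column comes from the question.
ALTER TABLE products
  ADD FULLTEXT INDEX ft_product_name (product_name);

-- Natural-language mode (the default) ranks rows by relevance.
SELECT p.product_name,
       MATCH(p.product_name) AGAINST('word1 word2 word3') AS score
FROM products AS p
WHERE MATCH(p.product_name) AGAINST('word1 word2 word3')
ORDER BY score DESC
LIMIT 30, 10;
```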

Optimize query mysql search

I have the following SQL, but it executes very slowly: it takes about 45 seconds. The table has 15 million records. How can I improve it?
SELECT A.*, B.ESPECIE
FROM
(
SELECT
A.CODIGO_DOCUMENTO,
A.DOC_SERIE,A.DATA_EMISSAO,
A.DOC_NUMERO,
A.CF_NOME,
A.CF_SRF,
A.TOTAL_DOCUMENTO,
A.DOC_MODELO
FROM MOVIMENTO A
WHERE
A.CODIGO_EMPRESA = 1
AND A.CODIGO_FILIAL = 5
AND A.DOC_TIPO_MOVIMENTO = 1
AND A.DOC_MODELO IN ('65','55')
AND (A.CF_NOME LIKE '%TEXT_SEARCH%'
OR A.CF_CODIGO LIKE 'TEXT_SEARCH%'
OR A.CF_SRF LIKE 'TEXT_SEARCH%'
OR A.DOC_SERIE LIKE 'TEXT_SEARCH%'
OR A.DOC_NUMERO LIKE 'TEXT_SEARCH%')
ORDER BY A.DATA_EMISSAO DESC , A.CODIGO_DOCUMENTO DESC
LIMIT 0, 100
) A
LEFT JOIN MODELODOCUMENTOFISCAL B ON A.DOC_MODELO = B.CODMODELO
For this query, I would start with an index on MOVIMENTO(CODIGO_EMPRESA, CODIGO_FILIAL, DOC_MODELO) and MODELODOCUMENTOFISCAL(CODMODELO).
That should speed up the query.
If it doesn't, you may need to consider a full text search to handle the LIKE clauses. I do note that you only have a wildcard at the beginning of one of the patterns. Is that intentional?
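The suggested indexes could be created along these lines. The index names are hypothetical, and if CODMODELO is already the primary key of MODELODOCUMENTOFISCAL, the second statement is unnecessary.

```sql
-- Covers the equality filters and the IN list on DOC_MODELO.
CREATE INDEX idx_movimento_empresa_filial_modelo
    ON MOVIMENTO (CODIGO_EMPRESA, CODIGO_FILIAL, DOC_MODELO);

-- Supports the LEFT JOIN lookup on the model table.
CREATE INDEX idx_modelo_codmodelo
    ON MODELODOCUMENTOFISCAL (CODMODELO);
```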