How to optimize this MySQL query from Doctrine QueryBuilder?

This query is generated by a Doctrine 2 QueryBuilder (the concat function takes only 2 parameters, hence the nested CONCAT calls), and it takes 4 seconds.
SELECT COUNT(*) AS dctrn_count
FROM
(
SELECT DISTINCT id_4
FROM
(
SELECT 1 / LOCATE( ?, CONCAT( CONCAT( CONCAT(w0_.firstname, ' '),
CONCAT(w0_.lastname, ' ') ), w1_.fullname )
) AS sclr_0,
1 / LOCATE( ?, CONCAT( CONCAT( CONCAT(w0_.firstname, ' '),
CONCAT(w0_.lastname, ' ') ), w1_.shortname )
) AS sclr_1,
1 / LOCATE( ?, CONCAT( CONCAT( CONCAT(w0_.nickname, ' '),
CONCAT(w0_.lastname, ' ') ), w1_.fullname )
) AS sclr_2,
1 / LOCATE( ?, CONCAT( CONCAT( CONCAT(w0_.nickname, ' '),
CONCAT(w0_.lastname, ' ') ), w1_.shortname )
) AS sclr_3,
w0_.id AS id_4, w0_.slug AS slug_5, w0_.firstname AS firstname_6,
w0_.lastname AS lastname_7, w0_.nickname AS nickname_8,
w0_.gender AS gender_9, w0_.email AS email_10, w0_.email_checked AS email_checked_11,
w0_.title_en AS title_en_12, w0_.short_title AS short_title_13,
-- lots of stuff removed (see edit) --
w5_.biography_en AS biography_en_55, w5_.created AS created_56, w5_.updated AS updated_57, w6_.id AS id_58, w6_.web_text AS web_text_59, w6_.created AS created_60
FROM wmn_executive w0_
INNER JOIN wmn_company w1_ ON w0_.company_id = w1_.id
INNER JOIN wmn_industry w7_ ON w1_.industry_id = w7_.id
INNER JOIN wmn_location w2_ ON w1_.location_id = w2_.id
INNER JOIN wmn_country w3_ ON w2_.country_id = w3_.id
INNER JOIN wmn_city w4_ ON w2_.city_id = w4_.id
LEFT JOIN wmn_executive_link w5_ ON w0_.link_id = w5_.id
LEFT JOIN wmn_web_executive w6_ ON w0_.id = w6_.executive_id
WHERE w0_.original_id IS NULL
AND w0_.user_id IS NOT NULL
AND ( w0_.firstname LIKE ?
OR w0_.lastname LIKE ?
OR w0_.nickname LIKE ?
OR w1_.fullname LIKE ?
OR w1_.shortname LIKE ?
OR w0_.title_en LIKE ?
OR w0_.short_title LIKE ?
OR w7_.industry_name_en LIKE ?
OR w7_.industry_name_fr LIKE ?
OR w3_.country_name_en LIKE ?
OR w3_.country_name_fr LIKE ?
OR w4_.city_name LIKE ?
)
ORDER BY sclr_0 DESC, sclr_1 DESC, sclr_2 DESC, sclr_3 DESC ) dctrn_result
) dctrn_table

The ORDER BY provides no benefit to the end result, because it cannot change a count; remove it.
This part:
SELECT COUNT(*) AS dctrn_count
FROM
(
SELECT DISTINCT id_4
can be simplified to
SELECT COUNT(DISTINCT id_4)
None of the items in the SELECT clause are used except id_4; get rid of them.
Those 3 optimizations might shrink the run time from 4.0s to maybe 3.9s.
And then you will say that this is not the real query, but merely a count?
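To check the claim that the DISTINCT wrapper and the inner ORDER BY do not affect the count, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table `hits` and its rows are made up for illustration; duplicate ids mimic the row multiplication caused by the joins.

```python
import sqlite3

def counts(conn):
    """Return (count over an ordered DISTINCT subquery, plain COUNT(DISTINCT id))."""
    wrapped = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT id FROM hits ORDER BY score DESC) t"
    ).fetchone()[0]
    direct = conn.execute("SELECT COUNT(DISTINCT id) FROM hits").fetchone()[0]
    return wrapped, direct

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (id INTEGER, score REAL)")
# Hypothetical rows: ids 1 and 3 appear twice, as they would after a one-to-many join.
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [(1, 0.5), (1, 0.2), (2, 0.9), (3, 0.1), (3, 0.1)],
)
print(counts(conn))  # (3, 3)
```

Both forms give the same number, so the simpler COUNT(DISTINCT ...) loses nothing.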
If you are going to do a messy text scan like that, you need all those strings in one table. Better yet, put all the strings concatenated together into one column in one table. This would be just for searching, not for display. Then make a FULLTEXT index on that column. This avoids the OR and LIKE '%...' problems, although FULLTEXT matches words (or word prefixes in BOOLEAN MODE), not arbitrary substrings. How to get that back into Doctrine 2, I don't know.
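The "one search column" idea can be sketched as follows, again with sqlite3 standing in for MySQL. The table `executive_search` and the helpers `index_executive` and `search` are hypothetical names. SQLite has no MySQL-style FULLTEXT index, so the sketch still uses LIKE '%...%'. In MySQL, the predicate would become MATCH(search_text) AGAINST(...) once a FULLTEXT index exists on the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical search-only table: one row per executive, all searchable text in one column.
conn.execute(
    "CREATE TABLE executive_search (executive_id INTEGER PRIMARY KEY, search_text TEXT)"
)

def index_executive(conn, executive_id, *fields):
    """Concatenate the non-empty searchable fields into one string and upsert it."""
    text = " ".join(f for f in fields if f)
    conn.execute(
        "INSERT OR REPLACE INTO executive_search (executive_id, search_text) VALUES (?, ?)",
        (executive_id, text),
    )

def search(conn, term):
    """One predicate on one column replaces the 12-way OR across several joined tables."""
    rows = conn.execute(
        "SELECT executive_id FROM executive_search "
        "WHERE search_text LIKE ? ORDER BY executive_id",
        (f"%{term}%",),
    )
    return [r[0] for r in rows]

# Made-up data: names, company, industry, country, city.
index_executive(conn, 1, "Anne", "Martin", None, "Acme Corporation", "Acme", "Mining", "France", "Lyon")
index_executive(conn, 2, "Bob", "Stone", "Bobby", "Globex", None, "Retail", "Canada", "Toronto")
print(search(conn, "lyon"))  # [1]
```

The search row would have to be refreshed whenever any of the source rows changes, for example from the application or from triggers.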

Related

How to improve slow query performance?

I have a multi-join query that targets the hospital's chart database.
It takes 5 to 10 seconds or more.
I checked the visual EXPLAIN in MySQL Workbench (image not available).
The query is below.
select sc.CLIENT_ID as 'guardianId', sp.PET_ID as 'patientId', sp.NAME as 'petName'
, (select BW from syn_vital where HOSPITAL_ID = sp.HOSPITAL_ID and PET_ID = sp.PET_ID order by DATE, TIME desc limit 1) as 'weight'
, sp.BIRTH as 'birth', sp.RFID as 'regNo', sp.BREED as 'vName'
, (case when ss.NAME like '%fel%' or ss.NAME like '%cat%' or ss.NAME like '%pawpaw%' or ss.NAME like '%f' then '002'
when ss.NAME like '%canine%' or ss.NAME like '%dog%' or ss.NAME like '%can%' then '001' else '007' end) as 'sCode'
, (case when LOWER(replace(sp.SEX, ' ', '')) like 'male%' then 'M'
when LOWER(replace(sp.SEX, ' ', '')) like 'female%' or LOWER(replace(sp.SEX, ' ', '')) like 'fam%' or LOWER(replace(sp.SEX, ' ', '')) like 'woman%' then 'F'
when LOWER(replace(sp.SEX, ' ', '')) like 'c.m%' or LOWER(replace(sp.SEX, ' ', '')) like 'castratedmale' or LOWER(replace(sp.SEX, ' ', '')) like 'neutered%' or LOWER(replace(sp.SEX, ' ', '')) like 'neutrality%man%' or LOWER(replace(sp.SEX, ' ', '')) like 'M.N%' then 'MN'
when LOWER(replace(sp.SEX, ' ', '')) like 'woman%' or LOWER(replace(sp.SEX, ' ', '')) like 'f.s%' or LOWER(replace(sp.SEX, ' ', '')) like 'S%' or LOWER(replace(sp.SEX, ' ', '')) like 'neutrality%%' then 'FS' else 'NONE' end) as 'sex'
from syn_client sc
left join syn_tel st on sc.HOSPITAL_ID = st.HOSPITAL_ID and sc.CLIENT_ID = st.CLIENT_ID
inner join syn_pet sp on sc.HOSPITAL_ID = sp.HOSPITAL_ID and sc.FAMILY_ID = sp.FAMILY_ID and sp.STATE = 0
inner join syn_species ss on sp.HOSPITAL_ID = ss.HOSPITAL_ID and sp.SPECIES_ID = ss.SPECIES_ID
WHERE
trim(replace(st.NUMBER, '-','')) = '01099999999'
and trim(sc.NAME) = 'johndoe'
and sp.HOSPITAL_ID = 'HOSPITALID999999'
order by TEL_DEFAULT desc
I would like to know how to improve the performance of this complex query.
The most obvious performance killers in your query are the non-sargable criteria in your where clause.
trim(replace(st.NUMBER, '-','')) = '01099999999'
This cannot use any available index as you have applied a function to the column, which needs to be evaluated before the comparison can be made.
As suggested by Pham, you could change your criterion to -
st.number IN ('01099999999', '01-099-999-999', 'ALL_OTHERS_FORMAT_YOU_ACCEPTS...')
or, better still, normalize the numbers before you store them (you can always apply formatting for display purposes); that way you know how to search the stored data. Strip all the hyphens and spaces from the existing numbers -
UPDATE syn_tel
SET number = REPLACE(REPLACE(number, '-',''), ' ', '')
WHERE number LIKE '% %' OR number LIKE '%-%';
Similarly for the next criterion -
trim(sc.NAME) = 'johndoe'
The name should be trimmed before being stored in the database so there is no need to trim it when searching it. Update already stored names to trim whitespace -
UPDATE syn_client
SET NAME = TRIM(NAME)
WHERE NAME LIKE ' %' OR NAME LIKE '% ';
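Normalizing on the way in can be as simple as running the values through a couple of helpers before the INSERT. This is a minimal sketch; `normalize_phone` and `normalize_name` are hypothetical names. Note that Python's `\s` and `strip()` also remove tabs and newlines, which is slightly broader than REPLACE(..., ' ', '') and TRIM(), since TRIM removes only spaces by default.

```python
import re

def normalize_phone(raw: str) -> str:
    """Remove hyphens and whitespace so '010-9999-9999' is stored as '01099999999'."""
    return re.sub(r"[\s-]", "", raw)

def normalize_name(raw: str) -> str:
    """Strip leading and trailing whitespace, like TRIM(NAME)."""
    return raw.strip()

print(normalize_phone("010-9999-9999"), normalize_name("  johndoe "))
```

With the stored data cleaned this way, the WHERE clause can compare st.number and sc.name directly and use an index.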
Changing sp.HOSPITAL_ID = 'HOSPITALID999999' to sc.HOSPITAL_ID = 'HOSPITALID999999' will allow for the use of a composite index on syn_client (HOSPITAL_ID, name) assuming you drop the TRIM() from the previously discussed criterion.
The sorting in your sub-query for weight might be wrong -
order by DATE, TIME desc limit 1
presumably you want the most recent weight -
order by `DATE` desc, `TIME` desc limit 1
/* OR */
order by CONCAT(`DATE`, ' ', `TIME`) desc limit 1
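To see the difference between the two sort orders, here is a small sketch with sqlite3 as a stand-in and made-up readings for one pet. The CONCAT variant works because ISO-formatted date and time strings sort chronologically as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE syn_vital (pet_id INTEGER, "date" TEXT, "time" TEXT, bw REAL)')
conn.executemany(
    "INSERT INTO syn_vital VALUES (?, ?, ?, ?)",
    [
        (7, "2023-01-05", "09:00:00", 4.1),
        (7, "2023-03-10", "08:30:00", 4.6),
        (7, "2023-03-10", "17:45:00", 4.8),
    ],
)

def latest_bw(order_by):
    # order_by is one of the fixed strings below, never user input.
    return conn.execute(
        f"SELECT bw FROM syn_vital WHERE pet_id = 7 ORDER BY {order_by} LIMIT 1"
    ).fetchone()[0]

wrong = latest_bw('"date", "time" DESC')       # date ASC, time DESC: picks the oldest day
right = latest_bw('"date" DESC, "time" DESC')  # newest reading first
print(wrong, right)  # 4.1 4.8
```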
order by DATE, TIME desc -- really? That's equivalent to date ASC, time DESC. If you want "newest first", then ORDER BY date DESC, time DESC. Furthermore, keeping DATE and TIME in separate columns is usually bad practice and makes the code clumsy. Is there a strong reason for storing them separately? It is reasonably easy to split them apart in a SELECT.
Similarly, cleanse NUMBER and NAME when inserting.
This will make the first subquery much faster:
syn_vital needs INDEX(hospital_id, pet_id, date, time, BW)
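Why BW belongs at the end of that index: with it included, the weight subquery can be answered from the index alone. The sketch below checks this with SQLite's EXPLAIN QUERY PLAN as a stand-in. The index name `idx_vital` and the extra `note` column are assumptions. In MySQL, the equivalent sign is "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE syn_vital (hospital_id TEXT, pet_id INTEGER, "date" TEXT, '
    '"time" TEXT, bw REAL, note TEXT)'
)
# Equality columns first, then the sort columns, then the selected column (covering).
conn.execute('CREATE INDEX idx_vital ON syn_vital (hospital_id, pet_id, "date", "time", bw)')

plan = conn.execute(
    'EXPLAIN QUERY PLAN SELECT bw FROM syn_vital '
    'WHERE hospital_id = ? AND pet_id = ? ORDER BY "date" DESC, "time" DESC LIMIT 1',
    ("HOSPITALID999999", 7),
).fetchall()
# The last field of each plan row is the human-readable detail text.
details = " ".join(row[-1] for row in plan)
print(details)
```

The plan should report a search using the covering index, with no separate sort step, because the index can be read backwards for the DESC ordering.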
LIKE with a leading wildcard (%) is slow, but you probably cannot avoid it in this case.
LOWER(replace(sp.SEX, ' ', '')) -- Cleanse the input during INSERT, not on output!
LOWER(...) -- With a suitable COLLATION (eg, the default), calling LOWER is unnecessary.
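Cleansing SEX at insert time could look like the following sketch. It translates the CASE expression's LIKE patterns to glob patterns (% becomes *) and keeps the CASE's first-match-wins order. `SEX_RULES` and `normalize_sex` are hypothetical names. 'woman%' appears under both F and FS in the original CASE, so the FS branch can never fire for it; the sketch lists it once, under F. Patterns are written in lowercase because the input is lowercased first.

```python
from fnmatch import fnmatchcase

# Glob translations of the CASE patterns; the first group that matches wins, as in SQL CASE.
SEX_RULES = [
    ("M", ["male*"]),
    ("F", ["female*", "fam*", "woman*"]),
    ("MN", ["c.m*", "castratedmale", "neutered*", "neutrality*man*", "m.n*"]),
    ("FS", ["f.s*", "s*", "neutrality*"]),
]

def normalize_sex(raw: str) -> str:
    """Drop spaces, lowercase, then map to M / F / MN / FS, or NONE if nothing matches."""
    key = raw.replace(" ", "").lower()
    for code, patterns in SEX_RULES:
        if any(fnmatchcase(key, p) for p in patterns):
            return code
    return "NONE"

print(normalize_sex("Neutered Male"), normalize_sex("F.S."))  # MN FS
```

Storing the resulting code in the syn_pet row removes the whole CASE expression from the SELECT.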
Some of these 'composite' INDEXes may be useful:
ss: INDEX(HOSPITAL_ID, SPECIES_ID, NAME)
st: INDEX(HOSPITAL_ID, CLIENT_ID, NUMBER)
sp: INDEX(HOSPITAL_ID, PET_ID)
What table is TEL_DEFAULT in?
You may want to:
Create index on syn_client(hospital_id, name --,tel_default?)
Create index on syn_tel(hospital_id, client_id, number)
Create index on syn_pet(hospital_id, family_id, state)
Create index on syn_species(hospital_id, species_id)
Change your query to:
SELECT ...
FROM syn_client sc
INNER JOIN syn_tel st ON sc.hospital_id = st.hospital_id AND sc.client_id = st.client_id
INNER JOIN syn_pet sp ON sc.hospital_id = sp.hospital_id AND sc.family_id = sp.family_id AND sp.state = 0
INNER JOIN syn_species ss ON sp.hospital_id = ss.hospital_id AND sp.species_id = ss.species_id
WHERE st.number IN ('01099999999', '01-099-999-999', 'ALL_OTHERS_FORMAT_YOU_ACCEPTS...')
AND trim(sc.name) = 'johndoe' -- sc.name = 'johndoe' with standardized data input
AND sc.hospital_id = 'HOSPITALID999999' -- not sp.hospital_id
ORDER BY tel_default DESC;

How to write "replace" once in a SQL query that uses the LIKE operator 3 times

Here is my SQL query. I don't want to write the "replace" 3 times. How can I optimize it?
select * from table1 where col1='blah' AND
(
replace(replace(col2,'_',' '),'-',' ') LIKE ? OR
replace(replace(col2,'_',' '),'-',' ') LIKE ? OR
replace(replace(col2,'_',' '),'-',' ') LIKE ?
)
You could use a subquery:
SELECT *
FROM (
select *, replace(replace(col2,'_',' '),'-',' ') AS r
from table1
where col1='blah'
) s
WHERE r LIKE ? OR r LIKE ? OR r LIKE ?
Or LATERAL:
select *
from table1
,LATERAL(SELECT replace(replace(col2,'_',' '),'-',' ') AS r) s
where col1='blah'
and (s.r LIKE ? OR s.r LIKE ? OR s.r LIKE ?)
I prefer the second approach because it does not need an outer query. LATERAL derived tables were added in MySQL 8.0.14.
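The derived-table version can be tried directly in SQLite through Python's sqlite3 (SQLite has no LATERAL, so only the first approach is shown). The rows and patterns are made up; the point is that replace() is written once, as `r`, and referenced three times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("blah", "red_apple"), ("blah", "green-pear"), ("blah", "plum"), ("other", "red_apple")],
)

QUERY = """
SELECT col2 FROM (
    SELECT *, replace(replace(col2, '_', ' '), '-', ' ') AS r
    FROM table1
    WHERE col1 = 'blah'
) s
WHERE r LIKE ? OR r LIKE ? OR r LIKE ?
ORDER BY col2
"""
rows = [r[0] for r in conn.execute(QUERY, ("red %", "% pear", "kiwi"))]
print(rows)  # ['green-pear', 'red_apple']
```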
Related:
PostgreSQL: using a calculated column in the same query
CROSS/OUTER APPLY in MySQL
In MySQL you can use a column alias in the HAVING clause even without any aggregation:
select *, replace(replace(col2,'_',' '),'-',' ') as col2_replace
from table1
where col1='blah'
having col2_replace like ?
or col2_replace like ?
or col2_replace like ?
MySQL has a tendency to materialize subqueries -- not only is this overhead for reading and writing a temporary table but it can also affect the use of indexes in a more complicated query.
Here are three alternative solutions that do not require subqueries.
If ? does not contain wildcards, then the simplest method is:
replace(replace(col2, '_', ' '), '-', ' ') in (?, ?, ?)
If they do, then change the logic to use a single regular expression pattern:
replace(replace(col2, '_', ' '), '-', ' ') regexp ?
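You can also explicitly adjust the pattern in the query. The ^ and $ anchors make each alternative match the whole string, as LIKE does; other regular-expression metacharacters in the parameters are not escaped:
replace(replace(col2, '_', ' '), '-', ' ') regexp
concat('^(',
replace(replace(?, '_', '.'), '%', '.*'), ')$|^(',
replace(replace(?, '_', '.'), '%', '.*'), ')$|^(',
replace(replace(?, '_', '.'), '%', '.*'), ')$'
)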
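The same LIKE-to-regex translation can be done in application code before binding a single parameter. Here is a sketch in Python with hypothetical helper names. It also escapes other regex metacharacters, which the SQL version above does not. re.fullmatch plays the role of the ^ and $ anchors. LIKE's escape character is not handled. Matching is case-sensitive unless re.IGNORECASE is added, whereas MySQL's default collations compare case-insensitively.

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate one LIKE pattern: % -> .*, _ -> ., everything else escaped literally."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def combine(patterns):
    """OR several LIKE patterns into one regular expression."""
    return "|".join("(?:" + like_to_regex(p) + ")" for p in patterns)

# DOTALL lets .* cross newlines, as % does in LIKE.
combined = re.compile(combine(["red %", "% pear", "kiwi"]), re.DOTALL)
print(bool(combined.fullmatch("red apple")), bool(combined.fullmatch("kiwis")))  # True False
```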

Sort MySQL results by a column with several values separated by a period

I suspect this may have already been asked, but I'm not sure how to phrase the question so that SO search engine picks it up.
I have a column called TCID, which contains values in this format:
1.A.1.1.1
4.A.1.1.1
2.B.1.1.10
2.B.1.1.2
...
There are 5 units in this TCID, separated by periods. I want the leftmost position to take the highest priority, down to the last position, which has the lowest priority.
So it would sort like this:
1.A.1.1.1
2.B.1.1.2
2.B.1.1.10
4.A.1.1.1
Here is the query I have so far. It almost works, but the last position is not getting sorted.
SELECT *
FROM system
WHERE cluster = \"$tc_name\"
ORDER BY CAST(SUBSTR( SUBSTRING_INDEX(tcid,'.',1) , 1 ) AS UNSIGNED),
SUBSTR( SUBSTRING_INDEX(tcid,'.',2) , LENGTH( SUBSTRING_INDEX(tcid,'.',1)) + 2 ),
CAST(SUBSTR( SUBSTRING_INDEX(tcid,'.',3) , LENGTH( SUBSTRING_INDEX(tcid,'.',2)) + 2 ) AS UNSIGNED),
CAST(SUBSTR( SUBSTRING_INDEX(tcid,'.',4) , LENGTH( SUBSTRING_INDEX(tcid,'.',3)) + 2 ) AS UNSIGNED)
Can anyone help me fix this or suggest a better way to do this?
There are obviously better ways to store this information in the database, such as storing the values in separate fields. However, it's not always possible to change the code base to do such things.
But I believe you just need to add the final position to your ORDER BY for it to work as expected:
SELECT *
FROM system
WHERE cluster = "<some search term>"
ORDER BY CAST(SUBSTR( SUBSTRING_INDEX(tcid,'.',1) , 1 ) AS UNSIGNED),
SUBSTR( SUBSTRING_INDEX(tcid,'.',2) , LENGTH( SUBSTRING_INDEX(tcid,'.',1)) + 2 ),
CAST(SUBSTR( SUBSTRING_INDEX(tcid,'.',3) , LENGTH( SUBSTRING_INDEX(tcid,'.',2)) + 2 ) AS UNSIGNED),
CAST(SUBSTR( SUBSTRING_INDEX(tcid,'.',4) , LENGTH( SUBSTRING_INDEX(tcid,'.',3)) + 2 ) AS UNSIGNED),
CAST(SUBSTR( SUBSTRING_INDEX(tcid,'.',5) , LENGTH( SUBSTRING_INDEX(tcid,'.',4)) + 2 ) AS UNSIGNED);
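If the sorting can happen in application code instead, a sort key does the same job as the five ORDER BY terms. This is a sketch in Python; `tcid_key` is a hypothetical name. Numeric segments compare as integers and the letter segment compares as text, and the segments are tagged so that ints and strings are never compared with each other.

```python
def tcid_key(tcid: str):
    """Numeric segments sort as integers (so 2 < 10); other segments sort as text."""
    return tuple(
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in tcid.split(".")
    )

tcids = ["1.A.1.1.1", "4.A.1.1.1", "2.B.1.1.10", "2.B.1.1.2"]
print(sorted(tcids, key=tcid_key))
# ['1.A.1.1.1', '2.B.1.1.2', '2.B.1.1.10', '4.A.1.1.1']
```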
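Just for fun (this ignores the letter segment, and only works while the first part is a single digit and every numeric part is between 0 and 255):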
SELECT *
FROM
(
SELECT '1.A.1.1.1' x
UNION ALL
SELECT '4.A.1.1.1'
UNION ALL
SELECT '2.B.1.1.10'
UNION ALL
SELECT '2.B.1.1.2'
) a
ORDER BY
INET_ATON(CONCAT(MID(x,1,1),MID(x,4,1000)));
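The INET_ATON trick keeps the first character and everything from position 4 onwards, which turns '2.B.1.1.10' into the dotted quad '2.1.1.10', and then sorts by its 32-bit value. Here is a sketch of the same mechanics, using Python's ipaddress module as a stand-in for INET_ATON (`inet_aton_key` is a hypothetical name). It also shows the limits: the letter segment has no effect on the order, and a part above 255 makes the key invalid.

```python
import ipaddress

def inet_aton_key(tcid: str) -> int:
    """Mimic INET_ATON(CONCAT(MID(x,1,1), MID(x,4,1000))) for a TCID string."""
    dotted = tcid[0] + tcid[3:]  # '2.B.1.1.10' -> '2' + '.1.1.10' -> '2.1.1.10'
    return int(ipaddress.IPv4Address(dotted))

tcids = ["1.A.1.1.1", "4.A.1.1.1", "2.B.1.1.10", "2.B.1.1.2"]
print(sorted(tcids, key=inet_aton_key))
# ['1.A.1.1.1', '2.B.1.1.2', '2.B.1.1.10', '4.A.1.1.1']
```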

Why is this query returning part of itself in the result?

I've not come across this before. Here's the query:
$query="SELECT
CONCAT_WS(' ',
TRIM(SUBSTRING_INDEX(
SUBSTRING(document, 1, INSTR(document, 'Quickstart') - 1 ),
' ',
-8)
),'Quickstart',
TRIM(SUBSTRING_INDEX(
SUBSTRING(document, INSTR(document, 'Quickstart') + LENGTH('Quickstart') ),
' ',
5)
)
)
FROM documents WHERE MATCH(document)
AGAINST('Quickstart' IN BOOLEAN MODE )";
And here's the resulting array:
[0] => Array
(
[CONCAT_WS(' ',
TRIM(SUBSTRING_INDEX(
SUBSTRING(document, 1, INSTR(document, 'Quickstart') - 1 ),
' ',
-8)
),'Quickstart',
TRIM(SUBSTRING_INDEX(
S] =>
Quickstart for set up. 1. Register your
)
The last part it's returning seems to be correct:
Quickstart for set up. 1. Register your
But why is the query itself returned? Here's the php:
if (!$result = mysql_query($query)) send(mysql_error(),"e");
$hitArray=array();
while ($row=mysql_fetch_array($result, MYSQL_ASSOC) ) { $hitArray[]=$row; }
Thanks for taking a look.
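It's not returning itself; that's the key in the array. When a selected expression has no alias, MySQL uses the expression text as the column name, and MYSQL_ASSOC uses the column names as array keys. Assign an alias to the CONCAT_WS part: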
SELECT
CONCAT_WS( ... ) as concatenated
FROM documents WHERE MATCH(document)
AGAINST('Quickstart' IN BOOLEAN MODE )
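The same effect is easy to reproduce with Python's sqlite3 as a stand-in. Building a dict from the cursor's column names behaves like MYSQL_ASSOC. SQLite documents the name of an unaliased column as unspecified, so the first key below is only typically the expression text. With an alias, the key is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (document TEXT)")
conn.execute("INSERT INTO documents VALUES ('Quickstart for set up')")

def first_row_as_dict(sql):
    """Key the first row by column name, like mysql_fetch_array(..., MYSQL_ASSOC)."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return dict(zip(names, cur.fetchone()))

unaliased = first_row_as_dict("SELECT upper(document) FROM documents")
aliased = first_row_as_dict("SELECT upper(document) AS concatenated FROM documents")
print(unaliased)  # key is the expression text, e.g. {'upper(document)': 'QUICKSTART FOR SET UP'}
print(aliased)    # {'concatenated': 'QUICKSTART FOR SET UP'}
```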

MySQL Variable Substitution

I want to simplify the following using dynamic SQL like one could do in Transact SQL.
I want to do something like:
SET #s = replace(field_name, '_complete','')
and use #s instead of replace(field_name, '_complete','')
Please advise if possible and if so how.
My current code:
select distinct
if(instr(replace(field_name, '_complete',''),'_') <= 5
,left(replace(field_name, '_complete','')
,instr(replace(field_name, '_complete',''),'_') - 1
)
,replace(field_name, '_complete','')
) AS form_id ,replace(
if(instr(replace(field_name, '_complete',''),'_') <= 5,
mid(replace(field_name, '_complete',''),
instr(replace(field_name, '_complete',''),'_') + 1,
length(replace(field_name, '_complete','')) - instr(replace(field_name, '_complete',''),'_')
)
,replace(field_name, '_complete','')
),
'_',
' ') as form_name ,field_name from redcap_extract2use where field_name like '%_complete' order by 1;
The above would then be replaced with:
select distinct
if(instr(#s,'_') <= 5 ,left(#s,instr(#s,'_') - 1),#s ) AS form_id
,replace( if(instr(#s,'_') <= 5,
mid(#s,instr(#s,'_') + 1,length(#s) - instr(#s,'_')),#s), '_', ' ') as form_name
,field_name
from redcap_extract2use
where field_name like '%_complete'
order by 1;
and I would have an execute... to run the query
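If I'm understanding your question correctly, then you would want to use the PREPARE and EXECUTE statements. PREPARE takes the statement text as a string, so the substitution is done on that text (with REPLACE) before the statement is prepared.
For example:
SET @s = 'replace(field_name, ''_complete'', '''')';
-- #s is only a placeholder inside the template; the form_name column is omitted here for brevity
SET @tpl = 'SELECT DISTINCT IF(INSTR(#s, ''_'') <= 5, LEFT(#s, INSTR(#s, ''_'') - 1), #s) AS form_id, field_name FROM redcap_extract2use WHERE field_name LIKE ''%_complete'' ORDER BY 1';
SET @sql = REPLACE(@tpl, '#s', @s);
PREPARE mystatement FROM @sql;
EXECUTE mystatement;
DEALLOCATE PREPARE mystatement;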
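For comparison, here is the same per-row logic with the stripped name bound once to a local variable, which is what the question is after. This is a Python sketch; `split_field_name` is a hypothetical name and the field names are made up. It follows the SQL exactly, including the quirk that a name with no underscore gets an empty form_id, because INSTR returns 0, 0 <= 5, and LEFT(s, -1) is the empty string.

```python
from typing import Tuple

def split_field_name(field_name: str) -> Tuple[str, str]:
    """Mirror the form_id / form_name expressions, computing the stripped name once."""
    s = field_name.replace("_complete", "")   # the value the question wants to call #s
    pos = s.find("_") + 1                     # 1-based like INSTR; 0 when there is no '_'
    if pos <= 5:
        form_id = s[: max(pos - 1, 0)]         # LEFT(s, pos - 1); a negative length gives ''
        rest = s[pos:]                          # MID(s, pos + 1, LENGTH(s) - pos)
    else:
        form_id = rest = s
    return form_id, rest.replace("_", " ")

print(split_field_name("demo_patient_info_complete"))  # ('demo', 'patient info')
```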