I have the query outlined below. At present it takes over 8 minutes to run: the zt_Arrival_Data table holds over 8 million records, the zt_Tpl_Tuple_Stats_2 table holds only 9774 records, and the total output is just 6946 unique records.
How can I restructure this query to improve performance?
SELECT distinct b.Tuple_ID
, LTRIM(RTRIM(a.ORIGIN_CITY)) + ', ' + LTRIM(RTRIM(a.ORIGIN_STATE)) AS Origin_TX
, LTRIM(RTRIM(a.DESTINATION_CITY)) + ', ' + LTRIM(RTRIM(a.DESTINATION_STATE)) AS Destination_TX
, LTRIM(RTRIM(a.ORIGIN_CITY)) + ' - ' + LTRIM(RTRIM(a.CUSTOMER_NAME)) AS Origin_Customer_TX
, LTRIM(RTRIM(a.ORIGIN_CITY)) + ' - ' + LTRIM(RTRIM(a.DESTINATION_CITY)) AS Origin_Destination_TX
, LTRIM(RTRIM(a.CUSTOMER_NAME)) AS Customer_Name
, LTRIM(RTRIM(a.CUSTOMER_NAME)) + ', ' + LTRIM(RTRIM(a.CUSTOMER_NO)) AS Customer_TX
, CASE
WHEN LTRIM(RTRIM(a.CUSTOMER_TYPE)) = 'C' THEN 'Customer'
WHEN LTRIM(RTRIM(a.CUSTOMER_TYPE)) = 'I' THEN 'Internal'
WHEN LTRIM(RTRIM(a.CUSTOMER_TYPE)) = 'S' THEN 'Shop'
WHEN LTRIM(RTRIM(a.CUSTOMER_TYPE)) = '' THEN 'zUnknown'
ELSE LTRIM(RTRIM(a.CUSTOMER_TYPE))
END AS Customer_Type
, CASE
WHEN a.CARE_OF_NAME = '' THEN 'zUnknown'
ELSE a.CARE_OF_NAME
END AS Care_of_Name
, LTRIM(RTRIM(a.ORIGIN_CITY )) AS Origin_City
, LTRIM(RTRIM(a.ORIGIN_STATE )) AS Origin_State
, LTRIM(RTRIM(a.DESTINATION_CITY )) AS Destination_City
, LTRIM(RTRIM(a.DESTINATION_STATE )) AS Destination_State
, LTRIM(RTRIM(b.BusinessGroup_TX )) AS BusinessGroup_TX
, b.Fleet_TX AS Fleet_TX
, c.Leg_TX AS Leg_TX
FROM zt_Arrival_Data a
INNER JOIN zt_Tpl_Tuple_Stats_2 b
ON LTRIM(RTRIM(a.ORIGIN_CITY)) + ', ' + LTRIM(RTRIM(a.ORIGIN_STATE)) = b.ORIGIN_TX
AND LTRIM(RTRIM(a.DESTINATION_CITY)) + ', ' + LTRIM(RTRIM(a.DESTINATION_STATE)) = b.DESTINATION_TX
AND a.CUSTOMER_NO = b.CUSTOMER_CD
AND a.BUSINESS_GROUP = b.BusinessGroup_TX
AND a.[FLEET_ID (GEN PLANT)] = b.Fleet_TX
JOIN zt_LegMap c ON c.Leg_CD = b.Leg_CD
It is far better to trim the data at data entry, where it only has to happen once, than to do this sort of thing against a large table in a SELECT.
It is an especially bad design that you have to concatenate in order to join. You lose the ability to use indexes when you do these things. In SQL Server I would create a persisted computed column that I could join on instead; I am not sure whether MySQL has such a thing, but you should investigate doing this.
From my experience, when you need to format fields in order to join tables, you should format the columns of the table that has FEWER rows to match the table that has MORE rows, whose columns must be compared unaltered.
An idea to start with:
FROM zt_Arrival_Data a
INNER JOIN zt_Tpl_Tuple_Stats_2 b
ON a.ORIGIN_CITY = <format the b table columns to match a.ORIGIN_CITY>
AND a.DESTINATION_STATE = <format the b table columns to match a.DESTINATION_STATE>
AND a.DESTINATION_CITY = <format the b table columns to match a.DESTINATION_CITY>
AND a.ORIGIN_STATE = <format the b table columns to match a.ORIGIN_STATE>
AND a.CUSTOMER_NO = b.CUSTOMER_CD
AND a.BUSINESS_GROUP = b.BusinessGroup_TX
AND a.[FLEET_ID (GEN PLANT)] = b.Fleet_TX
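To make the placeholders above concrete, here is a minimal sketch of the "reshape the small table, leave the big table alone" idea. It runs on Python's built-in sqlite3 purely as a stand-in engine, so `substr`/`instr` replace whatever string functions your server offers. The miniature tables, their rows and the index name are made up for illustration, and it assumes the values in the large table are stored without stray whitespace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature versions of the two tables; only a few join columns are kept.
conn.executescript("""
CREATE TABLE arrival (origin_city TEXT, origin_state TEXT, customer_no TEXT);
CREATE TABLE tuple_stats (tuple_id INTEGER, origin_tx TEXT, customer_cd TEXT);
INSERT INTO arrival VALUES ('Dallas', 'TX', 'C1'), ('Austin', 'TX', 'C2'), ('Tulsa', 'OK', 'C3');
INSERT INTO tuple_stats VALUES (10, 'Dallas, TX', 'C1'), (20, 'Tulsa, OK', 'C3');
-- An index on the raw columns of the large table can only help if they are compared unaltered.
CREATE INDEX ix_arrival ON arrival (origin_city, origin_state, customer_no);
""")

# Split the small table's 'City, ST' text into its parts instead of
# concatenating and trimming the columns of the large table.
rows = conn.execute("""
SELECT b.tuple_id, a.origin_city, a.origin_state
FROM tuple_stats AS b
JOIN arrival AS a
  ON a.origin_city  = substr(b.origin_tx, 1, instr(b.origin_tx, ', ') - 1)
 AND a.origin_state = substr(b.origin_tx, instr(b.origin_tx, ', ') + 2)
 AND a.customer_no  = b.customer_cd
ORDER BY b.tuple_id
""").fetchall()
print(rows)
```

The string work now happens a few thousand times on the small table rather than millions of times on the large one, and the large table's columns stay eligible for an index.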
Related
I have a multi-join query that targets the hospital's chart database.
It takes 5 to 10 seconds or more.
This is the visual explain from MySQL Workbench.
The query is below.
select sc.CLIENT_ID as 'guardianId', sp.PET_ID as 'patientId', sp.NAME as 'petName'
, (select BW from syn_vital where HOSPITAL_ID = sp.HOSPITAL_ID and PET_ID = sp.PET_ID order by DATE, TIME desc limit 1) as 'weight'
, sp.BIRTH as 'birth', sp.RFID as 'regNo', sp.BREED as 'vName'
, (case when ss.NAME like '%fel%' or ss.NAME like '%cat%' or ss.NAME like '%pawpaw%' or ss.NAME like '%f' then '002'
when ss.NAME like '%canine%' or ss.NAME like '%dog%' or ss.NAME like '%can%' then '001' else '007' end) as 'sCode'
, (case when LOWER(replace(sp.SEX, ' ', '')) like 'male%' then 'M'
when LOWER(replace(sp.SEX, ' ', '')) like 'female%' or LOWER(replace(sp.SEX, ' ', '')) like 'fam%' or LOWER(replace(sp.SEX, ' ', '')) like 'woman%' then 'F'
when LOWER(replace(sp.SEX, ' ', '')) like 'c.m%' or LOWER(replace(sp.SEX, ' ', '')) like 'castratedmale' or LOWER(replace(sp.SEX, ' ', '')) like 'neutered%' or LOWER(replace(sp.SEX, ' ', '')) like 'neutrality%man%' or LOWER(replace(sp.SEX, ' ', '')) like 'M.N%' then 'MN'
when LOWER(replace(sp.SEX, ' ', '')) like 'woman%' or LOWER(replace(sp.SEX, ' ', '')) like 'f.s%' or LOWER(replace(sp.SEX, ' ', '')) like 'S%' or LOWER(replace(sp.SEX, ' ', '')) like 'neutrality%%' then 'FS' else 'NONE' end) as 'sex'
from syn_client sc
left join syn_tel st on sc.HOSPITAL_ID = st.HOSPITAL_ID and sc.CLIENT_ID = st.CLIENT_ID
inner join syn_pet sp on sc.HOSPITAL_ID = sp.HOSPITAL_ID and sc.FAMILY_ID = sp.FAMILY_ID and sp.STATE = 0
inner join syn_species ss on sp.HOSPITAL_ID = ss.HOSPITAL_ID and sp.SPECIES_ID = ss.SPECIES_ID
WHERE
trim(replace(st.NUMBER, '-','')) = '01099999999'
and trim(sc.NAME) = 'johndoe'
and sp.HOSPITAL_ID = 'HOSPITALID999999'
order by TEL_DEFAULT desc
I would like to know how to improve the performance of this complex query.
The most obvious performance killers in your query are the non-sargable criteria in your where clause.
trim(replace(st.NUMBER, '-','')) = '01099999999'
This cannot use any available index as you have applied a function to the column, which needs to be evaluated before the comparison can be made.
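A small sketch of the difference, using Python's built-in sqlite3 as a stand-in for MySQL (the planner and its output format are SQLite's, not MySQL's, and the index name `ix_tel_number` and the sample rows are made up). It asks the engine for its plan for the wrapped and the bare comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE syn_tel (client_id INTEGER, number TEXT);
CREATE INDEX ix_tel_number ON syn_tel (number);
INSERT INTO syn_tel VALUES (1, '01099999999'), (2, '010-8888-8888');
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Function applied to the column: every row has to be evaluated.
wrapped = plan("SELECT client_id FROM syn_tel "
               "WHERE trim(replace(number, '-', '')) = '01099999999'")
# Bare column: the index on number can be searched directly.
bare = plan("SELECT client_id FROM syn_tel WHERE number = '01099999999'")

print(wrapped)
print(bare)
```

The exact wording of the plan differs between SQLite versions, but the wrapped predicate should show up as a scan and the bare one as an index search. In MySQL you would check the same thing with EXPLAIN.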
As suggested by Pham, you could change your criterion to -
st.number IN ('01099999999', '01-099-999-999', 'ALL_OTHERS_FORMAT_YOU_ACCEPTS...')
Better still, normalize the numbers before you store them (you can always apply formatting for display purposes); that way you know how to search the stored data. Strip all the hyphens and spaces from the existing numbers -
UPDATE syn_tel
SET number = REPLACE(REPLACE(number, '-',''), ' ', '')
WHERE number LIKE '% %' OR number LIKE '%-%';
Similarly for the next criterion -
trim(sc.NAME) = 'johndoe'
The name should be trimmed before being stored in the database so there is no need to trim it when searching it. Update already stored names to trim whitespace -
UPDATE syn_client
SET NAME = TRIM(NAME)
WHERE NAME LIKE ' %' OR NAME LIKE '% ';
Changing sp.HOSPITAL_ID = 'HOSPITALID999999' to sc.HOSPITAL_ID = 'HOSPITALID999999' will allow for the use of a composite index on syn_client (HOSPITAL_ID, name) assuming you drop the TRIM() from the previously discussed criterion.
The sorting in your sub-query for weight might be wrong -
order by DATE, TIME desc limit 1
presumably you want the most recent weight -
order by `DATE` desc, `TIME` desc limit 1
/* OR */
order by CONCAT(`DATE`, ' ', `TIME`) desc limit 1
order by DATE, TIME desc -- really? That's equivalent to date ASC, time DESC. If you want "newest first", then ORDER BY date DESC, time DESC. Furthermore, it is usually bad practice and clumsy to code when you have DATE and TIME in separate columns. Is there a strong reason for storing them separately? It is reasonably easy to split them apart in a SELECT.
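A quick way to see the difference between the two orderings, sketched with Python's built-in sqlite3 (the `vital` table, its column names `d` and `t`, and the readings are invented for the example; the ORDER BY rule itself is standard SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vital (d TEXT, t TEXT, bw REAL);
INSERT INTO vital VALUES
  ('2023-01-01', '09:00', 4.0),
  ('2023-01-01', '17:00', 4.1),
  ('2023-03-15', '08:30', 4.6),
  ('2023-03-15', '19:45', 4.8);
""")

# Original ordering: d is ascending by default, only t is descending,
# so this picks the latest reading of the OLDEST day.
oldest_day = conn.execute(
    "SELECT bw FROM vital ORDER BY d, t DESC LIMIT 1").fetchone()[0]

# Both keys descending: the most recent reading overall.
newest = conn.execute(
    "SELECT bw FROM vital ORDER BY d DESC, t DESC LIMIT 1").fetchone()[0]

print(oldest_day, newest)
```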
Similarly, cleanse NUMBER and NAME when inserting.
This will make the first subquery much faster:
syn_vital needs INDEX(hospital_id, pet_id, date, time, BW)
LIKE with a leading wildcard (%) is slow, but you probably cannot avoid it in this case.
LOWER(replace(sp.SEX, ' ', '')) -- Cleanse the input during INSERT, not on output!
LOWER(...) -- With a suitable COLLATION (eg, the default), calling LOWER is unnecessary.
Some of these 'composite' INDEXes may be useful:
ss: INDEX(HOSPITAL_ID, SPECIES_ID, NAME)
st: INDEX(HOSPITAL_ID, CLIENT_ID, NUMBER)
sp: INDEX(HOSPITAL_ID, PET_ID)
What table is TEL_DEFAULT in?
You may want to:
Create index on syn_client(hospital_id, name --,tel_default?)
Create index on syn_tel(hospital_id, client_id, number)
Create index on syn_pet(hospital_id, family_id, state)
Create index on syn_species(hospital_id, species_id)
Change your query to:
SELECT ...
FROM syn_client sc
INNER JOIN syn_tel st ON sc.hospital_id = st.hospital_id AND sc.client_id = st.client_id
INNER JOIN syn_pet sp ON sc.hospital_id = sp.hospital_id AND sc.family_id = sp.family_id AND sp.state = 0
INNER JOIN syn_species ss ON sp.hospital_id = ss.hospital_id AND sp.species_id = ss.species_id
WHERE st.number IN ('01099999999', '01-099-999-999', 'ALL_OTHERS_FORMAT_YOU_ACCEPTS...')
AND trim(sc.name) = 'johndoe' -- use sc.name = 'johndoe' once the stored data is standardized
AND sc.hospital_id = 'HOSPITALID999999' -- not sp.hospital_id
ORDER BY tel_default DESC;
I've got 5 columns in table:
- id / a /b /operation/ c
where for example: 1 / 2 /3 / + / 5
My goal is to check whether the operation (+/-/*/%) is correct according to the formula (a (operation) b = c) and to show only those records where the formula is true.
Honestly I've been trying to solve that but unfortunately I really don't know how to...
For sure I have to start with operation column:
IF(operation = '+', a+b, IF(operation = '-', a-b, IF(operation = '*', a*b, IF(operation = '/', a/b, ''))))
Should I mix it with IF or CASE maybe? What's more, division by zero needs to be handled.
Use CASE:
select (c = (case when operation = '+' then a + b
when operation = '-' then a - b
when operation = '/' then a / nullif(b, 0)
. . .
end)
) as expected_equals_actual
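To show only the rows where the formula holds, the same CASE expression can go in the WHERE clause. Below is a runnable sketch on Python's built-in sqlite3. The table name `calc`, the sample rows and the `'*'` branch are my assumptions, and the columns are declared REAL so that `/` is not integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calc (id INTEGER, a REAL, b REAL, operation TEXT, c REAL);
INSERT INTO calc VALUES
  (1, 2, 3, '+', 5),   -- correct
  (2, 7, 2, '-', 4),   -- wrong: 7 - 2 is 5
  (3, 4, 5, '*', 20),  -- correct
  (4, 9, 3, '/', 3),   -- correct
  (5, 1, 0, '/', 0);   -- division by zero
""")

# NULLIF(b, 0) turns a zero divisor into NULL, so the comparison yields NULL
# (treated as not true) instead of raising an error or matching by accident.
rows = conn.execute("""
SELECT id FROM calc
WHERE c = CASE operation
            WHEN '+' THEN a + b
            WHEN '-' THEN a - b
            WHEN '*' THEN a * b
            WHEN '/' THEN a / NULLIF(b, 0)
          END
ORDER BY id
""").fetchall()
print(rows)
```

With non-integer data, an exact `=` on floating-point results may be too strict; comparing against a small tolerance is a common alternative.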
I am trying to solve an issue with joining last and first names that are identified using IDs in another table. My code produces the correct fields, but the guide_name and guest_name columns show only 0 (zero). Here is my code:
use www;
SELECT
d.destination_name,
tt.trip_type_name,
t.trip_number,
t.trip_date,
CONCAT(e.last_name + ', ' + e.first_name) AS guide_name,
CONCAT(g.last_name + ', ' + g.first_name) AS guest_name,
ex.exp_name AS guest_experience,
g.age AS guest_age,
g.weight AS guest_weight,
g.swimmer AS guest_is_swimmer,
g.mobile_phone AS guest_mobile_phone
FROM
trip_type tt
JOIN
trips t ON tt.trip_type_code = t.trip_type_code
JOIN
destination d ON t.destination_code = d.destination_code
JOIN
reservation r ON t.trip_number = r.trip_number
JOIN
guests g ON r.guest_id = g.guest_id
JOIN
experience ex ON ex.exp_code = g.exp_code
JOIN
employees e ON t.guide_employee_id = e.employee_id
ORDER BY d.destination_name , tt.trip_type_name , t.trip_date , g.last_name , e.employee_id
And here is the EER diagram:
CONCAT should just be a comma separated list of strings, so I would first change
CONCAT(e.last_name + ', ' + e.first_name)
to
CONCAT(e.last_name, ', ', e.first_name)
and see if that helps.
With + inside the CONCAT, MySQL thinks you want the values treated as numbers. Perhaps you're confusing it with JavaScript?
http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_concat
dave has the heart of this problem, but I expect you also will want to deal with nulls. I think the final code you want will look like this:
COALESCE(CONCAT(e.last_name,', ',e.first_name),e.last_name,e.first_name,'') AS guide_name,
COALESCE(CONCAT(g.last_name,', ',g.first_name),g.last_name,g.first_name,'') AS guest_name,
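Both effects (the zeros from `+` and the NULL handling) can be reproduced in a small sketch with Python's built-in sqlite3. This is a stand-in, not MySQL: SQLite's `||` operator is used here because, like MySQL's CONCAT, it returns NULL when any operand is NULL. The sample names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (last_name TEXT, first_name TEXT);
INSERT INTO employees VALUES ('Rivera', 'Ana'), ('Chen', NULL), (NULL, 'Sam');
""")

rows = conn.execute("""
SELECT last_name + ', ' + first_name,           -- arithmetic: text is coerced to numbers
       COALESCE(last_name || ', ' || first_name, -- NULL if either part is missing
                last_name, first_name, '')       -- so fall back to whichever part exists
FROM employees
ORDER BY rowid
""").fetchall()

for arithmetic, name in rows:
    print(arithmetic, name)
```

The first column comes back as 0 (or NULL where a name is missing), which matches the symptom in the question; the second column yields a usable name for every row.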
I have a table with fields: country_code, short_name, currency_unit, a2010, a2011, a2012, a2013, a2014, a2015. The a2010-a2015 fields are of type double.
How do I write a query which orders the results by the average of fields a2010-a2015, keeping in mind that these fields might contain NULL values?
I tried this code and it did not work (it returns an error, which says there is something wrong in the ORDER BY part; the error mentioned something about column names and GROUP BY). The logic is: ORDER BY ((A)/(B)), where A is the sum of the non-NULL fields and B is the count of the non-NULL fields.
Any ideas?
(if important, the code is going to be used in BigInsights environment)
SELECT country_code, short_name, currency_unit, a2010, a2011, a2012,
a2013, a2014, a2015
FROM my_schema.my_table
WHERE Indicator_Code = 'SE.PRM.TENR'
ORDER BY
(
(
Coalesce(a2010,0) + Coalesce(a2011,0) + Coalesce(a2012,0)
+Coalesce(a2013,0) + Coalesce(a2014,0) + Coalesce(a2015,0)
)
/
(
COUNT(Coalesce(a2010)) + COUNT(Coalesce(a2011)) + COUNT(Coalesce(a2012))
+ COUNT(Coalesce(a2013)) + COUNT(Coalesce(a2014)) +
COUNT(Coalesce(a2015))
)
) DESC;
Use MySQL's IFNULL:
IFNULL(expression_1,expression_2)
In your query:
IFNULL(
(
COUNT(Coalesce(a2010)) + COUNT(Coalesce(a2011)) + COUNT(Coalesce(a2012))
+ COUNT(Coalesce(a2013)) + COUNT(Coalesce(a2014)) +
COUNT(Coalesce(a2015))
),
1
)
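For what it's worth, COUNT is an aggregate over rows, which is probably why the error mentions GROUP BY. The A/B logic from the question can instead be written per row with CASE expressions. Here is a hedged sketch on Python's built-in sqlite3 (not BigInsights); the table name `indicators` and its values are made up, and only three of the year columns are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE indicators (country_code TEXT, a2010 REAL, a2011 REAL, a2012 REAL);
INSERT INTO indicators VALUES
  ('AAA', 90,   NULL, 96),
  ('BBB', NULL, NULL, NULL),
  ('CCC', 99,   97,   98),
  ('DDD', 80,   85,   NULL);
""")

# A = sum of the non-NULL fields, B = number of non-NULL fields in the same row.
# NULLIF(B, 0) avoids dividing by zero when every field is NULL.
rows = conn.execute("""
SELECT country_code,
       (COALESCE(a2010, 0) + COALESCE(a2011, 0) + COALESCE(a2012, 0))
       / NULLIF(
           (CASE WHEN a2010 IS NULL THEN 0 ELSE 1 END)
         + (CASE WHEN a2011 IS NULL THEN 0 ELSE 1 END)
         + (CASE WHEN a2012 IS NULL THEN 0 ELSE 1 END), 0) AS row_avg
FROM indicators
ORDER BY row_avg DESC
""").fetchall()

for code, avg in rows:
    print(code, avg)
```

Where rows with a NULL average end up differs between engines (SQLite puts them last for DESC), so add an explicit rule if that matters.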
I have a table, MapLocation, which has one column and two relationships to other tables, each with a field, that really need to be displayed as a single concatenated value. I was thinking this was a perfect case for a computed column, but I'm not sure how to go about it.
MapLocation               MaoNo                 Section
_____________________     _____________________ _____________________
MapNoId                   MapNoId               SectionId
SectionId                 MapNumber (int)       Section (int)
Identifier (nvarchar)
LocationName (nvarchar)
LocationName = "MapNumber - SectionNumber - Identifier"
ex: 20 - 03 - SW4
How would I write that? I haven't done much with computed columns or concatenating in SQL.
Edit:
I need an actual computed column that is automatically updated; I'm looking for the formula. Or is this more of a function/trigger? It's possible, I certainly barely know what I'm doing. The idea is that I don't want to have to make two more server calls and concatenate these values client side.
You would use something like this to get the value:
select cast(n.MapNumber as nvarchar(10)) + ' - ' -- cast the MapNumber
+ cast(s.Section as nvarchar(10)) + ' - ' -- cast the Section number
+ l.Identifier
from MapLocation l
left join MaoNo n
on l.MapNoId = n.MapNoId
left join Section s
on l.SectionId = s.SectionId
Then if you need to perform an UPDATE:
update l
set l.LocationName = (cast(n.MapNumber as nvarchar(10)) + ' - '
+ cast(s.Section as nvarchar(10)) + ' - '
+ l.Identifier)
from MapLocation l
left join MaoNo n
on l.MapNoId = n.MapNoId
left join Section s
on l.SectionId = s.SectionId
Edit #1 - you can use a TRIGGER:
CREATE TRIGGER trig_LocationName
ON MapLocation
AFTER INSERT
AS
Begin
update MapLocation
set LocationName = (cast(n.MapNumber as nvarchar(10)) + ' - '
+ cast(s.Section as nvarchar(10)) + ' - '
+ i.Identifier)
from Inserted i
left join MaoNo n
on i.MapNoId = n.MapNoId
left join Section s
on i.SectionId = s.SectionId
where MapLocation.MapNoId = i.MapNoId -- fields here to make sure you update the correct record
End