Multitable join is taking hours, is there a better way? - mysql

I have a few tables recording trades made by me and trades that happen in general, separated into tradelog and timesales respectively. tradelog and timesales each have a second table holding extra data about the trades. I have a column TimeSalesID in my tradelog so I can link up the public trades I participated in. Below is the query I'm running to try to get ALL of my trades and time-and-sales in one result. The RIGHT JOIN is taking FOREVER though; is there a better way?
SELECT SUM(tsJoin.TradeEdge)
FROM
    (SELECT * FROM tradelog tl
     JOIN slippage_processed sp ON tl.ID = sp.TradeLogID
     WHERE tl.TradeTime > '2019-01-21') AS tlJoin
RIGHT JOIN
    (SELECT * FROM timesales ts
     JOIN slippage_processed_timesales spt ON ts.ID = spt.TimeSalesID
     WHERE ts.TradeTime > '2019-01-21') AS tsJoin
    ON tlJoin.TimeSalesID = tsJoin.ID

Not using subqueries would help. MySQL tends to materialize subqueries in the FROM clause, which means that indexes on the underlying tables are lost -- greatly impeding query plans.
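You can check whether that is happening with EXPLAIN (a sketch, just prefixing the query from the question); rows labelled <derivedN> in the plan indicate materialized subqueries:
-- If the plan contains <derived2> / <derived3> table entries, MySQL has
-- materialized the subqueries into temporary tables before joining them.
EXPLAIN
SELECT SUM(tsJoin.TradeEdge)
FROM
    (SELECT * FROM tradelog tl
     JOIN slippage_processed sp ON tl.ID = sp.TradeLogID
     WHERE tl.TradeTime > '2019-01-21') AS tlJoin
RIGHT JOIN
    (SELECT * FROM timesales ts
     JOIN slippage_processed_timesales spt ON ts.ID = spt.TimeSalesID
     WHERE ts.TradeTime > '2019-01-21') AS tsJoin
    ON tlJoin.TimeSalesID = tsJoin.ID;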
I would start with:
select sum(?.TradeEdge)  -- whichever table the column comes from
from timesales ts
join slippage_processed_timesales spt
     on ts.ID = spt.TimeSalesID
left join tradelog tl
     on ?.TimeSalesID = ?.ID
left join slippage_processed sp
     on tl.ID = sp.TradeLogID and tl.TradeTime > '2019-01-21'
where ts.TradeTime > '2019-01-21';
The ? placeholders are there because I don't know which base tables the columns come from. Depending on the tables, the query might need to be adjusted a bit.
Also, I don't think the outer joins are necessary for what you want to do.
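For instance, if TradeEdge lives on slippage_processed_timesales and the TimeSalesID link column is on tradelog (assumptions based on the question's description), the filled-in version would read:
-- Assumes TradeEdge is a slippage_processed_timesales column and
-- tradelog carries the TimeSalesID link column, per the question.
select sum(spt.TradeEdge)
from timesales ts
join slippage_processed_timesales spt
     on ts.ID = spt.TimeSalesID
left join tradelog tl
     on tl.TimeSalesID = ts.ID
left join slippage_processed sp
     on tl.ID = sp.TradeLogID and tl.TradeTime > '2019-01-21'
where ts.TradeTime > '2019-01-21';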

You should simplify your SQL to plain JOINs. Based on your query above, I think this might get you a result a little faster:
SELECT SUM(TradeEdge)
FROM tradelog tl
INNER JOIN slippage_processed sp ON tl.ID = sp.TradeLogID
INNER JOIN timesales ts ON ts.ID = tl.TimeSalesID
WHERE tl.TradeTime > '2019-01-21'
If this does not work, please post the table structure for all tables involved.
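If the simplified query is still slow, supporting indexes on the join and filter columns are the usual next step. A sketch, assuming the ID columns are already primary keys (index names are illustrative):
-- Support the join into slippage_processed and the date filter on tradelog.
ALTER TABLE slippage_processed ADD INDEX idx_sp_tradelogid (TradeLogID);
ALTER TABLE tradelog ADD INDEX idx_tl_tradetime (TradeTime, TimeSalesID);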

Related

SQL query optimization for speed

I am working on optimizing the following query. I have already optimized it as far as I can from my side; can it be optimized further?
select distinct x.name ad_type
from dim_ad_type x
where exists (
    select 1
    from sum_adserver_dimensions sum
    left join dim_ad_tag_map on dim_ad_tag_map.id = sum.ad_tag_map_id and dim_ad_tag_map.client_id = sum.client_id
    left join dim_site on dim_site.id = dim_ad_tag_map.site_id
    left join dim_geo on dim_geo.id = sum.geo_id
    left join dim_region on dim_region.id = dim_geo.region_id
    left join dim_device_category on dim_device_category.id = sum.device_category_id
    left join dim_ad_unit on dim_ad_unit.id = dim_ad_tag_map.ad_unit_id
    left join dim_monetization_channel on dim_monetization_channel.id = dim_ad_tag_map.monetization_channel_id
    left join dim_os on dim_os.id = sum.os_id
    left join dim_ad_type on dim_ad_type.id = dim_ad_tag_map.ad_type_id
    left join dim_integration_type on dim_integration_type.id = dim_ad_tag_map.integration_type_id
    where sum.client_id = 50
      and dim_ad_type.id = x.id
)
order by 1
Your query, although the joins are OK, is overall bloat. You are using the dim_ad_type table on the outside just to make sure it exists on the inside as well. You have all those left joins that have NO bearing on the final outcome; why are they even there? I would simplify by reversing the logic. Tracing your inner query for the same dim_ad_type table, the direct line is: sum_adserver_dimensions -> dim_ad_tag_map -> dim_ad_type. Just run that.
select distinct dat.name Ad_Type
from sum_adserver_dimensions sum
join dim_ad_tag_map tm
     on sum.ad_tag_map_id = tm.id
    and sum.client_id = tm.client_id
join dim_ad_type dat
     on tm.ad_type_id = dat.id
where sum.client_id = 50
order by 1
Your query was running through ALL dim_ad_types, then finding all the sums just to keep those that matched. Run it directly: start with the one client, then go straight through the JOINs.
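If that is still not fast enough, a composite index on the driving table should help (a sketch; the index name is illustrative):
-- Resolves the client_id filter and the join to dim_ad_tag_map from the index alone.
ALTER TABLE sum_adserver_dimensions
    ADD INDEX idx_client_adtagmap (client_id, ad_tag_map_id);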

MySQL join performance, MongoDB, Cassandra

I have a join query which takes a lot of time to process.
SELECT
COUNT(c.id)
FROM `customers` AS `c`
LEFT JOIN `setting` AS `ssh` ON `c`.`shop_id` = `ssh`.`id`
LEFT JOIN `customer_extra` AS `cx` ON `c`.`id` = `cx`.`customer_id`
LEFT JOIN `customers_address` AS `ca` ON `ca`.`id` = `cx`.`customer_default_address_id`
LEFT JOIN `lytcustomer_tier` AS `ct` ON `cx`.`lyt_customer_tier_id` = `ct`.`id`
WHERE (c.shop_id = '12121') AND ((DATE(cx.last_email_open_date) > '2019-11-08'));
This is primarily because the table 'customers' has 2 million records.
I could go into indexing, etc. But the larger point is that these 2.5 million rows could become a billion records one day.
I'm looking for solutions which can enhance performance.
I've given thought to:
a) Horizontal scalability: distribute the MySQL table into different sections and query the counts independently (see the partitioning sketch after this list).
b) Using composite indexes.
c) My favourite one: just create a separate collection in MongoDB or Redis which only houses the count (the output of this query). Since the count is just one number, this will not require a huge size, i.e. better query performance. (The only question is how many such queries there are, because that will increase the size of the new collection.)
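For reference, option (a) can be sketched inside MySQL itself with native partitioning (an assumption: shop_id would have to be part of every unique key on the table for this to be allowed):
-- Hypothetical sketch: spread customers across 16 physical partitions by shop,
-- so a count for one shop only touches one partition.
ALTER TABLE customers
    PARTITION BY KEY(shop_id)
    PARTITIONS 16;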
Try this and see if it improves performance:
SELECT
COUNT(c.id)
FROM `customers` AS `c`
INNER JOIN `customer_extra` AS `cx` ON `c`.`id` = `cx`.`customer_id`
LEFT JOIN `setting` AS `ssh` ON `c`.`shop_id` = `ssh`.`id`
LEFT JOIN `customers_address` AS `ca` ON `ca`.`id` = `cx`.`customer_default_address_id`
LEFT JOIN `lytcustomer_tier` AS `ct` ON `cx`.`lyt_customer_tier_id` = `ct`.`id`
WHERE (c.shop_id = '12121') AND ((DATE(cx.last_email_open_date) > '2019-11-08'));
As I mentioned in the comment, since the condition AND ((DATE(cx.last_email_open_date) > '2019-11-08')) already forces the customers table into an effective INNER JOIN with the customer_extra table, you might as well change it to INNER JOIN customer_extra AS cx ON c.id = cx.customer_id and follow it with the other LEFT JOINs.
The INNER JOIN will at least restrict the initial result to customers who have a last_email_open_date value matching what has been specified.
Say COUNT(*), not COUNT(c.id)
Remove these; they slow down the query without adding anything that I can see:
LEFT JOIN `setting` AS `ssh` ON `c`.`shop_id` = `ssh`.`id`
LEFT JOIN `customers_address` AS `ca` ON `ca`.`id` = `cx`.`customer_default_address_id`
LEFT JOIN `lytcustomer_tier` AS `ct` ON `cx`.`lyt_customer_tier_id` = `ct`.`id`
DATE(...) makes that test not "sargable". The version below works for DATE or DATETIME columns and is much faster:
cx.last_email_open_date > '2019-11-08'
Consider whether that should be >= instead of >.
Need an index on shop_id. (Please provide SHOW CREATE TABLE.)
Don't use LEFT JOIN when JOIN would work equally well.
If customer_extra holds columns that should have been in customers, now is the time to move them in. That would let you use this composite index for even more performance:
INDEX(shop_id, last_email_open_date) -- in this order
With those changes, a billion rows in MySQL will probably not be a problem. If it is, there are still more fixes I can suggest.
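For concreteness, the fully rewritten count might be sketched like this, assuming (per the last point) that last_email_open_date has been moved into customers:
-- COUNT(*), sargable date test, and none of the joins that don't affect the count.
-- Assumes last_email_open_date now lives on customers so that
-- INDEX(shop_id, last_email_open_date) covers the whole WHERE clause.
SELECT COUNT(*)
FROM customers AS c
WHERE c.shop_id = '12121'
  AND c.last_email_open_date > '2019-11-08';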

Optimizing a MySQL NOT IN() query

I am trying to optimize this MySQL query. I want to get a count of the number of customers that do not have an appointment prior to the current appointment being looked at. In other words, if they have a prior appointment (which is what the NOT IN() subquery checks for), then exclude them.
However, this query is absolutely killing performance. I know that MySQL is not very good with NOT IN() queries, but I am not sure of the best way to optimize this one. It takes anywhere from 15 to 30 seconds to run. I have created indexes on CustNo, AptStatus, and AptNum.
SELECT COUNT(*) AS NumOfCustomersWithPriorAppointment
FROM transaction_log AS tl
LEFT JOIN appointment AS a ON a.AptNum = tl.AptNum
INNER JOIN customer AS c ON c.CustNo = tl.CustNo
WHERE a.AptStatus IN (2)
  AND a.CustNo NOT IN
      (SELECT a2.CustNo
       FROM appointment a2
       WHERE a2.AptDateTime < a.AptDateTime)
  AND a.AptDateTime > BEGIN_QUERY_DATE
  AND a.AptDateTime < END_QUERY_DATE
Thank you in advance.
Try the following:
SELECT COUNT(*) AS NumOfCustomersWithPriorAppointment
FROM transaction_log AS tl
INNER JOIN appointment AS a ON a.AptNum = tl.AptNum
LEFT OUTER JOIN appointment AS earlier_a
       ON earlier_a.CustNo = a.CustNo
      AND earlier_a.AptDateTime < a.AptDateTime
INNER JOIN customer AS c ON c.CustNo = tl.CustNo
WHERE a.AptStatus IN (2)
  AND earlier_a.AptNum IS NULL
  AND a.AptDateTime > BEGIN_QUERY_DATE
  AND a.AptDateTime < END_QUERY_DATE
This will benefit from a composite index on (CustNo,AptDateTime). Make it unique if that fits your business model (logically it seems like it should, but practically it may not, depending on how you handle conflicts in your application.)
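A minimal sketch of that index (the name is illustrative):
-- Supports the earlier_a self-join on (CustNo, AptDateTime);
-- change to ADD UNIQUE INDEX if one appointment per customer per instant holds.
ALTER TABLE appointment ADD INDEX idx_custno_aptdatetime (CustNo, AptDateTime);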
Provide SHOW CREATE TABLE statements for all tables if this does not create a sufficient performance improvement.

Difficulty completing a tough query script

SELECT a.acikkapali,
b.onay, b.evrakno,
b.tarih,
a.kod,
d.ad,
a.note0,
a.sf_miktar,
a.sf_sf_unit,
a.rtalepedilentestarih,
c.evrakno
FROM stok47T a
LEFT JOIN stok47e b on a.evrakno = b.evrakno
LEFT JOIN stok46t1 c on a.evrakno = c.talepno
LEFT JOIN stok00 d on a.kod = d.kod
WHERE a.tarih = '2013/04/15'
I need to add two more tables to this script in the right way. The problem is that when I map one of them, the row count increases, which is driving me crazy. I have been trying to solve this issue for a couple of days but have failed many times.
I couldn't find well-matching fields to map between stok47t and the other tables, but there are still a few columns (fields) that match in type and data.
I'd like to hear people's opinions and learn something.
The query above is a big part of my script.
If you are getting an increase in row count, chances are it is due to using LEFT JOIN. An INNER JOIN might help (see http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html for guidance):
SELECT a.acikkapali,
b.onay, b.evrakno,
b.tarih,
a.kod,
d.ad,
a.note0,
a.sf_miktar,
a.sf_sf_unit,
a.rtalepedilentestarih,
c.evrakno
FROM stok47T a
INNER JOIN stok47e b on a.evrakno = b.evrakno
INNER JOIN stok46t1 c on a.evrakno = c.talepno
INNER JOIN stok00 d on a.kod = d.kod
WHERE a.tarih = '2013/04/15'
However, without understanding your data structure, there is a chance you might lose information that you are after.
If you are getting multiple rows, it is probably due to a Cartesian product occurring in the joins. This is unrelated to the type of join (left/right/full/inner). It is based on the relationships between the tables. You have 1-N relationships along different dimensions.
Your conditions are:
FROM stok47T a
LEFT JOIN stok47e b on a.evrakno = b.evrakno
LEFT JOIN stok46t1 c on a.evrakno = c.talepno
LEFT JOIN stok00 d on a.kod = d.kod
I have no idea what these tables and fields mean. But, if you have a case where there is one row per evrakno in table stok47t, and there are two rows in table stok47e and three in table stok46t1, then you will get six rows in the output.
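A tiny illustration of that multiplication, using hypothetical row counts:
-- Suppose, for one evrakno: 1 row in stok47T, 2 matching rows in stok47e,
-- and 3 matching rows in stok46t1. The two independent 1-N joins multiply:
-- 1 x 2 x 3 = 6 rows come back for that single stok47T row.
SELECT COUNT(*)
FROM stok47T a
LEFT JOIN stok47e  b ON a.evrakno = b.evrakno   -- 2 matches
LEFT JOIN stok46t1 c ON a.evrakno = c.talepno;  -- 3 matches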
Without more information, it is impossible to tell you the best solution. One method is to summarize the tables. Another is to take the first or last corresponding row, by doing something like:
from stok47T a
left join (select s.*,
                  row_number() over (partition by evrakno order by id desc) as seqnum
           from stok47e s
          ) b
       on a.evrakno = b.evrakno and b.seqnum = 1
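Note that row_number() requires MySQL 8.0. On older versions, the same "latest row per evrakno" idea can be sketched with a correlated subquery (still assuming id is the ordering column, as above):
from stok47T a
left join stok47e b
       on b.evrakno = a.evrakno
      and b.id = (select max(s.id)        -- keep only the newest stok47e row
                  from stok47e s
                  where s.evrakno = a.evrakno)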

MySQL query takes forever

Hi, I am running a query to get some product info, but something strange is going on: the first query returns its result set fast (0.1272 s), but the second (note that I just added one column) takes forever to complete (28-35 s). Does anyone know what is happening?
Query 1:
SELECT p.partnumberp,
       p.model,
       p.descriptionsmall,
       p.brandname,
       sum(remainderint) stockint
FROM inventario_dbo.inventoryindetails ind
LEFT JOIN purchaseorders.product p ON (p.partnumberp = ind.partnumberp)
LEFT JOIN inventario_dbo.inventoryin ins ON (ins.inventoryinid = ind.inventoryinid)
GROUP BY partnumberp, projectid
Query 2:
SELECT p.partnumberp,
       p.model,
       p.descriptionsmall,
       p.brandname,
       p.descriptiondetail,
       sum(remainderint) stockint
FROM inventario_dbo.inventoryindetails inda
LEFT JOIN purchaseorders.product p ON (p.partnumberp = inda.partnumberp)
LEFT JOIN inventario_dbo.inventoryin ins ON (ins.inventoryinid = inda.inventoryinid)
GROUP BY partnumberp, projectid
You shouldn't group by some columns and then select other columns unless you use aggregate functions. Only p.partnumberp and sum(remainderint) make sense here. You're doing a huge join and select and then the results for most rows just end up getting discarded.
You can make the query much faster by doing an inner select first and then joining that to the remaining tables to get your final result for the last few columns.
The inner select should look something like this:
select p.partnumberp, projectid, sum(remainderint) stockint
from inventario_dbo.inventoryindetails ind
left join purchaseorders.product p on (p.partnumberp = ind.partnumberp)
left join inventario_dbo.inventoryin ins on (ins.inventoryinid = ind.inventoryinid)
group by partnumberp, projectid
After the join:
select T1.partnumberp, T1.projectid, p2.model, p2.descriptionsmall, p2.brandname, T1.stockint
from
(select p.partnumberp, projectid, sum(remainderint) stockint
from inventario_dbo.inventoryindetails ind
left join purchaseorders.product p on (p.partnumberp = ind.partnumberp)
left join inventario_dbo.inventoryin ins on (ins.inventoryinid = ind.inventoryinid)
group by partnumberp, projectid) T1
left join purchaseorders.product p2 on (p2.partnumberp = T1.partnumberp)
Is descriptiondetail a really large column? Based on its name, it sounds like it could hold a lot of text compared to the other fields, so maybe it just takes a lot more time to read from disk. If you could post the schema detail for the purchaseorders.product table, or the average length of that column, that would help.
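A quick way to check the column's size is a query like this (a sketch):
-- Average and maximum stored length, in bytes, of the suspect column.
SELECT AVG(LENGTH(descriptiondetail)) AS avg_len,
       MAX(LENGTH(descriptiondetail)) AS max_len
FROM purchaseorders.product;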
Otherwise, I would try running the query a few times and see if you consistently get the same timings. It could just be load on the database server at the time you got the slower result.