Mysql Join performance MongoDB, Cassandra - mysql

I have a join query which takes a lot of time to process.
SELECT
COUNT(c.id)
FROM `customers` AS `c`
LEFT JOIN `setting` AS `ssh` ON `c`.`shop_id` = `ssh`.`id`
LEFT JOIN `customer_extra` AS `cx` ON `c`.`id` = `cx`.`customer_id`
LEFT JOIN `customers_address` AS `ca` ON `ca`.`id` = `cx`.`customer_default_address_id`
LEFT JOIN `lytcustomer_tier` AS `ct` ON `cx`.`lyt_customer_tier_id` = `ct`.`id`
WHERE (c.shop_id = '12121') AND ((DATE(cx.last_email_open_date) > '2019-11-08'));
This is primarily because the table 'customers' has 2 million records.
I could go over into indexing etc. But, the larger point is, this 2.5 million could become a billion records 1 day.
I'm looking for solutions which can enhance performance.
I've given thought to
a) horizontal scalability. -: distribute the mysql table into different sections and query the count independently.
b) using composite indexes.
c) My favourite one -: Just create a seperate collection in mongodb or redis which only houses the count(output of this query) Since, the count is just 1 number. this will not require a huge size aka better query performance (Only question is, how many such queries are there, because that will increase size of the new collection)

Try this and see if it improve performance:
SELECT
COUNT(c.id)
FROM `customers` AS `c`
INNER JOIN `customer_extra` AS `cx` ON `c`.`id` = `cx`.`customer_id`
LEFT JOIN `setting` AS `ssh` ON `c`.`shop_id` = `ssh`.`id`
LEFT JOIN `customers_address` AS `ca` ON `ca`.`id` = `cx`.`customer_default_address_id`
LEFT JOIN `lytcustomer_tier` AS `ct` ON `cx`.`lyt_customer_tier_id` = `ct`.`id`
WHERE (c.shop_id = '12121') AND ((DATE(cx.last_email_open_date) > '2019-11-08'));
As I mention in the comment, since the condition AND ((DATE(cx.last_email_open_date) > '2019-11-08'));, already made customers table to INNER JOIN with customer_extra table, you might just change it to INNER JOIN customer_extra AS cx ON c.id = cx.customer_id and follow it with other LEFT JOIN.
The INNER JOIN will at least get the initial result to only return any customer who have last_email_open_date value based on what has been specified.

Say COUNT(*), not COUNT(c.id)
Remove these; they slow down the query without adding anything that I can see:
LEFT JOIN `setting` AS `ssh` ON `c`.`shop_id` = `ssh`.`id`
LEFT JOIN `customers_address` AS `ca` ON `ca`.`id` = `cx`.`customer_default_address_id`
LEFT JOIN `lytcustomer_tier` AS `ct` ON `cx`.`lyt_customer_tier_id` = `ct`.`id`
DATE(...) makes that test not "sargable". This works for DATE or DATETIME; and this is much faster:
cx.last_email_open_date > '2019-11-08'
Consider whether that should be >= instead of >.
Need an index on shop_id. (Please provide SHOW CREATE TABLE.)
Don't use LEFT JOIN when JOIN would work equally well.
If customer_extra is columns that should have been in customer, now is the time to move them in. That would let you use this composite index for even more performance:
INDEX(shop_id, last_email_open_date) -- in this order
With those changes, a billion rows in MySQL will probably not be a problem. If it is, there are still more fixes I can suggest.

Related

SQL query optimization for speed

So I was working on the problem of optimizing the following query I have already optimized this to the fullest from my side can this be further optimized?
select distinct name ad_type
from dim_ad_type x where exists ( select 1
from sum_adserver_dimensions sum
left join dim_ad_tag_map on dim_ad_tag_map.id=sum.ad_tag_map_id and dim_ad_tag_map.client_id=sum.client_id
left join dim_site on dim_site.id = dim_ad_tag_map.site_id
left join dim_geo on dim_geo.id = sum.geo_id
left join dim_region on dim_region.id=dim_geo.region_id
left join dim_device_category on dim_device_category.id=sum.device_category_id
left join dim_ad_unit on dim_ad_unit.id=dim_ad_tag_map.ad_unit_id
left join dim_monetization_channel on dim_monetization_channel.id=dim_ad_tag_map.monetization_channel_id
left join dim_os on dim_os.id = sum.os_id
left join dim_ad_type on dim_ad_type.id = dim_ad_tag_map.ad_type_id
left join dim_integration_type on dim_integration_type.id = dim_ad_tag_map.integration_type_id
where sum.client_id = 50
and dim_ad_type.id=x.id
)
order by 1
Your query although joined ok, is an overall bloat. You are using the dim_ad_type table on the outside, just to make sure it exists on the inside as well. You have all those left-joins that have NO bearing on the final outcome, why are they even there. I would simplify by reversing the logic. By tracing your INNER query for the same dim_ad_type table, I find the following is the direct line. sum -> dim_ad_tag_map -> dim_ad_type. Just run that.
select distinct
dat.name Ad_Type
from
sum_adserver_dimensions sum
join dim_ad_tag_map tm
on sum.ad_tag_map_id = tm.id
and sum.client_id = tm.client_id
join dim_ad_type dat
on tm.ad_type_id = dat.id
where
sum.client_id = 50
order by
1
Your query was running ALL dim_ad_types, then finding all the sums just to find those that matched. Run it direct starting with the one client, then direct with JOINs.

Multitable join is taking hours, is there a better way?

I have a few tables recording trades by myself and trades that happen in general, separated out into tradelog and timesales respectively. Both tradelog and timesales each have a second table holding extra data about the trades. I have a column TimeSalesID in my tradelog so I can link up public trades which I participated in. Below is the query I'm running to try and get ALL of my trades and time and sales in one result. The right join is taking FOREVER though, is there a better way?
SELECT SUM(tsJoin.TradeEdge)
FROM
(SELECT * from tradelog tl JOIN slippage_processed sp ON tl.ID = sp.TradeLogID WHERE tl.TradeTIme > '2019-01-21') AS tlJoin
RIGHT JOIN
(SELECT * from timesales ts JOIN slippage_processed_timesales spt ON ts.ID = spt.TimeSalesID WHERE ts.TradeTime > '2019-01-21') AS tsJoin
ON tlJoin.TIMESalesID = tsJoin.ID
Not using subqueries would help. MySQL tends to materialize subqueries, which means that indexes are lost -- greatly impeding query plans.
I would start with:
select sum(?.TradeEdge) -- whatever table column it comes from
from timesales ts join
slippage_processed_timesales spt
on ts.ID = spt.TimeSalesID left join
tradelog tl
on ?.TIMESalesID = ?.ID left join
slippage_processed sp
on tl.ID = sp.TradeLogID and tl.TradeTIme > '2019-01-21'
where ts.TradeTime > '2019-01-21';
The ? are because I don't know the base tables where the columns are coming from. Depending on the tables, the query might need to be adjusted a bit.
Also, I don't think the outer joins are necessary for what you want to do.
You should simplify your sql to simple JOINs. Based on your query above, I think this might get you a result a little faster
SELECT SUM(TradeEdge)
FROM tradelog tl
INNER JOIN slippage_processed sp ON tl.ID = sp.TradeLogID
INNER JOIN timesales ts ON ts.ID = sp.TimeSalesID
WHERE tl.TradeTIme > '2019-01-21'
If this does not work, please post your table structure for all tables involved

Optimizing a MySQL NOT IN( query

I am trying to optimize this MySQL query. I want to get a count of the number of customers that do not have an appointment prior to the current appointment being looked at. In other words, if they have an appointment (which is what the NOT IN( subquery is checking for), then exclude them.
However, this query is absolutely killing performance. I know that MySQL is not very good with NOT IN( queries, but I am not sure on the best way to go about optimizing this query. It takes anywhere from 15 to 30 seconds to run. I have created indexes on CustNo, AptStatus, and AptNum.
SELECT
COUNT(*) AS NumOfCustomersWithPriorAppointment,
FROM
transaction_log AS tl
LEFT JOIN
appointment AS a
ON
a.AptNum = tl.AptNum
INNER JOIN
customer AS c
ON
c.CustNo = tl.CustNo
WHERE
a.AptStatus IN (2)
AND a.CustNo NOT IN
(
SELECT
a2.CustNo
FROM
appointment a2
WHERE
a2.AptDateTime < a.AptDateTime)
AND a.AptDateTime > BEGIN_QUERY_DATE
AND a.AptDateTime < END_QUERY_DATE
Thank you in advance.
Try the following:
SELECT
COUNT(*) AS NumOfCustomersWithPriorAppointment,
FROM
transaction_log AS tl
INNER JOIN
appointment AS a
ON
a.AptNum = tl.AptNum
LEFT OUTER JOIN appointment AS earlier_a
ON earlier_a.CustNo = a.CustNo
AND earlier_a.AptDateTime < a.AptDateTime
INNER JOIN
customer AS c
ON
c.CustNo = tl.CustNo
WHERE
a.AptStatus IN (2)
AND earlier_a.AptNum IS NULL
AND a.AptDateTime > BEGIN_QUERY_DATE
AND a.AptDateTime < END_QUERY_DATE
This will benefit from a composite index on (CustNo,AptDateTime). Make it unique if that fits your business model (logically it seems like it should, but practically it may not, depending on how you handle conflicts in your application.)
Provide SHOW CREATE TABLE statements for all tables if this does not create a sufficient performance improvement.

Improve performance mySQL query

Hi all champions out there
I am far from a guru when it comes to high performance SQL queries and and wonder in anyone can help me improve the query below. It works but takes far too long, especially since I can have 100 or so entries in the IN () part.
The code is as follows, hope you can figure out the schema enough to help.
SELECT inv.amount
FROM invoice inv
WHERE inv.invoiceID IN (
SELECT childInvoiceID
FROM invoiceRelation ir
LEFT JOIN Payment pay ON pay.invoiceID = ir.parentInvoiceID
WHERE pay.paymentID IN ( 125886, 119293, 123497 ) )
Restructure your query to use a join instead of a subselect. Also, use an INNER JOIN instead of a LEFT JOIN to the Payment table. This is justified, since you have a WHERE filter that would filter rows without a match in the Payment table anyway.
SELECT inv.amount
FROM invoice inv
INNER JOIN invoiceRelation ir ON inv.incoiceID = ir.childInvoiceID
INNER JOIN Payment pay on pay.invoiceID = ir.parentInvoiceID
WHERE pay.paymentID IN (...)
One way to improve performance is to have a good index on relevant columns. In your example, an index on inv.invoiceID would probably speed up the query quite a bit.
Also on pay.paymentID
Try this and see if it helps:
ALTER TABLE invoice ADD INDEX invoiceID_idx (invoiceID);
and
ALTER TABLE Payment ADD INDEX paymendID_idx (paymentID);
I think instead of first in you can use inner join.
select inv.amount from invoice inv
inner join invoiceRelation ir on (inv.invoiceID = ir.childInvoiceID)
LEFT JOIN Payment pay ON pay.invoiceID = ir.parentInvoiceID
WHERE pay.paymentID IN (125886,119293,123497)
What if you try like this instead:
SELECT inv.amount
FROM invoice inv
inner join invoiceRelation ir on inv.invoiceID = ir.parentInvoiceID
left join Payment pay on pay.invoiceID = ir.parentInvoiceID
WHERE pay.paymentID IN ( 125886, 119293, 123497 )
(OR) better; if you sure you have a invoiceID FK in payment table then make that left join to a inner join.
In a nutshell, you should try avoiding correlated subquery at all times if you can replace that subquery to a join.

Difficult to complete a tough query script for me

SELECT a.acikkapali,
b.onay, b.evrakno,
b.tarih,
a.kod,
d.ad,
a.note0,
a.sf_miktar,
a.sf_sf_unit,
a.rtalepedilentestarih,
c.evrakno
FROM stok47T a
LEFT JOIN stok47e b on a.evrakno = b.evrakno
LEFT JOIN stok46t1 c on a.evrakno = c.talepno
LEFT JOIN stok00 d on a.kod = d.kod
WHERE a.tarih = '2013/04/15'
I need to add two my tables into that script with right way that means If I mapped one of them then the normal row count increases this makes me crazy, I have been trying to solve that issue for a couple days but I had been fail many times.
I couldn't find a good mapped fields between stok47t and the others. But there are still a few columns(fields) matches for their types and data.
I need to listen ppl opinions and learns something.
Here is a big part of my query
If you are getting increase in row count then chances are it could be due to using LEFT JOIN. An INNER join might help (see http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html guidance)
SELECT a.acikkapali,
b.onay, b.evrakno,
b.tarih,
a.kod,
d.ad,
a.note0,
a.sf_miktar,
a.sf_sf_unit,
a.rtalepedilentestarih,
c.evrakno
FROM stok47T a
INNER JOIN stok47e b on a.evrakno = b.evrakno
INNER JOIN stok46t1 c on a.evrakno = c.talepno
INNER JOIN stok00 d on a.kod = d.kod
WHERE a.tarih = '2013/04/15'
However without understanding your data structure then there is a chance you might lose the information that you are after.
If you are getting multiple rows, it is probably due to a Cartesian product occurring in the joins. This is unrelated to the type of join (left/right/full/inner). It is based on the relationships between the tables. You have 1-N relationships along different dimensions.
Your conditions are:
FROM stok47T a
LEFT JOIN stok47e b on a.evrakno = b.evrakno
LEFT JOIN stok46t1 c on a.evrakno = c.talepno
LEFT JOIN stok00 d on a.kod = d.kod
I have no idea what these tables and fields mean. But, if you have a case where there is one row per evrakno in table stok47t, and there are two rows in table stok47e and three in table stok46t1, then you will get six rows in the output.
Without more information, it is impossible to tell you the best solution. One method is to summarize the tables. Another is to take the first or last corresponding row, by doing something like:
from stok47T a left join
(select s.*, row_number() over (partition by evrakno order by id desc) as seqnum
from stok47e s
) b
on a.evrakno = b.evrakno and b.seqnum = 1