SQL query for large amount of data with many joins - mysql

I have written a SQL query for my requirement.
It works fine for me and takes 0.0006 sec to execute.
I would like to ask SQL experts: will this still work well with a large amount of data?
My query is below.
SELECT HM_customers.id,
HM_customers.username,
HM_customers.firstname,
HM_customers.lastname,
HM_customers.company,
HM_customers_address_bank.field_data
FROM HM_orders
JOIN HM_order_items
ON HM_order_items.order_id = HM_orders.id
JOIN HM_bid
ON HM_order_items.bid_id = HM_bid.bid_id
JOIN HM_customers
ON HM_bid.user_id = HM_customers.id
JOIN HM_customers_address_bank
ON HM_customers_address_bank.id = HM_customers.default_billing_address
WHERE HM_orders.id = '4'
Can anyone advise me on how to improve this query? Please point out any issues you see in it.
NOTE: This is a simple query, but I want to know whether it will still run quickly with a large amount of data.

You don't need to include the orders table:
SELECT c.id,
c.username,
c.firstname,
c.lastname,
c.company,
cb.field_data
FROM HM_order_items oi
JOIN HM_bid b
ON oi.bid_id = b.bid_id
JOIN HM_customers c
ON b.user_id = c.id
JOIN HM_customers_address_bank cb
ON cb.id = c.default_billing_address
WHERE oi.order_id = '4';
Your query can also result in duplicate rows, if a customer bids on the same items multiple times. If you put in a select distinct, then you will incur overhead of duplicate elimination. If this becomes a problem, you will probably want to restructure the query as an exists.
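A minimal sketch of the EXISTS idea mentioned above, run against an in-memory SQLite database rather than MySQL. The table names follow the question; the trimmed-down columns and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HM_customers (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE HM_bid (bid_id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE HM_order_items (id INTEGER PRIMARY KEY, order_id INTEGER, bid_id INTEGER);
INSERT INTO HM_customers VALUES (1, 'alice'), (2, 'bob');
-- alice has two bids that both ended up in order 4
INSERT INTO HM_bid VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO HM_order_items VALUES (100, 4, 10), (101, 4, 11), (102, 5, 12);
""")

# Plain joins: one output row per matching order item, so alice appears twice.
join_rows = conn.execute("""
SELECT c.id, c.username
FROM HM_order_items oi
JOIN HM_bid b ON oi.bid_id = b.bid_id
JOIN HM_customers c ON b.user_id = c.id
WHERE oi.order_id = 4
""").fetchall()

# EXISTS: each customer is tested once, so no duplicate elimination is needed.
exists_rows = conn.execute("""
SELECT c.id, c.username
FROM HM_customers c
WHERE EXISTS (
    SELECT 1
    FROM HM_bid b
    JOIN HM_order_items oi ON oi.bid_id = b.bid_id
    WHERE b.user_id = c.id AND oi.order_id = 4
)
""").fetchall()

print(join_rows)
print(exists_rows)
```

Whether this form is actually faster in MySQL depends on the data and the available indexes, so it is worth comparing both with EXPLAIN.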

There are a few points worth noting:
1) The reference to an outer table column in the WHERE clause prevents the OUTER JOIN from returning any non-matched rows, which implicitly converts the query to an INNER JOIN. This is probably a bug in the query or a misunderstanding of how OUTER JOIN works.
2) Selecting all columns with the * wildcard will cause the query's meaning and behavior to change if the table's schema changes, and might cause the query to retrieve too much data. You should only choose columns you need.
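To illustrate point 1, here is a small sketch using in-memory SQLite as a stand-in for MySQL. The customers and orders tables are hypothetical and not part of the question's schema; they only show how a WHERE filter on the outer-joined table drops the unmatched rows, while the same filter in the ON clause keeps them.

```python
import sqlite3

# Hypothetical two-table schema, invented only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
-- bob has no orders at all
INSERT INTO orders VALUES (10, 1, 'paid');
""")

# Filter in WHERE: bob's row has o.status = NULL, fails the test, and vanishes.
filtered_in_where = conn.execute("""
SELECT c.name, o.id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.status = 'paid'
ORDER BY c.id
""").fetchall()

# Filter in ON: the outer join still returns bob, padded with NULL.
filtered_in_on = conn.execute("""
SELECT c.name, o.id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'paid'
ORDER BY c.id
""").fetchall()

print(filtered_in_where)
print(filtered_in_on)
```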

Please make 'HM_customers' your driving table, as all your data is coming from this table, and change your joins like this. Hopefully this will help you :)
SELECT hmCust.id,
hmCust.username,
hmCust.firstname,
hmCust.lastname,
hmCust.company,
hmCustAdd.field_data
FROM HM_customers hmCust
INNER JOIN HM_bid hmBid
ON hmBid.user_id = hmCust.id
INNER JOIN HM_customers_address_bank hmCustAdd
ON hmCustAdd.id = hmCust.default_billing_address
INNER JOIN HM_order_items hmOrderItem
ON hmOrderItem.bid_id = hmBid.bid_id
INNER JOIN HM_orders hmOrder
ON hmOrder.id = hmOrderItem.order_id
WHERE hmOrder.id = '4'

Related

Is there any difference, performance wise, with these two queries? (Repeating the where clause inside the sub-query) MYSQL

I have a query that goes something like this.
Select *
FROM FaultCode FC
JOIN (
SELECT INNER_E.* FROM Equipment INNER_E
) E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN'
As you can see, the outer (main) query has a WHERE clause.
But there is also an inner join against a subquery, SELECT INNER_E.* FROM Equipment INNER_E. This inner join means we only retrieve the fault codes whose equipment is in the Equipment table (correct me if I'm wrong).
I am trying to optimize this query.
My question is, does it make any difference to do this
Select *
FROM FaultCode FC
JOIN (
SELECT INNER_E.* FROM Equipment INNER_E
WHERE INNER_E.id_organization = 100057 AND INNER_E.equipment_status = 'ACTIVE'
) E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN'
So repeating the where clause inside the inner sub query, to further limit it before it joins. Or does the optimizer know to do this automatically?
I tried implementing that line in code, and it seemed to only make my query slower strangely enough. Is there any way I can optimize that query above, or since it's pretty simple, is that the best it's going to get without indexes?
I tried running the Explain Select statement, but I have a hard time parsing what it's telling me. Are there any good resources I can look into to learn some tips or techniques to optimize my query?
I don't have any aggregate functions in my Select fields. So is the only real answer Indexes?

Why is the first subquery needed? Perhaps simply
Select *
FROM FaultCode FC
JOIN Equipment AS E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type
AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057
AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN';
Likely Indexes:
FC: INDEX(code_status, EquipmentID)
E: INDEX(id_organization, equipment_status, EquipmentID)
Probably unwise to do SELECT * -- It will give you all the columns of all 4 tables. (Without further details, I cannot suggest any "covering" indexes, which seems likely for AT.)
With my version of the query, your question about repeating the WHERE vanishes. With your version, it is likely to help. I don't think the Optimizer is smart enough to catch on to what you are doing.
Show us the EXPLAINs. We can help some with what the cryptic stuff is saying. (And what it is not saying.)
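As a rough sketch of what checking a plan looks like, the snippet below builds the two suggested composite indexes in an in-memory SQLite database and prints its query plan. SQLite is only a stand-in: MySQL's EXPLAIN output has a different layout, and the column types and index names here are assumptions.

```python
import sqlite3

# SQLite in memory is a stand-in for MySQL; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FaultCode (id INTEGER PRIMARY KEY, EquipmentID INTEGER, code_status TEXT);
CREATE TABLE Equipment (EquipmentID INTEGER, id_organization INTEGER, equipment_status TEXT);
-- Index names are hypothetical; the column order follows the answer above.
CREATE INDEX idx_fc_status_equip ON FaultCode (code_status, EquipmentID);
CREATE INDEX idx_eq_org_status_equip ON Equipment (id_organization, equipment_status, EquipmentID);
""")

# Ask the engine how it would run the simplified two-table join.
plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT FC.id
FROM FaultCode FC
JOIN Equipment E USING (EquipmentID)
WHERE E.id_organization = 100057
  AND E.equipment_status = 'ACTIVE'
  AND FC.code_status = 'OPEN'
""").fetchall()

# The fourth column of each plan row is the human-readable description.
plan_details = [row[3] for row in plan]
for detail in plan_details:
    print(detail)
```

In MySQL the equivalent check is to prefix the real query with EXPLAIN and look at which key, if any, is chosen for each table.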
"the best it's going to get without indexes" -- Are you saying you have no indexes??! Not even a PRIMARY KEY for each table? "So is the only real answer Indexes?" Every time you write a query against a non-tiny table, you should ask "do the table(s) have adequate indexes for this query?"

Sql Join taking a lot of time

I am trying to execute this query, but it is taking more than 5 hours even though the database size is just 20 MB. This is my code. Here I am joining 11 tables on reg_id. I need all columns with distinct values. Please guide me on how to rearrange the query.
SELECT *
FROM degree
JOIN diploma
ON degree.reg_id = diploma.reg_id
JOIN further_studies
ON diploma.reg_id = further_studies.reg_id
JOIN iti
ON further_studies.reg_id = iti.reg_id
JOIN personal_info
ON iti.reg_id = personal_info.reg_id
JOIN postgraduation
ON personal_info.reg_id = postgraduation.reg_id
JOIN puc
ON postgraduation.reg_id = puc.reg_id
JOIN skills
ON puc.reg_id = skills.reg_id
JOIN sslc
ON skills.reg_id = sslc.reg_id
JOIN license
ON sslc.reg_id = license.reg_id
JOIN passport
ON license.reg_id = passport.reg_id
GROUP BY fullname
Please tell me if I made any mistakes.

This is a bit long for a comment.
The first problem with your query is that you are using select * with group by fullname. You have zillions of columns in the select that are not in the group by. Unless you really, really, really know what you are doing (which I doubt), this is the wrong way to write a query.
Your performance problem is undoubtedly due to cartesian products and lack of indexes. You are joining across different dimensions -- such as skills and degrees. The result is a product of all the possibilities. For some people, the data size can grow and grow and grow.
And then, the question is: do you have indexes on the keys used in the joins? For performance, you generally want such indexes.
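The row explosion described above can be shown with a small sketch. It uses in-memory SQLite instead of MySQL, only three of the eleven tables, and invented columns and rows, so treat the schema as an assumption.

```python
import sqlite3

# Trimmed-down, invented versions of three of the tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personal_info (reg_id INTEGER, fullname TEXT);
CREATE TABLE skills (reg_id INTEGER, skill TEXT);
CREATE TABLE degree (reg_id INTEGER, title TEXT);
INSERT INTO personal_info VALUES (1, 'A. Kumar');
INSERT INTO skills VALUES (1, 'sql'), (1, 'java'), (1, 'php');
INSERT INTO degree VALUES (1, 'BSc'), (1, 'BEd');
""")

# skills and degree are independent of each other, so joining both on reg_id
# pairs every skill with every degree: 3 skills x 2 degrees = 6 rows.
joined_count = conn.execute("""
SELECT COUNT(*)
FROM personal_info p
JOIN skills s ON s.reg_id = p.reg_id
JOIN degree d ON d.reg_id = p.reg_id
""").fetchone()[0]

print(joined_count)
```

With eleven tables, each extra table that holds several rows per reg_id multiplies the count again, which is how a 20 MB database can produce an enormous intermediate result.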

I think the problem is in the query itself. First check whether you really need GROUP BY fullname, and try listing specific column names instead of *.

SQL join query - join instead of inline query

I would like to use join instead of inline queries but the one I was able to make is giving wrong values.
Please check this link - http://www.sqlfiddle.com/#!2/57cad/9
It has 2 individual queries which give correct values and a join query which gives an incorrect result.
Can someone please help...

Answer shown here: SQLFiddle
Your queries could be improved upon in a few areas which make their JOINing a little more obvious.
In your first query, a better version has the GROUP BY clause's columns listed in the SELECT clause, and your HAVING clause (while it works) becomes the WHERE clause. (IMO, the best practice is to use only aggregate functions in the HAVING clause.)
SELECT usercode, ROUND(coalesce(sum(paymentamount)*0.99,0),2) AS payment
FROM accountpayments
WHERE usercode = 21
GROUP BY usercode;
Your second query can be rewritten as a JOIN (vs. the subquery)
SELECT campaigns.usercode, ROUND(coalesce(sum(lmc_cds.total_spending),0),2) AS total_spending
FROM logsmaincontrols_campaigns_daily_stats AS lmc_cds
JOIN campaigns
ON campaigns.campcode = lmc_cds.campcode
WHERE campaigns.usercode = 21;
Since the queries don't share any tables, I decided to JOIN the queries to each other as derived tables using the usercode as the JOINing column.
SELECT t1.usercode, t1.payment, t2.total_spending
FROM (SELECT usercode, ROUND(coalesce(sum(paymentamount)*0.99,0),2) AS payment
FROM accountpayments
WHERE usercode = 21
GROUP BY usercode) AS t1
JOIN (SELECT campaigns.usercode, ROUND(coalesce(sum(lmc_cds.total_spending),0),2) AS total_spending
FROM logsmaincontrols_campaigns_daily_stats AS lmc_cds
JOIN campaigns
ON campaigns.campcode = lmc_cds.campcode
WHERE campaigns.usercode = 21) AS t2
ON t1.usercode = t2.usercode;
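A sketch of why aggregating each side first matters, using in-memory SQLite with invented rows. The original join from the SQLFiddle is not reproduced in the question, so the "naive" query below is only one plausible way a direct join can go wrong; the ROUND and 0.99 factor are left out to keep the arithmetic easy to follow.

```python
import sqlite3

# Table names follow the answer; columns are trimmed and the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accountpayments (usercode INTEGER, paymentamount INTEGER);
CREATE TABLE campaigns (campcode INTEGER, usercode INTEGER);
CREATE TABLE logsmaincontrols_campaigns_daily_stats (campcode INTEGER, total_spending INTEGER);
INSERT INTO accountpayments VALUES (21, 100), (21, 50);
INSERT INTO campaigns VALUES (1, 21), (2, 21);
INSERT INTO logsmaincontrols_campaigns_daily_stats VALUES (1, 10), (1, 20), (2, 5);
""")

# Joining the raw rows first pairs every payment with every stats row,
# so both sums are inflated.
naive = conn.execute("""
SELECT SUM(ap.paymentamount), SUM(s.total_spending)
FROM accountpayments ap
JOIN campaigns c ON c.usercode = ap.usercode
JOIN logsmaincontrols_campaigns_daily_stats s ON s.campcode = c.campcode
WHERE ap.usercode = 21
""").fetchone()

# Aggregating each side in its own derived table, then joining one row to one row.
derived = conn.execute("""
SELECT t1.payment, t2.total_spending
FROM (SELECT usercode, SUM(paymentamount) AS payment
      FROM accountpayments
      WHERE usercode = 21
      GROUP BY usercode) AS t1
JOIN (SELECT c.usercode, SUM(s.total_spending) AS total_spending
      FROM logsmaincontrols_campaigns_daily_stats s
      JOIN campaigns c ON c.campcode = s.campcode
      WHERE c.usercode = 21
      GROUP BY c.usercode) AS t2
  ON t1.usercode = t2.usercode
""").fetchone()

print(naive)
print(derived)
```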

sql/Mysql: What is the best method to complete this query

I had most of this query worked out except for two large things. One: as soon as I add the fourth table [departments_tbl] to the query, I get about 8K rows returned when I should only have about 100.
See the attached schema; note the checkmarks, these are the fields I want returned.
This may not help, but here is one of the queries that I almost had working, until [department_tbl] was added to the mix:
SELECT _n_cust_entity_storeid_15.entity_id,
_n_cust_entity_storeid_15.email,
customer_group.customer_group_code,
departments.`name`,
departments.manager,
_n_cust_rpt_copy.first_name,
_n_cust_rpt_copy.last_name,
_n_cust_rpt_copy.last_login_date,
_n_cust_rpt_copy.billing_address,
_n_cust_rpt_copy.billing_city,
_n_cust_rpt_copy.billing_state,
_n_cust_rpt_copy.billing_zip
FROM _n_cust_entity_storeid_15 INNER JOIN customer_group ON _n_cust_entity_storeid_15.group_id = customer_group.customer_group_id
INNER JOIN departments ON _n_cust_entity_storeid_15.store_id = departments.store_id,
_n_cust_rpt_copy
ORDER BY _n_cust_rpt_copy.last_name ASC
I've tried subqueries, joins, but just can't get it to work.
Any help would be greatly appreciated.
Schema: please note that the entity_id and cust_id fields would be the links between the _n_cust_rpt_copy table and the _n_cust_entity_storeid_15 table.

You have a cross join to the last table, _n_cust_rpt_copy:
SELECT _n_cust_entity_storeid_15.entity_id,
_n_cust_entity_storeid_15.email,
customer_group.customer_group_code,
departments.`name`,
departments.manager,
_n_cust_rpt_copy.first_name,
_n_cust_rpt_copy.last_name,
_n_cust_rpt_copy.last_login_date,
_n_cust_rpt_copy.billing_address,
_n_cust_rpt_copy.billing_city,
_n_cust_rpt_copy.billing_state,
_n_cust_rpt_copy.billing_zip
FROM _n_cust_entity_storeid_15 INNER JOIN
customer_group
ON _n_cust_entity_storeid_15.group_id = customer_group.customer_group_id INNER JOIN
departments
ON _n_cust_entity_storeid_15.store_id = departments.store_id join
_n_cust_rpt_copy
ON ???
ORDER BY _n_cust_rpt_copy.last_name ASC;
It is not obvious to me what the right join conditions are, but there must be something.
I would guess that it at least includes the department:
_n_cust_rpt_copy
ON _n_cust_rpt_copy.department_name = departments.name and
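To make the cross join visible, here is a small sketch in in-memory SQLite with invented rows. The join condition cust_id = entity_id is taken from the asker's note about how the two tables link; it is an assumption and may not be the complete condition.

```python
import sqlite3

# Two of the tables from the question, reduced to a couple of columns each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE _n_cust_entity_storeid_15 (entity_id INTEGER, email TEXT);
CREATE TABLE _n_cust_rpt_copy (cust_id INTEGER, last_name TEXT);
INSERT INTO _n_cust_entity_storeid_15 VALUES (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
INSERT INTO _n_cust_rpt_copy VALUES (1, 'Adams'), (2, 'Brown'), (3, 'Clark');
""")

# A comma in FROM with no condition pairs every row with every row: 3 x 3.
cross_count = conn.execute("""
SELECT COUNT(*)
FROM _n_cust_entity_storeid_15, _n_cust_rpt_copy
""").fetchone()[0]

# With an explicit condition (assumed here), each customer matches once.
joined_count = conn.execute("""
SELECT COUNT(*)
FROM _n_cust_entity_storeid_15 e
JOIN _n_cust_rpt_copy r ON r.cust_id = e.entity_id
""").fetchone()[0]

print(cross_count, joined_count)
```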

SELECT DISTINCT statement in MySQL is taking 10 minutes

I'm reasonably new to MySQL and I'm trying to select a distinct set of rows using this statement:
SELECT DISTINCT sp.atcoCode, sp.name, sp.longitude, sp.latitude
FROM `transportdata`.stoppoints as sp
INNER JOIN `vehicledata`.gtfsstop_times as st ON sp.atcoCode = st.fk_atco_code
INNER JOIN `vehicledata`.gtfstrips as trip ON st.trip_id = trip.trip_id
INNER JOIN `vehicledata`.gtfsroutes as route ON trip.route_id = route.route_id
INNER JOIN `vehicledata`.gtfsagencys as agency ON route.agency_id = agency.agency_id
WHERE agency.agency_id IN (1,2,3,4);
However, the select statement is taking around 10 minutes, so something is clearly afoot.
One significant factor is that the table gtfsstop_times is huge. (~250 million records)
Indexes seem to be set up properly; all the above joins are using indexed columns. Table sizes are, roughly:
gtfsagencys - 4 rows
gtfsroutes - 56,000 rows
gtfstrips - 5,500,000 rows
gtfsstop_times - 250,000,000 rows
`transportdata`.stoppoints - 400,000 rows
The server has 22 GB of memory, I've set the InnoDB buffer pool to 8 GB, and I'm using MySQL 5.6.
Can anybody see a way of making this run faster? Or indeed, at all!
Does it matter that the stoppoints table is in a different schema?
EDIT:
EXPLAIN SELECT... returns this:

It looks like you are trying to find a collection of stop points, based on certain criteria. And, you're using SELECT DISTINCT to avoid duplicate stop points. Is that right?
It looks like atcoCode is a unique key for your stoppoints table. Is that right?
If so, try this:
SELECT sp.name, sp.longitude, sp.latitude, sp.atcoCode
FROM `transportdata`.stoppoints AS sp
JOIN (
SELECT DISTINCT st.fk_atco_code AS atcoCode
FROM `vehicledata`.gtfsroutes AS route
JOIN `vehicledata`.gtfstrips AS trip ON trip.route_id = route.route_id
JOIN `vehicledata`.gtfsstop_times AS st ON trip.trip_id = st.trip_id
WHERE route.agency_id BETWEEN 1 AND 4
) ids ON sp.atcoCode = ids.atcoCode
This does a few things: It eliminates a table (agency) which you don't seem to need. It changes the search on agency_id from IN(a,b,c) to a range search, which may or may not help. And finally it relocates the DISTINCT processing from a situation where it has to handle a whole ton of data to a subquery situation where it only has to handle the ID values.
(JOIN and INNER JOIN are the same. I used JOIN to make the query a bit easier to read.)
This should speed you up a bit. But, it has to be said, a quarter gigarow table is a big table.
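A quick sanity check of the rewrite on toy data, using in-memory SQLite rather than MySQL. The schema prefixes are dropped, the columns are trimmed, and the rows are invented, so this only shows that the two query shapes return the same stop points; it says nothing about timing on 250 million rows.

```python
import sqlite3

# Toy versions of the tables; atcoCode is assumed to be unique, as in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stoppoints (atcoCode TEXT PRIMARY KEY, name TEXT);
CREATE TABLE gtfsroutes (route_id INTEGER PRIMARY KEY, agency_id INTEGER);
CREATE TABLE gtfstrips (trip_id INTEGER PRIMARY KEY, route_id INTEGER);
CREATE TABLE gtfsstop_times (trip_id INTEGER, fk_atco_code TEXT);
INSERT INTO stoppoints VALUES ('A', 'Alpha'), ('B', 'Beta'), ('C', 'Gamma');
INSERT INTO gtfsroutes VALUES (1, 1), (2, 9);
INSERT INTO gtfstrips VALUES (100, 1), (101, 1), (200, 2);
INSERT INTO gtfsstop_times VALUES (100, 'A'), (100, 'B'), (101, 'A'), (200, 'C');
""")

# Original shape: DISTINCT over the full joined rows.
original = conn.execute("""
SELECT DISTINCT sp.atcoCode, sp.name
FROM stoppoints sp
JOIN gtfsstop_times st ON sp.atcoCode = st.fk_atco_code
JOIN gtfstrips trip ON st.trip_id = trip.trip_id
JOIN gtfsroutes route ON trip.route_id = route.route_id
WHERE route.agency_id IN (1, 2, 3, 4)
ORDER BY sp.atcoCode
""").fetchall()

# Rewritten shape: DISTINCT only over the stop codes, then one lookup per code.
rewritten = conn.execute("""
SELECT sp.atcoCode, sp.name
FROM stoppoints sp
JOIN (
    SELECT DISTINCT st.fk_atco_code AS atcoCode
    FROM gtfsroutes route
    JOIN gtfstrips trip ON trip.route_id = route.route_id
    JOIN gtfsstop_times st ON trip.trip_id = st.trip_id
    WHERE route.agency_id BETWEEN 1 AND 4
) ids ON sp.atcoCode = ids.atcoCode
ORDER BY sp.atcoCode
""").fetchall()

print(original)
print(rewritten)
```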

With 250M records, I would shard the gtfsstop_times table on one column. Then each shard can be joined in a separate query, and those queries can run in parallel in separate threads; you'll only need to merge the result sets.

The trick is to reduce how many rows of gtfsstop_times SQL has to evaluate. In this case SQL first evaluates every row in the inner join of gtfsstop_times and transportdata.stoppoints, right? How many rows does transportdata.stoppoints have? Then SQL evaluates the WHERE clause, then it evaluates DISTINCT. How does it do DISTINCT? By looking at every single row multiple times to determine if there are other rows like it. That would take forever, right?
However, GROUP BY quickly squishes all the matching rows together, without evaluating each one. I normally use joins to quickly reduce the number of rows the query needs to evaluate, then I look at my grouping.
In this case you want to replace DISTINCT with grouping.
Try this:
SELECT sp.name, sp.longitude, sp.latitude, sp.atcoCode
FROM `transportdata`.stoppoints as sp
INNER JOIN `vehicledata`.gtfsstop_times as st ON sp.atcoCode = st.fk_atco_code
INNER JOIN `vehicledata`.gtfstrips as trip ON st.trip_id = trip.trip_id
INNER JOIN `vehicledata`.gtfsroutes as route ON trip.route_id = route.route_id
INNER JOIN `vehicledata`.gtfsagencys as agency ON route.agency_id = agency.agency_id
WHERE agency.agency_id IN (1,2,3,4)
GROUP BY sp.name
, sp.longitude
, sp.latitude
, sp.atcoCode

There are other valuable answers to your question; mine is an addition to them. I assume sp.atcoCode and st.fk_atco_code are indexed columns in their tables.
If you can validate and make sure that agency ids in the WHERE clause are valid, you can eliminate joining `vehicledata.gtfsagencys` in the JOINS as you are not fetching any records from the table.
SELECT DISTINCT sp.atcoCode, sp.name, sp.longitude, sp.latitude
FROM `transportdata`.stoppoints as sp
INNER JOIN `vehicledata`.gtfsstop_times as st ON sp.atcoCode = st.fk_atco_code
INNER JOIN `vehicledata`.gtfstrips as trip ON st.trip_id = trip.trip_id
INNER JOIN `vehicledata`.gtfsroutes as route ON trip.route_id = route.route_id
WHERE route.agency_id IN (1,2,3,4);
