sql/Mysql: What is the best method to complete this query - mysql

I had most of this query worked about except two things, large things, one, as soon as I add the forth table [departments_tbl]into the query, I get about 8K rows returned when I should only have about 100.
See the attached schema, no the checkmarks, these are the fields I want returned.
This won't help, but here is just one of the queries that I almost had working, until the [department_tbl was added to the mix]
SELECT _n_cust_entity_storeid_15.entity_id,
_n_cust_entity_storeid_15.email,
customer_group.customer_group_code,
departments.`name`,
departments.manager,
_n_cust_rpt_copy.first_name,
_n_cust_rpt_copy.last_name,
_n_cust_rpt_copy.last_login_date,
_n_cust_rpt_copy.billing_address,
_n_cust_rpt_copy.billing_city,
_n_cust_rpt_copy.billing_state,
_n_cust_rpt_copy.billing_zip
FROM _n_cust_entity_storeid_15 INNER JOIN customer_group ON _n_cust_entity_storeid_15.group_id = customer_group.customer_group_id
INNER JOIN departments ON _n_cust_entity_storeid_15.store_id = departments.store_id,
_n_cust_rpt_copy
ORDER BY _n_cust_rpt_copy.last_name ASC
I've tried subqueries, joins, but just can't get it to work.
Any help would be greatly appreciated.
Schema Please note that entity_id and cust_id fields would the be links between the _ncust_rpt_copy table and the _n_cust_entity_storeid_15 tbl

You have a cross join to the last table, _n_cust_rpt_copy:
SELECT _n_cust_entity_storeid_15.entity_id,
_n_cust_entity_storeid_15.email,
customer_group.customer_group_code,
departments.`name`,
departments.manager,
_n_cust_rpt_copy.first_name,
_n_cust_rpt_copy.last_name,
_n_cust_rpt_copy.last_login_date,
_n_cust_rpt_copy.billing_address,
_n_cust_rpt_copy.billing_city,
_n_cust_rpt_copy.billing_state,
_n_cust_rpt_copy.billing_zip
FROM _n_cust_entity_storeid_15 INNER JOIN
customer_group
ON _n_cust_entity_storeid_15.group_id = customer_group.customer_group_id INNER JOIN
departments
ON _n_cust_entity_storeid_15.store_id = departments.store_id join
_n_cust_rpt_copy
ON ???
ORDER BY _n_cust_rpt_copy.last_name ASC;
It is not obvious to me what the right join conditions are, but there must be something.
I might guess they it at least includes the department:
_n_cust_rpt_copy
ON _n_cust_rpt_copy.department_name = departments.name and

Related

PHP Queries not working

This is working:
SELECT *
FROM ((((defect
JOIN project_testcase ON
defect.Test_Id=project_testcase.Test_Id)
JOIN testcase ON
defect.Test_Id=testcase.Test_Id)
JOIN project_pm ON
project_testcase.Project_Id=project_pm.Project_Id)
JOIN employee ON
employee.Emp_id=project_pm.Emp_id)
However, this does not work:
SELECT *
FROM ((((defect
JOIN project_testcase ON
defect.Test_Id=project_testcase.Test_Id)
JOIN testcase ON
defect.Test_Id=testcase.Test_Id)
JOIN project_pm ON
project_testcase.Project_Id=project_pm.Project_Id)
JOIN employee ON
employee.Emp_id=project_pm.Emp_id)
WHERE Project_Id LIKE '%$categ%'
As I have used JOIN tables and joined using Project_Id. Is that the error?
The first thing I would do to troubleshoot this is to paste it into an SQL formatter. This will help find syntax errors and could help you see a logic error. I would recommend freeformatter.com.
Second you can get rid of the parenthesis.
The Fix
You need to specify what table to get the Project_Id in the WHERE because it is in multiple tables, but for clarity I would always specify what table it comes from.
select
*
from
defect
join
project_testcase
on defect.Test_Id=project_testcase.Test_Id
join
testcase
on defect.Test_Id=testcase.Test_Id
join
project_pm
on project_testcase.Project_Id=project_pm.Project_Id
join
employee
on employee.Emp_id=project_pm.Emp_id
where
project_testcase.Project_Id like '%$categ%'
Project_Id in the where clause is ambiguous, you need to define it as you have done on your joins ie WHERE project_pm.Project_Id like '%$categ%'
In the where clause of the 2nd query you need to tell the rdbms exactly which Project_Id field you want to filter on, since multiple tables contain the Project_Id field, e.g. project_pm.Project_Id.
Starting by removing all those unnecessary parenthesis, formatting and aliases reveals a fairly simple query underneath.
select * --This should really be the columns you actually need instead of all of them
from defect d
join project_testcase ptc on d.Test_Id = ptc.Test_Id
join testcase t on d.Test_Id = t.Test_Id
join project_pm p on ptc.Project_Id = p.Project_Id
join employee e on e.Emp_id = p.Emp_id
where p.Project_Id like '%$categ%'
Of course this begs the question of why do you have text like that in a column named Project_Id? That does not look like a project id to me. And since you are using a leading wildcard you have a nonSARGable query so you have eliminated the ability of index seeks on that column.

Difference between FROM and JOIN tables

I'm working through the JOIN tutorial on SQL zoo.
Let's say I'm about to execute the code below:
SELECT a.stadium, COUNT(g.matchid)
FROM game a
JOIN goal g
ON g.matchid = a.id
GROUP BY a.stadium
As it happens, it produces the same output as the code below:
SELECT a.stadium, COUNT(g.matchid)
FROM goal g
JOIN game a
ON g.matchid = a.id
GROUP BY a.stadium
So then, when does it matter which table you assign at FROM and which one you assign at JOIN?
When you are using an INNER JOIN like you are here, the order doesn't matter. That is because you are connecting two tables on a common index, so the order in which you use them is up to you. You should pick an order that is most logical to you, and easiest to read. A habit of mine is to put the table I'm selecting from first. In your case, you're selecting information about a stadium, which comes from the game table, so my preference would be to put that first.
In other joins, however, such as LEFT OUTER JOIN and RIGHT OUTER JOIN the order will matter. That is because these joins will select all rows from one table. Consider for example I have a table for Students and a table for Projects. They can exist independently, some students may have an associated project, but not all will.
If I want to get all students and project information while still seeing students without projects, I need a LEFT JOIN:
SELECT s.name, p.project
FROM student s
LEFT JOIN project p ON p.student_id = s.id;
Note here, that the LEFT JOIN refers to the table in the FROM clause, so that means ALL of students were being selected. This also means that p.project will be null for some rows. Order matters here.
If I took the same concept with a RIGHT JOIN, it will select all rows from the table in the join clause. So if I changed the query to this:
SELECT s.name, p.project
FROM student s
RIGHT JOIN project p ON p.student_id = s.id;
This will return all rows from the project table, regardless of whether or not it has a match for students. This means that in some rows, s.name will be null. Similar to the first example, because I've made project the outer joined table, p.project will never be null (assuming it isn't in the original table). In the first example, s.name should never be null.
In the case of outer joins, order will matter. Thankfully, you can think intuitively with LEFT and RIGHT joins. A left join will return all rows in the table to the left of that statement, while a right join returns all rows from the right of that statement. Take this as a rule of thumb, but be careful. You might want to develop a pattern to be consistent with yourself, as I mentioned earlier, so these queries are easier for you to understand later on.
When you only JOIN 2 tables, usually the order does not matter: MySQL scans the tables in the optimal order.
When you scan more than 2 tables, the order could matter:
SELECT ...
FROM a
JOIN b ON ...
JOIN c ON ...
Also, MySQL tries to scan the tables in the fastest way (large tables first). But if a join is slow, it is possible that MySQL is scanning them in a non-optimal order. You can verify this with EXPLAIN. In this case, you can force the join order by adding the STRAIGHT_JOIN keyword.
The order doesn't always matter, I usually just order it in a way that makes sense to someone reading your query.
Sometime order does matter. Try it with LEFT JOIN and RIGHT JOIN.
In this instance you are using an INNER JOIN, if you're expecting a match on a common ID or foreign key, it probably doesn't matter too much.
You would however need to specify the tables the correct way round if you were performing an OUTER JOIN, as not all records in this type of join are guaranteed to match via the same field.
yes, it will matter when you will user another join LEFT JOIN, RIGHT JOIN
currently You are using NATURAL JOIN that is return all tables related data, if JOIN table row not match then it will exclude row from result
If you use LEFT / RIGHT {OUTER} join then result will be different, follow this link for more detail

SQL query for large amount of data with many joins

I have written a sql query for my requirement.
This is working fine for me. This is taking 0.0006 sec to execute.
I want to know from sql experts "will this work fine with large amount of data?".
I have written my query below.
SELECT HM_customers.id,
HM_customers.username,
HM_customers.firstname,
HM_customers.lastname,
HM_customers.company,
HM_customers_address_bank.field_data
FROM HM_orders
JOIN HM_order_items
ON HM_order_items.order_id = HM_orders.id
JOIN HM_bid
ON HM_order_items.bid_id = HM_bid.bid_id
JOIN HM_customers
ON HM_bid.user_id = HM_customers.id
JOIN HM_customers_address_bank
ON HM_customers_address_bank.id = HM_customers.default_billing_address
WHERE HM_orders.id = '4'
Any expert can advice me or let me know how can I improve this query. Please suggest me if any issue in this query.
NOTE:- This is a simple query. But I want to know, will this work with large amount of data with less time
You don't need to include the orders table:
SELECT c.id,
c.username,
c.firstname,
c.lastname,
c.company,
cb.field_data
FROM HM_order_items oi
JOIN HM_bid b
ON oi.bid_id = b.bid_id
JOIN HM_customers c
ON b.user_id = c.id
JOIN HM_customers_address_bank cb
ON cb.id = c.default_billing_address
WHERE oi.order_id = '4';
Your query can also result in duplicate rows, if a customer bids on the same items multiple times. If you put in a select distinct, then you will incur overhead of duplicate elimination. If this becomes a problem, you will probably want to restructure the query as an exists.
There are few points worth noting
1) The reference to an outer table column in the WHERE clause prevents the OUTER JOIN from returning any non-matched rows, which implicitly converts the query to an INNER JOIN. This is probably a bug in the query or a misunderstanding of how OUTER JOIN works.
2) Selecting all columns with the * wildcard will cause the query's meaning and behavior to change if the table's schema changes, and might cause the query to retrieve too much data. You should only choose columns you need.
Please make your driven table to 'HM_customers' as all your data is coming from this table and change your join like this way, hopefully this will help you :)
SELECT hmCust.id,
hmCust.username,
hmCust.firstname,
hmCust.lastname,
hmCust.company,
hmCustAdd.field_data
FROM HM_customers hmCust
INNER JOIN HM_bid hmBid
ON hmBid.user_id = hmCust.id
INNER JOIN HM_customers_address_bank hmCustAdd
ON hmCustAdd.id = hmCust.default_billing_address
INNER JOIN HM_order_items hmOrderItem
ON hmOrderItem.order_id = hmBid.bid_id
INNER JOIN HM_orders hmOrder
ON hmOrder.id = hmOrderItem.order_id
WHERE hmOrder.id = '4'

SELECT DISTINCT statement in MySQL is taking 10 minutes

I'm reasonably new to MySQL and I'm trying to select a distinct set of rows using this statement:
SELECT DISTINCT sp.atcoCode, sp.name, sp.longitude, sp.latitude
FROM `transportdata`.stoppoints as sp
INNER JOIN `vehicledata`.gtfsstop_times as st ON sp.atcoCode = st.fk_atco_code
INNER JOIN `vehicledata`.gtfstrips as trip ON st.trip_id = trip.trip_id
INNER JOIN `vehicledata`.gtfsroutes as route ON trip.route_id = route.route_id
INNER JOIN `vehicledata`.gtfsagencys as agency ON route.agency_id = agency.agency_id
WHERE agency.agency_id IN (1,2,3,4);
However, the select statement is taking around 10 minutes, so something is clearly afoot.
One significant factor is that the table gtfsstop_times is huge. (~250 million records)
Indexes seem to be set up properly; all the above joins are using indexed columns. Table sizes are, roughly:
gtfsagencys - 4 rows
gtfsroutes - 56,000 rows
gtfstrips - 5,500,000 rows
gtfsstop_times - 250,000,000 rows
`transportdata`.stoppoints - 400,000 rows
The server has 22Gb of memory, I've set the InnoDB buffer pool to 8G and I'm using MySQL 5.6.
Can anybody see a way of making this run faster? Or indeed, at all!
Does it matter that the stoppoints table is in a different schema?
EDIT:
EXPLAIN SELECT... returns this:
It looks like you are trying to find a collection of stop points, based on certain criteria. And, you're using SELECT DISTINCT to avoid duplicate stop points. Is that right?
It looks like atcoCode is a unique key for your stoppoints table. Is that right?
If so, try this:
SELECT sp.name, sp.longitude, sp.latitude, sp.atcoCode
FROM `transportdata`.stoppoints` AS sp
JOIN (
SELECT DISTINCT st.fk_atco_code AS atcoCode
FROM `vehicledata`.gtfsroutes AS route
JOIN `vehicledata`.gtfstrips AS trip ON trip.route_id = route.route_id
JOIN `vehicledata`.gtfsstop_times AS st ON trip.trip_id = st.trip_id
WHERE route.agency_id BETWEEN 1 AND 4
) ids ON sp.atcoCode = ids.atcoCode
This does a few things: It eliminates a table (agency) which you don't seem to need. It changes the search on agency_id from IN(a,b,c) to a range search, which may or may not help. And finally it relocates the DISTINCT processing from a situation where it has to handle a whole ton of data to a subquery situation where it only has to handle the ID values.
(JOIN and INNER JOIN are the same. I used JOIN to make the query a bit easier to read.)
This should speed you up a bit. But, it has to be said, a quarter gigarow table is a big table.
Having 250M records, I would shard the gtfsstop_times table on one column. Then each sharded table can be joined in a separate query that can run parallel in separate threads, you'll only need to merge the result sets.
The trick is to reduce how many rows of gtfsstop_times SQL has to evaluate. In this case SQL first evaluates every row in the inner join of gtfsstop_times and transportdata.stoppoints, right? How many rows does transportdata.stoppoints have? Then SQL evaluates the WHERE clause, then it evaluates DISTINCT. How does it do DISTINCT? By looking at every single row multiple times to determine if there are other rows like it. That would take forever, right?
However, GROUP BY quickly squishes all the matching rows together, without evaluating each one. I normally use joins to quickly reduce the number of rows the query needs to evaluate, then I look at my grouping.
In this case you want to replace DISTINCT with grouping.
Try this;
SELECT sp.name, sp.longitude, sp.latitude, sp.atcoCode
FROM `transportdata`.stoppoints as sp
INNER JOIN `vehicledata`.gtfsstop_times as st ON sp.atcoCode = st.fk_atco_code
INNER JOIN `vehicledata`.gtfstrips as trip ON st.trip_id = trip.trip_id
INNER JOIN `vehicledata`.gtfsroutes as route ON trip.route_id = route.route_id
INNER JOIN `vehicledata`.gtfsagencys as agency ON route.agency_id = agency.agency_id
WHERE agency.agency_id IN (1,2,3,4)
GROUP BY sp.name
, sp.longitude
, sp.latitude
, sp.atcoCode
There other valuable answers to your question and mine is an addition to it. I assume sp.atcoCode and st.fk_atco_code are indexed columns in their table.
If you can validate and make sure that agency ids in the WHERE clause are valid, you can eliminate joining `vehicledata.gtfsagencys` in the JOINS as you are not fetching any records from the table.
SELECT DISTINCT sp.atcoCode, sp.name, sp.longitude, sp.latitude
FROM `transportdata`.stoppoints as sp
INNER JOIN `vehicledata`.gtfsstop_times as st ON sp.atcoCode = st.fk_atco_code
INNER JOIN `vehicledata`.gtfstrips as trip ON st.trip_id = trip.trip_id
INNER JOIN `vehicledata`.gtfsroutes as route ON trip.route_id = route.route_id
WHERE route.agency_id IN (1,2,3,4);

simple joins between 2 mysql tables returning all results every time.. Help!

I just imported a large amount of data into two tables. Let's call them shipments and returns.
When trying to do a simple join (left or inner) based on any criteria in these two tables. query looks like it tries to do a cross join or find every combination instead of what the query should be pulling.
each table has an PK id field, but there is not FK relationship between the two other than some shared field.
I'm currently just trying to related them on shipment_id.
I feel this is a simple answer. Am I missing a reference or something obvious that is causing this? Thanks!
here's an example. This should returned under 100 rows. This instead returns hundreds of thousands.
SELECT r.*
FROM returns as r
left outer join shipments as s
on r.shipment_id = s.shipment_id
where r.date = '2011-06-20'
Here is a query that should work:
SELECT T0.*, T1.*
FROM shipments AS T0 LEFT JOIN returns AS T1 ON T0.shipment_id = T1.shipment_id
ORDER BY T0.shipment_id;
This query join assumes 1:1 on the shipment_id
It would be nice if you included the query you were using
You need to specify what you are joining on, otherwise it will do a cartesian join:
SELECT r.*
FROM returns as r
LEFT JOIN shipments as s ON s.shipment_id = r.shipment_id
where r.date = '2011-06-20'
Josh,
I would be interested in seeing what would happen if you forced a join to a specific record or set of records instead of the whole table. Assuming there is a shipment with an id of 5 in your table, you could try:
SELECT r.* FROM returns as r
left join shipments as s
ON 5 = r.shipment_id
WHERE r.date = '2011-06-20'
While just a fancy where clause, it would at least prove that the join you are attempting will eventually work correctly. The issue is that your on clause is always returning true, no matter what the value is. This could be because it's not interpreting the shipment_id as an integer, but instead as a true/false variable where any value evaluates to true.
Original Rejected Solution:
No Foreign Key relationship should be needed in order to make the joins happen. The PK id fields I'm assuming are an integer (or number, or whatever your rdms equivalent is)?
Can you past a snippet of your sql query?
Updating based on posted query:
I would add your explicit join criteria in order to rule out any funny business (my guess is since no criteria is specified, it's using 1=1, which always joins). So I would change your query to look like:
SELECT r.*
FROM returns as r
left join shipments as s ON
s.ShipId = R.ReturnId
where r.date = '2011-06-20'
The issue turned out to be very simple, just not readily apparent until going through all the columns. It turns out that the shipment ID was duplicated through every row as it hit the upper limit for the int datatype. This is why joins were returning every record.
After switching the datatype to bigint and reimporting, everything worked great. Thanks all for looking into it.