I'm doing a project at university and I'm running into issues when I try to run a search to collect some results.
I am trying to display the results which give the StudentName, ModuleName, and DegreeID. When I do this, it appears to be duplicating the values and returning wrong results.
For example, Owen Barnes is only studying Computer Science, not Philosophy, yet the query returns every module instead of the 3 it should. Likewise, Connor Borne is studying Philosophy, yet the results suggest he is studying every module, including those in Computer Science.
I was hoping someone could help me. I'm using 2 tables (ModulesFormDegree & StudiesModules) which are used to link Modules to Degree (using 2 foreign keys) and Students with Modules (also using 2 foreign keys).
I've attached my problem below, if any more data is required please let me know.
Inquiry & Results
Description of Tables
Query:
select StudentName, ModuleName, DegreeID
from Student, Modules, Degree, StudiesModules, ModulesFormDegree
where Student.StudentID=StudiesModules.StudentID and
Modules.ModuleID=ModulesFormDegree.ModID and
Degree.DegreeID=ModulesFormDegree.DegID
It's difficult to say for sure because you have not posted all your table definitions, but your query is missing a condition in the WHERE clause: nothing links StudiesModules to ModulesFormDegree, so every student row is paired with every module row (a Cartesian product). It can be fixed as follows:
select StudentName,
ModuleName,
DegreeID
from Student,
Modules,
Degree,
StudiesModules,
ModulesFormDegree
where Student.StudentID=StudiesModules.StudentID and
Modules.ModuleID=ModulesFormDegree.ModID and
Degree.DegreeID=ModulesFormDegree.DegID and
StudiesModules.ModuleID = ModulesFormDegree.ModID
However, joining tables through conditions in the WHERE clause is fairly antiquated and has been superseded by ANSI JOIN syntax, as follows:
SELECT StudentName,
ModuleName,
DegreeID
FROM StudiesModules sm
JOIN ModulesFormDegree md
ON sm.ModuleID = md.ModID
JOIN Degree d
ON d.DegreeID = md.DegID
JOIN Modules m
ON m.ModuleID = md.ModID
JOIN Student s
ON s.StudentID = sm.StudentID
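To see the effect of the missing condition, here is a minimal sketch using Python's built-in sqlite3 module in place of your actual database. The table and column names follow the question, but the rows are invented for illustration and the column types are assumptions, since the real definitions were not posted.

```python
import sqlite3

# Hypothetical miniature of the schema: names follow the question,
# the data is made up (two students, each on one module of one degree).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, StudentName TEXT);
CREATE TABLE Degree (DegreeID INTEGER PRIMARY KEY);
CREATE TABLE Modules (ModuleID INTEGER PRIMARY KEY, ModuleName TEXT);
CREATE TABLE ModulesFormDegree (ModID INTEGER, DegID INTEGER);
CREATE TABLE StudiesModules (StudentID INTEGER, ModuleID INTEGER);

INSERT INTO Student VALUES (1, 'Owen Barnes'), (2, 'Connor Borne');
INSERT INTO Degree VALUES (10), (20);
INSERT INTO Modules VALUES (100, 'Databases'), (200, 'Ethics');
INSERT INTO ModulesFormDegree VALUES (100, 10), (200, 20);
INSERT INTO StudiesModules VALUES (1, 100), (2, 200);
""")

# Original query: nothing ties StudiesModules to ModulesFormDegree.
missing_link = """
SELECT StudentName, ModuleName, DegreeID
FROM Student, Modules, Degree, StudiesModules, ModulesFormDegree
WHERE Student.StudentID = StudiesModules.StudentID
  AND Modules.ModuleID = ModulesFormDegree.ModID
  AND Degree.DegreeID = ModulesFormDegree.DegID
"""
# Same query with the extra condition that links the two halves.
fixed = missing_link + "  AND StudiesModules.ModuleID = ModulesFormDegree.ModID"

broken_rows = conn.execute(missing_link).fetchall()
fixed_rows = conn.execute(fixed + " ORDER BY StudentName").fetchall()

print(len(broken_rows))  # every student paired with every module
print(fixed_rows)        # one row per module the student actually studies
```

With this sample data the first query pairs both students with both modules, while the second returns only the module each student is enrolled on.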
Edit2: Chose to separate the queries and collate/handle the information as a whole outside of the database's output. Taking these out in a .CSV format, and adding them into Excel where I'm going to be running the actual numbers.
Query 1 to pull out orders and desired info:
SELECT
shipstation_orders_v2.id AS SSO_id,
shipstation_orders_v2.order_number AS SSO_orderNumber,
shipstation_orders_v2.order_id AS SSO_orderID,
shipstation_orders_v2.storename AS SSO_storeName,
shipstation_orders_v2.order_date AS SSO_orderDate,
shipstation_orders_v2.order_total AS SSO_orderTotal,
shipstation_orders_v2.name AS SSO_name,
shipstation_orders_v2.company AS SSO_company
FROM shipstation_orders_v2
GROUP BY shipstation_orders_v2.id,
shipstation_orders_v2.order_number,
shipstation_orders_v2.order_id,
shipstation_orders_v2.storename,
shipstation_orders_v2.order_date,
shipstation_orders_v2.order_total,
shipstation_orders_v2.name,
shipstation_orders_v2.company
ORDER BY SSO_orderDate
Query 2 to pull out fulfillments and equivalent info:
SELECT DISTINCT
shipstation_orders_v2.id AS SSO_id,
shipstation_fulfillments.id AS SSF_id,
shipstation_fulfillments.order_number AS SSF_orderNumber,
shipstation_orders_v2.order_number AS SSO_orderNumber,
shipstation_orders_v2.order_id AS SSO_orderID,
shipstation_orders_v2.storename AS SSO_storeName,
shipstation_orders_v2.order_date AS SSO_orderDate,
shipstation_fulfillments.order_date AS SSF_orderDate,
shipstation_orders_v2.order_total AS SSO_orderTotal,
shipstation_fulfillments.amount_paid AS SSF_amountPaid,
shipstation_orders_v2.name AS SSO_name,
shipstation_orders_v2.company AS SSO_company,
shipstation_fulfillments.name AS SSF_name,
shipstation_fulfillments.company AS SSF_company
FROM shipstation_fulfillments
INNER JOIN shipstation_orders_v2
ON shipstation_fulfillments.order_number =
shipstation_orders_v2.order_number
WHERE shipstation_fulfillments.order_number =
shipstation_orders_v2.order_number
GROUP BY shipstation_orders_v2.id,
shipstation_fulfillments.id,
shipstation_fulfillments.order_number,
shipstation_orders_v2.order_number,
shipstation_orders_v2.order_id,
shipstation_orders_v2.storename,
shipstation_orders_v2.order_date,
shipstation_fulfillments.order_date,
shipstation_orders_v2.order_total,
shipstation_fulfillments.amount_paid,
shipstation_orders_v2.name,
shipstation_orders_v2.company,
shipstation_fulfillments.name,
shipstation_fulfillments.company
Edit: Question marked as answered. I figured out another way to do it that wasn't quite as harebrained. Props to DRapp for getting my brain moving.
Original Code is below Wall of Text
I'm a self-taught MySQL database user. I won't say administrator, since it's just me. I've put together a small database for work - about 60,000 rows and a maximum of 51 columns spread over three tables. I use this at work as a way to organize a fairly disparate sales data setup and make sense of it to identify trends, seasonality, all that good stuff. I work primarily with Shipstation data.
My problem is when I needed to introduce this third table. With two tables, obviously, it's just a simple JOIN. I got that working just fine. I'm having quite a bit of trouble setting up the JOINs correctly for this third table.
I'm attempting to JOIN the data from the two innermost queries to shipstation_orders_v2 and order_keys to the shipstation_fulfillments results I have in the third table.
For those of you who don't use Shipstation or aren't familiar with this element of it, fulfillments are in a different category than orders and don't use quite the same data. This is my dirty way of gluing them together so we have some decent, manipulable information on sales and shipping trends, etc.
I am making an internal query from shipstation_orders_v2 to order_keys as a way to SELECT DISTINCT the sum totals of split orders. I had problems with data duplication before I built up that subquery. With the (now) subquery and sub-subquery, the duping problem has been eliminated and with just those two tables it worked fine.
The issue is, when I'm making the SELECT from shipstation_fulfillments with a JOIN to the subquery and sub-subquery, I'm hitting a roadblock.
I've gotten several errors while working on this query. In order of occurrence and resolution:
Error 2013, lost connection to server during query (which told me I'm doing a full table read on three joined tables, since it isn't erroring out beforehand, but my rinky-dink setup can't handle it). I got rid of that one.
Then, Error 1051 for an unidentified table name shipstation_fulfillments. I think it might be an issue with the query aliases, but I am not sure.
Finally, good ole Error 1064, incorrect syntax on the first subquery after the SELECT shipstation_fulfillments arguments.
Being self-taught, I'd virtually guarantee I'm merely missing an element of syntax somewhere that would appear fairly obvious to a well-practiced user of MySQL. Below is my current query setup.
If there needs to be any clarification, let me know.
SELECT
`shipstation_fulfillments`.`order_date` AS `orderDate`,
`shipstation_fulfillments`.`order_number` AS `orderNumber`,
(`shipstation_fulfillments`.`amount_paid` + `shipstation_fulfillments`.`tax_paid`) AS "Total Paid",
`shipstation_fulfillments`.`name` AS `name`,
`shipstation_fulfillments`.`company` AS `company`,
FROM
(
(SELECT
COUNT(`shipstation_orders_v2`.`order_key`) AS `orderCount`,
`shipstation_orders_v2`.`key_id` AS `key_id`,
`shipstation_orders_v2`.`order_number` AS `order_number`,
MAX(`shipstation_orders_v2`.`order_date`) AS `order_date`,
`shipstation_orders_v2`.`storename` AS `store`,
(`shipstation_orders_v2`.`order_total` - `shipstation_orders_v2`.`shippingPaid`) AS `orderPrice`,
`shipstation_orders_v2`.`shippingpaid` AS `shippingPaid`,
SUM(`shipstation_orders_v2`.`shippingpaid`) AS `SUM shippingPaid`,
`shipstation_orders_v2`.`order_total` AS `orderTotal`,
SUM(`shipstation_orders_v2`.`order_total`) AS `SUM Total Amount Paid`,
`shipstation_orders_v2`.`qtyshipped` AS `qtyShipped`,
SUM(`shipstation_orders_v2`.`qtyshipped`) AS `SUM qtyShipped`,
`shipstation_orders_v2`.`name` AS `name`,
`shipstation_orders_v2`.`company` AS `company`
FROM
(SELECT DISTINCT
`order_keys`.`key_id` AS `key_id`,
`order_keys`.`order_key` AS `order_key`,
`shipstation_orders_v2`.`order_number` AS `order_number`,
`shipstation_orders_v2`.`order_id` AS `order_id`,
`shipstation_orders_v2`.`order_date` AS `order_date`,
`shipstation_orders_v2`.`storename` AS `storename`,
`shipstation_orders_v2`.`order_total` AS `order_total`,
`shipstation_orders_v2`.`qtyshipped` AS `qtyshipped`,
`shipstation_orders_v2`.`shippingpaid` AS `shippingpaid`,
`shipstation_orders_v2`.`name` AS `name`,
`shipstation_orders_v2`.`company` AS `company`
FROM
(`shipstation_orders_v2`
JOIN `order_keys` ON ((`order_keys`.`order_key` = `shipstation_orders_v2`.`order_id`)))) `t`)
JOIN `shipstation_fulfillments`
ON (`shipstation_orders_v2`.`order_number` = `shipstation_fulfillments`.`order_number`)) `w`
As a couple notes... As for long table names, no problem, but you can use alias references to them such as I have done via example ...ShipStation_Fulfillments SSF... the "SSF" is now an alias for shorter typing yet still makes sense of origin.
When changing column names in a query via "AS", you only need the AS if the resulting column name will differ from the original, such as SSF.order_date AS orderDate, where you remove the "_" from the final column name, or "Total Paid" (yet I HATE column names with embedded spaces; let the user interface handle labeling things, but that's just me).
When typing table.column (or alias.column), CamelCasing helps readability; all-lowercase names (camelcasing) are slightly harder to read, whereas the capitals let the brain naturally break the name into words.
Another issue, based on your query: outer query portions can't recognize aliases from inside closed inner queries, only the alias of the subselect itself, such as your "t" and "w" aliases.
Next, when doing JOINs, my preference is to read them in the way the tables are within the query listing the first one on the left, and whatever is joined TO on the right.
If you go from Table A JOIN Table B, the ON clause would be ON A.KeyID = B.KeyID rather than B.KeyID = A.KeyID, especially if you are going through several tables... A->B, B->C, C->D
Any query that mixes aggregates (SUM, AVG, COUNT, MIN, MAX, etc.) with non-aggregated columns needs a "GROUP BY" clause to identify when each record should break. In your example, I would assume the break is on the original sales order.
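As a rough illustration of the alias-scope and GROUP BY notes, here is a small sketch run against SQLite through Python's sqlite3 module rather than MySQL. The orders table, its two columns and its rows are hypothetical stand-ins, not your real structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_number TEXT, order_total REAL);
INSERT INTO orders VALUES ('A1', 10.0), ('A1', 5.0), ('B2', 7.5);
""")

# The inner query groups by order_number so SUM() breaks per order.
# The outer query sees the derived table only through its alias "t".
per_order = conn.execute("""
    SELECT t.order_number, t.total_paid
    FROM (SELECT o.order_number, SUM(o.order_total) AS total_paid
          FROM orders o
          GROUP BY o.order_number) t
    ORDER BY t.order_number
""").fetchall()

# Referencing the inner alias "o" from the outer query is rejected.
try:
    conn.execute("""
        SELECT o.order_number
        FROM (SELECT o.order_number FROM orders o) t
    """)
    inner_alias_visible = True
except sqlite3.OperationalError:
    inner_alias_visible = False

print(per_order)
print(inner_alias_visible)
```

The second statement fails because "o" only exists inside the parentheses; from the outside the columns belong to "t".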
Although this query IS NOT WORKING, here is a cleaned-up version of your query showing implementations from above.
SELECT
SSF.order_date AS OrderDate,
SSF.order_number AS OrderNumber,
(SSF.amount_paid + SSF.tax_paid) AS `Total Paid`,
SSF.name,
SSF.company
FROM
( SELECT
SSOv2.key_id,
SSOv2.order_number,
SSOv2.storename AS store,
SSOv2.order_total - SSOv2.shippingPaid AS OrderPrice,
SSOv2.ShippingPaid,
SSOv2.order_total AS OrderTotal,
SSOv2.QtyShipped,
SSOv2.name,
SSOv2.company,
COUNT(SSOv2.order_key) AS orderCount,
MAX(SSOv2.order_date) AS order_date,
SUM(SSOv2.shippingpaid) AS `SUM shippingPaid`,
SUM(SSOv2.order_total) AS `SUM Total Amount Paid`,
SUM(SSOv2.qtyshipped) AS `SUM qtyShipped`
FROM
( SELECT DISTINCT
OK.key_id AS key_id,
OK.order_key AS order_key,
SSOv2.order_number AS order_number,
SSOv2.order_id AS order_id,
SSOv2.order_date AS order_date,
SSOv2.storename AS storename,
SSOv2.order_total AS order_total,
SSOv2.qtyshipped AS qtyshipped,
SSOv2.shippingpaid AS shippingpaid,
SSOv2.name AS name,
SSOv2.company AS company
FROM
shipstation_orders_v2 SSOv2
JOIN order_keys OK
ON SSOv2.order_id = OK.order_key
JOIN shipstation_fulfillments SSF
ON SSOv2.order_number = SSF.order_number ) t
) w
Next, without seeing the actual data or the table structures, which are critical to solving the query, I will ask you to edit your existing post. Create a sample listing the tables, columns and sample data so we can see the basis of what you are aggregating and trying to get out of the query. In particular, show where there could be multiple rows per order and per fulfillment, and a sample of what you EXPECT the results to show.
After reading the question title you may find it silly but I'm seriously asking this question with curiosity in my mind.
I'm using MySQL database system.
Consider below the two tables :
Customers(CustomerID(Primary Key), CustomerName, ContactName, Address, City, PostalCode, Country)
Orders(OrderID(Primary Key), CustomerID(Foreign Key), EmployeeID, OrderDate, ShipperID)
Now I want to get the details of all orders, that is, which order was placed by which customer.
So, I did it in two ways :
First way:
SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM Customers AS c, Orders AS o
WHERE c.CustomerID=o.CustomerID;
Second way:
SELECT Orders.OrderID, Orders.OrderDate, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
In both cases I'm getting exactly the same correct result. My question is: why is the additional and confusing concept of INNER JOIN necessary in MySQL when we can achieve the same results without it?
Is the Inner Join more effective in any manner?
What you are looking at is ANSI-89 syntax (A,B WHERE) vs ANSI-92 syntax (A JOIN B ON).
For very simple queries, there is no difference. However, there are a number of things you can do with ANSI-92 that you cannot do or that become very difficult to implement and maintain in ANSI-89. Anything more than two tables involved, more than one condition in the same join, or separating LEFT JOIN conditions from WHERE conditions are all much harder to read and work with in the older syntax.
The old A,B WHERE syntax is generally considered obsolete and avoided, even for the simple queries where it still works.
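A small sketch of both points, using Python's sqlite3 module as a stand-in for MySQL. The Customers and Orders tables are cut down to the columns needed, and the rows and the date cutoff are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO Orders VALUES (10, 1, '2020-01-05'), (11, 1, '2021-03-09');
""")

# ANSI-89 comma syntax: join condition lives in WHERE.
comma = conn.execute("""
    SELECT o.OrderID, c.CustomerName
    FROM Customers AS c, Orders AS o
    WHERE c.CustomerID = o.CustomerID
    ORDER BY o.OrderID
""").fetchall()

# ANSI-92 syntax: join condition lives in ON.
explicit = conn.execute("""
    SELECT o.OrderID, c.CustomerName
    FROM Orders AS o
    INNER JOIN Customers AS c ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderID
""").fetchall()

# Date filter in ON: customers without a matching order are kept (NULL order).
filter_in_on = conn.execute("""
    SELECT c.CustomerName, o.OrderID
    FROM Customers AS c
    LEFT JOIN Orders AS o
           ON o.CustomerID = c.CustomerID AND o.OrderDate >= '2021-01-01'
    ORDER BY c.CustomerName
""").fetchall()

# Same filter in WHERE: rows with a NULL order are discarded after the join.
filter_in_where = conn.execute("""
    SELECT c.CustomerName, o.OrderID
    FROM Customers AS c
    LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID
    WHERE o.OrderDate >= '2021-01-01'
    ORDER BY c.CustomerName
""").fetchall()

print(comma == explicit)
print(filter_in_on)
print(filter_in_where)
```

For the inner join the two spellings return the same rows. The difference only shows with the LEFT JOIN, where moving the date condition between ON and WHERE decides whether Bob (who has no orders) stays in the result, and the comma syntax has no clean way to express that.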
Hardware optimization trade-offs come second to users being able to maintain their queries.
Explicit, clean code is better than esoteric, implicit code. In actual production relational databases, most of the queries that take too long are the ones where the tables are listed in a comma-separated list. These queries show that:
The user did not put effort into expressing the order in which the tables are joined.
All the join conditions are cluttered in one place instead of each join being organized in its own space.
If all of a user's queries are in this format, the user does not take advantage of outer joins. There are many cases where a relationship between tables can be (1) TO (0-many) or (many) TO (many) instead of (1) TO (1-many).
As in most use cases, these queries start to become a problem when the number of joins increases. Beginners choose to query tables by listing them separated by commas because it takes less typing. At first this does not seem to be a problem, because they are joining only two or three tables, and it becomes a habit. As they start to write more complicated queries with more joins, those queries become harder to maintain, as described in the points above.
Conclusion: As the number of joins within a query scales, improper indentation and categorization make the query harder to maintain.
You should use INNER JOIN and indent your query as below so it is easy for others to read:
SELECT
Orders.OrderID,
Orders.OrderDate,
Customers.CustomerName
FROM Orders
INNER JOIN Customers
ON Customers.CustomerID = Orders.CustomerID;
I've got a SQL statement:
SELECT AVG(`totalhours`)/5 AS `average`,* FROM `report_signout` JOIN `employee` ON `employee`.`username`=`report_signout`.`username`
However it's not working. Basically I need to calculate the average total hours an employee has been on the premises in any one week, the assumption is accepted that the office is only open 5 days a week. The total hours are coming from the table report_signout which I need to join on the username of the employee table so I can produce an outcome where I can then list the Firstname and Lastname of the employee along with the average hours on a web page. That last part is done in PHP which I already know how to do. I just need to see where I'm going wrong with the SQL statement.
If someone could point out to me please or give me a bit of help it would be much appreciated, thanks!
MySQL may not know which table totalhours belongs to. Because you are referencing more than one table, no table is assumed, so you need to state which column to average by qualifying the reference as:
`table name`.`column name`
This is true for most if not all MySQL built-in functions. Use fully qualified references (i.e. those with both table and column names defined, as above) as much as possible. The same applies to the wildcard: an unqualified * listed after another select item can produce a parse error in MySQL, so qualify it per table as well.
SELECT AVG(`report_signout`.`totalhours`)/5 AS `average`,
`report_signout`.*, `employee`.*
FROM `report_signout`
INNER JOIN `employee` ON `employee`.`username` = `report_signout`.`username`
As a small aside, try to avoid a bare JOIN; use a complete JOIN reference that states the type of JOIN (such as INNER JOIN) rather than relying on an assumption.
Also try to avoid using * selection; instead, state each column you wish to retrieve.
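If the goal is one average per employee rather than a single overall figure, the query also needs a GROUP BY. Here is a minimal sketch using Python's sqlite3 module in place of MySQL. The firstname and lastname column names and all the rows are assumptions, since the real table definitions were not posted, and the /5 is kept from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (username TEXT PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE report_signout (username TEXT, totalhours REAL);
INSERT INTO employee VALUES ('asmith', 'Ann', 'Smith'), ('bjones', 'Bob', 'Jones');
INSERT INTO report_signout VALUES
    ('asmith', 40.0), ('asmith', 30.0), ('bjones', 20.0);
""")

# Named columns instead of *, qualified references, explicit INNER JOIN,
# and GROUP BY so AVG() is computed per employee.
rows = conn.execute("""
    SELECT e.firstname, e.lastname, AVG(r.totalhours) / 5 AS average
    FROM report_signout AS r
    INNER JOIN employee AS e ON e.username = r.username
    GROUP BY e.username, e.firstname, e.lastname
    ORDER BY e.username
""").fetchall()

print(rows)
```

Without the GROUP BY the aggregate collapses everything into one row, which is probably not what you want to list on the web page.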
I am trying to gather data for a research study for my university thesis. Unfortunately I am not a computer science or programming expert and do not have any SQL experience.
For my thesis I need to do a SQL query answering the question: "Give me all patents of a company X where there is more than one applicant (other company) in a specific time span". The data I want to extract is stored in a database called PATSTAT (for which I have a 1-month trial), which uses (don't be surprised) SQL.
I tried a lot of queries but all the time I am getting different syntax errors.
This is how the interface looks like:
http://www10.pic-upload.de/07.07.13/7u5bqf7jsow.png
I think I have a really good understanding of what (also from an SQL POV) needs to be done but I cannot execute it.
My idea: As result I want the names of the companies (with reference to the company entered below)
SELECT person_name from tls206_person table
Now because I need a criteria like
WHERE nb_applicants > 1 from tls201_appln table
I need to join these two tables, tls206 and tls201. I read a brief introductory guide to SQL (provided by the European Patent Office), and because the two tables have no common "reference key" we need to use the table tls207_pers_appln as an "intermediate", so to speak. Now that's the point where I am getting stuck. I tried the following, but it is not working:
SELECT person_name, tls201_appln.nb_applicants
FROM tls206_person
INNER JOIN tls207_pers_appln ON tls206_person.person_id= tls207_pers_appln.person_id
INNER JOIN tls207_pers_appln ON tls201_appln.appln_id=tls201_appln.appln_id
WHERE person_name = "%Samsung%"
AND tls201_appln.nb_applicants > 1
AND tls201_appln.ipr_type = "PI"
I get the following error: "0:37:11 [SELECT - 0 row(s), 0 secs] [Error Code: 1064, SQL State: 0] Not unique table/alias: 'tls207_pers_appln'"
I think for just 4 hours of SQL my approach is not too bad, but I really need some guidance on how to proceed because I am not making any progress.
Ideally I would like to count (for every company) and for every row respectively how many "nb_applicants" were found.
If you need further information for giving me guidance, just let me know.
Looking forward to your answers.
Best regards
Kendels
Another way of doing the same thing, which you might find easier to understand (if you are new to SQL, it is impressive you have got so far), is:
SELECT tls206_person.person_name, tls201_appln.nb_applicants
FROM tls206_person, tls207_pers_appln, tls201_appln
WHERE tls206_person.person_id = tls207_pers_appln.person_id
AND tls207_pers_appln.appln_id = tls201_appln.appln_id
AND tls206_person.person_name LIKE "%Samsung%"
AND tls201_appln.nb_applicants > 1
AND tls201_appln.ipr_type = "PI"
(It's equivalent to the other answer, but instead of trying to understand the JOIN syntax, you just write out all the logic in the WHERE clause and SQL is smart enough to make it work. This is often called the "old", "implicit" or comma join syntax, if you want to google for more info. It predates the explicit JOIN ... ON syntax, so the database you are using should support it.)
You are referencing the table tls201_appln, but it is not in the from clause. I am guessing that the second reference to tls207_pers_appln should be to the other table:
SELECT person_name, tls201_appln.nb_applicants
FROM tls206_person
INNER JOIN tls207_pers_appln ON tls206_person.person_id = tls207_pers_appln.person_id
INNER JOIN tls201_appln ON tls201_appln.appln_id = tls207_pers_appln.appln_id
WHERE person_name LIKE '%Samsung%'
AND tls201_appln.nb_applicants > 1
AND tls201_appln.ipr_type = "PI"
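To check the shape of the join through the link table, here is a small sketch using Python's sqlite3 module rather than the PATSTAT interface. The three tables are reduced to the columns mentioned in the question and the rows are invented, so treat it as an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls206_person (person_id INTEGER PRIMARY KEY, person_name TEXT);
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, nb_applicants INTEGER, ipr_type TEXT);
CREATE TABLE tls207_pers_appln (person_id INTEGER, appln_id INTEGER);
INSERT INTO tls206_person VALUES (1, 'Samsung Electronics'), (2, 'Other Corp');
INSERT INTO tls201_appln VALUES (100, 2, 'PI'), (101, 1, 'PI');
INSERT INTO tls207_pers_appln VALUES (1, 100), (2, 100), (1, 101);
""")

# person -> link table -> application; each table appears once,
# each with its own alias, and LIKE does the wildcard match.
rows = conn.execute("""
    SELECT p.person_name, a.appln_id, a.nb_applicants
    FROM tls206_person AS p
    INNER JOIN tls207_pers_appln AS pa ON pa.person_id = p.person_id
    INNER JOIN tls201_appln AS a ON a.appln_id = pa.appln_id
    WHERE p.person_name LIKE ?
      AND a.nb_applicants > 1
      AND a.ipr_type = 'PI'
""", ("%Samsung%",)).fetchall()

print(rows)
```

Only application 100 comes back for the Samsung row, because application 101 has a single applicant.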
For my thesis I need to do a SQL query answering the question: "Give me all patents of a company X where there is more than one applicant (other company) in a specific time span".
Let me rephrase that for you :
SELECT * FROM patents p -- : "Give me all patents
WHERE p.company = 'X' -- of a company X
AND EXISTS ( -- where there is
SELECT *
FROM applicants x1
WHERE x1.patent_id = p.patent_id
AND x1.company <> 'X' -- another company:: exclude ourselves
AND x1.application_date >= $begin_date -- in a specific time span
AND x1.application_date < $end_date
-- more than one applicant (other company)
-- To avoid aggregation: Just repeat the same subquery
AND EXISTS ( -- where there is
SELECT *
FROM applicants x2
WHERE x2.patent_id = p.patent_id
AND x2.company <> 'X' -- another company:: exclude ourselves
AND x2.company <> x1.company -- :: exclude other other company, too
AND x2.application_date >= $begin_date -- in a specific time span
AND x2.application_date < $end_date
)
)
;
[Note: Since the OP did not give any table definitions, I had to invent these]
This is not the perfect query, but it does express your intentions. Given sane keys/indexes it will perform reasonably, too.
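For anyone who wants to try the nested EXISTS idea, here is a minimal sketch against SQLite via Python's sqlite3 module. The patents and applicants tables are the same invented ones as above, the rows are made up, and the named parameters (:company, :begin_date, :end_date) take the place of the $begin_date and $end_date placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patents (patent_id INTEGER PRIMARY KEY, company TEXT);
CREATE TABLE applicants (patent_id INTEGER, company TEXT, application_date TEXT);
INSERT INTO patents VALUES (1, 'X'), (2, 'X');
INSERT INTO applicants VALUES
    (1, 'X', '2010-05-01'), (1, 'A', '2010-05-01'), (1, 'B', '2010-05-01'),
    (2, 'X', '2010-06-01'), (2, 'A', '2010-06-01');
""")

# Patent qualifies when two different other companies applied in the window.
query = """
SELECT p.patent_id FROM patents p
WHERE p.company = :company
  AND EXISTS (
    SELECT 1 FROM applicants x1
    WHERE x1.patent_id = p.patent_id
      AND x1.company <> :company
      AND x1.application_date >= :begin_date
      AND x1.application_date < :end_date
      AND EXISTS (
        SELECT 1 FROM applicants x2
        WHERE x2.patent_id = p.patent_id
          AND x2.company <> :company
          AND x2.company <> x1.company
          AND x2.application_date >= :begin_date
          AND x2.application_date < :end_date
      )
  )
"""
params = {"company": "X", "begin_date": "2010-01-01", "end_date": "2011-01-01"}
matches = conn.execute(query, params).fetchall()

print(matches)
```

Patent 1 has two other applicants (A and B) inside the window and is returned; patent 2 has only one and is not.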