Relational database JOIN return excessive values - mysql

I work with a relational database and usually join tables using INNER JOIN. When I query the joined tables, the result contains far too many values. As an illustration, I have created the dummy tables below.
The hierarchy is: an Operation can have many Daily rows, a Daily can have many Activity rows, and each activity is uniquely identified by its ActivityUID.
For example, I INNER JOIN the Operation table with the Activity table to get a breakdown by class, phase, ops and root per operation. However, this returns an excessive total for Durationhrs. In the worst case, when all 3 tables (Operation, Daily, Activity) are joined, the hours returned are inflated beyond belief.
My questions are:
Where did I go wrong?
What kind of join would make it right?
If this cannot be done, what is the best method to join?
My database looks like this:
CREATE TABLE Operation
(`Operationuid` varchar(3), `operationname` varchar(10), `owner` varchar(55))
;
INSERT INTO operation
(`Operationuid`, `operationname`, `owner`)
VALUES
('AA1', 'Cow', 'Jon Letoy'),
('AA2', 'Chicken', 'Ridikill' ),
('AA3', 'Snake', 'Mighty'),
('AA4', 'Sheep', 'The great'),
('AA5', 'Pig', 'Peon');
CREATE TABLE Activity
(`Operationuid` varchar(3), `DailyUID` varchar(10), `ActivityUID` varchar(55), `Class` varchar(3), `Phase` varchar(3), `Ops` varchar(3), `Root` varchar(3), Duration int);
INSERT INTO Activity
(`Operationuid`, `DailyUID`, `ActivityUID`, `Class`, `Phase`, `Ops`, `Root`, `Duration`)
VALUES
('AA1', 'DD1', 'AC1', 'AB1', 'PH1', 'OP1', null, 12),
('AA1', 'DD1', 'AC2', 'AB1', 'PH2', 'OP1', null, 2),
('AA1', 'DD1', 'AC3', 'AB2', 'PH2', 'OP2', 'RR1', 3),
('AA1', 'DD1', 'AC4', 'AB3', 'PH3', 'OP3', null, 5),
('AA1', 'DD1', 'AC5', 'AB4', 'PH4', 'OP4', 'RR2', 1);
CREATE TABLE Daily
(`Operationuid` varchar(3), `DailyUID` varchar(10), `Dayno` varchar(55), `Daycost` decimal);
INSERT INTO Daily
(`Operationuid`, `DailyUID`, `Dayno`, `Daycost`)
VALUES
('AA1', 'DD1', 1, 1000),
('AA1', 'DD2', 2, 2000),
('AA1', 'DD3', 3, 3000),
('AA1', 'DD4', 4, 4000),
('AA1', 'DD5', 5, 5000);
Select operation.*, daily.*, activity.* from Operation
INNER JOIN daily on daily.operationUID=operatin.operationUID
INNER JOIN activity on activity.operationUID=operation.operationUID
This returns 25 rows instead of the 5 that I need. Why?

You have a small problem in your JOIN. You joined the Activity table with itself here: INNER JOIN activity on activity.operationUID=activity.operationUID
You may want to correct that as:
Select operation.*, daily.*, activity.* from Operation
INNER JOIN daily on daily.operationUID=operation.operationUID
INNER JOIN activity on operation.operationUID=activity.operationUID
See it working here: http://sqlfiddle.com/#!9/1a4ae9/8
That query returns 25 rows. When you query several tables without any join condition, you get the Cartesian product of the tables, so the row count is the product of the row counts of all tables; in your case it WOULD be 5*5*5 = 125. With the JOIN conditions in place, the first join matches operation AA1 with its 5 daily rows, giving 5 rows. The second join then matches each of those 5 rows with all 5 activity rows whose Operationuid is AA1, so every row of the first join (operation with daily) is paired with every activity row: 5*5 = 25 rows.
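The row multiplication can be reproduced in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, with trimmed-down versions of the question's tables. The second query, which also matches Activity to Daily on DailyUID, is an assumption about what the asker intends, not something stated in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE Operation (Operationuid TEXT, operationname TEXT);
CREATE TABLE Daily (Operationuid TEXT, DailyUID TEXT, Dayno INT);
CREATE TABLE Activity (Operationuid TEXT, DailyUID TEXT, ActivityUID TEXT, Duration INT);
INSERT INTO Operation VALUES ('AA1', 'Cow'), ('AA2', 'Chicken');
INSERT INTO Daily VALUES ('AA1','DD1',1), ('AA1','DD2',2), ('AA1','DD3',3),
                         ('AA1','DD4',4), ('AA1','DD5',5);
INSERT INTO Activity VALUES ('AA1','DD1','AC1',12), ('AA1','DD1','AC2',2),
                            ('AA1','DD1','AC3',3), ('AA1','DD1','AC4',5),
                            ('AA1','DD1','AC5',1);
""")

# Both child tables joined on Operationuid only:
# every Daily row is paired with every Activity row of the operation.
fan_out = conn.execute("""
    SELECT COUNT(*), SUM(a.Duration)
    FROM Operation o
    JOIN Daily d ON d.Operationuid = o.Operationuid
    JOIN Activity a ON a.Operationuid = o.Operationuid
""").fetchone()

# Assumed fix: Activity is matched to its own Daily row as well,
# so each activity appears exactly once.
per_activity = conn.execute("""
    SELECT COUNT(*), SUM(a.Duration)
    FROM Operation o
    JOIN Daily d ON d.Operationuid = o.Operationuid
    JOIN Activity a ON a.Operationuid = d.Operationuid
                   AND a.DailyUID = d.DailyUID
""").fetchone()

print(fan_out)       # (25, 115)
print(per_activity)  # (5, 23)
```

The duration total is inflated by exactly the number of Daily rows (23 * 5 = 115), which is the kind of excessive hours value described in the question.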

How to avoid joining multiple times to the same table to get results back in single rows?

I have a schema that requires joining to the same table multiple times to get more information on the data pointed to by the columns. Below is an example schema that shows this situation:
SQL Fiddle: http://sqlfiddle.com/#!9/7a4019/1
CREATE TABLE STATE
(
employee INT NOT NULL,
boss INT,
manager INT,
rep INT
);
CREATE TABLE EMPLOYEE
(
id INT NOT NULL,
name VARCHAR(255) NOT NULL
);
INSERT INTO EMPLOYEE (id, name) VALUES (1, "Joe");
INSERT INTO EMPLOYEE (id, name) VALUES (2, "John");
INSERT INTO EMPLOYEE (id, name) VALUES (3, "Jack");
INSERT INTO EMPLOYEE (id, name) VALUES (4, "Jeff");
INSERT INTO EMPLOYEE (id, name) VALUES (5, "Jason");
INSERT INTO STATE (employee, boss, manager, rep) VALUES (1, 2, 3, 4);
INSERT INTO STATE (employee, boss, manager, rep) VALUES (2, 3, 3, 4);
INSERT INTO STATE (employee, boss, manager, rep) VALUES (3, NULL, NULL, 4);
INSERT INTO STATE (employee, boss, manager, rep) VALUES (4, 3, 3, NULL);
INSERT INTO STATE (employee, boss, manager, rep) VALUES (5, 2, 3, 4);
Currently, the only way I know to get this information in a single row for each employee is to left join multiple times, like this:
SELECT employee, b.name AS boss, m.name AS manager, r.name AS rep
FROM STATE
LEFT JOIN EMPLOYEE b ON b.id = STATE.boss
LEFT JOIN EMPLOYEE m ON m.id = STATE.manager
LEFT JOIN EMPLOYEE r ON r.id = STATE.rep
Is there a way to do it without joins and without subqueries?
You asked:
Is there a way to do it without joins and without subqueries?
Not really. You are using the JOIN operations precisely as they're intended to be used -- each JOIN reflects a specific relationship between rows of a table.
You can avoid doing multiple joins by using aggregate functions, which I recently found useful after hitting the limit (61) of the number of joins that can be done in a query in MySQL/MariaDB.
SELECT s.employee,
GROUP_CONCAT(if(e.id=s.boss, name, NULL)) as boss,
GROUP_CONCAT(if(e.id=s.manager, name, NULL)) as manager,
GROUP_CONCAT(if(e.id=s.rep, name, NULL)) as rep
FROM STATE s, EMPLOYEE e
GROUP BY s.employee
The above example uses MySQL's GROUP_CONCAT function. It appears not to be an ANSI standard. Other relational databases may have similar functions. A cursory web search turned up a page that discussed aggregate functions for various relational databases: http://www.postgresonline.com/journal/archives/191-String-Aggregation-in-PostgreSQL,-SQL-Server,-and-MySQL.html
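As a rough check that the two approaches agree, the sketch below runs both against the question's data. It uses Python's built-in sqlite3 module rather than MySQL, and it swaps MySQL's if() for a standard CASE expression; both substitutions are assumptions made so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE EMPLOYEE (id INT NOT NULL, name TEXT NOT NULL);
CREATE TABLE STATE (employee INT NOT NULL, boss INT, manager INT, rep INT);
INSERT INTO EMPLOYEE VALUES (1,'Joe'), (2,'John'), (3,'Jack'), (4,'Jeff'), (5,'Jason');
INSERT INTO STATE VALUES (1,2,3,4), (2,3,3,4), (3,NULL,NULL,4), (4,3,3,NULL), (5,2,3,4);
""")

# One LEFT JOIN per role column, as in the question.
joined = conn.execute("""
    SELECT s.employee, b.name, m.name, r.name
    FROM STATE s
    LEFT JOIN EMPLOYEE b ON b.id = s.boss
    LEFT JOIN EMPLOYEE m ON m.id = s.manager
    LEFT JOIN EMPLOYEE r ON r.id = s.rep
    ORDER BY s.employee
""").fetchall()

# A single cross join plus conditional aggregation, as in the answer.
# CASE without ELSE yields NULL, which GROUP_CONCAT skips.
aggregated = conn.execute("""
    SELECT s.employee,
           GROUP_CONCAT(CASE WHEN e.id = s.boss THEN e.name END),
           GROUP_CONCAT(CASE WHEN e.id = s.manager THEN e.name END),
           GROUP_CONCAT(CASE WHEN e.id = s.rep THEN e.name END)
    FROM STATE s CROSS JOIN EMPLOYEE e
    GROUP BY s.employee
    ORDER BY s.employee
""").fetchall()

for row in aggregated:
    print(row)
print(joined == aggregated)  # True
```

Note that the aggregate version reads every STATE/EMPLOYEE combination, so it trades the join limit for a full cross product; it also relies on STATE having one row per employee.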

SQL query of multiple joins

I have a problem dealing with joins
This is my first table:
CREATE TABLE IF NOT EXISTS `form` (
`id_form` int(20) NOT NULL AUTO_INCREMENT,
`nameform` varchar(50) NOT NULL,
PRIMARY KEY (`id_form`)
)
The data in the table
INSERT INTO `form` (`id_form`, `nameform`) VALUES
(1, 'Formulaire commun'),
(2, 'Formulaire FCPR'),
(3, 'Formulaire fonds d''amorçage'),
(4, 'Formulaire FOPRODI'),
(5, 'Formulaire ITP'),
(6, 'Formulaire PASRI'),
(7, 'Formulaire PCAM'),
(8, 'Formulaire PIRD'),
(9, 'Formulaire PMN'),
(10, 'Formulaire PNRI'),
(11, 'Formulaire PRF'),
(12, 'Formulaire RIICTIC'),
(13, 'Formulaire VRR');
My second table userdata:
CREATE TABLE IF NOT EXISTS `donnée_utilisateur` (
`id_d` int(20) NOT NULL AUTO_INCREMENT,
`id_form` int(20) NOT NULL,
`id_us` int(20) NOT NULL,
PRIMARY KEY (`id_d`),
KEY `id-form` (`id_form`),
KEY `id-us` (`id_us`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=407 ;
ALTER TABLE `donnée_utilisateur`
ADD CONSTRAINT `fvdsvsd` FOREIGN KEY (`id_us`) REFERENCES `utilisateur` (`id_us`),
ADD CONSTRAINT `ssssssssssss` FOREIGN KEY (`id_form`) REFERENCES `form` (`id_form`);
The data in it:
INSERT INTO `donnée_utilisateur` (`id_d`, `id_form`, `id_us`) VALUES
(380, 2, 6),
(381, 2, 6),
(382, 3, 6),
(383, 3, 6),
(384, 4, 6),
(385, 5, 6);
And finally the user table :
CREATE TABLE IF NOT EXISTS `utilisateur` (
`id_us` int(20) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id_us`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=8 ;
The data :
INSERT INTO `utilisateur` (`id_us`) VALUES
(3),
(6),
(7);
What I want to do is to get the id_form which doesn't exist in userdata table for a specific user.
I've tried to do it like this:
SELECT f.id_form
FROM `donnée_utilisateur` d
RIGHT JOIN `form` f ON f.id_form=d.id_form Where d.id_d IS NULL
This query leads to this result if we have that kind of data :
id_form
1
6
7
8
9
10
11
12
13
This is the expected result and it's correct. If I want this result for a specific user, I change it like this :
SELECT f.id_form
FROM `donnée_utilisateur` d
RIGHT JOIN `form` f ON f.id_form=d.id_form
INNER JOIN `utilisateur` u ON u.id_us=d.id_us Where d.id_d IS NULL AND id_us=6
I'm getting nothing, whereas the result should be like the one I just showed.
Let's take another example for id_us=7
SELECT f.id_form
FROM `donnée_utilisateur` d
RIGHT JOIN `form` f ON f.id_form=d.id_form
INNER JOIN `utilisateur` u ON u.id_us=d.id_us Where d.id_d IS NULL AND u. id_us=7
This should return every id_form from 1 to 13, because that user didn't insert any data.
Right joins are very hard to read and thus prone to errors. Usually you'd start with the table you must get data from and then left outer join tables you might get data from.
Let's look at your query:
You right join after table donnée_utilisateur, so donnée_utilisateur gets outer joined to the other tables.
The other tables are form and utilisateur. You have no join criteria combining the two, so you cross join them, i.e. combine every form with every utilisateur.
So to this cross join product you outer join donnée_utilisateur.
Where d.id_d IS NULL makes this an anti join. A trick used to replace a mere NOT EXISTS or NOT IN in DBMS that have weaknesses with these straight-forward methods. You use it to get all form / utilisateur combinations for which there is no entry in donnée_utilisateur. Probably many.
Where id_us=6 further narrows the results. Unfortunately you forgot to use a qualifier. Is it u.id_us or d.id_us? The DBMS cannot know. Let's say it decides you mean d.id_us. That field is always null, because you just dismissed all matches. d.id_us = 6 is never true, so all rows get discarded. Your result is empty. If the DBMS decided you mean u.id_us, you'd probably get results, particularly the same id_form over and over.
You may want to add the qualifier u, but I suggest you rather rewrite the whole query and use NOT IN or NOT EXISTS.
And what has utilisateur to do with your query anyway? I thought you were looking for forms for which user 6 has no entry in donnée_utilisateur. Why join utilisateur at all? (And if you join it, you should probably outer join it to donnée_utilisateur.)
You can do it with a subselect:
select id_form from form where
id_form not in (select distinct id_form from donnée_utilisateur where id_us=6 )
Or, since the RDBMS engine handles duplicates in the subselect correctly, without DISTINCT:
select id_form from form where
id_form not in (select id_form from donnée_utilisateur where id_us=6 )
Thorsten was very good in his clarification, but did not provide the completed query to help you. Your original right-join query was VERY close. However, I have switched to a left-join as follows:
SELECT
f.id_form,
f.nameform
from
form f
left join donnée_utilisateur d
ON f.id_form = d.id_form
AND d.id_us = 6
where
d.id_d IS NULL
I start with the form table to get the ID and the name. You want the forms that are not found in the secondary table, so that is a left join on the form ID with a test for NULL in the WHERE clause. By itself, though, that only finds forms that have no entry for ANY user. To restrict it to a specific user, add the AND condition to the ON clause of the left join, so that the join matches on the form AND the specific user; d.id_d is then NULL for every form that user has not filled in.
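A small sketch can confirm that the left-join form and a NOT EXISTS form return the same forms for the users in the question. It runs on Python's built-in sqlite3 module instead of MySQL, and the form names are shortened placeholders; both are assumptions made for the sake of a self-contained example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE form (id_form INTEGER PRIMARY KEY, nameform TEXT);
CREATE TABLE "donnée_utilisateur" (id_d INTEGER PRIMARY KEY, id_form INT, id_us INT);
INSERT INTO "donnée_utilisateur" VALUES
    (380,2,6), (381,2,6), (382,3,6), (383,3,6), (384,4,6), (385,5,6);
""")
# Placeholder names; only the 13 ids matter for this check.
conn.executemany("INSERT INTO form VALUES (?, ?)",
                 [(i, f"form {i}") for i in range(1, 14)])


def missing_forms_left_join(id_us):
    """Anti join: the user filter sits in the ON clause, not in WHERE."""
    rows = conn.execute("""
        SELECT f.id_form
        FROM form f
        LEFT JOIN "donnée_utilisateur" d
               ON d.id_form = f.id_form AND d.id_us = ?
        WHERE d.id_d IS NULL
        ORDER BY f.id_form
    """, (id_us,)).fetchall()
    return [r[0] for r in rows]


def missing_forms_not_exists(id_us):
    """Same question asked with NOT EXISTS."""
    rows = conn.execute("""
        SELECT f.id_form
        FROM form f
        WHERE NOT EXISTS (SELECT 1 FROM "donnée_utilisateur" d
                          WHERE d.id_form = f.id_form AND d.id_us = ?)
        ORDER BY f.id_form
    """, (id_us,)).fetchall()
    return [r[0] for r in rows]


print(missing_forms_left_join(6))   # [1, 6, 7, 8, 9, 10, 11, 12, 13]
print(missing_forms_not_exists(7))  # all ids from 1 to 13
```

Moving d.id_us = ? from the ON clause into WHERE would discard the NULL-extended rows and return nothing, which is the behaviour the asker observed.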

mysql running total as view

I am still an SQL greenhorn and am trying to convert this script, which builds a running total, into a view in MySQL:
DROP TABLE IF EXISTS `table_account`;
CREATE TABLE `table_account`
(
id int(11),
account int(11),
bdate DATE,
amount DECIMAL(10,2)
);
ALTER TABLE `table_account` ADD PRIMARY KEY(id);
INSERT INTO `table_account` VALUES (1, 1, '2014-01-01', 1.0);
INSERT INTO `table_account` VALUES (2, 1, '2014-01-02', 2.1);
INSERT INTO `table_account` VALUES (4, 1, '2014-01-02', 2.2);
INSERT INTO `table_account` VALUES (5, 1, '2014-01-02', 2.3);
INSERT INTO `table_account` VALUES (3, 1, '2014-01-03', 3.0);
INSERT INTO `table_account` VALUES (7, 1, '2014-01-04', 4.0);
INSERT INTO `table_account` VALUES (6, 1, '2014-01-06', 5.0);
INSERT INTO `table_account` VALUES (8, 1, '2014-01-07', 6.0);
SET @iruntot:=0.00;
SELECT
q1.account,
q1.bdate,
q1.amount,
(@iruntot := @iruntot + q1.amount) AS runningtotal
FROM
(SELECT
account AS account,
bdate AS bdate,
amount AS amount
FROM `table_account`
ORDER BY account ASC, bdate ASC) AS q1
This is much faster than building a sum over the whole history on each line.
The problems I cannot solve are:
SET in a view
Subquery in a view
I think it might be possible to use some kind of JOIN instead of "SET @iruntot:=0.00;"
and use two views to avoid the need for a subquery.
But I do not know how.
Will be happy for any hints to try.
Regards,
Abraxas
MySQL doesn't allow subqueries in the from clause for a view. Nor does it allow variables. You can do this with a correlated subquery, though:
SELECT q.account, q.bdate, q.amount,
(SELECT SUM(q2.amount)
FROM myview1 q2
WHERE q2.account < q.account OR
(q2.account = q.account AND q2.bdate <= q.bdate)
) AS runningtotal
FROM myview1 q;
Note that this assumes that the account/date column is unique -- no repeated dates for an account. Otherwise, the results will not be exactly the same.
Also, it seems a little strange that you are doing a running total across all accounts and dates. I might expect a running total within accounts, but this is how you formulated the query in the question.
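The uniqueness caveat shows up on the question's own data, which has three rows dated 2014-01-02. The sketch below runs the correlated subquery directly against table_account (not against a view) using Python's built-in sqlite3 module in place of MySQL. The second variant, which breaks ties on id, is an assumption about a reasonable ordering and not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE table_account (id INTEGER PRIMARY KEY, account INT, bdate TEXT, amount REAL);
INSERT INTO table_account VALUES
    (1, 1, '2014-01-01', 1.0), (2, 1, '2014-01-02', 2.1),
    (4, 1, '2014-01-02', 2.2), (5, 1, '2014-01-02', 2.3),
    (3, 1, '2014-01-03', 3.0), (7, 1, '2014-01-04', 4.0),
    (6, 1, '2014-01-06', 5.0), (8, 1, '2014-01-07', 6.0);
""")


def running_totals(same_account_condition):
    """Running total per row; the condition decides which rows count as 'earlier'."""
    sql = f"""
        SELECT (SELECT SUM(q2.amount)
                FROM table_account q2
                WHERE q2.account < q.account
                   OR (q2.account = q.account AND {same_account_condition}))
        FROM table_account q
        ORDER BY q.account, q.bdate, q.id
    """
    # Round in Python to hide floating-point noise from REAL sums.
    return [round(total, 2) for (total,) in conn.execute(sql)]


# As in the answer: rows sharing a date all receive the same total.
by_date = running_totals("q2.bdate <= q.bdate")

# Assumed tie-break on id, so every row gets its own step.
by_date_and_id = running_totals(
    "(q2.bdate < q.bdate OR (q2.bdate = q.bdate AND q2.id <= q.id))"
)

print(by_date)         # [1.0, 7.6, 7.6, 7.6, 10.6, 14.6, 19.6, 25.6]
print(by_date_and_id)  # [1.0, 3.1, 5.3, 7.6, 10.6, 14.6, 19.6, 25.6]
```

Both variants contain no variables and no subquery in the FROM clause, so either SELECT could serve as a view body under the restrictions described above.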

move the SQL database data from the old structure to the new structure using SQL

I am normalizing a MySQL database. I designed a new structure.
How do I move the database data from the old structure to the new structure using SQL?
Here is an example of normalizing tables in a script. I advise you to do something like this.
E.g. table tbl_tmpData:
Date, ProductName, ProductCode, ProductType, MarketDescription, Units, Value
2010-01-01, 'Arnotts Biscuits', '01', 'Biscuit', 'Store 1', 20, 20.00
2010-01-02, 'Arnotts Biscuits', '01', 'Biscuit', 'Store 2', 40, 40.00
2010-01-03, 'Arnotts Biscuits', '01', 'Biscuit', 'Store 3', 40, 40.00
2010-01-01, 'Cola', '02', 'Drink', 'Store 1', 40, 80.00
2010-01-02, 'Cola', '02', 'Drink', 'Store 2', 20, 40.00
2010-01-03, 'Cola', '02', 'Drink', 'Store 2', 60, 120.00
2010-01-01, 'Simiri Gum', '03', 'Gum', 'Store 1', 40, 80.00
2010-01-02, 'Simiri Gum', '03', 'Gum', 'Store 2', 20, 40.00
2010-01-03, 'Simiri Gum', '03', 'Gum', 'Store 3', 60, 120.00
You would create your Date table first:
CREATE TABLE tbl_Date
(
DateID int PRIMARY KEY IDENTITY(1,1)
,DateValue datetime
)
INSERT INTO tbl_Date (DateValue)
SELECT DISTINCT Date
FROM tbl_tmpData
WHERE Date NOT IN (SELECT DISTINCT DateValue FROM tbl_Date)
You would then create your Market table:
CREATE TABLE tbl_Market
(
MarketID int PRIMARY KEY IDENTITY(1,1)
,MarketName varchar(200)
)
INSERT INTO tbl_Market (MarketName)
SELECT DISTINCT MarketDescription
FROM tbl_tmpData
WHERE MarketDescription NOT IN (SELECT DISTINCT MarketName FROM tbl_Market)
You would then create your ProductType table:
CREATE TABLE tbl_ProductType
(
ProductTypeID int PRIMARY KEY IDENTITY(1,1)
,ProductType varchar(200)
)
INSERT INTO tbl_ProductType (ProductType)
SELECT DISTINCT ProductType
FROM tbl_tmpData
WHERE ProductType NOT IN (SELECT DISTINCT ProductType FROM tbl_ProductType)
You would then create your Product table:
CREATE TABLE tbl_Product
(
ProductID int PRIMARY KEY IDENTITY(1,1)
, ProductCode varchar(100)
, ProductDescription varchar(300)
,ProductType int
)
INSERT INTO tbl_Product (ProductCode, ProductDescription, ProductType)
SELECT DISTINCT tmp.ProductCode, tmp.ProductName, pt.ProductTypeID
FROM tbl_tmpData tmp
INNER JOIN tbl_ProductType pt ON tmp.ProductType = pt.ProductType
WHERE tmp.ProductCode NOT IN (SELECT DISTINCT ProductCode FROM tbl_Product)
You would then create your Data table:
CREATE TABLE tbl_Data
(
DataID int PRIMARY KEY IDENTITY(1,1)
, DateID int
, ProductID int
, MarketID int
,Units decimal(10,5)
, value decimal(10,5)
)
INSERT INTO tbl_Data (DateID, ProductID, MarketID, Units, Value)
SELECT t.DateID
, p.ProductID
, m.MarketID
, SUM(tmp.Units)
, SUM(tmp.VALUE)
FROM tbl_tmpData tmp
INNER JOIN tbl_Date t ON tmp.Date = t.DateValue
INNER JOIN tbl_Product p ON tmp.ProductCode = p.ProductCode
INNER JOIN tbl_Market m ON tmp.MarketDescription = m.MarketName
GROUP BY t.DateID, p.ProductID, m.MarketID
ORDER BY t.DateID, p.ProductID, m.MarketID
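The whole sequence can be tried end to end on the sample rows. The sketch below is a reduced version, under a few assumptions: it runs on Python's built-in sqlite3 module (INTEGER PRIMARY KEY takes the place of the identity columns), it skips the ProductType table to stay short, and it drops the NOT IN guards because the target tables start empty.

```python
import sqlite3

sample = [
    ("2010-01-01", "Arnotts Biscuits", "01", "Biscuit", "Store 1", 20, 20.00),
    ("2010-01-02", "Arnotts Biscuits", "01", "Biscuit", "Store 2", 40, 40.00),
    ("2010-01-03", "Arnotts Biscuits", "01", "Biscuit", "Store 3", 40, 40.00),
    ("2010-01-01", "Cola", "02", "Drink", "Store 1", 40, 80.00),
    ("2010-01-02", "Cola", "02", "Drink", "Store 2", 20, 40.00),
    ("2010-01-03", "Cola", "02", "Drink", "Store 2", 60, 120.00),
    ("2010-01-01", "Simiri Gum", "03", "Gum", "Store 1", 40, 80.00),
    ("2010-01-02", "Simiri Gum", "03", "Gum", "Store 2", 20, 40.00),
    ("2010-01-03", "Simiri Gum", "03", "Gum", "Store 3", 60, 120.00),
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for the real server
conn.executescript("""
CREATE TABLE tbl_tmpData (Date TEXT, ProductName TEXT, ProductCode TEXT,
                          ProductType TEXT, MarketDescription TEXT,
                          Units INT, Value REAL);
CREATE TABLE tbl_Date (DateID INTEGER PRIMARY KEY, DateValue TEXT);
CREATE TABLE tbl_Market (MarketID INTEGER PRIMARY KEY, MarketName TEXT);
CREATE TABLE tbl_Product (ProductID INTEGER PRIMARY KEY, ProductCode TEXT,
                          ProductDescription TEXT);
CREATE TABLE tbl_Data (DataID INTEGER PRIMARY KEY, DateID INT, ProductID INT,
                       MarketID INT, Units INT, Value REAL);
""")
conn.executemany("INSERT INTO tbl_tmpData VALUES (?, ?, ?, ?, ?, ?, ?)", sample)

conn.executescript("""
-- Lookup tables first: one row per distinct value in the flat table.
INSERT INTO tbl_Date (DateValue) SELECT DISTINCT Date FROM tbl_tmpData;
INSERT INTO tbl_Market (MarketName)
    SELECT DISTINCT MarketDescription FROM tbl_tmpData;
INSERT INTO tbl_Product (ProductCode, ProductDescription)
    SELECT DISTINCT ProductCode, ProductName FROM tbl_tmpData;

-- Fact table last: join back to the lookups to swap values for their ids.
INSERT INTO tbl_Data (DateID, ProductID, MarketID, Units, Value)
SELECT t.DateID, p.ProductID, m.MarketID, SUM(tmp.Units), SUM(tmp.Value)
FROM tbl_tmpData tmp
JOIN tbl_Date t ON tmp.Date = t.DateValue
JOIN tbl_Product p ON tmp.ProductCode = p.ProductCode
JOIN tbl_Market m ON tmp.MarketDescription = m.MarketName
GROUP BY t.DateID, p.ProductID, m.MarketID;
""")

counts = {name: conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
          for name in ("tbl_Date", "tbl_Market", "tbl_Product", "tbl_Data")}
total_units = conn.execute("SELECT SUM(Units) FROM tbl_Data").fetchone()[0]

print(counts)       # {'tbl_Date': 3, 'tbl_Market': 3, 'tbl_Product': 3, 'tbl_Data': 9}
print(total_units)  # 340
```

Comparing totals such as SUM(Units) between the flat table and the new fact table is a cheap sanity check that no rows were lost or duplicated by the joins.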
I have recently done this and have some insights into how the general procedure can be carried out.
Start by modelling your data. When you start with a database that is not normalized, you need to create a proper model to transfer your data to. This includes identifying atomic objects that should live in their own tables. Identify duplicated data and determine where it should go. Also identify all relationships that exist in your data structure.
An optional step. The database usually goes together with an interface that probably also needs updating. Look at that design in this step too, and decide whether there are any isolated parts that can wait, both in the data structure and in the interface program. How much should be included is determined by practical aspects such as time and budget. Maybe some parts do not need modification just yet.
It can also be an option to start completely from scratch, skip backwards compatibility and let there be two parallel systems.
Write a script that adds all new columns and tables that the normalized data requires.
Write another script that transfers the non-normalized data to the new normalized data structure. This is the trickiest part, I would say, and it can be rather messy, depending on how bad a shape the old data is in.
Enforce all constraints from the model on the new normalized data by adding constraints to the new tables and columns. This is also best done in a script. Here you will see whether your data migration succeeded. If it did, you will be able to add all the constraints. If it did not, some constraint will fail, and you will have to go back and look at what went wrong.
Finally, make yet another script that deletes all columns and tables that were removed in the new model. By doing this, you will easily identify all places in the interface that need updating. Anything that talks to these columns and tables will have to be updated in the interface.
A general tip is to do all development against a (possibly reduced) copy of the database. E.g. in MySQL you can make an SQL dump using, for example, Workbench and test your scripts on that. You will probably need a few iterations on the database before the migration works. Likewise, rehearse the actual migration on a copy of the database, so as not to break anything in production.

SUM data based on a Group By statement for another table

I am trying to create a query that allows me to get the sum of a total stored in one table based on values in another table.
Specifically, I have one table called 'winning_bids', that I want to join with another table, called 'objects'. 'winning_bids' contains a User ID, and an Object ID (primary key of 'objects' table). The 'objects' table contains an Object ID, and the value of the object. I want to sum the value from the 'objects' table for each user, grouped by the User ID from the 'winning_bids' table.
I tried something like this, but it does not work:
SELECT SUM(o.value) AS total, w.uid
FROM winning_bids w
LEFT JOIN objects o ON (o.id = w.oid)
GROUP BY w.uid
This statement merely returns all of the User IDs, but with the total for only the first User ID in each row.
Any help would be appreciated, thanks.
It works fine for me.
Here is what I did to test your query:
CREATE TABLE winning_bids (uid INT NOT NULL, oid INT NOT NULL);
INSERT INTO winning_bids (uid, oid) VALUES
(1, 1),
(1, 2),
(2, 3);
CREATE TABLE objects (id INT NOT NULL, value INT NOT NULL);
INSERT INTO objects (id, value) VALUES
(1, 1),
(2, 20),
(3, 300);
SELECT SUM(o.value) AS total, w.uid
FROM winning_bids w
LEFT JOIN objects o ON (o.id = w.oid)
GROUP BY w.uid;
Result:
total uid
21 1
300 2
If you still think it doesn't work, can you please post example input data that gives the wrong result when you run your query, and also specify what you believe the correct result should be?