Is un-normalised data always contained in a single table? - mysql

For an assignment I have created a database driven web application. I have to show my understanding of normalisation by showing my database in de-normalised form, and then normalising it gradually, explaining what was done at each stage.
The normalisation process at stages 1 to 3 (which is as far as we have to go) I have no trouble understanding.
My database contains 20+ tables and I don't know how I am supposed to represent this in 0NF. The main difficulty is that, as I have understood it, 0NF data is in a single table. In fact, I don't see any way around this, because 0NF has no primary keys, and therefore there would be no way to reference data in other tables.
Am I right in thinking this? Or can I represent 0NF data in multiple tables, which would make this task a lot easier as I wouldn't have a 100+ column table.

0NF is a single table - like a spreadsheet of data. You wouldn't reference any other tables, you would simply repeat the data in the one table.
For example, imagine a messaging system:
Customer | Recipient | Message
Bob      | John      | Hello John
John     | Bob       | Hello
Bob      | John      | Have you got time to answer a question?
John     | Bob       | No way
We don't have a Person table to link to. Instead, we repeat Bob or John in the Customer column and in the Recipient column.
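To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module (standing in for MySQL; the SQL is generic). It loads the flat 0NF table above, then splits it into a hypothetical person table and a message table that reference people by id, treating the Customer column as the sender. Joining the two back together reproduces the original rows, which is what a lossless normalisation step should guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 0NF-style: one flat table, names repeated in every row.
conn.execute("CREATE TABLE messages_0nf (customer TEXT, recipient TEXT, message TEXT)")
rows = [
    ("Bob", "John", "Hello John"),
    ("John", "Bob", "Hello"),
    ("Bob", "John", "Have you got time to answer a question?"),
    ("John", "Bob", "No way"),
]
conn.executemany("INSERT INTO messages_0nf VALUES (?, ?, ?)", rows)

# Normalised (hypothetical names): each person stored once,
# messages point at people by id instead of repeating names.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
conn.execute("""CREATE TABLE message (
    id INTEGER PRIMARY KEY,
    sender_id INTEGER NOT NULL REFERENCES person(id),
    recipient_id INTEGER NOT NULL REFERENCES person(id),
    body TEXT NOT NULL)""")
conn.execute("INSERT INTO person (name) SELECT customer FROM messages_0nf "
             "UNION SELECT recipient FROM messages_0nf")
conn.execute("""INSERT INTO message (sender_id, recipient_id, body)
    SELECT s.id, r.id, m.message
    FROM messages_0nf m
    JOIN person s ON s.name = m.customer
    JOIN person r ON r.name = m.recipient""")

# Joining the normalised tables gives back exactly the original rows.
rebuilt = conn.execute("""SELECT s.name, r.name, m.body
    FROM message m
    JOIN person s ON s.id = m.sender_id
    JOIN person r ON r.id = m.recipient_id""").fetchall()
people = [name for (name,) in conn.execute("SELECT name FROM person ORDER BY name")]
print(people)                           # ['Bob', 'John']
print(sorted(rebuilt) == sorted(rows))  # True
```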

0NF data can occur in multiple tables, each of which may be 0NF, but one table for everything is the worst form.
This may very well be the kind of assignment where you first have to mess up your spontaneous solution, so you can show the process of making it better.

You mean "unnormalized", not "de-normalized". The latter is when normalized base tables are replaced by others whose values are always the join of the originals. You need to find out from whoever gave you the assignment whether unnormalized form here means your first design attempt or specifically a "universal relation" that is an appropriate join of all those. That would be de-normalizing.
Every base table and query result holds the rows that make some predicate (statement parameterized by columns) into a true proposition (statement).
query SELECT * FROM EMP
"employee [E] is named [N] and has dependent [D]"
query SELECT * FROM DEP
"employee [E] works for department [D]"
query SELECT E, N FROM EMP
for some D, "employee [E] is named [N] and has dependent [D]"
("employee [E] is named [N] and has some dependent")
An SQL FROM makes a temporary table that you can think of as having columns T.C for each column C of each table T. For inner JOINs (i.e. INNER, CROSS and plain JOIN) this temporary table is a cross join. Its predicate is the AND of the predicates of the joined tables. ON and WHERE conditions are also ANDed into the predicate. The SELECT clause renames the temporary columns so there are no "."s. (Although SQL does that implicitly if there's no ambiguity.)
query SELECT EMP.E AS E, N, DEP.D AS D FROM EMP JOIN DEP ON EMP.E = DEP.E
for some EMP.D and DEP.E, "employee [EMP.E] is named [EMP.N] and has dependent [EMP.D]"
AND "employee [DEP.E] works for department [DEP.D]"
AND EMP.E = DEP.E
(i.e. "employee [E] is named [N] and has some dependent and works for department [D]")
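A small runnable sketch of that reasoning, using Python's sqlite3 module in place of MySQL and made-up sample rows: without ON the FROM is a plain cross join (every EMP row paired with every DEP row), and the ON condition ANDs EMP.E = DEP.E into the predicate so only matching pairs survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EMP: "employee [E] is named [N] and has dependent [D]"
conn.execute("CREATE TABLE EMP (E TEXT, N TEXT, D TEXT)")
# DEP: "employee [E] works for department [D]"
conn.execute("CREATE TABLE DEP (E TEXT, D TEXT)")
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)",
                 [("e1", "Ann", "Tim"), ("e2", "Raj", "Sue")])
conn.executemany("INSERT INTO DEP VALUES (?, ?)",
                 [("e1", "Sales"), ("e2", "IT")])

# No ON: the temporary table is the cross join, 2 x 2 = 4 rows.
cross = conn.execute("SELECT EMP.E, DEP.E FROM EMP JOIN DEP").fetchall()

# ON ANDs "EMP.E = DEP.E" into the predicate.
joined = conn.execute("""
    SELECT EMP.E AS E, N, DEP.D AS D
    FROM EMP JOIN DEP ON EMP.E = DEP.E
    ORDER BY EMP.E""").fetchall()
print(len(cross))  # 4
print(joined)      # [('e1', 'Ann', 'Sales'), ('e2', 'Raj', 'IT')]
```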
Note that for querying it doesn't matter what constraints hold (including UNIQUE, PRIMARY KEY, FOREIGN KEY and CHECK). Constraints just tell you that tables are limited in the values they will ever hold. In fact, the constraints are determined by the predicates and the situations that can arise.
If you know that it's always the case that T1.C = T2.C for some column C of tables T1 & T2, then you only have to SELECT one of them, AS C. If every same-named column C is always equal across the joined tables, then NATURAL JOIN does the appropriate = and the AS without your having to mention any columns.
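The "every same-named column" condition matters. In the EMP/DEP example, D means dependent in one table and department in the other, so a NATURAL JOIN would also demand dependent = department. The sketch below (Python's sqlite3 standing in for MySQL, sample rows made up) shows the explicit ON version returning the intended row, the NATURAL JOIN returning nothing, and a NATURAL JOIN over renamed columns working again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (E TEXT, N TEXT, D TEXT)")  # D = dependent
conn.execute("CREATE TABLE DEP (E TEXT, D TEXT)")          # D = department
conn.execute("INSERT INTO EMP VALUES ('e1', 'Ann', 'Tim')")
conn.execute("INSERT INTO DEP VALUES ('e1', 'Sales')")

# Explicit: only E is known to be equal, so compare E and keep one copy of it.
explicit = conn.execute(
    "SELECT EMP.E AS E, N, DEP.D AS D FROM EMP JOIN DEP ON EMP.E = DEP.E").fetchall()

# NATURAL JOIN equates every shared column name, here both E and D.
natural = conn.execute("SELECT * FROM EMP NATURAL JOIN DEP").fetchall()

# After renaming, the only shared name is E, which really is always equal.
renamed = conn.execute("""
    SELECT a.E, N, DEPENDENT, DEPT
    FROM (SELECT E, N, D AS DEPENDENT FROM EMP) AS a
    NATURAL JOIN (SELECT E, D AS DEPT FROM DEP) AS b""").fetchall()

print(explicit)  # [('e1', 'Ann', 'Sales')]
print(natural)   # []
print(renamed)   # [('e1', 'Ann', 'Tim', 'Sales')]
```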
PS The single-base version of a database is not a base whose value is the FULL (OUTER) JOIN of separate bases. First, normalization does not deal with NULLs, so you would have to remove them from any OUTER JOIN result, more or less giving you your tables back. Second, FULL JOIN is in general not associative, i.e. (T1 FULL JOIN T2) FULL JOIN T3 <> T1 FULL JOIN (T2 FULL JOIN T3), so there is no such thing as "the FULL JOIN of more than two tables". Third, even with just two tables, their FULL JOIN does not in general allow you to reconstruct their values.
PPS There is no "0th Normal Form". There are different uses of "1st Normal Form". Sometimes it just means being a relation, sometimes it means being a relation with no relation-valued attributes, and it is also frequently used in various other confused or nonsensical ways that are really about aspects of good design.

Related

How to join 2 sql tables where one table contains multiple values in a single column

Currently, this is what my SELECT code looks like:
SELECT student.stu_code, user.f_name, user.l_name
FROM user
INNER JOIN student
ON student.stu_code = user.user_id
INNER JOIN course
ON course.stu_code ?????;
Basically, to elaborate: the student table inherits from the user table, therefore user_id = stu_code. What I'm confused about is how to join the course table with the student table.
Let's say that the course table has a course code (PK), a few other attributes and a stu_code column. However, the stu_code column holds multiple values in a single VARCHAR column, to represent that multiple students are taking the course.
Example: Student table has stu_code string value of '123' and course table has a stu_code with string value of '123, 246, 369'.
How would I go about joining these two tables together and separating the stu_code in the course table so that it represents 3 separate stu_code values -> i.e. '123', '246', '369'.
Any help is greatly appreciated!
however, the student code column has multiple values inside a single column to represent that multiple students are taking the course and stored as VARCHAR.
Your data model is broken. Put your effort into fixing the data model. You want a junction/association table courseStudents or perhaps enrolled, with columns like:
stu_code (foreign key to students)
course_code (foreign key to courses)
enrollment_date
and so on
What is wrong with your data model? Here are a few things:
You are storing numbers as a string.
You are putting multiple values into a string column.
You cannot define foreign key relationships.
SQL has poor string handling capabilities.
SQL has a great way to store lists of things. It is not called "string". It is called "table".
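A minimal sketch of such a junction table, using Python's sqlite3 module so it runs without a MySQL server (SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; MySQL's InnoDB enforces them by default). The table name enrolled and the sample codes follow the answer; the course title is made up. Each (student, course) pair is one row, and the foreign key now rejects a student code that does not exist, which the comma-separated string could never do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FKs are off by default
conn.executescript("""
CREATE TABLE student (stu_code INTEGER PRIMARY KEY);
CREATE TABLE course  (course_code TEXT PRIMARY KEY, title TEXT);
-- One row per (student, course) pair instead of a comma-separated string.
CREATE TABLE enrolled (
    stu_code        INTEGER NOT NULL REFERENCES student(stu_code),
    course_code     TEXT    NOT NULL REFERENCES course(course_code),
    enrollment_date TEXT,
    PRIMARY KEY (stu_code, course_code)
);
INSERT INTO student VALUES (123), (246), (369);
INSERT INTO course VALUES ('ABC', 'Databases');
INSERT INTO enrolled (stu_code, course_code)
    VALUES (123, 'ABC'), (246, 'ABC'), (369, 'ABC');
""")

codes = [c for (c,) in conn.execute(
    "SELECT stu_code FROM enrolled WHERE course_code = 'ABC' ORDER BY stu_code")]
print(codes)  # [123, 246, 369]

# The foreign key rejects an enrollment for a student that does not exist.
try:
    conn.execute("INSERT INTO enrolled (stu_code, course_code) VALUES (999, 'ABC')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```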
Your data model is, if not broken, at least hindering you from elegant solutions.
You cannot join your two tables efficiently. While they might both contain strings, they do not contain data with the same rules. Thus you must transform the data in order to join them. You could do this in a few ways; one way is to use a regular expression function.
You can use it to test whether the stu_code matches one of the codes in the list. Further, you can do this dynamically, constructing the test pattern itself from the values on the left and right sides of the join.
join based on REGEXP
SELECT student.stu_code, user.f_name, user.l_name
FROM user
INNER JOIN student
ON student.stu_code = user.user_id
INNER JOIN course
ON course.stu_code REGEXP CONCAT('[[:<:]]', student.stu_code, '[[:>:]]');
Assuming tables and data:
Student
stu_code
--------
123

Course
stu_code
---------------
'123, 246, 369'
Example:
http://sqlfiddle.com/#!9/672b57f/4
about the regular expression
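In MySQL the regex syntax can be a little bit different. [[:<:]] and [[:>:]] are the Henry Spencer notation for the beginning and end of a word (word boundaries).
If you have a new enough version of MySQL/MariaDB you can use the more typical \b notation instead. MySQL 8.0.4 and later use the ICU library, which does not support [[:<:]] and [[:>:]] at all; there you must use \b, doubling the backslash inside a string literal ('\\b').
More about that here: https://dev.mysql.com/doc/refman/8.0/en/regexp.html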
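Here is a runnable sketch of the word-boundary join, using Python's sqlite3 module instead of MySQL. SQLite has no built-in REGEXP, so the sketch registers one backed by Python's re module (SQLite turns X REGEXP Y into a call regexp(Y, X)). The pattern is built from the single student code and tested against the list; the extra student '12' is made-up data showing that the boundaries stop it from matching inside '123'.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: supply REGEXP ourselves; "X REGEXP Y" calls regexp(Y, X).
conn.create_function("REGEXP", 2,
                     lambda pattern, value: re.search(pattern, value) is not None)
conn.executescript("""
CREATE TABLE student (stu_code TEXT);
CREATE TABLE course  (course_code TEXT, stu_code TEXT);
INSERT INTO student VALUES ('123'), ('12'), ('246');
INSERT INTO course  VALUES ('ABC', '123, 246, 369');
""")

# Pattern = \b<single code>\b, tested against the comma-separated list.
# In MySQL 8.0.4+ this would read:
#   ON course.stu_code REGEXP CONCAT('\\b', student.stu_code, '\\b')
matches = conn.execute(r"""
    SELECT student.stu_code, course.course_code
    FROM student
    JOIN course ON course.stu_code REGEXP '\b' || student.stu_code || '\b'
    ORDER BY student.stu_code""").fetchall()
print(matches)  # [('123', 'ABC'), ('246', 'ABC')]
```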
about efficiency
In large datasets the performance will be awful: you will have to scan all records and evaluate the function on every one of them. In a large set you might get some gains by also joining on LIKE (which is cheaper than REGEXP). LIKE is much faster at filtering out the obvious non-matches, and the REGEXP can then deal with filtering in the true matches.
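The sketch below illustrates only the semantics of that combination, not its speed; whether MySQL actually evaluates the LIKE first, and how much it saves, is up to its optimizer. It again uses Python's sqlite3 with a REGEXP function supplied via the re module, and made-up course rows. The substring LIKE lets a false positive through ('12' appears inside '123'), and the regex then keeps only whole-code matches.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2,
                     lambda pattern, value: re.search(pattern, value) is not None)
conn.executescript("""
CREATE TABLE student (stu_code TEXT);
CREATE TABLE course  (course_code TEXT, stu_code TEXT);
INSERT INTO student VALUES ('12');
INSERT INTO course  VALUES ('ABC', '123, 246, 369'), ('XYZ', '12, 500');
""")

# Cheap substring test: also matches '12' inside '123' (false positive).
like_only = conn.execute("""
    SELECT course.course_code FROM student
    JOIN course ON course.stu_code LIKE '%' || student.stu_code || '%'
    ORDER BY course.course_code""").fetchall()

# Same candidates, then the regex confirms whole-word matches only.
like_then_regexp = conn.execute(r"""
    SELECT course.course_code FROM student
    JOIN course ON course.stu_code LIKE '%' || student.stu_code || '%'
               AND course.stu_code REGEXP '\b' || student.stu_code || '\b'
    ORDER BY course.course_code""").fetchall()

print(like_only)         # [('ABC',), ('XYZ',)]
print(like_then_regexp)  # [('XYZ',)]
```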
Perhaps your model was based upon an assumption of having a courses table with very few rows?
It is ironic, because you have made your course table unnecessarily large. You would actually be better off with an intermediary table that represents the many-to-many nature (the fact that students can take many courses and courses can have many students) with 1 row per unique relationship. While this table would be an order of magnitude "longer", it would be leaner, it could be indexed, and query performance would be faster.
The courses table does not need to have any awareness of the student list and thus you can alter courses by removing courses.stu_code once you change the model (aside: It might be useful if courses cached a hint of the expected student count for that course)
possible link table
would be a new table like this (note how it only ever needs these 2 columns)
stu_course_lnk
stu_code | course_id
---------+----------
123      | ABC
124      | ABC
...
123      | XYZ
...
124      | LMN
then you add joins of
...
student.stu_code = stu_course_lnk.stu_code
and
stu_course_lnk.course_id = course.id
...
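Put together, those two join conditions chain student to course through the link table. A minimal sketch with Python's sqlite3 module standing in for MySQL: the link rows come from the table above, while the student names and course titles are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (stu_code INTEGER PRIMARY KEY, f_name TEXT);
CREATE TABLE course  (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE stu_course_lnk (
    stu_code  INTEGER NOT NULL REFERENCES student(stu_code),
    course_id TEXT    NOT NULL REFERENCES course(id),
    PRIMARY KEY (stu_code, course_id)   -- one row per unique relationship
);
INSERT INTO student VALUES (123, 'Ann'), (124, 'Raj');
INSERT INTO course  VALUES ('ABC', 'Databases'), ('XYZ', 'Networks'), ('LMN', 'Compilers');
INSERT INTO stu_course_lnk VALUES (123, 'ABC'), (124, 'ABC'), (123, 'XYZ'), (124, 'LMN');
""")

# Students taking course ABC, via the link table.
rows = conn.execute("""
    SELECT student.stu_code, student.f_name, course.id
    FROM student
    JOIN stu_course_lnk ON student.stu_code = stu_course_lnk.stu_code
    JOIN course         ON stu_course_lnk.course_id = course.id
    WHERE course.id = 'ABC'
    ORDER BY student.stu_code""").fetchall()
print(rows)  # [(123, 'Ann', 'ABC'), (124, 'Raj', 'ABC')]
```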

How do I find relations between tables that are long-distance related? MySQL

I have a problem with finding relations between tables ps_product and ps_carrier from a prestashop database. The schema is available at http://doc.prestashop.com/display/PS16/Fundamentals+of+PrestaShop+Development.
I need to make an update by joining these two tables in my shop but I'm struggling with finding good keys. How do I compose my query?
Tables represent business relationships/associations. The "relation[ship]s" you mention are FKs (foreign keys), which are not needed for querying. They state that subrow values for some columns must also be subrow values for some key columns. What is needed is to know what a row says about the current business situation when it is in a table. (That meaning, together with which situations can arise, determines the FKs and other constraints.)
From Required to join 2 tables with their FKs in a 3rd table:
Every base table comes with a predicate--a statement template parameterized by column names. The table value is the rows that make its predicate into a true statement.
A query also has a predicate. Its value also is the rows that make its predicate true. Its predicate is built up according to its FROM, WHERE and other clauses.
(CROSS or INNER) JOIN puts AND between predicates; UNION puts OR between them; EXCEPT puts AND NOT between them; ON and WHERE AND in a condition; when SELECT drops a column C from T, it puts FOR SOME (value for) C in front of T's predicate. (Etc. for other operators.)
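A small sketch of that operator-to-logic mapping, using two made-up tables with stated predicates and Python's sqlite3 module to run the SQL. SELECT DISTINCT is used for the FOR SOME case because plain SQL SELECT can return duplicate rows. (MySQL itself only accepts EXCEPT from 8.0.31; older versions need NOT EXISTS or LEFT JOIN ... IS NULL.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- likes(P, F): "person [P] likes food [F]"
CREATE TABLE likes (P TEXT, F TEXT);
-- cooks(P, F): "person [P] can cook food [F]"
CREATE TABLE cooks (P TEXT, F TEXT);
INSERT INTO likes VALUES ('ann', 'soup'), ('ann', 'pie'), ('bob', 'soup');
INSERT INTO cooks VALUES ('ann', 'soup'), ('bob', 'cake');
""")

def q(sql):
    """Run a query and return its rows in a stable order."""
    return sorted(conn.execute(sql).fetchall())

# AND: "P likes F AND P can cook F"
both = q("SELECT l.P, l.F FROM likes l JOIN cooks c ON l.P = c.P AND l.F = c.F")
# OR: "P likes F OR P can cook F"
either = q("SELECT P, F FROM likes UNION SELECT P, F FROM cooks")
# AND NOT: "P likes F AND NOT P can cook F"
likes_only = q("SELECT P, F FROM likes EXCEPT SELECT P, F FROM cooks")
# FOR SOME F: dropping F gives "P likes some food"
likers = q("SELECT DISTINCT P FROM likes")

print(both)        # [('ann', 'soup')]
print(likes_only)  # [('ann', 'pie'), ('bob', 'soup')]
print(likers)      # [('ann',), ('bob',)]
```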
So given
-- rows where product [id_product] is supplied by [id_supplier] ...
ps_product(id_product, id_supplier, ...)
-- rows where carrier [id_carrier] has reference [id_reference] ...
ps_carrier(id_carrier, id_reference, ....)
we write
ps_product p
JOIN ...
ON p.id_product = ...
...
JOIN ps_carrier c
ON ... = c.id_carrier
WHERE ...
to get rows where
product [p.id_product] is supplied by [p.id_supplier] ...
AND ...
AND p.id_product = ...
...
AND carrier [c.id_carrier] has reference [c.id_reference] ...
AND ... = c.id_carrier
AND ...
You need to know your tables' predicates, then JOIN tables together with ON or WHERE conditions so that the resulting predicate is the one for the rows you want back.
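As an illustration of building the predicate across an intermediate table, here is a sketch with Python's sqlite3 module. It uses the two table predicates above with only the columns they mention, plus a HYPOTHETICAL link table (hypothetical_link) whose assumed predicate is "product [id_product] may ship with carrier reference [id_reference]". This is not PrestaShop's actual schema; check the schema documentation for the real table and columns that fill the "..." above. The data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- product [id_product] is supplied by [id_supplier]
CREATE TABLE ps_product (id_product INTEGER PRIMARY KEY, id_supplier INTEGER);
-- carrier [id_carrier] has reference [id_reference]
CREATE TABLE ps_carrier (id_carrier INTEGER PRIMARY KEY, id_reference INTEGER);
-- HYPOTHETICAL: product [id_product] may ship with carrier reference [id_reference]
CREATE TABLE hypothetical_link (id_product INTEGER, id_reference INTEGER);
INSERT INTO ps_product VALUES (1, 10), (2, 20);
INSERT INTO ps_carrier VALUES (100, 7), (101, 8);
INSERT INTO hypothetical_link VALUES (1, 7), (2, 8), (2, 7);
""")

# Rows where product p is supplied by p.id_supplier
#   AND product l.id_product may ship with reference l.id_reference
#   AND carrier c has reference c.id_reference
#   AND p.id_product = l.id_product AND l.id_reference = c.id_reference
#   AND p.id_supplier = 20
rows = conn.execute("""
    SELECT p.id_product, c.id_carrier
    FROM ps_product p
    JOIN hypothetical_link l ON p.id_product = l.id_product
    JOIN ps_carrier c        ON l.id_reference = c.id_reference
    WHERE p.id_supplier = 20
    ORDER BY c.id_carrier""").fetchall()
print(rows)  # [(2, 100), (2, 101)]
```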
Is there any rule of thumb to construct SQL query from a human-readable description?

Querying multiple SQL tables

I'm having trouble understanding joins and subqueries and when to use each. I'm sure that one of them is appropriate here.
I have a table ("owners") of (to keep things simple) unit numbers, names and email addresses.
I have another table ("widgets") of unit numbers and the number of widgets assigned to each unit. Each unit has 0, 1 or 2 widgets.
I need to send an email to each unit depending on whether they have 0, 1 or 2 widgets. In other words (and in plain English, not even remotely an attempt at semi-correct SQL):
select numwidgets from widgets where unit=x
then where owners.unit = widgets.unit
select unit, name, email
The data that I need to pass to my script will look like this:
unit | name      | email           | widgets
1    | Bob Smith | bob@example.com | 2
I can visualise in my mind the data that I need, but it's extracting it from two different tables that is the problem. The "owners" table is a permanent table, and the "widgets" table is a temporary one for tracking a specific issue that is being addressed in the email I'm sending. I don't need help sending the email, just creating the SQL I need to use to extract the data (numwidgets, name, email) for one email.
Thanks.
EDIT:
Input data:
owners table:
unit, name, email
1,Bob Smith, bob@example.com
widgets table:
unit,widgets
1,2
Try this. An inner join selects the rows from both tables for which there is a match between the joined columns.
Subqueries (also known as inner queries or nested queries) are a tool for performing operations in multiple steps. For example, if you wanted to take the sums of several columns, then average all of those values, you’d need to do each aggregation in a distinct step.
select owners.unit, name, email, widgets.numwidgets
from owners
inner join widgets On owners.unit = widgets.unit
where owners.unit = x
For your case you need an inner join. To understand that you need to see the concept of keys which is pretty simple.
In your tables, unit is the common column, so the join has to be on this column.
Subqueries are used when the output of one query is given as input to another query, i.e. to perform related operations in steps.
select o.unit, o.name, o.email, w.numwidgets
from owners o
inner join widgets w on o.unit = w.unit
where w.numwidgets = X
In the above query, pass X = 0, 1 or 2 depending on which group of owners you want.
Thanks
I think you want:
select o.*, w.widgets
from owners o
inner join widgets w
on o.unit = w.unit
where o.unit = 123;
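A runnable sketch of the inner join, using Python's sqlite3 module in place of MySQL and the column names from the question's edit (unit, name, email; unit, widgets). The second owner is made-up data. It also shows one assumption worth checking: if a unit with 0 widgets simply has no row in the widgets table, an inner join drops that owner, while a LEFT JOIN with COALESCE reports them with 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE owners  (unit INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE widgets (unit INTEGER PRIMARY KEY, widgets INTEGER);
INSERT INTO owners  VALUES (1, 'Bob Smith', 'bob@example.com'),
                           (2, 'Ann Lee',   'ann@example.com');
INSERT INTO widgets VALUES (1, 2);
""")

# Inner join: data for one email, unit passed as a parameter.
inner = conn.execute("""
    SELECT o.unit, o.name, o.email, w.widgets
    FROM owners o
    JOIN widgets w ON o.unit = w.unit
    WHERE o.unit = ?""", (1,)).fetchall()
print(inner)  # [(1, 'Bob Smith', 'bob@example.com', 2)]

# Assumption: units absent from "widgets" have 0 widgets.
all_units = conn.execute("""
    SELECT o.unit, o.name, o.email, COALESCE(w.widgets, 0) AS widgets
    FROM owners o
    LEFT JOIN widgets w ON o.unit = w.unit
    ORDER BY o.unit""").fetchall()
print(all_units)
# [(1, 'Bob Smith', 'bob@example.com', 2), (2, 'Ann Lee', 'ann@example.com', 0)]
```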

SQL Query to populate table based on PK of Main Table being joined

Here is my Database structure (basic relations):
I'm attempting to formulate a single query that will populate the clients_ID, Job_id, tech_id and Part_id with their corresponding values and return all the work orders present. Nothing more, nothing less.
Thus far I've struggled to generate this Query:
SELECT cli.client_name, tech.tech_name, job.Job_Name, w.wo_id, w.time_started, w.part_id, w.job_id, w.tech_id, w.clients_id, part.Part_name
FROM work_orders as w, technicians as tech, clients as cli, job_types as job, parts_list as part
LEFT JOIN technicians as techy ON tech_id = techy.tech_name
LEFT JOIN parts_list party ON part.part_id = party.Part_Name
LEFT JOIN job_types joby ON job_id = joby.Job_Name
LEFT JOIN clients cliy ON clients_id = cliy.client_name
Apparently, once all the joining happens it does not even populate the correct foreign key values according to their reference.
[Some values came out as the actual foreign key id, not even the corresponding value.]
It just goes on for about 20-30 rows, depending on the largest of the tables above.
I only have two work orders created, so ideally it should return just TWO records, with the correct information in each column. What could I be doing wrong? I haven't been using MySQL for long, but am learning as much as I can.
Your join conditions are wrong. Join an id to an id (for example techy.tech_id = w.tech_id), not an id to a name (tech_id = techy.tech_name). It looks like you do this for all your joins, so they all need to be fixed.
I really don't follow the text of your question, so I am basing my answer solely on your query.
Edit
Replying to your comment here. You said you want to "load up" the tech name column. I assume you mean you want tech name to be part of your result set.
The SELECT part of the query is what determines the columns that are in the result set. As long as the table where the column lives is referenced in the FROM/JOIN clauses, you can SELECT any column from that table.
Think of a JOIN statement as a way to "look up" a value in one table based on a value in another table. This is a very simplified definition, but it's a good way to start thinking about it. You want tech name in your result set, so you look it up in the Technicians table, which is where it lives. However, you want to look it up by a value that you have in the Work Orders table. The key (which is actually called a foreign key) that you have in the Work Orders table that relates it to the Technicians table is the tech_id. You use the tech_id to look up the related row in the Technicians table, and by doing so can include any column in that table in your result set.
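A sketch of the "look up" idea applied to the question's query, using Python's sqlite3 module in place of MySQL. Only work_orders appears in FROM, and each lookup table is joined once on its id. The id column names in the lookup tables (tech_id, clients_id, job_id, part_id) and all the data are assumptions, since the schema image is not available. The sketch also counts the rows of the comma-separated FROM list from the question: commas there form a cross join, which is a likely reason for the inflated row count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE technicians (tech_id INTEGER PRIMARY KEY, tech_name TEXT);
CREATE TABLE clients     (clients_id INTEGER PRIMARY KEY, client_name TEXT);
CREATE TABLE job_types   (job_id INTEGER PRIMARY KEY, Job_Name TEXT);
CREATE TABLE parts_list  (part_id INTEGER PRIMARY KEY, Part_name TEXT);
CREATE TABLE work_orders (wo_id INTEGER PRIMARY KEY, time_started TEXT,
    tech_id INTEGER, clients_id INTEGER, job_id INTEGER, part_id INTEGER);
INSERT INTO technicians VALUES (1, 'Tina'), (2, 'Tom'), (3, 'Ted');
INSERT INTO clients     VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO job_types   VALUES (1, 'Repair'), (2, 'Install');
INSERT INTO parts_list  VALUES (1, 'Fan'), (2, 'Fuse'), (3, 'Cable');
INSERT INTO work_orders VALUES (1, '09:00', 1, 1, 1, 2), (2, '13:30', 2, 2, 2, 3);
""")

# Comma-separated FROM = cross join: 2 * 3 * 2 * 2 * 3 rows.
cross_count = conn.execute("""SELECT COUNT(*) FROM work_orders, technicians,
    clients, job_types, parts_list""").fetchone()[0]

# One row per work order: each FK column looks up its matching id column.
orders = conn.execute("""
    SELECT w.wo_id, cli.client_name, tech.tech_name, job.Job_Name, part.Part_name
    FROM work_orders AS w
    LEFT JOIN technicians AS tech ON tech.tech_id   = w.tech_id
    LEFT JOIN clients     AS cli  ON cli.clients_id = w.clients_id
    LEFT JOIN job_types   AS job  ON job.job_id     = w.job_id
    LEFT JOIN parts_list  AS part ON part.part_id   = w.part_id
    ORDER BY w.wo_id""").fetchall()
print(cross_count)  # 72
print(orders)
# [(1, 'Acme', 'Tina', 'Repair', 'Fuse'), (2, 'Globex', 'Tom', 'Install', 'Cable')]
```

LEFT JOIN keeps a work order even if one of its lookups has no match; with valid foreign keys, INNER JOIN gives the same two rows.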

Best structure for tables with more than 10000 columns

I am applying a group of data mining algorithms to a dataset comprised of a set of customers along with a large number of descriptive attributes that summarize various aspects of their past behavior. There are more than 10,000 attributes, each stored as a column in a table with the customer id as the primary key. For several reasons, it is necessary to pre-compute these attributes rather than calculating them on the fly. I generally try to select customers with a specified attribute set. The algorithms can combine any arbitrary number of these attributes together in a single SELECT statement and join the required tables. All the tables have the same number of rows (one per customer).
I am wondering what's the best way to structure these tables of attributes. Is it better to group the attributes into tables of 20-30 columns, requiring more joins on average but fewer columns per SELECT, or have tables with the maximum number of columns to minimize the number of joins, but having potentially all 10K columns joined at once?
I also thought of using one giant 3-column customerID-attribute-value table and storing all the info there, but it would be harder to structure the "select all customers with these attributes" type of query that I need.
I'm using MySQL 5.0+, but I assume this is a general SQL-ish question.
In my experience, using tables with 10,000 columns is a very, very bad idea. (MySQL has a hard limit of 4096 columns per table, and InnoDB's limit is lower still.) And what if this number increases in the future?
If there are a lot of attributes, you shouldn't use horizontally scaled tables (with a large number of columns). You should create a new attributes table and put the attribute values into it, then connect this table to the main customer table with a many-to-one relationship.
Another option may be a NoSQL system such as MongoDB.
As @odiszapc said, you have to use a meta-model structure (often called entity-attribute-value, or EAV), for instance:
CREATE TABLE customer(ID INT NOT NULL PRIMARY KEY, NAME VARCHAR(64));
CREATE TABLE customer_attribute(ID INT NOT NULL, ID_CUSTOMER INT NOT NULL, NAME VARCHAR(64), VALUE VARCHAR(1024));
Return basic information about a given customer:
SELECT * FROM customer WHERE name='John';
Return customer(s) matching certain attributes:
SELECT c.*
FROM customer c
INNER JOIN customer_attribute a1 ON a1.id_customer = c.id
AND a1.name = 'address'
AND a1.value = '1078, c/ los gatos madrileños'
INNER JOIN customer_attribute a2 ON a2.id_customer = c.id
AND a2.name = 'age'
AND a2.value = '27';
Your generator should generate the inner joins on the fly.
Proper indexes on the tables should make this run relatively fast (although with 10k attributes per customer and 10k customers, i.e. 100 million attribute rows, that is quite a challenge).
10,000 columns is a lot. The SELECT statement will be very long and messy unless you use *. I think you can narrow the attributes down to the most useful and meaningful ones, eliminating the others.
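A minimal sketch of such a generator, using Python's sqlite3 module in place of MySQL and the customer/customer_attribute layout from this answer. The composite primary key (one value per attribute per customer), the index and the sample data are assumptions added for illustration. The function adds one INNER JOIN per requested (attribute, value) pair; only the loop counter goes into the alias names, while attribute names and values are passed as bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE customer_attribute (
    ID_CUSTOMER INTEGER NOT NULL REFERENCES customer(ID),
    NAME  TEXT NOT NULL,
    VALUE TEXT,
    PRIMARY KEY (ID_CUSTOMER, NAME)   -- assumption: one value per attribute
);
-- Supports the NAME = ? AND VALUE = ? lookups generated below.
CREATE INDEX attr_name_value ON customer_attribute (NAME, VALUE);
INSERT INTO customer VALUES (1, 'John'), (2, 'Maria');
INSERT INTO customer_attribute VALUES
    (1, 'age', '27'), (1, 'city', 'Madrid'),
    (2, 'age', '27'), (2, 'city', 'Lisbon');
""")

def find_customers(criteria):
    """Return customers having every (attribute name -> value) pair in criteria."""
    sql = ["SELECT c.ID, c.NAME FROM customer c"]
    params = []
    for i, (name, value) in enumerate(criteria.items()):
        sql.append(f"JOIN customer_attribute a{i} ON a{i}.ID_CUSTOMER = c.ID "
                   f"AND a{i}.NAME = ? AND a{i}.VALUE = ?")
        params += [name, value]
    sql.append("ORDER BY c.ID")
    return conn.execute(" ".join(sql), params).fetchall()

print(find_customers({"age": "27"}))                    # [(1, 'John'), (2, 'Maria')]
print(find_customers({"age": "27", "city": "Madrid"}))  # [(1, 'John')]
```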