A client needs to migrate a large volume of data and I feel this question could be generic enough for SO.
Legacy system
Student profiles contain fields like names, emails, etc., as well as the university name. The university name is stored as a string, so it is repeated in every row, which is wasteful and slow.
Our new form
A more efficient design is a separate table called university that stores each university name only once, keyed by university_id; the students table then holds university_id as a foreign key, and the HTML dropdown just POSTs the university_id to the server. This makes things like GROUP BY queries much faster. New form data going into the database works fine.
The problem
How can we write a query that INSERTs all the other columns (first_name, last_name, email, ...) but, rather than inserting the university string, looks up its university_id in the university table and INSERTs the corresponding int instead? (Scenario: the data is in a CSV file that we will manipulate into INSERT INTO syntax.)
Many thanks.
Use INSERT INTO ... SELECT with a LEFT JOIN. LEFT is chosen so that a student record won't be discarded if its university_name is NULL or has no match in the university table.
INSERT INTO students_new(first_name, last_name, email, university_id)
SELECT s.first_name, s.last_name, s.email, u.university_id
FROM students_old s
LEFT JOIN university u ON s.university_name = u.university_name;
Table and column names are to be replaced with your real ones. The above assumes that your new student table holding the foreign key to university is students_new, while the old one (from before normalisation) is students_old.
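If the university table hasn't been populated yet, it can be seeded from the distinct names in the old table first (a sketch using the same table and column names as above, assuming university_id is auto-generated):

```sql
-- Seed the lookup table with each distinct, non-null university name once
INSERT INTO university (university_name)
SELECT DISTINCT university_name
FROM students_old
WHERE university_name IS NOT NULL;
```

Run this before the INSERT INTO ... SELECT so every non-null name finds a match in the join.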
Related
My lecturer has given me a document with raw data in it, and I have to create tables and implement it in MySQL.
Background: The raw data has multiple transactions from companies. Some transactions are from the same company so the same COMPANYID comes up often.
The lecturer told us to insert the raw data into a table titled RAW to store everything. Then we have to insert data into our smaller tables from that RAW table.
ISSUE:
My issue is when I’m trying to create my table COMPANY, I obviously want to include the COMPANYID.
But when I use the code
Insert into COMPANY
Select distinct COMPANYID, COMPANYNAME, NumberofDivisions
From RAW;
I get the duplicate error because obviously in the RAW table, the same COMPANYID comes up multiple times for each transaction!
How can I only have the COMPANYID once in the COMPANY table?
Try INSERT IGNORE instead of INSERT; it keeps inserting rows all the way to the end even when a duplicate-key error occurs, simply skipping the duplicates.
Insert ignore into COMPANY
Select distinct COMPANYID, COMPANYNAME, NumberofDivisions
From RAW;
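Note that SELECT DISTINCT de-duplicates whole rows, so if the same COMPANYID ever appears with, say, a different spelling of COMPANYNAME, INSERT IGNORE will silently keep whichever row arrives first. One possible alternative is to collapse to exactly one row per COMPANYID explicitly:

```sql
-- One row per COMPANYID; MIN() picks a deterministic value when the
-- name or division count varies across transactions for the same company
INSERT INTO COMPANY (COMPANYID, COMPANYNAME, NumberofDivisions)
SELECT COMPANYID, MIN(COMPANYNAME), MIN(NumberofDivisions)
FROM RAW
GROUP BY COMPANYID;
```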
I have a table called addresses that stores N:N sub-information for two other tables, people and companies.
So I created another table called address_links with the fields address_id and contact_id, where the contact_id can come from either a people record or a companies record.
So I need a third field that, when the record is created, points to the correct table. But how can I automate interpreting that field when I write a query that lists the names of the owners of those addresses? I tried using IF to branch on that third field and select the right table, but it did not work.
Explain Note: Some addresses may be the house or workplace of more than one person.
If you need to link to different tables, then normally you would want a separate link table for each. This allows the database to enforce referential integrity and eliminates the need for special IF statements. It also makes the database easier for another developer to understand, without requiring knowledge of special implementation details.
Example Tables (columns):
Addresses (AddressId, Address, City, State, Zip, ...)
Persons (PersonId, FirstName, ...)
Companies (CompanyId, Name, ...)
AddressPersonLinks (AddressId, PersonId)
AddressCompanyLinks (AddressId, CompanyId)
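A minimal sketch of the two link tables (assuming INT primary keys on the parent tables; adjust types and names to your schema):

```sql
CREATE TABLE AddressPersonLinks (
    AddressId INT NOT NULL,
    PersonId  INT NOT NULL,
    -- composite key: an address can be shared by several persons,
    -- but each pairing appears only once
    PRIMARY KEY (AddressId, PersonId),
    FOREIGN KEY (AddressId) REFERENCES Addresses (AddressId),
    FOREIGN KEY (PersonId)  REFERENCES Persons (PersonId)
);

CREATE TABLE AddressCompanyLinks (
    AddressId INT NOT NULL,
    CompanyId INT NOT NULL,
    PRIMARY KEY (AddressId, CompanyId),
    FOREIGN KEY (AddressId) REFERENCES Addresses (AddressId),
    FOREIGN KEY (CompanyId) REFERENCES Companies (CompanyId)
);
```

The composite primary keys also cover the note above: one address can link to many persons, and vice versa.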
I have an E-R model with the following entities: Doctor, Patient, Human, and there is a generalization relationship between Doctor -> Human and Patient -> Human. I am trying to create a relational model. So which model is correct: the first one or the second one?
1)
Human (Name, Surname, Sex, Address)
Doctor(License number, specification)
Patient(Insurance number, diagnosis)
2)
Doctor(Name, Surname, Sex, Address, License number, specification)
Patient(Name, Surname, Sex, Address, Insurance number, diagnosis)
and the entity Human is not necessary.
P.S. I am new to relational models.
This depends largely on your requirements. Can doctors be patients as well? If yes, you need a base table like human; in that case doctor and patient should each have a foreign key to human.
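A sketch of that foreign-key variant (column names here are assumptions based on the model above):

```sql
CREATE TABLE human (
    human_id serial PRIMARY KEY,
    name     text,
    surname  text,
    sex      char(1),
    address  text
);

-- Each doctor/patient row points at exactly one human;
-- the same human_id may appear in both tables.
CREATE TABLE doctor (
    human_id       integer PRIMARY KEY REFERENCES human (human_id),
    license_number integer,
    specification  text
);

CREATE TABLE patient (
    human_id         integer PRIMARY KEY REFERENCES human (human_id),
    insurance_number integer,
    diagnosis        text
);
```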
Another option would be to use table inheritance if your DBMS supports that. In PostgreSQL you could e.g. do:
create table human
(
name text,
surname text,
sex char(1),
address text
);
create table doctor
(
license_number integer,
specification text
)
inherits (human);
create table patient
(
insurance_number integer,
diagnosis text
)
inherits (human);
Both models are 'correct'. But there's a difference in inserting and querying the data.
The first one is a good choice when you want to persist your data normalized (see http://en.wikipedia.org/wiki/Database_normalization for more information about normalization). But when you want to query all doctors or all patients, for example, you have to join two tables.
Also, when you insert a new object you have to insert two rows (one into Human and one into Doctor/Patient).
When you work with an OR mapper, which transforms your data into objects, you can use polymorphism because both share a common base (Human).
The second possibility is a good choice when speed matters.
It will be faster when you want to query all doctors/patients because no joins are necessary.
But it will be slower when you want to query, e.g., all addresses of doctors and patients together.
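To make the trade-off concrete, here is the same "list all doctors" query in each model (a sketch; human_id is an assumed join key for model 1, and license_number an assumed column name):

```sql
-- Model 1: personal details live in Human, so a join is required
SELECT h.name, h.surname, d.license_number
FROM Human h
JOIN Doctor d ON d.human_id = h.human_id;

-- Model 2: everything is in one table, so no join is needed
SELECT name, surname, license_number
FROM Doctor;
```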
I have a contact management system and an SQL dump of my contacts, with five or six columns of data in it, that I want to import into three specific tables. Wondering what the best way to go about this is. I have already uploaded the sql dump... it's a single table now in my contact management database.
Of the tables in the CRM, the companies table requires only the contactID... and the songs table requires:
companyID,
contactID,
date added (not required) and
notes (not required)
Then there is the third table, the contacts table, which only requires contact_name.
I have already uploaded data to each of the three tables (not sure whether my order was correct on this), but I now need to match the data in the fourth table (originally the sql dump) against the other three and update everything with its unique identifier.
Table Structures:
+DUMP_CONTACTS
id <<< I don't need this ID; the IDs given to each row in the CRM are the important ones.
contact_name
company
year
event_name
event_description
====Destination Tables====
+++CONTACTS TABLE++
*contactID < primary key
*contact_name
+++COMPANIES TABLE+++
*companyID < primary key
*name
*contact_ID
*year
++++Events++++
*EventID < primary key
*companyID
*contactID
*eventname
*description
There are parts of your post that I still don't understand, so I'm going to give you the SQL; you can run it in a testing environment and we can take it from there and/or go back and start again:
-- Populate CONTACTS_TABLE with contact_name from uploaded dump
INSERT INTO CONTACTS_TABLE (contact_name)
SELECT contact_name FROM DUMP_CONTACTS;

-- Populate COMPANIES with information from both CONTACTS_TABLE + dump
INSERT INTO COMPANIES (name, contact_ID, year)
SELECT d.company, c.contactID, d.year
FROM DUMP_CONTACTS AS d
INNER JOIN CONTACTS_TABLE AS c
ON d.contact_name = c.contact_name;

-- Populate SONGS_TABLE with info from COMPANIES
INSERT INTO SONGS_TABLE (companyID, contactID)
SELECT cm.companyID, cm.contact_ID
FROM COMPANIES AS cm;

-- Populate Events with info from COMPANIES + dump
INSERT INTO Events (companyID, contactID, eventname, description)
SELECT cm.companyID, cm.contact_ID, d.event_name, d.event_description
FROM DUMP_CONTACTS AS d
INNER JOIN COMPANIES AS cm
ON d.company = cm.name;
I first populate CONTACTS_TABLE and then, since contactID is required for records in COMPANIES, insert records into COMPANIES by joining CONTACTS_TABLE to the dump. SONGS_TABLE takes its data directly from COMPANIES, and lastly Events gets its data by joining COMPANIES to the dump.
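After running the inserts, a couple of sanity checks can confirm nothing was dropped along the way (row counts are the quickest signal):

```sql
-- These two counts should match if every dump row produced a contact
SELECT COUNT(*) FROM DUMP_CONTACTS;
SELECT COUNT(*) FROM CONTACTS_TABLE;

-- Dump rows whose contact_name found no match (should return no rows)
SELECT d.contact_name
FROM DUMP_CONTACTS AS d
LEFT JOIN CONTACTS_TABLE AS c ON d.contact_name = c.contact_name
WHERE c.contactID IS NULL;
```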
I am currently writing my first PHP application and I would like to know how to design and implement MySQL views properly.
In my particular case, user data is spread across several tables (as a consequence of database normalization) and I was thinking of using a view to group the data into one large table:
CREATE VIEW `Users_Merged` (
name,
surname,
email,
phone,
role
) AS (
SELECT name, surname, email, phone, 'Customer'
FROM `Customer`
)
UNION (
SELECT name, surname, email, tel, 'Admin'
FROM `Administrator`
)
UNION (
SELECT name, surname, email, tel, 'Manager'
FROM `manager`
);
This way I can use the view's data from the PHP app easily, but I don't really know how much this can affect performance.
For example:
SELECT * from `Users_Merged` WHERE role = 'Admin';
Is this the right way to filter the view's data, or should I filter BEFORE creating the view itself?
(I need this to have a list of users and the ability to filter them by role.)
EDIT
Specifically, what I'm trying to obtain is denormalization of three tables into one. Is my solution correct?
See Denormalization on wikipedia
In general, the database engine will perform the optimization for you. That means that the engine is going to figure out that the users table needs to be filtered before being joined to the other tables.
So, go ahead and use your view and let the database worry about it.
If you detect poor performance later, use MySQL EXPLAIN to get MySQL to tell you what it's doing.
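For instance, prefixing the example query with EXPLAIN shows how MySQL executes it against the view:

```sql
EXPLAIN SELECT * FROM `Users_Merged` WHERE role = 'Admin';
```

Look at the select_type and rows columns in the output; DERIVED entries with large row estimates suggest the union is being materialized into a temporary table before the role filter is applied.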
PS: Your data design allows for only one role per user, is that what you wanted? If so, and if the example query you gave is one you intend to run frequently, make sure to index the role column in users.
If you have <1000 users (which seems likely), it doesn't really matter how you do it. If the user list is unlikely to change for long periods of time, the best you can probably do in terms of performance is to load the user list into memory and not go to the database at all. Even if user data were to change in the meantime, you could update the in-memory structure as well as the database and, again, not have to read user information from the DB.
You would probably be much better off normalizing the Administrators, Customers, Managers and what-have-you into one uniform User table with a discriminator column "Role". That would save a lot of duplication, which is essentially the reason to do normalization in the first place. You can then keep the role-specific details in separate tables that you join to the User table.
Your query could then look as simple as:
SELECT
`Name`, `Surname`, `Email`, `Phone`, `Role`
FROM `User`
WHERE
`User`.`Role` IN('Administrator','Manager','Customer', ...)
Which is also easier for the database to process than a set of unions.
If you go a step further you could add a UserRoleCoupling table (instead of the Role column in User) that holds all the roles each User has:
CREATE TABLE `UserRoleCoupling` (
UserID INT NOT NULL, -- assuming your User table has an ID column of type INT
RoleID INT NOT NULL,
PRIMARY KEY(UserID, RoleID)
);
And put the actual role information into a separate table as well:
CREATE TABLE `Role` (
ID INT NOT NULL UNIQUE AUTO_INCREMENT,
Name VARCHAR(64) NOT NULL,
PRIMARY KEY (Name)
);
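Assigning roles then becomes a matter of inserting coupling rows (the IDs below are assumptions for illustration):

```sql
INSERT INTO `Role` (Name) VALUES ('Administrator'), ('Manager'), ('Customer');

-- Give user 1 two roles by inserting two coupling rows
INSERT INTO `UserRoleCoupling` (UserID, RoleID)
VALUES (1, 1), (1, 2);
```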
Now you can have multiple roles per User and use queries like
SELECT
`U`.`Name`
,`U`.`Surname`
,`U`.`Email`
,`U`.`Phone`
,GROUP_CONCAT(`R`.`Name`) `Roles`
FROM `User` `U`
INNER JOIN `UserRoleCoupling` `UGC` ON `UGC`.`UserID` = `U`.`ID`
INNER JOIN `Role` `R` ON `R`.`ID` = `UGC`.`RoleID`
GROUP BY
`U`.`Name`, `U`.`Surname`, `U`.`Email`, `U`.`Phone`;
Which would give you the basic User details and a comma-separated list of all assigned Role names.
In general, the best way to normalize a database structure is to make the tables as generic as possible without being redundant. So don't add administrator- or customer-specific details to the User table; instead use a relationship between User and Administrator to look up the specific administrator details. The way you're doing it now isn't really normalized.
I'll see if i can find my favorite book on database normalization and post the ISBN when I have time later.