A client needs to migrate a large volume of data and I feel this question could be generic enough for SO.
Legacy system
Student profiles contain fields like names, emails etc, as well as university name. The university name is represented as a string and as such is repeated which is wasteful and slow.
Our new form
A more efficient solution is to have a table called university that only stores the university name once with a foreign key (university_id) and the HTML dropdown just POSTs the university_id to the server. This makes things much faster for doing GROUP BY queries, for example. New form data going into the database works fine.
The problem
How can we write a query that will INSERT all the other columns (first_name, last_name, email, ...) but then rather than inserting the university string, find out its university_id from the university table and INSERT the corresponding int instead of the original string? (scenario: data is in a CSV file that we will manipulate into INSERT INTO syntax)
Many thanks.
Use INSERT INTO ... SELECT with a LEFT JOIN. A LEFT JOIN is chosen so that a student record is not discarded when its university_name is NULL or has no match in the university table; such rows simply get a NULL university_id.
INSERT INTO students_new(first_name, last_name, email, university_id)
SELECT s.first_name, s.last_name, s.email, u.university_id
FROM students_old s
LEFT JOIN university u ON s.university_name = u.university_name
Replace the table and column names with your real ones. The query above assumes that the new students table, which holds the foreign key to university, is students_new, and that the old table (from before normalisation) is students_old.
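To see the LEFT JOIN behaviour end to end, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the idea of first loading the CSV into a students_old staging table are assumptions for illustration; the migration query itself is the one from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE university (
        university_id   INTEGER PRIMARY KEY,
        university_name TEXT NOT NULL UNIQUE
    );
    -- Staging table: assumed to be loaded from the CSV as-is.
    CREATE TABLE students_old (
        first_name TEXT, last_name TEXT, email TEXT, university_name TEXT
    );
    CREATE TABLE students_new (
        first_name TEXT, last_name TEXT, email TEXT,
        university_id INTEGER REFERENCES university(university_id)
    );
    INSERT INTO university (university_name) VALUES ('MIT'), ('Oxford');
    -- Made-up sample data; the last student has no university.
    INSERT INTO students_old VALUES
        ('Ann', 'Lee',  'ann@example.com', 'Oxford'),
        ('Bob', 'Ray',  'bob@example.com', 'MIT'),
        ('Cat', 'Moss', 'cat@example.com', NULL);
""")

# The migration: swap the university string for its id.
conn.execute("""
    INSERT INTO students_new (first_name, last_name, email, university_id)
    SELECT s.first_name, s.last_name, s.email, u.university_id
    FROM students_old s
    LEFT JOIN university u ON s.university_name = u.university_name
""")

migrated = conn.execute(
    "SELECT first_name, university_id FROM students_new ORDER BY first_name"
).fetchall()
print(migrated)
```

With an INNER JOIN the third student would be dropped; with the LEFT JOIN she is kept with a NULL university_id.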
I want to model a database to store data for several types of tournaments (with different modes: single rounds, double rounds, league, league + playoffs, losers, ...).
Maybe this project would be a kind of Challonge: www.challonge.com
My question is: how do I create a model in a relational SQL database to store all these types of tournaments?
I can't imagine how to do this. There are a lot of different tables, but all of them are related to one attribute: tournamentType...
Can I store a tournamentType field and use this field to select the appropriate table in a query?
Thank you
I can understand why you're struggling with modeling this. One of the key reasons this is difficult is the object-relational impedance mismatch. While I am a huge fan of SQL and it is an incredibly powerful way of organizing data, one of its downfalls - and why NoSQL exists - is that SQL is different from Object Oriented Programming. When you describe leagues, with different matches, it's pretty easy to picture this in object form: A Match object is extended by League_Match, Round_Match, Knockout_Match, etc. Each of these Match objects contains two Team objects. Team can be extended to Winner and Loser...
But this is not how SQL databases work.
So let's translate this into relationships:
I want to model a database to store data for several types of tournaments (with different modes: single rounds, double rounds, league, league + playoffs, losers, ...).
Tournaments and "modes" are a one to many (1:n) relationship.
Each tournament has many teams, and each team can be part of many tournaments (n:n).
Each team has many matches, and each match has two teams (n:n).
Each tournament has multiple matches but each match only belongs to one tournament (1:n).
The missing piece, which is hard to define as a universal relationship, is how teams are assigned to future matches:
- In rounds, each future match has two teams.
- In knockout matches, each future match has an exponential but shrinking number of choices depending on the number of initial teams.
You could define this in the database layer or you could define this in your application layer. If your goal is to keep referential integrity in mind (which is one of the key reasons I use SQL databases) then you'll want to keep it in the database.
Another way of looking at this: I find that it is easiest for me to design a database when I think about the end result by thinking of it as JSON (or an array, if you prefer) that I can interact with.
Let's look at some sample objects:
Tournament:
[
  {
    name: "team A",
    schedule: [
      {
        date: "11/1/15",
        vs: "team B",
        score1: 2,
        score2: 4
      },
      {
        date: "11/15/15",
        vs: "team C"
      }
    ]
  },
  // more teams
]
As I see it, this works well for everything except for knockout, where you don't actually know which team is going to play which other team until an elimination takes place. This confirms my feeling that we're going to create descendants of a Tournament class to handle specific types of tournaments.
Therefore I'd recommend three core tables plus two sub-tables, with the following columns:
Tournament
- id (int, PK)
- tournament_name
- tournament_type
Team
- id (int, PK)
- team_name (varchar, not null)
# Any other team columns you want.
Match
- id (int, PK, autoincrement)
- date (int)
- team_a_score (int, null)
- team_b_score (int, null)
- status (either future, past, or live)
- tournament_id (int, Foreign Key)
Match_Round
- match_id (int, not null, foreign key to match.id)
- team_a_id (int, not null, foreign key to team.id)
- team_b_id (int, not null, foreign key to team.id)
Match_Knockout
- match_id (int, not null, foreign key to match.id)
- winner_a_of (match_id, not null, foreign key to match.id)
- winner_b_of (match_id, not null, foreign key to match.id)
This model uses sub-tables. The benefit is that knockout matches and round/league matches are very different, and the model treats them differently. The downside is the additional complexity you will have to handle. It may be a bit annoying, but in my experience trying to avoid it only adds more headaches and makes the design far less scalable.
Now I'll go back to referential integrity. The challenge with this setup is that theoretically you could have values in both Match_Round and Match_Knockout when they only belong in one. To prevent this, I'd utilize TRIGGERs. Basically, stick a trigger on both the Match_Round and Match_Knockout tables, which prevents an INSERT if the tournament_type is not acceptable.
Although this is a bit of a hassle to set up, it does have the happy benefit of being easy to translate into objects while still maintaining referential integrity.
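As a rough illustration of the trigger idea, here is a minimal sketch in SQLite, driven from Python's sqlite3 module. The trigger name, the sample rows and the 'knockout' type value are assumptions, and the Match table is called matches here only because MATCH is an SQL keyword. Trigger syntax differs between databases, so treat this as a sketch rather than something to paste into MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tournament (
        id INTEGER PRIMARY KEY,
        tournament_name TEXT NOT NULL,
        tournament_type TEXT NOT NULL
    );
    CREATE TABLE matches (
        id INTEGER PRIMARY KEY,
        tournament_id INTEGER NOT NULL REFERENCES tournament(id)
    );
    CREATE TABLE match_knockout (
        match_id    INTEGER PRIMARY KEY REFERENCES matches(id),
        winner_a_of INTEGER NOT NULL REFERENCES matches(id),
        winner_b_of INTEGER NOT NULL REFERENCES matches(id)
    );

    -- Reject knockout rows whose match is not in a knockout tournament.
    CREATE TRIGGER knockout_type_guard
    BEFORE INSERT ON match_knockout
    FOR EACH ROW
    WHEN (SELECT t.tournament_type
          FROM matches m JOIN tournament t ON t.id = m.tournament_id
          WHERE m.id = NEW.match_id) IS NOT 'knockout'
    BEGIN
        SELECT RAISE(ABORT, 'match does not belong to a knockout tournament');
    END;

    INSERT INTO tournament VALUES (1, 'City League', 'league'),
                                  (2, 'City Cup',    'knockout');
    INSERT INTO matches VALUES (1, 1), (2, 2), (3, 2), (4, 2);
""")

def add_knockout_match(match_id, winner_a_of, winner_b_of):
    """Return True if the row was inserted, False if the database refused it."""
    try:
        conn.execute("INSERT INTO match_knockout VALUES (?, ?, ?)",
                     (match_id, winner_a_of, winner_b_of))
        return True
    except sqlite3.DatabaseError:
        return False

accepted = add_knockout_match(4, 2, 3)  # match 4 is in the knockout cup
rejected = add_knockout_match(1, 2, 3)  # match 1 is a league match
print(accepted, rejected)
```

A mirror-image trigger on Match_Round would cover the other direction.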
You could create tables to hold tournament types, league types and playoff types, and have a schedule table showing an event name along with its tournament type, and then use that relationship to retrieve information about that tournament. Note that this is not MySQL; it is more generic SQL:
CREATE TABLE tournTypes (
ID int autoincrement primary key,
leagueId int constraint foreign key references leagueTypes.ID,
playoffId int constraint foreign key references playoffTypes.ID
--...other attributes would necessitate more tables
)
CREATE TABLE leagueTypes(
ID int autoincrement primary key,
noOfTeams int,
noOfDivisions int,
interDivPlay bit -- e.g. a flag indicating if teams in different divisions would play
)
CREATE TABLE playoffTypes(
ID int autoincrement primary key,
noOfTeams int,
isDoubleElim bit -- e.g. flag if it is double elimination
)
CREATE TABLE Schedule(
ID int autoincrement primary key,
Name text,
startDate datetime,
endDate datetime,
tournId int constraint foreign key references tournTypes.ID
)
Populating the tables...
INSERT INTO tournTypes VALUES
(1,2),
(1,3),
(2,3),
(3,1)
INSERT INTO leagueTypes VALUES
(16,2,0), -- 16 teams, 2 divisions, teams only play within own division
(8,1,0),
(28,4,1)
INSERT INTO playoffTypes VALUES
(8,0), -- 8 teams, single elimination
(4,0),
(8,1)
INSERT INTO Schedule VALUES
('Champions league','2015-12-10','2016-02-10',1),
('Rec league','2015-11-30','2016-03-04',2)
Getting info on a tournament...
SELECT Name
,startDate
,endDate
,l.noOfTeams as LeagueSize
,p.noOfTeams as PlayoffTeams
,case p.isDoubleElim when 0 then 'Single' when 1 then 'Double' end as Elimination
FROM Schedule s
INNER JOIN tournTypes t
ON s.tournId = t.ID
INNER JOIN leagueTypes l
ON t.leagueId = l.ID
INNER JOIN playoffTypes p
ON t.playoffId = p.ID
It's easy to make data models far more complex than they need to be. A lot of what you describe is business logic that can't actually be answered by a perfect data model. Most of the tournament logic should be captured outside the data model in a programming language, such as mysql functions, Java, Python, C# etc. Really your data model should be all "static" data you need, and none of the moving parts. I would suggest the data model to be:
METADATA TABLES
League_Type:
Id
Description
Playoff_Rounds
Resolve_Losing_Teams
Max_Number_of_Teams
Min_Number_of_Teams
Number_Of_Games_In_Season
any other "settings" you want...
Game_Type:
Id
League_Type_Id (fk to League_Type)
Game_Type_Name (e.g. regular season, playoff, championship)
DATA TABLES
League:
Id
League_Type_Id (fk to League_Type)
League_Name
Team:
Id
League_Id (fk to League)
Team_Name
Game:
Id
League_Id (fk to League)
Game_Type_Id (fk to Game_Type)
Home_Team_Id (fk to Team)
Visiting_Team_Id (fk to Team)
Week_of_season
Home_Team_Score
Visiting_Team_Score
Winning_Team (Home or Visitor)
From a data model perspective that should really be all you need. The procedural code should handle things like:
Creating games based on a randomized schedule
Updating scores and winning team in the Game table
Creating playoff games once the number of games in the season has been reached, per the league settings table.
Setting matchups in the playoffs based on how many games each team has won.
Forcing the number of teams in a league to be between Min_Number_of_Teams and Max_Number_of_Teams prior to the season beginning.
Etc.
You'll also likely want to create some views based on these tables to create some other meaningful information for end users:
Wins/Losses for a team (based on the Team table joined to the Game table)
Current team standings based on the previous wins/losses view for all teams
Home wins for a team
Road wins for a team
Anything else your heart desires!
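As an illustration of the first two views, here is a minimal sketch in SQLite via Python's sqlite3 module. The view name team_record, the sample teams and the 'Home'/'Visitor' encoding of Winning_Team are assumptions based on the column list above; the tables are trimmed to the columns the view needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (id INTEGER PRIMARY KEY, team_name TEXT NOT NULL);
    CREATE TABLE game (
        id INTEGER PRIMARY KEY,
        home_team_id     INTEGER NOT NULL REFERENCES team(id),
        visiting_team_id INTEGER NOT NULL REFERENCES team(id),
        winning_team     TEXT  -- 'Home', 'Visitor', or NULL if not played yet
    );

    -- Wins/losses per team: Team joined to every Game the team took part in.
    CREATE VIEW team_record AS
    SELECT t.id AS team_id, t.team_name,
           SUM(CASE WHEN (g.home_team_id = t.id AND g.winning_team = 'Home')
                      OR (g.visiting_team_id = t.id AND g.winning_team = 'Visitor')
                    THEN 1 ELSE 0 END) AS wins,
           SUM(CASE WHEN (g.home_team_id = t.id AND g.winning_team = 'Visitor')
                      OR (g.visiting_team_id = t.id AND g.winning_team = 'Home')
                    THEN 1 ELSE 0 END) AS losses
    FROM team t
    LEFT JOIN game g ON t.id IN (g.home_team_id, g.visiting_team_id)
    GROUP BY t.id, t.team_name;

    INSERT INTO team VALUES (1, 'Lions'), (2, 'Tigers'), (3, 'Bears');
    INSERT INTO game (home_team_id, visiting_team_id, winning_team) VALUES
        (1, 2, 'Home'),     -- Lions beat Tigers
        (2, 3, 'Visitor'),  -- Bears beat Tigers
        (3, 1, 'Home'),     -- Bears beat Lions
        (1, 3, NULL);       -- not played yet
""")

# Standings are just the record view, sorted.
standings = conn.execute("""
    SELECT team_name, wins, losses
    FROM team_record
    ORDER BY wins DESC, team_name
""").fetchall()
print(standings)
```

Home wins and road wins follow the same pattern with only one half of each CASE condition.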
Final thoughts
You do not want to do anything that would repeat data stored in the database. A great example of this would be creating a separate table for playoff games vs. regular season games. Most of the columns would be duplicated because almost all of the functionality and data stored between the two tables is the same. To create both tables would break the rules of normalization. The more compact and simple your data structure can be, the less procedural code you will have to write, and the easier it will be to maintain your database.
This looks like a generalization/specialization problem to me. I will answer how to do this in a general way, as you didn't give much detail about the entities you actually need.
Suppose you have an entity Vehicle (replacing your tournament) and the specialization Train and Car. All Vehicles have an attribute maxSpeed and Train has numberOfWagons and Car has trunkCapacity.
To model this, you have several options:
(1) Merge them all into one table
You can create one table Vehicle with columns maxSpeed, numberOfWagons and trunkCapacity. You add another field vehicleType to distinguish between Trains and Cars, and you'll probably want an Id.
For any concrete Vehicle some of the columns will always be null.
(2) use separate super/sub tables
Alternatively you can create a table Vehicle with just Id and maxSpeed and create tables for Train and Car which just hold the extra attributes, namely numberOfWagons and trunkCapacity (and also an Id).
In this case, creating a new Car will require two inserts, one in the Vehicle Table and one in the Car Table. To select a car you would have to join Vehicle and Car, unless you are only interested in its vehicle attributes.
While this approach is more complicated than (1) it has some benefits
you will not have that many null columns. A constraint like "the trunkCapacity of a car must not be null" can be easily expressed.
you can add new Vehicle types by just adding new tables, without changing any of the existing ones.
Converting between the two
From (2), you can still get a "merged" view (as in (1)) of all your vehicles by creating a view. This view will be a union of several selects, where each select joins one specialization (Train or Car) with Vehicle and adds constant null columns for the attributes it cannot retrieve from the specialization, so all selects in the union return the same number of columns.
From (1) you can create individual views for Trains and Cars by selecting a particular vehicle type from the Vehicle table and only the columns which are relevant for that vehicle type.
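A minimal sketch of option (2) together with the "merged" view, using SQLite through Python's sqlite3 module. The view name all_vehicles, the vehicleType labels and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Supertable holds the shared attributes.
    CREATE TABLE Vehicle (Id INTEGER PRIMARY KEY, maxSpeed INTEGER NOT NULL);
    -- Subtables hold only the extra attributes; Id doubles as the FK.
    CREATE TABLE Train (
        Id INTEGER PRIMARY KEY REFERENCES Vehicle(Id),
        numberOfWagons INTEGER NOT NULL
    );
    CREATE TABLE Car (
        Id INTEGER PRIMARY KEY REFERENCES Vehicle(Id),
        trunkCapacity INTEGER NOT NULL
    );

    -- Merged view as in (1): each branch pads the columns it lacks with NULL.
    CREATE VIEW all_vehicles AS
    SELECT v.Id, 'train' AS vehicleType, v.maxSpeed,
           t.numberOfWagons, NULL AS trunkCapacity
    FROM Vehicle v JOIN Train t ON t.Id = v.Id
    UNION ALL
    SELECT v.Id, 'car', v.maxSpeed, NULL, c.trunkCapacity
    FROM Vehicle v JOIN Car c ON c.Id = v.Id;

    -- Creating one train and one car takes two inserts each.
    INSERT INTO Vehicle VALUES (1, 300);
    INSERT INTO Train   VALUES (1, 8);
    INSERT INTO Vehicle VALUES (2, 180);
    INSERT INTO Car     VALUES (2, 450);
""")

merged = conn.execute("SELECT * FROM all_vehicles ORDER BY Id").fetchall()
print(merged)
```

The NOT NULL on trunkCapacity is the kind of constraint that is awkward to express in the single-table layout.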
(3) A mixture of the two
You can merge the most prominent attributes into one table and exile the more exotic attributes into extra tables.
A word of caution
One must be careful not to overdo generalizations. It is often better to just model a Cat as a Cat. In object-oriented programming, generalizations ("superclasses") are treasured because they save code duplication, but column duplication is not nearly as bad as code duplication. Remember that you're just modelling data and not behaviour. And even in OO-land generalizations are often overdone.
I don't really see the complexity of this model. Let's see:
You need:
Tournaments table (to uniquely define each tournament - league, cup, etc.)
Tournament types, periods and phases (to define each tournament characteristics)
Matchups (every single match of every tournament, including home and visitor teams and the final score)
Matchup types (linked to the tournament phases)
Teams, roster and players (last 2 optional if you intend to add that level of information)
So as an example:
tournament: Premier League
tournament type: league
tournament_period: 2015-2016
tournament_phase: 20th round
matchup: Chelsea VS Liverpool
matchup_type: second leg
score_visitor: 2
score_local: 0
or
tournament: Champions league
tournament type: tournament
tournament_period: 2015-2016
tournament_phase: 2nd Round
matchup_type: first leg
matchup: Chelsea VS Barcelona
score_visitor: 5
score_local: 0
I did this pretty fast so the relationships and columns might not be right, but I guess you have a start point.
Hope it helps!
Regards
Not an elegant solution, but you could do the following:
Create a table that holds key value pairs of attributes for a given tournament. Each tournament would be stored in multiple rows.
CREATE TABLE TOURNAMENT (
TOURNAMENT_TYPE VARCHAR2(100) NOT NULL,
TOURNAMENT_NAME VARCHAR2(100) NOT NULL,
ATTRIBUTE_NAME VARCHAR2(100) NOT NULL,
ATTRIBUTE_VALUE VARCHAR2(100) NOT NULL
);
e.g.
TOURNAMENT_TYPE,TOURNAMENT_NAME,ATTRIBUTE_NAME,ATTRIBUTE_VALUE
Volleyball,My volleyball tournament,team_member_1,Josie
Volleyball,My volleyball tournament,team_member_2,Ralph
Volleyball,My volleyball tournament,Rounds Per Game,12
Soccer,My soccer tournament,team_member_1,Jim
Soccer,My soccer tournament,team_member_2,Emma
Soccer,My soccer tournament,Tournament Duration,20
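To give a feel for how this key-value layout is read back, here is a minimal sketch using Python's sqlite3 module. The helper name tournament_attributes is hypothetical, and TEXT stands in for VARCHAR2 since SQLite is used only for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TOURNAMENT (
        TOURNAMENT_TYPE TEXT NOT NULL,
        TOURNAMENT_NAME TEXT NOT NULL,
        ATTRIBUTE_NAME  TEXT NOT NULL,
        ATTRIBUTE_VALUE TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO TOURNAMENT VALUES (?, ?, ?, ?)",
    [
        ("Volleyball", "My volleyball tournament", "team_member_1", "Josie"),
        ("Volleyball", "My volleyball tournament", "team_member_2", "Ralph"),
        ("Volleyball", "My volleyball tournament", "Rounds Per Game", "12"),
        ("Soccer", "My soccer tournament", "Tournament Duration", "20"),
    ],
)

def tournament_attributes(name):
    """Collect all attribute rows of one tournament into a dict."""
    rows = conn.execute(
        "SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE FROM TOURNAMENT "
        "WHERE TOURNAMENT_NAME = ?",
        (name,),
    )
    return dict(rows)

volleyball = tournament_attributes("My volleyball tournament")
print(volleyball)
```

Note that every value comes back as a string, so "Rounds Per Game" has to be converted to a number in the application; that is part of why this is not an elegant solution.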
So I'm creating a web app with different user types that can come from different countries. Examples of the user types would be company, staff etc. Where a company would have a company_name field and staff would not.
In the users database I'm wondering if it's a good idea to implement a one table per column approach i.e for each user attribute there would be a table with a foreign key which would be the user_id and a value for the attribute value.
eg.
users.company_name =
id(PK), | user_id(FK) | 'company_name'
1 | 1 | company 1
users.email =
id(PK), | user_id(FK) | 'email'
1 | 1 | user#email.com
The same could be applied to an address database where different countries' addresses have different values.
Opinions?
The term you're looking for is "The Party Model"
You want to use Table Inheritance†, also known as subtype/supertype relationships to model stuff like this.
An Individual is a concretion of an abstract Legal Party. An Organization (e.g. a Company) is also a concretion of an abstract Legal Party.
"Staff" is not a subtype of Legal Party. It's a relationship between a Company and an Individual. A company hasMany staffRelationships with individuals.
I recommend Single Table Inheritance, as it's fast and simple. If you really don't like nulls, then go for Class Table Inheritance.
create table parties (
party_id int primary key,
type smallint not null references party_types(party_type_id), --elided,
individual_name text null,
company_name text null,
/* use check constraints for type vs individual/company values */
);
I'd go with PostgreSQL over MySQL (or MariaDB) if you're going to use Single Table Inheritance, as older versions of the latter (MySQL before 8.0.16, MariaDB before 10.2.1) parse check constraints but do not enforce them.
You can make user belongTo a party, or make party haveOne user.
† Which is different than PostgreSQL's Inheritance feature.
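As a sketch of what the check constraint hinted at in the parties table could look like, here is a minimal example run against SQLite through Python's sqlite3 module. The party_types rows (1 = individual, 2 = organization) and the helper try_insert are assumptions for illustration; the same CHECK clause should carry over to PostgreSQL largely unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party_types (
        party_type_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    INSERT INTO party_types VALUES (1, 'individual'), (2, 'organization');

    CREATE TABLE parties (
        party_id        INTEGER PRIMARY KEY,
        type            INTEGER NOT NULL REFERENCES party_types(party_type_id),
        individual_name TEXT,
        company_name    TEXT,
        -- Exactly the name column that matches the type may be filled in.
        CHECK (
            (type = 1 AND individual_name IS NOT NULL AND company_name IS NULL)
         OR (type = 2 AND company_name IS NOT NULL AND individual_name IS NULL)
        )
    );
""")

def try_insert(party_type, individual_name, company_name):
    """Return True if the row satisfies the constraints, False otherwise."""
    try:
        conn.execute(
            "INSERT INTO parties (type, individual_name, company_name) "
            "VALUES (?, ?, ?)",
            (party_type, individual_name, company_name),
        )
        return True
    except sqlite3.IntegrityError:
        return False

ok_individual = try_insert(1, "Ada Lovelace", None)
ok_company = try_insert(2, None, "Acme Ltd")
bad_mix = try_insert(1, None, "Acme Ltd")  # individual with a company name
print(ok_individual, ok_company, bad_mix)
```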
I'd create a single users table with company_name and email columns.
For addresses table, I'd start with something simple like this: id, address_line_1, address_line_2, city, state, country, zip.
With this strategy you'll have to join a lot of tables to get a meaningful query result. As a result, performance will suffer and storage will be used very inefficiently.
You should at least combine columns that will typically be combined for a logical entity in your application. So if a 'company' differs from 'staff' in that it has extra columns, you would create a table 'users.company_properties'.
I've got a preexisting table that contains all kinds of customer information. Currently it also has the "city", the "region" and the "state" listed in three columns as strings. Redundant info!
I'd like to create three new tables, one for the city and one for the region and one for the state, that will contain single entries for each of the cities etc, and then reference the ID back into the existing customer table with a location_id.
How would I go about exporting the distinct city names into the cities table and the distinct regions into a regions table, and then have the cities reference region_id and state_id as well, so that the information is all grouped?
Amateur question for sure, but I appreciate any help!
You don't want three different tables! You want one table with three columns: city, state, region.
The reason is that city does not exist by itself. Consider (in the US) Springfield, IL. And Springfield, MA. Or Miami, FL and Miami, OH. What you have is a dimension of the data that has hierarchies. The right way to store this is at the lowest level (city in your case) with a "dimension" table providing the other information.
Assuming that your original data is correct, you can do something like this:
create table Cities (
CityId int auto_increment not null primary key,
City varchar(255),
State varchar(255),
Region varchar(255)
);
insert into Cities(City, State, Region)
select distinct City, State, Region
from YourTable;
I realize that this is not "standard normal form". But for most applications this works well. If you are doing this for an application where you want to pick states from a list, for instance, create an index on state and the query will be fast.
There are some circumstances where you might want separate tables at the state and region level. This would be the case if you had lots of different columns at those levels. And, in particular, if you were modifying the values in those columns. A flattened dimension (such as described here) is most appropriate when the data is static (cities don't change states very often). Normalization is most appropriate when you are changing values in the different levels.
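To connect this back to the location_id mentioned in the question, here is a minimal sketch of the whole round trip in SQLite via Python's sqlite3 module. The customers table, its sample rows and the backfill UPDATE are assumptions for illustration; in MySQL the same backfill could also be written as an UPDATE with a JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the existing customer table (made-up rows).
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY, name TEXT,
        city TEXT, state TEXT, region TEXT
    );
    INSERT INTO customers VALUES
        (1, 'Ann', 'Springfield', 'IL', 'Midwest'),
        (2, 'Bob', 'Springfield', 'MA', 'Northeast'),
        (3, 'Cy',  'Springfield', 'IL', 'Midwest');

    CREATE TABLE Cities (
        CityId INTEGER PRIMARY KEY AUTOINCREMENT,
        City TEXT, State TEXT, Region TEXT
    );

    -- One row per distinct (city, state, region) combination.
    INSERT INTO Cities (City, State, Region)
    SELECT DISTINCT city, state, region FROM customers;

    -- Point each customer at its Cities row. Rows with a NULL in any of
    -- the three columns would not match with '=' and need separate care.
    ALTER TABLE customers ADD COLUMN location_id INTEGER REFERENCES Cities(CityId);
    UPDATE customers
    SET location_id = (
        SELECT c.CityId FROM Cities c
        WHERE c.City = customers.city
          AND c.State = customers.state
          AND c.Region = customers.region
    );
""")

city_count = conn.execute("SELECT COUNT(*) FROM Cities").fetchone()[0]
resolved = conn.execute("""
    SELECT cu.name, ci.City, ci.State
    FROM customers cu JOIN Cities ci ON ci.CityId = cu.location_id
    ORDER BY cu.id
""").fetchall()
print(city_count, resolved)
```

Once the backfill has been verified, the old city, state and region columns can be dropped from the customer table.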
I need a schema for a fitness class booking system.
The booking system needs to store the maximum number of students a class can take, the number of students who have booked the class, student ids, datetime etc.
A student table needs to store the classes which he/she booked. But this may not be needed if I store student ids in the class table.
I am hoping to get some good ideas.
Thanks in advance.
Student: ID, Name, ...
Class: ID, Name, MaxStudents, ...
Student_in_Class: STUDENT_ID, CLASS_ID, DATE_ENROLL
*Not a MySQL guru; I typically deal w/ MS SQL, but I think you'll get the idea. You might need to dig a little in the MySQL docs to find appropriate data types that match the ones I've suggested. Also, I only gave brief explanations for some types to clarify what they're for, since this is MySQL and not MS SQL.
Class_Enrollment - stores the classes each student is registered for
Class_Enrollment_ID INT IDENTITY PK ("identity is made specifically
to serve as an id and it's a field that the system will manage
on its own. It automatically gets updated when a new record is
created. I would try to find something similar in MySQL")
Class_ID INT FK
Student_ID INT FK
Date_Time smalldatetime ("smalldatetime just stores the date as a
smaller range of years than datetime + time up to minutes")
put a unique constraint index on class_id and student_id to prevent duplicates
Class - stores your classes
Class_ID INT IDENTITY PK
Name VARCHAR('size') UNIQUE CONSTRAINT INDEX ("UNIQUE CONSTRAINT INDEX is
like a PK, but you can have more than one in a table")
Max_Enrollment INT ("unless you have a different max for different sessions
of the same class, then you only need to define max enrollment once per
class, so it belongs in the class table, not the Class_Enrollment table")
Student - stores your students
Student_ID INT IDENTITY PK
First_Name VARCHAR('size')
Last_Name VARCHAR('size')
Date_of_Birth smalldatetime ("smalldatetime can store just the date,
will automatically put 0's for the time, works fine")
put a unique constraint index on fname, lname, and date of birth to eliminate duplicates (you may have two John Smiths, but two John Smiths w/ exact same birth date in same database is unlikely unless it's a very large database. Otherwise, consider using first name, last name, and phone as a unique constraint)
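Here is a minimal sketch of how the unique constraint and Max_Enrollment could work together when a booking is made, using SQLite through Python's sqlite3 module. The book helper, its return values and the sample data are assumptions; the number of booked students is counted from Class_Enrollment rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE class (
        class_id       INTEGER PRIMARY KEY,
        name           TEXT NOT NULL UNIQUE,
        max_enrollment INTEGER NOT NULL
    );
    CREATE TABLE class_enrollment (
        class_enrollment_id INTEGER PRIMARY KEY,
        class_id   INTEGER NOT NULL REFERENCES class(class_id),
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        date_time  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        UNIQUE (class_id, student_id)  -- no double bookings
    );
    INSERT INTO student VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cat', 'Moss');
    INSERT INTO class VALUES (1, 'Yoga', 2);
""")

def book(class_id, student_id):
    """Try to book a seat; returns 'booked', 'full' or 'duplicate'."""
    try:
        with conn:  # commit on success, roll back on error
            cur = conn.execute("""
                INSERT INTO class_enrollment (class_id, student_id)
                SELECT c.class_id, ?
                FROM class c
                WHERE c.class_id = ?
                  AND (SELECT COUNT(*) FROM class_enrollment e
                       WHERE e.class_id = c.class_id) < c.max_enrollment
            """, (student_id, class_id))
            # No row inserted: the class is full (or does not exist).
            return "booked" if cur.rowcount == 1 else "full"
    except sqlite3.IntegrityError:
        return "duplicate"

results = [book(1, 1), book(1, 1), book(1, 2), book(1, 3)]
print(results)
```

On a busy server the capacity check and the insert need to run in one transaction with suitable locking; SQLite serialises writers, other databases may need more care.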