Are there any default tables on SQL Fiddle that I can query?
I want to try a basic analytical query on a simple table, but I don't want to set up the schema, seed data, etc.
Normally (in Oracle) I would do something like select * from all_objects
( http://sqlfiddle.com/ )
Auto Shop Database:
SQL fiddle
Library Database:
SQL fiddle
Countries Table:
SQL fiddle
You can also use the "View Sample Fiddle" option in the SQL Fiddle app.
The following content is from "Auto Shop Database" from Stack Overflow Documentation
(archived here); copyright 2017 by FlyingPiMonster, Prateek, forsvarir, Tot Zam, Florin Ghita, Abhilash R Vankayala, WesleyJohnson, Matt, Mureinik, Magisch, Bostjan, Mzzzzzz, Franck Dernoncourt, enrico.bacis, JavaHopper, rdans, bignose, and CL.; licensed
under CC BY-SA 3.0. An archive of the full Stack Overflow
Documentation content can be found at archive.org, in which this
example is indexed by its topic ID: 280, as example: 1014.
Auto Shop Database
In the following example, a database for an auto shop business, we have a list of departments, employees, customers and customer cars. We use foreign keys to create relationships between the tables.
Live example: SQL fiddle
Relationships between tables
Each Department may have 0 or more Employees
Each Employee may have 0 or 1 Manager
Each Customer may have 0 or more Cars
Departments
| Id | Name |
|:---|:------|
| 1 | HR |
| 2 | Sales |
| 3 | Tech |
SQL statements to create the table:
CREATE TABLE Departments (
Id INT NOT NULL AUTO_INCREMENT,
Name VARCHAR(25) NOT NULL,
PRIMARY KEY(Id)
);
INSERT INTO Departments
(Id, Name)
VALUES
(1, 'HR'),
(2, 'Sales'),
(3, 'Tech')
;
Employees
| Id | FName | LName | PhoneNumber | ManagerId | DepartmentId | Salary | HireDate |
|:---|:----------|:---------|:------------|:----------|:-------------|:-------|:-----------|
| 1 | James | Smith | 1234567890 | NULL | 1 | 1000 | 01-01-2002 |
| 2 | John | Johnson | 2468101214 | 1 | 1 | 400 | 23-03-2005 |
| 3 | Michael | Williams | 1357911131 | 1 | 2 | 600 | 12-05-2009 |
| 4 | Johnathon | Smith | 1212121212 | 2 | 1 | 500 | 24-07-2016 |
SQL statements to create the table:
CREATE TABLE Employees (
Id INT NOT NULL AUTO_INCREMENT,
FName VARCHAR(35) NOT NULL,
LName VARCHAR(35) NOT NULL,
PhoneNumber VARCHAR(11),
ManagerId INT,
DepartmentId INT NOT NULL,
Salary INT NOT NULL,
HireDate DATETIME NOT NULL,
PRIMARY KEY(Id),
FOREIGN KEY (ManagerId) REFERENCES Employees(Id),
FOREIGN KEY (DepartmentId) REFERENCES Departments(Id)
);
INSERT INTO Employees
(Id, FName, LName, PhoneNumber, ManagerId, DepartmentId, Salary, HireDate)
VALUES
(1, 'James', 'Smith', '1234567890', NULL, 1, 1000, '2002-01-01'),
(2, 'John', 'Johnson', '2468101214', 1, 1, 400, '2005-03-23'),
(3, 'Michael', 'Williams', '1357911131', 1, 2, 600, '2009-05-12'),
(4, 'Johnathon', 'Smith', '1212121212', 2, 1, 500, '2016-07-24')
;
Customers
| Id | FName | LName | Email | PhoneNumber | PreferredContact |
|:---|:--------|:-------|:--------------------------|:------------|:-----------------|
| 1 | William | Jones | william.jones@example.com | 3347927472 | PHONE |
| 2 | David | Miller | dmiller@example.net | 2137921892 | EMAIL |
| 3 | Richard | Davis | richard0123#example.com | NULL | EMAIL |
SQL statements to create the table:
CREATE TABLE Customers (
Id INT NOT NULL AUTO_INCREMENT,
FName VARCHAR(35) NOT NULL,
LName VARCHAR(35) NOT NULL,
Email varchar(100) NOT NULL,
PhoneNumber VARCHAR(11),
PreferredContact VARCHAR(5) NOT NULL,
PRIMARY KEY(Id)
);
INSERT INTO Customers
(Id, FName, LName, Email, PhoneNumber, PreferredContact)
VALUES
(1, 'William', 'Jones', 'william.jones@example.com', '3347927472', 'PHONE'),
(2, 'David', 'Miller', 'dmiller@example.net', '2137921892', 'EMAIL'),
(3, 'Richard', 'Davis', 'richard0123#example.com', NULL, 'EMAIL')
;
Cars
| Id | CustomerId | EmployeeId | Model | Status | TotalCost |
|:---|:-----------|:-----------|:-------------|:--------|:-----------|
| 1 | 1 | 2 | Ford F-150 | READY | 230 |
| 2 | 1 | 2 | Ford F-150 | READY | 200 |
| 3 | 2 | 1 | Ford Mustang | WAITING | 100 |
| 4 | 3 | 3 | Toyota Prius | WORKING | 1254 |
SQL statements to create the table:
CREATE TABLE Cars (
Id INT NOT NULL AUTO_INCREMENT,
CustomerId INT NOT NULL,
EmployeeId INT NOT NULL,
Model varchar(50) NOT NULL,
Status varchar(25) NOT NULL,
TotalCost INT NOT NULL,
PRIMARY KEY(Id),
FOREIGN KEY (CustomerId) REFERENCES Customers(Id),
FOREIGN KEY (EmployeeId) REFERENCES Employees(Id)
);
INSERT INTO Cars
(Id, CustomerId, EmployeeId, Model, Status, TotalCost)
VALUES
(1, 1, 2, 'Ford F-150', 'READY', 230),
(2, 1, 2, 'Ford F-150', 'READY', 200),
(3, 2, 1, 'Ford Mustang', 'WAITING', 100),
(4, 3, 3, 'Toyota Prius', 'WORKING', 1254)
;
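If you just want to try a basic analytical query on this schema without a fiddle, here is a minimal sketch that loads the Departments and Employees rows into an in-memory SQLite database via Python's built-in sqlite3 module. This is an assumption: SQLite stands in for MySQL, so types are simplified, AUTO_INCREMENT becomes INTEGER PRIMARY KEY, and PhoneNumber/HireDate are left out. It shows a self-join for the "Employee may have a Manager" relationship and a per-department aggregate.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL fiddle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Departments (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE Employees (
    Id           INTEGER PRIMARY KEY,
    FName        TEXT NOT NULL,
    LName        TEXT NOT NULL,
    ManagerId    INTEGER REFERENCES Employees(Id),
    DepartmentId INTEGER NOT NULL REFERENCES Departments(Id),
    Salary       INTEGER NOT NULL
);
INSERT INTO Departments (Id, Name) VALUES (1, 'HR'), (2, 'Sales'), (3, 'Tech');
INSERT INTO Employees (Id, FName, LName, ManagerId, DepartmentId, Salary) VALUES
    (1, 'James',     'Smith',    NULL, 1, 1000),
    (2, 'John',      'Johnson',  1,    1, 400),
    (3, 'Michael',   'Williams', 1,    2, 600),
    (4, 'Johnathon', 'Smith',    2,    1, 500);
""")

# Self-join: each employee paired with their manager's first name.
# The LEFT JOIN keeps James, who has no manager (ManagerId is NULL).
managers = conn.execute("""
    SELECT e.FName, m.FName
    FROM Employees e
    LEFT JOIN Employees m ON m.Id = e.ManagerId
    ORDER BY e.Id
""").fetchall()

# Aggregate: headcount and salary total per department. The LEFT JOIN keeps
# departments with no employees (Tech); COUNT(e.Id) then counts 0.
payroll = conn.execute("""
    SELECT d.Name, COUNT(e.Id), COALESCE(SUM(e.Salary), 0)
    FROM Departments d
    LEFT JOIN Employees e ON e.DepartmentId = d.Id
    GROUP BY d.Id, d.Name
    ORDER BY d.Id
""").fetchall()

print(managers)  # [('James', None), ('John', 'James'), ('Michael', 'James'), ('Johnathon', 'John')]
print(payroll)   # [('HR', 3, 1900), ('Sales', 1, 600), ('Tech', 0, 0)]
```

The same two SELECT statements should run unchanged against the MySQL version of the tables.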
The following content is from "Library Database" from Stack Overflow Documentation
(archived here); copyright 2017 by enrico.bacis, Bostjan, Shiva, WesleyJohnson, and Christian; licensed
under CC BY-SA 3.0. An archive of the full Stack Overflow
Documentation content can be found at archive.org, in which this
example is indexed by its topic ID: 280, as example: 1014.
Library Database
In this example database for a library, we have Authors, Books and BooksAuthors tables.
Live example: SQL fiddle
Authors and Books are known as base tables, since they contain column definitions and data for the actual entities in the relational model. BooksAuthors is known as the relationship table, since it defines the relationship between the Books and Authors tables.
Relationships between tables
Each author can have 1 or more books
Each book can have 1 or more authors
Authors
| Id | Name | Country |
|:---|:---------------------|:--------|
| 1 | J.D. Salinger | USA |
| 2 | F. Scott Fitzgerald | USA |
| 3 | Jane Austen | UK |
| 4 | Scott Hanselman | USA |
| 5 | Jason N. Gaylord | USA |
| 6 | Pranav Rastogi | India |
| 7 | Todd Miranda | USA |
| 8 | Christian Wenz | USA |
SQL to create the table:
CREATE TABLE Authors (
Id INT NOT NULL AUTO_INCREMENT,
Name VARCHAR(70) NOT NULL,
Country VARCHAR(100) NOT NULL,
PRIMARY KEY(Id)
);
INSERT INTO Authors
(Name, Country)
VALUES
('J.D. Salinger', 'USA'),
('F. Scott Fitzgerald', 'USA'),
('Jane Austen', 'UK'),
('Scott Hanselman', 'USA'),
('Jason N. Gaylord', 'USA'),
('Pranav Rastogi', 'India'),
('Todd Miranda', 'USA'),
('Christian Wenz', 'USA')
;
Books
| Id | Title |
|:---|:--------------------------------------|
| 1 | The Catcher in the Rye |
| 2 | Nine Stories |
| 3 | Franny and Zooey |
| 4 | The Great Gatsby |
| 5 | Tender is the Night |
| 6 | Pride and Prejudice |
| 7 | Professional ASP.NET 4.5 in C# and VB |
SQL to create the table:
CREATE TABLE Books (
Id INT NOT NULL AUTO_INCREMENT,
Title VARCHAR(50) NOT NULL,
PRIMARY KEY(Id)
);
INSERT INTO Books
(Id, Title)
VALUES
(1, 'The Catcher in the Rye'),
(2, 'Nine Stories'),
(3, 'Franny and Zooey'),
(4, 'The Great Gatsby'),
(5, 'Tender is the Night'),
(6, 'Pride and Prejudice'),
(7, 'Professional ASP.NET 4.5 in C# and VB')
;
BooksAuthors
| BookId | AuthorId |
|:-------|:---------|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
| 6 | 3 |
| 7 | 4 |
| 7 | 5 |
| 7 | 6 |
| 7 | 7 |
| 7 | 8 |
SQL to create the table:
CREATE TABLE BooksAuthors (
AuthorId INT NOT NULL,
BookId INT NOT NULL,
FOREIGN KEY (AuthorId) REFERENCES Authors(Id),
FOREIGN KEY (BookId) REFERENCES Books(Id)
);
INSERT INTO BooksAuthors
(BookId, AuthorId)
VALUES
(1, 1),
(2, 1),
(3, 1),
(4, 2),
(5, 2),
(6, 3),
(7, 4),
(7, 5),
(7, 6),
(7, 7),
(7, 8)
;
Examples
View all authors:
SELECT * FROM Authors;
View all book titles:
SELECT * FROM Books;
View all books and their authors:
SELECT
ba.AuthorId,
a.Name AuthorName,
ba.BookId,
b.Title BookTitle
FROM BooksAuthors ba
INNER JOIN Authors a ON a.id = ba.authorid
INNER JOIN Books b ON b.id = ba.bookid
;
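For a slightly more analytical query than SELECT *, here is a minimal sketch that counts rows in the relationship table with GROUP BY/HAVING. It assumes Python's built-in sqlite3 module with an in-memory SQLite database in place of the MySQL fiddle, and omits the Authors table for brevity.

```python
import sqlite3

# In-memory SQLite copy of the Books and BooksAuthors rows (assumption:
# SQLite stands in for MySQL; Authors is omitted for brevity).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Books (Id INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE BooksAuthors (
    AuthorId INTEGER NOT NULL,
    BookId   INTEGER NOT NULL REFERENCES Books(Id)
);
INSERT INTO Books (Id, Title) VALUES
    (1, 'The Catcher in the Rye'), (2, 'Nine Stories'), (3, 'Franny and Zooey'),
    (4, 'The Great Gatsby'), (5, 'Tender is the Night'),
    (6, 'Pride and Prejudice'), (7, 'Professional ASP.NET 4.5 in C# and VB');
INSERT INTO BooksAuthors (BookId, AuthorId) VALUES
    (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3),
    (7, 4), (7, 5), (7, 6), (7, 7), (7, 8);
""")

# Books written by more than one author.
co_written = conn.execute("""
    SELECT b.Title, COUNT(*) AS AuthorCount
    FROM Books b
    JOIN BooksAuthors ba ON ba.BookId = b.Id
    GROUP BY b.Id, b.Title
    HAVING COUNT(*) > 1
""").fetchall()

# Authors with more than one book, most prolific first.
prolific = conn.execute("""
    SELECT AuthorId, COUNT(*) AS BookCount
    FROM BooksAuthors
    GROUP BY AuthorId
    HAVING COUNT(*) > 1
    ORDER BY BookCount DESC
""").fetchall()

print(co_written)  # [('Professional ASP.NET 4.5 in C# and VB', 5)]
print(prolific)    # [(1, 3), (2, 2)]
```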
The following content is from "Countries Table" from Stack Overflow Documentation
(archived here); copyright 2017 by enrico.bacis, Bostjan, and Shiva; licensed
under CC BY-SA 3.0. An archive of the full Stack Overflow
Documentation content can be found at archive.org, in which this
example is indexed by its topic ID: 280, as example: 9933.
Countries Table
In this example, we have a Countries table. A table for countries has many uses, especially in financial applications involving currencies and exchange rates.
Live example: SQL fiddle
Some market data applications, such as Bloomberg and Reuters, require you to pass their API either a 2- or 3-character country code along with the currency code. Hence this example table has both the 2-character ISO code column and the 3-character ISO3 code column.
Countries
| Id | ISO | ISO3 | ISONumeric | CountryName | Capital | ContinentCode | CurrencyCode |
|:---|:----|:-----|:-----------|:--------------|:-----------|:--------------|:-------------|
| 1 | AU | AUS | 36 | Australia | Canberra | OC | AUD |
| 2 | DE | DEU | 276 | Germany | Berlin | EU | EUR |
| 3 | IN | IND | 356 | India | New Delhi | AS | INR |
| 4 | LA | LAO | 418 | Laos | Vientiane | AS | LAK |
| 5 | US | USA | 840 | United States | Washington | NA | USD |
| 6 | ZW | ZWE | 716 | Zimbabwe | Harare | AF | ZWL |
SQL to create the table:
CREATE TABLE Countries (
Id INT NOT NULL AUTO_INCREMENT,
ISO VARCHAR(2) NOT NULL,
ISO3 VARCHAR(3) NOT NULL,
ISONumeric INT NOT NULL,
CountryName VARCHAR(64) NOT NULL,
Capital VARCHAR(64) NOT NULL,
ContinentCode VARCHAR(2) NOT NULL,
CurrencyCode VARCHAR(3) NOT NULL,
PRIMARY KEY(Id)
)
;
INSERT INTO Countries
(ISO, ISO3, ISONumeric, CountryName, Capital, ContinentCode, CurrencyCode)
VALUES
('AU', 'AUS', 36, 'Australia', 'Canberra', 'OC', 'AUD'),
('DE', 'DEU', 276, 'Germany', 'Berlin', 'EU', 'EUR'),
('IN', 'IND', 356, 'India', 'New Delhi', 'AS', 'INR'),
('LA', 'LAO', 418, 'Laos', 'Vientiane', 'AS', 'LAK'),
('US', 'USA', 840, 'United States', 'Washington', 'NA', 'USD'),
('ZW', 'ZWE', 716, 'Zimbabwe', 'Harare', 'AF', 'ZWL')
;
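Since the table stores both code widths, converting a 2-character code to its 3-character form is a single lookup. A minimal sketch, assuming Python's built-in sqlite3 module and an in-memory SQLite database in place of MySQL; the helper name to_iso3 is hypothetical. It also runs a basic GROUP BY over the continent codes.

```python
import sqlite3

# In-memory SQLite copy of the Countries table (assumption: SQLite stands in
# for MySQL, so AUTO_INCREMENT becomes INTEGER PRIMARY KEY).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Countries (
    Id            INTEGER PRIMARY KEY,
    ISO           TEXT    NOT NULL,
    ISO3          TEXT    NOT NULL,
    ISONumeric    INTEGER NOT NULL,
    CountryName   TEXT    NOT NULL,
    Capital       TEXT    NOT NULL,
    ContinentCode TEXT    NOT NULL,
    CurrencyCode  TEXT    NOT NULL
);
INSERT INTO Countries
    (ISO, ISO3, ISONumeric, CountryName, Capital, ContinentCode, CurrencyCode)
VALUES
    ('AU', 'AUS', 36,  'Australia',     'Canberra',   'OC', 'AUD'),
    ('DE', 'DEU', 276, 'Germany',       'Berlin',     'EU', 'EUR'),
    ('IN', 'IND', 356, 'India',         'New Delhi',  'AS', 'INR'),
    ('LA', 'LAO', 418, 'Laos',          'Vientiane',  'AS', 'LAK'),
    ('US', 'USA', 840, 'United States', 'Washington', 'NA', 'USD'),
    ('ZW', 'ZWE', 716, 'Zimbabwe',      'Harare',     'AF', 'ZWL');
""")

def to_iso3(iso2):
    """Hypothetical helper: map a 2-character ISO code to ISO3 (None if unknown)."""
    row = conn.execute("SELECT ISO3 FROM Countries WHERE ISO = ?", (iso2,)).fetchone()
    return row[0] if row else None

# Number of countries per continent code.
per_continent = conn.execute("""
    SELECT ContinentCode, COUNT(*)
    FROM Countries
    GROUP BY ContinentCode
    ORDER BY ContinentCode
""").fetchall()

print(to_iso3("DE"))  # DEU
print(per_continent)  # [('AF', 1), ('AS', 2), ('EU', 1), ('NA', 1), ('OC', 1)]
```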
For MySQL fiddles, follow the links from the answer by Daniel Käfer.
For Microsoft SQL Server versions of the same tables, use these links:
Auto Shop Database:
SQL fiddle
Library Database:
SQL fiddle
Countries Table:
SQL fiddle
Related
Please don't judge too harshly, but I can't figure this out.
My table:
CREATE TABLE courses (id INT PRIMARY KEY AUTO_INCREMENT,
faculty VARCHAR(55) NULL,
number INT(10) NULL,
diff VARCHAR(10) NULL);
mysql> select * from courses;
Goal: put the values ('ez', 'mid', 'hard') into the diff column.
For example, I'm trying this:
mysql> INSERT courses (diff) VALUES ('ez');
or
mysql> UPDATE courses SET faculty = 'chem', number = 2, diff = 'mid';
This adds rows with empty (NULL) values.
Please help!
I want to get this result:
+----+---------+--------+------+
| id | faculty | number | diff |
+----+---------+--------+------+
| 1 | bio | 1 | ez |
| 2 | chem | 2 | mid |
| 3 | math | 3 | hard |
| 4 | geo | 4 | mid |
| 5 | gum | 5 | ez |
+----+---------+--------+------+
You can use a CASE expression in an UPDATE statement:
UPDATE courses
SET diff=CASE
WHEN faculty in ('bio', 'gum') THEN 'ez'
WHEN faculty in ('chem', 'geo') THEN 'mid'
WHEN faculty = 'math' THEN 'hard'
END;
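The UPDATE above assumes the faculty and number values are already in the table. A minimal sketch to check it end to end, assuming Python's built-in sqlite3 module with SQLite standing in for MySQL: it first inserts complete rows with diff left NULL (rather than inserting diff on its own, which is what produced the mostly-empty rows), then runs the CASE update unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    faculty VARCHAR(55),
    number  INTEGER,
    diff    VARCHAR(10)
);
-- Insert full rows first; diff stays NULL until the UPDATE below.
INSERT INTO courses (faculty, number) VALUES
    ('bio', 1), ('chem', 2), ('math', 3), ('geo', 4), ('gum', 5);
""")

# The CASE expression picks the difficulty from the faculty name.
conn.execute("""
    UPDATE courses
    SET diff = CASE
        WHEN faculty IN ('bio', 'gum')  THEN 'ez'
        WHEN faculty IN ('chem', 'geo') THEN 'mid'
        WHEN faculty = 'math'           THEN 'hard'
    END
""")
conn.commit()

rows = conn.execute("SELECT id, faculty, number, diff FROM courses ORDER BY id").fetchall()
for row in rows:
    print(row)
```

Note that a CASE with no ELSE yields NULL, so any faculty not listed would have its diff set to NULL; add ELSE diff (or a WHERE clause) if existing values should be left alone.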
I have a MariaDB table that looks like this:
+--------+--------+--------+---------------------+
| realm | key2 | userId | date |
+--------+--------+--------+---------------------+
| AB3 | 123 | 1 | 2017-08-04 17:30:00 |
| AB3 | 124 | 1 | 2017-08-04 17:30:00 |
| AB3 | 125 | 1 | 2017-08-04 17:30:00 |
| XY7 | 97 | 2 | 2017-08-04 17:35:00 |
| XY7 | 98 | 2 | 2017-08-04 17:35:00 |
| XY7 | 99 | 2 | 2017-08-04 17:35:00 |
| AB3 | 110 | 3 | 2017-08-04 17:40:00 |
| AB3 | 111 | 3 | 2017-08-04 17:40:00 |
+--------+--------+--------+---------------------+
PRIMARY KEY (realm, key2)
INDEX (realm, userId)
INDEX (date)
This table operates as some sort of queue for processing user actions. Basically a server always takes the oldest data from this table, processes it and deletes it from this table. Each realm has its own server processing this queue.
Now I want to find out a user's position in queue for that realm. So, using the example above, when I request the position for userId 3 in realm 'AB3', I want to get the result 2 because only one other user (userId 1) is to be processed earlier for realm AB3.
(The column key2 might be irrelevant in this example. I only included it because it is part of the primary key, which may make it relevant for finding a good solution.)
Here is the SQL schema:
CREATE TABLE `queue` (
`realm` varchar(5) NOT NULL,
`key2` int(10) UNSIGNED NOT NULL,
`userId` int(10) UNSIGNED NOT NULL,
`date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `queue` (`realm`, `key2`, `userId`, `date`) VALUES
('AB3', 110, 3, '2017-08-04 17:40:00'),
('AB3', 111, 3, '2017-08-04 17:40:00'),
('AB3', 123, 1, '2017-08-04 17:30:00'),
('AB3', 124, 1, '2017-08-04 17:30:00'),
('AB3', 125, 1, '2017-08-04 17:30:00'),
('XY7', 97, 2, '2017-08-04 17:35:00'),
('XY7', 98, 2, '2017-08-04 17:35:00'),
('XY7', 99, 2, '2017-08-04 17:35:00');
ALTER TABLE `queue`
ADD PRIMARY KEY (`realm`,`key2`),
ADD KEY `ru` (`realm`,`userId`) USING BTREE,
ADD KEY `date` (`date`);
I came up with this query that seems to work but is pretty slow (~3 seconds) on a table with 10,000,000 entries:
SELECT (COUNT(DISTINCT `realm`, `userId`)+1) `position`
FROM `queue`
WHERE `realm` = 'AB3'
AND `date` < (
SELECT `date`
FROM `queue`
WHERE `realm` = 'AB3' AND `userId` = 3
GROUP BY `realm`, `userId`
)
SQL Fiddle: http://sqlfiddle.com/#!9/fb04fd/9/0
EXPLAIN EXTENDED of this query:
+----+-------------+-------+-------------+-----------------+------------+---------+-------+---------+----------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------------+-----------------+------------+---------+-------+---------+----------+------------------------------------------+
| 1 | PRIMARY | queue | ref | PRIMARY,ru,date | PRIMARY | 767 | const | 5266123 | 100.00 | Using where |
| 2 | SUBQUERY | queue | index_merge | PRIMARY,ru | ru,PRIMARY | 771,767 | | 496 | 75.00 | Using intersect(ru,PRIMARY); Using where |
+----+-------------+-------+-------------+-----------------+------------+---------+-------+---------+----------+------------------------------------------+
Do you have any ideas how I can optimize this query to run faster on a table with around 10,000,000 entries?
Other queries that are run on this table:
SELECT `m`.*
FROM `queue` `m`
JOIN (
SELECT `m`.*
FROM `queue` `m`
WHERE `m`.`realm` = ?
ORDER BY `date` ASC
LIMIT 1
) `mm` ON `m`.`realm` = `mm`.`realm` AND `m`.`userId` = `mm`.`userId`;
and
DELETE FROM `queue` WHERE `realm` = ? AND `userId` = ?;
How could I optimize my indexes?
I feel like something is wrong with the table DDL. Anyway, I would rewrite your query like this:
SELECT (COUNT(DISTINCT `userId`)+1) `position`
FROM `queue`
WHERE `realm` = 'AB3'
AND `date` < (
SELECT min(`date`)
FROM `queue`
WHERE `realm` = 'AB3' AND `userId` = 3
)
and perhaps add an index specific to this query, such as:
index (realm, date)
You can also try this covering index
index (realm, date, userId)
but I'm not sure it will be faster than the previous one.
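To sanity-check the rewritten query on the sample rows, here is a minimal sketch assuming Python's built-in sqlite3 module with SQLite standing in for MariaDB; the index name realm_date_user and the helper queue_position are hypothetical. Dates are stored as ISO-8601 text, which compares correctly as strings in SQLite. This only checks correctness on the sample data; it says nothing about speed on 10,000,000 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE queue (
    realm  TEXT    NOT NULL,
    key2   INTEGER NOT NULL,
    userId INTEGER NOT NULL,
    date   TEXT    NOT NULL,  -- ISO-8601 strings sort chronologically
    PRIMARY KEY (realm, key2)
);
-- The (realm, date, userId) index suggested above (hypothetical name).
CREATE INDEX realm_date_user ON queue (realm, date, userId);
INSERT INTO queue (realm, key2, userId, date) VALUES
    ('AB3', 110, 3, '2017-08-04 17:40:00'),
    ('AB3', 111, 3, '2017-08-04 17:40:00'),
    ('AB3', 123, 1, '2017-08-04 17:30:00'),
    ('AB3', 124, 1, '2017-08-04 17:30:00'),
    ('AB3', 125, 1, '2017-08-04 17:30:00'),
    ('XY7',  97, 2, '2017-08-04 17:35:00'),
    ('XY7',  98, 2, '2017-08-04 17:35:00'),
    ('XY7',  99, 2, '2017-08-04 17:35:00');
""")

def queue_position(realm, user_id):
    """1 + number of distinct users in the realm with a row older than
    this user's oldest row."""
    (position,) = conn.execute("""
        SELECT COUNT(DISTINCT userId) + 1
        FROM queue
        WHERE realm = :realm
          AND date < (SELECT MIN(date)
                      FROM queue
                      WHERE realm = :realm AND userId = :user)
    """, {"realm": realm, "user": user_id}).fetchone()
    return position

print(queue_position('AB3', 3))  # 2
print(queue_position('AB3', 1))  # 1
```

One caveat: a user with no rows in the realm also gets 1, because the subquery returns NULL and no row satisfies date < NULL, so check that the user is queued at all if that matters.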
The theme of this question is to maintain the user comments over my website.
I had around 25,000 articles on my website (in different categories), and each article has a comments section below it. Since the number of comments grew past 70,000, I decided to split the articles into separate tables by category (articles_of_type_category), each with a corresponding comments table (article_category_comments), assuming this would improve performance in the future (though it currently works fine).
Now I have two questions:
1) Should I divide the database, or will there be no degradation in performance if the table grows further?
2) If yes, then I have a problem writing the SQL join for the new design. On the comments page for each article, I show the comments, the name of the person who made each comment, and their points.
So suppose a user is viewing article 3; I need to obtain the following details to show on the page for article 3:
-------------------------------------------------------------------------------------------
serial#| comment | name_who_made_this_comment | points | gold | silver | bronze
-------------------------------------------------------------------------------------------
| | | | | |
| | | | | |
by joining these three tables
user_details
+----+--------+----------+
| id | name | college |
+----+--------+----------+
| 1 | naveen | a |
| 2 | rahul | b |
| 3 | dave | c |
| 4 | tom | d |
+----+--------+----------+
score (this table stores user points, like Stack Overflow)
+----+--------+------+--------+--------+---------+
| id | points | gold | silver | bronze | user_id |
+----+--------+------+--------+--------+---------+
| 1 | 2354 | 2 | 9 | 25 | 3 |
| 2 | 4562 | 1 | 9 | 11 | 2 |
| 3 | 1123 | 7 | 9 | 11 | 1 |
| 4 | 3457 | 0 | 9 | 4 | 4 |
+----+--------+------+--------+--------+---------+
comments (this table stores comment, id of the article on which it was made,and user id)
+----+----------------------------+-------------+---------+
| id | comment | article_id | user_id |
+----+----------------------------+-------------+---------+
| 1 | This is a nice article | 3 | 1 |
| 2 | This is a tough article | 3 | 4 |
| 3 | This is a good article | 2 | 7 |
| 4 | This is a good article | 1 | 3 |
| 5 | Please update this article | 4 | 4 |
+----+----------------------------+-------------+---------+
I tried something like
select * from comments join (select * from user_details join points where user_details.id=points.user_id)as joined_temp where comments.id=joined_temp.u_id and article_id=3;
This is a response to this comment: "@DanBracuk: It would really be useful if you give an overview by naming the tables and corresponding column names"
Table category
categoryId int not null, autoincrement primary key
category varchar(50)
Sample categories could be "Fairy Tale", "World War I", or "Movie Stars".
Table article
articleId int not null, autoincrement primary key
categoryId int not null foreign key
text CLOB, or whatever the MySQL equivalent is (e.g. TEXT)
Since the comment was in response to my comment about articles and categories, this answer is limited to that.
I would start with a table for articles and a table for categories, then use a bridge table to link the two. My advice would be to index the category column in the bridge table; this would speed up access.
Example of table structure:
CREATE TABLE Article (
id int NOT NULL AUTO_INCREMENT PRIMARY KEY,
title varchar(100) NOT NULL
);
INSERT INTO Article
(title)
VALUES
('kljlkjlkjalk'),
('aiouiwiuiuaoijukj');
CREATE TABLE Category (
id int NOT NULL AUTO_INCREMENT PRIMARY KEY,
name varchar(100)
);
INSERT INTO Category
(name)
VALUES
('kljlkjlkjalk'),
('aiouiwiuiuaoijukj');
CREATE TABLE Article_Category (
id int NOT NULL AUTO_INCREMENT PRIMARY KEY,
article_id int,
category_id int
);
INSERT INTO Article_Category
(article_id, category_id)
VALUES
(1,1),
(1,2);
CREATE TABLE User_Details
(`id` int, `name` varchar(6), `college` varchar(1))
;
INSERT INTO User_Details
(`id`, `name`, `college`)
VALUES
(1, 'naveen', 'a'),
(2, 'rahul', 'b'),
(3, 'dave', 'c'),
(4, 'tom', 'd')
;
CREATE TABLE Score
(`id` int, `points` int, `gold` int, `silver` int, `bronze` int, `user_id` int)
;
INSERT INTO Score
(`id`, `points`, `gold`, `silver`, `bronze`, `user_id`)
VALUES
(1, 2354, 2, 9, 25, 3),
(2, 4562, 1, 9, 11, 2),
(3, 1123, 7, 9, 11, 1),
(4, 3457, 0, 9, 4, 4)
;
CREATE TABLE Comment
(`id` int, `comment` varchar(26), `article_id` int, `user_id` int)
;
INSERT INTO Comment
(`id`, `comment`, `article_id`, `user_id`)
VALUES
(1, 'This is a nice article', 3, 1),
(2, 'This is a tough article', 3, 4),
(3, 'This is a good article', 2, 7),
(4, 'This is a good article', 1, 3),
(5, 'Please update this article', 4, 4)
;
Try this:
SQLFiddle Demo
Best of luck.
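The query behind the demo link is not preserved here, so the following is only a sketch of one way to get the serial#/comment/name/points/gold/silver/bronze listing for an article. It assumes Python's built-in sqlite3 module with SQLite standing in for MySQL; the index name idx_comment_article and the helper comments_for_article are hypothetical. Indexing Comment.article_id follows the same idea as indexing the category column in the bridge table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User_Details (id INTEGER PRIMARY KEY, name TEXT, college TEXT);
CREATE TABLE Score (id INTEGER PRIMARY KEY, points INTEGER, gold INTEGER,
                    silver INTEGER, bronze INTEGER, user_id INTEGER);
CREATE TABLE Comment (id INTEGER PRIMARY KEY, comment TEXT,
                      article_id INTEGER, user_id INTEGER);
-- Index the column used to look comments up (hypothetical name).
CREATE INDEX idx_comment_article ON Comment (article_id);
INSERT INTO User_Details VALUES
    (1, 'naveen', 'a'), (2, 'rahul', 'b'), (3, 'dave', 'c'), (4, 'tom', 'd');
INSERT INTO Score VALUES
    (1, 2354, 2, 9, 25, 3), (2, 4562, 1, 9, 11, 2),
    (3, 1123, 7, 9, 11, 1), (4, 3457, 0, 9, 4, 4);
INSERT INTO Comment VALUES
    (1, 'This is a nice article', 3, 1), (2, 'This is a tough article', 3, 4),
    (3, 'This is a good article', 2, 7), (4, 'This is a good article', 1, 3),
    (5, 'Please update this article', 4, 4);
""")

def comments_for_article(article_id):
    # Join each comment to its author, then to the author's score row.
    # LEFT JOIN on Score keeps users who have no score row yet.
    return conn.execute("""
        SELECT c.id, c.comment, u.name, s.points, s.gold, s.silver, s.bronze
        FROM Comment c
        JOIN User_Details u ON u.id = c.user_id
        LEFT JOIN Score s ON s.user_id = u.id
        WHERE c.article_id = ?
        ORDER BY c.id
    """, (article_id,)).fetchall()

for row in comments_for_article(3):
    print(row)
```

Note that comment 3 belongs to user_id 7, which has no User_Details row, so the inner join on users drops it; use a LEFT JOIN there too if such comments should still appear. A single comments table keyed by article_id avoids the per-category tables entirely.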
70,000 rows is not many. In fact, it is close to nothing. Your problem lies in bad design. I have a table with many millions of records, and when I send a request to the application server, which runs complex queries against it in the backend, it responds in less than a second. So something in your design is definitely sub-optimal. A detailed answer would take too much space and effort (there is a whole science built around your question), which is out of scope for this website, so I'll point you in the right direction:
Read about normalization (1NF, 2NF, 3NF, BCNF and so on) and compare it to your design.
Read about indexing and other implicit optimizations
Optimize your queries and minimize the number of queries
As to answer your concrete question: No, you should not "divide" your table. You should fix the structural errors in your database schema and optimize the algorithms using your database.
I had a table for stores containing store name and address. After some discussion, we are now normalizing the table, putting addresses in separate tables. This is done for two reasons:
Increase search speed for stores by location / address
Reduce execution time when checking for misspelled street names with the Levenshtein algorithm while importing stores.
The new structure looks like this (ignore typos):
country;
+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id | varchar(2) | NO | PRI | NULL | |
| name | varchar(45) | NO | | NULL | |
| prefix | varchar(5) | NO | | NULL | |
+--------------------+--------------+------+-----+---------+-------+
city;
+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| city | varchar(50) | NO | | NULL | |
+--------------------+--------------+------+-----+---------+-------+
street;
+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| street | varchar(50) | YES | | NULL | |
| fk_cityID | int(11) | NO | | NULL | |
+--------------------+--------------+------+-----+---------+-------+
address;
+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| streetNum | varchar(10) | NO | | NULL | |
| street2 | varchar(50) | NO | | NULL | |
| zipcode | varchar(10) | NO | | NULL | |
| fk_streetID | int(11) | NO | | NULL | |
| fk_countryID | int(11) | NO | | NULL | |
+--------------------+--------------+------+-----+---------+-------+
*street2 is for secondary reference or secondary address in e.g. the US.
store;
+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| name | varchar(50) | YES | | NULL | |
| street | varchar(50) | YES | | NULL | |
| fk_addressID | int(11) | NO | | NULL | |
+--------------------+--------------+------+-----+---------+-------+
*I've left out address columns in this table to shorten code
The new tables have been populated with correct data and the only thing remaining is to add foreign key address.id in store table.
The following code lists all street names correctly:
select a.id, b.street, a.street2, a.zipcode, c.city, a.fk_countryID
from address a
left join street b on a.fk_streetID = b.id
left join city c on b.fk_cityID = c.id
How can I update fk_addressID in store table?
How can I list all stores with correct address?
Is this bad normalization considering the reasons given above?
UPDATE
It seems like the following code lists all stores with the correct address; however, it is a bit slow (I have about 2000 stores):
select a.id, a.name, b.id, c.street
from sl_store a, sl_address b, sl_street c
where b.fk_streetID = c.id
and a.street1 = c.street
group by a.name
order by a.id
I'm not going to speak to misspellings. Since you're importing the data, misspellings are better handled in a staging table.
Let's look at this slightly simplified version.
create table stores
(
store_name varchar(50) primary key,
street_num varchar(10) not null,
street_name varchar(50) not null,
city varchar(50) not null,
state_code char(2) not null,
zip_code char(5) not null,
iso_country_code char(2) not null,
-- Depending on what kind of store you're talking about, you *could* have
-- two of them at the same address. If so, drop this constraint.
unique (street_num, street_name, city, state_code, zip_code, iso_country_code)
);
insert into stores values
('Dairy Queen #212', '232', 'N 1st St SE', 'Castroville', 'CA', '95012', 'US'),
('Dairy Queen #213', '177', 'Broadway Ave', 'Hartsdale', 'NY', '10530', 'US'),
('Dairy Queen #214', '7640', 'Vermillion St', 'Seneca Falls', 'NY', '13148', 'US'),
('Dairy Queen #215', '1014', 'Handy Rd', 'Olive Hill', 'KY', '41164', 'US'),
('Dairy Mart #101', '145', 'N 1st St SE', 'Castroville', 'CA', '95012', 'US'),
('Dairy Mart #121', '1042', 'Handy Rd', 'Olive Hill', 'KY', '41164', 'US');
Although a lot of people firmly believe that ZIP code determines city and state in the US, that's not the case. ZIP codes have to do with how carriers drive their routes, not with geography. Some cities straddle the borders between states; single ZIP code routes can cross state lines. Even Wikipedia knows this, although their examples might be out of date. (Delivery routes change constantly.)
So we have a table that has two candidate keys,
{store_name}, and
{street_num, street_name, city, state_code, zip_code, iso_country_code}
It has no non-key attributes. I think this table is in 5NF. What do you think?
If I wanted to increase the data integrity for street names, I might start with something like this.
create table street_names
(
street_name varchar(50) not null,
city varchar(50) not null,
state_code char(2) not null,
iso_country_code char(2) not null,
primary key (street_name, city, state_code, iso_country_code)
);
insert into street_names
select distinct street_name, city, state_code, iso_country_code
from stores;
alter table stores
add constraint streets_from_street_names
foreign key (street_name, city, state_code, iso_country_code)
references street_names (street_name, city, state_code, iso_country_code);
-- I don't cascade updates or deletes, because in my experience
-- with addresses, that's almost never the right thing to do when a
-- street name changes.
You could (and probably should) repeat this process for city names, state names (state codes), and country names.
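Here is a minimal sketch of what that foreign key buys you, assuming Python's built-in sqlite3 module with SQLite standing in for the original database. Two differences are assumptions of the sketch: SQLite has no ALTER TABLE ... ADD CONSTRAINT, so the foreign key is declared in CREATE TABLE and street_names is filled first, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. The store 'Dairy Queen #216' is hypothetical, used to show a misspelled street being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE street_names (
    street_name      VARCHAR(50) NOT NULL,
    city             VARCHAR(50) NOT NULL,
    state_code       CHAR(2)     NOT NULL,
    iso_country_code CHAR(2)     NOT NULL,
    PRIMARY KEY (street_name, city, state_code, iso_country_code)
);
-- Foreign key declared up front instead of via ALTER TABLE.
CREATE TABLE stores (
    store_name       VARCHAR(50) PRIMARY KEY,
    street_num       VARCHAR(10) NOT NULL,
    street_name      VARCHAR(50) NOT NULL,
    city             VARCHAR(50) NOT NULL,
    state_code       CHAR(2)     NOT NULL,
    zip_code         CHAR(5)     NOT NULL,
    iso_country_code CHAR(2)     NOT NULL,
    UNIQUE (street_num, street_name, city, state_code, zip_code, iso_country_code),
    FOREIGN KEY (street_name, city, state_code, iso_country_code)
        REFERENCES street_names (street_name, city, state_code, iso_country_code)
);
""")

# The sample stores, treated as already-cleaned staging rows.
staging = [
    ('Dairy Queen #212', '232', 'N 1st St SE', 'Castroville', 'CA', '95012', 'US'),
    ('Dairy Queen #213', '177', 'Broadway Ave', 'Hartsdale', 'NY', '10530', 'US'),
    ('Dairy Queen #214', '7640', 'Vermillion St', 'Seneca Falls', 'NY', '13148', 'US'),
    ('Dairy Queen #215', '1014', 'Handy Rd', 'Olive Hill', 'KY', '41164', 'US'),
    ('Dairy Mart #101', '145', 'N 1st St SE', 'Castroville', 'CA', '95012', 'US'),
    ('Dairy Mart #121', '1042', 'Handy Rd', 'Olive Hill', 'KY', '41164', 'US'),
]
# Same idea as INSERT INTO street_names SELECT DISTINCT ... FROM stores.
conn.executemany(
    "INSERT INTO street_names VALUES (?, ?, ?, ?)",
    sorted({(r[2], r[3], r[4], r[6]) for r in staging}),
)
conn.executemany("INSERT INTO stores VALUES (?, ?, ?, ?, ?, ?, ?)", staging)
conn.commit()

# 'Handy Road' (instead of 'Handy Rd') has no parent row, so the FK rejects it.
try:
    conn.execute(
        "INSERT INTO stores VALUES (?, ?, ?, ?, ?, ?, ?)",
        ('Dairy Queen #216', '12', 'Handy Road', 'Olive Hill', 'KY', '41164', 'US'),
    )
    misspelling_rejected = False
except sqlite3.IntegrityError:
    misspelling_rejected = True

street_count = conn.execute("SELECT COUNT(*) FROM street_names").fetchone()[0]
print(street_count, misspelling_rejected)  # 4 True
```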
Some problems with your approach
You can apparently enter a street id number for a street that's in the US, along with the country id for Croatia. (The "full name" of a city, so to speak, is the kind of fact you probably want to store in order to increase data integrity. That's probably also true of the "full name" of a street.)
Using id numbers for every bit of data greatly increases the number of joins required. Using id numbers has nothing to do with normalization. Using id numbers without corresponding unique constraints on the natural keys (an utterly commonplace mistake) allows duplicate data.
I've got an interesting dilemma now. I have a database schema like the following:
GameList:
+-------+----------+-----------+------------+--------------------------------+
| id | steam_id | origin_id | impulse_id | game_title |
+-------+----------+-----------+------------+--------------------------------+
| 1 | 17450 | NULL | NULL | Dragon Age: Origins |
| 2 | NULL | 138994900 | NULL | Dragon Age(TM): Origins |
| 3 | NULL | NULL | dragonage | Dragon Age Origins |
| 4 | 47850 | 201841300 | fifamgr11 | FIFA Manager 11 |
| ... | ... | ... | ... | ... |
+-------+----------+-----------+------------+--------------------------------+
GameAlias:
+----------+-----------+
| old_id | new_id |
+----------+-----------+
| 2 | 1 |
| 3 | 1 |
| ... | ... |
+----------+-----------+
Depending on whether the stores use the same title for the game there may be no issues, or there may be multiple rows for the same game. The Alias table exists to resolve this issue, by stating that id 2 and id 3 are just aliases for id 1.
What I need is an SQL query which uses both the GameList table and the GameAlias table and returns the following:
ConglomerateGameList:
+-------+----------+-----------+------------+--------------------------------+
| id | steam_id | origin_id | impulse_id | game_title |
+-------+----------+-----------+------------+--------------------------------+
| 1 | 17450 | 138994900 | dragonage | Dragon Age: Origins |
| 4 | 47850 | 201841300 | fifamgr11 | FIFA Manager 11 |
| ... | ... | ... | ... | ... |
+-------+----------+-----------+------------+--------------------------------+
Note that I want the game title of the "new id". The game title for any "old ids" should simply be discarded/ignored.
I would also like to note that I can't make any modifications to the GameList table to solve this issue. If I were to simply re-write the table to look like my desired output then every night when I grab an updated game list from the stores it would fail to find the game in the database, generating yet another row like so:
+-------+----------+-----------+------------+--------------------------------+
| id | steam_id | origin_id | impulse_id | game_title |
+-------+----------+-----------+------------+--------------------------------+
| 1 | 17450 | 138994900 | dragonage | Dragon Age: Origins |
| 4 | 47850 | 201841300 | fifamgr11 | FIFA Manager 11 |
| ... | ... | ... | ... | ... |
| 8139 | NULL | 138994900 | NULL | Dragon Age(TM): Origins |
| 8140 | NULL | NULL | dragonage | Dragon Age Origins |
+-------+----------+-----------+------------+--------------------------------+
I also can't work on the assumption that a game's id will never change as Steam has been known to change them when a major update to the game is released.
Bonus points if it can recognize recursive aliases, like the following:
GameAlias:
+----------+-----------+
| old_id | new_id |
+----------+-----------+
| 2 | 1 |
| 3 | 2 |
| ... | ... |
+----------+-----------+
Here id 3 is an alias for id 2, which itself is an alias for id 1. If recursive aliases are impossible, then I can just write my application logic to prevent them.
Does this work? (Adjust the table names to match your schema.)
select ga1.new_id, max(gl1.steam_id), max(gl1.origin_id), max(gl1.impulse_id),
max(if(gl1.id = ga1.new_id, gl1.game_title, NULL)) as game_title
from GameList gl1, GameAlias ga1
where (gl1.id = ga1.new_id OR gl1.id = ga1.old_id)
group by ga1.new_id
union
select gl2.id, gl2.steam_id, gl2.origin_id, gl2.impulse_id, gl2.game_title
from GameList gl2
where gl2.id not in (
select ga3.new_id from GameAlias ga3
union
select ga4.old_id from GameAlias ga4)
1. First solution (without recursion):
CREATE TABLE GameList
(
id INT NOT NULL PRIMARY KEY
,steam_id INT NULL
,origin_id INT NULL
,impulse_id NVARCHAR(50) NULL
,game_title NVARCHAR(50) NOT NULL
);
INSERT GameList(id, steam_id, origin_id, impulse_id, game_title)
SELECT 1, 17450, NULL, NULL, 'Dragon Age: Origins'
UNION ALL
SELECT 2, NULL, 138994900, NULL, 'Dragon Age(TM): Origins'
UNION ALL
SELECT 3, NULL, NULL, 'dragonage','Dragon Age Origins'
UNION ALL
SELECT 4, 47850, 201841300, 'fifamgr11','FIFA Manager 11';
CREATE TABLE GameAlias
(
old_id INT NOT NULL PRIMARY KEY
,new_id INT NOT NULL
);
INSERT GameAlias (old_id, new_id) VALUES (2,1);
INSERT GameAlias (old_id, new_id) VALUES (3,1);
-- Solution 1
SELECT COALESCE(ga.new_id, gl.id) new_id
,MAX(gl.steam_id) new_steam_id
,MAX(gl.origin_id) new_origin_id
,MAX(gl.impulse_id) new_impulse_id
,MAX( CASE WHEN ga.old_id IS NULL THEN gl.game_title ELSE NULL END ) new_game_title
FROM GameList gl
LEFT OUTER JOIN GameAlias ga ON gl.id = ga.old_id
GROUP BY COALESCE(ga.new_id, gl.id);
-- End of Solution 1
DROP TABLE GameList;
DROP TABLE GameAlias;
Results:
1 17450 138994900 dragonage Dragon Age: Origins
4 47850 201841300 fifamgr11 FIFA Manager 11
2. Second solution (handles up to three levels of aliasing):
CREATE TABLE GameList
(
id INT NOT NULL PRIMARY KEY
,steam_id INT NULL
,origin_id INT NULL
,impulse_id NVARCHAR(50) NULL
,game_title NVARCHAR(50) NOT NULL
);
INSERT GameList(id, steam_id, origin_id, impulse_id, game_title)
SELECT 1, 17450, NULL, NULL, 'Dragon Age: Origins'
UNION ALL
SELECT 2, NULL, 138994900, NULL, 'Dragon Age(TM): Origins'
UNION ALL
SELECT 3, NULL, NULL, 'dragonage','Dragon Age Origins'
UNION ALL
SELECT 4, 47850, 201841300, 'fifamgr11','FIFA Manager 11'
UNION ALL
SELECT 5, 11111, NULL, NULL, 'Starcraft 1'
UNION ALL
SELECT 6, NULL, 1111111111, NULL, 'Starcraft 1.1'
UNION ALL
SELECT 7, NULL, NULL, NULL, 'Starcraft 1.2'
UNION ALL
SELECT 8, NULL, NULL, 'sc1', 'Starcraft 1.3';
CREATE TABLE GameAlias
(
old_id INT NOT NULL PRIMARY KEY
,new_id INT NOT NULL
);
INSERT GameAlias (old_id, new_id) VALUES (2,1);
INSERT GameAlias (old_id, new_id) VALUES (3,1);
INSERT GameAlias (old_id, new_id) VALUES (6,5);
INSERT GameAlias (old_id, new_id) VALUES (7,6);
INSERT GameAlias (old_id, new_id) VALUES (8,7);
-- Solution 2
CREATE TEMPORARY TABLE Mappings
(
old_id INT NOT NULL PRIMARY KEY
,new_id INT NOT NULL
);
INSERT Mappings (old_id, new_id)
-- first level mapping
SELECT ga.old_id, ga.new_id
FROM GameAlias ga
WHERE ga.new_id NOT IN (SELECT t.old_id FROM GameAlias t)
-- second level mapping
UNION ALL
SELECT ga.old_id, ga2.new_id
FROM GameAlias ga
INNER JOIN GameAlias ga2 ON ga.new_id = ga2.old_id
WHERE ga2.new_id NOT IN (SELECT t.old_id FROM GameAlias t)
-- third level mapping
UNION ALL
SELECT ga.old_id, ga3.new_id
FROM GameAlias ga
INNER JOIN GameAlias ga2 ON ga.new_id = ga2.old_id
INNER JOIN GameAlias ga3 ON ga2.new_id = ga3.old_id;
SELECT COALESCE(ga.new_id, gl.id) new_id
,MAX(gl.steam_id) new_steam_id
,MAX(gl.origin_id) new_origin_id
,MAX(gl.impulse_id) new_impulse_id
,MAX( CASE WHEN ga.old_id IS NULL THEN gl.game_title ELSE NULL END ) new_game_title
FROM GameList gl
LEFT OUTER JOIN Mappings ga ON gl.id = ga.old_id
GROUP BY COALESCE(ga.new_id, gl.id);
DROP TEMPORARY TABLE Mappings;
-- End of Solution 2
DROP TABLE GameList;
DROP TABLE GameAlias;
Results:
1 17450 138994900 dragonage Dragon Age: Origins
4 47850 201841300 fifamgr11 FIFA Manager 11
5 11111 1111111111 sc1 Starcraft 1
I'm sorry, but MySQL versions before 8.0 don't support recursive queries/CTEs (MySQL 8.0 added WITH RECURSIVE).
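Where recursive CTEs are available, the fixed three-level Mappings table can be replaced by a recursive one that follows alias chains to any depth. A minimal sketch, tested here with Python's built-in sqlite3 module (SQLite supports WITH RECURSIVE); MySQL 8.0 accepts the same general shape, but this exact statement has not been run against MySQL, so treat that as an assumption. The CTE names resolved and roots are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GameList (id INTEGER PRIMARY KEY, steam_id INTEGER, origin_id INTEGER,
                       impulse_id TEXT, game_title TEXT NOT NULL);
CREATE TABLE GameAlias (old_id INTEGER PRIMARY KEY, new_id INTEGER NOT NULL);
INSERT INTO GameList VALUES
    (1, 17450, NULL, NULL, 'Dragon Age: Origins'),
    (2, NULL, 138994900, NULL, 'Dragon Age(TM): Origins'),
    (3, NULL, NULL, 'dragonage', 'Dragon Age Origins'),
    (4, 47850, 201841300, 'fifamgr11', 'FIFA Manager 11'),
    (5, 11111, NULL, NULL, 'Starcraft 1'),
    (6, NULL, 1111111111, NULL, 'Starcraft 1.1'),
    (7, NULL, NULL, NULL, 'Starcraft 1.2'),
    (8, NULL, NULL, 'sc1', 'Starcraft 1.3');
INSERT INTO GameAlias VALUES (2, 1), (3, 1), (6, 5), (7, 6), (8, 7);
""")

QUERY = """
WITH RECURSIVE resolved(old_id, root_id) AS (
    -- Anchor: every alias points at its immediate target.
    SELECT old_id, new_id FROM GameAlias
    UNION
    -- Step: if the current target is itself an alias, follow it one level.
    -- UNION (not UNION ALL) discards repeats, so an alias cycle terminates.
    SELECT r.old_id, ga.new_id
    FROM resolved r
    JOIN GameAlias ga ON ga.old_id = r.root_id
),
roots AS (
    -- Keep only final targets, i.e. ids that are not aliases themselves.
    SELECT old_id, root_id FROM resolved
    WHERE root_id NOT IN (SELECT old_id FROM GameAlias)
)
SELECT COALESCE(m.root_id, gl.id) AS id,
       MAX(gl.steam_id), MAX(gl.origin_id), MAX(gl.impulse_id),
       MAX(CASE WHEN m.old_id IS NULL THEN gl.game_title END) AS game_title
FROM GameList gl
LEFT JOIN roots m ON m.old_id = gl.id
GROUP BY COALESCE(m.root_id, gl.id)
ORDER BY 1
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The final SELECT is the same aggregation as Solution 2; only the source of the old-to-root mapping changes, so no temporary table or fixed recursion depth is needed.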