Automating table normalization - mysql

I have a table with this structure (simplified):
artID: 1
artName: TNT
ArtBrand: ACME
...
I want to normalize it by moving the brand into a separate table (which will hold additional data about every brand).
So I want to end up with this:
article table:
artID: 1
artName: TNT
brandID: 1
...
brand table
brandID: 1
brandName: ACME
brandInfo: xyz
....
This table has far too many brands to do this manually.
Is there an easy way to automate this?
I'm using MySQL

As the other answers suggested, you can use the INSERT ... SELECT syntax to do something like this:
INSERT INTO brands (brandName)
SELECT artBrand
FROM original
GROUP BY artBrand;
INSERT INTO articles (artName, brandID)
SELECT o.artName, b.brandID
FROM original o
JOIN brands b ON (b.brandName = o.artBrand);
Test case:
CREATE TABLE original (artID int, artName varchar(10), artBrand varchar(10));
CREATE TABLE articles (artID int auto_increment primary key, artName varchar(10), brandID int);
CREATE TABLE brands (brandID int auto_increment primary key, brandName varchar(10));
INSERT INTO original VALUES (1, 'TNT1', 'ACME1');
INSERT INTO original VALUES (2, 'TNT2', 'ACME1');
INSERT INTO original VALUES (3, 'TNT3', 'ACME1');
INSERT INTO original VALUES (4, 'TNT4', 'ACME2');
INSERT INTO original VALUES (5, 'TNT5', 'ACME2');
INSERT INTO original VALUES (6, 'TNT6', 'ACME3');
INSERT INTO original VALUES (7, 'TNT7', 'ACME3');
INSERT INTO original VALUES (8, 'TNT8', 'ACME3');
INSERT INTO original VALUES (9, 'TNT9', 'ACME4');
Result:
SELECT * FROM brands;
+---------+-----------+
| brandID | brandName |
+---------+-----------+
| 1 | ACME1 |
| 2 | ACME2 |
| 3 | ACME3 |
| 4 | ACME4 |
+---------+-----------+
4 rows in set (0.00 sec)
SELECT * FROM articles;
+-------+---------+---------+
| artID | artName | brandID |
+-------+---------+---------+
| 1 | TNT1 | 1 |
| 2 | TNT2 | 1 |
| 3 | TNT3 | 1 |
| 4 | TNT4 | 2 |
| 5 | TNT5 | 2 |
| 6 | TNT6 | 3 |
| 7 | TNT7 | 3 |
| 8 | TNT8 | 3 |
| 9 | TNT9 | 4 |
+-------+---------+---------+
9 rows in set (0.00 sec)
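If the original artID values need to survive the move (for example because other tables already reference them), one option is to copy the column explicitly instead of letting AUTO_INCREMENT renumber the rows. A minimal sketch, assuming the original, articles and brands tables from the test case above and an articles table that is still empty:

```sql
-- Copy artID explicitly; MySQL accepts explicit values for an
-- AUTO_INCREMENT column as long as they do not collide with existing keys.
INSERT INTO articles (artID, artName, brandID)
SELECT o.artID, o.artName, b.brandID
FROM original o
JOIN brands b ON b.brandName = o.artBrand;

-- Sanity check: list source rows that found no matching brand
-- (rows with a NULL artBrand would show up here).
SELECT o.artID, o.artBrand
FROM original o
LEFT JOIN brands b ON b.brandName = o.artBrand
WHERE b.brandID IS NULL;
```

With the test data above, the second query should return no rows, since every article has a brand.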

I would do it in these steps:
1) Use the CREATE TABLE ... AS SELECT syntax to create the brands table with generated IDs.
2) Create the brand_id column in the article table and fill it with the generated IDs from the brands table, matching on the existing brand column.
3) Remove the brand columns from the article table, except of course brand_id.
4) Create the foreign key.
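The steps above can be sketched roughly as follows. This assumes an InnoDB table article(artID, artName, ArtBrand) as in the question; the brand table, the brandID and brandName columns and the constraint name fk_article_brand are illustrative choices, not something the answer specifies.

```sql
-- 1) Build the brand table: brandID is generated, brandName comes from the SELECT.
CREATE TABLE brand (
  brandID INT NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (brandID)
) SELECT DISTINCT ArtBrand AS brandName
  FROM article
  WHERE ArtBrand IS NOT NULL;

-- 2) Add the new column and fill it from the brand table.
ALTER TABLE article ADD COLUMN brandID INT NULL;

UPDATE article a
JOIN brand b ON b.brandName = a.ArtBrand
SET a.brandID = b.brandID;

-- 3) Drop the now-redundant text column.
ALTER TABLE article DROP COLUMN ArtBrand;

-- 4) Enforce the relationship (hypothetical constraint name).
ALTER TABLE article
  ADD CONSTRAINT fk_article_brand
  FOREIGN KEY (brandID) REFERENCES brand (brandID);
```

It is probably worth running this on a copy of the table first, and checking for rows where brandID is still NULL before dropping ArtBrand in step 3.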

Generating brands table should be fairly simple:
CREATE TABLE brands (
id INT PRIMARY KEY AUTO_INCREMENT,
brand_name VARCHAR(50),
brand_info VARCHAR(200)
);
INSERT INTO brands (brand_name)
SELECT ArtBrand FROM Table
GROUP BY ArtBrand;
Similar story with creating relations between your original table and new brands table, just that select statement in your insert will look like this:
SELECT t.artId, b.id
FROM table t JOIN brands b ON (t.ArtBrand = b.brand_name)

Related

Inserting new data in a table

I have created a basic table for learning purposes.
CREATE TABLE friends (
id INT,
name TEXT,
birthday DATE
);
Added some data...
INSERT INTO friends (id,name,birthday)
VALUES (1,'Jo Monro','1948-05-30');
INSERT INTO friends (id,name,birthday)
VALUES (2, 'Lara Johnson','1970-03-03');
INSERT INTO friends (id,name,birthday)
VALUES (3,'Bob Parker', '1962-09-3');
And I realised that I forgot to include the email column.
I added the column...
ALTER TABLE friends
ADD COLUMN email;
...but how can I now add data to this new column only?
I have tried WHERE clauses and rewriting the INSERT INTO statements with and without the other column names, but nothing worked.
What am I missing here?
Thank you!
Insert the emails into a temporary table, then update the real table with that.
CREATE TABLE friends (
id INT auto_increment primary key,
name VARCHAR(100),
birthday DATE
);
INSERT INTO friends (name, birthday) VALUES
('Jo Monro','1948-05-30')
, ('Lara Johnson','1970-03-03')
, ('Bob Parker', '1962-09-3');
ALTER TABLE friends ADD COLUMN email VARCHAR(100);
select * from friends
id | name | birthday | email
-: | :----------- | :--------- | :----
1 | Jo Monro | 1948-05-30 | null
2 | Lara Johnson | 1970-03-03 | null
3 | Bob Parker | 1962-09-03 | null
--
-- temporary table for the emails
--
CREATE TEMPORARY TABLE tmpEmails (
name varchar(100) primary key,
email varchar(100)
);
--
-- fill the temp
--
insert into tmpEmails (name, email) values
('Jo Monro','jo.monro@unmail.net')
, ('Lara Johnson','lara.johnson@unmail.net')
, ('Bob Parker', 'UltimateLordOfDarkness@chuni.byo');
--
-- update the real table
--
update friends friend
join tmpEmails tmp
on friend.name = tmp.name
set friend.email = tmp.email
where friend.email is null;
select * from friends
id | name | birthday | email
-: | :----------- | :--------- | :-------------------------------
1 | Jo Monro | 1948-05-30 | jo.monro@unmail.net
2 | Lara Johnson | 1970-03-03 | lara.johnson@unmail.net
3 | Bob Parker | 1962-09-03 | UltimateLordOfDarkness@chuni.byo
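For just a few rows, a temporary table may be more than is needed; plain UPDATE statements keyed on id would also fill the new column. A small sketch, assuming the friends table above with ids 1 to 3; the addresses are made-up placeholders:

```sql
-- One row at a time.
UPDATE friends SET email = 'jo@example.com' WHERE id = 1;

-- Or several rows in a single statement.
UPDATE friends
SET email = CASE id
              WHEN 2 THEN 'lara@example.com'
              WHEN 3 THEN 'bob@example.com'
            END
WHERE id IN (2, 3);
```

INSERT always creates new rows, which is why rewriting the INSERT statements did not help; UPDATE is the statement that changes rows that already exist.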

split a field having comma separated values and update the same field by adding additional rows with separated values

I have a field with comma-separated values. I want to split the field and add rows with the values obtained (after splitting) to the same table.
Eg:
**Data as in db**
ID CustomerId Preferences
------------------------------
1. 4456823 AA,BB,DD
2. 4456824 BB,DD
**Data format required**
ID CustomerId Preferences
------------------------------
1. 4456823 AA
2. 4456823 BB
3. 4456823 DD
4. 4456824 BB
5. 4456824 DD
Is there a way I can do this without using a temp table? The Customer here is a cascading entity: this Id is formed by some other table, where it is an auto-increment key.
Basically, I want to update the first field and insert the other split values to get the desired result shown above.
Here's one approach using a table of integers (0-9):
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL
,customerid INT NOT NULL
,preferences VARCHAR(100) NOT NULL
);
INSERT INTO my_table VALUES
(1,4456823,'AA,BB,DD'),
(2,4456824,'BB,DD');
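-- The original answer does not show the ints table; it is assumed here
-- to be a helper table holding the integers 0-9 in a column named i.
DROP TABLE IF EXISTS ints;
CREATE TABLE ints (i INT NOT NULL PRIMARY KEY);
INSERT INTO ints VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
DROP TABLE IF EXISTS my_new_table;
CREATE TABLE my_new_table (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,customerid INT NOT NULL,preferences CHAR(2) NOT NULL);
INSERT INTO my_new_table (customerid,preferences)
SELECT DISTINCT customerid, SUBSTRING_INDEX(SUBSTRING_INDEX(preferences,',',i+1),',',-1) x FROM my_table, ints ORDER BY customerid,x;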
SELECT * FROM my_new_table;
+----+------------+-------------+
| id | customerid | preferences |
+----+------------+-------------+
| 1 | 4456823 | AA |
| 2 | 4456823 | BB |
| 3 | 4456823 | DD |
| 4 | 4456824 | BB |
| 5 | 4456824 | DD |
+----+------------+-------------+
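On MySQL 8.0 or later, the helper integers could instead come from a recursive common table expression, so no extra table is needed. The following is only a sketch under that version assumption, using my_table from above. It limits each row to as many positions as it has values by counting commas, rather than relying on DISTINCT, so repeated values within one list would be kept.

```sql
WITH RECURSIVE ints (i) AS (
  SELECT 0
  UNION ALL
  SELECT i + 1 FROM ints WHERE i < 9   -- positions 0-9, i.e. up to 10 values per row
)
SELECT t.customerid,
       SUBSTRING_INDEX(SUBSTRING_INDEX(t.preferences, ',', n.i + 1), ',', -1) AS preference
FROM my_table t
JOIN ints n
  -- number of commas = number of values minus one
  ON n.i <= CHAR_LENGTH(t.preferences) - CHAR_LENGTH(REPLACE(t.preferences, ',', ''))
ORDER BY t.customerid, n.i;
```

The same SELECT could feed an INSERT INTO my_new_table (customerid, preferences) statement.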

Database design and join operations among database tables

The theme of this question is to maintain the user comments over my website.
I have around 25,000 articles on my website (in different categories), and each article has a comments section below it. Since the number of comments has grown past 70,000, I decided to divide the articles into separate tables by category (articles_of_type_category), with a corresponding comments table (article_category_comments) for each, assuming that this would improve performance in the future (though currently it is working fine).
Now I have two questions :
1) Should I divide the database, or will there be no degradation in performance if the table grows further in size?
2) If yes, then I have a problem with the SQL join for the new database design. On the comments page for each article I show the comments, the name of the person who made each comment, and his points.
So suppose a user is viewing article 3; I need to obtain the following details to show on the page of article 3
-------------------------------------------------------------------------------------------
serial#| comment | name_who_made_this_comment | points | gold | silver | bronze
-------------------------------------------------------------------------------------------
| | | | | |
| | | | | |
by joining these three tables
user_details
+----+--------+----------+
| id | name | college |
+----+--------+----------+
| 1 | naveen | a |
| 2 | rahul | b |
| 3 | dave | c |
| 4 | tom | d |
+----+--------+----------+
score (this table stores the user points like stackoverflow)
+----+--------+------+--------+--------+---------+
| id | points | gold | silver | bronze | user_id |
+----+--------+------+--------+--------+---------+
| 1 | 2354 | 2 | 9 | 25 | 3 |
| 2 | 4562 | 1 | 9 | 11 | 2 |
| 3 | 1123 | 7 | 9 | 11 | 1 |
| 4 | 3457 | 0 | 9 | 4 | 4 |
+----+--------+------+--------+--------+---------+
comments (this table stores the comment, the id of the article on which it was made, and the user id)
+----+----------------------------+-------------+---------+
| id | comment | article_id | user_id |
+----+----------------------------+-------------+---------+
| 1 | This is a nice article | 3 | 1 |
| 2 | This is a tough article | 3 | 4 |
| 3 | This is a good article | 2 | 7 |
| 4 | This is a good article | 1 | 3 |
| 5 | Please update this article | 4 | 4 |
+----+----------------------------+-------------+---------+
I tried something like
select * from comments join (select * from user_details join points where user_details.id=points.user_id)as joined_temp where comments.id=joined_temp.u_id and article_id=3;
This is a response to this comment, "@DanBracuk: It would really be useful if you give an overview by naming the tables and corresponding column names"
Table category
categoryId int not null, autoincrement primary key
category varchar(50)
Sample categories could be "Fairy Tale", "World War I", or "Movie Stars".
Table article
articleId int not null, autoincrement primary key
categoryId int not null foreign key
text clob, or whatever the mysql equivalent is
Since the comment was in response to my comment about articles and categories, this answer is limited to that.
I would start with one table for articles and one for categories, then use a bridge table to link the two. My advice would be to index the category column in the bridge table, which would speed up access.
Example of table structure:
CREATE TABLE Article (
id int NOT NULL AUTO_INCREMENT PRIMARY KEY,
title varchar(100) NOT NULL
);
INSERT INTO Article
(title)
VALUES
('kljlkjlkjalk'),
('aiouiwiuiuaoijukj');
CREATE TABLE Category (
id int NOT NULL AUTO_INCREMENT PRIMARY KEY,
name varchar(100)
);
INSERT INTO Category
(name)
VALUES
('kljlkjlkjalk'),
('aiouiwiuiuaoijukj');
CREATE TABLE Article_Category (
id int NOT NULL AUTO_INCREMENT PRIMARY KEY,
article_id int,
category_id int
);
INSERT INTO Article_Category
(article_id, category_id)
VALUES
(1,1),
(1,2);
CREATE TABLE User_Details
(`id` int, `name` varchar(6), `college` varchar(1))
;
INSERT INTO User_Details
(`id`, `name`, `college`)
VALUES
(1, 'naveen', 'a'),
(2, 'rahul', 'b'),
(3, 'dave', 'c'),
(4, 'tom', 'd')
;
CREATE TABLE Score
(`id` int, `points` int, `gold` int, `silver` int, `bronze` int, `user_id` int)
;
INSERT INTO Score
(`id`, `points`, `gold`, `silver`, `bronze`, `user_id`)
VALUES
(1, 2354, 2, 9, 25, 3),
(2, 4562, 1, 9, 11, 2),
(3, 1123, 7, 9, 11, 1),
(4, 3457, 0, 9, 4, 4)
;
CREATE TABLE Comment
(`id` int, `comment` varchar(26), `article_id` int, `user_id` int)
;
INSERT INTO Comment
(`id`, `comment`, `article_id`, `user_id`)
VALUES
(1, 'This is a nice article', 3, 1),
(2, 'This is a tough article', 3, 4),
(3, 'This is a good article', 2, 7),
(4, 'This is a good article', 1, 3),
(5, 'Please update this article', 4, 4)
;
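Given the tables created above, the listing the question asks for (comment, commenter name and score for one article) could be produced with two joins. A sketch, assuming the Comment, User_Details and Score tables exactly as defined in this answer:

```sql
SELECT c.comment,
       u.name,
       s.points,
       s.gold,
       s.silver,
       s.bronze
FROM Comment c
JOIN User_Details u ON u.id = c.user_id      -- who wrote the comment
JOIN Score s        ON s.user_id = u.id      -- that user's points and medals
WHERE c.article_id = 3
ORDER BY c.id;
```

With the sample rows above this should return two rows: one for naveen (1123 points) and one for tom (3457 points). Because these are inner joins, a comment whose user_id has no matching row in User_Details or Score, such as user 7 in the sample data, would be left out.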
Best of luck.
70,000 rows is not many. In fact, the number is close to nothing. Your problem lies in bad design. I have a table with many millions of records, and when I send a request to the application server, which executes complex queries in the backend, it responds in less than a second. So your design is definitely sub-optimal somewhere. I think a detailed answer would take too much space and effort (there is a whole science built around your question), which is out of scope for this website, so I will point you in the right direction:
Read about normalization (1NF, 2NF, 3NF, BCNF and so on) and compare it to your design.
Read about indexing and other implicit optimizations
Optimize your queries and minimize the number of queries
To answer your concrete question: no, you should not "divide" your table. You should fix the structural errors in your database schema and optimize the algorithms that use your database.
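As one concrete illustration of the indexing advice: the comments page filters on article_id, so an index on that column is a natural first candidate. A sketch, assuming the single comments table from the question; the index name is made up:

```sql
-- Hypothetical index name; comments.article_id is the column the page filters on.
CREATE INDEX idx_comments_article_id ON comments (article_id);

-- EXPLAIN shows whether the optimizer uses the index for this lookup.
EXPLAIN
SELECT id, comment, user_id
FROM comments
WHERE article_id = 3;
```

Whether this index helps in practice depends on the real queries and data distribution, which is why checking the EXPLAIN output is worthwhile.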

MySQL INSERT .. UPDATE breaks AUTO_INCREMENT?

There are the following two tables:
create table lol(id int auto_increment, data int, primary key id(id));
create table lol2(id int auto_increment, data int, primary key id(id));
Insert some values:
insert into lol2 (data) values (1),(2),(3),(4);
Now insert using select:
insert into lol (data) select data from lol2;
Do it again:
insert into lol (data) select data from lol2;
Now look at the table:
select * from lol;
I receive:
+----+------+
| id | data |
+----+------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 8 | 1 |
| 9 | 2 |
| 10 | 3 |
| 11 | 4 |
+----+------+
I'm puzzled by the gap between 4 and 8... What caused this and how can I do it so that there isn't a gap? Thanks a lot!
AUTO_INCREMENT does not guarantee that the values in the ID column increase by exactly 1. And it cannot, because as soon as you work with parallel transactions it would break anyway:
Transaction A                   Transaction B
BEGIN                           BEGIN
INSERT INTO lol VALUES(...)     INSERT INTO lol VALUES(...)
...                             ...
COMMIT                          ROLLBACK
What ids should be assigned by the database? It cannot know in advance which transaction will succeed and which will be rolled back.
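The effect can be reproduced in a single session. A small sketch, assuming lol is an InnoDB table as created above and contains the rows shown; the data values 99 and 100 are arbitrary:

```sql
START TRANSACTION;
INSERT INTO lol (data) VALUES (99);   -- reserves the next AUTO_INCREMENT value
ROLLBACK;                             -- the row disappears, the value is not handed back

INSERT INTO lol (data) VALUES (100);  -- receives a later id, leaving a gap
SELECT id, data FROM lol ORDER BY id;
```

The row with data = 100 should end up with an id that skips the value consumed by the rolled-back insert.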
If you need sequential numbering of your records, you can use a query that computes it, e.g.
SELECT COUNT(*) AS position, a.data
FROM lol a
INNER JOIN lol b ON b.id <= a.id
GROUP BY a.id, a.data
ORDER BY position;
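On MySQL 8.0 or later, a window function gives the same gap-free numbering without a self-join. A sketch under that version assumption, using the lol table from the question:

```sql
SELECT ROW_NUMBER() OVER (ORDER BY id) AS position,
       id,
       data
FROM lol
ORDER BY id;
```

The position column counts 1, 2, 3, ... in id order, regardless of any gaps in id itself.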

Merge two MySQL tables

I have two tables: data and pics
data:
-------------------
id | name | email |
pics:
-------------------
msg_id | pic |
The data is associated by id and msg_id. How can I merge these two tables into a final one?
Edit: some rows in data don't have an associated pic, and I need to keep them.
You may want to use the INSERT INTO ... SELECT syntax:
INSERT INTO final_table
SELECT id, name, email, pic
FROM data
LEFT JOIN pics ON (pics.msg_id = data.id);
Example:
Your current data:
CREATE TABLE data (id int, name varchar(20), email varchar(100));
INSERT INTO data VALUES (1, 'name1', 'a@example.com');
INSERT INTO data VALUES (2, 'name2', 'b@example.com');
INSERT INTO data VALUES (3, 'name3', 'c@example.com');
INSERT INTO data VALUES (4, 'name4', 'd@example.com');
CREATE TABLE pics (msg_id int, pic varchar(100));
INSERT INTO pics VALUES (1, 'pic1.jpg');
INSERT INTO pics VALUES (1, 'pic2.jpg');
INSERT INTO pics VALUES (2, 'pic3.jpg');
INSERT INTO pics VALUES (2, 'pic4.jpg');
INSERT INTO pics VALUES (3, 'pic5.jpg');
Your new table:
CREATE TABLE final_table (
id int, name varchar(20), email varchar(100), pic varchar(100)
);
INSERT INTO final_table
SELECT id, name, email, pic
FROM data
LEFT JOIN pics ON (pics.msg_id = data.id);
Result:
SELECT * FROM final_table;
+------+-------+---------------+----------+
| id | name | email | pic |
+------+-------+---------------+----------+
| 1 | name1 | a@example.com | pic1.jpg |
| 1 | name1 | a@example.com | pic2.jpg |
| 2 | name2 | b@example.com | pic3.jpg |
| 2 | name2 | b@example.com | pic4.jpg |
| 3 | name3 | c@example.com | pic5.jpg |
| 4 | name4 | d@example.com | NULL |
+------+-------+---------------+----------+
+------+-------+---------------+----------+
6 rows in set (0.00 sec)
You probably need to fetch the data from both tables and join them into a single result. You could use the following query (note that, being an inner join, it returns only the rows in data that have a matching pic):
SELECT D.*, P.* FROM data D, pics P WHERE D.Id=P.msg_id
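Because the question asks to keep rows without a picture, it can be useful to see which rows an inner join like the one above would drop. A sketch of that check, assuming the data and pics tables from the example earlier in this thread:

```sql
-- Rows in data with no matching row in pics.
SELECT d.id, d.name
FROM data d
LEFT JOIN pics p ON p.msg_id = d.id
WHERE p.msg_id IS NULL;
```

With the example rows, this returns the single row with id 4 (name4), which is the row that only the LEFT JOIN version carries into final_table.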