MySQL schema design issues - Normalizing - mysql

I'm creating tables for my site using the following design(s)
Design 1
Design 2
Since not every user who registers will try the challenge, Design 1 seems suited. On insert into the third table, the score in table 2 is updated accordingly, but then the user_id field becomes redundant.
Design 2 sets either 0 or NULL for every user, which still isn't normalized.
What would be the optimal design, and how important are normalization and keys in an organization?

Edit:
For future people - I had some problems understanding what OP was asking for so read through the comments if you get a little lost. Ultimately, they were looking to store aggregate data and didn't know where to put it or how to make it happen. The solution is basically to use an insert trigger, which is explained near the end of this post.
I chose to just add another column on to the user table to store the accumulated sum of user_problem.score. However, making a new table (with the columns user_id and total_sum) isn't a bad option at all even though it seems to be an excessive use of normalization. Sometimes it is good to keep data that is constantly updated separate from data that is rarely changed. That way if something goes wrong, you know your static data will be safe.
Something else I never touched on are the data concurrency and integrity issues associated with storing aggregate data in general... so beware of that.
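For example, a periodic reconciliation query can catch and repair any drift between the stored total and the detail rows. A rough sketch, using the table and column names proposed below (User.Total_Score, User_Problem.Score):
UPDATE User u
LEFT JOIN ( SELECT User_ID, SUM(Score) AS real_total
            FROM User_Problem
            GROUP BY User_ID ) up ON up.User_ID = u.User_ID
SET u.Total_Score = IFNULL(up.real_total, 0)       -- rewrite the total from the detail rows
WHERE u.Total_Score <> IFNULL(up.real_total, 0);   -- only touch rows that have drifted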
I would suggest something like this:
User Table
User_ID - Email - Name - Password - FB_ID
-- holds all the user information
Problem Table
Problem_ID - Problem_Title - Problem_Descr
-- holds all the info on the individual challenges/problems/whatever
User_Problem Table
User_Problem_ID - User_ID - Problem_ID - Score - Completion_Date
-- Joins the User and Problem tables and has information specific
-- to a user+challenge pair
This assumes that a user can take many challenges/problems, and that one problem/challenge can be taken by several users.
To see all the problems by a certain user, you would do something like:
select user.user_id,
       user.name,
       problem.problem_title,
       problem.problem_descr,
       user_problem.score,
       user_problem.completion_date
from user
join user_problem on user.user_id = user_problem.user_id
join problem on user_problem.problem_id = problem.problem_id
where user.user_id = 123 or user.email = 'stuff@gmail.com';
The lengths for the varchar fields are fairly generic...
create table User(
  User_ID int unsigned auto_increment primary key,
  Email varchar(100),
  Name varchar(100),
  Password varchar(100),
  FB_ID int
);

create table Problem (
  Problem_ID int unsigned auto_increment primary key,
  Problem_Title varchar(100),
  Problem_Descr varchar(500)
);

create table User_Problem (
  User_Problem_ID int unsigned auto_increment primary key,
  User_ID int unsigned,
  Problem_ID int unsigned,
  Score int,
  Completion_Date datetime,
  foreign key (User_ID) references User (User_ID),
  foreign key (Problem_ID) references Problem (Problem_ID)
);
After our conversation from down below in the comments... you would add a column to user:
User Table
User_ID - Email - Name - Password - FB_ID - Total_Score
I gave the column a default value of 0 because you seemed to want/need that if the person didn't have any associated problem/challenges. Depending on other things, it may benefit you to make this an unsigned int if you have a rule which states there will never be a negative score.
alter table user add column Total_Score int default 0;
then... you would use an insert trigger on the user_problem table that affects the user table.
CREATE TRIGGER tgr_update_total_score
AFTER INSERT ON User_Problem
FOR EACH ROW
  UPDATE User
  SET Total_Score = Total_Score + NEW.Score
  WHERE User_ID = NEW.User_ID;
So... after a row is added to User_Problem, you would add the new score to user.total_score...
mysql> select * from user;
+---------+-------+------+----------+-------+-------------+
| User_ID | Email | Name | Password | FB_ID | Total_Score |
+---------+-------+------+----------+-------+-------------+
|       1 | NULL  | kim  | NULL     |  NULL |           0 |
|       2 | NULL  | kyle | NULL     |  NULL |           0 |
+---------+-------+------+----------+-------+-------------+
2 rows in set (0.00 sec)
mysql> insert into user_problem values (null,1,1,10,now());
Query OK, 1 row affected (0.16 sec)
mysql> select * from user;
+---------+-------+------+----------+-------+-------------+
| User_ID | Email | Name | Password | FB_ID | Total_Score |
+---------+-------+------+----------+-------+-------------+
|       1 | NULL  | kim  | NULL     |  NULL |          10 |
|       2 | NULL  | kyle | NULL     |  NULL |           0 |
+---------+-------+------+----------+-------+-------------+
2 rows in set (0.00 sec)
mysql> select * from user_problem;
+-----------------+---------+------------+-------+---------------------+
| User_Problem_ID | User_ID | Problem_ID | Score | Completion_Date |
+-----------------+---------+------------+-------+---------------------+
|               1 |       1 |          1 |    10 | 2013-11-03 11:31:53 |
+-----------------+---------+------------+-------+---------------------+
1 row in set (0.00 sec)
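One more caveat on the aggregate column: the AFTER INSERT trigger only covers new rows. If rows in User_Problem can ever be updated or deleted, you would need matching triggers so the total stays honest. A sketch (trigger names are illustrative):
CREATE TRIGGER tgr_total_score_upd
AFTER UPDATE ON User_Problem
FOR EACH ROW
  UPDATE User
  SET Total_Score = Total_Score - OLD.Score + NEW.Score   -- assumes User_ID itself never changes
  WHERE User_ID = NEW.User_ID;

CREATE TRIGGER tgr_total_score_del
AFTER DELETE ON User_Problem
FOR EACH ROW
  UPDATE User
  SET Total_Score = Total_Score - OLD.Score
  WHERE User_ID = OLD.User_ID;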

Related

Mysql 3 tables, multiple counts grouped by, but gets stuck

I'm running into some trouble with a query. I'm trying to retrieve some data from a big database where 3 tables are involved.
These tables contain data about ads: in a backend website the administrator can manage which local ads he wants displayed, their position, etc. The data is organized in 3 tables: one of them contains all the info relevant to the ad itself (name, date of availability, date of expiration, etc.), and the other 2 tables contain extra info about views and clicks only.
So I have only 15 ads, each with multiple clicks and multiple views.
The click and view tables each register a new row per event. So, when a click is registered, a new row is added where addid_click is the id of that click and addid is the addid from adds_table. So, for instance, ad (1) has 2 views and 2 clicks, while ad (2) has 1 view and 1 click.
My idea is to get, for each ad, how many clicks and views it had in total.
I have 3 tables like these:
adds_table                adds_clicks_table         adds_views_table
+-------+-----------+     +-------------+-------+   +-------------+-------+
| addid | name      |     | addid_click | addid |   | addid_views | addid |
+-------+-----------+     +-------------+-------+   +-------------+-------+
| 1     | add_name1 |     | 1           | 1     |   | 1           | 1     |
| 2     | add_name2 |     | 2           | 2     |   | 2           | 1     |
| 3     | add_name3 |     | 3           | 1     |   | 3           | 2     |
+-------+-----------+     +-------------+-------+   +-------------+-------+
CREATE TABLE `bwm_adds` (
  `addid` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) NOT NULL,
  ...
  PRIMARY KEY (`addid`)
) ENGINE=InnoDB AUTO_INCREMENT=16 DEFAULT CHARSET=utf8

CREATE TABLE `bwm_adds_clicks` (
  `add_clickid` int(19) NOT NULL AUTO_INCREMENT,
  `addid` int(11) NOT NULL,
  ...
  PRIMARY KEY (`add_clickid`)
) ENGINE=InnoDB AUTO_INCREMENT=3374 DEFAULT CHARSET=utf8

CREATE TABLE `bwm_adds_views` (
  `add_viewsid` int(19) NOT NULL AUTO_INCREMENT,
  `addid` int(11) NOT NULL,
  ...
  PRIMARY KEY (`add_viewsid`)
) ENGINE=InnoDB AUTO_INCREMENT=2078738 DEFAULT CHARSET=utf8
The result would be a single table where I retrieve, for each ad (addid), how many clicks and how many views it had.
I need a query that gets me something like this:
+-------+---------+-----------+
| addid | clicks  | views     |
+-------+---------+-----------+
|     1 |  123123 | 235457568 |
+-------+---------+-----------+
|     2 | 5124123 | 435345234 |
+-------+---------+-----------+
|     3 |  123541 | 453563623 |
+-------+---------+-----------+
I tried to execute a query, but it gets stuck and keeps loading indefinitely... I'm pretty sure my query is failing, because if I remove one of the counts it displays the data very quickly.
SELECT a.addid, COUNT(ac.addid_clicks) as 'clicks', COUNT(av.addid_views) as 'views'
FROM `adds_table` a
LEFT JOIN `adds_clicks_table` ac ON a.addid = ac.addid_click
LEFT JOIN `adds_views_table` av ON ac.addid_click = av.addid_views
GROUP BY a.addid
MySQL keeps loading all the time; any idea what I'm missing?
By the way, I found this post that deals with almost the same problem; you can see my query is very similar to the one in the first answer, but I get the loading message all the time. No errors, just loading.
Edit: I misplaced the numbers and got confused. The tables are fixed now and I added some explanation about them.
Edit 2: I updated the post with the SHOW CREATE TABLE definitions.
Edit 3: Is there any way to optimise this query? It seems to retrieve the result I want, but the database cancels it because it takes more than 30 seconds to execute.
SELECT a.addid,
(SELECT COUNT(addid) FROM add_clicks where addid = a.addid) as clicks,
(SELECT COUNT(addid) FROM add_views where addid = a.addid) as views
FROM adds a ORDER BY a.addid;
If those are really your tables (one column, plus an auto_inc), then there is no meaningful information justifying having 3 tables instead of 1:
CREATE TABLE `bwm_adds` (
  `addid` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) NOT NULL,
  clicks INT UNSIGNED NOT NULL,
  views INT UNSIGNED NOT NULL,
  PRIMARY KEY (`addid`)
) ENGINE=InnoDB AUTO_INCREMENT=16 DEFAULT CHARSET=utf8
and then UPDATE ... SET views = views + 1 (etc) rather than inserting into the other tables.
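A minimal sketch of that counter-column approach (the addid value is just an example):
-- whenever ad 1 is clicked / viewed:
UPDATE bwm_adds SET clicks = clicks + 1 WHERE addid = 1;
UPDATE bwm_adds SET views  = views  + 1 WHERE addid = 1;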
If you have an old version of MySQL:
SELECT a.addid,
       ( SELECT COUNT(addid_clicks)
         FROM `adds_clicks_table`
         WHERE addid = a.addid
       ) AS 'clicks',
       ( SELECT COUNT(addid_views)
         FROM `adds_views_table`
         WHERE addid = a.addid
       ) AS 'views'
FROM adds_table AS a
For 5.6 and later, this might be faster:
SELECT a.addid, c.clicks, v.views
FROM `adds_table` a
LEFT JOIN ( SELECT addid, COUNT(*) AS clicks FROM `adds_clicks_table` GROUP BY addid ) AS c USING(addid)
LEFT JOIN ( SELECT addid, COUNT(*) AS views  FROM `adds_views_table`  GROUP BY addid ) AS v USING(addid)
If you get NULLs but prefer 0s, then wrap the value in IFNULL(..., 0).
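Applied to the 5.6+ query above, that would look something like:
SELECT a.addid, IFNULL(c.clicks, 0) AS clicks, IFNULL(v.views, 0) AS views
FROM `adds_table` a
LEFT JOIN ( SELECT addid, COUNT(*) AS clicks FROM `adds_clicks_table` GROUP BY addid ) AS c USING(addid)
LEFT JOIN ( SELECT addid, COUNT(*) AS views  FROM `adds_views_table`  GROUP BY addid ) AS v USING(addid);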
If you need to discuss further, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...
I ended up with a solution to my problem. The table I was querying was too big because of the badly engineered database: in adds_views_table a new row is added for every single view, ending up with almost 3 million rows and a table that weighs almost 35% of the entire database (326 MB).
When phpMyAdmin tried to execute the query, it loaded forever and never showed a result because of the timeout limit applied to MySQL. Changing that value would help, but it wasn't a viable way to retrieve the data and display it on a website (it implies the website or data wouldn't load until the query has finished executing).
The problem was fixed by creating an index on addid in adds_table. Also, for some reason the query is faster when subqueries are used. The query ended up like this:
SELECT a.addid,
       (SELECT COUNT(addid) FROM adds_clicks_table WHERE addid = a.addid) AS 'clicks',
       (SELECT COUNT(addid) FROM adds_views_table WHERE addid = a.addid) AS 'views'
FROM adds_table a
ORDER BY a.addid;
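The exact index statement isn't shown here; for correlated subqueries like these, the index that matters is on addid in the clicks and views tables, which would look something like (index names are illustrative):
ALTER TABLE adds_clicks_table ADD INDEX idx_clicks_addid (addid);
ALTER TABLE adds_views_table  ADD INDEX idx_views_addid (addid);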
Thanks to @Rick James, who posted a similar query that I ended up modifying to get the data I needed.
Forgive my horrible English.

How to set up table keys with 15M+ rows for high performance and low cost?

I need to ensure best performance for a table with 15M+ rows in a MySQL database hosted in AWS using Aurora (Small sized instance currently). The table is essentially for tracking the ownership and update timestamp of product units over time, along with each unit's other basic information like serial number.
The columns are as follows:
UnitId, ScanTime, Model, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId
Table Creation Statement
CREATE TABLE `UnitHistory` (
  `UnitId` bigint(20) NOT NULL,
  `ScanTime` int(11) NOT NULL,
  `Model` bigint(20) NOT NULL,
  `SerialNumber` int(11) NOT NULL,
  `MfrTimestamp` int(11) NOT NULL,
  `UpdateTimestamp` int(11) DEFAULT NULL,
  `CustomerId` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`UnitId`,`ScanTime`)
);
Rows will be added over time, but NEVER modified.
I chose UnitId and ScanTime as the primary key because those two together are sufficient to always be unique.
Query 1
The query I'll most frequently use will ideally produce a list of all UnitId's for a specific Model, along with the unit's most up-to-date details.
The following query will work, but will of course also return more rows than I need (redundant data):
SELECT UnitId, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId FROM UnitHistory WHERE Model=2500;
If there's a way to constrain that query so that only the row with the most recent ScanTime is returned for any given UnitId, that would be ideal.
Otherwise I'll simply search the result for the row with the most recent ScanTime for each UnitId afterward.
Query 2
The other very frequently used query will produce a basic set of details and history for any particular unit, like this:
SELECT ScanTime, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId FROM UnitHistory WHERE UnitId=1234567;
This query will primarily be used to track the change of ownership as it passes from the manufacturer to a customer, then
back to the manufacturer for update, then out to perhaps a different customer, etc.
Summary
With the above scenario, what additional key(s) should I have in order to ensure good performance and low cost?
One cost factor is that I assume my working set should fit within RAM in order to avoid lots of IOs since AWS charges for IOs.
My current database instance has 2 GB RAM, and for cost reasons I don't want to upgrade it.
For your query 1, you should have this index:
ALTER TABLE UnitHistory ADD INDEX (Model, ScanTime);
To get the most recent:
SELECT UnitId, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId
FROM UnitHistory WHERE Model=2500
ORDER BY ScanTime DESC LIMIT 1;
Here's a demo of using EXPLAIN to confirm the query uses the index (which is named "Model" after the first column of the index since I didn't give it a name in my test):
mysql> explain SELECT UnitId, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId FROM UnitHistory WHERE Model=2500 order by scantime desc limit 1;
+----+-------------+-------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | UnitHistory | NULL       | ref  | Model         | Model | 8       | const |    1 |   100.00 | Using where |
+----+-------------+-------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
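If what you actually need from query 1 is the latest row per UnitId for that Model, rather than a single row overall, a common greatest-n-per-group pattern can reuse the same index together with the primary key. A sketch, assuming the schema above:
SELECT h.UnitId, h.SerialNumber, h.MfrTimestamp, h.UpdateTimestamp, h.CustomerId
FROM UnitHistory h
WHERE h.Model = 2500
  AND h.ScanTime = ( SELECT MAX(h2.ScanTime)   -- latest scan for this unit, served by the (UnitId, ScanTime) primary key
                     FROM UnitHistory h2
                     WHERE h2.UnitId = h.UnitId );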
Your other query (query 2) is already searching by the left-most column of the primary key, so there's no need to add another index for it.
mysql> explain SELECT ScanTime, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId FROM UnitHistory WHERE UnitId=1234567;
+----+-------------+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | UnitHistory | NULL       | ref  | PRIMARY       | PRIMARY | 8       | const |    1 |   100.00 | NULL  |
+----+-------------+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------+
I can't predict whether your working set will fit in RAM, because I don't know the distribution of your data.
I assume this is an audit table and you are taking readings for units?
Partitioning the table, using views, or using prepared statements are some possible approaches.
Here is another way to handle Query 1: create a second table shaped like your UnitHistory (CREATE TABLE UnitReadings LIKE UnitHistory;), but with UnitId alone as the primary key.
Then add a trigger on UnitHistory (BEFORE INSERT or AFTER INSERT) that upserts into it; something like:
-- trigger name is illustrative; the upsert body is the important part
CREATE TRIGGER tgr_unit_readings
AFTER INSERT ON UnitHistory
FOR EACH ROW
  INSERT INTO `UnitReadings` (
    UnitId,
    ScanTime,
    Model,
    SerialNumber,
    MfrTimestamp,
    UpdateTimestamp,
    CustomerId
  ) VALUES (
    NEW.UnitId,
    NEW.ScanTime,
    NEW.Model,
    NEW.SerialNumber,
    NEW.MfrTimestamp,
    NEW.UpdateTimestamp,
    NEW.CustomerId
  ) ON DUPLICATE KEY UPDATE
    ScanTime = VALUES(ScanTime),
    Model = VALUES(Model),
    SerialNumber = VALUES(SerialNumber),
    MfrTimestamp = VALUES(MfrTimestamp),
    UpdateTimestamp = VALUES(UpdateTimestamp),
    CustomerId = VALUES(CustomerId);
The goal is to keep the latest reading in a "header table", which may have far fewer rows than your entire history of (readings per day * days) rows. After a few years you might exceed 15M rows in the history, but the header table could still hold around 1000 units, or however many units you are taking readings of. You may well exceed your performance expectations using this header table "within your 2GB RAM" :) :)
Not sure if you can implement this, but you get the idea, right?
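With such a header table in place, query 1 shrinks to a plain lookup. A sketch, assuming the UnitReadings table above plus an index on Model (index name is illustrative):
ALTER TABLE UnitReadings ADD INDEX idx_model (Model);

SELECT UnitId, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId
FROM UnitReadings
WHERE Model = 2500;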

MYSQL, PHP, order by not working, primary key

I am generating a mySQL query from PHP.
Part of the query re-orders a table based on some variables (which do not include the primary key).
The code doesn't produce errors, however the table is not sorted.
I echoed out the SQL code and it looks correct. I tried running it directly in phpMyAdmin, and it also runs without error, but the table is still not sorted as requested.
alter table anavar order by dset_name, var_id;
I am pretty sure this has to do with the fact that I have a primary key column (UID) which is not part of the sort.
Both before and after running the query, the table remains ordered by UID. Deleting UID and re-running the query results in a correctly sorted table, but that seems like an overkill solution.
Any suggestions?
create table t2
( id int auto_increment primary key,
  someInt int not null,
  thing varchar(100) not null,
  theWhen datetime not null,
  key(theWhen) -- creates an index on theWhen
);
-- my table now has 2 indexes on it
-- see it by running `show indexes from t2`
-- truncate table t2;

insert t2(someInt,thing,theWhen) values
(17,'chopstick','2016-05-08 13:00:00'),
(14,'alligator','2016-05-01'),
(11,'snail','2016-07-08 19:00:00');

select * from t2;                        -- returns in physical order (the primary key `id`)
select * from t2 order by thing;         -- returns via thing, which has no index anyway
select * from t2 order by theWhen,thing; -- partial index use
note that indexes aren't even used until you have a significant number of rows in a db anyway
Edit (new data comes in)
insert t2 (someInt,thing,theWhen) values (777,'apple',now());

select t2.id, t2.thing, t2.theWhen, @rnk:=@rnk+1 as rank
from t2
cross join (select @rnk:=0) xParams
order by thing;
+----+-----------+---------------------+------+
| id | thing | theWhen | rank |
+----+-----------+---------------------+------+
|  2 | alligator | 2016-05-01 00:00:00 |    1 |
|  4 | apple     | 2016-09-04 15:04:50 |    2 |
|  1 | chopstick | 2016-05-08 13:00:00 |    3 |
|  3 | snail     | 2016-07-08 19:00:00 |    4 |
+----+-----------+---------------------+------+
Focus on the fact that you can maintain your secondary indices and generate a rank on the fly whenever you want.
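On MySQL 8.0 and later, the user-variable trick can be replaced with a window function; a sketch that produces the same ordering and ranking for the data above:
SELECT id, thing, theWhen,
       ROW_NUMBER() OVER (ORDER BY thing) AS ranking   -- aliased "ranking" because RANK is reserved in 8.0
FROM t2
ORDER BY thing;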

mySQL column to hold array

I'm a beginner when it comes to coding, especially SQL and PHP.
I deal with approx. 120 users.
The users can acquire approx. 300 different collectible items.
When a user acquires a specific item, I would like the ID number of that item to be stored in the row of the user who acquired it, so that there is some record of which items the user already has (and to avoid duplicate items in his possession).
Is there a good way to store such information?
Is it even possible to set a column type to array and store it there?
Please note: I'm not lazy, and I've been digging around and searching for an answer for 2 hours. I couldn't find a solution. I know of the rule that one should store only one piece of information in one cell.
MySQL does not support storing arrays. However, you can use a second table to emulate an array by storing the relation between the users and items. Say you have the table users:
CREATE TABLE users (
  user_id SERIAL PRIMARY KEY,
  ...
);
And you have a table defining items:
CREATE TABLE items (
  item_id SERIAL PRIMARY KEY,
  ...
);
You can relate what items a user has using a table similar to user_items:
CREATE TABLE user_items (
  id SERIAL PRIMARY KEY,
  user_id BIGINT UNSIGNED NOT NULL,
  item_id BIGINT UNSIGNED NOT NULL,
  ...,
  FOREIGN KEY (user_id)
    REFERENCES users (user_id),
  FOREIGN KEY (item_id)
    REFERENCES items (item_id)
);
Then, to determine what items user 123 has acquired, you could use JOINs similar to:
SELECT items.*
FROM users
INNER JOIN user_items
ON user_items.user_id = users.user_id
INNER JOIN items
ON items.item_id = user_items.item_id
WHERE users.user_id = 123; -- Or some other condition.
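Since part of the goal is avoiding duplicate items per user, a unique key on the pair lets the database enforce that rule. A sketch building on the user_items table above (key name and values are illustrative):
ALTER TABLE user_items ADD UNIQUE KEY uq_user_item (user_id, item_id);

-- an attempt to record an item the user already owns can then simply be ignored:
INSERT IGNORE INTO user_items (user_id, item_id) VALUES (123, 45);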
I assume you have 2 tables, for example users and items. To track which user already has a specific item, I would create an associative table holding the UserID from users and the ItemID from items. This way you can check in your user_items table whether the user already has a given item.
Here is a small example:
users (UserID is PK):
+--------+----------+
| UserID | UserName |
+--------+----------+
|      1 | Fred     |
|      2 | Joe      |
+--------+----------+
items (ItemID is PK):
+---------+----------+
| ItemID  | ItemName |
+---------+----------+
|       5 | Book     |
|       6 | Computer |
+---------+----------+
user_items (ItemID referencing items.ItemID, UserID referencing users.UserID):
+---------+--------+
| ItemID  | UserID |
+---------+--------+
|       5 |      1 |
|       6 |      2 |
+---------+--------+

How can I store user-ordered data in a MySQL database?

In a particular application, a user is able to store widgets in a favorites list. Users and widgets each have their own IDs, so I devised a Favorites table to store users' favorites:
+-----------+------------------+
| Field     | Type             |
+-----------+------------------+
| userid    | int(11) unsigned |
| widgetid  | int(11) unsigned |
| dateadded | int(11) unsigned |
+-----------+------------------+
This is all well and good, but now I want the user to be able to manually reorder the widgets in their favorites. Suddenly, the dateadded column is unhelpful, since updating it would only allow a user to bump a widget to the front of the list.
Therefore, it would seem I need an index column instead:
+-----------+------------------+
| Field     | Type             |
+-----------+------------------+
| userid    | int(11) unsigned |
| widgetid  | int(11) unsigned |
| index     | int(11) unsigned |
+-----------+------------------+
This way, to reorder columns, I can just manipulate the DB table like an array—pull out a row, renumber the others to "free" a slot, and reinsert the row. For example, to move the object at index 13 to index 5, I would do the following:
SELECT * FROM Favorites WHERE `index` = 13;
DELETE FROM Favorites WHERE `index` = 13;
UPDATE Favorites
    SET `index` = `index` + 1
    WHERE `index` >= 5 AND `index` < 13;
INSERT INTO Favorites (userid, widgetid, `index`) VALUES (<data from select>, 5);
However, this feels extremely hacky and very un-SQL, and I can't imagine the performance would be fantastic, either.
What is a "proper" solution to storing arbitrarily-reorderable data in a MySQL database?
I would use index values with some step, e.g. 10, 20, 30, 40.
Then, when I want to move index 40 to sit between 10 and 20, I first update the index to 15 where index=40, and then update all of the user's records once more, restoring the 10, 20, 30, 40 order based on the select below.
select f.*,
       @new_index:=@new_index+10 as new_index
from Favorites f, (select @new_index:=0) as sess_var
where f.userid = :theUserId
order by f.`index`
The query will return you all the old records with new values for the index. Adapt it to be used in your second update.
UPDATE:
update Favorites f
join (the query above) sub
    on f.userid = sub.userid and f.widgetid = sub.widgetid
set f.`index` = sub.new_index
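Putting the two statements together, here is a sketch that inlines the renumbering select into the update. It relies on in-SELECT user-variable assignment, which works on MySQL 5.x but is deprecated in 8.0, and some versions may ignore ORDER BY inside a derived table, so verify the assigned order:
UPDATE Favorites f
JOIN ( SELECT f2.userid, f2.widgetid,
              @new_index := @new_index + 10 AS new_index
       FROM Favorites f2, (SELECT @new_index := 0) AS sess_var
       WHERE f2.userid = :theUserId
       ORDER BY f2.`index`
     ) sub ON f.userid = sub.userid AND f.widgetid = sub.widgetid
SET f.`index` = sub.new_index;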