We re-factored our user specific configuration from
create table user (id, name, ..., config1, config2, config3, ..) to
create table user (id, name, ...);
create table user_config (id, user_id, config_val);
Our MySQL database size increased by a factor of 2 after making this change and migrating the users from the old table to the new one. We made the change so that user configuration can be extended, but why does the space requirement go up because of it? What could be the reason?
If you had an original table with 20 fields and 1,000,000 users, that would be 20 * 1,000,000 = 20,000,000 items of data.
Say, for example, you now have the same number of users, but decrease the table to 10 fields and have 10 config rows with three fields each (as per your code). This would be 10 * 1,000,000 + 10 * 3 * 1,000,000 = 40,000,000 items, a factor of 2.
So, basically, for each configuration variable you are now adding an id (primary key) and a user_id (foreign key) field. On top of that, there is more indexing data that has to be generated.
In short, it could very well be the case that your data requirements have dramatically increased.
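To make that concrete, here is a hedged sketch of the new config table (the column types, index name, and ENGINE line are assumptions, not from the question): every config value now carries its own id, a user_id, and entries in both the primary-key index and the user_id index, which is where much of the extra space goes.

CREATE TABLE user_config (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id BIGINT UNSIGNED NOT NULL,
  config_val VARCHAR(255),
  KEY idx_user (user_id),                      -- needed to look up one user's config rows
  FOREIGN KEY (user_id) REFERENCES user (id)
) ENGINE=InnoDB;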
I have 3 tables - TB1, TB2 and r_tb1_tb2 (InnoDB).
TB1 holds the details of the users (rows will be inserted):
- id (primary, unique)
- name
TB2 holds the details of the courses the users can take (static table):
- id (primary, unique)
- name of the course
r_tb1_tb2 holds the relation between the two tables:
- rID
- user_id (from table 1)
- course_id (reference to table 2)
When I insert a new row in TB1, I get the id of the last inserted row, and use that to insert another row in r_tb1_tb2.
I can foresee that this may result in erroneous entries in case of simultaneous inserts into TB1.
Can someone please point me to the best practices for such simultaneous updates?
Thanks in advance.
LAST_INSERT_ID() has built-in protection for this:
The ID that was generated is maintained in the server on a
per-connection basis. This means that the value returned by the
function to a given client is the first AUTO_INCREMENT value generated
for the most recent statement affecting an AUTO_INCREMENT column by that
client. This value cannot be affected by other clients, even if they
generate AUTO_INCREMENT values of their own. This behavior ensures
that each client can retrieve its own ID without concern for the
activity of other clients, and without the need for locks or
transactions.
(emphasis theirs)
Thus, if two different users are taking action on your site that results in records being inserted into TB1, the last_insert_ids for those users will be different, because they are using two different connections (clients in the context above).
LAST_INSERT_ID() returns a value scoped to each user's connection.
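A minimal sketch of that flow, assuming id and rID are AUTO_INCREMENT columns (the name 'Alice' and course id 42 are made up); both statements run on the same connection, so LAST_INSERT_ID() returns the id generated by the first insert:

INSERT INTO TB1 (name) VALUES ('Alice');
INSERT INTO r_tb1_tb2 (user_id, course_id) VALUES (LAST_INSERT_ID(), 42);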
I am trying to figure out the best method to design a database that allows users to rank a number of items.
The items are ranked by all users.
Every item will have a rank assigned to it automatically in the beginning. So when I create a user, the rankings table will be auto-populated.
So something like this:
users (id, name) - 100,000+ entries
items (id, name) - never more than 1,000 entries
The only thing I can currently think of to house the rankings is this:
rankings (id, user_id, item_id, ranking)
But that feels wrong because I'll have 100 million entries in the rankings table. Is that OK? What other option could I use?
Can each user assign either zero or one ranking to each item? Or can she assign more than one ranking to a given item?
If it's zero-or-one, your ranking table should have these columns
user_id INT
item_id INT
ranking INT
The primary key should be the composite key (user_id, item_id). This will disallow multiple rankings for the same item for the same user, and will be decently efficient. Putting a separate id on this table is not the right thing to do.
For the sake of query efficiency I suggest you also create the covering indexes (user_id, item_id, ranking) and (item_id, user_id, ranking).
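A minimal sketch of that table (the index name and ENGINE line are my additions; the other covering index, (user_id, item_id, ranking), can be added the same way if you want it):

CREATE TABLE rankings (
  user_id INT NOT NULL,
  item_id INT NOT NULL,
  ranking INT NOT NULL,
  PRIMARY KEY (user_id, item_id),
  KEY idx_item_user_ranking (item_id, user_id, ranking)
) ENGINE=InnoDB;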
If you can get your hundreds of thousands of users to rank all 1000 items, that will be great. That's a problem any web app developer would love to have.
This table's rows are reasonably small, and with the indexes I mentioned it should perform decently well both at a smaller scale and as you add users.
I'm designing a MySQL database to store multiple products such as computers, mobile phones, pen drives, etc. Each product has different features, for example:
**A computer has**
Processor
Ram
HDD
Monitor Size
....etc
**A mobile phone has**
Display Size
Network type
OS
Internal Memory
....etc
**A pen drive has**
Capacity
USB Version
....etc
And I have to store an unlimited number of products. Instead of creating separate tables for each product, how can I create a database structure that stores this information in one table or a fixed number of tables (data tables + mapping tables)? (I think WordPress stores data in this kind of format: it uses a few tables and stores any number of fields related to a post/category within those tables.) Any help/idea to solve this problem would be appreciated.
Consider this:
Create three tables, product, feature, and product_feature, and maybe a fourth, product_photos.
The product table will be:
pid, p_name, p_description, p_price, ...
Insert query:
INSERT INTO product (p_name, p_description, p_price, ...) VALUES (?, ?, ?, ...)
The feature table will be:
fid, f_name, f_description, ...
Insert query:
INSERT INTO feature (f_name, f_description, ...) VALUES (?, ?, ...)
Now the product_feature table will be:
id, pid, fid
Insert queries for one product:
-- say the product id is 1
INSERT INTO product_feature (pid, fid) VALUES (1, 10);
INSERT INTO product_feature (pid, fid) VALUES (1, 15);
INSERT INTO product_feature (pid, fid) VALUES (1, 30);
where pid and fid are foreign keys with relations; phpMyAdmin can set that up for you.
You can then add a product with multiple features.
Then maybe the photo table:
photo_id, photo_name, photo_path, ...
Use InnoDB for all the tables.
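For reference, here is a minimal sketch of those tables as CREATE TABLE statements; the column names follow the answer, but the types, sizes, and keys are assumptions:

CREATE TABLE product (
  pid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  p_name VARCHAR(255) NOT NULL,
  p_description TEXT,
  p_price DECIMAL(10,2)
) ENGINE=InnoDB;

CREATE TABLE feature (
  fid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  f_name VARCHAR(255) NOT NULL,
  f_description TEXT
) ENGINE=InnoDB;

CREATE TABLE product_feature (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  pid INT UNSIGNED NOT NULL,
  fid INT UNSIGNED NOT NULL,
  FOREIGN KEY (pid) REFERENCES product (pid),
  FOREIGN KEY (fid) REFERENCES feature (fid)
) ENGINE=InnoDB;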
Let me know if you need further help
You need to design a table in such a way that it covers all attributes for all products. This would let you insert any type of product into that table. But keep in mind that if there are 4 products with 10 different attributes each, you end up creating 40 columns and might use only 10 of them for any given row.
You can also pick one of the inheritance-mapping approaches used by ORMs like Hibernate or Doctrine:
https://docs.jboss.org/hibernate/orm/3.5/reference/en/html/inheritance.html
http://docs.doctrine-project.org/en/2.0.x/reference/inheritance-mapping.html
Your selection depends on your data and how you will use it.
If you will never need to list different kinds of devices together, you should use separate tables.
If you plan for all "devices" to share some fields like "ref", "name", "description", "price", "weight", etc., you should use one table, because getting a simple list of all kinds of devices will be easy and cheap; but the table can grow if you plan to add more types of devices and properties.
If you expect those extra fields to bloat the table, then you should split the concrete properties of each type of device into different tables and use a discriminator field in the devices table that allows you to make joins.
If the properties of the devices will grow a lot (or dynamically), then you should consider using a table with attribute and value fields. In this case the queries will be heavier.
So basically you can do:
computer->
mobile->
pendrive->
or:
devices-> type, price, name, processor, ram, internal memory, capacity, usb version
or:
devices->type, price, name
computer-> processor, ram
mobile-> internal memory
pendrive-> usb version
or:
devices -> id, price, name
attributes-> id, id_attribute, value
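For that last attribute/value option, here is one possible sketch; the answer only names the attribute and value fields, so the id_device link, the column types, and the example values are assumptions:

CREATE TABLE devices (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  type VARCHAR(32) NOT NULL,
  name VARCHAR(255) NOT NULL,
  price DECIMAL(10,2)
) ENGINE=InnoDB;

CREATE TABLE attributes (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  id_device INT UNSIGNED NOT NULL,
  `attribute` VARCHAR(64) NOT NULL,   -- e.g. 'ram', 'usb version'
  `value` VARCHAR(255),
  FOREIGN KEY (id_device) REFERENCES devices (id)
) ENGINE=InnoDB;

-- e.g. a computer and one of its properties
INSERT INTO devices (type, name, price) VALUES ('computer', 'Office PC', 499.00);
INSERT INTO attributes (id_device, `attribute`, `value`) VALUES (LAST_INSERT_ID(), 'ram', '16 GB');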
I have an object stored in the database; it's some text with properties.
That text has a rating. I need to store the rating and prevent a single user from raising it more than once. If I store the "text id" and "user id" in another table and count all the records with the needed "text id", I get too many records in the table.
There are two ways:
You can use a many-to-many relationship, i.e. a separate table with a name like 'user_likes'; it has user_id and like_id columns, and together they form the primary key (this makes it possible for a user to like the like_object only once).
Another way, which high-traffic websites use: every user record in the user table has a likes column, which is just a serialized array or JSON, whatever. Before updating this column, your application retrieves the data and looks for the particular like_object_id; if it doesn't exist, you update the database. Please note that in this case all care about data consistency (for instance, a like_object_id that exists in some user record but doesn't exist in the like_object table) has to be implemented in your application code, not the database.
P.S. Sorry for my English, but I tried to explain as best as I could.
If I store "text id" and "user id" in other table and count all records which have needing "text id" i have too much records in table.
How do you know what is too many records?
Some of the MySQL tables I support have billions of rows. If they need more than that, they split the data across multiple MySQL servers. 1 million rows is not a problem for a MySQL database.
If you want to limit the data so each user can "like" a given text only once, you must store the data separately for each user. This is also true if a user can "unlike" a text they had previously liked.
CREATE TABLE likes (
  user_id BIGINT UNSIGNED NOT NULL,
  post_id BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, post_id),
  KEY (post_id, user_id)
);
This example table uses its primary key constraint to ensure each user can like a given post only once. The second index helps optimize queries for likes on a specific post.
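For example (the ids 42 and 123 are made up), a per-post like count can use the secondary (post_id, user_id) index, and a has-this-user-liked check can use the primary key:

SELECT COUNT(*) FROM likes WHERE post_id = 42;
SELECT EXISTS(SELECT 1 FROM likes WHERE user_id = 123 AND post_id = 42) AS already_liked;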
This is only 16 bytes per row, plus the size of the index. I filled an InnoDB table with over 1 million rows, and it uses about 60MB.
mysql> show table status\G
Name: likes
Engine: InnoDB
Rows: 1046760
Data_length: 39419904
Index_length: 23658496
It's common to store databases on terabyte-sized storage these days, so a 60MB table doesn't seem too large.
I store the likes with the post itself, but I'm not sure about its performance since none of my websites has reached a very heavy load.
But I do the following:
Post {
  id int;
  likes_count int; // likes count, to quickly retrieve it
  likes string;    // ids of the users who liked this post, comma separated
}
When a user likes a post (using AJAX):
the UI updates directly and shows that the user liked the post;
AJAX sends a request to the server with the post id and the user id, then the post data is updated as follows:
post.likes_count += 1;
post.likes += userId + ',' ;
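In SQL terms that update could look roughly like this, assuming a posts table with likes_count INT and likes TEXT columns (42 and '123' are made-up post and user ids):

UPDATE posts
SET likes_count = likes_count + 1,
    likes = CONCAT(IFNULL(likes, ''), '123', ',')
WHERE id = 42;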
When the user reloads the page, it checks whether his id is in likes; if so, the post will appear as liked.
I have a mysql table that stores user ratings for different items. It has the following fields:
id (int, pk)
userId (int)
itemId (int)
rating (float)
timestamp (int)
and the following indices:
(userId, rating): for queries about all items a particular user has rated
(itemId, rating): for queries about all users that have rated a particular item
This table has over 10 million rows. To make it more scalable, I would like to perform horizontal partitioning. In particular, I plan to split the table into 20 tables:
tbl_rating_by_item_0: store ratings whose itemId ending with 0
tbl_rating_by_item_1: store ratings whose itemId ending with 1
......
tbl_rating_by_item_9: store ratings whose itemId ending with 9
and
tbl_rating_by_user_0: store ratings whose userId ending with 0
tbl_rating_by_user_1: store ratings whose userId ending with 1
......
tbl_rating_by_user_9: store ratings whose userId ending with 9
The idea is that when querying by itemId we read from the tbl_rating_by_item_* table matching the last digit of itemId, and when querying by userId we read from the matching tbl_rating_by_user_* table. The drawback is that whenever I want to insert or delete a rating, I need to insert into or delete from two tables.
Are there any other solutions?
Have you tried indexing? Creating two composite indexes
INDEX name1 (rating, userId)
INDEX name2 (rating, itemId)
may help increase performance.
Also consider table partitioning. Have a look at MySQL table partitioning.
This is better than physically creating the separate tables yourself.
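If you go the native-partitioning route, a hedged sketch of what it could look like for this table (the partition count and key choice are assumptions; note that the partitioning column must then be part of the primary key):

CREATE TABLE ratings (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  userId INT NOT NULL,
  itemId INT NOT NULL,
  rating FLOAT NOT NULL,
  `timestamp` INT NOT NULL,
  PRIMARY KEY (id, itemId),               -- itemId included so HASH(itemId) is allowed
  KEY idx_user_rating (userId, rating),   -- all items a user has rated
  KEY idx_item_rating (itemId, rating)    -- all users who rated an item
) ENGINE=InnoDB
PARTITION BY HASH (itemId)
PARTITIONS 10;

With a single logical table, inserts and deletes touch one table, and queries by itemId are pruned to one partition; queries by userId still have to look at every partition, so whether this beats the two-copy design depends on your workload.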