Database Design (Bigtable) - MySQL

What is Bigtable (i.e. Google's database design)? I have this type of requirement, but I don't know how to design for it.
In Bigtable, how do you maintain the relations among tables?

Create all tables with the InnoDB storage engine, which maintains relationships.
Choose table fields according to, and limited to, your requirements.

The Bigtable paper published by Google may be hard to read. I hope my answer can help you start understanding it.
In the old days, an RDBMS stored data row by row: one record per row, 1, 2, 3, 4, 5...
If you then want to find record 5, that's fine: the database seeks in a B+ tree (or something similar) to get the address of record 5 and loads it for you.
But the nightmare is when you want to get the records where the column user = 'Michael': the database has no way but to scan every record to check whether the user is 'Michael'.
Bigtable stores data in a different way: it stores the columns in an inverted table. When we want to find all the records satisfying user = 'Michael', it looks this value up as a key via a B+ tree or hash table, and gets the address in the inverted table where the list of all matching records is stored.
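As a rough illustration, here is a minimal SQL sketch of the inverted-table idea (the table and column names are invented for this example; the real internals of such systems are more involved):

-- records stored row by row, as in a classic RDBMS
create table record (
    id      integer primary key,
    user    varchar(64),
    payload text
);

-- the "inverted table": maps each value of the user column
-- to the ids of the records containing it
create table inverted_user (
    user      varchar(64),
    record_id integer,
    primary key (user, record_id)
);

-- find every record with user = 'Michael' without scanning the record table:
-- the key lookup on inverted_user yields the matching record ids directly
select r.*
from inverted_user i
join record r on r.id = i.record_id
where i.user = 'Michael';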
Maybe a good starting point is Lucene, an open-source full-text search engine built around exactly this inverted-index idea.
Note that an inverted table is not the same as column-based storage in an RDBMS. They are different concepts; please keep this in mind.

Related

Can I have one million tables in my database?

Would there be any advantages/disadvantages to having one million tables in my database?
I am trying to implement comments. So far, I can think of two ways to do this:
1. Have all comments from all posts in 1 table.
2. Have a separate table for each post and store all comments from that post in its respective table.
Which one would be better?
Thanks
You're better off having one table for comments, with a field that identifies which post ID each comment belongs to. It will be a lot easier to write queries to get the comments for a given post ID, since you won't first need to dynamically determine the name of the table you're looking in.
I can only speak for MySQL here (not sure how this works in PostgreSQL), but make sure you add an index on the post ID field so the queries run quickly.
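For example (a hedged sketch; the table and column names are assumed, not taken from the question):

-- index so that "get all comments for post X" does not scan the whole table
create index idx_comments_post_id on comments (post_id);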
You can have a million tables, but this might not be ideal for a number of reasons [*]. A classical RDBMS is typically deployed and optimised for storing millions/billions of rows in hundreds/thousands of tables.
As for the problem you're trying to solve: as others state, use foreign keys to relate a pair of tables, posts and comments, a la [MySQL syntax]:
create table post(id integer primary key, post text);
create table comment(id integer primary key, postid integer, comment text, key fk (postid));
{You can add constraints to enforce referential integrity between comment and post to avoid orphaned comments, but this requires certain capabilities of the storage engine to be effective.}
The generation of primary key IDs is left to the reader, but something as simple as AUTO_INCREMENT might give you a quick start [http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html].
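Putting those pieces together, a sketch of the two tables with AUTO_INCREMENT ids and an enforced foreign key might look like this (InnoDB is assumed, since engines like MyISAM accept but do not enforce foreign key constraints):

create table post (
    id   integer auto_increment primary key,
    post text
) engine=InnoDB;

create table comment (
    id      integer auto_increment primary key,
    postid  integer not null,
    comment text,
    key fk (postid),
    -- rejects comments whose postid does not match an existing post
    constraint fk_comment_post foreign key (postid) references post (id)
) engine=InnoDB;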
Which is better?
Unless this is a homework assignment, storing this kind of material in a classic RDBMS might not fit with contemporary idioms. Keep the same schema in spirit, but use something like Solr/Elasticsearch to store your material and benefit from the content indexing, since I trust you'll want to avoid writing your own search engine. You can also use something like Sphinx [http://sphinxsearch.com] to index MySQL in a similar manner.
[*] Without some unconventional structuring of your schema, the amount of metadata and the pressure on the underlying filesystem will be problematic (for example, some dated/legacy storage engines, like MyISAM on MySQL, create three files per table).
When working with relational databases, you have to understand (at least a little bit about) normalization. The third normal form (3NF) is easy to understand and works in almost any case. A short tutorial can be found here; use Google if you need more/other/better examples.
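As a rough illustration of what normalization buys you (a made-up example, not from the tutorial): instead of repeating a value in every row, store it once and reference it by id.

-- unnormalized: the commenter's name is repeated in every row
create table comment_flat (
    id        integer primary key,
    user_name varchar(100),
    comment   text
);

-- closer to 3NF: the name is stored once and referenced by id
create table user (
    id        integer primary key,
    user_name varchar(100)
);

create table comment (
    id      integer primary key,
    user_id integer,
    comment text,
    key (user_id)
);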
One table per record is a red flag; it means you're missing something. It also means you need dynamic DDL: you must create new tables whenever new records arrive. It is a security issue as well, because the database user needs too many permissions and becomes a security risk.

one table or several tables in MySQL

I have a huge amount of data. What is the best way to store this data in the database: in one table, or spread over several tables?
If I save the data to several tables, I must create a table for every user.
I think a table per user is not good; please read the link below, it will help you design a better and more efficient database:
MySQL :: An Introduction to Database Normalization
http://ftp.nchu.edu.tw/MySQL/tech-resources/articles/intro-to-normalization.html
In a relational DBMS, table size doesn't dictate the database design; only the relations between entities do.
So even with a table of billions of records holding large multimedia data, you would not split it into several tables just because it gets so big.
Maybe using an RDBMS is the wrong approach for your task. Maybe a NoSQL DBMS would be better. Even a simple file system could be the way to go.
Maybe, however, an RDBMS is the right approach. We don't know, because you have told us almost nothing about your database. And what seems huge to you may be considered small by your DBMS.
Are you saying that you would have one huge table holding data for all users, but every user will only be interested in their own data? Then simply partition the table by user. The database design remains the same; only the underlying storage and internal data access differ. As long as your queries always select a single user's data, you will stay within one partition and data access will be fast.
Here is how to partition a table:
alter table user_movies
add primary key (user_id, movie_id)
partition by hash(user_id) partitions 100;
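Queries that filter on the partitioning column then touch only one partition (a hedged example built on the user_movies table above):

-- partition pruning: only the partition holding user 42's rows is scanned
select * from user_movies where user_id = 42;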
Many factors should be taken into consideration before making a decision. But how about this one: try a "NoSQL"-type DB, where each user is treated as a key and the info about that user is treated as the value.

MySQL or NoSQL? Recommended way of dealing with large amounts of data

I have a database which will be used by a large number of users to store random long strings (up to 100 characters). The table columns will be: userid, stringid and the actual long string.
So each row will look pretty much like this: a userid, a stringid and the string itself.
The userid will be unique, and the stringid will be unique for each user.
The app is like a simple todo-list app, so each user will have an average of 50 todos.
I am using the stringid so that users will be able to delete a specific task at any given time.
I assume this todo app could end up with 2 million tasks in 3 years' time, and that scares me away from using MySQL.
So my question is: is this the recommended way of dealing with a large amount of data made of long strings (every new task gets a new row)? And is MySQL the right database solution to choose for this kind of project?
I have no experience with large amounts of data yet, and I am trying to prepare myself for the future.
This is not a question of "large amounts" of data (MySQL handles large amounts of data just fine, and 2 million rows isn't a "large amount" in any case).
MySQL is a relational database. So if you have data that can be normalized, that is, distributed among a number of tables in a way that ensures every data point is saved only once, then you should use MySQL (or MariaDB, or any other relational database).
If you have schema-less data, and speed is more important than consistency, then you can/should use some NoSQL database. Personally, I don't see how a todo list would profit from NoSQL (it doesn't really matter in this case, but I guess that as of now most programming frameworks have better support for relational databases than for NoSQL).
This is a pretty straightforward relational use case. I wouldn't see a need for NoSQL here.
The table you present should work fine; however, I personally would question the need for the compound primary key as you present it. I would probably put the primary key on stringid only, to enforce uniqueness across all records, rather than using a compound primary key across userid and stringid, and then add a regular index on userid.
The reason is that if you ever want to query by stringid alone (e.g. for deletes or updates), you are not tied into always querying across both fields to leverage your index (or into adding individual indexes on stringid and userid to enable querying by each field, which means more space in memory and on disk taken up by indexes).
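A sketch of that layout (table and column names are assumed for illustration):

create table task (
    stringid integer auto_increment primary key, -- unique across all records
    userid   integer not null,
    task     varchar(100),                       -- the "random long string"
    key idx_userid (userid)                      -- regular index for per-user queries
) engine=InnoDB;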
As for whether MySQL is the right solution, that is really for you to determine. I would say that MySQL should have no problem handling a table with 2 million rows and 2 indexes on two integer id fields, assuming you have allocated enough memory to hold those indexes. There is certainly a ton of information available on working with MySQL, so if you are just trying to learn, it is likely a good choice.
Regardless of what you consider a "large amount of data", modern DB engines are designed to handle a lot. The question of "relational or NoSQL?" isn't about which option can support more data; different relational and NoSQL solutions handle large amounts of data differently, some better than others.
MySQL can handle many millions of records; SQLite cannot (at least not as effectively). Mongo (NoSQL) attempts to hold its collections in memory (as well as on the file system), so I have seen it fail with fewer than 1 million records on servers with limited memory, although it offers sharding, which can help it scale more effectively.
The bottom line is: the number of records you store should not drive the SQL vs. NoSQL decision; that decision should be based on how you will save and retrieve the data. It sounds like your data is already normalized (e.g. userid), and if you also want consistency when you, for example, delete a user (their todo items get deleted too), then I would suggest a SQL solution.
I assume that all queries will reference a specific userid. I also assume that the stringid is a dummy value used internally, rather than the actual task text (your random string).
Use an InnoDB table with a compound primary key on (userid, stringid) and you will have all the performance you need, due to the way a clustered index works.
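For contrast with the layout sketched earlier, this is roughly what the clustered variant looks like (names again assumed):

create table task (
    userid   integer not null,
    stringid integer not null,
    task     varchar(100),
    -- InnoDB clusters rows by primary key, so each user's tasks
    -- sit physically together and are read with one short range scan
    primary key (userid, stringid)
) engine=InnoDB;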

large databases

I have an online service (an online vocabulary trainer). Each user has their own vocabulary.
Now, I'm not sure how I should structure my MySQL DB.
As far as I know, I have these possibilities:
Everything in one table (MyISAM): I store all the vocabulary in one large MyISAM table and add a column "userid" to identify each user's vocabulary.
Every user gets their own table (MyISAM): every time a user is created, the program adds a table named like "vocabulary_{userid}", where {userid} ties the table to a user.
Everything in one table (InnoDB): like point one, but with InnoDB instead of MyISAM.
The problem is that one large vocabulary table can reach up to 100 million rows. With MyISAM, the problem is that every query locks the whole table, so I imagine that if many users are online (and send many queries), the table might be locked a lot. And with InnoDB, I'm simply not sure whether this is a good solution, as I'm issuing quite a few SELECT, UPDATE and INSERT commands.
I hope someone can help me. Thank you in advance.
It is almost always better to go with InnoDB. InnoDB can handle 100 million rows; the maximum tablespace size is 64 TB.
It doesn't sound like you have a relational dataset, but more of a key/value store. Maybe Riak is a better solution.
It depends.
If you start having one table per user (aka sharding), you will have some trouble at the beginning.
If you don't need to scale right now, go for one table with good indexes. I wouldn't use MyISAM but InnoDB instead; otherwise you can get hit by the biggest issue of MyISAM (locks...).
The normal relational design for this would, I think, use three tables (sketched below):
Users — user ID, and other attributes: name, email, etc.
Vocabulary — least clear from the question, but presumably words with attributes such as part of speech and maybe meaning, probably including a word ID (because some word spellings have multiple meanings).
User_Vocabulary — a table with a User ID, Word ID, and maybe attributes such as 'date learned'.
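A hedged sketch of those three tables (the column names are guesses based on the descriptions above):

create table users (
    user_id integer auto_increment primary key,
    name    varchar(100),
    email   varchar(255)
) engine=InnoDB;

create table vocabulary (
    word_id        integer auto_increment primary key,
    word           varchar(100),
    part_of_speech varchar(30),
    meaning        text
) engine=InnoDB;

create table user_vocabulary (
    user_id      integer not null,
    word_id      integer not null,
    date_learned date,
    primary key (user_id, word_id),
    foreign key (user_id) references users (user_id),
    foreign key (word_id) references vocabulary (word_id)
) engine=InnoDB;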
If MyISAM locks the table while a query is going on, then you can't afford to use MyISAM, because you need concurrent updates to the User_Vocabulary table. So go with InnoDB for all the tables.

Multiple tables or one single table?

I already saw a few forums with this question, but they do not answer one thing I want to know. I'll explain my situation first:
I have a system where every action of multiple users is logged to the database (e.g. User1 logged in, User2 logged in, User1 entered user management, User2 changed password, etc.), so I would be expecting 100 to 200 entries per user per day. Right now I'm doing it in a single table, and to view the log I just filter by UserID.
My question is: which is more efficient? Should I use one single table, or create a table per user?
I am worried that if I use a single table, the system might have difficulty filtering through thousands of entries. I've read some pros and cons of using multiple tables versus a single table, especially concerning updates to the table(s).
I also want to know which one saves more space: multiple tables or a single table?
As long as you use indexes on the fields you're selecting by, you shouldn't have any speed problems (although indexes slow down writes, so too many are a bad thing). A table with a few thousand entries is nothing to MySQL (or any other database engine).
The overhead of creating thousands of tables is much worse: say you want to make a change to the fields in your user table; now you'd have to change thousands of tables.
A table we regularly search against for a single record at work has about 150,000 rows, and because the field we search on is indexed, the search time is a very small fraction of a second.
If you're selecting those records without using the primary key, create an index on the field you use to select like this:
CREATE INDEX my_column_name ON my_table(my_column_name);
That's the most basic form. To learn more about it, check here.
I would go with a single table. With an index on userId, you should be able to scale easily to millions of rows with little issue.
A table per user might be marginally more efficient, but it's generally poor design. The problem with a table per user is that it makes it difficult to answer other kinds of questions, like "who was in user management yesterday?" or "how many people have changed their passwords?"
As for storage space: I would say a table per user would probably use a little more space, but the difference between the two options should be quite small.
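To illustrate the point about cross-user questions, a hedged sketch (the log table and column names are assumed):

-- with a single log table, "who was in user management yesterday?" is one query
select distinct userid
from activity_log
where action = 'entered user management'
  and logged_at >= curdate() - interval 1 day;

With a table per user, the same question would mean querying every one of those tables.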
I would go with just one table. I certainly wouldn't want to create a new table every time a user is added to the system. The number of entries you mention for each day is really not that much data.
Also, create an index on the user column of your table to improve query times.
Definitely a single table. Having tables created dynamically for entities that are created by the application does not scale. Also, you would need to build your queries with variable table names, which makes things difficult to debug and maintain.
If you have an index on the user ID you use for filtering, it's not a big deal for a DB to work through millions of rows.
Any database worth its salt will handle a single table containing all that user information without breaking a sweat. A single table is definitely the right way to do it.
If you used multiple tables, you'd need to create a new table every time a new user registered. You'd need to create a new statement object for each user you queried. It would be a complete mess.
I would go for the single table as well. You might want to go for multiple tables when you need to serve multiple customers with different sets of users (multi-tenancy).
If you do go for multiple tables, take a look at this refactoring tool: http://www.liquibase.org/. You can do schema modifications on the fly.
I guess that if you use proper indexing, the single-table solution can perform well enough (and the maintenance will be much simpler).
A single table also keeps PHP prepared statements (e.g. those built from $_POST and $_GET data) simple. I think that for small to medium platforms a single table will be fine; in summary, a few tables rather than many tables is ideal.
That said, multiple tables will not cause much havoc either, but a single table is best.