DB performance improvement - MySQL

I am creating a database for a project. In this project we will create multiple companies, and we have two options for structuring the database.
Create a common table for all companies and save all information in a single table, say company_daily_records, which will hold every company's data. Suppose a company has 100,000 records and we have 1,000 companies; then company_daily_records will have 100,000 × 1,000 = 100,000,000 records.
Create a separate table for each company, so there will be 1,000 company_daily_records tables, each with 100,000 records.
Which design will give better performance?
Also, which SQL database should we prefer?

1) If you create a separate database for each company, which is more likely, then your records will be well organized. But if your project deals with all companies at the same time, you will have to switch connections frequently.
2) If you create one database for all companies, that is possible too; you just have to add an additional 'company' table listing all companies, which can be referenced as a foreign key in e.g. an 'employee' table to tie employees to a specific company.
But it adds complexity, as the records are not in a very organized form.
As you mention the records can run into the hundreds of millions, I suggest you go with separate databases; that will surely save searching and query time, which is the most important aspect.
I think you can use MySQL to manage your records.
Thank you.

I would not suggest creating a table for each company, because:
How do you know what companies there will be, or how many?
When you get a new company, you would need to create a new table in the database and update your application code manually. That could be automated, but it is not an easy task.
Because you are at an early stage now, it is fine to go with the traditional relational approach: a company table and a company_record table that references it, as sketched below. You can worry about performance later, when it actually becomes a problem or when you have spare time for optimization.
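A minimal sketch of that layout (the column names here are just for illustration, not from the question):

CREATE TABLE company (
    company_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE company_record (
    record_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    company_id INT UNSIGNED NOT NULL,
    record_date DATE NOT NULL,
    FOREIGN KEY (company_id) REFERENCES company (company_id),
    INDEX idx_company_date (company_id, record_date)  -- keeps per-company lookups fast
);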

Don't design the schema for a large dataset until you have some thoughts on how the data will be inserted and queried.
You need to avoid scanning 100 million (10 crore) rows to get an answer; it will be painfully slow. That implies indexing.
NoSQL implies no indexing, or you have to build the indexes yourself. You would be better off with a real RDBMS doing such heavy lifting for you.
If you split by company into tables or databases or partitions or shards:
Today you have 1000 tables (etc), tomorrow you have 1123.
Any operation that goes across companies will be difficult and slow.
Working with 1000 tables/dbs/partitions, or especially shards, has inefficiencies.
I vote for a single 'large' (but not 'huge') table with a SMALLINT UNSIGNED (2-byte) column for company_id; see the sketch below.
Since you are into the "Data Warehouse" realm, Summary Tables come to mind.
Will you be deleting "old" data? That is another thing to worry about in large tables.
Inserting 1000 rows per day is no problem. (1000/second would be another story.)
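For instance, a rough sketch of that single-table design (the amount column is a made-up payload for illustration):

CREATE TABLE company_daily_records (
    company_id SMALLINT UNSIGNED NOT NULL,  -- 2 bytes, enough for 65,535 companies
    record_date DATE NOT NULL,
    amount DECIMAL(12,2) NOT NULL,          -- hypothetical payload column
    PRIMARY KEY (company_id, record_date)   -- clusters each company's rows together
);

-- A per-company query then reads only that company's slice, not all 100 million rows:
SELECT record_date, amount
FROM company_daily_records
WHERE company_id = 42 AND record_date >= '2024-01-01';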

Related

MySQL - compare between 1 table with n partitions and n tables with the same structure

I am a student and I have a question from my research into MySQL partitioning.
For example, I have a table "Label" with 10 partitions by hash(TaskId):
resourceId (PK)
TaskId (PK)
...
And I have 10 separate tables, one per taskId:
task1(resourceId,...)
task2(resourceId,...)
...
Could you please tell me the advantages and disadvantages of each?
Thanks
Welcome to Stack Overflow. I wish you had offered a third alternative in your question: "just one table with no partitions." That is by far, in almost all cases in the real world, the best way to handle your data. It only requires maintaining and querying one copy of each index, for example. If your data approaches billions of rows in size, it's time to consider stuff like partitions.
But never mind that. Your question was to compare ten tables against one table with ten partitions. Your ten-table approach is often called sharding your data.
First, here's what the two have in common: they both are represented by ten different tables on your storage device (ssd or disk). A query for a row of data that might be anywhere in the ten involves searching all ten, using whatever indexes or other techniques are available. Each of these ten tables consumes resources on your server: open file descriptors, RAM caches, etc.
Here are some differences:
When INSERTing a row into a partitioned table, MySQL figures out which partition to use. When you are using shards, your application must figure out which table to use and write the INSERT query for that particular table.
When querying a partitioned table for a few rows, MySQL automatically figures out from your query's WHERE conditions which partitions it must search. When you search your sharded data, on the other hand, your application must figure out which table or tables to search.
In the case you presented -- partitioning by hash on the primary key -- the only way to get MySQL to search just one partition is to search for particular values of the PK. In your case this would be WHERE resourceId = foo AND TaskId = bar. If you search based on some other criterion -- WHERE customerId = something -- MySQL must search all the partitions. That takes time. In the sharding case, your application can use its own logic to figure out which tables to search.
If your system grows very large, you'll be able to move each shard to its own MySQL server running on its own hardware. Then, of course, your application will need to choose the correct server as well as the correct shard table for each access. This won't work with partitions.
With a partitioned table with an autoincrementing id value on each row inserted, each of your rows will have its own unique id no matter which partition it is in. In the sharding case, each table has its own sequence of autoincrementing ids. Rows from different tables will have duplicate ids.
The Data Definition Language (DDL: CREATE TABLE and the like) for partitioning is slightly simpler than for sharding. It's easier and less repetitive to write the DDL to add a column or an index to a partitioned table than to a bunch of shard tables. With the volume of data that justifies sharding or partitioning, you will need to add and modify indexes to match the needs of your application in the future.
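To make the DDL difference concrete, here is a rough sketch of both approaches, with the columns abbreviated to the two keys from the question:

-- Partitioned: one table; MySQL routes rows to partitions itself
CREATE TABLE Label (
    resourceId INT NOT NULL,
    TaskId INT NOT NULL,
    PRIMARY KEY (resourceId, TaskId)
)
PARTITION BY HASH (TaskId)
PARTITIONS 10;

-- Sharded: ten separate tables; the application routes rows itself,
-- and every later ALTER TABLE must be repeated ten times
CREATE TABLE task1 (resourceId INT NOT NULL PRIMARY KEY);
CREATE TABLE task2 (resourceId INT NOT NULL PRIMARY KEY);
-- ... up to task10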
Those are some practical differences. Pro tip: don't partition and don't shard your data unless you have really good reasons to do so.
Keep in mind that server hardware, disk hardware, and the MySQL software are under active development. If it takes several years for your data to grow very large, new hardware and new software releases may improve fast enough in the meantime that you don't have to worry too much about partitioning / sharding.

SQL Table MAX Limit

I'm building a web application that uses a database (MySQL in particular).
In each database, each user will have their own table, and each database holds one category. For example:
Database 1 (Music Playlist) - Tables:User1,User2,User3
Database 2 (Wall posts) - Tables:User1,User2,User3
Database 3 (Wall replies) - Tables:User1_post1,User1_post2,User3_post1
Let's say I have 100,000 users now. Thinking about how many tables there would be in total, is this wise? Is there a maximum limit on the number of tables? Will this affect performance?
I'm taking a course right now and I just realized tables can be JOINed. Is that a better idea? Will it make a difference in performance?
Relational databases are designed to handle large amounts of data. Tables with many millions of rows are not uncommon; there are examples with billions of rows as well. So, you don't have to worry about 100,000 users, as long as you understand how to structure your database. Two key ideas are indexes and partitions.
You do have a problem with your structure, however. You do not want a separate table for each user. You want a single table with a column specifying the user. Although the tables will have hundreds of thousands or millions of rows, you do not need to worry. Databases are designed for this type of volume.
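As a sketch of that structure for, say, the playlist data (all names invented for illustration):

CREATE TABLE users (
    user_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) NOT NULL
);

CREATE TABLE playlist_entries (
    entry_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    user_id INT UNSIGNED NOT NULL,
    song VARCHAR(200) NOT NULL,
    FOREIGN KEY (user_id) REFERENCES users (user_id),
    INDEX idx_user (user_id)  -- per-user lookups go through this index
);

-- One user's playlist, found via the index instead of a per-user table:
SELECT song FROM playlist_entries WHERE user_id = 123;

-- And a JOIN works across all users' data at once:
SELECT u.username, p.song
FROM users u
JOIN playlist_entries p ON p.user_id = u.user_id;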

When is using multiple tables the right solution to reducing table size?

After reading about foreign keys and references in MySQL, this question came to mind.
If you run an email marketing service and let your users create contact lists that will hold tons of records, thousands, probably a million or two, would it be a better idea to have one table for each user's contact list? Or one big table that holds all the contacts ever added to the website, with a reference to the list they belong to?
If you have 50,000 users, and each of them has 2 contact lists each of 10,000 emails, you have 1 million emails logged into 1 single table. Wouldn't it be too intensive to run a select query on that table like...
SELECT * FROM name_of_the_big_table
WHERE belonging_contact_list = id_of_the_contact_list
"If you have 50,000 users, and each of them has 2 contact lists each of 10,000 emails, you have 1 million emails logged into 1 single table. Wouldn't it be too intensive to run a select query on that table like..."
No.
You need to consider the design and implementation of your MySQL environment and queries, but handling the scale you mention is not inherently a problem.
A million records isn't a big number for a database - they're meant to handle such cases. You're better off with less fragmentation by keeping all contacts in one table - that way there is no repetition. I would prefer the latter option.
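For example, with an index on the list column (reusing the question's own placeholder names), that SELECT reads only the roughly 10,000 rows of one list rather than the whole table:

CREATE INDEX idx_list ON name_of_the_big_table (belonging_contact_list);

SELECT * FROM name_of_the_big_table
WHERE belonging_contact_list = id_of_the_contact_list;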

MySQL multiple table / multiple schema performance

A quick bit of background - we have a table "orders" which has about 10k records written into it per day. This is the most queried table in the database. To keep the table small, we plan to move records written more than a week or so ago into a different table, via an automated job. While we understand it would make sense to pop the history off to a separate server, we currently have just a single DB server.
The orders table is in databaseA. Following are the approaches we are considering:
Create a new schema databaseB with an orders table that contains the history.
Create a table ordershistory in databaseA.
It would be great if we could get pointers as to which design would give better performance.
EDIT:
Better performance for:
Querying the current orders - since they're not weighed down by the past data
Querying the history
You could either:
Have a separate archival table, possibly in another database. This can potentially complicate querying.
Use partitioning.
I'm not sure how effective MySQL partitioning is. For alternatives, you may take a look at PostgreSQL partitioning. Most commercial databases support it too.
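As a sketch of the partitioning option, assuming the orders table has an order_date column, MySQL range partitioning by month would look roughly like this:

CREATE TABLE orders (
    order_id BIGINT UNSIGNED NOT NULL,
    order_date DATE NOT NULL,
    PRIMARY KEY (order_id, order_date)  -- the partition column must be part of every unique key
)
PARTITION BY RANGE (TO_DAYS(order_date)) (
    PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- An old month can then be removed almost instantly:
-- ALTER TABLE orders DROP PARTITION p2024_01;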
I take it from your question that you only want to deal with current orders.
In the past, I have used 3 tables on busy sites:
new orders,
processing orders,
filled orders,
and a main orders table
orders
All these tables have their own primary key plus a reference to the orders table, e.g.:
new_orders_id, orders_id
processing_orders_id, orders_id
...
Using a LEFT JOIN to find new and processing orders should be relatively efficient, as in the sketch below.
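For example, a query along these lines (column names assumed from the naming above) pulls the new and processing orders in one pass:

SELECT o.*,
       n.orders_id IS NOT NULL AS is_new,
       p.orders_id IS NOT NULL AS is_processing
FROM orders o
LEFT JOIN new_orders n ON n.orders_id = o.orders_id
LEFT JOIN processing_orders p ON p.orders_id = o.orders_id
WHERE n.orders_id IS NOT NULL
   OR p.orders_id IS NOT NULL;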

MySQL DB design question

Assuming I have an application where I expect no more than 50,000 users.
I have 7 tables in my DB schema. Is it a good idea to replicate all the tables for every user? In my case, the number of tables would then be roughly 50,000 * 7 = 350,000.
Is that in any way better than 7 monolithic tables?
No, I would not recommend creating a table schema per user in the same database.
MySQL should handle the load perfectly well, given the correct indexes on the tables.
What you're proposing is horizontal partitioning: http://en.wikipedia.org/wiki/Partition_%28database%29
i.e. you take all the rows in what would (logically) be one table, and you spread those rows across multiple tables, according to some partitioning scheme (in this case, one table per user).
In general, for starter applications of this nature, only consider partitioning (along with a host of other options) when performance starts to suffer.
No, it's definitely not. For two reasons:
The database design should not change just because you store more data in it.
Accessing different tables is more expensive and more complicated than accessing specific data in a single table.
Just imagine if you wanted statistics for a value from each user. You would have to gather data from 50,000 tables using 50,000 separate queries, instead of running a single query against one table (see the sketch below).
A table can easily contain millions of records, but I'm not sure the database can even handle 350,000 tables.
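For example, with one table and a user_id column (table and column names invented for illustration), the statistics mentioned above become a single query:

-- One query instead of 50,000:
SELECT user_id, COUNT(*) AS rows_per_user, AVG(some_value) AS avg_value
FROM user_data
GROUP BY user_id;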
Certainly not. A DB is typically optimized for having many rows in a given table. Try to rethink your DB schema so that each of your tables has a field holding the user's ID, or use an associative table that holds the user ID and the key of the particular entry in the data table.