I know that in SQL Server, the maximum number of "objects" in a database is a little over 2 billion. Objects include tables, views, stored procedures, and indexes, among other things. I'm not at all worried about going beyond 2 billion objects. What I would like to know is whether SQL Server suffers a performance hit from having a large number of tables. Does each table you add carry a performance cost, or is there basically no difference (assuming a constant amount of data)? Does anybody have any experience working with databases with thousands of tables? I'm also wondering the same about MySQL.
No difference, assuming constant amount of data.
Probably a gain in practical terms because of things like reduced maintenance windows (smaller index rebuilds), ability to have read-only file groups etc.
Performance is determined by queries and indexes (at the most basic level), not by the number of objects.
In terms of the maximum number of tables, I have had a database with 2 million tables. No performance hit at all. My tables were small, around 15 MB each.
I doubt SQL Server will have a performance problem working with thousands of tables, but I sure would.
I've worked on databases with hundreds of tables in SQL Server with no problems, though.
SQL Server can suffer a larger performance hit from tables with many, many columns than from breaking out a related table (even one with a one-to-one relationship). A wide table can also run into problems when the data you want to insert exceeds the number of bytes a row can hold: you can create a table with the potential to store, for example, 10,000 bytes per row, but you will still only be able to store 8,060 bytes.
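A minimal T-SQL sketch of that situation, with a made-up table name and columns; the exact behaviour (an error versus row-overflow storage) depends on the SQL Server version and the data types involved:

    -- Hypothetical wide table: on paper each row could hold about 10,000 bytes.
    CREATE TABLE dbo.WideCustomer
    (
        CustomerId int           NOT NULL PRIMARY KEY,
        Notes1     varchar(5000) NULL,
        Notes2     varchar(5000) NULL
    );

    -- An INSERT whose actual row size passes 8,060 bytes may be rejected (or, on
    -- newer versions with row-overflow storage, pushed off-row), even though the
    -- table definition itself was accepted.
    INSERT INTO dbo.WideCustomer (CustomerId, Notes1, Notes2)
    VALUES (1, REPLICATE('a', 5000), REPLICATE('b', 5000));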
In my experience I don't think the number of tables will hurt performance. But you should be able to justify why you have so many tables in the database, because having that many tables also affects the developers working on the server side.
IMO, if you divide the tables on the basis of functionality, you not only make the developer's life easier but can also gain performance in your application, because you have well-defined tables from which to get the required data.
Say you need to store sales, purchase, receipt and payment details. They all have the same table structure, but instead of storing them in a single table you could store each in its own table. With this you can get all the sales details from one table, all the purchase details from another, and so on. That can help the response time of the database tier, which is one of the slowest components of a web stack. Of course we improve database performance with the SQL queries themselves, but this kind of structuring can also indirectly help.
I have a single database, most of the tables are connected in some way.
It consists of over 500,000 records.
I need to implement live search, but the number of records bothers me.
The database will grow, and live searching millions of records will surely cause problems. So I need to move old records (let's assume a date field is present) to another database and only keep fresh ones available for search.
Old records won't be used anymore, that's for sure, but I still need to keep them.
Any ideas how that could be implemented in MySQL?
500,000 records really is not very many records.
Before you start taking drastic actions (such as limiting the ability of users to seamlessly see all the data at once), you should consider basics for improving performance:
Indexes to improve standard query performance.
Partitioning to limit the portions of tables that need to be accessed.
Full-text indexing to improve MATCH() queries.
Optimization of SQL queries.
In general, these are sufficient for databases that are orders of magnitude larger than the volume you are dealing with.
These may not apply to your particular situation; but you should exhaust the lower-hanging fruit for performance optimization before changing your physical data model for a problem that might never occur.
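To make the first and third items above concrete, here is a hedged MySQL sketch; the table name, columns, and the 90-day freshness window are all assumptions, not taken from the question:

    -- Hypothetical searchable table.
    CREATE TABLE articles (
        id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        created_at DATE         NOT NULL,
        title      VARCHAR(255) NOT NULL,
        body       TEXT         NOT NULL,
        KEY idx_created_at (created_at),           -- ordinary index for date filters
        FULLTEXT KEY ft_title_body (title, body)   -- for MATCH() searches (InnoDB needs MySQL 5.6+)
    ) ENGINE=InnoDB;

    -- Live search limited to fresh records; old rows simply stop matching the
    -- date filter, so nothing needs to be moved to another database.
    SELECT id, title
    FROM articles
    WHERE created_at >= CURDATE() - INTERVAL 90 DAY
      AND MATCH(title, body) AGAINST ('search terms' IN NATURAL LANGUAGE MODE);

    -- Note: MySQL does not allow FULLTEXT indexes on partitioned tables, so
    -- partitioning by date would be a separate design decision.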
I am dealing with a large database that is collecting historical pricing data. The schema is relatively simple and does not change.
Something like:
SKU (char), type(enum), price(double), datetime(datetime)
The issue is that this table now has over 500,000,000 rows, is around 20 GB, and is growing. It is already getting a bit difficult to run queries. One common query is to get all SKUs for a specific date range, which may return around 500,000 records. Add any complexity like GROUP BY and you can forget it.
This db is mostly writes, but we obviously need to crunch the data and run queries occasionally. I understand that better index planning can help speed up the queries, but I am wondering whether this is the type of data that would benefit from a NoSQL solution like MongoDB. Can I expect MySQL (probably moving to MariaDB) to continue to work for us even after it grows beyond 100-200 GB in size, or should I explore alternatives before things get unwieldy?
NoSQL is not a solution to a "large database" problem; NoSQL (specifically, document databases) is designed for scenarios where the nature of the data you're storing varies, so you don't want to define rigid schemas and relationships up front.
What you have is simple, well-defined data. This is ideally suited to a relational database, but for something of that scale I would recommend looking at something commercial (e.g. SQL Server or Oracle, depending on your platform). The databases I work with in SQL Server are around four terabytes in size, with several tables in the hundreds of millions of records like you have. A relational database can easily accommodate the simple data you've outlined.
You actually have an ideal use case for SQL, and a rather bad fit for NoSQL. MySQL devs report people using databases of 5,000,000,000 records. Some other SQL servers will be even more scalable than that. However, without proper index support it will be impossible to manage even a fraction of that.
BTW, what is your table schema, including indices?
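For illustration only, and assuming nothing beyond the four columns listed in the question (the table name, index name, and ENUM values here are invented), an index led by the datetime column is the usual starting point for the date-range query described:

    -- Hypothetical recreation of the table from the question.
    CREATE TABLE price_history (
        sku         CHAR(20)                  NOT NULL,
        price_type  ENUM('bid','ask','trade') NOT NULL,
        price       DOUBLE                    NOT NULL,
        recorded_at DATETIME                  NOT NULL,
        KEY idx_time_sku (recorded_at, sku)   -- drives the date-range scan
    ) ENGINE=InnoDB;

    -- The common query: every SKU in a date range. The composite index lets the
    -- server read only the matching slice instead of scanning 500M+ rows.
    SELECT sku, price_type, price, recorded_at
    FROM price_history
    WHERE recorded_at >= '2014-01-01'
      AND recorded_at <  '2014-01-08';

    -- A GROUP BY over the same range still uses the index to limit the scan,
    -- even though the grouping itself needs a temporary table or sort.
    SELECT sku, AVG(price) AS avg_price
    FROM price_history
    WHERE recorded_at >= '2014-01-01'
      AND recorded_at <  '2014-01-08'
    GROUP BY sku;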
You could switch to MariaDB and then use the Spider engine. The Spider engine makes it possible to split your data across multiple MariaDB instances without losing the ability to run queries against your existing instance.
So you can define your own rules for partitioning and then create one instance per partition. In the end you have multiple instances of MariaDB, but all your records are virtually summed up in one table by the Spider engine.
Your performance gain comes from splitting your data across multiple instances, which reduces the number of records per table and per instance, and of course from using more hardware resources.
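A rough sketch of the idea; the server names, hosts, credentials, and table layout below are placeholders, and the exact Spider options vary by MariaDB version, so treat this as an outline rather than a recipe:

    -- Register the backend MariaDB instances on the front-end server.
    CREATE SERVER backend1 FOREIGN DATA WRAPPER mysql
        OPTIONS (HOST '10.0.0.11', DATABASE 'appdata', USER 'spider', PASSWORD 'secret', PORT 3306);
    CREATE SERVER backend2 FOREIGN DATA WRAPPER mysql
        OPTIONS (HOST '10.0.0.12', DATABASE 'appdata', USER 'spider', PASSWORD 'secret', PORT 3306);

    -- The Spider table holds no data itself: each partition's rows live on the
    -- backend named in its COMMENT, but queries against this table see them all.
    CREATE TABLE price_history (
        sku         CHAR(20) NOT NULL,
        price       DOUBLE   NOT NULL,
        recorded_at DATETIME NOT NULL
    ) ENGINE=SPIDER
    PARTITION BY RANGE (YEAR(recorded_at)) (
        PARTITION p_old VALUES LESS THAN (2014)    COMMENT = 'srv "backend1", table "price_history"',
        PARTITION p_new VALUES LESS THAN MAXVALUE  COMMENT = 'srv "backend2", table "price_history"'
    );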
The case:
I have been developing a web application in which I store data from different automated data sources. Currently I am using MySQL as the DBMS and PHP as the programming language, on a shared LAMP server.
I use several tables to identify the data sources and two tables for the data updates. Data sources are in a three level hierarchy, and updates are timestamped.
One table contains the two upper levels of the hierarchy (geographic location and instrument), plus the time-stamp and an “update ID”. The other table contains the update ID, the third level of the hierarchy (meter) and the value.
Most queries involve a join between these two tables.
Currently the first table contains nearly 2.5 million records (290 MB) and the second has over 15 million records (1.1 GB). Each hour around 500 records are added to the first table and 3,000 to the second, and I expect these numbers to increase. I don't think these numbers are too big, but I've been experiencing some performance drawbacks.
Most queries involve looking for immediate past activity (per site, per group of sites, and per instrument) which are no problem, but some involve summaries of daily, weekly and monthly activity (per site and per instrument). The page takes several seconds to load, sometimes surpassing the server's timeout (30s).
It also seems that the automatic updates are suffering from these timeouts, causing the connection to fail.
The question:
Is there any rational way to split these tables so that queries perform more quickly?
Or should I attempt other types of optimizations not involving splitting tables?
(I think the tables are properly indexed, and I know that a possible answer is to move to a dedicated server, probably running something other than MySQL, but I cannot make that move just yet, and any optimization will help in the meantime.)
If the queries that are slow are the historical summary queries, then you might want to consider a Data Warehouse. As long as your history data is relatively static, there isn't usually much risk to pre-calculating transactional summary data.
Data warehousing and designing schemas for Business Intelligence (BI) reporting is a very broad topic. You should read up on it and ask any specific BI design questions you may have.
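As a very small illustration of the pre-calculation idea (every table and column name below is invented, since the question does not give the real schema): keep a daily summary table that is refreshed once a day, and point the daily/weekly/monthly pages at it instead of at the raw update tables.

    -- Hypothetical daily rollup of the raw updates.
    CREATE TABLE daily_meter_summary (
        summary_date  DATE   NOT NULL,
        instrument_id INT    NOT NULL,
        meter_id      INT    NOT NULL,
        reading_count INT    NOT NULL,
        value_sum     DOUBLE NOT NULL,
        PRIMARY KEY (summary_date, instrument_id, meter_id)
    ) ENGINE=InnoDB;

    -- Run once per day (from cron, for example) to summarize yesterday's rows.
    INSERT INTO daily_meter_summary
    SELECT DATE(u.update_ts), u.instrument_id, v.meter_id, COUNT(*), SUM(v.value)
    FROM updates u
    JOIN update_values v ON v.update_id = u.update_id
    WHERE u.update_ts >= CURDATE() - INTERVAL 1 DAY
      AND u.update_ts <  CURDATE()
    GROUP BY DATE(u.update_ts), u.instrument_id, v.meter_id;

    -- A monthly report now reads a few thousand summary rows instead of millions.
    SELECT instrument_id, SUM(value_sum) / SUM(reading_count) AS avg_value
    FROM daily_meter_summary
    WHERE summary_date BETWEEN '2014-01-01' AND '2014-01-31'
    GROUP BY instrument_id;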
I am working on a project involving a large amount of data from the delicious website. The data available is "Date, UserId, Url, Tags" (for each bookmark).
I normalized my database to 3NF and, because of the nature of the queries we wanted to run in combination, ended up with 6 tables... The design looks fine; however, now that a large amount of data is in the database, most queries need to join at least 2 tables to get an answer, sometimes 3 or 4. At first we didn't have any performance issues, because for testing purposes we had not added much data to the database. Now that we have a lot of data, simply joining extremely large tables takes a lot of time, and for our project, which has to be real-time, this is a disaster.
I was wondering how big companies solve these issues. It seems like normalizing tables just adds complexity, but how do big companies handle large amounts of data in their databases? Don't they use normalization?
Thanks.
Since you asked how big companies (generally) approach this:
They usually have a DBA (database administrator) who lives and breathes the database the company uses.
This means they have people that know everything from how to design the tables optimally, profile and tune the queries/indexes/OS/server to knowing what firmware revision of the RAID controller that can cause problems for the database.
You don't talk much about what kind of tuning you've done, e.g.
Are you using MyISAM or InnoDB tables? Their performance (and not least their features) differs radically between workloads.
Are the tables properly indexed for the queries you run?
Run EXPLAIN on all your queries; it will help you identify keys that could be added or removed and whether the proper keys are being chosen, and it lets you compare queries (SQL leaves you many ways to accomplish the same thing). See the example at the end of this answer.
Have you tuned the query cache? For some workloads the query cache (on by default) can cause a considerable slowdown.
How much memory does your box have, and is MySQL tuned to take advantage of it?
Do you use a file system and RAID setup geared towards the database?
Sometimes a little de-normalization is needed.
Different database products have different characteristics; MySQL might be blazingly fast for some workloads and slow for others.
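To make the EXPLAIN point concrete, here is a hedged example against an invented bookmarks/tags schema (none of these names come from the question):

    -- Hypothetical three-table join of the kind the question describes.
    EXPLAIN
    SELECT u.name, b.url, t.tag
    FROM bookmarks b
    JOIN users u ON u.id = b.user_id
    JOIN tags  t ON t.bookmark_id = b.id
    WHERE b.created_at >= '2011-01-01';

    -- Things to look for in the output:
    --   type = ALL         a full table scan; a composite or covering index may be missing
    --   key  = NULL        no index was chosen for that table
    --   rows = very large  the join order or the WHERE clause is not selective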
I'm working on a project which is similar in nature to website visitor analysis.
It will be used by hundreds of websites averaging tens of thousands to hundreds of thousands of page views a day each, so the amount of data will be very large.
Should I use a single table with websiteid or a separate table for each website?
Making changes to a live service with hundreds of websites, each with its own table, seems like a big problem. On the other hand, performance and scalability are probably going to be a problem with so much data. Any suggestions, comments or advice are most welcome.
How about one table partitioned by website FK?
I would say use the design that most makes sense given your data - in this case one large table.
The records will all be of the same type, with the same columns, so from a database normalization standpoint it makes sense to have them in the same table. An index makes selecting particular rows easy, especially when whole queries can be satisfied by data in a single index (which can often be the case).
Note that visitor analysis will necessarily involve a lot of operations where there is no easy way to optimise other than to operate on a large number of rows at once - for instance: counts, sums, and averages. It is typical for resource intensive statistics like this to be pre-calculated and stored, rather than fetched live. It's something you would want to think about.
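A small sketch of the single-table approach (all names are made up), with the index chosen so that the common per-site, per-day counts can be answered from the index alone:

    -- One table for all websites; website_id tells them apart.
    CREATE TABLE page_views (
        website_id INT UNSIGNED  NOT NULL,
        viewed_at  DATETIME      NOT NULL,
        url        VARCHAR(2048) NOT NULL,
        KEY idx_site_time (website_id, viewed_at)   -- covers the count query below
    ) ENGINE=InnoDB;

    -- Daily page views for one site: every column the query touches is in
    -- idx_site_time, so the base rows are never read.
    SELECT DATE(viewed_at) AS day, COUNT(*) AS views
    FROM page_views
    WHERE website_id = 42
      AND viewed_at >= '2012-06-01'
    GROUP BY DATE(viewed_at);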
If the data is uniform, go with one table. If you ever need to SELECT across all websites, having multiple tables is a pain, although with enough scripting you can make multiple tables work.
You could use MySQL's MERGE storage engine to do SELECTs across the tables (but don't expect good performance, and watch out for the hard limit on the number of open files: on Linux you may have to raise it with ulimit, but on Windows there is no way to raise it).
I have broken a huge table into many (hundreds of) tables and used MERGE to SELECT. I did this so that I could perform off-line creation and optimization of each of the small tables (e.g. OPTIMIZE TABLE or ALTER TABLE ... ORDER BY). However, the performance of SELECT with MERGE caused me to write my own custom storage engine. (Described here: http://blog.coldlogic.com/categories/coldstore/)
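For reference, a minimal MERGE setup looks roughly like this (MyISAM only; the table names and columns are illustrative):

    -- MERGE only works over identically defined MyISAM tables.
    CREATE TABLE views_site1 (viewed_at DATETIME NOT NULL, url VARCHAR(255) NOT NULL) ENGINE=MyISAM;
    CREATE TABLE views_site2 (viewed_at DATETIME NOT NULL, url VARCHAR(255) NOT NULL) ENGINE=MyISAM;

    -- The MERGE table is a thin wrapper: SELECTs fan out across the UNION list,
    -- and INSERTs go to the last table in it.
    CREATE TABLE views_all (viewed_at DATETIME NOT NULL, url VARCHAR(255) NOT NULL)
        ENGINE=MERGE UNION=(views_site1, views_site2) INSERT_METHOD=LAST;

    SELECT COUNT(*) FROM views_all WHERE viewed_at >= '2012-06-01';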
Use a single data structure. Once you start encountering performance problems, there are many solutions: you can partition your tables by website id (also known as horizontal partitioning), or you can use replication. It all depends on the ratio of reads to writes.
But to start with, keep things simple and use one table with proper indexing. You should also determine whether you need transactions. You can take advantage of MySQL's different storage engines, such as MyISAM or NDB (in-memory clustering), to boost performance. Caching also plays a big role in offloading work from the database: data that is mostly read-only and easy to compute is usually put in a cache, and the cache serves those requests, so only the necessary queries reach the database.
Use one table unless you have performance problems with MySQL.
Nobody here can answer performance questions for you; you should run performance tests yourself to find out whether one big table is sufficient.