I have an InnoDB table with about 17 normalized columns and ~6 million records. The size of the table is ~15 GB. Queries against the table are starting to take too long and sometimes time out or crash. I am thinking of splitting the table but am confused about which way would be better. Do I split the columns into different tables on the same or a different DB? Or do I split the rows of the table into another DB, but then how would I know which row is in which DB?
Someone mentioned something about Map/Reduce, which has only confused me further. Any help with this would be much appreciated.
Thanks.
Splitting up your tables to make your queries faster is not a step I would take. I would first try to see whether you can change your queries or add indexes to make them faster. I would suggest adding the queries and table definitions to your question, so that we can provide better answers.
If you have already optimized your queries and indexes, you can still try partitioning. That physically splits your table across, for example, different hard disks, but it remains logically one table. That means you won't have to change your queries while still making them faster.
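As a rough sketch of what that can look like (the `orders` table and its columns here are hypothetical, not from the question), a range-partitioned table is created once and then queried exactly like any other table:

```sql
-- Hypothetical table, partitioned by ranges of the primary key.
-- MySQL prunes partitions automatically when the WHERE clause
-- constrains the partitioning column.
CREATE TABLE orders (
    id INT NOT NULL,
    customer_id INT NOT NULL,
    created_at DATE NOT NULL,
    amount DECIMAL(10,2),
    PRIMARY KEY (id)
)
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (2000000),
    PARTITION p1 VALUES LESS THAN (4000000),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);

-- Queries are written exactly as before:
SELECT * FROM orders WHERE id BETWEEN 100000 AND 200000;
```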
Related
Assuming that I have 20L (20 lakh, i.e. 2 million) records,
Approach 1: Hold all 20L records in a single table.
Approach 2: Make 20 tables and enter 1L (100,000) records into each.
Which is the best method to increase performance and why, or are there any other approaches?
Splitting a large table into smaller ones can give better performance -- it is called sharding when the tables are then distributed across multiple database servers -- but when you do it manually it is most definitely an antipattern.
What happens if you have 100 tables and you are looking for a row but you don't know which table holds it? If you want to index the tables, you'll need to do it 100 times. If somebody wants to join against the data set, they might need to include 100 tables in the join in some use cases. You'd need to invent your own naming conventions, then document and enforce them yourself with no help from the database catalog. Backup, recovery, and all the other maintenance tasks will be a nightmare. Just don't do it.
Instead, just break up the table by partitioning it. You get 100% of the performance improvement that you would have gotten from multiple tables, but now the database is handling the details for you.
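For example, instead of 20 hand-managed tables, a single HASH-partitioned table gives the same physical split while staying one logical table (the `records` table below is a hypothetical sketch):

```sql
-- One logical table, 20 physical partitions, managed by the database
-- instead of by hand-written naming conventions.
CREATE TABLE records (
    id INT NOT NULL,
    payload VARCHAR(255),
    PRIMARY KEY (id)
)
PARTITION BY HASH (id)
PARTITIONS 20;

-- A lookup needs no knowledge of which partition holds the row:
SELECT payload FROM records WHERE id = 123456;
```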
Indexes are a great way to improve read performance. However, having indexes can slow down writes, since each index must be maintained on every INSERT and UPDATE.
So if you are looking for read performance, prefer indexes.
A few things to keep in mind when creating an index:
Try to avoid NULL values in indexed columns.
Cardinality of the columns matters. As a common rule of thumb, placing the more selective (higher-cardinality) column first in a composite index gives better performance than leading with a low-cardinality column.
The sequence of columns in the index should match your WHERE clause. For example, if you create an index on columns A and B but query on column C, your index will not be used. So design your indexes according to your WHERE clauses.
When in doubt about whether an index was used, run EXPLAIN to see which index was chosen.
DB indexes can be a tricky subject for beginners, but imagining an index as a tree being traversed helps visualize the path taken when reading data.
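As a small illustration of the column-order rule (the `users` table and columns are hypothetical), a composite index only helps queries that filter on its leading column:

```sql
-- Composite index: country is the leading column, city the second.
CREATE INDEX idx_country_city ON users (country, city);

-- Can use the index (filters on the leading column):
EXPLAIN SELECT * FROM users WHERE country = 'DE' AND city = 'Berlin';

-- Cannot use the index (skips the leading column):
EXPLAIN SELECT * FROM users WHERE city = 'Berlin';
```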
The best/easiest approach is to have a single table with proper indexes. On 100K rows I had 30 s per query, but with an index I got 0.03 s per query.
When that no longer suffices, you split tables (for me, that point came at millions of rows).
And preferably onto different servers.
You can then create a microservice that accesses all the servers and returns data to consumers as if there were only one database.
But once you do this, you had better not have joins, because replicating data across every database gets messy.
I would stick to the first method.
I have a huge database table containing around 5 million rows. Retrieving records now makes the server slow in some cases. How can I manage the table as it grows over the days?
I was thinking of applying an archiving technique on a yearly basis, for example breaking the complete table down into many small tables, one per year, but that would cost me a lot of changes in code. I would have to change the whole structure of querying the database, so most probably changes in most places in the project.
What else can I do to reduce the fetch time of records from the database tables? Is there any other technique I can adopt that avoids many changes in my code?
Thanks in advance
You can easily partition the table horizontally using PARTITION BY RANGE in MySQL.
Also, if the table has many columns, you can break it into two or more tables using vertical partitioning.
Also add proper indexes, preferably clustered or covering indexes, and test your queries' performance using EXPLAIN.
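Since the question mentions a yearly breakdown, here is a sketch of what horizontal partitioning by year could look like (the `logs` table and its columns are hypothetical); existing queries keep working unchanged:

```sql
-- Partition by year; MySQL requires the partitioning column to be
-- part of every unique key, hence the composite primary key.
CREATE TABLE logs (
    id INT NOT NULL,
    created_at DATE NOT NULL,
    message TEXT,
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2020 VALUES LESS THAN (2021),
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- An old year can later be archived or dropped in one statement:
-- ALTER TABLE logs DROP PARTITION p2020;
```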
Maybe table partitioning will help in that case. The following links provide more info:
http://dev.mysql.com/doc/refman/5.1/en/partitioning-overview.html
https://dba.stackexchange.com/questions/19313/partitioning-a-table-will-boost-the-performance
Recently I was asked to develop an app which is basically going to use one main table in the whole database for its operations.
It has to have around 20 columns of various types: decimals, ints, varchars, dates, floats. At some point the table will have thousands of rows (3-5k).
The app must be able to SELECT records by combining criteria on each of the columns, e.g. BETWEEN dates, greater than something, smaller than something, equal to something, etc. Basically combining a lot of WHERE conditions to get the desired result.
So my question is: since I know how to combine the WHERE conditions and build the app, what is the best approach? Is MySQL good enough not to slow down when I have 3k records and run a SELECT query with 15 WHERE conditions? I've never worked with a database larger than 1k records, so I'm not sure whether I should use MySQL for this. Also, I'm going to use PHP as the server language, if that matters at all.
To be clear, you are talking about 15 conditions in ONE WHERE clause, not 15 WHERE clauses.
3,000 rows is very little for a relational database; these typically grow far larger (3 million rows or even much more).
I am more concerned that you have 20 columns in one table. That sounds like a normalization problem.
With a well-defined structure for your database, including appropriate indexes, 3k records is nothing, even with 15 conditions. Even without indexes, it is doubtful that with so few records, you will see any performance hit.
I would, however, plan for the future: look at your queries and see whether there is any table optimisation you can do at this stage, to save pain later. Who knows: 3k records today, 30 million next year.
3,000 records in a database is nothing. You won't have any performance issues, even with your 15 WHERE conditions.
MySQL and PHP will do the job just fine.
I'd be more concerned about your large number of columns. Maybe you should take a look at this article to make sure you respect the database normal forms.
Good luck with your project.
I don't think querying a single table of 3-5k rows is going to be particularly intensive; MySQL should be able to cope with something like this easily enough. You could add lots of indexes to speed up your SELECTs if that is the choke point, but this will also slow down inserts, updates, etc., so if you are filtering on lots of different columns, indexing them all probably isn't a good idea.
Since the number of rows is very small, I guess it should not cause any performance issue. Still, look carefully at your use of the OR operator, and consider indexes on the columns in your WHERE clause.
Indices, indices, indices!
If you need to check a lot of different columns, try to flatten the logic you use. In any case, make sure you have set an appropriate index on the checked columns: not one index per column, but one index over all of the columns that are regularly used together, as in the sketch below.
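A minimal sketch of that idea, assuming a hypothetical `measurements` table: one composite index covering the columns that are regularly filtered together, rather than one index per column.

```sql
-- Equality-tested columns first, the range condition last.
CREATE INDEX idx_combined ON measurements (status, sensor_id, recorded_at);

-- A combined-criteria query that can use the index:
SELECT *
FROM measurements
WHERE status = 'ok'
  AND sensor_id = 42
  AND recorded_at BETWEEN '2024-01-01' AND '2024-06-30';
```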
Just a question. One of my websites has become significantly slower, with load times of over 30 seconds on 30k rows. I must say the queries aren't optimized, so up to 10k queries can be fired, but still, I find this too long... So I figured, let's check the indexes. After viewing some of the 'problem' tables, I saw that I had made indexes over multiple columns, but the cardinality is only shown on one column and the other index columns show 0 cardinality.
Did I make the wrong indexes? In other words, should I make an index for each column instead of combining them?
It's almost certainly the case that you created the wrong indexes. Most people do! :-)
There's no rule about creating indexes on multiple columns vs. individual columns. The best indexes to create depend on the queries you run, not on your database schema.
Analyzing the queries and deciding on indexes is a meticulous process. You can use EXPLAIN to see how a given query is using existing indexes. Be sure to read the docs:
http://dev.mysql.com/doc/refman/5.1/en/using-explain.html
http://dev.mysql.com/doc/refman/5.1/en/explain-output.html
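As a starting point (the `posts` table here is hypothetical), these statements show how MySQL currently sees your indexes; note that stale statistics can make cardinality appear as 0 until they are refreshed:

```sql
-- Refresh the index statistics, then inspect per-column cardinality:
ANALYZE TABLE posts;
SHOW INDEX FROM posts;

-- Check whether a slow query actually uses the composite index:
EXPLAIN SELECT * FROM posts
WHERE author_id = 7 AND created_at > '2024-01-01';
```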
Since I'm still at the beginning of my site design, I figured now's a good time to ask this.
I know that one of the ways to optimize MySQL queries is to split your rows into separate tables; however, that does have a few comfort issues.
What I'm considering is this: would querying a table of around 1,000,000 rows and 150 columns, using excellently designed indexes and fetching only the needed columns from each query, result in a much higher server load than splitting the table into multiple ones with fewer columns?
Big blob tables are an anti-pattern; never use them.
Normalized tables will run much, much faster than a single blob.
InnoDB is optimized for many small tables that need to be joined.
Besides that, using normalized tables will save you many headaches:
Your data will be smaller, so more of it fits in memory.
You only store data in one place, so it cannot end up with inconsistent data.
MySQL generally uses only one index per table per SELECT, so multiple tables mean more usable indexes and more speed.
Triggers on tables execute much faster.
Normalized tables are easier to maintain.
You have fewer indexes per table, so inserts are faster.
Indexes are smaller (fewer rows) and narrower (fewer columns), and will run much faster as a result.
If the data is static, you can pack the tables for greater efficiency; see the page on this in the reference manual.
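As a hedged sketch of such a split (all table and column names are hypothetical): a very wide row can often be divided into a narrow, frequently queried table and a wider detail table joined on the primary key.

```sql
-- Narrow "hot" table holding the frequently queried columns:
CREATE TABLE users (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(64) NOT NULL,
    created_at DATETIME NOT NULL
) ENGINE=InnoDB;

-- Wider "cold" table holding the rarely queried columns:
CREATE TABLE user_profiles (
    user_id INT NOT NULL PRIMARY KEY,
    bio TEXT,
    preferences TEXT,
    CONSTRAINT fk_profile_user FOREIGN KEY (user_id) REFERENCES users (id)
) ENGINE=InnoDB;

-- Most queries touch only the narrow table; the join is cheap when needed:
SELECT u.username, p.bio
FROM users u
JOIN user_profiles p ON p.user_id = u.id
WHERE u.id = 42;
```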