MySQL - Basic issue with a large table

In my db there are two large tables. The first one (A) has 1.7 million rows, the second one (B) 2.1 million. Records in A and B are roughly the same size.
I can do any operation on A. It takes time, but it works. On B, I can't do anything. Even a simple SELECT COUNT(*) just hangs forever. The problem is I don't see any error: it just hangs (when I show the process list it just says "updating" forever).
It seems weird to me that the small delta (percentage-wise) between 1.7 and 2.1 million could make such a difference (from being able to do everything to not even being able to do the simplest operation).
Can there be some kind of 2 million rows hard limit?
I am on Linux 2.6+, and I use innoDB.
Thanks!
Pierre

It appears to depend more on the amount of data in each row than on the total number of rows. If the rows contain little data, the practical maximum number of rows is higher than it is for wider rows. Check this link for more info:
http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html

The row size (the number of bytes needed to store one row) might be much larger for the second table. COUNT(*) may require a full table scan, i.e. reading through the entire table on disk; larger rows mean more I/O and a longer time.
The presence/absence of indexes will likely make a difference too.
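For what it's worth, a quick way to compare the two tables' on-disk footprint and index definitions (a hedged sketch; A and B stand for the real table names):

    -- data_length/index_length are in bytes; for InnoDB, table_rows is only an estimate
    SELECT table_name, table_rows, data_length, index_length
    FROM information_schema.tables
    WHERE table_schema = DATABASE() AND table_name IN ('A', 'B');

    SHOW INDEX FROM A;
    SHOW INDEX FROM B;

If B's data_length turns out to be several times A's despite the similar row counts, the "fairly identical size" assumption is worth re-checking.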

As I was saying in my initial post, the two tables are fairly similar, so the row size should be fairly close in both. That's why I was a bit surprised and started to think that maybe, somehow, a 2 million row limit was set somewhere.
It turns out my table was corrupted. It is bizarre, since I was still able to access some records (using joins with other tables) and MySQL was not "complaining". I found out by doing a CHECK TABLE: it did not return any error, but it crashed mysqld every time...
Anyway, thank you all for your help on this.
Pierre
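For readers who hit the same symptoms, the check Pierre describes, plus a common (hedged) InnoDB recovery route, looks roughly like this; the table name B and the recovery level are placeholders to adapt:

    CHECK TABLE B;   -- here it reported no error but crashed mysqld, which was the corruption clue
    -- a typical recovery path: add innodb_force_recovery = 1 to my.cnf (raise towards 6 only if needed),
    -- restart mysqld, dump what is still readable (e.g. mysqldump mydb B > B.sql from a shell),
    -- then drop the table, remove the innodb_force_recovery setting, restart, and reload the dump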

Related

Time Complexity of Sorting a database

I'm currently developing a mobile app using CodeIgniter and MySQL. I'm now faced with a situation where I have a table of books (this table will have 100k+ records). Within this table I have a column called NotSelling. Example of the db:
Book A 45
Book B 0
Book C 159
Book D 78
.
.
.
Book Z 450
The numbers above are what appear in the NotSelling column in the db. I need to extract the top 20 books from this large table. Now my solution to doing this is to sort the table and then just use TOP (MySQL's LIMIT) to extract the top 20 records.
What I would like to know about is the performance of sorting the table, as I'm sure constantly sorting the table simply to get the top 20 results would take a hideously long time. I have been given these solutions to the problem:
index the NotSelling column.
cache the query (but I've read about coarse invalidation, which may cause problems, as in my case the invalidation frequency would be high).
sort the table, take the top 20 records, place them in another table, and then periodically update that table, say every hour or so.
But all this being said, does anyone know of a better solution to this problem, or have a way/method of optimizing the performance of the functionality I'm trying to implement? Note that I am a newbie, so if anyone can point me in the right direction where I can read up about database performance, I would really appreciate it.
I think you are overthinking this; it's definitely a case of premature optimization. While all the above-mentioned solutions are perfectly valid, you should know that 100k+ records is nothing to MySQL. We used to routinely sort tables with 30 million+ rows with excellent performance.
But you MUST have an index on the column being sorted on, and double-check your table schema. Regarding caching, don't worry either: MySQL does that for you for repeated queries when the table has not changed. But an index on the column is a must, the primary and most important requirement.
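A minimal sketch of that advice, assuming a hypothetical books table with title and NotSelling columns (note that MySQL uses LIMIT rather than TOP):

    -- index the column you sort on
    ALTER TABLE books ADD INDEX idx_notselling (NotSelling);

    -- top 20 books: with the index, MySQL can walk the last 20 index entries
    -- instead of sorting the whole table
    SELECT title, NotSelling
    FROM books
    ORDER BY NotSelling DESC
    LIMIT 20;

    -- EXPLAIN can confirm the index is used (ideally no "Using filesort")
    EXPLAIN SELECT title, NotSelling FROM books ORDER BY NotSelling DESC LIMIT 20;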
Don't worry about the performance of sorting. That can always be fixed in the database at a later time by adding an index if it actually proves to be a problem.
In the design phase, optimization is a distraction. Instead, focus on the functionality and the directness that the implementation represents the problem. As long as those are on target, everything else can be fixed comparatively easily.
Depending on what kind of metadata is kept inside the data structure of the index backing the column, a traversal can likely be done in roughly O(n) time, with n being the number of items returned.
This means that in theory, whether you have 1 million or 200 trillion records, pulling the first 20 will be just as fast, as long as you have an index. In practice there is going to be a performance difference, since a small index will fit in memory whereas a large one will have to use the disk.
So in short, you're worrying too much. As Srikar Appal says, a properly indexed 100k-record table is nothing to MySQL.

mysql Number of SQL Queries

I just converted an Access DB to MySQL (using Access as the frontend, MySQL as the backend).
It is a simple, 4-table database.
I have a Master table and 3 tables linked to it.
So an Access form displays data from:
Master table (mainform)
Details1 table (subform)
Details2 table (subform)
Details3 table (subform)
The master table will always show one row; however, the linked ("details") tables can have any number of records, usually around 10-30 rows in each detail table per master record.
Everything runs well; however, when checking MySQL Administrator under Health > Connection Health > Number of SQL Queries, the number of queries jumps to 10 every time I move between Master record pages.
I am running this on my own laptop, and I am worried this will become a performance problem when I put 100+ users on the work server, all working at once.
Could anyone advise whether this high "number of queries" reported by MySQL Administrator will be a problem?
What number is considered "dangerous" for performance purposes?
The idea is to have a fast running system, so I would like to avoid too many queries to the database.
I also don't understand why (for example) it displays 7 queries when there are only 4 tables in total, with only one row per table being displayed.
Any ideas/comments will be appreciated.
Can anything be changed in the Access front end to make the number of queries lower?
thanks so much
Those 10 queries probably don't take a long time, and they are very likely sequential. I doubt there will be a problem with 100 users, since they won't all be running the queries at once. Even then, MySQL can handle quite a load.
I'm not sure what is going on inside Access. "Queries" can be just about anything (e.g. metadata requests), not just queries for records from the table: for example, getting the total number of records in a table to display something like "showing 23 of 1,000". If Access is doing this for each table, that's an extra 4 queries right there, leaving only 3 that fetch actual data to display.
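If you want to see exactly which statements Access sends, one option (a sketch; needs MySQL 5.1+ and sufficient privileges, and should only be left on briefly) is the general query log:

    -- MySQL 5.1+ can toggle this at runtime; older servers need mysqld started with --log
    SET GLOBAL general_log_file = '/tmp/mysql-general.log';
    SET GLOBAL general_log = 'ON';
    -- ...page through a few Master records in Access, then:
    SET GLOBAL general_log = 'OFF';
    -- the log file now lists every statement the frontend issued, one per line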
It's hard to be sure, because it depends on a lot of things like the server's memory, CPU and the complexity of the queries, but...
Supposing the queries for the subforms are directly linked to the master table (with an indexed id field) and do not need to join with other tables (as you have only 4 tables), I think you're OK to run without problems, as the number of queries is not too high.
As an example, some years ago I had an old machine (an Athlon XP 1600 with only 512MB or 1GB of RAM) running MySQL and serving files for 20 users. Most of the queries were small stored procedures using mainly 20 tables but returning a lot of rows (usually around 2000 for the most used query). Everything was fast. This old system ran 14 million queries in 2 months (an average of over 160 per minute), so I think you will be OK.
Anyway, if you have a way to do a partial test, that would be the best option. You could use a small script querying the database in a loop on several machines, for example.

Large MySQL Table - Advice Needed

I have a large MySQL MyISAM table with 1.5 million rows, 4.5GB in size, and still growing every day.
I have done all the necessary indexing and performance has been greatly optimized. Yet the database occasionally breaks down (showing a 500 Internal Server Error), usually due to query overload. Whenever there is a breakdown, the table starts to work very slowly and I have to do a silly but effective task: copy the entire table over to a new table and replace the old one with the new one!!
You may ask why such a stupid action: why not repair or optimize the table? I've tried that, but the time to repair or optimize may be more than the time to simply duplicate the table, and, more importantly, the new table performs much faster.
A newly built table usually works very well. But over time it becomes sluggish (maybe after a month) and eventually leads to another breakdown (500 Internal Server Error). That's when everything slows down significantly and I need to repeat the silly process of replacing the table.
For your info:
- The data in the table seldom gets deleted, so there isn't a lot of overhead in the table.
- Under optimal conditions, each query takes 1-3 seconds. But when the table becomes sluggish, the same query can take more than 30 seconds.
- The table has 24 fields: 7 are INT, 3 are TEXT, 5 are VARCHAR and the rest are SMALLINT. It's used to hold articles.
If you can explain what causes the sluggishness, or have a suggestion on how to improve the situation, feel free to share it. I will be very thankful.
Consider moving to InnoDB. One of its advantages is that it's crash-safe. If you need full-text capabilities, you can achieve that with external tools like Sphinx or Lucene.
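The conversion itself is a one-statement sketch (articles stands for your table name); it rebuilds the whole table, so expect it to take a while on 4.5GB, and on older MySQL versions any MyISAM FULLTEXT indexes have to be dropped first:

    ALTER TABLE articles ENGINE=InnoDB;   -- copies and rebuilds the table in the new engine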
Partitioning is a common strategy here. You might be able to partition the articles by the month they were committed to the database (for example) and then have your query account for returning results from the month of interest (how you partition the table would be up to you and your application's design/behavior). You can UNION results if they need to come from more than one partition/table.
Even better, depending on your MySQL version, partitioning may be supported natively by your server. See this for details.
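A hedged sketch of month-based partitioning (MySQL 5.1+; the articles table and published_at column are assumptions, and note that MySQL requires the partitioning column to be part of every unique key on the table):

    ALTER TABLE articles
      PARTITION BY RANGE (TO_DAYS(published_at)) (
        PARTITION p200904 VALUES LESS THAN (TO_DAYS('2009-05-01')),
        PARTITION p200905 VALUES LESS THAN (TO_DAYS('2009-06-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
      );

    -- queries that filter on published_at only have to touch the matching partition(s)
    SELECT id, title
    FROM articles
    WHERE published_at >= '2009-05-01' AND published_at < '2009-06-01';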

mysql amount of data per table

I'm designing a system, and digging deep into the numbers, I realize it could reach a point where there would be a table with 54,240,211,584 records/year (approximately). WOW!!!!
So I broke it down, and down, to 73,271,952 records/year (approximately).
I got the numbers by building a spreadsheet modelling what would happen if:
a) no success = 87 users,
b) low moderated success = 4300 users,
c) high moderated success = 13199 users,
d) success = 55100 users
e) incredible success = nah
Taking into account that the table is used for SELECT, INSERT, UPDATE & JOIN statements, and that these statements would be executed by any user logged into the system hourly/daily/weekly (historical data is not an option):
Question 1: is the 2nd quantity suitable/handy for the MySQL engine, such that performance would suffer little impact?
Question 2: I set the table as InnoDB but, given the fact that I handle all of the statements with JOINs and that I'm likely to run into the 4GB limit problem, is InnoDB useful?
Quick overview of the tables:
table #1: user/event purchase. Up to 15 columns, some of them VARCHAR.
table #2: tickets by purchase. Up to 8 columns, only TINYINT. Primary key INT. From 4 to 15 rows inserted for each table #1 insertion.
table #3: items by ticket. 4 columns, only TINYINT. Primary key INT. 3 rows inserted for each table #2 insertion. I want to keep it as a separate table, but if someone has to die...
Table #3 is the target of the question. The way I reduced it to the 2nd quantity was by turning each of table #3's rows into a column of table #2.
Something that I don't want to do, but would if necessary, is to partition the tables by week and add more logic to the application.
Every answer helps, but it would be more helpful to have something like:
i) 33,754,240,211,584: No, so let's drop the last number.
ii) 3,375,424,021,158: No, so let's drop the last number.
iii) 337,542,402,115: No, so let's drop the last number. And so on, until we get to something like "well, it depends on many factors...".
What would I consider "little performance impact"? Up to 1,000,000 records, it takes no more than 3 seconds to execute the queries. If 33,754,240,211,584 records take around 10 seconds, that's excellent to me.
Why don't I just test it myself? I don't think I'm capable of doing such a test. All I would do is insert that quantity of rows and see what happens. I would rather first hear the point of view of someone who has already dealt with something similar. Remember, I'm still in the design stage.
Thanks in advance.
54,240,211,584 is a lot. I only have experience with MySQL tables up to about 300 million rows, and it handles that with little problem. I'm not sure what you're actually asking, but here are some notes:
Use InnoDB if you need transaction support, or are doing a lot of inserts/updates.
MyISAM tables are bad for transactional data, but OK if you're very read-heavy and only do bulk inserts/updates every now and then.
There's no 4GB limit with MySQL if you're using recent releases/recent operating systems. My biggest table is 211GB now.
Purging data from large tables is very slow, e.g. deleting all records for a month takes me a few hours. (Deleting single records is fast, though.)
Don't use INT/TINYINT keys if you're expecting many billions of records; they'll overflow (see the sketch after these notes).
Get something working; fix the scaling after the first release. An unrealized idea is pretty much useless, while something that works (for now) might be very useful.
Test. There's no real substitute: your app and DB usage might be wildly different from someone else's huge database.
Look into partitioned tables, a recent feature in MySQL that can help you scale in many ways.
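As a minimal sketch of the key-sizing note above (the table and column names are made up), a BIGINT UNSIGNED key leaves room far beyond the roughly 4.3 billion ceiling of INT UNSIGNED:

    CREATE TABLE ticket_items (
      id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- max ~1.8e19 vs ~4.3e9 for INT UNSIGNED
      ticket_id BIGINT UNSIGNED NOT NULL,
      item_type TINYINT UNSIGNED NOT NULL,                -- TINYINT is still fine for low-cardinality values
      qty       SMALLINT UNSIGNED NOT NULL,
      PRIMARY KEY (id),
      KEY idx_ticket (ticket_id)
    ) ENGINE=InnoDB;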
Start at the level you're at. Build from there.
There are plenty of people out there who will sell you services you don't need right now.
If $10/month shared hosting isn't working anymore, then upgrade, and eventually hire someone to help you get around the record limitations of your DB.
There is no 4Gb limit, but of course there are limits. Don't plan too far ahead. If you're just starting up and you plan to be the next Facebook, that's great but you have no resources.
Get something working so you can show your investors :)

MySQL speed optimization on a table with many rows: what is the best way to handle it?

I'm developing a chat application. I want to keep everything logged in a table (i.e. "who said what and when").
I hope that in the near future I'll have thousands of rows.
I was wondering: what is the best way to optimize the table, knowing that I'll often insert rows and sometimes do group reads (i.e. showing an entire conversation from a user: look at when he/she logged in/started to chat, then look at when he/she quit, then show the entire conversation)?
This table should be able to handle (I hope!) many, many rows (15,000/day => 4.5M each month => 54M rows at the end of the year).
Conversations older than 15 days could be moved to some kind of history/archive (but I don't know how I should do that properly).
Any idea ?
I have two pieces of advice for you:
1. If you are expecting lots of writes with few, low-priority reads, then you are better off with as few indexes as possible. Indexes make inserts slower, so only add what you really need (a minimal sketch follows this answer).
2. If the log table is going to get bigger and bigger over time, you should consider log rotation. Otherwise you might end up with one gigantic corrupted table.
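A minimal sketch of what "as few indexes as possible" could look like here (table and column names are assumptions): one primary key plus a single composite index that serves the "show a user's conversation" read path.

    CREATE TABLE chat_log (
      id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      user_id INT UNSIGNED    NOT NULL,
      said_at DATETIME        NOT NULL,
      message TEXT            NOT NULL,
      PRIMARY KEY (id),
      KEY idx_user_time (user_id, said_at)   -- serves "everything user X said between t1 and t2"
    );

    -- the occasional group read stays cheap:
    SELECT said_at, message
    FROM chat_log
    WHERE user_id = 42
      AND said_at BETWEEN '2009-05-01 18:00:00' AND '2009-05-01 19:30:00'
    ORDER BY said_at;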
54 million rows is not that many, especially over a year.
If you are going to be rotating out lots of data periodically, I would recommend using MyISAM and MERGE tables. Since you won't be deleting or editing records, you won't have any locking issues as long as concurrency is set to 1. Inserts will then always be added to the end of the table, so SELECTs and INSERTs can happen simultaneously. So you don't have to use InnoDB-based tables (which cannot be used in MERGE tables).
You could have one table per month, named something like data200905, data200904, etc. Your MERGE table would then include all the underlying tables you need to search on. Inserts are done on the MERGE table, so you don't have to worry about changing names. When it's time to rotate out data and create a new table, just redeclare the MERGE table.
You could even create multiple MERGE tables, based on quarters, years, etc. One underlying table can be used in multiple MERGE tables.
I've done this setup on databases that added 30 million records per month.
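A hedged sketch of that setup (the data* tables and their columns are assumptions; all underlying tables must be MyISAM with identical definitions):

    CREATE TABLE data200904 (
      id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      user_id INT UNSIGNED NOT NULL,
      said_at DATETIME     NOT NULL,
      message TEXT         NOT NULL
    ) ENGINE=MyISAM;

    CREATE TABLE data200905 LIKE data200904;

    -- the MERGE table spans both months; INSERT_METHOD=LAST routes inserts to the last table listed
    CREATE TABLE data_all (
      id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
      user_id INT UNSIGNED NOT NULL,
      said_at DATETIME     NOT NULL,
      message TEXT         NOT NULL,
      INDEX (id)
    ) ENGINE=MERGE UNION=(data200904, data200905) INSERT_METHOD=LAST;

    -- rotation: create next month's table, then redeclare the union
    -- (data200904 drops out of the active set but stays on disk)
    CREATE TABLE data200906 LIKE data200904;
    ALTER TABLE data_all UNION=(data200905, data200906);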
MySQL does surprisingly well handling very large data sets with little more than standard database tuning and indexes. I ran a site that had millions of rows in a database and was able to run it just fine on MySQL.
MySQL does have an ARCHIVE storage engine option for handling many rows, but the lack of index support makes it not a great option for you, except perhaps for historical data.
Index creation will be required, but you have to balance the indexes and not just create them because you can. They allow for faster queries (and will be required for usable queries on a table that large), but the more indexes you have, the higher the cost of inserting.
If you are just querying on your "user" id column, an index there will not be a problem, but if you are looking to do full-text queries on the messages, you may want to consider indexing only the user column in MySQL and using something like Sphinx or Lucene for the full-text searches, as full-text search in MySQL is not the fastest and significantly slows down insert time.
You could handle this with two tables: one for the current chat history and one archive table. At the end of a period (week, month or day, depending on your traffic) you can archive current chat messages, remove them from the small table and add them to the archive.
This way your application is going to handle the most common case well: querying the current chat status, and this is going to be really fast.
For queries like "what did X say last month" you will query the archive table, and it is going to take a little longer, but this is OK, since there won't be that many of these queries, and someone who does a search like this would be willing to wait a couple of seconds more.
Depending on your use cases you could extend this principle: if there will be a lot of queries for chat messages from the last 6 months, store those in a separate table too.
A similar principle (in a completely different area) is used by the .NET garbage collector, which has different storage for short-lived objects, long-lived objects, large objects, etc.
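A minimal sketch of the periodic move described in the last answer (table and column names are assumptions; you would run this from a cron job or a scheduled event):

    -- move everything older than 15 days from the hot table to the archive
    INSERT INTO chat_archive
      SELECT * FROM chat_current
      WHERE said_at < NOW() - INTERVAL 15 DAY;

    DELETE FROM chat_current
      WHERE said_at < NOW() - INTERVAL 15 DAY;
    -- with InnoDB, wrap both statements in one transaction so a failure can't lose or duplicate rows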