When to LOCK TABLES in MySQL (MyISAM tables)? - mysql

Is the internal locking of MySQL sufficient for a small to medium sized website? My tables are MyISAM. There might be a few hundred people concurrently hitting a specific table with SELECT'S and INSERT's. None of the INSERT/UPDATE queries would overlap. That is, no two users will be updating the same comment ID. INSERT's/UPDATE's would be one-off operations---there would be no reading of data and performing additional operations within the same query.
Specifically, I am setting up a comment/chat system for my website. At worst, there might be a couple of hundred people performing a SELECT statement on the comment/chat tables in order to read new posts. With respect to INSERT's, there might be 100(?) different people trying to INSERT a new comment at any time.
I found this article in another question on SO, and it states that LOCK TABLES is "never required for self-contained insert, update, or delete operations." Is this good practice for the amount of DB traffic that I might have? TIA for any advice.

The only form of locking MyISAM tables is table locks. The idea is that they designed it to be way-fast enough that no one else should need access while it works. Right - YMM definitely V. But for most small-to-medium-size websites, it's fine. For intance, that's what WordPress uses.

Locking is probably only something you need to worry about if rows are being viewed and edited at the same time. Don't worry about locking for what you're doing - it's not only good practice to do so, using locks here would hurt performance.

Related

Splitting database table for performance?

I have one MySQL innodb table storing data for my web service that is heavily used. I am wondering if splitting it into two tables(with identical schema) and having my web service balance load between the two tables offer any performance benefits if they are still sitting on the same database server?
Update: db inserts are slow and there’s an enum column called “type” that we can potentially use to split one table into multiple tables with different types
Thanks!
Probably no noticeable benefit.
Please elaborate on "heavily used":
If there are slow queries; let's see some of them; maybe they can be sped up.
Your thought about "horizontal partitioning" (even with the builtin PARTITION) is very unlikely to help performance.
If you have columns like "like_count" and/or "view_count", and these are frequently updated by one (thereby leading to "heavily used"), then there may be an excuse for pulling them out into a separate table. This is "vertical partitioning".
When a site becomes really busy (thousands of queries per second), "sharding" may be necessary. This involves moving some 'users' to other servers. This adds quite a bit of complexity to your application.
My point is that there is help for your system, but not necessarily the technique you are asking about.

How many rows are 'too many' for a MySQL table? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How many rows in a database are TOO MANY?
I am building the database scheme for an application that will have users, and each user will have many rows in relation tables such as 'favorites'.
Each user could have thousands of favorites, and there could be thousands of registered users (over time).
Given that users are never deleted, because that would either leave other entities orphaned, or have them deleted too (which isn't desired), and therefore these tables will keep growing forever, I was wondering if the resulting tables could be too big (eg: 1kk rows), and I should worry about this and do something like mark old and inactive users as deleted and remove the relations that only affect them (such as the favorites and other preferences).
Is this the way to go? Or can mysql easily handle 1kk rows in a table? Is there a known limit? Or is it fully hardware-dependant?
I agree with klennepette and Brian - with a couple of caveats.
If your data is inherently relational, and subject to queries that work well with SQL, you should be able to scale to hundreds of millions of records without exotic hardware requirements.
You will need to invest in indexing, query tuning, and making the occasional sacrifice to the relational model in the interests of speed. You should at least nod to performance when designing tables – preferring integers to strings for keys, for instance.
If, however, you have document-centric requirements, need free text search, or have lots of hierarchical relationships, you may need to look again.
If you need ACID transactions, you may run into scalability issues earlier than if you don't care about transactions (though this is still unlikely to affect you in practice); if you have long-running or complex transactions, your scalability decreases quite rapidly.
I'd recommend building the project from the ground up with scalability requirements in mind. What I've done in the past is set up a test environment populated with millions of records (I used DBMonster, but not sure if that's still around), and regularly test work-in-progress code against this database using load testing tools like Jmeter.
Millions of rows is fine, tens of millions of rows is fine - provided you've got an even remotely decent server, i.e. a few Gbs of RAM, plenty disk space. You will need to learn about indexes for fast retrieval, but in terms of MySQL being able to handle it, no problem.
Here's an example that demonstrates what can be achived using a well designed/normalised innodb schema which takes advantage of innodb's clustered primary key indexes (not available with myisam). The example is based on a forum with threads and has 500 million rows and query runtimes of 0.02 seconds while under load.
MySQL and NoSQL: Help me to choose the right one
It is mostly hardware dependant, but having that said MySQL scales pretty well.
I wouldn't worry too much about table size, if it does become an issue later on you can always use partitioning to ease the stress.

How can I optimize my database?

I am creating a platform for some clients. Each client needs to have contacts and manage them in groups, categories (which depends of the group) and subcategories (which depends of the category).
The database is going to be very big, and Im afraid about the performance. I want to optimize the database; now, I have these options:
Manage only one database with multiple tables (as we manage now)
Create a database for each client (each database will have the same multiple tables as the option 1)
Manage multiple XML files (like option 2, each client will have a directory with an XML for contacts, another XML file for groups, another for categories, and so on)
Wich is the best option for performance and management of the data (CRUD, create, read, update, delete)??
Thanks!!
I think one database with multiple tables is the way to go, because duplicating the database and schema for each new client doesn't scale well. XML files sounds cool but so far I haven't seen an XML read/write engine which is as fast as most RDBMSes, so bin that one.
To make this work (lots of tables in one database) you should pay attention to indexing and optimizing the one database; indexes in particular will help you maintain speed as you scale up.
Use clustered indexing on the clienId in whichever table it might exist as a foreign key. This procedure will give you the best client-centric performance because you would (usually) be pulling a particular client's info in a page fetch.
For #2, I would suggest making that a premium service to your clients. If they want "priority hosting" on a separate server of "their own" then they pay extra. That will make the maintenance headache worthwhile.
Have you tried actually implementing 1 (which is the easiest)?
Did you profile the code?
What is the performance now?
use EXPLAIN to see how the queries are performing?
Do you use indexes (often correct indexes are enough to give excellent performance changes)?
Optimize when you hit a bottleneck (or when you set certain benchmarks for performance), not during design phase...
UPDATE: You mentioned "millions of entries". That's nothing for mysql (provided you use correct indexes on your tables). I have a table with about 40 million rows & although it's not lightning fast it gives me results in a couple of seconds. So there you go...
3 is not advisable. Search etc. is not what XML files do efficiently.
2 is a maintenance problem.
1 should be doable. "very big" means what? I have a database with a tabe with currently 1.5 billion entries - that is "big" not "very big". What do you define as very big?
As far as ongoing maintenance and support goes I think only option 1 makes sense for you.
Index all columns you need to but nothing more. Look at your code and see how tables are being JOINed and index the columns which will otherwise require a table scan.
Indicies will speed up the read operations but slow down your write operations as you need to update the indicies as well as the column. They also need more space in the DB.
As suggested above use EXPLAIN to see how your queries are executing and what can be optimized there.
Finally performance tuning only works well after you baseline your existing performance, make a change, then baseline performance again to see if it helped. If not roll back and try something else. But always start with a known level of performance, otherwise you might end up making multiple changes which in total slow things down. Good luck!

mysql tables structure - one very large table or separate tables?

I'm working on a project which is similar in nature to website visitor analysis.
It will be used by 100s of websites with average of 10,000s to 100,000s page views a day each so the data amount will be very large.
Should I use a single table with websiteid or a separate table for each website?
Making changes to a live service with 100s of websites with separate tables for each seems like a big problem. On the other hand performance and scalability are probably going to be a problem with such large data. Any suggestions, comments or advice is most welcome.
How about one table partitioned by website FK?
I would say use the design that most makes sense given your data - in this case one large table.
The records will all be the same type, with same columns, so from a database normalization standpoint they make sense to have them in the same table. An index makes selecting particular rows easy, especially when whole queries can be satisfied by data in a single index (which can often be the case).
Note that visitor analysis will necessarily involve a lot of operations where there is no easy way to optimise other than to operate on a large number of rows at once - for instance: counts, sums, and averages. It is typical for resource intensive statistics like this to be pre-calculated and stored, rather than fetched live. It's something you would want to think about.
If the data is uniform, go with one table. If you ever need to SELECT across all websites
having multiple tables is a pain. However if you write enough scripting you can do it with multiple tables.
You could use MySQL's MERGE storage engine to do SELECTs across the tables (but don't expect good performance, and watch out for the Windows hard limit on the number of open files - in Linux you may haveto use ulimit to raise the limit. There's no way to do it in Windows).
I have broken a huge table into many (hundreds) of tables and used MERGE to SELECT. I did this so the I could perform off-line creation and optimization of each of the small tables. (Eg OPTIMIZE or ALTER TABLE...ORDER BY). However the performance of SELECT with MERGE caused me to write my own custom storage engine. (Described http://blog.coldlogic.com/categories/coldstore/'>here)
Use the single data structure. Once you start encountering performance problems there are many solutions like you can partition your tables by website id also known as horizontal partitioning or you can also use replication. This all depends upon the the ratio of reads vs writes.
But for start keep things simple and use one table with proper indexing. You can also determine if you need transactions or not. You can also take advantage of various different mysql storage engines like MyIsam or NDB (in memory clustering) to boost up the performance. Also caching plays a very good role in offloading the load from the database. The data that is mostly read only and can be computed easily is usually put in the cache and the cache serves the request instead of going to the database and only the necessary queries go to the database.
Use one table unless you have performance problems with MySQL.
Nobody here cannot answer performance questions, you should just do performance tests yourself to understand, whether having one big table is sufficient.

What techniques are most effective for dealing with millions of records?

I once had a MySQL database table containing 25 million records, which made even a simple COUNT(*) query takes minute to execute. I ended up making partitions, separating them into a couple tables. What i'm asking is, is there any pattern or design techniques to handle this kind of problem (huge number of records)? Is MSSQL or Oracle better in handling lots of records?
P.S
the COUNT(*) problem stated above is just an example case, in reality the app does crud functionality and some aggregate query (for reporting), but nothing really complicated. It's just that it takes quite a while (minutes) to execute some these queries because of the table volume
See Why MySQL could be slow with large tables and COUNT(*) vs COUNT(col)
Make sure you have an index on the column you're counting. If your server has plenty of RAM, consider increasing MySQL's buffer size. Make sure your disks are configured correctly -- DMA enabled, not sharing a drive or cable with the swap partition, etc.
What you're asking with "SELECT COUNT(*)" is not easy.
In MySQL, the MyISAM non-transactional engine optimises this by keeping a record count, so SELECT COUNT(*) will be very quick.
However, if you're using a transactional engine, SELECT COUNT(*) is basically saying:
Exactly how many records exist in this table in my transaction ?
To do this, the engine needs to scan the entire table; it probably knows roughly how many records exist in the table already, but to get an exact answer for a particular transaction, it needs a scan. This isn't going to be fast using MySQL innodb, it's not going to be fast in Oracle, or anything else. The whole table MUST be read (excluding things stored separately by the engine, such as BLOBs)
Having the whole table in ram will make it a bit faster, but it's still not going to be fast.
If your application relies on frequent, accurate counts, you may want to make a summary table which is updated by a trigger or some other means.
If your application relies on frequent, less accurate counts, you could maintain summary data with a scheduled task (which may impact performance of other operations less).
Many performance issues around large tables relate to indexing problems, or lack of indexing all together. I'd definitely make sure you are familiar with indexing techniques and the specifics of the database you plan to use.
With regards to your slow count(*) on the huge table, i would assume you were using the InnoDB table type in MySQL. I have some tables with over 100 million records using MyISAM under MySQL and the count(*) is very quick.
With regards to MySQL in particular, there are even slight indexing differences between InnoDB and MyISAM tables which are the two most commonly used table types. It's worth understanding the pros and cons of each and how to use them.
What kind of access to the data do you need? I've used HBase (based on Google's BigTable) loaded with a vast amount of data (~30 million rows) as the backend for an application which could return results within a matter of seconds. However, it's not really appropriate if you need "real time" access - i.e. to power a website. Its column-oriented nature is also a fairly radical change if you're used to row-oriented DBMS.
Is count(*) on the whole table actually something you do a lot?
InnoDB will have to do a full table scan to count the rows, which is obviously a major performance issue if counting all of them is something you actually want to do. But that doesn't mean that other operations on the table will be slow.
With the right indexes, MySQL will be very fast at retrieving data from tables much bigger than that. The problem with indexes is that they can hurt insert speeds, particularly for large tables as insert performance drops dramatically once the space required for the index reaches a certain threshold - presumably the size it will keep in memory. But if you only need modest insert speeds, MySQL should do everything you need.
Any other database will have similar tradeoffs between retrieve speed and insert speed; they may or may not be better for your application. But I would look first at getting the indexes right, and maybe rewriting your queries, before you try other databases. For what it's worth, we picked MySQL originally because we found it performed best.
Note that MyISAM tables in MySQL store the total size of the table. They maintain this because it's useful to the optimiser in some cases, but a side effect is that count(*) on the whole table is really fast. That doesn't necessarily mean they're faster than InnoDB at anything else.
I answered a similar question in This Stackoverflow Posting in some detail, describing the merits of the architectures of both systems. To some extent it was done from a data warehousing point of view but many of the differences also matter on transactional systems.
However, 25 million rows is not a VLDB and if you are having performance problems you should look to indexing and tuning. You don't need to go to Oracle to support a 25 million row database - you've got about 3 orders of magnitude to go before you're truly in VLDB territory.
You are asking for a books worth of answer and I therefore propose you get a good book on databases. There are many.
To get you started, here are some database basics:
First, you need a great data model based not just on what data you need to store but on usage patterns. Good database performance starts with good schema design.
Second, place indicies on columns based upon expected lookup AND update needs as update performance is often overlooked.
Third, don't put functions in where clauses if at all possible.
Fourth, use an -ahem- RDBMS engine that is of quality design. I would respectfully submit that while it has improved greatly in the recent past, mysql does not qualify. (Apologies to those who wish to argue it has finally made the grade in recent times.) There is no longer any need to choose between high-price and quality; Postgres (aka PostgreSql) is available open-source and is truly fantastic - and has all the plug-ins available to meet your needs.
Finally, learn what you are asking a database engine to do - gain some insight into internals - so you can better judge what kinds of things are expensive and why.
I'm going to second #Mark Baker, and say that you need to build indices on your tables.
For other queries than the one you selected, you should also be aware that using constructs such as IN() is faster than a series of OR statements in the query. There are lots of little steps you can take to speed-up individual queries.
Indexing is key to performance with this number of records, but how you write the queries can make a big difference as well. Specific performance tuning methods vary by database, but in general, avoid returning more records or fields than you actually need, make sure all join fields are indexed (as well as common where clause fields), avoid cursors (although I think this is less true in Oracle than SQL Server I don't know about mySQL).
Hardware can also be a bottleneck especially if you are running things besides the database server on the same machine.
Performance tuning is a very technical subject and can't really be answered well in a format like this. I suggest you get a performance tuning book and read it. Here is a link to one for mySQL
http://www.amazon.com/High-Performance-MySQL-Optimization-Replication/dp/0596101716